We have seen so far an example of a probability law on a discrete and finite sample space, as well as an example with an infinite and continuous sample space. Let us now look at an example involving a discrete but infinite sample space.

We carry out an experiment whose outcome is an arbitrary positive integer. As an example of such an experiment, suppose that we keep tossing a coin and the outcome is the number of tosses until we observe heads for the first time. The first heads might appear in the first toss, or the second, or the third, and so on. So in this example, any positive integer is possible, and our sample space is infinite.

Let us now specify a probability law. A probability law should determine the probability of every event, that is, of every subset of the sample space; in other words, the probability of every set of positive integers. But instead, I will just tell you the probabilities of events that contain a single element. I'm going to tell you that the probability that the outcome is equal to n is 1 over 2 to the n. Is this good enough? Is this information enough to determine the probability of any subset?
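To make the coin-tossing experiment concrete, here is a minimal Python sketch. It assumes a fair coin. The transcript does not say so explicitly, but a fair coin is exactly what gives probability (1/2)^(n-1) · (1/2) = 1/2^n that the first heads lands on toss n. The function names are hypothetical. The sketch simulates the experiment many times and compares the empirical frequencies with 1/2^n.

```python
import random
from collections import Counter
from fractions import Fraction


def pmf(n: int) -> Fraction:
    """Probability that the first heads appears on toss n: 1/2**n (fair coin assumed)."""
    if n < 1:
        raise ValueError("outcomes are positive integers")
    return Fraction(1, 2**n)


def tosses_until_first_heads(rng: random.Random) -> int:
    """Toss a fair coin until heads appears; return how many tosses it took."""
    count = 1
    while rng.random() >= 0.5:  # values >= 0.5 count as tails, so toss again
        count += 1
    return count


def empirical_frequencies(trials: int, seed: int = 0) -> dict[int, float]:
    """Relative frequency of each outcome n over `trials` simulated experiments."""
    rng = random.Random(seed)
    counts = Counter(tosses_until_first_heads(rng) for _ in range(trials))
    return {n: c / trials for n, c in sorted(counts.items())}


if __name__ == "__main__":
    freqs = empirical_frequencies(100_000)
    for n in range(1, 6):
        # outcome n, simulated frequency, exact probability 1/2**n
        print(n, round(freqs.get(n, 0.0), 4), float(pmf(n)))
```

Each printed row shows an outcome n, its simulated frequency, and the exact value 1/2^n. With 100,000 trials the two numbers should be close. The simulation only estimates the probabilities of single outcomes. It does not answer the question of how to get the probability of an arbitrary subset.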
Before we look into that question, let us first do a quick sanity check to see whether the numbers we are given look like legitimate probabilities. Do they add to 1? Let's do a quick check.

The sum over all the possible values of n of the probabilities we're given is an infinite sum, starting from n equals 1 and going all the way up to infinity, of 1 over 2 to the n. First we take out a factor of 1/2 from all of these terms, which reduces the exponent from n to n minus 1. This is the same as 1/2 times the sum from n equals 0 to infinity of (1/2) to the n. Now we have a usual infinite geometric series, and we have a formula for it: the geometric series has a value of 1 over 1 minus the number whose power we're taking, which is 1/2. So the series equals 2, and after multiplying by the factor of 1/2 that we took out, the total turns out to be equal to 1.

So indeed, it appears that we have the basic elements of what it would take to have a legitimate probability law. But now let us look into how we might calculate the probability of some general event, for example, the probability that the outcome is even.
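Here is the normalization check from the sanity-check paragraph written as a single chain of equalities. It restates the spoken steps term by term: factor out 1/2, shift the index, then apply the geometric-series formula.

```latex
\sum_{n=1}^{\infty} \frac{1}{2^{n}}
  = \frac{1}{2} \sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^{n-1}
  = \frac{1}{2} \sum_{n=0}^{\infty} \left(\frac{1}{2}\right)^{n}
  = \frac{1}{2} \cdot \frac{1}{1 - \frac{1}{2}}
  = \frac{1}{2} \cdot 2
  = 1.
```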
The probability that the outcome is even is the probability of an infinite set, which consists of all the even integers. We can write this set as the union of lots of little sets that contain a single element each: the set containing the number 2, the set containing the number 4, the set containing the number 6, and so on.

At this point we notice that we're talking about the probability of a union of sets, and these sets are disjoint because they contain different elements. So we can use an additivity property and say that this is the probability of obtaining a 2, plus the probability of obtaining a 4, plus the probability of obtaining a 6, and so on.

If you're curious about doing this calculation and actually obtaining a numerical answer, you would proceed as follows. You notice that this is 1 over 2 to the second power, plus 1 over 2 to the fourth power, plus 1 over 2 to the sixth power, and so on. Now you factor out 1/4, and what you're left with is 1, plus 1 over 2 to the second power, which is 1/4, plus 1 over 2 to the fourth power, which is the same as 1/4 to the second power, and so on.
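The series for the probability of an even outcome can be checked exactly with Python's standard `fractions` module. This is a minimal sketch with hypothetical function names, assuming only the law P(n) = 1/2^n given earlier. It adds up the first K terms, P(2) + P(4) + ... + P(2K), which equal (1/3)·(1 − (1/4)^K). The partial sums therefore close in on 1/3, the value the geometric-series formula gives. The code only ever adds finitely many terms. Passing to the limit, that is, adding all infinitely many terms at once, is exactly the step whose justification is examined next.

```python
from fractions import Fraction


def pmf(n: int) -> Fraction:
    """P(outcome = n) = 1/2**n for n = 1, 2, 3, ..."""
    return Fraction(1, 2**n)


def partial_prob_even(k_terms: int) -> Fraction:
    """P(2) + P(4) + ... + P(2K): the first K terms of the series for P(even)."""
    return sum((pmf(2 * k) for k in range(1, k_terms + 1)), Fraction(0))


if __name__ == "__main__":
    for k in (1, 2, 5, 10, 20):
        s = partial_prob_even(k)
        # number of terms, exact partial sum, decimal value, remaining gap to 1/3
        print(k, s, float(s), float(Fraction(1, 3) - s))
```

The last printed column is the gap between 1/3 and the partial sum. It equals (1/3)·(1/4)^K, so it shrinks by a factor of 4 with each added term.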
And now we have 1/4 times the infinite sum of a geometric series, which gives us 1 over 1 minus 1/4, that is, 4/3. Multiplying by 1/4, you obtain a numerical answer, which is equal to 1/3.

But leaving the details of the calculation aside, the more important question I want to address is the following: is this calculation correct? We seem to have used an additivity property at this point. But the additivity properties that we have in our hands at this point only talk about disjoint unions of finitely many subsets. Our initial axiom talked about a disjoint union of two subsets, and later on we established a similar property for a disjoint union of finitely many subsets. But here we're talking about the union of infinitely many subsets. So this step here is not really allowed by what we have in our hands.

On the other hand, we would like our theory to allow this kind of calculation. The way out of this dilemma is to introduce an additional axiom that will indeed allow this kind of calculation. The axiom that we introduce is the following. Suppose we have an infinite sequence of disjoint events, as for example in this picture.
We have our sample space. We have a first event, A1. We have a second event, A2. The third event, A3. And so we keep continuing, and we have an infinite sequence of such events. Then the probability of the union of these events, of these infinitely many events, is the sum of their individual probabilities.

The key word here is the word sequence. Namely, these events, these sets that we're dealing with, can be arranged so that we can talk about the first event, A1, the second event, A2, the third one, A3, and so on.

To appreciate the issue that arises here, and to see why the word sequence is so important, let us consider the following calculation. Our sample space is the unit square, and we consider a model where the probability of a set is its area, as in the examples that we considered earlier. Let us now look at the probability of the overall sample space. Our sample space is the unit square, and the unit square can be thought of as the union of various sets that consist of single points. So it's the union of subsets with one element each.
And it's a union taken over all the points in the unit square. Then we think about additivity. We observe that these subsets are disjoint: if we're considering different points, then we get disjoint single-element sets. And then an additivity property would tell us that the probability of this union is the sum of the probabilities of the different single-element subsets. Now, as we discussed before, single-element subsets have 0 probability. So we have a sum of lots of 0s, and the sum of 0s should be equal to 0. On the other hand, by the probability axioms, the probability of the entire sample space should be equal to 1. And so we have established that 1 is equal to 0.

This looks like a paradox. Is it? The catch is that there is nothing in the axioms we have introduced so far, or in the properties we have established, that would justify this step. So this step here is questionable. You might argue that the unit square is the union of disjoint single-element sets, which is the case covered by the additivity axiom.
But the additivity axiom only applies when we have a sequence of events, and this is not what we have here. This is not a union of a sequence of single-element sets. In fact, there is no way that the elements of the unit square can be arranged in a sequence. The unit square is said to be an uncountable set.

This is a deep and fundamental mathematical fact. What it essentially says is that there are two kinds of infinite sets. There are discrete ones, or in formal terminology, countable sets. These are sets whose elements can be arranged in a sequence, like the integers. And there are also uncountable sets, such as the unit square or the real line, whose elements cannot be arranged in a sequence. If you're curious, you can find the proof of this important fact in the supplementary materials that we are providing.

After all this discussion, you may now have legitimate suspicions about the models we have been looking at. Is area a legitimate probability law? Does it even satisfy countable additivity? This question takes us into deep waters and has to do with a deep subfield of mathematics called measure theory.
Fortunately, it turns out that all is well. Area is a legitimate probability law. It does indeed satisfy the countable additivity axiom, as long as we only deal with "nice" subsets of the unit square. Fortunately, the subsets that arise in whatever we do in this course will be nice. Subsets that are not nice are quite pathological, and we will not encounter them.

At this stage we are not in a position to say anything more that would be meaningful about these issues, because they're quite complicated and mathematically deep. We can only say that there are some serious mathematical subtleties. But fortunately, they can all be overcome in a rigorous manner. And for the rest of this class, you can just forget about these subtle issues.
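The idea of "arranging the elements in a sequence" can be made concrete for the integers, which the discussion above gives as the standard example of a countable set. The Python sketch below uses hypothetical names and is only an illustration of the definition. It lists every integer exactly once, in the order 0, 1, −1, 2, −2, and so on, so that every integer has a definite first, second, third, ... position. No program can illustrate the opposite case in the same way. For the points of the unit square, no such listing exists, and the proof of that fact is the one referred to in the supplementary materials.

```python
from itertools import count, islice
from typing import Iterator


def integers_as_sequence() -> Iterator[int]:
    """Yield every integer exactly once: 0, 1, -1, 2, -2, 3, -3, ..."""
    yield 0
    for k in count(1):
        yield k
        yield -k


def position_of(m: int) -> int:
    """1-based position at which the integer m appears in the sequence above."""
    if m == 0:
        return 1
    return 2 * m if m > 0 else 2 * (-m) + 1


if __name__ == "__main__":
    print(list(islice(integers_as_sequence(), 9)))
```

Because `position_of` assigns each integer a finite position, the integers can be written as a sequence. That is precisely what the countable additivity axiom requires of the events it adds up.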