The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, so welcome to 6.041/6.431, the class on probability models and the like. I'm John Tsitsiklis. I will be teaching this class, and I'm looking forward to this being an enjoyable and also useful experience. We have a fair amount of staff involved in this course, your recitation instructors and also a bunch of TAs, but I want to single out our head TA, Uzoma, who is the key person in this class. Everything has to go through him. If he doesn't know which recitation section you are in, then you simply do not exist, so keep that in mind. All right. So we want to jump right into the subject, but I'm going to take just a few minutes to talk about a few administrative details and how the course is run. So we're going to have lectures twice a week, and I'm going to use old-fashioned transparencies.
Now, you get copies of these slides with plenty of space for you to keep notes on them. A useful way of making good use of the slides is to use them as a sort of mnemonic summary of what happens in lecture. Not everything that I'm going to say is, of course, on the slides, but by looking at them you get a sense of what's happening right now. And it may be a good idea to review them before you go to recitation. So what happens in recitation? In recitation, your recitation instructor may review some of the theory and then solve some problems for you. And then you have tutorials, where you meet in very small groups together with your TA. What happens in tutorials is that you actually do the problem solving with the help of your TA and the help of your classmates in your tutorial section. Now, probability is a tricky subject. You may be reading the text and listening to lectures, and everything makes perfect sense, but until you actually sit down and try to solve problems, you don't quite appreciate the subtleties and the difficulties that are involved. So problem solving is a key part of this class.
And tutorials are extremely useful for just this reason, because that's where you actually get the practice of solving problems on your own, as opposed to watching someone else solve them for you. OK, but on mechanics: a key part of what's going to happen today is that you will turn in your schedule forms, which are at the end of the handout that you have in your hands. Then the TAs will be working frantically through the night, and they're going to produce a list of who goes into which section. And when that happens, any person in this class, with probability 90%, is going to be happy with their assignment and, with probability 10%, is going to be unhappy. Now, unhappy people have an option, though. You can resubmit your form together with your full schedule and constraints and give it back to the head TA, who will then do some further juggling and reassign people, and after that happens, 90% of those unhappy people will become happy. And 10% of them will be less unhappy. OK. So what's the probability that a random person is going to be unhappy at the end of this process? It's 1%. Excellent. Good.
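The 1% answer is the product of the two 10% figures: a student ends up unhappy only if the first assignment leaves them unhappy (10%) and the reassignment also fails for them (10% of that group). A minimal sketch with exact fractions, taking the lecture's numbers as given and reading the second 10% as applying only to the students who were unhappy after the first round:

```python
from fractions import Fraction

# Stage figures quoted in the lecture example.
p_unhappy_first = Fraction(1, 10)   # unhappy with the initial assignment
p_still_unhappy = Fraction(1, 10)   # of those who resubmit, fraction still unhappy

# Unhappy at the end = unhappy at first AND still unhappy after reassignment.
p_unhappy_end = p_unhappy_first * p_still_unhappy
print(p_unhappy_end)  # 1/100

# With about 100 students, this works out to roughly one unhappy person.
class_size = 100
print(class_size * p_unhappy_end)  # 1
```

Using `Fraction` instead of floats keeps the answer exactly 1/100; in binary floating point, `0.1 * 0.1` is not exactly `0.01`.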
Maybe you don't need this class. OK, so 1%. We have about 100 people in this class, so there's going to be about one unhappy person. I mean, anywhere you look in life, in any group you look at, there's always one unhappy person, right? So, what can we do about it? All right. Another important part of the mechanics is to read carefully the statement that we have about collaboration, academic honesty, and all that. You're encouraged to work with other students; it's a very good idea. You can consult sources that are out there, but when you sit down and write your solutions, you have to set things aside and write them on your own. You cannot copy something that somebody else has given to you. One reason is that we're not going to like it when it happens, and another reason is that you're not going to be doing yourself any favors. Really, the only way to do well in this class is to get a lot of practice by solving problems yourselves. So if you don't do that on your own, then when quiz and exam time comes, things are going to be difficult.
So, as I mentioned here, we're going to have recitation sections, some of which are for 6.041 students and some for 6.431 students, the graduate section of the class. Now, undergraduates can sit in the graduate recitation sections. What's going to happen there is that things may be just a little faster, and you may cover a problem that's a little more advanced and is not covered in the undergrad sections. But if you sit in the graduate section and you're an undergraduate, you're still only responsible for the undergraduate material. That is, you can just do the undergraduate work in the class but get some extra exposure in a different section. OK. A few words about the style of this class. We want to focus on basic ideas and concepts. There are going to be lots of formulas, but what we try to do in this class is to actually have you understand what those formulas mean. And a year from now, when almost all of the formulas have been wiped from your memory, you will still have the basic concepts. You will understand them, so when you look things up again, they will still make sense.
It's not the plug-and-chug kind of class where you're given a list of formulas, you're given numbers, and you plug in and get answers. The really hard part is usually choosing which formulas you're going to use. You need judgment; you need intuition. Lots of probability problems, at least the interesting ones, have many different solutions. Some are extremely long, some are extremely short. The extremely short ones usually involve some kind of deeper understanding of what's going on, so that you can pick a shortcut and use it. And hopefully you are going to develop this skill during this class. Now, I could spend a lot of time in this lecture talking about why the subject is important. I'll keep it short because I think it's almost obvious. Anything that happens in life is uncertain. There's uncertainty everywhere, so whatever you try to do, you need to have some way of dealing with or thinking about this uncertainty. And the way to do that systematically is by using the models that probability theory gives us.
So if you're an engineer and you're dealing with a communication system or signal processing, basically you're facing a fight against noise. Noise is random; it is uncertain. How do you model it? How do you deal with it? If you're a manager, I guess you're dealing with customer demand, which is, of course, random. Or you're dealing with the stock market, which is definitely random. Or you play at the casino, which is, again, random, and so on. And the same goes for pretty much any other field that you can think of. But, independent of which field you're coming from, the basic concepts and tools are really all the same. So you may see in bookstores that there are books titled probability for scientists, probability for engineers, probability for social scientists, probability for astrologists. Well, what all those books have inside them is exactly the same models, the same equations, the same problems. They just dress them up as somewhat different word problems. The basic concepts are one and the same, and we'll take this as an excuse for not going too much into specific domain applications.
We will have problems and examples that are motivated, in some loose sense, by real-world situations. But we're not really trying in this class to develop the skills for domain-specific problems. Rather, we're going to try to stick to a general understanding of the subject. OK. So the next slide, which you do have in your handout, gives you a few more details about the class. Maybe one thing to comment on here is that you do need to read the text. With calculus books, perhaps you can live with just a two-page summary of all of the interesting formulas in calculus, and you can get by with just those formulas. But here, because we want to develop concepts and intuition, actually reading the words, as opposed to just browsing through equations, does make a difference. In the beginning, the class is kind of easy. When we deal with discrete probability, which is the material until our first quiz, some of you may get by without being too systematic about following the material. But it does get substantially harder afterwards. And I will keep restating that you do have to read the text to really understand the material. OK.
So now we can start with the real part of the lecture. Let us set the goals for today. Probability, or probability theory, is a framework for dealing with uncertainty, for dealing with situations in which we have some kind of randomness. So what we want to do by the end of today's lecture is to give you everything you need to know about what it takes to set up a probabilistic model, and what the basic rules of the game are for dealing with probabilistic models. So, by the end of this lecture, you will have essentially recovered half of this semester's tuition, right? So we're going to talk about probabilistic models in more detail: the sample space, which is basically a description of all the things that may happen during a random experiment, and the probability law, which describes our beliefs about which outcomes are more likely to occur compared to other outcomes. Probability laws have to obey certain properties that we call the axioms of probability. So the main part of today's lecture is to describe those axioms, which are the rules of the game, and consider a few really trivial examples.
OK, so let's start with our agenda. The first piece in a probabilistic model is a description of the sample space of an experiment. So we do an experiment, and by experiment we just mean that something happens out there. And that something could be flipping a coin, or rolling a die, or doing something in a card game. So we fix a particular experiment, and we come up with a list of all the possible things that may happen during this experiment. So we write down a list of all the possible outcomes. So here's a list of all the possible outcomes of the experiment. I use the word "list," but, if you want to be a little more formal, it's better to think of that list as a set. So we have a set. That set is our sample space. And it's a set whose elements are the possible outcomes of the experiment. So, for example, if you're dealing with flipping a coin, your sample space would be heads, which is one outcome, and tails, which is another outcome. And this set, which has two elements, is the sample space of the experiment. OK.
What do we need to think about when we're setting up the sample space? First, the list should be mutually exclusive and collectively exhaustive. What does that mean? Collectively exhaustive means that, no matter what happens in the experiment, you're going to get one of the outcomes inside here. So you have not forgotten any of the possibilities of what may happen in the experiment. Mutually exclusive means that if this happens, then that cannot happen. So at the end of the experiment, you should be able to point out to me just one, exactly one, of these outcomes and say, this is the outcome that happened. OK. So these are sort of basic requirements. There's another requirement, which is a little looser. When you set up your sample space, sometimes you do have some freedom about the details of how you're going to describe it. And the question is, how much detail are you going to include? So let's take this coin-flipping experiment and think of the following sample space.
One possible outcome is heads, a second possible outcome is tails and it's raining, and the third possible outcome is tails and it's not raining. So this is another possible sample space for the experiment where I flip a coin just once. It's a legitimate one. These three possibilities are mutually exclusive and collectively exhaustive. Which one is the right sample space? Is it this one or that one? Well, if you think that my coin flipping inside this room is completely unrelated to the weather outside, then you're going to stick with this sample space. If, on the other hand, you have some superstitious belief that maybe rain has an effect on my coins, you might work with a sample space of this kind. You probably wouldn't do that, but it's a legitimate option, strictly speaking. Now, this example is a little bit on the frivolous side, but the issue that comes up here is a basic one that shows up everywhere in science and engineering. Whenever you're dealing with a model or with a situation, there are zillions of details in that situation.
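For small finite cases, the two requirements can be checked mechanically: every possible situation must match exactly one listed outcome (at least one for collectively exhaustive, at most one for mutually exclusive). The sketch below assumes a hypothetical enumeration of the underlying situations as (coin, weather) pairs, which the lecture does not define, and writes each candidate outcome as a predicate; all names are illustrative.

```python
from itertools import product

# Hypothetical underlying situations: every combination of coin result and weather.
situations = list(product(["H", "T"], ["rain", "no rain"]))

# The refined sample space from the lecture, each outcome written as a predicate.
refined_space = {
    "heads": lambda s: s[0] == "H",
    "tails and raining": lambda s: s[0] == "T" and s[1] == "rain",
    "tails and not raining": lambda s: s[0] == "T" and s[1] == "no rain",
}

# The coarser sample space {heads, tails}.
coarse_space = {
    "heads": lambda s: s[0] == "H",
    "tails": lambda s: s[0] == "T",
}

def is_valid_sample_space(space, situations):
    """True if every situation matches exactly one outcome.

    Matching at least one outcome = collectively exhaustive;
    matching at most one outcome = mutually exclusive.
    """
    return all(sum(matches(s) for matches in space.values()) == 1 for s in situations)

print(is_valid_sample_space(refined_space, situations))  # True
print(is_valid_sample_space(coarse_space, situations))   # True

# A flawed list: "heads" and "raining" can both hold, and tails without rain is missed.
flawed_space = {"heads": lambda s: s[0] == "H", "raining": lambda s: s[1] == "rain"}
print(is_valid_sample_space(flawed_space, situations))   # False
```

Both the two-outcome and the three-outcome descriptions pass; which one to use is the modeling judgment the lecture describes, not something the check can decide.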
And when you come up with a model, you choose some of those details to keep in your model, and about others you say, well, these are irrelevant. Or maybe they are small effects, I can neglect them, and you keep them outside your model. So when you go to the real world, there's definitely an element of art and some judgment that you need to exercise in order to set up an appropriate sample space. So, an easy example now. Of course, the elementary examples are coins, cards, and dice. So let's deal with dice. But to keep the diagram small, instead of a six-sided die, we're going to think about a die that only has four faces. You can do that with a tetrahedron; it doesn't really matter. Basically, it's a die that, when you roll it, gives a result that is one, two, three, or four. However, the experiment that I'm going to think about will consist of two rolls of the die. A crucial point here: I'm rolling the die twice, but I'm thinking of this as just one experiment, not two different experiments, not a repetition twice of the same experiment. So it's one big experiment.
During that big experiment various things could happen, such as rolling the die a first time and then rolling it a second time. OK. So what's the sample space for that experiment? Well, the sample space consists of the possible outcomes. One possible outcome is that your first roll resulted in two and the second roll resulted in three. In that case, the outcome that you get is this one, a two followed by a three. This is one possible outcome. The way I'm describing things, this outcome is to be distinguished from this outcome here, where a three is followed by a two. If you're playing backgammon, it doesn't matter which of the two happened. But if you're dealing with a probabilistic model in which you want to keep track of everything that happens in this composite experiment, there are good reasons for distinguishing between these two outcomes. I mean, when this happens, it's definitely something different from that happening. A two followed by a three is different from a three followed by a two.
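The ordered-pair view of the two-roll experiment can be sketched in a few lines, assuming the faces are labeled 1 through 4. Each outcome is a tuple (first roll, second roll), so (2, 3) and (3, 2) are distinct elements; forgetting the order, as a backgammon player might, gives a coarser description with fewer elements.

```python
from itertools import product

faces = [1, 2, 3, 4]  # a four-sided (tetrahedral) die

# One outcome = an ordered pair (first roll, second roll).
sample_space = set(product(faces, repeat=2))
print(len(sample_space))                                # 16
print((2, 3) in sample_space, (3, 2) in sample_space)   # True True
print((2, 3) == (3, 2))                                 # False: order matters

# A backgammon-style view that ignores order lumps (2, 3) and (3, 2) together.
unordered = {tuple(sorted(pair)) for pair in sample_space}
print(len(unordered))                                   # 10
```

Sorting each pair is just one illustrative way to forget the order; the sample space used in the lecture is the 16-element ordered one.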
So this is the correct sample space for this experiment where we roll the die twice. It has a total of 16 elements and it's, of course, a finite set. Sometimes, instead of describing sample spaces in terms of lists, or sets, or diagrams of this kind, it's useful to describe the experiment in some sequential way. Whenever you have an experiment that consists of multiple stages, it might be useful, at least visually, to give a diagram that shows you how those stages evolve. And that's what we do by using a sequential description, or a tree-based description, by drawing a tree of the possible evolutions during our experiment. So in this tree, I'm thinking of a first stage in which I roll the first die, and there are four possible results: one, two, three, and four. And, given what happened in the first roll, suppose I got a one. Then I'm rolling the second die, and there are four possibilities for what may happen to the second die. And the possible results are one, two, three, and four again. So what's the relation between the two diagrams?
Well, for example, the outcome two followed by three corresponds to this path on the tree. So this path corresponds to two followed by a three. Any path is associated with a particular outcome, and any outcome is associated with a particular path. And, instead of paths, you may want to think in terms of the leaves of this diagram. Same thing: think of each one of the leaves as being one possible outcome. And of course we have 16 outcomes here, and we have 16 outcomes here. Maybe you noticed the subtlety in my language. I said I rolled the first die and the result that I got was a two. I didn't use the word "outcome." I want to reserve the word "outcome" to mean the overall outcome at the end of the overall experiment. So "2, 3" is the outcome of the experiment. The experiment consisted of stages. Two was the result in the first stage, three was the result in the second stage. You put all those results together, and you get your outcome. OK, perhaps we are splitting hairs here, but it's useful to keep the concepts right.
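The correspondence between the tree and the grid can be sketched by building the tree as nested dictionaries (an illustrative representation, not one from the lecture) and listing its root-to-leaf paths. Each edge carries a stage result, and a whole root-to-leaf path, that is, the sequence of results, is one outcome.

```python
from itertools import product

faces = [1, 2, 3, 4]

def build_tree(stages):
    """Nested dict: each key is a stage result, each value is the subtree for the remaining stages."""
    if stages == 0:
        return {}
    return {face: build_tree(stages - 1) for face in faces}

def leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf path as the tuple of stage results along it."""
    if not tree:
        yield prefix
        return
    for result, subtree in tree.items():
        yield from leaf_paths(subtree, prefix + (result,))

tree = build_tree(2)
paths = list(leaf_paths(tree))
print(len(paths))  # 16 leaves, one per outcome

# The set of paths is exactly the grid sample space of ordered pairs.
print(set(paths) == set(product(faces, repeat=2)))  # True

# Result 2 at stage 1 and result 3 at stage 2 together form the outcome (2, 3).
print((2, 3) in paths)  # True
```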
What's special about this example is that, besides being trivial, it has a sample space which is finite. There are 16 possible outcomes in total. Not every experiment has a finite sample space. Here's an experiment in which the sample space is infinite. You are playing darts and the target is this square. And you're perfect at that game, so you're sure that your darts will always fall inside the square. But where exactly your dart will fall inside that square is itself random. We don't know what it's going to be. It's uncertain. So all the points inside the square are possible outcomes of the experiment. A typical outcome of the experiment is going to be a pair of numbers, x, y, where x and y are real numbers between zero and one. Now, there are infinitely many real numbers and infinitely many points in the square, so this is an example in which our sample space is an infinite set. OK, so we're going to revisit this example a little later. So these are two examples of what the sample space might be in simple experiments.
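Because the square contains infinitely many points, the sample space cannot be written down as a finite list; in code it is more natural to describe it by a membership test. The hypothetical helper `unlisted_outcome` below makes the infinitude concrete with a pigeonhole argument: given any finite list of points, it returns a point of the square that the list missed. Exact fractions stand in for real numbers here, which is a simplifying assumption of the sketch.

```python
from fractions import Fraction

def in_sample_space(x, y):
    """An outcome is a point (x, y) with both coordinates between 0 and 1."""
    return 0 <= x <= 1 and 0 <= y <= 1

def unlisted_outcome(points):
    """Return a point of the square that is not in the given finite list.

    The n + 2 values 0, 1/(n+1), ..., 1 are distinct, and the n listed points
    use at most n distinct x-coordinates, so at least one value is free.
    """
    n = len(points)
    used_x = {p[0] for p in points}
    for k in range(n + 2):
        x = Fraction(k, n + 1)
        if x not in used_x:
            return (x, Fraction(1, 2))
    raise AssertionError("unreachable: n + 2 candidates, at most n used")

listed = [(Fraction(0), Fraction(0)), (Fraction(1, 2), Fraction(1, 3)), (Fraction(1), Fraction(1))]
new_point = unlisted_outcome(listed)
print(new_point not in listed, in_sample_space(*new_point))  # True True
```

No matter how long the list is, the function finds another outcome, which is exactly why no finite list can serve as this sample space.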
428 00:21:13,730 --> 00:21:18,240 Now, the more important order of business is now to look at 429 00:21:18,240 --> 00:21:21,800 those possible outcomes and to make some statements about 430 00:21:21,800 --> 00:21:23,910 their relative likelihoods. 431 00:21:23,910 --> 00:21:26,780 Which outcome is more likely to occur 432 00:21:26,780 --> 00:21:29,060 compared to the others? 433 00:21:29,060 --> 00:21:32,510 And the way we do this is by assigning 434 00:21:32,510 --> 00:21:36,210 probabilities to the outcomes. 435 00:21:36,210 --> 00:21:38,590 Well, not exactly. 436 00:21:38,590 --> 00:21:42,440 Suppose that all you were to do was to assign probabilities 437 00:21:42,440 --> 00:21:44,320 to individual outcomes. 438 00:21:44,320 --> 00:21:49,200 If you go back to this example, and you consider one 439 00:21:49,200 --> 00:21:52,250 particular outcome-- let's say this point-- 440 00:21:52,250 --> 00:21:55,620 what would be the probability that you hit exactly this 441 00:21:55,620 --> 00:21:58,640 point to infinite precision? 442 00:21:58,640 --> 00:22:01,070 Intuitively, that probability would be zero. 443 00:22:01,070 --> 00:22:05,630 So any individual point in this diagram in any reasonable 444 00:22:05,630 --> 00:22:08,520 model should have zero probability. 445 00:22:08,520 --> 00:22:11,870 So if you just tell me that any individual outcome has 446 00:22:11,870 --> 00:22:14,440 zero probability, you're not really telling me 447 00:22:14,440 --> 00:22:17,030 much to work with. 448 00:22:17,030 --> 00:22:20,910 For that reason, what instead we're going to do is to assign 449 00:22:20,910 --> 00:22:25,150 probabilities to subsets of the sample space, as opposed 450 00:22:25,150 --> 00:22:29,170 to assigning probabilities to individual outcomes. 451 00:22:29,170 --> 00:22:32,410 So here's the picture. 452 00:22:32,410 --> 00:22:36,890 We have our sample space, which is omega, and we 453 00:22:36,890 --> 00:22:39,690 consider some subset of the sample space. 
454 00:22:39,690 --> 00:22:45,820 Call it A. And I want to assign a number, a numerical 455 00:22:45,820 --> 00:22:50,720 probability, to this particular subset which 456 00:22:50,720 --> 00:22:56,950 represents my belief about how likely this set is to occur. 457 00:22:56,950 --> 00:22:57,340 OK. 458 00:22:57,340 --> 00:23:01,250 What do we mean "to occur?" And I'm introducing here a 459 00:23:01,250 --> 00:23:03,770 language that's being used in probability theory. 460 00:23:03,770 --> 00:23:07,410 When we talk about subsets of the sample space, we usually 461 00:23:07,410 --> 00:23:10,470 call them events, as opposed to subsets. 462 00:23:10,470 --> 00:23:14,480 And the reason is because it works nicely with the language 463 00:23:14,480 --> 00:23:16,710 that describes what's going on. 464 00:23:16,710 --> 00:23:19,010 So the outcome is a point. 465 00:23:19,010 --> 00:23:20,540 The outcome is random. 466 00:23:20,540 --> 00:23:26,800 The outcome may be inside this set, in which case we say that 467 00:23:26,800 --> 00:23:31,270 event A occurred, if we get an outcome inside here. 468 00:23:31,270 --> 00:23:35,120 Or the outcome may fall outside the set, in which case 469 00:23:35,120 --> 00:23:38,530 we say that event A did not occur. 470 00:23:38,530 --> 00:23:42,310 So we're going to assign probabilities to events. 471 00:23:42,310 --> 00:23:45,630 And now, how should we do this assignment? 472 00:23:45,630 --> 00:23:49,180 Well, probabilities are meant to describe your beliefs about 473 00:23:49,180 --> 00:23:52,880 which sets are more likely to occur versus other sets. 474 00:23:52,880 --> 00:23:55,050 So there's many ways that you can assign those 475 00:23:55,050 --> 00:23:56,080 probabilities. 476 00:23:56,080 --> 00:23:59,290 But there are some ground rules for this game. 477 00:23:59,290 --> 00:24:02,990 First, we want probabilities to be numbers between zero and 478 00:24:02,990 --> 00:24:06,740 one because that's the usual convention. 
479 00:24:06,740 --> 00:24:09,840 So a probability of zero means we're certain that something 480 00:24:09,840 --> 00:24:10,820 is not going to happen. 481 00:24:10,820 --> 00:24:13,570 Probability of one means that we're essentially certain that 482 00:24:13,570 --> 00:24:14,870 something's going to happen. 483 00:24:14,870 --> 00:24:17,450 So we want numbers between zero and one. 484 00:24:17,450 --> 00:24:19,740 We also want a few other things. 485 00:24:19,740 --> 00:24:23,200 And those few other things are going to be encapsulated in a 486 00:24:23,200 --> 00:24:25,060 set of axioms. 487 00:24:25,060 --> 00:24:29,030 What "axioms" means in this context, it's the ground rules 488 00:24:29,030 --> 00:24:31,300 that any legitimate probabilistic 489 00:24:31,300 --> 00:24:33,410 model should obey. 490 00:24:33,410 --> 00:24:37,080 You have a choice of what kind of probabilities you use. 491 00:24:37,080 --> 00:24:40,900 But, no matter what you use, they should obey certain 492 00:24:40,900 --> 00:24:44,740 consistency properties because if they obey those properties, 493 00:24:44,740 --> 00:24:47,640 then you can go ahead and do useful calculations and do 494 00:24:47,640 --> 00:24:49,360 some useful reasoning. 495 00:24:49,360 --> 00:24:51,010 So what are these properties? 496 00:24:51,010 --> 00:24:55,060 First, probabilities should be non-negative. 497 00:24:55,060 --> 00:24:56,590 OK? 498 00:24:56,590 --> 00:24:57,530 That's our convention. 499 00:24:57,530 --> 00:25:00,350 We want probabilities to be numbers between zero and one. 500 00:25:00,350 --> 00:25:02,130 So they should certainly be non-negative. 501 00:25:02,130 --> 00:25:04,600 The probability that event A occurs should be a 502 00:25:04,600 --> 00:25:06,135 non-negative number. 503 00:25:06,135 --> 00:25:08,110 What's the second axiom? 504 00:25:08,110 --> 00:25:13,760 The probability of the entire sample space is equal to one. 505 00:25:13,760 --> 00:25:15,590 Why does this make sense? 
506 00:25:15,590 --> 00:25:20,120 Well, the outcome is certain to be an element of the sample 507 00:25:20,120 --> 00:25:23,140 space because we set up a sample space, which is 508 00:25:23,140 --> 00:25:24,660 collectively exhaustive. 509 00:25:24,660 --> 00:25:28,590 No matter what the outcome is, it's going to be an element of 510 00:25:28,590 --> 00:25:29,350 the sample space. 511 00:25:29,350 --> 00:25:33,710 We're certain that event omega is going to occur. 512 00:25:33,710 --> 00:25:37,470 Therefore, we represent this certainty by saying that the 513 00:25:37,470 --> 00:25:41,520 probability of omega is equal to one. 514 00:25:41,520 --> 00:25:47,180 Pretty straightforward so far. 515 00:25:47,180 --> 00:25:52,240 The more interesting axiom is the third rule. 516 00:25:52,240 --> 00:25:55,580 Before getting into it, just a quick reminder. 517 00:25:55,580 --> 00:26:01,950 If you have two sets, A and B, the intersection of A and B 518 00:26:01,950 --> 00:26:07,220 consists of those elements that belong both to A and B. 519 00:26:07,220 --> 00:26:09,580 And we denote it this way. 520 00:26:09,580 --> 00:26:11,510 When you think probabilistically, the way to 521 00:26:11,510 --> 00:26:15,530 think of intersection is by using the word "and." This 522 00:26:15,530 --> 00:26:21,040 event, this intersection, is the event that A occurred and 523 00:26:21,040 --> 00:26:22,450 B occurred. 524 00:26:22,450 --> 00:26:26,060 If I get an outcome inside here, A has occurred and B has 525 00:26:26,060 --> 00:26:27,950 occurred at the same time. 526 00:26:27,950 --> 00:26:31,150 So you may find the word "and" to be a little more convenient 527 00:26:31,150 --> 00:26:33,680 than the word "intersection." 528 00:26:33,680 --> 00:26:37,360 And similarly, we have some notation for the union of two 529 00:26:37,360 --> 00:26:42,280 events, which we write this way. 
530 00:26:42,280 --> 00:26:46,250 The union of two sets, or two events, is the collection of 531 00:26:46,250 --> 00:26:49,370 all the elements that belong either to the first set, or to 532 00:26:49,370 --> 00:26:51,400 the second, or to both. 533 00:26:51,400 --> 00:26:55,220 When you talk about events, you can use the word "or." So 534 00:26:55,220 --> 00:26:59,990 this is the event that A occurred or B occurred. 535 00:26:59,990 --> 00:27:03,350 And this "or" means that it could also be that both of 536 00:27:03,350 --> 00:27:04,600 them occurred. 537 00:27:04,600 --> 00:27:08,890 538 00:27:08,890 --> 00:27:09,150 OK. 539 00:27:09,150 --> 00:27:11,280 So now that we have this notation, what does 540 00:27:11,280 --> 00:27:13,835 the third axiom say? 541 00:27:13,835 --> 00:27:19,830 The third axiom says that if we have two events, A and B, 542 00:27:19,830 --> 00:27:23,140 that have no common elements-- 543 00:27:23,140 --> 00:27:29,330 so here's A, here's B, and perhaps this is 544 00:27:29,330 --> 00:27:31,140 our big sample space. 545 00:27:31,140 --> 00:27:33,470 The two events have no common elements. 546 00:27:33,470 --> 00:27:36,510 So the intersection of the two events is the empty set. 547 00:27:36,510 --> 00:27:38,930 There's nothing in their intersection. 548 00:27:38,930 --> 00:27:43,190 Then, the total probability of A together with B has to be 549 00:27:43,190 --> 00:27:46,600 equal to the sum of the individual probabilities. 550 00:27:46,600 --> 00:27:50,510 So the probability that A occurs or B occurs is equal to 551 00:27:50,510 --> 00:27:52,390 the probability that A occurs plus the 552 00:27:52,390 --> 00:27:55,040 probability that B occurs. 553 00:27:55,040 --> 00:27:58,860 So think of probability as being cream cheese. 554 00:27:58,860 --> 00:28:03,020 You have one pound of cream cheese, the total probability 555 00:28:03,020 --> 00:28:05,340 assigned to the entire sample space. 
556 00:28:05,340 --> 00:28:12,780 And that cream cheese is spread out over this set. 557 00:28:12,780 --> 00:28:16,380 The probability of A is how much cream cheese sits on top 558 00:28:16,380 --> 00:28:20,320 of A. Probability of B is how much sits on top of B. The 559 00:28:20,320 --> 00:28:25,370 probability of A union B is the total amount of cream 560 00:28:25,370 --> 00:28:29,650 cheese sitting on top of this and that, which is obviously 561 00:28:29,650 --> 00:28:31,880 the sum of how much is sitting here and how 562 00:28:31,880 --> 00:28:33,220 much is sitting there. 563 00:28:33,220 --> 00:28:36,110 So probabilities behave like cream cheese, or 564 00:28:36,110 --> 00:28:38,450 they behave like mass. 565 00:28:38,450 --> 00:28:48,280 For example, if you think of some material object, the mass 566 00:28:48,280 --> 00:28:51,800 of this set consisting of two pieces is obviously the sum of 567 00:28:51,800 --> 00:28:53,120 the two masses. 568 00:28:53,120 --> 00:28:55,680 So this property is a very intuitive one. 569 00:28:55,680 --> 00:28:58,282 It's a pretty natural one to have. 570 00:28:58,282 --> 00:29:00,640 OK. 571 00:29:00,640 --> 00:29:03,880 Are these axioms enough for what we want to do? 572 00:29:03,880 --> 00:29:07,670 I mentioned a while ago that we want probabilities to be 573 00:29:07,670 --> 00:29:10,110 numbers between zero and one. 574 00:29:10,110 --> 00:29:12,400 Here's an axiom that tells you that probabilities are 575 00:29:12,400 --> 00:29:13,710 non-negative. 576 00:29:13,710 --> 00:29:17,280 Should we have another axiom that tells us that 577 00:29:17,280 --> 00:29:21,670 probabilities are less than or equal to one? 578 00:29:21,670 --> 00:29:23,150 It's a desirable property. 579 00:29:23,150 --> 00:29:26,090 We would like to have it in our hands. 580 00:29:26,090 --> 00:29:29,030 OK, why is it not in that list? 
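The axioms and the "and"/"or" language can be checked mechanically on a small finite model. What follows is a minimal sketch, assuming a made-up four-outcome sample space whose weights are chosen only for illustration (they are not from the lecture); `prob` is a hypothetical helper name. Events are Python frozensets, so `&` plays the role of intersection ("and") and `|` of union ("or"), and the probability of an event is the total weight, the amount of cream cheese, sitting on it.

```python
from fractions import Fraction

# Hypothetical four-outcome sample space; the weights are illustrative only
# (they are not from the lecture) and are chosen to sum to one.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}
omega = frozenset(weights)

def prob(event):
    """Probability of an event (a subset of omega): the total mass sitting on it."""
    if not event <= omega:
        raise ValueError("an event must be a subset of the sample space")
    return sum((weights[w] for w in event), Fraction(0))

A = frozenset({"a"})
B = frozenset({"c", "d"})
C = frozenset({"a", "b"})  # overlaps with A

# Axiom 1 (nonnegativity) and axiom 2 (normalization).
assert all(prob(frozenset({w})) >= 0 for w in omega)
assert prob(omega) == 1

# Axiom 3 (additivity): A and B are disjoint, so P(A or B) = P(A) + P(B).
assert A & B == frozenset()
print(prob(A | B))        # 3/4

# Without disjointness, adding the two probabilities double-counts outcome "a".
print(prob(A | C))        # 3/4
print(prob(A) + prob(C))  # 5/4
```

The last two lines show why axiom 3 is stated only for disjoint events: for overlapping events the plain sum counts the shared mass twice.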
581 00:29:29,030 --> 00:29:32,850 Well, the people who are in the axiom making business are 582 00:29:32,850 --> 00:29:35,060 mathematicians and mathematicians tend to be 583 00:29:35,060 --> 00:29:36,390 pretty laconic. 584 00:29:36,390 --> 00:29:40,020 You don't say something if you don't have to say it. 585 00:29:40,020 --> 00:29:42,580 And this is the case here. 586 00:29:42,580 --> 00:29:46,660 We don't need that extra axiom because we can derive it from 587 00:29:46,660 --> 00:29:48,440 the existing axioms. 588 00:29:48,440 --> 00:29:50,590 Here's how it goes. 589 00:29:50,590 --> 00:29:55,180 One is the probability over the entire sample space. 590 00:29:55,180 --> 00:29:57,450 Here we're using the second axiom. 591 00:29:57,450 --> 00:30:00,310 592 00:30:00,310 --> 00:30:06,070 Now the sample space consists of A together with the 593 00:30:06,070 --> 00:30:07,680 complement of A. OK? 594 00:30:07,680 --> 00:30:11,200 595 00:30:11,200 --> 00:30:14,470 When I write the complement of A, I mean the complement of A 596 00:30:14,470 --> 00:30:16,800 inside of the set omega. 597 00:30:16,800 --> 00:30:21,700 So we have omega, here's A, here's the complement of A, 598 00:30:21,700 --> 00:30:24,660 and the overall set is omega. 599 00:30:24,660 --> 00:30:25,350 OK. 600 00:30:25,350 --> 00:30:27,520 Now, what's the next step? 601 00:30:27,520 --> 00:30:28,650 What should I do next? 602 00:30:28,650 --> 00:30:31,320 Which axiom should I use? 603 00:30:31,320 --> 00:30:35,350 We use axiom three because a set and the complement of that 604 00:30:35,350 --> 00:30:36,730 set are disjoint. 605 00:30:36,730 --> 00:30:38,770 They don't have any common elements. 606 00:30:38,770 --> 00:30:44,050 So axiom three applies and tells me that this is the 607 00:30:44,050 --> 00:30:48,150 probability of A plus the probability of A complement. 
608 00:30:48,150 --> 00:30:53,970 In particular, the probability of A is equal to one minus the 609 00:30:53,970 --> 00:30:58,370 probability of A complement, and this is less 610 00:30:58,370 --> 00:31:00,540 than or equal to one. 611 00:31:00,540 --> 00:31:01,790 Why? 612 00:31:01,790 --> 00:31:03,430 613 00:31:03,430 --> 00:31:06,670 Because probabilities are non-negative, 614 00:31:06,670 --> 00:31:10,020 by the first axiom. 615 00:31:10,020 --> 00:31:10,310 OK. 616 00:31:10,310 --> 00:31:12,440 So we got the conclusion that we wanted. 617 00:31:12,440 --> 00:31:16,130 Probabilities are always less than or equal to one, and this 618 00:31:16,130 --> 00:31:20,230 is a simple consequence of the three axioms that we have. 619 00:31:20,230 --> 00:31:24,780 This is a really nice argument because it actually uses each 620 00:31:24,780 --> 00:31:26,560 one of those axioms. 621 00:31:26,560 --> 00:31:29,060 The argument is simple, but you have to use all of these 622 00:31:29,060 --> 00:31:33,050 three properties to get the conclusion that you want. 623 00:31:33,050 --> 00:31:33,720 OK. 624 00:31:33,720 --> 00:31:37,140 So we can get interesting things out of our axioms. 625 00:31:37,140 --> 00:31:40,050 Can we get some more interesting ones? 626 00:31:40,050 --> 00:31:44,540 How about the union of three sets? 627 00:31:44,540 --> 00:31:47,000 What kind of probability should it have? 628 00:31:47,000 --> 00:31:52,870 So here's an event consisting of three pieces. 629 00:31:52,870 --> 00:31:56,230 And I want to say something about the probability of A 630 00:31:56,230 --> 00:32:01,780 union B union C. What I would like to say is that this 631 00:32:01,780 --> 00:32:05,680 probability is equal to the sum of the three individual 632 00:32:05,680 --> 00:32:07,140 probabilities. 633 00:32:07,140 --> 00:32:08,860 How can I do it? 634 00:32:08,860 --> 00:32:11,080 I have an axiom that tells me that I can 635 00:32:11,080 --> 00:32:12,760 do it for two events. 
636 00:32:12,760 --> 00:32:15,370 I don't have an axiom for three events. 637 00:32:15,370 --> 00:32:19,210 Well, maybe I can manage things and still be able to 638 00:32:19,210 --> 00:32:20,620 use that axiom. 639 00:32:20,620 --> 00:32:22,700 And here's the trick. 640 00:32:22,700 --> 00:32:28,000 The union of three sets, you can think of it as forming the 641 00:32:28,000 --> 00:32:32,560 union of the first two sets and then taking the union with 642 00:32:32,560 --> 00:32:35,670 the third set. 643 00:32:35,670 --> 00:32:36,530 OK? 644 00:32:36,530 --> 00:32:39,150 So when taking unions, you can take them in any 645 00:32:39,150 --> 00:32:40,440 order that you want. 646 00:32:40,440 --> 00:32:44,580 So here we have the union of two sets. 647 00:32:44,580 --> 00:32:49,630 Now, A, B, and C are disjoint, by assumption, or 648 00:32:49,630 --> 00:32:51,780 at least that's how I drew them. 649 00:32:51,780 --> 00:32:55,950 So if A, B, and C are disjoint, then A union B is 650 00:32:55,950 --> 00:32:59,790 disjoint from C. So here we have the union of 651 00:32:59,790 --> 00:33:01,400 two disjoint sets. 652 00:33:01,400 --> 00:33:05,380 So by the additivity axiom, the probability of that 653 00:33:05,380 --> 00:33:08,960 union is going to be the probability of the first set 654 00:33:08,960 --> 00:33:12,000 plus the probability of the second set. 655 00:33:12,000 --> 00:33:15,950 And now I can use the additivity axiom once more to 656 00:33:15,950 --> 00:33:20,330 write that this is probability of A plus probability of B 657 00:33:20,330 --> 00:33:25,220 plus probability of C. So by using this axiom, which was 658 00:33:25,220 --> 00:33:28,940 stated for two sets, we can actually derive a similar 659 00:33:28,940 --> 00:33:32,450 property for the union of three disjoint sets. 660 00:33:32,450 --> 00:33:34,640 And then you can repeat this argument as many 661 00:33:34,640 --> 00:33:35,940 times as you want.
662 00:33:35,940 --> 00:33:39,050 It's valid for the union of ten disjoint sets, for the 663 00:33:39,050 --> 00:33:42,830 union of a hundred disjoint sets, for the union of any 664 00:33:42,830 --> 00:33:44,910 finite number of disjoint sets. 665 00:33:44,910 --> 00:33:53,210 So if A1 through An are disjoint, then the probability 666 00:33:53,210 --> 00:33:59,490 of A1 union ... union An is equal to the sum of the probabilities of 667 00:33:59,490 --> 00:34:01,500 the individual sets. 668 00:34:01,500 --> 00:34:04,180 669 00:34:04,180 --> 00:34:05,740 OK. 670 00:34:05,740 --> 00:34:08,710 A special case of this is when we're 671 00:34:08,710 --> 00:34:10,790 dealing with finite sets. 672 00:34:10,790 --> 00:34:14,300 Suppose I have just a finite set of outcomes. 673 00:34:14,300 --> 00:34:17,880 I put them together in a set and I'm interested in the 674 00:34:17,880 --> 00:34:19,630 probability of that set. 675 00:34:19,630 --> 00:34:22,050 So here's our sample space. 676 00:34:22,050 --> 00:34:26,840 There are lots of outcomes, but I'm taking a few of these and 677 00:34:26,840 --> 00:34:30,120 I form a set out of them. 678 00:34:30,120 --> 00:34:32,920 This is a set consisting of, in this 679 00:34:32,920 --> 00:34:34,760 picture, three elements. 680 00:34:34,760 --> 00:34:38,260 In general, it consists of k elements. 681 00:34:38,260 --> 00:34:43,650 Now, a finite set, I can write it as a union of single- 682 00:34:43,650 --> 00:34:44,889 element sets. 683 00:34:44,889 --> 00:34:49,080 So this set here is the union of this one-element set, 684 00:34:49,080 --> 00:34:52,800 together with this one-element set, together with that one- 685 00:34:52,800 --> 00:34:53,980 element set. 686 00:34:53,980 --> 00:34:56,770 So the total probability of this set is going to be the 687 00:34:56,770 --> 00:35:02,510 sum of the probabilities of the one-element sets.
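The consequences of the axioms worked out so far can be collected as equations. This is a sketch in standard notation, assuming A^c denotes the complement of A within Ω. First comes the bound P(A) ≤ 1; then finite additivity, which is obtained by applying axiom 3 repeatedly (each step is valid because A_1 ∪ ... ∪ A_{n-1} is disjoint from A_n whenever the sets are pairwise disjoint); last comes the special case of a finite event with k outcomes.

```latex
\begin{aligned}
1 &= P(\Omega)                          && \text{(axiom 2)}\\
  &= P(A \cup A^c) = P(A) + P(A^c)      && \text{(axiom 3, since } A \cap A^c = \emptyset)\\
  &\ge P(A)                             && \text{(axiom 1: } P(A^c) \ge 0)\\[6pt]
P(A_1 \cup \cdots \cup A_n)
  &= P\big((A_1 \cup \cdots \cup A_{n-1}) \cup A_n\big)
   = P(A_1 \cup \cdots \cup A_{n-1}) + P(A_n)
   = \cdots = \sum_{i=1}^{n} P(A_i)\\[6pt]
P(\{s_1, \ldots, s_k\})
  &= P(\{s_1\}) + \cdots + P(\{s_k\}) = P(s_1) + \cdots + P(s_k)
\end{aligned}
```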
688 00:35:02,510 --> 00:35:08,030 Now, probability of a one element set, you need to use 689 00:35:08,030 --> 00:35:10,010 the brackets here because probabilities 690 00:35:10,010 --> 00:35:12,260 are assigned to sets. 691 00:35:12,260 --> 00:35:16,190 But this gets kind of tedious, so here one abuses notation a 692 00:35:16,190 --> 00:35:19,920 little bit and we get rid of those brackets and just write 693 00:35:19,920 --> 00:35:24,030 probability of this single, individual outcome. 694 00:35:24,030 --> 00:35:28,510 In any case, conclusion from this exercise is that the 695 00:35:28,510 --> 00:35:33,410 total probability of a finite collection of possible 696 00:35:33,410 --> 00:35:37,070 outcomes, the total probability is equal to the 697 00:35:37,070 --> 00:35:42,190 sum of the probabilities of individual elements. 698 00:35:42,190 --> 00:35:46,460 So these are basically the axioms of probability theory. 699 00:35:46,460 --> 00:35:49,970 Or, well, they're almost the axioms. 700 00:35:49,970 --> 00:35:53,060 There are some subtleties that are involved here. 701 00:35:53,060 --> 00:35:58,650 One subtlety is that this axiom here doesn't quite do 702 00:35:58,650 --> 00:36:01,340 the job for everything we would like to do. 703 00:36:01,340 --> 00:36:03,030 And we're going to come back to this at 704 00:36:03,030 --> 00:36:05,080 the end of the lecture. 705 00:36:05,080 --> 00:36:10,380 A second subtlety has to do with weird sets. 706 00:36:10,380 --> 00:36:13,570 We said that an event is a subset of the sample space and 707 00:36:13,570 --> 00:36:16,712 we assign probabilities to events. 708 00:36:16,712 --> 00:36:19,990 Does this mean that we are going to assign probability to 709 00:36:19,990 --> 00:36:23,500 every possible subset of the sample space? 710 00:36:23,500 --> 00:36:26,660 Ideally, we would wish to do that. 711 00:36:26,660 --> 00:36:29,580 Unfortunately, this is not always possible. 
712 00:36:29,580 --> 00:36:35,010 If you take a sample space, such as the square, the square 713 00:36:35,010 --> 00:36:38,560 has nice subsets, those that you can describe by cutting it 714 00:36:38,560 --> 00:36:40,220 with lines and so on. 715 00:36:40,220 --> 00:36:45,540 But it does have some very ugly subsets, as well, that 716 00:36:45,540 --> 00:36:48,870 are impossible to visualize, impossible to imagine, but 717 00:36:48,870 --> 00:36:50,030 they do exist. 718 00:36:50,030 --> 00:36:53,710 And those very weird sets are such that there's no way to 719 00:36:53,710 --> 00:36:56,750 assign probabilities to them in a way that's consistent 720 00:36:56,750 --> 00:36:58,630 with the axioms of probability. 721 00:36:58,630 --> 00:36:59,000 OK. 722 00:36:59,000 --> 00:37:02,960 So this is a very, very fine point that you can immediately 723 00:37:02,960 --> 00:37:05,940 forget for the rest of this class. 724 00:37:05,940 --> 00:37:09,350 You will only encounter these sets if you end up doing 725 00:37:09,350 --> 00:37:12,450 doctoral work on the theoretical aspects of 726 00:37:12,450 --> 00:37:15,910 probability theory. 727 00:37:15,910 --> 00:37:19,570 So it's just a mathematical subtlety that some very weird 728 00:37:19,570 --> 00:37:22,560 sets do not have probabilities assigned to them. 729 00:37:22,560 --> 00:37:25,110 But we're not going to encounter these sets and they 730 00:37:25,110 --> 00:37:26,885 do not show up in any applications. 731 00:37:26,885 --> 00:37:29,520 732 00:37:29,520 --> 00:37:29,840 OK. 733 00:37:29,840 --> 00:37:32,410 So now let's revisit our examples. 734 00:37:32,410 --> 00:37:34,800 Let's go back to the die example. 735 00:37:34,800 --> 00:37:36,950 We have our sample space. 736 00:37:36,950 --> 00:37:40,830 Now we need to assign a probability law. 737 00:37:40,830 --> 00:37:43,260 There's lots of possible probability laws 738 00:37:43,260 --> 00:37:44,690 that you can assign. 
739 00:37:44,690 --> 00:37:49,060 I'm picking one here, arbitrarily, in which I say 740 00:37:49,060 --> 00:37:51,320 that every possible outcome has the same 741 00:37:51,320 --> 00:37:55,440 probability of 1/16. 742 00:37:55,440 --> 00:37:56,040 OK. 743 00:37:56,040 --> 00:37:58,010 Why do I make this model? 744 00:37:58,010 --> 00:38:02,340 Well, empirically, if you have well-manufactured dice, they 745 00:38:02,340 --> 00:38:04,540 tend to behave that way. 746 00:38:04,540 --> 00:38:06,870 We will be coming back to this kind of story 747 00:38:06,870 --> 00:38:08,500 later in this class. 748 00:38:08,500 --> 00:38:13,040 But I'm not saying that this is the only probability law 749 00:38:13,040 --> 00:38:13,720 that there can be. 750 00:38:13,720 --> 00:38:17,460 You might have weird dice in which certain outcomes are 751 00:38:17,460 --> 00:38:19,280 more likely than others. 752 00:38:19,280 --> 00:38:21,850 But to keep things simple, let's take every outcome to 753 00:38:21,850 --> 00:38:24,870 have the same probability of 1/16. 754 00:38:24,870 --> 00:38:26,790 OK. 755 00:38:26,790 --> 00:38:29,340 Now that we have in our hands a sample space and the 756 00:38:29,340 --> 00:38:31,990 probability law, we can actually solve any 757 00:38:31,990 --> 00:38:33,250 problem there is. 758 00:38:33,250 --> 00:38:36,070 We can answer any question that could be posed to us. 759 00:38:36,070 --> 00:38:39,320 For example, what's the probability that the outcome, 760 00:38:39,320 --> 00:38:43,590 which is this pair, is either 1,1 or 1,2. 761 00:38:43,590 --> 00:38:50,160 We're talking here about this particular event, 1,1 or 1,2. 762 00:38:50,160 --> 00:38:53,300 So it's an event consisting of these two items. 763 00:38:53,300 --> 00:38:56,640 According to what we were just discussing, the probability of 764 00:38:56,640 --> 00:38:59,540 a finite collection of outcomes is the sum of their 765 00:38:59,540 --> 00:39:01,170 individual probabilities. 
766 00:39:01,170 --> 00:39:04,190 Each one of them has probability of 1/16, so the 767 00:39:04,190 --> 00:39:07,720 probability of this is 2/16. 768 00:39:07,720 --> 00:39:11,910 How about the probability of the event that x is equal to 769 00:39:11,910 --> 00:39:14,960 one. x is the first roll, so that's the probability that 770 00:39:14,960 --> 00:39:18,120 the first roll is equal to one. 771 00:39:18,120 --> 00:39:22,340 Notice the syntax that's being used here. 772 00:39:22,340 --> 00:39:26,880 Probabilities are assigned to subsets, to sets, so we think 773 00:39:26,880 --> 00:39:32,500 of this as meaning the set of all outcomes such that x is 774 00:39:32,500 --> 00:39:33,660 equal to one. 775 00:39:33,660 --> 00:39:35,210 How do you answer this question? 776 00:39:35,210 --> 00:39:38,370 You go back to the picture and you try to visualize or 777 00:39:38,370 --> 00:39:40,810 identify this event of interest. 778 00:39:40,810 --> 00:39:45,570 x is equal to one corresponds to this event here. 779 00:39:45,570 --> 00:39:48,950 These are all the outcomes at which x is equal to one. 780 00:39:48,950 --> 00:39:50,100 There's four outcomes. 781 00:39:50,100 --> 00:39:54,180 Each one has probability 1/16, so the answer is 4/16. 782 00:39:54,180 --> 00:39:56,760 783 00:39:56,760 --> 00:39:57,820 OK. 784 00:39:57,820 --> 00:40:06,482 How about the probability that x plus y is odd? 785 00:40:06,482 --> 00:40:07,100 OK. 786 00:40:07,100 --> 00:40:09,840 That will take a little bit more work. 787 00:40:09,840 --> 00:40:12,910 But you go to the sample space and you identify all the 788 00:40:12,910 --> 00:40:16,010 outcomes at which the sum is an odd number. 789 00:40:16,010 --> 00:40:20,930 So that's a place where the sum is odd, these are other 790 00:40:20,930 --> 00:40:27,570 places, and I guess that exhausts all the possible 791 00:40:27,570 --> 00:40:31,780 outcomes at which we have an odd sum. 792 00:40:31,780 --> 00:40:32,890 We count them. 
793 00:40:32,890 --> 00:40:34,030 How many are there? 794 00:40:34,030 --> 00:40:35,540 There's a total of eight of them. 795 00:40:35,540 --> 00:40:40,490 Each one has probability 1/16, so the total probability is 8/16. 796 00:40:40,490 --> 00:40:41,620 A harder question: 797 00:40:41,620 --> 00:40:44,310 What is the probability that the minimum of the two rolls 798 00:40:44,310 --> 00:40:45,820 is equal to 2? 799 00:40:45,820 --> 00:40:48,710 This is something that you probably couldn't do in your 800 00:40:48,710 --> 00:40:51,640 head without the help of a diagram. 801 00:40:51,640 --> 00:40:54,780 But once you have a diagram, things are simple. 802 00:40:54,780 --> 00:40:55,760 You ask the question. 803 00:40:55,760 --> 00:40:59,710 OK, this is an event, that the minimum of the two rolls is 804 00:40:59,710 --> 00:41:01,140 equal to two. 805 00:41:01,140 --> 00:41:03,150 This can happen in several ways. 806 00:41:03,150 --> 00:41:05,250 What are the several ways that it can happen? 807 00:41:05,250 --> 00:41:07,980 Go to the diagram and try to identify them. 808 00:41:07,980 --> 00:41:11,620 So the minimum is equal to two if both of them are twos. 809 00:41:11,620 --> 00:41:14,230 810 00:41:14,230 --> 00:41:18,780 Or it could be that x is two and y is bigger, or y is two 811 00:41:18,780 --> 00:41:21,900 and x is bigger. 812 00:41:21,900 --> 00:41:23,150 OK. 813 00:41:23,150 --> 00:41:29,210 I guess we rediscover that yellow and blue make green, so 814 00:41:29,210 --> 00:41:31,910 we see here that there's a total of 815 00:41:31,910 --> 00:41:34,630 five possible outcomes. 816 00:41:34,630 --> 00:41:37,645 The probability of this event is 5/16. 817 00:41:37,645 --> 00:41:41,250 818 00:41:41,250 --> 00:41:47,460 It's a simple example, but the procedure that we followed in 819 00:41:47,460 --> 00:41:52,490 this example actually applies to any probability model you 820 00:41:52,490 --> 00:41:54,240 might ever encounter.
821 00:41:54,240 --> 00:41:57,720 You set up your sample space, you make a statement that 822 00:41:57,720 --> 00:42:00,710 describes the probability law over that sample space, then 823 00:42:00,710 --> 00:42:03,640 somebody asks you questions about various events. 824 00:42:03,640 --> 00:42:07,300 You go to your pictures, identify those events, pin 825 00:42:07,300 --> 00:42:11,410 them down, and then start kind of counting and calculating 826 00:42:11,410 --> 00:42:14,370 the total probability for those outcomes that you're 827 00:42:14,370 --> 00:42:16,560 considering. 828 00:42:16,560 --> 00:42:20,180 This example is a special case of what is called the discrete 829 00:42:20,180 --> 00:42:22,780 uniform law. 830 00:42:22,780 --> 00:42:26,500 The model obeys the discrete uniform law if all outcomes 831 00:42:26,500 --> 00:42:28,340 are equally likely. 832 00:42:28,340 --> 00:42:30,040 It doesn't have to be that way. 833 00:42:30,040 --> 00:42:33,290 That's just one example of a probability law. 834 00:42:33,290 --> 00:42:36,760 But when things are that way, if all outcomes are equally 835 00:42:36,760 --> 00:42:45,960 likely and we have N of them, and you have a set A that has 836 00:42:45,960 --> 00:42:51,150 little n elements, then each one of those elements has 837 00:42:51,150 --> 00:42:54,460 probability one over capital N since all 838 00:42:54,460 --> 00:42:56,450 outcomes are equally likely. 839 00:42:56,450 --> 00:42:58,980 And for our probabilities to add up to one, each one must 840 00:42:58,980 --> 00:43:02,620 have this much probability, and there's little n elements. 841 00:43:02,620 --> 00:43:06,120 That gives you the probability of the event of interest. 842 00:43:06,120 --> 00:43:09,020 So problems like the one in the previous slide and more 843 00:43:09,020 --> 00:43:11,560 generally of the type described here under discrete 844 00:43:11,560 --> 00:43:15,270 uniform law, these problems reduce to just counting. 
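Under the discrete uniform law, the die questions from this example reduce to counting, so they can be reproduced by brute-force enumeration. Here is a minimal sketch, assuming the same setup as the 16-outcome tree (two rolls of a four-sided die with faces 1 through 4). `uniform_prob` is a hypothetical helper name, and each event is described by a predicate on the pair (x, y). Note that Python's `Fraction` reduces automatically, so 2/16 prints as 1/8.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (x, y) from two rolls of a four-sided die.
omega = list(product(range(1, 5), repeat=2))  # 16 equally likely outcomes

def uniform_prob(predicate):
    """Discrete uniform law: P(A) = (outcomes in A) / (outcomes in omega)."""
    favorable = sum(1 for outcome in omega if predicate(outcome))
    return Fraction(favorable, len(omega))

p_11_or_12 = uniform_prob(lambda o: o in {(1, 1), (1, 2)})   # event {(1,1), (1,2)}
p_x_is_1 = uniform_prob(lambda o: o[0] == 1)                 # first roll is 1
p_sum_odd = uniform_prob(lambda o: (o[0] + o[1]) % 2 == 1)   # x + y is odd
p_min_is_2 = uniform_prob(lambda o: min(o) == 2)             # min(x, y) = 2

print(p_11_or_12, p_x_is_1, p_sum_odd, p_min_is_2)  # 1/8 1/4 1/2 5/16
```

These match the lecture's 2/16, 4/16, 8/16, and 5/16. Enumeration works here only because the sample space is small and all outcomes are equally likely. For larger problems, the systematic counting techniques mentioned in the lecture replace the brute-force loop.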
845 00:43:15,270 --> 00:43:17,500 How many elements are there in my sample space? 846 00:43:17,500 --> 00:43:21,160 How many elements are there inside the event of interest? 847 00:43:21,160 --> 00:43:24,520 Counting is generally simple, but for some problems it gets 848 00:43:24,520 --> 00:43:25,950 pretty complicated. 849 00:43:25,950 --> 00:43:28,980 And in a couple of weeks, we're going to have to spend 850 00:43:28,980 --> 00:43:31,820 a whole lecture just on the subject of how to count 851 00:43:31,820 --> 00:43:33,280 systematically. 852 00:43:33,280 --> 00:43:37,070 Now the procedure we followed in the previous example is the 853 00:43:37,070 --> 00:43:39,950 same as the procedure you would follow in continuous 854 00:43:39,950 --> 00:43:41,330 probability problems. 855 00:43:41,330 --> 00:43:44,200 So, going back to our dart problem, we get a random 856 00:43:44,200 --> 00:43:46,550 point inside the square. 857 00:43:46,550 --> 00:43:48,030 That's our sample space. 858 00:43:48,030 --> 00:43:50,360 We need to assign a probability law. 859 00:43:50,360 --> 00:43:53,550 For lack of imagination, I'm taking the probability law to 860 00:43:53,550 --> 00:43:56,280 be the area of a subset. 861 00:43:56,280 --> 00:44:00,990 So if we have two subsets of the sample space that have 862 00:44:00,990 --> 00:44:05,000 equal areas, then I'm postulating that they are 863 00:44:05,000 --> 00:44:06,560 equally likely to occur. 864 00:44:06,560 --> 00:44:08,490 The probability that the dart falls here is the same as the 865 00:44:08,490 --> 00:44:11,430 probability that it falls there. 866 00:44:11,430 --> 00:44:13,670 The model doesn't have to be that way. 867 00:44:13,670 --> 00:44:16,720 But if I have sort of complete ignorance of which points are 868 00:44:16,720 --> 00:44:19,310 more likely than others, that might be a 869 00:44:19,310 --> 00:44:21,430 reasonable model to use. 870 00:44:21,430 --> 00:44:24,680 So equal areas mean equal probabilities.
871 00:44:24,680 --> 00:44:27,470 If the area is twice as large, the probability is going to be 872 00:44:27,470 --> 00:44:28,830 twice as big. 873 00:44:28,830 --> 00:44:32,130 So this is our model. 874 00:44:32,130 --> 00:44:34,580 We can now answer questions. 875 00:44:34,580 --> 00:44:35,730 Let's start with an easy one. 876 00:44:35,730 --> 00:44:38,070 What's the probability that the outcome is 877 00:44:38,070 --> 00:44:40,660 exactly this point? 878 00:44:40,660 --> 00:44:47,500 That, of course, is zero, because a single point has zero area. 879 00:44:47,500 --> 00:44:50,190 And since probability is equal to area, that's zero 880 00:44:50,190 --> 00:44:51,510 probability. 881 00:44:51,510 --> 00:44:55,940 How about the probability that the sum of the coordinates of 882 00:44:55,940 --> 00:45:00,090 the point that we got is less than or equal to 1/2? 883 00:45:00,090 --> 00:45:01,570 How do you deal with it? 884 00:45:01,570 --> 00:45:04,770 Well, you look at the picture again, at your sample space, 885 00:45:04,770 --> 00:45:08,130 and try to describe the event that you're talking about. 886 00:45:08,130 --> 00:45:12,210 The sum being less than or equal to 1/2 corresponds to getting an 887 00:45:12,210 --> 00:45:16,060 outcome that's on or below this line, where this line is the 888 00:45:16,060 --> 00:45:19,600 line where x plus y equals 1/2. 889 00:45:19,600 --> 00:45:25,860 So the intercepts of that line with the axes are 1/2 and 1/2. 890 00:45:25,860 --> 00:45:29,730 So you describe the event visually and then you use your 891 00:45:29,730 --> 00:45:30,780 probability law. 892 00:45:30,780 --> 00:45:33,260 The probability law that we have is that the probability 893 00:45:33,260 --> 00:45:36,620 of a set is equal to the area of that set. 894 00:45:36,620 --> 00:45:39,900 So all we need to find is the area of this triangle, which 895 00:45:39,900 --> 00:45:48,915 is one half times base 1/2 times height 1/2, which equals 1/8. 896 00:45:48,915 --> 00:45:49,380 OK.
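The dart calculation can be checked with a short Python sketch. It assumes the sample space is the unit square [0, 1] x [0, 1], so the total area is 1 and area can serve directly as probability. The helper `estimate_prob` is hypothetical. The sketch computes the exact triangle area with fractions and compares it with a Monte Carlo estimate: it throws uniformly random points and counts how many land on or below the line x + y = 1/2.

```python
import random
from fractions import Fraction

# Exact answer: the event {x + y <= 1/2} inside the unit square is a right
# triangle with both legs equal to 1/2, so its area is 1/2 * base * height.
exact = Fraction(1, 2) * Fraction(1, 2) * Fraction(1, 2)


def estimate_prob(num_darts, seed=0):
    """Monte Carlo estimate of P(x + y <= 1/2) for a uniform point in [0, 1]^2."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    hits = 0
    for _ in range(num_darts):
        x, y = rng.random(), rng.random()  # one uniformly random dart
        if x + y <= 0.5:
            hits += 1
    return hits / num_darts


print(exact)                   # 1/8
print(estimate_prob(200_000))  # approximately 0.125
```

The simulated fraction is only an approximation that tightens as the number of darts grows. The exact answer comes from the area argument in the lecture.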
897 00:45:49,380 --> 00:45:52,620 The moral from these two examples is that it's always useful to 898 00:45:52,620 --> 00:45:56,750 have a picture and work with a picture to visualize the 899 00:45:56,750 --> 00:45:58,750 events that you're talking about. 900 00:45:58,750 --> 00:46:01,340 And once you have a probability law in your hands, 901 00:46:01,340 --> 00:46:04,470 then it's a matter of calculation to find the 902 00:46:04,470 --> 00:46:06,540 probability of an event of interest. 903 00:46:06,540 --> 00:46:09,080 The calculations we did in these two examples, of course, 904 00:46:09,080 --> 00:46:10,130 were very simple. 905 00:46:10,130 --> 00:46:14,510 Sometimes calculations may be a lot harder, but that's a 906 00:46:14,510 --> 00:46:15,480 different business. 907 00:46:15,480 --> 00:46:19,250 It's a business of calculus, for example, or being good at 908 00:46:19,250 --> 00:46:20,250 algebra and so on. 909 00:46:20,250 --> 00:46:24,240 As far as probability is concerned, it's clear what you 910 00:46:24,240 --> 00:46:27,110 will be doing, and then maybe you're faced with a harder 911 00:46:27,110 --> 00:46:30,540 algebraic part to actually carry out the calculations. 912 00:46:30,540 --> 00:46:32,870 The area of a triangle is easy to compute. 913 00:46:32,870 --> 00:46:36,030 If I had put down a very complicated shape, then you 914 00:46:36,030 --> 00:46:39,300 might need to solve a hard integration problem to find 915 00:46:39,300 --> 00:46:42,190 the area of that shape, but that's stuff that belongs to 916 00:46:42,190 --> 00:46:46,306 another class that you have presumably mastered by now. 917 00:46:46,306 --> 00:46:47,000 Good, OK. 918 00:46:47,000 --> 00:46:49,730 So now let me spend just a couple of minutes to return to 919 00:46:49,730 --> 00:46:52,170 a point that I raised before. 920 00:46:52,170 --> 00:46:56,270 I was saying that the axiom that we had about additivity 921 00:46:56,270 --> 00:46:58,730 might not quite be enough.
922 00:46:58,730 --> 00:47:01,730 Let's illustrate what I mean by the following example. 923 00:47:01,730 --> 00:47:04,960 Think of the experiment where you keep flipping a coin and 924 00:47:04,960 --> 00:47:08,120 you wait until you obtain heads for the first time. 925 00:47:08,120 --> 00:47:11,390 What's the sample space of this experiment? 926 00:47:11,390 --> 00:47:13,730 It might happen the first flip, it might happen in the 927 00:47:13,730 --> 00:47:14,700 tenth flip. 928 00:47:14,700 --> 00:47:18,490 Heads for the first time might occur in the millionth flip. 929 00:47:18,490 --> 00:47:21,070 So the outcome of this experiment is going to be an 930 00:47:21,070 --> 00:47:23,820 integer and there's no bound to that integer. 931 00:47:23,820 --> 00:47:26,780 You might have to wait very much until that happens. 932 00:47:26,780 --> 00:47:29,020 So the natural sample space is the set of 933 00:47:29,020 --> 00:47:30,950 all possible integers. 934 00:47:30,950 --> 00:47:35,030 Somebody tells you some information about the 935 00:47:35,030 --> 00:47:36,250 probability law. 936 00:47:36,250 --> 00:47:39,900 The probability that you have to wait for n flips is equal 937 00:47:39,900 --> 00:47:41,130 to two to the minus n. 938 00:47:41,130 --> 00:47:42,850 Where did this come from? 939 00:47:42,850 --> 00:47:44,220 That's a separate story. 940 00:47:44,220 --> 00:47:45,730 Where did it come from? 941 00:47:45,730 --> 00:47:49,840 Somebody tells this to us, and those probabilities are 942 00:47:49,840 --> 00:47:52,150 plotted here as a function of n. 943 00:47:52,150 --> 00:47:54,580 And you're asked to find the probability that the outcome 944 00:47:54,580 --> 00:47:56,660 is an even number. 945 00:47:56,660 --> 00:47:59,920 How do you go about calculating that probability? 946 00:47:59,920 --> 00:48:02,960 So the probability of being an even number is the probability 947 00:48:02,960 --> 00:48:08,380 of the subset that consists of just the even numbers. 
948 00:48:08,380 --> 00:48:11,810 So it would be a subset of this kind, that includes two, 949 00:48:11,810 --> 00:48:13,760 four, and so on. 950 00:48:13,760 --> 00:48:18,270 So any reasonable person would say, well, the probability of 951 00:48:18,270 --> 00:48:22,170 obtaining an outcome that's either two or four or six and 952 00:48:22,170 --> 00:48:25,360 so on is equal to the probability of obtaining a 953 00:48:25,360 --> 00:48:28,370 two, plus the probability of obtaining a four, plus the 954 00:48:28,370 --> 00:48:31,130 probability of obtaining a six, and so on. 955 00:48:31,130 --> 00:48:33,640 These probabilities are given to us. 956 00:48:33,640 --> 00:48:35,990 So here I have to do my algebra. 957 00:48:35,990 --> 00:48:40,840 I add this geometric series, 1/4 plus 1/16 plus 1/64 and so on, and I get an answer of 1/3. 958 00:48:40,840 --> 00:48:43,430 That's what any reasonable person would do. 959 00:48:43,430 --> 00:48:48,290 But the person who only knows the axioms that we posted 960 00:48:48,290 --> 00:48:51,880 just a little earlier may get stuck. 961 00:48:51,880 --> 00:48:53,610 They would get stuck at this point. 962 00:48:53,610 --> 00:48:55,700 How do we justify this? 963 00:48:55,700 --> 00:48:59,000 964 00:48:59,000 --> 00:49:04,010 We had this property for the union of disjoint sets and the 965 00:49:04,010 --> 00:49:07,210 corresponding property that tells us that the total 966 00:49:07,210 --> 00:49:11,620 probability of finitely many outcomes is the sum 967 00:49:11,620 --> 00:49:13,740 of their individual probabilities. 968 00:49:13,740 --> 00:49:17,940 But here we're using it on an infinite collection: 969 00:49:17,940 --> 00:49:23,180 we're saying that the probability of infinitely many points is equal to the 970 00:49:23,180 --> 00:49:26,070 sum of the probabilities of each one of these.
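The geometric-series step can be checked exactly with Python fractions. This sketch takes the given law, P(first heads on flip n) = 2^(-n), at face value; the helper names `p_first_heads_at` and `partial_sum_even` are hypothetical. It shows that the finite partial sums P(2) + P(4) + ... + P(2k) climb toward 1/3 without reaching it for any finite k.

```python
from fractions import Fraction


def p_first_heads_at(n):
    """Probability law given in the lecture: P(first heads on flip n) = 2**(-n)."""
    return Fraction(1, 2 ** n)


def partial_sum_even(terms):
    """Exact value of P(2) + P(4) + ... + P(2 * terms)."""
    return sum(p_first_heads_at(2 * k) for k in range(1, terms + 1))


# Closed form of the geometric series 1/4 + 1/16 + 1/64 + ... = (1/4) / (1 - 1/4).
closed_form = Fraction(1, 4) / (1 - Fraction(1, 4))
print(closed_form)  # 1/3

# Partial sums approach 1/3 from below as more terms are added.
for terms in (1, 2, 5, 10):
    print(terms, float(partial_sum_even(terms)))
```

Finite additivity accounts for each partial sum, since each one is the probability of a finite union of disjoint single-outcome events. Equating the probability of the whole infinite event with the limit 1/3 is the step the lecture attributes to the countable additivity axiom.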
971 00:49:26,070 --> 00:49:30,190 To justify this step, we need to introduce one additional 972 00:49:30,190 --> 00:49:34,180 rule, an additional axiom, that tells us that this step 973 00:49:34,180 --> 00:49:36,160 is actually legitimate. 974 00:49:36,160 --> 00:49:39,540 And this is the countable additivity axiom, which is a 975 00:49:39,540 --> 00:49:42,780 little stronger, or quite a bit stronger, than the 976 00:49:42,780 --> 00:49:45,140 additivity axiom we had before. 977 00:49:45,140 --> 00:49:49,210 It tells us that if we have a sequence of sets that are 978 00:49:49,210 --> 00:49:54,190 disjoint and we want to find their total probability, then 979 00:49:54,190 --> 00:49:58,230 we are allowed to add their individual probabilities. 980 00:49:58,230 --> 00:50:01,000 So the picture might be as follows. 981 00:50:01,000 --> 00:50:07,420 We have a sequence of sets, A1, A2, A3, and so on. 982 00:50:07,420 --> 00:50:10,110 I guess in order to fit them inside the sample space, the 983 00:50:10,110 --> 00:50:13,920 sets need to get smaller and smaller perhaps. 984 00:50:13,920 --> 00:50:15,340 They are disjoint. 985 00:50:15,340 --> 00:50:17,330 We have a sequence of such sets. 986 00:50:17,330 --> 00:50:21,340 The total probability of falling anywhere inside one of 987 00:50:21,340 --> 00:50:25,740 those sets is the sum of their individual probabilities. 988 00:50:25,740 --> 00:50:30,150 A key subtlety that's involved here is that we're talking 989 00:50:30,150 --> 00:50:33,710 about a sequence of events. 990 00:50:33,710 --> 00:50:36,560 By "sequence" we mean that these events can 991 00:50:36,560 --> 00:50:38,450 be arranged in order. 992 00:50:38,450 --> 00:50:41,780 I can tell you the first event, the second event, the 993 00:50:41,780 --> 00:50:43,530 third event, and so on.
994 00:50:43,530 --> 00:50:46,320 So if you have such a collection of events that can 995 00:50:46,320 --> 00:50:50,690 be ordered as first, second, third, and so on, then you can 996 00:50:50,690 --> 00:50:54,040 add their probabilities to find the 997 00:50:54,040 --> 00:50:55,790 probability of their union. 998 00:50:55,790 --> 00:50:58,230 This issue is actually a little more subtle than you 999 00:50:58,230 --> 00:51:00,730 might appreciate right now, and I'm going to return 1000 00:51:00,730 --> 00:51:04,010 to it at the beginning of the next lecture. 1001 00:51:04,010 --> 00:51:07,160 For now, enjoy the first week of classes 1002 00:51:07,160 --> 00:51:09,380 and have a good weekend. 1003 00:51:09,380 --> 00:51:10,630 Thank you. 1004 00:51:10,630 --> 00:51:11,230
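In symbols, the countable additivity axiom described in this part of the lecture, and its use in the even-number coin example, might be written as follows. The notation is assumed here, not copied from the slides.

```latex
% Countable additivity: if A_1, A_2, A_3, ... is a sequence of disjoint events, then
\[
  \mathbf{P}(A_1 \cup A_2 \cup A_3 \cup \cdots)
  = \mathbf{P}(A_1) + \mathbf{P}(A_2) + \mathbf{P}(A_3) + \cdots
\]
% Coin example: take A_k = \{2k\}, with \mathbf{P}(\{n\}) = 2^{-n}
\[
  \mathbf{P}(\{2, 4, 6, \ldots\}) = \sum_{k=1}^{\infty} 2^{-2k}
  = \frac{1/4}{1 - 1/4} = \frac{1}{3}
\]
```

The requirement that the events form a sequence, indexed by k = 1, 2, 3, ..., is exactly the "can be arranged in order" condition stressed in the lecture.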