The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: Today I'm starting a new topic, and that's always the occasion for putting things into perspective. Keep in mind what we were trying to do in the subject. We were trying to introduce several intellectual themes. The first, and absolutely the most important, is how do you design a complex system? We think that's very important because there's absolutely no way this department could exist the way it does, making things like that, hooking up internets and so forth. Those are truly complex systems. And if you didn't have an organized way of thinking about complexity, they're hopeless. So the kinds of things we're interested to teach you about are just hopeless if you can't get a handle on complexity. So that's by far the most important thing that we've been thinking about. We've been interested in modeling and controlling physical systems.
I hope you remember the way we chased the robot around the lab; that was the point there. We've thought about augmenting physical systems by adding computation; I hope you've got a feel for that. And we're going to start today thinking about how do you build systems that are robust.

So just in review, so far -- you've already seen most of this -- so far we've taught you about abstraction, hierarchy, and controlling complexity, starting primarily by thinking about software engineering, because that's such a good pedagogical place to start. We introduced the idea of PCAP, and that has continued throughout the rest of the subject. Then we worried about how do you control things. We developed ways of modeling so that you could predict the outcome before you actually built the system. That's crucial. You can't afford to build prototypes for everything; it's just not economical. And so this was an exercise in making models, figuring out how behaviors relate to the models, and trying to get the design done in the modeling stage rather than in the prototyping stage. And you built circuits.
This had to do with how you augment a system with new capabilities, either hardware or software. Today what I want to start to think about is, how do you deal with uncertainty? And how do you deal with things that are much more complicated to plan?

So the things that we will do in this segment are things like mapping. What if we gave you a maze -- you, the robot. What if we gave the robot a maze and didn't tell it the structure of the maze? How would it discover the structure? How would it make a map? How would it localize? What if you had a maze -- to make it simple, let's say that I tell you what the maze is. But you wake up -- you're the robot, you wake up, you have no idea where you are. What do you do? How do you figure out where you are? That's a problem we call localization. And then planning. What if you have a really complicated objective? What are the step-by-step things that you could do to get there? Those are the kinds of things we're going to do, and here's a typical kind of problem.
Let's say that the robot starts someplace, and say that it has something in it that lets it know where it is, like GPS. And it knows where it wants to go. Making a plan is not very difficult, right? I'm here and I want to go there; connect with a straight line. And that's what I've done here. The problem is that, unbeknownst to the robot, that path doesn't really work. So on the first step, he thinks he's going to go from here to here in a straight line. The blue represents the path that the robot would like to take, but then on the first step the sonars report that they hit walls. And those show up as the black marks over here. So already it can see that it's not going to be able to do what it wants to do. So it starts to turn, and it finds even more places that don't work. Try again. Try again. Notice that the plan now is, well, I don't know what's going on here, but I certainly can't go through there, so I'm going to have to go around it. Keep trying. Keep trying. Notice the plan.
So he's always making a plan that sort of makes sense. He's using for each plan the information about the walls that he's already figured out. And now he's figured out, well, that didn't work. So now backtrack, try to get out of here.

AUDIENCE: Is he backtracking right now? Or is he--

PROFESSOR: Well, he's going forward. He's making a forward plan. He's saying, OK, now I know all these walls are here, and I'm way down in this corner; how do I get on the other side of that wall? Well, given the information that I know, I'm going to have to go around the known walls.

So my point of showing you this is severalfold. First off, it's uncertain. You didn't know at the outset just how bad the problem was. So there's no way to kind of pre-plan for all of this. Secondly, it's a really hard problem. If you were to think about structuring a program to solve that problem in a kind of high school programming sense -- if this happens then do this, if this happens then do this -- you would have a lot of if statements, right? That's just not the way to do this.
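The plan-move-replan loop the robot is running can be sketched in a few lines. This is only an illustration, not the course's actual planner: the grid world, the `bfs_path` helper, and the sensing model (the robot learns about a wall only when its next step would hit one) are all simplifying assumptions made for the example.

```python
from collections import deque

def bfs_path(start, goal, walls, size):
    """Shortest path on a size-by-size grid avoiding the known walls (BFS)."""
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == goal:
            return path
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < size and 0 <= ny < size \
                    and (nx, ny) not in walls and (nx, ny) not in seen:
                seen.add((nx, ny))
                frontier.append(((nx, ny), path + [(nx, ny)]))
    return None  # no route given what we currently know

def navigate(start, goal, true_walls, size=5):
    """Plan with the walls known so far; replan whenever a step hits a new wall."""
    pos, known_walls = start, set()
    while pos != goal:
        plan = bfs_path(pos, goal, known_walls, size)
        if plan is None:
            return None  # goal unreachable
        nxt = plan[1]
        if nxt in true_walls:    # sonar reports a wall: record it and replan
            known_walls.add(nxt)
        else:                    # the step succeeds
            pos = nxt
    return pos

# A wall segment blocks the straight-line route from (0, 0) to (4, 4),
# so the robot discovers it piece by piece and detours around it.
print(navigate((0, 0), (4, 4), {(2, 0), (2, 1), (2, 2), (2, 3)}))
```

The point of the sketch is that there are no hand-written if statements for particular wall layouts: one generic planner, rerun every time new information arrives, handles whatever the world turns out to contain.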
So what we're going to learn to do in this module is think through much more complicated plans. We're going to be looking at the kinds of plans shown here that just are not practical for: do this until this happens, and then do this until this happens, and then while that's going on, do this. It's just not going to be practical; that's the idea.

So the very first element, the thing that we have to get on top of first, is how to think about uncertainty. And there's a theory for that, and the theory is actually simple, except that it's mind-bogglingly weird -- nobody can get their head around it the first time they see it. It's called probability theory. As you'll see in a minute, the rules are completely trivial. You'll have no trouble with the basic rules. What you will have trouble with -- unless you're a lot different from most people -- is that the first time you see this theory, it's very hard to imagine exactly what's going on. And it's extremely difficult to have an intuition for what's going on. So the theory is going to give us a framework, then, for thinking about uncertainty.
In particular, uncertainty sounds uncertain. What we would like to do is make precise statements about uncertain situations. Sounds contradictory, but we'll do several examples in lecture, and then you'll do a lot more examples in the next week, so that you learn exactly what that means. We would like to draw reliable inferences from unreliable observations. OK, you have a lot of experience with unreliable observations, right? The sonars don't tell you the same thing each time. That's what we'd like to deal with. We would like to be able to take a bunch of different, individually not all that reliable, observations, and come up with a conclusion that's a lot more reliable than any particular observation. And when we're all done with that, what we'd like to do is use this theory to help us design robust systems. Systems that are not fragile. Systems that are not thrown off track by a small feature that was not part of the original formulation of the problem. So that's the goal. And what I'd like to do is start by motivating it with the kind of practical thing to get you thinking.
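The claim that many unreliable observations can support a reliable conclusion can be made concrete with a small calculation. The setup below is my own illustration, not something from the lecture: assume each of n independent readings is correct with probability 0.7, and take the majority answer.

```python
from math import comb

def majority_correct(n, p):
    """Probability that a majority of n independent readings, each correct
    with probability p, gives the right answer (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# One reading is right 70% of the time; the majority of 11 is right ~92%.
for n in (1, 3, 11):
    print(n, round(majority_correct(n, 0.7), 3))
```

Each individual observation stays just as noisy; it is the combination that becomes reliable, which is exactly the kind of precise statement about an uncertain situation the theory is for.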
So here's the game, Let's Make a Deal. I'm going to put 4 LEGO bricks in a bag. OK. LEGO bricks, you've seen those, probably. Bag. The LEGO bricks are white or red. There's only going to be 4, and you're not going to know how many of each there are. Then you get to pull one LEGO brick out, and if you pull a red one out, I'll give you $20. The hitch is, you have to pay me to play this game. So the question is, how much are you willing to pay me to play the game? So, I need a volunteer. I need somebody to take 4 LEGOs and not let me see. OK, please, please. I want you to put in 4 LEGOs, only four. They can be white or red. If you have LEGOs in your pockets that are a different color, don't use them. You're allowed to know what the answer is, but you're not allowed to tell me, or them. So OK, well, come over here. So: bag, LEGOs, hide, put some number in. Oh, no, no, no. Wait, wait, wait. Put them back, put them back. I'm not supposed to see either. OK, I'll go away. OK, 4.
OK, so we'll close the bag, right? And I'll call you back later, but it'll be nearer the end of the hour. So here's 4 LEGOs -- sort of sounds like 4 LEGOs; it's more than one. OK, so how much would you be willing to pay me to play the game?

AUDIENCE: $5.

PROFESSOR: $5. Can I get more? I want to make money. Can I get a higher bid? More than $5.

AUDIENCE: $9.90.

PROFESSOR: How much?

AUDIENCE: $9.90.

PROFESSOR: $9.90, very interesting. Can I get more than $9.90?

AUDIENCE: $9.99 and a half.

PROFESSOR: $9.99 and a half? Magic number.

AUDIENCE: $10.

PROFESSOR: $10. Can I hear even a penny more? A penny more?

AUDIENCE: I'll offer a penny more. You just have to go to the bag.

PROFESSOR: I thought we were being very careful and letting them not know.

AUDIENCE: No, no, no. Aren't you going to put 4 white blocks in all the time?

PROFESSOR: I didn't do it. That person did it.
It wasn't me. I'm innocent. I'm completely fair. Yeah?

AUDIENCE: Are we imagining that you are equally likely to put any number of blocks in? So, are we able to say that she's more likely to put in all white? Because that changes how you calculate it.

PROFESSOR: OK, that's an interesting question. We need a model of a person. That's tricky. OK, I have another idea. Two more volunteers. OK, volunteer, volunteer. Here's the experiment: one person will hold the bag up high, so that the other person can't see it, and the other person -- I didn't look in; notice I'm being very careful, I'm very honest, right? Except for the X-ray vision, which you don't know about. Everything is completely fair. And the little window in the back -- you don't know about that either. So, one person holds it up so the other person can't see in; the other person grabs a LEGO, pulls it out, and lets everybody see that LEGO. It was intended to make it hard to see in. OK, a red one.
OK, that's fine, so we're done.

AUDIENCE: We each should get $20, right?

PROFESSOR: No, no, no. This was a different part of the bet. No, no, no, no, no. Thank you, thank you. Now how much would you pay me to play the game?

AUDIENCE: With that one out?

PROFESSOR: No, we'll put that one back. OK, so this one came out, it was red. Now, without looking, I'm going to stick it back in. OK, so we pulled it out. So what do we know? We know there's at least 1 red. OK, now what are you willing to pay to play the game?

AUDIENCE: $5.

PROFESSOR: $5. Yes? $5.

AUDIENCE: $4.99.

PROFESSOR: $4.99? Wait a minute. Should you be willing to pay more or less? I got it up to $10. Should you be willing to pay more or less now?

AUDIENCE: More.

PROFESSOR: More, why? The same. More. The same.

AUDIENCE: You're assured that there's at least 1 red block.
PROFESSOR: I know that there's at least 1, but didn't I know that before? No. That first person could have loaded it, because I was giving her a cut. I didn't talk about this before. This is not a set-up.

So I want a vote. How many people would give me less than $10? I'm going to give you [UNINTELLIGIBLE] first. 10 to 12. Let's see, 13 to 15, 16 to 18, more than 18. So how many people would give me -- you're only allowed to vote once. Keep in mind that I'm more likely to choose you if you vote high. Right? Vote high. So how many people would give me less than $10 to play the game? A lot, I would say 20%. How many people would give me between $10 and $12? A lot smaller, 5%. How many people would give me between $13 and $15? Even smaller, 2%. How many people would give me between $16 and $18? Wait, these numbers are not going to add up to 100%. OK, we'll learn the theory for how to normalize things in a minute.
OK, so we're down to about 1%. How many people would give me more than $18? One person. Thank you, thank you. So that's 1 in 200, or 0.5%. OK, so what I'd like to do now is go through the theory that's going to let us make a precise calculation for how much a rational person -- not to say that you're not rational -- but how much a rational person might be willing to pay. So that was the set-up; then we'll do the theory; then we'll come back at the end of the hour and see how many people I would have gypped -- made money off of, or whatever.

OK, so we're going to think about probability. And the first idea that we need is set theory, because we're going to think about experiments having outcomes, and we're going to talk about the outcomes being an event. An event is any describable outcome from an experiment. So for example, what if the experiment were to flip 3 coins in sequence. An event could be head, head, head. And you could talk about: was the outcome head, head, head? The event could be head, tail, head. The event could be 1 head and 2 tails.
The event could be: the first toss was a head. So the idea is, there are sets that we're thinking about. And we're going to think about events as possible outcomes being members of sets. There's going to be a special kind of event that we're especially interested in, and that is an atomic event. By which we mean finest grain. Finest grain is kind of an amorphous idea. What it really means is: for the experiment at hand, it doesn't seem to make sense to try to slice the outcome into two smaller units. You keep slicing them down until slicing them into a smaller unit won't affect the outcome. So for example, in the coin toss experiment, I might think that there are 8 atomic events: head, head, head; head, head, tail; head, tail, head; head, tail, tail; blah, blah, blah. So I've ignored some things like, it took 3 minutes to do the first flip, and it took 2 minutes to do the second one. Right? That's the art of figuring out what the atomic units are. So for the class of problems that I'm thinking about, those things can be ignored, so I'm not counting them. But that's an art; that's not really a science.
So you sort of have to use good judgment when you try to figure out what the atomic events are for a particular experiment. Atomic events always have several properties. They are always mutually exclusive: if I know the outcome was atomic event 3, then I know for sure that it was not atomic event 4. And you can see that these events up here don't have those properties, right? So the first toss -- here's an event, head, head, head, which is not mutually exclusive with "the first toss was a head." So atomic events have to be mutually exclusive. Furthermore, if you list all of the atomic events, that set has to be collectively exhaustive. Collectively exhaustive? What buzzwords! OK, that means that you've exhausted all possibilities when you've accounted for the collective behaviors of all the atomic events. And we have a very special name for that, because it comes up over, and over, and over again. The set of atomic events, the maximum set of atomic events, is called the sample space.
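Both properties can be checked mechanically for the 3-coin experiment. A minimal sketch (representing each atomic event as a tuple of 'H'/'T' is just one convenient choice, not anything the lecture prescribes):

```python
from itertools import product

# Sample space: all 8 atomic events for flipping 3 coins in sequence.
sample_space = set(product('HT', repeat=3))
print(sorted(sample_space))

# Atomic events are mutually exclusive by construction (distinct tuples),
# and collectively exhaustive: every possible 3-flip outcome is listed.
assert len(sample_space) == 2 ** 3

# Non-atomic events are just subsets of the sample space.
first_toss_head = {o for o in sample_space if o[0] == 'H'}
one_head_two_tails = {o for o in sample_space if o.count('H') == 1}

# "Head, head, head" is NOT mutually exclusive with "first toss was a head":
assert ('H', 'H', 'H') in first_toss_head
```

Note that "it took 3 minutes to do the first flip" simply has no representation here; choosing what the tuples record is exactly the modeling art described above.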
So the first thing we need to know, when we're thinking about probability theory, is how to chunk outcomes into a sample space. The second thing we need to know is the rules of probability. These are the things that are so absurdly simple that everybody who sees them immediately comes to the conclusion that probability theory is trivial; they then don't do anything until the next exam, and then they don't have a clue what we're asking. Because it's subtle; it's more subtle than you might think. Here are the rules. Probabilities are real numbers that are not negative. Pretty easy. Probabilities have the feature that the probability of the sample space is 1. That's really just scaling. That's really just telling me how big all the numbers are. So if I enumerate all the possible atomic events, the probability of having one of those as the outcome of an experiment -- that probability is 1. Doesn't seem like I said much, and I'm already 2/3 of the way through the list. Yes?

AUDIENCE: Doesn't that just mean that something happened?

PROFESSOR: Something happened, yes.
And we are going to say that this certain event has probability 1. All probabilities are real, all probabilities are at least 0, and the probability of the certain event -- written here as the universe, the sample space -- the probability of some element in the sample space is 1. The only one that's terribly interesting is additivity: if the intersection between A and B is empty, the probability of the union is the sum of the probabilities of the individual events. Astonishingly, I'm done. And this doesn't alter the fact that people are still, to this day, doing fundamental research in probability theory. There are many subjects in probability theory, including many highly advanced graduate subjects, all of which derive from these three rules. It's absurd how unintuitive things can be, given such simple beginnings. Just as an idea: you can prove all of the interesting results from probability theory -- you can prove all results from probability theory -- with these three rules, and here's just one example.
458 00:22:22,560 --> 00:22:29,120 If the intersection of A and B were not empty, you can still 459 00:22:29,120 --> 00:22:32,740 compute the probability of the union, it's just more 460 00:22:32,740 --> 00:22:35,550 complicated than if they were empty, if the intersection 461 00:22:35,550 --> 00:22:36,510 were empty. 462 00:22:36,510 --> 00:22:38,820 Generally speaking, the probability of the union of A 463 00:22:38,820 --> 00:22:41,920 and B is the probability of A plus the probability of B, 464 00:22:41,920 --> 00:22:44,560 minus the probability of the intersection. 465 00:22:44,560 --> 00:22:46,890 And you can sort of see why that ought to be true, if you 466 00:22:46,890 --> 00:22:49,540 think about a Venn diagram. 467 00:22:49,540 --> 00:22:54,110 If you think about the odds of having A in the universe-- 468 00:22:54,110 --> 00:22:56,160 the universe is the sample space-- 469 00:22:56,160 --> 00:22:58,620 probability of having some event A, the probability of 470 00:22:58,620 --> 00:23:00,980 having some event B, the probability of their 471 00:23:00,980 --> 00:23:02,530 intersection. 472 00:23:02,530 --> 00:23:05,270 If you were to just add the probability of A and B, you 473 00:23:05,270 --> 00:23:08,600 doubly count the intersection. 474 00:23:08,600 --> 00:23:11,890 You don't want to double count it, you want to count it once. 475 00:23:11,890 --> 00:23:13,150 So you have to subtract one off. 476 00:23:13,150 --> 00:23:15,940 So that's sort of what's going on. 477 00:23:15,940 --> 00:23:19,910 OK, as I said the theory is very simple. 478 00:23:19,910 --> 00:23:23,140 But let's make sure that you've got the basics first. 479 00:23:23,140 --> 00:23:27,670 So experiment, I'm going to roll a fair, 6-sided die. 480 00:23:27,670 --> 00:23:32,010 And I'm going to count as the outcome the number of dots on 481 00:23:32,010 --> 00:23:34,930 the top surface, not surprisingly. 
482 00:23:34,930 --> 00:23:37,990 Find the probability that the roll is odd, and greater than 483 00:23:37,990 --> 00:23:40,020 3. You have 10 seconds. 484 00:23:56,840 --> 00:23:58,050 OK, 10 seconds are up. 485 00:23:58,050 --> 00:23:59,650 What's the answer? (1), (2), (3), (4) or (5)? 486 00:23:59,650 --> 00:24:00,530 Raise your hands. 487 00:24:00,530 --> 00:24:01,510 Excellent, wonderful. 488 00:24:01,510 --> 00:24:03,840 The answer is (1). 489 00:24:03,840 --> 00:24:06,830 The way I want you to think about that is in terms of the 490 00:24:06,830 --> 00:24:09,500 theory that we just generated because it's useful for 491 00:24:09,500 --> 00:24:12,170 developing the answers to more complicated questions. 492 00:24:12,170 --> 00:24:15,620 In terms of the theory, what we will always do, the process 493 00:24:15,620 --> 00:24:21,060 that always works, is enumerate the sample space. 494 00:24:21,060 --> 00:24:21,680 What's that mean? 495 00:24:21,680 --> 00:24:26,790 That means identify all of the atomic events. 496 00:24:26,790 --> 00:24:30,440 The atomic events here are the faces that show: 497 00:24:30,440 --> 00:24:32,970 1, 2, 3, 4, 5, 6. 498 00:24:32,970 --> 00:24:36,140 Enumerate the sample space. 499 00:24:36,140 --> 00:24:40,950 And then find the event of interest. 500 00:24:40,950 --> 00:24:44,220 So here the event was a compound event. 501 00:24:44,220 --> 00:24:46,330 The result is odd and greater than 3. 502 00:24:46,330 --> 00:24:50,950 Odd, well that's 1, 3, 5, shown by the check marks. 503 00:24:50,950 --> 00:24:55,920 Bigger than 3, that's the bottom 3 check marks. 504 00:24:55,920 --> 00:24:57,830 If it's going to be both, then you have to look where there's 505 00:24:57,830 --> 00:25:01,605 overlap, and that only happens for the outcome 5. 506 00:25:01,605 --> 00:25:05,440 Since there's only one, and fair meant that these 507 00:25:05,440 --> 00:25:06,710 probabilities were the same. 
508 00:25:06,710 --> 00:25:09,810 If you think through the fundamental axioms of 509 00:25:09,810 --> 00:25:15,270 probability, if they're equal, they're all non-negative real 510 00:25:15,270 --> 00:25:20,320 numbers, and they sum to 1, then they are all 1/6. 511 00:25:20,320 --> 00:25:23,180 So the answer is 1/6, right? 512 00:25:23,180 --> 00:25:24,430 OK, that was easy. 513 00:25:26,430 --> 00:25:32,880 The rule that is most interesting for us happens, 514 00:25:32,880 --> 00:25:35,930 not surprisingly, to also be the one that people have the 515 00:25:35,930 --> 00:25:38,560 most trouble with. 516 00:25:38,560 --> 00:25:40,980 Not excluding the people who originally 517 00:25:40,980 --> 00:25:42,640 invented the theory. 518 00:25:42,640 --> 00:25:44,600 The theory goes back to Laplace. 519 00:25:44,600 --> 00:25:46,900 A bunch of people back then were absolutely brilliant 520 00:25:46,900 --> 00:25:49,150 mathematicians, and still it took a while to 521 00:25:49,150 --> 00:25:50,480 formulate this rule. 522 00:25:50,480 --> 00:25:52,060 It was formulated by a guy named Bayes. 523 00:25:55,030 --> 00:25:59,390 Bayes' theorem gives us a way to think about conditional 524 00:25:59,390 --> 00:26:01,490 probability. 525 00:26:01,490 --> 00:26:06,825 What if I tell you, in some sample space, B happened? 526 00:26:10,140 --> 00:26:14,270 How should you relabel the probabilities to take that 527 00:26:14,270 --> 00:26:16,770 into account? 528 00:26:16,770 --> 00:26:20,650 Bayes' rule is trivial, it says, if I know B happened, 529 00:26:20,650 --> 00:26:24,080 what is the probability that A occurs, given 530 00:26:24,080 --> 00:26:25,330 that I know B happens? 531 00:26:27,900 --> 00:26:30,630 And the rule is, you find the probability of the 532 00:26:30,630 --> 00:26:31,020 intersection. 533 00:26:31,020 --> 00:26:32,736 AUDIENCE: How do you do that? 534 00:26:32,736 --> 00:26:35,720 PROFESSOR: We'll do some examples. 
535 00:26:35,720 --> 00:26:37,900 So we need to find the probability of the 536 00:26:37,900 --> 00:26:40,200 intersection, and then we have to find the probability of B 537 00:26:40,200 --> 00:26:42,200 occurring, and then we normalize-- 538 00:26:42,200 --> 00:26:44,680 a word I used before, and that's exactly what we need to 539 00:26:44,680 --> 00:26:47,220 do to that distribution-- 540 00:26:47,220 --> 00:26:54,110 we normalize the intersection by the probability of B. 541 00:26:54,110 --> 00:26:57,330 That's an interesting rule. 542 00:26:57,330 --> 00:26:59,310 It's the kind of thing we're going to want to know about. 543 00:26:59,310 --> 00:27:01,450 We're going to want to know-- 544 00:27:01,450 --> 00:27:03,780 OK, I'm a robot. 545 00:27:03,780 --> 00:27:04,360 I'm in a space. 546 00:27:04,360 --> 00:27:06,890 I don't know where I am. 547 00:27:06,890 --> 00:27:09,910 I have some a priori probability idea about where I 548 00:27:09,910 --> 00:27:14,810 am, so I think I'm 1/20 likely to be here, I'm 1/20 likely to 549 00:27:14,810 --> 00:27:17,100 be there, et cetera, et cetera. 550 00:27:17,100 --> 00:27:22,775 And then I find out the sonars told me that I'm 0.03 meters 551 00:27:22,775 --> 00:27:26,760 -- no it can't be that small, 0.72 meters from a wall. 552 00:27:26,760 --> 00:27:32,960 Well, how do I take into account this new information 553 00:27:32,960 --> 00:27:36,640 to update my probabilities for where I might be? 554 00:27:36,640 --> 00:27:40,120 That's what this rule is good for. 555 00:27:40,120 --> 00:27:41,600 So here's a picture. 556 00:27:41,600 --> 00:27:45,060 The way to think about the rule is if I condition on B, 557 00:27:45,060 --> 00:27:51,510 if I tell you B happened, that's equivalent to shrinking 558 00:27:51,510 --> 00:27:54,510 the universe -- 559 00:27:54,510 --> 00:27:56,630 the universe U, the square. 560 00:27:56,630 --> 00:28:00,420 That's everything that can happen. 
561 00:28:00,420 --> 00:28:03,640 Inside the universe, there's this event A and it does not 562 00:28:03,640 --> 00:28:04,890 occupy the entire universe. 563 00:28:07,700 --> 00:28:10,420 There is a fraction of outcomes that belong logically 564 00:28:10,420 --> 00:28:14,240 in not A. OK? 565 00:28:14,240 --> 00:28:19,750 That's the part that's in U but not in A. Similarly with 566 00:28:19,750 --> 00:28:24,840 B. Similarly there's some region, there's some part of 567 00:28:24,840 --> 00:28:28,790 the universe where both A and B occur, the intersection of 568 00:28:28,790 --> 00:28:30,570 the two occurred. 569 00:28:30,570 --> 00:28:37,600 So what Bayes' theorem says is, if I tell you B occurred, 570 00:28:37,600 --> 00:28:41,870 all this part of the universe outside of B is irrelevant. 571 00:28:41,870 --> 00:28:44,410 As far as you're concerned, B's the new universe. 572 00:28:48,400 --> 00:28:53,340 Notice that if B is the new universe, then the 573 00:28:53,340 --> 00:28:54,320 intersection-- 574 00:28:54,320 --> 00:28:56,055 which is the part where A occurred-- 575 00:28:59,800 --> 00:29:06,070 is bigger after the conditioning than it was 576 00:29:06,070 --> 00:29:08,360 before the conditioning. 577 00:29:08,360 --> 00:29:11,060 Before the conditioning the universe was this big, now the 578 00:29:11,060 --> 00:29:13,250 universe is this big. 579 00:29:13,250 --> 00:29:18,410 The universe is smaller, so this region of overlap 580 00:29:18,410 --> 00:29:23,160 occupies a greater part of the new universe. 581 00:29:23,160 --> 00:29:24,460 Is that clear? 582 00:29:24,460 --> 00:29:27,820 So when you condition, you're really making the universe 583 00:29:27,820 --> 00:29:31,890 smaller, and the relative likelihood of things that are 584 00:29:31,890 --> 00:29:34,315 still in the universe seems bigger. 
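[Supplementary sketch, not part of the lecture: the shrink-and-rescale picture can be checked with a few lines of Python. The set-of-outcomes representation and the helper names `prob` and `conditional` are my own, not the lecture's.]

```python
from fractions import Fraction

# Sample space for one roll of a fair 6-sided die: six atomic
# events, equally likely, and their probabilities sum to 1.
sample_space = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability of an event, represented as a set of atomic outcomes."""
    return sum((sample_space[o] for o in event), Fraction(0))

def conditional(event_a, event_b):
    """P(A | B): shrink the universe to B, then rescale by P(B)."""
    return prob(event_a & event_b) / prob(event_b)

odd = {1, 3, 5}
greater_than_3 = {4, 5, 6}
even = {2, 4, 6}

# The compound event from earlier: odd AND greater than 3.
print(prob(odd & greater_than_3))             # 1/6

# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
print(prob(odd | greater_than_3))             # 5/6
print(prob(odd) + prob(greater_than_3)
      - prob(odd & greater_than_3))           # also 5/6

# Conditioning rescales: P(even | greater than 3) = (1/3) / (1/2).
print(conditional(even, greater_than_3))      # 2/3
```

[Note that `conditional` is nothing but a division by P(B): the intersection is unchanged, only the normalization shrinks.]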
585 00:29:37,020 --> 00:29:40,040 So what's the conditional probability of getting a die 586 00:29:40,040 --> 00:29:45,970 roll greater than 3, given that it was odd? 587 00:29:45,970 --> 00:29:48,040 Calculate, you have 30 seconds. 588 00:29:48,040 --> 00:29:49,290 This is three times harder. 589 00:30:23,850 --> 00:30:28,340 OK, what's the probability of getting a die roll greater 590 00:30:28,340 --> 00:30:30,910 than 3, given that the die roll was odd? 591 00:30:30,910 --> 00:30:32,540 Everybody raise your hands. 592 00:30:32,540 --> 00:30:36,980 And it's a landslide, the answer is (2). 593 00:30:36,980 --> 00:30:40,790 You roughly do the same thing we did before, except now the 594 00:30:40,790 --> 00:30:43,140 math is incrementally harder because you 595 00:30:43,140 --> 00:30:45,120 have to do a divide. 596 00:30:45,120 --> 00:30:48,850 So we think about the same two events, the event that it is 597 00:30:48,850 --> 00:30:51,480 odd and the event that it's bigger than 3, and now we ask 598 00:30:51,480 --> 00:30:52,190 the question. 599 00:30:52,190 --> 00:30:56,590 If it were odd, what's the likelihood that it's 600 00:30:56,590 --> 00:30:57,780 greater than 3? 601 00:30:57,780 --> 00:30:59,840 Before I did the conditioning, what was the likelihood that 602 00:30:59,840 --> 00:31:01,090 it was bigger than 3? 603 00:31:03,905 --> 00:31:05,300 AUDIENCE: 1/6 604 00:31:05,300 --> 00:31:07,160 PROFESSOR: Nope. 605 00:31:07,160 --> 00:31:09,356 1/2. 606 00:31:09,356 --> 00:31:12,000 So bigger than 3 is 4, 5, or 6 -- 607 00:31:12,000 --> 00:31:12,810 right? 608 00:31:12,810 --> 00:31:16,190 There are 3 atomic events there. 609 00:31:16,190 --> 00:31:18,300 There are 6 atomic events to start with. 610 00:31:18,300 --> 00:31:19,620 They are equally likely. 611 00:31:19,620 --> 00:31:21,870 So before I did the conditioning, the event of 612 00:31:21,870 --> 00:31:24,960 interest had a probability of 1/2. 
613 00:31:24,960 --> 00:31:28,920 After I do the conditioning, I know that half of the possible 614 00:31:28,920 --> 00:31:30,090 samples didn't happen. 615 00:31:30,090 --> 00:31:33,240 The universe shrank. 616 00:31:33,240 --> 00:31:36,310 Instead of having a sample space with 6, I now have a 617 00:31:36,310 --> 00:31:39,600 sample space with 3. 618 00:31:39,600 --> 00:31:42,550 Similarly the probability law changed. 619 00:31:42,550 --> 00:31:47,480 So now the event of interest is bigger than 3, but bigger 620 00:31:47,480 --> 00:31:51,080 than 3 now only happens once. 621 00:31:51,080 --> 00:31:55,830 So what I need to do is rescale my probabilities. 622 00:31:55,830 --> 00:31:59,070 Remember the scaling rule, one of the fundamental properties 623 00:31:59,070 --> 00:31:59,790 of probability. 624 00:31:59,790 --> 00:32:01,230 The scaling rule said the sum of the 625 00:32:01,230 --> 00:32:03,420 probabilities must be 1. 626 00:32:03,420 --> 00:32:04,660 After I've conditioned, the sum of the 627 00:32:04,660 --> 00:32:07,020 probabilities is a 1/2. 628 00:32:07,020 --> 00:32:08,510 That's not good. 629 00:32:08,510 --> 00:32:11,170 I've got to fix it. 630 00:32:11,170 --> 00:32:18,140 So the way to think about Bayes' rule is, if all I know 631 00:32:18,140 --> 00:32:22,040 is it the universe got smaller, how 632 00:32:22,040 --> 00:32:25,260 should I redo the scaling? 633 00:32:25,260 --> 00:32:31,560 Well if all I've told you is that the answer is odd, then 634 00:32:31,560 --> 00:32:33,980 there are three possibilities. 635 00:32:33,980 --> 00:32:38,750 Before I told you that the answer was odd, they were 636 00:32:38,750 --> 00:32:40,250 equally likely. 637 00:32:40,250 --> 00:32:42,730 After I tell you that they're odd, has it changed the fact 638 00:32:42,730 --> 00:32:45,750 that they're equally likely? 639 00:32:45,750 --> 00:32:46,570 No. 640 00:32:46,570 --> 00:32:51,810 They're still equally likely even under that new condition. 
641 00:32:51,810 --> 00:32:55,470 I haven't changed their individual probabilities. 642 00:32:55,470 --> 00:32:59,750 So they started out equally likely, they're still equally 643 00:32:59,750 --> 00:33:03,310 likely, they just don't sum to 1 anymore. 644 00:33:03,310 --> 00:33:07,950 Bayes' rule says, make them sum to 1. 645 00:33:07,950 --> 00:33:11,300 OK, so the way I make this sum sum to 1 is 646 00:33:11,300 --> 00:33:13,330 to divide by 1/2. 647 00:33:13,330 --> 00:33:16,840 If you divide 1/6 by 1/2, you get 1/3. 648 00:33:16,840 --> 00:33:21,180 Notice that the probability that it's bigger than 3 went 649 00:33:21,180 --> 00:33:24,890 from 1/2 to 1/3. 650 00:33:24,890 --> 00:33:26,140 It got smaller. 651 00:33:29,020 --> 00:33:33,640 It could have gone either way. 652 00:33:33,640 --> 00:33:40,760 So, think about what happens when the world shrinks, when 653 00:33:40,760 --> 00:33:42,140 the universe gets smaller, when I 654 00:33:42,140 --> 00:33:44,940 tell you that B happened. 655 00:33:44,940 --> 00:33:49,510 Well when I tell you that B happened, then I ask you 656 00:33:49,510 --> 00:33:52,890 whether A happened, here I'm showing a picture that in the 657 00:33:52,890 --> 00:33:55,780 original universe A and B sort of covered the 658 00:33:55,780 --> 00:33:57,720 same amount of area. 659 00:33:57,720 --> 00:34:00,000 By which I mean, they're about equally likely. 660 00:34:03,310 --> 00:34:06,003 Before I did the conditioning, the probability of A was about 661 00:34:06,003 --> 00:34:10,480 the same size as the probability of B. What happens 662 00:34:10,480 --> 00:34:12,070 when I condition? 663 00:34:12,070 --> 00:34:19,280 Well, when I condition now the universe is B. But notice the 664 00:34:19,280 --> 00:34:21,320 way I've drawn them, there's very little overlap. 665 00:34:21,320 --> 00:34:27,370 So now when I condition on B, the odds that I'm in A seem 666 00:34:27,370 --> 00:34:28,620 to have gotten smaller. 
667 00:34:31,199 --> 00:34:36,330 Rather than being of equal probability, as I show here, 668 00:34:36,330 --> 00:34:40,370 after the conditioning the relative likelihood of being 669 00:34:40,370 --> 00:34:43,719 in event A is smaller than it used to be. 670 00:34:43,719 --> 00:34:48,260 But that's entirely because of the way I rigged the circles. 671 00:34:48,260 --> 00:34:50,300 I could have rigged the circles to have a large amount 672 00:34:50,300 --> 00:34:51,550 of overlap. 673 00:34:54,360 --> 00:34:58,400 Then when I condition, it seems as though it's 674 00:34:58,400 --> 00:35:04,420 relatively more likely that I'm in the event A. That's 675 00:35:04,420 --> 00:35:06,390 what we mean by the conditioning. 676 00:35:06,390 --> 00:35:11,760 The conditioning can give you un-intuitive insight. 677 00:35:11,760 --> 00:35:15,550 Because when you condition, probabilities can get bigger 678 00:35:15,550 --> 00:35:17,400 or smaller. 679 00:35:17,400 --> 00:35:19,890 And that's something that sort of at a gut level, we all have 680 00:35:19,890 --> 00:35:21,140 trouble dealing with. 681 00:35:23,620 --> 00:35:27,210 OK, so those are the fundamental ideas, right? 682 00:35:27,210 --> 00:35:30,630 We've talked about events. 683 00:35:30,630 --> 00:35:34,910 Three axioms of probability that are completely trivial. 684 00:35:34,910 --> 00:35:43,470 One not-quite-so-trivial rule, which is Bayes' rule. 685 00:35:43,470 --> 00:35:46,280 In order to apply it, there are two more things we need to 686 00:35:46,280 --> 00:35:46,760 talk about. 687 00:35:46,760 --> 00:35:48,736 The first is notation. 688 00:35:51,300 --> 00:35:54,190 We could do the entire rest of the course using the notation 689 00:35:54,190 --> 00:35:57,700 that I showed so far, drawing circles on the blackboard, it 690 00:35:57,700 --> 00:35:59,090 would work. 691 00:35:59,090 --> 00:36:02,150 It would not be very convenient. 
692 00:36:02,150 --> 00:36:06,010 So to better take advantage of math, which is a very concise 693 00:36:06,010 --> 00:36:10,630 way to write things down, we will define a new notion, which 694 00:36:10,630 --> 00:36:13,370 is a random variable. 695 00:36:13,370 --> 00:36:18,390 A random variable is just like a variable, except, shockingly, 696 00:36:18,390 --> 00:36:21,200 it's random. 697 00:36:21,200 --> 00:36:24,350 So where we would normally think of a variable as 698 00:36:24,350 --> 00:36:29,480 representing a number, a random variable represents a 699 00:36:29,480 --> 00:36:30,730 distribution. 700 00:36:33,150 --> 00:36:38,300 So we could, for example in the die rolling case, we could 701 00:36:38,300 --> 00:36:46,660 say the sample space has 6 atomic events, and I could 702 00:36:46,660 --> 00:36:49,450 think about it as 6 circles. 703 00:36:49,450 --> 00:36:51,770 Circles wouldn't pack all that well. 704 00:36:51,770 --> 00:36:55,150 6 squares inside the universe, right? 705 00:36:55,150 --> 00:36:57,980 Because they are mutually exclusive, and collectively 706 00:36:57,980 --> 00:37:00,130 exhaustive, so if I started with a universe that looked 707 00:37:00,130 --> 00:37:03,920 like that, this one would be the probability 708 00:37:03,920 --> 00:37:09,730 that the number of dots was 1, then 2, then 3, and it has to fill up by the 709 00:37:09,730 --> 00:37:11,760 time I've put 6 of them in there. 710 00:37:11,760 --> 00:37:14,810 And they have to not overlap. 711 00:37:14,810 --> 00:37:18,660 A more convenient notation is to say, OK, let's let X 712 00:37:18,660 --> 00:37:19,910 represent that outcome. 713 00:37:27,190 --> 00:37:29,150 So I can label the events with math. 
714 00:37:29,150 --> 00:37:33,900 I can say, there's the event X equals 1, the event X equals 715 00:37:33,900 --> 00:37:38,950 2, the event X equals 3, and it just makes it much easier 716 00:37:38,950 --> 00:37:41,770 to write down the possibilities than to try to 717 00:37:41,770 --> 00:37:44,380 draw pictures with Venn diagrams all the time. 718 00:37:44,380 --> 00:37:48,370 So all we're doing here is introducing a mathematical 719 00:37:48,370 --> 00:37:52,080 representation for the same thing we talked about before. 720 00:37:52,080 --> 00:38:00,750 But among the things that you can do, after you've 721 00:38:00,750 --> 00:38:03,450 formalized this, so you can have a random variable, then 722 00:38:03,450 --> 00:38:06,070 it's a very small jump to say you can have a 723 00:38:06,070 --> 00:38:09,260 multi-dimensional random variable. 724 00:38:09,260 --> 00:38:11,620 Let's just for example have a 2-space. 725 00:38:11,620 --> 00:38:13,730 X and Y, for example. 726 00:38:13,730 --> 00:38:20,400 So now we can talk very conveniently about situations 727 00:38:20,400 --> 00:38:21,990 that factor. 728 00:38:21,990 --> 00:38:30,450 So, for example when I think about flipping 3 coins, I can 729 00:38:30,450 --> 00:38:35,500 think about that as a multivariate random variable 730 00:38:35,500 --> 00:38:36,960 in three dimensions. 731 00:38:36,960 --> 00:38:40,120 One dimension represents the outcome of the first die-- 732 00:38:40,120 --> 00:38:44,160 the first coin toss. 733 00:38:44,160 --> 00:38:45,830 Another dimension is the second, the third 734 00:38:45,830 --> 00:38:47,630 dimension is the third. 735 00:38:47,630 --> 00:38:49,870 So there is a very convenient way of talking about it, and 736 00:38:49,870 --> 00:38:51,870 we have a more concise notation. 737 00:38:51,870 --> 00:38:57,180 We say, OK let V be the outcome of the first die roll, 738 00:38:57,180 --> 00:38:58,400 or whatever. 
739 00:38:58,400 --> 00:39:01,480 Let W be the second one, and then we can think about the 740 00:39:01,480 --> 00:39:05,350 joint probability distribution, in terms of the 741 00:39:05,350 --> 00:39:07,920 multi-dimensional random variable. 742 00:39:07,920 --> 00:39:12,680 So we have the random variable defined by V and W. To 743 00:39:12,680 --> 00:39:16,072 make things easy for you to know 744 00:39:16,072 --> 00:39:18,120 what we're talking about, we'll generally try to remember 745 00:39:18,120 --> 00:39:20,430 to capitalize things when we're talking about random 746 00:39:20,430 --> 00:39:23,390 variables, and then we'll use small letters to talk 747 00:39:23,390 --> 00:39:26,230 about events. 748 00:39:26,230 --> 00:39:29,330 So this notation would represent the probability that 749 00:39:29,330 --> 00:39:32,510 V took on the value little v, and W took on the 750 00:39:32,510 --> 00:39:34,300 value little w. 751 00:39:34,300 --> 00:39:36,110 We'll see examples of this in a minute. 752 00:39:36,110 --> 00:39:39,290 So the idea is-- you don't need to do this, it's just a 753 00:39:39,290 --> 00:39:41,210 convenient notation to write more 754 00:39:41,210 --> 00:39:44,085 complicated things concisely. 755 00:39:48,180 --> 00:39:52,830 Now a concept that's very easy to talk about, now that we have 756 00:39:52,830 --> 00:39:55,950 random variables, is reducing dimensionality. 757 00:39:55,950 --> 00:39:58,750 And in fact, we will constantly reduce the 758 00:39:58,750 --> 00:40:01,850 dimensionality of complicated problems that are represented 759 00:40:01,850 --> 00:40:06,670 by multiple dimensions, to smaller dimensional problems. 760 00:40:06,670 --> 00:40:08,650 And we'll talk about two ways of doing that. 761 00:40:08,650 --> 00:40:11,570 The first is what we will call marginalizing. 762 00:40:11,570 --> 00:40:15,650 Marginalizing means, I don't care what happened in the 763 00:40:15,650 --> 00:40:18,560 other dimensions. 
764 00:40:18,560 --> 00:40:22,240 So if I have a probability rule that told me, for 765 00:40:22,240 --> 00:40:27,850 example, about the outcome of one toss of a fair die, and a 766 00:40:27,850 --> 00:40:32,720 second toss of a fair die, and if I tell you the joint 767 00:40:32,720 --> 00:40:36,600 probability space for that, right? 768 00:40:36,600 --> 00:40:39,690 So I would have 6 outcomes on one dimension, 6 outcomes on 769 00:40:39,690 --> 00:40:42,330 another dimension, let's say they're all equally likely. 770 00:40:42,330 --> 00:40:46,710 I have 36 points altogether, if they're all equally likely, 771 00:40:46,710 --> 00:40:50,640 then my probability law is a joint distribution. 772 00:40:50,640 --> 00:40:54,220 The joint distribution has 32 non-zero points and each point 773 00:40:54,220 --> 00:40:56,700 has a height of-- 774 00:40:56,700 --> 00:40:57,580 I said the right thing, right? 775 00:40:57,580 --> 00:41:00,090 36 is what I meant to say. 776 00:41:00,090 --> 00:41:02,160 My brain is telling me that I might not have said that. 777 00:41:02,160 --> 00:41:04,920 I meant 36. 778 00:41:04,920 --> 00:41:10,900 So if I have 36 equally likely events, how high is each one? 779 00:41:10,900 --> 00:41:12,470 1/36. 780 00:41:12,470 --> 00:41:19,440 OK, so the joint probability space for two tosses of a fair 781 00:41:19,440 --> 00:41:24,040 6-sided die, is this 6-by-6 space. 782 00:41:24,040 --> 00:41:26,440 And I may be interested in marginalizing. 783 00:41:26,440 --> 00:41:28,270 Marginalizing would mean, I don't care what 784 00:41:28,270 --> 00:41:30,590 the second one was. 785 00:41:30,590 --> 00:41:33,520 OK well, how do you infer the rule for the first one from 786 00:41:33,520 --> 00:41:38,140 the joint, if I don't care what the second one was? Well, 787 00:41:38,140 --> 00:41:39,390 you sum out the second. 788 00:41:42,260 --> 00:41:45,790 So if I have this 2-space that represented the 789 00:41:45,790 --> 00:41:47,040 first and the second. 
790 00:41:51,050 --> 00:41:52,730 So, say it's X and Y, for example. 791 00:41:52,730 --> 00:41:59,520 So, I've got 6 points that represent 1, 2, 3, 4, 5, 6. 792 00:41:59,520 --> 00:42:04,260 And then 6 this way, that sort of thing, except now I have to 793 00:42:04,260 --> 00:42:06,830 draw in tediously all of the others, right? 794 00:42:06,830 --> 00:42:10,290 So you get the idea. 795 00:42:10,290 --> 00:42:18,150 Each one of the X's represents a point with probability 1/36, 796 00:42:18,150 --> 00:42:21,890 and imagine that in each direction they're all in straight lines. 797 00:42:21,890 --> 00:42:25,400 Now if I didn't care what the second one was, how would I 798 00:42:25,400 --> 00:42:28,340 find the rule for the first one? Well, I just sum over the 799 00:42:28,340 --> 00:42:28,760 second one. 800 00:42:28,760 --> 00:42:31,210 So, say I'm only interested in what happened in the first 801 00:42:31,210 --> 00:42:35,310 one, well I would ascribe all of the probabilities here to 802 00:42:35,310 --> 00:42:36,610 that point. 803 00:42:36,610 --> 00:42:41,700 I would sum out the one that I don't care about. 804 00:42:41,700 --> 00:42:42,510 That's obvious, right? 805 00:42:42,510 --> 00:42:47,530 Because if I marginalize, these X's that all represent 806 00:42:47,530 --> 00:42:50,500 the number 1/36 have to turn into a single-dimension axis, 807 00:42:50,500 --> 00:42:54,780 which is just X, and they have to be 6 numbers that 808 00:42:54,780 --> 00:42:57,770 are each how high? 809 00:42:57,770 --> 00:42:59,620 1/6, right? 810 00:42:59,620 --> 00:43:03,310 So the way I get 6 numbers that are each 1/6, when I 811 00:43:03,310 --> 00:43:08,160 started with 36 numbers that were each 1/36, is to use a sum. 812 00:43:08,160 --> 00:43:10,440 OK, so that's called marginalization. 813 00:43:10,440 --> 00:43:12,500 The other thing that I can do is condition. 
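[Supplementary sketch, not part of the lecture: the two-dice marginalization just described takes only a few lines of Python. `Fraction` keeps the 1/36 entries exact; the variable names are my own.]

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution for two tosses of a fair 6-sided die:
# 36 equally likely points, each with probability 1/36.
joint = {(x, y): Fraction(1, 36)
         for x in range(1, 7) for y in range(1, 7)}

# Marginalize: "I don't care what the second one was" --
# sum out Y, collapsing each column of six points onto its X value.
marginal_x = defaultdict(Fraction)
for (x, y), p in joint.items():
    marginal_x[x] += p

print(sorted(marginal_x.items()))  # each of the 6 faces ends up at 1/6
```

[The 36 numbers that were each 1/36 turn into 6 numbers that are each 1/6, exactly the sum described above.]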
814 00:43:12,500 --> 00:43:21,090 I can tell you something about the sample space and ask you 815 00:43:21,090 --> 00:43:24,700 to figure out a conditional probability. 816 00:43:24,700 --> 00:43:31,690 So I might tell you, what's the probability rule for Y 817 00:43:31,690 --> 00:43:35,630 conditioned on the first one being 3? 818 00:43:35,630 --> 00:43:36,960 OK. 819 00:43:36,960 --> 00:43:39,700 Mathematically that's a different problem, that's a 820 00:43:39,700 --> 00:43:44,580 re-scale problem, because that's Bayes' rule. 821 00:43:44,580 --> 00:43:48,190 So generally if I carved out by conditioning some fraction 822 00:43:48,190 --> 00:43:51,500 of the sample space, the way you would compute the new 823 00:43:51,500 --> 00:43:53,840 probabilities would be to re-scale. 824 00:43:53,840 --> 00:43:56,360 So there are two operations that we will do. 825 00:43:56,360 --> 00:43:58,890 We will marginalize, which means summing out. 826 00:43:58,890 --> 00:44:03,690 And we will condition, which means re-scale. 827 00:44:03,690 --> 00:44:04,860 OK. 828 00:44:04,860 --> 00:44:07,540 So to get some practice at that, let's think 829 00:44:07,540 --> 00:44:12,130 about a tangible problem. 830 00:44:12,130 --> 00:44:14,590 Example: prevalence and testing for AIDS. 831 00:44:14,590 --> 00:44:19,690 Consider the effectiveness of a test for AIDS. 832 00:44:19,690 --> 00:44:22,490 This is real data. 833 00:44:22,490 --> 00:44:24,390 Data from the United States. 834 00:44:24,390 --> 00:44:28,060 So imagine that we take a population, representative of 835 00:44:28,060 --> 00:44:31,070 the population in the United States, and classify every 836 00:44:31,070 --> 00:44:36,600 individual as having AIDS or not, and being diagnosed 837 00:44:36,600 --> 00:44:39,660 according to some test as positive or negative. 838 00:44:42,580 --> 00:44:45,090 OK, two dimensional. 
839 00:44:45,090 --> 00:44:48,630 The two dimensions are: what was the value of 840 00:44:48,630 --> 00:44:49,880 AIDS, true or false? 841 00:44:52,700 --> 00:44:57,312 And what's the value of the test, positive or negative? 842 00:44:57,312 --> 00:45:01,750 So we've divided the population into four pieces. 843 00:45:01,750 --> 00:45:04,770 And by using the idea of relative frequency, I've 844 00:45:04,770 --> 00:45:07,450 written probabilities here. 845 00:45:07,450 --> 00:45:12,300 So what's the probability of choosing, by random choice, an 846 00:45:12,300 --> 00:45:17,290 individual that has AIDS and tested positive? 847 00:45:17,290 --> 00:45:21,090 OK, so that's 0.003648, et cetera. 848 00:45:21,090 --> 00:45:25,140 So I've divided the population into four groups. 849 00:45:25,140 --> 00:45:27,440 A multidimensional 850 00:45:27,440 --> 00:45:30,590 random variable. 851 00:45:30,590 --> 00:45:31,220 OK. 852 00:45:31,220 --> 00:45:34,310 The question is, what's the probability that the test is 853 00:45:34,310 --> 00:45:39,020 positive given that the subject has AIDS? 854 00:45:39,020 --> 00:45:41,990 I want to know how good the test is. 855 00:45:41,990 --> 00:45:44,920 So the first question I'm going to ask is, given that 856 00:45:44,920 --> 00:45:48,930 the person has AIDS, what's the probability that the test 857 00:45:48,930 --> 00:45:52,230 gives a true answer? 858 00:45:52,230 --> 00:45:53,970 You've got 60 seconds. 859 00:45:53,970 --> 00:45:55,220 This is harder. 860 00:45:57,590 --> 00:45:58,840 Some people don't think it's harder. 861 00:46:24,750 --> 00:46:27,730 So what's the probability that the test is positive, given 862 00:46:27,730 --> 00:46:29,460 that the subject has AIDS? 863 00:46:29,460 --> 00:46:30,730 Is it bigger than 90%? 864 00:46:30,730 --> 00:46:32,070 Between 50% and 90%? 865 00:46:32,070 --> 00:46:32,460 Less than 50%? 866 00:46:32,460 --> 00:46:33,830 Or you can't tell from the data? 
867 00:46:33,830 --> 00:46:37,350 Everybody vote, and the answer is 100% correct. 868 00:46:37,350 --> 00:46:38,170 Wonderful. 869 00:46:38,170 --> 00:46:40,540 So let me make it harder. 870 00:46:40,540 --> 00:46:43,420 Is it between 90% and 95%? 871 00:46:43,420 --> 00:46:45,886 Or between 95% and a 100%? 872 00:46:45,886 --> 00:46:46,862 AUDIENCE: 95% and a 100% 873 00:46:46,862 --> 00:46:48,080 PROFESSOR: 95%. 874 00:46:48,080 --> 00:46:53,810 Is it between 95% and 97%, or 97% and 100%? 875 00:46:57,826 --> 00:46:59,940 OK, sorry. 876 00:46:59,940 --> 00:47:01,190 This is called marginalization. 877 00:47:03,860 --> 00:47:06,470 I told you something about the population that lets you 878 00:47:06,470 --> 00:47:09,660 eliminate some of the numbers. 879 00:47:09,660 --> 00:47:13,390 So if I told you that the person has AIDS, then I know 880 00:47:13,390 --> 00:47:16,460 I'm in the first column. 881 00:47:16,460 --> 00:47:17,790 That's marginalization. 882 00:47:17,790 --> 00:47:20,880 I gave you new information. 883 00:47:20,880 --> 00:47:23,720 I'm saying the other cases didn't happen. 884 00:47:23,720 --> 00:47:26,800 I've shrunk the universe, it used to have 4 groups of 885 00:47:26,800 --> 00:47:32,090 people, now it has 2 groups of people, I used Bayes' rule. 886 00:47:32,090 --> 00:47:38,330 I need to re-scale the numbers so that they add to 1. 887 00:47:38,330 --> 00:47:42,150 So these 2 numbers, the only 2 possibilities that can occur-- 888 00:47:42,150 --> 00:47:43,870 after I've done the conditioning, no 889 00:47:43,870 --> 00:47:45,990 longer add to 1. 890 00:47:45,990 --> 00:47:48,410 I've got to make them add to 1. 891 00:47:48,410 --> 00:47:52,160 I do that by dividing by the probability of the event that 892 00:47:52,160 --> 00:47:54,620 I'm using to normalize. 893 00:47:54,620 --> 00:47:58,240 So the sum of these two probabilities is something, 894 00:47:58,240 --> 00:48:03,350 whatever it is 0.003700. 
895 00:48:03,350 --> 00:48:06,450 So I divide each of those probabilities by that sum, 896 00:48:06,450 --> 00:48:07,930 that's just Bayes' rule. 897 00:48:07,930 --> 00:48:11,070 And I find out that the answer is the probability that the 898 00:48:11,070 --> 00:48:14,190 test is positive-- 899 00:48:14,190 --> 00:48:16,780 given that the person has AIDS, the probability that the test 900 00:48:16,780 --> 00:48:20,240 is positive is 0.986. 901 00:48:20,240 --> 00:48:21,490 Good test? 902 00:48:24,310 --> 00:48:26,376 Good test? 903 00:48:26,376 --> 00:48:28,380 98%. 904 00:48:28,380 --> 00:48:30,060 I won't say that. 905 00:48:30,060 --> 00:48:33,160 98% is a good test, right? 906 00:48:33,160 --> 00:48:36,040 Not that today is an appropriate day to talk about 907 00:48:36,040 --> 00:48:38,230 the outcomes of tests and 98%. 908 00:48:38,230 --> 00:48:39,380 But I won't mention that. 909 00:48:39,380 --> 00:48:43,810 OK, so good test. 910 00:48:43,810 --> 00:48:47,530 The accuracy of the test is greater than 98%. 911 00:48:47,530 --> 00:48:48,780 Quite good. 912 00:48:56,020 --> 00:48:57,010 New question. 913 00:48:57,010 --> 00:48:59,310 What's the probability that the subject has AIDS given 914 00:48:59,310 --> 00:49:00,560 that the test is positive? 915 00:49:15,020 --> 00:49:16,270 Everybody vote. (1), (2), (3), (4). 916 00:49:22,210 --> 00:49:23,020 Looks like 100%. 917 00:49:23,020 --> 00:49:24,970 OK, the answer is less than 50%. 918 00:49:24,970 --> 00:49:25,610 Why is that? 919 00:49:25,610 --> 00:49:27,970 Well that's another marginalization problem, but 920 00:49:27,970 --> 00:49:31,490 now we're marginalizing on a different population. 921 00:49:31,490 --> 00:49:34,840 This is how you can go awry thinking about probability. 922 00:49:34,840 --> 00:49:37,670 The 2 numbers seem kind of contradictory. 923 00:49:37,670 --> 00:49:40,550 Here I'm saying that the test came out positive and I'm 924 00:49:40,550 --> 00:49:44,570 asking does the subject have AIDS. 
925 00:49:44,570 --> 00:49:45,970 It's still marginalization. 926 00:49:45,970 --> 00:49:50,140 I'm still throwing away 2 of the conditions, two fractions 927 00:49:50,140 --> 00:49:52,610 of the population, I'm only thinking about 2. 928 00:49:52,610 --> 00:49:58,240 I still have to normalize so that the sums come out 1, but 929 00:49:58,240 --> 00:49:59,490 the numbers are different. 930 00:49:59,490 --> 00:50:01,772 Yes? 931 00:50:01,772 --> 00:50:03,022 AUDIENCE: [INAUDIBLE PHRASE]. 932 00:50:08,534 --> 00:50:09,980 PROFESSOR: Thank you. 933 00:50:09,980 --> 00:50:13,890 Because my brain's not working. 934 00:50:13,890 --> 00:50:16,150 OK, I've been saying marginalization and I meant 935 00:50:16,150 --> 00:50:19,130 uniformly, over the last five minutes, to be saying 936 00:50:19,130 --> 00:50:21,490 conditioning. 937 00:50:21,490 --> 00:50:24,450 OK, so I skipped breakfast this morning, my blood sugar 938 00:50:24,450 --> 00:50:27,450 is low, sorry. 939 00:50:27,450 --> 00:50:28,850 Thank you very much. 940 00:50:28,850 --> 00:50:33,620 I should have been saying conditioning. 941 00:50:33,620 --> 00:50:34,580 Sorry. 942 00:50:34,580 --> 00:50:36,290 OK, so backing up. 943 00:50:38,920 --> 00:50:46,200 OK I conditioned on the fact that the person had AIDS, and 944 00:50:46,200 --> 00:50:49,100 then I conditioned on the fact that the 945 00:50:49,100 --> 00:50:50,750 test came up positive. 946 00:50:50,750 --> 00:50:54,850 In both cases I was conditioning. 947 00:50:54,850 --> 00:50:58,050 In both cases I was doing Bayes' rule. 948 00:50:58,050 --> 00:51:00,200 Please ignore the person who can't connect his 949 00:51:00,200 --> 00:51:02,740 brain to his mouth. 950 00:51:02,740 --> 00:51:07,340 So, here because the conditioning event has a very 951 00:51:07,340 --> 00:51:11,950 different set of numbers from these numbers, the relative 952 00:51:11,950 --> 00:51:18,450 likelihood that the subject has AIDS is small. 
953 00:51:18,450 --> 00:51:25,460 So even though the test is very effective in identifying 954 00:51:25,460 --> 00:51:30,990 cases that are known to be true, it is not very effective 955 00:51:30,990 --> 00:51:35,840 in taking a random person from the population and saying the 956 00:51:35,840 --> 00:51:39,060 test was positive, you have it. 957 00:51:39,060 --> 00:51:42,430 OK, those are very different things and the probability 958 00:51:42,430 --> 00:51:45,700 theory gives us a way to say exactly how 959 00:51:45,700 --> 00:51:46,950 different those are. 960 00:51:49,330 --> 00:51:52,710 Why are they so different? 961 00:51:52,710 --> 00:51:54,785 The reason they're different is that other word. 962 00:51:57,300 --> 00:51:58,370 Because the marginal 963 00:51:58,370 --> 00:52:00,530 probabilities are so different. 964 00:52:00,530 --> 00:52:05,320 And that is because the population is skewed. 965 00:52:05,320 --> 00:52:09,650 So the fact that the test came out positive, is offset at 966 00:52:09,650 --> 00:52:14,000 least somewhat by the skew in the population. 967 00:52:14,000 --> 00:52:17,210 So the point here is actually marginalizing. 968 00:52:17,210 --> 00:52:20,470 If I think about how many people in the population have 969 00:52:20,470 --> 00:52:27,120 AIDS, that means I'm summing on the columns, rather than 970 00:52:27,120 --> 00:52:29,220 conditioning. 971 00:52:29,220 --> 00:52:33,200 And what you see is a very skewed population. 972 00:52:33,200 --> 00:52:38,310 And that's the reason you can't conclude from the test, 973 00:52:38,310 --> 00:52:42,440 whether or not this particular subject has the disease or not 974 00:52:42,440 --> 00:52:44,980 because the population is so skewed. 975 00:52:44,980 --> 00:52:49,850 So this was intended to be an example of conditioning versus 976 00:52:49,850 --> 00:52:52,140 marginalization and how you think about that in a 977 00:52:52,140 --> 00:52:54,400 multi-dimensional random variable. 
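The two conditioning computations just described can be checked numerically. Only the AIDS-and-positive entry (0.003648) and the AIDS-column sum (0.003700) appear in the lecture; the false-positive entry below is an assumed placeholder, chosen only to illustrate the skewed-population effect, not the number from the slide.

```python
# Joint distribution over (AIDS, test), stored as a dictionary.
# 0.003648 and the column sum 0.003700 come from the lecture;
# the false-positive entry 0.0199 is an assumed placeholder.
joint = {
    ('true', 'positive'): 0.003648,
    ('true', 'negative'): 0.003700 - 0.003648,   # = 0.000052
    ('false', 'positive'): 0.0199,               # assumed, not from the slide
    ('false', 'negative'): 1 - 0.003700 - 0.0199,
}

def condition(joint, fixed_index, fixed_value):
    """Keep entries whose fixed_index-th coordinate equals fixed_value,
    then rescale so the surviving probabilities add to 1."""
    kept = {k: p for k, p in joint.items() if k[fixed_index] == fixed_value}
    total = sum(kept.values())
    return {k: p / total for k, p in kept.items()}

# P(test | AIDS = true): condition on the first coordinate.
given_aids = condition(joint, 0, 'true')
print(round(given_aids[('true', 'positive')], 3))   # 0.986, as in the lecture

# P(AIDS | test = positive): condition on the second coordinate.
given_pos = condition(joint, 1, 'positive')
print(given_pos[('true', 'positive')] < 0.5)        # True: the skew dominates
```

The same `condition` procedure answers both questions; only the coordinate being fixed changes, which is why the two answers can be so different.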
978 00:52:54,400 --> 00:52:55,976 Yes? 979 00:52:55,976 --> 00:52:59,148 AUDIENCE: Don't you sum [UNINTELLIGIBLE] in order to 980 00:52:59,148 --> 00:53:00,398 do Bayes' rule? 981 00:53:02,808 --> 00:53:08,880 PROFESSOR: In order to condition on has AIDS, you 982 00:53:08,880 --> 00:53:12,140 need to sum has AIDS. 983 00:53:12,140 --> 00:53:13,950 And then you use that number. 984 00:53:13,950 --> 00:53:14,680 Yes? 985 00:53:14,680 --> 00:53:15,732 That's right. 986 00:53:15,732 --> 00:53:16,968 AUDIENCE: So how are they different? 987 00:53:16,968 --> 00:53:20,740 PROFESSOR: One of them has a [UNINTELLIGIBLE] and the other 988 00:53:20,740 --> 00:53:21,400 one doesn't. 989 00:53:21,400 --> 00:53:26,570 So when we did Bayes' rule, we did the marginalization here, 990 00:53:26,570 --> 00:53:32,850 but then we used that summed number to normalize the 991 00:53:32,850 --> 00:53:36,400 individual probabilities by scaling, by dividing. 992 00:53:36,400 --> 00:53:40,600 So that the new sum, over the new smaller sample 993 00:53:40,600 --> 00:53:45,280 space is still one. 994 00:53:45,280 --> 00:53:47,320 So your point's right. 995 00:53:47,320 --> 00:53:50,600 So regardless of whether we're conditioning or marginalizing, 996 00:53:50,600 --> 00:53:54,080 we still end up computing the marginals. 997 00:53:54,080 --> 00:53:56,010 It's just that in one case we're done, and in the other 998 00:53:56,010 --> 00:54:02,567 case we use that marginal to re-scale. OK? 999 00:54:07,120 --> 00:54:12,421 So I said, we could just use set theory and we're done. 1000 00:54:12,421 --> 00:54:14,420 We'll in fact use random variables 1001 00:54:14,420 --> 00:54:15,230 because it's simpler. 1002 00:54:15,230 --> 00:54:17,740 That's one of the two other things we need to do which are 1003 00:54:17,740 --> 00:54:20,170 non-essential, it just makes our life easier. 
1004 00:54:20,170 --> 00:54:23,160 And the other non-essential thing that we will do is 1005 00:54:23,160 --> 00:54:26,430 represent it in some sort of a Python structure. 1006 00:54:26,430 --> 00:54:29,200 So we would like to be able to conveniently represent 1007 00:54:29,200 --> 00:54:32,590 probabilities in Python. 1008 00:54:32,590 --> 00:54:36,690 The way we'll do that, is a little obscure the first time 1009 00:54:36,690 --> 00:54:37,500 you look at it. 1010 00:54:37,500 --> 00:54:40,160 But again, once you've done it a few times it's a very 1011 00:54:40,160 --> 00:54:41,920 natural way of doing it, otherwise we 1012 00:54:41,920 --> 00:54:43,200 wouldn't do it this way. 1013 00:54:43,200 --> 00:54:47,170 How are we going to represent probability laws in Python? 1014 00:54:47,170 --> 00:54:54,470 The way we'll do it, since the labels for random variables 1015 00:54:54,470 --> 00:54:57,040 can be lots of different things-- so for example, the 1016 00:54:57,040 --> 00:55:01,270 label in the previous one was in the case of the subject 1017 00:55:01,270 --> 00:55:05,900 having AIDS or not, the label was true or false. 1018 00:55:05,900 --> 00:55:10,190 The label for the test was positive or negative. 1019 00:55:10,190 --> 00:55:14,710 So in order to allow you to give symbolic and human 1020 00:55:14,710 --> 00:55:21,500 meaningful names to events we will use a dictionary as the 1021 00:55:21,500 --> 00:55:27,300 fundamental way of associating probabilities with events. 1022 00:55:27,300 --> 00:55:29,450 So, we'll represent a probability 1023 00:55:29,450 --> 00:55:31,270 distribution by a class-- 1024 00:55:31,270 --> 00:55:34,760 what a surprise, by a Python class-- 1025 00:55:34,760 --> 00:55:37,460 that we will call DDist which means discrete distribution. 
1026 00:55:39,980 --> 00:55:47,110 DDists want to associate the name of an atomic event which 1027 00:55:47,110 --> 00:55:53,280 we will let you use any string, or in fact any-- 1028 00:55:53,280 --> 00:55:55,580 I should generalize that. 1029 00:55:55,580 --> 00:56:02,230 You can use any Python data structure to identify an 1030 00:56:02,230 --> 00:56:04,190 atomic event. 1031 00:56:04,190 --> 00:56:06,870 And then we will associate that using a Python 1032 00:56:06,870 --> 00:56:10,970 dictionary, with the probability. 1033 00:56:10,970 --> 00:56:16,310 So what we will do when you instantiate a new discrete 1034 00:56:16,310 --> 00:56:21,380 distribution, you will-- the instantiation rule, you must 1035 00:56:21,380 --> 00:56:22,670 call it with a dictionary. 1036 00:56:22,670 --> 00:56:26,550 A dictionary is a thing in Python that associates one 1037 00:56:26,550 --> 00:56:31,130 thing with another thing, I'll give an example in a minute. 1038 00:56:31,130 --> 00:56:37,430 And the utility of this is that you'll be able to use as 1039 00:56:37,430 --> 00:56:43,320 your atomic event a string, like true or false, a string 1040 00:56:43,320 --> 00:56:46,840 like positive or negative, or something more complicated 1041 00:56:46,840 --> 00:56:47,740 like a tuple. 1042 00:56:47,740 --> 00:56:49,810 And I'll show you an example of where you would want to do 1043 00:56:49,810 --> 00:56:51,470 that in just a second. 1044 00:56:51,470 --> 00:56:55,120 So the idea is going to be you establish a discrete 1045 00:56:55,120 --> 00:56:58,690 distribution by the unique method called the dictionary. 1046 00:56:58,690 --> 00:57:08,080 The dictionary is just a list of keys which tell you which 1047 00:57:08,080 --> 00:57:11,130 event that you're trying to name the probability of. 1048 00:57:11,130 --> 00:57:13,390 Associated with a number, and that number is the 1049 00:57:13,390 --> 00:57:15,380 probability. 
1050 00:57:15,380 --> 00:57:18,210 And this shows you that there's one extremely 1051 00:57:18,210 --> 00:57:22,450 interesting method, which is the Prob method. 1052 00:57:22,450 --> 00:57:25,790 The idea is that Prob will tell you what is the 1053 00:57:25,790 --> 00:57:28,640 probability associated with that key. 1054 00:57:28,640 --> 00:57:31,020 If it doesn't find the key in the dictionary, it'll tell you 1055 00:57:31,020 --> 00:57:32,370 the answer is 0. 1056 00:57:32,370 --> 00:57:35,550 We do that for a specific reason too, because a lot of 1057 00:57:35,550 --> 00:57:38,900 the probability spaces that we will talk about, have lots of 1058 00:57:38,900 --> 00:57:40,560 0's in them. 1059 00:57:40,560 --> 00:57:44,110 So instead of having to enumerate all of the cases 1060 00:57:44,110 --> 00:57:48,020 that are 0 we will assume that if you didn't tell us a 1061 00:57:48,020 --> 00:57:53,480 probability, the answer was 0. 1062 00:57:53,480 --> 00:57:55,900 OK so this is the idea. 1063 00:57:55,900 --> 00:58:02,850 I could say use the dist module in lib601 to create 1064 00:58:02,850 --> 00:58:05,770 the outcome of a coin toss experiment. 1065 00:58:05,770 --> 00:58:08,130 And I have a syntax error. 1066 00:58:08,130 --> 00:58:10,650 This should have had a squiggle brace. 1067 00:58:13,490 --> 00:58:15,330 A dictionary is something that in Python-- 1068 00:58:15,330 --> 00:58:18,420 So I should have said something like this-- 1069 00:58:18,420 --> 00:58:19,670 dist.DDist of squiggle. 1070 00:58:23,010 --> 00:58:25,450 Sorry about that, that should've said squiggle, I'll 1071 00:58:25,450 --> 00:58:27,770 fix it and put the answer on the website. 1072 00:58:30,290 --> 00:58:40,090 Head should be associated with the probability 0.5 and tail 1073 00:58:40,090 --> 00:58:43,200 should be associated with the probability 0.5. 1074 00:58:43,200 --> 00:58:46,840 End of dictionary, end of call. 1075 00:58:46,840 --> 00:58:48,700 Sorry, I missed the squiggle. 
1076 00:58:48,700 --> 00:58:51,140 Actually what happened was, I put the squiggle in 1077 00:58:51,140 --> 00:58:52,260 and LaTeX ate it. 1078 00:58:52,260 --> 00:58:56,790 Because that's the LaTeX, anyway. 1079 00:58:56,790 --> 00:59:00,930 It's sort of my fault. 1080 00:59:00,930 --> 00:59:02,790 The dog ate my homework. 1081 00:59:02,790 --> 00:59:05,140 LaTeX ate my squiggle, it's sort of the same thing. 1082 00:59:08,140 --> 00:59:11,480 So having defined a distribution, then I can ask 1083 00:59:11,480 --> 00:59:14,560 what's the probability of the event head? 1084 00:59:14,560 --> 00:59:15,880 The answer is a half. 1085 00:59:15,880 --> 00:59:17,460 The probability of event tail? 1086 00:59:17,460 --> 00:59:19,360 The answer is a half. 1087 00:59:19,360 --> 00:59:21,640 The probability of event H? 1088 00:59:21,640 --> 00:59:23,860 There is no H. The answer is 0. 1089 00:59:23,860 --> 00:59:25,920 That's what I meant by sparsity. 1090 00:59:25,920 --> 00:59:29,400 If I didn't tell you what the probability is, we assume the 1091 00:59:29,400 --> 00:59:30,650 answer is 0. 1092 00:59:33,290 --> 00:59:37,830 Conditional probabilities are a little more obscure. 1093 00:59:37,830 --> 00:59:40,830 What's the conditional probability that the test 1094 00:59:40,830 --> 00:59:44,060 gives me some outcome given that I tell you the status of 1095 00:59:44,060 --> 00:59:47,840 whether the patient has or doesn't have AIDS? 1096 00:59:47,840 --> 00:59:49,930 OK, well conditionals-- 1097 00:59:49,930 --> 00:59:54,900 you're going to have to tell me which case I want to 1098 00:59:54,900 --> 00:59:56,680 condition on. 1099 00:59:56,680 --> 00:59:59,620 So in order for me to tell you the right probability law you 1100 00:59:59,620 --> 01:00:04,270 have to tell me does the person have AIDS or not. 1101 01:00:04,270 --> 01:00:06,430 So that becomes an argument. 
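Putting the pieces above together, here is a minimal sketch of such a class; the real lib601 DDist has more methods, but the dictionary-plus-default-0 idea is the same.

```python
class DDist:
    """Discrete distribution: maps atomic events (any hashable
    Python value -- a string, a tuple, ...) to probabilities."""

    def __init__(self, dictionary):
        self.d = dictionary

    def prob(self, elt):
        # Missing keys mean probability 0, so sparse distributions
        # don't have to enumerate every zero-probability event.
        return self.d.get(elt, 0)

# The coin-toss example, with the squiggle braces in place this time.
coin = DDist({'head': 0.5, 'tail': 0.5})
print(coin.prob('head'))   # 0.5
print(coin.prob('tail'))   # 0.5
print(coin.prob('H'))      # 0 -- 'H' is not in the dictionary
```

The `dict.get(elt, 0)` call is what implements the sparsity convention: any event you didn't name is assigned probability 0.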
1102 01:00:06,430 --> 01:00:09,620 So we're going to represent conditional probabilities as 1103 01:00:09,620 --> 01:00:11,450 procedures. 1104 01:00:11,450 --> 01:00:12,160 That's a little weird. 1105 01:00:12,160 --> 01:00:17,920 So the input to the procedure, specifies the condition. 1106 01:00:17,920 --> 01:00:22,420 So if I want to call the procedure and find out what's 1107 01:00:22,420 --> 01:00:26,410 the distribution for the tests, given that 1108 01:00:26,410 --> 01:00:29,070 the person has AIDS? 1109 01:00:29,070 --> 01:00:32,085 Then I would call, test given AIDS of true. 1110 01:00:34,980 --> 01:00:42,040 So if AIDS is true, return this DDist, otherwise return 1111 01:00:42,040 --> 01:00:43,140 this DDist. 1112 01:00:43,140 --> 01:00:46,460 So it's a little bizarre but think about what it has to do. 1113 01:00:46,460 --> 01:00:50,720 If I want to specify a conditional probability, I 1114 01:00:50,720 --> 01:00:53,640 have to tell you an answer. 1115 01:00:53,640 --> 01:00:56,990 And that's what the parameter is for. 1116 01:00:56,990 --> 01:01:00,260 So the way that would work is illustrated here having 1117 01:01:00,260 --> 01:01:03,910 defined this as the conditional distribution I 1118 01:01:03,910 --> 01:01:07,980 could call it by saying what is the distribution on tests 1119 01:01:07,980 --> 01:01:11,210 given that AIDS was true? 1120 01:01:11,210 --> 01:01:12,775 And the answer to that is the DDist. 1121 01:01:15,320 --> 01:01:20,140 Or if I had that DDist, which would be this phrase, I could 1122 01:01:20,140 --> 01:01:22,650 say what's then the probability in that new 1123 01:01:22,650 --> 01:01:26,980 distribution that the answer is negative? 1124 01:01:26,980 --> 01:01:30,720 Then I would look up the dot prob method within the 1125 01:01:30,720 --> 01:01:36,160 resulting conditional distribution, and look up the 1126 01:01:36,160 --> 01:01:37,410 condition negative. 
1127 01:01:39,680 --> 01:01:42,930 And finally the way that I would think about a joint 1128 01:01:42,930 --> 01:01:48,850 probability distribution, is to use a tuple. 1129 01:01:48,850 --> 01:01:50,700 Joint probability distributions are 1130 01:01:50,700 --> 01:01:54,600 multi-dimensional, tuples are multi-dimensional. 1131 01:01:54,600 --> 01:01:57,500 So for example, if I wanted to represent this 1132 01:01:57,500 --> 01:02:03,990 multi-dimensional data, I might have the joint 1133 01:02:03,990 --> 01:02:11,300 distribution of AIDS and tests. 1134 01:02:11,300 --> 01:02:12,870 OK that's a 2-by-2. 1135 01:02:12,870 --> 01:02:16,900 AIDS can take on 2 different values, true or false. 1136 01:02:16,900 --> 01:02:18,380 And tests can take on 2 different 1137 01:02:18,380 --> 01:02:19,820 values, positive or negative. 1138 01:02:19,820 --> 01:02:21,940 So there's 4 cases. 1139 01:02:21,940 --> 01:02:25,310 The way I would specify a joint distribution would be to 1140 01:02:25,310 --> 01:02:29,950 create a joint distribution starting with the marginal 1141 01:02:29,950 --> 01:02:39,900 distribution for AIDS and then using Bayes' rule tell me the 1142 01:02:39,900 --> 01:02:44,860 two different conditional probabilities given AIDS. 1143 01:02:44,860 --> 01:02:50,130 And that then will create a new joint distribution 1144 01:02:50,130 --> 01:02:53,980 whose DDist is a tuple. 1145 01:02:53,980 --> 01:02:58,900 So in this new joint distribution, AIDS and tests, 1146 01:02:58,900 --> 01:03:02,860 if AIDS is false, and test is negative-- 1147 01:03:02,860 --> 01:03:06,830 so false negative is this number-- 1148 01:03:06,830 --> 01:03:12,830 the probability associated with the tuple is that number. 1149 01:03:12,830 --> 01:03:14,710 Is that clear? 1150 01:03:14,710 --> 01:03:17,920 So I'm going to construct joint distributions by 1151 01:03:17,920 --> 01:03:22,870 thinking about conditional probabilities. 
1152 01:03:22,870 --> 01:03:25,290 So I have simple distributions, which are 1153 01:03:25,290 --> 01:03:26,850 defined with dictionaries. 1154 01:03:26,850 --> 01:03:30,040 I have conditional probabilities which are 1155 01:03:30,040 --> 01:03:31,790 defined by procedures. 1156 01:03:31,790 --> 01:03:34,970 And I have joint probabilities which are defined by tuples. 1157 01:03:39,230 --> 01:03:43,990 OK, so that's the Python magic that we will use and a lot of 1158 01:03:43,990 --> 01:03:48,010 the exercises for Week 10 have to do with getting that 1159 01:03:48,010 --> 01:03:50,170 nomenclature straight. 1160 01:03:50,170 --> 01:03:52,530 It's a little confusing at first, I assure you that by 1161 01:03:52,530 --> 01:03:54,130 the time you've practiced with it, it is 1162 01:03:54,130 --> 01:03:56,910 a reasonable notation. 1163 01:03:56,910 --> 01:03:59,970 It just takes a little bit of practice to get onto it, much 1164 01:03:59,970 --> 01:04:01,380 like other notations. 1165 01:04:01,380 --> 01:04:03,650 OK where are we going with this? 1166 01:04:03,650 --> 01:04:06,000 What we would like to do is solve that problem that I 1167 01:04:06,000 --> 01:04:08,120 showed at the beginning of the hour. 1168 01:04:08,120 --> 01:04:12,620 So we would like to know things like, where am I? 1169 01:04:12,620 --> 01:04:15,670 So the kind of thing that we're going to do is think 1170 01:04:15,670 --> 01:04:19,720 about where am I based on my current velocity and where I 1171 01:04:19,720 --> 01:04:23,280 think I am, odometry-- 1172 01:04:23,280 --> 01:04:27,200 which is uncertain, it's unreliable-- 1173 01:04:27,200 --> 01:04:29,750 versus for example where I think I am 1174 01:04:29,750 --> 01:04:31,670 based on noisy sensors. 1175 01:04:31,670 --> 01:04:36,860 OK so that's like two independent noisy things. 1176 01:04:36,860 --> 01:04:37,070 Right? 1177 01:04:37,070 --> 01:04:39,410 The odometry you can't completely rely on it. 
1178 01:04:39,410 --> 01:04:42,300 You've probably run into that by now. 1179 01:04:42,300 --> 01:04:44,750 The sonars are not completely reliable. 1180 01:04:44,750 --> 01:04:46,960 So there are two kinds of noisy things. 1181 01:04:46,960 --> 01:04:49,470 How do you optimally combine them? 1182 01:04:49,470 --> 01:04:51,830 That's where we're heading. 1183 01:04:51,830 --> 01:04:54,350 So the idea is going to be here I am, I think I'm a 1184 01:04:54,350 --> 01:04:56,510 robot, I think I'm heading toward a wall, I'd like to 1185 01:04:56,510 --> 01:04:58,790 know where am I. 1186 01:04:58,790 --> 01:05:01,430 So the kinds of data that we're going to look at are 1187 01:05:01,430 --> 01:05:08,370 things like, I think I know where I started out. 1188 01:05:08,370 --> 01:05:10,360 Now my thinking could be pretty vague. 1189 01:05:10,360 --> 01:05:13,340 It could be, I have no clue so I'm going to assume that I'm 1190 01:05:13,340 --> 01:05:17,030 equally likely anywhere in space. 1191 01:05:17,030 --> 01:05:19,600 So I have a small probability of being many places. 1192 01:05:19,600 --> 01:05:20,850 That just means that my initial 1193 01:05:20,850 --> 01:05:22,100 distribution is very broad. 1194 01:05:25,510 --> 01:05:29,510 But then I will define where I think I am by taking into 1195 01:05:29,510 --> 01:05:34,300 account where I think I will be after my next step. 1196 01:05:34,300 --> 01:05:38,320 So I think I'm moving at some speed. 1197 01:05:38,320 --> 01:05:41,120 If I were here, and if I'm going at some 1198 01:05:41,120 --> 01:05:43,790 speed I'll be there. 1199 01:05:43,790 --> 01:05:48,650 So we will formalize that by thinking about a transition. 1200 01:05:48,650 --> 01:05:52,920 I think that if I am here at time T, I will be there at 1201 01:05:52,920 --> 01:05:55,860 time T plus 1. 1202 01:05:55,860 --> 01:05:58,680 And I'll also think about, what do I think the sonars 1203 01:05:58,680 --> 01:05:59,720 should've told me. 
1204 01:05:59,720 --> 01:06:02,090 If I think I'm here, what would the sonars have said? 1205 01:06:02,090 --> 01:06:05,060 If I think I'm here, what would the sonars have said? 1206 01:06:05,060 --> 01:06:09,160 And we'll use those as a way to work backwards in 1207 01:06:09,160 --> 01:06:12,890 probability, use Bayes' rule. 1208 01:06:12,890 --> 01:06:16,970 To say, I have a noisy idea about where I will be if I 1209 01:06:16,970 --> 01:06:19,230 started there. 1210 01:06:19,230 --> 01:06:22,180 I have a noisy idea of what the sonars would have said, if 1211 01:06:22,180 --> 01:06:24,190 I started there. 1212 01:06:24,190 --> 01:06:25,930 But I don't know where I started. 1213 01:06:25,930 --> 01:06:28,240 Where did I start? 1214 01:06:28,240 --> 01:06:32,780 That's the way we're going to use the probability theory. 1215 01:06:32,780 --> 01:06:36,650 So for example, if I thought I was here and if I thought I 1216 01:06:36,650 --> 01:06:41,770 was going ahead 2 units in space per unit in time, I 1217 01:06:41,770 --> 01:06:46,290 would think that the next time I'm here. 1218 01:06:46,290 --> 01:06:49,350 But since I'm not quite sure where I was maybe I'll be 1219 01:06:49,350 --> 01:06:51,370 there, and maybe I'll be there, but there's very little 1220 01:06:51,370 --> 01:06:52,760 chance that I'll be there. 1221 01:06:52,760 --> 01:06:57,230 That's what I mean by a transition model. 1222 01:06:57,230 --> 01:07:01,760 It's a probabilistic way of describing the difference 1223 01:07:01,760 --> 01:07:04,900 between where I start and where I finish in one step. 1224 01:07:08,040 --> 01:07:11,060 Similarly, we'll think about an observation model. 1225 01:07:11,060 --> 01:07:13,310 If I think I'm here, what do I think the 1226 01:07:13,310 --> 01:07:14,900 sonars would have said. 
1227 01:07:14,900 --> 01:07:18,910 Well I think I've got some distribution that it's very 1228 01:07:18,910 --> 01:07:21,870 likely that they'll give me the right answer, but it might 1229 01:07:21,870 --> 01:07:23,850 be a little short it might be a long. 1230 01:07:23,850 --> 01:07:25,760 Maybe it'll make a bigger error. 1231 01:07:25,760 --> 01:07:30,760 So I'll think about two things. 1232 01:07:30,760 --> 01:07:34,220 Where do I think I will be based on how I'm going? 1233 01:07:34,220 --> 01:07:37,200 And where do I think I'll be based on my observations? 1234 01:07:37,200 --> 01:07:39,640 And then we'll try to formalize that into a 1235 01:07:39,640 --> 01:07:42,790 structure that gives me a better idea of where I am. 1236 01:07:46,040 --> 01:07:49,130 That's the point of the exercises next week when we 1237 01:07:49,130 --> 01:07:51,420 won't have a lecture. 1238 01:07:51,420 --> 01:07:53,810 So this week we're going to learn how to do some very 1239 01:07:53,810 --> 01:07:57,660 simple ideas with modelling probabilities. 1240 01:07:57,660 --> 01:07:59,950 With thinking about these kinds of distributions. 1241 01:07:59,950 --> 01:08:02,380 And the idea next week then is going to be incorporating it 1242 01:08:02,380 --> 01:08:06,310 into a structure that will let us figure out where the robot 1243 01:08:06,310 --> 01:08:10,700 is in some sort of an optimal sense. 1244 01:08:10,700 --> 01:08:13,220 So thinking about optimal -- 1245 01:08:13,220 --> 01:08:14,720 let's come back to the original question. 1246 01:08:17,229 --> 01:08:23,130 How much would you pay me to play the game? 1247 01:08:23,130 --> 01:08:24,740 OK, we had some votes. 1248 01:08:24,740 --> 01:08:27,470 They didn't add up to 1. 1249 01:08:27,470 --> 01:08:30,160 What should I do to make them add up to 1? 1250 01:08:33,000 --> 01:08:34,330 Divide by the sum. 1251 01:08:34,330 --> 01:08:35,960 Right? 1252 01:08:35,960 --> 01:08:37,359 Look at all of you know already, right? 
1253 01:08:37,359 --> 01:08:41,149 So you now know all this great probability theory. 1254 01:08:41,149 --> 01:08:44,630 So the question is can we use probability theory to come up 1255 01:08:44,630 --> 01:08:50,010 with a rational way of thinking how much it's worth? 1256 01:08:50,010 --> 01:08:55,210 Most of you thought that it's worth less than $10. 1257 01:08:55,210 --> 01:08:56,979 OK, so how do we think about this? 1258 01:08:56,979 --> 01:09:01,069 How do we use the theory that we just generated to come up 1259 01:09:01,069 --> 01:09:05,560 with a rational decision about how much that's worth? 1260 01:09:05,560 --> 01:09:10,210 OK, thinking about the bet quantitatively, what we're 1261 01:09:10,210 --> 01:09:11,439 going to try to do is think about it 1262 01:09:11,439 --> 01:09:13,710 with probability theory. 1263 01:09:13,710 --> 01:09:18,200 There are 5 possibilities inside the bag. 1264 01:09:18,200 --> 01:09:22,040 Originally there could have been 4 white, or 3 white and 1 1265 01:09:22,040 --> 01:09:26,790 red, or 2 and 2, or 1 and 3, or 0 and 4. 1266 01:09:26,790 --> 01:09:28,290 That was the original case. 1267 01:09:28,290 --> 01:09:29,040 You didn't know. 1268 01:09:29,040 --> 01:09:30,260 I didn't know. 1269 01:09:30,260 --> 01:09:31,970 They were thrown into the bag over here. 1270 01:09:31,970 --> 01:09:33,160 We didn't know. 1271 01:09:33,160 --> 01:09:36,250 How much would that game-- 1272 01:09:36,250 --> 01:09:43,590 how much should you be willing to pay to play that game? 1273 01:09:43,590 --> 01:09:48,189 Someone asked how many white ones and how many red ones did 1274 01:09:48,189 --> 01:09:49,810 the person put in the bag? 1275 01:09:49,810 --> 01:09:51,609 I don't have a clue, right? 1276 01:09:51,609 --> 01:09:54,970 We need a model for the person. 
1277 01:09:54,970 --> 01:10:01,940 Since I don't have a clue, one very common strategy is to say 1278 01:10:01,940 --> 01:10:03,770 all these things I know nothing about let's just 1279 01:10:03,770 --> 01:10:06,880 assume they're all equally likely. 1280 01:10:06,880 --> 01:10:10,570 So that's called maximum likelihood, when you do that. 1281 01:10:10,570 --> 01:10:12,670 There's other possible strategies. 1282 01:10:12,670 --> 01:10:14,990 I'll use the maximum likelihood idea just 1283 01:10:14,990 --> 01:10:16,290 because it's easy. 1284 01:10:16,290 --> 01:10:18,080 So I have no idea. 1285 01:10:18,080 --> 01:10:21,790 Let's just assume that here's all of the conditions that 1286 01:10:21,790 --> 01:10:22,450 could have happened. 1287 01:10:22,450 --> 01:10:24,602 The number of red that are in the bag could have been 0, 1, 1288 01:10:24,602 --> 01:10:25,852 2, 3, or 4. 1289 01:10:28,000 --> 01:10:31,270 I have no idea how the person chose the 1290 01:10:31,270 --> 01:10:33,700 number of LEGO parts. 1291 01:10:33,700 --> 01:10:38,580 So I'll assume that each of those cases is 1/5 likely, 1292 01:10:38,580 --> 01:10:41,730 since there's 5 cases. 1293 01:10:41,730 --> 01:10:45,970 OK now I'll think about what's my expected value of the 1294 01:10:45,970 --> 01:10:50,350 amount of money that I'll make if the random variable S, 1295 01:10:50,350 --> 01:10:54,910 which is the number of red things that are in the bag was 1296 01:10:54,910 --> 01:10:56,430 s which is either 0, 1, 2, 3, or 4. 1297 01:10:59,260 --> 01:11:04,410 OK, if there are 0, how much money do you expect to make? 1298 01:11:04,410 --> 01:11:06,410 None. 1299 01:11:06,410 --> 01:11:08,610 If there are 4 reds, how much money would 1300 01:11:08,610 --> 01:11:11,680 you expect to make? 1301 01:11:11,680 --> 01:11:19,870 $20. If there are 2 reds, you would expect to make $10. 1302 01:11:19,870 --> 01:11:21,860 Everybody see that? 
1303 01:11:21,860 --> 01:11:24,960 I'm trying to think through a logical sequence of steps for 1304 01:11:24,960 --> 01:11:28,450 thinking about how much is it worth to play the game. 1305 01:11:28,450 --> 01:11:33,110 So this is the amount of money that you would expect given 1306 01:11:33,110 --> 01:11:37,460 that the number of red in the bag, which you don't know, 1307 01:11:37,460 --> 01:11:39,750 were 0, 1, 2, 3, or 4. 1308 01:11:39,750 --> 01:11:41,560 That's this row. 1309 01:11:41,560 --> 01:11:45,360 What's the probability, what's the expected value of the 1310 01:11:45,360 --> 01:11:49,410 amount of money you would get, and that happens? 1311 01:11:49,410 --> 01:11:51,390 Well I have to use Bayes' rule. 1312 01:11:54,020 --> 01:11:57,810 What I need to do is I have to take this probability times 1313 01:11:57,810 --> 01:12:00,670 that amount to get that dollar value. 1314 01:12:00,670 --> 01:12:08,530 So over here, in the event that there are 4 reds in the 1315 01:12:08,530 --> 01:12:13,435 bag, I'm expecting to get $20 but that's only 1/5 likely. 1316 01:12:16,330 --> 01:12:16,590 Right? 1317 01:12:16,590 --> 01:12:20,440 Because there don't have to be 4 reds in the bag. 1318 01:12:20,440 --> 01:12:23,920 So I multiply the 1/5 times the $20, and I get $4. 1319 01:12:23,920 --> 01:12:27,200 So my expected outcome for this trial is $4. 1320 01:12:29,710 --> 01:12:33,490 Here, I'm expecting to make $10 if I knew that there were 2 1321 01:12:33,490 --> 01:12:34,390 reds in the bag. 1322 01:12:34,390 --> 01:12:36,070 But I don't know that there's 2 reds in the bag, there's a 1323 01:12:36,070 --> 01:12:40,340 1/5 probability there's 2 reds in the bag. 1324 01:12:40,340 --> 01:12:44,465 So 1/5 of my expected amount of money, which is $10, is $2. 1325 01:12:47,090 --> 01:12:49,970 So then in order to figure out my expected amount of money I 1326 01:12:49,970 --> 01:12:53,630 just add these all up, marginalizing. 
1327 01:12:53,630 --> 01:12:55,110 And I get the [UNINTELLIGIBLE] 1328 01:12:55,110 --> 01:12:58,470 4 plus 3 is 7, plus 2 is 9, plus 1 is 10. 1329 01:12:58,470 --> 01:13:02,470 So this theory says that if I can regard the person who put 1330 01:13:02,470 --> 01:13:07,260 the LEGOs in the bag as being completely random, I should 1331 01:13:07,260 --> 01:13:12,110 expect to make $10 on the experiment. 1332 01:13:12,110 --> 01:13:14,180 So that means you should be willing to pay $10. 1333 01:13:16,840 --> 01:13:20,400 Because on average, you'll get back $10. 1334 01:13:20,400 --> 01:13:22,330 If you wanted to make a profit, you ought to be 1335 01:13:22,330 --> 01:13:25,040 willing to pay $9. 1336 01:13:25,040 --> 01:13:25,350 Right? 1337 01:13:25,350 --> 01:13:28,240 Because then you would pay $9 expecting to get $10. 1338 01:13:28,240 --> 01:13:30,630 If you really would like to make a loss, right? 1339 01:13:30,630 --> 01:13:33,760 Then you should pay $11. 1340 01:13:33,760 --> 01:13:34,499 Yeah? 1341 01:13:34,499 --> 01:13:36,594 AUDIENCE: Why do we assume that these 1342 01:13:36,594 --> 01:13:37,493 events are equally likely? 1343 01:13:37,493 --> 01:13:39,990 PROFESSOR: Completely arbitrary. 1344 01:13:39,990 --> 01:13:44,210 So there are theories, more advanced theories, for how you 1345 01:13:44,210 --> 01:13:46,200 would make that choice. 1346 01:13:46,200 --> 01:13:49,910 So for example, if in your head you thought that the person 1347 01:13:49,910 --> 01:13:55,770 just took a large collection of LEGO parts and reached in, 1348 01:13:55,770 --> 01:13:59,390 then you would think that the number of red and white might 1349 01:13:59,390 --> 01:14:03,820 depend on the number that started out in the bin. 1350 01:14:03,820 --> 01:14:05,680 But I don't think that's probably true, right? 1351 01:14:05,680 --> 01:14:08,010 The person was probably looking at them and saying, oh, 1352 01:14:08,010 --> 01:14:10,800 throw in one red, throw in one white. 
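The marginalization described above can be checked with a few lines of Python. As before, the $5-per-red payoff is an assumption inferred from the lecture's $20 and $10 figures:

```python
# Expected winnings under the uniform "know nothing" prior.
# Assumption: $5 per red part, so 4 reds pay $20 and 2 reds pay $10.

cases = [0, 1, 2, 3, 4]
prior = {s: 1 / 5 for s in cases}    # all 5 cases assumed equally likely
payoff = {s: 5 * s for s in cases}   # dollars won given s reds

# Weight each conditional payoff by its probability and add them up:
# 1/5 * $20 = $4, 1/5 * $15 = $3, and so on; the total comes to $10.
expected = sum(prior[s] * payoff[s] for s in cases)
print(expected)  # 10.0
```

On average you get back $10, which is why $10 is the break-even price for playing the game.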
1353 01:14:10,800 --> 01:14:14,540 So you need a theory for doing that, and I'm saying that in 1354 01:14:14,540 --> 01:14:18,940 the absence of any other information, let me assume that 1355 01:14:18,940 --> 01:14:22,210 those are equally likely and see what the consequence of 1356 01:14:22,210 --> 01:14:23,050 that would be. 1357 01:14:23,050 --> 01:14:26,640 The consequence of assuming that is that I should expect 1358 01:14:26,640 --> 01:14:29,800 to get $10 back. 1359 01:14:29,800 --> 01:14:34,770 What happens if you pull out a red? 1360 01:14:34,770 --> 01:14:36,960 As we did. 1361 01:14:36,960 --> 01:14:39,330 How does that affect things? 1362 01:14:39,330 --> 01:14:43,430 Well, it increases the bottom line. 1363 01:14:43,430 --> 01:14:47,320 I start out again with the assumption that all 5 cases 1364 01:14:47,320 --> 01:14:49,810 are equally likely. 1365 01:14:49,810 --> 01:14:54,440 Now I have to ask, in each case, how likely is it that the one 1366 01:14:54,440 --> 01:14:57,540 that we pulled out was red? 1367 01:14:57,540 --> 01:15:01,520 Well, it's not very likely that the one that I pulled out was 1368 01:15:01,520 --> 01:15:05,340 red if they were all white. 1369 01:15:05,340 --> 01:15:07,180 The probability of that happening is 0. 1370 01:15:10,320 --> 01:15:12,690 What's the probability, if there were 2, that the person 1371 01:15:12,690 --> 01:15:13,740 pulled out a red? 1372 01:15:13,740 --> 01:15:16,880 Well, 2 of them were red, 2 of them were white, so 2 out of 4 1373 01:15:16,880 --> 01:15:24,550 cases would have shown a red being pulled out. 1374 01:15:24,550 --> 01:15:27,290 So this line then tells me how likely it is 1375 01:15:27,290 --> 01:15:30,670 that the red was pulled. 1376 01:15:30,670 --> 01:15:31,260 OK. 1377 01:15:31,260 --> 01:15:34,020 Then what I want to do is think about what's the 1378 01:15:34,020 --> 01:15:38,110 probability that I pulled out a red, and there were 1379 01:15:38,110 --> 01:15:39,530 0, 1, 2, 3, or 4. 
1380 01:15:39,530 --> 01:15:45,350 So I multiply 1/5 times 0/4 to get 0/20, 1/5 times 1/4 to get 1381 01:15:45,350 --> 01:15:50,120 1/20, 1/5 times 2/4 to get 2/20. 1382 01:15:50,120 --> 01:15:52,220 So those are the probabilities of each 1383 01:15:52,220 --> 01:15:54,300 individual event happening. 1384 01:15:54,300 --> 01:15:57,220 But they don't sum to 1. 1385 01:15:57,220 --> 01:15:59,670 So the next step is I have to make them sum to 1. 1386 01:15:59,670 --> 01:16:01,380 The sum of these is 1/2. 1387 01:16:01,380 --> 01:16:05,270 So I make them sum to 1 this way. 1388 01:16:05,270 --> 01:16:10,170 So now what's happened is it's relatively more likely, 4 out 1389 01:16:10,170 --> 01:16:14,200 of 10, that this case happened than that case. 1390 01:16:14,200 --> 01:16:19,890 I know for sure, for example, that there are not 4 whites. 1391 01:16:19,890 --> 01:16:22,360 The probability of 4 whites is 0-- 1392 01:16:22,360 --> 01:16:25,400 0 out of 10. 1393 01:16:25,400 --> 01:16:29,730 So what I've done is I've skewed the distribution toward 1394 01:16:29,730 --> 01:16:34,770 more red by learning that there's at least 1 red; I now have 1395 01:16:34,770 --> 01:16:36,650 additional information. 1396 01:16:36,650 --> 01:16:39,570 These were not equally likely. 1397 01:16:39,570 --> 01:16:41,670 In fact, the ones with more red were 1398 01:16:41,670 --> 01:16:43,520 relatively more likely. 1399 01:16:43,520 --> 01:16:46,480 So if I compute this probability times that 1400 01:16:46,480 --> 01:16:50,580 expected amount, I now get a much bigger answer for the 1401 01:16:50,580 --> 01:16:54,100 high number of reds. 1402 01:16:54,100 --> 01:16:57,350 So I still get 0, just like I did before, for this case, 1403 01:16:57,350 --> 01:16:59,540 because there are no reds in the bag. 1404 01:16:59,540 --> 01:17:02,930 But now it's much more likely that they're all red, because 1405 01:17:02,930 --> 01:17:05,600 I know there was at least 1 red. 
1406 01:17:05,600 --> 01:17:08,740 And then the answer comes out $15. 1407 01:17:08,740 --> 01:17:15,620 So my overall assessment: don't go to Vegas. 1408 01:17:18,350 --> 01:17:25,710 You could have made a lot more money by offering $13. 1409 01:17:25,710 --> 01:17:27,380 Because on average, you should've 1410 01:17:27,380 --> 01:17:30,420 expected to make $15. 1411 01:17:30,420 --> 01:17:34,490 OK, so what I wanted to do here is go through a 1412 01:17:34,490 --> 01:17:39,130 specific example of how you can speak quantitatively about 1413 01:17:39,130 --> 01:17:40,540 things that are uncertain. 1414 01:17:40,540 --> 01:17:43,170 And that's the theme for the rest of the course.
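The whole update, from the uniform prior through the normalization to the posterior expected value, can be sketched in Python as a sanity check. Exact fractions are used so the 0/20, 1/20, ... and 0/10, 1/10, ... values from the lecture come out exactly, and the $5-per-red payoff is the same assumption as before:

```python
from fractions import Fraction

# Bayesian update after one red part is drawn from the bag.
# Assumption: $5 per red, uniform 1/5 prior over 0..4 reds among 4 parts.

cases = [0, 1, 2, 3, 4]
prior = {s: Fraction(1, 5) for s in cases}
likelihood = {s: Fraction(s, 4) for s in cases}  # P(draw red | s reds)

# Joint probability of "s reds AND a red was drawn": 0/20, 1/20, ..., 4/20.
joint = {s: prior[s] * likelihood[s] for s in cases}

# The joint probabilities sum to 1/2, not 1, so normalize them
# to get the posterior: 0/10, 1/10, 2/10, 3/10, 4/10.
total = sum(joint.values())
posterior = {s: joint[s] / total for s in cases}

# Posterior expected winnings: seeing one red raises the value from $10 to $15.
expected = sum(posterior[s] * 5 * s for s in cases)
print(expected)  # 15
```

Note how the posterior puts probability 0 on the all-white case and 4/10 on the all-red case, which is exactly the skew toward more red that the lecture describes.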