The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JOHN TSITSIKLIS: So here's the agenda for today. We're going to do a very quick review. And then we're going to introduce some very important concepts. The idea is that information is always partial. And the question is: what do we do to probabilities if we have some partial information about the random experiment? We're going to introduce the important concept of conditional probability. And then we will see three very useful ways in which it is used. These ways basically correspond to divide-and-conquer methods for breaking up problems into simpler pieces.
We will also see one more fundamental tool, which allows us to use conditional probabilities to do inference: if we get a little bit of information about some phenomenon, what can we infer about the things that we have not seen? So, our quick review. In setting up a model of a random experiment, the first thing to do is to come up with a list of all the possible outcomes of the experiment. That list is what we call the sample space. It's a set, and the elements of the sample space are all the possible outcomes. Those possible outcomes must be distinguishable from each other. They're mutually exclusive: either one happens or the other happens, but not both. And they are collectively exhaustive, that is, no matter what, the outcome of the experiment is going to be an element of the sample space. We also discussed last time that there's an element of art in how to choose your sample space, depending on how much detail you want to capture. This is usually the easy part.
Then the more interesting part is to assign probabilities to our model, that is, to make some statements about what we believe to be likely and what we believe to be unlikely. The way we do that is by assigning probabilities to subsets of the sample space. So if we have our sample space here, we may have a subset A. And we assign to that subset a number P(A), which is the probability that this event happens: when we do the experiment and get an outcome, P(A) is the probability that the outcome happens to fall inside that event. We have certain rules that probabilities should satisfy. They're non-negative. The probability of the overall sample space is equal to one, which expresses the fact that we are certain that, no matter what, the outcome is going to be an element of the sample space. Well, if we set it up right, so that it exhausts all possibilities, this should be the case. And then there's another interesting property of probabilities, which concerns two events, or two subsets, that are disjoint, when we're interested in the probability that one or the other happens, that is, that the outcome belongs to A or belongs to B.
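As a rough illustration of these rules, here is a minimal Python sketch, assuming a finite sample space in which each outcome carries an explicit probability. It uses two rolls of a four-sided die with equal weights (the example that comes up later in this lecture); the names `omega`, `p`, and `P` are our own, not notation from the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs of rolls of a four-sided die.
# The outcomes are mutually exclusive and collectively exhaustive.
omega = set(product(range(1, 5), repeat=2))

# Probability law: equal weight on every outcome (an assumption for
# illustration; any non-negative weights summing to 1 would do).
p = {w: Fraction(1, len(omega)) for w in omega}

def P(event):
    """Probability of an event (a subset of omega): sum over its outcomes."""
    return sum((p[w] for w in event), Fraction(0))

A = {w for w in omega if w[0] == 1}   # first roll is 1
B = {w for w in omega if w[0] == 4}   # first roll is 4, disjoint from A

assert all(P({w}) >= 0 for w in omega)   # non-negativity
assert P(omega) == 1                     # normalization
assert A.isdisjoint(B)
assert P(A | B) == P(A) + P(B)           # additivity (A | B is Python set union)
print(P(A), P(B), P(A | B))              # 1/4 1/4 1/2
```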
For disjoint events, the total probability of these two, taken together, is just the sum of their individual probabilities. So probabilities behave like masses: the mass of the object consisting of A and B is the sum of the masses of these two objects. Or you can think of probabilities as areas. They have, again, the same property: the area of A together with B is the area of A plus the area of B. But as we discussed at the end of last lecture, it's useful to have in our hands a more general version of this additivity property, which says the following. Take a sequence of sets, A1, A2, A3, A4, and so on, and put all of those sets together; it's an infinite sequence. We ask for the probability that the outcome falls somewhere in this infinite union, that is, the probability that the outcome belongs to one of these sets. Assuming that the sets are disjoint, we can again find the probability of the overall set by adding up the probabilities of the individual sets. So this is a nice and simple property. But it's a little more subtle than you might think.
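Written out, the rules reviewed so far read roughly as follows, using Ω for the sample space (our notation; in the lecture it is drawn as a box on the board). The last line is the countable additivity property just described.

```latex
\begin{aligned}
&\text{Nonnegativity:} && P(A) \ge 0 \text{ for every event } A,\\
&\text{Normalization:} && P(\Omega) = 1,\\
&\text{Countable additivity:} && P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots\\
& && \text{whenever } A_1, A_2, A_3, \ldots \text{ are disjoint.}
\end{aligned}
```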
And let's see what's going on by considering the following example. We had an example last time where we take our sample space to be the unit square. And we said: let's consider a probability law that says that the probability of a subset is just the area of that subset. So let's consider this probability law. OK. Now, let me just draw it this way: the unit square is the union of one-element sets, one for each of its points. So the unit square is made up of the union of the various points inside the square, the union over all x's and y's. OK? The square is made up of all the points that it contains. And now let's do a calculation. One is the probability of our overall sample space, which is the unit square. Now, the unit square is the union of these one-element sets, so, according to our additivity axiom, its probability is the sum of the probabilities of all of these one-element sets. Now, what is the probability of a one-element set? What is the probability of this one-element set?
What's the probability that our outcome is exactly that particular point? Well, it's the area of that set, which is zero. So it's just the sum of zeros, and by any reasonable definition the sum of zeros is zero. So we just proved that one is equal to zero. OK. Either probability theory is dead, or there is some mistake in the derivation that I did. The mistake is quite subtle, and it comes at the step where we applied the additivity axiom by saying that the unit square is the union of all those sets. Can we really apply our additivity axiom there? Here's the catch. The additivity axiom applies to the case where we have a sequence of disjoint events and we take their union. Is this a sequence of sets? Can you make up the whole unit square by taking a sequence of elements inside it? Well, if you try, if you start listing a sequence of one-element sets, that sequence will never be able to exhaust the whole unit square. There's a deeper reason behind that: infinite sets are not all of the same size.
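For reference, the faulty chain of reasoning can be written as below. The equality marked with a question mark is the step that misapplies additivity, since the union runs over all points of the square rather than over a sequence of sets.

```latex
\begin{aligned}
1 = P(\text{unit square})
  &= P\Big(\bigcup_{(x,y)} \{(x,y)\}\Big) \\
  &\overset{?}{=} \sum_{(x,y)} P\big(\{(x,y)\}\big) \\
  &= \sum_{(x,y)} 0 = 0.
\end{aligned}
```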
The integers are an infinite set, and you can arrange the integers in a sequence. But a continuous set like the unit square is a bigger set. It's so-called uncountable: it has more elements than any sequence could have. So this union here is not of the kind where we would have a sequence of events. It's a different kind of union, one that involves many, many more sets. So the countable additivity axiom does not apply in this case, because we're not dealing with a sequence of sets. And so this is the incorrect step. At some level you might think that this is puzzling and awfully confusing. On the other hand, if you think about areas the way you're used to them from calculus, there's nothing mysterious about it. Every point in the unit square has zero area. When you put all the points together, they make up something that has area one. So there shouldn't be any mystery behind it.
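To make "arranging the integers in a sequence" concrete, here is a small Python sketch (the function names are our own) that lists every integer at a finite position: 0, 1, -1, 2, -2, and so on. No analogous listing exists for the points of the unit square, which is why countable additivity cannot be applied there.

```python
from itertools import count, islice

def integers_as_sequence():
    """Yield every integer exactly once: 0, 1, -1, 2, -2, 3, -3, ..."""
    yield 0
    for n in count(1):
        yield n
        yield -n

def position(k):
    """Index at which integer k appears in integers_as_sequence()."""
    if k == 0:
        return 0
    return 2 * k - 1 if k > 0 else -2 * k

print(list(islice(integers_as_sequence(), 7)))  # [0, 1, -1, 2, -2, 3, -3]
print(position(-3))                             # 6
```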
Now, one interesting thing that this discussion tells us, especially the fact that a single-element set has zero area, is the following: individual points have zero probability. After you do the experiment and you observe the outcome, it's going to be an individual point. So what happened in that experiment is something that initially you thought had zero probability of occurring. You happen to get some particular numbers, and you say, "Well, in the beginning, what did I think about those specific numbers? I thought they had zero probability. And yet those particular numbers did occur." So one moral from this is that zero probability does not mean impossible. It just means extremely, extremely unlikely by itself. So zero-probability things do happen. In fact, in such continuous models, everything that happens is a zero-probability outcome. And the bumper-sticker version of this is: always expect the unexpected. Yes?

AUDIENCE: [INAUDIBLE]

JOHN TSITSIKLIS: Well, probability is supposed to be a real number. So it's either zero or it's a positive number.
So you can think of the probabilities of things just close to that point, and those probabilities are tiny and close to zero. That's how we're going to interpret probabilities in continuous models. But this is two chapters ahead. Yeah?

AUDIENCE: How do we interpret probability of zero? If we can use models that way, then how about probability of one? Does it mean it's extremely likely but not necessarily certain?

JOHN TSITSIKLIS: That's also the case. For example, in this continuous model, if you ask me for the probability that (x, y) is different from (0, 0), this event is the whole square except for one point. So its area is going to be one. But this event is not entirely certain, because the (0, 0) outcome is also possible. So again, probability one means essential certainty, but it still allows the possibility that the outcome might be outside that set. These are some of the weird things that happen when you have continuous models.
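A rough numerical sketch of the idea that single points have zero probability while small neighborhoods have small positive probability, assuming Python's `random.random()` as a stand-in for a uniform draw from [0, 1). Floating-point numbers form a finite grid, so this only approximates the continuous model; the target point (0.5, 0.5) and the half-width `eps` are arbitrary choices.

```python
import random

def estimate(n_samples, point=(0.5, 0.5), eps=0.05, seed=1):
    """Draw points uniformly from the unit square and return
    (number of exact hits on `point`, fraction landing within a square
    of side 2*eps centered at `point`)."""
    rng = random.Random(seed)
    px, py = point
    exact_hits = 0
    near_hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if (x, y) == point:
            exact_hits += 1
        if abs(x - px) <= eps and abs(y - py) <= eps:
            near_hits += 1
    return exact_hits, near_hits / n_samples

exact, near_freq = estimate(100_000)
print(exact)                          # exact hits on one point: practically never
print(near_freq, (2 * 0.05) ** 2)     # empirical frequency vs. area of the small square
```

Shrinking `eps` shrinks the area (2·eps)², and with it the probability, toward zero, which matches the interpretation given in the answer above.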
And that's why we start this class with discrete models, on which we will be spending the next couple of weeks. OK. So now, once we have set up our probability model and we have a legitimate probability law that has these properties, the rest is usually simple. Somebody asks you to calculate the probability of some event. You were told something about the probability law, for example, that probabilities are equal to areas, and then you just need to calculate. In these types of examples, somebody would give you a set and you would have to calculate the area of that set. So the rest is just calculation, and simple. All right, so now it's time to start with our main business for today. And the starting point is the following. You know something about the world. And based on what you know, you set up a probability model and you write down probabilities for the different outcomes. Then something happens, and somebody tells you a little more about the world, giving you some new information.
This new information, in general, should change your beliefs about what happened or what may happen. So whenever we're given new information, some partial information about the outcome of the experiment, we should revise our beliefs. And conditional probabilities are just the probabilities that apply after the revision of our beliefs, when we're given some information. So let's make this into a numerical example. Inside the sample space, let's say this part (A outside B) has probability 3/6, this part (the overlap of A and B) has 2/6, and that part (B outside A) has 1/6. I guess that means that out here we have zero probability. So these were our initial beliefs about the outcome of the experiment. Suppose now that someone comes and tells you that event B occurred. They don't tell you the full outcome of the experiment; they just tell you that the outcome is known to lie inside this set B. Well, then you should certainly change your beliefs in some way. And your new beliefs about what is likely to occur and what is not are going to be denoted by the notation P(A | B).
This is the conditional probability that the event A is going to occur: the probability that the outcome is going to fall inside the set A, given that we are told, and we're sure, that the outcome lies inside the event B. Now, once you're told that the outcome lies inside the event B, our old sample space is in some ways irrelevant. We then have a new sample space, which is just the set B; we are certain that the outcome is going to be inside B. For example, what is the conditional probability of B given B? It should be one. Given that I told you that B occurred, you're certain that B occurred, so this has unit probability. So here we see an instance of revision of our beliefs. Initially, event B had probability (2+1)/6, that's 1/2. Once we're told that B occurred, the new probability of B is equal to one. OK. How do we revise the probability that A occurs? We are going to have the outcome of the experiment, and we know that it's inside B. So we will either get something here, in B but outside A, and A does not occur, or something inside here, in the overlap, and A does occur.
What's the likelihood that, given that we're inside B, the outcome is inside the overlap? Here's how we're going to think about it. In our initial model, the part of the set B in which A also occurs was twice as likely as the rest of B: outcomes inside the overlap collectively were twice as likely as outcomes out there. So we're going to keep the same proportions and say that, given that we are inside the set B, we still want outcomes inside the overlap to be twice as likely as outcomes outside it. So the probabilities should be in proportion two to one. And these probabilities should add up to one, because together they make up the conditional probability of B. So the conditional probabilities should be 2/3 for being in the overlap and 1/3 for being in the rest of B. That's how we revise our probabilities, and it's an intuitively reasonable way of doing this revision. Let's translate what we did into a definition. The definition says that the conditional probability of A, given that B occurred, is calculated as follows. We look at the total probability of B.
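The revision rule just described (restrict to B, then rescale so that the proportions inside B are preserved) can be sketched in a few lines of Python. The region labels are hypothetical names for the three pieces drawn on the board; the numbers are the ones from the example.

```python
from fractions import Fraction

# Region probabilities from the example; the labels are hypothetical.
prior = {
    "A_not_B": Fraction(3, 6),
    "A_and_B": Fraction(2, 6),
    "B_not_A": Fraction(1, 6),
}
B = {"A_and_B", "B_not_A"}

def condition_on(prior, event):
    """Drop outcomes outside `event` and rescale the rest by a common factor,
    so they sum to 1 while keeping their relative proportions."""
    total = sum(prior[w] for w in event)
    if total == 0:
        raise ValueError("cannot condition on a zero-probability event")
    return {w: prior[w] / total for w in event}

posterior = condition_on(prior, B)
print(posterior["A_and_B"], posterior["B_not_A"])  # 2/3 1/3
```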
And out of that probability inside B, we ask what fraction is assigned to points for which the event A also occurs. That is, the conditional probability of A given B is the probability of A intersection B divided by the probability of B. Does it give us the same numbers as we got with the heuristic argument? Well, in this example, the probability of A intersection B is 2/6, divided by the total probability of B, which is 3/6, and so it's 2/3, which agrees with the answer that we got before. So the formula indeed matches what we were trying to do. One little technical detail: if the event B has zero probability, then we have a ratio that doesn't make sense, so in this case we say that conditional probabilities are not defined. Now you can take this definition, unravel it, and write it in another form: the probability of A intersection B is the probability of B times the conditional probability of A given B. This is just a consequence of the definition, but it has a nice interpretation. Think of probabilities as frequencies. If I do the experiment over and over, what fraction of the time is it going to be the case that both A and B occur?
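In symbols, the definition, the multiplication rule it unravels into, and the numerical check from the example look like this (standard notation, with P(A | B) read as "the probability of A given B").

```latex
\begin{aligned}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}, && \text{defined only when } P(B) > 0,\\
P(A \cap B) &= P(B)\,P(A \mid B) = P(A)\,P(B \mid A), && \text{(second form needs } P(A) > 0\text{)},\\
P(A \mid B) &= \frac{2/6}{3/6} = \frac{2}{3} && \text{in the example.}
\end{aligned}
```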
Well, there's going to be a certain fraction of the time at which B occurs. And out of those times when B occurs, there's going to be a further fraction of the experiments in which A also occurs. So interpret the conditional probability as follows: you only look at those experiments in which B happens to occur, and, out of those experiments, you look at the fraction in which event A also occurs. And there's a symmetrical version of this equality. There's symmetry between the events B and A, so you also have the relation that goes the other way: the probability of A intersection B equals the probability of A times the conditional probability of B given A. OK, so what do we use these conditional probabilities for? First, one comment. Conditional probabilities are just like ordinary probabilities. They're the new probabilities that apply in a new universe where event B is known to have occurred. We had an original probability model. We are told that B occurs. We revise our model. Our new model should still be a legitimate probability model, so it should satisfy all the properties that ordinary probabilities satisfy.
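The frequency interpretation can be checked with a small simulation. This is a sketch under the assumption that the experiment is the three-region example above (probabilities 3/6, 2/6, 1/6, with hypothetical region labels); it counts how often B occurs and, among those trials, how often A also occurs.

```python
import math
import random

def simulate(n_trials, seed=0):
    """Repeat the three-region experiment; return empirical frequencies of
    B, of A among the trials where B occurred, and of A and B together."""
    rng = random.Random(seed)
    regions = ["A_not_B", "A_and_B", "B_not_A"]   # hypothetical labels
    weights = [3, 2, 1]                            # proportional to 3/6, 2/6, 1/6
    n_B = n_AB = 0
    for _ in range(n_trials):
        outcome = rng.choices(regions, weights=weights)[0]
        if outcome in ("A_and_B", "B_not_A"):      # B occurred
            n_B += 1
            if outcome == "A_and_B":               # ... and A occurred too
                n_AB += 1
    freq_B = n_B / n_trials
    freq_A_given_B = n_AB / n_B                    # only trials where B occurred
    freq_A_and_B = n_AB / n_trials
    return freq_B, freq_A_given_B, freq_A_and_B

freq_B, freq_A_given_B, freq_A_and_B = simulate(100_000)
print(freq_A_given_B)                                        # close to 2/3
print(math.isclose(freq_B * freq_A_given_B, freq_A_and_B))   # True
```

The product of the two frequencies equals the joint frequency exactly (up to floating-point rounding), mirroring the multiplication rule.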
So, for example, if A and B are disjoint events, then we know that the probability of A union B is equal to the probability of A plus the probability of B. And now if I tell you that a certain event C occurred, we're placed in a new universe where event C occurred. We have new probabilities for that universe: the conditional probabilities. And conditional probabilities also satisfy this kind of property: the conditional probability of A union B given C is the conditional probability of A given C plus the conditional probability of B given C. This is just our usual additivity axiom, but applied in a new model, in which we were told that event C occurred. So conditional probabilities do not taste or smell any different from ordinary probabilities. Conditional probabilities, given a specific event B, just form a probability law on our sample space. It's a different probability law, but it's still a probability law that has all of the desired properties. OK, so where do conditional probabilities come up? They do come up in quizzes, and they do come up in silly problems. So let's start with this. We have this example from last time.
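As a sanity check that conditioning produces a legitimate probability law, here is a Python sketch on two rolls of a four-sided die with a uniform law. The conditioning event C (sum of the rolls at least 6) and the disjoint events A and B (first roll 4, first roll 3) are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a four-sided die, uniform law (1/16 per pair).
omega = set(product(range(1, 5), repeat=2))
p = {w: Fraction(1, 16) for w in omega}

def P(event):
    return sum((p[w] for w in event), Fraction(0))

def P_given(event, cond):
    """Conditional probability P(event | cond); requires P(cond) > 0."""
    return P(event & cond) / P(cond)

# Hypothetical events chosen only for illustration.
C = {w for w in omega if w[0] + w[1] >= 6}   # sum of the rolls is at least 6
A = {w for w in omega if w[0] == 4}          # first roll is 4
B = {w for w in omega if w[0] == 3}          # first roll is 3 (disjoint from A)

# Normalization and additivity still hold in the conditional universe.
# (In Python, A | B is set union, not conditioning.)
print(P_given(omega, C))                                  # 1
print(P_given(A | B, C), P_given(A, C) + P_given(B, C))   # 5/6 5/6
```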
Two rolls of a four-sided die: all possible pairs of rolls are equally likely, so every element in this square has probability 1/16. All elements are equally likely; that's our original model. Then somebody comes and tells us that the minimum of the two rolls is equal to two. What's that event? The minimum equal to two can happen in several ways: if we get two 2s, or if we get a 2 and something larger. And so this is our new event B; the red event is the event B. Now we want to calculate probabilities inside this new universe. For example, you may be interested in questions about the maximum of the two rolls. In the new universe, what's the probability that the maximum is equal to one? The maximum being equal to one is this black event. And given that we're told that B occurred, this black event cannot happen. So this probability is equal to zero. How about the maximum being equal to two, given event B? OK, we can use the definition here.
427 00:22:01,760 --> 00:22:05,730 It's going to be the probability that the maximum 428 00:22:05,730 --> 00:22:10,590 is equal to two and B occurs, divided by the probability of 429 00:22:10,590 --> 00:22:16,020 B. 430 00:22:16,020 --> 00:22:19,470 OK, what's the event that the maximum is equal to two? 431 00:22:19,470 --> 00:22:20,340 Let's draw it. 432 00:22:20,340 --> 00:22:22,300 This is going to be the blue event. 433 00:22:22,300 --> 00:22:25,950 The maximum is equal to two if we get any 434 00:22:25,950 --> 00:22:28,520 of those blue points. 435 00:22:28,520 --> 00:22:32,310 So the intersection of the two events is the intersection of 436 00:22:32,310 --> 00:22:35,170 the red event and the blue event. 437 00:22:35,170 --> 00:22:37,770 There's only one point in their intersection. 438 00:22:37,770 --> 00:22:39,640 So the probability of that intersection 439 00:22:39,640 --> 00:22:41,080 happening is 1/16. 440 00:22:41,080 --> 00:22:43,740 441 00:22:43,740 --> 00:22:45,160 That's the numerator. 442 00:22:45,160 --> 00:22:47,110 How about the denominator? 443 00:22:47,110 --> 00:22:50,610 The event B consists of five elements, each one of which 444 00:22:50,610 --> 00:22:52,270 had probability of 1/16. 445 00:22:52,270 --> 00:22:54,570 So that's 5/16. 446 00:22:54,570 --> 00:22:58,340 And so the answer is 1/16 divided by 5/16, which is 1/5. 447 00:22:58,340 --> 00:23:02,830 Could we have gotten this answer in a faster way? 448 00:23:02,830 --> 00:23:04,190 Yes. 449 00:23:04,190 --> 00:23:05,560 Here's how it goes. 450 00:23:05,560 --> 00:23:09,060 We're trying to find the conditional probability that 451 00:23:09,060 --> 00:23:13,210 we get this point, given that B occurred. 452 00:23:13,210 --> 00:23:15,570 B consists of five elements. 453 00:23:15,570 --> 00:23:18,250 All of those five elements were equally likely when we 454 00:23:18,250 --> 00:23:22,720 started, so they remain equally likely afterwards.
455 00:23:22,720 --> 00:23:25,180 Because when we define conditional probabilities, we 456 00:23:25,180 --> 00:23:28,110 keep the same proportions inside the set. 457 00:23:28,110 --> 00:23:31,940 So the five red elements were equally likely. 458 00:23:31,940 --> 00:23:35,050 They remain equally likely in the conditional world. 459 00:23:35,050 --> 00:23:39,080 So conditional on event B having happened, each one of these 460 00:23:39,080 --> 00:23:41,580 five elements has the same probability. 461 00:23:41,580 --> 00:23:44,300 So the probability that we actually get this point is 462 00:23:44,300 --> 00:23:46,210 going to be 1/5. 463 00:23:46,210 --> 00:23:48,280 And so that's the shortcut. 464 00:23:48,280 --> 00:23:53,070 More generally, whenever you have a uniform distribution on 465 00:23:53,070 --> 00:23:56,470 your initial sample space, when you condition on an 466 00:23:56,470 --> 00:24:01,000 event, your new distribution is still going to be uniform, 467 00:24:01,000 --> 00:24:05,010 but on the smaller event that we conditioned on. 468 00:24:05,010 --> 00:24:09,780 So we started with a uniform distribution on the big square 469 00:24:09,780 --> 00:24:13,730 and we ended up with a uniform distribution 470 00:24:13,730 --> 00:24:17,230 just on the red points. 471 00:24:17,230 --> 00:24:19,850 Now besides silly problems, however, conditional 472 00:24:19,850 --> 00:24:25,070 probabilities show up in real and interesting situations. 473 00:24:25,070 --> 00:24:27,390 And this example is going to give you some 474 00:24:27,390 --> 00:24:30,430 idea of how that happens. 475 00:24:30,430 --> 00:24:32,250 OK. 476 00:24:32,250 --> 00:24:35,450 Actually, in this example, instead of starting with a 477 00:24:35,450 --> 00:24:39,480 probability model in terms of regular probabilities, I'm 478 00:24:39,480 --> 00:24:43,070 actually going to define the model in terms of conditional 479 00:24:43,070 --> 00:24:43,890 probabilities.
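Returning briefly to the four-sided-die example, here is a minimal Python sketch that checks those numbers by brute-force enumeration. It assumes, as the 1/16 figure implies, that each roll takes a value from 1 to 4; the helper names `P`, `P_given`, `B`, and `max_is` are illustrative and not from the lecture. It applies the definition P(X given B) = P(X and B) / P(B) directly and confirms that each of the five outcomes in B ends up with conditional probability 1/5, i.e., the uniform law stays uniform on B.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of rolls of a four-sided die, all 16 equally likely.
omega = list(product(range(1, 5), repeat=2))
prob = {w: Fraction(1, 16) for w in omega}

def P(event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum(prob[w] for w in omega if event(w))

def P_given(event, cond):
    """Conditional probability P(event | cond) = P(event and cond) / P(cond)."""
    return P(lambda w: event(w) and cond(w)) / P(cond)

def B(w):
    """Conditioning event: the minimum of the two rolls equals 2."""
    return min(w) == 2

def max_is(m):
    """Event 'the maximum of the two rolls equals m'."""
    return lambda w: max(w) == m

print(P(B))                     # 5/16
print(P_given(max_is(1), B))    # 0
print(P_given(max_is(2), B))    # 1/5

# Uniform stays uniform: every outcome inside B gets the same conditional probability.
cond_probs = {w: P_given(lambda v, w=w: v == w, B) for w in omega if B(w)}
print(set(cond_probs.values()))  # {Fraction(1, 5)}
```

Using `Fraction` keeps the arithmetic exact, so the results can be compared with the hand calculation without rounding.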
480 00:24:43,890 --> 00:24:45,880 And we'll see how this is done. 481 00:24:45,880 --> 00:24:48,330 So here's the story. 482 00:24:48,330 --> 00:24:52,210 There may be an airplane flying up in the sky, in a 483 00:24:52,210 --> 00:24:55,400 particular sector of the sky that you're watching. 484 00:24:55,400 --> 00:24:57,950 Sometimes there is one, sometimes there isn't. 485 00:24:57,950 --> 00:25:01,760 And from experience you know that when you look up, there's 486 00:25:01,760 --> 00:25:04,400 a five percent probability that a plane is flying up 487 00:25:04,400 --> 00:25:09,670 there and a 95% probability that there's no plane up there. 488 00:25:09,670 --> 00:25:14,930 So event A is the event that the plane is flying out there. 489 00:25:14,930 --> 00:25:19,140 Now you bought this wonderful radar that looks up. 490 00:25:19,140 --> 00:25:23,300 And you're told in the manufacturer's specs that, if 491 00:25:23,300 --> 00:25:27,310 there is a plane out there, your radar is going to 492 00:25:27,310 --> 00:25:30,090 register something, a blip on the screen, 493 00:25:30,090 --> 00:25:32,940 with probability 99%. 494 00:25:32,940 --> 00:25:35,540 And it will not register anything with 495 00:25:35,540 --> 00:25:37,500 probability one percent. 496 00:25:37,500 --> 00:25:43,890 So this particular part of the picture is a self-contained 497 00:25:43,890 --> 00:25:50,280 probability model of what your radar does in a world where a 498 00:25:50,280 --> 00:25:52,530 plane is out there. 499 00:25:52,530 --> 00:25:55,380 So I'm telling you that the plane is out there. 500 00:25:55,380 --> 00:25:58,240 So we're now dealing with conditional probabilities 501 00:25:58,240 --> 00:26:00,920 because I gave you some particular information.
502 00:26:00,920 --> 00:26:04,120 Given this information that the plane is out there, that's 503 00:26:04,120 --> 00:26:07,770 how your radar is going to behave: with probability 99% it is 504 00:26:07,770 --> 00:26:10,320 going to detect it, and with probability one percent it is 505 00:26:10,320 --> 00:26:11,620 going to miss it. 506 00:26:11,620 --> 00:26:14,100 So this piece of the picture is a self-contained 507 00:26:14,100 --> 00:26:15,060 probability model. 508 00:26:15,060 --> 00:26:17,130 The probabilities add up to one. 509 00:26:17,130 --> 00:26:20,300 But it's a piece of a larger model. 510 00:26:20,300 --> 00:26:22,820 Similarly, there's the other possibility. 511 00:26:22,820 --> 00:26:27,980 Maybe a plane is not up there, and the manufacturer's specs 512 00:26:27,980 --> 00:26:32,630 tell you something about false alarms. 513 00:26:32,630 --> 00:26:37,490 A false alarm is the situation where the plane is not there, 514 00:26:37,490 --> 00:26:41,190 but for some reason your radar picked up some noise or 515 00:26:41,190 --> 00:26:43,700 whatever and shows a blip on the screen. 516 00:26:43,700 --> 00:26:46,790 And suppose that this happens with probability ten percent. 517 00:26:46,790 --> 00:26:49,170 Whereas with probability 90% your radar 518 00:26:49,170 --> 00:26:51,220 gives the correct answer. 519 00:26:51,220 --> 00:26:55,430 So this is sort of a model of what's going to happen with 520 00:26:55,430 --> 00:26:59,430 respect to both the plane, for which we're given probabilities, 521 00:26:59,430 --> 00:27:02,000 and the radar, for which we're given probabilities about how it 522 00:27:02,000 --> 00:27:04,120 behaves. 523 00:27:04,120 --> 00:27:07,740 So here I have indirectly specified the probability law 524 00:27:07,740 --> 00:27:10,810 in our model by starting with conditional probabilities as 525 00:27:10,810 --> 00:27:13,670 opposed to starting with ordinary probabilities.
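As a sketch of what specifying a model through conditional probabilities amounts to, the tree just described can be encoded directly and each leaf probability obtained by multiplying the probabilities along its path. The dictionary layout and the labels `"plane"`, `"no plane"`, `"blip"`, `"no blip"` are illustrative choices, not part of the lecture; the numbers are the ones given above.

```python
# Radar model, specified through conditional probabilities (numbers from the lecture).
p_scenario = {"plane": 0.05, "no plane": 0.95}     # P(A) and P(A^c)
p_blip_given = {"plane": 0.99, "no plane": 0.10}   # P(B | A) detection, P(B | A^c) false alarm

# Each leaf of the tree gets the product of the probabilities along its path.
leaves = {}
for scenario, p_s in p_scenario.items():
    p_b = p_blip_given[scenario]
    leaves[(scenario, "blip")] = p_s * p_b
    leaves[(scenario, "no blip")] = p_s * (1 - p_b)

for leaf, p in leaves.items():
    print(leaf, round(p, 4))

# The four leaves form a probability law on the full model: they sum to 1.
print("total:", round(sum(leaves.values()), 10))
```

The two conditional sub-models each sum to one on their own, and multiplying them by the scenario probabilities stitches them into a single model whose four outcomes also sum to one.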
526 00:27:13,670 --> 00:27:17,160 Can we derive ordinary probabilities starting from 527 00:27:17,160 --> 00:27:18,740 the conditional ones? 528 00:27:18,740 --> 00:27:20,340 Yeah, we certainly can. 529 00:27:20,340 --> 00:27:25,810 Let's look at this event, A intersection B, which is the 530 00:27:25,810 --> 00:27:31,160 event up here, that there is a plane and our 531 00:27:31,160 --> 00:27:33,750 radar picks it up. 532 00:27:33,750 --> 00:27:35,760 How can we calculate this probability? 533 00:27:35,760 --> 00:27:38,600 Well, we use the definition of conditional probabilities, and 534 00:27:38,600 --> 00:27:41,430 this is the probability of A times the conditional 535 00:27:41,430 --> 00:27:50,260 probability of B given A. So it's 0.05 times 0.99. 536 00:27:50,260 --> 00:27:53,290 And the answer, in case you care, 537 00:27:53,290 --> 00:27:56,730 is 0.0495. 538 00:27:56,730 --> 00:27:57,650 OK. 539 00:27:57,650 --> 00:28:01,370 So we can calculate the probabilities of final 540 00:28:01,370 --> 00:28:05,120 outcomes, which are the leaves of the tree, by using the 541 00:28:05,120 --> 00:28:07,250 probabilities that we have along the 542 00:28:07,250 --> 00:28:09,000 branches of the tree. 543 00:28:09,000 --> 00:28:11,950 So essentially, what we ended up doing was to multiply the 544 00:28:11,950 --> 00:28:13,700 probability of this branch times the 545 00:28:13,700 --> 00:28:17,220 probability of that branch. 546 00:28:17,220 --> 00:28:20,690 Now, how about the answer to this question? 547 00:28:20,690 --> 00:28:25,350 What is the probability that our radar is 548 00:28:25,350 --> 00:28:28,660 going to register something? 549 00:28:28,660 --> 00:28:32,800 OK, this is an event that can happen in multiple ways. 550 00:28:32,800 --> 00:28:38,020 It's the event that consists of this outcome.
551 00:28:38,020 --> 00:28:41,640 There is a plane and the radar registers something, together 552 00:28:41,640 --> 00:28:46,440 with this outcome: there is no plane but the radar still 553 00:28:46,440 --> 00:28:48,470 registers something. 554 00:28:48,470 --> 00:28:52,650 So to find the probability of this event, we need the 555 00:28:52,650 --> 00:28:56,940 individual probabilities of the two outcomes. 556 00:28:56,940 --> 00:29:00,780 For the first outcome, we already calculated it. 557 00:29:00,780 --> 00:29:03,870 For the second outcome, the probability that this happens 558 00:29:03,870 --> 00:29:08,480 is going to be this probability, 0.95, times 0.10, 559 00:29:08,480 --> 00:29:11,280 which is the conditional probability for taking this 560 00:29:11,280 --> 00:29:15,070 branch, given that there was no plane out there. 561 00:29:15,070 --> 00:29:18,080 So we just add the numbers: 562 00:29:18,080 --> 00:29:26,950 0.05 times 0.99 plus 0.95 times 0.1, and the 563 00:29:26,950 --> 00:29:31,720 final answer is 0.1445. 564 00:29:31,720 --> 00:29:32,410 OK. 565 00:29:32,410 --> 00:29:35,730 And now here's the interesting question. 566 00:29:35,730 --> 00:29:41,480 Given that your radar recorded something, how likely is it 567 00:29:41,480 --> 00:29:45,070 that there is an airplane up there? 568 00:29:45,070 --> 00:29:46,810 Your radar registering something 569 00:29:46,810 --> 00:29:48,730 can be caused by two things. 570 00:29:48,730 --> 00:29:52,390 Either there's a plane there, and your radar did its job. 571 00:29:52,390 --> 00:29:57,400 Or there was nothing, but your radar fired a false alarm. 572 00:29:57,400 --> 00:30:01,690 What's the probability that this is the case as opposed to 573 00:30:01,690 --> 00:30:05,370 that being the case? 574 00:30:05,370 --> 00:30:06,460 OK.
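The three radar calculations (the joint probability, the total probability of a blip, and the conditional probability just asked about) fit in a few lines of Python. This is a minimal sketch using the lecture's numbers; the variable names are hypothetical.

```python
p_A = 0.05            # P(A): a plane is present
p_B_given_A = 0.99    # P(B | A): radar registers when a plane is there
p_B_given_Ac = 0.10   # P(B | A^c): false alarm

p_A_and_B = p_A * p_B_given_A                    # multiplication rule
p_B = p_A_and_B + (1 - p_A) * p_B_given_Ac       # add the two ways B can happen
p_A_given_B = p_A_and_B / p_B                    # definition of conditional probability

print(f"P(A and B) = {p_A_and_B:.4f}")    # 0.0495
print(f"P(B)       = {p_B:.4f}")          # 0.1445
print(f"P(A | B)   = {p_A_given_B:.4f}")  # 0.3426
```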
575 00:30:06,460 --> 00:30:10,510 The intuitive shortcut would be 576 00:30:10,510 --> 00:30:12,930 this: 577 00:30:12,930 --> 00:30:15,820 you look at the relative odds of these two elements and 578 00:30:15,820 --> 00:30:19,570 you use them to find out how much more likely it is to be 579 00:30:19,570 --> 00:30:21,730 in one of them as opposed to the other. 580 00:30:21,730 --> 00:30:24,240 But instead of doing this, let's just write down the 581 00:30:24,240 --> 00:30:26,570 definition and use it. 582 00:30:26,570 --> 00:30:30,480 It's the probability of A and B happening, divided by the 583 00:30:30,480 --> 00:30:34,250 probability of B. This is just our definition of conditional 584 00:30:34,250 --> 00:30:35,540 probabilities. 585 00:30:35,540 --> 00:30:39,300 Now we have already found the numerator. 586 00:30:39,300 --> 00:30:42,450 We have already calculated the denominator. 587 00:30:42,450 --> 00:30:46,440 So we take the ratio of these two numbers, 0.0495 divided by 0.1445, and we find the 588 00:30:46,440 --> 00:30:47,650 final answer, 589 00:30:47,650 --> 00:30:54,490 which is about 0.34. 590 00:30:54,490 --> 00:30:55,980 OK. 591 00:30:55,980 --> 00:30:59,040 There's this slightly curious thing that's 592 00:30:59,040 --> 00:31:02,270 happened in this example. 593 00:31:02,270 --> 00:31:08,380 Doesn't this number feel a little too low? 594 00:31:08,380 --> 00:31:10,700 My radar... 595 00:31:10,700 --> 00:31:13,820 So this is a conditional probability, given that my 596 00:31:13,820 --> 00:31:17,110 radar said there is something out there, that there is 597 00:31:17,110 --> 00:31:19,200 indeed something there. 598 00:31:19,200 --> 00:31:21,960 So it's sort of the probability that our radar 599 00:31:21,960 --> 00:31:24,560 gave the correct answer. 600 00:31:24,560 --> 00:31:28,580 Now, the specs of our radar were pretty good.
601 00:31:28,580 --> 00:31:31,460 In this situation, it gives you the correct 602 00:31:31,460 --> 00:31:34,160 answer 99% of the time. 603 00:31:34,160 --> 00:31:36,020 In this situation, it gives you the correct 604 00:31:36,020 --> 00:31:38,400 answer 90% of the time. 605 00:31:38,400 --> 00:31:39,730 So you would think that your radar 606 00:31:39,730 --> 00:31:41,870 is really reliable. 607 00:31:41,870 --> 00:31:47,730 But here the radar recorded something, and the 608 00:31:47,730 --> 00:31:51,900 chance that the answer that you get out of this is the 609 00:31:51,900 --> 00:31:55,180 right one, that is, given that it recorded something, the chance 610 00:31:55,180 --> 00:31:58,970 that there is an airplane out there, is only about 34%. 611 00:31:58,970 --> 00:32:01,980 So you cannot really rely on the measurements from your 612 00:32:01,980 --> 00:32:06,650 radar, even though the specs of the radar were really good. 613 00:32:06,650 --> 00:32:08,620 What's the reason for this? 614 00:32:08,620 --> 00:32:17,730 Well, the reason is that false alarms are pretty common. 615 00:32:17,730 --> 00:32:20,110 Most of the time there's nothing. 616 00:32:20,110 --> 00:32:23,750 And there's a ten percent probability of false alarms. 617 00:32:23,750 --> 00:32:26,640 So there's roughly a ten percent probability that in 618 00:32:26,640 --> 00:32:29,730 any given experiment, you have a false alarm. 619 00:32:29,730 --> 00:32:33,450 And there is about a five percent probability that 620 00:32:33,450 --> 00:32:37,090 something is out there and your radar gets it. 621 00:32:37,090 --> 00:32:41,350 So when your radar records something, it's actually more 622 00:32:41,350 --> 00:32:44,980 likely to be a false alarm rather than 623 00:32:44,980 --> 00:32:46,860 being an actual airplane. 624 00:32:46,860 --> 00:32:49,100 This has probability ten percent roughly.
625 00:32:49,100 --> 00:32:52,000 This has probability roughly five percent. 626 00:32:52,000 --> 00:32:55,130 So conditional probabilities are sometimes 627 00:32:55,130 --> 00:32:58,250 counter-intuitive in terms of the answers that they give. 628 00:32:58,250 --> 00:33:01,210 And you can make similar stories about doctors 629 00:33:01,210 --> 00:33:04,370 interpreting the results of tests. 630 00:33:04,370 --> 00:33:07,560 So you tested positive for a certain disease. 631 00:33:07,560 --> 00:33:11,260 Does it mean that you necessarily have the disease? 632 00:33:11,260 --> 00:33:14,590 Well, if that disease has been eradicated from the face of 633 00:33:14,590 --> 00:33:17,900 the earth, testing positive doesn't mean that you have the 634 00:33:17,900 --> 00:33:21,740 disease, even if the test was designed to be 635 00:33:21,740 --> 00:33:23,320 a pretty good one. 636 00:33:23,320 --> 00:33:28,190 So unfortunately, doctors do sometimes get it wrong too. 637 00:33:28,190 --> 00:33:29,990 And the reasoning that comes up in such 638 00:33:29,990 --> 00:33:32,290 situations is pretty subtle. 639 00:33:32,290 --> 00:33:34,890 Now for the rest of the lecture, what we're going to 640 00:33:34,890 --> 00:33:40,710 do is to take this example where we did three things and 641 00:33:40,710 --> 00:33:41,880 abstract them. 642 00:33:41,880 --> 00:33:44,540 These three trivial calculations that we just 643 00:33:44,540 --> 00:33:50,190 did are three very important, very basic tools that you use 644 00:33:50,190 --> 00:33:53,350 to solve more general probability problems. 645 00:33:53,350 --> 00:33:55,040 So what's the first one? 646 00:33:55,040 --> 00:33:58,040 We find the probability of a composite event, two things 647 00:33:58,040 --> 00:34:01,300 happening, by multiplying probabilities and conditional 648 00:34:01,300 --> 00:34:03,130 probabilities.
649 00:34:03,130 --> 00:34:08,639 For a more general version of this, look at any situation, maybe 650 00:34:08,639 --> 00:34:10,860 involving lots and lots of events. 651 00:34:10,860 --> 00:34:15,510 So here's a story: event A may happen or may not happen. 652 00:34:15,510 --> 00:34:19,440 Given that A occurred, it's possible that B happens or 653 00:34:19,440 --> 00:34:21,360 that B does not happen. 654 00:34:21,360 --> 00:34:25,280 Given that B also happens, it's possible that the event C 655 00:34:25,280 --> 00:34:29,770 also happens or that event C does not happen. 656 00:34:29,770 --> 00:34:33,400 And somebody specifies for you a model by giving you all 657 00:34:33,400 --> 00:34:36,230 these conditional probabilities along the way. 658 00:34:36,230 --> 00:34:39,570 Notice how we move along the branches as the tree 659 00:34:39,570 --> 00:34:40,690 progresses. 660 00:34:40,690 --> 00:34:45,110 Any point in the tree corresponds to certain events 661 00:34:45,110 --> 00:34:47,050 having happened. 662 00:34:47,050 --> 00:34:50,980 And then, given that this has happened, we specify 663 00:34:50,980 --> 00:34:52,360 conditional probabilities. 664 00:34:52,360 --> 00:34:55,989 Given that this has happened, how likely is it that C 665 00:34:55,989 --> 00:34:57,900 also occurs? 666 00:34:57,900 --> 00:35:00,890 Given a model of this kind, how do we find the probability 667 00:35:00,890 --> 00:35:02,660 of this event? 668 00:35:02,660 --> 00:35:05,310 The answer is extremely simple. 669 00:35:05,310 --> 00:35:09,930 All that you do is move along the tree and multiply 670 00:35:09,930 --> 00:35:12,950 conditional probabilities along the way. 671 00:35:12,950 --> 00:35:16,900 So in terms of frequencies, how often do all three things 672 00:35:16,900 --> 00:35:19,310 happen, A, B, and C? 673 00:35:19,310 --> 00:35:22,450 You first see how often A occurs.
674 00:35:22,450 --> 00:35:24,860 Out of the times that A occurs, how 675 00:35:24,860 --> 00:35:26,710 often does B occur? 676 00:35:26,710 --> 00:35:29,630 And out of the times where both A and B have occurred, 677 00:35:29,630 --> 00:35:31,660 how often does C occur? 678 00:35:31,660 --> 00:35:34,390 And you can just multiply those three frequencies with 679 00:35:34,390 --> 00:35:36,440 each other. 680 00:35:36,440 --> 00:35:39,740 What is the formal proof of this? 681 00:35:39,740 --> 00:35:43,000 Well, the only thing we have in our hands is the definition 682 00:35:43,000 --> 00:35:44,890 of conditional probabilities. 683 00:35:44,890 --> 00:35:49,660 So let's just use this. 684 00:35:49,660 --> 00:35:50,910 And-- 685 00:35:50,910 --> 00:35:54,370 686 00:35:54,370 --> 00:35:55,000 OK. 687 00:35:55,000 --> 00:35:58,210 Now, the definition of conditional probabilities 688 00:35:58,210 --> 00:36:00,770 tells us that the probability of two things is the 689 00:36:00,770 --> 00:36:03,660 probability of one of them times a conditional 690 00:36:03,660 --> 00:36:04,620 probability. 691 00:36:04,620 --> 00:36:05,850 Unfortunately, here we have the 692 00:36:05,850 --> 00:36:07,310 probability of three things. 693 00:36:07,310 --> 00:36:09,000 What can I do? 694 00:36:09,000 --> 00:36:13,570 I can put a parenthesis in here and think of this as the 695 00:36:13,570 --> 00:36:18,640 probability of (A and B) and C, and apply our definition of 696 00:36:18,640 --> 00:36:20,300 conditional probabilities here. 697 00:36:20,300 --> 00:36:23,920 The probability of two things happening is the probability 698 00:36:23,920 --> 00:36:28,430 that the first happens times the conditional probability 699 00:36:28,430 --> 00:36:34,070 that the second, C, happens given A and B, that is, given that the first 700 00:36:34,070 --> 00:36:35,330 one happened.
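Written out in symbols, the two applications of the definition described here (first treating A intersection B as a single event, then splitting that event) give:

```latex
\begin{aligned}
\mathbf{P}(A \cap B \cap C) &= \mathbf{P}\big((A \cap B) \cap C\big) \\
  &= \mathbf{P}(A \cap B)\,\mathbf{P}(C \mid A \cap B) \\
  &= \mathbf{P}(A)\,\mathbf{P}(B \mid A)\,\mathbf{P}(C \mid A \cap B)
\end{aligned}
```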
701 00:36:35,330 --> 00:36:38,850 So this is just the definition of the conditional probability 702 00:36:38,850 --> 00:36:41,980 of an event, given another event. 703 00:36:41,980 --> 00:36:44,270 That other event is a composite one, but 704 00:36:44,270 --> 00:36:45,330 that's not an issue. 705 00:36:45,330 --> 00:36:47,300 It's just an event. 706 00:36:47,300 --> 00:36:50,040 And then we use the definition of conditional probabilities 707 00:36:50,040 --> 00:36:56,290 once more to break this apart and make it P(A) times P(B given A), 708 00:36:56,290 --> 00:36:58,260 and then finally, the last term, P(C given A and B). 709 00:36:58,260 --> 00:37:00,930 710 00:37:00,930 --> 00:37:01,270 OK. 711 00:37:01,270 --> 00:37:03,680 So this proves the formula that I have up 712 00:37:03,680 --> 00:37:05,290 there on the slides. 713 00:37:05,290 --> 00:37:07,470 And if you wish to calculate any other 714 00:37:07,470 --> 00:37:09,330 probability in this diagram, 715 00:37:09,330 --> 00:37:12,590 for example, this one here, 716 00:37:12,590 --> 00:37:15,580 you would still multiply the conditional probabilities 717 00:37:15,580 --> 00:37:18,560 along the different branches of the tree. 718 00:37:18,560 --> 00:37:22,360 In particular, here in this branch, you would have the 719 00:37:22,360 --> 00:37:26,670 conditional probability of C complement, given A 720 00:37:26,670 --> 00:37:29,790 intersection B complement, and so on. 721 00:37:29,790 --> 00:37:32,070 So you write down probabilities along all those 722 00:37:32,070 --> 00:37:35,940 tree branches and just multiply them as you go. 723 00:37:35,940 --> 00:37:38,510 724 00:37:38,510 --> 00:37:44,450 So this was the first skill that we are covering. 725 00:37:44,450 --> 00:37:46,690 What was the second one?
726 00:37:46,690 --> 00:37:53,240 What we did was to calculate the total probability of a 727 00:37:53,240 --> 00:37:58,520 certain event B that 728 00:37:58,520 --> 00:38:02,820 was made up from different possibilities, which 729 00:38:02,820 --> 00:38:05,580 corresponded to different scenarios. 730 00:38:05,580 --> 00:38:08,870 So we wanted to calculate the probability of this event B 731 00:38:08,870 --> 00:38:12,030 that consisted of those two elements. 732 00:38:12,030 --> 00:38:13,280 Let's generalize. 733 00:38:13,280 --> 00:38:18,600 734 00:38:18,600 --> 00:38:23,080 So we have our big model. 735 00:38:23,080 --> 00:38:26,110 And this sample space is partitioned 736 00:38:26,110 --> 00:38:27,410 into a number of sets. 737 00:38:27,410 --> 00:38:30,620 In our radar example, we had a partition into two sets. 738 00:38:30,620 --> 00:38:33,600 Either a plane is there, or a plane is not there. 739 00:38:33,600 --> 00:38:35,850 Since we're trying to generalize, now I'm going to 740 00:38:35,850 --> 00:38:39,410 give you a picture for the case of three possibilities or 741 00:38:39,410 --> 00:38:41,360 three possible scenarios. 742 00:38:41,360 --> 00:38:45,160 So whatever happens in the world, there are three 743 00:38:45,160 --> 00:38:49,660 possible scenarios, A1, A2, A3. 744 00:38:49,660 --> 00:38:54,695 So think of these as: there's nothing in the air, there's an 745 00:38:54,695 --> 00:38:58,190 airplane in the air, or there's a flock of geese 746 00:38:58,190 --> 00:38:59,490 flying in the air. 747 00:38:59,490 --> 00:39:03,050 So there are three possible scenarios. 748 00:39:03,050 --> 00:39:08,972 And then there's a certain event B of interest, such as the 749 00:39:08,972 --> 00:39:12,800 radar records something or doesn't record something.
750 00:39:12,800 --> 00:39:15,870 We specify this model by giving 751 00:39:15,870 --> 00:39:18,040 probabilities for the Ai's. 752 00:39:18,040 --> 00:39:20,690 753 00:39:20,690 --> 00:39:23,420 Those are the probabilities of the different scenarios. 754 00:39:23,420 --> 00:39:27,180 And somebody also gives us the probabilities that this event 755 00:39:27,180 --> 00:39:31,010 B is going to occur, given that the scenario 756 00:39:31,010 --> 00:39:33,480 Ai has occurred. 757 00:39:33,480 --> 00:39:36,230 Think of the Ai's as scenarios. 758 00:39:36,230 --> 00:39:39,130 759 00:39:39,130 --> 00:39:43,110 And we want to calculate the overall probability of the 760 00:39:43,110 --> 00:39:47,210 event B. What's happening in this example? 761 00:39:47,210 --> 00:39:49,640 Perhaps, instead of this picture, it's easier to 762 00:39:49,640 --> 00:39:54,970 visualize if I go back to the picture I was using before. 763 00:39:54,970 --> 00:39:59,990 We have three possible scenarios, A1, A2, A3. 764 00:39:59,990 --> 00:40:05,150 And under each scenario, B may happen or B may not happen. 765 00:40:05,150 --> 00:40:11,360 766 00:40:11,360 --> 00:40:12,250 And so on. 767 00:40:12,250 --> 00:40:16,060 So here we have A2 intersection B. And here we 768 00:40:16,060 --> 00:40:22,110 have A3 intersection B. In the previous slide, we found how 769 00:40:22,110 --> 00:40:25,350 to calculate the probability of any event of this kind, 770 00:40:25,350 --> 00:40:28,870 which is done by multiplying probabilities here and 771 00:40:28,870 --> 00:40:31,100 conditional probabilities there. 772 00:40:31,100 --> 00:40:34,320 Now we are asked to calculate the total probability of the 773 00:40:34,320 --> 00:40:38,410 event B. The event B can happen in three possible ways. 774 00:40:38,410 --> 00:40:39,900 It can happen here. 775 00:40:39,900 --> 00:40:41,700 It can happen there. 776 00:40:41,700 --> 00:40:43,780 And it can happen here. 777 00:40:43,780 --> 00:40:50,020 So this is our event B.
It consists of three elements. 778 00:40:50,020 --> 00:40:53,370 To calculate the total probability of our event B, 779 00:40:53,370 --> 00:40:56,730 all we need to do is to add these three probabilities. 780 00:40:56,730 --> 00:40:59,440 781 00:40:59,440 --> 00:41:03,510 So B is an event that consists of these three elements. 782 00:41:03,510 --> 00:41:06,450 There are three ways that B can happen. 783 00:41:06,450 --> 00:41:10,390 Either B happens together with A1, or B happens together with 784 00:41:10,390 --> 00:41:13,030 A2, or B happens together with A3. 785 00:41:13,030 --> 00:41:15,340 So we need to add the probabilities of these three 786 00:41:15,340 --> 00:41:16,630 contingencies. 787 00:41:16,630 --> 00:41:18,980 For each one of those contingencies, we can 788 00:41:18,980 --> 00:41:23,020 calculate its probability by using the multiplication rule. 789 00:41:23,020 --> 00:41:27,580 So the probability of A1 and B happening is this: 790 00:41:27,580 --> 00:41:30,030 it's the probability of A1 times the probability of B happening 791 00:41:30,030 --> 00:41:32,020 given that A1 happens. 792 00:41:32,020 --> 00:41:36,140 The probability of this contingency is found by taking 793 00:41:36,140 --> 00:41:39,470 the probability that A2 happens times the conditional 794 00:41:39,470 --> 00:41:42,350 probability of B, given that A2 happened. 795 00:41:42,350 --> 00:41:44,640 And similarly for the third one. 796 00:41:44,640 --> 00:41:48,030 So this is the general rule that we have here: P(B) equals P(A1) P(B given A1) plus P(A2) P(B given A2) plus P(A3) P(B given A3). 797 00:41:48,030 --> 00:41:50,830 The rule is written for the case of three scenarios. 798 00:41:50,830 --> 00:41:54,020 But obviously, it has a generalization for the case of 799 00:41:54,020 --> 00:41:57,440 four or five or more scenarios.
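A minimal Python sketch of the total probability rule for any finite partition A1, ..., An. The scenario names follow the lecture's nothing / airplane / geese story, but all the probabilities below are made-up illustrative values, not figures from the lecture, and `total_probability` is a hypothetical helper name. The last lines also check the special case of equally likely scenarios, where the rule reduces to a plain average of the conditional probabilities.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum over i of P(A_i) * P(B | A_i), for a partition A_1, ..., A_n.

    priors:      maps each scenario to P(A_i); must sum to 1 because the A_i partition the space
    likelihoods: maps each scenario to P(B | A_i)
    """
    if abs(sum(priors.values()) - 1) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(priors[s] * likelihoods[s] for s in priors)

# Hypothetical numbers for the three scenarios (illustration only).
priors = {"nothing": 0.90, "airplane": 0.05, "geese": 0.05}
likelihoods = {"nothing": 0.10, "airplane": 0.99, "geese": 0.50}
print(total_probability(priors, likelihoods))

# Equally likely scenarios: the weighted average becomes an ordinary average.
equal = {s: 1 / 3 for s in likelihoods}
print(total_probability(equal, likelihoods), sum(likelihoods.values()) / 3)
```

Because the weights P(A_i) sum to one, the result is always a weighted average of the P(B | A_i), so it lies between the smallest and largest of them.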
800 00:41:57,440 --> 00:42:02,050 It gives you a way of breaking up the calculation of an event 801 00:42:02,050 --> 00:42:06,740 that can happen in multiple ways by considering individual 802 00:42:06,740 --> 00:42:09,720 probabilities for the different ways that the event 803 00:42:09,720 --> 00:42:10,970 can happen. 804 00:42:10,970 --> 00:42:12,950 805 00:42:12,950 --> 00:42:14,640 OK. 806 00:42:14,640 --> 00:42:16,300 So-- 807 00:42:16,300 --> 00:42:16,656 Yes? 808 00:42:16,656 --> 00:42:18,180 AUDIENCE: Does this have to change for 809 00:42:18,180 --> 00:42:19,800 an infinite sample space? 810 00:42:19,800 --> 00:42:20,760 JOHN TSISIKLIS: No. 811 00:42:20,760 --> 00:42:23,050 This is true whether your sample space 812 00:42:23,050 --> 00:42:25,450 is infinite or finite. 813 00:42:25,450 --> 00:42:28,410 What I'm using in this argument is that we have a 814 00:42:28,410 --> 00:42:33,670 partition into just three scenarios, three events. 815 00:42:33,670 --> 00:42:36,720 So it's a partition into a finite number of events. 816 00:42:36,720 --> 00:42:41,100 It's also true if it's a partition into an infinite 817 00:42:41,100 --> 00:42:43,670 sequence of events. 818 00:42:43,670 --> 00:42:47,550 But that's, I think, one of the theoretical problems at 819 00:42:47,550 --> 00:42:49,430 the end of the chapter. 820 00:42:49,430 --> 00:42:54,350 You probably won't need it for now. 821 00:42:54,350 --> 00:42:57,550 OK, going back to the story here. 822 00:42:57,550 --> 00:43:00,410 There are three possible scenarios about what could 823 00:43:00,410 --> 00:43:03,390 happen in the world that are captured here. 824 00:43:03,390 --> 00:43:08,660 Under each scenario, event B may or may not happen. 825 00:43:08,660 --> 00:43:11,850 And so these probabilities tell us the likelihoods of the 826 00:43:11,850 --> 00:43:13,270 different scenarios.
827 00:43:13,270 --> 00:43:17,640 These conditional probabilities tell us how 828 00:43:17,640 --> 00:43:21,030 likely it is for B to happen under one scenario, or the 829 00:43:21,030 --> 00:43:23,760 other scenario, or the other scenario. 830 00:43:23,760 --> 00:43:28,510 The overall probability of B is found by taking some 831 00:43:28,510 --> 00:43:32,380 combination of the probabilities of B in the 832 00:43:32,380 --> 00:43:34,250 different possible worlds, in the 833 00:43:34,250 --> 00:43:36,230 different possible scenarios. 834 00:43:36,230 --> 00:43:38,690 Under some scenario, B may be very likely. 835 00:43:38,690 --> 00:43:42,280 Under another scenario, it may be very unlikely. 836 00:43:42,280 --> 00:43:45,740 We take all of these into account and weigh them 837 00:43:45,740 --> 00:43:48,590 according to the likelihood of the scenarios. 838 00:43:48,590 --> 00:43:53,040 Now notice that since A1, A2, and A3 form a partition, 839 00:43:53,040 --> 00:43:58,530 these three probabilities have what property? 840 00:43:58,530 --> 00:44:00,810 Add to what? 841 00:44:00,810 --> 00:44:03,640 They add to one. 842 00:44:03,640 --> 00:44:06,020 So it's the probability of this branch, plus this branch, 843 00:44:06,020 --> 00:44:07,240 plus this branch. 844 00:44:07,240 --> 00:44:11,660 So what we have here is a weighted average of the 845 00:44:11,660 --> 00:44:15,120 probabilities of B in the different worlds, or in 846 00:44:15,120 --> 00:44:16,690 the different scenarios. 847 00:44:16,690 --> 00:44:17,860 Special case. 848 00:44:17,860 --> 00:44:20,370 Suppose the three scenarios are equally likely. 849 00:44:20,370 --> 00:44:25,300 So P of A1 equals P of A2 equals P of A3 equals 1/3. 850 00:44:25,300 --> 00:44:27,320 What are we saying here?
851 00:44:27,320 --> 00:44:31,750 In that case of equally likely scenarios, the probability of 852 00:44:31,750 --> 00:44:35,920 B is the average of the probabilities of B in the 853 00:44:35,920 --> 00:44:38,835 three different worlds, or in the three different scenarios. 854 00:44:38,835 --> 00:44:42,950 855 00:44:42,950 --> 00:44:43,450 OK. 856 00:44:43,450 --> 00:44:46,630 So, finally, the last step. 857 00:44:46,630 --> 00:44:53,800 If we go back again two slides, the last thing that we 858 00:44:53,800 --> 00:44:57,510 did was to calculate a conditional probability of 859 00:44:57,510 --> 00:45:01,760 this kind, probability of A given B, which is a 860 00:45:01,760 --> 00:45:04,080 probability associated essentially with 861 00:45:04,080 --> 00:45:05,630 an inference problem. 862 00:45:05,630 --> 00:45:09,840 Given that our radar recorded something, how likely is it 863 00:45:09,840 --> 00:45:12,060 that the plane was up there? 864 00:45:12,060 --> 00:45:15,240 So we're trying to infer whether a plane was up there 865 00:45:15,240 --> 00:45:18,610 or not, based on the information that we've got. 866 00:45:18,610 --> 00:45:20,770 So let's generalize once more. 867 00:45:20,770 --> 00:45:24,560 868 00:45:24,560 --> 00:45:28,250 And we're just going to rewrite what we did in that 869 00:45:28,250 --> 00:45:32,190 example, but in terms of general symbols instead of the 870 00:45:32,190 --> 00:45:33,650 specific numbers. 871 00:45:33,650 --> 00:45:38,180 So once more, the model that we have involves probabilities 872 00:45:38,180 --> 00:45:40,480 of the different scenarios. 873 00:45:40,480 --> 00:45:42,830 These we call prior probabilities. 874 00:45:42,830 --> 00:45:46,690 They are our initial beliefs about how likely each 875 00:45:46,690 --> 00:45:49,360 scenario is to occur.
876 00:45:49,360 --> 00:45:54,500 We also have a model of our measuring device that tells us, 877 00:45:54,500 --> 00:45:58,110 under each scenario, how likely it is that our radar will 878 00:45:58,110 --> 00:46:00,140 register something or not. 879 00:46:00,140 --> 00:46:03,220 So we're given again these conditional probabilities. 880 00:46:03,220 --> 00:46:04,330 We're given the conditional 881 00:46:04,330 --> 00:46:06,950 probabilities for these branches. 882 00:46:06,950 --> 00:46:11,050 Then we are told that event B occurred. 883 00:46:11,050 --> 00:46:15,330 And on the basis of this new information, we want to form 884 00:46:15,330 --> 00:46:18,510 some new beliefs about the relative likelihood of the 885 00:46:18,510 --> 00:46:20,110 different scenarios. 886 00:46:20,110 --> 00:46:23,790 Going back again to our radar example, an airplane was 887 00:46:23,790 --> 00:46:26,340 present with probability 5%. 888 00:46:26,340 --> 00:46:29,180 Given that the radar recorded something, we're going to 889 00:46:29,180 --> 00:46:30,540 change our beliefs. 890 00:46:30,540 --> 00:46:34,870 Now, a plane is present with probability 34%. 891 00:46:34,870 --> 00:46:38,270 Since the radar saw something, we are going to 892 00:46:38,270 --> 00:46:41,880 revise our beliefs as to whether the plane is out there 893 00:46:41,880 --> 00:46:43,130 or is not there. 894 00:46:43,130 --> 00:46:46,040 895 00:46:46,040 --> 00:46:52,660 And so what we need to do is to calculate the conditional 896 00:46:52,660 --> 00:46:57,290 probabilities of the different scenarios, given the 897 00:46:57,290 --> 00:46:59,340 information that we got. 898 00:46:59,340 --> 00:47:02,330 So initially, we have these probabilities for the 899 00:47:02,330 --> 00:47:04,000 different scenarios.
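The revision from 5% to about 34% in the radar example can be reproduced numerically. This part of the transcript gives only the prior (5%) and the result (34%); the detection probability 0.99 and the false-alarm probability 0.10 below are assumptions for this sketch, chosen because they are consistent with the 34% figure.

```python
# Radar example: prior belief that a plane is present.
p_plane = 0.05

# ASSUMED measuring-device model (not stated in this part of the lecture):
p_detect_given_plane = 0.99     # radar registers something when a plane is there
p_detect_given_no_plane = 0.10  # false alarm: radar registers something with no plane

# Total probability that the radar registers something.
p_detect = (p_plane * p_detect_given_plane
            + (1 - p_plane) * p_detect_given_no_plane)

# Revised (posterior) probability of a plane, given that the radar registered something.
posterior = p_plane * p_detect_given_plane / p_detect

print(f"P(register) = {p_detect:.4f}")            # P(register) = 0.1445
print(f"P(plane | register) = {posterior:.4f}")   # P(plane | register) = 0.3426
```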
900 00:47:04,000 --> 00:47:06,870 Once we get the information, we update them and we 901 00:47:06,870 --> 00:47:09,760 calculate our revised probabilities or conditional 902 00:47:09,760 --> 00:47:14,130 probabilities given the observation that we made. 903 00:47:14,130 --> 00:47:14,730 OK. 904 00:47:14,730 --> 00:47:15,760 So what do we do? 905 00:47:15,760 --> 00:47:17,620 We just use the definition of conditional 906 00:47:17,620 --> 00:47:19,360 probabilities twice. 907 00:47:19,360 --> 00:47:22,490 By definition the conditional probability is the probability 908 00:47:22,490 --> 00:47:25,740 of two things happening divided by the probability of 909 00:47:25,740 --> 00:47:27,960 the conditioning event. 910 00:47:27,960 --> 00:47:30,480 Now, I'm using the definition of conditional probabilities 911 00:47:30,480 --> 00:47:33,550 once more, or rather the multiplication rule. 912 00:47:33,550 --> 00:47:35,970 The probability of two things happening is the probability 913 00:47:35,970 --> 00:47:38,740 of the first times the probability of the second, given the first. 914 00:47:38,740 --> 00:47:41,190 So these are things that are given to us. 915 00:47:41,190 --> 00:47:43,430 They're the probabilities of the different scenarios. 916 00:47:43,430 --> 00:47:47,750 And this is the model of our measuring device, which we 917 00:47:47,750 --> 00:47:51,810 assume to be available. 918 00:47:51,810 --> 00:47:53,450 And how about the denominator? 919 00:47:53,450 --> 00:47:57,780 This is the total probability of the event B. But we just found 920 00:47:57,780 --> 00:48:01,140 that it's easy to calculate using the formula in the 921 00:48:01,140 --> 00:48:02,400 previous slide. 922 00:48:02,400 --> 00:48:04,750 To find the overall probability of event B 923 00:48:04,750 --> 00:48:08,260 occurring, we look at the probabilities of B occurring 924 00:48:08,260 --> 00:48:11,560 under the different scenarios and weigh them according to 925 00:48:11,560 --> 00:48:13,710 the probabilities of the scenarios.
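The three steps just described (definition of conditional probability, multiplication rule in the numerator, total probability theorem in the denominator) combine into Bayes' rule for a partition A1, ..., An:

```latex
\[
P(A_i \mid B)
  = \frac{P(A_i \cap B)}{P(B)}
  = \frac{P(A_i)\,P(B \mid A_i)}{P(B)}
  = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}
\]
```

Everything on the right-hand side is part of the given model: the prior probabilities P(Aj) and the measuring-device model P(B | Aj).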
926 00:48:13,710 --> 00:48:17,370 So in the end, we have a formula for the conditional 927 00:48:17,370 --> 00:48:22,730 probability of the A's given B, based on the data of the 928 00:48:22,730 --> 00:48:25,090 problem, which were the probabilities of the different 929 00:48:25,090 --> 00:48:27,360 scenarios and conditional probabilities of 930 00:48:27,360 --> 00:48:29,490 B, given the A's. 931 00:48:29,490 --> 00:48:33,320 So what this calculation does is, basically, it reverses the 932 00:48:33,320 --> 00:48:35,310 order of conditioning. 933 00:48:35,310 --> 00:48:39,000 We are given conditional probabilities of this kind, 934 00:48:39,000 --> 00:48:42,950 where it's B given A, and we produce new conditional 935 00:48:42,950 --> 00:48:46,630 probabilities, where things go the other way. 936 00:48:46,630 --> 00:48:53,530 So schematically, what's happening here is that we have 937 00:48:53,530 --> 00:48:59,995 a model of cause and effect. 938 00:48:59,995 --> 00:49:02,550 939 00:49:02,550 --> 00:49:09,840 So a scenario occurs and that may cause B to happen or may 940 00:49:09,840 --> 00:49:11,880 not cause it to happen. 941 00:49:11,880 --> 00:49:14,495 So this is a cause/effect model. 942 00:49:14,495 --> 00:49:17,300 943 00:49:17,300 --> 00:49:20,090 And it's modeled using probabilities, such as 944 00:49:20,090 --> 00:49:23,350 probability of B given Ai. 945 00:49:23,350 --> 00:49:28,710 And what we want to do is inference where we are told 946 00:49:28,710 --> 00:49:35,910 that B occurs, and we want to infer whether Ai 947 00:49:35,910 --> 00:49:38,580 also occurred or not. 948 00:49:38,580 --> 00:49:42,050 And the appropriate probabilities for that are the 949 00:49:42,050 --> 00:49:45,010 conditional probabilities that Ai occurred, 950 00:49:45,010 --> 00:49:48,110 given that B occurred. 951 00:49:48,110 --> 00:49:52,250 So we're starting with a causal model of our situation.
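Reversing the order of conditioning can be written as a short function: it takes the causal model (priors P(Ai) and likelihoods P(B | Ai)) and returns the inference probabilities P(Ai | B). This is a minimal sketch; the function name `posterior` and the three-scenario numbers are hypothetical.

```python
import math


def posterior(priors, likelihoods):
    """Bayes' rule over a partition: return [P(A_i | B) for each scenario i].

    priors[i]      -- P(A_i), prior probability of scenario (cause) i
    likelihoods[i] -- P(B | A_i), probability of the effect B under cause i
    """
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("probabilities of a partition must add to 1")
    joint = [p * q for p, q in zip(priors, likelihoods)]  # P(A_i and B), multiplication rule
    p_b = sum(joint)                                      # P(B), total probability theorem
    if p_b == 0:
        raise ValueError("cannot condition on an event of probability zero")
    return [j / p_b for j in joint]                       # definition of P(A_i | B)


# Hypothetical causal model with three scenarios.
post = posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.8])
print([round(x, 4) for x in post])
```

Observing B shifts belief toward the scenarios under which B is more likely: the third scenario has the smallest prior but ends up with the largest posterior, and the posteriors still add to 1.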
952 00:49:52,250 --> 00:49:57,220 It models, for a given cause, how likely a certain effect is 953 00:49:57,220 --> 00:49:58,830 to be observed. 954 00:49:58,830 --> 00:50:02,920 And then we do inference, which answers the question, 955 00:50:02,920 --> 00:50:06,730 given that the effect was observed, how likely is it 956 00:50:06,730 --> 00:50:10,870 that the world was in this particular situation or state 957 00:50:10,870 --> 00:50:12,940 or scenario? 958 00:50:12,940 --> 00:50:17,260 The name Bayes' rule comes from Thomas Bayes, a 959 00:50:17,260 --> 00:50:20,750 British theologian back in the 1700s. 960 00:50:20,750 --> 00:50:21,530 It actually-- 961 00:50:21,530 --> 00:50:25,000 This calculation addresses a basic problem, a basic 962 00:50:25,000 --> 00:50:30,230 philosophical problem: how one can learn from experience or 963 00:50:30,230 --> 00:50:33,300 from experimental data in some systematic way. 964 00:50:33,300 --> 00:50:35,840 The British at that time were preoccupied with this 965 00:50:35,840 --> 00:50:36,710 type of question. 966 00:50:36,710 --> 00:50:41,200 Is there a basic theory about how we can incorporate 967 00:50:41,200 --> 00:50:44,280 new knowledge into previous knowledge? 968 00:50:44,280 --> 00:50:47,600 And this calculation made an argument that, yes, it is 969 00:50:47,600 --> 00:50:50,100 possible to do that in a systematic way. 970 00:50:50,100 --> 00:50:53,040 So the philosophical underpinnings of this have a 971 00:50:53,040 --> 00:50:57,050 very long history and a lot of discussion around them. 972 00:50:57,050 --> 00:51:00,560 But for our purposes, it's just an extremely useful tool. 973 00:51:00,560 --> 00:51:03,550 And it's the foundation of almost everything that gets 974 00:51:03,550 --> 00:51:07,190 done when you try to do inference based on partial 975 00:51:07,190 --> 00:51:08,860 observations. 976 00:51:08,860 --> 00:51:09,690 Very well. 977 00:51:09,690 --> 00:51:10,940 Till next time.
978 00:51:10,940 --> 00:51:11,760