The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: Let us start. So as always, we're going to have a quick review of what we discussed last time. And then today we're going to introduce just one new concept, the notion of independence of two events, and we will play with that concept.

So what did we talk about last time? The idea is that we have an experiment, and the experiment has a sample space omega. Then somebody comes and tells us that the outcome of the experiment happens to lie inside this particular event B. This information changes what we know about the situation: it tells us that the outcome is going to be somewhere inside here. So this is essentially our new sample space.
Now we need to reassign probabilities to the various possible outcomes because, for example, these outcomes out there, even if they had positive probability beforehand, are going to have zero probability now that we're told that B occurred. So we need to revise our probabilities. The new probabilities are called conditional probabilities, and they're defined this way. The conditional probability that A occurs, given that we're told that B occurred, is calculated by this formula, P(A | B) = P(A ∩ B) / P(B), which tells us the following: out of the total probability that was initially assigned to the event B, what fraction of that probability is assigned to outcomes that also make A happen? Conditional probabilities are left undefined if the denominator here is zero.
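To make the definition concrete, here is a minimal sketch in Python, assuming a finite sample space whose outcome probabilities are stored in a dictionary. The fair-die example and the names `prob` and `cond_prob` are illustrative, not from the lecture. Exact `Fraction` arithmetic avoids rounding, and the function returns `None` when P(B) = 0, mirroring the rule that the conditional probability is then left undefined.

```python
from fractions import Fraction

def prob(event, pmf):
    """Probability of an event (a set of outcomes) under a finite PMF."""
    return sum((pmf[w] for w in event), Fraction(0))

def cond_prob(a, b, pmf):
    """P(A | B) = P(A ∩ B) / P(B); left undefined (None) when P(B) = 0."""
    pb = prob(b, pmf)
    if pb == 0:
        return None
    return prob(a & b, pmf) / pb

# Hypothetical example: one roll of a fair six-sided die.
die = {w: Fraction(1, 6) for w in range(1, 7)}
even = {2, 4, 6}
at_least_4 = {4, 5, 6}
print(cond_prob(even, at_least_4, die))  # 2/3
```

Knowing the roll is at least 4 leaves three equally likely outcomes, two of which are even, so the answer 2/3 matches the "fraction of B's probability" reading above.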
An easy consequence of the definition: if we bring that term to the other side, we can find the probability of two things happening by taking the probability that the first thing happens and multiplying by the conditional probability that the second one happens, given that the first one happened. That is, P(A ∩ B) = P(B) P(A | B).

Then we saw last time that we can divide and conquer in calculating probabilities of mildly complicated events by breaking them down into different scenarios. Event B can happen in two ways: either together with A, or together with A complement. So basically what we're saying is that the total probability of B is the probability of A intersection B plus the probability of A complement intersection B: P(B) = P(A ∩ B) + P(Aᶜ ∩ B).

These two facts, the multiplication rule and the total probability theorem, are basic tools that one uses to break down probability calculations into simpler parts. We find probabilities of two things happening by looking at each one at a time, and we use the total probability theorem to break up a situation with two different possible scenarios.
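Both tools can be checked numerically on a small hypothetical example: a fair die, with A = "even" and B = "at most 4". The sketch assumes P(A), P(Aᶜ), and P(B) are all positive, so every conditional probability it uses is defined.

```python
from fractions import Fraction

omega = set(range(1, 7))
pmf = {w: Fraction(1, 6) for w in omega}   # hypothetical fair die

def prob(event):
    return sum((pmf[w] for w in event), Fraction(0))

def cond_prob(a, b):
    return prob(a & b) / prob(b)   # assumes P(b) > 0

A = {2, 4, 6}        # even
B = {1, 2, 3, 4}     # at most 4
A_c = omega - A      # A complement

# Multiplication rule, as in the text: P(A ∩ B) = P(B) P(A | B)
assert prob(A & B) == prob(B) * cond_prob(A, B)

# Total probability theorem: P(B) = P(A ∩ B) + P(A^c ∩ B),
# with each term written via the multiplication rule.
total = prob(A) * cond_prob(B, A) + prob(A_c) * cond_prob(B, A_c)
print(prob(B), total)   # 2/3 2/3
```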
Then we also have the Bayes rule, which does the following. Given a model that has conditional probabilities of this kind, the Bayes rule allows us to calculate conditional probabilities in which the events appear in the reverse order. You can think of these probabilities as describing a causal model of a certain situation, whereas these are the probabilities that you get after you do some inference based on the information that you have available.

Now the Bayes rule, we derived it, and it's a trivial half-line calculation. But it underlies lots and lots of useful things in the real world. We had the radar example last time. You can think of more complicated situations in which there are lots of different hypotheses about the environment. Given any particular setting in the environment, you have a measuring device that can produce many different outcomes. You observe the final outcome of your measuring device, and you're trying to guess which particular branch occurred. That is, you're trying to guess the state of the world based on a particular measurement.
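As a sketch of the many-hypotheses case, the function below applies the Bayes rule to a hypothetical model with three states of the world and three possible readings; all the numbers and names are made up for illustration. The denominator, P(reading), is computed with the total probability theorem.

```python
from fractions import Fraction as F

def posterior(prior, likelihood, y):
    """Bayes rule: P(H_i | y) = P(H_i) P(y | H_i) / sum_j P(H_j) P(y | H_j)."""
    joint = {h: prior[h] * likelihood[h][y] for h in prior}
    total = sum(joint.values())   # P(y), by the total probability theorem
    return {h: joint[h] / total for h in joint}

# Hypothetical model: three states of the world, three possible readings.
prior = {"h1": F(1, 2), "h2": F(3, 10), "h3": F(1, 5)}
likelihood = {
    "h1": {"low": F(7, 10), "mid": F(2, 10), "high": F(1, 10)},
    "h2": {"low": F(2, 10), "mid": F(6, 10), "high": F(2, 10)},
    "h3": {"low": F(1, 10), "mid": F(2, 10), "high": F(7, 10)},
}

post = posterior(prior, likelihood, "high")
best_guess = max(post, key=post.get)   # most likely state given the reading
print(post["h1"], post["h2"], post["h3"], best_guess)   # 1/5 6/25 14/25 h3
```

Picking the hypothesis with the largest posterior is one simple way to turn the inference into a guess; it is a choice made for this sketch, not something the lecture prescribes.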
That's what inference is all about. Real-world problems only differ from the simple example that we saw last time in that this kind of tree is a little more complicated; you might have infinitely many possible outcomes here, and so on. So setting up the model may be more elaborate, but the basic calculation based on the Bayes rule is essentially the same as the one that we saw.

Now, something that we discussed is that sometimes we use conditional probabilities to describe models. Let's do this by looking at a model where we toss a coin three times. How do we use conditional probabilities to describe the situation? We have one experiment, but that one experiment consists of three consecutive coin tosses. So the possible outcomes, our sample space, consist of strings of length 3 that tell us whether we had heads or tails and in what sequence. Three heads in a row is one particular outcome.

So what is the meaning of those labels in front of the branches? This P here, of course, stands for the probability that the first toss resulted in heads.
And let me use this notation, H1, to denote that the first toss was heads: I put an H for toss one. How about the meaning of this probability here? The meaning of this probability is a conditional one. It's the conditional probability that the second toss resulted in heads, given that the first one resulted in heads. And similarly, this label here corresponds to the probability that the third toss resulted in heads, given that the first one and the second one resulted in heads.

So in this particular model that I wrote down here, the probability P of obtaining heads remains the same no matter what happened in the previous tosses. For example, even if the first toss was tails, we still have the same probability P that the second one is heads, given that the first one was tails. So we're assuming that no matter what happened in the first toss, the second toss will still have a conditional probability equal to P; that conditional probability does not depend on what happened in the first toss. We will see that this is a very special situation, and it's really the concept of independence that we are going to introduce shortly.
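The tree can be turned into code by multiplying the branch labels along each path. This minimal sketch assumes, as in the model just described, that every branch assigns probability P to heads regardless of history; the value P = 1/3 and the helper name `outcome_prob` are hypothetical. The path tails, heads, tails serves as a usage example.

```python
from fractions import Fraction
from itertools import product

def outcome_prob(seq, p):
    """Multiply the branch labels along the path to `seq`.

    In this model every branch says P(heads) = p, whatever came before,
    so each factor is p for an 'H' and 1 - p for a 'T'.
    """
    result = Fraction(1)
    for toss in seq:
        result *= p if toss == "H" else 1 - p
    return result

p = Fraction(1, 3)   # hypothetical bias; any value in [0, 1] works
outcomes = ["".join(s) for s in product("HT", repeat=3)]   # HHH, HHT, ..., TTT

print(len(outcomes), sum(outcome_prob(w, p) for w in outcomes))   # 8 1
print(outcome_prob("THT", p))   # (1 - p) * p * (1 - p) = 4/27
```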
But before we get to independence, let's practice once more, on this example, the three skills that we covered last time. The first skill was the multiplication rule: how do you find the probability of several things happening? Say, the probability that we have tails followed by heads followed by tails. So here we're talking about this particular outcome, tails followed by heads followed by tails. The way we calculate such a probability is by multiplying the conditional probabilities along the path that takes us to this outcome, and these conditional probabilities are recorded here. So it's going to be (1 minus P) times P times (1 minus P). This is the multiplication rule.

The second question is, how do we find the probability of a mildly complicated event? The event of interest that I wrote down here is that in the three tosses we had a total of one head. Exactly one head. This is an event that can happen in multiple ways. It happens here, it happens here, and it also happens here.
So we want to find the total probability of the event consisting of these three outcomes. What do we do? We just add the probabilities of the individual outcomes. How do we find the probability of an individual outcome? Well, that's what we just did. Notice that this outcome has probability P times (1 minus P) squared. That label should not be there. So where is it? Ah, it's this one. OK, so the probability of this outcome is (1 minus P) times P times (1 minus P), the same probability. And finally, this one is again (1 minus P) squared times P. So this event of one head can happen in three ways, and each of those three ways has the same probability of occurring. So the answer is 3 times P times (1 minus P) squared.

And finally, the last thing that we learned how to do is to use the Bayes rule to make an inference. Somebody tells you that there was exactly one head in your three tosses. What is the probability that the first toss resulted in heads?
OK, I guess you can guess the answer here. If I tell you that there were three tosses and one of them was heads, where was that head: in the first, the second, or the third? By symmetry, they should all be equally likely, so the probability that the head occurred in the first toss should be just 1/3.

Let's check our intuition using the definitions. The definition of conditional probability tells us that the conditional probability is the probability of both things happening, first toss is heads and we have exactly one head, divided by the probability of one head. What is the probability that the first toss is heads and we have exactly one head? This is the same as the event heads, tails, tails: if I tell you that the first is heads and there's only one head, the others must be tails. So this is the probability of heads, tails, tails divided by the probability of one head. And we know all of these quantities. The probability of heads, tails, tails is P times (1 minus P) squared. The probability of one head is 3 times P times (1 minus P) squared.
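The same enumeration confirms both numbers for several hypothetical values of P: the probability of exactly one head is 3P(1 minus P) squared, and the conditional probability that the first toss was heads is 1/3 regardless of P. The helper names are illustrative, and the sketch avoids P = 0 and P = 1, where P(one head) = 0 and the conditional probability is undefined.

```python
from fractions import Fraction
from itertools import product

def outcome_prob(seq, p):
    # Product of branch labels: p per head, 1 - p per tail.
    result = Fraction(1)
    for toss in seq:
        result *= p if toss == "H" else 1 - p
    return result

def prob(event, p):
    return sum((outcome_prob(w, p) for w in event), Fraction(0))

outcomes = {"".join(s) for s in product("HT", repeat=3)}
one_head = {w for w in outcomes if w.count("H") == 1}   # HTT, THT, TTH
first_heads = {w for w in outcomes if w[0] == "H"}

def first_heads_given_one_head(p):
    # P(H1 | one head) = P(H1 and one head) / P(one head) = P(HTT) / P(one head)
    return prob(first_heads & one_head, p) / prob(one_head, p)

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):   # hypothetical biases
    # Last column is 1/3 every time.
    print(p, prob(one_head, p), first_heads_given_one_head(p))
```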
So the final answer is 1/3, which is what you should have guessed on intuitive grounds. Very good. So we got our practice on the material that we covered last time. Again, there are basically three skills that we are practicing and exercising here. In the problems, in quizzes, and in real life, you may have to apply those three skills in somewhat more complicated settings, but in the end that's usually what it boils down to.

Now let's focus on the special feature of this particular model that I discussed a little earlier. Think of the event heads in the second toss. Initially, the probability of heads in the second toss, you know, is P, the probability of success of your coin. If I tell you that the first toss resulted in heads, what's the probability that the second toss is heads? It's again P. If I tell you that the first toss was tails, what's the probability that the second toss is heads? It's again P. So whether or not I tell you the result of the first toss, it doesn't make any difference to you.
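This claim can be checked by enumerating the three-toss model: conditioning on heads or on tails in the first toss leaves the probability of heads in the second toss at P. The sketch assumes a hypothetical P = 3/5; any value strictly between 0 and 1 would do.

```python
from fractions import Fraction
from itertools import product

p = Fraction(3, 5)   # hypothetical probability of heads

# Build the three-toss model by multiplying branch labels (p for H, 1 - p for T).
pmf = {}
for s in product("HT", repeat=3):
    w = "".join(s)
    pmf[w] = Fraction(1)
    for toss in w:
        pmf[w] *= p if toss == "H" else 1 - p

def prob(event):
    return sum((pmf[w] for w in event), Fraction(0))

H1 = {w for w in pmf if w[0] == "H"}   # heads on toss 1
T1 = {w for w in pmf if w[0] == "T"}   # tails on toss 1
H2 = {w for w in pmf if w[1] == "H"}   # heads on toss 2

print(prob(H2), prob(H2 & H1) / prob(H1), prob(H2 & T1) / prob(T1))   # 3/5 3/5 3/5
```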
You would always say the probability of heads in the second toss is going to be P, no matter what happened in the first toss. This is a special situation to which we're going to give a name: we're going to call that property independence. Basically, independence between two things stands for the fact that the first thing, whether it occurred or not, doesn't give you any information; it does not cause you to change your beliefs about the second event. This is the intuition.

Let's try to translate this into mathematics. We have two events, and we're going to say that they're independent if your initial beliefs about B are not going to change if I tell you that A occurred. So you have some belief about how likely B is. Then somebody comes and tells you, you know, A has happened. Are you going to change your beliefs? No, I'm not going to change them. Whenever you are in such a situation, you say that the two events are independent. Intuitively, the fact that A occurred does not convey any information to you about the likelihood of event B. The information that A provides is not useful, not relevant; A has to do with something else.
It's not useful for guessing whether B is going to occur or not. So we can take this, P(B | A) = P(B), as a first attempt at a definition of independence.

Now remember that we have this property: the probability of two things happening is the probability of the first times the conditional probability of the second. If we have independence, this conditional probability is the same as the unconditional probability. So if we have independence according to that definition, we get the property P(A ∩ B) = P(A) P(B): you can find the probability of two things happening by just multiplying their individual probabilities. For a fair coin, the probability of heads in the first toss is 1/2, the probability of heads in the second toss is 1/2, and the probability of heads, heads is 1/4. That's what happens if your two tosses are independent of each other.

So this product property is a consequence of the first definition, but it's actually nicer, better, simpler, cleaner, more beautiful to take it as our definition instead. Are the two definitions equivalent? Well, they're almost the same, except for one thing.
Conditional probabilities are only defined if you condition on an event that has positive probability. So the first definition would be limited to cases where event A has positive probability, whereas the product definition is something that you can always write down. We will say that two events are independent if and only if their probability of happening simultaneously is equal to the product of their two individual probabilities.

In particular, we can have events of zero probability; there's nothing wrong with that. If A has zero probability, then A intersection B will also have zero probability, because it's an even smaller event, and so we get zero equals zero. A corollary of what I just said: if an event A has zero probability, it's actually independent of any other event in our model, because we get zero equals zero and the definition is satisfied. This is a little bit harder to reconcile with the intuition we have about independence, but then again, it's part of the mathematical definition.
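The product definition translates directly into a check that never divides, so it also covers zero-probability events. In this sketch, the fair-coin model, the extra zero-probability outcome `X`, and the function name `independent` are hypothetical illustrations.

```python
from fractions import Fraction

def independent(a, b, pmf):
    """Product definition: A and B are independent iff P(A ∩ B) == P(A) P(B).

    Unlike the conditional-probability version, this never divides,
    so it also applies when P(A) = 0.
    """
    def prob(event):
        return sum((pmf[w] for w in event), Fraction(0))
    return prob(a & b) == prob(a) * prob(b)

# Two tosses of a fair coin: each of the four outcomes has probability 1/4.
pmf = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}
H1 = {"HH", "HT"}   # heads on toss 1
H2 = {"HH", "TH"}   # heads on toss 2
print(independent(H1, H2, pmf))   # True, since 1/4 == 1/2 * 1/2

# Hypothetical extra outcome "X" with probability zero.
pmf0 = dict(pmf, X=Fraction(0))
Z = {"X"}
print(all(independent(Z, e, pmf0) for e in (H1, H2, {"HH"}, Z, set(pmf0))))   # True
```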
So what I want you to retain is that independence is something that you can check formally using this definition, but in some cases you can also check it intuitively, if you can reason that whatever happens and determines whether A is going to occur or not has absolutely nothing to do with whatever happens and determines whether B is going to occur or not.

So suppose I'm doing a science experiment in this room, and it gets hit by some noise that causes randomness. Then, five years later, somebody somewhere else does the same science experiment, and it gets hit by other noise. You would usually say that these experiments are independent. What events happen in one experiment are not going to change your beliefs about what might be happening in the other, because the sources of noise in these two experiments are completely unrelated; they have nothing to do with each other. So if I flip a coin here today, and I flip a coin in my office tomorrow, one shouldn't affect the other, and the events that I get from these should be independent. That's usually how independence arises: by having distinct physical phenomena that do not interact.
Sometimes you also get independence even though there is a physical interaction, but you just happen to have a numerical accident. A and B might be physically related very tightly, but a numerical accident happens and you get equality in the product formula; that's another case where we do get independence.

Now suppose that we have two events that are laid out like this, disjoint from each other. Are these two events independent or not? The picture kind of tells you that one is separate from the other. But separate has nothing to do with independent. In fact, as long as both events have positive probability, these two events are as dependent as Siamese twins. Why is that? If I tell you that A occurred, then you are certain that B did not occur. So information about the occurrence of A definitely affects your beliefs about the possible occurrence or non-occurrence of B. When the picture is like that, knowing that A occurred will drastically change my beliefs about B, because I suddenly become certain that B did not occur. So a picture like this is actually a case of extreme dependence. So don't confuse independence with disjointness.
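A quick numerical check of the point about disjointness, assuming a hypothetical three-outcome model in which disjoint events A and B have probabilities 1/3 and 1/4 (the same numbers the professor uses in the exchange that follows) and the remaining 5/12 lies outside both.

```python
from fractions import Fraction

# Hypothetical model: A = {"a"} and B = {"b"} are disjoint.
pmf = {"a": Fraction(1, 3), "b": Fraction(1, 4), "rest": Fraction(5, 12)}

def prob(event):
    return sum((pmf[w] for w in event), Fraction(0))

A, B = {"a"}, {"b"}
print(prob(A & B), prob(A) * prob(B))   # 0 1/12, so not independent
print(prob(A & B) / prob(B), prob(A))   # P(A | B) = 0, but P(A) = 1/3
```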
They're very different types of properties.

AUDIENCE: Question.

PROFESSOR: Yes?

AUDIENCE: So I understand the explanation, but the probability of A intersect B [INAUDIBLE] to zero, because they're disjoint.

PROFESSOR: Yes.

AUDIENCE: But then the product of probability A and probability B, one of them is going to be 1. [INAUDIBLE]

PROFESSOR: No, suppose that the probabilities are 1/3 and 1/4, and the rest is out there. You check the definition of independence. The probability of A intersection B is zero. The probability of A times the probability of B is 1/12. The two are not equal; therefore we do not have independence.

AUDIENCE: Right. So what's wrong with the intuition of the probability of A being 1, and the other one being 0? [INAUDIBLE]

PROFESSOR: No. The probability of A given B is equal to 0. The probability of A is equal to 1/3. So again, these two are different. We had some initial beliefs about A, but as soon as we are told that B occurred, our beliefs about A changed.
And since our beliefs changed, that means that B conveys information about A.

AUDIENCE: So can you not draw independent [INAUDIBLE] on a Venn diagram?

PROFESSOR: I can't hear you.

AUDIENCE: Can you draw independence on a Venn diagram?

PROFESSOR: No, the Venn diagram is never enough to decide independence. The typical picture in which you're going to have independence would be one event this way and another event this way. You need to take the probability of this times the probability of that, and check that, numerically, it's equal to the probability of this intersection. So it's more than a Venn diagram; the numbers need to come out right.

Now, we did say some time ago that conditional probabilities are just like ordinary probabilities, and whatever we do in probability theory can also be done in conditional universes, talking about conditional probabilities. So since we have a notion of independence, there should also be a notion of conditional independence.
425 00:21:47,470 --> 00:21:55,070 So independence was defined by the probability that A 426 00:21:55,070 --> 00:21:59,070 intersection B is equal to the probability of A times the 427 00:21:59,070 --> 00:22:01,920 probability of B. 428 00:22:01,920 --> 00:22:05,670 What would be a reasonable definition of conditional 429 00:22:05,670 --> 00:22:06,840 independence? 430 00:22:06,840 --> 00:22:09,355 Conditional independence would mean that this same property 431 00:22:09,355 --> 00:22:13,210 holds, but in a conditional universe where we 432 00:22:13,210 --> 00:22:15,660 are told that a certain event happened. 433 00:22:15,660 --> 00:22:19,060 So if we're told that the event C has happened, then 434 00:22:19,060 --> 00:22:22,240 we're transported into a conditional universe where the 435 00:22:22,240 --> 00:22:26,460 only things that matter are conditional probabilities. 436 00:22:26,460 --> 00:22:31,320 And this is just the same plain, previous definition of 437 00:22:31,320 --> 00:22:35,190 independence, but applied in a conditional universe. 438 00:22:35,190 --> 00:22:40,020 So this is the definition of conditional independence. 439 00:22:40,020 --> 00:22:43,390 440 00:22:43,390 --> 00:22:46,940 So it's independence, but with reference to the conditional 441 00:22:46,940 --> 00:22:48,830 probabilities. 442 00:22:48,830 --> 00:22:51,830 And intuitively it has, again, the same meaning, that in the 443 00:22:51,830 --> 00:22:56,410 conditional world, if I tell you that A occurred, then that 444 00:22:56,410 --> 00:22:58,940 doesn't change your beliefs about B. 445 00:22:58,940 --> 00:23:01,100 So suppose you had a picture like this. 446 00:23:01,100 --> 00:23:06,880 And somebody told you that events A and B are independent 447 00:23:06,880 --> 00:23:09,630 unconditionally. 448 00:23:09,630 --> 00:23:14,320 Then somebody comes and tells you that event C actually has 449 00:23:14,320 --> 00:23:18,150 occurred, so we now live in this new universe.
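In symbols, the definition of conditional independence stated above reads as follows, where C is the conditioning event, assumed to have positive probability so that the conditional probabilities are defined:

```latex
P(A \cap B \mid C) = P(A \mid C)\, P(B \mid C), \qquad P(C) > 0
```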
450 00:23:18,150 --> 00:23:22,450 In this new universe, is the independence of A and B going 451 00:23:22,450 --> 00:23:25,180 to be preserved or not? 452 00:23:25,180 --> 00:23:29,300 Are A and B independent in this new universe? 453 00:23:29,300 --> 00:23:34,780 The answer is no, because in the new universe, whatever is 454 00:23:34,780 --> 00:23:36,790 left of event A is this piece. 455 00:23:36,790 --> 00:23:39,630 Whatever is left of event B is this piece. 456 00:23:39,630 --> 00:23:42,310 And these two pieces are disjoint. 457 00:23:42,310 --> 00:23:45,490 So we are back in a situation of this kind. 458 00:23:45,490 --> 00:23:46,450 So in the conditional 459 00:23:46,450 --> 00:23:49,620 universe, A and B are disjoint. 460 00:23:49,620 --> 00:23:53,380 And therefore, generically, they're not going to be 461 00:23:53,380 --> 00:23:54,730 independent. 462 00:23:54,730 --> 00:23:58,030 What's the moral of this example? 463 00:23:58,030 --> 00:24:01,870 Having independence in the original model does not imply 464 00:24:01,870 --> 00:24:05,930 independence in a conditional model. 465 00:24:05,930 --> 00:24:08,490 The opposite is also possible: dependence in the original model, but independence in a conditional model. 466 00:24:08,490 --> 00:24:12,160 And let's illustrate it with another example. 467 00:24:12,160 --> 00:24:17,960 So I have two coins, and both of them are badly biased. 468 00:24:17,960 --> 00:24:21,680 One coin is heavily biased in favor of heads. 469 00:24:21,680 --> 00:24:25,320 The other coin is heavily biased in favor of tails. 470 00:24:25,320 --> 00:24:28,050 In each case, the favored side has probability 90%. 471 00:24:28,050 --> 00:24:33,050 Let's consider independent flips of coin A. This is the 472 00:24:33,050 --> 00:24:34,980 relevant model. 473 00:24:34,980 --> 00:24:39,600 This is a model of two independent flips 474 00:24:39,600 --> 00:24:41,240 of the first coin. 475 00:24:41,240 --> 00:24:43,850 There are going to be two flips, and each one has probability 476 00:24:43,850 --> 00:24:46,080 0.9 of being heads.
477 00:24:46,080 --> 00:24:49,330 So that's a model that describes coin A. You can 478 00:24:49,330 --> 00:24:52,540 think of this as a conditional model which is a model of the 479 00:24:52,540 --> 00:24:55,940 coin flips conditioned on the fact that we have chosen 480 00:24:55,940 --> 00:24:57,460 coin A. 481 00:24:57,460 --> 00:25:01,460 Alternatively we could be dealing with coin B. In a 482 00:25:01,460 --> 00:25:05,260 conditional world where we chose coin B and flip it 483 00:25:05,260 --> 00:25:08,130 twice, this is the relevant model. 484 00:25:08,130 --> 00:25:10,660 The probability of two heads, for example, is the 485 00:25:10,660 --> 00:25:13,280 probability of heads the first time times the probability of heads the second time, 486 00:25:13,280 --> 00:25:16,070 and each one is 0.1. 487 00:25:16,070 --> 00:25:19,960 Now I'm building this into a bigger experiment in which I 488 00:25:19,960 --> 00:25:25,160 first start by choosing one of the two coins at random. 489 00:25:25,160 --> 00:25:26,620 So I have these two coins. 490 00:25:26,620 --> 00:25:28,610 I blindly pick one of them. 491 00:25:28,610 --> 00:25:32,610 And then I start flipping it. 492 00:25:32,610 --> 00:25:36,620 So the question now is, are the coin flips, or the coin 493 00:25:36,620 --> 00:25:39,730 tosses, independent of each other? 494 00:25:39,730 --> 00:25:46,370 If we just stay inside this sub-model here, are the coin 495 00:25:46,370 --> 00:25:47,620 flips independent? 496 00:25:47,620 --> 00:25:52,240 497 00:25:52,240 --> 00:25:56,540 They are independent, because the probability of heads in 498 00:25:56,540 --> 00:26:01,780 the second toss is the same, 0.9, no matter what happened 499 00:26:01,780 --> 00:26:03,550 in the first toss. 500 00:26:03,550 --> 00:26:06,550 So the conditional probabilities of what happens 501 00:26:06,550 --> 00:26:10,050 in the second toss are not affected by the outcome of the 502 00:26:10,050 --> 00:26:11,180 first toss.
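A small enumeration can confirm this within the coin-A sub-model. The sketch below is illustrative (the helper names `two_flip_model` and `prob` are hypothetical): it builds the two-flip model for a coin with P(heads) = 0.9 and compares P(second heads) with P(second heads | first heads).

```python
from fractions import Fraction
from itertools import product


def two_flip_model(p_heads):
    """Outcome probabilities for two independent flips of a coin with P(heads) = p_heads."""
    def p(face):
        return p_heads if face == "H" else 1 - p_heads
    return {(f1, f2): p(f1) * p(f2) for f1, f2 in product("HT", repeat=2)}


def prob(model, event):
    """Total probability of the outcomes for which event(outcome) is True."""
    return sum(p for outcome, p in model.items() if event(outcome))


def first_heads(outcome):
    return outcome[0] == "H"


def second_heads(outcome):
    return outcome[1] == "H"


# Sub-model for coin A: the flips, conditioned on having picked coin A.
coin_a = two_flip_model(Fraction(9, 10))
p2 = prob(coin_a, second_heads)
p_both = prob(coin_a, lambda o: first_heads(o) and second_heads(o))
p2_given_1 = p_both / prob(coin_a, first_heads)
print(p2, p2_given_1)  # 9/10 9/10
```

The same helpers applied to `two_flip_model(Fraction(1, 10))` describe the coin-B sub-model, where two heads have probability 0.1 times 0.1.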
503 00:26:11,180 --> 00:26:14,620 So the second toss and the first toss are independent. 504 00:26:14,620 --> 00:26:17,800 So here we're just dealing with plain, 505 00:26:17,800 --> 00:26:19,990 independent coin flips. 506 00:26:19,990 --> 00:26:24,940 Similarly, the coin flips within this sub-model are also 507 00:26:24,940 --> 00:26:26,190 independent. 508 00:26:26,190 --> 00:26:28,840 509 00:26:28,840 --> 00:26:33,410 Now the question is, if we look at the big model as just 510 00:26:33,410 --> 00:26:38,955 one probability model, instead of looking at the conditional 511 00:26:38,955 --> 00:26:44,530 sub-models, are the coin flips independent of each other? 512 00:26:44,530 --> 00:26:49,590 Does the outcome of a few coin flips give you information 513 00:26:49,590 --> 00:26:53,610 about subsequent coin flips? 514 00:26:53,610 --> 00:27:02,570 Well, if I observe ten heads in a row-- 515 00:27:02,570 --> 00:27:05,960 So instead of two coin flips, now let's think of doing more 516 00:27:05,960 --> 00:27:10,070 of them so that the tree gets expanded. 517 00:27:10,070 --> 00:27:13,800 So let's start with this. 518 00:27:13,800 --> 00:27:16,020 I don't know which coin it is. 519 00:27:16,020 --> 00:27:18,970 What's the probability that the 11th coin toss 520 00:27:18,970 --> 00:27:20,220 is going to be heads? 521 00:27:20,220 --> 00:27:25,570 522 00:27:25,570 --> 00:27:29,370 There's complete symmetry here, so the answer could not 523 00:27:29,370 --> 00:27:32,330 be anything other than 1/2. 524 00:27:32,330 --> 00:27:36,950 So let's justify it: why is it 1/2? 525 00:27:36,950 --> 00:27:40,560 Well, the probability that the 11th toss is heads, how can 526 00:27:40,560 --> 00:27:42,380 that outcome happen? 527 00:27:42,380 --> 00:27:43,840 It can happen in two ways. 528 00:27:43,840 --> 00:27:50,480 You can choose coin A, which happens with probability 1/2.
529 00:27:50,480 --> 00:27:54,370 And having chosen coin A, there's probability 0.9 that 530 00:27:54,370 --> 00:27:58,500 you get heads in the 11th toss. 531 00:27:58,500 --> 00:28:03,540 Or you can choose coin B. And if it's coin B, when you flip 532 00:28:03,540 --> 00:28:06,710 it, there's probability 0.1 that you get heads. 533 00:28:06,710 --> 00:28:08,860 So the final answer is 1/2 times 0.9 plus 1/2 times 0.1, which is 1/2. 534 00:28:08,860 --> 00:28:11,370 535 00:28:11,370 --> 00:28:14,820 So each one of the coins is biased, but they're biased in 536 00:28:14,820 --> 00:28:16,190 different ways. 537 00:28:16,190 --> 00:28:20,340 If I don't know which coin it is, their two biases kind of 538 00:28:20,340 --> 00:28:23,740 cancel out, and the probability of obtaining heads 539 00:28:23,740 --> 00:28:27,880 is just in the middle, so it's 1/2. 540 00:28:27,880 --> 00:28:31,720 Now if someone tells you that the first ten tosses were 541 00:28:31,720 --> 00:28:34,940 heads, is that going to change your beliefs 542 00:28:34,940 --> 00:28:37,300 about the 11th toss? 543 00:28:37,300 --> 00:28:41,820 Here's how a reasonable person would think about it. 544 00:28:41,820 --> 00:28:49,480 If it's coin B, the probability of obtaining 10 heads in a row 545 00:28:49,480 --> 00:28:51,510 is negligible. 546 00:28:51,510 --> 00:28:55,270 It's going to be 0.1 to the 10th. 547 00:28:55,270 --> 00:28:59,110 If it's coin A, the probability of 10 heads in a 548 00:28:59,110 --> 00:29:01,380 row is a more reasonable number. 549 00:29:01,380 --> 00:29:03,850 It's 0.9 to the 10th. 550 00:29:03,850 --> 00:29:10,320 So this event is a lot more likely to occur with coin A, 551 00:29:10,320 --> 00:29:13,910 rather than coin B. 552 00:29:13,910 --> 00:29:18,820 The plausible explanation of having seen ten heads in a row 553 00:29:18,820 --> 00:29:25,730 is that I actually chose coin A.
When you see ten heads in a 554 00:29:25,730 --> 00:29:29,690 row, you are pretty certain that it's coin A that we're 555 00:29:29,690 --> 00:29:30,940 dealing with. 556 00:29:30,940 --> 00:29:33,800 And once you're pretty certain that it's coin A that we're 557 00:29:33,800 --> 00:29:36,350 dealing with, what's the probability that the 558 00:29:36,350 --> 00:29:38,246 next toss is heads? 559 00:29:38,246 --> 00:29:40,960 It's going to be 0.9. 560 00:29:40,960 --> 00:29:45,270 So essentially here I'm doing an inference calculation. 561 00:29:45,270 --> 00:29:48,990 Given this information, I'm making an inference about 562 00:29:48,990 --> 00:29:50,700 which coin I'm dealing with. 563 00:29:50,700 --> 00:29:53,540 564 00:29:53,540 --> 00:29:57,240 I become pretty certain that it's coin A, and given that 565 00:29:57,240 --> 00:30:00,640 it's coin A, this probability is going to be 0.9. 566 00:30:00,640 --> 00:30:04,070 And I'm putting an approximate sign here, because the 567 00:30:04,070 --> 00:30:06,220 inference that I did is approximate. 568 00:30:06,220 --> 00:30:09,850 I'm pretty certain it's coin A. I'm not 100% certain that 569 00:30:09,850 --> 00:30:11,200 it's coin A. 570 00:30:11,200 --> 00:30:15,430 But in any case what happens here is that the unconditional 571 00:30:15,430 --> 00:30:19,440 probability is different from the conditional probability. 572 00:30:19,440 --> 00:30:23,710 This information here makes me change my beliefs 573 00:30:23,710 --> 00:30:25,590 about the 11th toss. 574 00:30:25,590 --> 00:30:30,560 And this means that the 11th toss is dependent on the 575 00:30:30,560 --> 00:30:31,530 previous tosses. 576 00:30:31,530 --> 00:30:35,590 So the coin tosses have now become dependent. 577 00:30:35,590 --> 00:30:38,790 What is the physical link that causes this dependence? 578 00:30:38,790 --> 00:30:42,710 Well, the physical link is the choice of the coin. 
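The total-probability and inference steps above can be checked exactly. This is a sketch of the lecture's two-coin model (coin picked with probability 1/2 each; P(heads) is 0.9 for coin A and 0.1 for coin B; tosses independent given the coin); the helper name `prob_first_n_heads` is hypothetical.

```python
from fractions import Fraction

# Model assumed from the lecture: pick coin A or coin B with probability 1/2 each.
# Coin A lands heads with probability 0.9, coin B with probability 0.1, and,
# given the chosen coin, the tosses are independent.
PRIOR = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
P_HEADS = {"A": Fraction(9, 10), "B": Fraction(1, 10)}


def prob_first_n_heads(n):
    """P(the first n tosses are all heads), via the total probability theorem."""
    return sum(PRIOR[coin] * P_HEADS[coin] ** n for coin in PRIOR)


# Unconditional probability that a single toss is heads (the same for every toss,
# including the 11th).
p_heads = prob_first_n_heads(1)

# Two tosses are not independent in the combined model: 41/100 versus 1/4.
p_two_heads = prob_first_n_heads(2)

# P(11th toss is heads | first 10 tosses are heads) = P(11 heads) / P(10 heads).
p_11_given_10 = prob_first_n_heads(11) / prob_first_n_heads(10)

# Bayes' rule: posterior probability that coin A was chosen after 10 heads.
posterior_a = PRIOR["A"] * P_HEADS["A"] ** 10 / prob_first_n_heads(10)

print(p_heads, p_two_heads, p_heads ** 2)  # 1/2 41/100 1/4
print(float(p_11_given_10))  # just below 0.9
print(float(posterior_a))    # just below 1
```

The conditional probability comes out slightly below 0.9 rather than exactly 0.9, which matches the "approximately" in the argument: ten heads make coin A overwhelmingly likely but not certain.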
579 00:30:42,710 --> 00:30:46,580 By choosing a particular coin, I'm introducing a pattern in 580 00:30:46,580 --> 00:30:48,200 the future coin tosses. 581 00:30:48,200 --> 00:30:52,740 And that pattern is what causes dependence. 582 00:30:52,740 --> 00:30:55,670 OK, so I've been playing a little bit too loose with the 583 00:30:55,670 --> 00:30:59,810 language here, because we defined the concept of 584 00:30:59,810 --> 00:31:01,810 independence of two events. 585 00:31:01,810 --> 00:31:06,180 But here I have been referring to independent coin tosses, 586 00:31:06,180 --> 00:31:08,380 where I'm thinking about many coin tosses, 587 00:31:08,380 --> 00:31:11,200 like 10 or 11 of them. 588 00:31:11,200 --> 00:31:15,160 So to be proper, I should also have defined for you the 589 00:31:15,160 --> 00:31:18,710 notion of independence of multiple events, not just two. 590 00:31:18,710 --> 00:31:21,970 We don't want to just say coin toss one is independent from 591 00:31:21,970 --> 00:31:23,170 coin toss two. 592 00:31:23,170 --> 00:31:26,250 We want to be able to say something like, these 10 593 00:31:26,250 --> 00:31:29,690 coin tosses are all independent of each other. 594 00:31:29,690 --> 00:31:33,800 Intuitively what that means should be the same thing-- 595 00:31:33,800 --> 00:31:37,450 that information about some of the coin tosses doesn't change 596 00:31:37,450 --> 00:31:40,220 your beliefs about the remaining coin tosses. 597 00:31:40,220 --> 00:31:43,580 How do we translate that into a mathematical definition? 598 00:31:43,580 --> 00:31:48,600 Well, an ugly attempt would be to impose 599 00:31:48,600 --> 00:31:51,800 requirements such as this. 600 00:31:51,800 --> 00:31:56,780 Think of A1 being the event that the first flip was heads. 601 00:31:56,780 --> 00:32:00,980 A2 is the event that the second flip was heads. 602 00:32:00,980 --> 00:32:04,320 A3, that the third flip was heads, and so on.
603 00:32:04,320 --> 00:32:08,310 Here is an event whose occurrence or not is determined 604 00:32:08,310 --> 00:32:10,860 by the first three coin flips. 605 00:32:10,860 --> 00:32:13,400 And here's an event whose occurrence or not is 606 00:32:13,400 --> 00:32:16,680 determined by the fifth and sixth coin flips. 607 00:32:16,680 --> 00:32:19,080 If we think physically that all those coin flips have 608 00:32:19,080 --> 00:32:22,220 nothing to do with each other, information about the fifth 609 00:32:22,220 --> 00:32:26,420 and sixth coin flips is not going to change what we expect 610 00:32:26,420 --> 00:32:27,960 from the first three. 611 00:32:27,960 --> 00:32:30,780 So the probability of this event, the conditional 612 00:32:30,780 --> 00:32:33,050 probability, should be the same as the unconditional 613 00:32:33,050 --> 00:32:34,430 probability. 614 00:32:34,430 --> 00:32:38,850 And we would like a relation of this kind to be true, no 615 00:32:38,850 --> 00:32:43,480 matter what kind of formula you write down, as long as the 616 00:32:43,480 --> 00:32:47,230 events that show up here are different from the events that 617 00:32:47,230 --> 00:32:49,250 show up there. 618 00:32:49,250 --> 00:32:49,770 OK. 619 00:32:49,770 --> 00:32:52,150 That's sort of an ugly definition. 620 00:32:52,150 --> 00:32:55,350 The mathematical definition that actually does the job, 621 00:32:55,350 --> 00:32:59,530 and leads to all the formulas of this 622 00:32:59,530 --> 00:33:01,130 kind, is the following. 623 00:33:01,130 --> 00:33:03,610 We're going to say that a collection of events is 624 00:33:03,610 --> 00:33:07,090 independent if we can find the probability of their joint 625 00:33:07,090 --> 00:33:11,780 occurrence by just multiplying probabilities. 626 00:33:11,780 --> 00:33:17,380 And that has to be true even if you look at sub-collections of 627 00:33:17,380 --> 00:33:18,640 these events. 628 00:33:18,640 --> 00:33:20,670 Let's make that more precise.
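As a formula, the definition being described can be sketched as follows for events A_1, ..., A_n: the product rule must hold for every sub-collection of indices S (sub-collections with fewer than two indices are omitted because they hold trivially):

```latex
P\Bigl(\bigcap_{i \in S} A_i\Bigr) = \prod_{i \in S} P(A_i)
\quad \text{for every } S \subseteq \{1, \dots, n\} \text{ with } |S| \ge 2
```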
629 00:33:20,670 --> 00:33:24,310 If we have three events, the definition tells us that the 630 00:33:24,310 --> 00:33:27,560 three events are independent if the following are true. 631 00:33:27,560 --> 00:33:31,830 The probability of A1 and A2 and A3: you can calculate this 632 00:33:31,830 --> 00:33:34,840 probability by multiplying individual probabilities. 633 00:33:34,840 --> 00:33:38,370 634 00:33:38,370 --> 00:33:44,320 But the same must be true even if you take fewer events. 635 00:33:44,320 --> 00:33:46,740 Just a few indices out of the indices 636 00:33:46,740 --> 00:33:48,340 that we have available. 637 00:33:48,340 --> 00:33:54,970 So we also require that P(A1 intersection A2) equals P(A1) 638 00:33:54,970 --> 00:33:57,600 times P(A2). 639 00:33:57,600 --> 00:34:01,250 And similarly for the other possibilities of 640 00:34:01,250 --> 00:34:02,500 choosing the indices. 641 00:34:02,500 --> 00:34:10,900 642 00:34:10,900 --> 00:34:14,659 OK, so the mathematical definition of independence 643 00:34:14,659 --> 00:34:18,860 requires that the probability of any 644 00:34:18,860 --> 00:34:22,370 intersection of the events we have in our hands 645 00:34:22,370 --> 00:34:25,590 can be calculated by just multiplying individual 646 00:34:25,590 --> 00:34:27,000 probabilities. 647 00:34:27,000 --> 00:34:30,230 And this has to apply to the case where we consider all of 648 00:34:30,230 --> 00:34:33,300 the events in our hands or just 649 00:34:33,300 --> 00:34:36,900 sub-collections of those events. 650 00:34:36,900 --> 00:34:42,130 Now these pairwise relations, by themselves, are called pairwise 651 00:34:42,130 --> 00:34:44,389 independence. 652 00:34:44,389 --> 00:34:47,179 So this relation, for example, tells us that A1 is 653 00:34:47,179 --> 00:34:48,710 independent from A2. 654 00:34:48,710 --> 00:34:51,130 This tells us that A2 is independent from A3. 655 00:34:51,130 --> 00:34:54,670 This will tell us that A1 is independent from A3.
656 00:34:54,670 --> 00:34:58,990 But independence of all the events together actually 657 00:34:58,990 --> 00:35:01,020 requires a little more. 658 00:35:01,020 --> 00:35:05,080 It requires one more equality, which has to do with all three events being 659 00:35:05,080 --> 00:35:07,000 considered at the same time. 660 00:35:07,000 --> 00:35:10,562 And this extra equality is not redundant. 661 00:35:10,562 --> 00:35:13,020 It actually does make a difference. 662 00:35:13,020 --> 00:35:15,390 Independence and pairwise independence 663 00:35:15,390 --> 00:35:17,310 are different things. 664 00:35:17,310 --> 00:35:20,320 So let's illustrate the situation with an example. 665 00:35:20,320 --> 00:35:22,790 Suppose we have two coin flips. 666 00:35:22,790 --> 00:35:28,390 The coin tosses are independent, and the bias is 667 00:35:28,390 --> 00:35:32,910 1/2, so all possible outcomes have a probability of 1/2 668 00:35:32,910 --> 00:35:36,100 times 1/2, which is 1/4. 669 00:35:36,100 --> 00:35:40,520 And let's consider now a bunch of different events. 670 00:35:40,520 --> 00:35:46,290 One event, A, is that the first toss is heads. 671 00:35:46,290 --> 00:35:48,950 This is this blue set here. 672 00:35:48,950 --> 00:35:54,990 Another event, B, is that the second toss is heads. 673 00:35:54,990 --> 00:35:57,970 And this is this black event here. 674 00:35:57,970 --> 00:36:00,770 675 00:36:00,770 --> 00:36:01,850 OK. 676 00:36:01,850 --> 00:36:04,500 Are these two events independent? 677 00:36:04,500 --> 00:36:06,660 If you check it mathematically, yes. 678 00:36:06,660 --> 00:36:09,270 Probability of A and probability of B are both 1/2. 679 00:36:09,270 --> 00:36:13,170 Probability of A times probability of B is 1/4, which 680 00:36:13,170 --> 00:36:16,700 is the same as the probability of A intersection B, 681 00:36:16,700 --> 00:36:18,070 which is this set. 682 00:36:18,070 --> 00:36:20,680 So we have just checked mathematically that A and B 683 00:36:20,680 --> 00:36:22,180 are independent.
684 00:36:22,180 --> 00:36:26,210 Now let's consider a third event, C, which is that the first 685 00:36:26,210 --> 00:36:30,080 and second tosses give the same result. 686 00:36:30,080 --> 00:36:32,270 I'll use a different color. 687 00:36:32,270 --> 00:36:35,400 First and second tosses give the same result. 688 00:36:35,400 --> 00:36:38,350 This is the event that we obtain heads, 689 00:36:38,350 --> 00:36:40,700 heads or tails, tails. 690 00:36:40,700 --> 00:36:43,030 So this is the event C. What's the 691 00:36:43,030 --> 00:36:44,280 probability of C? 692 00:36:44,280 --> 00:36:47,790 693 00:36:47,790 --> 00:36:51,520 Well, C is made up of two outcomes, each one of which 694 00:36:51,520 --> 00:36:55,500 has probability 1/4, so the probability of C is 1/2. 695 00:36:55,500 --> 00:36:58,600 What is the probability of C intersection A? 696 00:36:58,600 --> 00:37:02,760 C intersection A is just this one outcome, and has 697 00:37:02,760 --> 00:37:06,030 probability 1/4. 698 00:37:06,030 --> 00:37:10,040 What's the probability of A intersection B intersection C? 699 00:37:10,040 --> 00:37:13,650 The three events intersect in just this outcome, so this 700 00:37:13,650 --> 00:37:15,620 probability is also 1/4. 701 00:37:15,620 --> 00:37:18,610 702 00:37:18,610 --> 00:37:19,860 OK. 703 00:37:19,860 --> 00:37:24,130 704 00:37:24,130 --> 00:37:27,060 What's the probability of C given A and B? 705 00:37:27,060 --> 00:37:29,800 706 00:37:29,800 --> 00:37:34,840 If A has occurred, and B has occurred, you are certain that 707 00:37:34,840 --> 00:37:36,980 this outcome here happened. 708 00:37:36,980 --> 00:37:40,160 If the first toss is H and the second toss is H, then you're 709 00:37:40,160 --> 00:37:41,970 certain that the first and second tosses 710 00:37:41,970 --> 00:37:43,760 gave the same result. 711 00:37:43,760 --> 00:37:46,500 So the conditional probability of C given A and 712 00:37:46,500 --> 00:37:49,050 B is equal to 1.
713 00:37:49,050 --> 00:37:51,640 So do we have independence in this example? 714 00:37:51,640 --> 00:37:54,310 715 00:37:54,310 --> 00:37:55,970 We don't. 716 00:37:55,970 --> 00:38:00,210 C, that we obtain the same result in the first and the 717 00:38:00,210 --> 00:38:04,020 second toss, has probability 1/2. 718 00:38:04,020 --> 00:38:08,480 Half of the possible outcomes give us two coin flips with 719 00:38:08,480 --> 00:38:10,700 the same result-- heads, heads or tails, tails. 720 00:38:10,700 --> 00:38:12,970 So the probability of C is 1/2. 721 00:38:12,970 --> 00:38:17,590 But if I tell you that the events A and B both occurred, 722 00:38:17,590 --> 00:38:20,900 then you're certain that C occurred. 723 00:38:20,900 --> 00:38:23,190 If I tell you that we had heads and heads, then you're 724 00:38:23,190 --> 00:38:25,460 certain the outcomes were the same. 725 00:38:25,460 --> 00:38:28,830 So the conditional probability is different from the 726 00:38:28,830 --> 00:38:31,400 unconditional probability. 727 00:38:31,400 --> 00:38:37,050 So by comparing these two probabilities, we get 728 00:38:37,050 --> 00:38:39,235 that the three events are not independent. Equivalently, P(A intersection B intersection C) is 1/4, not 1/2 times 1/2 times 1/2, which is 1/8. 729 00:38:39,235 --> 00:38:42,260 730 00:38:42,260 --> 00:38:45,520 But are they pairwise independent? 731 00:38:45,520 --> 00:38:49,020 Is A independent from B? 732 00:38:49,020 --> 00:38:53,400 Yes, because probability of A times probability of B is 1/4, 733 00:38:53,400 --> 00:38:58,670 which is probability of A intersection B. Is C 734 00:38:58,670 --> 00:39:02,350 independent from A? 735 00:39:02,350 --> 00:39:05,780 Well, the probability of C and A is 1/4. 736 00:39:05,780 --> 00:39:07,620 The probability of C is 1/2. 737 00:39:07,620 --> 00:39:09,830 The probability of A is 1/2. 738 00:39:09,830 --> 00:39:11,150 So it checks. 739 00:39:11,150 --> 00:39:17,960 1/4 is equal to 1/2 times 1/2, so event C and event A are 740 00:39:17,960 --> 00:39:19,410 independent.
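The difference between pairwise independence and independence of the whole collection can be checked mechanically for this example. The sketch below enumerates the four equally likely outcomes and tests the product rule over every sub-collection; the function names are illustrative, not from the lecture.

```python
from fractions import Fraction
from itertools import combinations, product

# Two fair, independent tosses: each of the four outcomes has probability 1/4.
omega = {outcome: Fraction(1, 4) for outcome in product("HT", repeat=2)}

# The lecture's events, represented as sets of outcomes.
A = {o for o in omega if o[0] == "H"}   # first toss is heads
B = {o for o in omega if o[1] == "H"}   # second toss is heads
C = {o for o in omega if o[0] == o[1]}  # both tosses give the same result


def prob(event):
    return sum(omega[o] for o in event)


def product_rule_holds(events):
    """True if P(intersection of the events) equals the product of their probabilities."""
    rhs = Fraction(1)
    for e in events:
        rhs *= prob(e)
    return prob(set.intersection(*events)) == rhs


def pairwise_independent(events):
    return all(product_rule_holds(pair) for pair in combinations(events, 2))


def independent(events):
    # Every sub-collection of two or more events must satisfy the product rule.
    return all(
        product_rule_holds(sub)
        for k in range(2, len(events) + 1)
        for sub in combinations(events, k)
    )


print(pairwise_independent([A, B, C]))  # True
print(independent([A, B, C]))           # False: P(A & B & C) is 1/4, not 1/8
print(prob(A & B & C) / prob(A & B))    # P(C | A and B) = 1
```

Checking every sub-collection costs on the order of 2^n product-rule tests for n events, which is fine for small examples like this one.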
741 00:39:19,410 --> 00:39:24,490 Knowing that the first toss was heads does not change your 742 00:39:24,490 --> 00:39:28,600 beliefs about whether the two tosses are going to have the 743 00:39:28,600 --> 00:39:31,380 same outcome or not. 744 00:39:31,380 --> 00:39:34,200 Knowing that the first was heads, well, the second is 745 00:39:34,200 --> 00:39:36,520 equally likely to be heads or tails. 746 00:39:36,520 --> 00:39:39,710 So event C has just the same probability, 747 00:39:39,710 --> 00:39:42,140 again, 1/2, to occur. 748 00:39:42,140 --> 00:39:46,110 To put it the opposite way, if I tell you that the two 749 00:39:46,110 --> 00:39:47,860 results were the same-- 750 00:39:47,860 --> 00:39:51,130 so it's either heads, heads or tails, tails-- 751 00:39:51,130 --> 00:39:53,070 what does that tell you about the first toss? 752 00:39:53,070 --> 00:39:54,800 Is it heads, or is it tails? 753 00:39:54,800 --> 00:39:56,570 Well, it doesn't tell you anything. 754 00:39:56,570 --> 00:39:59,700 It could be either of the two, so the probability of 755 00:39:59,700 --> 00:40:04,490 heads in the first toss is equal to 1/2, and telling you 756 00:40:04,490 --> 00:40:07,460 C occurred does not change anything. 757 00:40:07,460 --> 00:40:10,830 So this is an example that illustrates the case where we 758 00:40:10,830 --> 00:40:14,650 have three events in which we check that pairwise 759 00:40:14,650 --> 00:40:18,140 independence holds for any combination of 760 00:40:18,140 --> 00:40:19,250 two of these events. 761 00:40:19,250 --> 00:40:21,900 For each pair, the probability of their intersection is equal to 762 00:40:21,900 --> 00:40:23,760 the product of their probabilities. 763 00:40:23,760 --> 00:40:27,930 On the other hand, the three events taken all together are 764 00:40:27,930 --> 00:40:29,500 not independent. 765 00:40:29,500 --> 00:40:32,780 A doesn't tell me anything useful about whether C is going to 766 00:40:32,780 --> 00:40:34,710 occur or not.
767 00:40:34,710 --> 00:40:36,730 B doesn't tell me anything useful. 768 00:40:36,730 --> 00:40:40,840 But if I tell you that both A and B occurred, the two of 769 00:40:40,840 --> 00:40:44,150 them together tell me something useful about C. 770 00:40:44,150 --> 00:40:47,165 Namely, they tell me that C certainly has occurred. 771 00:40:47,165 --> 00:40:49,750 772 00:40:49,750 --> 00:40:51,000 Very good. 773 00:40:51,000 --> 00:40:53,900 774 00:40:53,900 --> 00:40:56,890 So independence is a somewhat subtle concept. 775 00:40:56,890 --> 00:40:59,710 Once you grasp the intuition of what it really means, then 776 00:40:59,710 --> 00:41:02,910 things perhaps fall into place. 777 00:41:02,910 --> 00:41:06,630 But it's a concept that is easy 778 00:41:06,630 --> 00:41:07,430 to misunderstand. 779 00:41:07,430 --> 00:41:11,370 So just take some time to digest it. 780 00:41:11,370 --> 00:41:14,810 So to lighten things up, I'm going to spend the remaining 781 00:41:14,810 --> 00:41:18,810 four minutes talking about a very nice, simple problem that 782 00:41:18,810 --> 00:41:23,240 involves conditional probabilities and the like. 783 00:41:23,240 --> 00:41:28,140 So here's the problem, formulated exactly as it shows 784 00:41:28,140 --> 00:41:30,250 up in various textbooks. 785 00:41:30,250 --> 00:41:31,780 And the formulation says the following. 786 00:41:31,780 --> 00:41:35,310 Well, consider one of those anachronistic places where 787 00:41:35,310 --> 00:41:40,050 they still have kings or queens, and where actually 788 00:41:40,050 --> 00:41:43,090 boys take precedence over girls. 789 00:41:43,090 --> 00:41:44,600 So if there is a boy-- 790 00:41:44,600 --> 00:41:47,280 791 00:41:47,280 --> 00:41:52,400 if the royal family has a boy, then he will become the king 792 00:41:52,400 --> 00:41:58,080 even if he has an older sister who might be the queen. 793 00:41:58,080 --> 00:42:02,930 So we have one of those royal families.
794 00:42:02,930 --> 00:42:06,810 That royal family had two children, and we know that 795 00:42:06,810 --> 00:42:08,060 there is a king. 796 00:42:08,060 --> 00:42:11,370 797 00:42:11,370 --> 00:42:14,250 There is a king, which means that at least one of the two 798 00:42:14,250 --> 00:42:16,030 children was a boy. 799 00:42:16,030 --> 00:42:18,970 Otherwise we wouldn't have a king. 800 00:42:18,970 --> 00:42:21,885 What is the probability that the king's sibling is female? 801 00:42:21,885 --> 00:42:24,920 802 00:42:24,920 --> 00:42:26,170 OK. 803 00:42:26,170 --> 00:42:28,260 804 00:42:28,260 --> 00:42:30,830 I guess we need to make some assumptions about genetics. 805 00:42:30,830 --> 00:42:33,910 Let's assume that every child is a boy or a girl with 806 00:42:33,910 --> 00:42:39,440 probability 1/2, and that the sex of each child 807 00:42:39,440 --> 00:42:42,900 is independent of the sex of the other children. 808 00:42:42,900 --> 00:42:47,660 So every childbirth is basically a coin flip. 809 00:42:47,660 --> 00:42:50,740 OK, so if you take that, you say, well, 810 00:42:50,740 --> 00:42:52,980 the king is a child. 811 00:42:52,980 --> 00:42:55,890 His sibling is another child. 812 00:42:55,890 --> 00:42:58,450 Children are independent of each other. 813 00:42:58,450 --> 00:43:05,860 So the probability that the sibling is a girl is 1/2. 814 00:43:05,860 --> 00:43:07,620 That's the naive answer. 815 00:43:07,620 --> 00:43:09,270 Now let's try to do it formally. 816 00:43:09,270 --> 00:43:12,410 Let's set up a model of the experiment. 817 00:43:12,410 --> 00:43:15,650 The royal family had two children, as we were told, so 818 00:43:15,650 --> 00:43:17,020 there are four outcomes-- 819 00:43:17,020 --> 00:43:22,040 boy boy, boy girl, girl boy, and girl girl. 820 00:43:22,040 --> 00:43:26,520 Now, we are told that there is a king, which means what? 821 00:43:26,520 --> 00:43:29,530 This outcome here did not happen.
822 00:43:29,530 --> 00:43:30,760 It is not possible. 823 00:43:30,760 --> 00:43:33,810 There are three outcomes that remain possible. 824 00:43:33,810 --> 00:43:37,940 So this is our conditional sample space given 825 00:43:37,940 --> 00:43:40,500 that there is a king. 826 00:43:40,500 --> 00:43:43,170 What are the probabilities for the original model? 827 00:43:43,170 --> 00:43:46,420 Well, with the model where we assume that every child is a 828 00:43:46,420 --> 00:43:50,885 boy or a girl independently with probability 1/2, the 829 00:43:50,885 --> 00:43:54,950 four outcomes are equally likely, each with probability 1/4. 830 00:43:54,950 --> 00:43:57,110 These are the original probabilities. 831 00:43:57,110 --> 00:44:00,810 But once we are told that this outcome did not happen, 832 00:44:00,810 --> 00:44:03,900 because we have a king, then we are transported to the 833 00:44:03,900 --> 00:44:05,830 smaller sample space. 834 00:44:05,830 --> 00:44:08,380 In this sample space, what's the probability that the 835 00:44:08,380 --> 00:44:10,360 sibling is a girl? 836 00:44:10,360 --> 00:44:15,160 Well, the three remaining outcomes are equally likely, and the sibling is a girl in two of them. 837 00:44:15,160 --> 00:44:17,290 So the probability that the sibling is a 838 00:44:17,290 --> 00:44:21,880 girl is actually 2/3. 839 00:44:21,880 --> 00:44:25,780 So that's supposed to be the right answer. 840 00:44:25,780 --> 00:44:29,620 Maybe a little counter-intuitive. 841 00:44:29,620 --> 00:44:32,960 So you can play smart and say, oh, I understand such problems 842 00:44:32,960 --> 00:44:35,990 better than you, here is a trick problem and here's why 843 00:44:35,990 --> 00:44:37,800 the answer is 2/3. 844 00:44:37,800 --> 00:44:41,300 But actually I'm not fully justified in saying that the 845 00:44:41,300 --> 00:44:42,930 answer is 2/3. 846 00:44:42,930 --> 00:44:46,520 I made lots of hidden assumptions when I put this 847 00:44:46,520 --> 00:44:50,040 model down, which I didn't yet state.
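The 2/3 computation can be reproduced by enumerating the four outcomes. This sketch encodes the model the answer relies on (exactly two children, each a boy or a girl with probability 1/2, independently); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed model behind the 2/3 answer: the parents have exactly two children, and each
# child is a boy ("B") or a girl ("G") with probability 1/2, independently.
outcomes = {kids: Fraction(1, 4) for kids in product("BG", repeat=2)}

# Condition on "there is a king", i.e., at least one child is a boy.
with_king = {kids: p for kids, p in outcomes.items() if "B" in kids}
p_king = sum(with_king.values())

# Given a king, his sibling is a girl exactly when the family has one boy and one girl.
p_sibling_girl = sum(p for kids, p in with_king.items() if "G" in kids) / p_king
print(p_king, p_sibling_girl)  # 3/4 2/3
```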
848 00:44:50,040 --> 00:44:54,960 So to reverse engineer this answer, let's actually think 849 00:44:54,960 --> 00:44:57,960 about the probability model for which this would have been 850 00:44:57,960 --> 00:44:59,320 the right answer. 851 00:44:59,320 --> 00:45:01,300 And here's the probability model. 852 00:45:01,300 --> 00:45:02,800 The royal family-- 853 00:45:02,800 --> 00:45:07,050 the royal parents decided to have exactly two children. 854 00:45:07,050 --> 00:45:08,960 They went and had them. 855 00:45:08,960 --> 00:45:11,670 It turned out that at least one was a boy 856 00:45:11,670 --> 00:45:13,390 and became king. 857 00:45:13,390 --> 00:45:15,580 Under this scenario-- 858 00:45:15,580 --> 00:45:18,070 that they decide to have exactly two children-- 859 00:45:18,070 --> 00:45:20,840 then this is the big sample space. 860 00:45:20,840 --> 00:45:23,350 It turned out that at least one was a boy. 861 00:45:23,350 --> 00:45:25,560 That eliminates the girl-girl outcome. 862 00:45:25,560 --> 00:45:27,410 And then this picture is correct and this 863 00:45:27,410 --> 00:45:28,750 is the right answer. 864 00:45:28,750 --> 00:45:31,680 But there are hidden assumptions here. 865 00:45:31,680 --> 00:45:35,170 How about if the royal family had followed 866 00:45:35,170 --> 00:45:37,230 the following strategy? 867 00:45:37,230 --> 00:45:41,760 We're going to have children until we get a boy, so that we 868 00:45:41,760 --> 00:45:45,700 get a king, and then we'll stop. 869 00:45:45,700 --> 00:45:47,660 OK, given that they have two children, what's the 870 00:45:47,660 --> 00:45:50,660 probability that the sibling is a girl? 871 00:45:50,660 --> 00:45:51,880 It's 1. 872 00:45:51,880 --> 00:45:55,260 The reason that they had two children was that the first 873 00:45:55,260 --> 00:45:57,800 was a girl, so they had to have a second.
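Under the alternative "stop at the first boy" strategy, the same conditioning gives a different answer. In this sketch the families B, GB, GGB, ... are enumerated with their probabilities; `MAX_CHILDREN` is a hypothetical cap that only keeps the enumeration finite and does not affect the result, since we condition on exactly two children.

```python
from fractions import Fraction

# Assumed stopping rule from the lecture: have children until the first boy, then stop.
# Possible families are B, GB, GGB, ...; a family with n children has probability (1/2)**n.
# MAX_CHILDREN is a hypothetical cap that keeps the enumeration finite; it does not
# affect the answer, since we only condition on families with exactly two children.
MAX_CHILDREN = 10


def stop_at_first_boy_families(max_children):
    """Yield (children, probability) for each family of at most max_children children."""
    for n in range(1, max_children + 1):
        yield "G" * (n - 1) + "B", Fraction(1, 2) ** n


families = list(stop_at_first_boy_families(MAX_CHILDREN))

# Condition on the family having exactly two children: the king plus one sibling.
two_kids = [(kids, p) for kids, p in families if len(kids) == 2]
p_two = sum(p for _, p in two_kids)
p_sibling_girl = sum(p for kids, p in two_kids if "G" in kids) / p_two
print(two_kids)        # [('GB', Fraction(1, 4))]
print(p_sibling_girl)  # 1
```

The only two-child family under this rule is girl-then-boy, which is why the conditional probability is 1 rather than 2/3.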
874 00:45:57,800 --> 00:46:00,820 So assumptions about reproductive practices 875 00:46:00,820 --> 00:46:03,130 actually need to come in, and they're going 876 00:46:03,130 --> 00:46:04,630 to affect the answer. 877 00:46:04,630 --> 00:46:08,010 Or, if it's one of those ancient kingdoms where a king 878 00:46:08,010 --> 00:46:11,790 would always make sure to strangle any of his brothers, 879 00:46:11,790 --> 00:46:15,560 then the probability that the sibling is a girl is actually 880 00:46:15,560 --> 00:46:17,570 1 again, and so on. 881 00:46:17,570 --> 00:46:20,590 So it means that one needs to be careful, when starting with 882 00:46:20,590 --> 00:46:24,330 loosely worded problems, to make sure of exactly what they 883 00:46:24,330 --> 00:46:26,950 mean and what assumptions one is making. 884 00:46:26,950 --> 00:46:28,880 All right, see you next week. 885 00:46:28,880 --> 00:46:30,130