1 00:00:00,060 --> 00:00:02,490 The following content is provided under a Creative 2 00:00:02,490 --> 00:00:04,030 Commons license. 3 00:00:04,030 --> 00:00:06,330 Your support will help MIT OpenCourseWare 4 00:00:06,330 --> 00:00:10,690 continue to offer high quality educational resources for free. 5 00:00:10,690 --> 00:00:13,320 To make a donation or view additional materials 6 00:00:13,320 --> 00:00:17,250 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,250 --> 00:00:18,000 at ocw.mit.edu. 8 00:00:18,000 --> 00:00:21,390 VINA NGUYEN: So we'll get started. 9 00:00:21,390 --> 00:00:22,210 As always, review. 10 00:00:22,210 --> 00:00:26,770 Can anyone tell me what Bayes' rule is? 11 00:00:26,770 --> 00:00:28,740 Intuitively or mathematically. 12 00:00:32,760 --> 00:00:33,260 OK. 13 00:00:42,770 --> 00:00:45,730 So Bayes' rule is kind of like the reverse of what 14 00:00:45,730 --> 00:00:47,720 we're given. 15 00:00:47,720 --> 00:00:53,650 So if you want the probability of some event Ai given B, 16 00:00:53,650 --> 00:00:59,950 then you take the probability of Ai times 17 00:00:59,950 --> 00:01:19,960 the probability of B given Ai, over the probability of A1 18 00:01:19,960 --> 00:01:26,630 times the probability of B given A1, plus the same term for A2, and so on, all the way to the probability 19 00:01:26,630 --> 00:01:34,190 of An times the probability of B given An. 20 00:01:34,190 --> 00:01:37,860 So it's probably a little hard to remember the formula. 21 00:01:37,860 --> 00:01:40,580 But if you think visually, it's like 22 00:01:40,580 --> 00:01:44,600 if you have a sample space, and say you 23 00:01:44,600 --> 00:01:47,900 have some random event, B, but you don't actually 24 00:01:47,900 --> 00:01:49,280 know what this is. 25 00:01:49,280 --> 00:01:53,860 All you know is that you have A1 maybe, 26 00:01:53,860 --> 00:01:57,000 and you know the probability of A1 and B. 27 00:01:57,000 --> 00:02:03,940 And then you might have A2, A3, et cetera, up to An.
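For reference, the formula described in words above can be written as follows, taking A_1, ..., A_n to be disjoint events that together make up the whole sample space, and assuming P(B) > 0. The denominator is P(B) expanded by the total probability theorem.

```latex
P(A_i \mid B)
  = \frac{P(A_i)\,P(B \mid A_i)}{P(B)}
  = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}
```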
28 00:02:03,940 --> 00:02:07,410 So n is the number of pieces in the partition. 29 00:02:07,410 --> 00:02:11,960 So Bayes' rule figures out the probability of Ai given B 30 00:02:11,960 --> 00:02:15,806 when you're given these chunks. 31 00:02:15,806 --> 00:02:17,090 Does that make sense? 32 00:02:25,950 --> 00:02:27,800 Do you guys need more time? 33 00:02:27,800 --> 00:02:28,820 No? 34 00:02:28,820 --> 00:02:31,070 OK. 35 00:02:31,070 --> 00:02:33,250 So what's the total probability theorem? 36 00:02:33,250 --> 00:02:34,700 I basically just wrote it, so you 37 00:02:34,700 --> 00:02:37,770 can point it out if you want. 38 00:02:37,770 --> 00:02:38,270 Yeah. 39 00:02:38,270 --> 00:02:40,166 AUDIENCE: Is it the bit on the bottom? 40 00:02:40,166 --> 00:02:43,020 VINA NGUYEN: Yep. 41 00:02:43,020 --> 00:02:48,320 So this is the total probability theorem, which gives us P of B. 42 00:02:51,480 --> 00:02:53,340 So it's called total because you have 43 00:02:53,340 --> 00:02:56,730 to have every one of these A's that contains part of B. 44 00:02:56,730 --> 00:02:59,370 So if you know A1, A2, and A3, but you 45 00:02:59,370 --> 00:03:03,030 don't know this part, that's not complete, not total, 46 00:03:03,030 --> 00:03:04,464 so it won't work. 47 00:03:04,464 --> 00:03:06,380 So you have to make sure you have all of that. 48 00:03:13,440 --> 00:03:15,860 Is this too high? 49 00:03:15,860 --> 00:03:16,360 OK. 50 00:03:16,360 --> 00:03:19,280 Does that make sense? 51 00:03:19,280 --> 00:03:21,430 OK. 52 00:03:21,430 --> 00:03:25,510 And if I told you that A is independent from B, what 53 00:03:25,510 --> 00:03:27,556 does that mean intuitively? 54 00:03:31,210 --> 00:03:33,632 AUDIENCE: P of A given B equals P 55 00:03:33,632 --> 00:03:37,977 of A. They're not related at all. 56 00:03:37,977 --> 00:03:40,600 So if you know one, it doesn't help 57 00:03:40,600 --> 00:03:43,757 to change anything here [INAUDIBLE].
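A minimal Python sketch of the two formulas above, using exact fractions. The function names `total_probability` and `bayes`, and the three-piece partition with its numbers, are hypothetical and chosen only for illustration. The check that the priors sum to 1 is one way to enforce the "total" requirement: if a piece of the partition is missing, the sum falls short of 1.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """P(B) = sum over i of P(A_i) * P(B | A_i), for a partition A_1..A_n."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood P(B | A_i) per event A_i")
    # "Total": the A_i must cover the whole sample space, so priors sum to 1.
    if abs(sum(priors) - 1) > 1e-12:
        raise ValueError("priors must sum to 1 (the A_i must form a partition)")
    return sum(p * lik for p, lik in zip(priors, likelihoods))


def bayes(priors, likelihoods):
    """Return [P(A_1 | B), ..., P(A_n | B)] using Bayes' rule."""
    p_b = total_probability(priors, likelihoods)  # the denominator
    if p_b == 0:
        raise ValueError("P(B) = 0, so P(A_i | B) is undefined")
    return [p * lik / p_b for p, lik in zip(priors, likelihoods)]


# Hypothetical partition into three pieces A_1, A_2, A_3.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]       # P(A_i)
likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(1, 4)]  # P(B | A_i)

print(total_probability(priors, likelihoods))  # 1/4
print(bayes(priors, likelihoods))
```

Here P(B) = 1/20 + 3/20 + 1/20 = 1/4, and the posteriors come out to 1/5, 3/5, and 1/5.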
58 00:03:43,757 --> 00:03:45,590 VINA NGUYEN: So B basically doesn't give you 59 00:03:45,590 --> 00:03:50,660 any information about A. No new information about A. Yep. 60 00:03:50,660 --> 00:03:51,990 And that can be reversed. 61 00:03:51,990 --> 00:03:54,230 So if A is independent of B, B is also 62 00:03:54,230 --> 00:03:58,862 independent of A. I won't write that because I figure 63 00:03:58,862 --> 00:03:59,570 that makes sense. 64 00:04:02,100 --> 00:04:05,710 So how do we test for independence? 65 00:04:05,710 --> 00:04:08,632 What's that equation? 66 00:04:08,632 --> 00:04:13,217 You just said it, but anyone else? 67 00:04:13,217 --> 00:04:13,716 Yep. 68 00:04:13,716 --> 00:04:18,980 AUDIENCE: We put the total probability theorem on top. 69 00:04:18,980 --> 00:04:23,275 Then you put P of A. 70 00:04:23,275 --> 00:04:25,239 VINA NGUYEN: Right. 71 00:04:25,239 --> 00:04:26,712 Can you say that again? 72 00:04:26,712 --> 00:04:29,658 AUDIENCE: You put the total probability theorem 73 00:04:29,658 --> 00:04:32,113 on the top of the bracket. 74 00:04:36,060 --> 00:04:37,700 VINA NGUYEN: You mean like this? 75 00:04:37,700 --> 00:04:38,530 AUDIENCE: Yeah. 76 00:04:38,530 --> 00:04:39,980 VINA NGUYEN: What do I put here? 77 00:04:39,980 --> 00:04:42,386 AUDIENCE: A, B. 78 00:04:42,386 --> 00:04:43,510 VINA NGUYEN: A something B. 79 00:04:43,510 --> 00:04:44,810 AUDIENCE: A plus B. 80 00:04:44,810 --> 00:04:45,620 VINA NGUYEN: Intersection. 81 00:04:45,620 --> 00:04:46,610 Yeah. 82 00:04:46,610 --> 00:04:48,740 Yep. 83 00:04:48,740 --> 00:04:53,195 AUDIENCE: Over-- draw a line-- 84 00:04:53,195 --> 00:04:58,650 P of A. 85 00:04:58,650 --> 00:04:59,880 VINA NGUYEN: This? 86 00:04:59,880 --> 00:05:01,064 AUDIENCE: Yeah. 87 00:05:01,064 --> 00:05:02,730 VINA NGUYEN: Anyone have another answer?
88 00:05:06,666 --> 00:05:09,946 AUDIENCE: Isn't it the probability of A 89 00:05:09,946 --> 00:05:22,410 is equal to the probability of A intersect B 90 00:05:22,410 --> 00:05:24,880 over the probability of B? 91 00:05:24,880 --> 00:05:25,980 VINA NGUYEN: Yeah. 92 00:05:25,980 --> 00:05:26,480 OK. 93 00:05:26,480 --> 00:05:32,480 So like, for you, you got part of it if you switch these. 94 00:05:32,480 --> 00:05:36,230 But because this itself is just an expression, 95 00:05:36,230 --> 00:05:38,780 you're not saying what it's equal to. 96 00:05:38,780 --> 00:05:41,000 So you can't really evaluate it and get 97 00:05:41,000 --> 00:05:44,870 some kind of result. 98 00:05:44,870 --> 00:05:49,250 So if you just give an expression, that doesn't really tell you 99 00:05:49,250 --> 00:05:50,810 whether that's a test or not. 100 00:05:50,810 --> 00:05:55,190 But this does because this has to hold true. 101 00:05:55,190 --> 00:05:57,590 So this is right. 102 00:05:57,590 --> 00:06:00,950 Does everyone see why that is? 103 00:06:00,950 --> 00:06:05,210 So we usually write it with P of B over here 104 00:06:05,210 --> 00:06:06,920 because we don't like dividing by 0. 105 00:06:06,920 --> 00:06:09,810 So if we just move it over here, that takes care of it. 106 00:06:09,810 --> 00:06:12,415 So you'll get P of A times P of B. 107 00:06:20,320 --> 00:06:21,970 So we like to write it like that, 108 00:06:21,970 --> 00:06:24,600 but they are essentially the same thing. 109 00:06:24,600 --> 00:06:25,330 Yeah. 110 00:06:25,330 --> 00:06:27,360 Just because of the divide-by-0 thing. 111 00:06:27,360 --> 00:06:29,278 So here P of B can be 0. 112 00:06:32,350 --> 00:06:34,240 Sorry. 113 00:06:34,240 --> 00:06:36,540 Does that make sense? 114 00:06:36,540 --> 00:06:37,990 OK. 115 00:06:37,990 --> 00:06:38,720 All right. 116 00:06:38,720 --> 00:06:41,802 So we actually didn't get to conditional independence 117 00:06:41,802 --> 00:06:43,760 last class, so we're going to go over that now.
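To see the "multiply instead of divide" form of the independence test in action, here is a small Python sketch on a hypothetical sample space: one roll of a fair six-sided die. The events and the helper names `prob` and `independent` are illustrative assumptions, not from the recitation. The last check uses an event with probability 0, where the division form P(A | B) = P(A) would be undefined but the product form P(A ∩ B) = P(A) P(B) still gives an answer.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))


def prob(event):
    """P(event) for an event given as a set of outcomes, all equally likely."""
    return Fraction(len(set(event) & OMEGA), len(OMEGA))


def independent(a, b):
    """Independence test in product form: P(A ∩ B) == P(A) * P(B).

    No division happens, so this still works when P(B) = 0.
    """
    return prob(set(a) & set(b)) == prob(a) * prob(b)


even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
at_most_3 = {1, 2, 3}
impossible = set()  # P = 0: P(A | impossible) would mean dividing by 0

print(independent(even, at_most_4))   # True:  1/3 == 1/2 * 2/3
print(independent(even, at_most_3))   # False: 1/6 != 1/2 * 1/2
print(independent(even, impossible))  # True:  0 == 1/2 * 0
```

Under the product form, an event of probability 0 counts as independent of every event, which is exactly the case the division form cannot handle.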
118 00:06:50,110 --> 00:06:51,150 Where's that eraser? 119 00:06:56,639 --> 00:06:57,740 Do you guys need this? 120 00:07:22,980 --> 00:07:26,650 So we have this for independence. 121 00:07:26,650 --> 00:07:29,120 And if we want to do conditional independence, 122 00:07:29,120 --> 00:07:32,900 we just add "given C" to each term, because that's 123 00:07:32,900 --> 00:07:34,580 what conditional means. 124 00:07:34,580 --> 00:07:37,100 Given something. 125 00:07:37,100 --> 00:07:41,630 And you can also see it the second way, 126 00:07:41,630 --> 00:07:44,630 which means that even if you're given B and C, 127 00:07:44,630 --> 00:07:47,040 the addition of B doesn't tell you anything new about A. 128 00:07:47,040 --> 00:07:50,030 So how do we get from the first line to the second? 129 00:07:50,030 --> 00:08:03,787 So you have-- is this OK to read? 130 00:08:03,787 --> 00:08:04,620 Feels kind of messy. 131 00:08:04,620 --> 00:08:05,430 OK. 132 00:08:05,430 --> 00:08:15,770 So another way of writing this is if we have A intersect B intersect C 133 00:08:15,770 --> 00:08:17,240 over P of C. 134 00:08:20,280 --> 00:08:22,945 It's kind of like we did with conditional probability. 135 00:08:26,390 --> 00:08:29,080 And then you're given in that equation 136 00:08:29,080 --> 00:08:39,669 that this equals P of A given C times P of B given C. 137 00:08:39,669 --> 00:08:41,440 And the way we get from here is you've 138 00:08:41,440 --> 00:08:43,409 seen the multiplication rule. 139 00:08:43,409 --> 00:08:48,480 So you have probability of C, times probability of B 140 00:08:48,480 --> 00:08:59,460 given C, times probability of A given B and C. Does this make sense? 141 00:08:59,460 --> 00:09:02,540 All over probability of C. 142 00:09:02,540 --> 00:09:05,390 So if you don't fully understand that, 143 00:09:05,390 --> 00:09:08,090 you can think of it as C-- 144 00:09:08,090 --> 00:09:19,736 a tree-- and then B, not B, and then A.
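Written out as equations, the board derivation goes as follows: the first line applies the definition of conditional probability and then the multiplication rule (the tree C, then B, then A), and the second line cancels P(C) and then P(B | C). This assumes P(C) > 0 and P(B ∩ C) > 0, so that every conditional probability is defined and P(B | C) is nonzero.

```latex
\begin{aligned}
P(A \cap B \mid C)
  &= \frac{P(A \cap B \cap C)}{P(C)}
   = \frac{P(C)\,P(B \mid C)\,P(A \mid B \cap C)}{P(C)}
   = P(B \mid C)\,P(A \mid B \cap C), \\[4pt]
P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)
  &\iff P(B \mid C)\,P(A \mid B \cap C) = P(A \mid C)\,P(B \mid C) \\
  &\iff P(A \mid B \cap C) = P(A \mid C).
\end{aligned}
```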
145 00:09:19,736 --> 00:09:23,780 So that top part is just probability of C, times probability 146 00:09:23,780 --> 00:09:27,560 of B given C happened, times probability of A given B and C 147 00:09:27,560 --> 00:09:29,220 happened. 148 00:09:29,220 --> 00:09:29,720 OK. 149 00:09:29,720 --> 00:09:31,700 Does everyone see that? 150 00:09:31,700 --> 00:09:37,970 So then the two P of C's cancel, the two P of B given C's cancel, 151 00:09:37,970 --> 00:09:45,250 and then you're left with this part, which is essentially 152 00:09:45,250 --> 00:09:46,350 that second line. 153 00:09:49,030 --> 00:09:51,570 So you can use either of those, depending on what 154 00:09:51,570 --> 00:09:52,830 you're given in the problem. 155 00:09:57,260 --> 00:09:59,780 Does that make sense, how we got from one to the other? 156 00:10:08,285 --> 00:10:10,310 I'll give you more time to write it, OK? 157 00:10:13,830 --> 00:10:16,605 So we have an example. 158 00:10:20,340 --> 00:10:22,800 So we have two coins, one blue and one red, 159 00:10:22,800 --> 00:10:26,220 and they're both biased. 160 00:10:26,220 --> 00:10:31,200 So the blue coin lands heads 99% of the time, 161 00:10:31,200 --> 00:10:35,290 and the red coin lands heads only 1% of the time. 162 00:10:35,290 --> 00:10:37,530 So what we're doing is first we're 163 00:10:37,530 --> 00:10:39,720 choosing one of these coins at random, 164 00:10:39,720 --> 00:10:41,860 and then we're flipping it twice. 165 00:10:41,860 --> 00:10:44,600 Does everyone understand that problem space? 166 00:10:44,600 --> 00:10:45,390 OK. 167 00:10:45,390 --> 00:10:48,860 So the way we broke it down is that the first event, 168 00:10:48,860 --> 00:10:51,540 H1, means the first toss is a head, 169 00:10:51,540 --> 00:10:54,570 and the second event is H2, which means 170 00:10:54,570 --> 00:10:56,120 the second toss is a head. 171 00:11:00,330 --> 00:11:06,110 So the question pretty much is, are they 172 00:11:06,110 --> 00:11:08,880 independent of each other, these tosses?
173 00:11:08,880 --> 00:11:10,820 The thing is that they're independent 174 00:11:10,820 --> 00:11:12,840 of each other conditionally. 175 00:11:12,840 --> 00:11:15,800 So if you already know what coin you're given, 176 00:11:15,800 --> 00:11:18,410 then the two tosses are independent. 177 00:11:18,410 --> 00:11:20,080 So I'm going to prove that. 178 00:11:31,010 --> 00:11:37,640 So we're going to assume B, which 179 00:11:37,640 --> 00:11:39,160 means we got the blue coin. 180 00:11:43,340 --> 00:11:46,022 So are they conditionally independent? 181 00:11:46,022 --> 00:11:47,230 You have that equation again. 182 00:11:47,230 --> 00:11:57,140 You have P of H1 intersect H2 given B. Is that 183 00:11:57,140 --> 00:11:59,495 equal to the probability 184 00:12:02,210 --> 00:12:06,080 that the first toss is a head given B times the probability 185 00:12:06,080 --> 00:12:15,190 that the second toss is a head given B? 186 00:12:15,190 --> 00:12:18,430 So that's what we're trying to answer right now. 187 00:12:18,430 --> 00:12:20,830 Does everyone understand how this is conditional? 188 00:12:20,830 --> 00:12:26,810 You're given that you have B. So it is kind of obvious. 189 00:12:26,810 --> 00:12:29,840 If you're given B, then both tosses come up heads with probability 190 00:12:29,840 --> 00:12:33,470 0.99 times 0.99. 191 00:12:33,470 --> 00:12:38,152 And this side is also 0.99 times 0.99. 192 00:12:38,152 --> 00:12:39,360 So obviously, they are equal. 193 00:12:39,360 --> 00:12:41,310 So they are conditionally independent. 194 00:13:01,410 --> 00:13:09,230 But the next question is, if you don't know which coin you have, 195 00:13:09,230 --> 00:13:11,175 are the tosses still independent? 196 00:13:11,175 --> 00:13:12,550 So that's the question we're answering now. 197 00:13:24,730 --> 00:13:27,530 So this was our original test for independence. 198 00:13:27,530 --> 00:13:30,970 There's nothing given, so we don't know what coin we have.
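A small Python sketch of the coin example that builds the joint distribution of (coin, first toss, second toss) and then checks both questions: independence given the blue coin, and independence with the coin unknown. Two modeling assumptions are not stated explicitly in the recitation: "at random" is taken to mean each coin is chosen with probability 1/2, and once the coin is fixed, its two tosses are independent flips. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumptions (not stated explicitly in the recitation): each coin is picked
# with probability 1/2, and given the coin, the two tosses are independent.
P_COIN = {"blue": Fraction(1, 2), "red": Fraction(1, 2)}
P_HEADS = {"blue": Fraction(99, 100), "red": Fraction(1, 100)}


def build_joint():
    """Joint probability of every outcome (coin, toss1, toss2), tosses 'H'/'T'."""
    joint = {}
    for coin, t1, t2 in product(P_COIN, "HT", "HT"):
        p = P_COIN[coin]
        for toss in (t1, t2):
            p *= P_HEADS[coin] if toss == "H" else 1 - P_HEADS[coin]
        joint[(coin, t1, t2)] = p
    return joint


JOINT = build_joint()


def prob(event):
    """P(event), with the event given as a predicate on outcomes."""
    return sum((p for outcome, p in JOINT.items() if event(outcome)), Fraction(0))


def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)


def h1(o):
    return o[1] == "H"  # first toss is a head


def h2(o):
    return o[2] == "H"  # second toss is a head


def both(o):
    return h1(o) and h2(o)  # H1 ∩ H2


def blue(o):
    return o[0] == "blue"  # event B: we picked the blue coin


# Coin known (blue): P(H1 ∩ H2 | B) vs P(H1 | B) * P(H2 | B).
print(cond(both, blue), cond(h1, blue) * cond(h2, blue))  # 9801/10000 9801/10000

# Coin unknown: P(H1 ∩ H2) vs P(H1) * P(H2).
print(prob(both), prob(h1) * prob(h2))  # 4901/10000 1/4
```

Given the blue coin, both sides equal 0.99 × 0.99 = 0.9801, so H1 and H2 are conditionally independent. With the coin unknown, P(H1) = P(H2) = 1/2 but P(H1 ∩ H2) = 0.4901, well above 1/4: under these assumptions, a first head makes the blue coin more likely, which in turn makes a second head more likely.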
199 00:13:30,970 --> 00:13:32,410 So now we're trying to answer, are 200 00:13:32,410 --> 00:13:37,000 these tosses independent without any given information? 201 00:13:37,000 --> 00:13:39,740 Can you see the difference? 202 00:13:39,740 --> 00:13:41,680 So how do we calculate this? 203 00:13:48,800 --> 00:13:54,920 Start with the probability of the first toss being a head. 204 00:13:54,920 --> 00:13:56,590 So we're going to do this. 205 00:13:56,590 --> 00:13:59,090 You've seen the total probability theorem, right? 206 00:13:59,090 --> 00:14:07,520 So we have the probability of a first head given that it's blue 207 00:14:07,520 --> 00:14:11,480 times the probability that we got a blue coin, plus the probability 208 00:14:11,480 --> 00:14:16,130 that we have a head given that it's not blue.