1 00:00:00,499 --> 00:00:02,820 The following content is provided under a Creative 2 00:00:02,820 --> 00:00:04,340 Commons license. 3 00:00:04,340 --> 00:00:06,670 Your support will help MIT OpenCourseWare 4 00:00:06,670 --> 00:00:11,040 continue to offer high quality educational resources for free. 5 00:00:11,040 --> 00:00:13,650 To make a donation or view additional materials 6 00:00:13,650 --> 00:00:17,537 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,537 --> 00:00:18,162 at ocw.mit.edu. 8 00:00:23,160 --> 00:00:25,370 PROFESSOR: Just a reminder, drop day is tomorrow. 9 00:00:25,370 --> 00:00:27,415 So if you were thinking about dropping the course 10 00:00:27,415 --> 00:00:29,960 or in danger of a bad grade or something, 11 00:00:29,960 --> 00:00:34,420 tomorrow's the last chance to bail out. 12 00:00:34,420 --> 00:00:36,660 Last time we began our discussion on probability 13 00:00:36,660 --> 00:00:40,440 with the Monty Hall game-- the Monty Hall problem. 14 00:00:40,440 --> 00:00:43,280 And as part of the analysis, we made assumptions 15 00:00:43,280 --> 00:00:48,580 of the form that given that Carol placed the prize in box 16 00:00:48,580 --> 00:00:52,440 1, the probability that the contestant chooses 17 00:00:52,440 --> 00:00:55,280 box 1 is 1/3. 18 00:00:55,280 --> 00:00:57,620 Now, this is an example of something 19 00:00:57,620 --> 00:01:00,589 that's called a conditional probability. 20 00:01:00,589 --> 00:01:02,380 And that's what we're going to study today. 21 00:01:13,930 --> 00:01:16,170 Now, in general, you have something 22 00:01:16,170 --> 00:01:19,950 like the conditional probability that an event, A, 23 00:01:19,950 --> 00:01:24,210 happens given that some other event, B, has already 24 00:01:24,210 --> 00:01:25,770 taken place. 25 00:01:25,770 --> 00:01:38,650 And you write that down as a probability of A given B. 26 00:01:38,650 --> 00:01:39,945 And both A and B are events. 
27 00:01:44,190 --> 00:01:46,665 Now, the example from Monty Hall-- 28 00:01:46,665 --> 00:01:48,870 and actually, we had several-- but you 29 00:01:48,870 --> 00:01:56,230 might have B being the event that Carol 30 00:01:56,230 --> 00:01:59,670 places the prize in box 1. 31 00:02:03,820 --> 00:02:17,300 And A might be the event that the contestant chooses box 1. 32 00:02:17,300 --> 00:02:20,200 And we assumed for the Monty Hall game 33 00:02:20,200 --> 00:02:29,200 that the probability of A given B in this case 34 00:02:29,200 --> 00:02:33,110 was 1/3 because the contestant didn't 35 00:02:33,110 --> 00:02:35,220 know where the prize was. 36 00:02:37,960 --> 00:02:40,320 Now in general, there's a very simple formula 37 00:02:40,320 --> 00:02:44,974 to compute the probability of A given B. In fact, we'll 38 00:02:44,974 --> 00:02:46,015 treat it as a definition. 39 00:02:48,920 --> 00:02:55,700 Assuming the probability of B is non-zero, then the probability 40 00:02:55,700 --> 00:03:03,440 of A given B is just the probability of A and B 41 00:03:03,440 --> 00:03:07,860 both happening, divided by the probability 42 00:03:07,860 --> 00:03:10,700 of B happening. 43 00:03:10,700 --> 00:03:15,620 And you can see why this makes sense with a picture-- say 44 00:03:15,620 --> 00:03:18,560 this is our sample space. 45 00:03:18,560 --> 00:03:22,370 And let this be the event, A, and this 46 00:03:22,370 --> 00:03:28,440 be the event, B. Now we're conditioning on the fact 47 00:03:28,440 --> 00:03:31,110 that B happened. 48 00:03:31,110 --> 00:03:33,360 Now once we've conditioned on that, 49 00:03:33,360 --> 00:03:37,320 all this stuff outside of B is no longer possible. 50 00:03:37,320 --> 00:03:42,930 All those outcomes are no longer in the space of consideration. 51 00:03:42,930 --> 00:03:45,850 The only outcomes left are in B. So in some sense 52 00:03:45,850 --> 00:03:49,050 we've shrunk the sample space to be B.
53 00:03:49,050 --> 00:03:52,660 And all we care about is the probability that A happens 54 00:03:52,660 --> 00:03:55,410 inside this new sample space. 55 00:03:55,410 --> 00:03:58,950 And that is, we're asking the probability that one of these outcomes 56 00:03:58,950 --> 00:04:03,040 happens given that this is the sample space. 57 00:04:03,040 --> 00:04:07,160 Well, this is just A intersect B because you still 58 00:04:07,160 --> 00:04:09,650 have to have A happen, but now you're inside of B. 59 00:04:09,650 --> 00:04:13,530 And then we divide by probability of B. 60 00:04:13,530 --> 00:04:17,150 So we normalize this to be probability one. 61 00:04:17,150 --> 00:04:17,649 OK. 62 00:04:17,649 --> 00:04:19,490 Because we're saying B happened-- 63 00:04:19,490 --> 00:04:20,970 we're conditioning on that. 64 00:04:20,970 --> 00:04:24,680 Therefore, the probability of these outcomes must be 1. 65 00:04:24,680 --> 00:04:28,660 So we divide by the probability of B. So we normalize. 66 00:04:28,660 --> 00:04:31,290 This now becomes-- the probability of A given B 67 00:04:31,290 --> 00:04:36,310 is this share of B weighted by the outcomes. 68 00:04:36,310 --> 00:04:36,810 OK. 69 00:04:39,020 --> 00:04:39,520 All right. 70 00:04:39,520 --> 00:04:45,410 For example then, what's the probability of B given B? 71 00:04:47,970 --> 00:04:49,740 What's that equal? 72 00:04:49,740 --> 00:04:50,700 1. 73 00:04:50,700 --> 00:04:51,470 OK. 74 00:04:51,470 --> 00:04:54,406 Because we said it happened-- so it happens with probability 1. 75 00:04:54,406 --> 00:04:58,890 Or, using the formula, that's just probability of B and B 76 00:04:58,890 --> 00:05:01,890 divided by probability of B. Well, that 77 00:05:01,890 --> 00:05:05,470 equals the probability of B divided 78 00:05:05,470 --> 00:05:09,511 by the probability of B, which is 1. 79 00:05:09,511 --> 00:05:10,010 All right.
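The definition above can be checked on the Monty Hall events with a small sketch. It assumes a model that goes slightly beyond what is stated here: the prize box and the contestant's pick are each uniform on boxes 1 to 3 and independent, so each (prize, pick) pair has probability 1/9. The names `prob`, `pr`, and `cond` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumed model: each (prize box, chosen box) pair is equally likely, 1/9 apiece.
prob = {(prize, pick): Fraction(1, 9)
        for prize, pick in product((1, 2, 3), repeat=2)}

def pr(event):
    """Probability of an event, given as a set of outcomes."""
    return sum((prob[w] for w in event), Fraction(0))

def cond(a, b):
    """Pr[A | B] = Pr[A and B] / Pr[B], defined only when Pr[B] is non-zero."""
    if pr(b) == 0:
        raise ValueError("Pr[B] must be non-zero")
    return pr(a & b) / pr(b)

B = {w for w in prob if w[0] == 1}  # Carol places the prize in box 1
A = {w for w in prob if w[1] == 1}  # contestant chooses box 1

print(cond(A, B))  # 1/3
print(cond(B, B))  # 1
```

Conditioning on B keeps only the three outcomes with the prize in box 1 and rescales their probabilities to sum to 1, which is the "shrunk sample space" picture.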
80 00:05:10,010 --> 00:05:13,010 Any questions about the definition 81 00:05:13,010 --> 00:05:15,970 of the conditional probability? 82 00:05:15,970 --> 00:05:16,500 Very simple. 83 00:05:16,500 --> 00:05:20,130 And it's easy to work with using the formulas. 84 00:05:20,130 --> 00:05:22,850 Now, there's a nice rule called the product 85 00:05:22,850 --> 00:05:25,635 rule, which follows from the definition very simply. 86 00:05:31,600 --> 00:05:35,450 The product rule says that the probability of A and B 87 00:05:35,450 --> 00:05:40,290 for two events is equal to the probability of B 88 00:05:40,290 --> 00:05:44,880 times the probability of A given B. 89 00:05:44,880 --> 00:05:47,060 And that just follows straightforwardly 90 00:05:47,060 --> 00:05:49,700 from this definition. 91 00:05:49,700 --> 00:05:53,750 Just multiply by probability of B on both sides. 92 00:05:53,750 --> 00:05:54,250 All right. 93 00:05:54,250 --> 00:05:55,708 So now you have a rule for computing 94 00:05:55,708 --> 00:05:59,750 the probability of two events simultaneously happening. 95 00:05:59,750 --> 00:06:03,460 So for example, in the Monty Hall problem, 96 00:06:03,460 --> 00:06:06,190 what's the probability that Carol places 97 00:06:06,190 --> 00:06:11,840 the prize in box one and that's the box the contestant chooses? 98 00:06:11,840 --> 00:06:12,340 All right? 99 00:06:12,340 --> 00:06:15,650 So if we took A and B as defined up there, 100 00:06:15,650 --> 00:06:18,880 that's the probability that Carol places it in box one 101 00:06:18,880 --> 00:06:22,090 and the contestant chose it. 102 00:06:22,090 --> 00:06:24,500 Well, the probability that the contestant chooses 103 00:06:24,500 --> 00:06:28,740 it is 1/3, times the probability that Carol put it there 104 00:06:28,740 --> 00:06:33,130 given the contestant chose it, or actually, vice versa, 105 00:06:33,130 --> 00:06:36,282 is 1/9. 106 00:06:36,282 --> 00:06:38,430 OK?
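Written out, the Monty Hall computation above is a direct use of the product rule. The value Pr[B] = 1/3 is an assumption here (Carol places the prize uniformly at random among the three boxes); Pr[A | B] = 1/3 is the assumption stated earlier in the lecture.

```latex
\[
\Pr[A \cap B] \;=\; \Pr[B] \cdot \Pr[A \mid B] \;=\; \frac{1}{3} \cdot \frac{1}{3} \;=\; \frac{1}{9}
\]
```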
107 00:06:38,430 --> 00:06:40,275 And this extends to more events. 108 00:06:42,870 --> 00:06:44,830 It is called the general product rule. 109 00:06:55,500 --> 00:07:02,170 So if you want to compute the probability of A1 and A2 110 00:07:02,170 --> 00:07:08,840 and all the way up to An, that's simply the probability of A1 111 00:07:08,840 --> 00:07:17,760 happening all by itself times the probability of A2 given A1 112 00:07:17,760 --> 00:07:20,350 times-- well, I'll do the next one-- times the probability 113 00:07:20,350 --> 00:07:29,750 of A3 given A1 and A2, dot, dot, dot, times, finally, 114 00:07:29,750 --> 00:07:32,940 the probability of An given all the others. 115 00:07:40,730 --> 00:07:43,360 So that starts to look a little more complicated. 116 00:07:43,360 --> 00:07:46,720 But it gives you a handy way of computing the probability 117 00:07:46,720 --> 00:07:48,565 that an intersection of events takes place. 118 00:07:51,080 --> 00:07:54,639 This is proved by induction on n, just taking that rule 119 00:07:54,639 --> 00:07:55,680 and using induction on n. 120 00:07:55,680 --> 00:07:56,690 It's not hard. 121 00:07:56,690 --> 00:07:59,810 But we won't go through it. 122 00:07:59,810 --> 00:08:00,310 All right. 123 00:08:00,310 --> 00:08:02,770 Let's do some examples. 124 00:08:02,770 --> 00:08:04,070 We'll start with an easy one. 125 00:08:07,750 --> 00:08:11,670 Say you're playing a playoff series 126 00:08:11,670 --> 00:08:13,430 and you're going to play best 2 out of 3. 127 00:08:13,430 --> 00:08:15,358 All right. 128 00:08:15,358 --> 00:08:22,210 So you have a best 2 out of 3 series. 129 00:08:22,210 --> 00:08:25,350 So whoever is first to win two games, best two out of three, 130 00:08:25,350 --> 00:08:27,590 wins. 131 00:08:27,590 --> 00:08:32,210 And say you're told that the probability of winning 132 00:08:32,210 --> 00:08:33,570 the first game is 1/2. 133 00:08:44,450 --> 00:08:48,560 So the teams are matched 50-50 for the first game.
134 00:08:48,560 --> 00:08:53,020 But then you're told that the probability of winning a game 135 00:08:53,020 --> 00:08:55,590 after a victory is higher. 136 00:08:55,590 --> 00:08:56,165 It's 2/3. 137 00:09:02,000 --> 00:09:09,700 So the probability of winning a game immediately 138 00:09:09,700 --> 00:09:22,190 following a win is 2/3. 139 00:09:22,190 --> 00:09:26,540 And similarly, the probability of winning after a loss is 1/3. 140 00:09:43,370 --> 00:09:43,870 All right. 141 00:09:43,870 --> 00:09:46,000 And the idea here is that you win a game, 142 00:09:46,000 --> 00:09:47,860 you're sort of psyched, you've got momentum, 143 00:09:47,860 --> 00:09:50,870 and going into the next day you're more likely to win. 144 00:09:50,870 --> 00:09:52,730 Similarly, if you lost you're sort of down 145 00:09:52,730 --> 00:09:55,250 and the other guy has a better chance of beating you. 146 00:09:55,250 --> 00:09:57,630 Now, what we're going to try to figure out 147 00:09:57,630 --> 00:10:01,940 is the probability of winning the series 148 00:10:01,940 --> 00:10:05,450 given you won the first game. 149 00:10:05,450 --> 00:10:05,950 All right? 150 00:10:05,950 --> 00:10:08,100 Now, conditional probability comes up 151 00:10:08,100 --> 00:10:11,540 in two places in this problem. 152 00:10:11,540 --> 00:10:13,720 Anybody tell me places where it's come up? 153 00:10:13,720 --> 00:10:15,970 So I've got the problem statement, and the goal 154 00:10:15,970 --> 00:10:17,870 is to figure out the probability you win the series given 155 00:10:17,870 --> 00:10:18,828 you won the first game. 156 00:10:18,828 --> 00:10:21,030 So what's one place conditional probability 157 00:10:21,030 --> 00:10:24,880 is entering into this problem? 158 00:10:24,880 --> 00:10:26,140 Yeah? 159 00:10:26,140 --> 00:10:27,550 AUDIENCE: The probability changes 160 00:10:27,550 --> 00:10:31,252 depending on the result of the previous game.
161 00:10:31,252 --> 00:10:32,210 PROFESSOR: That's true. 162 00:10:32,210 --> 00:10:34,270 The probability of winning any particular game 163 00:10:34,270 --> 00:10:35,910 is influenced by the previous game. 164 00:10:35,910 --> 00:10:38,780 So you're using conditional probability there. 165 00:10:38,780 --> 00:10:40,140 All right. 166 00:10:40,140 --> 00:10:41,860 And where else? 167 00:10:41,860 --> 00:10:42,546 Yeah. 168 00:10:42,546 --> 00:10:46,434 AUDIENCE: [INAUDIBLE] you have to take 169 00:10:46,434 --> 00:10:48,100 into account [INAUDIBLE]. 170 00:10:48,100 --> 00:10:49,350 PROFESSOR: That's interesting. 171 00:10:49,350 --> 00:10:51,558 That will be another question we're going to look at. 172 00:10:51,558 --> 00:10:54,050 What's the probability of playing three games? 173 00:10:54,050 --> 00:10:55,340 Yep. 174 00:10:55,340 --> 00:10:57,521 That's one. 175 00:10:57,521 --> 00:10:58,020 OK. 176 00:10:58,020 --> 00:11:00,410 Well, the question we're after, what's 177 00:11:00,410 --> 00:11:02,310 the probability of winning the series given 178 00:11:02,310 --> 00:11:04,040 that you won the first game. 179 00:11:04,040 --> 00:11:07,040 We're going to compute a conditional probability there. 180 00:11:07,040 --> 00:11:09,954 So it's coming up in a couple of places here. 181 00:11:09,954 --> 00:11:10,870 All right. 182 00:11:10,870 --> 00:11:12,650 Let's figure this out. 183 00:11:12,650 --> 00:11:16,710 It's easy to do given the tree method. 184 00:11:16,710 --> 00:11:19,610 So let's make the tree for this. 185 00:11:19,610 --> 00:11:27,300 So we have possibly three games there's game one, game two, 186 00:11:27,300 --> 00:11:28,190 and game three. 187 00:11:31,060 --> 00:11:32,910 Game one, you can win or lose. 188 00:11:32,910 --> 00:11:36,560 There's two branches. 189 00:11:36,560 --> 00:11:38,135 Game two you can win or lose. 190 00:11:41,350 --> 00:11:46,650 And now, game three-- well, it doesn't even take place here. 
191 00:11:46,650 --> 00:11:47,450 But it does here. 192 00:11:47,450 --> 00:11:50,800 You can win or lose here. 193 00:11:50,800 --> 00:11:54,520 And you could win or lose here. 194 00:11:54,520 --> 00:11:56,120 And here the series is over. 195 00:11:56,120 --> 00:12:00,040 So there is no game three in that case. 196 00:12:00,040 --> 00:12:03,030 The probabilities are next. We put a probability 197 00:12:03,030 --> 00:12:04,900 on every branch here. 198 00:12:04,900 --> 00:12:05,820 Game one is 50-50. 199 00:12:08,207 --> 00:12:10,040 What's the probability you take this branch? 200 00:12:13,530 --> 00:12:17,170 2/3, because you're on the path where you won the first game. 201 00:12:17,170 --> 00:12:18,760 You win the second game with 2/3. 202 00:12:18,760 --> 00:12:21,540 You lose with 1/3. 203 00:12:21,540 --> 00:12:25,020 Now here you're on the path where you lost the first game. 204 00:12:25,020 --> 00:12:29,931 So this has 1/3 and this has 2/3. 205 00:12:29,931 --> 00:12:30,430 All right? 206 00:12:30,430 --> 00:12:32,510 And then lastly, what's the probability I 207 00:12:32,510 --> 00:12:39,060 win the third game here? 208 00:12:39,060 --> 00:12:42,150 1/3, because I just lost the last game. 209 00:12:42,150 --> 00:12:44,220 That's all I'm conditioning on. 210 00:12:44,220 --> 00:12:46,250 So that becomes 1/3. 211 00:12:46,250 --> 00:12:49,150 And this is 2/3 now. 212 00:12:49,150 --> 00:12:50,770 And then here I just won a game. 213 00:12:50,770 --> 00:12:54,386 So I've got 2/3 and 1/3. 214 00:12:54,386 --> 00:12:55,290 All right. 215 00:12:55,290 --> 00:12:57,890 So I've got all the probabilities. 216 00:12:57,890 --> 00:13:02,870 And now I need to figure out for the sample points what's 217 00:13:02,870 --> 00:13:03,730 their probability. 218 00:13:03,730 --> 00:13:07,760 So this sample point we'll call win-win. 219 00:13:07,760 --> 00:13:11,176 This sample point is win-lose-win. 220 00:13:15,089 --> 00:13:16,130 This one's win-lose-lose.
221 00:13:20,130 --> 00:13:27,780 Then we have lose-win-win, lose-win-lose, 222 00:13:27,780 --> 00:13:30,040 and then lose-lose. 223 00:13:30,040 --> 00:13:32,010 So I got six sample points. 224 00:13:32,010 --> 00:13:35,430 And let's figure out the probability for each one. 225 00:13:35,430 --> 00:13:39,280 Now remember the rule we had for the tree method. 226 00:13:39,280 --> 00:13:42,570 I just multiply these things. 227 00:13:42,570 --> 00:13:44,820 Well, in fact, the reason we have that rule 228 00:13:44,820 --> 00:13:49,120 is because that is the same as the product rule. 229 00:13:49,120 --> 00:13:50,830 Because what I'm asking here to compute 230 00:13:50,830 --> 00:14:03,210 the probability of this guy is-- so the product rule gives 231 00:14:03,210 --> 00:14:07,700 the probability of a win-win scenario-- win the first game, 232 00:14:07,700 --> 00:14:09,420 win the second game. 233 00:14:09,420 --> 00:14:11,210 By the product rule is the probability 234 00:14:11,210 --> 00:14:16,580 that I win the first game times the probability 235 00:14:16,580 --> 00:14:23,570 that I win the second game given that I won the first game. 236 00:14:23,570 --> 00:14:27,280 That's what the product rule says. 237 00:14:27,280 --> 00:14:31,590 Probability I win the first game is 1/2 times 238 00:14:31,590 --> 00:14:33,480 the probability I win the second given 239 00:14:33,480 --> 00:14:37,590 that I won the first is 2/3. 240 00:14:37,590 --> 00:14:40,270 So that equals 1/3. 241 00:14:45,641 --> 00:14:47,390 So what we're doing here now is giving you 242 00:14:47,390 --> 00:14:51,270 the formal justification for that rule that we had last time 243 00:14:51,270 --> 00:14:53,770 and that you'll always use-- is the probability of a sample 244 00:14:53,770 --> 00:14:56,320 point is the product of the probabilities 245 00:14:56,320 --> 00:14:59,100 on the edges leading to it. 246 00:14:59,100 --> 00:15:00,310 It's just the product rule. 
247 00:15:00,310 --> 00:15:03,260 Now the next example is this one. 248 00:15:03,260 --> 00:15:06,155 And here we're going to use the general product rule to get it. 249 00:15:09,190 --> 00:15:16,160 The probability of win-lose-win by the general product rule 250 00:15:16,160 --> 00:15:20,440 is the probability that you win the first game 251 00:15:20,440 --> 00:15:24,820 times the probability you lose the second game given 252 00:15:24,820 --> 00:15:30,640 that you won the first times the probability you 253 00:15:30,640 --> 00:15:34,786 win the third given what? 254 00:15:34,786 --> 00:15:38,740 What am I given in the product rule? 255 00:15:38,740 --> 00:15:43,582 Won the first, lost the second. 256 00:15:43,582 --> 00:15:44,419 All right. 257 00:15:44,419 --> 00:15:45,960 Well, now we can fill in the numbers. 258 00:15:45,960 --> 00:15:49,768 The probability I win the first is 1/2. 259 00:15:49,768 --> 00:15:52,740 The probability that I lose the second 260 00:15:52,740 --> 00:15:57,720 given that I won the first, that's 1/3. 261 00:15:57,720 --> 00:15:59,440 And then this one here, the probability 262 00:15:59,440 --> 00:16:01,680 that I win the third given that I won the first 263 00:16:01,680 --> 00:16:04,600 and lost the second, that simplifies to 264 00:16:04,600 --> 00:16:09,160 the probability I win the third given that I lost the second. 265 00:16:09,160 --> 00:16:11,795 Doesn't matter what happened on the first. 266 00:16:11,795 --> 00:16:12,420 And that's 1/3. 267 00:16:16,010 --> 00:16:24,650 So this is 1/2 times 1/3 times 1/3. 268 00:16:24,650 --> 00:16:28,090 And that's 1/18.
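In symbols, the general product rule and its use on the win-lose-win sample point look as follows. The event names W_1, L_2, W_3 (win game one, lose game two, win game three) are notation introduced here for illustration and do not appear on the board.

```latex
\begin{align*}
\Pr[A_1 \cap A_2 \cap \cdots \cap A_n]
  &= \Pr[A_1] \cdot \Pr[A_2 \mid A_1] \cdot \Pr[A_3 \mid A_1 \cap A_2]
     \cdots \Pr[A_n \mid A_1 \cap \cdots \cap A_{n-1}] \\[4pt]
\Pr[W_1 \cap L_2 \cap W_3]
  &= \Pr[W_1] \cdot \Pr[L_2 \mid W_1] \cdot \Pr[W_3 \mid W_1 \cap L_2] \\
  &= \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{18}
\end{align*}
```

The last factor equals Pr[W_3 | L_2] = 1/3 because, in this model, only the result of the immediately preceding game matters.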
269 00:16:28,090 --> 00:16:31,480 And it's just the product because the product rule 270 00:16:31,480 --> 00:16:35,290 saying product of the first probability 271 00:16:35,290 --> 00:16:37,910 times this one, which is the conditional probability 272 00:16:37,910 --> 00:16:40,080 of being here times this one, which 273 00:16:40,080 --> 00:16:45,600 is a conditional probability if these events happened before. 274 00:16:45,600 --> 00:16:49,180 Any questions about that? 275 00:16:49,180 --> 00:16:51,240 Very simple to do, which is good. 276 00:16:51,240 --> 00:16:51,740 Yeah. 277 00:16:51,740 --> 00:16:53,590 Is there a question? 278 00:16:53,590 --> 00:16:54,090 OK. 279 00:16:54,090 --> 00:16:54,714 All right. 280 00:16:54,714 --> 00:16:56,630 So let's fill in the other probabilities here. 281 00:16:56,630 --> 00:16:59,500 I got 1/2, 1/3, and 2/3. 282 00:16:59,500 --> 00:17:01,820 That's 1/9. 283 00:17:01,820 --> 00:17:04,650 Same thing here is 1/9. 284 00:17:04,650 --> 00:17:08,670 This is 1/18 and 1/3. 285 00:17:12,069 --> 00:17:12,569 OK. 286 00:17:12,569 --> 00:17:15,319 So those are the probabilities in the sample points. 287 00:17:15,319 --> 00:17:20,119 Now, to compute the probability of winning the series given 288 00:17:20,119 --> 00:17:23,124 that we won the first game, let's define the events here. 289 00:17:26,890 --> 00:17:32,355 So A be the event that we win the series. 290 00:17:36,220 --> 00:17:42,420 B will be the event that we win the first game. 291 00:17:45,580 --> 00:17:50,560 And I want to compute the probability of A given B. 292 00:17:50,560 --> 00:17:53,020 And we use our formula. 293 00:17:53,020 --> 00:17:54,825 Where's the formula for that? 294 00:17:54,825 --> 00:17:56,170 It's way back over there. 295 00:17:56,170 --> 00:17:59,210 The probability of A given B is the probability 296 00:17:59,210 --> 00:18:04,710 of both happening, the probability of A and B 297 00:18:04,710 --> 00:18:09,450 divided by the probability of B. 
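The six sample-point probabilities read off the tree can be reproduced with a short sketch that walks the tree and multiplies edge probabilities, which is the product rule in action. The function names (`win_prob`, `series_over`, `sample_points`) are made up for illustration; the numbers 1/2, 2/3, and 1/3 are the ones given in the problem statement.

```python
from fractions import Fraction

HALF, THIRD = Fraction(1, 2), Fraction(1, 3)

def win_prob(history):
    """Chance of winning the next game: 1/2 for game one, 2/3 after a win, 1/3 after a loss."""
    if not history:
        return HALF
    return 2 * THIRD if history[-1] == "W" else THIRD

def series_over(history):
    """A best-2-out-of-3 series ends once either side has two wins."""
    return history.count("W") == 2 or history.count("L") == 2

def sample_points(history="", prob=Fraction(1)):
    """Walk the tree, multiplying the edge probabilities along each path."""
    if series_over(history):
        return {history: prob}
    p = win_prob(history)
    points = {}
    points.update(sample_points(history + "W", prob * p))
    points.update(sample_points(history + "L", prob * (1 - p)))
    return points

points = sample_points()
for outcome, p in points.items():
    print(outcome, p)
```

Each leaf's probability is the product of the conditional probabilities on the path leading to it, so the six values sum to 1.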
298 00:18:09,450 --> 00:18:13,620 So now I just have to compute these probabilities. 299 00:18:13,620 --> 00:18:16,700 So to do that I've got to figure out which sample points are 300 00:18:16,700 --> 00:18:20,820 in A and B here. 301 00:18:20,820 --> 00:18:22,030 So let's write that down. 302 00:18:22,030 --> 00:18:27,750 There's A, B, A and B. All right. 303 00:18:27,750 --> 00:18:32,780 So A is the event that we win the series. 304 00:18:35,670 --> 00:18:42,910 Now this sample point qualifies, that one does, and this one. 305 00:18:42,910 --> 00:18:44,530 B is the event we won the first game. 306 00:18:44,530 --> 00:18:48,360 And that's these three sample points. 307 00:18:48,360 --> 00:18:54,521 And then A and B, A intersect B, is these two. 308 00:18:54,521 --> 00:18:55,020 All right. 309 00:18:55,020 --> 00:18:56,560 So for each event that I care about 310 00:18:56,560 --> 00:18:58,684 I figure out which sample points are in that event. 311 00:19:01,370 --> 00:19:05,570 And now I just add the probabilities up. 312 00:19:05,570 --> 00:19:08,740 So what's the probability of A and B? 313 00:19:17,400 --> 00:19:19,070 7/18. 314 00:19:19,070 --> 00:19:19,750 1/3 plus 1/18. 315 00:19:24,810 --> 00:19:26,165 What's the probability of B? 316 00:19:30,661 --> 00:19:31,160 Yeah. 317 00:19:31,160 --> 00:19:32,330 1/2, 9/18. 318 00:19:32,330 --> 00:19:33,430 I got these three points. 319 00:19:37,070 --> 00:19:43,770 So this'll be 1/3 plus 1/18 plus the extra one, 1/9. 320 00:19:43,770 --> 00:19:49,340 So I've got 7/18 over 9/18. 321 00:19:49,340 --> 00:19:52,640 7/9 is the answer. 322 00:19:52,640 --> 00:19:54,470 So the probability we win the series 323 00:19:54,470 --> 00:19:56,370 given we won the first game is 7/9. 324 00:19:59,032 --> 00:19:59,615 Any questions? 325 00:20:03,050 --> 00:20:06,090 We're going to do this same thing about 10 different times. 326 00:20:06,090 --> 00:20:06,850 OK?
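The recipe just used (list the sample points in each event, add up their probabilities, divide) is short enough to sketch directly. The dictionary below hard-codes the six sample-point probabilities from the tree; the outcome labels such as "WLW" and the helper `pr` are illustrative names, not anything from the lecture.

```python
from fractions import Fraction as F

# Sample-point probabilities as read off the tree for the best-2-out-of-3 series.
points = {"WW": F(1, 3), "WLW": F(1, 18), "WLL": F(1, 9),
          "LWW": F(1, 9), "LWL": F(1, 18), "LL": F(1, 3)}

def pr(event):
    """Add up the probabilities of the sample points in an event."""
    return sum((points[w] for w in event), F(0))

A = {w for w in points if w.count("W") == 2}  # we win the series
B = {w for w in points if w[0] == "W"}        # we win the first game

p_a_given_b = pr(A & B) / pr(B)
print(pr(A & B), pr(B), p_a_given_b)  # 7/18 1/2 7/9
```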
327 00:20:06,850 --> 00:20:09,170 And it will look a little different each time maybe. 328 00:20:09,170 --> 00:20:10,250 But it's the same idea. 329 00:20:10,250 --> 00:20:13,780 And the beauty here is it's really easy to do. 330 00:20:13,780 --> 00:20:16,400 I'm going to give you a lot of confusing examples. 331 00:20:16,400 --> 00:20:20,696 But really, if you just do this, it's going to be very easy. 332 00:20:20,696 --> 00:20:21,220 All right. 333 00:20:21,220 --> 00:20:24,670 Somebody talked about the series lasting three games. 334 00:20:24,670 --> 00:20:27,722 What's the probability the series lasts three games? 335 00:20:27,722 --> 00:20:29,712 Can anybody look at that and tell me? 336 00:20:33,410 --> 00:20:37,310 1/3 because what you would do is add up these four sample 337 00:20:37,310 --> 00:20:39,100 points. 338 00:20:39,100 --> 00:20:41,910 And it's the opposite of these two. 339 00:20:41,910 --> 00:20:45,690 So it's 2/3 chance of two games, a 1/3 chance of three games. 340 00:20:45,690 --> 00:20:49,910 So it's not likely to go three games. 341 00:20:49,910 --> 00:20:51,640 All right. 342 00:20:51,640 --> 00:20:54,860 So to this point, we've seen examples 343 00:20:54,860 --> 00:20:56,830 of a conditional probability where 344 00:20:56,830 --> 00:21:03,570 it's A given B where A follows B, like, we're told B happened. 345 00:21:03,570 --> 00:21:07,370 Now what's the chance of A? And A is coming later. 346 00:21:07,370 --> 00:21:09,990 The probability of winning today's game 347 00:21:09,990 --> 00:21:12,950 given that you won yesterday's game, the probability 348 00:21:12,950 --> 00:21:17,770 of winning the series given you already won the first game. 349 00:21:17,770 --> 00:21:21,140 Next, we're going to look at the opposite scenario 350 00:21:21,140 --> 00:21:24,320 where the events are reversed in order. 351 00:21:24,320 --> 00:21:28,240 The probability that you won the first game 352 00:21:28,240 --> 00:21:32,294 given that you won the series.
353 00:21:32,294 --> 00:21:33,120 All right. 354 00:21:33,120 --> 00:21:37,140 Now, this is inherently confusing 355 00:21:37,140 --> 00:21:40,170 because if you're trying to figure-- 356 00:21:40,170 --> 00:21:43,360 if you know you won the series, well, 357 00:21:43,360 --> 00:21:45,370 you already know what happened in the first game 358 00:21:45,370 --> 00:21:47,270 because it's been played. 359 00:21:47,270 --> 00:21:49,270 So how could there be any probability there? 360 00:21:49,270 --> 00:21:51,240 It happened. 361 00:21:51,240 --> 00:21:54,460 Well, so what it means is, over all the times 362 00:21:54,460 --> 00:21:56,862 where the series was played, sort 363 00:21:56,862 --> 00:21:58,570 of what fraction of the time did the team 364 00:21:58,570 --> 00:22:00,760 that won the series win the first game. That's one 365 00:22:00,760 --> 00:22:03,330 way you could think about it. 366 00:22:03,330 --> 00:22:05,120 Or, maybe you just don't know. 367 00:22:05,120 --> 00:22:06,024 The game was played. 368 00:22:06,024 --> 00:22:07,190 You know you won the series. 369 00:22:07,190 --> 00:22:09,300 But you don't know who won the first game. 370 00:22:09,300 --> 00:22:12,540 And so you could think of a probability still being there. 371 00:22:12,540 --> 00:22:17,220 Now when you think about it, it gets me confused still. 372 00:22:17,220 --> 00:22:19,840 But just think about it like the math. 373 00:22:19,840 --> 00:22:21,981 It's the same formula. 374 00:22:21,981 --> 00:22:22,480 OK. 375 00:22:22,480 --> 00:22:25,800 It doesn't matter which happened first in time. 376 00:22:25,800 --> 00:22:27,750 You use the same mathematics. 377 00:22:27,750 --> 00:22:31,310 In fact, they give a special name to these kinds of things. 378 00:22:31,310 --> 00:22:34,480 They're called a posteriori conditional probabilities. 379 00:22:50,410 --> 00:22:55,060 It's a fancy name for just saying that things are out 380 00:22:55,060 --> 00:22:57,711 of order in time.
381 00:22:57,711 --> 00:22:58,210 All right? 382 00:22:58,210 --> 00:23:07,920 So it's a probability of B given A where B precedes A in time. 383 00:23:14,100 --> 00:23:16,150 All right? 384 00:23:16,150 --> 00:23:17,760 So it's the same math. 385 00:23:17,760 --> 00:23:21,500 It's just they're out of order. 386 00:23:21,500 --> 00:23:24,910 So let's figure out the probability 387 00:23:24,910 --> 00:23:30,090 that you won the first game given that you won the series. 388 00:23:30,090 --> 00:23:32,570 Let's figure it out. 389 00:23:32,570 --> 00:23:38,330 So I want probability of B given A now for this example. 390 00:23:38,330 --> 00:23:42,620 Well, it's just the probability of B and A 391 00:23:42,620 --> 00:23:47,590 over the probability of A. 392 00:23:47,590 --> 00:23:49,930 We already computed the probability of A and B. 393 00:23:49,930 --> 00:23:50,950 That's 1/3 plus 1/18. 394 00:23:56,380 --> 00:23:58,760 What's the probability of A, the probability 395 00:23:58,760 --> 00:24:00,290 of winning the series? 396 00:24:04,705 --> 00:24:05,205 1/2. 397 00:24:05,205 --> 00:24:08,090 It's those three sample points, 1/3 plus 1/18 plus 1/9, and they add up to 1/2, 398 00:24:08,090 --> 00:24:10,590 which makes sense because the setup is symmetric between the two 399 00:24:10,590 --> 00:24:12,350 teams. 400 00:24:12,350 --> 00:24:14,900 So that's over 1/2, which is 9/18. 401 00:24:17,780 --> 00:24:20,200 Well this was 7/18 over 9/18. 402 00:24:20,200 --> 00:24:22,980 It's 7/9. 403 00:24:22,980 --> 00:24:25,560 So the probability of winning the first game given 404 00:24:25,560 --> 00:24:27,076 that you won the series is 7/9. 405 00:24:29,950 --> 00:24:34,218 Anybody notice anything unusual about that answer here? 406 00:24:34,218 --> 00:24:37,630 It's the same as the answer over there. 407 00:24:37,630 --> 00:24:38,380 Is that a theorem? 408 00:24:40,970 --> 00:24:41,470 No.
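The a posteriori computation above, written out with the sample-point probabilities from the tree (A is winning the series, B is winning the first game):

```latex
\[
\Pr[B \mid A] \;=\; \frac{\Pr[B \cap A]}{\Pr[A]}
  \;=\; \frac{\tfrac{1}{3} + \tfrac{1}{18}}{\tfrac{1}{3} + \tfrac{1}{18} + \tfrac{1}{9}}
  \;=\; \frac{7/18}{9/18}
  \;=\; \frac{7}{9}
\]
```

The numerator is the same Pr[A and B] = 7/18 as before; only the denominator changes from Pr[B] to Pr[A], and here both happen to equal 1/2.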
409 00:24:41,470 --> 00:24:43,920 The probability of A given B is not always 410 00:24:43,920 --> 00:24:47,230 the probability of B given A. It was in this case. 411 00:24:47,230 --> 00:24:49,070 It is not always true. 412 00:24:49,070 --> 00:24:52,310 In fact, we could make a simple example 413 00:24:52,310 --> 00:24:53,935 to see why that's not always the case. 414 00:25:07,191 --> 00:25:07,690 All right. 415 00:25:07,690 --> 00:25:11,150 So say here's your sample space. 416 00:25:11,150 --> 00:25:15,850 And say that this is B and this is 417 00:25:15,850 --> 00:25:20,990 A. What's the probability of A given B in this case? 418 00:25:23,670 --> 00:25:24,330 1. 419 00:25:24,330 --> 00:25:25,980 If you're in B-- wait. 420 00:25:25,980 --> 00:25:26,480 No. 421 00:25:26,480 --> 00:25:28,850 It's not 1. 422 00:25:28,850 --> 00:25:32,820 What's the probability of A given B if I got some-- 423 00:25:32,820 --> 00:25:33,970 probably less than 1. 424 00:25:33,970 --> 00:25:37,210 Might be 1/3 as I've drawn it, if it was uniform. 425 00:25:37,210 --> 00:25:40,730 But in this case, the probability of A given B 426 00:25:40,730 --> 00:25:41,580 is less than 1. 427 00:25:41,580 --> 00:25:43,450 What's the probability of B given A? 428 00:25:47,240 --> 00:25:53,815 1, because if I'm in A I'm definitely in B. All right. 429 00:25:53,815 --> 00:25:55,940 So that's an example where they would be different. 430 00:25:55,940 --> 00:25:59,450 And the generic case is that they're different. 431 00:25:59,450 --> 00:26:01,840 All right? 432 00:26:01,840 --> 00:26:04,360 When are they equal? Because they were equal in this case. 433 00:26:04,360 --> 00:26:06,640 What makes them equal? 434 00:26:06,640 --> 00:26:07,330 Let's see. 435 00:26:07,330 --> 00:26:11,900 When does the probability of A given B 436 00:26:11,900 --> 00:26:14,170 equal the probability of B given A? 437 00:26:14,170 --> 00:26:16,135 Let's see.
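The nested picture (A drawn entirely inside B) can be checked on a toy example. The sample space below is hypothetical: six equally likely outcomes, with B covering three of them and A covering one, chosen so that the "1/3 if it was uniform" remark comes out exactly.

```python
from fractions import Fraction

# Hypothetical uniform sample space with six equally likely outcomes.
omega = set(range(6))
B = {0, 1, 2}
A = {0}  # A sits entirely inside B

def pr(event):
    """Uniform probability: size of the event over size of the sample space."""
    return Fraction(len(event), len(omega))

def cond(x, y):
    """Pr[X | Y] = Pr[X and Y] / Pr[Y]."""
    return pr(x & y) / pr(y)

print(cond(A, B))  # 1/3
print(cond(B, A))  # 1
```

Knowing B happened leaves A uncertain, but knowing A happened forces B, so the two conditional probabilities differ.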
438 00:26:16,135 --> 00:26:19,150 Well, if I plug in the formula, this 439 00:26:19,150 --> 00:26:26,410 equals the probability of A and B over the probability of B. 440 00:26:26,410 --> 00:26:35,000 That equals the probability of B and A over the probability of A. 441 00:26:35,000 --> 00:26:36,030 So when are those equal? 442 00:26:39,430 --> 00:26:39,930 Yeah. 443 00:26:39,930 --> 00:26:43,380 When probability A equals probability B. All right. 444 00:26:43,380 --> 00:26:45,070 So that's one case. 445 00:26:48,152 --> 00:26:49,310 What's the other case? 446 00:26:54,230 --> 00:26:55,520 Yeah-- when it's 0. 447 00:26:55,520 --> 00:26:57,460 Probability-- there's no intersection. 448 00:26:57,460 --> 00:26:59,724 Probability of A intersect B is 0. 449 00:26:59,724 --> 00:27:00,640 That's the other case. 450 00:27:07,071 --> 00:27:07,570 All right. 451 00:27:07,570 --> 00:27:10,030 But usually these conditions won't 452 00:27:10,030 --> 00:27:13,690 apply-- just happened to in this example by coincidence. 453 00:27:16,320 --> 00:27:21,750 Any questions about that? 454 00:27:21,750 --> 00:27:23,650 All right. 455 00:27:23,650 --> 00:27:24,150 Yeah. 456 00:27:24,150 --> 00:27:26,399 So the math is the same with a posteriori probabilities. 457 00:27:26,399 --> 00:27:29,210 It's really, really easy. 458 00:27:29,210 --> 00:27:29,830 All right. 459 00:27:29,830 --> 00:27:33,515 So let's do another simple example that'll start to maybe 460 00:27:33,515 --> 00:27:34,640 be a little more confusing. 461 00:27:38,470 --> 00:27:40,590 Say we've got two coins. 462 00:27:53,510 --> 00:27:55,080 One of them is a fair coin. 463 00:27:59,200 --> 00:28:01,530 And by that, I mean the probability it comes up 464 00:28:01,530 --> 00:28:04,560 heads is the same as the probability it comes up 465 00:28:04,560 --> 00:28:06,980 tails, which is 1/2. 466 00:28:06,980 --> 00:28:08,770 The other one is an unfair coin. 467 00:28:12,440 --> 00:28:16,056 And in this case, that means it's always heads.
468 00:28:16,056 --> 00:28:17,930 The probability of heads is 1. 469 00:28:17,930 --> 00:28:21,392 The probability of tails is 0. 470 00:28:21,392 --> 00:28:22,540 All right? 471 00:28:22,540 --> 00:28:25,370 I've got two such coins here. 472 00:28:25,370 --> 00:28:26,330 All right. 473 00:28:26,330 --> 00:28:30,870 Here is the unfair coin-- heads and heads. 474 00:28:30,870 --> 00:28:33,480 Actually, they make these things look like quarters sometimes. 475 00:28:33,480 --> 00:28:38,200 Here's the fair coin-- heads and tails. 476 00:28:38,200 --> 00:28:38,900 All right. 477 00:28:38,900 --> 00:28:45,410 Now suppose I pick one of these at random, 50-50, 478 00:28:45,410 --> 00:28:50,590 I pick one of these things, and I 479 00:28:50,590 --> 00:28:57,459 flip it, which I'm doing behind my back, and lo and behold, 480 00:28:57,459 --> 00:28:58,875 it comes out and, you see a heads. 481 00:29:02,240 --> 00:29:06,790 What's the probability I'm holding the fair coin? 482 00:29:11,010 --> 00:29:15,700 I picked the coin, 50-50, behind my back. 483 00:29:15,700 --> 00:29:21,420 So one answer is, I picked the fair coin with 50% probability. 484 00:29:21,420 --> 00:29:24,210 But then I flipped it behind my back 485 00:29:24,210 --> 00:29:28,120 and I showed you the result. And you see heads. 486 00:29:28,120 --> 00:29:31,250 Of course, if I'd have shown you tails, 487 00:29:31,250 --> 00:29:33,850 You would have known for sure it was the fair coin 488 00:29:33,850 --> 00:29:34,960 because that's the only one with the tails. 489 00:29:34,960 --> 00:29:36,400 But you don't know for sure now. 490 00:29:36,400 --> 00:29:38,400 You see a heads. 491 00:29:38,400 --> 00:29:41,280 What's the probability this is the fair coin given that you 492 00:29:41,280 --> 00:29:45,470 saw a heads after the flip? 493 00:29:45,470 --> 00:29:48,370 How many people think 1/2? 494 00:29:48,370 --> 00:29:50,940 After all, I picked it with probability 1/2. 
495 00:29:50,940 --> 00:29:54,890 How many people think it's less than 1/2? 496 00:29:54,890 --> 00:29:55,680 Good. 497 00:29:55,680 --> 00:29:57,530 OK. 498 00:29:57,530 --> 00:30:00,110 Somebody even said 1/3. 499 00:30:00,110 --> 00:30:02,530 Does that sound right? 500 00:30:02,530 --> 00:30:04,870 A couple people like 1/3. 501 00:30:04,870 --> 00:30:07,160 OK. 502 00:30:07,160 --> 00:30:07,660 All right. 503 00:30:07,660 --> 00:30:09,990 Now, part of what makes this tricky 504 00:30:09,990 --> 00:30:13,950 is I told you I picked the coin with 50% probability. 505 00:30:13,950 --> 00:30:15,730 But then I gave you information. 506 00:30:15,730 --> 00:30:19,594 So I've conditioned the problem. 507 00:30:19,594 --> 00:30:21,010 And so this is one of those things 508 00:30:21,010 --> 00:30:23,130 you could have an "Ask Marilyn" column about. 509 00:30:23,130 --> 00:30:24,350 Is it 1/2 or is it 1/3? 510 00:30:24,350 --> 00:30:26,530 Because I picked it with 50% chance, 511 00:30:26,530 --> 00:30:30,900 what does the information do for you? 512 00:30:30,900 --> 00:30:34,190 Now, I'll give you a clue. 513 00:30:34,190 --> 00:30:37,660 Bobo might have written in and said it's 1/2. 514 00:30:37,660 --> 00:30:39,784 And his proof is that three other mathematicians 515 00:30:39,784 --> 00:30:40,450 agreed with him. 516 00:30:40,450 --> 00:30:41,640 [LAUGHTER] 517 00:30:41,640 --> 00:30:43,690 All right? 518 00:30:43,690 --> 00:30:44,550 OK. 519 00:30:44,550 --> 00:30:48,030 So let's figure it out. 520 00:30:48,030 --> 00:30:50,320 And really it's very simple. 521 00:30:50,320 --> 00:30:54,550 It's just drawing out the tree and computing 522 00:30:54,550 --> 00:30:56,480 the conditional probability. 523 00:30:56,480 --> 00:31:00,070 So we're going to do the same thing over and over again 524 00:31:00,070 --> 00:31:03,464 because it just works for every problem.
525 00:31:03,464 --> 00:31:07,950 Of course, you could imagine debating this for a while, 526 00:31:07,950 --> 00:31:08,890 arguing with somebody. 527 00:31:08,890 --> 00:31:10,200 Is it 1/2 or 1/3? 528 00:31:10,200 --> 00:31:14,439 Much simpler just to do it. 529 00:31:14,439 --> 00:31:16,605 So the first thing is we have, which coin is picked? 530 00:31:19,300 --> 00:31:22,040 So it could be fair-- and I told you 531 00:31:22,040 --> 00:31:26,650 that happens with probability 1/2-- or unfair, 532 00:31:26,650 --> 00:31:29,300 which is also 1/2. 533 00:31:29,300 --> 00:31:33,290 Then we have the flip. 534 00:31:33,290 --> 00:31:35,570 The fair coin is equally likely to be 535 00:31:35,570 --> 00:31:39,170 heads or tails, each with 1/2. 536 00:31:39,170 --> 00:31:46,190 The unfair coin, guaranteed to be heads, probability 1. 537 00:31:46,190 --> 00:31:46,730 All right. 538 00:31:46,730 --> 00:31:50,510 Now we get the sample point outcomes. 539 00:31:50,510 --> 00:31:54,310 It's fair and heads with probability 1/4, 540 00:31:54,310 --> 00:31:57,990 fair and tails, probability 1/4, unfair 541 00:31:57,990 --> 00:32:01,580 and heads, probability 1/2. 542 00:32:01,580 --> 00:32:05,070 Now we define the events of interest. 543 00:32:05,070 --> 00:32:07,510 A is going to be that we chose the fair coin. 544 00:32:13,670 --> 00:32:16,450 And B is that the result is heads. 545 00:32:19,980 --> 00:32:22,150 And of course what I want to know 546 00:32:22,150 --> 00:32:25,640 is the probability that I chose the fair coin given 547 00:32:25,640 --> 00:32:26,660 that I saw a heads. 548 00:32:29,610 --> 00:32:32,020 So to do that we plug in our formula. 549 00:32:32,020 --> 00:32:37,750 That's just the probability of A and B 550 00:32:37,750 --> 00:32:42,190 over the probability of B. And to compute 551 00:32:42,190 --> 00:32:45,890 that I got to figure out the probability of A and B 552 00:32:45,890 --> 00:32:47,240 and the probability of B.
553 00:32:47,240 --> 00:32:49,110 So I'll make my diagram. 554 00:32:49,110 --> 00:32:54,970 A here, B here, A and B. A is the event 555 00:32:54,970 --> 00:32:57,570 I chose the fair coin. 556 00:32:57,570 --> 00:33:00,050 That's these guys. 557 00:33:00,050 --> 00:33:02,850 B is the event the result is heads. 558 00:33:02,850 --> 00:33:05,490 That's this one and this one. 559 00:33:08,070 --> 00:33:12,730 And A intersect B-- that's the only point. 560 00:33:12,730 --> 00:33:16,450 So this is really easy to compute now. 561 00:33:16,450 --> 00:33:18,050 What's the probability of A and B? 562 00:33:20,770 --> 00:33:21,490 1/4. 563 00:33:21,490 --> 00:33:24,270 It's just that sample point. 564 00:33:24,270 --> 00:33:25,865 What's the probability of B? 565 00:33:28,400 --> 00:33:30,480 3/4, 1/4 plus 1/2. 566 00:33:33,100 --> 00:33:35,630 So the probability of A given B is 1/4 over 3/4, which is 1/3. 567 00:33:38,650 --> 00:33:41,400 Really simple to answer this question. 568 00:33:41,400 --> 00:33:43,300 Just don't even think about it. 569 00:33:43,300 --> 00:33:46,460 Just write down the tree when you get these things. 570 00:33:46,460 --> 00:33:50,752 So much easier just to write the tree down. 571 00:33:50,752 --> 00:33:51,720 All right. 572 00:33:51,720 --> 00:33:57,690 Now the key here is we knew the probability 573 00:33:57,690 --> 00:34:01,445 of picking the fair coin in the first place. 574 00:34:01,445 --> 00:34:03,090 Maybe it's worth writing down what 575 00:34:03,090 --> 00:34:07,268 happens if that's a variable-- some variable P. Let's do that. 576 00:34:15,239 --> 00:34:20,130 For example, what if I hadn't told you the probability 577 00:34:20,130 --> 00:34:22,690 that I picked the fair coin? 578 00:34:22,690 --> 00:34:26,420 I just picked one and flipped it. 579 00:34:26,420 --> 00:34:30,130 Think that'll change the answer? 580 00:34:30,130 --> 00:34:32,940 It should because you got to plug something in there 581 00:34:32,940 --> 00:34:34,927 for the 1/2 for this to work.
582 00:34:34,927 --> 00:34:36,010 So let's see what happens. 583 00:34:36,010 --> 00:34:40,889 Say I picked the fair coin with probability 584 00:34:40,889 --> 00:34:46,880 P and the unfair coin with 1 minus P. 585 00:34:46,880 --> 00:34:52,429 And this is the same heads and tails, 1/2, 1/2. 586 00:34:52,429 --> 00:34:56,239 Heads, with probability 1. 587 00:34:56,239 --> 00:35:01,520 Well now, instead of 1/4 I get P over 2 up here. 588 00:35:01,520 --> 00:35:03,960 And this is now 1 minus P instead of 1/2. 589 00:35:06,580 --> 00:35:12,736 So the probability of A given B-- the probability of A and B 590 00:35:12,736 --> 00:35:16,220 is P over 2. 591 00:35:16,220 --> 00:35:19,890 And the probability of B is P over 2 592 00:35:19,890 --> 00:35:31,100 plus 1 minus P. That's P over 2 up top, over 1 minus P over 2 on the bottom, 593 00:35:31,100 --> 00:35:33,950 and that is all multiplied by-- what am I 594 00:35:33,950 --> 00:35:35,720 going to multiply-- by 2 here, top and bottom. 595 00:35:35,720 --> 00:35:41,210 I'll get P over 2 minus P. 596 00:35:41,210 --> 00:35:44,810 So the probability with which I picked the coin to start with 597 00:35:44,810 --> 00:35:48,060 impacts the answer here. 598 00:35:48,060 --> 00:35:55,680 For example, what if I picked the unfair coin for sure? 599 00:35:55,680 --> 00:35:57,330 That would be P being 0. 600 00:36:00,820 --> 00:36:03,130 Well, the probability that I picked the fair coin 601 00:36:03,130 --> 00:36:06,704 is 0 over 2, which is 0. 602 00:36:06,704 --> 00:36:08,870 All right-- even though I showed you the heads, 603 00:36:08,870 --> 00:36:12,260 there's no chance it was the fair coin because I picked 604 00:36:12,260 --> 00:36:13,610 the unfair coin for sure. 605 00:36:19,840 --> 00:36:23,020 Same thing if I picked the fair coin for sure, 606 00:36:23,020 --> 00:36:25,110 it better be the case this is 1. 607 00:36:25,110 --> 00:36:26,720 So I get 1 over 2 minus 1. 608 00:36:26,720 --> 00:36:27,400 It's 1. 609 00:36:30,380 --> 00:36:32,610 Any questions?
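The tree calculation with a general P can be checked with exact arithmetic. What follows is only a minimal Python sketch, not something shown in the lecture: the function name posterior_fair_given_heads is made up for illustration, and it assumes the setup described here (fair coin picked with probability P, unfair coin always heads, a single flip).

```python
from fractions import Fraction


def posterior_fair_given_heads(p):
    """P(fair coin | one flip showed heads), read off the two-stage tree.

    p is the (assumed known) probability of picking the fair coin.
    """
    p = Fraction(p)
    fair_and_heads = p * Fraction(1, 2)   # sample point: fair coin, heads
    unfair_and_heads = (1 - p) * 1        # sample point: unfair coin, heads
    prob_heads = fair_and_heads + unfair_and_heads
    return fair_and_heads / prob_heads


if __name__ == "__main__":
    for p in (Fraction(0), Fraction(1, 2), Fraction(1)):
        # The closed form from the board is P / (2 - P).
        print(p, posterior_fair_given_heads(p), p / (2 - p))
```

With P = 1/2 the sketch gives 1/3, and P = 0 and P = 1 give 0 and 1, matching the sanity checks in the lecture.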
610 00:36:32,610 --> 00:36:36,044 So it's important you know the probability I picked 611 00:36:36,044 --> 00:36:37,210 the fair coin to start with. 612 00:36:37,210 --> 00:36:39,630 Otherwise, you can't go anywhere. 613 00:36:42,700 --> 00:36:43,930 All right. 614 00:36:43,930 --> 00:36:46,640 What if I do the same game? 615 00:36:46,640 --> 00:36:50,910 Pick the fair coin with probability P. 616 00:36:50,910 --> 00:36:54,291 But now I flip it K times. 617 00:36:54,291 --> 00:36:56,400 Say I flip it 100 times. 618 00:36:56,400 --> 00:36:58,340 And every time it comes up heads. 619 00:37:01,420 --> 00:37:04,720 I mean you're pretty sure you got the unfair coin because you 620 00:37:04,720 --> 00:37:05,850 never saw a tails. 621 00:37:05,850 --> 00:37:07,480 Right? 622 00:37:07,480 --> 00:37:08,220 So let's do that. 623 00:37:08,220 --> 00:37:11,060 Let's compute that scenario. 624 00:37:11,060 --> 00:37:16,430 So instead of a single heads I get K straight heads 625 00:37:16,430 --> 00:37:19,310 and no tails. 626 00:37:19,310 --> 00:37:22,290 This would happen with 1 over 2 to the K. 627 00:37:22,290 --> 00:37:27,270 This would happen with 1 minus 1 over 2 to the K. 628 00:37:27,270 --> 00:37:31,980 So this is now P over 2 to the K. This is now P times 1 minus 2 629 00:37:31,980 --> 00:37:34,870 to the minus K. 630 00:37:34,870 --> 00:37:36,380 Let's recompute the probabilities. 631 00:37:36,380 --> 00:37:38,657 I'm going somewhere with this. 632 00:37:38,657 --> 00:37:39,240 Wait a minute. 633 00:37:50,340 --> 00:37:51,800 So now we're looking at the event 634 00:37:51,800 --> 00:37:56,255 B, that K straight heads 635 00:37:59,080 --> 00:38:02,150 come up. 636 00:38:02,150 --> 00:38:04,240 And I want to know the probability 637 00:38:04,240 --> 00:38:07,420 that I picked the fair coin given that it just never comes 638 00:38:07,420 --> 00:38:09,700 up tails. 639 00:38:09,700 --> 00:38:10,575 The math is the same.
640 00:38:17,390 --> 00:38:19,895 The probability now that I picked the fair coin 641 00:38:19,895 --> 00:38:23,310 and got K straight heads is just P times 2 642 00:38:23,310 --> 00:38:31,490 to the minus K. The probability that I got K straight heads is 643 00:38:31,490 --> 00:38:34,210 P times 2 to the minus K plus the chance 644 00:38:34,210 --> 00:38:38,230 I picked the unfair coin, which is 1 minus P. 645 00:38:38,230 --> 00:38:40,340 And if I multiply top and bottom by 2 to the K, 646 00:38:40,340 --> 00:38:47,571 I get P over P plus 2 to the K times 1 minus P. 647 00:38:47,571 --> 00:38:48,070 All right. 648 00:38:48,070 --> 00:38:53,290 So it gets very unlikely that I've got the fair coin here 649 00:38:53,290 --> 00:38:55,280 as K gets big. 650 00:38:55,280 --> 00:38:58,660 Like if K is 100 I got a big number down here. 651 00:38:58,660 --> 00:39:00,680 And basically it's 0 chance-- close 652 00:39:00,680 --> 00:39:03,900 to 0 chance of the fair coin. 653 00:39:06,910 --> 00:39:10,760 But now say I do the following experiment. 654 00:39:10,760 --> 00:39:16,820 I don't tell you P. But I pull a coin out 655 00:39:16,820 --> 00:39:21,380 and 100 flips in a row it's heads. 656 00:39:21,380 --> 00:39:22,710 Which coin do you think I have? 657 00:39:26,340 --> 00:39:29,300 I flipped it 100 straight times and it's heads every time. 658 00:39:32,550 --> 00:39:33,050 Yeah. 659 00:39:33,050 --> 00:39:34,341 There's not enough information. 660 00:39:34,341 --> 00:39:34,990 You don't know. 661 00:39:34,990 --> 00:39:37,840 What do you want to say? 662 00:39:37,840 --> 00:39:40,890 You want to say it's the unfair coin 663 00:39:40,890 --> 00:39:46,250 but you have no idea because I might have picked the fair coin 664 00:39:46,250 --> 00:39:48,790 with probability 1, in which case it is the fair coin 665 00:39:48,790 --> 00:39:51,000 and it just was unlucky that it came up heads 666 00:39:51,000 --> 00:39:53,270 100 times in a row.
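The K-flip version can be checked the same way. This is a hedged Python sketch rather than anything from the lecture: the name posterior_fair_given_k_heads is hypothetical, and it assumes the fair coin is picked with probability P and that all K flips came up heads. It compares the tree calculation with the closed form P / (P + 2^K (1 - P)).

```python
from fractions import Fraction


def posterior_fair_given_k_heads(p, k):
    """P(fair coin | k straight heads), computed from the tree.

    p is the assumed probability of picking the fair coin; k >= 1 flips.
    """
    p = Fraction(p)
    fair_and_all_heads = p * Fraction(1, 2 ** k)  # fair coin, k heads in a row
    unfair_and_all_heads = 1 - p                  # unfair coin always shows heads
    return fair_and_all_heads / (fair_and_all_heads + unfair_and_all_heads)


if __name__ == "__main__":
    half = Fraction(1, 2)
    for k in (1, 10, 100):
        exact = posterior_fair_given_k_heads(half, k)
        closed_form = half / (half + 2 ** k * (1 - half))
        print(k, float(exact), exact == closed_form)
    # If the fair coin was picked for sure, no run of heads changes that.
    print(posterior_fair_given_k_heads(1, 100))
```

For P = 1/2 the posterior shrinks toward 0 as K grows, but for P = 1 it stays at 1 no matter how many heads appear, which is why nothing can be concluded without knowing P.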
667 00:39:53,270 --> 00:39:54,530 But it could be. 668 00:39:54,530 --> 00:39:58,730 So you could say nothing if you don't know the probability P. 669 00:39:58,730 --> 00:40:05,370 Because sure enough, if I plug in P being 1 here, 670 00:40:05,370 --> 00:40:07,850 that wipes out the 2 to the K and I just get probability 1. 671 00:40:10,600 --> 00:40:12,940 OK? 672 00:40:12,940 --> 00:40:13,440 All right. 673 00:40:13,440 --> 00:40:18,510 Now where this comes up in practice 674 00:40:18,510 --> 00:40:20,560 is with things like polling. 675 00:40:20,560 --> 00:40:22,680 Like, we just had an election. 676 00:40:22,680 --> 00:40:25,430 And people do polls ahead of time. 677 00:40:25,430 --> 00:40:28,850 And they sample thousands of voters 678 00:40:28,850 --> 00:40:31,160 from 1% of the population. 679 00:40:31,160 --> 00:40:33,590 And they say, OK, 60% of the people 680 00:40:33,590 --> 00:40:37,240 are going to vote Republican. 681 00:40:37,240 --> 00:40:39,880 And they might have a margin of error, three points, whatever 682 00:40:39,880 --> 00:40:40,380 that means. 683 00:40:40,380 --> 00:40:42,580 And we'll figure that out next week. 684 00:40:42,580 --> 00:40:45,920 What does that tell you about the electorate as a whole-- 685 00:40:45,920 --> 00:40:52,180 the population-- if they sample 1% at random and 60% are Republican? 686 00:40:52,180 --> 00:40:54,572 Yeah? 687 00:40:54,572 --> 00:40:58,476 AUDIENCE: [INAUDIBLE] The options you have, 688 00:40:58,476 --> 00:41:00,428 is it all heads or is it all tails? 689 00:41:00,428 --> 00:41:01,892 It should be one option all heads 690 00:41:01,892 --> 00:41:03,860 and another option at least one tails. 691 00:41:03,860 --> 00:41:04,910 PROFESSOR: You're right. 692 00:41:04,910 --> 00:41:06,320 Oops. 693 00:41:06,320 --> 00:41:06,940 All right. 694 00:41:06,940 --> 00:41:07,903 At least one tail for this one. 695 00:41:07,903 --> 00:41:08,402 Yeah. 696 00:41:08,402 --> 00:41:10,168 Good.
697 00:41:10,168 --> 00:41:11,530 That is true. 698 00:41:11,530 --> 00:41:13,870 OK. 699 00:41:13,870 --> 00:41:18,550 Any questions about that example? 700 00:41:18,550 --> 00:41:19,550 OK. 701 00:41:19,550 --> 00:41:21,300 Now we're back to the election and there's 702 00:41:21,300 --> 00:41:26,520 a poll that says they sampled 1% of the population at random 703 00:41:26,520 --> 00:41:29,540 and 60% said they're going to vote Republican. 704 00:41:29,540 --> 00:41:32,570 And the margin of error is 3% or something. 705 00:41:32,570 --> 00:41:36,510 What does that tell you about the population of the country? 706 00:41:36,510 --> 00:41:37,900 Nothing. 707 00:41:37,900 --> 00:41:39,560 That's right. 708 00:41:39,560 --> 00:41:42,490 It is what it is. 709 00:41:42,490 --> 00:41:47,200 All you can conclude is that either the population 710 00:41:47,200 --> 00:41:51,920 is close to 60% Republican or you were unlucky 711 00:41:51,920 --> 00:41:54,740 in the 1% you sampled. 712 00:41:54,740 --> 00:41:59,120 That's what you can conclude because the population really 713 00:41:59,120 --> 00:42:00,110 is fixed in this case. 714 00:42:00,110 --> 00:42:01,060 It is what it is. 715 00:42:01,060 --> 00:42:04,090 There's no randomness in the population. 716 00:42:04,090 --> 00:42:04,590 All right? 717 00:42:04,590 --> 00:42:07,630 So you have next week for recitation. 718 00:42:07,630 --> 00:42:09,890 You're going to design a poll and work through how 719 00:42:09,890 --> 00:42:11,640 to calculate the margin of error and work 720 00:42:11,640 --> 00:42:14,640 through what that really means in terms 721 00:42:14,640 --> 00:42:17,410 of what the population is like. 722 00:42:17,410 --> 00:42:20,530 Now of course, if it comes out 100 straight times heads, 723 00:42:20,530 --> 00:42:23,590 you've got to be really unlucky to have the fair coin. 724 00:42:23,590 --> 00:42:25,460 And the same thing with designing the poll 725 00:42:25,460 --> 00:42:28,270 if you're way off.
726 00:42:28,270 --> 00:42:32,761 Any questions about that? 727 00:42:32,761 --> 00:42:33,260 OK. 728 00:42:33,260 --> 00:42:37,970 The next example comes up all the time in practice. 729 00:42:37,970 --> 00:42:40,946 And that's with medical testing. 730 00:42:40,946 --> 00:42:43,547 Maybe I'll leave-- no. 731 00:42:43,547 --> 00:42:44,380 I'll take that down. 732 00:42:44,380 --> 00:42:45,634 We know that now. 733 00:43:05,030 --> 00:43:08,520 Now in this case-- in fact, this is 734 00:43:08,520 --> 00:43:11,530 a question we had on the final exam a few years ago. 735 00:43:11,530 --> 00:43:14,230 And there's a good chance this kind of question's 736 00:43:14,230 --> 00:43:16,340 going to be on the final this year. 737 00:43:16,340 --> 00:43:18,760 There's a disease out there. 738 00:43:18,760 --> 00:43:21,390 And you can have a test for it. 739 00:43:21,390 --> 00:43:24,507 But like most medical tests, it's not perfect. 740 00:43:24,507 --> 00:43:26,840 Sometimes when it says you've got the disease you really 741 00:43:26,840 --> 00:43:27,900 don't. 742 00:43:27,900 --> 00:43:31,657 And sometimes when it says you don't have it, you really do. 743 00:43:31,657 --> 00:43:33,240 So in this case, we're going to assume 744 00:43:33,240 --> 00:43:45,914 that 10% of the population has the disease, whatever it is. 745 00:43:45,914 --> 00:43:47,330 You don't get symptoms right away. 746 00:43:47,330 --> 00:43:49,190 So you have this test. 747 00:43:49,190 --> 00:44:04,142 But if you have the disease there is a 10% chance 748 00:44:04,142 --> 00:44:05,225 that the test is negative. 749 00:44:08,940 --> 00:44:15,040 And this is called a false negative, 750 00:44:15,040 --> 00:44:17,410 because the test comes back negative but it's wrong, 751 00:44:17,410 --> 00:44:19,750 because you have the disease.
752 00:44:19,750 --> 00:44:22,620 And similarly, if you have the disease-- 753 00:44:22,620 --> 00:44:29,240 or sorry-- if you don't have the disease, 754 00:44:29,240 --> 00:44:35,170 there's a 30% chance that the test comes back positive. 755 00:44:35,170 --> 00:44:41,560 And it's called a false positive because it came back positive, 756 00:44:41,560 --> 00:44:43,870 but you don't have it. 757 00:44:43,870 --> 00:44:45,800 So the test is pretty good. 758 00:44:45,800 --> 00:44:46,500 Right? 759 00:44:46,500 --> 00:44:52,390 It's a 10% false negative rate, 30% false positive rate. 760 00:44:52,390 --> 00:44:59,210 Now say you select a random person and they test positive. 761 00:44:59,210 --> 00:45:01,120 What you want to know is the probability 762 00:45:01,120 --> 00:45:04,014 they have the disease given that it's a random person who tested positive. 763 00:45:07,730 --> 00:45:12,630 So actually, this came up in my personal life. 764 00:45:12,630 --> 00:45:18,080 Many years ago when my wife was pregnant with Alex, 765 00:45:18,080 --> 00:45:22,840 she was exposed to somebody with TB here at MIT. 766 00:45:22,840 --> 00:45:25,400 And she took the test. 767 00:45:25,400 --> 00:45:28,040 And it came back positive. 768 00:45:28,040 --> 00:45:30,639 Now the bad thing-- TB's a bad thing. 769 00:45:30,639 --> 00:45:31,680 You don't want to get it. 770 00:45:31,680 --> 00:45:35,990 But the medicine for it you take for six months. 771 00:45:35,990 --> 00:45:37,760 And she was worried about taking medicine 772 00:45:37,760 --> 00:45:40,530 for six months when she's pregnant because who 773 00:45:40,530 --> 00:45:43,830 knows what the TB medicine does kind of thing 774 00:45:43,830 --> 00:45:45,940 if you have a baby. 775 00:45:45,940 --> 00:45:48,440 So she asked the doc, what's the probability I really 776 00:45:48,440 --> 00:45:51,320 have the disease? 777 00:45:51,320 --> 00:45:53,570 The doc doesn't know.
778 00:45:53,570 --> 00:45:55,620 The doc maybe could give you some of these stats, 779 00:45:55,620 --> 00:45:58,350 10% false negative, 30% false positive. 780 00:45:58,350 --> 00:45:59,760 But it tested positive. 781 00:45:59,760 --> 00:46:03,530 So they just normally give you the medicine. 782 00:46:03,530 --> 00:46:05,140 So say this was the story. 783 00:46:05,140 --> 00:46:06,490 What would you say? 784 00:46:06,490 --> 00:46:07,670 What do you think? 785 00:46:07,670 --> 00:46:10,090 How many people think that it's at least a 70% chance 786 00:46:10,090 --> 00:46:12,900 you got the disease? 787 00:46:12,900 --> 00:46:14,930 She tested positive and it's only 788 00:46:14,930 --> 00:46:16,820 got a 30% false positive rate. 789 00:46:16,820 --> 00:46:18,701 Anybody? 790 00:46:18,701 --> 00:46:21,260 So you don't think she's likely to have it. 791 00:46:21,260 --> 00:46:23,720 How many people think it's better than 50-50 792 00:46:23,720 --> 00:46:26,820 you have the disease? 793 00:46:26,820 --> 00:46:27,870 A few. 794 00:46:27,870 --> 00:46:31,420 How many people think less than 50%? 795 00:46:31,420 --> 00:46:32,290 A bunch. 796 00:46:32,290 --> 00:46:33,070 Yeah. 797 00:46:33,070 --> 00:46:36,250 You're right, in fact. 798 00:46:36,250 --> 00:46:37,440 Let's figure out the answer. 799 00:46:37,440 --> 00:46:39,550 It's easy to do. 800 00:46:39,550 --> 00:46:43,935 So A is the event the person has the disease. 801 00:46:51,800 --> 00:46:56,835 And B is the event that the person tests positive. 802 00:47:01,120 --> 00:47:03,400 And of course what we want to know 803 00:47:03,400 --> 00:47:05,670 is the probability you have the disease given 804 00:47:05,670 --> 00:47:07,550 that you tested positive. 805 00:47:07,550 --> 00:47:11,420 And that's just the probability of both events divided 806 00:47:11,420 --> 00:47:16,340 by the probability of testing positive. 807 00:47:16,340 --> 00:47:20,060 So let's figure that out by drawing the tree.
808 00:47:38,750 --> 00:47:40,970 So first, do you have the disease? 809 00:47:43,980 --> 00:47:45,360 And it's yes or no. 810 00:47:49,320 --> 00:47:50,380 And let's see. 811 00:47:50,380 --> 00:47:53,250 The probability of having the disease, what 812 00:47:53,250 --> 00:47:56,010 is that for a random person? 813 00:47:56,010 --> 00:47:56,810 10%. 814 00:47:56,810 --> 00:47:58,380 That's the stat. 815 00:47:58,380 --> 00:48:01,274 So it's-- actually, we'll call it 0.1. 816 00:48:01,274 --> 00:48:03,997 And 0.9 you don't have it. 817 00:48:03,997 --> 00:48:05,080 And then there's the test. 818 00:48:07,940 --> 00:48:09,995 Well, you can be positive or negative. 819 00:48:14,360 --> 00:48:19,520 Now if you have the disease, there 820 00:48:19,520 --> 00:48:26,880 is a-- the chance you test negative is 10%, 0.1. 821 00:48:26,880 --> 00:48:31,352 Therefore there's a 90% chance you test positive. 822 00:48:31,352 --> 00:48:32,810 Now if you don't have the disease, 823 00:48:32,810 --> 00:48:33,893 you could test either way. 824 00:48:36,740 --> 00:48:40,390 If you don't have the disease there's a 30% chance 825 00:48:40,390 --> 00:48:42,590 you test positive. 826 00:48:42,590 --> 00:48:47,760 0.3 here, and a 70% chance you're negative. 827 00:48:47,760 --> 00:48:51,620 Now we can compute each sample point probability. 828 00:48:51,620 --> 00:48:56,560 This one is 0.1 times 0.9, which is 0.09. 829 00:48:56,560 --> 00:49:00,050 0.1 times 0.1 is 0.01. 830 00:49:00,050 --> 00:49:03,720 0.9 times 0.3 is 0.27. 831 00:49:03,720 --> 00:49:09,580 0.9 times 0.7 is 0.63. 832 00:49:09,580 --> 00:49:12,350 So all sample points are figured out. 833 00:49:12,350 --> 00:49:16,240 Now we figure out which sample points are in which sets. 834 00:49:16,240 --> 00:49:21,650 So we have event A, event B, and A intersect B. Let's see. 835 00:49:21,650 --> 00:49:25,300 A is the event you have the disease. 836 00:49:25,300 --> 00:49:27,590 That's these guys.
837 00:49:27,590 --> 00:49:31,790 B is the event you test positive. 838 00:49:31,790 --> 00:49:36,840 That's this one and this one. 839 00:49:36,840 --> 00:49:40,521 A intersect B is just this one. 840 00:49:40,521 --> 00:49:41,020 All right. 841 00:49:41,020 --> 00:49:43,149 We're almost done. 842 00:49:43,149 --> 00:49:44,690 Let's just figure out the probability 843 00:49:44,690 --> 00:49:47,190 you have the disease. 844 00:49:47,190 --> 00:49:49,000 What's the probability of A intersect B? 845 00:49:52,340 --> 00:49:53,120 0.09. 846 00:49:53,120 --> 00:49:56,740 It's just that one sample point. 847 00:49:56,740 --> 00:49:59,645 What's the probability that you tested positive? 848 00:50:02,170 --> 00:50:03,230 0.36. 849 00:50:03,230 --> 00:50:03,730 Yeah. 850 00:50:03,730 --> 00:50:13,610 0.09 plus 0.27, which is 0.36. 851 00:50:13,610 --> 00:50:21,430 So I got 0.09 over 0.36 is 1/4. 852 00:50:21,430 --> 00:50:22,790 Wow. 853 00:50:22,790 --> 00:50:24,071 That seems bizarre. 854 00:50:24,071 --> 00:50:24,570 Right? 855 00:50:24,570 --> 00:50:28,740 You've got a test, 10% percent false negative, 856 00:50:28,740 --> 00:50:30,250 30% false positive. 857 00:50:30,250 --> 00:50:34,060 Yet, when you test positive there's only a 25% chance 858 00:50:34,060 --> 00:50:36,490 you have the disease. 859 00:50:36,490 --> 00:50:38,610 So maybe you don't take the medicine. 860 00:50:38,610 --> 00:50:40,830 So if there's risk both ways, probably 861 00:50:40,830 --> 00:50:42,930 don't have the disease. 862 00:50:42,930 --> 00:50:44,970 Yeah? 863 00:50:44,970 --> 00:50:46,470 AUDIENCE: [INAUDIBLE] disease change 864 00:50:46,470 --> 00:50:47,970 because you've already been exposed 865 00:50:47,970 --> 00:50:48,970 to somebody that has it? 866 00:50:48,970 --> 00:50:52,530 PROFESSOR: That's a great point, great point, 867 00:50:52,530 --> 00:50:54,440 because there's additional information 868 00:50:54,440 --> 00:50:57,190 conditioning this in the personal example I cited. 
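The four sample points and the 1/4 answer can be reproduced with exact arithmetic. This is a minimal Python sketch under the numbers stated in the lecture (10% prevalence, 10% false negative rate, 30% false positive rate); the variable names and the dictionary layout are assumptions made for illustration.

```python
from fractions import Fraction

# Stated assumptions from the example.
prevalence = Fraction(1, 10)       # P(disease)
false_negative = Fraction(1, 10)   # P(test negative | disease)
false_positive = Fraction(3, 10)   # P(test positive | no disease)

# One entry per leaf of the tree: (disease status, test result) -> probability.
points = {
    ("disease", "positive"): prevalence * (1 - false_negative),
    ("disease", "negative"): prevalence * false_negative,
    ("healthy", "positive"): (1 - prevalence) * false_positive,
    ("healthy", "negative"): (1 - prevalence) * (1 - false_positive),
}

# A = has the disease, B = tests positive.
p_a_and_b = points[("disease", "positive")]
p_b = sum(prob for (_, result), prob in points.items() if result == "positive")
p_a_given_b = p_a_and_b / p_b

if __name__ == "__main__":
    for leaf, prob in points.items():
        print(leaf, float(prob))
    print("P(B) =", float(p_b), " P(A | B) =", p_a_given_b)
```

Changing prevalence in the sketch shows how strongly the answer depends on how common the disease is, which is also why conditioning on a known exposure would change it.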
869 00:50:57,190 --> 00:50:59,250 You were exposed to somebody. 870 00:50:59,250 --> 00:51:02,030 So we need to condition on that as well, which 871 00:51:02,030 --> 00:51:04,250 raises the chance you have the disease. 872 00:51:04,250 --> 00:51:06,040 That's a great point. 873 00:51:06,040 --> 00:51:07,950 Yeah. 874 00:51:07,950 --> 00:51:10,420 Just like in the-- well, we haven't got to that example. 875 00:51:10,420 --> 00:51:12,420 We'll do another example where that exact kind of thing 876 00:51:12,420 --> 00:51:14,431 is very important. 877 00:51:14,431 --> 00:51:14,930 All right. 878 00:51:14,930 --> 00:51:17,920 So this is sort of paradoxical that it 879 00:51:17,920 --> 00:51:21,120 looks like a pretty good test-- low false positives, 880 00:51:21,120 --> 00:51:25,460 low false negatives, but it's likely to be wrong, at least if it 881 00:51:25,460 --> 00:51:28,637 tells you you have the disease. 882 00:51:28,637 --> 00:51:29,720 In fact, let's figure it out. 883 00:51:29,720 --> 00:51:31,720 What's the probability that the test is correct? 884 00:51:35,620 --> 00:51:39,130 What's the probability the test is right in general? 885 00:51:43,030 --> 00:51:44,350 72%. 886 00:51:44,350 --> 00:51:45,570 Let's see. 887 00:51:45,570 --> 00:51:48,731 So it would be 0.09 plus 0.63. 888 00:51:48,731 --> 00:51:49,231 72%. 889 00:51:56,780 --> 00:51:59,704 So it's likely to be right. 890 00:51:59,704 --> 00:52:01,370 But if it tells you you have the disease 891 00:52:01,370 --> 00:52:02,460 it's likely to be wrong. 892 00:52:05,500 --> 00:52:06,850 It's hard. 893 00:52:06,850 --> 00:52:08,530 Why is this happening? 894 00:52:08,530 --> 00:52:09,780 Why does it come out that way? 895 00:52:09,780 --> 00:52:10,280 Yeah? 896 00:52:12,980 --> 00:52:16,190 AUDIENCE: Then there is only a 1 in 64 chance 897 00:52:16,190 --> 00:52:18,630 that you have the disease.
898 00:52:18,630 --> 00:52:20,790 So if it comes back negative, then it's 899 00:52:20,790 --> 00:52:22,584 a pretty good indication that you're OK. 900 00:52:22,584 --> 00:52:23,250 PROFESSOR: Yeah. 901 00:52:23,250 --> 00:52:26,980 If it comes back negative than it really is doing very well. 902 00:52:26,980 --> 00:52:27,907 That's right. 903 00:52:27,907 --> 00:52:29,615 But why is it when it comes back positive 904 00:52:29,615 --> 00:52:32,156 that you're unlikely to have the disease if it's a good test. 905 00:52:32,156 --> 00:52:32,886 Yeah. 906 00:52:32,886 --> 00:52:34,260 AUDIENCE: The disease is so rare. 907 00:52:34,260 --> 00:52:35,760 PROFESSOR: The disease is so rare. 908 00:52:35,760 --> 00:52:37,500 Absolutely. 909 00:52:37,500 --> 00:52:39,770 This number here is so small. 910 00:52:39,770 --> 00:52:41,960 And that's what's doing it. 911 00:52:41,960 --> 00:52:45,020 Because if you look at how many people have the disease 912 00:52:45,020 --> 00:52:47,120 and test positive, it's 0.09. 913 00:52:47,120 --> 00:52:50,270 So many people don't have the disease 914 00:52:50,270 --> 00:52:53,120 that even with a small false positive rate, this number 915 00:52:53,120 --> 00:52:55,380 swamps out that number. 916 00:52:55,380 --> 00:52:57,930 In fact, imagine nobody had the disease. 917 00:52:57,930 --> 00:53:00,420 You'd have a 0 here. 918 00:53:00,420 --> 00:53:01,140 All right? 919 00:53:01,140 --> 00:53:04,750 And then you would always be wrong if you said you had it. 920 00:53:04,750 --> 00:53:05,480 OK? 921 00:53:05,480 --> 00:53:08,780 That's good. 922 00:53:08,780 --> 00:53:09,280 OK. 923 00:53:09,280 --> 00:53:11,980 This comes up in weather prediction, the same paradox. 924 00:53:11,980 --> 00:53:14,050 For example, say you're trying to predict 925 00:53:14,050 --> 00:53:15,750 the weather for Seattle. 926 00:53:15,750 --> 00:53:18,160 Sometimes it seems like this in Boston. 927 00:53:18,160 --> 00:53:22,490 And you just say, it's going to rain. 
928 00:53:22,490 --> 00:53:25,272 Forget all the fancy weather forecasting stuff, the radar, 929 00:53:25,272 --> 00:53:25,980 and all the rest. 930 00:53:25,980 --> 00:53:28,440 Just say it's going to rain tomorrow. 931 00:53:28,440 --> 00:53:30,942 You're going to be right almost all the time. 932 00:53:30,942 --> 00:53:31,620 All right? 933 00:53:31,620 --> 00:53:34,350 And in fact, if you try to do fancy stuff, 934 00:53:34,350 --> 00:53:37,220 you're probably going to be wrong more of the time. 935 00:53:37,220 --> 00:53:37,800 All right. 936 00:53:37,800 --> 00:53:40,130 For example, in this case, if you just 937 00:53:40,130 --> 00:53:44,510 say the person does not have the disease, forget the lab test. 938 00:53:44,510 --> 00:53:47,551 Just come back with negative. 939 00:53:47,551 --> 00:53:48,550 How often are you right? 940 00:53:51,870 --> 00:53:54,190 90% of the time you're right. 941 00:53:54,190 --> 00:53:56,460 Much better than the test you paid a lot of money for. 942 00:53:59,050 --> 00:53:59,850 I see. 943 00:53:59,850 --> 00:54:02,630 You've got to be careful what you're looking for, 944 00:54:02,630 --> 00:54:06,500 how you measure the value of a test or a prediction. 945 00:54:06,500 --> 00:54:08,140 Because presumably the one you paid 946 00:54:08,140 --> 00:54:14,240 for is better, even though accurate less of the time. 947 00:54:14,240 --> 00:54:16,004 Any questions about that? 948 00:54:20,350 --> 00:54:22,030 OK. 949 00:54:22,030 --> 00:54:23,530 So For the rest of today we're going 950 00:54:23,530 --> 00:54:27,160 to do three more paradoxes. 951 00:54:27,160 --> 00:54:29,990 And in each case they're going to expose 952 00:54:29,990 --> 00:54:33,220 a flaw in our intuition about probability. 953 00:54:33,220 --> 00:54:34,832 But the good news is in each case it's 954 00:54:34,832 --> 00:54:36,040 easy to get the right answer. 955 00:54:36,040 --> 00:54:39,640 Just stick with the math and try not to think about it. 
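The accuracy comparison can be made concrete with the same four sample points. This is a hedged Python sketch, not part of the lecture: it reuses the stated numbers (0.09, 0.01, 0.27, 0.63), and the variable names are made up for illustration. It contrasts overall accuracy with what a positive or negative result actually tells you.

```python
from fractions import Fraction

# Sample point probabilities from the tree: (disease status, test result).
points = {
    ("disease", "positive"): Fraction(9, 100),
    ("disease", "negative"): Fraction(1, 100),
    ("healthy", "positive"): Fraction(27, 100),
    ("healthy", "negative"): Fraction(63, 100),
}

# The test is right when it is positive on the sick and negative on the healthy.
test_accuracy = points[("disease", "positive")] + points[("healthy", "negative")]

# "Always say negative" is right exactly when the person is healthy.
always_negative_accuracy = (
    points[("healthy", "positive")] + points[("healthy", "negative")]
)

p_positive = points[("disease", "positive")] + points[("healthy", "positive")]
p_negative = points[("disease", "negative")] + points[("healthy", "negative")]
p_disease_given_positive = points[("disease", "positive")] / p_positive
p_disease_given_negative = points[("disease", "negative")] / p_negative

if __name__ == "__main__":
    print("test accuracy:", float(test_accuracy))
    print("always-negative accuracy:", float(always_negative_accuracy))
    print("P(disease | positive):", p_disease_given_positive)
    print("P(disease | negative):", p_disease_given_negative)
```

The constant "negative" answer is right 90% of the time against 72% for the test, yet it never moves the probability of disease away from the 10% base rate, while the real test moves it to 1/4 after a positive and 1/64 after a negative. Accuracy alone is therefore a poor measure of what a test is worth.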
956 00:54:39,640 --> 00:54:44,090 Now the first example is a game involving 957 00:54:44,090 --> 00:54:48,240 dice that's called carnival dice that you can find in carnivals 958 00:54:48,240 --> 00:54:50,175 and you can also find in casinos. 959 00:54:53,150 --> 00:54:57,050 It's a pretty popular game, actually. 960 00:54:57,050 --> 00:54:59,775 So the way it works is as follows. 961 00:55:07,310 --> 00:55:21,360 The player picks a number from 1 to 6-- we'll call it N-- 962 00:55:21,360 --> 00:55:23,650 and then rolls three dice. 963 00:55:23,650 --> 00:55:26,006 And let's say they're fair and mutually independent. 964 00:55:34,970 --> 00:55:36,470 We haven't talked about independence. 965 00:55:36,470 --> 00:55:38,460 So they're fair dice. 966 00:55:38,460 --> 00:55:41,250 For now, normal dice-- nothing fishy. 967 00:55:41,250 --> 00:55:50,780 And the player wins if and only if the number he picked 968 00:55:50,780 --> 00:55:52,455 comes up on at least one of the dice. 969 00:55:58,100 --> 00:55:59,830 So you either win or you lose the game 970 00:55:59,830 --> 00:56:03,375 depending on if your lucky number came up at least once. 971 00:56:05,960 --> 00:56:08,590 Now you've got three dice, each of which has a 1 972 00:56:08,590 --> 00:56:10,740 in 6 chance of coming up a winner for you. 973 00:56:13,420 --> 00:56:17,730 So how many people think this is a fair game-- you've 974 00:56:17,730 --> 00:56:22,730 got a 50-50 chance of winning-- three dice, each 1/6 975 00:56:22,730 --> 00:56:24,290 chance of winning? 976 00:56:24,290 --> 00:56:27,540 Anybody think it's not a fair game? 977 00:56:27,540 --> 00:56:28,420 A bunch of you. 978 00:56:28,420 --> 00:56:31,840 How many people think it is a fair game-- 50-50? 979 00:56:31,840 --> 00:56:32,601 A few. 980 00:56:32,601 --> 00:56:33,100 All right. 981 00:56:33,100 --> 00:56:36,479 Well, let's figure it out. 
982 00:56:36,479 --> 00:56:38,270 And instead of doing the tree method, which 983 00:56:38,270 --> 00:56:39,728 we know we're supposed to do, we're 984 00:56:39,728 --> 00:56:46,284 just going to wing it, which always seems easier to do. 985 00:56:51,650 --> 00:56:54,180 If you're in the casino you want to just wing it 986 00:56:54,180 --> 00:56:57,970 instead of taking your napkin out and drawing a tree. 987 00:56:57,970 --> 00:57:07,740 So the claim, question mark, is the probability you win is 1/2. 988 00:57:07,740 --> 00:57:15,250 And the proof, question mark, is you let Ai 989 00:57:15,250 --> 00:57:25,220 be the event that the i-th die comes up N. And i is 1 to 3 990 00:57:25,220 --> 00:57:26,867 here. 991 00:57:26,867 --> 00:57:27,700 So then you say, OK. 992 00:57:27,700 --> 00:57:33,300 The probability I win is the probability of A1-- 993 00:57:33,300 --> 00:57:38,010 I could win that way-- or A2, or A3. 994 00:57:38,010 --> 00:57:41,810 All I need is one of the dice to come up my way. 995 00:57:41,810 --> 00:57:45,430 And that is the probability of A1 996 00:57:45,430 --> 00:57:52,210 plus the probability of A2 plus the probability of A3. 997 00:57:52,210 --> 00:57:54,555 And each die wins for me with probability 1/6. 998 00:57:59,345 --> 00:58:00,220 And that is then 1/2. 999 00:58:03,460 --> 00:58:09,706 So that's a proof that we win with probability of 1/2. 1000 00:58:09,706 --> 00:58:13,160 What do you think? 1001 00:58:13,160 --> 00:58:14,387 Any problems with that proof? 1002 00:58:14,387 --> 00:58:15,262 AUDIENCE: [INAUDIBLE] 1003 00:58:20,489 --> 00:58:22,030 PROFESSOR: Well that's a great point. 1004 00:58:22,030 --> 00:58:22,850 Yeah. 1005 00:58:22,850 --> 00:58:25,950 So if I extended this nice proof technique 1006 00:58:25,950 --> 00:58:29,110 I could have a probability of 7/6 of winning with seven dice. 1007 00:58:29,110 --> 00:58:30,250 Yeah? 
1008 00:58:30,250 --> 00:58:31,125 AUDIENCE: [INAUDIBLE] 1009 00:58:34,874 --> 00:58:35,540 PROFESSOR: Yeah. 1010 00:58:35,540 --> 00:58:36,990 You're very close. 1011 00:58:36,990 --> 00:58:41,062 I didn't technically assume that. 1012 00:58:41,062 --> 00:58:43,970 AUDIENCE: [INAUDIBLE] 1013 00:58:43,970 --> 00:58:45,450 PROFESSOR: They could double up. 1014 00:58:45,450 --> 00:58:46,580 Yeah. 1015 00:58:46,580 --> 00:58:50,040 There's no intersection in the events. 1016 00:58:50,040 --> 00:58:53,220 In fact, there is intersection because there's 1017 00:58:53,220 --> 00:58:56,740 a chance I rolled all six-- all Ns. 1018 00:58:56,740 --> 00:58:57,680 Say N is 6. 1019 00:58:57,680 --> 00:58:59,810 I could roll all sixes and then each of these 1020 00:58:59,810 --> 00:59:00,990 would be a winner. 1021 00:59:00,990 --> 00:59:03,604 But I don't get to count them separately. 1022 00:59:03,604 --> 00:59:05,020 Then I only win once in that case. 1023 00:59:05,020 --> 00:59:07,290 In other words, all of these could be turned on 1024 00:59:07,290 --> 00:59:08,060 at the same time. 1025 00:59:08,060 --> 00:59:09,268 There's an intersection here. 1026 00:59:09,268 --> 00:59:13,570 So this rule does not hold. 1027 00:59:13,570 --> 00:59:16,290 I need the Ai to be disjoint for this 1028 00:59:16,290 --> 00:59:20,380 to be true-- the events to be disjoint. 1029 00:59:20,380 --> 00:59:22,110 And they're not disjoint because there's 1030 00:59:22,110 --> 00:59:25,040 a sample point where two or more of the dice 1031 00:59:25,040 --> 00:59:28,470 could come up the same being a winner, which 1032 00:59:28,470 --> 00:59:33,560 means the same sample point, namely all dice are N, 1033 00:59:33,560 --> 00:59:34,960 comes up in each of these three. 1034 00:59:34,960 --> 00:59:36,530 So they're not disjoint. 
1035 00:59:36,530 --> 00:59:40,690 Now what's the principle you used two weeks ago when 1036 00:59:40,690 --> 00:59:46,560 you did cardinality of a set-- cardinality of a union of sets? 1037 00:59:46,560 --> 00:59:48,480 Inclusion-exclusion. 1038 00:59:48,480 --> 00:59:50,700 And the same thing needs to be done here. 1039 00:59:55,870 --> 00:59:57,364 So let's do that. 1040 00:59:57,364 --> 00:59:59,405 And then we'll figure out the actual probability. 1041 01:00:03,450 --> 01:00:08,130 So this is a fact based on the inclusion-exclusion principle. 1042 01:00:08,130 --> 01:00:13,510 The probability of A1, union A2, union A3, 1043 01:00:13,510 --> 01:00:17,680 is just what you think it would be from inclusion-exclusion. 1044 01:00:17,680 --> 01:00:19,610 It's the probability of A1 plus the probability 1045 01:00:19,610 --> 01:00:26,550 of A2 plus the probability of A3 minus 1046 01:00:26,550 --> 01:00:27,720 the pairwise intersections. 1047 01:00:33,700 --> 01:00:39,770 A1 intersect A3 minus probability of A2 intersect A3. 1048 01:00:42,310 --> 01:00:45,860 And is there anything else? 1049 01:00:45,860 --> 01:00:48,300 Plus, the probability of all of them matching. 1050 01:00:54,860 --> 01:00:55,360 OK. 1051 01:00:55,360 --> 01:00:59,070 So the proof is really the same proof 1052 01:00:59,070 --> 01:01:01,676 you use for inclusion-exclusion with sets. 1053 01:01:01,676 --> 01:01:03,800 The only difference is that in a probability space, 1054 01:01:03,800 --> 01:01:05,710 we have weights on the elements. 1055 01:01:05,710 --> 01:01:09,190 And the weight corresponds to the probability. 1056 01:01:09,190 --> 01:01:14,510 So in fact, if you were drawing the sample space, say here's A1 1057 01:01:14,510 --> 01:01:19,790 and here's A2, and here's A3. 1058 01:01:19,790 --> 01:01:24,860 Well, you need to add the probabilities here, here, 1059 01:01:24,860 --> 01:01:25,770 and here. 
1060 01:01:25,770 --> 01:01:30,180 Then you subtract off the double counting from here, from here, 1061 01:01:30,180 --> 01:01:30,790 and from here. 1062 01:01:30,790 --> 01:01:33,560 And then you add back again what you subtracted off 1063 01:01:33,560 --> 01:01:36,460 too much there. 1064 01:01:36,460 --> 01:01:39,430 Same proof, it's just you have weights on the elements 1065 01:01:39,430 --> 01:01:41,940 of probabilities. 1066 01:01:41,940 --> 01:01:42,440 All right. 1067 01:01:42,440 --> 01:01:45,280 So let's figure out the right probability. 1068 01:01:45,280 --> 01:01:50,550 That's 1/6, 1/6, 1/6. 1069 01:01:50,550 --> 01:01:53,700 What's the probability of the first two dice 1070 01:01:53,700 --> 01:01:54,980 matching-- both of them? 1071 01:02:00,360 --> 01:02:02,640 1/36. 1072 01:02:02,640 --> 01:02:05,110 We'll talk more about why that is next time. 1073 01:02:05,110 --> 01:02:09,600 But there's a 1/6 for A1, then given that, 1/6 for the second 1074 01:02:09,600 --> 01:02:10,260 die matching. 1075 01:02:10,260 --> 01:02:15,800 So it's 1/6 times 1/6. Minus the 1/36, 1076 01:02:15,800 --> 01:02:22,010 1/36. The chance that all three match is 1/216, or 1 over 6 cubed. 1077 01:02:22,010 --> 01:02:29,520 So when you add all that up you get the 0.421 and some more. 1078 01:02:29,520 --> 01:02:31,060 So the chance of winning this game 1079 01:02:31,060 --> 01:02:36,280 is 42% which makes it the worst game in the casino. 1080 01:02:36,280 --> 01:02:38,560 It is hard to find a worse game than this. 1081 01:02:38,560 --> 01:02:39,580 Roulette, much better. 1082 01:02:39,580 --> 01:02:42,960 We'll study Roulette in the last lecture-- much better game. 1083 01:02:42,960 --> 01:02:45,990 And even that's a terrible game to play. 1084 01:02:45,990 --> 01:02:47,700 So it looks like an easy game. 1085 01:02:47,700 --> 01:02:49,600 There's a quick proof that it's 50-50. 1086 01:02:49,600 --> 01:02:54,250 But it's horrible odds against the house. 
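[Editor's sketch, not part of the lecture] The carnival dice arithmetic above can be checked two ways: by plugging 1/6, 1/36, and 1/216 into inclusion-exclusion, and by counting winning rolls among all 216 outcomes. The helper name `win_probability` and its parameters are illustrative assumptions; the pairwise value 1/36 relies on the dice being independent, as the lecture stipulates.

```python
from fractions import Fraction
from itertools import product


def win_probability(n_dice=3, sides=6, pick=1):
    """Exact chance that `pick` shows on at least one of `n_dice` fair dice."""
    rolls = list(product(range(1, sides + 1), repeat=n_dice))
    wins = sum(1 for roll in rolls if pick in roll)
    return Fraction(wins, len(rolls))


p = Fraction(1, 6)                              # one die shows the picked number
naive_sum = 3 * p                               # flawed proof: treats events as disjoint
inclusion_exclusion = 3 * p - 3 * p**2 + p**3   # singles - pairs + triple
brute_force = win_probability()                 # counts wins among all 216 rolls

print(naive_sum, inclusion_exclusion, brute_force, float(brute_force))
```

Both the formula and the count come to 91/216, a little over 0.421, while the naive sum gives 1/2.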
1087 01:02:54,250 --> 01:02:57,660 Now, this is a nice example because it 1088 01:02:57,660 --> 01:03:01,490 shows how a rule you had for computing the cardinality 1089 01:03:01,490 --> 01:03:05,100 of a set gives you the probability. 1090 01:03:05,100 --> 01:03:05,620 All right. 1091 01:03:05,620 --> 01:03:10,210 In fact, all the set laws you learned a couple weeks ago 1092 01:03:10,210 --> 01:03:12,969 work for probability spaces the same way. 1093 01:03:12,969 --> 01:03:14,760 And there were several of those in the homework 1094 01:03:14,760 --> 01:03:18,400 that you just had, the last problem set. 1095 01:03:18,400 --> 01:03:19,960 Any questions about that? 1096 01:03:25,660 --> 01:03:27,090 OK. 1097 01:03:27,090 --> 01:03:31,800 Now in addition, all those set laws you did also 1098 01:03:31,800 --> 01:03:35,070 work for conditional probabilities. 1099 01:03:35,070 --> 01:03:38,410 For example, this is true. 1100 01:03:38,410 --> 01:03:44,690 The probability of A union B given C-- whoops-- given C, 1101 01:03:44,690 --> 01:03:50,860 is the probability of A given C plus the probability of B 1102 01:03:50,860 --> 01:03:57,030 given C minus the intersection, A intersect B given 1103 01:03:57,030 --> 01:04:02,180 C. In other words, take any probability rule you have 1104 01:04:02,180 --> 01:04:06,580 and condition everything on an event, C, and it still works. 1105 01:04:09,310 --> 01:04:11,340 And the proof is not hard. 1106 01:04:11,340 --> 01:04:13,580 You can go through each individual law 1107 01:04:13,580 --> 01:04:17,550 but it all comes out to be fine. 1108 01:04:17,550 --> 01:04:19,680 All right. 1109 01:04:19,680 --> 01:04:21,770 You have to be a little careful though because you've 1110 01:04:21,770 --> 01:04:25,030 got to remember which side you're doing it on, 1111 01:04:25,030 --> 01:04:27,580 what you're putting on either side of the bar here. 1112 01:04:27,580 --> 01:04:29,830 For example, what about this one? 
1113 01:04:29,830 --> 01:04:30,540 Is this true? 1114 01:04:33,080 --> 01:04:34,900 Claim. 1115 01:04:34,900 --> 01:04:37,400 Let's take-- say C and D are disjoint. 1116 01:04:42,900 --> 01:04:45,690 Is this true? 1117 01:04:45,690 --> 01:04:49,870 Then the probability of A conditioned 1118 01:04:49,870 --> 01:04:56,450 on C union D. So given that either C or D is true, 1119 01:04:56,450 --> 01:05:00,810 does that equal the probability of A given C 1120 01:05:00,810 --> 01:05:04,420 plus probability of A given D? 1121 01:05:07,140 --> 01:05:11,150 We know that if I swapped all these, it's true. 1122 01:05:11,150 --> 01:05:14,362 The probability of C union D given A, when C and D are disjoint, 1123 01:05:14,362 --> 01:05:16,820 is the probability of C given A plus the probability of D 1124 01:05:16,820 --> 01:05:19,030 given A. That I just claimed. 1125 01:05:19,030 --> 01:05:20,070 And what about this way? 1126 01:05:20,070 --> 01:05:21,685 Can I swap things around? 1127 01:05:24,580 --> 01:05:26,306 Yeah? 1128 01:05:26,306 --> 01:05:32,710 AUDIENCE: [INAUDIBLE] would C union D be 0? 1129 01:05:32,710 --> 01:05:36,350 PROFESSOR: If C and D are disjoint, 1130 01:05:36,350 --> 01:05:43,050 C union D would just be C union D. But you've got a good point. 1131 01:05:43,050 --> 01:05:44,540 What if C and D are disjoint? 1132 01:05:44,540 --> 01:05:45,740 That's a good example. 1133 01:05:45,740 --> 01:05:46,406 Let's draw that. 1134 01:05:52,726 --> 01:05:53,726 Let's look at that case. 1135 01:05:57,640 --> 01:06:00,540 So we've got a sample space here. 1136 01:06:00,540 --> 01:06:05,560 And you've got C here and D here. 1137 01:06:05,560 --> 01:06:09,745 And just for fun, let's make A be here-- include all of them. 1138 01:06:13,140 --> 01:06:18,680 What's the probability-- is this going to do what I want? 1139 01:06:18,680 --> 01:06:19,230 Yeah. 1140 01:06:19,230 --> 01:06:21,504 What's the probability of A given C? 1141 01:06:25,440 --> 01:06:26,330 1. 
1142 01:06:26,330 --> 01:06:30,300 If I'm in C I'm in A. A is everything here. 1143 01:06:30,300 --> 01:06:32,970 So the probability of A given C is one. 1144 01:06:32,970 --> 01:06:36,255 What's the probability of A given D? 1145 01:06:36,255 --> 01:06:39,220 1. 1146 01:06:39,220 --> 01:06:39,720 All right. 1147 01:06:39,720 --> 01:06:41,277 Well, this is a problem because I 1148 01:06:41,277 --> 01:06:43,860 can't have the probability of-- what's the probability of A given 1149 01:06:43,860 --> 01:06:46,890 C union D? 1150 01:06:46,890 --> 01:06:47,790 Well, it can't be 2. 1151 01:06:47,790 --> 01:06:48,950 Right? 1152 01:06:48,950 --> 01:06:50,550 It's 1. 1153 01:06:50,550 --> 01:06:51,550 They are not equal. 1154 01:06:54,120 --> 01:06:58,660 So you cannot do those set rules on the right side 1155 01:06:58,660 --> 01:07:01,020 of the conditioning bar. 1156 01:07:01,020 --> 01:07:04,080 You can do them on the left, not on the right. 1157 01:07:04,080 --> 01:07:04,580 All right. 1158 01:07:04,580 --> 01:07:05,413 So this is not true. 1159 01:07:13,320 --> 01:07:14,391 Now nobody would do this. 1160 01:07:14,391 --> 01:07:14,890 Right? 1161 01:07:14,890 --> 01:07:18,140 I mean, the probability of-- not that it's-- see this example? 1162 01:07:18,140 --> 01:07:21,520 You just would never make this mistake again seeing 1163 01:07:21,520 --> 01:07:23,450 that example. 1164 01:07:23,450 --> 01:07:25,420 Everybody understand the example, 1165 01:07:25,420 --> 01:07:27,170 how it's clearly not always the case 1166 01:07:27,170 --> 01:07:28,770 that probability of A given C union 1167 01:07:28,770 --> 01:07:33,015 D is the probability of A given C plus probability of A given D? 1168 01:07:33,015 --> 01:07:35,390 Because now I'm going to show you an example where you're 1169 01:07:35,390 --> 01:07:38,250 going to swear it's true. 1170 01:07:38,250 --> 01:07:39,372 All right? 1171 01:07:39,372 --> 01:07:40,705 And this is a real life example. 
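[Editor's sketch, not part of the lecture] The counterexample above can be reproduced on a tiny sample space with equally likely outcomes. The specific sets are hypothetical stand-ins for the board drawing: C and D are disjoint and A contains both. The helper `cond_prob` is an illustrative name that applies the definition of conditional probability from the start of the lecture.

```python
from fractions import Fraction


def cond_prob(a, b):
    """P(a | b) with equally likely outcomes: |a and b| / |b|."""
    return Fraction(len(a & b), len(b))


# Hypothetical outcomes 1..4, all equally likely.
C, D = {1}, {2}        # disjoint events
A = {1, 2, 3}          # A contains both C and D

# Splitting a union on the RIGHT of the bar fails: 1 + 1 is not 1.
right_side_sum = cond_prob(A, C) + cond_prob(A, D)
right_side_union = cond_prob(A, C | D)

# Splitting a disjoint union on the LEFT of the bar works: 1/3 + 1/3 = 2/3.
left_side_sum = cond_prob(C, A) + cond_prob(D, A)
left_side_union = cond_prob(C | D, A)

print(right_side_sum, right_side_union, left_side_sum, left_side_union)
```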
1172 01:07:43,660 --> 01:07:47,270 Many years ago now there was a sex discrimination suit 1173 01:07:47,270 --> 01:07:49,320 at Berkeley. 1174 01:07:49,320 --> 01:07:52,170 There was a female professor in the math department. 1175 01:07:52,170 --> 01:07:54,500 And she was denied tenure. 1176 01:07:54,500 --> 01:07:57,180 And she filed a lawsuit against Berkeley 1177 01:07:57,180 --> 01:07:59,680 alleging sex discrimination. 1178 01:07:59,680 --> 01:08:02,390 Said she wasn't tenured because she's a woman. 1179 01:08:02,390 --> 01:08:04,661 Now, unfortunately sex discrimination 1180 01:08:04,661 --> 01:08:06,035 is a problem in math departments. 1181 01:08:06,035 --> 01:08:09,900 It's historically been a difficult area. 1182 01:08:09,900 --> 01:08:11,340 But it's always hard to prove. 1183 01:08:11,340 --> 01:08:12,840 It's a nebulous kind of thing. 1184 01:08:12,840 --> 01:08:14,590 They don't say, hey, you can't have tenure 1185 01:08:14,590 --> 01:08:15,810 because you're a woman. 1186 01:08:15,810 --> 01:08:19,550 They'd get sued and get killed for that. 1187 01:08:19,550 --> 01:08:24,020 So she had to get some math to back her up. 1188 01:08:24,020 --> 01:08:27,569 So what she did is she looked into Berkeley's practices 1189 01:08:27,569 --> 01:08:31,000 and she found that in all 22 departments, 1190 01:08:31,000 --> 01:08:34,040 every single department, the percentage 1191 01:08:34,040 --> 01:08:38,430 of male PhD applicants that were accepted 1192 01:08:38,430 --> 01:08:43,950 was higher than the percentage of female PhD applicants 1193 01:08:43,950 --> 01:08:46,069 that were accepted. 1194 01:08:46,069 --> 01:08:49,069 Now you could understand some of the departments accepting more 1195 01:08:49,069 --> 01:08:50,809 male PhDs than female PhDs. 1196 01:08:50,809 --> 01:08:53,430 But all 22? 1197 01:08:53,430 --> 01:08:54,740 What are the odds of that? 
1198 01:08:54,740 --> 01:08:56,450 I mean, so the immediate conclusion 1199 01:08:56,450 --> 01:09:00,220 is, well, clearly there's sex discrimination going on 1200 01:09:00,220 --> 01:09:01,500 at Berkeley. 1201 01:09:01,500 --> 01:09:03,399 OK? 1202 01:09:03,399 --> 01:09:06,260 Well Berkeley took a look at that and said, nothing good. 1203 01:09:06,260 --> 01:09:08,130 That doesn't look good for them. 1204 01:09:08,130 --> 01:09:13,140 But they did their own study of PhD applicants. 1205 01:09:13,140 --> 01:09:16,930 And they said that if the university as a whole-- 1206 01:09:16,930 --> 01:09:21,040 look at the university as a whole, actually, the women, 1207 01:09:21,040 --> 01:09:25,029 the females have a higher acceptance rate for the PhD 1208 01:09:25,029 --> 01:09:27,609 program than the men. 1209 01:09:27,609 --> 01:09:28,109 So look. 1210 01:09:28,109 --> 01:09:29,840 Berkeley said, we're accepting more women 1211 01:09:29,840 --> 01:09:32,540 than men percentage-wise. 1212 01:09:32,540 --> 01:09:35,620 So how could we be discriminating against women? 1213 01:09:35,620 --> 01:09:38,670 And this is the same argument the female faculty 1214 01:09:38,670 --> 01:09:41,010 member's making, but they're saying it for the university 1215 01:09:41,010 --> 01:09:44,160 as a whole, when you add up all 22 departments. 1216 01:09:44,160 --> 01:09:45,410 Well, that sounds pretty good. 1217 01:09:45,410 --> 01:09:47,790 How could they be discriminating? 1218 01:09:47,790 --> 01:09:48,290 OK. 1219 01:09:48,290 --> 01:09:51,790 So the question for you guys, is it possible that 1220 01:09:51,790 --> 01:09:54,790 both sides were telling the truth, 1221 01:09:54,790 --> 01:09:57,580 that in every single department the women have a lower 1222 01:09:57,580 --> 01:10:02,010 acceptance rate than men, but on the university as a whole 1223 01:10:02,010 --> 01:10:04,850 the women are higher percentage? 
1224 01:10:04,850 --> 01:10:12,450 It sounds like it's-- and just to avoid any confusion here, 1225 01:10:12,450 --> 01:10:17,330 people only apply to one department and they're only one 1226 01:10:17,330 --> 01:10:18,490 sex. 1227 01:10:18,490 --> 01:10:21,916 So you can't-- Carol didn't apply. 1228 01:10:21,916 --> 01:10:25,620 [LAUGHTER] 1229 01:10:25,620 --> 01:10:28,210 How many people think that one of the sides, actually, when 1230 01:10:28,210 --> 01:10:30,040 they look at the studies was wrong, 1231 01:10:30,040 --> 01:10:33,070 that they're contradictory? 1232 01:10:33,070 --> 01:10:34,270 Nobody? 1233 01:10:34,270 --> 01:10:35,852 You've been in 6.042 too long. 1234 01:10:35,852 --> 01:10:37,310 How many people think it's possible 1235 01:10:37,310 --> 01:10:40,310 that both sides were right? 1236 01:10:40,310 --> 01:10:40,941 Yeah. 1237 01:10:40,941 --> 01:10:41,440 All right. 1238 01:10:41,440 --> 01:10:44,410 So let's see how this works. 1239 01:10:50,709 --> 01:10:52,500 And to make it simple I'm going to get down 1240 01:10:52,500 --> 01:10:56,500 to just two departments rather than try to do data for all 22. 1241 01:10:56,500 --> 01:10:59,706 And I'm going to do not the actual data but something 1242 01:10:59,706 --> 01:11:01,122 that represents what's going on. 1243 01:11:05,931 --> 01:11:06,430 OK. 1244 01:11:06,430 --> 01:11:08,388 So we're going to look at the following events. 1245 01:11:12,210 --> 01:11:19,340 A is the event that the applicant is admitted. 1246 01:11:25,930 --> 01:11:30,960 FCS is the event that the applicant 1247 01:11:30,960 --> 01:11:37,645 is female and applying to CS. 1248 01:11:40,420 --> 01:11:43,770 FEE is the event that the applicant 1249 01:11:43,770 --> 01:11:46,815 is female and applying to EE. 1250 01:11:49,540 --> 01:11:59,140 MCS is the event the applicant is a male and in CS. 
1251 01:11:59,140 --> 01:12:06,150 And then finally we have MEE is the event the applicant is 1252 01:12:06,150 --> 01:12:08,194 male and in EE. 1253 01:12:08,194 --> 01:12:10,110 So we're just going to look at two departments 1254 01:12:10,110 --> 01:12:13,950 here and try to figure out if it can happen 1255 01:12:13,950 --> 01:12:16,760 that in both departments the women are worse off 1256 01:12:16,760 --> 01:12:19,148 but if you take the union they're better off. 1257 01:12:25,840 --> 01:12:30,070 So the female professor's argument effectively 1258 01:12:30,070 --> 01:12:34,030 is, the probability of being admitted given that you're 1259 01:12:34,030 --> 01:12:39,730 a female in CS is less than the probability of being admitted 1260 01:12:39,730 --> 01:12:43,080 given that you're a male in CS. 1261 01:12:43,080 --> 01:12:46,820 And same thing in EE. 1262 01:12:46,820 --> 01:12:50,210 Probability of being admitted in EE if you're a female 1263 01:12:50,210 --> 01:12:51,465 is less than if you're a male. 1264 01:12:56,290 --> 01:12:57,680 OK? 1265 01:12:57,680 --> 01:13:04,220 Now Berkeley is saying it's sort of the reverse. 1266 01:13:04,220 --> 01:13:07,740 The probability that you're admitted 1267 01:13:07,740 --> 01:13:13,260 given that you're a female in either department 1268 01:13:13,260 --> 01:13:17,530 is bigger than the probability of being admitted if you're 1269 01:13:17,530 --> 01:13:19,625 a male in either department. 1270 01:13:24,720 --> 01:13:25,220 OK. 1271 01:13:25,220 --> 01:13:28,180 So we've now expressed their arguments 1272 01:13:28,180 --> 01:13:33,310 as conditional probabilities. Any questions? 1273 01:13:33,310 --> 01:13:37,045 Can you sort of see why this seems contradictory? 1274 01:13:39,610 --> 01:13:41,800 Not plus, union. 1275 01:13:41,800 --> 01:13:45,990 Because this is sort of like-- these are disjoint. 1276 01:13:45,990 --> 01:13:50,220 This is the sum of those. 
1277 01:13:50,220 --> 01:13:54,680 And this is sort of the sum of those. 1278 01:13:54,680 --> 01:13:58,900 And yet the inequality changed. 1279 01:13:58,900 --> 01:13:59,400 All right. 1280 01:13:59,400 --> 01:14:03,150 In fact, this is the logic that we've just debunked over 1281 01:14:03,150 --> 01:14:06,306 there-- exactly that claim. 1282 01:14:06,306 --> 01:14:09,610 In fact, these are not equal to the sum. 1283 01:14:13,230 --> 01:14:14,270 So let's do an example. 1284 01:14:18,370 --> 01:14:23,107 Say that-- let's do it over here. 1285 01:14:23,107 --> 01:14:24,690 I'll put the real values in over here. 1286 01:14:24,690 --> 01:14:28,170 Say that for women in computer science, 0 out of 1 1287 01:14:28,170 --> 01:14:32,040 were admitted compared to the men, where 50 out of 100 1288 01:14:32,040 --> 01:14:34,800 were admitted. 1289 01:14:34,800 --> 01:14:37,680 And then in EE, 70 out of 100 women 1290 01:14:37,680 --> 01:14:43,850 were admitted compared to the men, which had 1 out of 1. 1291 01:14:43,850 --> 01:14:44,350 All right? 1292 01:14:44,350 --> 01:14:47,450 So as ratios, 70% is less than 100%. 1293 01:14:47,450 --> 01:14:49,620 0% is less than 50. 1294 01:14:49,620 --> 01:14:51,620 Now if I look at the two departments as a whole, 1295 01:14:51,620 --> 01:15:00,830 I get 70 over 101 is in fact bigger than 51 over 101. 1296 01:15:00,830 --> 01:15:01,330 All right? 1297 01:15:01,330 --> 01:15:02,790 And so as a whole women are a lot more 1298 01:15:02,790 --> 01:15:04,590 likely to be admitted even though in each department 1299 01:15:04,590 --> 01:15:06,048 they're less likely to be admitted. 1300 01:15:08,020 --> 01:15:10,800 OK? 1301 01:15:10,800 --> 01:15:13,700 So what went wrong with the intuition, which 1302 01:15:13,700 --> 01:15:16,170 you didn't fall victim to, but people often 1303 01:15:16,170 --> 01:15:21,280 do, that it shouldn't have been possible given that? 
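[Editor's sketch, not part of the lecture] The two-department numbers above (0 of 1 and 50 of 100 in CS, 70 of 100 and 1 of 1 in EE) can be tabulated to confirm that both sides' claims hold at once. The dictionary layout and the helper name `admit_rate` are illustrative assumptions.

```python
from fractions import Fraction

# (admitted, applicants) from the two-department example in the lecture.
data = {
    ("F", "CS"): (0, 1),
    ("M", "CS"): (50, 100),
    ("F", "EE"): (70, 100),
    ("M", "EE"): (1, 1),
}


def admit_rate(sex, depts):
    """P(admitted | applicant has this sex and applied to one of `depts`)."""
    admitted = sum(data[(sex, d)][0] for d in depts)
    applicants = sum(data[(sex, d)][1] for d in depts)
    return Fraction(admitted, applicants)


# Within each department, women have the lower rate...
for dept in ("CS", "EE"):
    print(dept, admit_rate("F", [dept]), "<", admit_rate("M", [dept]))

# ...but over the union of the departments the inequality flips.
print("both", admit_rate("F", ["CS", "EE"]), ">", admit_rate("M", ["CS", "EE"]))
```

The pooled rate is a weighted average of the department rates, with weights given by how many applicants of that sex applied to each department, which is why it need not preserve the per-department ordering.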
1304 01:15:21,280 --> 01:15:22,710 What's going on here that makes it 1305 01:15:22,710 --> 01:15:25,659 so that it's not a less than when 1306 01:15:25,659 --> 01:15:27,367 you look at the union of the departments? 1307 01:15:33,440 --> 01:15:34,066 Yeah? 1308 01:15:34,066 --> 01:15:36,190 AUDIENCE: [INAUDIBLE] they're weighted differently? 1309 01:15:36,190 --> 01:15:36,856 PROFESSOR: Yeah. 1310 01:15:36,856 --> 01:15:39,497 They're weighted very differently. 1311 01:15:39,497 --> 01:15:41,530 You got huge weights here. 1312 01:15:41,530 --> 01:15:42,030 Right? 1313 01:15:42,030 --> 01:15:45,740 So if I look at the average of the percentages here, 1314 01:15:45,740 --> 01:15:50,960 well it's 35% for the women versus 75% for the men. 1315 01:15:50,960 --> 01:15:53,860 So the average of the percentage is just what you'd think. 1316 01:15:53,860 --> 01:15:56,280 35 is less than 75. 1317 01:15:56,280 --> 01:15:59,910 But I've got huge weightings on these guys, which changes 1318 01:15:59,910 --> 01:16:03,180 the numbers quite dramatically. 1319 01:16:03,180 --> 01:16:04,776 So it all depends how you count it. 1320 01:16:08,432 --> 01:16:10,390 Actually, who do you think had a better-- Yeah. 1321 01:16:10,390 --> 01:16:11,226 Go ahead. 1322 01:16:11,226 --> 01:16:12,120 AUDIENCE: [INAUDIBLE] 1323 01:16:12,120 --> 01:16:13,411 PROFESSOR: Who won the lawsuit? 1324 01:16:13,411 --> 01:16:15,910 Actually, the woman won the lawsuit. 1325 01:16:15,910 --> 01:16:18,179 And which argument would you buy now? 1326 01:16:18,179 --> 01:16:19,220 You've got two arguments. 1327 01:16:19,220 --> 01:16:23,120 Which one would you believe if either? 1328 01:16:23,120 --> 01:16:24,250 Which one? 1329 01:16:24,250 --> 01:16:28,820 I mean, now if I look at exactly this data 1330 01:16:28,820 --> 01:16:32,160 I might side-- I might side with Berkeley 1331 01:16:32,160 --> 01:16:34,400 looking at these numbers. 
1332 01:16:34,400 --> 01:16:38,190 Then again, when you think about all 22 departments and the fact 1333 01:16:38,190 --> 01:16:40,940 they weren't this lopsided, not so good. 1334 01:16:40,940 --> 01:16:43,407 So in the end Berkeley lost. 1335 01:16:43,407 --> 01:16:45,240 We're going to see another example in a minute 1336 01:16:45,240 --> 01:16:47,406 where it's even more clear which side to believe in. 1337 01:16:47,406 --> 01:16:49,330 But it really depends on the numbers 1338 01:16:49,330 --> 01:16:51,460 as to which one you might, if you had to vote, 1339 01:16:51,460 --> 01:16:54,070 which way you'd vote. 1340 01:16:54,070 --> 01:16:56,390 Here's another example. 1341 01:16:56,390 --> 01:17:00,820 This is from a newspaper article on which airlines are best 1342 01:17:00,820 --> 01:17:05,764 to fly because they have the best on-time rates. 1343 01:17:05,764 --> 01:17:10,300 And in this case they were comparing American Airlines 1344 01:17:10,300 --> 01:17:14,850 and America West, looking at on-time rates. 1345 01:17:14,850 --> 01:17:18,550 And here's the data they showed for the two airlines. 1346 01:17:18,550 --> 01:17:20,520 Here's American Airlines. 1347 01:17:20,520 --> 01:17:23,510 Here's America West. 1348 01:17:23,510 --> 01:17:31,070 And they took five cities, LA, Phoenix, San Diego, 1349 01:17:31,070 --> 01:17:33,105 San Francisco, and Seattle. 1350 01:17:36,510 --> 01:17:41,350 And then you looked at the number on time, 1351 01:17:41,350 --> 01:17:47,349 the number of flights, and then the rate, percentage on time. 1352 01:17:47,349 --> 01:17:48,390 And then same thing here. 1353 01:17:48,390 --> 01:17:57,300 Number on time, number of flights, and the rate. 1354 01:17:57,300 --> 01:17:59,800 So I'm just going to give you the numbers here. 
1355 01:17:59,800 --> 01:18:14,100 So they had 500 out of 560 for a rate of 89%, 220 over 230 1356 01:18:14,100 --> 01:18:31,570 for 95, 210 over 230 for 92%, 500 over 600 for 83%, 1357 01:18:31,570 --> 01:18:32,430 and then Seattle. 1358 01:18:32,430 --> 01:18:33,580 They had a lot of flights. 1359 01:18:33,580 --> 01:18:40,970 That's where they're-- we have a hub of 2,200 for 86%. 1360 01:18:40,970 --> 01:18:48,520 And if you added them all up, they got 3,300 out of 3,820 1361 01:18:48,520 --> 01:18:54,260 for 87% on time. 1362 01:18:54,260 --> 01:18:56,260 Now the data for America West looks 1363 01:18:56,260 --> 01:18:58,150 something like the following. 1364 01:18:58,150 --> 01:19:02,716 In LA it's 700 out of 800 for 87%. 1365 01:19:06,170 --> 01:19:07,470 They're based in Phoenix. 1366 01:19:07,470 --> 01:19:08,860 They got a zillion flights there. 1367 01:19:08,860 --> 01:19:16,540 4,900 out of 5,300 for 92%. 1368 01:19:16,540 --> 01:19:31,860 And 400 over 450 for 89%, 320 over 450, 71%, 200 over 260 1369 01:19:31,860 --> 01:19:34,290 for 77%. 1370 01:19:34,290 --> 01:19:35,730 And then you add all them up. 1371 01:19:35,730 --> 01:19:45,110 And you've got 6,520 over 7,260 for 90%. 1372 01:19:45,110 --> 01:19:48,530 So the newspaper concluded and literally said 1373 01:19:48,530 --> 01:19:50,510 that America West is the better airline 1374 01:19:50,510 --> 01:19:53,410 to fly because their on-time rate is much better. 1375 01:19:53,410 --> 01:19:55,825 It's 90% versus 87%. 1376 01:19:58,880 --> 01:20:00,420 What do you think? 1377 01:20:00,420 --> 01:20:03,960 Which airline would you fly looking at that data? 1378 01:20:03,960 --> 01:20:04,835 AUDIENCE: [INAUDIBLE] 1379 01:20:10,550 --> 01:20:13,890 PROFESSOR: I know which one I'd fly. 1380 01:20:13,890 --> 01:20:16,790 It looks like America West is better. 1381 01:20:16,790 --> 01:20:20,680 Every single city, American Airlines is better. 1382 01:20:23,410 --> 01:20:25,380 92 versus 89. 
1383 01:20:25,380 --> 01:20:26,890 Everywhere it's better by a bunch. 1384 01:20:26,890 --> 01:20:29,010 83 versus 71. 1385 01:20:29,010 --> 01:20:31,330 86 versus 77. 1386 01:20:31,330 --> 01:20:36,510 Every single city, American Airlines is better. 1387 01:20:36,510 --> 01:20:40,422 Yet, America West is better overall. 1388 01:20:40,422 --> 01:20:41,880 And that's what the newspaper said. 1389 01:20:41,880 --> 01:20:43,196 They went on this. 1390 01:20:43,196 --> 01:20:44,987 But of course, no matter where you're going 1391 01:20:44,987 --> 01:20:46,695 you're better off with American Airlines. 1392 01:20:48,961 --> 01:20:49,460 All right? 1393 01:20:49,460 --> 01:20:53,880 Now what happened here? 1394 01:20:53,880 --> 01:20:55,270 The weighting. 1395 01:20:55,270 --> 01:20:59,460 In fact, America West flies out of Phoenix 1396 01:20:59,460 --> 01:21:02,520 where the weather's great. 1397 01:21:02,520 --> 01:21:05,740 So you get a higher on-time rate when in a good-weather city. 1398 01:21:05,740 --> 01:21:07,690 And they got most of their flights there. 1399 01:21:07,690 --> 01:21:09,940 American Airlines got a lot of flights in Seattle 1400 01:21:09,940 --> 01:21:14,390 where the weather sucks and you're always delayed. 1401 01:21:14,390 --> 01:21:14,890 All right? 1402 01:21:14,890 --> 01:21:16,530 And so they look worse on average 1403 01:21:16,530 --> 01:21:19,065 because so many of their flights are in a bad city 1404 01:21:19,065 --> 01:21:22,750 and so many of America West are in a good city. 1405 01:21:22,750 --> 01:21:23,250 All right? 1406 01:21:23,250 --> 01:21:24,950 So it makes America West look better 1407 01:21:24,950 --> 01:21:28,420 when in fact, in this case, it's absolutely clear who's better. 1408 01:21:28,420 --> 01:21:31,422 American Airlines is better, every single city. 1409 01:21:31,422 --> 01:21:32,030 All right. 
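[Editor's sketch, not part of the lecture] The airline table can be checked the same way as the admissions example. One number is an assumption: the American Airlines on-time count for Seattle is not audible in the transcript, so the code infers it from the stated totals (3,300 on time out of 3,820 flights) and the 2,200 Seattle flights. Every other figure is as read out in the lecture; the names `american`, `america_west`, `rate`, and `overall` are illustrative.

```python
from fractions import Fraction

# (on_time, flights) per city, as read out in the lecture.
america_west = {
    "LA": (700, 800),
    "Phoenix": (4900, 5300),
    "San Diego": (400, 450),
    "San Francisco": (320, 450),
    "Seattle": (200, 260),
}
american = {
    "LA": (500, 560),
    "Phoenix": (220, 230),
    "San Diego": (210, 230),
    "San Francisco": (500, 600),
}
# ASSUMPTION: Seattle's on-time count is missing from the transcript; it is
# inferred here from the stated total of 3,300 on-time flights.
seattle_on_time = 3300 - sum(on_time for on_time, _ in american.values())
american["Seattle"] = (seattle_on_time, 2200)


def rate(pair):
    """On-time fraction for one (on_time, flights) pair."""
    on_time, flights = pair
    return Fraction(on_time, flights)


def overall(table):
    """Pooled on-time fraction across every city in the table."""
    return Fraction(sum(on for on, _ in table.values()),
                    sum(n for _, n in table.values()))


for city in america_west:
    print(city, float(rate(american[city])), float(rate(america_west[city])))
print("overall", float(overall(american)), float(overall(america_west)))
```

Under that assumption, American Airlines has the higher rate in each of the five cities while America West has the higher pooled rate, because most America West flights are in Phoenix and most American Airlines flights are in Seattle.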
1410 01:21:32,030 --> 01:21:34,860 That's why Mark Twain said, "There's 1411 01:21:34,860 --> 01:21:39,700 three kinds of lies-- lies, damned lies, and statistics." 1412 01:21:39,700 --> 01:21:42,410 We'll see more examples next time.