The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, good morning. So today, we're going to have a fairly packed lecture. We are going to conclude with chapter two, discrete random variables. And we will be talking mostly about multiple random variables. And this is also the last lecture as far as quiz one is concerned. So it's going to cover the material until today, and of course the next recitation and tutorial as well.

OK, so we're going to review quickly what we introduced at the end of last lecture, where we talked about the joint PMF of two random variables. We're going to talk about the case of more than two random variables as well. We're going to talk about the familiar concepts of conditioning and independence, but applied to random variables instead of events. We're going to look at expectations once more, talk about a few properties that they have, and then solve a couple of problems and calculate a few things in somewhat clever ways.

So the first point I want to make is that, to a large extent, whatever is happening in our chapter on discrete random variables is just an exercise in notation. There are concepts that you are already familiar with -- probabilities, probabilities of two things happening, conditional probabilities. And all that we're doing, to some extent, is rewriting those familiar concepts in new notation. So for example, this is the joint PMF of two random variables, p_{X,Y}(x,y). It gives us, for any pair of possible values of those random variables, the probability that that pair occurs simultaneously. So it's the probability that simultaneously X takes that value, and Y takes that other value.
And similarly, we have the notion of the conditional PMF, p_{X|Y}(x|y), which is just a list of the various conditional probabilities of interest: the conditional probability that one random variable takes this value, given that the other random variable takes that value.

Now, a remark about conditional probabilities. Conditional probabilities generally are like ordinary probabilities. You condition on something particular. So here we condition on a particular y. So think of little y as a fixed quantity, and then look at this as a function of x. So given that y, which we condition on -- given our new universe -- we're considering the various possibilities for x and the probabilities that they have. Now, the probabilities over all x's, of course, need to add to 1. So we should have a relation of this kind: the sum over all x of p_{X|Y}(x|y) is equal to 1. So they're just like ordinary probabilities over the different x's, in a universe where we are told the value of the random variable Y.

Now, how are these related? So we call these the marginal, these the joint, these the conditional. And there are some relations between these. For example, to find the marginal from the joint, it's pretty straightforward. The probability that X takes a particular value is the sum of the probabilities of all of the different ways that this particular value may occur. What are the different ways? Well, it may occur together with a certain y, or together with some other y, or together with some other y. So you look at all the possible y's that can go together with this x, and add the probabilities of all of those pairs for which we get this particular value of x. That is, p_X(x) is the sum over y of p_{X,Y}(x,y).

And then there's a relation that connects these two probabilities with the conditional probability. And it's this relation: p_{X,Y}(x,y) = p_Y(y) p_{X|Y}(x|y). It's nothing new. It's just new notation for writing what we already know: that the probability of two things happening is the probability that the first thing happens, and then, given that the first thing happens, the probability that the second one happens.

So how do we go from one to the other? Think of A as being the event that X takes the value little x, and B as being the event that Y takes the value little y. So the joint probability is the probability that these two things happen simultaneously. It's the probability that X takes this value times the conditional probability that Y takes this value, given that X took that first value. So it's the familiar multiplication rule, but just transcribed in our new notation. So nothing new so far.
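To make these relations concrete, here is a small Python sketch. It is not from the lecture, and the joint PMF values below are made up for illustration; it computes a marginal and a conditional PMF from a joint PMF and checks the multiplication rule.

```python
# A toy joint PMF p_{X,Y}(x,y), stored as a dictionary; the values are
# illustrative and just need to be nonnegative and sum to 1.
joint = {(1, 1): 0.1, (1, 2): 0.3, (2, 1): 0.2, (2, 2): 0.4}

def marginal_x(joint):
    # p_X(x) = sum over y of p_{X,Y}(x,y)
    pX = {}
    for (x, y), p in joint.items():
        pX[x] = pX.get(x, 0.0) + p
    return pX

def conditional_x_given_y(joint, y):
    # p_{X|Y}(x|y) = p_{X,Y}(x,y) / p_Y(y), defined only when p_Y(y) > 0
    pY = sum(p for (x, yy), p in joint.items() if yy == y)
    return {x: p / pY for (x, yy), p in joint.items() if yy == y}

pX = marginal_x(joint)
cond = conditional_x_given_y(joint, y=2)
print(pX)                  # {1: 0.4, 2: 0.6}
print(sum(cond.values()))  # conditional probabilities add to 1

# Multiplication rule: p_{X,Y}(x,y) = p_Y(y) * p_{X|Y}(x|y)
pY2 = joint[(1, 2)] + joint[(2, 2)]
print(abs(joint[(1, 2)] - pY2 * cond[1]) < 1e-12)  # True
```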
OK, why did we go through this exercise and this notation? It's because in the experiments that we're interested in in the real world, typically there are going to be lots of uncertain quantities. There are going to be multiple random variables. And we want to be able to talk about them simultaneously.

Okay. Why two and not more than two? How about three random variables? Well, if you understand what's going on in this slide, you should be able to kind of automatically generalize it to the case of multiple random variables. So for example, if we have three random variables, X, Y, and Z, and you see an expression like p_{X,Y,Z}(x,y,z), it should be clear what it means. It's the probability that X takes this value, and simultaneously Y takes that value, and simultaneously Z takes that value. (That's an uppercase Z in the subscript, and a lowercase z in the argument.)

And if I ask you to find the marginal of X -- if I tell you the joint PMF of the three random variables and I ask you for this value -- how would you find it? Well, you will try to generalize this relation here. The probability that x occurs is the sum of the probabilities of all events that make X take that particular value. So what are all the events? Well, this particular x can happen together with some y and some z. We don't care which y and z; any y and z will do. So when we consider all possibilities, we need to add here over all possible values of the y's and z's. So consider all triples (x, y, z).
Fix x and consider all the possibilities for the remaining variables, y and z, add these up, and that gives you the marginal PMF of X: p_X(x) is the sum over all y and z of p_{X,Y,Z}(x,y,z).

And then there are other things that you can do. This is the multiplication rule for two events. We saw back in chapter one that there's a multiplication rule when you talk about more than two events, and you can write a chain of conditional probabilities. We can certainly do the same in our new notation. So let's look at this rule up here: the multiplication rule for three random variables. What does it say? The probability of three things happening simultaneously -- X, Y, Z taking specific values little x, little y, little z -- that probability is the probability that the first thing happens, that X takes that value. Given that X takes that value, we multiply it with the conditional probability that Y also takes a certain value. And now, given that X and Y have taken those particular values, we multiply with the conditional probability that the third thing happens, given that the first two things happened.

So this is just the multiplication rule for three events, which would be: probability of A intersection B intersection C equals -- you know the rest of the formula. You just rewrite this formula in PMF notation. The probability of A intersection B intersection C is the probability of A, which corresponds to this term, times the probability of B given A, times the probability of C given A and B. In PMF notation: p_{X,Y,Z}(x,y,z) = p_X(x) p_{Y|X}(y|x) p_{Z|X,Y}(z|x,y).
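As a quick sanity check -- again an illustrative sketch with made-up numbers, not lecture material -- here are the same computations for three random variables:

```python
from itertools import product

# Illustrative joint PMF over three binary random variables X, Y, Z,
# built by multiplying made-up factors so that it sums to 1.
joint = {}
for x, y, z in product([0, 1], repeat=3):
    joint[(x, y, z)] = 0.5 * (0.3 if y == x else 0.7) * (0.6 if z == y else 0.4)

def marginal_x(joint):
    # p_X(x) = sum over y and z of p_{X,Y,Z}(x,y,z)
    pX = {}
    for (x, y, z), p in joint.items():
        pX[x] = pX.get(x, 0.0) + p
    return pX

print(marginal_x(joint))  # {0: 0.5, 1: 0.5} for this construction

# Multiplication rule: p(x,y,z) = p_X(x) * p_{Y|X}(y|x) * p_{Z|X,Y}(z|x,y)
pX = marginal_x(joint)
pXY = {}
for (x, y, z), p in joint.items():
    pXY[(x, y)] = pXY.get((x, y), 0.0) + p

x, y, z = 1, 0, 0
chain = pX[x] * (pXY[(x, y)] / pX[x]) * (joint[(x, y, z)] / pXY[(x, y)])
print(abs(chain - joint[(x, y, z)]) < 1e-12)  # True (the chain telescopes)
```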
So what else is left over from chapter one that we can or should generalize to random variables? Well, there's the notion of independence. So let's define what independence means. Instead of talking about just two random variables, let's go directly to the case of multiple random variables. When we talked about events, things were a little complicated. We had a simple definition for independence of two events: two events are independent if the probability of both is equal to the product of the probabilities. But for three events, it was kind of messy. We needed to write down lots of conditions. For random variables, things in some sense are a little simpler. We only need to write down one formula and take it as the definition of independence. Three random variables are independent if and only if, by definition, their joint probability mass function factors into a product of individual probability mass functions: p_{X,Y,Z}(x,y,z) = p_X(x) p_Y(y) p_Z(z) for all x, y, z. So the probability that all three things happen is the product of the individual probabilities that each one of these three things happens. So independence means, mathematically, that you can just multiply probabilities to get the probability of several things happening simultaneously.

Now, with three events, we had to write a huge number of equations -- of equalities that have to hold. How can it be that with random variables we can manage with only one equality? Well, the catch is that this is not really just one equality. We require it to be true for every little x, y, and z. So in some sense, this is a bunch of conditions that are being put on the joint PMF, a bunch of conditions that we need to check.

So this is the mathematical definition. What is the intuitive content of this definition? The intuitive content is the same as for events. Random variables are independent if knowing something about the realized values of some of these random variables does not change our beliefs about the likelihood of various values of the remaining random variables. So independence would translate, for example, into a condition such as: the conditional PMF of X given y should be equal to the marginal PMF of X. What is this saying? That you have some original beliefs about how likely it is for X to take this value. Now, someone comes and tells you that Y took on a certain value. This causes you, in principle, to revise your beliefs. And your new beliefs will be captured by the conditional PMF, or the conditional probabilities. Independence means that your revised beliefs actually will be the same as your original beliefs.
Telling you information about the value of Y doesn't change what you expect for the random variable X.

Why didn't we use this as the definition of independence? Well, because this definition only makes sense when this conditional is well-defined. And this conditional is only well-defined if the event that Y takes on that particular value has positive probability. We cannot condition on events that have zero probability, so conditional probabilities are only defined for y's that are likely to occur -- that have positive probability.

Now, similarly, with multiple random variables, if they're independent, you would have relations such as: the conditional of X, given y and z, should be the same as the marginal of X. What is this saying? Again, that if I tell you the realized values of the random variables Y and Z, this is not going to change your beliefs about how likely x is to occur. Whatever you believed in the beginning, you're going to believe the same thing afterwards. So it's important to keep that intuition in mind, because sometimes this way you can tell whether random variables are independent without having to do calculations and check this formula.

OK, so let's check our concepts with a simple example. Let's look at two random variables that are discrete and take values between one and four each. And this is a table that gives us the joint PMF. So it tells us, for example, that the probability of X equal to 2 and Y equal to 1 happening simultaneously -- that's an event that has probability 1/20. Are these two random variables independent? You can try to check a condition like this. But can we tell directly from the table? If I tell you the value of Y, could that give you useful information about X? Certainly. If I tell you that Y is equal to 1, this tells you that X must be equal to 2. But if I tell you that Y is equal to 3, this tells you that, still, X could be anything. So telling you the value of Y kind of changes what you expect, or what you consider possible, for the values of the other random variable. So just by inspecting the table, we can tell that the random variables are not independent.
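Here is a small sketch of this kind of check in code. The joint PMF below is a stand-in, since the lecture's table isn't reproduced here; independence fails as soon as a single (x, y) pair violates the factorization.

```python
# A stand-in joint PMF with a structural zero, like the table discussed:
# when Y = 1, only X = 2 is possible.
joint = {(2, 1): 0.05, (1, 3): 0.25, (2, 3): 0.30, (3, 3): 0.40}

def is_independent(joint, tol=1e-12):
    # Check p_{X,Y}(x,y) == p_X(x) * p_Y(y) for every pair of values,
    # including pairs whose joint probability is zero.
    pX, pY = {}, {}
    for (x, y), p in joint.items():
        pX[x] = pX.get(x, 0.0) + p
        pY[y] = pY.get(y, 0.0) + p
    return all(abs(joint.get((x, y), 0.0) - px * py) <= tol
               for x, px in pX.items() for y, py in pY.items())

print(is_independent(joint))  # False: p(1,1) = 0 but p_X(1) * p_Y(1) > 0
```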
Okay. What's the other concept we introduced in chapter one? We introduced the concept of conditional independence. And conditional independence is like ordinary independence, but applied to a conditional universe where we're given some information. So suppose someone tells you that the outcome of the experiment is such that X is less than or equal to 2 and Y is larger than or equal to 3. So we are given the information that we now live inside this universe. So what happens inside this universe? Inside this universe, our random variables are going to have a new joint PMF, which is conditioned on the event that we were told has occurred. So let A correspond to this event here. And now we're dealing with conditional probabilities. What are those conditional probabilities? We can put them in a table. It's a two by two table, since we only have two possible values for each variable. What are they going to be? Well, these probabilities show up in the ratios 1, 2, 2, and 4. Those ratios have to stay the same, and the probabilities need to add up to one. So, since these numbers add up to nine, the denominators should be nine: the conditional probabilities are 1/9, 2/9, 2/9, and 4/9. So this is the conditional PMF in this example.

Now, in this conditional universe, is X independent from Y? If I tell you that Y takes this value -- so we live in this part of the universe -- what do you know about X? What you know about X is that this value is twice as likely as that value. If I condition on Y taking this other value -- so we're living here -- what do you know about X? What you know about X is that this value is twice as likely as that value. So it's the same.
Whether we live here or we live there, this x is twice as likely as that x. So the conditional PMF of X given y in the new universe is the same as the marginal PMF of X -- but of course the marginal in the new universe. So no matter what y is, the conditional PMF of X is the same. And that conditional PMF is 1/3 and 2/3. This is the conditional PMF of X in the new universe, no matter what y occurs. So Y does not give us any information about X; it doesn't cause us to change our beliefs inside this little universe. And therefore the two random variables are independent.

Now, the other way that you can verify that we have independence is to find the marginal PMFs of the two random variables. The marginal PMF of X: you find it by adding those two terms, and you get 1/3; adding those two terms, you get 2/3. The marginal PMF of Y: you add these two terms, and you get 1/3. And the marginal PMF of Y here is going to be 2/3. And then you ask the question: is the joint the product of the marginals? And indeed it is. This times this gives you 1/9. This times this gives you 2/9. So each value in the table of the joint PMF is the product of the marginal PMFs of X and Y in this universe. So the two random variables are independent inside this universe, and we say that they're conditionally independent.
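A quick numeric check of this example -- a sketch in which the particular (x, y) coordinates are an assumption, but the 2x2 conditional table with entries in ratio 1:2:2:4 is as described above:

```python
# Conditional joint PMF inside the universe A, from the ratios 1:2:2:4.
# The coordinates x in {1, 2} and y in {3, 4} are illustrative choices,
# consistent with conditioning on X <= 2 and Y >= 3.
cond_joint = {(x, y): v / 9 for (x, y), v in
              {(1, 3): 1, (2, 3): 2, (1, 4): 2, (2, 4): 4}.items()}

# Marginals inside A.
pX = {x: sum(p for (xx, y), p in cond_joint.items() if xx == x) for x in (1, 2)}
pY = {y: sum(p for (x, yy), p in cond_joint.items() if yy == y) for y in (3, 4)}
print(pX)  # {1: 1/3, 2: 2/3}
print(pY)  # {3: 1/3, 4: 2/3}

# The conditional joint factors into the product of the conditional marginals,
# so X and Y are independent within A: conditionally independent given A.
print(all(abs(cond_joint[(x, y)] - pX[x] * pY[y]) < 1e-12
          for x in (1, 2) for y in (3, 4)))  # True
```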
All right. Now let's move to the new topic, to the new concept that we introduced in this chapter, which is the concept of expectations. So what are the things to know here? One is the general idea. The way to think about expectations is that the expectation is something like the average value of a random variable if you do an experiment over and over, and if you interpret probabilities as frequencies. So the x's occur over and over, each with a certain frequency: p(x) is the frequency with which a particular value, little x, gets realized. And each time that this happens, you get x dollars. How many dollars do you get on the average? Well, this formula gives you that average: E[X] is the sum over x of x times p_X(x).

So the first thing we do is write down a definition for this concept. But then the other things you need to know are how to calculate expectations, sometimes using shortcuts, and what properties they have. The most important shortcut there is, is that if you want to calculate the expected value of a function of random variables -- the average value of g(X, Y), say -- you do not need to find the PMF of that new random variable. You can work directly with the x's and the y's. So you do the experiment over and over. The outcome of the experiment is a pair (x, y). And each time that a certain (x, y) happens, you get so many dollars: g(x, y) dollars. So this fraction of the time, a certain (x, y) happens, and that fraction of the time you get so many dollars; so this is the average number of dollars that you get: E[g(X,Y)] is the sum over all (x, y) of g(x, y) times p_{X,Y}(x,y). And since what you end up with is the average, it corresponds to the expected value. Now, this is something that, of course, needs a little bit of mathematical proof. But it is just a different way of accounting, and it turns out to give you the right answer. And it's a very useful shortcut.
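A sketch of this expected value rule in code (the numbers are illustrative): we compute E[g(X,Y)] directly from the joint PMF, without ever constructing the PMF of g(X,Y).

```python
# Illustrative joint PMF of (X, Y).
joint = {(1, 1): 0.1, (1, 2): 0.3, (2, 1): 0.2, (2, 2): 0.4}

def expected_value_rule(joint, g):
    # E[g(X,Y)] = sum over (x,y) of g(x,y) * p_{X,Y}(x,y)
    return sum(g(x, y) * p for (x, y), p in joint.items())

# Examples: E[X], E[Y], and E[(X + Y)^2] from the same joint PMF.
print(expected_value_rule(joint, lambda x, y: x))             # 1.6
print(expected_value_rule(joint, lambda x, y: y))             # 1.7
print(expected_value_rule(joint, lambda x, y: (x + y) ** 2))  # no PMF of (X+Y)^2 needed
```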
Now, when we're talking about functions of random variables, in general we cannot just push expectations inside. That is, the expected value of a function of a random variable is not the same as the function of the expected value. A function of averages is not the same as the average of the function. So in general, E[g(X)] is not equal to g(E[X]). But it's important to know the exceptions to this rule. And the important exceptions are mainly two. One is the case of linear functions of a random variable. We discussed this last time. So to find the expected value of the temperature in Celsius, you first find the expected value of the temperature in Fahrenheit, and then you do the conversion to Celsius. So whether you first average and then do the conversion to the new units, or first convert and then average, it shouldn't matter: you get the same result.

The other property that turns out to be true when you talk about multiple random variables is that expectation still behaves linearly: E[X + Y + Z] = E[X] + E[Y] + E[Z]. So let X, Y, and Z be the scores of a random student on each one of the three sections of the SAT. So the overall SAT score is X plus Y plus Z, and this is the average total SAT score. Another way to calculate that average is to look at the first section of the SAT and see what was the average, look at the second section and see what was the average, and the same for the third, and add the averages. So you can do the averages for each section separately and add them, or you can find the total score for each student and average those. So I guess you probably believe that this is correct if you're talking just about averaging scores. Since expectations are just a variation on averages, it turns out that this is also true in general. And the derivation of this is very simple, based on the expected value rule. You can look at it in the notes.

So this is one exception, which is linearity. The second important exception is the case of independent random variables: the product of two independent random variables has an expectation which is the product of the expectations. In general, this is not true. But for the case where we have independence, the expectation works out as follows. Using the expected value rule, this is how you calculate the expected value of a function of two random variables -- so think of X times Y as being your g(X, Y), and x times y as your g(x, y). This step is generally true: E[XY] is the sum over (x, y) of x times y times p_{X,Y}(x,y). Now, if we have independence, then the joint PMF factors, p_{X,Y}(x,y) = p_X(x) p_Y(y), and then you can separate this sum by bringing the x terms together and pulling them outside the y summation. And you find that this is the same as the expected value of X times the expected value of Y. So independence is used in this step here.
OK, now what if X and Y are independent, but instead of taking the expectation of X times Y, we take the expectation of the product of two functions of X and Y? I claim that the expected value of the product is still going to be the product of the expected values: E[g(X) h(Y)] = E[g(X)] E[h(Y)]. How do we show that? We could show it by just redoing this derivation here: instead of X and Y, we would have g(X) and h(Y), and the algebra goes through. But there's a better way to think about it, which is more conceptual. And here's the idea. If X and Y are independent, what does it mean? X does not convey any information about Y. If X conveys no information about Y, does X convey information about h(Y)? No. If X tells me nothing about Y -- nothing new -- it shouldn't tell me anything about h(Y). Now, if X tells me nothing about h(Y), could g(X) tell me something about h(Y)? No. So the idea is that, if X is unrelated to Y and doesn't have any useful information, then g(X) cannot have any useful information about h(Y). So if X and Y are independent, then g(X) and h(Y) are also independent.

This is something that one can try to prove mathematically, but it's more important to understand conceptually why it is so. It's in terms of conveying information. If X tells me nothing about Y, X cannot tell me anything about Y cubed, or about Y squared, and so on. That's the idea. And once we are convinced that g(X) and h(Y) are independent, then we can apply our previous rule -- that for independent random variables, expectations multiply the right way -- but apply it now to these two independent random variables. And we get the conclusion that we wanted.
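A small numeric sketch of these two facts, with made-up marginals; independence is built in by constructing the joint PMF as a product:

```python
# Independent X and Y, with illustrative marginals.
pX = {0: 0.3, 1: 0.7}
pY = {1: 0.5, 2: 0.5}
joint = {(x, y): px * py for x, px in pX.items() for y, py in pY.items()}

def E(pmf, f=lambda v: v):
    # Expected value of f applied to a single random variable.
    return sum(f(v) * p for v, p in pmf.items())

def E_joint(joint, g):
    # Expected value rule for a function of two random variables.
    return sum(g(x, y) * p for (x, y), p in joint.items())

# E[XY] = E[X] E[Y] under independence.
print(abs(E_joint(joint, lambda x, y: x * y) - E(pX) * E(pY)) < 1e-12)  # True

# And the same holds for functions: E[g(X) h(Y)] = E[g(X)] E[h(Y)].
g = lambda x: x ** 2 + 1
h = lambda y: 3 * y
lhs = E_joint(joint, lambda x, y: g(x) * h(y))
rhs = E(pX, g) * E(pY, h)
print(abs(lhs - rhs) < 1e-12)  # True
```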
Now, besides expectations, we also introduced the concept of the variance. If you remember the definition of the variance, let me write down the formula for the variance of aX. It's the expected value of the squared difference between the random variable we're looking at and the expected value of that random variable: var(aX) = E[(aX - E[aX])^2]. So inside we have the difference of the random variable from its mean. We take that difference and square it, so it's the squared distance from the mean, and then we take the expectation of the whole thing. So when you look at that expression, you realize that a can be pulled out of those expressions. And because there is a square, when you pull out the a, it's going to come out as an a-squared: var(aX) = a^2 var(X). So that gives us the rule for finding the variance of a scalar multiple of a random variable.

The variance captures the idea of how wide, how spread out, a certain distribution is. A bigger variance means it's more spread out. Now, if you take a random variable and add a constant to it, what does that do to its distribution? It just shifts it, but it doesn't change its width. So intuitively this means that the variance should not change: var(X + c) = var(X). You can check that mathematically, but it should also make sense intuitively. So the variance does not change when you add a constant.

Now, can you add variances the way we added expectations? Does the variance behave linearly? It turns out: not always. Here, we need a condition. It's only in special cases -- for example, when the two random variables are independent -- that you can add variances. The variance of the sum is the sum of the variances if X and Y are independent. The derivation of this is, again, very short and simple. We'll skip it, but it's an important fact to remember.

Now, to appreciate why this equality is not always true, we can think of some extreme examples. Suppose that X is the same as Y. What's going to be the variance of X plus Y? Well, X plus Y, in this case, is the same as 2X, so we're going to get 4 times the variance of X, which is different from the variance of X plus the variance of X.
So that expression would give us twice the variance of X, but actually it's 4 times the variance of X. The other extreme would be if X is equal to -Y. Then X plus Y is the random variable which is always equal to 0. Now, a random variable which is always equal to 0 has no uncertainty. It is always equal to its mean value, so the variance, in this case, turns out to be 0. So in both of these cases, of course, we have random variables that are extremely dependent. Why are they dependent? Because if I tell you something about Y, it tells you an awful lot about the value of X. There's a lot of information about X if I tell you Y, in this case or in that case.

And finally, a short drill. If I tell you that the random variables are independent and you want to calculate the variance of a linear combination of this kind -- say Z = X - 3Y -- then how do you argue? You argue that, since X and Y are independent, X and -3Y are also independent. X has no information about Y, so X has no information about -Y. X has no information about -Y, so X should not have any information about -3Y. So X and -3Y are independent. So the variance of Z should be the variance of X plus the variance of -3Y, which is the variance of X plus 9 times the variance of Y. The important thing to note here is that, no matter what happens, you end up getting a plus here, not a minus. So that's the important thing to remember in this type of calculation.
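Here is an illustrative sketch of these variance rules using a small simulation; the sample sizes and the particular distributions are arbitrary choices, not from the lecture.

```python
import random

random.seed(0)
N = 100_000

def var(samples):
    # Empirical variance: average squared distance from the sample mean.
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

# Independent X and Y, with arbitrary discrete distributions.
X = [random.choice([0, 1, 2]) for _ in range(N)]
Y = [random.choice([0, 5]) for _ in range(N)]

# var(aX) = a^2 var(X), and var(X + c) = var(X).
print(var([3 * x for x in X]), 9 * var(X))   # approximately equal
print(var([x + 100 for x in X]), var(X))     # approximately equal

# Independent: var(X + Y) = var(X) + var(Y), and var(X - 3Y) = var(X) + 9 var(Y).
print(var([x + y for x, y in zip(X, Y)]), var(X) + var(Y))
print(var([x - 3 * y for x, y in zip(X, Y)]), var(X) + 9 * var(Y))

# Extreme dependence: var(X + X) = 4 var(X), and var(X + (-X)) = 0.
print(var([2 * x for x in X]), 4 * var(X))
print(var([x - x for x in X]))               # exactly 0
```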
So this has been all concepts -- review, new concepts, and all that. It's the usual fire hose. Now let's use them to do something useful, finally. So let's revisit our old example, the binomial distribution, which counts the number of successes in n independent trials of a coin. It's a biased coin that has a probability of heads -- a probability of success -- equal to p at each trial. Finally, we can go through the exercise of calculating the expected value of this random variable.

There's one way of calculating that expectation that would be the favorite of those people who enjoy algebra, which is to write down the definition of the expected value: we add over all possible values of the random variable, over all the possible k's, and weigh them according to the probabilities that this particular k occurs. The probability that X takes on a particular value k is, of course, the binomial PMF, the familiar formula (n choose k) p^k (1-p)^(n-k). Clearly, that would be a messy and challenging calculation. Can we find a shortcut?

There's a very clever trick. There are lots of problems in probability that you can approach really nicely by breaking up the random variable of interest into a sum of simpler and more manageable random variables. And if you can make it a sum of random variables that are just 0's or 1's, so much the better -- life is easier. Random variables that take values 0 or 1, we call indicator variables. They indicate whether an event has occurred or not. In this case, we look at each coin flip one at a time. For the i-th flip, if it resulted in heads -- a success -- we record Xi = 1. If not, we record Xi = 0. And then we look at the sum of the Xi's. What is it going to be? We add one each time that we get a success, so the sum is going to be the total number of successes. So we break up the random variable of interest into a sum of really nice and simple random variables: X = X1 + ... + Xn.

And now we can use the linearity of expectations. We're going to find the expectation of X by finding the expectations of the Xi's and then adding them. What's the expected value of Xi? Well, Xi takes the value 1 with probability p, and takes the value 0 with probability 1-p. So the expected value of Xi is just p. So the expected value of X is going to be just n times p. Because X is the sum of n terms, each one of which has expectation p, the expected value of the sum is the sum of the expected values.
So I guess that's a pretty good shortcut for doing this horrendous calculation up there. So in case you didn't realize it, what we just established, without doing any algebra, is that E[X] = np. Good.

How about the variance -- first, the variance of Xi? Two ways to calculate it. One is by using directly the formula for the variance. Let's see what it would be. With probability p, you get a 1, and in that case your distance from the mean p is (1-p); that's a squared distance of (1-p)^2. With probability 1-p, you get a 0, which is at distance p from the mean, for a squared distance of p^2. So var(Xi) = p (1-p)^2 + (1-p) p^2, and then you can simplify that formula and get an answer.

How about a slightly easier way of doing it? Instead of doing the algebra here, let me indicate the slightly easier way. We have a formula for the variance that tells us we can find it by proceeding this way: var(Xi) = E[Xi^2] - (E[Xi])^2. That's a formula that's generally true for variances. Why is this easier? What's the expected value of Xi squared? Backtrack. What is Xi squared, after all? It's the same thing as Xi. Since Xi takes the values 0 and 1, Xi squared also takes the same values, 0 and 1. So the expected value of Xi squared is the same as the expected value of Xi, which is equal to p. And the square of the expected value of Xi is p squared. So we get the final answer: var(Xi) = p - p^2 = p(1-p). If you were to work through and do the cancellations in the messy expression above, after one line you would also get to the same formula. But this sort of illustrates that when working with this formula for the variance, sometimes things work out a little faster.

Finally, are we in business? Can we calculate the variance of the random variable X as well? Well, we have the rule that for independent random variables, the variance of the sum is the sum of the variances. So to find the variance of X, we just need to add the variances of the Xi's.
We have n Xi's, and each one of them has variance p times (1-p). And we are done: var(X) = n p (1-p). So this way, we have calculated both the mean and the variance of the binomial random variable.

It's interesting to look at this particular formula and see what it tells us. If you plot the variance of X as a function of p, it has the shape of an inverted parabola, with its maximum at p = 1/2. p times (1-p) is 0 when p is equal to 0 and when p is equal to 1, and it's a quadratic, so it must have this particular shape. So what does it tell us? If you think about variance as a measure of uncertainty, it tells you that coin flips are most uncertain when your coin is fair. When p is equal to 1/2, that's when you have the most randomness. And this is kind of intuitive. If, on the other hand, I tell you that the coin is extremely biased -- p very close to 1, which means it almost always gives you heads -- then that would be a case of low variance. There's low variability in the results. There's little uncertainty about what's going to happen: it's going to be mostly heads, with some occasional tails. So p equal to 1/2 -- the fair coin -- is the coin which is the most uncertain of all coins, in some sense. And it corresponds to the biggest variance; it corresponds to an X that has the widest distribution.
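A sketch verifying these two results against the algebraic definition, with arbitrarily chosen parameters:

```python
from math import comb

def binomial_mean_and_var(n, p):
    # Direct computation from the PMF: E[X] = sum of k * p(k),
    # and var(X) = E[X^2] - (E[X])^2.
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second = sum(k * k * pk for k, pk in enumerate(pmf))
    return mean, second - mean**2

n, p = 10, 0.3
mean, variance = binomial_mean_and_var(n, p)
print(mean, n * p)                # 3.0 vs 3.0: E[X] = np
print(variance, n * p * (1 - p))  # 2.1 vs 2.1: var(X) = np(1-p)
```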
Now that we're on a roll, and we can calculate such hugely complicated sums in simple ways, let us try to push our luck and do a problem with the same flavor, but a little harder. So you go to one of those old-fashioned cocktail parties. All the men, at least, wear those standard big hats, which look identical. They check them in when they walk in. And when they walk out, since the hats look pretty identical, they just pick a random hat and go home. So n people pick their hats completely at random, quote unquote, and then leave. And the question is to say something about the number of people who end up, by accident or by luck, getting back their own hat -- the exact same hat that they checked in.

OK, first, what do we mean by "completely at random"? Completely at random, we basically mean that any permutation of the hats is equally likely. Any way of distributing those n hats to the n people is as likely as any other way. So there's complete symmetry between hats and people.

So what we want to do is calculate the expected value and the variance of this random variable X, the number of people who get their own hat back. Let's start with the expected value. Let's reuse the trick from the binomial case. We're going to think of the total number of own hats picked as a sum of 0-1 random variables. X1 tells us whether person 1 got their own hat back: if they did, we record a 1. X2, the same thing. By adding all the X's, we count how many 1's we got, which counts how many people selected their own hats.

So we broke down the random variable of interest -- the number of people who get their own hats back -- into a sum of random variables. And these random variables, again, are easy to handle, because they're binary; they only take two values. What's the probability that Xi is equal to 1 -- that the i-th person gets their own hat? There are n hats, and by symmetry, the chance that they end up getting their own hat, as opposed to any one of the other n - 1 hats, is going to be 1/n.

So what's the expected value of Xi? You get a value of 1 with probability 1/n, or a value of 0 with probability 1 - 1/n. So the expected value of Xi is 1/n.

All right, so we got the expected value of the Xi's. And remember, what we want to do is calculate the expected value of X by using this decomposition. Are the random variables Xi independent of each other? You could try to answer that question by writing down a joint PMF for the X's, but I'm sure that you will not succeed. But can you think intuitively?
777 00:42:02,740 --> 00:42:05,940 If I tell you information about some of the Xi's, does 778 00:42:05,940 --> 00:42:08,920 it give you information about the remaining ones? 779 00:42:08,920 --> 00:42:09,300 Yeah. 780 00:42:09,300 --> 00:42:13,950 If I tell you that out of 10 people, 9 of them got their 781 00:42:13,950 --> 00:42:16,710 own hat back, does that tell you something 782 00:42:16,710 --> 00:42:18,330 about the 10th person? 783 00:42:18,330 --> 00:42:18,690 Yes. 784 00:42:18,690 --> 00:42:22,510 If 9 got their own hat, then the 10th must also have gotten 785 00:42:22,510 --> 00:42:24,170 their own hat back. 786 00:42:24,170 --> 00:42:27,170 So the first 9 random variables tell you something 787 00:42:27,170 --> 00:42:28,790 about the 10th one. 788 00:42:28,790 --> 00:42:33,000 And conveying information of this sort, that's a case of 789 00:42:33,000 --> 00:42:34,410 dependence. 790 00:42:34,410 --> 00:42:38,100 All right, so the random variables are not independent. 791 00:42:38,100 --> 00:42:39,030 Are we stuck? 792 00:42:39,030 --> 00:42:43,240 Can we still calculate the expected value of X? 793 00:42:43,240 --> 00:42:45,210 Yes, we can. 794 00:42:45,210 --> 00:42:50,710 And the reason we can is that expectations are linear. 795 00:42:50,710 --> 00:42:53,940 Expectation of a sum of random variables is the sum of the 796 00:42:53,940 --> 00:42:55,140 expectations. 797 00:42:55,140 --> 00:42:57,490 And that's always true. 798 00:42:57,490 --> 00:43:00,710 There's no independence assumption that's being used 799 00:43:00,710 --> 00:43:02,540 to apply that rule. 800 00:43:02,540 --> 00:43:06,980 So we have that the expected value of X is the sum of the 801 00:43:06,980 --> 00:43:09,580 expected values of the Xi's. 802 00:43:09,580 --> 00:43:12,970 And this is a property that's always true. 803 00:43:12,970 --> 00:43:14,350 You don't need independence. 804 00:43:14,350 --> 00:43:15,590 You don't care. 805 00:43:15,590 --> 00:43:18,660 So we're adding n terms, each one of which has 806 00:43:18,660 --> 00:43:20,430 expected value 1/n. 807 00:43:20,430 --> 00:43:22,670 And the final answer is 1. 808 00:43:22,670 --> 00:43:27,430 So out of the n people who selected hats at random, on 809 00:43:27,430 --> 00:43:32,590 the average, you expect only one of them to end up getting 810 00:43:32,590 --> 00:43:35,830 their own hat back. 811 00:43:35,830 --> 00:43:36,640 Very good. 812 00:43:36,640 --> 00:43:41,620 So since we are succeeding so far, let's try to see if we 813 00:43:41,620 --> 00:43:44,620 can succeed in calculating the variance as well. 814 00:43:44,620 --> 00:43:46,580 And of course, we will. 815 00:43:46,580 --> 00:43:50,160 But it's going to be a little more complicated. 816 00:43:50,160 --> 00:43:52,760 The reason it's going to be a little more complicated is 817 00:43:52,760 --> 00:43:56,500 that the Xi's are not independent, so the variance 818 00:43:56,500 --> 00:44:00,280 of the sum is not the same as the sum of the variances. 819 00:44:00,280 --> 00:44:04,320 So it's not enough to find the variances of the Xi's. 820 00:44:04,320 --> 00:44:06,930 We'll have to do more work. 821 00:44:06,930 --> 00:44:08,550 And here's what's involved. 822 00:44:08,550 --> 00:44:12,320 Let's start with the general formula for the variance, 823 00:44:12,320 --> 00:44:15,950 which, as I mentioned before, is usually the simpler way 824 00:44:15,950 --> 00:44:18,430 to go about calculating variances.
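Before diving into that computation, the expected-value result itself is easy to sanity check by simulation. Here is a minimal sketch, again assuming Python with the standard library (the function name average_fixed_points is ours): it hands out the hats as a uniformly random permutation and averages the number of people who get their own back.

    # Estimate E[X], where X is the number of people who get their own hat.
    import random

    def average_fixed_points(n, trials=100_000):
        total = 0
        for _ in range(trials):
            hats = list(range(n))
            random.shuffle(hats)  # every permutation of the n hats equally likely
            # person i got their own hat back exactly when hats[i] == i
            total += sum(1 for person, hat in enumerate(hats) if person == hat)
        return total / trials

    for n in [2, 10, 100]:
        print(n, average_fixed_points(n))

Whatever n you feed it, the estimate hovers around 1, which is exactly what linearity of expectation promised.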
825 00:44:18,430 --> 00:44:21,800 So we need to calculate the expected value of X-squared, 826 00:44:21,800 --> 00:44:27,110 and subtract from it the square of the expectation. 827 00:44:27,110 --> 00:44:31,010 Well, we already found the expected value of X. It's 828 00:44:31,010 --> 00:44:31,870 equal to 1. 829 00:44:31,870 --> 00:44:34,580 So 1-squared gives us just 1. 830 00:44:34,580 --> 00:44:37,980 So we're left with the task of calculating the expected value 831 00:44:37,980 --> 00:44:43,440 of X-squared, the random variable X-squared. 832 00:44:43,440 --> 00:44:45,610 Let's try to follow the same idea. 833 00:44:45,610 --> 00:44:49,770 Write this messy random variable, X-squared, as a sum 834 00:44:49,770 --> 00:44:54,440 of hopefully simpler random variables. 835 00:44:54,440 --> 00:44:59,350 So X is the sum of the Xi's, so you square 836 00:44:59,350 --> 00:45:01,560 both sides of this. 837 00:45:01,560 --> 00:45:05,150 And then you expand the right-hand side. 838 00:45:05,150 --> 00:45:09,390 When you expand the right-hand side, you get the squares of 839 00:45:09,390 --> 00:45:11,420 the terms that appear here. 840 00:45:11,420 --> 00:45:14,230 And then you get all the cross-terms. 841 00:45:14,230 --> 00:45:19,100 For every pair (i, j) with i different 842 00:45:19,100 --> 00:45:24,030 from j, you're going to have a cross-term in the sum. 843 00:45:24,030 --> 00:45:29,230 So now, in order to calculate the expected value of 844 00:45:29,230 --> 00:45:32,480 X-squared, what does our task reduce to? 845 00:45:32,480 --> 00:45:36,230 It reduces to calculating the expected value of this term 846 00:45:36,230 --> 00:45:38,690 and calculating the expected value of that term. 847 00:45:38,690 --> 00:45:41,060 So let's do them one at a time. 848 00:45:41,060 --> 00:45:47,040 Expected value of Xi squared, what is it going to be? 849 00:45:47,040 --> 00:45:48,660 Same trick as before. 850 00:45:48,660 --> 00:45:53,350 Xi takes value 0 or 1, so Xi squared takes just the same 851 00:45:53,350 --> 00:45:55,290 values, 0 or 1. 852 00:45:55,290 --> 00:45:57,010 So that's the easy one. 853 00:45:57,010 --> 00:46:00,680 That's the same as expected value of Xi, which we already 854 00:46:00,680 --> 00:46:04,410 know to be 1/n. 855 00:46:04,410 --> 00:46:07,830 So this gives us a first contribution down here. 856 00:46:07,830 --> 00:46:10,840 857 00:46:10,840 --> 00:46:14,220 The expected value of this term is going to be what? 858 00:46:14,220 --> 00:46:17,210 We have n terms in the summation, 859 00:46:17,210 --> 00:46:21,800 and each one of these terms has an expectation of 1/n, so the sum of the squared terms contributes n times 1/n, which is 1. 860 00:46:21,800 --> 00:46:24,710 So we did a piece of the puzzle. 861 00:46:24,710 --> 00:46:28,480 So now let's deal with the second piece of the puzzle. 862 00:46:28,480 --> 00:46:32,020 Let's find the expected value of Xi times Xj. 863 00:46:32,020 --> 00:46:35,540 Now by symmetry, the expected value of Xi times Xj is going 864 00:46:35,540 --> 00:46:39,900 to be the same no matter which i and j you pick. 865 00:46:39,900 --> 00:46:44,930 So let's just think about X1 and X2 and try to find the 866 00:46:44,930 --> 00:46:48,260 expected value of X1 times X2. 867 00:46:48,260 --> 00:46:51,710 X1 times X2 is a random variable. 868 00:46:51,710 --> 00:46:53,960 What values does it take? 869 00:46:53,960 --> 00:46:56,570 Only 0 or 1. 870 00:46:56,570 --> 00:47:00,000 Since X1 and X2 are 0 or 1, their product can only take 871 00:47:00,000 --> 00:47:02,010 the values of 0 or 1.
872 00:47:02,010 --> 00:47:04,990 So to find the probability distribution of this random 873 00:47:04,990 --> 00:47:07,320 variable, it's just sufficient to find the probability that 874 00:47:07,320 --> 00:47:09,530 it takes the value of 1. 875 00:47:09,530 --> 00:47:14,500 Now, what does X1 times X2 being equal to 1 mean? 876 00:47:14,500 --> 00:47:19,500 It means that X1 was 1 and X2 was 1. 877 00:47:19,500 --> 00:47:22,390 The only way that you can get a product of 1 is if both of 878 00:47:22,390 --> 00:47:24,350 them turned out to be 1's. 879 00:47:24,350 --> 00:47:29,570 So that's the same as saying, persons 1 and 2 both picked 880 00:47:29,570 --> 00:47:31,980 their own hats. 881 00:47:31,980 --> 00:47:35,510 The probability that person 1 and person 2 both pick their 882 00:47:35,510 --> 00:47:39,600 own hats is the probability of two things happening, which is 883 00:47:39,600 --> 00:47:42,320 the probability of the first thing happening times the 884 00:47:42,320 --> 00:47:44,310 conditional probability of the second, given 885 00:47:44,310 --> 00:47:46,160 that the first happened. 886 00:47:46,160 --> 00:47:48,690 And in words, this is the probability that the first 887 00:47:48,690 --> 00:47:51,840 person picked their own hat times the probability that the 888 00:47:51,840 --> 00:47:54,920 second person picks their own hat, given that the first 889 00:47:54,920 --> 00:47:56,990 person already picked their own. 890 00:47:56,990 --> 00:47:58,820 So what's the probability that the first person 891 00:47:58,820 --> 00:48:00,760 picks their own hat? 892 00:48:00,760 --> 00:48:03,040 We know that it's 1/n. 893 00:48:03,040 --> 00:48:05,030 Now, how about the second person? 894 00:48:05,030 --> 00:48:09,540 If I tell you that one person has their own hat, and that 895 00:48:09,540 --> 00:48:13,240 person takes their hat and goes away, from the point of 896 00:48:13,240 --> 00:48:17,250 view of the second person, there's n - 1 people left 897 00:48:17,250 --> 00:48:19,770 looking at n - 1 hats. 898 00:48:19,770 --> 00:48:22,330 And they're just picking hats at random. 899 00:48:22,330 --> 00:48:24,930 What's the chance that I will get my own? 900 00:48:24,930 --> 00:48:26,180 It's 1/(n - 1). 901 00:48:26,180 --> 00:48:29,210 902 00:48:29,210 --> 00:48:33,700 So think of it this way: person 1 goes, picks a hat at random, 903 00:48:33,700 --> 00:48:36,850 it happens to be their own, and they leave. 904 00:48:36,850 --> 00:48:40,120 You're left with n - 1 people, and there are n 905 00:48:40,120 --> 00:48:41,250 - 1 hats out there. 906 00:48:41,250 --> 00:48:44,490 Person 2 goes and picks a hat at random, and with probability 907 00:48:44,490 --> 00:48:48,820 1/(n - 1) is going to pick their own hat. 908 00:48:48,820 --> 00:48:52,400 So the expected value now of this random variable is, 909 00:48:52,400 --> 00:48:54,520 again, that same number, 1/n times 1/(n - 1), because this is 910 00:48:54,520 --> 00:48:57,500 a (0, 1) random variable. 911 00:48:57,500 --> 00:49:02,370 So this is the same as the expected value of Xi times Xj 912 00:49:02,370 --> 00:49:04,810 when i is different from j. 913 00:49:04,810 --> 00:49:09,830 So here, all that's left to do is to add the expectations of 914 00:49:09,830 --> 00:49:10,540 these terms. 915 00:49:10,540 --> 00:49:14,480 Each one of these terms has an expected value that's 1/n 916 00:49:14,480 --> 00:49:16,910 times 1/(n - 1). 917 00:49:16,910 --> 00:49:19,170 And how many terms do we have? 918 00:49:19,170 --> 00:49:21,410 How many of these are we adding up?
919 00:49:21,410 --> 00:49:24,840 920 00:49:24,840 --> 00:49:28,950 It's n-squared - n. 921 00:49:28,950 --> 00:49:31,830 When you expand the quadratic, there's a total 922 00:49:31,830 --> 00:49:33,890 of n-squared terms. 923 00:49:33,890 --> 00:49:37,860 Some are self-terms, n of them. 924 00:49:37,860 --> 00:49:42,170 And the remaining number of terms is n-squared - n. 925 00:49:42,170 --> 00:49:48,310 So here we got n-squared - n terms. 926 00:49:48,310 --> 00:49:51,200 And so we need to multiply here by n-squared - n. 927 00:49:51,200 --> 00:49:53,810 928 00:49:53,810 --> 00:49:59,980 And after you realize that the first term is 1, and 929 00:49:59,980 --> 00:50:03,490 that n-squared - n is the same as n times (n - 1), the denominator, so the second term is also 1, you get 930 00:50:03,490 --> 00:50:06,750 the answer that the expected value of X-squared equals 2. 931 00:50:06,750 --> 00:50:10,120 And then, finally, going up to the top formula, we subtract the 932 00:50:10,120 --> 00:50:14,720 square of the mean, which is 1, from that 2, and the 933 00:50:14,720 --> 00:50:17,610 variance is just equal to 1. 934 00:50:17,610 --> 00:50:21,680 So the variance of this random variable, number of people who 935 00:50:21,680 --> 00:50:25,130 get their own hats back, is also equal to 1, 936 00:50:25,130 --> 00:50:26,540 equal to the mean. 937 00:50:26,540 --> 00:50:27,690 Looks like magic. 938 00:50:27,690 --> 00:50:29,220 Why is this the case? 939 00:50:29,220 --> 00:50:31,550 Well, there's a deeper explanation why these two 940 00:50:31,550 --> 00:50:33,630 numbers should come out to be the same. 941 00:50:33,630 --> 00:50:35,980 But this is something that would probably have to wait a 942 00:50:35,980 --> 00:50:39,420 couple of chapters before we could actually explain it. 943 00:50:39,420 --> 00:50:40,730 And so I'll stop here. 944 00:50:40,730 --> 00:50:41,980
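For anyone who wants to see that last bit of arithmetic laid out once more, here is a minimal sketch, assuming Python (using exact fractions is just our convenience): it adds the n self-terms, each with expectation 1/n, to the n-squared - n cross-terms, each with expectation 1/n times 1/(n - 1), and confirms that the second moment is exactly 2 and the variance exactly 1, no matter what n is.

    # Redo the final arithmetic exactly: E[X^2] and var(X) for the hat problem.
    from fractions import Fraction

    for n in [2, 3, 10, 100]:
        self_terms = n * Fraction(1, n)                       # n terms, each 1/n
        cross_terms = (n * n - n) * Fraction(1, n * (n - 1))  # n^2 - n terms
        second_moment = self_terms + cross_terms              # comes out to exactly 2
        variance = second_moment - 1                          # subtract E[X]^2 = 1
        print(n, second_moment, variance)                     # prints 2 and 1 every time

So the mean and the variance both come out equal to 1 for every n of at least 2, which is the coincidence the last remark points at.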