The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, good morning. So today, we're going to have a fairly packed lecture. We are going to conclude with chapter two, discrete random variables. And we will be talking mostly about multiple random variables. And this is also the last lecture as far as quiz one is concerned. So it's going to cover the material until today, and of course the next recitation and tutorial as well.

OK, so we're going to review quickly what we introduced at the end of last lecture, where we talked about the joint PMF of two random variables. We're going to talk about the case of more than two random variables as well. We're going to talk about the familiar concepts of conditioning and independence, but applied to random variables instead of events.
We're going to look at expectations once more, talk about a few properties that they have, and then solve a couple of problems and calculate a few things in somewhat clever ways.

So the first point I want to make is that, to a large extent, whatever is happening in our chapter on discrete random variables is just an exercise in notation. There are concepts that you are already familiar with -- probabilities, probabilities of two things happening, conditional probabilities. And all that we're doing, to some extent, is rewriting those familiar concepts in new notation.

So for example, this is the joint PMF of two random variables. It gives us, for any pair of possible values of those random variables, the probability that that pair occurs simultaneously. So it's the probability that simultaneously X takes that value, and Y takes that other value. And similarly, we have the notion of the conditional PMF, which is just a list of the various conditional probabilities of interest: the conditional probability that one random variable takes this value given that the other random variable takes that value.
Now, a remark about conditional probabilities. Conditional probabilities generally are like ordinary probabilities. You condition on something particular. So here we condition on a particular y. So think of little y as a fixed quantity, and then look at this as a function of x. So given that y, which we condition on -- given our new universe -- we're considering the various possibilities for x and the probabilities that they have.

Now, the probabilities over all x's, of course, need to add to 1. So we should have a relation of this kind: the sum over all x of p_X|Y(x|y) equals 1. So they're just like ordinary probabilities over the different x's, in a universe where we are told the value of the random variable Y.

Now, how are these related? So we call these the marginal, these the joint, these the conditional. And there are some relations between them. For example, to find the marginal from the joint, it's pretty straightforward. The probability that X takes a particular value is the sum of the probabilities of all of the different ways that this particular value may occur. What are the different ways?
Well, it may occur together with a certain y, or together with some other y, or together with some other y. So you look at all the possible y's that can go together with this x, and add the probabilities of all of those pairs for which we get this particular value of x.

And then there's a relation that connects these two probabilities with the conditional probability. And it's this relation. It's nothing new. It's just new notation for writing what we already know: that the probability of two things happening is the probability that the first thing happens, and then, given that the first thing happens, the probability that the second one happens.

So how do we go from one to the other? Think of A as being the event that X takes the value little x, and B as being the event that Y takes the value little y. So the joint probability is the probability that these two things happen simultaneously. It's the probability that X takes this value, times the conditional probability that Y takes this value given that X took that first value. So it's the familiar multiplication rule, but just transcribed in our new notation. So nothing new so far.
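[Editor's note: a minimal numeric sketch of the marginal-from-joint sum and the multiplication rule described above. The joint PMF values here are hypothetical, chosen only to illustrate the bookkeeping.]

```python
# A hypothetical joint PMF of two discrete random variables X and Y,
# stored as a dict mapping (x, y) pairs to probabilities.
p_xy = {(1, 1): 0.1, (1, 2): 0.3, (2, 1): 0.2, (2, 2): 0.4}

# Marginal of X: for each x, sum the joint over all y's it can pair with.
p_x = {}
for (x, y), p in p_xy.items():
    p_x[x] = p_x.get(x, 0.0) + p

# Conditional PMF of Y given X = x, from the multiplication rule
# p_XY(x, y) = p_X(x) * p_Y|X(y | x), rearranged.
p_y_given_x = {(x, y): p / p_x[x] for (x, y), p in p_xy.items()}
```

Note that for each fixed x, the conditional probabilities p_Y|X(y | x) add up to 1, exactly as the normalization remark above requires.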
OK, why did we go through this exercise and this notation? It's because in the real-world experiments we're interested in, typically there are going to be lots of uncertain quantities. There are going to be multiple random variables. And we want to be able to talk about them simultaneously.

Okay. Why two and not more than two? How about three random variables? Well, if you understand what's going on in this slide, you should be able to more or less automatically generalize it to the case of multiple random variables. So for example, if we have three random variables, X, Y, and Z, and you see an expression like this, it should be clear what it means. It's the probability that X takes this value, and simultaneously Y takes that value, and simultaneously Z takes that value. I guess that's an uppercase Z here, and that's a lowercase z.

And if I ask you to find the marginal of X -- if I tell you the joint PMF of the three random variables and I ask you for this value -- how would you find it? Well, you will try to generalize this relation here.
The probability that x occurs is the sum of the probabilities of all events that make X take that particular value. So what are all those events? Well, this particular x can happen together with some y and some z. We don't care which y and z; any y and z will do. So when we consider all possibilities, we need to add here over all possible values of the y's and z's. So consider all triples (x, y, z): fix x, consider all the possibilities for the remaining variables y and z, add these up, and that gives you the marginal PMF of X.

And then there are other things that you can do. This is the multiplication rule for two events. We saw back in chapter one that there's a multiplication rule when you talk about more than two events, and you can write a chain of conditional probabilities. We can certainly do the same in our new notation. So let's look at this rule up here: the multiplication rule for three random variables. What does it say?
The probability of three things happening simultaneously -- X, Y, Z taking specific values little x, little y, little z -- that probability is the probability that the first thing happens, that X takes that value. Given that X takes that value, we multiply by the conditional probability that Y also takes a certain value. And now, given that X and Y have taken those particular values, we multiply by the conditional probability that the third thing happens, given that the first two things happened.

So this is just the multiplication rule for three events, which would be: the probability of A intersection B intersection C equals -- you know the rest of the formula. You just rewrite that formula in PMF notation. The probability of A intersection B intersection C is the probability of A, which corresponds to this term, times the probability of B given A, times the probability of C given A and B.

So what else is there that's left from chapter one that we can or should generalize to random variables? Well, there's the notion of independence. So let's define what independence means.
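[Editor's note: the same bookkeeping for three random variables, as a sketch. The joint PMF here is an assumed uniform one, purely for illustration: the marginal of X sums over all (y, z) pairs, and the chain rule rebuilds the joint from a marginal and two conditionals.]

```python
from itertools import product

# Hypothetical joint PMF of three binary random variables X, Y, Z:
# uniform over the 8 triples, just to illustrate the mechanics.
p_xyz = {(x, y, z): 1 / 8 for x, y, z in product([0, 1], repeat=3)}

# Marginal of X: fix x, sum over every (y, z) pair it can occur with.
def marginal_x(x):
    return sum(p for (x2, y, z), p in p_xyz.items() if x2 == x)

# Chain rule: p(x, y, z) = p_X(x) * p_Y|X(y|x) * p_Z|X,Y(z|x,y).
def p_y_given_x(y, x):
    joint_xy = sum(p for (x2, y2, z), p in p_xyz.items() if (x2, y2) == (x, y))
    return joint_xy / marginal_x(x)

def p_z_given_xy(z, x, y):
    joint_xy = sum(p for (x2, y2, z2), p in p_xyz.items() if (x2, y2) == (x, y))
    return p_xyz[(x, y, z)] / joint_xy
```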
Instead of talking about just two random variables, let's go directly to the case of multiple random variables. When we talked about events, things were a little complicated. We had a simple definition for independence of two events: two events are independent if the probability of both is equal to the product of the probabilities. But for three events, it was kind of messy; we needed to write down lots of conditions.

For random variables, things in some sense are a little simpler. We only need to write down one formula and take this as the definition of independence. Three random variables are independent if and only if, by definition, their joint probability mass function factors into the product of the individual probability mass functions. So the probability that all three things happen is the product of the individual probabilities that each one of these three things happens.

So independence means, mathematically, that you can just multiply probabilities to get the probability of several things happening simultaneously.

So with three events, we had to write a whole collection of equalities that have to hold.
How can it be that with random variables we can manage with only one equality? Well, the catch is that this is not really just one equality. We require this to be true for every little x, y, and z. So in some sense, this is a bunch of conditions being put on the joint PMF, a bunch of conditions that we need to check.

So this is the mathematical definition. What is the intuitive content of this definition? The intuitive content is the same as for events. Random variables are independent if knowing something about the realized values of some of these random variables does not change our beliefs about the likelihood of various values for the remaining random variables.

So independence would translate, for example, to a condition such as: the conditional PMF of X given y should be equal to the marginal PMF of X. What is this saying? That you have some original beliefs about how likely it is for X to take this value. Now, someone comes and tells you that Y took on a certain value. This causes you, in principle, to revise your beliefs.
And your new beliefs will be captured by the conditional PMF, or the conditional probabilities. Independence means that your revised beliefs actually will be the same as your original beliefs. Telling you information about the value of Y doesn't change what you expect for the random variable X.

Why didn't we use this as the definition of independence? Well, because this definition only makes sense when the conditional is well-defined. And the conditional is only well-defined if the event that Y takes on that particular value has positive probability. We cannot condition on events that have zero probability, so conditional probabilities are only defined for y's that are likely to occur, that have positive probability.

Now, similarly, with multiple random variables, if they're independent, you would have relations such as: the conditional of X, given y and z, should be the same as the marginal of X. What is this saying? Again, that if I tell you the realized values of random variables Y and Z, this is not going to change your beliefs about how likely x is to occur. Whatever you believed in the beginning, you're going to believe the same thing afterwards.
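[Editor's note: to see how "one formula, many conditions" plays out, here is a small sketch with made-up marginals. It checks the factorization at every triple (x, y, z), which is exactly what the definition demands.]

```python
from itertools import product

# Build a joint PMF that factors by construction: X, Y, Z independent,
# with hypothetical marginals chosen only for illustration.
p_x = {0: 0.3, 1: 0.7}
p_y = {0: 0.5, 1: 0.5}
p_z = {0: 0.2, 1: 0.8}
p_xyz = {(x, y, z): p_x[x] * p_y[y] * p_z[z]
         for x, y, z in product([0, 1], repeat=3)}

# The definition of independence is one formula, but it must hold for
# EVERY triple (x, y, z) -- so it is really many conditions at once.
def independent(p_xyz, p_x, p_y, p_z, tol=1e-12):
    return all(abs(p_xyz[(x, y, z)] - p_x[x] * p_y[y] * p_z[z]) < tol
               for x, y, z in p_xyz)
```

Perturbing even a single cell of the joint breaks the factorization, so the check fails.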
So it's important to keep that intuition in mind, because sometimes this way you can tell whether random variables are independent without having to do calculations and check this formula.

OK, so let's check our concepts with a simple example. Let's look at two random variables that are discrete and take values between one and four each. And this is a table that gives us the joint PMF. So it tells us, for example, the probability of X equals 2 and Y equals 1 happening simultaneously: it's an event that has probability 1/20.

Are these two random variables independent? You can try to check a condition like this. But can we tell directly from the table? If I tell you a value of Y, could that give you useful information about X? Certainly. If I tell you that Y is equal to 1, this tells you that X must be equal to 2. But if I tell you that Y is equal to 3, this tells you that X could still be anything. So telling you the value of Y changes what you expect, or what you consider possible, for the values of the other random variable.
So by just inspecting the table, we can tell that the random variables are not independent.

Okay. What's the other concept we introduced in chapter one? We introduced the concept of conditional independence. And conditional independence is like ordinary independence, but applied to a conditional universe where we're given some information. So suppose someone tells you that the outcome of the experiment is such that X is less than or equal to 2 and Y is larger than or equal to 3. So we are given the information that we now live inside this universe.

So what happens inside this universe? Inside this universe, our random variables are going to have a new joint PMF, which is conditioned on the event that we were told has occurred. So let A correspond to this event here. And now we're dealing with conditional probabilities. What are those conditional probabilities? We can put them in a table. So it's a two-by-two table, since we only have two possible values for each variable. What are they going to be? Well, these probabilities show up in the ratios 1, 2, 2, and 4.
Those ratios have to stay the same, and the probabilities need to add up to one. So what should the denominators be, since these numbers add up to nine? These are the conditional probabilities. So this is the conditional PMF in this example.

Now, in this conditional universe, is X independent from Y? If I tell you that Y takes this value, so we live in this universe, what do you know about X? What you know about X is that this value is twice as likely as that value. If I condition on Y taking this value, so we're living here, what do you know about X? What you know about X is that this value is twice as likely as that value. So it's the same. Whether we live here or we live there, this x is twice as likely as that x.

So the conditional PMF of X given y in the new universe is the same as the marginal PMF of X -- but of course in the new universe. So no matter what y is, the conditional PMF of X is the same. And that conditional PMF is 1/3 and 2/3. This is the conditional PMF of X in the new universe, no matter what y occurs.
So Y does not give us any information about X; it doesn't cause us to change our beliefs inside this little universe. And therefore the two random variables are independent.

Now, the other way that you can verify that we have independence is to find the marginal PMFs of the two random variables. The marginal PMF of X: you find it by adding those two terms, and you get 1/3; adding those two terms, you get 2/3. The marginal PMF of Y: you add these two terms and get 1/3, and the marginal PMF of Y here is going to be 2/3. And then you ask the question: is the joint the product of the marginals? And indeed it is. This times this gives you 1/9. This times this gives you 2/9. So each value in the table of the joint PMF is the product of the marginal PMFs of X and Y in this universe, so the two random variables are independent inside this universe. So we say that they're conditionally independent.

All right. Now let's move to the new topic, to the new concept that we introduce in this chapter, which is the concept of expectations.
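[Editor's note: the numbers in this example can be checked mechanically. The two-by-two conditional table below uses the 1:2:2:4 ratios normalized by 9, as in the lecture; the particular value labels for x and y are just illustrative placeholders.]

```python
# Conditional joint PMF inside the event A, from the 1:2:2:4 ratios
# normalized by 9. Rows are x in {1, 2}, columns are y in {3, 4}.
p_ab = {(1, 3): 1/9, (1, 4): 2/9, (2, 3): 2/9, (2, 4): 4/9}

# Marginals inside the conditional universe.
p_x = {x: sum(p for (x2, y), p in p_ab.items() if x2 == x) for x in (1, 2)}
p_y = {y: sum(p for (x, y2), p in p_ab.items() if y2 == y) for y in (3, 4)}

# Conditional independence given A: every joint entry equals the
# product of the corresponding marginals.
cond_independent = all(abs(p_ab[(x, y)] - p_x[x] * p_y[y]) < 1e-12
                       for (x, y) in p_ab)
```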
So what are the things to know here? One is the general idea. The way to think about the expectation is that it's something like the average value of a random variable if you do an experiment over and over, and if you interpret probabilities as frequencies. So a particular value, little x, gets realized over and over with a certain frequency, p(x). And each time that this happens, you get x dollars. How many dollars do you get on the average? Well, this formula gives you that average.

So the first thing we do is to write down a definition for this concept. But then the other things you need to know are how to calculate expectations, using shortcuts sometimes, and what properties they have.

The most important shortcut is that, if you want to calculate the expected value -- the average value -- of a function of random variables, you do not need to find the PMF of that new random variable. You can work directly with the x's and the y's. So you do the experiment over and over. The outcome of the experiment is a pair (x, y).
And each time that a certain (x, y) happens, you get so many dollars. So this fraction of the time, a certain (x, y) happens, and that fraction of the time you get so many dollars; so this is the average number of dollars that you get. And since it is the average, it corresponds to the expected value. Now, this is something that, of course, needs a little bit of mathematical proof. But it is just a different way of doing the accounting, and it turns out to give you the right answer. And it's a very useful shortcut.

Now, when we're talking about functions of random variables, in general, we cannot just push expectations through functions. That is, the expected value of a function of a random variable is not the same as the function of the expected value. A function of averages is not the same as the average of the function. So in general, this is not true. But what is important is to know the exceptions to this rule. And the important exceptions are mainly two. One is the case of linear functions of a random variable. We discussed this last time.
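[Editor's note: a small sketch of the expected value rule, with a hypothetical joint PMF and an arbitrary function g. Computing E[g(X, Y)] directly from the joint gives the same answer as first deriving the PMF of the new random variable g(X, Y).]

```python
# Hypothetical joint PMF of (X, Y), purely for illustration.
p_xy = {(1, 1): 0.25, (1, 2): 0.25, (2, 1): 0.25, (2, 2): 0.25}

def g(x, y):            # any function of the pair will do
    return x * y + 1

# Shortcut: weight each value g(x, y) by how often the pair (x, y) occurs.
e_g = sum(g(x, y) * p for (x, y), p in p_xy.items())

# Long route: first build the PMF of the random variable g(X, Y)...
p_g = {}
for (x, y), p in p_xy.items():
    p_g[g(x, y)] = p_g.get(g(x, y), 0.0) + p
# ...then take its expectation. Both routes agree; the shortcut just
# does the same accounting in a different order.
e_g_via_pmf = sum(v * p for v, p in p_g.items())
```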
So for the expected value of temperature in Celsius, you first find the expected value of temperature in Fahrenheit, and then you do the conversion to Celsius. Whether you first average and then do the conversion to the new units, or convert first and then average, you should get the same result.

The other property that turns out to be true when you talk about multiple random variables is that expectation still behaves linearly. So let X, Y, and Z be the scores of a random student on each one of the three sections of the SAT. Then the overall SAT score is X plus Y plus Z, and this is the average score, the average total SAT score. Another way to calculate that average is to look at the first section of the SAT and see what the average was, look at the second section and see what the average was, and likewise the third, and add the averages. So you can do the averages for each section separately and add them, or you can find the total score for each student and average those.

So I guess you probably believe that this is correct if you're talking just about averaging scores. Since expectations are just a variation on averages, it turns out that this is also true in general.
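[Editor's note: the SAT argument can be mimicked with a few made-up score triples. Adding per-section averages agrees with averaging per-student totals, and note that no independence between sections is assumed anywhere.]

```python
# Hypothetical (section1, section2, section3) scores for four students.
scores = [(700, 650, 690), (520, 580, 600), (760, 720, 710), (610, 640, 590)]

def avg(vals):
    return sum(vals) / len(vals)

# Average each section separately, then add the three averages...
sum_of_averages = sum(avg([s[i] for s in scores]) for i in range(3))

# ...or total each student's score first, then average the totals.
average_of_sums = avg([sum(s) for s in scores])
```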
421 00:23:16,010 --> 00:23:19,760 And the derivation of this is very simple, based on the 422 00:23:19,760 --> 00:23:21,320 expected value rule. 423 00:23:21,320 --> 00:23:24,450 And you can look at it in the notes. 424 00:23:24,450 --> 00:23:27,740 So this is one exception, which is linearity. 425 00:23:27,740 --> 00:23:31,540 The second important exception is the case of independent 426 00:23:31,540 --> 00:23:34,520 random variables, that the product of two random 427 00:23:34,520 --> 00:23:37,830 variables has an expectation which is the product of the 428 00:23:37,830 --> 00:23:38,980 expectations. 429 00:23:38,980 --> 00:23:41,400 In general, this is not true. 430 00:23:41,400 --> 00:23:47,010 But for the case where we have independence, the expectation 431 00:23:47,010 --> 00:23:48,080 works out as follows. 432 00:23:48,080 --> 00:23:55,130 Using the expected value rule, this is how you calculate the 433 00:23:55,130 --> 00:23:59,170 expected value of a function of a random variable. 434 00:23:59,170 --> 00:24:04,810 So think of this as being your g(X, Y) and this being your 435 00:24:04,810 --> 00:24:06,160 g(little x, y). 436 00:24:06,160 --> 00:24:08,760 So this is something that's generally true. 437 00:24:08,760 --> 00:24:20,350 Now, if we have independence, then the PMFs factor out, and 438 00:24:20,350 --> 00:24:25,660 then you can separate this sum by bringing together the x 439 00:24:25,660 --> 00:24:30,130 terms, bring them outside the y summation. 440 00:24:30,130 --> 00:24:34,370 And you find that this is the same as expected value of X 441 00:24:34,370 --> 00:24:38,890 times the expected value of Y. So independence is used in 442 00:24:38,890 --> 00:24:40,140 this step here. 443 00:24:44,020 --> 00:24:48,640 OK, now what if X and Y are independent, but instead of 444 00:24:48,640 --> 00:24:51,020 taking the expectation of X times Y, we take the 445 00:24:51,020 --> 00:24:56,600 expectation of the product of two functions of X and Y? 
446 00:24:56,600 --> 00:24:59,560 I claim that the expected value of the product is still 447 00:24:59,560 --> 00:25:02,630 going to be the product of the expected values. 448 00:25:02,630 --> 00:25:04,180 How do we show that? 449 00:25:04,180 --> 00:25:09,230 We could show it by just redoing this derivation here. 450 00:25:09,230 --> 00:25:13,500 Instead of X and Y, we would have g(X) and h(Y), so the 451 00:25:13,500 --> 00:25:14,850 algebra goes through. 452 00:25:14,850 --> 00:25:17,720 But there's a better way to think about it which is more 453 00:25:17,720 --> 00:25:18,960 conceptual. 454 00:25:18,960 --> 00:25:20,886 And here's the idea. 455 00:25:20,886 --> 00:25:25,750 If X and Y are independent, what does it mean? 456 00:25:25,750 --> 00:25:31,180 X does not convey any information about Y. If X 457 00:25:31,180 --> 00:25:36,350 conveys no information about Y, does X convey information 458 00:25:36,350 --> 00:25:40,500 about h(Y)? 459 00:25:40,500 --> 00:25:41,940 No. 460 00:25:41,940 --> 00:25:46,160 If X tells me nothing about Y, nothing new, it shouldn't tell 461 00:25:46,160 --> 00:25:50,580 me anything about h(Y). 462 00:25:50,580 --> 00:25:59,270 Now, if X tells me nothing about h(Y), could g(X) 463 00:25:59,270 --> 00:26:01,470 tell me something about h(Y)? 464 00:26:01,470 --> 00:26:02,250 No. 465 00:26:02,250 --> 00:26:06,780 So the idea is that, if X is unrelated to Y, doesn't have 466 00:26:06,780 --> 00:26:11,080 any useful information, then g(X) could not have any useful 467 00:26:11,080 --> 00:26:13,250 information for h(Y). 468 00:26:13,250 --> 00:26:21,030 So if X and Y are independent, then g(X) and h(Y) are also 469 00:26:21,030 --> 00:26:22,280 independent. 470 00:26:27,150 --> 00:26:29,430 So this is something that one can try to prove 471 00:26:29,430 --> 00:26:31,500 mathematically, but it's more important to understand 472 00:26:31,500 --> 00:26:34,530 conceptually why this is so. 
473 00:26:34,530 --> 00:26:38,220 It's in terms of conveying information. 474 00:26:38,220 --> 00:26:44,950 So if X tells me nothing about Y, X cannot tell me anything 475 00:26:44,950 --> 00:26:48,490 about Y cubed, or X cannot tell me anything about Y 476 00:26:48,490 --> 00:26:51,030 squared, and so on. 477 00:26:51,030 --> 00:26:52,260 That's the idea. 478 00:26:52,260 --> 00:26:57,180 And once we are convinced that g(X) and h(Y) are independent, 479 00:26:57,180 --> 00:27:00,550 then we can apply our previous rule, that for independent 480 00:27:00,550 --> 00:27:04,390 random variables, expectations multiply the right way. 481 00:27:04,390 --> 00:27:08,660 Apply the previous rule, but apply it now to these two 482 00:27:08,660 --> 00:27:10,490 independent random variables. 483 00:27:10,490 --> 00:27:12,785 And we get the conclusion that we wanted. 484 00:27:15,500 --> 00:27:19,050 Now, besides expectations, we also introduced the concept of 485 00:27:19,050 --> 00:27:20,300 the variance. 486 00:27:23,560 --> 00:27:27,450 And if you remember the definition of the variance, 487 00:27:27,450 --> 00:27:31,100 let me write down the formula for the variance of aX. 488 00:27:31,100 --> 00:27:34,920 It's the expected value of the random variable that we're 489 00:27:34,920 --> 00:27:39,630 looking at minus the expected value of the random variable 490 00:27:39,630 --> 00:27:42,050 that we're looking at. 491 00:27:42,050 --> 00:27:44,780 So this is the difference of the random 492 00:27:44,780 --> 00:27:47,850 variable from its mean. 493 00:27:47,850 --> 00:27:50,880 And we take that difference and square it, so it's the 494 00:27:50,880 --> 00:27:53,070 squared distance from the mean, and then take 495 00:27:53,070 --> 00:27:55,250 expectations of the whole thing. 496 00:27:55,250 --> 00:27:59,570 So when you look at that expression, you realize that a 497 00:27:59,570 --> 00:28:01,780 can be pulled out of those expressions. 
498 00:28:04,540 --> 00:28:10,340 And because there is a squared, when you pull out the 499 00:28:10,340 --> 00:28:12,980 a, it's going to come out as an a-squared. 500 00:28:12,980 --> 00:28:16,050 So that gives us the rule for finding the variance of a 501 00:28:16,050 --> 00:28:18,990 scalar multiple of a random variable. 502 00:28:18,990 --> 00:28:22,370 The variance captures the idea of how wide, how spread out a 503 00:28:22,370 --> 00:28:24,210 certain distribution is. 504 00:28:24,210 --> 00:28:26,600 Bigger variance means it's more spread out. 505 00:28:26,600 --> 00:28:29,360 Now, if you take a random variable and add a constant to 506 00:28:29,360 --> 00:28:31,960 it, what does it do to its distribution? 507 00:28:31,960 --> 00:28:35,480 It just shifts it, but it doesn't change its width. 508 00:28:35,480 --> 00:28:37,140 So intuitively it means that the 509 00:28:37,140 --> 00:28:39,030 variance should not change. 510 00:28:39,030 --> 00:28:42,360 You can check that mathematically, but it should 511 00:28:42,360 --> 00:28:44,290 also make sense intuitively. 512 00:28:44,290 --> 00:28:47,710 So the variance, when you add a constant, does not change. 513 00:28:47,710 --> 00:28:51,680 Now, can you add variances in the way we added expectations? 514 00:28:51,680 --> 00:28:54,760 Does variance behave linearly? 515 00:28:54,760 --> 00:28:57,810 It turns out that you cannot always. 516 00:28:57,810 --> 00:28:59,270 Here, we need a condition. 517 00:28:59,270 --> 00:29:03,880 It's only in special cases-- 518 00:29:03,880 --> 00:29:06,210 for example, when the two random variables are 519 00:29:06,210 --> 00:29:07,190 independent-- 520 00:29:07,190 --> 00:29:09,300 that you can add variances. 521 00:29:09,300 --> 00:29:13,300 The variance of the sum is the sum of the variances if X and 522 00:29:13,300 --> 00:29:15,370 Y are independent. 523 00:29:15,370 --> 00:29:18,880 The derivation of this is, again, very short and simple. 
524 00:29:18,880 --> 00:29:22,590 We'll skip it, but it's an important fact to remember. 525 00:29:22,590 --> 00:29:26,140 Now, to appreciate why this equality is not always true, 526 00:29:26,140 --> 00:29:28,980 we can think of some extreme examples. 527 00:29:28,980 --> 00:29:32,250 Suppose that X is the same as Y. What's going to be the 528 00:29:32,250 --> 00:29:34,520 variance of X plus Y? 529 00:29:34,520 --> 00:29:39,810 Well, X plus Y, in this case, is the same as 2X, so we're 530 00:29:39,810 --> 00:29:44,620 going to get 4 times the variance of X, which is 531 00:29:44,620 --> 00:29:49,770 different than the variance of X plus the variance of X. 532 00:29:49,770 --> 00:29:52,920 So that expression would give us twice the variance of X. 533 00:29:52,920 --> 00:29:56,460 But actually now it's 4 times the variance of X. The other 534 00:29:56,460 --> 00:30:01,990 extreme would be if X is equal to -Y. Then the variance of X 535 00:30:01,990 --> 00:30:05,390 plus Y is the variance of the random variable which is always 536 00:30:05,390 --> 00:30:07,020 equal to 0. 537 00:30:07,020 --> 00:30:09,980 Now, a random variable which is always equal to 0 has no 538 00:30:09,980 --> 00:30:10,700 uncertainty. 539 00:30:10,700 --> 00:30:14,570 It is always equal to its mean value, so the variance, in 540 00:30:14,570 --> 00:30:17,090 this case, turns out to be 0. 541 00:30:17,090 --> 00:30:19,940 So in both of these cases, of course we have random 542 00:30:19,940 --> 00:30:23,020 variables that are extremely dependent. 543 00:30:23,020 --> 00:30:24,740 Why are they dependent? 544 00:30:24,740 --> 00:30:27,940 Because if I tell you something about Y, it tells 545 00:30:27,940 --> 00:30:32,020 you an awful lot about the value of X. There's a lot of 546 00:30:32,020 --> 00:30:34,910 information about X if I tell you Y, in this 547 00:30:34,910 --> 00:30:37,050 case or in that case. 548 00:30:37,050 --> 00:30:39,940 And finally, a short drill. 
549 00:30:39,940 --> 00:30:42,570 If I tell you that the random variables are independent and 550 00:30:42,570 --> 00:30:44,840 you want to calculate the variance of a linear 551 00:30:44,840 --> 00:30:48,330 combination of this kind, then how do you argue? 552 00:30:48,330 --> 00:30:51,940 You argue that, since X and Y are independent, this means 553 00:30:51,940 --> 00:30:55,660 that X and -3Y are also independent. 554 00:30:55,660 --> 00:30:59,610 X has no information about Y, so X has no information about 555 00:30:59,610 --> 00:31:05,000 -Y. X has no information about -Y, so X should not have any 556 00:31:05,000 --> 00:31:10,270 information about -3Y. 557 00:31:10,270 --> 00:31:14,400 So X and -3Y are independent. 558 00:31:14,400 --> 00:31:18,480 So the variance of Z should be the variance of X plus the 559 00:31:18,480 --> 00:31:26,910 variance of -3Y, which is the variance of X plus 9 times the 560 00:31:26,910 --> 00:31:31,760 variance of Y. The important thing to note here is that no 561 00:31:31,760 --> 00:31:34,080 matter what happens, you end up getting a 562 00:31:34,080 --> 00:31:37,000 plus here, not a minus. 563 00:31:37,000 --> 00:31:41,160 So that's the sort of important thing to remember in 564 00:31:41,160 --> 00:31:42,410 this type of calculation. 565 00:31:44,820 --> 00:31:48,890 So this has been all concepts, reviews, new 566 00:31:48,890 --> 00:31:50,390 concepts and all that. 567 00:31:50,390 --> 00:31:52,720 It's the usual fire hose. 568 00:31:52,720 --> 00:31:56,680 Now let's use them to do something useful finally. 569 00:31:56,680 --> 00:31:59,220 So let's revisit our old example, the binomial 570 00:31:59,220 --> 00:32:03,350 distribution, which counts the number of successes in 571 00:32:03,350 --> 00:32:06,230 independent trials of a coin. 572 00:32:06,230 --> 00:32:09,030 It's a biased coin that has a probability of heads, or 573 00:32:09,030 --> 00:32:13,000 probability of success, equal to p at each trial. 
574 00:32:13,000 --> 00:32:16,160 Finally, we can go through the exercise of calculating the 575 00:32:16,160 --> 00:32:18,820 expected value of this random variable. 576 00:32:18,820 --> 00:32:21,790 And there's the way of calculating that expectation 577 00:32:21,790 --> 00:32:24,260 that would be the favorite of those people who enjoy 578 00:32:24,260 --> 00:32:27,500 algebra, which is to write down the definition of the 579 00:32:27,500 --> 00:32:28,740 expected value. 580 00:32:28,740 --> 00:32:31,980 We add over all possible values of the random variable, 581 00:32:31,980 --> 00:32:35,580 over all the possible k's, and weigh them according to the 582 00:32:35,580 --> 00:32:38,440 probabilities that this particular k occurs. 583 00:32:38,440 --> 00:32:42,250 The probability that X takes on a particular value k is, of 584 00:32:42,250 --> 00:32:44,820 course, the binomial PMF, which is 585 00:32:44,820 --> 00:32:47,560 this familiar formula. 586 00:32:47,560 --> 00:32:50,480 Clearly, that would be a messy and challenging calculation. 587 00:32:50,480 --> 00:32:52,490 Can we find a shortcut? 588 00:32:52,490 --> 00:32:54,010 There's a very clever trick. 589 00:32:54,010 --> 00:32:56,690 There's lots of problems in probability that you can 590 00:32:56,690 --> 00:33:00,000 approach really nicely by breaking up the random 591 00:33:00,000 --> 00:33:03,830 variable of interest into a sum of simpler and more 592 00:33:03,830 --> 00:33:06,010 manageable random variables. 593 00:33:06,010 --> 00:33:09,700 And if you can make it to be a sum of random variables that 594 00:33:09,700 --> 00:33:12,590 are just 0's or 1's, so much the better. 595 00:33:12,590 --> 00:33:13,990 Life is easier. 596 00:33:13,990 --> 00:33:16,850 Random variables that take values 0 or 1, we call them 597 00:33:16,850 --> 00:33:18,380 indicator variables. 598 00:33:18,380 --> 00:33:21,700 They indicate whether an event has occurred or not. 
599 00:33:21,700 --> 00:33:25,600 In this case, we look at each coin flip one at a time. 600 00:33:25,600 --> 00:33:29,710 For the i-th flip, if it resulted in heads or a 601 00:33:29,710 --> 00:33:32,110 success, we record a 1. 602 00:33:32,110 --> 00:33:34,220 If not, we record a 0. 603 00:33:34,220 --> 00:33:37,540 And then we look at the random variable. 604 00:33:37,540 --> 00:33:42,580 If we take the sum of the Xi's, what is it going to be? 605 00:33:42,580 --> 00:33:48,030 We add one each time that we get a success, so the sum is 606 00:33:48,030 --> 00:33:50,820 going to be the total number of successes. 607 00:33:50,820 --> 00:33:53,900 So we break up the random variable of interest as a sum 608 00:33:53,900 --> 00:33:57,610 of really nice and simple random variables. 609 00:33:57,610 --> 00:34:00,380 And now we can use the linearity of expectations. 610 00:34:00,380 --> 00:34:02,800 We're going to find the expectation of X by finding 611 00:34:02,800 --> 00:34:05,700 the expectation of the Xi's and then adding the 612 00:34:05,700 --> 00:34:06,770 expectations. 613 00:34:06,770 --> 00:34:09,520 What's the expected value of Xi? 614 00:34:09,520 --> 00:34:13,050 Well, Xi takes the value 1 with probability p, and takes 615 00:34:13,050 --> 00:34:15,610 the value 0 with probability 1-p. 616 00:34:15,610 --> 00:34:19,070 So the expected value of Xi is just p. 617 00:34:19,070 --> 00:34:24,889 So the expected value of X is going to be just n times p. 618 00:34:24,889 --> 00:34:29,560 Because X is the sum of n terms, each one of which has 619 00:34:29,560 --> 00:34:33,050 expectation p, the expected value of the sum is the sum of 620 00:34:33,050 --> 00:34:34,600 the expected values. 621 00:34:34,600 --> 00:34:38,440 So I guess that's a pretty good shortcut for doing this 622 00:34:38,440 --> 00:34:40,790 horrendous calculation up there. 
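[If you want to convince yourself numerically that the indicator shortcut agrees with the direct binomial sum, here is a small check, not from the lecture; the values n = 10 and p = 0.3 are arbitrary choices.]

```python
from math import comb

# Compare the indicator shortcut E[X] = n*p against the direct sum
# sum over k of k * C(n,k) * p^k * (1-p)^(n-k), for arbitrary n and p.
n, p = 10, 0.3
direct = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
assert abs(direct - n * p) < 1e-12   # the messy sum collapses to n*p
```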
623 00:34:40,790 --> 00:34:47,210 So in case you didn't realize it, that's what we just 624 00:34:47,210 --> 00:34:51,940 established without doing any algebra. 625 00:34:51,940 --> 00:34:52,219 Good. 626 00:34:52,219 --> 00:34:56,150 How about the variance of Xi? 627 00:34:56,150 --> 00:34:57,570 Two ways to calculate it. 628 00:34:57,570 --> 00:35:01,160 One is by using directly the formula for the variance, 629 00:35:01,160 --> 00:35:02,370 which would be -- 630 00:35:02,370 --> 00:35:03,900 let's see what it would be. 631 00:35:03,900 --> 00:35:06,800 With probability p, you get a 1. 632 00:35:06,800 --> 00:35:11,270 And in this case, you are (1-p) away from the mean. 633 00:35:11,270 --> 00:35:13,950 That's your squared distance from the mean, (1-p) squared. 634 00:35:13,950 --> 00:35:18,750 With probability 1-p, you get a 0, which is p 635 00:35:18,750 --> 00:35:20,380 away from the mean. 636 00:35:20,380 --> 00:35:24,380 And then you can simplify that formula and get an answer. 637 00:35:24,380 --> 00:35:28,660 How about a slightly easier way of doing it? 638 00:35:28,660 --> 00:35:31,360 Instead of doing the algebra here, let me indicate the 639 00:35:31,360 --> 00:35:33,420 slightly easier way. 640 00:35:33,420 --> 00:35:36,070 We have a formula for the variance that tells us that we 641 00:35:36,070 --> 00:35:42,290 can find the variance as the expected value of Xi squared minus the square of the expected value of Xi. 642 00:35:42,290 --> 00:35:45,980 That's a formula that's generally true for variances. 643 00:35:45,980 --> 00:35:47,380 Why is this easier? 644 00:35:47,380 --> 00:35:49,560 What's the expected value of Xi squared? 645 00:35:52,240 --> 00:35:53,290 Backtrack. 646 00:35:53,290 --> 00:35:57,140 What is Xi squared, after all? 647 00:35:57,140 --> 00:35:59,510 It's the same thing as Xi. 648 00:35:59,510 --> 00:36:04,200 Since Xi takes value 0 and 1, Xi squared also takes the same 649 00:36:04,200 --> 00:36:05,780 values, 0 and 1. 
650 00:36:05,780 --> 00:36:09,050 So the expected value of Xi squared is the same as the 651 00:36:09,050 --> 00:36:11,990 expected value of Xi, which is equal to p. 652 00:36:15,120 --> 00:36:20,530 And the square of the expected value of Xi is p-squared, so we 653 00:36:20,530 --> 00:36:24,680 get the final answer, p times (1-p). 654 00:36:24,680 --> 00:36:28,630 If you were to work through and do the cancellations in 655 00:36:28,630 --> 00:36:32,400 this messy expression here, after one line you would also 656 00:36:32,400 --> 00:36:34,050 get to the same formula. 657 00:36:34,050 --> 00:36:38,240 But this sort of illustrates that working with this formula 658 00:36:38,240 --> 00:36:40,550 for the variance, sometimes things work 659 00:36:40,550 --> 00:36:43,090 out a little faster. 660 00:36:43,090 --> 00:36:45,420 Finally, are we in business? 661 00:36:45,420 --> 00:36:47,820 Can we calculate the variance of the random 662 00:36:47,820 --> 00:36:50,100 variable X as well? 663 00:36:50,100 --> 00:36:52,650 Well, we have the rule that for independent random 664 00:36:52,650 --> 00:36:55,680 variables, the variance of the sum is 665 00:36:55,680 --> 00:36:57,870 the sum of the variances. 666 00:36:57,870 --> 00:37:00,930 So to find the variance of X, we just need to add the 667 00:37:00,930 --> 00:37:02,960 variances of the Xi's. 668 00:37:02,960 --> 00:37:07,140 We have n Xi's, and each one of them has 669 00:37:07,140 --> 00:37:10,110 variance p times (1-p), so the variance of X is n times p times (1-p). 670 00:37:10,110 --> 00:37:12,290 And we are done. 671 00:37:12,290 --> 00:37:17,780 So this way, we have calculated both the mean and 672 00:37:17,780 --> 00:37:21,550 the variance of the binomial random variable. 673 00:37:21,550 --> 00:37:27,280 It's interesting to look at this particular formula and 674 00:37:27,280 --> 00:37:29,180 see what it tells us. 675 00:37:29,180 --> 00:37:33,470 If you are to plot the variance of X as a function of 676 00:37:33,470 --> 00:37:36,050 p, it has this shape. 
677 00:37:45,900 --> 00:37:51,310 And the maximum is here at 1/2. 678 00:37:51,310 --> 00:37:55,150 p times (1-p) is 0 when p is equal to 0 679 00:37:55,150 --> 00:37:58,570 and when p is equal to 1. And it's a quadratic, so it must have 680 00:37:58,570 --> 00:38:00,250 this particular shape. 681 00:38:00,250 --> 00:38:02,080 So what does it tell us? 682 00:38:02,080 --> 00:38:05,880 If you think about variance as a measure of uncertainty, it 683 00:38:05,880 --> 00:38:10,290 tells you that coin flips are most uncertain when 684 00:38:10,290 --> 00:38:12,620 your coin is fair. 685 00:38:12,620 --> 00:38:16,190 When p is equal to 1/2, that's when you have the most 686 00:38:16,190 --> 00:38:17,050 randomness. 687 00:38:17,050 --> 00:38:18,790 And this is kind of intuitive. 688 00:38:18,790 --> 00:38:21,460 If, on the other hand, I tell you that the coin is extremely 689 00:38:21,460 --> 00:38:26,490 biased, p very close to 1, which means it almost always 690 00:38:26,490 --> 00:38:29,460 gives you heads, then that would be 691 00:38:29,460 --> 00:38:30,630 a case of low variance. 692 00:38:30,630 --> 00:38:32,870 There's low variability in the results. 693 00:38:32,870 --> 00:38:35,270 There's little uncertainty about what's going to happen. 694 00:38:35,270 --> 00:38:39,570 It's going to be mostly heads with some occasional tails. 695 00:38:39,570 --> 00:38:42,010 So p equals 1/2. 696 00:38:42,010 --> 00:38:45,350 Fair coin, that's the coin which is the most uncertain of 697 00:38:45,350 --> 00:38:47,240 all coins, in some sense. 698 00:38:47,240 --> 00:38:49,240 And it corresponds to the biggest variance. 699 00:38:49,240 --> 00:38:53,760 It corresponds to an X that has the widest distribution. 
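[Both facts, that the binomial variance is n times p times (1-p) and that it peaks at p = 1/2, can be confirmed numerically; this small check is my own illustration, with n and the grid of p values chosen arbitrarily.]

```python
from math import comb

def binom_var(n, p):
    # Variance straight from the definition: E[(X - E[X])^2] under the
    # binomial PMF, with no shortcut formula.
    mean = n * p
    return sum((k - mean)**2 * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

n = 8
for p in (0.1, 0.5, 0.9):
    # The definition agrees with the indicator-based answer n*p*(1-p).
    assert abs(binom_var(n, p) - n * p * (1 - p)) < 1e-9

# p*(1-p) is largest at p = 1/2: the fair coin gives the widest distribution.
grid = [i / 100 for i in range(101)]
assert max(grid, key=lambda p: p * (1 - p)) == 0.5
```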
700 00:38:53,760 --> 00:38:57,680 Now that we're on a roll and we can calculate such hugely 701 00:38:57,680 --> 00:39:01,400 complicated sums in simple ways, let us try to push our 702 00:39:01,400 --> 00:39:05,100 luck and do a problem with this flavor, but a little 703 00:39:05,100 --> 00:39:06,590 harder than that. 704 00:39:06,590 --> 00:39:07,960 So you go to one of those 705 00:39:07,960 --> 00:39:09,910 old-fashioned cocktail parties. 706 00:39:09,910 --> 00:39:16,010 All the males, at least, have those standard big hats, which 707 00:39:16,010 --> 00:39:16,990 look identical. 708 00:39:16,990 --> 00:39:19,700 They check them in when they walk in. 709 00:39:19,700 --> 00:39:23,390 And when they walk out, since they look pretty identical, 710 00:39:23,390 --> 00:39:26,830 they just pick a random hat and go home. 711 00:39:26,830 --> 00:39:31,080 So n people, they pick their hats completely at random, 712 00:39:31,080 --> 00:39:33,950 quote, unquote, and then leave. 713 00:39:33,950 --> 00:39:36,970 And the question is to say something about the number of 714 00:39:36,970 --> 00:39:42,070 people who end up, by accident or by luck, getting back their 715 00:39:42,070 --> 00:39:45,170 own hat, the exact same hat that they checked in. 716 00:39:45,170 --> 00:39:48,490 OK, first, what do we mean by completely at random? 717 00:39:48,490 --> 00:39:51,060 By completely at random, we basically mean that any 718 00:39:51,060 --> 00:39:54,180 permutation of the hats is equally likely. 719 00:39:54,180 --> 00:39:58,520 Any way of distributing those n hats to the n people, any 720 00:39:58,520 --> 00:40:01,350 particular way is as likely as any other way. 721 00:40:01,350 --> 00:40:05,230 So there's complete symmetry between hats and people. 722 00:40:05,230 --> 00:40:08,490 So what we want to do is to calculate the expected value 723 00:40:08,490 --> 00:40:11,460 and the variance of this random variable X. Let's start 724 00:40:11,460 --> 00:40:13,240 with the expected value. 
725 00:40:13,240 --> 00:40:17,840 Let's reuse the trick from the binomial case. 726 00:40:17,840 --> 00:40:21,110 So total number of hats picked, we're going to think 727 00:40:21,110 --> 00:40:24,140 of total number of hats picked as a sum of 728 00:40:24,140 --> 00:40:26,900 (0, 1) random variables. 729 00:40:26,900 --> 00:40:30,470 X1 tells us whether person 1 got their own hat back. 730 00:40:30,470 --> 00:40:32,920 If they did, we record a 1. 731 00:40:32,920 --> 00:40:34,960 X2, the same thing. 732 00:40:34,960 --> 00:40:40,910 By adding all the X's, we find how many 1's we got, which counts 733 00:40:40,910 --> 00:40:45,510 how many people selected their own hats. 734 00:40:45,510 --> 00:40:48,100 So we broke down the random variable of interest, the 735 00:40:48,100 --> 00:40:51,500 number of people who get their own hats back, as a sum of 736 00:40:51,500 --> 00:40:53,570 random variables. 737 00:40:53,570 --> 00:40:56,200 And these random variables, again, are easy to handle, 738 00:40:56,200 --> 00:40:58,010 because they're binary. 739 00:40:58,010 --> 00:40:59,250 They only take two values. 740 00:40:59,250 --> 00:41:03,500 What's the probability that Xi is equal to 1, that is, that 741 00:41:03,500 --> 00:41:06,730 the i-th person gets their own hat? 742 00:41:06,730 --> 00:41:09,430 There are n hats, and by symmetry, 743 00:41:09,430 --> 00:41:11,890 the chance that they end up getting their own hat, as 744 00:41:11,890 --> 00:41:14,930 opposed to any one of the other n - 1 hats, 745 00:41:14,930 --> 00:41:18,020 is going to be 1/n. 746 00:41:18,020 --> 00:41:20,710 So what's the expected value of Xi? 747 00:41:20,710 --> 00:41:23,130 It's 1 times 1/n. 748 00:41:23,130 --> 00:41:26,510 With probability 1/n, you get your own hat, or you get a 749 00:41:26,510 --> 00:41:30,960 value of 0 with probability 1 - 1/n. So the expected value of Xi is 1/n. 750 00:41:34,660 --> 00:41:38,360 All right, so we got the expected value of the Xi's. 
751 00:41:38,360 --> 00:41:41,510 And remember, what we want to do is to calculate the expected 752 00:41:41,510 --> 00:41:46,900 value of X by using this decomposition. 753 00:41:46,900 --> 00:41:52,230 Are the random variables Xi independent of each other? 754 00:41:52,230 --> 00:41:55,470 You can try to answer that question by writing down a 755 00:41:55,470 --> 00:41:58,510 joint PMF for the X's, but I'm sure that 756 00:41:58,510 --> 00:42:00,000 you will not succeed. 757 00:42:00,000 --> 00:42:02,740 But can you think intuitively? 758 00:42:02,740 --> 00:42:05,940 If I tell you information about some of the Xi's, does 759 00:42:05,940 --> 00:42:08,920 it give you information about the remaining ones? 760 00:42:08,920 --> 00:42:09,300 Yeah. 761 00:42:09,300 --> 00:42:13,950 If I tell you that out of 10 people, 9 of them got their 762 00:42:13,950 --> 00:42:16,710 own hat back, does that tell you something 763 00:42:16,710 --> 00:42:18,330 about the 10th person? 764 00:42:18,330 --> 00:42:18,690 Yes. 765 00:42:18,690 --> 00:42:22,510 If 9 got their own hat, then the 10th must also have gotten 766 00:42:22,510 --> 00:42:24,170 their own hat back. 767 00:42:24,170 --> 00:42:27,170 So the first 9 random variables tell you something 768 00:42:27,170 --> 00:42:28,790 about the 10th one. 769 00:42:28,790 --> 00:42:33,000 And conveying information of this sort, that's the case of 770 00:42:33,000 --> 00:42:34,410 dependence. 771 00:42:34,410 --> 00:42:38,100 All right, so the random variables are not independent. 772 00:42:38,100 --> 00:42:39,030 Are we stuck? 773 00:42:39,030 --> 00:42:43,240 Can we still calculate the expected value of X? 774 00:42:43,240 --> 00:42:45,210 Yes, we can. 775 00:42:45,210 --> 00:42:50,710 And the reason we can is that expectations are linear. 776 00:42:50,710 --> 00:42:53,940 Expectation of a sum of random variables is the sum of the 777 00:42:53,940 --> 00:42:55,140 expectations. 778 00:42:55,140 --> 00:42:57,490 And that's always true. 
779 00:42:57,490 --> 00:43:00,710 There's no independence assumption that's being used 780 00:43:00,710 --> 00:43:02,540 to apply that rule. 781 00:43:02,540 --> 00:43:06,980 So we have that the expected value of X is the sum of the 782 00:43:06,980 --> 00:43:09,580 expected value of the Xi's. 783 00:43:09,580 --> 00:43:12,970 And this is a property that's always true. 784 00:43:12,970 --> 00:43:14,350 You don't need independence. 785 00:43:14,350 --> 00:43:15,590 You don't care. 786 00:43:15,590 --> 00:43:18,660 So we're adding n terms, each one of which has 787 00:43:18,660 --> 00:43:20,430 expected value 1/n. 788 00:43:20,430 --> 00:43:22,670 And the final answer is 1. 789 00:43:22,670 --> 00:43:27,430 So out of the 100 people who selected hats at random, on 790 00:43:27,430 --> 00:43:32,590 the average, you expect only one of them to end up getting 791 00:43:32,590 --> 00:43:35,830 their own hat back. 792 00:43:35,830 --> 00:43:36,640 Very good. 793 00:43:36,640 --> 00:43:41,620 So since we are succeeding so far, let's try to see if we 794 00:43:41,620 --> 00:43:44,620 can succeed in calculating the variance as well. 795 00:43:44,620 --> 00:43:46,580 And of course, we will. 796 00:43:46,580 --> 00:43:50,160 But it's going to be a little more complicated. 797 00:43:50,160 --> 00:43:52,760 The reason it's going to be a little more complicated is 798 00:43:52,760 --> 00:43:56,500 because the Xi's are not independent, so the variance 799 00:43:56,500 --> 00:44:00,280 of the sum is not the same as the sum of the variances. 800 00:44:00,280 --> 00:44:04,320 So it's not enough to find the variances of the Xi's. 801 00:44:04,320 --> 00:44:06,930 We'll have to do more work. 802 00:44:06,930 --> 00:44:08,550 And here's what's involved. 
803 00:44:08,550 --> 00:44:12,320 Let's start with the general formula for the variance, 804 00:44:12,320 --> 00:44:15,950 which, as I mentioned before, it's usually the simpler way 805 00:44:15,950 --> 00:44:18,430 to go about calculating variances. 806 00:44:18,430 --> 00:44:21,800 So we need to calculate the expected value for X-squared, 807 00:44:21,800 --> 00:44:27,110 and subtract from it the expectation squared. 808 00:44:27,110 --> 00:44:31,010 Well, we already found the expected value of X. It's 809 00:44:31,010 --> 00:44:31,870 equal to 1. 810 00:44:31,870 --> 00:44:34,580 So 1-squared gives us just 1. 811 00:44:34,580 --> 00:44:37,980 So we're left with the task of calculating the expected value 812 00:44:37,980 --> 00:44:43,440 of X-squared, the random variable X-squared. 813 00:44:43,440 --> 00:44:45,610 Let's try to follow the same idea. 814 00:44:45,610 --> 00:44:49,770 Write this messy random variable, X-squared, as a sum 815 00:44:49,770 --> 00:44:54,440 of hopefully simpler random variables. 816 00:44:54,440 --> 00:44:59,350 So X is the sum of the Xi's, so you square 817 00:44:59,350 --> 00:45:01,560 both sides of this. 818 00:45:01,560 --> 00:45:05,150 And then you expand the right-hand side. 819 00:45:05,150 --> 00:45:09,390 When you expand the right-hand side, you get the squares of 820 00:45:09,390 --> 00:45:11,420 the terms that appear here. 821 00:45:11,420 --> 00:45:14,230 And then you get all the cross-terms. 822 00:45:14,230 --> 00:45:19,100 For every pair of (i,j) that are different, i different 823 00:45:19,100 --> 00:45:24,030 than j, you're going to have a cross-term in the sum. 824 00:45:24,030 --> 00:45:29,230 So now, in order to calculate the expected value of 825 00:45:29,230 --> 00:45:32,480 X-squared, what does our task reduce to? 826 00:45:32,480 --> 00:45:36,230 It reduces to calculating the expected value of this term 827 00:45:36,230 --> 00:45:38,690 and calculating the expected value of that term. 
828 00:45:38,690 --> 00:45:41,060 So let's do them one at a time. 829 00:45:41,060 --> 00:45:47,040 Expected value of Xi squared, what is it going to be? 830 00:45:47,040 --> 00:45:48,660 Same trick as before. 831 00:45:48,660 --> 00:45:53,350 Xi takes value 0 or 1, so Xi squared takes just the same 832 00:45:53,350 --> 00:45:55,290 values, 0 or 1. 833 00:45:55,290 --> 00:45:57,010 So that's the easy one. 834 00:45:57,010 --> 00:46:00,680 That's the same as expected value of Xi, which we already 835 00:46:00,680 --> 00:46:04,410 know to be 1/n. 836 00:46:04,410 --> 00:46:07,830 So this gives us a first contribution down here. 837 00:46:10,840 --> 00:46:14,220 The expected value of this term is going to be what? 838 00:46:14,220 --> 00:46:17,210 We have n terms in the summation. 839 00:46:17,210 --> 00:46:21,800 And each one of these terms has an expectation of 1/n, so this sum has expected value n times 1/n, which is 1. 840 00:46:21,800 --> 00:46:24,710 So we did a piece of the puzzle. 841 00:46:24,710 --> 00:46:28,480 So now let's deal with the second piece of the puzzle. 842 00:46:28,480 --> 00:46:32,020 Let's find the expected value of Xi times Xj. 843 00:46:32,020 --> 00:46:35,540 Now by symmetry, the expected value of Xi times Xj is going 844 00:46:35,540 --> 00:46:39,900 to be the same no matter which i and j you choose. 845 00:46:39,900 --> 00:46:44,930 So let's just think about X1 and X2 and try to find the 846 00:46:44,930 --> 00:46:48,260 expected value of X1 times X2. 847 00:46:48,260 --> 00:46:51,710 X1 times X2 is a random variable. 848 00:46:51,710 --> 00:46:53,960 What values does it take? 849 00:46:53,960 --> 00:46:56,570 Only 0 or 1. 850 00:46:56,570 --> 00:47:00,000 Since X1 and X2 are 0 or 1, their product can only take 851 00:47:00,000 --> 00:47:02,010 the values of 0 or 1. 
852 00:47:02,010 --> 00:47:04,990 So to find the probability distribution of this random 853 00:47:04,990 --> 00:47:07,320 variable, it's just sufficient to find the probability that 854 00:47:07,320 --> 00:47:09,530 it takes the value of 1. 855 00:47:09,530 --> 00:47:14,500 Now, what does X1 times X2 equal to 1 mean? 856 00:47:14,500 --> 00:47:19,500 It means that X1 was 1 and X2 was 1. 857 00:47:19,500 --> 00:47:22,390 The only way that you can get a product of 1 is if both of 858 00:47:22,390 --> 00:47:24,350 them turned out to be 1's. 859 00:47:24,350 --> 00:47:29,570 So that's the same as saying, persons 1 and 2 both picked 860 00:47:29,570 --> 00:47:31,980 their own hats. 861 00:47:31,980 --> 00:47:35,510 The probability that person 1 and person 2 both pick their 862 00:47:35,510 --> 00:47:39,600 own hats is the probability of two things happening, which is 863 00:47:39,600 --> 00:47:42,320 the product of the first thing happening times the 864 00:47:42,320 --> 00:47:44,310 conditional probability of the second, given 865 00:47:44,310 --> 00:47:46,160 that the first happened. 866 00:47:46,160 --> 00:47:48,690 And in words, this is the probability that the first 867 00:47:48,690 --> 00:47:51,840 person picked their own hat times the probability that the 868 00:47:51,840 --> 00:47:54,920 second person picks their own hat, given that the first 869 00:47:54,920 --> 00:47:56,990 person already picked their own. 870 00:47:56,990 --> 00:47:58,820 So what's the probability that the first person 871 00:47:58,820 --> 00:48:00,760 picks their own hat? 872 00:48:00,760 --> 00:48:03,040 We know that it's 1/n. 873 00:48:03,040 --> 00:48:05,030 Now, how about the second person? 
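[As a quick check of this product rule, here is a brute-force sketch of my own (not from the lecture): enumerate all n! equally likely hat assignments and count those in which persons 1 and 2 both get their own hats. The fraction should be exactly 1/n times 1/(n - 1):]

```python
from itertools import permutations
from fractions import Fraction

def prob_both_fixed(n):
    """Exact P(persons 1 and 2 BOTH get their own hats),
    computed by enumerating all n! equally likely assignments."""
    perms = list(permutations(range(n)))
    hits = sum(1 for p in perms if p[0] == 0 and p[1] == 1)
    return Fraction(hits, len(perms))

n = 5
assert prob_both_fixed(n) == Fraction(1, n * (n - 1))
```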
874 00:48:05,030 --> 00:48:09,540 If I tell you that one person has their own hat, and that 875 00:48:09,540 --> 00:48:13,240 person takes their hat and goes away, from the point of 876 00:48:13,240 --> 00:48:17,250 view of the second person, there's n - 1 people left 877 00:48:17,250 --> 00:48:19,770 looking at n - 1 hats. 878 00:48:19,770 --> 00:48:22,330 And they're getting just hats at random. 879 00:48:22,330 --> 00:48:24,930 What's the chance that I will get my own? 880 00:48:24,930 --> 00:48:26,180 It's 1/(n - 1). 881 00:48:29,210 --> 00:48:33,700 So think of it as: person 1 goes, picks a hat at random, 882 00:48:33,700 --> 00:48:36,850 it happens to be their own, and they leave. 883 00:48:36,850 --> 00:48:40,120 You're left with n - 1 people, and there are n 884 00:48:40,120 --> 00:48:41,250 - 1 hats out there. 885 00:48:41,250 --> 00:48:44,490 Person 2 goes and picks a hat at random, and with probability 886 00:48:44,490 --> 00:48:48,820 1/(n - 1), is going to pick their own hat. 887 00:48:48,820 --> 00:48:52,400 So the expected value now of this random variable is, 888 00:48:52,400 --> 00:48:54,520 again, that same number, because this is 889 00:48:54,520 --> 00:48:57,500 a 0, 1 random variable. 890 00:48:57,500 --> 00:49:02,370 So this is the same as the expected value of Xi times Xj 891 00:49:02,370 --> 00:49:04,810 when i is different from j. 892 00:49:04,810 --> 00:49:09,830 So here, all that's left to do is to add the expectations of 893 00:49:09,830 --> 00:49:10,540 these terms. 894 00:49:10,540 --> 00:49:14,480 Each one of these terms has an expected value that's 1/n 895 00:49:14,480 --> 00:49:16,910 times 1/(n - 1). 896 00:49:16,910 --> 00:49:19,170 And how many terms do we have? 897 00:49:19,170 --> 00:49:21,410 How many of these are we adding up? 898 00:49:24,840 --> 00:49:28,950 It's n-squared - n. 899 00:49:28,950 --> 00:49:31,830 When you expand the quadratic, there's a total 900 00:49:31,830 --> 00:49:33,890 of n-squared terms.
901 00:49:33,890 --> 00:49:37,860 Some are self-terms, n of them. 902 00:49:37,860 --> 00:49:42,170 And the remaining number of terms is n-squared - n. 903 00:49:42,170 --> 00:49:48,310 So here we got n-squared - n terms. 904 00:49:48,310 --> 00:49:51,200 And so we need to multiply here with n-squared - n. 905 00:49:53,810 --> 00:49:59,980 And after you realize that this number here is 1, and you 906 00:49:59,980 --> 00:50:03,490 realize that this is the same as the denominator, you get 907 00:50:03,490 --> 00:50:06,750 the answer that the expected value of X squared equals 2. 908 00:50:06,750 --> 00:50:10,120 And then, finally going up to the top formula, we get the 909 00:50:10,120 --> 00:50:14,720 variance, which is 2 - 1, and so the 910 00:50:14,720 --> 00:50:17,610 variance is just equal to 1. 911 00:50:17,610 --> 00:50:21,680 So the variance of this random variable, number of people who 912 00:50:21,680 --> 00:50:25,130 get their own hats back, is also equal to 1, 913 00:50:25,130 --> 00:50:26,540 equal to the mean. 914 00:50:26,540 --> 00:50:27,690 Looks like magic. 915 00:50:27,690 --> 00:50:29,220 Why is this the case? 916 00:50:29,220 --> 00:50:31,550 Well, there's a deeper explanation why these two 917 00:50:31,550 --> 00:50:33,630 numbers should come out to be the same. 918 00:50:33,630 --> 00:50:35,980 But this is something that would probably have to wait a 919 00:50:35,980 --> 00:50:39,420 couple of chapters before we could actually explain it. 920 00:50:39,420 --> 00:50:40,730 And so I'll stop here.
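[To close the loop, here is one more sketch I'm adding (not part of the lecture): enumerating all permutations for several small n confirms, exactly, that both the mean and the variance of the number of people who get their own hats come out to 1:]

```python
from itertools import permutations
from fractions import Fraction

def hat_moments(n):
    """Exact mean and variance of the number of fixed points
    (people who get their own hat), over all n! permutations."""
    perms = list(permutations(range(n)))
    counts = [sum(1 for i, p in enumerate(perm) if p == i)
              for perm in perms]
    m = len(perms)
    mean = Fraction(sum(counts), m)
    second_moment = Fraction(sum(c * c for c in counts), m)
    return mean, second_moment - mean ** 2

# Mean and variance are both exactly 1 for every n >= 2.
for n in range(2, 7):
    mean, var = hat_moments(n)
    assert mean == 1 and var == 1
```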