1 00:00:01,197 --> 00:00:03,030 PROFESSOR: So we've been saving for the last 2 00:00:03,030 --> 00:00:06,140 the property that makes expectation calculating 3 00:00:06,140 --> 00:00:10,750 really easy and short circuits a lot of the ingenious methods 4 00:00:10,750 --> 00:00:15,790 that we've used up until now-- namely, expectation is linear. 5 00:00:15,790 --> 00:00:19,060 So what that means is that if you have two random variables R 6 00:00:19,060 --> 00:00:22,790 and S and two constants a and b, the expectation function 7 00:00:22,790 --> 00:00:23,290 is linear. 8 00:00:23,290 --> 00:00:26,550 That is, you take a linear combination of R and S-- aR 9 00:00:26,550 --> 00:00:30,640 plus bS, and that's equal to the corresponding linear 10 00:00:30,640 --> 00:00:32,949 combination of the expectations. 11 00:00:32,949 --> 00:00:33,740 I'll read it again. 12 00:00:33,740 --> 00:00:37,510 Expectation of aR plus bS is equal to a times 13 00:00:37,510 --> 00:00:42,380 the expectation of R plus b times the expectation S. 14 00:00:42,380 --> 00:00:44,280 Expectation is linear. 15 00:00:44,280 --> 00:00:46,090 OK. 16 00:00:46,090 --> 00:00:47,960 That's an absolutely fundamental formula 17 00:00:47,960 --> 00:00:51,070 that you should be comfortable with and remember. 18 00:00:51,070 --> 00:00:54,670 It extends actually not only to any finite number of variables, 19 00:00:54,670 --> 00:00:56,650 but with some convergence conditions, 20 00:00:56,650 --> 00:00:59,930 it actually extends even to accountable sum of variables. 21 00:00:59,930 --> 00:01:03,080 But let's just settle for the two random variables 22 00:01:03,080 --> 00:01:05,180 case for today. 23 00:01:05,180 --> 00:01:08,170 Now, the crucial thing that makes it so powerful and useful 24 00:01:08,170 --> 00:01:12,750 is that this fact has nothing to do with independence. 25 00:01:12,750 --> 00:01:18,070 Whether R and S are independent or equal, it doesn't matter. 26 00:01:18,070 --> 00:01:21,840 This linearity holds. 27 00:01:21,840 --> 00:01:25,000 The proof is not terribly informative. 28 00:01:25,000 --> 00:01:28,490 It's just manipulating terms and rearranging terms into sum, 29 00:01:28,490 --> 00:01:30,170 but let's go through the exercise. 30 00:01:30,170 --> 00:01:32,480 Again, something I would never do in lecture. 31 00:01:32,480 --> 00:01:35,170 But in a video where you can skip it or fast forward 32 00:01:35,170 --> 00:01:39,650 or replay it, I think it might be worth doing, 33 00:01:39,650 --> 00:01:41,700 And it is a proof, by the way, that I think 34 00:01:41,700 --> 00:01:43,230 you should be responsible for. 35 00:01:43,230 --> 00:01:45,190 So let's go through it. 36 00:01:45,190 --> 00:01:45,800 OK. 37 00:01:45,800 --> 00:01:47,470 We're interested in the expectation 38 00:01:47,470 --> 00:01:49,540 of the random variable that you get 39 00:01:49,540 --> 00:01:51,910 by multiplying the random variable A 40 00:01:51,910 --> 00:01:57,190 by little a and the random variable B by little b. 41 00:01:57,190 --> 00:01:59,530 All right. 42 00:01:59,530 --> 00:02:01,790 One of the definitions of expectation 43 00:02:01,790 --> 00:02:07,510 is that you get this expectation by taking the sum over all 44 00:02:07,510 --> 00:02:11,820 the outcomes of the value of this linear combination 45 00:02:11,820 --> 00:02:15,310 at the outcome omega times the probability of omega. 46 00:02:15,310 --> 00:02:19,025 So what's the value of the linear combination aA plus bB? 47 00:02:19,025 --> 00:02:23,620 At omega, it's simply a times A of omega plus b 48 00:02:23,620 --> 00:02:25,440 times B of omega. 49 00:02:25,440 --> 00:02:26,390 OK. 50 00:02:26,390 --> 00:02:31,180 Now I've got a sum of these terms summing over omega. 51 00:02:31,180 --> 00:02:34,230 I can split them into two groups. 52 00:02:34,230 --> 00:02:38,730 I can take the sum over the aA's at omega times 53 00:02:38,730 --> 00:02:42,860 probability of omega and bB of omega times 54 00:02:42,860 --> 00:02:43,980 probability of omega. 55 00:02:43,980 --> 00:02:48,720 In other words, I'm multiplying by probability of omega 56 00:02:48,720 --> 00:02:51,010 here to get a sum of two terms, and then I'm 57 00:02:51,010 --> 00:02:54,160 rearranging all of the capital A terms first, 58 00:02:54,160 --> 00:02:56,230 followed by all the capital B terms. 59 00:02:56,230 --> 00:03:01,080 The result is that I wind up with the sum over omega 60 00:03:01,080 --> 00:03:03,570 of the A terms times the probability of omega-- 61 00:03:03,570 --> 00:03:07,210 and I factored out the little a-- plus b times the sum 62 00:03:07,210 --> 00:03:11,320 over all omega of B of omega times the probability of omega. 63 00:03:11,320 --> 00:03:14,380 It's just rearranging the terms in this sum 64 00:03:14,380 --> 00:03:18,230 after I've multiplied through by probability of omega. 65 00:03:18,230 --> 00:03:21,420 Well, of course, this is equal by definition to a times 66 00:03:21,420 --> 00:03:24,520 the expectation of A. Notice this is the expectation of A, 67 00:03:24,520 --> 00:03:27,710 and that's the expectation of B times b. 68 00:03:27,710 --> 00:03:29,700 And the proof is done. 69 00:03:29,700 --> 00:03:34,110 Not inspiring, but routine if you use the alternative 70 00:03:34,110 --> 00:03:37,480 definition of expectation in terms 71 00:03:37,480 --> 00:03:38,760 of summing over the outcomes. 72 00:03:38,760 --> 00:03:41,150 It's a messier proof if you have to use 73 00:03:41,150 --> 00:03:42,830 the definition of the expectation, 74 00:03:42,830 --> 00:03:47,897 being the value times the probability 75 00:03:47,897 --> 00:03:49,355 that the variable takes that value. 76 00:03:49,355 --> 00:03:52,660 And you wind up having to convert that formula 77 00:03:52,660 --> 00:03:56,080 into this formula in order to carry through the proof nicely. 78 00:03:56,080 --> 00:03:57,574 And we're done. 79 00:03:57,574 --> 00:03:58,690 OK. 80 00:03:58,690 --> 00:04:00,002 Let's make use of it. 81 00:04:00,002 --> 00:04:01,460 And in order to do that, let's make 82 00:04:01,460 --> 00:04:03,810 a really trivial but a very important 83 00:04:03,810 --> 00:04:06,450 remark about the expectation of an indicator variable. 84 00:04:06,450 --> 00:04:09,520 So remember I sub A is three random variables that's 85 00:04:09,520 --> 00:04:11,900 equal to 1 if the event A occurs and 0 86 00:04:11,900 --> 00:04:13,930 if the even A doesn't occur. 87 00:04:13,930 --> 00:04:18,459 So what is the expectation of the indicator variable? 88 00:04:18,459 --> 00:04:21,490 Well, by definition, it's 1 times the probability 89 00:04:21,490 --> 00:04:24,050 that it equals 1 plus 0 times the probability 90 00:04:24,050 --> 00:04:24,850 that it equals 0. 91 00:04:24,850 --> 00:04:27,160 Those are the only two values it can take. 92 00:04:27,160 --> 00:04:29,850 Well, we can forget this term in 0 times something. 93 00:04:29,850 --> 00:04:34,740 But what is the probability that I A is equal to 1? 94 00:04:34,740 --> 00:04:36,635 That's exactly the probability of A, 95 00:04:36,635 --> 00:04:39,680 and that's the fundamental formula that we want to notice. 96 00:04:39,680 --> 00:04:42,770 The expectation of the indicator variable for the event A 97 00:04:42,770 --> 00:04:46,430 is nothing but the probability that A occurs. 98 00:04:46,430 --> 00:04:47,550 File that away. 99 00:04:47,550 --> 00:04:51,160 We're about to use it multiple times. 100 00:04:51,160 --> 00:04:54,010 So let's go back to the expected number of heads in n 101 00:04:54,010 --> 00:04:57,790 flips which we've now seen at least two ways to do-- one 102 00:04:57,790 --> 00:05:02,120 by generating function argument, another by a recursive argument 103 00:05:02,120 --> 00:05:05,450 using total expectation. 104 00:05:05,450 --> 00:05:08,490 Now where we're going to knock it off very elegantly using 105 00:05:08,490 --> 00:05:13,620 linearity because let Hi be the indicator variable 106 00:05:13,620 --> 00:05:15,426 for having a head on ith flip. 107 00:05:15,426 --> 00:05:16,550 So we look at the ith flip. 108 00:05:16,550 --> 00:05:19,160 Hi is 1 if the ith flip comes up head 109 00:05:19,160 --> 00:05:24,460 and Hi is 0 if the ith flip does not come up head. 110 00:05:24,460 --> 00:05:27,040 Then we can make the following crucial remark, 111 00:05:27,040 --> 00:05:29,700 and this is a trick that we'll use regularly 112 00:05:29,700 --> 00:05:31,560 by expressing some quantity that we're 113 00:05:31,560 --> 00:05:34,720 interested in as a sum of indicator variables. 114 00:05:34,720 --> 00:05:37,110 The total number of heads-- the random variable 115 00:05:37,110 --> 00:05:39,720 equal to the total number of heads in n flips-- 116 00:05:39,720 --> 00:05:41,970 is equal to the sum of the indicator 117 00:05:41,970 --> 00:05:44,802 variables for whether there's a head on the first flip plus 118 00:05:44,802 --> 00:05:46,510 whether there's a head on the second flip 119 00:05:46,510 --> 00:05:50,100 up through whether there's a head on the n flip. 120 00:05:50,100 --> 00:05:54,580 So suddenly the random variable that I want to compute 121 00:05:54,580 --> 00:05:59,590 is a sum of n random variables, in fact, n indicator variables. 122 00:05:59,590 --> 00:06:00,450 All right. 123 00:06:00,450 --> 00:06:02,830 Well, that tells me that the expectation 124 00:06:02,830 --> 00:06:05,720 of the number of heads is the expectation of the sum. 125 00:06:05,720 --> 00:06:07,390 After all, it's equal to the sum. 126 00:06:07,390 --> 00:06:10,290 But the expectation of the sum is 127 00:06:10,290 --> 00:06:13,990 going to be the sum of the expectations by linearity. 128 00:06:13,990 --> 00:06:15,660 So it's simply the expectation of H 1 129 00:06:15,660 --> 00:06:21,280 plus the expectation of H 2 out through the expectation of H n. 130 00:06:21,280 --> 00:06:24,167 But what's the expectation of getting a head on ith flip? 131 00:06:24,167 --> 00:06:25,500 Well, the flips are independent. 132 00:06:25,500 --> 00:06:28,140 It's simply the expectation of a head. 133 00:06:28,140 --> 00:06:30,750 So what I have is-- each of these 134 00:06:30,750 --> 00:06:33,830 is equal to the probability of a head, and there's n of them. 135 00:06:33,830 --> 00:06:37,850 So the total is n times the probability of Head, or np, 136 00:06:37,850 --> 00:06:40,100 which is a formula that we had derived 137 00:06:40,100 --> 00:06:41,460 two other ways previously. 138 00:06:41,460 --> 00:06:44,140 And now it really falls out very elegantly 139 00:06:44,140 --> 00:06:47,620 with hardly any ingenuity other than the wonderful idea 140 00:06:47,620 --> 00:06:51,095 of expressing the number of heads as a sum of indicators. 141 00:06:53,650 --> 00:06:59,600 Let's look at one example and a very related example 142 00:06:59,600 --> 00:07:04,570 of asking about the probability of the expected number of hats 143 00:07:04,570 --> 00:07:05,650 that are returned. 144 00:07:05,650 --> 00:07:10,050 When n men check their hats at a hat check 145 00:07:10,050 --> 00:07:11,730 and then the hats get all scrambled up 146 00:07:11,730 --> 00:07:15,110 by incompetent staff and then they're 147 00:07:15,110 --> 00:07:17,920 given out again in such a way that the probability 148 00:07:17,920 --> 00:07:21,210 that the ith man gets his own hat back is 1 over n. 149 00:07:21,210 --> 00:07:27,270 What you could say is that all possible permutations of the n 150 00:07:27,270 --> 00:07:29,440 hats are equally likely. 151 00:07:29,440 --> 00:07:33,870 And we ask with all permutations equally likely, 152 00:07:33,870 --> 00:07:36,890 how many of them is it the case that the ith 153 00:07:36,890 --> 00:07:39,180 man gets his own hat back? 154 00:07:39,180 --> 00:07:41,780 And there's a 1 out of n chance that the ith man 155 00:07:41,780 --> 00:07:44,290 is going to get his own hat back because there's n hats, 156 00:07:44,290 --> 00:07:46,481 and he's equally likely to get all of them. 157 00:07:46,481 --> 00:07:46,980 OK. 158 00:07:49,660 --> 00:07:52,150 How many men do we expect will get their hat back 159 00:07:52,150 --> 00:07:53,350 in this setting? 160 00:07:53,350 --> 00:07:56,950 Well, let's let our Ri be the indicator variable for 161 00:07:56,950 --> 00:08:01,170 whether or not the ith man got his hat returned-- R i for hat 162 00:08:01,170 --> 00:08:04,000 returned to the ith man. 163 00:08:04,000 --> 00:08:08,120 Now, notice that Ri and Rj are not independent. 164 00:08:08,120 --> 00:08:10,184 In the previous case, those H's were independent 165 00:08:10,184 --> 00:08:11,850 because the coin flips were independent. 166 00:08:11,850 --> 00:08:14,530 But here, if I know, for example, 167 00:08:14,530 --> 00:08:17,660 that R 1 got his hat back, the probability 168 00:08:17,660 --> 00:08:20,510 that R 2 got his hat back has changed from 1 over n 169 00:08:20,510 --> 00:08:24,030 to 1 over n minus 1 because 1 is out of the picture, 170 00:08:24,030 --> 00:08:28,380 and R 2 is going to get his hat back among the remaining hats 2 171 00:08:28,380 --> 00:08:30,807 through and n is n minus 1 of them. 172 00:08:30,807 --> 00:08:33,980 And he's got a 1 minus 1 over n minus 1 173 00:08:33,980 --> 00:08:35,210 chance of getting his hat. 174 00:08:35,210 --> 00:08:37,630 His probability has changed given 175 00:08:37,630 --> 00:08:39,340 that R 1 got his hat back. 176 00:08:39,340 --> 00:08:42,130 So they're not independent. 177 00:08:42,130 --> 00:08:43,220 All right. 178 00:08:43,220 --> 00:08:46,940 Nevertheless, independence doesn't matter for linearity. 179 00:08:46,940 --> 00:08:50,400 So I can still say that the expected number of hats 180 00:08:50,400 --> 00:08:53,700 returned is equal to the expectation 181 00:08:53,700 --> 00:08:56,620 of the sum of the indicator variables for each man 182 00:08:56,620 --> 00:08:58,460 getting his hat back. 183 00:08:58,460 --> 00:09:01,320 And of course, the expectation of that sum 184 00:09:01,320 --> 00:09:04,180 is the sum of the expectations. 185 00:09:04,180 --> 00:09:07,140 And the expectation of each of these-- we figured out-- 186 00:09:07,140 --> 00:09:08,800 was 1 over n, and there's n of them. 187 00:09:08,800 --> 00:09:12,200 So it's n times 1 over n, or 1. 188 00:09:12,200 --> 00:09:14,600 I expect when all the hats are scrambled 189 00:09:14,600 --> 00:09:18,150 and all permutations of the hats are equally likely, 190 00:09:18,150 --> 00:09:21,170 that one man is going to get his hat back. 191 00:09:21,170 --> 00:09:23,600 None of the others will. 192 00:09:23,600 --> 00:09:24,100 OK. 193 00:09:24,100 --> 00:09:27,940 Now let's change the situation a little bit. 194 00:09:27,940 --> 00:09:30,640 And think instead of scrambling the hats in a way 195 00:09:30,640 --> 00:09:34,250 that all possible permutations of hats are equally likely, 196 00:09:34,250 --> 00:09:36,140 let's think about a Chinese banquet. 197 00:09:36,140 --> 00:09:37,740 So Chinese banquets are traditionally 198 00:09:37,740 --> 00:09:41,740 done with a table of nine in a circle, 199 00:09:41,740 --> 00:09:43,910 and there's a lazy Susan that spins around 200 00:09:43,910 --> 00:09:46,820 where there's dishes of food in front of each person. 201 00:09:46,820 --> 00:09:48,590 But let's generalize it to n. 202 00:09:48,590 --> 00:09:50,220 Suppose that n people are sitting 203 00:09:50,220 --> 00:09:52,920 around a spinning table-- a lazy-Susan) 204 00:09:52,920 --> 00:09:56,580 with n different dishes, one dish in front of each person. 205 00:09:56,580 --> 00:10:00,860 And now we spin the lazy Susan randomly, 206 00:10:00,860 --> 00:10:04,640 and we ask how many people do we expect 207 00:10:04,640 --> 00:10:06,820 will wind up with the same dish that they 208 00:10:06,820 --> 00:10:10,600 started with after the spin? 209 00:10:10,600 --> 00:10:14,690 Well, now we can let Ri indicate that the ith 210 00:10:14,690 --> 00:10:17,960 person got the same dish back. 211 00:10:17,960 --> 00:10:20,650 And now these Ris, which are different 212 00:10:20,650 --> 00:10:22,950 from the previous ones about hat returns, 213 00:10:22,950 --> 00:10:26,250 these are Ris are totally dependent-- much more 214 00:10:26,250 --> 00:10:29,480 so than the other ones work because they're all 1 215 00:10:29,480 --> 00:10:30,640 or they're all 0. 216 00:10:30,640 --> 00:10:33,250 If one person gets their hat back, 217 00:10:33,250 --> 00:10:35,810 it means that the spinning table got 218 00:10:35,810 --> 00:10:38,940 back to where it used to be, and everybody has their hat back. 219 00:10:38,940 --> 00:10:42,040 And if one person doesn't have their hat back, 220 00:10:42,040 --> 00:10:44,800 then the table is shifted off where it was originally 221 00:10:44,800 --> 00:10:47,670 and nobody has the original dish. 222 00:10:47,670 --> 00:10:48,220 I said hats. 223 00:10:48,220 --> 00:10:50,750 I meant the dish that they started with. 224 00:10:50,750 --> 00:10:53,250 So everybody gets the same dish back, 225 00:10:53,250 --> 00:10:55,220 or nobody gets the same dish back. 226 00:10:55,220 --> 00:10:58,990 These variables are as dependent as they possibly could be, 227 00:10:58,990 --> 00:11:03,760 but it doesn't matter because linearity still holds. 228 00:11:03,760 --> 00:11:06,710 And that means that the previous argument 229 00:11:06,710 --> 00:11:10,900 about the expected number of dishes that get back 230 00:11:10,900 --> 00:11:13,440 to the person that they started with 231 00:11:13,440 --> 00:11:18,150 is still 1 even though all the Ris are equal. 232 00:11:21,840 --> 00:11:25,710 Well, that's so much for the lovely rule about linearity 233 00:11:25,710 --> 00:11:29,430 of expectation which holds regardless of assumptions 234 00:11:29,430 --> 00:11:31,950 about independence or not. 235 00:11:31,950 --> 00:11:35,950 There is a rule for products, but it requires independence. 236 00:11:35,950 --> 00:11:38,530 So the independent product rule says, sure enough 237 00:11:38,530 --> 00:11:41,150 that the expectation of a product of two 238 00:11:41,150 --> 00:11:45,340 random variables, x and y, is the product 239 00:11:45,340 --> 00:11:49,750 of their expectations, providing that they are independent. 240 00:11:49,750 --> 00:11:52,110 And this extends to many variables 241 00:11:52,110 --> 00:11:55,710 if they're mutually independent. 242 00:11:55,710 --> 00:11:58,530 Again, the proof by rearranging terms in the defining 243 00:11:58,530 --> 00:12:00,410 sum for the expectation of xy. 244 00:12:00,410 --> 00:12:01,280 Let's go through it. 245 00:12:01,280 --> 00:12:03,321 And again, you can fast forward or skip this part 246 00:12:03,321 --> 00:12:06,630 if you don't want to watch equations being manipulated. 247 00:12:06,630 --> 00:12:10,360 So by definition, the expectation of the product xy 248 00:12:10,360 --> 00:12:13,450 is the sum over all the possible values 249 00:12:13,450 --> 00:12:16,330 of x and y of the value of the product 250 00:12:16,330 --> 00:12:20,050 xy times the probability that the first variable, capital X, 251 00:12:20,050 --> 00:12:22,520 equals little x and the second variable Y 252 00:12:22,520 --> 00:12:24,700 is equal to little y. 253 00:12:24,700 --> 00:12:28,890 This is by definition, the expectation of the product-- 254 00:12:28,890 --> 00:12:30,960 by the first definition. 255 00:12:30,960 --> 00:12:32,970 Not the one in terms of outcomes. 256 00:12:32,970 --> 00:12:34,620 Now, using independence, this term 257 00:12:34,620 --> 00:12:38,200 here can be split into a product of X equals x and Y equals y. 258 00:12:38,200 --> 00:12:39,240 So let's do that. 259 00:12:39,240 --> 00:12:42,150 So this is the sum of xy times the probability 260 00:12:42,150 --> 00:12:45,800 that X equals x times the probability that Y equals y. 261 00:12:45,800 --> 00:12:49,290 Now I'm going to do a fairly standard trick, which 262 00:12:49,290 --> 00:12:52,900 is I'm going to organize this sum in a clean way. 263 00:12:52,900 --> 00:12:55,970 Right now it's an unordered sum over all possible pairs of x 264 00:12:55,970 --> 00:13:01,220 and y in the range of the variables x and y, 265 00:13:01,220 --> 00:13:04,400 but I'm going to do the sum so I first sum over all 266 00:13:04,400 --> 00:13:07,070 the y's, and then for each y, I'm going to sum over all 267 00:13:07,070 --> 00:13:09,200 the x's. 268 00:13:09,200 --> 00:13:11,271 This is an unordered some really. 269 00:13:11,271 --> 00:13:13,770 There's no order here, but now I'm giving you an arrangement 270 00:13:13,770 --> 00:13:18,420 which says that I'm going to lump together the sums over all 271 00:13:18,420 --> 00:13:23,050 the x's, and then for each of those I'm going to sum up over 272 00:13:23,050 --> 00:13:25,900 the y's. 273 00:13:25,900 --> 00:13:28,570 Well, when I do it this way, what I've got 274 00:13:28,570 --> 00:13:30,070 is an interesting thing here. 275 00:13:30,070 --> 00:13:33,190 I've got a sum over x, and there's some y terms here 276 00:13:33,190 --> 00:13:34,670 that don't depend on x. 277 00:13:34,670 --> 00:13:38,590 I can factor them out because they don't change with x. 278 00:13:38,590 --> 00:13:42,990 So if I factor out this y and probability of Y equals y, 279 00:13:42,990 --> 00:13:47,840 I wind up with the sum over y of this factored out term-- y 280 00:13:47,840 --> 00:13:52,120 times probability Y equals y-- times the sum over x's x 281 00:13:52,120 --> 00:13:55,430 times the probability that X equals x. 282 00:13:55,430 --> 00:13:59,660 Now, this is the same term that is 283 00:13:59,660 --> 00:14:02,100 the coefficient of every one of these y terms 284 00:14:02,100 --> 00:14:05,420 that depends on y, and this term does not depend on y. 285 00:14:05,420 --> 00:14:07,910 So I can factor it out. 286 00:14:07,910 --> 00:14:11,740 And if I do that, I wind up with the sum 287 00:14:11,740 --> 00:14:13,920 over x of x times the probability 288 00:14:13,920 --> 00:14:18,250 that X equals x times the sum over y of y times 289 00:14:18,250 --> 00:14:19,950 the probability that Y equals y. 290 00:14:19,950 --> 00:14:21,100 And guess what. 291 00:14:21,100 --> 00:14:24,600 This is by definition the expectation of X, 292 00:14:24,600 --> 00:14:27,690 and this is by definition the expectation of Y. 293 00:14:27,690 --> 00:14:30,220 And by that chain of equalities, I've 294 00:14:30,220 --> 00:14:33,270 proved, sure enough, that the expectation of XY 295 00:14:33,270 --> 00:14:39,160 is equal to the expectation of X times the expectation of Y QED. 296 00:14:39,160 --> 00:14:41,260 So the key step here was where independence 297 00:14:41,260 --> 00:14:44,770 was used at the very first step when I split up the probability 298 00:14:44,770 --> 00:14:47,862 that X equaled x and Y equaled y into the product 299 00:14:47,862 --> 00:14:49,320 of the corresponding probabilities. 300 00:14:51,960 --> 00:14:53,840 Now, let me just end with a warning 301 00:14:53,840 --> 00:14:57,810 about a couple of blunders that people make all the time. 302 00:14:57,810 --> 00:15:00,460 So first of all, don't forget independence 303 00:15:00,460 --> 00:15:05,100 as a crucial condition on the product rule for expectations. 304 00:15:05,100 --> 00:15:09,830 It can hold in some cases where the variables are dependent. 305 00:15:09,830 --> 00:15:11,700 Independent is not a necessary condition. 306 00:15:11,700 --> 00:15:14,460 It's efficient, but you need some kind 307 00:15:14,460 --> 00:15:18,840 of a condition in order for the product rule to hold. 308 00:15:18,840 --> 00:15:21,760 So if you're not careful, it won't if you 309 00:15:21,760 --> 00:15:24,490 forget to check for independence or something that 310 00:15:24,490 --> 00:15:25,620 is tantamount to it. 311 00:15:25,620 --> 00:15:27,940 So let's just take an easy example remember what 312 00:15:27,940 --> 00:15:29,430 happens if independence fails. 313 00:15:29,430 --> 00:15:32,680 Suppose I have a variable X-- a random variable X-- which 314 00:15:32,680 --> 00:15:35,780 takes positive and negative values with equal probability. 315 00:15:35,780 --> 00:15:38,780 So it takes 1 and minus 1 with equal probability. 316 00:15:38,780 --> 00:15:41,580 It takes pi and minus pi with equal probability. 317 00:15:41,580 --> 00:15:43,290 I don't really care what those values are 318 00:15:43,290 --> 00:15:46,760 as long as it's taking some positive and negative values, 319 00:15:46,760 --> 00:15:49,370 and it takes a positive value at the same probability 320 00:15:49,370 --> 00:15:52,100 that it takes that value negated. 321 00:15:52,100 --> 00:15:54,970 Well, that automatically means that the expectation of X 322 00:15:54,970 --> 00:15:58,140 is 0 because when I add up all these terms, 323 00:15:58,140 --> 00:16:00,410 the positive and negative terms cancel because they 324 00:16:00,410 --> 00:16:01,670 have the same probability. 325 00:16:01,670 --> 00:16:08,780 So any such X that's symmetric about 0 has expectations 0. 326 00:16:08,780 --> 00:16:11,180 On the other hand, if I square X, 327 00:16:11,180 --> 00:16:15,070 then all of those positive and negative values 328 00:16:15,070 --> 00:16:16,140 become positive. 329 00:16:16,140 --> 00:16:19,020 And so I'm taking the expectation 330 00:16:19,020 --> 00:16:22,770 of a variable that's strictly positive-- at least 331 00:16:22,770 --> 00:16:25,910 with nonzero probability at a bunch of outcomes. 332 00:16:25,910 --> 00:16:30,520 And therefore, the expectation of X squared is positive. 333 00:16:30,520 --> 00:16:32,920 So the expectation of X is 0, but the expectation 334 00:16:32,920 --> 00:16:34,090 of X squared is positive. 335 00:16:34,090 --> 00:16:36,340 Well, of course, if I multiply expectation 336 00:16:36,340 --> 00:16:39,210 of X times expectation of X, that's still 0. 337 00:16:39,210 --> 00:16:40,940 So here's a counter example. 338 00:16:40,940 --> 00:16:43,110 Expectation of X times expectation of X 339 00:16:43,110 --> 00:16:45,590 is equal to 0, which is less than the expectation 340 00:16:45,590 --> 00:16:47,690 of the square of X. Of course this is about as 341 00:16:47,690 --> 00:16:49,740 dependent as it could possibly be because it's 342 00:16:49,740 --> 00:16:52,630 the same random variable, but it illustrates 343 00:16:52,630 --> 00:16:56,440 the failure of the product rule if you 344 00:16:56,440 --> 00:16:58,850 don't have some kind of a condition like independence 345 00:16:58,850 --> 00:17:00,290 around. 346 00:17:00,290 --> 00:17:03,100 There's a second blunder that's more interesting 347 00:17:03,100 --> 00:17:07,810 and that people can fall in because there's a temptation 348 00:17:07,810 --> 00:17:10,660 to assume that if the product rule holds 349 00:17:10,660 --> 00:17:13,690 for independence, then so should the reciprocal rule. 350 00:17:13,690 --> 00:17:17,440 That is, you might think that the expectation of X over Y 351 00:17:17,440 --> 00:17:20,900 is equal to the expectation of X over the expectation of Y 352 00:17:20,900 --> 00:17:22,970 when X and Y are independent. 353 00:17:22,970 --> 00:17:23,990 But it's not true. 354 00:17:23,990 --> 00:17:27,060 Even when they're independent, the expectation 355 00:17:27,060 --> 00:17:31,550 of X divided by Y is in general, not equal to the expectation 356 00:17:31,550 --> 00:17:34,820 of X divided by the expectation of Y. In fact, 357 00:17:34,820 --> 00:17:38,860 the counterexample is if X is the constant 1, the expectation 358 00:17:38,860 --> 00:17:39,690 of 1 over Y 359 00:17:39,690 --> 00:17:42,170 [AUDIO OUT] 360 00:18:15,150 --> 00:18:16,900 PROFESSOR: [INAUDIBLE] complex instruction 361 00:18:16,900 --> 00:18:20,630 set code was better than risk. 362 00:18:20,630 --> 00:18:25,080 So I won't mention names, but prominent people 363 00:18:25,080 --> 00:18:26,360 have made this blunder. 364 00:18:26,360 --> 00:18:28,070 You shouldn't.