The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JOHN TSITSIKLIS: So today we're going to finish with the core material of this class, that is, the material that has to do with probability theory in general. For the rest of the semester we're going to look at some special types of models and talk about inference. (There's also going to be a small module of core material coming later.) But today we're basically finishing chapter four.

What we're going to do is look at a somewhat familiar concept, the conditional expectation, but from a slightly different and slightly more sophisticated angle. Together with the conditional expectation, we will also talk about conditional variances, which we're going to denote var(X | Y).
We're going to see what they are, and there are some subtle concepts involved here. We're also going to apply some of the tools we develop to a special type of situation in which we're adding random variables, but adding a random number of random variables.

OK, so let's start talking about conditional expectations. I guess you know what they are. Suppose we are in the discrete world, where X and Y are discrete random variables. We defined the conditional expectation of X given that I told you the value of the random variable Y. We define it the same way as an ordinary expectation, except that we use the conditional PMF: E[X | Y = y] = Σ_x x p_{X|Y}(x | y). So we're using the probabilities that apply to the new universe where we are told the value of the random variable Y. So far this is still a familiar concept. If X is a continuous random variable, the formula is the same, except that we have an integral and we use the conditional density function of X.

Now, I want to introduce the new idea gently, through the example that we talked about last time.
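The discrete definition above can be sketched in a few lines of Python. This is a minimal sketch, assuming a small hypothetical joint PMF. The numbers in `joint_pmf` are made up for illustration and do not come from the lecture, and the helper names `marginal_y` and `conditional_expectation` are likewise hypothetical. The code divides the joint PMF by p_Y(y) to get the conditional PMF and then averages x, using `fractions.Fraction` so that the results are exact.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical joint PMF p_{X,Y}(x, y), keyed by (x, y); chosen only for illustration.
joint_pmf = {
    (0, 1): Fraction(1, 8), (1, 1): Fraction(1, 8), (2, 1): Fraction(2, 8),
    (0, 2): Fraction(2, 8), (2, 2): Fraction(2, 8),
}

def marginal_y(pmf):
    """p_Y(y) = sum over x of p_{X,Y}(x, y)."""
    p_y = defaultdict(Fraction)
    for (x, y), p in pmf.items():
        p_y[y] += p
    return dict(p_y)

def conditional_expectation(pmf, y):
    """E[X | Y = y] = sum_x x * p_{X|Y}(x | y), with p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y)."""
    p_y = marginal_y(pmf)[y]
    return sum(x * p / p_y for (x, yy), p in pmf.items() if yy == y)

for y in sorted(marginal_y(joint_pmf)):
    print(f"E[X | Y = {y}] = {conditional_expectation(joint_pmf, y)}")
# prints:
# E[X | Y = 1] = 5/4
# E[X | Y = 2] = 1
```

Each printed value is a single number, one for each possible value of y.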
Last time we talked about having a stick of a certain length ℓ. We take that stick and break it at a point that we choose uniformly at random. Let's denote by Y the place where we chose to break it. Having chosen Y, we're left with a piece of the stick, and I'm going to choose a place to break it once more, uniformly at random between 0 and Y. This is the second place at which we break it, and we call that place X.

OK, so what's the conditional expectation of X if I tell you the value of Y? I tell you that capital Y happens to take a specific numerical value, little y. Then X is chosen uniformly over the range between 0 and y, so the expected value of X is half of that range: the conditional expectation is little y over 2.

The important thing to realize here is that this quantity is a number. I told you that the random variable took a certain numerical value, let's say 3.5. And then you tell me that, given that the random variable took the numerical value 3.5, the expected value of X is 1.75.
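To make the conditional universe of the stick example concrete, here is a rough simulation sketch. It assumes a hypothetical stick length of 5, so that Y = 3.5 is possible. For a continuous Y the event Y = 3.5 has probability zero, so the sketch approximates conditioning by keeping only the trials in which Y falls within 0.05 of 3.5. The function name and its parameters are illustrative and do not come from the lecture.

```python
import random

def estimate_conditional_mean(y, length=5.0, half_width=0.05, n_trials=500_000, seed=1):
    """Approximate E[X | Y = y] in the stick-breaking experiment by simulation.

    Y ~ Uniform(0, length) is the first break point; given Y, X ~ Uniform(0, Y).
    Because P(Y = y) = 0, only trials with |Y - y| < half_width are kept.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_trials):
        first = rng.uniform(0.0, length)   # first break point Y
        second = rng.uniform(0.0, first)   # second break point X, uniform on [0, Y]
        if abs(first - y) < half_width:
            kept.append(second)
    return sum(kept) / len(kept)

est = estimate_conditional_mean(3.5)
print(f"estimate of E[X | Y = 3.5]: {est:.3f}  (exact: 3.5 / 2 = 1.75)")
```

With these settings roughly 2% of the trials are kept, and the estimate should land close to 1.75. Narrowing `half_width` tightens the approximation but leaves fewer samples.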
So this is an equality between numbers. On the other hand, before you do the experiment you don't know what Y is going to turn out to be. Little y is the numerical value that is observed when you do the experiment and observe the value of capital Y. So in some sense this quantity is not known ahead of time; it is random itself. So maybe we can start thinking of it as a random variable.

To put it differently, suppose that before we do the experiment I ask you what the expected value of X given Y is. You're going to answer, "Well, I don't know, it depends on what Y turns out to be." So the expected value of X given Y can itself be viewed as a random variable, because it depends on the random variable capital Y. Hidden here there is a statement about random variables instead of numbers, and we write that statement by thinking of the conditional expectation as a random variable instead of a number. It's a random variable when we do not plug in a specific number but think of it as an abstract object.
The expected value of X given the random variable Y is the random variable Y over 2, E[X | Y] = Y/2, no matter what capital Y turns out to be. So we take a statement about the equality of two numbers and turn it into a statement about the equality of two random variables. This is clearly a random variable, because capital Y is random.

What exactly is this object? I haven't yet defined it for you formally, so let's now give the formal definition of E[X | Y]. The conditional expectation of X given the random variable Y is a random variable. Which random variable is it? It's the random variable that takes the specific numerical value E[X | Y = y] whenever capital Y happens to take the specific numerical value little y. In particular, it is a function of the random variable capital Y. In this instance it's given by a simple formula in terms of capital Y; in other situations it might be a more complicated formula. So again, to summarize:
The conditional expectation can be thought of as a random variable instead of just a number. In any specific context, once you're given the value of capital Y, the conditional expectation becomes a number, the realized value of this random variable. But before the experiment starts, before you know what capital Y is going to be, all you can say is that the conditional expectation is going to be 1/2 of whatever capital Y turns out to be. This is a pretty subtle concept, an abstraction, but it's a useful abstraction, and we're going to see today how to use it.

All right, I have made the point that the conditional expectation, the object that takes these numerical values, is a random variable. If it is a random variable, it has an expectation of its own. So let's start thinking about what the expectation of the conditional expectation is going to turn out to be. The conditional expectation is a random variable, and in general it's some function g(Y) of the random variable Y that we are observing.
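The idea that E[X | Y] is itself a random variable, here g(Y) = Y/2, can be illustrated with a simulation sketch. It assumes a hypothetical stick length ℓ = 1 and a fixed seed. Each simulated Y produces its own realization of the conditional expectation. The script also averages those realizations, and separately averages the simulated values of X.

```python
import random

def g(y):
    """Numerical value of E[X | Y = y] in the stick example: half of y."""
    return y / 2

rng = random.Random(2)
length = 1.0   # hypothetical stick length ell
n = 200_000

ys = [rng.uniform(0.0, length) for _ in range(n)]  # realizations of Y
xs = [rng.uniform(0.0, y) for y in ys]             # X drawn uniformly on [0, Y] for each Y
cond_exp = [g(y) for y in ys]                      # realizations of the random variable E[X | Y] = g(Y)

print("first three realizations of E[X | Y]:", [round(v, 3) for v in cond_exp[:3]])
mean_cond_exp = sum(cond_exp) / n
mean_x = sum(xs) / n
print(f"average of E[X | Y]: {mean_cond_exp:.4f}, average of X: {mean_x:.4f}, ell/4 = {length / 4}")
```

Both averages should come out near ℓ/4 = 0.25. This anticipates the answer to the question posed above about the expectation of the conditional expectation.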
In terms of numerical values, if capital Y happens to take a specific numerical value y, then the conditional expectation also takes a specific numerical value, and we use the same function to evaluate it: E[X | Y = y] = g(y). The difference is that E[X | Y] = g(Y) is an equality of random variables, whereas E[X | Y = y] = g(y) is an equality between numbers.

Now, if we want to calculate the expected value of the conditional expectation, we're really talking about the expected value of a function of a random variable, and we know how to calculate expected values of a function. In the discrete case, for example, this is a sum over all y of the function whose expected value we're taking, times the probability that Y takes on that specific numerical value: E[g(Y)] = Σ_y g(y) p_Y(y). But let's remember what g is: g(y) is the numerical value of the conditional expectation of X given Y = y. And now when you see this expression you recognize it. The sum Σ_y E[X | Y = y] p_Y(y) is the expression that we get in the total expectation theorem.

Did I miss something?
Yes: in the total expectation theorem, to find the expected value of X, we divide the world into different scenarios depending on which value of Y occurs. We calculate the expectation in each one of the possible worlds, and we take the weighted average. So this is a formula that you have seen before, and you recognize that it gives the expected value of X. This is a longer, more detailed derivation of what I had written up here. The important thing to keep in mind, though, is the moral of the story, the punchline: the expected value of the conditional expectation is the expectation itself, E[E[X | Y]] = E[X]. This is just our total expectation theorem written in more abstract notation, and that more abstract notation comes in handy, as we're going to see in a while.

OK, we can apply this to our stick example. Suppose we want to find the expected value of X, that is, how much of the stick is left at the end. We can calculate it using this law of iterated expectations: it's the expected value of the conditional expectation. We know that the conditional expectation is Y over 2.
So E[X] = E[Y]/2. The expected value of Y is ℓ over 2, because Y is uniform on [0, ℓ], so we get ℓ over 4. This gives us the same answer that we derived last time in a rather long way.

All right, now that we have mastered conditional expectations, let's raise the bar a little more and talk about conditional variances. The conditional expectation is the mean value, or the expected value, in a conditional universe where you're told the value of Y. In that same conditional universe you can talk about the conditional distribution of X. That distribution has a mean, the conditional expectation, but it also has a variance. So we can talk about the variance of X in that conditional universe.

The conditional variance, as a number, is the natural thing: it's the variance of X, except that all the calculations are done in the conditional universe. In the conditional universe, the expected value of X is the conditional expectation. So X − E[X | Y = y] is the distance from the mean in the conditional universe, and we square it.
Then we take the average value of the squared distance, again calculated using the probabilities that apply in the conditional universe: var(X | Y = y) = E[(X − E[X | Y = y])² | Y = y]. This is an equality between numbers. I tell you the value of Y. Once you know that value, you can go ahead and plot the conditional distribution of X, and for that conditional distribution you can calculate the number that is the variance of X in that conditional universe.

So now let's repeat the mental gymnastics from the previous slide, abstract things, and define a random variable: the conditional variance. It's a random variable because we leave the numerical value of capital Y unspecified. Ahead of time we don't know what capital Y is going to be, and because of that we don't know ahead of time what the conditional variance is going to be. So if, before the experiment starts, I ask you what the conditional variance of X is, you're going to tell me, "Well, I don't know, it depends on what Y turns out to be." It's going to be something that depends on Y. So it's a random variable, which is a function of Y.
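For reference, the two versions of the conditional variance described above can be written as follows. The name h for the function of y is introduced here only for notation. The last line, var(X | Y) = Y²/12 for the stick example, is an illustration added here rather than something computed at this point in the lecture; it follows from the variance of a uniform distribution on [0, y].

```latex
\begin{aligned}
\operatorname{var}(X \mid Y = y) &= \mathbf{E}\!\left[\big(X - \mathbf{E}[X \mid Y = y]\big)^2 \,\middle|\, Y = y\right] && \text{(a number)}\\
\operatorname{var}(X \mid Y) &= h(Y), \quad \text{where } h(y) = \operatorname{var}(X \mid Y = y) && \text{(a random variable)}\\
\text{stick example: } \operatorname{var}(X \mid Y) &= \frac{Y^2}{12} && \text{since } X \mid \{Y = y\} \sim \mathrm{Uniform}[0, y]
\end{aligned}
```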
More precisely, the conditional variance, written in this notation with just capital letters, var(X | Y), is a random variable. Its value is completely determined once you learn the value of capital Y. If capital Y has a realization that's a specific number y, then the conditional variance also becomes a specific number, namely the conditional variance of X given Y = y in that universe.

All right, let's continue what we did on the previous slide. We had the law of iterated expectations, which told us that the expected value of the conditional expectation is the unconditional expectation. Is there a similar rule that might apply in this context? You might guess that the variance of X could be found by taking the expected value of the conditional variance. It turns out that this is not true. There is a formula for the variance in terms of conditional quantities, but it is a little more complicated, because it involves two terms instead of one: var(X) = E[var(X | Y)] + var(E[X | Y]). We're going to go quickly through the derivation of this formula.
And then, through examples, we'll try to get some interpretation of what the different terms correspond to.

All right, so let's try to prove this formula. The proof is a useful exercise for making sure you understand all the symbols involved. It's not difficult, just four and a half lines of algebra, of writing down formulas. The challenge is to make sure that at each point you understand what each one of the objects is.

We start from the formula for the variance of X. We know that, in general, the variance of X has a nice expression that we often use to calculate it: the expected value of the square of the random variable minus the square of the mean, var(X) = E[X²] − (E[X])². This is a general formula about variances, so of course it should also apply to conditional universes. If we put ourselves in a conditional universe where the random variable Y is given to us, the same math should work. So we should have a similar formula for conditional variances: the same formula, applied to the conditional universe.
The variance of X in the conditional universe is the expected value of X squared in the conditional universe, minus the square of the mean of X in the conditional universe: var(X | Y) = E[X² | Y] − (E[X | Y])². So this formula looks fine. Now let's take expected values of both sides. Remember, the conditional variance is a random variable, because its value depends on whatever realization we get for capital Y, so we can take its expectation. On the left we get the expected value of the conditional variance. On the right we first have the expected value of a conditional expectation, E[E[X² | Y]]. Here we use the fact that we discussed before: the expected value of a conditional expectation is the same as the unconditional expectation, so this term becomes E[X²]. Finally, we have a weird-looking random variable, (E[X | Y])², and we take its expected value. Altogether, E[var(X | Y)] = E[X²] − E[(E[X | Y])²].

All right, now we need to do something about this last term. Let's use the same rule as before to write down the variance of the conditional expectation, var(E[X | Y]). The variance of an expectation sounds kind of strange, but remember that the conditional expectation is random, because Y is random.
So this thing is a random variable, and therefore it has a variance. What is that variance? It's the expected value of the thing squared, minus the square of the expected value of the thing. And what's the expected value of that thing? By the law of iterated expectations, once more, the expected value of E[X | Y] is the unconditional expectation E[X]. That's why I put the unconditional expectation here: var(E[X | Y]) = E[(E[X | Y])²] − (E[X])². So I'm again using the general rule for calculating variances, this time applied to the conditional expectation.

Now notice what happens if you add these two expressions, (c) and (d), that is, the expected conditional variance and the variance of the conditional expectation. The two E[(E[X | Y])²] terms cancel, and we're left with E[X²] − (E[X])², which is the variance of X. And that's the end of the proof.

This is one of those proofs that do not convey any intuition. As I said, it's a useful proof to go through just to make sure you understand the symbols. It starts to get pretty confusing, and a little bit on the abstract side, so it's good to understand what's going on.
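The steps of the proof can be collected in one place. The labels (a) and (b) are added here for readability; (c) and (d) are the two expressions that are added together at the end. Step (c) uses the law of iterated expectations in the form E[E[X² | Y]] = E[X²], and step (d) uses it in the form E[E[X | Y]] = E[X].

```latex
\begin{aligned}
\operatorname{var}(X) &= \mathbf{E}[X^2] - (\mathbf{E}[X])^2 && \text{(a)}\\
\operatorname{var}(X \mid Y) &= \mathbf{E}[X^2 \mid Y] - (\mathbf{E}[X \mid Y])^2 && \text{(b)}\\
\mathbf{E}[\operatorname{var}(X \mid Y)] &= \mathbf{E}[X^2] - \mathbf{E}\big[(\mathbf{E}[X \mid Y])^2\big] && \text{(c)}\\
\operatorname{var}(\mathbf{E}[X \mid Y]) &= \mathbf{E}\big[(\mathbf{E}[X \mid Y])^2\big] - (\mathbf{E}[X])^2 && \text{(d)}\\
\mathbf{E}[\operatorname{var}(X \mid Y)] + \operatorname{var}(\mathbf{E}[X \mid Y]) &= \mathbf{E}[X^2] - (\mathbf{E}[X])^2 = \operatorname{var}(X) && \text{(c)} + \text{(d)}
\end{aligned}
```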
Now, there is intuition behind this formula, some of which is better left for later in the class, when we talk about inference. The idea is that you can interpret the conditional expectation as an estimate of the random variable X based on measurements of Y. You can then think of these variances as having something to do with an estimation error. Once you start thinking in those terms, an interpretation will come about. But again, this is better left for when we start talking about inference.

Nevertheless, we're going to get some intuition about these formulas by considering a baby example, where we apply the law of iterated expectations and the law of total variance. In the baby example, we do the beautiful experiment of giving a quiz to a class that is divided into sections, and we're interested in two random variables. We have a number of students, and they're all allocated to sections. The experiment is that I pick a student at random, and I look at two random variables.
One is the quiz score X of the randomly selected student, and the other, Y, is the section number of the student I have selected. We're given some statistics about the two sections. Section one has 10 students, and section two has 20 students. The quiz average in section one was 90, and the quiz average in section two was 60.

What's the expected value of X, that is, the expected quiz score if I pick a student at random? I'm assuming that each of the 30 students has the same probability of being selected, so I need to average the quiz scores of all of the students. The quiz scores in section one add up to 90 times 10, and the quiz scores in section two add up to 60 times 20. Dividing the total, 2100, by 30, we find that the overall average is 70. So this is the usual unconditional expectation.

Now let's look at the conditional expectation, starting with the elementary version, where we're talking about numerical values. If I tell you that the randomly selected student was in section one, what's the expected value of that student's quiz score?
Well, given this information, we're picking a student uniformly at random from the section in which the average was 90, so the expected value of that student's score is going to be 90. So given the specific value of Y, the specific section, the conditional expectation of the quiz score is a specific number, the number 90. Similarly, for the second section the expected value is 60, the average score in the second section. This is the elementary version.

What about the abstract version? In the abstract version, the conditional expectation is a random variable, because it depends on which section the student I picked is in. With probability 1/3 I'm going to pick a student in the first section, in which case the conditional expectation will be 90. With probability 2/3 I'm going to pick a student in the second section, in which case the conditional expectation will take the value 60. So this illustrates the idea that the conditional expectation is a random variable: depending on what Y turns out to be, the conditional expectation takes one value or the other, with certain probabilities.
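The quiz calculations in this passage can be checked with exact arithmetic. This is a minimal sketch. The `sections` dictionary is just one hypothetical way to encode the given statistics (10 students averaging 90, 20 students averaging 60), and the code assumes, as the lecture does, that every student is equally likely to be picked. It computes E[X] directly, builds the PMF of the random variable E[X | Y], and then computes that random variable's mean and variance.

```python
from fractions import Fraction

# Quiz example: section number -> (number of students, section average).
sections = {1: (10, 90), 2: (20, 60)}
total_students = sum(n for n, _ in sections.values())

# Direct route: average over all students, each equally likely to be picked.
overall_mean = Fraction(sum(n * avg for n, avg in sections.values()), total_students)

# PMF of the random variable E[X | Y]: its value is a section average, and its
# probability is the fraction of students in that section (merging equal averages).
cond_exp_pmf = {}
for n, avg in sections.values():
    cond_exp_pmf[avg] = cond_exp_pmf.get(avg, 0) + Fraction(n, total_students)

# Law of iterated expectations: E[E[X | Y]] should equal E[X].
mean_of_cond_exp = sum(value * p for value, p in cond_exp_pmf.items())

# Variance of the random variable E[X | Y] around its own mean.
var_of_cond_exp = sum(p * (value - mean_of_cond_exp) ** 2 for value, p in cond_exp_pmf.items())

print(overall_mean, mean_of_cond_exp, var_of_cond_exp)  # prints: 70 70 200
```

Using `Fraction` keeps the weights 1/3 and 2/3 exact, so the equality of the two means is an exact check rather than a floating-point coincidence.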
418 00:22:27,260 --> 00:22:29,230 Now that we have the distribution of the 419 00:22:29,230 --> 00:22:31,610 conditional expectation, we can calculate the 420 00:22:31,610 --> 00:22:33,560 expected value of it. 421 00:22:33,560 --> 00:22:37,220 And the expected value of such a random variable is 1/3 times 422 00:22:37,220 --> 00:22:44,000 90, plus 2/3 times 60, and it comes out to equal 70. 423 00:22:44,000 --> 00:22:49,020 This, miraculously, is the same number that we got up there. 424 00:22:49,020 --> 00:22:53,060 So this tells you that you can calculate the overall average 425 00:22:53,060 --> 00:22:58,180 in a large class by taking the averages in each one of the 426 00:22:58,180 --> 00:23:02,900 sections and weighting each one of the sections according to 427 00:23:02,900 --> 00:23:06,320 the number of students that it has. 428 00:23:06,320 --> 00:23:10,560 So this section had an average of 90, but only 1/3 of the 429 00:23:10,560 --> 00:23:13,850 students, so it gets a weight of 1/3. 430 00:23:13,850 --> 00:23:16,520 So the law of iterated expectations, once more, is 431 00:23:16,520 --> 00:23:18,540 nothing too complicated. 432 00:23:18,540 --> 00:23:20,770 It's just that you can calculate the overall class 433 00:23:20,770 --> 00:23:22,780 average by looking at the section 434 00:23:22,780 --> 00:23:26,330 averages and combining them. 435 00:23:26,330 --> 00:23:28,680 Now since the conditional expectation is a random 436 00:23:28,680 --> 00:23:31,860 variable, of course it has a variance of its own. 437 00:23:31,860 --> 00:23:34,080 So let's calculate the variance. 438 00:23:34,080 --> 00:23:36,060 How do we calculate variances? 439 00:23:36,060 --> 00:23:38,960 We look at all the possible numerical values of this 440 00:23:38,960 --> 00:23:42,270 random variable, which are 90 and 60.
441 00:23:42,270 --> 00:23:45,620 We look at the difference of those possible numerical 442 00:23:45,620 --> 00:23:49,910 values from the mean of this random variable, and the mean 443 00:23:49,910 --> 00:23:53,770 of that random variable, we found, is 70. 444 00:23:53,770 --> 00:23:57,480 And then we weight the different possible numerical 445 00:23:57,480 --> 00:23:59,960 values according to their probabilities. 446 00:23:59,960 --> 00:24:03,930 So with probability 1/3 the conditional expectation is 90, 447 00:24:03,930 --> 00:24:06,940 which is 20 away from the mean. 448 00:24:06,940 --> 00:24:08,470 And we get this squared distance. 449 00:24:08,470 --> 00:24:11,750 With probability 2/3 the conditional expectation is 60, 450 00:24:11,750 --> 00:24:14,400 which is 10 away from the mean, so it has this squared 451 00:24:14,400 --> 00:24:16,910 distance, which gets weighted by 2/3, the 452 00:24:16,910 --> 00:24:18,470 probability of 60. 453 00:24:18,470 --> 00:24:21,130 So you do the numbers, and you get the value for the variance 454 00:24:21,130 --> 00:24:26,800 equal to 200. 455 00:24:26,800 --> 00:24:30,250 All right, so now we want to move towards using that more 456 00:24:30,250 --> 00:24:33,770 complicated formula involving the conditional variances. 457 00:24:33,770 --> 00:24:36,650 458 00:24:36,650 --> 00:24:40,470 OK, suppose someone goes and calculates the variance of the 459 00:24:40,470 --> 00:24:44,060 quiz scores inside each one of the sections. 460 00:24:44,060 --> 00:24:47,680 So someone gives us these two pieces of information. 461 00:24:47,680 --> 00:24:53,230 In section one we take the differences from the mean in 462 00:24:53,230 --> 00:24:57,900 that section, and let's say that the variance turns out to 463 00:24:57,900 --> 00:25:00,240 be a number equal to 10, and similarly 464 00:25:00,240 --> 00:25:01,410 for the second section.
465 00:25:01,410 --> 00:25:05,280 So these are the variances of the quiz scores inside 466 00:25:05,280 --> 00:25:07,520 individual sections. 467 00:25:07,520 --> 00:25:09,850 The variance in one conditional universe, the 468 00:25:09,850 --> 00:25:13,290 variance in the other conditional universe. 469 00:25:13,290 --> 00:25:18,860 So if I pick a student in section one and I don't tell 470 00:25:18,860 --> 00:25:21,400 you anything more about the student, what's the variance 471 00:25:21,400 --> 00:25:23,530 of the random score of that student? 472 00:25:23,530 --> 00:25:25,810 The variance is 10. 473 00:25:25,810 --> 00:25:28,210 I know y, but I don't know the student. 474 00:25:28,210 --> 00:25:31,260 So the score is still a random variable in that universe. 475 00:25:31,260 --> 00:25:33,860 It has a variance, and that's the variance. 476 00:25:33,860 --> 00:25:36,330 Similarly, in the other universe, the variance of the 477 00:25:36,330 --> 00:25:39,110 quiz scores is this number, 20. 478 00:25:39,110 --> 00:25:42,650 Once more, this is an equality between numbers. 479 00:25:42,650 --> 00:25:44,920 I have fixed the specific value of y. 480 00:25:44,920 --> 00:25:48,440 I put myself in a specific universe, so I can calculate the 481 00:25:48,440 --> 00:25:51,430 variance in that specific universe. 482 00:25:51,430 --> 00:25:55,150 If I don't specify a numerical value for capital Y, and say I 483 00:25:55,150 --> 00:25:58,390 don't know what Y is going to be, it's going to be random. 484 00:25:58,390 --> 00:26:02,510 Then the section variance I'm going to get 485 00:26:02,510 --> 00:26:04,500 will itself be random. 486 00:26:04,500 --> 00:26:09,530 With probability 1/3, I pick a student in the first section, 487 00:26:09,530 --> 00:26:14,740 in which case the conditional variance given what I have 488 00:26:14,740 --> 00:26:16,630 picked is going to be 10.
489 00:26:16,630 --> 00:26:20,990 Or with probability 2/3 I pick y equal to 2, and I place 490 00:26:20,990 --> 00:26:22,690 myself in that universe. 491 00:26:22,690 --> 00:26:25,790 And in that universe the conditional variance is 20. 492 00:26:25,790 --> 00:26:28,320 So you see again from here that the conditional variance 493 00:26:28,320 --> 00:26:32,410 is a random variable that takes different values with 494 00:26:32,410 --> 00:26:33,920 certain probabilities. 495 00:26:33,920 --> 00:26:37,830 And which value it takes depends on the realization of 496 00:26:37,830 --> 00:26:41,670 the random variable capital Y. So this happens if capital Y 497 00:26:41,670 --> 00:26:45,970 is equal to 1, and this happens if capital Y is equal to 2. 498 00:26:45,970 --> 00:26:50,000 Once you have something of this form-- 499 00:26:50,000 --> 00:26:52,040 a random variable that takes values with certain 500 00:26:52,040 --> 00:26:53,150 probabilities-- 501 00:26:53,150 --> 00:26:55,690 then you can certainly calculate the expected value 502 00:26:55,690 --> 00:26:57,320 of that random variable. 503 00:26:57,320 --> 00:27:00,110 Don't get intimidated by the fact that this random 504 00:27:00,110 --> 00:27:03,555 variable is described by a string 505 00:27:03,555 --> 00:27:07,850 of eight symbols, or seven, instead of 506 00:27:07,850 --> 00:27:09,440 just a single letter. 507 00:27:09,440 --> 00:27:15,290 Think of this whole string of symbols there as just being a 508 00:27:15,290 --> 00:27:16,940 random variable. 509 00:27:16,940 --> 00:27:21,790 You could call it Z, for example, and use one letter. 510 00:27:21,790 --> 00:27:25,990 So Z is a random variable that takes these two values with 511 00:27:25,990 --> 00:27:27,990 these corresponding probabilities.
512 00:27:27,990 --> 00:27:31,210 So we can talk about the expected value of Z, which is 513 00:27:31,210 --> 00:27:35,560 going to be 1/3 times 10 plus 2/3 times 20, and we get a certain 514 00:27:35,560 --> 00:27:38,260 number from here. 515 00:27:38,260 --> 00:27:41,620 And now we have all the pieces to calculate the overall 516 00:27:41,620 --> 00:27:43,620 variance of x. 517 00:27:43,620 --> 00:27:49,330 The formula from the previous slide tells us this. 518 00:27:49,330 --> 00:27:51,310 Do we have all the pieces? 519 00:27:51,310 --> 00:27:53,190 The expected value of the variance, we 520 00:27:53,190 --> 00:27:55,160 just calculated it. 521 00:27:55,160 --> 00:27:58,710 The variance of the expected value, this was the last 522 00:27:58,710 --> 00:28:00,410 calculation in the previous slide. 523 00:28:00,410 --> 00:28:03,490 We did get a number for it: it was 200. 524 00:28:03,490 --> 00:28:05,765 You add the two, and you find the total variance. 525 00:28:05,765 --> 00:28:09,050 526 00:28:09,050 --> 00:28:12,350 Now the useful piece of this exercise is to try to 527 00:28:12,350 --> 00:28:16,490 interpret these two numbers, and see what they mean. 528 00:28:16,490 --> 00:28:20,350 529 00:28:20,350 --> 00:28:26,670 The variance of x given y for a specific y is the variance 530 00:28:26,670 --> 00:28:28,850 inside section one. 531 00:28:28,850 --> 00:28:31,820 This is the variance inside section two. 532 00:28:31,820 --> 00:28:34,940 The expected value is some kind of average of the 533 00:28:34,940 --> 00:28:38,440 variances inside individual sections. 534 00:28:38,440 --> 00:28:41,770 So this term tells us something about the 535 00:28:41,770 --> 00:28:46,010 variability of these scores, how widely spread they are 536 00:28:46,010 --> 00:28:47,856 within individual sections.
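The whole two-section calculation can be checked with exact arithmetic. This is a minimal sketch that takes the lecture's figures as exact (section probabilities 1/3 and 2/3, section means 90 and 60, section variances 10 and 20). The helper name `expect` is made up for this illustration.

```python
from fractions import Fraction

# Figures from the two-section quiz example, taken as exact.
p = {1: Fraction(1, 3), 2: Fraction(2, 3)}       # P(Y = y), share of students in section y
cond_mean = {1: Fraction(90), 2: Fraction(60)}   # E[X | Y = y], section averages
cond_var = {1: Fraction(10), 2: Fraction(20)}    # var(X | Y = y), within-section variances


def expect(values):
    """Expected value of a function of Y, given its value for each section y."""
    return sum(p[y] * values[y] for y in p)


mean_x = expect(cond_mean)                                                # E[E[X | Y]] = E[X]
var_of_cond_mean = expect({y: (cond_mean[y] - mean_x) ** 2 for y in p})  # var(E[X | Y])
mean_of_cond_var = expect(cond_var)                                       # E[var(X | Y)]
total_var = mean_of_cond_var + var_of_cond_mean                           # law of total variance

print(mean_x, var_of_cond_mean, mean_of_cond_var, total_var)  # 70 200 50/3 650/3
```

In this example most of the total variance (200 out of about 216.7) comes from the differences between section means. The spread inside each section contributes much less.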
537 00:28:47,856 --> 00:28:50,580 538 00:28:50,580 --> 00:28:57,870 So we have three sections, and these scores happen to be-- 539 00:28:57,870 --> 00:29:01,180 OK, let's say the sections are really different. 540 00:29:01,180 --> 00:29:03,190 So here you have undergraduates and here you 541 00:29:03,190 --> 00:29:05,860 have post-doctoral students. 542 00:29:05,860 --> 00:29:08,590 And these are the quiz scores: that's section one, section 543 00:29:08,590 --> 00:29:09,960 two, section three. 544 00:29:09,960 --> 00:29:13,360 Here's the mean of the first section. 545 00:29:13,360 --> 00:29:16,200 And the variance has something to do with the spread. 546 00:29:16,200 --> 00:29:18,430 The variance in the second section has something to do 547 00:29:18,430 --> 00:29:21,830 with the spread, and similarly for the third section. 548 00:29:21,830 --> 00:29:28,220 And the expected value of the conditional variances is some 549 00:29:28,220 --> 00:29:31,690 weighted average of the three variances that we get from 550 00:29:31,690 --> 00:29:33,720 individual sections. 551 00:29:33,720 --> 00:29:37,060 So variability within sections definitely contributes 552 00:29:37,060 --> 00:29:40,000 something to the overall variability of these scores. 553 00:29:40,000 --> 00:29:45,340 But if you ask me about the variability over the entire 554 00:29:45,340 --> 00:29:47,740 class, there's a second effect. 555 00:29:47,740 --> 00:29:50,470 That has to do with the fact that different sections are 556 00:29:50,470 --> 00:29:52,660 very different from each other. 557 00:29:52,660 --> 00:29:59,440 The scores here are far away from those scores. 558 00:29:59,440 --> 00:30:02,490 And this term is the one that does the job. 559 00:30:02,490 --> 00:30:08,410 This one looks at the expected values inside each section, 560 00:30:08,410 --> 00:30:12,840 and these expected values are this, this, and that. 561 00:30:12,840 --> 00:30:18,230 And it asks the question: how widely spread are they?
562 00:30:18,230 --> 00:30:23,000 It asks how different from each other the means 563 00:30:23,000 --> 00:30:25,400 inside individual sections are. 564 00:30:25,400 --> 00:30:28,280 And in this picture it would be a large number, because the 565 00:30:28,280 --> 00:30:31,980 different section means are quite different. 566 00:30:31,980 --> 00:30:35,890 So the story that this formula is telling us is that the 567 00:30:35,890 --> 00:30:40,810 overall variability of the quiz scores consists of two 568 00:30:40,810 --> 00:30:44,720 factors that can be quantified and added. 569 00:30:44,720 --> 00:30:49,580 One factor is how much variability there is inside 570 00:30:49,580 --> 00:30:51,420 individual sections. 571 00:30:51,420 --> 00:30:54,990 And the other factor is how different the sections are 572 00:30:54,990 --> 00:30:56,100 from each other. 573 00:30:56,100 --> 00:30:58,620 Both effects contribute to the overall 574 00:30:58,620 --> 00:30:59,885 variability of these scores. 575 00:30:59,885 --> 00:31:03,920 576 00:31:03,920 --> 00:31:08,290 Let's continue with just one more numerical example, 577 00:31:08,290 --> 00:31:11,730 just to get the hang of doing these kinds of calculations, 578 00:31:11,730 --> 00:31:15,810 and to apply this formula to do a divide and conquer calculation 579 00:31:15,810 --> 00:31:18,270 of the variance of a random variable. 580 00:31:18,270 --> 00:31:20,830 Just for variety, now we're going to take a continuous 581 00:31:20,830 --> 00:31:22,140 random variable. 582 00:31:22,140 --> 00:31:25,890 Somebody gives you a PDF of this form, and they ask you 583 00:31:25,890 --> 00:31:26,640 for the variance. 584 00:31:26,640 --> 00:31:29,490 And you say, oh, that's too complicated, I don't want to 585 00:31:29,490 --> 00:31:30,350 do integrals. 586 00:31:30,350 --> 00:31:32,480 Can I divide and conquer? 587 00:31:32,480 --> 00:31:35,210 And you say OK, let me do the following trick.
588 00:31:35,210 --> 00:31:37,830 Let me define a random variable, y, 589 00:31:37,830 --> 00:31:43,450 which takes the value 1 if x falls in here, and takes the 590 00:31:43,450 --> 00:31:47,080 value 2 if x falls in the second interval. 591 00:31:47,080 --> 00:31:51,340 And let me try to work in the conditional world where things 592 00:31:51,340 --> 00:31:54,340 might be easier, and then add things up to 593 00:31:54,340 --> 00:31:57,540 get the overall variance. 594 00:31:57,540 --> 00:32:01,500 So I have defined y this particular way. 595 00:32:01,500 --> 00:32:04,562 In this example y becomes a function of x. 596 00:32:04,562 --> 00:32:07,370 y is completely determined by x. 597 00:32:07,370 --> 00:32:11,230 And I'm going to calculate the overall variance by trying to 598 00:32:11,230 --> 00:32:14,420 calculate all of the terms that are involved here. 599 00:32:14,420 --> 00:32:16,430 So let's start calculating. 600 00:32:16,430 --> 00:32:21,690 The first observation is that this event has probability 1/3, and 601 00:32:21,690 --> 00:32:24,390 this event has probability 2/3. 602 00:32:24,390 --> 00:32:28,480 The expected value of x given that we are in this universe 603 00:32:28,480 --> 00:32:31,260 is 1/2, because we have a uniform 604 00:32:31,260 --> 00:32:33,350 distribution from 0 to 1. 605 00:32:33,350 --> 00:32:36,630 Here we have a uniform distribution from 1 to 2, so 606 00:32:36,630 --> 00:32:40,820 the conditional expectation of x in that universe is 3/2. 607 00:32:40,820 --> 00:32:43,200 How about conditional variances? 608 00:32:43,200 --> 00:32:48,920 In the world where y is equal to 1, x has a uniform 609 00:32:48,920 --> 00:32:50,770 distribution on a unit interval. 610 00:32:50,770 --> 00:32:53,090 What's the variance of x? 611 00:32:53,090 --> 00:32:57,480 By now you've probably seen that formula: it's 1 over 12.
612 00:32:57,480 --> 00:33:00,580 1 over 12 is the variance of a uniform distribution over a 613 00:33:00,580 --> 00:33:01,880 unit interval. 614 00:33:01,880 --> 00:33:07,120 When y is equal to 2, the variance is again 1 over 12, 615 00:33:07,120 --> 00:33:10,850 because in this instance x again has a uniform distribution 616 00:33:10,850 --> 00:33:13,360 over an interval of unit length. 617 00:33:13,360 --> 00:33:16,010 What's the overall expected value of x? 618 00:33:16,010 --> 00:33:19,080 The way you find the overall expected value is to consider 619 00:33:19,080 --> 00:33:21,370 the different numerical values of the conditional 620 00:33:21,370 --> 00:33:22,450 expectation 621 00:33:22,450 --> 00:33:25,570 and weight them according to their probabilities. 622 00:33:25,570 --> 00:33:28,770 So with probability 1/3 the conditional 623 00:33:28,770 --> 00:33:30,830 expectation is 1/2. 624 00:33:30,830 --> 00:33:34,170 And with probability 2/3 the conditional 625 00:33:34,170 --> 00:33:36,460 expectation is 3 over 2. 626 00:33:36,460 --> 00:33:39,555 And this turns out to be 7 over 6. 627 00:33:39,555 --> 00:33:45,080 628 00:33:45,080 --> 00:33:48,450 So this is the advance work we need to do; now let's 629 00:33:48,450 --> 00:33:50,660 calculate a few things here. 630 00:33:50,660 --> 00:33:56,660 What's the variance of the expected value of x given y? 631 00:33:56,660 --> 00:34:00,800 The expected value of x given y is a random variable that takes 632 00:34:00,800 --> 00:34:06,600 these two values with these probabilities. 633 00:34:06,600 --> 00:34:10,610 So to find the variance, we take the probability that 634 00:34:10,610 --> 00:34:18,730 the conditional expectation takes the numerical value 1/2, times the square of 1/2 minus 635 00:34:18,730 --> 00:34:23,659 the mean of the conditional expectation. 636 00:34:23,659 --> 00:34:26,820 What's the mean of the conditional expectation? 637 00:34:26,820 --> 00:34:28,560 It's the unconditional expectation.
638 00:34:28,560 --> 00:34:30,980 So it's 7 over 6. 639 00:34:30,980 --> 00:34:32,889 We just did that calculation. 640 00:34:32,889 --> 00:34:38,050 So I'm putting that number, 7 over 6, here, and squaring the difference. 641 00:34:38,050 --> 00:34:41,830 And then there's a second term: with probability 2/3, the 642 00:34:41,830 --> 00:34:48,760 conditional expectation takes this value of 3 over 2, which 643 00:34:48,760 --> 00:34:54,380 is this far away from the mean, and we get this contribution. 644 00:34:54,380 --> 00:34:57,800 So this way we have calculated the variance of the 645 00:34:57,800 --> 00:35:01,590 conditional expectation; this is this term. 646 00:35:01,590 --> 00:35:04,000 What is this? 647 00:35:04,000 --> 00:35:05,940 Any guesses what this number is? 648 00:35:05,940 --> 00:35:09,900 649 00:35:09,900 --> 00:35:11,740 It's 1 over 12. Why? 650 00:35:11,740 --> 00:35:15,740 The conditional variance just happens in this example to be 651 00:35:15,740 --> 00:35:18,550 1 over 12 no matter what. 652 00:35:18,550 --> 00:35:21,240 So the conditional variance is a deterministic random 653 00:35:21,240 --> 00:35:23,530 variable that takes a constant value. 654 00:35:23,530 --> 00:35:27,110 So the expected value of this random variable 655 00:35:27,110 --> 00:35:29,490 is just 1 over 12. 656 00:35:29,490 --> 00:35:35,460 So we have the two pieces that we need, and so we do have the 657 00:35:35,460 --> 00:35:39,515 overall variance of the random variable x. 658 00:35:39,515 --> 00:35:45,680 659 00:35:45,680 --> 00:35:50,750 So this was just an academic example in order to get the 660 00:35:50,750 --> 00:35:56,660 hang of how to manipulate various quantities. 661 00:35:56,660 --> 00:36:00,480 Now let's use what we have learned and the tools that we 662 00:36:00,480 --> 00:36:04,410 have to do something a little more interesting. 663 00:36:04,410 --> 00:36:07,820 OK, so by now you're all in love with probabilities.
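Here is a sketch that redoes the piecewise-uniform divide-and-conquer example exactly. The density is taken to be 1/3 on [0, 1] and 2/3 on [1, 2], which matches the probabilities 1/3 and 2/3 and the uniform conditional distributions described above. As a cross-check, the sketch also computes the variance directly as E[X^2] minus (E[X])^2, without conditioning.

```python
from fractions import Fraction

# Y = 1 if X falls in [0, 1], Y = 2 if X falls in [1, 2].
# The density is 1/3 on [0, 1] and 2/3 on [1, 2], so X is uniform within each piece.
p = {1: Fraction(1, 3), 2: Fraction(2, 3)}           # P(Y = y)
cond_mean = {1: Fraction(1, 2), 2: Fraction(3, 2)}   # midpoint of each unit interval
cond_var = {1: Fraction(1, 12), 2: Fraction(1, 12)}  # variance of a uniform on a unit interval

mean_x = sum(p[y] * cond_mean[y] for y in p)                             # E[X]
var_of_cond_mean = sum(p[y] * (cond_mean[y] - mean_x) ** 2 for y in p)  # var(E[X | Y])
mean_of_cond_var = sum(p[y] * cond_var[y] for y in p)                    # E[var(X | Y)]
var_x = mean_of_cond_var + var_of_cond_mean

# Cross-check without conditioning: var(X) = E[X^2] - (E[X])^2, using
# E[X^2] = (a^2 + a*b + b^2) / 3 for X uniform on [a, b].
second_moment = (p[1] * Fraction(0**2 + 0 * 1 + 1**2, 3)
                 + p[2] * Fraction(1**2 + 1 * 2 + 2**2, 3))
assert var_x == second_moment - mean_x ** 2

print(mean_x, var_of_cond_mean, var_x)  # 7/6 2/9 11/36
```

Both routes give 11/36. The conditional route needs only the mean and variance of a uniform distribution on each piece.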
664 00:36:07,820 --> 00:36:11,590 So over the weekend you're going to go to bookstores to buy 665 00:36:11,590 --> 00:36:13,540 probability books. 666 00:36:13,540 --> 00:36:19,110 So you're going to visit a random number of bookstores, and 667 00:36:19,110 --> 00:36:23,900 at each one of the bookstores you're going to spend a random 668 00:36:23,900 --> 00:36:26,420 amount of money. 669 00:36:26,420 --> 00:36:31,060 So let n be the number of stores that you are visiting. 670 00:36:31,060 --> 00:36:32,890 So n is an integer-valued, 671 00:36:32,890 --> 00:36:34,870 non-negative random variable, 672 00:36:34,870 --> 00:36:37,050 and perhaps you know the distribution 673 00:36:37,050 --> 00:36:39,230 of that random variable. 674 00:36:39,230 --> 00:36:44,080 Each time that you walk into a store, your mind is clear from 675 00:36:44,080 --> 00:36:48,580 whatever you did before, and you just buy a random number 676 00:36:48,580 --> 00:36:51,530 of books that has nothing to do with how many books you 677 00:36:51,530 --> 00:36:53,650 bought earlier in the day. 678 00:36:53,650 --> 00:36:55,890 It has nothing to do with how many stores you are 679 00:36:55,890 --> 00:36:57,490 visiting, and so on. 680 00:36:57,490 --> 00:37:00,760 So each time you enter as a brand new person, buy a 681 00:37:00,760 --> 00:37:02,180 random number of books, and spend a 682 00:37:02,180 --> 00:37:03,580 random amount of money. 683 00:37:03,580 --> 00:37:07,160 So what I'm saying, more precisely, is that I'm making 684 00:37:07,160 --> 00:37:08,760 the following assumptions: 685 00:37:08,760 --> 00:37:11,130 for each store i-- 686 00:37:11,130 --> 00:37:14,360 if you end up visiting the i-th store-- 687 00:37:14,360 --> 00:37:17,480 the amount of money that you spend is a random variable 688 00:37:17,480 --> 00:37:19,090 that has a certain distribution.
689 00:37:19,090 --> 00:37:23,410 That distribution is the same for each store, and the xi's 690 00:37:23,410 --> 00:37:26,890 from store to store are independent from each other. 691 00:37:26,890 --> 00:37:30,800 And furthermore, the xi's are all independent of n. 692 00:37:30,800 --> 00:37:34,130 So how much I'm spending at the store-- once I get in-- 693 00:37:34,130 --> 00:37:37,280 has nothing to do with how many stores I'm visiting. 694 00:37:37,280 --> 00:37:40,700 So this is the setting that we're going to look at. 695 00:37:40,700 --> 00:37:45,470 y is the total amount of money that you did spend. 696 00:37:45,470 --> 00:37:48,790 It's the sum of how much you spent in the stores, but the 697 00:37:48,790 --> 00:37:53,980 index goes up to capital N. And what's the twist here? 698 00:37:53,980 --> 00:37:57,460 It's that we're dealing with the sum of independent random 699 00:37:57,460 --> 00:38:02,690 variables except that how many random variables we have is 700 00:38:02,690 --> 00:38:07,470 not given to us ahead of time, but it is chosen at random. 701 00:38:07,470 --> 00:38:12,480 So it's a sum of a random number of random variables. 702 00:38:12,480 --> 00:38:15,360 We would like to calculate some quantities that have to 703 00:38:15,360 --> 00:38:19,690 do with y, in particular the expected value of y, or the 704 00:38:19,690 --> 00:38:21,930 variance of y. 705 00:38:21,930 --> 00:38:23,540 How do we go about it? 706 00:38:23,540 --> 00:38:26,950 OK, we know something about the linearity of expectations. 707 00:38:26,950 --> 00:38:31,890 That expectation of a sum is the sum of the expectations. 708 00:38:31,890 --> 00:38:37,180 But we have used that rule only in the case where it's 709 00:38:37,180 --> 00:38:39,850 the sum of a fixed number of random variables. 710 00:38:39,850 --> 00:38:43,670 So expected value of x plus y plus z is expectation of x, 711 00:38:43,670 --> 00:38:46,390 plus expectation of y, plus expectation of z. 
712 00:38:46,390 --> 00:38:48,960 We know this for a fixed number of random variables. 713 00:38:48,960 --> 00:38:53,140 We don't know whether or how it works for the case of a 714 00:38:53,140 --> 00:38:54,430 random number. 715 00:38:54,430 --> 00:38:57,870 Well, since we know something about the case of a fixed 716 00:38:57,870 --> 00:39:01,730 number of random variables, let's transport ourselves to a 717 00:39:01,730 --> 00:39:05,310 conditional universe where the number of random variables 718 00:39:05,310 --> 00:39:07,570 we're summing is fixed. 719 00:39:07,570 --> 00:39:11,640 So let's try to break up the problem, divide and conquer, by 720 00:39:11,640 --> 00:39:15,300 conditioning on the different possible values of the number 721 00:39:15,300 --> 00:39:17,290 of bookstores that we're visiting. 722 00:39:17,290 --> 00:39:19,860 So let's work in the conditional universe, find the 723 00:39:19,860 --> 00:39:24,950 conditional expectation in this universe, and then use 724 00:39:24,950 --> 00:39:29,630 our law of iterated expectations to see what 725 00:39:29,630 --> 00:39:32,840 happens more generally. 726 00:39:32,840 --> 00:39:37,120 Suppose I told you that you visited exactly little n stores, where 727 00:39:37,120 --> 00:39:40,420 little n now is a number, let's say 10. 728 00:39:40,420 --> 00:39:44,840 Then the amount of money you're spending is x1 plus x2 729 00:39:44,840 --> 00:39:51,060 all the way up to x10, given that you visited 10 stores. 730 00:39:51,060 --> 00:39:54,640 So what I have done here is that I've replaced the capital 731 00:39:54,640 --> 00:39:59,370 N with little n, and I can do this because I'm now in the 732 00:39:59,370 --> 00:40:01,160 conditional universe where I know that 733 00:40:01,160 --> 00:40:04,160 capital N is little n. 734 00:40:04,160 --> 00:40:06,840 Now little n is fixed. 735 00:40:06,840 --> 00:40:10,810 We have assumed that n is independent from the xi's.
736 00:40:10,810 --> 00:40:15,900 So in this universe of a fixed n, this information here 737 00:40:15,900 --> 00:40:20,400 doesn't tell me anything new about the values of the x's. 738 00:40:20,400 --> 00:40:24,600 If you're conditioning on random variables that are independent 739 00:40:24,600 --> 00:40:27,220 from the random variables you are interested in, the 740 00:40:27,220 --> 00:40:30,630 conditioning has no effect, and so it can be dropped. 741 00:40:30,630 --> 00:40:33,000 So in this conditional universe where you visit 742 00:40:33,000 --> 00:40:35,720 exactly 10 stores, the expected amount of money you're 743 00:40:35,720 --> 00:40:40,840 spending is the expectation of the amount of money spent in 744 00:40:40,840 --> 00:40:44,350 10 stores, which is the sum of the expected amount of money 745 00:40:44,350 --> 00:40:45,880 spent in each store. 746 00:40:45,880 --> 00:40:48,760 Each one of these is the same number, because the random 747 00:40:48,760 --> 00:40:50,960 variables have identical distributions. 748 00:40:50,960 --> 00:40:54,130 So it's n times the expected value of money you spend in a 749 00:40:54,130 --> 00:40:57,140 typical store. 750 00:40:57,140 --> 00:41:02,240 This is almost obvious without doing it formally. 751 00:41:02,240 --> 00:41:05,010 If I'm telling you that you're visiting 10 stores, what you 752 00:41:05,010 --> 00:41:09,220 expect to spend is 10 times the amount you expect to spend 753 00:41:09,220 --> 00:41:12,180 in each store individually. 754 00:41:12,180 --> 00:41:16,480 Now let's take this equality here and rewrite it in our 755 00:41:16,480 --> 00:41:20,030 abstract notation, in terms of random variables. 756 00:41:20,030 --> 00:41:22,170 This is an equality between numbers. 757 00:41:22,170 --> 00:41:25,440 The expected value of y given that you visit 10 stores is 10 758 00:41:25,440 --> 00:41:28,220 times this particular number. 759 00:41:28,220 --> 00:41:30,345 Let's translate it into random variables.
760 00:41:30,345 --> 00:41:36,290 In random variable notation, the expected value of money 761 00:41:36,290 --> 00:41:39,610 you're spending given the number of stores-- 762 00:41:39,610 --> 00:41:42,480 but without telling you a specific number-- 763 00:41:42,480 --> 00:41:46,720 is whatever that number of stores turns out to be times 764 00:41:46,720 --> 00:41:49,300 the expected value of x. 765 00:41:49,300 --> 00:41:55,110 So this is a random variable that takes this as a numerical 766 00:41:55,110 --> 00:41:58,150 value whenever capital N happens to be 767 00:41:58,150 --> 00:42:00,030 equal to little n. 768 00:42:00,030 --> 00:42:04,570 This is a random variable, which by definition takes this 769 00:42:04,570 --> 00:42:07,450 numerical value whenever capital N is 770 00:42:07,450 --> 00:42:09,520 equal to little n. 771 00:42:09,520 --> 00:42:14,960 So no matter what specific value, little n, capital N 772 00:42:14,960 --> 00:42:18,870 happens to take, this is equal to that. 773 00:42:18,870 --> 00:42:21,590 Therefore the value of this random variable is going to be 774 00:42:21,590 --> 00:42:23,350 equal to that random variable. 775 00:42:23,350 --> 00:42:26,750 So these two random variables are equal 776 00:42:26,750 --> 00:42:28,000 to each other. 777 00:42:28,000 --> 00:42:29,940 778 00:42:29,940 --> 00:42:33,200 And now we use the law of iterated expectations. 779 00:42:33,200 --> 00:42:35,750 The law of iterated expectations tells us that the 780 00:42:35,750 --> 00:42:39,530 overall expected value of y is the expected value of the 781 00:42:39,530 --> 00:42:41,270 conditional expectation. 782 00:42:41,270 --> 00:42:43,650 We have a formula for the conditional expectation. 783 00:42:43,650 --> 00:42:46,580 It's n times the expected value of x. 784 00:42:46,580 --> 00:42:50,390 Now the expected value of x is a number.
785 00:42:50,390 --> 00:42:54,970 The expected value of something random times a number is 786 00:42:54,970 --> 00:42:58,320 the expected value of the random variable 787 00:42:58,320 --> 00:42:59,820 times the number itself. 788 00:42:59,820 --> 00:43:02,880 We can take a number outside the expectation. 789 00:43:02,880 --> 00:43:06,060 So the expected value of x gets pulled out. 790 00:43:06,060 --> 00:43:09,790 And that's the conclusion: overall, the expected 791 00:43:09,790 --> 00:43:13,340 amount of money you're going to spend is equal to how many 792 00:43:13,340 --> 00:43:16,670 stores you expect to visit on the average, times how much 793 00:43:16,670 --> 00:43:22,050 money you expect to spend on each one on the average. 794 00:43:22,050 --> 00:43:24,890 You might have guessed that this is the answer. 795 00:43:24,890 --> 00:43:30,400 If you expect to visit 10 stores, and you expect to 796 00:43:30,400 --> 00:43:34,460 spend $100 on each store, then yes, you expect to spend 797 00:43:34,460 --> 00:43:36,150 $1,000 today. 798 00:43:36,150 --> 00:43:39,050 You're not going to impress your Harvard friends if you 799 00:43:39,050 --> 00:43:40,300 tell them that story. 800 00:43:40,300 --> 00:43:42,900 801 00:43:42,900 --> 00:43:46,410 It's one of the cases where reasoning on the average 802 00:43:46,410 --> 00:43:50,160 does give you the plausible answer. 803 00:43:50,160 --> 00:43:54,290 But you will be able to impress your Harvard friends 804 00:43:54,290 --> 00:43:56,940 if you tell them that you can actually calculate the 805 00:43:56,940 --> 00:44:01,510 variance of how much you're going to spend. 806 00:44:01,510 --> 00:44:05,500 And we're going to do it by applying this formula that we 807 00:44:05,500 --> 00:44:09,710 have, and the difficulty is basically sorting out all 808 00:44:09,710 --> 00:44:14,360 those terms here, and what they mean. 809 00:44:14,360 --> 00:44:20,630 So let's start with this term.
810 00:44:20,630 --> 00:44:23,460 So the expected value of y given that you're visiting n 811 00:44:23,460 --> 00:44:26,280 stores is n times the expected value of x. 812 00:44:26,280 --> 00:44:28,250 That's what we did in the previous slide. 813 00:44:28,250 --> 00:44:32,540 So this thing is a random variable, so it has a variance. 814 00:44:32,540 --> 00:44:34,300 What is the variance? 815 00:44:34,300 --> 00:44:39,240 It's the variance of n times the expected value of x. 816 00:44:39,240 --> 00:44:42,010 Remember, the expected value of x is a number. 817 00:44:42,010 --> 00:44:46,180 So we're dealing with the variance of n times a number. 818 00:44:46,180 --> 00:44:48,330 What happens when you multiply a random 819 00:44:48,330 --> 00:44:50,800 variable by a constant? 820 00:44:50,800 --> 00:44:55,020 The variance becomes the previous variance times the 821 00:44:55,020 --> 00:44:56,650 constant squared. 822 00:44:56,650 --> 00:45:01,900 So the variance of this is the variance of n times the square 823 00:45:01,900 --> 00:45:04,300 of that constant that we had here. 824 00:45:04,300 --> 00:45:08,570 So this tells us the variance of the expected 825 00:45:08,570 --> 00:45:10,290 value of y given n. 826 00:45:10,290 --> 00:45:13,380 This is the part of the variability of how much money 827 00:45:13,380 --> 00:45:16,950 you're spending, which is attributed to the randomness, 828 00:45:16,950 --> 00:45:19,650 or the variability, in the number of stores 829 00:45:19,650 --> 00:45:21,380 that you are visiting. 830 00:45:21,380 --> 00:45:24,450 So the interpretation of the two terms is that there's 831 00:45:24,450 --> 00:45:27,760 randomness in how much you're going to spend, and this is 832 00:45:27,760 --> 00:45:32,480 attributed to the randomness in the number of stores 833 00:45:32,480 --> 00:45:36,660 together with the randomness inside individual stores 834 00:45:36,660 --> 00:45:40,110 that remains after I tell you how many stores you're visiting.
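Here is a small exact check of this step. It uses made-up distributions that do not come from the lecture: N is uniform on {0, 1, 2, 3} stores, and spending X is uniform on {1, 2, 3} per store. The sketch builds the random variable E[Y | N] = N E[X] value by value and compares its variance with var(N) (E[X])^2.

```python
from fractions import Fraction

# Hypothetical distributions, chosen only to keep the numbers small.
pmf_n = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4)}  # stores N
pmf_x = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}                   # spending X


def mean(pmf):
    return sum(p * v for v, p in pmf.items())


def var(pmf):
    m = mean(pmf)
    return sum(p * (v - m) ** 2 for v, p in pmf.items())


ex = mean(pmf_x)  # E[X], a plain number

# E[Y | N] is the random variable N * E[X]: it equals n * E[X] with probability P(N = n).
# (The values n * E[X] are distinct here because E[X] != 0.)
pmf_cond_mean = {n * ex: p for n, p in pmf_n.items()}

assert var(pmf_cond_mean) == var(pmf_n) * ex ** 2
print(var(pmf_cond_mean))  # 5
```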
835 00:45:40,110 --> 00:45:42,570 So now let's deal with this term: the variance inside 836 00:45:42,570 --> 00:45:45,020 individual stores. 837 00:45:45,020 --> 00:45:47,070 Let's take it slow. 838 00:45:47,070 --> 00:45:50,490 If I tell you that you're visiting exactly little n 839 00:45:50,490 --> 00:45:54,220 stores, then y is how much money you spend in those 840 00:45:54,220 --> 00:45:55,490 little n stores. 841 00:45:55,490 --> 00:45:59,480 You're dealing with the sum of little n random variables. 842 00:45:59,480 --> 00:46:01,290 What is the variance of the sum of 843 00:46:01,290 --> 00:46:03,120 little n independent random variables? 844 00:46:03,120 --> 00:46:05,880 It's the sum of their variances. 845 00:46:05,880 --> 00:46:10,590 So each store contributes the variance of x, and you're 846 00:46:10,590 --> 00:46:12,600 adding over little n stores. 847 00:46:12,600 --> 00:46:16,520 That's the variance of the money spent if I tell you 848 00:46:16,520 --> 00:46:18,040 the number of stores. 849 00:46:18,040 --> 00:46:26,430 Now let's translate this into random variable notation. 850 00:46:26,430 --> 00:46:30,310 This is a random variable that takes this numerical value 851 00:46:30,310 --> 00:46:33,630 whenever capital N is equal to little n. 852 00:46:33,630 --> 00:46:37,020 This is a random variable that takes this numerical value 853 00:46:37,020 --> 00:46:39,250 whenever capital N is equal to little n. 854 00:46:39,250 --> 00:46:40,760 This is equal to that. 855 00:46:40,760 --> 00:46:43,960 Therefore, these two are always equal, no matter what 856 00:46:43,960 --> 00:46:45,400 value capital N takes. 857 00:46:45,400 --> 00:46:49,100 So we have an equality here between random variables. 858 00:46:49,100 --> 00:46:51,620 Now we take expectations of both. 859 00:46:51,620 --> 00:46:56,160 The expected value of the conditional variance is the expected value of this.
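Here is the same argument in symbols. It assumes, as in the lecture's setup, that X_1, X_2, ... are independent, are identically distributed like X, and are independent of N. Those assumptions are what allow the variances of the n terms to be added.

```latex
\begin{aligned}
\operatorname{var}(Y \mid N = n)
  &= \operatorname{var}(X_1 + \cdots + X_n) = n\,\operatorname{var}(X)
  \quad \text{for every } n, \\
\operatorname{var}(Y \mid N) &= N\,\operatorname{var}(X), \\
\mathbf{E}\big[\operatorname{var}(Y \mid N)\big]
  &= \mathbf{E}\big[N\,\operatorname{var}(X)\big]
   = \mathbf{E}[N]\,\operatorname{var}(X).
\end{aligned}
```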
860 00:46:56,160 --> 00:46:59,890 OK, it may look confusing to think of the expected value of 861 00:46:59,890 --> 00:47:05,740 the variance here, but the variance of x is a number, not 862 00:47:05,740 --> 00:47:06,650 a random variable. 863 00:47:06,650 --> 00:47:08,480 You think of it as a constant. 864 00:47:08,480 --> 00:47:12,580 So the expected value of n times a constant is the 865 00:47:12,580 --> 00:47:16,420 expected value of n times the constant itself. 866 00:47:16,420 --> 00:47:20,840 So now we have the second term as well, and we put 867 00:47:20,840 --> 00:47:24,900 everything together, this plus that, to get an expression for 868 00:47:24,900 --> 00:47:28,050 the overall variance of y. 869 00:47:28,050 --> 00:47:32,380 Again, as I said before, the overall variability in y 870 00:47:32,380 --> 00:47:36,790 has to do with the variability of how much you spend inside 871 00:47:36,790 --> 00:47:39,210 a typical store, 872 00:47:39,210 --> 00:47:43,000 and with the variability in the number of stores 873 00:47:43,000 --> 00:47:45,510 that you are visiting. 874 00:47:45,510 --> 00:47:48,820 OK, so this is it for today. 875 00:47:48,820 --> 00:47:52,600 We'll change subjects quite radically starting next time. 876 00:47:52,600 --> 00:47:53,850
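Putting the two terms together gives var(Y) = E[N] var(X) + (E[X])^2 var(N). The sketch below checks this formula by simulation. It uses illustrative distributions that are assumptions, not part of the lecture. N is uniform on {8, ..., 12}, so E[N] = 10 and var(N) = 2. X is uniform on [50, 150], so E[X] = 100 and var(X) = 100^2/12. Everything is independent.

```python
import random
import statistics

# Assumed, illustrative model (not from the lecture):
#   N ~ uniform on {8, ..., 12}  -> E[N] = 10, var(N) = 2
#   X ~ uniform on [50, 150]     -> E[X] = 100, var(X) = 100**2 / 12
E_N, VAR_N = 10.0, 2.0
E_X, VAR_X = 100.0, 100.0**2 / 12

# Total variance of the random sum Y = X_1 + ... + X_N:
#   var(Y) = E[var(Y | N)] + var(E[Y | N]) = E[N] var(X) + (E[X])^2 var(N)
within_stores = E_N * VAR_X     # E[var(Y | N)]: variability inside individual stores
across_counts = E_X**2 * VAR_N  # var(E[Y | N]): variability in the number of stores
predicted_var = within_stores + across_counts

# Simulate Y: draw N first, then add N independent store amounts.
rng = random.Random(1)
samples = [
    sum(rng.uniform(50, 150) for _ in range(rng.randint(8, 12)))
    for _ in range(100_000)
]
empirical_var = statistics.pvariance(samples)

print(f"predicted var(Y): {predicted_var:.0f}")
print(f"simulated var(Y): {empirical_var:.0f}")
```

Under these assumptions, the predicted variance is 10 × 833.33 + 10,000 × 2, or about 28,333, and the simulated value should agree to within a couple of percent. Keeping `within_stores` and `across_counts` as separate names mirrors the two sources of variability described in the lecture.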