PROFESSOR: If we're going to make use of Chebyshev's bound and other results that depend on the variance, we'll need some methods for calculating variance in various circumstances. So let's develop those here.

A basic place to begin is to ask about indicator variables and their variance. Remember, I is an indicator variable, meaning that it's zero-one valued. It's also called a Bernoulli variable. And if the probability that it equals 1 is p, that's also its expectation. So we have an indicator variable whose expectation is p, and we're asking what its variance is, which by definition is the expectation of (I minus p) squared.

Well, this is one of those almost mechanical proofs that follows simply by algebra and linearity of expectation. But let's walk through it step by step, just to reassure you that that's all that's involved. I'd recommend against really trying to memorize this, because I can never remember it anyway; I just reprove it every time I need it.

So let's see how the proof would go. Step one is to expand this (I minus p) squared algebraically.
So we're talking about the expectation of I squared minus 2pI plus p squared. Now we can just apply linearity of expectation, and I get the expectation of I squared, minus 2p times the expectation of I, plus p squared. Of course, the expectation of a constant is the constant, so when I take the expectation of p squared, I get p squared.

But now look at this: I squared is zero-one valued, so in fact I squared is equal to I, and the expectation of I has now appeared here; that's p. So this term simplifies to the expectation of I, and this term becomes minus 2p times p, plus p squared. Of course, that expectation of I is p. So I've got p minus 2p squared plus p squared. The p squareds cancel, and I get p minus p squared. If you factor out p, that's p times (1 minus p), or pq, which is the standard way you write the variance of an indicator variable: it's p times (1 minus p).

OK, that was easy, and again, completely mechanical.

There are a couple of other basic rules for calculating the variance of new variables from old ones, analogous to additivity of expectation, though things don't work quite so simply for variance.
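Before moving on to those rules, the indicator-variance derivation can be checked numerically. This is just a sketch; the value p = 0.3 is an arbitrary choice, not one from the lecture:

```python
# Variance of an indicator variable I with P(I = 1) = p, checked two ways.
p = 0.3
q = 1 - p

# Direct definition: Var(I) = E[(I - p)^2], summed over the two outcomes.
var_by_definition = (1 - p) ** 2 * p + (0 - p) ** 2 * q

# Closed form from the derivation: p(1 - p) = pq.
assert abs(var_by_definition - p * q) < 1e-12
```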
So the first rule is that if you ask about the variance of a constant a times R, plus b, that turns out to be the same as a squared times the variance of R. The additive constant b doesn't matter, and because the variance is really the expectation of something squared, when you pull that constant a out, you're factoring out an a squared. And that's the rule you get here.

OK. Another basic rule that's often convenient, instead of working with variance in the form of the expectation of (R minus mu) squared, is to say that it's the expectation of R squared, minus the square of the expectation of R. Now, the square of the expectation of R comes up so often that there's a shorthand for it: instead of writing parentheses, you write E squared of R, which just means the square of the expectation of R.

So much for the second rule, which we'll use all the time, because it's a convenient rule to have. I'm going to prove the second one, again just to show you there's nothing to worry about. You don't even have to remember how the proof goes, of course; you can reconstruct it every time.
So it's again a simple proof, just by linearity of expectation and doing the algebra. The variance of R is, by definition, the expectation of (R minus mu) squared. Let's expand (R minus mu) squared: it's the expectation of R squared minus 2 mu R plus mu squared. Now we apply linearity to that: I get the expectation of R squared, minus 2 mu times the expectation of R, plus the expectation of mu squared, if I'm really being completely mechanical about linearity of expectation.

Now the expectation of the constant mu squared is simply mu squared. And here I've got the expectation of R; that's mu again. So I wind up with the expectation of R squared, minus 2 mu times mu, plus mu squared. That's minus 2 mu squared plus mu squared, which winds up as minus mu squared. And of course, mu squared is the expectation squared of R, so I've proved the formula. Again, as claimed, there's nothing interesting here, just algebra and linearity of expectation.

And the first result, about factoring out an a and squaring it, follows from a similar proof, which I'm not going to include here.
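Both rules can be sanity-checked on a small finite distribution. The following sketch treats a list of sampled values as a uniform distribution over its entries; the numbers are arbitrary, not from the lecture:

```python
import random

random.seed(1)
xs = [random.randint(0, 9) for _ in range(10_000)]

def E(vals):
    """Expectation of a uniform distribution over vals."""
    return sum(vals) / len(vals)

mu = E(xs)
var_def = E([(x - mu) ** 2 for x in xs])      # Var(R) = E[(R - mu)^2]
var_alt = E([x * x for x in xs]) - mu ** 2    # E[R^2] - E^2[R]
assert abs(var_def - var_alt) < 1e-8

# Var(aR + b) = a^2 Var(R): the shift b drops out, the scale a comes out squared.
a, b = 3, 7
ys = [a * x + b for x in xs]
mu_y = E(ys)
var_y = E([(y - mu_y) ** 2 for y in ys])
assert abs(var_y - a * a * var_def) < 1e-6
```

Both assertions are exact identities of the empirical distribution, so they hold regardless of the sample drawn.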
So let's look at the space station Mir again, which we used as an example of calculating mean time to failure. The hypothesis that we're making is that in any given hour, with probability p, the Mir space station will run into some huge piece of space garbage that will clobber it. So we know that means the expected number of hours for the Mir to fail is 1 over p; that's the mean time to failure. And what we're asking is: what's the variance of F, if F is the number of hours to failure?

Well, one way we can do it is just to plug into the definition of expectation, and this will work. The probability that it takes k hours to fail is given by the geometric distribution: the probability of not failing for k minus 1 hours and then failing, which is q to the k minus 1, times p. So for the variance of F, using our previous formula, it's the expectation of F squared minus the expectation squared of F; that second term becomes a minus 1 over p squared. We can set that aside; we want to focus on calculating the expectation of F squared.
So F is 1, 2, 3, and so on. That means F squared is 1, 4, 9, and so on up through k squared. The point being that the only values F squared can take are squares, so we don't have to worry about counting any other values into the sum that defines the expectation.

So let's go look at that. The expectation of F squared is the sum over the possible values that F squared can take, namely the sum from k equals 1 to infinity of k squared, times the probability that F squared is equal to k squared. Well of course, the probability that F squared equals k squared is the same as the probability that F equals k. And we know what the probability that F equals k is; it's the geometric distribution, so the probability that F equals k is q to the k minus 1, times p. If I factor out a p over q, this simplifies to p over q times the sum from k equals 0 to infinity of k squared, q to the k.
And this is the kind of sum that we've seen before, one that has a closed form. We could perfectly well calculate the expectation of F squared by appealing to our generating function results to get a closed form for this, and then remember to subtract 1 over p squared, because the variance is this term minus the square of the expectation of F.

But let's go another way and use the same technique of total expectation that we used before. That is, the expectation of F squared, of the failure time squared, is equal, by the law of total expectation, to the expectation of F squared given that F is 1 (that is, we fail on the first step), times the probability that we fail on the first step, plus the expectation of F squared given that we don't fail on the first step (that F is greater than 1), times the probability that F is greater than 1.

Now, what's going to make this manageable is that this expression, the expectation of F squared when F is greater than 1, will turn out to be something that we can easily convert into a nonconditional expectation, and find a value for.
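Before following the total-expectation route, the direct sum can be checked numerically by truncating it at a large N. This is a sketch; p = 0.25 is an arbitrary test value, and (2 minus p) over p squared is the closed form the generating-function route would give for the second moment of a geometric distribution:

```python
p = 0.25
q = 1 - p
N = 2_000  # truncation depth; the geometric tail beyond this is negligible

# E[F^2] = sum_{k>=1} k^2 * q^(k-1) * p, written with the p/q factored out.
E_F2 = (p / q) * sum(k * k * q ** k for k in range(1, N))

# Closed form for the second moment of a geometric distribution.
assert abs(E_F2 - (2 - p) / p ** 2) < 1e-6
```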
So the lemma that we're using here is the following. What I'm thinking about is mean time to failure: take any function whatsoever, g, of the time to failure, and suppose I'm interested in the expectation of g of F, given that F is greater than n. That is, it's already taken n steps to get where I am. The thing about mean time to failure is that at any moment when you haven't failed, you're starting off in essentially the same situation you were in at the beginning, waiting for the next failure to occur. The probability of failing in one more step is the same p. The probability of failing in two more steps is qp, and in three more steps, qqp. The only difference is that the value of F has been shifted by n. In the ordinary case, we start off at time 0 and look at the probability that we fail in one more step, two more steps, and so on. Now we're starting off with n steps already elapsed, and asking about the probability that it fails in the next step, or the next, or the next.
So the punchline is that the expectation of g of F, given that F is greater than n, is simply the expectation of g of (F plus n). I'm going to let you meditate on that and not say any more about it. But the payoff is the corollary that the expectation of F squared, given that F is greater than 1, is simply the expectation of (F plus 1) squared.

And that lets us go back and simplify the expression that we had from total expectation. Here's the expectation of F squared, given that F is greater than 1. And let's look at the other terms. This is the expectation of F squared, given that F equals 1. Well, the expectation of F squared given that F equals 1 is 1 squared, because we know what F is, and that's the end of the story. Times the probability that F equals 1: that's p, the probability of failure on a given step. This is the probability that F is greater than 1, which is q, the probability that we didn't fail on the first step. And we just figured out that this term is the expectation of the square of (F plus 1). So there's the 1 and the p.
And that becomes a q, and this is the expectation of (F plus 1) squared. Now again, I apply linearity. I'm going to expand (F plus 1) squared into F squared plus 2F plus 1, and then apply linearity of expectation. And I wind up with the expectation of F squared, plus twice the expectation of F (which, remember, is 2 over p), plus 1, all times q.

And now what I've got is a simple arithmetic equation relating the expectation of F squared to some other arithmetic involving the expectation of F squared. It's easy to solve for the expectation of F squared, and I'll spare you that elementary simplification. But the punchline is, when we also remember to subtract 1 over p squared, because that was the square of the expectation of F, we come up with this punchline formula: the variance of the time to failure is 1 over the probability of failure on a given step, times that same quantity minus 1. That is, the variance of F is (1 over p) times (1 over p, minus 1).

Now, just for practice and fun, let's look at the space station Mir again.
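First, though, the algebra just sketched can be checked numerically, along with the lemma it relies on. This is a sketch with an arbitrary test value p = 0.25:

```python
p = 0.25
q = 1 - p
N = 2_000  # truncation depth for the geometric sums; the tail is negligible

def pr(k):
    """P(F = k) for the geometric distribution: q^(k-1) * p."""
    return q ** (k - 1) * p

# Lemma check: E[F^2 | F > 1] = E[(F + 1)^2], using P(F > 1) = q.
lhs = sum(k * k * pr(k) for k in range(2, N)) / q
rhs = sum((k + 1) ** 2 * pr(k) for k in range(1, N))
assert abs(lhs - rhs) < 1e-6

# Solving E[F^2] = 1*p + q*(E[F^2] + 2/p + 1) for E[F^2], then subtracting
# E^2[F] = 1/p^2, reproduces the punchline Var(F) = (1/p)(1/p - 1) = q/p^2.
E_F2 = (p + q * (2 / p + 1)) / (1 - q)
var_F = E_F2 - (1 / p) ** 2
assert abs(var_F - (1 / p) * (1 / p - 1)) < 1e-9
assert abs(var_F - q / p ** 2) < 1e-9
```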
Suppose that I tell you that there's a 1 in 10,000 chance that in any given hour, the Mir is going to crash into some debris that's out there in orbit. So the expectation of F is 10 to the fourth, about 10,000 hours. And the variance of F is 1 over p times (1 over p, minus 1), that is, 10,000 times (10,000 minus 1), which is pretty close to 10,000 squared. And when I take the square root to get sigma, I get back to about 10,000. So sigma is just a tad less than 10,000; call it 10 to the fourth.

So with those numbers, I can apply Chebyshev's theorem and conclude that the probability that the Mir lasts more than 4 times 10 to the fourth hours is less than 1 chance in 4. If we translate that into years: if it was really the case that there was a 1 in 10,000 chance of the Mir being destroyed in any given hour, then the probability that it lasts more than 4.6 years before destructing is less than 1/4.

So another rule for calculating variance, and maybe the most important general one, is that variance is additive. That is, the variance of a sum is the sum of the variances.
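Before taking up additivity, here is a quick numeric look at the Mir figures above. The bound from Chebyshev is 1/4; the exact tail probability of a geometric distribution, P(F greater than t) = q to the t, comes out far smaller, which is what a one-sided bound should do:

```python
p = 1e-4                       # chance of destruction in any given hour
mu = 1 / p                     # expected hours to failure: 10^4
var_F = (1 / p) * (1 / p - 1)  # the variance formula derived above
sigma = var_F ** 0.5           # just a tad under 10^4
assert 9_999 < sigma < 10_000

# The bound quoted was P(F > 4 * 10^4 hours) < 1/4.  The exact tail
# probability is q^t, roughly e^-4, about 0.018: well under the bound.
t = 4 * 10 ** 4
exact_tail = (1 - p) ** t
assert exact_tail < 0.25
```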
But unlike expectation, where there's no side condition and it doesn't in any way depend on independence, it turns out that variance is additive only if the variables being added are pairwise independent. Now, you might wonder where the "pairwise" came from, and it's because the variance is the expectation of a square. So when you wind up multiplying out and doing the algebra, you're just getting quadratic cross terms: expectations of Ri times Rj. And you need to factor those into the expectation of Ri times the expectation of Rj, which you only need pairwise independence for. So that's a fast talk through the algebra, which I'm going to leave to you. It's in the text, and it's again one of these easy proofs.
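The additivity rule can be illustrated empirically. In this sketch, which is not from the text, two independent variables are simulated and the variance of their sum is compared against the sum of their variances:

```python
import random

random.seed(2)
n = 200_000

def var(vals):
    """Variance of a uniform distribution over vals."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

a = [random.randint(0, 1) for _ in range(n)]  # indicator, p = 1/2, Var = 1/4
b = [random.randint(1, 6) for _ in range(n)]  # fair die, Var = 35/12
s = [x + y for x, y in zip(a, b)]

# For independent summands the two sides agree up to sampling noise.
assert abs(var(s) - (var(a) + var(b))) < 0.05
```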