PROFESSOR: Our topic is deviation from the mean, meaning the probability that a random variable returns a value that differs significantly from its mean.

Now, the Markov bound gave you a coarse bound on the probability that R was overly large, using very little information about R. Not surprisingly, if you know a little bit more about the distribution of R than simply that it's non-negative, you can state tighter bounds. This was noticed by a mathematician named Chebyshev, and the resulting bound is called the Chebyshev bound.

Now, it's interesting that even though the Markov bound is very weak and seems not very useful, the Chebyshev bound, which generally gives you a significantly stronger bound on the probability that a random variable differs much from its mean, is actually a trivial corollary of Markov's theorem. There's a very simple, ingenious way to use Markov's bound to derive the Chebyshev bound. Let's look at how.

So we're interested in the probability that a random variable R differs from its mean by an amount x: the probability that the distance between R and its mean, the absolute value of R minus mu, is greater than or equal to x. We're trying to get a grip on that probability as a function of x.

Now, the point is that the event that the distance between R and its mean is greater than or equal to x can be restated by squaring both sides of the inequality: it is the event that R minus mu squared is greater than or equal to x squared. These two events are just different ways of describing the same set, so their probabilities are trivially equal.

What's nice about this is, of course, that R minus mu squared is a non-negative random variable, to which Markov's theorem applies; the square of a real number is always non-negative. So let's just apply Markov's theorem to this new random variable, R minus mu squared. What does Markov's bound tell us about the probability that this squared variable is greater than or equal to an amount x squared?
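In standard notation, with mu denoting the expectation of R, the event identity just described can be written out as follows; squaring both sides preserves the event because both sides of the inequality are non-negative:

    \{\, |R - \mu| \ge x \,\} \;=\; \{\, (R - \mu)^2 \ge x^2 \,\},
    \qquad\text{so}\qquad
    \Pr\bigl[\, |R - \mu| \ge x \,\bigr] \;=\; \Pr\bigl[\, (R - \mu)^2 \ge x^2 \,\bigr].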
Well, just plug in Markov. It tells you that the probability that the squared variable is as big as x squared is at most the expectation of that squared variable divided by x squared. This is just applying Markov's bound to the variable R minus mu squared.

Now, this numerator, the expectation of R minus mu squared, is a weird thing to stare at and may not seem very memorable. But you should remember it, because it's so important that it has a name all its own. It's called the variance of R. And this is an extra bit of information about the shape of the distribution of R that turns out to allow you to state much more powerful theorems in general about the probability that R deviates from its mean by a given amount.

So we can restate the Chebyshev bound, just replacing that expectation formula by its name, the variance of R. This is what the Chebyshev bound says: the probability that the distance between R and its mean is greater than or equal to x is at most the variance of R divided by x squared, where the variance of R is the expectation of the square of R minus mu.

Now, the very important technical aspect of the Chebyshev bound is that we're getting an inverse-square reduction in the probability. Remember, with Markov, the denominator was behaving linearly; here, it behaves quadratically. So these bounds get smaller much more rapidly as we ask about the probability of differing by a larger amount.

A way that may help you remember the variance of R is another name that it has: the mean square error. If you think of R minus mu as the error that R is making, how much it differs from what it ought to be, then we square it and take the average, so we're taking the mean of the squared errors.

And here we are, restating the Chebyshev bound in terms of the variance. The variance has one difficulty with it.
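As a quick sanity check, here is a small Python sketch that computes the variance of a hypothetical example distribution (a fair six-sided die, chosen just for illustration; it is not from the lecture) and compares the exact tail probability against the Chebyshev bound:

    # Hypothetical example distribution (illustration only): a fair six-sided die.
    values = [1, 2, 3, 4, 5, 6]
    mu = sum(values) / len(values)                            # E[R] = 3.5
    var = sum((v - mu) ** 2 for v in values) / len(values)    # E[(R - mu)^2] ~= 2.917

    for x in [1.0, 2.0, 2.5]:
        exact = sum(1 for v in values if abs(v - mu) >= x) / len(values)
        bound = var / x ** 2                                  # Chebyshev: Var(R) / x^2
        print(f"x = {x}: exact tail {exact:.3f} <= bound {bound:.3f}")

In every case the exact probability sits below the bound, as the theorem guarantees, though the bound can be loose; for x = 1 it exceeds 1 and says nothing.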
And that leads us to want to look at another object, which is just the square root of the variance, called the standard deviation. So you might wonder why: if you understand the variance, what's the point of taking the square root and working with that? The answer is simply that if you think of R as a random variable whose values have some dimension, like seconds or dollars, then the variance of R is the expectation of the squared variable, R minus mu squared, which means its units are seconds squared or dollars squared or whatever. So the variance of R is itself a squared value, which does not reflect the magnitude of the errors that you expect R to make, the distance that you expect R to be from its mean.

We can get the units of this quantity back into matching the units of R, and also get a number that's closer to the kind of deviation you'd expect to observe, by just taking the square root. That's called the standard deviation of R. If it helps you any, the standard deviation is also called the root mean square error. You might have heard that phrase; it comes up all the time in discussions of experimental error. So again, we're taking the error, meaning the distance between the random variable and its mean; we're squaring it; we're taking the expectation of that squared error; and then we're taking the square root of it. That's the standard deviation.

Going back to understand what the standard deviation means intuitively, in terms of a familiarly shaped distribution function for a random variable R: suppose that R is a random variable with the fairly standard kind of bell-curve or Gaussian shape, with one hump. It's unimodal, and it trails off at some moderate rate as you get further and further away from the mean. Well, a distribution shaped like this is symmetric around its high point, so that high point is going to be the mean, by symmetry.
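To make the units point concrete, here is a minimal Python sketch, with made-up sample values standing in for observations of R:

    import math

    # Made-up sample values (illustrative only), say measurements in seconds.
    samples = [3.1, 2.7, 3.8, 3.3, 2.9, 3.5, 3.0]
    mu = sum(samples) / len(samples)

    # The variance is the mean of the squared errors; its units are seconds squared.
    mean_sq_err = sum((s - mu) ** 2 for s in samples) / len(samples)

    # The standard deviation, the root mean square error, is back in seconds.
    sigma = math.sqrt(mean_sq_err)

    print(f"mean = {mu:.3f} s, variance = {mean_sq_err:.3f} s^2, std dev = {sigma:.3f} s")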
The values average out to this middle value. A standard deviation for a curve like this can be interpreted as an interval around the mean, and for standard distributions the probability that you land within that interval is fairly high. Now, we'll see that the Chebyshev bound is not going to tell us much for an arbitrary unknown distribution. But in general, for typical distributions, you expect to find that the standard deviation tells you where you're most likely to be when you take a random value of the variable.

So let's return to the Chebyshev bound as we've stated it. I'm restating the Chebyshev bound here, just replacing the variance of R in the numerator by the square of its square root, sigma squared of R. It's a useful way to restate it, because it motivates another reformulation of the Chebyshev bound, just as we previously reformulated the Markov bound in terms of a multiple of something. I'm going to replace x by a constant times the standard deviation, and look at the probability that the error is greater than or equal to a constant times the standard deviation. This term is going to simplify: once x is a constant times the standard deviation, the standard deviations cancel out, and I wind up with just 1 over c squared.

So let's just do that. And there's the formula: the probability that the distance of R from its mean is greater than or equal to a multiple c of its standard deviation is less than or equal to 1 over c squared. So the bound gets much more rapidly smaller as c grows.

Let's look at what that means for some actual numbers, to make the thing a little bit more real. What this assertion is telling us is that R is probably not going to return a value that's a significant multiple of its standard deviation away from its mean.
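Written out, the substitution just described is the following; the sigma squared in the numerator cancels against the one in the denominator:

    \Pr\bigl[\, |R - \mu| \ge c\,\sigma \,\bigr]
      \;\le\; \frac{\sigma^2}{(c\,\sigma)^2}
      \;=\; \frac{1}{c^2}.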
For example, what does this formula tell us about the probability that R is going to be greater than or equal to one standard deviation away from its mean? Well, it actually tells us nothing; that's the case in which it's no good. Because when c is 1, it's just telling us that the probability is at most 1, which we always know, because probabilities are at most 1.

But if I ask what the probability is that the error of R is greater than or equal to twice the standard deviation, then this theorem is telling me something nontrivial. It's telling me that the probability that the error is at least twice the standard deviation is at most 1 over 2 squared, or 1/4. An arbitrary random variable with standard deviation sigma is going to have its error exceed twice the standard deviation at most 1/4 of the time, three times the standard deviation at most 1/9 of the time, and four times at most 1/16 of the time.

So the qualitative message to take away is that, for any random variable whatsoever, as long as it has a standard deviation sigma, you can say some definite things about the probability that the random variable takes a value that differs from its mean by a large multiple of the standard deviation. That probability is going to be small, and it gets smaller, rapidly smaller, as the multiple of the standard deviation grows.
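As a last sketch of the arithmetic, a couple of lines of Python reproduce these bounds by just evaluating 1 over c squared:

    # Chebyshev in multiple-of-sigma form: Pr[|R - mu| >= c * sigma] <= 1 / c^2.
    for c in [1, 2, 3, 4]:
        print(f"c = {c}: bound 1/{c * c} = {1 / (c * c):.4f}")
    # c = 1 gives 1.0000 (vacuous); c = 2, 3, 4 give 1/4, 1/9, and 1/16.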