Mathematically speaking, the Chebyshev inequality is just a simple application of the Markov inequality. However, it carries a somewhat different message. Consider a random variable that has a certain mean and variance. What the Chebyshev inequality says is that if the variance is small, then the random variable is unlikely to fall too far from the mean. If the variance is small, we have little randomness, and so X cannot be too far from the mean.

In more precise terms, we have the following inequality: the probability that the distance from the mean is larger than or equal to a certain number c is at most the variance divided by the square of that number. So if the variance is small, the probability of falling far from the mean is also going to be small. And if the number c is large, so that we are talking about a large distance from the mean, then the probability of this event falls off at least as fast as 1 over c squared.

By the way, I should add here that c is assumed to be a positive number. If c were negative, then the probability that we are looking at would be equal to 1 anyway, and there would not be any point in trying to obtain a bound for it.

To prove the Chebyshev inequality, we apply the Markov inequality as follows. The probability of interest is the same as the probability that the square of the distance from the mean is larger than or equal to the square of c. But now we have a non-negative random variable, and we can apply the Markov inequality with X replaced by this random variable and with a replaced by c squared. This gives us the expected value of the random variable of interest divided by c squared. But we recognize that the numerator is just the variance, and this is the Chebyshev inequality that we claimed.
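Written out symbolically (with mu and sigma squared denoting the mean and variance of X), the statement and the proof just described take the following form:

```latex
% Chebyshev inequality: P(|X - mu| >= c) <= sigma^2 / c^2, for c > 0.
% The middle step is the Markov inequality applied to the non-negative
% random variable (X - mu)^2 with threshold c^2.
\[
  \mathbf{P}\big(|X - \mu| \ge c\big)
  = \mathbf{P}\big((X - \mu)^2 \ge c^2\big)
  \le \frac{\mathbf{E}\big[(X - \mu)^2\big]}{c^2}
  = \frac{\sigma^2}{c^2}.
\]
```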
As an application of the Chebyshev inequality, let us look at the probability of the event that the distance from the mean is at least k standard deviations, where k is some positive number. Using the Chebyshev inequality with c replaced by k times sigma, we obtain sigma squared over c squared, which in our case is sigma squared over k squared times sigma squared, which is 1 over k squared. So what this is saying is that if you take, for example, k equal to 3, the probability that you fall three or more standard deviations away from the mean is at most 1 over 9. And this is true no matter what kind of distribution you have.

Let us now revisit our earlier example, where X is an exponential random variable with parameter 1, and we are interested in the probability that the random variable takes a value larger than or equal to a. The Markov inequality gave us a bound of 1 over a, and, as we recall, the exact value of this probability is e to the minus a. Let us see what we can get using the Chebyshev inequality.

Now, our random variable has a mean of 1. Let us assume that a is bigger than 1, so that we are considering the event that we fall away from the mean by a distance of at least a minus 1. That is, we write the probability that X is larger than or equal to a as the probability that X minus 1 is larger than or equal to a minus 1. This event is contained in the event that the absolute value of X minus 1 is larger than or equal to a minus 1, because if the first event occurs, then the second event also occurs. And now we can apply the Chebyshev inequality. Here we have the distance of X from the mean, so the Chebyshev inequality applied to the random variable X will have in the numerator the variance of X, which is equal to 1, and in the denominator (a minus 1) squared.

Notice that if a is a large number, this quantity behaves like 1 over a squared, which falls off much faster than 1 over a. So, at least for large a's, the Chebyshev inequality gives us a smaller and therefore more informative bound than what we obtained from the Markov inequality. In most cases, the Chebyshev inequality is indeed stronger and more informative than the Markov inequality.
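To see these numbers side by side, here is a minimal Python sketch (not part of the lecture) that tabulates the Markov bound 1/a, the Chebyshev bound 1/(a - 1) squared, and the exact value e to the minus a for the exponential random variable above, which has mean 1 and variance 1:

```python
import math

# Exponential random variable with parameter 1: mean = 1, variance = 1.
# Bounds on P(X >= a), for a > 1:
#   Markov bound:    E[X] / a           = 1 / a
#   Chebyshev bound: Var(X) / (a - 1)^2 = 1 / (a - 1)^2
#   Exact value:     e^{-a}
for a in [2, 3, 5, 10]:
    markov = 1.0 / a
    chebyshev = 1.0 / (a - 1) ** 2
    exact = math.exp(-a)
    print(f"a={a:3d}  Markov={markov:.4f}  Chebyshev={chebyshev:.4f}  exact={exact:.6f}")
```

For a close to 1 the Chebyshev bound is actually the weaker of the two (at a = 2 it equals 1), but as a grows it becomes much smaller than the Markov bound, consistent with the discussion above.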
One of the reasons is that it exploits more information about the distribution of the random variable X. That is, it uses knowledge not just of the mean of the random variable, but also some information about its variance.