1 00:00:00,950 --> 00:00:04,460 In this segment, we derive and discuss the Markov inequality, 2 00:00:04,460 --> 00:00:09,030 a rather simple but quite useful and powerful fact about 3 00:00:09,030 --> 00:00:11,660 probability distributions. 4 00:00:11,660 --> 00:00:14,760 The basic idea behind the Markov inequality as well as 5 00:00:14,760 --> 00:00:18,410 many other inequalities and bounds in probability theory 6 00:00:18,410 --> 00:00:20,140 is the following. 7 00:00:20,140 --> 00:00:23,050 We may be interested in saying something about the 8 00:00:23,050 --> 00:00:25,270 probability of an extreme event. 9 00:00:25,270 --> 00:00:28,280 By extreme event, we mean that some random variable takes a 10 00:00:28,280 --> 00:00:29,960 very large value. 11 00:00:29,960 --> 00:00:33,360 If we can calculate that probability exactly, then, of 12 00:00:33,360 --> 00:00:35,020 course, everything is fine. 13 00:00:35,020 --> 00:00:38,600 But suppose that we only have a little bit of information 14 00:00:38,600 --> 00:00:41,350 about the probability distribution at hand. 15 00:00:41,350 --> 00:00:44,250 For example, suppose that we only know the expected value 16 00:00:44,250 --> 00:00:46,060 associated with that distribution. 17 00:00:46,060 --> 00:00:47,810 Can we say something? 18 00:00:47,810 --> 00:00:51,040 Well, here's a statement, which is quite intuitive. 19 00:00:51,040 --> 00:00:53,790 If you have a non-negative random variable, and I tell 20 00:00:53,790 --> 00:00:56,920 you that the average or the expected value is rather 21 00:00:56,920 --> 00:01:01,580 small, then there should be only a very small probability 22 00:01:01,580 --> 00:01:05,030 that the random variable takes a very large value. 23 00:01:05,030 --> 00:01:08,039 This is an intuitively plausible statement, and the 24 00:01:08,039 --> 00:01:10,950 Markov inequality makes that statement precise. 25 00:01:10,950 --> 00:01:12,550 Here is what it says. 26 00:01:12,550 --> 00:01:15,160 If we have a random variable that's non-negative and you 27 00:01:15,160 --> 00:01:18,820 take any positive number, the probability that the random 28 00:01:18,820 --> 00:01:22,850 variable exceeds that particular number is bounded 29 00:01:22,850 --> 00:01:25,289 by this ratio. 30 00:01:25,289 --> 00:01:28,940 If the expected value of X is very small, then the 31 00:01:28,940 --> 00:01:34,070 probability of exceeding that value of a will also be small. 32 00:01:34,070 --> 00:01:37,610 Furthermore, if a is very large, the probability of 33 00:01:37,610 --> 00:01:41,660 exceeding that very large value drops down because this 34 00:01:41,660 --> 00:01:44,560 ratio becomes smaller. 35 00:01:44,560 --> 00:01:47,039 So that's what the Markov inequality says. 36 00:01:47,039 --> 00:01:49,430 Let us now proceed with a derivation. 37 00:01:49,430 --> 00:01:53,259 Let's start with the formula for the expected value of X, 38 00:01:53,259 --> 00:01:56,870 and just to keep the argument concrete, let us assume that 39 00:01:56,870 --> 00:01:59,479 the random variable is continuous so that the 40 00:01:59,479 --> 00:02:02,310 expected value is given by an integral. 41 00:02:02,310 --> 00:02:04,890 The argument would be exactly the same as in the discrete 42 00:02:04,890 --> 00:02:08,570 case, but in the discrete case, we would be using a sum. 43 00:02:08,570 --> 00:02:12,070 Now since the random variable is non-negative, this integral 44 00:02:12,070 --> 00:02:16,110 only ranges from 0 to infinity. 45 00:02:16,110 --> 00:02:19,750 Now, we're interested, however, in values of X larger 46 00:02:19,750 --> 00:02:24,840 than or equal to a, and that tempts us to consider just the 47 00:02:24,840 --> 00:02:29,800 integral from a to infinity of the same quantity. 48 00:02:29,800 --> 00:02:35,590 How do these two quantities compare to each other? 49 00:02:35,590 --> 00:02:39,010 Since we're integrating a non-negative quantity, if 50 00:02:39,010 --> 00:02:42,840 we're integrating over a smaller range, the resulting 51 00:02:42,840 --> 00:02:48,579 integral will be less than or equal to this integral here, 52 00:02:48,579 --> 00:02:53,680 so we get an inequality that goes in this direction. 53 00:02:53,680 --> 00:02:56,500 Now let us look at this integral here. 54 00:02:56,500 --> 00:03:00,860 Over the range of integration that we're considering, X is 55 00:03:00,860 --> 00:03:03,200 at least as large as a. 56 00:03:03,200 --> 00:03:07,740 Therefore, the quantity that we're integrating from a to 57 00:03:07,740 --> 00:03:16,420 infinity is at least as large as a times the density of X. 58 00:03:16,420 --> 00:03:20,170 And now we can take this a, which is a constant, pull it 59 00:03:20,170 --> 00:03:22,360 outside the integral. 60 00:03:22,360 --> 00:03:25,170 And what we're left with is the integral of the density 61 00:03:25,170 --> 00:03:29,040 from a to infinity, which is nothing but the probability 62 00:03:29,040 --> 00:03:32,140 that the random variable takes a value larger 63 00:03:32,140 --> 00:03:33,510 than or equal to a. 64 00:03:33,510 --> 00:03:37,560 And now if you compare the two sides of this inequality, 65 00:03:37,560 --> 00:03:44,010 that's exactly what the Markov inequality is telling us. 66 00:03:44,010 --> 00:03:47,740 Now it is instructive to go through a second derivation of 67 00:03:47,740 --> 00:03:49,290 the Markov inequality. 68 00:03:49,290 --> 00:03:53,180 This derivation is essentially the same conceptually as the 69 00:03:53,180 --> 00:03:56,320 one that we just went through except that it is more 70 00:03:56,320 --> 00:04:00,390 abstract and does not require us to write down any explicit 71 00:04:00,390 --> 00:04:02,480 sums or integrals. 72 00:04:02,480 --> 00:04:04,070 Here's how it goes. 73 00:04:04,070 --> 00:04:08,200 Let us define a new random variable Y, which is equal to 74 00:04:08,200 --> 00:04:13,350 0 if the random variable X happens to be less than a and 75 00:04:13,350 --> 00:04:17,970 it is equal to a if X happens to be larger 76 00:04:17,970 --> 00:04:19,709 than or equal to a. 77 00:04:19,709 --> 00:04:22,870 How is Y related to X? 78 00:04:22,870 --> 00:04:26,420 If X takes a value less than a, it will still be a 79 00:04:26,420 --> 00:04:29,920 non-negative value, so X is going to be at least as large 80 00:04:29,920 --> 00:04:31,360 as the value of 0. 81 00:04:31,360 --> 00:04:33,250 that Y takes. 82 00:04:33,250 --> 00:04:38,050 If X is larger than or equal to a, Y will be a, so X will 83 00:04:38,050 --> 00:04:40,440 again be at least as large. 84 00:04:40,440 --> 00:04:47,190 So no matter what, we have the inequality that Y is always 85 00:04:47,190 --> 00:04:52,960 less than or equal to X. And since this is always the case, 86 00:04:52,960 --> 00:04:56,950 this means that the expected value of Y will be less than 87 00:04:56,950 --> 00:05:00,550 or equal to the expected value of X. 88 00:05:00,550 --> 00:05:02,920 But now what is the expected value of Y? 89 00:05:02,920 --> 00:05:08,570 Since Y is either 0 or a, the expected value is equal to a 90 00:05:08,570 --> 00:05:14,050 times the probability of that event, which is a times the 91 00:05:14,050 --> 00:05:19,580 probability that X is larger than or equal to a. 92 00:05:19,580 --> 00:05:23,850 And by comparing the two sides of this inequality, what we 93 00:05:23,850 --> 00:05:27,530 have is exactly the Markov inequality. 94 00:05:27,530 --> 00:05:30,170 Let us now go through some simple examples. 95 00:05:30,170 --> 00:05:32,659 Suppose that X is exponentially distributed with 96 00:05:32,659 --> 00:05:36,590 parameter or equal to 1 so that the expected value of X 97 00:05:36,590 --> 00:05:40,870 is also going to be equal to 1, and in that case, we obtain 98 00:05:40,870 --> 00:05:43,510 a bound of 1 over a. 99 00:05:43,510 --> 00:05:47,210 To put this result in perspective, note that we're 100 00:05:47,210 --> 00:05:49,570 trying to bound a probability. 101 00:05:49,570 --> 00:05:53,040 We know that the probability lies between 0 and 1. 102 00:05:53,040 --> 00:05:56,409 There's a true value for this probability, and in this 103 00:05:56,409 --> 00:05:58,909 particular example because we have an exponential 104 00:05:58,909 --> 00:06:04,690 distribution, this probability is equal to e to the minus a. 105 00:06:04,690 --> 00:06:07,370 The Markov inequality gives us a bound. 106 00:06:07,370 --> 00:06:11,480 In this instance, the bound takes the form of 1 over a, 107 00:06:11,480 --> 00:06:14,270 and the inequality tells us that the true value is 108 00:06:14,270 --> 00:06:17,110 somewhere to the left of here. 109 00:06:17,110 --> 00:06:21,450 A bound will be considered good or strong or useful if 110 00:06:21,450 --> 00:06:24,590 that bound turns out to be quite close to the correct 111 00:06:24,590 --> 00:06:28,810 value so that it also serves as a fairly accurate estimate. 112 00:06:28,810 --> 00:06:31,560 Unfortunately, in this example, this is not the case 113 00:06:31,560 --> 00:06:35,000 because the true value falls off exponentially with a, 114 00:06:35,000 --> 00:06:37,940 whereas the bound that we obtained falls off at a much 115 00:06:37,940 --> 00:06:40,550 slower rate of 1 over a. 116 00:06:40,550 --> 00:06:43,690 For this reason, one would like to have even better 117 00:06:43,690 --> 00:06:46,470 bounds than the Markov inequality, and this is one 118 00:06:46,470 --> 00:06:49,350 motivation for the Chebyshev inequality that we will be 119 00:06:49,350 --> 00:06:50,870 considering next. 120 00:06:50,870 --> 00:06:55,230 But before we move there, let us consider one more example. 121 00:06:55,230 --> 00:06:59,220 Suppose that X is a uniform random variable on the 122 00:06:59,220 --> 00:07:05,960 interval from minus 4 to 4, and we're interested in saying 123 00:07:05,960 --> 00:07:10,270 something about the probability that X is larger 124 00:07:10,270 --> 00:07:12,350 than or equal to 3. 125 00:07:12,350 --> 00:07:15,960 So we're interested in this event here. 126 00:07:15,960 --> 00:07:20,015 So the value of the density, because we have a range of 127 00:07:20,015 --> 00:07:23,470 length 8, the value of the density is 1/8. 128 00:07:23,470 --> 00:07:27,750 So we know that this probability has a true value 129 00:07:27,750 --> 00:07:35,350 of 1 over 8, which we can indicate on a diagram here. 130 00:07:35,350 --> 00:07:37,540 Probabilities are between 0 and 1. 131 00:07:37,540 --> 00:07:40,890 We have a true value of 1 over 8. 132 00:07:40,890 --> 00:07:43,010 Lets us see what the Markov inequality is 133 00:07:43,010 --> 00:07:44,900 going to give us. 134 00:07:44,900 --> 00:07:48,650 There's one difficulty that X is not a non-negative random 135 00:07:48,650 --> 00:07:51,300 variable, so we cannot apply the Markov 136 00:07:51,300 --> 00:07:53,680 inequality right away. 137 00:07:53,680 --> 00:07:59,050 However, the event that X is larger than or equal to 3 is 138 00:07:59,050 --> 00:08:03,920 smaller than the event that the absolute value of X is 139 00:08:03,920 --> 00:08:06,690 larger than or equal to 3. 140 00:08:06,690 --> 00:08:12,460 That is, we take this blue event and we also add this 141 00:08:12,460 --> 00:08:17,340 green event, and we say that the probability of the blue 142 00:08:17,340 --> 00:08:21,370 event is less than or equal to the probability of the blue 143 00:08:21,370 --> 00:08:24,970 together with the green event, which is the event that the 144 00:08:24,970 --> 00:08:28,190 absolute value of X is larger than or equal to 3. 145 00:08:28,190 --> 00:08:30,430 So now we have a random variable, which is 146 00:08:30,430 --> 00:08:33,919 non-negative, and we can apply the Markov inequality and 147 00:08:33,919 --> 00:08:36,890 write that this is less than or equal to the expected value 148 00:08:36,890 --> 00:08:40,980 of the absolute value of X divided by 3. 149 00:08:40,980 --> 00:08:44,153 What is this expectation of the absolute value of X? 150 00:08:44,153 --> 00:08:47,160 X is uniform on this range. 151 00:08:47,160 --> 00:08:50,590 The absolute value of X will be taking values only 152 00:08:50,590 --> 00:08:52,550 between 0 and 4. 153 00:08:52,550 --> 00:08:55,360 And because the original distribution was uniform, the 154 00:08:55,360 --> 00:08:59,050 absolute value of X will also be uniform on the 155 00:08:59,050 --> 00:09:00,820 range from 0 to 4. 156 00:09:00,820 --> 00:09:03,510 And for this reason, the expected value is going to be 157 00:09:03,510 --> 00:09:08,290 equal to 2, and we get a bound of 2/3. 158 00:09:08,290 --> 00:09:10,810 This is a pretty bad bound. 159 00:09:10,810 --> 00:09:14,030 It is true, of course, but it is quite far 160 00:09:14,030 --> 00:09:15,910 from the true answer. 161 00:09:15,910 --> 00:09:18,230 Could we improve this bound? 162 00:09:18,230 --> 00:09:20,920 In this particular example, we can. 163 00:09:20,920 --> 00:09:24,720 Because of symmetry, we know that the probability of being 164 00:09:24,720 --> 00:09:29,150 larger than or equal to 3 is equal to the probability of 165 00:09:29,150 --> 00:09:32,200 being less than or equal to minus 3. 166 00:09:32,200 --> 00:09:35,840 Or the probability of this event, which is the blue and 167 00:09:35,840 --> 00:09:39,100 the green, is twice the probability of 168 00:09:39,100 --> 00:09:41,030 just the blue event. 169 00:09:41,030 --> 00:09:46,710 Or to put it differently, this probably here is equal to 1/2 170 00:09:46,710 --> 00:09:50,870 of the probability that the absolute value of x is larger 171 00:09:50,870 --> 00:09:54,780 than or equal to 3, and therefore, by using the same 172 00:09:54,780 --> 00:10:00,090 bound as here, we will obtain and answer of 1/3. 173 00:10:00,090 --> 00:10:03,440 So by being a little more clever and exploiting the 174 00:10:03,440 --> 00:10:07,660 symmetry of this distribution around 0, we get a somewhat 175 00:10:07,660 --> 00:10:12,660 better bound of 1/3, which is, again, a true bound. 176 00:10:12,660 --> 00:10:16,260 It is more informative than the original bound, but still 177 00:10:16,260 --> 00:10:18,910 it is quite far away from the true answer.