1 00:00:00,500 --> 00:00:03,180 PROFESSOR: The simplest bound on the probability that a random variable differs 2 00:00:03,180 --> 00:00:07,640 by much from its expectation is due to a guy named Markov, 3 00:00:07,640 --> 00:00:09,894 a Russian probability theorist. 4 00:00:09,894 --> 00:00:12,310 And this is Markov's bound that we're going to talk about. 5 00:00:12,310 --> 00:00:16,960 Let's illustrate it with a memorable example of IQ. 6 00:00:16,960 --> 00:00:20,060 In the MIT context, it may be a radical idea. 7 00:00:20,060 --> 00:00:23,670 But IQ was this thing that was invented 8 00:00:23,670 --> 00:00:29,210 for intelligence quotient in the late 19th century, I believe. 9 00:00:29,210 --> 00:00:31,330 Might have been early 20th. 10 00:00:31,330 --> 00:00:35,660 It was meant as an effort to break 11 00:00:35,660 --> 00:00:39,910 the mold at Harvard of admitting the children of wealthy alumni. 12 00:00:39,910 --> 00:00:42,667 And the idea was to have merit-based admissions. 13 00:00:42,667 --> 00:00:44,750 And it was going to be some objective measure that 14 00:00:44,750 --> 00:00:49,410 did not depend on social class of the ability that people had. 15 00:00:49,410 --> 00:00:51,290 And Harvard was going to admit students 16 00:00:51,290 --> 00:00:54,427 based on merit and their intelligence quotient. 17 00:00:54,427 --> 00:00:56,510 So the original design of the intelligence quotient 18 00:00:56,510 --> 00:01:00,540 by a bunch of psychologists was that the average was supposed 19 00:01:00,540 --> 00:01:04,480 to be 100 over the whole population, which, 20 00:01:04,480 --> 00:01:08,030 of course, is-- around here, there just 21 00:01:08,030 --> 00:01:12,660 aren't very many people with an IQ of just 100. 22 00:01:12,660 --> 00:01:15,470 Anyway, let's ask this extreme question. 23 00:01:15,470 --> 00:01:18,645 Yes, around the elite universities, 24 00:01:18,645 --> 00:01:21,940 there are a lot of people with IQs much higher than 100.
25 00:01:21,940 --> 00:01:25,380 But what fraction of the population 26 00:01:25,380 --> 00:01:28,840 could possibly have an IQ as high as 300? 27 00:01:28,840 --> 00:01:30,880 Now, I'm not sure that an IQ as high as 300 28 00:01:30,880 --> 00:01:31,880 has ever been recorded. 29 00:01:31,880 --> 00:01:33,720 But we're talking logically here. 30 00:01:33,720 --> 00:01:36,620 Is it possible for a lot of people 31 00:01:36,620 --> 00:01:38,690 to have an IQ of greater than or equal to 300? 32 00:01:38,690 --> 00:01:40,520 And the answer is no. 33 00:01:40,520 --> 00:01:43,980 You can't possibly have more than 1/3 of the population 34 00:01:43,980 --> 00:01:48,890 have an IQ of 300, because if more than 1/3 had an IQ of 300, 35 00:01:48,890 --> 00:01:53,650 then that group alone would contribute more than 1/3 of 300 36 00:01:53,650 --> 00:01:56,460 to the average, which would be greater than 100. 37 00:01:56,460 --> 00:02:00,720 So you can't have more than 1/3 of the population 38 00:02:00,720 --> 00:02:06,090 have an IQ of triple the expected value of the IQ. 39 00:02:06,090 --> 00:02:10,020 So that's the basic bound. 40 00:02:10,020 --> 00:02:11,700 So we can restate it this way. 41 00:02:11,700 --> 00:02:15,160 The probability that a randomly chosen person 42 00:02:15,160 --> 00:02:17,580 has an IQ greater than or equal to 300 we can say 43 00:02:17,580 --> 00:02:22,010 is absolutely less than or equal to the expected value of IQ, 44 00:02:22,010 --> 00:02:24,650 namely 100, divided by 300. 45 00:02:24,650 --> 00:02:28,920 And just parameterizing it, if we ask, 46 00:02:28,920 --> 00:02:31,520 what's the probability that the IQ is 47 00:02:31,520 --> 00:02:33,790 greater than or equal to some amount x, 48 00:02:33,790 --> 00:02:38,420 it's less than or equal to 100/x by exactly that reasoning.
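The averaging argument can be checked with exact arithmetic. What follows is only a small sketch, not part of the lecture: the helper names `markov_bound` and `lowest_possible_mean` are made up for illustration, and the only assumptions are the ones in the example (IQ is never negative and the population average is 100).

```python
from fractions import Fraction


def markov_bound(mean, x):
    """Upper bound on P(R >= x) for a non-negative R with the given mean.

    Hypothetical helper: the bound is mean/x, capped at 1 because a
    probability can never exceed 1.
    """
    if mean < 0 or x <= 0:
        raise ValueError("need mean >= 0 and x > 0")
    return min(Fraction(1), Fraction(mean) / Fraction(x))


def lowest_possible_mean(fraction_high, high_value):
    """Smallest mean a non-negative population can have when
    `fraction_high` of it sits at `high_value`: put everyone else at 0."""
    return Fraction(fraction_high) * high_value


# At most 1/3 of the population can have an IQ of 300 or more.
print(markov_bound(100, 300))                      # 1/3

# If 2/5 of the population had an IQ of 300, the average would be at
# least 120, which contradicts an average of 100.
print(lowest_possible_mean(Fraction(2, 5), 300))   # 120
```

Exact fractions are used instead of floating point so that the 1/3 comes out as 1/3 rather than as a rounded decimal.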
49 00:02:38,420 --> 00:02:40,700 And this is basically Markov's bound, 50 00:02:40,700 --> 00:02:43,270 except there's one implicit fact that we're 51 00:02:43,270 --> 00:02:48,020 using in deriving the previous identity, or inequality, 52 00:02:48,020 --> 00:02:49,510 that IQ is bounded below by 0. 53 00:02:49,510 --> 00:02:52,780 Our logic was that you can't have more than a fraction 100/x 54 00:02:52,780 --> 00:02:55,730 of the population with an IQ of x or more, because that 55 00:02:55,730 --> 00:02:59,380 would contribute more than x times 100/x, or more than 100 56 00:02:59,380 --> 00:03:00,440 to the average. 57 00:03:00,440 --> 00:03:02,070 And the average is only 100. 58 00:03:02,070 --> 00:03:08,480 That's only a problem if there are no negative terms, 59 00:03:08,480 --> 00:03:11,900 negative IQs, to offset the excess contribution 60 00:03:11,900 --> 00:03:14,530 of the fraction of the population that 61 00:03:14,530 --> 00:03:15,880 has this high IQ. 62 00:03:15,880 --> 00:03:19,860 But we're implicitly using the fact that IQ is never negative. 63 00:03:19,860 --> 00:03:23,270 IQ runs from zero up to an unlimited amount. 64 00:03:23,270 --> 00:03:24,660 But it's never negative. 65 00:03:24,660 --> 00:03:26,730 And that means that that contribution 66 00:03:26,730 --> 00:03:30,610 from the 1/3 of the population that has an IQ of over 300 67 00:03:30,610 --> 00:03:32,252 can't be offset by negative values. 68 00:03:32,252 --> 00:03:33,960 It's there, and it messes up the average. 69 00:03:33,960 --> 00:03:35,440 It forces the average up. 70 00:03:35,440 --> 00:03:38,440 So we were using the fact that IQ is always non-negative. 71 00:03:38,440 --> 00:03:40,080 And by this very same reasoning, I'm 72 00:03:40,080 --> 00:03:43,130 not going to belabor you with a more formal proof. 73 00:03:43,130 --> 00:03:44,800 There's a trivial one in the text. 74 00:03:44,800 --> 00:03:45,490 It's easy.
75 00:03:45,490 --> 00:03:48,760 The theorem, Markov's bound, says that if R is non-negative, 76 00:03:48,760 --> 00:03:51,460 then the probability that R is greater than 77 00:03:51,460 --> 00:03:55,870 or equal to x is less than or equal to the expectation of R 78 00:03:55,870 --> 00:03:58,540 divided by x. 79 00:03:58,540 --> 00:04:00,660 And this holds for any x greater than 0. 80 00:04:00,660 --> 00:04:03,951 Of course, it's silly to state if this bound 81 00:04:03,951 --> 00:04:05,200 is greater than or equal to 1. 82 00:04:05,200 --> 00:04:07,240 It's not an interesting bound, since a probability is never 83 00:04:07,240 --> 00:04:08,365 greater than 1. 84 00:04:08,365 --> 00:04:11,570 So we might as well just restrict ourselves 85 00:04:11,570 --> 00:04:14,242 to x's that are greater than the expectation of R, 86 00:04:14,242 --> 00:04:15,700 because those are the only x's that 87 00:04:15,700 --> 00:04:17,616 are going to give us a nontrivial bound that's 88 00:04:17,616 --> 00:04:19,250 less than 1. 89 00:04:19,250 --> 00:04:21,860 Again, if R is non-negative, then the probability 90 00:04:21,860 --> 00:04:25,770 that R is at least an amount x is less than 91 00:04:25,770 --> 00:04:29,880 or equal to the expectation of R over x. 92 00:04:29,880 --> 00:04:32,610 And that's the Markov bound. 93 00:04:32,610 --> 00:04:36,150 If you restate it in terms of deviation from the mean, 94 00:04:36,150 --> 00:04:38,150 you could formulate it this way-- 95 00:04:38,150 --> 00:04:40,105 the probability that R is greater 96 00:04:40,105 --> 00:04:43,120 than or equal to a constant c times its mean mu-- mu is 97 00:04:43,120 --> 00:04:46,190 an abbreviation for the expectation of R-- is less than 98 00:04:46,190 --> 00:04:48,600 or equal to 1/c.
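Written out in symbols, the theorem and its restatement look as follows. This is a sketch of the standard one-line argument, which may differ in presentation from the proof in the course text; the indicator notation \mathbf{1}[\cdot] is introduced here and is not used in the lecture.

```latex
% Markov's bound: for a non-negative random variable R and any x > 0,
\[
  \Pr[R \ge x] \;\le\; \frac{\mathrm{E}[R]}{x}.
\]
% Sketch of why: since R >= 0, we always have R >= x * 1[R >= x].
% Taking expectations of both sides gives
\[
  \mathrm{E}[R] \;\ge\; x \cdot \mathrm{E}\bigl[\mathbf{1}[R \ge x]\bigr]
            \;=\; x \cdot \Pr[R \ge x],
\]
% and dividing by x > 0 gives the bound.
% Restated with mu = E[R] > 0 and x = c * mu for a constant c > 0:
\[
  \Pr[R \ge c\,\mu] \;\le\; \frac{1}{c}.
\]
```

The first step is exactly where non-negativity is used: if R could be negative, R could fall below x times the indicator when the indicator is 0.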
99 00:04:48,600 --> 00:04:51,370 So now, we can understand that as a bound 100 00:04:51,370 --> 00:04:55,040 on the deviation from the mean-- above the mean, in this case-- 101 00:04:55,040 --> 00:05:00,310 that as the multiple of the expectation increases, 102 00:05:00,310 --> 00:05:04,940 the bound on the probability decreases in inverse proportion. 103 00:05:04,940 --> 00:05:08,480 So the probability that R is greater than or equal to 3 times 104 00:05:08,480 --> 00:05:12,060 the expected amount is less than or equal to 1/3, which was what 105 00:05:12,060 --> 00:05:16,300 we saw with the IQ example. 106 00:05:16,300 --> 00:05:20,390 So look, this Markov bound, in general, is very weak. 107 00:05:20,390 --> 00:05:23,690 As I said, I don't think there's ever been an IQ recorded 108 00:05:23,690 --> 00:05:26,751 that was as high as 300. 109 00:05:26,751 --> 00:05:31,244 And in almost all the examples that you come across, 110 00:05:31,244 --> 00:05:32,660 there'll be other information that 111 00:05:32,660 --> 00:05:36,090 allows you to deduce tighter bounds on the probability 112 00:05:36,090 --> 00:05:38,560 that a random variable is significantly 113 00:05:38,560 --> 00:05:40,360 bigger than its expectation. 114 00:05:40,360 --> 00:05:42,179 But if you don't have any information 115 00:05:42,179 --> 00:05:44,720 about the random variable, other than that it's non-negative, 116 00:05:44,720 --> 00:05:46,860 then as a matter of fact, the Markov bound is tight. 117 00:05:46,860 --> 00:05:49,890 You can't possibly reach a stronger conclusion, 118 00:05:49,890 --> 00:05:52,670 because there are non-negative random variables where 119 00:05:52,670 --> 00:05:55,050 the probability that they are greater 120 00:05:55,050 --> 00:05:57,270 than or equal to a given amount x 121 00:05:57,270 --> 00:06:00,790 is, in fact, equal to their expectation divided by x.
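One such random variable is easy to write down. The lecture does not name a specific example, so the two-point distribution below is an illustrative choice, and `tight_distribution`, `expectation` and `prob_at_least` are hypothetical helper names: the variable takes the value x with probability mean/x and the value 0 otherwise.

```python
from fractions import Fraction


def tight_distribution(mean, x):
    """Two-point distribution: value x with probability mean/x, else 0.

    Illustrative construction (assumes 0 <= mean <= x and x > 0).
    Returns a dict mapping each value to its probability.
    """
    p = Fraction(mean) / Fraction(x)
    if not (0 <= p <= 1):
        raise ValueError("need 0 <= mean <= x")
    return {Fraction(0): 1 - p, Fraction(x): p}


def expectation(dist):
    """Expected value of a finite distribution given as {value: prob}."""
    return sum(value * prob for value, prob in dist.items())


def prob_at_least(dist, x):
    """P(R >= x) for a finite distribution given as {value: prob}."""
    return sum(prob for value, prob in dist.items() if value >= x)


dist = tight_distribution(100, 300)
print(expectation(dist))          # 100
print(prob_at_least(dist, 300))   # 1/3
```

Here the variable is non-negative, its expectation is 100, and the probability of reaching 300 is exactly 100/300, so no bound that uses only non-negativity and the mean can do better than Markov's.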
122 00:06:00,790 --> 00:06:04,480 So the Markov bound is weak in application, 123 00:06:04,480 --> 00:06:06,960 but it's the strongest conclusion you 124 00:06:06,960 --> 00:06:10,030 can reach from the very limited hypotheses 125 00:06:10,030 --> 00:06:13,500 that it makes about properties of the random variable. 126 00:06:13,500 --> 00:06:17,184 And it's also pretty obvious, I hope from this example 127 00:06:17,184 --> 00:06:19,100 that we've talked about, but the amazing thing 128 00:06:19,100 --> 00:06:19,933 is how useful it is. 129 00:06:19,933 --> 00:06:24,030 We will get mileage out of it by using it in clever ways. 130 00:06:24,030 --> 00:06:26,450 So let's talk about the first clever way. 131 00:06:26,450 --> 00:06:28,650 And suppose that we're thinking about whether IQ 132 00:06:28,650 --> 00:06:30,150 is greater than or equal to 300. 133 00:06:30,150 --> 00:06:34,020 But I bring into the story another fact 134 00:06:34,020 --> 00:06:36,170 that we haven't mentioned before, which is, 135 00:06:36,170 --> 00:06:41,640 let's suppose that as a matter of fact, IQs of less than 50 136 00:06:41,640 --> 00:06:42,610 just don't occur. 137 00:06:42,610 --> 00:06:44,940 I think they might actually, but there's 138 00:06:44,940 --> 00:06:47,667 a certain point where you just are not functioning at all. 139 00:06:47,667 --> 00:06:49,250 And it's not clear that it makes sense 140 00:06:49,250 --> 00:06:51,330 to ever talk about somebody who's in a coma 141 00:06:51,330 --> 00:06:52,680 as having an IQ. 142 00:06:52,680 --> 00:06:54,110 Maybe they have an IQ of 0. 143 00:06:54,110 --> 00:06:57,670 But let's assume that pragmatically IQ is never 144 00:06:57,670 --> 00:07:00,710 less than 50. 145 00:07:00,710 --> 00:07:03,570 Now, if I tell you that I know that IQ 146 00:07:03,570 --> 00:07:05,300 is greater than or equal to 50, then I 147 00:07:05,300 --> 00:07:07,820 can actually get a better bound out of Markov.
148 00:07:07,820 --> 00:07:11,730 Because now, knowing that IQ is greater than or equal to 50, 149 00:07:11,730 --> 00:07:17,010 IQ minus 50 becomes a non-negative random variable, 150 00:07:17,010 --> 00:07:19,330 which I couldn't be sure it was before, 151 00:07:19,330 --> 00:07:21,820 because IQ might have gone below 50. 152 00:07:21,820 --> 00:07:25,220 Now that I know that it's always at least 50, IQ minus 50 153 00:07:25,220 --> 00:07:25,990 is non-negative. 154 00:07:25,990 --> 00:07:29,320 And Markov's bound will apply to IQ minus 50. 155 00:07:29,320 --> 00:07:32,270 And applying it to IQ minus 50 will give you a better bound. 156 00:07:32,270 --> 00:07:34,540 Because now, looking at the probability 157 00:07:34,540 --> 00:07:37,310 that the IQ is greater than or equal to 300, of course, that's 158 00:07:37,310 --> 00:07:40,250 the same as saying that IQ minus 50 159 00:07:40,250 --> 00:07:43,860 is greater than or equal to 300 minus 50, 160 00:07:43,860 --> 00:07:54,280 which is 250-- and the expected value of IQ minus 50 161 00:07:54,280 --> 00:07:55,880 is 100 minus 50. 162 00:07:55,880 --> 00:08:00,420 So we're asking whether this non-negative random variable is 163 00:08:00,420 --> 00:08:05,580 greater than or equal to 250. 164 00:08:05,580 --> 00:08:07,480 And the answer is, the probability of that is less than 165 00:08:07,480 --> 00:08:16,730 or equal to its expectation over 250, which is 50/250, or 1/5. 166 00:08:16,730 --> 00:08:19,660 And that's a tighter bound than the 1/3 we had previously. 167 00:08:19,660 --> 00:08:22,156 This is a general phenomenon that you get 168 00:08:22,156 --> 00:08:24,780 and that helps you get slightly stronger bounds out of Markov's 169 00:08:24,780 --> 00:08:28,560 bound, namely if you have a non-negative variable with a minimum above 0, 170 00:08:28,560 --> 00:08:30,580 you get a better bound on it by shifting it 171 00:08:30,580 --> 00:08:32,150 so that its minimum is 0.
172 00:08:32,150 --> 00:08:34,270 As a matter of fact, even if it goes negative, 173 00:08:34,270 --> 00:08:36,760 if you shift it up, if you can force 174 00:08:36,760 --> 00:08:42,520 it to become above 0 as a minimum, 175 00:08:42,520 --> 00:08:46,320 then you can apply Markov's bound to it.
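The shifting trick can be written as a small function. This is a sketch under stated assumptions rather than anything from the lecture: `shifted_markov_bound` is a hypothetical name, and `lower` is assumed to be a value that the random variable never goes below, so that the variable minus `lower` is non-negative and Markov's bound applies to it.

```python
from fractions import Fraction


def shifted_markov_bound(mean, x, lower):
    """Bound on P(R >= x) when R >= lower always holds.

    Applies Markov's bound to the non-negative variable R - lower:
        P(R >= x) = P(R - lower >= x - lower) <= (mean - lower) / (x - lower)
    Requires lower <= mean (a mean cannot be below the minimum) and
    lower < x (so the shifted threshold is positive).
    """
    if not (lower <= mean and lower < x):
        raise ValueError("need lower <= mean and lower < x")
    return min(Fraction(1), Fraction(mean - lower) / Fraction(x - lower))


# Plain Markov bound for IQ >= 300 with mean 100 and minimum 0.
print(shifted_markov_bound(100, 300, 0))    # 1/3

# Knowing IQ is never below 50 tightens the bound.
print(shifted_markov_bound(100, 300, 50))   # 1/5

# A variable that can go negative, but never below -20, with mean 10:
# shifting up by 20 makes it non-negative, giving (10+20)/(30+20).
print(shifted_markov_bound(10, 30, -20))    # 3/5
```

The last call illustrates the case of a variable that goes negative: the bound is weaker than 10/30 would suggest, but 10/30 is not a valid bound there because Markov's hypothesis fails without the shift.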