PROFESSOR: The law of large numbers gives a precise formal statement of the basic intuitive idea that underlies probability theory, and in particular, our interest in random variables and their expectations--their means.

So let's begin by asking what the mean means. Why are we so interested in it, for example? If you roll a fair die, with faces one through six, the mean value--its expected value--is 3 and 1/2. And you'll never roll 3 and 1/2, because there is no 3-and-1/2 face. So why do we care about what this mean is if we're never going to roll it? And the answer is that we believe that after many rolls, if we take the average of the numbers that show on the die, that average is going to be near the mean--near 3 and 1/2.

Let's look at an even more basic example. If it's a fair die, the probability of rolling a six, as with any other number, is one sixth. And the very meaning of the fact that the probability of rolling a six is one sixth is that we expect that if you roll a lot of times--if you roll about n times--the number of sixes is going to be around n/6.
The fraction of sixes is going to be about one sixth. Of n rolls, you'll get about n/6 sixes. That's almost the definition--or the intuitive idea behind what we mean when we assign a probability to some outcome. It's that if we did it repeatedly, the fraction of times that it came up would be equal to its probability--or at least close to it in the long run.

So let's look at what Jacob Bernoulli, the discoverer of the law of large numbers, had to say on the subject. He was born in 1655 and died in 1705. And his famous book, The Art of Guessing--Ars Conjectandi--was actually published posthumously by his nephew.

And Bernoulli says, "Even the stupidest man--by some instinct of nature per se and by no previous instruction--this is truly amazing--knows for sure that the more observations that are taken, the less the danger will be of straying from the mark."

All right. What does he mean? Well, it's what we said a moment ago.
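A quick simulation makes this intuition concrete. The following is a minimal sketch (standard-library Python only; the function name and the fixed seed are illustrative choices, not anything from the lecture) that rolls a fair die n times and reports the fraction of sixes, which settles near 1/6 as n grows:

```python
import random

def fraction_of_sixes(n, rng):
    """Roll a fair die n times and return the fraction of rolls showing a six."""
    sixes = sum(1 for _ in range(n) if rng.randint(1, 6) == 6)
    return sixes / n

rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (60, 600, 6000, 60000):
    # Each fraction is a random quantity, but the spread around 1/6 shrinks with n.
    print(n, round(fraction_of_sixes(n, rng), 4))
```

For small n the fraction can easily land far from 1/6; the point of the lecture is exactly how quickly that stops happening as n increases.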
If you roll the fair die n times, and the probability of rolling a six is one sixth, then the fraction of sixes--the number of sixes rolled divided by n--we believe intuitively is going to approach one sixth as n approaches infinity. That's what Bernoulli is saying: everybody understands that, and they're intuitively sure of it. Who knows how they figured that out, but that's what everyone thinks. And he might be right.

Now of course, when you're doing this experiment of rolling n times, counting the number of sixes, and seeing if the fraction is close to a sixth, you might be unlucky. It's possible that you'd get a fraction that actually was way off one sixth. But that would be unlucky. And the question is, how unlikely is it that you'd get a fraction of sixes that wasn't really close to a sixth? The law of large numbers gets a grip on that, and in fact, subsequently, we'll get an even more quantitative grip on it, which will be crucial for applications in sampling and hypothesis testing. But let's go on.
So let's look at some actual numbers, which I calculated. If you roll a die n times, where n is 6, 60, 600, 1,200, 3,000 or 6,000, the probability that you're going to be within 10% of the expected number of sixes is given here.

It turns out, of course, that if you're going to roll six times, the only way to be within 10% of the one expected six is to roll exactly one six in six tries. And the probability of that is about 40%--0.4--as you can check yourself easily.

Then, if you roll 60 times, the expected number of sixes is 10. So the probability of being within 10% of 10--that is, nine to 11 sixes--is 0.26. And likewise, the probability of being within 10% of 100, which is the expected number of sixes when you roll 600 times, is 0.72. And so on, until finally the probability of being within 10% of 1,000, which is the expected number when you roll 6,000 times--that is, between 900 and 1,100 sixes in 6,000 rolls--is 0.999: triple nines. In fact, it's a little bit bigger.
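Numbers like these can be computed exactly: the count of sixes in n fair rolls is binomial with parameters n and 1/6, so the probability of landing within 10% of the mean n/6 is a sum of binomial terms. A minimal sketch (Python; the pmf is computed in log space via lgamma to avoid underflow for large n, and the endpoint-rounding convention for the interval is my own assumption, so individual values may differ slightly from the slide's):

```python
import math

def binom_pmf(n, k, p):
    """Binomial(n, p) probability of exactly k successes, via log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def prob_within(n, tol, p=1/6):
    """Probability the number of sixes in n rolls is within tol of the mean n*p."""
    mean = n * p
    lo = math.ceil(mean * (1 - tol))   # smallest count inside the tolerance band
    hi = math.floor(mean * (1 + tol))  # largest count inside the tolerance band
    return sum(binom_pmf(n, k, p) for k in range(lo, hi + 1))

for n in (6, 3000, 6000):
    print(n, round(prob_within(n, 0.10), 4))
```

For n = 6 this reproduces the 0.4 figure (only the count of exactly one six qualifies), and for n = 6000 it confirms that the probability is indeed a shade above 0.999.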
So it's really only about one chance in 1,000 that your number of sixes won't fall in that interval, within 10% of the expected number.

Well, suppose I ask for a tighter tolerance, and I'd like to know the probability of being within 5%. First of all, notice, of course, that as the number of rolls gets larger, the probability of being within the given interval gets higher and higher, which is what Bernoulli said and what we intuitively believe. The more rolls, the more likely you are to be close to what you expect. If you tighten the tolerance, of course, then the probabilities of doing so well get smaller.

So if you want to be within 5% of the average in six rolls, it still means rolling exactly one six, so the probability is still 0.4. But if you're trying to be within 5% of the expected number 10 in 60 rolls--meaning exactly 10 sixes, since the interval from 9.5 to 10.5 contains only 10--that probability is only 0.14, compared to the probability of 0.26 of being within 10%.
And if we jump down here, say, to 3,000 rolls, the probability of being within 10% of 500, which is the expected number in 3,000 rolls, is 0.98. But the probability of being within 5% of 500 is 0.78, a little over 3/4.

So what does that tell us? Well, it means that if you rolled 3,000 times and you did not get within 10% of the expected number 500--that is, you did not get in the interval between 450 and 550 sixes--you can be 98% confident that your die is loaded. It's not weighted to show a six one sixth of the time. And similarly, if you did not get between 475 and 525 sixes in 3,000 rolls, you can be 78% sure that your die is loaded.

And this is exactly why the law of large numbers is so important to us: it allows us to do an experiment and then assess whether what we think is true is verified by the outcome that we got. All right. Let's go on to see what else Bernoulli was concerned with in his time.
"It certainly remains to be inquired whether after the number of observations has been increased, the probability of obtaining the true ratio finally exceeds any given degree of certainty, or whether the problem has, so to speak, its own asymptote--that is, whether some degree of certainty is given, which one can never exceed."

Now, that's 17th-century language that may be a little bit hard to parse, so let's translate it into math. What is Bernoulli asking? What Bernoulli means is that he wants to think about taking a random variable R with an expectation, or mean, of mu. And he wants to make n trial observations of R, take the average of those observations, and see how close it is to mu.

All right, what does making n trial observations mean? Formally, the way we're going to capture it is to think of having a bunch of mutually independent, identically distributed random variables R1 through Rn. This phrase "independent, identically distributed" comes up so often that there's a standard abbreviation: i.i.d. random variables.
So we're going to have n of them, and think of those as being the n observations that we make of a given random variable R. So R1 through Rn each have exactly the same distribution as R, and they're mutually independent. And since they have identical distributions, they all have the same mean, mu, as the random variable R that we're trying to investigate. So we model n independent, repeated trials by saying that we have n random variables that are i.i.d.

OK. Now, what Bernoulli proposes is that you take the average of those n random variables. So you take the sum of R1, R2, up through Rn, and divide by n. That's the average value. Call that A sub n--the average of the n observations, or the n rolls. And Bernoulli's question is: is this average probably close to the mean, mu, if n is big?

What exactly does that mean? "Probably close to mu" means: the probability that the distance between the average and mu is less than or equal to delta is what? So delta is talking about how close you are. Delta is a parameter.
We expect it to be positive. Think of whatever "close" means to you. Does it mean 0.1? Does it mean 0.01? What amount would persuade you that the average was close to what it ought to be? And we ask, then, whether the distance between the average and the mean is less than or equal to delta. And Bernoulli wants to know: what is the probability of that?

And he goes on to say, "Therefore this is the problem which I now set forth and make known after I have pondered over it for 20 years. Both its novelty and its very great usefulness, coupled with its great difficulty, can exceed in weight and value all the remaining chapters of this thesis."

Now, Bernoulli was right about the usefulness of this result, at least in its quantitative form. And at the time, it was really pretty difficult for him. It took him about 200 pages to complete his proof in Ars Conjectandi. Nowadays, we're going to do it in about a lecture's worth of material, and you'll be seeing that in a subsequent video segment.
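Bernoulli's quantity--the probability that the average of n observations lands within delta of the mean--can itself be estimated by simulation. This sketch (Python; the choice of delta = 0.1, the repetition count, and the function name are all illustrative assumptions) estimates Pr[|A_n - mu| <= delta] for fair-die rolls and shows it rising with n:

```python
import random

MU = 3.5  # mean of one fair-die roll

def prob_avg_close(n, delta, trials, rng):
    """Monte Carlo estimate of Pr[|A_n - MU| <= delta] for n fair-die rolls."""
    hits = 0
    for _ in range(trials):
        avg = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(avg - MU) <= delta:
            hits += 1
    return hits / trials

rng = random.Random(1)  # fixed seed for reproducibility
for n in (10, 100, 1000):
    # The estimated probability climbs toward 1 as n grows.
    print(n, prob_avg_close(n, delta=0.1, trials=1000, rng=rng))
```

For n = 10 the average of ten rolls is quite spread out, so it rarely lands within 0.1 of 3.5; by n = 1000 it almost always does--which is exactly the behavior the weak law formalizes.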
So that's what happens with 350 years to tune up a result: what took 200 pages then now takes 10 pages or fewer. In fact, if it was really concise, it could be done in three pages.

All right. So again, coming back to Bernoulli's question: what is the probability that the distance between the average and the mean is less than or equal to delta as you take more and more trials--as n goes to infinity? And Bernoulli's answer is that the probability is 1. That is, if you want a certain degree of certainty of being close to the mean, then if you take enough trials, you can be as certain as you want that you'll be as close as you want.

And that is called the weak law of large numbers. It's one of the basic, transcendent theorems of probability theory. It's usually stated the other way: the limit of the probability that the average is more than delta away from the mean is zero. That is, it's extremely unlikely.
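In symbols, the two equivalent statements of the weak law just described can be written as follows (a sketch in standard notation, with A_n the average of the n i.i.d. observations as defined above):

```latex
% Weak law of large numbers, in the two equivalent forms above:
% for every tolerance \delta > 0,
\lim_{n \to \infty} \Pr\bigl[\,|A_n - \mu| \le \delta\,\bigr] = 1,
\qquad\text{equivalently}\qquad
\lim_{n \to \infty} \Pr\bigl[\,|A_n - \mu| > \delta\,\bigr] = 0,
\quad\text{where } A_n = \frac{R_1 + R_2 + \cdots + R_n}{n}.
```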
It can be as unlikely as you want to make it that the average is more than any given tolerance from the mean, if you take a large enough number of trials.

Now, in this form, it's not yet really useful. This is a qualitative limiting result. To really use it, you need to know something about the rate at which it approaches the limit, which is what we're going to see in a subsequent video. And in fact, the proof is going to follow easily from the Chebyshev inequality bound and variance properties, when we go about getting the quantitative version that explains the rate at which the limit is approached.
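As a preview of that quantitative version: since the R_i are independent with some common variance sigma squared, the variance of the average A_n is sigma squared over n, and the Chebyshev inequality immediately gives a rate. This is only a sketch of the argument the later segment develops properly:

```latex
% Chebyshev bound applied to A_n, using Var[A_n] = \sigma^2 / n
% for i.i.d. R_i with common variance \sigma^2:
\Pr\bigl[\,|A_n - \mu| > \delta\,\bigr]
\;\le\; \frac{\operatorname{Var}[A_n]}{\delta^2}
\;=\; \frac{\sigma^2}{n\,\delta^2}
\;\longrightarrow\; 0
\quad\text{as } n \to \infty.
```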