PROFESSOR: Now we're ready to prove the law of large numbers, and while we're at it, to get a quantitative version, which will be the basis for the theory of sampling and estimation.

So let's remember that the law of large numbers says that if you have n independent, identically distributed random variables with mean mu, and we let An be their average, then for every positive number delta, the probability that the average differs from the mean by more than delta goes to 0 as the number of trials increases. Remember, that means that if you tell me what you think "close" means and what you think "very likely" means, then I can guarantee, by doing enough trials, that the likelihood that the average will be outside that tolerance of the mean is as small as you thought small should be.

And we're ready for the proof. But in the proof, there's one extra fact that we're going to use that we didn't explicitly mention, which is that not only are all of these random variables identically distributed and independent, but we're actually going to assume that they have a variance. Now, not every random variable has a finite variance, even if it has a finite mean. In fact, there are random variables that don't even have finite means, and we'll look at them on the last day of class. So we're going to explicitly assume that all of these random variables have the same variance, namely the square of the standard deviation, sigma squared, and we'll be using that fact in the proof.

Now, the first question to ask is, what is the expectation of the average? And the expectation of the average is simply the expectation of each trial, mu. Let's prove that. The expectation of the average is, by definition, the expectation of the sum of the R's over n, and by additivity of expectation, that's the sum of the expectations of each of the R's over n. But each of them has expectation mu, so the numerator is n mu and the n's cancel. And sure enough, the average has the same expectation as each of the individual variables, each of the trials.
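In symbols, the calculation just described is (writing An for the average and R1 through Rn for the trials, as in the lecture):

\[
\mathrm{E}[A_n] \;=\; \mathrm{E}\!\left[\frac{R_1 + \cdots + R_n}{n}\right]
\;=\; \frac{\mathrm{E}[R_1] + \cdots + \mathrm{E}[R_n]}{n}
\;=\; \frac{n\mu}{n} \;=\; \mu.
\]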
Now, that fact lets us apply the Chebyshev bound to the random variable An, because now we know what its mean is, and its mean is independent of n. We can apply Chebyshev to the probability that the average of n trials differs from its mean by more than delta. And according to Chebyshev, that's bounded by the variance of the average divided by delta squared.

So I will have proved the law of large numbers if I can prove that the limit as n approaches infinity of the variance of An is 0, because that means that the right-hand side will be going to 0 over delta squared, namely going to 0, which is what the law of large numbers says. So we've reduced the proof of the law of large numbers to proving that the variance of An goes to 0 as n approaches infinity, where An is the average of n identically distributed variables with common mean mu and standard deviation sigma.

Well, let's calculate the variance of An, where An, again, is the average, the sum of the R's over n. And since we're assuming independence of the R's, the variance sum rule tells us that this is the sum of the variances. Now, if we factor out the 1 over n (this is a factor of 1 over n times the sum), remember that when we factor a constant out of the variance, it squares, so the denominator here becomes n squared, and that's critical. And the numerator is the sum of the n variances. Now, each variance is sigma squared and we've got n of them, so we wind up with n sigma squared over n squared, which is, of course, equal to sigma squared over n. And sigma squared is a constant, and n is going to infinity. So sure enough, the right-hand side goes to 0 as n increases, which is all we needed to do to conclude the weak law of large numbers.

Now, if we go back and look at this proof, the only thing that it used about the R's was that they had the same mean, mu, that they actually had the same variance, sigma squared, and that the variances added. That was the key step in the proof, that the variance of the sum of the R's was equal to the sum of the variances.
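In symbols, the two steps just described are the Chebyshev bound applied to An, followed by the variance calculation:

\[
\Pr\bigl[\,|A_n - \mu| > \delta\,\bigr] \;\le\; \frac{\mathrm{Var}[A_n]}{\delta^2},
\qquad
\mathrm{Var}[A_n] \;=\; \mathrm{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n} R_i\right]
\;=\; \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}[R_i]
\;=\; \frac{n\sigma^2}{n^2} \;=\; \frac{\sigma^2}{n}.
\]

Combining the two gives the bound \(\sigma^2/(n\delta^2)\), which goes to 0 as n grows.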
Now, additivity of variances only requires pairwise independence, so the proof didn't even need the hypothesis that the R's were mutually independent. And the proof we just went through never used the fact that the R's had the same distribution; they need not be identically distributed. It was sufficient that they have the same mean, and we can summarize what we really proved. When we thought we were proving the law of large numbers, we actually proved a precise quantitative theorem, which says that if R1 through Rn are pairwise independent random variables with the same finite mean mu and variance sigma squared, and we let An be the average of those n variables, then the probability that the average differs from the mean by more than delta is less than or equal to this definite number that we derived, 1 over n times the square of sigma over delta. This is what we just proved: we thought we were just proving the law of large numbers, but we actually got this much tighter quantitative theorem.

Now, what this tells us is that if you tell me what delta is and you tell me how small you want this probability to be (I'm given sigma, and you give me the delta that you specified), I will know how big an n to choose. So this tells me how big a sample I need, how many trials I have to make, in order to make the probability that the average misses the mean by more than the specified tolerance delta as small as you specified, and that is our independent sampling theorem. That's why it's called independent sampling, because we now know how big a sample is needed to estimate the mean of any random variable with any desired tolerance and any desired probability, where of course the variance has to be finite, the tolerance delta has to be positive, and the probability has to be less than 1.
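As a rough illustration of how that sample-size calculation might look in practice (this is a sketch of my own, not part of the lecture; the function name samples_needed and its parameters are hypothetical), solving the bound (1/n)(sigma/delta)^2 <= epsilon for n gives the number of trials needed:

import math

def samples_needed(sigma: float, delta: float, epsilon: float) -> int:
    """Smallest n with (1/n) * (sigma/delta)**2 <= epsilon, i.e. enough
    pairwise-independent trials so that Pr[|A_n - mu| > delta] <= epsilon,
    by the quantitative bound derived above."""
    if sigma < 0 or delta <= 0 or not 0 < epsilon < 1:
        raise ValueError("need sigma >= 0, delta > 0, and 0 < epsilon < 1")
    return math.ceil((sigma / delta) ** 2 / epsilon)

# Example: sigma = 2, tolerance delta = 0.5, miss probability epsilon = 5%
print(samples_needed(2.0, 0.5, 0.05))  # 320 trials suffice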