In this segment, we derive and discuss the weak law of large numbers. It is a rather simple result, but it plays a central role within probability theory.

The setting is as follows. We start with some probability distribution that has a certain mean and variance, which we assume to be finite. We then draw independent random variables out of this distribution, so that these Xi's are independent and identically distributed, i.i.d. for short. What's going on here is that we're carrying out a long experiment during which all of these random variables are drawn. Once we have drawn all of these random variables, we can calculate the average of the values that have been obtained, and this gives us the so-called sample mean.

Notice that the sample mean is a random variable, because it is a function of random variables. It should be distinguished from the true mean, mu, which is the expected value of the Xi's and which is a number. It is not random. And mu is some kind of average over all the possible outcomes of the random variable Xi. The sample mean is the simplest and most natural way of trying to estimate the true mean, and the weak law of large numbers will provide some support for this approach.

Let us now look at the properties of the sample mean, starting with its expectation. By the way, this object involves two different kinds of averaging. The sample mean averages over the values observed during one long experiment, whereas the expectation averages over all possible outcomes of the experiment. The expectation is some kind of theoretical average, because we do not get to observe all the possible outcomes of the experiment; the sample mean, on the other hand, is something that we actually calculate on the basis of our observations. In any case, by linearity, the expected value of the sample mean is the expected value of the numerator divided by the denominator, n.
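Written out, the objects just described look as follows; the symbol M_n for the sample mean is shorthand adopted here, not notation fixed by the transcript.

\[
M_n = \frac{X_1 + \cdots + X_n}{n},
\qquad
E[M_n] = \frac{E[X_1 + \cdots + X_n]}{n}.
\]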
Using linearity once more, the expected value of the sum is the sum of the expected values, and since each one of those expected values is equal to mu, we obtain n times mu divided by n, which leaves us with mu. So the theoretical average, the expected value of the sample mean, is equal to the true mean.

Let us now calculate the variance of the sample mean. The variance of a random variable divided by a number is the variance of that random variable divided by the square of that number. Now, since the Xi's are independent, the variance of their sum is the sum of the variances, and therefore we obtain n times the variance of each one of them. After we simplify, this leaves us with sigma squared over n.

We're now in a position to apply the Chebyshev inequality. The Chebyshev inequality tells us that the probability that a random variable falls at a distance larger than a certain number, epsilon, from its mean is bounded above by the variance of that random variable divided by the square of epsilon. We have already calculated the variance, and so this bound is sigma squared divided by n times epsilon squared. And now, if we consider epsilon as a fixed number and let n go to infinity, what we obtain is a limiting value of 0.

So the probability of falling far from the mean diminishes to zero as we draw more and more samples. That's exactly what the weak law of large numbers tells us: if we fix any particular epsilon, which is a positive constant, the probability that the sample mean falls away from the true mean by more than epsilon becomes smaller and smaller and converges to 0 as n goes to infinity.

Let us now interpret the weak law of large numbers. As I already hinted, we have to think in terms of one long experiment, during which we draw several independent random variables from the same distribution.
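For reference, here is the same calculation in symbols, again using the assumed shorthand M_n for the sample mean:

\[
\mathrm{var}(M_n) = \frac{\mathrm{var}(X_1 + \cdots + X_n)}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n},
\qquad
P\big(|M_n - \mu| \geq \epsilon\big) \leq \frac{\mathrm{var}(M_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \longrightarrow 0
\quad \text{as } n \to \infty.
\]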
One way of thinking about those random variables is that each one of them is equal to the true mean plus some measurement noise, a term that has zero expected value, and all of these noises are independent. So we have a collection of noisy measurements, and then we take those measurements and form their average. What the weak law of large numbers tells us is that the sample mean is unlikely to be far off from the true mean, where by far off we mean at least epsilon distance away. So the sample mean is, in some ways, a good way of estimating the true mean. If n is large enough, then we have high confidence that the sample mean gives us a value that's close to the true mean.

As a special case, let us consider a probabilistic model in which we independently repeat the same experiment many times. There's a certain event A associated with that experiment that has a certain probability, p, and each time we carry out the experiment, we use an indicator variable to record whether the outcome was inside the event or outside it. So Xi is 1 if A occurs, and it is 0 otherwise. The expected value of the Xi's, the true mean in this case, is equal to the number p.

In this particular example, the sample mean just counts how many times the event A occurred out of the n experiments that we carried out, so it's the frequency with which the event A has occurred. We call it the empirical frequency of event A. What the weak law of large numbers tells us is that the empirical frequency will be close to the probability of that event. In this sense, it reinforces or justifies the interpretation of probabilities as frequencies.
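As a quick illustration of this last point, here is a minimal simulation sketch in Python. It is not part of the lecture; the function name empirical_frequency and the choice p = 0.3 are purely illustrative.

import random

def empirical_frequency(p, n, seed=0):
    # Simulate n independent Bernoulli(p) indicator variables X_i
    # (X_i = 1 if the event A occurs, 0 otherwise) and return the
    # fraction of trials in which A occurred: the empirical frequency.
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n)) / n

# As n grows, the printed frequencies should cluster more and more
# tightly around p = 0.3, as the weak law of large numbers predicts.
for n in (10, 100, 10_000, 1_000_000):
    print(n, empirical_frequency(p=0.3, n=n))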