The central limit theorem is absolutely remarkable. It is a very deep result, highly nontrivial and nonintuitive. There is no apparent reason why this random variable here, a standardized version of the sum of random variables, should have an approximately normal distribution.

Furthermore, it is very useful, and one key reason is that it is universal. It doesn't matter what the distribution of the X's is. No matter what that distribution is, in the limit this standardized version of the sum is going to behave like a normal random variable. And if we wish to apply it to particular examples or models, the only thing we need to know about the distribution of the X's is the corresponding mean and variance, as we are going to see in multiple examples.

When we apply it, it turns out to be very accurate, and it is also a very nice computational shortcut. Even if we knew, in detail, the distribution of the X's, in order to calculate the distribution of Sn we would have to take the distribution of the X's and convolve it with itself n times, something that can be computationally tedious. By contrast, the computations involved when we use the central limit theorem are very, very simple, as the upcoming examples will show us.

Finally, at the philosophical level, the central limit theorem justifies why models involving normal random variables are very natural. Whenever you have a phenomenon or an object that is affected by multiple noise sources, and if these noise sources are independent, then the overall effect of those noise sources is going to be well modeled by a normal random variable, even if the distribution of each one of these noise sources is very different from normal. This is a reason why, in many applications in many different fields, normal random variables provide very useful and accurate models.

Since the central limit theorem is so useful and important, it is worth making sure that we understand exactly what it says, and making a few comments on its mathematical content. What it says is the following.
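In symbols (this just restates what is described next): if X1, X2, ... are independent and identically distributed with mean mu and variance sigma squared, and Sn = X1 + ... + Xn, then

\[
Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}}, \qquad
\lim_{n\to\infty} \mathbf{P}(Z_n \le z) = \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt
\quad \text{for every } z.
\]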
We take the sum of independent, identically distributed random variables, and we form this standardized version of the sum, where we subtract the mean and divide by the standard deviation of the sum. What the theorem tells us is that the CDF of this random variable, Zn, converges to the standard normal CDF.

So what we have is a statement about CDFs. It does not yet tell us anything specific about PDFs or PMFs. For example, if Sn and the X's are all continuous random variables, so that Zn is also continuous, you might wonder whether the PDF of Zn converges to a normal PDF. It turns out that there are results of this kind that assert convergence of the PDF, or even of the PMF, of this random variable to a normal PDF, in some sense. But those results generally need a few more mathematical assumptions to be valid. Nevertheless, when we show pictures of various examples, we will do so by showing pictures of PDFs and PMFs, because these are easier to visualize.

Now, since the result is so general and so important, it might be worth understanding to what extent it can be generalized to other contexts. Our two main assumptions are that the random variables are independent and identically distributed. Can we remove those assumptions? There are versions of the central limit theorem that apply to the case where the Xi's are not identically distributed; one just needs to make certain assumptions on the means and the variances of the Xi's. Some conditions will be needed. Also, the assumption of independence does not need to be literally true. There are versions of the central limit theorem that are valid when we have just weak dependence. That is, nearby X's may be dependent, but if you compare X5 with X of one million, then these two random variables are essentially independent. In those cases, we can still apply a suitable version of the central limit theorem.
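As a quick numerical illustration of the statement about CDFs (a minimal sketch, not from the lecture; the choice of exponential X's and the values of n below are arbitrary), one can simulate Zn and compare the fraction of samples falling below a point with the standard normal CDF at that point:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0                     # mean and standard deviation of exponential(1) X's

for n in (5, 30, 500):
    # Draw many independent realizations of S_n = X_1 + ... + X_n.
    s = rng.exponential(scale=1.0, size=(100_000, n)).sum(axis=1)
    z = (s - n * mu) / (sigma * np.sqrt(n))          # standardized sum Z_n
    # Compare the empirical P(Z_n <= 1) with Phi(1); the two get closer as n grows.
    print(n, np.mean(z <= 1.0), norm.cdf(1.0))
```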
You may also be curious how this result is proved. The way it was originally established, a long time ago, was for the special case of Bernoulli random variables X, in which case Sn is binomial, by carrying out algebraic manipulations on the binomial formulas. But that was a derivation that would not generalize. For the general case, the proof is obtained using so-called transform methods, which is a topic that we are not covering, but it goes as follows. We consider this function of the random variable Zn, where s is some parameter, and we show that this expectation converges to the corresponding expectation when the standard normal Z takes the place of Zn. And this is true for all s, or at least for all s in some rich enough set. Then one appeals to some deep mathematical results which tell you that if this kind of expectation converges to that expectation, then the CDF of Zn must also converge to the CDF of Z. So this is a proof that involves various steps and appeals to some deep results from other fields of mathematics.

And finally, there is the practical side. What exactly does the theorem say, and how do we use it? Since the CDF of Zn can be approximated by the CDF of a standard normal, this means that in practice we can treat the random variable Zn just as if it were a standard normal. But now we notice that Sn is obtained as a linear function of Zn; namely, the definition of Zn gives us this formula. So Sn is a linear function of Zn. If we pretend that Zn is normal, then, since linear functions of normal random variables are normal, this means that we will also pretend that Sn is normal. So in practice, the way to carry out calculations is often to just pretend that Sn is normal with the appropriate mean and variance. These are the correct mean and variance of Sn, and that is the only information we need in order to apply the theorem. If we know mu and sigma squared, then, using the normal approximation, we have an approximate distribution for Sn, and we can go ahead.
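As a sketch of how such a calculation typically goes (the numerical values here, mu = 2, sigma squared = 4, n = 100, and the threshold 220, are made up for illustration and are not from the lecture):

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical values, chosen only to illustrate the recipe.
n, mu, sigma2 = 100, 2.0, 4.0

mean_Sn = n * mu               # E[S_n] = n * mu = 200
std_Sn = sqrt(n * sigma2)      # std(S_n) = sigma * sqrt(n) = 20

# Pretend S_n is normal with this mean and variance:
# P(S_n <= 220) ~ Phi((220 - 200) / 20) = Phi(1)
approx = norm.cdf((220 - mean_Sn) / std_Sn)
print(approx)                  # about 0.841
```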
Now, in practice, can we use the central limit theorem when n is moderate? For example, if n is 30, can you apply the central limit theorem? The relation above is true in the limit of very large n, so how large should n be? It turns out that the central limit theorem gives us very good approximations even when n has moderately small values.

These approximations will sometimes be better and sometimes worse. It helps if the distribution of the X's that you are starting with has some features in common with the normal distribution. If the X's are already normal, then Sn will be normal and there is no approximation involved. If the X's are close to normal, then for fairly small values of n, Sn will be very well modeled by a normal random variable. Now, what does it mean for the distribution of the X's to look a little bit like the normal one? It helps if the distribution is symmetric around its mean, and it also helps if it is unimodal, in the sense that it has a single peak rather than multiple peaks.
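A rough way to see this effect (a minimal sketch; the choice of a uniform versus an exponential starting distribution, and the probability being estimated, are my own for illustration): with n = 30, the normal approximation for P(Sn below its mean) tends to be closer for a symmetric, unimodal distribution such as the uniform than for a skewed one such as the exponential.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, trials = 30, 200_000

cases = [
    # (name, sampler, mean of one X)
    ("uniform(0,1)",   lambda size: rng.uniform(0.0, 1.0, size), 0.5),
    ("exponential(1)", lambda size: rng.exponential(1.0, size),  1.0),
]

for name, draw, mu in cases:
    s = draw((trials, n)).sum(axis=1)        # Monte Carlo samples of S_n
    simulated = np.mean(s <= n * mu)         # estimate of P(S_n <= E[S_n])
    clt = norm.cdf(0.0)                      # CLT approximation: Phi(0) = 0.5
    print(f"{name}: simulated {simulated:.4f} vs CLT approximation {clt:.4f}")
```

For the symmetric uniform case the simulated value sits essentially on top of 0.5, while for the skewed exponential case it is visibly off, which is the point about symmetry made above.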