The purpose of this segment is to give you a little bit of the bigger picture. We did discuss some inequalities, we did discuss convergence of the sample mean -- that's the weak law of large numbers -- and we did discuss a particular notion of convergence of random variables, convergence in probability. How far can we take those topics?

Let's start with the issue of inequalities. Here, one would like to obtain bounds and approximations on tail probabilities that are better than the Markov and Chebyshev inequalities that we have seen. This is indeed possible. For example, there is a so-called Chernoff bound that takes the following form. The Chernoff bound tells us that the probability that the sample mean is away from the true mean by at least a, where a is a positive number, is bounded above by a function that falls exponentially with n, where the exponent depends on the particular number a that we are considering. But in any case, this term in the exponent is a positive quantity.

Notice that this is much better, much stronger, than what we obtained from the Chebyshev inequality, because the Chebyshev inequality only gives us a bound on this probability that falls off at the rate of 1 over n. So this falls much faster, and it tells us that this probability is indeed much smaller than what the Chebyshev inequality might predict. However, this inequality requires some additional assumptions on the random variables involved.

Another type of approximation of this tail probability can be obtained through the central limit theorem, which will actually be the next topic that we will be studying. Very loosely speaking, the central limit theorem tells us that the random variable M sub n, which is the sample mean, behaves as if it were a normal random variable with the mean and the variance that it should have.
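To make the comparison concrete, here is a minimal simulation sketch, assuming Bernoulli(p) samples so that a Chernoff-type bound in the Hoeffding form, 2e^(-2na^2), applies (this particular form is one standard instance for variables bounded in [0,1], not something fixed by the lecture; the values of p, a, and n are likewise illustrative choices). It estimates the tail probability by Monte Carlo and prints it next to the 1/n Chebyshev bound and the exponentially decaying bound.

```python
import numpy as np

rng = np.random.default_rng(0)

p, a = 0.5, 0.1           # illustrative choices: Bernoulli(p) samples, threshold a
trials = 200_000          # Monte Carlo repetitions per value of n

for n in (10, 50, 200, 1000):
    # Each repetition draws n Bernoulli(p) flips; binomial(n, p)/n is M_n.
    means = rng.binomial(n, p, size=trials) / n
    empirical = np.mean(np.abs(means - p) >= a)   # estimate of P(|M_n - p| >= a)
    chebyshev = p * (1 - p) / (n * a**2)          # Var(X) / (n a^2), decays like 1/n
    chernoff = 2 * np.exp(-2 * n * a**2)          # Hoeffding form for [0,1] variables
    print(f"n={n:5d}  empirical={empirical:.5f}  "
          f"Chebyshev<={chebyshev:8.4f}  Chernoff<={chernoff:.6f}")
```

As n grows, the empirical tail probability tracks the exponential bound far more closely than the Chebyshev bound, which is even vacuous (larger than 1) for the smallest n here.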
We know that this is the mean and the variance of the sample mean, but the central limit theorem tells us that, in addition to that, we can also pretend that the sample mean is normal and carry out approximations as if it were a normal random variable. Now, this statement that I'm making here is only a loose statement. It is not mathematically completely accurate. We will see later a more accurate statement of the central limit theorem.

In a different direction, we can talk about different types of convergence. We did define convergence in probability, but that's not the only notion of convergence that's relevant to random variables. There's an alternative notion, which is convergence with probability one. Here is what it means. We have a single probabilistic experiment. And within that experiment, we have a sequence of random variables and another random variable, and we want to talk about this sequence of random variables converging to that random variable.

What do we mean by that? We consider a typical outcome of the experiment, that is, some omega. Look at the values of the random variables Yn under that particular omega -- that is, the sequence of values of the different random variables under that particular outcome. Under that particular outcome, Y also has a certain numerical value, and we're interested in whether this convergence takes place as n goes to infinity. Now, for some outcomes omega, this will happen; for some, it will not. We say that we have convergence with probability one if this event has probability equal to 1. That is, there is probability one -- essentially, certainty -- that when an outcome of the experiment is obtained, the resulting sequence of values of the random variables Yn will converge to the value of the random variable Y.

Now, this definition is easy to write down, but to actually understand what it really means, and the ways in which it is different from convergence in probability, is not so easy.
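The structure of the definition -- fix an outcome omega first, then look at an ordinary numerical sequence -- can be made a bit more tangible with a small sketch. Here each "omega" is a hypothetical stand-in: one entire realized sequence of fair coin flips, along which the running sample mean is just a sequence of numbers; the sequence length and checkpoints are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000                # truncation length of each realized sequence

# Each omega below plays the role of one outcome of the single experiment:
# an entire (truncated) sequence of fair coin flips. Along a fixed omega,
# the running mean M_n is an ordinary numerical sequence.
for omega in range(3):
    flips = rng.integers(0, 2, size=N)             # one realization
    M = np.cumsum(flips) / np.arange(1, N + 1)     # M_1, M_2, ..., M_N
    print(f"outcome {omega}:",
          "  ".join(f"M_{n}={M[n - 1]:.4f}" for n in (100, 1_000, 10_000, 100_000)))
```

Convergence with probability one asserts that the set of outcomes for which this settling toward the true mean fails has probability zero.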
It does take some conceptual effort, and we will not discuss it any further at this point. Let me just say that this is a stronger notion of convergence: if you have convergence with probability one, you also get convergence in probability. And it turns out that the law of large numbers also holds under this stronger notion of convergence. That is, the sample mean converges to the true mean with probability one. This is the so-called strong law of large numbers, and because this is a stronger, more demanding notion of convergence, that's why this is called the strong law.

Incidentally, at this point you might be quite uncertain and confused as to what the real difference is between these two notions of convergence. The definitions do look different, but what is the real difference? This is quite subtle, and it does take quite a bit of thinking. It's not supposed to be something that is obvious. So the purpose of this discussion is only to point out these further directions, without, at this point, going into them in any depth.

Finally, there is another notion of convergence in which we look at the distributions of the random variables involved. We may have a sequence of random variables, each of which has a certain distribution described by a CDF, and we can ask the question: does this sequence of CDFs converge to a limiting CDF? If that happens, then we say that we have convergence in distribution. This is more or less the type of convergence that shows up when we deal with the central limit theorem, because the central limit theorem is really a statement about distributions -- the distribution of the sample mean, in some sense, starts to approach the distribution of a normal random variable.
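Convergence in distribution can likewise be checked numerically. The following sketch (with illustrative, not canonical, parameters) standardizes the sample mean of n Bernoulli(1/2) variables and compares its empirical CDF at a few points against the standard normal CDF Phi, which is exactly the kind of CDF-to-CDF comparison this notion of convergence is about.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(2)
n, trials, p = 1_000, 100_000, 0.5   # illustrative parameters

# Standardize the sample mean: Z_n = (M_n - mu) / (sigma / sqrt(n)).
means = rng.binomial(n, p, size=trials) / n
z = (means - p) / (sqrt(p * (1 - p)) / sqrt(n))

def phi(x):                # standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  empirical CDF={np.mean(z <= x):.4f}  Phi(x)={phi(x):.4f}")
```

The two columns already agree to a couple of decimal places at n = 1000, which is the sense in which the distribution of the sample mean approaches a normal distribution.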