The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, we're ready for the second lecture today. We will start to get into a little technical material, which doesn't necessarily mean that it's more important. It just means that it's easier, because it deals with mathematics. I'm going to spend a little bit more time reviewing probability as you learned it before. I want to review it at a slightly more fundamental level than what you're used to seeing. You will understand why as we go on, because when we get into stochastic processes, we will find that there are lots of very peculiar things that happen. And when peculiar things happen, the only thing you can do is go back to basic ideas. And if you don't understand what those basic ideas are, then you're in real trouble. So we'll start out by talking about expectations just a little bit.
The distribution function of a random variable often says more than you're really interested in. In other words, a distribution function is a whole function mapping the real numbers into [0, 1], and that's a very complicated thing in general. The expectation is just one simple number, and with that one simple number you get an idea of what that random variable is all about, whether it's big or it's little or what have you.

There are a bunch of formulas that you're familiar with for finding the expectation. If you have a discrete random variable, the usual formula is that you take all of the possible sample values, multiply each of them by the probability of that sample value, and sum it up. This is what you learned right at the beginning of taking probability. If you've never taken probability, you learned it in statistics classes just as something you don't know where it comes from. But it's there.

If you have a continuous random variable (a continuous random variable is one that has a density), you can find the expectation there.
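The two familiar formulas just mentioned, the weighted sum for a discrete random variable and the density integral for a continuous one, can be sketched numerically. This is a small illustration of my own, not part of the lecture; the PMF and the exponential density below are just convenient examples.

```python
import math

# Discrete case: E[X] = sum over sample values a of a * P(X = a).
pmf = {1: 0.25, 2: 0.5, 4: 0.25}       # a hypothetical PMF
mean_discrete = sum(a * p for a, p in pmf.items())
print(mean_discrete)                    # prints 2.25

# Continuous case: E[X] = integral of x * f(x) dx.  Here f is an
# exponential density f(x) = lam * exp(-lam * x) with rate lam = 2,
# whose exact mean is 1/lam = 0.5.  Crude Riemann sum:
lam = 2.0
dx = 1e-4
mean_continuous = sum(x * lam * math.exp(-lam * x) * dx
                      for x in (i * dx for i in range(1, 200_000)))
print(round(mean_continuous, 3))        # prints 0.5
```

The Riemann sum is deliberately naive; it is only meant to show that both formulas compute the same kind of object, a probability-weighted average of sample values.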
If you have an arbitrary random variable and it's not negative, then there's this peculiar formula here, which I will point to. I think I'll point to it. Ah yes, I will point to it. This formula down here, you might or might not have seen. And I hope by the end of this course, you'll realize that it's, one, more fundamental and, two, probably more useful than either of these two. And then there's a final formula; I'll tell you a little bit about that one when we get to it.

OK, the formula for the expected value in terms of the integral of the complementary distribution function: E[X] is the integral from 0 to infinity of Pr{X > x} dx. There's a picture here which shows you how it corresponds to the usual thing you're used to for a discrete random variable. Namely, what you're doing is integrating this complementary distribution function, which is the probability that the random variable X is greater than any particular x along the axis here. So you integrate this function along here. And according to what I'm trying to convince you of, just integrating that function gives you the expected value.
And the reason is that this top little square here is a1 times the probability that X is equal to a1. The next one is a2 times the probability that it's equal to a2, and so forth on down. And you can obviously generalize this to any discrete random variable which is non-negative. And I'm just talking about non-negative random variables for the moment.

If X has a density, the same argument applies to any Riemann sum for that integral. You can take integrals and break them up into little slices. If you break them up into little slices, you can represent it in this way. And presto, again, you get that this integral is equal to the expectation. And if you have any other random variable at all, you can always represent it in terms of this Riemann sum.

Now, why is it even more powerful than that? Well, it's more powerful than that because if you took measure theory (which most of you presumably have not taken yet, and many of you might never take) you will find out that this is really the fundamental definition after all.
And integration, when you look at it in measure-theoretic terms, instead of taking little slices that go this way, you wind up taking little slices that go that way. So anyway, this is the fundamental definition of expectation.

If you're worried about whether expectations exist or not, why is this much nicer? Because what you're integrating here is simply a function which is monotonically decreasing with x. In other words, if you try to integrate it by integrating this function out to some largest value and then chopping it off there, what you get is some number. If you extend this chopping-off point out, what you get is a number which keeps increasing. What can happen? As you take a number which is increasing, you either get to infinity or you get to some finite limit. Nothing else can happen. So there aren't any limiting problems here. And when you take expectations in other ways, there are always questions that you have to ask, and they're often serious. So this is just a much nicer way of doing it. Anyway, that's the way we're going to do it. And so now we go on.
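The tail-integral formula discussed above can be checked numerically for a non-negative discrete random variable: integrating the complementary distribution function Pr{X > x} reproduces the weighted-sum expectation. The PMF here is a made-up example of mine, not one from the lecture.

```python
# Sketch: E[X] = integral from 0 to infinity of Pr{X > x} dx
# for a non-negative random variable, compared with the usual
# weighted sum over sample values.
pmf = {0.5: 0.25, 1.0: 0.25, 2.5: 0.5}     # hypothetical sample values

mean_by_sum = sum(a * p for a, p in pmf.items())

def complementary_cdf(x):
    """Pr{X > x} for the PMF above."""
    return sum(p for a, p in pmf.items() if a > x)

# Riemann sum of the complementary CDF over [0, 2.5]; beyond the
# largest sample value the integrand is identically zero.
dx = 1e-4
mean_by_tail = sum(complementary_cdf(i * dx) * dx for i in range(25_000))

print(mean_by_sum, round(mean_by_tail, 3))  # prints 1.625 1.625
```

Each horizontal strip of the staircase Pr{X > x} contributes one term a * Pr{X = a}, which is exactly the picture described in the lecture.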
Oh, I should mention where the other formula comes from, this formula back here. You get that by representing X as the positive part of X minus the negative part of X. And if you want to see how to do that exactly, it's in the notes, where it talks about first this and then this. So you just put the two together, and then you get an expected value.

A word about notation here, and there's nothing I can do about this. It's an unfortunate thing. When somebody says that the expected value of a random variable exists, what do they mean? Any engineer would try to integrate it, and might get something which is undefined, because it's infinite going this way and minus infinite going that way, and there's no way to put the two together. If you get infinity going this way and something finite going that way, like with a non-negative random variable, it's kind of silly to say the expectation doesn't exist, because really what's happening is that the expectation is infinite.
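The positive-part/negative-part decomposition just mentioned can be sketched as follows. Writing X+ = max(X, 0) and X- = max(-X, 0), both are non-negative random variables, X = X+ - X-, and the expectation is E[X+] - E[X-]. The PMF below is a toy example of my own with some negative sample values, not one from the notes.

```python
# Sketch of E[X] = E[X+] - E[X-], where X+ = max(X, 0) and
# X- = max(-X, 0) are both non-negative random variables.
pmf = {-2.0: 0.25, -0.5: 0.25, 1.0: 0.25, 3.0: 0.25}  # hypothetical PMF

mean_direct = sum(a * p for a, p in pmf.items())
mean_pos = sum(max(a, 0.0) * p for a, p in pmf.items())   # E[X+]
mean_neg = sum(max(-a, 0.0) * p for a, p in pmf.items())  # E[X-]

print(mean_direct, mean_pos - mean_neg)   # prints 0.375 0.375
```

Since each part is non-negative, the tail-integral formula applies to each separately; the expectation is then defined exactly when at least one of E[X+], E[X-] is finite, which is the point the lecture is about to make.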
Now mathematicians, and everybody who writes books, everybody who writes papers, everybody, I think, defines the expected value as existing only if it's finite. In other words, what you're doing is taking this integral over the set of real values, and you don't allow plus infinity or minus infinity. So you say that the expectation does not exist if in fact it's infinite, or it's minus infinity, or it's undefined completely. You say it's undefined in all of those cases. And that's just a convention that everybody lives by. So the other way of saying this is: if the expected value of the magnitude of the random variable is infinite, then the expectation doesn't exist. So we will try to say it that way sometimes when it's really important.

OK, let's go on to indicator random variables. You're probably familiar with these. For every event you can think of (an event is something which is true, which occurs when some set of the sample points occurs) and is not true otherwise.
So the definition of an indicator random variable is that the indicator for an event A, as a function of the sample space, is equal to 1 if omega is in the event A, and 0 otherwise. So if you draw its distribution function, the distribution function of the indicator is 0 up until the point 0. Then it jumps up to 1 minus the probability of A. At 1, it jumps all the way up to 1. So it's simply a binary random variable.

So every event has an indicator random variable, and every indicator random variable is a binary random variable. So indicator random variables are very simple. Events are very simple because you can map any event into an indicator random variable [INAUDIBLE]. And this also says that, since we want to talk about events very often, binary random variables are particularly important in this field.

OK, but what this really says now is that any theorem about random variables can be applied to events. This is one of the few examples I know of where it's much harder to find the expectation by taking the complementary distribution function and integrating it. It's not hard.
But it's far easier to take the probability that the indicator random variable is 0, which is 1 minus the probability of A, and the probability that it's equal to 1, which is the probability of A, and take the expectation, which is the probability of A, and the standard deviation, which is the square root of the probability of A times 1 minus the probability of A. So indicator random variables are sort of trivial things in a way.

OK, let's go on to multiple random variables. Now here's something that's a trick question in a way, but it's a very important trick question. Is a random variable specified by its distribution function? We've already seen that it's not really specified by its density or by its probability mass function. But we've said a distribution function is a more general thing, so that every random variable has a distribution function. Does the distribution function specify the random variable? No. That's the whole reason for what Kolmogorov did back in 1933. Or at least it was one of the main reasons for what he was doing.
He wanted to straighten out this ambiguity, which runs through the field, about confusing random variables with their distribution functions. Random variables are functions from the sample space to the real numbers, and they're not anything else. So if you want to really define a random variable, you not only have to know what that random variable is, but you also have to know what its relationships are. It's like trying to understand a person: you can't understand the person without understanding something about who they know, how they know them, all those other things. All those relationships are important. And it's the same with random variables; you have to know about all the relationships. Many problems you can solve just in terms of distribution functions. But ultimately, in many cases, you have to deal with these joint distribution functions. Now, when are random variables independent?
They are independent if the joint distribution function is equal to the product of the individual distribution functions, evaluated at all x1 through xn. And that same product form carries over for density functions and for probability mass functions.

OK, if you have discrete random variables, the idea of independence is a whole lot more intuitive if you express it in terms of conditional probabilities: the conditional probability that the random variable X takes on some sample value x, given that the random variable Y takes on a sample value y.

Just as one side comment here: when you're doing problems, you will very often want to leave out the subscripts saying which random variables you're dealing with. And you will use either capital or small letters, mixing up the argument and the function itself, which everybody does. And it's perfectly all right. I suggest that you try not to do it for a while, because you can get so confused doing this, not being able to sort out what's a random variable and what's a real number. A lot of wags say random variables are neither random, because they're functions on the sample space, nor are they variables. And both of those are true.
That's immaterial here. It's just that when you start getting confused about a problem, it's important to sort out which things are random variables and which things are arguments. Now, this conditional probability is something you're all familiar with. X and Y are independent, then, if the probability of x conditional on y is the same as the probability of x not conditioned on y. In other words, observing what Y is doesn't tell you anything about what X is. That's really your intuitive definition of independence. It's what you use if you're dealing with some real-world situation and you're asking, what does this have to do with that? And if this has nothing to do with that, if the random variables over here have nothing to do with the random variables over there, you would say in the real world that these things are independent of each other. When you have a probability model, you say they're statistically independent of each other.

OK, so that's the relationship between the real world and the models that we're dealing with all the time. We call it independence in both cases.
But it means somewhat different things in the two situations.

OK, next, about IID random variables. What are they? Well, the joint distribution function has to be equal to the product of the individual distribution functions. You notice I've done something funny here, which is a convention I always use, and a lot of people use it. If you have a bunch of independent random variables that all have the same distribution function, it gets confusing to refer to their distribution functions as the distribution function of the random variable X1, the distribution function of the random variable X2, and so on. It's nicer to just take a generic random variable X, which has the same distribution as all of these, and express this in terms of that generic X. You have the same product form for probability mass functions and for density functions, so this works throughout.

OK, next, think about a probability model in which R, the set of real numbers, is the sample space, and X is some random variable on that sample space.
Namely, X then is a function from the real numbers to the real numbers. The interesting thing here, and what I'm saying here is obvious. It's something that you all know, something that you've all been using even before you started to learn about probability theory. But at the same time, you probably never thought about it in a serious enough way that you would really make sense out of it.

You can always create an extended probability model in which the Cartesian space R to the n (in other words, the space of n real numbers) is the sample space, and X1 to Xn are independent, identically distributed random variables. This is not obvious, and it's something you have to prove. But it's not hard to prove. All you have to do is start out with a probability model for one random variable, then define all products to be what they're supposed to be, and go from the products to all unions and all intersections. We're just going to assume that that's true, because we have to assume it's true if we don't want to use any measure theory here.
This is one of the easier things to show using measure theory. But it's something you've always been used to. When you think of a random experiment, when you think of playing dice with somebody or playing cards with someone, from the very beginning, when you started to talk about odds or anything of that sort, you have always had the idea that this is a game which you can play repeatedly. And each time you play it, it's the same game. The outcome is different, but all the probabilities are exactly the same. What this says is that you can always make a probability model that way. You can always make a probability model which corresponds to what you've always believed deep in your hearts all your lives. And fortunately, that's true. Otherwise, you wouldn't use probability.

OK, so let's move on from that. This page of philosophy, I will stop doing that pretty soon. But I have to get across what the relationship is between the real world and these models that we're dealing with.
Because otherwise, you as engineers or business people or financial analysts or whatever the heck you're going to become will start believing in your probability models. And you will cause untold damage by losing track of the fact that these are supposed to be models of something. And you'd better think about what they're supposed to be models of.

OK, in order to do that, we're going to study the sample average, namely the sum of n random variables divided by n. That's the way you take sample averages: you add them all up and you divide by n. The law of large numbers, which we're going to talk about very soon, says that S_n over n essentially becomes deterministic as n becomes very large. What we mean by that (and most of you have seen it in various ways) we will review later today. Well, we'll do it today and on Wednesday. And there's a big question about what becoming deterministic means. But there is an essential idea there. The extended model, namely when you have one random variable, you create a very large number of them.
If it corresponds to repeated experiments in the real world, then S_n over n corresponds to the arithmetic average in the real world. In the real world, you do take arithmetic averages. Whenever you open up a newspaper, somebody is taking an arithmetic average of something and saying, gee, this is significant; this shows what's going on someplace.

Models can have two types of difficulties. This paragraph is a little different from what I wrote in the handout, because I realized what I wrote in the handout didn't make a whole lot of sense. OK, the two types of difficulties you have with models, especially when you're trying to model things by IID random variables. In one, a sequence of real-world experiments is not sufficiently similar and isolated from each other to correspond to the IID extended model. In other words, you want to model things so that on each trial of the experiment, you do the same thing but get a potentially different answer. Sometimes you rig things, without trying to do so, in such a way that these experiments are not independent of each other and in fact are very, very heavily biased.
You find people taking risk models in the financial world where they take all sorts of these things and say, oh, all right, these things are all independent of each other; they look independent. And then suddenly a scare comes along, and everybody sells simultaneously. And you find out that all these random variables were not independent at all. They were very closely related to each other, but in a way you never saw before.

OK, the other way that these models can fail to be valid is that the IID extension is OK, but the basic model is not right. In other words, you model a coin as coming up heads with probability 1/2, and somebody has put in a loaded coin, so the probability that it comes up heads is 0.45 and the probability that it comes up tails is 0.55. And you might guess that this person always bets on tails and tries to get you to bet on heads. So in that case, the basic model that you're using is not OK. So you have both of these kinds of problems, and you should try to keep them straight.
434 00:24:07,190 --> 00:24:09,620 But we'll learn about these problems through 435 00:24:09,620 --> 00:24:10,750 study of the models. 436 00:24:10,750 --> 00:24:13,950 Namely, we're not going to go through an enormous amount of 437 00:24:13,950 --> 00:24:19,834 study on how you can bias a coin or things of this sort. 438 00:24:19,834 --> 00:24:23,200 OK, science, symmetry, analogies, earlier models, all 439 00:24:23,200 --> 00:24:27,830 of these are used to model real world situations. 440 00:24:27,830 --> 00:24:31,490 Let me again talk about an example I talked about a 441 00:24:31,490 --> 00:24:37,730 little bit last time because the model was so trivial that 442 00:24:37,730 --> 00:24:41,250 you probably understood everything about the model in 443 00:24:41,250 --> 00:24:42,110 the situation. 444 00:24:42,110 --> 00:24:46,060 But you didn't understand what it was illustrating. 445 00:24:46,060 --> 00:24:49,170 You have two dice. 446 00:24:49,170 --> 00:24:50,930 One of them is red. 447 00:24:50,930 --> 00:24:52,930 And one of them is white. 448 00:24:52,930 --> 00:24:54,225 You roll them. 449 00:24:54,225 --> 00:25:00,430 By symmetry, each one comes up to be 1, 2, 3, up to 6, each 450 00:25:00,430 --> 00:25:02,170 with equal probability. 451 00:25:02,170 --> 00:25:05,040 If you roll them with two hands or something, they're 452 00:25:05,040 --> 00:25:07,850 going to be independent of each other. 453 00:25:07,850 --> 00:25:11,910 And therefore, the probability of each pair of outcomes-- 454 00:25:11,910 --> 00:25:13,480 red is equal to i. 455 00:25:13,480 --> 00:25:15,100 White is equal to j. 456 00:25:15,100 --> 00:25:17,990 Probability of each one of those is going to be 1/36 457 00:25:17,990 --> 00:25:20,770 because 36 is the size of the sample space. 458 00:25:20,770 --> 00:25:23,580 Now you take two white dice. 459 00:25:23,580 --> 00:25:25,810 And you roll them. 460 00:25:25,810 --> 00:25:27,870 What's the sample space?
461 00:25:27,870 --> 00:25:30,420 Well, as far as the real world is concerned, you can't 462 00:25:30,420 --> 00:25:35,800 distinguish a red 1 and a white 2 from a red 463 00:25:35,800 --> 00:25:37,390 2 and a white 1. 464 00:25:37,390 --> 00:25:40,470 In other words, those two possibilities can't be 465 00:25:40,470 --> 00:25:41,730 distinguished. 466 00:25:41,730 --> 00:25:44,740 So you might say I want to use a sample space which 467 00:25:44,740 --> 00:25:46,736 corresponds to the-- 468 00:25:50,140 --> 00:25:53,880 what's the word I used here?-- finest grain possible outcome 469 00:25:53,880 --> 00:25:57,096 that you can observe. 470 00:25:57,096 --> 00:25:59,030 And who would do that? 471 00:25:59,030 --> 00:26:01,170 You'd be crazy to do that. 472 00:26:01,170 --> 00:26:04,270 I mean, you have a nice model of rolling dice where each 473 00:26:04,270 --> 00:26:07,510 outcome has probability 1/36. 474 00:26:07,510 --> 00:26:10,120 And you would replace that with something where the 475 00:26:10,120 --> 00:26:13,440 probability of a 1, 1 is 1/36. 476 00:26:13,440 --> 00:26:17,560 But the probability of a 1, 2 is 1/18 because you can get a 477 00:26:17,560 --> 00:26:20,190 1, 2 in two different ways. 478 00:26:20,190 --> 00:26:23,470 You get a 1, 2 in two different ways, which says you're really 479 00:26:23,470 --> 00:26:26,920 thinking about a red die and a white die. 480 00:26:26,920 --> 00:26:29,440 Otherwise, you wouldn't be able to say that. 481 00:26:29,440 --> 00:26:33,010 So the appropriate model here is certainly to think in terms 482 00:26:33,010 --> 00:26:35,050 of a red die and a white die. 483 00:26:35,050 --> 00:26:36,850 It's what everybody does. 484 00:26:36,850 --> 00:26:38,830 They just don't talk about it. 485 00:26:38,830 --> 00:26:43,200 OK, so the point that I'm trying to make here is that 486 00:26:43,200 --> 00:26:48,880 what you call a finest grain model is not at all clear.
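A small sketch (not part of the lecture) of the dice example just described: model a red die and a white die as ordered pairs, then collapse to the unordered outcomes you can actually observe with two white dice, and the probabilities stop being uniform.

```python
# Sketch of the dice example above (not from the lecture's slides):
# start from the ordered (red, white) model, then forget the colors.
from itertools import product
from fractions import Fraction

ordered = list(product(range(1, 7), repeat=2))   # 36 ordered (red, white) pairs
p_each = Fraction(1, len(ordered))               # 1/36 per ordered pair

unordered = {}
for red, white in ordered:
    key = tuple(sorted((red, white)))            # indistinguishable dice
    unordered[key] = unordered.get(key, 0) + p_each

print(unordered[(1, 1)])   # 1/36: only one ordered pair collapses here
print(unordered[(1, 2)])   # 1/18: two ordered pairs collapse to one outcome
```

The ordered model stays uniform; the "finest grain observable" model does not, which is why everybody quietly keeps the red/white model.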
487 00:26:48,880 --> 00:26:52,450 And if it's not at all clear in the case of dice, it sure 488 00:26:52,450 --> 00:26:55,740 as hell is not clear in most of the kinds of problems you 489 00:26:55,740 --> 00:26:57,040 want to deal with. 490 00:26:57,040 --> 00:27:02,180 So you need something considerably more than that. 491 00:27:02,180 --> 00:27:05,740 OK, so neither the axioms nor experimentation 492 00:27:05,740 --> 00:27:06,970 motivate this model. 493 00:27:06,970 --> 00:27:11,940 In other words, you really have to use common sense. 494 00:27:11,940 --> 00:27:14,610 You have to use judgment. 495 00:27:14,610 --> 00:27:16,480 And all of you have that. 496 00:27:16,480 --> 00:27:20,010 It's just that by learning all this mathematics, you 497 00:27:20,010 --> 00:27:22,530 eventually start to think that maybe you shouldn't use your 498 00:27:22,530 --> 00:27:24,170 common sense. 499 00:27:24,170 --> 00:27:27,830 So I have to keep saying that no, you keep on using your 500 00:27:27,830 --> 00:27:28,850 common sense. 501 00:27:28,850 --> 00:27:32,680 You want to learn what these models are about. 502 00:27:32,680 --> 00:27:34,475 You want to use your common sense also. 503 00:27:34,475 --> 00:27:39,130 And you've got to go back and forth between the two of them. 504 00:27:39,130 --> 00:27:42,410 OK, that's almost the end of our philosophy. 505 00:27:42,410 --> 00:27:43,520 I guess one more slide. 506 00:27:43,520 --> 00:27:45,810 I'm getting tired of this stuff. 507 00:27:45,810 --> 00:27:49,680 Comparing models for similar situations and analyzing 508 00:27:49,680 --> 00:27:52,770 limited and effective models helps a lot 509 00:27:52,770 --> 00:27:54,810 in clarifying fuzziness. 510 00:27:54,810 --> 00:27:57,260 But ultimately, as in all of science, some 511 00:27:57,260 --> 00:27:58,840 experimentation is needed. 512 00:27:58,840 --> 00:28:01,240 This is like any other branch of science. 513 00:28:01,240 --> 00:28:05,090 You need experimentation sometimes. 
514 00:28:05,090 --> 00:28:08,190 You don't want to do too much of it because you'd always be 515 00:28:08,190 --> 00:28:10,040 doing experiments. 516 00:28:10,040 --> 00:28:13,600 But the important thing is that the outcome of an 517 00:28:13,600 --> 00:28:17,140 experiment is a sample point. 518 00:28:17,140 --> 00:28:18,920 It's not a probability. 519 00:28:18,920 --> 00:28:19,725 You do an experiment. 520 00:28:19,725 --> 00:28:21,300 You get an outcome. 521 00:28:21,300 --> 00:28:24,050 And all you find is one sample point, if you do the 522 00:28:24,050 --> 00:28:25,670 experiment once. 523 00:28:25,670 --> 00:28:27,800 And there's nothing that lets you draw a 524 00:28:27,800 --> 00:28:29,790 probability from that. 525 00:28:29,790 --> 00:28:32,430 The only way you can get things that you would call 526 00:28:32,430 --> 00:28:36,370 probabilities is to use an extended model, hope the 527 00:28:36,370 --> 00:28:40,690 extended model corresponds to the physical situation, and 528 00:28:40,690 --> 00:28:43,970 deal with these law of large numbers kind of things. 529 00:28:43,970 --> 00:28:46,650 You don't necessarily need IID random variables. 530 00:28:46,650 --> 00:28:50,200 But you need something that you know about between a large 531 00:28:50,200 --> 00:28:54,060 number of random variables to get from an outcome to 532 00:28:54,060 --> 00:28:57,700 something you could reasonably call a probability. 533 00:28:57,700 --> 00:29:00,560 OK, so that's enough. 534 00:29:00,560 --> 00:29:03,200 Let's go on to the law of large numbers. 535 00:29:03,200 --> 00:29:06,270 Let's do it in pictures first. 536 00:29:06,270 --> 00:29:10,820 So you can lie back and relax for a minute or stop being 537 00:29:10,820 --> 00:29:14,080 bored by all this stuff. 
538 00:29:14,080 --> 00:29:17,670 What I've done here is to take the simplest random variable I 539 00:29:17,670 --> 00:29:20,760 can think of, which as you might guess is a 540 00:29:20,760 --> 00:29:22,230 binary random variable. 541 00:29:22,230 --> 00:29:24,360 It's either 0 or 1. 542 00:29:24,360 --> 00:29:28,050 Here it's 1 with probability 1/4 and 0 543 00:29:28,050 --> 00:29:30,260 with probability 3/4. 544 00:29:30,260 --> 00:29:32,290 I have actually calculated these things. 545 00:29:32,290 --> 00:29:38,490 The distribution function of x1 plus x2 plus x3 plus x4, 546 00:29:38,490 --> 00:29:45,390 this point down here, is the probability 547 00:29:45,390 --> 00:29:47,890 of all 0's, I guess. 548 00:29:47,890 --> 00:29:51,060 And then you get the probability of all 0's plus 1, 549 00:29:51,060 --> 00:29:54,210 1 and so forth. 550 00:29:54,210 --> 00:29:57,710 Here's where you take the sum of 20 random variables. 551 00:29:57,710 --> 00:30:01,010 And you're looking at the distribution function of the 552 00:30:01,010 --> 00:30:03,400 number of 1's that you get. 553 00:30:03,400 --> 00:30:05,410 And it comes out like this. 554 00:30:05,410 --> 00:30:07,090 Here you're looking at s50. 555 00:30:07,090 --> 00:30:09,650 You're adding up 50 random variables. 556 00:30:09,650 --> 00:30:14,110 And what's happening as far as the gross picture 557 00:30:14,110 --> 00:30:15,990 is concerned here? 558 00:30:15,990 --> 00:30:21,470 Well, the mean value of s sub n is the mean of a sum of 559 00:30:21,470 --> 00:30:22,940 random variables. 560 00:30:22,940 --> 00:30:26,820 And that's equal to n times a mean of a single random 561 00:30:26,820 --> 00:30:30,530 variable when you have identically distributed random 562 00:30:30,530 --> 00:30:33,600 variables or random variables that have the same mean. 563 00:30:33,600 --> 00:30:38,330 The variance is equal to n times sigma squared. 
564 00:30:38,330 --> 00:30:46,270 Namely, when you take the expected value of this 565 00:30:46,270 --> 00:30:51,010 quantity squared, all these cross terms are going to 566 00:30:51,010 --> 00:30:54,360 balance out with the mean when you do. 567 00:30:54,360 --> 00:30:57,600 I mean, all of you know how to find the variance of s sub n. 568 00:30:57,600 --> 00:30:59,560 I hope you know how to find that. 569 00:30:59,560 --> 00:31:03,710 And when you do that, it increases with n. 570 00:31:03,710 --> 00:31:06,060 And the mean increases with n. 571 00:31:06,060 --> 00:31:08,870 The standard deviation, which gives you a picture of how 572 00:31:08,870 --> 00:31:12,810 wide the distribution is, only goes up as the 573 00:31:12,810 --> 00:31:14,820 square root of n. 574 00:31:14,820 --> 00:31:21,010 This is really the essence of the weak law of large numbers. 575 00:31:21,010 --> 00:31:25,520 I mean, everything else is mathematical detail. 576 00:31:25,520 --> 00:31:32,740 And then if you go on beyond this and you talk about the 577 00:31:32,740 --> 00:31:37,260 sample average, namely the sum of these n random variables-- 578 00:31:37,260 --> 00:31:38,970 assume them IID again. 579 00:31:38,970 --> 00:31:42,090 In fact, assume for this picture that they're the same 580 00:31:42,090 --> 00:31:44,230 binary random variables. 581 00:31:44,230 --> 00:31:46,130 You look at the sample average. 582 00:31:46,130 --> 00:31:48,890 You find the mean of the sample average. 583 00:31:48,890 --> 00:31:52,400 And it's the mean of a single random variable. 584 00:31:52,400 --> 00:31:54,460 You find the variance of it. 585 00:31:54,460 --> 00:31:57,960 Because of this n here and the squaring that you're doing, 586 00:31:57,960 --> 00:32:01,990 the variance of the sum divided by n, the sigma 587 00:32:01,990 --> 00:32:07,570 squared divided by n, what happens as n gets large? 588 00:32:07,570 --> 00:32:10,080 This variance goes to 0. 
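The scaling argument just made can be sketched numerically, using the lecture's binary random variable with P[X = 1] = 1/4 (the specific n values below are just the ones from the plots):

```python
# Sketch (not from the lecture's slides): the mean of S_n grows like n,
# its standard deviation only like sqrt(n), and Var(S_n / n) shrinks to 0.
import math

p = 0.25
mean_x = p               # x-bar, the mean of one binary random variable
var_x = p * (1 - p)      # sigma^2 = p*q for a binary random variable

for n in [4, 20, 50]:
    mean_sn = n * mean_x              # E[S_n] = n * x-bar
    std_sn = math.sqrt(n * var_x)     # standard deviation of S_n
    var_avg = var_x / n               # Var(S_n / n) = sigma^2 / n
    print(n, mean_sn, round(std_sn, 3), round(var_avg, 5))
```
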
589 00:32:10,080 --> 00:32:13,830 What happens when you have a random variable, a sequence of 590 00:32:13,830 --> 00:32:17,850 random variables, all of which have the same mean and whose 591 00:32:17,850 --> 00:32:21,020 standard deviation is going to 0? 592 00:32:21,020 --> 00:32:24,550 Well, you might play around with a lot of funny kinds of 593 00:32:24,550 --> 00:32:27,310 things that you might think of as happening. 594 00:32:27,310 --> 00:32:32,810 But essentially what's going on here is the nice feature 595 00:32:32,810 --> 00:32:36,360 that when you add all these things up, the distribution 596 00:32:36,360 --> 00:32:42,250 function gets scrunched down into a unit step. 597 00:32:42,250 --> 00:32:45,650 In other words, since the standard deviation is going to 598 00:32:45,650 --> 00:32:49,810 0, the sequence of random variables-- 599 00:32:49,810 --> 00:32:52,260 since they all have the same mean-- 600 00:32:52,260 --> 00:32:55,690 they all have smaller and smaller standard deviations. 601 00:32:55,690 --> 00:32:59,500 The only way you can do that is to scrunch them down into a 602 00:32:59,500 --> 00:33:04,210 limiting random variable, which is deterministic. 603 00:33:04,210 --> 00:33:07,460 And you can see that happening here. 604 00:33:07,460 --> 00:33:12,670 Namely the largest value is the black thing, which is 605 00:33:12,670 --> 00:33:14,540 getting smaller and smaller. 606 00:33:14,540 --> 00:33:17,250 And the left side is going that way. 607 00:33:17,250 --> 00:33:19,830 On the right side, it's going that way. 608 00:33:19,830 --> 00:33:24,010 So it looks like it's approaching a unit step. 609 00:33:24,010 --> 00:33:25,630 That has to be proven. 610 00:33:25,630 --> 00:33:26,910 And there's a simple proof of it. 611 00:33:26,910 --> 00:33:27,830 And we'll see that. 612 00:33:27,830 --> 00:33:29,720 And you've all seen that before. 613 00:33:29,720 --> 00:33:32,413 And you've all probably said, ho-hum. 
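The "scrunching down into a unit step" can be checked directly: evaluate the distribution function of the sample average a fixed distance below and above the mean p = 1/4 and watch the values head to 0 and 1. (This is a numerical sketch, not the lecture's proof; the offset 0.1 and the n values are arbitrary choices.)

```python
# Sketch: the CDF of S_n / n approaching a unit step at p.
from math import comb

def binom_cdf(k, n, p):
    """P[S_n <= k] for S_n a sum of n IID binary random variables."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

p = 0.25
for n in [4, 20, 50, 400]:
    below = binom_cdf(int((p - 0.1) * n), n, p)   # CDF of S_n/n near p - 0.1
    above = binom_cdf(int((p + 0.1) * n), n, p)   # CDF of S_n/n near p + 0.1
    print(n, round(below, 6), round(above, 6))
```
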
614 00:33:32,413 --> 00:33:35,050 But that's the way it is. 615 00:33:35,050 --> 00:33:39,000 Now the next thing to look at for this same set of random 616 00:33:39,000 --> 00:33:43,610 variables, the same sum, is you look at the normalized 617 00:33:43,610 --> 00:33:49,470 sum, namely sn minus n times the mean. 618 00:33:49,470 --> 00:33:53,730 And you divide that by the square root of n times sigma. 619 00:33:53,730 --> 00:33:55,520 And what do you get? 620 00:33:55,520 --> 00:33:57,820 Well, every one of these random variables-- 621 00:33:57,820 --> 00:34:03,190 for every n-- has mean 0 because the mean of sn 622 00:34:03,190 --> 00:34:04,570 is n times x bar. 623 00:34:04,570 --> 00:34:08,429 So you're subtracting off the mean essentially. 624 00:34:08,429 --> 00:34:11,960 And every one of them has variance 1. 625 00:34:11,960 --> 00:34:16,590 So you've got a whole sequence of random variables, which are 626 00:34:16,590 --> 00:34:20,469 just sticking there at the same mean, 0, 627 00:34:20,469 --> 00:34:23,040 and at the same variance. 628 00:34:23,040 --> 00:34:26,110 What's extraordinary when you do that, and you can sort of 629 00:34:26,110 --> 00:34:30,949 see this happening a little bit, this curve looks like 630 00:34:30,949 --> 00:34:35,389 it's going into a fixed curve, which starts 631 00:34:35,389 --> 00:34:39,250 out sticking to 0. 632 00:34:39,250 --> 00:34:41,270 And then it gradually comes up. 633 00:34:41,270 --> 00:34:43,600 And it looks fairly smooth. 634 00:34:43,600 --> 00:34:45,946 It goes off this way. 635 00:34:45,946 --> 00:34:54,219 And if you read a lot about this or if you think that all 636 00:34:54,219 --> 00:34:58,640 respectable random variables are Gaussian random variables, 637 00:34:58,640 --> 00:35:02,020 and I hope at the end of this course you will realize that 638 00:35:02,020 --> 00:35:04,780 only most respectable random variables are 639 00:35:04,780 --> 00:35:06,660 Gaussian random variables.
640 00:35:06,660 --> 00:35:08,575 There are many very interesting random variables 641 00:35:08,575 --> 00:35:09,650 that aren't. 642 00:35:09,650 --> 00:35:13,520 But what the central limit theorem says is that as you 643 00:35:13,520 --> 00:35:17,040 add up more and more random variables and you look at this 644 00:35:17,040 --> 00:35:22,850 normalized sum here, what you get is in fact the normal 645 00:35:22,850 --> 00:35:26,700 distribution, which is this strange integral here, of e 646 00:35:26,700 --> 00:35:30,470 to the minus x squared over 2, dx. 647 00:35:30,470 --> 00:35:35,900 Now what I want to do with the rest of our time is to show 648 00:35:35,900 --> 00:35:40,490 you why in fact that happens. 649 00:35:40,490 --> 00:35:42,420 I've never seen this proof of the central 650 00:35:42,420 --> 00:35:44,480 limit theorem before. 651 00:35:44,480 --> 00:35:47,040 I'm sure that some people have done it. 652 00:35:47,040 --> 00:35:52,230 I'm only going to do it for the case of a binomial 653 00:35:52,230 --> 00:35:55,780 distribution, which is the only place where this works. 654 00:35:55,780 --> 00:36:01,280 But I think in doing this you will see why in fact that 655 00:36:01,280 --> 00:36:04,990 strange e to the minus x squared over 2 comes up. 656 00:36:04,990 --> 00:36:08,500 It sure is not obvious by looking at this problem. 657 00:36:08,500 --> 00:36:11,370 OK, so that's what we're going to do. 658 00:36:11,370 --> 00:36:15,840 And I'm hoping that after you see this, you will in fact 659 00:36:15,840 --> 00:36:19,920 understand why the central limit theorem is true as well 660 00:36:19,920 --> 00:36:23,440 as knowing that it's true. 661 00:36:23,440 --> 00:36:27,250 OK, so let's look at the Bernoulli process. 662 00:36:27,250 --> 00:36:32,200 You have a sequence of binary random variables, each of them 663 00:36:32,200 --> 00:36:37,220 is IID, each of them is 1 with probability p.
664 00:36:37,220 --> 00:36:41,780 And a 0 with probability q equals 1 minus p. 665 00:36:41,780 --> 00:36:42,770 You add them all up. 666 00:36:42,770 --> 00:36:44,280 They're IID. 667 00:36:44,280 --> 00:36:49,890 And the question is, what does the distribution of 668 00:36:49,890 --> 00:36:51,860 the sum look like? 669 00:36:51,860 --> 00:36:55,620 Well, it has a nice formula to it. 670 00:36:55,620 --> 00:36:57,220 It's that formula down there. 671 00:36:57,220 --> 00:36:59,990 You've probably seen that formula before. 672 00:36:59,990 --> 00:37:01,950 Let's get some idea of where it comes 673 00:37:01,950 --> 00:37:04,670 from and what it means. 674 00:37:04,670 --> 00:37:10,240 Each n tuple that starts with k 1's and then ends with n 675 00:37:10,240 --> 00:37:15,870 minus k 0's, each one of those has the same probability. 676 00:37:15,870 --> 00:37:19,320 And it's p to the k times q to the n minus k. 677 00:37:19,320 --> 00:37:22,030 In other words, the probability you get a 1 on the 678 00:37:22,030 --> 00:37:24,030 first toss is p. 679 00:37:24,030 --> 00:37:28,740 The probability you get a 1 on the second toss also, since 680 00:37:28,740 --> 00:37:31,710 those are independent, probability you get two 1's in 681 00:37:31,710 --> 00:37:33,210 a row is p squared. 682 00:37:33,210 --> 00:37:36,460 Probability you get three 1's in a row is p cubed and 683 00:37:36,460 --> 00:37:38,340 so forth up to k. 684 00:37:38,340 --> 00:37:42,280 Because we're looking at the probability that the first k 685 00:37:42,280 --> 00:37:45,300 outputs are 1, so the probability of 686 00:37:45,300 --> 00:37:47,630 that is p to the k. 687 00:37:47,630 --> 00:37:48,940 That's this term. 688 00:37:48,940 --> 00:37:53,690 And the probability that the rest of them are all 0's is q 689 00:37:53,690 --> 00:37:54,940 to the n minus k.
690 00:37:58,210 --> 00:38:01,000 And this is sometimes confusing to you because you 691 00:38:01,000 --> 00:38:05,350 often think that this is going to be maximized when k is 692 00:38:05,350 --> 00:38:06,610 equal to p times n. 693 00:38:06,610 --> 00:38:09,880 You have some strange view of the law of large numbers. 694 00:38:09,880 --> 00:38:12,020 Well no, this quantity-- 695 00:38:12,020 --> 00:38:16,320 if p is less than 1/2, it's going to be largest at k 696 00:38:16,320 --> 00:38:17,280 equals zero. 697 00:38:17,280 --> 00:38:21,970 The most probable single outcome from n tosses of this 698 00:38:21,970 --> 00:38:26,830 coin, and it's a biased coin, does not have more 1's 699 00:38:26,830 --> 00:38:28,080 than 0's. 700 00:38:31,400 --> 00:38:34,250 0's are more probable than 1's. 701 00:38:34,250 --> 00:38:38,580 The most probable output is all 0's. 702 00:38:38,580 --> 00:38:42,130 Very improbable, but that's the most probable of all these 703 00:38:42,130 --> 00:38:43,570 improbable things. 704 00:38:43,570 --> 00:38:50,830 But as you probably know already, there are n choose k 705 00:38:50,830 --> 00:38:56,470 different n tuples, all of which have k 1's in them and n 706 00:38:56,470 --> 00:38:58,340 minus k 0's. 707 00:38:58,340 --> 00:39:01,650 If you don't know that, I didn't even 708 00:39:01,650 --> 00:39:02,900 put that in the text. 709 00:39:02,900 --> 00:39:04,430 I put most things there. 710 00:39:04,430 --> 00:39:08,500 This is one of those basic combinatorial facts. 711 00:39:08,500 --> 00:39:10,620 Look it up in Wikipedia. 712 00:39:10,620 --> 00:39:13,290 You'll probably get a cleaner explanation of it there than 713 00:39:13,290 --> 00:39:14,000 anywhere else. 714 00:39:14,000 --> 00:39:17,790 But look it up in any elementary probability book or 715 00:39:17,790 --> 00:39:20,780 in any elementary combinatorics book. 716 00:39:20,780 --> 00:39:23,440 I'm sure that all of you have seen this stuff.
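The combinatorial fact just cited can be checked by brute force for a small n (n = 8 here, an arbitrary choice small enough to enumerate all 2^n tuples; this check is not in the lecture):

```python
# Sketch: the number of binary n-tuples with exactly k 1's is n choose k.
from itertools import product
from math import comb

n = 8
for k in range(n + 1):
    count = sum(1 for t in product([0, 1], repeat=n) if sum(t) == k)
    assert count == comb(n, k)
print("all", n + 1, "counts match n choose k")
```
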
717 00:39:23,440 --> 00:39:29,020 So when you put this together, the probability that the sum 718 00:39:29,020 --> 00:39:33,250 of n random variables, all of which are binary, the 719 00:39:33,250 --> 00:39:38,730 probability of getting k 1's is n choose k times p to the k 720 00:39:38,730 --> 00:39:40,720 times q to the n minus k. 721 00:39:40,720 --> 00:39:44,390 Now you look at that. 722 00:39:44,390 --> 00:39:49,680 And if k is 1,000 and if n is 1,000, I mean, your eyes 723 00:39:49,680 --> 00:39:52,050 boggle because you can't imagine what that 724 00:39:52,050 --> 00:39:53,800 number looks like. 725 00:39:53,800 --> 00:39:56,860 So we want to find out what it looks like. 726 00:39:56,860 --> 00:39:58,460 And here's a tricky way of doing it. 727 00:40:01,730 --> 00:40:05,550 What we want to do is to see how this varies with k. 728 00:40:05,550 --> 00:40:08,820 And in particular, we want to see how it varies with k when 729 00:40:08,820 --> 00:40:15,130 n is very large and when k is relatively close to p times n. 730 00:40:15,130 --> 00:40:18,070 So what we're going to do is take the ratio of the 731 00:40:18,070 --> 00:40:23,470 probability of k plus 1 1's to the probability of k 1's. 732 00:40:23,470 --> 00:40:24,220 And what is that? 733 00:40:24,220 --> 00:40:26,780 I've written it out. 734 00:40:26,780 --> 00:40:30,730 n choose k, n choose k plus 1-- 735 00:40:30,730 --> 00:40:32,400 which is this term here-- 736 00:40:32,400 --> 00:40:37,710 is n factorial divided by k plus 1 factorial times n minus 737 00:40:37,710 --> 00:40:40,420 k minus 1 factorial. 738 00:40:40,420 --> 00:40:42,220 This term here-- 739 00:40:42,220 --> 00:40:46,050 you put the n factorial down on the bottom, k factorial 740 00:40:46,050 --> 00:40:51,550 times n minus k quantity factorial. 741 00:40:51,550 --> 00:40:55,770 And then you take the p's and the q's.
742 00:40:55,770 --> 00:40:59,540 For this term here you have p to the k plus 1 q to the n 743 00:40:59,540 --> 00:41:01,160 minus k minus 1. 744 00:41:01,160 --> 00:41:04,620 And for this one here you have p to the k times q 745 00:41:04,620 --> 00:41:06,190 to the n minus k. 746 00:41:06,190 --> 00:41:10,120 All that stuff cancels out, which is really cute. 747 00:41:10,120 --> 00:41:13,050 When you look at this term you have p to the k plus 1 748 00:41:13,050 --> 00:41:13,890 over p to the k. 749 00:41:13,890 --> 00:41:15,400 That's just p. 750 00:41:15,400 --> 00:41:19,520 And here you have q to the n minus k minus 1 over q 751 00:41:19,520 --> 00:41:21,090 to the n minus k. 752 00:41:21,090 --> 00:41:23,520 That's just q in the denominator. 753 00:41:23,520 --> 00:41:26,960 So this goes into p over q. 754 00:41:26,960 --> 00:41:30,790 This quantity here is almost a simple n factorial over n 755 00:41:30,790 --> 00:41:32,680 factorial is 1. 756 00:41:32,680 --> 00:41:37,700 k plus 1 factorial divided by k factorial is k plus 1. 757 00:41:37,700 --> 00:41:39,970 That's this term here. 758 00:41:39,970 --> 00:41:46,270 And the n minus k factorial over n minus k minus 1 factorial is n minus k. 759 00:41:46,270 --> 00:41:48,930 So this ratio here is just that very 760 00:41:48,930 --> 00:41:52,289 simple expression there. 761 00:41:52,289 --> 00:42:03,270 Now this ratio is strictly decreasing in k. 762 00:42:03,270 --> 00:42:04,160 How do I see that? 763 00:42:04,160 --> 00:42:07,570 Well, as k gets bigger and bigger, what happens? 764 00:42:07,570 --> 00:42:11,680 As k gets bigger, the numerator gets larger. 765 00:42:11,680 --> 00:42:14,680 The denominator-- 766 00:42:14,680 --> 00:42:19,560 excuse me, as k gets larger, the numerator gets smaller. 767 00:42:19,560 --> 00:42:22,180 The denominator gets larger. 768 00:42:22,180 --> 00:42:28,030 So the ratio of the two gets smaller.
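The cancellation just described can be verified numerically (a sketch, not in the lecture; n = 20 and p = 1/4 are taken from the running example):

```python
# Sketch: pmf(k+1) / pmf(k) collapses to (n - k)/(k + 1) * p/q for every k.
from math import comb, isclose

n, p = 20, 0.25
q = 1 - p
pmf = lambda k: comb(n, k) * p**k * q**(n - k)

for k in range(n):
    assert isclose(pmf(k + 1) / pmf(k), (n - k) / (k + 1) * (p / q))
print("ratio simplification verified for n =", n)
```
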
769 00:42:28,030 --> 00:42:32,710 So this whole quantity here, as k gets larger and larger 770 00:42:32,710 --> 00:42:36,810 for fixed n, is just decreasing and decreasing and 771 00:42:36,810 --> 00:42:38,810 decreasing. 772 00:42:38,810 --> 00:42:44,670 Now let's look a little bit at where this crosses 1, if it 773 00:42:44,670 --> 00:42:47,090 does cross 1. 774 00:42:47,090 --> 00:42:53,280 And what I claim here is that when k plus 1 is less than or 775 00:42:53,280 --> 00:43:00,230 equal to pn, what happens here? 776 00:43:00,230 --> 00:43:06,140 Well if I can do this-- 777 00:43:06,140 --> 00:43:09,080 I usually get confused doing these things. 778 00:43:09,080 --> 00:43:15,600 But if k plus 1 is less than or equal to pn, this is the 779 00:43:15,600 --> 00:43:20,730 last of these choices here, then k is also less than or 780 00:43:20,730 --> 00:43:22,530 equal to pn. 781 00:43:22,530 --> 00:43:27,320 And therefore, n minus k is greater than-- 782 00:43:27,320 --> 00:43:31,200 in fact, k is strictly less than pn. 783 00:43:31,200 --> 00:43:40,970 And n minus k is strictly greater than n minus pn. 784 00:43:40,970 --> 00:43:45,755 And since q is 1 minus p, this is n times q. 785 00:43:45,755 --> 00:43:49,490 OK, so you take this divided by this. 786 00:43:49,490 --> 00:43:52,130 And you take this divided by this. 787 00:43:52,130 --> 00:43:56,920 And sure enough, this ratio here is greater than 1 any 788 00:43:56,920 --> 00:44:03,710 time you have a k which is smaller than what you think k 789 00:44:03,710 --> 00:44:07,876 ought to be, which is p times n. 790 00:44:07,876 --> 00:44:10,060 OK, so you have these three quantities here. 791 00:44:10,060 --> 00:44:12,530 Let me go on to the next slide.
792 00:44:15,490 --> 00:44:21,620 Since these ratios are less than 1 when k is large, 793 00:44:21,620 --> 00:44:27,790 approximately equal to 1 when k is close to pn, and greater 794 00:44:27,790 --> 00:44:31,890 than 1 when k is smaller than pn, if I plot these things, 795 00:44:31,890 --> 00:44:39,390 what I find is that as k is increasing, getting closer and 796 00:44:39,390 --> 00:44:43,840 closer to pn, it's getting larger and larger. 797 00:44:43,840 --> 00:44:49,560 As k is increasing further, getting larger than pn, this 798 00:44:49,560 --> 00:44:52,470 ratio says that these things have to be 799 00:44:52,470 --> 00:44:56,670 getting smaller and smaller. 800 00:44:56,670 --> 00:45:00,140 So just from looking at this, we know that these terms have 801 00:45:00,140 --> 00:45:04,820 to be increasing for terms less than pn and have to be 802 00:45:04,820 --> 00:45:08,360 decreasing for terms greater than pn. 803 00:45:08,360 --> 00:45:10,730 So this is a bell-shaped curve. 804 00:45:10,730 --> 00:45:13,220 We've already seen that. 805 00:45:13,220 --> 00:45:16,790 It might not be quite clear that it's bell-shaped in the 806 00:45:16,790 --> 00:45:21,530 sense that it kind of tapers off as you get smaller. 807 00:45:21,530 --> 00:45:24,860 Because these ratios are getting bigger and bigger, as 808 00:45:24,860 --> 00:45:28,680 k gets bigger and bigger, the ratio of this term to this 809 00:45:28,680 --> 00:45:30,550 term gets bigger and bigger. 810 00:45:30,550 --> 00:45:32,750 So what's happening there? 811 00:45:32,750 --> 00:45:36,060 As this ratio gets bigger and bigger, these terms get 812 00:45:36,060 --> 00:45:37,735 smaller and smaller. 813 00:45:37,735 --> 00:45:40,280 But as these terms get smaller and smaller, they're getting 814 00:45:40,280 --> 00:45:42,210 closer and closer to 0. 815 00:45:42,210 --> 00:45:48,180 So even though they're going to 0 like a bat out of hell, 816 00:45:48,180 --> 00:45:50,260 they still can't get any smaller than 0.
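The bell shape just argued for can be confirmed directly with the running example (a numerical sketch, not in the lecture; n = 20 and p = 1/4 so that pn = 5):

```python
# Sketch: the PMF terms rise while k < pn and fall for k > pn.
from math import comb

n, p = 20, 0.25
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

peak = max(range(n + 1), key=lambda k: pmf[k])
assert peak == int(n * p)                                # largest term sits at pn
assert all(pmf[k] < pmf[k + 1] for k in range(peak))     # increasing below pn
assert all(pmf[k] > pmf[k + 1] for k in range(peak, n))  # decreasing above pn
print("PMF is bell-shaped with peak at k =", peak)
```
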
817 00:45:50,260 --> 00:45:55,330 So they just taper down and start to get close to 0. 818 00:45:55,330 --> 00:46:03,910 So that is roughly how this sum of binary 819 00:46:03,910 --> 00:46:06,240 random variables behaves. 820 00:46:06,240 --> 00:46:13,950 OK, so let's go on and show that the central limit theorem 821 00:46:13,950 --> 00:46:17,670 holds for the Bernoulli process. 822 00:46:17,670 --> 00:46:19,550 And that's just as easy really. 823 00:46:19,550 --> 00:46:23,230 There's nothing more difficult about it that we 824 00:46:23,230 --> 00:46:25,430 have to deal with. 825 00:46:25,430 --> 00:46:30,410 This ratio, as we've said, is equal to n minus k over k plus 826 00:46:30,410 --> 00:46:32,850 1 times p over q. 827 00:46:32,850 --> 00:46:35,310 What we're interested in here-- 828 00:46:35,310 --> 00:46:40,730 I mean, we've already seen from the last slide that the 829 00:46:40,730 --> 00:46:44,270 interesting thing here is the big terms. 830 00:46:44,270 --> 00:46:47,880 And the big terms are the terms which are close to pn. 831 00:46:47,880 --> 00:46:51,780 So what we'd like to do is look at values of k which are 832 00:46:51,780 --> 00:46:53,120 close to pn. 833 00:46:53,120 --> 00:46:58,580 What I've done here is to plot this as k minus the integer 834 00:46:58,580 --> 00:46:59,820 value of pn. 835 00:46:59,820 --> 00:47:01,270 So we get integers. 836 00:47:01,270 --> 00:47:05,030 What I'm going to assume now, because this gets a little 837 00:47:05,030 --> 00:47:08,400 hairy if I don't do that, I'm going to assume that pn is 838 00:47:08,400 --> 00:47:10,540 equal to an integer. 839 00:47:10,540 --> 00:47:14,000 It doesn't make a whole lot of difference to the argument. 840 00:47:14,000 --> 00:47:17,040 It just leaves out a lot of terms that you don't 841 00:47:17,040 --> 00:47:18,300 have to play with. 842 00:47:18,300 --> 00:47:22,050 So we'll assume that pn is an integer.
843 00:47:22,050 --> 00:47:24,360 With this example, we're looking at where 844 00:47:24,360 --> 00:47:26,310 p is equal to 1/4. 845 00:47:26,310 --> 00:47:30,890 pn is going to be an integer whenever n is a multiple of 4. 846 00:47:30,890 --> 00:47:33,310 So things are fine then. 847 00:47:33,310 --> 00:47:37,360 If I try to make p equal to 1 over pi, then that doesn't 848 00:47:37,360 --> 00:47:38,170 work so well. 849 00:47:38,170 --> 00:47:45,800 But after all, no reason to choose p in such a strange way. 850 00:47:45,800 --> 00:47:50,520 OK, so I'm going to look at this for a fixed value of n. 851 00:47:50,520 --> 00:47:56,270 I'm going to look at it as k increases for k less than pn. 852 00:47:56,270 --> 00:47:58,800 I'm going to look at it as it decreases for k 853 00:47:58,800 --> 00:48:00,360 greater than pn. 854 00:48:00,360 --> 00:48:08,415 And I'm going to define k to be equal to i plus pn. 855 00:48:08,415 --> 00:48:11,070 So I'm going to put the whole thing in terms of 856 00:48:11,070 --> 00:48:15,470 i instead of k. 857 00:48:15,470 --> 00:48:25,030 OK, so when I substitute i equals k minus pn for k here, 858 00:48:25,030 --> 00:48:27,620 what I'm going to get is this term. 859 00:48:27,620 --> 00:48:30,700 It's going to be the probability of pn plus i plus 860 00:48:30,700 --> 00:48:33,960 1 over p of-- 861 00:48:37,560 --> 00:48:40,530 fortunately when you're using TeX, you can distinguish 862 00:48:40,530 --> 00:48:42,190 different kinds of p's. 863 00:48:42,190 --> 00:48:44,490 I have too many p's in this equation. 864 00:48:44,490 --> 00:48:47,210 This is the probability mass function. 865 00:48:47,210 --> 00:48:52,180 This is just my probability of a 1. 866 00:48:52,180 --> 00:48:54,730 And p's are things that you like to use a lot in 867 00:48:54,730 --> 00:48:55,470 probability. 868 00:48:55,470 --> 00:48:59,440 So it's nice to have that separation there.
869 00:48:59,440 --> 00:49:04,900 OK, when I take this and I substitute it into that with k 870 00:49:04,900 --> 00:49:10,770 equal to pn plus i, what I get is n minus pn minus i. 871 00:49:10,770 --> 00:49:17,570 That's n minus k over pn plus i plus 1 times p over q. 872 00:49:17,570 --> 00:49:24,750 Fair enough, OK, all I'm doing is replacing k with pn plus i 873 00:49:24,750 --> 00:49:29,080 because I want i to be very close to 0 in this argument. 874 00:49:29,080 --> 00:49:32,440 Because I've already seen that these terms are only 875 00:49:32,440 --> 00:49:35,370 significant when i is relatively close to 0. 876 00:49:35,370 --> 00:49:38,600 Because when I get away from 0, these terms are going down 877 00:49:38,600 --> 00:49:41,500 very, very fast. 878 00:49:41,500 --> 00:49:45,260 So when I do that, what do I get? 879 00:49:45,260 --> 00:49:51,040 I get n minus pn is equal to qn. 880 00:49:51,040 --> 00:49:51,820 That's nice. 881 00:49:51,820 --> 00:49:53,850 So I have an nq here. 882 00:49:53,850 --> 00:49:55,270 I have a q here. 883 00:49:55,270 --> 00:49:56,870 I have a pn here. 884 00:49:56,870 --> 00:49:58,230 I have a p here. 885 00:49:58,230 --> 00:49:59,860 I'm going to multiply this p by n. 886 00:49:59,860 --> 00:50:02,660 I'm going to multiply this q by n. 887 00:50:02,660 --> 00:50:05,730 And I'm going to take a ratio of this pair of things. 888 00:50:05,730 --> 00:50:11,170 So when I take this ratio, I'm going to get nq over 889 00:50:11,170 --> 00:50:12,420 nq, which is 1. 890 00:50:19,590 --> 00:50:26,030 And the other terms there become minus i over nq. 891 00:50:26,030 --> 00:50:33,460 In the denominator, I'm going to divide pn plus i plus 1 by 892 00:50:33,460 --> 00:50:40,320 pn, which gives me 1 plus i plus 1 divided by np.
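The substitution k = pn + i can be verified directly: with my own assumed values (n = 400, p = 1/4, so pn = 100 is an integer, as the lecture requires), the ratio (n - k)/(k + 1) times p/q equals (1 - i/nq)/(1 + (i + 1)/np) exactly, not just approximately.

```python
n, p = 400, 0.25
q = 1 - p
pn = int(p * n)  # 100, an integer, as assumed in the lecture

for i in range(-5, 6):
    k = pn + i
    lhs = (n - k) / (k + 1) * p / q
    rhs = (1 - i / (n * q)) / (1 + (i + 1) / (n * p))
    # the two forms of the one-step ratio agree for every i
    assert abs(lhs - rhs) < 1e-12

print("ratio identity holds for all tested i")
```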
893 00:50:40,320 --> 00:50:45,300 So I get two terms, ratio of two terms, which are both 894 00:50:45,300 --> 00:50:49,170 close to 1 at this point and which are getting closer and 895 00:50:49,170 --> 00:50:54,910 closer to 1 as n gets larger and larger. 896 00:50:54,910 --> 00:50:59,580 Now let's take the logarithm of this. 897 00:50:59,580 --> 00:51:02,670 Let me justify taking the logarithm of it in two 898 00:51:02,670 --> 00:51:04,160 different ways. 899 00:51:04,160 --> 00:51:09,360 One of them is that what we're trying to prove-- 900 00:51:09,360 --> 00:51:17,130 and I'm playing the game that all of you 901 00:51:17,130 --> 00:51:18,560 always play in quizzes. 902 00:51:18,560 --> 00:51:20,720 When you're trying to prove something, what do you do? 903 00:51:20,720 --> 00:51:21,980 You start at the beginning. 904 00:51:21,980 --> 00:51:22,870 You work this way. 905 00:51:22,870 --> 00:51:24,150 You start at the end. 906 00:51:24,150 --> 00:51:25,620 You work back this way. 907 00:51:25,620 --> 00:51:29,090 And you hope, at some point, the two things come together. 908 00:51:29,090 --> 00:51:31,530 If they don't come together, you get to this point. 909 00:51:31,530 --> 00:51:32,810 And you say, obviously. 910 00:51:32,810 --> 00:51:35,440 And then you go to that point which leads to-- yeah. 911 00:51:35,440 --> 00:51:37,590 [LAUGHTER] 912 00:51:37,590 --> 00:51:42,510 PROFESSOR: OK, so I'm doing the same thing here. 913 00:51:42,510 --> 00:51:46,620 This probability that we're trying to calculate-- 914 00:51:46,620 --> 00:51:50,860 well, I've listed it here in terms of-- 915 00:51:50,860 --> 00:51:56,600 I have put it here in terms of a distribution function. 916 00:51:56,600 --> 00:52:01,170 I will do just as well if I can do it in terms of a PMF.
917 00:52:01,170 --> 00:52:05,270 And what I'd like to show is that the PMF of sn minus nX 918 00:52:05,270 --> 00:52:09,620 bar over square root of n times sigma is somehow 919 00:52:09,620 --> 00:52:13,740 proportional to e to the minus x squared over 2. 920 00:52:13,740 --> 00:52:18,690 Now if I want to do that, it will be all right if I can 921 00:52:18,690 --> 00:52:21,700 take the logarithm of this term and show that 922 00:52:21,700 --> 00:52:27,660 it's quadratic in x. 923 00:52:27,660 --> 00:52:32,240 And if I want to show that this logarithm is quadratic in x, 924 00:52:32,240 --> 00:52:35,910 and I'm looking at the differentials at each time, 925 00:52:35,910 --> 00:52:40,570 what are the differentials going to be if the sum of the 926 00:52:40,570 --> 00:52:44,510 differentials is quadratic? 927 00:52:44,510 --> 00:52:48,300 If the sum of these differentials is quadratic, 928 00:52:48,300 --> 00:52:50,340 then the individual terms have to be linear. 929 00:52:53,510 --> 00:52:57,490 If I take a bunch of linear terms, if I add up 1 plus 2 930 00:52:57,490 --> 00:53:03,900 plus 3 plus 4 plus 5, you've all done this I'm sure. 931 00:53:10,080 --> 00:53:17,960 And down here you write n plus n minus 1 plus ... plus 1. 932 00:53:17,960 --> 00:53:19,240 And what do you get? 933 00:53:19,240 --> 00:53:25,246 You get n times n plus 1 over 2. 934 00:53:25,246 --> 00:53:28,170 You can also approximate that by integrating. 935 00:53:28,170 --> 00:53:31,560 Whenever you add up a sum of linear terms, you 936 00:53:31,560 --> 00:53:33,690 get a square term. 937 00:53:33,690 --> 00:53:35,660 And I'm just curious. 938 00:53:35,660 --> 00:53:39,570 How many of you have seen that? 939 00:53:39,570 --> 00:53:40,920 Good, OK. 940 00:53:40,920 --> 00:53:42,630 Well, it's only about 1/2. 941 00:53:42,630 --> 00:53:47,330 So it's something you've probably seen in high school. 942 00:53:47,330 --> 00:53:48,730 Or you haven't seen it at all.
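The pairing trick described here, usually credited to Gauss, is easy to confirm: summing the linear terms 1 through n gives the quadratic n(n + 1)/2, which is the key point being used.

```python
# Sum of linear terms 1 + 2 + ... + n equals n(n + 1)/2: writing the
# sum forward and backward and adding column by column gives n columns,
# each summing to n + 1, hence n(n + 1), which is then halved.
n = 1000
assert sum(range(1, n + 1)) == n * (n + 1) // 2
print(n * (n + 1) // 2)  # 500500
```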
943 00:53:53,060 --> 00:53:56,450 So let's go on with this argument. 944 00:54:02,250 --> 00:54:05,200 OK, so I'm going to take the logarithm of 945 00:54:05,200 --> 00:54:08,120 this expression here. 946 00:54:08,120 --> 00:54:09,570 I'm going to take the logarithm. 947 00:54:09,570 --> 00:54:14,070 I'm going to have the logarithm of 1 minus i over nq 948 00:54:14,070 --> 00:54:20,690 minus the logarithm of 1 plus i plus 1 over np. 949 00:54:20,690 --> 00:54:26,410 And I'm going to use what I think of as one of the most 950 00:54:26,410 --> 00:54:31,750 useful inequalities that you will ever see, which is the 951 00:54:31,750 --> 00:54:36,130 natural log of 1 plus x. 952 00:54:36,130 --> 00:54:44,450 If we use a power expansion, we get x minus x squared over 953 00:54:44,450 --> 00:54:50,160 2 plus x cubed over 3 minus-- 954 00:54:50,160 --> 00:54:52,100 it's an alternating series. 955 00:54:52,100 --> 00:54:56,630 If x is negative, this term is negative. 956 00:54:56,630 --> 00:54:57,820 This term is negative. 957 00:54:57,820 --> 00:54:59,650 This term is negative. 958 00:54:59,650 --> 00:55:04,060 And all this makes sense because if I draw this 959 00:55:04,060 --> 00:55:10,290 function here, logarithm of 1 plus x at x equals 0. 960 00:55:10,290 --> 00:55:12,680 This is equal to 0. 961 00:55:12,680 --> 00:55:16,420 It comes up with a slope of 1. 962 00:55:16,420 --> 00:55:18,620 And it levels off. 963 00:55:18,620 --> 00:55:23,870 And here it's going down very fast. 964 00:55:23,870 --> 00:55:28,600 So these terms, you get these negative terms. 965 00:55:28,600 --> 00:55:32,890 And on the positive side, you get these alternating terms. 966 00:55:32,890 --> 00:55:35,810 So this goes up slowly, down fast. 967 00:55:35,810 --> 00:55:39,290 The slope here is x, which is this term here. 968 00:55:39,290 --> 00:55:45,210 The curvature here gives you the minus x squared over 2. 
969 00:55:45,210 --> 00:55:49,380 And the approximation, which is very useful here, is that 970 00:55:49,380 --> 00:55:56,310 the logarithm of 1 plus x, when x is small, is equal to x 971 00:55:56,310 --> 00:55:59,910 plus what we call little-o of x. 972 00:55:59,910 --> 00:56:03,890 Namely something which goes to 0 faster than x 973 00:56:03,890 --> 00:56:05,930 as x goes to 0. 974 00:56:05,930 --> 00:56:09,630 OK, all of you know that, right? 975 00:56:09,630 --> 00:56:12,090 Well, if you don't know it, now you know it. 976 00:56:12,090 --> 00:56:12,940 It's useful. 977 00:56:12,940 --> 00:56:16,040 You will use it again and again. 978 00:56:16,040 --> 00:56:18,650 OK, so what we're going to do is-- 979 00:56:21,590 --> 00:56:26,160 well, that's pretty good. 980 00:56:26,160 --> 00:56:27,410 Where did I get to that point? 981 00:56:31,910 --> 00:56:34,630 I skipped something. 982 00:56:34,630 --> 00:56:42,240 What I have shown is that this increment in the probability, 983 00:56:42,240 --> 00:56:46,190 in the PMF for s sub n, namely the increment as you increase 984 00:56:46,190 --> 00:56:50,810 i by 1, is linear in i. 985 00:56:50,810 --> 00:56:54,940 And in fact, the logarithm of this increment is linear in i. 986 00:56:54,940 --> 00:56:58,740 So therefore, by what I was saying before, the logarithm 987 00:56:58,740 --> 00:57:02,760 of the actual terms should be rather than linear in i, they 988 00:57:02,760 --> 00:57:04,680 should be quadratic in i. 989 00:57:04,680 --> 00:57:06,870 So that's what I'm trying to do here. 990 00:57:06,870 --> 00:57:09,450 I just missed this whole term here. 991 00:57:09,450 --> 00:57:14,490 What I'm interested in now is getting a handle on pn plus 992 00:57:14,490 --> 00:57:17,960 some larger value, j, divided by the 993 00:57:17,960 --> 00:57:21,000 probability of sn for pn. 994 00:57:21,000 --> 00:57:23,340 What am I trying to do here? 995 00:57:23,340 --> 00:57:27,390 I should've said what I was trying to do.
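The approximation just stated, ln(1 + x) = x + o(x), can be seen numerically; this small sketch (test values are my own) checks that the error is on the order of x squared over 2, so it vanishes faster than x itself.

```python
from math import log

for x in (0.1, 0.01, 0.001):
    err = abs(log(1 + x) - x)
    # the error behaves like x^2/2, so err/x -> 0 as x -> 0;
    # here we just check it is already below x^2
    assert err < x * x

print("ln(1+x) = x + o(x) confirmed numerically")
```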
996 00:57:27,390 --> 00:57:29,570 This term is just one term. 997 00:57:29,570 --> 00:57:31,370 It's fixed. 998 00:57:31,370 --> 00:57:35,120 It'd be nice if we knew what it was; we don't at the moment. 999 00:57:35,120 --> 00:57:38,370 But I'm trying to express everything else in terms of 1000 00:57:38,370 --> 00:57:40,890 that one unknown term. 1001 00:57:40,890 --> 00:57:44,580 And what I'm trying to do is to show that the logarithm of 1002 00:57:44,580 --> 00:57:50,970 this everything else is going to be quadratic in j. 1003 00:57:50,970 --> 00:57:54,320 And if I can do that, then I only have one undetermined 1004 00:57:54,320 --> 00:57:56,180 factor in this whole sum. 1005 00:57:56,180 --> 00:57:59,345 And I can use the fact that the PMF sums to 1 to solve the 1006 00:57:59,345 --> 00:58:01,180 whole problem. 1007 00:58:01,180 --> 00:58:07,650 So I'm going to express the probability that we get pn 1008 00:58:07,650 --> 00:58:13,070 plus j 1's divided by the probability that we get pn 1's. 1009 00:58:13,070 --> 00:58:24,440 It's the product of the probabilities pn plus i plus 1 1010 00:58:24,440 --> 00:58:26,250 over pn plus i. 1011 00:58:26,250 --> 00:58:27,270 And we increase i. 1012 00:58:27,270 --> 00:58:30,060 We start out at i equals 0. 1013 00:58:30,060 --> 00:58:34,740 And then the denominator is probability of pn plus 0, 1014 00:58:34,740 --> 00:58:36,900 which is this term. 1015 00:58:36,900 --> 00:58:42,110 And each time we increase i by 1, this term cancels out with 1016 00:58:42,110 --> 00:58:45,320 the previous or the next value of this term. 1017 00:58:45,320 --> 00:58:51,300 And when I get all done, all I have is this expression here. 1018 00:58:51,300 --> 00:58:53,854 Everybody see that? 1019 00:58:53,854 --> 00:58:56,130 OK, I see a lot of-- 1020 00:58:56,130 --> 00:58:59,470 if you don't see it, just look at it. 1021 00:58:59,470 --> 00:59:01,810 And you'll see that this-- 1022 00:59:01,810 --> 00:59:03,660 I think you'll see that this works.
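The telescoping cancellation described here can be checked numerically. A sketch with my own assumed values (n = 400, p = 1/4, j = 8): the product of the one-step ratios collapses to the single ratio p(pn + j) over p(pn).

```python
from math import comb

def binom_pmf(n, p, k):
    """Binomial PMF for k ones in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p, j = 400, 0.25, 8
pn = int(p * n)

# product of one-step ratios for i = 0, ..., j-1: interior terms cancel
prod = 1.0
for i in range(j):
    prod *= binom_pmf(n, p, pn + i + 1) / binom_pmf(n, p, pn + i)

direct = binom_pmf(n, p, pn + j) / binom_pmf(n, p, pn)
assert abs(prod - direct) < 1e-9
print("telescoping product matches the direct ratio")
```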
1023 00:59:03,660 --> 00:59:07,700 OK, so now I take this expression here. 1024 00:59:07,700 --> 00:59:11,210 This logarithm is this linear term here. 1025 00:59:11,210 --> 00:59:12,680 What do I want to do? 1026 00:59:12,680 --> 00:59:18,210 I want to sum i from 0 up to j minus 1. 1027 00:59:18,210 --> 00:59:22,270 What do I get when I sum i from 0 to j minus 1? 1028 00:59:22,270 --> 00:59:24,330 I get this expression here. 1029 00:59:24,330 --> 00:59:29,870 I get j times j minus 1 divided by 2npq. 1030 00:59:29,870 --> 00:59:31,120 Oh, I was-- 1031 00:59:33,420 --> 00:59:35,180 I skipped something. 1032 00:59:35,180 --> 00:59:37,380 Let's go back a little bit. 1033 00:59:37,380 --> 00:59:40,105 Because it'll look like it was a typo. 1034 00:59:43,800 --> 00:59:48,100 When I took this logarithm and I applied this approximation 1035 00:59:48,100 --> 00:59:55,160 to it, I got minus i over nq minus i over np minus 1 over 1036 00:59:55,160 --> 00:59:59,460 np plus squared terms in i over n. 1037 00:59:59,460 --> 01:00:05,020 When I take minus i over nq minus i over np, I can combine those 1038 01:00:05,020 --> 01:00:06,680 two things together. 1039 01:00:06,680 --> 01:00:15,100 I can take minus ip over npq minus iq over npq. 1040 01:00:15,100 --> 01:00:17,430 And q plus p is equal to 1. 1041 01:00:17,430 --> 01:00:20,310 So the p's and q's in the numerator all go away. 1042 01:00:20,310 --> 01:00:27,230 And these two terms combine to be minus i over n times p 1043 01:00:27,230 --> 01:00:29,720 times q. 1044 01:00:29,720 --> 01:00:33,490 And I just have this one last little term left here. 1045 01:00:33,490 --> 01:00:35,190 Don't know what to do with that. 1046 01:00:35,190 --> 01:00:37,260 But then I add up all these terms. 1047 01:00:37,260 --> 01:00:39,690 This one is the one that leads to j times j 1048 01:00:39,690 --> 01:00:42,710 minus 1 over 2 npq. 1049 01:00:42,710 --> 01:00:47,300 This one is the one that leads to j over np.
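The summation step here is just arithmetic, and it can be confirmed directly with assumed values of my own: summing the per-step linear terms, minus i over npq minus 1 over np, for i = 0 through j - 1 gives the quadratic minus j(j - 1) over 2npq, minus j over np.

```python
n, p = 400, 0.25
q = 1 - p
j = 12

# sum the per-step linear terms in i
s = sum(-i / (n * p * q) - 1 / (n * p) for i in range(j))

# closed form: quadratic in j, plus the small j/np piece
closed = -j * (j - 1) / (2 * n * p * q) - j / (n * p)
assert abs(s - closed) < 1e-12

print("linear terms in i sum to a quadratic in j")
```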
1050 01:00:47,300 --> 01:00:51,580 And I just neglect this term, which is negligible compared 1051 01:00:51,580 --> 01:00:52,920 to j squared. 1052 01:00:52,920 --> 01:00:55,685 I get minus j squared over 2 npq. 1053 01:00:59,440 --> 01:01:04,430 Let me come back later to say why I'm so eager to neglect 1054 01:01:04,430 --> 01:01:07,240 this term except that that's what I have to do if I want to 1055 01:01:07,240 --> 01:01:08,700 get the right answer. 1056 01:01:08,700 --> 01:01:13,130 OK, so we'll see why that has to be negligible in just a 1057 01:01:13,130 --> 01:01:14,110 little bit. 1058 01:01:14,110 --> 01:01:19,630 But now this logarithm is coming out to be exactly the 1059 01:01:19,630 --> 01:01:23,160 term that I want it to be. 1060 01:01:23,160 --> 01:01:29,950 So finally, the logarithm of the probability that this sum 1061 01:01:29,950 --> 01:01:34,790 is pn plus j, namely j off the mean, divided by the probability 1062 01:01:34,790 --> 01:01:40,420 that it is pn, is equal to minus j squared over 2 npq plus some 1063 01:01:40,420 --> 01:01:42,160 negligible terms. 1064 01:01:42,160 --> 01:01:45,120 And this says when I exponentiate things, that the 1065 01:01:45,120 --> 01:01:51,300 probability that sn is j off the mean is approximately 1066 01:01:51,300 --> 01:01:54,720 equal to this term, the probability that it's right at 1067 01:01:54,720 --> 01:01:58,965 the mean, times e to the minus j squared over 2 npq. 1068 01:02:01,490 --> 01:02:06,020 What that is saying is that this sum of terms that I was 1069 01:02:06,020 --> 01:02:07,270 looking at before-- 1070 01:02:10,290 --> 01:02:20,070 this term here, this term, this term, this term, this 1071 01:02:20,070 --> 01:02:21,865 term, and so forth down-- 1072 01:02:30,280 --> 01:02:37,840 these terms here are actually going as e to the minus j squared over 1073 01:02:37,840 --> 01:02:42,150 2 npq, which is what they should be going as if you have 1074 01:02:42,150 --> 01:02:44,810 a Gaussian curve here.
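The exponentiated statement can be tested numerically. A sketch with my own choices (n = 4000, p = 1/4; the PMF is computed through lgamma to avoid floating-point underflow for such large n): the ratio p(pn + j) over p(pn) tracks e to the minus j squared over 2npq closely.

```python
from math import exp, lgamma, log

def log_binom_pmf(n, p, k):
    """Log of the binomial PMF, computed stably via lgamma."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

n, p = 4000, 0.25
q = 1 - p
pn = int(p * n)

for j in (5, 10, 20):
    ratio = exp(log_binom_pmf(n, p, pn + j) - log_binom_pmf(n, p, pn))
    gauss = exp(-j * j / (2 * n * p * q))
    # agreement within a few percent out to a couple of standard deviations
    assert abs(ratio - gauss) / gauss < 0.05

print("PMF ratio decays as exp(-j^2 / 2npq)")
```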
1075 01:02:44,810 --> 01:02:50,140 OK, now there's one other thing we have to do, which is 1076 01:02:50,140 --> 01:02:54,340 figure out what this term is. 1077 01:02:54,340 --> 01:03:02,470 And if you look at this as an undetermined coefficient on 1078 01:03:02,470 --> 01:03:06,270 these Gaussian-type terms and you think of what happens if I 1079 01:03:06,270 --> 01:03:11,710 sum this over all i, well, if I sum it over all i what I'm 1080 01:03:11,710 --> 01:03:17,490 going to get is the sum of all of these terms here, which are 1081 01:03:17,490 --> 01:03:24,360 negligible except where j squared is proportional to n. 1082 01:03:24,360 --> 01:03:27,640 So I don't have to sum them beyond the point where this 1083 01:03:27,640 --> 01:03:29,910 approximation makes sense. 1084 01:03:29,910 --> 01:03:33,350 So I want to sum all these terms. 1085 01:03:33,350 --> 01:03:36,540 In summing these terms, when n gets very, very large, these 1086 01:03:36,540 --> 01:03:39,290 things are dropping off very, very slowly. 1087 01:03:39,290 --> 01:03:41,510 The curve is getting very, very wide. 1088 01:03:41,510 --> 01:03:46,940 If I scrunch the curve back in again, what I get is a Riemann 1089 01:03:46,940 --> 01:03:51,730 approximation to a normal density curve. 1090 01:03:51,730 --> 01:03:53,890 Therefore, I can integrate it. 1091 01:03:53,890 --> 01:03:55,920 And believe me. 1092 01:03:55,920 --> 01:03:57,620 If you don't believe me, I'll go through it. 1093 01:03:57,620 --> 01:03:59,720 And you won't like that. 1094 01:03:59,720 --> 01:04:06,750 When you go through this, what you get is in fact this 1095 01:04:06,750 --> 01:04:11,930 expression right here, which says that when n gets very, 1096 01:04:11,930 --> 01:04:20,190 very large and j is the offset from the mean and is 1097 01:04:20,190 --> 01:04:23,420 proportional to-- 1098 01:04:23,420 --> 01:04:25,890 well, it's proportional to the square root of n.
1099 01:04:25,890 --> 01:04:30,260 Then what I get is this PMF here, which is in fact what 1100 01:04:30,260 --> 01:04:32,350 the central limit theorem says. 1101 01:04:32,350 --> 01:04:36,380 And now if you go back and try to think of exactly what we've 1102 01:04:36,380 --> 01:04:40,110 done, what we've done is to show that the logarithm of 1103 01:04:40,110 --> 01:04:44,490 these differences here is in fact linear in i. 1104 01:04:44,490 --> 01:04:47,470 Therefore, when you sum them, you get something which is 1105 01:04:47,470 --> 01:04:50,640 quadratic in j. 1106 01:04:50,640 --> 01:04:54,000 And because of that, all you have to do is normalize with a 1107 01:04:54,000 --> 01:04:55,340 center term. 1108 01:04:55,340 --> 01:04:57,800 And you get this. 1109 01:04:57,800 --> 01:05:01,690 The central limit theorem, especially for the binary 1110 01:05:01,690 --> 01:05:05,550 case, is almost always done by using a Stirling 1111 01:05:05,550 --> 01:05:07,040 approximation. 1112 01:05:07,040 --> 01:05:10,460 And a Stirling approximation is one of these things which 1113 01:05:10,460 --> 01:05:12,740 is black magic. 1114 01:05:12,740 --> 01:05:17,190 I don't know any place except in William Feller's book where 1115 01:05:17,190 --> 01:05:21,240 anyone talks about where this formula comes from. 1116 01:05:21,240 --> 01:05:23,650 If you now go back and look very carefully at this 1117 01:05:23,650 --> 01:05:27,390 derivation, this tells you what the Stirling 1118 01:05:27,390 --> 01:05:28,830 approximation is. 1119 01:05:28,830 --> 01:05:34,210 Because if you do this for p equals q, what you're doing is 1120 01:05:34,210 --> 01:05:39,650 actually evaluating n choose k where k is very 1121 01:05:39,650 --> 01:05:42,560 close to n over 2. 1122 01:05:42,560 --> 01:05:45,170 And that will tell you exactly what Stirling's 1123 01:05:45,170 --> 01:05:47,300 approximation has to be.
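The final normalized form can be checked end to end. A sketch with assumed values of my own, n = 4000 and p = 1/4 (again computing the PMF through lgamma for numerical stability): the PMF at pn + j comes out close to the discrete Gaussian form, e to the minus j squared over 2npq, divided by the square root of 2 pi npq.

```python
from math import exp, lgamma, log, pi, sqrt

def log_binom_pmf(n, p, k):
    """Log of the binomial PMF, computed stably via lgamma."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

n, p = 4000, 0.25
q = 1 - p
pn = int(p * n)

for j in (0, 10, 25):
    exact = exp(log_binom_pmf(n, p, pn + j))
    clt = exp(-j * j / (2 * n * p * q)) / sqrt(2 * pi * n * p * q)
    # the normalization 1/sqrt(2*pi*npq) makes the PMF match the
    # Gaussian shape to within a few percent near the mean
    assert abs(exact - clt) / clt < 0.05

print("binomial PMF matches the local central limit theorem form")
```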
1124 01:05:47,300 --> 01:05:49,700 In other words, that's a way of deriving Stirling's 1125 01:05:49,700 --> 01:05:50,960 approximation. 1126 01:05:50,960 --> 01:05:54,220 A very backward way of doing things, it seems. 1127 01:05:54,220 --> 01:05:57,330 But often backward ways are the best ways 1128 01:05:57,330 --> 01:05:59,480 of doing these things. 1129 01:05:59,480 --> 01:06:05,210 OK, so I told you I would stop at some 1130 01:06:05,210 --> 01:06:06,840 point and ask for questions. 1131 01:06:06,840 --> 01:06:07,570 Yes? 1132 01:06:07,570 --> 01:06:09,820 AUDIENCE: Can you please go back one slide before this 1133 01:06:09,820 --> 01:06:15,695 slide, where you can neglect a term, which [INAUDIBLE], 1134 01:06:15,695 --> 01:06:18,932 minus j over np. 1135 01:06:18,932 --> 01:06:22,400 PROFESSOR: Why did I neglect the j over np? 1136 01:06:22,400 --> 01:06:25,830 OK, that's a good question. 1137 01:06:25,830 --> 01:06:34,760 If you look at this curve here, and I put the j in. 1138 01:06:34,760 --> 01:06:39,470 I can put the j in by just making this expression here 1139 01:06:39,470 --> 01:06:44,360 look at one smaller value of j or one larger value of j. 1140 01:06:44,360 --> 01:06:45,870 And you get something different whether you're 1141 01:06:45,870 --> 01:06:48,880 looking at the minus side or the plus side. 1142 01:06:48,880 --> 01:06:53,850 In fact, if p is equal to q, this term cancels out. 1143 01:06:53,850 --> 01:06:57,400 If p is not equal to q, what happens is that the central 1144 01:06:57,400 --> 01:07:01,940 limit theorem is approximately symmetric. 1145 01:07:01,940 --> 01:07:04,130 But in this first-order term, 1146 01:07:04,130 --> 01:07:05,760 it's not quite symmetric. 1147 01:07:05,760 --> 01:07:10,090 It can't be symmetric because this is p times n. 1148 01:07:10,090 --> 01:07:12,500 And you have all these terms out to 1. 1149 01:07:12,500 --> 01:07:15,690 And you have many, many fewer terms back to 0.
1150 01:07:15,690 --> 01:07:19,290 So it has to be slightly asymmetric. 1151 01:07:19,290 --> 01:07:25,230 But it's only asymmetric over at most a unit of value here, 1152 01:07:25,230 --> 01:07:27,190 which is not significant. 1153 01:07:27,190 --> 01:07:29,870 Because as n gets bigger, these terms-- 1154 01:07:29,870 --> 01:07:32,060 well, as I've done it, the terms do 1155 01:07:32,060 --> 01:07:33,870 not get close together. 1156 01:07:33,870 --> 01:07:37,820 But if I want to think of it as a normalized Gaussian 1157 01:07:37,820 --> 01:07:40,300 curve, I have to make the terms close together. 1158 01:07:40,300 --> 01:07:43,580 So that extra term is not significant. 1159 01:07:43,580 --> 01:07:46,630 I wish I had a nicer way of taking care of all the 1160 01:07:46,630 --> 01:07:48,740 approximations here. 1161 01:07:48,740 --> 01:07:52,110 I haven't put this in the notes because I still haven't 1162 01:07:52,110 --> 01:07:54,640 figured out how to do that. 1163 01:07:54,640 --> 01:08:00,310 But I still think you get more insight from doing it this way 1164 01:08:00,310 --> 01:08:04,130 than you do by going through Stirling's approximation, all 1165 01:08:04,130 --> 01:08:07,610 those pages and pages of algebra. 1166 01:08:07,610 --> 01:08:08,860 Anything else? 1167 01:08:14,020 --> 01:08:16,080 OK, well, see you Wednesday then.