The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
JOHN TSITSIKLIS: OK. We can start. Good morning. So we're now going to start a new unit. For the next couple of lectures, we will be talking about continuous random variables. This is new material, which is not going to be on the quiz. You are going to have a long break next week without any lecture, just a quiz, recitation, and tutorial. So what's going to happen in this new unit? Basically, we want to do everything that we did for discrete random variables: reintroduce the same sort of concepts, but see how they apply and how they need to be modified in order to talk about random variables that take continuous values. At some level, it's all the same. At some level, it's quite a bit harder, because when things are continuous, calculus comes in.
So the calculations that you have to do on the side sometimes need a little bit more thinking. In terms of new concepts, there's not going to be a whole lot today, mostly analogs of things we have already done. We're going to introduce the concept of cumulative distribution functions, which allow us to deal with discrete and continuous random variables, all of them in one shot. And finally, we'll introduce a famous kind of continuous random variable, the normal random variable. OK, so what's the story? Continuous random variables are random variables that take values over the continuum. So the numerical value of the random variable can be any real number; they don't take values just in a discrete set. So we have our sample space. The experiment happens. We get some omega, a sample point in the sample space. And once that point is determined, it determines the numerical value of the random variable. Remember, random variables are functions on the sample space. You pick a sample point, and this determines the numerical value of the random variable. So that numerical value is going to be some real number on that line.
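To make the point that a random variable is a function on the sample space concrete, here is a minimal Python sketch. The sample space (omega drawn uniformly from [0, 1)), the map X(omega) = 4*omega - 1, and the helper names are hypothetical choices for illustration, not from the lecture. The estimate works on the sample space, in the way the lecture describes next: it counts the outcomes whose value lands in [a, b].

```python
import random

# Hypothetical experiment: the sample point omega is drawn uniformly from [0, 1).
# Hypothetical random variable: X(omega) = 4 * omega - 1, a function on the
# sample space whose values fill the continuum [-1, 3).
def X(omega: float) -> float:
    return 4.0 * omega - 1.0

def estimate_interval_probability(a: float, b: float,
                                  trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P(a <= X <= b) on the sample space: run the experiment many
    times and count the outcomes omega whose value X(omega) lands in [a, b]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        omega = rng.random()        # one run of the experiment picks omega
        if a <= X(omega) <= b:      # this outcome belongs to the event
            hits += 1
    return hits / trials

print(estimate_interval_probability(0.0, 1.0))  # near 0.25 for this X
```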
Now we want to say something about the distribution of the random variable. We want to say which values are, in a certain sense, more likely than others to occur. For example, you may be interested in a particular event: the event that the random variable takes values in the interval from a to b. And we want to say something about the probability of that event. In principle, how is this done? You go back to the sample space, and you find all those outcomes for which the value of the random variable happens to be in that interval. The probability that the random variable falls here is the same as the probability of all outcomes that make the random variable fall in there. So in principle, you can work on the original sample space, find the probability of this event, and you would be done. But similar to what happened in chapter 2, we want to push the sample space into the background, work directly on the real axis, and talk about probabilities up here. So we now want a way to specify probabilities: how they are bunched together, or arranged, along the real line. So what did we do for discrete random variables?
We introduced PMFs, probability mass functions. And the way that we described the random variable was by saying this point has so much mass on top of it, that point has so much mass on top of it, and so on. So we had a total of 1 unit of probability, and we split it into different masses, which we put at different points on the real axis. That's what you do if somebody gives you a pound of discrete stuff, a pound of mass in little chunks, and you place those chunks at a few points. Now, in the continuous case, this total unit of probability mass does not sit just on discrete points but is spread all over the real axis. So now we're going to have a unit of mass that spreads on top of the real axis. How do we describe masses that are continuously spread? The way we describe them is by specifying densities. That is, how thick is the mass that's sitting here? How dense is the mass that's sitting there? So that's exactly what we're going to do. We're going to introduce the concept of a probability density function, which tells us how probabilities accumulate at different parts of the real axis.
So here's an example, a picture of a possible probability density function. What does that density function convey intuitively? Well, that these x's are relatively less likely to occur, and those x's are somewhat more likely to occur, because the density is higher there. Now, for a more formal definition, we're going to say that a random variable X is continuous if it can be described by a density function in the following sense. We have a density function, and we calculate the probability of falling inside an interval by finding the area under the curve that sits on top of that interval. So that's the defining relation for continuous random variables. It's an implicit definition: it tells us that a random variable is continuous if we can calculate probabilities this way. So the probability of falling in this interval is the area under this curve. Mathematically, it's the integral of the density over this particular interval: the probability that X is between a and b is the integral from a to b of f_X(x) dx.
If the density happens to be constant over that interval, the area under the curve is the length of the interval times the height of the density, which makes sense. But because the density is in general not constant and moves around, what you need is to write down an integral. Now, this formula is very much analogous to what you would do for discrete random variables. For a discrete random variable, how do you calculate this probability? You look at all x's in this interval, and you add the probability mass function over that range. So just for comparison, this would be the formula for the discrete case: the sum, over all x's in the interval from a to b, of the probability mass function. There is a syntactic analogy happening here, which will be a persistent theme when we deal with continuous random variables. Sums get replaced by integrals: in the discrete case, you add; in the continuous case, you integrate. And mass functions get replaced by density functions.
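The side-by-side formulas can be sketched in Python. This is a minimal illustration, assuming a hypothetical density f(x) = 2x on [0, 1] and a fair-die PMF, neither of which comes from the lecture. The continuous probability is an area, approximated here with a midpoint Riemann sum; the discrete one is an exact sum of PMF values.

```python
# Hypothetical density: f(x) = 2x on [0, 1], and 0 otherwise.
def pdf(x: float) -> float:
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def prob_continuous(f, a: float, b: float, n: int = 10_000) -> float:
    """P(a <= X <= b): area under the density over [a, b] (midpoint rule)."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

# Hypothetical PMF: a fair six-sided die.
pmf = {k: 1 / 6 for k in range(1, 7)}

def prob_discrete(p: dict, a: float, b: float) -> float:
    """P(a <= X <= b): add the PMF over all x's in [a, b]."""
    return sum(mass for x, mass in p.items() if a <= x <= b)

print(prob_continuous(pdf, 0.5, 1.0))  # integral of 2x from 0.5 to 1 is 0.75
print(prob_discrete(pmf, 2, 4))        # masses at 2, 3, 4 add to 0.5
```

The same helper with a = 0 and b = 1 returns the total area, which should be 1 for any valid density.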
So you can take pretty much any formula from the discrete case and translate it to a continuous analog of that formula, as we're going to see. OK. So let's take this now as our model. What is the probability that the random variable takes a specific value, if we have a continuous random variable? Well, this is the case of a trivial interval, where the two endpoints coincide. So it would be the integral from a to itself; you're integrating over just a single point. Now, when you integrate over a single point, the integral is just 0. The area under the curve, if you're only looking at a single point, is 0. So a big property of continuous random variables is that any individual point has 0 probability. In particular, when you look at the value of the density, the density does not tell you the probability of that point. The point itself has 0 probability. So the density tells you something a little different, and we are going to see shortly what that is. Before we get there, can the density be an arbitrary function?
Almost, but not quite. There are two things that we want. First, since densities are used to calculate probabilities, and since probabilities must be non-negative, the density should also be non-negative. Otherwise you would be getting negative probabilities, which is not a good thing. So that's a basic property that any density function should obey. The second property that we need is that the overall probability of the entire real line should be equal to 1. If you ask me what the probability is that X falls between minus infinity and plus infinity: well, we are sure that X is going to fall in that range, so the probability of that event should be 1. That means the integral of the density from minus infinity to plus infinity should be 1. So that just tells us that there's 1 unit of total probability being spread over our space. Now, what's the best way to think intuitively about what the density function does? The interpretation that I find most natural, and the easiest way to convey the meaning of a density, is to look at probabilities of small intervals.
So let us take an x somewhere here and then x plus delta just next to it, where delta is a small number. And let's look at the probability of the event that we get a value in that range. For continuous random variables, the way we find the probability of falling in that range is by integrating the density over that range. So we're drawing this picture, and we want to take the area under this curve. Now, what happens if delta is a fairly small number? If delta is pretty small, our density is not going to change much over that range, so you can pretend that the density is approximately constant. And so to find the area under the curve, you just take the base times the height. It doesn't matter exactly where in that interval you take the height, because the density doesn't change very much over that interval. So the integral becomes just base times height: for small intervals, the probability of the interval from x to x plus delta is approximately the density at x times delta. So densities essentially give us probabilities of small intervals.
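A quick numerical check of the small-interval approximation, again assuming the hypothetical density f(x) = 2x on [0, 1], whose exact interval probabilities follow from the antiderivative x squared. As delta shrinks, the exact probability goes to 0 (a single point has probability 0), while the probability divided by delta approaches the density value f(0.3) = 0.6.

```python
# Hypothetical density f(x) = 2x on [0, 1]; its CDF on [0, 1] is x**2,
# so interval probabilities can be computed exactly.
def pdf(x: float) -> float:
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def exact_prob(a: float, b: float) -> float:
    """Exact P(a <= X <= b) for this density, clipping [a, b] to [0, 1]."""
    a, b = max(a, 0.0), min(b, 1.0)
    return b * b - a * a if b > a else 0.0

x = 0.3
for delta in (0.1, 0.01, 0.001):
    exact = exact_prob(x, x + delta)   # area under the curve from x to x + delta
    approx = pdf(x) * delta            # base times height
    print(f"delta={delta}: exact={exact:.6f}  f(x)*delta={approx:.6f}  "
          f"exact/delta={exact / delta:.4f}")
```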
And if you want to think about it a little differently, you can take that delta from here and send it to the denominator there. What this tells you is that the density is probability per unit length, for intervals of small length. So the units of density are probability per unit length. Densities are not probabilities. They are rates at which probabilities accumulate: probabilities per unit length. And since densities are not probabilities, they don't have to be less than 1. Ordinary probabilities can never exceed 1, but a density is a different kind of thing. It can get pretty big in some places. It can even sort of blow up in some places. As long as the total area under the curve is 1 (and the density is non-negative), the curve can do anything that it wants. Now, the density prescribes for us the probability of intervals. Sometimes we may want to find the probability of more general sets. How would we do that? Well, for nice sets, you just integrate the density over that nice set. I'm not quite defining what "nice" means.
That's a pretty technical topic in the theory of probability. But for our purposes, we will usually take the set B to be something like a union of intervals. So how do you find the probability of falling in the union of two non-overlapping intervals? Well, you find the probability of falling in that interval plus the probability of falling in that interval. So it's the integral over this interval plus the integral over that interval, and you think of this as just integrating over the union of the two intervals. So once you can calculate probabilities of intervals, you are usually in business, and you can calculate anything else you might want. So the probability density function is a complete description of any statistical information we might be interested in for a continuous random variable. OK. So now we can start walking through the concepts and definitions that we have for discrete random variables and translate them to the continuous case. The first big concept is the expectation. One can start with a mathematical definition, and here we put down a definition by just translating notation.
Wherever we had a sum in the discrete case, we now write an integral. And wherever we had the probability mass function, we now throw in the probability density function: the expected value of X is the integral of x times f_X(x) dx. This formula, which you may have seen in freshman physics, again gives you the center of gravity of the picture that you have when you have the density. It's the center of gravity of the object sitting underneath the probability density function, so that interpretation still applies. It's also true that our conceptual interpretation of what an expectation means is still valid in this case. That is, if you repeat an experiment a zillion times, each time drawing an independent sample of your random variable X, then in the long run, the average that you get should be the expectation. One can reason in a hand-waving way, intuitively, the way we did it for the case of discrete random variables. But this is also a theorem of sorts: it's a limit theorem that we're going to visit later on in this class.
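A minimal sketch of both readings of the expectation for the hypothetical density f(x) = 2x on [0, 1], where E[X] = 2/3: the center-of-gravity integral of x f(x) dx (midpoint rule), and the long-run average of many independent samples. The sampler uses inverse-transform sampling (the square root of a uniform draw has CDF x squared), a technique not covered in this lecture and assumed here only to generate samples.

```python
import random

# Hypothetical density f(x) = 2x on [0, 1]; its exact mean is 2/3.
def pdf(x: float) -> float:
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def expectation(f, lo: float, hi: float, n: int = 100_000) -> float:
    """E[X] as the integral of x * f(x) dx over [lo, hi] (midpoint rule)."""
    w = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * w
        total += x * f(x)
    return total * w

def sample(rng: random.Random) -> float:
    """Inverse-transform sampling (an assumption for this sketch): if U is
    uniform on [0, 1), then sqrt(U) has CDF x**2, i.e. density 2x on [0, 1]."""
    return rng.random() ** 0.5

rng = random.Random(1)
samples = [sample(rng) for _ in range(200_000)]
print(expectation(pdf, 0.0, 1.0))     # center of gravity, close to 2/3
print(sum(samples) / len(samples))    # long-run average, also close to 2/3
```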
Having defined the expectation, and having claimed that the interpretation of the expectation is the same as before, we can start taking just about any formula you've seen before and translating it. For example, to find the expected value of a function g(X) of a continuous random variable, you do not have to find the PDF or PMF of g(X). You can just work directly with the original distribution of the random variable capital X: the expected value of g(X) is the integral of g(x) times f_X(x) dx. This formula is the same as for the discrete case: sums get replaced by integrals, and PMFs get replaced by PDFs. In particular, the variance of a random variable is defined the same way as before. The variance is the expected value, the average, of the squared distance of X from its mean. So it's the expected value of a random variable that takes these numerical values, and it's the same formula as before, with an integral instead of a summation and f instead of p. And the formulas that we have derived, or that you have seen for the discrete case, all go through in the continuous case. For example, the useful relation for variances, that the variance equals the expected value of X squared minus the square of the expected value of X, remains true.
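The same hypothetical density can be used to check that E[g(X)] is computed from the PDF of X alone, and that the variance from its definition agrees with the relation E[X squared] minus (E[X]) squared. Both come out near 1/18 for f(x) = 2x on [0, 1]. The function name expect_g is illustrative only.

```python
# Hypothetical density f(x) = 2x on [0, 1], reused as a running example.
def pdf(x: float) -> float:
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def expect_g(g, f, lo: float, hi: float, n: int = 100_000) -> float:
    """E[g(X)] as the integral of g(x) * f(x) dx, using only the PDF of X
    (the distribution of g(X) is never computed); midpoint rule."""
    w = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * w
        total += g(x) * f(x)
    return total * w

mean = expect_g(lambda x: x, pdf, 0.0, 1.0)                            # 2/3
var_def = expect_g(lambda x: (x - mean) ** 2, pdf, 0.0, 1.0)           # E[(X - E[X])^2]
var_identity = expect_g(lambda x: x * x, pdf, 0.0, 1.0) - mean ** 2    # E[X^2] - (E[X])^2
print(var_def, var_identity)  # both close to 1/18
```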
All right. So, time for an example. The simplest example of a continuous random variable is the so-called uniform random variable. The uniform random variable is described by a density which is 0 except over an interval, and over that interval, it is constant. What is it meant to convey? It's trying to convey the idea that all x's in this range are equally likely. Well, that doesn't say very much, since any individual x has 0 probability. So it's conveying a little more than that. What it is saying is that if I take an interval of a given length delta, and I take another interval of the same length delta, under the uniform distribution these two intervals are going to have the same probability. So being uniform means that intervals of the same length within the range have the same probability; no interval is more likely than any other to occur. And in that sense, it conveys the idea of complete randomness: any little interval in our range is as likely as any other little interval. All right. So what's the formula for this density?
I only told you the range. What's the height? Well, the area under the density must be equal to 1; total probability is equal to 1. And so the height, inescapably, is going to be 1 over (b minus a). That's the height that makes the density integrate to 1. So that's the formula. And if you don't want to lose a point on your exam, you have to say that it's 0 otherwise. OK. All right? That's the complete answer. How about the expected value of this random variable? OK. You can find the expected value in two different ways. One is to start with the definition: you integrate x times the density over the range of interest, and you figure out what that integral is going to be. Or you can be a little more clever. Since the center-of-gravity interpretation is still true, it must be the center of gravity of this picture, and the center of gravity is, of course, the midpoint, (a plus b) over 2. Whenever the PDF is symmetric, the mean is the center of symmetry of the diagram that gives you the PDF. OK.
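The uniform calculations can be checked numerically. This sketch (hypothetical helper names, midpoint-rule integration, and a hypothetical choice of a = 2 and b = 5) confirms that height 1/(b - a) integrates to 1, that the mean is the midpoint, and that the variance is (b - a) squared over 12, as derived in the lecture. It also shows that a uniform density on a short interval such as [0, 0.5] exceeds 1, which is allowed for densities.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Uniform density on [a, b]: height 1/(b - a) on the interval, 0 otherwise."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(h, lo: float, hi: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of h over [lo, hi]."""
    w = (hi - lo) / n
    return sum(h(lo + (i + 0.5) * w) for i in range(n)) * w

def uniform_summary(a: float, b: float):
    total = integrate(lambda x: uniform_pdf(x, a, b), a, b)                   # area: 1
    mean = integrate(lambda x: x * uniform_pdf(x, a, b), a, b)               # (a + b) / 2
    var = integrate(lambda x: (x - mean) ** 2 * uniform_pdf(x, a, b), a, b)  # (b - a)**2 / 12
    return total, mean, var, var ** 0.5                                      # std: (b - a) / sqrt(12)

print(uniform_summary(2.0, 5.0))    # approximately (1, 3.5, 0.75, 0.866)
print(uniform_pdf(0.25, 0.0, 0.5))  # 2.0: a valid density can exceed 1
```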
So that's the expected value of X. Finally, regarding the variance, there you will have to do a little bit of calculus. We can write down the definition. It's an integral instead of a sum: a typical value of the random variable minus the expected value, squared, times the density, and we integrate. You do this integral, and you find it's (b minus a) squared over 12. Maybe more interesting is the standard deviation itself, which is (b minus a) over the square root of 12. You see that the standard deviation is proportional to the width of that interval. This agrees with our intuition that the standard deviation is meant to capture a sense of how spread out our distribution is. And the standard deviation has the same units as the random variable itself, so you can interpret it in a reasonable way based on that picture. OK, yes. Now, let's go up one level and think about the following. We have formulas for the discrete case and formulas for the continuous case, and you can write them side by side. One has sums, the other has integrals.
Suppose you want to make an argument and say that something is true for every random variable. You would essentially need to do two separate proofs, one for discrete and one for continuous. Is there some way of dealing with all random variables in one shot, using a uniform notation? Is there a unifying concept? Luckily, there is one. It's the notion of the cumulative distribution function of a random variable. It's a concept that applies equally well to discrete and continuous random variables, so it's an object that we can use to describe distributions in both cases, using just one piece of notation. So what's the definition? It's the probability that the random variable takes values less than or equal to a certain number, little x: F_X(x) is the probability that X is less than or equal to x. So you go to the diagram, and you see what's the probability that I'm falling to the left of this point. And you specify those probabilities for all x's. In the continuous case, you calculate those probabilities using the integral formula, so you integrate from here up to x.
437 00:21:55,730 --> 00:21:58,850 In the discrete case, to find the probability to the left of 438 00:21:58,850 --> 00:22:02,790 some point, you go here, and you add probabilities again 439 00:22:02,790 --> 00:22:03,980 from the left. 440 00:22:03,980 --> 00:22:06,770 So the way that the cumulative distribution function is 441 00:22:06,770 --> 00:22:10,010 calculated is a little different in the continuous 442 00:22:10,010 --> 00:22:10,850 and discrete case. 443 00:22:10,850 --> 00:22:11,990 In one case you integrate. 444 00:22:11,990 --> 00:22:13,440 In the other, you sum. 445 00:22:13,440 --> 00:22:18,340 But leaving aside how it's being calculated, what the 446 00:22:18,340 --> 00:22:22,530 concept is, it's the same concept in both cases. 447 00:22:22,530 --> 00:22:25,810 So let's see what the shape of the cumulative distribution 448 00:22:25,810 --> 00:22:28,360 function would be in the two cases. 449 00:22:28,360 --> 00:22:34,100 So here what we want is to record for every little x the 450 00:22:34,100 --> 00:22:36,760 probability of falling to the left of x. 451 00:22:36,760 --> 00:22:38,240 So let's start here. 452 00:22:38,240 --> 00:22:41,580 Probability of falling to the left of here is 0-- 453 00:22:41,580 --> 00:22:43,550 0, 0, 0. 454 00:22:43,550 --> 00:22:47,280 Once we get here and we start moving to the right, the 455 00:22:47,280 --> 00:22:51,750 probability of falling to the left of here is the area of 456 00:22:51,750 --> 00:22:53,610 this little rectangle. 457 00:22:53,610 --> 00:22:57,590 And the area of that little rectangle increases linearly 458 00:22:57,590 --> 00:22:59,290 as I keep moving. 459 00:22:59,290 --> 00:23:03,780 So accordingly, the CDF increases linearly until I get 460 00:23:03,780 --> 00:23:04,870 to that point. 461 00:23:04,870 --> 00:23:08,670 At that point, what's the value of my CDF? 462 00:23:08,670 --> 00:23:09,020 1. 463 00:23:09,020 --> 00:23:11,400 I have accumulated all the probability there is. 
464 00:23:11,400 --> 00:23:13,180 I have integrated it. 465 00:23:13,180 --> 00:23:15,890 This total area has to be equal to 1. 466 00:23:15,890 --> 00:23:18,780 So it reaches 1, and then there's no more probability to 467 00:23:18,780 --> 00:23:20,040 be accumulated. 468 00:23:20,040 --> 00:23:23,170 It just stays at 1. 469 00:23:23,170 --> 00:23:28,050 So the value here is equal to 1. 470 00:23:28,050 --> 00:23:30,270 OK. 471 00:23:30,270 --> 00:23:36,716 How would you find the density if somebody gave you the CDF? 472 00:23:36,716 --> 00:23:39,570 The CDF is the integral of the density. 473 00:23:39,570 --> 00:23:43,820 Therefore, the density is the derivative of the CDF. 474 00:23:43,820 --> 00:23:46,190 So you look at this picture and take the derivative. 475 00:23:46,190 --> 00:23:48,580 Derivative is 0 here, 0 here. 476 00:23:48,580 --> 00:23:51,330 And it's a constant up there, which 477 00:23:51,330 --> 00:23:53,120 corresponds to that constant. 478 00:23:53,120 --> 00:23:56,900 So more generally, and an important thing to know, is 479 00:23:56,900 --> 00:24:04,250 that the derivative of the CDF is equal to the density-- 480 00:24:04,250 --> 00:24:10,210 481 00:24:10,210 --> 00:24:14,170 almost, with a little bit of an exception. 482 00:24:14,170 --> 00:24:15,800 What's the exception? 483 00:24:15,800 --> 00:24:19,200 At those places where the CDF does not have a derivative-- 484 00:24:19,200 --> 00:24:21,520 here where it has a corner-- 485 00:24:21,520 --> 00:24:23,720 the derivative is undefined. 486 00:24:23,720 --> 00:24:26,030 And in some sense, the density is also 487 00:24:26,030 --> 00:24:27,460 ambiguous at that point. 488 00:24:27,460 --> 00:24:31,860 Is my density at the endpoint, is it 0 or is it 1? 489 00:24:31,860 --> 00:24:33,330 It doesn't really matter. 
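To see the relation "density = derivative of the CDF" numerically, here is a small Python sketch for a uniform random variable on [a, b], with a = 0 and b = 2 as illustrative values. The CDF is 0, then a linear ramp, then 1, and a central-difference derivative recovers the constant density 1/(b - a) away from the corners.

```python
def uniform_cdf(x: float, a: float, b: float) -> float:
    """CDF of a uniform random variable on [a, b]: 0, then a linear ramp, then 1."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def numerical_density(x: float, a: float, b: float, h: float = 1e-6) -> float:
    """Central-difference derivative of the CDF; equals the PDF wherever the CDF is smooth."""
    return (uniform_cdf(x + h, a, b) - uniform_cdf(x - h, a, b)) / (2 * h)


a, b = 0.0, 2.0
for x in [-1.0, 0.0, 0.5, 1.5, 2.0, 3.0]:
    print(x, uniform_cdf(x, a, b), numerical_density(x, a, b))
# At the corners x = a and x = b the derivative does not exist; the central
# difference there just averages the two one-sided slopes (0 and 1/(b - a)).
```

The values printed at the corners are exactly the "ambiguous" points discussed in the lecture; changing the density there does not affect any integral.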
490 00:24:33,330 --> 00:24:36,670 If you change the density at just a single point, it's not 491 00:24:36,670 --> 00:24:39,000 going to affect the value of any 492 00:24:39,000 --> 00:24:41,530 integral you ever calculate. 493 00:24:41,530 --> 00:24:44,900 So the value of the density at the endpoint, you can leave it 494 00:24:44,900 --> 00:24:47,390 as being ambiguous, or you can specify it. 495 00:24:47,390 --> 00:24:49,130 It doesn't matter. 496 00:24:49,130 --> 00:24:53,590 So at all places where the CDF has a derivative, 497 00:24:53,590 --> 00:24:54,970 this will be true. 498 00:24:54,970 --> 00:24:58,470 At those places where you have corners, which do show up 499 00:24:58,470 --> 00:25:01,740 sometimes, well, you don't really care. 500 00:25:01,740 --> 00:25:03,640 How about the discrete case? 501 00:25:03,640 --> 00:25:07,450 In the discrete case, the CDF has a more peculiar shape. 502 00:25:07,450 --> 00:25:08,870 So let's do the calculation. 503 00:25:08,870 --> 00:25:10,440 We want to find the probability of being 504 00:25:10,440 --> 00:25:11,920 to the left of here. 505 00:25:11,920 --> 00:25:13,970 That probability is 0, 0, 0. 506 00:25:13,970 --> 00:25:16,170 Once we cross that point, the probability of being to the 507 00:25:16,170 --> 00:25:19,140 left of here is 1/6. 508 00:25:19,140 --> 00:25:22,030 So as soon as we cross the point 1, we get the 509 00:25:22,030 --> 00:25:25,740 probability of 1/6, which means that the size of the 510 00:25:25,740 --> 00:25:29,230 jump that we have here is 1/6. 511 00:25:29,230 --> 00:25:31,020 Now, question. 512 00:25:31,020 --> 00:25:35,175 At this point 1, which is the correct value of the CDF? 513 00:25:35,175 --> 00:25:39,090 Is it 0, or is it 1/6? 514 00:25:39,090 --> 00:25:40,560 It's 1/6 because-- 515 00:25:40,560 --> 00:25:42,540 you need to look carefully at the definition, the 516 00:25:42,540 --> 00:25:46,180 probability of capital X being less than or equal to little x.
517 00:25:46,180 --> 00:25:49,230 If I take little x to be 1, it's the probability that 518 00:25:49,230 --> 00:25:51,900 capital X is less than or equal to 1. 519 00:25:51,900 --> 00:25:55,730 So it includes the event that capital X is equal to 1. 520 00:25:55,730 --> 00:25:58,130 So it includes this probability here. 521 00:25:58,130 --> 00:26:02,710 So at jump points, the correct value of the CDF is going to 522 00:26:02,710 --> 00:26:04,650 be this one. 523 00:26:04,650 --> 00:26:08,130 And now as I trace, x is going to the right. 524 00:26:08,130 --> 00:26:12,750 As soon as I cross this point, I have added another 3/6 525 00:26:12,750 --> 00:26:14,180 probability. 526 00:26:14,180 --> 00:26:20,350 So that 3/6 causes a jump in the CDF. 527 00:26:20,350 --> 00:26:23,280 And that determines the new value. 528 00:26:23,280 --> 00:26:27,860 And finally, once I cross the last point, I get 529 00:26:27,860 --> 00:26:31,631 another jump of 2/6. 530 00:26:31,631 --> 00:26:35,900 A general moral from these two examples and these pictures. 531 00:26:35,900 --> 00:26:39,270 CDFs are well defined in both cases. 532 00:26:39,270 --> 00:26:42,490 For the case of continuous random variables, the CDF will 533 00:26:42,490 --> 00:26:45,000 be a continuous function. 534 00:26:45,000 --> 00:26:46,330 It starts from 0. 535 00:26:46,330 --> 00:26:49,760 It eventually goes to 1 and goes smoothly-- 536 00:26:49,760 --> 00:26:54,100 well, continuously from smaller to higher values. 537 00:26:54,100 --> 00:26:55,200 It can only go up. 538 00:26:55,200 --> 00:26:58,300 It cannot go down since we're accumulating more and more 539 00:26:58,300 --> 00:27:00,230 probability as we are going to the right. 540 00:27:00,230 --> 00:27:03,160 In the discrete case, again it starts from 0, 541 00:27:03,160 --> 00:27:04,610 and it goes to 1. 542 00:27:04,610 --> 00:27:07,740 But it does it in a staircase manner.
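The staircase CDF of the discrete example can be sketched in Python with exact fractions. The masses 1/6, 3/6, 2/6 come from the lecture, and the first mass sits at 1; the other two locations (2 and 4) are hypothetical placeholders, since the lecture only points at them on the diagram.

```python
from fractions import Fraction

# Masses 1/6, 3/6, 2/6 as in the example. Only the first location (1) is stated
# in the lecture; the locations 2 and 4 are hypothetical placeholders.
pmf = {1: Fraction(1, 6), 2: Fraction(3, 6), 4: Fraction(2, 6)}


def discrete_cdf(x: float) -> Fraction:
    """F(x) = P(X <= x): add the masses sitting at or to the left of x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))


for x in [0.5, 1, 1.5, 2, 3, 4, 5]:
    print(x, discrete_cdf(x))
```

Because the inequality in the definition is non-strict (`k <= x`), `discrete_cdf(1)` already returns 1/6: at a jump point the CDF takes the upper value, exactly as argued above.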
543 00:27:07,740 --> 00:27:13,050 And you get a jump at each place where the PMF assigns a 544 00:27:13,050 --> 00:27:14,660 positive mass. 545 00:27:14,660 --> 00:27:19,560 So jumps in the CDF are associated with point masses 546 00:27:19,560 --> 00:27:20,330 in our distribution. 547 00:27:20,330 --> 00:27:23,570 In the continuous case, we don't have any point masses, 548 00:27:23,570 --> 00:27:25,470 so we do not have any jumps either. 549 00:27:25,470 --> 00:27:30,390 550 00:27:30,390 --> 00:27:33,300 Now, besides saving us notation-- 551 00:27:33,300 --> 00:27:36,020 we don't have to deal with discrete 552 00:27:36,020 --> 00:27:39,000 and continuous twice-- 553 00:27:39,000 --> 00:27:43,240 CDFs actually give us a little more flexibility. 554 00:27:43,240 --> 00:27:46,840 Not all random variables are continuous or discrete. 555 00:27:46,840 --> 00:27:49,790 You can cook up random variables that are kind of 556 00:27:49,790 --> 00:27:53,410 neither or a mixture of the two. 557 00:27:53,410 --> 00:27:59,540 An example would be, let's say you play a game. 558 00:27:59,540 --> 00:28:03,620 And with a certain probability, you get a certain 559 00:28:03,620 --> 00:28:05,690 number of dollars in your hands. 560 00:28:05,690 --> 00:28:07,000 So you flip a coin. 561 00:28:07,000 --> 00:28:14,120 And with probability 1/2, you get a reward of 1/2 dollar. 562 00:28:14,120 --> 00:28:18,430 And with probability 1/2, you are led to a dark room where 563 00:28:18,430 --> 00:28:20,580 you spin a wheel of fortune. 564 00:28:20,580 --> 00:28:23,410 And that wheel of fortune gives you a random reward 565 00:28:23,410 --> 00:28:25,610 between 0 and 1. 566 00:28:25,610 --> 00:28:28,600 So any of these outcomes is possible. 567 00:28:28,600 --> 00:28:31,100 And the amount that you're going to get, 568 00:28:31,100 --> 00:28:33,930 let's say, is uniform. 569 00:28:33,930 --> 00:28:35,640 So you flip a coin.
570 00:28:35,640 --> 00:28:38,360 And depending on the outcome of the coin, either you get a 571 00:28:38,360 --> 00:28:43,530 certain value or you get a value that ranges over a 572 00:28:43,530 --> 00:28:45,360 continuous interval. 573 00:28:45,360 --> 00:28:48,380 So what kind of random variable is it? 574 00:28:48,380 --> 00:28:50,280 Is it continuous? 575 00:28:50,280 --> 00:28:54,100 Well, continuous random variables assign 0 probability 576 00:28:54,100 --> 00:28:56,180 to individual points. 577 00:28:56,180 --> 00:28:58,020 Is it the case here? 578 00:28:58,020 --> 00:29:00,680 No, because you have positive probability of 579 00:29:00,680 --> 00:29:04,740 obtaining 1/2 dollar. 580 00:29:04,740 --> 00:29:07,040 So our random variable is not continuous. 581 00:29:07,040 --> 00:29:08,220 Is it discrete? 582 00:29:08,220 --> 00:29:11,600 It's not discrete, because our random variable can take 583 00:29:11,600 --> 00:29:14,260 values also over a continuous range. 584 00:29:14,260 --> 00:29:16,780 So we call such a random variable a 585 00:29:16,780 --> 00:29:19,380 mixed random variable. 586 00:29:19,380 --> 00:29:27,200 If you were to draw its distribution very loosely, 587 00:29:27,200 --> 00:29:33,740 probably you would want to draw a picture like this one, 588 00:29:33,740 --> 00:29:36,710 which kind of conveys the idea of what's going on. 589 00:29:36,710 --> 00:29:39,690 So just think of this as a drawing of masses that are 590 00:29:39,690 --> 00:29:41,840 sitting over a table. 591 00:29:41,840 --> 00:29:47,940 We place an object that weighs half a pound, but it's an 592 00:29:47,940 --> 00:29:50,230 object that takes zero space. 593 00:29:50,230 --> 00:29:53,720 So half a pound is just sitting on top of that point. 594 00:29:53,720 --> 00:29:57,980 And we take another half-pound of probability and spread it 595 00:29:57,980 --> 00:30:00,740 uniformly over that interval. 
596 00:30:00,740 --> 00:30:04,820 So this is like a piece that comes from mass functions. 597 00:30:04,820 --> 00:30:08,060 And that's a piece that looks more like a density function. 598 00:30:08,060 --> 00:30:10,920 And we just throw them together in the picture. 599 00:30:10,920 --> 00:30:13,150 I'm not trying to associate any formal 600 00:30:13,150 --> 00:30:14,310 meaning with this picture. 601 00:30:14,310 --> 00:30:18,410 It's just a schematic of how probabilities are distributed, 602 00:30:18,410 --> 00:30:20,860 to help us visualize what's going on. 603 00:30:20,860 --> 00:30:26,080 Now, if you have taken classes on systems and all of that, 604 00:30:26,080 --> 00:30:29,890 you may have seen the concept of an impulse function. 605 00:30:29,890 --> 00:30:33,630 And you may start saying that, oh, I should treat this 606 00:30:33,630 --> 00:30:36,190 mathematically as a so-called impulse function. 607 00:30:36,190 --> 00:30:39,400 But we do not need this for our purposes in this class. 608 00:30:39,400 --> 00:30:43,860 Just think of this as a nice picture that conveys what's 609 00:30:43,860 --> 00:30:46,200 going on in this particular case. 610 00:30:46,200 --> 00:30:51,740 So now, what would the CDF look like in this case? 611 00:30:51,740 --> 00:30:55,550 The CDF is always well defined, no matter what kind 612 00:30:55,550 --> 00:30:57,220 of random variable you have. 613 00:30:57,220 --> 00:30:59,540 So the fact that it's neither continuous nor discrete 614 00:30:59,540 --> 00:31:01,870 shouldn't be a problem as long as we can calculate 615 00:31:01,870 --> 00:31:04,120 probabilities of this kind. 616 00:31:04,120 --> 00:31:07,600 So the probability of falling to the left here is 0. 617 00:31:07,600 --> 00:31:10,850 Once I start crossing there, the probability of falling to 618 00:31:10,850 --> 00:31:13,890 the left of a point increases linearly with 619 00:31:13,890 --> 00:31:15,610 how far I have gone.
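The coin-then-wheel game and the CDF being traced here can be sketched together in Python. The reward values (a fixed 1/2 dollar on heads, a uniform reward on [0, 1] on tails) come from the lecture; the seed and sample size are arbitrary. The piecewise formula is a linear ramp of slope 1/2 on [0, 1] plus a jump of 1/2 at the point mass, and the simulation compares it with an empirical CDF.

```python
import random


def play_game(rng: random.Random) -> float:
    """One play: heads pays a fixed 1/2 dollar, tails spins a uniform [0, 1] wheel."""
    if rng.random() < 0.5:
        return 0.5
    return rng.uniform(0.0, 1.0)


def mixed_cdf(x: float) -> float:
    """P(X <= x): mass 1/2 spread uniformly on [0, 1] plus a point mass 1/2 at x = 1/2."""
    continuous_part = 0.5 * min(max(x, 0.0), 1.0)   # linear ramp with slope 1/2 on [0, 1]
    point_part = 0.5 if x >= 0.5 else 0.0           # jump of 1/2 at the point mass
    return continuous_part + point_part


rng = random.Random(0)                 # fixed seed so the run is reproducible
samples = [play_game(rng) for _ in range(100_000)]

# About half the samples equal 0.5 exactly, so X is not continuous...
frac_at_half = sum(s == 0.5 for s in samples) / len(samples)
# ...yet X also takes very many distinct values, so it is not discrete either.
n_distinct = len(set(samples))

# Compare the empirical CDF with the formula at a few points.
for x in [0.25, 0.4999, 0.5, 0.75, 1.0]:
    empirical = sum(s <= x for s in samples) / len(samples)
    print(x, round(empirical, 3), mixed_cdf(x))
print(frac_at_half, n_distinct)
```

The empirical CDF jumps by about 1/2 between 0.4999 and 0.5, matching the jump in the formula.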
620 00:31:15,610 --> 00:31:17,900 So we get this linear increase. 621 00:31:17,900 --> 00:31:21,250 But as soon as I cross that point, I accumulate another 622 00:31:21,250 --> 00:31:24,220 1/2 unit of probability instantly. 623 00:31:24,220 --> 00:31:27,860 And once I accumulate that 1/2 unit, it means that my CDF is 624 00:31:27,860 --> 00:31:30,320 going to have a jump of 1/2. 625 00:31:30,320 --> 00:31:33,780 And then afterwards, I still keep accumulating probability 626 00:31:33,780 --> 00:31:36,760 at a fixed rate, the rate being the density. 627 00:31:36,760 --> 00:31:39,640 And I keep accumulating, again, at a linear rate until 628 00:31:39,640 --> 00:31:42,160 I settle at 1. 629 00:31:42,160 --> 00:31:46,240 So this is a CDF that has certain pieces where it 630 00:31:46,240 --> 00:31:48,060 increases continuously. 631 00:31:48,060 --> 00:31:50,280 And that corresponds to the continuous part of our 632 00:31:50,280 --> 00:31:51,390 random variable. 633 00:31:51,390 --> 00:31:55,090 And it also has some places where it has discrete jumps. 634 00:31:55,090 --> 00:31:57,500 And those discrete jumps correspond to places in which 635 00:31:57,500 --> 00:32:00,990 we have placed a positive mass. 636 00:32:00,990 --> 00:32:01,780 And by the-- 637 00:32:01,780 --> 00:32:03,750 OK, yeah. 638 00:32:03,750 --> 00:32:06,580 So this little 0 shouldn't be there. 639 00:32:06,580 --> 00:32:08,040 So let's cross it out. 640 00:32:08,040 --> 00:32:10,980 641 00:32:10,980 --> 00:32:11,780 All right. 642 00:32:11,780 --> 00:32:15,830 So finally, we're going to take the remaining time and 643 00:32:15,830 --> 00:32:17,610 introduce our new friend. 644 00:32:17,610 --> 00:32:23,080 It's going to be the Gaussian or normal distribution. 645 00:32:23,080 --> 00:32:27,690 So it's the most important distribution there is in all 646 00:32:27,690 --> 00:32:28,940 of probability theory. 647 00:32:28,940 --> 00:32:31,230 It plays a very central role.
648 00:32:31,230 --> 00:32:34,340 It shows up all over the place. 649 00:32:34,340 --> 00:32:37,870 We'll see later in the class in more detail 650 00:32:37,870 --> 00:32:39,450 why it shows up. 651 00:32:39,450 --> 00:32:42,115 But the quick preview is the following. 652 00:32:42,115 --> 00:32:46,220 If you have a phenomenon in which you measure a certain 653 00:32:46,220 --> 00:32:50,970 quantity, but that quantity is made up of lots and lots of 654 00:32:50,970 --> 00:32:52,820 random contributions-- 655 00:32:52,820 --> 00:32:55,870 so your random variable is actually the sum of lots and 656 00:32:55,870 --> 00:32:59,570 lots of independent little random variables-- 657 00:32:59,570 --> 00:33:04,290 then invariably, no matter what kind of distribution the 658 00:33:04,290 --> 00:33:08,260 little random variables have, their sum will turn out to 659 00:33:08,260 --> 00:33:11,500 have approximately a normal distribution. 660 00:33:11,500 --> 00:33:14,490 So this makes the normal distribution arise very 661 00:33:14,490 --> 00:33:16,680 naturally in lots and lots of contexts. 662 00:33:16,680 --> 00:33:21,210 Whenever you have noise that's composed of lots of different 663 00:33:21,210 --> 00:33:26,310 independent pieces of noise, then the end result will be a 664 00:33:26,310 --> 00:33:28,650 random variable that's approximately normal. 665 00:33:28,650 --> 00:33:31,250 So we are going to come back to that topic later. 666 00:33:31,250 --> 00:33:34,620 But that's the preview comment, basically to argue 667 00:33:34,620 --> 00:33:37,430 that it's an important one. 668 00:33:37,430 --> 00:33:37,690 OK. 669 00:33:37,690 --> 00:33:38,810 And there's a special case.
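The lecture only previews this effect, but the binomial special case discussed next is easy to probe numerically. A Python sketch, with n = 100 fair coin flips and the threshold 55 chosen arbitrarily, compares the exact binomial CDF with a normal that has the same mean np and variance np(1 - p). The +0.5 continuity correction is a common refinement that is not part of the lecture.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 100, 0.5, 55      # illustrative values: 100 fair coin flips, P(S <= 55)

# Exact binomial CDF: sum of the PMF up to k.
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal with the same mean n*p and variance n*p*(1-p); the +0.5 is a
# continuity correction, a common refinement not discussed in the lecture.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k + 0.5)

print(round(exact, 4), round(approx, 4))
```

With 100 flips the two numbers agree to a few decimal places, even though each flip is far from normal on its own.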
670 00:33:38,810 --> 00:33:41,030 If you are dealing with a binomial random variable, which 671 00:33:41,030 --> 00:33:44,610 is the sum of lots of Bernoulli random variables, 672 00:33:44,610 --> 00:33:47,200 again you would expect that the binomial would start 673 00:33:47,200 --> 00:33:51,170 looking like a normal if you have many, many-- a large 674 00:33:51,170 --> 00:33:53,150 number of coin flips. 675 00:33:53,150 --> 00:33:53,530 All right. 676 00:33:53,530 --> 00:33:56,560 So what's the math involved here? 677 00:33:56,560 --> 00:34:02,370 Let's parse the formula for the density of the normal. 678 00:34:02,370 --> 00:34:07,110 What we start with is the function X squared over 2. 679 00:34:07,110 --> 00:34:09,750 And if you were to plot X squared over 2, it's a 680 00:34:09,750 --> 00:34:12,840 parabola, and it has this shape -- 681 00:34:12,840 --> 00:34:14,860 X squared over 2. 682 00:34:14,860 --> 00:34:16,790 Then what do we do? 683 00:34:16,790 --> 00:34:20,210 We take the negative exponential of this. 684 00:34:20,210 --> 00:34:24,600 So when X squared over 2 is 0, then the negative 685 00:34:24,600 --> 00:34:28,980 exponential is 1. 686 00:34:28,980 --> 00:34:32,739 When X squared over 2 increases, the negative 687 00:34:32,739 --> 00:34:37,130 exponential of that falls off, and it falls off pretty fast. 688 00:34:37,130 --> 00:34:39,630 So as this goes up, the formula for the 689 00:34:39,630 --> 00:34:41,150 density goes down. 690 00:34:41,150 --> 00:34:45,060 And because exponentials are pretty strong in how quickly 691 00:34:45,060 --> 00:34:49,530 they fall off, this means that the tails of this distribution 692 00:34:49,530 --> 00:34:53,370 actually do go down pretty fast. 693 00:34:53,370 --> 00:34:53,659 OK. 694 00:34:53,659 --> 00:34:57,800 So that explains the shape of the normal PDF. 695 00:34:57,800 --> 00:35:02,340 How about this factor 1 over square root 2 pi? 696 00:35:02,340 --> 00:35:05,540 Where does this come from?
697 00:35:05,540 --> 00:35:08,760 Well, the integral has to be equal to 1. 698 00:35:08,760 --> 00:35:14,620 So you have to go and do your calculus exercise and find the 699 00:35:14,620 --> 00:35:18,350 integral of this e to the minus X squared over 2 function and 700 00:35:18,350 --> 00:35:22,240 then figure out, what constant do I need to put in front so 701 00:35:22,240 --> 00:35:24,250 that the integral is equal to 1? 702 00:35:24,250 --> 00:35:26,820 How do you evaluate that integral? 703 00:35:26,820 --> 00:35:30,760 Either you go to Mathematica or Wolfram Alpha or 704 00:35:30,760 --> 00:35:33,340 whatever, and it tells you what it is. 705 00:35:33,340 --> 00:35:37,260 Or it's a very beautiful calculus exercise that you may 706 00:35:37,260 --> 00:35:39,050 have seen at some point. 707 00:35:39,050 --> 00:35:42,190 You throw in another exponential of this kind, you 708 00:35:42,190 --> 00:35:46,520 bring in polar coordinates, and somehow the answer comes 709 00:35:46,520 --> 00:35:48,010 beautifully out there. 710 00:35:48,010 --> 00:35:51,910 But in any case, this is the constant that you need to make 711 00:35:51,910 --> 00:35:56,070 it integrate to 1 and to be a legitimate density. 712 00:35:56,070 --> 00:35:58,550 We call this the standard normal. 713 00:35:58,550 --> 00:36:02,280 And for the standard normal, what is the expected value? 714 00:36:02,280 --> 00:36:05,780 Well, by symmetry, it's equal to 0. 715 00:36:05,780 --> 00:36:07,490 What is the variance? 716 00:36:07,490 --> 00:36:09,740 Well, here there's no shortcut. 717 00:36:09,740 --> 00:36:12,490 You have to do another calculus exercise. 718 00:36:12,490 --> 00:36:17,080 And you find that the variance is equal to 1. 719 00:36:17,080 --> 00:36:17,750 OK. 720 00:36:17,750 --> 00:36:21,720 So this is a normal that's centered around 0. 721 00:36:21,720 --> 00:36:24,990 How about other types of normals that are centered at 722 00:36:24,990 --> 00:36:26,760 different places?
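Besides Mathematica or the polar-coordinates trick, the three claims about the standard normal can be checked numerically. A Python sketch (the cutoff at plus or minus 10 and the grid size are arbitrary choices): a midpoint sum confirms that e^(-x^2/2) / sqrt(2 pi) integrates to 1, has mean 0, and has variance 1.

```python
import math


def std_normal_pdf(x: float) -> float:
    """Standard normal density: e^(-x^2/2) divided by sqrt(2*pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Midpoint sum over [-10, 10]; the tails beyond that are negligible because
# e^(-x^2/2) falls off so fast.
lo, hi, n = -10.0, 10.0, 200_000
dx = (hi - lo) / n
xs = [lo + (i + 0.5) * dx for i in range(n)]

total = sum(std_normal_pdf(x) * dx for x in xs)         # should be 1
mean = sum(x * std_normal_pdf(x) * dx for x in xs)      # 0 by symmetry
var = sum(x * x * std_normal_pdf(x) * dx for x in xs)   # should be 1
print(total, mean, var)
```

Dropping the 1/sqrt(2 pi) factor from `std_normal_pdf` makes `total` come out near 2.5066, which is sqrt(2 pi).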
723 00:36:26,760 --> 00:36:29,730 So we can do the same kind of thing. 724 00:36:29,730 --> 00:36:34,080 Instead of centering it at 0, we can take some place where 725 00:36:34,080 --> 00:36:39,640 we want to center it, write down a quadratic such as (X 726 00:36:39,640 --> 00:36:44,050 minus mu) squared, and then take the negative 727 00:36:44,050 --> 00:36:45,940 exponential of that. 728 00:36:45,940 --> 00:36:53,790 And that gives us a normal density that's centered at mu. 729 00:36:53,790 --> 00:37:01,190 Now, I may wish to control the width of my density. 730 00:37:01,190 --> 00:37:04,820 To control the width of my density, equivalently I can 731 00:37:04,820 --> 00:37:07,720 control the width of my parabola. 732 00:37:07,720 --> 00:37:15,430 If my parabola is narrower, if my parabola looks like this, 733 00:37:15,430 --> 00:37:17,990 what's going to happen to the density? 734 00:37:17,990 --> 00:37:20,550 It's going to fall off much faster. 735 00:37:20,550 --> 00:37:26,620 736 00:37:26,620 --> 00:37:26,920 OK. 737 00:37:26,920 --> 00:37:31,150 How do I make my parabola narrower or wider? 738 00:37:31,150 --> 00:37:35,300 I do it by putting in a constant down here. 739 00:37:35,300 --> 00:37:39,890 So by putting a sigma here, this stretches or widens my 740 00:37:39,890 --> 00:37:42,840 parabola by a factor of sigma. 741 00:37:42,840 --> 00:37:43,540 Let's see. 742 00:37:43,540 --> 00:37:44,780 Which way does it go? 743 00:37:44,780 --> 00:37:49,330 If sigma is very small, this is a big number. 744 00:37:49,330 --> 00:37:55,080 My parabola goes up quickly, which means my normal falls 745 00:37:55,080 --> 00:37:56,730 off very fast. 746 00:37:56,730 --> 00:38:02,630 So small sigma corresponds to a narrower density. 747 00:38:02,630 --> 00:38:08,870 And so it, therefore, should be intuitive that the standard 748 00:38:08,870 --> 00:38:11,520 deviation is proportional to sigma. 
749 00:38:11,520 --> 00:38:13,380 Because that's the amount by which you 750 00:38:13,380 --> 00:38:15,080 are scaling the picture. 751 00:38:15,080 --> 00:38:17,320 And indeed, the standard deviation is sigma. 752 00:38:17,320 --> 00:38:21,470 And so the variance is sigma squared. 753 00:38:21,470 --> 00:38:26,590 So all that we have done here to create a general normal 754 00:38:26,590 --> 00:38:31,180 with a given mean and variance is to take this picture, shift 755 00:38:31,180 --> 00:38:35,600 it in space so that the mean sits at mu instead of 0, and 756 00:38:35,600 --> 00:38:38,880 then scale it by a factor of sigma. 757 00:38:38,880 --> 00:38:41,130 This gives us a normal with a given 758 00:38:41,130 --> 00:38:42,560 mean and a given variance. 759 00:38:42,560 --> 00:38:47,670 And the formula for it is this one. 760 00:38:47,670 --> 00:38:48,810 All right. 761 00:38:48,810 --> 00:38:52,230 Now, normal random variables have some wonderful 762 00:38:52,230 --> 00:38:54,160 properties. 763 00:38:54,160 --> 00:39:00,190 And one of them is that they behave nicely when you take 764 00:39:00,190 --> 00:39:02,740 linear functions of them. 765 00:39:02,740 --> 00:39:07,190 So let's fix some constants a and b, suppose that X is 766 00:39:07,190 --> 00:39:13,840 normal, and look at this linear function Y. 767 00:39:13,840 --> 00:39:17,340 What is the expected value of Y? 768 00:39:17,340 --> 00:39:19,220 Here we don't need anything special. 769 00:39:19,220 --> 00:39:22,920 We know that the expected value of a linear function is 770 00:39:22,920 --> 00:39:26,690 the linear function of the expectation. 771 00:39:26,690 --> 00:39:30,570 So the expected value is this. 772 00:39:30,570 --> 00:39:33,230 How about the variance? 773 00:39:33,230 --> 00:39:36,430 We know that the variance of a linear function doesn't care 774 00:39:36,430 --> 00:39:37,910 about the constant term. 775 00:39:37,910 --> 00:39:40,880 But the variance gets multiplied by a squared. 
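The general density arrived at above is f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). A short Python sketch of that formula, with illustrative values mu = 2 and sigma = 4, cross-checked against the standard library's `statistics.NormalDist` and against the shift-and-scale view: the standard normal density evaluated at (x - mu)/sigma, then divided by sigma.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 2.0, 4.0   # illustrative parameters
for x in [-6.0, 0.0, 2.0, 5.0]:
    # Shift-and-scale view: standard normal density at the standardized point,
    # divided by sigma so that the total area stays equal to 1.
    shifted_scaled = NormalDist().pdf((x - mu) / sigma) / sigma
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x), shifted_scaled)
```

All three columns agree, which is the "shift by mu, then scale by sigma" construction written out.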
776 00:39:40,880 --> 00:39:46,880 So we get this variance, where sigma squared is the 777 00:39:46,880 --> 00:39:49,070 variance of the original normal. 778 00:39:49,070 --> 00:39:53,730 So have we used so far the property that X is normal? 779 00:39:53,730 --> 00:39:55,170 No, we haven't. 780 00:39:55,170 --> 00:39:59,650 This calculation here is true in general when you take a 781 00:39:59,650 --> 00:40:02,650 linear function of a random variable. 782 00:40:02,650 --> 00:40:08,730 But if X is normal, we get the additional fact that Y 783 00:40:08,730 --> 00:40:10,930 is also going to be normal. 784 00:40:10,930 --> 00:40:14,300 So that's the nontrivial part of the fact that 785 00:40:14,300 --> 00:40:16,070 I'm claiming here. 786 00:40:16,070 --> 00:40:19,700 So linear functions of normal random variables are 787 00:40:19,700 --> 00:40:23,020 themselves normal. 788 00:40:23,020 --> 00:40:26,680 How do we convince ourselves about it? 789 00:40:26,680 --> 00:40:27,080 OK. 790 00:40:27,080 --> 00:40:31,390 It's something that we will do formally in about two or three 791 00:40:31,390 --> 00:40:33,390 lectures from today. 792 00:40:33,390 --> 00:40:35,310 So we're going to prove it. 793 00:40:35,310 --> 00:40:39,770 But if you think about it intuitively, normal means this 794 00:40:39,770 --> 00:40:42,070 particular bell-shaped curve. 795 00:40:42,070 --> 00:40:45,550 And that bell-shaped curve could be sitting anywhere and 796 00:40:45,550 --> 00:40:47,910 could be scaled in any way. 797 00:40:47,910 --> 00:40:51,190 So you start with a bell-shaped curve. 798 00:40:51,190 --> 00:40:55,370 If you take X, which is bell shaped, and you multiply it by 799 00:40:55,370 --> 00:40:57,500 a constant, what does that do? 800 00:40:57,500 --> 00:41:01,260 Multiplying by a constant is just like scaling the axis or 801 00:41:01,260 --> 00:41:03,750 changing the units with which you're measuring it.
802 00:41:03,750 --> 00:41:08,880 So it will take a bell shape and spread it or narrow it. 803 00:41:08,880 --> 00:41:10,850 But it will still be a bell shape. 804 00:41:10,850 --> 00:41:13,440 And then when you add the constant, you just take that 805 00:41:13,440 --> 00:41:16,260 bell and move it elsewhere. 806 00:41:16,260 --> 00:41:19,970 So under linear transformations, bell shapes 807 00:41:19,970 --> 00:41:23,360 will remain bell shapes, just sitting at a different place 808 00:41:23,360 --> 00:41:25,090 and with a different width. 809 00:41:25,090 --> 00:41:30,490 And that's sort of the intuition of why normals remain normals 810 00:41:30,490 --> 00:41:32,096 under this kind of transformation. 811 00:41:32,096 --> 00:41:35,100 812 00:41:35,100 --> 00:41:36,770 So why is this useful? 813 00:41:36,770 --> 00:41:37,960 Well, OK. 814 00:41:37,960 --> 00:41:39,890 We have a formula for the density. 815 00:41:39,890 --> 00:41:43,750 But usually we want to calculate probabilities. 816 00:41:43,750 --> 00:41:45,760 How will you calculate probabilities? 817 00:41:45,760 --> 00:41:48,670 If I ask you, what's the probability that the normal is 818 00:41:48,670 --> 00:41:51,380 less than 3, how do you find it? 819 00:41:51,380 --> 00:41:54,830 You need to integrate the density from minus 820 00:41:54,830 --> 00:41:57,300 infinity up to 3. 821 00:41:57,300 --> 00:42:03,230 Unfortunately, the integral of the expression that shows up 822 00:42:03,230 --> 00:42:06,720 that you would have to calculate, an integral of this 823 00:42:06,720 --> 00:42:12,690 kind from, let's say, minus infinity to some number, is 824 00:42:12,690 --> 00:42:16,270 something that's not known in closed form.
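Python's standard-library `statistics.NormalDist` happens to encode the linear-transformation rule: multiplying a `NormalDist` by a constant, or adding a constant, returns another `NormalDist` with the transformed mean and scaled standard deviation. A sketch with illustrative values (X with mean 1 and standard deviation 2, a = 3, b = -5; a is nonzero, so Y is not a constant), plus a simulation check. Note the simulation only checks the mean and standard deviation; the normality of Y is the nontrivial part that the lecture defers to a later proof.

```python
from statistics import NormalDist

X = NormalDist(mu=1.0, sigma=2.0)   # illustrative X: mean 1, variance 4
a, b = 3.0, -5.0                     # illustrative constants for Y = aX + b (a != 0)

# Scaling by and adding constants returns another NormalDist.
Y = X * a + b
print(Y.mean, Y.variance)            # -2.0 36.0, i.e. a*mu + b and a**2 * sigma**2

# Cross-check with simulation: samples of aX + b should match Y's mean and stdev.
samples = [a * x + b for x in X.samples(100_000, seed=1)]
fitted = NormalDist.from_samples(samples)
print(fitted.mean, fitted.stdev)
```

The mean and variance formulas hold for any random variable; what `NormalDist` adds is the assumption, justified by the claim in the lecture, that the result is still normal.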
825 00:42:16,270 --> 00:42:23,490 So if you're looking for a closed-form formula for this-- 826 00:42:23,490 --> 00:42:25,040 X bar-- 827 00:42:25,040 --> 00:42:27,890 if you're looking for a closed-form formula that gives 828 00:42:27,890 --> 00:42:32,010 you the value of this integral as a function of X bar, you're 829 00:42:32,010 --> 00:42:34,460 not going to find it. 830 00:42:34,460 --> 00:42:36,150 So what can we do? 831 00:42:36,150 --> 00:42:38,790 Well, since it's a useful integral, we can 832 00:42:38,790 --> 00:42:40,880 just tabulate it. 833 00:42:40,880 --> 00:42:46,070 Calculate it once and for all, for all values of X bar up to 834 00:42:46,070 --> 00:42:50,440 some precision, and have that table, and use it. 835 00:42:50,440 --> 00:42:53,010 That's what one does. 836 00:42:53,010 --> 00:42:54,885 OK, but now there is a catch. 837 00:42:54,885 --> 00:42:59,600 Are we going to write down a table for every conceivable 838 00:42:59,600 --> 00:43:01,870 type of normal distribution-- 839 00:43:01,870 --> 00:43:05,115 that is, for every possible mean and every variance? 840 00:43:05,115 --> 00:43:07,400 I guess that would be a pretty long table. 841 00:43:07,400 --> 00:43:09,540 You don't want to do that. 842 00:43:09,540 --> 00:43:12,820 Fortunately, it's enough to have a table with the 843 00:43:12,820 --> 00:43:17,590 numerical values only for the standard normal. 844 00:43:17,590 --> 00:43:20,880 And once you have those, you can use them in a clever way 845 00:43:20,880 --> 00:43:24,000 to calculate probabilities for the more general case. 846 00:43:24,000 --> 00:43:26,090 So let's see how this is done. 
847 00:43:26,090 --> 00:43:30,610 So our starting point is that someone has graciously 848 00:43:30,610 --> 00:43:36,520 calculated for us the values of the CDF, the cumulative 849 00:43:36,520 --> 00:43:40,350 distribution function, that is, the probability of falling 850 00:43:40,350 --> 00:43:44,120 below a certain point, for the standard normal 851 00:43:44,120 --> 00:43:46,610 at various places. 852 00:43:46,610 --> 00:43:48,770 How do we read this table? 853 00:43:48,770 --> 00:43:55,840 The probability that X is less than, let's say, 854 00:43:55,840 --> 00:43:59,170 0.63 is this number. 855 00:43:59,170 --> 00:44:04,610 This number, 0.7357, is the probability that the standard 856 00:44:04,610 --> 00:44:08,070 normal is below 0.63. 857 00:44:08,070 --> 00:44:11,377 So the table refers to the standard normal. 858 00:44:11,377 --> 00:44:15,990 859 00:44:15,990 --> 00:44:19,600 But someone, let's say, gives us some other numbers and 860 00:44:19,600 --> 00:44:22,140 tells us we're dealing with a normal with a certain mean and 861 00:44:22,140 --> 00:44:23,530 a certain variance. 862 00:44:23,530 --> 00:44:26,555 And we want to calculate the probability that the value of 863 00:44:26,555 --> 00:44:28,740 that random variable is less than or equal to 3. 864 00:44:28,740 --> 00:44:30,470 How are we going to do it? 865 00:44:30,470 --> 00:44:36,210 Well, there's a standard trick, which is called 866 00:44:36,210 --> 00:44:39,480 standardizing a random variable. 867 00:44:39,480 --> 00:44:41,350 Standardizing a random variable 868 00:44:41,350 --> 00:44:43,080 stands for the following. 869 00:44:43,080 --> 00:44:44,490 You look at the random variable, and you 870 00:44:44,490 --> 00:44:46,280 subtract the mean. 871 00:44:46,280 --> 00:44:50,690 This makes it a random variable with 0 mean. 872 00:44:50,690 --> 00:44:54,270 And then if I divide by the standard deviation, what 873 00:44:54,270 --> 00:44:58,220 happens to the variance of this random variable?
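The function being tabulated, Phi(x) = P(Z <= x) for a standard normal Z, has no expression in elementary functions, but it can be written in terms of the error function erf, which Python's `math.erf` evaluates numerically. A sketch that builds a small table in the usual layout (rows for the first decimal, columns for the second; the range 0 to 3.99 is an arbitrary choice) and reads off the entry for 0.63.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF, written in terms of the error function erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Grid 0.00, 0.01, ..., 3.99: row r gives the first decimal, column c the second.
grid = [round(r / 10 + c / 100, 2) for r in range(40) for c in range(10)]
table = {x: round(phi(x), 4) for x in grid}

print(table[0.63])   # 0.7357, the entry read off in the lecture
```

Negative arguments are not stored, because symmetry of the standard normal gives Phi(-x) = 1 - Phi(x).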
874 00:44:58,220 --> 00:45:03,860 Dividing by sigma divides the variance by sigma squared. 875 00:45:03,860 --> 00:45:07,300 The original variance of X was sigma squared. 876 00:45:07,300 --> 00:45:11,740 So when I divide by sigma, I end up with unit variance. 877 00:45:11,740 --> 00:45:14,920 So after I do this transformation, I get a random 878 00:45:14,920 --> 00:45:19,190 variable that has 0 mean and unit variance. 879 00:45:19,190 --> 00:45:20,650 It is also normal. 880 00:45:20,650 --> 00:45:23,580 Why is it normal? 881 00:45:23,580 --> 00:45:28,890 Because this expression is a linear function of the X that 882 00:45:28,890 --> 00:45:30,120 I started with. 883 00:45:30,120 --> 00:45:32,700 It's a linear function of a normal random variable. 884 00:45:32,700 --> 00:45:34,620 Therefore, it is normal. 885 00:45:34,620 --> 00:45:37,090 And it is a standard normal. 886 00:45:37,090 --> 00:45:41,460 So by taking a general normal random variable and doing this 887 00:45:41,460 --> 00:45:47,200 standardization, you end up with a standard normal to 888 00:45:47,200 --> 00:45:49,580 which you can then apply the table. 889 00:45:49,580 --> 00:45:52,100 890 00:45:52,100 --> 00:45:56,180 Sometimes one calls this the normalized score. 891 00:45:56,180 --> 00:45:59,100 If you're thinking about test results, how would you 892 00:45:59,100 --> 00:46:00,780 interpret this number? 893 00:46:00,780 --> 00:46:05,440 It tells you how many standard deviations you are 894 00:46:05,440 --> 00:46:07,900 away from the mean. 895 00:46:07,900 --> 00:46:10,470 This is how much you are away from the mean. 896 00:46:10,470 --> 00:46:13,080 And you count it in terms of how many standard 897 00:46:13,080 --> 00:46:14,390 deviations it is. 898 00:46:14,390 --> 00:46:19,680 So this number being equal to 3 tells you that X happens to 899 00:46:19,680 --> 00:46:23,160 be 3 standard deviations above the mean.
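The standardization step can be sketched in a few lines of Python, assuming only the standard library. `z_score` computes (X minus the mean) divided by sigma. `normal_cdf` then gets P(X <= x) for a general normal by applying the standard normal CDF to that score, which is exactly why a table for the standard normal is enough. The helper names and the example parameters (mean 10, sigma 3) are hypothetical. The cross-check uses `statistics.NormalDist` (Python 3.8+), which accepts a general mean and sigma directly.

```python
import math
from statistics import NormalDist

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_score(x: float, mu: float, sigma: float) -> float:
    """How many standard deviations x lies above (positive) or below (negative) the mean."""
    return (x - mu) / sigma

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), using only the standard normal CDF."""
    return phi(z_score(x, mu, sigma))

# Cross-check the standardization route against the library's general normal CDF.
mu, sigma = 10.0, 3.0
for x in (4.0, 10.0, 19.0):
    assert math.isclose(normal_cdf(x, mu, sigma), NormalDist(mu, sigma).cdf(x), abs_tol=1e-12)

print(z_score(19.0, mu, sigma))  # 3.0, i.e. 3 standard deviations above the mean
```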
900 00:46:23,160 --> 00:46:26,030 And I guess if you're looking at your quiz scores, very 901 00:46:26,030 --> 00:46:30,690 often that's the kind of number that you think about. 902 00:46:30,690 --> 00:46:32,130 So it's a useful quantity. 903 00:46:32,130 --> 00:46:35,120 But it's also useful for doing the calculation we're now 904 00:46:35,120 --> 00:46:36,050 going to do. 905 00:46:36,050 --> 00:46:40,910 So suppose that X has a mean of 2 and a variance of 16, so 906 00:46:40,910 --> 00:46:43,600 a standard deviation of 4. 907 00:46:43,600 --> 00:46:46,030 And we're going to calculate the probability of this event. 908 00:46:46,030 --> 00:46:49,900 This event is described in terms of this X that has ugly 909 00:46:49,900 --> 00:46:51,530 means and variances. 910 00:46:51,530 --> 00:46:55,390 But we can take this event and rewrite it as 911 00:46:55,390 --> 00:46:57,070 an equivalent event. 912 00:46:57,070 --> 00:47:01,470 X less than 3 is the same as X minus 2 being less than 3 913 00:47:01,470 --> 00:47:06,410 minus 2, which is the same as this ratio being less than 914 00:47:06,410 --> 00:47:08,440 that ratio. 915 00:47:08,440 --> 00:47:11,460 So I'm subtracting from both sides of the inequality the 916 00:47:11,460 --> 00:47:14,170 mean and then dividing by the standard deviation. 917 00:47:14,170 --> 00:47:16,190 This event is the same as that event. 918 00:47:16,190 --> 00:47:19,430 Why do we like this better than that? 919 00:47:19,430 --> 00:47:23,670 We like it because this is the standardized, or normalized, 920 00:47:23,670 --> 00:47:28,660 version of X. We know that this is standard normal. 921 00:47:28,660 --> 00:47:30,650 And so we're asking the question, what's the 922 00:47:30,650 --> 00:47:34,130 probability that the standard normal is less than this 923 00:47:34,130 --> 00:47:37,300 number, which is 1/4? 924 00:47:37,300 --> 00:47:45,380 So that's the key property, that this is normal (0, 1).
925 00:47:45,380 --> 00:47:48,470 And so we can look up now with the table and ask for the 926 00:47:48,470 --> 00:47:51,010 probability that the standard normal random variable 927 00:47:51,010 --> 00:47:53,170 is less than 0.25. 928 00:47:53,170 --> 00:47:55,130 Where is that going to be? 929 00:47:55,130 --> 00:48:01,390 0.2, 0.25, it's here. 930 00:48:01,390 --> 00:48:09,600 So the answer is 0.5987. 931 00:48:09,600 --> 00:48:15,570 So I guess this is just a drill that you could learn in 932 00:48:15,570 --> 00:48:16,190 high school. 933 00:48:16,190 --> 00:48:18,990 You didn't have to come here to learn about it. 934 00:48:18,990 --> 00:48:22,030 But it's a drill that's very useful, since we will be 935 00:48:22,030 --> 00:48:24,060 calculating normal probabilities all the time. 936 00:48:24,060 --> 00:48:27,300 So make sure you know how to use the table and how to 937 00:48:27,300 --> 00:48:30,350 massage a general normal random variable into a 938 00:48:30,350 --> 00:48:33,380 standard normal random variable. 939 00:48:33,380 --> 00:48:33,790 OK. 940 00:48:33,790 --> 00:48:37,450 So just one more minute to look at the big picture and 941 00:48:37,450 --> 00:48:40,940 take stock of what we have done so far 942 00:48:40,940 --> 00:48:42,970 and where we're going. 943 00:48:42,970 --> 00:48:47,840 Chapter 2 was this part of the picture, where we dealt with 944 00:48:47,840 --> 00:48:50,460 discrete random variables. 945 00:48:50,460 --> 00:48:54,590 And this time, today, we started talking about 946 00:48:54,590 --> 00:48:56,410 continuous random variables. 947 00:48:56,410 --> 00:49:00,305 And we introduced the density function, which is the analog 948 00:49:00,305 --> 00:49:03,160 of the probability mass function. 949 00:49:03,160 --> 00:49:05,790 We have the concepts of expectation and 950 00:49:05,790 --> 00:49:07,090 variance and CDF.
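The worked example (mean 2, variance 16, threshold 3) can be checked numerically with a short Python sketch, assuming the error-function form of the standard normal CDF in place of a printed table. It standardizes the threshold to (3 - 2)/4 = 0.25 and evaluates Phi there. The variable names are illustrative.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF via the error function (stands in for the printed table)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, variance = 2.0, 16.0
sd = math.sqrt(variance)      # standard deviation = 4
z = (3.0 - mean) / sd         # standardized threshold = 1/4
prob = phi(z)                 # P(X <= 3) = P(Z <= 0.25)

print(z, round(prob, 4))      # 0.25 0.5987
```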
951 00:49:07,090 --> 00:49:10,290 And this kind of notation applies to both discrete and 952 00:49:10,290 --> 00:49:11,720 continuous cases. 953 00:49:11,720 --> 00:49:17,310 They are calculated the same way in both cases except that 954 00:49:17,310 --> 00:49:19,770 in the continuous case, you use integrals. 955 00:49:19,770 --> 00:49:22,740 In the discrete case, you use sums. 956 00:49:22,740 --> 00:49:25,320 So on that side, you have integrals. 957 00:49:25,320 --> 00:49:27,500 In this case, you have sums. 958 00:49:27,500 --> 00:49:30,200 In this case, you always have f's, densities, in your formulas. 959 00:49:30,200 --> 00:49:33,500 In this case, you always have p's, probability mass functions, in your formulas. 960 00:49:33,500 --> 00:49:37,890 So what's left for us to do is to look at 961 00:49:37,890 --> 00:49:42,460 these two concepts, joint probability mass functions and 962 00:49:42,460 --> 00:49:47,410 conditional mass functions, and figure out what would be 963 00:49:47,410 --> 00:49:51,080 the equivalent concepts on the continuous side. 964 00:49:51,080 --> 00:49:55,240 So we will need some notion of a joint density when we're 965 00:49:55,240 --> 00:49:57,510 dealing with multiple random variables. 966 00:49:57,510 --> 00:50:00,310 And we will also need the concept of conditional 967 00:50:00,310 --> 00:50:03,430 density, again for the case of continuous random variables. 968 00:50:03,430 --> 00:50:07,840 The intuition and the meaning of these objects is going to 969 00:50:07,840 --> 00:50:14,120 be exactly the same as here, only a little subtler because 970 00:50:14,120 --> 00:50:16,000 densities are not probabilities. 971 00:50:16,000 --> 00:50:18,630 They're rates at which probabilities accumulate. 972 00:50:18,630 --> 00:50:22,030 So that adds a little bit of potential confusion here, 973 00:50:22,030 --> 00:50:24,680 which, hopefully, we will fully resolve in the next 974 00:50:24,680 --> 00:50:26,490 couple of sections. 975 00:50:26,490 --> 00:50:27,310 All right.
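To make the "same formula, sums versus integrals" point concrete, here is a small Python sketch of the expectation in both settings. The two distributions are our own examples, not from the lecture: a fair six-sided die for the discrete case and the uniform density on [0, 1] for the continuous case. The integral is approximated by a midpoint Riemann sum, which is a numerical stand-in for the exact integral.

```python
from fractions import Fraction

# Discrete case: E[X] = sum over x of x * p_X(x), for a fair six-sided die.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}
e_discrete = sum(x * p for x, p in pmf.items())

# Continuous case: E[X] = integral of x * f_X(x) dx, for the uniform density on [0, 1].
def density(x: float) -> float:
    """Uniform density on [0, 1]; zero elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

n = 100_000
dx = 1.0 / n
# Midpoint Riemann sum approximating the integral of x * f(x) over [0, 1].
e_continuous = sum((i + 0.5) * dx * density((i + 0.5) * dx) * dx for i in range(n))

print(e_discrete, round(e_continuous, 6))  # 7/2 0.5
```

The structure of the two computations is identical: the PMF p_X is replaced by the density f_X, and the sum over values is replaced by an integral over the real line.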
976 00:50:27,310 --> 00:50:28,560 Thank you. 977 00:50:28,560 --> 00:50:29,070