1 00:00:00,000 --> 00:00:00,040 2 00:00:00,040 --> 00:00:02,460 The following content is provided under a Creative 3 00:00:02,460 --> 00:00:03,870 Commons license. 4 00:00:03,870 --> 00:00:06,910 Your support will help MIT OpenCourseWare continue to 5 00:00:06,910 --> 00:00:10,560 offer high-quality educational resources for free. 6 00:00:10,560 --> 00:00:13,460 To make a donation or view additional materials from 7 00:00:13,460 --> 00:00:19,290 hundreds of MIT courses, visit MIT OpenCourseWare at 8 00:00:19,290 --> 00:00:20,540 ocw.mit.edu. 9 00:00:20,540 --> 00:00:23,050 10 00:00:23,050 --> 00:00:25,080 JOHN TSITSIKLIS: OK let's start. 11 00:00:25,080 --> 00:00:26,560 So we've had the quiz. 12 00:00:26,560 --> 00:00:29,760 And I guess there's both good and bad news in it. 13 00:00:29,760 --> 00:00:31,590 Yesterday, as you know, the bad news. 14 00:00:31,590 --> 00:00:33,910 The average was a little lower than what 15 00:00:33,910 --> 00:00:36,260 we would have wanted. 16 00:00:36,260 --> 00:00:39,580 On the other hand, the good news is that the distribution 17 00:00:39,580 --> 00:00:41,770 was nicely spread. 18 00:00:41,770 --> 00:00:44,890 And that's the main purpose of this quiz is basically for you 19 00:00:44,890 --> 00:00:48,260 to calibrate and see roughly where you are standing. 20 00:00:48,260 --> 00:00:50,650 The other piece of the good news is that, as you know, 21 00:00:50,650 --> 00:00:53,590 this quiz doesn't count for very much in your final grade. 22 00:00:53,590 --> 00:00:58,230 So it's really a matter of calibration and to get your 23 00:00:58,230 --> 00:01:02,810 mind set appropriately to prepare for the second quiz, 24 00:01:02,810 --> 00:01:04,470 which counts a lot more. 25 00:01:04,470 --> 00:01:06,370 And it's more substantial. 26 00:01:06,370 --> 00:01:08,810 And we'll make sure that the second quiz will 27 00:01:08,810 --> 00:01:12,110 have a higher average. 28 00:01:12,110 --> 00:01:12,520 All right. 
29 00:01:12,520 --> 00:01:15,410 So let's go to our material. 30 00:01:15,410 --> 00:01:18,190 We're talking now these days about 31 00:01:18,190 --> 00:01:20,440 continuous random variables. 32 00:01:20,440 --> 00:01:23,240 And I'll remind you what we discussed last time. 33 00:01:23,240 --> 00:01:25,970 I'll remind you of the concept of the probability density 34 00:01:25,970 --> 00:01:28,230 function of a single random variable. 35 00:01:28,230 --> 00:01:31,090 And then we're going to rush through all the concepts that 36 00:01:31,090 --> 00:01:34,230 we covered for the case of discrete random variables and 37 00:01:34,230 --> 00:01:37,770 discuss their analogs for the continuous case. 38 00:01:37,770 --> 00:01:40,410 And talk about notions such as conditioning 39 00:01:40,410 --> 00:01:42,170 independence and so on. 40 00:01:42,170 --> 00:01:46,420 So the big picture is here. 41 00:01:46,420 --> 00:01:49,590 We have all those concepts that we developed for the case 42 00:01:49,590 --> 00:01:52,350 of discrete random variables. 43 00:01:52,350 --> 00:01:55,560 And now we will just talk about their analogs in the 44 00:01:55,560 --> 00:01:56,840 continuous case. 45 00:01:56,840 --> 00:02:00,800 We already discussed this analog last week, the density 46 00:02:00,800 --> 00:02:04,520 of a single random variable. 47 00:02:04,520 --> 00:02:08,570 Then there are certain concepts that show up both in 48 00:02:08,570 --> 00:02:10,780 the discrete and the continuous case. 49 00:02:10,780 --> 00:02:14,560 So we have the cumulative distribution function, which 50 00:02:14,560 --> 00:02:18,070 is a description of the probability distribution of a 51 00:02:18,070 --> 00:02:21,470 random variable and which applies whether you have a 52 00:02:21,470 --> 00:02:23,780 discrete or continuous random variable. 53 00:02:23,780 --> 00:02:26,500 Then there's the notion of the expected value. 
54 00:02:26,500 --> 00:02:29,990 And in the two cases, the expected value is calculated 55 00:02:29,990 --> 00:02:32,990 in a slightly different way, but not very different. 56 00:02:32,990 --> 00:02:36,080 We have sums in one case, integrals in the other. 57 00:02:36,080 --> 00:02:37,720 And this is the general pattern that 58 00:02:37,720 --> 00:02:39,030 we're going to have. 59 00:02:39,030 --> 00:02:42,120 Formulas for the discrete case translate to corresponding 60 00:02:42,120 --> 00:02:44,920 formulas or expressions in the continuous case. 61 00:02:44,920 --> 00:02:50,010 We generically replace sums by integrals, and we replace mass 62 00:02:50,010 --> 00:02:54,230 functions with density functions. 63 00:02:54,230 --> 00:02:58,330 Then the new pieces for today are going to be mostly the 64 00:02:58,330 --> 00:03:01,570 notion of a joint density function, which is how we 65 00:03:01,570 --> 00:03:04,330 describe the probability distribution of two random 66 00:03:04,330 --> 00:03:08,370 variables that are somehow related, in general, and then 67 00:03:08,370 --> 00:03:11,780 the notion of a conditional density function that tells us 68 00:03:11,780 --> 00:03:15,160 the distribution of one random variable X when you're told 69 00:03:15,160 --> 00:03:19,200 the value of another random variable Y. There's another 70 00:03:19,200 --> 00:03:22,680 concept, which is the conditional PDF given that a 71 00:03:22,680 --> 00:03:24,860 certain event has happened. 72 00:03:24,860 --> 00:03:27,420 This is a concept that's in some ways simpler. 73 00:03:27,420 --> 00:03:31,360 You've already seen a little bit of that in last week's 74 00:03:31,360 --> 00:03:33,140 recitation and tutorial. 75 00:03:33,140 --> 00:03:35,640 The idea is that we have a single random variable. 76 00:03:35,640 --> 00:03:37,710 It's described by a density. 77 00:03:37,710 --> 00:03:41,110 Then you're told that a certain event has occurred. 
78 00:03:41,110 --> 00:03:42,880 Your model changes the universe that 79 00:03:42,880 --> 00:03:43,910 you are dealing with. 80 00:03:43,910 --> 00:03:46,640 In the new universe, you are dealing with a new density 81 00:03:46,640 --> 00:03:51,310 function, the one that applies given the knowledge that we 82 00:03:51,310 --> 00:03:55,700 have that the certain event has occurred. 83 00:03:55,700 --> 00:03:56,160 All right. 84 00:03:56,160 --> 00:03:59,870 So what exactly did we say about 85 00:03:59,870 --> 00:04:02,140 continuous random variables? 86 00:04:02,140 --> 00:04:05,020 The first thing is the definition, that a random 87 00:04:05,020 --> 00:04:09,370 variable is said to be continuous if we are given a 88 00:04:09,370 --> 00:04:12,220 certain object that we call the probability density 89 00:04:12,220 --> 00:04:17,050 function and we can calculate interval probabilities given 90 00:04:17,050 --> 00:04:18,709 this density function. 91 00:04:18,709 --> 00:04:21,589 So the definition is that the random variable is continuous 92 00:04:21,589 --> 00:04:24,490 if you can calculate probabilities associated with 93 00:04:24,490 --> 00:04:27,380 that random variable given that formula. 94 00:04:27,380 --> 00:04:29,770 So this formula tells you that the probability that your 95 00:04:29,770 --> 00:04:33,340 random variable falls inside this interval is the area 96 00:04:33,340 --> 00:04:34,880 under the density curve. 97 00:04:34,880 --> 00:04:37,390 98 00:04:37,390 --> 00:04:37,700 OK. 99 00:04:37,700 --> 00:04:39,720 There's a few properties that a density 100 00:04:39,720 --> 00:04:41,020 function must satisfy. 101 00:04:41,020 --> 00:04:42,900 Since we're talking about probabilities, and 102 00:04:42,900 --> 00:04:45,890 probabilities are non-negative, we have that the 103 00:04:45,890 --> 00:04:49,530 density function is always a non-negative function. 
104 00:04:49,530 --> 00:04:52,790 The total probability over the entire real line 105 00:04:52,790 --> 00:04:54,690 must be equal to 1. 106 00:04:54,690 --> 00:04:58,070 So the integral when you integrate over the entire real 107 00:04:58,070 --> 00:04:59,590 line has to be equal to 1. 108 00:04:59,590 --> 00:05:01,800 That's the second property. 109 00:05:01,800 --> 00:05:05,200 Another property that you get is that if you let a be equal to 110 00:05:05,200 --> 00:05:07,720 b, this integral becomes 0. 111 00:05:07,720 --> 00:05:11,390 And that tells you that the probability of a single point 112 00:05:11,390 --> 00:05:15,990 in the continuous case is always equal to 0. 113 00:05:15,990 --> 00:05:17,780 So these are formal properties. 114 00:05:17,780 --> 00:05:21,290 When you want to think intuitively, the best way to 115 00:05:21,290 --> 00:05:25,540 think about the density function is to think in terms 116 00:05:25,540 --> 00:05:28,320 of little intervals, the probability that my random 117 00:05:28,320 --> 00:05:31,540 variable falls inside the little interval. 118 00:05:31,540 --> 00:05:35,170 Well, inside that little interval, the density function 119 00:05:35,170 --> 00:05:36,940 here is roughly constant. 120 00:05:36,940 --> 00:05:42,430 So that integral becomes the value of the density times the 121 00:05:42,430 --> 00:05:45,340 length of the interval over which you are integrating, 122 00:05:45,340 --> 00:05:47,070 which is delta. 123 00:05:47,070 --> 00:05:50,240 And so the density function basically gives us 124 00:05:50,240 --> 00:05:54,990 probabilities of little events, of small events. 125 00:05:54,990 --> 00:05:59,200 And the density is to be interpreted as probability per 126 00:05:59,200 --> 00:06:02,290 unit length at a certain place in the diagram. 
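The properties just listed (non-negativity, total area 1, zero probability for a single point, and "probability of a little interval is roughly density times delta") can be checked numerically. The sketch below is only an illustration: the exponential density `f`, the helper `integrate`, and the grid sizes are assumptions made for the example and do not come from the lecture.

```python
import math

def f(x):
    """Example density (an assumption for illustration): exponential with rate 1."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(func, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

# Property 1: the density is non-negative (spot-checked on a grid from -5 to 25).
nonnegative = all(f(-5 + 0.01 * k) >= 0 for k in range(3000))

# Property 2: total area is 1 (the tail beyond 50 is negligible for this density).
total = integrate(f, 0.0, 50.0)

# Property 3: letting a equal b gives an integral over a zero-length interval.
point_prob = integrate(f, 1.0, 1.0)

# Small-interval interpretation: P(x <= X <= x + delta) is about f(x) * delta.
x, delta = 1.0, 1e-3
exact_small = integrate(f, x, x + delta, n=1000)
approx_small = f(x) * delta

print(nonnegative, total, point_prob)
print(exact_small, approx_small)
```

The last two printed numbers agree more closely as `delta` shrinks, which is the sense in which the density is "probability per unit length".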
127 00:06:02,290 --> 00:06:04,800 So in that place in the diagram, the probability per 128 00:06:04,800 --> 00:06:07,870 unit length around this neighborhood would be the 129 00:06:07,870 --> 00:06:12,320 height of the density function at that point. 130 00:06:12,320 --> 00:06:13,270 What else? 131 00:06:13,270 --> 00:06:16,440 We have a formula for calculating expected values of 132 00:06:16,440 --> 00:06:17,980 functions of random variables. 133 00:06:17,980 --> 00:06:21,310 In the discrete case, we had the formula where here we had 134 00:06:21,310 --> 00:06:25,430 the sum, and instead of the density, we had the PMF. 135 00:06:25,430 --> 00:06:29,188 The same formula is also valid in the continuous case. 136 00:06:29,188 --> 00:06:35,120 And it's not too hard to derive, but we will not do it. 137 00:06:35,120 --> 00:06:36,910 But let's think of the intuition of what 138 00:06:36,910 --> 00:06:38,420 this formula says. 139 00:06:38,420 --> 00:06:41,670 You're trying to figure out on the average how much g(X) is 140 00:06:41,670 --> 00:06:42,780 going to be. 141 00:06:42,780 --> 00:06:47,130 And then you reason, and you say, well, X may turn out to 142 00:06:47,130 --> 00:06:52,560 take a particular value or a small interval of values. 143 00:06:52,560 --> 00:06:54,780 This is the probability that X falls 144 00:06:54,780 --> 00:06:56,640 inside the small interval. 145 00:06:56,640 --> 00:07:00,310 And when that happens, g(X) takes that value. 146 00:07:00,310 --> 00:07:03,930 So this fraction of the time, you fall in the little 147 00:07:03,930 --> 00:07:07,350 neighborhood of x, and you get so much. 148 00:07:07,350 --> 00:07:10,860 Then you average over all the possible x's that can happen. 149 00:07:10,860 --> 00:07:13,930 And that gives you the average value of the function g(X). 150 00:07:13,930 --> 00:07:17,730 151 00:07:17,730 --> 00:07:18,045 OK. 152 00:07:18,045 --> 00:07:20,650 So this is the easy stuff. 
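The intuition behind the expected value rule (weight each value g(x) by the probability of landing near x, then add up) can be sketched as a sum over small intervals. The uniform density `f`, the function `g`, and the helper `expected_value` below are assumptions chosen for illustration; the lecture does not specify a particular example.

```python
def f(x):
    """Example density (assumed for illustration): uniform on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def g(x):
    """Example function of the random variable (also an assumption)."""
    return x * x

def expected_value(g, f, a, b, n=100_000):
    """Approximate E[g(X)] as a sum of g(x) * f(x) * delta over small intervals."""
    delta = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * delta      # midpoint of the i-th small interval
        total += g(x) * f(x) * delta   # value g(x), weighted by P(X near x)
    return total

e_g = expected_value(g, f, 0.0, 1.0)
print(e_g)  # close to 1/3 for this choice of f and g
```

Replacing the sum by an integral and `f(x) * delta` by a PMF value gives the continuous and discrete versions of the same formula.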
153 00:07:20,650 --> 00:07:23,690 Now let's get to the new material. 154 00:07:23,690 --> 00:07:26,330 We want to talk about multiple random variables 155 00:07:26,330 --> 00:07:27,320 simultaneously. 156 00:07:27,320 --> 00:07:31,530 So we want to talk now about two random variables that are 157 00:07:31,530 --> 00:07:35,020 continuous, and in some sense that they are jointly 158 00:07:35,020 --> 00:07:35,840 continuous. 159 00:07:35,840 --> 00:07:38,080 And let's see what this means. 160 00:07:38,080 --> 00:07:40,840 The definition is similar to the definition we had for a 161 00:07:40,840 --> 00:07:44,850 single random variable, where I take this formula here as 162 00:07:44,850 --> 00:07:49,510 the definition of continuous random variables. 163 00:07:49,510 --> 00:07:53,830 Two random variables are said to be jointly continuous if we 164 00:07:53,830 --> 00:07:58,190 can calculate probabilities by integrating a certain function 165 00:07:58,190 --> 00:08:01,070 that we call the joint density function 166 00:08:01,070 --> 00:08:03,310 over the set of interest. 167 00:08:03,310 --> 00:08:08,690 So we have our two-dimensional plane. 168 00:08:08,690 --> 00:08:10,900 This is the x-y plane. 169 00:08:10,900 --> 00:08:13,810 There's a certain event S that we're interested in. 170 00:08:13,810 --> 00:08:15,860 We want to calculate the probability. 171 00:08:15,860 --> 00:08:17,370 How do we do that? 172 00:08:17,370 --> 00:08:22,660 We are given this function f_(X,Y), the joint density. 173 00:08:22,660 --> 00:08:25,910 It's a function of the two arguments x and y. 174 00:08:25,910 --> 00:08:29,530 So think of that function as being some kind of surface 175 00:08:29,530 --> 00:08:34,809 that sits on top of the two-dimensional plane. 
176 00:08:34,809 --> 00:08:39,140 The probability of falling inside the set S, we calculate 177 00:08:39,140 --> 00:08:45,350 it by looking at the volume under the surface, that volume 178 00:08:45,350 --> 00:08:50,470 that sits on top of S. So the surface underneath it has a 179 00:08:50,470 --> 00:08:52,010 certain total volume. 180 00:08:52,010 --> 00:08:54,650 What should that total volume be? 181 00:08:54,650 --> 00:08:57,050 Well, we think of these volumes as probabilities. 182 00:08:57,050 --> 00:09:00,180 So the total probability should be equal to 1. 183 00:09:00,180 --> 00:09:05,430 The total volume under this surface should be equal to 1. 184 00:09:05,430 --> 00:09:08,220 So that's one property that we want our 185 00:09:08,220 --> 00:09:10,138 density function to have. 186 00:09:10,138 --> 00:09:16,080 187 00:09:16,080 --> 00:09:20,500 So when you integrate over the entire space, this is the 188 00:09:20,500 --> 00:09:22,400 volume under your surface. 189 00:09:22,400 --> 00:09:24,090 That should be equal to 1. 190 00:09:24,090 --> 00:09:27,280 Of course, since we're talking about probabilities, the joint 191 00:09:27,280 --> 00:09:29,560 density should be a non-negative function. 192 00:09:29,560 --> 00:09:34,140 So think of the situation as having one pound of 193 00:09:34,140 --> 00:09:38,230 probability that's spread all over your space. 194 00:09:38,230 --> 00:09:41,430 And the height of this joint density function basically 195 00:09:41,430 --> 00:09:45,470 tells you how much probability tends to be accumulated in 196 00:09:45,470 --> 00:09:48,400 certain regions of space as opposed to other 197 00:09:48,400 --> 00:09:49,870 parts of the space. 198 00:09:49,870 --> 00:09:53,130 So wherever the density is big, that means that this is 199 00:09:53,130 --> 00:09:54,920 an area of the two-dimensional plane that's 200 00:09:54,920 --> 00:09:56,340 more likely to occur. 
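As a rough numerical illustration of "probability is the volume under the surface that sits on top of S", the sketch below uses a made-up joint density, x + y on the unit square. That density is an assumption chosen only because it is non-negative and integrates to 1; it does not come from the lecture. The helper `volume_over` and the set S = {x + y <= 1} are likewise hypothetical.

```python
def joint_density(x, y):
    """Hypothetical joint PDF: x + y on the unit square, 0 elsewhere."""
    return x + y if (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0) else 0.0

def volume_over(in_set, n=400):
    """Approximate the volume under joint_density above the set
    {(x, y) in the unit square : in_set(x, y)} using a midpoint grid."""
    h = 1.0 / n
    vol = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if in_set(x, y):
                vol += joint_density(x, y) * h * h  # height times base area
    return vol

total_volume = volume_over(lambda x, y: True)      # total probability, about 1
prob_S = volume_over(lambda x, y: x + y <= 1.0)    # P((X, Y) in S), about 1/3

print(total_volume, prob_S)
```

For this particular density the region near (1, 1) carries more probability than the region near (0, 0), which is why S, half of the square by area, gets only about a third of the probability.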
201 00:09:56,340 --> 00:09:59,160 Where the density is small, that means that those x-y's 202 00:09:59,160 --> 00:10:01,100 are less likely to occur. 203 00:10:01,100 --> 00:10:03,070 You have already seen one example 204 00:10:03,070 --> 00:10:06,050 of continuous densities. 205 00:10:06,050 --> 00:10:08,730 That was the example we had in the very beginning of the 206 00:10:08,730 --> 00:10:10,700 class with a uniform 207 00:10:10,700 --> 00:10:13,380 distribution on the unit square. 208 00:10:13,380 --> 00:10:15,510 That was a special case of a density 209 00:10:15,510 --> 00:10:17,250 function that was constant. 210 00:10:17,250 --> 00:10:20,090 So all places in the unit square were roughly equally 211 00:10:20,090 --> 00:10:22,010 likely as any other places. 212 00:10:22,010 --> 00:10:25,580 But in other models, some parts of the space may be more 213 00:10:25,580 --> 00:10:27,000 likely than others. 214 00:10:27,000 --> 00:10:29,470 And we describe those relative likelihoods using 215 00:10:29,470 --> 00:10:31,120 this density function. 216 00:10:31,120 --> 00:10:33,420 So if somebody gives us the density function, this 217 00:10:33,420 --> 00:10:38,480 determines for us probabilities of all the 218 00:10:38,480 --> 00:10:41,520 subsets of the two-dimensional plane. 219 00:10:41,520 --> 00:10:45,710 Now for an intuitive interpretation, it's good to 220 00:10:45,710 --> 00:10:47,460 think about small events. 221 00:10:47,460 --> 00:10:51,220 So let's take a particular x here and then x plus delta. 222 00:10:51,220 --> 00:10:53,020 So this is a small interval. 223 00:10:53,020 --> 00:10:56,190 Take another small interval here that goes from y to y 224 00:10:56,190 --> 00:10:57,560 plus delta. 225 00:10:57,560 --> 00:11:03,270 And let's look at the event that x falls here and y falls 226 00:11:03,270 --> 00:11:04,780 right there. 227 00:11:04,780 --> 00:11:05,780 What is this event? 
228 00:11:05,780 --> 00:11:07,760 Well, this is the event that we'll fall 229 00:11:07,760 --> 00:11:11,030 inside this little rectangle. 230 00:11:11,030 --> 00:11:15,820 Using this rule for calculating probabilities, 231 00:11:15,820 --> 00:11:19,040 what is the probability of that rectangle going to be? 232 00:11:19,040 --> 00:11:23,130 Well, it should be the integral of the density over 233 00:11:23,130 --> 00:11:24,300 this rectangle. 234 00:11:24,300 --> 00:11:29,720 Or it's the volume under the surface that sits on top of 235 00:11:29,720 --> 00:11:31,010 that rectangle. 236 00:11:31,010 --> 00:11:34,300 Now, if the rectangle is very small, the joint density is 237 00:11:34,300 --> 00:11:36,760 not going to change very much in that neighborhood. 238 00:11:36,760 --> 00:11:38,770 So we can treat it as a constant. 239 00:11:38,770 --> 00:11:42,350 So the volume is going to be the height times 240 00:11:42,350 --> 00:11:44,030 the area of the base. 241 00:11:44,030 --> 00:11:47,150 The height at that point is whatever the function happens 242 00:11:47,150 --> 00:11:49,460 to be around that point. 243 00:11:49,460 --> 00:11:52,590 And the area of the base is delta squared. 244 00:11:52,590 --> 00:11:58,750 So this is the intuitive way to understand what a joint 245 00:11:58,750 --> 00:12:01,070 density function really tells you. 246 00:12:01,070 --> 00:12:04,200 It specifies for you probabilities of little 247 00:12:04,200 --> 00:12:08,500 squares, of little rectangles. 248 00:12:08,500 --> 00:12:11,880 And it allows you to think of the joint density function as 249 00:12:11,880 --> 00:12:15,310 probability per unit area. 250 00:12:15,310 --> 00:12:18,790 So these are the units of the density, it's probability per 251 00:12:18,790 --> 00:12:23,800 unit area in the neighborhood of a certain point. 252 00:12:23,800 --> 00:12:26,970 So what do we do with this density function once we have 253 00:12:26,970 --> 00:12:28,410 it in our hands? 
254 00:12:28,410 --> 00:12:32,640 Well, we can use it to calculate expected values. 255 00:12:32,640 --> 00:12:34,880 Suppose that you have a function of two random 256 00:12:34,880 --> 00:12:38,040 variables described by a joint density. 257 00:12:38,040 --> 00:12:41,580 You can find, perhaps, the distribution of this random 258 00:12:41,580 --> 00:12:45,330 variable and then use the basic definition of the 259 00:12:45,330 --> 00:12:46,150 expectation. 260 00:12:46,150 --> 00:12:49,260 Or you can calculate expectations directly, using 261 00:12:49,260 --> 00:12:52,010 the distribution of the original random variables. 262 00:12:52,010 --> 00:12:55,280 This is a formula that's again identical to the formula that 263 00:12:55,280 --> 00:12:57,290 we had for the discrete case. 264 00:12:57,290 --> 00:12:59,500 In the discrete case, we had a double sum 265 00:12:59,500 --> 00:13:02,590 here, and we had PMFs. 266 00:13:02,590 --> 00:13:06,290 So the intuition behind this formula is the same that one 267 00:13:06,290 --> 00:13:08,220 had for the discrete case. 268 00:13:08,220 --> 00:13:12,550 It's just that the mechanics are different. 269 00:13:12,550 --> 00:13:16,220 Then something that we did in the discrete case was to find 270 00:13:16,220 --> 00:13:21,510 a way to go from the joint density of the two random 271 00:13:21,510 --> 00:13:25,750 variables taken together to the density of just one of the 272 00:13:25,750 --> 00:13:28,190 random variables. 273 00:13:28,190 --> 00:13:30,570 So we had a formula for the discrete case. 274 00:13:30,570 --> 00:13:33,450 Let's see how things are going to work out in 275 00:13:33,450 --> 00:13:35,800 the continuous case. 276 00:13:35,800 --> 00:13:40,560 So in the continuous case, we have here 277 00:13:40,560 --> 00:13:42,330 our two random variables. 278 00:13:42,330 --> 00:13:45,030 And we have a density for them. 
279 00:13:45,030 --> 00:13:48,340 And let's say that we want to calculate the probability that 280 00:13:48,340 --> 00:13:51,570 x falls inside this interval. 281 00:13:51,570 --> 00:13:53,510 So we're looking at the probability that our random 282 00:13:53,510 --> 00:13:58,380 variable X falls in the interval from little x to x 283 00:13:58,380 --> 00:13:59,630 plus delta. 284 00:13:59,630 --> 00:14:02,130 285 00:14:02,130 --> 00:14:08,260 Now, by the properties that we already have for interpreting 286 00:14:08,260 --> 00:14:11,460 the density function of a single random variable, the 287 00:14:11,460 --> 00:14:14,100 probability of a little interval is approximately the 288 00:14:14,100 --> 00:14:18,750 density of that single random variable times delta. 289 00:14:18,750 --> 00:14:22,120 And now we want to find a formula for this marginal 290 00:14:22,120 --> 00:14:26,540 density in terms of the joint density. 291 00:14:26,540 --> 00:14:26,890 OK. 292 00:14:26,890 --> 00:14:28,930 So this is the probability that x 293 00:14:28,930 --> 00:14:30,970 falls inside this interval. 294 00:14:30,970 --> 00:14:34,070 In terms of the two-dimensional plane, this is 295 00:14:34,070 --> 00:14:40,030 the probability that (x,y) falls inside this strip. 296 00:14:40,030 --> 00:14:44,520 So to find that probability, we need to calculate the 297 00:14:44,520 --> 00:14:48,530 probability that (x,y) falls in here, which is going to be 298 00:14:48,530 --> 00:14:55,780 the double integral over the interval over this strip, of 299 00:14:55,780 --> 00:14:57,030 the joint density. 300 00:14:57,030 --> 00:15:05,080 301 00:15:05,080 --> 00:15:07,920 And what are we integrating over? 302 00:15:07,920 --> 00:15:11,185 y goes from minus infinity to plus infinity. 303 00:15:11,185 --> 00:15:15,680 304 00:15:15,680 --> 00:15:22,755 And the dummy variable x goes from little x to x plus delta. 
305 00:15:22,755 --> 00:15:27,240 306 00:15:27,240 --> 00:15:31,580 So to integrate over this strip, what we do is for any 307 00:15:31,580 --> 00:15:34,810 given y, we integrate in this dimension. 308 00:15:34,810 --> 00:15:36,770 This is the x integral. 309 00:15:36,770 --> 00:15:40,220 And then we integrate over the y dimension. 310 00:15:40,220 --> 00:15:42,920 Now what is this inner integral? 311 00:15:42,920 --> 00:15:50,250 Because x only varies very little, this is approximately 312 00:15:50,250 --> 00:15:53,040 constant in that range. 313 00:15:53,040 --> 00:15:56,210 So the integral with respect to x just 314 00:15:56,210 --> 00:15:58,840 becomes delta times f(x,y). 315 00:15:58,840 --> 00:16:02,010 316 00:16:02,010 --> 00:16:03,490 And then we've got our dy. 317 00:16:03,490 --> 00:16:06,930 318 00:16:06,930 --> 00:16:11,760 So this is what the inner integral will evaluate to. 319 00:16:11,760 --> 00:16:15,280 We are integrating over the little interval. 320 00:16:15,280 --> 00:16:17,450 So we're keeping y fixed. 321 00:16:17,450 --> 00:16:22,020 Integrating over here, we take the value of the density times 322 00:16:22,020 --> 00:16:24,940 how much we're integrating over. 323 00:16:24,940 --> 00:16:27,890 And we get this formula. 324 00:16:27,890 --> 00:16:28,410 OK. 325 00:16:28,410 --> 00:16:33,170 Now, this expression must be equal to that expression. 326 00:16:33,170 --> 00:16:40,060 So if we cancel the deltas, we see that the marginal density 327 00:16:40,060 --> 00:16:44,000 must be equal to the integral of the joint density, where we 328 00:16:44,000 --> 00:16:48,200 have integrated out the value of y. 329 00:16:48,200 --> 00:16:54,060 330 00:16:54,060 --> 00:16:59,000 So this formula should come as no surprise at this point. 331 00:16:59,000 --> 00:17:01,380 It's exactly the same as the formula that we had for 332 00:17:01,380 --> 00:17:03,270 discrete random variables. 
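The derivation just described can be reconstructed roughly as follows. The dummy variable $x'$ is notation introduced here to keep the integration variable separate from the fixed point $x$; the transcript does not show the exact symbols written on the board.

```latex
\begin{align*}
f_X(x)\,\delta
  &\approx \mathbf{P}(x \le X \le x+\delta) \\
  &= \int_{-\infty}^{\infty} \int_{x}^{x+\delta} f_{X,Y}(x', y)\, dx'\, dy \\
  &\approx \int_{-\infty}^{\infty} \delta\, f_{X,Y}(x, y)\, dy .
\end{align*}
% Cancelling \delta on both sides gives the marginal density:
\[
f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy .
\]
```

The discrete counterpart has the same shape, with the joint PMF in place of the joint PDF and a sum over y in place of the integral.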
333 00:17:03,270 --> 00:17:06,800 But now we are replacing the sum with an integral. 334 00:17:06,800 --> 00:17:14,690 And instead of using the joint PMF, we are 335 00:17:14,690 --> 00:17:18,480 using the joint PDF. 336 00:17:18,480 --> 00:17:21,810 Then, continuing going down the list of things we did for 337 00:17:21,810 --> 00:17:24,839 discrete random variables, we can now introduce a definition 338 00:17:24,839 --> 00:17:28,310 of the notion of independence of two random variables. 339 00:17:28,310 --> 00:17:31,050 And by analogy with the discrete case, we define 340 00:17:31,050 --> 00:17:33,940 independence to be the following condition. 341 00:17:33,940 --> 00:17:37,210 Two random variables are independent if and only if 342 00:17:37,210 --> 00:17:42,220 their joint density function factors out as a product of 343 00:17:42,220 --> 00:17:44,390 their marginal densities. 344 00:17:44,390 --> 00:17:48,000 And this property needs to be true for all x and y. 345 00:17:48,000 --> 00:17:49,890 So this is the formal definition. 346 00:17:49,890 --> 00:17:53,020 Operationally and intuitively, what does it mean? 347 00:17:53,020 --> 00:17:55,110 Well, intuitively it means the same thing as in 348 00:17:55,110 --> 00:17:56,600 the discrete case. 349 00:17:56,600 --> 00:18:00,610 Knowing anything about X shouldn't tell you anything 350 00:18:00,610 --> 00:18:05,320 about Y. That is, information about X is not going to change 351 00:18:05,320 --> 00:18:10,120 your beliefs about Y. We are going to come back to this 352 00:18:10,120 --> 00:18:11,370 statement in a second. 353 00:18:11,370 --> 00:18:14,320 354 00:18:14,320 --> 00:18:16,920 The other thing that it allows you to do-- 355 00:18:16,920 --> 00:18:20,750 I'm not going to derive this-- is it allows you to calculate 356 00:18:20,750 --> 00:18:25,650 probabilities by multiplying individual probabilities. 
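Written out, the definition of independence and the product rule for probabilities look as follows. The second display is a sketch of the short derivation that the lecture does not carry out; $A$ and $B$ stand for arbitrary sets of real numbers, and the notation is an assumption about what the slide shows.

```latex
% Independence: the joint PDF factors for all x and y
\[
f_{X,Y}(x, y) = f_X(x)\, f_Y(y) \quad \text{for all } x, y .
\]
% Consequence: probabilities of product sets multiply
\begin{align*}
\mathbf{P}(X \in A,\ Y \in B)
  &= \int_{A} \int_{B} f_X(x)\, f_Y(y)\, dy\, dx \\
  &= \left( \int_{A} f_X(x)\, dx \right) \left( \int_{B} f_Y(y)\, dy \right)
   = \mathbf{P}(X \in A)\, \mathbf{P}(Y \in B) .
\end{align*}
```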
357 00:18:25,650 --> 00:18:28,110 So if you ask for the probability that x falls in a 358 00:18:28,110 --> 00:18:34,220 certain set A and y falls in a certain set B, then you can 359 00:18:34,220 --> 00:18:37,670 calculate that probability by multiplying individual 360 00:18:37,670 --> 00:18:38,920 probabilities. 361 00:18:38,920 --> 00:18:41,860 362 00:18:41,860 --> 00:18:46,090 This takes just two lines of derivation, which I'm not 363 00:18:46,090 --> 00:18:47,710 going to do. 364 00:18:47,710 --> 00:18:51,240 But it comes back to the usual notion of 365 00:18:51,240 --> 00:18:53,370 independence of events. 366 00:18:53,370 --> 00:18:56,340 Basically, operationally independence means that you 367 00:18:56,340 --> 00:18:57,660 can multiply probabilities. 368 00:18:57,660 --> 00:19:00,190 369 00:19:00,190 --> 00:19:04,380 So now let's look at an example. 370 00:19:04,380 --> 00:19:08,150 There's a sort of pretty famous and classical one. 371 00:19:08,150 --> 00:19:12,540 It goes back a lot more than 100 years. 372 00:19:12,540 --> 00:19:16,290 And it's the famous Needle of Buffon. 373 00:19:16,290 --> 00:19:19,860 Buffon was a French naturalist who, for some reason, also 374 00:19:19,860 --> 00:19:22,150 decided to play with probability 375 00:19:22,150 --> 00:19:24,590 and look at the following problem. 376 00:19:24,590 --> 00:19:28,400 So you have the two-dimensional plane. 377 00:19:28,400 --> 00:19:33,870 And on the plane we draw a bunch of parallel lines. 378 00:19:33,870 --> 00:19:37,575 And those parallel lines are separated by a length. 379 00:19:37,575 --> 00:19:46,830 380 00:19:46,830 --> 00:19:52,270 And the lines are apart at distance d. 381 00:19:52,270 --> 00:19:58,780 And we throw a needle at random, completely at random. 382 00:19:58,780 --> 00:20:01,510 And we'll have to give a meaning to what "completely at 383 00:20:01,510 --> 00:20:03,180 random" means. 384 00:20:03,180 --> 00:20:06,490 And when we throw a needle, there's two possibilities. 
385 00:20:06,490 --> 00:20:09,640 Either the needle is going to fall in a way that does not 386 00:20:09,640 --> 00:20:13,120 intersect any of the lines, or it's going to fall in a way 387 00:20:13,120 --> 00:20:15,700 that it intersects one of the lines. 388 00:20:15,700 --> 00:20:19,470 We're taking the needle to be shorter than this distance, so 389 00:20:19,470 --> 00:20:22,185 the needle cannot intersect two lines simultaneously. 390 00:20:22,185 --> 00:20:26,230 It either intersects 0, or it intersects one of the lines. 391 00:20:26,230 --> 00:20:29,610 The question is to find the probability that the needle is 392 00:20:29,610 --> 00:20:32,100 going to intersect a line. 393 00:20:32,100 --> 00:20:34,650 What's the probability of this? 394 00:20:34,650 --> 00:20:35,010 OK. 395 00:20:35,010 --> 00:20:40,020 We are going to approach this problem by using our standard 396 00:20:40,020 --> 00:20:42,110 four-step procedure. 397 00:20:42,110 --> 00:20:46,560 Set up your sample space, describe a probability law on 398 00:20:46,560 --> 00:20:51,460 that sample space, identify the event of interest, and 399 00:20:51,460 --> 00:20:53,370 then calculate. 400 00:20:53,370 --> 00:20:58,470 These four steps basically correspond to these three 401 00:20:58,470 --> 00:21:04,110 bullets and then the last equation down here. 402 00:21:04,110 --> 00:21:06,510 So first thing is to set up a sample space. 403 00:21:06,510 --> 00:21:09,470 We need some variables to describe what happened in the 404 00:21:09,470 --> 00:21:10,780 experiment. 405 00:21:10,780 --> 00:21:14,300 So what happens in the experiment is that the needle 406 00:21:14,300 --> 00:21:16,500 lands somewhere. 407 00:21:16,500 --> 00:21:20,450 And where it lands, we can describe this by specifying 408 00:21:20,450 --> 00:21:24,160 the location of the center of the needle. 409 00:21:24,160 --> 00:21:27,020 And what do we mean by the location of the center? 
410 00:21:27,020 --> 00:21:30,310 Well, we can take as our variable to be the distance 411 00:21:30,310 --> 00:21:33,035 from the center of the needle to the nearest line. 412 00:21:33,035 --> 00:21:36,280 413 00:21:36,280 --> 00:21:42,520 So it tells us the vertical distance of the center of the 414 00:21:42,520 --> 00:21:45,930 needle from the nearest line. 415 00:21:45,930 --> 00:21:47,500 The other thing that matters is the 416 00:21:47,500 --> 00:21:49,400 orientation of the needle. 417 00:21:49,400 --> 00:21:53,820 So we need one more variable, which we take to be the angle 418 00:21:53,820 --> 00:21:56,940 that the needle is forming with the lines. 419 00:21:56,940 --> 00:22:00,260 We can put the angle here, or you can put in there. 420 00:22:00,260 --> 00:22:02,620 Yes, it's still the same angle. 421 00:22:02,620 --> 00:22:06,850 So we have these two variables that described what happened 422 00:22:06,850 --> 00:22:08,190 in the experiment. 423 00:22:08,190 --> 00:22:11,280 And we can take our sample space to be the set of all 424 00:22:11,280 --> 00:22:14,390 possible x's and theta's. 425 00:22:14,390 --> 00:22:16,770 What are the possible x's? 426 00:22:16,770 --> 00:22:20,800 The lines are d apart, so the nearest line is going to be 427 00:22:20,800 --> 00:22:24,400 anywhere between 0 and d/2 away. 428 00:22:24,400 --> 00:22:28,630 So that tells us what the possible x's will be. 429 00:22:28,630 --> 00:22:31,420 As for theta, it really depends how 430 00:22:31,420 --> 00:22:33,230 you define your angle. 431 00:22:33,230 --> 00:22:37,510 We are going to define our theta to be the acute angle 432 00:22:37,510 --> 00:22:44,020 that's formed between the needle and a line, if you were 433 00:22:44,020 --> 00:22:45,130 to extend it. 434 00:22:45,130 --> 00:22:50,180 So theta is going to be something between 0 and pi/2. 
435 00:22:50,180 --> 00:22:54,140 So I guess these red pieces really correspond to the part 436 00:22:54,140 --> 00:22:58,490 of setting up the sample space. 437 00:22:58,490 --> 00:22:58,810 OK. 438 00:22:58,810 --> 00:23:00,270 So that's part one. 439 00:23:00,270 --> 00:23:03,390 Second part is we need a model. 440 00:23:03,390 --> 00:23:03,690 OK. 441 00:23:03,690 --> 00:23:08,140 Let's take our model to be that we basically know nothing 442 00:23:08,140 --> 00:23:10,600 about how the needle falls. 443 00:23:10,600 --> 00:23:13,890 It can fall in any possible way, and all possible ways are 444 00:23:13,890 --> 00:23:15,230 equally likely. 445 00:23:15,230 --> 00:23:18,910 Now, if you have those parallel lines, and you close 446 00:23:18,910 --> 00:23:22,330 your eyes completely and throw a needle completely at random, 447 00:23:22,330 --> 00:23:25,260 any x should be equally likely. 448 00:23:25,260 --> 00:23:29,490 So we describe that situation by saying that X should have a 449 00:23:29,490 --> 00:23:31,360 uniform distribution. 450 00:23:31,360 --> 00:23:33,880 That is, it should have a constant density over the 451 00:23:33,880 --> 00:23:35,410 range of interest. 452 00:23:35,410 --> 00:23:39,160 Similarly, if you kind of spin your needle completely at 453 00:23:39,160 --> 00:23:43,580 random, any angle should be as likely as any other angle. 454 00:23:43,580 --> 00:23:47,160 And we decide to model this situation by saying that theta 455 00:23:47,160 --> 00:23:49,680 also has a uniform distribution over 456 00:23:49,680 --> 00:23:50,995 the range of interest. 457 00:23:50,995 --> 00:23:54,220 458 00:23:54,220 --> 00:23:58,500 And finally, where we put it should have nothing to do with 459 00:23:58,500 --> 00:24:00,370 how much we rotate it. 460 00:24:00,370 --> 00:24:04,320 And we capture this mathematically by saying that 461 00:24:04,320 --> 00:24:07,480 X is going to be independent of theta. 
462 00:24:07,480 --> 00:24:09,220 Now, this is going to be our model. 463 00:24:09,220 --> 00:24:11,920 I'm not deriving the model from anything. 464 00:24:11,920 --> 00:24:15,480 I'm only saying that this sounds like a model that does 465 00:24:15,480 --> 00:24:19,800 not assume any knowledge or preference for certain values 466 00:24:19,800 --> 00:24:22,360 of x rather than other values of theta. 467 00:24:22,360 --> 00:24:25,660 In the absence of any other particular information you 468 00:24:25,660 --> 00:24:28,420 might have in your hands, that's the most reasonable 469 00:24:28,420 --> 00:24:30,520 model to come up with. 470 00:24:30,520 --> 00:24:32,150 So you model the problem that way. 471 00:24:32,150 --> 00:24:35,490 So what's the formula for the joint density? 472 00:24:35,490 --> 00:24:37,590 It's going to be the product of the 473 00:24:37,590 --> 00:24:41,200 densities of X and Theta. 474 00:24:41,200 --> 00:24:42,410 Why is it the product? 475 00:24:42,410 --> 00:24:45,530 This is because we assumed independence. 476 00:24:45,530 --> 00:24:48,910 And the density of X, since it's uniform, and since it 477 00:24:48,910 --> 00:24:54,630 needs to integrate to 1, that density needs to be 2/d. 478 00:24:54,630 --> 00:24:57,580 That's the density of X. And the density of 479 00:24:57,580 --> 00:25:00,740 Theta needs to be 2/pi. 480 00:25:00,740 --> 00:25:03,660 That's the value for the density of Theta so that the 481 00:25:03,660 --> 00:25:07,920 overall probability over this interval ends up being 1. 482 00:25:07,920 --> 00:25:12,390 So now we do have our joint density in our hands. 483 00:25:12,390 --> 00:25:14,690 The next thing to do is to identify 484 00:25:14,690 --> 00:25:17,920 the event of interest. 485 00:25:17,920 --> 00:25:20,720 And this is best done in a picture. 486 00:25:20,720 --> 00:25:23,380 And there's two possible situations 487 00:25:23,380 --> 00:25:25,450 that one could have. 
488 00:25:25,450 --> 00:25:33,450 Either the needle falls this way, or it falls this way. 489 00:25:33,450 --> 00:25:38,300 So how can we tell if one or the other is going to happen? 490 00:25:38,300 --> 00:25:45,470 It has to do with whether this interval here is smaller than 491 00:25:45,470 --> 00:25:50,130 that or bigger than that. 492 00:25:50,130 --> 00:25:52,260 So we are comparing the height of this 493 00:25:52,260 --> 00:25:55,460 interval to that interval. 494 00:25:55,460 --> 00:25:58,220 This interval here is capital X. 495 00:25:58,220 --> 00:26:02,350 This interval here, what is it? 496 00:26:02,350 --> 00:26:07,040 This is half of the length of the needle, which is l/2. 497 00:26:07,040 --> 00:26:10,590 To find this height, we take l/2 and multiply it with the 498 00:26:10,590 --> 00:26:13,700 sine of the angle that we have. 499 00:26:13,700 --> 00:26:18,330 So the length of this interval up here is 500 00:26:18,330 --> 00:26:23,500 l/2 times sine theta. 501 00:26:23,500 --> 00:26:28,520 If this is smaller than x, the needle does not 502 00:26:28,520 --> 00:26:30,010 intersect the line. 503 00:26:30,010 --> 00:26:33,130 If this is bigger than x, then the needle 504 00:26:33,130 --> 00:26:34,920 intersects the line. 505 00:26:34,920 --> 00:26:37,870 So the event of interest, that the needle intersects the 506 00:26:37,870 --> 00:26:42,740 line, is described this way in terms of x and theta. 507 00:26:42,740 --> 00:26:46,170 And now that we have the event of interest described 508 00:26:46,170 --> 00:26:50,100 mathematically, all that we need to do is to find the 509 00:26:50,100 --> 00:26:54,800 probability of this event, we integrate the joint density 510 00:26:54,800 --> 00:26:59,560 over the part of (x, theta) space in which this 511 00:26:59,560 --> 00:27:01,320 inequality is true. 512 00:27:01,320 --> 00:27:04,670 So it's a double integral over the set of all x's and theta's 513 00:27:04,670 --> 00:27:06,450 where this is true. 
514 00:27:06,450 --> 00:27:11,430 The way to do this integral is we fix theta, and we integrate 515 00:27:11,430 --> 00:27:15,150 for x's that go from 0 up to that number. 516 00:27:15,150 --> 00:27:19,030 And theta can be anything between 0 and pi/2. 517 00:27:19,030 --> 00:27:23,620 So the integral over this set is basically this double 518 00:27:23,620 --> 00:27:24,980 integral here. 519 00:27:24,980 --> 00:27:27,475 We already have a formula for the joint density. 520 00:27:27,475 --> 00:27:30,930 It's 4 over pi d, so we put it here. 521 00:27:30,930 --> 00:27:32,640 And now, fortunately, this is a pretty 522 00:27:32,640 --> 00:27:34,645 easy integral to evaluate. 523 00:27:34,645 --> 00:27:37,650 The integral with respect to x -- there's nothing in here. 524 00:27:37,650 --> 00:27:40,950 So the integral is just the length of the interval over 525 00:27:40,950 --> 00:27:42,370 which we're integrating. 526 00:27:42,370 --> 00:27:44,950 It's l/2 sine theta. 527 00:27:44,950 --> 00:27:47,870 And then we need to integrate this with respect to theta. 528 00:27:47,870 --> 00:27:53,990 We know that the integral of a sine is a negative cosine. 529 00:27:53,990 --> 00:27:56,990 You plug in the values for the negative cosine 530 00:27:56,990 --> 00:27:58,390 at the two end points. 531 00:27:58,390 --> 00:28:00,260 I'm sure you can do this integral. 532 00:28:00,260 --> 00:28:04,540 And we finally obtain the answer, which is amazingly 533 00:28:04,540 --> 00:28:08,210 simple for such a pretty complicated-looking problem. 534 00:28:08,210 --> 00:28:09,910 It's 2l over pi d. 535 00:28:09,910 --> 00:28:12,420 536 00:28:12,420 --> 00:28:15,360 So some people a long, long time ago, after they looked at 537 00:28:15,360 --> 00:28:19,290 this answer, they said that maybe that gives us an 538 00:28:19,290 --> 00:28:22,910 interesting way where one could estimate the value of 539 00:28:22,910 --> 00:28:26,130 pi, for example, experimentally.
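Written out, the calculation just described looks roughly as follows. This is a sketch reconstructed from the spoken description, since the slide is not part of the transcript; the subscripted density notation is an assumption, while the limits, the constant 4/(pi d), and the final answer are the ones stated above.

```latex
\begin{align*}
P\!\left(X \le \tfrac{l}{2}\sin\Theta\right)
  &= \int_{0}^{\pi/2} \int_{0}^{(l/2)\sin\theta} \frac{4}{\pi d}\, dx\, d\theta \\
  &= \frac{4}{\pi d} \int_{0}^{\pi/2} \frac{l}{2}\sin\theta \, d\theta \\
  &= \frac{2l}{\pi d}\,\bigl[-\cos\theta\bigr]_{0}^{\pi/2}
   = \frac{2l}{\pi d}.
\end{align*}
```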
540 00:28:26,130 --> 00:28:27,690 How do you do that? 541 00:28:27,690 --> 00:28:32,360 Fix l and d, the dimensions of the problem. 542 00:28:32,360 --> 00:28:36,680 Throw a million needles on your piece of paper. 543 00:28:36,680 --> 00:28:40,690 See how often your needles do intersect the line. 544 00:28:40,690 --> 00:28:43,540 That gives you a number for this quantity. 545 00:28:43,540 --> 00:28:48,540 You know l and d, so you can use that to infer pi. 546 00:28:48,540 --> 00:28:52,330 And there's an apocryphal story about a wounded soldier 547 00:28:52,330 --> 00:28:55,300 in a hospital after the American Civil War who 548 00:28:55,300 --> 00:28:58,490 actually had heard about this and was spending his time in 549 00:28:58,490 --> 00:29:02,680 the hospital throwing needles on pieces of paper. 550 00:29:02,680 --> 00:29:04,350 I don't know if it's true or not. 551 00:29:04,350 --> 00:29:07,330 But let's do something similar here. 552 00:29:07,330 --> 00:29:11,720 So let's look at this diagram. 553 00:29:11,720 --> 00:29:14,110 We fix the dimensions. 554 00:29:14,110 --> 00:29:15,920 This is supposed to be our little d. 555 00:29:15,920 --> 00:29:18,330 That's supposed to be our little l. 556 00:29:18,330 --> 00:29:22,430 We have the formula from the previous slide that p 557 00:29:22,430 --> 00:29:25,230 is 2l over pi d. 558 00:29:25,230 --> 00:29:29,230 In this instance, we choose d to be twice l. 559 00:29:29,230 --> 00:29:32,170 So this number is 1/pi. 560 00:29:32,170 --> 00:29:37,770 So the probability that the needle hits the line is 1/pi. 561 00:29:37,770 --> 00:29:41,150 So I need needles that are 3.1 centimeters long. 562 00:29:41,150 --> 00:29:42,730 I couldn't find such needles. 563 00:29:42,730 --> 00:29:47,360 But I could find paper clips that are 3.1 centimeters long. 564 00:29:47,360 --> 00:29:51,510 So let's start throwing paper clips at random and see how 565 00:29:51,510 --> 00:29:55,285 many of them will end up intersecting the lines.
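The needle-throwing experiment can also be run on a computer. The following is a minimal sketch, not something shown in the lecture: it draws X uniformly on [0, d/2] and Theta uniformly on [0, pi/2], independently, and counts how often X is at most (l/2) sin(Theta). The function name, the seed, and the number of throws are illustrative assumptions. Note that the simulation itself uses math.pi to draw the angle, so it illustrates the method rather than giving an independent way of discovering pi.

```python
import math
import random


def estimate_crossing_probability(needle_len, line_gap, n_throws, seed=0):
    """Estimate P(needle crosses a line) under the model from the lecture."""
    if needle_len >= line_gap:
        raise ValueError("the model assumes the needle is shorter than the line spacing")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        # Distance from the needle's center to the nearest line: uniform on [0, d/2].
        x = rng.uniform(0.0, line_gap / 2)
        # Acute angle between the needle and the lines: uniform on [0, pi/2].
        theta = rng.uniform(0.0, math.pi / 2)
        # Event of interest: the half-needle's vertical extent reaches the line.
        if x <= (needle_len / 2) * math.sin(theta):
            hits += 1
    return hits / n_throws


l, d = 1.0, 2.0  # d = 2l, as in the lecture, so the exact probability is 1/pi
p_hat = estimate_crossing_probability(l, d, n_throws=200_000)
pi_hat = 2 * l / (p_hat * d)  # invert p = 2l / (pi d)
print(f"estimated probability: {p_hat:.4f}, implied estimate of pi: {pi_hat:.4f}")
```

With a few hundred thousand throws the estimate typically lands within a percent or so of 1/pi; with only eight throws, as in the classroom demonstration, it can easily be far off.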
566 00:29:55,285 --> 00:30:00,501 567 00:30:00,501 --> 00:30:01,920 Good. 568 00:30:01,920 --> 00:30:02,400 OK. 569 00:30:02,400 --> 00:30:09,350 So out of eight paper clips, we have exactly four that 570 00:30:09,350 --> 00:30:11,510 intersected the line. 571 00:30:11,510 --> 00:30:13,620 So our estimate for the probability of intersecting 572 00:30:13,620 --> 00:30:18,970 the line is 1/2, which gives us an estimate for the value 573 00:30:18,970 --> 00:30:22,010 of pi, which is two. 574 00:30:22,010 --> 00:30:24,960 Well, I mean, within an engineering approximation, 575 00:30:24,960 --> 00:30:29,090 we're in the right ballpark, right? 576 00:30:29,090 --> 00:30:32,890 So this might look like a silly way of trying to 577 00:30:32,890 --> 00:30:33,920 estimate pi. 578 00:30:33,920 --> 00:30:36,420 And it probably is. 579 00:30:36,420 --> 00:30:41,200 On the other hand, this kind of methodology is being used 580 00:30:41,200 --> 00:30:44,930 especially by physicists and also by statisticians. 581 00:30:44,930 --> 00:30:46,550 It's used a lot. 582 00:30:46,550 --> 00:30:48,260 When is it used? 583 00:30:48,260 --> 00:30:52,300 If you have an integral to calculate, such as this 584 00:30:52,300 --> 00:30:55,980 integral, but you're not lucky, and your functions are 585 00:30:55,980 --> 00:30:59,980 not so simple where you can do your calculations by hand, and 586 00:30:59,980 --> 00:31:02,590 maybe the dimensions are larger-- instead of two random 587 00:31:02,590 --> 00:31:04,590 variables you have 100 random variables, so 588 00:31:04,590 --> 00:31:08,210 it's a 100-fold integral-- 589 00:31:08,210 --> 00:31:10,830 then there's no way to do that in the computer. 590 00:31:10,830 --> 00:31:14,230 But the way that you can actually do it is by 591 00:31:14,230 --> 00:31:18,290 generating random samples of your random variables, doing 592 00:31:18,290 --> 00:31:21,220 that simulation over and over many times. 
593 00:31:21,220 --> 00:31:25,010 That is, by interpreting an integral as a probability, you 594 00:31:25,010 --> 00:31:29,060 can use simulation to estimate that probability. 595 00:31:29,060 --> 00:31:32,470 And that gives you a way of calculating integrals. 596 00:31:32,470 --> 00:31:36,850 And physicists do actually use that a lot, as well as 597 00:31:36,850 --> 00:31:39,630 statisticians, computer scientists, and so on. 598 00:31:39,630 --> 00:31:41,760 It's a so-called Monte Carlo method 599 00:31:41,760 --> 00:31:43,990 for evaluating integrals. 600 00:31:43,990 --> 00:31:50,250 And it's a basic piece of the toolbox in science these days. 601 00:31:50,250 --> 00:31:54,610 Finally, the harder concept of the day is the idea of 602 00:31:54,610 --> 00:31:55,770 conditioning. 603 00:31:55,770 --> 00:31:58,740 And here things become a little subtle when you deal 604 00:31:58,740 --> 00:32:00,970 with continuous random variables. 605 00:32:00,970 --> 00:32:02,290 OK. 606 00:32:02,290 --> 00:32:05,810 First, remember again our basic interpretation of what a 607 00:32:05,810 --> 00:32:06,860 density is. 608 00:32:06,860 --> 00:32:08,200 A density gives us 609 00:32:08,200 --> 00:32:10,500 probabilities of little intervals. 610 00:32:10,500 --> 00:32:13,560 So how should we define conditional densities? 611 00:32:13,560 --> 00:32:16,600 Conditional densities should again give us probabilities of 612 00:32:16,600 --> 00:32:21,290 little intervals, but inside a conditional world where we 613 00:32:21,290 --> 00:32:24,530 have been told something about the other random variable. 614 00:32:24,530 --> 00:32:28,090 So what we would like to be true is the following. 615 00:32:28,090 --> 00:32:31,340 We would like to define a concept of a conditional 616 00:32:31,340 --> 00:32:34,530 density of a random variable X given the value of another 617 00:32:34,530 --> 00:32:37,860 random variable Y. 
And it should behave the following 618 00:32:37,860 --> 00:32:40,570 way, that the conditional density gives us the 619 00:32:40,570 --> 00:32:42,690 probability of little intervals-- 620 00:32:42,690 --> 00:32:44,260 same as here-- 621 00:32:44,260 --> 00:32:48,440 given that we are told the value of y. 622 00:32:48,440 --> 00:32:50,930 And here's where the subtleties come. 623 00:32:50,930 --> 00:32:54,420 The main thing to notice is that here I didn't write 624 00:32:54,420 --> 00:32:59,000 "equal," I wrote "approximately equal." Why do 625 00:32:59,000 --> 00:33:01,250 we need that? 626 00:33:01,250 --> 00:33:04,460 Well, the thing is that conditional probabilities are 627 00:33:04,460 --> 00:33:08,840 not defined when you condition on an event that has 0 628 00:33:08,840 --> 00:33:10,180 probability. 629 00:33:10,180 --> 00:33:13,400 So we need the conditioning event here to have positive 630 00:33:13,400 --> 00:33:14,430 probability. 631 00:33:14,430 --> 00:33:18,840 So instead of saying that Y is exactly equal to little y, we 632 00:33:18,840 --> 00:33:22,900 want to instead say we're in a new universe where capital Y 633 00:33:22,900 --> 00:33:27,070 is very close to little y. 634 00:33:27,070 --> 00:33:31,410 And then this notion of "very close" kind of takes the limit 635 00:33:31,410 --> 00:33:34,910 and takes it to be infinitesimally close. 636 00:33:34,910 --> 00:33:38,610 So this is the way to interpret conditional 637 00:33:38,610 --> 00:33:40,120 probabilities. 638 00:33:40,120 --> 00:33:42,550 That's what they should mean. 639 00:33:42,550 --> 00:33:45,330 Now, in practice, when you actually use probability, you 640 00:33:45,330 --> 00:33:46,780 forget about that subtlety. 641 00:33:46,780 --> 00:33:50,940 And you say, well, I've been told that Y is equal to 1.3. 642 00:33:50,940 --> 00:33:53,780 Give me the conditional distribution of X.
But 643 00:33:53,780 --> 00:33:58,080 formally or rigorously, you should say I'm being told that 644 00:33:58,080 --> 00:34:01,400 Y is infinitesimally close to 1.3. 645 00:34:01,400 --> 00:34:03,620 Tell me the distribution of X. 646 00:34:03,620 --> 00:34:08,580 Now, if this is what we want, what should this quantity be? 647 00:34:08,580 --> 00:34:10,489 It's a conditional probability, so it should be 648 00:34:10,489 --> 00:34:12,800 the probability of two things happening-- 649 00:34:12,800 --> 00:34:16,550 X being close to little x, Y being close to little y. 650 00:34:16,550 --> 00:34:20,010 And that's basically given to us by the joint density 651 00:34:20,010 --> 00:34:23,920 divided by the probability of the conditioning event, which 652 00:34:23,920 --> 00:34:27,449 has something to do with the density of Y itself. 653 00:34:27,449 --> 00:34:30,840 And if you do things carefully, you see that the 654 00:34:30,840 --> 00:34:34,350 only way to satisfy this relation is to define the 655 00:34:34,350 --> 00:34:38,065 conditional density by this particular formula. 656 00:34:38,065 --> 00:34:38,590 OK. 657 00:34:38,590 --> 00:34:44,159 Big discussion to come down in the end to what you should 658 00:34:44,159 --> 00:34:46,120 have probably guessed by now. 659 00:34:46,120 --> 00:34:49,170 We just take any formulas and expressions from the discrete 660 00:34:49,170 --> 00:34:53,570 case and replace PMFs by PDFs. 661 00:34:53,570 --> 00:34:58,030 So the conditional PDF is defined by this formula where 662 00:34:58,030 --> 00:35:02,450 here we have joint PDF and marginal PDF, as opposed to 663 00:35:02,450 --> 00:35:05,450 the discrete case where we had the joint PMF and 664 00:35:05,450 --> 00:35:07,540 the marginal PMF. 665 00:35:07,540 --> 00:35:11,850 So in some sense, it's just a syntactic change. 666 00:35:11,850 --> 00:35:14,510 In another sense, it's a little subtler on how you 667 00:35:14,510 --> 00:35:17,130 actually interpret it. 
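The argument for the conditional density can be written out as follows. This is a sketch of the reasoning described above; the small interval widths delta and epsilon are notation introduced here for illustration and may differ from what appears on the slide.

```latex
\begin{align*}
P\bigl(x \le X \le x+\delta \,\bigm|\, y \le Y \le y+\epsilon\bigr)
  &= \frac{P(x \le X \le x+\delta,\; y \le Y \le y+\epsilon)}{P(y \le Y \le y+\epsilon)} \\
  &\approx \frac{f_{X,Y}(x,y)\,\delta\,\epsilon}{f_Y(y)\,\epsilon}
   = \frac{f_{X,Y}(x,y)}{f_Y(y)}\,\delta .
\end{align*}
% Requiring this to equal f_{X|Y}(x | y) * delta leads to the definition
\[
f_{X\mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}, \qquad \text{for } f_Y(y) > 0 .
\]
```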
668 00:35:17,130 --> 00:35:20,230 Speaking about interpretation, what are some ways of thinking 669 00:35:20,230 --> 00:35:22,170 about the conditional density? 670 00:35:22,170 --> 00:35:24,740 Well, the best way to think about it is that somebody has 671 00:35:24,740 --> 00:35:27,720 fixed little y for you. 672 00:35:27,720 --> 00:35:31,980 So little y is being fixed here. 673 00:35:31,980 --> 00:35:35,350 And we look at this density as a function of X. 674 00:35:35,350 --> 00:35:37,020 I've told you what Y is. 675 00:35:37,020 --> 00:35:39,870 Tell me what you know about X. And you tell me that X has a 676 00:35:39,870 --> 00:35:42,070 certain distribution. 677 00:35:42,070 --> 00:35:44,840 What does that distribution look like? 678 00:35:44,840 --> 00:35:50,070 It has exactly the same shape as the joint density. 679 00:35:50,070 --> 00:35:53,390 Remember, we fixed Y. So this is a constant. 680 00:35:53,390 --> 00:35:57,200 So the only thing that varies is X. So we get the function 681 00:35:57,200 --> 00:36:01,320 that behaves like the joint density when you fix y, which 682 00:36:01,320 --> 00:36:04,100 is really you take the joint density, and you 683 00:36:04,100 --> 00:36:05,650 take a slice of it. 684 00:36:05,650 --> 00:36:09,200 You fix a y, and you see how it varies with x. 685 00:36:09,200 --> 00:36:11,810 So in that sense, the conditional PDF is just a 686 00:36:11,810 --> 00:36:14,150 slice of the joint PDF. 687 00:36:14,150 --> 00:36:17,230 But we need to divide by a certain number, which just 688 00:36:17,230 --> 00:36:19,480 scales it without changing its shape. 689 00:36:19,480 --> 00:36:21,950 We're coming back to a picture in a second. 690 00:36:21,950 --> 00:36:25,410 But before going to the picture, let's go back to the 691 00:36:25,410 --> 00:36:27,840 interpretation of independence.
692 00:36:27,840 --> 00:36:30,230 If the two random variables are independent, 693 00:36:30,230 --> 00:36:33,550 according to our definition in the previous slide, the joint 694 00:36:33,550 --> 00:36:36,130 density is going to factor as the product of 695 00:36:36,130 --> 00:36:37,820 the marginal densities. 696 00:36:37,820 --> 00:36:40,850 The density of Y in the numerator cancels the density 697 00:36:40,850 --> 00:36:42,010 in the denominator. 698 00:36:42,010 --> 00:36:44,410 And we're just left with the density of X. 699 00:36:44,410 --> 00:36:46,940 So in the case of independence, what we get is 700 00:36:46,940 --> 00:36:49,870 that the conditional is the same as the marginal. 701 00:36:49,870 --> 00:36:52,980 And that solidifies our intuition that in the case of 702 00:36:52,980 --> 00:36:58,080 independence, being told something about the value of Y 703 00:36:58,080 --> 00:37:02,540 does not change our beliefs about how X is distributed. 704 00:37:02,540 --> 00:37:06,110 So whatever we expected about X is going to remain true even 705 00:37:06,110 --> 00:37:09,180 after we are told something about Y. 706 00:37:09,180 --> 00:37:12,680 So let's look at some pictures. 707 00:37:12,680 --> 00:37:16,110 Here is what the joint PDF might look like. 708 00:37:16,110 --> 00:37:19,480 Here we've got our x and y-axis. 709 00:37:19,480 --> 00:37:23,100 And if you want to calculate the probability of a certain 710 00:37:23,100 --> 00:37:27,240 event, what you do is you look at that event and you see how 711 00:37:27,240 --> 00:37:31,740 much of that mass is sitting on top of that event. 712 00:37:31,740 --> 00:37:35,180 Now let's start slicing. 713 00:37:35,180 --> 00:37:43,360 Let's fix a value of x and look along that slice where we 714 00:37:43,360 --> 00:37:48,610 obtain this function. 715 00:37:48,610 --> 00:37:52,280 Now what does that slice do?
716 00:37:52,280 --> 00:37:56,100 That slice tells us for that particular x what the possible 717 00:37:56,100 --> 00:38:00,330 values of y are going to be and how likely they are. 718 00:38:00,330 --> 00:38:05,440 If we integrate over all y's, what do we get? 719 00:38:05,440 --> 00:38:10,400 Integrating over all y's just gives us the marginal density 720 00:38:10,400 --> 00:38:15,270 of X. It's the calculation that we did here. 721 00:38:15,270 --> 00:38:19,820 By integrating over all y's, we find the marginal density 722 00:38:19,820 --> 00:38:27,850 of X. So the total area under that slice gives us the 723 00:38:27,850 --> 00:38:31,340 marginal density of X. And by looking at the different 724 00:38:31,340 --> 00:38:35,430 slices, we find how likely the different values of x are 725 00:38:35,430 --> 00:38:36,660 going to be. 726 00:38:36,660 --> 00:38:39,410 How about the conditional? 727 00:38:39,410 --> 00:38:48,790 If we're interested in the conditional of Y given X, how 728 00:38:48,790 --> 00:38:51,200 would you think about it? 729 00:38:51,200 --> 00:38:54,620 This refers to a universe where we are told that capital 730 00:38:54,620 --> 00:38:57,550 X takes on a specific value. 731 00:38:57,550 --> 00:39:00,010 So we put ourselves in the universe where 732 00:39:00,010 --> 00:39:01,810 this line has happened. 733 00:39:01,810 --> 00:39:05,940 There's still possible values of y that can happen. 734 00:39:05,940 --> 00:39:09,270 And this shape kind of tells us the relative likelihoods of 735 00:39:09,270 --> 00:39:10,760 the different y's. 736 00:39:10,760 --> 00:39:14,060 And this is indeed going to be the shape of the conditional 737 00:39:14,060 --> 00:39:17,850 distribution of Y given that X has occurred. 738 00:39:17,850 --> 00:39:21,090 On the other hand, the conditional distribution must 739 00:39:21,090 --> 00:39:22,630 add up to 1. 
740 00:39:22,630 --> 00:39:25,920 So the total probability over all of the different y's in 741 00:39:25,920 --> 00:39:27,730 this universe, that total probability 742 00:39:27,730 --> 00:39:29,540 should be equal to 1. 743 00:39:29,540 --> 00:39:31,450 Here it's not equal to 1. 744 00:39:31,450 --> 00:39:34,290 The total area is the marginal density. 745 00:39:34,290 --> 00:39:38,590 To make it equal to 1, we need to divide by the marginal 746 00:39:38,590 --> 00:39:44,160 density, which is basically to renormalize this shape so that 747 00:39:44,160 --> 00:39:48,500 the total area under that slice, under that shape, is 748 00:39:48,500 --> 00:39:50,400 equal to 1. 749 00:39:50,400 --> 00:39:53,430 So we start with the joint. 750 00:39:53,430 --> 00:39:55,730 We take the slices. 751 00:39:55,730 --> 00:40:00,280 And then we adjust the slices so that every slice has an 752 00:40:00,280 --> 00:40:03,610 area underneath equal to 1. 753 00:40:03,610 --> 00:40:05,650 And this gives us the conditional. 754 00:40:05,650 --> 00:40:09,160 So for example, down here-- 755 00:40:09,160 --> 00:40:11,840 you can not even see it in this diagram-- 756 00:40:11,840 --> 00:40:15,410 but after you renormalize it so that its total area is 757 00:40:15,410 --> 00:40:20,160 equal to 1, you get this sort of narrow spike that goes up. 758 00:40:20,160 --> 00:40:22,980 And so this is a plot of the conditional distributions that 759 00:40:22,980 --> 00:40:26,060 you get for the different values of x. 760 00:40:26,060 --> 00:40:29,050 Given a particular value of x, you're going to get this 761 00:40:29,050 --> 00:40:31,460 certain conditional distribution. 762 00:40:31,460 --> 00:40:36,460 So this picture is worth about as much as anything else in 763 00:40:36,460 --> 00:40:38,840 this particular chapter. 764 00:40:38,840 --> 00:40:42,990 Make sure you kind of understand exactly all these 765 00:40:42,990 --> 00:40:44,240 pieces of the picture. 
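The slice-and-renormalize picture can be checked numerically. The sketch below uses a made-up joint density, f(x, y) = x + y on the unit square, which is an assumption chosen only because it integrates to 1 and is easy to verify by hand; it is not the density drawn on the slide. The area under a slice at a fixed x is the marginal density of X at that x, and dividing the slice by that area gives a conditional density whose area is 1.

```python
def joint_density(x, y):
    """Hypothetical joint PDF on the unit square: f(x, y) = x + y."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0


def integrate(f, a, b, n=1_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def marginal_x(x):
    """Area under the slice at this x, i.e. the marginal density of X at x."""
    return integrate(lambda y: joint_density(x, y), 0.0, 1.0)


def conditional_slice(x):
    """Return the function y -> f_{Y|X}(y | x): the slice divided by its area."""
    fx = marginal_x(x)  # computed once per slice
    return lambda y: joint_density(x, y) / fx


x0 = 0.25
area_of_slice = marginal_x(x0)  # for this density, x0 + 1/2
area_after_rescaling = integrate(conditional_slice(x0), 0.0, 1.0)
print(area_of_slice, area_after_rescaling)
```

The first number is the height of the marginal density at x0, which in general is not 1; the second is the total area under the renormalized slice, which is 1 up to rounding.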
766 00:40:44,240 --> 00:40:47,130 767 00:40:47,130 --> 00:40:49,870 And finally, let's go, in the remaining time, through an 768 00:40:49,870 --> 00:40:55,240 example where we're going to throw in the bucket all the 769 00:40:55,240 --> 00:40:58,320 concepts and notations that we have introduced so far. 770 00:40:58,320 --> 00:40:59,960 So the example is as follows. 771 00:40:59,960 --> 00:41:04,210 We start with a stick that has a certain length. 772 00:41:04,210 --> 00:41:07,790 And we break it at a completely random location. 773 00:41:07,790 --> 00:41:09,390 And-- 774 00:41:09,390 --> 00:41:13,686 yes, this 1 should be l. 775 00:41:13,686 --> 00:41:14,130 OK. 776 00:41:14,130 --> 00:41:15,770 So it has length l. 777 00:41:15,770 --> 00:41:19,210 And we're going to break it at the random place. 778 00:41:19,210 --> 00:41:21,970 And we call that random place where we break it, we call it 779 00:41:21,970 --> 00:41:24,210 X. 780 00:41:24,210 --> 00:41:26,670 X can be anywhere, uniform distribution. 781 00:41:26,670 --> 00:41:31,800 So this means that X has a density that goes from 0 to l. 782 00:41:31,800 --> 00:41:34,760 I guess this capital L is supposed to be the same as the 783 00:41:34,760 --> 00:41:36,190 lower-case l. 784 00:41:36,190 --> 00:41:39,430 So that's the density of X. And since the density needs to 785 00:41:39,430 --> 00:41:43,160 integrate to 1, the height of that density has to be 1/l. 786 00:41:43,160 --> 00:41:46,330 787 00:41:46,330 --> 00:41:49,660 Now, having broken the stick and given that we are left 788 00:41:49,660 --> 00:41:53,080 with this piece of the stick, I'm now going to break it 789 00:41:53,080 --> 00:41:56,900 again at a completely random place, meaning I'm going to 790 00:41:56,900 --> 00:41:59,940 choose a point where I break it uniformly over the length 791 00:41:59,940 --> 00:42:00,940 of the stick. 792 00:42:00,940 --> 00:42:02,750 What does this mean?
793 00:42:02,750 --> 00:42:05,720 And let's call Y the location where I break it. 794 00:42:05,720 --> 00:42:10,290 So Y is going to range between 0 and x. 795 00:42:10,290 --> 00:42:11,850 x is the stick that I'm left with. 796 00:42:11,850 --> 00:42:14,190 So I'm going to break it somewhere in between. 797 00:42:14,190 --> 00:42:21,140 So I pick a y between 0 and x. 798 00:42:21,140 --> 00:42:24,480 And of course, x is less than l. 799 00:42:24,480 --> 00:42:26,150 And I'm going to break it there. 800 00:42:26,150 --> 00:42:30,640 So y is uniform between 0 and x. 801 00:42:30,640 --> 00:42:36,460 What does that mean, that the density of y, given that you 802 00:42:36,460 --> 00:42:42,940 have already told me x, ranges from 0 to little x? 803 00:42:42,940 --> 00:42:46,170 If I told you that the first break happened at a particular 804 00:42:46,170 --> 00:42:50,850 x, then y can only range over this interval. 805 00:42:50,850 --> 00:42:52,830 And I'm assuming a uniform 806 00:42:52,830 --> 00:42:54,330 distribution over that interval. 807 00:42:54,330 --> 00:42:56,420 So we have this kind of shape. 808 00:42:56,420 --> 00:43:00,700 And that fixes for us the height of 809 00:43:00,700 --> 00:43:01,950 the conditional density. 810 00:43:01,950 --> 00:43:05,380 811 00:43:05,380 --> 00:43:11,690 So what's the joint density of those two random variables? 812 00:43:11,690 --> 00:43:14,440 By the definition of conditional densities, the 813 00:43:14,440 --> 00:43:18,290 conditional was defined as the ratio of this divided by that. 814 00:43:18,290 --> 00:43:21,500 So we can find the joint density by taking the marginal 815 00:43:21,500 --> 00:43:23,630 and then multiplying by the conditional. 816 00:43:23,630 --> 00:43:26,120 This is the same formula as in the discrete case. 817 00:43:26,120 --> 00:43:29,770 This is our very familiar multiplication rule, but 818 00:43:29,770 --> 00:43:32,150 adjusted to the case of continuous random variables. 
819 00:43:32,150 --> 00:43:34,871 So Ps become Fs. 820 00:43:34,871 --> 00:43:35,290 OK. 821 00:43:35,290 --> 00:43:37,560 So we do have a formula for this. 822 00:43:37,560 --> 00:43:38,540 What is it? 823 00:43:38,540 --> 00:43:40,190 It's 1/l-- 824 00:43:40,190 --> 00:43:42,140 that's the density of X -- 825 00:43:42,140 --> 00:43:46,460 times 1/x, which is the conditional density of Y. This 826 00:43:46,460 --> 00:43:48,630 is the formula for the joint density. 827 00:43:48,630 --> 00:43:50,140 But we must be careful. 828 00:43:50,140 --> 00:43:53,230 This is a formula that's not valid everywhere. 829 00:43:53,230 --> 00:43:57,150 It's only valid for the x's and y's that are possible. 830 00:43:57,150 --> 00:44:00,840 And the x's and y's that are possible are given by these 831 00:44:00,840 --> 00:44:01,900 inequalities. 832 00:44:01,900 --> 00:44:05,940 So x can range from 0 to l, and y can only be 833 00:44:05,940 --> 00:44:07,270 smaller than x. 834 00:44:07,270 --> 00:44:09,780 So this is the formula for the density on 835 00:44:09,780 --> 00:44:12,310 this part of our space. 836 00:44:12,310 --> 00:44:16,270 The density is 0 anywhere else. 837 00:44:16,270 --> 00:44:18,430 So what does it look like? 838 00:44:18,430 --> 00:44:20,950 It's basically a 1/x function. 839 00:44:20,950 --> 00:44:23,460 So it's sort of constant along that dimension. 840 00:44:23,460 --> 00:44:27,600 But as x goes to 0, your density goes up and 841 00:44:27,600 --> 00:44:29,280 can even blow up. 842 00:44:29,280 --> 00:44:33,400 It sort of looks like a sail that's raised and somewhat 843 00:44:33,400 --> 00:44:37,640 curved and has a point up there going to infinity. 844 00:44:37,640 --> 00:44:39,680 So this is the joint density. 845 00:44:39,680 --> 00:44:43,480 Now once you have in your hands a joint density, then 846 00:44:43,480 --> 00:44:46,010 you can answer in principle any problem. 847 00:44:46,010 --> 00:44:50,550 It's just a matter of plugging in and doing computations.
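In symbols, the joint density for the stick-breaking example can be summarized as below. This is a sketch in standard notation, assumed here because the slide is not reproduced in the transcript; the values 1/l and 1/x and the region of validity are the ones stated above.

```latex
\[
f_{X,Y}(x,y) = f_X(x)\, f_{Y\mid X}(y \mid x) =
\begin{cases}
\dfrac{1}{l}\cdot\dfrac{1}{x} = \dfrac{1}{l\,x}, & 0 \le y \le x \le l,\ x > 0,\\[2ex]
0, & \text{otherwise.}
\end{cases}
\]
```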
848 00:44:50,550 --> 00:44:53,650 How about calculating something like a conditional 849 00:44:53,650 --> 00:44:59,040 expectation of Y given a value of x? 850 00:44:59,040 --> 00:44:59,430 OK. 851 00:44:59,430 --> 00:45:02,530 That's a concept we have not defined so far. 852 00:45:02,530 --> 00:45:04,860 But how should we define it? 853 00:45:04,860 --> 00:45:06,080 Means the reasonable thing. 854 00:45:06,080 --> 00:45:09,930 We'll define it the same way as ordinary expectations 855 00:45:09,930 --> 00:45:14,160 except that since we're given some conditioning information, 856 00:45:14,160 --> 00:45:17,130 we should use the probability distribution that applies to 857 00:45:17,130 --> 00:45:18,840 that particular situation. 858 00:45:18,840 --> 00:45:22,570 So in a situation where we are told the value of x, the 859 00:45:22,570 --> 00:45:25,760 distribution that applies is the conditional distribution 860 00:45:25,760 --> 00:45:29,950 of Y. So it's going to be the conditional density of Y given 861 00:45:29,950 --> 00:45:31,470 the value of x. 862 00:45:31,470 --> 00:45:34,120 Now, we know what this is. 863 00:45:34,120 --> 00:45:37,860 It's given by 1/x. 864 00:45:37,860 --> 00:45:46,160 So we need to integrate y times 1/x dy. 865 00:45:46,160 --> 00:45:48,920 And what should we integrate over? 866 00:45:48,920 --> 00:45:53,930 Well, given the value of x, y can only range from 0 to x. 867 00:45:53,930 --> 00:45:56,150 So this is what we get. 868 00:45:56,150 --> 00:46:01,690 And you do your integral, and you get that this is x/2. 869 00:46:01,690 --> 00:46:03,060 Is it a surprise? 870 00:46:03,060 --> 00:46:04,450 It shouldn't be. 871 00:46:04,450 --> 00:46:10,890 This is just the expected value of Y in a universe where 872 00:46:10,890 --> 00:46:14,560 X has been realized and Y is given by this distribution. 873 00:46:14,560 --> 00:46:17,390 Y is uniform between 0 and x. 
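The conditional expectation computed above can be written out in one line. The notation E[Y | X = x] is the standard one and is assumed here; the integrand, the limits, and the result x/2 are the ones stated in the lecture.

```latex
\[
E[\,Y \mid X = x\,]
  = \int_{0}^{x} y\, f_{Y\mid X}(y \mid x)\, dy
  = \int_{0}^{x} y \cdot \frac{1}{x}\, dy
  = \frac{1}{x}\cdot\frac{x^{2}}{2}
  = \frac{x}{2}.
\]
```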
874 00:46:17,390 --> 00:46:20,820 The expected value of Y should be the midpoint of this 875 00:46:20,820 --> 00:46:22,100 interval, which is x/2. 876 00:46:22,100 --> 00:46:25,090 877 00:46:25,090 --> 00:46:28,580 Now let's do fancier stuff. 878 00:46:28,580 --> 00:46:31,850 Since we have the joint distribution, we should be 879 00:46:31,850 --> 00:46:34,250 able to calculate the marginal. 880 00:46:34,250 --> 00:46:36,500 What is the distribution of Y? 881 00:46:36,500 --> 00:46:40,510 After breaking the stick twice, how big is the little 882 00:46:40,510 --> 00:46:42,890 piece that I'm left with? 883 00:46:42,890 --> 00:46:44,630 How do we find this? 884 00:46:44,630 --> 00:46:48,850 To find the marginal, we just take the joint and integrate 885 00:46:48,850 --> 00:46:52,670 out the variable that we don't want. 886 00:46:52,670 --> 00:46:55,220 A particular y can happen in many ways. 887 00:46:55,220 --> 00:46:57,800 It can happen together with any x. 888 00:46:57,800 --> 00:47:00,700 So we consider all the possible x's that can go 889 00:47:00,700 --> 00:47:05,940 together with this y and average over all those x's. 890 00:47:05,940 --> 00:47:09,330 So we plug in the formula for the joint density from the 891 00:47:09,330 --> 00:47:10,140 previous slide. 892 00:47:10,140 --> 00:47:13,070 We know that it's 1/lx. 893 00:47:13,070 --> 00:47:16,880 And what's the range of the x's? 894 00:47:16,880 --> 00:47:22,880 So to find the density of Y for a particular y up here, 895 00:47:22,880 --> 00:47:26,480 I'm going to integrate over x's. 896 00:47:26,480 --> 00:47:29,040 The density is 0 here and there. 897 00:47:29,040 --> 00:47:32,160 The density is nonzero only in this part. 898 00:47:32,160 --> 00:47:37,260 So I need to integrate over x's going from here to there. 899 00:47:37,260 --> 00:47:39,120 So what's the "here"? 900 00:47:39,120 --> 00:47:42,200 This line goes up at the slope of 1. 901 00:47:42,200 --> 00:47:45,420 So this is the line x equals y. 
902 00:47:45,420 --> 00:47:49,835 So if I fix y, it means that my integral starts from a 903 00:47:49,835 --> 00:47:53,670 value of x that is also equal to y. 904 00:47:53,670 --> 00:47:58,330 So where the integral starts from is at x equals y. 905 00:47:58,330 --> 00:48:01,770 And it goes all the way until the end of the length of our 906 00:48:01,770 --> 00:48:03,660 stick, which is l. 907 00:48:03,660 --> 00:48:08,760 So we need to integrate from little y up to l. 908 00:48:08,760 --> 00:48:12,520 So that's something that almost always comes up. 909 00:48:12,520 --> 00:48:15,690 It's not enough to have just this formula for integrating 910 00:48:15,690 --> 00:48:16,640 the joint density. 911 00:48:16,640 --> 00:48:19,160 You need to keep track of different regions. 912 00:48:19,160 --> 00:48:23,920 And if the joint density is 0 in some regions, then you 913 00:48:23,920 --> 00:48:28,250 exclude those regions from the range of integration. 914 00:48:28,250 --> 00:48:32,380 So the range of integration is only over those values where 915 00:48:32,380 --> 00:48:35,600 the particular formula is valid, the places where the 916 00:48:35,600 --> 00:48:37,990 joint density is nonzero. 917 00:48:37,990 --> 00:48:38,360 All right. 918 00:48:38,360 --> 00:48:41,760 The integral of 1/x dx, that gives you a logarithm. 919 00:48:41,760 --> 00:48:45,460 So we evaluate this integral, and we get an 920 00:48:45,460 --> 00:48:47,410 expression of this kind. 921 00:48:47,410 --> 00:48:53,660 So the density of Y has a somewhat unexpected shape. 922 00:48:53,660 --> 00:48:55,470 So it's a logarithmic function. 923 00:48:55,470 --> 00:48:59,860 And it goes this way. 924 00:48:59,860 --> 00:49:02,980 It's for y going all the way to l. 925 00:49:02,980 --> 00:49:07,860 When y is equal to l, the logarithm of 1 is equal to 0. 926 00:49:07,860 --> 00:49:12,660 But when y approaches 0, logarithm of something big 927 00:49:12,660 --> 00:49:15,740 blows up, and we get a shape of this form. 
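The marginal computation described above can be written out as a short worked equation. This is a sketch that assumes the symbol f_Y for the marginal density of Y and uses the natural logarithm; the second line checks that the result integrates to 1, using the substitution u = y/l.

```latex
\begin{align*}
f_Y(y) &= \int_y^l f_{X,Y}(x,y)\, dx
  = \int_y^l \frac{1}{l\,x}\, dx
  = \frac{1}{l}\,\ln\frac{l}{y}, \qquad 0 < y \le l, \\
\int_0^l \frac{1}{l}\,\ln\frac{l}{y}\, dy &= \int_0^1 (-\ln u)\, du = 1 .
\end{align*}
```

At y = l the density is (1/l) ln 1 = 0, and as y approaches 0 it grows without bound, matching the shape described in the lecture.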
928 00:49:15,740 --> 00:49:21,900 929 00:49:21,900 --> 00:49:22,330 OK. 930 00:49:22,330 --> 00:49:25,960 Finally, we can calculate the expected value of Y. And we 931 00:49:25,960 --> 00:49:29,430 can do this by using the definition of the expectation. 932 00:49:29,430 --> 00:49:33,300 So integral of y times the density of Y. 933 00:49:33,300 --> 00:49:36,290 We already found what that density is, so we 934 00:49:36,290 --> 00:49:38,030 can plug it in here. 935 00:49:38,030 --> 00:49:40,470 And we're integrating over the range of possible 936 00:49:40,470 --> 00:49:42,470 y's, from 0 to l. 937 00:49:42,470 --> 00:49:46,930 Now this involves the integral of y log y, which I'm sure 938 00:49:46,930 --> 00:49:49,500 you have encountered in your calculus classes but maybe do 939 00:49:49,500 --> 00:49:51,350 not remember how to do. 940 00:49:51,350 --> 00:49:53,650 In any case, you look it up in some integral 941 00:49:53,650 --> 00:49:55,300 tables or do it by parts. 942 00:49:55,300 --> 00:49:59,360 And you get the final answer of l/4. 943 00:49:59,360 --> 00:50:02,400 And at this point, you say, that's a really simple answer. 944 00:50:02,400 --> 00:50:06,200 Shouldn't I have expected it to be l/4? 945 00:50:06,200 --> 00:50:07,680 I guess, yes. 946 00:50:07,680 --> 00:50:11,070 I mean, when you break it once, the expected value of 947 00:50:11,070 --> 00:50:14,220 what you are left with is going to be 1/2 of what you 948 00:50:14,220 --> 00:50:15,860 started with. 949 00:50:15,860 --> 00:50:19,320 When you break it the next time, the expected length of 950 00:50:19,320 --> 00:50:23,380 what you're left with should be 1/2 of the piece that you 951 00:50:23,380 --> 00:50:24,550 are now breaking. 952 00:50:24,550 --> 00:50:27,350 So each time that you break it at random, you expect it to 953 00:50:27,350 --> 00:50:29,840 become smaller by a factor of 1/2.
954 00:50:29,840 --> 00:50:31,960 So if you break it twice, you are left with something that's 955 00:50:31,960 --> 00:50:33,940 expected to be 1/4. 956 00:50:33,940 --> 00:50:37,350 This is reasoning on the average, which happens to give 957 00:50:37,350 --> 00:50:39,010 you the right answer in this case. 958 00:50:39,010 --> 00:50:41,800 But again, there's the warning that reasoning on the average 959 00:50:41,800 --> 00:50:44,230 doesn't always give you the right answer. 960 00:50:44,230 --> 00:50:48,100 So be careful about doing arguments of this type. 961 00:50:48,100 --> 00:50:48,620 Very good. 962 00:50:48,620 --> 00:50:49,870 See you on Wednesday. 963 00:50:49,870 --> 00:50:50,870
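The l/4 answer for the stick-breaking example can also be checked numerically. The following is a minimal Monte Carlo sketch, not part of the lecture: the function name `simulate_stick` and its parameters are hypothetical, and it assumes the model described above, with X uniform on [0, l] and, given X = x, Y uniform on [0, x].

```python
import random


def simulate_stick(l=1.0, trials=200_000, seed=0):
    """Estimate E[Y] for the twice-broken stick by simulation.

    Assumed model (from the lecture example): X is uniform on [0, l],
    and given X = x, Y is uniform on [0, x].
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    total = 0.0
    for _ in range(trials):
        x = rng.uniform(0.0, l)  # first break: length of the piece kept
        y = rng.uniform(0.0, x)  # second break: applied to that piece
        total += y
    return total / trials  # sample mean of Y


if __name__ == "__main__":
    # The estimate should land close to l/4, i.e. about 0.25 for l = 1.
    print(simulate_stick(l=1.0))
```

With a couple of hundred thousand trials the sample mean should sit close to l/4, consistent with both the integral and the "halve it twice" argument.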