The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: So today's agenda is to say a few more things about continuous random variables. Mainly we're going to talk a little bit about inference. This is a topic that we're going to revisit at the end of the semester, but there are a few things that we can already say at this point. And then the new topic for today is the subject of derived distributions. Basically, if you know the distribution of one random variable, and you have a function of that random variable, how to find the distribution of that function. It's a fairly mechanical skill, but an important one, so we're going to go through it. So let's see where we stand. Here is the big picture. That's all we have done so far.
We have talked about discrete random variables, which we describe by probability mass functions. If we have multiple random variables, we describe them with a joint mass function. And then we define conditional probabilities, or conditional PMFs, and the three are related according to this formula, which you can think of either as the definition of conditional probability, or as the multiplication rule: the probability of two things happening is the product of the probability of the first thing happening, and then the second happening given that the first has happened. There's another relation, which is that the probability of x occurring is the sum of the probabilities of the different ways that x may occur, which is in conjunction with the different values of y. And there's an analog of all that in the continuous world, where all you do is replace p's by f's, and replace sums by integrals. So the formulas all look the same. The interpretations are a little more subtle: the f's are not probabilities, they're probability densities. So they're probabilities per unit length, or in the case of joint PDFs, probabilities per unit area.
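As a minimal numerical sketch of these discrete relations, here is the multiplication rule and the marginalization sum on a small joint PMF. The particular numbers are illustrative, not from the lecture:

```python
# Illustrative joint PMF p_{X,Y}(x, y) on a tiny grid (made-up numbers).
p_xy = {
    (1, 1): 0.2, (1, 2): 0.1,
    (2, 1): 0.3, (2, 2): 0.4,
}

# Marginal: p_X(x) = sum over y of p_{X,Y}(x, y).
def p_x(x):
    return sum(p for (xi, _), p in p_xy.items() if xi == x)

# Conditional: p_{Y|X}(y | x) = p_{X,Y}(x, y) / p_X(x).
def p_y_given_x(y, x):
    return p_xy[(x, y)] / p_x(x)

# Multiplication rule: joint = marginal times conditional.
assert abs(p_xy[(1, 2)] - p_x(1) * p_y_given_x(2, 1)) < 1e-12
```

In the continuous analog, the dictionary becomes a density and the sum in `p_x` becomes an integral, but the structure of the two relations is the same.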
So they're densities of some sort. Probably the more subtle concept to understand is the conditional density. In some sense, it's simple. It's just the density of X in a world where you have been told the value of the random variable Y. It's a function that has two arguments, but the best way to think about it is to say that we fix y. We're told the value of the random variable Y, and we look at it as a function of x. So as a function of x, the denominator is a constant, and it just looks like the joint density when we keep y fixed. So it's really a function of one argument, just the argument x. And it has the same shape as the joint density when you take that slice of it. So conditional PDFs are just slices of joint PDFs. There's a bunch of concepts, expectations, variances, cumulative distribution functions, that apply equally well to both universes of discrete and continuous random variables. So why is probability useful? Probability is useful because, among other things, we use it to make sense of the world around us.
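The "slice" picture can be checked numerically. Here is a small sketch with a joint density of my own choosing, f(x, y) = x + y on the unit square, where the conditional density of X given Y = y is the slice at that y, renormalized by the marginal:

```python
# Sketch: a conditional PDF is a renormalized slice of the joint PDF.
# Example joint density (my choice, not the lecture's): f(x, y) = x + y on [0,1]^2.
def f_xy(x, y):
    return x + y

def f_y(y, n=10_000):
    # Marginal of Y: numerically integrate the joint over x in [0, 1].
    dx = 1.0 / n
    return sum(f_xy((i + 0.5) * dx, y) for i in range(n)) * dx

def f_x_given_y(x, y):
    # Slice of the joint at this y, divided by the constant f_Y(y).
    return f_xy(x, y) / f_y(y)

# At y = 0.5 the exact conditional is (x + 0.5) / 1.0, so this is ~0.75.
print(f_x_given_y(0.25, 0.5))
```

The slice x + 0.5 has the same shape as the joint along that line; dividing by f_Y(y) only rescales it so it integrates to 1 in x.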
We use it to make inferences about things that we do not see directly. And this is done in a very simple manner using the Bayes rule. We've already seen some of that, and now we're going to revisit it with a bunch of different variations. And the variations come because sometimes our random variables are discrete, sometimes they're continuous, or we can have a combination of the two. So the big picture is that there's some unknown random variable out there, and we know the distribution of that random variable. In the discrete case, it's going to be given by a PMF. In the continuous case, it's given by a PDF. Then we have some phenomenon, some noisy phenomenon or some measuring device, and that measuring device produces an observable random variable Y. We don't know what X is, but we have some beliefs about how X is distributed. We observe the random variable Y. We need a model of this box. And the model of that box is going to be either a PMF for the random variable Y. And that model tells us, if the true state of the world is x, how do we expect Y to be distributed? That's for the case where Y is discrete.
If Y is continuous, you might instead have a density for Y, or something of that form. So in either case, this should be a function that's known to us. This is our model of the measuring device. And now, having observed y, we want to make inferences about x. What does it mean to make inferences? Well, the most complete answer in the inference problem is to tell me the probability distribution of the unknown quantity. But when I say the probability distribution, I don't mean this one. I mean the probability distribution that takes into account the measurements that you got. So the output of an inference problem is to come up with the distribution of X, the unknown quantity, given what we have already observed. And in the discrete case, it would be an object like that. If X is continuous, it would be an object of this kind. OK, so we're given conditional probabilities of this type, and we want to get conditional distributions of the opposite type, where the order of the conditioning is being reversed. So the starting point is always a formula such as this one.
The probability of x happening, and then y happening given that x happens. This is the probability that a particular x and y happen simultaneously. But this is also equal to the probability that y happens, and then that x happens given that y has happened. And you take this expression and send one term to the denominator of the other side, and this gives us the Bayes rule for the discrete case. Which is the one that you have already seen, and you have played with it. So this is what the formula looks like in the discrete case. And the typical example where both random variables are discrete is the one we discussed some time ago. X is, let's say, a binary variable: whether an airplane is present up there or not. Y is a discrete measurement, for example, whether our radar beeped or it didn't beep. And we make inferences and calculate the probability that the plane is there, or the probability that the plane is not there, given the measurement that we have made. And of course X and Y do not need to be just binary. They could be more general discrete random variables.
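A short sketch of the radar calculation with the discrete Bayes rule. The specific prior and detection probabilities below are assumptions for illustration, not the lecture's numbers:

```python
# Assumed numbers, for illustration only.
p_plane = 0.05                 # prior P(X = 1): plane present
p_beep_given_plane = 0.99      # P(Y = beep | plane present)
p_beep_given_no_plane = 0.10   # P(Y = beep | no plane): false alarm rate

# Total probability of a beep (the denominator of the Bayes rule).
p_beep = p_plane * p_beep_given_plane + (1 - p_plane) * p_beep_given_no_plane

# Bayes rule: P(plane | beep) = P(plane) P(beep | plane) / P(beep).
p_plane_given_beep = p_plane * p_beep_given_plane / p_beep
print(p_plane_given_beep)  # about 0.34
```

Notice that even with a very reliable radar, the posterior is well below 1 because the prior probability of a plane is small.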
So how does the story change in the continuous case? First, what's a possible application of the continuous case? Well, think of X as being some signal that takes values over a continuous range. Let's say X is the current through a resistor. And then you have some measuring device that measures currents, but that device is noisy; it gets hit, let's say for example, by Gaussian noise. And the Y that you observe is a noisy version of X. But your instruments are analog, so you measure things on a continuous scale. What are you going to do in that case? Well, the inference problem, the output of the inference problem, is going to be the conditional distribution of X. What do you think your current is, based on a particular value of Y that you have observed? So the output of our inference problem is, given the specific value of Y, to calculate this entire function as a function of x, and then go and plot it. How do we calculate it? You go through the same calculation as in the discrete case, except that all of the p's get replaced by f's.
In the continuous case, it's equally true that the joint density is the product of the marginal density with the conditional density. So the formula is still valid with just a little change of notation. So we end up with the same formula here, except that we replace p's with f's. So all of these functions are known to us. We have formulas for them. We fix a specific value of y, we plug it in, and we're left with a function of x. And that gives us the posterior distribution. Actually, there's also a denominator term that's not necessarily given to us, but we can always calculate it if we have the marginal of X, and we have the model for the measuring device. Then we can always find the marginal distribution of Y. So this quantity, that number, is in general a known one as well, and doesn't give us any problems. So to complicate things a little bit, we can also look into situations where our two random variables are of different kinds. For example, one random variable could be discrete, and the other might be continuous. And there are two versions.
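Here is a minimal numerical sketch of the continuous Bayes rule, under an assumed model of my own: prior X ~ N(0, 1) and measurement Y = X + W with W ~ N(0, 1). The posterior density is f_X(x) f_{Y|X}(y|x) divided by f_Y(y), computed on a grid:

```python
import math

def normal_pdf(z, mean, var):
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

y_obs = 1.0
dx = 0.001
xs = [i * dx for i in range(-6000, 6001)]                 # grid over x in [-6, 6]
unnorm = [normal_pdf(x, 0, 1) * normal_pdf(y_obs, x, 1) for x in xs]  # f_X(x) f_{Y|X}(y|x)
f_y = sum(unnorm) * dx                                    # denominator: marginal f_Y(y)
posterior = [u / f_y for u in unnorm]                     # f_{X|Y}(x | y)

# For this Gaussian model the exact posterior is N(y/2, 1/2), so its mean is y/2 = 0.5.
post_mean = sum(x * p for x, p in zip(xs, posterior)) * dx
print(round(post_mean, 3))
```

The denominator f_Y(y) is exactly the "not necessarily given, but always computable" term: here it is obtained by numerically integrating the numerator over x.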
Here, one version is when X is discrete, but Y is continuous. What's an example of this? Well, suppose that I send a single bit of information, so my X is 0 or 1. And what I measure is Y, which is X plus, let's say, Gaussian noise. This is the standard example that shows up in any textbook on communication, or signal processing. You send a single bit, but what you observe is a noisy version of that bit. You start with a model of your x's. These would be your prior probabilities. For example, you might believe that 0 and 1 are equally likely, in which case your PMF gives equal weight to the two possible values. And then we need a model of our measuring device. This is one specific model. The general model would have a shape such as follows. Y has a distribution, given by its density. And that density, however, depends on the value of X. So when x is 0, we might get a density of this kind. And when x is 1, we might get a density of a different kind. So these are the conditional densities of Y in a universe that's specified by a particular value of x.
And then we go ahead and do our inference. OK, what's the right formula for doing this inference? We need a formula that's sort of an analog of this one, but applies to the case where we have two random variables of different kinds. So let me just redo this calculation here. Except that I'm not going to have a probability of Y taking a specific value. It will have to be something a little different. So here's how it goes. Let's look at the probability that X takes a specific value; that makes sense in the discrete case. But for the continuous random variable, let's look at the probability that it takes values in some little interval. And now this probability of two things happening, I'm going to write as a product. And I'm going to write it as a product in two different ways. So one way is to say that this is the probability that X takes that value, and then, given that X takes that value, the probability that Y falls inside that interval. So this is our usual multiplication rule for multiplying probabilities, but I can use the multiplication rule also in a different way.
It's the probability that Y falls in the range of interest. And then the probability that X takes the value of interest, given that Y satisfies the first condition. So this is something that's definitely true. We're just using the multiplication rule. And now let's translate it into PMF and PDF notation. So the entry up there is the PMF of X evaluated at x. The second entry, what is it? Well, probabilities of little intervals are given to us by densities. But we are in the conditional universe where X takes on a particular value. So it's going to be the density of Y given the value of X, times delta. So probabilities of little intervals are given by the density times the length of the little interval, but because we're working in the conditional universe, it has to be the conditional density. Now let's try the second expression. This is the probability that Y falls into the little interval. So that's the density of Y times delta.
And then here we have an object which is the conditional probability of X in a universe where the value of Y is given to us. Now, this relation is sort of approximate. This is true for very small delta, in the limit. But we can cancel the deltas from both sides, and we're left with a formula that links together PMFs and PDFs. Now, this may look terribly confusing because there are both p's and f's involved. But the logic should be clear. If a random variable is discrete, it's described by a PMF. So here we're talking about the PMF of X in some particular universe. X is discrete, so it has a PMF. Similarly here. Y is continuous, so it's described by a PDF. And even in the conditional universe where I tell you the value of X, Y is still a continuous random variable, so it's described by a PDF. So this is the basic relation that links together PMFs and PDFs in this mixed world. And now, in this equality, you can take this term and send it to the denominator on the other side.
And what you end up with is the formula that we have up here. And this is a formula that we can use to make inferences about the discrete random variable X when we're told the value of the continuous random variable Y. The probability that X takes on a particular value has something to do with the prior. And other than that, it's proportional to this quantity, the conditional density of Y given X. So these are the quantities that we plotted here. Suppose that the x's are equally likely in your prior, so we don't really care about that term. It tells us that the posterior of X is proportional to that particular density under the given x's. So in this picture, if I were to get a particular y here, I would say that x equals 1 has a probability that's proportional to this quantity. x equals 0 has a probability that's proportional to this quantity. So the ratio of these two quantities gives us the relative odds of the different x's, given the y that we have observed.
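The bit-over-Gaussian-noise posterior can be sketched in a few lines. With equal priors, p_{X|Y}(1 | y) is just f_{Y|X}(y | 1) divided by the sum of the two likelihoods; the noise level sigma below is an assumption for illustration:

```python
import math

sigma = 0.5  # assumed noise standard deviation, for illustration

def f_y_given_x(y, x):
    # Conditional density of Y given X = x, when Y = X + N(0, sigma^2).
    return math.exp(-(y - x) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior_x1(y, prior1=0.5):
    # p_{X|Y}(1 | y) = p_X(1) f_{Y|X}(y|1) / [p_X(0) f_{Y|X}(y|0) + p_X(1) f_{Y|X}(y|1)]
    num = prior1 * f_y_given_x(y, 1)
    den = (1 - prior1) * f_y_given_x(y, 0) + num
    return num / den

print(posterior_x1(0.5))  # halfway between the two densities: exactly 0.5
print(posterior_x1(1.2))  # much closer to x = 1, so well above 0.5
```

The ratio of the two conditional densities at the observed y is exactly the "relative odds" mentioned above.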
So we're going to come back to this topic and redo plenty of examples of these kinds towards the end of the class, when we spend some time dedicated to inference problems. But already at this stage, we sort of have the basic skills to deal with a lot of that. And it's useful at this point to pull all the formulas together. So finally, let's look at the last case that's remaining. Here we have a continuous phenomenon that we're trying to measure, but our measurements are discrete. What's an example where this might happen? So you have some device that emits light, and you drive it with a current that has a certain intensity. You don't know what that current is, and it's a continuous random variable. But the device emits light by sending out individual photons. And your measurement is some other device that counts how many photons you get in a single second. So if we have devices that emit at a very low intensity, you can actually start counting individual photons as they're being observed.
So we have a discrete measurement, which is the number of photons, and we have a continuous hidden random variable that we're trying to estimate. What do we do in this case? Well, we start again with a formula of this kind, and send the p term to the denominator. And that's the formula that we use there, except that the roles of x's and y's are interchanged. So since here we have Y being discrete, we should change all the subscripts. It would be p_Y(y) f_X|Y(x|y) = f_X(x) p_Y|X(y|x). So just change all those subscripts. Because now what used to be continuous became discrete, and vice versa. Take that formula, send the other term to the denominator, and we have a formula for the density of X, given the particular measurement of Y that we have obtained. In some sense, that's all there is in Bayesian inference. It's using these very simple one-line formulas. But why are there people, then, who make their living solving inference problems? Well, the devil is in the details.
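A sketch of the photon-counting setup, under an assumed model of my own (the lecture does not fix one): the intensity X has an exponential prior f_X(x) = e^{-x} for x >= 0, and given X = x the photon count Y is Poisson(x). The posterior density of X given the count k follows the mixed formula above:

```python
import math

def prior(x):
    # Assumed prior on the intensity: exponential with rate 1.
    return math.exp(-x)

def p_k_given_x(k, x):
    # Assumed measurement model: Poisson count, p_{Y|X}(k | x) = e^{-x} x^k / k!.
    return math.exp(-x) * x ** k / math.factorial(k)

def posterior(x, k, dx=0.001, x_max=30.0):
    # f_{X|Y}(x | k) = f_X(x) p_{Y|X}(k | x) / p_Y(k),
    # with the denominator p_Y(k) found by numerical integration over x.
    grid = [i * dx for i in range(1, int(x_max / dx))]
    p_k = sum(prior(t) * p_k_given_x(k, t) for t in grid) * dx
    return prior(x) * p_k_given_x(k, x) / p_k

# Having counted k = 3 photons, the posterior is proportional to x^3 e^{-2x},
# which peaks at x = 1.5, so the density there exceeds the density at x = 0.5.
print(posterior(1.5, 3) > posterior(0.5, 3))
```

The denominator p_Y(k) is the integral that, in richer models, may be hard to evaluate; that is one place where the "devil in the details" shows up.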
As we're going to discuss, there are some real-world issues of how exactly you design your f's, how you model your system, and then how you do your calculations. This might not always be easy. For example, there are certain integrals or sums that have to be evaluated, which may be hard to do, and so on. So this subject is a lot richer than just these formulas. On the other hand, at the conceptual level, that's the basis for Bayesian inference; these are the basic concepts. All right, so now let's change gears and move to the new subject, which is the topic of finding the distribution of a function of a random variable. We call those distributions derived distributions, because we're given the distribution of X, we're interested in a function of X, and we want to derive the distribution of that function based on the distribution that we already know. So it could be a function of just one random variable. It could be a function of several random variables. So one example that we are going to solve at some point: let's say you have two random variables X and Y.
Somebody tells you their distribution; for example, it's uniform on the square. For some reason, you're interested in the ratio of these two random variables, and you want to find the distribution of that ratio. You can think of lots of cases where your random variable of interest is created by taking some other random variables and taking a function of them. And so it's legitimate to care about the distribution of that random variable. A caveat, however. There's an important case where you don't need to find the distribution of that random variable. And this is when you want to calculate expectations. If all you care about is the expected value of this function of the random variables, you can work directly with the distribution of the original random variables, without ever having to find the PDF of g. So you don't do unnecessary work if it's not needed. But if it's needed, or if you're asked to do it, then you just do it. So how do we find the distribution of the function? As a warm-up, let's look at the discrete case. Suppose that X is a discrete random variable and takes certain values.
422 00:22:22,550 --> 00:22:27,070 We have a function g that maps x's into y's. 423 00:22:27,070 --> 00:22:30,430 And we want to find the probability mass function for 424 00:22:30,430 --> 00:22:31,930 Y. 425 00:22:31,930 --> 00:22:36,780 So for example, if I'm interested in finding the 426 00:22:36,780 --> 00:22:41,020 probability that Y takes on this particular value, how 427 00:22:41,020 --> 00:22:42,910 would I find it? 428 00:22:42,910 --> 00:22:46,890 Well I ask, what are the different ways that this 429 00:22:46,890 --> 00:22:49,390 particular y value can happen? 430 00:22:49,390 --> 00:22:53,390 And the different ways that it can happen is either if X 431 00:22:53,390 --> 00:22:56,800 takes this value, or if X takes that value. 432 00:22:56,800 --> 00:23:02,650 So we identify this event in the y space with that event in 433 00:23:02,650 --> 00:23:04,220 the x space. 434 00:23:04,220 --> 00:23:06,790 These two events are identical. 435 00:23:06,790 --> 00:23:12,350 X falls in this set if and only if Y falls in that set. 436 00:23:12,350 --> 00:23:15,060 Therefore, the probability of Y falling in that set is the 437 00:23:15,060 --> 00:23:17,540 probability of X falling in that set. 438 00:23:17,540 --> 00:23:20,890 The probability of X falling in that set is just the sum of 439 00:23:20,890 --> 00:23:24,650 the individual probabilities of the x's in this set. 440 00:23:24,650 --> 00:23:27,360 So we just add the probabilities of the different 441 00:23:27,360 --> 00:23:31,300 x's, where the summation is taken over all x's that lead 442 00:23:31,300 --> 00:23:35,070 to that particular value of y. 443 00:23:35,070 --> 00:23:35,860 Very good. 444 00:23:35,860 --> 00:23:39,090 So that's all there is in the discrete case. 445 00:23:39,090 --> 00:23:41,070 It's very nice and simple. 446 00:23:41,070 --> 00:23:43,460 So let's transfer these methods to 447 00:23:43,460 --> 00:23:45,810 the continuous case.
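The discrete recipe just described, adding up the PMF of X over all x's that map to the same y, is only a few lines of code. A minimal sketch in Python; the PMF for X below is a toy example of my own, not one from the lecture:

```python
from collections import defaultdict

def derived_pmf(p_x, g):
    """p_Y(y) = sum of p_X(x) over all x with g(x) == y."""
    p_y = defaultdict(float)
    for x, prob in p_x.items():
        p_y[g(x)] += prob
    return dict(p_y)

# Toy PMF for X: two x values (-1 and +1) map to the same y = 1,
# so their probabilities add up in the PMF of Y = X**2.
p_x = {-1: 0.25, 0: 0.25, 1: 0.25, 2: 0.25}
p_y = derived_pmf(p_x, lambda x: x * x)
```

Here p_y comes out as {1: 0.5, 0: 0.25, 4: 0.25}: the masses of the two preimages of y = 1 have been combined, exactly the summation described above.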
448 00:23:45,810 --> 00:23:47,890 Suppose we are in the continuous case. 449 00:23:47,890 --> 00:23:52,140 Suppose that X and Y now can take values anywhere. 450 00:23:52,140 --> 00:23:55,440 And I try to use the same method and I ask, what is the 451 00:23:55,440 --> 00:24:00,340 probability that Y is going to take this value? 452 00:24:00,340 --> 00:24:03,100 At least if the diagram is this way, you would say this 453 00:24:03,100 --> 00:24:06,990 is the same as the probability that X takes this value. 454 00:24:06,990 --> 00:24:10,220 So I can find the probability of Y being this in terms of 455 00:24:10,220 --> 00:24:12,600 the probability of X being that. 456 00:24:12,600 --> 00:24:14,610 Is this useful? 457 00:24:14,610 --> 00:24:16,480 In the continuous case, it's not. 458 00:24:16,480 --> 00:24:19,830 Because in the continuous case, any single value has 0 459 00:24:19,830 --> 00:24:21,020 probability. 460 00:24:21,020 --> 00:24:25,450 So what you're going to get out of this argument is that 461 00:24:25,450 --> 00:24:29,530 the probability that Y takes this value is 0, and is equal to the 462 00:24:29,530 --> 00:24:32,800 probability that X takes that value, which is also 0. 463 00:24:32,800 --> 00:24:34,650 That doesn't help us. 464 00:24:34,650 --> 00:24:36,060 We want to do something more. 465 00:24:36,060 --> 00:24:40,650 We want to actually find, perhaps, the density of Y, as 466 00:24:40,650 --> 00:24:43,550 opposed to the probabilities of individual y's. 467 00:24:43,550 --> 00:24:47,620 So to find the density of Y, you might argue as follows. 468 00:24:47,620 --> 00:24:51,100 I'm looking at an interval for y, and I ask what's the 469 00:24:51,100 --> 00:24:53,510 probability of falling in this interval. 470 00:24:53,510 --> 00:24:57,890 And you go back and find the corresponding set of x's that 471 00:24:57,890 --> 00:25:02,090 leads to those y's, and equate those two probabilities.
472 00:25:02,090 --> 00:25:04,960 The probability of all of those y's collectively should 473 00:25:04,960 --> 00:25:09,710 be equal to the probability of all of the x's that map into 474 00:25:09,710 --> 00:25:11,930 that interval collectively. 475 00:25:11,930 --> 00:25:16,010 And this way you can relate the two. 476 00:25:16,010 --> 00:25:22,870 As far as the mechanics go, in many cases it's easier not 477 00:25:22,870 --> 00:25:26,670 to work with little intervals, but instead to work with 478 00:25:26,670 --> 00:25:30,110 cumulative distribution functions, that is, to work 479 00:25:30,110 --> 00:25:32,600 with big intervals. 480 00:25:32,600 --> 00:25:35,460 So you can instead draw a different picture. 481 00:25:35,460 --> 00:25:38,250 Look at this set of y's. 482 00:25:38,250 --> 00:25:41,690 This is the set of y's that are smaller 483 00:25:41,690 --> 00:25:43,200 than a certain value. 484 00:25:43,200 --> 00:25:46,990 The probability of this set is given by the cumulative 485 00:25:46,990 --> 00:25:49,740 distribution of the random variable Y. 486 00:25:49,740 --> 00:25:54,450 Now this set of y's gets produced by some corresponding 487 00:25:54,450 --> 00:25:56,850 set of x's. 488 00:25:56,850 --> 00:26:04,120 Maybe these are the x's that map into y's in that set. 489 00:26:04,120 --> 00:26:06,040 And then we argue as follows. 490 00:26:06,040 --> 00:26:08,870 The probability that Y falls in this interval is the 491 00:26:08,870 --> 00:26:12,600 same as the probability that X falls in that interval. 492 00:26:12,600 --> 00:26:15,810 So the event of Y falling here and the event of X falling 493 00:26:15,810 --> 00:26:19,330 there are the same, so their probabilities must be equal. 494 00:26:19,330 --> 00:26:22,010 And then I do the calculations here. 495 00:26:22,010 --> 00:26:25,050 And I end up getting the cumulative distribution 496 00:26:25,050 --> 00:26:28,760 function of Y.
Once I have the cumulative, I can get the 497 00:26:28,760 --> 00:26:31,670 density by just differentiating. 498 00:26:31,670 --> 00:26:34,900 So this is the general cookbook procedure that we 499 00:26:34,900 --> 00:26:37,886 will be using to calculate derived distributions. 500 00:26:37,886 --> 00:26:40,450 501 00:26:40,450 --> 00:26:43,500 We're interested in a random variable Y, which is a 502 00:26:43,500 --> 00:26:45,320 function of the x's. 503 00:26:45,320 --> 00:26:50,070 We will aim at obtaining the cumulative distribution of Y. 504 00:26:50,070 --> 00:26:54,040 Somehow, we manage to calculate the probability of this event. 505 00:26:54,040 --> 00:26:58,120 Once we get it, and what I mean by get it, I don't mean 506 00:26:58,120 --> 00:27:00,980 getting it for a single value of little y. 507 00:27:00,980 --> 00:27:04,640 You need to get this for all little y's. 508 00:27:04,640 --> 00:27:07,930 So you need to get the function itself, the 509 00:27:07,930 --> 00:27:09,480 cumulative distribution. 510 00:27:09,480 --> 00:27:12,750 Once you get it in that form, then you can calculate the 511 00:27:12,750 --> 00:27:15,260 derivative at any particular point. 512 00:27:15,260 --> 00:27:18,000 And this is going to give you the density of Y. 513 00:27:18,000 --> 00:27:19,690 So a simple two-step procedure. 514 00:27:19,690 --> 00:27:24,050 The devil is in the details of how you carry out the mechanics. 515 00:27:24,050 --> 00:27:27,580 So let's do one first example. 516 00:27:27,580 --> 00:27:31,020 Suppose that X is a uniform random variable, takes values 517 00:27:31,020 --> 00:27:32,660 between 0 and 2. 518 00:27:32,660 --> 00:27:35,605 We're interested in the random variable Y, which is the cube 519 00:27:35,605 --> 00:27:37,500 of X. What kind of distribution 520 00:27:37,500 --> 00:27:38,840 is it going to have? 521 00:27:38,840 --> 00:27:44,960 Now first notice that Y takes values between 0 and 8.
522 00:27:44,960 --> 00:27:48,810 So X is uniform, so all the x's are equally likely. 523 00:27:48,810 --> 00:27:51,680 524 00:27:51,680 --> 00:27:55,340 You might then say, well, in that case, all the y's should 525 00:27:55,340 --> 00:27:56,740 be equally likely. 526 00:27:56,740 --> 00:28:00,630 So Y might also have a uniform distribution. 527 00:28:00,630 --> 00:28:02,210 Is this true? 528 00:28:02,210 --> 00:28:04,040 We'll find out. 529 00:28:04,040 --> 00:28:06,990 So let's start applying the cookbook procedure. 530 00:28:06,990 --> 00:28:10,410 We want to find first the cumulative distribution of the 531 00:28:10,410 --> 00:28:14,890 random variable Y, which by definition is the probability 532 00:28:14,890 --> 00:28:17,370 that the random variable is less than or equal to a 533 00:28:17,370 --> 00:28:18,850 certain number. 534 00:28:18,850 --> 00:28:20,680 That's what we want to find. 535 00:28:20,680 --> 00:28:24,440 What we have in our hands is the distribution of X. That's 536 00:28:24,440 --> 00:28:26,320 what we need to work with. 537 00:28:26,320 --> 00:28:30,090 So the first step that you need to do is to look at this 538 00:28:30,090 --> 00:28:33,680 event and translate it, and write it in terms of the 539 00:28:33,680 --> 00:28:39,040 random variable about which you have information. 540 00:28:39,040 --> 00:28:44,320 So Y is X cubed, so this event is the same as that event. 541 00:28:44,320 --> 00:28:46,760 So now we can forget about the y's. 542 00:28:46,760 --> 00:28:49,860 It's just an exercise involving a single random 543 00:28:49,860 --> 00:28:52,750 variable with a known distribution, and we want to 544 00:28:52,750 --> 00:28:56,610 calculate the probability of some event. 545 00:28:56,610 --> 00:28:58,780 So we're looking at this event. 546 00:28:58,780 --> 00:29:02,230 X cubed being less than or equal to y.
We massage that 547 00:29:02,230 --> 00:29:06,130 expression so that it involves X directly, so let's 548 00:29:06,130 --> 00:29:08,960 take cubic roots of both sides of this inequality. 549 00:29:08,960 --> 00:29:12,130 This event is the same as the event that X is less than or 550 00:29:12,130 --> 00:29:14,820 equal to y to the 1/3. 551 00:29:14,820 --> 00:29:19,300 Now with a uniform distribution on [0,2], what is 552 00:29:19,300 --> 00:29:22,070 that probability going to be? 553 00:29:22,070 --> 00:29:27,710 It's the probability of being in the interval from 0 to y to 554 00:29:27,710 --> 00:29:34,680 the 1/3, so it's going to be the area under the uniform 555 00:29:34,680 --> 00:29:37,010 going up to that point. 556 00:29:37,010 --> 00:29:39,315 And what's the area under that uniform? 557 00:29:39,315 --> 00:29:42,650 558 00:29:42,650 --> 00:29:44,290 So here's x. 559 00:29:44,290 --> 00:29:50,810 Here is the distribution of X. It goes up to 2. 560 00:29:50,810 --> 00:29:53,330 The distribution of X is this one. 561 00:29:53,330 --> 00:29:56,860 We want to go up to y to the 1/3. 562 00:29:56,860 --> 00:30:02,390 So the probability for this event happening is this area. 563 00:30:02,390 --> 00:30:06,590 And the area is equal to the base, which is y to the 1/3, 564 00:30:06,590 --> 00:30:08,250 times the height. 565 00:30:08,250 --> 00:30:09,720 What is the height? 566 00:30:09,720 --> 00:30:13,480 Well since the density must integrate to 1, the total area 567 00:30:13,480 --> 00:30:15,340 under the curve has to be 1. 568 00:30:15,340 --> 00:30:19,660 So the height here is 1/2, and that explains why we get the 569 00:30:19,660 --> 00:30:22,530 1/2 factor down there. 570 00:30:22,530 --> 00:30:24,900 So that's the formula for the cumulative distribution. 571 00:30:24,900 --> 00:30:26,070 And then the rest is easy. 572 00:30:26,070 --> 00:30:28,340 You just take derivatives.
573 00:30:28,340 --> 00:30:32,650 You differentiate this expression with respect to y: 574 00:30:32,650 --> 00:30:36,240 1/2 times 1/3, and y drops by one power. 575 00:30:36,240 --> 00:30:39,670 So you get y to the 2/3 in the denominator. 576 00:30:39,670 --> 00:30:55,490 So if you wish to plot this, it goes like 1 over y to the 2/3. 577 00:30:55,490 --> 00:31:00,480 So when y goes to 0, it sort of blows up, and it 578 00:31:00,480 --> 00:31:03,090 goes on this way. 579 00:31:03,090 --> 00:31:06,090 Is this picture correct the way I've drawn it? 580 00:31:06,090 --> 00:31:08,900 581 00:31:08,900 --> 00:31:11,256 What's wrong with it? 582 00:31:11,256 --> 00:31:12,630 [? AUDIENCE: Something. ?] 583 00:31:12,630 --> 00:31:13,420 PROFESSOR: Yes. 584 00:31:13,420 --> 00:31:17,610 y only takes values from 0 to 8. 585 00:31:17,610 --> 00:31:21,890 This formula that I wrote here is only correct when the 586 00:31:21,890 --> 00:31:25,000 previous picture applies. 587 00:31:25,000 --> 00:31:31,070 I took my y to the 1/3 to be between 0 and 2. 588 00:31:31,070 --> 00:31:40,650 So this formula here is only correct for y between 0 and 8. 589 00:31:40,650 --> 00:31:43,770 590 00:31:43,770 --> 00:31:46,610 And for that reason, the formula for the derivative is 591 00:31:46,610 --> 00:31:50,700 also true only for y between 0 and 8. 592 00:31:50,700 --> 00:31:55,630 Any other values of y are impossible, so they get 593 00:31:55,630 --> 00:31:57,880 zero density. 594 00:31:57,880 --> 00:32:04,070 So to complete the picture here, the PDF of Y has a 595 00:32:04,070 --> 00:32:09,290 cut-off at 8, and it's also 0 everywhere else. 596 00:32:09,290 --> 00:32:13,330 597 00:32:13,330 --> 00:32:16,640 And one thing that we see is that the distribution of Y is 598 00:32:16,640 --> 00:32:17,980 not uniform.
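The two steps for this example can be written out directly. A small sketch, using exactly the formulas derived on the board: F_Y(y) = y^(1/3)/2 and f_Y(y) = 1/(6 y^(2/3)) for y between 0 and 8, with X uniform on [0, 2].

```python
def cdf_Y(y):
    # Step 1: F_Y(y) = P(X**3 <= y) = P(X <= y**(1/3)) = y**(1/3) / 2,
    # valid for y in [0, 8] when X is uniform on [0, 2].
    if y < 0:
        return 0.0
    if y > 8:
        return 1.0
    return y ** (1.0 / 3.0) / 2.0

def pdf_Y(y):
    # Step 2: differentiate the CDF; nonzero only on the allowed range.
    if 0 < y <= 8:
        return 1.0 / (6.0 * y ** (2.0 / 3.0))
    return 0.0
```

A quick sanity check: a numerical derivative of cdf_Y agrees with pdf_Y in the interior, and the density is large near 0 and small near 8, matching the plot just described.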
599 00:32:17,980 --> 00:32:24,240 Certain y's are more likely than others, even though we 600 00:32:24,240 --> 00:32:26,130 started with a uniform random 601 00:32:26,130 --> 00:32:32,240 variable X. All right. 602 00:32:32,240 --> 00:32:36,530 So we will keep doing examples of this kind, a sequence of 603 00:32:36,530 --> 00:32:40,350 progressively more interesting or more complicated ones. 604 00:32:40,350 --> 00:32:42,530 So that's going to continue in the next lecture. 605 00:32:42,530 --> 00:32:45,930 You're going to see plenty of examples in your recitations 606 00:32:45,930 --> 00:32:48,060 and tutorials and so on. 607 00:32:48,060 --> 00:32:52,420 So let's do one that's pretty similar to the one that we 608 00:32:52,420 --> 00:32:57,730 did, but that's going to add just a small twist in how we 609 00:32:57,730 --> 00:33:00,470 do the mechanics. 610 00:33:00,470 --> 00:33:02,780 OK, so you set your cruise control 611 00:33:02,780 --> 00:33:04,010 when you start driving. 612 00:33:04,010 --> 00:33:06,310 And you keep driving at that 613 00:33:06,310 --> 00:33:07,870 constant speed. 614 00:33:07,870 --> 00:33:09,980 Where you set your cruise control is somewhere 615 00:33:09,980 --> 00:33:11,660 between 30 and 60. 616 00:33:11,660 --> 00:33:14,520 You're going to drive a distance of 200. 617 00:33:14,520 --> 00:33:18,660 And so the time it's going to take for your trip is 200 over 618 00:33:18,660 --> 00:33:20,530 the setting of your cruise control. 619 00:33:20,530 --> 00:33:22,610 So it's 200/V. 620 00:33:22,610 --> 00:33:26,210 Somebody gives you the distribution of V, and they 621 00:33:26,210 --> 00:33:29,490 tell you that not only is it between 30 and 60, it's roughly 622 00:33:29,490 --> 00:33:33,530 equally likely to be anything between 30 and 60, so we have 623 00:33:33,530 --> 00:33:36,280 a uniform distribution over that range. 624 00:33:36,280 --> 00:33:40,060 So we have a distribution of V.
We want to find the 625 00:33:40,060 --> 00:33:43,460 distribution of the random variable T, which is the time 626 00:33:43,460 --> 00:33:46,540 it takes till your trip ends. 627 00:33:46,540 --> 00:33:49,200 628 00:33:49,200 --> 00:33:51,790 So how are we going to proceed? 629 00:33:51,790 --> 00:33:55,170 We'll use the exact same cookbook procedure. 630 00:33:55,170 --> 00:33:57,360 We're going to start by finding the cumulative 631 00:33:57,360 --> 00:34:02,920 distribution of T. What is this? 632 00:34:02,920 --> 00:34:05,730 By definition, the cumulative distribution is the 633 00:34:05,730 --> 00:34:10,230 probability that T is less than or equal to a certain number. 634 00:34:10,230 --> 00:34:12,070 OK. 635 00:34:12,070 --> 00:34:15,340 Now we don't know the distribution of T, so we 636 00:34:15,340 --> 00:34:17,989 cannot work with this event directly. 637 00:34:17,989 --> 00:34:21,960 But we take that event and translate it into V-space. 638 00:34:21,960 --> 00:34:28,205 So we replace T by what we know it to be in terms of V, 639 00:34:28,205 --> 00:34:28,271 that 640 00:34:28,271 --> 00:34:33,565 is, the v's. All right. 641 00:34:33,565 --> 00:34:36,230 642 00:34:36,230 --> 00:34:39,659 So we have the distribution of V. So now let's 643 00:34:39,659 --> 00:34:41,739 calculate this quantity. 644 00:34:41,739 --> 00:34:42,179 OK. 645 00:34:42,179 --> 00:34:46,210 Let's massage this event and rewrite it as the probability 646 00:34:46,210 --> 00:35:06,880 that V is larger than or equal to 200/t. 647 00:35:06,880 --> 00:35:10,870 So what is this going to be? 648 00:35:10,870 --> 00:35:14,400 So let's say that 200/t is some number that 649 00:35:14,400 --> 00:35:16,015 falls inside the range. 650 00:35:16,015 --> 00:35:19,150 651 00:35:19,150 --> 00:35:24,630 So that's going to be true if 200/t is bigger than 30, and 652 00:35:24,630 --> 00:35:26,610 less than 60. 653 00:35:26,610 --> 00:35:37,110 Which means that t is less than 30/200.
654 00:35:37,110 --> 00:35:38,360 No, 200/30. 655 00:35:38,360 --> 00:35:41,300 656 00:35:41,300 --> 00:35:44,570 And bigger than 200/60. 657 00:35:44,570 --> 00:35:51,360 So for t's inside that range, this number 200/t falls inside 658 00:35:51,360 --> 00:35:52,230 that range. 659 00:35:52,230 --> 00:35:55,960 This is the range of t's that are possible, given the 660 00:35:55,960 --> 00:35:59,240 description of the problem that we have set up. 661 00:35:59,240 --> 00:36:04,940 So for t's in that range, what is the probability that V is 662 00:36:04,940 --> 00:36:07,900 bigger than this number? 663 00:36:07,900 --> 00:36:11,550 So V being bigger than that number is the probability of 664 00:36:11,550 --> 00:36:17,000 this event, so it's going to be the area under this curve. 665 00:36:17,000 --> 00:36:22,880 So the area under that curve is the height of the curve, 666 00:36:22,880 --> 00:36:27,300 which is 1 over 30, times the base. 667 00:36:27,300 --> 00:36:28,910 How big is the base? 668 00:36:28,910 --> 00:36:33,060 Well it's from that point to 60, so the base has a length 669 00:36:33,060 --> 00:36:36,500 of 60 minus 200/t. 670 00:36:36,500 --> 00:36:45,470 671 00:36:45,470 --> 00:36:50,580 And this is a formula which is valid for those t's for which 672 00:36:50,580 --> 00:36:52,420 this picture is correct. 673 00:36:52,420 --> 00:36:57,410 And this picture is correct if 200/t happens to fall in this 674 00:36:57,410 --> 00:37:01,540 interval, which is the same as t falling in that interval, 675 00:37:01,540 --> 00:37:03,980 which are the t's that are possible. 676 00:37:03,980 --> 00:37:07,390 So finally let's find the density of T, which is what 677 00:37:07,390 --> 00:37:09,430 we're looking for. 678 00:37:09,430 --> 00:37:12,450 We find this by taking the derivative of this expression 679 00:37:12,450 --> 00:37:14,370 with respect to t. 680 00:37:14,370 --> 00:37:18,150 We only get one term from here.
681 00:37:18,150 --> 00:37:26,045 And this is going to be 200/30 times 1 over t squared. 682 00:37:26,045 --> 00:37:30,820 683 00:37:30,820 --> 00:37:34,020 And this is the formula for the density, for t's in the 684 00:37:34,020 --> 00:37:35,270 allowed range. 685 00:37:35,270 --> 00:37:46,890 686 00:37:46,890 --> 00:37:51,130 OK, so that's the end of the solution to this particular 687 00:37:51,130 --> 00:37:52,880 problem as well. 688 00:37:52,880 --> 00:37:55,640 I said that there was a little twist compared to 689 00:37:55,640 --> 00:37:57,130 the previous one. 690 00:37:57,130 --> 00:37:58,410 What was the twist? 691 00:37:58,410 --> 00:38:01,380 Well the twist was that in the previous problem we dealt with 692 00:38:01,380 --> 00:38:05,580 the X cubed function, which was monotonically increasing. 693 00:38:05,580 --> 00:38:07,760 Here we dealt with a function that was 694 00:38:07,760 --> 00:38:09,850 monotonically decreasing. 695 00:38:09,850 --> 00:38:13,850 So when we had to find the probability that T is less 696 00:38:13,850 --> 00:38:17,220 than something, that translated into an event that 697 00:38:17,220 --> 00:38:19,640 V was bigger than something. 698 00:38:19,640 --> 00:38:22,410 Your time is less than something if and only if your 699 00:38:22,410 --> 00:38:25,090 velocity is bigger than something. 700 00:38:25,090 --> 00:38:27,510 So when you're dealing with a monotonically 701 00:38:27,510 --> 00:38:31,950 decreasing function, at some point some inequalities will 702 00:38:31,950 --> 00:38:33,200 have to get reversed. 703 00:38:33,200 --> 00:38:38,540 704 00:38:38,540 --> 00:38:43,700 Finally let's look at a very useful one. 705 00:38:43,700 --> 00:38:47,990 Which is the case where we take a linear function of a 706 00:38:47,990 --> 00:38:49,700 random variable. 707 00:38:49,700 --> 00:38:55,810 So X is a random variable with a given distribution, and we 708 00:38:55,810 --> 00:38:57,110 consider a linear function of it, aX + b.
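Before continuing with the linear case, the cruise-control answer can be sketched in the same style as the previous example, assuming as above that V is uniform on [30, 60] and T = 200/V; note the flipped inequality, because t maps to 200/t is decreasing.

```python
T_MIN, T_MAX = 200.0 / 60.0, 200.0 / 30.0   # allowed range of T

def cdf_T(t):
    # F_T(t) = P(200/V <= t) = P(V >= 200/t) = (60 - 200/t) / 30,
    # using the area under the uniform density of height 1/30.
    if t < T_MIN:
        return 0.0
    if t > T_MAX:
        return 1.0
    return (60.0 - 200.0 / t) / 30.0

def pdf_T(t):
    # Derivative of the CDF: the two minus signs cancel,
    # leaving 200/30 times 1 over t squared on the allowed range.
    if T_MIN <= t <= T_MAX:
        return 200.0 / (30.0 * t * t)
    return 0.0
```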
709 00:38:57,110 --> 00:38:59,920 So in this particular instance, we take a to be 710 00:38:59,920 --> 00:39:03,590 equal to 2 and b equal to 5. 711 00:39:03,590 --> 00:39:08,680 And let us first argue just by picture. 712 00:39:08,680 --> 00:39:13,920 So X is a random variable that has a given distribution. 713 00:39:13,920 --> 00:39:16,150 Let's say it's this weird shape here. 714 00:39:16,150 --> 00:39:20,170 And x ranges from -1 to +2. 715 00:39:20,170 --> 00:39:22,140 Let's do things one step at a time. 716 00:39:22,140 --> 00:39:26,190 Let's first find the distribution of 2X. 717 00:39:26,190 --> 00:39:28,960 What do you think you know about 2X? 718 00:39:28,960 --> 00:39:35,330 Well if x ranges from -1 to 2, then the random variable 2X is 719 00:39:35,330 --> 00:39:36,580 going to range from -2 to +4. 720 00:39:36,580 --> 00:39:39,560 721 00:39:39,560 --> 00:39:42,360 So that's what the range is going to be. 722 00:39:42,360 --> 00:39:48,840 Now dealing with the random variable 2X, as opposed to the 723 00:39:48,840 --> 00:39:52,520 random variable X, in some sense is just changing the 724 00:39:52,520 --> 00:39:55,270 units in which we measure that random variable. 725 00:39:55,270 --> 00:39:58,130 It's just changing the scale on which we 726 00:39:58,130 --> 00:39:59,730 draw and plot things. 727 00:39:59,730 --> 00:40:03,180 So if it's just a scale change, then intuition should 728 00:40:03,180 --> 00:40:08,120 tell you that the random variable 2X should have a PDF 729 00:40:08,120 --> 00:40:12,850 of the same shape, except that it's stretched out by a factor of 730 00:40:12,850 --> 00:40:16,540 2, because our random variable 2X now has a range that's 731 00:40:16,540 --> 00:40:18,570 twice as large. 732 00:40:18,570 --> 00:40:23,720 So we take the same PDF and scale it up by stretching the 733 00:40:23,720 --> 00:40:26,790 x-axis by a factor of 2.
734 00:40:26,790 --> 00:40:30,330 So what does scaling correspond to 735 00:40:30,330 --> 00:40:33,870 in terms of a formula? 736 00:40:33,870 --> 00:40:39,500 So the distribution of 2X, as a function of, let's say, a generic 737 00:40:39,500 --> 00:40:45,760 argument z, is going to be the distribution of X, but scaled 738 00:40:45,760 --> 00:40:47,010 by a factor of 2. 739 00:40:47,010 --> 00:40:50,060 740 00:40:50,060 --> 00:40:54,100 So taking a function and replacing its argument by the 741 00:40:54,100 --> 00:40:58,740 argument over 2, what it does is it stretches it 742 00:40:58,740 --> 00:41:00,430 by a factor of 2. 743 00:41:00,430 --> 00:41:04,410 You have probably been tortured ever since middle 744 00:41:04,410 --> 00:41:08,150 school to figure out, when you need to stretch a function, whether 745 00:41:08,150 --> 00:41:12,470 you need to put 2z or z/2. 746 00:41:12,470 --> 00:41:15,450 And the one that actually does the stretching is to put the 747 00:41:15,450 --> 00:41:18,000 z/2 in that place. 748 00:41:18,000 --> 00:41:21,180 So that's what the stretching does. 749 00:41:21,180 --> 00:41:23,670 Could that be the full answer? 750 00:41:23,670 --> 00:41:24,930 Well there's a catch. 751 00:41:24,930 --> 00:41:29,730 If you stretch this function by a factor of 2, what happens 752 00:41:29,730 --> 00:41:32,100 to the area under the function? 753 00:41:32,100 --> 00:41:34,120 It's going to get doubled. 754 00:41:34,120 --> 00:41:38,670 But the total probability must add up to 1, so we need to do 755 00:41:38,670 --> 00:41:41,840 something else to make sure that the area under the curve 756 00:41:41,840 --> 00:41:44,300 stays at 1. 757 00:41:44,300 --> 00:41:47,980 So we need to take that function and scale it down by 758 00:41:47,980 --> 00:41:51,720 this factor of 2.
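The stretch-and-rescale step can be sketched concretely. The lecture's f_X has an unspecified "weird shape"; purely for illustration, the snippet below assumes X uniform on [-1, 2], so the numbers are easy to verify by hand.

```python
# Density of 2X from the density of X: stretch the argument
# (replace z by z/2) and divide by 2 so the total area stays at 1.
# Assumed example density: X uniform on [-1, 2], so f_X = 1/3 there.

def f_X(x):
    return 1.0 / 3.0 if -1.0 <= x <= 2.0 else 0.0

def f_2X(z):
    return f_X(z / 2.0) / 2.0

# Riemann-sum check that the area under f_2X is still 1,
# summing over a grid that covers the support [-2, 4].
dz = 0.001
area = sum(f_2X(-3.0 + k * dz) * dz for k in range(13000))
```

Without the division by 2 the area would come out as 2, which is exactly the catch described above.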
759 00:41:51,720 --> 00:41:55,580 So when you're dealing with a multiple of a random variable, 760 00:41:55,580 --> 00:42:00,580 what happens to the PDF is you stretch it according to the 761 00:42:00,580 --> 00:42:04,320 multiple, and then scale it down by the same number so 762 00:42:04,320 --> 00:42:07,460 that you preserve the area under that curve. 763 00:42:07,460 --> 00:42:10,800 So now we found the distribution of 2X. 764 00:42:10,800 --> 00:42:14,910 How about the distribution of 2X + 5? 765 00:42:14,910 --> 00:42:18,560 Well what does adding 5 to a random variable do? 766 00:42:18,560 --> 00:42:20,940 You're going to get essentially the same values 767 00:42:20,940 --> 00:42:23,720 with the same probability, except that those values all 768 00:42:23,720 --> 00:42:26,260 get shifted by 5. 769 00:42:26,260 --> 00:42:30,650 So all that you need to do is to take this PDF here, and 770 00:42:30,650 --> 00:42:32,690 shift it by 5 units. 771 00:42:32,690 --> 00:42:35,530 So the range used to be from -2 to 4. 772 00:42:35,530 --> 00:42:38,750 The new range is going to be from 3 to 9. 773 00:42:38,750 --> 00:42:40,390 And that's the final answer. 774 00:42:40,390 --> 00:42:44,900 This is the distribution of 2X + 5, starting with this 775 00:42:44,900 --> 00:42:48,240 particular distribution of X. 776 00:42:48,240 --> 00:42:53,600 Now shifting to the right by b, what 777 00:42:53,600 --> 00:42:55,700 does it do to a function? 778 00:42:55,700 --> 00:42:58,620 Shifting to the right by a certain amount, 779 00:42:58,620 --> 00:43:04,960 mathematically, corresponds to putting -b in the argument 780 00:43:04,960 --> 00:43:06,000 of the function. 781 00:43:06,000 --> 00:43:09,750 So I'm taking the formula that I had here, which is the 782 00:43:09,750 --> 00:43:12,220 scaling by a factor of a. 783 00:43:12,220 --> 00:43:17,200 The scaling down to keep the total area equal to 1.
784 00:43:17,200 --> 00:43:19,740 And then I need to introduce this extra 785 00:43:19,740 --> 00:43:20,990 term to do the shifting. 786 00:43:20,990 --> 00:43:23,300 787 00:43:23,300 --> 00:43:26,200 So this is a plausible argument. 788 00:43:26,200 --> 00:43:31,080 A proof by picture that this should be the right answer. 789 00:43:31,080 --> 00:43:38,295 But just in order to keep our skills tuned and refined, let 790 00:43:38,295 --> 00:43:42,950 us do this derivation in a more formal way using our 791 00:43:42,950 --> 00:43:45,135 two-step cookbook procedure. 792 00:43:45,135 --> 00:43:48,000 793 00:43:48,000 --> 00:43:51,010 And I'm going to do it under the assumption that a is 794 00:43:51,010 --> 00:43:54,910 positive, as in the example that we just did. 795 00:43:54,910 --> 00:43:59,090 So what's the two-step procedure? 796 00:43:59,090 --> 00:44:03,700 We want to find the cumulative of Y, and after that we're 797 00:44:03,700 --> 00:44:05,720 going to differentiate. 798 00:44:05,720 --> 00:44:09,220 By definition the cumulative is the probability that the 799 00:44:09,220 --> 00:44:13,280 random variable takes values less than a certain number. 800 00:44:13,280 --> 00:44:17,190 And now we need to take this event and translate it, and 801 00:44:17,190 --> 00:44:21,110 express it in terms of the original random variables. 802 00:44:21,110 --> 00:44:24,970 So Y is, by definition, aX + b, so we're 803 00:44:24,970 --> 00:44:28,970 looking at this event. 804 00:44:28,970 --> 00:44:33,580 And now we want to express this event in a clean form 805 00:44:33,580 --> 00:44:39,730 where X shows up in a straightforward way. 806 00:44:39,730 --> 00:44:42,740 Let's say I'm going to massage this event and 807 00:44:42,740 --> 00:44:44,640 write it in this form. 808 00:44:44,640 --> 00:44:48,070 For this inequality to be true, x should be less than or 809 00:44:48,070 --> 00:44:53,820 equal to (y minus b) divided by a. 810 00:44:53,820 --> 00:44:56,820 OK, now what is this?
811 00:44:56,820 --> 00:45:01,330 This is the cumulative distribution of X evaluated at 812 00:45:01,330 --> 00:45:02,580 that particular point. 813 00:45:02,580 --> 00:45:07,850 814 00:45:07,850 --> 00:45:14,760 So we got a formula for the cumulative of Y based on the 815 00:45:14,760 --> 00:45:17,880 cumulative of X. What's the next step? 816 00:45:17,880 --> 00:45:21,550 The next step is to take derivatives of both sides. 817 00:45:21,550 --> 00:45:28,810 So the density of Y is going to be the derivative of this 818 00:45:28,810 --> 00:45:31,270 expression with respect to y. 819 00:45:31,270 --> 00:45:36,830 OK, so now here we need to use the chain rule. 820 00:45:36,830 --> 00:45:40,670 It's going to be the derivative of the F function 821 00:45:40,670 --> 00:45:43,080 with respect to its argument. 822 00:45:43,080 --> 00:45:46,930 And then we need to take the derivative of the argument 823 00:45:46,930 --> 00:45:48,780 with respect to y. 824 00:45:48,780 --> 00:45:51,530 What is the derivative of the cumulative? 825 00:45:51,530 --> 00:45:53,190 The derivative of the cumulative 826 00:45:53,190 --> 00:45:56,290 is the density itself. 827 00:45:56,290 --> 00:45:59,578 And we evaluate it at the point of interest. 828 00:45:59,578 --> 00:46:02,180 829 00:46:02,180 --> 00:46:05,340 And then the chain rule tells us that we need to take the 830 00:46:05,340 --> 00:46:08,800 derivative of this with respect to y, and the 831 00:46:08,800 --> 00:46:11,370 derivative of this with respect to y is 1/a. 832 00:46:11,370 --> 00:46:14,290 833 00:46:14,290 --> 00:46:18,330 And this gives us the formula, which is consistent with what 834 00:46:18,330 --> 00:46:21,810 I had written down here, for the case where a 835 00:46:21,810 --> 00:46:25,030 is a positive number. 836 00:46:25,030 --> 00:46:27,915 What if a was a negative number? 837 00:46:27,915 --> 00:46:30,570 838 00:46:30,570 --> 00:46:31,910 Could this formula be true?
839 00:46:31,910 --> 00:46:35,120 840 00:46:35,120 --> 00:46:36,140 Of course not. 841 00:46:36,140 --> 00:46:39,000 Densities cannot be negative, right? 842 00:46:39,000 --> 00:46:41,180 So that formula cannot be true. 843 00:46:41,180 --> 00:46:43,750 Something needs to change. 844 00:46:43,750 --> 00:46:45,140 What should change? 845 00:46:45,140 --> 00:46:50,970 Where does this argument break down when a is negative? 846 00:46:50,970 --> 00:46:56,470 847 00:46:56,470 --> 00:47:01,570 So when I write this inequality in this form, I 848 00:47:01,570 --> 00:47:03,940 divide by a. 849 00:47:03,940 --> 00:47:07,730 But when you divide by a negative number, the direction 850 00:47:07,730 --> 00:47:10,390 of an inequality is going to change. 851 00:47:10,390 --> 00:47:14,520 So when a is negative, this inequality becomes larger than 852 00:47:14,520 --> 00:47:16,190 or equal to. 853 00:47:16,190 --> 00:47:18,770 And in that case, the expression that I have up 854 00:47:18,770 --> 00:47:24,360 there would change, since this is now a larger than here. 855 00:47:24,360 --> 00:47:27,900 Instead of getting the cumulative, I would get 1 856 00:47:27,900 --> 00:47:32,350 minus the cumulative of (y minus b) divided by a. 857 00:47:32,350 --> 00:47:35,240 858 00:47:35,240 --> 00:47:39,890 So this is the probability that X is bigger than this 859 00:47:39,890 --> 00:47:41,170 particular number. 860 00:47:41,170 --> 00:47:44,000 And now when you take the derivatives, there's going to 861 00:47:44,000 --> 00:47:46,570 be a minus sign that shows up. 862 00:47:46,570 --> 00:47:49,810 And that minus sign will end up being here. 863 00:47:49,810 --> 00:47:53,730 And so we're taking the negative of a negative number, 864 00:47:53,730 --> 00:47:56,420 and that basically is equivalent to taking the 865 00:47:56,420 --> 00:47:58,660 absolute value of that number.
866 00:47:58,660 --> 00:48:03,830 So all that happens when we have a negative a is that we 867 00:48:03,830 --> 00:48:07,010 have to take the absolute value of the scaling factor 868 00:48:07,010 --> 00:48:10,250 instead of the factor itself. 869 00:48:10,250 --> 00:48:14,020 All right, so this general formula is quite useful for 870 00:48:14,020 --> 00:48:16,690 dealing with linear functions of random variables. 871 00:48:16,690 --> 00:48:21,330 And one nice application of it is to take the formula for a 872 00:48:21,330 --> 00:48:25,460 normal random variable, consider a linear function of 873 00:48:25,460 --> 00:48:29,600 a normal random variable, plug into this formula, and what 874 00:48:29,600 --> 00:48:34,000 you will find is that Y also has a normal distribution. 875 00:48:34,000 --> 00:48:37,310 So using this formula, now we can prove a statement that I 876 00:48:37,310 --> 00:48:40,565 had made a couple of lectures ago, that a linear function of 877 00:48:40,565 --> 00:48:43,900 a normal random variable is also normal. 878 00:48:43,900 --> 00:48:47,600 That's how you would prove it. 879 00:48:47,600 --> 00:48:51,190 I think this is it for today. 880 00:48:51,190 --> 00:48:52,440
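The general formula just stated, f_Y(y) = f_X((y - b)/a) / |a|, can be checked numerically for the normal case; the specific parameter values below are my own example choices, not numbers from the lecture.

```python
import math

def normal_pdf(x, mu, sigma):
    # density of a normal random variable with mean mu, std sigma
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def linear_pdf(y, a, b, f_X):
    # the lecture's general formula for Y = aX + b (a nonzero),
    # with the absolute value handling negative a
    return f_X((y - b) / a) / abs(a)

# With X normal(mu, sigma), Y = aX + b should match a normal
# density with mean a*mu + b and std |a|*sigma, point by point.
# Negative a is chosen to exercise the absolute-value case.
a, b, mu, sigma = -2.0, 5.0, 1.0, 0.5
```

Evaluating both sides at a handful of y's shows the derived density agrees with the normal density with parameters a times mu plus b and |a| times sigma, which is the claim made a couple of lectures ago.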