The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, so let's start. Today we're going to continue the subject from last time, and the subject is random variables. As we discussed, random variables basically associate numerical values with the outcomes of an experiment, and we want to learn how to manipulate them. Now, to a large extent, what's happening in this chapter is that we are revisiting the same concepts we have seen in chapter one. We're going to introduce a lot of new notation, but really we're dealing with the same kind of stuff. The only place where we go beyond new notation, the one new concept in this chapter, is the concept of the expectation, or expected value. And we're going to learn how to manipulate expectations. So let us start with a quick review of what we discussed last time.
We talked about random variables. Loosely speaking, random variables are random quantities that result from an experiment. More precisely, mathematically speaking, a random variable is a function from the sample space to the real numbers. That is, you give me an outcome, and based on that outcome, I can tell you the value of the random variable. So the value of the random variable is a function of the outcome that we have. Now, given a random variable, some of the numerical outcomes are more likely than others, and we want to say which ones are more likely and how likely they are. The way we do that is by writing down the probabilities of the different possible numerical outcomes. Notice the notation here: we use uppercase to denote the random variable, and lowercase to denote real numbers. So the way you read this is as the probability that the random variable, capital X, happens to take the numerical value, little x. This is a concept that's familiar from chapter one, and this is just the new notation we will be using for that concept.
It's the probability mass function (PMF) of the random variable, capital X. The subscript just indicates which random variable we're talking about, and it's the probability assigned to a particular outcome. We want to assign such probabilities for all possible numerical values. So you can think of this as a function of little x that tells you how likely every little x is going to be. Now, the new concept we introduced last time is the concept of the expected value of a random variable, which is defined this way. You look at all the possible outcomes, and you form some kind of average of all the possible numerical values of the random variable capital X. You consider all the possible numerical values, and you form an average. In fact, it's a weighted average where, to every little x, you assign a weight equal to the probability that that particular little x is going to be realized.

Now, as we discussed last time, if you have a random variable, you can take a function of that random variable, and that's going to be a new random variable.
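The weighted-average definition of the expectation can be sketched in a few lines of Python; the PMF used here is illustrative, not one from the lecture:

```python
def expectation(pmf):
    """E[X] = sum over x of x * p_X(x): a weighted average where
    each value little-x is weighted by its probability.

    pmf: dict mapping each value x to P(X = x)."""
    return sum(x * p for x, p in pmf.items())

# Illustrative PMF: a fair four-sided die.
pmf = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
print(expectation(pmf))  # 2.5, the center of gravity of the PMF
```

The function takes the PMF as an explicit dictionary, mirroring how the lecture treats the PMF as a function of little x.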
So if capital X is a random variable and g is a function, g of X is a new random variable. You do the experiment, you get an outcome, this determines the value of X, and that determines the value of g of X. So the numerical value of g of X is determined by whatever happens in the experiment. It's random, and that makes it a random variable. Since it's a random variable, it has an expectation of its own. So how would we calculate the expectation of g of X? You could proceed by just using the definition, which would require you to find the PMF of the random variable g of X, and then apply the formula for the expected value of a random variable with known PMF. But there is also a shortcut, which is just a different way of doing the counting and the calculations, in which we do not need to find the PMF of g of X. We just work with the PMF of the original random variable. What this is saying is that the average value of g of X is obtained as follows: you look at all the possible results, the x's, and how likely they are.
And when that particular x happens, this is how much you get. So you add these things up, and you get the average amount that you're going to get, the average value of g of X, where you average over the likelihoods of the different x's. Now, expected values have some properties that are always true and some properties that are sometimes not true. The property that is not always true is that this would be the same as g of the expected value of X. In general, this is not true. You cannot interchange function and expectation, which means you cannot reason on the average, in general. But there are some exceptions. When g is a linear function, the expected value of a linear function is the same as that same linear function of the expectation. So for linear functions of a random variable, the expectation behaves nicely. This is basically telling you that, if X is degrees in Celsius, and aX plus b is degrees in Fahrenheit, you can first do the conversion to Fahrenheit and then take the average, or you can find the average temperature in Celsius and then do the conversion to Fahrenheit.
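The expected value rule for g of X, and the caveat that E[g(X)] equals g(E[X]) only for linear g, can be sketched as follows; the two-point PMF and the particular functions are illustrative:

```python
def expectation_of_function(pmf, g):
    # Expected value rule: E[g(X)] = sum over x of g(x) * p_X(x),
    # computed directly from the PMF of X, without ever
    # finding the PMF of the new random variable g(X).
    return sum(g(x) * p for x, p in pmf.items())

pmf = {0: 0.5, 2: 0.5}                       # illustrative two-point PMF
mean = sum(x * p for x, p in pmf.items())    # E[X] = 1.0

# Linear g (like a Celsius-to-Fahrenheit conversion):
# expectation and function CAN be interchanged.
def g_lin(x):
    return 3 * x + 5

assert expectation_of_function(pmf, g_lin) == g_lin(mean)

# Non-linear g: in general E[g(X)] != g(E[X]).
def g_sq(x):
    return x ** 2

print(expectation_of_function(pmf, g_sq))  # 2.0, this is E[X^2]
print(g_sq(mean))                          # 1.0, this is (E[X])^2
```

The mismatch in the last two lines is exactly the "cannot reason on the average" point made above.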
Either is valid. So the expected value tells us something about where the center of the distribution is; more specifically, the center of mass or center of gravity of the PMF, when you plot it as a bar graph. Besides the average value, you may be interested in knowing how far you will typically be from the average. So let's look at this quantity, X minus the expected value of X. This is the distance from the average value. For a random outcome of the experiment, this quantity measures how far away from the mean you happen to be. The quantity inside the brackets is a random variable. Why? Because capital X is random, and what we have here is capital X, which is random, minus a number. Remember, expected values are numbers. Now, a random variable minus a number is a new random variable, and it has an expectation of its own. We can use the linearity rule: the expected value of something minus something else is just the difference of their expected values. So it's going to be the expected value of X minus the expected value of this thing.
Now, this thing is a number, and the expected value of a number is just the number itself. So we get that this is the expected value minus the expected value, and we get zero. What is this telling us? That, on the average, the signed difference from the mean is equal to zero. That is, the mean is here; sometimes X will fall to the right, sometimes X will fall to the left. On the average, the distance from the mean is going to be zero, because sometimes the realized distance will be positive and sometimes it will be negative, and positives and negatives cancel out. So if we want to capture the idea of how far we are from the mean, just looking at the signed distance from the mean is not going to give us any useful information. If we want to say something about how far we are, typically, we should do something different. One possibility might be to take the absolute values of the differences, and that's a quantity that people are sometimes interested in.
But it turns out that a more useful quantity happens to be the variance of a random variable, which measures the average squared distance from the mean. So you have a random outcome, a random numerical value of the random variable. It is a certain distance away from the mean, and that distance is random. We take the square of that. This is the squared distance from the mean, which is again random. Since it's random, it has an expected value of its own, and that expected value we call the variance of X. So we have this particular definition. We are using the rule that we have up here for how to calculate expectations of functions of a random variable. Why does that rule apply? Well, what we have inside the brackets here is a function of the random variable, capital X, so we can apply this rule where g is this particular function. We can use that to calculate the variance, starting with the PMF of the random variable X. And then we have a useful formula that's a nice shortcut, sometimes, if you want to do the calculation.
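The definition of the variance as E[(X - E[X])^2], together with the standard shortcut formula var(X) = E[X^2] - (E[X])^2 (the lecture alludes to a shortcut without writing it out; this is the usual one), can be sketched as:

```python
def variance(pmf):
    # Definition: expected squared distance from the mean,
    # via the expected value rule with g(x) = (x - mu)^2.
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_shortcut(pmf):
    # Shortcut formula: var(X) = E[X^2] - (E[X])^2.
    mu = sum(x * p for x, p in pmf.items())
    ex2 = sum(x ** 2 * p for x, p in pmf.items())
    return ex2 - mu ** 2

pmf = {1: 0.5, 3: 0.5}   # illustrative: mean 2, always distance 1 away
print(variance(pmf))           # 1.0
print(variance_shortcut(pmf))  # 1.0
```

Both routes start from the PMF of X itself; neither needs the PMF of the squared distance.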
Now, one thing that's slightly wrong with the variance is that the units are not right, if you want to talk about the spread of a distribution. Suppose that X is a random variable measured in meters. The variance will have the units of meters squared, so it's kind of a different thing. If you want to talk about the spread of the distribution using the same units as you have for X, it's convenient to take the square root of the variance. That's something that we define, and we call it the standard deviation of X, or the standard deviation of the distribution of X. It tells you the amount of spread in your distribution, and it is in the same units as the random variable itself that you are dealing with.

We can illustrate these quantities with an example that's about as simple as it can be. So consider the following experiment. You're going to go from here to New York, let's say 200 miles, and you have two alternatives.
Either you'll get your private plane and go at a speed of 200 miles per hour, constant during your trip, or you'll decide to walk really, really slowly, at the leisurely pace of one mile per hour. You pick the speed at random by flipping a coin: with probability one-half you do one thing, and with probability one-half you do the other thing. So your speed V is a random variable. In case you're interested in how much time it's going to take you to get there, well, time is equal to distance divided by speed, so that's the formula. The time itself is a random variable, because it's a function of V, which is random. How much time it's going to take you depends on the coin flip that you do in the beginning to decide what speed you are going to have. OK, just as a warm-up, the trivial calculations. To find the expected value of V, you argue as follows. With probability one-half, V is going to be one, and with probability one-half, V is going to be 200. So the expected value of your speed is 100.5. If you wish to calculate the variance of V, then you argue as follows.
With probability one-half, I'm going to travel at the speed of one, whereas the mean is 100.5. So this is the distance from the mean if I decide to travel at the speed of one, and we take that distance from the mean squared. That's one contribution to the variance. And with probability one-half, you're going to travel at the speed of 200, which is this much away from the mean, and you take the square of that. OK, so approximately how big is this number? Well, this is roughly 100 squared, and that's also roughly 100 squared. So approximately, the variance of this random variable is 100 squared. Now, if I tell you that the variance of this distribution is 10,000, it doesn't really help you relate it to this diagram. Whereas the standard deviation, where you take the square root, is more interesting: it's the square root of 100 squared, which is 100. And the standard deviation indeed gives us a sense of how spread out this distribution is from the mean. So the standard deviation basically gives us some indication of the spacing that we have here. It tells us the amount of spread in our distribution.
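The numbers in this example can be checked directly, with V equal to 1 or 200 miles per hour, each with probability one-half:

```python
pmf_v = {1: 0.5, 200: 0.5}      # speed in miles per hour

mean_v = sum(v * p for v, p in pmf_v.items())
var_v = sum((v - mean_v) ** 2 * p for v, p in pmf_v.items())
std_v = var_v ** 0.5

print(mean_v)  # 100.5
print(var_v)   # 9900.25, roughly 100 squared, in (mph)^2
print(std_v)   # 99.5, back in the same units (mph) as V itself
```

The exact variance is 9900.25 rather than 10,000; the lecture's "roughly 100 squared" is the right way to read it.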
OK, now let's look at what happens to the time. V is a random variable; T is a random variable. So now let's look at the expected values and all of that for the time. The time is a function of a random variable. We can find the expected time by looking at all possible outcomes of the experiment, the V's, weighing them according to their probabilities, and for each particular V, keeping track of how much time it took us. So if V is one, which happens with probability one-half, the time it takes is going to be 200; if we travel at a speed of one, it takes us 200 time units. And otherwise, if our speed is equal to 200, the time is one. So the expected value of T is once more the same as before: it's 100.5. So the expected speed is 100.5, and the expected time is also 100.5. So the product of these expectations is something like 10,000. How about the expected value of the product of T and V? Well, T times V is 200. No matter what outcome you have in the experiment, in that particular outcome, T times V is the total distance traveled, which is exactly 200.
And so what we get in this simple example is that the expected value of the product of these two random variables is different from the product of their expected values. This is one more instance where we cannot reason on the average. So on the average, over a large number of trips, your average time would be about 100. On the average, over a large number of trips, your average speed would be about 100. But your average distance traveled is not 100 times 100; it's something else. So you cannot reason on the average whenever you're dealing with non-linear things. And the non-linear thing here is that you have a function which is a product of stuff, as opposed to just linear sums of stuff. Another way to look at what's happening here is through the expected value of the time. Time, by definition, is 200 over the speed. The expected value of the time, we found, is about 100, and so the expected value of 200 over V is about 100. But it's different from the quantity 200 over the expected value of V: the expected value of V is about 100, so that quantity is about equal to two.
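Both non-interchange phenomena from the trip example can be verified numerically: E[TV] differs from E[T] times E[V], and E[200/V] differs from 200/E[V], with the distance fixed at 200 miles:

```python
pmf_v = {1: 0.5, 200: 0.5}      # speed; the time is T = 200 / V

e_v = sum(v * p for v, p in pmf_v.items())            # E[V] = 100.5
e_t = sum((200 / v) * p for v, p in pmf_v.items())    # E[T] = 100.5

# T * V is the distance traveled, exactly 200 on every outcome...
e_tv = sum((200 / v) * v * p for v, p in pmf_v.items())
print(e_tv)        # 200.0
print(e_t * e_v)   # 10100.25 -- not the same!

# ...and E[200 / V] is about 100, while 200 / E[V] is about 2.
print(e_t)         # 100.5
print(200 / e_v)   # about 1.99
```

Nothing here is special to this PMF; any non-degenerate V and the non-linear functions "product" and "reciprocal" would show the same gap.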
Whereas the quantity up here, the expected value of 200 over V, is about 100. So what do we have? We have a non-linear function of V, and we find that the expected value of this function is not the same thing as the function of the expected value. Again, that's an instance where you cannot interchange expected values and functions, and that's because things are non-linear.

OK, so now let us introduce a new concept. Or maybe it's not quite a new concept. We discussed, in chapter one, that we have probabilities, and we also have conditional probabilities. What's the difference between them? Essentially, none. Probabilities are just an assignment of probability values to the different outcomes, given a particular model. Somebody comes and gives you new information, so you come up with a new model, and you have new probabilities. We call these conditional probabilities, but they taste and behave exactly the same as ordinary probabilities. So since we can have conditional probabilities, why not have conditional PMFs as well, since PMFs deal with probabilities anyway?
So we have a random variable, capital X. It has a PMF of its own. For example, it could be the PMF in this picture, which is a uniform PMF that takes four possible different values. And we also have an event, and somebody comes and tells us that this event has occurred. The PMF tells you the probability that capital X equals some little x. Somebody tells you that a certain event has occurred, and that's going to make you change the probabilities that you assign to the different values. You are going to use conditional probabilities. So this part, it's clear what it means from chapter one, and this part is just the new notation we're using in this chapter to talk about conditional probabilities. So this is just a definition. The conditional PMF is an ordinary PMF, but it's the PMF that applies to a new model in which we have been given some information about the outcome of the experiment. So to make it concrete, consider this event here: take the event that capital X is bigger than or equal to two. In the picture, what is the event A? The event A consists of these three outcomes.
OK, what is the conditional PMF, given that we are told that event A has occurred? Given that event A has occurred, we basically know that this outcome has not occurred. There are only three possible outcomes now. In the new universe, in the new model where we condition on A, there are only three possible outcomes. Those three possible outcomes were equally likely when we started, so in the conditional universe, they will remain equally likely. Remember, whenever you condition, the relative likelihoods remain the same. They keep the same proportions; they just need to be re-scaled so that they add up to one. So each one of these will have the same probability. Now, in the new world, probabilities need to add up to one, so each one of them is going to get a probability of 1/3 in the conditional universe. So this is our conditional model. Our conditional PMF is equal to 1/3 for x equal to 2, 3, and 4. All right.
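The conditioning step just described, restrict to the event and re-scale, can be sketched as follows, using the uniform PMF over {1, 2, 3, 4} from the example:

```python
def conditional_pmf(pmf, event):
    # Keep only the values inside the event, then re-scale by P(A)
    # so the probabilities add up to 1; the relative proportions
    # of the surviving values are preserved.
    p_a = sum(p for x, p in pmf.items() if x in event)
    return {x: p / p_a for x, p in pmf.items() if x in event}

pmf = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}   # uniform on four values
event_a = {2, 3, 4}                           # the event X >= 2

print(conditional_pmf(pmf, event_a))
# each of the values 2, 3, and 4 now has probability 1/3
```

The code silently assumes P(A) is positive, just as conditioning does in general.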
Now, whenever you have a probabilistic model involving a random variable and you have a PMF for that random variable, you can talk about the expected value of that random variable. We defined expected values just a few minutes ago. Here, we're dealing with a conditional model and conditional probabilities, and so we can also talk about the expected value of the random variable X in this new universe, in this new conditional model that we're dealing with. This leads us to the definition of the notion of a conditional expectation. The conditional expectation is nothing but an ordinary expectation, except that you don't use the original PMF; you use the conditional PMF, the conditional probabilities. It's just an ordinary expectation, but applied to the new model that we have, to the conditional universe where we are told that a certain event has occurred. So we can now calculate the conditional expectation, which, in this particular example, would be 2 times 1/3, where 1/3 is the probability of a 2, plus 3 times 1/3, the probability of a 3, plus 4 times 1/3, the probability of a 4.
428 00:22:29,280 --> 00:22:33,160 And then you can use your calculator to find the answer, 429 00:22:33,160 --> 00:22:35,780 or you can just argue by symmetry. 430 00:22:35,780 --> 00:22:39,360 The expected value has to be the center of gravity of the 431 00:22:39,360 --> 00:22:45,040 PMF we're working with, which is equal to 3. 432 00:22:45,040 --> 00:22:49,880 So conditional expectations are no different from ordinary 433 00:22:49,880 --> 00:22:51,230 expectations. 434 00:22:51,230 --> 00:22:54,500 They're just ordinary expectations applied to a new 435 00:22:54,500 --> 00:22:57,600 type of situation or a new type of model. 436 00:22:57,600 --> 00:23:03,010 Anything we might know about expectations will remain valid 437 00:23:03,010 --> 00:23:04,930 about conditional expectations. 438 00:23:04,930 --> 00:23:07,880 So for example, the conditional expectation of a 439 00:23:07,880 --> 00:23:11,040 linear function of a random variable is going to be the 440 00:23:11,040 --> 00:23:14,250 linear function of the conditional expectations. 441 00:23:14,250 --> 00:23:18,200 Or you can take any formula that you might know, such as 442 00:23:18,200 --> 00:23:23,970 the formula that expected value of X is equal to the-- 443 00:23:23,970 --> 00:23:24,310 sorry-- 444 00:23:24,310 --> 00:23:31,030 expected value of g of X is the sum over all X's of g of X 445 00:23:31,030 --> 00:23:37,700 times the PMF of X. So this is the formula that we already 446 00:23:37,700 --> 00:23:41,790 know about how to calculate expectations of a function of 447 00:23:41,790 --> 00:23:43,540 a random variable. 448 00:23:43,540 --> 00:23:47,190 If we move to the conditional universe, what changes? 449 00:23:47,190 --> 00:23:51,170 In the conditional universe, we're talking about the 450 00:23:51,170 --> 00:23:55,150 conditional expectation, given that event A has occurred. 
451 00:23:55,150 --> 00:23:59,390 And we use the conditional probabilities, given that A 452 00:23:59,390 --> 00:24:00,650 has occurred. 453 00:24:00,650 --> 00:24:05,140 So any formula has a conditional counterpart. 454 00:24:05,140 --> 00:24:07,790 In the conditional counterparts, expectations get 455 00:24:07,790 --> 00:24:10,000 replaced by conditional expectations. 456 00:24:10,000 --> 00:24:13,940 And probabilities get replaced by conditional probabilities. 457 00:24:13,940 --> 00:24:17,190 So once you know the first formula and you know the 458 00:24:17,190 --> 00:24:21,210 general idea, there's absolutely no reason for you 459 00:24:21,210 --> 00:24:24,020 to memorize a formula like this one. 460 00:24:24,020 --> 00:24:27,070 You shouldn't even have to write it on your cheat sheet 461 00:24:27,070 --> 00:24:30,840 for the exam, OK? 462 00:24:30,840 --> 00:24:40,980 OK, all right, so now let's look at an example of a random 463 00:24:40,980 --> 00:24:44,470 variable that we've seen before, the geometric random 464 00:24:44,470 --> 00:24:47,910 variable, and this time do something a little more 465 00:24:47,910 --> 00:24:51,660 interesting with it. 466 00:24:51,660 --> 00:24:54,510 Do you remember from last time what the geometric random 467 00:24:54,510 --> 00:24:55,580 variable is? 468 00:24:55,580 --> 00:24:56,560 We do coin flips. 469 00:24:56,560 --> 00:24:59,580 Each time there's a probability of P 470 00:24:59,580 --> 00:25:01,250 of obtaining heads. 471 00:25:01,250 --> 00:25:03,910 And we're interested in the number of tosses we're going 472 00:25:03,910 --> 00:25:07,580 to need until we observe heads for the first time. 473 00:25:07,580 --> 00:25:10,045 The probability that the random variable takes the 474 00:25:10,045 --> 00:25:13,290 value K, this is the probability that the first 475 00:25:13,290 --> 00:25:15,620 head appeared at the K-th toss.
476 00:25:15,620 --> 00:25:20,900 So this is the probability of K minus 1 consecutive tails 477 00:25:20,900 --> 00:25:22,670 followed by a head. 478 00:25:22,670 --> 00:25:28,360 So this is the probability of having to wait for K tosses. 479 00:25:28,360 --> 00:25:32,280 And when we plot this PMF, it has this kind of shape, which 480 00:25:32,280 --> 00:25:36,020 is the shape of a geometric progression. 481 00:25:36,020 --> 00:25:40,550 It starts at 1, and it goes all the way to infinity. 482 00:25:40,550 --> 00:25:43,700 So this is a discrete random variable that takes values 483 00:25:43,700 --> 00:25:49,160 over an infinite set, the set of the positive integers. 484 00:25:49,160 --> 00:25:51,720 So it's a random variable, therefore, it has an 485 00:25:51,720 --> 00:25:53,140 expectation. 486 00:25:53,140 --> 00:25:56,790 And the expected value is, by definition, we'll consider all 487 00:25:56,790 --> 00:25:59,180 possible values of the random variable. 488 00:25:59,180 --> 00:26:02,560 And we weigh them according to their probabilities, which 489 00:26:02,560 --> 00:26:05,860 leads us to this expression. 490 00:26:05,860 --> 00:26:09,860 You may have evaluated that expression some time in your 491 00:26:09,860 --> 00:26:11,400 previous life. 492 00:26:11,400 --> 00:26:15,730 And there are tricks for how to evaluate this and get a 493 00:26:15,730 --> 00:26:16,770 closed-form answer. 494 00:26:16,770 --> 00:26:19,350 But it's sort of an algebraic trick. 495 00:26:19,350 --> 00:26:20,710 You might not remember it. 496 00:26:20,710 --> 00:26:23,520 How do we go about doing this summation? 497 00:26:23,520 --> 00:26:26,830 Well, we're going to use a probabilistic trick and manage 498 00:26:26,830 --> 00:26:33,440 to evaluate the expectation of X, essentially, without doing 499 00:26:33,440 --> 00:26:34,870 any algebra.
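The PMF just described, p_X(k) = (1-p)^(k-1) p, and a truncated version of the sum defining E[X] can be sketched as follows (the choice p = 0.3 and the truncation at 500 terms are arbitrary):

```python
# Sketch of the geometric PMF: k - 1 consecutive tails followed by a head.
def geometric_pmf(k, p):
    return (1 - p) ** (k - 1) * p

p = 0.3
# The PMF sums to 1 over k = 1, 2, 3, ... (truncated here at 500 terms).
total = sum(geometric_pmf(k, p) for k in range(1, 501))
# The expectation sum the lecture is about to evaluate without algebra.
EX = sum(k * geometric_pmf(k, p) for k in range(1, 501))
print(total, EX)
```

The truncation error is negligible here because the terms decay geometrically.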
500 00:26:34,870 --> 00:26:38,600 And in the process of doing so, we're going to get some 501 00:26:38,600 --> 00:26:43,080 intuition about what happens in coin tosses and with 502 00:26:43,080 --> 00:26:45,750 geometric random variables. 503 00:26:45,750 --> 00:26:48,930 So we have two people who are going to do the same 504 00:26:48,930 --> 00:26:53,870 experiment, flip a coin until they obtain heads for the 505 00:26:53,870 --> 00:26:55,550 first time. 506 00:26:55,550 --> 00:27:00,170 One of these people is going to use the letter Y to count 507 00:27:00,170 --> 00:27:02,760 how many flips it took. 508 00:27:02,760 --> 00:27:06,860 So that person starts flipping right now. 509 00:27:06,860 --> 00:27:08,710 This is the current time. 510 00:27:08,710 --> 00:27:11,440 And they are going to obtain tails, tails, tails, until 511 00:27:11,440 --> 00:27:13,620 eventually they obtain heads. 512 00:27:13,620 --> 00:27:20,750 And this random variable Y is, of course, geometric, so it 513 00:27:20,750 --> 00:27:22,560 has a PMF of this form. 514 00:27:22,560 --> 00:27:25,510 515 00:27:25,510 --> 00:27:32,090 OK, now there is a second person who is doing that same 516 00:27:32,090 --> 00:27:33,410 experiment. 517 00:27:33,410 --> 00:27:37,400 That second person is going to take, again, a random number 518 00:27:37,400 --> 00:27:40,160 of flips, X, until they obtain heads for the first time. 519 00:27:40,160 --> 00:27:44,560 And of course, X is going to have the same PMF as Y. 520 00:27:44,560 --> 00:27:47,140 But that person was impatient. 521 00:27:47,140 --> 00:27:51,880 And they actually started flipping earlier, before the Y 522 00:27:51,880 --> 00:27:53,490 person started flipping. 523 00:27:53,490 --> 00:27:55,400 They flipped the coin twice. 524 00:27:55,400 --> 00:27:57,640 And they were unlucky, and they 525 00:27:57,640 --> 00:27:59,655 obtained tails both times. 526 00:27:59,655 --> 00:28:02,370 527 00:28:02,370 --> 00:28:05,115 And so they have to continue.
528 00:28:05,115 --> 00:28:09,100 529 00:28:09,100 --> 00:28:13,300 Looking at the situation at this time, how do these two 530 00:28:13,300 --> 00:28:14,690 people compare? 531 00:28:14,690 --> 00:28:20,260 Who do you think is going to obtain heads first? 532 00:28:20,260 --> 00:28:22,610 Is one more likely than the other? 533 00:28:22,610 --> 00:28:26,160 So if you play at the casino a lot, you'll say, oh, there 534 00:28:26,160 --> 00:28:29,810 were two tails in a row, so a head should be coming up 535 00:28:29,810 --> 00:28:31,350 sometime soon. 536 00:28:31,350 --> 00:28:35,600 But this is a wrong argument, because coin flips, at least 537 00:28:35,600 --> 00:28:37,870 in our model, are independent. 538 00:28:37,870 --> 00:28:41,750 The fact that these two happened to be tails doesn't 539 00:28:41,750 --> 00:28:45,230 change anything about our beliefs about what's going to 540 00:28:45,230 --> 00:28:46,900 be happening here. 541 00:28:46,900 --> 00:28:49,900 So what's going to be happening to that person is 542 00:28:49,900 --> 00:28:53,140 they will be flipping independent coin flips. 543 00:28:53,140 --> 00:28:54,930 That person will also be flipping 544 00:28:54,930 --> 00:28:56,600 independent coin flips. 545 00:28:56,600 --> 00:29:00,660 And both of them wait until the first head occurs. 546 00:29:00,660 --> 00:29:04,050 They're facing an identical situation, 547 00:29:04,050 --> 00:29:06,770 starting from this time. 548 00:29:06,770 --> 00:29:11,850 OK, now what's the probabilistic model of what 549 00:29:11,850 --> 00:29:14,530 this person is facing? 550 00:29:14,530 --> 00:29:18,940 The time until that person obtains heads for the first 551 00:29:18,940 --> 00:29:25,740 time is X. So this number of flips until they obtain heads 552 00:29:25,740 --> 00:29:30,080 for the first time is going to be X minus 2. 553 00:29:30,080 --> 00:29:35,810 So X is the total number until the first head. 
554 00:29:35,810 --> 00:29:41,280 X minus 2 is the number of flips, starting from here. 555 00:29:41,280 --> 00:29:44,060 Now what information do we have about that person? 556 00:29:44,060 --> 00:29:45,910 We have the information that their first 557 00:29:45,910 --> 00:29:47,970 two flips were tails. 558 00:29:47,970 --> 00:29:52,650 So we're given the information that X was bigger than 2. 559 00:29:52,650 --> 00:29:57,035 So the probabilistic model that describes this piece of 560 00:29:57,035 --> 00:30:01,790 the experiment is that it's going to take a random number 561 00:30:01,790 --> 00:30:04,270 of flips until the first head. 562 00:30:04,270 --> 00:30:08,420 That number of flips, starting from here until the next head, 563 00:30:08,420 --> 00:30:10,980 is that number X minus 2. 564 00:30:10,980 --> 00:30:13,210 But we're given the information that this person 565 00:30:13,210 --> 00:30:17,780 has already wasted 2 coin flips. 566 00:30:17,780 --> 00:30:20,330 Now we argued that probabilistically, this 567 00:30:20,330 --> 00:30:24,780 person, this part of the experiment here is identical 568 00:30:24,780 --> 00:30:26,710 with that part of the experiment. 569 00:30:26,710 --> 00:30:31,650 So the PMF of this random variable, which is X minus 2, 570 00:30:31,650 --> 00:30:33,980 conditioned on this information, should be the 571 00:30:33,980 --> 00:30:39,150 same as that PMF that we have down there. 572 00:30:39,150 --> 00:30:46,290 So the formal statement that I'm making is that this PMF 573 00:30:46,290 --> 00:30:51,910 here of X minus 2, given that X is bigger than 2, is the 574 00:30:51,910 --> 00:30:58,060 same as the PMF of X itself. 575 00:30:58,060 --> 00:31:00,280 What is this saying?
576 00:31:00,280 --> 00:31:04,220 Given that I tell you that you already did a few flips and 577 00:31:04,220 --> 00:31:08,450 they were failures, the remaining number of flips 578 00:31:08,450 --> 00:31:13,260 until the first head has the same geometric distribution as 579 00:31:13,260 --> 00:31:16,130 if you were starting from scratch. 580 00:31:16,130 --> 00:31:19,590 Whatever happened in the past, it happened, but has no 581 00:31:19,590 --> 00:31:22,670 bearing on what's going to happen in the future. 582 00:31:22,670 --> 00:31:27,660 Remaining coin flips until a head has the same 583 00:31:27,660 --> 00:31:32,220 distribution, whether you're starting right now, or whether 584 00:31:32,220 --> 00:31:35,590 you had done some other stuff in the past. 585 00:31:35,590 --> 00:31:38,860 So this is a property that we call the memorylessness 586 00:31:38,860 --> 00:31:42,550 property of the geometric distribution. 587 00:31:42,550 --> 00:31:45,560 Essentially, it says that whatever happens in the future 588 00:31:45,560 --> 00:31:48,920 is independent from whatever happened in the past. 589 00:31:48,920 --> 00:31:51,350 And that's true almost by definition, because we're 590 00:31:51,350 --> 00:31:53,750 assuming independent coin flips. 591 00:31:53,750 --> 00:31:56,750 Really, independence means that information about one 592 00:31:56,750 --> 00:32:00,390 part of the experiment has no bearing on what's going to 593 00:32:00,390 --> 00:32:04,280 happen in the other parts of the experiment. 594 00:32:04,280 --> 00:32:09,010 The argument that I tried to give using the intuition of 595 00:32:09,010 --> 00:32:14,240 coin flips, you can make it formal by just manipulating 596 00:32:14,240 --> 00:32:16,110 PMFs formally. 597 00:32:16,110 --> 00:32:19,450 So this is the original PMF of X. 598 00:32:19,450 --> 00:32:22,090 Suppose that you condition on the event that X 599 00:32:22,090 --> 00:32:24,030 is bigger than 2.
600 00:32:24,030 --> 00:32:27,570 This conditioning information, what it does is it tells you 601 00:32:27,570 --> 00:32:30,450 that this piece did not happen. 602 00:32:30,450 --> 00:32:33,760 You're conditioning just on this event. 603 00:32:33,760 --> 00:32:37,430 When you condition on that event, what's left is the 604 00:32:37,430 --> 00:32:42,130 conditional PMF, which has the same shape as this one, except 605 00:32:42,130 --> 00:32:45,010 that it needs to be re-normalized, so that the 606 00:32:45,010 --> 00:32:46,820 probabilities add up to one. 607 00:32:46,820 --> 00:32:52,460 So you take that picture, but you need to change the height 608 00:32:52,460 --> 00:32:56,210 of it, so that these terms add up to 1. 609 00:32:56,210 --> 00:32:59,730 And this is the conditional PMF of X, given that X is 610 00:32:59,730 --> 00:33:01,310 bigger than 2. 611 00:33:01,310 --> 00:33:04,360 But we're talking here not about X. We're talking about 612 00:33:04,360 --> 00:33:07,930 the remaining number of flips. 613 00:33:07,930 --> 00:33:12,030 The remaining number of flips is X minus 2. 614 00:33:12,030 --> 00:33:17,120 If we have the PMF of X, can we find the PMF of X minus 2? 615 00:33:17,120 --> 00:33:22,870 Well, if X is equal to 3, that corresponds to X minus 2 being 616 00:33:22,870 --> 00:33:24,170 equal to 1. 617 00:33:24,170 --> 00:33:26,730 So this probability here should be equal to that 618 00:33:26,730 --> 00:33:27,950 probability. 619 00:33:27,950 --> 00:33:31,400 The probability that X is equal to 4 should be the same 620 00:33:31,400 --> 00:33:34,710 as the probability that X minus 2 is equal to 2. 621 00:33:34,710 --> 00:33:38,980 So basically, the PMF of X minus 2 is the same as the PMF 622 00:33:38,980 --> 00:33:43,460 of X, except that it gets shifted by these 2 units.
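This shift-and-rescale argument can be checked numerically; the sketch below assumes nothing beyond the geometric PMF formula (p = 0.3 is an arbitrary choice):

```python
# Numerical check of memorylessness: the PMF of X - 2, conditioned on
# X > 2, matches the PMF of X itself (geometric with parameter p).
p = 0.3
pmf = lambda k: (1 - p) ** (k - 1) * p

p_X_gt_2 = (1 - p) ** 2  # tails on the first two flips, i.e. P(X > 2)
for k in range(1, 10):
    # P(X - 2 = k | X > 2) = P(X = k + 2) / P(X > 2): shift by 2, rescale
    cond = pmf(k + 2) / p_X_gt_2
    assert abs(cond - pmf(k)) < 1e-12
```

Dividing by P(X > 2) is the renormalization step, and evaluating the PMF at k + 2 is the shift by 2 units.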
623 00:33:43,460 --> 00:33:47,340 So this way, we have formally derived the conditional PMF of 624 00:33:47,340 --> 00:33:51,490 the remaining number of coin tosses, given that the first 625 00:33:51,490 --> 00:33:55,230 two flips were tails. 626 00:33:55,230 --> 00:33:58,880 And we see that it's exactly the same as the PMF that we 627 00:33:58,880 --> 00:34:00,230 started with. 628 00:34:00,230 --> 00:34:05,130 And so this is the formal proof of this statement here. 629 00:34:05,130 --> 00:34:10,010 So it's useful here to digest both these formal statements 630 00:34:10,010 --> 00:34:13,290 and understand them and understand the notation that 631 00:34:13,290 --> 00:34:17,050 is involved here, but also to really appreciate the 632 00:34:17,050 --> 00:34:21,409 intuitive argument and what it is really saying. 633 00:34:21,409 --> 00:34:28,980 OK, all right, so now we want to use this observation, this 634 00:34:28,980 --> 00:34:32,389 memorylessness, to eventually calculate the expected value 635 00:34:32,389 --> 00:34:34,679 for a geometric random variable. 636 00:34:34,679 --> 00:34:38,150 And the way we're going to do it is by using a divide and 637 00:34:38,150 --> 00:34:41,650 conquer tool, which is an analog of what we have already 638 00:34:41,650 --> 00:34:44,489 seen sometime before. 639 00:34:44,489 --> 00:34:48,230 Remember our story that there's a number of possible 640 00:34:48,230 --> 00:34:49,840 scenarios about the world? 641 00:34:49,840 --> 00:34:54,120 And there's a certain event, B, that can happen under any 642 00:34:54,120 --> 00:34:55,889 of these possible scenarios. 643 00:34:55,889 --> 00:34:57,980 And we have the total probability theorem. 644 00:34:57,980 --> 00:35:00,970 And that tells us that, to find the probability of this 645 00:35:00,970 --> 00:35:03,360 event, B, you consider the probabilities of 646 00:35:03,360 --> 00:35:06,000 B under each scenario.
647 00:35:06,000 --> 00:35:09,180 And you weigh those probabilities according to the 648 00:35:09,180 --> 00:35:12,190 probabilities of the different scenarios that we have. 649 00:35:12,190 --> 00:35:14,520 So that's a formula that we already know 650 00:35:14,520 --> 00:35:16,760 and have worked with. 651 00:35:16,760 --> 00:35:18,020 What's the next step? 652 00:35:18,020 --> 00:35:19,910 Is it something deep? 653 00:35:19,910 --> 00:35:24,280 No, it's just a translation into different notation. 654 00:35:24,280 --> 00:35:29,140 This is exactly the same formula, but with PMFs. 655 00:35:29,140 --> 00:35:32,720 The event that capital X is equal to little x can happen 656 00:35:32,720 --> 00:35:34,420 in many different ways. 657 00:35:34,420 --> 00:35:37,580 It can happen under either scenario. 658 00:35:37,580 --> 00:35:40,910 And within each scenario, you need to use the conditional 659 00:35:40,910 --> 00:35:44,140 probabilities of that event, given that this 660 00:35:44,140 --> 00:35:46,270 scenario has occurred. 661 00:35:46,270 --> 00:35:49,640 So this formula is identical to that one, except that we're 662 00:35:49,640 --> 00:35:53,440 using conditional PMFs, instead of conditional 663 00:35:53,440 --> 00:35:54,290 probabilities. 664 00:35:54,290 --> 00:35:56,860 But conditional PMFs, of course, are nothing but 665 00:35:56,860 --> 00:35:59,710 conditional probabilities anyway. 666 00:35:59,710 --> 00:36:02,500 So nothing new so far. 667 00:36:02,500 --> 00:36:08,700 Then what I do is to take this formula here and multiply both 668 00:36:08,700 --> 00:36:15,320 sides by X and take the sum over all X's. 669 00:36:15,320 --> 00:36:17,270 What do we get on this side? 670 00:36:17,270 --> 00:36:19,430 We get the expected value of X. 671 00:36:19,430 --> 00:36:22,830 What do we get on that side? 672 00:36:22,830 --> 00:36:24,290 Probability of A1. 673 00:36:24,290 --> 00:36:29,770 And then here, sum over all X's of X times P.
That's, 674 00:36:29,770 --> 00:36:33,030 again, the same calculation we have when we deal with 675 00:36:33,030 --> 00:36:36,450 expectations, except that, since here, we're dealing with 676 00:36:36,450 --> 00:36:39,010 conditional probabilities, we're going to get the 677 00:36:39,010 --> 00:36:41,220 conditional expectation. 678 00:36:41,220 --> 00:36:44,160 And this is the total expectation theorem. 679 00:36:44,160 --> 00:36:47,300 It's a very useful way for calculating expectations using 680 00:36:47,300 --> 00:36:49,440 a divide and conquer method. 681 00:36:49,440 --> 00:36:53,730 We figure out the average value of X under each one of 682 00:36:53,730 --> 00:36:55,590 the possible scenarios. 683 00:36:55,590 --> 00:37:01,330 The overall average value of X is a weighted linear 684 00:37:01,330 --> 00:37:04,500 combination of the expected values of X in the different 685 00:37:04,500 --> 00:37:07,960 scenarios where the weights are chosen according to the 686 00:37:07,960 --> 00:37:09,210 different probabilities. 687 00:37:09,210 --> 00:37:15,356 688 00:37:15,356 --> 00:37:21,040 OK, and now we're going to apply this to the case of a 689 00:37:21,040 --> 00:37:23,070 geometric random variable. 690 00:37:23,070 --> 00:37:26,410 And we're going to divide and conquer by considering 691 00:37:26,410 --> 00:37:31,350 separately the two cases where the first toss was heads, and 692 00:37:31,350 --> 00:37:35,820 the other case where the first toss was tails. 693 00:37:35,820 --> 00:37:40,160 So the expected value of X is the probability that the first 694 00:37:40,160 --> 00:37:44,020 toss was heads, so that X is equal to 1, and the expected 695 00:37:44,020 --> 00:37:46,520 value if that happened. 696 00:37:46,520 --> 00:37:51,530 What is the expected value of X, given that X is equal to 1? 697 00:37:51,530 --> 00:37:55,770 If X is known to be equal to 1, then X 698 00:37:55,770 --> 00:37:57,390 becomes just a number. 
699 00:37:57,390 --> 00:38:01,100 And the expected value of a number is the number itself. 700 00:38:01,100 --> 00:38:05,120 So this first line here is the probability of heads in the 701 00:38:05,120 --> 00:38:07,660 first toss times the number 1. 702 00:38:07,660 --> 00:38:13,320 703 00:38:13,320 --> 00:38:21,000 So the probability that X is bigger than 1 is 1 minus P. 704 00:38:21,000 --> 00:38:25,400 And then we need to do something about this 705 00:38:25,400 --> 00:38:26,650 conditional expectation. 706 00:38:26,650 --> 00:38:29,830 707 00:38:29,830 --> 00:38:33,400 What is it? 708 00:38:33,400 --> 00:38:39,420 I can write it in, perhaps, a more suggestive form, as 709 00:38:39,420 --> 00:38:51,360 the expected value of X minus 1, given that X minus 1 is 710 00:38:51,360 --> 00:38:52,610 bigger than 0. 711 00:38:52,610 --> 00:38:56,590 712 00:38:56,590 --> 00:38:57,840 Ah. 713 00:38:57,840 --> 00:39:02,420 714 00:39:02,420 --> 00:39:07,453 OK, X bigger than 1 is the same as X minus 1 being 715 00:39:07,453 --> 00:39:10,770 positive, this way. 716 00:39:10,770 --> 00:39:15,500 So we have X minus 1, given that X minus 1 is positive, plus 1. 717 00:39:15,500 --> 00:39:16,680 What did I do here? 718 00:39:16,680 --> 00:39:21,250 I added and subtracted 1. 719 00:39:21,250 --> 00:39:24,110 Now what is this? 720 00:39:24,110 --> 00:39:29,240 This is the expected value of the remaining coin flips, 721 00:39:29,240 --> 00:39:34,660 until I obtain heads, given that the first one was tails. 722 00:39:34,660 --> 00:39:38,690 It's the same story that we were going through down there. 723 00:39:38,690 --> 00:39:41,860 Given that the first coin flip was tails doesn't tell me 724 00:39:41,860 --> 00:39:43,790 anything about the future, about the 725 00:39:43,790 --> 00:39:45,750 remaining coin flips. 726 00:39:45,750 --> 00:39:49,610 So this expectation should be the same as the expectation 727 00:39:49,610 --> 00:39:53,740 faced by a person who was starting just now.
728 00:39:53,740 --> 00:39:59,080 So this should be equal to the expected value of X itself. 729 00:39:59,080 --> 00:40:04,120 And then we have the plus 1 that comes from there, OK? 730 00:40:04,120 --> 00:40:07,830 731 00:40:07,830 --> 00:40:11,140 Remaining coin flips until a head, given that I had a tail 732 00:40:11,140 --> 00:40:15,710 yesterday, is the same as expected number of flips until 733 00:40:15,710 --> 00:40:20,160 heads for a person just starting now and wasn't doing 734 00:40:20,160 --> 00:40:21,280 anything yesterday. 735 00:40:21,280 --> 00:40:24,640 So the fact that I had a coin flip yesterday doesn't 736 00:40:24,640 --> 00:40:28,990 change my beliefs about how long it's going to take me 737 00:40:28,990 --> 00:40:31,170 until the first head. 738 00:40:31,170 --> 00:40:34,700 So once we believe that relation, then 739 00:40:34,700 --> 00:40:37,750 we plug this in here. 740 00:40:37,750 --> 00:40:42,346 And this red term becomes expected value of X plus 1. 741 00:40:42,346 --> 00:40:46,000 742 00:40:46,000 --> 00:40:50,850 So now we didn't exactly get the answer we wanted, but we 743 00:40:50,850 --> 00:40:56,110 got an equation that involves the expected value of X. And 744 00:40:56,110 --> 00:40:58,230 it's the only unknown in that equation. 745 00:40:58,230 --> 00:41:03,990 Expected value of X is equal to P plus (1 minus P) times this 746 00:41:03,990 --> 00:41:05,190 expression. 747 00:41:05,190 --> 00:41:08,480 You solve this equation for expected value of X, and you 748 00:41:08,480 --> 00:41:12,990 get the value of 1/P. 749 00:41:12,990 --> 00:41:16,920 The final answer does make intuitive sense. 750 00:41:16,920 --> 00:41:21,030 If P is small, heads are difficult to obtain. 751 00:41:21,030 --> 00:41:24,050 So you expect that it's going to take you a long time until 752 00:41:24,050 --> 00:41:26,310 you see heads for the first time. 753 00:41:26,310 --> 00:41:29,243 So it is definitely a reasonable answer.
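The equation just solved, E[X] = p·1 + (1-p)(E[X] + 1), rearranges to p·E[X] = 1, i.e. E[X] = 1/p. A quick sketch checks the closed form against a truncated version of the defining sum (p = 0.25 is an arbitrary choice):

```python
# Solving E[X] = p*1 + (1-p)*(E[X] + 1):
#   E - (1-p)*E = p + (1-p)  =>  p*E = 1  =>  E = 1/p
p = 0.25
E_closed = 1 / p

# Sanity check against a truncation of E[X] = sum_k k * (1-p)**(k-1) * p.
E_truncated = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 2000))
assert abs(E_truncated - E_closed) < 1e-9
print(E_closed)  # 4.0
```

This also illustrates the payoff of the divide-and-conquer trick: no series manipulation is needed to get 1/p.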
754 00:41:29,243 --> 00:41:32,960 Now the trick that we used here, the divide and conquer 755 00:41:32,960 --> 00:41:36,860 trick, is a really nice one. 756 00:41:36,860 --> 00:41:39,760 It gives us a very good shortcut in this problem. 757 00:41:39,760 --> 00:41:44,230 But you must definitely spend some time making sure you 758 00:41:44,230 --> 00:41:48,670 understand why this expression here is the same as that 759 00:41:48,670 --> 00:41:50,020 expression there. 760 00:41:50,020 --> 00:41:53,790 Essentially, what it's saying is that, if I tell you that X 761 00:41:53,790 --> 00:41:57,460 is bigger than 1, that the first coin flip was tails, all 762 00:41:57,460 --> 00:42:02,040 I'm telling you is that that person has wasted a coin flip, 763 00:42:02,040 --> 00:42:05,310 and they are starting all over again. 764 00:42:05,310 --> 00:42:08,510 So they've wasted 1 coin flip. 765 00:42:08,510 --> 00:42:10,670 And they're starting all over again. 766 00:42:10,670 --> 00:42:13,220 If I tell you that the first flip was tails, that's the 767 00:42:13,220 --> 00:42:16,800 only information that I'm basically giving you, a wasted 768 00:42:16,800 --> 00:42:19,970 flip, and then starts all over again. 769 00:42:19,970 --> 00:42:23,180 All right, so in the few remaining minutes now, we're 770 00:42:23,180 --> 00:42:26,970 going to quickly introduce a few new concepts that we will 771 00:42:26,970 --> 00:42:31,050 be playing with in the next ten days or so. 772 00:42:31,050 --> 00:42:33,300 And you will get plenty of opportunities 773 00:42:33,300 --> 00:42:34,860 to manipulate them. 774 00:42:34,860 --> 00:42:37,180 So here's the idea. 775 00:42:37,180 --> 00:42:40,370 A typical experiment may have several random variables 776 00:42:40,370 --> 00:42:43,310 associated with that experiment. 777 00:42:43,310 --> 00:42:46,370 So a typical student has height and weight. 
778 00:42:46,370 --> 00:42:48,800 If I give you the PMF of height, that tells me 779 00:42:48,800 --> 00:42:51,060 something about the distribution of heights in the class. 780 00:42:51,060 --> 00:42:57,110 If I give you the PMF of weight, it tells me something about 781 00:42:57,110 --> 00:42:58,990 the different weights in this class. 782 00:42:58,990 --> 00:43:01,690 But if I want to ask a question, is there an 783 00:43:01,690 --> 00:43:05,910 association between height and weight, then I need to know a 784 00:43:05,910 --> 00:43:09,730 little more about how height and weight relate to each other. 785 00:43:09,730 --> 00:43:13,480 And the PMF of height individually and the PMF of 786 00:43:13,480 --> 00:43:16,130 weight just by itself do not tell me 787 00:43:16,130 --> 00:43:18,240 anything about those relations. 788 00:43:18,240 --> 00:43:21,730 To be able to say something about those relations, I need 789 00:43:21,730 --> 00:43:27,230 to know something about joint probabilities, how likely is 790 00:43:27,230 --> 00:43:31,500 it that certain X's go together with certain Y's. 791 00:43:31,500 --> 00:43:34,180 So these probabilities, essentially, capture 792 00:43:34,180 --> 00:43:37,930 associations between these two random variables. 793 00:43:37,930 --> 00:43:40,910 And it's the information I would need to have to do any 794 00:43:40,910 --> 00:43:44,900 kind of statistical study that tries to relate the two random 795 00:43:44,900 --> 00:43:47,600 variables with each other. 796 00:43:47,600 --> 00:43:49,440 These are ordinary probabilities. 797 00:43:49,440 --> 00:43:50,750 This is an event. 798 00:43:50,750 --> 00:43:52,630 It's the event that this thing happens 799 00:43:52,630 --> 00:43:55,230 and that thing happens. 800 00:43:55,230 --> 00:43:58,460 This is just the notation that we will be using. 801 00:43:58,460 --> 00:44:00,840 It's called the joint PMF.
802 00:44:00,840 --> 00:44:04,560 It's the joint Probability Mass Function of the two 803 00:44:04,560 --> 00:44:09,170 random variables X and Y looked at together, jointly. 804 00:44:09,170 --> 00:44:11,740 And it gives me the probability that any 805 00:44:11,740 --> 00:44:17,100 particular numerical outcome pair does happen. 806 00:44:17,100 --> 00:44:20,580 So in the finite case, you can represent joint PMFs, for 807 00:44:20,580 --> 00:44:22,660 example, by a table. 808 00:44:22,660 --> 00:44:25,940 This particular table here would give you information 809 00:44:25,940 --> 00:44:31,870 such as, let's see, the joint PMF evaluated at 2, 3. 810 00:44:31,870 --> 00:44:35,240 This is the probability that X is equal to 2 and, 811 00:44:35,240 --> 00:44:38,200 simultaneously, Y is equal to 3. 812 00:44:38,200 --> 00:44:40,370 So it would be that number here. 813 00:44:40,370 --> 00:44:41,620 It's 4/20. 814 00:44:41,620 --> 00:44:44,290 815 00:44:44,290 --> 00:44:47,330 OK, what is a basic property of PMFs? 816 00:44:47,330 --> 00:44:49,920 First, these are probabilities, so all of the 817 00:44:49,920 --> 00:44:52,240 entries have to be non-negative. 818 00:44:52,240 --> 00:44:57,470 If you add up the probabilities over all possible numerical 819 00:44:57,470 --> 00:45:01,070 pairs that you could get, of course, the total probability 820 00:45:01,070 --> 00:45:03,050 must be equal to 1. 821 00:45:03,050 --> 00:45:06,070 So that's another thing that we want. 822 00:45:06,070 --> 00:45:10,090 Now suppose somebody gives me this model, but I 823 00:45:10,090 --> 00:45:12,410 don't care about Y's. 824 00:45:12,410 --> 00:45:15,760 All I care is the distribution of the X's. 825 00:45:15,760 --> 00:45:18,550 So I'm going to find the probability that X takes on a 826 00:45:18,550 --> 00:45:20,060 particular value. 827 00:45:20,060 --> 00:45:22,230 Can I find it from the table? 828 00:45:22,230 --> 00:45:23,190 Of course, I can.
829 00:45:23,190 --> 00:45:27,930 If you ask me what's the probability that X is equal to 830 00:45:27,930 --> 00:45:31,890 3, what I'm going to do is to add up those three 831 00:45:31,890 --> 00:45:33,790 probabilities together. 832 00:45:33,790 --> 00:45:37,680 And those probabilities, taken all together, give me the 833 00:45:37,680 --> 00:45:40,400 probability that X is equal to 3. 834 00:45:40,400 --> 00:45:43,950 These are all the possible ways that the event X equals 835 00:45:43,950 --> 00:45:45,310 to 3 can happen. 836 00:45:45,310 --> 00:45:49,850 So we add these, and we get 6/20. 837 00:45:49,850 --> 00:45:53,180 What I just did, can we translate it to a formula? 838 00:45:53,180 --> 00:45:55,510 What did I do? 839 00:45:55,510 --> 00:45:59,790 I fixed the particular value of X. And I added up the 840 00:45:59,790 --> 00:46:05,100 values of the joint PMF over all the possible values of Y. 841 00:46:05,100 --> 00:46:07,710 So that's how you do it. 842 00:46:07,710 --> 00:46:08,990 You take the joint. 843 00:46:08,990 --> 00:46:13,020 You take one slice of the joint, keeping X fixed, and 844 00:46:13,020 --> 00:46:16,110 adding up over the different values of Y. 845 00:46:16,110 --> 00:46:18,930 The moral of this example is that, if you know the joint 846 00:46:18,930 --> 00:46:22,590 PMF, then you can find the individual PMFs of every 847 00:46:22,590 --> 00:46:24,310 individual random variable. 848 00:46:24,310 --> 00:46:25,980 And we have a name for these. 849 00:46:25,980 --> 00:46:28,840 We call them the marginal PMFs. 850 00:46:28,840 --> 00:46:31,900 We have the joint that talks about both together, and the 851 00:46:31,900 --> 00:46:35,170 marginal that talks about them one at a time. 852 00:46:35,170 --> 00:46:38,590 And finally, since we love conditional probabilities, we 853 00:46:38,590 --> 00:46:41,150 will certainly want to define an object called the 854 00:46:41,150 --> 00:46:44,160 conditional PMF.
855 00:46:44,160 --> 00:46:46,940 So this quantity here is a familiar one. 856 00:46:46,940 --> 00:46:49,310 It's just a conditional probability. 857 00:46:49,310 --> 00:46:54,940 It's the probability that X takes on a particular value, 858 00:46:54,940 --> 00:46:58,210 given that Y takes a certain value. 859 00:46:58,210 --> 00:47:02,060 860 00:47:02,060 --> 00:47:07,160 For our example, let's take little y to be equal to 2, 861 00:47:07,160 --> 00:47:10,890 which means that we're conditioning to live inside 862 00:47:10,890 --> 00:47:12,490 this universe. 863 00:47:12,490 --> 00:47:17,920 This red universe here is the y equal to 2 universe. 864 00:47:17,920 --> 00:47:20,860 And these are the conditional probabilities of the different 865 00:47:20,860 --> 00:47:22,935 X's inside that universe. 866 00:47:22,935 --> 00:47:26,020 867 00:47:26,020 --> 00:47:29,860 OK, once more, just an exercise in notation. 868 00:47:29,860 --> 00:47:34,850 This is the chapter two version of the notation for 869 00:47:34,850 --> 00:47:37,750 what we were denoting this way in chapter one. 870 00:47:37,750 --> 00:47:43,000 The way to read this is that it's a conditional PMF having 871 00:47:43,000 --> 00:47:46,450 to do with two random variables, the PMF of X 872 00:47:46,450 --> 00:47:51,150 conditioned on information about Y. We are fixing a 873 00:47:51,150 --> 00:47:54,830 particular value of capital Y; that's the value on which we 874 00:47:54,830 --> 00:47:56,610 are conditioning. 875 00:47:56,610 --> 00:47:58,340 And we're looking at the probabilities of 876 00:47:58,340 --> 00:48:00,190 the different X's. 877 00:48:00,190 --> 00:48:03,890 So it's really a function of two arguments, little 878 00:48:03,890 --> 00:48:05,340 x and little y. 879 00:48:05,340 --> 00:48:10,490 But the best way to think about it is to fix little y 880 00:48:10,490 --> 00:48:15,030 and think of it as a function of X. 
So I'm fixing little y 881 00:48:15,030 --> 00:48:17,610 here, let's say, to y equal to 2. 882 00:48:17,610 --> 00:48:20,400 So I'm considering only this. 883 00:48:20,400 --> 00:48:24,090 And now, this quantity becomes a function of little x. 884 00:48:24,090 --> 00:48:27,340 For the different little x's, we're going to have different 885 00:48:27,340 --> 00:48:29,290 conditional probabilities. 886 00:48:29,290 --> 00:48:31,040 What are those conditional probabilities? 887 00:48:31,040 --> 00:48:36,230 888 00:48:36,230 --> 00:48:40,760 OK, conditional probabilities are proportional to original 889 00:48:40,760 --> 00:48:41,940 probabilities. 890 00:48:41,940 --> 00:48:45,200 So it's going to be those numbers, but scaled up. 891 00:48:45,200 --> 00:48:48,340 And they need to be scaled so that they add up to 1. 892 00:48:48,340 --> 00:48:50,280 So we have 1, 3, and 1. 893 00:48:50,280 --> 00:48:51,970 That's a total of 5. 894 00:48:51,970 --> 00:48:56,220 So the conditional PMF would have the shape 0, 895 00:48:56,220 --> 00:49:02,480 1/5, 3/5, and 1/5. 896 00:49:02,480 --> 00:49:07,540 This is the conditional PMF, given a particular value of Y. 897 00:49:07,540 --> 00:49:13,180 It has the same shape as those numbers, where by shape, I 898 00:49:13,180 --> 00:49:15,850 mean try to visualize a bar graph. 899 00:49:15,850 --> 00:49:19,370 The bar graph associated with those numbers has exactly the 900 00:49:19,370 --> 00:49:23,630 same shape as the bar graph associated with those numbers. 901 00:49:23,630 --> 00:49:26,790 The only thing that has changed is the scaling. 902 00:49:26,790 --> 00:49:29,630 Big moral, let me say it in different words: the 903 00:49:29,630 --> 00:49:34,250 conditional PMF, given a particular value of Y, is just 904 00:49:34,250 --> 00:49:39,790 a slice of the joint PMF, where you maintain the same shape 905 00:49:39,790 --> 00:49:44,320 but you rescale the numbers so that they add up to 1. 
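The rescaling step just described can be sketched directly: take the y = 2 slice, with the values read off in the lecture (0, 1/20, 3/20, 1/20 for x = 1 through 4), and divide every entry by the slice's total of 5/20.

```python
from fractions import Fraction as F

# The y = 2 slice of the joint PMF, with the values quoted in the
# lecture: 0, 1/20, 3/20, 1/20 for x = 1, 2, 3, 4.
slice_y2 = {1: F(0), 2: F(1, 20), 3: F(3, 20), 4: F(1, 20)}

# Rescale so the slice sums to 1: same shape, different scaling.
total = sum(slice_y2.values())  # 5/20
p_x_given_y2 = {x: p / total for x, p in slice_y2.items()}

for x, p in p_x_given_y2.items():
    print(x, p)  # prints 0, 1/5, 3/5, 1/5 for x = 1, 2, 3, 4
```

The bar graph of `p_x_given_y2` has exactly the shape of the slice's bar graph; only the vertical scale has changed, and the rescaled values now add up to 1.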
906 00:49:44,320 --> 00:49:48,410 Now mathematically, of course, what all of this is doing is 907 00:49:48,410 --> 00:49:54,750 it's taking the original joint PMF and rescaling it by a 908 00:49:54,750 --> 00:49:56,540 certain factor. 909 00:49:56,540 --> 00:50:00,340 This factor does not involve x, so the shape, as a function of x, 910 00:50:00,340 --> 00:50:01,720 has not changed. 911 00:50:01,720 --> 00:50:04,910 We're keeping the same shape as a function of x, but we 912 00:50:04,910 --> 00:50:06,420 divide by a certain number. 913 00:50:06,420 --> 00:50:09,670 And that's the number that we need so that the conditional 914 00:50:09,670 --> 00:50:12,810 probabilities add up to 1. 915 00:50:12,810 --> 00:50:15,850 Now where does this formula come from? 916 00:50:15,850 --> 00:50:17,880 Well, this is just the definition of conditional 917 00:50:17,880 --> 00:50:19,000 probabilities. 918 00:50:19,000 --> 00:50:21,870 The probability of something conditioned on something else 919 00:50:21,870 --> 00:50:24,620 is the probability of both things happening, the 920 00:50:24,620 --> 00:50:28,040 intersection of the two, divided by the probability of 921 00:50:28,040 --> 00:50:29,890 the conditioning event. 922 00:50:29,890 --> 00:50:33,100 And the last remark is that, as I just said, conditional 923 00:50:33,100 --> 00:50:35,930 probabilities are nothing different from ordinary 924 00:50:35,930 --> 00:50:37,210 probabilities. 925 00:50:37,210 --> 00:50:42,390 So a conditional PMF must sum to 1, no matter what you are 926 00:50:42,390 --> 00:50:44,360 conditioning on. 927 00:50:44,360 --> 00:50:47,370 All right, so this was sort of a quick introduction to our 928 00:50:47,370 --> 00:50:48,920 new notation. 929 00:50:48,920 --> 00:50:53,030 But you'll get a lot of practice in the days to come.
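Both points from this passage, the defining formula p_{X|Y}(x|y) = p_{X,Y}(x,y) / p_Y(y) and the fact that a conditional PMF sums to 1 no matter what you condition on, can be checked mechanically. The joint table here is hypothetical with illustrative entries (the transcript does not reproduce the lecture's table).

```python
from fractions import Fraction as F

# Hypothetical joint PMF with illustrative entries; pairs not listed
# have probability zero.
joint = {
    (1, 1): F(1, 20), (2, 1): F(1, 20), (3, 1): F(1, 20), (4, 1): F(1, 20),
    (2, 2): F(1, 20), (3, 2): F(3, 20), (4, 2): F(1, 20),
    (2, 3): F(4, 20), (3, 3): F(2, 20), (4, 3): F(1, 20),
    (1, 4): F(1, 20), (2, 4): F(2, 20), (4, 4): F(1, 20),
}

def marginal_y(joint, y):
    # p_Y(y): the probability of the conditioning event Y = y.
    return sum(p for (x, yi), p in joint.items() if yi == y)

def cond_x_given_y(joint, x, y):
    # Definition: p_{X|Y}(x | y) = p_{X,Y}(x, y) / p_Y(y),
    # the probability of both things happening divided by the
    # probability of the conditioning event.
    return joint.get((x, y), F(0)) / marginal_y(joint, y)

# A conditional PMF is an ordinary PMF: it sums to 1 for every y.
for y in range(1, 5):
    assert sum(cond_x_given_y(joint, x, y) for x in range(1, 5)) == 1
```

Dividing by p_Y(y) is exactly the normalization that makes each slice sum to 1, which is why the loop's assertion holds for every conditioning value.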