Why does going to the airport seem to require extra time compared with coming back from the airport, even if the traffic is the same in both directions? The answer must somehow depend on more than just the average travel time, which we're assuming is the same, and often is. In fact, it depends on the distribution of travel times. Probability distributions are fully described by listing or graphing every probability. For example, how likely is a journey to the airport to be between 10 and 20 minutes? How likely is a 20 to 30 minute journey? A 30 to 40 minute journey? And so on. We'll answer the airport question at the end of the video.

This video is part of the Probability and Statistics video series. Many natural and social phenomena are probabilistic in nature. Engineers, scientists, and policymakers often use probability to model and predict system behavior.

Hi, my name is Sanjoy Mahajan, and I'm a professor of Applied Science and Engineering at Olin College. Before watching this video, you should be proficient with integration and have some familiarity with probabilities.

After watching this video, you will be able to: explain what moments of distributions are, and compute moments and understand what they mean.

To illustrate what a probability distribution is, let's consider rolling two fair dice. The probability distribution of their sum is this table. For example, the only way to get a sum of two is to roll a 1 on each die. And there are 36 possible rolls for a pair of dice. So getting a sum of two has a probability of 1 over 36. The probability of rolling a sum of 3 is 2 over 36. And so on and so forth. You can fill in a table like this yourself.
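To make the table concrete, here is a minimal Python sketch (my own illustration, not part of the video) that builds this distribution by counting the 36 equally likely rolls:

    from fractions import Fraction
    from collections import Counter

    # Count how many of the 36 equally likely rolls produce each sum.
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))

    # Convert counts to probabilities: P(sum) = count / 36.
    dist = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

    for s, p in dist.items():
        # Fraction reduces automatically, so 2/36 prints as 1/18, and so on.
        print(s, p)   # 2 -> 1/36, 3 -> 1/18, ..., 7 -> 1/6, ..., 12 -> 1/36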
But the whole distribution, even for something as simple as two dice, is usually too much information. We often want to characterize the shape of the distribution using only a few numbers. Of course, that throws away information, but throwing away information is the only way to fit the complexity of the world into our brains.

The art comes in keeping the most important information. Finding the moments of a distribution can help us reach our goal. Two moments that you are probably already familiar with are the mean and the variance. They are the two most important moments of distributions.

Let's define these moments more formally. The mean is the first moment of a distribution. It is also called the expected value and is computed as shown. The expected value of x, written as x with angle brackets around it, is equal to this sum: the weighted sum of all of the x's, weighted by their probabilities, where the x sub i are the possible values of x.

For example, for the rolling of two dice, the possible values x sub i would be 2, 3, 4, all the way up through 12. And p sub i would be the corresponding probabilities of rolling those sums, so that was 1 over 36, 2 over 36, and so on.

So the first moment gives us some idea of what our distribution might look like, but not much. Think about it like this: the center of mass in these two images is in the same place, but the mass is actually distributed very differently in the two cases. We need more information.

The second moment can help us. The second moment is very similar in structure to the first moment. We write it the same way with angle brackets, but now we're talking about the expected value of x squared. So it's still a sum, and it's still weighted by the probabilities p sub i, but now we square each possible x value; for the dice example, that was the values from two through twelve. This is also called the mean square. First you square the x values, then you take the mean, weighting each x sub i squared by its probability p sub i.

In general, the nth moment is defined as follows: the expected value of x to the n, the sum of x sub i to the n weighted by p sub i.
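As a sketch of these definitions (the function name moment is my own, not the video's), the nth moment of the two-dice distribution can be computed directly as a weighted sum:

    from fractions import Fraction
    from collections import Counter

    # Distribution of the sum of two fair dice, as in the earlier sketch.
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    dist = {s: Fraction(c, 36) for s, c in counts.items()}

    def moment(dist, n):
        """nth moment <x^n>: the sum of p_i * x_i**n over the distribution."""
        return sum(p * x**n for x, p in dist.items())

    print(moment(dist, 1))   # first moment (mean): 7
    print(moment(dist, 2))   # second moment (mean square): 329/6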
So how does the second moment help us get a better picture of our distribution? Because it can help us calculate something called the variance. The variance measures how spread out the distribution is around the mean. To calculate the variance, you first subtract the mean from each x sub i (this is like finding the distance of each x sub i from the mean), and then you square the result, multiply by p sub i, and sum over all the values.

What are the dimensions of the variance? The square of the dimensions of x. For example, if the dimension is a length, then the variance is a length squared. But we often want a measure of dispersion like the variance, one that has the same dimensions as x itself. That measure is the standard deviation, sigma. Sigma is defined as the square root of the variance. So if the variable x has dimensions of length, then the variance will have dimensions of length squared, but the standard deviation, sigma, will have dimensions of length, so it's comparable to x directly.

This expression for the variance looks like a pain to compute, but it has an alternative expression that is much simpler. And you get to show that as one of the exercises after the video. The alternative expression, the much simpler one, is that the variance is equal to the second moment, our old friend, minus the square of the first moment, or the mean.

Pause the video here to convince yourself that this difference is always non-negative.

This alternative expression for the variance, this much more useful one, is also the parallel axis theorem in mechanics, which says that the moment of inertia of an object about the center of mass is equal to the moment of inertia about a parallel axis shifted by h from the center of mass, minus m h squared.

So how does this analogy work? The dispersion around the mean, which sits here at the center of mass, is like the variance. The moment of inertia about the shifted axis is like the second moment, if we make h equal to the mean: it's the dispersion around zero, which is like the expected value of x squared. The mass is the sum total of all the weights, one for each x sub i, and the probabilities all add up to one, so the mass is just one in this problem. And then the h squared: well, h is the mean, so h squared is the squared mean.

So you can see the exact same structure repeated, with h, the shift of the axis, as the mean, and m, the mass, as the sum of all the probabilities, which is one. So this formula for the variance is also the parallel axis theorem.

Let's use the definitions of the moments, and also of the related quantity, the variance, and practice on a few distributions.

A simple discrete distribution is a single coin flip. Instead of thinking of the coin flip as resulting in heads or tails, let's think about the coin as turning up a zero or a one. Let p be the probability of a one.

So the mean is the weighted sum of the x sub i's, weighted by the probabilities. The mean of x is the sum of p sub i times x sub i, which is equal to one minus p, times zero, plus p, times one, which is equal to p.

What about the second moment? The expected value of x squared is the weighted sum of the x sub i's squared. The weights are the same, and we square each value, the x sub i's, but since they're all zero or one, squaring doesn't change them. So the second moment, and the third moment, and every higher moment are all p. Pause the video here and compute the variance and sketch it as a function of p.

The variance, from our convenient form of the formula, is the mean square minus the squared mean, and all the moments themselves were just p. So that's p minus p squared, which is equal to p times one minus p.
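As a quick check (a sketch of my own, not from the video), we can compute the coin-flip variance both from the definition and from the second-moment shortcut and confirm they agree:

    def coin_variance(p):
        """Variance of a coin flip that shows 1 with probability p and 0 otherwise."""
        dist = {0: 1 - p, 1: p}
        mean = sum(q * x for x, q in dist.items())                  # <x> = p
        by_definition = sum(q * (x - mean) ** 2 for x, q in dist.items())
        by_shortcut = sum(q * x**2 for x, q in dist.items()) - mean**2
        assert abs(by_definition - by_shortcut) < 1e-12             # the two forms agree
        return by_shortcut                                          # p * (1 - p)

    print(coin_variance(0.0), coin_variance(0.5), coin_variance(1.0))  # 0.0 0.25 0.0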
What does that look like? We sketch it: p on this axis, variance on that axis, and the curve starts at zero, rises to a maximum in the middle, and goes back to zero. This is p equals one, and that's p equals zero. Does that make sense?

Yeah, it does, from the meaning of variance as dispersion around the mean. So take the first extreme case of p equals zero. In other words, the coin has no chance of producing a one; it produces a zero every time. There the mean is zero and there is no dispersion, because it always produces zero. The same applies when p equals one, here at this extreme. The coin always produces a one with no dispersion. There is no variation, so there is no variance. And it's plausible that the variance should be a maximum right in between, here at p equals one half, which it is on this curve. So everything looks good. Our calculation seems reasonable and checks out in the extreme cases.

Before we go back to the airport problem, let's extend the idea of moments to continuous distributions.

Here, instead of a list of probabilities for each possible x, we have a probability density p as a function of x, where x is now a continuous variable. The discrete version of the nth moment was a sum of x sub i to the n weighted by the probabilities. Here, the nth moment, the expected value of x to the n, is an integral instead of a sum. It is weighted, as always, by the probability: x to the n times p of x, with a dx, because p of x times dx is the probability, and you add them all up over all possible values of x. That's the formula for the moments of a continuous distribution.
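In code, the weighted sum simply becomes an integral. Here is a minimal sketch, assuming SciPy is available (my choice of tool, not the video's; it previews the uniform example worked through next):

    from scipy.integrate import quad

    def continuous_moment(p, n, a, b):
        """nth moment of a density p on [a, b]: the integral of x**n * p(x) dx."""
        value, _ = quad(lambda x: x**n * p(x), a, b)
        return value

    uniform = lambda x: 1.0                               # p(x) = 1 on [0, 1]
    mean = continuous_moment(uniform, 1, 0, 1)            # 0.5
    mean_square = continuous_moment(uniform, 2, 0, 1)     # 0.333...
    print(mean_square - mean**2)                          # variance: 0.0833... = 1/12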
Let's practice on the simplest continuous distribution, the uniform distribution: x is equally likely to be any real number between zero and one. That's the distribution, and we can compute the first and second moments and the variance.

Pause the video here, use the definition of moments for a continuous distribution, and compute the mean, which is the first moment, then the second moment, and, from those two, the variance.

What you should have found for the mean: it's the integral from zero to one of one (because p of x is one) times x, dx, which is x squared over two evaluated between zero and one, which equals one half. That makes sense: the mean, the average value, is one half, right in the middle of the distribution of the possible values of x.

What about the mean square? For that, you should have found almost the same calculation: the integral of one times x squared, dx, which equals x cubed over three between zero and one, which is one third. And thus the variance, the mean square minus the squared mean, is one third minus one quarter, which is one twelfth. And that number is familiar. That's the same 1/12 that shows up in the moment of inertia of a ruler of length l and mass m. Its moment of inertia is 1/12 m l squared, which illustrates again the connection between moments of inertia and moments of distributions.

Let's apply our knowledge to understand quantitatively, or in a formal way, what happens with airport travel. Why does it seem so much longer on the way there than on the way back?

Here is the ideal travel experience to the airport: the distribution of travel times t, with the probability of each particular travel time, p of t. In the ideal world, the travel time would be very predictable. Let's say it would almost always be twenty minutes. In that case, you would allow twenty minutes to get to the airport and you would allow twenty minutes on the way back. Going there and coming back would seem the same.

But here's what travel to the airport actually looks like. Let's say the mean is still the same, but the reality is that there's lots of dispersion, and so the curve actually looks like that. Sometimes the travel time will be 30 minutes, sometimes 40, sometimes 10.

So now, what do you have to do? This is reality. Well, on the way home, it's no problem. On average, you get home in twenty minutes. You leave whenever you get out of the baggage claim. And while it's true that the trip to the airport follows the same distribution, the risk to you of not making it to the airport on time is much greater. If you just allow twenty minutes, yeah, sometimes you'll get lucky, but every once in a while it will take you twenty-five or thirty minutes.

So what you have to do is allow more time on the way there so that you don't miss your flight: maybe thirty minutes, maybe even forty minutes. It all depends on the dispersion, or standard deviation, of the distribution. On the way to the airport, you are much more aware of the distribution, if you will, than you are on the way back.
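To see the effect numerically, here is a small simulation sketch. The normal model, the 20-minute mean, the 8-minute spread, and the 95 percent target are all my own illustrative assumptions, not numbers from the video:

    import random

    def required_buffer(mean, sigma, quantile=0.95, trials=100_000):
        """Minutes to allow so you arrive on time in `quantile` of trips,
        modeling travel time as a normal distribution floored at zero."""
        times = sorted(max(0.0, random.gauss(mean, sigma)) for _ in range(trials))
        return times[int(quantile * trials)]

    print(required_buffer(20, 0))   # ~20: no dispersion, allowing the mean is enough
    print(required_buffer(20, 8))   # ~33: same mean, but the spread forces a buffer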
In this video, we saw how to calculate the moments of a distribution and how these moments can help us quickly summarize the distribution. Like life: when something is complicated, simplify it, grasp it, and understand it by appreciating its moments!