1 00:00:00,499 --> 00:00:04,490 Let X be a random variable, and let g be a function. 2 00:00:04,490 --> 00:00:08,090 We know that if g is linear, then the expected value 3 00:00:08,090 --> 00:00:12,070 of the function is the same as that linear function 4 00:00:12,070 --> 00:00:13,970 of the expected value. 5 00:00:13,970 --> 00:00:17,190 On the other hand, we know that when g is nonlinear, 6 00:00:17,190 --> 00:00:20,910 in general, these two quantities will not be related. 7 00:00:20,910 --> 00:00:23,340 But there is a special case in which 8 00:00:23,340 --> 00:00:27,730 we can establish some relation between these two quantities 9 00:00:27,730 --> 00:00:30,020 in the form of an inequality. 10 00:00:30,020 --> 00:00:33,440 This is Jensen's inequality, which we're going to develop. 11 00:00:33,440 --> 00:00:36,020 Jensen's inequality applies to the special case 12 00:00:36,020 --> 00:00:38,800 where g is a convex function. 13 00:00:38,800 --> 00:00:41,140 I'm going to define more precisely what it 14 00:00:41,140 --> 00:00:43,710 means to be convex shortly. 15 00:00:43,710 --> 00:00:46,210 But in terms of a picture, it's a function 16 00:00:46,210 --> 00:00:48,570 that has a shape of this kind. 17 00:00:48,570 --> 00:00:51,770 So it tends to curve upwards. 18 00:00:51,770 --> 00:00:54,660 So let us look at a simple example. 19 00:00:54,660 --> 00:00:57,660 Suppose that X is a random variable that 20 00:00:57,660 --> 00:01:03,010 can take two values with equal probability. 21 00:01:03,010 --> 00:01:08,080 So these two values have both probability 1/2. 22 00:01:08,080 --> 00:01:12,050 And this is our function, g of x. 23 00:01:12,050 --> 00:01:15,090 Since they take the two values with equal probability, 24 00:01:15,090 --> 00:01:18,190 the expected value is going to be in the middle. 25 00:01:18,190 --> 00:01:23,260 So this here is the expected value of X. 26 00:01:23,260 --> 00:01:25,860 And in particular, this value here 27 00:01:25,860 --> 00:01:33,320 is going to be g of the expected value of X. Now, 28 00:01:33,320 --> 00:01:38,070 the random variable g of X will take this value 29 00:01:38,070 --> 00:01:40,950 with probability 1/2, and it's going 30 00:01:40,950 --> 00:01:43,390 to take that value with probability 1/2. 31 00:01:45,930 --> 00:01:50,800 What is the average of g of X, the expected value of g of X? 32 00:01:50,800 --> 00:01:53,350 It's going to be 1/2 of this value 33 00:01:53,350 --> 00:01:57,440 plus 1/2 of this value, which you can find as follows. 34 00:01:57,440 --> 00:02:01,850 If you draw a straight line, it's going to be this much. 35 00:02:01,850 --> 00:02:10,320 So this quantity here is the expected value of g of X. 36 00:02:10,320 --> 00:02:14,080 And we see that in this example, the expected value of g of X 37 00:02:14,080 --> 00:02:19,100 is above the value of g evaluated 38 00:02:19,100 --> 00:02:21,560 at the expected value of X. 39 00:02:21,560 --> 00:02:24,560 So this is what Jensen's inequality says, 40 00:02:24,560 --> 00:02:26,579 but for a more general distribution 41 00:02:26,579 --> 00:02:33,960 for the random variable X. Let us now step back and define 42 00:02:33,960 --> 00:02:37,625 more precisely what it means for a function to be convex. 43 00:02:44,100 --> 00:02:47,579 The most general definition is the following. 44 00:02:47,579 --> 00:02:52,990 If I take any two points, x and y, 45 00:02:52,990 --> 00:02:56,200 and I take some number p between 0 and 1. 46 00:02:56,200 --> 00:03:03,680 So in that case, the number px plus 1 minus py 47 00:03:03,680 --> 00:03:07,170 is a weighted average of x and y-- 48 00:03:07,170 --> 00:03:09,660 so it's somewhere in the middle. 49 00:03:09,660 --> 00:03:13,960 And if I look at the value of my function 50 00:03:13,960 --> 00:03:18,470 at that particular point-- so this value here 51 00:03:18,470 --> 00:03:22,930 corresponds to this-- this is less 52 00:03:22,930 --> 00:03:28,120 than or equal to the weighted average of the values of g 53 00:03:28,120 --> 00:03:30,980 of x and g of y. 54 00:03:30,980 --> 00:03:32,750 So this is g of x. 55 00:03:32,750 --> 00:03:36,190 This is g of y. 56 00:03:36,190 --> 00:03:44,020 This value here is the weighted average of the two values. 57 00:03:44,020 --> 00:03:51,780 So this quantity here is p times g of x, which is this value, 58 00:03:51,780 --> 00:03:56,840 plus 1 minus p, g of y. 59 00:03:56,840 --> 00:04:02,670 Convexity means that this value is below that value. 60 00:04:02,670 --> 00:04:08,050 Or in other words, whenever I take two points on this curve 61 00:04:08,050 --> 00:04:11,760 and join them by a segment, then the function 62 00:04:11,760 --> 00:04:14,450 lies underneath that segment. 63 00:04:14,450 --> 00:04:17,160 This is one possible definition. 64 00:04:17,160 --> 00:04:19,250 Now, in terms of a picture, we see 65 00:04:19,250 --> 00:04:22,540 that a convex function tends to curve upwards. 66 00:04:22,540 --> 00:04:26,260 This means that the derivative or the slope of the function 67 00:04:26,260 --> 00:04:28,450 keeps increasing. 68 00:04:28,450 --> 00:04:32,440 An increasing slope means that the second derivative 69 00:04:32,440 --> 00:04:35,080 of that function is non-negative. 70 00:04:35,080 --> 00:04:39,409 And that could be an alternative definition of convexity. 71 00:04:39,409 --> 00:04:42,330 It turns out that if you have a function that's 72 00:04:42,330 --> 00:04:47,836 twice differentiable, these two definitions are equivalent. 73 00:04:47,836 --> 00:04:49,460 On the other hand, the first definition 74 00:04:49,460 --> 00:04:51,270 is a little more general, because it also 75 00:04:51,270 --> 00:04:53,890 applies to functions that are not smooth. 76 00:04:53,890 --> 00:04:58,860 So for example, the function absolute value of x 77 00:04:58,860 --> 00:05:00,960 is a convex one. 78 00:05:00,960 --> 00:05:03,575 But it's not differentiable at zero. 79 00:05:06,440 --> 00:05:11,010 Finally, there's another way of defining convexity, 80 00:05:11,010 --> 00:05:13,630 and it is the following property, 81 00:05:13,630 --> 00:05:16,170 again for differentiable functions. 82 00:05:16,170 --> 00:05:19,010 What it says is the following. 83 00:05:19,010 --> 00:05:27,910 If I fix a certain point, c-- to use the same diagram 84 00:05:27,910 --> 00:05:32,780 let's say that this is c-- this value here 85 00:05:32,780 --> 00:05:35,630 is going to be g of c. 86 00:05:35,630 --> 00:05:40,610 I look at the derivative of that function and take 87 00:05:40,610 --> 00:05:45,690 this quantity, which is the derivative times how far I 88 00:05:45,690 --> 00:05:46,830 am going. 89 00:05:46,830 --> 00:05:50,290 This quantity here is a first-order Taylor series 90 00:05:50,290 --> 00:05:52,820 approximation of my function. 91 00:05:52,820 --> 00:05:56,320 So it corresponds to this black line. 92 00:05:56,320 --> 00:06:00,460 What this condition says is that the function 93 00:06:00,460 --> 00:06:06,596 lies on top of that tangent line to my function. 94 00:06:06,596 --> 00:06:11,950 It is not too difficult to show that this condition, 95 00:06:11,950 --> 00:06:15,370 non-negative second derivatives, implies this condition. 96 00:06:15,370 --> 00:06:18,820 What you do is that you write down the second order Taylor 97 00:06:18,820 --> 00:06:22,220 approximation of your function g. 98 00:06:22,220 --> 00:06:24,135 And then because the second order term 99 00:06:24,135 --> 00:06:26,880 is going to be non-negative because of that condition, 100 00:06:26,880 --> 00:06:29,770 you're going to get an inequality in this form. 101 00:06:29,770 --> 00:06:32,960 But in any case, this inequality is pretty intuitive. 102 00:06:32,960 --> 00:06:36,220 And we could take this one just as our definition 103 00:06:36,220 --> 00:06:38,700 of convexity-- that is, a function is convex 104 00:06:38,700 --> 00:06:41,520 if it has the property that whenever I draw 105 00:06:41,520 --> 00:06:44,970 a tangent to my curve, the function lies 106 00:06:44,970 --> 00:06:48,740 on top of this linear function. 107 00:06:48,740 --> 00:06:53,680 So now let's move back to probability. 108 00:06:53,680 --> 00:06:55,750 Suppose that g is convex. 109 00:06:55,750 --> 00:06:58,190 This is true for every x. 110 00:06:58,190 --> 00:07:02,820 So in particular, if I have my random variable, 111 00:07:02,820 --> 00:07:08,710 capital X, no matter what my random variable is, 112 00:07:08,710 --> 00:07:22,750 we're going to get this kind of inequality, no matter what. 113 00:07:22,750 --> 00:07:26,280 And here what I they left blank is c. 114 00:07:26,280 --> 00:07:28,040 Now, c is a number. 115 00:07:28,040 --> 00:07:30,110 This is true for any number. 116 00:07:30,110 --> 00:07:34,020 So in particular, it's true if I use as my number 117 00:07:34,020 --> 00:07:46,610 the expected value of X. So we have this inequality 118 00:07:46,610 --> 00:07:49,460 that's true now in terms of random variables. 119 00:07:49,460 --> 00:07:54,220 No matter what capital X happens to be, this will be valid. 120 00:07:54,220 --> 00:07:58,100 And now let us take expectations of both sides. 121 00:07:58,100 --> 00:08:02,550 What we obtain is that the expected value of g of X 122 00:08:02,550 --> 00:08:05,300 is larger than-- now, this is a number, 123 00:08:05,300 --> 00:08:07,160 so the expected value of that number 124 00:08:07,160 --> 00:08:15,820 is the number itself plus the expected value of this term. 125 00:08:15,820 --> 00:08:17,790 This quantity is a number. 126 00:08:17,790 --> 00:08:22,240 The expected value of X minus the expected value of X 127 00:08:22,240 --> 00:08:25,200 is equal to 0. 128 00:08:25,200 --> 00:08:30,450 And we have established this particular fact, 129 00:08:30,450 --> 00:08:32,599 which is true for any convex function. 130 00:08:35,280 --> 00:08:38,520 So this is Jensen's inequality. 131 00:08:38,520 --> 00:08:41,480 Let's apply it to some examples. 132 00:08:41,480 --> 00:08:45,890 Let's consider the function g, which 133 00:08:45,890 --> 00:08:48,690 is the quadratic function. 134 00:08:48,690 --> 00:08:50,860 Clearly, this is a convex function. 135 00:08:50,860 --> 00:08:52,700 It has this kind of shape. 136 00:08:52,700 --> 00:08:55,810 And the second derivative of this function is positive. 137 00:08:55,810 --> 00:08:58,800 That's another way of verifying it. 138 00:08:58,800 --> 00:09:00,980 Jensen's inequality is going to tell us 139 00:09:00,980 --> 00:09:05,620 something about the expected value of X squared. 140 00:09:05,620 --> 00:09:07,770 Now, for this expectation, we already 141 00:09:07,770 --> 00:09:12,970 know that this is equal to the variance of X plus the square 142 00:09:12,970 --> 00:09:15,310 of the expected value. 143 00:09:15,310 --> 00:09:18,150 And since the variance is always non-negative, 144 00:09:18,150 --> 00:09:20,210 we obtain this inequality. 145 00:09:23,660 --> 00:09:26,280 This is consistent with Jensen's inequality. 146 00:09:26,280 --> 00:09:30,720 Jensen's inequality tells us that E of g of X, 147 00:09:30,720 --> 00:09:33,780 with g the quadratic function, is 148 00:09:33,780 --> 00:09:37,270 larger than or equal to the square-- that 149 00:09:37,270 --> 00:09:41,270 is, g of the expected value. 150 00:09:41,270 --> 00:09:43,480 So for the case of the square function, 151 00:09:43,480 --> 00:09:46,440 Jensen's inequality did not tell us anything 152 00:09:46,440 --> 00:09:47,800 that we didn't know. 153 00:09:47,800 --> 00:09:51,180 But it's nice to confirm that it is consistent. 154 00:09:51,180 --> 00:09:53,620 But we could use Jensen's inequality 155 00:09:53,620 --> 00:09:57,490 in another setting where the answer might not be as obvious. 156 00:09:57,490 --> 00:10:00,400 For example, take the function X to the fourth. 157 00:10:00,400 --> 00:10:02,680 This is also a convex function. 158 00:10:02,680 --> 00:10:06,070 And Jensen's inequality is going to tell us 159 00:10:06,070 --> 00:10:12,510 that the fourth power of the expected value is less than 160 00:10:12,510 --> 00:10:17,555 or equal to the expected value of X to the fourth. 161 00:10:20,390 --> 00:10:23,160 Another case of a convex function 162 00:10:23,160 --> 00:10:25,860 is the negative logarithm. 163 00:10:30,650 --> 00:10:33,370 Remember that the logarithmic function 164 00:10:33,370 --> 00:10:38,040 has a shape of this kind, which curves the opposite way. 165 00:10:38,040 --> 00:10:40,300 So it's called a concave function. 166 00:10:40,300 --> 00:10:42,910 But if you take the negative of this function, 167 00:10:42,910 --> 00:10:47,820 then you're going to get something that is convex. 168 00:10:47,820 --> 00:10:50,010 So by applying Jensen's inequality 169 00:10:50,010 --> 00:10:53,400 to this setting, what we obtain is 170 00:10:53,400 --> 00:11:02,020 that g, which is minus log of the expected value of X, 171 00:11:02,020 --> 00:11:12,860 is less than or equal to the expected value of minus log X. 172 00:11:12,860 --> 00:11:18,230 And then we can remove the minus signs from both sides. 173 00:11:18,230 --> 00:11:21,370 And that is going to reverse the inequality. 174 00:11:21,370 --> 00:11:28,120 And we will obtain that the logarithm of the expected value 175 00:11:28,120 --> 00:11:38,190 of X is larger than or equal to the expected value of log X. 176 00:11:38,190 --> 00:11:41,170 So in this case for the logarithmic function, 177 00:11:41,170 --> 00:11:44,840 the inequality goes in the opposite direction. 178 00:11:44,840 --> 00:11:47,320 The reason is that the logarithmic function 179 00:11:47,320 --> 00:11:51,800 is a concave function, not a convex one. 180 00:11:51,800 --> 00:11:55,560 And by arguing similar to this example, 181 00:11:55,560 --> 00:11:59,080 a concave function is the negative of a convex function. 182 00:11:59,080 --> 00:12:02,200 And for concave functions, Jensen's inequality still 183 00:12:02,200 --> 00:12:06,800 holds, but the inequality goes the opposite way. 184 00:12:06,800 --> 00:12:09,650 Jensen's inequality turns out to be quite useful. 185 00:12:09,650 --> 00:12:12,010 In many cases, we want to say something 186 00:12:12,010 --> 00:12:14,180 about the expected value of g of X, 187 00:12:14,180 --> 00:12:17,661 and Jensen's inequality allows us to do that.