1 00:00:01 --> 00:00:03 The following content is provided under a Creative 2 00:00:03 --> 00:00:05 Commons license. Your support will help MIT 3 00:00:05 --> 00:00:08 OpenCourseWare continue to offer high quality educational 4 00:00:08 --> 00:00:13 resources for free. To make a donation or to view 5 00:00:13 --> 00:00:18 additional materials from hundreds of MIT courses, 6 00:00:18 --> 00:00:23 visit MIT OpenCourseWare at ocw.mit.edu. 7 00:00:25 --> 00:00:29 Last time we saw things about gradients and directional 8 00:00:29 --> 00:00:32 derivatives. Before that we studied how to 9 00:00:32 --> 00:00:37 look for minima and maxima of functions of several variables. 10 00:00:37 --> 00:00:41 And today we are going to look again at min/max problems but in 11 00:00:41 --> 00:00:45 a different setting, namely, one for variables that 12 00:00:45 --> 00:00:49 are not independent. And so what we will see is you 13 00:00:49 --> 00:00:52 may have heard of Lagrange multipliers. 14 00:00:52 --> 00:00:59 And this is the one point in the term when I can shine with 15 00:00:59 --> 00:01:05 my French accent and say Lagrange's name properly. 16 00:01:05 --> 00:01:08 OK. What are Lagrange multipliers 17 00:01:08 --> 00:01:13 about? Well, the goal is to minimize 18 00:01:13 --> 00:01:19 or maximize a function of several variables. 19 00:01:19 --> 00:01:22 Let's say, for example, f of x, y, z, 20 00:01:22 --> 00:01:27 but where these variables are no longer independent. 21 00:01:27 --> 00:01:41 22 00:01:41 --> 00:01:43 They are not independent. That means that there is a 23 00:01:43 --> 00:01:47 relation between them. The relation is maybe some 24 00:01:47 --> 00:01:52 equation of the form g of x, y, z equals some constant. 25 00:01:52 --> 00:01:57 You take the relation between x, y, z, you call that g and 26 00:01:57 --> 00:02:02 that gives you the constraint. 
And your goal is to minimize f 27 00:02:02 --> 00:02:05 only over those values of x, y, z that satisfy the 28 00:02:05 --> 00:02:07 constraint. What is one way to do that? 29 00:02:07 --> 00:02:10 Well, one way to do that, if the constraint is very 30 00:02:10 --> 00:02:14 simple, we can maybe solve for one of the variables. 31 00:02:14 --> 00:02:17 Maybe we can solve this equation for one of the 32 00:02:17 --> 00:02:21 variables, plug it back into f, and then we have a usual 33 00:02:21 --> 00:02:25 min/max problem that we have seen how to do. 34 00:02:25 --> 00:02:28 The problem is sometimes you cannot actually solve for x, 35 00:02:28 --> 00:02:31 y, z in here because this condition is too complicated and 36 00:02:31 --> 00:02:38 then we need a new method. That is what we are going to do. 37 00:02:38 --> 00:02:41 Why would we care about that? Well, one example is actually 38 00:02:41 --> 00:02:43 in physics. Maybe you have seen in 39 00:02:43 --> 00:02:47 thermodynamics that you study quantities about gases, 40 00:02:47 --> 00:02:50 and those quantities involve pressure, 41 00:02:50 --> 00:02:53 volume and temperature. And pressure, 42 00:02:53 --> 00:02:56 volume and temperature are not independent of each other. 43 00:02:56 --> 00:02:59 I mean you know probably the equation PV = nRT. 44 00:02:59 --> 00:03:01 And, of course, there you could actually solve 45 00:03:01 --> 00:03:03 to express things in terms of one or the other. 46 00:03:03 --> 00:03:07 But sometimes it is more convenient to keep all three 47 00:03:07 --> 00:03:09 variables but treat them as constrained. 48 00:03:09 --> 00:03:19 It is just an example of a situation where you might want 49 00:03:19 --> 00:03:24 to do this. Anyway, we will look mostly at 50 00:03:24 --> 00:03:28 particular examples, but just to point out that this 51 00:03:28 --> 00:03:32 is useful when you study gases in physics.
52 00:03:32 --> 00:03:35 The first observation is we cannot use our usual method of 53 00:03:35 --> 00:03:36 looking for critical points of f. 54 00:03:36 --> 00:03:40 Because critical points of f typically will not satisfy this 55 00:03:40 --> 00:03:43 condition and so won't be good solutions. 56 00:03:43 --> 00:03:49 We need something else. Let's look at an example, 57 00:03:49 --> 00:03:53 and we will see how that leads us to the method. 58 00:03:53 --> 00:04:03 For example, let's say that I want to find 59 00:04:03 --> 00:04:17 the point closest to the origin -- -- on the hyperbola xy equals 60 00:04:17 --> 00:04:23 3 in the plane. That means I have this 61 00:04:23 --> 00:04:26 hyperbola, and I am asking myself what is the point on it 62 00:04:26 --> 00:04:29 that is the closest to the origin? 63 00:04:29 --> 00:04:31 I mean we can solve this by elementary geometry, 64 00:04:31 --> 00:04:34 we don't need actually Lagrange multipliers, 65 00:04:34 --> 00:04:38 but we are going to do it with Lagrange multipliers because it 66 00:04:38 --> 00:04:41 is a pretty good example. What does it mean? 67 00:04:41 --> 00:04:47 Well, it means that we want to minimize distance to the origin. 68 00:04:47 --> 00:04:49 What is the distance to the origin? 69 00:04:49 --> 00:04:53 If I have a point, at coordinates (x, 70 00:04:53 --> 00:04:58 y) and then the distance to the origin is square root of x 71 00:04:58 --> 00:05:02 squared plus y squared. Well, do we really want to 72 00:05:02 --> 00:05:05 minimize that or can we minimize something easier? 73 00:05:05 --> 00:05:06 Yeah. Maybe we can minimize the 74 00:05:06 --> 00:05:14 square of a distance. Let's forget this guy and 75 00:05:14 --> 00:05:23 instead -- Actually, we will minimize f of x, 76 00:05:23 --> 00:05:27 y equals x squared plus y squared, 77 00:05:27 --> 00:05:39 that looks better, subject to the constraint xy = 78 00:05:39 --> 00:05:44 3. 
And so we will call this thing 79 00:05:44 --> 00:05:50 g of x, y to illustrate the general method. 80 00:05:50 --> 00:05:58 Let's look at a picture. Here you can see in yellow the 81 00:05:58 --> 00:06:02 hyperbola xy equals three. And we are going to look for 82 00:06:02 --> 00:06:05 the points that are the closest to the origin. 83 00:06:05 --> 00:06:08 What can we do? Well, for example, 84 00:06:08 --> 00:06:13 we can plot the function x squared plus y squared, 85 00:06:13 --> 00:06:17 function f. That is the contour plot of f 86 00:06:17 --> 00:06:21 with the hyperbola on top of it. Now let's see what we can do 87 00:06:21 --> 00:06:25 with that. Well, let's ask ourselves, 88 00:06:25 --> 00:06:30 for example, if I look at points where f 89 00:06:30 --> 00:06:34 equals 20 now. I think I am at 20 but you 90 00:06:34 --> 00:06:37 cannot really see it. That is a circle of points 91 00:06:37 --> 00:06:41 whose distance squared is 20. Well, can I find a solution if 92 00:06:41 --> 00:06:44 I am on the hyperbola? Yes, there are four points at 93 00:06:44 --> 00:06:46 this distance. Can I do better? 94 00:06:46 --> 00:06:49 Well, let's decrease the distance. 95 00:06:49 --> 00:06:52 Yes, we can still find points on the hyperbola and so on. 96 00:06:52 --> 00:06:56 Except if we go too low then there are no points on this 97 00:06:56 --> 00:07:00 circle anymore on the hyperbola. If we decrease the value of f 98 00:07:00 --> 00:07:03 that we want to look at, there will be some limit value beyond 99 00:07:03 --> 00:07:07 which we cannot go, and that is the minimum of f. 100 00:07:07 --> 00:07:13 We are trying to look for the smallest value of f that will 101 00:07:13 --> 00:07:17 actually be realized on the hyperbola. 102 00:07:17 --> 00:07:20 When does that happen? Well, I have to backtrack a 103 00:07:20 --> 00:07:23 little bit. It seems like the limiting case 104 00:07:23 --> 00:07:26 is basically here.
It is when the circle is 105 00:07:26 --> 00:07:31 tangent to the hyperbola. That is the smallest circle 106 00:07:31 --> 00:07:37 that will hit the hyperbola. If I take a larger value of f, 107 00:07:37 --> 00:07:39 I will have solutions. If I take a smaller value of f, 108 00:07:39 --> 00:07:41 I will not have any solutions anymore. 109 00:07:41 --> 00:07:49 So, that is the situation that we want to solve for. 110 00:07:49 --> 00:07:54 How do we find that minimum? Well, a key observation that is 111 00:07:54 --> 00:07:58 valid on this picture, and that actually remains true 112 00:07:58 --> 00:08:03 in the completely general case, is that when we have a minimum 113 00:08:03 --> 00:08:09 the level curve of f is actually tangent to our hyperbola. 114 00:08:09 --> 00:08:15 It is tangent to the set of points where xy 115 00:08:15 --> 00:08:20 equals three, to the hyperbola. 116 00:08:20 --> 00:08:32 Let's write that down. We observe that at the minimum 117 00:08:32 --> 00:08:49 the level curve of f is tangent to the hyperbola. 118 00:08:49 --> 00:08:53 Remember, the hyperbola is given by the equation g equals 119 00:08:53 --> 00:08:56 three, so it is a level curve of g. 120 00:08:56 --> 00:08:59 We have a level curve of f and a level curve of g that are 121 00:08:59 --> 00:09:03 tangent to each other. And I claim that is going to be 122 00:09:03 --> 00:09:07 the general situation that we are interested in. 123 00:09:07 --> 00:09:12 How do we try to solve for points where this happens? 124 00:09:12 --> 00:09:28 125 00:09:28 --> 00:09:36 How do we find x, y where the level curves of f 126 00:09:36 --> 00:09:47 and g are tangent to each other? Let's think for a second. 127 00:09:47 --> 00:09:51 If the two level curves are tangent to each other that means 128 00:09:51 --> 00:09:57 they have the same tangent line. That means that the normal 129 00:09:57 --> 00:10:03 vectors should be parallel. Let me maybe draw a picture 130 00:10:03 --> 00:10:06 here.
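The shrinking-circle picture can be checked with a few lines of Python. This is only a sketch: the function name `count_intersections` and the parameter `k` (the constant in xy = k, assumed nonzero) are made up here and are not from the lecture. Substituting y = k/x into x^2 + y^2 = c gives a quadratic in t = x^2, and the sign of its discriminant tells how many points the circle and the hyperbola share.

```python
def count_intersections(c, k=3.0, tol=1e-12):
    """Count points where the circle x^2 + y^2 = c meets the hyperbola xy = k.

    Assumes k != 0. Substituting y = k/x gives t^2 - c*t + k^2 = 0 with t = x^2.
    Each positive root t contributes two points, x = +sqrt(t) and x = -sqrt(t).
    """
    if c <= 0:
        return 0  # no circle of positive radius
    disc = c * c - 4.0 * k * k  # discriminant of the quadratic in t
    if disc < -tol:
        return 0  # circle too small: it misses the hyperbola
    if abs(disc) <= tol:
        return 2  # tangent case: one double root t = c/2
    return 4      # two distinct positive roots (their product k^2 is positive)


for c in (20, 6, 5):
    print(c, count_intersections(c))
```

With k = 3 the count drops from four points to two exactly at c = 6, the tangent case, and to none below it.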
This is the level curve maybe f 131 00:10:06 --> 00:10:11 equals something. And this is the level curve g 132 00:10:11 --> 00:10:16 equals constant. Here my constant is three. 133 00:10:16 --> 00:10:20 Well, if I look for gradient vectors, the gradient of f will 134 00:10:20 --> 00:10:23 be perpendicular to the level curve of f. 135 00:10:23 --> 00:10:27 The gradient of g will be perpendicular to the level curve 136 00:10:27 --> 00:10:29 of g. They don't have any reason to 137 00:10:29 --> 00:10:32 be of the same size, but they have to be parallel to 138 00:10:32 --> 00:10:35 each other. Of course, they could also be 139 00:10:35 --> 00:10:38 parallel pointing in opposite directions. 140 00:10:38 --> 00:10:48 But the key point is that when this happens the gradient of f 141 00:10:48 --> 00:10:54 is parallel to the gradient of g. 142 00:10:54 --> 00:11:03 Well, let's check that. Here is a point. 143 00:11:03 --> 00:11:05 And I can plot the gradient of f in blue. 144 00:11:05 --> 00:11:08 The gradient of g in yellow. And you see, 145 00:11:08 --> 00:11:12 in most of these places, somehow the two gradients are 146 00:11:12 --> 00:11:14 not really parallel. Actually, I should not be 147 00:11:14 --> 00:11:17 looking at random points. I should be looking only on the 148 00:11:17 --> 00:11:19 hyperbola. I want points on the hyperbola 149 00:11:19 --> 00:11:22 where the two gradients are parallel. 150 00:11:22 --> 00:11:28 Well, when does that happen? Well, it looks like it will 151 00:11:28 --> 00:11:31 happen here. When I am at a minimum, 152 00:11:31 --> 00:11:34 the two gradient vectors are parallel. 153 00:11:34 --> 00:11:37 It is not really proof. It is an example that seems to 154 00:11:37 --> 00:11:43 be convincing. So far things work pretty well. 155 00:11:43 --> 00:11:46 How do we decide if two vectors are parallel? 156 00:11:46 --> 00:11:50 Well, they are parallel when they are proportional to each 157 00:11:50 --> 00:11:54 other. 
You can write one of them as a 158 00:11:54 --> 00:12:02 constant times the other one, and that constant usually one 159 00:12:02 --> 00:12:07 uses the Greek letter lambda. I don't know if you have seen 160 00:12:07 --> 00:12:10 it before. It is the Greek letter for L. 161 00:12:10 --> 00:12:15 And probably, I am sure, it is somebody's 162 00:12:15 --> 00:12:22 idea of paying tribute to Lagrange by putting an L in 163 00:12:22 --> 00:12:25 there. Lambda is just a constant. 164 00:12:25 --> 00:12:31 And we are looking for a scalar lambda and points x and y where 165 00:12:31 --> 00:12:33 this holds. In fact, 166 00:12:33 --> 00:12:37 what we are doing is replacing min/max problems in two 167 00:12:37 --> 00:12:41 variables with a constraint between them by a set of 168 00:12:41 --> 00:12:47 equations involving, you will see, three variables. 169 00:12:47 --> 00:12:54 We had min/max with two variables x, y, 170 00:12:54 --> 00:13:00 but not independent. We had a constraint g of x, 171 00:13:00 --> 00:13:06 y equals constant. And that becomes something new. 172 00:13:06 --> 00:13:12 That becomes a system of equations where we have to 173 00:13:12 --> 00:13:19 solve, well, let's write down what it means for gradient f to 174 00:13:19 --> 00:13:26 be proportional to gradient g. That means that f sub x should 175 00:13:26 --> 00:13:32 be lambda times g sub x, and f sub y should be lambda 176 00:13:32 --> 00:13:36 times g sub y. Because the gradient vectors 177 00:13:36 --> 00:13:39 here are f sub x, f sub y and g sub x, 178 00:13:39 --> 00:13:43 g sub y. If you have a third variable z 179 00:13:43 --> 00:13:49 then you have also an equation f sub z equals lambda g sub z. 180 00:13:49 --> 00:13:53 Now, let's see. How many unknowns do we have in 181 00:13:53 --> 00:13:55 these equations? Well, there is x, 182 00:13:55 --> 00:14:01 there is y and there is lambda. We have three unknowns and have 183 00:14:01 --> 00:14:06 only two equations. Something is missing.
184 00:14:06 --> 00:14:10 Well, I mean x and y are not actually independent. 185 00:14:10 --> 00:14:14 They are related by the equation g of x, 186 00:14:14 --> 00:14:21 y equals c, so we need to add the constraint g equals c. 187 00:14:21 --> 00:14:26 And now we have three equations involving three variables. 188 00:14:26 --> 00:14:39 Let's see how that works. Here remember we have f equals 189 00:14:39 --> 00:14:45 x squared plus y squared and g = xy. What is f sub x? 190 00:14:45 --> 00:14:52 It is going to be 2x equals lambda times, 191 00:14:52 --> 00:14:55 what is g sub x? y. 192 00:14:55 --> 00:14:59 Maybe I should write here f sub x equals lambda g sub x just to 193 00:14:59 --> 00:15:03 remind you. Then we have f sub y equals 194 00:15:03 --> 00:15:10 lambda g sub y. F sub y is 2y equals lambda 195 00:15:10 --> 00:15:18 times g sub y, which is x. And then our third equation g 196 00:15:18 --> 00:15:22 equals c becomes xy equals three. 197 00:15:22 --> 00:15:26 So, that is what you would have to solve. 198 00:15:26 --> 00:15:33 Any questions at this point? No. 199 00:15:33 --> 00:15:44 Yes? How do I know the direction of 200 00:15:44 --> 00:15:47 a gradient? Do you mean how do I know that 201 00:15:47 --> 00:15:50 it is perpendicular to a level curve? 202 00:15:50 --> 00:15:54 Oh, how do I know if it points in that direction or the 203 00:15:54 --> 00:15:56 opposite one? Well, that depends. 204 00:15:56 --> 00:15:59 I mean we saw last time that the gradient is 205 00:15:59 --> 00:16:02 perpendicular to the level curve and points towards higher values of 206 00:16:02 --> 00:16:05 the function. So it could be -- Wait. 207 00:16:05 --> 00:16:08 What did I have? It could be that my gradient 208 00:16:08 --> 00:16:11 vectors up there actually point in opposite directions.
209 00:16:11 --> 00:16:15 It doesn't matter to me because it will still look the same in 210 00:16:15 --> 00:16:18 terms of the equation, just lambda will be positive or 211 00:16:18 --> 00:16:22 negative, depending on the case. I can handle both situations. 212 00:16:22 --> 00:16:30 It's not a problem. I can allow lambda to be 213 00:16:30 --> 00:16:34 positive or negative. Well, in this example, 214 00:16:34 --> 00:16:35 it looks like lambda will be positive. 215 00:16:35 --> 00:16:38 If you look at the picture on the plot. 216 00:16:38 --> 00:16:48 Yes? Well, because actually they are 217 00:16:48 --> 00:16:51 not equal to each other. If you look at this point where 218 00:16:51 --> 00:16:55 the hyperbola and the circle touch each other, 219 00:16:55 --> 00:16:58 first of all, I don't know which circle I am 220 00:16:58 --> 00:17:01 going to look at. I am trying to solve, 221 00:17:01 --> 00:17:04 actually, for the radius of the circle. 222 00:17:04 --> 00:17:07 I am trying to find what the minimum value of f is. 223 00:17:07 --> 00:17:10 And, second, at that point, 224 00:17:10 --> 00:17:14 the value of f and the value of g are not equal. 225 00:17:14 --> 00:17:17 g is equal to three because I want the hyperbola xy equals 226 00:17:17 --> 00:17:19 three. The value of f will be the 227 00:17:19 --> 00:17:22 square of a distance, whatever that is. 228 00:17:22 --> 00:17:27 I think it will end up being 6, but we will see. 229 00:17:27 --> 00:17:29 So, you cannot really set them equal because you don't know 230 00:17:29 --> 00:17:45 what f is equal to in advance. Yes? 231 00:17:45 --> 00:17:49 Not quite. Actually, here I am just using 232 00:17:49 --> 00:17:52 this idea of finding a point closest to the origin to 233 00:17:52 --> 00:17:55 illustrate an example of a min/max problem. 234 00:17:55 --> 00:17:59 The general problem we are trying to solve is minimize f 235 00:17:59 --> 00:18:03 subject to g equals constant.
And what we are going to do for 236 00:18:03 --> 00:18:07 that is we are really going to say instead let's look at places 237 00:18:07 --> 00:18:10 where gradient f and gradient g are parallel to each other and 238 00:18:10 --> 00:18:14 solve the equations for that. I think we completely lose the 239 00:18:14 --> 00:18:19 notion of closest point if we just look at these equations. 240 00:18:19 --> 00:18:21 We don't really say anything about closest points anymore. 241 00:18:21 --> 00:18:24 Of course, that is what they mean in the end. 242 00:18:24 --> 00:18:28 But, in the general setting, there is no closest point 243 00:18:28 --> 00:18:31 involved anymore. OK. 244 00:18:31 --> 00:18:40 Yes? Yes. 245 00:18:40 --> 00:18:43 It is always going to be the case that, 246 00:18:43 --> 00:18:46 at the minimum, or at the maximum of a function 247 00:18:46 --> 00:18:49 subject to a constraint, the level curves of f and the 248 00:18:49 --> 00:18:52 level curves of g will be tangent to each other. 249 00:18:52 --> 00:18:54 That is the basis for this method. 250 00:18:54 --> 00:19:00 I am going to justify that soon. It could be minimum or maximum. 251 00:19:00 --> 00:19:02 In three-dimensions it could even be a saddle point. 252 00:19:02 --> 00:19:03 And, in fact, I should say in advance, 253 00:19:03 --> 00:19:06 this method will not tell us whether it is a minimum or a 254 00:19:06 --> 00:19:08 maximum. We do not have any way of 255 00:19:08 --> 00:19:10 knowing, except for testing values. 256 00:19:10 --> 00:19:13 We cannot use second derivative tests or anything like that. 257 00:19:13 --> 00:19:21 I will get back to that. Yes? 258 00:19:21 --> 00:19:23 Yes. Here you can set y equal to 259 00:19:23 --> 00:19:26 three over x. Then you can minimize x squared 260 00:19:26 --> 00:19:30 plus nine over x squared. In general, if I am trying to 261 00:19:30 --> 00:19:33 solve a more complicated problem, I might not be able to 262 00:19:33 --> 00:19:35 solve.
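For this particular example the elimination mentioned in the answer can be carried through by hand. A short worked version, where the name h for the resulting one-variable function is introduced here and is not from the lecture:

```latex
\begin{aligned}
y &= \frac{3}{x}, \qquad h(x) = f\!\left(x, \tfrac{3}{x}\right) = x^2 + \frac{9}{x^2},\\
h'(x) &= 2x - \frac{18}{x^3} = 0 \iff x^4 = 9 \iff x = \pm\sqrt{3},\\
h(\pm\sqrt{3}) &= 3 + \frac{9}{3} = 6 .
\end{aligned}
```

This agrees with the value 6 guessed earlier for the minimum of f on the hyperbola.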
I am doing an example where, 263 00:19:35 --> 00:19:38 indeed, here you could solve and remove one variable, 264 00:19:38 --> 00:19:41 but you cannot always do that. And this method will still work. 265 00:19:41 --> 00:19:47 The other one won't. OK. 266 00:19:47 --> 00:19:53 I don't see any other questions. Are there any other questions? 267 00:19:53 --> 00:19:56 No. OK. 268 00:19:56 --> 00:20:02 I see a lot of students stretching and so on, 269 00:20:02 --> 00:20:08 so it is very confusing for me. How do we solve these equations? 270 00:20:08 --> 00:20:14 Well, the answer is in general we might be in deep trouble. 271 00:20:14 --> 00:20:18 There is no general method for solving the equations that you 272 00:20:18 --> 00:20:21 get from this method. You just have to think about 273 00:20:21 --> 00:20:25 them. Sometimes it will be very easy. 274 00:20:25 --> 00:20:28 Sometimes it will be so hard that you cannot actually do it 275 00:20:28 --> 00:20:31 without the computer. Sometimes it will be just hard 276 00:20:31 --> 00:20:33 enough to be on Part B of this week's problem set. 277 00:20:33 --> 00:20:50 278 00:20:50 --> 00:20:56 I claim in this case we can actually do it without so much 279 00:20:56 --> 00:21:03 trouble, because actually we can think of this as a two by two 280 00:21:03 --> 00:21:10 linear system in x and y. Well, let me do something. 281 00:21:10 --> 00:21:18 Let me rewrite the first two equations as 2x - lambda y = 0. 282 00:21:18 --> 00:21:30 And lambda x - 2y = 0. And xy = 3. 283 00:21:30 --> 00:21:36 That is what we want to solve. Well, I can put this into 284 00:21:36 --> 00:21:41 matrix form. The matrix with rows (2, -lambda) 285 00:21:41 --> 00:21:48 and (lambda, -2), times (x, y), equals (0, 0). 286 00:21:48 --> 00:21:52 Now, how do I solve a linear system matrix times x, 287 00:21:52 --> 00:21:54 y equals zero? Well, I always have an obvious 288 00:21:54 --> 00:21:56 solution. X and y both equal to zero. 289 00:21:56 --> 00:22:02 Is that a good solution?
No, because zero times zero is 290 00:22:02 --> 00:22:07 not three. We want another solution. 291 00:22:07 --> 00:22:14 The trivial solution (0, 0) does not solve the 292 00:22:14 --> 00:22:20 constraint equation xy equals three, so we want another 293 00:22:20 --> 00:22:24 solution. When do we have another 294 00:22:24 --> 00:22:29 solution? Well, when the determinant of the 295 00:22:29 --> 00:22:37 matrix is zero. We have other solutions that 296 00:22:37 --> 00:22:46 exist only if the determinant of the matrix M is zero. 297 00:22:46 --> 00:23:01 M is this guy. Let's compute the determinant. 298 00:23:01 --> 00:23:08 Well, that seems to be negative four plus lambda squared. 299 00:23:08 --> 00:23:15 That is zero exactly when lambda squared equals four, 300 00:23:15 --> 00:23:20 which is lambda is plus or minus two. 301 00:23:20 --> 00:23:25 Already you see here it is at a level of difficulty that is 302 00:23:25 --> 00:23:30 a little bit much for an exam but perfectly fine for a problem 303 00:23:30 --> 00:23:33 set or for a beautiful lecture like this one. 304 00:23:33 --> 00:23:37 How do we deal with -- Well, we have two cases to look at. 305 00:23:37 --> 00:23:40 Lambda equals two or lambda equals minus two. 306 00:23:40 --> 00:23:43 Let's start with lambda equals two. 307 00:23:43 --> 00:23:47 If I set lambda equals two, what does this equation become? 308 00:23:47 --> 00:23:53 Well, it becomes x equals y. This one becomes y equals x. 309 00:23:53 --> 00:23:57 Well, they seem to be the same. x equals y. 310 00:23:57 --> 00:24:01 And then the equation xy equals three becomes, 311 00:24:01 --> 00:24:06 well, x squared equals three. I have two solutions. 312 00:24:06 --> 00:24:15 One is x equals root three and, therefore, y equals root three 313 00:24:15 --> 00:24:23 as well, or negative root three and negative root three. 314 00:24:23 --> 00:24:26 Let's look at the other case.
If I set lambda equal to 315 00:24:26 --> 00:24:30 negative two then I get 2x equals negative 2y. 316 00:24:30 --> 00:24:37 That means x equals negative y. The second one, 317 00:24:37 --> 00:24:40 2y equals negative 2x. That is y equals negative x. 318 00:24:40 --> 00:24:45 Well, that is the same thing. And xy equals three becomes 319 00:24:45 --> 00:24:51 negative x squared equals three. Can we solve that? 320 00:24:51 --> 00:24:58 No. There are no solutions here. 321 00:24:58 --> 00:25:03 Now we have two candidate points which are these two 322 00:25:03 --> 00:25:07 points, root three, root three or negative root 323 00:25:07 --> 00:25:13 three, negative root three. OK. 324 00:25:13 --> 00:25:16 Let's actually look at what we have here. 325 00:25:16 --> 00:25:20 Maybe you cannot read the coordinates, but the point that 326 00:25:20 --> 00:25:23 I have here is indeed root three, root three. 327 00:25:23 --> 00:25:26 How do we see that lambda equals two? 328 00:25:26 --> 00:25:29 Well, if you look at this picture, the gradient of f, 329 00:25:29 --> 00:25:32 that is the blue vector, is indeed twice the yellow 330 00:25:32 --> 00:25:36 vector, gradient g. That is where you read the 331 00:25:36 --> 00:25:41 value of lambda. And we have the other solution 332 00:25:41 --> 00:25:45 which is somewhere here. Negative root three, 333 00:25:45 --> 00:25:48 negative root three. And there, again, 334 00:25:48 --> 00:25:51 lambda equals two. The two vectors are 335 00:25:51 --> 00:25:59 proportional by a factor of two. Yes? 336 00:25:59 --> 00:26:01 No, solutions are not quite guaranteed to be absolute minima 337 00:26:01 --> 00:26:03 or maxima. They are guaranteed to be 338 00:26:03 --> 00:26:06 somehow critical points under the constraint. 339 00:26:06 --> 00:26:09 That means if you were able to solve and eliminate the variable 340 00:26:09 --> 00:26:12 that would be a critical point.
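The two candidate points found for this example can be double-checked numerically. A minimal sketch, assuming only the three equations 2x = lambda y, 2y = lambda x and xy = 3 written on the board; the helper name `lagrange_residuals` is made up for illustration.

```python
import math


def lagrange_residuals(x, y, lam):
    """Left side minus right side of 2x = lam*y, 2y = lam*x, xy = 3.

    All three entries are (numerically) zero at a solution of the system.
    """
    return (2 * x - lam * y, 2 * y - lam * x, x * y - 3)


r = math.sqrt(3)
candidates = [(r, r, 2.0), (-r, -r, 2.0)]
for x, y, lam in candidates:
    print((x, y, lam), lagrange_residuals(x, y, lam))

# For lam = -2 the first two equations force y = -x, so xy = -x^2 <= 0,
# which can never equal 3: that case gives no points.
```

The residuals come out at floating-point zero for both points, with lambda equal to two in each case.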
When you have the same problem, 341 00:26:12 --> 00:26:14 as we have critical points, are they maxima or minima? 342 00:26:14 --> 00:26:22 And the answer is, well, we won't know until we 343 00:26:22 --> 00:26:28 check. More questions? 344 00:26:28 --> 00:26:32 No. Yes? 345 00:26:32 --> 00:26:36 What is a Lagrange multiplier? Well, it is this number lambda 346 00:26:36 --> 00:26:39 that is called the multiplier here. 347 00:26:39 --> 00:26:44 It is a multiplier because it is what you have to multiply 348 00:26:44 --> 00:26:48 gradient of g by to get gradient of f. 349 00:26:48 --> 00:26:49 It multiplies. 350 00:26:49 --> 00:27:04 351 00:27:04 --> 00:27:11 Let's try to see why is this method valid? 352 00:27:11 --> 00:27:18 Because so far I have shown you pictures and have said see they 353 00:27:18 --> 00:27:23 are tangent. But why is it that they have to 354 00:27:23 --> 00:27:28 be tangent in general? Let's think about it. 355 00:27:28 --> 00:27:37 Let's say that we are at constrained min or max. 356 00:27:37 --> 00:27:42 What that means is that if I move on the level g equals 357 00:27:42 --> 00:27:46 constant then the value of f should only increase or only 358 00:27:46 --> 00:27:49 decrease. But it means, 359 00:27:49 --> 00:27:53 in particular, to first order it will not 360 00:27:53 --> 00:27:56 change. At an unconstrained min or max, 361 00:27:56 --> 00:27:59 partial derivatives are zero. In this case, 362 00:27:59 --> 00:28:02 derivatives are zero only in the allowed directions. 363 00:28:02 --> 00:28:09 And the allowed directions are those that stay on the levels of 364 00:28:09 --> 00:28:21 this g equals constant. In any direction along the 365 00:28:21 --> 00:28:40 level set g = c the rate of change of f must be zero. 366 00:28:40 --> 00:28:44 That is what happens at minima or maxima. 367 00:28:44 --> 00:28:49 Except here, of course, we look only at the 368 00:28:49 --> 00:28:54 allowed directions. 
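The statement that f does not change to first order along the constraint can be tested on the hyperbola example. A small sketch, assuming the parametrization (t, 3/t) of xy = 3; the names `f_on_hyperbola` and `central_difference` are illustrative only, and t is not arc length, although a zero rate of change in t still means a zero rate of change along the curve.

```python
import math


def f_on_hyperbola(t):
    """f(x, y) = x^2 + y^2 evaluated at the point (t, 3/t) of xy = 3."""
    return t ** 2 + (3.0 / t) ** 2


def central_difference(func, t, h=1e-5):
    """Symmetric finite-difference estimate of func'(t)."""
    return (func(t + h) - func(t - h)) / (2 * h)


# At the constrained minimum the rate of change along the curve vanishes.
rate_at_min = central_difference(f_on_hyperbola, math.sqrt(3))
# At another point of the hyperbola, here (1, 3), it does not.
rate_elsewhere = central_difference(f_on_hyperbola, 1.0)
print(rate_at_min, rate_elsewhere)
```

The first number is zero up to rounding, while the second is close to -16, the exact value of 2t - 18/t^3 at t = 1.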
Let's say the same thing in 369 00:28:54 --> 00:28:57 terms of directional derivatives. 370 00:28:57 --> 00:29:23 371 00:29:23 --> 00:29:35 That means for any direction u that is tangent to the 372 00:29:35 --> 00:29:49 constraint level g equals c, we must have df over ds in the 373 00:29:49 --> 00:30:00 direction of u equals zero. I will draw a picture. 374 00:30:00 --> 00:30:05 Let's say now I am in three variables just to give you 375 00:30:05 --> 00:30:09 different examples. Here I have a level surface g 376 00:30:09 --> 00:30:11 equals c. I am at my point. 377 00:30:11 --> 00:30:18 And if I move in any direction that is on the level surface, 378 00:30:18 --> 00:30:24 so I move in the direction u tangent to the level surface, 379 00:30:24 --> 00:30:32 then the rate of change of f in that direction should be zero. 380 00:30:32 --> 00:30:34 Now, remember what the formula is for this guy. 381 00:30:34 --> 00:30:44 Well, we have seen that this guy is actually gradient f dot u. 382 00:30:44 --> 00:30:58 That means any such vector u must be perpendicular to the 383 00:30:58 --> 00:31:05 gradient of f. That means that the gradient of 384 00:31:05 --> 00:31:10 f should be perpendicular to anything that is tangent to this 385 00:31:10 --> 00:31:12 level. That means the gradient of f 386 00:31:12 --> 00:31:16 should be perpendicular to the level set. 387 00:31:16 --> 00:31:17 That is what we have shown. 388 00:31:17 --> 00:31:37 389 00:31:37 --> 00:31:40 But we know another vector that is also perpendicular to the 390 00:31:40 --> 00:31:57 level set of g. That is the gradient of g. 391 00:31:57 --> 00:32:02 We conclude that the gradient of f must be parallel to the 392 00:32:02 --> 00:32:07 gradient of g because both are perpendicular to the level set 393 00:32:07 --> 00:32:09 of g. I see confused faces, 394 00:32:09 --> 00:32:13 so let me try to tell you again where that comes from.
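The chain of implications in this argument can be written compactly. The condition that the gradient of g is nonzero at the point is an assumption added here; it is what lets one write the gradient of f as a multiple of the gradient of g.

```latex
\begin{aligned}
&\text{At a constrained min or max: }
  \left.\frac{df}{ds}\right|_{\hat{u}} = \nabla f \cdot \hat{u} = 0
  \quad \text{for every unit vector } \hat{u} \text{ tangent to } \{g = c\},\\
&\Longrightarrow\ \nabla f \perp \{g = c\},
  \qquad \text{and also } \nabla g \perp \{g = c\},\\
&\Longrightarrow\ \nabla f \parallel \nabla g,
  \quad \text{i.e. } \nabla f = \lambda\, \nabla g
  \quad (\text{assuming } \nabla g \neq 0).
\end{aligned}
```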
395 00:32:13 --> 00:32:16 We said if we had a constrained minimum or maximum, 396 00:32:16 --> 00:32:19 if we move in the level set of g, f doesn't change. 397 00:32:19 --> 00:32:20 Well, it doesn't change to first order. 398 00:32:20 --> 00:32:24 It is the same idea as when you are looking for a minimum you 399 00:32:24 --> 00:32:26 set the derivative equal to zero. 400 00:32:26 --> 00:32:31 So the derivative in any direction, tangent to g equals 401 00:32:31 --> 00:32:34 c, should be the directional derivative of f, 402 00:32:34 --> 00:32:38 in any such direction, should be zero. 403 00:32:38 --> 00:32:43 That is what we mean by critical point of f. 404 00:32:43 --> 00:32:48 And so that means that any vector u, any unit vector 405 00:32:48 --> 00:32:55 tangent to the level set of g is going to be perpendicular to the 406 00:32:55 --> 00:33:00 gradient of f. That means that the gradient of 407 00:33:00 --> 00:33:04 f is perpendicular to the level set of g. 408 00:33:04 --> 00:33:06 If you want, that means the level sets of f 409 00:33:06 --> 00:33:10 and g are tangent to each other. That is justifying what we have 410 00:33:10 --> 00:33:15 observed in the picture that the two level sets have to be 411 00:33:15 --> 00:33:20 tangent to each other at a constrained minimum or maximum. 412 00:33:20 --> 00:33:23 Does that make a little bit of sense? 413 00:33:23 --> 00:33:28 Kind of. I see at least a few faces 414 00:33:28 --> 00:33:35 nodding so I take that to be a positive answer. 415 00:33:35 --> 00:33:39 Since I have been asked by several of you, 416 00:33:39 --> 00:33:43 how do I know if it is a maximum or a minimum? 417 00:33:43 --> 00:33:57 Well, warning, the method doesn't tell whether 418 00:33:57 --> 00:34:09 a solution is a minimum or a maximum. 419 00:34:09 --> 00:34:13 How do we do it? Well, more bad news. 420 00:34:13 --> 00:34:26 We cannot use the second derivative test.
421 00:34:26 --> 00:34:30 And the reason for that is that we care actually only about 422 00:34:30 --> 00:34:34 these specific directions that are tangent to the level set of g. 423 00:34:34 --> 00:34:39 And we don't want to bother to try to define directional second 424 00:34:39 --> 00:34:42 derivatives. Not to mention that actually it 425 00:34:42 --> 00:34:45 wouldn't work. There is a criterion but it is 426 00:34:45 --> 00:34:49 much more complicated than that. Basically, the answer for us is 427 00:34:49 --> 00:34:52 that we don't have a second derivative test in this 428 00:34:52 --> 00:34:54 situation. What are we left with? 429 00:34:54 --> 00:34:57 Well, we are just left with comparing values. 430 00:34:57 --> 00:35:00 Say that in this problem you found a point where f equals 431 00:35:00 --> 00:35:04 three, a point where f equals nine, a point where f equals 15. 432 00:35:04 --> 00:35:08 Well, then probably the minimum is the point where f equals 433 00:35:08 --> 00:35:12 three and the maximum is 15. Actually, in this case, 434 00:35:12 --> 00:35:17 where we found minima, these two points are tied for 435 00:35:17 --> 00:35:19 minimum. What about the maximum? 436 00:35:19 --> 00:35:22 What is the maximum of f on the hyperbola? 437 00:35:22 --> 00:35:25 Well, it is infinity because the point can go as far as you 438 00:35:25 --> 00:35:29 want from the origin. But the general idea is if we 439 00:35:29 --> 00:35:35 have a good reason to believe that there should be a minimum, 440 00:35:35 --> 00:35:38 and it's not like at infinity or something weird like that, 441 00:35:38 --> 00:35:42 then the minimum will be a solution of the Lagrange 442 00:35:42 --> 00:35:46 multiplier equations. We just look for all the 443 00:35:46 --> 00:35:51 solutions and then we choose the one that gives us the lowest 444 00:35:51 --> 00:35:55 value. Is that good enough? 445 00:35:55 --> 00:35:57 Let me actually write that down.
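The compare-values step can be sketched in Python for the hyperbola example. The list `solutions` holds the two points found for this example; the variable names are illustrative and not from the lecture.

```python
import math


def f(x, y):
    """Squared distance to the origin."""
    return x ** 2 + y ** 2


r = math.sqrt(3)
solutions = [(r, r), (-r, -r)]  # solutions of the Lagrange multiplier equations

values = [f(x, y) for x, y in solutions]
lowest = min(values)
# Keep every solution that attains the lowest value (ties are possible).
minimizers = [p for p, v in zip(solutions, values) if math.isclose(v, lowest)]
print(lowest, minimizers)

# There is no maximum on xy = 3: points far out on the hyperbola
# give arbitrarily large values of f.
print(f(100.0, 0.03))
```

Both points give the same value, about 6, so they are tied for the minimum, while the last line shows f growing without bound along the hyperbola.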
446 00:35:57 --> 00:36:23 447 00:36:23 --> 00:36:35 To find the minimum or the maximum, we compare values of f 448 00:36:35 --> 00:36:46 at the various solutions -- -- to Lagrange multiplier 449 00:36:46 --> 00:36:49 equations. 450 00:36:49 --> 00:37:08 451 00:37:08 --> 00:37:11 I should say also that sometimes you can just conclude 452 00:37:11 --> 00:37:14 by thinking geometrically. In this case, 453 00:37:14 --> 00:37:18 when it is asking you which point is closest to the origin 454 00:37:18 --> 00:37:23 you can just see that your answer is the correct one. 455 00:37:23 --> 00:37:32 Let's do an advanced example. Advanced means that -- Well, 456 00:37:32 --> 00:37:37 this one I didn't actually dare to put on top of the other 457 00:37:37 --> 00:37:48 problem sets. Instead, I am going to do it. 458 00:37:48 --> 00:37:51 What is this going to be about? We are going to look for a 459 00:37:51 --> 00:38:03 surface minimizing pyramid. Let's say that we want to build 460 00:38:03 --> 00:38:19 a pyramid with a given triangular base -- -- and a 461 00:38:19 --> 00:38:28 given volume. Say that I have maybe in the x, 462 00:38:28 --> 00:38:33 y plane I am giving you some triangle. 463 00:38:33 --> 00:38:40 And I am going to try to build a pyramid. 464 00:38:40 --> 00:38:48 Of course, I can choose where to put the top of a pyramid. 465 00:38:48 --> 00:38:53 This guy will end up being behind now. 466 00:38:53 --> 00:39:09 And the constraint and the goal is to minimize the total surface 467 00:39:09 --> 00:39:13 area. The first time I taught this 468 00:39:13 --> 00:39:15 class, it was a few years ago, was just before they built the 469 00:39:15 --> 00:39:17 Stata Center. And then I used to motivate 470 00:39:17 --> 00:39:20 this problem by saying Frank Gehry has gone crazy and has 471 00:39:20 --> 00:39:23 been given a triangular plot of land he wants to put a pyramid. 
472 00:39:23 --> 00:39:26 There needs to be the right amount of volume so that you can 473 00:39:26 --> 00:39:28 put all the offices in there. And he wants it to be, 474 00:39:28 --> 00:39:31 actually, covered in solid gold. 475 00:39:31 --> 00:39:34 And because that is expensive, the administration wants him to 476 00:39:34 --> 00:39:38 cut the costs a bit. And so you have to minimize the 477 00:39:38 --> 00:39:42 total size so that it doesn't cost too much. 478 00:39:42 --> 00:39:45 We will see if MIT comes up with a triangular pyramid 479 00:39:45 --> 00:39:48 building. Hopefully not. 480 00:39:48 --> 00:39:58 It could be our next dorm, you never know. 481 00:39:58 --> 00:40:01 Anyway, it is a fine geometry problem. 482 00:40:01 --> 00:40:07 Let's try to think about how we can do this. 483 00:40:07 --> 00:40:10 The natural way to think about it would be -- Well, 484 00:40:10 --> 00:40:11 what do we have to look for first? 485 00:40:11 --> 00:40:18 We have to look for the position of that top point. 486 00:40:18 --> 00:40:29 Remember we know that the volume of a pyramid is one-third 487 00:40:29 --> 00:40:37 the area of base times height. In fact, fixing the volume, 488 00:40:37 --> 00:40:39 knowing that we have fixed the area of a base, 489 00:40:39 --> 00:40:43 means that we are fixing the height of the pyramid. 490 00:40:43 --> 00:40:47 The height is completely fixed. What we have to choose just is 491 00:40:47 --> 00:40:52 where do we put that top point? Do we put it smack in the 492 00:40:52 --> 00:40:58 middle of a triangle or to a side or even anywhere we want? 493 00:40:58 --> 00:41:15 Its z coordinate is fixed. Let's call h the height. 494 00:41:15 --> 00:41:20 What we could do is something like this. 495 00:41:20 --> 00:41:24 We say we have three points of a base. 496 00:41:24 --> 00:41:32 Let's call them p1 at (x1, y1,0); p2 at (x2, 497 00:41:32 --> 00:41:36 y2,0); p3 at (x3, y3,0). 
498 00:41:36 --> 00:41:40 This point p is the unknown point at (x, y, 499 00:41:40 --> 00:41:42 h). We know the height. 500 00:41:42 --> 00:41:46 And then we want to minimize the sum of the areas of these 501 00:41:46 --> 00:41:50 three triangles. One here, one here and one at 502 00:41:50 --> 00:41:53 the back. And areas of triangles we know 503 00:41:53 --> 00:41:57 how to express by using length of cross-product. 504 00:41:57 --> 00:42:00 It becomes a function of x and y. 505 00:42:00 --> 00:42:04 And you can try to minimize it. Actually, it doesn't quite work. 506 00:42:04 --> 00:42:05 The formulas are just too complicated. 507 00:42:05 --> 00:42:14 You will never get there. What happens is actually maybe 508 00:42:14 --> 00:42:18 we need better coordinates. Why do we need better 509 00:42:18 --> 00:42:21 coordinates? That is because the geometry is 510 00:42:21 --> 00:42:24 kind of difficult to do if you use x, y coordinates. 511 00:42:24 --> 00:42:28 I mean formula for cross-product is fine, 512 00:42:28 --> 00:42:33 but then the length of the vector will be annoying and just 513 00:42:33 --> 00:42:37 doesn't look good. Instead, let's think about it 514 00:42:37 --> 00:42:38 differently. 515 00:42:38 --> 00:42:54 516 00:42:54 --> 00:43:01 I claim if we do it this way and we express the area as a 517 00:43:01 --> 00:43:06 function of x, y, well, actually we can't 518 00:43:06 --> 00:43:13 solve for a minimum. Here is another way to do it. 519 00:43:13 --> 00:43:17 Well, what has worked pretty well for us so far is this 520 00:43:17 --> 00:43:19 geometric idea of base times height. 521 00:43:19 --> 00:43:29 So let's think in terms of the heights of side triangles. 522 00:43:29 --> 00:43:37 I am going to use the height of these things. 523 00:43:37 --> 00:43:43 And I am going to say that the area will be the sum of three 524 00:43:43 --> 00:43:48 terms, which are three bases times three heights. 525 00:43:48 --> 00:43:53 Let's give names to these quantities. 
526 00:43:53 --> 00:43:58 Actually, for that it is going to be good to have the point in 527 00:43:58 --> 00:44:01 the xy plane that lives directly below p. 528 00:44:01 --> 00:44:08 Let's call it q. P is the point with coordinates 529 00:44:08 --> 00:44:13 x, y, h. And let's call q the point that 530 00:44:13 --> 00:44:19 is just below it and so its coordinates are x, 531 00:44:19 --> 00:44:22 y, 0. Let's see. 532 00:44:22 --> 00:44:34 Let me draw a map of this thing. p1, p2, p3 and I have my point 533 00:44:34 --> 00:44:37 q in the middle. Let's see. 534 00:44:37 --> 00:44:40 To know these areas, I need to know the base. 535 00:44:40 --> 00:44:44 Well, the base I can decide that I know it because it is 536 00:44:44 --> 00:44:48 part of my given data. I know the sides of this 537 00:44:48 --> 00:44:53 triangle. Let me call the lengths a1, 538 00:44:53 --> 00:44:56 a2, a3. I also need to know the height, 539 00:44:56 --> 00:44:58 so I need to know these lengths. 540 00:44:58 --> 00:45:01 How do I know these lengths? Well, it's a distance in space, 541 00:45:01 --> 00:45:03 but it is a little bit annoying. 542 00:45:03 --> 00:45:10 But maybe I can reduce it to a distance in the plane by looking 543 00:45:10 --> 00:45:17 instead at this distance here. Let me give names to the 544 00:45:17 --> 00:45:24 distances from q to the sides. Let's call u1, 545 00:45:24 --> 00:45:35 u2, u3 the distances from q to the sides. 546 00:45:35 --> 00:45:47 547 00:45:47 --> 00:45:49 Well, now I can claim I can find, actually, 548 00:45:49 --> 00:45:53 sorry. I need to draw one more thing. 549 00:45:53 --> 00:45:57 I claim I have a nice formula for the area, 550 00:45:57 --> 00:46:01 because this is vertical and this is horizontal so this 551 00:46:01 --> 00:46:05 length here is u3, this length here is h. 552 00:46:05 --> 00:46:13 So what is this length here? It is the square root of u3 553 00:46:13 --> 00:46:17 squared plus h squared.
And similarly for these other 554 00:46:17 --> 00:46:23 guys. They are square roots of a u 555 00:46:23 --> 00:46:31 squared plus h squared. The heights of the faces are 556 00:46:31 --> 00:46:36 square root of u1 squared plus h squared. 557 00:46:36 --> 00:46:43 And similarly with u2 and u3. So the total side area is going 558 00:46:43 --> 00:46:47 to be the area of the first face, 559 00:46:47 --> 00:46:58 one-half of base times height, plus one-half of a base times a 560 00:46:58 --> 00:47:06 height plus one-half of the third one. 561 00:47:06 --> 00:47:09 It doesn't look so much better. But, trust me, 562 00:47:09 --> 00:47:15 it will get better. Now, that is a function of 563 00:47:15 --> 00:47:19 three variables, u1, u2, u3. 564 00:47:19 --> 00:47:22 And how do we relate u1, u2, u3 to each other? 565 00:47:22 --> 00:47:25 They are probably not independent. 566 00:47:25 --> 00:47:32 Well, let's cut this triangle here into three pieces like 567 00:47:32 --> 00:47:35 that. Then each piece has side -- 568 00:47:35 --> 00:47:40 Well, let's look at the piece at the bottom. 569 00:47:40 --> 00:47:50 It has base a3, height u3. Cutting the base into three tells 570 00:47:50 --> 00:47:57 you that the area of the base is one-half of a1, 571 00:47:57 --> 00:48:04 u1 plus one-half of a2, u2 plus one-half of a3, 572 00:48:04 --> 00:48:09 u3. And that is our constraint. 573 00:48:09 --> 00:48:12 My three variables, u1, u2, u3, are constrained in 574 00:48:12 --> 00:48:14 this way. The sum of these terms must be 575 00:48:14 --> 00:48:17 the area of the base. And I want to minimize that guy. 576 00:48:17 --> 00:48:23 So that is my g and that guy here is my f. 577 00:48:23 --> 00:48:28 Now we try to apply our Lagrange multiplier equations.
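Written out, the function to minimize and the constraint described so far look as follows. The symbol A for the given area of the base is introduced here for convenience and is not used in the lecture; a1, a2, a3, u1, u2, u3 and h are as defined above.

```latex
f(u_1, u_2, u_3) = \tfrac12 a_1 \sqrt{u_1^2 + h^2}
                 + \tfrac12 a_2 \sqrt{u_2^2 + h^2}
                 + \tfrac12 a_3 \sqrt{u_3^2 + h^2},
\qquad
g(u_1, u_2, u_3) = \tfrac12 a_1 u_1 + \tfrac12 a_2 u_2 + \tfrac12 a_3 u_3 = A .
```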
578 00:48:28 --> 00:48:33 Well, partial f over partial u1 is -- Well, 579 00:48:33 --> 00:48:36 if you do the calculation, you will see it is one-half a1, 580 00:48:36 --> 00:48:43 u1 over square root of u1^2 plus h^2 equals lambda, 581 00:48:43 --> 00:48:46 what is partial g, partial u1? 582 00:48:46 --> 00:48:50 That one you can do, I am sure. It is one-half a1. 583 00:48:50 --> 00:49:00 Oh, these guys simplify. If you do the same with the 584 00:49:00 --> 00:49:09 second one -- -- things simplify again. 585 00:49:09 --> 00:49:17 And the same with the third one. Well, you will get, 586 00:49:17 --> 00:49:21 after simplifying, u3 over square root of u3 587 00:49:21 --> 00:49:24 squared plus h squared equals lambda. 588 00:49:24 --> 00:49:27 Now, that means this guy equals this guy equals this guy. 589 00:49:27 --> 00:49:33 They are all equal to lambda. And, if you think about it, 590 00:49:33 --> 00:49:39 that means that u1 = u2 = u3. See, it looked like scary 591 00:49:39 --> 00:49:42 equations but the solution is very simple. 592 00:49:42 --> 00:49:45 What does it mean? It means that our point q 593 00:49:45 --> 00:49:47 should be equidistant from all three sides. 594 00:49:47 --> 00:49:52 That is called the incenter. Q should be at the incenter. 595 00:49:52 --> 00:49:56 The next time you have to build a golden pyramid and don't want 596 00:49:56 --> 00:49:59 to go broke, well, you know where to put the top. 597 00:49:59 --> 00:50:03 If that was a bit fast, sorry. Anyway, it is not completely 598 00:50:03 --> 00:50:06 crucial. But go over it and you will see 599 00:50:06 --> 00:50:08 it works. Have a nice weekend. 600 00:50:08 --> 00:50:10
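For anyone who wants to go over it numerically, here is a small Python sketch that checks the incenter claim on one example. Everything specific in it is an assumption made for illustration: the 3-4-5 right triangle, the height h = 2, and the helper names `side_area` and `incenter`. It uses the same formula as above, one-half of each base side times the square root of u squared plus h squared.

```python
import math

def side_area(q, tri, h):
    """Total area of the three side faces when the top sits at height h above q."""
    total = 0.0
    for i in range(3):
        (x1, y1), (x2, y2) = tri[i], tri[(i + 1) % 3]
        a = math.hypot(x2 - x1, y2 - y1)  # length a_i of this side of the base
        # u_i: distance in the plane from q to the line through this side
        u = abs((x2 - x1) * (q[1] - y1) - (y2 - y1) * (q[0] - x1)) / a
        total += 0.5 * a * math.sqrt(u * u + h * h)
    return total

def incenter(tri):
    """Point equidistant from the three sides (weights are opposite side lengths)."""
    (ax, ay), (bx, by), (cx, cy) = tri
    la = math.hypot(bx - cx, by - cy)  # side opposite the first vertex
    lb = math.hypot(ax - cx, ay - cy)  # side opposite the second vertex
    lc = math.hypot(ax - bx, ay - by)  # side opposite the third vertex
    s = la + lb + lc
    return ((la * ax + lb * bx + lc * cx) / s, (la * ay + lb * by + lc * cy) / s)

# Hypothetical data: a 3-4-5 right triangle as the base, height 2.
tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
h = 2.0

q_in = incenter(tri)
area_in = side_area(q_in, tri, h)

# Compare with the centroid and with small shifts around the incenter.
centroid = (sum(p[0] for p in tri) / 3, sum(p[1] for p in tri) / 3)
area_centroid = side_area(centroid, tri, h)
nearby = [side_area((q_in[0] + dx, q_in[1] + dy), tri, h)
          for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)]

print(q_in, area_in, area_centroid, min(nearby))
```

For this triangle the incenter is (1, 1), at distance 1 from each side, and neither the centroid nor any of the nearby points gives a smaller side area. That agrees with u1 = u2 = u3, but it is a spot check on one example, not a proof.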