1 00:00:01 --> 00:00:03 The following content is provided under a Creative 2 00:00:03 --> 00:00:05 Commons license. Your support will help MIT 3 00:00:05 --> 00:00:08 OpenCourseWare continue to offer high quality educational 4 00:00:08 --> 00:00:13 resources for free. To make a donation or to view 5 00:00:13 --> 00:00:18 additional materials from hundreds of MIT courses, 6 00:00:18 --> 00:00:23 visit MIT OpenCourseWare at ocw.mit.edu. 7 00:00:23 --> 00:00:28 Today we are going to see how to use what we saw last time 8 00:00:28 --> 00:00:33 about partial derivatives to handle minimization or 9 00:00:33 --> 00:00:41 maximization problems involving functions of several variables. 10 00:00:41 --> 00:00:44 Remember last time we said that when we have a function, 11 00:00:44 --> 00:00:49 say, of two variables, x and y, then we have actually two 12 00:00:49 --> 00:00:53 different derivatives, partial f, partial x, 13 00:00:53 --> 00:01:02 also called f sub x, the derivative with respect to 14 00:01:02 --> 00:01:11 x keeping y constant. And we have partial f, 15 00:01:11 --> 00:01:21 partial y, also called f sub y, where we vary y and we keep x 16 00:01:21 --> 00:01:26 as a constant. And now, one thing I didn't 17 00:01:26 --> 00:01:30 have time to tell you about but hopefully you thought about in 18 00:01:30 --> 00:01:37 recitation yesterday, is the approximation formula 19 00:01:37 --> 00:01:47 that tells you what happens if you vary both x and y. 20 00:01:47 --> 00:01:50 f sub x tells us what happens if we change x a little bit, 21 00:01:50 --> 00:01:53 by some small amount delta x. f sub y tells us how f changes, 22 00:01:53 --> 00:01:56 if you change y by a small amount delta y. 23 00:01:56 --> 00:02:00 If we do both at the same time then the two effects will add up 24 00:02:00 --> 00:02:02 with each other, because you can imagine that 25 00:02:02 --> 00:02:05 first you will change x and then you will change y. 26 00:02:05 --> 00:02:12 Or the other way around. 
It doesn't really matter. 27 00:02:12 --> 00:02:18 If we change x by a certain amount delta x, 28 00:02:18 --> 00:02:23 and if we change y by the amount delta y, 29 00:02:23 --> 00:02:32 and let's say that we have z = f(x, y), then z changes by an 30 00:02:32 --> 00:02:40 amount which is approximately f sub x times delta x plus f sub y 31 00:02:40 --> 00:02:45 times delta y. And that is one of the most 32 00:02:45 --> 00:02:49 important formulas about partial derivatives. 33 00:02:49 --> 00:02:54 The intuition for this, again, is just the two effects 34 00:02:54 --> 00:02:58 of changing x by a small amount and then changing y. 35 00:02:58 --> 00:03:02 Well, first changing x will modify f, how much does it 36 00:03:02 --> 00:03:06 modify f? The answer is the rate of change 37 00:03:06 --> 00:03:09 is f sub x. And if I change y then the rate 38 00:03:09 --> 00:03:13 of change of f when I change y is f sub y. 39 00:03:13 --> 00:03:17 So all together I get this change in the value of f. 40 00:03:17 --> 00:03:19 And, of course, that is only an approximation 41 00:03:19 --> 00:03:22 formula. Actually, there would be higher 42 00:03:22 --> 00:03:28 order terms involving second and third derivatives and so on. 43 00:03:28 --> 00:03:43 One way to justify this -- Sorry. 44 00:03:43 --> 00:03:47 I was distracted by the microphone. 45 00:03:47 --> 00:03:55 OK. How do we justify this formula? 46 00:03:55 --> 00:04:05 Well, one way to think about it is in terms of tangent plane 47 00:04:05 --> 00:04:10 approximation. Let's think about the tangent 48 00:04:10 --> 00:04:13 plane to the graph of a function f. 49 00:04:13 --> 00:04:15 We have some pictures to show you. 50 00:04:15 --> 00:04:20 It will be easier if I show you pictures. 51 00:04:20 --> 00:04:24 Remember, partial f, partial x was obtained by 52 00:04:24 --> 00:04:29 looking at the situation where y is held constant. 
53 00:04:29 --> 00:04:33 That means I am slicing the graph of f by a plane that is 54 00:04:33 --> 00:04:35 parallel to the x, z plane. 55 00:04:35 --> 00:04:39 And when I change x, z changes, and the slope of 56 00:04:39 --> 00:04:44 that is going to be the derivative with respect to x. 57 00:04:44 --> 00:04:49 Now, if I do the same in the other direction then I will have 58 00:04:49 --> 00:04:53 similarly the slope in a slice now parallel to the y, 59 00:04:53 --> 00:04:57 z plane that will be partial f, partial y. 60 00:04:57 --> 00:05:00 In fact, in each case, I have a line. 61 00:05:00 --> 00:05:02 And that line is tangent to the surface. 62 00:05:02 --> 00:05:06 Now, if I have two lines tangent to the surface, 63 00:05:06 --> 00:05:09 well, then together they determine for me the tangent 64 00:05:09 --> 00:05:13 plane to the surface. Let's try to see how that works. 65 00:05:13 --> 00:05:18 66 00:05:18 --> 00:05:28 We know that f sub x and f sub y are the slopes of two tangent 67 00:05:28 --> 00:05:37 lines to the graph. 68 00:05:37 --> 00:05:39 And let's write down the equations of these lines. 69 00:05:39 --> 00:05:41 I am not going to write parametric equations. 70 00:05:41 --> 00:05:45 I am going to write them in terms of x, y, 71 00:05:45 --> 00:05:49 z coordinates. Let's say that partial f, 72 00:05:49 --> 00:05:53 partial x at the given point is equal to a. 73 00:05:53 --> 00:06:00 That means that we have a line given by the following 74 00:06:00 --> 00:06:05 conditions. I am going to keep y constant 75 00:06:05 --> 00:06:07 equal to y0. And I am going to change x. 76 00:06:07 --> 00:06:12 And, as I change x, z will change at the rate that 77 00:06:12 --> 00:06:22 is equal to a. That would be z = z0 + a(x - x0). 
78 00:06:22 --> 00:06:26 That is how you would describe a line that, I guess, 79 00:06:26 --> 00:06:30 the one that is plotted in green here, lying in 80 00:06:30 --> 00:06:33 the slice parallel to the x, z plane. 81 00:06:33 --> 00:06:40 I hold y constant equal to y0. And z is a function of x that 82 00:06:40 --> 00:06:50 varies with a rate of a. And now if I look similarly at 83 00:06:50 --> 00:06:55 the other slice, let's say that the partial with 84 00:06:55 --> 00:07:00 respect to y is equal to b, then I get another line which 85 00:07:00 --> 00:07:06 is obtained by the fact that z now will depend on y. 86 00:07:06 --> 00:07:10 And the rate of change with respect to y will be b. 87 00:07:10 --> 00:07:15 While x is held constant equal to x0. 88 00:07:15 --> 00:07:19 These two lines are both going to be in the tangent plane to 89 00:07:19 --> 00:07:20 the surface. 90 00:07:20 --> 00:07:40 91 00:07:40 --> 00:07:45 They are both tangent to the graph of f and together they 92 00:07:45 --> 00:07:47 determine the plane. 93 00:07:47 --> 00:07:56 94 00:07:56 --> 00:08:08 And that plane is just given by the formula z = z0 + a(x - x0) + 95 00:08:08 --> 00:08:13 b(y - y0). If you look at what happens -- 96 00:08:13 --> 00:08:19 This is the equation of a plane. z equals constant times x plus 97 00:08:19 --> 00:08:24 constant times y plus constant. And if you look at what happens 98 00:08:24 --> 00:08:28 if I hold y constant and vary x, I will get the first line. 99 00:08:28 --> 00:08:33 If I hold x constant and vary y, I get the second line. 100 00:08:33 --> 00:08:34 Another way to do it, of course, 101 00:08:34 --> 00:08:37 would be to actually write parametric equations of these 102 00:08:37 --> 00:08:40 lines, get vectors along them and then 103 00:08:40 --> 00:08:43 take the cross-product to get the normal vector to the plane. 104 00:08:43 --> 00:08:47 And then get this equation for the plane using the normal 105 00:08:47 --> 00:08:49 vector. 
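A rough sketch of that cross-product route, for anyone who wants to see it written out. The names u, v and N are labels introduced here for illustration, not from the lecture; a and b are the two partial derivatives at (x0, y0), and z0 = f(x0, y0).

```latex
\begin{aligned}
\mathbf{u} &= \langle 1,\, 0,\, a \rangle && \text{(along the tangent line with } y = y_0\text{)} \\
\mathbf{v} &= \langle 0,\, 1,\, b \rangle && \text{(along the tangent line with } x = x_0\text{)} \\
\mathbf{N} &= \mathbf{u} \times \mathbf{v} = \langle -a,\, -b,\, 1 \rangle \\
0 &= -a\,(x - x_0) - b\,(y - y_0) + (z - z_0) \\
z &= z_0 + a\,(x - x_0) + b\,(y - y_0)
\end{aligned}
```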
That also works and it gives 106 00:08:49 --> 00:08:53 you the same formula. If you are curious, as an 107 00:08:53 --> 00:08:57 exercise, do it again using parametric equations and using the 108 00:08:57 --> 00:09:01 cross-product to get the plane equation. 109 00:09:01 --> 00:09:03 That is how we get the tangent plane. 110 00:09:03 --> 00:09:06 And now what this approximation formula here says is that, 111 00:09:06 --> 00:09:10 in fact, the graph of a function is close to the tangent 112 00:09:10 --> 00:09:12 plane. If we were moving on the 113 00:09:12 --> 00:09:15 tangent plane, this would be an actual 114 00:09:15 --> 00:09:17 equality. Delta z would be a linear 115 00:09:17 --> 00:09:23 function of delta x and delta y. And the graph of a function is 116 00:09:23 --> 00:09:27 near the tangent plane, but is not quite the same, 117 00:09:27 --> 00:09:33 so it is only an approximation for small delta x and small 118 00:09:33 --> 00:09:43 delta y. The approximation formula says 119 00:09:43 --> 00:09:57 the graph of f is close to its tangent plane. 120 00:09:57 --> 00:10:02 And we can use that formula over here now to estimate how 121 00:10:02 --> 00:10:08 the value of f changes if I change x and y at the same time. 122 00:10:08 --> 00:10:18 Questions about that? Now that we have caught up with 123 00:10:18 --> 00:10:23 what we were supposed to see on Tuesday, I can tell you now 124 00:10:23 --> 00:10:26 about max and min problems. 125 00:10:26 --> 00:10:38 126 00:10:38 --> 00:10:48 That is going to be an application of partial 127 00:10:48 --> 00:11:00 derivatives to look at optimization problems. 128 00:11:00 --> 00:11:03 Maybe ten years from now, when you have a real job, 129 00:11:03 --> 00:11:07 your job might be to actually minimize the cost of something 130 00:11:07 --> 00:11:11 or maximize the profit of something or whatever. 
131 00:11:11 --> 00:11:14 But typically the function that you will have to strive to 132 00:11:14 --> 00:11:18 minimize or maximize will depend on several variables. 133 00:11:18 --> 00:11:22 If you have a function of one variable, you know that to find 134 00:11:22 --> 00:11:26 its minimum or its maximum you look at the derivative and set 135 00:11:26 --> 00:11:29 that equal to zero. And you try to then look at 136 00:11:29 --> 00:11:38 what happens to the function. Here it is going to be kind of 137 00:11:38 --> 00:11:47 similar, except, of course, we have several 138 00:11:47 --> 00:11:51 derivatives. For today we will think about a 139 00:11:51 --> 00:11:56 function of two variables, but it works exactly the same 140 00:11:56 --> 00:12:00 if you have three variables, ten variables, 141 00:12:00 --> 00:12:07 a million variables. The first observation is that 142 00:12:07 --> 00:12:17 if we have a local minimum or a local maximum then both partial 143 00:12:17 --> 00:12:21 derivatives, so partial f partial x and 144 00:12:21 --> 00:12:26 partial f partial y, are both zero at the same time. 145 00:12:26 --> 00:12:30 Why is that? Well, let's say that f sub x is 146 00:12:30 --> 00:12:32 zero. That means when I vary x, to 147 00:12:32 --> 00:12:35 first order the function doesn't change. 148 00:12:35 --> 00:12:37 Maybe that is because it is going through... 149 00:12:37 --> 00:12:42 If I look only at the slice parallel to the x-axis then 150 00:12:42 --> 00:12:45 maybe I am going through the minimum. 151 00:12:45 --> 00:12:48 But if partial f, partial y is not 0 then 152 00:12:48 --> 00:12:51 actually, by changing y, I could still make the value 153 00:12:51 --> 00:12:54 larger or smaller. That wouldn't be an actual 154 00:12:54 --> 00:12:57 maximum or minimum. It would only be a maximum or 155 00:12:57 --> 00:13:01 minimum if I stay in the slice. But if I allow myself to change 156 00:13:01 --> 00:13:04 y that doesn't work. 
I need actually to know that if 157 00:13:04 --> 00:13:07 I change y the value will not change either to first order. 158 00:13:07 --> 00:13:11 That is why you also need partial f, partial y to be zero. 159 00:13:11 --> 00:13:13 Now, let's say that they are both zero. 160 00:13:13 --> 00:13:16 Well, why is that enough? It is essentially enough 161 00:13:16 --> 00:13:20 because of this formula telling me that if both of these guys 162 00:13:20 --> 00:13:24 are zero then to first order the function doesn't change. 163 00:13:24 --> 00:13:26 Then, of course, there will be maybe quadratic 164 00:13:26 --> 00:13:28 terms that will actually turn that, you know, 165 00:13:28 --> 00:13:31 this won't really say that your function is actually constant. 166 00:13:31 --> 00:13:35 It will just tell you that maybe it will actually be 167 00:13:35 --> 00:13:40 quadratic or higher order in delta x and delta y. 168 00:13:40 --> 00:13:52 That is what you expect to have at a maximum or a minimum. 169 00:13:52 --> 00:14:05 The condition is the same thing as saying that the tangent plane 170 00:14:05 --> 00:14:15 to the graph is actually going to be horizontal. 171 00:14:15 --> 00:14:18 And that is what you want to have. 172 00:14:18 --> 00:14:23 Say you have a minimum, well, the tangent plane at this 173 00:14:23 --> 00:14:30 point, at the bottom of the graph is going to be horizontal. 174 00:14:30 --> 00:14:35 And you can see that on this equation of a tangent plane, 175 00:14:35 --> 00:14:40 when both these coefficients are 0 that is when the equation 176 00:14:40 --> 00:14:44 becomes z equals constant: the horizontal plane. 177 00:14:44 --> 00:14:50 Does that make sense? We will have a name for this 178 00:14:50 --> 00:14:52 kind of point because, actually, 179 00:14:52 --> 00:14:55 what we will see very soon is that these conditions are 180 00:14:55 --> 00:14:57 necessary but are not sufficient. 
181 00:14:57 --> 00:15:02 There are actually other kinds of points where the partial 182 00:15:02 --> 00:15:08 derivatives are zero. Let's give a name to this. 183 00:15:08 --> 00:15:24 The definition is: (x0, y0) is a critical point of f -- 184 00:15:24 --> 00:15:36 -- if the partial derivative with respect to x 185 00:15:36 --> 00:15:44 and the partial derivative with respect to y are both zero. 186 00:15:44 --> 00:15:50 Generally, you would want all the partial derivatives, 187 00:15:50 --> 00:15:56 no matter how many variables you have, to be zero at the same 188 00:15:56 --> 00:16:06 time. Let's see an example. 189 00:16:06 --> 00:16:23 Let's say I give you the function f(x, y) = x^2 - 2xy + 3y^2 + 190 00:16:23 --> 00:16:28 2x - 2y. And let's try to figure out 191 00:16:28 --> 00:16:32 whether we can minimize or maximize this. 192 00:16:32 --> 00:16:37 What we would start doing immediately is taking the 193 00:16:37 --> 00:16:43 partial derivatives. What is f sub x? 194 00:16:43 --> 00:16:56 It is 2x - 2y + 0 + 2. Remember that y is a constant 195 00:16:56 --> 00:17:04 so 3y^2 differentiates to zero. Now, if we do f sub y, 196 00:17:04 --> 00:17:14 that is going to be 0 - 2x + 6y - 2. And what we want to do is set 197 00:17:14 --> 00:17:17 these things to zero. And we want to solve these two 198 00:17:17 --> 00:17:21 equations at the same time. An important thing to remember, 199 00:17:21 --> 00:17:23 and maybe I should have told you a couple of weeks ago 200 00:17:23 --> 00:17:25 already, if you have two equations to 201 00:17:25 --> 00:17:28 solve, well, it is very good to try to 202 00:17:28 --> 00:17:30 simplify them by adding them together or whatever, 203 00:17:30 --> 00:17:33 but you must keep two equations. If you have two equations, 204 00:17:33 --> 00:17:37 you shouldn't end up with just one equation out of nowhere. 205 00:17:37 --> 00:17:40 For example here, we can certainly simplify 206 00:17:40 --> 00:17:46 things by summing them together. 
If we add them together, 207 00:17:46 --> 00:17:52 well, the x's cancel and the constants cancel. 208 00:17:52 --> 00:17:56 In fact, we are just left with 4y = 0. 209 00:17:56 --> 00:18:00 That is pretty good. That tells us y should be zero. 210 00:18:00 --> 00:18:02 But then we should, of course, go back to these and 211 00:18:02 --> 00:18:07 see what else we know. Well, now it tells us, 212 00:18:07 --> 00:18:14 if you put y = 0 it tells you 2x + 2 = 0. 213 00:18:14 --> 00:18:26 That tells you x = -1. We have one critical point that 214 00:18:26 --> 00:18:33 is (x, y) = (-1, 0). 215 00:18:33 --> 00:18:39 Any questions so far? No. 216 00:18:39 --> 00:18:40 Well, you should have a question. 217 00:18:40 --> 00:18:49 The question should be how do we know if it is a maximum or a 218 00:18:49 --> 00:18:53 minimum? Yeah. 219 00:18:53 --> 00:18:55 If we had a function of one variable, we would decide things 220 00:18:55 --> 00:18:58 based on the second derivative. And, in fact, 221 00:18:58 --> 00:19:00 we will see tomorrow how to do things based on the second 222 00:19:00 --> 00:19:03 derivative. But that is kind of tricky 223 00:19:03 --> 00:19:06 because there are a lot of second derivatives. 224 00:19:06 --> 00:19:09 I mean we already have two first derivatives. 225 00:19:09 --> 00:19:14 You can imagine that if you keep taking partials you may end 226 00:19:14 --> 00:19:17 up with more and more, so we will have to figure out 227 00:19:17 --> 00:19:19 carefully what the condition should be. 228 00:19:19 --> 00:19:27 We will do that tomorrow. For now, let's just try to look 229 00:19:27 --> 00:19:38 a bit at how we can understand these things by hand. 230 00:19:38 --> 00:19:42 In fact, let me point out to you immediately that there is 231 00:19:42 --> 00:19:49 more than maxima and minima. Remember, we saw the example of 232 00:19:49 --> 00:19:52 x^2 + y^2. That has a critical point. 233 00:19:52 --> 00:19:56 That critical point is obviously a minimum. 
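Going back to the worked example f(x, y) = x^2 - 2xy + 3y^2 + 2x - 2y, here is a small Python sketch that solves f sub x = 0 and f sub y = 0 as a 2-by-2 linear system and confirms the critical point found by hand. The helper name critical_point and the use of Cramer's rule are choices made for this illustration, not something from the lecture.

```python
def f(x, y):
    # The example function from the lecture.
    return x**2 - 2*x*y + 3*y**2 + 2*x - 2*y


def critical_point():
    # f_x = 2x - 2y + 2 = 0 and f_y = -2x + 6y - 2 = 0, rewritten as
    #    2x - 2y = -2
    #   -2x + 6y =  2
    a11, a12, b1 = 2.0, -2.0, -2.0
    a21, a22, b2 = -2.0, 6.0, 2.0
    # Cramer's rule for a 2x2 system (hypothetical helper for this sketch).
    det = a11 * a22 - a12 * a21
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


x0, y0 = critical_point()
print("critical point:", (x0, y0), "value of f there:", f(x0, y0))
```

Both equations are kept all the way through, which is the point made above: two unknowns, two equations.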
234 00:19:56 --> 00:19:58 And, of course, it could be a local minimum 235 00:19:58 --> 00:20:01 because it could be that if you have a more complicated function 236 00:20:01 --> 00:20:04 there is indeed a minimum here, but then elsewhere the function 237 00:20:04 --> 00:20:08 drops to a lower value. We call that just a local 238 00:20:08 --> 00:20:12 minimum to say that it is a minimum if you stick to values 239 00:20:12 --> 00:20:15 that are close enough to that point. 240 00:20:15 --> 00:20:19 Of course, you also have a local maximum, which I didn't plot, 241 00:20:19 --> 00:20:23 but it is easy to plot. That is a local maximum. 242 00:20:23 --> 00:20:27 But there is a third example of critical point, 243 00:20:27 --> 00:20:31 and that is a saddle point. The saddle point, 244 00:20:31 --> 00:20:35 it is a new phenomenon that you don't really see in single 245 00:20:35 --> 00:20:38 variable calculus. It is a critical point that is 246 00:20:38 --> 00:20:42 neither a minimum nor a maximum because, depending on which 247 00:20:42 --> 00:20:46 direction you look in, it's either one or the other. 248 00:20:46 --> 00:20:50 See the point in the middle, at the origin, 249 00:20:50 --> 00:20:55 is a saddle point. If you look at the tangent 250 00:20:55 --> 00:20:58 plane to this graph, you will see that it is 251 00:20:58 --> 00:21:01 actually horizontal at the origin. 252 00:21:01 --> 00:21:05 You have this mountain pass where the ground is horizontal. 253 00:21:05 --> 00:21:08 But, depending on which direction you go, 254 00:21:08 --> 00:21:12 you go up or down. So, we say that a critical point is a 255 00:21:12 --> 00:21:16 saddle point if it is neither a minimum nor a maximum. 256 00:21:16 --> 00:21:30 257 00:21:30 --> 00:21:38 Possibilities could be a local min, a local max or a saddle. 258 00:21:38 --> 00:21:42 Tomorrow we will see how to decide which one it is, 259 00:21:42 --> 00:21:46 in general, using second derivatives. 
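To make the mountain-pass picture concrete, here is a tiny Python sketch. It assumes the saddle being shown is the graph of y^2 - x^2; that choice, the function name g and the step size h are assumptions made for this illustration. Both partial derivatives vanish at the origin, yet a small step along x lowers the value while a small step along y raises it.

```python
def g(x, y):
    # Assumed saddle example: g(x, y) = y^2 - x^2.
    return y**2 - x**2


h = 0.1  # small step away from the origin (arbitrary choice)

# Change in g when moving from the origin along the x direction.
along_x = g(h, 0.0) - g(0.0, 0.0)
# Change in g when moving from the origin along the y direction.
along_y = g(0.0, h) - g(0.0, 0.0)

print("step along x changes g by", along_x)
print("step along y changes g by", along_y)
```

The two changes have opposite signs, so the origin is neither a local minimum nor a local maximum.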
260 00:21:46 --> 00:21:50 For this time, let's just try to do it by 261 00:21:50 --> 00:21:53 hand. I just want to observe, 262 00:21:53 --> 00:21:57 in fact, I can try to, you know, 263 00:21:57 --> 00:21:58 these examples that I have here, 264 00:21:58 --> 00:22:02 they are x^2 + y^2, y^2 - x^2, they are sums or differences of 265 00:22:02 --> 00:22:05 squares. And, if we know that we can put 266 00:22:05 --> 00:22:08 things as a sum of squares, for example, we will be done. 267 00:22:08 --> 00:22:16 Let's try to express this maybe as the square of something. 268 00:22:16 --> 00:22:21 The main problem is this 2xy. Observe we know something that 269 00:22:21 --> 00:22:26 starts with x^2 - 2xy but is actually a square of something 270 00:22:26 --> 00:22:32 else. It would be x^2 - 2xy + y^2, 271 00:22:32 --> 00:22:37 not plus 3y^2. Let's try that. 272 00:22:37 --> 00:22:48 So, we are going to complete the square. 273 00:22:48 --> 00:22:53 I am going to say it is x minus y squared, so it gives me the 274 00:22:53 --> 00:23:01 first two terms and also one y^2. Well, I still need to add two 275 00:23:01 --> 00:23:09 more y^2, and I also need to add, of course, 276 00:23:09 --> 00:23:15 the 2x and - 2y. It is still not simple enough 277 00:23:15 --> 00:23:19 for my taste. I can actually do better. 278 00:23:19 --> 00:23:24 These guys look like a sum of squares, but here I have this 279 00:23:24 --> 00:23:28 extra stuff, 2x - 2y. Well, that is 2(x - y). 280 00:23:28 --> 00:23:32 It looks like maybe we can modify this and make this into 281 00:23:32 --> 00:23:36 another square. So, in fact, 282 00:23:36 --> 00:23:45 I can simplify this further to (x - y + 1)^2. 283 00:23:45 --> 00:23:51 That would be (x - y)^2 + 2(x - y), and then there is a plus 284 00:23:51 --> 00:23:55 one. Well, we don't have a plus one 285 00:23:55 --> 00:24:00 so let's remove it by subtracting one. 286 00:24:00 --> 00:24:07 And I still have my 2y^2. 
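Written out line by line, the completing-the-square steps just described look like this:

```latex
\begin{aligned}
f(x, y) &= x^2 - 2xy + 3y^2 + 2x - 2y \\
        &= (x - y)^2 + 2y^2 + 2x - 2y \\
        &= (x - y)^2 + 2\,(x - y) + 2y^2 \\
        &= (x - y + 1)^2 - 1 + 2y^2 .
\end{aligned}
```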
Do you see why this is the same 287 00:24:07 --> 00:24:13 function? Yeah. 288 00:24:13 --> 00:24:19 Again, if I expand x minus y plus one squared, 289 00:24:19 --> 00:24:28 I get (x - y)^2 + 2(x - y) + 1. But I will have minus one that 290 00:24:28 --> 00:24:34 will cancel out and then I have a plus 2y^2. 291 00:24:34 --> 00:24:41 Now, what I have is a sum of two squares minus one. 292 00:24:41 --> 00:24:44 And this critical point, (x, y) = (-1, 0), 293 00:24:44 --> 00:24:49 that is actually when this is zero and that is zero, 294 00:24:49 --> 00:24:55 so that is the smallest value. This is always greater or equal 295 00:24:55 --> 00:25:00 to zero, the same with that one, so f is always at least 296 00:25:00 --> 00:25:03 minus one. And minus one happens to be the 297 00:25:03 --> 00:25:13 value at the critical point. So, it is a minimum. 298 00:25:13 --> 00:25:16 Now, of course here I was very lucky. 299 00:25:16 --> 00:25:19 I mean, generally, I couldn't expect things to 300 00:25:19 --> 00:25:21 simplify that much. In fact, I cheated. 301 00:25:21 --> 00:25:26 I started from that, I expanded, and then that is 302 00:25:26 --> 00:25:30 how I got my example. The general method will be a 303 00:25:30 --> 00:25:32 bit different, but you will see it will 304 00:25:32 --> 00:25:34 actually also involve completing squares. 305 00:25:34 --> 00:25:42 Just there is more to it than what we have seen. 306 00:25:42 --> 00:25:48 We will come back to this tomorrow. 307 00:25:48 --> 00:25:56 Sorry? How do I know that this equals 308 00:25:56 --> 00:26:09 -- How do I know that the whole function is greater or equal to 309 00:26:09 --> 00:26:15 negative one? Well, I wrote f of x, 310 00:26:15 --> 00:26:20 y as something squared plus 2y^2 - 1. 311 00:26:20 --> 00:26:25 This square is always a non-negative number, never a 312 00:26:25 --> 00:26:27 negative one. It is a square. 313 00:26:27 --> 00:26:30 The square of something is always non-negative. 
314 00:26:30 --> 00:26:34 Similarly, y^2 is also always non-negative. 315 00:26:34 --> 00:26:38 So if you add something that is at least zero plus something 316 00:26:38 --> 00:26:40 that is at least zero and you subtract one, 317 00:26:40 --> 00:26:43 you get always at least minus one. 318 00:26:43 --> 00:26:48 And, in fact, the only way you can get minus 319 00:26:48 --> 00:26:54 one is if both of these guys are zero at the same time. 320 00:26:54 --> 00:27:17 That is how I get my minimum. More about this tomorrow. 321 00:27:17 --> 00:27:20 In fact, what I would like to tell you 322 00:27:20 --> 00:27:23 about now instead is a nice application of min, 323 00:27:23 --> 00:27:27 max problems that maybe you don't think of as a min, 324 00:27:27 --> 00:27:31 max problem when you see it. I mean you may not think of it 325 00:27:31 --> 00:27:35 that way because probably your calculator can do it for you or, 326 00:27:35 --> 00:27:37 if not, your computer can do it for you. 327 00:27:37 --> 00:27:42 But it is actually something where the theory is based on 328 00:27:42 --> 00:27:47 minimization in two variables. Very often in experimental 329 00:27:47 --> 00:27:52 sciences you have to do something called least-squares 330 00:27:52 --> 00:28:01 interpolation. And what is that about? 331 00:28:01 --> 00:28:07 Well, it is the idea that maybe you do some experiments and you 332 00:28:07 --> 00:28:11 record some data. You have some data x and some 333 00:28:11 --> 00:28:13 data y. And, I don't know, 334 00:28:13 --> 00:28:17 maybe, for example, x is -- Maybe you're measuring 335 00:28:17 --> 00:28:21 frogs and you're trying to measure how big the frog leg is 336 00:28:21 --> 00:28:23 compared to the eyes of the frog, 337 00:28:23 --> 00:28:26 or you're trying to measure something. 
338 00:28:26 --> 00:28:30 And if you are doing chemistry then it could be how much you 339 00:28:30 --> 00:28:35 put of some reactant and how much of the output product that 340 00:28:35 --> 00:28:37 you wanted to synthesize was generated. 341 00:28:37 --> 00:28:43 All sorts of things. Make up your own example. 342 00:28:43 --> 00:28:46 You measure basically, for various values of x, 343 00:28:46 --> 00:28:48 what the value of y ends up being. 344 00:28:48 --> 00:28:52 And then you would like to claim these points are kind of 345 00:28:52 --> 00:28:53 aligned. And, of course, 346 00:28:53 --> 00:28:55 to a mathematician they are not aligned. 347 00:28:55 --> 00:28:57 But, to an experimental scientist, that is evidence that 348 00:28:57 --> 00:29:00 there is a relation between the two. 349 00:29:00 --> 00:29:03 And so you want to claim -- And in your paper you will actually 350 00:29:03 --> 00:29:05 draw a nice little line like that. 351 00:29:05 --> 00:29:10 The two quantities depend linearly on each other. 352 00:29:10 --> 00:29:15 The question is how do we come up with that nice line that 353 00:29:15 --> 00:29:19 passes smack in the middle of the points? 354 00:29:19 --> 00:29:27 The question is, given experimental data xi, 355 00:29:27 --> 00:29:36 yi -- Maybe I should actually be more precise. 356 00:29:36 --> 00:29:37 You are given some experimental data. 357 00:29:37 --> 00:29:45 You have data points (x1, y1), (x2, y2) and so on, 358 00:29:45 --> 00:29:52 (xn, yn), the question would be find the 359 00:29:52 --> 00:30:00 "best fit" line of the form y = ax + b 360 00:30:00 --> 00:30:08 that somehow approximates this data very well. 361 00:30:08 --> 00:30:11 You can also use that right away to predict various things. 
362 00:30:11 --> 00:30:13 For example, if you look at your new 363 00:30:13 --> 00:30:17 homework, actually the first problem asks 364 00:30:17 --> 00:30:22 you to predict how many iPods will be on this planet in ten 365 00:30:22 --> 00:30:28 years looking at past sales and how they behave. 366 00:30:28 --> 00:30:31 One thing, right away, before you lose all the money 367 00:30:31 --> 00:30:35 that you don't have yet, you cannot use that to predict 368 00:30:35 --> 00:30:39 the stock market. So, don't try to use that to 369 00:30:39 --> 00:30:52 make money. It doesn't work. 370 00:30:52 --> 00:30:58 One tricky thing here that I want to draw your attention to 371 00:30:58 --> 00:31:02 is what are the unknowns here? The natural answer would be to 372 00:31:02 --> 00:31:03 say that the unknowns are x and y. 373 00:31:03 --> 00:31:07 That is not actually the case. We are not going to solve for 374 00:31:07 --> 00:31:09 some x and y. I mean we have some values 375 00:31:09 --> 00:31:12 given to us. And, when we are looking for 376 00:31:12 --> 00:31:16 that line, we don't really care about the perfect value of x. 377 00:31:16 --> 00:31:21 What we care about is actually these coefficients a and b that 378 00:31:21 --> 00:31:26 will tell us what the relation is between x and y. 379 00:31:26 --> 00:31:30 In fact, we are trying to solve for a and b that will give us 380 00:31:30 --> 00:31:34 the nicest possible line for these points. 381 00:31:34 --> 00:31:36 The unknowns, in our equations, 382 00:31:36 --> 00:31:39 will have to be a and b, not x and y. 383 00:31:39 --> 00:32:11 384 00:32:11 --> 00:32:20 The question really is find the "best" 385 00:32:20 --> 00:32:23 a and b. And, of course, 386 00:32:23 --> 00:32:26 we have to decide what we mean by best. 
387 00:32:26 --> 00:32:30 Best will mean that we minimize some function of a and b that 388 00:32:30 --> 00:32:34 measures the total errors that we are making when we are 389 00:32:34 --> 00:32:38 choosing this line compared to the experimental data. 390 00:32:38 --> 00:32:43 Maybe, roughly speaking, it should measure how far these 391 00:32:43 --> 00:32:49 points are from the line. But now there are various ways 392 00:32:49 --> 00:32:52 to do it. And a lot of them are valid, 393 00:32:52 --> 00:32:57 but they give you different answers. You have to decide what it is 394 00:32:57 --> 00:32:59 that you prefer. For example, 395 00:32:59 --> 00:33:04 you could measure the distance to the line by projecting 396 00:33:04 --> 00:33:08 perpendicularly. Or you could measure instead, 397 00:33:08 --> 00:33:13 for a given value of x, the difference between the 398 00:33:13 --> 00:33:17 experimental value of y and the predicted one. 399 00:33:17 --> 00:33:21 And that is often more relevant because x and y actually may 400 00:33:21 --> 00:33:25 be expressed in different units. They are not the same type of 401 00:33:25 --> 00:33:29 quantity. You cannot actually combine 402 00:33:29 --> 00:33:32 them arbitrarily. Anyway, the convention is 403 00:33:32 --> 00:33:34 usually we measure distance in this way. 404 00:33:34 --> 00:33:38 Next, you could try to minimize the largest distance. 405 00:33:38 --> 00:33:42 Say we look at who has the largest error and we make that 406 00:33:42 --> 00:33:44 the smallest possible. The drawback of doing that is that 407 00:33:44 --> 00:33:47 experimentally very often you have one data point that is not 408 00:33:47 --> 00:33:50 good because maybe you fell asleep in front of the 409 00:33:50 --> 00:33:53 experiment. And so you didn't measure the 410 00:33:53 --> 00:33:55 right thing. You tend to want to not give 411 00:33:55 --> 00:33:59 too much importance to some data point that is far away from the 412 00:33:59 --> 00:34:02 others. 
Maybe instead you want to 413 00:34:02 --> 00:34:06 measure the average distance or maybe you want to actually give 414 00:34:06 --> 00:34:09 more weight to things that are further away. 415 00:34:09 --> 00:34:12 And then you want to use not the distance but the square of 416 00:34:12 --> 00:34:14 the distance. There are various possible 417 00:34:14 --> 00:34:18 answers, but one of them gives us actually a particularly nice 418 00:34:18 --> 00:34:22 formula for a and b. And so that is why it is the 419 00:34:22 --> 00:34:27 universally used one. Here it says least squares. 420 00:34:27 --> 00:34:31 That's because we will measure, actually, the sum of the 421 00:34:31 --> 00:34:35 squares of the errors. And why do we do that? 422 00:34:35 --> 00:34:37 Well, part of it is because it looks good. 423 00:34:37 --> 00:34:42 When you see these plots in scientific papers they really 424 00:34:42 --> 00:34:46 look like the line is indeed the ideal line. 425 00:34:46 --> 00:34:49 And the second reason is because actually the 426 00:34:49 --> 00:34:52 minimization problem that we will get is particularly simple, 427 00:34:52 --> 00:34:57 well-posed and easy to solve. So we will have a nice formula 428 00:34:57 --> 00:35:03 for the best a and the best b. If you have a method that is 429 00:35:03 --> 00:35:07 simple and gives you a good answer then that is probably 430 00:35:07 --> 00:35:09 good. We have to define best. 431 00:35:09 --> 00:35:22 Here it is in the sense of minimizing the total square 432 00:35:22 --> 00:35:29 error. Or maybe I should say total 433 00:35:29 --> 00:35:35 square deviation instead. What do I mean by this? 434 00:35:35 --> 00:35:44 The deviation for each data point is the difference between 435 00:35:44 --> 00:35:52 what you have measured and what you are predicting by your 436 00:35:52 --> 00:36:00 model. That is the difference between 437 00:36:00 --> 00:36:11 yi and a xi + b. 
Now, what we will do is try to 438 00:36:11 --> 00:36:25 minimize the function capital D, which is just the sum for all 439 00:36:25 --> 00:36:36 the data points of the square of the deviation. 440 00:36:36 --> 00:36:40 Let me go over this again. This is a function of a and b. 441 00:36:40 --> 00:36:43 Of course there are a lot of letters in here, 442 00:36:43 --> 00:36:46 but xi and yi in real life will be numbers given to 443 00:36:46 --> 00:36:48 you. There will be numbers that you 444 00:36:48 --> 00:36:51 have measured. You have measured all of this 445 00:36:51 --> 00:36:53 data. They are just going to be 446 00:36:53 --> 00:36:58 numbers. You put them in there and you 447 00:36:58 --> 00:37:04 get a function of a and b. Any questions? 448 00:37:04 --> 00:37:16 449 00:37:16 --> 00:37:20 How do we minimize this function of a and b? 450 00:37:20 --> 00:37:27 Well, let's use your knowledge. Let's actually look for a 451 00:37:27 --> 00:37:34 critical point. We want to solve for partial D 452 00:37:34 --> 00:37:42 over partial a = 0, partial D over partial b = 0. 453 00:37:42 --> 00:37:48 That is how we look for critical points. 454 00:37:48 --> 00:37:52 Let's take the derivative of this with respect to a. 455 00:37:52 --> 00:37:59 Well, the derivative of a sum is the sum of the derivatives. 456 00:37:59 --> 00:38:04 And now we have to take the derivative of this quantity 457 00:38:04 --> 00:38:07 squared. Remember, we take the 458 00:38:07 --> 00:38:11 derivative of the square. We take twice this quantity 459 00:38:11 --> 00:38:15 times the derivative of what we are squaring. 460 00:38:15 --> 00:38:26 We will get 2(yi - (axi + b)) times the derivative of this with 461 00:38:26 --> 00:38:30 respect to a. What is the derivative of this 462 00:38:30 --> 00:38:35 with respect to a? Negative xi, exactly. 463 00:38:35 --> 00:38:38 And so we will want this to be 0.
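Written out, the function being minimized and the first critical-point equation described above look as follows. This is a sketch in standard notation; the index range i = 1, ..., n is an assumption for the number of data points, which the lecture only names later.

```latex
D(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a x_i + b) \bigr)^2

\frac{\partial D}{\partial a}
  = \sum_{i=1}^{n} 2 \bigl( y_i - (a x_i + b) \bigr) (-x_i) = 0
```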
464 00:38:38 --> 00:38:41 And partial D over partial b, we do the same thing, 465 00:38:41 --> 00:38:45 but differentiating with respect to b instead of with 466 00:38:45 --> 00:38:50 respect to a. Again, the sum of twice 467 00:38:50 --> 00:38:58 (yi minus (axi plus b)) times the derivative of this with respect 468 00:38:58 --> 00:39:02 to b, which is, I think, negative one. 469 00:39:02 --> 00:39:07 Those are the equations we have to solve. 470 00:39:07 --> 00:39:10 Well, let's reorganize this a little bit. 471 00:39:10 --> 00:39:24 472 00:39:24 --> 00:39:32 The first equation. See, there are a's and there 473 00:39:32 --> 00:39:36 are b's in these equations. I am going to just look at the 474 00:39:36 --> 00:39:39 coefficients of a and b. If you have good eyes, 475 00:39:39 --> 00:39:42 you can see probably that these are actually linear equations in 476 00:39:42 --> 00:39:45 a and b. There is a lot of clutter with 477 00:39:45 --> 00:39:47 all these x's and y's all over the place. 478 00:39:47 --> 00:39:55 Let's actually try to expand things and make that more 479 00:39:55 --> 00:39:59 apparent. The first thing I will do is 480 00:39:59 --> 00:40:02 actually get rid of these factors of two. 481 00:40:02 --> 00:40:05 They are just not very important. 482 00:40:05 --> 00:40:10 I can simplify things. Next, I am going to look at the 483 00:40:10 --> 00:40:15 coefficient of a. I will get basically a times xi 484 00:40:15 --> 00:40:24 squared. Let me just do it and it should be 485 00:40:24 --> 00:40:33 clear. I claim when we simplify this 486 00:40:33 --> 00:40:46 we get xi squared times a plus xi times b minus xiyi. 487 00:40:46 --> 00:40:53 And we set this equal to zero. Do you agree that this is what 488 00:40:53 --> 00:40:57 we get when we expand that product? 489 00:40:57 --> 00:41:03 Yeah. Kind of? OK. Let's do the other one.
490 00:41:03 --> 00:41:08 We just multiply by minus one, so we take the opposite of that 491 00:41:08 --> 00:41:19 which would be axi plus b. I will write that as xia plus b 492 00:41:19 --> 00:41:25 minus yi. Sorry. I forgot the n here. 493 00:41:25 --> 00:41:30 And let me just reorganize that by actually putting all the a's 494 00:41:30 --> 00:41:34 together. That means I will have sum of 495 00:41:34 --> 00:41:40 all the xi squared times a plus sum of xi times b minus sum of xiyi equal to 496 00:41:40 --> 00:41:41 zero. 497 00:41:41 --> 00:42:08 498 00:42:08 --> 00:42:15 If I rewrite this, it becomes sum of xi squared times a 499 00:42:15 --> 00:42:24 plus sum of the xi's times b, and let me move the other guys 500 00:42:24 --> 00:42:30 to the other side, equals sum of xiyi. 501 00:42:30 --> 00:42:37 And that one becomes sum of xi times a. 502 00:42:37 --> 00:42:41 Plus how many b's do I get on this one? 503 00:42:41 --> 00:42:45 I get one for each data point. When I sum them together, 504 00:42:45 --> 00:42:48 I will get n. Very good. 505 00:42:48 --> 00:42:56 n times b equals sum of yi. Now, these quantities look 506 00:42:56 --> 00:42:58 scary, but they are actually just numbers. 507 00:42:58 --> 00:43:01 For example, this one, you look at all your 508 00:43:01 --> 00:43:05 data points. For each of them you take the 509 00:43:05 --> 00:43:10 value of x and you just sum all these numbers together. 510 00:43:10 --> 00:43:19 What you get, actually, is a linear system in 511 00:43:19 --> 00:43:26 a and b, a two by two linear system. 512 00:43:26 --> 00:43:32 And so now we can solve this for a and b. 513 00:43:32 --> 00:43:35 In practice, of course, first you plug in 514 00:43:35 --> 00:43:40 the numbers for xi and yi and then you solve the system that 515 00:43:40 --> 00:43:44 you get. And we know how to solve two by 516 00:43:44 --> 00:43:46 two linear systems, I hope. 517 00:43:46 --> 00:43:50 That's how we find the best fit line.
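The two by two system described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `best_fit_line` and `total_square_deviation` are hypothetical, the data are plain lists of numbers, and the system is solved with Cramer's rule, which is one of several ways to solve it.

```python
def best_fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b.

    Solves the two by two linear system from the critical-point equations:
        (sum of xi^2) * a + (sum of xi) * b = sum of xi*yi
        (sum of xi)   * a + n           * b = sum of yi
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # These sums are just numbers once the data are plugged in.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Determinant of the coefficient matrix; zero when all xi are equal.
    det = sum_xx * n - sum_x * sum_x
    if det == 0:
        raise ValueError("all x values are equal; no unique best line")

    # Cramer's rule for the two unknowns.
    a = (sum_xy * n - sum_x * sum_y) / det
    b = (sum_xx * sum_y - sum_x * sum_xy) / det
    return a, b


def total_square_deviation(xs, ys, a, b):
    """The error function D(a, b): sum of squared deviations."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
```

For example, `best_fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns `(2.0, 1.0)`, since those four points lie exactly on the line y = 2x + 1, and the total square deviation for that line is zero.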
518 00:43:50 --> 00:43:54 Now, why is that going to be the best one instead of the 519 00:43:54 --> 00:43:56 worst one? We just solved for a critical 520 00:43:56 --> 00:43:58 point. That could actually be a 521 00:43:58 --> 00:44:01 maximum of this error function D. 522 00:44:01 --> 00:44:05 We will have the answer to that next time, but trust me. 523 00:44:05 --> 00:44:08 If you really want to go over the second derivative test that 524 00:44:08 --> 00:44:11 we will see tomorrow and apply it in this case, 525 00:44:11 --> 00:44:14 it is quite hard to check, but you can see it is actually 526 00:44:14 --> 00:44:28 a minimum. I will just say -- we can 527 00:44:28 --> 00:44:42 show that it is a minimum. Now, even though the linear 528 00:44:42 --> 00:44:47 case is the one that we are the most familiar with, 529 00:44:47 --> 00:44:56 least-squares interpolation actually works in much more 530 00:44:56 --> 00:45:03 general settings. Because instead of fitting for 531 00:45:03 --> 00:45:06 the best line, if you think it has a different 532 00:45:06 --> 00:45:10 kind of relation then maybe you can fit it using a different 533 00:45:10 --> 00:45:14 kind of formula. Let me actually illustrate that 534 00:45:14 --> 00:45:17 with an example. I don't know if you are 535 00:45:17 --> 00:45:21 familiar with Moore's law. It is something that is 536 00:45:21 --> 00:45:24 supposed to tell you how quickly basically computer chips become 537 00:45:24 --> 00:45:27 smarter faster and faster all the time. 538 00:45:27 --> 00:45:31 It's a law that says things about the number of transistors 539 00:45:31 --> 00:45:33 that you can fit onto a computer chip. 540 00:45:33 --> 00:45:45 Here I have some data about -- Here is data about the number of 541 00:45:45 --> 00:45:58 transistors on a standard PC processor as a function of time. 542 00:45:58 --> 00:46:01 And if you try to do a best-line fit, 543 00:46:01 --> 00:46:07 well, it doesn't seem to follow a linear trend.
544 00:46:07 --> 00:46:11 On the other hand, if you plot the diagram in 545 00:46:11 --> 00:46:13 log scale, the log of the number of 546 00:46:13 --> 00:46:15 transistors as a function of time, 547 00:46:15 --> 00:46:21 then you get a much better line. And so, in fact, 548 00:46:21 --> 00:46:26 that means that you had an exponential relation between the 549 00:46:26 --> 00:46:30 number of transistors and time. And so, actually that's what 550 00:46:30 --> 00:46:32 Moore's law says. It says that the number of 551 00:46:32 --> 00:46:36 transistors in the chip doubles every 18 months or every two 552 00:46:36 --> 00:46:40 years. They keep changing the 553 00:46:40 --> 00:46:49 statement. How do we find the best 554 00:46:49 --> 00:46:58 exponential fit? Well, an exponential fit would 555 00:46:58 --> 00:47:05 be something of the form y equals a constant c times exponential of 556 00:47:05 --> 00:47:09 a times x. That is what we want to look at. 557 00:47:09 --> 00:47:13 Well, we could try to minimize a square error like we did 558 00:47:13 --> 00:47:16 before. That doesn't work well at all. 559 00:47:16 --> 00:47:18 The equations that you get are very complicated. 560 00:47:18 --> 00:47:24 You cannot solve them. But remember what I showed you 561 00:47:24 --> 00:47:28 on this log plot. If you plot the log of y as a 562 00:47:28 --> 00:47:33 function of x then suddenly it becomes a linear relation. 563 00:47:33 --> 00:47:43 Observe, this is the same as ln of y equals ln of c plus ax. 564 00:47:43 --> 00:47:55 And that is the linear best fit. What you do is you just look 565 00:47:55 --> 00:48:08 for the best straight line fit for the log of y. 566 00:48:08 --> 00:48:10 That is something we already know. 567 00:48:10 --> 00:48:12 But you can also do, for example, 568 00:48:12 --> 00:48:16 let's say that we have something more complicated. 569 00:48:16 --> 00:48:21 Let's say that we have actually a quadratic law.
570 00:48:21 --> 00:48:27 For example, y is of the form ax^2 + bx + c. 571 00:48:27 --> 00:48:31 And, of course, you are trying to find somehow 572 00:48:31 --> 00:48:34 the best. That would mean here fitting 573 00:48:34 --> 00:48:37 the best parabola for your data points. 574 00:48:37 --> 00:48:40 Well, to do that, you would need to find a, 575 00:48:40 --> 00:48:45 b and c. And now you will have actually 576 00:48:45 --> 00:48:51 a function of a, b and c, which would be the sum 577 00:48:51 --> 00:48:57 over all the data points of the square deviation. 578 00:48:57 --> 00:49:01 And, if you try to solve for critical points, 579 00:49:01 --> 00:49:03 now you will have three equations involving a, 580 00:49:03 --> 00:49:05 b and c. In fact, you will find a three 581 00:49:05 --> 00:49:09 by three linear system. And it works the same way. 582 00:49:09 --> 00:49:14 Just you have a little bit more data. 583 00:49:14 --> 00:49:19 Basically, you see that these best fit problems are an example 584 00:49:19 --> 00:49:24 of a minimization problem where maybe you didn't expect to see 585 00:49:24 --> 00:49:30 minimization problems come in. But that is really the way to 586 00:49:30 --> 00:49:34 handle these questions. Tomorrow we will go back to the 587 00:49:34 --> 00:49:38 question of how do we decide whether it is a minimum or a 588 00:49:38 --> 00:49:40 maximum. And we will continue exploring 589 00:49:40 --> 00:49:43 in terms of several variables. 590 00:49:43 --> 00:49:48
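The two generalizations mentioned in the lecture, the exponential fit through the log of y and the best parabola through a three by three linear system, can be sketched as follows. The function names (`fit_line`, `fit_exponential`, `fit_quadratic`, `det3`) are hypothetical, and Cramer's rule is an assumed choice of solver, not something the lecture specifies. Note that fitting a line to ln y minimizes the square deviation of ln y, which is not the same as minimizing the square deviation of y itself.

```python
import math


def fit_line(xs, ys):
    """Least-squares line y = a*x + b from the two by two system."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")
    return (sxy * n - sx * sy) / det, (sxx * sy - sx * sxy) / det


def fit_exponential(xs, ys):
    """Fit y = c * exp(a*x) by a straight-line fit of ln(y) against x.

    Uses ln(y) = ln(c) + a*x, so every y must be positive.
    Returns (c, a).
    """
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take logs")
    a, ln_c = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_c), a


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Fit y = a*x^2 + b*x + c by solving the three by three system
    obtained from setting the partials of D(a, b, c) to zero.
    Returns (a, b, c).
    """
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))

    matrix = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = det3(matrix)
    if d == 0:
        raise ValueError("need at least three distinct x values")

    # Cramer's rule: replace column k by the right-hand side.
    solution = []
    for k in range(3):
        mk = [row[:] for row in matrix]
        for r in range(3):
            mk[r][k] = rhs[r]
        solution.append(det3(mk) / d)
    return tuple(solution)
```

As a quick check, data that doubles at each step, such as `fit_exponential([0, 1, 2], [3.0, 6.0, 12.0])`, should give c close to 3 and a close to ln 2, and points taken exactly from a parabola should give back its coefficients up to rounding.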