1
00:00:01 --> 00:00:03
The following content is
provided under a Creative

2
00:00:03 --> 00:00:05
Commons license.
Your support will help MIT

3
00:00:05 --> 00:00:08
OpenCourseWare continue to offer
high quality educational

4
00:00:08 --> 00:00:13
resources for free.
To make a donation or to view

5
00:00:13 --> 00:00:18
additional materials from
hundreds of MIT courses,

6
00:00:18 --> 00:00:23
visit MIT OpenCourseWare at
ocw.mit.edu.

7
00:00:25 --> 00:00:29
Last time we saw things about
gradients and directional

8
00:00:29 --> 00:00:32
derivatives.
Before that we studied how to

9
00:00:32 --> 00:00:37
look for minima and maxima of
functions of several variables.

10
00:00:37 --> 00:00:41
And today we are going to look
again at min/max problems but in

11
00:00:41 --> 00:00:45
a different setting,
namely, one for variables that

12
00:00:45 --> 00:00:49
are not independent.
And so what we will see is you

13
00:00:49 --> 00:00:52
may have heard of Lagrange
multipliers.

14
00:00:52 --> 00:00:59
And this is the one point in
the term when I can shine with

15
00:00:59 --> 00:01:05
my French accent and say
Lagrange's name properly.

16
00:01:05 --> 00:01:08
OK.
What are Lagrange multipliers

17
00:01:08 --> 00:01:13
about?
Well, the goal is to minimize

18
00:01:13 --> 00:01:19
or maximize a function of
several variables.

19
00:01:19 --> 00:01:22
Let's say, for example,
f of x, y, z,

20
00:01:22 --> 00:01:27
but where these variables are
no longer independent.

21
00:01:27 --> 00:01:41
 
 

22
00:01:41 --> 00:01:43
They are not independent.
That means that there is a

23
00:01:43 --> 00:01:47
relation between them.
The relation is maybe some

24
00:01:47 --> 00:01:52
equation of the form g of x,
y, z equals some constant.

25
00:01:52 --> 00:01:57
You take the relation between
x, y, z, you call that g and

26
00:01:57 --> 00:02:02
that gives you the constraint.
And your goal is to minimize f

27
00:02:02 --> 00:02:05
only over those values of x,
y, z that satisfy the

28
00:02:05 --> 00:02:07
constraint.
What is one way to do that?

29
00:02:07 --> 00:02:10
Well, one way to do that,
if the constraint is very

30
00:02:10 --> 00:02:14
simple, we can maybe solve for
one of the variables.

31
00:02:14 --> 00:02:17
Maybe we can solve this
equation for one of the

32
00:02:17 --> 00:02:21
variables, plug it back into f,
and then we have a usual

33
00:02:21 --> 00:02:25
min/max problem that we have
seen how to do.

34
00:02:25 --> 00:02:28
The problem is sometimes you
cannot actually solve for x,

35
00:02:28 --> 00:02:31
y, z in here because this
condition is too complicated and

36
00:02:31 --> 00:02:38
then we need a new method.
That is what we are going to do.

37
00:02:38 --> 00:02:41
Why would we care about that?
Well, one example is actually

38
00:02:41 --> 00:02:43
in physics.
Maybe you have seen in

39
00:02:43 --> 00:02:47
thermodynamics that you study
quantities about gases,

40
00:02:47 --> 00:02:50
and those quantities that
involve pressure,

41
00:02:50 --> 00:02:53
volume and temperature.
And pressure,

42
00:02:53 --> 00:02:56
volume and temperature are not
independent of each other.

43
00:02:56 --> 00:02:59
I mean you know probably the
equation PV = nRT.

44
00:02:59 --> 00:03:01
And, of course,
there you could actually solve

45
00:03:01 --> 00:03:03
to express things in terms of
one or the other.

46
00:03:03 --> 00:03:07
But sometimes it is more
convenient to keep all three

47
00:03:07 --> 00:03:09
variables but treat them as
constrained.

48
00:03:09 --> 00:03:19
It is just an example of a
situation where you might want

49
00:03:19 --> 00:03:24
to do this.
Anyway, we will look mostly at

50
00:03:24 --> 00:03:28
particular examples,
but just to point out that this

51
00:03:28 --> 00:03:32
is useful when you study gases
in physics.

52
00:03:32 --> 00:03:35
The first observation is we
cannot use our usual method of

53
00:03:35 --> 00:03:36
looking for critical points of
f.

54
00:03:36 --> 00:03:40
Because critical points of f
typically will not satisfy this

55
00:03:40 --> 00:03:43
condition and so won't be good
solutions.

56
00:03:43 --> 00:03:49
We need something else.
Let's look at an example,

57
00:03:49 --> 00:03:53
and we will see how that leads
us to the method.

58
00:03:53 --> 00:04:03
For example,
let's say that I want to find

59
00:04:03 --> 00:04:17
the point closest to the origin
-- -- on the hyperbola xy equals

60
00:04:17 --> 00:04:23
3 in the plane.
That means I have this

61
00:04:23 --> 00:04:26
hyperbola, and I am asking
myself what is the point on it

62
00:04:26 --> 00:04:29
that is the closest to the
origin?

63
00:04:29 --> 00:04:31
I mean we can solve this by
elementary geometry,

64
00:04:31 --> 00:04:34
we don't need actually Lagrange
multipliers,

65
00:04:34 --> 00:04:38
but we are going to do it with
Lagrange multipliers because it

66
00:04:38 --> 00:04:41
is a pretty good example.
What does it mean?

67
00:04:41 --> 00:04:47
Well, it means that we want to
minimize distance to the origin.

68
00:04:47 --> 00:04:49
What is the distance to the
origin?

69
00:04:49 --> 00:04:53
If I have a point,
at coordinates (x,

70
00:04:53 --> 00:04:58
y), then the distance to the
origin is square root of x

71
00:04:58 --> 00:05:02
squared plus y squared.
Well, do we really want to

72
00:05:02 --> 00:05:05
minimize that or can we minimize
something easier?

73
00:05:05 --> 00:05:06
Yeah.
Maybe we can minimize the

74
00:05:06 --> 00:05:14
square of a distance.
Let's forget this guy and

75
00:05:14 --> 00:05:23
instead -- Actually,
we will minimize f of x,

76
00:05:23 --> 00:05:27
y equals x squared plus y
squared,

77
00:05:27 --> 00:05:39
that looks better, 
subject to the constraint xy =

78
00:05:39 --> 00:05:44
3.
And so we will call this thing

79
00:05:44 --> 00:05:50
g of x, y to illustrate the
general method.

80
00:05:50 --> 00:05:58
Let's look at a picture.
Here you can see in yellow the

81
00:05:58 --> 00:06:02
hyperbola xy equals three.
And we are going to look for

82
00:06:02 --> 00:06:05
the points that are the closest
to the origin.

83
00:06:05 --> 00:06:08
What can we do?
Well, for example,

84
00:06:08 --> 00:06:13
we can plot the function x
squared plus y squared,

85
00:06:13 --> 00:06:17
function f.
That is the contour plot of f

86
00:06:17 --> 00:06:21
with a hyperbola on top of it.
Now let's see what we can do

87
00:06:21 --> 00:06:25
with that.
Well, let's ask ourselves,

88
00:06:25 --> 00:06:30
for example,
if I look at points where f

89
00:06:30 --> 00:06:34
equals 20 now.
I think I am at 20 but you

90
00:06:34 --> 00:06:37
cannot really see it.
That is the circle of points

91
00:06:37 --> 00:06:41
whose distance squared is 20.
Well, can I find a solution if

92
00:06:41 --> 00:06:44
I am on the hyperbola?
Yes, there are four points at

93
00:06:44 --> 00:06:46
this distance.
Can I do better?

94
00:06:46 --> 00:06:49
Well, let's decrease the
distance.

95
00:06:49 --> 00:06:52
Yes, we can still find points
on the hyperbola and so on.

96
00:06:52 --> 00:06:56
Except if we go too low then
there are no points on this

97
00:06:56 --> 00:07:00
circle anymore on the hyperbola.
If we decrease the value of f

98
00:07:00 --> 00:07:03
that we want to look at, there
will be some limit value beyond

99
00:07:03 --> 00:07:07
which we cannot go,
and that is the minimum of f.

100
00:07:07 --> 00:07:13
We are trying to look for the
smallest value of f that will

101
00:07:13 --> 00:07:17
actually be realized on the
hyperbola.

102
00:07:17 --> 00:07:20
When does that happen?
Well, I have to backtrack a

103
00:07:20 --> 00:07:23
little bit.
It seems like the limiting case

104
00:07:23 --> 00:07:26
is basically here.
It is when the circle is

105
00:07:26 --> 00:07:31
tangent to the hyperbola.
That is the smallest circle

106
00:07:31 --> 00:07:37
that will hit the hyperbola.
If I take a larger value of f,

107
00:07:37 --> 00:07:39
I will have solutions.
If I take a smaller value of f,

108
00:07:39 --> 00:07:41
I will not have any solutions
anymore.

109
00:07:41 --> 00:07:49
So, that is the situation that
we want to solve for.
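
A small numerical sketch of the shrinking-circle picture, added for illustration and not part of the lecture. It assumes the substitution y = 3/x, which turns x^2 + y^2 = c into t^2 - c t + 9 = 0 with t = x^2, so the circle meets the hyperbola only when c^2 - 36 is not negative. The name count_intersections is made up for this example.

```python
import math

def count_intersections(c):
    """Count points where the circle x^2 + y^2 = c meets the hyperbola xy = 3.

    Substituting y = 3/x gives t^2 - c*t + 9 = 0 with t = x^2.
    """
    disc = c * c - 36.0
    if disc < 0:
        return 0  # circle too small: it misses the hyperbola
    # a set merges the double root that appears at tangency (disc == 0)
    roots = {(c + math.sqrt(disc)) / 2, (c - math.sqrt(disc)) / 2}
    # each positive t gives two points: x = +sqrt(t) and x = -sqrt(t)
    return 2 * sum(1 for t in roots if t > 0)

for c in (20, 10, 6, 5):
    print(c, count_intersections(c))
```

Under these assumptions the count drops from four points to two at c = 6 and to none below it, which matches the tangent circle in the picture.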

110
00:07:49 --> 00:07:54
How do we find that minimum?
Well, a key observation that is

111
00:07:54 --> 00:07:58
valid on this picture,
and that actually remains true

112
00:07:58 --> 00:08:03
in the completely general case,
is that when we have a minimum

113
00:08:03 --> 00:08:09
the level curve of f is actually
tangent to our hyperbola.

114
00:08:09 --> 00:08:15
It is tangent to the set of
points where xy

115
00:08:15 --> 00:08:20
equals three,
to the hyperbola.

116
00:08:20 --> 00:08:32
Let's write that down.
We observe that at the minimum

117
00:08:32 --> 00:08:49
the level curve of f is tangent
to the hyperbola.

118
00:08:49 --> 00:08:53
Remember, the hyperbola is
given by the equation g equals

119
00:08:53 --> 00:08:56
three, so it is a level curve of
g.

120
00:08:56 --> 00:08:59
We have a level curve of f and
a level curve of g that are

121
00:08:59 --> 00:09:03
tangent to each other.
And I claim that is going to be

122
00:09:03 --> 00:09:07
the general situation that we
are interested in.

123
00:09:07 --> 00:09:12
How do we try to solve for
points where this happens?

124
00:09:12 --> 00:09:28
 
 

125
00:09:28 --> 00:09:36
How do we find x,
y where the level curves of f

126
00:09:36 --> 00:09:47
and g are tangent to each other?
Let's think for a second.

127
00:09:47 --> 00:09:51
If the two level curves are
tangent to each other that means

128
00:09:51 --> 00:09:57
they have the same tangent line.
That means that the normal

129
00:09:57 --> 00:10:03
vectors should be parallel.
Let me maybe draw a picture

130
00:10:03 --> 00:10:06
here.
This is the level curve maybe f

131
00:10:06 --> 00:10:11
equals something.
And this is the level curve g

132
00:10:11 --> 00:10:16
equals constant.
Here my constant is three.

133
00:10:16 --> 00:10:20
Well, if I look for gradient
vectors, the gradient of f will

134
00:10:20 --> 00:10:23
be perpendicular to the level
curve of f.

135
00:10:23 --> 00:10:27
The gradient of g will be
perpendicular to the level curve

136
00:10:27 --> 00:10:29
of g.
They don't have any reason to

137
00:10:29 --> 00:10:32
be of the same size,
but they have to be parallel to

138
00:10:32 --> 00:10:35
each other.
Of course, they could also be

139
00:10:35 --> 00:10:38
parallel pointing in opposite
directions.

140
00:10:38 --> 00:10:48
But the key point is that when
this happens the gradient of f

141
00:10:48 --> 00:10:54
is parallel to the gradient of
g.

142
00:10:54 --> 00:11:03
Well, let's check that.
Here is a point.

143
00:11:03 --> 00:11:05
And I can plot the gradient of
f in blue.

144
00:11:05 --> 00:11:08
The gradient of g in yellow.
And you see,

145
00:11:08 --> 00:11:12
in most of these places,
somehow the two gradients are

146
00:11:12 --> 00:11:14
not really parallel.
Actually, I should not be

147
00:11:14 --> 00:11:17
looking at random points.
I should be looking only on the

148
00:11:17 --> 00:11:19
hyperbola.
I want points on the hyperbola

149
00:11:19 --> 00:11:22
where the two gradients are
parallel.

150
00:11:22 --> 00:11:28
Well, when does that happen?
Well, it looks like it will

151
00:11:28 --> 00:11:31
happen here.
When I am at a minimum,

152
00:11:31 --> 00:11:34
the two gradient vectors are
parallel.
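
One way to check the parallel-gradient claim numerically, as a sketch rather than anything shown in the lecture. It uses the point (sqrt(3), sqrt(3)), which the lecture derives later, and compares it with another point on the hyperbola, (1, 3), chosen here only for illustration. The helper names are made up.

```python
import math

def grad_f(x, y):
    # f(x, y) = x^2 + y^2
    return (2 * x, 2 * y)

def grad_g(x, y):
    # g(x, y) = x * y
    return (y, x)

def cross_2d(u, v):
    # zero exactly when u and v are parallel (or one of them is zero)
    return u[0] * v[1] - u[1] * v[0]

r = math.sqrt(3)
at_tangency = cross_2d(grad_f(r, r), grad_g(r, r))
elsewhere = cross_2d(grad_f(1.0, 3.0), grad_g(1.0, 3.0))
print(at_tangency, elsewhere)
```

The first value vanishes and the second does not, so the gradients are parallel at the tangency point but not at the other point on the hyperbola.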

153
00:11:34 --> 00:11:37
It is not really a proof.
It is an example that seems to

154
00:11:37 --> 00:11:43
be convincing.
So far things work pretty well.

155
00:11:43 --> 00:11:46
How do we decide if two vectors
are parallel?

156
00:11:46 --> 00:11:50
Well, they are parallel when
they are proportional to each

157
00:11:50 --> 00:11:54
other.
You can write one of them as a

158
00:11:54 --> 00:12:02
constant times the other one,
and that constant usually one

159
00:12:02 --> 00:12:07
uses the Greek letter lambda.
I don't know if you have seen

160
00:12:07 --> 00:12:10
it before.
It is the Greek letter for L.

161
00:12:10 --> 00:12:15
And probably,
I am sure, it is somebody's

162
00:12:15 --> 00:12:22
idea of paying tribute to
Lagrange by putting an L in

163
00:12:22 --> 00:12:25
there.
Lambda is just a constant.

164
00:12:25 --> 00:12:31
And we are looking for a scalar
lambda and points x and y where

165
00:12:31 --> 00:12:33
this holds.
In fact, 

166
00:12:33 --> 00:12:37
what we are doing is replacing
min/max problems in two

167
00:12:37 --> 00:12:41
variables with a constraint
between them by a set of

168
00:12:41 --> 00:12:47
equations involving,
you will see, three variables.

169
00:12:47 --> 00:12:54
We had min/max with two
variables x, y,

170
00:12:54 --> 00:13:00
but not independent.
We had a constraint g of x,

171
00:13:00 --> 00:13:06
y equals constant.
And that becomes something new.

172
00:13:06 --> 00:13:12
That becomes a system of
equations where we have to

173
00:13:12 --> 00:13:19
solve, well, let's write down
what it means for gradient f to

174
00:13:19 --> 00:13:26
be proportional to gradient g.
That means that f sub x should

175
00:13:26 --> 00:13:32
be lambda times g sub x,
and f sub y should be lambda

176
00:13:32 --> 00:13:36
times g sub y.
Because the gradient vectors

177
00:13:36 --> 00:13:39
here are f sub x,
f sub y and g sub x,

178
00:13:39 --> 00:13:43
g sub y.
If you have a third variable z

179
00:13:43 --> 00:13:49
then you have also an equation f
sub z equals lambda g sub z.

180
00:13:49 --> 00:13:53
Now, let's see.
How many unknowns do we have in

181
00:13:53 --> 00:13:55
these equations?
Well, there is x,

182
00:13:55 --> 00:14:01
there is y and there is lambda.
We have three unknowns and have

183
00:14:01 --> 00:14:06
only two equations.
Something is missing.

184
00:14:06 --> 00:14:10
Well, I mean x and y are not
actually independent.

185
00:14:10 --> 00:14:14
They are related by the
equation g of x,

186
00:14:14 --> 00:14:21
y equals c, so we need to add
the constraint g equals c.

187
00:14:21 --> 00:14:26
And now we have three equations
involving three variables.
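
Written out, the system just described reads as follows. This is a transcription of the spoken equations into LaTeX, with unknowns x, y and lambda:

```latex
\begin{aligned}
f_x(x, y) &= \lambda\, g_x(x, y) \\
f_y(x, y) &= \lambda\, g_y(x, y) \\
g(x, y) &= c
\end{aligned}
```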

188
00:14:26 --> 00:14:39
Let's see how that works.
Here remember we have f equals

189
00:14:39 --> 00:14:45
x squared plus y squared and g = xy.
What is f sub x?

190
00:14:45 --> 00:14:52
It is going to be 2x equals
lambda times,

191
00:14:52 --> 00:14:55
what is g sub x?
y.

192
00:14:55 --> 00:14:59
Maybe I should write here f sub
x equals lambda g sub x just to

193
00:14:59 --> 00:15:03
remind you.
Then we have f sub y equals

194
00:15:03 --> 00:15:10
lambda g sub y.
F sub y is 2y equals lambda

195
00:15:10 --> 00:15:18
times g sub y is x.
And then our third equation g

196
00:15:18 --> 00:15:22
equals c becomes xy equals
three.

197
00:15:22 --> 00:15:26
So, that is what you would have
to solve.
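
For this example, with f = x^2 + y^2 and g = xy, the three spoken equations transcribe to:

```latex
\begin{aligned}
2x &= \lambda y \\
2y &= \lambda x \\
xy &= 3
\end{aligned}
```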

198
00:15:26 --> 00:15:33
Any questions at this point?
No.

199
00:15:33 --> 00:15:44
Yes?
How do I know the direction of

200
00:15:44 --> 00:15:47
a gradient?
Do you mean how do I know that

201
00:15:47 --> 00:15:50
it is perpendicular to a level
curve?

202
00:15:50 --> 00:15:54
Oh, how do I know if it points
in that direction or the

203
00:15:54 --> 00:15:56
opposite one?
Well, that depends.

204
00:15:56 --> 00:15:59
I mean we saw it last time:
the gradient is

205
00:15:59 --> 00:16:02
perpendicular to the level curve and
points towards higher values of

206
00:16:02 --> 00:16:05
a function.
So it could be -- Wait.

207
00:16:05 --> 00:16:08
What did I have?
It could be that my gradient

208
00:16:08 --> 00:16:11
vectors up there actually point
in opposite directions.

209
00:16:11 --> 00:16:15
It doesn't matter to me because
it will still look the same in

210
00:16:15 --> 00:16:18
terms of the equation,
just lambda will be positive or

211
00:16:18 --> 00:16:22
negative, depending on the case.
I can handle both situations.

212
00:16:22 --> 00:16:30
It's not a problem.
I can allow lambda to be

213
00:16:30 --> 00:16:34
positive or negative.
Well, in this example,

214
00:16:34 --> 00:16:35
it looks like lambda will be
positive.

215
00:16:35 --> 00:16:38
If you look at the picture on
the plot.

216
00:16:38 --> 00:16:48
Yes?
Well, because actually they are

217
00:16:48 --> 00:16:51
not equal to each other.
If you look at this point where

218
00:16:51 --> 00:16:55
the hyperbola and the circle
touch each other,

219
00:16:55 --> 00:16:58
first of all,
I don't know which circle I am

220
00:16:58 --> 00:17:01
going to look at.
I am trying to solve,

221
00:17:01 --> 00:17:04
actually, for the radius of the
circle.

222
00:17:04 --> 00:17:07
I am trying to find what the
minimum value of f is.

223
00:17:07 --> 00:17:10
And, second,
at that point,

224
00:17:10 --> 00:17:14
the value of f and the value of
g are not equal.

225
00:17:14 --> 00:17:17
g is equal to three because I
want the hyperbola xy equals

226
00:17:17 --> 00:17:19
three.
The value of f will be the

227
00:17:19 --> 00:17:22
square of a distance,
whatever that is.

228
00:17:22 --> 00:17:27
I think it will end up being 6,
but we will see.

229
00:17:27 --> 00:17:29
So, you cannot really set them
equal because you don't know

230
00:17:29 --> 00:17:45
what f is equal to in advance.
Yes?

231
00:17:45 --> 00:17:49
Not quite.
Actually, here I am just using

232
00:17:49 --> 00:17:52
this idea of finding a point
closest to the origin to

233
00:17:52 --> 00:17:55
illustrate an example of a
min/max problem.

234
00:17:55 --> 00:17:59
The general problem we are
trying to solve is minimize f

235
00:17:59 --> 00:18:03
subject to g equals constant.
And what we are going to do for

236
00:18:03 --> 00:18:07
that is we are really going to
say instead let's look at places

237
00:18:07 --> 00:18:10
where gradient f and gradient g
are parallel to each other and

238
00:18:10 --> 00:18:14
solve the equations for that.
I think we completely lose the

239
00:18:14 --> 00:18:19
notion of closest point if we
just look at these equations.

240
00:18:19 --> 00:18:21
We don't really say anything
about closest points anymore.

241
00:18:21 --> 00:18:24
Of course, that is what they
mean in the end.

242
00:18:24 --> 00:18:28
But, in the general setting,
there is no closest point

243
00:18:28 --> 00:18:31
involved anymore.
OK.

244
00:18:31 --> 00:18:40
Yes?
Yes.

245
00:18:40 --> 00:18:43
It is always going to be the
case that,

246
00:18:43 --> 00:18:46
at the minimum, 
or at the maximum of a function

247
00:18:46 --> 00:18:49
subject to a constraint,
the level curves of f and the

248
00:18:49 --> 00:18:52
level curves of g will be
tangent to each other.

249
00:18:52 --> 00:18:54
That is the basis for this
method.

250
00:18:54 --> 00:19:00
I am going to justify that soon.
It could be minimum or maximum.

251
00:19:00 --> 00:19:02
In three-dimensions it could
even be a saddle point.

252
00:19:02 --> 00:19:03
And, in fact,
I should say in advance,

253
00:19:03 --> 00:19:06
this method will not tell us
whether it is a minimum or a

254
00:19:06 --> 00:19:08
maximum.
We do not have any way of

255
00:19:08 --> 00:19:10
knowing, except for testing
values.

256
00:19:10 --> 00:19:13
We cannot use second derivative
tests or anything like that.

257
00:19:13 --> 00:19:21
I will get back to that.
Yes?

258
00:19:21 --> 00:19:23
Yes.
Here you can set y equal to

259
00:19:23 --> 00:19:26
three over x.
Then you can minimize x squared

260
00:19:26 --> 00:19:30
plus nine over x squared.
In general, if I am trying to

261
00:19:30 --> 00:19:33
solve a more complicated
problem, I might not be able to

262
00:19:33 --> 00:19:35
solve.
I am doing an example where,

263
00:19:35 --> 00:19:38
indeed, here you could solve
and remove one variable,

264
00:19:38 --> 00:19:41
but you cannot always do that.
And this method will still work.

265
00:19:41 --> 00:19:47
The other one won't.
OK.
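
A sketch of the substitution route mentioned in this answer, added for illustration. It assumes y = 3/x is plugged into f, giving the one-variable function x^2 + 9/x^2, and checks that its derivative vanishes at x = sqrt(3). The names h and h_prime are made up for this example.

```python
import math

def h(x):
    # f restricted to the hyperbola: y = 3/x, so f = x^2 + 9/x^2
    return x**2 + 9.0 / x**2

def h_prime(x):
    # derivative of h; it vanishes when x^4 = 9
    return 2.0 * x - 18.0 / x**3

x_star = math.sqrt(3.0)  # positive root of x^4 = 9
print(x_star, h_prime(x_star), h(x_star))
```

This only works because the constraint is easy to solve for y; the Lagrange multiplier equations do not need that step.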

266
00:19:47 --> 00:19:53
I don't see any other questions.
Are there any other questions?

267
00:19:53 --> 00:19:56
No.
OK.

268
00:19:56 --> 00:20:02
I see a lot of students
stretching and so on,

269
00:20:02 --> 00:20:08
so it is very confusing for me.
How do we solve these equations?

270
00:20:08 --> 00:20:14
Well, the answer is in general
we might be in deep trouble.

271
00:20:14 --> 00:20:18
There is no general method for
solving the equations that you

272
00:20:18 --> 00:20:21
get from this method.
You just have to think about

273
00:20:21 --> 00:20:25
them.
Sometimes it will be very easy.

274
00:20:25 --> 00:20:28
Sometimes it will be so hard
that you cannot actually do it

275
00:20:28 --> 00:20:31
without the computer.
Sometimes it will be just hard

276
00:20:31 --> 00:20:33
enough to be on Part B of this
week's problem set.

277
00:20:33 --> 00:20:50
 
 

278
00:20:50 --> 00:20:56
I claim in this case we can
actually do it without so much

279
00:20:56 --> 00:21:03
trouble, because actually we can
think of this as a two by two

280
00:21:03 --> 00:21:10
linear system in x and y.
Well, let me do something.

281
00:21:10 --> 00:21:18
Let me rewrite the first two
equations as 2x - lambda y = 0.

282
00:21:18 --> 00:21:30
And lambda x - 2y = 0.
And xy = 3.

283
00:21:30 --> 00:21:36
That is what we want to solve.
Well, I can put this into

284
00:21:36 --> 00:21:41
matrix form.
Two, minus lambda,

285
00:21:41 --> 00:21:48
lambda, minus two, times x,
y equals 0,0.

286
00:21:48 --> 00:21:52
Now, how do I solve a linear
system matrix times x,

287
00:21:52 --> 00:21:54
y equals zero?
Well, I always have an obvious

288
00:21:54 --> 00:21:56
solution.
X and y both equal to zero.

289
00:21:56 --> 00:22:02
Is that a good solution?
No, because zero times zero is

290
00:22:02 --> 00:22:07
not three.
We want another solution,

291
00:22:07 --> 00:22:14
not the trivial solution.
0,0 does not solve the

292
00:22:14 --> 00:22:20
constraint equation xy equals
three, so we want another

293
00:22:20 --> 00:22:24
solution.
When do we have another

294
00:22:24 --> 00:22:29
solution?
Well, when the determinant of the

295
00:22:29 --> 00:22:37
matrix is zero.
We have other solutions that

296
00:22:37 --> 00:22:46
exist only if the determinant of the
matrix is zero.

297
00:22:46 --> 00:23:01
M is this guy.
Let's compute the determinant.

298
00:23:01 --> 00:23:08
Well, that seems to be negative
four plus lambda squared.

299
00:23:08 --> 00:23:15
That is zero exactly when
lambda squared equals four,

300
00:23:15 --> 00:23:20
which is lambda is plus or
minus two.
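
The matrix form and determinant described aloud can be written as follows. This is a transcription of the spoken steps, with M denoting the coefficient matrix:

```latex
M \begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 2 & -\lambda \\ \lambda & -2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\qquad
\det M = (2)(-2) - (-\lambda)(\lambda) = -4 + \lambda^2 = 0
\iff \lambda = \pm 2
```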

301
00:23:20 --> 00:23:25
Already you see here it is a
level of difficulty that is

302
00:23:25 --> 00:23:30
a little bit much for an exam
but perfectly fine for a problem

303
00:23:30 --> 00:23:33
set or for a beautiful lecture
like this one.

304
00:23:33 --> 00:23:37
How do we deal with -- Well,
we have two cases to look at.

305
00:23:37 --> 00:23:40
Lambda equals two or lambda
equals minus two.

306
00:23:40 --> 00:23:43
Let's start with lambda equals
two.

307
00:23:43 --> 00:23:47
If I set lambda equals two,
what does this equation become?

308
00:23:47 --> 00:23:53
Well, it becomes x equals y.
This one becomes y equals x.

309
00:23:53 --> 00:23:57
Well, they seem to be the same.
x equals y.

310
00:23:57 --> 00:24:01
And then the equation xy equals
three becomes,

311
00:24:01 --> 00:24:06
well, x squared equals three.
I have two solutions.

312
00:24:06 --> 00:24:15
One is x equals root three and,
therefore, y equals root three

313
00:24:15 --> 00:24:23
as well, or negative root three
and negative root three.

314
00:24:23 --> 00:24:26
Let's look at the other case.
If I set lambda equal to

315
00:24:26 --> 00:24:30
negative two then I get 2x
equals negative 2y.

316
00:24:30 --> 00:24:37
That means x equals negative y.
The second one,

317
00:24:37 --> 00:24:40
2y equals negative 2x.
That is y equals negative x.

318
00:24:40 --> 00:24:45
Well, that is the same thing.
And xy equals three becomes

319
00:24:45 --> 00:24:51
negative x squared equals three.
Can we solve that?

320
00:24:51 --> 00:24:58
No.
There are no solutions here.

321
00:24:58 --> 00:25:03
Now we have two candidate
points which are these two

322
00:25:03 --> 00:25:07
points, root three,
root three or negative root

323
00:25:07 --> 00:25:13
three, negative root three.
OK.
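
A quick check of the two candidate points, as a sketch and not part of the lecture: each should satisfy 2x = lambda y, 2y = lambda x and xy = 3 with lambda = 2. The function name residuals is made up for this example.

```python
import math

def residuals(x, y, lam):
    """Left side minus right side of 2x = lam*y, 2y = lam*x, xy = 3."""
    return (2 * x - lam * y, 2 * y - lam * x, x * y - 3)

r = math.sqrt(3)
# lambda = 2 gives x = y; lambda = -2 would need -x^2 = 3, which has no real solution
candidates = [(r, r, 2.0), (-r, -r, 2.0)]
for x, y, lam in candidates:
    print((x, y), residuals(x, y, lam), x**2 + y**2)
```

All residuals come out at zero up to rounding, and both points give the same value of f.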

324
00:25:13 --> 00:25:16
Let's actually look at what we
have here.

325
00:25:16 --> 00:25:20
Maybe you cannot read the
coordinates, but the point that

326
00:25:20 --> 00:25:23
I have here is indeed root
three, root three.

327
00:25:23 --> 00:25:26
How do we see that lambda
equals two?

328
00:25:26 --> 00:25:29
Well, if you look at this
picture, the gradient of f,

329
00:25:29 --> 00:25:32
that is the blue vector,
is indeed twice the yellow

330
00:25:32 --> 00:25:36
vector, gradient g.
That is where you read the

331
00:25:36 --> 00:25:41
value of lambda.
And we have the other solution

332
00:25:41 --> 00:25:45
which is somewhere here.
Negative root three,

333
00:25:45 --> 00:25:48
negative root three.
And there, again,

334
00:25:48 --> 00:25:51
lambda equals two.
The two vectors are

335
00:25:51 --> 00:25:59
proportional by a factor of two.
Yes?

336
00:25:59 --> 00:26:01
No, solutions are not quite
guaranteed to be absolute minima

337
00:26:01 --> 00:26:03
or maxima.
They are guaranteed to be

338
00:26:03 --> 00:26:06
somehow critical points under the
constraint.

339
00:26:06 --> 00:26:09
That means if you were able to
solve and eliminate the variable

340
00:26:09 --> 00:26:12
that would be a critical point.
Then you have the same problem

341
00:26:12 --> 00:26:14
as we have with critical points:
are they maxima or minima?

342
00:26:14 --> 00:26:22
And the answer is,
well, we won't know until we

343
00:26:22 --> 00:26:28
check.
More questions?

344
00:26:28 --> 00:26:32
No.
Yes?

345
00:26:32 --> 00:26:36
What is a Lagrange multiplier?
Well, it is this number lambda

346
00:26:36 --> 00:26:39
that is called the multiplier
here.

347
00:26:39 --> 00:26:44
It is a multiplier because it
is what you have to multiply

348
00:26:44 --> 00:26:48
gradient of g by to get gradient
of f.

349
00:26:48 --> 00:26:49
It multiplies.

350
00:26:49 --> 00:27:04

351
00:27:04 --> 00:27:11
Let's try to see why this
method is valid.

352
00:27:11 --> 00:27:18
Because so far I have shown you
pictures and have said see they

353
00:27:18 --> 00:27:23
are tangent.
But why is it that they have to

354
00:27:23 --> 00:27:28
be tangent in general?
Let's think about it.

355
00:27:28 --> 00:27:37
Let's say that we are at
constrained min or max.

356
00:27:37 --> 00:27:42
What that means is that if I
move on the level g equals

357
00:27:42 --> 00:27:46
constant then the value of f
should only increase or only

358
00:27:46 --> 00:27:49
decrease.
But it means,

359
00:27:49 --> 00:27:53
in particular,
to first order it will not

360
00:27:53 --> 00:27:56
change.
At an unconstrained min or max,

361
00:27:56 --> 00:27:59
partial derivatives are zero.
In this case,

362
00:27:59 --> 00:28:02
derivatives are zero only in
the allowed directions.

363
00:28:02 --> 00:28:09
And the allowed directions are
those that stay on the level set

364
00:28:09 --> 00:28:21
g equals constant.
In any direction along the

365
00:28:21 --> 00:28:40
level set g = c the rate of
change of f must be zero.

366
00:28:40 --> 00:28:44
That is what happens at minima
or maxima.

367
00:28:44 --> 00:28:49
Except here,
of course, we look only at the

368
00:28:49 --> 00:28:54
allowed directions.
Let's say the same thing in

369
00:28:54 --> 00:28:57
terms of directional
derivatives.

370
00:28:57 --> 00:29:23
 
 

371
00:29:23 --> 00:29:35
That means for any direction u
that is tangent to the

372
00:29:35 --> 00:29:49
constraint level g equals c,
we must have df over ds in the

373
00:29:49 --> 00:30:00
direction of u equals zero.
I will draw a picture.

374
00:30:00 --> 00:30:05
Let's say now I am in three
variables just to give you

375
00:30:05 --> 00:30:09
different examples.
Here I have a level surface g

376
00:30:09 --> 00:30:11
equals c.
I am at my point.

377
00:30:11 --> 00:30:18
And if I move in any direction
that is on the level surface,

378
00:30:18 --> 00:30:24
so I move in the direction u
tangent to the level surface,

379
00:30:24 --> 00:30:32
then the rate of change of f in
that direction should be zero.

380
00:30:32 --> 00:30:34
Now, remember what the formula
is for this guy.

381
00:30:34 --> 00:30:44
Well, we have seen that this
guy is actually gradient f dot u.

382
00:30:44 --> 00:30:58
That means any such vector u
must be perpendicular to the

383
00:30:58 --> 00:31:05
gradient of f.
That means that the gradient of

384
00:31:05 --> 00:31:10
f should be perpendicular to
anything that is tangent to this

385
00:31:10 --> 00:31:12
level.
That means the gradient of f

386
00:31:12 --> 00:31:16
should be perpendicular to the
level set.

387
00:31:16 --> 00:31:17
That is what we have shown.

388
00:31:17 --> 00:31:37

389
00:31:37 --> 00:31:40
But we know another vector that
is also perpendicular to the

390
00:31:40 --> 00:31:57
level set of g.
That is the gradient of g.

391
00:31:57 --> 00:32:02
We conclude that the gradient
of f must be parallel to the

392
00:32:02 --> 00:32:07
gradient of g because both are
perpendicular to the level set

393
00:32:07 --> 00:32:09
of g.
I see confused faces,

394
00:32:09 --> 00:32:13
so let me try to tell you again
where that comes from.

395
00:32:13 --> 00:32:16
We said if we had a constrained
minimum or maximum,

396
00:32:16 --> 00:32:19
if we move in the level set of
g, f doesn't change.

397
00:32:19 --> 00:32:20
Well, it doesn't change to
first order.

398
00:32:20 --> 00:32:24
It is the same idea as when you
are looking for a minimum you

399
00:32:24 --> 00:32:26
set the derivative equal to
zero.

400
00:32:26 --> 00:32:31
So the derivative in any
direction, tangent to g equals

401
00:32:31 --> 00:32:34
c, should be -- the directional
derivative of f,

402
00:32:34 --> 00:32:38
in any such direction,
should be zero.

403
00:32:38 --> 00:32:43
That is what we mean by
critical point of f.

404
00:32:43 --> 00:32:48
And so that means that any
vector u, any unit vector

405
00:32:48 --> 00:32:55
tangent to the level set of g is
going to be perpendicular to the

406
00:32:55 --> 00:33:00
gradient of f.
That means that the gradient of

407
00:33:00 --> 00:33:04
f is perpendicular to the level
set of g.

408
00:33:04 --> 00:33:06
If you want,
that means the level sets of f

409
00:33:06 --> 00:33:10
and g are tangent to each other.
That is justifying what we have

410
00:33:10 --> 00:33:15
observed in the picture that the
two level sets have to be

411
00:33:15 --> 00:33:20
tangent to each other at the
constrained minimum or maximum.

412
00:33:20 --> 00:33:23
Does that make a little bit of
sense?

413
00:33:23 --> 00:33:28
Kind of.
I see at least a few faces

414
00:33:28 --> 00:33:35
nodding so I take that to be a
positive answer.
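
The chain of reasoning in this justification can be summarized in symbols. This is a sketch that adds one assumption the lecture does not state, namely that the gradient of g is nonzero at the point. The hat on u marks a unit vector.

```latex
\frac{df}{ds}\Big|_{\hat{u}} = \nabla f \cdot \hat{u} = 0
\quad \text{for every unit vector } \hat{u} \text{ tangent to } \{g = c\}
\;\Longrightarrow\;
\nabla f \perp \{g = c\}
\;\Longrightarrow\;
\nabla f \parallel \nabla g
\;\Longrightarrow\;
\nabla f = \lambda \nabla g
```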

415
00:33:35 --> 00:33:39
Since I have been asked by
several of you,

416
00:33:39 --> 00:33:43
how do I know if it is a
maximum or a minimum?

417
00:33:43 --> 00:33:57
Well, warning,
the method doesn't tell whether

418
00:33:57 --> 00:34:09
a solution is a minimum or a
maximum.

419
00:34:09 --> 00:34:13
How do we do it?
Well, more bad news.

420
00:34:13 --> 00:34:26
We cannot use the second
derivative test.

421
00:34:26 --> 00:34:30
And the reason for that is that
we care actually only about

422
00:34:30 --> 00:34:34
these specific directions that
are tangent to the level set of g.

423
00:34:34 --> 00:34:39
And we don't want to bother to
try to define directional second

424
00:34:39 --> 00:34:42
derivatives.
Not to mention that actually it

425
00:34:42 --> 00:34:45
wouldn't work.
There is a criterion but it is

426
00:34:45 --> 00:34:49
much more complicated than that.
Basically, the answer for us is

427
00:34:49 --> 00:34:52
that we don't have a second
derivative test in this

428
00:34:52 --> 00:34:54
situation.
What are we left with?

429
00:34:54 --> 00:34:57
Well, we are just left with
comparing values.

430
00:34:57 --> 00:35:00
Say that in this problem you
found a point where f equals

431
00:35:00 --> 00:35:04
three, a point where f equals
nine, a point where f equals 15.

432
00:35:04 --> 00:35:08
Well, then probably the minimum
is the point where f equals

433
00:35:08 --> 00:35:12
three and the maximum is 15.
Actually, in this case,

434
00:35:12 --> 00:35:17
where we found minima,
these two points are tied for

435
00:35:17 --> 00:35:19
minimum.
What about the maximum?

436
00:35:19 --> 00:35:22
What is the maximum of f on the
hyperbola?

437
00:35:22 --> 00:35:25
Well, it is infinity because
the point can go as far as you

438
00:35:25 --> 00:35:29
want from the origin.
But the general idea is if we

439
00:35:29 --> 00:35:35
have a good reason to believe
that there should be a minimum,

440
00:35:35 --> 00:35:38
and it's not like at infinity
or something weird like that,

441
00:35:38 --> 00:35:42
then the minimum will be a
solution of the Lagrange

442
00:35:42 --> 00:35:46
multiplier equations.
We just look for all the

443
00:35:46 --> 00:35:51
solutions and then we choose the
one that gives us the lowest

444
00:35:51 --> 00:35:55
value.
Is that good enough?

445
00:35:55 --> 00:35:57
Let me actually write that down.

446
00:35:57 --> 00:36:23

447
00:36:23 --> 00:36:35
To find the minimum or the
maximum, we compare values of f

448
00:36:35 --> 00:36:46
at the various solutions -- --
to Lagrange multiplier

449
00:36:46 --> 00:36:49
equations.
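
Here is a minimal sketch of that comparison in Python. It assumes the hyperbola example is x*y = 3 with f = x^2 + y^2 (the squared distance to the origin); that constant and the hand-solved candidates are assumptions for illustration, not something stated in this part of the lecture.

```python
import math

# Assumed example: points on the hyperbola x*y = 3, with
# f(x, y) = x^2 + y^2 the squared distance to the origin.
def f(x, y):
    return x * x + y * y

def g(x, y):
    return x * y

C = 3.0  # assumed constraint constant

# Solving 2x = lambda*y, 2y = lambda*x, x*y = C by hand gives lambda = 2, x = y.
s = math.sqrt(C)
candidates = [(s, s), (-s, -s)]
lam = 2.0

# Check that each candidate satisfies grad f = lambda * grad g and g = C.
for x, y in candidates:
    assert math.isclose(2 * x, lam * y)
    assert math.isclose(2 * y, lam * x)
    assert math.isclose(g(x, y), C)

# Compare the values of f at the candidates; the smallest is the constrained minimum.
values = [f(x, y) for x, y in candidates]
min_value = min(values)
minimizers = [p for p, v in zip(candidates, values) if math.isclose(v, min_value)]

# There is no maximum: f keeps growing as the point moves out along the hyperbola.
far_value = f(1000.0, C / 1000.0)
```

Both candidates give the same value of f, so they are tied for the minimum, and the value at the far-away point shows why no candidate can be a maximum.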

450
00:36:49 --> 00:37:08

451
00:37:08 --> 00:37:11
I should say also that
sometimes you can just conclude

452
00:37:11 --> 00:37:14
by thinking geometrically.
In this case,

453
00:37:14 --> 00:37:18
when it is asking you which
point is closest to the origin

454
00:37:18 --> 00:37:23
you can just see that your
answer is the correct one.

455
00:37:23 --> 00:37:32
Let's do an advanced example.
Advanced means that -- Well,

456
00:37:32 --> 00:37:37
this one I didn't actually dare
to put on top of the other

457
00:37:37 --> 00:37:48
problem sets.
Instead, I am going to do it.

458
00:37:48 --> 00:37:51
What is this going to be about?
We are going to look for a

459
00:37:51 --> 00:38:03
surface minimizing pyramid.
Let's say that we want to build

460
00:38:03 --> 00:38:19
a pyramid with a given
triangular base -- -- and a

461
00:38:19 --> 00:38:28
given volume.
Say that I have maybe in the x,

462
00:38:28 --> 00:38:33
y plane I am giving you some
triangle.

463
00:38:33 --> 00:38:40
And I am going to try to build
a pyramid.

464
00:38:40 --> 00:38:48
Of course, I can choose where
to put the top of a pyramid.

465
00:38:48 --> 00:38:53
This guy will end up being
behind now.

466
00:38:53 --> 00:39:09
And, under that constraint, the goal
is to minimize the total surface

467
00:39:09 --> 00:39:13
area.
The first time I taught this

468
00:39:13 --> 00:39:15
class, it was a few years ago,
was just before they built the

469
00:39:15 --> 00:39:17
Stata Center.
And then I used to motivate

470
00:39:17 --> 00:39:20
this problem by saying Frank
Gehry has gone crazy and has

471
00:39:20 --> 00:39:23
been given a triangular plot of
land where he wants to put a pyramid.

472
00:39:23 --> 00:39:26
There needs to be the right
amount of volume so that you can

473
00:39:26 --> 00:39:28
put all the offices in there.
And he wants it to be,

474
00:39:28 --> 00:39:31
actually, covered in solid
gold.

475
00:39:31 --> 00:39:34
And because that is expensive,
the administration wants him to

476
00:39:34 --> 00:39:38
cut the costs a bit.
And so you have to minimize the

477
00:39:38 --> 00:39:42
total size so that it doesn't
cost too much.

478
00:39:42 --> 00:39:45
We will see if MIT comes up
with a triangular pyramid

479
00:39:45 --> 00:39:48
building.
Hopefully not.

480
00:39:48 --> 00:39:58
It could be our next dorm,
you never know.

481
00:39:58 --> 00:40:01
Anyway, it is a fine geometry
problem.

482
00:40:01 --> 00:40:07
Let's try to think about how we
can do this.

483
00:40:07 --> 00:40:10
The natural way to think about
it would be -- Well,

484
00:40:10 --> 00:40:11
what do we have to look for
first?

485
00:40:11 --> 00:40:18
We have to look for the
position of that top point.

486
00:40:18 --> 00:40:29
Remember we know that the
volume of a pyramid is one-third

487
00:40:29 --> 00:40:37
the area of base times height.
In fact, fixing the volume,

488
00:40:37 --> 00:40:39
knowing that we have fixed the
area of a base,

489
00:40:39 --> 00:40:43
means that we are fixing the
height of the pyramid.
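
Written out, with V the given volume and A the area of the base (both symbols are just names chosen here):

```latex
\[
V = \tfrac{1}{3}\, A\, h
\quad\Longrightarrow\quad
h = \frac{3V}{A},
\]
```

so once V and A are given, h is determined.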

490
00:40:43 --> 00:40:47
The height is completely fixed.
What we have to choose just is

491
00:40:47 --> 00:40:52
where do we put that top point?
Do we put it smack in the

492
00:40:52 --> 00:40:58
middle of a triangle or to a
side or even anywhere we want?

493
00:40:58 --> 00:41:15
Its z coordinate is fixed.
Let's call h the height.

494
00:41:15 --> 00:41:20
What we could do is something
like this.

495
00:41:20 --> 00:41:24
We say we have three points of
a base.

496
00:41:24 --> 00:41:32
Let's call them p1 at (x1,
y1,0); p2 at (x2,

497
00:41:32 --> 00:41:36
y2,0); p3 at (x3,
y3,0).

498
00:41:36 --> 00:41:40
This point p is the unknown
point at (x, y,

499
00:41:40 --> 00:41:42
h).
We know the height.

500
00:41:42 --> 00:41:46
And then we want to minimize
the sum of the areas of these

501
00:41:46 --> 00:41:50
three triangles.
One here, one here and one at

502
00:41:50 --> 00:41:53
the back.
And areas of triangles we know

503
00:41:53 --> 00:41:57
how to express by using length
of cross-product.

504
00:41:57 --> 00:42:00
It becomes a function of x and
y.

505
00:42:00 --> 00:42:04
And you can try to minimize it.
Actually, it doesn't quite work.

506
00:42:04 --> 00:42:05
The formulas are just too
complicated.

507
00:42:05 --> 00:42:14
You will never get there.
What happens is actually maybe

508
00:42:14 --> 00:42:18
we need better coordinates.
Why do we need better

509
00:42:18 --> 00:42:21
coordinates?
That is because the geometry is

510
00:42:21 --> 00:42:24
kind of difficult to do if you
use x, y coordinates.

511
00:42:24 --> 00:42:28
I mean formula for
cross-product is fine,

512
00:42:28 --> 00:42:33
but then the length of the
vector will be annoying and just

513
00:42:33 --> 00:42:37
doesn't look good.
Instead, let's think about it

514
00:42:37 --> 00:42:38
differently.

515
00:42:38 --> 00:42:54

516
00:42:54 --> 00:43:01
I claim if we do it this way
and we express the area as a

517
00:43:01 --> 00:43:06
function of x,
y, well, actually we can't

518
00:43:06 --> 00:43:13
solve for a minimum.
Here is another way to do it.

519
00:43:13 --> 00:43:17
Well, what has worked pretty
well for us so far is this

520
00:43:17 --> 00:43:19
geometric idea of base times
height.

521
00:43:19 --> 00:43:29
So let's think in terms of the
heights of side triangles.

522
00:43:29 --> 00:43:37
I am going to use the height of
these things.

523
00:43:37 --> 00:43:43
And I am going to say that the
area will be the sum of three

524
00:43:43 --> 00:43:48
terms, which are three bases
times three heights.

525
00:43:48 --> 00:43:53
Let's give names to these
quantities.

526
00:43:53 --> 00:43:58
Actually, for that it is going
to be good to have the point in

527
00:43:58 --> 00:44:01
the xy plane that lives directly
below p.

528
00:44:01 --> 00:44:08
Let's call it q.
p is the point with coordinates

529
00:44:08 --> 00:44:13
x, y, h.
And let's call q the point that

530
00:44:13 --> 00:44:19
is just below it, and so its
coordinates are x,

531
00:44:19 --> 00:44:22
y, 0.
Let's see.

532
00:44:22 --> 00:44:34
Let me draw a map of this thing.
p1, p2, p3 and I have my point

533
00:44:34 --> 00:44:37
q in the middle.
Let's see.

534
00:44:37 --> 00:44:40
To know these areas,
I need to know the base.

535
00:44:40 --> 00:44:44
Well, the base I can decide
that I know it because it is

536
00:44:44 --> 00:44:48
part of my given data.
I know the sides of this

537
00:44:48 --> 00:44:53
triangle.
Let me call the lengths a1,

538
00:44:53 --> 00:44:56
a2, a3.
I also need to know the height,

539
00:44:56 --> 00:44:58
so I need to know these
lengths.

540
00:44:58 --> 00:45:01
How do I know these lengths?
Well, it's a distance in space,

541
00:45:01 --> 00:45:03
but it is a little bit
annoying.

542
00:45:03 --> 00:45:10
But maybe I can reduce it to a
distance in the plane by looking

543
00:45:10 --> 00:45:17
instead at this distance here.
Let me give names to the

544
00:45:17 --> 00:45:24
distances from q to the sides.
Let's call u1,

545
00:45:24 --> 00:45:35
u2, u3 the distances from q to
the sides.

546
00:45:35 --> 00:45:47
 
 

547
00:45:47 --> 00:45:49
Well, now I can claim I can
find, actually,

548
00:45:49 --> 00:45:53
sorry.
I need to draw one more thing.

549
00:45:53 --> 00:45:57
I claim I have a nice formula
for the area,

550
00:45:57 --> 00:46:01
because this is vertical and
this is horizontal so this

551
00:46:01 --> 00:46:05
length here is u3,
this length here is h.

552
00:46:05 --> 00:46:13
So what is this length here?
It is the square root of u3

553
00:46:13 --> 00:46:17
squared plus h squared.
And similarly for these other

554
00:46:17 --> 00:46:23
guys.
They are square roots of a u

555
00:46:23 --> 00:46:31
squared plus h squared.
The heights of the faces are

556
00:46:31 --> 00:46:36
square root of u1 squared plus
h squared.

557
00:46:36 --> 00:46:43
And similarly with u2 and u3.
So the total side area is going

558
00:46:43 --> 00:46:47
to be the area of the first
face,

559
00:46:47 --> 00:46:58
one-half of base times height, 
plus one-half of a base times a

560
00:46:58 --> 00:47:06
height plus one-half of the
third one.

561
00:47:06 --> 00:47:09
It doesn't look so much better.
But, trust me,

562
00:47:09 --> 00:47:15
it will get better.
Now, that is a function of

563
00:47:15 --> 00:47:19
three variables,
u1, u2, u3.

564
00:47:19 --> 00:47:22
And how do we relate u1,
u2, u3 to each other?

565
00:47:22 --> 00:47:25
They are probably not
independent.

566
00:47:25 --> 00:47:32
Well, let's cut this triangle
here into three pieces like

567
00:47:32 --> 00:47:35
that.
Then each piece has side --

568
00:47:35 --> 00:47:40
Well, let's look at the piece
at the bottom.

569
00:47:40 --> 00:47:50
It has base a3, height u3.
Cutting base into three tells

570
00:47:50 --> 00:47:57
you that the area of the base is
one-half of a1

571
00:47:57 --> 00:48:04
u1 plus one-half of a2
u2 plus one-half of a3

572
00:48:04 --> 00:48:09
u3.
And that is our constraint.

573
00:48:09 --> 00:48:12
My three variables,
u1, u2, u3, are constrained in

574
00:48:12 --> 00:48:14
this way.
This sum must be

575
00:48:14 --> 00:48:17
the area of a base.
And I want to minimize that guy.

576
00:48:17 --> 00:48:23
So that is my g and that guy
here is my f.
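
To have both on one page, here are f and g written out. This assumes q lies inside the triangle, as in the picture, and A_base is just a name chosen here for the known area of the base.

```latex
\[
f(u_1,u_2,u_3) = \tfrac12 a_1\sqrt{u_1^2+h^2}
               + \tfrac12 a_2\sqrt{u_2^2+h^2}
               + \tfrac12 a_3\sqrt{u_3^2+h^2},
\]
\[
g(u_1,u_2,u_3) = \tfrac12 a_1 u_1 + \tfrac12 a_2 u_2 + \tfrac12 a_3 u_3
               = A_{\text{base}} .
\]
```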

577
00:48:23 --> 00:48:28
Now we try to apply our
Lagrange multiplier equations.

578
00:48:28 --> 00:48:33
Well, partial f over partial u1
is -- Well,

579
00:48:33 --> 00:48:36
if you do the calculation,
you will see it is one-half a1

580
00:48:36 --> 00:48:43
u1 over square root of u1^2
plus h^2 equals lambda times,

581
00:48:43 --> 00:48:46
what is partial g
over partial u1?

582
00:48:46 --> 00:48:50
That one you can do, I am sure.
It is one-half a1.

583
00:48:50 --> 00:49:00
Oh, these guys simplify.
If you do the same with the

584
00:49:00 --> 00:49:09
second one -- -- things simplify
again.

585
00:49:09 --> 00:49:17
And the same with the third one.
Well, you will get,

586
00:49:17 --> 00:49:21
after simplifying,
u3 over square root of u3

587
00:49:21 --> 00:49:24
squared plus h squared equals
lambda.

588
00:49:24 --> 00:49:27
Now, that means this guy equals
this guy equals this guy.

589
00:49:27 --> 00:49:33
They are all equal to lambda.
And, if you think about it,

590
00:49:33 --> 00:49:39
that means that u1 = u2 = u3.
See, it looked like scary

591
00:49:39 --> 00:49:42
equations but the solution is
very simple.
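
In case the step from "all equal to lambda" to u1 = u2 = u3 went by quickly, here is one way to see it. The function phi below is a name introduced only for this sketch, and h is assumed positive.

```latex
\[
\frac{\partial f}{\partial u_i}
  = \frac{\tfrac12 a_i u_i}{\sqrt{u_i^2+h^2}}
  = \lambda \cdot \tfrac12 a_i
  = \lambda \frac{\partial g}{\partial u_i}
\quad\Longrightarrow\quad
\varphi(u_i) = \lambda,
\qquad
\varphi(t) = \frac{t}{\sqrt{t^2+h^2}} .
\]
\[
\varphi'(t) = \frac{h^2}{(t^2+h^2)^{3/2}} > 0 ,
\]
```

so phi is strictly increasing, and phi(u1) = phi(u2) = phi(u3) forces u1 = u2 = u3.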

592
00:49:42 --> 00:49:45
What does it mean?
It means that our point q

593
00:49:45 --> 00:49:47
should be equidistant from all
three sides.

594
00:49:47 --> 00:49:52
That is called the incenter.
q should be at the incenter.

595
00:49:52 --> 00:49:56
The next time you have to build
a golden pyramid and don't want

596
00:49:56 --> 00:49:59
to go broke, well,
you know where to put the top.
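
If you want to check the incenter claim numerically, here is a rough sketch in Python. The 3-4-5 base triangle, the height h = 2, and all the function names are made up for illustration; the code only compares the side area at the incenter with the side area at a few nearby points.

```python
import math

def length(a, b):
    """Distance between two points of the xy-plane."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def dist_to_line(q, a, b):
    """Distance from q to the line through a and b."""
    cross = (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0])
    return abs(cross) / length(a, b)

def side_area(q, tri, h):
    """Sum of the three side-face areas when the top sits at height h above q."""
    total = 0.0
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        u = dist_to_line(q, a, b)                      # distance from q to this side
        total += 0.5 * length(a, b) * math.sqrt(u * u + h * h)
    return total

def incenter(tri):
    """Incenter: vertices averaged with weights equal to the opposite side lengths."""
    p1, p2, p3 = tri
    a1, a2, a3 = length(p2, p3), length(p1, p3), length(p1, p2)
    s = a1 + a2 + a3
    return ((a1 * p1[0] + a2 * p2[0] + a3 * p3[0]) / s,
            (a1 * p1[1] + a2 * p2[1] + a3 * p3[1]) / s)

# Hypothetical data: a 3-4-5 right triangle as the base, height 2.
tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
h = 2.0

q = incenter(tri)
best = side_area(q, tri, h)

# A few points near q, all still inside the triangle.
nearby = [(q[0] + dx, q[1] + dy)
          for dx, dy in [(0.2, 0.0), (-0.2, 0.0), (0.0, 0.2), (0.0, -0.2)]]
nearby_areas = [side_area(p, tri, h) for p in nearby]
```

For this triangle the incenter comes out at (1, 1), at distance 1 from each side, and each of the nearby points gives a larger side area than the incenter does.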

597
00:49:59 --> 00:50:03
If that was a bit fast, sorry.
Anyway, it is not completely

598
00:50:03 --> 00:50:06
crucial.
But go over it and you will see

599
00:50:06 --> 00:50:08
it works.
Have a nice weekend.

600
00:50:08 --> 00:50:10