1
00:00:01 --> 00:00:03
The following content is
provided under a Creative

2
00:00:03 --> 00:00:05
Commons license.
Your support will help MIT

3
00:00:05 --> 00:00:08
OpenCourseWare continue to offer
high quality educational

4
00:00:08 --> 00:00:13
resources for free.
To make a donation or to view

5
00:00:13 --> 00:00:18
additional materials from
hundreds of MIT courses,

6
00:00:18 --> 00:00:23
visit MIT OpenCourseWare at
ocw.mit.edu.

7
00:00:23 --> 00:00:28
Today we are going to see how
to use what we saw last time

8
00:00:28 --> 00:00:33
about partial derivatives to
handle minimization or

9
00:00:33 --> 00:00:41
maximization problems involving
functions of several variables.

10
00:00:41 --> 00:00:44
Remember last time we said that
when we have a function,

11
00:00:44 --> 00:00:49
say, of two variables, x and y, 
then we have actually two

12
00:00:49 --> 00:00:53
different derivatives,
partial f, partial x,

13
00:00:53 --> 00:01:02
also called f sub x,
the derivative with respect to

14
00:01:02 --> 00:01:11
x keeping y constant.
And we have partial f,

15
00:01:11 --> 00:01:21
partial y, also called f sub y,
where we vary y and we keep x

16
00:01:21 --> 00:01:26
as a constant.
And now, one thing I didn't

17
00:01:26 --> 00:01:30
have time to tell you about but
hopefully you thought about in

18
00:01:30 --> 00:01:37
recitation yesterday,
is the approximation formula

19
00:01:37 --> 00:01:47
that tells you what happens if
you vary both x and y.

20
00:01:47 --> 00:01:50
f sub x tells us what happens
if we change x a little bit,

21
00:01:50 --> 00:01:53
by some small amount delta x.
f sub y tells us how f changes,

22
00:01:53 --> 00:01:56
if you change y by a small
amount delta y.

23
00:01:56 --> 00:02:00
If we do both at the same time
then the two effects will add up

24
00:02:00 --> 00:02:02
with each other,
because you can imagine that

25
00:02:02 --> 00:02:05
first you will change x and then
you will change y.

26
00:02:05 --> 00:02:12
Or the other way around.
It doesn't really matter.

27
00:02:12 --> 00:02:18
If we change x by a certain
amount delta x,

28
00:02:18 --> 00:02:23
and if we change y by the
amount delta y,

29
00:02:23 --> 00:02:32
and let's say that we have z=
f(x, y) then that changes by an

30
00:02:32 --> 00:02:40
amount which is approximately f
sub x times delta x plus f sub y

31
00:02:40 --> 00:02:45
times delta y.
And that is one of the most

32
00:02:45 --> 00:02:49
important formulas about partial
derivatives.
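
A quick numerical check of the approximation formula, as a minimal sketch. The function f(x, y) = x^2 y, the point (1, 2), and the step sizes are illustrative assumptions, not taken from the lecture.

```python
# Hypothetical example function (not from the lecture): f(x, y) = x^2 * y.
def f(x, y):
    return x**2 * y

def f_x(x, y):
    # Partial derivative with respect to x, keeping y constant.
    return 2 * x * y

def f_y(x, y):
    # Partial derivative with respect to y, keeping x constant.
    return x**2

x0, y0 = 1.0, 2.0      # base point (assumed)
dx, dy = 0.01, -0.02   # small changes in x and y (assumed)

# Actual change in f versus the estimate f_x * dx + f_y * dy.
actual_change = f(x0 + dx, y0 + dy) - f(x0, y0)
linear_estimate = f_x(x0, y0) * dx + f_y(x0, y0) * dy

print(actual_change, linear_estimate)
```

The two printed numbers should agree closely but not exactly; the gap comes from the higher order terms.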

33
00:02:49 --> 00:02:54
The intuition for this,
again, is just the two effects

34
00:02:54 --> 00:02:58
of if I change x by a small
amount and then I change y.

35
00:02:58 --> 00:03:02
Well, first changing x will
modify f, how much does it

36
00:03:02 --> 00:03:06
modify f?
The answer is the rate of change

37
00:03:06 --> 00:03:09
is f sub x.
And if I change y then the rate

38
00:03:09 --> 00:03:13
of change of f when I change y
is f sub y.

39
00:03:13 --> 00:03:17
So all together I get this
change in the value of f.

40
00:03:17 --> 00:03:19
And, of course,
that is only an approximation

41
00:03:19 --> 00:03:22
formula.
Actually, there would be higher

42
00:03:22 --> 00:03:28
order terms involving second and
third derivatives and so on.

43
00:03:28 --> 00:03:43
One way to justify this --
Sorry.

44
00:03:43 --> 00:03:47
I was distracted by the
microphone.

45
00:03:47 --> 00:03:55
OK.
How do we justify this formula?

46
00:03:55 --> 00:04:05
Well, one way to think about it
is in terms of tangent plane

47
00:04:05 --> 00:04:10
approximation.
Let's think about the tangent

48
00:04:10 --> 00:04:13
plane with regard to a function
f.

49
00:04:13 --> 00:04:15
We have some pictures to show
you.

50
00:04:15 --> 00:04:20
It will be easier if I show you
pictures.

51
00:04:20 --> 00:04:24
Remember, partial f,
partial x was obtained by

52
00:04:24 --> 00:04:29
looking at the situation where y
is held constant.

53
00:04:29 --> 00:04:33
That means I am slicing the
graph of f by a plane that is

54
00:04:33 --> 00:04:35
parallel to the x,
z plane.

55
00:04:35 --> 00:04:39
And when I change x,
z changes, and the slope of

56
00:04:39 --> 00:04:44
that is going to be the
derivative with respect to x.

57
00:04:44 --> 00:04:49
Now, if I do the same in the
other direction then I will have

58
00:04:49 --> 00:04:53
similarly the slope in a slice
now parallel to the y,

59
00:04:53 --> 00:04:57
z plane that will be partial f,
partial y.

60
00:04:57 --> 00:05:00
In fact, in each case,
I have a line.

61
00:05:00 --> 00:05:02
And that line is tangent to the
surface.

62
00:05:02 --> 00:05:06
Now, if I have two lines
tangent to the surface,

63
00:05:06 --> 00:05:09
well, then together they
determine for me the tangent

64
00:05:09 --> 00:05:13
plane to the surface.
Let's try to see how that works.

65
00:05:13 --> 00:05:18
 
 

66
00:05:18 --> 00:05:28
We know that f sub x and f sub
y are the slopes of two tangent

67
00:05:28 --> 00:05:37
lines to this plane,
two tangent lines to the graph.

68
00:05:37 --> 00:05:39
And let's write down the
equations of these lines.

69
00:05:39 --> 00:05:41
I am not going to write
parametric equations.

70
00:05:41 --> 00:05:45
I am going to write them in
terms of x, y,

71
00:05:45 --> 00:05:49
z coordinates.
Let's say that partial f over

72
00:05:49 --> 00:05:53
partial x at the given point is
equal to a.

73
00:05:53 --> 00:06:00
That means that we have a line
given by the following

74
00:06:00 --> 00:06:05
conditions.
I am going to keep y constant

75
00:06:05 --> 00:06:07
equal to y0.
And I am going to change x.

76
00:06:07 --> 00:06:12
And, as I change x,
z will change at the rate that

77
00:06:12 --> 00:06:22
is equal to a.
That would be z = z0 + a(x - x0).

78
00:06:22 --> 00:06:26
That is how you would describe
a line that, I guess,

79
00:06:26 --> 00:06:30
the one that is plotted in
green here, obtained by intersecting the graph with

80
00:06:30 --> 00:06:33
the slice parallel to the x,
z plane.

81
00:06:33 --> 00:06:40
I hold y constant equal to y0.
And z is a function of x that

82
00:06:40 --> 00:06:50
varies with a rate of a.
And now if I look similarly at

83
00:06:50 --> 00:06:55
the other slice,
let's say that the partial with

84
00:06:55 --> 00:07:00
respect to y is equal to b,
then I get another line which

85
00:07:00 --> 00:07:06
is obtained by the fact that z
now will depend on y.

86
00:07:06 --> 00:07:10
And the rate of change with
respect to y will be b.

87
00:07:10 --> 00:07:15
While x is held constant equal
to x0.

88
00:07:15 --> 00:07:19
These two lines are both going
to be in the tangent plane to

89
00:07:19 --> 00:07:20
the surface.

90
00:07:20 --> 00:07:40

91
00:07:40 --> 00:07:45
They are both tangent to the
graph of f and together they

92
00:07:45 --> 00:07:47
determine the plane.

93
00:07:47 --> 00:07:56

94
00:07:56 --> 00:08:08
And that plane is just given by
the formula z = z0 + a(x - x0) + b

95
00:08:08 --> 00:08:13
(y - y0).
If you look at what happens --

96
00:08:13 --> 00:08:19
This is the equation of a plane.
z equals constant times x plus

97
00:08:19 --> 00:08:24
constant times y plus constant.
And if you look at what happens

98
00:08:24 --> 00:08:28
if I hold y constant and vary x,
I will get the first line.

99
00:08:28 --> 00:08:33
If I hold x constant and vary
y, I get the second line.

100
00:08:33 --> 00:08:34
Another way to do it,
of course,

101
00:08:34 --> 00:08:37
would be to write actual
parametric equations of these

102
00:08:37 --> 00:08:40
lines,
get vectors along them and then

103
00:08:40 --> 00:08:43
take the cross-product to get
the normal vector to the plane.

104
00:08:43 --> 00:08:47
And then get this equation for
the plane using the normal

105
00:08:47 --> 00:08:49
vector.
That also works and it gives

106
00:08:49 --> 00:08:53
you the same formula.
If you are curious, as an

107
00:08:53 --> 00:08:57
exercise, do it again using
parametrics and using

108
00:08:57 --> 00:09:01
cross-product to get the plane
equation.

109
00:09:01 --> 00:09:03
That is how we get the tangent
plane.
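
The cross-product route mentioned above can be sketched as follows. The names v_1, v_2 and N are notation introduced here for illustration: v_1 and v_2 are vectors along the two tangent lines, with slopes a and b, and N is the resulting normal vector.

```latex
\begin{aligned}
\vec{v}_1 &= \langle 1, 0, a \rangle, \qquad \vec{v}_2 = \langle 0, 1, b \rangle, \\
\vec{N} &= \vec{v}_1 \times \vec{v}_2 = \langle -a, -b, 1 \rangle, \\
0 &= -a(x - x_0) - b(y - y_0) + (z - z_0)
\;\Longrightarrow\; z = z_0 + a(x - x_0) + b(y - y_0).
\end{aligned}
```

This is the same plane as the one built from the two slices.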

110
00:09:03 --> 00:09:06
And now what this approximation
formula here says is that,

111
00:09:06 --> 00:09:10
in fact, the graph of a
function is close to the tangent

112
00:09:10 --> 00:09:12
plane.
If we were moving on the

113
00:09:12 --> 00:09:15
tangent plane,
this would be an actual

114
00:09:15 --> 00:09:17
equality.
Delta z would be a linear

115
00:09:17 --> 00:09:23
function of delta x and delta y.
And the graph of a function is

116
00:09:23 --> 00:09:27
near the tangent plane,
but is not quite the same,

117
00:09:27 --> 00:09:33
so it is only an approximation
for small delta x and small

118
00:09:33 --> 00:09:43
delta y.
The approximation formula says

119
00:09:43 --> 00:09:57
the graph of f is close to its
tangent plane.

120
00:09:57 --> 00:10:02
And we can use that formula
over here now to estimate how

121
00:10:02 --> 00:10:08
the value of f changes if I
change x and y at the same time.

122
00:10:08 --> 00:10:18
Questions about that?
Now that we have caught up with

123
00:10:18 --> 00:10:23
what we were supposed to see on
Tuesday, I can tell you now

124
00:10:23 --> 00:10:26
about max and min problems.

125
00:10:26 --> 00:10:38

126
00:10:38 --> 00:10:48
That is going to be an
application of partial

127
00:10:48 --> 00:11:00
derivatives to look at
optimization problems.

128
00:11:00 --> 00:11:03
Maybe ten years from now,
when you have a real job,

129
00:11:03 --> 00:11:07
your job might be to actually
minimize the cost of something

130
00:11:07 --> 00:11:11
or maximize the profit of
something or whatever.

131
00:11:11 --> 00:11:14
But typically the function that
you will have to strive to

132
00:11:14 --> 00:11:18
minimize or maximize will depend
on several variables.

133
00:11:18 --> 00:11:22
If you have a function of one
variable, you know that to find

134
00:11:22 --> 00:11:26
its minimum or its maximum you
look at the derivative and set

135
00:11:26 --> 00:11:29
that equal to zero.
And you try to then look at

136
00:11:29 --> 00:11:38
what happens to the function.
Here it is going to be kind of

137
00:11:38 --> 00:11:47
similar, except,
of course, we have several

138
00:11:47 --> 00:11:51
derivatives.
For today we will think about a

139
00:11:51 --> 00:11:56
function of two variables,
but it works exactly the same

140
00:11:56 --> 00:12:00
if you have three variables,
ten variables,

141
00:12:00 --> 00:12:07
a million variables.
The first observation is that

142
00:12:07 --> 00:12:17
if we have a local minimum or a
local maximum then both partial

143
00:12:17 --> 00:12:21
derivatives,
so partial f partial x and

144
00:12:21 --> 00:12:26
partial f partial y,
are both zero at the same time.

145
00:12:26 --> 00:12:30
Why is that?
Well, let's say that f sub x is

146
00:12:30 --> 00:12:32
zero.
That means when I vary x to

147
00:12:32 --> 00:12:35
first order the function doesn't
change.

148
00:12:35 --> 00:12:37
Maybe that is because it is
going through...

149
00:12:37 --> 00:12:42
If I look only at the slice
parallel to the x-axis then

150
00:12:42 --> 00:12:45
maybe I am going through the
minimum.

151
00:12:45 --> 00:12:48
But if partial f,
partial y is not 0 then

152
00:12:48 --> 00:12:51
actually, by changing y,
I could still make the value

153
00:12:51 --> 00:12:54
larger or smaller.
That wouldn't be an actual

154
00:12:54 --> 00:12:57
maximum or minimum.
It would only be a maximum or

155
00:12:57 --> 00:13:01
minimum if I stay in the slice.
But if I allow myself to change

156
00:13:01 --> 00:13:04
y that doesn't work.
I need actually to know that if

157
00:13:04 --> 00:13:07
I change y the value will not
change either to first order.

158
00:13:07 --> 00:13:11
That is why you also need
partial f, partial y to be zero.

159
00:13:11 --> 00:13:13
Now, let's say that they are
both zero.

160
00:13:13 --> 00:13:16
Well, why is that enough?
It is essentially enough

161
00:13:16 --> 00:13:20
because of this formula telling
me that if both of these guys

162
00:13:20 --> 00:13:24
are zero then to first order the
function doesn't change.

163
00:13:24 --> 00:13:26
Then, of course, 
there will be maybe quadratic

164
00:13:26 --> 00:13:28
terms that will actually turn
that, you know,

165
00:13:28 --> 00:13:31
this won't really say that your
function is actually constant.

166
00:13:31 --> 00:13:35
It will just tell you that
maybe it will actually be

167
00:13:35 --> 00:13:40
quadratic or higher order in
delta x and delta y.

168
00:13:40 --> 00:13:52
That is what you expect to have
at a maximum or a minimum.

169
00:13:52 --> 00:14:05
The condition is the same thing
as saying that the tangent plane

170
00:14:05 --> 00:14:15
to the graph is actually going
to be horizontal.

171
00:14:15 --> 00:14:18
And that is what you want to
have.

172
00:14:18 --> 00:14:23
Say you have a minimum,
well, the tangent plane at this

173
00:14:23 --> 00:14:30
point, at the bottom of the
graph is going to be horizontal.

174
00:14:30 --> 00:14:35
And you can see that on this
equation of a tangent plane,

175
00:14:35 --> 00:14:40
when both these coefficients
are 0 that is when the equation

176
00:14:40 --> 00:14:44
becomes z equals constant:
the horizontal plane.

177
00:14:44 --> 00:14:50
Does that make sense?
We will have a name for this

178
00:14:50 --> 00:14:52
kind of point because,
actually,

179
00:14:52 --> 00:14:55
what we will see very soon is
that these conditions are

180
00:14:55 --> 00:14:57
necessary but are not
sufficient.

181
00:14:57 --> 00:15:02
There are actually other kinds
of points where the partial

182
00:15:02 --> 00:15:08
derivatives are zero.
Let's give a name to this.

183
00:15:08 --> 00:15:24
We say the definition is (x0,
y0) is a critical point of f --

184
00:15:24 --> 00:15:36
-- if the partial derivative,
with respect to x,

185
00:15:36 --> 00:15:44
and partial derivative with
respect to y are both zero.

186
00:15:44 --> 00:15:50
Generally, you would want all
the partial derivatives,

187
00:15:50 --> 00:15:56
no matter how many variables
you have, to be zero at the same

188
00:15:56 --> 00:16:06
time.
Let's see an example.

189
00:16:06 --> 00:16:23
Let's say I give you the
function f(x, y) = x^2 - 2xy + 3y^2

190
00:16:23 --> 00:16:28
+ 2x - 2y.
And let's try to figure out

191
00:16:28 --> 00:16:32
whether we can minimize or
maximize this.

192
00:16:32 --> 00:16:37
What we would start doing
immediately is taking the

193
00:16:37 --> 00:16:43
partial derivatives.
What is f sub x?

194
00:16:43 --> 00:16:56
It starts with 2x - 2y + 0 + 2.
Remember that y is a constant

195
00:16:56 --> 00:17:04
so this differentiates to zero.
Now, if we do f sub y,

196
00:17:04 --> 00:17:14
that is going to be 0 - 2x + 6y - 2.
And what we want to do is set

197
00:17:14 --> 00:17:17
these things to zero.
And we want to solve these two

198
00:17:17 --> 00:17:21
equations at the same time.
An important thing to remember, 

199
00:17:21 --> 00:17:23
and maybe I should have told
you a couple of weeks ago

200
00:17:23 --> 00:17:25
already,
if you have two equations to

201
00:17:25 --> 00:17:28
solve, well,
it is very good to try to

202
00:17:28 --> 00:17:30
simplify them by adding them
together or whatever,

203
00:17:30 --> 00:17:33
but you must keep two equations.
If you have two equations,

204
00:17:33 --> 00:17:37
you shouldn't end up with just
one equation out of nowhere.

205
00:17:37 --> 00:17:40
For example here,
we can certainly simplify

206
00:17:40 --> 00:17:46
things by summing them together.
If we add them together,

207
00:17:46 --> 00:17:52
well, the x's cancel and the
constants cancel.

208
00:17:52 --> 00:17:56
In fact, we are just left with
4y = 0.

209
00:17:56 --> 00:18:00
That is pretty good.
That tells us y should be zero.

210
00:18:00 --> 00:18:02
But then we should,
of course, go back to these and

211
00:18:02 --> 00:18:07
see what else we know.
Well, now it tells us,

212
00:18:07 --> 00:18:14
if you put y = 0 it tells you
2x + 2 = 0.

213
00:18:14 --> 00:18:26
That tells you x = - 1.
We have one critical point that

214
00:18:26 --> 00:18:33
is (x, y) = (-1,
0).
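
The same critical point can be checked mechanically. This is a minimal sketch, assuming we solve the two equations f_x = 0 and f_y = 0 as a 2-by-2 linear system with Cramer's rule; the helper name solve_2x2 is hypothetical.

```python
def solve_2x2(a11, a12, c1, a21, a22, c2):
    """Solve a11*x + a12*y = c1, a21*x + a22*y = c2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the two equations do not determine a unique point")
    x = (c1 * a22 - a12 * c2) / det
    y = (a11 * c2 - a21 * c1) / det
    return x, y

# For f(x, y) = x^2 - 2xy + 3y^2 + 2x - 2y:
#   f_x = 0  ->   2x - 2y = -2
#   f_y = 0  ->  -2x + 6y =  2
x, y = solve_2x2(2.0, -2.0, -2.0, -2.0, 6.0, 2.0)
print(x, y)
```

Note that both equations are kept throughout, in line with the advice not to end up with just one equation.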

215
00:18:33 --> 00:18:39
Any questions so far?
No.

216
00:18:39 --> 00:18:40
Well, you should have a
question.

217
00:18:40 --> 00:18:49
The question should be how do
we know if it is a maximum or a

218
00:18:49 --> 00:18:53
minimum?
Yeah.

219
00:18:53 --> 00:18:55
If we had a function of one
variable, we would decide things

220
00:18:55 --> 00:18:58
based on the second derivative.
And, in fact,

221
00:18:58 --> 00:19:00
we will see tomorrow how to do
things based on the second

222
00:19:00 --> 00:19:03
derivative.
But that is kind of tricky

223
00:19:03 --> 00:19:06
because there are a lot of
second derivatives.

224
00:19:06 --> 00:19:09
I mean we already have two
first derivatives.

225
00:19:09 --> 00:19:14
You can imagine that if you
keep taking partials you may end

226
00:19:14 --> 00:19:17
up with more and more,
so we will have to figure out

227
00:19:17 --> 00:19:19
carefully what the condition
should be.

228
00:19:19 --> 00:19:27
We will do that tomorrow.
For now, let's just try to look

229
00:19:27 --> 00:19:38
a bit at how do we understand
these things by hand?

230
00:19:38 --> 00:19:42
In fact, let me point out to
you immediately that there is

231
00:19:42 --> 00:19:49
more than maxima and minima.
Remember, we saw the example of

232
00:19:49 --> 00:19:52
x^2 + y^2.
That has a critical point.

233
00:19:52 --> 00:19:56
That critical point is
obviously a minimum.

234
00:19:56 --> 00:19:58
And, of course,
it could be a local minimum

235
00:19:58 --> 00:20:01
because it could be that if you
have a more complicated function

236
00:20:01 --> 00:20:04
there is indeed a minimum here,
but then elsewhere the function

237
00:20:04 --> 00:20:08
drops to a lower value.
We call that just a local

238
00:20:08 --> 00:20:12
minimum to say that it is a
minimum if you stick to values

239
00:20:12 --> 00:20:15
that are close enough to that
point.

240
00:20:15 --> 00:20:19
Of course, you also have local
maximum, which I didn't plot,

241
00:20:19 --> 00:20:23
but it is easy to plot.
That is a local maximum.

242
00:20:23 --> 00:20:27
But there is a third example of
critical point,

243
00:20:27 --> 00:20:31
and that is a saddle point.
The saddle point,

244
00:20:31 --> 00:20:35
it is a new phenomenon that you
don't really see in single

245
00:20:35 --> 00:20:38
variable calculus.
It is a critical point that is

246
00:20:38 --> 00:20:42
neither a minimum nor a maximum
because, depending on which

247
00:20:42 --> 00:20:46
direction you look in,
it's either one or the other.

248
00:20:46 --> 00:20:50
See the point in the middle,
at the origin,

249
00:20:50 --> 00:20:55
is a saddle point.
If you look at the tangent

250
00:20:55 --> 00:20:58
plane to this graph,
you will see that it is

251
00:20:58 --> 00:21:01
actually horizontal at the
origin.

252
00:21:01 --> 00:21:05
You have this mountain pass
where the ground is horizontal.

253
00:21:05 --> 00:21:08
But, depending on which
direction you go,

254
00:21:08 --> 00:21:12
you go up or down.
So, we say that a point is a

255
00:21:12 --> 00:21:16
saddle point if it is neither a
minimum nor a maximum.
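
A small sketch of the saddle behavior, using the function y^2 - x^2 from the pictures. The step size h and the variable names are assumptions made for illustration.

```python
def saddle(x, y):
    # y^2 - x^2 has a critical point at the origin.
    return y**2 - x**2

h = 0.1  # small step away from the origin (assumed)

at_origin = saddle(0.0, 0.0)
along_x = saddle(h, 0.0)   # step in the x direction: the value goes down
along_y = saddle(0.0, h)   # step in the y direction: the value goes up

print(along_x, at_origin, along_y)
```

The value at the origin sits strictly between the other two, so the origin is neither a local minimum nor a local maximum.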

256
00:21:16 --> 00:21:30
 
 

257
00:21:30 --> 00:21:38
Possibilities could be a local
min, a local max or a saddle.

258
00:21:38 --> 00:21:42
Tomorrow we will see how to
decide which one it is,

259
00:21:42 --> 00:21:46
in general, using second
derivatives.

260
00:21:46 --> 00:21:50
For this time,
let's just try to do it by

261
00:21:50 --> 00:21:53
hand.
I just want to observe,

262
00:21:53 --> 00:21:57
in fact, I can try to,
you know,

263
00:21:57 --> 00:21:58
these examples that I have
here,

264
00:21:58 --> 00:22:02
they are x^2 + y^2, y^2 - x^2,
they are sums or differences of

265
00:22:02 --> 00:22:05
squares.
And, if we know that we can put

266
00:22:05 --> 00:22:08
things as sum of squares for
example, we will be done.

267
00:22:08 --> 00:22:16
Let's try to express this maybe
as the square of something.

268
00:22:16 --> 00:22:21
The main problem is this 2xy.
Observe we know something that

269
00:22:21 --> 00:22:26
starts with x^2 - 2xy but is
actually a square of something

270
00:22:26 --> 00:22:32
else.
It would be x^2 - 2xy + y^2,

271
00:22:32 --> 00:22:37
not plus 3y^2.
Let's try that.

272
00:22:37 --> 00:22:48
So, we are going to complete
the square.

273
00:22:48 --> 00:22:53
I am going to say it is x minus
y squared, so it gives me the

274
00:22:53 --> 00:23:01
first two terms and also the y^2.
Well, I still need to add two

275
00:23:01 --> 00:23:09
more y^2, and I also need to
add, of course,

276
00:23:09 --> 00:23:15
the 2x and - 2y.
It is still not simple enough

277
00:23:15 --> 00:23:19
for my taste.
I can actually do better.

278
00:23:19 --> 00:23:24
These guys look like a sum of
squares, but here I have this

279
00:23:24 --> 00:23:28
extra stuff, 2x - 2y.
Well, that is 2 (x - y).

280
00:23:28 --> 00:23:32
It looks like maybe we can
modify this and make this into

281
00:23:32 --> 00:23:36
another square.
So, in fact,

282
00:23:36 --> 00:23:45
I can simplify this further to
(x - y + 1)^2.

283
00:23:45 --> 00:23:51
That would be (x - y)^2 + 2(x -
y), and then there is a plus

284
00:23:51 --> 00:23:55
one.
Well, we don't have a plus one

285
00:23:55 --> 00:24:00
so let's remove it by
subtracting one.

286
00:24:00 --> 00:24:07
And I still have my 2y^2.
Do you see why this is the same

287
00:24:07 --> 00:24:13
function?
Yeah.

288
00:24:13 --> 00:24:19
Again, if I expand x minus y
plus one squared,

289
00:24:19 --> 00:24:28
I get (x - y)^2 + 2(x - y) + 1.
But I will have minus one that

290
00:24:28 --> 00:24:34
will cancel out and then I have
a plus 2y^2.

291
00:24:34 --> 00:24:41
Now, what I have is a sum of
two squares minus one.

292
00:24:41 --> 00:24:44
And this critical point,
(x, y) = (-1, 0),

293
00:24:44 --> 00:24:49
that is actually when this is
zero and that is zero,

294
00:24:49 --> 00:24:55
so that is the smallest value.
This is always greater or equal

295
00:24:55 --> 00:25:00
to zero, the same with that one,
so that is always at least

296
00:25:00 --> 00:25:03
minus one.
And minus one happens to be the

297
00:25:03 --> 00:25:13
value at the critical point.
So, it is a minimum.
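
The steps of completing the square, written out in one place as a worked derivation of the argument above:

```latex
\begin{aligned}
f(x, y) &= x^2 - 2xy + 3y^2 + 2x - 2y \\
        &= (x - y)^2 + 2y^2 + 2(x - y) \\
        &= (x - y + 1)^2 - 1 + 2y^2 \;\ge\; -1, \\
f(-1, 0) &= (-1 - 0 + 1)^2 - 1 + 2 \cdot 0^2 = -1 .
\end{aligned}
```

Equality holds only when x - y + 1 = 0 and y = 0, that is, at (x, y) = (-1, 0).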

298
00:25:13 --> 00:25:16
Now, of course here I was very
lucky.

299
00:25:16 --> 00:25:19
I mean, generally,
I couldn't expect things to

300
00:25:19 --> 00:25:21
simplify that much.
In fact, I cheated.

301
00:25:21 --> 00:25:26
I started from that,
I expanded, and then that is

302
00:25:26 --> 00:25:30
how I got my example.
The general method will be a

303
00:25:30 --> 00:25:32
bit different,
but you will see it will

304
00:25:32 --> 00:25:34
actually also involve completing
squares.

305
00:25:34 --> 00:25:42
Just there is more to it than
what we have seen.

306
00:25:42 --> 00:25:48
We will come back to this
tomorrow.

307
00:25:48 --> 00:25:56
Sorry?
How do I know that this equals

308
00:25:56 --> 00:26:09
-- How do I know that the whole
function is greater or equal to

309
00:26:09 --> 00:26:15
negative one?
Well, I wrote f of x,

310
00:26:15 --> 00:26:20
y as something squared plus
2y^2 - 1.

311
00:26:20 --> 00:26:25
This squared is always a
positive number and not a

312
00:26:25 --> 00:26:27
negative.
It is a square.

313
00:26:27 --> 00:26:30
The square of something is
always non-negative.

314
00:26:30 --> 00:26:34
Similarly, y^2 is also always
non-negative.

315
00:26:34 --> 00:26:38
So if you add something that is
at least zero plus something

316
00:26:38 --> 00:26:40
that is at least zero and you
subtract one,

317
00:26:40 --> 00:26:43
you get always at least minus
one.

318
00:26:43 --> 00:26:48
And, in fact,
the only way you can get minus

319
00:26:48 --> 00:26:54
one is if both of these guys are
zero at the same time.

320
00:26:54 --> 00:27:17
That is how I get my minimum.
More about this tomorrow.

321
00:27:17 --> 00:27:20
In fact, 
what I would like to tell you

322
00:27:20 --> 00:27:23
about now instead is a nice
application of min,

323
00:27:23 --> 00:27:27
max problems that maybe you
don't think of as a min,

324
00:27:27 --> 00:27:31
max problem that you will see.
I mean you won't think of it

325
00:27:31 --> 00:27:35
that way because probably your
calculator can do it for you or,

326
00:27:35 --> 00:27:37
if not, your computer can do it
for you.

327
00:27:37 --> 00:27:42
But it is actually something
where the theory is based on

328
00:27:42 --> 00:27:47
minimization in two variables.
Very often in experimental

329
00:27:47 --> 00:27:52
sciences you have to do
something called least-squares

330
00:27:52 --> 00:28:01
interpolation.
And what is that about?

331
00:28:01 --> 00:28:07
Well, it is the idea that maybe
you do some experiments and you

332
00:28:07 --> 00:28:11
record some data.
You have some data x and some

333
00:28:11 --> 00:28:13
data y.
And, I don't know,

334
00:28:13 --> 00:28:17
maybe, for example,
x is -- Maybe you're measuring

335
00:28:17 --> 00:28:21
frogs and you're trying to
measure how big the frog leg is

336
00:28:21 --> 00:28:23
compared to the eyes of the
frog,

337
00:28:23 --> 00:28:26
or you're trying to measure
something.

338
00:28:26 --> 00:28:30
And if you are doing chemistry
then it could be how much you

339
00:28:30 --> 00:28:35
put of some reactant and how
much of the output product that

340
00:28:35 --> 00:28:37
you wanted to synthesize
was generated.

341
00:28:37 --> 00:28:43
All sorts of things.
Make up your own example.

342
00:28:43 --> 00:28:46
You measure basically,
for various values of x,

343
00:28:46 --> 00:28:48
what the value of y ends up
being.

344
00:28:48 --> 00:28:52
And then you like to claim
these points are kind of

345
00:28:52 --> 00:28:53
aligned.
And, of course,

346
00:28:53 --> 00:28:55
to a mathematician they are not
aligned.

347
00:28:55 --> 00:28:57
But, to an experimental
scientist, that is evidence that

348
00:28:57 --> 00:29:00
there is a relation between the
two.

349
00:29:00 --> 00:29:03
And so you want to claim -- And
in your paper you will actually

350
00:29:03 --> 00:29:05
draw a nice little line like
that.

351
00:29:05 --> 00:29:10
The quantities depend linearly
on each other.

352
00:29:10 --> 00:29:15
The question is how do we come
up with that nice line that

353
00:29:15 --> 00:29:19
passes smack in the middle of
the points?

354
00:29:19 --> 00:29:27
The question is,
given experimental data xi,

355
00:29:27 --> 00:29:36
yi -- Maybe I should actually
be more precise.

356
00:29:36 --> 00:29:37
You are given some experimental
data.

357
00:29:37 --> 00:29:45
You have data points x1,
y1, x2, y2 and so on,

358
00:29:45 --> 00:29:52
xn, yn,
the question would be find the

359
00:29:52 --> 00:30:00
"best fit"
line of the form y = ax + b

360
00:30:00 --> 00:30:08
that somehow approximates very
well this data.

361
00:30:08 --> 00:30:11
You can also use that right
away to predict various things.

362
00:30:11 --> 00:30:13
For example,
if you look at your new

363
00:30:13 --> 00:30:17
homework,
actually the first problem asks

364
00:30:17 --> 00:30:22
you to predict how many iPods
will be on this planet in ten

365
00:30:22 --> 00:30:28
years looking at past sales and
how they behave.

366
00:30:28 --> 00:30:31
One thing, right away,
before you lose all the money

367
00:30:31 --> 00:30:35
that you don't have yet,
you cannot use that to predict

368
00:30:35 --> 00:30:39
the stock market.
So, don't try to use that to

369
00:30:39 --> 00:30:52
make money.
It doesn't work.

370
00:30:52 --> 00:30:58
One tricky thing here that I
want to draw your attention to

371
00:30:58 --> 00:31:02
is what are the unknowns here?
The natural answer would be to

372
00:31:02 --> 00:31:03
say that the unknowns are x and
y.

373
00:31:03 --> 00:31:07
That is not actually the case.
We are not going to solve for

374
00:31:07 --> 00:31:09
some x and y.
I mean we have some values

375
00:31:09 --> 00:31:12
given to us.
And, when we are looking for

376
00:31:12 --> 00:31:16
that line, we don't really care
about the perfect value of x.

377
00:31:16 --> 00:31:21
What we care about is actually
these coefficients a and b that

378
00:31:21 --> 00:31:26
will tell us what the relation
is between x and y.

379
00:31:26 --> 00:31:30
In fact, we are trying to solve
for a and b that will give us

380
00:31:30 --> 00:31:34
the nicest possible line for
these points.

381
00:31:34 --> 00:31:36
The unknowns,
in our equations,

382
00:31:36 --> 00:31:39
will have to be a and b,
not x and y.

383
00:31:39 --> 00:32:11
 
 

384
00:32:11 --> 00:32:20
The question really is find the
"best"

385
00:32:20 --> 00:32:23
a and b.
And, of course,

386
00:32:23 --> 00:32:26
we have to decide what we mean
by best.

387
00:32:26 --> 00:32:30
Best will mean that we minimize
some function of a and b that

388
00:32:30 --> 00:32:34
measures the total errors that
we are making when we are

389
00:32:34 --> 00:32:38
choosing this line compared to
the experimental data.

390
00:32:38 --> 00:32:43
Maybe, roughly speaking,
it should measure how far these

391
00:32:43 --> 00:32:49
points are from the line.
But now there are various ways

392
00:32:49 --> 00:32:52
to do it.
And a lot of them are valid

393
00:32:52 --> 00:32:57
but they give you different answers.
You have to decide what it is

394
00:32:57 --> 00:32:59
that you prefer.
For example,

395
00:32:59 --> 00:33:04
you could measure the distance
to the line by projecting

396
00:33:04 --> 00:33:08
perpendicularly.
Or you could measure instead,

397
00:33:08 --> 00:33:13
for a given value of x,
the difference between the

398
00:33:13 --> 00:33:17
experimental value of y and the
predicted one.

399
00:33:17 --> 00:33:21
And that is often more relevant
because these guys actually may

400
00:33:21 --> 00:33:25
be expressed in different units.
They are not the same type of

401
00:33:25 --> 00:33:29
quantity.
You cannot actually combine

402
00:33:29 --> 00:33:32
them arbitrarily.
Anyway, the convention is

403
00:33:32 --> 00:33:34
usually we measure distance in
this way.

404
00:33:34 --> 00:33:38
Next, you could try to minimize
the largest distance.

405
00:33:38 --> 00:33:42
Say we look at who has the
largest error and we make that

406
00:33:42 --> 00:33:44
the smallest possible.
The drawback of doing that is

407
00:33:44 --> 00:33:47
experimentally very often you
have one data point that is not

408
00:33:47 --> 00:33:50
good because maybe you fell
asleep in front of the

409
00:33:50 --> 00:33:53
experiment.
And so you didn't measure the

410
00:33:53 --> 00:33:55
right thing.
You tend to want to not give

411
00:33:55 --> 00:33:59
too much importance to some data
point that is far away from the

412
00:33:59 --> 00:34:02
others.
Maybe instead you want to

413
00:34:02 --> 00:34:06
measure the average distance or
maybe you want to actually give

414
00:34:06 --> 00:34:09
more weight to things that are
further away.

415
00:34:09 --> 00:34:12
And then you want to use not
the distance but the square of

416
00:34:12 --> 00:34:14
the distance.
There are various possible

417
00:34:14 --> 00:34:18
answers, but one of them gives
us actually a particularly nice

418
00:34:18 --> 00:34:22
formula for a and b.
And so that is why it is the

419
00:34:22 --> 00:34:27
universally used one.
Here it says least squares.

420
00:34:27 --> 00:34:31
That's because we will measure,
actually, the sum of the

421
00:34:31 --> 00:34:35
squares of the errors.
And why do we do that?

422
00:34:35 --> 00:34:37
Well, part of it is because it
looks good.

423
00:34:37 --> 00:34:42
When you see this plot in
scientific papers they really

424
00:34:42 --> 00:34:46
look like the line is indeed the
ideal line.

425
00:34:46 --> 00:34:49
And the second reason is
because actually the

426
00:34:49 --> 00:34:52
minimization problem that we
will get is particularly simple,

427
00:34:52 --> 00:34:57
well-posed and easy to solve.
So we will have a nice formula

428
00:34:57 --> 00:35:03
for the best a and the best b.
If you have a method that is

429
00:35:03 --> 00:35:07
simple and gives you a good
answer then that is probably

430
00:35:07 --> 00:35:09
good.
We have to define best.

431
00:35:09 --> 00:35:22
Here it is in the sense of
minimizing the total square

432
00:35:22 --> 00:35:29
error.
Or maybe I should say total

433
00:35:29 --> 00:35:35
square deviation instead.
What do I mean by this?

434
00:35:35 --> 00:35:44
The deviation for each data
point is the difference between

435
00:35:44 --> 00:35:52
what you have measured and what
you are predicting by your

436
00:35:52 --> 00:36:00
model.
That is the difference between

437
00:36:00 --> 00:36:11
yi and axi plus b.
Now, what we will do is try to

438
00:36:11 --> 00:36:25
minimize the function capital D,
which is just the sum for all

439
00:36:25 --> 00:36:36
the data points of the square of
the deviation.

440
00:36:36 --> 00:36:40
Let me go over this again.
This is a function of a and b.

441
00:36:40 --> 00:36:43
Of course there are a lot of
letters in here,

442
00:36:43 --> 00:36:46
but xi and yi, in real life,
will be numbers given to

443
00:36:46 --> 00:36:48
you.
There will be numbers that you

444
00:36:48 --> 00:36:51
have measured.
You have measured all of this

445
00:36:51 --> 00:36:53
data.
They are just going to be

446
00:36:53 --> 00:36:58
numbers.
You put them in there and you

447
00:36:58 --> 00:37:04
get a function of a and b.
Any questions?

448
00:37:04 --> 00:37:16
 
 

449
00:37:16 --> 00:37:20
How do we minimize this
function of a and b?

450
00:37:20 --> 00:37:27
Well, let's use your knowledge.
Let's actually look for a

451
00:37:27 --> 00:37:34
critical point.
We want to solve for partial D

452
00:37:34 --> 00:37:42
over partial a = 0,
partial D over partial b = 0.

453
00:37:42 --> 00:37:48
That is how we look for
critical points.

454
00:37:48 --> 00:37:52
Let's take the derivative of
this with respect to a.

455
00:37:52 --> 00:37:59
Well, the derivative of a sum
is the sum of the derivatives.

456
00:37:59 --> 00:38:04
And now we have to take the
derivative of this quantity

457
00:38:04 --> 00:38:07
squared.
Remember, we take the

458
00:38:07 --> 00:38:11
derivative of the square.
We take twice this quantity

459
00:38:11 --> 00:38:15
times the derivative of what we
are squaring.

460
00:38:15 --> 00:38:26
We will get 2(yi - (axi + b)) times
the derivative of this with

461
00:38:26 --> 00:38:30
respect to a.
What is the derivative of this

462
00:38:30 --> 00:38:35
with respect to a?
Negative xi, exactly.

463
00:38:35 --> 00:38:38
And so we will want this to be
0.

464
00:38:38 --> 00:38:41
And partial D over partial b,
we do the same thing,

465
00:38:41 --> 00:38:45
but differentiating with
respect to b instead of with

466
00:38:45 --> 00:38:50
respect to a.
Again, the sum of twice

467
00:38:50 --> 00:38:58
yi minus (axi plus b) times the
derivative of this with respect

468
00:38:58 --> 00:39:02
to b is, I think,
negative one.

469
00:39:02 --> 00:39:07
Those are the equations we have
to solve.
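
Written out in symbols, the two critical-point equations described above look as follows. This is only a transcription of the spoken derivation; the index range i = 1, ..., n is an assumption about how the data points are numbered.

```latex
\begin{align*}
D(a,b) &= \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2 \\
\frac{\partial D}{\partial a} &= \sum_{i=1}^{n} 2\bigl(y_i - (a x_i + b)\bigr)(-x_i) = 0 \\
\frac{\partial D}{\partial b} &= \sum_{i=1}^{n} 2\bigl(y_i - (a x_i + b)\bigr)(-1) = 0
\end{align*}
```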

470
00:39:07 --> 00:39:10
Well, let's reorganize this a
little bit.

471
00:39:10 --> 00:39:24
 
 

472
00:39:24 --> 00:39:32
The first equation.
See, there are a's and there

473
00:39:32 --> 00:39:36
are b's in these equations.
I am going to just look at the

474
00:39:36 --> 00:39:39
coefficients of a and b.
If you have good eyes,

475
00:39:39 --> 00:39:42
you can see probably that these
are actually linear equations in

476
00:39:42 --> 00:39:45
a and b.
There is a lot of clutter with

477
00:39:45 --> 00:39:47
all these x's and y's all over
the place.

478
00:39:47 --> 00:39:55
Let's actually try to expand
things and make that more

479
00:39:55 --> 00:39:59
apparent.
The first thing I will do is

480
00:39:59 --> 00:40:02
actually get rid of these
factors of two.

481
00:40:02 --> 00:40:05
They are just not very
important.

482
00:40:05 --> 00:40:10
I can simplify things.
Next, I am going to look at the

483
00:40:10 --> 00:40:15
coefficient of a.
I will get basically a times xi

484
00:40:15 --> 00:40:24
squared.
Let me just do it and should be

485
00:40:24 --> 00:40:33
clear.
I claim when we simplify this

486
00:40:33 --> 00:40:46
we get xi squared times a plus
xi times b minus xiyi.

487
00:40:46 --> 00:40:53
And we set this equal to zero.
Do you agree that this is what

488
00:40:53 --> 00:40:57
we get when we expand that
product?

489
00:40:57 --> 00:41:03
Yeah. Kind of?
OK. Let's do the other one.

490
00:41:03 --> 00:41:08
We just multiply by minus one,
so we take the opposite of that

491
00:41:08 --> 00:41:19
which would be axi plus b.
I will write that as xi a plus b

492
00:41:19 --> 00:41:25
minus yi.
Sorry. I forgot the n here.

493
00:41:25 --> 00:41:30
And let me just reorganize that
by actually putting all the a's

494
00:41:30 --> 00:41:34
together.
That means I will have sum of

495
00:41:34 --> 00:41:40
all the xi squared times a plus sum of
xi b minus sum of xiyi equal to

496
00:41:40 --> 00:41:41
zero.

497
00:41:41 --> 00:42:08

498
00:42:08 --> 00:42:15
If I rewrite this,
it becomes sum of xi squared times a

499
00:42:15 --> 00:42:24
plus sum of the xi's times b,
and let me move the other guys

500
00:42:24 --> 00:42:30
to the other side,
equals sum of xiyi.

501
00:42:30 --> 00:42:37
And that one becomes sum of xi
times a.

502
00:42:37 --> 00:42:41
Plus how many b's do I get on
this one?

503
00:42:41 --> 00:42:45
I get one for each data point.
When I sum them together,

504
00:42:45 --> 00:42:48
I will get n.
Very good.

505
00:42:48 --> 00:42:56
n times b equals sum of yi.
Now, these quantities look

506
00:42:56 --> 00:42:58
scary, but they are actually
just numbers.

507
00:42:58 --> 00:43:01
For example,
this one, you look at all your

508
00:43:01 --> 00:43:05
data points.
For each of them you take the

509
00:43:05 --> 00:43:10
value of x and you just sum all
these numbers together.

510
00:43:10 --> 00:43:19
What you get,
actually, is a linear system in

511
00:43:19 --> 00:43:26
a and b, a two by two linear
system.

512
00:43:26 --> 00:43:32
And so now we can solve this
for a and b.

513
00:43:32 --> 00:43:35
In practice,
of course, first you plug in

514
00:43:35 --> 00:43:40
the numbers for xi and yi and
then you solve the system that

515
00:43:40 --> 00:43:44
you get.
And we know how to solve two by

516
00:43:44 --> 00:43:46
two linear systems,
I hope.

517
00:43:46 --> 00:43:50
That's how we find the best fit
line.
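
As a rough sketch of the procedure just described, here is how one might compute the sums and solve the two by two system in Python. The function name best_fit_line and the sample data are made up for illustration; solving by Cramer's rule is one choice among several, not necessarily how you would do it by hand.

```python
def best_fit_line(xs, ys):
    """Return (a, b) minimizing the sum of (y_i - (a*x_i + b))**2."""
    n = len(xs)
    sx = sum(xs)                                # sum of x_i
    sy = sum(ys)                                # sum of y_i
    sxx = sum(x * x for x in xs)                # sum of x_i squared
    sxy = sum(x * y for x, y in zip(xs, ys))    # sum of x_i * y_i

    # The two by two linear system from the critical-point equations:
    #   sxx * a + sx * b = sxy
    #   sx  * a + n  * b = sy
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")

    # Cramer's rule for a two by two system.
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


# Made-up measurements lying roughly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
a, b = best_fit_line(xs, ys)
print(a, b)  # about 1.94 and 1.09
```

The determinant is zero exactly when all the x values coincide, in which case no single best line exists.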

518
00:43:50 --> 00:43:54
Now, why is that going to be
the best one instead of the

519
00:43:54 --> 00:43:56
worst one?
We just solved for a critical

520
00:43:56 --> 00:43:58
point.
That could actually be a

521
00:43:58 --> 00:44:01
maximum of this error function
D.

522
00:44:01 --> 00:44:05
We will have the answer to that
next time, but trust me.

523
00:44:05 --> 00:44:08
If you really want to go over
the second derivative test that

524
00:44:08 --> 00:44:11
we will see tomorrow and apply
it in this case,

525
00:44:11 --> 00:44:14
it is quite hard to check,
but you can see it is actually

526
00:44:14 --> 00:44:28
a minimum.
I will just say -- -- we can

527
00:44:28 --> 00:44:42
show that it is a minimum.
Now, even though the linear

528
00:44:42 --> 00:44:47
case is the one that we are the
most familiar with,

529
00:44:47 --> 00:44:56
least-squares interpolation
actually works in much more

530
00:44:56 --> 00:45:03
general settings.
Because instead of fitting for

531
00:45:03 --> 00:45:06
the best line,
if you think it has a different

532
00:45:06 --> 00:45:10
kind of relation then maybe you
can fit it using a different

533
00:45:10 --> 00:45:14
kind of formula.
Let me actually illustrate that

534
00:45:14 --> 00:45:17
with an example.
I don't know if you are

535
00:45:17 --> 00:45:21
familiar with Moore's law.
It is something that is

536
00:45:21 --> 00:45:24
supposed to tell you how quickly
basically computer chips become

537
00:45:24 --> 00:45:27
smarter and faster all
the time.

538
00:45:27 --> 00:45:31
It's a law that says things
about the number of transistors

539
00:45:31 --> 00:45:33
that you can fit onto a computer
chip.

540
00:45:33 --> 00:45:45
Here I have some data about --
Here is data about the number of

541
00:45:45 --> 00:45:58
transistors on a standard PC
processor as a function of time.

542
00:45:58 --> 00:46:01
And if you try to do a
best-line fit,

543
00:46:01 --> 00:46:07
well, it doesn't seem to follow
a linear trend.

544
00:46:07 --> 00:46:11
On the other hand, 
if you plot the diagram in the

545
00:46:11 --> 00:46:13
log scale,
the log of the number of

546
00:46:13 --> 00:46:15
transistors as a function of
time,

547
00:46:15 --> 00:46:21
then you get a much better line.
And so, in fact,

548
00:46:21 --> 00:46:26
that means that you had an
exponential relation between the

549
00:46:26 --> 00:46:30
number of transistors and time.
And so, actually that's what

550
00:46:30 --> 00:46:32
Moore's law says.
It says that the number of

551
00:46:32 --> 00:46:36
transistors in the chip doubles
every 18 months or every two

552
00:46:36 --> 00:46:40
years.
They keep changing the

553
00:46:40 --> 00:46:49
statement.
How do we find the best

554
00:46:49 --> 00:46:58
exponential fit?
Well, an exponential fit would

555
00:46:58 --> 00:47:05
be something of the form y equals
a constant c times exponential of

556
00:47:05 --> 00:47:09
a times x.
That is what we want to look at.

557
00:47:09 --> 00:47:13
Well, we could try to minimize
a square error like we did

558
00:47:13 --> 00:47:16
before.
That doesn't work well at all.

559
00:47:16 --> 00:47:18
The equations that you get are
very complicated.

560
00:47:18 --> 00:47:24
You cannot solve them.
But remember what I showed you

561
00:47:24 --> 00:47:28
on this log plot.
If you plot the log of y as a

562
00:47:28 --> 00:47:33
function of x then suddenly it
becomes a linear relation.

563
00:47:33 --> 00:47:43
Observe, this is the same as ln
of y equals ln of c plus ax.

564
00:47:43 --> 00:47:55
And that is the linear best fit.
What you do is you just look

565
00:47:55 --> 00:48:08
for the best straight line fit
for the log of y.
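
Here is a small sketch of that trick in Python: take logs of the y values, fit a straight line, then convert the intercept back to c. The function names and the sample numbers are invented for illustration and are not real transistor counts. Note that this minimizes the square deviation of ln y, not of y itself.

```python
import math


def best_fit_line(xs, ys):
    """Return (a, b) minimizing the sum of (y_i - (a*x_i + b))**2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")
    return (sxy * n - sx * sy) / det, (sxx * sy - sx * sxy) / det


def best_fit_exponential(xs, ys):
    """Fit y = c * exp(a*x) by a straight-line fit of ln(y) against x."""
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive to take logs")
    logs = [math.log(y) for y in ys]
    # ln(y) = ln(c) + a*x, so the slope is a and the intercept is ln(c).
    a, ln_c = best_fit_line(xs, logs)
    return math.exp(ln_c), a


# Invented data: a count that doubles every 2 units of time.
xs = [0.0, 2.0, 4.0, 6.0]
ys = [1000.0, 2000.0, 4000.0, 8000.0]
c, a = best_fit_exponential(xs, ys)
print(c, a, math.log(2) / a)  # the last number is the doubling time, about 2
```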

566
00:48:08 --> 00:48:10
That is something we already
know.

567
00:48:10 --> 00:48:12
But you can also do,
for example,

568
00:48:12 --> 00:48:16
let's say that we have
something more complicated.

569
00:48:16 --> 00:48:21
Let's say that we have actually
a quadratic law.

570
00:48:21 --> 00:48:27
For example,
y is of the form ax^2 + bx + c.

571
00:48:27 --> 00:48:31
And, of course,
you are trying to find somehow

572
00:48:31 --> 00:48:34
the best.
That would mean here fitting

573
00:48:34 --> 00:48:37
the best parabola for your data
points.

574
00:48:37 --> 00:48:40
Well, to do that,
you would need to find a,

575
00:48:40 --> 00:48:45
b and c.
And now you will have actually

576
00:48:45 --> 00:48:51
a function of a,
b and c, which would be the sum

577
00:48:51 --> 00:48:57
over all the data points of the
square deviation.

578
00:48:57 --> 00:49:01
And, if you try to solve for
critical points,

579
00:49:01 --> 00:49:03
now you will have three
equations involving a,

580
00:49:03 --> 00:49:05
b and c,
in fact, you will find a three

581
00:49:05 --> 00:49:09
by three linear system.
And it works the same way.

582
00:49:09 --> 00:49:14
Just you have a little bit more
data.
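
To make the three by three case concrete, here is a sketch in Python. Setting the partial derivatives of the total square deviation with respect to a, b and c to zero gives a linear system whose coefficients are sums of powers of the xi. The function names, the Gaussian elimination helper and the sample data are all assumptions made for this illustration.

```python
def solve_linear_system(m, rhs):
    """Solve m * v = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    # Back substitution.
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * sol[k] for k in range(i + 1, n))
        sol[i] = (aug[i][n] - s) / aug[i][i]
    return sol


def best_fit_parabola(xs, ys):
    """Return (a, b, c) minimizing the sum of (y_i - (a*x_i**2 + b*x_i + c))**2."""
    s = [sum(x ** k for x in xs) for k in range(5)]   # s[k] = sum of x_i^k, s[0] = n
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Critical-point equations, one per unknown:
    #   s4*a + s3*b + s2*c = t2
    #   s3*a + s2*b + s1*c = t1
    #   s2*a + s1*b + n *c = t0
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    a, b, c = solve_linear_system(m, [t[2], t[1], t[0]])
    return a, b, c


# Made-up data lying exactly on y = x^2 - 2x + 3.
print(best_fit_parabola([-1.0, 0.0, 1.0, 2.0], [6.0, 3.0, 2.0, 3.0]))
```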

583
00:49:14 --> 00:49:19
Basically, you see that these
best fit problems are an example

584
00:49:19 --> 00:49:24
of a minimization problem, where
maybe you didn't expect to see

585
00:49:24 --> 00:49:30
minimization problems come in.
But that is really the way to

586
00:49:30 --> 00:49:34
handle these questions.
Tomorrow we will go back to the

587
00:49:34 --> 00:49:38
question of how do we decide
whether it is a minimum or a

588
00:49:38 --> 00:49:40
maximum.
And we will continue exploring

589
00:49:40 --> 00:49:43
functions of several variables.
 

590
00:49:43 --> 00:49:48