The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JAMES SWAN: OK. Should we begin? Let me remind you, we switched topics. We transitioned from linear algebra, solving systems of linear equations, to solving systems of nonlinear equations. And it turns out, linear algebra is at the core of the way that we're going to solve these equations. We need iterative approaches. These problems are complicated. We don't know how many solutions there could be. We have no idea where those solutions could be located. We have no exact ways of finding them. We use iterative methods to transform nonlinear equations into simpler problems, right? Iterates of systems of linear equations. And the key to that was the Newton-Raphson method.

So I'm going to pick up where we left off with the Newton-Raphson method, and we're going to find ways of being less Newton-Raphson-y in order to overcome some difficulties with the method, shortcomings of the method. There are a number of them that have to be overcome in various ways. And you sort of choose these so-called quasi-Newton-Raphson methods as you need them. OK, so you'll find out. You try to solve a problem, and the Newton-Raphson method presents some difficulty, you might resort to a quasi-Newton-Raphson method instead.

Built into MATLAB is a nonlinear equation solver, fsolve. OK, it's going to happily solve systems of nonlinear equations for you, and it's going to use this methodology to do it. It's going to use various aspects of these quasi-Newton-Raphson methods to do it. I'll sort of point out places where fsolve takes things from our lecture and implements them for you.
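For reference, a minimal call to fsolve looks like the sketch below; the two-equation system here is made up purely for illustration.

    % A made-up two-equation system, for illustration only:
    %   f1 = x1^2 + x2^2 - 4,   f2 = x1*x2 - 1
    fun = @(x) [x(1)^2 + x(2)^2 - 4; x(1)*x(2) - 1];
    x0  = [1; 1];              % initial guess
    x   = fsolve(fun, x0);     % MATLAB's built-in nonlinear equation solver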
It will even use some more complicated methods that we'll talk about later on in the context of optimization.

Somebody asked an interesting question, which is: how many of these nonlinear equations am I going to want to solve at once? Right? Like, I have a system of these equations. What does a big system of nonlinear equations look like? And just like with linear equations, it's as big as you can imagine. So one case you could think about is trying to solve, for example, the steady Navier-Stokes equations. That's a nonlinear partial differential equation for the velocity field and the pressure in a fluid. And at high Reynolds number, that nonlinearity is going to present itself in terms of inertial terms that may even dominate the flow characteristics in many places. We'll learn ways of discretizing partial differential equations like that. And so then, at each point in the fluid we're interested in, we're going to have a nonlinear equation that we have to solve. So there's going to be a system of these nonlinear equations that are coupled together. How many points are there going to be? That's up to you, OK? And so you're going to need methods like this to solve that. It sounds very complicated. So a lot of times in fluid mechanics, we have better ways of going about doing it. But in principle, we can have any number of nonlinear equations that we want to solve.

We discussed last time the Newton-Raphson method, which was based around the idea of linearization. We have these nonlinear equations. We don't know what to do with them. So let's linearize them, right? If we have some guess for the solution, which isn't perfect, but it's our best possible guess, let's look at the function and find a linearized form of the function and see where that linearized form has an intercept. And we just have an Ansatz. We guess that this is a better solution than the one we had before. And we iterate.
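The iteration just described, written as a minimal MATLAB sketch; f, J, x0, maxiter, and tol are placeholder names for the residual function, its Jacobian, the initial guess, and the stopping criteria.

    x = x0;
    for i = 1:maxiter
        d = -J(x) \ f(x);                 % solve the linearized system for the step
        x = x + d;                        % take the full Newton-Raphson step
        if norm(f(x)) < tol && norm(d) < tol
            break                         % stop when function norm and step norm are small
        end
    end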
It turns out you can prove that this sort of a strategy-- this Newton-Raphson strategy-- is locally convergent. If I start with a guess sufficiently close to the root, you can prove mathematically that this procedure will terminate with a solution at the root, right? It's going to approach the root after an infinite number of iterates. That's wonderful. It's locally convergent, not globally convergent. So this is one of those problems that we discussed.

Take a second here, right? Here's your Newton-Raphson formula. You've got it on your slides. Take a second here and-- this is sort of interesting-- derive the Babylonian method, right? It turns out the Babylonians didn't know anything about Newton-Raphson, but they had some good guesses for how to find square roots, right? Find the roots of an equation like this. See that you understand the Newton-Raphson method by deriving the Babylonian method, right? The iterative method for finding the square root of s as the root of this equation. Can you do it?

[SIDE CONVERSATION]

JAMES SWAN: Yes, you know how to do this, right? So calculate the derivative. The derivative is 2x. Here's our formula for the iterative method, right? So it's f of x over f prime of x. That sets the magnitude of the step. The direction is minus this magnitude. It's in 1D, so we either go left or we go right. Minus sets the direction. We add that to our previous guess, and we have our new iterate, right? You substitute f and f prime, and you can simplify this down to the Babylonian method, which says take the average of x and s over x. If I'm at the root, both of these should be square root of s, and this quantity should be zero exactly, right? And you'll get your solution. So that's the Babylonian method, right? It's just a special case of the Newton-Raphson method. It was pretty good back in the day, right?
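As a quick check, the Babylonian iteration derived above can be written as the sketch below; the number s and the initial guess are arbitrary placeholders.

    s = 2;                    % number whose square root we want
    x = 1;                    % initial guess
    for i = 1:10
        x = 0.5*(x + s/x);    % Newton-Raphson on f(x) = x^2 - s, simplified
    end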
Quadratic convergence to the square root of a number. I mentioned early on, computers got really good at computing square roots at one point, because somebody did something kind of magic. They came up with a scheme for getting good initial guesses for the square root. This iterative method has to start with some initial guess. If it starts far away, it'll take more iterations to get there. It'll get there, but it's going to take more iterations to get there. That's undesirable if you're trying to do fast calculations. So somebody came up with some magic scheme using floating point mathematics, right? They masked some of the bits in the digits of these numbers-- a special number to mask those bits. They found that using optimization, it turns out. And they got really good initial guesses, and then it would take one or two iterations with the Newton-Raphson method to get 16 digits of accuracy. That's pretty good.

But good initial guesses are important. We'll talk about that next week on Wednesday-- where do those good initial guesses come from? But sometimes we don't have those available to us. So what are some other ways that we can improve the Newton-Raphson method? That will be the topic of today's lecture.

What does the Newton-Raphson method look like graphically in many dimensions? We talked about this Jacobian. Right, when we're trying to find the roots of a nonlinear equation where our function has more than one dimension-- let's say it has two dimensions. So we have an f1 and an f2. And our unknowns are x1 and x2; they live in the x1-x2 plane, right? f1 might be this, say, bowl-shaped function I've sketched out in red, right? It's three dimensional. It's some surface here, right? We have some initial guess for the solution. We go up to the function, and we find a linearization of it, which is not a line but a plane. And that plane intersects the x1-x2 plane at a line.
And our next best guess is going to live somewhere on this line. Where on this line depends on the linearization of f2, right? So we've got to draw the same picture for f2, but I'm not going to do that for you. So let's say this is where the equivalent line from f2 intersects the line from f1, right? So the two linearizations intersect here. That's our next best guess. We go back up to the curve. We find the plane that's tangent to the curve. We figure out where it intersects the x1-x2 plane. That's a line. We find the point on the line that's our next best guess, and continue. Finding that intersection in the plane is the act of computing Jacobian inverse times f. OK?

If we project down to just the x1-x2 plane, and we draw the curves where f1 equals 0 and f2 equals 0, right? Then for each of these iterates, we start with an initial guess. We find the planes that are tangent to these curves, or to these surfaces, and where they intersect the x1-x2 plane. Those give us these lines. And the intersection of the lines gives us our next approximation. And so our iterate steps along in the x1-x2 plane. It takes some path through that plane. And eventually it will approach this locally unique solution. So that's what this iterative method is doing, right? It's navigating this multidimensional space, right? It moves where it has to in order to satisfy these linearized equations, right? Producing ever better approximations for a root.

Start close, and it'll converge fast. How fast? Quadratically. And you can prove this. I'll prove it in 1D. You might think about the multidimensional case, but I'll show you in one dimension. So the Newton-Raphson method says xi plus 1 is equal to xi minus f of xi over f prime of xi. I'm going to subtract the root, the exact root, from both sides of this equation.
So this is the absolute error in the i plus 1 approximation. It's equal to this. And we're going to do a little trick, OK? The value of the function at the root is exactly equal to zero, and I'm going to expand this as a Taylor series about the point xi. So f of xi, plus f prime of xi times x star minus xi, plus this second order term as well, plus cubic terms in this Taylor expansion, right? All of those need to sum up and be equal to 0, because f of x star, by definition, is zero. x star is the root.

And buried in this expression here is a quantity which can be related to xi minus f of xi over f prime, minus x star. It's right here, right? xi minus x star, xi minus x star. I've got to divide through by f prime. Divide through by f prime, and I get f over f prime. That's this guy here. Those things are equal in magnitude, then, to this second order term here. So they are equal in magnitude to 1/2 the second derivative of f, divided by f prime, times xi minus x star squared. And then these cubic terms, well, they're still around. But they're going to be small as I get close to the actual root. So they're negligible, right? Compared to these second order terms, they can be neglected.

And you should convince yourself that I can apply some of the norm properties that we used before, OK? To the absolute value. The absolute value is the norm of a scalar. So these norm properties tell me that this quantity has to be less than or equal to, right? This ratio of derivatives multiplied by the absolute error in step i squared. And I'll divide by that absolute error in step i squared. So taking the limit as i goes to infinity, this ratio here is bound by a constant. This is a definition for the rate of convergence. It says I take the absolute error in step i plus 1. I divide it by the absolute error in step i squared. And it will always be smaller than some constant as i goes to infinity. So it converges quadratically, right?
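Written out, the algebra just described is (a reconstruction of the slide's derivation):

\[
x_{i+1} - x^* = (x_i - x^*) - \frac{f(x_i)}{f'(x_i)},
\]
\[
0 = f(x^*) = f(x_i) + f'(x_i)\,(x^* - x_i) + \tfrac{1}{2} f''(x_i)\,(x^* - x_i)^2 + O\!\big((x^* - x_i)^3\big).
\]
Dividing the Taylor expansion by \(f'(x_i)\) and substituting into the first equation gives
\[
|x_{i+1} - x^*| \le \left|\frac{f''(x_i)}{2 f'(x_i)}\right| |x_i - x^*|^2,
\qquad
\lim_{i \to \infty} \frac{|x_{i+1} - x^*|}{|x_i - x^*|^2} \le C .
\]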
If the error in step i was order 10 to the minus 1, then the error in step i plus 1 will be order 10 to the minus 2, because they've got to be bound by this constant. If the error in step i was 10 to the minus 2, the error in step i plus 1 has got to be order 10 to the minus 4, or smaller, right? Because I square the quantity down here. I get to double the number of accurate digits with each iteration.

And this will hold so long as the derivative evaluated at the root is not equal to zero. If the derivative evaluated at the root is equal to zero, this analysis wasn't really valid. You can't divide by zero in various places, OK?

It turns out the same thing is true if we do the multidimensional case. I'll leave it to you to investigate that case. I think it's interesting for you to try and explore that. It follows the 1D model I showed you before. But the absolute error in iterate i plus 1, divided by the absolute error in iterate i-- there's a small typo here; cross out that plus 1, right?-- the absolute error in iterate i squared is going to be bound by a constant. And this will be true so long as the determinant of the Jacobian at the root is not equal to zero. We know the determinant of the Jacobian plays the role of the derivative in the 1D case. When the Jacobian is singular, you can show that linear convergence is going to occur instead. So it will still converge. It's not necessarily a problem that the Jacobian becomes singular at the root. But you're going to lose your rate of quadratic convergence.

And this rate of convergence is only guaranteed if we start sufficiently close to the root. So good initial guesses, that's important. We have a locally convergent method. Bad initial guesses? Well, who knows where this iterative method is going to go. There's nothing to guarantee that it's even going to converge, right? It may run away someplace.
Here are a few examples of where things can go wrong. So if I have a local minimum or maximum, I might have an iterate where I evaluate the linearization, and it tells me my next best approximation is on the other side of this minimum or maximum. And then I go up, and I get the linearization here. And it tells me, oh, my next best approximation is on the other side. And this method could bounce back and forth in here for as long as we sit and wait. It's locally convergent, not globally convergent. It can get hung up in situations like this.

Asymptotes are a problem. I have an asymptote, which presumably has an effective root somewhere out here at infinity. Well, my solution would like to follow the linearization, the successive linearizations, all the way out along this asymptote, right? So my iterates may blow up in an uncontrolled fashion.

You can also end up with funny cases where our Newton-Raphson steps continually overshoot the root. So there can be functions that have a power-law scaling right near the root, such that the derivative doesn't exist, OK? So here the derivative of this thing, if s is smaller than 1, at x equals zero, it won't exist, right? There isn't a derivative that's defined there. And in those cases, you can often wind up with overshoot. So I'll take a linearization, and I'll shoot over the root. And I'll go up and I'll take my next linearization, and I'll shoot back on the other side of the root. And depending on the power s associated with this function, it may diverge, right? I may get further and further away from the root, or it may slowly converge towards that root. But it can be problematic.
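A tiny script illustrating the oscillation failure just described; the cubic used here is a standard textbook example rather than one from the lecture.

    % f(x) = x^3 - 2x + 2: starting from x = 0, Newton-Raphson cycles forever
    f  = @(x) x.^3 - 2*x + 2;
    fp = @(x) 3*x.^2 - 2;
    x = 0;
    for i = 1:6
        x = x - f(x)/fp(x);                 % iterates alternate 1, 0, 1, 0, ...
        fprintf('iteration %d: x = %g\n', i, x);
    end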
Here's another problem that crops up. Sometimes people talk about basins of attraction. So here's a two-dimensional nonlinear equation I want to find the roots for. It's cubic in nature, so it's got three roots, which are indicated by the stars in the x1-x2 plane. And I've taken a number of different initial guesses from all over the plane, and I've asked: given that initial guess, using the Newton-Raphson method, which root do I find? So if you see a dark blue color like this, that means initial guesses there found this root. If you see a medium blue color, that means they found this root. See a light blue color, that means they found this root. And this is a relatively simple function, relatively low dimension, but the plane here is tiled by-- it's not tiled, it's filled with a fractal. These basins of attraction are fractal in nature. Which means that I could think that I'm starting with a solution right here that should converge to this green root because it's close. But it actually goes over here. And if I change that initial guess by a little bit, it actually pops up to this root over here instead. It's quite difficult to predict which solution you're going to converge to. Yes?

AUDIENCE: And in this case, you knew how many roots there are.

JAMES SWAN: Yes.

AUDIENCE: Often you wouldn't know. So you find one, and you're happy. Right? You're happy because [INAUDIBLE] physical. Might be the wrong one.

JAMES SWAN: So this is the problem. I think this is about the minimum level of complexity you need-- which is not very complex at all in a function-- to get these sorts of basins of attraction. Polynomial equations are ones that really suffer from this especially, but it's a problem in general. You often don't know. I'll show you quasi-Newton-Raphson methods that help fix some of these problems.

How about other problems? It's good to know where the weaknesses are. Newton-Raphson sounds great, but where are the weaknesses? Let's see. The Jacobian might not be easy to calculate analytically, right? So far we've written down analytical forms for the Jacobian. We've had simple functions.
But maybe it's not easy to calculate analytically. You should think about what the sources are for this function, f of x, that we're trying to find the roots for. Also, we've got to invert the Jacobian, and we know that's a matrix. And matrices which have a lot of dimensions in them are complicated to invert. There's a huge amount of computational complexity in doing those inversions. It can take a long time to do them. It may be undesirable to have to constantly be solving a system of linear equations. So you might think about some options for mitigating this. Sometimes it won't converge at all, or not to the nearest root. This is this overshoot, or basin of attraction, problem. And we'll talk about these modifications to correct these issues.

They come with a penalty, though, OK? So Newton-Raphson was based around the idea of linearization. If we modify that linearization, we're going to lose some of these great benefits of the Newton-Raphson method, namely that it's quadratically convergent, right? We're going to make some changes to the method, and it's not going to converge quadratically anymore. It's going to slow down, but maybe we'll be able to rein in the method and make it converge either to the roots we want it to converge to, or converge more reliably than it would before. Maybe we'll be able to actually do the calculation faster, even though it may require more iterations. Maybe we can make each iteration much faster using some of these methods.

OK, so here are the three things that we're going to talk about. We're going to talk about approximating the Jacobian with finite differences. We're going to talk about Broyden's method for approximating the inverse of the Jacobian. And we're going to talk about something called damped Newton-Raphson methods. Those will be the three topics of the day.

So here's what I said before.
Analytical calculation of the Jacobian requires analytical formulas for f. And for functions of a few dimensions, right? These calculations are not too tough. For functions of many dimensions, this is tedious at best, error prone at worst. Think about even something like 10 equations for 10 unknowns. If your error rate is 1%, well, you're shot. There's a pretty good chance that you missed one element of the Jacobian. You made a mistake somewhere in there. And now you're not doing Newton-Raphson. You're doing some other iterative method that isn't the one that you intended. There are a lot of times where-- maybe you have an analytical formula for some of these f's, but not all of them.

So where can these functionalities come from? We've seen some cases where you have physical models-- thermodynamic models that you can write down by hand. But where are other places that these functions come from? Ideas?

AUDIENCE: [INAUDIBLE]

JAMES SWAN: Oh, good.

AUDIENCE: [INAUDIBLE]

JAMES SWAN: Beautiful. So this is going to be the most common case, right? Maybe you want to use some sort of simulation code, right? To model something. It's somebody else's simulation code. They're an expert at doing finite element modeling. But the output is this f that you're interested in, and the input to the simulation are these x's. And you want to find the roots associated with this problem that you're solving via the simulation code, right? This is pretty important, being able to connect different pieces of software together. Well, there's no analytical formula for f there, OK? You're shot. So it may come from results of simulations. This is extremely common. It could come from interpretation of data. So you may have a bunch of data that's being generated by some physical measurement or a process, either continuously, or you just have a data set that's available to you.
But these function values are often not-- they're not things that you know analytically. It may also be the case that-- oh, man, even in Aspen, you're going to wind up solving systems of nonlinear equations. It's going to use the Newton-Raphson method. Aspen's going to have lots of these formulas in it for functions. Who's going in by hand and computing the derivatives of all these functions for Aspen? MATLAB has a nonlinear equation solver in it. You give it the function, and it'll find the root of the equation, given a guess. It's going to use the Newton-Raphson method. Who's computing the Jacobian for MATLAB? You can. You can compute it by hand and give it as an input. Sometimes that's a really good thing to do. But sometimes we don't have that available to us. So we need alternative ways of computing the Jacobian.

The simplest one is a finite difference approximation. So you recall the definition of the derivative. It's the limit of this difference, f of x plus epsilon minus f of x, divided by epsilon, as epsilon goes to zero. There's an error in this approximation for the derivative with a finite value of epsilon, which is proportional to epsilon. So choose a small value of epsilon, and you'll get a good approximation for the derivative.

It turns out the accuracy depends on epsilon, but kind of in a non-intuitive way. And here's a simple example. So let's compute the derivative of f of x equals e to the x, which is e to the x, and let's evaluate it at x equals 1. So f prime of 1 is e to the 1, which should approximately be e to the 1 plus epsilon, minus e to the 1, over epsilon. And here I've done this calculation. And I've asked, what's the absolute error in this calculation, taking the difference between this and this, for different values of epsilon. You can see initially, as epsilon gets smaller, the absolute error goes down in proportion to epsilon: epsilon of 10 to the minus 3 gives 10 to the minus 3; 10 to the minus 4 gives 10 to the minus 4; 10 to the minus 8 gives 10 to the minus 8.
But 10 to the minus 9 gives 10 to the minus 7, and 10 to the minus 10 gives 10 to the minus 6. So it went down, and it came back up. But that's not what this formula told us should happen, right? Yes?

AUDIENCE: So just to be sure. That term in that column on the right?

JAMES SWAN: Yes?

AUDIENCE: It says exponential 1, but it represents the approximation?

JAMES SWAN: Exponential 1 is exponential 1. f prime of 1 is our approximation here.

AUDIENCE: Oh, OK.

JAMES SWAN: Sorry that that's unclear. Yes, so this is the absolute error in this approximation. So it goes down, and then it goes up. Is that clear now? Good.

OK, why does it go down? It goes down because our definition of the derivative says it should go down. At some point, I've actually got to do these calculations with high enough accuracy to be able to perceive the difference between e to the 1 plus 10 to the minus 9, and e to the 1. So there is a round-off error in the calculation of this difference that reduces my accuracy at a certain level.

There's a heuristic you can use here, OK? You want to set this epsilon, when you do this finite difference approximation, to be the square root of the machine precision times the magnitude of x, the point at which you're trying to calculate this derivative. Usually we're in double precision, so this is something like 10 to the minus 8 times the magnitude of x. That's pretty good. That holds true here, OK? You can test it out on some other functions. If x is 0, or very small, we don't want a relative tolerance. We've got to choose an absolute tolerance instead, just like we talked about with the step norm criteria. So one has to be a little bit careful in how you implement this.
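A short script reproducing the experiment described above, forward-differencing the derivative of e to the x at x equals 1 over a range of epsilon values.

    x = 1;
    for eps_fd = 10.^(-(1:15))
        approx = (exp(x + eps_fd) - exp(x))/eps_fd;
        fprintf('epsilon = %8.1e   absolute error = %8.1e\n', ...
                eps_fd, abs(approx - exp(x)));
    end
    % The error falls roughly in proportion to epsilon, then grows again once
    % round-off dominates; the minimum sits near sqrt(machine precision), ~1e-8.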
615 00:26:34,570 --> 00:26:37,690 A good way to think about how the error is going to go down, 616 00:26:37,690 --> 00:26:40,180 and where it's going to start to come back up. 617 00:26:40,180 --> 00:26:41,350 Make sense? 618 00:26:41,350 --> 00:26:43,880 Good. 619 00:26:43,880 --> 00:26:46,527 OK, so how do you compute elements of the Jacobian then? 620 00:26:46,527 --> 00:26:48,360 Well, those are all just partial derivatives 621 00:26:48,360 --> 00:26:53,100 of the function with respect to one of the unknown variables. 622 00:26:53,100 --> 00:26:56,430 So partial f i with respect to x j is just 623 00:26:56,430 --> 00:27:03,310 f i at x plus some epsilon deviation of x 624 00:27:03,310 --> 00:27:04,830 in its j-th component only. 625 00:27:04,830 --> 00:27:08,710 So this is like a unit vector in the j direction, 626 00:27:08,710 --> 00:27:13,090 or associated with the j-th element of this vector. 627 00:27:13,090 --> 00:27:15,475 Minus f i of x divided by this epsilon. 628 00:27:18,200 --> 00:27:21,590 Equivalently, you'd have to do this for f i. 629 00:27:21,590 --> 00:27:23,780 You can compute all the columns of the Jacobian 630 00:27:23,780 --> 00:27:28,220 very quickly by calling f of x plus epsilon 631 00:27:28,220 --> 00:27:30,260 minus f of x over epsilon. 632 00:27:30,260 --> 00:27:33,710 Just evaluate your vector-valued function at these different 633 00:27:33,710 --> 00:27:34,349 x's. 634 00:27:34,349 --> 00:27:36,140 Take the difference, and that will give you 635 00:27:36,140 --> 00:27:40,940 column j of your Jacobian. 636 00:27:40,940 --> 00:27:43,010 So how many function evaluations does it 637 00:27:43,010 --> 00:27:47,549 take to calculate the Jacobian at a single point? 638 00:27:47,549 --> 00:27:49,590 How many times do I have to evaluate my function? 639 00:27:56,492 --> 00:27:58,960 Yeah? 640 00:27:58,960 --> 00:27:59,634 AUDIENCE: 2 n. 641 00:27:59,634 --> 00:28:00,550 JAMES SWAN: 2n, right. 642 00:28:00,550 --> 00:28:05,530 So if I have n, if I have n elements to x, 643 00:28:05,530 --> 00:28:09,740 I've got to make two function calls per column of j. 644 00:28:09,740 --> 00:28:11,986 There's going to be n columns in j. 645 00:28:11,986 --> 00:28:15,910 So 2n function evaluations to compute the Jacobian 646 00:28:15,910 --> 00:28:17,290 at a single point. 647 00:28:17,290 --> 00:28:18,520 Is that really true though? 648 00:28:18,520 --> 00:28:19,510 Not quite. 649 00:28:19,510 --> 00:28:21,207 f of x is f of x. 650 00:28:21,207 --> 00:28:22,790 I don't have to compute it every time. 651 00:28:22,790 --> 00:28:25,090 I just compute f of x once. 652 00:28:25,090 --> 00:28:28,460 So it's really like n plus 1 that I have to do, right? 653 00:28:28,460 --> 00:28:31,942 N plus a function evaluations to compute this thing. 654 00:28:31,942 --> 00:28:33,710 I actually got to compute them though. 655 00:28:33,710 --> 00:28:37,070 Function evaluations may be really expensive. 656 00:28:37,070 --> 00:28:40,136 Suppose you're doing some sort of complicated simulation, 657 00:28:40,136 --> 00:28:41,510 like a finite element simulation. 658 00:28:41,510 --> 00:28:44,966 Maybe it takes minutes to generate a function evaluation. 659 00:28:44,966 --> 00:28:46,340 So it can be expensive to compute 660 00:28:46,340 --> 00:28:48,110 the Jacobian in this way. 661 00:28:48,110 --> 00:28:51,500 Just be expensive to compute the Jacobian. 662 00:28:51,500 --> 00:28:53,780 How is approximation of Jacobian going 663 00:28:53,780 --> 00:28:55,200 to affect the convergence? 
Yeah?

AUDIENCE: 2n.

JAMES SWAN: 2n, right. So if I have n elements to x, I've got to make two function calls per column of J. There are going to be n columns in J. So 2n function evaluations to compute the Jacobian at a single point. Is that really true, though? Not quite. f of x is f of x. I don't have to compute it every time. I just compute f of x once. So it's really like n plus 1 that I have to do, right? n plus 1 function evaluations to compute this thing.

I've actually got to compute them, though. Function evaluations may be really expensive. Suppose you're doing some sort of complicated simulation, like a finite element simulation. Maybe it takes minutes to generate a function evaluation. So it can be expensive to compute the Jacobian in this way. It can just be expensive to compute the Jacobian.

How is this approximation of the Jacobian going to affect the convergence? What's going to happen to the rate of convergence of our method? It's going to go down, right? It's probably not going to be linear. It's not going to be quadratic. It's going to be some superlinear factor. It's going to depend on how accurate the Jacobian is, how sensitive the function is near the root. But it's going to reduce the accuracy of the method, or the convergence rate of the method, by a little bit. That's OK.

So this is what MATLAB does. It uses a finite difference approximation for your Jacobian when you give it a function and you don't tell it the Jacobian explicitly.

Here's an example of how to implement this yourself. So I've got to have some function that does whatever this function is supposed to do. It takes as input x, and it gives an output f. And then the Jacobian, right? It's a matrix. So we initialize this matrix. We loop over each of the columns. We compute the displacement, right? The deviation from x for each of these. And then we compute this difference and divide it by epsilon. I haven't done everything perfectly here, right? Here's an extra function evaluation. I could just calculate the value of the function at x before doing the loop. I've also only used a relative tolerance here. I'm going to be in trouble if xi is 0. It's going to be a problem with this algorithm. These are the little details one has to pay attention to. But it's a simple enough calculation to do. Loop over the columns, right? Compute these differences. Divide by epsilon. You have your approximation for the Jacobian. I've got to do that at every iteration, right? Every time x is updated, I've got to recompute my Jacobian. That's it, though.

All right, that's one way of approximating a Jacobian.
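A sketch of the finite difference Jacobian routine just described, with the two caveats from the lecture addressed: the function value at x is computed once and reused, and an absolute floor on the perturbation guards against components of x that are zero. The name fd_jacobian and the handle fun are placeholders.

    function J = fd_jacobian(fun, x)
        n  = length(x);
        f0 = fun(x);                            % evaluate f(x) once, outside the loop
        J  = zeros(n, n);
        for j = 1:n
            h = sqrt(eps)*max(abs(x(j)), 1);    % sqrt(machine precision) scaling, with a floor
            xp = x;
            xp(j) = xp(j) + h;                  % perturb only the j-th component
            J(:, j) = (fun(xp) - f0)/h;         % column j of the Jacobian
        end
    end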
There's a method that's used in one dimension called the secant method. It's a special case of the Newton-Raphson method and uses a coarser approximation for the derivative. It says: I was taking these steps from xi minus 1 to xi, and I knew the function values there. Maybe I should just compute the slope of the line that goes through those points, and say that's my approximation for the derivative. Why not? I have the data available to me. It seems like a sensible thing to do. So we replace f prime at xi: up here we put f of xi minus f of xi minus 1, and down here we put xi minus xi minus 1. That's our approximation for the derivative, or the inverse of the derivative. This can work; it can work just fine.

Can it be extended to many dimensions? That's an interesting question, though. This is simple. In many dimensions, not so obvious, right? If I know xi, xi minus 1, f of xi, f of xi minus 1, can I approximate the Jacobian? What do you think? Does it strike you as though there might be some fundamental difficulty to doing that? Yeah?

AUDIENCE: Could you approximate the gradient? [INAUDIBLE] gradient of f at x.

JAMES SWAN: OK.

AUDIENCE: But I'm not sure whether you can go backwards from the gradient to the Jacobian.

JAMES SWAN: OK. So, let's-- go ahead.

AUDIENCE: Perhaps the difficulty is, I mean, when they're just single values--

JAMES SWAN: Yeah.

AUDIENCE: You can think of [INAUDIBLE] derivative, right?

JAMES SWAN: Yeah.

AUDIENCE: [INAUDIBLE] get really big, you get a vector of the function at xi, a vector of the function at xi minus 1 or whatever. Vectors of these x's. And so if you're [INAUDIBLE]

JAMES SWAN: Yeah, so how do I divide these things? That's a good question. The Jacobian-- how much information content is in the Jacobian? Or how many independent quantities are built into the Jacobian?

AUDIENCE: [INAUDIBLE]

JAMES SWAN: n squared.
And how much data do I have to work with here? You know, order n data, to figure out order n squared quantities. This is the division problem you're describing, right? So it seems like this is an underdetermined sort of problem. And it is, OK? So there isn't a direct analog to the secant method in many dimensions. We can write down something that makes sense.

So this is the 1D secant approximation: the value of the derivative, multiplied by the step between i minus 1 and i, is approximated by the difference in the values of the function. The equivalent is: the value of the Jacobian, multiplied by the step between i minus 1 and i, is equal to the difference between the values of the functions. But now this is an equation for n squared elements of the Jacobian in terms of n elements of the function, right? So it's massively, massively underdetermined. OK?

Here we have one equation for one unknown, the derivative, right? Think about how it was moving through space before, right? The difference here, xi minus xi minus 1, that's some sort of linear path that I'm moving along through space. How am I supposed to figure out what the tangent curves to all these functions are from this one linear path through multidimensional space, right? That's not going to work.

So these are underdetermined problems. That's not so bad, actually, right? It doesn't mean there's no solution. In fact, it means there are a lot of solutions. So we can pick whichever one we think is suitable. And Broyden's method is a method for picking one of these potential solutions to this underdetermined problem. We don't have enough information to calculate the Jacobian exactly. But maybe we can construct a suitable approximation for it. And here's what's done. So here's the secant approximation.
810 00:35:06,890 --> 00:35:08,750 It says the Jacobian times the step 811 00:35:08,750 --> 00:35:10,480 size, or the Newton-Raphson step, 812 00:35:10,480 --> 00:35:12,980 should be the difference in the function values. 813 00:35:12,980 --> 00:35:20,660 And Newton's method for xi said the Jacobian times xi 814 00:35:20,660 --> 00:35:22,550 minus xi minus 1 was 815 00:35:22,550 --> 00:35:23,872 equal to minus f of xi minus 1. 816 00:35:23,872 --> 00:35:25,080 This is just Newton's method. 817 00:35:25,080 --> 00:35:27,710 Invert the Jacobian, and put it on the other side 818 00:35:27,710 --> 00:35:28,910 of the equation. 819 00:35:28,910 --> 00:35:31,300 Broyden's method says, there's a trick here. 820 00:35:31,300 --> 00:35:33,970 Take the difference between these things. 821 00:35:33,970 --> 00:35:36,660 I get the same left-hand side on both of these equations. 822 00:35:36,660 --> 00:35:38,720 So take the difference, and I can figure out 823 00:35:38,720 --> 00:35:41,894 how the Jacobian should change from one step to the next. 824 00:35:41,894 --> 00:35:44,060 So maybe I have a good approximation to the Jacobian 825 00:35:44,060 --> 00:35:48,380 at xi minus 1, I might be able to use this still 826 00:35:48,380 --> 00:35:51,230 underdetermined problem to figure out how 827 00:35:51,230 --> 00:35:53,250 to update that Jacobian, right? 828 00:35:53,250 --> 00:35:56,270 So Broyden's method is what's referred to as the rank one 829 00:35:56,270 --> 00:35:56,780 update. 830 00:35:56,780 --> 00:35:59,750 You should convince yourself that letting 831 00:35:59,750 --> 00:36:04,500 the Jacobian at xi minus the Jacobian at xi minus 1 832 00:36:04,500 --> 00:36:07,370 be equal to this is one possible solution 833 00:36:07,370 --> 00:36:10,500 of this underdetermined equation. 834 00:36:10,500 --> 00:36:12,150 There are others. 835 00:36:12,150 --> 00:36:13,860 This is one possible solution. 836 00:36:13,860 --> 00:36:17,820 It turns out to be a good one to choose. 837 00:36:17,820 --> 00:36:19,500 So there's an iterative approximation 838 00:36:19,500 --> 00:36:21,150 now for the Jacobian. 839 00:36:24,615 --> 00:36:26,745 Does this strategy make sense? 840 00:36:26,745 --> 00:36:27,870 It's a little weird, right? 841 00:36:27,870 --> 00:36:29,120 There's something tricky here. 842 00:36:29,120 --> 00:36:30,600 You've got to know to do this. 843 00:36:30,600 --> 00:36:32,310 Right, so somebody has to have in mind 844 00:36:32,310 --> 00:36:34,860 already that they're looking for differences in the Jacobian 845 00:36:34,860 --> 00:36:36,510 that they're going to update over time. 846 00:36:40,322 --> 00:36:42,780 So this tells me how the Jacobian is updated. 847 00:36:46,480 --> 00:36:49,480 Really we need the Jacobian inverse, 848 00:36:49,480 --> 00:36:52,240 and the reason for choosing this rank one update 849 00:36:52,240 --> 00:36:57,100 approximation is it's possible to write 850 00:36:57,100 --> 00:37:01,330 the inverse of J at xi in terms of the inverse of J 851 00:37:01,330 --> 00:37:05,100 at xi minus 1 when this update formula is true. 852 00:37:05,100 --> 00:37:07,600 So it's something called the Sherman-Morrison formula, which 853 00:37:07,600 --> 00:37:11,440 says the inverse of a matrix plus the dyadic product of two 854 00:37:11,440 --> 00:37:14,440 vectors can be written in this form. 855 00:37:14,440 --> 00:37:18,120 We don't need to derive this, but this is true. 856 00:37:18,120 --> 00:37:22,260 This matrix plus dyadic product is exactly this.
857 00:37:22,260 --> 00:37:24,660 We have a dyadic product between f and the step 858 00:37:24,660 --> 00:37:27,869 from xi minus 1 to xi. 859 00:37:27,869 --> 00:37:29,910 And so we can apply that Sherman-Morrison formula 860 00:37:29,910 --> 00:37:31,350 to the rank one update. 861 00:37:31,350 --> 00:37:34,656 And not only can we update the Jacobian iteratively, 862 00:37:34,656 --> 00:37:36,280 but we can update the Jacobian inverse. 863 00:37:36,280 --> 00:37:39,900 So if I know J inverse at some previous time, 864 00:37:39,900 --> 00:37:41,899 I know J inverse at some later time too. 865 00:37:41,899 --> 00:37:43,440 I don't have to compute these things. 866 00:37:43,440 --> 00:37:45,990 I don't have to solve these systems of equations, right? 867 00:37:45,990 --> 00:37:47,220 I just update this matrix. 868 00:37:50,020 --> 00:37:51,880 Update this matrix, and I can very rapidly 869 00:37:51,880 --> 00:37:54,639 do these computations. 870 00:37:54,639 --> 00:37:56,430 So not only do we have an iterative formula 871 00:37:56,430 --> 00:37:57,900 for the steps, right? 872 00:37:57,900 --> 00:38:01,382 From x0 to x1 to x2, all the way up to our 873 00:38:01,382 --> 00:38:02,840 converged solution, but we can have 874 00:38:02,840 --> 00:38:05,460 a formula for the inverse of the Jacobian. 875 00:38:05,460 --> 00:38:07,170 We give up accuracy. 876 00:38:07,170 --> 00:38:10,260 But that's paid for in the amount of time 877 00:38:10,260 --> 00:38:12,930 we save doing these calculations. 878 00:38:12,930 --> 00:38:13,710 Does it pay off? 879 00:38:13,710 --> 00:38:15,570 It depends on the problem, right? 880 00:38:15,570 --> 00:38:17,890 We try to solve problems in different ways. 881 00:38:17,890 --> 00:38:22,631 This is a pretty common way to approximate the Jacobian. 882 00:38:22,631 --> 00:38:24,062 Questions about this? 883 00:38:27,410 --> 00:38:28,130 No. 884 00:38:28,130 --> 00:38:28,952 OK. 885 00:38:28,952 --> 00:38:29,660 Broyden's method. 886 00:38:32,619 --> 00:38:33,910 All right, here's the last one. 887 00:38:36,460 --> 00:38:38,190 The Damped Newton-Raphson method. 888 00:38:38,190 --> 00:38:40,330 We'll do this in one dimension. 889 00:38:40,330 --> 00:38:42,740 So the Newton-Raphson method, Newton and Raphson told us, 890 00:38:42,740 --> 00:38:46,780 take a step from xi to xi plus 1 that is this big. 891 00:38:49,880 --> 00:38:53,570 xi to xi plus 1, it's this big. 892 00:38:53,570 --> 00:38:56,270 Sometimes you'll take that step, and you'll 893 00:38:56,270 --> 00:38:59,360 find that the value of the function at xi plus 1 894 00:38:59,360 --> 00:39:02,180 is even bigger than the value of the function at xi. 895 00:39:02,180 --> 00:39:04,429 There was nothing about the Newton-Raphson method that 896 00:39:04,429 --> 00:39:07,149 told us the function value was always going to be decreasing. 897 00:39:07,149 --> 00:39:09,440 But actually, our goal is to make the function value go 898 00:39:09,440 --> 00:39:11,070 to 0 in absolute value. 899 00:39:11,070 --> 00:39:16,370 So it seems like this step, not a very good one, right? 900 00:39:16,370 --> 00:39:18,120 What are Newton and Raphson thinking here? 901 00:39:18,120 --> 00:39:19,440 This is not a good idea. 902 00:39:19,440 --> 00:39:20,630 The function value went up. 903 00:39:24,830 --> 00:39:25,994 Far from a root, OK? 904 00:39:25,994 --> 00:39:27,410 The Newton-Raphson method is going 905 00:39:27,410 --> 00:39:29,621 to give these sorts of erratic responses.
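Before turning to how damping handles those erratic steps, here is a minimal Python sketch of the Broyden iteration described above, with the rank-one update applied directly to the inverse Jacobian through the Sherman-Morrison formula. The names and structure are hypothetical illustration, not code from the course or from MATLAB's fsolve.

import numpy as np

def broyden(f, x0, J0, tol=1e-10, max_iter=100):
    # B holds the current approximation of the inverse Jacobian.
    x = np.asarray(x0, dtype=float)
    B = np.linalg.inv(np.asarray(J0, dtype=float))   # invert the initial Jacobian once
    fx = f(x)
    for _ in range(max_iter):
        dx = -B @ fx                    # quasi-Newton step, dx = -J^{-1} f
        x_new = x + dx
        f_new = f(x_new)
        if np.linalg.norm(f_new) < tol:
            return x_new
        df = f_new - fx
        # Sherman-Morrison update of B so the secant condition B @ df = dx holds;
        # this is the rank-one choice among the many solutions of the
        # underdetermined secant equation.
        Bdf = B @ df
        denom = dx @ Bdf
        if abs(denom) > 1e-14:          # guard against breakdown of the update
            B = B + np.outer(dx - Bdf, dx @ B) / denom
        x, fx = x_new, f_new
    return x

# Usage sketch: intersect the unit circle with the line x0 = x1, starting
# from the true Jacobian at the initial guess.
# broyden(lambda x: np.array([x[0]**2 + x[1]**2 - 1, x[0] - x[1]]),
#         [1.0, 0.5], np.array([[2.0, 1.0], [1.0, -1.0]]))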
906 00:39:29,621 --> 00:39:31,370 Who knows what direction it's going to go? 907 00:39:31,370 --> 00:39:35,480 And it's only locally convergent. 908 00:39:35,480 --> 00:39:37,191 It tells us a direction to move in, 909 00:39:37,191 --> 00:39:39,440 but it doesn't always give the right sort of magnitude 910 00:39:39,440 --> 00:39:41,310 associated with that step. 911 00:39:41,310 --> 00:39:43,310 And so you take these steps and you can find out 912 00:39:43,310 --> 00:39:45,650 the value of your function, the norm of your function. 913 00:39:45,650 --> 00:39:47,108 It's bigger than where you started. 914 00:39:47,108 --> 00:39:49,670 It seems like you're getting further away from the root. 915 00:39:49,670 --> 00:39:52,880 Our ultimate goal is to drive this norm to 0. 916 00:39:52,880 --> 00:39:55,310 So steps like that you might even call unacceptable. 917 00:39:55,310 --> 00:39:55,520 Right? 918 00:39:55,520 --> 00:39:57,270 Why would I ever take a step in that direction? 919 00:39:57,270 --> 00:39:58,940 Maybe I should use a different method. 920 00:39:58,940 --> 00:40:01,550 When I take a step that's so big my function value 921 00:40:01,550 --> 00:40:03,020 grows in norm. 922 00:40:05,690 --> 00:40:08,920 So what one does, oftentimes, is introduce a damping factor, 923 00:40:08,920 --> 00:40:09,970 right? 924 00:40:09,970 --> 00:40:13,450 We said that this ratio, or equivalently, 925 00:40:13,450 --> 00:40:15,760 the Jacobian inverse times the value of the function, 926 00:40:15,760 --> 00:40:18,320 gives us the right direction to step in. 927 00:40:18,320 --> 00:40:21,760 But how big a step should we take? 928 00:40:21,760 --> 00:40:25,750 It's clear a step like this is a good one. 929 00:40:25,750 --> 00:40:27,320 It reduced the value of the function. 930 00:40:30,647 --> 00:40:32,480 And it's better than the one we took before, 931 00:40:32,480 --> 00:40:35,240 which was given by the linear approximation. 932 00:40:35,240 --> 00:40:38,150 So if I draw the tangent line, it intercepts here. 933 00:40:38,150 --> 00:40:40,290 If I take a step in this direction, 934 00:40:40,290 --> 00:40:42,440 but I reduce the slope by having some 935 00:40:42,440 --> 00:40:43,990 damping factor that's smaller than 1, 936 00:40:43,990 --> 00:40:46,570 I get closer to the root. 937 00:40:46,570 --> 00:40:50,710 Ideally we'd like to choose that damping factor 938 00:40:50,710 --> 00:40:54,760 to be the one that minimizes the value of the function at xi 939 00:40:54,760 --> 00:40:57,280 plus 1. 940 00:40:57,280 --> 00:41:01,510 So it's the argument that minimizes 941 00:41:01,510 --> 00:41:05,530 the value of the function at xi plus 1, or at xi minus alpha f 942 00:41:05,530 --> 00:41:08,060 over f prime. 943 00:41:08,060 --> 00:41:09,500 Solving that optimization problem 944 00:41:09,500 --> 00:41:12,150 is as hard as finding the root itself. 945 00:41:12,150 --> 00:41:13,760 So ideally this is true. 946 00:41:13,760 --> 00:41:16,730 But practically you're not going to be able to do it. 947 00:41:16,730 --> 00:41:20,540 So we have to come up with some approximate methods of solving 948 00:41:20,540 --> 00:41:21,624 this optimization problem. 949 00:41:21,624 --> 00:41:23,748 Actually we don't even care about getting it exact. 950 00:41:23,748 --> 00:41:25,670 We know Newton-Raphson does a pretty good job. 951 00:41:25,670 --> 00:41:28,400 We want some sort of guess that's 952 00:41:28,400 --> 00:41:32,216 respectable for this alpha so that we get close to this root.
953 00:41:32,216 --> 00:41:33,590 Once we get close, we'll probably 954 00:41:33,590 --> 00:41:34,774 choose alpha equal to 1. 955 00:41:34,774 --> 00:41:36,440 We'll just take the Newton-Raphson steps 956 00:41:36,440 --> 00:41:40,100 all the way down to the root. 957 00:41:40,100 --> 00:41:41,800 So here it is in many dimensions. 958 00:41:41,800 --> 00:41:45,960 Modify the Newton-Raphson step by some value alpha, 959 00:41:45,960 --> 00:41:47,890 choose alpha to be the argument that 960 00:41:47,890 --> 00:41:52,270 minimizes the norm of the function at xi plus 1. 961 00:41:55,510 --> 00:41:57,340 Here's one way of doing this. 962 00:41:57,340 --> 00:41:59,330 So this is called the Armijo line search. 963 00:41:59,330 --> 00:41:59,830 See? 964 00:41:59,830 --> 00:42:01,180 Line search. 965 00:42:01,180 --> 00:42:03,400 Start by letting alpha equal 1. 966 00:42:03,400 --> 00:42:05,960 Take the full Newton-Raphson step, and check. 967 00:42:05,960 --> 00:42:09,000 Is the value of my function smaller than where I started? 968 00:42:09,000 --> 00:42:11,470 If it is, let's take the step. 969 00:42:11,470 --> 00:42:13,630 It's getting us-- we're accomplishing our goal. 970 00:42:13,630 --> 00:42:16,180 We're reducing the value of the function in norm. 971 00:42:16,180 --> 00:42:17,410 Maybe we're headed towards zero. 972 00:42:17,410 --> 00:42:18,370 That's good. 973 00:42:18,370 --> 00:42:20,020 Accept it. 974 00:42:20,020 --> 00:42:24,440 If no, let's replace alpha with alpha over 2. 975 00:42:24,440 --> 00:42:25,630 Let's take a shorter step. 976 00:42:28,330 --> 00:42:30,190 We take a shorter step, and we repeat. 977 00:42:30,190 --> 00:42:30,690 Right? 978 00:42:30,690 --> 00:42:32,029 Take the shorter step. 979 00:42:32,029 --> 00:42:34,570 Check whether the value of the function with the shorter step 980 00:42:34,570 --> 00:42:35,740 is acceptable. 981 00:42:35,740 --> 00:42:38,540 If yes, let's take it, and let's move on. 982 00:42:38,540 --> 00:42:42,590 And if no, replace alpha with alpha over 2, and continue. 983 00:42:42,590 --> 00:42:45,580 So we halve our step size every time. 984 00:42:45,580 --> 00:42:47,080 We don't have to halve it. 985 00:42:47,080 --> 00:42:49,930 We could choose different factors to reduce it by. 986 00:42:49,930 --> 00:42:52,300 But we try to take shorter and shorter steps 987 00:42:52,300 --> 00:42:56,710 until we accomplish our goal of having a function which 988 00:42:56,710 --> 00:42:58,900 is smaller in norm at our next iterate 989 00:42:58,900 --> 00:43:01,580 than where we were before. 990 00:43:01,580 --> 00:43:03,910 It's got to work-- the function value will be reduced. 991 00:43:03,910 --> 00:43:06,430 The Newton-Raphson method picks a direction 992 00:43:06,430 --> 00:43:10,170 that wants to bring the function value closer to 0. 993 00:43:10,170 --> 00:43:11,797 We linearize the function, and we 994 00:43:11,797 --> 00:43:14,380 found the direction we needed to go to make that linearization 995 00:43:14,380 --> 00:43:14,880 go to 0. 996 00:43:14,880 --> 00:43:18,310 So there is a step size for which the function 997 00:43:18,310 --> 00:43:21,450 value will be reduced. 998 00:43:21,450 --> 00:43:23,632 And because of that, this Armijo line search 999 00:43:23,632 --> 00:43:25,090 version of the Damped Newton-Raphson method 1000 00:43:25,090 --> 00:43:27,690 is actually globally convergent, right? 1001 00:43:27,690 --> 00:43:30,370 The iterative method will terminate. 1002 00:43:30,370 --> 00:43:31,347 You can guarantee it.
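Here is a minimal Python sketch of that damped Newton-Raphson step with the halving line search just described: accept the step only if the norm of f decreases, otherwise cut alpha in half and try again. The names are hypothetical; a full Armijo search also imposes a sufficient-decrease condition, which is omitted here to stay close to the procedure described above.

import numpy as np

def damped_newton(f, jac, x0, tol=1e-10, max_iter=50, alpha_min=1e-8):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            return x
        # The expensive part: solve J(x) dx = -f(x) once per iteration.
        dx = np.linalg.solve(jac(x), -fx)
        alpha = 1.0                        # start with the full Newton-Raphson step
        while alpha > alpha_min:
            if np.linalg.norm(f(x + alpha * dx)) < np.linalg.norm(fx):
                break                      # accept: the norm of f went down
            alpha *= 0.5                   # reject: halve the step and try again
        x = x + alpha * dx                 # only cheap function evaluations in the loop
    return x

# Usage sketch on a small two-equation system with a hand-coded Jacobian:
# damped_newton(lambda x: np.array([np.exp(x[0]) - 1, x[0] + x[1]]),
#               lambda x: np.array([[np.exp(x[0]), 0.0], [1.0, 1.0]]), [2.0, -1.0])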
1003 00:43:31,347 --> 00:43:32,930 Here's what it looks like graphically. 1004 00:43:32,930 --> 00:43:35,790 I take my big step, my alpha equals 1 step. 1005 00:43:35,790 --> 00:43:37,375 I check the value of the function. 1006 00:43:37,375 --> 00:43:39,500 It's bigger in absolute value than where I started. 1007 00:43:39,500 --> 00:43:40,570 So I go back. 1008 00:43:40,570 --> 00:43:42,830 I take half that step size. 1009 00:43:42,830 --> 00:43:43,570 OK? 1010 00:43:43,570 --> 00:43:44,480 I look at the value of the function. 1011 00:43:44,480 --> 00:43:45,280 It's still bigger. 1012 00:43:45,280 --> 00:43:46,890 Let's reject it, and go back. 1013 00:43:46,890 --> 00:43:48,719 I take half that step size again. 1014 00:43:48,719 --> 00:43:50,260 The value of the function here is now 1015 00:43:50,260 --> 00:43:51,950 smaller in absolute value. 1016 00:43:51,950 --> 00:43:53,370 So I accept it. 1017 00:43:53,370 --> 00:43:57,200 And I put myself pretty close to the root. 1018 00:43:57,200 --> 00:44:00,050 So it's convergent, globally convergent. 1019 00:44:00,050 --> 00:44:01,850 That's nice. 1020 00:44:01,850 --> 00:44:05,712 It's not globally convergent to roots, which is a pain. 1021 00:44:05,712 --> 00:44:06,920 But it's globally convergent. 1022 00:44:06,920 --> 00:44:08,220 It will terminate eventually. 1023 00:44:08,220 --> 00:44:11,780 You'll get to a point where you won't 1024 00:44:11,780 --> 00:44:14,500 be able to advance your steps any further. 1025 00:44:14,500 --> 00:44:18,050 It may converge to minima or maxima of a function. 1026 00:44:18,050 --> 00:44:19,486 Or it may converge to roots. 1027 00:44:19,486 --> 00:44:20,360 But it will converge. 1028 00:44:23,290 --> 00:44:26,732 I showed you this example before with basins of attraction. 1029 00:44:26,732 --> 00:44:28,690 So here we have different basins of attraction. 1030 00:44:28,690 --> 00:44:29,648 They're all colored in. 1031 00:44:29,648 --> 00:44:31,460 They show you which roots you approach. 1032 00:44:31,460 --> 00:44:34,680 Here I've applied the Damped Newton-Raphson method 1033 00:44:34,680 --> 00:44:36,714 to the same system of equations. 1034 00:44:36,714 --> 00:44:38,380 And you can see the basins of attraction 1035 00:44:38,380 --> 00:44:41,380 are shrunk because of the damping. 1036 00:44:41,380 --> 00:44:44,380 What happens when you're very close to places where 1037 00:44:44,380 --> 00:44:46,640 the Jacobian is singular is that 1038 00:44:46,640 --> 00:44:49,690 you take all sorts of wild steps. 1039 00:44:49,690 --> 00:44:51,700 You go to places where the value of the function 1040 00:44:51,700 --> 00:44:54,019 is bigger than where you started. 1041 00:44:54,019 --> 00:44:55,810 And then you've got to step down from there 1042 00:44:55,810 --> 00:44:57,430 to try to find the root. 1043 00:44:57,430 --> 00:44:59,320 Who knows where those locations are? 1044 00:44:59,320 --> 00:45:02,650 It's a very complicated, geometrically complicated space 1045 00:45:02,650 --> 00:45:04,075 that you're moving through. 1046 00:45:04,075 --> 00:45:05,575 And the Damped Newton-Raphson method 1047 00:45:05,575 --> 00:45:09,784 is forcing the steps to always reduce 1048 00:45:09,784 --> 00:45:11,200 the value of the function, so they 1049 00:45:11,200 --> 00:45:15,030 reduce the size of these basins of attraction.
1050 00:45:15,030 --> 00:45:18,437 So this is often a nice way to supplement 1051 00:45:18,437 --> 00:45:20,520 the Newton-Raphson method when your guesses aren't 1052 00:45:20,520 --> 00:45:21,589 very good to begin with. 1053 00:45:21,589 --> 00:45:23,880 When you start to get close to a root, you're always just 1054 00:45:23,880 --> 00:45:25,650 going to accept alpha equals 1. 1055 00:45:25,650 --> 00:45:27,480 The first step will be the best step, 1056 00:45:27,480 --> 00:45:30,825 and then you'll converge very rapidly to the solution. 1057 00:45:30,825 --> 00:45:33,150 Do we have to do any extra work, actually, 1058 00:45:33,150 --> 00:45:35,460 to do this Damped Newton-Raphson method? 1059 00:45:35,460 --> 00:45:37,310 Does it require extra calculations? 1060 00:45:43,010 --> 00:45:44,057 What do you think? 1061 00:45:44,057 --> 00:45:45,890 A lot of extra-- a lot of extra calculation? 1062 00:45:45,890 --> 00:45:47,780 How many extra calculations does it require? 1063 00:45:47,780 --> 00:45:48,990 Of course it requires extra. 1064 00:45:48,990 --> 00:45:50,246 How many? 1065 00:45:50,246 --> 00:45:55,076 AUDIENCE: [INAUDIBLE] 1066 00:45:55,076 --> 00:45:56,712 JAMES SWAN: What do you think? 1067 00:45:56,712 --> 00:46:07,180 AUDIENCE: [INAUDIBLE] 1068 00:46:07,180 --> 00:46:09,060 JAMES SWAN: It's-- that much is true. 1069 00:46:09,060 --> 00:46:13,160 So let's talk about taking one step. 1070 00:46:13,160 --> 00:46:15,860 How many more-- how many more calculations do 1071 00:46:15,860 --> 00:46:18,110 I have to pay to do this sort of a step? 1072 00:46:18,110 --> 00:46:21,490 Or even the multidimensional step? 1073 00:46:24,160 --> 00:46:27,185 For each of these times around this loop, 1074 00:46:27,185 --> 00:46:28,810 do I have to recompute? 1075 00:46:28,810 --> 00:46:32,420 Do I have to solve the system of equations? 1076 00:46:32,420 --> 00:46:32,930 No. 1077 00:46:32,930 --> 00:46:33,430 Right? 1078 00:46:33,430 --> 00:46:34,970 You precompute this, right? 1079 00:46:34,970 --> 00:46:36,560 This is the basic Newton-Raphson step. 1080 00:46:36,560 --> 00:46:37,518 You compute that first. 1081 00:46:37,518 --> 00:46:39,470 You've got to do it once. 1082 00:46:39,470 --> 00:46:41,472 And then it's pretty cheap after that. 1083 00:46:41,472 --> 00:46:43,430 I've got to do some extra function evaluations, 1084 00:46:43,430 --> 00:46:45,888 but I don't actually have to solve the system of equations. 1085 00:46:45,888 --> 00:46:49,040 Remember this is order n cubed 1086 00:46:49,040 --> 00:46:53,690 if we solve it exactly, maybe order n squared or order 1087 00:46:53,690 --> 00:46:55,160 n if we do it iteratively 1088 00:46:55,160 --> 00:46:57,620 and the Jacobian is sparse somehow, 1089 00:46:57,620 --> 00:46:59,510 and we know about its sparsity pattern. 1090 00:46:59,510 --> 00:47:00,320 This is expensive. 1091 00:47:00,320 --> 00:47:04,040 Function evaluations, those are order n to compute. 1092 00:47:04,040 --> 00:47:05,490 Relatively cheap by comparison. 1093 00:47:05,490 --> 00:47:08,760 So you compute your initial step. 1094 00:47:08,760 --> 00:47:09,690 That's expensive. 1095 00:47:09,690 --> 00:47:13,500 But all of this down here is pretty cheap. 1096 00:47:13,500 --> 00:47:14,211 Yeah? 1097 00:47:14,211 --> 00:47:15,894 AUDIENCE: You're also assuming that your function evaluations 1098 00:47:15,894 --> 00:47:16,620 are reasonably cheap. 1099 00:47:16,620 --> 00:47:17,661 JAMES SWAN: This is true.
1100 00:47:17,661 --> 00:47:19,474 AUDIENCE: [INAUDIBLE] 1101 00:47:19,474 --> 00:47:20,390 JAMES SWAN: It's true. 1102 00:47:20,390 --> 00:47:23,250 Well, the Jacobian is also very expensive to compute then too. 1103 00:47:23,250 --> 00:47:24,830 So, if-- 1104 00:47:24,830 --> 00:47:28,352 AUDIENCE: [INAUDIBLE] 1105 00:47:28,352 --> 00:47:29,560 JAMES SWAN: Sure, sure, sure. 1106 00:47:29,560 --> 00:47:31,210 No, I don't disagree. 1107 00:47:31,210 --> 00:47:33,550 I think one has to pick the method you're going 1108 00:47:33,550 --> 00:47:35,380 to use to suit the problem. 1109 00:47:35,380 --> 00:47:37,900 But it turns out this doesn't involve much extra calculation. 1110 00:47:37,900 --> 00:47:40,060 So by default, for example, fsolve in MATLAB 1111 00:47:40,060 --> 00:47:41,560 is going to do this for you. 1112 00:47:41,560 --> 00:47:43,450 Or some version of this. 1113 00:47:43,450 --> 00:47:46,150 It's going to try to take steps that aren't too big. 1114 00:47:46,150 --> 00:47:47,740 It will limit the step size for you, 1115 00:47:47,740 --> 00:47:50,830 so that it keeps the value of the function reducing 1116 00:47:50,830 --> 00:47:52,320 in magnitude. 1117 00:47:52,320 --> 00:47:55,160 It's a pretty good general strategy. 1118 00:47:55,160 --> 00:47:55,998 Yes? 1119 00:47:55,998 --> 00:48:03,468 AUDIENCE: [INAUDIBLE] so why do we just pick one value for 1120 00:48:03,468 --> 00:48:08,500 [INAUDIBLE] 1121 00:48:08,500 --> 00:48:09,250 JAMES SWAN: I see. 1122 00:48:09,250 --> 00:48:11,160 So why-- ask that one more time. 1123 00:48:11,160 --> 00:48:12,244 This is a good question. 1124 00:48:12,244 --> 00:48:14,410 Can you say it a little louder so everyone can hear? 1125 00:48:14,410 --> 00:48:17,791 AUDIENCE: So why do we have just one value of alpha 1126 00:48:17,791 --> 00:48:21,700 instead of having several values of alpha [INAUDIBLE] 1127 00:48:21,700 --> 00:48:22,450 JAMES SWAN: I see. 1128 00:48:22,450 --> 00:48:26,740 So the question is, yeah, we used a scalar alpha here, 1129 00:48:26,740 --> 00:48:27,550 right? 1130 00:48:27,550 --> 00:48:30,130 If we wanted to, we could reduce the step size 1131 00:48:30,130 --> 00:48:32,260 and also change direction. 1132 00:48:32,260 --> 00:48:34,960 We would use a matrix to do that, instead, right? 1133 00:48:34,960 --> 00:48:37,180 It would transform the step and change its direction. 1134 00:48:37,180 --> 00:48:39,184 And maybe we would choose different alphas 1135 00:48:39,184 --> 00:48:40,850 along different directions, for example. 1136 00:48:40,850 --> 00:48:43,300 So a diagonal matrix with different alphas. 1137 00:48:43,300 --> 00:48:46,150 We could potentially do that. 1138 00:48:46,150 --> 00:48:50,650 We're probably going to need some extra information 1139 00:48:50,650 --> 00:48:55,400 to decide how to set the scaling in different directions. 1140 00:48:55,400 --> 00:48:59,770 One thing we know for sure is that the Newton-Raphson step 1141 00:48:59,770 --> 00:49:01,550 will reduce the value of the function. 1142 00:49:01,550 --> 00:49:03,490 If we take a small enough step size, 1143 00:49:03,490 --> 00:49:05,890 it will bring the value of the function down. 1144 00:49:05,890 --> 00:49:07,870 We know that because we did the Taylor 1145 00:49:07,870 --> 00:49:11,710 expansion of the function to determine that step size.
1146 00:49:11,710 --> 00:49:13,950 And that Taylor expansion was going to be-- 1147 00:49:13,950 --> 00:49:17,170 that Taylor expansion is nearly exact in the limit 1148 00:49:17,170 --> 00:49:18,820 of very, very small step sizes. 1149 00:49:18,820 --> 00:49:21,220 So there will always be some small step 1150 00:49:21,220 --> 00:49:24,570 in this direction, which will reduce 1151 00:49:24,570 --> 00:49:25,720 the value of the function. 1152 00:49:25,720 --> 00:49:27,840 In other directions, we may reduce the value 1153 00:49:27,840 --> 00:49:30,090 of the function faster. 1154 00:49:30,090 --> 00:49:32,560 We don't know which directions to choose, OK? 1155 00:49:32,560 --> 00:49:33,810 Actually I shouldn't say that. 1156 00:49:33,810 --> 00:49:35,610 When we take very small step sizes in this direction, 1157 00:49:35,610 --> 00:49:37,609 it's reducing the value of the function fastest. 1158 00:49:37,609 --> 00:49:39,480 There isn't a faster direction to go in. 1159 00:49:39,480 --> 00:49:44,459 When we take impossibly small, vanishingly small step sizes. 1160 00:49:44,459 --> 00:49:46,500 But in principle, if I had some extra information 1161 00:49:46,500 --> 00:49:48,480 on the problem, I might be able to choose step sizes 1162 00:49:48,480 --> 00:49:49,605 along different directions. 1163 00:49:49,605 --> 00:49:51,600 I may know that one of these directions 1164 00:49:51,600 --> 00:49:55,080 is more ill-behaved than the other ones. 1165 00:49:55,080 --> 00:49:57,416 And choose a different damping factor for it. 1166 00:49:57,416 --> 00:49:58,290 That's a possibility. 1167 00:49:58,290 --> 00:50:00,030 But we actually have to know something 1168 00:50:00,030 --> 00:50:01,320 about the details of the problem we're trying 1169 00:50:01,320 --> 00:50:02,611 to solve if we're going to do-- 1170 00:50:02,611 --> 00:50:03,820 it's a wonderful question. 1171 00:50:03,820 --> 00:50:05,880 I mean, you could think about ways 1172 00:50:05,880 --> 00:50:08,016 of making this more, potentially more robust. 1173 00:50:08,016 --> 00:50:10,140 I'll show you an alternative way of doing this when 1174 00:50:10,140 --> 00:50:12,660 we talk about optimization. 1175 00:50:12,660 --> 00:50:14,280 In optimization we'll do-- we'll solve 1176 00:50:14,280 --> 00:50:16,890 systems of nonlinear equations to solve these optimization 1177 00:50:16,890 --> 00:50:17,550 problems. 1178 00:50:17,550 --> 00:50:20,454 There's another way of doing the same sort of strategy that's 1179 00:50:20,454 --> 00:50:21,870 more along what you're describing. 1180 00:50:21,870 --> 00:50:23,328 Maybe there's a different direction 1181 00:50:23,328 --> 00:50:25,130 to choose instead that could be preferable. 1182 00:50:25,130 --> 00:50:27,640 This is something called the dogleg method. 1183 00:50:27,640 --> 00:50:30,230 Great question. 1184 00:50:30,230 --> 00:50:32,260 Anything else? 1185 00:50:32,260 --> 00:50:32,760 No. 1186 00:50:36,120 --> 00:50:39,720 So globally convergent, right? 1187 00:50:39,720 --> 00:50:42,821 Converges to roots, local minima or maxima. 1188 00:50:42,821 --> 00:50:44,820 There are other modifications that are possible. 1189 00:50:44,820 --> 00:50:46,645 We'll talk about them in optimization. 1190 00:50:46,645 --> 00:50:48,270 There's always a penalty to doing this. 1191 00:50:48,270 --> 00:50:50,550 The penalty is in the rate of convergence. 1192 00:50:50,550 --> 00:50:51,910 So it will converge more slowly. 
1193 00:50:51,910 --> 00:50:53,790 But maybe you speed the calculations along anyways, 1194 00:50:53,790 --> 00:50:54,289 right? 1195 00:50:54,289 --> 00:50:56,370 Maybe it requires fewer iterations overall 1196 00:50:56,370 --> 00:50:59,135 to get there because you tame the local 1197 00:50:59,135 --> 00:51:01,260 convergence properties of the Newton-Raphson method. 1198 00:51:01,260 --> 00:51:05,160 Or you shortcut some of the expensive calculations, 1199 00:51:05,160 --> 00:51:07,740 like getting your Jacobian or calculating your Jacobian 1200 00:51:07,740 --> 00:51:09,010 inverse. 1201 00:51:09,010 --> 00:51:10,280 All right? 1202 00:51:10,280 --> 00:51:13,050 So Monday we're going to review, sort of, the topics up until now. 1203 00:51:13,050 --> 00:51:15,234 Professor Green will run the lecture on Monday. 1204 00:51:15,234 --> 00:51:16,650 And then after that, we'll pick up 1205 00:51:16,650 --> 00:51:18,900 with optimization, which will follow right on from what 1206 00:51:18,900 --> 00:51:19,680 we've done so far. 1207 00:51:19,680 --> 00:51:21,230 Thanks.