The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

KAREN WILLCOX: OK everybody, let's get started. So if you came in late, project three is here. It's graded in here. You're going to have project two back tomorrow. Yeah.

The project threes were really good. The one thing that seemed to trip up-- well, there were a few things that tripped up some people, but the one thing that seemed to trip up a few people was just the computation of the number of samples needed in the Monte Carlo simulation to achieve the required stated accuracy of plus or minus 0.01 on the probability estimate. So some people got it really well. Some people got it, but their explanations were a little bit muddled in some of their words. And then some people didn't get it quite right. And I think it's a really important thing, and it's really important to make sure it's very clear in your mind.

And one thing that I would recommend is that whenever you're talking about mean and variance, or mean and standard deviation, you make sure it's very clear in your mind which random variable you're talking about, because just saying mean or just saying variance doesn't necessarily tell me or you anything when there are lots of variances floating around. In this particular example, there's the variance of the output of the simulation that we're trying to estimate, which is how far the ball flies. So how far the ball flies is a random variable that's got its own variance. But then the probability estimates themselves have got variances, because the probability estimates are random variables, because you're estimating them using Monte Carlo, which is a random process.
So when you're asked to get the probability estimate to within plus or minus 0.01 with 99% confidence, then that says we want to look at the variance of the probability estimate. How much does the probability estimate itself vary? And the variance of that estimator-- so what's that estimator? Let's call it p hat, the estimator of the probability. Well, we know, by the central limit theorem, that it's distributed as a normal distribution with mean equal to the probability that we're actually trying to estimate, p of A, and with variance p of A times 1 minus p of A, over n. Yup.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: We'll also put the lights on. That will help. Do you mind going into my office and grabbing some of the big chalk that's on-- great, thanks. I hate the little chalk. We'll get some bigger chalk. Can you see a bit better now?

So this is the variance, but it's the variance of what? It's the variance of this estimator, p hat of A, and this is the mean of p hat of A. So again, any time you write variance: variance of what? Variance of what random variable? And I sort of got the feeling some people knew it had to be this thing, but were defining this thing as the variance and then just randomly or magically dividing by n. So make sure that this is really clear in your mind, that the estimator itself is a random variable. And then in order to figure out how many samples to take, we've got to control the variance of the estimator. And because we know this thing is normally distributed, and we want 99% confidence-- that's plus or minus 3 sigma, or plus or minus 2.58 sigma, as some of you computed-- we need to make sure that this sigma, the square root of this, times 3 or times 2.58, falls within plus or minus-- is less than 0.01.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Less than here?

AUDIENCE: Yeah.
KAREN WILLCOX: So that's through the central limit theorem, because we know that we can write the estimate of the probability-- remember, we could write it as 1 over n times the sum from i equals 1 to n of the indicator I of a i.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah, so this is another important thing to keep in mind: the output itself doesn't have to be normally distributed, but the estimator for the probability does have a normal distribution. And the reason it does is because we can write it as-- this is like a sample mean of an indicator function, something that's either 0 or 1. And so by the central limit theorem, this thing, for big enough n, has a normal distribution. The same is true for the estimator for the mean, because that's 1 over n times the sum from i equal 1 to n of-- we called this thing y, and I think we called the estimator y bar in the notes. The distribution of y bar, the estimator for the mean, also has a normal distribution. So remember, we saw estimators for the mean, for the probability, and for the variance. These two have normal distributions regardless of the distribution of the output, but the variance estimator doesn't. Is that clear?

And again, that's why I think it's important to say variance of what, or distribution of what. This is the distribution of the random variable which is the estimator of the probability. It's got nothing to do with the variance of the range of the ball in this case, or even the shape of that distribution. But what it does depend on is the probability we're trying to estimate. And so we use this result to figure out how many samples to take, but we don't have an estimate for this guy. So a trick that helps is to realize that we know the worst case for this. So what value of p will give us the biggest variance? p equal 1/2? If p is 1/2, then 1/2 times 1/2 is 0.25. That's the biggest value that p times 1 minus p can take. There's 0 at either end and 0.25 in the middle.
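Just to make that arithmetic concrete, here is a minimal MATLAB sketch of the worst-case sample-size calculation; the variable names are only for illustration:

% Worst-case sample size for a Monte Carlo probability estimate.
% We require z*sigma <= tol, where sigma = sqrt(p*(1-p)/n) is the
% standard deviation of the estimator p hat.
tol = 0.01;        % required accuracy on the probability estimate
p   = 0.5;         % worst case: p*(1-p) is largest at p = 1/2
z   = 3;           % ~99% confidence (2.58 is the exact value; 3 is conservative)
n   = ceil(z^2 * p*(1-p) / tol^2)   % 22,500 with z = 3 (16,641 with z = 2.58)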
So one strategy is to compute this using p equal 1/2, which is the worst case, and I would just do plus or minus 3 sigma. It's a little bit conservative, and that gives you something like 22,500 as the n that you need. And then you know you'll be OK for anything. And it turns out for one of them, the probability came out to be like 0.52 or something. And that's essentially the worst case, so you'd only be a little bit conservative here, but that's fine. You'd still be meeting your plus or minus 0.01.

So just make sure that that's really clear in your mind as you go through it-- that you're clear about how all these different pieces fit together, about the distributions and what is normally distributed and what isn't-- and then you can work from this variance to the plus or minus 3 sigma that gives 99% confidence, and then translate that into an n. OK. So if you didn't get it quite right, take a look at it. And then if it's not clear, I'm going to do sort of a summary of everything on Wednesday, and we can go through it again in more detail then if you want to. And I'll also post some solutions tonight or tomorrow. But any other questions?

OK. So today we're going to finish talking about unconstrained optimization. We'll just pick up where we left off last Wednesday. And we'll talk a lot about how we compute gradients. We might talk a bit about the 1D search if we have time. And then we'll finish up by talking about response surface modeling.

If you remember where we left off last Wednesday, first up, this is the basic optimization problem we've been talking about. Remember, minimize some objective J that's a function of design variables x, where we have x1, x2, up to xn design variables, and parameters p, subject to inequality constraints that we write as gj less than or equal to 0, and equality constraints that we write as hk equal to 0.
And in general, we might have m1 inequality constraints and m2 equality constraints, and then we might also have bounds on the design space. So we saw this last Wednesday. And we also talked about the general process of optimization. And remember, I said think of it as a landscape, where the directions in the landscape are the design variables, and the height of the landscape represents the objective function. If you're maximizing, your goal is to get to the top of the highest mountain or highest hill in the landscape. If you're minimizing, your goal is to get to the bottom of the lowest valley. And the way that that works is that we start with some initial guess that we denote x with a superscript 0. That's the initial guess. We look around. For a gradient-based method, we're also going to compute the gradient of the objective at the point where we're standing in the landscape. Look around, find the search direction. Once we've found that search direction, look along that search direction. Perform what's called the 1D search and take a step alpha. Compute a new design, figure out whether we've converged, and keep going. So we're walking around the landscape, iterating with our design variables.

And what we talked about on Wednesday was the steepest descent method, which says always choose the search direction to be the negative of the gradient of J. So that's the direction of steepest descent. And we also saw conjugate gradient, where we modify the search direction according to where we have come from. And remember that we saw that steepest descent-- in this plot where the objective function is represented with the contours, the steepest descent direction is always perpendicular to the contours. And we saw that as we approached the optimum, steepest descent starts to take this zig-zagging path with really, really small steps. Whereas conjugate gradient, which modifies each search direction depending on the previous search direction, is able to converge much more quickly. And in this case, it's just [? a quick consideration. ?]
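As a rough illustration of that loop, here is a minimal steepest-descent sketch in MATLAB; the quadratic objective, fixed step size, and tolerance are all made up for illustration, and a real implementation would do a proper 1D search for alpha:

% Basic gradient-based loop: x^(q+1) = x^q + alpha*S^q, with S^q = -grad J(x^q).
J     = @(x) (x(1)-1)^2 + 10*(x(2)+2)^2;     % illustrative objective (a simple bowl)
gradJ = @(x) [2*(x(1)-1); 20*(x(2)+2)];      % its analytic gradient
x     = [5; 5];                              % initial guess x^0
alpha = 0.04;                                % fixed step (stands in for the 1D search)
for q = 1:200
    S = -gradJ(x);                           % steepest-descent search direction
    x = x + alpha*S;                         % take the step
    if norm(gradJ(x)) < 1e-6, break; end     % converged when the gradient is ~0
end
x                                            % approaches the minimizer [1; -2]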
OK, so that's where we stopped on Wednesday.

So I want to mention just really quickly a couple of other methods. So if I go back-- we talked about first-order methods that use just gradient information, and those were steepest descent and conjugate gradient. There are also second-order methods that use not just the first derivative, but also second derivative information. And remember, we also derived on Wednesday that the second derivative of J is an n by n matrix. That was the Hessian matrix, the matrix of second derivatives that has the pure second derivatives on the diagonal and then all the cross terms on the off-diagonals.

So how does Newton's method work? Well, if you think about a Taylor series-- and again, we derived this on Wednesday-- the Taylor series in the case of a scalar objective that depends on a vector of design variables, so x is an n-dimensional vector: if we expand J of x about the point x0, then the first term is the gradient times delta x, where delta x is x minus x0. And then remember, the second term was this quadratic term that looked like delta x transposed times the Hessian times delta x. Remember, we did that on Wednesday. And this is an approximation, because we're truncating all the third-order terms and higher. So we can use this Taylor series expansion, and we can differentiate both sides. So grad J of x can be approximated by grad J of x0 plus the Hessian evaluated at x0 times delta x. Yeah. And what are we looking for? We're looking for a place where the gradient of J is 0. Remember, that was the condition for finding a minimum or a maximum, or it could also be a saddle point. So if we set this left-hand side equal to 0, then what's left is this condition here.
And so now we can rearrange and say that delta x, the step that we take, is going to be the inverse of the Hessian at the point we're standing at, times the gradient of the objective at that point, with a minus sign.

OK, so remember [INAUDIBLE] in the steepest descent method, we said that the search direction was just equal to minus grad J. And we're writing that x at q plus 1 is equal to x at q plus alpha q-- let me write that as a subscript-- times S at q. OK. So that's what we've been saying: the new guess of the design is the old one plus [INAUDIBLE] term. In the steepest descent method, that search direction is just minus the gradient of J. So now you can see that with Newton's method, we have that the delta x, which is this guy here, the alpha q times this direction S q, is the negative of the inverse of the Hessian times the gradient of J.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah, so we'll have to check that when we converge, when we get to-- we'll check that on convergence. Yeah. Typically you don't check it as you go along, but it actually depends a little bit on the algorithm. In the next ones that you'll see, you'll see that the desire for the Hessian to be positive definite is worked in.

OK, so how does Newton's method work? This says S, but it really should be alpha times S, in the way I defined the update in delta x. So maybe you can see that if you're trying to optimize a quadratic function, this expansion is exact. There's no approximation here. And in that case, Newton's method would converge in one step. So that means that if I were trying to optimize some kind of a quadratic function, where the landscape would be like a bowl, no matter where I started, I would get to the optimum in one step with Newton's method. But if the function's not actually quadratic, then what we're basically doing is computing an approximation of the function here as a quadratic.
So however it would look-- it would look maybe something like this-- we'd be approximating here. We would solve Newton's method. We would jump to there. We would create a new approximation locally that might look like this, and then we would come in, and eventually we would converge. So if J is not quadratic, you just keep performing Taylor series expansions, getting a new point, and then repeating until it's converged. So this turns out to be a really efficient technique if you start near the solution, if you're starting sort of close to one of the local bowls of attraction, if you're minimizing. But the problem is that you can see that now we need not just gradients, but we also need the second derivatives. I'm going to talk in a second about how to estimate the gradients, but in many cases actually getting this Hessian ends up being much too expensive.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: How do we define convergence?

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: Yeah, so for an unconstrained problem, remember, we're looking for two conditions. We're looking for the gradient of J equal to 0. And we're looking for the Hessian to be positive definite-- or positive semi-definite, depending on the uniqueness. That's in theory. It turns out that in implementation, it can be a little bit tricky to monitor convergence. So often you would look for the sequence of the design variables to stop changing-- in other words, when the updates become really small. And that's when you would terminate, and then you would try to check the optimality condition.
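Here is a minimal MATLAB sketch of that Newton iteration, just to connect the pieces; the objective and starting point are invented, and a real implementation would also safeguard against a Hessian that isn't positive definite:

% Newton's method: delta x = -inv(H)*grad J, i.e., x^(q+1) = x^q - H(x^q) \ gradJ(x^q).
J     = @(x) (x(1)-1)^2 + 10*(x(2)+2)^2;     % quadratic example, so one step suffices
gradJ = @(x) [2*(x(1)-1); 20*(x(2)+2)];
H     = @(x) [2 0; 0 20];                    % Hessian (constant for a quadratic)
x = [5; 5];                                  % initial guess
for q = 1:20
    dx = -H(x) \ gradJ(x);                   % Newton step (backslash, not an explicit inverse)
    x  = x + dx;
    if norm(gradJ(x)) < 1e-8, break; end     % terminate when the gradient is ~0
end
x, q                                         % for this quadratic J, it converges at q = 1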
So let's talk about how to estimate the gradient. And this actually will connect back to some stuff that you saw earlier in the semester. So we're onto number two, which is computing gradients. So generally, we're definitely going to need grad J, which, remember, was the partial derivatives of J with respect to all of the design variables. And we might need the second derivatives. If we wanted to do the Newton method, then we might also need the second derivatives. And I won't write it out, but again, remember that was the n by n matrix of second-order partials.

So, methods to compute gradients. Any ideas? Finite differences, yup. So that's finite differences, which we'll talk about. That's probably the method most commonly used in practice. Any other ideas? If you could, what would be the most reliable way to get gradients? Differentiate, yeah. So analytical gradients, if you can. If you have a code, or if you have a model that's analytic-- that's all functions you can differentiate-- then differentiate. Analytic gradients are definitely the way to go.

Have you guys heard of the adjoint method that Professor Wong talked about? [INAUDIBLE] So this is a really powerful way to get gradients when you have, particularly, CFD models or [? panelement ?] models that have got lots and lots and lots of design variables. So if you're trying to do shape optimization of an aircraft wing and your design variables were all the surface points, or all the properties of the sections in a wing, so that you have hundreds and hundreds of design variables, people use what's called the adjoint method to come up with gradients.

Has anybody heard of automatic differentiation? So this is something that's becoming increasingly popular, where people write these automatic differentiation codes. Basically, you take your code-- so you take your code that takes in x and puts out J, and you feed it through this automatic differentiator, and it gives you back a code that takes in x and gives you grad J. It's sort of somewhere at the intersection of computer science and computational science, I guess. They write these codes that will go through your code and differentiate it line by line by line, and use the chain rule, and basically automatically differentiate it. And so there's an example of this in [? EETS ?]. There's a big model called the MIT GCM, which is the MIT general circulation model.
I don't know if anybody's heard of it. Nicholas, you work over in [? EETS ?], right? Have you--

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: MIT's been building for a long time this really massive community model that's doing a lot of modeling of the ocean and atmosphere and their interactions for climate change. And so they've used automatic differentiation so that you can take this massive model and come up with gradients. And it's symbolic differentiation too, right? Isn't it Mathematica? There's a lot of different ways you can come up with gradients.

[? And there's ?] [? another one ?] [INAUDIBLE] something called the complex step method, which I'm afraid we don't have time to talk about. It's a pretty neat trick. I can't remember the formula. Do you know the formula, Alex?

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah. I don't want to write it down and get it wrong, but you can take your model, your J of x, and put in a complex x-- maybe you can find the formula. Anyway, it's a really neat trick where you can pretend that your variables are complex, and then it turns out that the gradient ends up being the imaginary part. So that's also neat. And if your code's written in Fortran, this can be a neat trick. Do you guys even know Fortran? You know, Professor [? Jolley ?] still writes all of his codes in Fortran.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: You're painfully aware? So in Fortran, you can define complex variables.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: The imaginary part of J of x plus i times h is equal to the gradient?

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Divided by h is equal to the gradient. Yeah, so [INAUDIBLE]. It turns out if you take a complex step-- if you take your variable x and add a small imaginary step i h, evaluate J there, and then divide by that h-- the imaginary part of that result ends up giving you the gradient. OK, but anyway.
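For the curious, here is a tiny MATLAB sketch of that complex-step idea; the test function is just for illustration. The estimate is imag(J(x + i*h))/h, and because there is no subtraction of nearly equal numbers, h can be made extremely small:

% Complex-step derivative estimate: dJ/dx ~ imag(J(x + i*h)) / h.
J  = @(x) exp(x) ./ sqrt(x);                 % illustrative function
x  = 1.5;
h  = 1e-20;                                  % far smaller than any finite-difference step
dJ = imag(J(x + 1i*h)) / h                   % complex-step estimate of the derivative
dJ_exact = exp(x)/sqrt(x) - 0.5*exp(x)*x^(-1.5)   % exact derivative, for comparison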
Finite differences are the most commonly used because they're sort of the general-purpose option. So let's just look at how finite differences work. And these formulas should look somewhat familiar, although you saw them in a different setting. Remember, you saw finite differences when we were approximating partial derivatives in the PDEs, right? Remember, way back? You were looking at the convection equation or the convection-diffusion equation, where you had things like du dx, or [INAUDIBLE] squared [INAUDIBLE] squared, and you used finite differences? The exact same idea can be applied now to find the gradient of J with respect to the design variables x.

So for example, if we were looking for dJ dxi at some point-- let's just call it x0-- then we would just take J, and we'd have x1, x2, all the way along, and we would have xi plus delta xi. So we're going to perturb the ith design variable. And then we would have xi plus 1, all the way up to xn. They would all be fixed. And we're going to difference that with the baseline point, x1 up to xn. And all of that is going to be divided by delta xi. And so this thing here is the numerator. So this is the baseline point-- the x0 would be right here. So there's the baseline point. And then if we want to estimate the gradient in the direction xi, we just take xi and we perturb it by delta xi, and take the difference between the function at the perturbed point minus the baseline, divided by delta xi. Yeah. So again, it looks pretty similar to what you saw when we talked about finite differences earlier. So this would be a one-sided difference.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Different ways of computing the finite difference? Yeah, exactly. So that's a one-sided.
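Here is a minimal MATLAB sketch of that one-sided finite-difference gradient; the objective and step size are placeholders. It also makes the cost easy to see: one baseline evaluation plus one perturbed evaluation per design variable, n plus 1 in total:

% One-sided finite difference: dJ/dxi ~ (J(x0 + dxi*ei) - J(x0)) / dxi.
J  = @(x) (x(1)-1)^2 + x(1)*x(2) + 3*x(2)^2;  % placeholder objective
x0 = [2; -1];                                 % baseline point
n  = numel(x0);
dx = 1e-6;                                    % perturbation size (would need tuning in practice)
J0 = J(x0);                                   % 1 baseline evaluation
g  = zeros(n,1);
for i = 1:n
    xp    = x0;
    xp(i) = xp(i) + dx;                       % perturb only the ith design variable
    g(i)  = (J(xp) - J0) / dx;                % 1 extra evaluation per variable: n+1 total
end
g                                             % finite-difference estimate of grad J at x0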
You could also do a two-sided difference for the same partial. So it would be dJ dxi, evaluated at the point x0, could be J of x1, x2, up to xi plus delta xi, minus J of x1, x2, up to xi minus delta xi, all divided by 2 delta xi. [INAUDIBLE] central difference. So you can evaluate the derivative at the same point using the point on one side, or you could use one little step forward and one little step back, and do it like a central difference. And if you wanted to do higher order, you could take more points.

But now here's the catch. So if we want to estimate-- compute-- grad J at whatever the current point is in the design space, how many function calls do we need if we use a one-sided difference? How many times do we need to run our model? 2n, if we were to use this one. How about this one? Well, almost-- 2n? n plus 1? So 1 for the baseline, and then it's going to be a little step in x1, a little step in x2, a little step in x3, all the way through xn. So with a one-sided difference, we would need n plus 1. And for a two-sided, we would need 2n-- we don't have the baseline. Function evaluations.

Now remember, each function evaluation is a run of your simulation code, which could take minutes or hours or even days. So if you have 100 design variables, each time you want a gradient, you're going to have to do on the order of 100 calls to your function. And remember, you're going to need gradients at each point in the design space when you're trying to figure out where to move. So it can get expensive pretty quickly.

How about if you wanted to compute the Hessian by finite differences? Well, we didn't write down the formula, but you remember the formula for the second derivative, right? If we were computing that guy, then it would be J-- and we would do xi-- probably don't want to write out all the arguments.
J at xi plus delta xi, minus 2 J at the nominal point, plus J at xi minus delta xi-- do you remember that formula? Divided by delta xi squared. The central finite difference for the second-order derivative. Do you guys remember that? Am I confusing you, or is it OK to not write out all the x's? Is that all right? Yeah.

So to compute a second derivative, first of all, you're going to need the point in front and the point behind. But how many entries in the matrix?

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah, it's n by n. It turns out it's symmetric, so you don't need-- remember, it's symmetric, so you only need to compute the diagonal and the upper triangle. That still turns out to be n times n plus 1, over 2, entries, and then you need the extra function evaluations, so you would need two for each of these. So it basically ends up being [INAUDIBLE] n squared. And so if you have 100 design variables, that's 100 squared-- 10,000 calls to your function to get a Hessian matrix, on that order, every iteration. Which is why I say the Newton method is powerful, but if you're doing gradients by finite differences, it almost means you can't afford to use the Newton method.

But the good news is that MATLAB takes care of all of this for you. We can have a look at how MATLAB does that for you. Is this clear, how you estimate the gradients? Yup?

So let me pull this up. [INAUDIBLE] So I showed you already on Wednesday this little [? piece ?]. You might remember when we looked at the landscape and we saw the little asterisks climbing around. So I showed you [INAUDIBLE], but here's the call. So remember, I talked about fminunc, which is the unconstrained gradient-based optimization method in MATLAB. And I also talked about fminsearch, which I said was the Nelder-Mead simplex. Remember, that was the triangles that were flipping around and moving through the design space. So the thing with the Nelder-Mead
simplex is that you don't need gradients. It doesn't need gradients. It just does these triangles and all these rules to contract them. It's only ever looking at the three points and their ordering in the design space. So if you have a function that's not differentiable, you can still use fminsearch. But if you have something that's not differentiable and you try to use fminunc, MATLAB's going to be trying to compute gradients-- not second derivatives, but first derivatives-- and could get itself into trouble.

But the really nice thing with these MATLAB functions is that you basically supply a function that, given x, computes J. And you supply an initial guess, an initial point in the landscape, and that's actually the only thing you have to give it. If you don't want to give it anything more, MATLAB will take care of everything else for you. You can give it just that black-box function-- given x, compute J-- and an initial guess. If you know a little bit more about what's going on, you can also choose to manage the options. So you can do things like turn on the display to see more information about the iterations. You can play with tolerances. You can even specify exactly which methods you want to use.

So here's the documentation for fminunc. You see there, there's just the basic call, giving it a function and an initial guess. And you see all these different options in here. So you have the option to supply a gradient. So if you had a function that was analytic and you could differentiate it, it would be a really good idea to give that gradient to MATLAB as well as giving it the function. Or if you were to use automatic differentiation and come up with a code that computed gradients, then you're able to specify that as an additional option.

And the last thing I just want to show you is the methods. Check it in here. I'm not sure if it's telling us what the methods are. [INAUDIBLE] and have a look, see what method it is.
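In its simplest form, that call looks something like the sketch below; the objective here is a placeholder, and the option names can vary a bit between MATLAB versions:

% Minimal fminunc usage: a function handle mapping x to J, plus an initial guess.
J  = @(x) (x(1)-1)^2 + 10*(x(2)+2)^2;        % placeholder objective
x0 = [5; 5];                                 % initial guess
[xopt, Jopt] = fminunc(J, x0);               % MATLAB does the finite differencing for you

% Optionally, turn on iteration output and tighten tolerances:
opts = optimset('Display', 'iter', 'TolX', 1e-8);
[xopt, Jopt] = fminunc(J, x0, opts);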
643 00:35:03,546 --> 00:35:05,540 So again, I didn't talk about them, 644 00:35:05,540 --> 00:35:09,240 but there's a class of methods called quasi Newton 645 00:35:09,240 --> 00:35:11,570 methods, which in a way are kind of 646 00:35:11,570 --> 00:35:15,300 like the best of both worlds between the first order 647 00:35:15,300 --> 00:35:18,480 and the second order methods. 648 00:35:18,480 --> 00:35:22,630 So if you remember here back in the Newton's method, 649 00:35:22,630 --> 00:35:27,980 [INAUDIBLE] design space to be the inverse of the Hessian 650 00:35:27,980 --> 00:35:29,050 times the gradient. 651 00:35:29,050 --> 00:35:31,210 But then we said that [? computation ?] was really 652 00:35:31,210 --> 00:35:32,095 too expensive. 653 00:35:32,095 --> 00:35:34,180 So what a quasi Newton method does 654 00:35:34,180 --> 00:35:36,280 is that it basically introduces an approximation 655 00:35:36,280 --> 00:35:36,946 for the Hessian. 656 00:35:36,946 --> 00:35:39,250 That's this A thing here. 657 00:35:39,250 --> 00:35:40,850 And it builds up this approximation 658 00:35:40,850 --> 00:35:42,970 using the gradients that it's computing anyway. 659 00:35:42,970 --> 00:35:45,025 So don't worry about the details of what 660 00:35:45,025 --> 00:35:47,150 is on here, because it's a little bit beyond what I 661 00:35:47,150 --> 00:35:50,030 want you guys to be comfortable with. 662 00:35:50,030 --> 00:35:54,120 But more or less every time we compute a gradient, 663 00:35:54,120 --> 00:35:56,620 we can also use differences between gradients 664 00:35:56,620 --> 00:35:59,420 to get approximations to Hessians. 665 00:35:59,420 --> 00:36:04,090 And I'm pretty sure that's the method that's sitting 666 00:36:04,090 --> 00:36:05,589 underneath here in MATLAB. 667 00:36:05,589 --> 00:36:06,464 AUDIENCE: [INAUDIBLE] 668 00:36:09,835 --> 00:36:10,960 KAREN WILLCOX: [INAUDIBLE]. 669 00:36:10,960 --> 00:36:13,508 OK. 670 00:36:13,508 --> 00:36:14,383 AUDIENCE: [INAUDIBLE] 671 00:36:27,839 --> 00:36:29,880 KAREN WILLCOX: So we're not guaranteed because it 672 00:36:29,880 --> 00:36:31,710 depends on the problem. 673 00:36:31,710 --> 00:36:34,320 So if you have a problem where there's a direction where 674 00:36:34,320 --> 00:36:35,856 the curvature is flat, then you're 675 00:36:35,856 --> 00:36:37,230 going to have a singular Hessian. 676 00:36:37,230 --> 00:36:38,771 And then you just have to be careful. 677 00:36:38,771 --> 00:36:40,695 AUDIENCE: [INAUDIBLE] Yeah. 678 00:36:40,695 --> 00:36:44,560 If J were linear in any design variable, when 679 00:36:44,560 --> 00:36:46,390 you differentiate twice you would get 0. 680 00:36:46,390 --> 00:36:51,230 And you have to be very careful if you have [INAUDIBLE] inverse 681 00:36:51,230 --> 00:36:52,700 of the Hessian. 682 00:36:52,700 --> 00:36:54,412 But there are ways you can handle that, 683 00:36:54,412 --> 00:36:57,650 more advanced optimizations. 684 00:36:57,650 --> 00:36:59,344 OK. 685 00:36:59,344 --> 00:37:02,410 So I'm not going to talk about constrained methods, 686 00:37:02,410 --> 00:37:08,440 but I did want to just show you briefly, and show you also 687 00:37:08,440 --> 00:37:13,904 in MATLAB how constrained methods work. 688 00:37:13,904 --> 00:37:16,070 So there's something called the [? buns ?] function. 689 00:37:16,070 --> 00:37:17,010 Actually the optimization community 690 00:37:17,010 --> 00:37:18,630 loves these little simple problems. 691 00:37:18,630 --> 00:37:20,240 Remember I talked about [? Rosenbrot ?] 
So I'm not going to talk about constrained methods, but I did want to just show you briefly, and show you also in MATLAB, how constrained methods work. So there's something called the [? buns ?] function. Actually, the optimization community loves these little simple problems. Remember, I talked about the Rosenbrock function the other day and called it the banana function. So there's another one called the [? buns ?] function, which is actually a constrained problem. So it turns out this is the objective, which I know doesn't really mean anything, but the difference with this one is that it's got constraints. And so if we look at the visualization, here's the design space, design variables x1 and x2. And now the contours-- these are contours of the objective function. And then the constraints are saying the solution has to lie above this one, has to lie to the left of this one, and has to lie above this constraint. So again, when I was talking about the landscape, remember I said think of the constraints as being like the keep-out regions, telling you where in the design space you can be. And so in this case, our feasible design region is this thing here, traced out by the constraints.

So it turns out this problem has two local solutions. One is also the global solution. If we try to minimize, the global solution is actually up here in the corner of the design space, which you can see from the colors-- it's the bluest color. But there's also a local optimum that sits here on this constraint, down here at about 50, 20. And I should say that there are bounds on the design variables, which is why this global solution sits up here, because the design variables are at their bounds. So that's a simple constrained problem.

And I just want to show you running the algorithm. So fmincon is the MATLAB function that does constrained optimization. And again, it's really super simple to use. At a minimum, all you have to do is give it a function that, given x, returns J of x; give it an initial guess to start with; and give it a function that, given x, returns the constraints, the g of x or the h of x, if you have them. So given x, return J of x, return g of x and h of x, and an initial guess. And there are lots more options that you can set if you want to.
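A minimal sketch of that call might look like the following; the objective, constraint, and bounds are invented just to show the interface, and the empty brackets stand in for the linear-constraint arguments that aren't being used:

% Minimal fmincon usage: objective, initial guess, bounds, and a nonlinear constraint.
J  = @(x) (x(1)-1)^2 + 10*(x(2)+2)^2;                 % placeholder objective
x0 = [0; 0];                                          % initial guess
lb = [-5; -5];  ub = [5; 5];                          % bounds on the design variables

% Nonlinear constraints: return [g(x), h(x)] with g(x) <= 0 and h(x) = 0.
nonlcon = @(x) deal(x(1)^2 + x(2)^2 - 9, []);         % g(x): stay inside a circle of radius 3

[xopt, Jopt] = fmincon(J, x0, [], [], [], [], lb, ub, nonlcon);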
And you can look at lots more things. But in the simplest implementation, MATLAB will take care of the finite differencing for you to do the gradient-- the step size, everything you would otherwise have to worry about. But of course, again, if you have an efficient way of computing gradients, you have the option to specify the gradients and make things run a lot more reliably and a lot more efficiently.

So just to go back up here-- this one's actually using-- this is a quasi-Newton method. There's also a Newton method that [INAUDIBLE]. I think there are actually three levels of optimization in fmincon: two different kinds of quasi-Newton method, and there's a Newton method that's in there. In order to run the Newton method, you have to actually supply a gradient for it, so that it doesn't do all that finite differencing for the Hessian.

And what we can look at-- what we're looking at here are different initial conditions. So here's that [? buns ?] function that I was showing you. And in this particular case, we initialize here, and what's being plotted are the steps that the optimizer's taking in the design space. I'm not going to talk about the algorithms at all, but just to give you a sense that when you have constraints, things are much more difficult, because it's not just about going downhill. Here we could just go downhill, and you can see more or less this looks kind of like a steepest descent direction. But then when you get here, and you're actually right on the wrong side of the constraint, you can't just go downhill, because you've got to worry about the constraint. And you can see what it's doing is it's kind of following the constraint and skipping around. So things get a lot harder when you have constraints, because you have to worry about descent, but also about satisfying the constraints. But again, if you take an optimization class, you'll learn lots of stuff there.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah.
784 00:41:35,440 --> 00:41:37,900 So it depends on the algorithm. 785 00:41:37,900 --> 00:41:40,810 Some algorithms will allow it to violate the constraints 786 00:41:40,810 --> 00:41:42,310 a little bit along the way. 787 00:41:42,310 --> 00:41:44,150 Some will not. 788 00:41:44,150 --> 00:41:45,930 And there are all lots of different ways 789 00:41:45,930 --> 00:41:46,930 of handling constraints. 790 00:41:46,930 --> 00:41:48,596 There are things called penalty methods. 791 00:41:48,596 --> 00:41:50,570 There are things called barrier methods. 792 00:41:50,570 --> 00:41:53,100 There are interior point-- the barrier methods related 793 00:41:53,100 --> 00:41:56,210 to interior point methods that force things to stay away 794 00:41:56,210 --> 00:41:57,650 inside the constraints. 795 00:41:57,650 --> 00:42:01,760 It just depends on what algorithm you use. 796 00:42:01,760 --> 00:42:03,919 But that sort of relates to the next part 797 00:42:03,919 --> 00:42:05,460 that I want to show you, which is now 798 00:42:05,460 --> 00:42:07,530 when you think about convergence, 799 00:42:07,530 --> 00:42:09,910 so here is the objective function value. 800 00:42:09,910 --> 00:42:13,210 This is the iteration. 801 00:42:13,210 --> 00:42:15,510 So you can see the objective is coming down, 802 00:42:15,510 --> 00:42:18,119 but you don't just want to worry about what's happening now 803 00:42:18,119 --> 00:42:18,910 with the objective. 804 00:42:18,910 --> 00:42:21,007 You also have to look at the constraint violation. 805 00:42:21,007 --> 00:42:22,590 And here it [? skips ?] [? forward ?]. 806 00:42:22,590 --> 00:42:24,840 It looks like we found a pretty good design, 807 00:42:24,840 --> 00:42:27,560 but actually the constraint is violated. 808 00:42:27,560 --> 00:42:29,510 So this would be like an aircraft wing that 809 00:42:29,510 --> 00:42:31,912 exceeds the maximum stress constraint, which 810 00:42:31,912 --> 00:42:32,620 would be no good. 811 00:42:32,620 --> 00:42:34,554 And so that's why we have to keep going. 812 00:42:34,554 --> 00:42:36,720 And actually you can see a little bit of an increase 813 00:42:36,720 --> 00:42:41,210 here in the objective to get the constraint violation satisfied. 814 00:42:41,210 --> 00:42:44,030 And I'll say that when we try to apply these kinds of methods 815 00:42:44,030 --> 00:42:46,010 to realistic aircraft design problems that 816 00:42:46,010 --> 00:42:48,390 have lots of variables, they have even lots 817 00:42:48,390 --> 00:42:49,880 and lots of constraints. 818 00:42:49,880 --> 00:42:53,000 And satisfying constraints can be sometimes just 819 00:42:53,000 --> 00:42:55,590 about the most challenging thing for the optimizer. 820 00:42:55,590 --> 00:42:58,020 So it can often find good solutions, 821 00:42:58,020 --> 00:43:00,600 but then it turns out that they don't necessarily 822 00:43:00,600 --> 00:43:01,850 satisfy the constraints. 823 00:43:01,850 --> 00:43:03,960 You see another one here. 824 00:43:03,960 --> 00:43:06,350 In this case, we started with an infeasible design, 825 00:43:06,350 --> 00:43:08,075 one that violated these constraints, 826 00:43:08,075 --> 00:43:10,339 or violated this constraint. 827 00:43:10,339 --> 00:43:11,880 So you see first of all the optimizer 828 00:43:11,880 --> 00:43:13,580 tries to get it back to the feasible region. 829 00:43:13,580 --> 00:43:15,670 It tries to make sure the constraint is satisfied. 830 00:43:15,670 --> 00:43:17,667 And then it starts searching. 
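A brief aside on the two convergence plots just described: one way to get them from fmincon is its built-in per-iteration plot functions. This sketch assumes the plot function names below exist in your release of the Optimization Toolbox, and reuses the placeholder J, x0, lb, ub, gcon from the earlier sketch.

    % Plot objective value and maximum constraint violation at every iteration.
    opts = optimoptions('fmincon', 'PlotFcn', {@optimplotfval, @optimplotconstrviolation});
    xopt = fmincon(J, x0, [], [], [], [], lb, ub, gcon, opts);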
831 00:43:17,667 --> 00:43:19,750 And actually most of these designs are infeasible. 832 00:43:19,750 --> 00:43:26,330 It's only at the end that we come here to the local optimum. 833 00:43:26,330 --> 00:43:29,905 And so again, there's the function value. 834 00:43:29,905 --> 00:43:32,155 And you can see we start off violating the constraint. 835 00:43:32,155 --> 00:43:35,130 We satisfy it, and then we go back outside the constraint, 836 00:43:35,130 --> 00:43:39,060 and then eventually come back to it at the end. 837 00:43:39,060 --> 00:43:45,520 And then the last one starts again over here, 838 00:43:45,520 --> 00:43:47,070 where two constraints are violated. 839 00:43:47,070 --> 00:43:49,445 And you can see even though two constraints are violated, 840 00:43:49,445 --> 00:43:53,479 this one's actually able to get to that solution pretty easily. 841 00:43:53,479 --> 00:43:55,770 One thing you notice is none of the ones that I started 842 00:43:55,770 --> 00:43:59,180 actually found the global optimum. 843 00:43:59,180 --> 00:44:02,610 They all converge to the local optimum. 844 00:44:02,610 --> 00:44:04,490 And again, if you like to think more in 3D, 845 00:44:04,490 --> 00:44:06,823 remember that landscape where there was the big mountain 846 00:44:06,823 --> 00:44:08,230 and the two smaller peaks. 847 00:44:08,230 --> 00:44:10,600 In this case, we started here, we started here, 848 00:44:10,600 --> 00:44:12,950 and we started here, and they all ended up over here. 849 00:44:12,950 --> 00:44:15,574 Probably we would have to start on the other side of this ridge 850 00:44:15,574 --> 00:44:19,974 here in order to get up into the global optimum. 851 00:44:24,390 --> 00:44:24,890 OK. 852 00:44:24,890 --> 00:44:31,277 So I know you don't really know very much about constraint 853 00:44:31,277 --> 00:44:33,360 optimization, but you could all be [? dangerous ?] 854 00:44:33,360 --> 00:44:37,490 [? utilize ?] for the next year if you wanted to be. 855 00:44:37,490 --> 00:44:39,250 It's actually a really powerful thing 856 00:44:39,250 --> 00:44:41,440 if you're trying to do a design problem, 857 00:44:41,440 --> 00:44:43,767 or you have simulation code and you 858 00:44:43,767 --> 00:44:45,933 want to explore what's going on with the parameters. 859 00:44:45,933 --> 00:44:49,590 The toolbox in MATLAB is really good. 860 00:44:49,590 --> 00:44:51,090 And then if you say for grad school, 861 00:44:51,090 --> 00:44:52,673 you can take the graduate class that I 862 00:44:52,673 --> 00:44:56,578 teach that goes into this stuff in much, much more detail. 863 00:44:56,578 --> 00:45:00,800 AUDIENCE: Is there anything MATLAB, like other-- 864 00:45:00,800 --> 00:45:01,770 KAREN WILLCOX: Yeah. 865 00:45:01,770 --> 00:45:02,270 Absolutely. 866 00:45:02,270 --> 00:45:06,450 So there's all kind of optimization toolboxes. 867 00:45:06,450 --> 00:45:09,230 I mentioned Professor Johnson in [INAUDIBLE], where 868 00:45:09,230 --> 00:45:13,150 he's implemented a lot of some of the less mainstream 869 00:45:13,150 --> 00:45:14,260 optimization methods. 870 00:45:14,260 --> 00:45:15,090 That's nice. 871 00:45:15,090 --> 00:45:20,580 And then there's all kinds of other toolboxes, 872 00:45:20,580 --> 00:45:25,071 things like CPLEX that a lot of optimization people use. 873 00:45:25,071 --> 00:45:26,320 What are some of the big ones? 
874 00:45:26,320 --> 00:45:27,820 I would say though MATLAB has really 875 00:45:27,820 --> 00:45:29,900 started to sort of dominate the market in a way 876 00:45:29,900 --> 00:45:32,000 because MATLAB's optimization toolbox has 877 00:45:32,000 --> 00:45:35,550 become really good, especially in the last 10 years. 878 00:45:35,550 --> 00:45:38,630 And I think the gradient pace, they 879 00:45:38,630 --> 00:45:41,750 have the state of the art, the Newton methods with all the 880 00:45:41,750 --> 00:45:43,440 bells and whistles on them. 881 00:45:43,440 --> 00:45:48,041 And then there are also a lot of user-contributed algorithms. 882 00:45:51,200 --> 00:45:52,125 I'll show you. 883 00:45:52,125 --> 00:45:54,190 Let me see if I can run it, something 884 00:45:54,190 --> 00:45:56,155 that I don't approve of. 885 00:45:56,155 --> 00:45:57,030 AUDIENCE: [INAUDIBLE] 886 00:46:08,164 --> 00:46:09,330 KAREN WILLCOX: That's right. 887 00:46:09,330 --> 00:46:10,184 Yeah. 888 00:46:10,184 --> 00:46:10,850 I gave you the-- 889 00:46:10,850 --> 00:46:13,194 AUDIENCE: [INAUDIBLE] 890 00:46:13,194 --> 00:46:14,610 KAREN WILLCOX: I gave you the link 891 00:46:14,610 --> 00:46:16,403 to that in the last lecture notes, 892 00:46:16,403 --> 00:46:17,694 to Professor Johnson's website. 893 00:46:17,694 --> 00:46:20,604 And actually since you said Python, also pyOpt. 894 00:46:20,604 --> 00:46:23,460 Right now pyOpt is starting to create 895 00:46:23,460 --> 00:46:27,205 all the optimization algorithms in Python. 896 00:46:27,205 --> 00:46:28,660 AUDIENCE: [INAUDIBLE] 897 00:46:28,660 --> 00:46:31,160 KAREN WILLCOX: What about [? Julian ?]? 898 00:46:31,160 --> 00:46:32,039 AUDIENCE: [INAUDIBLE] 899 00:46:32,039 --> 00:46:34,080 KAREN WILLCOX: Is that Professor Edelman's stuff? 900 00:46:34,080 --> 00:46:34,955 AUDIENCE: [INAUDIBLE] 901 00:46:49,189 --> 00:46:50,955 KAREN WILLCOX: Oh, once you graduate? 902 00:46:50,955 --> 00:46:52,330 But that's part of their model is 903 00:46:52,330 --> 00:46:54,580 for you to convince your employers 904 00:46:54,580 --> 00:46:58,115 to-- but that's why you stay in academia is you 905 00:46:58,115 --> 00:46:59,858 get perpetual very cheap MATLAB. 906 00:46:59,858 --> 00:47:02,140 It's one of the great perks of being an academic. 907 00:47:02,140 --> 00:47:03,015 AUDIENCE: [INAUDIBLE] 908 00:47:06,404 --> 00:47:08,570 KAREN WILLCOX: So I want to show you something again 909 00:47:08,570 --> 00:47:10,070 of which I definitely don't approve, 910 00:47:10,070 --> 00:47:11,730 but again this triggered in my mind 911 00:47:11,730 --> 00:47:13,850 because I said user-contributed toolboxes. 912 00:47:13,850 --> 00:47:15,894 There's a whole class of optimization-- 913 00:47:15,894 --> 00:47:18,060 I'm going to use the word "optimization" in quotes-- 914 00:47:18,060 --> 00:47:20,330 methods that don't use gradient, that don't really 915 00:47:20,330 --> 00:47:23,410 have any convergence theory. 916 00:47:23,410 --> 00:47:29,620 A couple of them have some theories associated with them. 917 00:47:29,620 --> 00:47:31,810 They're typically called heuristic algorithms 918 00:47:31,810 --> 00:47:34,560 because they use rules to search the design space. 
919 00:47:34,560 --> 00:47:37,310 And one of the most popular in aerospace engineering 920 00:47:37,310 --> 00:47:41,460 is something called a genetic algorithm, which basically uses 921 00:47:41,460 --> 00:47:43,572 all the rules of natural selection, 922 00:47:43,572 --> 00:47:45,280 and mimics the rules of natural selection 923 00:47:45,280 --> 00:47:47,545 to search the design space. 924 00:47:47,545 --> 00:47:50,570 So this is the function that we were looking at on Wednesday. 925 00:47:50,570 --> 00:47:52,887 And remember, we did a gradient-based method, 926 00:47:52,887 --> 00:47:54,470 or we did the Nelder-Mead simplex, 927 00:47:54,470 --> 00:47:57,740 where we had a little asterisk that was our initial guess, 928 00:47:57,740 --> 00:48:00,510 and then the asterisk marched and found the top of the hill, 929 00:48:00,510 --> 00:48:02,460 or got to one of the tops of the hills 930 00:48:02,460 --> 00:48:04,250 depending on where it started. 931 00:48:04,250 --> 00:48:07,327 So a genetic algorithm is a population-based method, 932 00:48:07,327 --> 00:48:09,660 which means you don't have one design on each iteration, 933 00:48:09,660 --> 00:48:10,639 but you have many. 934 00:48:10,639 --> 00:48:13,180 And I don't know how many there are in this population, maybe 935 00:48:13,180 --> 00:48:14,810 20 or 30. 936 00:48:14,810 --> 00:48:17,180 So there are 20 or 30 initial guesses, 937 00:48:17,180 --> 00:48:19,060 and that's the population. 938 00:48:19,060 --> 00:48:21,340 And when I set it running, what you're going to see 939 00:48:21,340 --> 00:48:25,270 is that each iteration, it's evolving this population. 940 00:48:25,270 --> 00:48:28,100 So if there are, let's say there are 30 there, 941 00:48:28,100 --> 00:48:30,700 on the next iteration it's going to be 30 more and then 30 942 00:48:30,700 --> 00:48:35,190 more and 30 more, and the way it's evolving is it's taking 943 00:48:35,190 --> 00:48:38,205 designs as the parents and having them produce children, 944 00:48:38,205 --> 00:48:40,080 then it's figuring out which children to keep 945 00:48:40,080 --> 00:48:41,900 and which ones to not keep. 946 00:48:41,900 --> 00:48:43,990 And it's introducing a mutation that causes 947 00:48:43,990 --> 00:48:45,800 designs to jump around. 948 00:48:45,800 --> 00:48:47,810 So there's a whole sort of thing. 949 00:48:47,810 --> 00:48:51,910 And Professor [? Divic ?] knows a lot about genetic algorithms. 950 00:48:51,910 --> 00:48:54,610 So if you're interested, you could ask him. 951 00:48:54,610 --> 00:48:56,780 But let me set it running, and you'll 952 00:48:56,780 --> 00:48:58,406 see what I mean. 953 00:48:58,406 --> 00:49:00,804 AUDIENCE: [INAUDIBLE] 954 00:49:00,804 --> 00:49:02,970 KAREN WILLCOX: Yeah, the heuristic is all the things 955 00:49:02,970 --> 00:49:07,840 about-- you take designs and you have to encode them 956 00:49:07,840 --> 00:49:09,240 with chromosomes. 957 00:49:09,240 --> 00:49:11,850 And then there are heuristics about the different mating 958 00:49:11,850 --> 00:49:14,220 strategies and then the mutations. 959 00:49:14,220 --> 00:49:16,520 So there's a lot of-- and at the end of the day, 960 00:49:16,520 --> 00:49:18,740 you don't have guarantees about optimality.
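For reference, a sketch of running a genetic algorithm with MATLAB's ga (Global Optimization Toolbox) on a landscape like the one in the demo. It assumes the demo surface is MATLAB's built-in peaks function, which may not be exactly what was shown; ga minimizes, so to climb the hills we minimize the negative of the surface.

    % Population-based search with ga on a peaks-like landscape.
    landscape = @(x) -peaks(x(1), x(2));             % minimize the negative to find the peaks
    lb = [-3 -3];  ub = [3 3];                       % bounds on x1 and x2
    opts = optimoptions('ga', 'PopulationSize', 30); % roughly 30 designs per generation
    [xopt, fopt] = ga(landscape, 2, [], [], [], [], lb, ub, [], opts);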
961 00:49:18,740 --> 00:49:23,270 But if you have a problem where you can't compute gradients, 962 00:49:23,270 --> 00:49:27,622 maybe where the models are not differentiable, 963 00:49:27,622 --> 00:49:29,330 and you just kind of want to spray points 964 00:49:29,330 --> 00:49:31,204 everywhere and [? work ?] as well as you can, 965 00:49:31,204 --> 00:49:32,557 maybe something like this is OK. 966 00:49:32,557 --> 00:49:33,890 I wouldn't call it optimization. 967 00:49:33,890 --> 00:49:35,959 AUDIENCE: [INAUDIBLE] 968 00:49:35,959 --> 00:49:38,000 KAREN WILLCOX: Yeah, if you want to avoid local-- 969 00:49:38,000 --> 00:49:40,260 but I think you avoid local minima by not 970 00:49:40,260 --> 00:49:43,100 getting to any minimum at all. 971 00:49:43,100 --> 00:49:45,320 They work great for when you have 2D. 972 00:49:45,320 --> 00:49:50,040 But you see the idea how dramatically different it is. 973 00:49:50,040 --> 00:49:51,640 There's simulated annealing which 974 00:49:51,640 --> 00:49:54,730 does have some theory associated with it. 975 00:49:54,730 --> 00:49:56,275 There's particle swarm optimization. 976 00:49:56,275 --> 00:49:57,150 There's ant colonies. 977 00:49:57,150 --> 00:49:59,480 There's lots of analogies with nature. 978 00:49:59,480 --> 00:50:01,570 So there's this whole other class 979 00:50:01,570 --> 00:50:04,055 of optimization algorithms which can be kind of fun. 980 00:50:07,310 --> 00:50:09,690 OK. 981 00:50:09,690 --> 00:50:11,580 And so if you wanted to find those, 982 00:50:11,580 --> 00:50:13,834 you could probably go find other toolboxes as well. 983 00:50:16,520 --> 00:50:20,900 OK, so I think I'll skip over the 1D search. 984 00:50:20,900 --> 00:50:22,584 It's not all that important. 985 00:50:22,584 --> 00:50:24,750 But I want to spend the last little bit just talking 986 00:50:24,750 --> 00:50:30,250 about surrogate modeling and reduced-order models. 987 00:50:30,250 --> 00:50:34,280 So maybe you have a sense that when we run an optimization 988 00:50:34,280 --> 00:50:37,590 algorithm, we have to call the code lots of times. 989 00:50:37,590 --> 00:50:40,032 Every time we want to evaluate the objective 990 00:50:40,032 --> 00:50:42,490 or the constraints, and any time we want to get a gradient, 991 00:50:42,490 --> 00:50:45,720 we may have to call it n times. 992 00:50:45,720 --> 00:50:47,740 Also, you've seen in the Monte Carlo simulation 993 00:50:47,740 --> 00:50:51,590 that you had to call the code 22,000 times when you were 994 00:50:51,590 --> 00:50:55,270 running the simulations for each ball, each pitch 995 00:50:55,270 --> 00:50:57,275 that you were analyzing. 996 00:50:57,275 --> 00:50:59,220 So it turns out that when you have 997 00:50:59,220 --> 00:51:01,670 CFD models or finite element models 998 00:51:01,670 --> 00:51:04,924 or realistic models of aircraft or spacecraft 999 00:51:04,924 --> 00:51:06,340 or whatever it is you're modeling, 1000 00:51:06,340 --> 00:51:08,110 it just gets too expensive. 1001 00:51:08,110 --> 00:51:10,860 And so we use what is called a surrogate model. 1002 00:51:10,860 --> 00:51:12,680 So the word "surrogate" means that we're 1003 00:51:12,680 --> 00:51:18,427 going to replace the real model with an approximation.
1004 00:51:18,427 --> 00:51:20,010 And the idea is that the approximation 1005 00:51:20,010 --> 00:51:24,000 should be something that's much cheaper than the model 1006 00:51:24,000 --> 00:51:26,520 that we're trying to optimize, but hopefully it's good 1007 00:51:26,520 --> 00:51:28,580 enough so that we can make good design decisions 1008 00:51:28,580 --> 00:51:30,800 and find good designs, and then maybe go back 1009 00:51:30,800 --> 00:51:34,680 and analyze them with our more sophisticated model. 1010 00:51:34,680 --> 00:51:37,090 So I like to think of surrogate models in these three 1011 00:51:37,090 --> 00:51:38,300 different categories. 1012 00:51:38,300 --> 00:51:40,600 There are data fit models like regression models, 1013 00:51:40,600 --> 00:51:43,870 and we'll talk a little bit more about these ones. 1014 00:51:43,870 --> 00:51:46,230 There are simplified physics models, which actually we 1015 00:51:46,230 --> 00:51:48,940 use in engineering all the time, that maybe we 1016 00:51:48,940 --> 00:51:53,920 want to design according to the Navier-Stokes equations, 1017 00:51:53,920 --> 00:51:56,830 in this case, for a supersonic business jet. 1018 00:51:56,830 --> 00:51:59,640 But we could also use something like a panel method, or even 1019 00:51:59,640 --> 00:52:03,600 just a simple empirical very high-level model 1020 00:52:03,600 --> 00:52:05,684 of the aircraft so we can simplify the physics. 1021 00:52:05,684 --> 00:52:07,100 And then there are things that are 1022 00:52:07,100 --> 00:52:11,140 called projection-based reduced models, which is basically 1023 00:52:11,140 --> 00:52:15,570 a mathematical way of coming up with simpler models. 1024 00:52:15,570 --> 00:52:18,780 This little diagram here is supposed to represent an x 1025 00:52:18,780 --> 00:52:21,510 dot equals A x plus B u. 1026 00:52:21,510 --> 00:52:23,976 You guys didn't see state space systems [INAUDIBLE] 1027 00:52:23,976 --> 00:52:24,600 anymore, right? 1028 00:52:24,600 --> 00:52:26,512 No. 1029 00:52:26,512 --> 00:52:30,810 So mathematical ways for reducing systems 1030 00:52:30,810 --> 00:52:32,160 that I won't talk about. 1031 00:52:32,160 --> 00:52:36,640 But I do lots of research in that third category. 1032 00:52:36,640 --> 00:52:40,260 So if we just want to think a little bit about the data fit 1033 00:52:40,260 --> 00:52:42,460 category, so the idea is going to be 1034 00:52:42,460 --> 00:52:44,892 that we're going to sample our simulation at some number 1035 00:52:44,892 --> 00:52:45,600 of design points. 1036 00:52:45,600 --> 00:52:47,891 So we might use one of the design of experiments methods 1037 00:52:47,891 --> 00:52:52,570 that we talked about last week to somehow select some points. 1038 00:52:52,570 --> 00:52:54,650 So we're going to run our expensive model 1039 00:52:54,650 --> 00:52:56,149 at a bunch of points, and then we're 1040 00:52:56,149 --> 00:52:59,650 going to fit a model using that sample information. 1041 00:52:59,650 --> 00:53:02,250 We could try to fit a model everywhere, 1042 00:53:02,250 --> 00:53:05,600 or we might want to do more of a local fit. 1043 00:53:05,600 --> 00:53:08,460 And then you could also think about updating the model when 1044 00:53:08,460 --> 00:53:09,760 you get new points. 1045 00:53:09,760 --> 00:53:16,970 But I want to talk specifically about response surface 1046 00:53:16,970 --> 00:53:23,980 modeling, and see how one would come up 1047 00:53:23,980 --> 00:53:30,965 with a surrogate model using a response surface.
1048 00:53:30,965 --> 00:53:32,340 Because again, that's where we'll 1049 00:53:32,340 --> 00:53:35,190 use some of the mathematical techniques 1050 00:53:35,190 --> 00:53:39,340 that you guys should have seen before. 1051 00:53:39,340 --> 00:53:42,450 So we skipped over number three. 1052 00:53:42,450 --> 00:53:45,790 We didn't talk about the 1D search. 1053 00:53:45,790 --> 00:53:48,760 And number four was really just a brief intro 1054 00:53:48,760 --> 00:53:50,622 to what is a surrogate model, and what 1055 00:53:50,622 --> 00:53:51,660 are the different types. 1056 00:53:51,660 --> 00:53:55,090 So we're at number five, on a response surface model. 1057 00:53:58,576 --> 00:54:06,180 OK, so the setup is that we're going to have-- let's just 1058 00:54:06,180 --> 00:54:10,730 draw the [? block ?] diagram. 1059 00:54:10,730 --> 00:54:17,940 So this is our expensive model where we can supply x, 1060 00:54:17,940 --> 00:54:19,361 and out comes J of x. 1061 00:54:19,361 --> 00:54:20,860 Or with a constrained problem, 1062 00:54:20,860 --> 00:54:24,216 it would also be g and h coming out of there. 1063 00:54:24,216 --> 00:54:26,550 So that's our expensive model. 1064 00:54:26,550 --> 00:54:28,580 And basically what we want to do is come up 1065 00:54:28,580 --> 00:54:37,480 with a surrogate model such that when we supply x, what's 1066 00:54:37,480 --> 00:54:41,160 going to come out will be some J hat, where 1067 00:54:41,160 --> 00:54:48,006 this guy is hopefully a good approximation of J. 1068 00:54:48,006 --> 00:54:52,210 And one simple but pretty powerful and commonly used 1069 00:54:52,210 --> 00:54:55,810 way to do that is with what's called a response surface 1070 00:54:55,810 --> 00:54:58,242 model. 1071 00:54:58,242 --> 00:55:06,510 So response surface model, or RSM. 1072 00:55:14,040 --> 00:55:22,560 So if we wanted to do just a first order RSM, 1073 00:55:22,560 --> 00:55:27,660 then what we're going to say is that this J hat, which 1074 00:55:27,660 --> 00:55:32,810 is a function of x, is just going to be some constant, 1075 00:55:32,810 --> 00:55:37,440 let's call it a0, plus the sum from i 1076 00:55:37,440 --> 00:55:41,020 equal 1 to n of some ai times xi. 1077 00:55:41,020 --> 00:55:43,720 In other words, it's just a linear model. 1078 00:55:43,720 --> 00:55:49,430 It's a constant plus a constant a1 times x1, plus a2 times x2. 1079 00:55:49,430 --> 00:55:52,880 So it's going to replace the expensive model 1080 00:55:52,880 --> 00:55:54,590 with a linear approximation. 1081 00:55:54,590 --> 00:56:00,630 That would be a first order response surface model. 1082 00:56:00,630 --> 00:56:04,620 Turns out this is not such a good idea, 1083 00:56:04,620 --> 00:56:08,322 and we'll look at it graphically in a second and you'll see why. 1084 00:56:08,322 --> 00:56:11,970 So more often than not, people use a second order 1085 00:56:11,970 --> 00:56:20,860 or a quadratic response surface. 1086 00:56:20,860 --> 00:56:27,790 And it's the same idea, that the approximate model, J hat, 1087 00:56:27,790 --> 00:56:31,770 is going to be, again, a constant, 1088 00:56:31,770 --> 00:56:35,980 and then it's going to be the linear term, so the sum from i 1089 00:56:35,980 --> 00:56:39,360 equal 1 to n of ai xi's, and then 1090 00:56:39,360 --> 00:56:43,390 it's going to be all the quadratic terms, so i equals 1 1091 00:56:43,390 --> 00:56:53,818 to n, j equals 1 to n of bij xi xj. 1092 00:56:53,818 --> 00:56:56,770 And actually maybe this should be j equal i to n.
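Written out, the two response surface models just described are, with J hat the surrogate and x1 through xn the design variables (the inner sum starting at j = i, per the correction above):

    \hat{J}(x) = a_0 + \sum_{i=1}^{n} a_i x_i                                              (first order)

    \hat{J}(x) = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} b_{ij} x_i x_j   (second order)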
1093 00:57:04,600 --> 00:57:07,930 So again, we're just replacing our J of x with a model, 1094 00:57:07,930 --> 00:57:09,450 and in this case, second order case, 1095 00:57:09,450 --> 00:57:11,060 we're saying the model is quadratic. 1096 00:57:11,060 --> 00:57:13,350 It can have a constant term, it can linear terms, 1097 00:57:13,350 --> 00:57:18,770 and then it can have all these quadratic terms in the xi, xj, 1098 00:57:18,770 --> 00:57:21,752 xi squared, x1 squared, x1, x2. 1099 00:57:21,752 --> 00:57:24,212 AUDIENCE: [INAUDIBLE] 1100 00:57:24,212 --> 00:57:29,230 KAREN WILLCOX: Well, because we don't need 1, 2 and 2, 1. 1101 00:57:29,230 --> 00:57:30,670 Yeah. 1102 00:57:30,670 --> 00:57:33,330 I mean, you can write 1 there if you wanted to, 1103 00:57:33,330 --> 00:57:35,530 but because x1, x2 is the same as x2, 1104 00:57:35,530 --> 00:57:39,530 x1, we would just lump them. [INAUDIBLE] 1105 00:57:39,530 --> 00:57:49,310 OK, so here we've got that the ai's are unknown coefficients. 1106 00:57:54,050 --> 00:57:58,165 And here we have the ai's and the bij's 1107 00:57:58,165 --> 00:58:00,524 are unknown coefficients. 1108 00:58:03,250 --> 00:58:07,420 So we've said that we want to approximate our model 1109 00:58:07,420 --> 00:58:09,104 with a linear or a quadratic function. 1110 00:58:09,104 --> 00:58:11,395 Now the question is how do we come up with coefficients 1111 00:58:11,395 --> 00:58:13,160 for that approximation. 1112 00:58:13,160 --> 00:58:17,034 And that's where this sampling comes into play. 1113 00:58:24,664 --> 00:58:26,205 What we're going to do is we're going 1114 00:58:26,205 --> 00:58:29,075 to generate samples using our favorite design of experiments 1115 00:58:29,075 --> 00:58:30,450 method, or maybe we're just going 1116 00:58:30,450 --> 00:58:33,319 to run Monte Carlo and sample randomly. 1117 00:58:33,319 --> 00:58:35,110 We're going to generate a bunch of samples, 1118 00:58:35,110 --> 00:58:38,060 and then we're going to use least squares fitting to try 1119 00:58:38,060 --> 00:58:41,357 to estimate those coefficients and come up with the response 1120 00:58:41,357 --> 00:58:41,940 surface model. 1121 00:58:44,630 --> 00:58:57,899 The next step is going to be how to determine the ai and the bij 1122 00:58:57,899 --> 00:58:58,440 coefficients. 1123 00:59:04,330 --> 00:59:15,070 So let's say that we have generated n samples, 1124 00:59:15,070 --> 00:59:22,218 and each sample is going to have-- 1125 00:59:22,218 --> 00:59:27,530 we'll call it j-- there's going to be a pair consisting 1126 00:59:27,530 --> 00:59:31,500 of the design variables that we sampled, 1127 00:59:31,500 --> 00:59:33,760 the value of x that we sampled, and that's again 1128 00:59:33,760 --> 00:59:37,500 the n by 1 vector, and then the corresponding value 1129 00:59:37,500 --> 00:59:39,220 of the objective. 1130 00:59:39,220 --> 00:59:44,590 So we'll have x1, j1, we'll have x2, j2, 1131 00:59:44,590 --> 00:59:49,570 all the way up to xn, jn. 1132 00:59:49,570 --> 00:59:53,210 So I ran my expensive model n times. 1133 00:59:53,210 --> 00:59:55,000 So I just collected all the data. 1134 00:59:55,000 --> 00:59:57,047 I collected all the conditions that I ran it at, 1135 00:59:57,047 --> 00:59:59,255 and I collected all the results that I got out of it. 1136 00:59:59,255 --> 01:00:00,740 Is that clear? 1137 01:00:00,740 --> 01:00:01,240 Yeah? 
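A sketch of collecting that training data in MATLAB. Here expensive_model is a hypothetical stand-in for whatever simulation is being surrogated, and lhsdesign (a Latin hypercube design from the Statistics and Machine Learning Toolbox) is just one possible design of experiments choice; any of the sampling methods mentioned last week would do.

    % Run the expensive model at N sampled designs and store the pairs.
    n = 2;  N = 10;                          % n design variables, N samples (N >= n+1 for a linear fit)
    X = lhsdesign(N, n);                     % N sampled designs in [0,1]^n, one design per row
    Jdata = zeros(N, 1);
    for k = 1:N
        Jdata(k) = expensive_model(X(k,:));  % one expensive run per sample (hypothetical function)
    end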
1138 01:00:09,570 --> 01:00:19,500 Let's think about this guy here, so just a first order 1139 01:00:19,500 --> 01:00:27,515 [INAUDIBLE] J of x equal a0 plus the sum from i equal 1 to n 1140 01:00:27,515 --> 01:00:31,100 of ai xi. 1141 01:00:31,100 --> 01:00:33,140 How many pieces of data do we need at a minimum? 1142 01:00:42,020 --> 01:00:45,640 How many unknown coefficients are there in the model? 1143 01:00:45,640 --> 01:00:46,260 n plus 1. 1144 01:00:46,260 --> 01:00:47,210 Little n plus 1. 1145 01:00:47,210 --> 01:00:50,189 So we need big N to be at least little n plus 1. 1146 01:00:50,189 --> 01:00:51,730 But in m what we would do is we would 1147 01:00:51,730 --> 01:00:55,140 generate more than little n plus 1, 1148 01:00:55,140 --> 01:00:56,660 and then we would do a regression. 1149 01:00:56,660 --> 01:00:58,890 We would do a least squares fit. 1150 01:00:58,890 --> 01:01:02,050 You guys did least squares fir somewhere, yes? 1151 01:01:02,050 --> 01:01:02,550 1806? 1152 01:01:02,550 --> 01:01:03,716 Did you do it anywhere else? 1153 01:01:03,716 --> 01:01:04,594 [INAUDIBLE] 1154 01:01:04,594 --> 01:01:05,970 AUDIENCE: [INAUDIBLE] 1155 01:01:05,970 --> 01:01:07,100 KAREN WILLCOX: High school. 1156 01:01:07,100 --> 01:01:09,474 Did everybody here remember their high school regression? 1157 01:01:09,474 --> 01:01:10,242 No? 1158 01:01:10,242 --> 01:01:11,117 AUDIENCE: [INAUDIBLE] 1159 01:01:13,457 --> 01:01:15,540 KAREN WILLCOX: You can do them on your calculator? 1160 01:01:15,540 --> 01:01:20,390 MATLAB backslash, like the most amazing operator? 1161 01:01:20,390 --> 01:01:20,890 OK. 1162 01:01:20,890 --> 01:01:23,360 So, well, let's think about what it would look like. 1163 01:01:23,360 --> 01:01:28,700 So I always like to think of the least squares fitting from just 1164 01:01:28,700 --> 01:01:30,070 the matrix system. 1165 01:01:30,070 --> 01:01:32,726 Let me draw it over here so there's plenty of room. 1166 01:01:39,210 --> 01:01:40,920 This is a matrix system, and if we 1167 01:01:40,920 --> 01:01:43,090 have exactly the right number of data points, 1168 01:01:43,090 --> 01:01:44,750 the matrix will be square. 1169 01:01:44,750 --> 01:01:47,060 And if we have more data points than unknowns, 1170 01:01:47,060 --> 01:01:50,510 the matrix will have more rows than columns, 1171 01:01:50,510 --> 01:01:52,630 and it will be an overdetermined system, 1172 01:01:52,630 --> 01:01:55,528 and you would do a least square solution. 1173 01:01:55,528 --> 01:01:59,280 So we'll set up a matrix system. 1174 01:02:05,120 --> 01:02:07,880 First of all, the unknowns, so the things we're solving for 1175 01:02:07,880 --> 01:02:16,670 are these [INAUDIBLE] a0, a1, a2, down to a n, little n. 1176 01:02:24,550 --> 01:02:25,910 Those are the unknowns. 1177 01:02:25,910 --> 01:02:28,690 And then each row in the matrix system, this 1178 01:02:28,690 --> 01:02:31,880 is going to correspond one of the sample points. 1179 01:02:31,880 --> 01:02:37,180 So the first sample point says that x-- 1180 01:02:37,180 --> 01:02:40,870 when I feed it in x superscript one into the model, 1181 01:02:40,870 --> 01:02:43,050 I get out J1. 1182 01:02:43,050 --> 01:02:47,770 So here's J1, and here's the expansion. 1183 01:02:47,770 --> 01:02:51,440 And when I feed in x1, so the first thing I do 1184 01:02:51,440 --> 01:03:04,600 is I get 1 times a0 plus a1 times x1 in the first sample 1185 01:03:04,600 --> 01:03:05,740 point. 
1186 01:03:05,740 --> 01:03:10,457 So this guy here is going to be x1,1 where the superscript is 1187 01:03:10,457 --> 01:03:12,540 the sample number, and the subscript is the design 1188 01:03:12,540 --> 01:03:13,206 variable number. 1189 01:03:13,206 --> 01:03:14,918 Yup. 1190 01:03:14,918 --> 01:03:22,233 And that's x2,1, xn,1. 1191 01:03:22,233 --> 01:03:23,730 Is everybody with me? 1192 01:03:23,730 --> 01:03:28,390 I just wrote down the equation 1 times a0 1193 01:03:28,390 --> 01:03:31,080 plus the first component of sample 1194 01:03:31,080 --> 01:03:33,970 1 times a1 plus the second component 1195 01:03:33,970 --> 01:03:35,830 of sample 1 times a2, plus, plus, 1196 01:03:35,830 --> 01:03:41,512 plus the nth component of sample a1 times a n equals to J1. 1197 01:03:44,950 --> 01:03:46,640 [INAUDIBLE] exactly does this equation 1198 01:03:46,640 --> 01:03:48,139 apply to the first [? here ?], and I 1199 01:03:48,139 --> 01:03:52,550 could write down so that this is the first sample, which 1200 01:03:52,550 --> 01:03:55,710 we called x superscript 1. 1201 01:03:55,710 --> 01:03:57,310 So then I would do the second sample, 1202 01:03:57,310 --> 01:03:58,716 and it would just look like this. 1203 01:04:02,684 --> 01:04:11,394 x2,1, x2,2, xn,2, and that's going to be equal to J2. 1204 01:04:11,394 --> 01:04:12,560 And then I would keep going. 1205 01:04:12,560 --> 01:04:17,810 So I'm always going to have 1's in this column, because that 1 1206 01:04:17,810 --> 01:04:20,197 is always going to multiply a0. 1207 01:04:20,197 --> 01:04:21,780 Down here eventually I'm going to have 1208 01:04:21,780 --> 01:04:25,200 x, first component of the nth sample, 1209 01:04:25,200 --> 01:04:28,210 and here I'll have x, nth component of the big Nth 1210 01:04:28,210 --> 01:04:31,860 sample, and everything in between. 1211 01:04:31,860 --> 01:04:33,690 Yup. 1212 01:04:33,690 --> 01:04:40,860 So this is a big N by little n plus 1 matrix. 1213 01:04:40,860 --> 01:04:42,950 This is a little n plus 1 vector, 1214 01:04:42,950 --> 01:04:53,310 and this is a big N vector. 1215 01:04:53,310 --> 01:04:54,670 Yeah. 1216 01:04:54,670 --> 01:04:57,140 And if we have exactly little n plus 1 samples, 1217 01:04:57,140 --> 01:04:58,640 again, this matrix would be square, 1218 01:04:58,640 --> 01:05:00,098 and we would just invert it, and it 1219 01:05:00,098 --> 01:05:01,830 would give us the coefficients. 1220 01:05:01,830 --> 01:05:05,460 If we have more samples than we have unknowns, then 1221 01:05:05,460 --> 01:05:07,910 again it's just a least squares solution, 1222 01:05:07,910 --> 01:05:10,440 which you can do with matrix with MATLAB's backslash, 1223 01:05:10,440 --> 01:05:16,370 or you can remember the formula involving ax equals b, 1224 01:05:16,370 --> 01:05:21,810 and it's overdetermined, then you do a transposed ax equals 1225 01:05:21,810 --> 01:05:24,630 a transposed b, and then take the inverse of this, 1226 01:05:24,630 --> 01:05:26,620 and take it to the lefthand side. 1227 01:05:29,390 --> 01:05:31,710 [INAUDIBLE] 1228 01:05:31,710 --> 01:05:34,687 OK, and if we're doing a quadratic response surface, 1229 01:05:34,687 --> 01:05:38,800 so if I was setting J hat of x is this plus this 1230 01:05:38,800 --> 01:05:48,201 plus the term, the b ij xi xj, how would the system change? 1231 01:05:48,201 --> 01:05:49,200 What would I have to do? 1232 01:05:55,360 --> 01:05:58,011 Would there be more unknowns? 1233 01:05:58,011 --> 01:06:04,780 Yeah, so there would be the b1,1, b1,2, b1,3, all the b's. 
1234 01:06:04,780 --> 01:06:09,630 So more unknown means more columns in the matrix. 1235 01:06:09,630 --> 01:06:11,900 So what would be-- if I put b1,1 here, 1236 01:06:11,900 --> 01:06:14,280 what entry would go in here? 1237 01:06:17,118 --> 01:06:18,540 AUDIENCE: [INAUDIBLE] 1238 01:06:18,540 --> 01:06:24,300 KAREN WILLCOX: X1,1 squared, because in the expansion, 1239 01:06:24,300 --> 01:06:27,640 it's b1,1 times x1 squared. 1240 01:06:27,640 --> 01:06:30,410 And we would be evaluating that for the first sample. 1241 01:06:30,410 --> 01:06:34,950 So we'd end up with x1,1 squared x1,1 times x2,1. 1242 01:06:34,950 --> 01:06:37,280 You'd end up with all the quadratic terms matched out. 1243 01:06:37,280 --> 01:06:40,042 So it looks just the same. 1244 01:06:40,042 --> 01:06:43,382 The regression's not really-- sometimes it 1245 01:06:43,382 --> 01:06:45,340 seems more complicated, but I think if you just 1246 01:06:45,340 --> 01:06:46,881 write out the matrix system and think 1247 01:06:46,881 --> 01:06:50,760 about it as an overdetermined set of equations, 1248 01:06:50,760 --> 01:06:51,700 it becomes clearer. 1249 01:06:51,700 --> 01:06:53,700 And then you can also see we're going to have more unknowns. 1250 01:06:53,700 --> 01:06:55,033 We're going to add more columns. 1251 01:06:55,033 --> 01:06:56,750 If we want it to stay overdetermined, 1252 01:06:56,750 --> 01:06:58,291 or at least determined, we would have 1253 01:06:58,291 --> 01:07:02,670 to get more samples, which is what we talked about. 1254 01:07:02,670 --> 01:07:08,970 So let me show you how that works. 1255 01:07:08,970 --> 01:07:09,732 Is this clear? 1256 01:07:09,732 --> 01:07:10,690 Is this notation clear? 1257 01:07:13,783 --> 01:07:14,658 AUDIENCE: [INAUDIBLE] 1258 01:07:19,122 --> 01:07:20,727 KAREN WILLCOX: Say it a bit louder. 1259 01:07:20,727 --> 01:07:21,602 AUDIENCE: [INAUDIBLE] 1260 01:07:27,177 --> 01:07:28,593 KAREN WILLCOX: Yeah, that's right. 1261 01:07:28,593 --> 01:07:31,880 And in fact, using a first order response surface 1262 01:07:31,880 --> 01:07:35,106 is a really bad idea, and we'll see it graphically. 1263 01:07:35,106 --> 01:07:37,740 So it turns out linear approximations in optimization 1264 01:07:37,740 --> 01:07:39,144 are not a good idea. 1265 01:07:39,144 --> 01:07:40,810 Maybe you can think about why that would 1266 01:07:40,810 --> 01:07:44,105 be with a linear approximation. 1267 01:07:44,105 --> 01:07:45,480 You think about what we're doing, 1268 01:07:45,480 --> 01:07:48,560 we've got this landscape, we build a linear approximation 1269 01:07:48,560 --> 01:07:50,410 and say "optimize." 1270 01:07:50,410 --> 01:07:53,137 Find a new point, and build a new linear approximation. 1271 01:07:53,137 --> 01:07:54,970 But where is the linear approximation always 1272 01:07:54,970 --> 01:07:58,575 going to send you? 1273 01:07:58,575 --> 01:08:02,880 It's always going to send you far away in some direction, 1274 01:08:02,880 --> 01:08:03,886 kind of by definition. 1275 01:08:03,886 --> 01:08:05,510 So I build a linear approximation here. 1276 01:08:05,510 --> 01:08:07,010 If it's sloping this way, it's going 1277 01:08:07,010 --> 01:08:09,480 to take you all the way to that side of the design space. 1278 01:08:09,480 --> 01:08:11,063 If it's sloping the other way, so it's 1279 01:08:11,063 --> 01:08:12,870 always going to send you [INAUDIBLE]. 1280 01:08:12,870 --> 01:08:14,495 So you could combine it with a strategy 1281 01:08:14,495 --> 01:08:17,479 that-- that's what a trust region does. 
1282 01:08:17,479 --> 01:08:19,830 But actually it turns out that quadratic approximation, 1283 01:08:19,830 --> 01:08:22,080 because they embed the information of curvature, which 1284 01:08:22,080 --> 01:08:26,552 we know is really important for optimality conditions, 1285 01:08:26,552 --> 01:08:34,542 are far, far better choices and more commonly used. 1286 01:08:38,240 --> 01:08:43,784 Put up on [? stellar ?], it goes through what we just 1287 01:08:43,784 --> 01:08:45,100 wrote on the board. 1288 01:08:45,100 --> 01:08:54,550 But Let me just get rid of that horrible genetic algorithm 1289 01:08:54,550 --> 01:08:58,809 and get to the right directory. 1290 01:09:10,194 --> 01:09:17,010 So I have a code here that is going 1291 01:09:17,010 --> 01:09:18,939 to build some response surface. 1292 01:09:18,939 --> 01:09:20,779 It's the same demo. 1293 01:09:20,779 --> 01:09:23,140 It's this peak problem, the landscape problem. 1294 01:09:23,140 --> 01:09:25,260 So it's going to grab some samples. 1295 01:09:25,260 --> 01:09:27,512 Maybe I'll just do it and we can look at the code. 1296 01:09:27,512 --> 01:09:29,470 And we're going to build some response surfaces 1297 01:09:29,470 --> 01:09:36,109 so you can-- so in this first example, again, here's 1298 01:09:36,109 --> 01:09:39,370 our landscape, variable x1, variable x2, 1299 01:09:39,370 --> 01:09:41,604 objective function. 1300 01:09:41,604 --> 01:09:43,270 We happen to generate these [AUDIO OUT]. 1301 01:09:46,372 --> 01:09:47,330 Maybe they were random. 1302 01:09:47,330 --> 01:09:50,609 Maybe we had some kind of design of experiments method to do it. 1303 01:09:50,609 --> 01:09:54,340 And three points, two dimensions, what can we fit? 1304 01:09:54,340 --> 01:09:56,960 What kind of a response surface can we fit? 1305 01:09:56,960 --> 01:09:59,110 Linear one, first order? 1306 01:09:59,110 --> 01:10:01,690 n plus 1? 1307 01:10:01,690 --> 01:10:03,760 We don't have enough to fit a quadratic. 1308 01:10:03,760 --> 01:10:07,050 So there is-- in effect, it's determined. 1309 01:10:07,050 --> 01:10:15,240 It's three unknowns, constant, slope in the x direction, 1310 01:10:15,240 --> 01:10:16,810 slope in the x2 direction. 1311 01:10:16,810 --> 01:10:18,470 So there's the response surface. 1312 01:10:18,470 --> 01:10:19,060 It's clean. 1313 01:10:19,060 --> 01:10:20,980 It goes exactly through those three points. 1314 01:10:20,980 --> 01:10:23,480 And you see kind of the problem that [? Tyrone ?] was asking 1315 01:10:23,480 --> 01:10:25,910 about, which is if we use this optimization, 1316 01:10:25,910 --> 01:10:27,700 it's going to send us-- and we maximize, 1317 01:10:27,700 --> 01:10:30,325 it's going to send us over into the corner of the design space. 1318 01:10:30,325 --> 01:10:33,470 Clearly it's not a good approximation. 1319 01:10:33,470 --> 01:10:35,930 So we could generate more points. 1320 01:10:35,930 --> 01:10:38,640 So now we have these six points. 1321 01:10:38,640 --> 01:10:40,500 And now that we have six points, we 1322 01:10:40,500 --> 01:10:42,250 could still set a linear model, but now it 1323 01:10:42,250 --> 01:10:44,050 would be an overdetermined system. 1324 01:10:44,050 --> 01:10:46,613 The matrix has got six rows, because there are six sample 1325 01:10:46,613 --> 01:10:47,137 points. 
1326 01:10:47,137 --> 01:10:49,220 And it's got three columns because there are still 1327 01:10:49,220 --> 01:10:51,470 three degrees of freedom, the constant, 1328 01:10:51,470 --> 01:10:55,550 the slope in the x1 direction, and the slope the x2 direction. 1329 01:10:55,550 --> 01:10:57,450 So now you still have a linear model, 1330 01:10:57,450 --> 01:11:00,910 but you can see that it doesn't go through the points. 1331 01:11:00,910 --> 01:11:02,680 And in fact, it's the least squares 1332 01:11:02,680 --> 01:11:04,616 is like the line of best fit, but we're 1333 01:11:04,616 --> 01:11:06,400 in multiple dimensions. 1334 01:11:06,400 --> 01:11:08,200 So it's like the plane of best fit. 1335 01:11:08,200 --> 01:11:12,360 Three of the points are above, and the other three 1336 01:11:12,360 --> 01:11:16,130 want to be sitting somewhere underneath here. 1337 01:11:16,130 --> 01:11:18,540 And maybe if you've taken 1806-- they're 1338 01:11:18,540 --> 01:11:19,540 sitting underneath here. 1339 01:11:19,540 --> 01:11:21,039 Maybe if you've taken 1806, you know 1340 01:11:21,039 --> 01:11:25,170 that this solution is going to be 1341 01:11:25,170 --> 01:11:28,380 the one that minimizes the sum of the squares of the two norm 1342 01:11:28,380 --> 01:11:29,120 distance. 1343 01:11:29,120 --> 01:11:35,310 So it is the line of best fit in that optimal sense. 1344 01:11:35,310 --> 01:11:37,900 That's what the least squares fit is. 1345 01:11:37,900 --> 01:11:38,400 OK. 1346 01:11:38,400 --> 01:11:40,720 So now we have six points, which turns out 1347 01:11:40,720 --> 01:11:43,900 to be n times n plus 1 over 2, which is how many points we 1348 01:11:43,900 --> 01:11:46,180 need to fit a quadratic model. 1349 01:11:46,180 --> 01:11:47,274 So we have six points. 1350 01:11:47,274 --> 01:11:48,940 And you think about the quadratic model, 1351 01:11:48,940 --> 01:11:49,648 what do you have? 1352 01:11:49,648 --> 01:11:52,070 You have the constant, a0, you have the linear terms, 1353 01:11:52,070 --> 01:11:59,180 a1 and a2, and then you have of b1,1, [INAUDIBLE] squared. 1354 01:11:59,180 --> 01:12:04,430 You have b2,2 for x2 squared, and then you have b1,2 for x1 1355 01:12:04,430 --> 01:12:05,420 x2. 1356 01:12:05,420 --> 01:12:09,930 You have six degrees of freedom to fit a quadratic model in 2D. 1357 01:12:09,930 --> 01:12:11,360 I wrote out the expansion. 1358 01:12:11,360 --> 01:12:13,660 So six points, six degrees of freedom. 1359 01:12:13,660 --> 01:12:15,860 Again, our matrix system would be 6 by 6, 1360 01:12:15,860 --> 01:12:17,820 so it would be uniquely determined. 1361 01:12:17,820 --> 01:12:20,390 And this is what the quadratic response surface looks like. 1362 01:12:20,390 --> 01:12:26,240 And as you can see, it's going exactly through those points. 1363 01:12:26,240 --> 01:12:31,604 OK, so now we're starting to get a little bit of curvature. 1364 01:12:31,604 --> 01:12:33,270 With a quadratic model, we can only ever 1365 01:12:33,270 --> 01:12:36,460 have one minimum or one maximum for the models 1366 01:12:36,460 --> 01:12:38,139 that look like this, whereas we know 1367 01:12:38,139 --> 01:12:40,180 that the design space we're trying to approximate 1368 01:12:40,180 --> 01:12:42,240 has got multiple blobs. 1369 01:12:42,240 --> 01:12:46,510 But a quadratic model;s not ever going to capture these multiple 1370 01:12:46,510 --> 01:12:47,436 hills in this case. 
1371 01:12:50,262 --> 01:12:52,220 I mentioned really briefly on one of the slides 1372 01:12:52,220 --> 01:12:54,094 that we could think about adapting the model. 1373 01:12:54,094 --> 01:12:57,279 So maybe as we're going through the optimization, maybe 1374 01:12:57,279 --> 01:12:58,820 we can make a genetic algorithm, even 1375 01:12:58,820 --> 01:12:59,760 though we're not supposed to. 1376 01:12:59,760 --> 01:13:02,009 We're figuring out that this is the interesting region 1377 01:13:02,009 --> 01:13:05,630 in the design space, so rather than having our points spread 1378 01:13:05,630 --> 01:13:09,240 out everywhere, we might be saying let's kind of only 1379 01:13:09,240 --> 01:13:13,370 on the points that have got good performance that we've 1380 01:13:13,370 --> 01:13:14,239 found so far. 1381 01:13:14,239 --> 01:13:16,280 So see all these points around here-- [INAUDIBLE] 1382 01:13:16,280 --> 01:13:18,040 this one does too because it's on the side 1383 01:13:18,040 --> 01:13:21,520 of that other little hill that sits over here. 1384 01:13:21,520 --> 01:13:25,090 So if you build a response surface using those points, 1385 01:13:25,090 --> 01:13:28,550 maybe you can see now that the model's really 1386 01:13:28,550 --> 01:13:30,704 pretty terrible over here, but it's actually 1387 01:13:30,704 --> 01:13:33,120 starting to be a pretty good approximation of what's going 1388 01:13:33,120 --> 01:13:35,390 on locally around those points. 1389 01:13:35,390 --> 01:13:36,880 And that's a common strategy would 1390 01:13:36,880 --> 01:13:40,616 be to build these local quadratic models in regions 1391 01:13:40,616 --> 01:13:42,740 where you've sampled and use them to make progress, 1392 01:13:42,740 --> 01:13:44,946 and then to keep updating them. 1393 01:13:44,946 --> 01:13:46,320 And even though a quadratic model 1394 01:13:46,320 --> 01:13:50,250 seems really simple for a real problem, 1395 01:13:50,250 --> 01:13:52,320 that strategy of adapting points and building 1396 01:13:52,320 --> 01:14:03,580 local models [AUDIO OUT] updating the model as we go 1397 01:14:03,580 --> 01:14:07,780 is actually a really, really powerful one. 1398 01:14:07,780 --> 01:14:09,890 And of course if we had more points-- 1399 01:14:09,890 --> 01:14:12,120 I don't think I've got one that's got more points. 1400 01:14:12,120 --> 01:14:15,510 If you had more than six points, than the quadratic model 1401 01:14:15,510 --> 01:14:18,053 again would be the quadratic model of this fit, where 1402 01:14:18,053 --> 01:14:22,250 it wouldn't go through all the points, 1403 01:14:22,250 --> 01:14:27,890 but it would cut through them on average equally. 1404 01:14:27,890 --> 01:14:28,685 OK, questions? 1405 01:14:34,030 --> 01:14:37,880 So that's a pretty, I think, neat use 1406 01:14:37,880 --> 01:14:39,830 of regression that's actually very powerful 1407 01:14:39,830 --> 01:14:46,796 and is used a lot in practice. 1408 01:14:46,796 --> 01:14:50,814 So that's one of the things that you need to be able to do. 1409 01:14:50,814 --> 01:14:53,210 So have a basic understanding of how 1410 01:14:53,210 --> 01:14:55,600 a design problem can be posed as an optimization problem. 1411 01:14:55,600 --> 01:14:57,300 What do I mean when I say constraint? 1412 01:14:57,300 --> 01:14:59,924 What I mean when I say objective function? 
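A sketch of that local-fit idea, under made-up choices: keep only the samples within some radius of the best design found so far, refit the quadratic there, and use that local model to propose the next point. The radius r is an assumed tuning parameter, and X, Jdata, N carry over from the earlier sketches.

    % Refit a quadratic using only the samples near the current best design.
    r = 0.3;                                          % assumed trust radius around the best design
    [~, ibest] = min(Jdata);  xbest = X(ibest, :);    % best sample found so far
    near = sqrt(sum((X - xbest).^2, 2)) < r;          % samples close to it
    x1 = X(near,1);  x2 = X(near,2);
    Aloc = [ones(nnz(near),1), x1, x2, x1.^2, x1.*x2, x2.^2];
    cloc = Aloc \ Jdata(near);                        % local quadratic fit (needs >= 6 nearby samples)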
1413 01:14:59,924 --> 01:15:01,590 Have a basic understanding of the steps, 1414 01:15:01,590 --> 01:15:03,132 and the gradient-based constrained 1415 01:15:03,132 --> 01:15:05,340 optimization algorithms are looking for the direction 1416 01:15:05,340 --> 01:15:10,165 and how the different methods do that. 1417 01:15:10,165 --> 01:15:13,120 Be able to estimate a gradient, a first order or second order 1418 01:15:13,120 --> 01:15:14,620 derivative using finite differences. 1419 01:15:14,620 --> 01:15:15,703 You could already do that. 1420 01:15:15,703 --> 01:15:18,365 You just did it in the context of PDEs. 1421 01:15:18,365 --> 01:15:20,650 Now we're talking about design problems. 1422 01:15:20,650 --> 01:15:24,260 And understand how you could construct a polynomial response 1423 01:15:24,260 --> 01:15:26,450 surface, either linear or quadratic, 1424 01:15:26,450 --> 01:15:28,940 using least squares regression. 1425 01:15:28,940 --> 01:15:31,144 Actually, we didn't talk about the quality of fit. 1426 01:15:31,144 --> 01:15:33,060 Do you guys remember what's the quality of fit 1427 01:15:33,060 --> 01:15:35,866 metric for regression models? 1428 01:15:35,866 --> 01:15:36,850 r squared. 1429 01:15:36,850 --> 01:15:41,335 Yeah, so r squared tells you how well your surface approximates 1430 01:15:41,335 --> 01:15:42,220 your data points. 1431 01:15:42,220 --> 01:15:44,469 But you have to be careful because it doesn't tell you 1432 01:15:44,469 --> 01:15:47,322 anything about how good it might be in regions where 1433 01:15:47,322 --> 01:15:50,190 you didn't sample. 1434 01:15:50,190 --> 01:15:50,690 All right. 1435 01:15:50,690 --> 01:15:51,356 Final questions? 1436 01:15:56,716 --> 01:15:58,090 You guys feel like you've learned 1437 01:15:58,090 --> 01:16:00,970 enough in this class already? 1438 01:16:00,970 --> 01:16:02,161 No? 1439 01:16:02,161 --> 01:16:02,660 OK. 1440 01:16:02,660 --> 01:16:07,050 So on Wednesday, I will do a review 1441 01:16:07,050 --> 01:16:10,194 of finite-- I think I'm going to focus only on finite element 1442 01:16:10,194 --> 01:16:12,318 methods and then the probabilistic and optimization 1443 01:16:12,318 --> 01:16:14,710 function. 1444 01:16:14,710 --> 01:16:17,180 If you have specific questions or topics 1445 01:16:17,180 --> 01:16:19,830 that you want me to cover in more detail than others, 1446 01:16:19,830 --> 01:16:21,610 then either bring them to class or you 1447 01:16:21,610 --> 01:16:23,250 can e-mail me ahead of time so that I 1448 01:16:23,250 --> 01:16:26,692 can prepare a little bit more. 1449 01:16:26,692 --> 01:16:28,650 But that will be a good chance to ask questions 1450 01:16:28,650 --> 01:16:31,900 about things that are confusing before the final. 1451 01:16:31,900 --> 01:16:32,630 All right. 1452 01:16:32,630 --> 01:16:35,370 See everybody on Wednesday.
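One last sketch, on the quality-of-fit check mentioned above: R squared computed from the fitted surface's residuals at the sample points, using the quadratic fit (A2, c) from the earlier sketch. As noted, it says nothing about how good the surrogate is in regions that were never sampled.

    % R^2: fraction of the variation in the sampled data explained by the surrogate.
    resid = Jdata - A2*c;                                          % residuals at the sample points
    R2 = 1 - sum(resid.^2) / sum((Jdata - mean(Jdata)).^2);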