The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

KAREN WILLCOX: OK everybody, let's get started. So if you came in late, project three is here. It's graded in here. You're going to have project two back tomorrow. Yeah.

The project threes were really good. The one thing that seemed to trip up-- well, there were a few things that tripped up some people, but the one thing that seemed to trip up a few people was just the computation of the number of samples needed in the Monte Carlo simulation to achieve the required stated accuracy of plus or minus 0.01 on the probability estimate. So some people got it really well. Some people got it, but their explanations were a little bit muddled in some of their words. And then some people didn't get it quite right. And I think it's a really important thing, and it's really important to make sure it's very clear in your mind.

And one thing that I would recommend is that whenever you're talking about mean and variance, or mean and standard deviation, you make sure it's very clear in your mind which random variable you're talking about, because just saying mean or just saying variance doesn't necessarily tell me or you anything when there are lots of variances floating around. In this particular example, there's the variance of the output of the simulation that we're trying to estimate, which is how far the ball flies. So how far the ball flies is a random variable that's got its own variance. But then the probability estimates themselves have got variances, because the probability estimates are random variables, because you're estimating them using Monte Carlo, which is a random process.
So when you're asked to get the probability estimate to within plus or minus 0.01 with 99% confidence, then that says we want to look at the variance of the probability estimate. How much does the probability estimate itself vary? And the variance of that estimator-- so what's that estimator? Let's call it p hat, the estimator of the probability. Well, we know, by the central limit theorem, that it's distributed as a normal distribution with mean equal to the probability that we're actually trying to estimate, p of A, and with variance p of A times 1 minus p of A, over n. Yup.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: We'll also put the lights on. That will help. Do you mind going into my office and grabbing some of the big chalk that's on-- great, thanks. I hate the little chalk. We'll get some bigger chalk. Can you see a bit better now?

So this is the variance, but it's the variance of what? It's the variance of this estimator, p hat of A, and this is the mean of p hat of A. So again, any time you write variance: variance of what? Variance of what random variable? And I sort of got the feeling some people knew it had to be this thing, but were defining this thing as the variance and then just randomly or magically dividing by n. So make sure that this is really clear in your mind, that the estimator itself is a random variable. And then in order to figure out how many samples to take, we've got to control the variance of the estimator. And because we know this thing is normally distributed, and we want 99% confidence-- that's plus or minus 3 sigma, or plus or minus 2.58 sigma, as some of you computed-- we need to make sure that this sigma, the square root of this, times 3 or times 2.58, falls within plus or minus-- is less than 0.01.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Less than here?

AUDIENCE: Yeah.
KAREN WILLCOX: So that's through the central limit theorem, because we know that we can write the estimate of the probability-- remember, we could write it as 1 over n times the sum from i equals 1 to n of the indicator I of a i.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah, so this is another important thing to keep in mind: the output itself doesn't have to be normally distributed, but the estimator for the probability does have a normal distribution. And the reason it does is because we can write it as-- this is like a sample mean of an indicator function, something that's either 0 or 1. And so by the central limit theorem, this thing, for big enough n, has a normal distribution. The same is true for the estimator for the mean, because that's 1 over n times the sum from i equal 1 to n of-- we called this thing y, and I think we called the estimator y bar in the notes. The distribution of y bar, the estimator for the mean, also has a normal distribution. So remember, we saw estimators for the mean, for the probability, and for the variance. These two have normal distributions regardless of the distribution of the output, but the variance estimator doesn't. Is that clear?

And again, that's why I think it's important to say variance of what, or distribution of what. This is the distribution of the random variable which is the estimator of the probability. It's got nothing to do with the variance of the range of the ball in this case, or even the shape of that distribution. But what it does depend on is the probability we're trying to estimate. And so we use this result to figure out how many samples to take, but we don't have an estimate for this guy. So a trick that helps is to realize that we know the worst case for this. So what value of p will give us the biggest variance? p equal 1/2? If p is 1/2, then 1/2 times 1/2 is 0.25. That's the biggest value that p times 1 minus p can take. There's 0 at either end and 0.25 in the middle.
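Just to make that arithmetic concrete, here is a minimal MATLAB sketch of the worst-case sample-size calculation; the variable names are only for illustration:

% Worst-case sample size for a Monte Carlo probability estimate.
% We require z*sigma <= tol, where sigma = sqrt(p*(1-p)/n) is the
% standard deviation of the estimator p hat.
tol = 0.01;        % required accuracy on the probability estimate
p   = 0.5;         % worst case: p*(1-p) is largest at p = 1/2
z   = 3;           % ~99% confidence (2.58 is the exact value; 3 is conservative)
n   = ceil(z^2 * p*(1-p) / tol^2)   % 22,500 with z = 3 (16,641 with z = 2.58)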
So one strategy is to compute this using p equal 1/2, which is the worst case, and I would just do plus or minus 3 sigma. It's a little bit conservative, and that gives you something like 22,500 as the n that you need. And then you know you'll be OK for anything. And it turns out for one of them, the probability came out to be like 0.52 or something. And that's essentially the worst case, so you'd only be a little bit conservative here, but that's fine. You'd still be meeting your plus or minus 0.01.

So just make sure that that's really clear in your mind as you go through it-- that you're clear about how all these different pieces fit together, about the distributions and what is normally distributed and what isn't-- and then you can work from this variance to the plus or minus 3 sigma that gives 99% confidence, and then translate that into an n. OK. So if you didn't get it quite right, take a look at it. And then if it's not clear, I'm going to do sort of a summary of everything on Wednesday, and we can go through it again in more detail then if you want to. And I'll also post some solutions tonight or tomorrow. But any other questions?

OK. So today we're going to finish talking about unconstrained optimization. We'll just pick up where we left off last Wednesday. And we'll talk a lot about how we compute gradients. We might talk a bit about the 1D search if we have time. And then we'll finish up by talking about response surface modeling.

If you remember where we left off last Wednesday, first up, this is the basic optimization problem we've been talking about. Remember, minimize some objective J that's a function of design variables x, where we have x1, x2, up to xn design variables, and parameters p, subject to inequality constraints that we write as gj less than or equal to 0, and equality constraints that we write as hk equal to 0.
And in general, we might have m1 inequality constraints and m2 equality constraints, and then we might also have bounds on the design space. So we saw this last Wednesday. And we also talked about the general process of optimization. And remember, I said think of it as a landscape, where the directions in the landscape are the design variables, and the height of the landscape represents the objective function. If you're maximizing, your goal is to get to the top of the highest mountain or highest hill in the landscape. If you're minimizing, your goal is to get to the bottom of the lowest valley. And the way that that works is that we start with some initial guess that we denote x with a superscript 0. That's the initial guess. We look around. For a gradient-based method, we're also going to compute the gradient of the objective at the point where we're standing in the landscape. Look around, find the search direction. Once we've found that search direction, look along that search direction. Perform what's called the 1D search and take a step alpha. Compute a new design, figure out whether we've converged, and keep going. So we're walking around the landscape, iterating with our design variables.

And what we talked about on Wednesday was the steepest descent method, which says always choose the search direction to be the negative of the gradient of J. So that's the direction of steepest descent. And we also saw conjugate gradient, where we modify the search direction according to where we have come from. And remember that we saw that steepest descent-- in this plot where the objective function is represented with the contours, the steepest descent direction is always perpendicular to the contours. And we saw that as we approached the optimum, steepest descent starts to take this zig-zagging path with really, really small steps. Whereas conjugate gradient, which modifies each search direction depending on the previous search direction, is able to converge much more quickly. And in this case, it's just [? a quick consideration. ?]
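As a rough illustration of that loop, here is a minimal steepest-descent sketch in MATLAB; the quadratic objective, fixed step size, and tolerance are all made up for illustration, and a real implementation would do a proper 1D search for alpha:

% Basic gradient-based loop: x^(q+1) = x^q + alpha*S^q, with S^q = -grad J(x^q).
J     = @(x) (x(1)-1)^2 + 10*(x(2)+2)^2;     % illustrative objective (a simple bowl)
gradJ = @(x) [2*(x(1)-1); 20*(x(2)+2)];      % its analytic gradient
x     = [5; 5];                              % initial guess x^0
alpha = 0.04;                                % fixed step (stands in for the 1D search)
for q = 1:200
    S = -gradJ(x);                           % steepest-descent search direction
    x = x + alpha*S;                         % take the step
    if norm(gradJ(x)) < 1e-6, break; end     % converged when the gradient is ~0
end
x                                            % approaches the minimizer [1; -2]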
OK, so that's where we stopped on Wednesday.

So I want to mention just really quickly a couple of other methods. So if I go back-- we talked about first-order methods that use just gradient information, and those were steepest descent and conjugate gradient. There are also second-order methods that use not just the first derivative, but also second derivative information. And remember, we also derived on Wednesday that the second derivative of J is an n by n matrix. That was the Hessian matrix, the matrix of second derivatives that has the pure second derivatives on the diagonal and then all the cross terms on the off-diagonals.

So how does Newton's method work? Well, if you think about a Taylor series-- and again, we derived this on Wednesday-- the Taylor series in the case of a scalar objective that depends on a vector of design variables, so x is an n-dimensional vector: if we expand J of x about the point x0, then the first term is the gradient times delta x, where delta x is x minus x0. And then remember, the second term was this quadratic term that looked like delta x transposed times the Hessian times delta x. Remember, we did that on Wednesday. And this is an approximation, because we're truncating all the third-order terms and higher. So we can use this Taylor series expansion, and we can differentiate both sides. So grad J of x can be approximated by grad J of x0 plus the Hessian evaluated at x0 times delta x. Yeah. And what are we looking for? We're looking for a place where the gradient of J is 0. Remember, that was the condition for finding a minimum or a maximum, or it could also be a saddle point. So if we set this left-hand side equal to 0, then what's left is this condition here.
And so now we can rearrange and say that delta x, the step that we take, is going to be the inverse of the Hessian at the point we're standing at, times the gradient of the objective at that point, with a minus sign.

OK, so remember [INAUDIBLE] in the steepest descent method, we said that the search direction was just equal to minus grad J. And we're writing that x at q plus 1 is equal to x at q plus alpha q-- let me write that as a subscript-- times S at q. OK. So that's what we've been saying: the new guess of the design is the old one plus [INAUDIBLE] term. In the steepest descent method, that search direction is just minus the gradient of J. So now you can see that with Newton's method, we have that the delta x, which is this guy here, the alpha q times this direction S q, is the negative of the inverse of the Hessian times the gradient of J.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah, so we'll have to check that when we converge, when we get to-- we'll check that on convergence. Yeah. Typically you don't check it as you go along, but it actually depends a little bit on the algorithm. In the next ones that you'll see, you'll see that the desire for the Hessian to be positive definite is worked in.

OK, so how does Newton's method work? This says S, but it really should be alpha times S, in the way I defined the update in delta x. So maybe you can see that if you're trying to optimize a quadratic function, this expansion is exact. There's no approximation here. And in that case, Newton's method would converge in one step. So that means that if I were trying to optimize some kind of a quadratic function, where the landscape would be like a bowl, no matter where I started, I would get to the optimum in one step with Newton's method. But if the function's not actually quadratic, then what we're basically doing is computing an approximation of the function here as a quadratic.
So however it would look-- it would look maybe something like this-- we'd be approximating here. We would solve Newton's method. We would jump to there. We would create a new approximation locally that might look like this, and then we would come in, and eventually we would converge. So if J is not quadratic, you just keep performing Taylor series expansions, getting a new point, and then repeating until it's converged. So this turns out to be a really efficient technique if you start near the solution, if you're starting sort of close to one of the local bowls of attraction, if you're minimizing. But the problem is that you can see that now we need not just gradients, but we also need the second derivatives. I'm going to talk in a second about how to estimate the gradients, but in many cases actually getting this Hessian ends up being much too expensive.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: How do we define convergence?

AUDIENCE: [INAUDIBLE].

KAREN WILLCOX: Yeah, so for an unconstrained problem, remember, we're looking for two conditions. We're looking for the gradient of J equal to 0. And we're looking for the Hessian to be positive definite-- or positive semi-definite, depending on the uniqueness. That's in theory. It turns out that in implementation, it can be a little bit tricky to monitor convergence. So often you would look for the sequence of the design variables to stop changing-- in other words, when the updates become really small. And that's when you would terminate, and then you would try to check the optimality condition.
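Here is a minimal MATLAB sketch of that Newton iteration, just to connect the pieces; the objective and starting point are invented, and a real implementation would also safeguard against a Hessian that isn't positive definite:

% Newton's method: delta x = -inv(H)*grad J, i.e., x^(q+1) = x^q - H(x^q) \ gradJ(x^q).
J     = @(x) (x(1)-1)^2 + 10*(x(2)+2)^2;     % quadratic example, so one step suffices
gradJ = @(x) [2*(x(1)-1); 20*(x(2)+2)];
H     = @(x) [2 0; 0 20];                    % Hessian (constant for a quadratic)
x = [5; 5];                                  % initial guess
for q = 1:20
    dx = -H(x) \ gradJ(x);                   % Newton step (backslash, not an explicit inverse)
    x  = x + dx;
    if norm(gradJ(x)) < 1e-8, break; end     % terminate when the gradient is ~0
end
x, q                                         % for this quadratic J, it converges at q = 1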
So let's talk about how to estimate the gradient. And this actually will connect back to some stuff that you saw earlier in the semester. So we're onto number two, which is computing gradients. So generally, we're definitely going to need grad J, which, remember, was the partial derivatives of J with respect to all of the design variables. And we might need the second derivatives. If we wanted to do the Newton method, then we might also need the second derivatives. And I won't write it out, but again, remember that was the n by n matrix of second-order partials.

So, methods to compute gradients. Any ideas? Finite differences, yup. So that's finite differences, which we'll talk about. That's probably the method most commonly used in practice. Any other ideas? If you could, what would be the most reliable way to get gradients? Differentiate, yeah. So analytical gradients, if you can. If you have a code, or if you have a model that's analytic-- that's all functions you can differentiate-- then differentiate. Analytic gradients are definitely the way to go.

Have you guys heard of the adjoint method that Professor Wong talked about? [INAUDIBLE] So this is a really powerful way to get gradients when you have, particularly, CFD models or [? panelement ?] models that have got lots and lots and lots of design variables. So if you're trying to do shape optimization of an aircraft wing and your design variables were all the surface points, or all the properties of the sections in a wing, so that you have hundreds and hundreds of design variables, people use what's called the adjoint method to come up with gradients.

Has anybody heard of automatic differentiation? So this is something that's becoming increasingly popular, where people write these automatic differentiation codes. Basically, you take your code-- so you take your code that takes in x and puts out J, and you feed it through this automatic differentiator, and it gives you back a code that takes in x and gives you grad J. It's sort of somewhere at the intersection of computer science and computational science, I guess. They write these codes that will go through your code and differentiate it line by line by line, and use the chain rule, and basically automatically differentiate it. And so there's an example of this in [? EETS ?]. There's a big model called the MIT GCM, which is the MIT general circulation model.
I don't know if anybody's heard of it. Nicholas, you work over in [? EETS ?], right? Have you--

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: MIT's been building for a long time this really massive community model that's doing a lot of modeling of the ocean and atmosphere and their interactions for climate change. And so they've used automatic differentiation so that you can take this massive model and come up with gradients. And it's symbolic differentiation too, right? Isn't it Mathematica? There's a lot of different ways you can come up with gradients.

[? And there's ?] [? another one ?] [INAUDIBLE] something called the complex step method, which I'm afraid we don't have time to talk about. It's a pretty neat trick. I can't remember the formula. Do you know the formula, Alex?

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah. I don't want to write it down and get it wrong, but you can take your model, your J of x, and put in a complex x-- maybe you can find the formula. Anyway, it's a really neat trick where you can pretend that your variables are complex, and then it turns out that the gradient ends up being the imaginary part. So that's also neat. And if your code's written in Fortran, this can be a neat trick. Do you guys even know Fortran? You know, Professor [? Jolley ?] still writes all of his codes in Fortran.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: You're painfully aware? So in Fortran, you can define complex variables.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: The imaginary part of J of x plus i times h is equal to the gradient?

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Divided by h is equal to the gradient. Yeah, so [INAUDIBLE]. It turns out if you take a complex step-- if you take your variable x and add a small imaginary step i h, evaluate J there, and then divide by that h-- the imaginary part of that result ends up giving you the gradient. OK, but anyway.
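For the curious, here is a tiny MATLAB sketch of that complex-step idea; the test function is just for illustration. The estimate is imag(J(x + i*h))/h, and because there is no subtraction of nearly equal numbers, h can be made extremely small:

% Complex-step derivative estimate: dJ/dx ~ imag(J(x + i*h)) / h.
J  = @(x) exp(x) ./ sqrt(x);                 % illustrative function
x  = 1.5;
h  = 1e-20;                                  % far smaller than any finite-difference step
dJ = imag(J(x + 1i*h)) / h                   % complex-step estimate of the derivative
dJ_exact = exp(x)/sqrt(x) - 0.5*exp(x)*x^(-1.5)   % exact derivative, for comparison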
Finite differences are the most commonly used because they're sort of the general-purpose option. So let's just look at how finite differences work. And these formulas should look somewhat familiar, although you saw them in a different setting. Remember, you saw finite differences when we were approximating partial derivatives in the PDEs, right? Remember, way back? You were looking at the convection equation or the convection-diffusion equation, where you had things like du dx, or [INAUDIBLE] squared [INAUDIBLE] squared, and you used finite differences? The exact same idea can be applied now to find the gradient of J with respect to the design variables x.

So for example, if we were looking for dJ dxi at some point-- let's just call it x0-- then we would just take J, and we'd have x1, x2, all the way along, and we would have xi plus delta xi. So we're going to perturb the ith design variable. And then we would have xi plus 1, all the way up to xn. They would all be fixed. And we're going to difference that with the baseline point, x1 up to xn. And all of that is going to be divided by delta xi. And so this thing here is the numerator. So this is the baseline point-- the x0 would be right here. So there's the baseline point. And then if we want to estimate the gradient in the direction xi, we just take xi and we perturb it by delta xi, and take the difference between the function at the perturbed point minus the baseline, divided by delta xi. Yeah. So again, it looks pretty similar to what you saw when we talked about finite differences earlier. So this would be a one-sided difference.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Different ways of computing the finite difference? Yeah, exactly. So that's a one-sided.
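Here is a minimal MATLAB sketch of that one-sided finite-difference gradient; the objective and step size are placeholders. It also makes the cost easy to see: one baseline evaluation plus one perturbed evaluation per design variable, n plus 1 in total:

% One-sided finite difference: dJ/dxi ~ (J(x0 + dxi*ei) - J(x0)) / dxi.
J  = @(x) (x(1)-1)^2 + x(1)*x(2) + 3*x(2)^2;  % placeholder objective
x0 = [2; -1];                                 % baseline point
n  = numel(x0);
dx = 1e-6;                                    % perturbation size (would need tuning in practice)
J0 = J(x0);                                   % 1 baseline evaluation
g  = zeros(n,1);
for i = 1:n
    xp    = x0;
    xp(i) = xp(i) + dx;                       % perturb only the ith design variable
    g(i)  = (J(xp) - J0) / dx;                % 1 extra evaluation per variable: n+1 total
end
g                                             % finite-difference estimate of grad J at x0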
You could also do a two-sided difference for the same partial. So it would be dJ dxi, evaluated at the point x0, could be J of x1, x2, up to xi plus delta xi, minus J of x1, x2, up to xi minus delta xi, all divided by 2 delta xi. [INAUDIBLE] central difference. So you can evaluate the derivative at the same point using the point on one side, or you could use one little step forward and one little step back, and do it like a central difference. And if you wanted to do higher order, you could take more points.

But now here's the catch. So if we want to estimate-- compute-- grad J at whatever the current point is in the design space, how many function calls do we need if we use a one-sided difference? How many times do we need to run our model? 2n, if we were to use this one. How about this one? Well, almost-- 2n? n plus 1? So 1 for the baseline, and then it's going to be a little step in x1, a little step in x2, a little step in x3, all the way through xn. So with a one-sided difference, we would need n plus 1. And for a two-sided, we would need 2n-- we don't have the baseline. Function evaluations.

Now remember, each function evaluation is a run of your simulation code, which could take minutes or hours or even days. So if you have 100 design variables, each time you want a gradient, you're going to have to do on the order of 100 calls to your function. And remember, you're going to need gradients at each point in the design space when you're trying to figure out where to move. So it can get expensive pretty quickly.

How about if you wanted to compute the Hessian by finite differences? Well, we didn't write down the formula, but you remember the formula for the second derivative, right? If we were computing that guy, then it would be J-- and we would do xi-- probably don't want to write out all the arguments.
J at xi plus delta xi, minus 2 J at the nominal point, plus J at xi minus delta xi-- do you remember that formula? Divided by delta xi squared. The central finite difference for the second-order derivative. Do you guys remember that? Am I confusing you, or is it OK to not write out all the x's? Is that all right? Yeah.

So to compute a second derivative, first of all, you're going to need the point in front and the point behind. But how many entries in the matrix?

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah, it's n by n. It turns out it's symmetric, so you don't need-- remember, it's symmetric, so you only need to compute the diagonal and the upper triangle. That still turns out to be n times n plus 1, over 2, entries, and then you need the extra function evaluations, so you would need two for each of these. So it basically ends up being [INAUDIBLE] n squared. And so if you have 100 design variables, that's 100 squared-- 10,000 calls to your function to get a Hessian matrix, on that order, every iteration. Which is why I say the Newton method is powerful, but if you're doing gradients by finite differences, it almost means you can't afford to use the Newton method.

But the good news is that MATLAB takes care of all of this for you. We can have a look at how MATLAB does that for you. Is this clear, how you estimate the gradients? Yup?

So let me pull this up. [INAUDIBLE] So I showed you already on Wednesday this little [? piece ?]. You might remember when we looked at the landscape and we saw the little asterisks climbing around. So I showed you [INAUDIBLE], but here's the call. So remember, I talked about fminunc, which is the unconstrained gradient-based optimization method in MATLAB. And I also talked about fminsearch, which I said was the Nelder-Mead simplex. Remember, that was the triangles that were flipping around and moving through the design space. So the thing with the Nelder-Mead
simplex is that you don't need gradients. It doesn't need gradients. It just does these triangles and all these rules to contract them. It's only ever looking at the three points and their ordering in the design space. So if you have a function that's not differentiable, you can still use fminsearch. But if you have something that's not differentiable and you try to use fminunc, MATLAB's going to be trying to compute gradients-- not second derivatives, but first derivatives-- and could get itself into trouble.

But the really nice thing with these MATLAB functions is that you basically supply a function that, given x, computes J. And you supply an initial guess, an initial point in the landscape, and that's actually the only thing you have to give it. If you don't want to give it anything more, MATLAB will take care of everything else for you. You can give it just that black-box function-- given x, compute J-- and an initial guess. If you know a little bit more about what's going on, you can also choose to manage the options. So you can do things like turn on the display to see more information about the iterations. You can play with tolerances. You can even specify exactly which methods you want to use.

So here's the documentation for fminunc. You see there, there's just the basic call, giving it a function and an initial guess. And you see all these different options in here. So you have the option to supply a gradient. So if you had a function that was analytic and you could differentiate it, it would be a really good idea to give that gradient to MATLAB as well as giving it the function. Or if you were to use automatic differentiation and come up with a code that computed gradients, then you're able to specify that as an additional option.

And the last thing I just want to show you is the methods. Check it in here. I'm not sure if it's telling us what the methods are. [INAUDIBLE] and have a look, see what method it is.
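In its simplest form, that call looks something like the sketch below; the objective here is a placeholder, and the option names can vary a bit between MATLAB versions:

% Minimal fminunc usage: a function handle mapping x to J, plus an initial guess.
J  = @(x) (x(1)-1)^2 + 10*(x(2)+2)^2;        % placeholder objective
x0 = [5; 5];                                 % initial guess
[xopt, Jopt] = fminunc(J, x0);               % MATLAB does the finite differencing for you

% Optionally, turn on iteration output and tighten tolerances:
opts = optimset('Display', 'iter', 'TolX', 1e-8);
[xopt, Jopt] = fminunc(J, x0, opts);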
643 00:35:03,546 --> 00:35:05,540 So again, I didn't talk about them, 644 00:35:05,540 --> 00:35:09,240 but there's a class of methods called quasi Newton 645 00:35:09,240 --> 00:35:11,570 methods, which in a way are kind of 646 00:35:11,570 --> 00:35:15,300 like the best of both worlds between the first order 647 00:35:15,300 --> 00:35:18,480 and the second order methods. 648 00:35:18,480 --> 00:35:22,630 So if you remember here back in the Newton's method, 649 00:35:22,630 --> 00:35:27,980 [INAUDIBLE] design space to be the inverse of the Hessian 650 00:35:27,980 --> 00:35:29,050 times the gradient. 651 00:35:29,050 --> 00:35:31,210 But then we said that [? computation ?] was really 652 00:35:31,210 --> 00:35:32,095 too expensive. 653 00:35:32,095 --> 00:35:34,180 So what a quasi Newton method does 654 00:35:34,180 --> 00:35:36,280 is that it basically introduces an approximation 655 00:35:36,280 --> 00:35:36,946 for the Hessian. 656 00:35:36,946 --> 00:35:39,250 That's this A thing here. 657 00:35:39,250 --> 00:35:40,850 And it builds up this approximation 658 00:35:40,850 --> 00:35:42,970 using the gradients that it's computing anyway. 659 00:35:42,970 --> 00:35:45,025 So don't worry about the details of what 660 00:35:45,025 --> 00:35:47,150 is on here, because it's a little bit beyond what I 661 00:35:47,150 --> 00:35:50,030 want you guys to be comfortable with. 662 00:35:50,030 --> 00:35:54,120 But more or less every time we compute a gradient, 663 00:35:54,120 --> 00:35:56,620 we can also use differences between gradients 664 00:35:56,620 --> 00:35:59,420 to get approximations to Hessians. 665 00:35:59,420 --> 00:36:04,090 And I'm pretty sure that's the method that's sitting 666 00:36:04,090 --> 00:36:05,589 underneath here in MATLAB. 667 00:36:05,589 --> 00:36:06,464 AUDIENCE: [INAUDIBLE] 668 00:36:09,835 --> 00:36:10,960 KAREN WILLCOX: [INAUDIBLE]. 669 00:36:10,960 --> 00:36:13,508 OK. 670 00:36:13,508 --> 00:36:14,383 AUDIENCE: [INAUDIBLE] 671 00:36:27,839 --> 00:36:29,880 KAREN WILLCOX: So we're not guaranteed because it 672 00:36:29,880 --> 00:36:31,710 depends on the problem. 673 00:36:31,710 --> 00:36:34,320 So if you have a problem where there's a direction where 674 00:36:34,320 --> 00:36:35,856 the curvature is flat, then you're 675 00:36:35,856 --> 00:36:37,230 going to have a singular Hessian. 676 00:36:37,230 --> 00:36:38,771 And then you just have to be careful. 677 00:36:38,771 --> 00:36:40,695 AUDIENCE: [INAUDIBLE] Yeah. 678 00:36:40,695 --> 00:36:44,560 If J were linear in any design variable, when 679 00:36:44,560 --> 00:36:46,390 you differentiate twice you would get 0. 680 00:36:46,390 --> 00:36:51,230 And you have to be very careful if you have [INAUDIBLE] inverse 681 00:36:51,230 --> 00:36:52,700 of the Hessian. 682 00:36:52,700 --> 00:36:54,412 But there are ways you can handle that, 683 00:36:54,412 --> 00:36:57,650 more advanced optimizations. 684 00:36:57,650 --> 00:36:59,344 OK. 685 00:36:59,344 --> 00:37:02,410 So I'm not going to talk about constrained methods, 686 00:37:02,410 --> 00:37:08,440 but I did want to just show you briefly, and show you also 687 00:37:08,440 --> 00:37:13,904 in MATLAB how constrained methods work. 688 00:37:13,904 --> 00:37:16,070 So there's something called the [? buns ?] function. 689 00:37:16,070 --> 00:37:17,010 Actually the optimization community 690 00:37:17,010 --> 00:37:18,630 loves these little simple problems. 691 00:37:18,630 --> 00:37:20,240 Remember I talked about [? Rosenbrot ?] 
So I'm not going to talk about constrained methods, but I did want to just show you briefly, and show you also in MATLAB, how constrained methods work. So there's something called the [? buns ?] function. Actually, the optimization community loves these little simple problems. Remember, I talked about the Rosenbrock function the other day and called it the banana function. So there's another one called the [? buns ?] function, which is actually a constrained problem. So it turns out this is the objective, which I know doesn't really mean anything, but the difference with this one is that it's got constraints. And so if we look at the visualization, here's the design space, design variables x1 and x2. And now the contours-- these are contours of the objective function. And then the constraints are saying the solution has to lie above this one, has to lie to the left of this one, and has to lie above this constraint. So again, when I was talking about the landscape, remember I said think of the constraints as being like the keep-out regions, telling you where in the design space you can be. And so in this case, our feasible design region is this thing here, traced out by the constraints.

So it turns out this problem has two local solutions. One is also the global solution. If we try to minimize, the global solution is actually up here in the corner of the design space, which you can see from the colors-- it's the bluest color. But there's also a local optimum that sits here on this constraint, down here at about 50, 20. And I should say that there are bounds on the design variables, which is why this global solution sits up here, because the design variables are at their bounds. So that's a simple constrained problem.

And I just want to show you running the algorithm. So fmincon is the MATLAB function that does constrained optimization. And again, it's really super simple to use. At a minimum, all you have to do is give it a function that, given x, returns J of x; give it an initial guess to start with; and give it a function that, given x, returns the constraints, the g of x or the h of x, if you have them. So given x, return J of x, return g of x and h of x, and an initial guess. And there are lots more options that you can set if you want to.
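A minimal sketch of that call might look like the following; the objective, constraint, and bounds are invented just to show the interface, and the empty brackets stand in for the linear-constraint arguments that aren't being used:

% Minimal fmincon usage: objective, initial guess, bounds, and a nonlinear constraint.
J  = @(x) (x(1)-1)^2 + 10*(x(2)+2)^2;                 % placeholder objective
x0 = [0; 0];                                          % initial guess
lb = [-5; -5];  ub = [5; 5];                          % bounds on the design variables

% Nonlinear constraints: return [g(x), h(x)] with g(x) <= 0 and h(x) = 0.
nonlcon = @(x) deal(x(1)^2 + x(2)^2 - 9, []);         % g(x): stay inside a circle of radius 3

[xopt, Jopt] = fmincon(J, x0, [], [], [], [], lb, ub, nonlcon);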
And you can look at lots more things. But in the simplest implementation, MATLAB will take care of the finite differencing for you to do the gradient-- the step size, everything you would otherwise have to worry about. But of course, again, if you have an efficient way of computing gradients, you have the option to specify the gradients and make things run a lot more reliably and a lot more efficiently.

So just to go back up here-- this one's actually using-- this is a quasi-Newton method. There's also a Newton method that [INAUDIBLE]. I think there are actually three levels of optimization in fmincon: two different kinds of quasi-Newton method, and there's a Newton method that's in there. In order to run the Newton method, you have to actually supply a gradient for it, so that it doesn't do all that finite differencing for the Hessian.

And what we can look at-- what we're looking at here are different initial conditions. So here's that [? buns ?] function that I was showing you. And in this particular case, we initialize here, and what's being plotted are the steps that the optimizer's taking in the design space. I'm not going to talk about the algorithms at all, but just to give you a sense that when you have constraints, things are much more difficult, because it's not just about going downhill. Here we could just go downhill, and you can see more or less this looks kind of like a steepest descent direction. But then when you get here, and you're actually right on the wrong side of the constraint, you can't just go downhill, because you've got to worry about the constraint. And you can see what it's doing is it's kind of following the constraint and skipping around. So things get a lot harder when you have constraints, because you have to worry about descent, but also about satisfying the constraints. But again, if you take an optimization class, you'll learn lots of stuff there.

AUDIENCE: [INAUDIBLE]

KAREN WILLCOX: Yeah.
784 00:41:35,440 --> 00:41:37,900 So it depends on the algorithm. 785 00:41:37,900 --> 00:41:40,810 Some algorithms will allow it to violate the constraints 786 00:41:40,810 --> 00:41:42,310 a little bit along the way. 787 00:41:42,310 --> 00:41:44,150 Some will not. 788 00:41:44,150 --> 00:41:45,930 And there are all lots of different ways 789 00:41:45,930 --> 00:41:46,930 of handling constraints. 790 00:41:46,930 --> 00:41:48,596 There are things called penalty methods. 791 00:41:48,596 --> 00:41:50,570 There are things called barrier methods. 792 00:41:50,570 --> 00:41:53,100 There are interior point-- the barrier methods related 793 00:41:53,100 --> 00:41:56,210 to interior point methods that force things to stay away 794 00:41:56,210 --> 00:41:57,650 inside the constraints. 795 00:41:57,650 --> 00:42:01,760 It just depends on what algorithm you use. 796 00:42:01,760 --> 00:42:03,919 But that sort of relates to the next part 797 00:42:03,919 --> 00:42:05,460 that I want to show you, which is now 798 00:42:05,460 --> 00:42:07,530 when you think about convergence, 799 00:42:07,530 --> 00:42:09,910 so here is the objective function value. 800 00:42:09,910 --> 00:42:13,210 This is the iteration. 801 00:42:13,210 --> 00:42:15,510 So you can see the objective is coming down, 802 00:42:15,510 --> 00:42:18,119 but you don't just want to worry about what's happening now 803 00:42:18,119 --> 00:42:18,910 with the objective. 804 00:42:18,910 --> 00:42:21,007 You also have to look at the constraint violation. 805 00:42:21,007 --> 00:42:22,590 And here it [? skips ?] [? forward ?]. 806 00:42:22,590 --> 00:42:24,840 It looks like we found a pretty good design, 807 00:42:24,840 --> 00:42:27,560 but actually the constraint is violated. 808 00:42:27,560 --> 00:42:29,510 So this would be like an aircraft wing that 809 00:42:29,510 --> 00:42:31,912 exceeds the maximum stress constraint, which 810 00:42:31,912 --> 00:42:32,620 would be no good. 811 00:42:32,620 --> 00:42:34,554 And so that's why we have to keep going. 812 00:42:34,554 --> 00:42:36,720 And actually you can see a little bit of an increase 813 00:42:36,720 --> 00:42:41,210 here in the objective to get the constraint violation satisfied. 814 00:42:41,210 --> 00:42:44,030 And I'll say that when we try to apply these kinds of methods 815 00:42:44,030 --> 00:42:46,010 to realistic aircraft design problems that 816 00:42:46,010 --> 00:42:48,390 have lots of variables, they have even lots 817 00:42:48,390 --> 00:42:49,880 and lots of constraints. 818 00:42:49,880 --> 00:42:53,000 And satisfying constraints can be sometimes just 819 00:42:53,000 --> 00:42:55,590 about the most challenging thing for the optimizer. 820 00:42:55,590 --> 00:42:58,020 So it can often find good solutions, 821 00:42:58,020 --> 00:43:00,600 but then it turns out that they don't necessarily 822 00:43:00,600 --> 00:43:01,850 satisfy the constraints. 823 00:43:01,850 --> 00:43:03,960 You see another one here. 824 00:43:03,960 --> 00:43:06,350 In this case, we started with an infeasible design, 825 00:43:06,350 --> 00:43:08,075 one that violated these constraints, 826 00:43:08,075 --> 00:43:10,339 or violated this constraint. 827 00:43:10,339 --> 00:43:11,880 So you see first of all the optimizer 828 00:43:11,880 --> 00:43:13,580 tries to get it back to the feasible region. 829 00:43:13,580 --> 00:43:15,670 It tries to make sure the constraint is satisfied. 830 00:43:15,670 --> 00:43:17,667 And then it starts searching. 
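A brief aside on the two convergence plots just described: one way to get them from fmincon is its built-in per-iteration plot functions. This sketch assumes the plot function names below exist in your release of the Optimization Toolbox, and reuses the placeholder J, x0, lb, ub, gcon from the earlier sketch.

    % Plot objective value and maximum constraint violation at every iteration.
    opts = optimoptions('fmincon', 'PlotFcn', {@optimplotfval, @optimplotconstrviolation});
    xopt = fmincon(J, x0, [], [], [], [], lb, ub, gcon, opts);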
831 00:43:17,667 --> 00:43:19,750 And actually most of these designs are infeasible. 832 00:43:19,750 --> 00:43:26,330 It's only at the end that we come here to the local optimum. 833 00:43:26,330 --> 00:43:29,905 And so again, there's the function value. 834 00:43:29,905 --> 00:43:32,155 And you can see we start off violating the constraint. 835 00:43:32,155 --> 00:43:35,130 We satisfy it, and then we go back outside the constraint, 836 00:43:35,130 --> 00:43:39,060 and then eventually come back to it at the end. 837 00:43:39,060 --> 00:43:45,520 And then the last one starts again over here, 838 00:43:45,520 --> 00:43:47,070 where two constraints are violated. 839 00:43:47,070 --> 00:43:49,445 And you can see even though two constraints are violated, 840 00:43:49,445 --> 00:43:53,479 this one's actually able to get to that solution pretty easily. 841 00:43:53,479 --> 00:43:55,770 One thing you notice is none of the ones that I started 842 00:43:55,770 --> 00:43:59,180 actually found the global optimum. 843 00:43:59,180 --> 00:44:02,610 They all converge to the local optimum. 844 00:44:02,610 --> 00:44:04,490 And again, if you like to think more in 3D, 845 00:44:04,490 --> 00:44:06,823 remember that landscape where there was the big mountain 846 00:44:06,823 --> 00:44:08,230 and the two smaller peaks. 847 00:44:08,230 --> 00:44:10,600 In this case, we started here, we started here, 848 00:44:10,600 --> 00:44:12,950 and we started here, and they all ended up over here. 849 00:44:12,950 --> 00:44:15,574 Probably we would have to start on the other side of this ridge 850 00:44:15,574 --> 00:44:19,974 here in order to get up into the global optimum. 851 00:44:24,390 --> 00:44:24,890 OK. 852 00:44:24,890 --> 00:44:31,277 So I know you don't really know very much about constraint 853 00:44:31,277 --> 00:44:33,360 optimization, but you could all be [? dangerous ?] 854 00:44:33,360 --> 00:44:37,490 [? utilize ?] for the next year if you wanted to be. 855 00:44:37,490 --> 00:44:39,250 It's actually a really powerful thing 856 00:44:39,250 --> 00:44:41,440 if you're trying to do a design problem, 857 00:44:41,440 --> 00:44:43,767 or you have simulation code and you 858 00:44:43,767 --> 00:44:45,933 want to explore what's going on with the parameters. 859 00:44:45,933 --> 00:44:49,590 The toolbox in MATLAB is really good. 860 00:44:49,590 --> 00:44:51,090 And then if you say for grad school, 861 00:44:51,090 --> 00:44:52,673 you can take the graduate class that I 862 00:44:52,673 --> 00:44:56,578 teach that goes into this stuff in much, much more detail. 863 00:44:56,578 --> 00:45:00,800 AUDIENCE: Is there anything MATLAB, like other-- 864 00:45:00,800 --> 00:45:01,770 KAREN WILLCOX: Yeah. 865 00:45:01,770 --> 00:45:02,270 Absolutely. 866 00:45:02,270 --> 00:45:06,450 So there's all kind of optimization toolboxes. 867 00:45:06,450 --> 00:45:09,230 I mentioned Professor Johnson in [INAUDIBLE], where 868 00:45:09,230 --> 00:45:13,150 he's implemented a lot of some of the less mainstream 869 00:45:13,150 --> 00:45:14,260 optimization methods. 870 00:45:14,260 --> 00:45:15,090 That's nice. 871 00:45:15,090 --> 00:45:20,580 And then there's all kinds of other toolboxes, 872 00:45:20,580 --> 00:45:25,071 things like CPLEX that a lot of optimization people use. 873 00:45:25,071 --> 00:45:26,320 What are some of the big ones? 
874 00:45:26,320 --> 00:45:27,820 I would say though MATLAB has really 875 00:45:27,820 --> 00:45:29,900 started to sort of dominate the market in a way 876 00:45:29,900 --> 00:45:32,000 because MATLAB's optimization toolbox has 877 00:45:32,000 --> 00:45:35,550 become really good, especially in the last 10 years. 878 00:45:35,550 --> 00:45:38,630 And I think the gradient pace, they 879 00:45:38,630 --> 00:45:41,750 have the state of the art, the Newton methods with all the 880 00:45:41,750 --> 00:45:43,440 bells and whistles on them. 881 00:45:43,440 --> 00:45:48,041 And then there are also a lot of user-contributed algorithms. 882 00:45:51,200 --> 00:45:52,125 I'll show you. 883 00:45:52,125 --> 00:45:54,190 Let me see if I can run it, something 884 00:45:54,190 --> 00:45:56,155 that I don't approve of. 885 00:45:56,155 --> 00:45:57,030 AUDIENCE: [INAUDIBLE] 886 00:46:08,164 --> 00:46:09,330 KAREN WILLCOX: That's right. 887 00:46:09,330 --> 00:46:10,184 Yeah. 888 00:46:10,184 --> 00:46:10,850 I gave you the-- 889 00:46:10,850 --> 00:46:13,194 AUDIENCE: [INAUDIBLE] 890 00:46:13,194 --> 00:46:14,610 KAREN WILLCOX: I gave you the link 891 00:46:14,610 --> 00:46:16,403 to that in the last lecture notes, 892 00:46:16,403 --> 00:46:17,694 to Professor Johnson's website. 893 00:46:17,694 --> 00:46:20,604 And actually since you said Python, also pyOpt. 894 00:46:20,604 --> 00:46:23,460 Right now pyOpt is starting to create 895 00:46:23,460 --> 00:46:27,205 all the optimization algorithms in Python. 896 00:46:27,205 --> 00:46:28,660 AUDIENCE: [INAUDIBLE] 897 00:46:28,660 --> 00:46:31,160 KAREN WILLCOX: What about [? Julian ?]? 898 00:46:31,160 --> 00:46:32,039 AUDIENCE: [INAUDIBLE] 899 00:46:32,039 --> 00:46:34,080 KAREN WILLCOX: Is that Professor Edelman's stuff? 900 00:46:34,080 --> 00:46:34,955 AUDIENCE: [INAUDIBLE] 901 00:46:49,189 --> 00:46:50,955 KAREN WILLCOX: Oh, once you graduate? 902 00:46:50,955 --> 00:46:52,330 But that's part of their model is 903 00:46:52,330 --> 00:46:54,580 for you to convince your employers 904 00:46:54,580 --> 00:46:58,115 to-- but that's why you stay in academia is you 905 00:46:58,115 --> 00:46:59,858 get perpetual very cheap MATLAB. 906 00:46:59,858 --> 00:47:02,140 It's one of the great perks of being an academic. 907 00:47:02,140 --> 00:47:03,015 AUDIENCE: [INAUDIBLE] 908 00:47:06,404 --> 00:47:08,570 KAREN WILLCOX: So I want to show you something again 909 00:47:08,570 --> 00:47:10,070 of which I definitely don't approve, 910 00:47:10,070 --> 00:47:11,730 but again this triggered in my mind 911 00:47:11,730 --> 00:47:13,850 because I said user-contributed toolboxes. 912 00:47:13,850 --> 00:47:15,894 There's a whole class of optimization-- 913 00:47:15,894 --> 00:47:18,060 I'm going to use the word "optimization" in quotes-- 914 00:47:18,060 --> 00:47:20,330 methods that don't use gradient, that don't really 915 00:47:20,330 --> 00:47:23,410 have any convergence theory. 916 00:47:23,410 --> 00:47:29,620 A couple of them have some theories associated with them. 917 00:47:29,620 --> 00:47:31,810 They're typically called heuristic algorithms 918 00:47:31,810 --> 00:47:34,560 because they use rules to search the design space. 
919 00:47:34,560 --> 00:47:37,310 And one of the most popular in aerospace engineering 920 00:47:37,310 --> 00:47:41,460 is something called a genetic algorithm, which basically uses 921 00:47:41,460 --> 00:47:43,572 all the rules of natural selection, 922 00:47:43,572 --> 00:47:45,280 and mimics the rules of natural selection 923 00:47:45,280 --> 00:47:47,545 to search the design space. 924 00:47:47,545 --> 00:47:50,570 So this is the function that we were looking at on Wednesday. 925 00:47:50,570 --> 00:47:52,887 And remember, we did a gradient-based method, 926 00:47:52,887 --> 00:47:54,470 or we did the Nelder-Mead simplex, 927 00:47:54,470 --> 00:47:57,740 where we had a little asterisk that was our initial guess, 928 00:47:57,740 --> 00:48:00,510 and then the asterisk marched and found the top of the hill, 929 00:48:00,510 --> 00:48:02,460 or got to one of the tops of the hills 930 00:48:02,460 --> 00:48:04,250 depending on where it started. 931 00:48:04,250 --> 00:48:07,327 So a genetic algorithm is a population-based method, 932 00:48:07,327 --> 00:48:09,660 which means you don't have one design on each iteration, 933 00:48:09,660 --> 00:48:10,639 but you have many. 934 00:48:10,639 --> 00:48:13,180 And I don't know how many there are in this population, maybe 935 00:48:13,180 --> 00:48:14,810 20 or 30. 936 00:48:14,810 --> 00:48:17,180 So there are 20 or 30 initial guesses, 937 00:48:17,180 --> 00:48:19,060 and that's the population. 938 00:48:19,060 --> 00:48:21,340 And when I set it running, what you're going to see 939 00:48:21,340 --> 00:48:25,270 is that each iteration, it's evolving this population. 940 00:48:25,270 --> 00:48:28,100 So if there are, let's say there are 30 there, 941 00:48:28,100 --> 00:48:30,700 on the next iteration it's going to be 30 more and then 30 942 00:48:30,700 --> 00:48:35,190 more and 30 more, and the way it's evolving is it's taking 943 00:48:35,190 --> 00:48:38,205 designs as the parents and having them produce children, 944 00:48:38,205 --> 00:48:40,080 then it's figuring out which children to keep 945 00:48:40,080 --> 00:48:41,900 and which ones to not keep. 946 00:48:41,900 --> 00:48:43,990 And it's introducing a mutation that causes 947 00:48:43,990 --> 00:48:45,800 designs to jump around. 948 00:48:45,800 --> 00:48:47,810 So there's a whole sort of thing. 949 00:48:47,810 --> 00:48:51,910 And Professor [? Divic ?] knows a lot about genetic algorithms. 950 00:48:51,910 --> 00:48:54,610 So if you're interested, you could ask him. 951 00:48:54,610 --> 00:48:56,780 But let me set it running, and you'll 952 00:48:56,780 --> 00:48:58,406 see what I mean. 953 00:48:58,406 --> 00:49:00,804 AUDIENCE: [INAUDIBLE] 954 00:49:00,804 --> 00:49:02,970 KAREN WILLCOX: Yeah, the heuristic is all the things 955 00:49:02,970 --> 00:49:07,840 about-- you take designs and you have to encode them 956 00:49:07,840 --> 00:49:09,240 with chromosomes. 957 00:49:09,240 --> 00:49:11,850 And then there are heuristics about the different mating 958 00:49:11,850 --> 00:49:14,220 strategies and then the mutations. 959 00:49:14,220 --> 00:49:16,520 So there's a lot of-- and at the end of the day, 960 00:49:16,520 --> 00:49:18,740 you don't have guarantees about optimality.
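For reference, a sketch of running a genetic algorithm with MATLAB's ga (Global Optimization Toolbox) on a landscape like the one in the demo. It assumes the demo surface is MATLAB's built-in peaks function, which may not be exactly what was shown; ga minimizes, so to climb the hills we minimize the negative of the surface.

    % Population-based search with ga on a peaks-like landscape.
    landscape = @(x) -peaks(x(1), x(2));             % minimize the negative to find the peaks
    lb = [-3 -3];  ub = [3 3];                       % bounds on x1 and x2
    opts = optimoptions('ga', 'PopulationSize', 30); % roughly 30 designs per generation
    [xopt, fopt] = ga(landscape, 2, [], [], [], [], lb, ub, [], opts);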
961 00:49:18,740 --> 00:49:23,270 But if you have a problem where you can't compute gradients, 962 00:49:23,270 --> 00:49:27,622 maybe where the models are not differentiable, 963 00:49:27,622 --> 00:49:29,330 and you just kind of want to spray points 964 00:49:29,330 --> 00:49:31,204 everywhere and [? work ?] as well as you can, 965 00:49:31,204 --> 00:49:32,557 maybe something like this is OK. 966 00:49:32,557 --> 00:49:33,890 I wouldn't call it optimization. 967 00:49:33,890 --> 00:49:35,959 AUDIENCE: [INAUDIBLE] 968 00:49:35,959 --> 00:49:38,000 KAREN WILLCOX: Yeah, if you want to avoid local-- 969 00:49:38,000 --> 00:49:40,260 but I think you avoid local minima by not 970 00:49:40,260 --> 00:49:43,100 getting to any minimum at all. 971 00:49:43,100 --> 00:49:45,320 They work great for when you have 2D. 972 00:49:45,320 --> 00:49:50,040 But you see the idea how dramatically different it is. 973 00:49:50,040 --> 00:49:51,640 There's simulated annealing which 974 00:49:51,640 --> 00:49:54,730 does have some theory associated with it. 975 00:49:54,730 --> 00:49:56,275 There's particle swarm optimization. 976 00:49:56,275 --> 00:49:57,150 There's ant colonies. 977 00:49:57,150 --> 00:49:59,480 There's lots of analogies with nature. 978 00:49:59,480 --> 00:50:01,570 So there's this whole other class 979 00:50:01,570 --> 00:50:04,055 of optimization algorithms which can be kind of fun. 980 00:50:07,310 --> 00:50:09,690 OK. 981 00:50:09,690 --> 00:50:11,580 And so if you wanted to find those, 982 00:50:11,580 --> 00:50:13,834 you could probably go find other toolboxes as well. 983 00:50:16,520 --> 00:50:20,900 OK, so I think I'll skip over the 1D search. 984 00:50:20,900 --> 00:50:22,584 It's not all that important. 985 00:50:22,584 --> 00:50:24,750 But I want to spend the last little bit just talking 986 00:50:24,750 --> 00:50:30,250 about surrogate modeling and reduced-order models. 987 00:50:30,250 --> 00:50:34,280 So maybe you have a sense that when we run an optimization 988 00:50:34,280 --> 00:50:37,590 algorithm, we have to call the code lots of times. 989 00:50:37,590 --> 00:50:40,032 Every time we want to evaluate the objective 990 00:50:40,032 --> 00:50:42,490 or the constraints, and any time we want to get a gradient, 991 00:50:42,490 --> 00:50:45,720 we may have to call it n times. 992 00:50:45,720 --> 00:50:47,740 Also, you've seen in the Monte Carlo simulation 993 00:50:47,740 --> 00:50:51,590 that you had to call the code 22,000 times when you were 994 00:50:51,590 --> 00:50:55,270 running the simulations for each ball, each pitch 995 00:50:55,270 --> 00:50:57,275 that you were analyzing. 996 00:50:57,275 --> 00:50:59,220 So it turns out that when you have 997 00:50:59,220 --> 00:51:01,670 CFD models or finite element models 998 00:51:01,670 --> 00:51:04,924 or realistic models of aircraft or spacecraft 999 00:51:04,924 --> 00:51:06,340 or whatever it is you're modeling, 1000 00:51:06,340 --> 00:51:08,110 it just gets too expensive. 1001 00:51:08,110 --> 00:51:10,860 And so we use what is called a surrogate model. 1002 00:51:10,860 --> 00:51:12,680 So the word "surrogate" means that we're 1003 00:51:12,680 --> 00:51:18,427 going to replace the real model with an approximation.
1004 00:51:18,427 --> 00:51:20,010 And the idea is that the approximation 1005 00:51:20,010 --> 00:51:24,000 should be something that's much cheaper than the model 1006 00:51:24,000 --> 00:51:26,520 that we're trying to optimize, but hopefully it's good 1007 00:51:26,520 --> 00:51:28,580 enough so that we can make good design decisions 1008 00:51:28,580 --> 00:51:30,800 and find good designs, and then maybe go back 1009 00:51:30,800 --> 00:51:34,680 and analyze them with our more sophisticated model. 1010 00:51:34,680 --> 00:51:37,090 So I like to think of surrogate models in these three 1011 00:51:37,090 --> 00:51:38,300 different categories. 1012 00:51:38,300 --> 00:51:40,600 There are data fit models like regression models, 1013 00:51:40,600 --> 00:51:43,870 and we'll talk a little bit more about these ones. 1014 00:51:43,870 --> 00:51:46,230 There are simplified physics models, which actually we 1015 00:51:46,230 --> 00:51:48,940 use in engineering all the time, that maybe we 1016 00:51:48,940 --> 00:51:53,920 want to design according to the Navier-Stokes equations, 1017 00:51:53,920 --> 00:51:56,830 in this case, for a supersonic business jet. 1018 00:51:56,830 --> 00:51:59,640 But we could also use something like a panel method, or even 1019 00:51:59,640 --> 00:52:03,600 just a simple empirical very high-level model 1020 00:52:03,600 --> 00:52:05,684 of the aircraft so we can simplify the physics. 1021 00:52:05,684 --> 00:52:07,100 And then there are things that are 1022 00:52:07,100 --> 00:52:11,140 called projection-based reduced models, which is basically 1023 00:52:11,140 --> 00:52:15,570 a mathematical way of coming up with simpler models. 1024 00:52:15,570 --> 00:52:18,780 This little diagram here is supposed to represent an x 1025 00:52:18,780 --> 00:52:21,510 dot equals A x plus B u. 1026 00:52:21,510 --> 00:52:23,976 You guys didn't see state space systems [INAUDIBLE] 1027 00:52:23,976 --> 00:52:24,600 anymore, right? 1028 00:52:24,600 --> 00:52:26,512 No. 1029 00:52:26,512 --> 00:52:30,810 So mathematical ways for reducing systems 1030 00:52:30,810 --> 00:52:32,160 that I won't talk about. 1031 00:52:32,160 --> 00:52:36,640 But I do lots of research in that third category. 1032 00:52:36,640 --> 00:52:40,260 So if we just want to think a little bit about the data fit 1033 00:52:40,260 --> 00:52:42,460 category, so the idea is going to be 1034 00:52:42,460 --> 00:52:44,892 that we're going to sample our simulation at some number 1035 00:52:44,892 --> 00:52:45,600 of design points. 1036 00:52:45,600 --> 00:52:47,891 So we might use one of the design of experiments methods 1037 00:52:47,891 --> 00:52:52,570 that we talked about last week to somehow select some points. 1038 00:52:52,570 --> 00:52:54,650 So we're going to run our expensive model 1039 00:52:54,650 --> 00:52:56,149 at a bunch of points, and then we're 1040 00:52:56,149 --> 00:52:59,650 going to fit a model using that sample information. 1041 00:52:59,650 --> 00:53:02,250 We could try to fit a model everywhere, 1042 00:53:02,250 --> 00:53:05,600 or we might want to do more of a local fit. 1043 00:53:05,600 --> 00:53:08,460 And then you could also think about updating the model when 1044 00:53:08,460 --> 00:53:09,760 you get new points. 1045 00:53:09,760 --> 00:53:16,970 But I want to talk specifically about response surface 1046 00:53:16,970 --> 00:53:23,980 modeling, and see how one would come up 1047 00:53:23,980 --> 00:53:30,965 with a surrogate model using a response surface.
1048 00:53:30,965 --> 00:53:32,340 Because again, that's where we'll 1049 00:53:32,340 --> 00:53:35,190 use some of the mathematical techniques 1050 00:53:35,190 --> 00:53:39,340 that you guys should have seen before. 1051 00:53:39,340 --> 00:53:42,450 So we skipped over number three. 1052 00:53:42,450 --> 00:53:45,790 We didn't talk about the 1D search. 1053 00:53:45,790 --> 00:53:48,760 And number four was really just a brief intro 1054 00:53:48,760 --> 00:53:50,622 to what is a surrogate model, and what 1055 00:53:50,622 --> 00:53:51,660 are the different types. 1056 00:53:51,660 --> 00:53:55,090 So we're at number five, on a response surface model. 1057 00:53:58,576 --> 00:54:06,180 OK, so the setup is that we're going to have-- let's just 1058 00:54:06,180 --> 00:54:10,730 draw the [? block ?] diagram. 1059 00:54:10,730 --> 00:54:17,940 So this is our expensive model where we can supply x, 1060 00:54:17,940 --> 00:54:19,361 and out comes J of x. 1061 00:54:19,361 --> 00:54:20,860 Or with a constrained problem, 1062 00:54:20,860 --> 00:54:24,216 it would also be g and h coming out of there. 1063 00:54:24,216 --> 00:54:26,550 So that's our expensive model. 1064 00:54:26,550 --> 00:54:28,580 And basically what we want to do is come up 1065 00:54:28,580 --> 00:54:37,480 with a surrogate model such that when we supply x, what's 1066 00:54:37,480 --> 00:54:41,160 going to come out will be some J hat, where 1067 00:54:41,160 --> 00:54:48,006 this guy is hopefully a good approximation of J. 1068 00:54:48,006 --> 00:54:52,210 And one simple but pretty powerful and commonly used 1069 00:54:52,210 --> 00:54:55,810 way to do that is with what's called a response surface 1070 00:54:55,810 --> 00:54:58,242 model. 1071 00:54:58,242 --> 00:55:06,510 So response surface model, or RSM. 1072 00:55:14,040 --> 00:55:22,560 So if we wanted to do just a first order RSM, 1073 00:55:22,560 --> 00:55:27,660 then what we're going to say is that this J hat, which 1074 00:55:27,660 --> 00:55:32,810 is a function of x, is just going to be some constant, 1075 00:55:32,810 --> 00:55:37,440 let's call it a0, plus the sum from i 1076 00:55:37,440 --> 00:55:41,020 equal 1 to n of some ai times xi. 1077 00:55:41,020 --> 00:55:43,720 In other words, it's just a linear model. 1078 00:55:43,720 --> 00:55:49,430 It's a constant plus a constant a1 times x1, plus a2 times x2. 1079 00:55:49,430 --> 00:55:52,880 So it's going to replace the expensive model 1080 00:55:52,880 --> 00:55:54,590 with a linear approximation. 1081 00:55:54,590 --> 00:56:00,630 That would be a first order response surface model. 1082 00:56:00,630 --> 00:56:04,620 Turns out this is not such a good idea, 1083 00:56:04,620 --> 00:56:08,322 and we'll look at it graphically in a second and you'll see why. 1084 00:56:08,322 --> 00:56:11,970 So more often than not, people use a second order 1085 00:56:11,970 --> 00:56:20,860 or a quadratic response surface. 1086 00:56:20,860 --> 00:56:27,790 And it's the same idea, that the approximate model, J hat, 1087 00:56:27,790 --> 00:56:31,770 is going to be, again, a constant, 1088 00:56:31,770 --> 00:56:35,980 and then it's going to be the linear term, so the sum from i 1089 00:56:35,980 --> 00:56:39,360 equal 1 to n of ai xi's, and then 1090 00:56:39,360 --> 00:56:43,390 it's going to be all the quadratic terms, so i equals 1 1091 00:56:43,390 --> 00:56:53,818 to n, j equals 1 to n of bij xi xj. 1092 00:56:53,818 --> 00:56:56,770 And actually maybe this should be j equal i to n.
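Written out, the two response surface models just described are, with J hat the surrogate and x1 through xn the design variables (the inner sum starting at j = i, per the correction above):

    \hat{J}(x) = a_0 + \sum_{i=1}^{n} a_i x_i                                              (first order)

    \hat{J}(x) = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} b_{ij} x_i x_j   (second order)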
1093 00:57:04,600 --> 00:57:07,930 So again, we're just replacing our J of x with a model, 1094 00:57:07,930 --> 00:57:09,450 and in this case, second order case, 1095 00:57:09,450 --> 00:57:11,060 we're saying the model is quadratic. 1096 00:57:11,060 --> 00:57:13,350 It can have a constant term, it can linear terms, 1097 00:57:13,350 --> 00:57:18,770 and then it can have all these quadratic terms in the xi, xj, 1098 00:57:18,770 --> 00:57:21,752 xi squared, x1 squared, x1, x2. 1099 00:57:21,752 --> 00:57:24,212 AUDIENCE: [INAUDIBLE] 1100 00:57:24,212 --> 00:57:29,230 KAREN WILLCOX: Well, because we don't need 1, 2 and 2, 1. 1101 00:57:29,230 --> 00:57:30,670 Yeah. 1102 00:57:30,670 --> 00:57:33,330 I mean, you can write 1 there if you wanted to, 1103 00:57:33,330 --> 00:57:35,530 but because x1, x2 is the same as x2, 1104 00:57:35,530 --> 00:57:39,530 x1, we would just lump them. [INAUDIBLE] 1105 00:57:39,530 --> 00:57:49,310 OK, so here we've got that the ai's are unknown coefficients. 1106 00:57:54,050 --> 00:57:58,165 And here we have the ai's and the bij's 1107 00:57:58,165 --> 00:58:00,524 are unknown coefficients. 1108 00:58:03,250 --> 00:58:07,420 So we've said that we want to approximate our model 1109 00:58:07,420 --> 00:58:09,104 with a linear or a quadratic function. 1110 00:58:09,104 --> 00:58:11,395 Now the question is how do we come up with coefficients 1111 00:58:11,395 --> 00:58:13,160 for that approximation. 1112 00:58:13,160 --> 00:58:17,034 And that's where this sampling comes into play. 1113 00:58:24,664 --> 00:58:26,205 What we're going to do is we're going 1114 00:58:26,205 --> 00:58:29,075 to generate samples using our favorite design of experiments 1115 00:58:29,075 --> 00:58:30,450 method, or maybe we're just going 1116 00:58:30,450 --> 00:58:33,319 to run Monte Carlo and sample randomly. 1117 00:58:33,319 --> 00:58:35,110 We're going to generate a bunch of samples, 1118 00:58:35,110 --> 00:58:38,060 and then we're going to use least squares fitting to try 1119 00:58:38,060 --> 00:58:41,357 to estimate those coefficients and come up with the response 1120 00:58:41,357 --> 00:58:41,940 surface model. 1121 00:58:44,630 --> 00:58:57,899 The next step is going to be how to determine the ai and the bij 1122 00:58:57,899 --> 00:58:58,440 coefficients. 1123 00:59:04,330 --> 00:59:15,070 So let's say that we have generated n samples, 1124 00:59:15,070 --> 00:59:22,218 and each sample is going to have-- 1125 00:59:22,218 --> 00:59:27,530 we'll call it j-- there's going to be a pair consisting 1126 00:59:27,530 --> 00:59:31,500 of the design variables that we sampled, 1127 00:59:31,500 --> 00:59:33,760 the value of x that we sampled, and that's again 1128 00:59:33,760 --> 00:59:37,500 the n by 1 vector, and then the corresponding value 1129 00:59:37,500 --> 00:59:39,220 of the objective. 1130 00:59:39,220 --> 00:59:44,590 So we'll have x1, j1, we'll have x2, j2, 1131 00:59:44,590 --> 00:59:49,570 all the way up to xn, jn. 1132 00:59:49,570 --> 00:59:53,210 So I ran my expensive model n times. 1133 00:59:53,210 --> 00:59:55,000 So I just collected all the data. 1134 00:59:55,000 --> 00:59:57,047 I collected all the conditions that I ran it at, 1135 00:59:57,047 --> 00:59:59,255 and I collected all the results that I got out of it. 1136 00:59:59,255 --> 01:00:00,740 Is that clear? 1137 01:00:00,740 --> 01:00:01,240 Yeah? 
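A sketch of collecting that training data in MATLAB. Here expensive_model is a hypothetical stand-in for whatever simulation is being surrogated, and lhsdesign (a Latin hypercube design from the Statistics and Machine Learning Toolbox) is just one possible design of experiments choice; any of the sampling methods mentioned last week would do.

    % Run the expensive model at N sampled designs and store the pairs.
    n = 2;  N = 10;                          % n design variables, N samples (N >= n+1 for a linear fit)
    X = lhsdesign(N, n);                     % N sampled designs in [0,1]^n, one design per row
    Jdata = zeros(N, 1);
    for k = 1:N
        Jdata(k) = expensive_model(X(k,:));  % one expensive run per sample (hypothetical function)
    end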
1138 01:00:09,570 --> 01:00:19,500 Let's think about this guy here, so just a first order 1139 01:00:19,500 --> 01:00:27,515 [INAUDIBLE] J of x equal a0 plus the sum from i equal 1 to n 1140 01:00:27,515 --> 01:00:31,100 of ai xi. 1141 01:00:31,100 --> 01:00:33,140 How many pieces of data do we need at a minimum? 1142 01:00:42,020 --> 01:00:45,640 How many unknown coefficients are there in the model? 1143 01:00:45,640 --> 01:00:46,260 n plus 1. 1144 01:00:46,260 --> 01:00:47,210 Little n plus 1. 1145 01:00:47,210 --> 01:00:50,189 So we need big N to be at least little n plus 1. 1146 01:00:50,189 --> 01:00:51,730 But in m what we would do is we would 1147 01:00:51,730 --> 01:00:55,140 generate more than little n plus 1, 1148 01:00:55,140 --> 01:00:56,660 and then we would do a regression. 1149 01:00:56,660 --> 01:00:58,890 We would do a least squares fit. 1150 01:00:58,890 --> 01:01:02,050 You guys did least squares fir somewhere, yes? 1151 01:01:02,050 --> 01:01:02,550 1806? 1152 01:01:02,550 --> 01:01:03,716 Did you do it anywhere else? 1153 01:01:03,716 --> 01:01:04,594 [INAUDIBLE] 1154 01:01:04,594 --> 01:01:05,970 AUDIENCE: [INAUDIBLE] 1155 01:01:05,970 --> 01:01:07,100 KAREN WILLCOX: High school. 1156 01:01:07,100 --> 01:01:09,474 Did everybody here remember their high school regression? 1157 01:01:09,474 --> 01:01:10,242 No? 1158 01:01:10,242 --> 01:01:11,117 AUDIENCE: [INAUDIBLE] 1159 01:01:13,457 --> 01:01:15,540 KAREN WILLCOX: You can do them on your calculator? 1160 01:01:15,540 --> 01:01:20,390 MATLAB backslash, like the most amazing operator? 1161 01:01:20,390 --> 01:01:20,890 OK. 1162 01:01:20,890 --> 01:01:23,360 So, well, let's think about what it would look like. 1163 01:01:23,360 --> 01:01:28,700 So I always like to think of the least squares fitting from just 1164 01:01:28,700 --> 01:01:30,070 the matrix system. 1165 01:01:30,070 --> 01:01:32,726 Let me draw it over here so there's plenty of room. 1166 01:01:39,210 --> 01:01:40,920 This is a matrix system, and if we 1167 01:01:40,920 --> 01:01:43,090 have exactly the right number of data points, 1168 01:01:43,090 --> 01:01:44,750 the matrix will be square. 1169 01:01:44,750 --> 01:01:47,060 And if we have more data points than unknowns, 1170 01:01:47,060 --> 01:01:50,510 the matrix will have more rows than columns, 1171 01:01:50,510 --> 01:01:52,630 and it will be an overdetermined system, 1172 01:01:52,630 --> 01:01:55,528 and you would do a least square solution. 1173 01:01:55,528 --> 01:01:59,280 So we'll set up a matrix system. 1174 01:02:05,120 --> 01:02:07,880 First of all, the unknowns, so the things we're solving for 1175 01:02:07,880 --> 01:02:16,670 are these [INAUDIBLE] a0, a1, a2, down to a n, little n. 1176 01:02:24,550 --> 01:02:25,910 Those are the unknowns. 1177 01:02:25,910 --> 01:02:28,690 And then each row in the matrix system, this 1178 01:02:28,690 --> 01:02:31,880 is going to correspond one of the sample points. 1179 01:02:31,880 --> 01:02:37,180 So the first sample point says that x-- 1180 01:02:37,180 --> 01:02:40,870 when I feed it in x superscript one into the model, 1181 01:02:40,870 --> 01:02:43,050 I get out J1. 1182 01:02:43,050 --> 01:02:47,770 So here's J1, and here's the expansion. 1183 01:02:47,770 --> 01:02:51,440 And when I feed in x1, so the first thing I do 1184 01:02:51,440 --> 01:03:04,600 is I get 1 times a0 plus a1 times x1 in the first sample 1185 01:03:04,600 --> 01:03:05,740 point. 
1186 01:03:05,740 --> 01:03:10,457 So this guy here is going to be x1,1 where the superscript is 1187 01:03:10,457 --> 01:03:12,540 the sample number, and the subscript is the design 1188 01:03:12,540 --> 01:03:13,206 variable number. 1189 01:03:13,206 --> 01:03:14,918 Yup. 1190 01:03:14,918 --> 01:03:22,233 And that's x2,1, xn,1. 1191 01:03:22,233 --> 01:03:23,730 Is everybody with me? 1192 01:03:23,730 --> 01:03:28,390 I just wrote down the equation 1 times a0 1193 01:03:28,390 --> 01:03:31,080 plus the first component of sample 1194 01:03:31,080 --> 01:03:33,970 1 times a1 plus the second component 1195 01:03:33,970 --> 01:03:35,830 of sample 1 times a2, plus, plus, 1196 01:03:35,830 --> 01:03:41,512 plus the nth component of sample a1 times a n equals to J1. 1197 01:03:44,950 --> 01:03:46,640 [INAUDIBLE] exactly does this equation 1198 01:03:46,640 --> 01:03:48,139 apply to the first [? here ?], and I 1199 01:03:48,139 --> 01:03:52,550 could write down so that this is the first sample, which 1200 01:03:52,550 --> 01:03:55,710 we called x superscript 1. 1201 01:03:55,710 --> 01:03:57,310 So then I would do the second sample, 1202 01:03:57,310 --> 01:03:58,716 and it would just look like this. 1203 01:04:02,684 --> 01:04:11,394 x2,1, x2,2, xn,2, and that's going to be equal to J2. 1204 01:04:11,394 --> 01:04:12,560 And then I would keep going. 1205 01:04:12,560 --> 01:04:17,810 So I'm always going to have 1's in this column, because that 1 1206 01:04:17,810 --> 01:04:20,197 is always going to multiply a0. 1207 01:04:20,197 --> 01:04:21,780 Down here eventually I'm going to have 1208 01:04:21,780 --> 01:04:25,200 x, first component of the nth sample, 1209 01:04:25,200 --> 01:04:28,210 and here I'll have x, nth component of the big Nth 1210 01:04:28,210 --> 01:04:31,860 sample, and everything in between. 1211 01:04:31,860 --> 01:04:33,690 Yup. 1212 01:04:33,690 --> 01:04:40,860 So this is a big N by little n plus 1 matrix. 1213 01:04:40,860 --> 01:04:42,950 This is a little n plus 1 vector, 1214 01:04:42,950 --> 01:04:53,310 and this is a big N vector. 1215 01:04:53,310 --> 01:04:54,670 Yeah. 1216 01:04:54,670 --> 01:04:57,140 And if we have exactly little n plus 1 samples, 1217 01:04:57,140 --> 01:04:58,640 again, this matrix would be square, 1218 01:04:58,640 --> 01:05:00,098 and we would just invert it, and it 1219 01:05:00,098 --> 01:05:01,830 would give us the coefficients. 1220 01:05:01,830 --> 01:05:05,460 If we have more samples than we have unknowns, then 1221 01:05:05,460 --> 01:05:07,910 again it's just a least squares solution, 1222 01:05:07,910 --> 01:05:10,440 which you can do with matrix with MATLAB's backslash, 1223 01:05:10,440 --> 01:05:16,370 or you can remember the formula involving ax equals b, 1224 01:05:16,370 --> 01:05:21,810 and it's overdetermined, then you do a transposed ax equals 1225 01:05:21,810 --> 01:05:24,630 a transposed b, and then take the inverse of this, 1226 01:05:24,630 --> 01:05:26,620 and take it to the lefthand side. 1227 01:05:29,390 --> 01:05:31,710 [INAUDIBLE] 1228 01:05:31,710 --> 01:05:34,687 OK, and if we're doing a quadratic response surface, 1229 01:05:34,687 --> 01:05:38,800 so if I was setting J hat of x is this plus this 1230 01:05:38,800 --> 01:05:48,201 plus the term, the b ij xi xj, how would the system change? 1231 01:05:48,201 --> 01:05:49,200 What would I have to do? 1232 01:05:55,360 --> 01:05:58,011 Would there be more unknowns? 1233 01:05:58,011 --> 01:06:04,780 Yeah, so there would be the b1,1, b1,2, b1,3, all the b's. 
1234 01:06:04,780 --> 01:06:09,630 So more unknown means more columns in the matrix. 1235 01:06:09,630 --> 01:06:11,900 So what would be-- if I put b1,1 here, 1236 01:06:11,900 --> 01:06:14,280 what entry would go in here? 1237 01:06:17,118 --> 01:06:18,540 AUDIENCE: [INAUDIBLE] 1238 01:06:18,540 --> 01:06:24,300 KAREN WILLCOX: X1,1 squared, because in the expansion, 1239 01:06:24,300 --> 01:06:27,640 it's b1,1 times x1 squared. 1240 01:06:27,640 --> 01:06:30,410 And we would be evaluating that for the first sample. 1241 01:06:30,410 --> 01:06:34,950 So we'd end up with x1,1 squared x1,1 times x2,1. 1242 01:06:34,950 --> 01:06:37,280 You'd end up with all the quadratic terms matched out. 1243 01:06:37,280 --> 01:06:40,042 So it looks just the same. 1244 01:06:40,042 --> 01:06:43,382 The regression's not really-- sometimes it 1245 01:06:43,382 --> 01:06:45,340 seems more complicated, but I think if you just 1246 01:06:45,340 --> 01:06:46,881 write out the matrix system and think 1247 01:06:46,881 --> 01:06:50,760 about it as an overdetermined set of equations, 1248 01:06:50,760 --> 01:06:51,700 it becomes clearer. 1249 01:06:51,700 --> 01:06:53,700 And then you can also see we're going to have more unknowns. 1250 01:06:53,700 --> 01:06:55,033 We're going to add more columns. 1251 01:06:55,033 --> 01:06:56,750 If we want it to stay overdetermined, 1252 01:06:56,750 --> 01:06:58,291 or at least determined, we would have 1253 01:06:58,291 --> 01:07:02,670 to get more samples, which is what we talked about. 1254 01:07:02,670 --> 01:07:08,970 So let me show you how that works. 1255 01:07:08,970 --> 01:07:09,732 Is this clear? 1256 01:07:09,732 --> 01:07:10,690 Is this notation clear? 1257 01:07:13,783 --> 01:07:14,658 AUDIENCE: [INAUDIBLE] 1258 01:07:19,122 --> 01:07:20,727 KAREN WILLCOX: Say it a bit louder. 1259 01:07:20,727 --> 01:07:21,602 AUDIENCE: [INAUDIBLE] 1260 01:07:27,177 --> 01:07:28,593 KAREN WILLCOX: Yeah, that's right. 1261 01:07:28,593 --> 01:07:31,880 And in fact, using a first order response surface 1262 01:07:31,880 --> 01:07:35,106 is a really bad idea, and we'll see it graphically. 1263 01:07:35,106 --> 01:07:37,740 So it turns out linear approximations in optimization 1264 01:07:37,740 --> 01:07:39,144 are not a good idea. 1265 01:07:39,144 --> 01:07:40,810 Maybe you can think about why that would 1266 01:07:40,810 --> 01:07:44,105 be with a linear approximation. 1267 01:07:44,105 --> 01:07:45,480 You think about what we're doing, 1268 01:07:45,480 --> 01:07:48,560 we've got this landscape, we build a linear approximation 1269 01:07:48,560 --> 01:07:50,410 and say "optimize." 1270 01:07:50,410 --> 01:07:53,137 Find a new point, and build a new linear approximation. 1271 01:07:53,137 --> 01:07:54,970 But where is the linear approximation always 1272 01:07:54,970 --> 01:07:58,575 going to send you? 1273 01:07:58,575 --> 01:08:02,880 It's always going to send you far away in some direction, 1274 01:08:02,880 --> 01:08:03,886 kind of by definition. 1275 01:08:03,886 --> 01:08:05,510 So I build a linear approximation here. 1276 01:08:05,510 --> 01:08:07,010 If it's sloping this way, it's going 1277 01:08:07,010 --> 01:08:09,480 to take you all the way to that side of the design space. 1278 01:08:09,480 --> 01:08:11,063 If it's sloping the other way, so it's 1279 01:08:11,063 --> 01:08:12,870 always going to send you [INAUDIBLE]. 1280 01:08:12,870 --> 01:08:14,495 So you could combine it with a strategy 1281 01:08:14,495 --> 01:08:17,479 that-- that's what a trust region does. 
1282 01:08:17,479 --> 01:08:19,830 But actually it turns out that quadratic approximation, 1283 01:08:19,830 --> 01:08:22,080 because they embed the information of curvature, which 1284 01:08:22,080 --> 01:08:26,552 we know is really important for optimality conditions, 1285 01:08:26,552 --> 01:08:34,542 are far, far better choices and more commonly used. 1286 01:08:38,240 --> 01:08:43,784 Put up on [? stellar ?], it goes through what we just 1287 01:08:43,784 --> 01:08:45,100 wrote on the board. 1288 01:08:45,100 --> 01:08:54,550 But Let me just get rid of that horrible genetic algorithm 1289 01:08:54,550 --> 01:08:58,809 and get to the right directory. 1290 01:09:10,194 --> 01:09:17,010 So I have a code here that is going 1291 01:09:17,010 --> 01:09:18,939 to build some response surface. 1292 01:09:18,939 --> 01:09:20,779 It's the same demo. 1293 01:09:20,779 --> 01:09:23,140 It's this peak problem, the landscape problem. 1294 01:09:23,140 --> 01:09:25,260 So it's going to grab some samples. 1295 01:09:25,260 --> 01:09:27,512 Maybe I'll just do it and we can look at the code. 1296 01:09:27,512 --> 01:09:29,470 And we're going to build some response surfaces 1297 01:09:29,470 --> 01:09:36,109 so you can-- so in this first example, again, here's 1298 01:09:36,109 --> 01:09:39,370 our landscape, variable x1, variable x2, 1299 01:09:39,370 --> 01:09:41,604 objective function. 1300 01:09:41,604 --> 01:09:43,270 We happen to generate these [AUDIO OUT]. 1301 01:09:46,372 --> 01:09:47,330 Maybe they were random. 1302 01:09:47,330 --> 01:09:50,609 Maybe we had some kind of design of experiments method to do it. 1303 01:09:50,609 --> 01:09:54,340 And three points, two dimensions, what can we fit? 1304 01:09:54,340 --> 01:09:56,960 What kind of a response surface can we fit? 1305 01:09:56,960 --> 01:09:59,110 Linear one, first order? 1306 01:09:59,110 --> 01:10:01,690 n plus 1? 1307 01:10:01,690 --> 01:10:03,760 We don't have enough to fit a quadratic. 1308 01:10:03,760 --> 01:10:07,050 So there is-- in effect, it's determined. 1309 01:10:07,050 --> 01:10:15,240 It's three unknowns, constant, slope in the x direction, 1310 01:10:15,240 --> 01:10:16,810 slope in the x2 direction. 1311 01:10:16,810 --> 01:10:18,470 So there's the response surface. 1312 01:10:18,470 --> 01:10:19,060 It's clean. 1313 01:10:19,060 --> 01:10:20,980 It goes exactly through those three points. 1314 01:10:20,980 --> 01:10:23,480 And you see kind of the problem that [? Tyrone ?] was asking 1315 01:10:23,480 --> 01:10:25,910 about, which is if we use this optimization, 1316 01:10:25,910 --> 01:10:27,700 it's going to send us-- and we maximize, 1317 01:10:27,700 --> 01:10:30,325 it's going to send us over into the corner of the design space. 1318 01:10:30,325 --> 01:10:33,470 Clearly it's not a good approximation. 1319 01:10:33,470 --> 01:10:35,930 So we could generate more points. 1320 01:10:35,930 --> 01:10:38,640 So now we have these six points. 1321 01:10:38,640 --> 01:10:40,500 And now that we have six points, we 1322 01:10:40,500 --> 01:10:42,250 could still set a linear model, but now it 1323 01:10:42,250 --> 01:10:44,050 would be an overdetermined system. 1324 01:10:44,050 --> 01:10:46,613 The matrix has got six rows, because there are six sample 1325 01:10:46,613 --> 01:10:47,137 points. 
1326 01:10:47,137 --> 01:10:49,220 And it's got three columns because there are still 1327 01:10:49,220 --> 01:10:51,470 three degrees of freedom, the constant, 1328 01:10:51,470 --> 01:10:55,550 the slope in the x1 direction, and the slope the x2 direction. 1329 01:10:55,550 --> 01:10:57,450 So now you still have a linear model, 1330 01:10:57,450 --> 01:11:00,910 but you can see that it doesn't go through the points. 1331 01:11:00,910 --> 01:11:02,680 And in fact, it's the least squares 1332 01:11:02,680 --> 01:11:04,616 is like the line of best fit, but we're 1333 01:11:04,616 --> 01:11:06,400 in multiple dimensions. 1334 01:11:06,400 --> 01:11:08,200 So it's like the plane of best fit. 1335 01:11:08,200 --> 01:11:12,360 Three of the points are above, and the other three 1336 01:11:12,360 --> 01:11:16,130 want to be sitting somewhere underneath here. 1337 01:11:16,130 --> 01:11:18,540 And maybe if you've taken 1806-- they're 1338 01:11:18,540 --> 01:11:19,540 sitting underneath here. 1339 01:11:19,540 --> 01:11:21,039 Maybe if you've taken 1806, you know 1340 01:11:21,039 --> 01:11:25,170 that this solution is going to be 1341 01:11:25,170 --> 01:11:28,380 the one that minimizes the sum of the squares of the two norm 1342 01:11:28,380 --> 01:11:29,120 distance. 1343 01:11:29,120 --> 01:11:35,310 So it is the line of best fit in that optimal sense. 1344 01:11:35,310 --> 01:11:37,900 That's what the least squares fit is. 1345 01:11:37,900 --> 01:11:38,400 OK. 1346 01:11:38,400 --> 01:11:40,720 So now we have six points, which turns out 1347 01:11:40,720 --> 01:11:43,900 to be n times n plus 1 over 2, which is how many points we 1348 01:11:43,900 --> 01:11:46,180 need to fit a quadratic model. 1349 01:11:46,180 --> 01:11:47,274 So we have six points. 1350 01:11:47,274 --> 01:11:48,940 And you think about the quadratic model, 1351 01:11:48,940 --> 01:11:49,648 what do you have? 1352 01:11:49,648 --> 01:11:52,070 You have the constant, a0, you have the linear terms, 1353 01:11:52,070 --> 01:11:59,180 a1 and a2, and then you have of b1,1, [INAUDIBLE] squared. 1354 01:11:59,180 --> 01:12:04,430 You have b2,2 for x2 squared, and then you have b1,2 for x1 1355 01:12:04,430 --> 01:12:05,420 x2. 1356 01:12:05,420 --> 01:12:09,930 You have six degrees of freedom to fit a quadratic model in 2D. 1357 01:12:09,930 --> 01:12:11,360 I wrote out the expansion. 1358 01:12:11,360 --> 01:12:13,660 So six points, six degrees of freedom. 1359 01:12:13,660 --> 01:12:15,860 Again, our matrix system would be 6 by 6, 1360 01:12:15,860 --> 01:12:17,820 so it would be uniquely determined. 1361 01:12:17,820 --> 01:12:20,390 And this is what the quadratic response surface looks like. 1362 01:12:20,390 --> 01:12:26,240 And as you can see, it's going exactly through those points. 1363 01:12:26,240 --> 01:12:31,604 OK, so now we're starting to get a little bit of curvature. 1364 01:12:31,604 --> 01:12:33,270 With a quadratic model, we can only ever 1365 01:12:33,270 --> 01:12:36,460 have one minimum or one maximum for the models 1366 01:12:36,460 --> 01:12:38,139 that look like this, whereas we know 1367 01:12:38,139 --> 01:12:40,180 that the design space we're trying to approximate 1368 01:12:40,180 --> 01:12:42,240 has got multiple blobs. 1369 01:12:42,240 --> 01:12:46,510 But a quadratic model;s not ever going to capture these multiple 1370 01:12:46,510 --> 01:12:47,436 hills in this case. 
1371 01:12:50,262 --> 01:12:52,220 I mentioned really briefly on one of the slides 1372 01:12:52,220 --> 01:12:54,094 that we could think about adapting the model. 1373 01:12:54,094 --> 01:12:57,279 So maybe as we're going through the optimization, maybe 1374 01:12:57,279 --> 01:12:58,820 we can make a genetic algorithm, even 1375 01:12:58,820 --> 01:12:59,760 though we're not supposed to. 1376 01:12:59,760 --> 01:13:02,009 We're figuring out that this is the interesting region 1377 01:13:02,009 --> 01:13:05,630 in the design space, so rather than having our points spread 1378 01:13:05,630 --> 01:13:09,240 out everywhere, we might be saying let's kind of only 1379 01:13:09,240 --> 01:13:13,370 on the points that have got good performance that we've 1380 01:13:13,370 --> 01:13:14,239 found so far. 1381 01:13:14,239 --> 01:13:16,280 So see all these points around here-- [INAUDIBLE] 1382 01:13:16,280 --> 01:13:18,040 this one does too because it's on the side 1383 01:13:18,040 --> 01:13:21,520 of that other little hill that sits over here. 1384 01:13:21,520 --> 01:13:25,090 So if you build a response surface using those points, 1385 01:13:25,090 --> 01:13:28,550 maybe you can see now that the model's really 1386 01:13:28,550 --> 01:13:30,704 pretty terrible over here, but it's actually 1387 01:13:30,704 --> 01:13:33,120 starting to be a pretty good approximation of what's going 1388 01:13:33,120 --> 01:13:35,390 on locally around those points. 1389 01:13:35,390 --> 01:13:36,880 And that's a common strategy would 1390 01:13:36,880 --> 01:13:40,616 be to build these local quadratic models in regions 1391 01:13:40,616 --> 01:13:42,740 where you've sampled and use them to make progress, 1392 01:13:42,740 --> 01:13:44,946 and then to keep updating them. 1393 01:13:44,946 --> 01:13:46,320 And even though a quadratic model 1394 01:13:46,320 --> 01:13:50,250 seems really simple for a real problem, 1395 01:13:50,250 --> 01:13:52,320 that strategy of adapting points and building 1396 01:13:52,320 --> 01:14:03,580 local models [AUDIO OUT] updating the model as we go 1397 01:14:03,580 --> 01:14:07,780 is actually a really, really powerful one. 1398 01:14:07,780 --> 01:14:09,890 And of course if we had more points-- 1399 01:14:09,890 --> 01:14:12,120 I don't think I've got one that's got more points. 1400 01:14:12,120 --> 01:14:15,510 If you had more than six points, than the quadratic model 1401 01:14:15,510 --> 01:14:18,053 again would be the quadratic model of this fit, where 1402 01:14:18,053 --> 01:14:22,250 it wouldn't go through all the points, 1403 01:14:22,250 --> 01:14:27,890 but it would cut through them on average equally. 1404 01:14:27,890 --> 01:14:28,685 OK, questions? 1405 01:14:34,030 --> 01:14:37,880 So that's a pretty, I think, neat use 1406 01:14:37,880 --> 01:14:39,830 of regression that's actually very powerful 1407 01:14:39,830 --> 01:14:46,796 and is used a lot in practice. 1408 01:14:46,796 --> 01:14:50,814 So that's one of the things that you need to be able to do. 1409 01:14:50,814 --> 01:14:53,210 So have a basic understanding of how 1410 01:14:53,210 --> 01:14:55,600 a design problem can be posed as an optimization problem. 1411 01:14:55,600 --> 01:14:57,300 What do I mean when I say constraint? 1412 01:14:57,300 --> 01:14:59,924 What I mean when I say objective function? 
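A sketch of that local-fit idea, under made-up choices: keep only the samples within some radius of the best design found so far, refit the quadratic there, and use that local model to propose the next point. The radius r is an assumed tuning parameter, and X, Jdata, N carry over from the earlier sketches.

    % Refit a quadratic using only the samples near the current best design.
    r = 0.3;                                          % assumed trust radius around the best design
    [~, ibest] = min(Jdata);  xbest = X(ibest, :);    % best sample found so far
    near = sqrt(sum((X - xbest).^2, 2)) < r;          % samples close to it
    x1 = X(near,1);  x2 = X(near,2);
    Aloc = [ones(nnz(near),1), x1, x2, x1.^2, x1.*x2, x2.^2];
    cloc = Aloc \ Jdata(near);                        % local quadratic fit (needs >= 6 nearby samples)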
1413 01:14:59,924 --> 01:15:01,590 Have a basic understanding of the steps, 1414 01:15:01,590 --> 01:15:03,132 and the gradient-based constrained 1415 01:15:03,132 --> 01:15:05,340 optimization algorithms are looking for the direction 1416 01:15:05,340 --> 01:15:10,165 and how the different methods do that. 1417 01:15:10,165 --> 01:15:13,120 Be able to estimate a gradient, a first order or second order 1418 01:15:13,120 --> 01:15:14,620 derivative using finite differences. 1419 01:15:14,620 --> 01:15:15,703 You could already do that. 1420 01:15:15,703 --> 01:15:18,365 You just did it in the context of PDEs. 1421 01:15:18,365 --> 01:15:20,650 Now we're talking about design problems. 1422 01:15:20,650 --> 01:15:24,260 And understand how you could construct a polynomial response 1423 01:15:24,260 --> 01:15:26,450 surface, either linear or quadratic, 1424 01:15:26,450 --> 01:15:28,940 using least squares regression. 1425 01:15:28,940 --> 01:15:31,144 Actually, we didn't talk about the quality of fit. 1426 01:15:31,144 --> 01:15:33,060 Do you guys remember what's the quality of fit 1427 01:15:33,060 --> 01:15:35,866 metric for regression models? 1428 01:15:35,866 --> 01:15:36,850 r squared. 1429 01:15:36,850 --> 01:15:41,335 Yeah, so r squared tells you how well your surface approximates 1430 01:15:41,335 --> 01:15:42,220 your data points. 1431 01:15:42,220 --> 01:15:44,469 But you have to be careful because it doesn't tell you 1432 01:15:44,469 --> 01:15:47,322 anything about how good it might be in regions where 1433 01:15:47,322 --> 01:15:50,190 you didn't sample. 1434 01:15:50,190 --> 01:15:50,690 All right. 1435 01:15:50,690 --> 01:15:51,356 Final questions? 1436 01:15:56,716 --> 01:15:58,090 You guys feel like you've learned 1437 01:15:58,090 --> 01:16:00,970 enough in this class already? 1438 01:16:00,970 --> 01:16:02,161 No? 1439 01:16:02,161 --> 01:16:02,660 OK. 1440 01:16:02,660 --> 01:16:07,050 So on Wednesday, I will do a review 1441 01:16:07,050 --> 01:16:10,194 of finite-- I think I'm going to focus only on finite element 1442 01:16:10,194 --> 01:16:12,318 methods and then the probabilistic and optimization 1443 01:16:12,318 --> 01:16:14,710 function. 1444 01:16:14,710 --> 01:16:17,180 If you have specific questions or topics 1445 01:16:17,180 --> 01:16:19,830 that you want me to cover in more detail than others, 1446 01:16:19,830 --> 01:16:21,610 then either bring them to class or you 1447 01:16:21,610 --> 01:16:23,250 can e-mail me ahead of time so that I 1448 01:16:23,250 --> 01:16:26,692 can prepare a little bit more. 1449 01:16:26,692 --> 01:16:28,650 But that will be a good chance to ask questions 1450 01:16:28,650 --> 01:16:31,900 about things that are confusing before the final. 1451 01:16:31,900 --> 01:16:32,630 All right. 1452 01:16:32,630 --> 01:16:35,370 See everybody on Wednesday.
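One last sketch, on the quality-of-fit check mentioned above: R squared computed from the fitted surface's residuals at the sample points, using the quadratic fit (A2, c) from the earlier sketch. As noted, it says nothing about how good the surrogate is in regions that were never sampled.

    % R^2: fraction of the variation in the sampled data explained by the surrogate.
    resid = Jdata - A2*c;                                          % residuals at the sample points
    R2 = 1 - sum(resid.^2) / sum((Jdata - mean(Jdata)).^2);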