The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK. We can go ahead and get started. So today is going to be the first lecture on an introduction to design optimization. You guys ready? Can you talk about the project? I have office hours right after this, so we can-- So here are three things I'm going to try to cover today. We'll talk about the basics of the design optimization problem: how you take a design problem and set it up as an optimization problem. We'll talk about unconstrained optimization methods and how to compute the gradients that we will need for those optimization methods. And then in the next lecture, we'll talk a little more about some of the different methods. We'll maybe talk about some response surface modeling, so surrogate models, and see what else we have time for.
So this is intended to be a bit of a teaser for optimization methods. This is something that I use a lot in my research. So let's start off by thinking about the basics of writing a design problem as an optimization statement. The idea is that a design problem, as I said already, can be written as an optimization problem. And the reason we're doing this is that we'll then be able to use numerical optimization methods to explore the design space. What we saw in the last lecture was design of experiments, which are basically sampling methods. But what we saw is that if you have a lot of design variables, it gets hard to explore the design space very much, because you can't [INAUDIBLE] to maybe too many levels. So one thing we can do is try to write the design problem as an optimization problem. And in doing that, we define a few mathematical elements. The first are what are called objectives, or sometimes objective functions. These are what we are trying to achieve, so measures of performance of the design. We'll talk about some examples in a second.
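Here is a quick count that shows why sampling methods struggle as the number of design variables grows. A full-factorial experiment with L levels for each of n design variables needs L^n evaluations. This is a minimal sketch that assumes every variable gets the same number of levels. The function name is illustrative.

```python
def full_factorial_runs(n_variables: int, levels: int) -> int:
    """Number of design evaluations in a full-factorial experiment."""
    return levels ** n_variables


# Even a modest 3 levels per variable explodes as variables are added.
for n in (2, 5, 10, 20):
    print(n, full_factorial_runs(n, levels=3))
```

At three levels, 10 variables already need 59,049 evaluations, and 20 variables need 3,486,784,401.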
We're going to have constraints. Constraints are things that we can't violate, so requirements on the design that cannot be violated. We will have design variables. Those are the quantities that we can change, the things that we, as designers, have to make decisions about. The design variables are what we can change and what we can control. And lastly, we're going to have parameters. The word parameters is used often in different contexts. Sometimes people use the word parameter when they mean design variable. The way that I'll use it is that parameters are other quantities that take on fixed values. So they're other quantities that affect the objective and the constraints, but they take on fixed values. They're not going to be varying as we search the design space. OK. Those are the four elements of the optimization problem: objective functions, constraints, design variables, and parameters. And I'm going to go through each of these.
78 00:04:41,600 --> 00:04:43,650 And we'll define it mathematically and also 79 00:04:43,650 --> 00:04:50,906 think about what it might be in terms of physical examples. 80 00:04:50,906 --> 00:04:53,470 So let's start off thinking about design variables. 81 00:04:56,640 --> 00:04:58,340 So again, the design variables, these 82 00:04:58,340 --> 00:05:03,980 are the qualities or the attributes of the design 83 00:05:03,980 --> 00:05:06,020 that we have control over as a designer, 84 00:05:06,020 --> 00:05:08,250 things that we can change. 85 00:05:08,250 --> 00:05:12,700 And we're going to write the design variables 86 00:05:12,700 --> 00:05:16,870 in a vector, the design variable vector or the design vector. 87 00:05:16,870 --> 00:05:19,405 We're going to give it the symbol x. 88 00:05:19,405 --> 00:05:21,990 I'll put a vector symbol above it 89 00:05:21,990 --> 00:05:25,350 just to denote that it's a vector. 90 00:05:25,350 --> 00:05:34,470 And it's going to contain n design variables, n dvs, that 91 00:05:34,470 --> 00:05:35,850 form the design space. 92 00:05:35,850 --> 00:05:38,510 So when we talk about the design space, 93 00:05:38,510 --> 00:05:40,460 we're usually talking about the space that's 94 00:05:40,460 --> 00:05:43,220 defined by these quantities in the design vector 95 00:05:43,220 --> 00:05:50,090 x, so n design variables that form the design space. 96 00:05:53,840 --> 00:05:58,810 And so mathematically, the vector x 97 00:05:58,810 --> 00:06:05,200 will be x1, x2, down to x10. 98 00:06:08,140 --> 00:06:08,640 OK. 99 00:06:08,640 --> 00:06:12,560 And the idea is going to be that as we run the optimization 100 00:06:12,560 --> 00:06:16,100 algorithm, it's going to be searching 101 00:06:16,100 --> 00:06:17,230 over different values of x. 102 00:06:17,230 --> 00:06:19,224 So this is what's going to be changing. 
And what you'll see is that different optimization methods use different rules and different information to figure out how to move around in the design space. OK. So what are some examples? If we think about, say, aircraft conceptual design, what would be examples of physical quantities that might be in the vector x? What? Weight. Weight normally would not be a design variable, because weight is not usually something that you can control directly. It's something that typically would be computed as a function of other things that would be design variables. So weight is often going to show up as an objective.

AUDIENCE: Wingspan.

PROFESSOR: Wingspan. Yeah. So wingspan would be one. What else? What is it? Number of rotors?

AUDIENCE: [INAUDIBLE]

PROFESSOR: Oh, motors, number of engines? Motors?

AUDIENCE: [INAUDIBLE]

PROFESSOR: OK. Sure, number of motors or engines. Yup. What else?
AUDIENCE: [INAUDIBLE]

PROFESSOR: Payload size? Yeah. Payload weight or payload size or payload number of passengers.

AUDIENCE: Color?

PROFESSOR: Color? Who said color? OK. I'm going to spell it with a U just to have my objection. Yeah. [INAUDIBLE] mark number whatever. OK. So these are all things that I think you can directly control and make a decision about. And what you notice is that these quantities can be of different kinds. Wingspan is a continuous real variable that can take on values between 0 and whatever upper limit you set on it. Number of motors, by contrast, is a discrete variable, right? It can be 1 or 2 or 3, but it can't be 2.5 or 2.53. And as we talk about the optimization methods, you'll see that discrete or integer variables in particular cause a lot of problems. It's much harder to optimize a problem that's got something like number of engines or number of motors in it. OK. So that will be your design vector x.
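To make the difference between continuous and discrete design variables concrete, each variable can carry its own bounds and an "integer" flag. This is a minimal sketch under that assumption. The class, the field names, and the numeric bounds are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignVariable:
    name: str
    lower: float
    upper: float
    integer: bool = False  # True for discrete quantities such as engine count

    def is_valid(self, value: float) -> bool:
        """Check the bounds and, for discrete variables, integrality."""
        in_bounds = self.lower <= value <= self.upper
        return in_bounds and (not self.integer or float(value).is_integer())


# Hypothetical aircraft design variables; the bounds are illustrative only.
design_space = [
    DesignVariable("wingspan_m", 0.0, 80.0),            # continuous
    DesignVariable("n_engines", 1, 4, integer=True),    # discrete
]

x = [42.5, 2]  # one candidate design vector x = (x1, x2)
print(all(v.is_valid(xi) for v, xi in zip(design_space, x)))  # True
print(design_space[1].is_valid(2.5))                           # False: no half engines
```

The integrality check is what a purely continuous optimizer cannot respect on its own. That is one reason integer variables make the problem harder.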
Next, let's talk about objectives, or objective functions. The objectives can be a vector as well, a vector J, and I'll put the vector symbol on there to denote when I'm talking about a vector. It's going to contain z system responses or characteristics that we're trying to either minimize or maximize. OK. All right, so these are going to be measures of performance, cost, schedule, anything that we might want to push either as high as possible or as low as possible in making decisions about our designs. So what might be some examples, again? For the aircraft conceptual design example, what would be objective functions? Range. Yup. Range might be an objective if you wanted to maximize range. What is it? Speed. Yup, speed sometimes. Fuel consumption, fuel burn, yup. So weight often shows up as an objective.
Maximum takeoff weight is often used as a target for cost and fuel burn and everything kind of rolled up. Cost might be another one: operating cost, or it might be the entire lifecycle cost. It could be environmental impact, it could be noise, or different kinds of things. And so-- this is right here. So we would write the vector J as J1, J2, down to Jz if we had z objectives. It turns out that most optimization algorithms work with a single scalar objective. There are some ways to do multi-objective optimization if you have more than one objective and you want to look for designs that at the same time maximize range and minimize cost, say. There are ways to do it. But most optimization methods work with a single scalar objective. And if that's the case, then the way you would normally proceed is by weighting the different objectives. So you would have weight 1 times J1 plus weight 2 times J2 and so on.
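The weighted-sum scalarization just described can be sketched as follows. This sketch assumes each weight combines two things: a normalizing scale, so that range in meters and cost in dollars become comparable, and a designer preference. The objective values and weights are made up. Range enters with a negative weight because maximizing a quantity is the same as minimizing its negative.

```python
def weighted_sum(objectives, weights):
    """Collapse objective values J_1..J_z into one scalar J = sum(w_i * J_i)."""
    if len(objectives) != len(weights):
        raise ValueError("need one weight per objective")
    return sum(w * j for w, j in zip(weights, objectives))


# Hypothetical values: range (to maximize) and cost (to minimize).
range_m, cost_usd = 5.0e6, 2.0e7
# weight = (preference, with sign) / (typical scale of that objective)
weights = [-1.0 / 1.0e6, 0.5 / 1.0e7]
J = weighted_sum([range_m, cost_usd], weights)
print(J)  # about -4.0: range contributes about -5, cost about +1
```

Changing the 0.5 on cost relative to the 1.0 on range expresses "I care more about one than the other". Changing the scales only normalizes units.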
So you would roll all of the different objectives up, weight them, add them together, and get a single objective function that you would then use in your optimization algorithm. These weights would represent partly some kind of normalizing factor, because range might be measured in kilometers, on the order of thousands, whereas speed might be measured in meters per second, on the order of hundreds. Cost could be in dollars. So these things have different units and different scales. But you could also use these weights to express your design preferences: I care more about cost than I do about noise, or I care more about fuel burn than I do about noise. You could use the weights in that way. OK. So if we have multiple objectives, either we have to roll them up into a scalar objective and think about what these weights would be, or there are some methods to do what's called multi-objective optimization. I'll add one more to the list. This is Professor [INAUDIBLE]'s preferred-- does he talk about [INAUDIBLE]?
I think it's payload fuel efficiency-- no, payload fuel energy intensity. It's a measure of fuel burned per person transported 1 nautical mile or 1 kilometer. So it combines how far you fly, how many people you transport, and how much fuel you burn into one measure of efficiency. OK. So, design variables and objectives: design variables are what we can change. The objectives are the measures of how good the design is, and they give us some direction for change. Then we're also going to have constraints. Again, constraints are what we cannot violate, so they act as the boundaries of the design space. Here the design space is defined by those design variables. A nice visual picture that I like to have in mind is that the design space is like a landscape. So think of your landscape. It's got hills, mountains. It's got valleys. It's got flat regions. The dimensions of the landscape are the design variables. So you can think about two design variables, x1 and x2.
And then the height of the landscape is the objective function. Right? So the mountains, the hills, those are places where the objective is high. And the places where you have valleys are where the objective is low. So if you're trying to minimize, say, cost, you'd be looking for good designs in the valleys. And the regions where the space is flat are regions where changing the design doesn't really change the cost or the weight or whatever the objective is. So then what are the constraints? The constraints come in and tell you where in the landscape you are allowed to be. There are going to be boundaries that say you can't go on the other side of the boundary; you have to stay within this region. They define regions where designs are allowable, where they satisfy the different kinds of constraints, and regions where your designs are not allowable. And what kinds of constraints can you get? We're going to have inequality constraints, which we'll write as gj less than or equal to 0, for j from 1 to m1.
So we'll have m1 inequality constraints. The constraint is written so that gj, for the jth constraint, has to be less than or equal to 0. So this might be a constraint on something like [INAUDIBLE] speed. You need a constraint on [INAUDIBLE] speed. You don't require the speed to be anything in particular, but you've got to make sure that you stay above the limit. You would bring everything over to the left-hand side and make the constraint less than or equal to 0. Or if you had a constraint on cost, you would say that the cost minus the maximum cost you can incur has to be less than or equal to 0. So inequality constraints often come up when there's some kind of limitation on the resources that you have, or some kind of physical limitation. And then we might also have equality constraints. We're going to give those the symbol h, so we'll write hk equal to 0. In general, we'll have m2 equality constraints. So what's an example of an equality constraint in an aircraft design problem? Lift equals weight, yeah.
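This is a minimal sketch of bringing constraints into the g ≤ 0 and h = 0 forms and then checking whether a candidate design is feasible. The limits V_MIN and COST_MAX, the numbers, and the function names are all hypothetical. The specific speed limit is inaudible in the recording, so V_MIN stands in for a generic lower speed limit. The tolerance on the equality constraint is also an assumption: floating-point residuals are rarely exactly zero.

```python
# Hypothetical limits, for illustration only.
V_MIN = 70.0       # some minimum allowed speed, m/s
COST_MAX = 1.0e8   # maximum allowed cost, dollars


def inequality_constraints(speed, cost):
    """Each entry g_j must be <= 0 for a feasible design."""
    return [
        V_MIN - speed,     # speed >= V_MIN   rewritten as  V_MIN - speed <= 0
        cost - COST_MAX,   # cost <= COST_MAX rewritten as  cost - COST_MAX <= 0
    ]


def equality_constraints(lift, weight):
    """Each entry h_k must be 0: here, lift equals weight."""
    return [lift - weight]


def is_feasible(g, h, tol=1e-6):
    """Feasible if every g_j <= 0 and every |h_k| <= tol."""
    return all(gj <= 0.0 for gj in g) and all(abs(hk) <= tol for hk in h)


g = inequality_constraints(speed=80.0, cost=9.0e7)
h = equality_constraints(lift=5.0e5, weight=5.0e5)
print(is_feasible(g, h))  # True: both g_j are negative and lift - weight is 0
```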
So that's kind of the classic one that shows up in the aircraft design problem: lift equals weight, or lift minus weight equals 0. If you have more sophisticated models-- like, if you have structural models in there and you're using a finite element analysis, then the governing equations of the finite element analysis would also show up here, the laws of whatever PDE you discretized. Or if there's a CFD model, it's the conservation of mass, momentum, energy, and so on. So often we can have lots and lots of equality constraints. So we have inequality constraints and we have equality constraints. And then we can also have design variable bounds. Even though these are kind of like inequality constraints, they're usually separated out, because they're quite often treated differently. Remember, our design vector was x1, x2, down to xn. We might bound design variable xi by a lower bound and by an upper bound, right? So for example, wingspan was one of the design variables. The lower bound could be 0, but it might even be bigger than 0.
And the upper bound might be a limit that we know exists because [INAUDIBLE] in the [INAUDIBLE] constraints, the 80-meter box for a big aircraft. Or it may be that we know we want designs between 40 and 50, and so we put those bounds on so we don't waste time searching for designs elsewhere. So we may have lower and upper bounds for some or all of our design variables. OK. So now we have all the pieces. I'm going to call the parameters p. What are things that could be parameters? Material properties could be parameters. Sometimes speed is actually a parameter. Often, aircraft are designed with speed fixed, not something that you can vary, so Mach number often shows up as a parameter. A parameter is anything that goes into the problem and affects the constraints or the objectives but is not allowed to vary. So putting all that together, what does the optimization problem look like? Minimize over x J of x. And x here is a vector. But we're going to, again, consider just a scalar objective function.
So if we have lots of objectives, we're going to roll them all up together, subject to the constraints, the hk of x. And actually, let me write explicitly that this is a function of x and p. It's quite common to put the semicolon here to show that J, the objective, depends on the parameters, but that the parameters are different from the design variables. They're not the things we can change; these are just things that affect [INAUDIBLE], but they do affect J. So the same thing for the [INAUDIBLE] vector as well. For the constraints, hk is equal to 0, for k equal 1, 2, up to m2; I think we had m2 of those. Then gj, again a function of x and p, is less than or equal to 0, and j goes from 1 to m1. And then each xi lies between its lower bound-- how did I write [INAUDIBLE] that way-- and its upper bound, where some of these bounds might be minus infinity or infinity if we don't actually have bounds. OK. Yup.

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yeah. Yeah.
So again, however the problem shows up, we can always write it in this form. So exactly: if we actually wanted to maximize something, say range, we would just minimize negative range. Any objective that you want to maximize, you multiply by a negative sign and minimize. How about the inequality constraints, gj less than or equal to 0? You can always just rearrange one and turn it around, right? Multiply by minus 1 and flip the sign if it's a greater-than-or-equal-to constraint. So writing this as a minimization, with these constraints less than or equal to 0, gives the standard form for the nonlinear optimization problem. It's sometimes called an NLP, for nonlinear program. But you can always get your problem into this form. So I forgot to scroll through this thing. OK. So let's see, any questions? Is it clear? This is the mathematical statement: a design problem written as an optimization problem, finding the xs that minimize my objective while satisfying the constraints.
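Here is the standard form described above, written out as a math block. The symbols x^L and x^U for the lower and upper bounds are notation chosen here; either may be −∞ or +∞ when a variable is unbounded. The last line restates the sign trick for maximization.

```latex
\begin{aligned}
\min_{\mathbf{x}} \quad & J(\mathbf{x}; \mathbf{p}) \\
\text{subject to} \quad & g_j(\mathbf{x}; \mathbf{p}) \le 0, && j = 1, \dots, m_1, \\
& h_k(\mathbf{x}; \mathbf{p}) = 0, && k = 1, \dots, m_2, \\
& x_i^{L} \le x_i \le x_i^{U}, && i = 1, \dots, n, \\[4pt]
\max_{\mathbf{x}} R(\mathbf{x}) \;\; & \Longleftrightarrow \;\; \min_{\mathbf{x}} \bigl(-R(\mathbf{x})\bigr).
\end{aligned}
```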
And that will be a good design. All right. So what we're going to talk about now is how we might go about solving this problem. So this is number two. I'm going to focus mostly on unconstrained optimization problems. I'll talk a little bit about how to handle the constraints, maybe in the next lecture. So, unconstrained optimization. Many optimization algorithms, in fact almost all of them, are iterative-- [INAUDIBLE]-- which means that we're going to iterate to try to solve this problem. So what do I mean by that? We're going to have a guess for x. We're going to update that guess in some way. And we're going to keep iterating, keep updating the guess for x, getting a new guess for x, a new guess for x, a new guess for x, until we think we've solved the problem. The way we can write the kinds of optimization algorithms we're going to talk about is that x^q is equal to x^(q-1) plus alpha^q times S^q. And this is going to be true for q equal 1, 2, up to however many iterations we have. OK.
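Here is the same update in symbols. Summing the steps (telescoping) shows that after Q iterations the design is the starting guess x^0 plus the accumulated steps. The bold vector notation is a choice made here.

```latex
\mathbf{x}^{q} = \mathbf{x}^{q-1} + \alpha^{q}\,\mathbf{S}^{q}, \quad q = 1, 2, \dots
\qquad\Longrightarrow\qquad
\mathbf{x}^{Q} = \mathbf{x}^{0} + \sum_{q=1}^{Q} \alpha^{q}\,\mathbf{S}^{q}
```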
453 00:24:06,790 --> 00:24:08,260 So what are all these symbols? 454 00:24:08,260 --> 00:24:11,670 So first of all, q is going to be the iteration number. 455 00:24:14,960 --> 00:24:18,950 So the superscript q here denotes 456 00:24:18,950 --> 00:24:23,070 our guess for whatever quantity or our value for whatever 457 00:24:23,070 --> 00:24:26,160 quantity on iteration q. 458 00:24:26,160 --> 00:24:30,070 And specifically, xq is going to be 459 00:24:30,070 --> 00:24:38,003 our guess for x on iteration q. 460 00:24:38,003 --> 00:24:38,502 OK. 461 00:24:38,502 --> 00:24:46,300 So it's our current guess for the design variables. 462 00:24:46,300 --> 00:24:50,710 And what you can see is that the guess on iteration q 463 00:24:50,710 --> 00:24:53,850 is going to be given by the guess of the design variables 464 00:24:53,850 --> 00:24:58,880 on iteration q minus 1, so what we had before, plus this update term. 465 00:24:58,880 --> 00:25:02,900 And this update term is made up of a scalar alpha q. 466 00:25:02,900 --> 00:25:06,562 And it's got a vector Sq. 467 00:25:06,562 --> 00:25:09,070 So let me explain it, and then we can write it down. 468 00:25:09,070 --> 00:25:12,440 This thing here is called the search direction. 469 00:25:12,440 --> 00:25:15,564 And this thing here is going to be our scalar [INAUDIBLE]. 470 00:25:15,564 --> 00:25:16,980 So again, what I want you to do is 471 00:25:16,980 --> 00:25:18,030 I want you to picture the design space 472 00:25:18,030 --> 00:25:19,240 as being like a landscape. 473 00:25:19,240 --> 00:25:19,740 Right? 474 00:25:19,740 --> 00:25:20,610 So that's the x's. 475 00:25:20,610 --> 00:25:24,190 And again, the height of the landscape 476 00:25:24,190 --> 00:25:26,790 is going to be a measure of j. 477 00:25:26,790 --> 00:25:29,020 So I'm standing in the landscape. 478 00:25:29,020 --> 00:25:32,162 And I'm standing at point x q minus 1. 479 00:25:32,162 --> 00:25:33,370 We can [INAUDIBLE] q equal 1.
480 00:25:33,370 --> 00:25:35,630 So I'm standing at a point x0. 481 00:25:35,630 --> 00:25:37,454 And I want to figure out a way to go. 482 00:25:37,454 --> 00:25:38,495 I'm going to take a step. 483 00:25:38,495 --> 00:25:41,500 I'm going to move in the landscape to get my next one, 484 00:25:41,500 --> 00:25:43,100 to get my x1. 485 00:25:43,100 --> 00:25:46,650 And the way I move is that first of all, 486 00:25:46,650 --> 00:25:49,270 I'm going to pick a direction to move in. 487 00:25:49,270 --> 00:25:52,049 So Sq is my search direction, so first of all, 488 00:25:52,049 --> 00:25:53,090 I'm going to look around. 489 00:25:53,090 --> 00:25:55,170 I'm going to pick a direction to move in. 490 00:25:55,170 --> 00:25:56,950 And maybe you have some intuition 491 00:25:56,950 --> 00:25:58,810 that if I'm trying to minimize something, 492 00:25:58,810 --> 00:26:01,386 I'm going to try to pick a downhill direction, right? 493 00:26:01,386 --> 00:26:04,234 So I'm looking for the bottom of a valley. 494 00:26:04,234 --> 00:26:05,900 I'm standing at a point in the landscape, 495 00:26:05,900 --> 00:26:08,640 I might want to walk downhill. 496 00:26:08,640 --> 00:26:10,476 So I'm going to pick a direction Sq. 497 00:26:10,476 --> 00:26:12,100 And then once I've picked my direction, 498 00:26:12,100 --> 00:26:15,280 I want to figure out how far to walk in that direction. 499 00:26:15,280 --> 00:26:16,465 And that's the alpha q. 500 00:26:16,465 --> 00:26:18,920 So the Sq is the direction. 501 00:26:18,920 --> 00:26:21,440 And then alpha q is going to be the size of the step. 502 00:26:21,440 --> 00:26:24,100 So this is a scalar in that direction. 503 00:26:24,100 --> 00:26:26,400 So standing in the landscape, pick a direction. 504 00:26:26,400 --> 00:26:28,295 Take a step [INAUDIBLE] alpha q. 505 00:26:28,295 --> 00:26:31,320 So that gets me to the new point in the landscape. 506 00:26:31,320 --> 00:26:33,600 Then I'm going to stop, and I'm going to look around.
507 00:26:33,600 --> 00:26:35,252 I'm going to find a new direction. 508 00:26:35,252 --> 00:26:36,710 And I'm going to do the same thing, 509 00:26:36,710 --> 00:26:37,834 figure out how far to walk. 510 00:26:37,834 --> 00:26:40,550 We're just going to keep walking around the landscape picking 511 00:26:40,550 --> 00:26:43,490 directions and figuring out how far to walk 512 00:26:43,490 --> 00:26:46,440 in those directions. 513 00:26:46,440 --> 00:26:46,940 OK. 514 00:26:46,940 --> 00:26:48,930 And of course, then the trick comes. 515 00:26:48,930 --> 00:26:50,160 How do you do this? 516 00:26:50,160 --> 00:26:52,480 And that's what the different optimization algorithms 517 00:26:52,480 --> 00:26:59,180 do, different ways of picking search directions and steps. 518 00:26:59,180 --> 00:26:59,680 OK. 519 00:26:59,680 --> 00:27:07,200 So Sq is, again, our vector search direction. 520 00:27:07,200 --> 00:27:11,760 So it's got the same dimension as x. 521 00:27:11,760 --> 00:27:13,572 Somebody is smoking out the window. 522 00:27:13,572 --> 00:27:14,544 Do you guys smell it? 523 00:27:23,780 --> 00:27:43,220 [INAUDIBLE] [AUDIO OUT] is [AUDIO OUT] [INAUDIBLE] q 524 00:27:43,220 --> 00:27:45,130 and alpha q is a strong consideration, too. 525 00:27:50,412 --> 00:27:51,870 And then the other thing we're also 526 00:27:51,870 --> 00:27:54,300 going to need is we're going to need 527 00:27:54,300 --> 00:27:58,900 an x0, which is going to be our initial guess, all right? 528 00:27:58,900 --> 00:28:02,370 So we're always going to have to initialize from some x0, 529 00:28:02,370 --> 00:28:05,035 so that we can start this whole process of walking 530 00:28:05,035 --> 00:28:06,475 through the design space. 531 00:28:09,785 --> 00:28:10,285 OK. 532 00:28:26,930 --> 00:28:28,860 It's very [INAUDIBLE] wavy. 533 00:28:28,860 --> 00:28:34,670 Actually, that's a [AUDIO OUT] satisfy this problem.
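One concrete way to pick Sq and alpha q, sketched in Python under stated assumptions: the search direction is taken to be the steepest-descent direction S^q = -grad J(x^(q-1)), and the step length comes from a backtracking (Armijo) line search that halves alpha until J drops enough. Both are standard textbook choices swapped in for illustration; this is not a claim about which rules fminsearch or fminunc use, and the function name and parameters are hypothetical.

```python
import math
from typing import Callable, List

Vec = List[float]


def steepest_descent(J: Callable[[Vec], float], grad: Callable[[Vec], Vec],
                     x0: Vec, max_iter: int = 1000, tol: float = 1e-8) -> Vec:
    """Iterate x^q = x^(q-1) + alpha^q S^q, with S^q = -grad J(x^(q-1))."""
    x = list(x0)                                   # x^0: the initial guess
    for q in range(1, max_iter + 1):               # q: iteration number
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break                                  # gradient is (nearly) zero: stop
        S = [-gi for gi in g]                      # downhill search direction S^q
        slope = sum(gi * si for gi, si in zip(g, S))   # directional derivative, < 0
        Jx, alpha = J(x), 1.0
        # Backtracking: halve alpha until J decreases enough (Armijo condition).
        while (alpha > 1e-12 and
               J([xi + alpha * si for xi, si in zip(x, S)]) > Jx + 1e-4 * alpha * slope):
            alpha *= 0.5
        x = [xi + alpha * si for xi, si in zip(x, S)]  # the update step
    return x


# Hypothetical test problem with its minimum at (1, -2).
J = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
print(steepest_descent(J, grad, [0.0, 0.0]))
```

The `max_iter` cap and the gradient-norm test are the two stopping rules; the lecture returns to the optimality condition that the gradient tolerance approximates.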
534 00:28:42,050 --> 00:28:47,640 [AUDIO OUT] You can [AUDIO OUT] make guarantees 535 00:28:47,640 --> 00:28:52,950 about whether the final x that comes out after you iterate 536 00:28:52,950 --> 00:28:55,920 satisfies the assumptions [AUDIO OUT]. 537 00:28:58,580 --> 00:29:17,450 So there are things that [AUDIO OUT] [INAUDIBLE] has 538 00:29:17,450 --> 00:29:20,160 to be true. 539 00:29:20,160 --> 00:29:22,630 So [AUDIO OUT] I just want to minimize j of x. 540 00:29:22,630 --> 00:29:24,146 Forget about [INAUDIBLE]. 541 00:29:24,146 --> 00:29:26,800 It has to be true. 542 00:29:26,800 --> 00:29:30,510 Yeah, what's that mathematical condition? 543 00:29:30,510 --> 00:29:32,124 What has to be there? 544 00:29:32,124 --> 00:29:34,534 Yeah, [INAUDIBLE]. 545 00:29:34,534 --> 00:29:46,670 [AUDIO OUT] [INAUDIBLE] depending on [AUDIO OUT], 546 00:29:46,670 --> 00:30:02,450 we can make guarantees about [AUDIO OUT] additional 547 00:30:02,450 --> 00:30:05,280 [INAUDIBLE] that you need [INAUDIBLE] derivative has 548 00:30:05,280 --> 00:30:06,787 to be positive for a minimum. 549 00:30:06,787 --> 00:30:08,870 And we'll talk about what that means, because this 550 00:30:08,870 --> 00:30:10,945 is a vector [INAUDIBLE]. 551 00:30:10,945 --> 00:30:12,690 So I call these x's. 552 00:30:12,690 --> 00:30:14,380 I mean, they're iterates. 553 00:30:14,380 --> 00:30:15,270 They're on the way. 554 00:30:15,270 --> 00:30:17,353 And again, depending on what algorithm we use, 555 00:30:17,353 --> 00:30:20,253 we will get to at least a solution of that problem. 556 00:30:23,060 --> 00:30:33,028 I want to-- that's strange-- I want to show you this idea. 557 00:30:43,228 --> 00:30:46,194 It's a little MATLAB demo. 558 00:30:46,194 --> 00:30:48,860 Because I think it will help you think about some of the issues.
559 00:30:48,860 --> 00:30:54,655 And then we'll start talking about how we actually can-- 560 00:30:54,655 --> 00:30:58,362 well, we'll talk about some of the mathematical quantities 561 00:30:58,362 --> 00:31:00,820 that we're going to need to be able to compute these alphas 562 00:31:00,820 --> 00:31:01,000 and S's. 563 00:31:01,000 --> 00:31:02,540 And then we'll talk about the different methods 564 00:31:02,540 --> 00:31:03,850 and how they do those. 565 00:31:08,450 --> 00:31:15,090 So [INAUDIBLE] still simple [INAUDIBLE] 566 00:31:15,090 --> 00:31:17,116 script here that does optimization. 567 00:31:19,940 --> 00:31:21,710 It does optimization on the function. 568 00:31:21,710 --> 00:31:23,730 This is the peaks function in MATLAB. 569 00:31:23,730 --> 00:31:25,919 So this is the landscape. 570 00:31:25,919 --> 00:31:27,210 There are two design variables. 571 00:31:27,210 --> 00:31:28,418 Here, they're called x and y. 572 00:31:28,418 --> 00:31:30,200 So this is x1 and x2. 573 00:31:30,200 --> 00:31:34,320 And this is j on the vertical axis. 574 00:31:34,320 --> 00:31:35,385 So this is the objective. 575 00:31:35,385 --> 00:31:36,820 And we're going to try to maximize 576 00:31:36,820 --> 00:31:38,403 just because it's easier to see what's 577 00:31:38,403 --> 00:31:39,784 going on with maximizing. 578 00:31:39,784 --> 00:31:41,950 So we've asked [INAUDIBLE] constraint to [INAUDIBLE] 579 00:31:41,950 --> 00:31:45,080 an unconstrained problem, we've asked the optimizer 580 00:31:45,080 --> 00:31:48,240 to find the design vector x, which in this case 581 00:31:48,240 --> 00:31:53,430 is (x, y), two components, that maximizes the objective j where 582 00:31:53,430 --> 00:32:00,260 the height and the color are the measure of j. 583 00:32:00,260 --> 00:32:01,990 And this little asterisk sitting here 584 00:32:01,990 --> 00:32:03,670 is the initial guess that I gave.
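For readers without MATLAB, the landscape in this demo can be reproduced from the formula that MATLAB documents for its `peaks` function. The Python transcription below is a sketch; `peaks` here is just a local function, not a MATLAB binding. Since the demo maximizes the surface, in the standard form it corresponds to minimizing -peaks(x, y).

```python
import math


def peaks(x: float, y: float) -> float:
    """Python transcription of the formula behind MATLAB's peaks(x, y)."""
    return (3.0 * (1.0 - x) ** 2 * math.exp(-x ** 2 - (y + 1.0) ** 2)
            - 10.0 * (x / 5.0 - x ** 3 - y ** 5) * math.exp(-x ** 2 - y ** 2)
            - (1.0 / 3.0) * math.exp(-(x + 1.0) ** 2 - y ** 2))


# The standard-form objective for the maximization in the demo.
J = lambda v: -peaks(v[0], v[1])
print(peaks(-1.0, 1.0))   # height at the demo's initial guess x0 = (-1, 1)
```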
585 00:32:03,670 --> 00:32:05,641 When I called it, I gave it an initial guess 586 00:32:05,641 --> 00:32:10,730 of-- what did I give it-- minus 1, 1. 587 00:32:10,730 --> 00:32:11,230 OK. 588 00:32:11,230 --> 00:32:12,034 So this is x0. 589 00:32:12,034 --> 00:32:13,950 And I'm going to start it running in a second. 590 00:32:13,950 --> 00:32:17,485 And what you're going to see is a sequence of little asterisks. 591 00:32:17,485 --> 00:32:21,220 And each asterisk is a new guess or a new iterate. 592 00:32:21,220 --> 00:32:22,440 So this is x0. 593 00:32:22,440 --> 00:32:24,540 It's going to compute x1, x2, x3, 594 00:32:24,540 --> 00:32:28,060 through the script all the way until, hopefully, 595 00:32:28,060 --> 00:32:31,330 it gets up to the top here. 596 00:32:31,330 --> 00:32:32,540 So we'll set it going. 597 00:32:35,880 --> 00:32:42,440 And you can see it's doing what you would want it to do, 598 00:32:42,440 --> 00:32:44,242 which is it was asked to maximize. 599 00:32:44,242 --> 00:32:45,200 So it's looking around. 600 00:32:45,200 --> 00:32:46,491 It's picking search directions. 601 00:32:46,491 --> 00:32:48,430 And then it's deciding how far to step. 602 00:32:48,430 --> 00:32:51,480 On each iteration, it's moving in a particular direction 603 00:32:51,480 --> 00:32:52,660 and taking a step. 604 00:32:52,660 --> 00:32:58,500 And you can see that it gets to the top of the hill. 605 00:32:58,500 --> 00:33:04,310 So this is an optimization method that's 606 00:33:04,310 --> 00:33:05,460 called Nelder-Mead simplex. 607 00:33:05,460 --> 00:33:08,320 And I'll show you just a little bit about it later on. 608 00:33:08,320 --> 00:33:09,940 That's fminsearch in MATLAB. 609 00:33:13,114 --> 00:33:14,530 And we'll talk in the next lecture 610 00:33:14,530 --> 00:33:17,630 about how to call the MATLAB optimization functions, 611 00:33:17,630 --> 00:33:19,720 because they're really powerful.
612 00:33:19,720 --> 00:33:24,307 And they can really help you if you're doing design problems. 613 00:33:24,307 --> 00:33:25,390 But that one's fminsearch. 614 00:33:25,390 --> 00:33:26,170 Do you want to ask a question? 615 00:33:26,170 --> 00:33:27,070 Yup. 616 00:33:27,070 --> 00:33:28,004 AUDIENCE: [INAUDIBLE] 617 00:33:28,004 --> 00:33:28,670 PROFESSOR: Yeah. 618 00:33:28,670 --> 00:33:30,040 We'll do that in one second. 619 00:33:30,040 --> 00:33:31,860 Yeah. 620 00:33:31,860 --> 00:33:39,314 I just want to show you this-- oops-- still running. 621 00:33:39,314 --> 00:33:40,200 Hang on. 622 00:33:40,200 --> 00:33:42,150 So it's still going. 623 00:33:42,150 --> 00:33:46,375 So fminsearch, you'll see it-- where is [INAUDIBLE]? 624 00:33:46,375 --> 00:33:48,510 It's interesting that it's still running. 625 00:33:48,510 --> 00:33:52,265 So fminsearch, this is why you should always 626 00:33:52,265 --> 00:33:54,694 put a maximum number of iterations in your code. 627 00:33:54,694 --> 00:33:56,860 Because I'm pretty sure it's at the top of the hill, 628 00:33:56,860 --> 00:33:59,374 but the [INAUDIBLE] probably fit two [INAUDIBLE]. 629 00:33:59,374 --> 00:33:59,874 Yup. 630 00:33:59,874 --> 00:34:01,190 AUDIENCE: [INAUDIBLE] 631 00:34:01,190 --> 00:34:01,790 PROFESSOR: No. 632 00:34:01,790 --> 00:34:03,540 It changes each iteration. 633 00:34:03,540 --> 00:34:04,130 Yeah. 634 00:34:04,130 --> 00:34:06,600 And we'll talk about how to compute it. 635 00:34:06,600 --> 00:34:08,100 So I think I'm going to kill that. 636 00:34:08,100 --> 00:34:11,992 So fminsearch is actually-- it's called a Nelder-Mead simplex. 637 00:34:11,992 --> 00:34:13,945 It's kind of a pattern search. 638 00:34:13,945 --> 00:34:15,319 So it's sampling the design space 639 00:34:15,319 --> 00:34:16,190 and doing a pattern search. 640 00:34:16,190 --> 00:34:17,273 And we'll talk about that. 641 00:34:17,273 --> 00:34:20,210 But I also just want to-- oops, not that one.
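A compact Nelder-Mead simplex sketch in pure Python, including the maximum-iteration cap recommended above. It uses the textbook reflection, expansion, contraction, and shrink steps with standard coefficients; it is not MATLAB's fminsearch source, and its defaults (initial simplex size `step`, tolerance `tol`, `max_iter`) are assumptions. Note that it never computes a gradient: it only compares function values at the simplex vertices.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def nelder_mead(f: Callable[[Vec], float], x0: Vec, step: float = 0.5,
                max_iter: int = 500, tol: float = 1e-10) -> Tuple[Vec, int]:
    """Minimize f with a basic Nelder-Mead simplex; returns (best point, iterations)."""
    n = len(x0)
    simplex = [list(x0)]                  # initial simplex: x0 plus n axis offsets
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]

    for it in range(1, max_iter + 1):
        order = sorted(range(n + 1), key=lambda k: values[k])   # best ... worst
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            return simplex[0], it         # converged: f nearly equal at all vertices
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        worst = simplex[-1]

        def along(t: float) -> Vec:       # centroid + t * (centroid - worst)
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)
        fr = f(xr)                        # reflection
        if fr < values[0]:
            xe = along(2.0)
            fe = f(xe)                    # expansion
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            if fr < values[-1]:
                xc = along(0.5)           # outside contraction
                fc = f(xc)
                accept = fc <= fr
            else:
                xc = along(-0.5)          # inside contraction
                fc = f(xc)
                accept = fc < values[-1]
            if accept:
                simplex[-1], values[-1] = xc, fc
            else:                         # shrink every vertex toward the best one
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (vi - b) for b, vi in zip(best, v)]
                                    for v in simplex[1:]]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    best_k = min(range(n + 1), key=lambda k: values[k])
    return simplex[best_k], max_iter      # stopped by the iteration cap
```

To maximize, as in the demo, pass the negated objective. The second return value shows whether the run converged or was stopped by `max_iter`.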
642 00:34:23,159 --> 00:34:25,615 How does it measure-- well, we'll talk about that as well. 643 00:34:25,615 --> 00:34:26,114 Yup, 0. 644 00:34:28,886 --> 00:34:30,280 AUDIENCE: [INAUDIBLE] 645 00:34:30,280 --> 00:34:32,777 PROFESSOR: Nelder-Mead simplex is the method that's 646 00:34:32,777 --> 00:34:33,860 implemented in fminsearch. 647 00:34:37,249 --> 00:34:39,790 So there's another one-- oops, now I'm [INAUDIBLE] the wrong. 648 00:34:39,790 --> 00:34:41,920 Actually, [INAUDIBLE]. 649 00:34:41,920 --> 00:34:43,690 There's another one called fminunc, 650 00:34:43,690 --> 00:34:48,422 which stands for minimization unconstrained. 651 00:34:48,422 --> 00:34:50,380 And this is one that uses gradient information. 652 00:34:50,380 --> 00:34:54,130 And again, we're going to talk about how that works. 653 00:34:54,130 --> 00:34:57,550 But let me just run the same, so you can see. [INAUDIBLE] 654 00:34:57,550 --> 00:34:58,740 after 49 iterations. 655 00:34:58,740 --> 00:35:02,160 I don't know why it was continuing to go. 656 00:35:02,160 --> 00:35:04,505 But let's run from the same initial condition. 657 00:35:15,287 --> 00:35:16,790 OK. 658 00:35:16,790 --> 00:35:19,670 So it takes a little bit longer per iteration. 659 00:35:19,670 --> 00:35:22,840 But did you see how it basically jumped all the way to the top of the hill 660 00:35:22,840 --> 00:35:24,315 in the second iteration? 661 00:35:24,315 --> 00:35:26,315 And you'll see it when we talk about algorithms. 662 00:35:26,315 --> 00:35:29,440 But that's because we actually computed the gradient here 663 00:35:29,440 --> 00:35:31,050 rather than just sampling locally, 664 00:35:31,050 --> 00:35:33,837 which is what the Nelder-Mead simplex does with triangles 665 00:35:33,837 --> 00:35:35,670 and flipping them over and taking a long time. 666 00:35:35,670 --> 00:35:37,280 This one used a gradient.
667 00:35:37,280 --> 00:35:39,660 And you can see just how quickly we 668 00:35:39,660 --> 00:35:42,790 got to the top of the hill, 6 iterations, 24 function 669 00:35:42,790 --> 00:35:43,350 evaluations. 670 00:35:43,350 --> 00:35:45,937 And again, we'll talk more about the different aspects. 671 00:35:45,937 --> 00:35:47,770 But this is definitely a message that if you 672 00:35:47,770 --> 00:35:50,500 can compute gradients-- which we can't always do, 673 00:35:50,500 --> 00:35:52,420 in this case we can do it analytically-- 674 00:35:52,420 --> 00:35:58,540 you can get really quickly to an optimal solution. 675 00:35:58,540 --> 00:36:01,535 And so fminunc, and fmincon, which is the constrained sort 676 00:36:01,535 --> 00:36:03,954 of version of fminunc, can be really powerful. 677 00:36:03,954 --> 00:36:05,370 So let's see, Kevin wanted to know 678 00:36:05,370 --> 00:36:07,824 what happened if we started from a different solution. 679 00:36:07,824 --> 00:36:09,457 So what would you like? 680 00:36:18,000 --> 00:36:20,340 OK. 681 00:36:20,340 --> 00:36:23,620 So do you want to pick a-- so somewhere here? 682 00:36:23,620 --> 00:36:27,545 So like minus 1, minus 1? 683 00:36:27,545 --> 00:36:30,252 Let's see. 684 00:36:30,252 --> 00:36:31,230 OK. 685 00:36:31,230 --> 00:36:33,969 So there's the new asterisk. 686 00:36:33,969 --> 00:36:35,594 So what do you think's going to happen? 687 00:36:40,040 --> 00:36:43,460 So it goes to the local peak. 688 00:36:43,460 --> 00:36:45,350 It's going to converge pretty quickly. 689 00:36:45,350 --> 00:36:47,901 So while an advantage of gradient-based 690 00:36:47,901 --> 00:36:50,150 optimization is that it's super efficient, because you 691 00:36:50,150 --> 00:36:58,440 use the gradient information and you go to the [INAUDIBLE] 692 00:36:58,440 --> 00:37:02,086 in the design space and [INAUDIBLE] you can 693 00:37:02,086 --> 00:37:02,585 [INAUDIBLE]. 694 00:37:02,585 --> 00:37:05,140 We'll talk about [INAUDIBLE] condition.
695 00:37:05,140 --> 00:37:09,285 The gradient of j equals 0 and [INAUDIBLE]. 696 00:37:09,285 --> 00:37:12,870 We can't tell the difference between being here, being here, 697 00:37:12,870 --> 00:37:15,480 or being here. 698 00:37:15,480 --> 00:37:18,215 And that's kind of a drawback of a local method. 699 00:37:18,215 --> 00:37:22,220 At best, we can say that we are at a local maximum 700 00:37:22,220 --> 00:37:23,835 in this case. 701 00:37:23,835 --> 00:37:25,960 So we have to be careful about where we initialize. 702 00:37:25,960 --> 00:37:29,150 And you probably guess if we start close to this one, 703 00:37:29,150 --> 00:37:30,350 we're going to end up up there. 704 00:37:30,350 --> 00:37:31,850 If we can start somewhere over here, 705 00:37:31,850 --> 00:37:33,110 we have a good chance of getting up here. 706 00:37:33,110 --> 00:37:34,860 And if we start where we started, 707 00:37:34,860 --> 00:37:36,230 we're going to go here. 708 00:37:36,230 --> 00:37:37,090 It's one of the drawbacks. [INAUDIBLE], 709 00:37:37,090 --> 00:37:37,750 do you have a question? 710 00:37:37,750 --> 00:37:38,375 AUDIENCE: Yeah. 711 00:37:38,375 --> 00:37:43,814 I was going to ask how could you know how far to go [INAUDIBLE]? 712 00:37:43,814 --> 00:37:44,480 PROFESSOR: Yeah. 713 00:37:44,480 --> 00:37:46,021 We're going to talk about that, yeah. 714 00:37:46,021 --> 00:37:50,724 So we're going to talk about how it chooses S, which is going 715 00:37:50,724 --> 00:37:51,890 to use gradient information. 716 00:37:51,890 --> 00:37:54,070 But it's not just the gradient, you'll see. 717 00:37:54,070 --> 00:37:55,650 And then how it knows how far to go 718 00:37:55,650 --> 00:37:58,090 is that once it's picked the gradient direction, 719 00:37:58,090 --> 00:38:00,590 it looks to see the shape of the function in that direction. 720 00:38:00,590 --> 00:38:03,045 And we'll talk about how it actually knows how far to go. 721 00:38:03,045 --> 00:38:03,544 Yeah.
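The dependence on the starting point can be reproduced in one dimension. The sketch below runs plain fixed-step gradient ascent (a deliberately simple method, assumed here for illustration) on a hypothetical function with two peaks: a taller one near x = -2 and a shorter one near x = 2. Both tops have zero derivative, so the method stops at whichever one it climbs first; nothing in the local information tells it that a higher peak exists elsewhere.

```python
import math


def f(x: float) -> float:
    """Hypothetical 1-D objective with two local maxima (near x = -2 and x = 2)."""
    return math.exp(-(x - 2.0) ** 2) + 2.0 * math.exp(-(x + 2.0) ** 2)


def df(x: float) -> float:
    """Analytic derivative of f."""
    return (-2.0 * (x - 2.0) * math.exp(-(x - 2.0) ** 2)
            - 4.0 * (x + 2.0) * math.exp(-(x + 2.0) ** 2))


def gradient_ascent(x0: float, alpha: float = 0.1, tol: float = 1e-10,
                    max_iter: int = 10_000) -> float:
    """Fixed-step gradient ascent; stops where the derivative is (nearly) zero."""
    x = x0
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:
            break
        x += alpha * g        # step uphill with a fixed step length
    return x


for x0 in (1.0, -1.0):
    x_star = gradient_ascent(x0)
    print(f"start {x0:+.1f} -> x* = {x_star:+.4f}, f(x*) = {f(x_star):.4f}")
```

Started at x0 = 1 it ends on the shorter peak; started at x0 = -1 it finds the taller one.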
722 00:38:03,544 --> 00:38:07,138 Is there another question? 723 00:38:07,138 --> 00:38:09,850 AUDIENCE: [INAUDIBLE] 724 00:38:09,850 --> 00:38:11,830 PROFESSOR: Yeah. 725 00:38:11,830 --> 00:38:12,510 This one here? 726 00:38:15,730 --> 00:38:16,230 Yeah. 727 00:38:16,230 --> 00:38:17,770 It's not quite clear. 728 00:38:17,770 --> 00:38:25,810 If it were right on the saddle point-- it's not quite clear. 729 00:38:25,810 --> 00:38:30,900 And I'll see if I can-- I think it's a similar question. 730 00:38:30,900 --> 00:38:33,360 If I started over here, so let's see, 731 00:38:33,360 --> 00:38:35,140 x will be close to minus 3. 732 00:38:35,140 --> 00:38:36,990 And y will be close to 3. 733 00:38:36,990 --> 00:38:50,730 So let's start it at, like, minus 2.5 and plus 2.5. 734 00:38:50,730 --> 00:38:53,152 So I think that's over in a corner of the design space. 735 00:38:53,152 --> 00:38:56,720 Do you see any potential problems here? 736 00:38:56,720 --> 00:38:57,857 It's really flat, right? 737 00:38:57,857 --> 00:39:00,440 So when we compute the gradient, it's going to be really flat. 738 00:39:00,440 --> 00:39:02,750 It's going to know that it's not at a local minimum, 739 00:39:02,750 --> 00:39:06,560 because the second derivative is not going to be positive. 740 00:39:06,560 --> 00:39:09,296 But it can be that the algorithm has difficulty figuring out 741 00:39:09,296 --> 00:39:10,670 which way to go, which might also 742 00:39:10,670 --> 00:39:13,270 be the same thing if you started in that shoulder 743 00:39:13,270 --> 00:39:14,970 between the two. 744 00:39:14,970 --> 00:39:18,550 But so it turns out, I mean, it's going to have difficulty. 745 00:39:18,550 --> 00:39:24,846 And you can see it's probably having a few issues. 746 00:39:24,846 --> 00:39:26,220 It's running pretty slowly, which 747 00:39:26,220 --> 00:39:30,970 means it's probably doing a lot more function calls.
748 00:39:30,970 --> 00:39:32,880 But these algorithms and especially 749 00:39:32,880 --> 00:39:36,110 MATLAB's optimization toolbox is really good. 750 00:39:36,110 --> 00:39:38,050 So they have ways of recovering. 751 00:39:38,050 --> 00:39:40,480 So that when things go wrong, if you can't figure out 752 00:39:40,480 --> 00:39:42,880 where to go, you do a little bit of sampling 753 00:39:42,880 --> 00:39:45,352 and there are many ways that you can recover. 754 00:39:45,352 --> 00:39:47,810 And in this case, we managed to recover and actually get up 755 00:39:47,810 --> 00:39:50,230 to the top. 756 00:39:50,230 --> 00:39:51,960 We're not there yet. 757 00:39:51,960 --> 00:39:52,960 It'll get there, though. 758 00:39:52,960 --> 00:39:54,352 It should get there. 759 00:39:57,960 --> 00:39:59,641 It is-- yeah, you can see. 760 00:39:59,641 --> 00:40:01,140 Even though it managed to get there, 761 00:40:01,140 --> 00:40:04,300 I mean, I don't know if you looked at the numbers before. 762 00:40:04,300 --> 00:40:06,885 This is the number of times that it calls 763 00:40:06,885 --> 00:40:09,480 the evaluation of the surface. 764 00:40:09,480 --> 00:40:12,670 And usually, it's on the order of 3, 6, 765 00:40:12,670 --> 00:40:14,076 but you can see it took 27. 766 00:40:14,076 --> 00:40:15,450 And that second iteration, that's 767 00:40:15,450 --> 00:40:17,660 because it was in this flat region. 768 00:40:17,660 --> 00:40:20,590 But you can see that whatever MATLAB, whatever 769 00:40:20,590 --> 00:40:23,898 [INAUDIBLE] built in to try and sort of make 770 00:40:23,898 --> 00:40:25,606 the algorithm robust, actually did manage 771 00:40:25,606 --> 00:40:27,350 to recover in that [INAUDIBLE]. 772 00:40:27,350 --> 00:40:29,830 These things can definitely fail to converge. 773 00:40:29,830 --> 00:40:32,430 They can come back without finding a solution. 774 00:40:32,430 --> 00:40:35,260 So there's all kinds of problems you can run into. 
775 00:40:35,260 --> 00:40:37,550 But conceptually, this is what's going on. 776 00:40:37,550 --> 00:40:39,580 I think that a picture of a landscape 777 00:40:39,580 --> 00:40:41,860 is a really good one to have in your mind 778 00:40:41,860 --> 00:40:45,090 as we talk about the math of the algorithms 779 00:40:45,090 --> 00:40:48,720 and the computing of the alpha and the S. 780 00:40:48,720 --> 00:40:49,220 OK? 781 00:40:55,057 --> 00:40:57,140 Is there any good way to approximate the landscape 782 00:40:57,140 --> 00:41:00,010 if the function is too expensive to call? 783 00:41:00,010 --> 00:41:01,510 AUDIENCE: [INAUDIBLE]. 784 00:41:01,510 --> 00:41:04,483 We're lucky that we know what it looks like. 785 00:41:04,483 --> 00:41:04,983 [INAUDIBLE] 786 00:41:11,784 --> 00:41:12,450 PROFESSOR: Yeah. 787 00:41:12,450 --> 00:41:13,420 So we know where it is. 788 00:41:13,420 --> 00:41:14,461 And we're visualizing it. 789 00:41:14,461 --> 00:41:16,090 But the optimizer didn't know that. 790 00:41:16,090 --> 00:41:17,090 AUDIENCE: Right. 791 00:41:17,090 --> 00:41:18,090 PROFESSOR: So-- 792 00:41:18,090 --> 00:41:21,058 AUDIENCE: [INAUDIBLE] are there [INAUDIBLE]. 793 00:41:28,953 --> 00:41:32,690 PROFESSOR: So it depends on your problem. 794 00:41:32,690 --> 00:41:34,680 If your problem has got particular structure, 795 00:41:34,680 --> 00:41:38,271 it may be that you can come up with an approximation, 796 00:41:38,271 --> 00:41:40,770 and in particular what are called convex approximations where 797 00:41:40,770 --> 00:41:42,894 you can guarantee something about the solution. 798 00:41:42,894 --> 00:41:45,060 So that's going to depend very much on the structure 799 00:41:45,060 --> 00:41:45,680 of your problem.
800 00:41:45,680 --> 00:41:47,150 That's what a lot of people in operations research 801 00:41:47,150 --> 00:41:49,480 spend a lot of time doing: taking difficult problems 802 00:41:49,480 --> 00:41:51,720 and then coming up with approximations or relaxations 803 00:41:51,720 --> 00:41:54,320 of the problem so that they can then say something 804 00:41:54,320 --> 00:41:56,305 rigorous about the solution and how it relates 805 00:41:56,305 --> 00:41:57,672 to solution [INAUDIBLE]. 806 00:41:57,672 --> 00:42:00,290 But I think the reality is that in engineering design 807 00:42:00,290 --> 00:42:02,590 the landscape usually looks like a real mess. 808 00:42:02,590 --> 00:42:05,690 And not only does it have multiple hills and mountains, 809 00:42:05,690 --> 00:42:08,860 but often there are cliffs or part of a-- because we're not even 810 00:42:08,860 --> 00:42:09,990 looking at constraints here. 811 00:42:09,990 --> 00:42:11,770 Part of the design space where there's 812 00:42:11,770 --> 00:42:15,790 like a region where there's just no feasible design. 813 00:42:15,790 --> 00:42:18,740 And basically, people apply optimization methods 814 00:42:18,740 --> 00:42:22,176 to try to improve designs, but don't necessarily worry as much 815 00:42:22,176 --> 00:42:23,426 about mathematical optimality. 816 00:42:23,426 --> 00:42:26,750 It's more of a tool to help improve design. 817 00:42:26,750 --> 00:42:30,050 So think a little bit about what you're trying to do as well. 818 00:42:30,050 --> 00:42:31,490 Do you have a question? 819 00:42:31,490 --> 00:42:34,370 AUDIENCE: [INAUDIBLE] what happens like [INAUDIBLE] 820 00:42:41,570 --> 00:42:42,435 PROFESSOR: Yeah. 821 00:42:42,435 --> 00:42:46,240 AUDIENCE: [INAUDIBLE] still [INAUDIBLE]. 822 00:42:46,240 --> 00:42:47,330 PROFESSOR: Yeah. 823 00:42:47,330 --> 00:42:51,420 So that crinkled-up wing was probably like a solution wing. 824 00:42:51,420 --> 00:42:52,980 It was horrible.
825 00:42:52,980 --> 00:42:56,010 I mean, this is now-- that was a design problem with many more 826 00:42:56,010 --> 00:42:57,810 design variables, so we can't visualize it. 827 00:42:57,810 --> 00:42:58,660 But even just thinking about it here, 828 00:42:58,660 --> 00:43:00,660 it was probably a design that was somewhere down 829 00:43:00,660 --> 00:43:02,630 here because it was so awful. 830 00:43:02,630 --> 00:43:05,890 But you know, as long as you can get reliable [INAUDIBLE] 831 00:43:05,890 --> 00:43:07,400 gradients using [INAUDIBLE] methods, 832 00:43:07,400 --> 00:43:09,220 you get yourself out of that awful place, 833 00:43:09,220 --> 00:43:11,160 and you get to a good one. 834 00:43:11,160 --> 00:43:15,540 So it might take longer. 835 00:43:15,540 --> 00:43:16,040 Yup. 836 00:43:18,880 --> 00:43:19,380 OK. 837 00:43:19,380 --> 00:43:26,148 So let's see where we are. 838 00:43:26,148 --> 00:43:27,881 So I'll turn this off. 839 00:43:27,881 --> 00:43:28,380 OK. 840 00:43:28,380 --> 00:43:32,384 So [INAUDIBLE] a couple of mathematical things 841 00:43:32,384 --> 00:43:39,071 that we may need to [INAUDIBLE] concepts 842 00:43:39,071 --> 00:43:40,570 that we need before we actually talk 843 00:43:40,570 --> 00:43:44,300 about what's going on in those optimization algorithms. 844 00:43:44,300 --> 00:43:46,762 So we're going to need to think about gradients. 845 00:43:52,534 --> 00:43:56,402 So let's just think about what the gradient is in this case. 846 00:43:59,500 --> 00:44:06,490 And so what we're talking about for j of x-- and remember, 847 00:44:06,490 --> 00:44:15,468 this is a scalar objective, but this is n-dimensional design 848 00:44:15,468 --> 00:44:15,968 [INAUDIBLE]. 849 00:44:21,830 --> 00:44:24,470 So we're talking about the objective function 850 00:44:24,470 --> 00:44:29,840 j of x, a scalar function of n design variables. 851 00:44:29,840 --> 00:44:37,494 Then the gradient is what?
852 00:44:37,494 --> 00:44:39,845 First of all, what dimension does the gradient have? 853 00:44:44,210 --> 00:44:45,304 n by n? 854 00:44:48,020 --> 00:44:49,670 What do you think it is? [INAUDIBLE]? 855 00:44:49,670 --> 00:44:50,622 N? 856 00:44:50,622 --> 00:44:52,970 n by-- n by 1. 857 00:44:52,970 --> 00:44:55,187 It's a vector of length n, n by 1. 858 00:44:55,187 --> 00:44:56,895 So the gradient-- but with the gradients, 859 00:44:56,895 --> 00:45:01,760 I'm talking about the gradient of j with respect to x. 860 00:45:01,760 --> 00:45:03,914 So you should write that as grad j. 861 00:45:07,246 --> 00:45:09,930 You must have seen this in 18.02, no? 862 00:45:09,930 --> 00:45:11,738 Gradient? 863 00:45:11,738 --> 00:45:12,642 Yeah. 864 00:45:12,642 --> 00:45:14,468 You're frowning, Jacobi. 865 00:45:14,468 --> 00:45:15,836 AUDIENCE: No, I was nodding. 866 00:45:15,836 --> 00:45:17,169 PROFESSOR: Oh, you were nodding? 867 00:45:17,169 --> 00:45:19,110 You looked like you frowned when I [INAUDIBLE]. 868 00:45:19,110 --> 00:45:25,100 So gradient of j, which is the vector of partial derivatives, 869 00:45:25,100 --> 00:45:33,430 right? del j del x1, del j del x2, down to del j del xn. 870 00:45:37,630 --> 00:45:42,596 So the gradient is-- it's an n by 1 vector 871 00:45:42,596 --> 00:45:44,180 [INAUDIBLE] grad j, which is just 872 00:45:44,180 --> 00:45:47,250 a vector of partial derivatives. 873 00:45:47,250 --> 00:45:50,250 And normally, we'll be interested in the gradient 874 00:45:50,250 --> 00:45:55,011 evaluated at a particular point, because you know, again, 875 00:45:55,011 --> 00:45:57,510 we're going to be evaluating the gradient at the point we're 876 00:45:57,510 --> 00:45:58,800 standing in the landscape. 877 00:45:58,800 --> 00:46:03,410 And so we would write that maybe as grad j 878 00:46:03,410 --> 00:46:07,340 evaluated at the point xk. 879 00:46:07,340 --> 00:46:09,316 So [INAUDIBLE] going to use q.
880 00:46:09,316 --> 00:46:11,640 You can leave q over there, which 881 00:46:11,640 --> 00:46:15,230 would mean taking these partial derivatives 882 00:46:15,230 --> 00:46:19,160 and evaluating them at the point x equal to xq. 883 00:46:24,916 --> 00:46:28,208 So you guys did a reading problem that I know 884 00:46:28,208 --> 00:46:33,840 had lots of issues, because it broke the MITx platform's 885 00:46:33,840 --> 00:46:37,480 ability to take pictures as an answer. 886 00:46:37,480 --> 00:46:40,130 But you computed the gradient vector. 887 00:46:40,130 --> 00:46:43,600 And then you substituted a particular value 888 00:46:43,600 --> 00:46:44,622 of the design vector x. 889 00:46:44,622 --> 00:46:47,038 And then the gradients just came out to be numbers, right? 890 00:46:47,038 --> 00:46:47,538 Yup. 891 00:46:50,110 --> 00:46:51,510 So that's the gradient. 892 00:46:51,510 --> 00:46:54,640 So when we talk about the gradient of the objective, 893 00:46:54,640 --> 00:46:57,650 it's an n by 1 vector that contains 894 00:46:57,650 --> 00:46:59,440 the partial derivatives. 895 00:46:59,440 --> 00:47:04,920 And so if we had a gradient [INAUDIBLE] 896 00:47:04,920 --> 00:47:10,600 have [INAUDIBLE] is also going to be a vector of dimension n. 897 00:47:13,830 --> 00:47:14,330 OK. 898 00:47:14,330 --> 00:47:19,610 So we know that we need gradient of j 899 00:47:19,610 --> 00:47:24,260 equals 0 to be at an optimal point. 900 00:47:24,260 --> 00:47:24,760 Yup. 901 00:47:24,760 --> 00:47:26,829 Minimum, maximum, all at [INAUDIBLE] point. 902 00:47:26,829 --> 00:47:29,120 So gradient of j equals 0 means that all these partials 903 00:47:29,120 --> 00:47:31,710 have to be equal to 0. 904 00:47:31,710 --> 00:47:33,520 But we also said that if we wanted 905 00:47:33,520 --> 00:47:35,980 to be sure that we're at a minimum, 906 00:47:35,980 --> 00:47:38,894 we need to look at the second derivative information.
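When the partial derivatives are not available analytically, the gradient vector can be approximated numerically at the current point xq. This sketch uses central finite differences, one common choice that is assumed here rather than prescribed by the lecture; `fd_gradient`, the step `h`, and the example objective are hypothetical. Each entry perturbs one design variable up and down and takes the slope, so one gradient costs 2n function evaluations.

```python
import math
from typing import Callable, List

Vec = List[float]


def fd_gradient(J: Callable[[Vec], float], xq: Vec, h: float = 1e-6) -> Vec:
    """Central-difference estimate of grad J at the point xq (an n-by-1 vector as a list)."""
    g = []
    for i in range(len(xq)):
        xp, xm = list(xq), list(xq)
        xp[i] += h
        xm[i] -= h
        g.append((J(xp) - J(xm)) / (2.0 * h))   # approximates dJ/dx_i at xq
    return g


# Hypothetical example: J(x) = x1^2 * x2 + sin(x2), so grad J = (2 x1 x2, x1^2 + cos x2).
J = lambda x: x[0] ** 2 * x[1] + math.sin(x[1])
print(fd_gradient(J, [1.0, 2.0]))   # close to [4.0, 1 + cos(2)]
```

As in the reading problem, once a particular point is substituted the gradient is just a vector of numbers.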
907 00:47:38,894 --> 00:47:43,154 So what is the second derivative of j with respect to x? 908 00:47:43,154 --> 00:47:44,320 What dimension does it have? 909 00:47:47,470 --> 00:47:48,200 n by n. 910 00:47:48,200 --> 00:47:49,674 It's an n by n matrix. 911 00:47:49,674 --> 00:47:50,840 And it's called the Hessian. 912 00:47:53,950 --> 00:47:58,794 So the Hessian matrix is a matrix of second derivatives. 913 00:47:58,794 --> 00:48:00,560 So it's an n by n matrix. 914 00:48:03,500 --> 00:48:07,720 So we could write it as grad squared j. 915 00:48:07,720 --> 00:48:12,000 And so on the diagonal, 916 00:48:12,000 --> 00:48:14,180 we're going to have the pure partials: 917 00:48:14,180 --> 00:48:17,540 del squared j del x1 squared, del squared 918 00:48:17,540 --> 00:48:22,791 j del x2 squared, all the way down to del squared j del 919 00:48:22,791 --> 00:48:23,733 xn squared. 920 00:48:26,560 --> 00:48:30,146 And then off the diagonal are the 921 00:48:30,146 --> 00:48:37,130 mixed terms, del squared j del x1 del x2 and so on, 922 00:48:37,130 --> 00:48:41,302 out to del squared j del x1 del xn. 923 00:48:44,020 --> 00:48:48,960 And what's special about this matrix? 924 00:48:48,960 --> 00:48:49,830 It's symmetric. 925 00:48:49,830 --> 00:48:54,610 Because a mixed partial with respect to 2, 926 00:48:54,610 --> 00:49:03,820 1 is the same as the mixed partial with respect to 1, 2. 927 00:49:03,820 --> 00:49:08,740 So it's an n by n matrix, and it's symmetric. 928 00:49:08,740 --> 00:49:13,976 And so in the scalar case, if you wanted to minimize-- 929 00:49:13,976 --> 00:49:17,040 how do I want to put it? 930 00:49:17,040 --> 00:49:17,940 Let me put it here. 931 00:49:17,940 --> 00:49:23,200 So in the scalar case, if I asked you 932 00:49:23,200 --> 00:49:27,010 to find a minimum of f of x where x was a scalar 933 00:49:27,010 --> 00:49:30,090 and f was a scalar, you would have two conditions, right? 934 00:49:30,090 --> 00:49:32,365 You would say df dx equals 0.
935 00:49:35,620 --> 00:49:46,208 And then second derivative, [INAUDIBLE] positive. 936 00:49:46,208 --> 00:49:50,427 And that would be evaluated at x, the optimum point. 937 00:49:50,427 --> 00:49:52,385 So we could write it as evaluated at x star. 938 00:49:55,727 --> 00:49:57,310 So you've seen all this before, right? 939 00:49:57,310 --> 00:50:01,420 This is 18.01 or high school calculus. 940 00:50:01,420 --> 00:50:05,180 So in our case, we are trying to minimize 941 00:50:05,180 --> 00:50:07,430 j, which is still a scalar, but now it's 942 00:50:07,430 --> 00:50:10,530 a function of the design vector x. 943 00:50:10,530 --> 00:50:12,120 So the corresponding condition here 944 00:50:12,120 --> 00:50:16,790 is that grad j evaluated at x star equals 0. 945 00:50:16,790 --> 00:50:18,960 We can put a [INAUDIBLE] symbol on there 946 00:50:18,960 --> 00:50:22,190 if you want to remind yourself that this is an n by 1 vector, 947 00:50:22,190 --> 00:50:25,210 and [INAUDIBLE] grad j equals 0 means setting all those entries 948 00:50:25,210 --> 00:50:26,810 to 0. 949 00:50:26,810 --> 00:50:29,132 What's the corresponding case here? 950 00:50:29,132 --> 00:50:32,320 What's the analogous condition to the second derivative 951 00:50:32,320 --> 00:50:33,265 being positive? 952 00:50:36,235 --> 00:50:46,559 Who's taken 18.06? 953 00:50:46,559 --> 00:50:47,350 Here it's a scalar. 954 00:50:47,350 --> 00:50:48,740 And it's got to be positive. 955 00:50:48,740 --> 00:50:49,573 Here, it's a matrix. 956 00:50:56,780 --> 00:51:01,000 What have you heard about matrices being positive? 957 00:51:01,000 --> 00:51:02,782 Positive definite, yeah. 958 00:51:02,782 --> 00:51:06,910 So the condition is that the [INAUDIBLE] matrix 959 00:51:06,910 --> 00:51:10,764 grad squared j, again evaluated at x star, is positive definite. 960 00:51:10,764 --> 00:51:12,180 And [INAUDIBLE] write it that way.
961 00:51:14,750 --> 00:51:16,250 For a symmetric matrix, that means all eigenvalues 962 00:51:16,250 --> 00:51:17,410 have to be positive. 963 00:51:17,410 --> 00:51:19,462 They're going to be real, and they have to be positive. 964 00:51:19,462 --> 00:51:23,710 Or, put another way, for any nonzero vector, 965 00:51:23,710 --> 00:51:26,600 say y, y transposed times this matrix times 966 00:51:26,600 --> 00:51:28,100 y has to be strictly positive, which 967 00:51:28,100 --> 00:51:30,225 you can show is equivalent to the eigenvalue 968 00:51:30,225 --> 00:51:31,982 [INAUDIBLE]. 969 00:51:31,982 --> 00:51:36,270 So it turns out the eigenvalues of this Hessian matrix tell you 970 00:51:36,270 --> 00:51:39,410 an awful lot about the shape of your design space. 971 00:51:39,410 --> 00:51:45,840 And maybe intuitively, I find it conceptually easy 972 00:51:45,840 --> 00:51:47,180 to think about maximizing. 973 00:51:47,180 --> 00:51:48,763 If I'm standing at the top of the hill 974 00:51:48,763 --> 00:51:53,240 and I'm trying to maximize and think about the eigenvalues 975 00:51:53,240 --> 00:51:55,742 and eigenvectors, there's no direction 976 00:51:55,742 --> 00:51:57,950 I can move where things are going to increase, right? 977 00:51:57,950 --> 00:52:00,660 So that means all the eigenvector directions 978 00:52:00,660 --> 00:52:04,020 are going to have to be associated with decrease. 979 00:52:04,020 --> 00:52:07,120 And so this would be flipped, right? 980 00:52:07,120 --> 00:52:09,360 To be at a maximum, the Hessian would 981 00:52:09,360 --> 00:52:11,590 have to be negative definite. 982 00:52:11,590 --> 00:52:13,684 And so it kind of makes sense that the eigenvalues 983 00:52:13,684 --> 00:52:15,100 of the Hessian are going to relate 984 00:52:15,100 --> 00:52:18,360 to what goes on as we move away from a minimum 985 00:52:18,360 --> 00:52:20,260 or a maximum point. 986 00:52:20,260 --> 00:52:20,760 OK. 987 00:52:20,760 --> 00:52:21,987 So we've got the gradient.
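The two equivalent tests for positive definiteness of a symmetric matrix (all eigenvalues positive, or y transposed H y positive for every nonzero y) can be checked numerically. This is a minimal NumPy sketch; the 2 by 2 matrix H is an arbitrary example, and sampling random vectors only spot-checks the quadratic-form condition rather than proving it.

```python
import numpy as np

def is_positive_definite(H, tol=0.0):
    """Eigenvalue test for positive definiteness of a symmetric matrix H."""
    H = np.asarray(H, dtype=float)
    if not np.allclose(H, H.T):
        raise ValueError("H should be symmetric, as a Hessian of a smooth j is")
    eigenvalues = np.linalg.eigvalsh(H)   # real eigenvalues of a symmetric matrix
    return bool(np.all(eigenvalues > tol))

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(np.linalg.eigvalsh(H))     # both eigenvalues are positive
print(is_positive_definite(H))   # True

# Spot-check the equivalent condition y^T H y > 0 on many random nonzero vectors y
rng = np.random.default_rng(0)
ys = rng.standard_normal((1000, 2))
quad_forms = np.einsum("ij,jk,ik->i", ys, H, ys)   # y_i^T H y_i for each row y_i
print(np.all(quad_forms > 0))    # True
```

`np.linalg.eigvalsh` is used instead of `eig` because it is meant for symmetric matrices and returns real eigenvalues.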
988 00:52:21,987 --> 00:52:22,570 We've got the Hessian. 989 00:52:22,570 --> 00:52:25,900 We need one more thing, which is your old friend the Taylor series 990 00:52:25,900 --> 00:52:26,400 expansion. 991 00:52:26,400 --> 00:52:27,380 Yeah, Greg. 992 00:52:27,380 --> 00:52:29,830 AUDIENCE: Can you go over that definition [INAUDIBLE]? 993 00:52:29,830 --> 00:52:30,480 PROFESSOR: OK. 994 00:52:30,480 --> 00:52:31,510 This right here? 995 00:52:31,510 --> 00:52:32,470 Yeah. 996 00:52:32,470 --> 00:52:43,660 So let me write it out here to be-- So let's call it h. 997 00:52:43,660 --> 00:52:47,000 So let's write the Hessian-- I'm just going to call it 998 00:52:47,000 --> 00:52:50,270 h, so I don't have to keep writing grad squared j. 999 00:52:50,270 --> 00:52:54,170 So the condition is that h is positive definite. 1000 00:52:58,570 --> 00:53:01,950 And what that means-- we write it this way, h is a matrix. 1001 00:53:01,950 --> 00:53:09,880 What that means is that v transpose hv has to be positive 1002 00:53:09,880 --> 00:53:15,960 for all v, for any v that's not 0. 1003 00:53:15,960 --> 00:53:20,340 So we take any vector v and compute v transpose hv. 1004 00:53:20,340 --> 00:53:22,890 And that has to be positive. 1005 00:53:22,890 --> 00:53:24,760 If that's true, then h is positive definite. 1006 00:53:24,760 --> 00:53:28,610 And then, because h is a symmetric matrix, that 1007 00:53:28,610 --> 00:53:35,032 is the same condition as saying all the eigenvalues of h-- 1008 00:53:35,032 --> 00:53:36,890 and there are going to be n of them-- 1009 00:53:36,890 --> 00:53:39,355 also have to be positive. 1010 00:53:42,970 --> 00:53:44,839 It's a property of the matrix. 1011 00:53:44,839 --> 00:53:45,714 AUDIENCE: [INAUDIBLE] 1012 00:53:51,380 --> 00:53:52,380 PROFESSOR: Yeah.
1013 00:53:52,380 --> 00:53:53,255 AUDIENCE: [INAUDIBLE] 1014 00:54:02,050 --> 00:54:04,300 PROFESSOR: So if not all the eigenvalues are positive, 1015 00:54:04,300 --> 00:54:07,358 if some of them are 0, then it means-- 1016 00:54:07,358 --> 00:54:10,012 AUDIENCE: [INAUDIBLE] 1017 00:54:10,012 --> 00:54:12,470 PROFESSOR: Well, so if some of the eigenvalues are positive 1018 00:54:12,470 --> 00:54:14,845 and some are negative, it means you're at a saddle point. 1019 00:54:14,845 --> 00:54:16,297 You've seen that in the 1D case-- 1020 00:54:16,297 --> 00:54:17,172 AUDIENCE: [INAUDIBLE] 1021 00:54:20,490 --> 00:54:23,350 PROFESSOR: So I don't know how to act this out. 1022 00:54:23,350 --> 00:54:30,860 But in a landscape, a saddle point is [INAUDIBLE] 1023 00:54:30,860 --> 00:54:34,980 point, which means that the [INAUDIBLE] goes this way in one direction, 1024 00:54:34,980 --> 00:54:36,080 but this way in the other. 1025 00:54:36,080 --> 00:54:38,824 So there's still a direction that I can move to [INAUDIBLE]. 1026 00:54:38,824 --> 00:54:39,699 AUDIENCE: [INAUDIBLE] 1027 00:54:44,349 --> 00:54:46,390 PROFESSOR: So maybe [INAUDIBLE] said another way. 1028 00:54:46,390 --> 00:54:49,250 An optimization algorithm that's trying to minimize or maximize 1029 00:54:49,250 --> 00:54:52,590 something won't stop at those points, 1030 00:54:52,590 --> 00:54:55,170 because it will be able to find a direction of improvement. 1031 00:54:55,170 --> 00:54:56,625 AUDIENCE: [INAUDIBLE] 1032 00:54:56,625 --> 00:54:58,556 PROFESSOR: No, it'll keep going. 1033 00:54:58,556 --> 00:55:00,530 AUDIENCE: [INAUDIBLE] 1034 00:55:00,530 --> 00:55:01,530 PROFESSOR: That's right. 1035 00:55:01,530 --> 00:55:04,150 Think about standing on-- you're standing on the edge 1036 00:55:04,150 --> 00:55:06,990 and things are dropping on either side of you. 1037 00:55:06,990 --> 00:55:10,286 But if you're looking forward, you can keep going up the hill.
1038 00:55:10,286 --> 00:55:14,240 So it's just going to constrain the directions in which you 1039 00:55:14,240 --> 00:55:15,390 move. 1040 00:55:15,390 --> 00:55:21,550 But the problem with [AUDIO OUT] as we 1041 00:55:21,550 --> 00:55:27,830 got into the [AUDIO OUT] regions where [AUDIO OUT] you 1042 00:55:27,830 --> 00:55:35,391 might know [AUDIO OUT] if your minimum is not unique, 1043 00:55:35,391 --> 00:55:36,640 then there's actually a ridge. 1044 00:55:36,640 --> 00:55:37,550 The top of the mountain would be the ridge. 1045 00:55:37,550 --> 00:55:39,749 And so there's a direction that you can walk along, 1046 00:55:39,749 --> 00:55:41,290 and you're not changing the objective. 1047 00:55:41,290 --> 00:55:44,060 You're staying at constant elevation. 1048 00:55:44,060 --> 00:55:46,856 And that would correspond to one of the eigenvalues being 0. 1049 00:55:46,856 --> 00:55:48,230 But in that case, what you've got 1050 00:55:48,230 --> 00:55:50,640 is a whole bunch of designs that are equally good. 1051 00:55:50,640 --> 00:55:52,140 And that's actually kind of nice to know, 1052 00:55:52,140 --> 00:55:54,140 because those would be different design decisions 1053 00:55:54,140 --> 00:55:57,010 that you could make that were all equally good in cost 1054 00:55:57,010 --> 00:55:58,900 or whatever it is you're trying to do. 1055 00:55:58,900 --> 00:56:00,594 AUDIENCE: [INAUDIBLE] 1056 00:56:00,594 --> 00:56:01,260 PROFESSOR: Yeah. 1057 00:56:01,260 --> 00:56:02,509 So, yeah. 1058 00:56:02,509 --> 00:56:03,384 AUDIENCE: [INAUDIBLE] 1059 00:56:09,120 --> 00:56:10,274 PROFESSOR: That's right. 1060 00:56:10,274 --> 00:56:11,149 AUDIENCE: [INAUDIBLE] 1061 00:56:15,644 --> 00:56:16,310 PROFESSOR: Yeah. 1062 00:56:16,310 --> 00:56:18,143 So you don't have to worry about this stuff. 1063 00:56:18,143 --> 00:56:20,660 This stuff is taken care of for you 1064 00:56:20,660 --> 00:56:23,780 when you run something like fminunc and fmincon.
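The eigenvalue cases just discussed (all positive, all negative, mixed signs, or a zero eigenvalue along a ridge) can be sketched as a small classifier for a point where grad j = 0. This is an illustrative NumPy sketch, not how fminunc or fmincon work internally; the tolerance and the four example Hessians (each for a simple quadratic at the origin) are assumptions chosen for clarity.

```python
import numpy as np

def classify_stationary_point(H, tol=1e-10):
    """Classify a point where grad j = 0 using the eigenvalues of the Hessian H."""
    lam = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if np.all(lam > tol):
        return "local minimum"    # positive definite
    if np.all(lam < -tol):
        return "local maximum"    # negative definite
    if np.any(lam > tol) and np.any(lam < -tol):
        return "saddle point"     # up in some directions, down in others
    # Some eigenvalue is (numerically) zero and the rest share a sign:
    # a flat direction, e.g. a ridge of equally good designs
    return "inconclusive (zero eigenvalue: flat direction such as a ridge)"

examples = {
    "bowl    J = x1^2 + x2^2":    np.diag([2.0, 2.0]),
    "hilltop J = -(x1^2 + x2^2)": np.diag([-2.0, -2.0]),
    "saddle  J = x1^2 - x2^2":    np.diag([2.0, -2.0]),
    "ridge   J = -(x1 - x2)^2":   np.array([[-2.0, 2.0], [2.0, -2.0]]),
}
for name, H in examples.items():
    print(f"{name:28s} -> {classify_stationary_point(H)}")
```

In the ridge example every point on the line x1 = x2 has the same objective value, which is the non-unique case described above.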
1065 00:56:23,780 --> 00:56:27,995 But what's important is to understand 1066 00:56:27,995 --> 00:56:30,660 that there are sort of rigorous mathematical conditions 1067 00:56:30,660 --> 00:56:33,430 that tell you when you're at an optimal solution. 1068 00:56:33,430 --> 00:56:37,830 And if your problem sort of obeys the given structure, 1069 00:56:37,830 --> 00:56:39,570 then that's great, 1070 00:56:39,570 --> 00:56:40,519 and all of that holds. 1071 00:56:40,519 --> 00:56:43,060 In reality, remember when we had our list of design variables, 1072 00:56:43,060 --> 00:56:45,734 and we talked about having variables like number of engines? 1073 00:56:45,734 --> 00:56:47,900 That's something for which the gradient doesn't even 1074 00:56:47,900 --> 00:56:49,100 exist, right? 1075 00:56:49,100 --> 00:56:50,790 It's not differentiable. 1076 00:56:50,790 --> 00:56:53,420 And so in many cases, our problems 1077 00:56:53,420 --> 00:56:56,790 don't even satisfy the requirements for the algorithms 1078 00:56:56,790 --> 00:56:57,409 that we use. 1079 00:56:57,409 --> 00:56:58,950 And so there are very few guarantees. 1080 00:56:58,950 --> 00:57:01,003 But they can still help us make progress. 1081 00:57:01,003 --> 00:57:01,878 AUDIENCE: [INAUDIBLE] 1082 00:57:11,075 --> 00:57:12,700 PROFESSOR: That's the optimal solution? 1083 00:57:12,700 --> 00:57:13,575 AUDIENCE: [INAUDIBLE] 1084 00:57:17,900 --> 00:57:19,650 PROFESSOR: If there's one of these ridges, 1085 00:57:19,650 --> 00:57:21,550 if there's a non-unique solution, yeah. 1086 00:57:21,550 --> 00:57:22,847 Yeah. 1087 00:57:22,847 --> 00:57:23,722 AUDIENCE: [INAUDIBLE] 1088 00:57:29,554 --> 00:57:30,220 PROFESSOR: Yeah. 1089 00:57:30,220 --> 00:57:32,494 So integers become a whole other ball game.
1090 00:57:32,494 --> 00:57:34,410 And maybe I should have said right off the bat 1091 00:57:34,410 --> 00:57:36,870 that when you start looking at the gradient 1092 00:57:36,870 --> 00:57:39,180 and the Hessian analysis, [INAUDIBLE] j 1093 00:57:39,180 --> 00:57:42,730 of x needs to be at least twice differentiable with respect 1094 00:57:42,730 --> 00:57:43,230 to x. 1095 00:57:43,230 --> 00:57:45,605 Otherwise, what we're writing here doesn't make sense. 1096 00:57:45,605 --> 00:57:48,700 When you have integers, the conditions 1097 00:57:48,700 --> 00:57:49,839 become more complicated. 1098 00:57:49,839 --> 00:57:52,130 And how you solve the problem becomes more complicated. 1099 00:57:52,130 --> 00:57:54,490 One approach is to let the integers vary continuously, 1100 00:57:54,490 --> 00:57:55,710 exactly what you're saying. 1101 00:57:55,710 --> 00:57:57,176 And then at the end, you round. 1102 00:57:57,176 --> 00:57:59,300 And that might be effective, but it's certainly not 1103 00:57:59,300 --> 00:58:01,270 guaranteed to find an optimal solution. 1104 00:58:01,270 --> 00:58:06,460 There are specialized optimization solution methods 1105 00:58:06,460 --> 00:58:08,180 that are tailored for integer programs. 1106 00:58:08,180 --> 00:58:10,730 And in fact, MATLAB just released a mixed-integer 1107 00:58:10,730 --> 00:58:12,870 programming solver, I think, as part of their latest release 1108 00:58:12,870 --> 00:58:14,300 of the Optimization Toolbox. 1109 00:58:14,300 --> 00:58:16,451 And these methods will handle the integer variables 1110 00:58:16,451 --> 00:58:17,200 in different ways. 1111 00:58:17,200 --> 00:58:18,170 One is something called [INAUDIBLE] 1112 00:58:18,170 --> 00:58:20,800 bound, which is about searching down different integer 1113 00:58:20,800 --> 00:58:22,170 combinations. 1114 00:58:22,170 --> 00:58:26,530 But yeah, integer variables make it really difficult. 1115 00:58:26,530 --> 00:58:28,530 Yeah, Alex.
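Why relax-and-round is "not guaranteed to find an optimal solution" can be seen on a tiny example. This Python sketch uses a hypothetical two-variable quadratic with strongly coupled variables (not a problem from the lecture): rounding the continuous optimum lands on a poor integer design, while brute-force enumeration of a small integer box finds a much better one. Enumeration is only feasible because the problem is tiny; real solvers use smarter searches.

```python
import itertools

# Hypothetical objective J(x) = d^T A d with d = x - c.
# A is positive definite but strongly coupled (the 0.9 off-diagonal terms).
A = [[1.0, 0.9],
     [0.9, 1.0]]
c = [0.4, 0.45]   # continuous (relaxed) minimizer of J

def J(x):
    d = [x[0] - c[0], x[1] - c[1]]
    return sum(d[i] * A[i][j] * d[j] for i in range(2) for j in range(2))

# Relax and round: solve with continuous variables, then round each one
x_rounded = tuple(round(ci) for ci in c)

# Brute-force enumeration of a small integer box (only viable for tiny problems)
x_best = min(itertools.product(range(-3, 4), repeat=2), key=J)

print("rounded:", x_rounded, f"J = {J(x_rounded):.4f}")   # rounded: (0, 0) J = 0.6865
print("best:   ", x_best, f"J = {J(x_best):.4f}")         # best:    (0, 1) J = 0.0665
```

The coupling between the variables is what defeats rounding each variable independently.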
1116 00:58:28,530 --> 00:58:31,025 AUDIENCE: [INAUDIBLE] j [INAUDIBLE]. 1117 00:58:33,900 --> 00:58:36,150 PROFESSOR: Yeah, so we're going to talk about-- that's 1118 00:58:36,150 --> 00:58:37,620 sort of number three. 1119 00:58:37,620 --> 00:58:39,717 j is usually going to be computer code 1120 00:58:39,717 --> 00:58:41,550 that we can put in the shape of the aircraft 1121 00:58:41,550 --> 00:58:44,387 wing or the whole aircraft, and out comes range or cost 1122 00:58:44,387 --> 00:58:45,720 or whatever it is, that's right. 1123 00:58:45,720 --> 00:58:49,040 So we can usually only get j of x through simulation, 1124 00:58:49,040 --> 00:58:50,440 not necessarily analytically. 1125 00:58:50,440 --> 00:58:53,530 And so then we're going to need some way to compute gradients. 1126 00:58:53,530 --> 00:58:56,430 And what's really nice is that finite differences, which you guys saw 1127 00:58:56,430 --> 00:59:00,383 a few months ago, are what we're going to use to do that. 1128 00:59:00,383 --> 00:59:01,292 Yeah. 1129 00:59:01,292 --> 00:59:03,750 AUDIENCE: Like also, one thing we talked about [INAUDIBLE]. 1130 00:59:11,750 --> 00:59:14,300 PROFESSOR: So it's all taken care of, 1131 00:59:14,300 --> 00:59:15,790 because we're not sampling. 1132 00:59:15,790 --> 00:59:18,600 We're actually measuring the gradient. 1133 00:59:18,600 --> 00:59:25,394 And I mean, they're going to be-- 1134 00:59:25,394 --> 00:59:27,560 I mean, we're doing something different here, right? 1135 00:59:27,560 --> 00:59:29,770 We're moving through the landscape. 1136 00:59:29,770 --> 00:59:32,170 We're not sampling, and then trying 1137 00:59:32,170 --> 00:59:35,410 to say how this variable relates to this one. 1138 00:59:35,410 --> 00:59:37,962 We're actually just looking for an optimum solution. 1139 00:59:37,962 --> 00:59:38,920 It's a different thing. 1140 00:59:38,920 --> 00:59:42,290 So whatever interactions are there, how are they reflected?
1141 00:59:42,290 --> 00:59:44,710 They're reflected in the shape of the landscape. 1142 00:59:44,710 --> 00:59:46,990 The shape that the landscape takes 1143 00:59:46,990 --> 00:59:49,978 is a manifestation of how the design variables relate 1144 00:59:49,978 --> 00:59:51,978 to each other and how they affect the objective. 1145 00:59:57,346 --> 00:59:58,634 AUDIENCE: [INAUDIBLE] 1146 00:59:58,634 --> 00:59:59,300 PROFESSOR: Yeah. 1147 00:59:59,300 --> 01:00:00,190 There are. 1148 01:00:00,190 --> 01:00:02,120 And I'll show you some of those. 1149 01:00:04,994 --> 01:00:07,410 The problem is actually how to guarantee a global maximum. 1150 01:00:07,410 --> 01:00:08,951 So there are a whole bunch of methods 1151 01:00:08,951 --> 01:00:10,550 called heuristic methods. 1152 01:00:10,550 --> 01:00:14,090 So genetic algorithms-- which Professor [? Devic ?] 1153 01:00:14,090 --> 01:00:15,790 uses a lot in his research, and which 1154 01:00:15,790 --> 01:00:19,230 I happen to not particularly care for-- are ways to do that. 1155 01:00:19,230 --> 01:00:22,479 And I'll show you some maybe in the next lecture. 1156 01:00:22,479 --> 01:00:24,020 They sort of spray points everywhere. 1157 01:00:24,020 --> 01:00:26,290 And then they use an analogy with natural selection 1158 01:00:26,290 --> 01:00:28,320 and mutations. 1159 01:00:28,320 --> 01:00:30,030 And designs have babies. 1160 01:00:30,030 --> 01:00:33,830 And then the strong babies survive and the other ones 1161 01:00:33,830 --> 01:00:35,040 don't. 1162 01:00:35,040 --> 01:00:36,340 So there will be [INAUDIBLE]. 1163 01:00:36,340 --> 01:00:37,756 And you know, people sort of claim 1164 01:00:37,756 --> 01:00:40,230 that's a way to do global optimization. 1165 01:00:40,230 --> 01:00:42,147 The problem is there are no guarantees at all.
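The genetic-algorithm analogy (spray points everywhere, designs "have babies," the fitter ones survive, plus random mutation) can be sketched in a few dozen lines. This is a minimal real-coded GA written for illustration, not the method any particular researcher uses; the tournament selection, blend crossover, Gaussian mutation, elitism, all parameter values, and the Rastrigin test function are assumptions. As noted above, it carries no guarantee of finding the global optimum.

```python
import numpy as np

def genetic_minimize(J, lower, upper, pop_size=40, generations=100,
                     mutation_scale=0.05, seed=0):
    """Minimal real-coded genetic algorithm sketch: a heuristic, no optimality guarantee."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n = lower.size

    def tournament(pop, fitness):
        # Pick two random designs; the fitter one (lower J) becomes a parent
        i, k = rng.integers(pop_size, size=2)
        return pop[i] if fitness[i] < fitness[k] else pop[k]

    # "Spray points everywhere": random initial population inside the bounds
    pop = rng.uniform(lower, upper, size=(pop_size, n))
    fitness = np.array([J(p) for p in pop])
    for _ in range(generations):
        # Elitism: the best design so far always survives to the next generation
        children = [pop[np.argmin(fitness)].copy()]
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fitness), tournament(pop, fitness)
            # Crossover: each child variable is a random blend of its parents' values
            w = rng.uniform(size=n)
            child = w * p1 + (1.0 - w) * p2
            # Mutation: small Gaussian perturbation, clipped back into the bounds
            child = child + mutation_scale * (upper - lower) * rng.standard_normal(n)
            children.append(np.clip(child, lower, upper))
        pop = np.array(children)
        fitness = np.array([J(p) for p in pop])
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]

def rastrigin(x):
    """Multimodal test function with many local minima; global minimum 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    return 20.0 + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

best_x, best_J = genetic_minimize(rastrigin, [-5.12, -5.12], [5.12, 5.12])
print(best_x, best_J)
```

Because of the elitism step the best objective value never gets worse from one generation to the next, but nothing forces it to reach the global minimum.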
1166 01:00:42,147 --> 01:00:44,563 I'll show you another one, which is a pattern search, which 1167 01:00:44,563 --> 01:00:46,490 sort of has a guarantee of eventually finding 1168 01:00:46,490 --> 01:00:48,630 a global solution, but only [INAUDIBLE] that you 1169 01:00:48,630 --> 01:00:50,660 search forever. 1170 01:00:50,660 --> 01:00:51,240 Yeah. 1171 01:00:51,240 --> 01:00:52,710 And it's a really hard problem. 1172 01:00:52,710 --> 01:00:54,410 If you know something about the structure of your problem, 1173 01:00:54,410 --> 01:00:55,825 you may be able to do something. 1174 01:00:55,825 --> 01:00:57,620 And certain problems, like we were talking about before, 1175 01:00:57,620 --> 01:00:59,640 have the nice structure where you can be rigorous. 1176 01:00:59,640 --> 01:01:02,040 If you have truly just kind of this black-box, complicated 1177 01:01:02,040 --> 01:01:04,922 aircraft design problem, there are no guarantees. 1178 01:01:04,922 --> 01:01:06,630 But again, often what you're trying to do 1179 01:01:06,630 --> 01:01:08,230 is to find a good design that meets 1180 01:01:08,230 --> 01:01:09,604 all the constraints and that's better 1181 01:01:09,604 --> 01:01:11,102 than what you could do by hand. 1182 01:01:11,102 --> 01:01:13,980 So [INAUDIBLE]. 1183 01:01:13,980 --> 01:01:14,480 Yeah. 1184 01:01:14,480 --> 01:01:14,940 It depends. 1185 01:01:14,940 --> 01:01:17,356 Are you an engineer trying to make a good design decision? 1186 01:01:17,356 --> 01:01:20,490 Or are you a mathematician who wants to guarantee optimality? 1187 01:01:20,490 --> 01:01:23,834 And where do you fall in that? 1188 01:01:23,834 --> 01:01:25,040 So that's good. 1189 01:01:25,040 --> 01:01:26,767 So you guys have lots of questions. 1190 01:01:26,767 --> 01:01:28,600 You're trying to avoid getting to Taylor series 1191 01:01:28,600 --> 01:01:29,520 expansions, I know. 1192 01:01:32,522 --> 01:01:33,355 Any other questions?
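Going back to the point that j usually comes out of a simulation code, here is a minimal Python/NumPy sketch of a forward finite-difference gradient: perturb one design variable at a time and difference the outputs, which costs n extra evaluations of j. The function `simulation` is a hypothetical stand-in for such a code (a Rosenbrock-style function picked so the exact gradient at (0.5, 0.5), namely (-6, 5), can be checked by hand), and the step size h = 1e-6 is an assumption, not a recommendation from the lecture.

```python
import numpy as np

def fd_gradient(J, x, h=1e-6):
    """Forward-difference estimate of grad J at x, treating J as a black box."""
    x = np.asarray(x, dtype=float)
    J0 = J(x)                    # one baseline evaluation
    g = np.zeros_like(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += h           # perturb one design variable at a time
        g[i] = (J(x_step) - J0) / h
    return g

def simulation(x):
    """Hypothetical stand-in for a simulation code: we only call it, never differentiate it."""
    return (x[0] - 1.0)**2 + 10.0 * (x[1] - x[0]**2)**2

x_q = np.array([0.5, 0.5])
print(fd_gradient(simulation, x_q))   # close to the exact gradient (-6, 5)
```

The forward difference has truncation error proportional to h, so the estimate is accurate to roughly five or six digits here; a central difference would be more accurate at twice the cost.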
1193 01:01:35,752 --> 01:01:37,210 So Greg, does this sort of answer-- 1194 01:01:37,210 --> 01:01:38,626 I know I didn't do it very deeply. 1195 01:01:38,626 --> 01:01:40,655 But it's probably enough. 1196 01:01:40,655 --> 01:01:41,530 AUDIENCE: [INAUDIBLE] 1197 01:01:41,530 --> 01:01:44,980 PROFESSOR: So the last mathematical ingredient 1198 01:01:44,980 --> 01:01:47,557 that we need, or that we will need, 1199 01:01:47,557 --> 01:01:49,265 is going to be Taylor series expansions. 1200 01:01:49,265 --> 01:01:51,580 And it's just, again, the same thing. 1201 01:01:51,580 --> 01:01:53,510 You've seen them in the scalar case. 1202 01:01:53,510 --> 01:01:58,420 But let's just make sure it's clear in the vector case. 1203 01:01:58,420 --> 01:02:01,490 And by the way, if you really do have lots of questions 1204 01:02:01,490 --> 01:02:04,303 and you need to take the graduate class that [INAUDIBLE] 1205 01:02:04,303 --> 01:02:06,804 and I teach-- which we're teaching this semester, 1206 01:02:06,804 --> 01:02:08,220 and it's offered every other year. 1207 01:02:08,220 --> 01:02:10,780 But it's on design optimization, a whole class 1208 01:02:10,780 --> 01:02:15,152 on this stuff, which is fun. 1209 01:02:15,152 --> 01:02:16,395 Taylor series expansion. 1210 01:02:19,209 --> 01:02:20,750 Again, in the scalar case, let's just 1211 01:02:20,750 --> 01:02:23,460 do what we did over there and do the analogy. 1212 01:02:23,460 --> 01:02:27,950 So in the scalar case, if I had some f of-- let me use z. 1213 01:02:27,950 --> 01:02:29,726 I probably should have used z over there. 1214 01:02:29,726 --> 01:02:30,915 It's OK. 1215 01:02:30,915 --> 01:02:32,414 If I have some f of z, what is that? 1216 01:02:32,414 --> 01:02:33,370 It's f.
1217 01:02:33,370 --> 01:02:39,810 If I'm expanding about the point z0, f of z0 plus the first derivative 1218 01:02:39,810 --> 01:02:42,893 evaluated at the point z0 times z 1219 01:02:42,893 --> 01:02:50,382 minus z0, plus one half times the second derivative evaluated 1220 01:02:50,382 --> 01:03:01,118 at the point z0 times z minus z0 squared, plus blah, blah, blah. 1221 01:03:01,118 --> 01:03:01,617 Right? 1222 01:03:01,617 --> 01:03:04,996 So that's the Taylor series expansion that you've seen. 1223 01:03:04,996 --> 01:03:07,690 So how does it look in the vector case? 1224 01:03:15,450 --> 01:03:18,370 So when we talk about Taylor series expansion, 1225 01:03:18,370 --> 01:03:24,420 we're talking about expanding j of x, where again x is now 1226 01:03:24,420 --> 01:03:26,850 this n-dimensional vector. 1227 01:03:26,850 --> 01:03:30,228 And we're going to expand it around the point 1228 01:03:30,228 --> 01:03:34,391 x0, a point in the landscape. 1229 01:03:34,391 --> 01:03:34,890 OK. 1230 01:03:34,890 --> 01:03:40,140 So what does the first term look like? 1231 01:03:40,140 --> 01:03:57,240 Gradient of j evaluated at x0 multiplied by x minus x0. 1232 01:03:57,240 --> 01:03:57,740 OK. 1233 01:03:57,740 --> 01:04:00,364 So we have to think about the dimensions. 1234 01:04:00,364 --> 01:04:02,280 It always helps me to think about the dimensions. 1235 01:04:02,280 --> 01:04:05,860 So [INAUDIBLE] n by 1, right? 1236 01:04:05,860 --> 01:04:07,741 What do we need to do to this thing? 1237 01:04:07,741 --> 01:04:08,240 Transpose. 1238 01:04:08,240 --> 01:04:16,180 So it's an inner product between the gradient evaluated at x0 1239 01:04:16,180 --> 01:04:19,910 and the delta x, x minus x0. 1240 01:04:19,910 --> 01:04:22,010 Yup. 1241 01:04:22,010 --> 01:04:26,990 Scalar, scalar, 1 by n times n by 1. 1242 01:04:26,990 --> 01:04:27,851 That's a scalar. 1243 01:04:27,851 --> 01:04:28,350 OK. 1244 01:04:28,350 --> 01:04:31,795 How about the second derivative term?
1245 01:04:31,795 --> 01:04:33,420 What do you think that might look like? 1246 01:04:36,880 --> 01:04:39,270 We'll put the Hessian in the middle, with a factor of one half. 1247 01:04:39,270 --> 01:04:40,960 And it's the Hessian, again, evaluated 1248 01:04:40,960 --> 01:04:44,260 at-- let me write it this way-- x0. 1249 01:04:48,112 --> 01:04:50,697 AUDIENCE: x [INAUDIBLE] 1250 01:04:50,697 --> 01:04:51,530 PROFESSOR: OK, good. 1251 01:04:51,530 --> 01:04:52,030 Yeah. 1252 01:04:52,030 --> 01:04:54,140 So x minus x0 on that side. 1253 01:04:54,140 --> 01:04:59,690 And then x minus x0 transpose. 1254 01:04:59,690 --> 01:05:06,890 And again, this is 1 by n times n by n times n by 1, [INAUDIBLE] a scalar. 1255 01:05:06,890 --> 01:05:08,560 And then if you went higher, you would 1256 01:05:08,560 --> 01:05:14,458 be getting a tensor for the third derivative. 1257 01:05:14,458 --> 01:05:16,709 AUDIENCE: [INAUDIBLE] 1258 01:05:16,709 --> 01:05:17,500 PROFESSOR: You can. 1259 01:05:17,500 --> 01:05:20,360 And in fact tensors are very popular, 1260 01:05:20,360 --> 01:05:23,180 with lots of people working on them now. 1261 01:05:23,180 --> 01:05:25,190 I don't-- don't ask me anything about them. 1262 01:05:29,670 --> 01:05:30,622 OK. 1263 01:05:30,622 --> 01:05:38,540 So let me come back to this. 1264 01:05:38,540 --> 01:05:43,180 Then I can start showing you some of the optimization 1265 01:05:43,180 --> 01:05:43,680 methods. 1266 01:05:43,680 --> 01:05:45,555 I'll show you pictures of some of them first. 1267 01:05:45,555 --> 01:05:49,963 And then we'll look-- yeah. 1268 01:05:49,963 --> 01:05:50,838 AUDIENCE: [INAUDIBLE]
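The vector Taylor expansion j(x) ≈ j(x0) + grad j(x0) transpose (x - x0) + 1/2 (x - x0) transpose H(x0) (x - x0) can be checked numerically. In this NumPy sketch the objective, its hand-derived gradient and Hessian, the expansion point x0, and the step are all illustrative assumptions; the point is that the quadratic model's error is much smaller than the linear model's for a small step.

```python
import numpy as np

def J(x):
    """Hypothetical smooth objective used only to check the expansion."""
    return np.exp(x[0]) + x[0] * x[1]**2

def grad_J(x):
    return np.array([np.exp(x[0]) + x[1]**2, 2.0 * x[0] * x[1]])

def hess_J(x):
    # Symmetric: the two mixed partials are equal
    return np.array([[np.exp(x[0]), 2.0 * x[1]],
                     [2.0 * x[1],   2.0 * x[0]]])

def taylor2(x, x0):
    """Second-order model: J(x0) + grad^T dx + 1/2 dx^T H dx, with dx = x - x0."""
    dx = x - x0
    # (1 by n)(n by 1) and (1 by n)(n by n)(n by 1) products are both scalars
    return J(x0) + grad_J(x0) @ dx + 0.5 * dx @ hess_J(x0) @ dx

x0 = np.array([0.0, 1.0])               # expansion point in the landscape
x = x0 + np.array([0.01, -0.02])        # a nearby point
first_order = J(x0) + grad_J(x0) @ (x - x0)
print(abs(J(x) - first_order))          # error of the linear model
print(abs(J(x) - taylor2(x, x0)))       # much smaller error of the quadratic model
```

Shrinking the step by a factor of 10 should shrink the linear model's error by about 100 and the quadratic model's by about 1000.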
1274 01:06:01,770 --> 01:06:03,936 And what you'll probably guess is that it's actually 1275 01:06:03,936 --> 01:06:05,660 going to be xq. 1276 01:06:05,660 --> 01:06:07,716 We're going to lock locally. 1277 01:06:07,716 --> 01:06:09,132 AUDIENCE: [INAUDIBLE] 1278 01:06:09,132 --> 01:06:12,410 PROFESSOR: Yeah, well, whatever. q minus 1, yeah. 1279 01:06:15,470 --> 01:06:15,970 OK. 1280 01:06:15,970 --> 01:06:21,560 So we can close this. 1281 01:06:21,560 --> 01:06:23,060 So actually Professor Johnson, who's 1282 01:06:23,060 --> 01:06:26,910 in the math department who teaches a really great graduate 1283 01:06:26,910 --> 01:06:28,814 class on numerical linear algebra, 1284 01:06:28,814 --> 01:06:30,230 it actually covers lots of things. 1285 01:06:30,230 --> 01:06:33,770 He has this NLopt package where he's implemented lots 1286 01:06:33,770 --> 01:06:34,964 of optimization algorithms. 1287 01:06:34,964 --> 01:06:36,130 And he has them implemented. 1288 01:06:36,130 --> 01:06:38,370 And you can access them in MATLAB or Python 1289 01:06:38,370 --> 01:06:40,153 or a variety of ways. 1290 01:06:40,153 --> 01:06:42,236 And also on the website, he was a really nice kind 1291 01:06:42,236 --> 01:06:45,070 of description of the algorithms and what kind of problems 1292 01:06:45,070 --> 01:06:47,810 they work for and issues with conversions and stuff. 1293 01:06:47,810 --> 01:06:49,529 It's really nice. 1294 01:06:49,529 --> 01:06:51,320 But [INAUDIBLE], you were asking about some 1295 01:06:51,320 --> 01:06:52,736 of the different kinds of methods. 1296 01:06:52,736 --> 01:06:56,389 So this is sort of the four categories. 1297 01:06:56,389 --> 01:06:57,930 There are global optimization methods 1298 01:06:57,930 --> 01:07:00,800 that sort of strive for this, being able to find 1299 01:07:00,800 --> 01:07:02,680 the global optimum. 1300 01:07:02,680 --> 01:07:05,400 There are local methods. 1301 01:07:05,400 --> 01:07:07,010 And I'll show you how these work. 
1302 01:07:07,010 --> 01:07:08,520 And in the local methods, there can 1303 01:07:08,520 --> 01:07:11,655 be ones that are called derivative-free and 1304 01:07:11,655 --> 01:07:12,280 gradient-based. 1305 01:07:12,280 --> 01:07:15,545 So these ones use gradients, and these ones don't use gradients. 1306 01:07:15,545 --> 01:07:17,170 And then there are heuristic methods, 1307 01:07:17,170 --> 01:07:19,295 things like the genetic algorithms that kind of use 1308 01:07:19,295 --> 01:07:21,170 a bunch of rules and some randomness 1309 01:07:21,170 --> 01:07:24,930 to search the design space. 1310 01:07:24,930 --> 01:07:25,525 Yup. 1311 01:07:25,525 --> 01:07:26,400 AUDIENCE: [INAUDIBLE] 1312 01:07:29,350 --> 01:07:32,250 PROFESSOR: So nonlinear refers to the fact that j of x 1313 01:07:32,250 --> 01:07:33,880 could be a general nonlinear function, 1314 01:07:33,880 --> 01:07:38,050 and so could the g of x and h of x-- yup. 1315 01:07:38,050 --> 01:07:41,290 To be a linear program, j would have to be a linear function 1316 01:07:41,290 --> 01:07:42,690 of the x's. 1317 01:07:42,690 --> 01:07:45,340 So just w1 times x1 plus w2 times x2, and so on. 1318 01:07:45,340 --> 01:07:47,840 And then the constraints would also 1319 01:07:47,840 --> 01:07:50,066 have to be linear functions of [INAUDIBLE]. 1320 01:07:50,066 --> 01:07:51,440 And if you have a linear problem, 1321 01:07:51,440 --> 01:07:53,435 then there's the [INAUDIBLE] simplex, 1322 01:07:53,435 --> 01:07:55,060 different from the Nelder-Mead simplex, 1323 01:07:55,060 --> 01:07:59,432 but the simplex method, which is really efficient, 1324 01:07:59,432 --> 01:08:02,015 can solve really big problems, and has lots of theoretical guarantees. 1325 01:08:05,781 --> 01:08:06,280 All right. 1326 01:08:06,280 --> 01:08:09,090 So here's the Nelder-Mead simplex. 1327 01:08:09,090 --> 01:08:10,689 So this was fminsearch.
1328 01:08:10,689 --> 01:08:12,230 Remember, that was the very first one 1329 01:08:12,230 --> 01:08:16,609 that we tried on our little landscape. 1330 01:08:16,609 --> 01:08:20,160 So this is a local method that's derivative-free. 1331 01:08:20,160 --> 01:08:22,229 So it's not using any gradients. 1332 01:08:22,229 --> 01:08:24,410 And how does it work? 1333 01:08:24,410 --> 01:08:26,950 So a simplex is a special polytope of N 1334 01:08:26,950 --> 01:08:30,279 plus 1 vertices in N dimensions. 1335 01:08:30,279 --> 01:08:32,470 For me it's easiest to think about in 2D [INAUDIBLE] 1336 01:08:32,470 --> 01:08:32,970 a triangle. 1337 01:08:32,970 --> 01:08:35,510 So a simplex is a triangle in 2D. 1338 01:08:35,510 --> 01:08:40,590 And here's how this Nelder-Mead simplex method works. 1339 01:08:40,590 --> 01:08:43,660 So you take your initial guess that, again, we have to supply. 1340 01:08:43,660 --> 01:08:45,939 This is the initial point in the landscape. 1341 01:08:45,939 --> 01:08:47,579 And you form an initial simplex. 1342 01:08:47,579 --> 01:08:49,620 So with our 2D landscape, we're going to put down 1343 01:08:49,620 --> 01:08:52,064 a triangle, an initial triangle. 1344 01:08:52,064 --> 01:08:53,230 And we're going to evaluate. 1345 01:08:53,230 --> 01:08:55,185 And remember, I'm talking about unconstrained optimization 1346 01:08:55,185 --> 01:08:55,460 here. 1347 01:08:55,460 --> 01:08:56,749 So there are no constraints. 1348 01:08:56,749 --> 01:09:00,450 So you evaluate j, the objective, 1349 01:09:00,450 --> 01:09:04,091 at each of the three points of the triangle. 1350 01:09:04,091 --> 01:09:05,882 So in two dimensions, we have three points. 1351 01:09:08,259 --> 01:09:10,050 So that's how we get the function values. 1352 01:09:10,050 --> 01:09:12,090 The function value here is j.
1353 01:09:12,090 --> 01:09:14,090 [INAUDIBLE] the triangle-- the triangle can be-- 1354 01:09:14,090 --> 01:09:15,464 AUDIENCE: [INAUDIBLE] 1355 01:09:15,464 --> 01:09:16,130 PROFESSOR: Yeah. 1356 01:09:16,130 --> 01:09:17,580 I mean, yes. 1357 01:09:17,580 --> 01:09:18,477 In terms of size? 1358 01:09:18,477 --> 01:09:19,727 You've got your initial point. 1359 01:09:19,727 --> 01:09:21,560 And you're going to put two others down here. 1360 01:09:21,560 --> 01:09:23,559 There would be-- they're close by though. 1361 01:09:23,559 --> 01:09:24,439 It's local. 1362 01:09:24,439 --> 01:09:25,439 So they're close by. 1363 01:09:25,439 --> 01:09:29,877 Maybe not super close, but they're close by. 1364 01:09:29,877 --> 01:09:31,460 And you'll see that's going to change. 1365 01:09:31,460 --> 01:09:34,550 So it doesn't actually matter too much what we start with. 1366 01:09:34,550 --> 01:09:36,680 We're going to order the vertices according 1367 01:09:36,680 --> 01:09:40,260 to function values and discard the worst one. 1368 01:09:40,260 --> 01:09:42,460 So if we're trying to minimize, we 1369 01:09:42,460 --> 01:09:45,710 have the highest value, the lowest value, 1370 01:09:45,710 --> 01:09:47,170 and the middle one. 1371 01:09:47,170 --> 01:09:50,020 If we're trying to minimize, we would throw out the one 1372 01:09:50,020 --> 01:09:53,899 with the highest value, right? 1373 01:09:53,899 --> 01:09:56,682 So we're going to throw away, in this case, x high, 1374 01:09:56,682 --> 01:09:58,140 because it's got the highest value, 1375 01:09:58,140 --> 01:09:59,520 and we're trying to minimize.
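Ordering and discarding can be sketched the same way. Given the vertices and their j values (the triangle and numbers below are made up for illustration), sort from lowest to highest, set aside the worst vertex x_high, and take the centroid of the remaining vertices. In 2D that centroid is the midpoint of the edge the triangle gets flipped across in the reflection step.

```python
from typing import List, Tuple

Point = List[float]


def order_and_discard(vertices: List[Point], values: List[float]
                      ) -> Tuple[List[Point], Point, Point]:
    """Sort vertices by objective (minimization) and split off the worst one.

    Returns (kept vertices best-first, worst vertex, centroid of kept vertices).
    """
    ranked = [v for _, v in sorted(zip(values, vertices), key=lambda p: p[0])]
    kept, worst = ranked[:-1], ranked[-1]
    n = len(kept)
    centroid = [sum(v[k] for v in kept) / n for k in range(len(worst))]
    return kept, worst, centroid


# Made-up triangle and objective values: x_low, x_high, then the middle one.
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
j_values = [1.0, 3.0, 2.0]
kept, x_high, centroid = order_and_discard(triangle, j_values)
print("x_high =", x_high, "centroid of remaining edge =", centroid)
```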
1376 01:09:59,520 --> 01:10:01,561 And we're going to generate a new point by what's 1377 01:10:01,561 --> 01:10:03,690 called reflection, which means that I'm 1378 01:10:03,690 --> 01:10:06,910 going to go across the line through the remaining two 1379 01:10:06,910 --> 01:10:10,090 and reflect the triangle over and generate a new point that's 1380 01:10:10,090 --> 01:10:12,910 kind of on the opposite side of the simplex. 1381 01:10:12,910 --> 01:10:16,700 And at the same time, I'm also going 1382 01:10:16,700 --> 01:10:20,530 to-- I'm going to [INAUDIBLE] a new point over here. 1383 01:10:20,530 --> 01:10:22,670 This is xr, the reflected point. 1384 01:10:22,670 --> 01:10:26,660 I'm going to run the function there and see whether things 1385 01:10:26,660 --> 01:10:28,824 got better or not. 1386 01:10:28,824 --> 01:10:29,990 So we're trying to minimize. 1387 01:10:29,990 --> 01:10:31,090 We're going downhill. 1388 01:10:31,090 --> 01:10:34,100 Run the function here and see whether this is actually 1389 01:10:34,100 --> 01:10:36,140 a better function value. 1390 01:10:36,140 --> 01:10:37,890 And then I'm also going to decide, 1391 01:10:37,890 --> 01:10:39,860 depending on how steeply down the hill 1392 01:10:39,860 --> 01:10:44,910 I'm going, whether I want to change the size of my simplex. 1393 01:10:44,910 --> 01:10:47,590 And if things are going really well and I'm generating points 1394 01:10:47,590 --> 01:10:49,230 and I'm really going downhill quickly, 1395 01:10:49,230 --> 01:10:51,230 I'm going to make the simplex bigger and bigger, 1396 01:10:51,230 --> 01:10:53,730 so that I can go downhill faster and faster. 1397 01:10:53,730 --> 01:10:56,060 But if this guy's not really so good, 1398 01:10:56,060 --> 01:10:59,720 then I might shrink the simplex so that I look more locally.
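Putting these moves together gives a compact Nelder-Mead sketch in Python. With x_h the worst vertex and xbar the centroid of the others, the reflected point is x_r = xbar + rho (xbar - x_h). A successful reflection is stretched to the expansion point xbar + chi (x_r - xbar), a poor one is pulled back by a contraction with factor gamma, and if even that fails the whole simplex shrinks toward the best vertex by sigma. The coefficients rho = 1, chi = 2, gamma = 1/2, sigma = 1/2, the initial-simplex offset, the stopping tolerances, and the test objective are common textbook choices assumed here; this is not MATLAB's fminsearch source.

```python
from typing import Callable, List, Tuple

Point = List[float]


def nelder_mead(j: Callable[[Point], float], x0: Point, step: float = 0.1,
                max_iter: int = 2000, ftol: float = 1e-12,
                xtol: float = 1e-8) -> Tuple[Point, float]:
    """Minimal Nelder-Mead sketch (derivative-free, local, unconstrained)."""
    rho, chi, gamma, sigma = 1.0, 2.0, 0.5, 0.5   # reflect, expand, contract, shrink
    n = len(x0)
    # Initial simplex: x0 plus a fixed offset along each coordinate (assumed).
    simplex = [list(x0)] + [[x0[k] + (step if k == i else 0.0) for k in range(n)]
                            for i in range(n)]
    values = [j(v) for v in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest j) to worst (highest j).
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        size = max(abs(v[k] - simplex[0][k]) for v in simplex[1:] for k in range(n))
        if values[-1] - values[0] <= ftol and size <= xtol:
            break
        worst = simplex[-1]
        # Centroid of every vertex except the worst one.
        xbar = [sum(v[k] for v in simplex[:-1]) / n for k in range(n)]

        def along(t: float) -> Point:
            # xbar + t * (xbar - worst): t > 0 moves away from the worst vertex.
            return [xbar[k] + t * (xbar[k] - worst[k]) for k in range(n)]

        xr = along(rho)
        jr = j(xr)
        if jr < values[0]:
            # Going downhill quickly: try a bigger step (expansion).
            xe = along(rho * chi)
            je = j(xe)
            simplex[-1], values[-1] = (xe, je) if je < jr else (xr, jr)
        elif jr < values[-2]:
            # Better than the second-worst vertex: accept the reflection.
            simplex[-1], values[-1] = xr, jr
        else:
            # Not good enough: contract (outside if xr beat the worst, else inside).
            xc = along(rho * gamma if jr < values[-1] else -gamma)
            jc = j(xc)
            if jc < min(jr, values[-1]):
                simplex[-1], values[-1] = xc, jc
            else:
                # Contraction failed as well: shrink toward the best vertex.
                best = simplex[0]
                simplex = [best] + [[best[k] + sigma * (v[k] - best[k]) for k in range(n)]
                                    for v in simplex[1:]]
                values = [values[0]] + [j(v) for v in simplex[1:]]

    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


# Usage on a hypothetical smooth bowl with its minimum at (1, -2).
x_best, j_best = nelder_mead(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, [0.0, 0.0])
print(x_best, j_best)
```

Because the method only compares values of j, the same call can be made on a non-smooth objective such as |x - 1| + |y + 2|, although, as noted in the lecture, Nelder-Mead has known convergence issues and is not guaranteed to reach the minimum.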
1399 01:10:59,720 --> 01:11:02,170 So maybe you can kind of visualize 1400 01:11:02,170 --> 01:11:04,130 that what this optimization algorithm looks 1401 01:11:04,130 --> 01:11:08,220 like is a bunch of triangles that keep flipping over 1402 01:11:08,220 --> 01:11:11,410 just by measuring, by ordering the performance 1403 01:11:11,410 --> 01:11:13,260 of the vertices, throwing out the worst one, 1404 01:11:13,260 --> 01:11:15,770 generating a new one on the other side, walking kind of 1405 01:11:15,770 --> 01:11:16,820 through the design space. 1406 01:11:16,820 --> 01:11:19,600 And then the triangles are growing or shrinking depending 1407 01:11:19,600 --> 01:11:23,282 on how well things are going. 1408 01:11:23,282 --> 01:11:24,240 So there's no gradient. 1409 01:11:24,240 --> 01:11:26,930 All it is is sampling and ordering. 1410 01:11:26,930 --> 01:11:30,420 There are some convergence issues with it. 1411 01:11:30,420 --> 01:11:33,010 But actually, it's kind of a simple algorithm. 1412 01:11:33,010 --> 01:11:34,383 And it's pretty robust. 1413 01:11:37,869 --> 01:11:40,410 The function doesn't have to be differentiable, right? j of x 1414 01:11:40,410 --> 01:11:42,830 doesn't have to be smooth, as long 1415 01:11:42,830 --> 01:11:46,975 as a better point has a lower value of j 1416 01:11:46,975 --> 01:11:49,460 of x than another one. 1417 01:11:49,460 --> 01:11:50,998 That's OK, right? 1418 01:11:53,686 --> 01:12:00,280 So look back at when I ran fminsearch; let me pull it up. 1419 01:12:00,280 --> 01:12:02,686 So you can see here in the MATLAB output 1420 01:12:02,686 --> 01:12:08,025 it was [INAUDIBLE] things. 1421 01:12:08,025 --> 01:12:11,050 So that was telling me what was going on.
1422 01:12:11,050 --> 01:12:13,300 Every time it was evaluating new points, 1423 01:12:13,300 --> 01:12:15,225 so two new points each time, that 1424 01:12:15,225 --> 01:12:18,080 was the simplex expanding, reflecting, contracting; 1425 01:12:18,080 --> 01:12:19,690 that's what was going on with the points. 1426 01:12:22,390 --> 01:12:22,890 OK. 1427 01:12:22,890 --> 01:12:25,390 So that's a derivative-free method. 1428 01:12:25,390 --> 01:12:28,680 It uses a little bit of sampling, but it is local. 1429 01:12:28,680 --> 01:12:30,856 And that's fminsearch in MATLAB. 1430 01:12:34,260 --> 01:12:38,150 This one is sort of the same-- I don't know, it sort of has 1431 01:12:38,150 --> 01:12:39,440 the same feeling. 1432 01:12:39,440 --> 01:12:41,820 But it's a global optimization method. 1433 01:12:41,820 --> 01:12:42,770 It's called DIRECT. 1434 01:12:42,770 --> 01:12:45,560 And DIRECT stands for Dividing Rectangles. 1435 01:12:45,560 --> 01:12:48,240 And basically what it does is it divides the domain 1436 01:12:48,240 --> 01:12:50,730 into these rectangles in 2D. 1437 01:12:50,730 --> 01:12:52,980 In higher dimensions, you'll have hyperrectangles. 1438 01:12:52,980 --> 01:12:56,330 And it basically figures out where to look. 1439 01:12:56,330 --> 01:12:59,395 So if this one is interesting, then it would divide that more. 1440 01:12:59,395 --> 01:13:00,770 And then this guy is interesting, 1441 01:13:00,770 --> 01:13:01,980 so then it divides that one more. 1442 01:13:01,980 --> 01:13:04,355 And that one's interesting, and it keeps dividing and dividing 1443 01:13:04,355 --> 01:13:05,280 and dividing more. 1444 01:13:05,280 --> 01:13:08,500 And it's got all different rules for figuring out 1445 01:13:08,500 --> 01:13:11,170 which ones to divide and which ones not to divide 1446 01:13:11,170 --> 01:13:16,230 and how to progress. 1447 01:13:16,230 --> 01:13:19,229 So this is one that tries to do global optimization.
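The dividing idea behind DIRECT is easiest to see in 1D, where the rectangles become intervals. The sketch below is a simplified, hypothetical 1D version written for illustration. Each iteration selects the "potentially optimal" intervals, those whose lower bound f(center) - K * (width / 2) is the smallest of all for some rate K > 0 (trading a low center value against a large unexplored width), and splits each selected interval into thirds. It omits DIRECT's epsilon condition and its rules for choosing which sides of a hyperrectangle to split, so it is not the full multidimensional algorithm.

```python
import math
from typing import Callable, List, Tuple

Interval = Tuple[float, float, float]   # (center, width, f(center))


def potentially_optimal(intervals: List[Interval]) -> List[int]:
    """Indices i for which some K > 0 makes f_i - K * w_i / 2 the lowest of all.

    Simplified DIRECT selection rule (the usual epsilon test is omitted).
    """
    chosen = []
    for i, (_, wi, fi) in enumerate(intervals):
        k_low, k_high, best_of_size = 0.0, math.inf, True
        for m, (_, wm, fm) in enumerate(intervals):
            if m == i:
                continue
            if wm == wi:
                best_of_size = best_of_size and fi <= fm
            elif wm < wi:   # smaller intervals force K to be at least this large
                k_low = max(k_low, (fi - fm) / ((wi - wm) / 2))
            else:           # larger intervals force K to be at most this large
                k_high = min(k_high, (fm - fi) / ((wm - wi) / 2))
        if best_of_size and k_low <= k_high and k_high > 0:
            chosen.append(i)
    return chosen


def direct_1d(f: Callable[[float], float], a: float, b: float,
              iterations: int = 20) -> Tuple[float, float]:
    """Trisect [a, b] wherever the selection rule says to look."""
    c0 = 0.5 * (a + b)
    intervals: List[Interval] = [(c0, b - a, f(c0))]
    for _ in range(iterations):
        selected = set(potentially_optimal(intervals))
        refined: List[Interval] = []
        for idx, (c, w, fc) in enumerate(intervals):
            if idx not in selected:
                refined.append((c, w, fc))
                continue
            w3 = w / 3.0
            # The middle third keeps the old center, so f is not re-evaluated there.
            refined.append((c, w3, fc))
            refined.append((c - w3, w3, f(c - w3)))
            refined.append((c + w3, w3, f(c + w3)))
        intervals = refined
    c_best, _, f_best = min(intervals, key=lambda t: t[2])
    return c_best, f_best


# Usage on a hypothetical 1D objective with its minimum at x = 0.3.
x_best, f_best = direct_1d(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
print(x_best, f_best)
```

Because the largest interval with the lowest center value always passes the test, the search keeps revisiting coarse, unexplored regions while refining promising ones; that is what makes DIRECT global, and also why it can keep running for a long time.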
1448 01:13:19,229 --> 01:13:20,770 The problem with this one is that it just 1449 01:13:20,770 --> 01:13:23,103 really keeps running and running and running and running 1450 01:13:23,103 --> 01:13:25,200 and running. 1451 01:13:25,200 --> 01:13:26,995 So we've used this a couple times. 1452 01:13:26,995 --> 01:13:28,560 But it tends to just take so long 1453 01:13:28,560 --> 01:13:30,760 to run that it's not particularly useful. 1454 01:13:33,280 --> 01:13:33,780 OK. 1455 01:13:33,780 --> 01:13:36,652 So let's see what time we have left. 1456 01:13:36,652 --> 01:13:38,008 We have about 10 minutes. 1457 01:13:38,008 --> 01:13:41,170 So let's at least start talking about 1458 01:13:41,170 --> 01:13:44,470 gradient-based methods. 1459 01:13:44,470 --> 01:13:47,760 So again, this is what we've been talking about: starting 1460 01:13:47,760 --> 01:13:49,628 with an initial guess, x0. 1461 01:13:52,380 --> 01:13:55,700 Then we're going to compute two things, the search direction, 1462 01:13:55,700 --> 01:13:58,990 Sq, and then alpha-- this should have a q on it-- 1463 01:13:58,990 --> 01:14:01,130 how far we step in that direction. 1464 01:14:01,130 --> 01:14:03,030 And with a gradient-based method, 1465 01:14:03,030 --> 01:14:08,500 we're going to use the gradient of j to compute this Sq. 1466 01:14:08,500 --> 01:14:09,232 OK. 1467 01:14:09,232 --> 01:14:10,940 And we're going to check for convergence. 1468 01:14:10,940 --> 01:14:12,856 We'll talk about what that might mean and just 1469 01:14:12,856 --> 01:14:16,050 keep going around this loop until we're done. 1470 01:14:16,050 --> 01:14:18,844 So these are the methods that we'll talk about. 1471 01:14:18,844 --> 01:14:21,260 I think maybe we'll just talk about steepest descent right 1472 01:14:21,260 --> 01:14:21,760 now. 1473 01:14:21,760 --> 01:14:24,800 And then we can talk about the other ones on Monday.
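The loop just described (guess x0, compute a search direction Sq from the gradient, pick a step alpha_q, update, check convergence) can be sketched in Python. In this sketch the direction rule is steepest descent, Sq = -grad J(x_{q-1}), which is introduced next. The step alpha_q comes from a simple backtracking rule, which is an assumption here since the lecture defers the choice of alpha to the next class; "converged" is taken to mean the gradient norm drops below a tolerance, and the quadratic objective is hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def steepest_descent(j: Callable[[Vec], float], grad: Callable[[Vec], Vec],
                     x0: Vec, tol: float = 1e-8,
                     max_iter: int = 10_000) -> Tuple[Vec, int]:
    """Generic gradient-based loop with S_q = -grad J(x_{q-1}).

    Returns the final design and the number of iterations taken.
    """
    x = list(x0)
    for q in range(max_iter):
        g = grad(x)
        if math.sqrt(dot(g, g)) < tol:            # convergence check
            return x, q
        s = [-gi for gi in g]                     # search direction S_q
        # Step length alpha_q by backtracking (assumed rule): halve alpha
        # until J decreases by at least a small fraction of the predicted amount.
        alpha, j_x, slope = 1.0, j(x), dot(g, s)
        while j([xi + alpha * si for xi, si in zip(x, s)]) > j_x + 1e-4 * alpha * slope:
            alpha *= 0.5
        x = [xi + alpha * si for xi, si in zip(x, s)]   # x_q = x_{q-1} + alpha_q S_q
    return x, max_iter


def J(x: Vec) -> float:
    """Hypothetical objective with its minimum at (1, -0.5)."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 0.5) ** 2


def dJ(x: Vec) -> Vec:
    """Analytic gradient of J."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 0.5)]


x_opt, n_iter = steepest_descent(J, dJ, [0.0, 0.0])
print(x_opt, n_iter)
```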
1474 01:14:24,800 --> 01:14:26,840 But steepest descent and conjugate gradient 1475 01:14:26,840 --> 01:14:28,722 are first-order methods. 1476 01:14:28,722 --> 01:14:30,180 Then the Newton method, which I think you've 1477 01:14:30,180 --> 01:14:32,750 seen maybe in the scalar case. 1478 01:14:32,750 --> 01:14:34,485 Have you seen the Newton method for root 1479 01:14:34,485 --> 01:14:37,510 finding in the scalar case? 1480 01:14:37,510 --> 01:14:41,310 You've seen Newton [INAUDIBLE] in [INAUDIBLE], a little bit 1481 01:14:41,310 --> 01:14:42,150 different. 1482 01:14:42,150 --> 01:14:44,912 We'll see how the Newton method looks in the [INAUDIBLE] case. 1483 01:14:44,912 --> 01:14:47,120 And then there are things called quasi-Newton methods 1484 01:14:47,120 --> 01:14:51,970 that kind of sit in the middle between the two. 1485 01:14:51,970 --> 01:14:53,500 So let's look at steepest descent. 1486 01:14:53,500 --> 01:14:55,249 It's the simplest thing you could possibly 1487 01:14:55,249 --> 01:14:57,860 do, or possibly think about doing. 1488 01:14:57,860 --> 01:15:03,500 And it just says set the search direction on iteration q 1489 01:15:03,500 --> 01:15:10,160 to be the negative of the gradient of j evaluated at the current design 1490 01:15:10,160 --> 01:15:11,974 iterate, xq minus 1. 1491 01:15:11,974 --> 01:15:13,807 So why would you pick that search direction? 1492 01:15:17,470 --> 01:15:19,350 It's the steepest descent, yeah. 1493 01:15:19,350 --> 01:15:21,410 So we know that the gradient of j 1494 01:15:21,410 --> 01:15:25,070 points in the direction of maximum local increase of j. 1495 01:15:25,070 --> 01:15:27,000 The negative gradient of j points in the direction 1496 01:15:27,000 --> 01:15:28,300 of steepest descent. 1497 01:15:28,300 --> 01:15:29,990 So here's the algorithm. 1498 01:15:29,990 --> 01:15:31,780 But again, just think about the landscape. 1499 01:15:31,780 --> 01:15:32,820 What are you doing? 1500 01:15:32,820 --> 01:15:33,610 You're standing in the landscape.
1501 01:15:33,610 --> 01:15:34,580 You're looking around. 1502 01:15:34,580 --> 01:15:36,752 You're finding the direction of steepest descent. 1503 01:15:36,752 --> 01:15:38,460 And you're going to go in that direction. 1504 01:15:38,460 --> 01:15:41,084 And we'll have to talk on Monday about how to choose the alpha. 1505 01:15:41,084 --> 01:15:44,490 But we find the direction of steepest descent. 1506 01:15:44,490 --> 01:15:46,505 We choose the alpha. 1507 01:15:46,505 --> 01:15:47,130 We take a step. 1508 01:15:47,130 --> 01:15:48,011 We look around. 1509 01:15:48,011 --> 01:15:49,760 We find the direction of steepest descent. 1510 01:15:49,760 --> 01:15:54,680 We compute a new gradient, find the new alpha, take a step, 1511 01:15:54,680 --> 01:15:58,145 and keep repeating. 1512 01:15:58,145 --> 01:15:59,020 AUDIENCE: [INAUDIBLE] 1513 01:16:09,640 --> 01:16:10,985 PROFESSOR: That's right. 1514 01:16:10,985 --> 01:16:11,860 That's exactly right. 1515 01:16:11,860 --> 01:16:13,745 And I'll show you how we do this on Monday. 1516 01:16:13,745 --> 01:16:14,620 That's exactly right. 1517 01:16:14,620 --> 01:16:16,161 It becomes [INAUDIBLE] the 1D search, 1518 01:16:16,161 --> 01:16:18,820 because once you've defined the direction, it's now just 1D. 1519 01:16:18,820 --> 01:16:20,815 So that's the first gradient-based optimization 1520 01:16:20,815 --> 01:16:21,820 algorithm. 1521 01:16:21,820 --> 01:16:22,620 It's really simple. 1522 01:16:22,620 --> 01:16:24,370 It turns out that's not a great algorithm, 1523 01:16:24,370 --> 01:16:27,410 because it converges really slowly. 1524 01:16:27,410 --> 01:16:28,880 But conjugate gradient is actually 1525 01:16:28,880 --> 01:16:31,250 something that's not too different. 1526 01:16:31,250 --> 01:16:33,020 So now what does conjugate gradient do? 1527 01:16:33,020 --> 01:16:36,720 The first search direction S on the first iteration 1528 01:16:36,720 --> 01:16:39,120 is the steepest descent direction.
1529 01:16:39,120 --> 01:16:40,100 Yeah. 1530 01:16:40,100 --> 01:16:41,820 But then on subsequent iterations, 1531 01:16:41,820 --> 01:16:44,910 the search direction is the steepest descent direction, 1532 01:16:44,910 --> 01:16:47,750 minus grad j, plus this term that's 1533 01:16:47,750 --> 01:16:52,020 beta q times S q minus 1, so the steepest descent direction 1534 01:16:52,020 --> 01:16:56,900 modified by the last search direction. 1535 01:16:56,900 --> 01:17:00,926 And the beta is the ratio of the gradients on the last two-- 1536 01:17:00,926 --> 01:17:03,300 the ratio of the norms of the gradients from the last two 1537 01:17:03,300 --> 01:17:04,990 iterations. 1538 01:17:04,990 --> 01:17:07,560 So what is going on here? 1539 01:17:07,560 --> 01:17:10,572 Again, you can look at the math. 1540 01:17:10,572 --> 01:17:12,030 But basically, to see what is happening, 1541 01:17:12,030 --> 01:17:13,280 think about the landscape. 1542 01:17:13,280 --> 01:17:15,210 So I'm standing at x0. 1543 01:17:15,210 --> 01:17:18,360 I find the direction of steepest descent. 1544 01:17:18,360 --> 01:17:20,326 I move in that direction. 1545 01:17:20,326 --> 01:17:21,950 Now when I get to the second iteration, 1546 01:17:21,950 --> 01:17:24,560 I find the direction of steepest descent. 1547 01:17:24,560 --> 01:17:26,140 But I also look over my shoulder, 1548 01:17:26,140 --> 01:17:28,465 and I see where I came from. 1549 01:17:28,465 --> 01:17:30,340 I find the direction of steepest descent. 1550 01:17:30,340 --> 01:17:32,006 And then depending on where I came from, 1551 01:17:32,006 --> 01:17:34,220 I'm going to modify that direction a little bit, 1552 01:17:34,220 --> 01:17:35,655 and then move. 1553 01:17:35,655 --> 01:17:37,030 And then I'm going to go, and I'm 1554 01:17:37,030 --> 01:17:38,320 going to find the direction of steepest descent.
1555 01:17:38,320 --> 01:17:39,820 I'm going to look at where I just came from 1556 01:17:39,820 --> 01:17:41,100 and modify it a little bit. 1557 01:17:41,100 --> 01:17:45,850 So I'm incorporating information about where I came from. 1558 01:17:45,850 --> 01:17:48,000 And if you look at this example, this function 1559 01:17:48,000 --> 01:17:50,059 is a really famous example, 1560 01:17:50,059 --> 01:17:51,600 an analytic function that's often used 1561 01:17:51,600 --> 01:17:53,011 to test optimization algorithms. 1562 01:17:53,011 --> 01:17:54,510 It's called the Rosenbrock function. 1563 01:17:54,510 --> 01:17:56,380 It's sometimes called the banana function, 1564 01:17:56,380 --> 01:18:00,010 because it looks like a banana with really steep walls. 1565 01:18:00,010 --> 01:18:01,620 And what you can see here on the left 1566 01:18:01,620 --> 01:18:05,994 is the steepest descent algorithm starting 1567 01:18:05,994 --> 01:18:07,160 from the initial point here. 1568 01:18:07,160 --> 01:18:08,450 So we start here. 1569 01:18:08,450 --> 01:18:10,356 We compute the direction of steepest descent. 1570 01:18:10,356 --> 01:18:11,730 These are contours of the objective. 1571 01:18:11,730 --> 01:18:13,280 So the direction of steepest descent 1572 01:18:13,280 --> 01:18:15,777 is perpendicular to the contours, right? 1573 01:18:15,777 --> 01:18:16,519 Yup. 1574 01:18:16,519 --> 01:18:19,060 Perpendicular to the contours, what you were saying, Dominic. 1575 01:18:19,060 --> 01:18:19,970 It looks along here. 1576 01:18:19,970 --> 01:18:21,905 And there's the point that would minimize. 1577 01:18:21,905 --> 01:18:24,030 It gets to the new point, computes the direction 1578 01:18:24,030 --> 01:18:27,260 of steepest descent, moves along here.
1579 01:18:27,260 --> 01:18:30,260 It gets here, computes the direction of steepest descent, 1580 01:18:30,260 --> 01:18:32,680 steepest descent, steepest descent, steepest descent. 1581 01:18:32,680 --> 01:18:34,480 Can you see what's happening to this guy? 1582 01:18:34,480 --> 01:18:37,060 It's going to take probably hundreds 1583 01:18:37,060 --> 01:18:40,070 of little tiny steps going zz, zz, zz, zz, as it 1584 01:18:40,070 --> 01:18:42,726 gets to the optimum solution. 1585 01:18:42,726 --> 01:18:44,500 What does conjugate gradient do? 1586 01:18:44,500 --> 01:18:47,670 So the first one is the same, the first iteration, 1587 01:18:47,670 --> 01:18:48,850 steepest descent. 1588 01:18:48,850 --> 01:18:51,180 But now, when we get to this point, 1589 01:18:51,180 --> 01:18:53,280 we compute the direction of steepest descent, 1590 01:18:53,280 --> 01:18:57,280 which again would be perpendicular to the contour. 1591 01:18:57,280 --> 01:18:59,780 So it would be like that. 1592 01:18:59,780 --> 01:19:01,830 But now, we also look back over our shoulder, 1593 01:19:01,830 --> 01:19:05,320 see where we came from, compute that beta, 1594 01:19:05,320 --> 01:19:09,660 add in that beta S1 term and modify it a little bit. 1595 01:19:09,660 --> 01:19:11,470 So it actually takes us there. 1596 01:19:11,470 --> 01:19:13,600 Not terribly different, right? 1597 01:19:13,600 --> 01:19:14,610 But different enough. 1598 01:19:14,610 --> 01:19:15,630 Now, we get here. 1599 01:19:15,630 --> 01:19:17,872 Steepest descent, again, would be perpendicular. 1600 01:19:17,872 --> 01:19:18,580 It would be here. 1601 01:19:18,580 --> 01:19:19,750 We modify it a little bit. 1602 01:19:19,750 --> 01:19:22,420 We skip all the way here and continue on. 1603 01:19:22,420 --> 01:19:27,690 So conjugate gradient converged in 1, 2, 3, 4, 5 iterations.
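To reproduce the qualitative comparison without the slide's plot, the sketch below runs steepest descent and Fletcher-Reeves conjugate gradient, S_q = -grad J(x_{q-1}) + beta_q S_{q-1} with beta_q = ||grad J(x_{q-1})||^2 / ||grad J(x_{q-2})||^2, on a hypothetical elongated quadratic J(x) = 0.5 x^T A x rather than on the Rosenbrock function. The quadratic is an assumption chosen because its exact line-search step has a closed form, alpha = -(g^T S) / (S^T A S), so no separate 1D search routine is needed. On such a quadratic, conjugate gradient with exact line searches reaches the minimum in at most N = 2 steps, while steepest descent zigzags back and forth across the valley.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(a: Mat, v: Vec) -> Vec:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u: Vec, v: Vec) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def minimize_quadratic(a: Mat, x0: Vec, conjugate: bool,
                       tol: float = 1e-8, max_iter: int = 1000) -> Tuple[Vec, int]:
    """Minimize J(x) = 0.5 x^T A x with exact line searches.

    conjugate=False gives steepest descent; True gives Fletcher-Reeves CG.
    Returns the final x and the number of iterations taken.
    """
    x = list(x0)
    g = matvec(a, x)                      # grad J(x) = A x
    s = [-gi for gi in g]                 # first direction: steepest descent
    for q in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            return x, q
        alpha = -dot(g, s) / dot(s, matvec(a, s))   # exact 1D minimizer along s
        x = [xi + alpha * si for xi, si in zip(x, s)]
        g_new = matvec(a, x)
        if conjugate:
            beta = dot(g_new, g_new) / dot(g, g)    # Fletcher-Reeves ratio
            s = [-gn + beta * si for gn, si in zip(g_new, s)]
        else:
            s = [-gn for gn in g_new]
        g = g_new
    return x, max_iter


A = [[1.0, 0.0], [0.0, 10.0]]   # hypothetical elongated bowl (condition number 10)
x0 = [10.0, 1.0]
x_sd, it_sd = minimize_quadratic(A, x0, conjugate=False)
x_cg, it_cg = minimize_quadratic(A, x0, conjugate=True)
print("steepest descent iterations:", it_sd)
print("conjugate gradient iterations:", it_cg)
```

The only difference between the two runs is the beta term that "looks back over the shoulder" at the previous direction. Trying the Rosenbrock function itself would require replacing the closed-form alpha with a numerical 1D line search.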
1604 01:19:27,690 --> 01:19:36,496 Whereas steepest descent [AUDIO OUT] gradient 1605 01:19:36,496 --> 01:19:45,140 [AUDIO OUT] much, much faster [INAUDIBLE] to converge. 1606 01:19:45,140 --> 01:19:51,360 And [AUDIO OUT] and you want to get down, 1607 01:19:51,360 --> 01:19:52,690 don't use steepest descent, 1608 01:19:52,690 --> 01:19:54,810 because you're going to end up going 1609 01:19:54,810 --> 01:19:56,600 back and forth across yourself. 1610 01:19:56,600 --> 01:20:00,700 Always look to see where you came from [INAUDIBLE]. 1611 01:20:00,700 --> 01:20:01,200 OK. 1612 01:20:01,200 --> 01:20:02,290 So let's finish there. 1613 01:20:02,290 --> 01:20:05,310 We'll talk about Newton's method and various other things. 1614 01:20:05,310 --> 01:20:07,714 We'll talk about computing gradients on Monday as well. 1615 01:20:07,714 --> 01:20:09,380 But if you have questions, stick around. 1616 01:20:09,380 --> 01:20:11,005 I'm going to do office hours now if you 1617 01:20:11,005 --> 01:20:15,230 have questions about the lecture or questions about the project.