The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JAMES SWAN: OK. Should we begin? Let me remind you, we switched topics. We transitioned from linear algebra, solving systems of linear equations, to solving systems of nonlinear equations. And it turns out, linear algebra is at the core of the way that we're going to solve these equations. We need iterative approaches. These problems are complicated. We don't know how many solutions there could be. We have no idea where those solutions could be located. We have no exact ways of finding them. We use iterative methods to transform nonlinear equations into simpler problems, right? Iterates of systems of linear equations. And the key to that was the Newton-Raphson method.

So I'm going to pick up where we left off with the Newton-Raphson method, and we're going to find ways of being less Newton-Raphson-y in order to overcome some difficulties with the method, shortcomings of the method. There are a number of them that have to be overcome in various ways. And you sort of choose these so-called quasi-Newton-Raphson methods as you need them. OK, so you'll find out. You try to solve a problem, and the Newton-Raphson method presents some difficulty, you might resort to a quasi-Newton-Raphson method instead.

Built into MATLAB is a nonlinear equation solver, fsolve. OK, it's going to happily solve systems of nonlinear equations for you, and it's going to use this methodology to do it. It's going to use various aspects of these quasi-Newton-Raphson methods to do it. I'll sort of point out places where fsolve takes things from our lecture and implements them for you.
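For reference, a minimal call to fsolve looks like the sketch below; the two-equation system here is made up purely for illustration.

    % A made-up two-equation system, for illustration only:
    %   f1 = x1^2 + x2^2 - 4,   f2 = x1*x2 - 1
    fun = @(x) [x(1)^2 + x(2)^2 - 4; x(1)*x(2) - 1];
    x0  = [1; 1];              % initial guess
    x   = fsolve(fun, x0);     % MATLAB's built-in nonlinear equation solver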
It will even use some more complicated methods that we'll talk about later on in the context of optimization.

Somebody asked an interesting question, which is: how many of these nonlinear equations am I going to want to solve at once? Right? Like, I have a system of these equations. What does a big system of nonlinear equations look like? And just like with linear equations, it's as big as you can imagine. So one case you could think about is trying to solve, for example, the steady Navier-Stokes equations. That's a nonlinear partial differential equation for the velocity field and the pressure in a fluid. And at high Reynolds number, that nonlinearity is going to present itself in terms of inertial terms that may even dominate the flow characteristics in many places. We'll learn ways of discretizing partial differential equations like that. And so then, at each point in the fluid we're interested in, we're going to have a nonlinear equation that we have to solve. So there's going to be a system of these nonlinear equations that are coupled together. How many points are there going to be? That's up to you, OK? And so you're going to need methods like this to solve that. It sounds very complicated. So a lot of times in fluid mechanics, we have better ways of going about doing it. But in principle, we can have any number of nonlinear equations that we want to solve.

We discussed last time the Newton-Raphson method, which was based around the idea of linearization. We have these nonlinear equations. We don't know what to do with them. So let's linearize them, right? If we have some guess for the solution, which isn't perfect, but it's our best possible guess, let's look at the function and find a linearized form of the function and see where that linearized form has an intercept. And we just have an Ansatz. We guess that this is a better solution than the one we had before. And we iterate.
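The iteration just described, written as a minimal MATLAB sketch; f, J, x0, maxiter, and tol are placeholder names for the residual function, its Jacobian, the initial guess, and the stopping criteria.

    x = x0;
    for i = 1:maxiter
        d = -J(x) \ f(x);                 % solve the linearized system for the step
        x = x + d;                        % take the full Newton-Raphson step
        if norm(f(x)) < tol && norm(d) < tol
            break                         % stop when function norm and step norm are small
        end
    end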
It turns out you can prove that this sort of a strategy-- this Newton-Raphson strategy-- is locally convergent. If I start with a guess sufficiently close to the root, you can prove mathematically that this procedure will terminate with a solution at the root, right? It's going to approach the root after an infinite number of iterates. That's wonderful. It's locally convergent, not globally convergent. So this is one of those problems that we discussed.

Take a second here, right? Here's your Newton-Raphson formula. You've got it on your slides. Take a second here and-- this is sort of interesting-- derive the Babylonian method, right? It turns out the Babylonians didn't know anything about Newton-Raphson, but they had some good guesses for how to find square roots, right? Find the roots of an equation like this. See that you understand the Newton-Raphson method by deriving the Babylonian method, right? The iterative method for finding the square root of s as the root of this equation. Can you do it?

[SIDE CONVERSATION]

JAMES SWAN: Yes, you know how to do this, right? So calculate the derivative. The derivative is 2x. Here's our formula for the iterative method, right? So it's f of x over f prime of x. That sets the magnitude of the step. The direction is minus this magnitude. It's in 1D, so we either go left or we go right. Minus sets the direction. We add that to our previous guess, and we have our new iterate, right? You substitute f and f prime, and you can simplify this down to the Babylonian method, which says take the average of x and s over x. If I'm at the root, both of these should be square root of s, and this quantity should be zero exactly, right? And you'll get your solution. So that's the Babylonian method, right? It's just a special case of the Newton-Raphson method. It was pretty good back in the day, right?
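As a quick check, the Babylonian iteration derived above can be written as the sketch below; the number s and the initial guess are arbitrary placeholders.

    s = 2;                    % number whose square root we want
    x = 1;                    % initial guess
    for i = 1:10
        x = 0.5*(x + s/x);    % Newton-Raphson on f(x) = x^2 - s, simplified
    end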
Quadratic convergence to the square root of a number. I mentioned early on, computers got really good at computing square roots at one point, because somebody did something kind of magic. They came up with a scheme for getting good initial guesses for the square root. This iterative method has to start with some initial guess. If it starts far away, it'll take more iterations to get there. It'll get there, but it's going to take more iterations to get there. That's undesirable if you're trying to do fast calculations. So somebody came up with some magic scheme using floating point mathematics, right? They masked some of the bits in the digits of these numbers-- a special number to mask those bits. They found that using optimization, it turns out. And they got really good initial guesses, and then it would take one or two iterations with the Newton-Raphson method to get 16 digits of accuracy. That's pretty good.

But good initial guesses are important. We'll talk about that next week on Wednesday-- where do those good initial guesses come from? But sometimes we don't have those available to us. So what are some other ways that we can improve the Newton-Raphson method? That will be the topic of today's lecture.

What does the Newton-Raphson method look like graphically in many dimensions? We talked about this Jacobian. Right, when we're trying to find the roots of a nonlinear equation where our function has more than one dimension-- let's say it has two dimensions. So we have an f1 and an f2. And our unknowns are x1 and x2; they live in the x1-x2 plane, right? f1 might be this, say, bowl-shaped function I've sketched out in red, right? It's three dimensional. It's some surface here, right? We have some initial guess for the solution. We go up to the function, and we find a linearization of it, which is not a line but a plane. And that plane intersects the x1-x2 plane at a line.
And our next best guess is going to live somewhere on this line. Where on this line depends on the linearization of f2, right? So we've got to draw the same picture for f2, but I'm not going to do that for you. So let's say this is where the equivalent line from f2 intersects the line from f1, right? So the two linearizations intersect here. That's our next best guess. We go back up to the curve. We find the plane that's tangent to the curve. We figure out where it intersects the x1-x2 plane. That's a line. We find the point on the line that's our next best guess, and continue. Finding that intersection in the plane is the act of computing Jacobian inverse times f. OK?

If we project down to just the x1-x2 plane, and we draw the curves where f1 equals 0 and f2 equals 0, right? Then for each of these iterates, we start with an initial guess. We find the planes that are tangent to these curves, or to these surfaces, and where they intersect the x1-x2 plane. Those give us these lines. And the intersection of the lines gives us our next approximation. And so our iterate steps along in the x1-x2 plane. It takes some path through that plane. And eventually it will approach this locally unique solution. So that's what this iterative method is doing, right? It's navigating this multidimensional space, right? It moves where it has to in order to satisfy these linearized equations, right? Producing ever better approximations for a root.

Start close, and it'll converge fast. How fast? Quadratically. And you can prove this. I'll prove it in 1D. You might think about the multidimensional case, but I'll show you in one dimension. So the Newton-Raphson method says xi plus 1 is equal to xi minus f of xi over f prime of xi. I'm going to subtract the root, the exact root, from both sides of this equation.
So this is the absolute error in the i plus 1 approximation. It's equal to this. And we're going to do a little trick, OK? The value of the function at the root is exactly equal to zero, and I'm going to expand this as a Taylor series about the point xi. So f of xi, plus f prime of xi times x star minus xi, plus this second order term as well, plus cubic terms in this Taylor expansion, right? All of those need to sum up and be equal to 0, because f of x star, by definition, is zero. x star is the root.

And buried in this expression here is a quantity which can be related to xi minus f of xi over f prime, minus x star. It's right here, right? xi minus x star, xi minus x star. I've got to divide through by f prime. Divide through by f prime, and I get f over f prime. That's this guy here. Those things are equal in magnitude, then, to this second order term here. So they are equal in magnitude to 1/2 the second derivative of f, divided by f prime, times xi minus x star squared. And then these cubic terms, well, they're still around. But they're going to be small as I get close to the actual root. So they're negligible, right? Compared to these second order terms, they can be neglected.

And you should convince yourself that I can apply some of the norm properties that we used before, OK? To the absolute value. The absolute value is the norm of a scalar. So these norm properties tell me that this quantity has to be less than or equal to, right? This ratio of derivatives multiplied by the absolute error in step i squared. And I'll divide by that absolute error in step i squared. So taking the limit as i goes to infinity, this ratio here is bound by a constant. This is a definition for the rate of convergence. It says I take the absolute error in step i plus 1. I divide it by the absolute error in step i squared. And it will always be smaller than some constant as i goes to infinity. So it converges quadratically, right?
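Written out, the algebra just described is (a reconstruction of the slide's derivation):

\[
x_{i+1} - x^* = (x_i - x^*) - \frac{f(x_i)}{f'(x_i)},
\]
\[
0 = f(x^*) = f(x_i) + f'(x_i)\,(x^* - x_i) + \tfrac{1}{2} f''(x_i)\,(x^* - x_i)^2 + O\!\big((x^* - x_i)^3\big).
\]
Dividing the Taylor expansion by \(f'(x_i)\) and substituting into the first equation gives
\[
|x_{i+1} - x^*| \le \left|\frac{f''(x_i)}{2 f'(x_i)}\right| |x_i - x^*|^2,
\qquad
\lim_{i \to \infty} \frac{|x_{i+1} - x^*|}{|x_i - x^*|^2} \le C .
\]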
If the error in step i was order 10 to the minus 1, then the error in step i plus 1 will be order 10 to the minus 2, because they've got to be bound by this constant. If the error in step i was 10 to the minus 2, the error in step i plus 1 has got to be order 10 to the minus 4, or smaller, right? Because I square the quantity down here. I get to double the number of accurate digits with each iteration.

And this will hold so long as the derivative evaluated at the root is not equal to zero. If the derivative evaluated at the root is equal to zero, this analysis wasn't really valid. You can't divide by zero in various places, OK?

It turns out the same thing is true if we do the multidimensional case. I'll leave it to you to investigate that case. I think it's interesting for you to try and explore that. It follows the 1D model I showed you before. But the absolute error in iterate i plus 1, divided by the absolute error in iterate i-- there's a small typo here; cross out that plus 1, right?-- the absolute error in iterate i squared is going to be bound by a constant. And this will be true so long as the determinant of the Jacobian at the root is not equal to zero. We know the determinant of the Jacobian plays the role of the derivative in the 1D case. When the Jacobian is singular, you can show that linear convergence is going to occur instead. So it will still converge. It's not necessarily a problem that the Jacobian becomes singular at the root. But you're going to lose your rate of quadratic convergence.

And this rate of convergence is only guaranteed if we start sufficiently close to the root. So good initial guesses, that's important. We have a locally convergent method. Bad initial guesses? Well, who knows where this iterative method is going to go. There's nothing to guarantee that it's even going to converge, right? It may run away someplace.
Here are a few examples of where things can go wrong. So if I have a local minimum or maximum, I might have an iterate where I evaluate the linearization, and it tells me my next best approximation is on the other side of this minimum or maximum. And then I go up, and I get the linearization here. And it tells me, oh, my next best approximation is on the other side. And this method could bounce back and forth in here for as long as we sit and wait. It's locally convergent, not globally convergent. It can get hung up in situations like this.

Asymptotes are a problem. I have an asymptote, which presumably has an effective root somewhere out here at infinity. Well, my solution would like to follow the linearization, the successive linearizations, all the way out along this asymptote, right? So my iterates may blow up in an uncontrolled fashion.

You can also end up with funny cases where our Newton-Raphson steps continually overshoot the root. So there can be functions that have a power-law scaling right near the root, such that the derivative doesn't exist, OK? So here the derivative of this thing, if s is smaller than 1, at x equals zero, it won't exist, right? There isn't a derivative that's defined there. And in those cases, you can often wind up with overshoot. So I'll take a linearization, and I'll shoot over the root. And I'll go up and I'll take my next linearization, and I'll shoot back on the other side of the root. And depending on the power s associated with this function, it may diverge, right? I may get further and further away from the root, or it may slowly converge towards that root. But it can be problematic.
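A tiny script illustrating the oscillation failure just described; the cubic used here is a standard textbook example rather than one from the lecture.

    % f(x) = x^3 - 2x + 2: starting from x = 0, Newton-Raphson cycles forever
    f  = @(x) x.^3 - 2*x + 2;
    fp = @(x) 3*x.^2 - 2;
    x = 0;
    for i = 1:6
        x = x - f(x)/fp(x);                 % iterates alternate 1, 0, 1, 0, ...
        fprintf('iteration %d: x = %g\n', i, x);
    end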
Here's another problem that crops up. Sometimes people talk about basins of attraction. So here's a two-dimensional nonlinear equation I want to find the roots for. It's cubic in nature, so it's got three roots, which are indicated by the stars in the x1-x2 plane. And I've taken a number of different initial guesses from all over the plane, and I've asked: given that initial guess, using the Newton-Raphson method, which root do I find? So if you see a dark blue color like this, that means initial guesses there found this root. If you see a medium blue color, that means they found this root. See a light blue color, that means they found this root. And this is a relatively simple function, relatively low dimension, but the plane here is tiled by-- it's not tiled, it's filled with a fractal. These basins of attraction are fractal in nature. Which means that I could think that I'm starting with a solution right here that should converge to this green root because it's close. But it actually goes over here. And if I change that initial guess by a little bit, it actually pops up to this root over here instead. It's quite difficult to predict which solution you're going to converge to. Yes?

AUDIENCE: And in this case, you knew how many roots there are.

JAMES SWAN: Yes.

AUDIENCE: Often you wouldn't know. So you find one, and you're happy. Right? You're happy because [INAUDIBLE] physical. Might be the wrong one.

JAMES SWAN: So this is the problem. I think this is about the minimum level of complexity you need-- which is not very complex at all in a function-- to get these sorts of basins of attraction. Polynomial equations are ones that really suffer from this especially, but it's a problem in general. You often don't know. I'll show you quasi-Newton-Raphson methods that help fix some of these problems.

How about other problems? It's good to know where the weaknesses are. Newton-Raphson sounds great, but where are the weaknesses? Let's see. The Jacobian might not be easy to calculate analytically, right? So far we've written down analytical forms for the Jacobian. We've had simple functions.
But maybe it's not easy to calculate analytically. You should think about what the sources are for this function, f of x, that we're trying to find the roots for. Also, we've got to invert the Jacobian, and we know that's a matrix. And matrices which have a lot of dimensions in them are complicated to invert. There's a huge amount of computational complexity in doing those inversions. It can take a long time to do them. It may be undesirable to have to constantly be solving a system of linear equations. So you might think about some options for mitigating this. Sometimes it won't converge at all, or not to the nearest root. This is this overshoot, or basin of attraction, problem. And we'll talk about these modifications to correct these issues.

They come with a penalty, though, OK? So Newton-Raphson was based around the idea of linearization. If we modify that linearization, we're going to lose some of these great benefits of the Newton-Raphson method, namely that it's quadratically convergent, right? We're going to make some changes to the method, and it's not going to converge quadratically anymore. It's going to slow down, but maybe we'll be able to rein in the method and make it converge either to the roots we want it to converge to, or converge more reliably than it would before. Maybe we'll be able to actually do the calculation faster, even though it may require more iterations. Maybe we can make each iteration much faster using some of these methods.

OK, so here are the three things that we're going to talk about. We're going to talk about approximating the Jacobian with finite differences. We're going to talk about Broyden's method for approximating the inverse of the Jacobian. And we're going to talk about something called damped Newton-Raphson methods. Those will be the three topics of the day.

So here's what I said before.
Analytical calculation of the Jacobian requires analytical formulas for f. And for functions of a few dimensions, right? These calculations are not too tough. For functions of many dimensions, this is tedious at best, error prone at worst. Think about even something like 10 equations for 10 unknowns. If your error rate is 1%, well, you're shot. There's a pretty good chance that you missed one element of the Jacobian. You made a mistake somewhere in there. And now you're not doing Newton-Raphson. You're doing some other iterative method that isn't the one that you intended. There are a lot of times where-- maybe you have an analytical formula for some of these f's, but not all of them.

So where can these functionalities come from? We've seen some cases where you have physical models-- thermodynamic models that you can write down by hand. But where are other places that these functions come from? Ideas?

AUDIENCE: [INAUDIBLE]

JAMES SWAN: Oh, good.

AUDIENCE: [INAUDIBLE]

JAMES SWAN: Beautiful. So this is going to be the most common case, right? Maybe you want to use some sort of simulation code, right? To model something. It's somebody else's simulation code. They're an expert at doing finite element modeling. But the output is this f that you're interested in, and the input to the simulation are these x's. And you want to find the roots associated with this problem that you're solving via the simulation code, right? This is pretty important, being able to connect different pieces of software together. Well, there's no analytical formula for f there, OK? You're shot. So it may come from results of simulations. This is extremely common. It could come from interpretation of data. So you may have a bunch of data that's being generated by some physical measurement or a process, either continuously, or you just have a data set that's available to you.
But these function values are often not-- they're not things that you know analytically. It may also be the case that-- oh, man, even in Aspen, you're going to wind up solving systems of nonlinear equations. It's going to use the Newton-Raphson method. Aspen's going to have lots of these formulas in it for functions. Who's going in by hand and computing the derivatives of all these functions for Aspen? MATLAB has a nonlinear equation solver in it. You give it the function, and it'll find the root of the equation, given a guess. It's going to use the Newton-Raphson method. Who's computing the Jacobian for MATLAB? You can. You can compute it by hand and give it as an input. Sometimes that's a really good thing to do. But sometimes we don't have that available to us. So we need alternative ways of computing the Jacobian.

The simplest one is a finite difference approximation. So you recall the definition of the derivative. It's the limit of this difference, f of x plus epsilon minus f of x, divided by epsilon, as epsilon goes to zero. There's an error in this approximation for the derivative with a finite value of epsilon, which is proportional to epsilon. So choose a small value of epsilon, and you'll get a good approximation for the derivative.

It turns out the accuracy depends on epsilon, but kind of in a non-intuitive way. And here's a simple example. So let's compute the derivative of f of x equals e to the x, which is e to the x, and let's evaluate it at x equals 1. So f prime of 1 is e to the 1, which should approximately be e to the 1 plus epsilon, minus e to the 1, over epsilon. And here I've done this calculation. And I've asked, what's the absolute error in this calculation, taking the difference between this and this, for different values of epsilon. You can see initially, as epsilon gets smaller, the absolute error goes down in proportion to epsilon: epsilon of 10 to the minus 3 gives 10 to the minus 3; 10 to the minus 4 gives 10 to the minus 4; 10 to the minus 8 gives 10 to the minus 8.
But 10 to the minus 9 gives 10 to the minus 7, and 10 to the minus 10 gives 10 to the minus 6. So it went down, and it came back up. But that's not what this formula told us should happen, right? Yes?

AUDIENCE: So just to be sure. That term in that column on the right?

JAMES SWAN: Yes?

AUDIENCE: It says exponential 1, but it represents the approximation?

JAMES SWAN: Exponential 1 is exponential 1. f prime of 1 is our approximation here.

AUDIENCE: Oh, OK.

JAMES SWAN: Sorry that that's unclear. Yes, so this is the absolute error in this approximation. So it goes down, and then it goes up. Is that clear now? Good.

OK, why does it go down? It goes down because our definition of the derivative says it should go down. At some point, I've actually got to do these calculations with high enough accuracy to be able to perceive the difference between e to the 1 plus 10 to the minus 9, and e to the 1. So there is a round-off error in the calculation of this difference that reduces my accuracy at a certain level.

There's a heuristic you can use here, OK? You want to set this epsilon, when you do this finite difference approximation, to be the square root of the machine precision times the magnitude of x, the point at which you're trying to calculate this derivative. Usually we're in double precision, so this is something like 10 to the minus 8 times the magnitude of x. That's pretty good. That holds true here, OK? You can test it out on some other functions. If x is 0, or very small, we don't want a relative tolerance. We've got to choose an absolute tolerance instead, just like we talked about with the step norm criteria. So one has to be a little bit careful in how you implement this.
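A short script reproducing the experiment described above, forward-differencing the derivative of e to the x at x equals 1 over a range of epsilon values.

    x = 1;
    for eps_fd = 10.^(-(1:15))
        approx = (exp(x + eps_fd) - exp(x))/eps_fd;
        fprintf('epsilon = %8.1e   absolute error = %8.1e\n', ...
                eps_fd, abs(approx - exp(x)));
    end
    % The error falls roughly in proportion to epsilon, then grows again once
    % round-off dominates; the minimum sits near sqrt(machine precision), ~1e-8.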
615 00:26:34,570 --> 00:26:37,690 A good way to think about how the error is going to go down, 616 00:26:37,690 --> 00:26:40,180 and where it's going to start to come back up. 617 00:26:40,180 --> 00:26:41,350 Make sense? 618 00:26:41,350 --> 00:26:43,880 Good. 619 00:26:43,880 --> 00:26:46,527 OK, so how do you compute elements of the Jacobian then? 620 00:26:46,527 --> 00:26:48,360 Well, those are all just partial derivatives 621 00:26:48,360 --> 00:26:53,100 of the function with respect to one of the unknown variables. 622 00:26:53,100 --> 00:26:56,430 So partial f i with respect to x j is just 623 00:26:56,430 --> 00:27:03,310 f i at x plus some epsilon deviation of x 624 00:27:03,310 --> 00:27:04,830 in its j-th component only. 625 00:27:04,830 --> 00:27:08,710 So this is like a unit vector in the j direction, 626 00:27:08,710 --> 00:27:13,090 or associated with the j-th element of this vector. 627 00:27:13,090 --> 00:27:15,475 Minus f i of x divided by this epsilon. 628 00:27:18,200 --> 00:27:21,590 Equivalently, you'd have to do this for f i. 629 00:27:21,590 --> 00:27:23,780 You can compute all the columns of the Jacobian 630 00:27:23,780 --> 00:27:28,220 very quickly by calling f of x plus epsilon 631 00:27:28,220 --> 00:27:30,260 minus f of x over epsilon. 632 00:27:30,260 --> 00:27:33,710 Just evaluate your vector-valued function at these different 633 00:27:33,710 --> 00:27:34,349 x's. 634 00:27:34,349 --> 00:27:36,140 Take the difference, and that will give you 635 00:27:36,140 --> 00:27:40,940 column j of your Jacobian. 636 00:27:40,940 --> 00:27:43,010 So how many function evaluations does it 637 00:27:43,010 --> 00:27:47,549 take to calculate the Jacobian at a single point? 638 00:27:47,549 --> 00:27:49,590 How many times do I have to evaluate my function? 639 00:27:56,492 --> 00:27:58,960 Yeah? 640 00:27:58,960 --> 00:27:59,634 AUDIENCE: 2 n. 641 00:27:59,634 --> 00:28:00,550 JAMES SWAN: 2n, right. 642 00:28:00,550 --> 00:28:05,530 So if I have n, if I have n elements to x, 643 00:28:05,530 --> 00:28:09,740 I've got to make two function calls per column of j. 644 00:28:09,740 --> 00:28:11,986 There's going to be n columns in j. 645 00:28:11,986 --> 00:28:15,910 So 2n function evaluations to compute the Jacobian 646 00:28:15,910 --> 00:28:17,290 at a single point. 647 00:28:17,290 --> 00:28:18,520 Is that really true though? 648 00:28:18,520 --> 00:28:19,510 Not quite. 649 00:28:19,510 --> 00:28:21,207 f of x is f of x. 650 00:28:21,207 --> 00:28:22,790 I don't have to compute it every time. 651 00:28:22,790 --> 00:28:25,090 I just compute f of x once. 652 00:28:25,090 --> 00:28:28,460 So it's really like n plus 1 that I have to do, right? 653 00:28:28,460 --> 00:28:31,942 N plus a function evaluations to compute this thing. 654 00:28:31,942 --> 00:28:33,710 I actually got to compute them though. 655 00:28:33,710 --> 00:28:37,070 Function evaluations may be really expensive. 656 00:28:37,070 --> 00:28:40,136 Suppose you're doing some sort of complicated simulation, 657 00:28:40,136 --> 00:28:41,510 like a finite element simulation. 658 00:28:41,510 --> 00:28:44,966 Maybe it takes minutes to generate a function evaluation. 659 00:28:44,966 --> 00:28:46,340 So it can be expensive to compute 660 00:28:46,340 --> 00:28:48,110 the Jacobian in this way. 661 00:28:48,110 --> 00:28:51,500 Just be expensive to compute the Jacobian. 662 00:28:51,500 --> 00:28:53,780 How is approximation of Jacobian going 663 00:28:53,780 --> 00:28:55,200 to affect the convergence? 
Yeah?

AUDIENCE: 2n.

JAMES SWAN: 2n, right. So if I have n elements to x, I've got to make two function calls per column of J. There are going to be n columns in J. So 2n function evaluations to compute the Jacobian at a single point. Is that really true, though? Not quite. f of x is f of x. I don't have to compute it every time. I just compute f of x once. So it's really like n plus 1 that I have to do, right? n plus 1 function evaluations to compute this thing.

I've actually got to compute them, though. Function evaluations may be really expensive. Suppose you're doing some sort of complicated simulation, like a finite element simulation. Maybe it takes minutes to generate a function evaluation. So it can be expensive to compute the Jacobian in this way. It can just be expensive to compute the Jacobian.

How is this approximation of the Jacobian going to affect the convergence? What's going to happen to the rate of convergence of our method? It's going to go down, right? It's probably not going to be linear. It's not going to be quadratic. It's going to be some superlinear factor. It's going to depend on how accurate the Jacobian is, how sensitive the function is near the root. But it's going to reduce the accuracy of the method, or the convergence rate of the method, by a little bit. That's OK.

So this is what MATLAB does. It uses a finite difference approximation for your Jacobian when you give it a function and you don't tell it the Jacobian explicitly.

Here's an example of how to implement this yourself. So I've got to have some function that does whatever this function is supposed to do. It takes as input x, and it gives an output f. And then the Jacobian, right? It's a matrix. So we initialize this matrix. We loop over each of the columns. We compute the displacement, right? The deviation from x for each of these. And then we compute this difference and divide it by epsilon. I haven't done everything perfectly here, right? Here's an extra function evaluation. I could just calculate the value of the function at x before doing the loop. I've also only used a relative tolerance here. I'm going to be in trouble if xi is 0. It's going to be a problem with this algorithm. These are the little details one has to pay attention to. But it's a simple enough calculation to do. Loop over the columns, right? Compute these differences. Divide by epsilon. You have your approximation for the Jacobian. I've got to do that at every iteration, right? Every time x is updated, I've got to recompute my Jacobian. That's it, though.

All right, that's one way of approximating a Jacobian.
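A sketch of the finite difference Jacobian routine just described, with the two caveats from the lecture addressed: the function value at x is computed once and reused, and an absolute floor on the perturbation guards against components of x that are zero. The name fd_jacobian and the handle fun are placeholders.

    function J = fd_jacobian(fun, x)
        n  = length(x);
        f0 = fun(x);                            % evaluate f(x) once, outside the loop
        J  = zeros(n, n);
        for j = 1:n
            h = sqrt(eps)*max(abs(x(j)), 1);    % sqrt(machine precision) scaling, with a floor
            xp = x;
            xp(j) = xp(j) + h;                  % perturb only the j-th component
            J(:, j) = (fun(xp) - f0)/h;         % column j of the Jacobian
        end
    end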
There's a method that's used in one dimension called the secant method. It's a special case of the Newton-Raphson method and uses a coarser approximation for the derivative. It says: I was taking these steps from xi minus 1 to xi, and I knew the function values there. Maybe I should just compute the slope of the line that goes through those points, and say that's my approximation for the derivative. Why not? I have the data available to me. It seems like a sensible thing to do. So we replace f prime at xi: up here we put f of xi minus f of xi minus 1, and down here we put xi minus xi minus 1. That's our approximation for the derivative, or the inverse of the derivative. This can work; it can work just fine.

Can it be extended to many dimensions? That's an interesting question, though. This is simple. In many dimensions, not so obvious, right? If I know xi, xi minus 1, f of xi, f of xi minus 1, can I approximate the Jacobian? What do you think? Does it strike you as though there might be some fundamental difficulty to doing that? Yeah?

AUDIENCE: Could you approximate the gradient? [INAUDIBLE] gradient of f at x.

JAMES SWAN: OK.

AUDIENCE: But I'm not sure whether you can go backwards from the gradient to the Jacobian.

JAMES SWAN: OK. So, let's-- go ahead.

AUDIENCE: Perhaps the difficulty is, I mean, when they're just single values--

JAMES SWAN: Yeah.

AUDIENCE: You can think of [INAUDIBLE] derivative, right?

JAMES SWAN: Yeah.

AUDIENCE: [INAUDIBLE] get really big, you get a vector of the function at xi, a vector of the function at xi minus 1 or whatever. Vectors of these x's. And so if you're [INAUDIBLE]

JAMES SWAN: Yeah, so how do I divide these things? That's a good question. The Jacobian-- how much information content is in the Jacobian? Or how many independent quantities are built into the Jacobian?

AUDIENCE: [INAUDIBLE]

JAMES SWAN: n squared.
And how much data do I have to work with here? You know, order n data, to figure out order n squared quantities. This is the division problem you're describing, right? So it seems like this is an underdetermined sort of problem. And it is, OK? So there isn't a direct analog to the secant method in many dimensions. We can write down something that makes sense.

So this is the 1D secant approximation: the value of the derivative, multiplied by the step between i minus 1 and i, is approximated by the difference in the values of the function. The equivalent is: the value of the Jacobian, multiplied by the step between i minus 1 and i, is equal to the difference between the values of the functions. But now this is an equation for n squared elements of the Jacobian in terms of n elements of the function, right? So it's massively, massively underdetermined. OK?

Here we have one equation for one unknown, the derivative, right? Think about how it was moving through space before, right? The difference here, xi minus xi minus 1, that's some sort of linear path that I'm moving along through space. How am I supposed to figure out what the tangent curves to all these functions are from this one linear path through multidimensional space, right? That's not going to work.

So these are underdetermined problems. That's not so bad, actually, right? It doesn't mean there's no solution. In fact, it means there are a lot of solutions. So we can pick whichever one we think is suitable. And Broyden's method is a method for picking one of these potential solutions to this underdetermined problem. We don't have enough information to calculate the Jacobian exactly. But maybe we can construct a suitable approximation for it. And here's what's done. So here's the secant approximation.
810 00:35:06,890 --> 00:35:08,750 It says the Jacobian times the step 811 00:35:08,750 --> 00:35:10,480 size, or the Newton-Raphson step, 812 00:35:10,480 --> 00:35:12,980 should be the difference in the function values. 813 00:35:12,980 --> 00:35:20,660 And Newton's method for xi said the Jacobian times xi 814 00:35:20,660 --> 00:35:22,550 minus xi minus 1 was 815 00:35:22,550 --> 00:35:23,872 equal to minus f of xi minus 1. 816 00:35:23,872 --> 00:35:25,080 This is just Newton's method. 817 00:35:25,080 --> 00:35:27,710 Invert the Jacobian, and put it on the other side 818 00:35:27,710 --> 00:35:28,910 of the equation. 819 00:35:28,910 --> 00:35:31,300 Broyden's method says, there's a trick here. 820 00:35:31,300 --> 00:35:33,970 Take the difference between these things. 821 00:35:33,970 --> 00:35:36,660 I get the same left-hand side on both of these equations. 822 00:35:36,660 --> 00:35:38,720 So take the difference, and I can figure out 823 00:35:38,720 --> 00:35:41,894 how the Jacobian should change from one step to the next. 824 00:35:41,894 --> 00:35:44,060 So maybe I have a good approximation to the Jacobian 825 00:35:44,060 --> 00:35:48,380 at xi minus 1, I might be able to use this still 826 00:35:48,380 --> 00:35:51,230 underdetermined problem to figure out how 827 00:35:51,230 --> 00:35:53,250 to update that Jacobian, right? 828 00:35:53,250 --> 00:35:56,270 So Broyden's method is what's referred to as the rank one 829 00:35:56,270 --> 00:35:56,780 update. 830 00:35:56,780 --> 00:35:59,750 You should convince yourself that letting 831 00:35:59,750 --> 00:36:04,500 the Jacobian at xi minus the Jacobian at xi minus 1 832 00:36:04,500 --> 00:36:07,370 be equal to this is one possible solution 833 00:36:07,370 --> 00:36:10,500 of this underdetermined equation. 834 00:36:10,500 --> 00:36:12,150 There are others. 835 00:36:12,150 --> 00:36:13,860 This is one possible solution. 836 00:36:13,860 --> 00:36:17,820 It turns out to be a good one to choose. 837 00:36:17,820 --> 00:36:19,500 So there's an iterative approximation 838 00:36:19,500 --> 00:36:21,150 now for the Jacobian. 839 00:36:24,615 --> 00:36:26,745 Does this strategy make sense? 840 00:36:26,745 --> 00:36:27,870 It's a little weird, right? 841 00:36:27,870 --> 00:36:29,120 There's something tricky here. 842 00:36:29,120 --> 00:36:30,600 You've got to know to do this. 843 00:36:30,600 --> 00:36:32,310 Right, so somebody has to have in mind 844 00:36:32,310 --> 00:36:34,860 already that they're looking for differences in the Jacobian 845 00:36:34,860 --> 00:36:36,510 that they're going to update over time. 846 00:36:40,322 --> 00:36:42,780 So this tells me how the Jacobian is updated. 847 00:36:46,480 --> 00:36:49,480 Really we need the Jacobian inverse, 848 00:36:49,480 --> 00:36:52,240 and the reason for choosing this rank one update 849 00:36:52,240 --> 00:36:57,100 approximation is it's possible to write 850 00:36:57,100 --> 00:37:01,330 the inverse of J at xi in terms of the inverse of J 851 00:37:01,330 --> 00:37:05,100 at xi minus 1 when this update formula is true. 852 00:37:05,100 --> 00:37:07,600 So it's something called the Sherman-Morrison formula, which 853 00:37:07,600 --> 00:37:11,440 says the inverse of a matrix plus the dyadic product of two 854 00:37:11,440 --> 00:37:14,440 vectors can be written in this form. 855 00:37:14,440 --> 00:37:18,120 We don't need to derive this, but this is true. 856 00:37:18,120 --> 00:37:22,260 This matrix plus dyadic product is exactly this.
857 00:37:22,260 --> 00:37:24,660 We have a dyadic product between f and the step 858 00:37:24,660 --> 00:37:27,869 from xi minus 1 to xi. 859 00:37:27,869 --> 00:37:29,910 And so we can apply that Sherman-Morrison formula 860 00:37:29,910 --> 00:37:31,350 to the rank one update. 861 00:37:31,350 --> 00:37:34,656 And not only can we update the Jacobian iteratively, 862 00:37:34,656 --> 00:37:36,280 but we can update the Jacobian inverse. 863 00:37:36,280 --> 00:37:39,900 So if I know J inverse at some previous time, 864 00:37:39,900 --> 00:37:41,899 I know J inverse at some later time too. 865 00:37:41,899 --> 00:37:43,440 I don't have to compute these things. 866 00:37:43,440 --> 00:37:45,990 I don't have to solve these systems of equations, right? 867 00:37:45,990 --> 00:37:47,220 I just update this matrix. 868 00:37:50,020 --> 00:37:51,880 Update this matrix, and I can very rapidly 869 00:37:51,880 --> 00:37:54,639 do these computations. 870 00:37:54,639 --> 00:37:56,430 So not only do we have an iterative formula 871 00:37:56,430 --> 00:37:57,900 for the steps, right? 872 00:37:57,900 --> 00:38:01,382 From x0 to x1 to x2, all the way up to our 873 00:38:01,382 --> 00:38:02,840 converged solution, but we can have 874 00:38:02,840 --> 00:38:05,460 a formula for the inverse of the Jacobian. 875 00:38:05,460 --> 00:38:07,170 We give up accuracy. 876 00:38:07,170 --> 00:38:10,260 But that's paid for in the amount of time 877 00:38:10,260 --> 00:38:12,930 we save doing these calculations. 878 00:38:12,930 --> 00:38:13,710 Does it pay off? 879 00:38:13,710 --> 00:38:15,570 It depends on the problem, right? 880 00:38:15,570 --> 00:38:17,890 We try to solve problems in different ways. 881 00:38:17,890 --> 00:38:22,631 This is a pretty common way to approximate the Jacobian. 882 00:38:22,631 --> 00:38:24,062 Questions about this? 883 00:38:27,410 --> 00:38:28,130 No. 884 00:38:28,130 --> 00:38:28,952 OK. 885 00:38:28,952 --> 00:38:29,660 Broyden's method. 886 00:38:32,619 --> 00:38:33,910 All right, here's the last one. 887 00:38:36,460 --> 00:38:38,190 The Damped Newton-Raphson method. 888 00:38:38,190 --> 00:38:40,330 We'll do this in one dimension. 889 00:38:40,330 --> 00:38:42,740 So the Newton-Raphson method, Newton and Raphson told us, 890 00:38:42,740 --> 00:38:46,780 take a step from xi to xi plus 1 that is this big. 891 00:38:49,880 --> 00:38:53,570 xi to xi plus 1, it's this big. 892 00:38:53,570 --> 00:38:56,270 Sometimes you'll take that step, and you'll 893 00:38:56,270 --> 00:38:59,360 find that the value of the function at xi plus 1 894 00:38:59,360 --> 00:39:02,180 is even bigger than the value of the function at xi. 895 00:39:02,180 --> 00:39:04,429 There was nothing about the Newton-Raphson method that 896 00:39:04,429 --> 00:39:07,149 told us the function value was always going to be decreasing. 897 00:39:07,149 --> 00:39:09,440 But actually, our goal is to make the function value go 898 00:39:09,440 --> 00:39:11,070 to 0 in absolute value. 899 00:39:11,070 --> 00:39:16,370 So it seems like this step, not a very good one, right? 900 00:39:16,370 --> 00:39:18,120 What are Newton and Raphson thinking here? 901 00:39:18,120 --> 00:39:19,440 This is not a good idea. 902 00:39:19,440 --> 00:39:20,630 The function value went up. 903 00:39:24,830 --> 00:39:25,994 Far from a root, OK? 904 00:39:25,994 --> 00:39:27,410 The Newton-Raphson method is going 905 00:39:27,410 --> 00:39:29,621 to give these sorts of erratic responses.
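Before turning to how damping handles those erratic steps, here is a minimal Python sketch of the Broyden iteration described above, with the rank-one update applied directly to the inverse Jacobian through the Sherman-Morrison formula. The names and structure are hypothetical illustration, not code from the course or from MATLAB's fsolve.

import numpy as np

def broyden(f, x0, J0, tol=1e-10, max_iter=100):
    # B holds the current approximation of the inverse Jacobian.
    x = np.asarray(x0, dtype=float)
    B = np.linalg.inv(np.asarray(J0, dtype=float))   # invert the initial Jacobian once
    fx = f(x)
    for _ in range(max_iter):
        dx = -B @ fx                    # quasi-Newton step, dx = -J^{-1} f
        x_new = x + dx
        f_new = f(x_new)
        if np.linalg.norm(f_new) < tol:
            return x_new
        df = f_new - fx
        # Sherman-Morrison update of B so the secant condition B @ df = dx holds;
        # this is the rank-one choice among the many solutions of the
        # underdetermined secant equation.
        Bdf = B @ df
        denom = dx @ Bdf
        if abs(denom) > 1e-14:          # guard against breakdown of the update
            B = B + np.outer(dx - Bdf, dx @ B) / denom
        x, fx = x_new, f_new
    return x

# Usage sketch: intersect the unit circle with the line x0 = x1, starting
# from the true Jacobian at the initial guess.
# broyden(lambda x: np.array([x[0]**2 + x[1]**2 - 1, x[0] - x[1]]),
#         [1.0, 0.5], np.array([[2.0, 1.0], [1.0, -1.0]]))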
906 00:39:29,621 --> 00:39:31,370 Who knows what direction it's going to go? 907 00:39:31,370 --> 00:39:35,480 And it's only locally convergent. 908 00:39:35,480 --> 00:39:37,191 It tells us a direction to move in, 909 00:39:37,191 --> 00:39:39,440 but it doesn't always give the right sort of magnitude 910 00:39:39,440 --> 00:39:41,310 associated with that step. 911 00:39:41,310 --> 00:39:43,310 And so you take these steps and you can find out 912 00:39:43,310 --> 00:39:45,650 the value of your function, the norm of your function. 913 00:39:45,650 --> 00:39:47,108 It's bigger than where you started. 914 00:39:47,108 --> 00:39:49,670 It seems like you're getting further away from the root. 915 00:39:49,670 --> 00:39:52,880 Our ultimate goal is to drive this norm to 0. 916 00:39:52,880 --> 00:39:55,310 So steps like that you might even call unacceptable. 917 00:39:55,310 --> 00:39:55,520 Right? 918 00:39:55,520 --> 00:39:57,270 Why would I ever take a step in that direction? 919 00:39:57,270 --> 00:39:58,940 Maybe I should use a different method. 920 00:39:58,940 --> 00:40:01,550 When I take a step that's so big my function value 921 00:40:01,550 --> 00:40:03,020 grows in norm. 922 00:40:05,690 --> 00:40:08,920 So what one does, oftentimes, is introduce a damping factor, 923 00:40:08,920 --> 00:40:09,970 right? 924 00:40:09,970 --> 00:40:13,450 We said that this ratio, or equivalently, 925 00:40:13,450 --> 00:40:15,760 the Jacobian inverse times the value of the function, 926 00:40:15,760 --> 00:40:18,320 gives us the right direction to step in. 927 00:40:18,320 --> 00:40:21,760 But how big a step should we take? 928 00:40:21,760 --> 00:40:25,750 It's clear a step like this is a good one. 929 00:40:25,750 --> 00:40:27,320 It reduced the value of the function. 930 00:40:30,647 --> 00:40:32,480 And it's better than the one we took before, 931 00:40:32,480 --> 00:40:35,240 which was given by the linear approximation. 932 00:40:35,240 --> 00:40:38,150 So if I draw the tangent line, it intercepts here. 933 00:40:38,150 --> 00:40:40,290 If I take a step in this direction, 934 00:40:40,290 --> 00:40:42,440 but I reduce the slope by having some 935 00:40:42,440 --> 00:40:43,990 damping factor that's smaller than 1, 936 00:40:43,990 --> 00:40:46,570 I get closer to the root. 937 00:40:46,570 --> 00:40:50,710 Ideally we'd like to choose that damping factor 938 00:40:50,710 --> 00:40:54,760 to be the one that minimizes the value of the function at xi 939 00:40:54,760 --> 00:40:57,280 plus 1. 940 00:40:57,280 --> 00:41:01,510 So it's the argument that minimizes 941 00:41:01,510 --> 00:41:05,530 the value of the function at xi plus 1, or at xi minus alpha f 942 00:41:05,530 --> 00:41:08,060 over f prime. 943 00:41:08,060 --> 00:41:09,500 Solving that optimization problem 944 00:41:09,500 --> 00:41:12,150 is as hard as finding the root itself. 945 00:41:12,150 --> 00:41:13,760 So ideally this is true. 946 00:41:13,760 --> 00:41:16,730 But practically you're not going to be able to do it. 947 00:41:16,730 --> 00:41:20,540 So we have to come up with some approximate methods of solving 948 00:41:20,540 --> 00:41:21,624 this optimization problem. 949 00:41:21,624 --> 00:41:23,748 Actually we don't even care about getting it exact. 950 00:41:23,748 --> 00:41:25,670 We know Newton-Raphson does a pretty good job. 951 00:41:25,670 --> 00:41:28,400 We want some sort of guess that's 952 00:41:28,400 --> 00:41:32,216 respectable for this alpha so that we get close to this root.
953 00:41:32,216 --> 00:41:33,590 Once we get close, we'll probably 954 00:41:33,590 --> 00:41:34,774 choose alpha equal to 1. 955 00:41:34,774 --> 00:41:36,440 We'll just take the Newton-Raphson steps 956 00:41:36,440 --> 00:41:40,100 all the way down to the root. 957 00:41:40,100 --> 00:41:41,800 So here it is in many dimensions. 958 00:41:41,800 --> 00:41:45,960 Modify the Newton-Raphson step by some value alpha, 959 00:41:45,960 --> 00:41:47,890 choose alpha to be the argument that 960 00:41:47,890 --> 00:41:52,270 minimizes the norm of the function at xi plus 1. 961 00:41:55,510 --> 00:41:57,340 Here's one way of doing this. 962 00:41:57,340 --> 00:41:59,330 So this is called the Armijo line search. 963 00:41:59,330 --> 00:41:59,830 See? 964 00:41:59,830 --> 00:42:01,180 Line search. 965 00:42:01,180 --> 00:42:03,400 Start by letting alpha equal 1. 966 00:42:03,400 --> 00:42:05,960 Take the full Newton-Raphson step, and check. 967 00:42:05,960 --> 00:42:09,000 Is the value of my function smaller than where I started? 968 00:42:09,000 --> 00:42:11,470 If it is, let's take the step. 969 00:42:11,470 --> 00:42:13,630 It's getting us-- we're accomplishing our goal. 970 00:42:13,630 --> 00:42:16,180 We're reducing the value of the function in norm. 971 00:42:16,180 --> 00:42:17,410 Maybe we're headed towards zero. 972 00:42:17,410 --> 00:42:18,370 That's good. 973 00:42:18,370 --> 00:42:20,020 Accept it. 974 00:42:20,020 --> 00:42:24,440 If no, let's replace alpha with alpha over 2. 975 00:42:24,440 --> 00:42:25,630 Let's take a shorter step. 976 00:42:28,330 --> 00:42:30,190 We take a shorter step, and we repeat. 977 00:42:30,190 --> 00:42:30,690 Right? 978 00:42:30,690 --> 00:42:32,029 Take the shorter step. 979 00:42:32,029 --> 00:42:34,570 Check whether the value of the function with the shorter step 980 00:42:34,570 --> 00:42:35,740 is acceptable. 981 00:42:35,740 --> 00:42:38,540 If yes, let's take it, and let's move on. 982 00:42:38,540 --> 00:42:42,590 And if no, replace alpha with alpha over 2, and continue. 983 00:42:42,590 --> 00:42:45,580 So we halve our step size every time. 984 00:42:45,580 --> 00:42:47,080 We don't have to halve it. 985 00:42:47,080 --> 00:42:49,930 We could choose different factors to reduce it by. 986 00:42:49,930 --> 00:42:52,300 But we try to take shorter and shorter steps 987 00:42:52,300 --> 00:42:56,710 until we accomplish our goal of having a function which 988 00:42:56,710 --> 00:42:58,900 is smaller in norm at our next iterate 989 00:42:58,900 --> 00:43:01,580 than where we were before. 990 00:43:01,580 --> 00:43:03,910 It's got to work-- the function value will be reduced. 991 00:43:03,910 --> 00:43:06,430 The Newton-Raphson method picks a direction 992 00:43:06,430 --> 00:43:10,170 that wants to bring the function value closer to 0. 993 00:43:10,170 --> 00:43:11,797 We linearize the function, and we 994 00:43:11,797 --> 00:43:14,380 found the direction we needed to go to make that linearization 995 00:43:14,380 --> 00:43:14,880 go to 0. 996 00:43:14,880 --> 00:43:18,310 So there is a step size for which the function 997 00:43:18,310 --> 00:43:21,450 value will be reduced. 998 00:43:21,450 --> 00:43:23,632 And because of that, this Armijo line search 999 00:43:23,632 --> 00:43:25,090 version of the Damped Newton-Raphson method 1000 00:43:25,090 --> 00:43:27,690 is actually globally convergent, right? 1001 00:43:27,690 --> 00:43:30,370 The iterative method will terminate. 1002 00:43:30,370 --> 00:43:31,347 You can guarantee it.
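Here is a minimal Python sketch of that damped Newton-Raphson step with the halving line search just described: accept the step only if the norm of f decreases, otherwise cut alpha in half and try again. The names are hypothetical; a full Armijo search also imposes a sufficient-decrease condition, which is omitted here to stay close to the procedure described above.

import numpy as np

def damped_newton(f, jac, x0, tol=1e-10, max_iter=50, alpha_min=1e-8):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            return x
        # The expensive part: solve J(x) dx = -f(x) once per iteration.
        dx = np.linalg.solve(jac(x), -fx)
        alpha = 1.0                        # start with the full Newton-Raphson step
        while alpha > alpha_min:
            if np.linalg.norm(f(x + alpha * dx)) < np.linalg.norm(fx):
                break                      # accept: the norm of f went down
            alpha *= 0.5                   # reject: halve the step and try again
        x = x + alpha * dx                 # only cheap function evaluations in the loop
    return x

# Usage sketch on a small two-equation system with a hand-coded Jacobian:
# damped_newton(lambda x: np.array([np.exp(x[0]) - 1, x[0] + x[1]]),
#               lambda x: np.array([[np.exp(x[0]), 0.0], [1.0, 1.0]]), [2.0, -1.0])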
1003 00:43:31,347 --> 00:43:32,930 Here's what it looks like graphically. 1004 00:43:32,930 --> 00:43:35,790 I take my big step, my alpha equals 1 step. 1005 00:43:35,790 --> 00:43:37,375 I check the value of the function. 1006 00:43:37,375 --> 00:43:39,500 It's bigger in absolute value than where I started. 1007 00:43:39,500 --> 00:43:40,570 So I go back. 1008 00:43:40,570 --> 00:43:42,830 I take half that step size. 1009 00:43:42,830 --> 00:43:43,570 OK? 1010 00:43:43,570 --> 00:43:44,480 I look at the value of the function. 1011 00:43:44,480 --> 00:43:45,280 It's still bigger. 1012 00:43:45,280 --> 00:43:46,890 Let's reject it, and go back. 1013 00:43:46,890 --> 00:43:48,719 I take half that step size again. 1014 00:43:48,719 --> 00:43:50,260 The value of the function here is now 1015 00:43:50,260 --> 00:43:51,950 smaller in absolute value. 1016 00:43:51,950 --> 00:43:53,370 So I accept it. 1017 00:43:53,370 --> 00:43:57,200 And I put myself pretty close to the root. 1018 00:43:57,200 --> 00:44:00,050 So it's convergent, globally convergent. 1019 00:44:00,050 --> 00:44:01,850 That's nice. 1020 00:44:01,850 --> 00:44:05,712 It's not globally convergent to roots, which is a pain. 1021 00:44:05,712 --> 00:44:06,920 But it's globally convergent. 1022 00:44:06,920 --> 00:44:08,220 It will terminate eventually. 1023 00:44:08,220 --> 00:44:11,780 You'll get to a point where you won't 1024 00:44:11,780 --> 00:44:14,500 be able to advance your steps any further. 1025 00:44:14,500 --> 00:44:18,050 It may converge to minima or maxima of a function. 1026 00:44:18,050 --> 00:44:19,486 Or it may converge to roots. 1027 00:44:19,486 --> 00:44:20,360 But it will converge. 1028 00:44:23,290 --> 00:44:26,732 I showed you this example before with basins of attraction. 1029 00:44:26,732 --> 00:44:28,690 So here we have different basins of attraction. 1030 00:44:28,690 --> 00:44:29,648 They're all colored in. 1031 00:44:29,648 --> 00:44:31,460 They show you which roots you approach. 1032 00:44:31,460 --> 00:44:34,680 Here I've applied the Damped Newton-Raphson method 1033 00:44:34,680 --> 00:44:36,714 to the same system of equations. 1034 00:44:36,714 --> 00:44:38,380 And you can see the basins of attraction 1035 00:44:38,380 --> 00:44:41,380 are shrunk because of the damping. 1036 00:44:41,380 --> 00:44:44,380 What happens when you're very close to places where 1037 00:44:44,380 --> 00:44:46,640 the Jacobian is singular is that 1038 00:44:46,640 --> 00:44:49,690 you take all sorts of wild steps. 1039 00:44:49,690 --> 00:44:51,700 You go to places where the value of the function 1040 00:44:51,700 --> 00:44:54,019 is bigger than where you started. 1041 00:44:54,019 --> 00:44:55,810 And then you've got to step down from there 1042 00:44:55,810 --> 00:44:57,430 to try to find the root. 1043 00:44:57,430 --> 00:44:59,320 Who knows where those locations are? 1044 00:44:59,320 --> 00:45:02,650 It's a very complicated, geometrically complicated space 1045 00:45:02,650 --> 00:45:04,075 that you're moving through. 1046 00:45:04,075 --> 00:45:05,575 And the Damped Newton-Raphson method 1047 00:45:05,575 --> 00:45:09,784 is forcing the steps to always reduce 1048 00:45:09,784 --> 00:45:11,200 the value of the function, so they 1049 00:45:11,200 --> 00:45:15,030 reduce the size of these basins of attraction.
1050 00:45:15,030 --> 00:45:18,437 So this is often a nice way to supplement 1051 00:45:18,437 --> 00:45:20,520 the Newton-Raphson method when your guesses aren't 1052 00:45:20,520 --> 00:45:21,589 very good to begin with. 1053 00:45:21,589 --> 00:45:23,880 When you start to get close to a root, you're always just 1054 00:45:23,880 --> 00:45:25,650 going to accept alpha equals 1. 1055 00:45:25,650 --> 00:45:27,480 The first step will be the best step, 1056 00:45:27,480 --> 00:45:30,825 and then you'll converge very rapidly to the solution. 1057 00:45:30,825 --> 00:45:33,150 Do we have to do any extra work, actually, 1058 00:45:33,150 --> 00:45:35,460 to do this Damped Newton-Raphson method? 1059 00:45:35,460 --> 00:45:37,310 Does it require extra calculations? 1060 00:45:43,010 --> 00:45:44,057 What do you think? 1061 00:45:44,057 --> 00:45:45,890 A lot of extra-- a lot of extra calculation? 1062 00:45:45,890 --> 00:45:47,780 How many extra calculations does it require? 1063 00:45:47,780 --> 00:45:48,990 Of course it requires extra. 1064 00:45:48,990 --> 00:45:50,246 How many? 1065 00:45:50,246 --> 00:45:55,076 AUDIENCE: [INAUDIBLE] 1066 00:45:55,076 --> 00:45:56,712 JAMES SWAN: What do you think? 1067 00:45:56,712 --> 00:46:07,180 AUDIENCE: [INAUDIBLE] 1068 00:46:07,180 --> 00:46:09,060 JAMES SWAN: It's-- that much is true. 1069 00:46:09,060 --> 00:46:13,160 So let's talk about taking one step. 1070 00:46:13,160 --> 00:46:15,860 How many more-- how many more calculations do 1071 00:46:15,860 --> 00:46:18,110 I have to pay to do this sort of a step? 1072 00:46:18,110 --> 00:46:21,490 Or even the multidimensional step? 1073 00:46:24,160 --> 00:46:27,185 For each of these times around this loop, 1074 00:46:27,185 --> 00:46:28,810 do I have to recompute? 1075 00:46:28,810 --> 00:46:32,420 Do I have to solve the system of equations? 1076 00:46:32,420 --> 00:46:32,930 No. 1077 00:46:32,930 --> 00:46:33,430 Right? 1078 00:46:33,430 --> 00:46:34,970 You precompute this, right? 1079 00:46:34,970 --> 00:46:36,560 This is the basic Newton-Raphson step. 1080 00:46:36,560 --> 00:46:37,518 You compute that first. 1081 00:46:37,518 --> 00:46:39,470 You've got to do it once. 1082 00:46:39,470 --> 00:46:41,472 And then it's pretty cheap after that. 1083 00:46:41,472 --> 00:46:43,430 I've got to do some extra function evaluations, 1084 00:46:43,430 --> 00:46:45,888 but I don't actually have to solve the system of equations. 1085 00:46:45,888 --> 00:46:49,040 Remember this is order n cubed 1086 00:46:49,040 --> 00:46:53,690 if we solve it exactly, maybe order n squared or order 1087 00:46:53,690 --> 00:46:55,160 n if we do it iteratively 1088 00:46:55,160 --> 00:46:57,620 and the Jacobian is sparse somehow, 1089 00:46:57,620 --> 00:46:59,510 and we know about its sparsity pattern. 1090 00:46:59,510 --> 00:47:00,320 This is expensive. 1091 00:47:00,320 --> 00:47:04,040 Function evaluations, those are order n to compute. 1092 00:47:04,040 --> 00:47:05,490 Relatively cheap by comparison. 1093 00:47:05,490 --> 00:47:08,760 So you compute your initial step. 1094 00:47:08,760 --> 00:47:09,690 That's expensive. 1095 00:47:09,690 --> 00:47:13,500 But all of this down here is pretty cheap. 1096 00:47:13,500 --> 00:47:14,211 Yeah? 1097 00:47:14,211 --> 00:47:15,894 AUDIENCE: You're also assuming that your function evaluations 1098 00:47:15,894 --> 00:47:16,620 are reasonably cheap. 1099 00:47:16,620 --> 00:47:17,661 JAMES SWAN: This is true.
1100 00:47:17,661 --> 00:47:19,474 AUDIENCE: [INAUDIBLE] 1101 00:47:19,474 --> 00:47:20,390 JAMES SWAN: It's true. 1102 00:47:20,390 --> 00:47:23,250 Well, the Jacobian is also very expensive to compute then too. 1103 00:47:23,250 --> 00:47:24,830 So, if-- 1104 00:47:24,830 --> 00:47:28,352 AUDIENCE: [INAUDIBLE] 1105 00:47:28,352 --> 00:47:29,560 JAMES SWAN: Sure, sure, sure. 1106 00:47:29,560 --> 00:47:31,210 No, I don't disagree. 1107 00:47:31,210 --> 00:47:33,550 I think one has to pick the method you're going 1108 00:47:33,550 --> 00:47:35,380 to use to suit the problem. 1109 00:47:35,380 --> 00:47:37,900 But it turns out this doesn't involve much extra calculation. 1110 00:47:37,900 --> 00:47:40,060 So by default, for example, fsolve in MATLAB 1111 00:47:40,060 --> 00:47:41,560 is going to do this for you. 1112 00:47:41,560 --> 00:47:43,450 Or some version of this. 1113 00:47:43,450 --> 00:47:46,150 It's going to try to take steps that aren't too big. 1114 00:47:46,150 --> 00:47:47,740 It will limit the step size for you, 1115 00:47:47,740 --> 00:47:50,830 so that it keeps the value of the function reducing 1116 00:47:50,830 --> 00:47:52,320 in magnitude. 1117 00:47:52,320 --> 00:47:55,160 It's a pretty good general strategy. 1118 00:47:55,160 --> 00:47:55,998 Yes? 1119 00:47:55,998 --> 00:48:03,468 AUDIENCE: [INAUDIBLE] so why do we just pick one value for 1120 00:48:03,468 --> 00:48:08,500 [INAUDIBLE] 1121 00:48:08,500 --> 00:48:09,250 JAMES SWAN: I see. 1122 00:48:09,250 --> 00:48:11,160 So why-- ask that one more time. 1123 00:48:11,160 --> 00:48:12,244 This is a good question. 1124 00:48:12,244 --> 00:48:14,410 Can you say it a little louder so everyone can hear? 1125 00:48:14,410 --> 00:48:17,791 AUDIENCE: So why do we have just one value of alpha 1126 00:48:17,791 --> 00:48:21,700 instead of having several values of alpha [INAUDIBLE] 1127 00:48:21,700 --> 00:48:22,450 JAMES SWAN: I see. 1128 00:48:22,450 --> 00:48:26,740 So the question is, yeah, we used a scalar alpha here, 1129 00:48:26,740 --> 00:48:27,550 right? 1130 00:48:27,550 --> 00:48:30,130 If we wanted to, we could reduce the step size 1131 00:48:30,130 --> 00:48:32,260 and also change direction. 1132 00:48:32,260 --> 00:48:34,960 We would use a matrix to do that, instead, right? 1133 00:48:34,960 --> 00:48:37,180 It would transform the step and change its direction. 1134 00:48:37,180 --> 00:48:39,184 And maybe we would choose different alphas 1135 00:48:39,184 --> 00:48:40,850 along different directions, for example. 1136 00:48:40,850 --> 00:48:43,300 So a diagonal matrix with different alphas. 1137 00:48:43,300 --> 00:48:46,150 We could potentially do that. 1138 00:48:46,150 --> 00:48:50,650 We're probably going to need some extra information 1139 00:48:50,650 --> 00:48:55,400 to decide how to set the scaling in different directions. 1140 00:48:55,400 --> 00:48:59,770 One thing we know for sure is that the Newton-Raphson step 1141 00:48:59,770 --> 00:49:01,550 will reduce the value of the function. 1142 00:49:01,550 --> 00:49:03,490 If we take a small enough step size, 1143 00:49:03,490 --> 00:49:05,890 it will bring the value of the function down. 1144 00:49:05,890 --> 00:49:07,870 We know that because we did the Taylor 1145 00:49:07,870 --> 00:49:11,710 expansion of the function to determine that step size.
1146 00:49:11,710 --> 00:49:13,950 And that Taylor expansion was going to be-- 1147 00:49:13,950 --> 00:49:17,170 that Taylor expansion is nearly exact in the limit 1148 00:49:17,170 --> 00:49:18,820 of very, very small step sizes. 1149 00:49:18,820 --> 00:49:21,220 So there will always be some small step 1150 00:49:21,220 --> 00:49:24,570 in this direction, which will reduce 1151 00:49:24,570 --> 00:49:25,720 the value of the function. 1152 00:49:25,720 --> 00:49:27,840 In other directions, we may reduce the value 1153 00:49:27,840 --> 00:49:30,090 of the function faster. 1154 00:49:30,090 --> 00:49:32,560 We don't know which directions to choose, OK? 1155 00:49:32,560 --> 00:49:33,810 Actually I shouldn't say that. 1156 00:49:33,810 --> 00:49:35,610 When we take very small step sizes in this direction, 1157 00:49:35,610 --> 00:49:37,609 it's reducing the value of the function fastest. 1158 00:49:37,609 --> 00:49:39,480 There isn't a faster direction to go in. 1159 00:49:39,480 --> 00:49:44,459 When we take impossibly small, vanishingly small step sizes. 1160 00:49:44,459 --> 00:49:46,500 But in principle, if I had some extra information 1161 00:49:46,500 --> 00:49:48,480 on the problem, I might be able to choose step sizes 1162 00:49:48,480 --> 00:49:49,605 along different directions. 1163 00:49:49,605 --> 00:49:51,600 I may know that one of these directions 1164 00:49:51,600 --> 00:49:55,080 is more ill-behaved than the other ones. 1165 00:49:55,080 --> 00:49:57,416 And choose a different damping factor for it. 1166 00:49:57,416 --> 00:49:58,290 That's a possibility. 1167 00:49:58,290 --> 00:50:00,030 But we actually have to know something 1168 00:50:00,030 --> 00:50:01,320 about the details of the problem we're trying 1169 00:50:01,320 --> 00:50:02,611 to solve if we're going to do-- 1170 00:50:02,611 --> 00:50:03,820 it's a wonderful question. 1171 00:50:03,820 --> 00:50:05,880 I mean, you could think about ways 1172 00:50:05,880 --> 00:50:08,016 of making this more, potentially more robust. 1173 00:50:08,016 --> 00:50:10,140 I'll show you an alternative way of doing this when 1174 00:50:10,140 --> 00:50:12,660 we talk about optimization. 1175 00:50:12,660 --> 00:50:14,280 In optimization we'll do-- we'll solve 1176 00:50:14,280 --> 00:50:16,890 systems of nonlinear equations to solve these optimization 1177 00:50:16,890 --> 00:50:17,550 problems. 1178 00:50:17,550 --> 00:50:20,454 There's another way of doing the same sort of strategy that's 1179 00:50:20,454 --> 00:50:21,870 more along what you're describing. 1180 00:50:21,870 --> 00:50:23,328 Maybe there's a different direction 1181 00:50:23,328 --> 00:50:25,130 to choose instead that could be preferable. 1182 00:50:25,130 --> 00:50:27,640 This is something called the dogleg method. 1183 00:50:27,640 --> 00:50:30,230 Great question. 1184 00:50:30,230 --> 00:50:32,260 Anything else? 1185 00:50:32,260 --> 00:50:32,760 No. 1186 00:50:36,120 --> 00:50:39,720 So globally convergent, right? 1187 00:50:39,720 --> 00:50:42,821 Converges to roots, local minima or maxima. 1188 00:50:42,821 --> 00:50:44,820 There are other modifications that are possible. 1189 00:50:44,820 --> 00:50:46,645 We'll talk about them in optimization. 1190 00:50:46,645 --> 00:50:48,270 There's always a penalty to doing this. 1191 00:50:48,270 --> 00:50:50,550 The penalty is in the rate of convergence. 1192 00:50:50,550 --> 00:50:51,910 So it will converge more slowly. 
1193 00:50:51,910 --> 00:50:53,790 But maybe you speed the calculations along anyways, 1194 00:50:53,790 --> 00:50:54,289 right? 1195 00:50:54,289 --> 00:50:56,370 Maybe it requires fewer iterations overall 1196 00:50:56,370 --> 00:50:59,135 to get there because you tame the local 1197 00:50:59,135 --> 00:51:01,260 convergence properties of the Newton-Raphson method. 1198 00:51:01,260 --> 00:51:05,160 Or you shortcut some of the expensive calculations, 1199 00:51:05,160 --> 00:51:07,740 like getting your Jacobian or calculating your Jacobian 1200 00:51:07,740 --> 00:51:09,010 inverse. 1201 00:51:09,010 --> 00:51:10,280 All right? 1202 00:51:10,280 --> 00:51:13,050 So Monday we're going to review, sort of, the topics up until now. 1203 00:51:13,050 --> 00:51:15,234 Professor Green will run the lecture on Monday. 1204 00:51:15,234 --> 00:51:16,650 And then after that, we'll pick up 1205 00:51:16,650 --> 00:51:18,900 with optimization, which will follow right on from what 1206 00:51:18,900 --> 00:51:19,680 we've done so far. 1207 00:51:19,680 --> 00:51:21,230 Thanks.