1 00:00:00,000 --> 00:00:01,470 INTRODUCTION: The following content 2 00:00:01,470 --> 00:00:04,420 is provided by MIT OpenCourseWare under a Creative 3 00:00:04,420 --> 00:00:05,844 Commons license. 4 00:00:05,844 --> 00:00:07,510 Additional information about our license 5 00:00:07,510 --> 00:00:10,550 and MIT OpenCourseWare in general 6 00:00:10,550 --> 00:00:11,930 is available at ocw.mit.edu. 7 00:00:15,620 --> 00:00:17,230 PROFESSOR: Specific problem, and it's 8 00:00:17,230 --> 00:00:22,310 a pure linear least squares problem, 9 00:00:22,310 --> 00:00:24,950 but it's got two terms. 10 00:00:24,950 --> 00:00:30,190 So we're used to minimizing A*u minus b square. 11 00:00:30,190 --> 00:00:33,180 That gives us the least squares solution u 12 00:00:33,180 --> 00:00:36,430 hat to a linear system. 13 00:00:36,430 --> 00:00:42,360 And usually the reason we have to go to the least square thing 14 00:00:42,360 --> 00:00:44,900 is that there's no exact solution. 15 00:00:44,900 --> 00:00:50,390 Probably A has more equations than unknowns. 16 00:00:50,390 --> 00:00:55,230 A is long and thin, and there's no exact solution, 17 00:00:55,230 --> 00:00:58,600 so we look for the best solution, and we call it u hat. 18 00:00:58,600 --> 00:00:59,280 OK. 19 00:00:59,280 --> 00:01:01,960 But there are a lot of problems in which 20 00:01:01,960 --> 00:01:05,540 a second square appears. 21 00:01:05,540 --> 00:01:10,240 There's also a B*u equal d hiding in the background. 22 00:01:10,240 --> 00:01:13,430 And so we really have like two sets of equations. 23 00:01:17,590 --> 00:01:22,500 And we multiply that second square by some factor 24 00:01:22,500 --> 00:01:25,230 alpha and that wise choice of alpha 25 00:01:25,230 --> 00:01:30,000 is usually a big part of the problem. 26 00:01:30,000 --> 00:01:32,750 And I want to speak about some of the applications 27 00:01:32,750 --> 00:01:34,570 of this area. 28 00:01:34,570 --> 00:01:41,500 So from the point of view of the normal equations, 29 00:01:41,500 --> 00:01:43,790 the system that you actually solve, 30 00:01:43,790 --> 00:01:46,470 you could say no problem. 31 00:01:46,470 --> 00:01:49,060 If we knew how to do this, then we certainly 32 00:01:49,060 --> 00:01:54,280 can do both of them together, because instead of A transpose 33 00:01:54,280 --> 00:01:58,950 A showing up, we'll now have A transpose A plus alpha B 34 00:01:58,950 --> 00:02:05,230 transpose A. That'll be the positive definite coefficient 35 00:02:05,230 --> 00:02:07,670 matrix on the left side. 36 00:02:07,670 --> 00:02:09,650 And then on the right side, instead 37 00:02:09,650 --> 00:02:12,500 of just the usual A transpose b, this 38 00:02:12,500 --> 00:02:17,010 term is also going to give us an alpha B transpose d. 39 00:02:17,010 --> 00:02:24,410 All I'm saying is we don't need any new mathematics to reach 40 00:02:24,410 --> 00:02:29,480 this normal equation with the -- sort of the two-term normal 41 00:02:29,480 --> 00:02:30,030 equation. 42 00:02:30,030 --> 00:02:34,410 And another way to think of exactly the same thing is we're 43 00:02:34,410 --> 00:02:42,410 looking at the least squares problem, where the two matrices 44 00:02:42,410 --> 00:02:46,710 A and B both are multiplying u. 45 00:02:46,710 --> 00:02:51,550 And we have two bits of data, b and d, 46 00:02:51,550 --> 00:02:53,570 and all were doing is the usual thing 47 00:02:53,570 --> 00:02:55,730 but with a weight in here. 48 00:02:55,730 --> 00:02:58,980 And the weight is the identity matrix for the A part, 49 00:02:58,980 --> 00:03:01,830 and it's alpha times the identity matrix for the B part. 50 00:03:01,830 --> 00:03:04,510 So this is our C right. 51 00:03:04,510 --> 00:03:11,150 This is our C, just to say that, really, 52 00:03:11,150 --> 00:03:17,070 the notation that we created, the formulation we have, 53 00:03:17,070 --> 00:03:21,910 allows us to take this step, so C appears here and A transpose 54 00:03:21,910 --> 00:03:26,600 C*b, C appears over here too, just as always. 55 00:03:26,600 --> 00:03:29,580 OK. 56 00:03:29,580 --> 00:03:31,920 But there are important questions. 57 00:03:31,920 --> 00:03:34,440 And of course, always, the first important question 58 00:03:34,440 --> 00:03:38,280 in applied math is what problem are you solving? 59 00:03:38,280 --> 00:03:42,810 Why have we produced this class of problems? 60 00:03:42,810 --> 00:03:45,840 And I have two answers. 61 00:03:45,840 --> 00:03:50,320 Let me just mention first, so we are sure what 62 00:03:50,320 --> 00:03:53,080 the shape of these matrices is. 63 00:03:53,080 --> 00:03:58,750 A, as always, has more rows than columns. 64 00:03:58,750 --> 00:04:00,620 Of course, u is n by 1. 65 00:04:04,590 --> 00:04:09,490 It's just a column vector. 66 00:04:09,490 --> 00:04:14,530 But A has too many equations, too many rows, 67 00:04:14,530 --> 00:04:19,250 for us to get an exact solution; B, on the other hand, 68 00:04:19,250 --> 00:04:20,950 has few rows. 69 00:04:20,950 --> 00:04:23,230 It might even only have one row. 70 00:04:23,230 --> 00:04:30,820 It's very common to add on one constraint or one term 71 00:04:30,820 --> 00:04:33,220 in regularizing the situation. 72 00:04:33,220 --> 00:04:37,300 Anyway, p is relatively small. 73 00:04:37,300 --> 00:04:47,960 So the total matrix A, B has m plus p rows, and the same n 74 00:04:47,960 --> 00:04:51,380 columns, and we're ready to go. 75 00:04:51,380 --> 00:04:56,710 But the two parts are different somehow. 76 00:04:56,710 --> 00:04:58,910 They come for different reasons. 77 00:04:58,910 --> 00:05:04,200 And now, I wrote down here two places they come from. 78 00:05:04,200 --> 00:05:08,990 And these are big applications of applied math. 79 00:05:08,990 --> 00:05:14,600 And one of them produces small coefficients alpha. 80 00:05:14,600 --> 00:05:20,460 And what's the purpose of the B*u minus d term in that case, 81 00:05:20,460 --> 00:05:22,310 with just a small alpha? 82 00:05:24,970 --> 00:05:26,770 The problem is that the A transpose 83 00:05:26,770 --> 00:05:31,620 A part is nearly singular or is singular. 84 00:05:31,620 --> 00:05:35,840 So that the usual normal equation, without the B, 85 00:05:35,840 --> 00:05:42,320 would be in trouble, and this of course happens pretty often. 86 00:05:42,320 --> 00:05:49,980 So the idea of regularization is get some control 87 00:05:49,980 --> 00:05:56,330 of the solution by putting in another term that 88 00:05:56,330 --> 00:06:01,290 keeps some control over u, and stops it from just taking off, 89 00:06:01,290 --> 00:06:06,550 as what happened where the original normal equations would 90 00:06:06,550 --> 00:06:08,580 have a very large u hat. 91 00:06:08,580 --> 00:06:16,820 So we're just, like, adding a little steady part that 92 00:06:16,820 --> 00:06:18,800 keeps it a bit under control. 93 00:06:18,800 --> 00:06:23,790 And so the A transpose A is nearly singular in ill-posed 94 00:06:23,790 --> 00:06:27,800 problems, so we make them -- it's like giving aspirin 95 00:06:27,800 --> 00:06:29,740 to an ill-posed problem, right? 96 00:06:29,740 --> 00:06:34,880 You don't fix it, but it can operate. 97 00:06:34,880 --> 00:06:35,550 OK. 98 00:06:35,550 --> 00:06:39,880 And where do ill-posed problems come from? 99 00:06:39,880 --> 00:06:46,100 And I just wanted to say that I think the fundamental ill-posed 100 00:06:46,100 --> 00:06:53,470 problems in science is: given positions -- 101 00:06:53,470 --> 00:06:59,650 suppose we know that a mass, let's say, 102 00:06:59,650 --> 00:07:03,480 is in certain positions at certain times -- 103 00:07:03,480 --> 00:07:06,150 find the velocity. 104 00:07:06,150 --> 00:07:09,820 So we often, in applications have some way 105 00:07:09,820 --> 00:07:13,280 to know position, and want to know velocity. 106 00:07:13,280 --> 00:07:17,090 And maybe you realize that that problem is not 107 00:07:17,090 --> 00:07:22,420 well posed, because velocity takes the derivative. 108 00:07:22,420 --> 00:07:24,450 And if you take the derivative, that's 109 00:07:24,450 --> 00:07:28,140 not a good operator to invert. 110 00:07:28,140 --> 00:07:32,990 Taking the derivative makes things very rough. 111 00:07:41,490 --> 00:07:47,300 All sorts of cases, we're looking for the velocity, 112 00:07:47,300 --> 00:07:48,570 and we only have positions. 113 00:07:48,570 --> 00:07:54,440 One that I think about is from GPS. 114 00:07:54,440 --> 00:08:00,220 So GPS uses space-based satellites, as you all know, 115 00:08:00,220 --> 00:08:04,470 to give you very accurate positions. 116 00:08:04,470 --> 00:08:06,710 And somehow, out of those positions, 117 00:08:06,710 --> 00:08:09,920 you get pretty accurate but not, of course, 118 00:08:09,920 --> 00:08:15,640 as accurate as the positions, but you get decent velocities. 119 00:08:15,640 --> 00:08:16,530 And how? 120 00:08:16,530 --> 00:08:21,360 And there is an example where you want to know the motion -- 121 00:08:21,360 --> 00:08:24,720 of course, to ask for the acceleration would be asking 122 00:08:24,720 --> 00:08:27,380 for yet another derivative. 123 00:08:27,380 --> 00:08:35,120 You see why the derivative is an ill-posed thing? 124 00:08:35,120 --> 00:08:37,590 Let me just say ahead of time, I'm 125 00:08:37,590 --> 00:08:46,660 going to make today's lecture about direction number two, 126 00:08:46,660 --> 00:08:48,340 not the ill-posed problems. 127 00:08:48,340 --> 00:08:51,300 So I'm just, like, throwing in some comments 128 00:08:51,300 --> 00:08:53,755 about the ill-posed problem, and then 129 00:08:53,755 --> 00:08:55,730 I'll have a weekend to think about those, 130 00:08:55,730 --> 00:08:58,330 and then next week, I'll come back 131 00:08:58,330 --> 00:09:00,760 to this ill posed problems. 132 00:09:00,760 --> 00:09:06,320 And specifically, they often come from inverse problems, 133 00:09:06,320 --> 00:09:12,580 is a big source of ill posed problems 134 00:09:12,580 --> 00:09:14,470 that need regularization. 135 00:09:14,470 --> 00:09:20,490 It's just a very large class of equations. 136 00:09:20,490 --> 00:09:27,800 I mean, I was just going to say about the derivative example. 137 00:09:27,800 --> 00:09:30,700 Why is that so unstable? 138 00:09:30,700 --> 00:09:37,340 Well, from the point of finite differences, 139 00:09:37,340 --> 00:09:40,480 if we have positions, how do you estimate velocities? 140 00:09:40,480 --> 00:09:42,990 You take a difference quotient, right? 141 00:09:42,990 --> 00:09:47,900 You take the position at this time, the position at a close 142 00:09:47,900 --> 00:09:51,630 by time, and you divide by delta t. 143 00:09:51,630 --> 00:09:53,780 That's a reasonable start. 144 00:09:53,780 --> 00:09:56,640 But dividing by delta t, that small number, 145 00:09:56,640 --> 00:10:00,120 is producing big numbers. 146 00:10:00,120 --> 00:10:05,790 Any errors in the position are multiplied by that 1 147 00:10:05,790 --> 00:10:09,550 over delta t and blown up. 148 00:10:09,550 --> 00:10:13,330 And similarly, in frequency space, 149 00:10:13,330 --> 00:10:17,330 where the functions that we think about are the functions 150 00:10:17,330 --> 00:10:22,520 like e to the i*k*t, the derivatives brings down 151 00:10:22,520 --> 00:10:24,650 the factor k. 152 00:10:24,650 --> 00:10:28,050 So high oscillations, that's the point. 153 00:10:28,050 --> 00:10:31,160 Oscillatory functions can be pretty small, 154 00:10:31,160 --> 00:10:34,150 but their derivative can be enormous. 155 00:10:34,150 --> 00:10:38,710 So it's that oscillation which is often associated 156 00:10:38,710 --> 00:10:40,940 with noise in the measurements. 157 00:10:40,940 --> 00:10:45,440 You know, noisy measurements are jumpy, 158 00:10:45,440 --> 00:10:51,000 and when we go to take their derivative or their finite 159 00:10:51,000 --> 00:10:52,980 difference, we get big answers. 160 00:10:52,980 --> 00:11:01,830 Anyway, for me that's the model ill-posed problem, 161 00:11:01,830 --> 00:11:03,890 to find velocities. 162 00:11:03,890 --> 00:11:05,600 And how to do it? 163 00:11:05,600 --> 00:11:09,090 I mean a lot of thought has gone into that. 164 00:11:09,090 --> 00:11:14,060 Let me leave it there, and come back to it. 165 00:11:14,060 --> 00:11:19,340 But I say all this just to emphasize its importance. 166 00:11:19,340 --> 00:11:22,950 Not that we'll completely solve it, actually, 167 00:11:22,950 --> 00:11:25,730 for GPS or for any other thing, it's 168 00:11:25,730 --> 00:11:29,090 just all we can do is medicate. 169 00:11:29,090 --> 00:11:29,810 OK. 170 00:11:29,810 --> 00:11:34,490 Now this is the one that we can really solve. 171 00:11:34,490 --> 00:11:37,820 So this is a different application entirely. 172 00:11:37,820 --> 00:11:42,180 In this application, this second term, B*u equal d, 173 00:11:42,180 --> 00:11:46,060 is something important, something that we want 174 00:11:46,060 --> 00:11:48,230 to enforce. 175 00:11:48,230 --> 00:11:51,740 It's a constraint, you could say. 176 00:11:51,740 --> 00:11:55,550 And one way to enforce it which fits this pattern 177 00:11:55,550 --> 00:11:58,380 is to take alpha very large, right. 178 00:11:58,380 --> 00:12:02,350 When we take alpha large, we're putting a really heavy weight 179 00:12:02,350 --> 00:12:07,020 on that B*u minus d square, and when we minimize, 180 00:12:07,020 --> 00:12:12,520 that weight will force B*u to be pretty close to d. 181 00:12:12,520 --> 00:12:16,100 But of course, B*u equal d doesn't determine u. 182 00:12:16,100 --> 00:12:18,450 Everybody's got that picture clear? 183 00:12:18,450 --> 00:12:23,410 From B*u equal d has many solutions. 184 00:12:28,520 --> 00:12:33,160 And so the real problem that we're trying to solve is 185 00:12:33,160 --> 00:12:38,010 enforce B*u equal d, but among those solutions, 186 00:12:38,010 --> 00:12:42,780 pick the one that minimizes the first square, 187 00:12:42,780 --> 00:12:45,010 A*u minus b squared. 188 00:12:45,010 --> 00:12:46,810 So you see the difference? 189 00:12:46,810 --> 00:12:48,360 You're trying to enforce something 190 00:12:48,360 --> 00:12:52,830 that the physics or the geometry, or whatever 191 00:12:52,830 --> 00:12:58,580 source says has to be true. 192 00:12:58,580 --> 00:13:00,180 And you can do it. 193 00:13:00,180 --> 00:13:04,220 And you're left with lots of options. 194 00:13:04,220 --> 00:13:12,220 And then the combined problem attempts to pick the right u. 195 00:13:12,220 --> 00:13:12,760 OK. 196 00:13:12,760 --> 00:13:16,630 So that's the application number two 197 00:13:16,630 --> 00:13:18,270 that I want to speak about today. 198 00:13:18,270 --> 00:13:24,100 And actually, I want to give several ways to do it. 199 00:13:24,100 --> 00:13:26,830 It's a very important problem. 200 00:13:26,830 --> 00:13:34,580 And one way will be to actually solve B*u equal d. 201 00:13:34,580 --> 00:13:38,070 Find those solutions. 202 00:13:38,070 --> 00:13:40,360 And you may say, well, that's what 203 00:13:40,360 --> 00:13:42,000 we learned in linear algebra, that's 204 00:13:42,000 --> 00:13:44,950 the very foundation of linear algebra, 205 00:13:44,950 --> 00:13:48,570 is there a particular solution, right? 206 00:13:51,310 --> 00:13:56,750 Every solution is of this form particular plus null space. 207 00:13:56,750 --> 00:14:04,400 Maybe I'll just point to the start of that approach. 208 00:14:04,400 --> 00:14:06,670 So want to solve B*u equal d. 209 00:14:06,670 --> 00:14:09,470 And I'll come back to this method 210 00:14:09,470 --> 00:14:12,830 after dealing with the least squares approach. 211 00:14:12,830 --> 00:14:16,200 But here's really the direct approach. 212 00:14:16,200 --> 00:14:19,440 That if I solve B*u equal d, then there's a particular 213 00:14:19,440 --> 00:14:21,750 solution that solves it. 214 00:14:21,750 --> 00:14:28,250 And then you can always add on the general solution which is, 215 00:14:28,250 --> 00:14:30,380 sorry add on the null space solution -- 216 00:14:30,380 --> 00:14:32,990 the solution of B*u equals 0. 217 00:14:32,990 --> 00:14:36,230 And B*u equals 0 has lots of solutions. 218 00:14:36,230 --> 00:14:38,850 So we would have to find them. 219 00:14:38,850 --> 00:14:40,920 OK. 220 00:14:40,920 --> 00:14:43,550 I mean that's what 18.06 would naturally do, 221 00:14:43,550 --> 00:14:47,090 but actually never, I'm ashamed to say, 222 00:14:47,090 --> 00:14:49,270 but I didn't do it in 18.06. 223 00:14:49,270 --> 00:14:56,810 I never actually said how I would scientifically compute, 224 00:14:56,810 --> 00:15:00,710 in a stable way, the solutions. 225 00:15:00,710 --> 00:15:01,250 OK. 226 00:15:01,250 --> 00:15:04,400 So I think that will be important. 227 00:15:04,400 --> 00:15:06,650 But that's not the only way to do it. 228 00:15:06,650 --> 00:15:11,030 That's called the null space method. 229 00:15:11,030 --> 00:15:14,140 And sometimes it's the right choice, sometimes not. 230 00:15:14,140 --> 00:15:20,930 This would be called the heavy weight method, right. 231 00:15:20,930 --> 00:15:25,590 Put on a very heavy weight and solve a standard problem. 232 00:15:25,590 --> 00:15:26,500 OK. 233 00:15:26,500 --> 00:15:28,900 So let me follow that one up. 234 00:15:28,900 --> 00:15:32,070 And then they'll be a third method. 235 00:15:32,070 --> 00:15:34,570 And maybe there's going to be space on the middle blackboard 236 00:15:34,570 --> 00:15:35,630 for it. 237 00:15:35,630 --> 00:15:37,930 And what would the third method be? 238 00:15:37,930 --> 00:15:43,770 That will be use a Lagrange multiplier. 239 00:15:43,770 --> 00:15:45,820 This thing is a constraint. 240 00:15:45,820 --> 00:15:48,510 I'll enforce it by a Lagrange multiplier. 241 00:15:48,510 --> 00:15:49,050 OK. 242 00:15:49,050 --> 00:15:51,060 That's coming next. 243 00:15:51,060 --> 00:15:54,920 The way I'm enforcing it right now is by a heavy weight. 244 00:15:54,920 --> 00:15:57,200 OK. 245 00:15:57,200 --> 00:16:01,650 One reason for the popularity of this method 246 00:16:01,650 --> 00:16:05,000 is you don't have to do any new thinking. 247 00:16:05,000 --> 00:16:11,070 You just create these equations and solve them. 248 00:16:11,070 --> 00:16:16,380 Where the other methods maybe ask us to think separately 249 00:16:16,380 --> 00:16:19,694 about the constraint. 250 00:16:19,694 --> 00:16:21,360 Here we don't have to things separately, 251 00:16:21,360 --> 00:16:24,910 we just create this normal equation, we solve it, 252 00:16:24,910 --> 00:16:29,030 and we get an answer u hat. 253 00:16:29,030 --> 00:16:31,420 Maybe I should call it u hat alpha, 254 00:16:31,420 --> 00:16:34,430 because it depends on the weight alpha, 255 00:16:34,430 --> 00:16:41,480 certainly, which we hope is near the exact solution. 256 00:16:41,480 --> 00:16:46,250 The exact solution being the one that exactly solves B*u equal 257 00:16:46,250 --> 00:16:47,120 d. 258 00:16:47,120 --> 00:16:52,070 Because u hat alpha will not exactly solve B*u equal d. 259 00:16:52,070 --> 00:16:56,530 But we can find solutions that do, and then among those, 260 00:16:56,530 --> 00:17:00,530 we can minimize A*u minus b square. 261 00:17:00,530 --> 00:17:01,030 OK. 262 00:17:01,030 --> 00:17:06,101 So just a word about this heavy weight method. 263 00:17:06,101 --> 00:17:06,600 OK. 264 00:17:10,260 --> 00:17:14,700 Well, first an interesting point. 265 00:17:14,700 --> 00:17:19,240 A point that I think is sort of interesting. 266 00:17:19,240 --> 00:17:22,650 I want to let alpha go to infinity and see what happens, 267 00:17:22,650 --> 00:17:24,480 right. 268 00:17:24,480 --> 00:17:26,970 Everybody figures that as alpha goes to infinity, 269 00:17:26,970 --> 00:17:29,490 I'm going to get the right answer. 270 00:17:29,490 --> 00:17:31,150 Because as alpha goes to infinity, 271 00:17:31,150 --> 00:17:34,370 it's going to more and more enforce the constraint B*u 272 00:17:34,370 --> 00:17:35,550 equal d. 273 00:17:35,550 --> 00:17:39,510 And then, with that constraint enforced, the other part of it 274 00:17:39,510 --> 00:17:43,050 will find the best u and that's great. 275 00:17:43,050 --> 00:17:46,620 But let alpha go to infinity in this equation, 276 00:17:46,620 --> 00:17:47,400 and what happens? 277 00:17:50,050 --> 00:17:53,760 So this is just like a side comment just to say alpha, 278 00:17:53,760 --> 00:17:58,050 you know, taking a limit you got to think about doing it right. 279 00:17:58,050 --> 00:18:01,694 Well, let's see, if I let alpha go to infinity as it is, 280 00:18:01,694 --> 00:18:03,360 that'll be infinite that'll be infinite, 281 00:18:03,360 --> 00:18:06,140 and I won't know what's going on. 282 00:18:06,140 --> 00:18:12,960 Let me divide by alpha before I let alpha go to infinity. 283 00:18:12,960 --> 00:18:15,850 So if I just divide everything by alpha -- 284 00:18:15,850 --> 00:18:19,790 can I do that with an eraser here? 285 00:18:19,790 --> 00:18:21,810 I'll divide by alpha. 286 00:18:21,810 --> 00:18:24,580 So there's a 1 over alpha here. 287 00:18:24,580 --> 00:18:26,590 I divide this by alpha. 288 00:18:26,590 --> 00:18:30,960 And this has a 1 over alpha there. 289 00:18:30,960 --> 00:18:33,480 And now, if I let alpha go to infinity, 290 00:18:33,480 --> 00:18:36,800 I get something sensible. 291 00:18:36,800 --> 00:18:41,150 This goes to 0, right, alpha going to infinity, 292 00:18:41,150 --> 00:18:42,420 getting bigger and bigger. 293 00:18:42,420 --> 00:18:43,850 This goes to 0. 294 00:18:43,850 --> 00:18:45,370 So what do I get in the limit? 295 00:18:45,370 --> 00:18:51,640 I get that this equals this in the limit. 296 00:18:51,640 --> 00:18:53,260 So shall I put that up here? 297 00:18:53,260 --> 00:18:56,220 Well, I'll put it here, because I don't like it, frankly. 298 00:18:58,930 --> 00:19:03,390 So I'll just squeeze it in this little spot. 299 00:19:03,390 --> 00:19:07,350 That if I let alpha go to into infinity, so 1 over alpha 300 00:19:07,350 --> 00:19:08,030 goes to 0. 301 00:19:08,030 --> 00:19:14,810 I get B transpose B, u hat infinity, shall I call it? 302 00:19:14,810 --> 00:19:16,450 Equals B transpose d. 303 00:19:19,440 --> 00:19:21,750 And I guess what I want to say is, from 304 00:19:21,750 --> 00:19:25,580 that I don't learn a whole lot. 305 00:19:25,580 --> 00:19:28,100 because B transpose B is a singular matrix, 306 00:19:28,100 --> 00:19:33,710 B transpose B is a matrix of only rank p, 307 00:19:33,710 --> 00:19:40,350 it's very singular, right? 308 00:19:40,350 --> 00:19:44,760 B had this crazy shape, long and thin. 309 00:19:44,760 --> 00:19:47,660 B transpose B will be tall, B transpose 310 00:19:47,660 --> 00:19:52,980 B will be a large matrix, but its rank will only be p. 311 00:19:52,980 --> 00:19:55,230 It's an n by n matrix of rank p, and it's 312 00:19:55,230 --> 00:19:58,690 singular and who knows what's going on there. 313 00:20:01,350 --> 00:20:04,370 That little side issue was simply 314 00:20:04,370 --> 00:20:08,740 to say that you can't just let alpha 315 00:20:08,740 --> 00:20:11,530 go to infinity in the central equation there, 316 00:20:11,530 --> 00:20:13,570 and expect to see what's happening. 317 00:20:13,570 --> 00:20:14,070 OK. 318 00:20:14,070 --> 00:20:17,870 So somehow there's more to it than that. 319 00:20:17,870 --> 00:20:25,420 So let me put alpha back where it belongs, and think again. 320 00:20:25,420 --> 00:20:27,970 OK. 321 00:20:27,970 --> 00:20:35,340 And I guess by thinking again, I might as well 322 00:20:35,340 --> 00:20:43,190 think in terms of this way of writing it. 323 00:20:43,190 --> 00:20:45,760 Because I recognize this, right? 324 00:20:45,760 --> 00:20:49,100 This is exactly the framework that we've developed. 325 00:20:55,280 --> 00:20:58,760 So this is the least squares problem. 326 00:20:58,760 --> 00:21:01,050 I just want to write down the saddle point 327 00:21:01,050 --> 00:21:08,420 matrix that goes with this least squares problem. 328 00:21:08,420 --> 00:21:10,230 What is the saddle point matrix? 329 00:21:10,230 --> 00:21:11,610 Do you remember? 330 00:21:11,610 --> 00:21:21,650 The saddle point matrix S is -- it has, up here -- 331 00:21:21,650 --> 00:21:26,980 so now I've got an A and a B here. 332 00:21:26,980 --> 00:21:31,070 So it's going to be a little bit larger. 333 00:21:31,070 --> 00:21:34,320 Then I have my usual zero block. 334 00:21:34,320 --> 00:21:38,800 And I have my usual A transpose B transpose block. 335 00:21:38,800 --> 00:21:41,520 And what block goes there? 336 00:21:41,520 --> 00:21:43,280 That's the C inverse, right. 337 00:21:47,690 --> 00:21:53,426 It's our usual C inverse, A, A transpose, 0 338 00:21:53,426 --> 00:21:54,800 that we're totally accustomed to. 339 00:21:54,800 --> 00:22:03,050 But now A has grown into A, B; 0 is still 0; 340 00:22:03,050 --> 00:22:05,890 the transpose is still the transpose; 341 00:22:05,890 --> 00:22:09,240 and up here is C inverse, and since C was this, 342 00:22:09,240 --> 00:22:14,050 C inverse will be the identity, and the identity over alpha. 343 00:22:14,050 --> 00:22:15,590 OK. 344 00:22:15,590 --> 00:22:18,330 So that's my S_alpha, you could say. 345 00:22:21,040 --> 00:22:32,020 And now I'm prepared to let -- so my equation is S_alpha -- 346 00:22:32,020 --> 00:22:40,110 written as a block equation, what are the pieces of it? 347 00:22:40,110 --> 00:22:47,670 u is the guy that I'm looking for, the u hat alpha. 348 00:22:47,670 --> 00:22:50,880 And there was a Lagrange multiplier that came in. 349 00:22:50,880 --> 00:22:54,440 You remember, that's how we got to a block form 350 00:22:54,440 --> 00:22:56,080 from a scalar form. 351 00:22:56,080 --> 00:23:00,540 And I guess I usually call it w, so I'll stay with w for here. 352 00:23:00,540 --> 00:23:02,170 OK. 353 00:23:02,170 --> 00:23:06,920 So that's what multiplies w, u. 354 00:23:06,920 --> 00:23:11,410 And it's what gives -- let's see. 355 00:23:11,410 --> 00:23:19,590 I think it gives a B, and it gives a d from this A*u 356 00:23:19,590 --> 00:23:23,004 and B*u, and I think here if gives a 0, 357 00:23:23,004 --> 00:23:24,560 because we didn't have any. 358 00:23:24,560 --> 00:23:25,060 OK. 359 00:23:27,980 --> 00:23:29,450 What am I doing here? 360 00:23:29,450 --> 00:23:31,690 I'm just writing the problem in a way 361 00:23:31,690 --> 00:23:34,740 where I can let alpha go to 0, and see the limit. 362 00:23:34,740 --> 00:23:43,260 So let alpha go to infinity, this is heavy weight part. 363 00:23:43,260 --> 00:23:44,520 So this will go to 0. 364 00:23:44,520 --> 00:23:52,320 So this approaches the S_infinity, w_infinity, 365 00:23:52,320 --> 00:23:59,070 we could call it, u hat infinity is now -- 366 00:23:59,070 --> 00:24:03,400 well you see what the limit is, that's 0 in that block. 367 00:24:03,400 --> 00:24:08,070 This is A, this is B, this is A transpose, this is B transpose, 368 00:24:08,070 --> 00:24:11,220 this is our usual zero block, multiplying 369 00:24:11,220 --> 00:24:18,120 our same w_infinity, u hat infinity, 370 00:24:18,120 --> 00:24:23,730 equaling our same b, d, and 0. 371 00:24:23,730 --> 00:24:25,250 This is the limiting equation. 372 00:24:30,590 --> 00:24:31,970 And it's great. 373 00:24:35,230 --> 00:24:40,230 This is the equation that determines the limit as alpha 374 00:24:40,230 --> 00:24:44,220 goes to infinity, that determines the best u. 375 00:24:44,220 --> 00:24:48,660 This is the problem that we really want to solve. 376 00:24:48,660 --> 00:24:51,830 Maybe that's what I should say. 377 00:24:51,830 --> 00:24:56,380 Do you see the constraint B*u equal d in here from this 378 00:24:56,380 --> 00:24:58,380 middle block row? 379 00:24:58,380 --> 00:25:04,140 That says 0, 0, B u hat is d. 380 00:25:04,140 --> 00:25:08,290 So we've introduced the constraint. 381 00:25:08,290 --> 00:25:13,330 The first part is w_infinity with an A*u_infinity, 382 00:25:13,330 --> 00:25:18,790 that's the usual error term, the thing that we probably 383 00:25:18,790 --> 00:25:20,000 can't make 0. 384 00:25:20,000 --> 00:25:23,670 And then this is the usual Lagrange multiplier 385 00:25:23,670 --> 00:25:25,450 term from there. 386 00:25:29,120 --> 00:25:34,360 So I've spoken pretty quickly here, and let me just conclude. 387 00:25:34,360 --> 00:25:39,150 This is the limit equation, is the correct limit equation. 388 00:25:39,150 --> 00:25:40,880 This is the limit equation that we want 389 00:25:40,880 --> 00:25:44,780 to solve one way or another. 390 00:25:44,780 --> 00:25:51,120 And taking alpha large is one way to get near the answer, 391 00:25:51,120 --> 00:25:53,250 but we'll look at other ways now. 392 00:25:53,250 --> 00:26:00,840 So this is really the correct equations to solve. 393 00:26:05,050 --> 00:26:12,740 Going the saddle point Lagrange multiplier route. 394 00:26:12,740 --> 00:26:14,910 OK. 395 00:26:14,910 --> 00:26:19,360 So let me summarize what I've done so far. 396 00:26:22,340 --> 00:26:30,060 My problem is when B*u equal d is a constraint that I would 397 00:26:30,060 --> 00:26:35,040 like to satisfy, and one way to do it is to take alpha, 398 00:26:35,040 --> 00:26:38,770 you know, pretty near the largest number that the machine 399 00:26:38,770 --> 00:26:43,750 will hold, say 10 to the 15. 400 00:26:43,750 --> 00:26:46,500 Put a really heavy weight on this. 401 00:26:46,500 --> 00:26:49,920 But of course, when you let alpha be 10 to the 15, 402 00:26:49,920 --> 00:26:54,470 you can see that there's like some possible problems here. 403 00:26:54,470 --> 00:26:56,860 When you let alpha have an enormous weight, 404 00:26:56,860 --> 00:26:59,900 you're really tilting this matrix 405 00:26:59,900 --> 00:27:03,230 so strongly, you know, you couldn't let it be 10 to the 20 406 00:27:03,230 --> 00:27:07,610 in single precision or you'd wipe out A transpose A. 407 00:27:07,610 --> 00:27:09,570 So it's a balance here. 408 00:27:09,570 --> 00:27:14,440 So I guess probably a lot of a numerical analysts 409 00:27:14,440 --> 00:27:18,440 would say wrong way to do it, the right way 410 00:27:18,440 --> 00:27:24,590 is solve this equation, or else do it this other way. 411 00:27:24,590 --> 00:27:30,590 But a lot of people with codes say OK, you know, 412 00:27:30,590 --> 00:27:32,090 you're going to be a nervous Nellie, 413 00:27:32,090 --> 00:27:34,220 I'm just going to use my code. 414 00:27:34,220 --> 00:27:37,920 And that's quite normal, quite human response. 415 00:27:37,920 --> 00:27:40,640 OK. 416 00:27:40,640 --> 00:27:46,400 And this will frequently succeed. 417 00:27:46,400 --> 00:27:46,900 OK. 418 00:27:46,900 --> 00:27:50,640 So that's one method to do it, not the method that 419 00:27:50,640 --> 00:27:54,690 the professionals in numerical analysis -- maybe I'm thinking, 420 00:27:54,690 --> 00:27:58,060 for example, the book by Golub/van Loan, 421 00:27:58,060 --> 00:28:03,760 if you know that book, that would discuss this problem. 422 00:28:03,760 --> 00:28:09,170 And it would actually discuss this third method, 423 00:28:09,170 --> 00:28:12,280 this null space method of solving it. 424 00:28:12,280 --> 00:28:13,510 OK. 425 00:28:13,510 --> 00:28:16,570 Maybe I'll go to that null space method. 426 00:28:16,570 --> 00:28:20,450 So this was one way. 427 00:28:20,450 --> 00:28:25,030 Another way is solve B*u equal d. 428 00:28:29,470 --> 00:28:32,670 And remember again, we only have p equations, 429 00:28:32,670 --> 00:28:35,275 we have n unknowns, so there's going 430 00:28:35,275 --> 00:28:38,740 to be freedom in the solution. 431 00:28:38,740 --> 00:28:43,210 So we have to identify a particular solution, 432 00:28:43,210 --> 00:28:45,910 there's a lot of freedom in that particular solution, 433 00:28:45,910 --> 00:28:51,350 and then we can add to it -- this null space is going to be 434 00:28:51,350 --> 00:29:02,120 n minus p dimensions, n minus p degrees of freedom in the null 435 00:29:02,120 --> 00:29:02,960 space. 436 00:29:02,960 --> 00:29:06,190 That's the dimension of the null space, n minus p. 437 00:29:06,190 --> 00:29:08,590 I'm assuming that B has full rank p, 438 00:29:08,590 --> 00:29:12,100 but p is a small number compared to n. 439 00:29:12,100 --> 00:29:12,680 OK. 440 00:29:12,680 --> 00:29:14,860 So how do you find a particular solution? 441 00:29:14,860 --> 00:29:16,680 How do you find the null space solution? 442 00:29:16,680 --> 00:29:23,930 As I said, that's what I should be explaining in 18.06. 443 00:29:23,930 --> 00:29:27,560 And of course, we do it in 18.06, but we do it with a 3 444 00:29:27,560 --> 00:29:33,290 by 3 matrix, and we practically, you know, we do it by hand, 445 00:29:33,290 --> 00:29:36,190 where here we're talking about matrices of order thousands 446 00:29:36,190 --> 00:29:39,860 or millions, we don't do those by hand. 447 00:29:39,860 --> 00:29:44,450 And we better not do it in an unstable way. 448 00:29:44,450 --> 00:29:48,940 So the question is what's a good way to do it? 449 00:29:48,940 --> 00:29:53,720 And really, the heart of modern numerical analysis 450 00:29:53,720 --> 00:30:00,110 is orthogonalize stuff, get orthogonal vectors. 451 00:30:00,110 --> 00:30:03,190 Because if you have orthogonal vectors, 452 00:30:03,190 --> 00:30:05,500 they don't get out of scale. 453 00:30:05,500 --> 00:30:10,950 The numbers involved don't become unstable. 454 00:30:10,950 --> 00:30:14,585 And the standard orthogonalization process 455 00:30:14,585 --> 00:30:17,330 is Gram-Schmidt, that's right, those 456 00:30:17,330 --> 00:30:18,760 are the words we all think of. 457 00:30:18,760 --> 00:30:22,470 If I have a bunch of vectors, I have to make them orthogonal, 458 00:30:22,470 --> 00:30:28,140 I want to make them orthogonal, then I use -- well, 459 00:30:28,140 --> 00:30:30,900 Gram-Schmidt is what we think of. 460 00:30:30,900 --> 00:30:36,830 But actually, MATLAB doesn't use Gram-Schmidt, 461 00:30:36,830 --> 00:30:40,490 doesn't use the usual Gram-Schmidt, 462 00:30:40,490 --> 00:30:44,480 as Gram and Schmidt thought of it. 463 00:30:44,480 --> 00:30:49,520 MATLAB goes a different route to the same conclusion. 464 00:30:49,520 --> 00:30:54,560 So let me just remind you what Gram-Schmidt produced. 465 00:30:58,420 --> 00:31:05,890 And let me put in the name of the numerical analyst 466 00:31:05,890 --> 00:31:14,390 long after Gram and Schmidt, it's Householder, 467 00:31:14,390 --> 00:31:20,790 you know, the guy from Tennessee with good ideas. 468 00:31:20,790 --> 00:31:24,590 So he had another way to the same answer, 469 00:31:24,590 --> 00:31:26,970 which is this factorization. 470 00:31:26,970 --> 00:31:34,200 So we take our matrix, often it's A in 18.06, 471 00:31:34,200 --> 00:31:40,150 and we factor it, we want to orthogonalize its columns. 472 00:31:40,150 --> 00:31:43,060 So the columns of A get orthogonalized 473 00:31:43,060 --> 00:31:52,740 into the columns of Q. So this has the orthogonal columns. 474 00:31:52,740 --> 00:31:54,580 And then, of course, there's some connection 475 00:31:54,580 --> 00:31:57,730 between the original columns and the orthogonal columns, 476 00:31:57,730 --> 00:32:01,320 and that connection is by triangular matrix 477 00:32:01,320 --> 00:32:03,020 R, upper triangular. 478 00:32:08,050 --> 00:32:12,720 I don't know if you remember that from 18.06. 479 00:32:12,720 --> 00:32:19,140 What I typically do is I explain Gram-Schmidt as they knew it, 480 00:32:19,140 --> 00:32:25,820 and then at the last minute I pull Q and R out 481 00:32:25,820 --> 00:32:29,620 as a way to express the result. OK. 482 00:32:29,620 --> 00:32:32,240 So it's the result we want, and not 483 00:32:32,240 --> 00:32:35,780 the particular Gram-Schmidt way to get there, 484 00:32:35,780 --> 00:32:38,470 and Householder produces a better way to get there. 485 00:32:38,470 --> 00:32:40,160 OK. 486 00:32:40,160 --> 00:32:46,290 But the main point is that if a matrix has independent columns, 487 00:32:46,290 --> 00:32:52,170 or even if it hasn't, but if it has independent columns 488 00:32:52,170 --> 00:32:58,270 we know everything about it, that we can orthogonalize 489 00:32:58,270 --> 00:32:59,550 those columns. 490 00:32:59,550 --> 00:33:04,840 In fact, we can -- here's what I'm leading to, this B 491 00:33:04,840 --> 00:33:11,440 transpose I'm remembering has this shape because B had that 492 00:33:11,440 --> 00:33:18,470 shape, so it's B transpose that I'm going to do Gram-Schmidt, 493 00:33:18,470 --> 00:33:30,510 Householder, use -- The command in MATLAB is [Q, R] equals, 494 00:33:30,510 --> 00:33:36,040 with Gram-Schmidt we could have used the letters G and S, 495 00:33:36,040 --> 00:33:40,140 but since we don't use their actual method anymore, 496 00:33:40,140 --> 00:33:44,670 we could use the letter HH for Householder or something. 497 00:33:44,670 --> 00:33:50,680 But it's qr of, in this case, B transpose is what we want. 498 00:33:50,680 --> 00:33:51,800 OK. 499 00:33:51,800 --> 00:33:59,440 So what that very frequently used command in MATLAB 500 00:33:59,440 --> 00:34:09,800 produces is Q and R. And it produces a square matrix Q, 501 00:34:09,800 --> 00:34:14,680 where these columns, the columns of the first part, 502 00:34:14,680 --> 00:34:20,490 Q_1 transpose, are orthogonalized 503 00:34:20,490 --> 00:34:22,890 versions of these columns. 504 00:34:22,890 --> 00:34:28,060 And the R just tells us the connection between them. 505 00:34:28,060 --> 00:34:32,950 Then it also produces, and this is handy as you'll see, 506 00:34:32,950 --> 00:34:39,390 the algorithm also produces n minus p more columns, that 507 00:34:39,390 --> 00:34:42,060 are orthogonal to these guys. 508 00:34:42,060 --> 00:34:45,100 So it produces a complete orthonormal basis, 509 00:34:45,100 --> 00:34:48,230 a complete set of n columns altogether. 510 00:34:48,230 --> 00:34:53,540 Q_1 transpose has the column that really are 511 00:34:53,540 --> 00:34:54,920 associated with these problems. 512 00:34:54,920 --> 00:34:58,530 And these are going to be associated with the null space. 513 00:35:02,610 --> 00:35:06,420 So out of this, I see that actually B transpose 514 00:35:06,420 --> 00:35:13,480 is Q_1 transpose R. So you can say 515 00:35:13,480 --> 00:35:18,330 this is the reduced factorization with only p 516 00:35:18,330 --> 00:35:23,620 columns, and this is the full picture 517 00:35:23,620 --> 00:35:26,810 with the other n minus p columns that are orthogonal. 518 00:35:26,810 --> 00:35:29,660 And the reason that's handy is they tell us 519 00:35:29,660 --> 00:35:31,950 about the null space. 520 00:35:31,950 --> 00:35:38,020 So now I want to identify out of this a particular solution 521 00:35:38,020 --> 00:35:39,960 and the general null space solution. 522 00:35:39,960 --> 00:35:41,780 OK. 523 00:35:41,780 --> 00:35:43,470 So what are those? 524 00:35:43,470 --> 00:35:48,450 So particular solution is going to use this part. 525 00:35:50,980 --> 00:35:57,160 So, let's see, I want a particular solution. 526 00:35:57,160 --> 00:36:05,280 So B, transposing that is R transpose Q_1. 527 00:36:05,280 --> 00:36:07,430 OK. 528 00:36:07,430 --> 00:36:15,490 So now I'm prepared to solve -- step one is the particular 529 00:36:15,490 --> 00:36:16,230 solution. 530 00:36:16,230 --> 00:36:21,500 I want to get be B*u_particular equal d. 531 00:36:21,500 --> 00:36:23,350 OK. 532 00:36:23,350 --> 00:36:27,700 But now I have B is nicely factored. 533 00:36:27,700 --> 00:36:37,010 So this is R transpose Q_1 u particular equal d. 534 00:36:40,370 --> 00:36:46,570 So now comes the computation the code has to do. 535 00:36:46,570 --> 00:36:54,370 It has to invert that to get Q_1 times u particular 536 00:36:54,370 --> 00:36:58,590 equals R inverse transpose d. 537 00:36:58,590 --> 00:37:00,680 So it had to solve a triangular system, 538 00:37:00,680 --> 00:37:02,930 but of course, a triangular system is quick to solve. 539 00:37:02,930 --> 00:37:05,040 That's the good part here. 540 00:37:05,040 --> 00:37:09,740 And then this final step to get u particular, 541 00:37:09,740 --> 00:37:14,000 I have to put the inverse of that guy over there, 542 00:37:14,000 --> 00:37:17,010 but because this has orthogonal column, that's 543 00:37:17,010 --> 00:37:19,040 just Q_1 transpose. 544 00:37:19,040 --> 00:37:22,720 So there we go. 545 00:37:22,720 --> 00:37:28,900 That's the inverse of R. So that's 546 00:37:28,900 --> 00:37:32,660 what I should've done in 18.06 and never did, 547 00:37:32,660 --> 00:37:36,430 and you get to see it. 548 00:37:36,430 --> 00:37:39,330 What's a convenient particular solution? 549 00:37:39,330 --> 00:37:46,540 Everybody knows, we got a whole collection 550 00:37:46,540 --> 00:37:47,780 of particular solutions. 551 00:37:47,780 --> 00:37:51,950 We want to choose one that's nice and stable. 552 00:37:51,950 --> 00:37:54,360 And the reason it's stable is that it 553 00:37:54,360 --> 00:37:59,670 works with orthogonal columns, orthonormal even, 554 00:37:59,670 --> 00:38:07,630 and triangular matrix for which linear systems are 555 00:38:07,630 --> 00:38:08,760 highly active. 556 00:38:08,760 --> 00:38:09,500 OK. 557 00:38:09,500 --> 00:38:11,350 So that's the particular solution. 558 00:38:11,350 --> 00:38:13,320 Now what's the null space solution? 559 00:38:13,320 --> 00:38:21,240 What are the general solutions to null space part? 560 00:38:21,240 --> 00:38:25,120 What are the solution to those? 561 00:38:25,120 --> 00:38:29,190 Well, I can just go down the same steps. 562 00:38:29,190 --> 00:38:35,310 This is R transpose Q_1 u null space equals 0. 563 00:38:38,480 --> 00:38:41,620 I multiply both sides -- this is a nice square, 564 00:38:41,620 --> 00:38:47,500 invertible matrix -- I multiply by its inverse, kills that. 565 00:38:47,500 --> 00:39:00,620 So now I have Q_1 times the -- Q_1 is really the heart of B. 566 00:39:00,620 --> 00:39:07,830 So what vectors are perpendicular to Q_1? 567 00:39:07,830 --> 00:39:09,670 I hope I've got this right. 568 00:39:09,670 --> 00:39:13,440 It's easy to mix up a transpose in the process. 569 00:39:13,440 --> 00:39:16,560 So let me just pause to be sure I'm doing it correctly. 570 00:39:22,120 --> 00:39:23,540 OK. 571 00:39:23,540 --> 00:39:24,610 I hope. 572 00:39:24,610 --> 00:39:26,460 Did I check that I get it right? 573 00:39:29,220 --> 00:39:30,020 Yes. 574 00:39:30,020 --> 00:39:30,520 OK. 575 00:39:38,060 --> 00:39:41,850 I could have written what B is here, since I 576 00:39:41,850 --> 00:39:46,130 have B transpose as a product. 577 00:39:46,130 --> 00:39:56,570 B is R_0, Q_1, Q_2. 578 00:39:56,570 --> 00:39:58,770 OK. 579 00:39:58,770 --> 00:40:05,360 And I want to multiply by u_null and get 0. 580 00:40:05,360 --> 00:40:06,760 OK. 581 00:40:06,760 --> 00:40:08,680 So what should u_null be? 582 00:40:12,840 --> 00:40:22,980 u_null should be -- this part is giving us a 0, 583 00:40:22,980 --> 00:40:25,340 so this is like gone. 584 00:40:25,340 --> 00:40:31,070 So you see the two are the same. 585 00:40:31,070 --> 00:40:38,580 So what vectors are perpendicular to those? 586 00:40:38,580 --> 00:40:50,360 The answer is the u_null is a combination of the columns 587 00:40:50,360 --> 00:40:57,170 of Q_2 transpose. 588 00:40:57,170 --> 00:41:01,750 It's the Q_2 part that's telling us about the null space. 589 00:41:01,750 --> 00:41:05,670 It was the Q_1 part that gave the particular solution, 590 00:41:05,670 --> 00:41:08,750 it's the Q_2 part that gives the general solution. 591 00:41:08,750 --> 00:41:15,000 In other words, u_null is Q_2 transpose times 592 00:41:15,000 --> 00:41:21,280 any vector, let me call it z, this is any z. 593 00:41:21,280 --> 00:41:23,530 OK. 594 00:41:23,530 --> 00:41:28,050 Now this has my n minus p degrees of freedom. 595 00:41:35,430 --> 00:41:38,390 Sorry, I'm trying to do quite a bit here. 596 00:41:38,390 --> 00:41:39,990 I'm trying to say how you actually 597 00:41:39,990 --> 00:41:49,810 solve rectangular systems when they're not determinate. 598 00:41:52,410 --> 00:41:54,730 There are many solutions. 599 00:41:54,730 --> 00:41:57,810 This is a good particular solution to find, 600 00:41:57,810 --> 00:42:00,670 and this is a good way to find the general solution, 601 00:42:00,670 --> 00:42:01,670 the null space solution. 602 00:42:01,670 --> 00:42:10,580 This is a combination of the other columns. 603 00:42:10,580 --> 00:42:11,890 OK. 604 00:42:11,890 --> 00:42:14,030 All right, now we're done really, 605 00:42:14,030 --> 00:42:19,830 because I now know what u looks like; u 606 00:42:19,830 --> 00:42:23,690 looks like this part, which I've computed, 607 00:42:23,690 --> 00:42:27,940 and this part, which has the freedom. 608 00:42:27,940 --> 00:42:30,370 Let me put those two parts together. 609 00:42:30,370 --> 00:42:37,200 So now I want to minimize -- so I'm near the end here -- 610 00:42:37,200 --> 00:42:41,440 A*u minus B, but A*u is u_particular, 611 00:42:41,440 --> 00:42:44,660 and I have u_particular here. 612 00:42:44,660 --> 00:42:50,210 Q_1 transpose R minus transpose d, 613 00:42:50,210 --> 00:42:55,900 that's u_particular, plus u_null, and that's this. 614 00:42:55,900 --> 00:43:06,430 This u_null was also here; u_null was any Q_2 transpose z, 615 00:43:06,430 --> 00:43:07,640 right. 616 00:43:07,640 --> 00:43:10,480 All that is u, A*u minus b. 617 00:43:15,880 --> 00:43:16,420 OK. 618 00:43:19,730 --> 00:43:23,050 Up to possibly screwing up on some transposes, 619 00:43:23,050 --> 00:43:26,790 this is the right method. 620 00:43:26,790 --> 00:43:29,400 So this is a fixed solution. 621 00:43:29,400 --> 00:43:31,930 I just want to write that as a different way. 622 00:43:31,930 --> 00:43:37,350 Minimize A Q transpose z. 623 00:43:37,350 --> 00:43:41,380 Now, we're minimizing over the z's. 624 00:43:41,380 --> 00:43:50,300 So u had n components, but somehow p degrees of freedom 625 00:43:50,300 --> 00:43:53,950 were used up by the constraint B*u equal d. 626 00:43:53,950 --> 00:43:57,320 And we have the n minus p true degrees of freedom 627 00:43:57,320 --> 00:43:58,610 are in the z. 628 00:43:58,610 --> 00:44:03,430 So there's this minus the b. 629 00:44:03,430 --> 00:44:05,410 This is all known stuff. 630 00:44:05,410 --> 00:44:13,000 A Q_1 transpose R minus transpose d, square. 631 00:44:13,000 --> 00:44:14,040 OK. 632 00:44:14,040 --> 00:44:17,020 I'm there. 633 00:44:17,020 --> 00:44:20,830 So this is a standard minimization problem. 634 00:44:20,830 --> 00:44:26,850 Minimize, shall I call this A tilde z? 635 00:44:26,850 --> 00:44:30,850 And I'll call all this stuff b tilde. 636 00:44:30,850 --> 00:44:34,410 And the solution is found from the normal equations 637 00:44:34,410 --> 00:44:42,250 A tilde transpose, A tilde times the best 638 00:44:42,250 --> 00:44:46,760 z, I'll put a hat on it to emphasize 639 00:44:46,760 --> 00:44:53,131 that it's the great one, is A tilde transpose b tilde. 640 00:44:53,131 --> 00:44:53,630 OK. 641 00:44:57,500 --> 00:45:02,050 Finished that process without leaving myself 642 00:45:02,050 --> 00:45:05,210 a lot of time for the other method. 643 00:45:05,210 --> 00:45:13,540 Conclusion here, that after you've done the QR step, the qr 644 00:45:13,540 --> 00:45:20,570 command, and then after you solved a linear system 645 00:45:20,570 --> 00:45:27,930 with the R transpose, and you've multiplied by Q's, and you've 646 00:45:27,930 --> 00:45:32,810 ended up with this problem with a new matrix A tilde and B 647 00:45:32,810 --> 00:45:36,230 tilde, then you just do the normal equation. 648 00:45:36,230 --> 00:45:44,580 The web will have the code that takes those steps, 649 00:45:44,580 --> 00:45:47,150 reaches this conclusion, and solves it. 650 00:45:47,150 --> 00:45:47,840 OK. 651 00:45:47,840 --> 00:45:50,760 So that's the null space method. 652 00:45:50,760 --> 00:45:56,150 And it would be our method of choice when -- 653 00:45:56,150 --> 00:46:01,290 so z has n minus p components. 654 00:46:01,290 --> 00:46:05,600 If p is near n, then they're not many z's and this 655 00:46:05,600 --> 00:46:07,340 is highly efficient. 656 00:46:07,340 --> 00:46:07,910 OK. 657 00:46:07,910 --> 00:46:10,940 So the null space method is one way to go. 658 00:46:10,940 --> 00:46:13,470 Can I just in the remaining minutes 659 00:46:13,470 --> 00:46:22,040 go back to the Lagrange multiplier idea? 660 00:46:22,040 --> 00:46:23,900 So what's the Lagrange multiplier idea? 661 00:46:26,990 --> 00:46:28,980 So let me write the problem again 662 00:46:28,980 --> 00:46:30,550 as Lagrange would like it. 663 00:46:30,550 --> 00:46:38,060 Minimize A*u minus b squared subject to B*u equal d. 664 00:46:41,140 --> 00:46:44,290 That's the problem we're solving. 665 00:46:44,290 --> 00:46:46,940 I should have written it earlier. 666 00:46:46,940 --> 00:46:50,320 Let me put a star here, because this is our problem. 667 00:46:50,320 --> 00:46:51,380 OK. 668 00:46:51,380 --> 00:46:55,600 So one way to tackle it was take that constraint, 669 00:46:55,600 --> 00:46:57,200 give it a heavy weight. 670 00:46:57,200 --> 00:46:59,460 That was method one. 671 00:46:59,460 --> 00:47:06,050 Method two was solve this constraint in full detail, 672 00:47:06,050 --> 00:47:12,350 get the z's that remain as degrees of freedom, 673 00:47:12,350 --> 00:47:16,180 plug in u_particular plus u null space into here, 674 00:47:16,180 --> 00:47:18,800 and then you have a problem in the z. 675 00:47:18,800 --> 00:47:20,360 That's method two. 676 00:47:20,360 --> 00:47:24,400 Now, so method three is Lagrange. 677 00:47:24,400 --> 00:47:28,440 So method three would say OK, what does Lagrange do? 678 00:47:28,440 --> 00:47:37,250 L, we call it the Lagrangian, he takes this A*u minus b square, 679 00:47:37,250 --> 00:47:40,590 and adds to it some Lagrange multiplier, 680 00:47:40,590 --> 00:47:48,630 and I'll use maybe the standard lambda, times B*u minus d, 681 00:47:48,630 --> 00:47:49,130 right? 682 00:47:49,130 --> 00:47:50,250 That's Lagrange's idea. 683 00:47:56,030 --> 00:47:58,230 You recognize Lagrange's idea. 684 00:47:58,230 --> 00:48:02,390 Takes the constraint, multiply it by a multiplier. 685 00:48:02,390 --> 00:48:12,110 In fact this is p constraints, so p lambdas. 686 00:48:12,110 --> 00:48:16,390 Lambda's a vector of p multipliers. 687 00:48:16,390 --> 00:48:19,200 Not just a single one, because we've got the p constraints. 688 00:48:19,200 --> 00:48:20,740 And now what does Lagrange do? 689 00:48:23,810 --> 00:48:30,770 He sets the derivative dL / d lambda -- well, 690 00:48:30,770 --> 00:48:34,070 so let me do the dL/du first. 691 00:48:34,070 --> 00:48:40,570 He sets dL/du to 0, and dL / d lambda to 0. 692 00:48:44,464 --> 00:48:46,130 I could've started out with this method, 693 00:48:46,130 --> 00:48:50,310 because it's going to lead us to the equations faster. 694 00:48:50,310 --> 00:48:53,230 What equations do we get from dL/du equals 0? 695 00:48:53,230 --> 00:48:56,550 What's the gradient with respect to u? 696 00:48:56,550 --> 00:48:59,770 That gives us A transpose A*u. 697 00:48:59,770 --> 00:49:04,200 Oh, probably we want a 1/2 here, so that the numbers 698 00:49:04,200 --> 00:49:05,220 come out right. 699 00:49:05,220 --> 00:49:13,730 We get A transpose A*u, and another u part will be the B 700 00:49:13,730 --> 00:49:15,750 lambda. 701 00:49:15,750 --> 00:49:20,760 Taking the derivative of u will produce a B transpose lambda 702 00:49:20,760 --> 00:49:22,430 out of that. 703 00:49:22,430 --> 00:49:26,050 Yeah, a B transpose lambda out of that. 704 00:49:26,050 --> 00:49:29,800 And then, in here will be a linear term in u 705 00:49:29,800 --> 00:49:31,990 that we might as well put on the right-hand side 706 00:49:31,990 --> 00:49:35,670 as A transpose b. 707 00:49:35,670 --> 00:49:37,400 Familiar. 708 00:49:37,400 --> 00:49:39,010 OK. 709 00:49:39,010 --> 00:49:41,580 And what about dL / d lambda? 710 00:49:41,580 --> 00:49:48,050 Well that's just our constraint, B*u equals d right. 711 00:49:51,230 --> 00:49:53,110 Having built in the constraint, when 712 00:49:53,110 --> 00:49:55,520 I take the derivative with respect to lambda, 713 00:49:55,520 --> 00:49:57,520 the constraint just comes back again. 714 00:49:57,520 --> 00:50:01,700 So this is now method three. 715 00:50:01,700 --> 00:50:04,440 Solve that system. 716 00:50:04,440 --> 00:50:09,110 And I guess what I want say in the remaining 30 seconds 717 00:50:09,110 --> 00:50:15,640 is that solving this system is the same as this one. 718 00:50:15,640 --> 00:50:19,660 Those two are exactly the same. 719 00:50:19,660 --> 00:50:24,080 So that's a system with three parts, but I can, as always -- 720 00:50:24,080 --> 00:50:27,420 maybe I can even get there. 721 00:50:27,420 --> 00:50:30,720 Can you see that if I take this part 722 00:50:30,720 --> 00:50:36,210 and I subtract A transpose times the top row 723 00:50:36,210 --> 00:50:41,690 from the bottom row, what will that give me? 724 00:50:41,690 --> 00:50:46,020 Let me just hope that it works, well I won't actually. 725 00:50:46,020 --> 00:50:49,780 Time is up, it's asking too much to do even 726 00:50:49,780 --> 00:50:53,890 this one piece of linear algebra that can be in the notes. 727 00:50:53,890 --> 00:50:58,740 So this system that we got as the correct limit equation 728 00:50:58,740 --> 00:51:01,980 is exactly the same one that Lagrange gets. 729 00:51:01,980 --> 00:51:03,160 So that's one way. 730 00:51:03,160 --> 00:51:07,660 This is a system with n plus p unknowns. 731 00:51:11,500 --> 00:51:15,020 That's the price you pay for going Lagrange's route. 732 00:51:15,020 --> 00:51:17,060 You add p unknowns. 733 00:51:17,060 --> 00:51:23,330 This was a system with n minus p unknowns. 734 00:51:23,330 --> 00:51:26,090 That's because you're using the constraints 735 00:51:26,090 --> 00:51:28,300 to reduce the problem. 736 00:51:28,300 --> 00:51:34,780 And the original method one was a method with n unknowns, 737 00:51:34,780 --> 00:51:35,850 the unknowns in u. 738 00:51:35,850 --> 00:51:41,600 So you have the choice n, n plus p that Lagrange would like, 739 00:51:41,600 --> 00:51:46,910 and n minus p that Golub/van Loan would prefer. 740 00:51:46,910 --> 00:51:53,810 And usually it's method two or method three is recommended, 741 00:51:53,810 --> 00:51:56,120 but method one often used. 742 00:51:56,120 --> 00:51:56,690 OK. 743 00:51:56,690 --> 00:52:06,160 So that's the lecture on this point. 744 00:52:06,160 --> 00:52:07,110 That's today. 745 00:52:07,110 --> 00:52:13,410 And then next week comes the whole class of problems 746 00:52:13,410 --> 00:52:17,120 like finding velocities from displacements, 747 00:52:17,120 --> 00:52:20,110 where alpha is a small parameter. 748 00:52:20,110 --> 00:52:26,650 And then after that come discussions of the completed 749 00:52:26,650 --> 00:52:32,020 project ones, and the upcoming extensions into project two. 750 00:52:32,020 --> 00:52:32,620 OK. 751 00:52:32,620 --> 00:52:34,560 See you next week, thanks. 752 00:52:34,560 --> 00:52:35,810 Good.