The following content is provided by MIT OpenCourseWare under a Creative Commons license. Additional information about our license and MIT OpenCourseWare in general is available at ocw.mit.edu.
PROFESSOR: For me, this is the third and last major topic of the course. The first one was initial value problems: stability, accuracy. Topic two was solving large linear systems by iterative methods, and also by direct methods like reordering the equations. Now, topic three is the whole world of optimization. In reality it means you're minimizing, or possibly maximizing, some expression. That expression could be a function of several variables. We could be in the discrete case, so maybe I'll just emphasize that we have discrete optimization. That's in R^n, discrete in n dimensions, and it will include some famous areas as small subsets, really, of the big picture. For example, one subset would be linear programming. That's a very special but important case of problems where the cost is linear and the constraints are linear, and it has its own special methods.
So I think that's worth considering on its own, later, in a lecture of its own. Another, bigger picture (bigger for us, actually) will be quadratic programming, where the quantity being minimized is a quadratic function. Now, what's good about a quadratic function? Its derivatives are linear. So that leads us to linear equations, but always with constraints. We don't have a free choice of any vector in R^n. We have constraints on the vectors; maybe they have to solve a linear system of their own. We might be in 100 dimensions and have 10 linear equations that the unknowns have to satisfy. So in some way we're really in 90 dimensions. But how should we treat those constraints? Well, you know that Lagrange multipliers play a role. So there's a big area, a very big area, of discrete optimization. Then there are also the continuous problems, where the unknown is a function: a function u of x, I'll say, or u of x and y. It's a function. That's why I refer to that area as continuous optimization.
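The remark that a quadratic has linear derivatives, and that 10 constraints in 100 dimensions leave 90, can be written out. This is a sketch only: the symmetric matrix K, the vector f, and the constraint data C, d are hypothetical names, not notation from the lecture.

```latex
P(u) = \tfrac{1}{2}\, u^T K u - u^T f, \qquad K = K^T
\nabla P(u) = K u - f = 0 \quad\Longleftrightarrow\quad K u = f \quad \text{(a linear system)}
% constraints C u = d, with C of size 10 x 100 and independent rows:
\dim \{\, u \in \mathbb{R}^{100} : C u = d \,\} = 100 - 10 = 90 \quad \text{(when the set is nonempty)}
```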
Well, first, you always want to get an equation, which in some way is going to be "derivative equals zero." When we learn about minimization in elementary calculus, somewhere along the line there's going to be an equation that says something like derivative equals zero. But, of course, we have to account for the constraints. And we have to ask: the derivative of what, when our unknown is a function? I'm just going to write down the name of the topic within mathematics where this is usually expressed. In calculus that would be the derivative, but in the case of functions it's often called the calculus of variations. That's just so you see that phrase; there are books with that title. The idea is: what is that derivative when our unknown is a function and the objective we're trying to minimize is an integral of the function and its derivatives? All sorts of possibilities there. So that's a very quick overview of a field that we'll soon know a lot about. I was trying to think where to start. I think it had better be discrete. I want to get to the system of equations that you constantly see.
So let me use, as an example, the most basic problem of all (maybe I'll start over here): the problem of least squares, which I'll express this way. I'm given a matrix A and a right-hand side b, and I want to minimize the length squared of A*u minus b, that is, ||A*u - b||^2. That would be a first problem to which we could apply calculus, because it's a straight minimization and I haven't put any constraints in there yet. We could also apply linear algebra, and actually linear algebra is going to throw a little extra light. So, just what am I thinking of here? I'm thinking of A as being m by n, with m larger than n. If A is a square (invertible) matrix, then this problem is the same as solving A*u = b, and, of course, we'll always reduce to that case if m equals n. But to focus on the problems I'm really thinking about: n is the number of unknowns. It's the size of u. m, the larger number, is the number of measurements, the number of data points, so it's the number of equations, and it's the size of b. So we have more equations than unknowns. You've met least squares before.
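To make the setup concrete, here is a minimal sketch of an overdetermined least squares problem with m = 3 equations and n = 2 unknowns. The particular A and b are a hypothetical small example chosen for clean numbers; they are not from the lecture. `np.linalg.lstsq` minimizes ||A*u - b||^2 directly.

```python
import numpy as np

# Hypothetical data: m = 3 measurements (rows), n = 2 unknowns (columns), m > n.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

m, n = A.shape
assert m > n  # more equations than unknowns: A u = b generally has no exact solution

# lstsq minimizes ||A u - b||^2 (LAPACK's SVD-based solver underneath).
u_hat, residual_ss, rank, _ = np.linalg.lstsq(A, b, rcond=None)

print("u_hat =", u_hat)                    # approximately [ 5. -3.]
print("min ||A u - b||^2 =", residual_ss)  # approximately [6.]
```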
I hope that, even in these few minutes, a little new light will be shed on least squares. So here's our problem, and calculus could lead us to the equation for the best u. Here u stands for u_1, u_2, up to u_n; there are n components in that vector u. Maybe you know the equation. I hope you know it, because it's the key to so many applications. If I just write it, it will sound as if the problem is over. Let me write it, though. This key equation comes up in statistics, for example; that's one of a hundred examples, but in statistics this is the topic of linear regression. They gave the name normal equation to the equation that gives u directly. Do you remember what it is, the normal equation, the equation for the minimizing u that we could find by calculus? It involves the key matrix A transpose A. Let me call u hat the minimizer, the winner in this competition. The right-hand side of the equation is A transpose b. So the normal equation is A transpose A u hat = A transpose b.
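Solving the normal equation A^T A u hat = A^T b directly gives the same minimizer as a library least squares routine. This sketch reuses the same hypothetical A and b as a small illustrative example.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # hypothetical example data
b = np.array([6.0, 0.0, 0.0])

# Normal equation: (A^T A) u_hat = A^T b
AtA = A.T @ A                  # n x n, here [[3, 3], [3, 5]]
Atb = A.T @ b                  # here [6, 0]
u_hat = np.linalg.solve(AtA, Atb)

# Cross-check against the library least squares solver.
u_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(u_hat, np.allclose(u_hat, u_ref))  # approximately [ 5. -3.] True
```

A design note: forming A^T A squares the condition number of A, so for ill-conditioned data a QR- or SVD-based solver is usually preferred. For a small, well-conditioned example like this one, the two routes agree.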
I won't go back and derive it directly, though I'll probably end up deriving it anyway, because you can't help approaching that equation from one side or another. As I say, one way to approach it would be to write out that sum of squares and take its derivative, and you would get a linear equation. Again, u hat stands for the u that gives the minimum. Now let me put in a little linear algebra. This matrix A transpose A is obviously symmetric. Its important property, beyond the symmetry, is that it's positive definite. Well, when I say positive definite there's always a proviso, of course: I haven't eliminated the degenerate case yet. A has m rows, many, many rows, and a smaller number n of columns. Let's assume that those columns are linearly independent, so that we really do have n unknowns. If, say, all the columns were the same, then A*u would just be multiplying that one column, and there would really be only one unknown. So I'm going to say that A has rank n, by which I mean n independent columns.
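The calculus route mentioned above (write out the sum of squares, take its derivative, set it to zero) can be sketched in matrix form. The last two lines give the symmetry of A^T A and the reason independent columns make it positive definite.

```latex
\|Au-b\|^2 = (Au-b)^T(Au-b) = u^T A^T A\, u - 2\, u^T A^T b + b^T b
\nabla_u \|Au-b\|^2 = 2A^T A\, u - 2A^T b = 0 \;\Longrightarrow\; A^T A\, \hat u = A^T b
(A^T A)^T = A^T (A^T)^T = A^T A \quad \text{(symmetric)}
u^T (A^T A)\, u = \|Au\|^2 > 0 \ \text{for } u \neq 0, \ \text{since independent columns give } Au \neq 0
```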
In that case, that's what guarantees that A transpose A is positive definite. Let me draw an arrow there; these are the same statement. If I say about A that its columns are independent, then I'm saying about A transpose A that it is positive definite. That means all its eigenvalues are positive; it's certainly invertible; all its pivots are positive. It's the great class of matrices. But I don't really want to start with that equation. Here's my point. In optimization, a key word that I'd better get on the board, maybe up here to show that it's really important, is the idea of duality. The effect of duality, if I just give a first mention of that word, is that very often in optimization there are really two problems. They don't look identical, but in some important way each one is a statement of the task ahead of us. What are the two problems, the two dual problems, in this basic example of least squares? All right, here's a good picture. Let me put it on this board so I can recover it.
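The claims about positive definiteness (positive eigenvalues, invertibility, positive pivots) can be checked numerically. This sketch uses the same hypothetical A as before, plus a hypothetical degenerate A_dep whose columns are all equal. Cholesky factorization K = L L^T succeeds only for positive definite K, and the elimination pivots are the squares of the diagonal entries of L.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # independent columns, rank 2
K = A.T @ A                                         # [[3, 3], [3, 5]]

eigs = np.linalg.eigvalsh(K)      # eigenvalues of a symmetric matrix, ascending
L = np.linalg.cholesky(K)         # lower-triangular L with K = L @ L.T
pivots = np.diag(L) ** 2          # pivots of elimination on K

# Hypothetical degenerate case: all columns equal, so rank(A_dep) = 1 < n.
A_dep = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
K_dep = A_dep.T @ A_dep           # [[3, 3], [3, 3]] is singular
eigs_dep = np.linalg.eigvalsh(K_dep)

print(eigs)     # approximately [0.838 7.162], i.e. 4 -/+ sqrt(10)
print(pivots)   # approximately [3. 2.]
print(np.linalg.matrix_rank(A_dep), eigs_dep)  # rank 1; smallest eigenvalue is about 0
```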
So, minimize ||A*u - b||^2. I think of the vector b as being in m dimensions, so this picture is in m dimensions. Now, what about A*u? Where will A*u go in this picture? The candidates A*u come from multiplying A by any vector u, which means A*u is a combination of the columns of A. So the possible vectors A*u lie in a subspace. This is the subspace of all possible vectors A*u, and it's only n-dimensional, because I have only n parameters in u, only n columns in A. So I think of the set of all A*u's as, you could say, a plane: an n-dimensional plane within the bigger space R^m. In 18.06 I would call that subspace, that plane, the column space of A; the range of A is another expression that you'll see. That's all the possible A*u's, and here's b, which isn't one of the possible A*u's. So where is u hat? Where is the best A*u now, the one that's closest to b? Now comes another central word in this subject. I'm going to draw it here.
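One way to confirm that b "isn't one of the possible A*u's" is a rank test. Here is a sketch with the same hypothetical data: b lies in the column space exactly when appending it as an extra column does not increase the rank.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # hypothetical example data
b = np.array([6.0, 0.0, 0.0])

# Column space of A: all combinations A u, an n-dimensional "plane" in R^m.
rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
b_in_column_space = rank_Ab == rank_A

print(rank_A, rank_Ab, b_in_column_space)  # 2 3 False
```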
That will be my best A*u, which I'm calling A u hat. If the picture seems reasonable to your eye, this is the vector in the plane closest to b. What's the geometry here? See, that's what I wanted to see: a little geometry and a little algebra, not just calculus. What's the connection between b and that closest vector? We're minimizing the distance. I might call this distance here the error vector e. It is as small as possible; that's what's being minimized. This is Pythagoras here. Of course, when I say it's Pythagoras, I'm already saying the most important point: that this is a right angle. That's a right angle. The way we know geometrically that A u hat is the closest A*u is that the line from b to the plane is perpendicular to the plane. This error vector e is perpendicular to the plane. There's a good word that everybody uses for this vector A u hat.
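The right angle can be verified numerically. This sketch uses the same hypothetical A and b. The error e = b - A u hat has zero inner product with every column of A, and Pythagoras gives ||b||^2 = ||A u hat||^2 + ||e||^2.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # hypothetical example data
b = np.array([6.0, 0.0, 0.0])

u_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ u_hat        # closest vector to b in the column space: [5, 2, -1]
e = b - p            # error vector: [1, -2, 1]

# e is perpendicular to every column of A, hence to every vector A u.
print(A.T @ e)                   # approximately [0. 0.]
# Right angle at p, so ||b||^2 = ||p||^2 + ||e||^2 (here 36 = 30 + 6).
print(b @ b, p @ p + e @ e)
```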
Take a vector b that's not on a plane: what's the word for the nearest vector in the plane?
AUDIENCE: Projection.
PROFESSOR: Projection. So this is the projection; orthogonal projection, if I wanted to really emphasize the fact that that's a right angle. That gives me a geometric way to see the least squares problem. Now comes the point where we see the dual problem. The dual problem will be here, if I draw the perpendicular subspace. What's the dimension of that subspace? It contains all the vectors perpendicular to the plane. If m is 3, so we're in three dimensions, and our plane is an ordinary two-dimensional plane, then the dimension is one: that's the perpendicular line. But thinking bigger, if we're in m dimensions and this plane is n-dimensional, then the true dimension of the perpendicular subspace is m minus n, which could be pretty substantial. Anyway, that's the perpendicular subspace. If the plane is the column space of A, I can figure out which vectors are perpendicular to the columns of A. That's really what I mean.
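As a sketch with the same hypothetical A, the orthogonal projection onto the column space can be written as the matrix P = A (A^T A)^{-1} A^T, and I - P projects onto the perpendicular subspace. For a projection matrix the rank equals the trace, which confirms the dimensions n and m - n.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # hypothetical example data
m, n = A.shape

# Orthogonal projection onto the column space: P = A (A^T A)^{-1} A^T
P = A @ np.linalg.solve(A.T @ A, A.T)
# Projection onto the perpendicular subspace.
Q = np.eye(m) - P

print(np.allclose(P @ P, P), np.allclose(P, P.T))  # True True
# For a projection, rank = trace: n for P and m - n for Q.
print(round(np.trace(P)), round(np.trace(Q)))      # 2 1
```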
I've drawn it as a line, but I've written its dimension up there so that you see it; I just don't know how to draw a bigger subspace. You would have to see that all the vectors in it are perpendicular to all of these vectors. Do you see what I'm saying? If we were stuck thinking in three dimensions, then if I make this a plane, I can't make that a plane. If I make m equal to 3 and n equal to 2, I've only got a line left to be perpendicular. But in higher dimensions there are lots of dimensions left. So what's my dual problem? My dual problem is: find the vector in this perpendicular subspace that is closest to b. In other words, by the same reasoning, take the vector b and project it orthogonally onto this subspace; that same right angle is going to be there. I've said what's in this subspace, but I haven't written it yet. But you could tell me already: what's that vector? One answer would be that it's the projection of b onto this perpendicular subspace.
So you see, we're really taking the vector b and separating it into two components: one in the column space, the other perpendicular to the column space. So just tell me what that vector is. It is e. Same guy. In other words, e is the solution to the dual problem. Maybe I'll call the projection onto the plane p, for the best vector in the plane. Then e is the best vector in the perpendicular subspace, and they add up to b. So we're really taking the vector b and splitting it into a part p in the column space and a part e in the perpendicular space: b = p + e. If I just write down the equations for that, I'll see what's cooking. Well, what I have to do is remember the equations for being in this subspace, for being perpendicular to the columns. I can't go further without remembering what's in that subspace. Everything in that subspace is perpendicular to the columns of A. Let me just write down what that means. Let me use the letter y for the vectors in that subspace, and e for the winning vector, the projection. So y will be the vectors in that subspace.
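Here is a sketch of the dual side with the same hypothetical data. The code projects b onto the perpendicular subspace and checks that the result is the same e as the primal error. As an implementation choice not taken from the lecture, the last m - n left singular vectors of A from `np.linalg.svd` serve as an orthonormal basis Z for that subspace. This is valid because A has rank n.

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # hypothetical example data
b = np.array([6.0, 0.0, 0.0])
m, n = A.shape

# Primal: p = A u_hat, the projection of b onto the column space.
u_hat = np.linalg.lstsq(A, b, rcond=None)[0]
p = A @ u_hat
e = b - p

# Orthonormal basis Z (m x (m - n)) for the vectors perpendicular to all columns.
U, s, Vt = np.linalg.svd(A)      # full_matrices=True by default, so U is m x m
Z = U[:, n:]

# Dual: the projection of b onto that perpendicular subspace.
y_hat = Z @ (Z.T @ b)

print(np.allclose(y_hat, e))       # True: the dual winner is the primal error e
print(np.allclose(p + y_hat, b))   # True: b splits into p + e
```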
So those vectors are perpendicular: this is the subspace of all the y's in here. Now, what's the condition on a y in this perpendicular subspace? What do I mean? I mean that y is perpendicular to the columns of A. How shall I write that? Perpendicular means inner product zero. So I want to turn the columns into rows, take the inner product with y, and get zeros: zero, zero, zero. This is column 1 transposed, to be a row. I'm trying to express this requirement in terms of the matrix. To be perpendicular to the first column means that the inner product of the first column with y should be zero. The inner product of the second column with y should be zero. The inner product of the n-th column with y should be zero. So what matrix have I got here? What's the condition on the y's, simple and beautiful? What matrix is that? It's A transpose. So that perpendicularity is completely expressed by the equation A transpose y = 0.
That tells me the y's, and, of course, e is going to be one of the y's. I could call it y hat, but I've already named it e. It's the particular one that's closest to b. The y's are everybody along here: this is the null space of A transpose. So in words, I would say that the perpendicular subspace is the null space of A transpose. When you did linear algebra, you'll remember that the null space of A transpose (let me write it) is perpendicular to the column space of A. That's the fundamental theorem of linear algebra right there. Now we're just using it again to see what the two dual problems are here. The primal problem, the one we stated first, was this one. I'll call this the primal: P for primal. What is the dual problem? The dual problem is the problem about the y's, not about the u's at all. That's the beauty of this duality. One problem is about u's, and it ends up projecting onto the column space.
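The perpendicularity of the null space of A transpose and the column space of A follows in one line from A^T y = 0. The dimension count assumes, as in the lecture, that A has rank n.

```latex
A^T y = 0 \;\Longrightarrow\; (Au)^T y = u^T (A^T y) = u^T 0 = 0 \quad \text{for every } u
\dim C(A) + \dim N(A^T) = n + (m - n) = m
```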
The second problem is about y's, and it ends up projecting onto this perpendicular space; it was a projection too. So the dual problem is: minimize the distance from b to y, ||b - y||^2, but with the constraint (and now I get to use that word) A transpose y = 0. So there is the other problem. I hope your eye can travel between the two; let me write the primal again underneath it: minimize ||A*u - b||^2. The dual is the one whose solution is e, and the primal is the one whose solution is u hat, with projection p = A u hat. What I'm trying to say is that there's a very important connection between the two problems. First of all, the two problems use the same data: the same vector b and the same matrix A. Notice that in one problem it's A, and in the other problem it's the transpose that appears. That's very common; we'll always see that. But there's something a little different about the two problems. The primal problem was unconstrained: any u was allowed.
The dual problem was constrained: only a subset of y's, only that subspace of y's, was allowed. The primal is a problem with n unknowns. The dual is a problem with m minus n unknown variables, once we've accounted for the constraints. That's one thing I'm thinking about. Often, the problem will come with a constraint; maybe I'll do a physical example right away. In other words, suppose you were given this dual problem. How would you deal with it? That's like the first question in optimization, or one of the central questions: how do you deal with a constraint? If we just minimized ||b - y||^2, of course, the minimum would be at y = b. But that fails to take into account the constraint on y. So how do you take constraints into account and end up with an equation? We can see in this picture that somehow or other we ended up with the normal equation, but actually I would rather end up with a primal-dual equation: an equation for the best u and the best y. So what will that be?
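One standard answer to "how do you take constraints into account" is a Lagrange multiplier for each of the n constraints in A^T y = 0. This is sketched here ahead of the lecture's own treatment; the multiplier vector λ and the factor 1/2 are our own choices. Setting both partial derivatives to zero gives exactly the pair of equations for the best y and the best u, and the multiplier turns out to be u hat.

```latex
L(y, \lambda) = \tfrac{1}{2}\, \|b - y\|^2 + \lambda^T (A^T y)
\frac{\partial L}{\partial y} = y - b + A\lambda = 0 \quad\Longrightarrow\quad y + A\lambda = b
\frac{\partial L}{\partial \lambda} = A^T y = 0
% With A of rank n these have a unique solution, with y = \hat y = e and \lambda = \hat u.
```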
So now I need two equations that connect the best u and the best y, and probably this picture is going to be the key. This p is A u hat, right? So that will be one of my equations, and this will be the other. Let me see... well, OK. I don't know what to do now. Here I've called it y; over here it's e. I've got myself in a corner. Maybe I should call e y hat; would you like that? We have in mind that it's e, the error in the primal problem. But just to make the notation for the two problems consistent, let me call the winner here y hat and the winner here u hat. What's the relation? Let me write down the relation between the two. Well, it's over there. Let's see, is that right? Yes: y hat plus A u hat is b, and A transpose y hat is zero. That's it. That's it. That's the pair of equations that connects the primal and the dual and solves them both, solves each one. Really, it's a system; you could say it's a block equation.
433 00:33:09,110 --> 00:33:21,330 The block matrix being identity, A; A transpose, zero; 434 00:33:21,330 --> 00:33:27,340 the unknown being the y and the u. 435 00:33:27,340 --> 00:33:30,530 The right-hand side being the data, which in this case 436 00:33:30,530 --> 00:33:31,650 was the b. 437 00:33:38,850 --> 00:33:41,430 I guess what I want to do is emphasize, 438 00:33:41,430 --> 00:33:45,740 in what's coming for the month of April, 439 00:33:45,740 --> 00:33:50,060 the importance of this class of problems. 440 00:33:54,040 --> 00:33:58,740 It's dealing with two -- it's dealing with the primal 441 00:33:58,740 --> 00:34:02,250 and the dual at the same time. 442 00:34:02,250 --> 00:34:07,770 It's important for so many reasons that I can't say them 443 00:34:07,770 --> 00:34:09,550 all on the first day. 444 00:34:09,550 --> 00:34:13,370 That would be a mistake, to try to say everything 445 00:34:13,370 --> 00:34:13,960 the first day. 446 00:34:13,960 --> 00:34:18,960 But let me just say something -- that linear programming, 447 00:34:18,960 --> 00:34:22,780 which is just one example, and it doesn't fit this because it 448 00:34:22,780 --> 00:34:25,160 has inequality constraints. 449 00:34:25,160 --> 00:34:30,050 But you maybe know that the number one 450 00:34:30,050 --> 00:34:37,490 method to solve linear programs is called the simplex method. 451 00:34:37,490 --> 00:34:43,500 Well, it was the number one method for many years. 452 00:34:43,500 --> 00:34:46,740 For many problems it's still the right way to do it. 453 00:34:46,740 --> 00:34:53,600 But a new method called the primal-dual -- 454 00:34:53,600 --> 00:34:56,250 at least that's part of its name, primal-dual. 455 00:34:56,250 --> 00:35:02,070 It is essentially solving the primal and the dual problems 456 00:35:02,070 --> 00:35:05,990 at once, and there are inequality constraints, 457 00:35:05,990 --> 00:35:06,560 of course.
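In symbols, as a reconstruction from the spoken description (assuming A is m by n with independent columns, so u has n components and y has m): the primal problem is least squares for Au = b, the dual problem minimizes the distance from y to b over the subspace A^T y = 0, and the pair of equations for the winners is the block system just described. Here y hat is the primal error e = b - A u hat, renamed as in the lecture.

```latex
\begin{gather*}
\text{Primal: } \min_{u}\ \lVert b - Au \rVert^{2}
\qquad
\text{Dual: } \min_{y}\ \lVert b - y \rVert^{2}\ \text{ subject to } A^{T} y = 0 \\[6pt]
\begin{bmatrix} I & A \\ A^{T} & 0 \end{bmatrix}
\begin{bmatrix} \hat y \\ \hat u \end{bmatrix}
=
\begin{bmatrix} b \\ 0 \end{bmatrix}
\quad\Longleftrightarrow\quad
\hat y + A\hat u = b, \qquad A^{T}\hat y = 0
\end{gather*}
```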
458 00:35:06,560 --> 00:35:09,250 I'm going to stop there with linear programming 459 00:35:09,250 --> 00:35:13,450 and give it its turn later. 460 00:35:13,450 --> 00:35:21,610 In this perfect example here, we have only equations. 461 00:35:21,610 --> 00:35:23,220 How many do we have? 462 00:35:23,220 --> 00:35:28,750 We have m plus n equations, because y is m unknowns, 463 00:35:28,750 --> 00:35:30,440 u is n unknowns. 464 00:35:30,440 --> 00:35:35,600 I have altogether m plus n equations -- 465 00:35:35,600 --> 00:35:42,440 m y's, and n u's, and they come together. 466 00:35:42,440 --> 00:35:48,320 Now, could you, just to connect back 467 00:35:48,320 --> 00:35:51,680 with what we absolutely know, that it's a normal equation, 468 00:35:51,680 --> 00:35:54,890 where is this normal equation coming from? 469 00:35:54,890 --> 00:35:58,130 So here's the normal equation. 470 00:35:58,130 --> 00:36:02,920 We know that that's gotta come, right, out of the thing. 471 00:36:02,920 --> 00:36:04,670 How does it come? 472 00:36:04,670 --> 00:36:08,910 Suppose I have a block system, two by two. 473 00:36:08,910 --> 00:36:11,100 How do I solve it? 474 00:36:11,100 --> 00:36:15,780 Well actually, that's, in a way, the big question. 475 00:36:15,780 --> 00:36:19,410 But one way to solve it, the natural way to solve it, 476 00:36:19,410 --> 00:36:23,250 would be elimination. 477 00:36:23,250 --> 00:36:28,720 Multiply this first row by a suitable matrix. 478 00:36:28,720 --> 00:36:32,270 Subtract from the second row to produce a zero there. 479 00:36:32,270 --> 00:36:39,130 In other words, eliminate y and get an equation for u hat. 480 00:36:39,130 --> 00:36:40,610 So what do I do? 481 00:36:40,610 --> 00:36:42,150 How do I do it? 482 00:36:42,150 --> 00:36:49,800 I multiply -- would you rather look at equations or matrices? 483 00:36:49,800 --> 00:36:53,060 I've tried to keep the two absolutely together. 484 00:36:53,060 --> 00:36:54,500 Let me look at the equation. 
485 00:36:54,500 --> 00:36:58,620 What shall I multiply that equation by and subtract from 486 00:36:58,620 --> 00:37:01,370 this -- I just want to eliminate. 487 00:37:01,370 --> 00:37:05,520 I want to get y hat out of there and leave just an equation 488 00:37:05,520 --> 00:37:09,040 for u hat that we will totally recognize. 489 00:37:09,040 --> 00:37:10,350 So what do I do? 490 00:37:10,350 --> 00:37:14,250 I'm multiplying that first equation by? 491 00:37:14,250 --> 00:37:17,000 A transpose, thanks. 492 00:37:17,000 --> 00:37:20,760 So I multiply this first equation by A transpose. 493 00:37:20,760 --> 00:37:22,040 Let me just do it this way. 494 00:37:25,020 --> 00:37:27,610 Now what? 495 00:37:27,610 --> 00:37:30,120 A transpose y is zero. 496 00:37:30,120 --> 00:37:32,620 Now I use the second -- well, this is one way to do it. 497 00:37:32,620 --> 00:37:34,310 A transpose y is zero. 498 00:37:34,310 --> 00:37:36,450 What am I left with? 499 00:37:36,450 --> 00:37:38,550 The normal equation. 500 00:37:38,550 --> 00:37:40,040 Well, it shouldn't be a surprise. 501 00:37:40,040 --> 00:37:42,120 We had to end up with the normal equation. 502 00:37:42,120 --> 00:37:44,590 Maybe you would rather -- and actually, 503 00:37:44,590 --> 00:37:49,240 what I intended was to make it sound more like Gaussian 504 00:37:49,240 --> 00:37:50,440 elimination. 505 00:37:50,440 --> 00:37:53,270 Multiply this row by A transpose, 506 00:37:53,270 --> 00:37:55,240 subtract from this row. 507 00:37:55,240 --> 00:38:01,490 I still have identity, A up here -- when I do that subtraction, 508 00:38:01,490 --> 00:38:04,180 I get the zero, that was the point. 509 00:38:04,180 --> 00:38:07,350 Here I get A transpose A subtracted from zero, 510 00:38:07,350 --> 00:38:14,170 it's minus A transpose A, times y hat, u hat.
511 00:38:14,170 --> 00:38:17,960 Of course, I had to do the same thing to the right-hand side, 512 00:38:17,960 --> 00:38:22,360 and when I subtracted, the b was still there up top, 513 00:38:22,360 --> 00:38:26,580 but below it was minus A transpose b. 514 00:38:26,580 --> 00:38:32,040 So, now I've got in the second equation -- 515 00:38:32,040 --> 00:38:36,550 the second equation only involves u hat, and, of course, 516 00:38:36,550 --> 00:38:40,780 when I changed the signs, it's our friend -- 517 00:38:40,780 --> 00:38:43,720 A transpose A u hat equal A transpose b. 518 00:38:43,720 --> 00:38:52,030 So this is -- maybe you would say the natural way to solve 519 00:38:52,030 --> 00:38:59,660 this type of system, but I want to emphasize -- throw it away. 520 00:38:59,660 --> 00:39:06,700 I really want to emphasize the importance of these -- 521 00:39:06,700 --> 00:39:13,010 let me clean it back up again to what it was -- 522 00:39:13,010 --> 00:39:14,120 of these block systems. 523 00:39:17,710 --> 00:39:20,330 Now, they need a name. 524 00:39:20,330 --> 00:39:27,430 We have to give some name to this type of two-field problem. 525 00:39:27,430 --> 00:39:29,740 I guess then, in the next month, I'm 526 00:39:29,740 --> 00:39:32,800 going to find examples of it everywhere. 527 00:39:32,800 --> 00:39:35,040 So here I've found the first example of it 528 00:39:35,040 --> 00:39:39,030 in ordinary old-fashioned least squares. 529 00:39:39,030 --> 00:39:41,600 So what am I going to call this? 530 00:39:41,600 --> 00:39:46,560 I'll call it -- I'll give it a couple of names. 531 00:39:46,560 --> 00:39:50,020 Saddle point equation, saddle point system maybe 532 00:39:50,020 --> 00:39:50,710 I should say. 533 00:39:53,530 --> 00:39:57,810 I have to explain why I call it a saddle point system. 534 00:39:57,810 --> 00:40:05,740 In optimization, I could call it the optimality equation -- 535 00:40:05,740 --> 00:40:08,130 just meaning they're the equations for the winners.
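The block elimination just described can be checked numerically. A minimal sketch in plain Python with exact fractions, assuming the 3-by-2 matrix A from the lecture's later example (columns [2, 3, 4] and [5, 6, 7]) and a made-up right-hand side b = [1, 2, 4] that is purely illustrative: it forms A^T A and A^T b, solves for u hat, back-substitutes for y hat, and confirms that A^T y hat is zero.

```python
from fractions import Fraction

# A: the 3-by-2 matrix from the lecture's example (columns [2,3,4] and [5,6,7]).
# b: a hypothetical right-hand side, chosen only for illustration.
A = [[2, 5], [3, 6], [4, 7]]
b = [1, 2, 4]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    # Exact matrix-vector product using Fractions.
    return [sum(Fraction(mij) * vj for mij, vj in zip(row, v)) for row in M]

def matmul(M, N):
    Nt = transpose(N)
    return [[sum(Fraction(x) * y for x, y in zip(row, col)) for col in Nt] for row in M]

def solve_2x2(M, r):
    # Cramer's rule is enough for the 2-by-2 normal equations in this example.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - r[0] * M[1][0]) / det]

At = transpose(A)
# Multiply the first block row (y + A u = b) by A^T and subtract it from the
# second (A^T y = 0): the y-terms cancel, and after a sign change what is left
# is the normal equation A^T A u = A^T b.
AtA = matmul(At, A)
Atb = matvec(At, b)
u_hat = solve_2x2(AtA, Atb)

# Back-substitute into the first block row: y_hat = b - A u_hat.
y_hat = [bi - ai for bi, ai in zip(b, matvec(A, u_hat))]

print("u_hat =", [str(x) for x in u_hat])
print("y_hat =", [str(x) for x in y_hat])
print("A^T y_hat =", [str(x) for x in matvec(At, y_hat)])
```

With this b, u hat comes out as (20/9, -13/18) and y hat as (1/6, -1/3, 1/6), a multiple of [1, -2, 1], which is orthogonal to both columns of A; so the second block equation A^T y hat = 0 holds exactly.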
536 00:40:11,190 --> 00:40:16,300 In the world of optimization, the names of Kuhn and Tucker 537 00:40:16,300 --> 00:40:20,490 are associated with these equations -- 538 00:40:20,490 --> 00:40:29,010 the Kuhn-Tucker equations, and there are other names 539 00:40:29,010 --> 00:40:30,420 we'll see. 540 00:40:30,420 --> 00:40:35,370 But let me just say for a moment why saddle point. 541 00:40:35,370 --> 00:40:40,230 Why do I think of this as a saddle point problem? 542 00:40:45,520 --> 00:40:49,050 See, the point about A transpose A 543 00:40:49,050 --> 00:40:52,120 was that it was positive definite. 544 00:40:52,120 --> 00:40:56,670 This is A transpose A. 545 00:40:56,670 --> 00:41:01,810 Now what's the corresponding issue for this matrix? 546 00:41:01,810 --> 00:41:05,320 So this is my matrix that I'm constantly gonna look at. 547 00:41:08,960 --> 00:41:12,770 Matrices of that form are going to show up all the time. 548 00:41:12,770 --> 00:41:18,920 I've said probably in 18.085 where these appear 549 00:41:18,920 --> 00:41:20,940 but then we don't do much with them. 550 00:41:20,940 --> 00:41:22,570 Now we're ready to do something. 551 00:41:26,580 --> 00:41:28,920 I didn't appreciate their importance 552 00:41:28,920 --> 00:41:32,320 until I realized, in going to lectures 553 00:41:32,320 --> 00:41:36,300 on applied mathematics, that if I waited a little 554 00:41:36,300 --> 00:41:38,900 while that matrix would appear. 555 00:41:38,900 --> 00:41:40,010 That block matrix. 556 00:41:40,010 --> 00:41:42,570 It just shows up in all these applications. 557 00:41:47,750 --> 00:41:53,930 One of our issues will be how to solve it. 558 00:41:53,930 --> 00:41:59,420 Another issue that comes first is what's 559 00:41:59,420 --> 00:42:01,320 the general form of this? 560 00:42:01,320 --> 00:42:03,450 Can I jump to that issue? 561 00:42:03,450 --> 00:42:09,020 Just so we see something more than this single problem here. 
562 00:42:09,020 --> 00:42:20,820 Let me put in the matrix as it comes in applications. 563 00:42:20,820 --> 00:42:24,790 Some matrix A, rectangular, right? 564 00:42:24,790 --> 00:42:27,550 Its transpose. 565 00:42:27,550 --> 00:42:28,370 A zero. 566 00:42:28,370 --> 00:42:30,440 Often a zero. 567 00:42:30,440 --> 00:42:34,440 But what's up here is not always the identity. 568 00:42:34,440 --> 00:42:37,730 I want to allow something more general. 569 00:42:37,730 --> 00:42:41,010 I want to allow, for example, weighted least squares. 570 00:42:41,010 --> 00:42:46,680 So weighted least squares -- if you've met least squares, 571 00:42:46,680 --> 00:42:55,060 it's very important to meet its extension to weighted least 572 00:42:55,060 --> 00:42:55,970 squares. 573 00:42:55,970 --> 00:43:04,700 When the equations A*u equal b are not given the same weight, 574 00:43:04,700 --> 00:43:08,440 there's a weighting matrix, often it's a covariance matrix. 575 00:43:08,440 --> 00:43:12,680 I'm going to call the matrix that goes in here C inverse. 576 00:43:16,610 --> 00:43:25,690 So this will then be an important class of applications. 577 00:43:25,690 --> 00:43:29,740 This is pretty important already when the identity is there, 578 00:43:29,740 --> 00:43:34,350 but many, many applications produce some other matrix that 579 00:43:34,350 --> 00:43:39,000 is usually -- it very, very often is symmetric positive 580 00:43:39,000 --> 00:43:44,690 definite in that corner, like the identity is. 581 00:43:44,690 --> 00:43:48,230 But here's the key point, which I have to make today. 582 00:43:48,230 --> 00:43:52,600 Take the whole matrix -- either this one or this one. 583 00:43:52,600 --> 00:43:54,710 It is symmetric, right? 584 00:43:54,710 --> 00:43:56,570 That matrix is symmetric. 585 00:43:56,570 --> 00:43:59,530 Is it or isn't it positive definite? 586 00:43:59,530 --> 00:44:04,060 If I do elimination, do I get all positive pivots?
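A sketch of what the same elimination gives in the weighted case, assuming C is symmetric positive definite (so C inverse exists) and keeping the right-hand side [b; 0] from the plain least-squares example: solve the first block row for y hat and substitute into A^T y hat = 0. With C = I this reduces to the ordinary normal equation.

```latex
\begin{bmatrix} C^{-1} & A \\ A^{T} & 0 \end{bmatrix}
\begin{bmatrix} \hat y \\ \hat u \end{bmatrix}
= \begin{bmatrix} b \\ 0 \end{bmatrix}
\quad\Longrightarrow\quad
\hat y = C\,(b - A\hat u)
\quad\Longrightarrow\quad
A^{T} C A\,\hat u = A^{T} C\, b
```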
587 00:44:04,060 --> 00:44:07,940 It's a matrix of size m plus n. 588 00:44:07,940 --> 00:44:11,500 So I'm asking, are all its eigenvalues positive, 589 00:44:11,500 --> 00:44:15,130 but I don't want to really compute eigenvalues. 590 00:44:15,130 --> 00:44:17,740 Although, in a lot of cases I would. 591 00:44:17,740 --> 00:44:19,930 Finding the eigenvalues of this matrix 592 00:44:19,930 --> 00:44:24,410 would lead me to the singular value decomposition, 593 00:44:24,410 --> 00:44:29,120 an absolutely crucial topic in linear algebra that we'll see. 594 00:44:29,120 --> 00:44:32,850 But let me just take it as a linear system. 595 00:44:32,850 --> 00:44:40,610 If I do elimination, what are the first m pivots? 596 00:44:40,610 --> 00:44:43,490 Let me not be abstract here. 597 00:44:43,490 --> 00:44:45,530 Let me be quite concrete. 598 00:44:45,530 --> 00:44:47,700 Let me put the identity here. 599 00:44:47,700 --> 00:44:53,310 Let me put some matrix -- oh, I want the matrix here to be -- 600 00:44:53,310 --> 00:44:58,630 I'd better put a bigger identity just so we see the picture. 601 00:44:58,630 --> 00:45:03,530 Here I'm going to put an A transpose, which might be 2, 3, 602 00:45:03,530 --> 00:45:07,650 4; 5, 6, 7 -- when I write numbers like that 603 00:45:07,650 --> 00:45:11,360 you'll realize that I've just picked them out of a hat. 604 00:45:11,360 --> 00:45:15,920 Here is the transpose, 2, 3, 4; 5, 6, 7. 605 00:45:15,920 --> 00:45:18,120 Here's the zero block. 606 00:45:20,970 --> 00:45:23,575 I'd like to know about that matrix. 607 00:45:23,575 --> 00:45:24,200 It's symmetric. 608 00:45:29,040 --> 00:45:32,100 And it's full rank. 609 00:45:32,100 --> 00:45:34,060 It's invertible. 610 00:45:34,060 --> 00:45:35,490 How do I know that? 611 00:45:35,490 --> 00:45:37,330 It's the invertibility -- of course, 612 00:45:37,330 --> 00:45:40,170 the identity part is great.
613 00:45:40,170 --> 00:45:45,870 I guess I see that this is invertible because I've ended 614 00:45:45,870 --> 00:45:52,820 up with A transpose A, and here my A has rank two -- 615 00:45:52,820 --> 00:45:56,450 those two columns are independent. 616 00:45:56,450 --> 00:45:58,370 They're not in the same direction -- [2, 3, 617 00:45:58,370 --> 00:46:00,820 4] is not a multiple of [5, 6, 7]. 618 00:46:00,820 --> 00:46:05,930 That's an invertible matrix, and this process finds the inverse. 619 00:46:05,930 --> 00:46:07,880 Elimination finds the inverse. 620 00:46:07,880 --> 00:46:10,510 It's the pivots I want to ask you about. 621 00:46:10,510 --> 00:46:15,480 What are the pivots in this matrix? 622 00:46:15,480 --> 00:46:17,500 So what do I do? 623 00:46:17,500 --> 00:46:22,170 The first pivot is a 1 -- I use it to clean out that column. 624 00:46:22,170 --> 00:46:25,420 The second pivot is a 1 -- I use it to clean out that column. 625 00:46:25,420 --> 00:46:28,340 The third pivot is a 1 -- I use it to clean out that column. 626 00:46:31,660 --> 00:46:32,740 What's the next pivot? 627 00:46:32,740 --> 00:46:38,340 What do I have -- of course, now some stuff has filled in here. 628 00:46:38,340 --> 00:46:40,230 What is actually filled in there? 629 00:46:40,230 --> 00:46:43,770 So this is the identity, if I can write it fast. 630 00:46:43,770 --> 00:46:46,450 This guy is now the zero. 631 00:46:46,450 --> 00:46:50,010 This guy didn't move. 632 00:46:50,010 --> 00:46:54,060 What matrix filled in here? 633 00:46:54,060 --> 00:46:56,100 Well, just what I was doing there. 634 00:46:56,100 --> 00:46:59,300 Elimination is exactly what I'm repeating with numbers, 635 00:46:59,300 --> 00:47:00,740 what I did there with letters. 636 00:47:00,740 --> 00:47:05,540 What's in here is minus A transpose A. 
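Written out with the lecture's numbers (A has columns [2, 3, 4] and [5, 6, 7]), here are the 5-by-5 matrix and the result of using the three unit pivots to clear their columns. The block that fills in at the lower right is minus A^T A, whose entries are dot products of the columns of A (for instance 2*5 + 3*6 + 4*7 = 56).

```latex
\begin{bmatrix}
1 & 0 & 0 & 2 & 5\\
0 & 1 & 0 & 3 & 6\\
0 & 0 & 1 & 4 & 7\\
2 & 3 & 4 & 0 & 0\\
5 & 6 & 7 & 0 & 0
\end{bmatrix}
\;\longrightarrow\;
\begin{bmatrix}
1 & 0 & 0 & 2 & 5\\
0 & 1 & 0 & 3 & 6\\
0 & 0 & 1 & 4 & 7\\
0 & 0 & 0 & -29 & -56\\
0 & 0 & 0 & -56 & -110
\end{bmatrix},
\qquad
A^{T}A = \begin{bmatrix} 29 & 56\\ 56 & 110 \end{bmatrix}
```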
637 00:47:05,540 --> 00:47:08,970 I could figure out what that -- I could do -- 638 00:47:08,970 --> 00:47:15,070 if I was fast enough, I could do 2, 3, 4; 5, 6, 7 times 2, 3, 4; 639 00:47:15,070 --> 00:47:21,840 5, 6, 7 and I'd get this little two by two matrix that sits 640 00:47:21,840 --> 00:47:22,340 there. 641 00:47:25,150 --> 00:47:28,850 With a minus sign, and that's the point. 642 00:47:28,850 --> 00:47:31,900 That the pivots -- of course, what's this first number going 643 00:47:31,900 --> 00:47:32,400 to be? 644 00:47:32,400 --> 00:47:34,610 4 plus 9 plus 16. 645 00:47:34,610 --> 00:47:39,060 29 is that number. 646 00:47:39,060 --> 00:47:44,430 So a minus 29 sits right there. 647 00:47:44,430 --> 00:47:46,560 That's the next pivot. 648 00:47:46,560 --> 00:47:50,280 The next pivot is a negative number, minus 29, 649 00:47:50,280 --> 00:47:52,370 and the fifth pivot is negative. 650 00:47:52,370 --> 00:47:55,860 So what I'm seeing is a matrix -- 651 00:47:55,860 --> 00:48:05,140 this matrix has three positive pivots, 652 00:48:05,140 --> 00:48:07,240 and two negative pivots. 653 00:48:11,340 --> 00:48:13,500 I'd sort of say that's a saddle point. 654 00:48:13,500 --> 00:48:20,440 Positive pivots describe for me a surface that's going upwards. 655 00:48:20,440 --> 00:48:22,640 This surface is going upward. 656 00:48:22,640 --> 00:48:25,660 It's a surface in five dimensions here. 657 00:48:25,660 --> 00:48:28,950 It's going upwards in three directions, 658 00:48:28,950 --> 00:48:32,560 but it's going downwards in two. 659 00:48:32,560 --> 00:48:37,150 The point at the heart of it, the saddle point, 660 00:48:37,150 --> 00:48:41,930 is the solution to our system, is the y hat, u hat.
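The lecture stops at saying the fifth pivot is negative. A small sketch that carries the elimination all the way, in plain Python with exact fractions, assuming (as holds for this matrix) that no row exchanges are needed: it returns all five pivots and counts their signs. For a symmetric matrix eliminated without exchanges, the pivot signs match the eigenvalue signs (Sylvester's law of inertia), which is why three positive and two negative pivots mean an indefinite, saddle-shaped quadratic.

```python
from fractions import Fraction

def pivots_no_exchanges(M):
    """Gaussian elimination without row exchanges; returns the pivots.

    Assumes every pivot encountered is nonzero (true for the example below).
    Exact Fraction arithmetic keeps the pivots free of rounding error.
    """
    U = [[Fraction(x) for x in row] for row in M]
    n = len(U)
    pivots = []
    for k in range(n):
        p = U[k][k]
        if p == 0:
            raise ZeroDivisionError(f"zero pivot at step {k}; a row exchange is needed")
        pivots.append(p)
        for i in range(k + 1, n):
            factor = U[i][k] / p
            for j in range(k, n):
                U[i][j] -= factor * U[k][j]
    return pivots

# The lecture's saddle point matrix [[I, A], [A^T, 0]] with A = [[2,5],[3,6],[4,7]].
K = [
    [1, 0, 0, 2, 5],
    [0, 1, 0, 3, 6],
    [0, 0, 1, 4, 7],
    [2, 3, 4, 0, 0],
    [5, 6, 7, 0, 0],
]

piv = pivots_no_exchanges(K)
positive = sum(1 for p in piv if p > 0)
negative = sum(1 for p in piv if p < 0)
print("pivots:", [str(p) for p in piv])
print(positive, "positive,", negative, "negative")
```

The pivots come out as 1, 1, 1, -29, -54/29. The last one is minus the Schur complement step 110 - 56^2/29, and 54 = 29*110 - 56^2 is the determinant of A^T A, so the product of all five pivots is 54.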
661 00:48:46,430 --> 00:48:54,020 Well, one conclusion is that I wouldn't 662 00:48:54,020 --> 00:48:56,020 be able to use conjugate gradient methods, 663 00:48:56,020 --> 00:48:59,230 for example, whose power we've just learned about, 664 00:48:59,230 --> 00:49:03,830 on the big matrix, because it's not positive definite. 665 00:49:03,830 --> 00:49:04,650 It's symmetric. 666 00:49:04,650 --> 00:49:07,010 I could use some other available methods. 667 00:49:07,010 --> 00:49:09,020 I couldn't use conjugate gradient. 668 00:49:09,020 --> 00:49:11,460 So if I wanted to use conjugate gradient, 669 00:49:11,460 --> 00:49:23,480 I'd better do this reduction to the definite system. 670 00:49:26,480 --> 00:49:34,000 That's longer than I intended to spend on the simple example. 671 00:49:34,000 --> 00:49:38,240 But if you see that example, then we'll 672 00:49:38,240 --> 00:49:44,320 be ready to move it to the wide variety of applications. 673 00:49:44,320 --> 00:49:52,890 So let me just note that one section, already up on the web, 674 00:49:52,890 --> 00:49:58,000 called saddle point systems, solves differential equations 675 00:49:58,000 --> 00:49:59,640 that are of this kind. 676 00:49:59,640 --> 00:50:04,670 So we'll come to that, and we'll come to matrix problems too. 677 00:50:04,670 --> 00:50:08,170 It's a very, very central question, 678 00:50:08,170 --> 00:50:13,580 how to solve linear systems with matrices of that form. 679 00:50:13,580 --> 00:50:16,830 In fact, I guess I could say that 680 00:50:16,830 --> 00:50:21,220 almost the fundamental problem of numerical linear algebra 681 00:50:21,220 --> 00:50:27,620 is to solve systems that fall into that saddle point 682 00:50:27,620 --> 00:50:29,240 description. 683 00:50:29,240 --> 00:50:34,730 I'll try to justify that by the importance 684 00:50:34,730 --> 00:50:38,660 I'm assigning to this problem in the next weeks. 685 00:50:38,660 --> 00:50:39,270 OK.
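The reduction to the definite system can be sketched as follows: form the positive definite normal equations A^T A u = A^T b and hand them to conjugate gradient. This is a minimal plain-Python sketch, assuming unpreconditioned textbook CG, the lecture's 3-by-2 matrix A, and a hypothetical right-hand side b = [1, 2, 4]; for that b the exact solution is u = (20/9, -13/18), and in exact arithmetic CG reaches it in at most n = 2 steps.

```python
# Hypothetical data: A from the lecture's 3-by-2 example, b made up for illustration.
A = [[2.0, 5.0], [3.0, 6.0], [4.0, 7.0]]
b = [1.0, 2.0, 4.0]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def matvec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def conjugate_gradient(M, rhs, tol=1e-12, max_iter=None):
    """Textbook conjugate gradient; M must be symmetric positive definite."""
    n = len(rhs)
    max_iter = n if max_iter is None else max_iter
    x = [0.0] * n
    r = list(rhs)          # residual rhs - M x, starting from x = 0
    d = list(r)            # first search direction is the residual itself
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Md = matvec(M, d)
        alpha = rr / dot(d, Md)                    # exact line search along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * mdi for ri, mdi in zip(r, Md)]
        rr_new = dot(r, r)
        beta = rr_new / rr                         # keeps the directions M-conjugate
        d = [ri + beta * di for ri, di in zip(r, d)]
        rr = rr_new
    return x

# Reduce the indefinite saddle point system to the definite normal equations.
At = transpose(A)                                  # rows of At are the columns of A
AtA = [[dot(ci, cj) for cj in At] for ci in At]    # A^T A, symmetric positive definite
Atb = matvec(At, b)
u_cg = conjugate_gradient(AtA, Atb)
print("u from CG on the normal equations:", u_cg)
```

One design caveat: forming A^T A squares the condition number of A, so for ill-conditioned problems this reduction can lose accuracy. Methods built for symmetric indefinite matrices, MINRES being one example, can instead be applied to the block system directly; the lecture does not say which "other available methods" it has in mind.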
686 00:50:39,270 --> 00:50:46,150 Thanks for today and I'll turn off volume.