1 00:00:00,000 --> 00:00:01,950 The following content is provided 2 00:00:01,950 --> 00:00:06,100 by MIT OpenCourseWare under a Creative Commons license. 3 00:00:06,100 --> 00:00:08,200 Additional information about our license 4 00:00:08,200 --> 00:00:10,520 and MIT OpenCourseWare in general 5 00:00:10,520 --> 00:00:11,930 is available at ocw.mit.edu. 6 00:00:17,920 --> 00:00:19,190 PROFESSOR: OK. 7 00:00:19,190 --> 00:00:20,410 Good. 8 00:00:20,410 --> 00:00:26,440 So I decided to make today's lecture the one on linear 9 00:00:26,440 --> 00:00:31,230 programming and duality, which I'd planned for Friday, 10 00:00:31,230 --> 00:00:36,680 and give myself two more days to learn about ill-posed 11 00:00:36,680 --> 00:00:42,250 and inverse problems, and then come back to that Friday, 12 00:00:42,250 --> 00:00:47,180 so that -- we've studied the limits in those problems 13 00:00:47,180 --> 00:00:54,670 of alpha going to infinity or 0, but the scientific question 14 00:00:54,670 --> 00:00:57,370 when there's noise in the system is finite alpha, 15 00:00:57,370 --> 00:01:01,950 and I want to learn more about applications and examples. 16 00:01:01,950 --> 00:01:09,180 Can I also say I'm very happy to have had volunteers for Monday 17 00:01:09,180 --> 00:01:11,780 and Wednesday of next week to present, 18 00:01:11,780 --> 00:01:14,090 and if a couple of people might maybe 19 00:01:14,090 --> 00:01:19,620 volunteer for Friday, to share Friday, I'll be very grateful. 20 00:01:19,620 --> 00:01:24,570 So you could see me after class, put a hand up now, 21 00:01:24,570 --> 00:01:29,140 send me an email -- all those would be very good. 
22 00:01:29,140 --> 00:01:33,610 And again, I would be thinking -- 23 00:01:33,610 --> 00:01:36,510 since it's just next week I'm talking about -- 24 00:01:36,510 --> 00:01:42,210 that it would be essentially a report on your Project One that 25 00:01:42,210 --> 00:01:47,270 you would use the overhead projector maybe, 26 00:01:47,270 --> 00:01:49,890 if that's preferable. 27 00:01:49,890 --> 00:01:50,670 OK. 28 00:01:50,670 --> 00:01:55,550 So I think you'll like this topic. 29 00:01:55,550 --> 00:02:02,620 It's kind of specific but widely used -- linear programming -- 30 00:02:02,620 --> 00:02:11,170 used in business to maximize profits, to minimize costs. 31 00:02:11,170 --> 00:02:17,180 And linear means that the cost function is linear. 32 00:02:17,180 --> 00:02:20,250 That's an inner product -- c*x. 33 00:02:20,250 --> 00:02:22,890 c is a row vector, x is a column vector, 34 00:02:22,890 --> 00:02:25,720 so that I'm following the conventions of this subject 35 00:02:25,720 --> 00:02:29,210 here to take these different shapes, 36 00:02:29,210 --> 00:02:32,750 so let me indicate what the shapes are. 37 00:02:32,750 --> 00:02:44,600 But the inputs are -- the data of the problem are -- c and A, 38 00:02:44,600 --> 00:02:47,080 an m by n matrix, and b. 39 00:02:47,080 --> 00:02:55,000 So A is m by n, b is m by 1 -- right-hand side -- 40 00:02:55,000 --> 00:02:58,400 and c is 1 by n. 41 00:02:58,400 --> 00:03:04,730 And then the unknown -- this is the thing that we're to find -- 42 00:03:04,730 --> 00:03:10,150 it's a column vector, n by 1. 43 00:03:10,150 --> 00:03:11,250 OK. 44 00:03:11,250 --> 00:03:16,710 And the point is there are constraints 45 00:03:16,710 --> 00:03:19,240 and those are linear too. 46 00:03:19,240 --> 00:03:22,200 So it's rather unusual to have a linear cost function. 47 00:03:22,200 --> 00:03:23,540 Right? 
48 00:03:23,540 --> 00:03:29,770 Because when you maximize or minimize some linear function, 49 00:03:29,770 --> 00:03:34,010 well, the thing is just going up or it's going down -- 50 00:03:34,010 --> 00:03:39,570 or in higher dimensions the same -- and if it's going down, 51 00:03:39,570 --> 00:03:42,740 then the minimum is going to be at the right-hand end. 52 00:03:42,740 --> 00:03:46,670 Or if it's going up, the minimum will be at the left-hand end. 53 00:03:46,670 --> 00:03:49,670 And if I'm in more variables, this idea 54 00:03:49,670 --> 00:03:53,850 will still be true that the minimum or maximum 55 00:03:53,850 --> 00:03:58,720 will happen at the edges, at the ends of the allowed region. 56 00:03:58,720 --> 00:04:02,490 And this allowed region, called the feasible set -- 57 00:04:02,490 --> 00:04:07,480 so let me give the name to this -- these are the allowed x's. 58 00:04:07,480 --> 00:04:09,470 These are the constraints. 59 00:04:09,470 --> 00:04:16,490 And that set is called the feasible set, feasible meaning 60 00:04:16,490 --> 00:04:19,060 doable. 61 00:04:19,060 --> 00:04:22,830 So those constraints include inequalities, 62 00:04:22,830 --> 00:04:29,200 because we want finite intervals, finite regions in n 63 00:04:29,200 --> 00:04:30,550 dimensions. 64 00:04:30,550 --> 00:04:35,460 And I drew a sort of quick picture 65 00:04:35,460 --> 00:04:38,400 so that you have a model of this. 66 00:04:38,400 --> 00:04:45,170 So this is a picture with n equals 3 -- 3 dimensions, 67 00:04:45,170 --> 00:04:49,850 and so the constraints x greater or equal 0, 68 00:04:49,850 --> 00:04:53,790 x_1 greater or equal 0, x_2 greater or equal 0, 69 00:04:53,790 --> 00:04:58,550 x_3 greater or equal 0 -- that's what x greater or equal 0 70 00:04:58,550 --> 00:04:59,950 means. 71 00:04:59,950 --> 00:05:02,130 It means all components. 72 00:05:02,130 --> 00:05:04,220 So we're in the quadrant right? 
73 00:05:04,220 --> 00:05:08,040 We're in a quarter -- 1/8 sorry -- we're in an octant -- 74 00:05:08,040 --> 00:05:12,680 1/8 of three-dimensional space, the positive octant. 75 00:05:12,680 --> 00:05:18,870 And then if I draw maybe just one, just put in one equation, 76 00:05:18,870 --> 00:05:25,340 one plane, would cut off a piece of that octant, 77 00:05:25,340 --> 00:05:32,140 so that A*x greater or equal b, depending on the signs, 78 00:05:32,140 --> 00:05:39,320 but the feasible set could well be the tetrahedron, 79 00:05:39,320 --> 00:05:44,950 the little piece of the octant that's cut out by this plane. 80 00:05:44,950 --> 00:05:48,480 Or if our constraint was an equality, 81 00:05:48,480 --> 00:05:51,370 the feasible set would be the triangle. 82 00:05:51,370 --> 00:05:57,120 So A*x equal to b would lead to the triangle, 83 00:05:57,120 --> 00:06:01,400 and A*x greater or equal b, if we pick the signs correctly, 84 00:06:01,400 --> 00:06:05,880 would be the pyramid, would include also this corner, 85 00:06:05,880 --> 00:06:10,160 because there'd be some volume. 86 00:06:10,160 --> 00:06:10,740 OK. 87 00:06:10,740 --> 00:06:18,080 So the feasible set is a polyhedron. 88 00:06:18,080 --> 00:06:23,030 It's like a polygon, only up into n dimensions, 89 00:06:23,030 --> 00:06:25,320 so we use the word polyhedron. 90 00:06:25,320 --> 00:06:34,130 And it's got corners, and the whole point of the linear cost, 91 00:06:34,130 --> 00:06:37,220 the linear objective function c*x -- 92 00:06:37,220 --> 00:06:40,540 so this is just c_1*x_1 plus... 93 00:06:40,540 --> 00:06:46,850 plus c_n*x_n; that's what that means. 94 00:06:46,850 --> 00:06:50,070 If I take derivatives, I get constants. 95 00:06:50,070 --> 00:06:54,090 I don't set derivatives to 0 in this type of problem. 96 00:06:54,090 --> 00:06:58,480 I look at the endpoints, at the corners. 97 00:06:58,480 --> 00:07:02,010 And that's where the minimum and maximum will occur. 
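A minimal sketch of the "look at the corners" idea, on a two-variable problem that is small enough to check by hand. The data `c`, `A`, `b` and the helper names `intersect`, `feasible`, `cost` are illustrative assumptions, not from the lecture; the problem form (minimize c*x subject to A*x greater or equal b, x greater or equal 0) is taken from the constraints used later in the lecture. This is brute-force enumeration, not the simplex method.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical data (not from the lecture): minimize c*x subject to A*x >= b, x >= 0.
c = [1, 4]
A = [[1, 1], [1, 3]]
b = [4, 6]

# Write every constraint as row . x >= rhs, including x_1 >= 0 and x_2 >= 0.
rows = A + [[1, 0], [0, 1]]
rhs = b + [0, 0]

def intersect(r1, b1, r2, b2):
    """Solve r1.x = b1, r2.x = b2 by Cramer's rule; return None if the lines are parallel."""
    det = r1[0] * r2[1] - r1[1] * r2[0]
    if det == 0:
        return None
    x1 = Fraction(b1 * r2[1] - r1[1] * b2, det)
    x2 = Fraction(r1[0] * b2 - b1 * r2[0], det)
    return (x1, x2)

def feasible(x):
    """True if x satisfies every inequality row . x >= rhs."""
    return all(r[0] * x[0] + r[1] * x[1] >= t for r, t in zip(rows, rhs))

def cost(x):
    return c[0] * x[0] + c[1] * x[1]

# A corner in 2 dimensions is a feasible point where two constraints are tight.
corners = []
for i, j in combinations(range(len(rows)), 2):
    x = intersect(rows[i], rhs[i], rows[j], rhs[j])
    if x is not None and feasible(x):
        corners.append(x)

best = min(corners, key=cost)
print(best, cost(best))
```

For this data the feasible corners are (3, 1), (0, 4) and (6, 0), with costs 7, 16 and 6, so the winning corner is (6, 0).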
98 00:07:02,010 --> 00:07:05,540 So it's just a question of finding the right corner. 99 00:07:05,540 --> 00:07:09,320 That's the problem: how to find the winning corner. 100 00:07:16,020 --> 00:07:21,670 It's an interesting competition between two quite different 101 00:07:21,670 --> 00:07:27,350 approaches: the famous approach -- so let me write these two. 102 00:07:27,350 --> 00:07:36,200 The simplex method is the best established, best known 103 00:07:36,200 --> 00:07:39,680 approach for solving these problems. 104 00:07:39,680 --> 00:07:42,490 What's the idea of the simplex method? 105 00:07:42,490 --> 00:07:44,570 The simplex method finds a corner. 106 00:07:47,220 --> 00:07:53,210 A corner is a case where we have some equality signs. 107 00:07:53,210 --> 00:08:00,090 A corner is the edge, the limit where maybe this one still 108 00:08:00,090 --> 00:08:05,090 has x_1 positive but it's down in this plane 109 00:08:05,090 --> 00:08:09,380 so it has maybe x_3 is 0 for this guy. 110 00:08:09,380 --> 00:08:14,710 So that corner has x_3 equals 0 and it also lies right 111 00:08:14,710 --> 00:08:19,390 on the plane, so it has A*x equal to b. 112 00:08:19,390 --> 00:08:24,040 This corner -- well, I guess that corner has all these guys 113 00:08:24,040 --> 00:08:27,790 equal 0: x_1 equals 0, x_2 equals 0, x_3 equals 0, 114 00:08:27,790 --> 00:08:31,770 but A*x -- inequality's holding here for this corner 115 00:08:31,770 --> 00:08:35,170 that's hiding behind the face. 116 00:08:35,170 --> 00:08:40,860 Anyway, corners are points where some of the constraints 117 00:08:40,860 --> 00:08:47,990 are tight or active and others are not. 118 00:08:47,990 --> 00:08:52,010 Well, you might say: just check them all, 119 00:08:52,010 --> 00:08:56,340 but the trouble is there are lots of corners. 120 00:08:56,340 --> 00:09:03,680 If we're in n dimensions and we have m constraint equations, 121 00:09:03,680 --> 00:09:08,580 then the number of corners goes up exponentially. 
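A rough way to see why checking every corner is hopeless: with n sign constraints and m further inequalities in n dimensions, a corner needs n of the n + m constraints to be tight, so there are at most "n + m choose n" candidates. The function name `max_candidate_corners` is an illustrative assumption, and the count is only an upper bound, since many of the candidate intersections are infeasible or degenerate.

```python
from math import comb

def max_candidate_corners(n, m):
    """Upper bound on corners: ways to choose n tight constraints out of n + m."""
    return comb(n + m, n)

# Growth of the bound when the number of constraints m equals the dimension n.
for n in (2, 10, 20, 30):
    print(n, max_candidate_corners(n, n))
```

With n = m = 2 the bound is 6, which matches the six pairs of constraint lines in a two-variable problem with two inequalities.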
122 00:09:08,580 --> 00:09:11,890 So no way to check all of them. 123 00:09:11,890 --> 00:09:16,240 So the simplex method had a better idea. 124 00:09:16,240 --> 00:09:18,150 The simplex method found one of them -- 125 00:09:18,150 --> 00:09:21,940 and already that's a little bit of a job, to find a corner, 126 00:09:21,940 --> 00:09:24,370 but finds one. 127 00:09:24,370 --> 00:09:26,170 And then what the simplex method does, 128 00:09:26,170 --> 00:09:31,110 it stays entirely, it moves along the edges. 129 00:09:31,110 --> 00:09:35,190 So from here, it will look to see in 130 00:09:35,190 --> 00:09:41,540 which direction would the cost go down, because we're 131 00:09:41,540 --> 00:09:43,550 trying to minimize the cost. 132 00:09:43,550 --> 00:09:45,130 So it would check these directions. 133 00:09:47,950 --> 00:09:55,000 Each of those directions we're releasing one equality. 134 00:09:55,000 --> 00:10:00,060 We're allowing one equality to be an inequality, 135 00:10:00,060 --> 00:10:02,080 and that moves us along. 136 00:10:02,080 --> 00:10:05,440 So the simplex method has two steps. 137 00:10:05,440 --> 00:10:07,800 It checks each of these directions 138 00:10:07,800 --> 00:10:13,150 to find out which way will the cost drop fastest. 139 00:10:13,150 --> 00:10:18,260 It chooses the direction in which the cost -- the gradient, 140 00:10:18,260 --> 00:10:20,880 the component of the gradient, you could say, 141 00:10:20,880 --> 00:10:26,290 along that edge is the biggest, or maybe the most negative. 142 00:10:26,290 --> 00:10:31,820 And then, once it decides which way to go, it goes -- 143 00:10:31,820 --> 00:10:34,660 maybe it takes this direction -- it goes, goes, goes, goes, 144 00:10:34,660 --> 00:10:38,580 goes until it hits another corner. 145 00:10:38,580 --> 00:10:41,870 So that's the end of the simplex step, 146 00:10:41,870 --> 00:10:43,750 when it reaches another corner. 147 00:10:43,750 --> 00:10:46,560 That completes one simplex step. 
148 00:10:46,560 --> 00:10:50,100 Then, from this corner, it will look the three ways 149 00:10:50,100 --> 00:10:52,120 it could go here. 150 00:10:52,120 --> 00:10:55,960 Well, it's not going to pick this way, we know, 151 00:10:55,960 --> 00:11:00,760 because in this direction the cost was decreasing 152 00:11:00,760 --> 00:11:03,180 or we wouldn't have taken it. 153 00:11:03,180 --> 00:11:04,860 We wouldn't have taken that direction 154 00:11:04,860 --> 00:11:06,290 except that the cost went down. 155 00:11:06,290 --> 00:11:08,530 So if we came back, the cost would go up. 156 00:11:08,530 --> 00:11:09,680 No good. 157 00:11:09,680 --> 00:11:13,760 So going down would be one of these two ways. 158 00:11:13,760 --> 00:11:18,340 Maybe it goes down in this direction, 159 00:11:18,340 --> 00:11:20,860 so we decide on that direction. 160 00:11:20,860 --> 00:11:24,420 We follow it until we hit a new corner, 161 00:11:24,420 --> 00:11:28,690 and eventually we're going to get to the winning corner 162 00:11:28,690 --> 00:11:33,060 because there are only a finite number of corners. 163 00:11:33,060 --> 00:11:35,890 And how will we know it's the winning corner? 164 00:11:35,890 --> 00:11:37,870 Well, we'll know that corner is a winner 165 00:11:37,870 --> 00:11:42,340 if, in every direction, the cost goes up. 166 00:11:42,340 --> 00:11:45,900 If the cost goes up in all directions, 167 00:11:45,900 --> 00:11:49,210 along all the edges out of that corner, 168 00:11:49,210 --> 00:11:50,470 then that corner has won. 169 00:11:50,470 --> 00:11:52,150 It's the minimum. 170 00:11:52,150 --> 00:11:58,230 I'm using linearity here -- the fact that you know everything 171 00:11:58,230 --> 00:12:01,540 by traveling along, by looking along those edges. 172 00:12:01,540 --> 00:12:05,090 So the simplex method is a big success. 
173 00:12:05,090 --> 00:12:08,930 Because in reality, in practice, it 174 00:12:08,930 --> 00:12:10,860 turns out that the number of edges 175 00:12:10,860 --> 00:12:16,020 that you have to travel to get to the winner 176 00:12:16,020 --> 00:12:18,130 doesn't grow exponentially. 177 00:12:18,130 --> 00:12:20,480 I mean in principle it could. 178 00:12:20,480 --> 00:12:27,640 People have dreamt up really desperate examples in which 179 00:12:27,640 --> 00:12:31,260 following the simplex method you could take a long time, 180 00:12:31,260 --> 00:12:35,470 but on average you don't, and in practice you don't. 181 00:12:35,470 --> 00:12:39,600 So it's a very good method and was 182 00:12:39,600 --> 00:12:43,280 totally the method of choice. 183 00:12:43,280 --> 00:12:48,680 But a competitor has arrived. 184 00:12:48,680 --> 00:12:54,260 And that competitor goes under the name of interior point 185 00:12:54,260 --> 00:13:06,020 method, and you can guess what that method is 186 00:13:06,020 --> 00:13:10,010 doing quite different, totally different system. 187 00:13:10,010 --> 00:13:14,830 That method is inside the feasible set. 188 00:13:14,830 --> 00:13:18,010 It finds a point somewhere near the middle maybe. 189 00:13:18,010 --> 00:13:30,010 And then it does normal, gradient-type approach 190 00:13:30,010 --> 00:13:32,260 from your point. 191 00:13:32,260 --> 00:13:35,450 It figures out which way to move. 192 00:13:35,450 --> 00:13:38,830 It moves, but it doesn't go outside. 193 00:13:38,830 --> 00:13:41,250 It doesn't even reach the boundary of the feasible 194 00:13:41,250 --> 00:13:43,820 set, because if you reach the boundary of the feasible set, 195 00:13:43,820 --> 00:13:47,550 you're out of the interior and the method is not 196 00:13:47,550 --> 00:13:50,420 going to operate. 
197 00:13:50,420 --> 00:13:53,300 Well, that crudest method would be follow the gradient, 198 00:13:53,300 --> 00:14:01,430 but we know from several situations 199 00:14:01,430 --> 00:14:08,990 that gradient descent can be less than optimal. 200 00:14:08,990 --> 00:14:10,650 So this is more subtle. 201 00:14:10,650 --> 00:14:14,400 Well, Newton's method actually -- I'll explain. 202 00:14:14,400 --> 00:14:18,140 So this is actually the content of my lecture -- 203 00:14:18,140 --> 00:14:20,700 this interior point method. 204 00:14:20,700 --> 00:14:26,770 And let me just mention a few names. 205 00:14:26,770 --> 00:14:30,090 People thought of interior point methods long ago, 206 00:14:30,090 --> 00:14:37,910 but a big splash came when Karmarkar 207 00:14:37,910 --> 00:14:40,760 proposed an interior point method 208 00:14:40,760 --> 00:14:49,800 and proved that it converged faster than simplex 209 00:14:49,800 --> 00:14:51,440 method in some problems. 210 00:14:51,440 --> 00:14:53,250 Well, he said all problems, actually. 211 00:14:53,250 --> 00:15:00,370 His advertising of the message was pretty generous. 212 00:15:00,370 --> 00:15:02,600 The sort of claim that was around 213 00:15:02,600 --> 00:15:07,170 was, you know, ten times as fast as the simplex method, 214 00:15:07,170 --> 00:15:10,670 generally, and it was on the front page of the New York 215 00:15:10,670 --> 00:15:16,230 Times, and I remember going to a lecture in Boston 216 00:15:16,230 --> 00:15:20,590 with lights, TV lights on and everything. 217 00:15:20,590 --> 00:15:28,610 Well, maybe his exact method now isn't so much used, 218 00:15:28,610 --> 00:15:35,330 but you have to give him credit for stirring up the whole world 219 00:15:35,330 --> 00:15:42,110 of optimization because the result of Karmarkar's method -- 220 00:15:42,110 --> 00:15:49,190 and there were others -- and I'll say barrier methods, 221 00:15:49,190 --> 00:15:52,080 and that's what I'll try to explain. 
222 00:15:55,700 --> 00:15:58,480 He stirred up the whole world so that the experts 223 00:15:58,480 --> 00:16:02,400 in optimization began looking again at interior point 224 00:16:02,400 --> 00:16:05,240 methods, seeing that they did have some merit, 225 00:16:05,240 --> 00:16:06,510 improving them. 226 00:16:06,510 --> 00:16:11,010 And now for, I would say particularly 227 00:16:11,010 --> 00:16:19,270 for large, sparse problems, these are a way to go. 228 00:16:19,270 --> 00:16:20,550 These are preferred now. 229 00:16:20,550 --> 00:16:25,260 So this is the normal situation in scientific computing: 230 00:16:25,260 --> 00:16:30,010 that any method that's good, it's 231 00:16:30,010 --> 00:16:32,310 still not good for everything. 232 00:16:32,310 --> 00:16:37,350 It's got a range of problems where it's successful 233 00:16:37,350 --> 00:16:40,560 and a range of problems where some competitor wins. 234 00:16:40,560 --> 00:16:43,850 So that's the situation now. 235 00:16:43,850 --> 00:16:44,640 These are methods. 236 00:16:44,640 --> 00:16:48,350 This is certainly not out of date, 237 00:16:48,350 --> 00:16:51,330 and I'm sure it's the method of choice 238 00:16:51,330 --> 00:16:54,560 and it's carefully coded and well understood, 239 00:16:54,560 --> 00:16:59,530 but these are quite effective. 240 00:16:59,530 --> 00:17:00,250 OK. 241 00:17:00,250 --> 00:17:03,390 So my job then is to say something 242 00:17:03,390 --> 00:17:06,270 about these interior point methods. 243 00:17:06,270 --> 00:17:13,980 And the beauty of these is that the primal -- 244 00:17:13,980 --> 00:17:16,885 this is called the primal problem. 245 00:17:16,885 --> 00:17:17,510 Primal problem. 246 00:17:17,510 --> 00:17:20,260 And often you write a P for primal. 247 00:17:20,260 --> 00:17:22,690 It means the given problem. 248 00:17:22,690 --> 00:17:25,250 And over here is the dual problem. 249 00:17:25,250 --> 00:17:34,810 So you put a D, dual problem. 
[UNINTELLIGIBLE PHRASE] 250 00:17:34,810 --> 00:17:40,590 It involves the same data: the same b, 251 00:17:40,590 --> 00:17:47,280 the same A, and the same c, but a new variable y. 252 00:17:47,280 --> 00:18:03,420 And [UNINTELLIGIBLE PHRASE] is really 253 00:18:03,420 --> 00:18:06,590 the Lagrange multiplier in the original problem 254 00:18:06,590 --> 00:18:08,360 for the constraints. 255 00:18:12,950 --> 00:18:19,190 So I won't go at the dual problem exactly that way. 256 00:18:19,190 --> 00:18:30,320 I'm going to ask you just to consider this problem 257 00:18:30,320 --> 00:18:32,940 and show you the relation between the two. 258 00:18:32,940 --> 00:18:36,050 So what I want to say is that these two problems -- 259 00:18:36,050 --> 00:18:43,910 the primal and the dual, which use the same data A, b, c, 260 00:18:43,910 --> 00:18:49,351 are intimately related, and sort of solving one solves the other 261 00:18:49,351 --> 00:18:49,850 one. 262 00:18:49,850 --> 00:18:51,510 Actually is this it? 263 00:18:51,510 --> 00:18:53,200 That applies to the simplex method. 264 00:18:53,200 --> 00:18:57,270 When the simplex method finds the best corner, 265 00:18:57,270 --> 00:18:59,700 we could read off the Lagrange multipliers, 266 00:18:59,700 --> 00:19:01,360 we could read off y. 267 00:19:01,360 --> 00:19:04,790 We could read off the optimal y. 268 00:19:04,790 --> 00:19:07,630 So my picture was in the primal case, 269 00:19:07,630 --> 00:19:11,870 but there's a dual picture in the dual case. 270 00:19:11,870 --> 00:19:12,680 OK. 271 00:19:12,680 --> 00:19:18,010 So we have a minimum problem and a maximum problem, 272 00:19:18,010 --> 00:19:21,770 and I'm using this word duality. 273 00:19:21,770 --> 00:19:25,740 So what I want to do is tell you how 274 00:19:25,740 --> 00:19:30,110 do we recognize the winning corner in the primal problem, 275 00:19:30,110 --> 00:19:33,810 and it's beautiful. 
276 00:19:33,810 --> 00:19:40,380 So at the best -- so the optimal x, 277 00:19:40,380 --> 00:19:45,580 let me call it x star and y star, 278 00:19:45,580 --> 00:19:52,880 have min over there equal the max here. 279 00:19:52,880 --> 00:20:01,170 Min of all the c*x's, which is c x star, 280 00:20:01,170 --> 00:20:08,370 equal to a maximum over all of the y's of the y*b's, which is 281 00:20:08,370 --> 00:20:10,600 y star b. 282 00:20:10,600 --> 00:20:12,850 So these are equal at the winner. 283 00:20:15,420 --> 00:20:21,450 That's the essence of this duality. 284 00:20:21,450 --> 00:20:25,850 Duality is about two problems that use the same data 285 00:20:25,850 --> 00:20:27,585 but they look quite different. 286 00:20:27,585 --> 00:20:29,710 You know, they're using the data in different ways. 287 00:20:29,710 --> 00:20:34,660 The cost function there showed up in the constraint here. 288 00:20:34,660 --> 00:20:38,470 The constraint b there showed up in the cost function here. 289 00:20:38,470 --> 00:20:45,620 And even A got flipped, because if I use my usual column vector 290 00:20:45,620 --> 00:20:48,920 notation -- if I just transpose this -- 291 00:20:48,920 --> 00:20:56,430 this would be A transpose y transpose less or equal to c 292 00:20:56,430 --> 00:20:57,130 transpose. 293 00:20:57,130 --> 00:21:00,980 If I wanted to stay with column vectors y transpose and c 294 00:21:00,980 --> 00:21:04,700 transpose, then it would be the transpose of A 295 00:21:04,700 --> 00:21:06,470 that would appear. 296 00:21:06,470 --> 00:21:12,150 So I'll just put transpose with two exclamation marks. 297 00:21:12,150 --> 00:21:15,350 That's typical. 298 00:21:15,350 --> 00:21:20,030 And you often see the word adjoint. 299 00:21:20,030 --> 00:21:24,710 So there are methods in differential equations, 300 00:21:24,710 --> 00:21:27,380 in optimization, called adjoint methods. 301 00:21:27,380 --> 00:21:31,630 Adjoint is just, really, another word for transpose. 
302 00:21:31,630 --> 00:21:36,260 It's a word that applies in differential equations as well 303 00:21:36,260 --> 00:21:41,960 as matrices, so it's kind of a better word, you could say, 304 00:21:41,960 --> 00:21:45,040 where transpose we usually apply to matrices, 305 00:21:45,040 --> 00:21:48,640 but totally the same idea, identical idea. 306 00:21:48,640 --> 00:21:49,510 OK. 307 00:21:49,510 --> 00:21:59,220 So the wonderful thing is that at the moment of success, 308 00:21:59,220 --> 00:22:03,140 at the moment of optimality, these are equal. 309 00:22:03,140 --> 00:22:05,750 A minimum equals a maximum. 310 00:22:05,750 --> 00:22:12,090 And that's one way to recognize that you've succeeded, 311 00:22:12,090 --> 00:22:15,510 and that's one way to measure how far you have 312 00:22:15,510 --> 00:22:18,430 to go, with the duality gap. 313 00:22:18,430 --> 00:22:23,780 So the duality gap would be the difference. 314 00:22:23,780 --> 00:22:29,150 If you had a particular y that wasn't the winner, 315 00:22:29,150 --> 00:22:32,270 a particular x that wasn't the winner, 316 00:22:32,270 --> 00:22:38,790 the duality gap would be the difference between c*x and y*b. 317 00:22:38,790 --> 00:22:44,920 And what I'm saying is that when that duality gap narrows to 0, 318 00:22:44,920 --> 00:22:47,790 you've got it. 319 00:22:47,790 --> 00:22:51,610 When this narrows to 0, you've brought c*x down as far as you 320 00:22:51,610 --> 00:22:56,070 could, you've raised y*b up as far as you could. 321 00:22:56,070 --> 00:23:02,240 And if you did it right, if you've got to the optimum, 322 00:23:02,240 --> 00:23:06,400 then the duality gap disappeared -- became 0. 323 00:23:06,400 --> 00:23:10,440 So that's a measure of am I at the answer, 324 00:23:10,440 --> 00:23:14,620 am I close, you know, if we're going to do an iterative method 325 00:23:14,620 --> 00:23:15,710 as I'm planning. 326 00:23:15,710 --> 00:23:18,950 So that's the point, of course. 
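A small sketch of the duality gap as a progress measure, assuming the primal is "minimize c*x subject to A*x greater or equal b, x greater or equal 0" and the dual is "maximize y*b subject to y*A less or equal c, y greater or equal 0", which are the constraints used in the weak duality argument. The data and the helper names (`dot`, `primal_feasible`, `dual_feasible`, `duality_gap`) are illustrative assumptions, not from the lecture.

```python
# Hypothetical data (not from the lecture), in the lecture's form:
#   primal: minimize c*x  subject to A*x >= b, x >= 0
#   dual:   maximize y*b  subject to y*A <= c, y >= 0
c = [1, 4]
A = [[1, 1], [1, 3]]
b = [4, 6]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x):
    Ax = [dot(row, x) for row in A]
    return all(xi >= 0 for xi in x) and all(l >= r for l, r in zip(Ax, b))

def dual_feasible(y):
    yA = [dot(y, col) for col in zip(*A)]  # row vector y times each column of A
    return all(yi >= 0 for yi in y) and all(l <= r for l, r in zip(yA, c))

def duality_gap(x, y):
    """c*x - y*b for a feasible pair; it is zero exactly when both are optimal."""
    if not (primal_feasible(x) and dual_feasible(y)):
        raise ValueError("x and y must both be feasible")
    return dot(c, x) - dot(y, b)

gap_early = duality_gap([3, 1], [1, 0])  # feasible but not optimal: 7 - 4
gap_final = duality_gap([6, 0], [0, 1])  # the optimal pair for this data: 6 - 6
print(gap_early, gap_final)
```

Here the gap shrinks from 3 to 0, which is the signal that x = (6, 0) and y = (0, 1) are the winners for this particular data.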
327 00:23:18,950 --> 00:23:23,910 These interior point methods will be iterative. 328 00:23:26,870 --> 00:23:29,900 We step, we never actually allow them 329 00:23:29,900 --> 00:23:36,660 to get to the absolute corner until maybe at the last minute. 330 00:23:36,660 --> 00:23:40,350 So here, let me draw a picture of how interior point 331 00:23:40,350 --> 00:23:41,620 methods might work. 332 00:23:41,620 --> 00:23:47,250 So here is the feasible set -- some kind of a polyhedron, 333 00:23:47,250 --> 00:23:48,780 whatever. 334 00:23:48,780 --> 00:23:54,030 So think of that as a kind of a diamond, a twenty-four carat 335 00:23:54,030 --> 00:23:54,530 diamond. 336 00:23:54,530 --> 00:23:55,940 OK? 337 00:23:55,940 --> 00:24:00,730 And start at a point inside. 338 00:24:00,730 --> 00:24:05,030 And somehow find the gradient, decide which way to move, 339 00:24:05,030 --> 00:24:07,790 dot dot dot dot. 340 00:24:07,790 --> 00:24:10,920 And there'll be some barrier here 341 00:24:10,920 --> 00:24:13,760 which is going to prevent us from reaching it, 342 00:24:13,760 --> 00:24:15,180 so we'll stop. 343 00:24:15,180 --> 00:24:18,560 So that will be one step, and from here we 344 00:24:18,560 --> 00:24:21,990 will do the same thing, whatever it is. 345 00:24:21,990 --> 00:24:23,590 It'll be Newton's method, actually. 346 00:24:23,590 --> 00:24:24,220 You'll see. 347 00:24:24,220 --> 00:24:26,710 It's just Newton's method. 348 00:24:26,710 --> 00:24:30,160 The most fundamental way to solve non-linear equations 349 00:24:30,160 --> 00:24:31,900 is Newton's method. 350 00:24:31,900 --> 00:24:34,950 And it'll take another direction. 351 00:24:34,950 --> 00:24:38,700 Again it'll stop, and the thing will follow some path. 352 00:24:38,700 --> 00:24:46,370 And then maybe at this point the duality gap is very small. 353 00:24:46,370 --> 00:24:49,270 We'll realize that this is the winner. 354 00:24:49,270 --> 00:24:54,780 So we could, at the last minute, say: OK, jump to the winner. 
355 00:24:54,780 --> 00:24:58,890 But it's this path through the interior 356 00:24:58,890 --> 00:25:01,710 that we're really interested in. 357 00:25:01,710 --> 00:25:06,070 OK, so I'm giving a sort of general picture of it. 358 00:25:06,070 --> 00:25:09,840 And now I'm ready to do two things. 359 00:25:09,840 --> 00:25:12,870 One is the nice little bit of algebra 360 00:25:12,870 --> 00:25:19,560 that says that this duality gap is always greater or equal 0. 361 00:25:19,560 --> 00:25:20,600 OK. 362 00:25:20,600 --> 00:25:23,840 So that's called weak duality. 363 00:25:23,840 --> 00:25:32,250 Weak duality, which is easy to prove, says that, always, c*x, 364 00:25:32,250 --> 00:25:38,860 for any feasible x, is greater or equal to y*b for any 365 00:25:38,860 --> 00:25:40,410 feasible y. 366 00:25:40,410 --> 00:25:47,380 So any x and y that satisfy the constraints. 367 00:25:51,740 --> 00:25:58,350 I should say satisfying the constraints. 368 00:25:58,350 --> 00:26:01,640 So weak duality I'll now prove in one second. 369 00:26:05,210 --> 00:26:16,540 And the point is that, as I push to bring c*x down -- 370 00:26:16,540 --> 00:26:22,380 minimize -- as I push to move y*b up -- maximize -- 371 00:26:22,380 --> 00:26:25,760 they will meet, at the winner. 372 00:26:25,760 --> 00:26:29,080 OK, now how do I prove c*x greater or equal y*b? 373 00:26:29,080 --> 00:26:30,980 Let me try to prove that. 374 00:26:30,980 --> 00:26:34,300 Proof. 375 00:26:34,300 --> 00:26:36,950 OK, so look at y*b. 376 00:26:36,950 --> 00:26:38,020 OK. 377 00:26:38,020 --> 00:26:41,370 Now, so I know something about the constraints. 378 00:26:41,370 --> 00:26:45,600 A*x is greater or equal to b. 379 00:26:45,600 --> 00:26:51,600 So this b -- I want to say that this is less or equal to y*A*x. 380 00:26:51,600 --> 00:26:54,730 Now am I allowed to say that? 381 00:26:54,730 --> 00:26:58,580 First of all, y is feasible; x is feasible. 382 00:26:58,580 --> 00:27:00,550 So they satisfy. 
383 00:27:00,550 --> 00:27:04,550 Feasible means that these are satisfied 384 00:27:04,550 --> 00:27:06,440 and these are satisfied. 385 00:27:09,420 --> 00:27:10,940 OK. 386 00:27:10,940 --> 00:27:14,520 And do you see that that's really all right? 387 00:27:14,520 --> 00:27:17,550 Well, you might say no problem. 388 00:27:17,550 --> 00:27:19,080 A*x is greater or equal b. 389 00:27:19,080 --> 00:27:20,490 It's obvious. 390 00:27:20,490 --> 00:27:25,100 But I have actually used one more point here, haven't I? 391 00:27:25,100 --> 00:27:29,400 If I have an inequality, then I'm multiplying it by y, 392 00:27:29,400 --> 00:27:32,240 and it didn't change the direction of the inequality 393 00:27:32,240 --> 00:27:37,850 sign, and that was because y is greater or equal 0. 394 00:27:37,850 --> 00:27:41,050 That's where that paid off. 395 00:27:41,050 --> 00:27:44,760 So this used the fact that -- this came from the fact that y 396 00:27:44,760 --> 00:27:48,780 was greater or equal 0 and A*x was greater or equal b. 397 00:27:51,980 --> 00:27:56,900 Those two facts meant that I could multiply and preserve 398 00:27:56,900 --> 00:27:58,330 the inequality sign. 399 00:27:58,330 --> 00:28:01,390 And now I'm going to go to the next step: 400 00:28:01,390 --> 00:28:06,450 y*A is less or equal c, and there is x. 401 00:28:06,450 --> 00:28:12,420 OK, so you see that I finally got what I want: 402 00:28:12,420 --> 00:28:14,750 y*b less or equal to c*x. 403 00:28:14,750 --> 00:28:19,030 But what went into that step? 404 00:28:19,030 --> 00:28:25,960 Well, looking here, I had y*A less or equal to c, 405 00:28:25,960 --> 00:28:31,490 and I also had x greater or equal 0 by the feasibility 406 00:28:31,490 --> 00:28:32,570 of x. 407 00:28:32,570 --> 00:28:37,820 So that inequality I was allowed to multiply 408 00:28:37,820 --> 00:28:41,990 by x because x is not negative. 
409 00:28:41,990 --> 00:28:45,370 If x had been minus 1, then when you -- right, 410 00:28:45,370 --> 00:28:49,750 if I have an inequality like 4 less or equal 7, 411 00:28:49,750 --> 00:28:53,810 if I multiply by minus 1, I get minus 4 and minus 7, 412 00:28:53,810 --> 00:28:58,350 and the inequality switches: minus 7 is below minus 4. 413 00:28:58,350 --> 00:29:00,300 But that's not what's happening here, 414 00:29:00,300 --> 00:29:03,940 because the x is not negative. 415 00:29:03,940 --> 00:29:10,650 So this is not what's happening, and I'm OK. 416 00:29:10,650 --> 00:29:15,200 So the conclusion was exactly what I wanted -- 417 00:29:15,200 --> 00:29:18,430 that y*b was less or equal to c*x. 418 00:29:18,430 --> 00:29:24,310 And you see how perfectly it used the four inequality 419 00:29:24,310 --> 00:29:25,970 constraints. 420 00:29:25,970 --> 00:29:26,650 OK. 421 00:29:26,650 --> 00:29:33,280 So that's the weak duality where the proof is easy. 422 00:29:33,280 --> 00:29:36,280 Just use what's given. 423 00:29:36,280 --> 00:29:40,710 The duality, without the word weak, 424 00:29:40,710 --> 00:29:47,990 is the fact that at the optimum, the gap is 0, 425 00:29:47,990 --> 00:29:51,780 and actually we can see -- that will tell us a lot. 426 00:29:51,780 --> 00:29:53,430 That will tell us a lot. 427 00:29:53,430 --> 00:29:56,420 When could this gap be 0? 428 00:29:56,420 --> 00:30:01,910 So at the optimum, y star, equality is holding throughout. 429 00:30:01,910 --> 00:30:06,110 So if equality is holding, how can that be? 430 00:30:06,110 --> 00:30:10,440 How can I take these -- of course the inequality, 431 00:30:10,440 --> 00:30:13,100 the x star and y star are feasible. 432 00:30:13,100 --> 00:30:16,810 So if I just put stars on all these things, 433 00:30:16,810 --> 00:30:21,310 then I would have -- everything would still be totally true. 
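The weak duality argument can be written as one chain of inequalities. This is a restatement of the spoken proof in LaTeX, with y a row vector and x a column vector as in the lecture; nothing new is assumed beyond the four feasibility constraints.

```latex
% Feasibility: A x \ge b,\; x \ge 0 \quad\text{and}\quad y A \le c,\; y \ge 0.
\[
  y\,b \;\le\; y\,(A x) \;=\; (y A)\,x \;\le\; c\,x .
\]
% First inequality: multiply A x \ge b on the left by y \ge 0.
% Second inequality: multiply y A \le c on the right by x \ge 0.
% Hence the duality gap c\,x - y\,b is never negative.
```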
434 00:30:21,310 --> 00:30:24,915 But when I put stars on them -- so I'm picking the optimal guys 435 00:30:24,915 --> 00:30:28,900 -- then equality is holding. 436 00:30:28,900 --> 00:30:33,700 I still have these inequalities, so what I want to find 437 00:30:33,700 --> 00:30:36,490 is the optimality conditions. 438 00:30:36,490 --> 00:30:39,490 How are they related? 439 00:30:39,490 --> 00:30:43,580 If I have y greater or equal 0 and A*x greater or equal b 440 00:30:43,580 --> 00:30:53,471 and I multiply, how could I get equality? 441 00:30:53,471 --> 00:30:53,970 Right? 442 00:30:53,970 --> 00:30:59,620 For example, if I have 3 greater than 0 and 5 greater than 2, 443 00:30:59,620 --> 00:31:04,790 if I multiply those, I get 15 greater than 0, I guess, 444 00:31:04,790 --> 00:31:08,900 and that's far from equality, right? 445 00:31:08,900 --> 00:31:12,100 So how could equality happen? 446 00:31:12,100 --> 00:31:17,890 Well, the only way is if one or the other of these, if equality 447 00:31:17,890 --> 00:31:26,850 holds in one or the other, then I would be OK. 448 00:31:26,850 --> 00:31:28,500 Yes, do you see that? 449 00:31:28,500 --> 00:31:35,550 If equality held -- these are vector inequalities, 450 00:31:35,550 --> 00:31:38,010 so I'm going really component by component. 451 00:31:38,010 --> 00:31:39,990 Let me write down my conclusion 452 00:31:39,990 --> 00:31:41,670 and then you'll see what I mean. 453 00:31:41,670 --> 00:31:47,100 So these are called the Kuhn-Tucker conditions. 454 00:31:47,100 --> 00:31:48,960 You've seen their names before. 455 00:31:48,960 --> 00:31:52,340 And they're also called -- well, long words -- 456 00:31:52,340 --> 00:31:55,960 complementary slackness. 457 00:31:55,960 --> 00:32:02,200 I'm using words that, if you haven't seen the subject, 458 00:32:02,200 --> 00:32:05,350 you think OK, who needs the long words.
459 00:32:05,350 --> 00:32:08,160 But the idea of slack variable -- 460 00:32:08,160 --> 00:32:16,950 slack is the difference in the -- the slack is c minus y*A, 461 00:32:16,950 --> 00:32:21,840 or over here the slack is A*x minus b. 462 00:32:21,840 --> 00:32:28,670 These are the slack variables. w, let's say, is A*x minus b. 463 00:32:28,670 --> 00:32:30,510 And of course it's greater or equal 0; 464 00:32:30,510 --> 00:32:32,970 that's the nice thing about slack variables. 465 00:32:32,970 --> 00:32:36,070 You know, you've fixed it so it's greater or equal 0. 466 00:32:36,070 --> 00:32:40,660 Here, the slack variable s, for slack, would be what? 467 00:32:40,660 --> 00:32:46,440 c minus y*A, greater or equal 0. 468 00:32:46,440 --> 00:32:49,770 And there's no slack when s is 0. 469 00:32:49,770 --> 00:32:53,530 OK, so that's where the word slackness comes in. 470 00:32:53,530 --> 00:32:57,000 Slack is just the amount of give in the inequality. 471 00:32:57,000 --> 00:32:58,790 So what's the point here? 472 00:32:58,790 --> 00:33:02,290 I was looking at this guy, and the only way 473 00:33:02,290 --> 00:33:08,380 that I could have equality here, when I have inequalities there, 474 00:33:08,380 --> 00:33:16,180 is for each component, I'm going to have to have equality. 475 00:33:16,180 --> 00:33:19,380 And how can I have equality on a component? 476 00:33:19,380 --> 00:33:24,130 Well, I would have it for example, if y was 0. 477 00:33:26,810 --> 00:33:30,520 Then when I multiply, I have equality. 478 00:33:30,520 --> 00:33:31,020 Right? 479 00:33:33,570 --> 00:33:34,940 OK. 480 00:33:34,940 --> 00:33:45,130 Or I could have equality if I had -- let's see, 481 00:33:45,130 --> 00:33:47,770 so this is what I want to say. 482 00:33:47,770 --> 00:33:59,240 So I want to say either y_i is 0 or (A*x)_i is b_i. 
483 00:33:59,240 --> 00:34:03,940 Equality holds in one or other of the two inequalities, 484 00:34:03,940 --> 00:34:09,710 because then, if I multiply them together, I have equality. 485 00:34:09,710 --> 00:34:11,270 Right? 486 00:34:11,270 --> 00:34:14,360 You see that. 487 00:34:14,360 --> 00:34:18,950 If one of those holds, say this one holds, if y_i is 0, 488 00:34:18,950 --> 00:34:24,010 then I certainly can multiply the inequality by y 489 00:34:24,010 --> 00:34:26,010 and I get 0 equals 0. 490 00:34:26,010 --> 00:34:33,160 Or, if A*x is exactly b, then multiplying by y won't change. 491 00:34:33,160 --> 00:34:34,130 OK. 492 00:34:34,130 --> 00:34:40,520 So this is the complementary slackness, one or the other, 493 00:34:40,520 --> 00:34:43,010 that has to hold to get equality. 494 00:34:43,010 --> 00:34:46,010 Now what about this guy? 495 00:34:46,010 --> 00:34:48,820 Equality, same idea here. 496 00:34:48,820 --> 00:34:52,700 I got the inequality by multiplying these together. 497 00:34:52,700 --> 00:34:54,730 When will I get equality? 498 00:34:54,730 --> 00:35:08,750 Only if either x_j is 0 or the j-th component of y*A equals 499 00:35:08,750 --> 00:35:10,390 the j-th component of c. 500 00:35:13,070 --> 00:35:18,160 Again, the same reasoning: that when I multiply two things, 501 00:35:18,160 --> 00:35:21,730 if I get an equality out of two inequalities, 502 00:35:21,730 --> 00:35:25,300 then one of those two at least must have been actually 503 00:35:25,300 --> 00:35:29,140 an equals; otherwise I'd still have a gap. 504 00:35:29,140 --> 00:35:32,210 OK, so this is pretty important. 505 00:35:35,220 --> 00:35:38,850 These are the conditions -- these are our equations. 506 00:35:45,780 --> 00:35:48,230 That tells us when we've won. 507 00:35:48,230 --> 00:35:52,220 So this actually holds for the winners.
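The two complementary slackness conditions just stated (either y_i is 0 or (A*x)_i equals b_i, and either x_j is 0 or (y*A)_j equals c_j) can be tested component by component. This is a rough sketch assuming NumPy; the function name, the tolerance, and the example data are illustrative choices, not part of the lecture.

```python
import numpy as np

def complementary_slackness(A, b, c, x, y, tol=1e-12):
    """True when y_i*(A*x - b)_i = 0 for every i and x_j*(c - y*A)_j = 0 for every j."""
    A, b, c, x, y = (np.asarray(v, dtype=float) for v in (A, b, c, x, y))
    w = A @ x - b   # primal slack, one entry per constraint
    s = c - y @ A   # dual slack, one entry per unknown
    return bool(np.all(np.abs(y * w) <= tol) and np.all(np.abs(x * s) <= tol))

# Hypothetical problem: minimize 5*x1 + 3*x2 subject to x1 + x2 >= 4, x >= 0.
A = [[1.0, 1.0]]
b = [4.0]
c = [5.0, 3.0]

# At the winners x* = (0, 4), y* = 3 both costs equal 12 and the products vanish.
at_optimum = complementary_slackness(A, b, c, x=[0.0, 4.0], y=[3.0])
# At a feasible but non-optimal pair there is slack on both sides of a product.
off_optimum = complementary_slackness(A, b, c, x=[2.0, 3.0], y=[2.0])
print(at_optimum, off_optimum)
```

The check multiplies each multiplier by its own slack, which is exactly the "one or the other must be an equality" argument applied to each component.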
508 00:35:52,220 --> 00:35:59,950 It doesn't hold for all the other guys, but at the winner, 509 00:35:59,950 --> 00:36:04,280 because things are equal here, they 510 00:36:04,280 --> 00:36:07,310 had to be equal at every step, and therefore 511 00:36:07,310 --> 00:36:10,080 the Kuhn-Tucker conditions had to hold at the winner. 512 00:36:10,080 --> 00:36:16,590 So they hold at this winning corner when we find it. 513 00:36:16,590 --> 00:36:21,220 So the simplex method chases corners, finally 514 00:36:21,220 --> 00:36:24,120 gets to a corner, and it would know 515 00:36:24,120 --> 00:36:26,450 it had got there by the fact that it 516 00:36:26,450 --> 00:36:29,940 couldn't decrease any more. 517 00:36:29,940 --> 00:36:31,930 And if you look at the algebra, you 518 00:36:31,930 --> 00:36:36,460 would see that that tells you that the Kuhn-Tucker conditions 519 00:36:36,460 --> 00:36:37,710 are satisfied. 520 00:36:37,710 --> 00:36:44,610 OK, so the only proof I gave was the weak proof, 521 00:36:44,610 --> 00:36:47,600 that c*x is greater or equal to y*b because that's the nice 522 00:36:47,600 --> 00:36:50,250 one. 523 00:36:50,250 --> 00:36:53,250 I've proved that for equality we'd need these. 524 00:36:53,250 --> 00:36:58,570 OK, now I guess I'm ready for the method. 525 00:36:58,570 --> 00:37:05,050 I'm ready for the interior point barrier method 526 00:37:05,050 --> 00:37:07,570 that tells me how to compute. 527 00:37:07,570 --> 00:37:10,010 OK, so I'm at an interior point. 528 00:37:10,010 --> 00:37:11,280 What do I do? 529 00:37:11,280 --> 00:37:15,720 OK, so here's the method; here's the barrier -- 530 00:37:15,720 --> 00:37:17,820 I'll call it a log barrier. 531 00:37:22,960 --> 00:37:28,380 I'll solve the problem of minimizing c*x. 532 00:37:28,380 --> 00:37:30,000 I won't solve the exact problem.
533 00:37:30,000 --> 00:37:34,580 I'm going to minimize c*x minus, I think, 534 00:37:34,580 --> 00:37:37,660 some little number times a barrier, 535 00:37:37,660 --> 00:37:45,320 which is going to be a sum of the logarithms of the x's. 536 00:37:45,320 --> 00:37:50,990 This alpha is going to be a little bit positive. 537 00:37:50,990 --> 00:37:56,760 I'll take it smaller and smaller because this part is really -- 538 00:37:56,760 --> 00:37:58,580 it's that that I really want to minimize. 539 00:37:58,580 --> 00:37:59,080 Right? 540 00:37:59,080 --> 00:38:00,400 That's the original problem. 541 00:38:00,400 --> 00:38:01,500 This is the cost. 542 00:38:01,500 --> 00:38:04,010 I'm adding something to the cost but I'd better just 543 00:38:04,010 --> 00:38:08,090 be sure that I've chosen the sign of alpha correctly. 544 00:38:08,090 --> 00:38:11,820 By the way, this is discussed now 545 00:38:11,820 --> 00:38:17,670 in the latest version of my Linear Algebra 546 00:38:17,670 --> 00:38:21,360 and its Applications textbook, the fourth edition. 547 00:38:23,960 --> 00:38:27,920 Editions one to three of that book and others 548 00:38:27,920 --> 00:38:30,400 have described the simplex method, 549 00:38:30,400 --> 00:38:36,250 and now it was just natural to include the interior point 550 00:38:36,250 --> 00:38:37,190 barrier method. 551 00:38:37,190 --> 00:38:40,190 OK, so why do I call this a barrier? 552 00:38:40,190 --> 00:38:46,240 Because if x_i gets to 0, the log blows up. 553 00:38:49,620 --> 00:38:52,320 The log blows down I should say -- 554 00:38:52,320 --> 00:38:54,140 blows down to minus infinity. 555 00:38:54,140 --> 00:38:58,980 I'm multiplying by minus alpha, so I get positive. 556 00:38:58,980 --> 00:39:00,880 The combination blows up. 
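To see the barrier at work, the modified cost c*x minus alpha times the sum of log x_i can be evaluated as one component of x approaches 0. This is only an illustration assuming NumPy: it ignores the constraints A*x = b, and the function name `barrier_cost`, the vector c, and alpha = 0.1 are hypothetical choices.

```python
import numpy as np

def barrier_cost(c, x, alpha):
    """c*x - alpha * sum(log x_i); treated as +infinity outside x > 0."""
    c, x = np.asarray(c, dtype=float), np.asarray(x, dtype=float)
    if np.any(x <= 0):
        return np.inf  # the barrier: no minimum can sit on the boundary
    return c @ x - alpha * np.sum(np.log(x))

c = [5.0, 3.0]
alpha = 0.1

# Move the first component toward 0 while holding the second at 1.
for t in (1.0, 0.02, 1e-6, 1e-9):
    print(t, barrier_cost(c, [t, 1.0], alpha))

values = [barrier_cost(c, [t, 1.0], alpha) for t in (1e-9, 1e-6, 0.02, 1.0)]
```

Along this line the cost falls until t reaches alpha / c_1 = 0.02, where the derivative 5 - 0.1 / t is zero, and then climbs as t shrinks further, so the minimizer stays strictly inside x > 0.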
557 00:39:00,880 --> 00:39:04,140 It couldn't be the minimum, so you see, 558 00:39:04,140 --> 00:39:07,630 the minimum is never going to make it to x equals 0, 559 00:39:07,630 --> 00:39:13,980 because at x equals 0, the thing I have here is plus infinity. 560 00:39:13,980 --> 00:39:17,370 So now, I'm just going to use gradient method. 561 00:39:20,020 --> 00:39:24,460 I'm going to solve this problem with the constraints, 562 00:39:24,460 --> 00:39:30,200 of course, with the constraints, and set derivatives to 0. 563 00:39:30,200 --> 00:39:33,310 Now I have -- you know, it's not linear anymore -- 564 00:39:33,310 --> 00:39:35,582 the winner is not at a corner anymore. 565 00:39:35,582 --> 00:39:36,790 It's somewhere in the middle. 566 00:39:36,790 --> 00:39:38,310 Calculus operates. 567 00:39:38,310 --> 00:39:40,450 I can set derivative to 0. 568 00:39:40,450 --> 00:39:42,470 OK, so I want to do that. 569 00:39:42,470 --> 00:39:46,420 And of course, I'm still inside this feasible set, 570 00:39:46,420 --> 00:39:51,590 so let me see if I can put down the equations 571 00:39:51,590 --> 00:39:52,830 and the constraints. 572 00:39:52,830 --> 00:39:55,880 OK, so I still have the constraints. 573 00:39:55,880 --> 00:40:04,960 Now, forgive me, but I've made a change to A*x equals b. 574 00:40:04,960 --> 00:40:09,070 I could have started with that as the constraint. 575 00:40:09,070 --> 00:40:12,380 I've made that change to A*x equals b. 576 00:40:15,530 --> 00:40:19,140 How have I done such a thing? 577 00:40:19,140 --> 00:40:22,030 I'm given the problem with A*x greater or equal b, 578 00:40:22,030 --> 00:40:27,350 but I'm also given the slack variable w greater or equal 0. 579 00:40:30,220 --> 00:40:33,980 So I just -- it's just a little trick that's not worth -- 580 00:40:33,980 --> 00:40:36,700 you could just take my word for it, a little trick. 581 00:40:36,700 --> 00:40:41,870 My new variable is the x's and the w's. 
582 00:40:41,870 --> 00:40:47,400 m plus n variables: the n x's and the m w's. 583 00:40:47,400 --> 00:40:53,750 And now, put that together -- so can I just maybe do this over 584 00:40:53,750 --> 00:40:56,460 in the corner here? 585 00:40:56,460 --> 00:41:02,680 Before I start on this, I changed to a new variable that 586 00:41:02,680 --> 00:41:05,610 will be x's and w's. 587 00:41:05,610 --> 00:41:08,730 And that will be greater or equal 0, right? 588 00:41:08,730 --> 00:41:10,610 Because the x was always greater or equal 0, 589 00:41:10,610 --> 00:41:15,620 and the slack says A*x greater or equal b is turned into slack 590 00:41:15,620 --> 00:41:16,860 greater or equal 0. 591 00:41:16,860 --> 00:41:24,540 And now that multiplies A, minus I to give b, 592 00:41:24,540 --> 00:41:31,820 because A*x minus the slack variable is b, 593 00:41:31,820 --> 00:41:37,550 which says that A*x minus b is the slack variable w -- 594 00:41:37,550 --> 00:41:40,640 bring that over and that over -- and that's what we said was 595 00:41:40,640 --> 00:41:41,920 greater or equal 0. 596 00:41:41,920 --> 00:41:45,240 Do you see that I've changed to an equation 597 00:41:45,240 --> 00:41:49,230 by introducing more variables? 598 00:41:49,230 --> 00:41:54,470 Putting the x's and the slacks all together in a big variable 599 00:41:54,470 --> 00:41:55,870 that I'm now going to call x. 600 00:41:55,870 --> 00:42:00,002 So this is now the sum of -- there are m plus n of these 601 00:42:00,002 --> 00:42:11,130 x's now, because -- this is the new x and this is the new A. 602 00:42:11,130 --> 00:42:14,890 You might say: why didn't I just start with equality constraint? 603 00:42:14,890 --> 00:42:16,840 And I certainly could have done. 604 00:42:16,840 --> 00:42:21,580 But just to see that inequalities have their place 605 00:42:21,580 --> 00:42:25,040 too, and to see that we can get between one and the other.
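The change from A*x greater or equal b to an equality with m plus n unknowns can be written out directly: the new A is the block [A, -I] and the new x stacks the x's on top of the slacks w. This is a small sketch assuming NumPy; the function name and the data are illustrative, and giving the slack variables zero cost is the usual convention rather than something stated in the lecture.

```python
import numpy as np

def to_equality_form(A, c):
    """Return the new A = [A, -I] and a cost vector padded with zeros for the slacks."""
    A, c = np.asarray(A, dtype=float), np.asarray(c, dtype=float)
    m, n = A.shape
    A_new = np.hstack([A, -np.eye(m)])        # m by (n + m)
    c_new = np.concatenate([c, np.zeros(m)])  # assumption: slacks cost nothing
    return A_new, c_new

# Hypothetical problem: x1 + x2 >= 4 with cost 5*x1 + 3*x2.
A = np.array([[1.0, 1.0]])
b = np.array([4.0])
c = np.array([5.0, 3.0])
A_new, c_new = to_equality_form(A, c)

x = np.array([2.0, 3.0])       # feasible for the inequality form
w = A @ x - b                  # slack, nonnegative because A*x >= b
x_new = np.concatenate([x, w]) # the m + n unknowns, all >= 0

print(A_new @ x_new, b)        # A*x - w reproduces b
```

Feasibility of the original problem (A*x greater or equal b, x greater or equal 0) becomes A_new times x_new equals b with x_new greater or equal 0, and the cost is unchanged.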
606 00:42:25,040 --> 00:42:31,310 OK, so now this is the problem with equality constraint. 607 00:42:31,310 --> 00:42:38,770 So my new constraints are A*x equals b and x greater or equal 608 00:42:38,770 --> 00:42:39,780 0. 609 00:42:39,780 --> 00:42:41,770 That's the primal constraint. 610 00:42:41,770 --> 00:42:44,940 And what's the dual constraint? 611 00:42:44,940 --> 00:42:48,740 So the dual constraint is y greater or equal 0. 612 00:42:48,740 --> 00:42:50,660 Right? 613 00:42:50,660 --> 00:42:57,330 And, OK I have to get this right because we're right at the end. 614 00:42:57,330 --> 00:43:05,240 And the slack, let me just write the slack one. 615 00:43:05,240 --> 00:43:08,930 The slack one -- s is the slack. 616 00:43:08,930 --> 00:43:10,010 This is s. 617 00:43:10,010 --> 00:43:14,170 I'm going to transpose so that I have consistently column 618 00:43:14,170 --> 00:43:14,980 vectors. 619 00:43:14,980 --> 00:43:24,560 So that, when I transpose, it says that A transpose y plus s 620 00:43:24,560 --> 00:43:26,180 is c. 621 00:43:26,180 --> 00:43:27,600 Right? 622 00:43:27,600 --> 00:43:31,110 I put that over there with the s and transpose 623 00:43:31,110 --> 00:43:32,970 to get column vectors. 624 00:43:32,970 --> 00:43:35,040 I like to have column vectors. 625 00:43:35,040 --> 00:43:39,570 OK, so those are the constraints, but now, 626 00:43:39,570 --> 00:43:44,730 what's the derivative equals 0 equation? 627 00:43:44,730 --> 00:43:50,380 Derivative equals 0 is the derivative of this equals 0. 628 00:43:50,380 --> 00:43:52,260 So what does that say? 629 00:43:52,260 --> 00:43:56,440 That says that if I set the derivative as 0, 630 00:43:56,440 --> 00:44:01,760 that says that c_i -- x, remember, is -- 631 00:44:01,760 --> 00:44:04,380 well x has got all these components. 632 00:44:04,380 --> 00:44:11,540 c_i is alpha and the derivative of log x_i, of course, 633 00:44:11,540 --> 00:44:14,810 is 1 over x_i.
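One way to read the derivative-equals-zero equation is to write it out with a multiplier for the equality constraint. The following is a sketch under assumptions the lecture does not spell out: the multiplier for A*x = b is identified with the dual variable y, and the sign convention is chosen so that s = c - A transpose y matches the slack defined above.

```latex
\begin{aligned}
L(x, y) &= c^{T}x \;-\; \alpha \sum_{i} \log x_i \;-\; y^{T}(Ax - b), \\
\frac{\partial L}{\partial x_i} &= c_i \;-\; \frac{\alpha}{x_i} \;-\; (A^{T}y)_i \;=\; 0, \\
s_i &= c_i - (A^{T}y)_i \;=\; \frac{\alpha}{x_i}
\quad\Longleftrightarrow\quad x_i\, s_i = \alpha .
\end{aligned}
```

Read this way, the nonlinear equation is x_i times s_i equals alpha for every i. As alpha goes to 0 it turns into x_i times s_i equals 0, which is the complementary slackness condition from earlier.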
634 00:44:14,810 --> 00:44:20,440 So that's the equation for derivative equals 0. 635 00:44:20,440 --> 00:44:23,090 So this is what I'm solving. 636 00:44:23,090 --> 00:44:23,590 OK. 637 00:44:26,690 --> 00:44:29,890 Equality is here, equality is here, 638 00:44:29,890 --> 00:44:32,294 equality is here, but nonlinear. 639 00:44:32,294 --> 00:44:33,460 This is of course nonlinear. 640 00:44:43,570 --> 00:44:50,680 So Newton's method just says linearize. 641 00:44:50,680 --> 00:44:55,590 Newton's method is just linearize at the point, 642 00:44:55,590 --> 00:45:00,420 and that gives you the direction to move. 643 00:45:00,420 --> 00:45:04,570 And you move that direction because you've linearized. 644 00:45:04,570 --> 00:45:10,440 As you move, you're wandering a little away from precision, 645 00:45:10,440 --> 00:45:16,430 from perfection, but if you don't take too big a step, 646 00:45:16,430 --> 00:45:18,430 Newton is safe. 647 00:45:18,430 --> 00:45:21,130 Maybe, since this is a course in scientific computing, 648 00:45:21,130 --> 00:45:24,630 I should've written on the very first day in big letters 649 00:45:24,630 --> 00:45:31,620 Newton, because that idea of following the gradient 650 00:45:31,620 --> 00:45:36,400 is the central method of solving non-linear equations. 651 00:45:36,400 --> 00:45:39,160 And then on the board beneath, I would 652 00:45:39,160 --> 00:45:41,600 have written in big letters "carefully," 653 00:45:41,600 --> 00:45:46,610 because the derivative is a local thing. 654 00:45:46,610 --> 00:45:51,590 And if you follow the derivative a long distance, 655 00:45:51,590 --> 00:45:55,780 follow the derivative here a long distance out to here, 656 00:45:55,780 --> 00:46:03,130 who knows what -- you've lost the safety of Newton's method. 
657 00:46:03,130 --> 00:46:08,100 So Newton's method always comes in reality 658 00:46:08,100 --> 00:46:11,760 with some kind of a trust region, some region where 659 00:46:11,760 --> 00:46:17,150 you can rely on the derivative being 660 00:46:17,150 --> 00:46:21,840 a reasonable approximation of the way the function is moving. 661 00:46:21,840 --> 00:46:25,130 OK, so we do that here too. 662 00:46:25,130 --> 00:46:26,520 OK. 663 00:46:26,520 --> 00:46:36,800 Maybe I won't write out in full notation -- 664 00:46:36,800 --> 00:46:39,500 what does Newton's method do, actually? 665 00:46:39,500 --> 00:46:47,370 So Newton's method, we're at a particular x, y, s, 666 00:46:47,370 --> 00:46:51,450 and we've got to move. 667 00:46:51,450 --> 00:46:54,550 So the unknowns are -- the components of x, 668 00:46:54,550 --> 00:46:58,130 the components of y, and the components of s. 669 00:46:58,130 --> 00:47:03,350 So Newton's method takes steps: a delta x, a delta y, 670 00:47:03,350 --> 00:47:11,500 and a delta s, computes what those should be, 671 00:47:11,500 --> 00:47:15,780 and then that gives the direction, 672 00:47:15,780 --> 00:47:19,210 and if you take them exactly, that's the full Newton 673 00:47:19,210 --> 00:47:22,660 step, which you would be very happy to do because that gives 674 00:47:22,660 --> 00:47:27,030 terrific convergence, but if it's too big a step, 675 00:47:27,030 --> 00:47:28,800 then you have to cut back. 676 00:47:28,800 --> 00:47:32,660 So the equations for these are what you need, 677 00:47:32,660 --> 00:47:37,310 so there'll be an A delta x will be 0. 678 00:47:37,310 --> 00:47:39,640 Because b isn't changing. 679 00:47:39,640 --> 00:47:50,690 There will be an A transpose delta y; A transpose delta y 680 00:47:50,690 --> 00:47:55,900 plus delta s will be 0 because the c isn't changing.
681 00:47:55,900 --> 00:47:58,570 And then we'll get an equation out 682 00:47:58,570 --> 00:48:04,490 of this, which is a really significant one that maybe 683 00:48:04,490 --> 00:48:08,700 time is running out on and I'm not going to do justice to. 684 00:48:08,700 --> 00:48:14,400 But that's the nonlinear term, where, you see, 685 00:48:14,400 --> 00:48:20,790 if I keep A delta x zero, then my new x is exactly feasible 686 00:48:20,790 --> 00:48:21,290 right? 687 00:48:21,290 --> 00:48:25,590 If I'm at an A*x equals b and I move it by a delta x 688 00:48:25,590 --> 00:48:30,570 that's in the null space, then I still have -- 689 00:48:30,570 --> 00:48:35,820 all I'm saying is that when I take that step I will have A x 690 00:48:35,820 --> 00:48:38,350 plus delta x still equal to b. 691 00:48:38,350 --> 00:48:39,130 Good. 692 00:48:39,130 --> 00:48:40,910 Constraints still satisfied. 693 00:48:40,910 --> 00:48:46,070 When I take this step, since that's linear, the constraint, 694 00:48:46,070 --> 00:48:50,260 when I add on the delta y and the delta s and the 0, 695 00:48:50,260 --> 00:48:53,980 I still have -- my new point still satisfies that 696 00:48:53,980 --> 00:48:54,890 constraint. 697 00:48:54,890 --> 00:48:58,980 But this is of course not exactly satisfied. 698 00:48:58,980 --> 00:49:02,580 If I had the solution to this, I'd be done. 699 00:49:02,580 --> 00:49:03,900 That's my problem. 700 00:49:03,900 --> 00:49:07,390 Anyway, so it's not exactly satisfied. 701 00:49:07,390 --> 00:49:14,520 Newton would tell you a linearization of it, 702 00:49:14,520 --> 00:49:19,050 and you would move in that gradient direction to try 703 00:49:19,050 --> 00:49:24,030 to make the thing -- to try to make equality hold, 704 00:49:24,030 --> 00:49:30,370 because our current x doesn't have equality holding. 705 00:49:30,370 --> 00:49:35,060 And of course the c is A transpose y plus s. 706 00:49:35,060 --> 00:49:41,210 So that equation -- you see what's going on here? 
707 00:49:41,210 --> 00:49:46,000 This is A transpose y plus s, and the x is multiplying those, 708 00:49:46,000 --> 00:49:47,830 so there's a product there. 709 00:49:47,830 --> 00:49:50,930 And when I take the derivative, it's a product rule, 710 00:49:50,930 --> 00:49:52,390 I get two terms. 711 00:49:52,390 --> 00:50:00,200 Anyway, I get a third equation from here 712 00:50:00,200 --> 00:50:03,460 that connects delta x, delta y, and delta s. 713 00:50:03,460 --> 00:50:09,420 I take that step and that's my interior point method. 714 00:50:09,420 --> 00:50:13,400 That's my Newton step. 715 00:50:13,400 --> 00:50:18,790 So maybe I just end by reporting the results, 716 00:50:18,790 --> 00:50:20,940 so I'll end with just two comments. 717 00:50:20,940 --> 00:50:23,280 First is, is the method any good? 718 00:50:23,280 --> 00:50:26,120 And of course you only know by trying. 719 00:50:26,120 --> 00:50:29,460 And the answer is yeah, in -- typically, 720 00:50:29,460 --> 00:50:34,920 you get the duality gap down below 10 to the minus 8, 721 00:50:34,920 --> 00:50:38,160 which is usually very satisfactory, 722 00:50:38,160 --> 00:50:45,410 in 20 to 80 steps. 723 00:50:45,410 --> 00:50:48,340 You can never prove a statement like that, 724 00:50:48,340 --> 00:50:51,170 because you can always create some awful example, 725 00:50:51,170 --> 00:50:55,830 but this is the typical performance of the method. 726 00:50:55,830 --> 00:51:01,620 Which is pretty good, regardless of m and n. 727 00:51:01,620 --> 00:51:06,270 That's what's wonderful -- that the number of steps 728 00:51:06,270 --> 00:51:09,910 doesn't increase with the size of the problem. 729 00:51:09,910 --> 00:51:12,400 Of course, the cost per step does increase 730 00:51:12,400 --> 00:51:13,740 with the size of the problem. 731 00:51:13,740 --> 00:51:16,660 OK so that's the results, and that's 732 00:51:16,660 --> 00:51:20,090 why the method is popular. 
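The Newton step described above can be sketched in a few lines. This assumes NumPy and the standard primal-dual form of the third equation, s_i times delta x_i plus x_i times delta s_i equals alpha minus x_i times s_i, which is the linearization of x_i s_i = alpha; the lecture does not write that equation out, so treat it as an assumption. The function names, the starting point, the 0.9 cut-back factor, and the halving of alpha are all illustrative choices.

```python
import numpy as np

def newton_step(A, x, y, s, alpha):
    """Solve the three linearized equations for (dx, dy, ds)."""
    m, n = A.shape
    J = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],  # A dx = 0
        [np.zeros((n, n)), A.T,              np.eye(n)],         # A^T dy + ds = 0
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],        # s*dx + x*ds = alpha - x*s
    ])
    rhs = np.concatenate([np.zeros(m), np.zeros(n), alpha - x * s])
    d = np.linalg.solve(J, rhs)
    return d[:n], d[n:n + m], d[n + m:]

def step_length(v, dv, fraction=0.9):
    """Largest step up to 1 that keeps v + t*dv strictly positive (the cut-back)."""
    shrinking = dv < 0
    if not np.any(shrinking):
        return 1.0
    return min(1.0, fraction * np.min(-v[shrinking] / dv[shrinking]))

# Hypothetical problem in equality form: minimize 5*x1 + 3*x2, x1 + x2 - w = 4.
A = np.array([[1.0, 1.0, -1.0]])
b = np.array([4.0])
c = np.array([5.0, 3.0, 0.0])

x = np.array([2.0, 3.0, 1.0])  # interior point: A x = b, x > 0
y = np.array([1.0])
s = c - A.T @ y                # (4, 2, 1) > 0, so A^T y + s = c holds

initial_gap = x @ s            # equals c*x - y*b for feasible points
alpha = 1.0
for _ in range(20):
    dx, dy, ds = newton_step(A, x, y, s, alpha)
    t = min(step_length(x, dx), step_length(s, ds))
    x, y, s = x + t * dx, y + t * dy, s + t * ds
    alpha *= 0.5               # take the barrier weight smaller and smaller

gap = x @ s
print(initial_gap, gap)
```

Because delta x stays in the null space of A and delta s equals minus A transpose delta y, every iterate remains feasible, and x*s is exactly the duality gap c*x minus y*b. Production solvers choose alpha and the step length far more carefully than this sketch does.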
733 00:51:20,090 --> 00:51:23,840 And now I just wanted to not leave duality, 734 00:51:23,840 --> 00:51:28,470 which is such a key idea, without going back 735 00:51:28,470 --> 00:51:34,270 to our much more familiar problem of quadratics, where 736 00:51:34,270 --> 00:51:36,050 there are quadratic terms. 737 00:51:36,050 --> 00:51:39,720 And the best model you remember was projection. 738 00:51:39,720 --> 00:51:46,160 You remember that we had a vector b and we have the line, 739 00:51:46,160 --> 00:51:52,420 the null space of A. No, this was the column space of A. 740 00:51:52,420 --> 00:51:56,580 This was all A*x's. 741 00:51:56,580 --> 00:52:03,830 And perpendicular to it was the null space of A transpose. 742 00:52:03,830 --> 00:52:10,010 All A transpose y's that equaled 0. 743 00:52:10,010 --> 00:52:11,060 Do you remember this? 744 00:52:11,060 --> 00:52:17,280 This was the model problem for understanding the -- 745 00:52:17,280 --> 00:52:22,590 so that the projection of this solved one problem. 746 00:52:22,590 --> 00:52:27,070 The projection in the other direction -- we called that P. 747 00:52:27,070 --> 00:52:31,570 This was the projection P equal A times the best x. 748 00:52:31,570 --> 00:52:34,720 The projection in the opposite direction 749 00:52:34,720 --> 00:52:39,870 found the e, the error, but it was the solution 750 00:52:39,870 --> 00:52:41,890 to the dual problem. 751 00:52:41,890 --> 00:52:47,050 And now I want to say where was duality in this picture? 752 00:52:47,050 --> 00:52:51,770 Well, duality was -- let me call it e hat, the winning, 753 00:52:51,770 --> 00:52:54,650 the projection, the right guy over here. 754 00:52:54,650 --> 00:52:57,700 Or maybe y hat. 755 00:52:57,700 --> 00:52:59,500 OK, where was duality? 756 00:52:59,500 --> 00:53:02,640 Duality came, in this case, in the fact 757 00:53:02,640 --> 00:53:05,250 that it was Pythagoras. 
758 00:53:05,250 --> 00:53:09,100 Duality in this simple, beautiful problem 759 00:53:09,100 --> 00:53:15,220 was simply the fact that p squared, this winner squared, 760 00:53:15,220 --> 00:53:21,300 plus e squared was b squared. 761 00:53:21,300 --> 00:53:25,880 The winners were the orthogonal projections. 762 00:53:25,880 --> 00:53:28,120 And now where is weak duality? 763 00:53:28,120 --> 00:53:30,460 It's the last second of the lecture. 764 00:53:30,460 --> 00:53:33,840 Weak duality says take something that's allowed, 765 00:53:33,840 --> 00:53:38,650 like that, and take something that's allowed here, like that. 766 00:53:44,650 --> 00:53:47,080 Those are not the winners. 767 00:53:47,080 --> 00:53:50,140 Those don't deserve stars or hats. 768 00:53:50,140 --> 00:53:51,650 They're not the winners. 769 00:53:51,650 --> 00:53:57,640 And compute that squared plus that squared. 770 00:53:57,640 --> 00:53:59,350 So this is any A*x. 771 00:53:59,350 --> 00:54:09,630 So any A*x squared and any y -- let's call that y -- squared. 772 00:54:09,630 --> 00:54:19,060 And what is the inequality that is satisfied by any A*x, 773 00:54:19,060 --> 00:54:24,220 like the wrong one here, and any y, like the wrong one there, 774 00:54:24,220 --> 00:54:27,250 will satisfy? 775 00:54:27,250 --> 00:54:31,260 Pythagoras won't be quite right. 776 00:54:31,260 --> 00:54:35,960 It'll be A*x squared plus y squared. 777 00:54:35,960 --> 00:54:38,530 What do we know about the sum of those two squares? 778 00:54:41,100 --> 00:54:45,080 It's greater than or equal to b squared. 779 00:54:48,440 --> 00:54:51,720 The only way we get this thing split 780 00:54:51,720 --> 00:54:58,580 into two orthogonal parts whose squares add up to b squared 781 00:54:58,580 --> 00:55:02,680 is right triangle. 782 00:55:02,680 --> 00:55:09,390 If I replace this by something longer and I replace this -- 783 00:55:09,390 --> 00:55:10,980 I should take that error really. 
784 00:55:13,850 --> 00:55:17,500 e is really b minus the A*x. 785 00:55:17,500 --> 00:55:19,420 That's what I should be putting here. 786 00:55:19,420 --> 00:55:25,860 This thing should be b minus A*x squared. 787 00:55:28,450 --> 00:55:32,700 Anyway, the duality is in the fact 788 00:55:32,700 --> 00:55:36,570 of getting an equal sign there and weak duality 789 00:55:36,570 --> 00:55:40,390 is the easy inequality that no matter what you do, 790 00:55:40,390 --> 00:55:41,960 you get greater than or equal. 791 00:55:41,960 --> 00:55:46,810 So the duality gap is somehow the gap there, 792 00:55:46,810 --> 00:55:51,640 and the whole subject of optimization 793 00:55:51,640 --> 00:55:53,790 is to bring that gap to 0. 794 00:55:53,790 --> 00:55:58,430 So this is the gap in quadratic problems, 795 00:55:58,430 --> 00:56:01,830 of which this is a neat model, and this was 796 00:56:01,830 --> 00:56:04,660 all about linear programming. 797 00:56:04,660 --> 00:56:09,300 And duality is present for both. 798 00:56:09,300 --> 00:56:16,430 OK, so Friday is the promised lecture on ill-posed problems. 799 00:56:16,430 --> 00:56:21,100 And meanwhile, if two people are willing to put up 800 00:56:21,100 --> 00:56:25,710 a hand now or email me later and say: sure, 801 00:56:25,710 --> 00:56:31,271 I'll take my turn Friday of next week, that would be terrific. 802 00:56:31,271 --> 00:56:31,770 OK. 803 00:56:31,770 --> 00:56:32,710 Thanks. 804 00:56:32,710 --> 00:56:33,780 I see one hand. 805 00:56:33,780 --> 00:56:35,030 OK.