The following content is provided by MIT OpenCourseWare under a Creative Commons license. Additional information about our license and MIT OpenCourseWare in general is available at ocw.mit.edu.

PROFESSOR: Up on the web now are lots of sections, including this one on conjugate gradients, so today is the end of the conjugate gradients discussion, and it's kind of a difficult algorithm. It's beautiful and slightly tricky, but if we just see what the point of conjugate gradients is, we don't have to check all the exact formulas that I've listed here. So that's one step of conjugate gradients, going from k minus 1 to k, and we'll look at each of the five steps in the cycle without patiently checking all those formulas. We'll just, sort of, see how much work is involved.

But let me begin where we were last time, which was Arnoldi. So Arnoldi was a Gram-Schmidt-like algorithm, not exactly Gram-Schmidt, that produced an orthogonal basis, and that basis is the basis for the Krylov subspace. Each new basis vector in the Krylov subspace is found by multiplying by another power of A, but that's a terrible basis. So the basis that we get from those powers has to be improved, orthogonalized, and that's what Arnoldi does. And if we look to see what it does matrix-wise, it turns out to give exactly that nice equation A*Q = Q*H: we're creating a matrix Q with the orthogonal vectors, and a matrix H that tells us how they were connected to the matrix A. OK.

So that's the central equation, and the properties of H are important. We saw that the key property was that, in general, H is full up in that upper corner, just the way Gram-Schmidt would be. But in case A is a symmetric matrix, then we can check from this formula that H will be symmetric. And the fact that it's symmetric means that those are zeros up there, because we know there are zeros down here -- we know there's only one diagonal below the main diagonal. And if it's symmetric, then there's only one diagonal above, and that's the great case, the symmetric case.
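For reference, here is a minimal NumPy sketch of the Arnoldi iteration as just described. It is not checked line by line in lecture; it's a plain dense-matrix version with no safeguard against breakdown:

```python
import numpy as np

def arnoldi(A, b, k):
    """k steps of Arnoldi: orthonormal columns in Q and a (k+1) x k
    Hessenberg matrix H with A @ Q[:, :k] == Q[:, :k+1] @ H."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)        # first basis vector q_1
    for j in range(k):
        v = A @ Q[:, j]                    # multiply by A: next Krylov direction
        for i in range(j + 1):             # orthogonalize against earlier q's
            H[i, j] = Q[:, i] @ v
            v = v - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)    # assumes no breakdown (norm > 0)
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H
```

When A is symmetric, the H that comes back is (up to roundoff) symmetric and tri-diagonal, which is the short-recurrence case the lecture keeps returning to.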
So I'm going to take two seconds just to say, here is the best way to find eigenvalues. We haven't spoken one word about computing eigenvalues, but this is the moment to say that word. If I have a symmetric matrix, eigenvalues are way, way easier; linear systems are also easier, but eigenvalues are an order of magnitude easier for symmetric matrices. And here is a really good way to find the eigenvalues of large matrices.

I assume that you don't want all the eigenvalues. It is very unusual to need many, many eigenvalues of a big matrix. Probably those high eigenvalues are not really physically close, not good representations of the actual eigenvalues of the continuous problem. So we have continuous and discrete that are close for the low eigenvalues, because that's where the error is smooth. High eigenvalues are not so reliable. So for the low eigenvalues, here is a good way to do them.

Do Arnoldi -- and it gets named also, now, after Lanczos, because he had this eigenvalue idea. Run that for a while. Say if you want 10 eigenvalues, maybe go up to j equal to 20. So 20 Arnoldi steps, quite reasonable. And then look at the 20 by 20 submatrix of H; you're not going to go to the end, to H_n. You're going to stop at some point, and you have this nice matrix. It'll be tri-diagonal -- symmetric tri-diagonal -- and that's the kind of matrix we know how to find the eigenvalues of. Eigenvalues of that class of matrices are very nice, and they're very meaningful, and they're very close to -- in many cases; I won't try to give a theory here -- in many cases, you get very good approximations. The eigenvalues of this submatrix are often called Ritz values. And so the first 10 eigenvalues of that 20 by 20 matrix would be, in many, many problems, very fast, very efficient approximations to the low eigenvalues of the big matrix A that you start with.
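A sketch of that recipe, reusing the arnoldi function above; the test matrix and the choice of 20 steps are illustrative:

```python
import numpy as np

# A symmetric test matrix: the -1, 2, -1 second-difference matrix.
n = 400
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

b = np.random.default_rng(0).standard_normal(n)   # any starting vector
Q, H = arnoldi(A, b, 20)                          # 20 Lanczos/Arnoldi steps
ritz = np.linalg.eigvalsh(H[:20, :20])            # eigenvalues of the 20 x 20 piece
print(ritz)   # Ritz values; the ends of the spectrum settle down first
```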
OK, the reason is this. Do you all recognize that this is the operation -- when I bring Q inverse over here -- that makes these matrices similar? Two matrices are similar if there's a Q that connects them this way, Q inverse times A times Q. And similar matrices have the same lambdas, the same eigenvalues. So if we went all the way to the end, the full matrix H would be similar to A, and we could find the eigenvalues from it; but the whole point of all this, of this month, is algorithms where you can stop partway, and you still have something really good.

OK, so I think we can't do everything, so I won't try to do more with eigenvalues except to mention that. Someday you may want the eigenvalues of a large matrix, and I guess ARPACK would be the Arnoldi package; that would be the package to go to for the Arnoldi iteration, and then to use for the eigenvalues. OK, so that's a side point. Since linear algebra is half linear systems and half eigenvalue problems, I didn't think I could miss the chance to do that second half, the eigenvalue part.
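A quick numerical check of that similarity fact, purely illustrative:

```python
import numpy as np

# Similar matrices Q^{-1} A Q have the same eigenvalues.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A = A + A.T                                        # symmetric, to keep eigenvalues real
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # a random orthogonal Q
B = Q.T @ A @ Q                                    # Q^{-1} = Q^T for orthogonal Q
print(np.allclose(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)))   # True
```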
OK, now I'm going to tackle conjugate gradients, to get the idea. So the idea -- well, of course, A is symmetric positive definite. If it's not, you could try these formulas, but you'd be taking a pretty big chance. If it is, these are just right. OK, so what's the main point? Here's the rule that decides what x_k will be. Of course, x_k is in this Krylov space K_k. So we can create x_k recursively, and if we do it right, we should need just one multiplication by A at every step, and then some inner products.

OK, so now, what do we know from this? This vector r_k, which is b minus A*x_k: x_k is in the k-th Krylov subspace, but it's multiplied by A, so A*x_k is up in the k plus first. And b is in K_1; it's in all the Krylov subspaces. And therefore r_k is in K_(k+1), right? It's in the next space up, because there's a multiplication by A.

And therefore, if we choose to determine x_k by this rule, we learn something about this vector. We already know from Arnoldi that q_(k+1) is orthogonal to this space, and it's the new k plus first vector that's in K_(k+1). So this vector r_k is found in the next Krylov subspace, and q_(k+1) is in the next Krylov subspace; they're both orthogonal to all the previous Krylov subspaces, so one's a multiple of the other. So that will mean, automatically, that the r's have the same property that Arnoldi created for the q's: they're orthogonal. So the residuals are orthogonal. Orthogonal residuals, good.

I better say, by the way, when we do conjugate gradients, we don't also do Arnoldi. Arnoldi's finding these q's; we're leaving him behind for a while, because we're going to go directly to the x's and r's here. So I won't know the q's, but I know a lot about the q's, so it's natural to write this simple fact: each residual is a multiple of the new q, and therefore it's orthogonal to all the old ones.

And now, this is the other property that determines the new x. So what do I see here? I see a sort of delta x, the change in x, the correction -- which is what I want to compute -- at step k; that's what I have to find. And I could look at the corrections at the old steps, and notice there's an A in the middle. So the corrections to the x's are orthogonal in the A inner product, and that's where the name conjugate comes from. So this is conjugate directions. Why do I say directions, and why do I say conjugate? Conjugate gradients, even, I could say. Conjugate directions -- I maybe shouldn't have erased that -- but I'll also say conjugate gradients. OK.
So conjugate -- all I mean by conjugate is orthogonal in the natural inner product that goes with the problem, and that's given by A. The inner product given by the problem, the inner product of x and y, is x transpose A*y. And that's a good inner product; it's the one that comes with the problem, and so it's natural to look at orthogonality in that sense, and that's what we have. So the residuals are orthogonal in the usual sense, because they have an A already built into them, and the corrections are orthogonal in the A inner product. OK.
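Written out, that inner product is one line; a minimal sketch, where the symmetric positive definite test matrix is illustrative:

```python
import numpy as np

def a_inner(x, y, A):
    """The inner product that comes with the problem: (x, y)_A = x^T A y."""
    return x @ (A @ y)

# For symmetric positive definite A this is a genuine inner product:
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)            # an illustrative SPD matrix
x, y = rng.standard_normal(3), rng.standard_normal(3)
print(np.isclose(a_inner(x, y, A), a_inner(y, x, A)))   # symmetry
print(a_inner(x, x, A) > 0)                             # positivity for x != 0
```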
So I wrote down two requirements on the new x. Well, actually you might say I wrote down a whole lot of requirements, because I'm saying that this should hold for all the previous i's, and this should hold for all the previous i's. And you see indices are getting in here, and I'm asking for your patience.

What's the point that I want to make here? It's this short recurrence point, this point that we had over here, when H was just tri-diagonal. Let me say it carefully. At the first step I need to find two numbers; because H is symmetric, the entry above the diagonal is already the same as the one below. So at the next step I need these two numbers, and then that number is already that one. So, two numbers at every step, and big zeros up here. That's the key point when A is symmetric. So the same will be true over here. We have two numbers, and they're called alpha and beta. They're slightly different numbers, because we're working with the x's now and not with the q's, so you don't see any q's written down for conjugate gradients. OK.

All right, let me just look at the cycle of five steps. I have to give it a start, so I start with d_0 as b and x_0 as 0, and r_0, which of course is b minus A*x_0; if x_0 is 0, that's also b. So those are the starts.

So I'm ready to take a step. I know the ones with subscript zero, and I'm ready for the next step. In general, I know the ones with subscript k minus 1, and I'm ready for this step, this cycle, k.

So, here's a number. I'm coming into that cycle with a new d. Now what is a d? It's a search direction. It's the way I'm going to go. It's the direction in which this delta x is going to go. Yeah, there is delta x; it's a multiple of d, right? If I put this on this side, that's delta x; the change in x is a multiple of d. So d is the direction in n-dimensional space in which I'm looking for the correction, and alpha is the number, how far I go along d for the best, for the right, delta x. So alpha will be determined by that; it involves an inner product, picking off a component. OK?

So the first step is to find the number alpha; the next step is to take that update: delta x is the right multiple of the search direction that I'm coming in with. Then correcting r is easy -- I'll come back to that; if I know the change in x, I know the change in r right away. So that's given me, I'm now updated. But now I have to look ahead. What direction am I going to travel next? So the next steps are to find a number beta and then the search direction, which isn't r; it's r again with just one correction. So it's a beautiful set of formulas, beautiful. And because I know x's and r's, I could write the formulas differently -- there are different expressions for the same alpha and beta -- but this is a very satisfactory one. OK.
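The whole cycle fits in a few lines; a minimal NumPy sketch following the five steps just described, using one standard version of the alpha and beta formulas rather than the board verbatim:

```python
import numpy as np

def conjugate_gradient(A, b, steps):
    """Plain conjugate gradients for symmetric positive definite A.
    No convergence test: stops after a fixed number of cycles."""
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()       # r_0 = b - A x_0 = b, since x_0 = 0
    d = r.copy()                     # first search direction d_0 = b
    rr = r @ r
    for _ in range(steps):
        Ad = A @ d                   # the one multiplication by A per cycle
        alpha = rr / (d @ Ad)        # step length: how far to go along d
        x = x + alpha * d            # delta x is a multiple of d
        r = r - alpha * Ad           # residual update comes for free
        rr_new = r @ r
        if rr_new == 0.0:            # reached the exact solution
            break
        beta = rr_new / rr           # correction coefficient for the next d
        d = r + beta * d             # new direction: r with one correction
        rr = rr_new
    return x
```

Running this on a small SPD system, one can verify numerically that the residuals come out mutually orthogonal and that successive directions satisfy a_inner(d_i, d_j, A) close to 0, which is exactly the pair of properties above.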
Have a look at that cycle, just to see how much work is involved. How much work is involved in that cycle? OK, well, there's a multiplication by A, no surprise. It's that multiplication by A that takes us to the next Krylov space, takes us to the next update. So I see a multiplication by A here, and also here, but it's the same multiplication. So I see one multiplication by A. So, easy to list: each cycle, there's one multiplication, A times the search direction, and because A is a sparse matrix, the cost there is the number of non-zeros in the matrix. OK.

Then, what other computations do you have to do? I see an inner product there, and I'm going to use it again -- here it is at the next step. So I guess per cycle, I have to do one inner product of r with itself, and I have to do one of these inner products, d transpose A*d. So I see two inner products, and of course these are vectors of length n, so that's 2n multiplications, 2n additions. So the multiplication by A was a multiple of n, depending how sparse A is, and this is 2n adds and 2n multiplies for the two inner products. OK.

And then what else do we have to do? Oh, we have these updates, so I have to multiply a vector by that scalar and add. And again here, I multiply that vector by that scalar and subtract. Oh, and do I have to do it again here? Have I got three? Somehow I thought I might only have two. Let me say two or three vector updates. People are much better than I am at spotting the right way to organize those calculations. But that's really a very small price to pay per cycle, and so if it converges quickly, as it normally does, this is going to be a good algorithm.

Maybe, having introduced the question of how quickly it converges, I should say something about that. So what's the error after k steps? How is it related to the error at the start? I have to first say what I mean by this measure of error, and then I have to say what the factor is -- I hope I see a factor, less than 1, to the k power.
Yeah, so I'm hoping some factor to the k power will be there, and I sure hope it's less than 1. Let me say what it is. This is maybe not the very best estimate, but it shows the main point. It's the square root of lambda_max minus the square root of lambda_min, divided by the square root of lambda_max plus the square root of lambda_min. And maybe there's a factor 2 somewhere. So this is one estimate for the convergence factor, and you see what it depends on. I could divide everything by lambda_min there, so that I could write it as the square root of the condition number minus 1, divided by the square root of the condition number plus 1, to the k power.

So the condition number has a part in this error estimate, and I won't derive that -- even in the notes, I won't. Other people work out other estimates for the error; it's an interesting problem. And I do have to say that this norm is the natural one: the norm of e_k squared is the inner product of e_k with itself, but again with A in the middle. I suppose that a little part of the message here, and on that middle board, is that it's just natural to see the matrix A playing a part -- it just comes in naturally. Conjugate gradients brings it in naturally. These formulas are just created so that this will work. OK.
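To get a feel for that factor, a small worked computation, taking the estimate at face value and ignoring the constant in front:

```python
import numpy as np

# Convergence factor (sqrt(kappa) - 1) / (sqrt(kappa) + 1), with
# kappa = lambda_max / lambda_min, and the number of cycles k to
# shrink the A-norm error by 10^6 if that factor were exact.
for kappa in [10.0, 100.0, 10000.0]:
    rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    k = int(np.ceil(np.log(1e-6) / np.log(rho)))
    print(f"kappa = {kappa:8.0f}   factor = {rho:.4f}   k = {k}")
```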
So I suppose I'm thinking there's one word I haven't really justified, and that's the word gradient. Why do we call it conjugate gradients? Here I erased "conjugate directions," which I was quite happy with -- the directions are the d's, because d is the direction in which we moved from x, so conjugate directions would be just fine, and those words come into this part of numerical mathematics -- but where do gradients come in? OK. So, let me say, gradient of what? Well, first of all, what is the gradient in these linear problems where A*x equals b? Those come from the gradient of the energy.

So can I say E of x for the energy, as 1/2 x transpose A*x minus b transpose x? You might say, where did that come from? Normally we've been talking about linear systems; everything's been in terms of A*x equal b, and now suddenly I'm changing to a different part of mathematics. I'm looking at minimizing -- so optimization, the part that's coming in April, is minimizing energy, minimizing functions -- and what's the link? Well, if I minimize that, I need grad E, the gradient of E, the list of the derivatives of E with respect to the x_i. And if we work that out -- why not pretend everything is scalar for the moment, so these are just numbers -- then the ordinary derivative of 1/2 A x squared is A*x, and the derivative of b*x is b, and that rule continues into the vector case: grad E is A*x minus b.

So what have we got here? We've introduced the natural energy for the problem, and the reason it's natural is that when we minimize it, set the derivatives to zero, we get the equation A*x equal b. The gradient would equal zero at a minimum, so what I'm doing right now is pretty serious. I'll repeat it again in April when we do minimizations, but this quadratic energy has a linear gradient, and its minimum is exactly at A*x equal b. The minimum point is where A*x equals b. In other words, solving the linear equation and minimizing the energy are one and the same. And I'm allowed to use the word min because A is positive definite; the positive definiteness of A is what means that this is really a minimum and not a maximum or a saddle point. OK.
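Since the whole argument rests on grad E = A*x - b, here is a quick finite-difference check of that formula; the SPD matrix is illustrative, and for a quadratic the central difference is exact up to roundoff:

```python
import numpy as np

# Check numerically that the gradient of E(x) = 1/2 x^T A x - b^T x is A x - b.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)            # a symmetric positive definite A
b = rng.standard_normal(4)
x = rng.standard_normal(4)

E = lambda v: 0.5 * v @ A @ v - b @ v
h = 1e-6
I = np.eye(4)
fd_grad = np.array([(E(x + h * I[i]) - E(x - h * I[i])) / (2 * h) for i in range(4)])
print(np.allclose(fd_grad, A @ x - b, atol=1e-5))   # True
```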
So I wanted just to think about how you minimize a function. I guess any sensible person, given a function and looking for the minimum, would figure out the gradient and move that way -- move, well, not up the gradient, move down the gradient. So I imagine this minimization problem: here I'm in x, this is the x_1, x_2 plane, the x's; here's E, and E of x is something like that; maybe its minimum is there. This is the graph of E of x, and I'm trying to find that point, and I make a first guess somewhere there, somewhere on that surface. To help your eye, this is like a bowl, and maybe there's a point on the surface of the bowl. That's x_0, let's say. I'm trying to get to the bottom of the bowl, and as I said, any sensible person would figure out, well, what's the steepest way down. The steepest way down is the direction of the gradient. So I could call this gradient g, if I wanted, and then I would go in the direction of minus g, because I want to go down and not up. We're descending into a valley and trying to get to the bottom. And I'm assuming that this valley has this nice shape given by just a second-degree polynomial. Of course, when we get to optimization, this will be the simplest problem, but not the only one. Here it's our problem.

Now, what happens if I move in the direction water would flow? If I tip a flask of water on the ground, it's going to flow in the steepest direction, and that would be the natural direction to go. But it's not the best direction. Maybe the first step is, but -- so I'm talking now about conjugate gradients seen as a minimization problem, because that's where the word gradient is justified. This is the gradient, and of course, this is just minus the residual, so the residual gives the direction. So that's the question: should I just go in the direction of the residual all the time?
At every step? That is what the method of steepest descent will do, so let me make the contrast. Steepest descent is the first thing you would think of: the direction is r. That's the gradient direction, or the negative gradient direction. Follow r until, in that direction, you've hit bottom. So you see what happens: you'll be going down one side, and you'll hit bottom, and then you would come up again on the plane that's cutting through the surface. But you're not going to hit the minimum, first shot. When you start here, the actual steepest direction, the gradient direction, changes as the water flows down to the bottom; but here we've chosen a fixed direction, we're following it as long as we're going down, and then it'll start up again. So we stop there and find the gradient again there. Follow that down, until it goes up again; follow that down, until it goes up again, and the trouble is, convergence isn't great. The trouble is that we end up -- even if we're in 100-dimensional space -- with repeated gradients that are not really in good, new directions. We find ourselves doing a lot of work to make a little progress, and the reason is that these r's don't have the property that you're always looking for in computations: orthogonality. Somehow orthogonality tells you that you're really getting something new, if it's orthogonal to all the old ones. OK.

So the direction r is not great. The better one is to have some orthogonality. Take a direction d that -- well, here's the point. This is the right direction; this is the direction conjugate gradients chooses. So it's a gradient, but then it removes the component in the direction we just took. That's what that beta will be, when we compute that beta. And I'm a little surprised that it isn't a minus sign -- no, it's OK.
It turns out that way. Let me just be sure that I wrote the formula down right -- at least wrote it down as I have it here. Yep. Again, I'm not checking the formulas, but just describing the ideas. So the idea is that this formula gives us a new direction, which has this second property that we're after: the direction is not actually orthogonal, it's A-orthogonal to the previous direction.

OK, I'm going to draw one picture of these directions, and then that's the best I can do with conjugate gradients. I want to try to draw this search for the bottom of the valley. So I'm going to draw that search by drawing the level sets of the function, the contours. Since my function here is quadratic, all the contours, the level sets, will be ellipses. They'll all be, sort of, similar ellipses, and there is the one I want. That's the bottom of the valley. So this is the contour where the energy is three, this is the contour at energy two, this is the contour at energy one, and there's the point where the energy is a minimum. Well, it might be negative, whatever, but it's the smallest. OK.

So can I compare steepest descent with conjugate gradients? Let me draw one more contour, just so I have enough to make this point. OK, so steepest descent starts somewhere -- starts there, let's say. Suppose that's my starting point. Steepest descent goes, if I draw it right, perpendicular, right? It goes perpendicular to the level contour. You carry along as long as the contours are going down, and then if I continue here, I'm climbing back up the valley. So I'll stop here. Let's suppose that that happened to be tangent right there, OK. Now, that was the first step of steepest descent.
Now I'm at this point. I look at this contour, and if I've drawn it reasonably well, I'm again perpendicular -- steepest descent says go perpendicular to the contour, until you've got to the bottom there. Then down, then up -- do you see this frustrating crawl across the valley? It wouldn't happen if these contours were circles. If A was the identity matrix, these would be perfect circles, and you'd go to the center right away. Well, that only says that A*x equal b is easy to solve when A is the identity matrix. So this is steepest descent, and its rate of convergence is slow, because the longer and thinner the valley, the worse it is. OK.

What about conjugate gradients? Well, I'm just going to draw it -- I'm going to make it look like magic. I'm going to start at the same place. The first step, because I'm starting with zero, the best thing I can do is follow the gradient to there, and that was the first step. Now what? Well, the next direction -- so these are the search directions d_1, d_2, d_3, d_4. Here is d_1, and what does d_2 look like? May I cheat and go straight to the center? It's a little bit of a cheat; I'll go near the center. The next direction is much more like that, and then goes to the bottom, which would be somewhere inside that level set, and then the next direction would probably be very close. So this is not 90 degrees -- unless you're in the A inner product. This is a 90 degree angle in the A inner product. And what does the A inner product do? The A inner product is the one in which the level sets become circles, or spheres, in whatever dimension we want. Well, I don't want to be held to everything here.
If you see this picture, then we've not only made headway with the conjugate gradient method, which is a big deal for solving linear systems, but we've also made headway with the conjugate gradient method for minimizing functions. And if the function wasn't quadratic, and our equations weren't linear, the conjugate gradient idea would still be available. Non-linear conjugate gradients would somehow follow the same idea of A-orthogonal search directions. So these are not the gradients; they're the search directions. OK.

That's sort of my speech about the conjugate gradient method. Without having checked all the formulas in the cycle, we know how much work is involved, we know what we get, and we see a bit of the geometry. And we see it as a linear systems solver and also as an energy minimizer, and those are both important ways to think of conjugate gradients. OK.

I'll just take a few final minutes to say a word about other problems. What if our problem is not symmetric positive definite? What if we have an un-symmetric problem, or a symmetric but not positive definite problem? Well, in that case, these denominators could be zero. If we don't know that A is positive definite, d transpose A*d could be zero, it could be anything, and the method is broken. So I want to suggest a safer, but more expensive, method now. This would be MINRES, minimum residual. The idea is: choose x_k to minimize the norm of r_k, the residual. So minimize the norm of b minus A*x_k. That's my choice of x_k, and the question is, how quickly can I do it? So this is MINRES. And there are other related methods; there's a whole family of methods. What do I want to say about MINRES? I guess, for this, I will start with Arnoldi.
So for this different algorithm, which comes up with a different x_k -- and of course, the x_k is going to be in the Krylov space K_k -- I'm going to use this good basis. Arnoldi gives me the great basis of q's. So I convert the problem. I convert this minimization: I look for x as a combination of the q's. Is that how you see x equals Q*y? Instead of looking for the vector x, I'm looking for the vector y, which tells me what combination of these columns, the q's, x is. In other words, I'm working in the q system, the q basis. So I'm working with y, and I would naturally write this as minimizing, over y, the norm of b minus A*Q*y. Same problem. I'm just deciding, OK, I'm going to find this x as a combination of the q's, because those q's are orthogonal, and that will make this calculation a lot nicer.

OK, what's going to happen here? You know that I'm going to use the property of the q's that A*Q is Q*H. That was the key point that Arnoldi achieved. And the net result is going to be -- of course, I have to do this at level k, so I'm chopping it off, and I have to patiently see, and the notes do that patiently, what happens when we chop off these matrices. We have to watch where the non-zeros appear. In the end, I get to a minimization problem for y: a minimum problem for the matrix H that leads me to the y that I'm after. So it's a minimum problem for a piece of H -- I should really say H_k, some piece of it, like this piece. In fact, exactly that piece.

So suppose I have a least squares problem. This is least squares because I'm just minimizing a norm, the norm squared, in the l^2 sense. And here's my matrix: it's 3 by 2, one extra row. So I'm looking at least squares problems with matrices of that type, and they're easy to solve.
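A sketch of that reduction, reusing the arnoldi function from earlier; at step k the problem becomes a small least squares problem with the (k+1) by k piece of H:

```python
import numpy as np

def minres_sketch(A, b, k):
    """Minimum-residual x_k from K_k: with x = Q y and A Q = Q H,
    ||b - A Q y|| equals ||beta e_1 - H y||, a small least squares problem."""
    Q, H = arnoldi(A, b, k)              # assumes no breakdown in k steps
    rhs = np.zeros(k + 1)
    rhs[0] = np.linalg.norm(b)           # b = ||b|| q_1, so Q^T b = ||b|| e_1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return Q[:, :k] @ y
```

In practice one would call a library routine instead; SciPy, for instance, provides scipy.sparse.linalg.minres for the symmetric indefinite case.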
The notes will give the algorithm that solves the least squares problem. I guess what I'm interested in here is just: is it expensive or not? Well, the least squares problem will not be expensive; it's only got one extra row compared to the number of columns, and that's not bad. What's expensive is the fact that our H is not tri-diagonal anymore; these are not zeros anymore. I have to compute all these H's, so the hard work is back in Arnoldi. So it's Arnoldi leading to MINRES, and we have serious work already in Arnoldi. We don't have short recurrences; we've got all these h's. OK, enough. ARPACK would be a good code to call to do MINRES.

OK, that's my shot at the key Krylov algorithms of numerical linear algebra, and you see that they're more subtle than Jacobi. Jacobi and Gauss-Seidel were just straight simple iterations; these have recurrences that produce orthogonality, and you pay very little, and you get so much.

OK, so that's Krylov methods, and that section of notes is up on the web. And of course, at some point you want numerical comparisons: how fast is conjugate gradients, and is it faster than direct elimination? That's a big choice you have to make. Suppose you do have a symmetric positive definite matrix. Do you do Gauss elimination with re-ordering, minimum degree, or do you make the completely different choice of iterative methods like conjugate gradients? That's a decision you have to make when you're faced with a large matrix problem, and only experiments can compare two such widely different directions to take. So I like to end by emphasizing that there's still a big question: do this, or do minimum degree? And only some experiments will show which is the better.

OK, so that's good for today, and next time will be direct solution, using the FFT. Good, thanks.