NARRATOR: The following content is provided by MIT OpenCourseWare under a Creative Commons license. Additional information about our license and MIT OpenCourseWare in general is available at ocw.mit.edu.

PROFESSOR: So I'm thinking of large sparse matrices -- the kind of matrices that we've been writing down, maybe two-dimensional, maybe three-dimensional difference matrices. So A might have five or seven non-zeros on a typical row, and that means that you don't want to invert such a matrix. These are big matrices; the order could easily be a million. What you can do fast is multiply A by a vector, because multiplying a sparse matrix by a vector, you only have maybe five non-zeros times the vector of length n -- say 5n operations -- so that's fast.

I'm thinking that we are at a size where elimination, even with a good ordering, is too expensive in time or in storage. So I'm going from direct elimination methods to iterative methods. Iterative means that I never actually get to the perfect answer x, but I get close, and what I want to do is get close quickly. So a lot of thought has gone into creating iterative methods.

I'll write down one key word here. Part of the decision is to choose a good preconditioner. I'll call that matrix P. It's a matrix that we have some freedom to choose, and it very much speeds things up -- so one decision is, what preconditioner to choose? A preconditioner that's somehow like the matrix A, but hopefully a lot simpler.

There are a bunch of iterative methods, and that's what today's lecture is about. Multigrid is an idea that just keeps developing. It really is producing answers to very large problems very fast, so that will be the following lectures. And then come the Krylov methods -- you may not be familiar with that word Krylov. Let me mention that in this family, the best known method is called conjugate gradients, the conjugate gradient method. So I'll just write that down, conjugate gradient.
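A minimal sketch of that cost argument (an illustration, not from the lecture; the 1D second-difference matrix and the size n are just examples): storing the matrix sparsely and multiplying it by a vector stays proportional to n, while inverting it is out of the question.

```python
import numpy as np
import scipy.sparse as sp

n = 1_000_000                      # order of the matrix: a million is no problem
main = 2.0 * np.ones(n)            # 2's on the diagonal
off = -1.0 * np.ones(n - 1)        # -1's just above and below
K = sp.diags([off, main, off], [-1, 0, 1], format="csr")  # sparse difference matrix

x = np.random.rand(n)
y = K @ x                          # sparse matrix-vector product: a few n operations, fast
```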
So that will come next week. That's a method that's a giant success for symmetric, positive definite problems. I don't assume in general that A is symmetric or positive definite. Often it is, if it comes from a finite difference or finite element approximation. If it is, then conjugate gradients is highly recommended.

I'll start with the general picture. Let me put up the general picture of what an iteration looks like. So an iteration computes the new x -- let me call it x-new, or x_(k+1) -- from the known x_k. We start with x_0, any x_0 in principle. It's not too important for x_0 to be close. In linear methods, you could start even at 0. In nonlinear methods, Newton methods, you do want to be close, and a lot of attention goes into a good start. Here the start isn't too important; it's the steps that you're taking.

I'm trying to solve A*x equal b, and I'm going to split that equation -- I'm just going to write the same equation this way. I could write it as x equals (I minus A)x plus b. That would be the same equation, right? Because x and x cancel, A*x comes over here, and I have A*x equal b.

So I've split up the equation, and now let me bring in the preconditioner -- that first splitting was without a preconditioner; you don't see a matrix P. That's just an ordinary iteration, no preconditioner. The preconditioner will come in when I put P here -- P*x equals (P minus A)x plus b -- and then, of course, I have to change the left side to a P as well. That's still the same equation, right? P*x is on both sides, it cancels, and I have A*x equal b. So it's the same equation, but it suggests the iteration that I want to analyze: on the right-hand side is the known approximation x_k; on the left side is the next approximation, x_(k+1).

I multiply by P, so the idea is, P should be close to A. Of course, if it was exactly A -- if P equaled A -- then this extra term wouldn't be here, and I would just be solving, in one step, A*x equal b. So P equal A is one extreme.
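In symbols (just restating the board work), the two splittings and the iterations they suggest are:

```latex
% Rewriting Ax = b as a fixed-point equation, then iterating:
x = (I - A)\,x + b
\quad\Longrightarrow\quad
x_{k+1} = (I - A)\,x_k + b .

% With a preconditioner P (still the same equation, since Px cancels):
P x = (P - A)\,x + b
\quad\Longrightarrow\quad
P\, x_{k+1} = (P - A)\,x_k + b .
```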
I mean, that's totally, perfectly preconditioned, but the point is that if P equals A, we don't want to solve that system; we're trying to escape from solving that system. But we're willing to solve a related system that is simpler.

Why simpler? Here are some possible preconditioners. P equals the diagonal: just take the diagonal of A, and that choice is named after Jacobi. That's Jacobi's method. Actually, it's already a good idea, because the effect of that is to properly scale the equations. When I had the identity here, I had no idea whether the identity and A are in the right scaling. A may be divided by delta x squared or something; it could be in totally different units. But if I take P to be the diagonal of A, at least they're comparable. We'll study Jacobi's method. So that's a very simple choice -- it takes no more work than the identity, really.

A second choice, that you would think of pretty fast, is the diagonal plus the lower triangular part -- the lower triangular part of A, including the diagonal. This is associated with Gauss's great name and with Seidel, so that's the Gauss-Seidel choice.

Why triangular? We'll analyze the effect of this choice. If P is triangular, then we can solve for the new x -- I'm never going to write out an inverse matrix, that's understood -- we can compute the new x from the old x just by forward substitution. I find the first component of x_(k+1), then the second, then the third. It's just because, if P is triangular, the first equation will only have one new term on the left side and everything else will be over there. The second equation will have a diagonal term and something that uses the number I just found for the first component, and onward. So in other words, triangular matrices are good.

Then people thought about combinations of these.
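A minimal sketch of one step of each choice (my own illustration, not the lecture's code): with P equal to the diagonal the "solve" is a division, and with P equal to the lower triangle it is a forward substitution.

```python
import numpy as np
from scipy.linalg import solve_triangular

def jacobi_step(A, x, b):
    """One step of P x_new = (P - A) x + b with P = diagonal of A."""
    d = np.diag(A)                      # P is just the diagonal, so solving means dividing
    return x + (b - A @ x) / d

def gauss_seidel_step(A, x, b):
    """One step with P = lower triangular part of A (including the diagonal)."""
    P = np.tril(A)                      # triangular P: solve by forward substitution
    return solve_triangular(P, (P - A) @ x + b, lower=True)
```

For the big sparse matrices the lecture has in mind, one would of course keep P sparse rather than form dense triangles, but the algebra of the step is the same.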
A great deal of activity went into something called overrelaxation, where you had a weighting of these, and it turned out that if you adjusted the weight, you could speed up the method -- it's often called SOR, successive overrelaxation. When I was in graduate school, that was sort of the beginning of progress beyond Jacobi and Gauss-Seidel.

Then after that came a very natural idea: ILU is the next one that I want to mention. The I stands for incomplete -- incomplete LU. So what are L and U? Those are the letters I always use as a signal for the lower triangular and upper triangular factors of A, the ones that exact elimination would produce. But we don't want to do exact elimination. Why? Because non-zeros fill in. Non-zeros appear in those spaces where A itself is 0, so the factors have many more non-zeros than the original A; they're not as sparse.

So the idea of incomplete LU -- and I'll come back to it -- the idea, you might say, should have occurred to people earlier: only keep the non-zeros that are non-zeros in the original A. Don't allow fill-in, or allow a certain amount of fill-in. You could have a tolerance, and you would include a non-zero if it was big enough, but the many, many small non-zeros that just fill in and make the elimination long, you throw them out. So in this case P is an approximate L times an approximate U. It's close to A, but because we've refused some of the fill-in, it's not exactly A. And you have a tolerance there: push it to one extreme and the factors are exact, but you've got all that fill-in; push it to the other extreme and you keep only the non-zero pattern of A.

So that would give you an idea of preconditioners, of reasonable choices. Now I'm going to think about, first, how to decide whether the x's approach the correct answer.
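SciPy happens to ship an incomplete-LU factorization, so a sketch of using one as a preconditioner might look like this (my illustration, not something from the lecture; the drop tolerance and sizes are arbitrary):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1], format="csc")   # sparse model problem

ilu = spla.spilu(A, drop_tol=1e-3)       # incomplete LU: small fill-in entries are dropped
# Wrap the approximate solve P^{-1} r as a LinearOperator, the usual way to hand
# a preconditioner to a Krylov solver such as conjugate gradients (next week's topic).
P_inv = spla.LinearOperator((n, n), matvec=ilu.solve)

b = np.ones(n)
x, info = spla.cg(A, b, M=P_inv)         # preconditioned conjugate gradients, for example
```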
I need an equation for the error, and because my equations are linear, I'm just going to subtract the upper equation from the lower one. The error e_k will be x, the exact answer, minus x_k, my k-th approximation. So when I subtract, on the left I have P times e_(k+1), on the right I have (P minus A) times e_k, and the b's give 0. So it's a very simple error equation: P e_(k+1) equals (P minus A) e_k. You see the b has disappeared, because I'm only looking at differences now.

Let me write that over on this board, even slightly differently. Now I will multiply through by P inverse; that really shows it clearly. The new error is -- P inverse times P is I -- e_(k+1) equals (I minus P inverse A) e_k.

So what does that say? It says that at every iteration, I multiply the error by that matrix. Of course, if P equals A -- if I pick the perfect preconditioner -- then P inverse A is the identity, this is the zero matrix, and the error after the first step is 0; I'm solving exactly. I don't plan to do that. I plan to have a P that's close to A, so in some way this whole matrix should be close to 0 -- close in some sense.

Let me give a name to that matrix. Let's call it M. That's the iteration matrix, which we're never going to display -- I mean, it would be madness to display it. We don't want to compute P inverse explicitly or display it, but to understand convergence or non-convergence, it's M that controls everything.

So this is what I would call pure iteration, or another word that's used instead of pure is stationary iteration. What does stationary mean? It means that you do the same thing all the time; at every step, you just use the same equation. The later methods will be smarter. They'll adapt.
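Collected in one place (a restatement of the board work): subtracting the exact equation from the iteration gives the error equation, and M is the matrix whose powers decide everything.

```latex
Px = (P - A)x + b,\qquad P x_{k+1} = (P - A)x_k + b
\;\;\Longrightarrow\;\;
P e_{k+1} = (P - A)\, e_k,\qquad e_k := x - x_k .

e_{k+1} = \bigl(I - P^{-1}A\bigr)\, e_k = M e_k,
\qquad M := I - P^{-1}A .
```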
They'll use more information, they'll converge faster, but this is the simple one -- and this one plays a part in those.

So what's the story on convergence? What about that matrix M? If I take a starting error e_0 -- my initial error could be anything -- I multiply by M to get e_1, then I multiply by M again to get e_2. Every step, I multiply by M, so after k steps I've multiplied k times by M. So the key question is, does M to the k-th power times e_0 go to 0, and if it does, how fast? Those two parts -- does it go to 0, and how fast -- are pretty much controlled by the same property of M. So what's the answer? I don't know anything about e_0, so what property of M is going to decide whether its powers get small, go to 0, so that the error goes to 0? That's convergence. Or maybe it blows up, or maybe it stays away from 0. What controls it?

One word is the answer: the eigenvalues. It's the eigenvalues of M, and in particular, it's the biggest one. Because if I have an eigenvalue of M, as far as I know, e_0 could be the eigenvector, and then every step would multiply by that eigenvalue lambda. So the condition is that all eigenvalues of M have to be smaller than 1 -- in magnitude, of course, because they could be negative, they could be complex. That's the total condition, exactly the requirement on M.

How do we say how fast they go to 0? If all the eigenvalues are below 1, then the slowest convergence is going to come from the eigenvalue that's closest to 1. So really it's this quantity -- it's called the spectral radius and it's usually denoted by a Greek rho of M -- which is the maximum of these guys, the max of the eigenvalue magnitudes. And that has to be below 1. It's that number which really says what I'm multiplying by at every step.
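As a formula (again just the statement on the board, with the rate written out):

```latex
e_k = M^{\,k} e_0 \;\longrightarrow\; 0 \ \text{for every } e_0
\quad\Longleftrightarrow\quad
\rho(M) := \max_i \,|\lambda_i(M)| < 1,
\qquad
\|e_k\| \approx \rho(M)^{\,k}\,\|e_0\| \ \ \text{(typically, for large } k\text{)}.
```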
Because most likely e_0, my initial unknown error, has some component in the direction of this biggest eigenvalue -- in the direction of its eigenvector. Then that component will multiply at every step by lambda, and that one will be the biggest guy, rho. So it's the maximum eigenvalue; that's really what it comes to.

What is the maximum eigenvalue? Let me take Jacobi. Can we figure out the maximum eigenvalue for Jacobi? Of course, if I don't know anything about the matrix, I certainly don't want to compute eigenvalues -- I could just run Jacobi's method. So again, for Jacobi's method I'm taking P to be the diagonal part. Let me make that explicit, and let me take the matrix that we know the best. I'll call it K -- this is the A -- my friend with 2's on the diagonal and minus 1's above and below. I apologize -- I don't really apologize, but I'll pretend to apologize -- for bringing this matrix in so often.

Do we remember its eigenvalues? We computed them last semester, but that's another world, so I'll say what they are. The eigenvalues of A -- this isn't M now -- the eigenvalues of this matrix A: I see 2's on the diagonal, so that's a 2; that accounts for the diagonal part, 2 times the identity. Then I have a minus and the part that comes from these 1's, which are forward and back. What would von Neumann do with that matrix? Of course, he would test it on exponentials, and he would get 2 minus e to the i theta minus e to the minus i theta. He would put the two exponentials together into cosines, and that's the answer: it's 2 minus 2 cosine of whatever angle theta. Those are the eigenvalues. The j-th eigenvalue is 2 minus 2 cosine of some angle theta_j.

It's worth taking a minute on this matrix again. The eigenvalues are between 0 and 4. The eigenvalues never go negative, right? Because the cosine can't be bigger than 1. Actually, for this matrix the cosine doesn't reach 1.
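For reference, the matrix being discussed and its eigenvalues (the specific angles theta_j are pinned down a little later in the lecture):

```latex
K = A =
\begin{pmatrix}
 2 & -1 &        &        \\
-1 &  2 & \ddots &        \\
   & \ddots & \ddots & -1 \\
   &        &  -1    &  2
\end{pmatrix}_{N \times N},
\qquad
\lambda_j(A) = 2 - 2\cos\theta_j \;\in\; (0,\,4).
```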
So the smallest eigenvalue is 2 minus 2 times a cosine that's close to 1. That cosine is too close to 1 for comfort, but it is below 1, so the eigenvalues are positive, and they're between 0 and 4. At the other extreme, theta is near pi, the cosine is near minus 1, and we have something near 4. So if you look at that matrix, you should see it: a special matrix with eigenvalues in this interval, 0 to 4. The key point is, how close to 0?

Now what about M -- well, what about P first? What's P? For Jacobi, P is the diagonal part, which is exactly 2I. That's a very simple diagonal part, a very simple P, and it produces the matrix M. You remember, M is the identity matrix minus P inverse A. That's beautiful. P is just 2I, so I'm just dividing A by 2 -- which puts a 1 on the diagonal -- and then subtracting from the identity. So M has 0's on the diagonal; that's what we expect. It will be 0's on the diagonal, and off the diagonal, what do I have? With that minus sign, and P being 2, it's 1/2. It's 1/2 off the diagonal and otherwise all 0's. That's the nice matrix that we get for M in one dimension.

I have to say, of course, as you know, we wouldn't be doing any of this for this one-dimensional matrix A. It's tridiagonal; we can't ask for more than that. Direct elimination would be top speed. But the 2D case, as we understand, has a big bandwidth because it's two-dimensional -- three-dimensional is even bigger -- and the eigenvalues will come directly from the eigenvalues of this 1D case. So if we understand the 1D case, we'll knock out the two- and three-dimensional cases easily.

So I'll stay with this one-dimensional case, where the matrix M -- and maybe I should even write down exactly what the iteration does -- but if I'm looking now at convergence, do I have it or don't I, and how fast? What are the eigenvalues, and what's the biggest one?
The eigenvalues of A are these 2 minus 2 cos theta_j, and P inverse is just 1/2 times the identity. So what are the eigenvalues of M? I know the eigenvalues of A, so I divide by 2. Let me write down what I get for the eigenvalues of M, remembering this connection: the eigenvalues of M are 1 minus 1/2 times the eigenvalues of A. Of course, it's very special that we have this terrific example matrix where we know the eigenvalues and really can see what's happening, and we know the matrix.

So what happens? I take half of that 2 -- that makes it a 1 -- and when I subtract it from the 1, it's gone. Half of the 2 cos theta_j -- the 2 is cancelled -- and I just get cos theta_j, whatever that angle is. It's a cosine, and it's less than 1.

So convergence is going to happen, and convergence is going to be slow or fast according as this is very near 1 or not -- and unfortunately, it is pretty near 1. To complete this information, I should say what the angles are -- von Neumann got them right: theta_j is j pi over N plus 1. Those are the angles that come in, just the equally spaced angles in the finite case. So we have N eigenvalues, and the biggest one is going to be when j is 1. So the biggest one, rho, the maximum of these lambdas, will be the cosine of pi over N plus 1, with j equal to 1. That's the one that's slowing us down.

Notice the eigenvector that's slowing us down, because it's the eigenvector that goes with that biggest eigenvalue that's going to hang around the longest, decay the slowest. That eigenvector is very smooth. It's the low frequency -- it's frequency 1. So this is j equal to 1, the low frequency. The eigenvector is just samples of a discrete sine.
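A small numerical check of that claim (my own sketch; N = 10 is arbitrary): build K, form the Jacobi iteration matrix M = I - K/2, and compare its eigenvalues with cos(j pi/(N+1)).

```python
import numpy as np

N = 10
K = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # 2's on the diagonal, -1's off
M = np.eye(N) - K / 2                                   # Jacobi: P = 2I, so M = I - P^{-1} A

j = np.arange(1, N + 1)
predicted = np.cos(j * np.pi / (N + 1))                 # cos(j*pi/(N+1)), j = 1..N

computed = np.sort(np.linalg.eigvalsh(M))
print(np.allclose(computed, np.sort(predicted)))        # True
print("rho =", max(abs(predicted)))                     # cos(pi/(N+1)), uncomfortably near 1
```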
It's the lowest frequency that the grid can sustain. Why do I emphasize which eigenvector it is? Because when you know which eigenvector is slowing you down -- and here it's the eigenvector you would think would be the simplest one to handle, the easiest eigenvector -- the higher-frequency eigenvectors have eigenvalues that start dropping. Cosine of 2 pi over N plus 1, cosine of 3 pi over N plus 1, are going to be further away from 1.

Now I have to admit that the very highest frequency, when j is capital N, happens to come back with a minus sign, but the magnitude is still just as bad. The cosine of N pi over N plus 1 is just the negative of that first one. So actually, we have two eigenvalues that are both hitting the spectral radius, and both of them are the worst, the slowest-converging eigenvectors. The first eigenvector looks very smooth; the eigenvector for that high-frequency one is the fastest oscillation. If I graph the eigenvalues -- I will, I'll graph them -- you'll see that it's the first and the last that are nearest in magnitude to 1.

Can I anticipate multigrid for a minute? Multigrid is going to try to get these guys to go faster. Now this high-frequency one can be handled, you'll see, by just a simple adjustment of Jacobi. I just weight Jacobi a little bit: instead of 2I, I take 3I or 4I -- 3I would be very good for the preconditioner. That will have the effect on this high-frequency one that it'll be well down now, below 1 safely. But this low-frequency one will still be the slowpoke in converging. That's where multigrid says: that's low frequency, that's too low, that's not converging quickly -- take a different grid on which that same thing looks higher frequency. We'll see that tomorrow.

So if I stay with Jacobi, what have I learned? I've learned that this cosine is rho. For this matrix, the spectral radius is that number, and let's just see how big it is.
How big is it? It's the cosine of a small angle. For theta near 0, the cosine of theta is about 1 minus 1/2 theta squared -- so it's below 1, of course -- and here theta is pi over N plus 1. All these simple estimates really tell you what happens when you do Jacobi iteration. You'll see it in the output when you -- I hope you will -- code up Jacobi's method, taking P to be a multiple of the identity, so each step is very fast, and you can graph the convergence rate.

Here's an interesting point, and I'll make it again: what should you graph to display convergence and the rate of convergence? Let me draw two possible graphs. Along the horizontal axis is k, the number of steps. I could graph the error -- well, the size of the error; it's a vector, so I take its length. It starts at e_0; this is k equals 0, the initial guess. I'm going to draw a graph which you'll draw much better by actually getting MATLAB to do it.

Before I draw it, let me mention the other graph we could draw. The first one tells us the error in the solution. If you think that's the natural thing to draw, it is, but we could also draw the error in the equation, the residual error. Let me put up here what I mean by that. A*x equals b exactly; A*x_k is close. The residual -- let me call it r, as everybody does -- is the difference: it's how close we are to b, so r_k is b minus A*x_k. The true A*x gets b exactly right; the wrong A*x_k gets b nearly right. And you see, because we have this linear problem, the residual is A times the error: x minus x_k is e_k, so r_k is A*e_k. So that's the residual, and I could graph that. That's how close my answer comes to getting b right, how close the equation is satisfied -- whereas the first graph shows how close x_k is. So that would be the other thing I could graph.
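A sketch of that experiment (my own illustration of "code up Jacobi's method"; the lecture suggests MATLAB, this is NumPy, and the sizes and step count are arbitrary): run plain Jacobi on K x = b and record both quantities discussed above.

```python
import numpy as np

N = 50
K = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
b = np.ones(N)
x_exact = np.linalg.solve(K, b)            # only used here to measure the error e_k

x = np.zeros(N)                            # start from x_0 = 0
err, res = [], []
for k in range(500):
    x = x + (b - K @ x) / 2.0              # one Jacobi step: P = 2I
    err.append(np.linalg.norm(x_exact - x))    # ||e_k||, error in the solution
    res.append(np.linalg.norm(b - K @ x))      # ||r_k|| = ||A e_k||, error in the equation

print(err[0], err[-1])                     # the two histories to plot against k
print(res[0], res[-1])
```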
r_k, the size of the residual, which is A*e_k. So now I have to remember what I think these graphs look like -- of course, it's a little absurd for me to draw them when you can compute them -- but I think what we see is that the residual drops pretty fast. The residual drops pretty fast and keeps going, for, let's say, Jacobi's method or, in a minute or maybe next time, another iterative method. But the error in the solution starts down fast -- what's happening is that the high-frequency components are getting killed -- and then those low-frequency components are not disappearing, because we figured out that they give the spectral radius, the largest eigenvalue. So the error curve slopes off, and the price of such a simple method is that you don't get great convergence. Of course, if I put this on a semilog plot -- log of the error against k -- the slope would be exactly connected to the spectral radius. That's the rate of decay: the size of e_k is approximately that worst eigenvalue to the k-th power.

Do you see how this can happen? The residual can be quite small -- I think it's very important to see the two different quantities. The residual can be quite small, but then, from the residual, if I wanted to get back to the error, I'd have to multiply by A inverse, and A inverse is not small. Our matrix A has eigenvalues close to 0, so A inverse is pretty big. A inverse times A*e_k -- the residual is small, but A inverse picks up those smallest eigenvalues of A, picks up those low frequencies -- and the error is bigger, slower to go to 0, as we see in this picture. So it's this picture that is our problem; that shows where Jacobi is not good enough.

So what more is there to say about Jacobi? For this matrix K, we know its eigenvalues. The eigenvalues of M we know exactly; the matrix M we know exactly. Now, if I had said, look at that matrix M and tell me something about its eigenvalues, what would you say?
You would say it's symmetric, so the eigenvalues are real, and you would say that every eigenvalue is at most 1 in size. Why? Because -- if you like to remember this business of Gershgorin circles -- every eigenvalue is in a circle centered at the diagonal entry, in other words centered at 0, and its radius is the sum of the off-diagonal magnitudes, which is 1. So we know that every eigenvalue is in the unit circle, and actually it comes inside the unit circle, but not very much -- only by about 1 over N squared.

What do I learn from this 1 over N squared? How many iterations do I have to take to reduce the error by, let's say, a factor of 10, or e, or whatever? If the eigenvalues are 1 minus a constant over N squared, I'll have to take on the order of N squared steps. So if I have a matrix of order 100, the number of iterations I'm going to need is 100 squared -- 10,000 steps. Every step is fast, especially with the Jacobi choice: here's my iteration, and P is just a multiple of the identity. So every step is certainly fast. Every step involves a matrix multiplication, and that's why I began the lecture by saying matrix multiplications are fast. A times x_k -- that's the work, and it's not much; it's maybe five non-zeros per row in A, so 5N. That's fast, but if I have to do it 10,000 times just to reduce the error by some constant factor, that's not good.

So that's why people think, okay, Jacobi gives us the idea. It does knock off the high frequencies quite quickly, but it leaves those low frequencies. Well, I was going to mention weighted Jacobi. Weighted Jacobi is to take, instead of that P, a weighted version -- put in a weight. So weighted Jacobi -- let me put this here, and we'll come back to it -- simply takes P to be the diagonal divided by a factor omega. For example, omega equals 2/3 is a good choice. Since the diagonal is 2 times the identity, this is just 2 over omega times the identity -- I'm just choosing a different constant.
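The arithmetic behind that count (a restatement of the estimate above): with rho about 1 minus a constant over N squared, reducing the error by a fixed factor takes on the order of N squared steps, since

```latex
\rho \approx 1 - \frac{c}{N^{2}}
\quad\Longrightarrow\quad
\rho^{\,k} \approx e^{-ck/N^{2}},
\qquad\text{so } \rho^{\,k} \approx \tfrac{1}{e} \ \text{ needs } \ k \approx \frac{N^{2}}{c}.
```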
I'm just choosing a different constant in P, and it has a simple effect on M. I'll draw the picture. The eigenvalues of M are these cosines, so I'm going to draw those cosines, the eigenvalues of M -- just a graph of cos theta from theta equals 0 to theta equals pi. What does a graph of cos theta look like from 0 to pi? Something like that: it starts at 1 and comes down to minus 1.

Now where are these angles theta_1 -- the bad one -- and theta_2, theta_3, theta_4, theta_5, with the last ones, equally bad, down here near pi? So there's the biggest cosine; that's rho, and you see it's pretty near 1. At the highest frequency I'm going to hit minus rho; it'll be, again, near 1 in magnitude. So the spectral radius is that value, rho, the cosine of that smallest angle, pi over N plus 1.

What happens when I bring in the weights? It turns out that the eigenvalues of M, with the weight, will be 1 minus omega plus omega cos theta -- you'll see it in the notes, no problem. So it's simply influenced by omega. If omega is 2/3, then I have 1 minus omega, which is 1/3, plus 2/3 cos theta.

Instead of graphing cos theta, which I've done -- that's the unweighted case, weight omega equal to 1 -- now I draw this new curve. So what's going on here? For a small theta, the cosine is near 1, so I'm again near 1 -- in fact, I'm probably even a little worse, probably up here -- and then I think it goes down like that. So it starts near 1, but it ends, when theta is pi, at 1/3 minus 2/3: it ends at minus 1/3. You see, that's a lot smarter. The unweighted one ended at minus 1, or near minus 1, but this one -- this is the omega equal 2/3, the weighted one -- ends at minus 1/3. All you've done is fix the high frequency.
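A quick check of those numbers (my sketch; N = 10 again arbitrary): with omega = 2/3 the high-frequency end drops in magnitude to roughly 1/3, while the lowest frequency stays close to 1.

```python
import numpy as np

N, omega = 10, 2.0 / 3.0
theta = np.arange(1, N + 1) * np.pi / (N + 1)

plain = np.cos(theta)                         # eigenvalues of M, unweighted Jacobi
weighted = 1 - omega + omega * np.cos(theta)  # eigenvalues with weight omega

print(plain[0], plain[-1])        # ~0.96 and ~-0.96: both ends near 1 in magnitude
print(weighted[0], weighted[-1])  # ~0.97 and ~-0.31: high end tamed, low end still slow
```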
So I would say, oh, that was a good thing to do: the high frequencies are no longer a problem. The low frequencies still are. So the only way we'll get out of that problem is by moving to multigrid.

Can I, in the remaining minute, report on Gauss-Seidel? Then I'll come back to it, but a quick report on Gauss-Seidel. So what's the difference in Gauss-Seidel? What does Gauss-Seidel do? It uses the latest information. As soon as you compute a component of x, you use it. Jacobi had to keep all the old x's until it had found all the new x's, but the Gauss-Seidel idea is: as soon as you have a better x, put it in the system. So the storage is cut in half, because you're not saving a big old vector while you compute the new one. I'll write down Gauss-Seidel again.

So it's more up to date, and the effect is that the Jacobi eigenvalues get squared. You're squaring numbers that are less than 1. So Gauss-Seidel gives the Jacobi eigenvalues squared. The result is that if I draw the same curve for Gauss-Seidel, I'll be below, right? Essentially, a Gauss-Seidel step is worth two Jacobi steps, because eigenvalues get squared as they would in two Jacobi steps -- if I do Jacobi twice, I square its eigenvalues. So it seems like absolutely a smart move. It's not brilliant, though, because the spectral radius becomes cosine squared: I lose the 1/2 in 1 minus 1/2 theta squared, so it's about 1 minus theta squared, but I'm still very, very close to 1. It still takes order N squared iterations to get anywhere.

So Jacobi is, you could say, replaced by Gauss-Seidel as being one step better, but it's not good enough, and that's why the subject kept going. Overrelaxation, with a good weight, gives rho as 1 minus order of 1 over N, first power, where Jacobi and Gauss-Seidel were 1 minus order of 1 over N squared. So that's definitely further from 1, and then the methods still to come can be way further from 1. So you can see some of the numerical experiments and analysis that's coming. I'll see you Wednesday to finish this and move into multigrid, which uses these guys.
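A final numerical check (my own sketch, not from the lecture): form the Gauss-Seidel iteration matrix M = I - P^{-1}A with P the lower triangular part of K, and compare its spectral radius with the square of Jacobi's.

```python
import numpy as np

N = 10
K = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)

def spectral_radius(P):
    M = np.eye(N) - np.linalg.solve(P, K)    # iteration matrix M = I - P^{-1} K
    return max(abs(np.linalg.eigvals(M)))

rho_jacobi = spectral_radius(np.diag(np.diag(K)))   # P = diagonal of K, which is 2I
rho_gs = spectral_radius(np.tril(K))                # P = lower triangular part of K

print(rho_jacobi, rho_jacobi**2, rho_gs)            # rho_gs matches rho_jacobi squared
```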