NARRATOR: The following content is provided by MIT OpenCourseWare under a Creative Commons license. Additional information about our license and MIT OpenCourseWare in general is available at ocw.mit.edu.

PROFESSOR: So I'm thinking of large sparse matrices -- the kind of matrices that we've been writing down, maybe two-dimensional, maybe three-dimensional difference matrices. So A might have five or seven non-zeros on a typical row, and that means that you don't want to invert such a matrix. These are big matrices; the order could easily be a million. What you can do fast is multiply A by a vector, because multiplying a sparse matrix by a vector, you only have maybe five non-zeros times the vector of length n -- say 5n operations -- so that's fast.

I'm thinking that we are at a size where elimination, even with a good ordering, is too expensive in time or in storage. So I'm going from direct elimination methods to iterative methods. Iterative means that I never actually get to the perfect answer x, but I get close, and what I want to do is get close quickly. So a lot of thought has gone into creating iterative methods.

I'll write down one key word here. Part of the decision is to choose a good preconditioner. I'll call that matrix P. It's a matrix that we have some freedom to choose, and it very much speeds things up -- so one decision is, what preconditioner to choose? A preconditioner that's somehow like the matrix A, but hopefully a lot simpler.

There are a bunch of iterative methods, and that's what today's lecture is about. Multigrid is an idea that just keeps developing. It really is producing answers to very large problems very fast, so that will be the following lectures. And then come the Krylov methods -- you may not be familiar with that word Krylov. Let me mention that in this family, the best known method is called conjugate gradients, the conjugate gradient method. So I'll just write that down, conjugate gradient.
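A minimal sketch of that cost argument (an illustration, not from the lecture; the 1D second-difference matrix and the size n are just examples): storing the matrix sparsely and multiplying it by a vector stays proportional to n, while inverting it is out of the question.

```python
import numpy as np
import scipy.sparse as sp

n = 1_000_000                      # order of the matrix: a million is no problem
main = 2.0 * np.ones(n)            # 2's on the diagonal
off = -1.0 * np.ones(n - 1)        # -1's just above and below
K = sp.diags([off, main, off], [-1, 0, 1], format="csr")  # sparse difference matrix

x = np.random.rand(n)
y = K @ x                          # sparse matrix-vector product: a few n operations, fast
```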
So that will come next week. That's a method that's a giant success for symmetric, positive definite problems. I don't assume in general that A is symmetric or positive definite. Often it is, if it comes from a finite difference or finite element approximation. If it is, then conjugate gradients is highly recommended.

I'll start with the general picture. Let me put up the general picture of what an iteration looks like. So an iteration computes the new x -- let me call it x-new, or x_(k+1) -- from the known x_k. We start with x_0, any x_0 in principle. It's not too important for x_0 to be close. In linear methods, you could start even at 0. In nonlinear methods, Newton methods, you do want to be close, and a lot of attention goes into a good start. Here the start isn't too important; it's the steps that you're taking.

I'm trying to solve A*x equal b, and I'm going to split that equation -- I'm just going to write the same equation this way. I could write it as x equals (I minus A)x plus b. That would be the same equation, right? Because x and x cancel, A*x comes over here, and I have A*x equal b.

So I've split up the equation, and now let me bring in the preconditioner -- that first splitting was without a preconditioner; you don't see a matrix P. That's just an ordinary iteration, no preconditioner. The preconditioner will come in when I put P here -- P*x equals (P minus A)x plus b -- and then, of course, I have to change the left side to a P as well. That's still the same equation, right? P*x is on both sides, it cancels, and I have A*x equal b. So it's the same equation, but it suggests the iteration that I want to analyze: on the right-hand side is the known approximation x_k; on the left side is the next approximation, x_(k+1).

I multiply by P, so the idea is, P should be close to A. Of course, if it was exactly A -- if P equaled A -- then this extra term wouldn't be here, and I would just be solving, in one step, A*x equal b. So P equal A is one extreme.
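In symbols (just restating the board work), the two splittings and the iterations they suggest are:

```latex
% Rewriting Ax = b as a fixed-point equation, then iterating:
x = (I - A)\,x + b
\quad\Longrightarrow\quad
x_{k+1} = (I - A)\,x_k + b .

% With a preconditioner P (still the same equation, since Px cancels):
P x = (P - A)\,x + b
\quad\Longrightarrow\quad
P\, x_{k+1} = (P - A)\,x_k + b .
```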
I mean, that's totally, perfectly preconditioned, but the point is that if P equals A, we don't want to solve that system; we're trying to escape from solving that system. But we're willing to solve a related system that is simpler.

Why simpler? Here are some possible preconditioners. P equals the diagonal: just take the diagonal of A, and that choice is named after Jacobi. That's Jacobi's method. Actually, it's already a good idea, because the effect of that is to properly scale the equations. When I had the identity here, I had no idea whether the identity and A are in the right scaling. A may be divided by delta x squared or something; it could be in totally different units. But if I take P to be the diagonal of A, at least they're comparable. We'll study Jacobi's method. So that's a very simple choice -- it takes no more work than the identity, really.

A second choice, that you would think of pretty fast, is the diagonal plus the lower triangular part -- the lower triangular part of A, including the diagonal. This is associated with Gauss's great name and with Seidel, so that's the Gauss-Seidel choice.

Why triangular? We'll analyze the effect of this choice. If P is triangular, then we can solve for the new x -- I'm never going to write out an inverse matrix, that's understood -- we can compute the new x from the old x just by forward substitution. I find the first component of x_(k+1), then the second, then the third. It's just because, if P is triangular, the first equation will only have one new term on the left side and everything else will be over there. The second equation will have a diagonal term and something that uses the number I just found for the first component, and onward. So in other words, triangular matrices are good.

Then people thought about combinations of these.
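A minimal sketch of one step of each choice (my own illustration, not the lecture's code): with P equal to the diagonal the "solve" is a division, and with P equal to the lower triangle it is a forward substitution.

```python
import numpy as np
from scipy.linalg import solve_triangular

def jacobi_step(A, x, b):
    """One step of P x_new = (P - A) x + b with P = diagonal of A."""
    d = np.diag(A)                      # P is just the diagonal, so solving means dividing
    return x + (b - A @ x) / d

def gauss_seidel_step(A, x, b):
    """One step with P = lower triangular part of A (including the diagonal)."""
    P = np.tril(A)                      # triangular P: solve by forward substitution
    return solve_triangular(P, (P - A) @ x + b, lower=True)
```

For the big sparse matrices the lecture has in mind, one would of course keep P sparse rather than form dense triangles, but the algebra of the step is the same.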
A great deal of activity went into something called overrelaxation, where you had a weighting of these, and it turned out that if you adjusted the weight, you could speed up the method -- it's often called SOR, successive overrelaxation. When I was in graduate school, that was sort of the beginning of progress beyond Jacobi and Gauss-Seidel.

Then after that came a very natural idea: ILU is the next one that I want to mention. The I stands for incomplete -- incomplete LU. So what are L and U? Those are the letters I always use as a signal for the lower triangular and upper triangular factors of A, the ones that exact elimination would produce. But we don't want to do exact elimination. Why? Because non-zeros fill in. Non-zeros appear in those spaces where A itself is 0, so the factors have many more non-zeros than the original A; they're not as sparse.

So the idea of incomplete LU -- and I'll come back to it -- the idea, you might say, should have occurred to people earlier: only keep the non-zeros that are non-zeros in the original A. Don't allow fill-in, or allow a certain amount of fill-in. You could have a tolerance, and you would include a non-zero if it was big enough, but the many, many small non-zeros that just fill in and make the elimination long, you throw them out. So in this case P is an approximate L times an approximate U. It's close to A, but because we've refused some of the fill-in, it's not exactly A. And you have a tolerance there: push it to one extreme and the factors are exact, but you've got all that fill-in; push it to the other extreme and you keep only the non-zero pattern of A.

So that would give you an idea of preconditioners, of reasonable choices. Now I'm going to think about, first, how to decide whether the x's approach the correct answer.
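SciPy happens to ship an incomplete-LU factorization, so a sketch of using one as a preconditioner might look like this (my illustration, not something from the lecture; the drop tolerance and sizes are arbitrary):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
main = 2.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1], format="csc")   # sparse model problem

ilu = spla.spilu(A, drop_tol=1e-3)       # incomplete LU: small fill-in entries are dropped
# Wrap the approximate solve P^{-1} r as a LinearOperator, the usual way to hand
# a preconditioner to a Krylov solver such as conjugate gradients (next week's topic).
P_inv = spla.LinearOperator((n, n), matvec=ilu.solve)

b = np.ones(n)
x, info = spla.cg(A, b, M=P_inv)         # preconditioned conjugate gradients, for example
```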
I need an equation for the error, and because my equations are linear, I'm just going to subtract the upper equation from the lower one. The error e_k will be x, the exact answer, minus x_k, my k-th approximation. So when I subtract, on the left I have P times e_(k+1), on the right I have (P minus A) times e_k, and the b's give 0. So it's a very simple error equation: P e_(k+1) equals (P minus A) e_k. You see the b has disappeared, because I'm only looking at differences now.

Let me write that over on this board, even slightly differently. Now I will multiply through by P inverse; that really shows it clearly. The new error is -- P inverse times P is I -- e_(k+1) equals (I minus P inverse A) e_k.

So what does that say? It says that at every iteration, I multiply the error by that matrix. Of course, if P equals A -- if I pick the perfect preconditioner -- then P inverse A is the identity, this is the zero matrix, and the error after the first step is 0; I'm solving exactly. I don't plan to do that. I plan to have a P that's close to A, so in some way this whole matrix should be close to 0 -- close in some sense.

Let me give a name to that matrix. Let's call it M. That's the iteration matrix, which we're never going to display -- I mean, it would be madness to display it. We don't want to compute P inverse explicitly or display it, but to understand convergence or non-convergence, it's M that controls everything.

So this is what I would call pure iteration, or another word that's used instead of pure is stationary iteration. What does stationary mean? It means that you do the same thing all the time; at every step, you just use the same equation. The later methods will be smarter. They'll adapt.
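Collected in one place (a restatement of the board work): subtracting the exact equation from the iteration gives the error equation, and M is the matrix whose powers decide everything.

```latex
Px = (P - A)x + b,\qquad P x_{k+1} = (P - A)x_k + b
\;\;\Longrightarrow\;\;
P e_{k+1} = (P - A)\, e_k,\qquad e_k := x - x_k .

e_{k+1} = \bigl(I - P^{-1}A\bigr)\, e_k = M e_k,
\qquad M := I - P^{-1}A .
```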
They'll use more information, they'll converge faster, but this is the simple one -- and this one plays a part in those.

So what's the story on convergence? What about that matrix M? If I take a starting error e_0 -- my initial error could be anything -- I multiply by M to get e_1, then I multiply by M again to get e_2. Every step, I multiply by M, so after k steps I've multiplied k times by M. So the key question is, does M to the k-th power times e_0 go to 0, and if it does, how fast? Those two parts -- does it go to 0, and how fast -- are pretty much controlled by the same property of M. So what's the answer? I don't know anything about e_0, so what property of M is going to decide whether its powers get small, go to 0, so that the error goes to 0? That's convergence. Or maybe it blows up, or maybe it stays away from 0. What controls it?

One word is the answer: the eigenvalues. It's the eigenvalues of M, and in particular, it's the biggest one. Because if I have an eigenvalue of M, as far as I know, e_0 could be the eigenvector, and then every step would multiply by that eigenvalue lambda. So the condition is that all eigenvalues of M have to be smaller than 1 -- in magnitude, of course, because they could be negative, they could be complex. That's the total condition, exactly the requirement on M.

How do we say how fast they go to 0? If all the eigenvalues are below 1, then the slowest convergence is going to come from the eigenvalue that's closest to 1. So really it's this quantity -- it's called the spectral radius and it's usually denoted by a Greek rho of M -- which is the maximum of these guys, the max of the eigenvalue magnitudes. And that has to be below 1. It's that number which really says what I'm multiplying by at every step.
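As a formula (again just the statement on the board, with the rate written out):

```latex
e_k = M^{\,k} e_0 \;\longrightarrow\; 0 \ \text{for every } e_0
\quad\Longleftrightarrow\quad
\rho(M) := \max_i \,|\lambda_i(M)| < 1,
\qquad
\|e_k\| \approx \rho(M)^{\,k}\,\|e_0\| \ \ \text{(typically, for large } k\text{)}.
```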
Because most likely e_0, my initial unknown error, has some component in the direction of this biggest eigenvalue -- in the direction of its eigenvector. Then that component will multiply at every step by lambda, and that one will be the biggest guy, rho. So it's the maximum eigenvalue; that's really what it comes to.

What is the maximum eigenvalue? Let me take Jacobi. Can we figure out the maximum eigenvalue for Jacobi? Of course, if I don't know anything about the matrix, I certainly don't want to compute eigenvalues -- I could just run Jacobi's method. So again, for Jacobi's method I'm taking P to be the diagonal part. Let me make that explicit, and let me take the matrix that we know the best. I'll call it K -- this is the A -- my friend with 2's on the diagonal and minus 1's above and below. I apologize -- I don't really apologize, but I'll pretend to apologize -- for bringing this matrix in so often.

Do we remember its eigenvalues? We computed them last semester, but that's another world, so I'll say what they are. The eigenvalues of A -- this isn't M now -- the eigenvalues of this matrix A: I see 2's on the diagonal, so that's a 2; that accounts for the diagonal part, 2 times the identity. Then I have a minus and the part that comes from these 1's, which are forward and back. What would von Neumann do with that matrix? Of course, he would test it on exponentials, and he would get 2 minus e to the i theta minus e to the minus i theta. He would put the two exponentials together into cosines, and that's the answer: it's 2 minus 2 cosine of whatever angle theta. Those are the eigenvalues. The j-th eigenvalue is 2 minus 2 cosine of some angle theta_j.

It's worth taking a minute on this matrix again. The eigenvalues are between 0 and 4. The eigenvalues never go negative, right? Because the cosine can't be bigger than 1. Actually, for this matrix the cosine doesn't reach 1.
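For reference, the matrix being discussed and its eigenvalues (the specific angles theta_j are pinned down a little later in the lecture):

```latex
K = A =
\begin{pmatrix}
 2 & -1 &        &        \\
-1 &  2 & \ddots &        \\
   & \ddots & \ddots & -1 \\
   &        &  -1    &  2
\end{pmatrix}_{N \times N},
\qquad
\lambda_j(A) = 2 - 2\cos\theta_j \;\in\; (0,\,4).
```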
So the smallest eigenvalue is 2 minus 2 times a cosine that's close to 1. That cosine is too close to 1 for comfort, but it is below 1, so the eigenvalues are positive, and they're between 0 and 4. At the other extreme, theta is near pi, the cosine is near minus 1, and we have something near 4. So if you look at that matrix, you should see it: a special matrix with eigenvalues in this interval, 0 to 4. The key point is, how close to 0?

Now what about M -- well, what about P first? What's P? For Jacobi, P is the diagonal part, which is exactly 2I. That's a very simple diagonal part, a very simple P, and it produces the matrix M. You remember, M is the identity matrix minus P inverse A. That's beautiful. P is just 2I, so I'm just dividing A by 2 -- which puts a 1 on the diagonal -- and then subtracting from the identity. So M has 0's on the diagonal; that's what we expect. It will be 0's on the diagonal, and off the diagonal, what do I have? With that minus sign, and P being 2, it's 1/2. It's 1/2 off the diagonal and otherwise all 0's. That's the nice matrix that we get for M in one dimension.

I have to say, of course, as you know, we wouldn't be doing any of this for this one-dimensional matrix A. It's tridiagonal; we can't ask for more than that. Direct elimination would be top speed. But the 2D case, as we understand, has a big bandwidth because it's two-dimensional -- three-dimensional is even bigger -- and the eigenvalues will come directly from the eigenvalues of this 1D case. So if we understand the 1D case, we'll knock out the two- and three-dimensional cases easily.

So I'll stay with this one-dimensional case, where the matrix M -- and maybe I should even write down exactly what the iteration does -- but if I'm looking now at convergence, do I have it or don't I, and how fast? What are the eigenvalues, and what's the biggest one?
The eigenvalues of A are these 2 minus 2 cos theta_j, and P inverse is just 1/2 times the identity. So what are the eigenvalues of M? I know the eigenvalues of A, so I divide by 2. Let me write down what I get for the eigenvalues of M, remembering this connection: the eigenvalues of M are 1 minus 1/2 times the eigenvalues of A. Of course, it's very special that we have this terrific example matrix where we know the eigenvalues and really can see what's happening, and we know the matrix.

So what happens? I take half of that 2 -- that makes it a 1 -- and when I subtract it from the 1, it's gone. Half of the 2 cos theta_j -- the 2 is cancelled -- and I just get cos theta_j, whatever that angle is. It's a cosine, and it's less than 1.

So convergence is going to happen, and convergence is going to be slow or fast according as this is very near 1 or not -- and unfortunately, it is pretty near 1. To complete this information, I should say what the angles are -- von Neumann got them right: theta_j is j pi over N plus 1. Those are the angles that come in, just the equally spaced angles in the finite case. So we have N eigenvalues, and the biggest one is going to be when j is 1. So the biggest one, rho, the maximum of these lambdas, will be the cosine of pi over N plus 1, with j equal to 1. That's the one that's slowing us down.

Notice the eigenvector that's slowing us down, because it's the eigenvector that goes with that biggest eigenvalue that's going to hang around the longest, decay the slowest. That eigenvector is very smooth. It's the low frequency -- it's frequency 1. So this is j equal to 1, the low frequency. The eigenvector is just samples of a discrete sine.
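A small numerical check of that claim (my own sketch; N = 10 is arbitrary): build K, form the Jacobi iteration matrix M = I - K/2, and compare its eigenvalues with cos(j pi/(N+1)).

```python
import numpy as np

N = 10
K = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # 2's on the diagonal, -1's off
M = np.eye(N) - K / 2                                   # Jacobi: P = 2I, so M = I - P^{-1} A

j = np.arange(1, N + 1)
predicted = np.cos(j * np.pi / (N + 1))                 # cos(j*pi/(N+1)), j = 1..N

computed = np.sort(np.linalg.eigvalsh(M))
print(np.allclose(computed, np.sort(predicted)))        # True
print("rho =", max(abs(predicted)))                     # cos(pi/(N+1)), uncomfortably near 1
```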
It's the lowest frequency that the grid can sustain. Why do I emphasize which eigenvector it is? Because when you know which eigenvector is slowing you down -- and here it's the eigenvector you would think would be the simplest one to handle, the easiest eigenvector -- the higher-frequency eigenvectors have eigenvalues that start dropping. Cosine of 2 pi over N plus 1, cosine of 3 pi over N plus 1, are going to be further away from 1.

Now I have to admit that the very highest frequency, when j is capital N, happens to come back with a minus sign, but the magnitude is still just as bad. The cosine of N pi over N plus 1 is just the negative of that first one. So actually, we have two eigenvalues that are both hitting the spectral radius, and both of them are the worst, the slowest-converging eigenvectors. The first eigenvector looks very smooth; the eigenvector for that high-frequency one is the fastest oscillation. If I graph the eigenvalues -- I will, I'll graph them -- you'll see that it's the first and the last that are nearest in magnitude to 1.

Can I anticipate multigrid for a minute? Multigrid is going to try to get these guys to go faster. Now this high-frequency one can be handled, you'll see, by just a simple adjustment of Jacobi. I just weight Jacobi a little bit: instead of 2I, I take 3I or 4I -- 3I would be very good for the preconditioner. That will have the effect on this high-frequency one that it'll be well down now, below 1 safely. But this low-frequency one will still be the slowpoke in converging. That's where multigrid says: that's low frequency, that's too low, that's not converging quickly -- take a different grid on which that same thing looks higher frequency. We'll see that tomorrow.

So if I stay with Jacobi, what have I learned? I've learned that this cosine is rho. For this matrix, the spectral radius is that number, and let's just see how big it is.
How big is it? It's the cosine of a small angle. For theta near 0, the cosine of theta is about 1 minus 1/2 theta squared -- so it's below 1, of course -- and here theta is pi over N plus 1. All these simple estimates really tell you what happens when you do Jacobi iteration. You'll see it in the output when you -- I hope you will -- code up Jacobi's method, taking P to be a multiple of the identity, so each step is very fast, and you can graph the convergence rate.

Here's an interesting point, and I'll make it again: what should you graph to display convergence and the rate of convergence? Let me draw two possible graphs. Along the horizontal axis is k, the number of steps. I could graph the error -- well, the size of the error; it's a vector, so I take its length. It starts at e_0; this is k equals 0, the initial guess. I'm going to draw a graph which you'll draw much better by actually getting MATLAB to do it.

Before I draw it, let me mention the other graph we could draw. The first one tells us the error in the solution. If you think that's the natural thing to draw, it is, but we could also draw the error in the equation, the residual error. Let me put up here what I mean by that. A*x equals b exactly; A*x_k is close. The residual -- let me call it r, as everybody does -- is the difference: it's how close we are to b, so r_k is b minus A*x_k. The true A*x gets b exactly right; the wrong A*x_k gets b nearly right. And you see, because we have this linear problem, the residual is A times the error: x minus x_k is e_k, so r_k is A*e_k. So that's the residual, and I could graph that. That's how close my answer comes to getting b right, how close the equation is satisfied -- whereas the first graph shows how close x_k is. So that would be the other thing I could graph.
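A sketch of that experiment (my own illustration of "code up Jacobi's method"; the lecture suggests MATLAB, this is NumPy, and the sizes and step count are arbitrary): run plain Jacobi on K x = b and record both quantities discussed above.

```python
import numpy as np

N = 50
K = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
b = np.ones(N)
x_exact = np.linalg.solve(K, b)            # only used here to measure the error e_k

x = np.zeros(N)                            # start from x_0 = 0
err, res = [], []
for k in range(500):
    x = x + (b - K @ x) / 2.0              # one Jacobi step: P = 2I
    err.append(np.linalg.norm(x_exact - x))    # ||e_k||, error in the solution
    res.append(np.linalg.norm(b - K @ x))      # ||r_k|| = ||A e_k||, error in the equation

print(err[0], err[-1])                     # the two histories to plot against k
print(res[0], res[-1])
```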
r_k, the size of the residual, which is A*e_k. So now I have to remember what I think these graphs look like -- of course, it's a little absurd for me to draw them when you can compute them -- but I think what we see is that the residual drops pretty fast. The residual drops pretty fast and keeps going, for, let's say, Jacobi's method or, in a minute or maybe next time, another iterative method. But the error in the solution starts down fast -- what's happening is that the high-frequency components are getting killed -- and then those low-frequency components are not disappearing, because we figured out that they give the spectral radius, the largest eigenvalue. So the error curve slopes off, and the price of such a simple method is that you don't get great convergence. Of course, if I put this on a semilog plot -- log of the error against k -- the slope would be exactly connected to the spectral radius. That's the rate of decay: the size of e_k is approximately that worst eigenvalue to the k-th power.

Do you see how this can happen? The residual can be quite small -- I think it's very important to see the two different quantities. The residual can be quite small, but then, from the residual, if I wanted to get back to the error, I'd have to multiply by A inverse, and A inverse is not small. Our matrix A has eigenvalues close to 0, so A inverse is pretty big. A inverse times A*e_k -- the residual is small, but A inverse picks up those smallest eigenvalues of A, picks up those low frequencies -- and the error is bigger, slower to go to 0, as we see in this picture. So it's this picture that is our problem; that shows where Jacobi is not good enough.

So what more is there to say about Jacobi? For this matrix K, we know its eigenvalues. The eigenvalues of M we know exactly; the matrix M we know exactly. Now, if I had said, look at that matrix M and tell me something about its eigenvalues, what would you say?
You would say it's symmetric, so the eigenvalues are real, and you would say that every eigenvalue is at most 1 in size. Why? Because -- if you like to remember this business of Gershgorin circles -- every eigenvalue is in a circle centered at the diagonal entry, in other words centered at 0, and its radius is the sum of the off-diagonal magnitudes, which is 1. So we know that every eigenvalue is in the unit circle, and actually it comes inside the unit circle, but not very much -- only by about 1 over N squared.

What do I learn from this 1 over N squared? How many iterations do I have to take to reduce the error by, let's say, a factor of 10, or e, or whatever? If the eigenvalues are 1 minus a constant over N squared, I'll have to take on the order of N squared steps. So if I have a matrix of order 100, the number of iterations I'm going to need is 100 squared -- 10,000 steps. Every step is fast, especially with the Jacobi choice: here's my iteration, and P is just a multiple of the identity. So every step is certainly fast. Every step involves a matrix multiplication, and that's why I began the lecture by saying matrix multiplications are fast. A times x_k -- that's the work, and it's not much; it's maybe five non-zeros per row in A, so 5N. That's fast, but if I have to do it 10,000 times just to reduce the error by some constant factor, that's not good.

So that's why people think, okay, Jacobi gives us the idea. It does knock off the high frequencies quite quickly, but it leaves those low frequencies. Well, I was going to mention weighted Jacobi. Weighted Jacobi is to take, instead of that P, a weighted version -- put in a weight. So weighted Jacobi -- let me put this here, and we'll come back to it -- simply takes P to be the diagonal divided by a factor omega. For example, omega equals 2/3 is a good choice. Since the diagonal is 2 times the identity, this is just 2 over omega times the identity -- I'm just choosing a different constant.
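The arithmetic behind that count (a restatement of the estimate above): with rho about 1 minus a constant over N squared, reducing the error by a fixed factor takes on the order of N squared steps, since

```latex
\rho \approx 1 - \frac{c}{N^{2}}
\quad\Longrightarrow\quad
\rho^{\,k} \approx e^{-ck/N^{2}},
\qquad\text{so } \rho^{\,k} \approx \tfrac{1}{e} \ \text{ needs } \ k \approx \frac{N^{2}}{c}.
```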
I'm just choosing a different constant in P, and it has a simple effect on M. I'll draw the picture. The eigenvalues of M are these cosines, so I'm going to draw those cosines, the eigenvalues of M -- just a graph of cos theta from theta equals 0 to theta equals pi. What does a graph of cos theta look like from 0 to pi? Something like that: it starts at 1 and comes down to minus 1.

Now where are these angles theta_1 -- the bad one -- and theta_2, theta_3, theta_4, theta_5, with the last ones, equally bad, down here near pi? So there's the biggest cosine; that's rho, and you see it's pretty near 1. At the highest frequency I'm going to hit minus rho; it'll be, again, near 1 in magnitude. So the spectral radius is that value, rho, the cosine of that smallest angle, pi over N plus 1.

What happens when I bring in the weights? It turns out that the eigenvalues of M, with the weight, will be 1 minus omega plus omega cos theta -- you'll see it in the notes, no problem. So it's simply influenced by omega. If omega is 2/3, then I have 1 minus omega, which is 1/3, plus 2/3 cos theta.

Instead of graphing cos theta, which I've done -- that's the unweighted case, weight omega equal to 1 -- now I draw this new curve. So what's going on here? For a small theta, the cosine is near 1, so I'm again near 1 -- in fact, I'm probably even a little worse, probably up here -- and then I think it goes down like that. So it starts near 1, but it ends, when theta is pi, at 1/3 minus 2/3: it ends at minus 1/3. You see, that's a lot smarter. The unweighted one ended at minus 1, or near minus 1, but this one -- this is the omega equal 2/3, the weighted one -- ends at minus 1/3. All you've done is fix the high frequency.
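A quick check of those numbers (my sketch; N = 10 again arbitrary): with omega = 2/3 the high-frequency end drops in magnitude to roughly 1/3, while the lowest frequency stays close to 1.

```python
import numpy as np

N, omega = 10, 2.0 / 3.0
theta = np.arange(1, N + 1) * np.pi / (N + 1)

plain = np.cos(theta)                         # eigenvalues of M, unweighted Jacobi
weighted = 1 - omega + omega * np.cos(theta)  # eigenvalues with weight omega

print(plain[0], plain[-1])        # ~0.96 and ~-0.96: both ends near 1 in magnitude
print(weighted[0], weighted[-1])  # ~0.97 and ~-0.31: high end tamed, low end still slow
```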
So I would say, oh, that was a good thing to do: the high frequencies are no longer a problem. The low frequencies still are. So the only way we'll get out of that problem is by moving to multigrid.

Can I, in the remaining minute, report on Gauss-Seidel? Then I'll come back to it, but a quick report on Gauss-Seidel. So what's the difference in Gauss-Seidel? What does Gauss-Seidel do? It uses the latest information. As soon as you compute a component of x, you use it. Jacobi had to keep all the old x's until it had found all the new x's, but the Gauss-Seidel idea is: as soon as you have a better x, put it in the system. So the storage is cut in half, because you're not saving a big old vector while you compute the new one. I'll write down Gauss-Seidel again.

So it's more up to date, and the effect is that the Jacobi eigenvalues get squared. You're squaring numbers that are less than 1. So Gauss-Seidel gives the Jacobi eigenvalues squared. The result is that if I draw the same curve for Gauss-Seidel, I'll be below, right? Essentially, a Gauss-Seidel step is worth two Jacobi steps, because eigenvalues get squared as they would in two Jacobi steps -- if I do Jacobi twice, I square its eigenvalues. So it seems like absolutely a smart move. It's not brilliant, though, because the spectral radius becomes cosine squared: I lose the 1/2 in 1 minus 1/2 theta squared, so it's about 1 minus theta squared, but I'm still very, very close to 1. It still takes order N squared iterations to get anywhere.

So Jacobi is, you could say, replaced by Gauss-Seidel as being one step better, but it's not good enough, and that's why the subject kept going. Overrelaxation, with a good weight, gives rho as 1 minus order of 1 over N, first power, where Jacobi and Gauss-Seidel were 1 minus order of 1 over N squared. So that's definitely further from 1, and then the methods still to come can be way further from 1. So you can see some of the numerical experiments and analysis that's coming. I'll see you Wednesday to finish this and move into multigrid, which uses these guys.
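A final numerical check (my own sketch, not from the lecture): form the Gauss-Seidel iteration matrix M = I - P^{-1}A with P the lower triangular part of K, and compare its spectral radius with the square of Jacobi's.

```python
import numpy as np

N = 10
K = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)

def spectral_radius(P):
    M = np.eye(N) - np.linalg.solve(P, K)    # iteration matrix M = I - P^{-1} K
    return max(abs(np.linalg.eigvals(M)))

rho_jacobi = spectral_radius(np.diag(np.diag(K)))   # P = diagonal of K, which is 2I
rho_gs = spectral_radius(np.tril(K))                # P = lower triangular part of K

print(rho_jacobi, rho_jacobi**2, rho_gs)            # rho_gs matches rho_jacobi squared
```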