The following content is provided by MIT OpenCourseWare, under a Creative Commons license. Additional information about our license and MIT OpenCourseWare, in general, is available at ocw.mit.edu.

PROFESSOR: Several pieces of good news. Since Brett is back now, all of chapters five and six of my notes -- that is to say, all that we've covered in class, plus a little more -- are on the web page now. I was cautious at the beginning of the semester and didn't put that up, and then forgot that it wasn't there. So now you have something to work with, particularly on any experiments -- for example, minimum degree ordering.

Oh, and also, there's a movie now. Tim Davis from Florida and [? Perils ?] [? Person ?], my friend here, created a little movie to show the order in which elimination occurs. And I had a lot of email with Tim Davis.

On that one-page handout, for 2D Laplace's equation, it mentions N cubed for nested dissection, where you cut the region in half, and half, and half, by separators. I believe, and I think experiments will show, that minimum degree is even better. But the analysis is extremely weak, apparently, on minimum degree. So anyway, one possible project, either for now or for the final project, would be some experiments to see what the actual convergence is like, and maybe a little understanding of minimum degree. So we'll talk more about that. Anyway, there's a movie and a lot more material there on the website. It'll get updated and corrected as I do the edits.

Second bit of good news: I made a mistake. That's not normally good news, but this time I reported a little calculation for multigrid, and I computed the wrong thing. You remember that I computed M. M was the Jacobi matrix -- the iteration matrix with Jacobi -- which is I minus D inverse A. This is for the 1D second differences; in fact, this was the 5 by 5 matrix that we've been talking about. And this is multiplied by D inverse, which is a half, but then with a weighting factor, that becomes a third.
So that's weighted Jacobi now. And this is what we would get for ordinary, simple iteration. And it's not satisfactory. Remember, it had an eigenvalue lambda_max -- the spectral radius -- of about 0.9.

Then I reported the eigenvalues of the multigrid matrix built from S and M, and I was a little disappointed in them. The reason is, I should have been using I minus S, not S. When I do that, I'm not disappointed anymore.

What are the eigenvalues of this? You remember, S is the matrix that tells us how much error we're capturing. S is the matrix that came from -- the error we captured at the end of multigrid was S times the error that we entered with. And so the error that's left -- the remaining error, the new error -- is the rest. It's the part we don't get, and that's the one I forgot. That's (I minus S) e. So that's why I should have been using I minus S. So then, step one of multigrid does a smoother. Steps two, three, four leave me I minus S. And step five does another smoother. And this is just with one Jacobi step, where you would probably do three.

Anyway, the eigenvalues: well, this I minus S has two zero eigenvalues and three 1's. So that matrix has eigenvalues 0, 0, 1, 1, 1. Why is that? Well, we get two 0's -- you remember the eigenvalues had to be 0 or 1, because S squared equaled S, and (I minus S) squared equals I minus S. We remember that. Now, the question is, what do those become in the whole cycle, with the smoothers? And the answer is that 1/9 is a triple eigenvalue. I was astonished to discover 1/9 -- of course, the 0, 0 survived. The rank is 3. And this is now looking much better. What I reported last time was something more like 7/8, and I thought, oh, multigrid, it's not showing its true colors. But actually it does. This is the kind of eigenvalue we expect, like 1/10, for a multigrid cycle. So, much better.
If we just did three M's, for example, I would only get down to 0.9 cubed, which is more than 0.7, whereas by doing multigrid instead, we're down to about 0.1. OK, so that's the good news. And, of course, if you experiment a little, you'll find different numbers for different sizes and really see what's happening. And, in fact, you could do 2D. I mention all this partly for now, if you're wanting to do it now, and partly for later.

Oh, and thinking about projects, just one word to repeat: the Monday after spring break -- I think that's April the third -- no class. So it'll be Wednesday that I'll see you again. But Mr. [? Cho ?] will be available. He has office hours, so that's April 3: if you wanted to discuss issues with your project with Mr. [? Cho ?], that would be at class time, in his office, 2-130, at one o'clock. Otherwise, take an extra day of spring break, and I'll see you Wednesday. OK, so that's various bits of good news.

Oh -- one other thing that I guess I didn't finish in that lecture was to identify the two eigenvalues. Well, eigenvectors. Those will be in the notes too. I guess I found one eigenvector, [1, 2, 2, 2, 1]. And glancing at the matrix, I didn't spot the other one, but it has one oscillation: it's [1, 2, 0, -2, -1]. Anyway, this is the main thing -- a much better result. So that's multigrid, which we now can use as our iteration. Or we can use it as a preconditioner. Many of the recommended methods use multigrid as a preconditioner before something even more powerful. Well, I don't know about even more powerful -- multigrid people would say that's not possible. But in some situations we may want to go to a different method, but we want a preconditioner, and multigrid is excellent. By multigrid, I include the smoother. So it'd be that. That would be a possible preconditioner for something else. OK.
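If you want to reproduce that 1/9 yourself, here is a small sketch of the check. The smoother weight 2/3 is the one from class; the linear interpolation and full-weighting restriction below are my assumptions about the standard transfer operators from the multigrid lectures, not something fixed by this transcript. With those choices the numbers do come out as quoted: spectral radius of M about 0.91, and two-grid eigenvalues 0, 0, 1/9, 1/9, 1/9.

    % two-grid eigenvalue check for the 5 by 5 model problem (a sketch)
    n = 5;
    K = 2*eye(n) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);   % 1D second differences
    M = eye(n) - K/3;                     % weighted Jacobi: D inverse = 1/2, weight 2/3
    max(abs(eig(M)))                      % spectral radius, about 0.91
    P = 0.5*[1 0; 2 0; 1 1; 0 2; 0 1];    % assumed linear interpolation from 2 coarse points
    R = 0.5*P';                           % assumed full-weighting restriction
    Kc = R*K*P;                           % coarse-grid matrix
    S = P*(Kc\(R*K));                     % coarse-grid correction captures S times the error
    T = M*(eye(n) - S)*M;                 % smoother, steps 2-4, smoother
    sort(real(eig(T)))                    % 0, 0, 1/9, 1/9, 1/9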
So where are we now? We're into the new section, starting this moment. And what's the section about? It's about things associated with this name: Krylov. So I'm going to use the letter K for the things that are associated with it -- and I'm always solving A*u equals b.

There's a Krylov matrix that's created exactly as you see here. This is the j-th one -- the one with j columns: b, A*b, A squared b, up to A to the j minus 1 b. And the Krylov space is the combinations of those vectors. That's what this word, span, means. I could erase span and say "all combinations of." Maybe that's even more familiar. So "spanned by" is the same thing as saying "all linear combinations of" those j vectors. And, of course, that's the same as saying it's the column space of the matrix, because the column space is all combinations. OK.

So, why am I interested? Why was Krylov interested? Why is everybody interested in these vectors? Because actually, that's what an iteration like Jacobi produces. If I use Jacobi's method -- or Gauss-Seidel, any of those -- after one step, I've got b. After two steps, there's a multiplication by A in there, right? And some combination is taken, depending on the particular method. So after two steps, I've got a combination of b and A*b. After three steps, Jacobi produces some combination of b, A*b, A squared b. In other words, all of those iterative methods are picking their j-th approximation in this space K_j, actually. I should put a little j on it, to indicate that. So we have these spaces growing. They grow by one dimension, by one new basis vector, at each iteration. And the point is, Jacobi makes a particular choice of x_j or u_j, the approximation after j steps. But does it make the best choice? Probably not. In fact, pretty definitely not. And so the idea is, let's make the best choice.
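Just to make that concrete, here is a minimal sketch of building the Krylov matrix column by column; each new column is one sparse multiplication by A applied to the previous column. The small matrix and right-hand side are purely illustrative.

    A = [2 -1 0; -1 2 -1; 0 -1 2];   % a small symmetric example
    b = [1; 0; 0];
    j = 3;
    K = zeros(length(b), j);
    K(:,1) = b;
    for k = 2:j
        K(:,k) = A*K(:,k-1);         % each new column is one multiplication by A
    end
    K                                % columns b, A*b, A^2*b; their span is K_3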
Let's not just use a simple iteration that builds in a choice that might not be so terrific. Let's make the best choice. There are a whole lot of Krylov methods, which all choose x_j -- since I think this section will use x, I'm going to change to x. They'll all be iterative. They'll all start with x_0 as 0, say. And then the first guess will be b, maybe. Or maybe not -- maybe some multiple of b. Actually, a good Krylov method will take the best multiple of b as the first guess, and not necessarily 1 times b. And onwards. So we have this word, best, coming up. What's the best vector in that space? And there are different methods depending on what I mean by best.

Oh, let me tell you the name of the method that I'm going to concentrate on first and most: the conjugate gradient method. CG. The conjugate gradient method. What determines x_j? It chooses x_j in K_j, the space that we always look for our approximation in.

Let me not forget to say: these vectors b, A*b, A squared b, and so on are easy to compute, because each one is just a matrix multiplication away from the previous one. And the matrix is -- we're assuming we're working with sparse matrices, and sometimes, especially, symmetric ones. So let me just put in here, before I even finish that sentence, that CG is for A transpose equal to A -- symmetric. It's only for those. And positive definite -- so symmetric positive definite. So it's a limited class of problems, but a highly, highly important class. And you may say, what do we do if the matrix is symmetric, but indefinite? Well, that comes next. Or what if the matrix is not symmetric at all? Well, if you're brave, you might try conjugate gradients anyway. But if you're cautious, then you would use one of the other methods, like MINRES.
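In practice that is just a choice of solver. A quick sketch with MATLAB's built-in Krylov routines (the matrices here are only illustrations): pcg, conjugate gradients, for a symmetric positive definite A, and minres when A is symmetric but possibly indefinite.

    n = 100;  e = ones(n,1);
    A = spdiags([-e 2*e -e], -1:1, n, n);   % sparse second differences: symmetric positive definite
    b = e;
    [x, flag] = pcg(A, b, 1e-8, 500);       % conjugate gradients
    norm(b - A*x)
    Aind = A - 2*speye(n);                  % symmetric but indefinite
    [x, flag] = minres(Aind, b, 1e-8, 500); % MINRES asks only for symmetry
    norm(b - Aind*x)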
Maybe I'll just mention one more on our eventual list. Actually, the name probably tells us what choice it makes: MINRES chooses the x_j to minimize the residual. Minimum r. Remember, r is b minus A*x. The residual r will always denote the error in the equation. And so it's minimum residual. OK. So that's a natural choice, but you'll see that conjugate gradients is a fantastic method -- superior, just quicker than MINRES, for a nice reason.

So it chooses x_j in K_j, and I think the rule it uses is that x_j should be orthogonal -- I'll go and check it in my notes -- to the residual r_j. Let me just be sure: if I'm going to write that down, I'd better be sure I said it correctly. Actually, I didn't say it correctly. r_j is orthogonal to the whole space K_j. It turns out that we can choose x_j -- when I make a choice of x_j, it's in K_j. Now, when I compute r, there's a multiplication by A to get the residual, so that moves us up to the next space, and I'm going to make a choice where this residual is orthogonal to the whole space K_j. That's the choice conjugate gradients make. And I can't start on that, for a good reason. I have to start with something called Arnoldi. And what's Arnoldi about?

Well, let me come back to these vectors. Quick to compute. But what's the other property? Everything in applied numerical mathematics -- you're choosing basis vectors, and you're looking for two properties. And I guess this is like a general rule: whenever you meet a whole new problem, in the end, you're going to construct some basis vectors if you're going to compute. And what properties are you after? You're after speed. So they have to be quick to compute and work with. Well, these are: multiplying by A is sparse matrix multiplication, you can't beat that. But the other property that you need is some decent independence. And not just barely independent.
You want the condition number of the basis, somehow -- if I use that word -- to be good. Not enormous. And best of all, you would like the basis vectors to be orthonormal. That's a condition number of one. That's the best basis you can hope for. And the point is that this construction produces a lousy basis from the point of view of conditioning. So Arnoldi's job is to orthogonalize the Krylov basis, which is b, A*b, and so on, producing an orthonormal basis q_1, q_2, up to q_j. It's just beautiful. So Arnoldi is like a preliminary step that Krylov has to take, to get something that is numerically reasonable to work with.

Then, is it fast? Arnoldi does have to do a multiplication by A -- you'll see one here in the middle of Arnoldi. And then, of course, if you look at those ten lines of MATLAB -- we could go through them a little carefully, but let's just take a first look. The first step is like Gram-Schmidt, right? It's sort of a Gram-Schmidt idea. You take that first vector b, you accept that direction, and the only step remaining is to normalize it: divide by its length. Then you go on to the next. Then you have some trial vector t, which would in this case be A*b, in the direction of A*b, which won't be orthogonal to the original b, right? So this won't be orthogonal to that one. So, of course, you've got to compute an inner product between them, and subtract off the right multiple of the previous one from the current t to get the improved t. And then you're going to normalize it again: you compute its length and divide by the length.

You see that overall pattern in Arnoldi? It's the familiar idea of Gram-Schmidt: take a new vector, find its projections onto the ones that are already set, subtract those components off, you're left with a piece, and you find its length and normalize that to be a unit vector. It's very familiar.
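The ten MATLAB lines from the notes aren't reproduced in this transcript, so here is my own sketch of the same Gram-Schmidt-style loop -- not the code from the notes, and with my own variable names (it would live in a file arnoldi.m).

    function [Q, H] = arnoldi(A, b, j)
    % ARNOLDI  Orthonormalize the Krylov basis b, A*b, ..., A^(j-1)*b.
    %   Returns Q (n by j, orthonormal columns) and H ((j+1) by j, Hessenberg).
    n = length(b);
    Q = zeros(n, j);  H = zeros(j+1, j);
    Q(:,1) = b / norm(b);                 % accept b's direction, just normalize it
    for k = 1:j
        t = A*Q(:,k);                     % trial vector: one multiplication by A
        for i = 1:k                       % subtract off components along the earlier q's
            H(i,k) = Q(:,i)'*t;
            t = t - H(i,k)*Q(:,i);
        end
        H(k+1,k) = norm(t);               % length of the piece that is left
        if k < j
            Q(:,k+1) = t / H(k+1,k);      % normalize it to get the next q
        end
    end
    end

Calling [Q, H] = arnoldi(A, b, j) returns the orthonormal q's as the columns of Q and the multipliers h as the entries of H; for a symmetric A the inner loop could stop at the two most recent q's, which is the short recurrence coming next.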
It's exactly the type of algorithm that you would write down immediately. And it just involves the one multiplication by A, so it's not going to cost us a lot to make the vectors orthonormal.

Well -- wait a minute. Is it going to be expensive or not? That's like the key question. It's certainly going to produce a good result: producing orthonormal vectors is going to be a good thing to do. But is it expensive or not? That depends. It's certainly not expensive if the new trial vector t has components in only one or two of the already-settled directions. In other words, if I only have to subtract off a couple of earlier components, then I'm golden. And that's the case when A is symmetric. So that will be the key point here, and I just make it here. If A is symmetric, A transpose equals A, then in this subtracting off, where I'm headed for the new q, I only need to subtract off h_(j, j), multiplying the q_j -- the q that was just set -- and the one before, with h_(j-1, j). I'll call that a short recurrence. It's short because it only has two terms. And then there'll be a new h_(j+1, j), which is just the length.

So there will be one of these magic things that appear in so many parts of mathematical analysis: a three-term recurrence relation will hold. In fact, I guess the reason we see three-term recurrence relations in all the classical things -- Legendre polynomials, Chebyshev polynomials, all of those -- the reason is actually the same as it is here: something in the background is symmetric.

I want to put a few numbers on the board so you see what is typical input and output for Arnoldi. And you'll see it's symmetric, and then you'll see the short recurrence, and then we want to see -- what? OK. So I worked out a typical Arnoldi example. Not that one. Here. OK. I think this is a good example to look at. So the matrix A is not only symmetric, it's diagonal.
So of course, to find A inverse b is not difficult. But we're going to do it through conjugate gradients. So that means that we don't figure out A inverse, which, of course, we easily could. We just use A, which is very sparse. Here's our right-hand side. Here's our Krylov matrix: it has b in its first column, it has A times b in the second column, multiply by A again to get the third column, and A one more time to get the fourth column. So that's K_4, right? And for a 4 by 4 problem, that's the end. So actually, here I'm carrying Krylov to the end of it. There's no more to do, because once I've got this K_4, the combinations of those columns are all of R^4, all of four-dimensional space.

Oh, yeah -- that raises an important point about how we're improving on iterations. If I was using Jacobi, or Gauss-Seidel, or one of those others, then -- well, I won't say for this particular A, but usually -- after four steps I am by no means finished. I've taken four steps, I've gotten closer, but I haven't got the exact answer. But now, in this world of Krylov methods, after four steps I will have the exact answer. Why? Because Krylov methods take the best x_4 out of the space spanned by those four columns. Well, the whole space R^4 is spanned by those columns, and taking the best one must give the answer. So x_4 will actually be A inverse b, the answer we're looking for.

So these methods are, on the one hand, direct methods. They finish after four steps -- finished, you've got the answer. Now, that's actually how they were born: as direct methods that gave the answer from an interesting iteration. And they nearly died for that reason. They were born and died, or were on death's door, as direct methods. Because as direct methods, there are better ones.
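That finish-in-n-steps point is easy to check. The diagonal entries 1, 2, 3, 4 and the all-ones right-hand side below are my guess at the board example -- they are at least consistent with the H entries quoted a little later -- and the combination of the four Krylov columns that satisfies the equation is already the exact solution.

    A = diag([1 2 3 4]);                 % assumed board example: diagonal, symmetric
    b = ones(4,1);
    K4 = [b, A*b, A*A*b, A*A*A*b];       % the Krylov matrix K_4
    c = (A*K4)\b;                        % pick the combination x_4 = K4*c with A*x_4 = b
    x4 = K4*c;
    norm(x4 - A\b)                       % zero up to roundoff: four steps give the exact answer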
Then, some years later, people went back to it -- back to conjugate gradients -- and noticed that, thought of as just going partway, it gave very successful answers. So we're going to think of Krylov -- we're planning to stop before j equals n. In this picture, j equals n equals 4. But we plan to stop for j much below n. We're thinking of applications where n is 10 to the fifth, and we're going to stop at 10 squared -- 100 steps, maybe. Or ten steps. So that meant a total reconsideration, a reanalysis, of conjugate gradients, and it is now back to being a big success. Well, it's a rather unusual thing that a method that people have put aside gets picked up again and becomes a favorite, as conjugate gradients has.

First of all, I was saying that the basis -- those basis vectors -- are not very independent. That's not a good basis. Of course, it's an attractive basis, and maybe -- do you know the name for a matrix that has this particular form? The first column is constant -- that's the key column there -- and then it's first powers, second powers, third powers. Do you remember whose name is associated with that matrix? It comes up in interpolation, and it starts with a V? Anybody know? Vandermonde, yeah. Vandermonde. So I could call it V for Vandermonde. And Vandermonde matrices are not very well conditioned.

So now, a little timeout to say -- because it's so important -- how do you judge the good or poor conditioning of a matrix? Those vectors are linearly independent. The determinant of V is not zero. The matrix is not singular, but it's too close to singular. And how do you test the -- suppose you have a basis, as we have here, and you want to know: is it good or not? So, of course, you always put your basis vectors -- let me call them -- well, I don't want to call them v because -- well, I guess I could call them v, they come out of Vandermonde: v_1, v_2, v_3, v_4. That's our matrix. Essentially, I want its condition number.
And to find its condition number -- it's not symmetric, so I can't just take the eigenvalues of that. You might say, look at lambda_max and lambda_min; the condition number is associated with max divided by min. But when the matrix isn't symmetric, just taking its eigenvalues directly is not cool. It's not reliable. I could have a matrix that's badly conditioned, but all its eigenvalues are 1. I mean, a matrix with 1's on the diagonal and zillions up above the diagonal would have eigenvalues of 1, but it would be badly conditioned.

So the right way is to take V transpose V. Look at v_1 transpose, v_2 transpose... As always, if a matrix isn't symmetric -- if the matrix V is not symmetric -- it's a good idea to form V transpose V. That is symmetric. It does have positive eigenvalues. And those eigenvalues, the eigenvalues of V transpose V, are the singular values -- or rather the singular values squared -- of V. So I guess I'm saying, you can't trust the eigenvalues of V; it's the singular values you can trust. And the way to find singular values is to form V transpose V. That gives you a symmetric matrix, and its eigenvalues -- the i-th eigenvalue -- would be the i-th singular value squared.

So the condition number is sigma_max over sigma_min. And, well, it's not enormous for this 4 by 4 matrix, but if I go up to 10 by 10 or 100 by 100 -- 100 by 100 would just totally wipe me out; 10 by 10 would already be disastrous. Completely disastrous, actually. A 10 by 10 Vandermonde matrix, with 1, 2, 3, 4, up to 10 as the points, would have entries -- if that ended in a 10 and I had 10 rows -- something like 10 to the ninth power. I mean, the dynamic range would be terrible.

So finally, just to say: V transpose V is called the Gram matrix. So that guy Gram is coming back in as giving the good measure. So the point was, then, that this measures the dependence or independence of the v's. That ratio.
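A quick numerical sketch of that measure, on the same assumed 4 by 4 example: the ratio sigma_max over sigma_min for the Krylov (Vandermonde-like) columns, next to the condition number of an orthonormal basis for the same space.

    A = diag([1 2 3 4]);  b = ones(4,1);     % same assumed example as before
    V = [b, A*b, A*A*b, A*A*A*b];            % columns 1, k, k^2, k^3: Vandermonde-like
    s = svd(V);
    s(1)/s(end)                              % sigma_max / sigma_min for the Krylov columns
    [Q, R] = qr(V);
    cond(Q)                                  % an orthonormal basis for the same space: exactly 1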
The bigger that ratio is, the more dependent they are. What's the ratio if they're orthonormal? What's the ratio for the q's? What's the condition number of the q's? If we've run Arnoldi and got a good basis, it's one. What's the Gram matrix? If this is Q transpose Q, with the q's -- the orthonormal basis -- in the columns, then Q transpose Q is the identity. That's its Gram matrix. So Q transpose Q would be the identity, and its condition number is the best possible one: lambda_max is 1, lambda_min is 1, the condition number is one. OK. So that's just some comments about why Arnoldi gets brought in to fix the situation. OK.

So I'll just leave those ten Arnoldi steps on the board. The notes give ten comments. I'm quite proud of -- well, my MATLAB is unreliable. Actually, you'll find a few errors, like after "end" I accidentally put a semicolon, which was absurd. But the comments I'm pleased with, and then the numerical example that runs through one cycle of this, with these numbers, to see what q_2 is, is, I hope, useful.

Why do we have a short recurrence? This is a key point here. And in this example, if I work out the h's, I'll discover, sure enough, h_(1, 3) will be 0, h_(1, 4) will be 0. Here's the key equation. Here's Arnoldi in matrix language. Let me see if I can remember Arnoldi in matrix language. So, Arnoldi is taking the matrix -- yeah. So Arnoldi in matrix language is going to be this: A*Q equals Q*H. I can't write out all of Q. So that's the big equation. It's a very important equation that we now have all the pieces for. A is our original matrix that we were given -- symmetric, let's say. Q is our basis out of Arnoldi, and H holds the multipliers that gave that basis. So this Q*H is a little bit like Gram-Schmidt. Do you remember, Gram-Schmidt is described by Q times R. Q, again, is orthonormal. So it's an orthogonal matrix.
In Gram-Schmidt, R is upper triangular. Here it's not. Here it's Hessenberg -- so H stands for Hessenberg. I'll write down what the actual H is for these numbers. I won't write down the Q; I'll just write down what H turned out to be for those numbers, if I did it right. Five halves. Oh, interesting -- the diagonal entries all turned out to be five halves. And this entry below turned out to be root 5 over 2, this one turned out to be the square root of 4 over 5, and this one turned out to be the square root of 9 over 20. And the point is, from here, this one is just below the diagonal, and it will show up as symmetric: root 5 over 2, root 4/5, and root 9/20 appear above the diagonal too. OK.

So, what am I seeing from that particular H, which somehow can't be an accident? It must be that it's built in. It's the fact that H is symmetric and tridiagonal. And what does that tridiagonal tell us? It tells us that we have short recurrences. It's the three-term recurrence relation that I'm seeing here in matrix language, because there are three non-zeros in the columns of H. All right.

I was going to try to understand this one by looking at the first column. What if I take the first column of both sides? That'll be A times q_1, right? q_1 is the first column vector -- the first basis vector. And what do I have here when I take q's times h's? Do you do matrix multiplication a column at a time? You should. OK. So this says: take 5/2 of the first column, and take that factor, root 5 over 2, times the second column. And I could track back and see, yes, that's what Arnoldi has produced. And then the second column would be A*q_2; the next would be A*q_3.

Well, look, here it is. I want to show that H is symmetric when A is. Could you do that? We know what the property of Q is here.
We know that Q -- because Arnoldi made it that way -- has Q transpose Q equal to I. And I can, in one step, show that H is symmetric if A is symmetric. How do you do that? I guess we need a formula for H, so I just multiply by Q inverse. So that's good. And even better is to recognize what Q inverse is. So what is Q inverse here? It's Q transpose. Any time we see Q -- that's my letter, always, for an orthogonal matrix. So this is Q transpose A Q. And now what? The argument's finished. We're here. If A is symmetric, what can you say about that combination -- if A is a symmetric matrix? It is symmetric. Right? That's how you get symmetric matrices: you start with one, you multiply on one side by a matrix and on the other side by its transpose, and the thing has to be symmetric. Because if I transpose this whole thing, what will happen? To transpose a product, the transposes come in the opposite order. So Q -- its transpose comes first. A -- its transpose comes in the middle, but what's that? The transpose of A is A; we're assuming A to be symmetric. And then the transpose of Q transpose is? Is Q. So we get this back again when we transpose, so it's symmetric. So H is symmetric. So the conclusion is, H equals H transpose.

And then we know immediately that it's tridiagonal, because every H from Arnoldi is Hessenberg. We know those zeros are there: in column j, the Arnoldi cycle produces the entries h_(i, j) and stops one below the diagonal, so everything further below is zero. So we know those are zero; but now, if we know the matrix is symmetric, then the matching entries above the diagonal are zero too. And, of course, it works out that way. So the conclusion is that we can orthogonalize the Krylov basis quickly and easily, and work with that basis -- either explicitly, by computing it, or implicitly, by keeping things orthogonal -- and that's what conjugate gradients will do.
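To close the loop on the small example, here is a check of that fact -- assuming again that the board example was A = diag(1, 2, 3, 4) with b of all ones, a choice consistent with the entries quoted above. Orthonormalizing the Krylov columns and forming Q transpose A Q gives a symmetric tridiagonal H with 5/2 down the diagonal and root 5 over 2, root 4/5, root 9/20 beside it.

    A = diag([1 2 3 4]);  b = ones(4,1);   % assumed board example
    V = [b, A*b, A*A*b, A*A*A*b];          % Krylov columns
    [Q, R] = qr(V);                        % orthonormalize (Gram-Schmidt in matrix form)
    Q = Q*diag(sign(diag(R)));             % fix signs to match Arnoldi's positive normalization
    H = Q'*A*Q                             % symmetric tridiagonal, 5/2 on the diagonal
    norm(A*Q - Q*H)                        % and A*Q = Q*H, up to roundoff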
So next time, I'm going to make an effort to describe the conjugate gradient method. I'll pick out the highlights of it. It has fantastic properties, and to verify those properties in full detail is often more confusing than not. If you've seen today's lecture, you're seeing the important points: the role of the symmetry of A, and the Arnoldi algorithm. OK. So that's our first lecture on the Krylov ideas. And next time, we'll probably complete that topic, and it will be on the web. So I'll see you Wednesday for the end of Krylov. Thanks. Good.