The following content is provided by MIT OpenCourseWare under a Creative Commons license. Additional information about our license and MIT OpenCourseWare in general is available at ocw.mit.edu.

PROFESSOR: Up on the web now are lots of sections, including this one on conjugate gradients, so today is the end of the conjugate gradients discussion, and it's kind of a difficult algorithm. It's beautiful and slightly tricky, but if we just see what the point of conjugate gradients is, we don't have to check all the exact formulas that I've listed here. So that's one step of conjugate gradients, going from k minus 1 to k, and we'll look at each of the five steps in the cycle without patiently checking all those formulas. We'll just, sort of, see how much work is involved.

But let me begin where we were last time, which was Arnoldi. So Arnoldi was a Gram-Schmidt-like algorithm, not exactly Gram-Schmidt, that produced an orthogonal basis, and that basis is the basis for the Krylov subspace. Each new basis vector in the Krylov subspace is found by multiplying by another power of A, but that's a terrible basis. So the basis that we get from those powers has to be improved, orthogonalized, and that's what Arnoldi does. And if we look to see what it does matrix-wise, it turns out to give exactly that nice equation A*Q = Q*H: we're creating a matrix Q with the orthogonal vectors, and a matrix H that tells us how they were connected to the matrix A. OK.

So that's the central equation, and the properties of H are important. We saw that the key property was that, in general, H is full up in that upper corner, just the way Gram-Schmidt would be. But in case A is a symmetric matrix, then we can check from this formula that H will be symmetric. And the fact that it's symmetric means that those are zeros up there, because we know there are zeros down here -- we know there's only one diagonal below the main diagonal. And if it's symmetric, then there's only one diagonal above, and that's the great case, the symmetric case.
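For reference, here is a minimal NumPy sketch of the Arnoldi iteration as just described. It is not checked line by line in lecture; it's a plain dense-matrix version with no safeguard against breakdown:

```python
import numpy as np

def arnoldi(A, b, k):
    """k steps of Arnoldi: orthonormal columns in Q and a (k+1) x k
    Hessenberg matrix H with A @ Q[:, :k] == Q[:, :k+1] @ H."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)        # first basis vector q_1
    for j in range(k):
        v = A @ Q[:, j]                    # multiply by A: next Krylov direction
        for i in range(j + 1):             # orthogonalize against earlier q's
            H[i, j] = Q[:, i] @ v
            v = v - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)    # assumes no breakdown (norm > 0)
        Q[:, j + 1] = v / H[j + 1, j]
    return Q, H
```

When A is symmetric, the H that comes back is (up to roundoff) symmetric and tri-diagonal, which is the short-recurrence case the lecture keeps returning to.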
So I'm going to take two seconds just to say, here is the best way to find eigenvalues. We haven't spoken one word about computing eigenvalues, but this is the moment to say that word. If I have a symmetric matrix, eigenvalues are way, way easier; linear systems are also easier, but eigenvalues are an order of magnitude easier for symmetric matrices. And here is a really good way to find the eigenvalues of large matrices.

I assume that you don't want all the eigenvalues. It is very unusual to need many, many eigenvalues of a big matrix. Probably those high eigenvalues are not really physically close, not good representations of the actual eigenvalues of the continuous problem. So we have continuous and discrete that are close for the low eigenvalues, because that's where the error is smooth. High eigenvalues are not so reliable. So for the low eigenvalues, here is a good way to do them.

Do Arnoldi -- and it gets named also, now, after Lanczos, because he had this eigenvalue idea. Run that for a while. Say if you want 10 eigenvalues, maybe go up to j equal to 20. So 20 Arnoldi steps, quite reasonable. And then look at the 20 by 20 submatrix of H; you're not going to go to the end, to H_n. You're going to stop at some point, and you have this nice matrix. It'll be tri-diagonal -- symmetric tri-diagonal -- and that's the kind of matrix we know how to find the eigenvalues of. Eigenvalues of that class of matrices are very nice, and they're very meaningful, and they're very close to -- in many cases; I won't try to give a theory here -- in many cases, you get very good approximations. The eigenvalues of this submatrix are often called Ritz values. And so the first 10 eigenvalues of that 20 by 20 matrix would be, in many, many problems, very fast, very efficient approximations to the low eigenvalues of the big matrix A that you start with.
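A sketch of that recipe, reusing the arnoldi function above; the test matrix and the choice of 20 steps are illustrative:

```python
import numpy as np

# A symmetric test matrix: the -1, 2, -1 second-difference matrix.
n = 400
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

b = np.random.default_rng(0).standard_normal(n)   # any starting vector
Q, H = arnoldi(A, b, 20)                          # 20 Lanczos/Arnoldi steps
ritz = np.linalg.eigvalsh(H[:20, :20])            # eigenvalues of the 20 x 20 piece
print(ritz)   # Ritz values; the ends of the spectrum settle down first
```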
OK, the reason is this. Do you all recognize that this is the operation -- when I bring Q inverse over here -- that makes these matrices similar? Two matrices are similar if there's a Q that connects them this way, Q inverse times A times Q. And similar matrices have the same lambdas, the same eigenvalues. So if we went all the way to the end, the full matrix H would be similar to A, and we could find the eigenvalues from it; but the whole point of all this, of this month, is algorithms where you can stop partway, and you still have something really good.

OK, so I think we can't do everything, so I won't try to do more with eigenvalues except to mention that. Someday you may want the eigenvalues of a large matrix, and I guess ARPACK would be the Arnoldi package; that would be the package to go to for the Arnoldi iteration, and then to use for the eigenvalues. OK, so that's a side point. Since linear algebra is half linear systems and half eigenvalue problems, I didn't think I could miss the chance to do that second half, the eigenvalue part.
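A quick numerical check of that similarity fact, purely illustrative:

```python
import numpy as np

# Similar matrices Q^{-1} A Q have the same eigenvalues.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
A = A + A.T                                        # symmetric, to keep eigenvalues real
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # a random orthogonal Q
B = Q.T @ A @ Q                                    # Q^{-1} = Q^T for orthogonal Q
print(np.allclose(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)))   # True
```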
OK, now I'm going to tackle conjugate gradients, to get the idea. So the idea -- well, of course, A is symmetric positive definite. If it's not, you could try these formulas, but you'd be taking a pretty big chance. If it is, these are just right. OK, so what's the main point? Here's the rule that decides what x_k will be. Of course, x_k is in this Krylov space K_k. So we can create x_k recursively, and if we do it right, we should need just one multiplication by A at every step, and then some inner products.

OK, so now, what do we know from this? This vector r_k, which is b minus A*x_k: x_k is in the k-th Krylov subspace, but it's multiplied by A, so A*x_k is up in the k plus first. And b is in K_1; it's in all the Krylov subspaces. And therefore r_k is in K_(k+1), right? It's in the next space up, because there's a multiplication by A.

And therefore, if we choose to determine x_k by this rule, we learn something about this vector. We already know from Arnoldi that q_(k+1) is orthogonal to this space, and it's the new k plus first vector that's in K_(k+1). So this vector r_k is found in the next Krylov subspace, and q_(k+1) is in the next Krylov subspace; they're both orthogonal to all the previous Krylov subspaces, so one's a multiple of the other. So that will mean, automatically, that the r's have the same property that Arnoldi created for the q's: they're orthogonal. So the residuals are orthogonal. Orthogonal residuals, good.

I better say, by the way, when we do conjugate gradients, we don't also do Arnoldi. Arnoldi's finding these q's; we're leaving him behind for a while, because we're going to go directly to the x's and r's here. So I won't know the q's, but I know a lot about the q's, so it's natural to write this simple fact: each residual is a multiple of the new q, and therefore it's orthogonal to all the old ones.

And now, this is the other property that determines the new x. So what do I see here? I see a sort of delta x, the change in x, the correction -- which is what I want to compute -- at step k; that's what I have to find. And I could look at the corrections at the old steps, and notice there's an A in the middle. So the corrections to the x's are orthogonal in the A inner product, and that's where the name conjugate comes from. So this is conjugate directions. Why do I say directions, and why do I say conjugate? Conjugate gradients, even, I could say. Conjugate directions -- I maybe shouldn't have erased that -- but I'll also say conjugate gradients. OK.
So conjugate -- all I mean by conjugate is orthogonal in the natural inner product that goes with the problem, and that's given by A. The inner product given by the problem, the inner product of x and y, is x transpose A*y. And that's a good inner product; it's the one that comes with the problem, and so it's natural to look at orthogonality in that sense, and that's what we have. So the residuals are orthogonal in the usual sense, because they have an A already built into them, and the corrections are orthogonal in the A inner product. OK.
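Written out, that inner product is one line; a minimal sketch, where the symmetric positive definite test matrix is illustrative:

```python
import numpy as np

def a_inner(x, y, A):
    """The inner product that comes with the problem: (x, y)_A = x^T A y."""
    return x @ (A @ y)

# For symmetric positive definite A this is a genuine inner product:
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)            # an illustrative SPD matrix
x, y = rng.standard_normal(3), rng.standard_normal(3)
print(np.isclose(a_inner(x, y, A), a_inner(y, x, A)))   # symmetry
print(a_inner(x, x, A) > 0)                             # positivity for x != 0
```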
So I wrote down two requirements on the new x. Well, actually you might say I wrote down a whole lot of requirements, because I'm saying that this should hold for all the previous i's, and this should hold for all the previous i's. And you see indices are getting in here, and I'm asking for your patience.

What's the point that I want to make here? It's this short recurrence point, this point that we had over here, when H was just tri-diagonal. Let me say it carefully. At the first step I need to find two numbers; because H is symmetric, the entry above the diagonal is already the same as the one below. So at the next step I need these two numbers, and then that number is already that one. So, two numbers at every step, and big zeros up here. That's the key point when A is symmetric. So the same will be true over here. We have two numbers, and they're called alpha and beta. They're slightly different numbers, because we're working with the x's now and not with the q's, so you don't see any q's written down for conjugate gradients. OK.

All right, let me just look at the cycle of five steps. I have to give it a start, so I start with d_0 as b and x_0 as 0, and r_0, which of course is b minus A*x_0; if x_0 is 0, that's also b. So those are the starts.

So I'm ready to take a step. I know the ones with subscript zero, and I'm ready for the next step. In general, I know the ones with subscript k minus 1, and I'm ready for this step, this cycle, k.

So, here's a number. I'm coming into that cycle with a new d. Now what is a d? It's a search direction. It's the way I'm going to go. It's the direction in which this delta x is going to go. Yeah, there is delta x; it's a multiple of d, right? If I put this on this side, that's delta x; the change in x is a multiple of d. So d is the direction in n-dimensional space in which I'm looking for the correction, and alpha is the number, how far I go along d for the best, for the right, delta x. So alpha will be determined by that; it involves an inner product, picking off a component. OK?

So the first step is to find the number alpha; the next step is to take that update: delta x is the right multiple of the search direction that I'm coming in with. Then correcting r is easy -- I'll come back to that; if I know the change in x, I know the change in r right away. So that's given me, I'm now updated. But now I have to look ahead. What direction am I going to travel next? So the next steps are to find a number beta and then the search direction, which isn't r; it's r again with just one correction. So it's a beautiful set of formulas, beautiful. And because I know x's and r's, I could write the formulas differently -- there are different expressions for the same alpha and beta -- but this is a very satisfactory one. OK.
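The whole cycle fits in a few lines; a minimal NumPy sketch following the five steps just described, using one standard version of the alpha and beta formulas rather than the board verbatim:

```python
import numpy as np

def conjugate_gradient(A, b, steps):
    """Plain conjugate gradients for symmetric positive definite A.
    No convergence test: stops after a fixed number of cycles."""
    x = np.zeros_like(b, dtype=float)
    r = b.astype(float).copy()       # r_0 = b - A x_0 = b, since x_0 = 0
    d = r.copy()                     # first search direction d_0 = b
    rr = r @ r
    for _ in range(steps):
        Ad = A @ d                   # the one multiplication by A per cycle
        alpha = rr / (d @ Ad)        # step length: how far to go along d
        x = x + alpha * d            # delta x is a multiple of d
        r = r - alpha * Ad           # residual update comes for free
        rr_new = r @ r
        if rr_new == 0.0:            # reached the exact solution
            break
        beta = rr_new / rr           # correction coefficient for the next d
        d = r + beta * d             # new direction: r with one correction
        rr = rr_new
    return x
```

Running this on a small SPD system, one can verify numerically that the residuals come out mutually orthogonal and that successive directions satisfy a_inner(d_i, d_j, A) close to 0, which is exactly the pair of properties above.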
Have a look at that cycle, just to see how much work is involved. How much work is involved in that cycle? OK, well, there's a multiplication by A, no surprise. It's that multiplication by A that takes us to the next Krylov space, takes us to the next update. So I see a multiplication by A here, and also here, but it's the same multiplication. So I see one multiplication by A. So, easy to list: each cycle, there's one multiplication, A times the search direction, and because A is a sparse matrix, the cost there is the number of non-zeros in the matrix. OK.

Then, what other computations do you have to do? I see an inner product there, and I'm going to use it again -- here it is at the next step. So I guess per cycle, I have to do one inner product of r with itself, and I have to do one of these inner products, d transpose A*d. So I see two inner products, and of course these are vectors of length n, so that's 2n multiplications, 2n additions. So the multiplication by A was a multiple of n, depending how sparse A is, and this is 2n adds and 2n multiplies for the two inner products. OK.

And then what else do we have to do? Oh, we have these updates, so I have to multiply a vector by that scalar and add. And again here, I multiply that vector by that scalar and subtract. Oh, and do I have to do it again here? Have I got three? Somehow I thought I might only have two. Let me say two or three vector updates. People are much better than I am at spotting the right way to organize those calculations. But that's really a very small price to pay per cycle, and so if it converges quickly, as it normally does, this is going to be a good algorithm.

Maybe, having introduced the question of how quickly it converges, I should say something about that. So what's the error after k steps? How is it related to the error at the start? I have to first say what I mean by this measure of error, and then I have to say what the factor is -- I hope I see a factor, less than 1, to the k power.
Yeah, so I'm hoping some factor to the k power will be there, and I sure hope it's less than 1. Let me say what it is. This is maybe not the very best estimate, but it shows the main point. It's the square root of lambda_max minus the square root of lambda_min, divided by the square root of lambda_max plus the square root of lambda_min. And maybe there's a factor 2 somewhere. So this is one estimate for the convergence factor, and you see what it depends on. I could divide everything by lambda_min there, so that I could write it as the square root of the condition number minus 1, divided by the square root of the condition number plus 1, to the k power.

So the condition number has a part in this error estimate, and I won't derive that -- even in the notes, I won't. Other people work out other estimates for the error; it's an interesting problem. And I do have to say that this norm is the natural one: the norm of e_k squared is the inner product of e_k with itself, but again with A in the middle. I suppose that a little part of the message here, and on that middle board, is that it's just natural to see the matrix A playing a part -- it just comes in naturally. Conjugate gradients brings it in naturally. These formulas are just created so that this will work. OK.
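To get a feel for that factor, a small worked computation, taking the estimate at face value and ignoring the constant in front:

```python
import numpy as np

# Convergence factor (sqrt(kappa) - 1) / (sqrt(kappa) + 1), with
# kappa = lambda_max / lambda_min, and the number of cycles k to
# shrink the A-norm error by 10^6 if that factor were exact.
for kappa in [10.0, 100.0, 10000.0]:
    rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)
    k = int(np.ceil(np.log(1e-6) / np.log(rho)))
    print(f"kappa = {kappa:8.0f}   factor = {rho:.4f}   k = {k}")
```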
So I suppose I'm thinking there's one word I haven't really justified, and that's the word gradient. Why do we call it conjugate gradients? Here I erased "conjugate directions," which I was quite happy with -- the directions are the d's, because d is the direction in which we moved from x, so conjugate directions would be just fine, and those words come into this part of numerical mathematics -- but where do gradients come in? OK. So, let me say, gradient of what? Well, first of all, what is the gradient in these linear problems where A*x equals b? Those come from the gradient of the energy.

So can I say E of x for the energy, as 1/2 x transpose A*x minus b transpose x? You might say, where did that come from? Normally we've been talking about linear systems; everything's been in terms of A*x equal b, and now suddenly I'm changing to a different part of mathematics. I'm looking at minimizing -- so optimization, the part that's coming in April, is minimizing energy, minimizing functions -- and what's the link? Well, if I minimize that, I need grad E, the gradient of E, the list of the derivatives of E with respect to the x_i. And if we work that out -- why not pretend everything is scalar for the moment, so these are just numbers -- then the ordinary derivative of 1/2 A x squared is A*x, and the derivative of b*x is b, and that rule continues into the vector case: grad E is A*x minus b.

So what have we got here? We've introduced the natural energy for the problem, and the reason it's natural is that when we minimize it, set the derivatives to zero, we get the equation A*x equal b. The gradient would equal zero at a minimum, so what I'm doing right now is pretty serious. I'll repeat it again in April when we do minimizations, but this quadratic energy has a linear gradient, and its minimum is exactly at A*x equal b. The minimum point is where A*x equals b. In other words, solving the linear equation and minimizing the energy are one and the same. And I'm allowed to use the word min because A is positive definite; the positive definiteness of A is what means that this is really a minimum and not a maximum or a saddle point. OK.
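Since the whole argument rests on grad E = A*x - b, here is a quick finite-difference check of that formula; the SPD matrix is illustrative, and for a quadratic the central difference is exact up to roundoff:

```python
import numpy as np

# Check numerically that the gradient of E(x) = 1/2 x^T A x - b^T x is A x - b.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)            # a symmetric positive definite A
b = rng.standard_normal(4)
x = rng.standard_normal(4)

E = lambda v: 0.5 * v @ A @ v - b @ v
h = 1e-6
I = np.eye(4)
fd_grad = np.array([(E(x + h * I[i]) - E(x - h * I[i])) / (2 * h) for i in range(4)])
print(np.allclose(fd_grad, A @ x - b, atol=1e-5))   # True
```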
So I wanted just to think about how you minimize a function. I guess any sensible person, given a function and looking for the minimum, would figure out the gradient and move that way -- move, well, not up the gradient, move down the gradient. So I imagine this minimization problem: here I'm in x, this is the x_1, x_2 plane, the x's; here's E, and E of x is something like that; maybe its minimum is there. This is the graph of E of x, and I'm trying to find that point, and I make a first guess somewhere there, somewhere on that surface. To help your eye, this is like a bowl, and maybe there's a point on the surface of the bowl. That's x_0, let's say. I'm trying to get to the bottom of the bowl, and as I said, any sensible person would figure out, well, what's the steepest way down. The steepest way down is the direction of the gradient. So I could call this gradient g, if I wanted, and then I would go in the direction of minus g, because I want to go down and not up. We're descending into a valley and trying to get to the bottom. And I'm assuming that this valley has this nice shape given by just a second-degree polynomial. Of course, when we get to optimization, this will be the simplest problem, but not the only one. Here it's our problem.

Now, what happens if I move in the direction water would flow? If I tip a flask of water on the ground, it's going to flow in the steepest direction, and that would be the natural direction to go. But it's not the best direction. Maybe the first step is, but -- so I'm talking now about conjugate gradients seen as a minimization problem, because that's where the word gradient is justified. This is the gradient, and of course, this is just minus the residual, so the residual gives the direction. So that's the question: should I just go in the direction of the residual all the time?
At every step? That is what the method of steepest descent will do, so let me make the contrast. Steepest descent is the first thing you would think of: the direction is r. That's the gradient direction, or the negative gradient direction. Follow r until, in that direction, you've hit bottom. So you see what happens: you'll be going down one side, and you'll hit bottom, and then you would come up again on the plane that's cutting through the surface. But you're not going to hit the minimum, first shot. When you start here, the actual steepest direction, the gradient direction, changes as the water flows down to the bottom; but here we've chosen a fixed direction, we're following it as long as we're going down, and then it'll start up again. So we stop there and find the gradient again there. Follow that down, until it goes up again; follow that down, until it goes up again, and the trouble is, convergence isn't great. The trouble is that we end up -- even if we're in 100-dimensional space -- with repeated gradients that are not really in good, new directions. We find ourselves doing a lot of work to make a little progress, and the reason is that these r's don't have the property that you're always looking for in computations: orthogonality. Somehow orthogonality tells you that you're really getting something new, if it's orthogonal to all the old ones. OK.

So the direction r is not great. The better one is to have some orthogonality. Take a direction d that -- well, here's the point. This is the right direction; this is the direction conjugate gradients chooses. So it's a gradient, but then it removes the component in the direction we just took. That's what that beta will be, when we compute that beta. And I'm a little surprised that it isn't a minus sign -- no, it's OK.
It turns out that way. Let me just be sure that I wrote the formula down right -- at least wrote it down as I have it here. Yep. Again, I'm not checking the formulas, but just describing the ideas. So the idea is that this formula gives us a new direction, which has this second property that we're after: the direction is not actually orthogonal, it's A-orthogonal to the previous direction.

OK, I'm going to draw one picture of these directions, and then that's the best I can do with conjugate gradients. I want to try to draw this search for the bottom of the valley. So I'm going to draw that search by drawing the level sets of the function, the contours. Since my function here is quadratic, all the contours, the level sets, will be ellipses. They'll all be, sort of, similar ellipses, and there is the one I want. That's the bottom of the valley. So this is the contour where the energy is three, this is the contour at energy two, this is the contour at energy one, and there's the point where the energy is a minimum. Well, it might be negative, whatever, but it's the smallest. OK.

So can I compare steepest descent with conjugate gradients? Let me draw one more contour, just so I have enough to make this point. OK, so steepest descent starts somewhere -- starts there, let's say. Suppose that's my starting point. Steepest descent goes, if I draw it right, perpendicular, right? It goes perpendicular to the level contour. You carry along as long as the contours are going down, and then if I continue here, I'm climbing back up the valley. So I'll stop here. Let's suppose that that happened to be tangent right there, OK. Now, that was the first step of steepest descent.
Now I'm at this point. I look at this contour, and if I've drawn it reasonably well, I'm again perpendicular -- steepest descent says go perpendicular to the contour, until you've got to the bottom there. Then down, then up -- do you see this frustrating crawl across the valley? It wouldn't happen if these contours were circles. If A was the identity matrix, these would be perfect circles, and you'd go to the center right away. Well, that only says that A*x equal b is easy to solve when A is the identity matrix. So this is steepest descent, and its rate of convergence is slow, because the longer and thinner the valley, the worse it is. OK.

What about conjugate gradients? Well, I'm just going to draw it -- I'm going to make it look like magic. I'm going to start at the same place. The first step, because I'm starting with zero, the best thing I can do is follow the gradient to there, and that was the first step. Now what? Well, the next direction -- so these are the search directions d_1, d_2, d_3, d_4. Here is d_1, and what does d_2 look like? May I cheat and go straight to the center? It's a little bit of a cheat; I'll go near the center. The next direction is much more like that, and then goes to the bottom, which would be somewhere inside that level set, and then the next direction would probably be very close. So this is not 90 degrees -- unless you're in the A inner product. This is a 90 degree angle in the A inner product. And what does the A inner product do? The A inner product is the one in which the level sets become circles, or spheres, in whatever dimension we want. Well, I don't want to be held to everything here.
If you see this picture, then we've not only made headway with the conjugate gradient method, which is a big deal for solving linear systems, but we've also made headway with the conjugate gradient method for minimizing functions. And if the function wasn't quadratic, and our equations weren't linear, the conjugate gradient idea would still be available. Non-linear conjugate gradients would somehow follow the same idea of A-orthogonal search directions. So these are not the gradients; they're the search directions. OK.

That's sort of my speech about the conjugate gradient method. Without having checked all the formulas in the cycle, we know how much work is involved, we know what we get, and we see a bit of the geometry. And we see it as a linear systems solver and also as an energy minimizer, and those are both important ways to think of conjugate gradients. OK.

I'll just take a few final minutes to say a word about other problems. What if our problem is not symmetric positive definite? What if we have an un-symmetric problem, or a symmetric but not positive definite problem? Well, in that case, these denominators could be zero. If we don't know that A is positive definite, d transpose A*d could be zero, it could be anything, and the method is broken. So I want to suggest a safer, but more expensive, method now. This would be MINRES, minimum residual. The idea is: choose x_k to minimize the norm of r_k, the residual. So minimize the norm of b minus A*x_k. That's my choice of x_k, and the question is, how quickly can I do it? So this is MINRES. And there are other related methods; there's a whole family of methods. What do I want to say about MINRES? I guess, for this, I will start with Arnoldi.
So for this different algorithm, which comes up with a different x_k -- and of course, the x_k is going to be in the Krylov space K_k -- I'm going to use this good basis. Arnoldi gives me the great basis of q's. So I convert the problem. I convert this minimization: I look for x as a combination of the q's. Is that how you see x equals Q*y? Instead of looking for the vector x, I'm looking for the vector y, which tells me what combination of these columns, the q's, x is. In other words, I'm working in the q system, the q basis. So I'm working with y, and I would naturally write this as minimizing, over y, the norm of b minus A*Q*y. Same problem. I'm just deciding, OK, I'm going to find this x as a combination of the q's, because those q's are orthogonal, and that will make this calculation a lot nicer.

OK, what's going to happen here? You know that I'm going to use the property of the q's that A*Q is Q*H. That was the key point that Arnoldi achieved. And the net result is going to be -- of course, I have to do this at level k, so I'm chopping it off, and I have to patiently see, and the notes do that patiently, what happens when we chop off these matrices. We have to watch where the non-zeros appear. In the end, I get to a minimization problem for y: a minimum problem for the matrix H that leads me to the y that I'm after. So it's a minimum problem for a piece of H -- I should really say H_k, some piece of it, like this piece. In fact, exactly that piece.

So suppose I have a least squares problem. This is least squares because I'm just minimizing a norm, the norm squared, in the l^2 sense. And here's my matrix: it's 3 by 2, one extra row. So I'm looking at least squares problems with matrices of that type, and they're easy to solve.
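A sketch of that reduction, reusing the arnoldi function from earlier; at step k the problem becomes a small least squares problem with the (k+1) by k piece of H:

```python
import numpy as np

def minres_sketch(A, b, k):
    """Minimum-residual x_k from K_k: with x = Q y and A Q = Q H,
    ||b - A Q y|| equals ||beta e_1 - H y||, a small least squares problem."""
    Q, H = arnoldi(A, b, k)              # assumes no breakdown in k steps
    rhs = np.zeros(k + 1)
    rhs[0] = np.linalg.norm(b)           # b = ||b|| q_1, so Q^T b = ||b|| e_1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return Q[:, :k] @ y
```

In practice one would call a library routine instead; SciPy, for instance, provides scipy.sparse.linalg.minres for the symmetric indefinite case.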
The notes will give the algorithm that solves the least squares problem. I guess what I'm interested in here is just: is it expensive or not? Well, the least squares problem will not be expensive; it's only got one extra row compared to the number of columns, and that's not bad. What's expensive is the fact that our H is not tri-diagonal anymore; these are not zeros anymore. I have to compute all these H's, so the hard work is back in Arnoldi. So it's Arnoldi leading to MINRES, and we have serious work already in Arnoldi. We don't have short recurrences; we've got all these h's. OK, enough. ARPACK would be a good code to call to do MINRES.

OK, that's my shot at the key Krylov algorithms of numerical linear algebra, and you see that they're more subtle than Jacobi. Jacobi and Gauss-Seidel were just straight simple iterations; these have recurrences that produce orthogonality, and you pay very little, and you get so much.

OK, so that's Krylov methods, and that section of notes is up on the web. And of course, at some point you want numerical comparisons: how fast is conjugate gradients, and is it faster than direct elimination? That's a big choice you have to make. Suppose you do have a symmetric positive definite matrix. Do you do Gauss elimination with re-ordering, minimum degree, or do you make the completely different choice of iterative methods like conjugate gradients? That's a decision you have to make when you're faced with a large matrix problem, and only experiments can compare two such widely different directions to take. So I like to end by emphasizing that there's still a big question: do this, or do minimum degree? And only some experiments will show which is the better.

OK, so that's good for today, and next time will be direct solution, using the FFT. Good, thanks.