1
00:00:00,000 --> 00:00:01,950
The following
content is provided

2
00:00:01,950 --> 00:00:06,090
by MIT OpenCourseWare under
a Creative Commons license.

3
00:00:06,090 --> 00:00:08,230
Additional information
about our license

4
00:00:08,230 --> 00:00:10,490
and MIT OpenCourseWare
in general,

5
00:00:10,490 --> 00:00:11,930
is available at ocw.mit.edu.

6
00:00:17,400 --> 00:00:20,440
PROFESSOR: For me, this is
the third and last major topic

7
00:00:20,440 --> 00:00:21,970
of the course.

8
00:00:21,970 --> 00:00:28,730
The first one was initial
value problems -- stability,

9
00:00:28,730 --> 00:00:31,020
accuracy.

10
00:00:31,020 --> 00:00:36,510
Topic two was solving
large linear systems

11
00:00:36,510 --> 00:00:42,850
by iterative methods and
also by direct methods

12
00:00:42,850 --> 00:00:45,740
like re-ordering the equations.

13
00:00:45,740 --> 00:00:50,070
Now, topic three is a whole
world of optimization.

14
00:00:55,260 --> 00:00:59,680
In reality it means you're
minimizing or possibly

15
00:00:59,680 --> 00:01:03,600
maximizing some expression.

16
00:01:03,600 --> 00:01:07,370
That expression could be a
function of several variables.

17
00:01:07,370 --> 00:01:11,390
We could be in the discrete
case, so we've got --

18
00:01:11,390 --> 00:01:15,790
maybe I just emphasize that
we have discrete optimization.

19
00:01:20,610 --> 00:01:29,440
So that's in R^n,
discrete in n dimensions,

20
00:01:29,440 --> 00:01:39,270
and that will include some
famous areas as single,

21
00:01:39,270 --> 00:01:43,400
as small subsets, really,
of the big picture.

22
00:01:43,400 --> 00:01:46,810
For example, one subset
would be linear programming.

23
00:01:52,050 --> 00:01:58,060
So that's a very special but
important case of problems

24
00:01:58,060 --> 00:02:01,070
where the cost is
linear, the constraints

25
00:02:01,070 --> 00:02:06,420
are linear, and it has its
own special methods.

26
00:02:06,420 --> 00:02:11,940
So I think that's worth
considering on its own

27
00:02:11,940 --> 00:02:15,640
at a later point in a lecture.

28
00:02:15,640 --> 00:02:17,780
Another bigger picture
-- for us, actually,

29
00:02:17,780 --> 00:02:24,450
bigger will be quadratic
programming where the quantity

30
00:02:24,450 --> 00:02:28,490
that's being minimized
is a quadratic function.

31
00:02:28,490 --> 00:02:31,060
Now what's good about
a quadratic function?

32
00:02:31,060 --> 00:02:33,010
Its derivatives are linear.

33
00:02:33,010 --> 00:02:36,010
So that leads us to
linear equations,

34
00:02:36,010 --> 00:02:39,380
but always with constraints.

35
00:02:49,960 --> 00:02:53,570
We don't have a free choice
of any vector in R^n.

36
00:02:53,570 --> 00:02:58,980
We have constraints on the
vectors, they have to --

37
00:02:58,980 --> 00:03:05,690
maybe they solve a linear
system of their own.

38
00:03:05,690 --> 00:03:09,260
We might be in 100
dimensions and we

39
00:03:09,260 --> 00:03:13,390
might have 10 linear equations
that the unknowns have to solve.

40
00:03:13,390 --> 00:03:19,640
So in some way we're really in
90 dimensions, but it might --

41
00:03:19,640 --> 00:03:22,990
you know, how should we
treat those constraints?

42
00:03:22,990 --> 00:03:27,060
Well, you know that Lagrange
multipliers play a role.

43
00:03:27,060 --> 00:03:33,760
So there's a big area, very big
area, of discrete optimization.

44
00:03:33,760 --> 00:03:39,900
Then, also, there are
the continuous problems

45
00:03:39,900 --> 00:03:49,120
where the unknown is a function,
is a function u of x I'll say,

46
00:03:49,120 --> 00:03:52,020
or u of x and y.

47
00:03:52,020 --> 00:03:54,880
It's a function.

48
00:03:54,880 --> 00:03:58,430
That's why I refer to that area
as continuous optimization.

49
00:04:03,470 --> 00:04:07,780
Well, first, you
always want to get

50
00:04:07,780 --> 00:04:12,120
an equation, which is
in some way going to be

51
00:04:12,120 --> 00:04:15,000
derivative equals zero, right.

52
00:04:15,000 --> 00:04:18,680
When we learn about minimization
in elementary calculus,

53
00:04:18,680 --> 00:04:20,610
somewhere along
the line is going

54
00:04:20,610 --> 00:04:23,570
to be an equation
that has something

55
00:04:23,570 --> 00:04:26,620
like derivative equals zero.

56
00:04:26,620 --> 00:04:31,060
But, of course, we have to
account for the constraints.

57
00:04:31,060 --> 00:04:36,380
We have to ask, what is --
the derivative of what when

58
00:04:36,380 --> 00:04:38,670
our unknown is a function.

59
00:04:38,670 --> 00:04:43,400
I'm just going to write down
a topic within mathematics

60
00:04:43,400 --> 00:04:49,050
that this is often expressed as.

61
00:04:49,050 --> 00:04:52,700
Calculus -- that
would be derivative.

62
00:04:52,700 --> 00:04:57,214
But in the case
for functions it's

63
00:04:57,214 --> 00:04:58,880
often called the
calculus of variations.

64
00:05:02,150 --> 00:05:04,240
So that's just so
you see that word.

65
00:05:04,240 --> 00:05:06,190
There are books with that title.

66
00:05:09,220 --> 00:05:14,300
The idea of what is that
derivative when our unknown is

67
00:05:14,300 --> 00:05:19,540
a function and the objective
that we're trying to minimize

68
00:05:19,540 --> 00:05:23,070
is the integral of the
function and its derivatives --

69
00:05:23,070 --> 00:05:28,200
all sorts of
possibilities there.

70
00:05:28,200 --> 00:05:34,490
So that's a very
quick overview of a field

71
00:05:34,490 --> 00:05:37,350
that we'll soon
know a lot about.

72
00:05:37,350 --> 00:05:41,120
I was trying to
think where to start.

73
00:05:41,120 --> 00:05:44,600
I think maybe it
better be discrete.

74
00:05:47,440 --> 00:05:53,140
I want to get to the
system of equations

75
00:05:53,140 --> 00:05:56,460
that you constantly see.

76
00:05:56,460 --> 00:06:01,330
So let me use, as an example,
the most basic problem, which

77
00:06:01,330 --> 00:06:07,460
comes for -- maybe
I'll start over here --

78
00:06:07,460 --> 00:06:15,150
the problem of least squares,
which I'll express this way.

79
00:06:15,150 --> 00:06:18,620
I'm given a matrix A,
and a right-hand side b,

80
00:06:18,620 --> 00:06:26,890
and I want to minimize the
length squared of A*u minus b.

81
00:06:26,890 --> 00:06:30,630
So that would be a
first problem to which

82
00:06:30,630 --> 00:06:33,830
we could apply calculus, because
it's a straight minimization

83
00:06:33,830 --> 00:06:36,060
and I haven't got any
constraints in there yet.

84
00:06:40,090 --> 00:06:41,940
We could also apply
linear algebra,

85
00:06:41,940 --> 00:06:44,590
and actually linear
algebra's going

86
00:06:44,590 --> 00:06:48,280
to throw a little extra light.

87
00:06:48,280 --> 00:06:51,020
So, just what am I
thinking of here?

88
00:06:51,020 --> 00:06:58,120
I'm thinking of A as being m
by n, with m larger than n.

89
00:07:02,230 --> 00:07:06,400
If A is a square matrix,
then this problem is the same

90
00:07:06,400 --> 00:07:08,340
as solving A*u equals b.

91
00:07:11,080 --> 00:07:15,180
And, of course, we'll
always reduce to that case

92
00:07:15,180 --> 00:07:17,320
if m equals n.

93
00:07:17,320 --> 00:07:22,420
But to focus on the problems
that I'm really thinking about,

94
00:07:22,420 --> 00:07:25,780
I'm thinking about the
case where this is the --

95
00:07:25,780 --> 00:07:27,450
n is the number of unknowns.

96
00:07:30,910 --> 00:07:32,210
It's the size of u.

97
00:07:36,210 --> 00:07:39,990
m, the larger number, is
the number of measurements,

98
00:07:39,990 --> 00:07:46,330
the number of the data, so
it's the number of equations,

99
00:07:46,330 --> 00:07:50,070
and it's the size of b.

100
00:07:50,070 --> 00:07:54,020
So we have more
equations than unknowns.

101
00:07:54,020 --> 00:07:56,040
You've met least squares before.

102
00:07:56,040 --> 00:08:02,300
I hope that maybe even in these
few minutes, a little new light

103
00:08:02,300 --> 00:08:06,160
will be shed on least squares.

104
00:08:06,160 --> 00:08:12,760
So here's our problem,
and calculus could lead us

105
00:08:12,760 --> 00:08:15,980
to the equation for the best u.

106
00:08:15,980 --> 00:08:20,750
So, u stands for
u_1, u_2, up to u_n.

107
00:08:20,750 --> 00:08:26,670
There are n components
in that vector, u.

108
00:08:26,670 --> 00:08:29,673
Maybe you know the equation
-- I guess I hope you know

109
00:08:29,673 --> 00:08:38,870
the equation, because it's such
a key to so many applications.

110
00:08:38,870 --> 00:08:43,270
If I just write it, it will
sound as if problem over.

111
00:08:43,270 --> 00:08:44,920
Let me write it though.

112
00:08:44,920 --> 00:08:49,600
So the key equation -- and then
this comes up in statistics,

113
00:08:49,600 --> 00:08:52,790
for example.

114
00:08:52,790 --> 00:08:55,850
Well, that's one
of 100 examples.

115
00:08:55,850 --> 00:09:03,820
But in statistics, this is
the topic of linear regression

116
00:09:03,820 --> 00:09:05,010
in statistics.

117
00:09:05,010 --> 00:09:11,890
Let me write down -- they
gave the name normal equation

118
00:09:11,890 --> 00:09:17,740
to the equation that
gives u directly.

119
00:09:17,740 --> 00:09:21,010
Do you remember what it is?

120
00:09:21,010 --> 00:09:22,120
The normal equation?

121
00:09:22,120 --> 00:09:24,140
The equation for the
minimizing u, which

122
00:09:24,140 --> 00:09:25,940
we could find by calculus?

123
00:09:25,940 --> 00:09:33,610
It involves the key matrix A
transpose A. Let me call u hat

124
00:09:33,610 --> 00:09:40,020
the minimizer, the winner
in this competition.

125
00:09:40,020 --> 00:09:44,420
The right-hand side of the
equation is A transpose b.

126
00:09:44,420 --> 00:09:50,370
So I won't directly
go back to derive it,

127
00:09:50,370 --> 00:09:52,370
though probably I'm going
to end up deriving it,

128
00:09:52,370 --> 00:09:58,220
because you can't help
approaching that equation

129
00:09:58,220 --> 00:10:01,190
from one side or another.

130
00:10:01,190 --> 00:10:03,100
As I say, one way to
approach it would just

131
00:10:03,100 --> 00:10:08,480
be to write out what that sum of
squares is, take its derivative

132
00:10:08,480 --> 00:10:11,280
and you would get
a linear equation.

133
00:10:11,280 --> 00:10:15,520
So again, u hat stands for
the u that gives the minimum.
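A minimal sketch of the normal equation A transpose A u hat = A transpose b in plain Python. The 3 by 2 matrix A, the vector b, and the helper names (transpose, matmul, matvec) are made up for illustration and are not from the lecture; the 2 by 2 system is solved by Cramer's rule only because the example is that small.

```python
from fractions import Fraction

# Illustrative data (an assumption, not from the lecture):
# fit a line c + d*t to the points (0, 6), (1, 0), (2, 0).
A = [[1, 0], [1, 1], [1, 2]]   # m = 3 equations, n = 2 unknowns
b = [6, 0, 0]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    Nt = transpose(N)
    return [[sum(x * y for x, y in zip(row, col)) for col in Nt] for row in M]

def matvec(M, v):
    return [sum(x * y for x, y in zip(row, v)) for row in M]

At = transpose(A)
AtA = matmul(At, A)    # symmetric 2 by 2 matrix: [[3, 3], [3, 5]]
Atb = matvec(At, b)    # right-hand side: [6, 0]

# Solve the 2 by 2 normal equation exactly with Cramer's rule.
(p, q), (r, s) = AtA
det = Fraction(p * s - q * r)   # nonzero because the columns of A are independent
u_hat = [(Atb[0] * s - q * Atb[1]) / det,
         (p * Atb[1] - r * Atb[0]) / det]

print(u_hat)   # the minimizer of |A*u - b|^2 for this example
```

For this data the minimizer is u hat = (5, -3). For larger problems a library least squares routine would normally be used rather than forming A transpose A by hand.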

134
00:10:19,350 --> 00:10:23,670
This A transpose A, of course --
so I'm putting in a little bit

135
00:10:23,670 --> 00:10:25,280
of linear algebra.

136
00:10:25,280 --> 00:10:29,940
This matrix A transpose
A is obviously symmetric.

137
00:10:36,040 --> 00:10:39,330
Its important property,
beyond the symmetry,

138
00:10:39,330 --> 00:10:43,490
is that it's positive definite.

139
00:10:43,490 --> 00:10:49,490
Well, I have to say
positive definite --

140
00:10:49,490 --> 00:10:51,230
there's always a
proviso, of course.

141
00:10:51,230 --> 00:10:55,500
I haven't eliminated
the degenerate case yet.

142
00:10:55,500 --> 00:11:05,160
A has m, many, many rows, a
smaller number of columns,

143
00:11:05,160 --> 00:11:09,200
and let's assume
that those columns

144
00:11:09,200 --> 00:11:15,250
are linearly independent so that
we really do have n unknowns.

145
00:11:15,250 --> 00:11:20,300
If those columns were, say if
all the columns were the same,

146
00:11:20,300 --> 00:11:26,890
then A*u would just be
multiplying that same column

147
00:11:26,890 --> 00:11:29,650
and there would really
be only one unknown.

148
00:11:29,650 --> 00:11:37,330
So I'm going to say that
A has rank n, by which I

149
00:11:37,330 --> 00:11:41,530
mean n independent columns.

150
00:11:41,530 --> 00:11:43,670
In that case, that's
what guarantees

151
00:11:43,670 --> 00:11:46,450
that this is positive definite.

152
00:11:46,450 --> 00:11:52,150
Let me try to draw
an arrow there --

153
00:11:52,150 --> 00:11:54,440
this is the same statement.

154
00:11:54,440 --> 00:11:58,040
If I say about A that the
columns are independent,

155
00:11:58,040 --> 00:12:00,180
then I'm saying
about A transpose A

156
00:12:00,180 --> 00:12:04,550
that it is positive definite.

157
00:12:04,550 --> 00:12:07,140
That means all its
eigenvalues are positive;

158
00:12:07,140 --> 00:12:11,310
it's invertible, certainly;
all its pivots are positive.

159
00:12:11,310 --> 00:12:13,940
It's the great
class of matrices.

160
00:12:16,720 --> 00:12:23,190
But I don't really want to
start with that equation.

161
00:12:23,190 --> 00:12:25,230
Here's my point.

162
00:12:25,230 --> 00:12:32,910
Optimization -- a key word
that I better get on the board,

163
00:12:32,910 --> 00:12:36,260
maybe up here, to show
that it's really important,

164
00:12:36,260 --> 00:12:39,240
is plus the idea of duality.

165
00:12:45,020 --> 00:12:51,080
The effect of duality, if
I just give a first mention

166
00:12:51,080 --> 00:12:56,890
to that word, is that very
often in optimization problems,

167
00:12:56,890 --> 00:12:59,130
there are really two problems.

168
00:12:59,130 --> 00:13:02,180
Two problems that
don't look identical,

169
00:13:02,180 --> 00:13:06,500
but in some important
way they both,

170
00:13:06,500 --> 00:13:16,330
each problem is a statement
of the task ahead of us.

171
00:13:16,330 --> 00:13:20,560
What are the two problems,
the two dual problems

172
00:13:20,560 --> 00:13:25,870
in this basic example
of least squares.

173
00:13:25,870 --> 00:13:29,140
All right, here's
a good picture.

174
00:13:29,140 --> 00:13:30,430
Here's a good picture.

175
00:13:30,430 --> 00:13:34,160
Let me put it on this
board so I can recover it.

176
00:13:34,160 --> 00:13:37,700
So, minimize A*u minus b.

177
00:13:37,700 --> 00:13:44,160
So I think of the
vector b as being --

178
00:13:44,160 --> 00:13:45,620
where it's in m dimensions.

179
00:13:45,620 --> 00:13:49,960
So it's a picture -- I'm
in m dimensions here.

180
00:13:49,960 --> 00:13:52,620
Now, what about A*u?

181
00:13:52,620 --> 00:13:59,070
A*u -- where will A*u
go in this picture?

182
00:13:59,070 --> 00:14:03,100
So, A*u is -- all
the candidates A*u,

183
00:14:03,100 --> 00:14:10,660
multiply A by any vector u, so
that means A*u is a combination

184
00:14:10,660 --> 00:14:16,660
of the columns of A, the
possible vectors A*u lie

185
00:14:16,660 --> 00:14:18,640
in a subspace.

186
00:14:18,640 --> 00:14:23,280
So this is the subspace of
all possible vectors A*u.

187
00:14:28,110 --> 00:14:30,510
And it's only n-dimensional.

188
00:14:30,510 --> 00:14:41,250
This is an n-dimensional
subspace because I have only n

189
00:14:41,250 --> 00:14:44,840
parameters in u,
only n columns in A.

190
00:14:44,840 --> 00:14:49,360
So the set of all A*u's
I think of as a --

191
00:14:49,360 --> 00:14:54,420
you could say a plane, an
n-dimensional plane within

192
00:14:54,420 --> 00:14:55,960
the bigger space R^m.

193
00:15:00,840 --> 00:15:09,210
Another name for that
subspace, that plane, is the --

194
00:15:09,210 --> 00:15:15,640
in 18.06 I would call it
the column space of A,

195
00:15:15,640 --> 00:15:19,870
or the range of A is another
expression that you--.

196
00:15:19,870 --> 00:15:24,200
All the possible
A*u's and here's b,

197
00:15:24,200 --> 00:15:27,030
which isn't one of
the possible A*u's.

198
00:15:27,030 --> 00:15:29,930
So where is u hat?

199
00:15:29,930 --> 00:15:35,380
Where is the best A*u
-- the best A*u now?

200
00:15:35,380 --> 00:15:39,790
The one that's
closest to b is --

201
00:15:39,790 --> 00:15:44,190
now comes another central
word in this subject.

202
00:15:44,190 --> 00:15:48,960
If I draw it, I'm
going to draw it here.

203
00:15:48,960 --> 00:15:54,340
That will be my best A*u,
which I'm calling A u hat.

204
00:15:54,340 --> 00:16:00,250
That's the -- if the picture
seems reasonable to your eye,

205
00:16:00,250 --> 00:16:05,430
this is the vector that's
in the plane closest to b.

206
00:16:08,840 --> 00:16:11,782
What's the geometry here?

207
00:16:11,782 --> 00:16:13,240
See, that's what
I wanted to see --

208
00:16:13,240 --> 00:16:17,300
a little geometry and a little
algebra, not just calculus.

209
00:16:17,300 --> 00:16:24,740
So the geometry is
that this vector b,

210
00:16:24,740 --> 00:16:28,210
what's the connection
between b and that vector --

211
00:16:28,210 --> 00:16:30,580
that's the closest
vector, right?

212
00:16:30,580 --> 00:16:32,260
We're minimizing the distance.

213
00:16:32,260 --> 00:16:41,800
This distance here, I might
call that the error vector e.

214
00:16:41,800 --> 00:16:44,450
This is as small as possible.

215
00:16:44,450 --> 00:16:46,400
That's being minimized.

216
00:16:46,400 --> 00:16:53,360
That's the difference between
-- this is Pythagoras here.

217
00:16:53,360 --> 00:16:55,950
Of course, when I
say it's Pythagoras,

218
00:16:55,950 --> 00:16:58,770
I'm already saying the
most important point,

219
00:16:58,770 --> 00:17:04,885
that this is a right angle here.

220
00:17:04,885 --> 00:17:05,760
That's a right angle.

221
00:17:08,430 --> 00:17:17,530
The closest A*u, which is A
u hat, which is this vector,

222
00:17:17,530 --> 00:17:21,460
the way geometrically we know
it's closest is that the line

223
00:17:21,460 --> 00:17:27,830
from b to the plane, that's
where the line from b,

224
00:17:27,830 --> 00:17:29,030
perpendicular to the plane.

225
00:17:29,030 --> 00:17:36,480
This line, this error vector e
is perpendicular to the plane.

226
00:17:42,300 --> 00:17:46,620
There's a good word that
everybody uses for this vector.

227
00:17:46,620 --> 00:17:49,200
Take a vector b that's
not on a plane, what's

228
00:17:49,200 --> 00:17:54,820
the word to look for the
nearest vector in the plane?

229
00:17:54,820 --> 00:17:56,260
AUDIENCE: Projection.

230
00:17:56,260 --> 00:17:58,187
PROFESSOR: Projection.

231
00:17:58,187 --> 00:17:59,270
So this is the projection.

232
00:18:05,380 --> 00:18:07,300
Orthogonal projection,
if I wanted

233
00:18:07,300 --> 00:18:11,310
to really emphasize the fact
that that's a right angle.

234
00:18:15,750 --> 00:18:22,530
So that would give
me a geometric way

235
00:18:22,530 --> 00:18:25,100
to see the least
squares problem.

236
00:18:25,100 --> 00:18:30,780
Now comes the point to
see the dual problem.

237
00:18:30,780 --> 00:18:37,631
The dual problem will be, here,
if I draw the perpendicular

238
00:18:37,631 --> 00:18:38,130
subspace.

239
00:18:44,590 --> 00:18:48,250
So that's a subspace
of what dimension?

240
00:18:48,250 --> 00:18:53,390
This contains all the vectors
perpendicular to the plane.

241
00:18:53,390 --> 00:18:59,740
So I have it as -- if m is 3,
so we're in three dimensions,

242
00:18:59,740 --> 00:19:03,190
and our plane is an ordinary
two-dimensional plane,

243
00:19:03,190 --> 00:19:08,150
then the dimension is one --
that's the perpendicular line.

244
00:19:08,150 --> 00:19:15,990
But thinking bigger, if we're in
m dimensions and this plane is

245
00:19:15,990 --> 00:19:18,990
n-dimensional, then
this is going to have --

246
00:19:18,990 --> 00:19:23,630
the true dimension
of this is m minus n,

247
00:19:23,630 --> 00:19:26,880
which could be
pretty substantial.

248
00:19:26,880 --> 00:19:32,020
But anyway, that's the
perpendicular subspace.

249
00:19:32,020 --> 00:19:34,910
If this is the
column space of A,

250
00:19:34,910 --> 00:19:39,150
I can figure out
what vectors are

251
00:19:39,150 --> 00:19:46,880
perpendicular to the columns of
A. That's really what I mean.

252
00:19:46,880 --> 00:19:49,940
This contains -- I've
drawn it as a line,

253
00:19:49,940 --> 00:19:54,310
but I've written up there its
dimension so that you see --

254
00:19:54,310 --> 00:20:02,490
I just don't know how to
draw, like, a bigger subspace.

255
00:20:02,490 --> 00:20:05,930
Yet you would have to see
that all the vectors in it

256
00:20:05,930 --> 00:20:09,480
were perpendicular to
all of these vectors.

257
00:20:09,480 --> 00:20:13,070
Do you see what I'm saying?

258
00:20:13,070 --> 00:20:17,620
If we were stuck in thinking
in three dimensions,

259
00:20:17,620 --> 00:20:21,060
if I make this a plane I
can't make that a plane.

260
00:20:21,060 --> 00:20:25,160
If I make m equal 3, and
I make n equal to two,

261
00:20:25,160 --> 00:20:28,680
I'm only got a line left
to be perpendicular.

262
00:20:28,680 --> 00:20:34,520
But in higher dimensions there
are lots of dimensions left.

263
00:20:34,520 --> 00:20:37,990
So what's my dual problem?

264
00:20:37,990 --> 00:20:46,480
My dual problem is find
the vector e in this plane

265
00:20:46,480 --> 00:20:47,410
closest to b.

266
00:20:50,950 --> 00:20:54,050
In other words, by
the same reasoning,

267
00:20:54,050 --> 00:20:56,840
what I'm saying is
take the vector b,

268
00:20:56,840 --> 00:21:04,750
project it over to this plane,
project it orthogonally --

269
00:21:04,750 --> 00:21:08,430
that same right angle
is going to be there.

270
00:21:08,430 --> 00:21:13,760
This plane -- I haven't
said what's in this --

271
00:21:13,760 --> 00:21:16,660
I've said what's in this plane
but I haven't written it yet.

272
00:21:16,660 --> 00:21:18,710
But you could tell me already.

273
00:21:18,710 --> 00:21:21,500
What is this?

274
00:21:21,500 --> 00:21:22,390
What's that vector?

275
00:21:25,100 --> 00:21:27,610
One answer would be,
it's the projection

276
00:21:27,610 --> 00:21:30,240
of b onto this perpendicular.

277
00:21:30,240 --> 00:21:33,670
So you see we're really
taking the vector b

278
00:21:33,670 --> 00:21:36,890
and we're separating
it into two components.

279
00:21:36,890 --> 00:21:40,927
One in the column space,
the other perpendicular

280
00:21:40,927 --> 00:21:41,760
to the column space.

281
00:21:41,760 --> 00:21:45,480
So just tell me
what that vector is.

282
00:21:45,480 --> 00:21:47,800
It is e.

283
00:21:47,800 --> 00:21:49,250
Same guy.

284
00:21:49,250 --> 00:21:53,470
In other words, e is the
solution to the dual problem --

285
00:21:53,470 --> 00:22:00,820
maybe I call this projection
p for the best vector

286
00:22:00,820 --> 00:22:06,520
in the plane. e is the best
vector in this subspace,

287
00:22:06,520 --> 00:22:08,450
and they add up to--.

288
00:22:08,450 --> 00:22:11,720
So, we're really
taking the vector b,

289
00:22:11,720 --> 00:22:13,480
and we're splitting
it into a part

290
00:22:13,480 --> 00:22:20,230
p in this space, and a part
e in the perpendicular space.

291
00:22:20,230 --> 00:22:22,830
If I just write down
the equations for that,

292
00:22:22,830 --> 00:22:25,480
I'll see what's cooking.

293
00:22:28,590 --> 00:22:33,600
Well, I guess what I have
to do is remember what

294
00:22:33,600 --> 00:22:37,930
are the equations to
be in this subspace,

295
00:22:37,930 --> 00:22:39,690
to be perpendicular
to the columns.

296
00:22:39,690 --> 00:22:43,840
So I can't go further
without remembering

297
00:22:43,840 --> 00:22:47,030
what's in that subspace.

298
00:22:47,030 --> 00:22:49,520
So everything in
that subspace is

299
00:22:49,520 --> 00:22:55,030
perpendicular to
the columns of A.

300
00:22:55,030 --> 00:22:57,650
Let me just write
down what that means.

301
00:22:57,650 --> 00:23:01,460
Let me use the letter
maybe y for the vectors

302
00:23:01,460 --> 00:23:08,920
in that subspace, and e for the
winning vector, the projection.

303
00:23:08,920 --> 00:23:11,730
So y will be the vectors
in that subspace.

304
00:23:11,730 --> 00:23:14,260
So those vectors
are perpendicular --

305
00:23:14,260 --> 00:23:21,700
so this is the subspace
of y, all the y's in here.

306
00:23:21,700 --> 00:23:24,570
Now, what's the condition?

307
00:23:24,570 --> 00:23:32,710
So y -- that y in this
perpendicular subspace.

308
00:23:32,710 --> 00:23:33,990
What do I mean?

309
00:23:33,990 --> 00:23:44,990
I mean that y is perpendicular
to the columns of A.

310
00:23:44,990 --> 00:23:48,140
How shall I write that?

311
00:23:48,140 --> 00:23:51,390
Perpendicular means
inner product zero.

312
00:23:51,390 --> 00:23:59,060
So I want to change
the columns into rows,

313
00:23:59,060 --> 00:24:03,030
and take the inner product
with y, and get zeros.

314
00:24:03,030 --> 00:24:05,460
Zero, zero, zero.

315
00:24:05,460 --> 00:24:13,770
So, this is column 1
transposed, to be a row.

316
00:24:16,530 --> 00:24:21,000
I'm trying to express
this requirement

317
00:24:21,000 --> 00:24:22,730
in terms of the matrix.

318
00:24:26,030 --> 00:24:28,010
So to be perpendicular
to the first column,

319
00:24:28,010 --> 00:24:31,330
I know that means that the inner
product of the first column

320
00:24:31,330 --> 00:24:35,170
with y should be zero.

321
00:24:35,170 --> 00:24:38,720
The inner product with
the second column --

322
00:24:38,720 --> 00:24:42,900
the second column
with y should be zero.

323
00:24:42,900 --> 00:24:51,180
The n-th column, its inner
product with y should be zero.

324
00:24:51,180 --> 00:24:52,840
So what matrix have I got here?

325
00:24:56,830 --> 00:25:01,640
What's the condition on
y's, simple and beautiful?

326
00:25:01,640 --> 00:25:03,930
What matrix is that?

327
00:25:03,930 --> 00:25:06,520
It's A transpose.

328
00:25:06,520 --> 00:25:14,070
So that perpendicular thing --
this is completely expressed

329
00:25:14,070 --> 00:25:17,770
by the equation A
transpose y equals zero.

330
00:25:27,100 --> 00:25:31,230
That tells me the
y's, and, of course,

331
00:25:31,230 --> 00:25:34,410
e is going to be one of the y's.

332
00:25:34,410 --> 00:25:37,150
It's going to be,
I could say y hat,

333
00:25:37,150 --> 00:25:40,550
but I've already named it e.

334
00:25:40,550 --> 00:25:46,390
It's the particular one
that's closest to b --

335
00:25:46,390 --> 00:25:50,200
the y's are everybody
all along here --

336
00:25:50,200 --> 00:25:53,740
this is the null
space of A transpose.

337
00:25:53,740 --> 00:25:57,980
So in words, I would call it
that perpendicular thing is

338
00:25:57,980 --> 00:26:00,720
the null space of A transpose.

339
00:26:04,330 --> 00:26:08,470
So when you did linear
algebra, you'll remember that.

340
00:26:08,470 --> 00:26:11,760
That the null space of A
transpose -- let me write it --

341
00:26:11,760 --> 00:26:18,820
is perpendicular to
the column space of A.

342
00:26:18,820 --> 00:26:21,860
The fundamental theorem of
linear algebra right there.
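The perpendicularity described here can be checked on a small example. This is a sketch under assumptions: A and b are made-up data, u_hat = (5, -3) is the solution of the normal equation for that particular A and b, and norm_sq is a hypothetical helper name.

```python
# Illustrative data (an assumption, not from the lecture).
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
u_hat = [5, -3]   # solves A^T A u = A^T b for this A and b

# Projection p = A u_hat lies in the column space of A.
p = [sum(a * u for a, u in zip(row, u_hat)) for row in A]

# Error e = b - p should lie in the null space of A transpose.
e = [bi - pi for bi, pi in zip(b, p)]

# A^T e: inner product of each column of A with e.
At_e = [sum(a * ei for a, ei in zip(col, e)) for col in zip(*A)]

def norm_sq(v):
    return sum(x * x for x in v)

print(p, e, At_e)
# Pythagoras: |b|^2 = |p|^2 + |e|^2 because p and e are perpendicular.
print(norm_sq(b), norm_sq(p) + norm_sq(e))
```

Here b splits into p = (5, 2, -1) in the column space and e = (1, -2, 1) perpendicular to it, and A transpose e comes out as the zero vector.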

343
00:26:21,860 --> 00:26:26,190
Now we're just using
it again to see what

344
00:26:26,190 --> 00:26:28,700
are the two dual problems here.

345
00:26:28,700 --> 00:26:33,630
So the primal problem, the
one that we stated first,

346
00:26:33,630 --> 00:26:35,580
was this one.

347
00:26:35,580 --> 00:26:38,660
So I'll call this the
primal -- P for primal.

348
00:26:41,350 --> 00:26:43,750
What is the dual problem?

349
00:26:43,750 --> 00:26:54,480
The dual problem is the
problem about the y's.

350
00:26:54,480 --> 00:26:56,690
Not about the u's at all.

351
00:26:56,690 --> 00:26:59,420
That's the beauty
of this duality.

352
00:26:59,420 --> 00:27:02,080
One problem is
about u's, and it's

353
00:27:02,080 --> 00:27:06,410
a problem that ends up
projecting on the column space.

354
00:27:06,410 --> 00:27:09,890
The second problem
is about y's, it's

355
00:27:09,890 --> 00:27:12,010
the problem that
ends up projecting

356
00:27:12,010 --> 00:27:16,230
onto this perpendicular space,
and it was a projection.

357
00:27:16,230 --> 00:27:24,660
So the dual problem is just
minimize the distance from b

358
00:27:24,660 --> 00:27:32,310
to y, but with the
constraint, with --

359
00:27:32,310 --> 00:27:34,390
and now I get to use
that word constraint --

360
00:27:34,390 --> 00:27:37,470
with A transpose y equals zero.

361
00:27:37,470 --> 00:27:40,370
So there is the other problem.

362
00:27:46,600 --> 00:27:53,370
So I hope your eye can
travel between the -- well,

363
00:27:53,370 --> 00:27:55,220
let me write underneath
it the primal again.

364
00:27:59,720 --> 00:28:06,260
Minimize A*u minus b squared.

365
00:28:13,160 --> 00:28:17,500
This is the one
whose solution is e,

366
00:28:17,500 --> 00:28:22,870
and this is the one whose
solution is, well, u,

367
00:28:22,870 --> 00:28:27,830
and the projection
-- u hat, sorry,

368
00:28:27,830 --> 00:28:34,860
and the projection p is A u
hat, and I guess what I'm trying

369
00:28:34,860 --> 00:28:39,070
to say is that somehow there's
a very important connection

370
00:28:39,070 --> 00:28:40,220
between the two problems.

371
00:28:44,090 --> 00:28:47,070
First of all, the two
problems use the same data.

372
00:28:47,070 --> 00:28:53,110
They use the same vector b, they
use the same matrix A. Notice

373
00:28:53,110 --> 00:28:55,610
that in one problem it's
A, and in the other problem

374
00:28:55,610 --> 00:28:57,260
it's the transpose that appears.

375
00:28:57,260 --> 00:29:00,050
That's very common --
we'll see that always.

376
00:29:02,374 --> 00:29:04,040
But there's something
a little different

377
00:29:04,040 --> 00:29:05,870
about the two problems.

378
00:29:05,870 --> 00:29:11,330
This problem was unconstrained,
any u was allowed.

379
00:29:11,330 --> 00:29:18,490
This problem was constrained,
only a subset of y's, only

380
00:29:18,490 --> 00:29:21,400
that subspace of
y's was allowed.

381
00:29:21,400 --> 00:29:25,130
This is a problem
with n unknowns.

382
00:29:25,130 --> 00:29:29,470
This is a problem with
m minus n unknowns.

383
00:29:29,470 --> 00:29:37,330
m minus n unknown
variables, once we've

384
00:29:37,330 --> 00:29:38,850
accounted for the constraints.

385
00:29:44,690 --> 00:29:48,700
This is one thing
I'm thinking about.

386
00:29:48,700 --> 00:29:53,830
Often, the problem will
come with a constraint.

387
00:29:53,830 --> 00:29:58,120
Maybe I'll do a physical
example right away.

388
00:29:58,120 --> 00:30:01,630
The problem comes to
us with a constraint.

389
00:30:01,630 --> 00:30:04,940
In other words, suppose you
were given this problem.

390
00:30:04,940 --> 00:30:09,520
How would you deal with it?

391
00:30:09,520 --> 00:30:13,220
That's like the first
question in optimization,

392
00:30:13,220 --> 00:30:15,200
or one of the central questions.

393
00:30:15,200 --> 00:30:18,510
How do you deal
with a constraint?

394
00:30:18,510 --> 00:30:21,140
If we minimize this,
of course, the minimum

395
00:30:21,140 --> 00:30:24,720
would be when y equaled b.

396
00:30:24,720 --> 00:30:29,430
But that's failing to take into
account the constraint on y.

397
00:30:29,430 --> 00:30:32,370
So how do you take
constraints into account

398
00:30:32,370 --> 00:30:35,920
and end up with an equation?

399
00:30:35,920 --> 00:30:40,180
We can see, in this picture,
that somehow or other we

400
00:30:40,180 --> 00:30:42,420
ended up with this
normal equation,

401
00:30:42,420 --> 00:30:52,580
but actually I would rather end
up with a primal dual equation.

402
00:30:52,580 --> 00:30:55,250
I'd like to end up with
an equation for the best

403
00:30:55,250 --> 00:30:58,980
u and the best y.

404
00:30:58,980 --> 00:31:00,420
So what will that be?

405
00:31:00,420 --> 00:31:04,720
So I need now two equations
that will connect the best

406
00:31:04,720 --> 00:31:08,740
u and the best y, and probably
this is going to be the key.

407
00:31:08,740 --> 00:31:13,220
This p is A u hat, right.

408
00:31:17,220 --> 00:31:19,620
So this will be one
of my equations,

409
00:31:19,620 --> 00:31:20,840
and this will be the other.

410
00:31:23,620 --> 00:31:25,530
Let me see if I -- well, OK.

411
00:31:29,600 --> 00:31:34,290
I don't know what to do now.

412
00:31:34,290 --> 00:31:36,140
Here I've called it y.

413
00:31:36,140 --> 00:31:37,760
Over here it's e.

414
00:31:37,760 --> 00:31:40,550
I've got myself in a corner.

415
00:31:43,150 --> 00:31:47,200
Maybe I should call e y
hat, would you like that?

416
00:31:51,010 --> 00:31:56,370
We have in mind that it's e,
the error in the primal problem.

417
00:31:56,370 --> 00:32:04,310
But just to make the
notation for the two

418
00:32:04,310 --> 00:32:10,340
problems consistent, let me
call the winner here y hat,

419
00:32:10,340 --> 00:32:12,480
the winner here u hat.

420
00:32:12,480 --> 00:32:13,400
What's the relation?

421
00:32:13,400 --> 00:32:16,000
So let me just -- here I'll
write down the relation between

422
00:32:16,000 --> 00:32:17,380
the two.

423
00:32:17,380 --> 00:32:22,050
Well, it's over there.

424
00:32:22,050 --> 00:32:23,290
Let's see, is that right?

425
00:32:23,290 --> 00:32:24,950
Yes?

426
00:32:24,950 --> 00:32:38,590
y hat plus A u hat is b, and
A transpose y hat is zero.

427
00:32:38,590 --> 00:32:41,220
That's it.

428
00:32:41,220 --> 00:32:44,240
That's it.

429
00:32:44,240 --> 00:32:54,570
Here we have -- that's the
pair of equations that solves,

430
00:32:54,570 --> 00:32:59,700
that connects the primal and
the dual, solves them both,

431
00:32:59,700 --> 00:33:05,240
solves each one, and is
really, it's a system --

432
00:33:05,240 --> 00:33:09,110
you could say it's
a block equation.

433
00:33:09,110 --> 00:33:21,330
The block matrix being
identity A, A transpose, zero;

434
00:33:21,330 --> 00:33:27,340
the unknown being
the y and the u.

435
00:33:27,340 --> 00:33:30,530
The right-hand side being
the data, which in this case

436
00:33:30,530 --> 00:33:31,650
was the b.
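[Note: written out as one block equation, in the notation used here (y hat, u hat, b), the pair of equations just stated would read as follows.]
```latex
\begin{bmatrix} I & A \\ A^{T} & 0 \end{bmatrix}
\begin{bmatrix} \hat{y} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} b \\ 0 \end{bmatrix}
```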

437
00:33:38,850 --> 00:33:41,430
I guess what I want
to do is emphasize,

438
00:33:41,430 --> 00:33:45,740
in what's coming for
the month of April,

439
00:33:45,740 --> 00:33:50,060
the importance of this
class of problems.

440
00:33:54,040 --> 00:33:58,740
It's dealing with two --
it's dealing with the primal

441
00:33:58,740 --> 00:34:02,250
and the dual at the same time.

442
00:34:02,250 --> 00:34:07,770
It's important for so many
reasons I can't say them

443
00:34:07,770 --> 00:34:09,550
all on the first day.

444
00:34:09,550 --> 00:34:13,370
That would be a mistake,
to try to say everything

445
00:34:13,370 --> 00:34:13,960
the first day.

446
00:34:13,960 --> 00:34:18,960
But let me just say something
-- that linear programming,

447
00:34:18,960 --> 00:34:22,780
which is just one example, and
it doesn't fit this because it

448
00:34:22,780 --> 00:34:25,160
has inequality constraints.

449
00:34:25,160 --> 00:34:30,050
But you maybe know
that the number one

450
00:34:30,050 --> 00:34:37,490
method to solve linear programs
is called the simplex method.

451
00:34:37,490 --> 00:34:43,500
Well, it was the number
one method for many years.

452
00:34:43,500 --> 00:34:46,740
For many problems it's still
the right way to do it.

453
00:34:46,740 --> 00:34:53,600
But a new method called
the primal-dual --

454
00:34:53,600 --> 00:34:56,250
at least that's part of
its name, primal-dual.

455
00:34:56,250 --> 00:35:02,070
It is essentially solving the
primal and the dual problems

456
00:35:02,070 --> 00:35:05,990
at once, and there are
inequality constraints,

457
00:35:05,990 --> 00:35:06,560
of course.

458
00:35:06,560 --> 00:35:09,250
I'm going to stop there
with linear programming

459
00:35:09,250 --> 00:35:13,450
and give it its turn later.

460
00:35:13,450 --> 00:35:21,610
In this perfect example
here, we have only equations.

461
00:35:21,610 --> 00:35:23,220
How many do we have?

462
00:35:23,220 --> 00:35:28,750
We have m plus n equations,
because y is m unknowns,

463
00:35:28,750 --> 00:35:30,440
u is n unknowns.

464
00:35:30,440 --> 00:35:35,600
I have altogether m
plus n equations --

465
00:35:35,600 --> 00:35:42,440
m y's, and n u's, and
they come together.

466
00:35:42,440 --> 00:35:48,320
Now, could you,
just to connect back

467
00:35:48,320 --> 00:35:51,680
with what we absolutely know,
that it's a normal equation,

468
00:35:51,680 --> 00:35:54,890
where is this normal
equation coming from?

469
00:35:54,890 --> 00:35:58,130
So here's the normal equation.

470
00:35:58,130 --> 00:36:02,920
We know that that's gotta
come, right, out of the thing.

471
00:36:02,920 --> 00:36:04,670
How does it come?

472
00:36:04,670 --> 00:36:08,910
Suppose I have a block
system, two by two.

473
00:36:08,910 --> 00:36:11,100
How do I solve it?

474
00:36:11,100 --> 00:36:15,780
Well actually, that's, in
a way, the big question.

475
00:36:15,780 --> 00:36:19,410
But one way to solve it,
the natural way to solve it,

476
00:36:19,410 --> 00:36:23,250
would be elimination.

477
00:36:23,250 --> 00:36:28,720
Multiply this first row
by a suitable matrix.

478
00:36:28,720 --> 00:36:32,270
Subtract from the second
row to produce a zero there.

479
00:36:32,270 --> 00:36:39,130
In other words, eliminate y
and get an equation for u hat.

480
00:36:39,130 --> 00:36:40,610
So what do I do?

481
00:36:40,610 --> 00:36:42,150
How do I do it?

482
00:36:42,150 --> 00:36:49,800
I multiply -- would you rather
look at equations or matrices?

483
00:36:49,800 --> 00:36:53,060
I've tried to keep the
two absolutely together.

484
00:36:53,060 --> 00:36:54,500
Let me look at the equation.

485
00:36:54,500 --> 00:36:58,620
What shall I multiply that
equation by and subtract from

486
00:36:58,620 --> 00:37:01,370
this -- I just
want to eliminate.

487
00:37:01,370 --> 00:37:05,520
I want to get y hat out of
there and leave just an equation

488
00:37:05,520 --> 00:37:09,040
for u hat that we will
totally recognize.

489
00:37:09,040 --> 00:37:10,350
So what do I do?

490
00:37:10,350 --> 00:37:14,250
I multiply that
first equation by?

491
00:37:14,250 --> 00:37:17,000
A transpose, thanks.

492
00:37:17,000 --> 00:37:20,760
So I multiply this first
equation by A transpose.

493
00:37:20,760 --> 00:37:22,040
Let me just do it this way.

494
00:37:25,020 --> 00:37:27,610
Now what?

495
00:37:27,610 --> 00:37:30,120
A transpose y is zero.

496
00:37:30,120 --> 00:37:32,620
Now I use the second -- well,
this is one way to do it.

497
00:37:32,620 --> 00:37:34,310
A transpose y is zero.

498
00:37:34,310 --> 00:37:36,450
What am I left with?

499
00:37:36,450 --> 00:37:38,550
The normal equation.

500
00:37:38,550 --> 00:37:40,040
Well, it shouldn't
be a surprise.

501
00:37:40,040 --> 00:37:42,120
We had to end up with
the normal equation.

502
00:37:42,120 --> 00:37:44,590
Maybe you would rather
-- and actually,

503
00:37:44,590 --> 00:37:49,240
what I intended was to make
it sound more like Gaussian

504
00:37:49,240 --> 00:37:50,440
elimination.

505
00:37:50,440 --> 00:37:53,270
Multiply this row
by A transpose,

506
00:37:53,270 --> 00:37:55,240
subtract from this row.

507
00:37:55,240 --> 00:38:01,490
I still have identity A up here
-- when I do that subtraction,

508
00:38:01,490 --> 00:38:04,180
I get the zero,
that was the point.

509
00:38:04,180 --> 00:38:07,350
Here I get A transpose
A subtracted from zero,

510
00:38:07,350 --> 00:38:14,170
it's minus A transpose
A, y hat, u hat.

511
00:38:14,170 --> 00:38:17,960
Of course, I had to do the same
thing to the right-hand side,

512
00:38:17,960 --> 00:38:22,360
and when I subtracted this
the b was still there,

513
00:38:22,360 --> 00:38:26,580
but it was minus A transpose b.

514
00:38:26,580 --> 00:38:32,040
So, now I've got in
the second equation --

515
00:38:32,040 --> 00:38:36,550
the second equation only
involves u hat, and, of course,

516
00:38:36,550 --> 00:38:40,780
when I changed the
signs, it's our friend --

517
00:38:40,780 --> 00:38:43,720
A transpose A u hat
equal A transpose b.
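[Note: a minimal sketch that checks this elimination numerically. The 3 by 2 matrix A, the right-hand side b, and the helper name solve are made up for illustration and are not from the lecture. It solves the block system directly and then the normal equation, and compares the two answers for u hat.]
```python
from fractions import Fraction
def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Swap in a row with a nonzero entry in this column (assumes M is invertible).
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]
# Hypothetical data: A is 3 by 2 with independent columns, b is arbitrary.
A = [[2, 5], [3, 6], [4, 7]]
b = [1, 0, 0]
m, n = len(A), len(A[0])
At = [list(col) for col in zip(*A)]
# Block matrix [I A; A^T 0] and right-hand side [b; 0].
K = [[int(i == j) for j in range(m)] + A[i] for i in range(m)] + [At[j] + [0] * n for j in range(n)]
sol = solve(K, b + [0] * n)
y_hat, u_hat = sol[:m], sol[m:]
# Normal equation A^T A u = A^T b, solved on its own.
AtA = [[sum(At[i][k] * A[k][j] for k in range(m)) for j in range(n)] for i in range(n)]
Atb = [sum(At[i][k] * b[k] for k in range(m)) for i in range(n)]
u_normal = solve(AtA, Atb)
print(u_hat == u_normal)  # True: both routes give the same u hat
```
[Exact fractions are used here only so the comparison is not blurred by roundoff.]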

518
00:38:43,720 --> 00:38:52,030
So this is -- maybe you would
say the natural way to solve

519
00:38:52,030 --> 00:38:59,660
this type of system, but I want
to emphasize -- throw it away.

520
00:38:59,660 --> 00:39:06,700
I really want to emphasize
the importance of these --

521
00:39:06,700 --> 00:39:13,010
let me clean it back up
again to what it was --

522
00:39:13,010 --> 00:39:14,120
of these block systems.

523
00:39:17,710 --> 00:39:20,330
Now, they need a name.

524
00:39:20,330 --> 00:39:27,430
We have to give some name to
this type of two-field problem.

525
00:39:27,430 --> 00:39:29,740
I guess then, in
the next month, I'm

526
00:39:29,740 --> 00:39:32,800
going to find examples
of it everywhere.

527
00:39:32,800 --> 00:39:35,040
So here I've found the
first example of it

528
00:39:35,040 --> 00:39:39,030
in ordinary old-fashioned
least squares.

529
00:39:39,030 --> 00:39:41,600
So what am I going to call this?

530
00:39:41,600 --> 00:39:46,560
I'll call it -- I'll give
it a couple of names.

531
00:39:46,560 --> 00:39:50,020
Saddle point equation,
saddle point system maybe

532
00:39:50,020 --> 00:39:50,710
I should say.

533
00:39:53,530 --> 00:39:57,810
I have to explain, why do I
call it saddle point system.

534
00:39:57,810 --> 00:40:05,740
In optimization, I could call
it the optimality equation --

535
00:40:05,740 --> 00:40:08,130
just meaning it's the
equations for the winners.

536
00:40:11,190 --> 00:40:16,300
In the world of optimization,
the names of Kuhn and Tucker

537
00:40:16,300 --> 00:40:20,490
are associated with
these equations --

538
00:40:20,490 --> 00:40:29,010
the Kuhn-Tucker equations,
and there are other names

539
00:40:29,010 --> 00:40:30,420
we'll see.

540
00:40:30,420 --> 00:40:35,370
But let me just say for a
moment why saddle point.

541
00:40:35,370 --> 00:40:40,230
Why do I think of this as
a saddle point problem?

542
00:40:45,520 --> 00:40:49,050
See, the point
about A transpose A

543
00:40:49,050 --> 00:40:52,120
was that it was
positive definite.

544
00:40:52,120 --> 00:40:56,670
This is A transpose A.

545
00:40:56,670 --> 00:41:01,810
Now what's the corresponding
issue for this matrix?

546
00:41:01,810 --> 00:41:05,320
So this is my matrix that
I'm constantly gonna look at.

547
00:41:08,960 --> 00:41:12,770
Matrices of that form are
going to show up all the time.

548
00:41:12,770 --> 00:41:18,920
I've said probably in
18.085 where these appear

549
00:41:18,920 --> 00:41:20,940
but then we don't
do much with them.

550
00:41:20,940 --> 00:41:22,570
Now we're ready to do something.

551
00:41:26,580 --> 00:41:28,920
I didn't appreciate
their importance

552
00:41:28,920 --> 00:41:32,320
until I realized,
in going to lectures

553
00:41:32,320 --> 00:41:36,300
on applied mathematics,
that if I waited a little

554
00:41:36,300 --> 00:41:38,900
while that matrix would appear.

555
00:41:38,900 --> 00:41:40,010
That block matrix.

556
00:41:40,010 --> 00:41:42,570
It just shows up in
all these applications.

557
00:41:47,750 --> 00:41:53,930
One of our issues will
be how to solve it.

558
00:41:53,930 --> 00:41:59,420
Another issue that
comes first is what's

559
00:41:59,420 --> 00:42:01,320
the general form of this?

560
00:42:01,320 --> 00:42:03,450
Can I jump to that issue?

561
00:42:03,450 --> 00:42:09,020
Just so we see something more
than this single problem here.

562
00:42:09,020 --> 00:42:20,820
Let me put in the matrix as
it comes in applications.

563
00:42:20,820 --> 00:42:24,790
Some matrix A,
rectangular, right?

564
00:42:24,790 --> 00:42:27,550
Its transpose.

565
00:42:27,550 --> 00:42:28,370
A zero.

566
00:42:28,370 --> 00:42:30,440
Often a zero.

567
00:42:30,440 --> 00:42:34,440
But what's up here is
not always the identity.

568
00:42:34,440 --> 00:42:37,730
I want to allow
something more general.

569
00:42:37,730 --> 00:42:41,010
I want to allow, for example,
weighted least squares.

570
00:42:41,010 --> 00:42:46,680
So weighted least squares --
if you've met least squares,

571
00:42:46,680 --> 00:42:55,060
it's very important to meet
its extension to weighted least

572
00:42:55,060 --> 00:42:55,970
squares.

573
00:42:55,970 --> 00:43:04,700
When the equations A*u equal b
are not given the same weight,

574
00:43:04,700 --> 00:43:08,440
there's a weighting matrix,
often it's a covariance matrix.

575
00:43:08,440 --> 00:43:12,680
I'm going to call the matrix
that goes in here C inverse.
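[Note: a sketch of the weighted system, assuming the right-hand side stays (b, 0) as in the unweighted case. Eliminating y hat the same way as before would leave a weighted normal equation for u hat, which reduces to the ordinary one when C is the identity.]
```latex
\begin{bmatrix} C^{-1} & A \\ A^{T} & 0 \end{bmatrix}
\begin{bmatrix} \hat{y} \\ \hat{u} \end{bmatrix}
=
\begin{bmatrix} b \\ 0 \end{bmatrix}
\quad\Longrightarrow\quad
A^{T} C A \, \hat{u} = A^{T} C \, b
```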

576
00:43:16,610 --> 00:43:25,690
So this will be then an
important class of applications.

577
00:43:25,690 --> 00:43:29,740
This is pretty important already
when the identity is there,

578
00:43:29,740 --> 00:43:34,350
but many, many applications
produce some other matrix that

579
00:43:34,350 --> 00:43:39,000
is usually -- it very, very
often is symmetric positive

580
00:43:39,000 --> 00:43:44,690
definite in that corner,
like the identity is.

581
00:43:44,690 --> 00:43:48,230
But the key point, which
I have to make today:

582
00:43:48,230 --> 00:43:52,600
Is the whole matrix --
either this one or this one.

583
00:43:52,600 --> 00:43:54,710
It is symmetric, right?

584
00:43:54,710 --> 00:43:56,570
That matrix is symmetric.

585
00:43:56,570 --> 00:43:59,530
Is it or isn't it
positive definite?

586
00:43:59,530 --> 00:44:04,060
If I do elimination do I
get all positive pivots?

587
00:44:04,060 --> 00:44:07,940
It's a matrix of size m plus n.

588
00:44:07,940 --> 00:44:11,500
So I'm asking, are all
its eigenvalues positive,

589
00:44:11,500 --> 00:44:15,130
but I don't want to really
compute eigenvalues.

590
00:44:15,130 --> 00:44:17,740
Also, in a lot of cases I would.

591
00:44:17,740 --> 00:44:19,930
Finding the eigenvalues
of this matrix

592
00:44:19,930 --> 00:44:24,410
would lead me to the
singular value decomposition,

593
00:44:24,410 --> 00:44:29,120
absolutely crucial topic in
linear algebra that we'll see.

594
00:44:29,120 --> 00:44:32,850
But let me just take
it as a linear system.

595
00:44:32,850 --> 00:44:40,610
If I do elimination, what
are the first m pivots?

596
00:44:40,610 --> 00:44:43,490
Let me not be abstract here.

597
00:44:43,490 --> 00:44:45,530
Let me be quite concrete.

598
00:44:45,530 --> 00:44:47,700
Let me put the identity here.

599
00:44:47,700 --> 00:44:53,310
Let me put some matrix -- oh,
I want the matrix here to be --

600
00:44:53,310 --> 00:44:58,630
I better put a bigger identity
just so we see the picture.

601
00:44:58,630 --> 00:45:03,530
Here I'm going to put an A
transpose, which might be 2, 3,

602
00:45:03,530 --> 00:45:07,650
4; 5, 6, 7 -- when I
write numbers like that

603
00:45:07,650 --> 00:45:11,360
you'll realize that I've just
picked them out of the hat.

604
00:45:11,360 --> 00:45:15,920
Here is the transpose,
2, 3, 4; 5, 6, 7.

605
00:45:15,920 --> 00:45:18,120
Here's the zero block.

606
00:45:20,970 --> 00:45:23,575
I'd like to know
about that matrix.

607
00:45:23,575 --> 00:45:24,200
It's symmetric.

608
00:45:29,040 --> 00:45:32,100
And it's full rank.

609
00:45:32,100 --> 00:45:34,060
It's invertible.

610
00:45:34,060 --> 00:45:35,490
How do I know that?

611
00:45:35,490 --> 00:45:37,330
It's the invertibility
-- of course,

612
00:45:37,330 --> 00:45:40,170
the identity part is great.

613
00:45:40,170 --> 00:45:45,870
I guess I see that this is
invertible because I've ended

614
00:45:45,870 --> 00:45:52,820
up with A transpose A, and
here my A has rank two --

615
00:45:52,820 --> 00:45:56,450
those two columns
are independent.

616
00:45:56,450 --> 00:45:58,370
They're not in the same
direction -- [2, 3,

617
00:45:58,370 --> 00:46:00,820
4] is not a multiple
of [5, 6, 7].

618
00:46:00,820 --> 00:46:05,930
That's an invertible matrix, and
this process finds the inverse.

619
00:46:05,930 --> 00:46:07,880
Elimination finds the inverse.

620
00:46:07,880 --> 00:46:10,510
It's the pivots I
want to ask you about.

621
00:46:10,510 --> 00:46:15,480
What are the pivots
in this matrix?

622
00:46:15,480 --> 00:46:17,500
So what do I do?

623
00:46:17,500 --> 00:46:22,170
The first pivot is a 1 -- I use
it to clean out that column.

624
00:46:22,170 --> 00:46:25,420
The second pivot is a 1 -- I
use it to clean out that column.

625
00:46:25,420 --> 00:46:28,340
The third pivot is a 1 -- I use
it to clean out that column.

626
00:46:31,660 --> 00:46:32,740
What's the next pivot?

627
00:46:32,740 --> 00:46:38,340
What do I have -- of course, now
some stuff has filled in here.

628
00:46:38,340 --> 00:46:40,230
What is actually
filled in there?

629
00:46:40,230 --> 00:46:43,770
So this is the identity,
if I can write it fast.

630
00:46:43,770 --> 00:46:46,450
This guy is now the zero.

631
00:46:46,450 --> 00:46:50,010
This guy didn't move.

632
00:46:50,010 --> 00:46:54,060
What matrix filled in here?

633
00:46:54,060 --> 00:46:56,100
Well, just what I
was doing there.

634
00:46:56,100 --> 00:46:59,300
Elimination is exactly what
I'm repeating with numbers,

635
00:46:59,300 --> 00:47:00,740
what I did there with letters.

636
00:47:00,740 --> 00:47:05,540
What's in here is
minus A transpose A.

637
00:47:05,540 --> 00:47:08,970
I could figure out what
that -- I could do --

638
00:47:08,970 --> 00:47:15,070
if I was fast enough, I could do
2, 3, 4; 5, 6, 7 times 2, 3, 4;

639
00:47:15,070 --> 00:47:21,840
5, 6, 7 and I'd get this little
two by two matrix that sits

640
00:47:21,840 --> 00:47:22,340
there.

641
00:47:25,150 --> 00:47:28,850
With a minus sign,
and that's the point.

642
00:47:28,850 --> 00:47:31,900
That the pivots -- of course,
what's this first number going

643
00:47:31,900 --> 00:47:32,400
to be?

644
00:47:32,400 --> 00:47:34,610
4 plus 9 plus 16.

645
00:47:34,610 --> 00:47:39,060
29 is that number.

646
00:47:39,060 --> 00:47:44,430
So a minus 29 sits right there.

647
00:47:44,430 --> 00:47:46,560
That's the next pivot.

648
00:47:46,560 --> 00:47:50,280
The next pivot is a
negative number, minus 29,

649
00:47:50,280 --> 00:47:52,370
and the fifth pivot is negative.

650
00:47:52,370 --> 00:47:55,860
So what I'm seeing
is a matrix --

651
00:47:55,860 --> 00:48:05,140
this matrix has three
positive pivots,

652
00:48:05,140 --> 00:48:07,240
and two negative pivots.
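[Note: a minimal sketch that reproduces these pivots for the 5 by 5 example. The helper name pivots is made up for illustration, and it assumes elimination needs no row exchanges, which holds for this matrix.]
```python
from fractions import Fraction
def pivots(M):
    """Return the pivots of Gaussian elimination done without row exchanges."""
    U = [[Fraction(v) for v in row] for row in M]
    n = len(U)
    found = []
    for k in range(n):
        p = U[k][k]  # assumed nonzero; a zero here would need a row exchange
        found.append(p)
        for r in range(k + 1, n):
            f = U[r][k] / p
            U[r] = [a - f * c for a, c in zip(U[r], U[k])]
    return found
# The block matrix [I A; A^T 0] with A^T = [2 3 4; 5 6 7].
K = [
    [1, 0, 0, 2, 5],
    [0, 1, 0, 3, 6],
    [0, 0, 1, 4, 7],
    [2, 3, 4, 0, 0],
    [5, 6, 7, 0, 0],
]
print(pivots(K))
```
[The list comes out as 1, 1, 1, -29, -54/29: three positive pivots from the identity block, then two negative ones from minus A transpose A.]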

653
00:48:11,340 --> 00:48:13,500
I sort of say that
was a saddle point.

654
00:48:13,500 --> 00:48:20,440
Positive pivots describe for me
a surface that's going upwards.

655
00:48:20,440 --> 00:48:22,640
This surface is going upward.

656
00:48:22,640 --> 00:48:25,660
I'm a surface in
five dimensions here.

657
00:48:25,660 --> 00:48:28,950
It's going upwards
in three directions,

658
00:48:28,950 --> 00:48:32,560
but it's going downwards in two.

659
00:48:32,560 --> 00:48:37,150
The point at the heart
of it, the saddle point,

660
00:48:37,150 --> 00:48:41,930
is the solution to our
system, is the y hat, u hat.

661
00:48:46,430 --> 00:48:54,020
Well, one conclusion
is that I wouldn't

662
00:48:54,020 --> 00:48:56,020
be able to use conjugate
gradient methods,

663
00:48:56,020 --> 00:48:59,230
for example, which we've just
learned how powerful they

664
00:48:59,230 --> 00:49:03,830
are, on the big matrix because
it's not positive definite.

665
00:49:03,830 --> 00:49:04,650
It's symmetric.

666
00:49:04,650 --> 00:49:07,010
I could use some other
available methods.

667
00:49:07,010 --> 00:49:09,020
I couldn't use
conjugate gradient.

668
00:49:09,020 --> 00:49:11,460
So if I wanted to use
conjugate gradient,

669
00:49:11,460 --> 00:49:23,480
I better do this reduction
to the definite system.

670
00:49:26,480 --> 00:49:34,000
That's longer than I intended
to spend on the simple example.

671
00:49:34,000 --> 00:49:38,240
But if you see that
example, then we'll

672
00:49:38,240 --> 00:49:44,320
be ready to move it to the
wide variety of applications.

673
00:49:44,320 --> 00:49:52,890
So let me just note that one
section, already up on the web,

674
00:49:52,890 --> 00:49:58,000
called saddle point systems,
solves differential equations

675
00:49:58,000 --> 00:49:59,640
that are of this kind.

676
00:49:59,640 --> 00:50:04,670
So we'll come to that, and we'll
come to matrix problems too.

677
00:50:04,670 --> 00:50:08,170
It's a very, very
central question,

678
00:50:08,170 --> 00:50:13,580
how to solve linear systems
with matrices of that form.

679
00:50:13,580 --> 00:50:16,830
In fact, I guess
I could say it's

680
00:50:16,830 --> 00:50:21,220
almost the fundamental problem
of numerical linear algebra,

681
00:50:21,220 --> 00:50:27,620
is to solve systems that
fall into that saddle point

682
00:50:27,620 --> 00:50:29,240
description.

683
00:50:29,240 --> 00:50:34,730
I'll try to justify
that by the importance

684
00:50:34,730 --> 00:50:38,660
I'm assigning to this
problem in the next weeks.

685
00:50:38,660 --> 00:50:39,270
OK.

686
00:50:39,270 --> 00:50:46,150
Thanks for today and
I'll turn off volume.