1
00:00:00,000 --> 00:00:01,470
INTRODUCTION: The
following content

2
00:00:01,470 --> 00:00:04,420
is provided by MIT
OpenCourseWare under a Creative

3
00:00:04,420 --> 00:00:05,844
Commons license.

4
00:00:05,844 --> 00:00:07,510
Additional information
about our license

5
00:00:07,510 --> 00:00:10,550
and MIT OpenCourseWare
in general

6
00:00:10,550 --> 00:00:11,930
is available at ocw.mit.edu.

7
00:00:15,620 --> 00:00:17,230
PROFESSOR: Specific
problem, and it's

8
00:00:17,230 --> 00:00:22,310
a pure linear least
squares problem,

9
00:00:22,310 --> 00:00:24,950
but it's got two terms.

10
00:00:24,950 --> 00:00:30,190
So we're used to minimizing
A*u minus b square.

11
00:00:30,190 --> 00:00:33,180
That gives us the least
squares solution u

12
00:00:33,180 --> 00:00:36,430
hat to a linear system.

13
00:00:36,430 --> 00:00:42,360
And usually the reason we have
to go to the least square thing

14
00:00:42,360 --> 00:00:44,900
is that there's
no exact solution.

15
00:00:44,900 --> 00:00:50,390
Probably A has more
equations than unknowns.

16
00:00:50,390 --> 00:00:55,230
A is long and thin, and
there's no exact solution,

17
00:00:55,230 --> 00:00:58,600
so we look for the best
solution, and we call it u hat.

18
00:00:58,600 --> 00:00:59,280
OK.

19
00:00:59,280 --> 00:01:01,960
But there are a lot
of problems in which

20
00:01:01,960 --> 00:01:05,540
a second square appears.

21
00:01:05,540 --> 00:01:10,240
There's also a B*u equal d
hiding in the background.

22
00:01:10,240 --> 00:01:13,430
And so we really have like
two sets of equations.

23
00:01:17,590 --> 00:01:22,500
And we multiply that second
square by some factor

24
00:01:22,500 --> 00:01:25,230
alpha and that wise
choice of alpha

25
00:01:25,230 --> 00:01:30,000
is usually a big
part of the problem.

26
00:01:30,000 --> 00:01:32,750
And I want to speak about
some of the applications

27
00:01:32,750 --> 00:01:34,570
of this area.

28
00:01:34,570 --> 00:01:41,500
So from the point of view
of the normal equations,

29
00:01:41,500 --> 00:01:43,790
the system that
you actually solve,

30
00:01:43,790 --> 00:01:46,470
you could say no problem.

31
00:01:46,470 --> 00:01:49,060
If we knew how to do
this, then we certainly

32
00:01:49,060 --> 00:01:54,280
can do both of them together,
because instead of A transpose

33
00:01:54,280 --> 00:01:58,950
A showing up, we'll now have
A transpose A plus alpha B

34
00:01:58,950 --> 00:02:05,230
transpose A. That'll be the
positive definite coefficient

35
00:02:05,230 --> 00:02:07,670
matrix on the left side.

36
00:02:07,670 --> 00:02:09,650
And then on the
right side, instead

37
00:02:09,650 --> 00:02:12,500
of just the usual
A transpose b, this

38
00:02:12,500 --> 00:02:17,010
term is also going to give
us an alpha B transpose d.

39
00:02:17,010 --> 00:02:24,410
All I'm saying is we don't need
any new mathematics to reach

40
00:02:24,410 --> 00:02:29,480
this normal equation with the
-- sort of the two-term normal

41
00:02:29,480 --> 00:02:30,030
equation.

42
00:02:30,030 --> 00:02:34,410
And another way to think of
exactly the same thing is we're

43
00:02:34,410 --> 00:02:42,410
looking at the least squares
problem, where the two matrices

44
00:02:42,410 --> 00:02:46,710
A and B both are multiplying u.

45
00:02:46,710 --> 00:02:51,550
And we have two bits
of data, b and d,

46
00:02:51,550 --> 00:02:53,570
and all were doing
is the usual thing

47
00:02:53,570 --> 00:02:55,730
but with a weight in here.

48
00:02:55,730 --> 00:02:58,980
And the weight is the identity
matrix for the A part,

49
00:02:58,980 --> 00:03:01,830
and it's alpha times the
identity matrix for the B part.

50
00:03:01,830 --> 00:03:04,510
So this is our C right.

51
00:03:04,510 --> 00:03:11,150
This is our C, just
to say that, really,

52
00:03:11,150 --> 00:03:17,070
the notation that we created,
the formulation we have,

53
00:03:17,070 --> 00:03:21,910
allows us to take this step, so
C appears here and A transpose

54
00:03:21,910 --> 00:03:26,600
C*b, C appears over here
too, just as always.

55
00:03:26,600 --> 00:03:29,580
OK.

56
00:03:29,580 --> 00:03:31,920
But there are
important questions.

57
00:03:31,920 --> 00:03:34,440
And of course, always, the
first important question

58
00:03:34,440 --> 00:03:38,280
in applied math is what
problem are you solving?

59
00:03:38,280 --> 00:03:42,810
Why have we produced
this class of problems?

60
00:03:42,810 --> 00:03:45,840
And I have two answers.

61
00:03:45,840 --> 00:03:50,320
Let me just mention
first, so we are sure what

62
00:03:50,320 --> 00:03:53,080
the shape of these matrices is.

63
00:03:53,080 --> 00:03:58,750
A, as always, has more
rows than columns.

64
00:03:58,750 --> 00:04:00,620
Of course, u is n by 1.

65
00:04:04,590 --> 00:04:09,490
It's just a column vector.

66
00:04:09,490 --> 00:04:14,530
But A has too many
equations, too many rows,

67
00:04:14,530 --> 00:04:19,250
for us to get an exact
solution; B, on the other hand,

68
00:04:19,250 --> 00:04:20,950
has few rows.

69
00:04:20,950 --> 00:04:23,230
It might even only have one row.

70
00:04:23,230 --> 00:04:30,820
It's very common to add on
one constraint or one term

71
00:04:30,820 --> 00:04:33,220
in regularizing the situation.

72
00:04:33,220 --> 00:04:37,300
Anyway, p is relatively small.

73
00:04:37,300 --> 00:04:47,960
So the total matrix A, B has
m plus p rows, and the same n

74
00:04:47,960 --> 00:04:51,380
columns, and we're ready to go.

75
00:04:51,380 --> 00:04:56,710
But the two parts are
different somehow.

76
00:04:56,710 --> 00:04:58,910
They come for different reasons.

77
00:04:58,910 --> 00:05:04,200
And now, I wrote down here
two places they come from.

78
00:05:04,200 --> 00:05:08,990
And these are big
applications of applied math.

79
00:05:08,990 --> 00:05:14,600
And one of them produces
small coefficients alpha.

80
00:05:14,600 --> 00:05:20,460
And what's the purpose of the
B*u minus d term in that case,

81
00:05:20,460 --> 00:05:22,310
with just a small alpha?

82
00:05:24,970 --> 00:05:26,770
The problem is that
the A transpose

83
00:05:26,770 --> 00:05:31,620
A part is nearly
singular or is singular.

84
00:05:31,620 --> 00:05:35,840
So that the usual normal
equation, without the B,

85
00:05:35,840 --> 00:05:42,320
would be in trouble, and this
of course happens pretty often.

86
00:05:42,320 --> 00:05:49,980
So the idea of regularization
is get some control

87
00:05:49,980 --> 00:05:56,330
of the solution by putting
in another term that

88
00:05:56,330 --> 00:06:01,290
keeps some control over u, and
stops it from just taking off,

89
00:06:01,290 --> 00:06:06,550
as what happened where the
original normal equations would

90
00:06:06,550 --> 00:06:08,580
have a very large u hat.

91
00:06:08,580 --> 00:06:16,820
So we're just, like, adding
a little steady part that

92
00:06:16,820 --> 00:06:18,800
keeps it a bit under control.

93
00:06:18,800 --> 00:06:23,790
And so the A transpose A is
nearly singular in ill-posed

94
00:06:23,790 --> 00:06:27,800
problems, so we make them
-- it's like giving aspirin

95
00:06:27,800 --> 00:06:29,740
to an ill-posed problem, right?

96
00:06:29,740 --> 00:06:34,880
You don't fix it,
but it can operate.

97
00:06:34,880 --> 00:06:35,550
OK.

98
00:06:35,550 --> 00:06:39,880
And where do ill-posed
problems come from?

99
00:06:39,880 --> 00:06:46,100
And I just wanted to say that I
think the fundamental ill-posed

100
00:06:46,100 --> 00:06:53,470
problems in science
is: given positions --

101
00:06:53,470 --> 00:06:59,650
suppose we know that
a mass, let's say,

102
00:06:59,650 --> 00:07:03,480
is in certain positions
at certain times --

103
00:07:03,480 --> 00:07:06,150
find the velocity.

104
00:07:06,150 --> 00:07:09,820
So we often, in
applications have some way

105
00:07:09,820 --> 00:07:13,280
to know position, and
want to know velocity.

106
00:07:13,280 --> 00:07:17,090
And maybe you realize
that that problem is not

107
00:07:17,090 --> 00:07:22,420
well posed, because velocity
takes the derivative.

108
00:07:22,420 --> 00:07:24,450
And if you take the
derivative, that's

109
00:07:24,450 --> 00:07:28,140
not a good operator to invert.

110
00:07:28,140 --> 00:07:32,990
Taking the derivative
makes things very rough.

111
00:07:41,490 --> 00:07:47,300
All sorts of cases, we're
looking for the velocity,

112
00:07:47,300 --> 00:07:48,570
and we only have positions.

113
00:07:48,570 --> 00:07:54,440
One that I think
about is from GPS.

114
00:07:54,440 --> 00:08:00,220
So GPS uses space-based
satellites, as you all know,

115
00:08:00,220 --> 00:08:04,470
to give you very
accurate positions.

116
00:08:04,470 --> 00:08:06,710
And somehow, out
of those positions,

117
00:08:06,710 --> 00:08:09,920
you get pretty accurate
but not, of course,

118
00:08:09,920 --> 00:08:15,640
as accurate as the positions,
but you get decent velocities.

119
00:08:15,640 --> 00:08:16,530
And how?

120
00:08:16,530 --> 00:08:21,360
And there is an example where
you want to know the motion --

121
00:08:21,360 --> 00:08:24,720
of course, to ask for the
acceleration would be asking

122
00:08:24,720 --> 00:08:27,380
for yet another derivative.

123
00:08:27,380 --> 00:08:35,120
You see why the derivative
is an ill-posed thing?

124
00:08:35,120 --> 00:08:37,590
Let me just say
ahead of time, I'm

125
00:08:37,590 --> 00:08:46,660
going to make today's lecture
about direction number two,

126
00:08:46,660 --> 00:08:48,340
not the ill-posed problems.

127
00:08:48,340 --> 00:08:51,300
So I'm just, like,
throwing in some comments

128
00:08:51,300 --> 00:08:53,755
about the ill-posed
problem, and then

129
00:08:53,755 --> 00:08:55,730
I'll have a weekend
to think about those,

130
00:08:55,730 --> 00:08:58,330
and then next week,
I'll come back

131
00:08:58,330 --> 00:09:00,760
to this ill posed problems.

132
00:09:00,760 --> 00:09:06,320
And specifically, they often
come from inverse problems,

133
00:09:06,320 --> 00:09:12,580
is a big source of
ill posed problems

134
00:09:12,580 --> 00:09:14,470
that need regularization.

135
00:09:14,470 --> 00:09:20,490
It's just a very large
class of equations.

136
00:09:20,490 --> 00:09:27,800
I mean, I was just going to say
about the derivative example.

137
00:09:27,800 --> 00:09:30,700
Why is that so unstable?

138
00:09:30,700 --> 00:09:37,340
Well, from the point
of finite differences,

139
00:09:37,340 --> 00:09:40,480
if we have positions, how
do you estimate velocities?

140
00:09:40,480 --> 00:09:42,990
You take a difference
quotient, right?

141
00:09:42,990 --> 00:09:47,900
You take the position at this
time, the position at a close

142
00:09:47,900 --> 00:09:51,630
by time, and you
divide by delta t.

143
00:09:51,630 --> 00:09:53,780
That's a reasonable start.

144
00:09:53,780 --> 00:09:56,640
But dividing by delta
t, that small number,

145
00:09:56,640 --> 00:10:00,120
is producing big numbers.

146
00:10:00,120 --> 00:10:05,790
Any errors in the position
are multiplied by that 1

147
00:10:05,790 --> 00:10:09,550
over delta t and blown up.

148
00:10:09,550 --> 00:10:13,330
And similarly, in
frequency space,

149
00:10:13,330 --> 00:10:17,330
where the functions that we
think about are the functions

150
00:10:17,330 --> 00:10:22,520
like e to the i*k*t, the
derivatives brings down

151
00:10:22,520 --> 00:10:24,650
the factor k.

152
00:10:24,650 --> 00:10:28,050
So high oscillations,
that's the point.

153
00:10:28,050 --> 00:10:31,160
Oscillatory functions
can be pretty small,

154
00:10:31,160 --> 00:10:34,150
but their derivative
can be enormous.

155
00:10:34,150 --> 00:10:38,710
So it's that oscillation
which is often associated

156
00:10:38,710 --> 00:10:40,940
with noise in the measurements.

157
00:10:40,940 --> 00:10:45,440
You know, noisy
measurements are jumpy,

158
00:10:45,440 --> 00:10:51,000
and when we go to take their
derivative or their finite

159
00:10:51,000 --> 00:10:52,980
difference, we get big answers.

160
00:10:52,980 --> 00:11:01,830
Anyway, for me that's the
model ill-posed problem,

161
00:11:01,830 --> 00:11:03,890
to find velocities.

162
00:11:03,890 --> 00:11:05,600
And how to do it?

163
00:11:05,600 --> 00:11:09,090
I mean a lot of thought
has gone into that.

164
00:11:09,090 --> 00:11:14,060
Let me leave it there,
and come back to it.

165
00:11:14,060 --> 00:11:19,340
But I say all this just to
emphasize its importance.

166
00:11:19,340 --> 00:11:22,950
Not that we'll completely
solve it, actually,

167
00:11:22,950 --> 00:11:25,730
for GPS or for any
other thing, it's

168
00:11:25,730 --> 00:11:29,090
just all we can do is medicate.

169
00:11:29,090 --> 00:11:29,810
OK.

170
00:11:29,810 --> 00:11:34,490
Now this is the one that
we can really solve.

171
00:11:34,490 --> 00:11:37,820
So this is a different
application entirely.

172
00:11:37,820 --> 00:11:42,180
In this application, this
second term, B*u equal d,

173
00:11:42,180 --> 00:11:46,060
is something important,
something that we want

174
00:11:46,060 --> 00:11:48,230
to enforce.

175
00:11:48,230 --> 00:11:51,740
It's a constraint,
you could say.

176
00:11:51,740 --> 00:11:55,550
And one way to enforce it
which fits this pattern

177
00:11:55,550 --> 00:11:58,380
is to take alpha
very large, right.

178
00:11:58,380 --> 00:12:02,350
When we take alpha large, we're
putting a really heavy weight

179
00:12:02,350 --> 00:12:07,020
on that B*u minus d square,
and when we minimize,

180
00:12:07,020 --> 00:12:12,520
that weight will force B*u
to be pretty close to d.

181
00:12:12,520 --> 00:12:16,100
But of course, B*u equal
d doesn't determine u.

182
00:12:16,100 --> 00:12:18,450
Everybody's got
that picture clear?

183
00:12:18,450 --> 00:12:23,410
From B*u equal d
has many solutions.

184
00:12:28,520 --> 00:12:33,160
And so the real problem that
we're trying to solve is

185
00:12:33,160 --> 00:12:38,010
enforce B*u equal d, but
among those solutions,

186
00:12:38,010 --> 00:12:42,780
pick the one that
minimizes the first square,

187
00:12:42,780 --> 00:12:45,010
A*u minus b squared.

188
00:12:45,010 --> 00:12:46,810
So you see the difference?

189
00:12:46,810 --> 00:12:48,360
You're trying to
enforce something

190
00:12:48,360 --> 00:12:52,830
that the physics or the
geometry, or whatever

191
00:12:52,830 --> 00:12:58,580
source says has to be true.

192
00:12:58,580 --> 00:13:00,180
And you can do it.

193
00:13:00,180 --> 00:13:04,220
And you're left with
lots of options.

194
00:13:04,220 --> 00:13:12,220
And then the combined problem
attempts to pick the right u.

195
00:13:12,220 --> 00:13:12,760
OK.

196
00:13:12,760 --> 00:13:16,630
So that's the
application number two

197
00:13:16,630 --> 00:13:18,270
that I want to
speak about today.

198
00:13:18,270 --> 00:13:24,100
And actually, I want to
give several ways to do it.

199
00:13:24,100 --> 00:13:26,830
It's a very important problem.

200
00:13:26,830 --> 00:13:34,580
And one way will be to
actually solve B*u equal d.

201
00:13:34,580 --> 00:13:38,070
Find those solutions.

202
00:13:38,070 --> 00:13:40,360
And you may say,
well, that's what

203
00:13:40,360 --> 00:13:42,000
we learned in linear
algebra, that's

204
00:13:42,000 --> 00:13:44,950
the very foundation
of linear algebra,

205
00:13:44,950 --> 00:13:48,570
is there a particular
solution, right?

206
00:13:51,310 --> 00:13:56,750
Every solution is of this form
particular plus null space.

207
00:13:56,750 --> 00:14:04,400
Maybe I'll just point to
the start of that approach.

208
00:14:04,400 --> 00:14:06,670
So want to solve B*u equal d.

209
00:14:06,670 --> 00:14:09,470
And I'll come back
to this method

210
00:14:09,470 --> 00:14:12,830
after dealing with the
least squares approach.

211
00:14:12,830 --> 00:14:16,200
But here's really
the direct approach.

212
00:14:16,200 --> 00:14:19,440
That if I solve B*u equal
d, then there's a particular

213
00:14:19,440 --> 00:14:21,750
solution that solves it.

214
00:14:21,750 --> 00:14:28,250
And then you can always add on
the general solution which is,

215
00:14:28,250 --> 00:14:30,380
sorry add on the null
space solution --

216
00:14:30,380 --> 00:14:32,990
the solution of B*u equals 0.

217
00:14:32,990 --> 00:14:36,230
And B*u equals 0 has
lots of solutions.

218
00:14:36,230 --> 00:14:38,850
So we would have to find them.

219
00:14:38,850 --> 00:14:40,920
OK.

220
00:14:40,920 --> 00:14:43,550
I mean that's what 18.06
would naturally do,

221
00:14:43,550 --> 00:14:47,090
but actually never,
I'm ashamed to say,

222
00:14:47,090 --> 00:14:49,270
but I didn't do it in 18.06.

223
00:14:49,270 --> 00:14:56,810
I never actually said how I
would scientifically compute,

224
00:14:56,810 --> 00:15:00,710
in a stable way, the solutions.

225
00:15:00,710 --> 00:15:01,250
OK.

226
00:15:01,250 --> 00:15:04,400
So I think that
will be important.

227
00:15:04,400 --> 00:15:06,650
But that's not the
only way to do it.

228
00:15:06,650 --> 00:15:11,030
That's called the
null space method.

229
00:15:11,030 --> 00:15:14,140
And sometimes it's the
right choice, sometimes not.

230
00:15:14,140 --> 00:15:20,930
This would be called the
heavy weight method, right.

231
00:15:20,930 --> 00:15:25,590
Put on a very heavy weight
and solve a standard problem.

232
00:15:25,590 --> 00:15:26,500
OK.

233
00:15:26,500 --> 00:15:28,900
So let me follow that one up.

234
00:15:28,900 --> 00:15:32,070
And then they'll
be a third method.

235
00:15:32,070 --> 00:15:34,570
And maybe there's going to be
space on the middle blackboard

236
00:15:34,570 --> 00:15:35,630
for it.

237
00:15:35,630 --> 00:15:37,930
And what would the
third method be?

238
00:15:37,930 --> 00:15:43,770
That will be use a
Lagrange multiplier.

239
00:15:43,770 --> 00:15:45,820
This thing is a constraint.

240
00:15:45,820 --> 00:15:48,510
I'll enforce it by a
Lagrange multiplier.

241
00:15:48,510 --> 00:15:49,050
OK.

242
00:15:49,050 --> 00:15:51,060
That's coming next.

243
00:15:51,060 --> 00:15:54,920
The way I'm enforcing it right
now is by a heavy weight.

244
00:15:54,920 --> 00:15:57,200
OK.

245
00:15:57,200 --> 00:16:01,650
One reason for the
popularity of this method

246
00:16:01,650 --> 00:16:05,000
is you don't have to
do any new thinking.

247
00:16:05,000 --> 00:16:11,070
You just create these
equations and solve them.

248
00:16:11,070 --> 00:16:16,380
Where the other methods maybe
ask us to think separately

249
00:16:16,380 --> 00:16:19,694
about the constraint.

250
00:16:19,694 --> 00:16:21,360
Here we don't have
to things separately,

251
00:16:21,360 --> 00:16:24,910
we just create this normal
equation, we solve it,

252
00:16:24,910 --> 00:16:29,030
and we get an answer u hat.

253
00:16:29,030 --> 00:16:31,420
Maybe I should call
it u hat alpha,

254
00:16:31,420 --> 00:16:34,430
because it depends
on the weight alpha,

255
00:16:34,430 --> 00:16:41,480
certainly, which we hope
is near the exact solution.

256
00:16:41,480 --> 00:16:46,250
The exact solution being the one
that exactly solves B*u equal

257
00:16:46,250 --> 00:16:47,120
d.

258
00:16:47,120 --> 00:16:52,070
Because u hat alpha will not
exactly solve B*u equal d.

259
00:16:52,070 --> 00:16:56,530
But we can find solutions
that do, and then among those,

260
00:16:56,530 --> 00:17:00,530
we can minimize
A*u minus b square.

261
00:17:00,530 --> 00:17:01,030
OK.

262
00:17:01,030 --> 00:17:06,101
So just a word about
this heavy weight method.

263
00:17:06,101 --> 00:17:06,600
OK.

264
00:17:10,260 --> 00:17:14,700
Well, first an
interesting point.

265
00:17:14,700 --> 00:17:19,240
A point that I think
is sort of interesting.

266
00:17:19,240 --> 00:17:22,650
I want to let alpha go to
infinity and see what happens,

267
00:17:22,650 --> 00:17:24,480
right.

268
00:17:24,480 --> 00:17:26,970
Everybody figures that as
alpha goes to infinity,

269
00:17:26,970 --> 00:17:29,490
I'm going to get
the right answer.

270
00:17:29,490 --> 00:17:31,150
Because as alpha
goes to infinity,

271
00:17:31,150 --> 00:17:34,370
it's going to more and more
enforce the constraint B*u

272
00:17:34,370 --> 00:17:35,550
equal d.

273
00:17:35,550 --> 00:17:39,510
And then, with that constraint
enforced, the other part of it

274
00:17:39,510 --> 00:17:43,050
will find the best
u and that's great.

275
00:17:43,050 --> 00:17:46,620
But let alpha go to
infinity in this equation,

276
00:17:46,620 --> 00:17:47,400
and what happens?

277
00:17:50,050 --> 00:17:53,760
So this is just like a side
comment just to say alpha,

278
00:17:53,760 --> 00:17:58,050
you know, taking a limit you got
to think about doing it right.

279
00:17:58,050 --> 00:18:01,694
Well, let's see, if I let
alpha go to infinity as it is,

280
00:18:01,694 --> 00:18:03,360
that'll be infinite
that'll be infinite,

281
00:18:03,360 --> 00:18:06,140
and I won't know
what's going on.

282
00:18:06,140 --> 00:18:12,960
Let me divide by alpha before
I let alpha go to infinity.

283
00:18:12,960 --> 00:18:15,850
So if I just divide
everything by alpha --

284
00:18:15,850 --> 00:18:19,790
can I do that with
an eraser here?

285
00:18:19,790 --> 00:18:21,810
I'll divide by alpha.

286
00:18:21,810 --> 00:18:24,580
So there's a 1 over alpha here.

287
00:18:24,580 --> 00:18:26,590
I divide this by alpha.

288
00:18:26,590 --> 00:18:30,960
And this has a 1
over alpha there.

289
00:18:30,960 --> 00:18:33,480
And now, if I let
alpha go to infinity,

290
00:18:33,480 --> 00:18:36,800
I get something sensible.

291
00:18:36,800 --> 00:18:41,150
This goes to 0, right,
alpha going to infinity,

292
00:18:41,150 --> 00:18:42,420
getting bigger and bigger.

293
00:18:42,420 --> 00:18:43,850
This goes to 0.

294
00:18:43,850 --> 00:18:45,370
So what do I get in the limit?

295
00:18:45,370 --> 00:18:51,640
I get that this equals
this in the limit.

296
00:18:51,640 --> 00:18:53,260
So shall I put that up here?

297
00:18:53,260 --> 00:18:56,220
Well, I'll put it here, because
I don't like it, frankly.

298
00:18:58,930 --> 00:19:03,390
So I'll just squeeze
it in this little spot.

299
00:19:03,390 --> 00:19:07,350
That if I let alpha go to
into infinity, so 1 over alpha

300
00:19:07,350 --> 00:19:08,030
goes to 0.

301
00:19:08,030 --> 00:19:14,810
I get B transpose B, u hat
infinity, shall I call it?

302
00:19:14,810 --> 00:19:16,450
Equals B transpose d.

303
00:19:19,440 --> 00:19:21,750
And I guess what I
want to say is, from

304
00:19:21,750 --> 00:19:25,580
that I don't learn a whole lot.

305
00:19:25,580 --> 00:19:28,100
because B transpose B
is a singular matrix,

306
00:19:28,100 --> 00:19:33,710
B transpose B is a
matrix of only rank p,

307
00:19:33,710 --> 00:19:40,350
it's very singular, right?

308
00:19:40,350 --> 00:19:44,760
B had this crazy
shape, long and thin.

309
00:19:44,760 --> 00:19:47,660
B transpose B will
be tall, B transpose

310
00:19:47,660 --> 00:19:52,980
B will be a large matrix,
but its rank will only be p.

311
00:19:52,980 --> 00:19:55,230
It's an n by n matrix
of rank p, and it's

312
00:19:55,230 --> 00:19:58,690
singular and who knows
what's going on there.

313
00:20:01,350 --> 00:20:04,370
That little side
issue was simply

314
00:20:04,370 --> 00:20:08,740
to say that you
can't just let alpha

315
00:20:08,740 --> 00:20:11,530
go to infinity in the
central equation there,

316
00:20:11,530 --> 00:20:13,570
and expect to see
what's happening.

317
00:20:13,570 --> 00:20:14,070
OK.

318
00:20:14,070 --> 00:20:17,870
So somehow there's
more to it than that.

319
00:20:17,870 --> 00:20:25,420
So let me put alpha back where
it belongs, and think again.

320
00:20:25,420 --> 00:20:27,970
OK.

321
00:20:27,970 --> 00:20:35,340
And I guess by thinking
again, I might as well

322
00:20:35,340 --> 00:20:43,190
think in terms of this
way of writing it.

323
00:20:43,190 --> 00:20:45,760
Because I recognize this, right?

324
00:20:45,760 --> 00:20:49,100
This is exactly the framework
that we've developed.

325
00:20:55,280 --> 00:20:58,760
So this is the least
squares problem.

326
00:20:58,760 --> 00:21:01,050
I just want to write
down the saddle point

327
00:21:01,050 --> 00:21:08,420
matrix that goes with this
least squares problem.

328
00:21:08,420 --> 00:21:10,230
What is the saddle point matrix?

329
00:21:10,230 --> 00:21:11,610
Do you remember?

330
00:21:11,610 --> 00:21:21,650
The saddle point matrix S
is -- it has, up here --

331
00:21:21,650 --> 00:21:26,980
so now I've got
an A and a B here.

332
00:21:26,980 --> 00:21:31,070
So it's going to be
a little bit larger.

333
00:21:31,070 --> 00:21:34,320
Then I have my usual zero block.

334
00:21:34,320 --> 00:21:38,800
And I have my usual A
transpose B transpose block.

335
00:21:38,800 --> 00:21:41,520
And what block goes there?

336
00:21:41,520 --> 00:21:43,280
That's the C inverse, right.

337
00:21:47,690 --> 00:21:53,426
It's our usual C inverse,
A, A transpose, 0

338
00:21:53,426 --> 00:21:54,800
that we're totally
accustomed to.

339
00:21:54,800 --> 00:22:03,050
But now A has grown
into A, B; 0 is still 0;

340
00:22:03,050 --> 00:22:05,890
the transpose is
still the transpose;

341
00:22:05,890 --> 00:22:09,240
and up here is C inverse,
and since C was this,

342
00:22:09,240 --> 00:22:14,050
C inverse will be the identity,
and the identity over alpha.

343
00:22:14,050 --> 00:22:15,590
OK.

344
00:22:15,590 --> 00:22:18,330
So that's my S_alpha,
you could say.

345
00:22:21,040 --> 00:22:32,020
And now I'm prepared to let --
so my equation is S_alpha --

346
00:22:32,020 --> 00:22:40,110
written as a block equation,
what are the pieces of it?

347
00:22:40,110 --> 00:22:47,670
u is the guy that I'm
looking for, the u hat alpha.

348
00:22:47,670 --> 00:22:50,880
And there was a Lagrange
multiplier that came in.

349
00:22:50,880 --> 00:22:54,440
You remember, that's how
we got to a block form

350
00:22:54,440 --> 00:22:56,080
from a scalar form.

351
00:22:56,080 --> 00:23:00,540
And I guess I usually call it
w, so I'll stay with w for here.

352
00:23:00,540 --> 00:23:02,170
OK.

353
00:23:02,170 --> 00:23:06,920
So that's what multiplies w, u.

354
00:23:06,920 --> 00:23:11,410
And it's what
gives -- let's see.

355
00:23:11,410 --> 00:23:19,590
I think it gives a B, and
it gives a d from this A*u

356
00:23:19,590 --> 00:23:23,004
and B*u, and I think
here if gives a 0,

357
00:23:23,004 --> 00:23:24,560
because we didn't have any.

358
00:23:24,560 --> 00:23:25,060
OK.

359
00:23:27,980 --> 00:23:29,450
What am I doing here?

360
00:23:29,450 --> 00:23:31,690
I'm just writing
the problem in a way

361
00:23:31,690 --> 00:23:34,740
where I can let alpha go
to 0, and see the limit.

362
00:23:34,740 --> 00:23:43,260
So let alpha go to infinity,
this is heavy weight part.

363
00:23:43,260 --> 00:23:44,520
So this will go to 0.

364
00:23:44,520 --> 00:23:52,320
So this approaches the
S_infinity, w_infinity,

365
00:23:52,320 --> 00:23:59,070
we could call it, u
hat infinity is now --

366
00:23:59,070 --> 00:24:03,400
well you see what the limit
is, that's 0 in that block.

367
00:24:03,400 --> 00:24:08,070
This is A, this is B, this is A
transpose, this is B transpose,

368
00:24:08,070 --> 00:24:11,220
this is our usual zero
block, multiplying

369
00:24:11,220 --> 00:24:18,120
our same w_infinity,
u hat infinity,

370
00:24:18,120 --> 00:24:23,730
equaling our same b, d, and 0.

371
00:24:23,730 --> 00:24:25,250
This is the limiting equation.

372
00:24:30,590 --> 00:24:31,970
And it's great.

373
00:24:35,230 --> 00:24:40,230
This is the equation that
determines the limit as alpha

374
00:24:40,230 --> 00:24:44,220
goes to infinity, that
determines the best u.

375
00:24:44,220 --> 00:24:48,660
This is the problem that
we really want to solve.

376
00:24:48,660 --> 00:24:51,830
Maybe that's what I should say.

377
00:24:51,830 --> 00:24:56,380
Do you see the constraint
B*u equal d in here from this

378
00:24:56,380 --> 00:24:58,380
middle block row?

379
00:24:58,380 --> 00:25:04,140
That says 0, 0, B u hat is d.

380
00:25:04,140 --> 00:25:08,290
So we've introduced
the constraint.

381
00:25:08,290 --> 00:25:13,330
The first part is w_infinity
with an A*u_infinity,

382
00:25:13,330 --> 00:25:18,790
that's the usual error term,
the thing that we probably

383
00:25:18,790 --> 00:25:20,000
can't make 0.

384
00:25:20,000 --> 00:25:23,670
And then this is the
usual Lagrange multiplier

385
00:25:23,670 --> 00:25:25,450
term from there.

386
00:25:29,120 --> 00:25:34,360
So I've spoken pretty quickly
here, and let me just conclude.

387
00:25:34,360 --> 00:25:39,150
This is the limit equation,
is the correct limit equation.

388
00:25:39,150 --> 00:25:40,880
This is the limit
equation that we want

389
00:25:40,880 --> 00:25:44,780
to solve one way or another.

390
00:25:44,780 --> 00:25:51,120
And taking alpha large is one
way to get near the answer,

391
00:25:51,120 --> 00:25:53,250
but we'll look at
other ways now.

392
00:25:53,250 --> 00:26:00,840
So this is really the
correct equations to solve.

393
00:26:05,050 --> 00:26:12,740
Going the saddle point
Lagrange multiplier route.

394
00:26:12,740 --> 00:26:14,910
OK.

395
00:26:14,910 --> 00:26:19,360
So let me summarize
what I've done so far.

396
00:26:22,340 --> 00:26:30,060
My problem is when B*u equal
d is a constraint that I would

397
00:26:30,060 --> 00:26:35,040
like to satisfy, and one way
to do it is to take alpha,

398
00:26:35,040 --> 00:26:38,770
you know, pretty near the
largest number that the machine

399
00:26:38,770 --> 00:26:43,750
will hold, say 10 to the 15.

400
00:26:43,750 --> 00:26:46,500
Put a really heavy
weight on this.

401
00:26:46,500 --> 00:26:49,920
But of course, when you
let alpha be 10 to the 15,

402
00:26:49,920 --> 00:26:54,470
you can see that there's like
some possible problems here.

403
00:26:54,470 --> 00:26:56,860
When you let alpha have
an enormous weight,

404
00:26:56,860 --> 00:26:59,900
you're really
tilting this matrix

405
00:26:59,900 --> 00:27:03,230
so strongly, you know, you
couldn't let it be 10 to the 20

406
00:27:03,230 --> 00:27:07,610
in single precision or you'd
wipe out A transpose A.

407
00:27:07,610 --> 00:27:09,570
So it's a balance here.

408
00:27:09,570 --> 00:27:14,440
So I guess probably a lot
of a numerical analysts

409
00:27:14,440 --> 00:27:18,440
would say wrong way to
do it, the right way

410
00:27:18,440 --> 00:27:24,590
is solve this equation, or
else do it this other way.

411
00:27:24,590 --> 00:27:30,590
But a lot of people with
codes say OK, you know,

412
00:27:30,590 --> 00:27:32,090
you're going to be
a nervous Nellie,

413
00:27:32,090 --> 00:27:34,220
I'm just going to use my code.

414
00:27:34,220 --> 00:27:37,920
And that's quite normal,
quite human response.

415
00:27:37,920 --> 00:27:40,640
OK.

416
00:27:40,640 --> 00:27:46,400
And this will
frequently succeed.

417
00:27:46,400 --> 00:27:46,900
OK.

418
00:27:46,900 --> 00:27:50,640
So that's one method to
do it, not the method that

419
00:27:50,640 --> 00:27:54,690
the professionals in numerical
analysis -- maybe I'm thinking,

420
00:27:54,690 --> 00:27:58,060
for example, the book
by Golub/van Loan,

421
00:27:58,060 --> 00:28:03,760
if you know that book, that
would discuss this problem.

422
00:28:03,760 --> 00:28:09,170
And it would actually
discuss this third method,

423
00:28:09,170 --> 00:28:12,280
this null space
method of solving it.

424
00:28:12,280 --> 00:28:13,510
OK.

425
00:28:13,510 --> 00:28:16,570
Maybe I'll go to that
null space method.

426
00:28:16,570 --> 00:28:20,450
So this was one way.

427
00:28:20,450 --> 00:28:25,030
Another way is
solve B*u equal d.

428
00:28:29,470 --> 00:28:32,670
And remember again, we
only have p equations,

429
00:28:32,670 --> 00:28:35,275
we have n unknowns,
so there's going

430
00:28:35,275 --> 00:28:38,740
to be freedom in the solution.

431
00:28:38,740 --> 00:28:43,210
So we have to identify
a particular solution,

432
00:28:43,210 --> 00:28:45,910
there's a lot of freedom in
that particular solution,

433
00:28:45,910 --> 00:28:51,350
and then we can add to it --
this null space is going to be

434
00:28:51,350 --> 00:29:02,120
n minus p dimensions, n minus p
degrees of freedom in the null

435
00:29:02,120 --> 00:29:02,960
space.

436
00:29:02,960 --> 00:29:06,190
That's the dimension of
the null space, n minus p.

437
00:29:06,190 --> 00:29:08,590
I'm assuming that
B has full rank p,

438
00:29:08,590 --> 00:29:12,100
but p is a small
number compared to n.

439
00:29:12,100 --> 00:29:12,680
OK.

440
00:29:12,680 --> 00:29:14,860
So how do you find a
particular solution?

441
00:29:14,860 --> 00:29:16,680
How do you find the
null space solution?

442
00:29:16,680 --> 00:29:23,930
As I said, that's what I
should be explaining in 18.06.

443
00:29:23,930 --> 00:29:27,560
And of course, we do it in
18.06, but we do it with a 3

444
00:29:27,560 --> 00:29:33,290
by 3 matrix, and we practically,
you know, we do it by hand,

445
00:29:33,290 --> 00:29:36,190
where here we're talking about
matrices of order thousands

446
00:29:36,190 --> 00:29:39,860
or millions, we don't
do those by hand.

447
00:29:39,860 --> 00:29:44,450
And we better not do
it in an unstable way.

448
00:29:44,450 --> 00:29:48,940
So the question is what's
a good way to do it?

449
00:29:48,940 --> 00:29:53,720
And really, the heart of
modern numerical analysis

450
00:29:53,720 --> 00:30:00,110
is orthogonalize stuff,
get orthogonal vectors.

451
00:30:00,110 --> 00:30:03,190
Because if you have
orthogonal vectors,

452
00:30:03,190 --> 00:30:05,500
they don't get out of scale.

453
00:30:05,500 --> 00:30:10,950
The numbers involved
don't become unstable.

454
00:30:10,950 --> 00:30:14,585
And the standard
orthogonalization process

455
00:30:14,585 --> 00:30:17,330
is Gram-Schmidt,
that's right, those

456
00:30:17,330 --> 00:30:18,760
are the words we all think of.

457
00:30:18,760 --> 00:30:22,470
If I have a bunch of vectors,
I have to make them orthogonal,

458
00:30:22,470 --> 00:30:28,140
I want to make them
orthogonal, then I use -- well,

459
00:30:28,140 --> 00:30:30,900
Gram-Schmidt is
what we think of.

460
00:30:30,900 --> 00:30:36,830
But actually, MATLAB
doesn't use Gram-Schmidt,

461
00:30:36,830 --> 00:30:40,490
doesn't use the
usual Gram-Schmidt,

462
00:30:40,490 --> 00:30:44,480
as Gram and Schmidt
thought of it.

463
00:30:44,480 --> 00:30:49,520
MATLAB goes a different
route to the same conclusion.

464
00:30:49,520 --> 00:30:54,560
So let me just remind you
what Gram-Schmidt produced.

465
00:30:58,420 --> 00:31:05,890
And let me put in the name
of the numerical analyst

466
00:31:05,890 --> 00:31:14,390
long after Gram and
Schmidt, it's Householder,

467
00:31:14,390 --> 00:31:20,790
you know, the guy from
Tennessee with good ideas.

468
00:31:20,790 --> 00:31:24,590
So he had another way
to the same answer,

469
00:31:24,590 --> 00:31:26,970
which is this factorization.

470
00:31:26,970 --> 00:31:34,200
So we take our matrix,
often it's A in 18.06,

471
00:31:34,200 --> 00:31:40,150
and we factor it, we want to
orthogonalize its columns.

472
00:31:40,150 --> 00:31:43,060
So the columns of A
get orthogonalized

473
00:31:43,060 --> 00:31:52,740
into the columns of Q. So this
has the orthogonal columns.

474
00:31:52,740 --> 00:31:54,580
And then, of course,
there's some connection

475
00:31:54,580 --> 00:31:57,730
between the original columns
and the orthogonal columns,

476
00:31:57,730 --> 00:32:01,320
and that connection is
by triangular matrix

477
00:32:01,320 --> 00:32:03,020
R, upper triangular.

478
00:32:08,050 --> 00:32:12,720
I don't know if you
remember that from 18.06.

479
00:32:12,720 --> 00:32:19,140
What I typically do is I explain
Gram-Schmidt as they knew it,

480
00:32:19,140 --> 00:32:25,820
and then at the last
minute I pull Q and R out

481
00:32:25,820 --> 00:32:29,620
as a way to express
the result. OK.

482
00:32:29,620 --> 00:32:32,240
So it's the result
we want, and not

483
00:32:32,240 --> 00:32:35,780
the particular Gram-Schmidt
way to get there,

484
00:32:35,780 --> 00:32:38,470
and Householder produces
a better way to get there.

485
00:32:38,470 --> 00:32:40,160
OK.

486
00:32:40,160 --> 00:32:46,290
But the main point is that if a
matrix has independent columns,

487
00:32:46,290 --> 00:32:52,170
or even if it hasn't, but if
it has independent columns

488
00:32:52,170 --> 00:32:58,270
we know everything about it,
that we can orthogonalize

489
00:32:58,270 --> 00:32:59,550
those columns.

490
00:32:59,550 --> 00:33:04,840
In fact, we can -- here's
what I'm leading to, this B

491
00:33:04,840 --> 00:33:11,440
transpose I'm remembering has
this shape because B had that

492
00:33:11,440 --> 00:33:18,470
shape, so it's B transpose that
I'm going to do Gram-Schmidt,

493
00:33:18,470 --> 00:33:30,510
Householder, use -- The command
in MATLAB is [Q, R] equals,

494
00:33:30,510 --> 00:33:36,040
with Gram-Schmidt we could
have used the letters G and S,

495
00:33:36,040 --> 00:33:40,140
but since we don't use
their actual method anymore,

496
00:33:40,140 --> 00:33:44,670
we could use the letter HH
for Householder or something.

497
00:33:44,670 --> 00:33:50,680
But it's qr of, in this case,
B transpose is what we want.

498
00:33:50,680 --> 00:33:51,800
OK.

499
00:33:51,800 --> 00:33:59,440
So what that very frequently
used command in MATLAB

500
00:33:59,440 --> 00:34:09,800
produces is Q and R. And it
produces a square matrix Q,

501
00:34:09,800 --> 00:34:14,680
where these columns, the
columns of the first part,

502
00:34:14,680 --> 00:34:20,490
Q_1 transpose,
are orthogonalized

503
00:34:20,490 --> 00:34:22,890
versions of these columns.

504
00:34:22,890 --> 00:34:28,060
And the R just tells us the
connection between them.

505
00:34:28,060 --> 00:34:32,950
Then it also produces, and
this is handy as you'll see,

506
00:34:32,950 --> 00:34:39,390
the algorithm also produces
n minus p more columns, that

507
00:34:39,390 --> 00:34:42,060
are orthogonal to these guys.

508
00:34:42,060 --> 00:34:45,100
So it produces a complete
orthonormal basis,

509
00:34:45,100 --> 00:34:48,230
a complete set of n
columns altogether.

510
00:34:48,230 --> 00:34:53,540
Q_1 transpose has the
column that really are

511
00:34:53,540 --> 00:34:54,920
associated with these problems.

512
00:34:54,920 --> 00:34:58,530
And these are going to be
associated with the null space.

513
00:35:02,610 --> 00:35:06,420
So out of this, I see
that actually B transpose

514
00:35:06,420 --> 00:35:13,480
is Q_1 transpose
R. So you can say

515
00:35:13,480 --> 00:35:18,330
this is the reduced
factorization with only p

516
00:35:18,330 --> 00:35:23,620
columns, and this
is the full picture

517
00:35:23,620 --> 00:35:26,810
with the other n minus p
columns that are orthogonal.

518
00:35:26,810 --> 00:35:29,660
And the reason that's
handy is they tell us

519
00:35:29,660 --> 00:35:31,950
about the null space.

520
00:35:31,950 --> 00:35:38,020
So now I want to identify out
of this a particular solution

521
00:35:38,020 --> 00:35:39,960
and the general
null space solution.

522
00:35:39,960 --> 00:35:41,780
OK.

523
00:35:41,780 --> 00:35:43,470
So what are those?

524
00:35:43,470 --> 00:35:48,450
So particular solution is
going to use this part.

525
00:35:50,980 --> 00:35:57,160
So, let's see, I want
a particular solution.

526
00:35:57,160 --> 00:36:05,280
So B, transposing that
is R transpose Q_1.

527
00:36:05,280 --> 00:36:07,430
OK.

528
00:36:07,430 --> 00:36:15,490
So now I'm prepared to solve
-- step one is the particular

529
00:36:15,490 --> 00:36:16,230
solution.

530
00:36:16,230 --> 00:36:21,500
I want to get be
B*u_particular equal d.

531
00:36:21,500 --> 00:36:23,350
OK.

532
00:36:23,350 --> 00:36:27,700
But now I have B
is nicely factored.

533
00:36:27,700 --> 00:36:37,010
So this is R transpose
Q_1 u particular equal d.

534
00:36:40,370 --> 00:36:46,570
So now comes the computation
the code has to do.

535
00:36:46,570 --> 00:36:54,370
It has to invert that to
get Q_1 times u particular

536
00:36:54,370 --> 00:36:58,590
equals R inverse transpose d.

537
00:36:58,590 --> 00:37:00,680
So it had to solve
a triangular system,

538
00:37:00,680 --> 00:37:02,930
but of course, a triangular
system is quick to solve.

539
00:37:02,930 --> 00:37:05,040
That's the good part here.

540
00:37:05,040 --> 00:37:09,740
And then this final step
to get u particular,

541
00:37:09,740 --> 00:37:14,000
I have to put the inverse
of that guy over there,

542
00:37:14,000 --> 00:37:17,010
but because this has
orthogonal column, that's

543
00:37:17,010 --> 00:37:19,040
just Q_1 transpose.

544
00:37:19,040 --> 00:37:22,720
So there we go.

545
00:37:22,720 --> 00:37:28,900
That's the inverse
of R. So that's

546
00:37:28,900 --> 00:37:32,660
what I should've done
in 18.06 and never did,

547
00:37:32,660 --> 00:37:36,430
and you get to see it.

548
00:37:36,430 --> 00:37:39,330
What's a convenient
particular solution?

549
00:37:39,330 --> 00:37:46,540
Everybody knows, we
got a whole collection

550
00:37:46,540 --> 00:37:47,780
of particular solutions.

551
00:37:47,780 --> 00:37:51,950
We want to choose one
that's nice and stable.

552
00:37:51,950 --> 00:37:54,360
And the reason it's
stable is that it

553
00:37:54,360 --> 00:37:59,670
works with orthogonal
columns, orthonormal even,

554
00:37:59,670 --> 00:38:07,630
and triangular matrix for
which linear systems are

555
00:38:07,630 --> 00:38:08,760
highly active.

556
00:38:08,760 --> 00:38:09,500
OK.

557
00:38:09,500 --> 00:38:11,350
So that's the
particular solution.

558
00:38:11,350 --> 00:38:13,320
Now what's the null
space solution?

559
00:38:13,320 --> 00:38:21,240
What are the general
solutions to null space part?

560
00:38:21,240 --> 00:38:25,120
What are the solution to those?

561
00:38:25,120 --> 00:38:29,190
Well, I can just go
down the same steps.

562
00:38:29,190 --> 00:38:35,310
This is R transpose Q_1
u null space equals 0.

563
00:38:38,480 --> 00:38:41,620
I multiply both sides --
this is a nice square,

564
00:38:41,620 --> 00:38:47,500
invertible matrix -- I multiply
by its inverse, kills that.

565
00:38:47,500 --> 00:39:00,620
So now I have Q_1 times the --
Q_1 is really the heart of B.

566
00:39:00,620 --> 00:39:07,830
So what vectors are
perpendicular to Q_1?

567
00:39:07,830 --> 00:39:09,670
I hope I've got this right.

568
00:39:09,670 --> 00:39:13,440
It's easy to mix up a
transpose in the process.

569
00:39:13,440 --> 00:39:16,560
So let me just pause to be
sure I'm doing it correctly.

570
00:39:22,120 --> 00:39:23,540
OK.

571
00:39:23,540 --> 00:39:24,610
I hope.

572
00:39:24,610 --> 00:39:26,460
Did I check that I get it right?

573
00:39:29,220 --> 00:39:30,020
Yes.

574
00:39:30,020 --> 00:39:30,520
OK.

575
00:39:38,060 --> 00:39:41,850
I could have written
what B is here, since I

576
00:39:41,850 --> 00:39:46,130
have B transpose as a product.

577
00:39:46,130 --> 00:39:56,570
B is R_0, Q_1, Q_2.

578
00:39:56,570 --> 00:39:58,770
OK.

579
00:39:58,770 --> 00:40:05,360
And I want to multiply
by u_null and get 0.

580
00:40:05,360 --> 00:40:06,760
OK.

581
00:40:06,760 --> 00:40:08,680
So what should u_null be?

582
00:40:12,840 --> 00:40:22,980
u_null should be -- this
part is giving us a 0,

583
00:40:22,980 --> 00:40:25,340
so this is like gone.

584
00:40:25,340 --> 00:40:31,070
So you see the two are the same.

585
00:40:31,070 --> 00:40:38,580
So what vectors are
perpendicular to those?

586
00:40:38,580 --> 00:40:50,360
The answer is the u_null is
a combination of the columns

587
00:40:50,360 --> 00:40:57,170
of Q_2 transpose.

588
00:40:57,170 --> 00:41:01,750
It's the Q_2 part that's
telling us about the null space.

589
00:41:01,750 --> 00:41:05,670
It was the Q_1 part that
gave the particular solution,

590
00:41:05,670 --> 00:41:08,750
it's the Q_2 part that
gives the general solution.

591
00:41:08,750 --> 00:41:15,000
In other words, u_null
is Q_2 transpose times

592
00:41:15,000 --> 00:41:21,280
any vector, let me call
it z, this is any z.

593
00:41:21,280 --> 00:41:23,530
OK.

594
00:41:23,530 --> 00:41:28,050
Now this has my n minus
p degrees of freedom.

595
00:41:35,430 --> 00:41:38,390
Sorry, I'm trying to
do quite a bit here.

596
00:41:38,390 --> 00:41:39,990
I'm trying to say
how you actually

597
00:41:39,990 --> 00:41:49,810
solve rectangular systems
when they're not determinate.

598
00:41:52,410 --> 00:41:54,730
There are many solutions.

599
00:41:54,730 --> 00:41:57,810
This is a good particular
solution to find,

600
00:41:57,810 --> 00:42:00,670
and this is a good way to
find the general solution,

601
00:42:00,670 --> 00:42:01,670
the null space solution.

602
00:42:01,670 --> 00:42:10,580
This is a combination
of the other columns.

603
00:42:10,580 --> 00:42:11,890
OK.

604
00:42:11,890 --> 00:42:14,030
All right, now
we're done really,

605
00:42:14,030 --> 00:42:19,830
because I now know
what u looks like; u

606
00:42:19,830 --> 00:42:23,690
looks like this part,
which I've computed,

607
00:42:23,690 --> 00:42:27,940
and this part, which
has the freedom.

608
00:42:27,940 --> 00:42:30,370
Let me put those
two parts together.

609
00:42:30,370 --> 00:42:37,200
So now I want to minimize --
so I'm near the end here --

610
00:42:37,200 --> 00:42:41,440
A*u minus B, but
A*u is u_particular,

611
00:42:41,440 --> 00:42:44,660
and I have u_particular here.

612
00:42:44,660 --> 00:42:50,210
Q_1 transpose R
minus transpose d,

613
00:42:50,210 --> 00:42:55,900
that's u_particular, plus
u_null, and that's this.

614
00:42:55,900 --> 00:43:06,430
This u_null was also here;
u_null was any Q_2 transpose z,

615
00:43:06,430 --> 00:43:07,640
right.

616
00:43:07,640 --> 00:43:10,480
All that is u, A*u minus b.

617
00:43:15,880 --> 00:43:16,420
OK.

618
00:43:19,730 --> 00:43:23,050
Up to possibly screwing
up on some transposes,

619
00:43:23,050 --> 00:43:26,790
this is the right method.

620
00:43:26,790 --> 00:43:29,400
So this is a fixed solution.

621
00:43:29,400 --> 00:43:31,930
I just want to write
that as a different way.

622
00:43:31,930 --> 00:43:37,350
Minimize A Q transpose z.

623
00:43:37,350 --> 00:43:41,380
Now, we're minimizing
over the z's.

624
00:43:41,380 --> 00:43:50,300
So u had n components, but
somehow p degrees of freedom

625
00:43:50,300 --> 00:43:53,950
were used up by the
constraint B*u equal d.

626
00:43:53,950 --> 00:43:57,320
And we have the n minus
p true degrees of freedom

627
00:43:57,320 --> 00:43:58,610
are in the z.

628
00:43:58,610 --> 00:44:03,430
So there's this minus the b.

629
00:44:03,430 --> 00:44:05,410
This is all known stuff.

630
00:44:05,410 --> 00:44:13,000
A Q_1 transpose R minus
transpose d, square.

631
00:44:13,000 --> 00:44:14,040
OK.

632
00:44:14,040 --> 00:44:17,020
I'm there.

633
00:44:17,020 --> 00:44:20,830
So this is a standard
minimization problem.

634
00:44:20,830 --> 00:44:26,850
Minimize, shall I
call this A tilde z?

635
00:44:26,850 --> 00:44:30,850
And I'll call all
this stuff b tilde.

636
00:44:30,850 --> 00:44:34,410
And the solution is found
from the normal equations

637
00:44:34,410 --> 00:44:42,250
A tilde transpose, A
tilde times the best

638
00:44:42,250 --> 00:44:46,760
z, I'll put a hat
on it to emphasize

639
00:44:46,760 --> 00:44:53,131
that it's the great one, is
A tilde transpose b tilde.

640
00:44:53,131 --> 00:44:53,630
OK.

641
00:44:57,500 --> 00:45:02,050
Finished that process
without leaving myself

642
00:45:02,050 --> 00:45:05,210
a lot of time for
the other method.

643
00:45:05,210 --> 00:45:13,540
Conclusion here, that after
you've done the QR step, the qr

644
00:45:13,540 --> 00:45:20,570
command, and then after
you solved a linear system

645
00:45:20,570 --> 00:45:27,930
with the R transpose, and you've
multiplied by Q's, and you've

646
00:45:27,930 --> 00:45:32,810
ended up with this problem
with a new matrix A tilde and B

647
00:45:32,810 --> 00:45:36,230
tilde, then you just
do the normal equation.

648
00:45:36,230 --> 00:45:44,580
The web will have the code
that takes those steps,

649
00:45:44,580 --> 00:45:47,150
reaches this conclusion,
and solves it.

650
00:45:47,150 --> 00:45:47,840
OK.

651
00:45:47,840 --> 00:45:50,760
So that's the null space method.

652
00:45:50,760 --> 00:45:56,150
And it would be our
method of choice when --

653
00:45:56,150 --> 00:46:01,290
so z has n minus p components.

654
00:46:01,290 --> 00:46:05,600
If p is near n, then they're
not many z's and this

655
00:46:05,600 --> 00:46:07,340
is highly efficient.

656
00:46:07,340 --> 00:46:07,910
OK.

657
00:46:07,910 --> 00:46:10,940
So the null space
method is one way to go.

658
00:46:10,940 --> 00:46:13,470
Can I just in the
remaining minutes

659
00:46:13,470 --> 00:46:22,040
go back to the Lagrange
multiplier idea?

660
00:46:22,040 --> 00:46:23,900
So what's the Lagrange
multiplier idea?

661
00:46:26,990 --> 00:46:28,980
So let me write
the problem again

662
00:46:28,980 --> 00:46:30,550
as Lagrange would like it.

663
00:46:30,550 --> 00:46:38,060
Minimize A*u minus b squared
subject to B*u equal d.

664
00:46:41,140 --> 00:46:44,290
That's the problem
we're solving.

665
00:46:44,290 --> 00:46:46,940
I should have
written it earlier.

666
00:46:46,940 --> 00:46:50,320
Let me put a star here,
because this is our problem.

667
00:46:50,320 --> 00:46:51,380
OK.

668
00:46:51,380 --> 00:46:55,600
So one way to tackle it
was take that constraint,

669
00:46:55,600 --> 00:46:57,200
give it a heavy weight.

670
00:46:57,200 --> 00:46:59,460
That was method one.

671
00:46:59,460 --> 00:47:06,050
Method two was solve this
constraint in full detail,

672
00:47:06,050 --> 00:47:12,350
get the z's that remain
as degrees of freedom,

673
00:47:12,350 --> 00:47:16,180
plug in u_particular plus
u null space into here,

674
00:47:16,180 --> 00:47:18,800
and then you have
a problem in the z.

675
00:47:18,800 --> 00:47:20,360
That's method two.

676
00:47:20,360 --> 00:47:24,400
Now, so method
three is Lagrange.

677
00:47:24,400 --> 00:47:28,440
So method three would say
OK, what does Lagrange do?

678
00:47:28,440 --> 00:47:37,250
L, we call it the Lagrangian, he
takes this A*u minus b square,

679
00:47:37,250 --> 00:47:40,590
and adds to it some
Lagrange multiplier,

680
00:47:40,590 --> 00:47:48,630
and I'll use maybe the standard
lambda, times B*u minus d,

681
00:47:48,630 --> 00:47:49,130
right?

682
00:47:49,130 --> 00:47:50,250
That's Lagrange's idea.

683
00:47:56,030 --> 00:47:58,230
You recognize Lagrange's idea.

684
00:47:58,230 --> 00:48:02,390
Takes the constraint,
multiply it by a multiplier.

685
00:48:02,390 --> 00:48:12,110
In fact this is p
constraints, so p lambdas.

686
00:48:12,110 --> 00:48:16,390
Lambda's a vector
of p multipliers.

687
00:48:16,390 --> 00:48:19,200
Not just a single one, because
we've got the p constraints.

688
00:48:19,200 --> 00:48:20,740
And now what does Lagrange do?

689
00:48:23,810 --> 00:48:30,770
He sets the derivative
dL / d lambda -- well,

690
00:48:30,770 --> 00:48:34,070
so let me do the dL/du first.

691
00:48:34,070 --> 00:48:40,570
He sets dL/du to 0,
and dL / d lambda to 0.

692
00:48:44,464 --> 00:48:46,130
I could've started
out with this method,

693
00:48:46,130 --> 00:48:50,310
because it's going to lead
us to the equations faster.

694
00:48:50,310 --> 00:48:53,230
What equations do we
get from dL/du equals 0?

695
00:48:53,230 --> 00:48:56,550
What's the gradient
with respect to u?

696
00:48:56,550 --> 00:48:59,770
That gives us A transpose A*u.

697
00:48:59,770 --> 00:49:04,200
Oh, probably we want a 1/2
here, so that the numbers

698
00:49:04,200 --> 00:49:05,220
come out right.

699
00:49:05,220 --> 00:49:13,730
We get A transpose A*u, and
another u part will be the B

700
00:49:13,730 --> 00:49:15,750
lambda.

701
00:49:15,750 --> 00:49:20,760
Taking the derivative of u will
produce a B transpose lambda

702
00:49:20,760 --> 00:49:22,430
out of that.

703
00:49:22,430 --> 00:49:26,050
Yeah, a B transpose
lambda out of that.

704
00:49:26,050 --> 00:49:29,800
And then, in here will
be a linear term in u

705
00:49:29,800 --> 00:49:31,990
that we might as well put
on the right-hand side

706
00:49:31,990 --> 00:49:35,670
as A transpose b.

707
00:49:35,670 --> 00:49:37,400
Familiar.

708
00:49:37,400 --> 00:49:39,010
OK.

709
00:49:39,010 --> 00:49:41,580
And what about dL / d lambda?

710
00:49:41,580 --> 00:49:48,050
Well that's just our
constraint, B*u equals d right.

711
00:49:51,230 --> 00:49:53,110
Having built in the
constraint, when

712
00:49:53,110 --> 00:49:55,520
I take the derivative
with respect to lambda,

713
00:49:55,520 --> 00:49:57,520
the constraint just
comes back again.

714
00:49:57,520 --> 00:50:01,700
So this is now method three.

715
00:50:01,700 --> 00:50:04,440
Solve that system.

716
00:50:04,440 --> 00:50:09,110
And I guess what I want say
in the remaining 30 seconds

717
00:50:09,110 --> 00:50:15,640
is that solving this system
is the same as this one.

718
00:50:15,640 --> 00:50:19,660
Those two are exactly the same.

719
00:50:19,660 --> 00:50:24,080
So that's a system with three
parts, but I can, as always --

720
00:50:24,080 --> 00:50:27,420
maybe I can even get there.

721
00:50:27,420 --> 00:50:30,720
Can you see that
if I take this part

722
00:50:30,720 --> 00:50:36,210
and I subtract A transpose
times the top row

723
00:50:36,210 --> 00:50:41,690
from the bottom row,
what will that give me?

724
00:50:41,690 --> 00:50:46,020
Let me just hope that it
works, well I won't actually.

725
00:50:46,020 --> 00:50:49,780
Time is up, it's asking
too much to do even

726
00:50:49,780 --> 00:50:53,890
this one piece of linear algebra
that can be in the notes.

727
00:50:53,890 --> 00:50:58,740
So this system that we got
as the correct limit equation

728
00:50:58,740 --> 00:51:01,980
is exactly the same
one that Lagrange gets.

729
00:51:01,980 --> 00:51:03,160
So that's one way.

730
00:51:03,160 --> 00:51:07,660
This is a system with
n plus p unknowns.

731
00:51:11,500 --> 00:51:15,020
That's the price you pay
for going Lagrange's route.

732
00:51:15,020 --> 00:51:17,060
You add p unknowns.

733
00:51:17,060 --> 00:51:23,330
This was a system with
n minus p unknowns.

734
00:51:23,330 --> 00:51:26,090
That's because you're
using the constraints

735
00:51:26,090 --> 00:51:28,300
to reduce the problem.

736
00:51:28,300 --> 00:51:34,780
And the original method one
was a method with n unknowns,

737
00:51:34,780 --> 00:51:35,850
the unknowns in u.

738
00:51:35,850 --> 00:51:41,600
So you have the choice n, n
plus p that Lagrange would like,

739
00:51:41,600 --> 00:51:46,910
and n minus p that
Golub/van Loan would prefer.

740
00:51:46,910 --> 00:51:53,810
And usually it's method two or
method three is recommended,

741
00:51:53,810 --> 00:51:56,120
but method one often used.

742
00:51:56,120 --> 00:51:56,690
OK.

743
00:51:56,690 --> 00:52:06,160
So that's the lecture
on this point.

744
00:52:06,160 --> 00:52:07,110
That's today.

745
00:52:07,110 --> 00:52:13,410
And then next week comes
the whole class of problems

746
00:52:13,410 --> 00:52:17,120
like finding velocities
from displacements,

747
00:52:17,120 --> 00:52:20,110
where alpha is a
small parameter.

748
00:52:20,110 --> 00:52:26,650
And then after that come
discussions of the completed

749
00:52:26,650 --> 00:52:32,020
project ones, and the upcoming
extensions into project two.

750
00:52:32,020 --> 00:52:32,620
OK.

751
00:52:32,620 --> 00:52:34,560
See you next week, thanks.

752
00:52:34,560 --> 00:52:35,810
Good.