1
00:00:00,000 --> 00:00:01,950
The following
content is provided

2
00:00:01,950 --> 00:00:06,100
by MIT OpenCourseWare under
a Creative Commons license.

3
00:00:06,100 --> 00:00:08,200
Additional information
about our license

4
00:00:08,200 --> 00:00:10,520
and MIT OpenCourseWare
in general

5
00:00:10,520 --> 00:00:11,930
is available at ocw.mit.edu.

6
00:00:17,920 --> 00:00:19,190
PROFESSOR: OK.

7
00:00:19,190 --> 00:00:20,410
Good.

8
00:00:20,410 --> 00:00:26,440
So I decided to make today's
lecture the one on linear

9
00:00:26,440 --> 00:00:31,230
programming and duality,
which I'd planned for Friday,

10
00:00:31,230 --> 00:00:36,680
and give myself two more
days to learn about ill-posed

11
00:00:36,680 --> 00:00:42,250
and inverse problems, and
then come back to that Friday,

12
00:00:42,250 --> 00:00:47,180
so that -- we've studied
the limits in those problems

13
00:00:47,180 --> 00:00:54,670
of alpha going to infinity or
0, but the scientific question

14
00:00:54,670 --> 00:00:57,370
when there's noise in the
system is finite alpha,

15
00:00:57,370 --> 00:01:01,950
and I want to learn more about
applications and examples.

16
00:01:01,950 --> 00:01:09,180
Can I also say I'm very happy to
have had volunteers for Monday

17
00:01:09,180 --> 00:01:11,780
and Wednesday of
next week to present,

18
00:01:11,780 --> 00:01:14,090
and if a couple of
people might maybe

19
00:01:14,090 --> 00:01:19,620
volunteer for Friday, to share
Friday, I'll be very grateful.

20
00:01:19,620 --> 00:01:24,570
So you could see me after
class, put a hand up now,

21
00:01:24,570 --> 00:01:29,140
send me an email -- all
those would be very good.

22
00:01:29,140 --> 00:01:33,610
And again, I would
be thinking --

23
00:01:33,610 --> 00:01:36,510
since it's just next
week I'm talking about --

24
00:01:36,510 --> 00:01:42,210
that it would be essentially a
report on your Project One that

25
00:01:42,210 --> 00:01:47,270
you would use the
overhead projector maybe,

26
00:01:47,270 --> 00:01:49,890
if that's preferable.

27
00:01:49,890 --> 00:01:50,670
OK.

28
00:01:50,670 --> 00:01:55,550
So I think you'll
like this topic.

29
00:01:55,550 --> 00:02:02,620
It's kind of specific but widely
used -- linear programming --

30
00:02:02,620 --> 00:02:11,170
used in business to maximize
profits, to minimize costs.

31
00:02:11,170 --> 00:02:17,180
And linear means that the
cost function is linear.

32
00:02:17,180 --> 00:02:20,250
That's an inner product -- c*x.

33
00:02:20,250 --> 00:02:22,890
c is a row vector, x
is a column vector,

34
00:02:22,890 --> 00:02:25,720
so that I'm following the
conventions of this subject

35
00:02:25,720 --> 00:02:29,210
here to take these
different shapes,

36
00:02:29,210 --> 00:02:32,750
so let me indicate
what the shapes are.

37
00:02:32,750 --> 00:02:44,600
But the inputs are -- the data
of the problem are -- c and A,

38
00:02:44,600 --> 00:02:47,080
an m by n matrix, and b.

39
00:02:47,080 --> 00:02:55,000
So A is m by n, b is m by
1 -- right-hand side --

40
00:02:55,000 --> 00:02:58,400
and c is 1 by n.

41
00:02:58,400 --> 00:03:04,730
And then the unknown -- this is
the thing that we're to find --

42
00:03:04,730 --> 00:03:10,150
it's a column vector, n by 1.

43
00:03:10,150 --> 00:03:11,250
OK.

44
00:03:11,250 --> 00:03:16,710
And the point is
there are constraints

45
00:03:16,710 --> 00:03:19,240
and those are linear too.

46
00:03:19,240 --> 00:03:22,200
So it's rather unusual to
have a linear cost function.

47
00:03:22,200 --> 00:03:23,540
Right?

48
00:03:23,540 --> 00:03:29,770
Because when you maximize or
minimize some linear function,

49
00:03:29,770 --> 00:03:34,010
well, the thing is just going
up or it's going down --

50
00:03:34,010 --> 00:03:39,570
or in higher dimensions the
same -- and if it's going down,

51
00:03:39,570 --> 00:03:42,740
then the minimum is going
to be at the right-hand end.

52
00:03:42,740 --> 00:03:46,670
Or if it's going up, the minimum
will be at the left-hand end.

53
00:03:46,670 --> 00:03:49,670
And if I'm in more
variables, this idea

54
00:03:49,670 --> 00:03:53,850
will still be true that
the minimum or maximum

55
00:03:53,850 --> 00:03:58,720
will happen at the edges, at
the ends of the allowed region.

56
00:03:58,720 --> 00:04:02,490
And this allowed region,
called the feasible set --

57
00:04:02,490 --> 00:04:07,480
so let me give the name to this
-- these are the allowed x's.

58
00:04:07,480 --> 00:04:09,470
These are the constraints.

59
00:04:09,470 --> 00:04:16,490
And that set is called the
feasible set, feasible meaning

60
00:04:16,490 --> 00:04:19,060
doable.

61
00:04:19,060 --> 00:04:22,830
So those constraints
include inequalities,

62
00:04:22,830 --> 00:04:29,200
because we want finite
intervals, finite regions in n

63
00:04:29,200 --> 00:04:30,550
dimensions.

64
00:04:30,550 --> 00:04:35,460
And I drew a sort
of quick picture

65
00:04:35,460 --> 00:04:38,400
so that you have
a model of this.

66
00:04:38,400 --> 00:04:45,170
So this is a picture with
n equals 3 -- 3 dimensions,

67
00:04:45,170 --> 00:04:49,850
and so the constraints
x greater or equal 0,

68
00:04:49,850 --> 00:04:53,790
x_1 greater or equal 0,
x_2 greater or equal 0,

69
00:04:53,790 --> 00:04:58,550
x_3 greater or equal 0 --
that's what x greater or equal 0

70
00:04:58,550 --> 00:04:59,950
means.

71
00:04:59,950 --> 00:05:02,130
It means all components.

72
00:05:02,130 --> 00:05:04,220
So we're in the quadrant right?

73
00:05:04,220 --> 00:05:08,040
We're in a quarter -- 1/8
sorry -- we're in an octant --

74
00:05:08,040 --> 00:05:12,680
1/8 of three-dimensional
space, the positive octant.

75
00:05:12,680 --> 00:05:18,870
And then if I draw maybe just
one, just put in one equation,

76
00:05:18,870 --> 00:05:25,340
one plane, that would cut off
a piece of that octant,

77
00:05:25,340 --> 00:05:32,140
so that A*x greater or equal
b, depending on the signs,

78
00:05:32,140 --> 00:05:39,320
but the feasible set could
well be the tetrahedron,

79
00:05:39,320 --> 00:05:44,950
the little piece of the octant
that's cut out by this plane.

80
00:05:44,950 --> 00:05:48,480
Or if our constraint
was an equality,

81
00:05:48,480 --> 00:05:51,370
the feasible set
would be the triangle.

82
00:05:51,370 --> 00:05:57,120
So A*x equal to b would
lead to the triangle,

83
00:05:57,120 --> 00:06:01,400
and A*x greater or equal b, if
we pick the signs correctly,

84
00:06:01,400 --> 00:06:05,880
would be the pyramid, would
include also this corner,

85
00:06:05,880 --> 00:06:10,160
because there'd be some volume.
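
A minimal sketch of the feasible-set test just described, assuming a made-up single constraint (the plane x_1 + x_2 + 2 x_3 = 4 is not from the lecture). The helper name is_feasible and the tolerance are illustrative choices. The signs are picked so that the corner at the origin is kept, which gives the little tetrahedron.

```python
def is_feasible(A, b, x, tol=1e-12):
    """Return True if x >= 0 and A x >= b hold componentwise."""
    # x >= 0 means every component is nonnegative.
    if any(xj < -tol for xj in x):
        return False
    # Each row of A gives one inequality (row . x) >= b_i.
    for row, bi in zip(A, b):
        if sum(aij * xj for aij, xj in zip(row, x)) < bi - tol:
            return False
    return True


# Assumed data: writing x1 + x2 + 2*x3 <= 4 as -x1 - x2 - 2*x3 >= -4
# puts it in the "A x >= b" form and keeps the origin in the set.
A = [[-1.0, -1.0, -2.0]]
b = [-4.0]

inside = is_feasible(A, b, [1.0, 1.0, 0.5])     # 1 + 1 + 1 = 3, within the plane
outside = is_feasible(A, b, [3.0, 3.0, 0.0])    # 3 + 3 = 6, beyond the plane
negative = is_feasible(A, b, [-1.0, 0.0, 0.0])  # violates x >= 0
```

With an equality constraint instead, the same check would keep only points on the triangle where the plane meets the octant.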

86
00:06:10,160 --> 00:06:10,740
OK.

87
00:06:10,740 --> 00:06:18,080
So the feasible set
is a polyhedron.

88
00:06:18,080 --> 00:06:23,030
It's like a polygon, only
up into n dimensions,

89
00:06:23,030 --> 00:06:25,320
so we use the word polyhedron.

90
00:06:25,320 --> 00:06:34,130
And it's got corners, and the
whole point of the linear cost,

91
00:06:34,130 --> 00:06:37,220
the linear objective
function c*x --

92
00:06:37,220 --> 00:06:40,540
so this is just c_1*x_1 plus...

93
00:06:40,540 --> 00:06:46,850
plus c_n*x_n; that's
what that means.

94
00:06:46,850 --> 00:06:50,070
If I take derivatives,
I get constants.

95
00:06:50,070 --> 00:06:54,090
I don't set derivatives to
0 in this type of problem.

96
00:06:54,090 --> 00:06:58,480
I look at the endpoints,
at the corners.

97
00:06:58,480 --> 00:07:02,010
And that's where the minimum
and maximum will occur.

98
00:07:02,010 --> 00:07:05,540
So it's just a question of
finding the right corner.

99
00:07:05,540 --> 00:07:09,320
That's the problem: how to
find the winning corner.

100
00:07:16,020 --> 00:07:21,670
It's an interesting competition
between two quite different

101
00:07:21,670 --> 00:07:27,350
approaches: the famous approach
-- so let me write these two.

102
00:07:27,350 --> 00:07:36,200
The simplex method is the
best established, best known

103
00:07:36,200 --> 00:07:39,680
approach for solving
these problems.

104
00:07:39,680 --> 00:07:42,490
What's the idea of
the simplex method?

105
00:07:42,490 --> 00:07:44,570
The simplex method
finds a corner.

106
00:07:47,220 --> 00:07:53,210
A corner is a case where we
have some equality signs.

107
00:07:53,210 --> 00:08:00,090
A corner is the edge, the limit
where maybe this one still

108
00:08:00,090 --> 00:08:05,090
has x_1 positive but
it's down in this plane

109
00:08:05,090 --> 00:08:09,380
so it has maybe x_3
is 0 for this guy.

110
00:08:09,380 --> 00:08:14,710
So that corner has x_3 equals
0 and it also lies right

111
00:08:14,710 --> 00:08:19,390
on the plane, so it
has A*x equal to b.

112
00:08:19,390 --> 00:08:24,040
This corner -- well, I guess
that corner has all these guys

113
00:08:24,040 --> 00:08:27,790
equal 0: x_1 equals 0, x_2
equals 0, x_3 equals 0,

114
00:08:27,790 --> 00:08:31,770
but A*x -- inequality's
holding here for this corner

115
00:08:31,770 --> 00:08:35,170
that's hiding behind the face.

116
00:08:35,170 --> 00:08:40,860
Anyway, corners are points
where some of the constraints

117
00:08:40,860 --> 00:08:47,990
are tight or active
and others are not.

118
00:08:47,990 --> 00:08:52,010
Well, you might say:
just check them all,

119
00:08:52,010 --> 00:08:56,340
but the trouble is there
are lots of corners.

120
00:08:56,340 --> 00:09:03,680
If we're in n dimensions and
we have m constraint equations,

121
00:09:03,680 --> 00:09:08,580
then the number of corners
goes up exponentially.

122
00:09:08,580 --> 00:09:11,890
So no way to check all of them.
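
A rough count of why checking every corner is hopeless, under one stated assumption: a corner in n dimensions needs n of the m + n inequalities (m from A*x >= b, n from x >= 0) to be tight, so the binomial coefficient "m + n choose n" is an upper bound on the number of corners. The function name is hypothetical.

```python
from math import comb


def max_corner_candidates(m, n):
    """Upper bound on corners: ways to pick n tight constraints out of m + n."""
    return comb(m + n, n)


# The tetrahedron picture: one plane (m = 1) in three dimensions (n = 3).
tetrahedron_bound = max_corner_candidates(1, 3)

# The bound grows very quickly when m and n grow together.
growth = [(n, max_corner_candidates(n, n)) for n in (3, 10, 20)]
```

For the tetrahedron the bound is 4, which matches its four corners; with 20 constraints in 20 dimensions it is already above 10^11.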

123
00:09:11,890 --> 00:09:16,240
So the simplex method
had a better idea.

124
00:09:16,240 --> 00:09:18,150
The simplex method
found one of them --

125
00:09:18,150 --> 00:09:21,940
and already that's a little
bit of a job, to find a corner,

126
00:09:21,940 --> 00:09:24,370
but finds one.

127
00:09:24,370 --> 00:09:26,170
And then what the
simplex method does,

128
00:09:26,170 --> 00:09:31,110
it stays entirely, it
moves along the edges.

129
00:09:31,110 --> 00:09:35,190
So from here, it
will look to see in

130
00:09:35,190 --> 00:09:41,540
which direction would the
cost go down, because we're

131
00:09:41,540 --> 00:09:43,550
trying to minimize the cost.

132
00:09:43,550 --> 00:09:45,130
So it would check
these directions.

133
00:09:47,950 --> 00:09:55,000
Each of those directions
we're releasing one equality.

134
00:09:55,000 --> 00:10:00,060
We're allowing one equality
to be an inequality,

135
00:10:00,060 --> 00:10:02,080
and that moves us along.

136
00:10:02,080 --> 00:10:05,440
So the simplex
method has two steps.

137
00:10:05,440 --> 00:10:07,800
It checks each of
these directions

138
00:10:07,800 --> 00:10:13,150
to find out which way will
the cost drop fastest.

139
00:10:13,150 --> 00:10:18,260
It chooses the direction in
which the cost -- the gradient,

140
00:10:18,260 --> 00:10:20,880
the component of the
gradient, you could say,

141
00:10:20,880 --> 00:10:26,290
along that edge is the biggest,
or maybe the most negative.

142
00:10:26,290 --> 00:10:31,820
And then, once it decides
which way to go, it goes --

143
00:10:31,820 --> 00:10:34,660
maybe it takes this direction
-- it goes, goes, goes, goes,

144
00:10:34,660 --> 00:10:38,580
goes until it hits
another corner.

145
00:10:38,580 --> 00:10:41,870
So that's the end
of the simplex step,

146
00:10:41,870 --> 00:10:43,750
when it reaches another corner.

147
00:10:43,750 --> 00:10:46,560
That completes one simplex step.

148
00:10:46,560 --> 00:10:50,100
Then, from this corner, it
will look the three ways

149
00:10:50,100 --> 00:10:52,120
it could go here.

150
00:10:52,120 --> 00:10:55,960
Well, it's not going to
pick this way, we know,

151
00:10:55,960 --> 00:11:00,760
because in this direction
the cost was decreasing

152
00:11:00,760 --> 00:11:03,180
or we wouldn't have taken it.

153
00:11:03,180 --> 00:11:04,860
We wouldn't have
taken that direction

154
00:11:04,860 --> 00:11:06,290
except that the cost went down.

155
00:11:06,290 --> 00:11:08,530
So if we came back,
the cost would go up.

156
00:11:08,530 --> 00:11:09,680
No good.

157
00:11:09,680 --> 00:11:13,760
So going down would be
one of these two ways.

158
00:11:13,760 --> 00:11:18,340
Maybe it goes down
in this direction,

159
00:11:18,340 --> 00:11:20,860
so we decide on that direction.

160
00:11:20,860 --> 00:11:24,420
We follow it until
we hit a new corner,

161
00:11:24,420 --> 00:11:28,690
and eventually we're going
to get to the winning corner

162
00:11:28,690 --> 00:11:33,060
because there are only a
finite number of corners.

163
00:11:33,060 --> 00:11:35,890
And how will we know
it's the winning corner?

164
00:11:35,890 --> 00:11:37,870
Well, we'll know that
corner is a winner

165
00:11:37,870 --> 00:11:42,340
if, in every direction,
the cost goes up.

166
00:11:42,340 --> 00:11:45,900
If the cost goes up
in all directions,

167
00:11:45,900 --> 00:11:49,210
along all the edges
out of that corner,

168
00:11:49,210 --> 00:11:50,470
then that corner has won.

169
00:11:50,470 --> 00:11:52,150
It's the minimum.

170
00:11:52,150 --> 00:11:58,230
I'm using linearity here -- the
fact that you know everything

171
00:11:58,230 --> 00:12:01,540
by traveling along, by
looking along those edges.

172
00:12:01,540 --> 00:12:05,090
So the simplex method
is a big success.

173
00:12:05,090 --> 00:12:08,930
Because in reality,
in practice, it

174
00:12:08,930 --> 00:12:10,860
turns out that the
number of edges

175
00:12:10,860 --> 00:12:16,020
that you have to travel
to get to the winner

176
00:12:16,020 --> 00:12:18,130
doesn't grow exponentially.

177
00:12:18,130 --> 00:12:20,480
I mean in principle it could.

178
00:12:20,480 --> 00:12:27,640
People have dreamt up really
desperate examples in which

179
00:12:27,640 --> 00:12:31,260
following the simplex method
you could take a long time,

180
00:12:31,260 --> 00:12:35,470
but on average you don't,
and in practice you don't.

181
00:12:35,470 --> 00:12:39,600
So it's a very
good method and was

182
00:12:39,600 --> 00:12:43,280
totally the method of choice.

183
00:12:43,280 --> 00:12:48,680
But a competitor has arrived.

184
00:12:48,680 --> 00:12:54,260
And that competitor goes under
the name of interior point

185
00:12:54,260 --> 00:13:06,020
method, and you can
guess what that method is

186
00:13:06,020 --> 00:13:10,010
doing -- quite different,
totally different system.

187
00:13:10,010 --> 00:13:14,830
That method is inside
the feasible set.

188
00:13:14,830 --> 00:13:18,010
It finds a point somewhere
near the middle maybe.

189
00:13:18,010 --> 00:13:30,010
And then it does a normal,
gradient-type approach

190
00:13:30,010 --> 00:13:32,260
from your point.

191
00:13:32,260 --> 00:13:35,450
It figures out
which way to move.

192
00:13:35,450 --> 00:13:38,830
It moves, but it
doesn't go outside.

193
00:13:38,830 --> 00:13:41,250
It doesn't even reach the
boundary of the feasible

194
00:13:41,250 --> 00:13:43,820
set, because if you reach the
boundary of the feasible set,

195
00:13:43,820 --> 00:13:47,550
you're out of the interior
and the method is not

196
00:13:47,550 --> 00:13:50,420
going to operate.

197
00:13:50,420 --> 00:13:53,300
Well, the crudest method
would be to follow the gradient,

198
00:13:53,300 --> 00:14:01,430
but we know from
several situations

199
00:14:01,430 --> 00:14:08,990
that gradient descent
can be less than optimal.

200
00:14:08,990 --> 00:14:10,650
So this is more subtle.

201
00:14:10,650 --> 00:14:14,400
Well, Newton's method
actually -- I'll explain.

202
00:14:14,400 --> 00:14:18,140
So this is actually the
content of my lecture --

203
00:14:18,140 --> 00:14:20,700
this interior point method.

204
00:14:20,700 --> 00:14:26,770
And let me just
mention a few names.

205
00:14:26,770 --> 00:14:30,090
People thought of interior
point methods long ago,

206
00:14:30,090 --> 00:14:37,910
but a big splash
came when Karmarkar

207
00:14:37,910 --> 00:14:40,760
proposed an interior
point method

208
00:14:40,760 --> 00:14:49,800
and proved that it converged
faster than simplex

209
00:14:49,800 --> 00:14:51,440
method in some problems.

210
00:14:51,440 --> 00:14:53,250
Well, he said all
problems, actually.

211
00:14:53,250 --> 00:15:00,370
His advertising of the
message was pretty generous.

212
00:15:00,370 --> 00:15:02,600
The sort of claim
that was around

213
00:15:02,600 --> 00:15:07,170
was, you know, ten times as
fast as the simplex method,

214
00:15:07,170 --> 00:15:10,670
generally, and it was on the
front page of the New York

215
00:15:10,670 --> 00:15:16,230
Times, and I remember going
to a lecture in Boston

216
00:15:16,230 --> 00:15:20,590
with lights, TV lights
on and everything.

217
00:15:20,590 --> 00:15:28,610
Well, maybe his exact method
now isn't so much used,

218
00:15:28,610 --> 00:15:35,330
but you have to give him credit
for stirring up the whole world

219
00:15:35,330 --> 00:15:42,110
of optimization because the
result of Karmarkar's method --

220
00:15:42,110 --> 00:15:49,190
and there were others -- and
I'll say barrier methods,

221
00:15:49,190 --> 00:15:52,080
and that's what
I'll try to explain.

222
00:15:55,700 --> 00:15:58,480
He stirred up the whole
world so that the experts

223
00:15:58,480 --> 00:16:02,400
in optimization began looking
again at interior point

224
00:16:02,400 --> 00:16:05,240
methods, seeing that
they did have some merit,

225
00:16:05,240 --> 00:16:06,510
improving them.

226
00:16:06,510 --> 00:16:11,010
And now for, I would
say particularly

227
00:16:11,010 --> 00:16:19,270
for large, sparse problems,
these are a way to go.

228
00:16:19,270 --> 00:16:20,550
These are preferred now.

229
00:16:20,550 --> 00:16:25,260
So this is the normal situation
in scientific computing:

230
00:16:25,260 --> 00:16:30,010
that any method
that's good, it's

231
00:16:30,010 --> 00:16:32,310
still not good for everything.

232
00:16:32,310 --> 00:16:37,350
It's got a range of problems
where it's successful

233
00:16:37,350 --> 00:16:40,560
and a range of problems
where some competitor wins.

234
00:16:40,560 --> 00:16:43,850
So that's the situation now.

235
00:16:43,850 --> 00:16:44,640
These are methods.

236
00:16:44,640 --> 00:16:48,350
This is certainly
not out of date,

237
00:16:48,350 --> 00:16:51,330
and I'm sure it's
the method of choice

238
00:16:51,330 --> 00:16:54,560
and it's carefully coded
and well understood,

239
00:16:54,560 --> 00:16:59,530
but these are quite effective.

240
00:16:59,530 --> 00:17:00,250
OK.

241
00:17:00,250 --> 00:17:03,390
So my job then is
to say something

242
00:17:03,390 --> 00:17:06,270
about these interior
point methods.

243
00:17:06,270 --> 00:17:13,980
And the beauty of these
is that the primal --

244
00:17:13,980 --> 00:17:16,885
this is called the
primal problem.

245
00:17:16,885 --> 00:17:17,510
Primal problem.

246
00:17:17,510 --> 00:17:20,260
And often you write
a P for primal.

247
00:17:20,260 --> 00:17:22,690
It means the given problem.

248
00:17:22,690 --> 00:17:25,250
And over here is
the dual problem.

249
00:17:25,250 --> 00:17:34,810
So you put a D, dual problem.
[UNINTELLIGIBLE PHRASE]

250
00:17:34,810 --> 00:17:40,590
It involves the same
data: the same b,

251
00:17:40,590 --> 00:17:47,280
the same A, and the same
c, but a new variable y.

252
00:17:47,280 --> 00:18:03,420
And y
is really

253
00:18:03,420 --> 00:18:06,590
the Lagrange multiplier
in the original problem

254
00:18:06,590 --> 00:18:08,360
for the constraints.
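
For reference, the pair of problems can be written side by side. This is a reconstruction from the constraints quoted in the lecture (A*x greater or equal b, x greater or equal 0, y*A less or equal c, y greater or equal 0), with c and y taken as row vectors and x and b as column vectors; the labels (P) and (D) follow the spoken convention.

```latex
\begin{aligned}
\text{(P)}\quad & \text{minimize } cx   && \text{subject to } Ax \ge b,\; x \ge 0,\\
\text{(D)}\quad & \text{maximize } yb   && \text{subject to } yA \le c,\; y \ge 0,
\end{aligned}
\qquad A \in \mathbb{R}^{m \times n},\;
b \in \mathbb{R}^{m \times 1},\;
c \in \mathbb{R}^{1 \times n},\;
x \in \mathbb{R}^{n \times 1},\;
y \in \mathbb{R}^{1 \times m}.
```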

255
00:18:12,950 --> 00:18:19,190
So I won't go at the dual
problem exactly that way.

256
00:18:19,190 --> 00:18:30,320
I'm going to ask you just
to consider this problem

257
00:18:30,320 --> 00:18:32,940
and show you the
relation between the two.

258
00:18:32,940 --> 00:18:36,050
So what I want to say is
that these two problems --

259
00:18:36,050 --> 00:18:43,910
the primal and the dual, which
use the same data A, b, c,

260
00:18:43,910 --> 00:18:49,351
are intimately related, and sort
of solving one solves the other

261
00:18:49,351 --> 00:18:49,850
one.

262
00:18:49,850 --> 00:18:51,510
Actually is this it?

263
00:18:51,510 --> 00:18:53,200
That applies to
the simplex method.

264
00:18:53,200 --> 00:18:57,270
When the simplex method
finds the best corner,

265
00:18:57,270 --> 00:18:59,700
we could read off the
Lagrange multipliers,

266
00:18:59,700 --> 00:19:01,360
we could read off y.

267
00:19:01,360 --> 00:19:04,790
We could read off the optimal y.

268
00:19:04,790 --> 00:19:07,630
So my picture was
in the primal case,

269
00:19:07,630 --> 00:19:11,870
but there's a dual
picture in the dual case.

270
00:19:11,870 --> 00:19:12,680
OK.

271
00:19:12,680 --> 00:19:18,010
So we have a minimum problem
and a maximum problem,

272
00:19:18,010 --> 00:19:21,770
and I'm using this word duality.

273
00:19:21,770 --> 00:19:25,740
So what I want to
do is tell you how

274
00:19:25,740 --> 00:19:30,110
do we recognize the winning
corner in the primal problem,

275
00:19:30,110 --> 00:19:33,810
and it's beautiful.

276
00:19:33,810 --> 00:19:40,380
So at the best --
so the optimal x,

277
00:19:40,380 --> 00:19:45,580
let me call it x
star and y star,

278
00:19:45,580 --> 00:19:52,880
have min over there
equal the max here.

279
00:19:52,880 --> 00:20:01,170
Min of all the c*x's,
which is c x star,

280
00:20:01,170 --> 00:20:08,370
equal to a maximum over all of
the y's of the y*b's, which is

281
00:20:08,370 --> 00:20:10,600
y star b.

282
00:20:10,600 --> 00:20:12,850
So these are equal
at the winner.

283
00:20:15,420 --> 00:20:21,450
That's the essence
of this duality.

284
00:20:21,450 --> 00:20:25,850
Duality is about two problems
that use the same data

285
00:20:25,850 --> 00:20:27,585
but they look quite different.

286
00:20:27,585 --> 00:20:29,710
You know, they're using
the data in different ways.

287
00:20:29,710 --> 00:20:34,660
The cost function there showed
up in the constraint here.

288
00:20:34,660 --> 00:20:38,470
The constraint b there showed
up in the cost function here.

289
00:20:38,470 --> 00:20:45,620
And even A got flipped, because
if I use my usual column vector

290
00:20:45,620 --> 00:20:48,920
notation -- if I just
transpose this --

291
00:20:48,920 --> 00:20:56,430
this would be A transpose y
transpose less or equal to c

292
00:20:56,430 --> 00:20:57,130
transpose.

293
00:20:57,130 --> 00:21:00,980
If I wanted to stay with column
vectors y transpose and c

294
00:21:00,980 --> 00:21:04,700
transpose, then it would
be the transpose of A

295
00:21:04,700 --> 00:21:06,470
that would appear.

296
00:21:06,470 --> 00:21:12,150
So I'll just put transpose
with two exclamation marks.

297
00:21:12,150 --> 00:21:15,350
That's typical.

298
00:21:15,350 --> 00:21:20,030
And you often see
the word adjoint.

299
00:21:20,030 --> 00:21:24,710
So there are methods in
differential equations,

300
00:21:24,710 --> 00:21:27,380
in optimization,
called adjoint methods.

301
00:21:27,380 --> 00:21:31,630
Adjoint is just, really,
another word for transpose.

302
00:21:31,630 --> 00:21:36,260
It's a word that applies in
differential equations as well

303
00:21:36,260 --> 00:21:41,960
as matrices, so it's kind of
a better word, you could say,

304
00:21:41,960 --> 00:21:45,040
where transpose we
usually apply to matrices,

305
00:21:45,040 --> 00:21:48,640
but totally the same
idea, identical idea.

306
00:21:48,640 --> 00:21:49,510
OK.

307
00:21:49,510 --> 00:21:59,220
So the wonderful thing is
that at the moment of success,

308
00:21:59,220 --> 00:22:03,140
at the moment of
optimality, these are equal.

309
00:22:03,140 --> 00:22:05,750
A minimum equals a maximum.

310
00:22:05,750 --> 00:22:12,090
And that's one way to recognize
that you've succeeded,

311
00:22:12,090 --> 00:22:15,510
and that's one way to
measure how far you have

312
00:22:15,510 --> 00:22:18,430
to go, with the duality gap.

313
00:22:18,430 --> 00:22:23,780
So the duality gap
would be the difference.

314
00:22:23,780 --> 00:22:29,150
If you had a particular
y that wasn't the winner,

315
00:22:29,150 --> 00:22:32,270
a particular x that
wasn't the winner,

316
00:22:32,270 --> 00:22:38,790
the duality gap would be the
difference between c*x and y*b.

317
00:22:38,790 --> 00:22:44,920
And what I'm saying is that when
that duality gap narrows to 0,

318
00:22:44,920 --> 00:22:47,790
you've got it.

319
00:22:47,790 --> 00:22:51,610
When this narrows to 0, you've
brought c*x down as far as you

320
00:22:51,610 --> 00:22:56,070
could, you've raised y*b
up as far as you could.

321
00:22:56,070 --> 00:23:02,240
And if you did it right, if
you've got to the optimum,

322
00:23:02,240 --> 00:23:06,400
then the duality gap
disappeared -- became 0.
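
A small numerical sketch of the duality gap, assuming made-up data that is not from the lecture: c = (5, 3, 8), a single constraint x_1 + x_2 + 2 x_3 >= 4, so y has one component. The function names are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def duality_gap(c, x, y, b):
    """Gap c.x - y.b for a feasible primal x and a feasible dual y."""
    return dot(c, x) - dot(y, b)


# Assumed example data.
c = [5.0, 3.0, 8.0]
b = [4.0]

# A feasible pair that is not optimal: x1 + x2 + 2*x3 = 4 and y*A = (2, 2, 4) <= c.
x_early, y_early = [2.0, 2.0, 0.0], [2.0]
gap_early = duality_gap(c, x_early, y_early, b)   # 16 - 8

# The optimal pair for this data: the corner (0, 4, 0) and y = 3.
x_star, y_star = [0.0, 4.0, 0.0], [3.0]
gap_optimal = duality_gap(c, x_star, y_star, b)   # 12 - 12
```

In an iterative method, a gap like gap_early shrinking toward 0 is the signal that the current x and y are close to the winners.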

323
00:23:06,400 --> 00:23:10,440
So that's a measure
of am I at the answer,

324
00:23:10,440 --> 00:23:14,620
am I close, you know, if we're
going to do an iterative method

325
00:23:14,620 --> 00:23:15,710
as I'm planning.

326
00:23:15,710 --> 00:23:18,950
So that's the point, of course.

327
00:23:18,950 --> 00:23:23,910
These interior point
methods will be iterative.

328
00:23:26,870 --> 00:23:29,900
We step, we never
actually allow them

329
00:23:29,900 --> 00:23:36,660
to get to the absolute corner
until maybe at the last minute.

330
00:23:36,660 --> 00:23:40,350
So here, let me draw a
picture of how interior point

331
00:23:40,350 --> 00:23:41,620
methods might work.

332
00:23:41,620 --> 00:23:47,250
So here is the feasible set
-- some kind of a polyhedron,

333
00:23:47,250 --> 00:23:48,780
whatever.

334
00:23:48,780 --> 00:23:54,030
So think of that as a kind of
a diamond, a twenty-four carat

335
00:23:54,030 --> 00:23:54,530
diamond.

336
00:23:54,530 --> 00:23:55,940
OK?

337
00:23:55,940 --> 00:24:00,730
And start at a point inside.

338
00:24:00,730 --> 00:24:05,030
And somehow find the gradient,
decide which way to move,

339
00:24:05,030 --> 00:24:07,790
dot dot dot dot.

340
00:24:07,790 --> 00:24:10,920
And there'll be
some barrier here

341
00:24:10,920 --> 00:24:13,760
which is going to prevent
us from reaching it,

342
00:24:13,760 --> 00:24:15,180
so we'll stop.

343
00:24:15,180 --> 00:24:18,560
So that will be one
step, and from here we

344
00:24:18,560 --> 00:24:21,990
will do the same
thing, whatever it is.

345
00:24:21,990 --> 00:24:23,590
It'll be Newton's
method, actually.

346
00:24:23,590 --> 00:24:24,220
You'll see.

347
00:24:24,220 --> 00:24:26,710
It's just Newton's method.

348
00:24:26,710 --> 00:24:30,160
The most fundamental way to
solve non-linear equations

349
00:24:30,160 --> 00:24:31,900
is Newton's method.

350
00:24:31,900 --> 00:24:34,950
And it'll take
another direction.

351
00:24:34,950 --> 00:24:38,700
Again it'll stop, and the
thing will follow some path.

352
00:24:38,700 --> 00:24:46,370
And then maybe at this point
the duality gap is very small.

353
00:24:46,370 --> 00:24:49,270
We'll realize that
this is the winner.

354
00:24:49,270 --> 00:24:54,780
So we could, at the last minute,
say: OK, jump to the winner.

355
00:24:54,780 --> 00:24:58,890
But it's this path
through the interior

356
00:24:58,890 --> 00:25:01,710
that we're really interested in.

357
00:25:01,710 --> 00:25:06,070
OK, so I'm giving a sort
of general picture of it.

358
00:25:06,070 --> 00:25:09,840
And now I'm ready
to do two things.

359
00:25:09,840 --> 00:25:12,870
One is the nice
little bit of algebra

360
00:25:12,870 --> 00:25:19,560
that says that this duality gap
is always greater or equal 0.

361
00:25:19,560 --> 00:25:20,600
OK.

362
00:25:20,600 --> 00:25:23,840
So that's called weak duality.

363
00:25:23,840 --> 00:25:32,250
Weak duality, which is easy to
prove, says that, always, c*x,

364
00:25:32,250 --> 00:25:38,860
for any feasible x, is greater
or equal to y*b for any

365
00:25:38,860 --> 00:25:40,410
feasible y.

366
00:25:40,410 --> 00:25:47,380
So any x and y that
satisfy the constraints.

367
00:25:51,740 --> 00:25:58,350
I should say satisfying
the constraints.

368
00:25:58,350 --> 00:26:01,640
So weak duality I'll
now prove in one second.

369
00:26:05,210 --> 00:26:16,540
And the point is that, as
I push to bring c*x down --

370
00:26:16,540 --> 00:26:22,380
minimize -- as I push to
move y*b up -- maximize --

371
00:26:22,380 --> 00:26:25,760
they will meet, at the winner.

372
00:26:25,760 --> 00:26:29,080
OK, now how do I prove
c*x greater or equal y*b?

373
00:26:29,080 --> 00:26:30,980
Let me try to prove that.

374
00:26:30,980 --> 00:26:34,300
Proof.

375
00:26:34,300 --> 00:26:36,950
OK, so look at y*b.

376
00:26:36,950 --> 00:26:38,020
OK.

377
00:26:38,020 --> 00:26:41,370
Now, so I know something
about the constraints.

378
00:26:41,370 --> 00:26:45,600
A*x is greater or equal to b.

379
00:26:45,600 --> 00:26:51,600
So this b -- I want to say that
this is less or equal to y*A*x.

380
00:26:51,600 --> 00:26:54,730
Now am I allowed to say that?

381
00:26:54,730 --> 00:26:58,580
First of all, y is
feasible; x is feasible.

382
00:26:58,580 --> 00:27:00,550
So they satisfy.

383
00:27:00,550 --> 00:27:04,550
Feasible means that
these are satisfied

384
00:27:04,550 --> 00:27:06,440
and these are satisfied.

385
00:27:09,420 --> 00:27:10,940
OK.

386
00:27:10,940 --> 00:27:14,520
And do you see that
that's really all right?

387
00:27:14,520 --> 00:27:17,550
Well you might say no problem.

388
00:27:17,550 --> 00:27:19,080
A*x is greater or equal b.

389
00:27:19,080 --> 00:27:20,490
It's obvious.

390
00:27:20,490 --> 00:27:25,100
But I have actually used one
more point here, haven't I?

391
00:27:25,100 --> 00:27:29,400
If I have an inequality,
then I'm multiplying it by y,

392
00:27:29,400 --> 00:27:32,240
and it didn't change the
direction of the inequality

393
00:27:32,240 --> 00:27:37,850
sign, and that was because
y is greater or equal 0.

394
00:27:37,850 --> 00:27:41,050
That's where that paid off.

395
00:27:41,050 --> 00:27:44,760
So this used the fact that --
this came from the fact that y

396
00:27:44,760 --> 00:27:48,780
was greater or equal 0 and
A*x was greater or equal b.

397
00:27:51,980 --> 00:27:56,900
Those two facts meant that I
could multiply and preserve

398
00:27:56,900 --> 00:27:58,330
the inequality sign.

399
00:27:58,330 --> 00:28:01,390
And now I'm going to
go to the next step:

400
00:28:01,390 --> 00:28:06,450
y*A is less or equal
c, and there is x.

401
00:28:06,450 --> 00:28:12,420
OK, so you see that I
finally got what I want:

402
00:28:12,420 --> 00:28:14,750
y*b less or equal to c*x.

403
00:28:14,750 --> 00:28:19,030
But what went into that step?

404
00:28:19,030 --> 00:28:25,960
Well, looking here, I had
y*A less or equal to c,

405
00:28:25,960 --> 00:28:31,490
and I also had x greater or
equal 0 by the feasibility

406
00:28:31,490 --> 00:28:32,570
of x.

407
00:28:32,570 --> 00:28:37,820
So that inequality I
was allowed to multiply

408
00:28:37,820 --> 00:28:41,990
by x because x is not negative.

409
00:28:41,990 --> 00:28:45,370
If x had been minus 1,
then when you -- right,

410
00:28:45,370 --> 00:28:49,750
if I have an inequality
like 4 less or equal 7,

411
00:28:49,750 --> 00:28:53,810
if I multiply by minus 1,
I get minus 4 and minus 7,

412
00:28:53,810 --> 00:28:58,350
and the inequality switches:
minus 7 is below minus 4.

413
00:28:58,350 --> 00:29:00,300
But that's not what's
happening here,

414
00:29:00,300 --> 00:29:03,940
because the x is not negative.

415
00:29:03,940 --> 00:29:10,650
So this is not what's
happening, and I'm OK.

416
00:29:10,650 --> 00:29:15,200
So the conclusion was
exactly what I wanted --

417
00:29:15,200 --> 00:29:18,430
that y*b was less
or equal to c*x.

418
00:29:18,430 --> 00:29:24,310
And you see how perfectly
it used the four inequality

419
00:29:24,310 --> 00:29:25,970
constraints.
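
The whole weak duality argument fits on one line. In this write-up of the spoken proof, the first inequality uses y >= 0 together with A x >= b, and the second uses y A <= c together with x >= 0.

```latex
yb \;\le\; y(Ax) \;=\; (yA)x \;\le\; cx
\qquad \text{for every feasible } x \text{ and every feasible } y .
```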

420
00:29:25,970 --> 00:29:26,650
OK.

421
00:29:26,650 --> 00:29:33,280
So that's the weak duality
where the proof is easy.

422
00:29:33,280 --> 00:29:36,280
Just use what's given.
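
The weak duality chain above can be checked numerically. This is a minimal sketch, assuming the primal is "minimize c*x with A*x greater or equal b and x greater or equal 0" and the dual is "maximize y*b with y*A less or equal c and y greater or equal 0". The numbers in A, b, c and the feasible points x and y are made up for illustration and do not come from the lecture.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def mat_vec(A, x):
    """A*x for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def vec_mat(y, A):
    """Row vector y times A."""
    return [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


# Hypothetical data (not from the lecture).
A = [[1.0, 1.0], [2.0, 1.0]]
b = [4.0, 5.0]
c = [2.0, 3.0]
x = [3.0, 2.0]   # primal feasible: A*x >= b and x >= 0
y = [1.0, 0.5]   # dual feasible:   y*A <= c and y >= 0

Ax = mat_vec(A, x)
yA = vec_mat(y, A)

# The four inequality constraints used in the proof.
assert all(v >= 0 for v in y) and all(Ax[i] >= b[i] for i in range(len(b)))
assert all(v >= 0 for v in x) and all(yA[j] <= c[j] for j in range(len(c)))

lower = dot(y, b)    # y*b
middle = dot(y, Ax)  # y*A*x, the quantity sandwiched in between
upper = dot(c, x)    # c*x
print(lower, "<=", middle, "<=", upper)
```

The first pair of constraints gives y*b less or equal y*A*x, and the second pair gives y*A*x less or equal c*x, which is exactly the two-step argument in the lecture.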

423
00:29:36,280 --> 00:29:40,710
The duality, without
the word weak,

424
00:29:40,710 --> 00:29:47,990
is the fact that at the
optimum, the gap is 0,

425
00:29:47,990 --> 00:29:51,780
and actually we can see --
that will tell us a lot.

426
00:29:51,780 --> 00:29:53,430
That will tell us a lot.

427
00:29:53,430 --> 00:29:56,420
When could this gap be 0?

428
00:29:56,420 --> 00:30:01,910
So at the optimum, y star,
equality is holding throughout.

429
00:30:01,910 --> 00:30:06,110
So if equality is
holding, how can that be?

430
00:30:06,110 --> 00:30:10,440
How can I take these --
of course the inequality,

431
00:30:10,440 --> 00:30:13,100
the x star and y
star are feasible.

432
00:30:13,100 --> 00:30:16,810
So if I just put stars
on all these things,

433
00:30:16,810 --> 00:30:21,310
then I would have -- everything
would still be totally true.

434
00:30:21,310 --> 00:30:24,915
But when I put stars on them --
so I'm picking the optimal guys

435
00:30:24,915 --> 00:30:28,900
-- then equality is holding.

436
00:30:28,900 --> 00:30:33,700
I still have these inequalities,
so what I want to find

437
00:30:33,700 --> 00:30:36,490
is the optimality conditions.

438
00:30:36,490 --> 00:30:39,490
How are they related?

439
00:30:39,490 --> 00:30:43,580
If I have y greater or equal
0 and A*x greater or equal b

440
00:30:43,580 --> 00:30:53,471
and I multiply, how
could I get equality?

441
00:30:53,471 --> 00:30:53,970
Right?

442
00:30:53,970 --> 00:30:59,620
For example, if I have 3 greater
than 0 and 5 greater than 2,

443
00:30:59,620 --> 00:31:04,790
if I multiply those, I get
15 greater than 0, I guess,

444
00:31:04,790 --> 00:31:08,900
and that's far from
equality, right?

445
00:31:08,900 --> 00:31:12,100
So how could equality happen?

446
00:31:12,100 --> 00:31:17,890
Well, the only way is if one or
the other of these, if equality

447
00:31:17,890 --> 00:31:26,850
holds in one or the
other, then I would be OK.

448
00:31:26,850 --> 00:31:28,500
Yes, do you see that?

449
00:31:28,500 --> 00:31:35,550
If equality held -- these
are vector inequalities,

450
00:31:35,550 --> 00:31:38,010
so I'm going really
component by component.

451
00:31:38,010 --> 00:31:39,990
Let me write down
my conclusion

452
00:31:39,990 --> 00:31:41,670
and then you'll see what I mean.

453
00:31:41,670 --> 00:31:47,100
So these are called the
Kuhn-Tucker conditions.

454
00:31:47,100 --> 00:31:48,960
You've seen their names before.

455
00:31:48,960 --> 00:31:52,340
And they're also called
-- well, long words --

456
00:31:52,340 --> 00:31:55,960
complementary slackness.

457
00:31:55,960 --> 00:32:02,200
I'm using words that, if you
haven't seen the subject,

458
00:32:02,200 --> 00:32:05,350
you think OK, who
needs the long words.

459
00:32:05,350 --> 00:32:08,160
But the idea of
slack variable --

460
00:32:08,160 --> 00:32:16,950
slack is the difference in the
-- the slack is c minus y*A,

461
00:32:16,950 --> 00:32:21,840
or over here the
slack is A*x minus b.

462
00:32:21,840 --> 00:32:28,670
These are the slack variables.
w, let's say, is A*x minus b.

463
00:32:28,670 --> 00:32:30,510
And of course it's
greater or equal 0;

464
00:32:30,510 --> 00:32:32,970
that's the nice thing
about slack variables.

465
00:32:32,970 --> 00:32:36,070
You know, you've fixed it
so it's greater or equal 0.

466
00:32:36,070 --> 00:32:40,660
Here, the slack variable s,
for slack, would be what?

467
00:32:40,660 --> 00:32:46,440
c minus y*A, greater or equal 0.

468
00:32:46,440 --> 00:32:49,770
And there's no
slack when s is 0.

469
00:32:49,770 --> 00:32:53,530
OK, so that's where the
word slackness comes in.

470
00:32:53,530 --> 00:32:57,000
Slack is just the amount
of give in the inequality.

471
00:32:57,000 --> 00:32:58,790
So what's the point here?

472
00:32:58,790 --> 00:33:02,290
I was looking at this
guy, and the only way

473
00:33:02,290 --> 00:33:08,380
that I could have equality here,
when I have inequalities there,

474
00:33:08,380 --> 00:33:16,180
is for each component, I'm
going to have to have equality.

475
00:33:16,180 --> 00:33:19,380
And how can I have
equality on a component?

476
00:33:19,380 --> 00:33:24,130
Well, I would have it
for example, if y was 0.

477
00:33:26,810 --> 00:33:30,520
Then when I multiply,
I have equality.

478
00:33:30,520 --> 00:33:31,020
Right?

479
00:33:33,570 --> 00:33:34,940
OK.

480
00:33:34,940 --> 00:33:45,130
Or I could have equality
if I had -- let's see,

481
00:33:45,130 --> 00:33:47,770
so this is what I want to say.

482
00:33:47,770 --> 00:33:59,240
So I want to say either
y_i is 0 or (A*x)_i is b_i.

483
00:33:59,240 --> 00:34:03,940
Equality holds in one or
other of the two inequalities,

484
00:34:03,940 --> 00:34:09,710
because then, if I multiply
them together, I have equality.

485
00:34:09,710 --> 00:34:11,270
Right?

486
00:34:11,270 --> 00:34:14,360
You see that.

487
00:34:14,360 --> 00:34:18,950
If one of those holds, say
this one holds, if y_i is 0,

488
00:34:18,950 --> 00:34:24,010
then I certainly can
multiply the inequality by y

489
00:34:24,010 --> 00:34:26,010
and I get 0 equals 0.

490
00:34:26,010 --> 00:34:33,160
Or, if A*x is exactly b, then
multiplying by y won't change.

491
00:34:33,160 --> 00:34:34,130
OK.

492
00:34:34,130 --> 00:34:40,520
So this is the complementary
slackness, one or the other,

493
00:34:40,520 --> 00:34:43,010
that has to hold
to get equality.

494
00:34:43,010 --> 00:34:46,010
Now what about this guy?

495
00:34:46,010 --> 00:34:48,820
Equality, same idea here.

496
00:34:48,820 --> 00:34:52,700
I got the inequality by
multiplying these together.

497
00:34:52,700 --> 00:34:54,730
When will I get equality?

498
00:34:54,730 --> 00:35:08,750
Only if either x_j is 0 or the
j-th component of y*A equals

499
00:35:08,750 --> 00:35:10,390
the j-th component of c.

500
00:35:13,070 --> 00:35:18,160
Again, the same reasoning: that
when I multiply two things,

501
00:35:18,160 --> 00:35:21,730
if I get an equality
out of two inequalities,

502
00:35:21,730 --> 00:35:25,300
then one of those two at
least must have been actually

503
00:35:25,300 --> 00:35:29,140
an equals; otherwise
I'd still have a gap.

504
00:35:29,140 --> 00:35:32,210
OK, so this is pretty important.

505
00:35:35,220 --> 00:35:38,850
These are the conditions
-- these are our equations.

506
00:35:45,780 --> 00:35:48,230
That tells us when we've won.
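
The "one or the other" conditions can be tested directly. This is a small sketch under the same assumed problem form as the lecture (A*x greater or equal b, x greater or equal 0, y*A less or equal c, y greater or equal 0). The data and the names `x_star`, `y_star`, `x_other`, `y_other` are hypothetical; the optimal pair was worked out by hand for this toy example.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def slacks(A, b, c, x, y):
    """Return w = A*x - b and s = c - y*A."""
    m, n = len(A), len(A[0])
    w = [dot(A[i], x) - b[i] for i in range(m)]
    s = [c[j] - sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]
    return w, s


def complementary_slackness(A, b, c, x, y, tol=1e-12):
    """True when y_i * w_i = 0 for every i and x_j * s_j = 0 for every j."""
    w, s = slacks(A, b, c, x, y)
    rows_ok = all(abs(y[i] * w[i]) <= tol for i in range(len(w)))
    cols_ok = all(abs(x[j] * s[j]) <= tol for j in range(len(s)))
    return rows_ok and cols_ok


def duality_gap(b, c, x, y):
    """c*x minus y*b, which weak duality says is never negative."""
    return dot(c, x) - dot(y, b)


# Hypothetical data (not from the lecture).
A = [[1.0, 1.0], [2.0, 1.0]]
b = [4.0, 5.0]
c = [2.0, 3.0]

x_star, y_star = [4.0, 0.0], [2.0, 0.0]     # optimal pair for this toy problem
x_other, y_other = [3.0, 2.0], [1.0, 0.5]   # feasible but not optimal

print(complementary_slackness(A, b, c, x_star, y_star), duality_gap(b, c, x_star, y_star))
print(complementary_slackness(A, b, c, x_other, y_other), duality_gap(b, c, x_other, y_other))
```

The gap c*x minus y*b always splits as s*x plus y*w, a sum of nonnegative products, so it can be zero only when every product is zero. That is the component by component argument in the lecture.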

507
00:35:48,230 --> 00:35:52,220
So this actually
holds for the winners.

508
00:35:52,220 --> 00:35:59,950
It doesn't hold for all the
other guys, but at the winner,

509
00:35:59,950 --> 00:36:04,280
because things are
equal here, they

510
00:36:04,280 --> 00:36:07,310
had to be equal at every
step, and therefore

511
00:36:07,310 --> 00:36:10,080
the Kuhn-Tucker conditions
had to hold at the winner.

512
00:36:10,080 --> 00:36:16,590
So they hold at this winning
corner when we find it.

513
00:36:16,590 --> 00:36:21,220
So the simplex method
chases corners, finally

514
00:36:21,220 --> 00:36:24,120
gets to a corner,
and it would know

515
00:36:24,120 --> 00:36:26,450
it had got there
by the fact that it

516
00:36:26,450 --> 00:36:29,940
couldn't decrease any more.

517
00:36:29,940 --> 00:36:31,930
And if you look at
the algebra, you

518
00:36:31,930 --> 00:36:36,460
would see that that tells you
that the Kuhn-Tucker conditions

519
00:36:36,460 --> 00:36:37,710
are satisfied.

520
00:36:37,710 --> 00:36:44,610
OK, so the only proof I
gave was the weak proof,

521
00:36:44,610 --> 00:36:47,600
that c*x is greater or equal
to y*b because that's the nice

522
00:36:47,600 --> 00:36:50,250
one.

523
00:36:50,250 --> 00:36:53,250
I've proved that for
equality we'd need these.

524
00:36:53,250 --> 00:36:58,570
OK, now I guess I'm
ready for the method.

525
00:36:58,570 --> 00:37:05,050
I'm ready for the interior
point barrier method

526
00:37:05,050 --> 00:37:07,570
that tells me how to compute.

527
00:37:07,570 --> 00:37:10,010
OK, so I'm at an interior point.

528
00:37:10,010 --> 00:37:11,280
What do I do?

529
00:37:11,280 --> 00:37:15,720
OK, so here's the method;
here's the barrier --

530
00:37:15,720 --> 00:37:17,820
I'll call it a log barrier.

531
00:37:22,960 --> 00:37:28,380
I'll solve the problem
of minimizing c*x.

532
00:37:28,380 --> 00:37:30,000
I won't solve the exact problem.

533
00:37:30,000 --> 00:37:34,580
I'm going to minimize
c*x minus, I think,

534
00:37:34,580 --> 00:37:37,660
some little number
times a barrier,

535
00:37:37,660 --> 00:37:45,320
which is going to be a sum
of the logarithms of the x's.

536
00:37:45,320 --> 00:37:50,990
This alpha is going to
be a little bit positive.

537
00:37:50,990 --> 00:37:56,760
I'll take it smaller and smaller
because this part is really --

538
00:37:56,760 --> 00:37:58,580
it's that that I really
want to minimize.

539
00:37:58,580 --> 00:37:59,080
Right?

540
00:37:59,080 --> 00:38:00,400
That's the original problem.

541
00:38:00,400 --> 00:38:01,500
This is the cost.

542
00:38:01,500 --> 00:38:04,010
I'm adding something to the
cost but I'd better just

543
00:38:04,010 --> 00:38:08,090
be sure that I've chosen
the sign of alpha correctly.

544
00:38:08,090 --> 00:38:11,820
By the way, this
is discussed now

545
00:38:11,820 --> 00:38:17,670
in the latest version
of my Linear Algebra

546
00:38:17,670 --> 00:38:21,360
and its Applications
textbook, the fourth edition.

547
00:38:23,960 --> 00:38:27,920
Editions one to three
of that book and others

548
00:38:27,920 --> 00:38:30,400
have described the
simplex method,

549
00:38:30,400 --> 00:38:36,250
and now it was just natural
to include the interior point

550
00:38:36,250 --> 00:38:37,190
barrier method.

551
00:38:37,190 --> 00:38:40,190
OK, so why do I
call this a barrier?

552
00:38:40,190 --> 00:38:46,240
Because if x_i gets to
0, the log blows up.

553
00:38:49,620 --> 00:38:52,320
The log blows down
I should say --

554
00:38:52,320 --> 00:38:54,140
blows down to minus infinity.

555
00:38:54,140 --> 00:38:58,980
I'm multiplying by minus
alpha, so I get positive.

556
00:38:58,980 --> 00:39:00,880
The combination blows up.

557
00:39:00,880 --> 00:39:04,140
It couldn't be the
minimum, so you see,

558
00:39:04,140 --> 00:39:07,630
the minimum is never going
to make it to x equals 0,

559
00:39:07,630 --> 00:39:13,980
because at x equals 0, the thing
I have here is plus infinity.
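
The barrier effect is easy to see with numbers. This is a minimal sketch, assuming the modified cost is c*x minus alpha times the sum of log(x_i). The values of `c`, `alpha` and the test points are made up for illustration.

```python
import math


def barrier_objective(c, x, alpha):
    """c*x minus alpha times the sum of log(x_i); needs every x_i > 0."""
    if any(xi <= 0 for xi in x):
        raise ValueError("the log barrier is only defined for strictly positive x")
    cost = sum(ci * xi for ci, xi in zip(c, x))
    return cost - alpha * sum(math.log(xi) for xi in x)


# Hypothetical data (not from the lecture).
c = [2.0, 3.0]
alpha = 0.1

# Push the first component toward 0 while the second stays at 1.
values = [barrier_objective(c, [t, 1.0], alpha) for t in (1e-3, 1e-6, 1e-12)]
print(values)  # the objective keeps growing as x_1 approaches 0
```

The plain cost c*x would keep decreasing as x_1 shrinks, but the minus alpha log term grows without bound, so a minimizer of the combined function stays strictly inside x greater than 0.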

560
00:39:13,980 --> 00:39:17,370
So now, I'm just going
to use a gradient method.

561
00:39:20,020 --> 00:39:24,460
I'm going to solve this
problem with the constraints,

562
00:39:24,460 --> 00:39:30,200
of course, with the constraints,
and set derivatives to 0.

563
00:39:30,200 --> 00:39:33,310
Now I have -- you know,
it's not linear anymore --

564
00:39:33,310 --> 00:39:35,582
the winner is not
at a corner anymore.

565
00:39:35,582 --> 00:39:36,790
It's somewhere in the middle.

566
00:39:36,790 --> 00:39:38,310
Calculus operates.

567
00:39:38,310 --> 00:39:40,450
I can set derivative to 0.

568
00:39:40,450 --> 00:39:42,470
OK, so I want to do that.

569
00:39:42,470 --> 00:39:46,420
And of course, I'm still
inside this feasible set,

570
00:39:46,420 --> 00:39:51,590
so let me see if I can
put down the equations

571
00:39:51,590 --> 00:39:52,830
and the constraints.

572
00:39:52,830 --> 00:39:55,880
OK, so I still have
the constraints.

573
00:39:55,880 --> 00:40:04,960
Now, forgive me, but I've
made a change to A*x equals b.

574
00:40:04,960 --> 00:40:09,070
I could have started with
that as the constraint.

575
00:40:09,070 --> 00:40:12,380
I've made that change
to A*x equals b.

576
00:40:15,530 --> 00:40:19,140
How have I done such a thing?

577
00:40:19,140 --> 00:40:22,030
I'm given the problem with
A*x greater or equal b,

578
00:40:22,030 --> 00:40:27,350
but I'm also given the slack
variable w greater or equal 0.

579
00:40:30,220 --> 00:40:33,980
So I just -- it's just a little
trick that's not worth --

580
00:40:33,980 --> 00:40:36,700
you could just take my word
for it, a little trick.

581
00:40:36,700 --> 00:40:41,870
My new variable is
the x's and the w's.

582
00:40:41,870 --> 00:40:47,400
m plus n variables: the
n x's and the m w's.

583
00:40:47,400 --> 00:40:53,750
And now, put that together --
so can I just maybe do this over

584
00:40:53,750 --> 00:40:56,460
in the corner here?

585
00:40:56,460 --> 00:41:02,680
Before I start on this, I
changed to a new variable that

586
00:41:02,680 --> 00:41:05,610
will be x's and w's.

587
00:41:05,610 --> 00:41:08,730
And that will be greater
or equal 0, right?

588
00:41:08,730 --> 00:41:10,610
Because the x was always
greater or equal 0,

589
00:41:10,610 --> 00:41:15,620
and the slack says A*x greater
or equal b is turned into slack

590
00:41:15,620 --> 00:41:16,860
greater or equal 0.

591
00:41:16,860 --> 00:41:24,540
And now that multiplies
A, minus I to give b,

592
00:41:24,540 --> 00:41:31,820
because A*x minus the
slack variable is b,

593
00:41:31,820 --> 00:41:37,550
which says that A*x minus b
is the slack variable w --

594
00:41:37,550 --> 00:41:40,640
bring that over and that over
-- and that's what we said was

595
00:41:40,640 --> 00:41:41,920
greater or equal 0.

596
00:41:41,920 --> 00:41:45,240
Do you see that I've
changed to an equation

597
00:41:45,240 --> 00:41:49,230
by introducing more variables?

598
00:41:49,230 --> 00:41:54,470
Putting the x's and the slacks
all together in a big variable

599
00:41:54,470 --> 00:41:55,870
that I'm now going to call x.

600
00:41:55,870 --> 00:42:00,002
So this is now the sum of --
there are m plus n of these

601
00:42:00,002 --> 00:42:11,130
x's now, because -- this is the
new x and this is the new A.
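
The change from inequality to equality constraints can be sketched in a few lines. This assumes the construction described in the lecture: the new matrix is A next to minus the identity, and the new variable stacks the n x's on top of the m slacks w = A*x minus b. The function name `to_equality_form` and the data are hypothetical.

```python
def to_equality_form(A, b, x):
    """Return the block matrix [A, -I] and the stacked vector [x; w] with w = A*x - b."""
    m, n = len(A), len(A[0])
    w = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
    A_new = [list(A[i]) + [-1.0 if k == i else 0.0 for k in range(m)] for i in range(m)]
    x_new = list(x) + w
    return A_new, x_new


# Hypothetical data (not from the lecture).
A = [[1.0, 1.0], [2.0, 1.0]]
b = [4.0, 5.0]
x = [3.0, 2.0]   # satisfies A*x >= b and x >= 0

A_new, x_new = to_equality_form(A, b, x)
product = [sum(a * v for a, v in zip(row, x_new)) for row in A_new]
print(A_new)
print(x_new)     # m + n components, all nonnegative
print(product)   # equals b exactly
```

If the cost is to stay the same, the natural choice (an assumption here, not stated in the lecture) is to extend c with m zeros so the slacks cost nothing.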

602
00:42:11,130 --> 00:42:14,890
You might say: why didn't I just
start with equality constraint?

603
00:42:14,890 --> 00:42:16,840
And I certainly could have done.

604
00:42:16,840 --> 00:42:21,580
But just to see that
inequalities have their place

605
00:42:21,580 --> 00:42:25,040
too, and to see that we can
get between one and the other.

606
00:42:25,040 --> 00:42:31,310
OK, so now this is the problem
with equality constraint.

607
00:42:31,310 --> 00:42:38,770
So my new constraints are A*x
equals b and x greater or equal

608
00:42:38,770 --> 00:42:39,780
0.

609
00:42:39,780 --> 00:42:41,770
That's the primal constraint.

610
00:42:41,770 --> 00:42:44,940
And what's the dual constraint?

611
00:42:44,940 --> 00:42:48,740
So the dual constraint
is y greater or equal 0.

612
00:42:48,740 --> 00:42:50,660
Right?

613
00:42:50,660 --> 00:42:57,330
And, OK I have to get this right
because we're right at the end.

614
00:42:57,330 --> 00:43:05,240
And the slack, let me
just write the slack one.

615
00:43:05,240 --> 00:43:08,930
The slack one -- s is the slack.

616
00:43:08,930 --> 00:43:10,010
This is s.

617
00:43:10,010 --> 00:43:14,170
I'm going to transpose so that
I have consistently column

618
00:43:14,170 --> 00:43:14,980
vectors.

619
00:43:14,980 --> 00:43:24,560
So that, when I transpose, it
says that A transpose y plus s

620
00:43:24,560 --> 00:43:26,180
is c.

621
00:43:26,180 --> 00:43:27,600
Right?

622
00:43:27,600 --> 00:43:31,110
I put that over there
with the s and transpose

623
00:43:31,110 --> 00:43:32,970
to get column vectors.

624
00:43:32,970 --> 00:43:35,040
I like to have column vectors.

625
00:43:35,040 --> 00:43:39,570
OK, so those are the
constraints, but now,

626
00:43:39,570 --> 00:43:44,730
what's the derivative
equals 0 equation?

627
00:43:44,730 --> 00:43:50,380
Derivative equals 0 is the
derivative of this equals 0.

628
00:43:50,380 --> 00:43:52,260
So what does that say?

629
00:43:52,260 --> 00:43:56,440
That says that if I set
the derivative as 0,

630
00:43:56,440 --> 00:44:01,760
that says that c_i
-- x, remember, is --

631
00:44:01,760 --> 00:44:04,380
well x has got all
these components.

632
00:44:04,380 --> 00:44:11,540
c_i is alpha and the derivative
of log x_i, of course,

633
00:44:11,540 --> 00:44:14,810
is 1 over x_i.

634
00:44:14,810 --> 00:44:20,440
So that's the equation
for derivative equals 0.
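
The board is not captured in the transcript, so what follows is a hedged reconstruction rather than the lecturer's exact equations. It assumes the barrier problem is to minimize c^T x minus alpha times the sum of log x_i subject to A x = b, with y introduced as the Lagrange multiplier for the equality constraint (the sign convention for y is an assumption).

```latex
\begin{aligned}
L(x, y) &= c^{T}x \;-\; \alpha \sum_{i} \log x_i \;-\; y^{T}(Ax - b), \\[4pt]
\frac{\partial L}{\partial x_i} &= c_i \;-\; \frac{\alpha}{x_i} \;-\; (A^{T}y)_i \;=\; 0 .
\end{aligned}
```

Writing s = c - A^T y for the dual slack, the derivative equation becomes s_i = alpha / x_i, so the three conditions to solve are

```latex
Ax = b, \qquad A^{T}y + s = c, \qquad x_i\, s_i = \alpha \quad (i = 1, \dots, n).
```

The first two are linear and the third is the nonlinear one. As alpha goes to 0 it turns into the complementary slackness condition x_i s_i = 0.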

635
00:44:20,440 --> 00:44:23,090
So this is what I'm solving.

636
00:44:23,090 --> 00:44:23,590
OK.

637
00:44:26,690 --> 00:44:29,890
Equality is here,
equality is here,

638
00:44:29,890 --> 00:44:32,294
equality is here, but nonlinear.

639
00:44:32,294 --> 00:44:33,460
This is of course nonlinear.

640
00:44:43,570 --> 00:44:50,680
So Newton's method
just says linearize.

641
00:44:50,680 --> 00:44:55,590
Newton's method is just
linearize at the point,

642
00:44:55,590 --> 00:45:00,420
and that gives you
the direction to move.

643
00:45:00,420 --> 00:45:04,570
And you move that direction
because you've linearized.

644
00:45:04,570 --> 00:45:10,440
As you move, you're wandering
a little away from precision,

645
00:45:10,440 --> 00:45:16,430
from perfection, but if you
don't take too big a step,

646
00:45:16,430 --> 00:45:18,430
Newton is safe.

647
00:45:18,430 --> 00:45:21,130
Maybe, since this is a course
in scientific computing,

648
00:45:21,130 --> 00:45:24,630
I should've written on the
very first day in big letters

649
00:45:24,630 --> 00:45:31,620
Newton, because that idea
of following the gradient

650
00:45:31,620 --> 00:45:36,400
is the central method of
solving nonlinear equations.

651
00:45:36,400 --> 00:45:39,160
And then on the board
beneath, I would

652
00:45:39,160 --> 00:45:41,600
have written in big
letters "carefully,"

653
00:45:41,600 --> 00:45:46,610
because the derivative
is a local thing.

654
00:45:46,610 --> 00:45:51,590
And if you follow the
derivative a long distance,

655
00:45:51,590 --> 00:45:55,780
follow the derivative here
a long distance out to here,

656
00:45:55,780 --> 00:46:03,130
who knows what -- you've lost
the safety of Newton's method.

657
00:46:03,130 --> 00:46:08,100
So Newton's method
always comes in reality

658
00:46:08,100 --> 00:46:11,760
with some kind of a trust
region, some region where

659
00:46:11,760 --> 00:46:17,150
you can rely on the
derivative being

660
00:46:17,150 --> 00:46:21,840
a reasonable approximation of
the way the function is moving.

661
00:46:21,840 --> 00:46:25,130
OK, so we do that here too.

662
00:46:25,130 --> 00:46:26,520
OK.

663
00:46:26,520 --> 00:46:36,800
Maybe I won't write
out in full notation --

664
00:46:36,800 --> 00:46:39,500
what does Newton's
method do, actually?

665
00:46:39,500 --> 00:46:47,370
So Newton's method, we're
at a particular x, y, s,

666
00:46:47,370 --> 00:46:51,450
and we've got to move.

667
00:46:51,450 --> 00:46:54,550
So the unknowns are --
the components of x,

668
00:46:54,550 --> 00:46:58,130
the components of y,
and the components of s.

669
00:46:58,130 --> 00:47:03,350
So Newton's method takes
steps: a delta x, a delta y,

670
00:47:03,350 --> 00:47:11,500
and a delta s, computes
what those should be,

671
00:47:11,500 --> 00:47:15,780
and then that gives
the direction,

672
00:47:15,780 --> 00:47:19,210
and if you take them exactly,
that's the full Newton

673
00:47:19,210 --> 00:47:22,660
step, which you would be very
happy to do because that gives

674
00:47:22,660 --> 00:47:27,030
terrific convergence, but
if it's too big a step,

675
00:47:27,030 --> 00:47:28,800
then you have to cut back.

676
00:47:28,800 --> 00:47:32,660
So the equations for
these are what you need,

677
00:47:32,660 --> 00:47:37,310
so there'll be an A
delta x will be 0.

678
00:47:37,310 --> 00:47:39,640
Because b isn't changing.

679
00:47:39,640 --> 00:47:50,690
There will be an A transpose
delta y; A transpose delta y

680
00:47:50,690 --> 00:47:55,900
plus delta s will be 0
because the c isn't changing.

681
00:47:55,900 --> 00:47:58,570
And then we'll get
an equation out

682
00:47:58,570 --> 00:48:04,490
of this, which is a really
significant one that maybe

683
00:48:04,490 --> 00:48:08,700
time is running out on and I'm
not going to do justice to.

684
00:48:08,700 --> 00:48:14,400
But that's the nonlinear
term, where, you see,

685
00:48:14,400 --> 00:48:20,790
if I keep A delta x zero, then
my new x is exactly feasible

686
00:48:20,790 --> 00:48:21,290
right?

687
00:48:21,290 --> 00:48:25,590
If I'm at an A*x equals b
and I move it by a delta x

688
00:48:25,590 --> 00:48:30,570
that's in the null space,
then I still have --

689
00:48:30,570 --> 00:48:35,820
all I'm saying is that when I
take that step I will have A times

690
00:48:35,820 --> 00:48:38,350
(x plus delta x) still equal to b.

691
00:48:38,350 --> 00:48:39,130
Good.

692
00:48:39,130 --> 00:48:40,910
Constraints still satisfied.

693
00:48:40,910 --> 00:48:46,070
When I take this step, since
that's linear, the constraint,

694
00:48:46,070 --> 00:48:50,260
when I add on the delta y
and the delta s and the 0,

695
00:48:50,260 --> 00:48:53,980
I still have -- my new
point still satisfies that

696
00:48:53,980 --> 00:48:54,890
constraint.

697
00:48:54,890 --> 00:48:58,980
But this is of course
not exactly satisfied.

698
00:48:58,980 --> 00:49:02,580
If I had the solution
to this, I'd be done.

699
00:49:02,580 --> 00:49:03,900
That's my problem.

700
00:49:03,900 --> 00:49:07,390
Anyway, so it's not
exactly satisfied.

701
00:49:07,390 --> 00:49:14,520
Newton would tell you
a linearization of it,

702
00:49:14,520 --> 00:49:19,050
and you would move in that
gradient direction to try

703
00:49:19,050 --> 00:49:24,030
to make the thing -- to
try to make equality hold,

704
00:49:24,030 --> 00:49:30,370
because our current x doesn't
have equality holding.

705
00:49:30,370 --> 00:49:35,060
And of course the c is
A transpose y plus s.

706
00:49:35,060 --> 00:49:41,210
So that equation -- you
see what's going on here?

707
00:49:41,210 --> 00:49:46,000
This is A transpose y plus s,
and the x is multiplying those,

708
00:49:46,000 --> 00:49:47,830
so there's a product there.

709
00:49:47,830 --> 00:49:50,930
And when I take the derivative,
it's a product rule,

710
00:49:50,930 --> 00:49:52,390
I get two terms.

711
00:49:52,390 --> 00:50:00,200
Anyway, I get a third
equation from here

712
00:50:00,200 --> 00:50:03,460
that connects delta x,
delta y, and delta s.

713
00:50:03,460 --> 00:50:09,420
I take that step and that's
my interior point method.

714
00:50:09,420 --> 00:50:13,400
That's my Newton step.
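
One Newton step can be sketched in code. This is an illustration under stated assumptions, not the lecturer's implementation: it assumes the third equation comes from linearizing x_j*s_j = alpha, giving s_j*dx_j + x_j*ds_j = alpha - x_j*s_j, and it eliminates dx and ds to solve a small system for dy. The helper names, the damping rule in `step_length`, and the data are all hypothetical.

```python
def solve(M, rhs):
    """Solve a small dense system M z = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def newton_step(A, x, s, alpha):
    """Solve A dx = 0, A^T dy + ds = 0, s_j dx_j + x_j ds_j = alpha - x_j s_j."""
    m, n = len(A), len(x)
    d = [x[j] / s[j] for j in range(n)]          # diagonal of D = X S^{-1}
    r = [alpha / s[j] - x[j] for j in range(n)]  # dx = r - D ds
    # Substituting ds = -A^T dy into A dx = 0 gives (A D A^T) dy = -A r.
    ADAt = [[sum(A[i][j] * d[j] * A[k][j] for j in range(n)) for k in range(m)]
            for i in range(m)]
    rhs = [-sum(A[i][j] * r[j] for j in range(n)) for i in range(m)]
    dy = solve(ADAt, rhs)
    ds = [-sum(A[i][j] * dy[i] for i in range(m)) for j in range(n)]
    dx = [r[j] - d[j] * ds[j] for j in range(n)]
    return dx, dy, ds


def step_length(v, dv, fraction=0.9):
    """Largest t <= 1 keeping v + t*dv strictly positive (a simple assumed rule)."""
    t = 1.0
    for vj, dvj in zip(v, dv):
        if dvj < 0:
            t = min(t, fraction * vj / (-dvj))
    return t


# Hypothetical data: minimize x_1 + 2 x_2 with x_1 + x_2 = 1 and x >= 0.
A = [[1.0, 1.0]]
c = [1.0, 2.0]
x = [0.5, 0.5]   # interior point: A x = b with b = 1, and x > 0
y = [0.0]
s = [c[j] - sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(c))]
alpha = 0.1

dx, dy, ds = newton_step(A, x, s, alpha)
t = min(step_length(x, dx), step_length(s, ds))   # cut back if the full step is too big
x_new = [x[j] + t * dx[j] for j in range(len(x))]
y_new = [y[i] + t * dy[i] for i in range(len(y))]
s_new = [s[j] + t * ds[j] for j in range(len(s))]
print(t, x_new, y_new, s_new)
```

In this example the full Newton step would make a component of s negative, so the step is cut back, which matches the remark about taking the full step only when it is not too big. The two linear constraints stay satisfied for any step length because dx and (dy, ds) solve the homogeneous equations.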

715
00:50:13,400 --> 00:50:18,790
So maybe I just end by
reporting the results,

716
00:50:18,790 --> 00:50:20,940
so I'll end with
just two comments.

717
00:50:20,940 --> 00:50:23,280
First is, is the
method any good?

718
00:50:23,280 --> 00:50:26,120
And of course you
only know by trying.

719
00:50:26,120 --> 00:50:29,460
And the answer is
yeah, in -- typically,

720
00:50:29,460 --> 00:50:34,920
you get the duality gap down
below 10 to the minus 8,

721
00:50:34,920 --> 00:50:38,160
which is usually
very satisfactory,

722
00:50:38,160 --> 00:50:45,410
in 20 to 80 steps.

723
00:50:45,410 --> 00:50:48,340
You can never prove a
statement like that,

724
00:50:48,340 --> 00:50:51,170
because you can always
create some awful example,

725
00:50:51,170 --> 00:50:55,830
but this is the typical
performance of the method.

726
00:50:55,830 --> 00:51:01,620
Which is pretty good,
regardless of m and n.

727
00:51:01,620 --> 00:51:06,270
That's what's wonderful --
that the number of steps

728
00:51:06,270 --> 00:51:09,910
doesn't increase with
the size of the problem.

729
00:51:09,910 --> 00:51:12,400
Of course, the cost
per step does increase

730
00:51:12,400 --> 00:51:13,740
with the size of the problem.

731
00:51:13,740 --> 00:51:16,660
OK so that's the
results, and that's

732
00:51:16,660 --> 00:51:20,090
why the method is popular.

733
00:51:20,090 --> 00:51:23,840
And now I just wanted
to not leave duality,

734
00:51:23,840 --> 00:51:28,470
which is such a key
idea, without going back

735
00:51:28,470 --> 00:51:34,270
to our much more familiar
problem of quadratics, where

736
00:51:34,270 --> 00:51:36,050
there are quadratic terms.

737
00:51:36,050 --> 00:51:39,720
And the best model you
remember was projection.

738
00:51:39,720 --> 00:51:46,160
You remember that we had a
vector b and we have the line,

739
00:51:46,160 --> 00:51:52,420
the null space of A. No, this
was the column space of A.

740
00:51:52,420 --> 00:51:56,580
This was all A*x's.

741
00:51:56,580 --> 00:52:03,830
And perpendicular to it was
the null space of A transpose.

742
00:52:03,830 --> 00:52:10,010
All A transpose
y's that equaled 0.

743
00:52:10,010 --> 00:52:11,060
Do you remember this?

744
00:52:11,060 --> 00:52:17,280
This was the model problem
for understanding the --

745
00:52:17,280 --> 00:52:22,590
so that the projection of
this solved one problem.

746
00:52:22,590 --> 00:52:27,070
The projection in the other
direction -- we called that P.

747
00:52:27,070 --> 00:52:31,570
This was the projection P
equal A times the best x.

748
00:52:31,570 --> 00:52:34,720
The projection in the
opposite direction

749
00:52:34,720 --> 00:52:39,870
found the e, the error,
but it was the solution

750
00:52:39,870 --> 00:52:41,890
to the dual problem.

751
00:52:41,890 --> 00:52:47,050
And now I want to say where
was duality in this picture?

752
00:52:47,050 --> 00:52:51,770
Well, duality was -- let me
call it e hat, the winning,

753
00:52:51,770 --> 00:52:54,650
the projection, the
right guy over here.

754
00:52:54,650 --> 00:52:57,700
Or maybe y hat.

755
00:52:57,700 --> 00:52:59,500
OK, where was duality?

756
00:52:59,500 --> 00:53:02,640
Duality came, in this
case, in the fact

757
00:53:02,640 --> 00:53:05,250
that it was Pythagoras.

758
00:53:05,250 --> 00:53:09,100
Duality in this simple,
beautiful problem

759
00:53:09,100 --> 00:53:15,220
was simply the fact that p
squared, this winner squared,

760
00:53:15,220 --> 00:53:21,300
plus e squared was b squared.

761
00:53:21,300 --> 00:53:25,880
The winners were the
orthogonal projections.

762
00:53:25,880 --> 00:53:28,120
And now where is weak duality?

763
00:53:28,120 --> 00:53:30,460
It's the last second
of the lecture.

764
00:53:30,460 --> 00:53:33,840
Weak duality says take
something that's allowed,

765
00:53:33,840 --> 00:53:38,650
like that, and take something
that's allowed here, like that.

766
00:53:44,650 --> 00:53:47,080
Those are not the winners.

767
00:53:47,080 --> 00:53:50,140
Those don't deserve
stars or hats.

768
00:53:50,140 --> 00:53:51,650
They're not the winners.

769
00:53:51,650 --> 00:53:57,640
And compute that squared
plus that squared.

770
00:53:57,640 --> 00:53:59,350
So this is any A*x.

771
00:53:59,350 --> 00:54:09,630
So any A*x squared and any y --
let's call that y -- squared.

772
00:54:09,630 --> 00:54:19,060
And what is the inequality
that is satisfied by any A*x,

773
00:54:19,060 --> 00:54:24,220
like the wrong one here, and
any y, like the wrong one there,

774
00:54:24,220 --> 00:54:27,250
will satisfy?

775
00:54:27,250 --> 00:54:31,260
Pythagoras won't be quite right.

776
00:54:31,260 --> 00:54:35,960
It'll be A*x squared
plus y squared.

777
00:54:35,960 --> 00:54:38,530
What do we know about the
sum of those two squares?

778
00:54:41,100 --> 00:54:45,080
It's greater than or
equal to b squared.

779
00:54:48,440 --> 00:54:51,720
The only way we get
this thing split

780
00:54:51,720 --> 00:54:58,580
into two orthogonal parts whose
squares add up to b squared

781
00:54:58,580 --> 00:55:02,680
is right triangle.

782
00:55:02,680 --> 00:55:09,390
If I replace this by something
longer and I replace this --

783
00:55:09,390 --> 00:55:10,980
I should take that error really.

784
00:55:13,850 --> 00:55:17,500
e is really b minus the A*x.

785
00:55:17,500 --> 00:55:19,420
That's what I should
be putting here.

786
00:55:19,420 --> 00:55:25,860
This thing should be
b minus A*x squared.

787
00:55:28,450 --> 00:55:32,700
Anyway, the duality
is in the fact

788
00:55:32,700 --> 00:55:36,570
of getting an equal sign
there and weak duality

789
00:55:36,570 --> 00:55:40,390
is the easy inequality
that no matter what you do,

790
00:55:40,390 --> 00:55:41,960
you get greater than or equal.

791
00:55:41,960 --> 00:55:46,810
So the duality gap is
somehow the gap there,

792
00:55:46,810 --> 00:55:51,640
and the whole subject
of optimization

793
00:55:51,640 --> 00:55:53,790
is to bring that gap to 0.

794
00:55:53,790 --> 00:55:58,430
So this is the gap in
quadratic problems,

795
00:55:58,430 --> 00:56:01,830
of which this is a neat
model, and this was

796
00:56:01,830 --> 00:56:04,660
all about linear programming.

797
00:56:04,660 --> 00:56:09,300
And duality is present for both.
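
The projection picture can be checked with a two-dimensional example. This sketch takes one reading of the spoken argument as an assumption: the primal quantity is the squared distance from b to any A*x, the dual quantity is the squared distance from b to any y with A transpose y = 0, and their sum is at least b squared, with equality at the two orthogonal projections. The vectors `a`, `b`, `x_any`, `y_any` are made up.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def norm_sq(v):
    return dot(v, v)


# Hypothetical data: A has the single column a, so its column space is a line.
a = [2.0, 1.0]
b = [3.0, 4.0]

# The winners: projection p onto the line and the error e perpendicular to it.
x_hat = dot(a, b) / dot(a, a)
p = [x_hat * ai for ai in a]
e = [bi - pi for bi, pi in zip(b, p)]
print(norm_sq(p) + norm_sq(e), norm_sq(b))   # Pythagoras: equal

# Non-winners: any point on the line and any vector perpendicular to a.
x_any = 1.5
Ax = [x_any * ai for ai in a]
y_any = [-0.5, 1.0]                          # dot(a, y_any) is 0
primal = norm_sq([bi - vi for bi, vi in zip(b, Ax)])
dual = norm_sq([bi - yi for bi, yi in zip(b, y_any)])
gap = primal + dual - norm_sq(b)
print(primal + dual, ">=", norm_sq(b), "gap", gap)
```

With the winners the gap is 0. With any other allowed pair the gap is the squared distance of each choice from its winner, so it is positive.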

798
00:56:09,300 --> 00:56:16,430
OK, so Friday is the promised
lecture on ill-posed problems.

799
00:56:16,430 --> 00:56:21,100
And meanwhile, if two
people are willing to put up

800
00:56:21,100 --> 00:56:25,710
a hand now or email me
later and say: sure,

801
00:56:25,710 --> 00:56:31,271
I'll take my turn Friday of next
week, that would be terrific.

802
00:56:31,271 --> 00:56:31,770
OK.

803
00:56:31,770 --> 00:56:32,710
Thanks.

804
00:56:32,710 --> 00:56:33,780
I see one hand.

805
00:56:33,780 --> 00:56:35,030
OK.