1
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative

2
00:00:02,460 --> 00:00:03,870
Commons license.

3
00:00:03,870 --> 00:00:06,320
Your support will help
MIT OpenCourseWare

4
00:00:06,320 --> 00:00:10,560
continue to offer high quality
educational resources for free.

5
00:00:10,560 --> 00:00:13,300
To make a donation or
view additional materials

6
00:00:13,300 --> 00:00:17,210
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,210 --> 00:00:19,500
at ocw.mit.edu.

8
00:00:32,560 --> 00:00:33,620
HERBERT GROSS: Hi.

9
00:00:33,620 --> 00:00:37,120
As I was standing here wondering
how to begin today's lesson,

10
00:00:37,120 --> 00:00:40,440
an old story came to mind, of
the professor who passed out

11
00:00:40,440 --> 00:00:43,175
an examination to his class,
and one of the students

12
00:00:43,175 --> 00:00:44,800
said, "Professor,
this is the same test

13
00:00:44,800 --> 00:00:46,640
you gave us last week."

14
00:00:46,640 --> 00:00:48,820
And the professor said,
"I know, but this time I

15
00:00:48,820 --> 00:00:50,470
changed the answers."

16
00:00:50,470 --> 00:00:52,710
And I was thinking of
this in terms of the fact

17
00:00:52,710 --> 00:00:56,170
that much of the new
mathematics is essentially

18
00:00:56,170 --> 00:00:59,370
the old mathematics with
some of the answers changed.

19
00:00:59,370 --> 00:01:02,390
One of the topics that
we used to belittle

20
00:01:02,390 --> 00:01:05,180
in the traditional curriculum,
because it was too easy,

21
00:01:05,180 --> 00:01:08,380
was the topic called
linear equations.

22
00:01:08,380 --> 00:01:11,430
And it turns out that in the
study of several variables

23
00:01:11,430 --> 00:01:14,160
in particular-- but it was
already present in calculus

24
00:01:14,160 --> 00:01:17,410
of a single variable--
we very strongly used

25
00:01:17,410 --> 00:01:20,210
the concept of linearity.

26
00:01:20,210 --> 00:01:24,100
I could've called today's lesson
"something old, something new."

27
00:01:24,100 --> 00:01:27,690
Meaning that the old topic
that we were going to revisit

28
00:01:27,690 --> 00:01:29,770
would be that of
linear functions,

29
00:01:29,770 --> 00:01:33,270
and the new topic would
be how it manifests

30
00:01:33,270 --> 00:01:36,070
into the modern
curriculum in the sense

31
00:01:36,070 --> 00:01:38,240
that one introduces
a subject called

32
00:01:38,240 --> 00:01:40,980
linear algebra,
or matrix algebra,

33
00:01:40,980 --> 00:01:44,960
as a standard portion of
a modern calculus course,

34
00:01:44,960 --> 00:01:47,730
whereas in the traditional
calculus courses,

35
00:01:47,730 --> 00:01:51,970
essentially nothing was ever
said about matrix algebra

36
00:01:51,970 --> 00:01:53,140
or linearity.

37
00:01:53,140 --> 00:01:56,230
Instead I picked a
more conservative title

38
00:01:56,230 --> 00:02:00,790
for today's lesson, I simply
call it "Linearity Revisited".

39
00:02:00,790 --> 00:02:03,410
And as I say, it
goes back to when

40
00:02:03,410 --> 00:02:05,730
we were in junior high
school or high school,

41
00:02:05,730 --> 00:02:09,419
when we were taught that linear
functions were very nice.

42
00:02:09,419 --> 00:02:13,750
For example, given the
equation y equals m*x plus b--

43
00:02:13,750 --> 00:02:15,510
the linear equation
meaning what?

44
00:02:15,510 --> 00:02:19,050
Not merely that it graphs as a straight line,
but that the two variables

45
00:02:19,050 --> 00:02:22,100
are related linearly, y
is a constant multiple

46
00:02:22,100 --> 00:02:24,670
of x, plus a constant.

47
00:02:24,670 --> 00:02:28,140
We were told to solve
for x in terms of y.

48
00:02:28,140 --> 00:02:31,470
And what we found was that
if y equals m*x plus b,

49
00:02:31,470 --> 00:02:37,280
this was true if and only if x
was equal to the quantity y minus b, over m.

50
00:02:37,280 --> 00:02:40,450
What we showed was
given a value of x,

51
00:02:40,450 --> 00:02:44,380
there corresponded a value
of y, and conversely,

52
00:02:44,380 --> 00:02:48,730
given a value for y, there
corresponded a unique value

53
00:02:48,730 --> 00:02:49,780
for x.

54
00:02:49,780 --> 00:02:52,140
And to put this into the
language of functions,

55
00:02:52,140 --> 00:02:56,650
what we were saying was that
if f of x equals m*x plus b,

56
00:02:56,650 --> 00:02:59,420
then f inverse exists.

57
00:02:59,420 --> 00:03:03,110
In other words, what we're
saying is that no two different

58
00:03:03,110 --> 00:03:07,010
x values can give
you the same y value,

59
00:03:07,010 --> 00:03:10,235
if the function has the
form y equals m*x plus b.
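[Editor's sketch, not part of the lecture: a minimal Python check
of the inverse just described. It assumes m is nonzero, which the
lecture takes for granted; the names f and f_inverse are illustrative.]
```python
# Sketch: invert y = m*x + b, assuming m != 0 (hypothetical helper names).
def f(x, m, b):
    return m * x + b
def f_inverse(y, m, b):
    if m == 0:
        raise ValueError("m must be nonzero for the inverse to exist")
    return (y - b) / m
# Round trip with m = 2, b = 3: x -> y -> x should return the starting x.
x_back = f_inverse(f(5.0, 2.0, 3.0), 2.0, 3.0)
print(x_back)
```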

60
00:03:10,235 --> 00:03:12,110
And just about the time
that we were learning

61
00:03:12,110 --> 00:03:14,060
to enjoy this kind
of an equation

62
00:03:14,060 --> 00:03:18,540
our dream world was shattered,
and we were told it's too bad,

63
00:03:18,540 --> 00:03:21,360
but most functions
aren't linear.

64
00:03:21,360 --> 00:03:24,260
We were given things like
y equals x to the seventh

65
00:03:24,260 --> 00:03:25,920
plus x to the
fifth, and we found

66
00:03:25,920 --> 00:03:28,610
that we couldn't solve
for x very conveniently

67
00:03:28,610 --> 00:03:29,920
in terms of y.

68
00:03:29,920 --> 00:03:32,790
And that's what began
our intermediate algebra

69
00:03:32,790 --> 00:03:34,410
and advanced algebra courses.

70
00:03:34,410 --> 00:03:38,850
In other words, the fact that
most functions are non-linear.

71
00:03:38,850 --> 00:03:41,180
Now an interesting
thing occurred though.

72
00:03:41,180 --> 00:03:44,050
Let me just emphasize this.

73
00:03:44,050 --> 00:03:45,460
And this is the key point.

74
00:03:45,460 --> 00:03:49,460
In terms of calculus,
we discovered--

75
00:03:49,460 --> 00:03:51,840
and here's a key word
coming up-- Most functions

76
00:03:51,840 --> 00:03:54,140
are locally linear.

77
00:03:54,140 --> 00:03:56,730
Now that sounds a little
bit like a tongue twister,

78
00:03:56,730 --> 00:04:00,110
but actually back in
the first part of the course

79
00:04:00,110 --> 00:04:04,520
when we talked about delta
y sub tan-- a change in y

80
00:04:04,520 --> 00:04:05,910
to the tangent line.

81
00:04:05,910 --> 00:04:07,570
Notice what we were saying.

82
00:04:07,570 --> 00:04:11,680
We were saying that to study
f of x near x equals a,

83
00:04:11,680 --> 00:04:15,450
we saw that f of a plus
delta x minus f of a

84
00:04:15,450 --> 00:04:20,089
was equal to f prime of a
times delta x plus k delta

85
00:04:20,089 --> 00:04:23,660
x, where the limit of
k, as delta x went to 0,

86
00:04:23,660 --> 00:04:25,400
was 0 itself.

87
00:04:25,400 --> 00:04:29,550
Provided of course that f was
differentiable at x equals a;

88
00:04:29,550 --> 00:04:32,550
otherwise, you couldn't
write down f prime of a here.

89
00:04:32,550 --> 00:04:34,420
The interesting point is this.

90
00:04:34,420 --> 00:04:38,100
But if you look just
at this term over here,

91
00:04:38,100 --> 00:04:44,410
this expresses delta f as a
linear function of delta x.

92
00:04:44,410 --> 00:04:47,440
The part that makes this
thing non-linear is the term

93
00:04:47,440 --> 00:04:50,150
called k delta x.

94
00:04:50,150 --> 00:04:52,180
But that's the term
that's going to 0

95
00:04:52,180 --> 00:04:54,710
as a second-order infinitesimal.

96
00:04:54,710 --> 00:04:57,080
So what we're really
saying is this:

97
00:04:57,080 --> 00:04:59,420
that provided that f
is differentiable at x

98
00:04:59,420 --> 00:05:03,690
equals a-- in other words
locally we mean this:

99
00:05:03,690 --> 00:05:09,400
near x equals a, we can say
that delta f is approximately

100
00:05:09,400 --> 00:05:12,240
f prime of a times delta x.

101
00:05:12,240 --> 00:05:15,480
That's what we call
delta f sub tan, recall.

102
00:05:15,480 --> 00:05:18,430
And what we mean by
approximately here

103
00:05:18,430 --> 00:05:21,710
is that error k delta
x goes to 0 very,

104
00:05:21,710 --> 00:05:24,900
very rapidly as
delta x goes to 0.
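[Editor's sketch, not part of the lecture: a small numerical look at
the error term k delta x, assuming the example f(x) = x squared at
a = 1. The helper name k is illustrative; for this f it works out to
k = delta x, so the error k delta x shrinks like delta x squared.]
```python
def f(x):
    return x * x
def f_prime(x):
    return 2 * x
def k(a, dx):
    # k is defined by: f(a + dx) - f(a) = f'(a)*dx + k*dx
    return (f(a + dx) - f(a) - f_prime(a) * dx) / dx
# Print dx, k, and the error k*dx for shrinking dx.
for dx in (0.1, 0.01, 0.001):
    print(dx, k(1.0, dx), k(1.0, dx) * dx)
```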

105
00:05:24,900 --> 00:05:27,900
And what we mean by
locally is this-- suppose

106
00:05:27,900 --> 00:05:30,970
f prime exists also
when x equals b.

107
00:05:30,970 --> 00:05:35,750
We can again compute delta f
near x equals b. Now delta f

108
00:05:35,750 --> 00:05:37,370
is equal to what?

109
00:05:37,370 --> 00:05:40,610
Approximately f prime
of b times delta

110
00:05:40,610 --> 00:05:44,160
x plus that error term which
goes to 0 very rapidly.

111
00:05:44,160 --> 00:05:50,180
We again call this
thing here delta f tan,

112
00:05:50,180 --> 00:05:54,240
but the thing to keep in mind
is since f prime of a need

113
00:05:54,240 --> 00:06:00,100
not equal f prime of b, delta f
tan is different at a and at b.

114
00:06:00,100 --> 00:06:02,920
In other words, even
though it's always true

115
00:06:02,920 --> 00:06:04,810
where f is
differentiable, that we

116
00:06:04,810 --> 00:06:08,950
can say that delta f is
approximately delta f tan,

117
00:06:08,950 --> 00:06:12,580
the value of delta f
tan depends on the value

118
00:06:12,580 --> 00:06:14,280
of x that we're near.

119
00:06:14,280 --> 00:06:16,170
And that's what
we mean by saying

120
00:06:16,170 --> 00:06:18,750
that approximating
delta f by delta f

121
00:06:18,750 --> 00:06:21,580
tan is a local property.

122
00:06:21,580 --> 00:06:23,830
Now I think that sometimes,
by putting these things

123
00:06:23,830 --> 00:06:26,900
into words, it sounds
harder than it really is.

124
00:06:26,900 --> 00:06:29,460
So I think what might
be nice is if we just

125
00:06:29,460 --> 00:06:33,690
look at a specific illustration,
a problem which I deliberately

126
00:06:33,690 --> 00:06:37,250
picked to be as simple a
non-linear example as I

127
00:06:37,250 --> 00:06:38,210
can think of.

128
00:06:38,210 --> 00:06:41,830
Let me come back
to our old friend,

129
00:06:41,830 --> 00:06:45,040
the function f of x equals
x squared, which as I say,

130
00:06:45,040 --> 00:06:48,850
is about as simple a non-linear
function as we can get into.

131
00:06:48,850 --> 00:06:52,030
Now we know that f of x equals
x squared plots as the curve y

132
00:06:52,030 --> 00:06:54,930
equals x squared, the parabola.

133
00:06:54,930 --> 00:06:57,360
Let's take a couple of
points on this parabola.

134
00:06:57,360 --> 00:07:01,870
Let's say the point 1 comma
1 and the point 2 comma 4.

135
00:07:01,870 --> 00:07:08,480
Draw in the tangent lines to
the curve at these two points.

136
00:07:08,480 --> 00:07:09,470
And we know what?

137
00:07:09,470 --> 00:07:13,110
That the equation of the tangent
line to the curve at (1, 1)

138
00:07:13,110 --> 00:07:17,630
is y minus 1 over x
minus 1 equals the slope.

139
00:07:17,630 --> 00:07:21,750
Since y is equal to x squared,
the slope is 2x; when x is 1

140
00:07:21,750 --> 00:07:22,970
the slope is 2.

141
00:07:22,970 --> 00:07:26,640
So the equation of this tangent
line is given by y minus 1

142
00:07:26,640 --> 00:07:28,920
over x minus 1 equals 2.

143
00:07:28,920 --> 00:07:31,790
At the point corresponding
to x equals 2,

144
00:07:31,790 --> 00:07:36,250
2x is 4, so the equation of the
tangent line here is y minus 4

145
00:07:36,250 --> 00:07:38,870
over x minus 2 equals 4.

146
00:07:38,870 --> 00:07:41,150
So now I've induced
three functions

147
00:07:41,150 --> 00:07:42,400
that I can talk about.

148
00:07:42,400 --> 00:07:46,230
My original function,
f of x is x squared.

149
00:07:46,230 --> 00:07:49,990
This straight line is the
linear function-- just solving

150
00:07:49,990 --> 00:07:55,860
for y in terms of x-- g
of x equals 4x minus 4.

151
00:07:55,860 --> 00:07:59,770
And this straight line
corresponds to the function h

152
00:07:59,770 --> 00:08:03,059
of x equals 2x minus 1.
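[Editor's sketch, not part of the lecture: one way to reproduce the
two tangent lines in Python, assuming f(x) = x squared. The function
name tangent_line is illustrative.]
```python
def tangent_line(a):
    # For f(x) = x**2 the slope at x = a is 2a and the line passes
    # through (a, a**2), so y = 2a*x - a**2. Returns (slope, intercept).
    slope = 2 * a
    intercept = a * a - slope * a
    return slope, intercept
print(tangent_line(1))  # coefficients of h
print(tangent_line(2))  # coefficients of g
```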

153
00:08:03,059 --> 00:08:04,600
Now the interesting
point, of course,

154
00:08:04,600 --> 00:08:06,754
is that these two
functions here are linear.

155
00:08:06,754 --> 00:08:08,420
They are completely
different functions.

156
00:08:08,420 --> 00:08:10,530
Notice not only pictorially
are they different,

157
00:08:10,530 --> 00:08:13,200
but algebraically their
slopes are different,

158
00:08:13,200 --> 00:08:16,470
and their y-intercepts
are different,

159
00:08:16,470 --> 00:08:19,340
and back in our
course in part one,

160
00:08:19,340 --> 00:08:21,910
we talked about things
geometrically saying,

161
00:08:21,910 --> 00:08:25,140
lookit, near the
point of tangency,

162
00:08:25,140 --> 00:08:28,820
the tangent line serves
as a good approximation

163
00:08:28,820 --> 00:08:30,030
to the curve itself.

164
00:08:30,030 --> 00:08:31,780
What were we really saying then?

165
00:08:31,780 --> 00:08:35,692
What we were saying was that
near the point of tangency,

166
00:08:35,692 --> 00:08:39,690
g of x, which was
a linear function,

167
00:08:39,690 --> 00:08:43,520
could replace f of x, which
was a non-linear function.

168
00:08:43,520 --> 00:08:45,320
Of course, when we
moved too far away

169
00:08:45,320 --> 00:08:48,680
from a given point, then when
we said that f of x still

170
00:08:48,680 --> 00:08:50,880
had a linear
approximation, we had

171
00:08:50,880 --> 00:08:53,780
to pick a different
linear function.

172
00:08:53,780 --> 00:08:55,390
By the way, again
because we were

173
00:08:55,390 --> 00:08:57,700
dealing with one
independent variable and one

174
00:08:57,700 --> 00:09:00,490
dependent variable, it
was very easy to invent

175
00:09:00,490 --> 00:09:02,280
the concept of a graph.

176
00:09:02,280 --> 00:09:06,320
As we shall show in a little
while, the concept of linearity

177
00:09:06,320 --> 00:09:08,860
extends to several
variables, but you

178
00:09:08,860 --> 00:09:10,880
can't draw the graph as nicely.

179
00:09:10,880 --> 00:09:15,530
So let me now revisit the
same result here, only

180
00:09:15,530 --> 00:09:17,730
without reference to the graph.

181
00:09:17,730 --> 00:09:21,120
What we're saying
is that our function

182
00:09:21,120 --> 00:09:24,555
is mapping the real number
line into the real number line.

183
00:09:24,555 --> 00:09:26,680
In other words, instead of
putting x and y at right

184
00:09:26,680 --> 00:09:31,800
angles to each other, let's put
x and y horizontally parallel

185
00:09:31,800 --> 00:09:32,840
to one another.

186
00:09:32,840 --> 00:09:39,060
And what we're saying is that
f maps the interval from 0 to 2

187
00:09:39,060 --> 00:09:43,370
onto the interval from 0 to 4.

188
00:09:43,370 --> 00:09:44,750
Now what does h do?

189
00:09:44,750 --> 00:09:47,760
Remember h is the
function 2x minus 1.

190
00:09:47,760 --> 00:09:51,620
h maps the interval from
0 to 2 onto the interval

191
00:09:51,620 --> 00:09:54,770
from minus 1 to 3.

192
00:09:54,770 --> 00:09:58,700
And you see this is all this
diagram means. f maps 0 into 0,

193
00:09:58,700 --> 00:10:03,340
it maps 1 into 1, it
maps 2 into 4, et cetera.

194
00:10:03,340 --> 00:10:06,070
In other words, f is the
function which squares

195
00:10:06,070 --> 00:10:08,140
the input to yield the output.

196
00:10:08,140 --> 00:10:13,430
And correspondingly, h maps 0
into minus 1, it maps 1 into 1,

197
00:10:13,430 --> 00:10:16,190
and it maps 2 into 3.

198
00:10:16,190 --> 00:10:18,110
Now the interesting
point is that f and h

199
00:10:18,110 --> 00:10:19,570
are very different.

200
00:10:19,570 --> 00:10:23,710
In fact, the only time f
and h have the same output

201
00:10:23,710 --> 00:10:27,610
is when x equals 1.

202
00:10:27,610 --> 00:10:29,310
Which, of course,
we knew from before,

203
00:10:29,310 --> 00:10:31,730
because how was h
of x constructed?

204
00:10:31,730 --> 00:10:36,300
h of x was constructed to be the
line tangent to the parabola y

205
00:10:36,300 --> 00:10:39,600
equals x squared at the
point x equals 1, y equals 1.

206
00:10:39,600 --> 00:10:41,830
So that should be
no great surprise.

207
00:10:41,830 --> 00:10:45,040
But if we didn't know that,
notice that algebraically, we

208
00:10:45,040 --> 00:10:48,450
could equate f of x to h
of x, conclude, therefore,

209
00:10:48,450 --> 00:10:51,910
that that means x squared
must equal 2x minus 1.

210
00:10:51,910 --> 00:10:57,610
We then transpose, and get that
x minus 1 squared must be 0,

211
00:10:57,610 --> 00:11:00,290
whence x must equal 1.

212
00:11:00,290 --> 00:11:03,890
And what we have is
that near x equals 1,

213
00:11:03,890 --> 00:11:08,080
x squared behaves like-- and
I put this in quotation marks,

214
00:11:08,080 --> 00:11:11,210
because that's the hardest
part of the course that's

215
00:11:11,210 --> 00:11:14,350
going to follow, was what do
you mean by behaves like-- but x

216
00:11:14,350 --> 00:11:17,370
squared behaves like 2x minus 1.

217
00:11:17,370 --> 00:11:19,260
And what we mean by
that is this, at least

218
00:11:19,260 --> 00:11:20,740
in terms of a picture.

219
00:11:20,740 --> 00:11:24,130
If I pick a small
interval surrounding

220
00:11:24,130 --> 00:11:28,020
x equals 1 on the x-axis,
and a small interval--

221
00:11:28,020 --> 00:11:33,660
like a thick dot-- surrounding
y equals 1 on the y-axis here.

222
00:11:33,660 --> 00:11:40,090
Then, as a mapping from
this domain into this range,

223
00:11:40,090 --> 00:11:45,520
I can essentially not
distinguish f from h.

224
00:11:45,520 --> 00:11:49,800
The error is so small that
as the size of the interval

225
00:11:49,800 --> 00:11:53,370
shrinks, the error
goes to 0 even faster.

226
00:11:53,370 --> 00:11:56,900
And therefore, if I stay
close enough, locally,

227
00:11:56,900 --> 00:11:59,520
to the point in question-- if
I stay close enough to this

228
00:11:59,520 --> 00:12:01,880
point, I cannot tell
the difference between

229
00:12:01,880 --> 00:12:05,140
the non-linear function
and the linear function.

230
00:12:05,140 --> 00:12:07,290
But what I have to be
careful about is this--

231
00:12:07,290 --> 00:12:11,350
that whereas x squared can
be replaced by 2x minus 1

232
00:12:11,350 --> 00:12:15,120
near x equals 1,
near x equals 2,

233
00:12:15,120 --> 00:12:18,860
x squared can be replaced again
by a linear function, namely

234
00:12:18,860 --> 00:12:20,230
4x minus 4.

235
00:12:20,230 --> 00:12:23,610
But 4x minus 4 is
not approximately

236
00:12:23,610 --> 00:12:26,950
the same as 2x minus 1,
no matter where you look.

237
00:12:26,950 --> 00:12:29,460
You might say well, lookit,
don't these two straight lines

238
00:12:29,460 --> 00:12:31,010
intersect at the
particular point?

239
00:12:31,010 --> 00:12:33,050
The answer is yes they do.

240
00:12:33,050 --> 00:12:35,610
But even at the point
that they intersect,

241
00:12:35,610 --> 00:12:38,410
there is no
neighborhood in which

242
00:12:38,410 --> 00:12:42,410
these lines can serve as
approximations for one another.

243
00:12:42,410 --> 00:12:44,220
Those are two
straight lines that

244
00:12:44,220 --> 00:12:46,490
intersect at a constant
angle, and as soon

245
00:12:46,490 --> 00:12:48,430
as you leave the
point of intersection

246
00:12:48,430 --> 00:12:50,050
there is a significant error.

247
00:12:50,050 --> 00:12:53,860
Meaning an error which
does not go to 0 more

248
00:12:53,860 --> 00:12:56,260
rapidly than the change in x.

249
00:12:56,260 --> 00:12:59,820
You don't have that higher-order
infinitesimal over here.
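[Editor's sketch, not part of the lecture: a numerical comparison,
assuming the f, g and h defined earlier. The two ratio helpers are
illustrative names. Near x = 1 the gap between f and h, divided by
delta x, shrinks with delta x; near x = 1.5, where the two lines
cross, the gap between g and h divided by delta x stays at 2.]
```python
def f(x):
    return x * x        # the parabola
def h(x):
    return 2 * x - 1    # tangent line at x = 1
def g(x):
    return 4 * x - 4    # tangent line at x = 2
def tangent_error_ratio(dx):
    # |f - h| / |dx| near the point of tangency x = 1
    return abs(f(1 + dx) - h(1 + dx)) / abs(dx)
def line_gap_ratio(dx):
    # |g - h| / |dx| near x = 1.5, where the two lines cross
    return abs(g(1.5 + dx) - h(1.5 + dx)) / abs(dx)
for dx in (0.1, 0.01, 0.001):
    print(dx, tangent_error_ratio(dx), line_gap_ratio(dx))
```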

250
00:12:59,820 --> 00:13:02,470
At any rate, leaving
this to the exercises

251
00:13:02,470 --> 00:13:04,880
and the supplementary notes,
for you to get more out of,

252
00:13:04,880 --> 00:13:08,810
in summary, let's just say
this: if f is continuously

253
00:13:08,810 --> 00:13:12,530
differentiable at x equals a,
then locally-- meaning near x

254
00:13:12,530 --> 00:13:16,060
equals a-- f behaves linearly.

255
00:13:16,060 --> 00:13:17,030
In other words,

256
00:13:17,030 --> 00:13:21,970
f of x is approximately f
of a plus f prime of a times

257
00:13:21,970 --> 00:13:26,880
the quantity x minus a, and you
see, once a is chosen,

258
00:13:26,880 --> 00:13:29,830
this is a number,
this is a number,

259
00:13:29,830 --> 00:13:32,080
delta x here is
the only variable

260
00:13:32,080 --> 00:13:33,440
on the right-hand side.

261
00:13:33,440 --> 00:13:36,960
So what we're saying is
that f of x is a what?

262
00:13:36,960 --> 00:13:39,770
Linear function of delta x.

263
00:13:39,770 --> 00:13:42,160
And the more
interesting point is--

264
00:13:42,160 --> 00:13:44,860
since this is all review,
so I say-- what I mean

265
00:13:44,860 --> 00:13:46,509
by interesting point is what?

266
00:13:46,509 --> 00:13:48,300
That we don't have to
just review this way,

267
00:13:48,300 --> 00:13:50,330
we did this simply to
refresh your memories

268
00:13:50,330 --> 00:13:53,300
as to how linearity was
playing a big role in calculus

269
00:13:53,300 --> 00:13:54,800
of a single variable.

270
00:13:54,800 --> 00:13:57,900
Now what we're going to
do is extend the result

271
00:13:57,900 --> 00:14:00,190
to several variables.

272
00:14:00,190 --> 00:14:02,420
Let me just say
that at the outset.

273
00:14:02,420 --> 00:14:05,240
That this concept does
extend to n variables,

274
00:14:05,240 --> 00:14:08,270
but n equals 2 yields
a particularly good

275
00:14:08,270 --> 00:14:10,010
geometric insight.

276
00:14:10,010 --> 00:14:13,870
For example, let's suppose I
look at two equations and two

277
00:14:13,870 --> 00:14:15,500
unknowns.

278
00:14:15,500 --> 00:14:19,002
Well actually, I'll
use u and v instead.

279
00:14:19,002 --> 00:14:19,960
Let those be variables.

280
00:14:19,960 --> 00:14:21,960
Also, we can think of
this as a function.

281
00:14:21,960 --> 00:14:25,540
I have u of x, y is x
squared minus y squared,

282
00:14:25,540 --> 00:14:28,510
whereas v of x, y is 2x*y.

283
00:14:28,510 --> 00:14:30,960
Notice that these
are not linear,

284
00:14:30,960 --> 00:14:32,640
because here we have
things appearing

285
00:14:32,640 --> 00:14:36,540
to the second power, squares,
and here we have what?

286
00:14:36,540 --> 00:14:38,560
The variables
multiplying one another.

287
00:14:38,560 --> 00:14:42,230
These are not linear equations,
but the beautiful point

288
00:14:42,230 --> 00:14:45,830
is-- if you look at it this way--
is even without a picture,

289
00:14:45,830 --> 00:14:47,720
I can think of this
as a mapping which

290
00:14:47,720 --> 00:14:50,630
maps two-dimensional space
into two-dimensional space.

291
00:14:50,630 --> 00:14:52,540
And how does this
mapping take place?

292
00:14:52,540 --> 00:14:55,720
It maps the point or the
pair, or the 2-tuple--

293
00:14:55,720 --> 00:14:57,220
whichever way you
want to say it--

294
00:14:57,220 --> 00:15:01,410
x comma y into the
2-tuple u comma v,

295
00:15:01,410 --> 00:15:06,510
where u is x squared minus
y squared, and v is 2x*y.

296
00:15:06,510 --> 00:15:07,790
In other words,

297
00:15:07,790 --> 00:15:10,590
f-bar-- and notice I
put the bar underneath,

298
00:15:10,590 --> 00:15:13,810
simply to indicate that
E^2 is a vector space,

299
00:15:13,810 --> 00:15:15,700
and we have a function
that's mapping what?

300
00:15:15,700 --> 00:15:20,890
A vector into a vector, so I
indicate that f is a vector

301
00:15:20,890 --> 00:15:21,570
function here.

302
00:15:21,570 --> 00:15:23,870
It maps a vector into a vector.

303
00:15:23,870 --> 00:15:25,730
And how does the
mapping take place?

304
00:15:25,730 --> 00:15:30,670
It maps the 2-tuple x comma
y into the 2-tuple x squared

305
00:15:30,670 --> 00:15:33,700
minus y squared comma 2x*y.

306
00:15:33,700 --> 00:15:35,630
(u, v).

307
00:15:35,630 --> 00:15:38,380
Now, the thing is that as long
as we only have n equals 2,

308
00:15:38,380 --> 00:15:42,650
we can still draw a picture, but
not a picture as nice as what

309
00:15:42,650 --> 00:15:45,470
existed when n was equal to 1.

310
00:15:45,470 --> 00:15:49,500
See, pictorially,
f-bar maps the xy-plane

311
00:15:49,500 --> 00:15:52,350
into what we can
call the uv-plane.

312
00:15:52,350 --> 00:15:55,880
But notice that since
the domain of f-bar

313
00:15:55,880 --> 00:15:59,170
has two degrees of freedom-- a
two-dimensional vector space--

314
00:15:59,170 --> 00:16:03,360
notice that the domain of
f-bar is the entire xy-plane,

315
00:16:03,360 --> 00:16:07,610
whereas the range of f-bar
is the entire uv-plane.

316
00:16:07,610 --> 00:16:09,880
In other words, I
can now view f-bar

317
00:16:09,880 --> 00:16:13,300
as a mapping which carries
points in the xy-plane

318
00:16:13,300 --> 00:16:15,600
into points in the uv-plane.

319
00:16:15,600 --> 00:16:19,020
And this will be exploited
more later in the course,

320
00:16:19,020 --> 00:16:20,550
but the idea is this.

321
00:16:20,550 --> 00:16:22,470
Let's take a look
for the time being.

322
00:16:22,470 --> 00:16:26,520
Let's see what f-bar does
to the point 2 comma 1.

323
00:16:26,520 --> 00:16:30,350
Remember u is x squared
minus y squared,

324
00:16:30,350 --> 00:16:33,310
so at the point 2 comma
1, u becomes what?

325
00:16:33,310 --> 00:16:36,840
2 squared minus 1
squared, which is 3.

326
00:16:36,840 --> 00:16:42,360
On the other hand, 2x*y is 2
times 2 times 1, which is 4.

327
00:16:42,360 --> 00:16:45,930
So f-bar can be viewed as
mapping the point 2 comma

328
00:16:45,930 --> 00:16:50,010
1 into the point 3 comma 4.
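[Editor's sketch, not part of the lecture: the mapping written as a
Python function. The name f_bar follows the lecture's notation; the
code itself is only an illustration.]
```python
def f_bar(x, y):
    # Maps the pair (x, y) to the pair (u, v).
    u = x * x - y * y
    v = 2 * x * y
    return u, v
print(f_bar(2, 1))
```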

329
00:16:50,010 --> 00:16:52,610
Now you recall
that calculus isn't

330
00:16:52,610 --> 00:16:55,470
interested in what's happening
at a particular point.

331
00:16:55,470 --> 00:16:58,630
It's interested in what's
happening in the neighborhood

332
00:16:58,630 --> 00:17:00,050
of a particular point.

333
00:17:00,050 --> 00:17:03,700
So the major question
is, how does f-bar behave

334
00:17:03,700 --> 00:17:06,390
near the point 2 comma 1.

335
00:17:06,390 --> 00:17:08,750
In other words,
what is f-bar of 2

336
00:17:08,750 --> 00:17:13,970
plus delta x comma 1 plus delta
y, when delta x and delta y are

337
00:17:13,970 --> 00:17:14,710
quite small.

338
00:17:14,710 --> 00:17:17,700
That's the question that
we're raising over here.

339
00:17:17,700 --> 00:17:21,250
What we're saying is, we know
that 2 comma 1 maps into 3

340
00:17:21,250 --> 00:17:22,790
comma 4.

341
00:17:22,790 --> 00:17:26,109
We also know or we'd like to
believe that a point near 2

342
00:17:26,109 --> 00:17:30,050
comma 1 maps into a
point near 3 comma 4.

343
00:17:30,050 --> 00:17:32,120
Well if we call
this point 2 plus

344
00:17:32,120 --> 00:17:35,030
delta x comma 1
plus delta y, then

345
00:17:35,030 --> 00:17:37,100
the corresponding
image over here

346
00:17:37,100 --> 00:17:42,390
should be 3 plus delta
u comma 4 plus delta v.

347
00:17:42,390 --> 00:17:46,190
What we can say is that
whatever the image of 2

348
00:17:46,190 --> 00:17:48,930
plus delta x comma
1 plus delta y

349
00:17:48,930 --> 00:17:54,120
is, it has the form 3 plus
delta u comma 4 plus delta v,

350
00:17:54,120 --> 00:17:57,970
and all we have to do is find
delta u and delta v. This

351
00:17:57,970 --> 00:18:00,720
is the pictorial idea
of what's happening.

352
00:18:00,720 --> 00:18:03,050
Now the point is that
delta u and delta

353
00:18:03,050 --> 00:18:04,920
v are very difficult to find.

354
00:18:04,920 --> 00:18:07,840
After all, u and v are
non-linear functions.

355
00:18:07,840 --> 00:18:10,140
To invert them is
either difficult,

356
00:18:10,140 --> 00:18:14,450
or downright impossible, one
or the other, in many cases.

357
00:18:14,450 --> 00:18:19,430
The thing that's easy to find
is delta u tan, and delta v tan.

358
00:18:19,430 --> 00:18:23,460
Remember delta u tan was the
partial of u with respect to x

359
00:18:23,460 --> 00:18:26,400
times delta x, plus the
partial of u with respect to y

360
00:18:26,400 --> 00:18:27,730
times delta y.

361
00:18:27,730 --> 00:18:30,620
Since u is equal to x
squared minus y squared,

362
00:18:30,620 --> 00:18:35,240
that means delta u tan is
2x delta x minus 2y delta y.

363
00:18:35,240 --> 00:18:38,640
We're interested in this
at the point 2 comma 1.

364
00:18:38,640 --> 00:18:43,360
Letting x be 2, and y be
1, we see that delta u tan

365
00:18:43,360 --> 00:18:46,580
is 4 delta x minus 2 delta y.

366
00:18:46,580 --> 00:18:49,810
Similarly, since v
is equal to 2x*y,

367
00:18:49,810 --> 00:18:52,790
the partial of v with
respect to x is 2y;

368
00:18:52,790 --> 00:18:55,830
the partial of v with
respect to y is 2x.

369
00:18:55,830 --> 00:19:01,230
Therefore, delta v sub tan is
2y delta x plus 2x delta y.

370
00:19:01,230 --> 00:19:03,450
Since we're evaluating
this at x equals 2,

371
00:19:03,450 --> 00:19:08,560
y equals 1, we see that delta v
tan is 2 delta x plus 4 delta

372
00:19:08,560 --> 00:19:09,500
y.

373
00:19:09,500 --> 00:19:11,390
Now here's the key point.

374
00:19:11,390 --> 00:19:14,270
This is always delta u tan.

375
00:19:14,270 --> 00:19:16,600
This is always delta v tan.

376
00:19:16,600 --> 00:19:19,030
Where the local
thing comes in is

377
00:19:19,030 --> 00:19:22,220
that we know that because
u and v are continuously

378
00:19:22,220 --> 00:19:26,950
differentiable functions of x
and y, that near the point 2

379
00:19:26,950 --> 00:19:32,380
comma 1, we can replace delta
u by delta u sub tan, delta v

380
00:19:32,380 --> 00:19:36,010
by delta v sub tan, and
we wind up with what?

381
00:19:36,010 --> 00:19:40,110
delta u is approximately
4 delta x minus 2 delta y.

382
00:19:40,110 --> 00:19:43,780
delta v is approximately
2 delta x plus 4 delta y.
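[Editor's sketch, not part of the lecture: comparing the exact
changes in u and v near (2, 1) with the tangent approximations just
stated. The helper names exact_change and tangent_change are
illustrative; the step sizes are arbitrary choices.]
```python
def f_bar(x, y):
    return x * x - y * y, 2 * x * y
def exact_change(dx, dy):
    # True (delta u, delta v) when moving from (2, 1) to (2 + dx, 1 + dy).
    u0, v0 = f_bar(2, 1)
    u1, v1 = f_bar(2 + dx, 1 + dy)
    return u1 - u0, v1 - v0
def tangent_change(dx, dy):
    # Partials at (2, 1): u_x = 4, u_y = -2, v_x = 2, v_y = 4.
    return 4 * dx - 2 * dy, 2 * dx + 4 * dy
# The two results should agree more and more closely as the step shrinks.
for step in (0.1, 0.01, 0.001):
    print(step, exact_change(step, 2 * step), tangent_change(step, 2 * step))
```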

383
00:19:43,780 --> 00:19:49,470
But the key point
now is that this is

384
00:19:49,470 --> 00:19:51,760
a system of linear equations.

385
00:19:51,760 --> 00:19:54,620
You see, delta u is
a linear combination

386
00:19:54,620 --> 00:19:57,130
of delta x and
delta y, and delta v

387
00:19:57,130 --> 00:20:01,590
is also a linear combination
of delta x and delta y.

388
00:20:01,590 --> 00:20:03,530
In other words,
as long as u and v

389
00:20:03,530 --> 00:20:07,460
are continuously differentiable
functions of x and y,

390
00:20:07,460 --> 00:20:11,650
we can approximate,
locally, delta u and delta

391
00:20:11,650 --> 00:20:15,070
v by linear approximations.

392
00:20:15,070 --> 00:20:18,310
Notice how linear
systems come into play.

393
00:20:18,310 --> 00:20:21,300
Now I've been emphasizing
the case n equals 2 just

394
00:20:21,300 --> 00:20:23,020
so we could draw a picture.

395
00:20:23,020 --> 00:20:26,150
Notice that no matter how
many variables we have-- well,

396
00:20:26,150 --> 00:20:29,820
in fact, let me just summarize
this in terms of x and y first.

397
00:20:29,820 --> 00:20:34,090
And then we'll generalize it
to n variables in a minute.

398
00:20:34,090 --> 00:20:36,410
The key point for two
variables, and what

399
00:20:36,410 --> 00:20:40,250
happens for two variables
happens for any number.

400
00:20:40,250 --> 00:20:41,950
But as we've often
done in this course,

401
00:20:41,950 --> 00:20:44,260
we emphasize the
two-variable case

402
00:20:44,260 --> 00:20:46,950
because we can still
visualize the picture.

403
00:20:46,950 --> 00:20:50,320
Even though the graph
idea is hard to see,

404
00:20:50,320 --> 00:20:53,910
because we're mapping two
dimensions into two dimensions.

405
00:20:53,910 --> 00:20:56,520
But at least the
domain and the range

406
00:20:56,520 --> 00:20:58,970
are easy to see
separately, but if u

407
00:20:58,970 --> 00:21:00,920
is a continuously
differentiable function

408
00:21:00,920 --> 00:21:05,280
of x and y near the
point (x_0, y_0),

409
00:21:05,280 --> 00:21:10,850
then delta u is exactly the
partial of u with respect to x

410
00:21:10,850 --> 00:21:14,630
times delta x, plus the partial
of u with respect to y times

411
00:21:14,630 --> 00:21:20,440
delta y, plus an error term,
k_1 delta x plus k_2 delta y,

412
00:21:20,440 --> 00:21:25,610
where k_1 and k_2 go to 0, as
delta x and delta y go to 0.
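[Editor's sketch, not part of the lecture: the error term for the
earlier example u = x squared minus y squared, assumed here purely
for illustration. For this u the error is delta x squared minus
delta y squared, so k_1 = delta x and k_2 = -delta y is one valid
choice; the choice of k_1 and k_2 is not unique.]
```python
def u(x, y):
    return x * x - y * y
def error_term(x0, y0, dx, dy):
    # delta u minus its tangent (linear) part at (x0, y0).
    delta_u = u(x0 + dx, y0 + dy) - u(x0, y0)
    delta_u_tan = 2 * x0 * dx - 2 * y0 * dy
    return delta_u - delta_u_tan
# Compare the measured error with k_1*dx + k_2*dy for k_1 = dx, k_2 = -dy.
for dx, dy in ((0.1, 0.2), (0.01, 0.02)):
    k1, k2 = dx, -dy
    print(error_term(2, 1, dx, dy), k1 * dx + k2 * dy)
```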

413
00:21:25,610 --> 00:21:27,490
In other words,

414
00:21:27,490 --> 00:21:31,530
if we just look at
this part alone,

415
00:21:31,530 --> 00:21:37,000
delta u is linear up to
this as a correction term.

416
00:21:37,000 --> 00:21:40,450
In other words, the
non-linearity part of delta u

417
00:21:40,450 --> 00:21:44,060
is going to 0 as a
second-order infinitesimal,

418
00:21:44,060 --> 00:21:46,470
and the reason I keep
harping on this point

419
00:21:46,470 --> 00:21:49,140
is that no matter how
complex the theory gets,

420
00:21:49,140 --> 00:21:51,520
in the rest of this
particular block,

421
00:21:51,520 --> 00:21:54,050
the key step is
always going to be

422
00:21:54,050 --> 00:21:55,810
that when you have
a continuously

423
00:21:55,810 --> 00:21:59,580
differentiable function you
can essentially-- as long as you

424
00:21:59,580 --> 00:22:01,380
stay locally-- you
can essentially

425
00:22:01,380 --> 00:22:02,910
throw away the nasty part.

426
00:22:02,910 --> 00:22:05,730
You can essentially throw
away this error term,

427
00:22:05,730 --> 00:22:08,720
because it goes to 0 so
rapidly that if you stay close

428
00:22:08,720 --> 00:22:12,170
enough to the point
x_0, y_0, no harm comes

429
00:22:12,170 --> 00:22:13,870
from neglecting this term.

430
00:22:13,870 --> 00:22:15,850
What you must be
careful about is

431
00:22:15,850 --> 00:22:18,490
that as soon as you pick a
large enough neighborhood so

432
00:22:18,490 --> 00:22:21,040
that this term is no
longer negligible, then

433
00:22:21,040 --> 00:22:25,250
even though this part here
is still delta u sub tan,

434
00:22:25,250 --> 00:22:29,880
delta u sub tan is no longer a
good approximation for delta u.
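[Editor's sketch] A small numerical check of this point, assuming the made-up function u(x, y) = x^2 * y near (x_0, y_0) = (1, 2), which is not from the lecture. It compares delta u with delta u sub tan for a large step and for small ones.
```python
# Hypothetical example function (an assumption, not from the lecture).
def u(x, y):
    return x * x * y
def delta_u(x0, y0, dx, dy):
    # Exact change in u between (x0, y0) and (x0 + dx, y0 + dy).
    return u(x0 + dx, y0 + dy) - u(x0, y0)
def delta_u_tan(x0, y0, dx, dy):
    # Linear part u_x * dx + u_y * dy, with u_x = 2xy and u_y = x^2.
    return 2 * x0 * y0 * dx + x0 * x0 * dy
def error(h):
    # Size of the neglected term for a step of h in both x and y.
    return abs(delta_u(1.0, 2.0, h, h) - delta_u_tan(1.0, 2.0, h, h))
for h in (1.0, 0.1, 0.01):
    # error(h) / h shrinks with h: the error goes to 0 faster than the step.
    print(h, error(h), error(h) / h)
```
For this particular u the neglected term works out to 4h^2 + h^3, so it is as large as the linear part at h = 1 and negligible at h = 0.01.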

435
00:22:29,880 --> 00:22:33,580
At any rate, in n variables,
what we're saying is,

436
00:22:33,580 --> 00:22:37,280
suppose w is a function
of x_1 up to x_n.

437
00:22:37,280 --> 00:22:39,960
Then if w happens
to be continuously

438
00:22:39,960 --> 00:22:42,540
differentiable at the
point corresponding

439
00:22:42,540 --> 00:22:46,850
to x-bar equals a-bar--
meaning, in terms of n-tuples,

440
00:22:46,850 --> 00:22:50,510
x_1 up to x_n is the
point a_1 comma up

441
00:22:50,510 --> 00:22:55,110
to a_n-- then what we're saying
is that delta w can be replaced

442
00:22:55,110 --> 00:22:57,710
by-- now this has been
mentioned in the text,

443
00:22:57,710 --> 00:23:00,100
I don't remember whether
we've mentioned this

444
00:23:00,100 --> 00:23:02,300
in previous lectures or not.

445
00:23:02,300 --> 00:23:03,780
It's rather
interesting that when

446
00:23:03,780 --> 00:23:07,660
you deal with more than three
independent variables we

447
00:23:07,660 --> 00:23:11,340
somehow don't like to use
the word delta w sub tan.

448
00:23:11,340 --> 00:23:15,470
Because tangent indicates a
tangent line or a tangent plane

449
00:23:15,470 --> 00:23:17,280
which is a geometric concept.

450
00:23:17,280 --> 00:23:23,430
Instead we replace the
word tangent by L-I-N

451
00:23:23,430 --> 00:23:26,020
as an abbreviation for linear.

452
00:23:26,020 --> 00:23:27,550
The key point being what?

453
00:23:27,550 --> 00:23:30,730
That this thing that we
call delta w sub lin,

454
00:23:30,730 --> 00:23:33,260
or if you like to call it
sub tan, what's in a name?

455
00:23:33,260 --> 00:23:34,800
Call it whatever you want.

456
00:23:34,800 --> 00:23:37,975
The point is that this thing
that we call delta w sub

457
00:23:37,975 --> 00:23:42,370
lin or delta w sub tan is
the partial of f with respect

458
00:23:42,370 --> 00:23:46,770
to x_1 evaluated at
a-bar times delta x_1,

459
00:23:46,770 --> 00:23:49,390
plus the partial
of f with respect

460
00:23:49,390 --> 00:23:53,970
to x sub n evaluated at
a-bar times delta x_n.

461
00:23:53,970 --> 00:23:56,060
And the key point
is that once you

462
00:23:56,060 --> 00:23:59,770
have chosen a
specific number a-bar,

463
00:23:59,770 --> 00:24:03,770
notice that the coefficients
of delta x_1 up to delta x_n

464
00:24:03,770 --> 00:24:05,940
are numbers.

465
00:24:05,940 --> 00:24:07,060
They're not variables.

466
00:24:07,060 --> 00:24:09,500
They are numbers
once a is chosen.

467
00:24:09,500 --> 00:24:13,360
So that what is delta w lin,
why do we call it linear?

468
00:24:13,360 --> 00:24:17,510
Notice that this expression
here is a linear combination

469
00:24:17,510 --> 00:24:19,790
of delta x_1 up to delta x_n.

470
00:24:19,790 --> 00:24:21,110
In other words they're what?

471
00:24:21,110 --> 00:24:25,980
Sums of terms each involving
a delta x times-- excuse me.

472
00:24:25,980 --> 00:24:29,770
A delta x times a constant.
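[Editor's sketch] One way to see the linear combination: once a-bar is fixed, the partials are plain numbers, and delta w sub lin is their dot product with the delta x's. The function name and the coefficient values below are assumptions made up for illustration.
```python
def delta_w_lin(partials_at_a, deltas):
    # Sum of (constant coefficient) * (delta x_i): a linear combination.
    if len(partials_at_a) != len(deltas):
        raise ValueError("need one delta x per partial derivative")
    return sum(c * dx for c, dx in zip(partials_at_a, deltas))
# Hypothetical partials of w evaluated at a-bar (assumed numbers, n = 3).
coeffs = [2.0, -1.0, 0.5]
step = [0.1, 0.2, 0.4]
print(delta_w_lin(coeffs, step))
# Doubling every delta x doubles the result, which is what linearity means.
print(delta_w_lin(coeffs, [2 * d for d in step]))
```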

473
00:24:29,770 --> 00:24:32,470
What we're saying is
that nice functions,

474
00:24:32,470 --> 00:24:34,240
and what's a nice function?

475
00:24:34,240 --> 00:24:37,830
A nice function is one which
is continuously differentiable.

476
00:24:37,830 --> 00:24:41,550
A nice function
is locally linear.

477
00:24:41,550 --> 00:24:45,840
In other words, a continuously
differentiable function,

478
00:24:45,840 --> 00:24:51,390
near a particular point,
can be approximated

479
00:24:51,390 --> 00:24:54,460
by a linear function,
where the error will

480
00:24:54,460 --> 00:24:56,590
be very small as
long as you stay

481
00:24:56,590 --> 00:24:58,025
near the point in question.

482
00:24:58,025 --> 00:24:59,900
You remember, at the
beginning of my lecture,

483
00:24:59,900 --> 00:25:02,510
I said something
old, something new.

484
00:25:02,510 --> 00:25:06,020
This finishes the old
part of the course.

485
00:25:06,020 --> 00:25:09,020
In other words, what I've
tried to motivate for you here

486
00:25:09,020 --> 00:25:13,850
is why, if we were remodeling
the pre-calculus curriculum,

487
00:25:13,850 --> 00:25:17,500
much more emphasis should
be paid to linear equations.

488
00:25:17,500 --> 00:25:22,210
Granted that most functions
in real life are non-linear,

489
00:25:22,210 --> 00:25:27,340
the point remains that
locally, functions are linear.

490
00:25:27,340 --> 00:25:28,630
OK?

491
00:25:28,630 --> 00:25:30,870
That's the key point.

492
00:25:30,870 --> 00:25:33,653
Locally we deal with
linear functions.

493
00:25:36,390 --> 00:25:40,010
Therefore, since all
non-linear functions

494
00:25:40,010 --> 00:25:42,790
may be viewed as
being linear locally,

495
00:25:42,790 --> 00:25:44,890
this motivates why we
should really study

496
00:25:44,890 --> 00:25:47,030
systems of linear equations.

497
00:25:47,030 --> 00:25:48,810
In other words, this
motivates the subject

498
00:25:48,810 --> 00:25:50,800
called linear systems.

499
00:25:50,800 --> 00:25:52,630
Now what is a linear system?

500
00:25:52,630 --> 00:25:58,720
Essentially, a linear system
is m equations in n unknowns.

501
00:25:58,720 --> 00:26:01,840
In many cases m and n
are taken to be equal,

502
00:26:01,840 --> 00:26:03,680
but what kind of
equations are they?

503
00:26:03,680 --> 00:26:06,890
They are equations where all
the variables appear separately

504
00:26:06,890 --> 00:26:10,940
to the first power multiplied
only by a constant term,

505
00:26:10,940 --> 00:26:13,670
and by the way, let me
introduce this double subscript

506
00:26:13,670 --> 00:26:18,220
notation rather than introducing
umpteen different symbols

507
00:26:18,220 --> 00:26:19,390
for constants.

508
00:26:19,390 --> 00:26:21,400
Notice that a very
nice device here

509
00:26:21,400 --> 00:26:25,800
is to pick one symbol, like an
a, and then use two subscripts.

510
00:26:25,800 --> 00:26:29,280
The first subscript
telling you what row

511
00:26:29,280 --> 00:26:32,520
the coefficient is referring
to, and the second one

512
00:26:32,520 --> 00:26:33,760
which column.

513
00:26:33,760 --> 00:26:37,030
Or in terms of the equations,
the first subscript

514
00:26:37,030 --> 00:26:39,630
tells you which equation
you're dealing with,

515
00:26:39,630 --> 00:26:42,150
and the second
subscript tells you

516
00:26:42,150 --> 00:26:44,510
what variable it's multiplying.

517
00:26:44,510 --> 00:26:46,360
For example this is what?

518
00:26:46,360 --> 00:26:51,930
This is the coefficient of x
sub 1 in the first equation.

519
00:26:51,930 --> 00:26:57,140
This is the coefficient of x
sub n in the first equation.

520
00:26:57,140 --> 00:27:04,880
This is the coefficient of x
sub n in the m-th equation.

521
00:27:04,880 --> 00:27:08,270
Think of this as the row
and the column if you will.

522
00:27:08,270 --> 00:27:13,270
And what we're saying then is
that the solutions of this type

523
00:27:13,270 --> 00:27:16,460
of system of equations
are really controlled

524
00:27:16,460 --> 00:27:18,820
by the coefficients of the x's.

525
00:27:18,820 --> 00:27:22,480
In other words, by
the numbers a sub ij,

526
00:27:22,480 --> 00:27:26,760
where i and j can take on--
well i takes on all values

527
00:27:26,760 --> 00:27:28,380
from what?

528
00:27:28,380 --> 00:27:39,110
The number of rows. i goes from
1 to m, and j goes from 1 to n.

529
00:27:39,110 --> 00:27:42,070
But the a's become
very important,

530
00:27:42,070 --> 00:27:43,570
and this is what
ultimately is going

531
00:27:43,570 --> 00:27:46,280
to motivate what we
mean by a matrix,

532
00:27:46,280 --> 00:27:48,910
but before I come to that,
let me give you just one

533
00:27:48,910 --> 00:27:54,510
example of what I mean by saying
that the equations are governed

534
00:27:54,510 --> 00:27:59,020
by the coefficients of the
x's, not by the constants

535
00:27:59,020 --> 00:28:00,480
on the right-hand side.

536
00:28:00,480 --> 00:28:02,340
By the way, notice
the convention

537
00:28:02,340 --> 00:28:04,980
that when you have two
equations with two unknowns,

538
00:28:04,980 --> 00:28:07,760
rather than call the
unknowns x_1 and x_2,

539
00:28:07,760 --> 00:28:10,910
it's conventional to call
the unknowns x and y.

540
00:28:10,910 --> 00:28:14,030
Let's take a particularly
simple system here-- x plus y

541
00:28:14,030 --> 00:28:17,530
equals b_1, x
minus y equals b_2.

542
00:28:17,530 --> 00:28:19,540
If we add these
two equations, we

543
00:28:19,540 --> 00:28:23,320
get 2x is b_1 plus
b_2, whereupon

544
00:28:23,320 --> 00:28:26,720
x is b_1 plus b_2 over 2.

545
00:28:26,720 --> 00:28:28,440
If we subtract
the two equations,

546
00:28:28,440 --> 00:28:32,640
we get 2y is b_1
minus b_2, whereupon y

547
00:28:32,640 --> 00:28:35,720
is b_1 minus b_2 over 2.

548
00:28:35,720 --> 00:28:38,950
Notice that this tells us
how to solve for x and y

549
00:28:38,950 --> 00:28:41,060
in terms of b_1 and b_2.

550
00:28:41,060 --> 00:28:44,400
Namely, to find x you take
half the sum of the two b's.

551
00:28:44,400 --> 00:28:47,280
To find y, you take
half the difference.

552
00:28:47,280 --> 00:28:49,710
Now certainly, the
solution depends

553
00:28:49,710 --> 00:28:51,900
on the values of b_1 and b_2.

554
00:28:51,900 --> 00:28:54,520
I'm not saying you don't
change the answers by changing

555
00:28:54,520 --> 00:28:55,860
the constants on this side.

556
00:28:55,860 --> 00:28:59,280
What I am saying is that
the structure by which you

557
00:28:59,280 --> 00:29:02,670
find the answers does not
depend on b_1 and b_2;

558
00:29:02,670 --> 00:29:07,370
it's determined solely by
the coefficients of x and y.

559
00:29:07,370 --> 00:29:08,950
What we're saying
is, no matter what

560
00:29:08,950 --> 00:29:11,690
b_1 and b_2 are in this
particular problem,

561
00:29:11,690 --> 00:29:15,400
to find x and y we take half
the sum of the b's, and we

562
00:29:15,400 --> 00:29:17,300
take half the difference.

563
00:29:17,300 --> 00:29:18,860
In other words, the
solution depends

564
00:29:18,860 --> 00:29:24,232
on b_1 and b_2 numerically,
but not structurally.
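[Editor's sketch] The same recipe written as a function, to show that b_1 and b_2 change the numbers but not the steps. The name solve and the sample constants are assumptions for illustration.
```python
def solve(b1, b2):
    # Solves the system x + y = b1, x - y = b2.
    x = (b1 + b2) / 2  # half the sum of the b's
    y = (b1 - b2) / 2  # half the difference of the b's
    return x, y
# Two arbitrary right-hand sides; the recipe is identical for both.
for b1, b2 in [(4, 2), (10, -6)]:
    x, y = solve(b1, b2)
    print((b1, b2), "->", (x, y), x + y == b1, x - y == b2)
```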

565
00:29:24,232 --> 00:29:26,190
Well, the whole idea is
this-- and this is what

566
00:29:26,190 --> 00:29:28,480
we so often do in mathematics.

567
00:29:28,480 --> 00:29:32,570
Because the solution
to our equations

568
00:29:32,570 --> 00:29:36,420
depends on the coefficients
of the x's, we somehow

569
00:29:36,420 --> 00:29:40,130
want to focus our attention
on the coefficients.

570
00:29:40,130 --> 00:29:41,810
And we don't need
the x's in there,

571
00:29:41,810 --> 00:29:45,380
because we can sort of think of
the x's as being a place value

572
00:29:45,380 --> 00:29:46,430
type of situation.

573
00:29:46,430 --> 00:29:49,070
In other words,
x_1 can be thought

574
00:29:49,070 --> 00:29:51,470
of as being the first column.

575
00:29:51,470 --> 00:29:53,200
x_2 the second column.

576
00:29:53,200 --> 00:29:56,090
The first equation can be
thought of as the first row.

577
00:29:56,090 --> 00:29:58,070
The second equation,
the second row.

578
00:29:58,070 --> 00:30:00,270
And what this
motivates is a concept

579
00:30:00,270 --> 00:30:02,500
called an m by n matrix.

580
00:30:02,500 --> 00:30:05,930
Now this sounds like a very
ominous term, an m by n matrix.

581
00:30:05,930 --> 00:30:10,070
But the point is it's
not a very ominous term.

582
00:30:10,070 --> 00:30:12,980
It's in fact, I think
that it's too-- in fact

583
00:30:12,980 --> 00:30:16,180
the word matrix essentially
indicates an array,

584
00:30:16,180 --> 00:30:17,620
and that's all this thing is.

585
00:30:17,620 --> 00:30:21,910
By an m by n matrix, we simply
mean a rectangular array

586
00:30:21,910 --> 00:30:26,870
of numbers, arranged
to form m rows--

587
00:30:26,870 --> 00:30:27,690
In other words,

588
00:30:27,690 --> 00:30:33,870
the first number tells
you the number of rows,

589
00:30:33,870 --> 00:30:38,070
and the second number tells
you the number of columns.

590
00:30:38,070 --> 00:30:40,070
Now there's certainly
nothing logical about that

591
00:30:40,070 --> 00:30:41,620
in terms of our game idea.

592
00:30:41,620 --> 00:30:44,187
Just memorize this, it's a rule
of the game or a definition.

593
00:30:44,187 --> 00:30:45,770
Somebody could've
said, why didn't you

594
00:30:45,770 --> 00:30:47,590
give the columns first
and then the rows?

595
00:30:47,590 --> 00:30:50,000
Well we could've, but one
of them had to come first.

596
00:30:50,000 --> 00:30:52,890
And the convention is that
one refers to the rows

597
00:30:52,890 --> 00:30:54,910
first, and then the columns.

598
00:30:54,910 --> 00:30:57,030
An m by n matrix then is what?

599
00:30:57,030 --> 00:31:00,520
It's a rectangular array
of numbers consisting

600
00:31:00,520 --> 00:31:03,970
of m rows and n columns.

601
00:31:03,970 --> 00:31:08,580
By way of an example-- by the
way, to indicate that you're

602
00:31:08,580 --> 00:31:10,590
talking about a
matrix, one usually

603
00:31:10,590 --> 00:31:16,209
encloses the array in
brackets, or in parentheses.

604
00:31:16,209 --> 00:31:17,500
It doesn't make any difference.

605
00:31:17,500 --> 00:31:21,180
I will use whichever one
strikes my fancy at the moment.

606
00:31:21,180 --> 00:31:23,310
And it happens to be
brackets right now.

607
00:31:23,310 --> 00:31:26,110
But if I write down this
array-- what is it now?

608
00:31:26,110 --> 00:31:31,070
[1, 1, 1; 1, -1, 2].

609
00:31:31,070 --> 00:31:35,120
This is a rectangular array
of numbers consisting of what?

610
00:31:35,120 --> 00:31:40,220
Two rows and three columns.

611
00:31:40,220 --> 00:31:44,170
And so this is an example
of a 2 by 3 matrix.

612
00:31:44,170 --> 00:31:45,990
A 2 by 3 matrix.

613
00:31:45,990 --> 00:31:49,500
Now again, we don't want to
invent this thing vacuously.

614
00:31:49,500 --> 00:31:53,230
Let's keep track of
what this matrix is

615
00:31:53,230 --> 00:31:57,291
coding for us in terms
of a system of equations.

616
00:31:57,291 --> 00:31:57,790
Well.

617
00:31:57,790 --> 00:32:01,790
For example, suppose we have
the system of equations z_1

618
00:32:01,790 --> 00:32:05,350
is equal to y_1
plus y_2 plus y_3.

619
00:32:05,350 --> 00:32:09,870
z_2 is equal to y_1
minus y_2 plus 2*y_3,

620
00:32:09,870 --> 00:32:13,270
and we want to think
of the y_1, y_2,

621
00:32:13,270 --> 00:32:19,000
and y_3 as being the
variables, z_1 and z_2 as being

622
00:32:19,000 --> 00:32:20,020
the constants here.

623
00:32:20,020 --> 00:32:22,580
What is the matrix
of coefficients here?

624
00:32:22,580 --> 00:32:25,410
Well the matrix would be what?

625
00:32:25,410 --> 00:32:32,070
The coefficient of the first
variable in the first row

626
00:32:32,070 --> 00:32:35,110
is 1; second variable,
first row is 1;

627
00:32:35,110 --> 00:32:39,870
third variable, first row is 1.

628
00:32:39,870 --> 00:32:40,550
You see?

629
00:32:40,550 --> 00:32:42,930
Second equation, first
variable coefficient

630
00:32:42,930 --> 00:32:48,120
is 1; second equation,
second variable coefficient

631
00:32:48,120 --> 00:32:49,540
is minus 1;

632
00:32:49,540 --> 00:32:53,010
second equation, third
variable coefficient is 2.

633
00:32:53,010 --> 00:32:55,360
So using our matrix
coding system,

634
00:32:55,360 --> 00:32:57,490
the matrix of coefficients
would be what?

635
00:32:57,490 --> 00:33:03,580
[1, 1, 1; 1, -1, 2].

636
00:33:03,580 --> 00:33:08,296
Which is exactly the matrix
that we wrote down over here.

637
00:33:08,296 --> 00:33:10,170
And to put this into a
different perspective,

638
00:33:10,170 --> 00:33:13,610
so to see what we're driving
at, let's take a second example

639
00:33:13,610 --> 00:33:17,400
where we first start out
with three equations and four

640
00:33:17,400 --> 00:33:17,950
unknowns.

641
00:33:17,950 --> 00:33:20,150
Three linear equations
and four unknowns.

642
00:33:20,150 --> 00:33:22,560
And then we'll write the
matrix for this afterwards.

643
00:33:22,560 --> 00:33:27,470
But let the equations be y sub 1
is x_1 plus 2*x_2 plus x_3 plus

644
00:33:27,470 --> 00:33:28,900
x_4.

645
00:33:28,900 --> 00:33:33,510
y_2 is 2*x_1 minus x_2
minus x_3 plus 3*x_4.

646
00:33:33,510 --> 00:33:37,760
y_3 is 3*x_1 plus x_2
plus 2*x_3 minus x_4.

647
00:33:37,760 --> 00:33:41,165
If I want to write the matrix
of coefficients, what do I do?

648
00:33:41,165 --> 00:33:44,610
I simply leave the variables
out, and write down what?

649
00:33:44,610 --> 00:33:46,230
My first row would be what?

650
00:33:46,230 --> 00:33:49,460
[1, 2, 1, 1].

651
00:33:49,460 --> 00:33:54,700
My second row would
be [2, -1,  -1, 3].

652
00:33:54,700 --> 00:34:00,320
My third row would
be [3, 1, 2,  -1].

653
00:34:00,320 --> 00:34:02,440
In other words, my
matrix of coefficients,

654
00:34:02,440 --> 00:34:04,840
now, would be what
kind of a matrix?

655
00:34:04,840 --> 00:34:07,440
It would be a rectangular
array of numbers,

656
00:34:07,440 --> 00:34:12,719
consisting of three
rows and four columns.

657
00:34:12,719 --> 00:34:13,690
All right?

658
00:34:13,690 --> 00:34:16,510
And that would be
called a 3 by 4 matrix.

659
00:34:16,510 --> 00:34:20,000
Again, notice, in this coding
system, the number of rows

660
00:34:20,000 --> 00:34:23,330
corresponds to the
number of equations.

661
00:34:23,330 --> 00:34:26,340
And the number of
columns corresponds

662
00:34:26,340 --> 00:34:29,250
to the number of
variables that are

663
00:34:29,250 --> 00:34:32,590
formed in linear combinations.

664
00:34:32,590 --> 00:34:36,820
To summarize this
again, the matrix

665
00:34:36,820 --> 00:34:39,280
of coefficients in
our second example

666
00:34:39,280 --> 00:34:48,040
is the 3 by 4 matrix [1, 2, 1,
 1; 2, -1, -1, 3; 3, 1, 2, -1].
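[Editor's sketch] The two coefficient matrices from the examples, stored as nested lists with one inner list per equation. The names A, B, and shape are assumptions for illustration.
```python
# Rows correspond to equations, columns to variables.
A = [[1, 1, 1],
     [1, -1, 2]]          # first example: z's in terms of y's
B = [[1, 2, 1, 1],
     [2, -1, -1, 3],
     [3, 1, 2, -1]]       # second example: y's in terms of x's
def shape(matrix):
    # Returns (number of rows, number of columns), rows first by convention.
    return len(matrix), len(matrix[0])
print(shape(A), shape(B))
```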

667
00:34:48,040 --> 00:34:52,070
Well again, let's recall
that when we do mathematics,

668
00:34:52,070 --> 00:34:53,909
we don't like to
introduce notation

669
00:34:53,909 --> 00:34:55,409
for the sake of notation.

670
00:34:55,409 --> 00:34:59,970
And simply to be able to have
a way of conveniently writing

671
00:34:59,970 --> 00:35:02,990
the coefficients, but
not being able to use it

672
00:35:02,990 --> 00:35:06,620
efficiently would be a
rather stupid thing to do.

673
00:35:06,620 --> 00:35:09,700
Why invent new notation
if it's not going to help

674
00:35:09,700 --> 00:35:12,180
us effectively
solve new problems?

675
00:35:12,180 --> 00:35:14,850
This is why in mathematics
we've been emphasizing

676
00:35:14,850 --> 00:35:20,170
the game idea, whereby what we
really care about is structure.

677
00:35:20,170 --> 00:35:24,270
We care about structure, not
about the terms themselves.

678
00:35:24,270 --> 00:35:26,090
And to motivate
what I'm driving at,

679
00:35:26,090 --> 00:35:29,000
let me return to
examples one and two.

680
00:35:29,000 --> 00:35:34,060
And bring up a question
that has great impact--

681
00:35:34,060 --> 00:35:36,200
and even if we don't
appreciate it right now

682
00:35:36,200 --> 00:35:38,210
in terms of a
practical application,

683
00:35:38,210 --> 00:35:40,680
let's at least see
what's happening.

684
00:35:40,680 --> 00:35:44,660
You'll notice that if I look
at these systems of equations

685
00:35:44,660 --> 00:35:49,000
over here, notice that the
first two equations tell me

686
00:35:49,000 --> 00:35:54,810
how to express z_1 and z_2 in
terms of y_1, y_2, and y_3.

687
00:35:54,810 --> 00:35:57,930
On the other hand, the
second system of equations

688
00:35:57,930 --> 00:36:02,190
tells me how to express
y_1, y_2, and y_3 in terms

689
00:36:02,190 --> 00:36:05,500
of x_1, x_2, x_3, and x_4.

690
00:36:05,500 --> 00:36:06,990
Now, without
belaboring the point

691
00:36:06,990 --> 00:36:10,160
because the arithmetic
is quite trivial here,

692
00:36:10,160 --> 00:36:12,980
a very natural question
that might come up next

693
00:36:12,980 --> 00:36:15,710
is, lookit, let's look at
our old friend the chain

694
00:36:15,710 --> 00:36:16,650
rule again.

695
00:36:16,650 --> 00:36:18,690
Since the z's are
expressed in terms

696
00:36:18,690 --> 00:36:21,600
of the y's, and the y's
are expressed in terms

697
00:36:21,600 --> 00:36:24,970
of the x's, it seems that
by direct substitution,

698
00:36:24,970 --> 00:36:29,190
I should be able to express
the z's in terms of the x's.

699
00:36:29,190 --> 00:36:34,410
Namely, I replace y_1 by this
linear combination of the x's.

700
00:36:34,410 --> 00:36:38,510
I replace y_2 by this linear
combination of the x's.

701
00:36:38,510 --> 00:36:43,050
I replace y_3 by this linear
combination of the x's.

702
00:36:43,050 --> 00:36:50,040
I then combine the y's in terms
of the x's as indicated here.

703
00:36:50,040 --> 00:36:52,860
And that should give me the
z's in terms of the x's.

704
00:36:52,860 --> 00:36:57,780
Leaving that, hopefully,
as a trivial exercise,

705
00:36:57,780 --> 00:37:00,890
we come to the next example
that I'd like to mention here,

706
00:37:00,890 --> 00:37:03,710
and that is: suppose
you were told

707
00:37:03,710 --> 00:37:09,350
to express z_1 and z_2 in
terms of x_1, x_2, x_3 and x_4.

708
00:37:09,350 --> 00:37:13,240
The point is that, with the
amount of arithmetic mentioned

709
00:37:13,240 --> 00:37:20,070
before, we could easily show
that z_1 was 6*x_1 plus 2*x_2

710
00:37:20,070 --> 00:37:22,610
plus 2*x_3 plus 3*x_4,

711
00:37:22,610 --> 00:37:29,900
while z_2 was 5*x_1 plus
5*x_2 plus 6*x_3 minus 4*x_4,

712
00:37:29,900 --> 00:37:32,590
by a straightforward
substitution.
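[Editor's sketch] A check of the substitution, comparing the two-step route (x's to y's to z's) with the combined formulas just stated. The function names and test points are assumptions for illustration.
```python
def y_from_x(x1, x2, x3, x4):
    # Second system: the y's in terms of the x's.
    y1 = x1 + 2 * x2 + x3 + x4
    y2 = 2 * x1 - x2 - x3 + 3 * x4
    y3 = 3 * x1 + x2 + 2 * x3 - x4
    return y1, y2, y3
def z_from_y(y1, y2, y3):
    # First system: the z's in terms of the y's.
    return y1 + y2 + y3, y1 - y2 + 2 * y3
def z_from_x(x1, x2, x3, x4):
    # Combined formulas obtained by direct substitution.
    return (6 * x1 + 2 * x2 + 2 * x3 + 3 * x4,
            5 * x1 + 5 * x2 + 6 * x3 - 4 * x4)
# Arbitrary test points (assumed values); both routes should agree.
points = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (2, -3, 5, 7)]
for x in points:
    assert z_from_y(*y_from_x(*x)) == z_from_x(*x)
print("substitution agrees with the combined formulas")
```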

713
00:37:32,590 --> 00:37:34,380
The point is that
somehow or other,

714
00:37:34,380 --> 00:37:38,500
we would like to be able to
handle this substitution more

715
00:37:38,500 --> 00:37:40,010
efficiently.

716
00:37:40,010 --> 00:37:46,050
Is there a neater way of
being able to transform

717
00:37:46,050 --> 00:37:49,500
the z's into the x's
by way of the y's?

718
00:37:49,500 --> 00:37:52,300
In other words, is there
a way of replacing the y's

719
00:37:52,300 --> 00:37:54,690
by the x's, and then
finding z's in terms

720
00:37:54,690 --> 00:37:58,630
of x's in a convenient,
mechanical way that

721
00:37:58,630 --> 00:38:00,640
will save us many steps?

722
00:38:00,640 --> 00:38:04,620
Not so much in these easy
examples where you have 2 by 3,

723
00:38:04,620 --> 00:38:08,365
and 3 by 4 systems, but
cases where you might have

724
00:38:08,365 --> 00:38:10,140
10 equations and 10 unknowns.

725
00:38:10,140 --> 00:38:12,460
Or 10 equations and 12 unknowns.

726
00:38:12,460 --> 00:38:14,191
And the answer is,
there is a way.

727
00:38:14,191 --> 00:38:16,190
Of course, you knew there
was going to be a way.

728
00:38:16,190 --> 00:38:18,100
Otherwise we wouldn't
be leading up to it

729
00:38:18,100 --> 00:38:21,200
in this particular way,
and as so often happens,

730
00:38:21,200 --> 00:38:24,400
there usually happens to be
a real-life situation that

731
00:38:24,400 --> 00:38:26,520
motivates why we
invent something

732
00:38:26,520 --> 00:38:29,720
called matrix algebra.

733
00:38:29,720 --> 00:38:31,860
In terms of our
present illustration,

734
00:38:31,860 --> 00:38:33,840
the chain rule that
we're just talking

735
00:38:33,840 --> 00:38:36,570
about expressing the z's in
terms of the y's, and then

736
00:38:36,570 --> 00:38:39,170
the y's in terms of
the x's motivates

737
00:38:39,170 --> 00:38:42,240
what we mean by
matrix multiplication.

738
00:38:42,240 --> 00:38:45,400
And you may notice that I
put "multiplication" here

739
00:38:45,400 --> 00:38:47,000
in quotation marks.

740
00:38:47,000 --> 00:38:48,720
The reason I put it
in quotation marks

741
00:38:48,720 --> 00:38:51,520
is that unfortunately
the word "multiplication"

742
00:38:51,520 --> 00:38:55,250
has a connotation of
multiplying numbers together.

743
00:38:55,250 --> 00:38:56,510
Don't think of it that way.

744
00:38:56,510 --> 00:38:58,490
Think of multiplication
meaning what?

745
00:38:58,490 --> 00:39:03,260
A way of combining two matrices
to form another matrix.

746
00:39:03,260 --> 00:39:06,950
There's going to be no logic
behind this other than one

747
00:39:06,950 --> 00:39:08,930
very famous piece of logic.

748
00:39:08,930 --> 00:39:12,220
That is knowing what the
answer was supposed to be,

749
00:39:12,220 --> 00:39:15,560
we make up our rules
to guarantee us

750
00:39:15,560 --> 00:39:17,995
that we will get the
appropriate answer.

751
00:39:17,995 --> 00:39:18,620
In other words,

752
00:39:18,620 --> 00:39:22,300
I remember when I was an
undergraduate in college.

753
00:39:22,300 --> 00:39:25,030
The big type of humor that
was going around at that time

754
00:39:25,030 --> 00:39:27,540
was the idea of, somebody
would give you the answer,

755
00:39:27,540 --> 00:39:29,685
and you have to make
up the question.

756
00:39:29,685 --> 00:39:31,310
Oh, they were silly
little things like,

757
00:39:31,310 --> 00:39:35,115
if the answer to the question
was 9w what was the question?

758
00:39:35,115 --> 00:39:36,490
And the question
would be, do you

759
00:39:36,490 --> 00:39:39,970
spell your last name with a
V, Herr Wagner, and the answer

760
00:39:39,970 --> 00:39:41,440
would be "nein, W."

761
00:39:41,440 --> 00:39:43,164
And these were funny
jokes at that time.

762
00:39:43,164 --> 00:39:45,080
I don't know whether
they're funny now or not.

763
00:39:45,080 --> 00:39:46,730
But the funny point is this.

764
00:39:46,730 --> 00:39:49,520
That this joke, which
might not be that funny,

765
00:39:49,520 --> 00:39:52,750
is exactly how we motivate
definitions and rules

766
00:39:52,750 --> 00:39:53,620
in mathematics.

767
00:39:53,620 --> 00:39:56,150
We start with the
answer, and then

768
00:39:56,150 --> 00:39:58,460
go back, and answer
the question.

769
00:39:58,460 --> 00:40:01,550
Knowing in advance
that somehow or other,

770
00:40:01,550 --> 00:40:03,820
the matrix that
expresses the z's

771
00:40:03,820 --> 00:40:07,880
in terms of the y's
is given by this.

772
00:40:07,880 --> 00:40:11,940
And the matrix that expresses
the y's in terms of the x's, is

773
00:40:11,940 --> 00:40:13,830
given by this matrix.

774
00:40:13,830 --> 00:40:16,680
Somehow or other what
we would like to do

775
00:40:16,680 --> 00:40:22,070
is invent a way of combining
these two matrices to give me

776
00:40:22,070 --> 00:40:24,970
the matrix that
expresses this answer.

777
00:40:24,970 --> 00:40:27,940
In other words, if I start
knowing what the answer is

778
00:40:27,940 --> 00:40:31,390
supposed to be-- in other
words, what is the matrix that

779
00:40:31,390 --> 00:40:33,950
expresses the z's
in terms of the x's?

780
00:40:33,950 --> 00:40:38,020
It's the matrix whose
first row is [6, 2, 2, 3].

781
00:40:38,020 --> 00:40:42,240
And whose second row
is [5, 5,  6, -4].

782
00:40:42,240 --> 00:40:43,990
In other words, the
matrix would be what?

783
00:40:43,990 --> 00:40:50,530
[6, 2, 2, 3; 5, 5, 6, -4].

784
00:40:50,530 --> 00:40:52,880
And without even looking
at any mechanical rule,

785
00:40:52,880 --> 00:40:55,390
the question that
comes up is, how can I

786
00:40:55,390 --> 00:40:57,630
invent a rule that
will tell me how

787
00:40:57,630 --> 00:41:04,470
to multiply this 2 by 3
matrix by this 3 by 4 matrix

788
00:41:04,470 --> 00:41:08,480
to obtain this 2 by 4 matrix?

789
00:41:08,480 --> 00:41:10,900
2 by 4 matrix.

790
00:41:10,900 --> 00:41:11,520
Now lookit.

791
00:41:11,520 --> 00:41:14,530
In the notes, I'm going to
do this in great detail.

792
00:41:14,530 --> 00:41:16,680
There will be many
exercises on this for you

793
00:41:16,680 --> 00:41:18,670
to sharpen your teeth on.

794
00:41:18,670 --> 00:41:22,030
But for now I just want
to hit this main point,

795
00:41:22,030 --> 00:41:24,560
because the lecture
is quite long.

796
00:41:24,560 --> 00:41:28,110
Your attention span probably
is starting to be taxed.

797
00:41:28,110 --> 00:41:31,480
And so I just want to show
you what the recipe is,

798
00:41:31,480 --> 00:41:34,180
because my feeling is
that this is something

799
00:41:34,180 --> 00:41:38,350
you have to hear before you can
really read it without becoming

800
00:41:38,350 --> 00:41:40,540
panicked by the notation.

801
00:41:40,540 --> 00:41:44,320
The idea is this: first of
all, to multiply two matrices,

802
00:41:44,320 --> 00:41:47,780
all we ever require is
that the number of columns

803
00:41:47,780 --> 00:41:51,250
in the first matrix
equals the number of rows

804
00:41:51,250 --> 00:41:52,800
in the second matrix.

805
00:41:52,800 --> 00:41:55,090
And if that sounds
complicated to you,

806
00:41:55,090 --> 00:41:58,800
simply think in terms
of the chain rule again.

807
00:41:58,800 --> 00:42:01,850
The number of columns
in the first matrix

808
00:42:01,850 --> 00:42:03,500
tells you how many
unknowns there

809
00:42:03,500 --> 00:42:06,070
are in the first
system of equations.

810
00:42:06,070 --> 00:42:09,730
And that number of
unknowns gives you

811
00:42:09,730 --> 00:42:12,690
the number of equations
in the second system.

812
00:42:12,690 --> 00:42:16,200
In other words, the number of
columns in the first matrix

813
00:42:16,200 --> 00:42:21,880
must match the number of
rows in the second matrix.

814
00:42:21,880 --> 00:42:25,290
Notice, we don't care
about the number of rows

815
00:42:25,290 --> 00:42:27,690
in the first one matching
the number of columns

816
00:42:27,690 --> 00:42:31,280
in the second, all we care
is that the number of columns

817
00:42:31,280 --> 00:42:34,230
in the first matrix--
namely three here-- match

818
00:42:34,230 --> 00:42:37,410
the number of rows of the
second, which is three.

819
00:42:37,410 --> 00:42:41,640
Then the rule works in a very
interesting mechanical way that

820
00:42:41,640 --> 00:42:43,610
makes use of the dot product.

821
00:42:43,610 --> 00:42:45,650
Namely, what you
do is, suppose I

822
00:42:45,650 --> 00:42:50,080
want to find the term in the
product of these two matrices

823
00:42:50,080 --> 00:42:55,200
that occupies the second
row, third column.

824
00:42:55,200 --> 00:42:58,430
What I do is I take the
second row-- in other words,

825
00:42:58,430 --> 00:43:02,190
I take the row that comes
from the first matrix.

826
00:43:02,190 --> 00:43:04,830
I take the column value
from the second matrix.

827
00:43:04,830 --> 00:43:06,240
In other words, I have what?

828
00:43:06,240 --> 00:43:10,150
Second row, third column.

829
00:43:10,150 --> 00:43:14,810
And I form the usual dot
product that we've talked about.

830
00:43:14,810 --> 00:43:18,350
I dot the second row
with the third column.

831
00:43:18,350 --> 00:43:20,010
And what would I
get if I did that?

832
00:43:20,010 --> 00:43:27,310
1 times 1 is 1; minus 1 times
minus 1 is 1; and 2 times 2

833
00:43:27,310 --> 00:43:28,200
is 4.

834
00:43:28,200 --> 00:43:32,470
So 1 plus 1 plus 4 is 6.

835
00:43:32,470 --> 00:43:35,690
So in this product
matrix, the term

836
00:43:35,690 --> 00:43:39,920
in the second row,
third column will be 6.

837
00:43:39,920 --> 00:43:44,820
The term in the second row,
third column will be 6.

838
00:43:44,820 --> 00:43:47,570
Second row, third
column will be 6.

839
00:43:47,570 --> 00:43:50,460
Now, leaving it as an
exercise for the time being,

840
00:43:50,460 --> 00:43:52,230
and reading it in the
supplementary notes,

841
00:43:52,230 --> 00:43:54,460
I'm sure you'll be able
to put this all together.

842
00:43:54,460 --> 00:43:57,220
It's not nearly as
difficult as it sounds

843
00:43:57,220 --> 00:43:58,760
hearing it the first time.

844
00:43:58,760 --> 00:44:01,850
I think the most difficult
part is rationalizing

845
00:44:01,850 --> 00:44:05,260
why one would invent such a
definition in the first place.

846
00:44:05,260 --> 00:44:08,620
The answer is very simple:
we invent the definition

847
00:44:08,620 --> 00:44:11,470
to solve a particular problem.

848
00:44:11,470 --> 00:44:13,800
Coming back here
again, all I'm saying

849
00:44:13,800 --> 00:44:15,790
is that if I
invent-- for example,

850
00:44:15,790 --> 00:44:18,330
let me just give you one
more checking-out point here.

851
00:44:18,330 --> 00:44:20,210
Let me see what
the term would be

852
00:44:20,210 --> 00:44:22,420
in the first row, second column.

853
00:44:22,420 --> 00:44:26,150
To find the term in the
first row, second column,

854
00:44:26,150 --> 00:44:29,560
I take the first row
of the first matrix,

855
00:44:29,560 --> 00:44:35,260
and dot it with the second
column of the second matrix.

856
00:44:35,260 --> 00:44:37,800
See, first row dotted
with second column,

857
00:44:37,800 --> 00:44:39,630
the answer will give me what?

858
00:44:39,630 --> 00:44:42,030
The term in the product
that's in the first row,

859
00:44:42,030 --> 00:44:43,030
second column.

860
00:44:43,030 --> 00:44:44,050
Let's check that.

861
00:44:44,050 --> 00:44:50,550
1 times 2 is 2; 1 times minus
1 is minus 1; 1 times 1 is 1.

862
00:44:50,550 --> 00:44:53,530
2 minus 1 plus 1 is 2.

863
00:44:53,530 --> 00:44:57,490
And therefore, the term in
the first row, second column

864
00:44:57,490 --> 00:45:00,520
should be 2.

865
00:45:00,520 --> 00:45:01,020
It is.

866
00:45:01,020 --> 00:45:02,680
You see, there's
no more motivation

867
00:45:02,680 --> 00:45:05,890
to how we multiply these
two matrices than the fact

868
00:45:05,890 --> 00:45:09,950
that it solves the problem
that we want solved.

869
00:45:09,950 --> 00:45:13,260
To find the term
that's in the i-th row,

870
00:45:13,260 --> 00:45:17,730
j-th column of the
product, dot the i-th row

871
00:45:17,730 --> 00:45:20,950
of the first matrix
with the j-th column

872
00:45:20,950 --> 00:45:23,550
of the second matrix.
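[Editor's sketch, not part of the lecture: the row-dot-column rule in Python. The two rows of A and columns 2 and 3 of B are the values read aloud above; columns 1 and 4 of B are not stated in this portion, so the zeros there are placeholder assumptions, and the helper names dot and entry are hypothetical.]
```python
def dot(u, v):
    # The dot product is only defined for two tuples of the same length.
    if len(u) != len(v):
        raise ValueError("cannot dot tuples of different lengths")
    return sum(a * b for a, b in zip(u, v))
def entry(A, B, i, j):
    # Term in the i-th row, j-th column of the product (1-based, as spoken):
    # dot the i-th row of A with the j-th column of B.
    row = A[i - 1]
    col = [r[j - 1] for r in B]
    return dot(row, col)
A = [[1, 1, 1],
     [1, -1, 2]]          # 2 by 3: both rows as read in the lecture
B = [[0, 2, 1, 0],
     [0, -1, -1, 0],
     [0, 1, 2, 0]]        # 3 by 4: columns 1 and 4 are placeholders (assumed)
print(entry(A, B, 2, 3))  # 6
print(entry(A, B, 1, 2))  # 2
```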

873
00:45:23,550 --> 00:45:29,180
More generally, you can always
multiply an m by n matrix

874
00:45:29,180 --> 00:45:32,330
by an n by p matrix.

875
00:45:32,330 --> 00:45:34,240
What's the key factor?

876
00:45:34,240 --> 00:45:36,720
You don't care about the
number of rows in the first,

877
00:45:36,720 --> 00:45:39,170
you don't care about the number
of columns in the second.

878
00:45:39,170 --> 00:45:41,030
What you do care about is what?

879
00:45:41,030 --> 00:45:45,380
That the number of columns
in the first matrix

880
00:45:45,380 --> 00:45:48,450
be equal to the number
of rows in the second,

881
00:45:48,450 --> 00:45:52,120
and if you do that, when you
multiply an m by n matrix

882
00:45:52,120 --> 00:45:56,550
by an n by p matrix, notice
that the result will be what?

883
00:45:56,550 --> 00:45:58,750
An m by p matrix.

884
00:45:58,750 --> 00:46:01,230
In other words,
the number of rows

885
00:46:01,230 --> 00:46:05,100
is governed by the number
of rows in the first matrix

886
00:46:05,100 --> 00:46:07,890
and the number of
columns is governed

887
00:46:07,890 --> 00:46:11,790
by the number of columns
in the second matrix.
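[Editor's sketch, not part of the lecture: a plain-Python product that checks the m by n times n by p rule and shows the m by p result. The function name matmul is hypothetical, and the zero columns of B are placeholder assumptions; only the rows of A and columns 2 and 3 of B come from the lecture.]
```python
def matmul(A, B):
    m, n, p = len(A), len(A[0]), len(B[0])
    # Columns of the first matrix must equal rows of the second.
    if len(B) != n:
        raise ValueError("columns of first must equal rows of second")
    # Entry (i, j) is the i-th row of A dotted with the j-th column of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]
A = [[1, 1, 1],
     [1, -1, 2]]          # 2 by 3
B = [[0, 2, 1, 0],
     [0, -1, -1, 0],
     [0, 1, 2, 0]]        # 3 by 4: columns 1 and 4 are placeholders (assumed)
C = matmul(A, B)
print(len(C), len(C[0]))  # 2 4
```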

888
00:46:11,790 --> 00:46:13,840
Notice, by the way,
that this tells us

889
00:46:13,840 --> 00:46:17,660
right away that when we want
to multiply two matrices

890
00:46:17,660 --> 00:46:21,310
it makes a difference in which
order they're written.

891
00:46:21,310 --> 00:46:26,440
If we were to take that 2 by 3
matrix, and the 3 by 4 matrix,

892
00:46:26,440 --> 00:46:30,220
and interchange them, we don't
have the appropriate match

893
00:46:30,220 --> 00:46:32,600
up of rows and columns.

894
00:46:32,600 --> 00:46:35,930
You can't dot a
2-tuple with a 4-tuple.

895
00:46:35,930 --> 00:46:39,530
The very fact that we say
dot the row with the column,

896
00:46:39,530 --> 00:46:42,630
the dot product is only
defined for two n-tuples.

897
00:46:42,630 --> 00:46:45,480
We insist that the
n-tuples be the same.

898
00:46:45,480 --> 00:46:49,520
The n has to be the same
to dot two n-tuples.
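[Editor's sketch, not part of the lecture: checking the shapes alone shows why order matters. The function name can_multiply is hypothetical; shapes are written as (rows, columns).]
```python
def can_multiply(shape_a, shape_b):
    # The product is defined only when the columns of the first
    # equal the rows of the second.
    return shape_a[1] == shape_b[0]
print(can_multiply((2, 3), (3, 4)))  # True: 3-tuples dotted with 3-tuples
print(can_multiply((3, 4), (2, 3)))  # False: a 4-tuple cannot be dotted with a 2-tuple
```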

899
00:46:49,520 --> 00:46:52,680
Let me summarize today's
lecture by saying

900
00:46:52,680 --> 00:46:55,850
that in overview, notice
that what we've done,

901
00:46:55,850 --> 00:46:58,090
hopefully, is that
we have reestablished

902
00:46:58,090 --> 00:47:01,250
the need for linear
systems of equations,

903
00:47:01,250 --> 00:47:03,760
and secondly, once we
have understood what

904
00:47:03,760 --> 00:47:06,970
the need for linear systems
is, we are now introducing

905
00:47:06,970 --> 00:47:11,310
a mechanism whereby we can
solve linear systems more

906
00:47:11,310 --> 00:47:14,580
efficiently than what we were
taught in the past as to how

907
00:47:14,580 --> 00:47:15,750
to solve them.

908
00:47:15,750 --> 00:47:18,490
You see, what I'm going to do
for the next few lectures now

909
00:47:18,490 --> 00:47:21,470
is concentrate on
a new game, called

910
00:47:21,470 --> 00:47:23,880
the game of matrix algebra.

911
00:47:23,880 --> 00:47:27,950
But that will unfold gradually
as we develop the next two

912
00:47:27,950 --> 00:47:28,750
lectures.

913
00:47:28,750 --> 00:47:31,330
And so until our next
lecture, so long.

914
00:47:34,480 --> 00:47:36,850
Funding for the
publication of this video

915
00:47:36,850 --> 00:47:41,730
was provided by the Gabriella
and Paul Rosenbaum Foundation.

916
00:47:41,730 --> 00:47:45,900
Help OCW continue to provide
free and open access to MIT

917
00:47:45,900 --> 00:47:53,610
courses by making a donation
at ocw.mit.edu/donate.