1
00:00:08,119 --> 00:00:11,380
What we're going to
talk about today, is goals.

2
00:00:11,389 --> 00:00:17,550
So just by way of a little warm up exercise,
I'd like you to look at that integration problem

3
00:00:17,550 --> 00:00:40,750
over there. The one that's disappeared.

4
00:00:40,750 --> 00:00:47,840
So the question is, can you do it in your
head? Probably not. The question is, if a

5
00:00:47,840 --> 00:00:55,260
program can do that, is a program, in any
sense of the word, intelligent? That's a background

6
00:00:55,260 --> 00:00:58,880
task I'd like you to work on as I talk today.

7
00:00:58,880 --> 00:01:01,000
So today we're going to be
modeling a little bit of human

8
00:01:01,000 --> 00:01:02,720
problem solving, the kind that

9
00:01:02,720 --> 00:01:07,100
is required when you do symbolic integration.
Now, you all learned how to do that. You may

10
00:01:07,100 --> 00:01:10,579
not be able to do that particular problem
anymore, but you all learned how to integrate

11
00:01:10,579 --> 00:01:16,950
in high school, 18.01, or something like that.
The question is, how did you do it, and is

12
00:01:16,950 --> 00:01:22,539
the problem solving technique that we are
trying to model by building a program that

13
00:01:22,539 --> 00:01:29,610
does symbolic integration, is that a common
kind of description of what people do when

14
00:01:29,610 --> 00:01:31,110
they solve problems.

15
00:01:31,110 --> 00:01:34,120
So the answer to the question is,
yes. The kind of problem solving you'll see

16
00:01:34,150 --> 00:01:39,280
today is like generate and test, which you
saw last time. It's a very common kind of

17
00:01:39,280 --> 00:01:44,799
problem solving that we all engage in, that
we all engage in without thinking about it,

18
00:01:44,799 --> 00:01:46,650
and without having a name for it.

19
00:01:46,650 --> 00:01:51,470
But once we get a name for it, we'll get power
over it. And then we'll be able to deploy

20
00:01:51,470 --> 00:01:55,658
it, and it will become a skill. We'll not
just witness it, we'll not just understand

21
00:01:55,658 --> 00:02:00,950
it, we'll use it instinctively, as a skill.

22
00:02:00,950 --> 00:02:04,650
So there you are, you've got that problem,
there's your problem, and what do you do to

23
00:02:04,650 --> 00:02:12,060
solve it? I don't know, look it up in a table?
You'll never find it in a table because of

24
00:02:12,060 --> 00:02:21,670
that minus sign and that 5. So you're going
to have to do something better than that.

25
00:02:21,670 --> 00:02:26,640
So what you're going to do, is what you always
do when you see a problem like that. You try

26
00:02:26,640 --> 00:02:31,090
to apply a transform, and make it into a different
problem that's easier to solve. And eventually,

27
00:02:31,090 --> 00:02:38,040
what you hope is that you'll simplify it sufficiently,
that the pieces that you've simplified to

28
00:02:38,040 --> 00:02:44,450
will be found in some small table of integrals.
So how long is this table? It's not the case

29
00:02:44,450 --> 00:02:48,690
that we're going to look at a table with 388
elements, because this is not a big table

30
00:02:48,690 --> 00:02:52,860
of integrals. This is what a freshman might
have in a freshman's head, after taking a

31
00:02:52,860 --> 00:02:55,739
course in integral calculus.

32
00:02:55,739 --> 00:03:00,400
One of the interesting questions is, how many
elements have to be in that table to get an

33
00:03:00,400 --> 00:03:05,180
A in the course? We're interested in how much
knowledge is involved, that's one of the elements

34
00:03:05,180 --> 00:03:09,980
of catechism that I've listed over there,
that will be part of the gold star ideas suite

35
00:03:09,980 --> 00:03:12,400
of the day.

36
00:03:12,400 --> 00:03:19,769
So we'd like to take that problem, and find
a way to make it into another problem that's

37
00:03:19,769 --> 00:03:25,800
more likely, or closer to being found in the
table. So what we're going to do is very simple,

38
00:03:25,800 --> 00:03:31,720
graphically. We're going to take the problem
we're given, and convert it into another problem

39
00:03:31,720 --> 00:03:35,579
that's simpler. And we're going to give that
process a name, and we're going to call

40
00:03:35,579 --> 00:03:46,900
it problem reduction.

41
00:03:46,900 --> 00:03:54,269
And so, in the world of integral calculus,
there are all sorts of simple methods, simple

42
00:03:54,269 --> 00:04:00,010
transformations, we can try that will take
a hard problem and make it into an easier

43
00:04:00,010 --> 00:04:07,659
problem. And some of these transformations
are extremely simple and always safe. Some

44
00:04:07,659 --> 00:04:12,629
of them are just, well let's try it and see
what happens. But some of them are safe, and

45
00:04:12,629 --> 00:04:18,879
I'd like to make a short list of safe transformations
right now.

46
00:04:18,879 --> 00:04:25,490
Now I'm going to be going into some detail.
And that detail will be grungy. And the question

47
00:04:25,490 --> 00:04:30,159
is, why do I do it? And it's educational philosophy,
is why I do it. So here's the educational

48
00:04:30,159 --> 00:04:38,110
philosophy. At one level, you want to have
a skill. But if you're going to have a skill,

49
00:04:38,110 --> 00:04:47,629
you have to understand it. So if you're going
to have a skill you have to understand it

50
00:04:47,629 --> 00:04:54,330
one level down. If you're going to understand
it, you have to have witnessed it on a level

51
00:04:54,330 --> 00:04:56,389
lower than that.

52
00:04:56,389 --> 00:05:00,569
So I'm not just going to talk about the idea
of problem reduction, because if I were just

53
00:05:00,569 --> 00:05:05,969
going to do that, then we could all go home
now. So I'm going to show you a particular

54
00:05:05,969 --> 00:05:10,919
example of it, so you understand it better,
and I'm going to show you the detail at an

55
00:05:10,919 --> 00:05:15,199
even lower level than that. So you will witness
the stuff that makes it possible, to understand

56
00:05:15,199 --> 00:05:19,419
the stuff that makes it possible, to build
a skill. So that's why I'm going through the

57
00:05:19,419 --> 00:05:21,539
grungy detail.

58
00:05:21,539 --> 00:05:27,349
So I don't know, let's see. Maybe we can get
some hints from that example, but I wonder

59
00:05:27,349 --> 00:05:34,569
if somebody could volunteer a simple transformation
that always is a good thing to do. Yes, Sebastian.

60
00:05:34,569 --> 00:05:35,729
AUDIENCE: Take the constants out.

61
00:05:35,729 --> 00:05:41,819
SPEAKER 1: Take the constants out. So we'll
make that number two. And we'll say that the

62
00:05:41,819 --> 00:05:53,680
integral c f of x dx is equal to c times the
integral f of x dx. Other suggestions? Yes.

63
00:05:53,680 --> 00:05:56,449
AUDIENCE: Trig substitution.

64
00:05:56,449 --> 00:06:03,569
SPEAKER 1: Trig substitution. Now this is--
no, that's for day two. We don't do trig substitution

65
00:06:03,569 --> 00:06:10,229
here under stuff that's safe, always works,
never any doubt, there are simpler things.

66
00:06:10,229 --> 00:06:17,319
These are the safe transformations. What you're
giving me is a heuristic transformation. Often

67
00:06:17,319 --> 00:06:22,779
is helpful, doesn't necessarily always work.
We're going to divide our transformations

68
00:06:22,779 --> 00:06:26,729
into those two categories. So I need another
safe one.

69
00:06:26,729 --> 00:06:29,830
AUDIENCE: [INAUDIBLE]

70
00:06:29,830 --> 00:06:37,740
SPEAKER 1: The architects are sitting over
there. Divided not only by nationality, but

71
00:06:37,740 --> 00:06:40,020
by course. What?

72
00:06:40,020 --> 00:06:43,280
AUDIENCE: The sum of integrals is the integral
of the sum.

73
00:06:43,280 --> 00:07:05,360
SPEAKER 1: The sum of integrals is the integral
of the sum. Now what's missing? What's number

74
00:07:05,360 --> 00:07:11,220
one? You're probably thinking it's already
there, because you've given me the transformation

75
00:07:11,229 --> 00:07:16,159
that involves a constant. And you can think
of minus 1 as a constant.

76
00:07:16,159 --> 00:07:20,270
But whether you use a separate transformation
or not, of course depends on how you represent

77
00:07:20,270 --> 00:07:25,889
the knowledge. And all of this knowledge,
all of this whole thing, was written in an

78
00:07:25,889 --> 00:07:32,749
early form of Lisp. As a consequence, the
way in which minus was represented is different

79
00:07:32,749 --> 00:07:38,069
from the way minus 1 is represented. So we
need one more transformation. Or rather, Jim

80
00:07:38,069 --> 00:07:42,469
Slagle needed one more transformation, when
he wrote his famous transformation program.

81
00:07:42,469 --> 00:07:53,599
And that was that if you have the integral
of minus f of x, that's equal to, minus the

82
00:07:53,599 --> 00:07:56,989
integral of f of x.

83
00:07:56,989 --> 00:08:01,099
So that almost completes our safe transformation
set. There's one more that I'm going to supply

84
00:08:01,099 --> 00:08:05,800
you, because I don't think you'd guess it.
Why should you? It's number four. There are

85
00:08:05,800 --> 00:08:11,489
more than this, this is a sample. And these
are the ones we're going to need in order

86
00:08:11,489 --> 00:08:14,989
to solve that problem, by way of illustration.

87
00:08:14,989 --> 00:08:24,439
So the fourth one is that, if you have the
integral of p of x, over q of x, then you

88
00:08:24,439 --> 00:08:31,159
divide. If you can reach way back into high
school and figure out how to divide polynomials.

89
00:08:31,159 --> 00:08:37,179
But if the degree of the numerator is greater
than the degree of the denominator, then it's

90
00:08:37,179 --> 00:08:42,120
a knee-jerk always win, you must do it, divide
it out.
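
For reference, the four safe transformations listed so far can be written out as below. This is a summary sketch of what was described on the board, using the lecture's numbering; the symbols P, Q, and the index i are notation assumed here.

```latex
\begin{align*}
\text{1.}\quad & \int -f(x)\,dx = -\int f(x)\,dx \\
\text{2.}\quad & \int c\,f(x)\,dx = c\int f(x)\,dx \\
\text{3.}\quad & \int \sum_i f_i(x)\,dx = \sum_i \int f_i(x)\,dx \\
\text{4.}\quad & \int \frac{P(x)}{Q(x)}\,dx:\ \text{divide } P \text{ by } Q \text{ when } \deg P > \deg Q
\end{align*}
```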

91
00:08:42,120 --> 00:08:50,259
So this, then, forms the core of an integration
program, that will integrate almost nothing.

92
00:08:50,259 --> 00:08:56,490
But actually, almost nothing is integrable
anyway, so it's a good head start. So let's

93
00:08:56,490 --> 00:09:05,620
see how we would put this into some kind of
procedure. Some kind of framework for deploying

94
00:09:05,620 --> 00:09:12,600
the knowledge that we're beginning to develop.

95
00:09:12,600 --> 00:09:32,920
What we're going to do is, apply all safe
transforms. That's our first step. Then we're

96
00:09:32,920 --> 00:09:52,440
going to look in the table, and then we're
going to do a test to see if we're done. And

97
00:09:52,440 --> 00:10:02,400
if we are, we report success. But, we're not
likely to get done with just that stuff.

98
00:10:02,410 --> 00:10:10,000
But you know what, there was one transformation
up here, which breaks my little diagram. Which

99
00:10:10,000 --> 00:10:19,709
one is it? It's the third one, right? Because
this picture does not reflect what happens

100
00:10:19,709 --> 00:10:24,680
when you apply number three. Because it breaks
the problem up, not into just one problem,

101
00:10:24,680 --> 00:10:33,240
but into a whole bunch. So we have to extend
our graphical device for talking about this

102
00:10:33,240 --> 00:10:47,220
by a little bit, and show what is called an
"and node".

103
00:10:47,220 --> 00:10:55,230
So we've got a program core, we've got a table
of integrals, we've got a few transformations,

104
00:10:55,230 --> 00:10:59,810
we've got an architecture, a way of putting
that stuff together. And now we can try it

105
00:10:59,810 --> 00:11:06,370
out on our sample problem. So let's have a
go at that.

106
00:11:06,370 --> 00:11:20,149
Let's see, this one immediately transforms
into 5x to the fourth over 1 minus x squared

107
00:11:20,149 --> 00:11:28,649
to the 5/2 dx. And that in turn, immediately
transforms into the integral of x to the fourth

108
00:11:28,649 --> 00:11:36,620
over 1 minus x squared to the 5/2, dx.
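
Written out, the two steps just described look roughly like this. The starting integrand is an assumption reconstructed from the mention of "that minus sign and that 5"; the problem on the board is not spelled out in the transcript.

```latex
\begin{align*}
\int \frac{-5x^4}{(1-x^2)^{5/2}}\,dx
  &= -\int \frac{5x^4}{(1-x^2)^{5/2}}\,dx && \text{(safe transformation 1)} \\
  &= -5\int \frac{x^4}{(1-x^2)^{5/2}}\,dx && \text{(safe transformation 2)}
\end{align*}
```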

109
00:11:36,620 --> 00:11:42,699
This program, by the way, is a dawn-age program.
This was written by a nearly blind, and subsequently

110
00:11:42,699 --> 00:11:48,319
completely blind, graduate student by the
name of James Slagle in 1960, a long time

111
00:11:48,319 --> 00:11:54,509
ago. The reason I gave it to you today is
because, that by describing it, I am giving

112
00:11:54,509 --> 00:12:01,079
you a one-lecture course in artificial intelligence.
He anticipated so much of the subsequent 20

113
00:12:01,079 --> 00:12:08,689
years, that talking about his program, which
is possible in one day, is a miniature introduction

114
00:12:08,689 --> 00:12:10,389
to the whole field.

115
00:12:10,389 --> 00:12:17,899
So Slagle, as he was doing this on an antique
computer, almost no memory, almost no speed,

116
00:12:17,899 --> 00:12:25,019
only slightly faster than mice running around
on a treadmill. He was able to write a program

117
00:12:25,019 --> 00:12:31,769
that did extremely well when benchmarked against
freshmen. And the way you benchmark against

118
00:12:31,769 --> 00:12:39,249
freshmen, of course, is you give it an examination,
drawn from the previous MIT finals for four

119
00:12:39,249 --> 00:12:46,100
or five years, the hardest problems. And this
was the hardest problem that it solved.

120
00:12:46,100 --> 00:12:51,769
So at this point, with what we've got so far,
we would be stuck. We have no transformation

121
00:12:51,769 --> 00:12:59,540
that can take us further, so we need something
else. And what we need by way of something

122
00:12:59,540 --> 00:13:08,939
else, is some transformations that we will
describe as-- perhaps we'll call them, heuristic

123
00:13:08,939 --> 00:13:20,279
transformations. A funny word, meaning a method
that often works but isn't guaranteed to work.

124
00:13:20,279 --> 00:13:27,540
It's not an algorithm in the usual sense that
we talk about algorithms. But rather, it's

125
00:13:27,540 --> 00:13:29,430
an attempt.

126
00:13:29,430 --> 00:13:35,519
So these things I'm going to talk about now,
are sometimes useful, not always useful. Sometimes

127
00:13:35,519 --> 00:13:42,209
take you into a blind alley, don't always work.
But you can't get an A in calculus without

128
00:13:42,209 --> 00:13:49,379
knowing some of them. So you said, some kind
of trig substitution. So here is some kind

129
00:13:49,379 --> 00:13:56,329
of trig substitution. We'll call this heuristic
transformation A.

130
00:13:56,329 --> 00:14:10,790
You have a function sine x, cosine x, tangent
of x, cotangent of x, secant of x, and cosecant

131
00:14:10,790 --> 00:14:20,899
of x. And we all know from high school trigonometry,
that we can rewrite that as a function of

132
00:14:20,899 --> 00:14:36,990
sine x, and cosine x. Or we can rewrite that
as a function of tangent of x, and cosecant

133
00:14:36,990 --> 00:14:51,550
of x. Or we can rewrite that as function of
cotangent of x, and the secant of x. So that's

134
00:14:51,550 --> 00:14:57,089
a transformation from one trigonometric form into
another trigonometric form. It's not always

135
00:14:57,089 --> 00:15:02,990
a good idea, sometimes it helps.

136
00:15:02,990 --> 00:15:25,370
Well that's just part one of our suite of
heuristic transformations. Stop. There are

137
00:15:25,370 --> 00:15:32,459
others that we need to have in our repertoire,
in order to solve the problem. One of them

138
00:15:32,459 --> 00:15:38,629
is a family of transformations, which I'll
show you only one. It goes like this, if you

139
00:15:38,629 --> 00:15:53,899
have the integral of a function, of the tangent
of x, then you can rewrite that as the integral

140
00:15:53,899 --> 00:16:05,850
of a function of y over 1 plus y squared dy.
So that's a transformation from a trigonometric

141
00:16:05,850 --> 00:16:11,360
form into a polynomial form. So it gets rid
of all that trigonometric garbage we don't

142
00:16:11,360 --> 00:16:17,319
want to deal with. And there's a whole family
of things like that, just as there's a family

143
00:16:17,319 --> 00:16:22,699
of transformations like so, but this is enough
to give you flavor.

144
00:16:22,699 --> 00:16:30,949
Now there's a C that we need as well. And
that's going to be your proper knee-jerk reaction

145
00:16:30,949 --> 00:16:39,640
when you see something of the form 1 minus
x squared. What do you do when you see that?

146
00:16:39,640 --> 00:16:40,660
AUDIENCE: [INAUDIBLE]

147
00:16:40,660 --> 00:16:41,940
SPEAKER 1: What's that, Rhana?

148
00:16:41,940 --> 00:16:44,340
Rhana: 1 plus x times 1 minus x.

149
00:16:44,380 --> 00:16:49,180
SPEAKER 1: Well, wait a second. We could do that.
But there's another thing we can do.

150
00:16:49,180 --> 00:16:57,260
Christian, have you got something you can
suggest? Where's our Hungarian? Our Turk,

151
00:16:57,260 --> 00:17:02,460
our young Turk. Yeah, what do you think?

152
00:17:02,460 --> 00:17:07,060
AUDIENCE: I actually don't remember. I mean,
I think it might have been 10.

153
00:17:07,060 --> 00:17:11,859
SPEAKER 1: Well, let's see. Cosine squared
plus sine squared equals 1. So, what's that

154
00:17:11,859 --> 00:17:22,060
suggest to you? So it suggests that we make
a transformation that involves x equals sine

155
00:17:22,060 --> 00:17:27,640
y. So [? Silla ?] doesn't actually have to
remember that anymore because going forward,

156
00:17:27,648 --> 00:17:31,640
she will never have to integrate anything
personally in her life, she can just simulate

157
00:17:31,640 --> 00:17:49,010
the program.

158
00:17:49,010 --> 00:17:53,760
So these go from polynomial form, back into
trigonometric form. So you have three of these

159
00:17:53,760 --> 00:17:59,289
heuristic transformations. We've got four
safe transformations. Let's see if we can

160
00:17:59,289 --> 00:18:11,160
make any progress on our integration problem.
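
The three heuristic transformations can be summarized as follows. This is a sketch of what was described; g_1 and g_2 are names assumed here by analogy with the g_3 mentioned on the board.

```latex
\begin{align*}
\text{A.}\quad & f(\sin x, \cos x, \tan x, \cot x, \sec x, \csc x)
   = g_1(\sin x, \cos x) = g_2(\tan x, \csc x) = g_3(\cot x, \sec x) \\
\text{B.}\quad & \int f(\tan x)\,dx = \int \frac{f(y)}{1+y^2}\,dy, \qquad y = \tan x \\
\text{C.}\quad & 1 - x^2 \ \text{suggests the substitution}\ x = \sin y
\end{align*}
```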

161
00:18:11,160 --> 00:18:19,600
OK so keeping track of what we've been using,
this is safe transformation number one, this

162
00:18:19,610 --> 00:18:24,890
is safe transformation number two. What do
we do next? We decided there were no more

163
00:18:24,890 --> 00:18:31,190
safe transformations that apply. But now we
can look at our heuristic transformations

164
00:18:31,190 --> 00:18:35,150
and behold, we see what?

165
00:18:35,150 --> 00:18:35,820
AUDIENCE: C

166
00:18:35,820 --> 00:18:36,920
SPEAKER 1: What?

167
00:18:36,920 --> 00:18:38,460
AUDIENCE: Applying transformation C.

168
00:18:38,470 --> 00:18:50,710
SPEAKER 1: Transformation C suggests that
we do x equals the sine y.

169
00:18:50,710 --> 00:18:57,490
And now we get the integral of sine to the

170
00:18:57,490 --> 00:19:11,380
fourth y over cosine to the fourth y dy, right.
All good, I see some confused, worried, concerned

171
00:19:11,380 --> 00:19:19,320
looks. Maybe I've made a mistake, perhaps
I should use notes. Well no, wait a minute.

172
00:19:19,320 --> 00:19:26,429
For those of you who have a concerned look,
remember that if x equals sine y, then dx

173
00:19:26,429 --> 00:19:33,940
is equal to cosine y dy. That's why it's cosine
to the fourth not cosine to the fifth, as

174
00:19:33,940 --> 00:19:38,580
you were perhaps thinking it might be.
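
The substitution step can be checked term by term. This is a worked sketch, assuming cosine y is nonnegative on the interval of interest so that the 5/2 power simplifies cleanly.

```latex
\begin{align*}
x = \sin y, \qquad dx = \cos y\,dy, \qquad (1 - x^2)^{5/2} = \cos^5 y \\
\int \frac{x^4}{(1-x^2)^{5/2}}\,dx
  = \int \frac{\sin^4 y}{\cos^5 y}\,\cos y\,dy
  = \int \frac{\sin^4 y}{\cos^4 y}\,dy
\end{align*}
```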

175
00:19:38,580 --> 00:19:45,880
So now we've made some progress. We look at
this, we say, are there any safe transformations

176
00:19:45,880 --> 00:19:52,450
that apply? And the answer is, no. Now we
look for a heuristic transformation that might

177
00:19:52,450 --> 00:19:58,029
apply, and I say, what do you see? Which one?
What's that?

178
00:19:58,029 --> 00:20:05,880
AUDIENCE: [INAUDIBLE].

179
00:20:05,880 --> 00:20:09,740
SPEAKER 1: She said something unintelligible,
but what she probably said is, that this looks

180
00:20:09,750 --> 00:20:17,029
like a pattern that might match with the heuristic
transformation A, right? Because we have a

181
00:20:17,029 --> 00:20:22,320
function in which the variable is buried,
universally in sines, or cosines, or tangents,

182
00:20:22,320 --> 00:20:27,649
or cotangents, or secants, or cosecants. And
we know we can rewrite that in one of three

183
00:20:27,649 --> 00:20:35,549
ways. It's already written as a function of
sine and cosine. But we can also rewrite that

184
00:20:35,549 --> 00:20:41,809
in terms of tangent and cosecant. Or cotangent
and secant.

185
00:20:41,809 --> 00:20:50,370
So when we do that, we can go this way, and
we can get the integral of 1 over the cotangent

186
00:20:50,370 --> 00:21:03,120
of x dx. That's g3 up there. Or we can do
it down this path, and get the integral of

187
00:21:03,120 --> 00:21:11,200
tangent of x dx. And of course, those are
both to the fourth.

188
00:21:11,200 --> 00:21:23,140
But you know what, I've broken my little graphical
diagram again. Where did it go, it's disappeared.

189
00:21:23,140 --> 00:21:35,100
There it is. How have I broken it? Because
with transformation A, I've introduced a possibility

190
00:21:35,110 --> 00:21:39,220
that a particular problem can be transformed
into more than one kind of problem, any of

191
00:21:39,220 --> 00:21:43,470
which will be the solution to my problem.

192
00:21:43,470 --> 00:22:00,740
So far I've got an and node, but now I've
got to introduce an or node. Because now we

193
00:22:00,740 --> 00:22:04,210
have an example of something that can be solved
one of two different ways, and we don't care

194
00:22:04,210 --> 00:22:10,399
which one it is. Now you'll notice that there's
already some confusion here, because how can

195
00:22:10,399 --> 00:22:13,970
you tell the difference between an and node
and an or node. So the universal convention

196
00:22:13,970 --> 00:22:20,190
is, you draw an arc over the and nodes. And
that makes it look like an A, so it's easy

197
00:22:20,190 --> 00:22:22,850
to remember. So those are and nodes.
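
One way to picture the difference between and nodes and or nodes is a small data structure. This is a minimal sketch in Python, not Slagle's representation; the class name `Goal`, its fields, and the example labels are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    """A node in a goal tree (also called an and/or tree)."""
    name: str
    kind: str = "leaf"      # "leaf", "and", or "or"
    solved: bool = False    # only meaningful for leaves
    children: List["Goal"] = field(default_factory=list)

    def is_solved(self) -> bool:
        if self.kind == "leaf":
            return self.solved
        results = [child.is_solved() for child in self.children]
        # An and node needs every child; an or node needs any one child.
        return all(results) if self.kind == "and" else any(results)


# Hypothetical example: a sum splits into pieces that must all be solved,
# while a trig rewrite offers alternatives, any one of which will do.
pieces = Goal("sum", "and", children=[
    Goal("piece 1", solved=True),
    Goal("piece 2", solved=True),
    Goal("piece 3", solved=False),
])
rewrites = Goal("trig rewrite", "or", children=[
    Goal("tangent form", solved=True),
    Goal("cotangent form", solved=False),
])
print(pieces.is_solved())    # False: one branch of the and node is unsolved
print(rewrites.is_solved())  # True: one branch of the or node suffices
```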

198
00:22:22,850 --> 00:22:30,200
And now, we have the method of problem reduction,
and this is sometimes called a problem reduction

199
00:22:30,200 --> 00:22:45,750
tree. Sometimes it's called an and/or tree,
and sometimes it's called a goal tree, because

200
00:22:45,750 --> 00:22:53,100
this tree of problems is a tree that shows
how our goals are related to one another.

201
00:22:53,100 --> 00:22:57,800
So these are items for your vocabulary that
are all synonymous. Problem reduction tree,

202
00:22:57,809 --> 00:23:02,200
and/or tree, goal tree, all the same thing.
Now you have a name for it, you've got some

203
00:23:02,200 --> 00:23:11,279
power over it. So when we get a situation
like this, unlike the previous situation,

204
00:23:11,279 --> 00:23:31,880
which we suggested might come up in transformation
A. Let's see, we've got one, two, C, and this

205
00:23:31,880 --> 00:23:40,010
one is A, it's an or node. Which one of these
problems do we work on?

206
00:23:40,010 --> 00:23:45,360
Well, Slagle, who considered himself to be
modeling a freshman, modeling the intelligence

207
00:23:45,360 --> 00:23:52,200
of a freshman, modeling something that, after
all, you have to be pretty smart to do, right.

208
00:23:52,200 --> 00:23:57,460
Most people don't know how to do integration.
Everybody at MIT knows how to do integration.

209
00:23:57,460 --> 00:24:00,100
You would think that somebody, therefore,
that knows how to do integration is pretty

210
00:24:00,100 --> 00:24:03,980
smart. What would a smart person do, when
faced with this choice?

211
00:24:03,980 --> 00:24:12,669
Well, a smart person would say, which of these
two problems is easier? So how do you think

212
00:24:12,669 --> 00:24:21,800
you might determine which of two, or many
algebraic expressions is the easiest to integrate?

213
00:24:21,800 --> 00:24:22,960
What's your name?

214
00:24:22,960 --> 00:24:24,120
AUDIENCE: Andrew Carrol.

215
00:24:24,120 --> 00:24:26,500
SPEAKER 1: Andrew, what do you think?

216
00:24:26,500 --> 00:24:28,039
AUDIENCE: Based on whichever one feels more
familiar.

217
00:24:28,039 --> 00:24:30,010
SPEAKER 1: Feels.

218
00:24:30,010 --> 00:24:31,809
AUDIENCE: Yes.

219
00:24:31,809 --> 00:24:32,340
SPEAKER 1: Feels.

220
00:24:32,340 --> 00:24:32,570
AUDIENCE: You asked, how would I decide.

221
00:24:32,570 --> 00:24:34,760
SPEAKER 1: Yeah, how would you decide? How
would you feel it?

222
00:24:34,760 --> 00:24:37,990
AUDIENCE: I would feel that the tangent is
more familiar.

223
00:24:37,990 --> 00:24:39,679
SPEAKER 1: Which one?

224
00:24:39,679 --> 00:24:42,850
AUDIENCE: I feel that the tangent [INAUDIBLE].

225
00:24:42,850 --> 00:24:44,409
SPEAKER 1: Yeah, but I wonder how we could
make it a little bit more precise, this idea

226
00:24:44,409 --> 00:24:52,349
of simplicity. The young Turk has a suggestion.
What?

227
00:24:52,349 --> 00:24:57,969
AUDIENCE: I had a suggestion until you said
this idea of simplicity. So then I realized

228
00:24:57,969 --> 00:25:02,909
that what I was about to suggest wasn't going
to clarify simplicity, but I was going to

229
00:25:02,909 --> 00:25:09,280
say, whichever one we've had more encounters
with, or more experience with.

230
00:25:09,289 --> 00:25:12,410
SPEAKER 1: Yeah, if there was something here
with a hyperbolic tangent, you might say,

231
00:25:12,419 --> 00:25:15,950
well, stay away from that. [? Yinid ?]?

232
00:25:15,950 --> 00:25:21,399
AUDIENCE: To which one of those the easier
transformation is applied on the next step.

233
00:25:21,399 --> 00:25:24,819
SPEAKER 1: Like, somebody do a little look
ahead, and see which kind of thing would be

234
00:25:24,820 --> 00:25:29,640
next to you? I don't know, maybe. Oh, we've
got lots of people, all at the same time.

235
00:25:29,640 --> 00:25:33,200
I don't know all your names yet. Shoot. Erica,
I know you.

236
00:25:33,200 --> 00:25:35,840
AUDIENCE: What's look it up in the table and
see [INAUDIBLE].

237
00:25:35,840 --> 00:25:40,300
SPEAKER 1: Oh, you could look it up in the
table and see if something is in it, you could

238
00:25:40,300 --> 00:25:46,840
do that. But this is tangent to the fourth,
so that's not in the table. Ariel?

239
00:25:46,840 --> 00:25:49,620
AUDIENCE: I choose the one without the reciprocal.

240
00:25:49,620 --> 00:25:50,800
SPEAKER 1: Why?

241
00:25:50,800 --> 00:25:57,740
AUDIENCE: It is because when people see one
it's like, oh man, it's just not going to work.

242
00:25:57,740 --> 00:26:00,120
SPEAKER 1: Yeah, we're on the right track.
Claire?

243
00:26:00,120 --> 00:26:03,480
AUDIENCE: On an extremely simple level, I
choose whichever one has the least symbols

244
00:26:03,480 --> 00:26:04,680
in it.

245
00:26:04,680 --> 00:26:07,060
SPEAKER 1: The fewest symbols in it. Now we're
really getting somewhere, because you can

246
00:26:07,060 --> 00:26:12,640
measure that, right, there's a little program.
Why, Brett, there you are.

247
00:26:12,640 --> 00:26:19,800
AUDIENCE: I would say, every [INAUDIBLE] expression
can be written as, having a number of functions,

248
00:26:19,800 --> 00:26:23,500
we could say all these functions, multiplied
together, divided, and you can just choose

249
00:26:23,500 --> 00:26:26,980
with the least amount of [? iterations ?].

250
00:26:26,980 --> 00:26:31,760
SPEAKER 1: Well I heard it, perhaps others
didn't but what Brett said, is he suggested

251
00:26:31,769 --> 00:26:38,040
that we should measure depth of functional
composition. So the number of symbols may

252
00:26:38,040 --> 00:26:42,779
not matter, because if you have x plus x plus
x plus x, out to a hundred, that would not

253
00:26:42,779 --> 00:26:49,620
be hard to integrate. But if you've got something
that is really deeply nested under a lot of

254
00:26:49,620 --> 00:26:54,860
functional compositions, that could be a problem.
And that's in fact, what Slagle decided to

255
00:26:54,860 --> 00:27:00,440
use, after trying several alternatives.
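
Depth of functional composition is easy to measure mechanically. Here is a minimal sketch in Python, assuming expressions are stored as nested tuples of the form (operator, arguments...); that representation and the operator names are assumptions for illustration, and Slagle's exact counting rules may have differed.

```python
def depth(expr):
    """Depth of functional composition of an expression.

    Expressions are nested tuples (operator, arg1, arg2, ...);
    variables and numbers are plain strings or numbers with depth 0.
    """
    if not isinstance(expr, tuple):
        return 0
    return 1 + max((depth(arg) for arg in expr[1:]), default=0)


tan4 = ("pow", ("tan", "x"), 4)                  # tan^4 x
inv_cot4 = ("div", 1, ("pow", ("cot", "x"), 4))  # 1 / cot^4 x
long_sum = ("add",) + ("x",) * 100               # x + x + ... + x

print(depth(tan4))      # 2
print(depth(inv_cot4))  # 3
print(depth(long_sum))  # 1
```

Under this measure the tangent form is shallower than the reciprocal cotangent form, and a long flat sum counts as shallow no matter how many symbols it has.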

256
00:27:00,440 --> 00:27:06,370
So if we measure the depth of the functional
composition, this is the winner, and we put

257
00:27:06,370 --> 00:27:11,399
the other one on the shelf, at least for the
moment. And now we have tangent to the fourth

258
00:27:11,399 --> 00:27:18,809
x dx. Do any of the safe transformations apply?
No. Which of the-- you know something has

259
00:27:18,809 --> 00:27:24,440
to apply, otherwise it wouldn't be up here
as an example. So which of the heuristic transformations

260
00:27:24,440 --> 00:27:25,289
apply? Elliott.

261
00:27:25,289 --> 00:27:26,460
AUDIENCE: [INAUDIBLE]

262
00:27:26,460 --> 00:27:30,240
SPEAKER 1: Yeah, B bravo. Military
background or something like that. Maybe

263
00:27:30,240 --> 00:27:38,160
he flies airplanes. OK so B says, it is in
fact a function of the tangent. And when we

264
00:27:38,169 --> 00:27:43,529
do that, we've got to make a substitution,
that y is equal to the tangent. So that means

265
00:27:43,529 --> 00:27:55,799
that this becomes the integral of y to the
fourth over 1 plus y squared. And that's by

266
00:27:55,799 --> 00:28:08,059
transformation B, and the transformation is
y equals tangent of x. The tangent-- I guess

267
00:28:08,059 --> 00:28:17,450
I've lost track of the fact that I've already
transformed a y, but relabeling doesn't matter.

268
00:28:17,450 --> 00:28:25,769
All right, so that's progress, maybe. But I don't
see this in any of the heuristic transformations,

269
00:28:25,769 --> 00:28:30,340
what do I do now? I didn't have to look in
the heuristic transformations, because one

270
00:28:30,340 --> 00:28:39,370
of the safe transformations applies. Because
this thing is a rational function and the

271
00:28:39,370 --> 00:28:44,120
degree of the numerator is greater than the
degree of the denominator, so I have to divide.
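
The division in safe transformation four is ordinary polynomial long division. Here is a minimal sketch in Python, assuming polynomials are given as coefficient lists with the highest degree first; the function name `poly_divmod` is hypothetical.

```python
from fractions import Fraction


def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest degree first.

    Returns (quotient, remainder) with num = quotient * den + remainder.
    Assumes the leading coefficient of den is nonzero.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]
        quotient.append(factor)
        # Subtract factor * den, aligned with the leading term of num.
        for i, c in enumerate(den):
            num[i] -= factor * c
        num.pop(0)  # the leading coefficient is now zero
    return quotient, num


# y^4 divided by (y^2 + 1)
q, r = poly_divmod([1, 0, 0, 0, 0], [1, 0, 1])
print(q)  # coefficients of y^2 + 0*y - 1
print(r)  # coefficients of 0*y + 1
```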

272
00:28:44,120 --> 00:28:52,240
And when I divide, and that by the way is
number four, I get what? Is anybody good at high

273
00:28:52,240 --> 00:28:53,960
school algebra that can help me out with that?

274
00:28:53,960 --> 00:28:59,740
AUDIENCE: Y squared minus 1 plus
1 over 1 plus y squared.

275
00:28:59,740 --> 00:29:15,679
SPEAKER 1: Exactly, y squared minus 1 plus
1 over 1 plus y squared, I think. Now what?

276
00:29:15,679 --> 00:29:31,009
Now we're really getting close to getting
through this, because that is a sum. And by

277
00:29:31,009 --> 00:29:38,070
virtue of the fact that it's a sum, that divides
into three pieces, and the top piece is the

278
00:29:38,070 --> 00:29:44,399
integral of y squared, the middle piece is
the integral of minus 1, and the bottom piece

279
00:29:44,399 --> 00:29:51,929
is the integral of 1 over 1 plus y squared
dy in all cases.

280
00:29:51,929 --> 00:30:06,929
Gosh, if I look this up, I've found it. That's
up there, that's letter B. So I'm done with

281
00:30:06,929 --> 00:30:16,409
that. This one I can transform again, by virtue
of 1, and now I get the integral dy. That's

282
00:30:16,409 --> 00:30:24,750
in there, that's B as well. As for this one, I
don't know. But I'd better keep track of what

283
00:30:24,750 --> 00:30:27,669
I'm doing here. This is an and node, so
I've got to do all of those. I can't give

284
00:30:27,669 --> 00:30:38,139
up on that last thing. And that and transformation
is transformation number 3. So this is in

285
00:30:38,139 --> 00:30:46,710
the table, this is in the table, we still
have this to do, but that's C, heuristic transformation

286
00:30:46,710 --> 00:30:58,659
C. We have 1, plus y squared, then with the
transformation C, with y-- this is y squared--

287
00:30:58,659 --> 00:31:09,149
y equals tangent of z. And then we get to the
integral of dz, and that's in the table, and

288
00:31:09,149 --> 00:31:10,889
we're done.
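
The last stretch of the tree can be written out as a worked sketch. The antiderivatives on the final line are standard results added here for completeness; the lecture stops once every leaf is found in the table.

```latex
\begin{align*}
\int \frac{y^4}{1+y^2}\,dy
  &= \int y^2\,dy \;-\; \int dy \;+\; \int \frac{dy}{1+y^2} \\
y = \tan z,\quad dy = \sec^2 z\,dz,\quad 1 + y^2 = \sec^2 z
  &\;\Longrightarrow\; \int \frac{dy}{1+y^2} = \int dz \\
\int \frac{y^4}{1+y^2}\,dy
  &= \frac{y^3}{3} - y + \arctan y + C
\end{align*}
```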

289
00:31:10,889 --> 00:31:16,120
So now we've solved the problem. It's the
hardest problem that appeared in that half

290
00:31:16,120 --> 00:31:24,009
decade on MIT 18.01 finals. This is exactly
the problem that was given, except that it

291
00:31:24,009 --> 00:31:30,080
started here. I put the other two pieces on
just to illustrate a couple of the transformations.

292
00:31:30,080 --> 00:31:35,480
But that's a problem that it solved.
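
The worked example leaned on two safe transformations, a table, and the rule that every piece of an and node must be solved. A minimal sketch of that much, assuming a tuple encoding of expressions and a tiny stand-in table (both hypothetical, not Slagle's actual representation or table), might look like this:

```python
from fractions import Fraction

# Expressions in y are nested tuples, loosely in the spirit of S-expressions:
#   ("+", a, b, ...)   a sum
#   ("*", c, e)        a constant c times an expression e
#   ("^", "y", n)      y raised to the integer power n
# Anything else (such as the "/" form below) is simply not understood here.

def table_lookup(expr):
    """Illustrative stand-in for the table of integrals (not Slagle's table)."""
    if expr == 1:
        return "y"                                   # integral of dy
    if isinstance(expr, tuple) and expr[0] == "^" and expr[1] == "y":
        n = expr[2]
        if isinstance(n, int) and n != -1:
            return ("*", Fraction(1, n + 1), ("^", "y", n + 1))
    return None

def integrate_safe(expr):
    """Use only the table and two safe transformations; None means unsolved."""
    found = table_lookup(expr)
    if found is not None:
        return found
    if isinstance(expr, tuple) and expr[0] == "*":
        inner = integrate_safe(expr[2])              # constant moves outside
        return None if inner is None else ("*", expr[1], inner)
    if isinstance(expr, tuple) and expr[0] == "+":
        pieces = [integrate_safe(term) for term in expr[1:]]
        if any(piece is None for piece in pieces):
            return None                              # and node: all or nothing
        return ("+", *pieces)
    return None

easy = ("+", ("^", "y", 2), ("*", -1, 1))
hard = ("+", ("^", "y", 2), ("*", -1, 1), ("/", 1, ("+", 1, ("^", "y", 2))))
print(integrate_safe(easy))
print(integrate_safe(hard))   # None: the third piece needs a heuristic transformation
```

The second call fails on purpose. With only safe transformations and the table, the last piece stays open, and because the sum is an and node the whole problem stays open with it.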

293
00:31:35,480 --> 00:31:42,320
And now that we've seen an example, we can
finish up what we talked about a little bit

294
00:31:42,320 --> 00:31:51,779
ago, having to do with the architecture of
this thing. So far, all we've done is talk

295
00:31:51,779 --> 00:31:58,039
about the safe transformations, but now we
know that if we're not done, we need to find

296
00:31:58,040 --> 00:32:06,860
a problem to work on

297
00:32:06,860 --> 00:32:17,159
using that depth of functional composition
business. And then after that we apply heuristic

298
00:32:17,159 --> 00:32:28,819
transformation.
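
The loop just described can be sketched roughly as follows. This is a simplified illustration and not Slagle's code: string labels stand in for real expressions, every open leaf is treated as part of one big and node, and the names `solve`, `table`, `safe`, `heuristic`, and `depth` are all hypothetical.

```python
def solve(goal, table, safe, heuristic, depth, max_rounds=20):
    """Return True if every leaf ends up in the table, else False."""
    open_leaves = [goal]
    for _ in range(max_rounds):
        # Step 1: safe transformations and table lookups until nothing changes.
        changed = True
        while changed:
            changed = False
            remaining = []
            for leaf in open_leaves:
                if leaf in table:
                    changed = True                 # solved, so drop it
                elif leaf in safe:
                    remaining.extend(safe[leaf])   # replace it by its pieces
                    changed = True
                else:
                    remaining.append(leaf)
            open_leaves = remaining
        # Step 2: are we done?
        if not open_leaves:
            return True
        # Step 3: pick the open leaf with the smallest composition depth.
        leaf = min(open_leaves, key=lambda item: depth.get(item, 0))
        if leaf not in heuristic:
            return False   # the real program would back up; this sketch gives up
        # Step 4: apply one heuristic transformation, then go around again.
        open_leaves[open_leaves.index(leaf)] = heuristic[leaf]
    return False

# Hypothetical labels tracing the worked example.
table = {"y^2", "1", "dz"}
safe = {"y^2 - 1 + 1/(1+y^2)": ["y^2", "-1", "1/(1+y^2)"], "-1": ["1"]}
heuristic = {"1/(1+y^2)": "dz"}
depth = {"1/(1+y^2)": 3}          # assumed value, for illustration only
print(solve("y^2 - 1 + 1/(1+y^2)", table, safe, heuristic, depth))
```

Keeping the heuristic step to a single transformation per pass means the cheap safe transformations and table lookups always get another chance before anything expensive is tried again.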

299
00:32:28,820 --> 00:32:31,540
And the way Slagle
designed his program is,

300
00:32:31,540 --> 00:32:34,600
he found just one problem to work on,
did one transformation,

301
00:32:34,610 --> 00:32:38,600
then went back around the loop. Because these
heuristic transformations are a little harder

302
00:32:38,600 --> 00:32:44,679
to apply than the safe ones. So I've given
you an accurate portrayal of what this program

303
00:32:44,679 --> 00:32:51,169
did, except for one thing. Which I would like,
now, to go back and patch up. And that thing

304
00:32:51,169 --> 00:33:09,009
is over here. What to do with something like
this. Well we got to that in a board that's

305
00:33:09,009 --> 00:33:15,350
disappeared, but when we tried to deal with
this, we had to find a heuristic transformation.

306
00:33:15,350 --> 00:33:20,289
And when we decided to work on this, it must
have been the case that this was the simplest

307
00:33:20,289 --> 00:33:24,370
problem at a leaf node that has not yet been
solved.

308
00:33:24,370 --> 00:33:32,559
So what's the functional composition depth
of this? It's 3. Back over here, we have something

309
00:33:32,559 --> 00:33:37,649
that has a depth of functional composition
of 2. So when the program actually ran on

310
00:33:37,649 --> 00:33:44,029
this particular problem, it stopped a few
inches short of the finish line, and went

311
00:33:44,029 --> 00:33:48,720
back and screwed around with that other problem
for a little bit, before it gave up and came

312
00:33:48,720 --> 00:33:50,919
back here.

313
00:33:50,919 --> 00:33:55,669
So it's always looking across the whole tree,
the leaves of the tree. Whenever it has to

314
00:33:55,669 --> 00:33:59,970
find a place to work on with the heuristic
transformation, it happened to look at all

315
00:33:59,970 --> 00:34:04,159
the leaves of the tree that had not yet been
dealt with, tried to find the easiest one,

316
00:34:04,159 --> 00:34:08,550
and that could involve a lot of backing up
and starting over on a branch of the tree

317
00:34:08,550 --> 00:34:14,710
that it had previously ignored. A small detail,
not a particularly important one.
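
Picking the easiest unsolved leaf can be sketched in a few lines. The counting rule below (atoms count 0, each function application adds 1) is an assumption made for illustration, as are the tuple encoding and the two example leaves. The lecture's own counting convention may differ.

```python
def composition_depth(expr):
    """Depth of function nesting: atoms are 0, each application adds 1."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max((composition_depth(arg) for arg in expr[1:]), default=0)

def pick_leaf(open_leaves):
    """Choose the open leaf with the smallest depth of functional composition."""
    return min(open_leaves, key=composition_depth)

# Two hypothetical open leaves, anywhere in the tree.
leaves = [
    ("sqrt", ("-", 1, ("^", "x", 2))),   # depth 3 under this counting rule
    ("sin", ("^", "x", 2)),              # depth 2 under this counting rule
]
print(pick_leaf(leaves))
```

Because the choice is made over all open leaves, a depth-2 leaf on a neglected branch wins over a depth-3 leaf on the branch currently being worked, which is the backing-up behavior described above.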

318
00:34:14,710 --> 00:34:35,620
Now where are we. We've got that guy there.
We've got our complete architecture. We've

319
00:34:35,620 --> 00:34:44,300
got our solved problem. And now we can start
reflecting on what we've done. We can say,

320
00:34:44,300 --> 00:34:49,900
for example, how good an integration program
is this? And the answer is, it was pretty

321
00:34:49,900 --> 00:34:57,970
good. This machine that Slagle was using was
a machine that was over in building 26. And

322
00:34:57,970 --> 00:35:01,900
we were so proud of it, that it was behind
glass, and you could go there and watch the

323
00:35:01,900 --> 00:35:10,610
tape spin, it was really a delight. 32k of
memory, that's 32k of memory. It's amazing

324
00:35:10,610 --> 00:35:17,440
that he was able to do anything with a machine
of that size.

325
00:35:17,440 --> 00:35:29,740
Let's see, let's get us a clean one.
Can't do board geometry and talk at the same

326
00:35:29,740 --> 00:35:40,220
time. We can now ask some questions about
how well the program performed. It was given

327
00:35:40,220 --> 00:35:49,000
56 of the hardest problems, and it got 54
right. What happened when it didn't get the

328
00:35:49,000 --> 00:35:55,310
other two? Well, you might be right if you
said, oh it probably ran out of memory, since

329
00:35:55,310 --> 00:36:01,280
it had 32k. But in fact, it just was lacking
2 transformations that were needed, in order

330
00:36:01,280 --> 00:36:09,930
to solve the whole entire set of final quiz
problems. So when a program fails, that's

331
00:36:09,930 --> 00:36:15,270
often the most interesting question you can
ask. This is an exception. This failed for

332
00:36:15,270 --> 00:36:20,670
uninteresting reasons on 2 of the 56 problems
that it was given.
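
As a quick piece of arithmetic on the figures just quoted, the success rate works out to:

```latex
\[
\frac{54}{56} = \frac{27}{28} \approx 0.964
\]
```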

333
00:36:20,670 --> 00:36:30,750
And now the next question you can say is,
what is the depth of the tree in the maximal

334
00:36:30,750 --> 00:36:36,370
case? And the answer is, it's that case we
just worked out. And since I've once again

335
00:36:36,370 --> 00:36:42,880
lost the whole tree, I'll tell you that its
depth was 7 when you take off that minus 5.

336
00:36:42,880 --> 00:36:51,090
So in the worst case, this thing had to get
down seven levels.

337
00:36:51,090 --> 00:37:00,380
That's the worst case, a more interesting
question is what was the average depth? And

338
00:37:00,380 --> 00:37:09,900
that was approximately 3. And now we're beginning
to say something, not only about Slagle's

339
00:37:09,900 --> 00:37:16,450
model of how a freshman works, but we're beginning
to say something about the nature of the domain.

340
00:37:16,450 --> 00:37:21,980
In the domain of calculus problems, integral
expressions that are given to freshmen, in

341
00:37:21,980 --> 00:37:29,670
that domain, the average depth of problem
reduction needed to solve the problem was

342
00:37:29,670 --> 00:37:35,270
3. So that's not very complicated. If it were
10, you would say, wow, how can anybody ever

343
00:37:35,270 --> 00:37:42,260
do those problems? If it were 5, you'd say,
well only people destined to be math professors

344
00:37:42,260 --> 00:37:48,590
are going to get anything right. If it's 3,
us ordinary mortals can do a pretty good job.

345
00:37:48,590 --> 00:38:06,040
Another question of even greater interest
is, how many branches were unused? Here's

346
00:38:06,040 --> 00:38:10,910
a branch that turned out to be unused, it
didn't pursue that. And so you might say,

347
00:38:10,910 --> 00:38:17,850
well maybe there are a lot of unused branches.
Maybe you have to be pretty smart about your

348
00:38:17,850 --> 00:38:21,160
method for determining what problem to work
on, because otherwise you'll go down a lot

349
00:38:21,160 --> 00:38:23,580
of rat holes.

350
00:38:23,580 --> 00:38:30,250
And guess what, here's another statement about
the domain. In the domain of problems that

351
00:38:30,250 --> 00:38:38,060
freshmen could work on a final, the number
of unused branches is about 1. So that means

352
00:38:38,060 --> 00:38:45,690
this tree keeps itself together, and doesn't
run down to a very large, bushy, useless tree.

353
00:38:45,690 --> 00:38:52,320
So this means that the depth of functional
composition, which Brett suggested as a technique

354
00:38:52,320 --> 00:38:59,750
for recognizing the right problem to work on,
was a choice that didn't actually matter.

355
00:38:59,750 --> 00:39:05,710
Because the tree doesn't grow deep, it doesn't
go broad. It doesn't matter what you use to

356
00:39:05,710 --> 00:39:10,110
decide what to work on, because in the worst
case, you'll just generate a couple of extra,

357
00:39:10,110 --> 00:39:14,150
useless nodes. But they very quickly run to
find a dead end, so you don't have to do anything

358
00:39:14,150 --> 00:39:19,020
more with them.

359
00:39:19,020 --> 00:39:24,280
So now the next thing we need to do is back
even further away from this program, and ask

360
00:39:24,280 --> 00:39:29,370
ourselves some questions about the nature
of what we've been doing. And that brings

361
00:39:29,370 --> 00:39:34,170
me to the things I've got on that upper right-hand
board. One of those things is a catechism

362
00:39:34,170 --> 00:39:37,020
having to do with knowledge.

363
00:39:37,020 --> 00:39:41,050
And what we've done informally as we went
through this program was, we've asked questions

364
00:39:41,050 --> 00:39:48,530
such as, what kind of knowledge is involved
in doing this? Well knowledge about transformation.

365
00:39:48,530 --> 00:39:54,770
Knowledge about how goal trees work and when
we're done with a problem. Knowledge about

366
00:39:54,770 --> 00:39:59,520
what things don't need to be transformed,
because you can look them up in a table. That's

367
00:39:59,520 --> 00:40:05,890
the kind of knowledge that is involved in
doing 18.01. And if you do 18.0 circuit theory,

368
00:40:05,890 --> 00:40:11,020
6.0 circuit theory or 6.0 Maxwell's equations,
this is the same thing.

369
00:40:11,020 --> 00:40:14,020
You have to ask questions of this sort,

370
00:40:14,030 --> 00:40:17,430
about the nature of the knowledge involved,
and question number one is always, what kind

371
00:40:17,430 --> 00:40:23,440
of knowledge is involved? Is it Kirchhoff's
laws, Maxwell's equations, what is it?

372
00:40:23,440 --> 00:40:27,440
The next question is, how is the knowledge
represented? And our answers here are, well

373
00:40:27,440 --> 00:40:32,700
all this stuff, ultimately was represented
in Lisp S-expressions. Some of the

374
00:40:32,700 --> 00:40:37,040
knowledge was recorded in a table of
S-expressions to show what transformations

375
00:40:37,040 --> 00:40:43,930
there are. There was a similar table of integrals.
Knowledge about goal trees was embedded in

376
00:40:43,930 --> 00:40:49,970
the procedure, so it was procedurally represented.
And so for each of the categories of knowledge,

377
00:40:49,970 --> 00:40:56,550
there's a way it gets represented. How is
it used? Straightforward, transformations

378
00:40:56,550 --> 00:41:02,390
are used to make the problem simpler. The
table is used to trim off and to serve as

379
00:41:02,390 --> 00:41:06,530
the bottom of the tree. Those are the ways
in which the knowledge is used.
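
The contrast between knowledge held as data and knowledge held in a procedure can be sketched like this. Everything here is illustrative: the two table entries, the paraphrased transformation names, and the helper names are assumptions and do not reproduce Slagle's tables.

```python
# Declarative knowledge: a table of integrals kept as plain data.
INTEGRAL_TABLE = {
    1: "x",                        # integral of dx
    ("exp", "x"): ("exp", "x"),    # integral of e^x
}

# Declarative knowledge: transformations listed by (paraphrased) name and kind.
TRANSFORMATIONS = [
    ("constant outside the integral", "safe"),
    ("sum into separate integrals", "safe"),
    ("x = tan z for 1 + x^2", "heuristic"),
]

def lookup(integrand):
    """Table use: a hit ends that branch of the tree; None means keep working."""
    return INTEGRAL_TABLE.get(integrand)

def and_node_solved(children_solved):
    """Procedural knowledge: an and node is solved only if every child is."""
    return all(children_solved)

print(lookup(("exp", "x")))
print(lookup(("tan", "x")))
print(and_node_solved([True, True, False]))
```

Adding an integral or a transformation means adding a row of data, while changing how goal trees behave means changing code.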

380
00:41:06,530 --> 00:41:13,440
And then there's the question of course of,
how much knowledge is required. Something

381
00:41:13,440 --> 00:41:19,180
that's useful to know if it's late at night,
you have 2 finals the next day, and you're

382
00:41:19,180 --> 00:41:24,640
not sure which course you should study. So
how much knowledge might you suppose was actually

383
00:41:24,640 --> 00:41:29,960
in this program? I've shown you a glimpse
of the kind of knowledge that's involved in

384
00:41:29,960 --> 00:41:34,710
the program. I've answered a little bit of
question 5, what exactly. But how much knowledge

385
00:41:34,710 --> 00:41:36,910
was involved. You might be surprised by the
answer.

386
00:41:36,910 --> 00:41:44,540
First of all, the table of integrals. I've
listed only 3 things there. There are lots

387
00:41:44,540 --> 00:41:50,770
of other things you can think of, like integral
of e to the x is e to the x. But in the end,

388
00:41:50,770 --> 00:41:58,310
what Slagle found is, a table of only 26 elements
was enough to solve all of these problems.

389
00:41:58,310 --> 00:42:11,320
How about the transformations here, the safe
ones, about 12. How about the heuristic ones,

390
00:42:11,320 --> 00:42:17,490
about 12. So just a few bits and pieces of
knowledge, here and there, are sufficient

391
00:42:17,490 --> 00:42:22,370
to do everything you need to do, in order
to do the integration problems on a calculus

392
00:42:22,370 --> 00:42:25,520
final. That was a surprise.
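
Adding up the counts just given, and treating both "about 12" figures as approximate, the whole knowledge base comes to roughly:

```latex
\[
\underbrace{26}_{\text{table of integrals}}
+ \underbrace{{\sim}12}_{\text{safe transformations}}
+ \underbrace{{\sim}12}_{\text{heuristic transformations}}
\approx 50 \text{ items}
\]
```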

393
00:42:25,520 --> 00:42:31,140
Another surprise of a similar kind, also about
knowledge, is that the relationship between

394
00:42:31,140 --> 00:42:40,400
the method to be used, and the characteristics
of the problem, was almost a diagonal table.

395
00:42:40,400 --> 00:42:46,850
That means that you could, in this domain,
make the right transformation almost all the

396
00:42:46,850 --> 00:42:52,300
time if you're a little bit smart, and never
back up. That was an observation made by Joel

397
00:42:52,300 --> 00:42:58,210
Moses, who became subsequently our provost
here at MIT for a while. And he wrote a program

398
00:42:58,210 --> 00:43:05,760
that could solve anything. It would beat the
most dedicated mathematicians at integration.

399
00:43:05,760 --> 00:43:08,610
And its descendants are in MATLAB today.

400
00:43:08,610 --> 00:43:12,570
But this is how it all works. And now you
can write one of these things yourself. Partly

401
00:43:12,570 --> 00:43:18,300
because you now have this catechism. This
is the kind of stuff you should ask any time

402
00:43:18,300 --> 00:43:24,320
you're dealing with a new domain. It will
make you smarter. And this is of course, meta

403
00:43:24,320 --> 00:43:30,970
knowledge, this is knowledge about knowledge.
So this tired aphorism isn't quite what we

404
00:43:30,970 --> 00:43:43,810
are going to content ourselves with. We're
going to say that knowledge about knowledge

405
00:43:43,810 --> 00:43:45,850
is where the real power is.

406
00:43:45,850 --> 00:43:52,790
Now there's one final thing that this program
does for us. It tells us something about our

407
00:43:52,790 --> 00:43:58,280
appreciation of what it means to be intelligent.
You know that in the beginning of this hour,

408
00:43:58,280 --> 00:44:04,260
I asked you to think about whether a program
that could do symbolic integration would be,

409
00:44:04,260 --> 00:44:12,480
in any way, or should be considered to any
degree, intelligent. And I'm imagining that

410
00:44:12,480 --> 00:44:17,820
even in these days of MATLAB, and whatnot,
many of you said well, yes, I learned how

411
00:44:17,820 --> 00:44:23,190
to do that at MIT, or late in high school,
so it must be smart.

412
00:44:23,190 --> 00:44:29,410
But now that we've completed this discussion,
I also expect that your feeling of intelligence

413
00:44:29,410 --> 00:44:34,740
in this program is somewhat diminished. Because
what happens is that, when we understand how

414
00:44:34,740 --> 00:44:39,890
something works, its intelligence seems to
vanish. You've seen this in your friends,

415
00:44:39,890 --> 00:44:46,740
right? They solve some problem, they seem
super smart. Then they tell you how they did

416
00:44:46,740 --> 00:44:50,830
it, and they don't seem so smart anymore.

417
00:44:50,830 --> 00:44:59,860
So let's conclude our discussion today with
a little story. A long time ago I was talking

418
00:44:59,860 --> 00:45:08,030
with a student who said, computers cannot
be intelligent. And I said, OK, maybe you're

419
00:45:08,030 --> 00:45:12,100
right, but let me show you this program. So
I showed him the integration program, working

420
00:45:12,100 --> 00:45:18,580
on problems like this. And after I showed
him a couple of those examples, he says, well,

421
00:45:18,580 --> 00:45:23,460
all right, I guess maybe they can be intelligent.
I'm learning how to do that, and it's not

422
00:45:23,460 --> 00:45:30,830
always easy. Then I made a fatal mistake.
I said let me show you how it works, and we

423
00:45:30,830 --> 00:45:35,280
spent an hour going through it like this.
And at the end of that time, he turned to

424
00:45:35,280 --> 00:45:42,050
me and said, I take it back, it's not intelligent
after all. It does integration the same way

425
00:45:42,050 --> 00:45:44,840
I do.