1 00:00:08,119 --> 00:00:11,380 What we're going to talk about today is goals. 2 00:00:11,389 --> 00:00:17,550 So just by way of a little warm-up exercise, I'd like you to look at that integration problem 3 00:00:17,550 --> 00:00:40,750 over there. The one that's disappeared. 4 00:00:40,750 --> 00:00:47,840 So the question is, can you do it in your head? Probably not. The question is, if a 5 00:00:47,840 --> 00:00:55,260 program can do that, is a program, in any sense of the word, intelligent? That's a background 6 00:00:55,260 --> 00:00:58,880 task I'd like you to work on as I talk today. 7 00:00:58,880 --> 00:01:01,000 So today we're going to be modeling a little bit of human 8 00:01:01,000 --> 00:01:02,720 problem solving, the kind that 9 00:01:02,720 --> 00:01:07,100 is required when you do symbolic integration. Now, you all learned how to do that. You may 10 00:01:07,100 --> 00:01:10,579 not be able to do that particular problem anymore, but you all learned how to integrate 11 00:01:10,579 --> 00:01:16,950 in high school, or 18.01, or something like that. The question is, how did you do it, and is 12 00:01:16,950 --> 00:01:22,539 the problem solving technique that we are trying to model by building a program that 13 00:01:22,539 --> 00:01:29,610 does symbolic integration, is that a common kind of description of what people do when 14 00:01:29,610 --> 00:01:31,110 they solve problems? 15 00:01:31,110 --> 00:01:34,120 So the answer to the question is, yes. The kind of problem solving you'll see 16 00:01:34,150 --> 00:01:39,280 today is like generate and test, which you saw last time. It's a very common kind of 17 00:01:39,280 --> 00:01:44,799 problem solving that we all engage in, that we all engage in without thinking about it, 18 00:01:44,799 --> 00:01:46,650 and without having a name for it. 19 00:01:46,650 --> 00:01:51,470 But once we get a name for it, we'll get power over it. 
And then we'll be able to deploy 20 00:01:51,470 --> 00:01:55,658 it, and it will become a skill. We'll not just witness it, we'll not just understand 21 00:01:55,658 --> 00:02:00,950 it, we'll use it instinctively, as a skill. 22 00:02:00,950 --> 00:02:04,650 So there you are, you've got that problem, there's your problem, and what do you do to 23 00:02:04,650 --> 00:02:12,060 solve it? I don't know, look it up in a table? You'll never find it in a table because of 24 00:02:12,060 --> 00:02:21,670 that minus sign and that 5. So you're going to have to do something better than that. 25 00:02:21,670 --> 00:02:26,640 So what you're going to do, is what you always do when you see a problem like that. You try 26 00:02:26,640 --> 00:02:31,090 to apply a transform, and make it into a different problem that's easier to solve. And eventually, 27 00:02:31,090 --> 00:02:38,040 what you hope is that you'll simplify it sufficiently, that the pieces that you've simplified to 28 00:02:38,040 --> 00:02:44,450 will be found in some small table of integrals. So how long is this table? It's not the case 29 00:02:44,450 --> 00:02:48,690 that we're going to look at a table with 388 elements, because this is not a big table 30 00:02:48,690 --> 00:02:52,860 of integrals. This is what a freshman might have in a freshman's head, after taking a 31 00:02:52,860 --> 00:02:55,739 course in integral calculus. 32 00:02:55,739 --> 00:03:00,400 One of the interesting questions is, how many elements have to be in that table to get an 33 00:03:00,400 --> 00:03:05,180 A in the course? We're interested in how much knowledge is involved, that's one of the elements 34 00:03:05,180 --> 00:03:09,980 of catechism that I've listed over there, that will be part of the gold star ideas suite 35 00:03:09,980 --> 00:03:12,400 of the day. 
36 00:03:12,400 --> 00:03:19,769 So we'd like to take that problem, and find a way to make it into another problem that's 37 00:03:19,769 --> 00:03:25,800 more likely, or closer to being found in the table. So what we're going to do is very simple, 38 00:03:25,800 --> 00:03:31,720 graphically. We're going to take the problem we're given, and convert it into another problem 39 00:03:31,720 --> 00:03:35,579 that's simpler. And we're going to give that process a name, and we're going to call 40 00:03:35,579 --> 00:03:46,900 it problem reduction. 41 00:03:46,900 --> 00:03:54,269 And so, in the world of integral calculus, there are all sorts of simple methods, simple 42 00:03:54,269 --> 00:04:00,010 transformations, we can try that will take a hard problem and make it into an easier 43 00:04:00,010 --> 00:04:07,659 problem. And some of these transformations are extremely simple and always safe. Some 44 00:04:07,659 --> 00:04:12,629 of them are just, well let's try it and see what happens. But some of them are safe, and 45 00:04:12,629 --> 00:04:18,879 I'd like to make a short list of safe transformations right now. 46 00:04:18,879 --> 00:04:25,490 Now I'm going to be going into some detail. And that detail will be grungy. And the question 47 00:04:25,490 --> 00:04:30,159 is, why do I do it? And it's educational philosophy, is why I do it. So here's the educational 48 00:04:30,159 --> 00:04:38,110 philosophy. At one level, you want to have a skill. But if you're going to have a skill, 49 00:04:38,110 --> 00:04:47,629 you have to understand it. So if you're going to have a skill you have to understand it 50 00:04:47,629 --> 00:04:54,330 one level down. If you're going to understand it, you have to have witnessed it on a level 51 00:04:54,330 --> 00:04:56,389 lower than that. 
52 00:04:56,389 --> 00:05:00,569 So I'm not just going to talk about the idea of problem reduction, because if I were just 53 00:05:00,569 --> 00:05:05,969 going to do that, then we could all go home now. So I'm going to show you a particular 54 00:05:05,969 --> 00:05:10,919 example of it, so you understand it better, and I'm going to show you the detail at an 55 00:05:10,919 --> 00:05:15,199 even lower level than that. So you will witness the stuff that makes it possible, to understand 56 00:05:15,199 --> 00:05:19,419 the stuff that makes it possible, to build a skill. So that's why I'm going through the 57 00:05:19,419 --> 00:05:21,539 grungy detail. 58 00:05:21,539 --> 00:05:27,349 So I don't know, let's see. Maybe we can get some hints from that example, but I wonder 59 00:05:27,349 --> 00:05:34,569 if somebody could volunteer a simple transformation that always is a good thing to do. Yes, Sebastian. 60 00:05:34,569 --> 00:05:35,729 AUDIENCE: Take the constants out. 61 00:05:35,729 --> 00:05:41,819 SPEAKER 1: Take the constants out. So we'll make that number two. And we'll say that the 62 00:05:41,819 --> 00:05:53,680 integral c f of x dx is equal to c times the integral f of x dx. Other suggestions? Yes. 63 00:05:53,680 --> 00:05:56,449 AUDIENCE: Trig substitution. 64 00:05:56,449 --> 00:06:03,569 SPEAKER 1: Trig substitution. Now this is-- no, that's for day two. We don't do trig substitution 65 00:06:03,569 --> 00:06:10,229 here under stuff that's safe, always works, never any doubt, there are simpler things. 66 00:06:10,229 --> 00:06:17,319 These are the safe transformations. What you're giving me is a heuristic transformation. Often 67 00:06:17,319 --> 00:06:22,779 is helpful, doesn't necessarily always work. We're going to divide our transformations 68 00:06:22,779 --> 00:06:26,729 into those two categories. So I need another safe one. 
69 00:06:26,729 --> 00:06:29,830 AUDIENCE: [INAUDIBLE] 70 00:06:29,830 --> 00:06:37,740 SPEAKER 1: The architects are sitting over there. Divided not only by nationality, but 71 00:06:37,740 --> 00:06:40,020 by course. What? 72 00:06:40,020 --> 00:06:43,280 AUDIENCE: The sum of integrals is the integral of the sum. 73 00:06:43,280 --> 00:07:05,360 SPEAKER 1: The sum of integrals is the integral of the sum. Now what's missing? What's number 74 00:07:05,360 --> 00:07:11,220 one? You're probably thinking it's already there, because you've given me the transformation 75 00:07:11,229 --> 00:07:16,159 that involves a constant. And you can think of minus 1 as a constant. 76 00:07:16,159 --> 00:07:20,270 But whether you use a separate transformation or not, of course depends on how you represent 77 00:07:20,270 --> 00:07:25,889 the knowledge. And all of this knowledge, all of this whole thing, was written in an 78 00:07:25,889 --> 00:07:32,749 early form of Lisp. As a consequence, the way in which minus was represented is different 79 00:07:32,749 --> 00:07:38,069 from the way minus 1 is represented. So we need one more transformation. Or rather, Jim 80 00:07:38,069 --> 00:07:42,469 Slagle needed one more transformation, when he wrote his famous transformation program. 81 00:07:42,469 --> 00:07:53,599 And that was that if you have the integral of minus f of x, that's equal to, minus the 82 00:07:53,599 --> 00:07:56,989 integral of f of x. 83 00:07:56,989 --> 00:08:01,099 So that almost completes our safe transformation set. There's one more that I'm going to supply 84 00:08:01,099 --> 00:08:05,800 you, because I don't think you'd guess it. Why should you? It's number four. There are 85 00:08:05,800 --> 00:08:11,489 more than this, this is a sample. And these are the ones we're going to need in order 86 00:08:11,489 --> 00:08:14,989 to solve that problem, by way of illustration. 
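Written out, the three safe transformations named so far look roughly like this. The numbering follows the board as described in the lecture; the notation is a reconstruction, since the board itself is not part of the transcript.

```latex
\begin{align*}
\text{(1)}\quad & \int -f(x)\,dx = -\int f(x)\,dx \\
\text{(2)}\quad & \int c\,f(x)\,dx = c\int f(x)\,dx \\
\text{(3)}\quad & \int \sum_i f_i(x)\,dx = \sum_i \int f_i(x)\,dx
\end{align*}
```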
87 00:08:14,989 --> 00:08:24,439 So the fourth one is that, if you have the integral of p of x, over q of x, then you 88 00:08:24,439 --> 00:08:31,159 divide. If you can reach way back into high school and figure out how to divide polynomials. 89 00:08:31,159 --> 00:08:37,179 But if the degree of the numerator is greater than the degree of the denominator, then it's 90 00:08:37,179 --> 00:08:42,120 a knee-jerk always win, you must do it, divide it out. 91 00:08:42,120 --> 00:08:50,259 So this, then, forms the core of an integration program, that will integrate almost nothing. 92 00:08:50,259 --> 00:08:56,490 But actually, almost nothing is integrable anyway, so it's a good head start. So let's 93 00:08:56,490 --> 00:09:05,620 see how we would put this into some kind of procedure. Some kind of framework for deploying 94 00:09:05,620 --> 00:09:12,600 the knowledge that we're beginning to develop. 95 00:09:12,600 --> 00:09:32,920 What we're going to do is, apply all safe transforms. That's our first step. Then we're 96 00:09:32,920 --> 00:09:52,440 going to look in the table, and then we're going to do a test to see if we're done. And 97 00:09:52,440 --> 00:10:02,400 if we are, we report success. But, we're not likely to get done with just that stuff. 98 00:10:02,410 --> 00:10:10,000 But you know what, there was one transformation up here, which breaks my little diagram. Which 99 00:10:10,000 --> 00:10:19,709 one is it? It's the third one, right? Because this picture does not reflect what happens 100 00:10:19,709 --> 00:10:24,680 when you apply number three. Because it breaks the problem up, not into just one problem, 101 00:10:24,680 --> 00:10:33,240 but into a whole bunch. So we have to extend our graphical device for talking about this 102 00:10:33,240 --> 00:10:47,220 by a little bit, and show what is called an "and node". 
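The first three safe transformations, and the way number three fans out into an "and node", can be sketched in a few lines. This is a minimal illustration and not Slagle's Lisp code: the tuple encoding of expressions (`"neg"`, `"mul"`, `"add"`) and the function name `reduce_safely` are assumptions made up for the example, and polynomial division (number four) is left out.

```python
from numbers import Number

def reduce_safely(expr, coeff=1):
    """Return [(coefficient, integrand), ...] such that the integral of
    `expr` equals the sum of coefficient * integral(integrand).

    Hypothetical encoding: ("neg", f), ("mul", c, f) with numeric c,
    ("add", f1, f2, ...); anything else is treated as an opaque integrand.
    """
    if isinstance(expr, tuple):
        op = expr[0]
        if op == "neg":                                  # safe transformation 1
            return reduce_safely(expr[1], -coeff)
        if op == "mul" and isinstance(expr[1], Number):  # safe transformation 2
            return reduce_safely(expr[2], coeff * expr[1])
        if op == "add":                                  # safe transformation 3
            parts = []
            for term in expr[1:]:
                parts.extend(reduce_safely(term, coeff))
            return parts                                 # several entries: an "and node"
    return [(coeff, expr)]

# The sample problem: the minus sign and the 5 are peeled off first.
problem = ("neg", ("mul", 5, "x^4/(1-x^2)^(5/2)"))
subproblems = reduce_safely(problem)
```

A result with more than one entry means every entry has to be integrated, which is exactly what the arc on an and node expresses.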
103 00:10:47,220 --> 00:10:55,230 So we've got a program core, we've got a table of integrals, we've got a few transformations, 104 00:10:55,230 --> 00:10:59,810 we've got an architecture, a way of putting that stuff together. And now we can try it 105 00:10:59,810 --> 00:11:06,370 out on our sample problem. So let's have a go at that. 106 00:11:06,370 --> 00:11:20,149 Let's see, this one immediately transforms into 5x to the fourth over 1 minus x squared 107 00:11:20,149 --> 00:11:28,649 to the 5/2 dx. And that in turn, immediately transforms into the integral of x to the fourth 108 00:11:28,649 --> 00:11:36,620 over 1 minus x squared to the 5/2, dx. 109 00:11:36,620 --> 00:11:42,699 This program, by the way, is a dawn-age program. This was written by a nearly blind, and subsequently 110 00:11:42,699 --> 00:11:48,319 completely blind, graduate student by the name of James Slagle in 1960, a long time 111 00:11:48,319 --> 00:11:54,509 ago. The reason I gave it to you today is because, that by describing it, I am giving 112 00:11:54,509 --> 00:12:01,079 you a one-lecture course in artificial intelligence. He anticipated so much of the subsequent 20 113 00:12:01,079 --> 00:12:08,689 years, that talking about his program, which is possible in one day, is a miniature introduction 114 00:12:08,689 --> 00:12:10,389 to the whole field. 115 00:12:10,389 --> 00:12:17,899 So Slagle, as he was doing this on an antique computer, almost no memory, almost no speed, 116 00:12:17,899 --> 00:12:25,019 only slightly faster than mice running around on a treadmill. He was able to write a program 117 00:12:25,019 --> 00:12:31,769 that did extremely well when benchmarked against freshmen. And the way you benchmark against 118 00:12:31,769 --> 00:12:39,249 freshman, of course, is you give it an examination, drawn from the previous MIT finals for four 119 00:12:39,249 --> 00:12:46,100 or five years, the hardest problems. And this was the hardest problem that it solved. 
120 00:12:46,100 --> 00:12:51,769 So at this point, with what we've got so far, we would be stuck. We have no transformation 121 00:12:51,769 --> 00:12:59,540 that can take us further, so we need something else. And what we need by way of something 122 00:12:59,540 --> 00:13:08,939 else, is some transformations that we will describe as-- perhaps we'll call them, heuristic 123 00:13:08,939 --> 00:13:20,279 transformations. A funny word, meaning a method that often works but isn't guaranteed to work. 124 00:13:20,279 --> 00:13:27,540 It's not an algorithm in the usual sense that we talk about algorithms. But rather, it's 125 00:13:27,540 --> 00:13:29,430 an attempt. 126 00:13:29,430 --> 00:13:35,519 So these things I'm going to talk about now, are sometimes useful, not always useful. Sometimes 127 00:13:35,519 --> 00:13:42,209 take you into a blind alley, don't always work. But you can't get an A in calculus without 128 00:13:42,209 --> 00:13:49,379 knowing some of them. So you said, some kind of trig substitution. So here is some kind 129 00:13:49,379 --> 00:13:56,329 of trig substitution. We'll call this heuristic transformation A. 130 00:13:56,329 --> 00:14:10,790 You have a function of sine x, cosine x, tangent of x, cotangent of x, secant of x, and cosecant 131 00:14:10,790 --> 00:14:20,899 of x. And we all know from high school trigonometry, that we can rewrite that as a function of 132 00:14:20,899 --> 00:14:36,990 sine x, and cosine x. Or we can rewrite that as a function of tangent of x, and cosecant 133 00:14:36,990 --> 00:14:51,550 of x. Or we can rewrite that as a function of cotangent of x, and the secant of x. So that's 134 00:14:51,550 --> 00:14:57,089 a transformation from trigonometric form, into another trigonometric form. It's not always 135 00:14:57,089 --> 00:15:02,990 a good idea, sometimes it helps. 136 00:15:02,990 --> 00:15:25,370 Well that's just part one of our suite of heuristic transformations. Stop. 
There are 137 00:15:25,370 --> 00:15:32,459 others that we need to have in our repertoire, in order to solve the problem. One of them 138 00:15:32,459 --> 00:15:38,629 is a family of transformations, of which I'll show you only one. It goes like this, if you 139 00:15:38,629 --> 00:15:53,899 have the integral of a function, of the tangent of x, then you can rewrite that as the integral 140 00:15:53,899 --> 00:16:05,850 of a function of y over 1 plus y squared dy. So that's a transformation from a trigonometric 141 00:16:05,850 --> 00:16:11,360 form into a polynomial form. So it gets rid of all that trigonometric garbage we don't 142 00:16:11,360 --> 00:16:17,319 want to deal with. And there's a whole family of things like that, just as there's a family 143 00:16:17,319 --> 00:16:22,699 of transformations like so, but this is enough to give you the flavor. 144 00:16:22,699 --> 00:16:30,949 Now there's a C that we need as well. And that's going to be your proper knee-jerk reaction 145 00:16:30,949 --> 00:16:39,640 when you see something of the form 1 minus x squared. What do you do when you see that? 146 00:16:39,640 --> 00:16:40,660 AUDIENCE: [INAUDIBLE] 147 00:16:40,660 --> 00:16:41,940 What's that, Rhana? 148 00:16:41,940 --> 00:16:44,340 Rhana: (1 + x) * (1 - x) 149 00:16:44,380 --> 00:16:49,180 Well wait a second. We could do that. But there's another thing we can do. 150 00:16:49,180 --> 00:16:57,260 Christian, have you got something you can suggest? Where's our Hungarian? Our Turk, 151 00:16:57,260 --> 00:17:02,460 our young Turk. Yeah, what do you think? 152 00:17:02,460 --> 00:17:07,060 AUDIENCE: I actually don't remember. I mean, I think it might have been 10. 153 00:17:07,060 --> 00:17:11,859 SPEAKER 1: Well, let's see. Cosine squared plus sine squared equals 1. So, what's that 154 00:17:11,859 --> 00:17:22,060 suggest to you? So it suggests that we make a transformation that involves x equals sine 155 00:17:22,060 --> 00:17:27,640 y. So [? Silla ?] 
doesn't actually have to remember that anymore because going forward, 156 00:17:27,648 --> 00:17:31,640 she will never have to integrate anything personally in her life, she can just simulate 157 00:17:31,640 --> 00:17:49,010 the program. 158 00:17:49,010 --> 00:17:53,760 So these go from polynomial form, back into trigonometric form. So you have three of these 159 00:17:53,760 --> 00:17:59,289 heuristic transformations. We've got four safe transformations. Let's see if we can 160 00:17:59,289 --> 00:18:11,160 make any progress on our integration problem. 161 00:18:11,160 --> 00:18:19,600 OK so keeping track of what we've been using, this is safe transformation number one, this 162 00:18:19,610 --> 00:18:24,890 is safe transformation number two. What do we do next? We decided there were no more 163 00:18:24,890 --> 00:18:31,190 safe transformations that apply. But now we can look at our heuristic transformations 164 00:18:31,190 --> 00:18:35,150 and behold, we see what? 165 00:18:35,150 --> 00:18:35,820 AUDIENCE: C 166 00:18:35,820 --> 00:18:36,920 SPEAKER 1: What? 167 00:18:36,920 --> 00:18:38,460 AUDIENCE: Applying transformation C. 168 00:18:38,470 --> 00:18:50,710 SPEAKER 1: Transformation C suggests that we do x equals sine y. 169 00:18:50,710 --> 00:18:57,490 And now we get the integral of sine to the 170 00:18:57,490 --> 00:19:11,380 fourth y over cosine to the fourth y dy, right. All good, I see some confused, worried, concerned 171 00:19:11,380 --> 00:19:19,320 looks. Maybe I've made a mistake, perhaps I should use notes. Well no, wait a minute. 172 00:19:19,320 --> 00:19:26,429 For those of you who have a concerned look, remember that if x equals sine y, then dx 173 00:19:26,429 --> 00:19:33,940 is equal to cosine y dy. That's why it's cosine to the fourth not cosine to the fifth, as 174 00:19:33,940 --> 00:19:38,580 you were perhaps thinking it might be. 175 00:19:38,580 --> 00:19:45,880 So now we've made some progress. 
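For anyone still wearing the concerned look, the substitution step can be written out as follows. This is a worked reconstruction of what the lecture describes, assuming y is restricted to the interval where cosine y is positive so that the 5/2 power is unambiguous.

```latex
\begin{align*}
x &= \sin y, \qquad dx = \cos y\,dy, \qquad 1 - x^2 = \cos^2 y \\
\int \frac{x^4}{(1 - x^2)^{5/2}}\,dx
  &= \int \frac{\sin^4 y}{\cos^5 y}\,\cos y\,dy
   = \int \frac{\sin^4 y}{\cos^4 y}\,dy
\end{align*}
```

The cos y contributed by dx cancels one of the five factors in the denominator, which is why the fourth power appears rather than the fifth.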
We look at this, we say, are there any safe transformations 176 00:19:45,880 --> 00:19:52,450 that apply? And the answer is, no. Now we look for a heuristic transformation that might 177 00:19:52,450 --> 00:19:58,029 apply, and I say, what do you see? Which one? What's that? 178 00:19:58,029 --> 00:20:05,880 AUDIENCE: [INAUDIBLE]. 179 00:20:05,880 --> 00:20:09,740 SPEAKER 1: She said something unintelligible, but what she probably said is, that this looks 180 00:20:09,750 --> 00:20:17,029 like a pattern that might match with the heuristic transformation A, right? Because we have a 181 00:20:17,029 --> 00:20:22,320 function in which the variable is buried, universally in sines, or cosines, or tangents, 182 00:20:22,320 --> 00:20:27,649 or cotangents, or secants, or cosecants. And we know we can rewrite that in one of three 183 00:20:27,649 --> 00:20:35,549 ways. It's already written as a function of sine and cosine. But we can also rewrite that 184 00:20:35,549 --> 00:20:41,809 in terms of tangent and cosecant. Or cotangent and secant. 185 00:20:41,809 --> 00:20:50,370 So when we do that, we can go this way, and we can get the integral of 1 over the cotangent 186 00:20:50,370 --> 00:21:03,120 of x dx. That's g3 up there. Or we can do it down this path, and get the integral of 187 00:21:03,120 --> 00:21:11,200 tangent of x dx. And of course, those are both to the fourth. 188 00:21:11,200 --> 00:21:23,140 But you know what, I've broken my little graphical diagram again. Where did it go, it's disappeared. 189 00:21:23,140 --> 00:21:35,100 There it is. How have I broken it? Because with transformation A, I've introduced a possibility 190 00:21:35,110 --> 00:21:39,220 that a particular problem can be transformed into more than one kind of problem, any of 191 00:21:39,220 --> 00:21:43,470 which will be the solution to my problem. 192 00:21:43,470 --> 00:22:00,740 So far I've got an and node, but now I've got to introduce an or node. 
Because now we 193 00:22:00,740 --> 00:22:04,210 have an example of something that can be solved one of two different ways, and we don't care 194 00:22:04,210 --> 00:22:10,399 which one it is. Now you'll notice that there's already some confusion here, because how can 195 00:22:10,399 --> 00:22:13,970 you tell the difference between an and node and an or node? So the universal convention 196 00:22:13,970 --> 00:22:20,190 is, you draw an arc over the and nodes. And that makes it look like an A, so it's easy 197 00:22:20,190 --> 00:22:22,850 to remember. So those are and nodes. 198 00:22:22,850 --> 00:22:30,200 And now, we have the method of problem reduction, and this is sometimes called a problem reduction 199 00:22:30,200 --> 00:22:45,750 tree. Sometimes it's called an and/or tree, and sometimes it's called a goal tree, because 200 00:22:45,750 --> 00:22:53,100 this tree of problems is a tree that shows how our goals are related to one another. 201 00:22:53,100 --> 00:22:57,800 So these are items for your vocabulary that are all synonymous. Problem reduction tree, 202 00:22:57,809 --> 00:23:02,200 and/or tree, goal tree, all the same thing. Now you have a name for it, you've got some 203 00:23:02,200 --> 00:23:11,279 power over it. So when we get a situation like this, unlike the previous situation, 204 00:23:11,279 --> 00:23:31,880 which we suggested might come up in transformation A. Let's see, we've got one, two, C, and this 205 00:23:31,880 --> 00:23:40,010 one is A, it's an or node. Which one of these problems do we work on? 206 00:23:40,010 --> 00:23:45,360 Well, Slagle, who considered himself to be modeling a freshman, modeling the intelligence 207 00:23:45,360 --> 00:23:52,200 of a freshman, modeling something that, after all, you have to be pretty smart to do, right. 208 00:23:52,200 --> 00:23:57,460 Most people don't know how to do integration. Everybody at MIT knows how to do integration. 
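The and/or tree (goal tree) just named can be captured in a very small data structure. This is a sketch for illustration only; the class name `Goal` and its fields are assumptions invented here, not anything taken from Slagle's program.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Goal:
    """One node of a goal tree (hypothetical structure for illustration)."""
    name: str
    kind: str = "leaf"        # "leaf", "and", or "or"
    solved: bool = False      # only consulted for leaves
    children: List["Goal"] = field(default_factory=list)

    def is_solved(self) -> bool:
        if self.kind == "and":    # every subproblem must be solved
            return all(child.is_solved() for child in self.children)
        if self.kind == "or":     # any one alternative is enough
            return any(child.is_solved() for child in self.children)
        return self.solved

# The or node produced by heuristic transformation A in the lecture.
tan4 = Goal("integral of tan^4")
inv_cot4 = Goal("integral of 1/cot^4")
choice = Goal("integral of sin^4/cos^4", kind="or", children=[inv_cot4, tan4])
```

Marking either child as solved makes `choice.is_solved()` true, whereas a node with `kind="and"` would need all of its children solved.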
209 00:23:57,460 --> 00:24:00,100 You would think that somebody, therefore, that knows how to do integration is pretty 210 00:24:00,100 --> 00:24:03,980 smart. What would a smart person do, when faced with this choice? 211 00:24:03,980 --> 00:24:12,669 Well, a smart person would say, which of these two problems is easier? So how do you think 212 00:24:12,669 --> 00:24:21,800 you might determine which of two, or many algebraic expressions is the easiest to integrate? 213 00:24:21,800 --> 00:24:22,960 What's your name? 214 00:24:22,960 --> 00:24:24,120 AUDIENCE: Andrew Carrol. 215 00:24:24,120 --> 00:24:26,500 SPEAKER 1: Andrew, what do you think? 216 00:24:26,500 --> 00:24:28,039 AUDIENCE: Based on whichever one feels more familiar. 217 00:24:28,039 --> 00:24:30,010 SPEAKER 1: Feels. 218 00:24:30,010 --> 00:24:31,809 AUDIENCE: Yes. 219 00:24:31,809 --> 00:24:32,340 SPEAKER 1: Feels. 220 00:24:32,340 --> 00:24:32,570 AUDIENCE: You asked, how would I decide. 221 00:24:32,570 --> 00:24:34,760 SPEAKER 1: Yeah, how would you decide? How would you feel it? 222 00:24:34,760 --> 00:24:37,990 AUDIENCE: I would feel that the tangent is more familiar. 223 00:24:37,990 --> 00:24:39,679 SPEAKER 1: Which one? 224 00:24:39,679 --> 00:24:42,850 AUDIENCE: I feel that the tangent [INAUDIBLE]. 225 00:24:42,850 --> 00:24:44,409 SPEAKER 1: Yeah, but I wonder how we could make it a little bit more precise, this idea 226 00:24:44,409 --> 00:24:52,349 of simplicity. The young Turk has a suggestion. What? 227 00:24:52,349 --> 00:24:57,969 AUDIENCE: I had a suggestion until you said this idea of simplicity. So then I realized 228 00:24:57,969 --> 00:25:02,909 that what I was about to suggest wasn't going to clarify simplicity, but I was going to 229 00:25:02,909 --> 00:25:09,280 say, whichever one we've had more encounters with, or more experience with. 
230 00:25:09,289 --> 00:25:12,410 SPEAKER 1: Yeah, if there was something here with a hyperbolic tangent, you might say, 231 00:25:12,419 --> 00:25:15,950 well, stay away from that. [? Yinid ?]? 232 00:25:15,950 --> 00:25:21,399 AUDIENCE: To which one of those the easier transformation is applied on the next step. 233 00:25:21,399 --> 00:25:24,819 SPEAKER 1: Like, somebody do a little look ahead, and see which kind of thing would be 234 00:25:24,820 --> 00:25:29,640 next to you? I don't know, maybe. Oh, we've got lots of people, all at the same time. 235 00:25:29,640 --> 00:25:33,200 I don't know all your names yet. Shoot. Erica, I know you. 236 00:25:33,200 --> 00:25:35,840 AUDIENCE: What's look it up in the table and see [INAUDIBLE]. 237 00:25:35,840 --> 00:25:40,300 SPEAKER 1: Oh, you could look it up in the table and see if something is in it, you could 238 00:25:40,300 --> 00:25:46,840 do that. But this is tangent to the fourth, so that's not in the table. Ariel? 239 00:25:46,840 --> 00:25:49,620 AUDIENCE: I choose the one without the reciprocal. 240 00:25:49,620 --> 00:25:50,800 SPEAKER 1: Why? 241 00:25:50,800 --> 00:25:57,740 AUDIENCE: It is because when people see one it's like, oh man, it's just not going to work. 242 00:25:57,740 --> 00:26:00,120 SPEAKER 1: Yeah, we're on the right track. Claire? 243 00:26:00,120 --> 00:26:03,480 AUDIENCE: On an extremely simple level, I choose whichever one has the least symbols 244 00:26:03,480 --> 00:26:04,680 in it. 245 00:26:04,680 --> 00:26:07,060 SPEAKER 1: The fewest symbols in it. Now we're really getting somewhere, because you can 246 00:26:07,060 --> 00:26:12,640 measure that, right, there's a little program. Why, Brett, there you are. 
247 00:26:12,640 --> 00:26:19,800 AUDIENCE: I would say, every [INAUDIBLE] expression can be written as, having a number of functions, 248 00:26:19,800 --> 00:26:23,500 we could say all these functions, multiplied together, divided, and you can just choose 249 00:26:23,500 --> 00:26:26,980 with the least amount of [? iterations ?]. 250 00:26:26,980 --> 00:26:31,760 SPEAKER 1: Well I heard it, perhaps others didn't, but what Brett suggested is 251 00:26:31,769 --> 00:26:38,040 that we should measure depth of functional composition. So the number of symbols may 252 00:26:38,040 --> 00:26:42,779 not matter, because if you have x plus x plus x plus x, out to a hundred, that would not 253 00:26:42,779 --> 00:26:49,620 be hard to integrate. But if you've got something that is really deeply nested under a lot of 254 00:26:49,620 --> 00:26:54,860 functional compositions, that could be a problem. And that's in fact, what Slagle decided to 255 00:26:54,860 --> 00:27:00,440 use, after trying several alternatives. 256 00:27:00,440 --> 00:27:06,370 So if we measure the depth of the functional composition, this is the winner, and we put 257 00:27:06,370 --> 00:27:11,399 the other one on the shelf, at least for the moment. And now we have tangent to the fourth 258 00:27:11,399 --> 00:27:18,809 x dx. Do any of the safe transformations apply? No. Which of the-- you know something has 259 00:27:18,809 --> 00:27:24,440 to apply, otherwise it wouldn't be up here as an example. So which of the heuristic transformations 260 00:27:24,440 --> 00:27:25,289 apply? Elliott. 261 00:27:25,289 --> 00:27:26,460 AUDIENCE: [INAUDIBLE] 262 00:27:26,460 --> 00:27:30,240 SPEAKER 1: Yeah, B bravo. Military background or something like that. Maybe 263 00:27:30,240 --> 00:27:38,160 he flies airplanes. OK so B says, it is in fact a function of the tangent. And when we 264 00:27:38,169 --> 00:27:43,529 do that, we've got to make a substitution, that y is equal to the tangent. 
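The depth-of-functional-composition measure discussed above is easy to sketch. The nested-tuple encoding and the counting convention here (every operator, including division and powers, adds one level) are assumptions for illustration; the lecture does not spell out exactly how the original program counted, so the numbers may not match its own.

```python
def depth(expr):
    """Depth of functional composition under an assumed convention:
    0 for a bare symbol or number, otherwise 1 + the deepest argument."""
    if not isinstance(expr, tuple):
        return 0
    return 1 + max((depth(arg) for arg in expr[1:]), default=0)

# Hypothetical encodings of the two alternatives at the or node.
tan4 = ("pow", ("tan", "x"), 4)
inv_cot4 = ("div", 1, ("pow", ("cot", "x"), 4))

# A long but shallow sum: many symbols, very little nesting.
long_sum = ("add",) + ("x",) * 100

# Pick the alternative that is least deeply nested.
easiest = min([inv_cot4, tan4], key=depth)
```

Under this convention the hundred-term sum has depth 1, which matches the point that symbol count alone is a poor measure of difficulty, and the tangent form comes out as the easier alternative.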
So that means 265 00:27:43,529 --> 00:27:55,799 that this becomes the integral of y to the fourth over 1 plus y squared. And that's by 266 00:27:55,799 --> 00:28:08,059 transformation B, and the transformation is y equals tangent of x. The tangent-- I guess 267 00:28:08,059 --> 00:28:17,450 I've lost track of the fact that I've already transformed a y, but relabeling doesn't matter. 268 00:28:17,450 --> 00:28:25,769 All right so that's progress, maybe. But I don't see this in any of the heuristic transformations, 269 00:28:25,769 --> 00:28:30,340 what do I do now? I didn't have to look in the heuristic transformations, because one 270 00:28:30,340 --> 00:28:39,370 of the safe transformations applies. Because this thing is a rational function and the 271 00:28:39,370 --> 00:28:44,120 degree of the numerator is greater than the degree of the denominator, so I have to divide. 272 00:28:44,120 --> 00:28:52,240 And when I divide, and that by the way is number four, I get what? Is anybody good at high 273 00:28:52,240 --> 00:28:53,960 school algebra that can help me out with that? 274 00:28:53,960 --> 00:28:59,740 AUDIENCE: Y squared minus 2 plus negative 2 over 1 plus y squared 275 00:28:59,740 --> 00:29:15,679 SPEAKER 1: Exactly, y squared minus 1 plus 1 over 1 plus y squared, I think. Now what? 276 00:29:15,679 --> 00:29:31,009 Now we're really getting close to getting through this, because that is a sum. And by 277 00:29:31,009 --> 00:29:38,070 virtue of the fact that it's a sum, that divides into three pieces, and the top piece is the 278 00:29:38,070 --> 00:29:44,399 integral of y squared, the middle piece is the integral of minus 1, and the bottom piece 279 00:29:44,399 --> 00:29:51,929 is the integral of 1 over 1 plus y squared dy in all cases. 280 00:29:51,929 --> 00:30:06,929 Gosh, if I look this up, I've found it. That's up there, that's letter B. So I'm done with 281 00:30:06,929 --> 00:30:16,409 that. 
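The division step can be checked by hand. The following worked lines confirm the quotient the speaker gives and show how safe transformation number three then splits the result into three pieces.

```latex
\begin{align*}
y^4 &= (y^2 - 1)(1 + y^2) + 1 \\
\frac{y^4}{1 + y^2} &= y^2 - 1 + \frac{1}{1 + y^2} \\
\int \frac{y^4}{1 + y^2}\,dy
  &= \int y^2\,dy \;+\; \int (-1)\,dy \;+\; \int \frac{1}{1 + y^2}\,dy
\end{align*}
```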
This one I can transform again, by virtue of 1, and now I get the integral dy. That's 282 00:30:16,409 --> 00:30:24,750 in there, that's B as well. As for this one, I don't know. But I'd better keep track of what 283 00:30:24,750 --> 00:30:27,669 I'm doing here. This is in the and node, so I've got to do all of those. I can't give 284 00:30:27,669 --> 00:30:38,139 up on that last thing. And that and transformation is transformation number 3. So this is in 285 00:30:38,139 --> 00:30:46,710 the table, this is in the table, we still have this to do, but that's C, heuristic transformation 286 00:30:46,710 --> 00:30:58,659 C. We have 1, plus y squared, then with the transformation C, with y-- this is y squared-- 287 00:30:58,659 --> 00:31:09,149 y equals tangent of z. And then we get to the integral of dz and that's in the table, and 288 00:31:09,149 --> 00:31:10,889 we're done. 289 00:31:10,889 --> 00:31:16,120 So now we've solved the problem. It's the hardest problem that appeared in that half 290 00:31:16,120 --> 00:31:24,009 decade on MIT 18.01 finals. This is exactly the problem that was given, except that it 291 00:31:24,009 --> 00:31:30,080 started here. I put the other two pieces on just to illustrate a couple of the transformations. 292 00:31:30,080 --> 00:31:35,480 But that's a problem that it solved. 293 00:31:35,480 --> 00:31:42,320 And now that we've seen an example, we can finish up what we talked about a little bit 294 00:31:42,320 --> 00:31:51,779 ago, having to do with the architecture of this thing. So far, all we've done is talk 295 00:31:51,779 --> 00:31:58,039 about the safe transformations, but now we know that if we're not done, we need to find 296 00:31:58,040 --> 00:32:06,860 a problem to work on 297 00:32:06,860 --> 00:32:17,159 using that depth of functional composition business. And then after that we apply heuristic 298 00:32:17,159 --> 00:32:28,819 transformation. 
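The architecture just described (apply safe transformations, look in the table, and if not done pick the easiest open problem and apply heuristic transformations) can be sketched as a loop. This is a deliberately simplified toy under stated assumptions: it handles only or-style alternatives, it applies every supplied heuristic to the chosen leaf at once, and the function name `solve`, its parameters, and the toy data are all invented for illustration rather than drawn from the original program.

```python
def solve(problem, apply_safe, in_table, heuristics, difficulty, max_steps=50):
    """Toy problem-reduction loop over or-alternatives only.

    apply_safe(p)  -> p after all safe transformations (assumed helper)
    in_table(p)    -> True if p can be looked up directly
    heuristics     -> functions mapping a problem to a list of rewrites
    difficulty(p)  -> a number; smaller means easier
    """
    open_leaves = [apply_safe(problem)]
    for _ in range(max_steps):
        for leaf in open_leaves:
            if in_table(leaf):              # test for done: report success
                return leaf
        if not open_leaves:                 # nothing left to try
            return None
        easiest = min(open_leaves, key=difficulty)   # scan all open leaves
        open_leaves.remove(easiest)
        for heuristic in heuristics:        # expand the chosen leaf
            open_leaves.extend(apply_safe(q) for q in heuristic(easiest))
    return None

# Toy data, not real calculus: P rewrites to Q1 or Q2, and Q2 rewrites to R.
rewrites = {"P": ["Q1", "Q2"], "Q2": ["R"]}
hardness = {"P": 3, "Q1": 2, "Q2": 1, "R": 0}
found = solve("P",
              apply_safe=lambda p: p,
              in_table=lambda p: p == "R",
              heuristics=[lambda p: rewrites.get(p, [])],
              difficulty=hardness.get)
```

Because the easiest leaf is chosen from the whole list of open leaves on every pass, the loop can jump back to a branch that was shelved earlier.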
299 00:32:28,820 --> 00:32:31,540 And the way Slagle designed his program is, 300 00:32:31,540 --> 00:32:34,600 he found just one problem to work on, did one transformation, 301 00:32:34,610 --> 00:32:38,600 then went back around the loop. Because these heuristic transformations are a little harder 302 00:32:38,600 --> 00:32:44,679 to apply than the safe ones. So I've given you an accurate portrayal of what this program 303 00:32:44,679 --> 00:32:51,169 did, except for one thing. Which I would like, now, to go back and patch up. And that thing 304 00:32:51,169 --> 00:33:09,009 is over here. What to do with something like this. Well we got to that in a board that's 305 00:33:09,009 --> 00:33:15,350 disappeared, but when we tried to deal with this, we had to find a heuristic transformation. 306 00:33:15,350 --> 00:33:20,289 And when we decided to work on this, it must have been the case that this was the simplest 307 00:33:20,289 --> 00:33:24,370 problem at a leaf node that has not yet been solved. 308 00:33:24,370 --> 00:33:32,559 So what's the functional composition depth of this? It's 3. Back over here, we have something 309 00:33:32,559 --> 00:33:37,649 that has a depth of functional composition of 2. So when the program actually ran on 310 00:33:37,649 --> 00:33:44,029 this particular problem, it stopped a few inches short of the finish line, and went 311 00:33:44,029 --> 00:33:48,720 back and screwed around with that other problem for a little bit, before it gave up and came 312 00:33:48,720 --> 00:33:50,919 back here.
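One way to picture the choice of the simplest unsolved leaf is the sketch below. It is a hedged illustration only: the nested-tuple encoding, the `depth` and `pick_leaf` names, and the two sample expressions are assumptions, and the exact counting convention the program used for depth of functional composition may differ.

```python
def depth(expr):
    """Depth of functional composition of a nested-tuple expression.

    Atoms (numbers, variable names) count as 0; an operator application
    counts as 1 plus the deepest of its arguments.
    """
    if not isinstance(expr, tuple):
        return 0
    return 1 + max((depth(arg) for arg in expr[1:]), default=0)


def pick_leaf(leaves):
    """Choose the unsolved leaf with the smallest composition depth."""
    return min(leaves, key=depth)


# Hypothetical unsolved leaves, written as (operator, argument, ...).
leaf_a = ("/", 1, ("+", 1, ("^", "y", 2)))   # 1 / (1 + y^2), depth 3
leaf_b = ("*", "x", ("sin", "x"))            # x * sin(x), depth 2

chosen = pick_leaf([leaf_a, leaf_b])
print(depth(leaf_a), depth(leaf_b), chosen)
```

Under this convention the depth-2 leaf is picked ahead of the depth-3 one, which is the kind of detour described above: the program wanders off to a shallower leaf elsewhere in the tree before returning.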
Whenever it has to 314 00:33:55,669 --> 00:33:59,970 find a place to work on with the heuristic transformation, it happened to look at all 315 00:33:59,970 --> 00:34:04,159 the leaves of the tree that had not yet been dealt with, tried to find the easiest one, 316 00:34:04,159 --> 00:34:08,550 and that could involve a lot of backing up and starting over on a branch of the tree 317 00:34:08,550 --> 00:34:14,710 that it had previously ignored. A small detail, not a particularly important one. 318 00:34:14,710 --> 00:34:35,620 Now where are we. We've got that guy there. We've got our complete architecture. We've 319 00:34:35,620 --> 00:34:44,300 got our solved problem. And now we can start reflecting on what we've done. We can say, 320 00:34:44,300 --> 00:34:49,900 for example, how good an integration program is this? And the answer is, it was pretty 321 00:34:49,900 --> 00:34:57,970 good. This machine that Slagle was using was a machine that was over in building 26. And 322 00:34:57,970 --> 00:35:01,900 we were so proud of it, that it was behind glass, and you could go there and watch the 323 00:35:01,900 --> 00:35:10,610 tape spin, it was really a delight. 32k of memory, that's 32k of memory. It's amazing 324 00:35:10,610 --> 00:35:17,440 that he was able to do anything with a machine of that size. 325 00:35:17,440 --> 00:35:29,740 Let's see, let's get us a clean one. Can't do board geometry and talk at the same 326 00:35:29,740 --> 00:35:40,220 time. We can now ask some questions about how well the program performed. It was given 327 00:35:40,220 --> 00:35:49,000 56 of the hardest problems, and it got 54 right. What happened when it didn't get the 328 00:35:49,000 --> 00:35:55,310 other two? Well, you might be right if you said, oh it probably ran out of memory, since 329 00:35:55,310 --> 00:36:01,280 it had 32k. 
But in fact, it just was lacking 2 transformations that were needed, in order 330 00:36:01,280 --> 00:36:09,930 to solve the whole entire set of final quiz problems. So when a program fails, that's 331 00:36:09,930 --> 00:36:15,270 often the most interesting question you can ask. This is an exception. This failed for 332 00:36:15,270 --> 00:36:20,670 uninteresting reasons on 2 of the 56 problems that it was given. 333 00:36:20,670 --> 00:36:30,750 And now the next question you can say is, what is the depth of the tree in the maximal 334 00:36:30,750 --> 00:36:36,370 case? And the answer is, it's that case we just worked out. And since I've once again 335 00:36:36,370 --> 00:36:42,880 lost the whole tree, I'll tell you that its depth was 7 when you take off that minus 5. 336 00:36:42,880 --> 00:36:51,090 So in the worst case, this thing had to get down seven levels. 337 00:36:51,090 --> 00:37:00,380 That's the worst case, a more interesting question is what was the average depth? And 338 00:37:00,380 --> 00:37:09,900 that was approximately 3. And now we're beginning to say something, not only about Slagle's 339 00:37:09,900 --> 00:37:16,450 model of how a freshman works, but we're beginning to say something about the nature of the domain. 340 00:37:16,450 --> 00:37:21,980 In the domain of calculus problems, integral expressions that are given to freshmen, in 341 00:37:21,980 --> 00:37:29,670 that domain, the average depth of problem reduction needed to solve the problem was 342 00:37:29,670 --> 00:37:35,270 3. So that's not very complicated. If it were 10, you would say, wow, how can anybody ever 343 00:37:35,270 --> 00:37:42,260 do those problems? If it were 5, you'd say, well only people destined to be math professors 344 00:37:42,260 --> 00:37:48,590 are going to get anything right. If it's 3, us ordinary mortals can do a pretty good job. 345 00:37:48,590 --> 00:38:06,040 Another question of even greater interest is, how many branches were unused?
Here's 346 00:38:06,040 --> 00:38:10,910 a branch that turned out to be unused, it didn't pursue that. And so you might say, 347 00:38:10,910 --> 00:38:17,850 well maybe there are a lot of unused branches. Maybe you have to be pretty smart about your 348 00:38:17,850 --> 00:38:21,160 method for determining what problem to work on, because otherwise you'll go down a lot 349 00:38:21,160 --> 00:38:23,580 of rat holes. 350 00:38:23,580 --> 00:38:30,250 And guess what, here's another statement about the domain. In the domain of problems that 351 00:38:30,250 --> 00:38:38,060 freshmen could work on a final, the number of unused branches is about 1. So that means 352 00:38:38,060 --> 00:38:45,690 this tree keeps itself together, and doesn't run down to a very large, bushy, useless tree. 353 00:38:45,690 --> 00:38:52,320 So this means that the depth of functional composition, which Brett suggested as a technique 354 00:38:52,320 --> 00:38:59,750 for recognizing the right problem to work on, was a choice that didn't actually matter. 355 00:38:59,750 --> 00:39:05,710 Because the tree doesn't grow deep, it doesn't go broad. It doesn't matter what you use to 356 00:39:05,710 --> 00:39:10,110 decide what to work on, because in the worst case, you'll just generate a couple of extra, 357 00:39:10,110 --> 00:39:14,150 useless nodes. But they very quickly run to find a dead end, so you don't have to do anything 358 00:39:14,150 --> 00:39:19,020 more with them. 359 00:39:19,020 --> 00:39:24,280 So now the next thing we need to do is back even further away from this program, and ask 360 00:39:24,280 --> 00:39:29,370 ourselves some questions about the nature of what we've been doing. And that brings 361 00:39:29,370 --> 00:39:34,170 me to the things I've got on that upper right-hand board. One of those things is a catechism 362 00:39:34,170 --> 00:39:37,020 having to do with knowledge.
363 00:39:37,020 --> 00:39:41,050 And what we've done informally as we went through this program was, we've asked questions 364 00:39:41,050 --> 00:39:48,530 such as, what kind of knowledge is involved in doing this? Well knowledge about transformation. 365 00:39:48,530 --> 00:39:54,770 Knowledge about how goal trees work and when we're done with a problem. Knowledge about 366 00:39:54,770 --> 00:39:59,520 what things don't need to be transformed, because you can look them up in a table. That's 367 00:39:59,520 --> 00:40:05,890 the kind of knowledge that is involved in doing 18.01. And if you do 18 0 circuit theory, 368 00:40:05,890 --> 00:40:11,020 6 0 circuit theory or 6 0 Maxwell's equations, this is the same thing. 369 00:40:11,020 --> 00:40:14,020 You have to ask questions of this sort, 370 00:40:14,030 --> 00:40:17,430 about the nature of the knowledge involved, and question number one is always, what kind 371 00:40:17,430 --> 00:40:23,440 of knowledge is involved? Is it Kirchhoff's laws, Maxwell's equations, what is it? 372 00:40:23,440 --> 00:40:27,440 The next question is, how is the knowledge represented? And our answers here are, well 373 00:40:27,440 --> 00:40:32,700 all this stuff, ultimately was represented in Lisp S-expressions. Some of the 374 00:40:32,700 --> 00:40:37,040 knowledge was recorded in a table of S-expressions to show what transformations 375 00:40:37,040 --> 00:40:43,930 there are. There was a similar table of integrals. Knowledge about goal trees was embedded in 376 00:40:43,930 --> 00:40:49,970 the procedure, so it was procedurally represented. And so for each of the categories of knowledge, 377 00:40:49,970 --> 00:40:56,550 there's a way it gets represented. How is it used? Straightforward, transformations 378 00:40:56,550 --> 00:41:02,390 are used to make the problem simpler. The table is used to trim off and to serve as 379 00:41:02,390 --> 00:41:06,530 the bottom of the tree.
Those are the ways in which the knowledge is used. 380 00:41:06,530 --> 00:41:13,440 And then there's the question of course of, how much knowledge is required. Something 381 00:41:13,440 --> 00:41:19,180 that's useful to know if it's late at night, you have 2 finals the next day, and you're 382 00:41:19,180 --> 00:41:24,640 not sure which course you should study. So how much knowledge might you suppose was actually 383 00:41:24,640 --> 00:41:29,960 in this program? I've shown you a glimpse of the kind of knowledge that's involved in 384 00:41:29,960 --> 00:41:34,710 the program. I've answered a little bit of question 5, what exactly. But how much knowledge 385 00:41:34,710 --> 00:41:36,910 was involved. You might be surprised by the answer. 386 00:41:36,910 --> 00:41:44,540 First of all, the table of integrals. I've listed only 3 things there. There are lots 387 00:41:44,540 --> 00:41:50,770 of other things you can think of, like integral of e to the x is e to the x. But in the end, 388 00:41:50,770 --> 00:41:58,310 what Slagle found is, a table of only 26 elements was enough to solve all of these problems. 389 00:41:58,310 --> 00:42:11,320 How about the transformations here, the safe ones, about 12. How about the heuristic ones, 390 00:42:11,320 --> 00:42:17,490 about 12. So just a few bits and pieces of knowledge, here and there, are sufficient 391 00:42:17,490 --> 00:42:22,370 to do everything you need to do, in order to do the integration problems on a calculus 392 00:42:22,370 --> 00:42:25,520 final. That was a surprise. 393 00:42:25,520 --> 00:42:31,140 Another surprise of a similar kind, also about knowledge, is that the relationship between 394 00:42:31,140 --> 00:42:40,400 the method to be used, and the characteristics of the problem, was almost a diagonal table.
395 00:42:40,400 --> 00:42:46,850 That means that you could, in this domain, make the right transformation almost all the 396 00:42:46,850 --> 00:42:52,300 time if you're a little bit smart, and never back up. That was an observation made by Joel 397 00:42:52,300 --> 00:42:58,210 Moses, who became subsequently our provost here at MIT for a while. And he wrote a program 398 00:42:58,210 --> 00:43:05,760 that could solve anything. It would beat the most dedicated mathematicians at integration. 399 00:43:05,760 --> 00:43:08,610 And its descendants are in MATLAB today. 400 00:43:08,610 --> 00:43:12,570 But this is how it all works. And now you can write one of these things yourself. Partly 401 00:43:12,570 --> 00:43:18,300 because you now have this catechism. This is the kind of stuff you should ask any time 402 00:43:18,300 --> 00:43:24,320 you're dealing with a new domain. It will make you smarter. And this is of course, meta 403 00:43:24,320 --> 00:43:30,970 knowledge, this is knowledge about knowledge. So this tired aphorism isn't quite what we 404 00:43:30,970 --> 00:43:43,810 are going to complete ourselves with. We're going to say that knowledge about knowledge 405 00:43:43,810 --> 00:43:45,850 is where the real power is. 406 00:43:45,850 --> 00:43:52,790 Now there's one final thing that this program does for us. It tells us something about our 407 00:43:52,790 --> 00:43:58,280 appreciation of what it means to be intelligent. You know that in the beginning of this hour, 408 00:43:58,280 --> 00:44:04,260 I asked you to think about whether a program that could do symbolic integration would be, 409 00:44:04,260 --> 00:44:12,480 in any way, or should be considered to any degree, intelligent. And I'm imagining that 410 00:44:12,480 --> 00:44:17,820 even in these days of MATLAB, and whatnot, many of you said well, yes, I learned how 411 00:44:17,820 --> 00:44:23,190 to do that at MIT, or late in high school, so it must be smart.
412 00:44:23,190 --> 00:44:29,410 But now that we've completed this discussion, I also expect that your feeling of intelligence 413 00:44:29,410 --> 00:44:34,740 in this program is somewhat diminished. Because what happens is that, when we understand how 414 00:44:34,740 --> 00:44:39,890 something works, its intelligence seems to vanish. You've seen this in your friends, 415 00:44:39,890 --> 00:44:46,740 right? They solve some problem, they seem super smart. Then they tell you how they did 416 00:44:46,740 --> 00:44:50,830 it, and they don't seem so smart anymore. 417 00:44:50,830 --> 00:44:59,860 So let's conclude our discussion today with a little story. A long time ago I was talking 418 00:44:59,860 --> 00:45:08,030 with a student who said, computers cannot be intelligent. And I said, OK, maybe you're 419 00:45:08,030 --> 00:45:12,100 right, but let me show you this program. So I showed him the integration program, working 420 00:45:12,100 --> 00:45:18,580 on problems like this. And after I showed him a couple of those examples, he says, well, 421 00:45:18,580 --> 00:45:23,460 all right, I guess maybe they can be intelligent. I'm learning how to do that, and it's not 422 00:45:23,460 --> 00:45:30,830 always easy. Then I made a fatal mistake. I said let me show you how it works, and we 423 00:45:30,830 --> 00:45:35,280 spent an hour going through it like this. And at the end of that time, he turned to 424 00:45:35,280 --> 00:45:42,050 me and said, I take it back, it's not intelligent after all. It does integration the same way 425 00:45:42,050 --> 00:45:44,840 I do.