PROFESSOR PATRICK WINSTON: You know, some of you, for instance-- I don't know, Sonya, Krishna, Shoshana-- some of you I can count on being here every time. Some of you show up once in a while. The ones of you who show up once in a while happen to be very lucky if you picked today, because what we're going to do today is I'm going to tell you stuff that might make a big difference in your whole life. Because I'm going to tell you how you can make yourself smarter. No kidding. And I'm also going to tell you how you can package your ideas so you'll be the one that's picked instead of some other slug. So that's what we're going to do today. It's the most important lecture of the semester. The sleep lecture is only the second most important. This is the most important.

Now the vehicle that's going to get us there is a discussion about how it's possible to learn in a way that is a little reminiscent of what we talked about last time. Because last time we learned something very definite from a small number of examples. This takes it one step further and shows how it's possible to learn in a human-like way from a single example, in one shot. So it's extremely different, very different from everything you've seen before-- everything that involves learning from thousands of trials and gazillions of examples, and only learning a little tiny bit, if anything, from each of them. This is going to learn something definite from every example.

So here's the classroom example. What's this? It's an arch. I know the architects are complaining that it's not an arch in architecture land. It's a post-and-lintel construction. But for us today it's going to be an arch. Now if you were from Mars and didn't know what an arch was, I might present this to you, and you'd get a general idea of some things that might be factors, but you'd have no idea what's really important. So then I would say, that's not an arch. And you would learn something very definite from that.
And then I would shove these together and put this back on, and I would say, that's not an arch either. And you'd learn something very definite from that. And then I could paint the top one blue, and you'd learn something very definite from that. And how can that happen? That's the question. How can that happen in detail, and what might it mean for human learning and how you can make yourself smarter? And that's where we're going to go. All right? So how can we make a program that's as smart as a Martian about learning things like that?

Well, if you were writing that program, surely the first thing you would do is try to get off the picture as quickly as possible and into symbol land, where things are clearer about what the important parts are. So you'd be presented with an initial example that might look like this. We'll call that an example. And it's more than just an example. It's the initial model. That's the starting point.

And now we're going to couple that with something that's not actually an arch but looks a whole lot like one, at least on the descriptive level to which we're about to go. So here's something that's not an arch, but its description doesn't differ from that of an arch very much. In fact, if we were to draw this out in a kind of network, we would have a description that looks like this, and these relations would be support relations. And this would be drawn out like so. And the only difference-- the only difference would be that those support relations that we had in the initial model-- the example-- have disappeared down out here in this configuration. But since it's not very different from the model, we're going to call this a near miss.

And now, you see, we've abstracted away from all the details that don't matter to us. Last time we talked about a good representation having certain qualities-- qualities like making the right things explicit. Well, this makes the structure explicit, and it suppresses information about blemishes on the surface.
We don't care much about how tall the objects are. We don't think it matters what they're made of. So this is a representation that satisfies the first of the criteria from last time. It makes the right things explicit. And by making the right things explicit, it's exposing some constraint here with respect to what it takes to be an arch. And we see that if those support relations are missing, it's not an arch. So we ought to be able to learn something from that.

What we're going to do is put these two things together. We're going to describe the difference between the two. And since there's only one difference-- one kind of difference, with two manifestations, the two disappearing support relations-- we're going to conclude that those support relations are important. And we're going to turn them red, because they're so important. And we're going to change the name from "support" to "must support." So this is our new model. This is an evolving model that now is decorated with information about what's important. So if you're going to match something against this model, it must be the case that those support relations are there. If they're not there, it's not an arch. All right? So we've learned something definite from a single example. This is not 10,000 trials. This is a teacher presenting something to the student, and the student learning something immediately, in one step, about what's important in an arch.

So let's do it again. That was so much fun. Let's do this one. Same as before, except that now when we describe this thing, there are some additional relations-- these relations, and those are touch relations. So now when we compare that-- is that an arch? No. It's a near miss.
When we compare that near miss with our evolving model, we see immediately that once again there's exactly one difference, with two manifestations: the touch relations. So we can immediately conclude that these touch relations are interfering with our belief that this could be an arch. So what do we do with that? We put those together again and we build ourselves a new model. It's much like the old model. It still has the imperatives up here. We have to have the support relations. But now down here-- and we draw "not" signs through there-- these are must-not-touch relations. So now you can't match against that model if those two side supports are touching each other. So in just two steps, we've learned two important things about what has to be in place in order for this thing to be construed to be an arch.

So our Martian is making great progress. But our Martian isn't through, because there are some more things we might want it to know about the nature of arches. For example, we might present it with this one. Well, that looks just like our initial example. It's an example, just like our initial example. But this time the top has been painted red. And I'm still saying that that's an arch. So once again, there's only one difference, and that difference is that in the description of this object, we have the additional information that the color of the top is red. And we've been carrying along, without saying so, that the color of the top in the evolving model is white.

So now we know that the top doesn't have to be white. It can be either red or white. So we'll put those two together and we'll get a new model. And that new model this time once again will have three parts. It will have the relations in imperative form that we've been carrying along now-- the must-support and the must-not-touch-- but now we're going to turn that color relation itself into an imperative. And we're going to say that the top has to be either red or white.
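To make the bookkeeping concrete, here is a minimal Python sketch of this near-miss procedure. It is not Winston's actual program: the triple format, the status markers, and the `refine` function are all invented for illustration.

```python
# A minimal sketch, assuming descriptions are dicts mapping relation
# triples to a status of "note" (merely observed), "must", or "must-not".
# All names and the triple format are illustrative assumptions.

def refine(model, sample, is_example):
    """One learning step: compare the evolving model with one description.

    A near miss specializes the model (require-link / forbid-link);
    an example generalizes it (here, by admitting new attribute values).
    """
    new = dict(model)
    missing = [t for t, s in model.items() if t not in sample and s == "note"]
    extra = [t for t in sample if t not in model]
    if not is_example:
        for t in missing:
            new[t] = "must"        # require-link: the missing relation was essential
        for t in extra:
            new[t] = "must-not"    # forbid-link: the extra relation spoiled it
    else:
        for t in extra:            # e.g. ("top", "color", "red")
            new[t] = "note"        # extend-set style: the new value is OK too
    return new

model = {
    ("left-post", "supports", "top"): "note",
    ("right-post", "supports", "top"): "note",
    ("top", "color", "white"): "note",
}

# Near miss: shoved together, nothing supports the top any more.
model = refine(model, {("top", "color", "white"): "note"}, is_example=False)
assert model[("left-post", "supports", "top")] == "must"

# Example: the same arch with a red top extends the acceptable colors.
model = refine(model, {**model, ("top", "color", "red"): "note"},
               is_example=True)
assert model[("top", "color", "red")] == "note"
```

The point the sketch makes is the one-shot property: each call to `refine` changes the model in a definite way, because a single difference licenses a single conclusion.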
So now, once again, in one step we've learned something definite about archness. Two more steps. Suppose now we present it with this example. It's an example. And this time there's going to be a little paint added here as well. This time we're going to have the top painted blue, like so. So the description will be like so. And now we have to somehow put that together with our evolving model to make a new model. And there are some choices here. And our choice depends somewhat on the nature of the world that we're working in. So suppose we're working in flag world. There are only three colors-- red, white, and blue. Now we've seen them all. If we've seen them all, then what we're going to do is say that the evolving model is adjusted yet again, like so. Oh-- but those are imperatives still. Let me carry that along. This time, this guy-- the color relation-- goes out here to anything at all. So we could have just not drawn it at all, but then we would have lost track of the fact that we've actually learned that anything can be there. So we're going to retain the relation, but have it point to the "anything goes" marker.

Well, we're making great progress, and I said there's just one more thing to go. So let me compress that into this area here. What I'm going to add this time is I'm going to say that the example is like everything you've seen before, except that the top is now one of those kinds of child's bricks. So you have a choice, actually, about whether this is an arch or not. But if I say, yeah, it's still an arch, then we'd add a little something to its description. So this description would look like this. Same things that we've seen before in terms of support, but now we'd have a relation that says that this top is a wedge. And over here-- something we've been carrying along but not writing down-- this top is a block.
A brick, I guess, in the language of the day. So if we say that it can be either a wedge or a brick on top, what do we do with that? Once again, it depends on the nature of the representation. But suppose we have a representation that has a hierarchy of parts, so bricks and wedges are both children's blocks, both toys. Then we can think of drawing in a little bit of that hierarchy right here and saying, well, let's see. Immediately above that we've got brick or wedge. And a little bit above that we've got block. And a little bit above that we've got toy. And a little bit above that we eventually get to any physical object.

So what does it do in response to that kind of situation? You have the choice. But what the program I'm speaking of actually did was to make a conservative generalization up here, just to say that it's one of those guys. So once again it's learned something definite. Let me see. Let me count the steps. One, two, three, four, five. And I just learned four things. So for the generalization of a color, it took two steps to get all the way up to "don't care."

So note how it contrasts with anything you've seen in a neural net, or anything you will see downstream in some of the other learning techniques that we'll be talking about, which involve using thousands of samples to learn whatever it is that is intended to be learned.
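Here, hedged the same way, is a small sketch of that conservative climb up the hierarchy. The `PARENT` table is an assumed fragment of the part hierarchy, and the function names are made up for illustration.

```python
# Hypothetical sketch of the climb-tree heuristic: when two examples
# disagree on a part's class, generalize to the lowest common ancestor.

PARENT = {                      # an assumed fragment of the hierarchy
    "brick": "block", "wedge": "block",
    "block": "toy", "toy": "physical-object",
}

def ancestors(cls):
    """Chain from a class up to the root: brick -> block -> toy -> ..."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def climb_tree(cls_a, cls_b):
    """Lowest class that covers both: the conservative generalization."""
    ups = ancestors(cls_b)
    return next(c for c in ancestors(cls_a) if c in ups)

print(climb_tree("brick", "wedge"))   # -> "block", not "physical-object"
```

Climbing just one step, to "block" rather than all the way to "physical object", is what keeps the generalization conservative.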
Let me show you another example of how these heuristics can be put to work. So there are two sets of drawings. We have the upper set and the lower set. And your task, you smart humans working in vast parallelism-- your task is to give me a description of the top trains that distinguishes and separates them from the trains on the bottom. You got it? Nobody's got it? Well, let me try one on you. The top trains all have a short car with a closed top. So how is it possible that a computer could have figured that out?

It turns out that it figured it out with much the same apparatus that I've shown you here in connection with the arches, just deployed in a somewhat different manner. In the arch case, the examples are presented one at a time by a teacher who's eager for the student to learn. In this case, the examples are presented all at once, and the machine is expected to figure out a description that separates the two groups. And here's how it works.

What you do is you start with one of them. But you have a lot of them. You have some examples-- we'll call the examples on top the "positive examples" and the examples on the bottom the "negative examples." So the first thing that you do is you pick one of the positive examples to work with. Anybody got any good guesses about what we're going to call that? Yeah, you do. We're going to call that the seed. It's highly reminiscent of what we did last time, when we were doing phonology, but now at a much different level. We're going to pick one of those guys to be the seed, and then we're going to take these heuristics and we're going to search for one that loosens this description so that it covers more of the positives. You see, if you have a seed that is exactly a description of a particular thing, and you insist that everything be just like that, then nothing will match except itself. But you can use these heuristics to expand the coverage of the description-- to loosen it so that it covers more of the positives.

So in your first step you might cover, for example, that group of objects. Too bad for your side, you've also, in that particular case, included a negative example in your description. But perhaps in the next step beyond that, you'll get to the point where you've eliminated all of those negative examples and zeroed in on all the positive examples. So how might a program be constructed that would do that sort of thing? Well, think about the choices.
The first choice that you have is to pick a positive example to be the seed. And once you've picked a particular example to be the seed, then you can apply heuristics-- all of them that you have-- to make a new description that may cover the data better. It may cover more of the positives and fewer of the negatives than in your previous step. But if you have a lot of heuristics-- and these are a lot of heuristics, because there's a lot of description in that set of trains-- there are lots of possible things that you could do with them, because you could apply them anywhere. So this tree is extremely large.

So what do you do to keep it under control? Well, by now you have answers to questions like that by knee-jerk, right? The branching factor is too big. You want to keep a few solutions going. You have some way of measuring how well you're doing, so you can use a beam search.

This piece here was originally worked out by a friend of mine, now, alas, deceased-- Ryszard Michalski-- when he was at the University of Illinois. And of course, he wasn't interested in toy trains; he was just interested in soybean diseases. And so this exact program was used to build descriptions of soybean diseases. It turned out to be better than the plant pathology books.

We now have two ways of deploying the same heuristics. But my vocabulary is in need of enrichment, because I keep talking about "those" heuristics. And one of the nice things that Michalski did for me a long time ago was to give each of them a name. So here are the names that were developed by Michalski. What's happening here? You're going from an original model to an understanding that some things are essential. So he called this the "require link" heuristic. And here, in the next step, we're forbidding some things from being there. So Michalski called that heuristic the "forbid link" heuristic.
And in the next step, we're saying it can be either red or white. So we have a set of colors and we're extending it-- the "extend set" heuristic. And over here, in this heuristic, going from red-or-white to anything goes, that's essentially forgetting about color altogether. So we're going to call that "drop link," even though, for reasons of keeping track, we don't actually get rid of it; we just have it pointing to the "anything" marker. And finally, in this last step, what we're doing with this tree of categories is climbing up it one step. So he called that the "climb tree" heuristic.

So now we have a vocabulary of things we can do in the learning process, and having that vocabulary gives us power over it, right? Because those are names. We can now say, well, what you need here is the "drop link" heuristic. And what you need over there is the "extend set" heuristic.

So now I want to back up yet another time and say, well, let's see. When we were working with that phonology stuff, all I did was generalize. Are we just generalizing here? No. We're both generalizing and specializing. So when I say that the links over here that are developed in our first step are essential, that's a specialization step. And when I say they cannot be touch relations, that's a specialization step, because we're able to match fewer and fewer things when we say you can't have touch relations. But over here, when I go and say, well, it doesn't have to be white-- it can also be red-- that's a generalization. Now we can match more things. And when I drop the link altogether, that's a generalization. And when I climb the tree, that's a generalization.
And that's why, when I draw this notional picture of what happens when Michalski's program does a tree search to find a solution to the train problem, there are both specialization steps, which shrink the number of things that can be matched, and generalization steps, which make it broader.

So, let's see. We've also got the notion of a near miss. And we've got the notion of an example-- some of these things are examples, some are near misses. And we've got generalization and specialization. Does one go with one or the other, or are they all mixed up in their relationship to each other? Can you generalize and specialize with near misses? What do you think? You think-- you don't think so, [INAUDIBLE]? What do you think?

STUDENT: [INAUDIBLE] specialization.

PROFESSOR PATRICK WINSTON: [INAUDIBLE] lead to specialization. Let's see if that's right. So we've got specialization here, and that's a near miss. We've got specialization here, and that's a near miss. We've got generalization here, and that's an example. And we've got generalization here, and that's an example. And we've got generalization here, and that's an example. So [INAUDIBLE] has got that one nailed. The examples always generalize, and the near misses always specialize. So we've got apparatus in place that allows us both to expand what we can match and to shrink what we can match.
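As a notional sketch of that batch procedure, here is a toy beam search in the same spirit. Everything in it is an assumption for illustration: the feature sets, the subset-matching rule, and the single drop-a-requirement move stand in for Michalski's much richer descriptions and heuristics.

```python
# A toy sketch of batch covering with beam search, not Michalski's program.

def score(desc, positives, negatives, matches):
    """Covered positives count for you; covered negatives count against you."""
    return (sum(matches(desc, p) for p in positives)
            - sum(matches(desc, n) for n in negatives))

def learn(seed, positives, negatives, successors, matches,
          beam_width=3, steps=4):
    """Grow candidate descriptions from the seed, keeping only the best few."""
    beam = [seed]
    for _ in range(steps):
        candidates = beam + [d2 for d in beam for d2 in successors(d)]
        candidates.sort(key=lambda d: score(d, positives, negatives, matches),
                        reverse=True)
        beam = candidates[:beam_width]   # beam search: prune the enormous tree
    return beam[0]

# Toy usage: a description is a set of required features, a train matches
# when it has them all, and the only move is dropping a requirement
# (a drop-link style generalization). The feature names are invented.
positives = [{"engine", "short-closed-car"}, {"short-closed-car"}]
negatives = [{"engine", "tanker"}, {"engine", "long-open-car"}]

matches = lambda desc, train: desc <= train          # subset test
successors = lambda desc: [desc - {f} for f in desc]  # generalize

best = learn(set(positives[0]), positives, negatives, successors, matches)
print(best)   # -> {'short-closed-car'}: the short closed car separates the sets
```

Keeping only a few survivors each round is exactly the knee-jerk answer to a branching factor that's too big.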
So what has this got to do with anything? Well, which one of these methods is better, by the way? This one-- this one requires a teacher to organize everything. This one can handle it in batch mode. This one is the sort of thing you would need to do with a human, because we don't have much memory. That one is the sort of thing that a computer's good at, because it has lots of memory. So which one's better? Well, it depends on what you're trying to do. If you're trying to build a machine that analyzes the stock market, you might want to go that way. Or soybean diseases, or any one of a variety of practical problems. If you're trying to model people, then maybe this is a way that deserves additional merit.

How do you get all that sorted out? Well, one way to get it all sorted out is to talk in terms of what are sometimes called "felicity conditions." When I talk about felicity conditions, I'm talking about a teacher and a student and the covenants that hold between them. So here's the teacher. That's me. And here's the student. That's you. And the objective of the interaction is to transform an initial state of knowledge into a new state of knowledge, so that the student is smarter and able to make use of that new knowledge to do things that couldn't be done before by the student. So the student over here has a learner. And he has something that uses what is learned. And the teacher over here has a style.

So if any learning is to take place, one side has to know something about the other side. For example, it's helpful if the teacher understands the initial state of the student. And here's one way of thinking about that. You can think of what you know as forming a kind of network. So initially, you don't know anything. But as you learn, you start developing quanta of knowledge. And these quanta of knowledge are all linked together by prerequisite relationships that might indicate how you get from one quantum to another. So maybe you have generalization links, maybe you have specialization links, maybe you have combination links, but you can think of what you know as forming this kind of network. Now your state of knowledge at any particular time can then be viewed as a kind of wavefront in that space.

So if I, the teacher, know where your wavefront is, can I do a better job of teaching you stuff? Sure, for this reason. Suppose you make a mistake, m1, that depends on a quantum q1 way, way behind your wavefront. What do I do if I know that you've made a mistake of that kind? Oh, I just say, oh, you forgot-- you need a semicolon after that kind of statement.
I just remind you of something that you certainly know; you just overlooked it. Right? On the other hand, suppose you make a mistake that depends on a piece of knowledge way out here-- that kind of mistake, m2. What do I say to you then? What do you think, Patrick? What do you think I would say if you made that kind of mistake?

STUDENT: [INAUDIBLE].

PROFESSOR PATRICK WINSTON: No. That's not what I would say [INAUDIBLE].

STUDENT: You'd tell us that we don't know that yet.

PROFESSOR PATRICK WINSTON: I would say something like that-- what [INAUDIBLE] suggested I would say. Oh, don't worry about that. We'll get to it. We're not ready for it yet. So in this case, I remind somebody of something they already know. In this case, I tell them they'll learn about it later. So what do I do with mistake number three? That's the learning moment. That's where I can push the wavefront out. Because everything's in place to learn the stuff at the next radius. So if I know that the student has made a mistake on that wavefront, that's when I say, this is the teaching moment. This is when I explain something. So that's why it's important for the teacher to have a good model of where the student is-- of the student's initial state of knowledge.
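If you like, the wavefront story reduces to a few lines of code. This tiny sketch assumes the quanta form a simple prerequisite table; the quantum names and the three responses are invented for illustration.

```python
# Hypothetical sketch of the wavefront model of a student's knowledge.
# PREREQS and the quantum names are assumptions, not from any real system.

PREREQS = {"q1": [], "q2": ["q1"], "q3": ["q2"], "q4": ["q3"]}

def respond(mistake_quantum, known):
    """Pick a teacher response from where the mistake sits vs the wavefront."""
    if mistake_quantum in known:
        return "remind"            # m1: behind the wavefront, just a reminder
    if all(p in known for p in PREREQS[mistake_quantum]):
        return "teach"             # m3: on the wavefront, the teaching moment
    return "defer"                 # m2: far beyond it, "we'll get to it later"

known = {"q1", "q2"}               # the student's current wavefront
print(respond("q1", known))        # -> remind
print(respond("q3", known))        # -> teach
print(respond("q4", known))        # -> defer
```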
The next thing that's important for the teacher to know is the way that the student learns. Because if the student is a computer, it can handle the stuff in batch. That's one thing. If the student is a third grader who has a limited capacity to store stuff, then that makes a difference in how you teach. You might teach that way to the third grader, and that way-- buried underneath this board-- to a computer. So you need to understand the computational capacity of the learner. And there's also a need to understand the computational capacity of the user box down there, because sometimes you can be taught stuff that you can't actually use.

So by now, most of you have attempted to read that sentence up there, right? And it seems screwy, right? It seems unintelligible, perhaps? It's a garden-path sentence. It makes perfectly good English, but the way you generally read it, it doesn't, because you have a limited buffer in your language processor. What does this mean? You're expecting this to be "to"-- a question. But it's actually a command. Here's the deal. Somebody's got to give the students their grades. Well, we can have their parents do it. Have the grades given to the students by their parents, then. So it's a command. And you garden-path on it, because you have limited buffer space in your language processor. So with parentheses you can understand it. You can learn about it. You can see that it's good English, but you can't generally process that kind of sentence without going back and starting over.

And what about going the other way? Are there covenants that we have to have here that involve the student understanding some things about the teacher? Well, the first thing there is trust. The student has to presume that the teacher is teaching the student correct information-- not lying to the student. I'm gratified that you're all here, because presumably you all think that I'm not trying to screw you by telling you stuff that's a lie.

There's also this sort of thing down here: understanding of the teacher's style. So you might say, well, professor X, all he does is read slides to us in class, so why go? You wouldn't be entirely misadvised. That's an understanding of one kind of style. Or you can say, well, old Winston, he tries to tell us something definite and convey a family of powerful ideas in every class. So maybe it's worth dragging yourself out of bed at 10 o'clock in the morning. Those are style issues, and those are things that the student uses to determine how to match the student's style against that of the instructor.
So that helps us to interpret, or think about, differences in style, so that we can appreciate whether we ought to be learning that way-- where "that way" is the way that's underneath down here, the way you would teach a computer, the way Michalski taught a computer about soybean diseases. We can do it that way, or we can do it this way, with a teacher who deliberately organizes and shapes the learning sequence for the benefit of a student who has a limited processing capability.

Now you're humans, right? So think about what the machine has to do here. The machine-- in order to learn anything definite in each of those steps, the machine has to build a description. So it has to describe the examples to itself. That's unquestioned, right? Because what it's doing is looking at the differences. And it can't look at the differences unless it's got descriptions of things. So if you're like the machine, then you can't learn anything unless you build descriptions-- unless you talk to yourself. And if you talk to yourself, you're building the kind of descriptions that make it possible for you to do the learning.

And you say to me, I'm an MIT student. I want to see the numbers. So let me show you the numbers. And the numbers that I'm going to show you show the virtues of talking to yourself. So here's the experiment. The experiment was done by a friend of mine, Michelene Chi, who always seems to go by the name of Micki Chi. There she is. So here's the deal. The students that she worked with were expected to learn about elementary physics-- 8.01-type stuff. And she took eight subjects, and she took them through a bunch of examples and then gave them an examination. So eight subjects, and they divide into two groups: the bottom half and the top half. The ones who did better than average and the ones who did worse than average.
So then you can say, well, OK, what did that mean? You can ask, how much did they talk to themselves? Well, that was measured by having them talk out loud as they solved the problems on an examination. So we can ask how much self-explanation was done by the smart ones versus the less smart ones. And here are the results. The worst four said about 10 things to themselves. The best four said about 35 things to themselves. That's a pretty dramatic difference.

Here's the data in a more straightforward form. This, by the way, points out that the smart ones scored twice as high as the less smart ones. And when we look at the number of explanations they gave themselves, in two categories, the smart ones said three times as much stuff to themselves as the less smart ones. As you can see, the explanations break down into two groups. Some have to do with monitoring, and not with physics at all. They're things like, oh hell, I'm stuck. Or, I don't know what to do. And the others have to do with physics-- things like, well, maybe I should draw a force diagram. Or, let me write down F = ma, or something like that. That's physics knowledge.

I think it's interesting that the average score differed by a factor of two, and the average talking to oneself differed by a factor of three. Now this isn't quite there, because what's not clear is, if you encourage somebody to talk to themselves, and they talk to themselves more than they would have ordinarily, does that make them score better? All we know is that the ones who talk to themselves more do score better. But anecdotally, talking to some veterans of 6.034, they've started talking to themselves more when they solve problems, and they think that it makes them smarter. Now I would caution you not to do this too much in public, because people can get the wrong idea if you talk to yourself too much. But it does seem-- it does, in fact, seem to help.
Now, what I did last time is I told you how to be a good scientist. What I'm telling you now is how to make yourself smarter. And I want to conclude this hour by telling you how you can package your ideas so that they have greater impact. So I guess I could have said, how to make yourself more famous, but I've limited myself to saying how to package your ideas better. And the reason you want to package your ideas better is that if you package your ideas better than the next slug, then you're going to get the faculty position and they're not. If you say to me, I'm going to be an entrepreneur-- same thing. You're going to get the venture capitalist's money and the next slug won't, if you package your ideas better.

So this little piece of work on the arch business got a whole lot more famous than I ever expected. I did it when I was young and stupid, and didn't have any idea what qualities might emerge from a piece of work that would make it well known. I only figured it out much later. But in retrospect, it has five qualities that you can think about when you're deciding whether your packaging of your idea is in a form that will lead to that idea becoming well known. And since there are five of them, it's convenient to put them all on the points of a star, like so.

So, quality number one. I've made these all into s-words, just to make them easier to remember. Quality number one is that there's some kind of symbol associated with the work-- some kind of visual handle that people will use to remember your idea. So what's the visual symbol here? Well, that's astonishingly easy to figure out, right? That's the arch. For years, without my intending it, this was called arch learning. So you need a symbol. Then you also need a slogan.
That's a kind of verbal handle. It doesn't explain the idea, but it's enough of a handle to, as Minsky would say, put you back in the mental state you were in when you understood the idea in the first place. So what is the slogan for this work? Anybody have any ideas? Pretty obvious. What's essential to this process working? The ability to present an example that's very similar [INAUDIBLE], one that almost constitutes a model but isn't one of those.

STUDENT: [INAUDIBLE].

PROFESSOR PATRICK WINSTON: So it's a near miss.

The next thing you need, if your work is going to become well known, is a surprise. What's the surprise with this stuff? Well, the surprise-- everything that had been done in artificial intelligence having to do with learning before this time was precursors to neural nets. Thousands of examples to learn anything. So the big surprise was that it was possible for a machine to learn something definite from each of the examples. That now goes by the name of one-shot learning. That was the surprise: that a computer could learn something definite from a single example.

So let's see. We've almost completed our star. But there are more points on it. So this point is the salient. What's a salient-- what's a salient idea? Jose, do you know what a salient idea is? He's too shy to tell me. What's a salient idea? Ah, who said important? Wrong answer, but very good. You're not shy. So what does it really mean? Yes.

STUDENT: Relative to what somebody's already thinking about?

PROFESSOR PATRICK WINSTON: Relative to what somebody's thinking about. Not quite. If you have a-- if you're an expert in-- yes?

STUDENT: [INAUDIBLE].

PROFESSOR PATRICK WINSTON: Really close. We're getting closer. [INAUDIBLE]. Yes?
791 00:42:01,745 --> 00:42:04,467 STUDENT: Maybe an idea that wasn't obviously apparent, but 792 00:42:04,467 --> 00:42:08,740 becomes apparent gradually as somebody starts to understand? 793 00:42:08,740 --> 00:42:09,660 PROFESSOR PATRICK WINSTON: We're zeroing-- we're circling 794 00:42:09,660 --> 00:42:12,040 the wagons here and zeroing in on it. 795 00:42:12,040 --> 00:42:12,978 Yes? 796 00:42:12,978 --> 00:42:15,730 STUDENT: If I'm preempting what you're about to say, it 797 00:42:15,730 --> 00:42:19,455 has sort of a doorway of how you can understand the idea. 798 00:42:19,455 --> 00:42:19,954 PROFESSOR PATRICK WINSTON: It's what? 799 00:42:19,954 --> 00:42:20,453 Sorry. 800 00:42:20,453 --> 00:42:22,948 STUDENT: It's sort of like a doorway of how you 801 00:42:22,948 --> 00:42:26,940 can grasp the idea. 802 00:42:26,940 --> 00:42:29,250 PROFESSOR PATRICK WINSTON: That's sort of it, too, but if 803 00:42:29,250 --> 00:42:31,700 you study military history, what's the salient on a fort? 804 00:42:36,020 --> 00:42:38,090 Well, this is a good word to have in your vocabulary 805 00:42:38,090 --> 00:42:41,850 because it sort of means all of those things, but what it 806 00:42:41,850 --> 00:42:44,850 really means is something that sticks out. 807 00:42:44,850 --> 00:42:48,660 So on a fort, if this were a fort, these would all be 808 00:42:48,660 --> 00:42:51,380 salients because they stick out. 809 00:42:51,380 --> 00:42:54,020 So the salient idea is usually important 810 00:42:54,020 --> 00:42:55,870 because it sticks out. 811 00:42:55,870 --> 00:42:57,780 But it's not-- the meaning is not "important," the meaning 812 00:42:57,780 --> 00:42:59,300 is "stick out." 813 00:42:59,300 --> 00:43:02,180 So a piece of work becomes more famous if it has 814 00:43:02,180 --> 00:43:04,130 something that sticks out. 815 00:43:04,130 --> 00:43:04,930 It's interesting. 816 00:43:04,930 --> 00:43:06,840 There are theses that have been written at MIT that have 817 00:43:06,840 --> 00:43:08,090 too many good ideas. 818 00:43:10,190 --> 00:43:12,160 And how can you have too many good ideas? 819 00:43:12,160 --> 00:43:15,170 Well, you can have too many good ideas if no one idea 820 00:43:15,170 --> 00:43:18,120 rises above the rest and becomes the idea that people think about 821 00:43:18,120 --> 00:43:20,310 when they think about you. 822 00:43:20,310 --> 00:43:22,130 We have people on the faculty who would have been more 823 00:43:22,130 --> 00:43:24,280 famous if their theses had fewer ideas. 824 00:43:24,280 --> 00:43:26,530 It's amazing. 825 00:43:26,530 --> 00:43:29,670 So this piece of work did have a salient. 826 00:43:29,670 --> 00:43:33,390 And the salient idea was that you could get one-shot 827 00:43:33,390 --> 00:43:38,920 learning via the use of near misses. 828 00:43:38,920 --> 00:43:41,290 That was the salient idea. 829 00:43:41,290 --> 00:43:44,660 The fifth thing, ah. 830 00:43:44,660 --> 00:43:47,600 I'll talk more about this in my "How to 831 00:43:47,600 --> 00:43:48,960 Speak" lecture in January. 832 00:43:48,960 --> 00:43:51,950 The fifth thing I like people to try to incorporate into 833 00:43:51,950 --> 00:43:54,020 their presentations is a story. 834 00:43:57,000 --> 00:44:00,290 Because we humans somehow love stories. 835 00:44:00,290 --> 00:44:01,610 We love people to tell us stories. 836 00:44:01,610 --> 00:44:03,260 We love things to be packaged in stories.
837 00:44:03,260 --> 00:44:06,920 And believe me, I think all of education is essentially about 838 00:44:06,920 --> 00:44:09,850 storytelling and story understanding. 839 00:44:09,850 --> 00:44:12,480 So if you want your idea to be sold to the venture 840 00:44:12,480 --> 00:44:16,850 capitalist, if you want to get the faculty job, if you want 841 00:44:16,850 --> 00:44:19,940 to get your book sold to a publisher, if you want to sell 842 00:44:19,940 --> 00:44:23,720 something to a customer, ask yourself if your presentation 843 00:44:23,720 --> 00:44:25,300 has these qualities in it. 844 00:44:25,300 --> 00:44:28,230 And if it has all of those things, it's a lot more likely 845 00:44:28,230 --> 00:44:30,120 to be effective than if it doesn't. 846 00:44:30,120 --> 00:44:31,550 And you'll end up being famous. 847 00:44:31,550 --> 00:44:35,030 Now you say to me, well, being famous-- that sounds like a 848 00:44:35,030 --> 00:44:38,020 Sloan School type of concept. 849 00:44:38,020 --> 00:44:39,990 Isn't it immoral to want to be famous? 850 00:44:42,570 --> 00:44:45,270 Maybe that's a decision you can make. 851 00:44:45,270 --> 00:44:50,880 But whenever I think about the question, I somehow think of 852 00:44:50,880 --> 00:44:52,730 the idea that your ideas are like your children. 853 00:44:52,730 --> 00:44:56,050 You want to be sure that they have the best life possible. 854 00:44:56,050 --> 00:44:59,390 So if they're not packaged well, they won't. 855 00:44:59,390 --> 00:45:06,670 I'm also reminded of an evening I spent at a soiree 856 00:45:06,670 --> 00:45:10,400 with Julia Child. 857 00:45:10,400 --> 00:45:14,600 Julia, and there's me. 858 00:45:14,600 --> 00:45:16,770 And I have no idea how come I got to sit 859 00:45:16,770 --> 00:45:17,700 next to Julia Child. 860 00:45:17,700 --> 00:45:20,600 I think they thought I was one of the rich Winstons. 861 00:45:20,600 --> 00:45:23,250 The Winston Flowers, or the Harry Winston diamonds, or 862 00:45:23,250 --> 00:45:23,960 something like that. 863 00:45:23,960 --> 00:45:26,700 There I was, sitting next to Julia Child. 864 00:45:26,700 --> 00:45:27,940 And the interesting thing-- 865 00:45:27,940 --> 00:45:31,379 by the way, did you notice I'm now telling a story? 866 00:45:31,379 --> 00:45:36,120 The interesting thing about this experience was that there 867 00:45:36,120 --> 00:45:42,760 was a constant flow of people-- 868 00:45:42,760 --> 00:45:44,410 happened to be all women-- 869 00:45:44,410 --> 00:45:49,430 people going past Ms. Child saying how wonderful she was 870 00:45:49,430 --> 00:45:52,840 to have made such an enormous change in their lives. 871 00:45:52,840 --> 00:45:53,340 Must have been 10 of them. 872 00:45:53,340 --> 00:45:54,190 It was amazing. 873 00:45:54,190 --> 00:45:56,250 Just a steady flow. 874 00:45:56,250 --> 00:46:00,730 So eventually I leaned over to her and I said, Ms. Child, is 875 00:46:00,730 --> 00:46:03,130 it fun to be famous? 876 00:46:03,130 --> 00:46:05,930 And she thought about it a second and said, 877 00:46:05,930 --> 00:46:08,360 you get used to it. 878 00:46:08,360 --> 00:46:11,380 And that had a profound effect on me, because you always say, 879 00:46:11,380 --> 00:46:13,810 well, what's the opposite like? 880 00:46:13,810 --> 00:46:15,330 Is it fun to be ignored? 881 00:46:15,330 --> 00:46:20,950 And the answer is, no, it's not much fun to be ignored.
882 00:46:20,950 --> 00:46:23,990 So yeah, it's something you can get used to, but you can 883 00:46:23,990 --> 00:46:27,180 never get used to having your stuff ignored, especially if 884 00:46:27,180 --> 00:46:28,610 it's good stuff. 885 00:46:28,610 --> 00:46:30,270 So that's why I commend to you this business 886 00:46:30,270 --> 00:46:32,160 about packaging ideas. 887 00:46:32,160 --> 00:46:34,770 And now you see that 6.034 is not just about AI. 888 00:46:34,770 --> 00:46:36,140 It's about how to do good science. 889 00:46:36,140 --> 00:46:38,220 It's about how to make yourself smarter, and how to make 890 00:46:38,220 --> 00:46:39,470 yourself more famous.