1 00:00:09,670 --> 00:00:09,717 PROF. 2 00:00:09,717 --> 00:00:12,460 PATRICK WINSTON: Well that's the Kodo Drummers. 3 00:00:12,460 --> 00:00:17,290 They're a group of about 30 or 40 Japanese people who live in 4 00:00:17,290 --> 00:00:20,880 a village on some island off the coast of Japan, and 5 00:00:20,880 --> 00:00:24,690 preserve traditional Japanese music. 6 00:00:24,690 --> 00:00:26,830 It's an unusual semi communal group. 7 00:00:26,830 --> 00:00:33,500 They generally run about 10 kilometers before breakfast, 8 00:00:33,500 --> 00:00:36,730 which is served at 5:00 AM. 9 00:00:36,730 --> 00:00:38,730 Strange group. 10 00:00:38,730 --> 00:00:41,460 Wouldn't miss a concert for the world, although they, 11 00:00:41,460 --> 00:00:44,130 alas, don't seem to be coming down to the 12 00:00:44,130 --> 00:00:46,970 Boston area very soon. 13 00:00:46,970 --> 00:00:49,130 If you go to a concert from the Kodo 14 00:00:49,130 --> 00:00:51,480 Drummers--and you should-- 15 00:00:51,480 --> 00:00:55,880 and if you're no longer young, you'll want to bring earplugs. 16 00:00:55,880 --> 00:01:03,490 Because, as we humans get older the dynamic range 17 00:01:03,490 --> 00:01:07,580 control in our inner ear tends to be less effective. 18 00:01:07,580 --> 00:01:12,900 So that's why a person of my age might find some piece of 19 00:01:12,900 --> 00:01:14,960 music excruciatingly loud, whereas you'll 20 00:01:14,960 --> 00:01:15,990 think it's just fine. 21 00:01:15,990 --> 00:01:18,200 Because you have better automatic gain control. 22 00:01:18,200 --> 00:01:22,470 Just like in any kind of communication device there's a 23 00:01:22,470 --> 00:01:25,611 control on how intense the sound gets. 24 00:01:25,611 --> 00:01:30,090 Ah, but I go off on a sidebar. 25 00:01:30,090 --> 00:01:34,310 Many of you have looked at me in astonishment 26 00:01:34,310 --> 00:01:37,622 as I drink my coffee. 
27 00:01:37,622 --> 00:01:39,645 And you have undoubtedly been saying to yourself, you 28 00:01:39,645 --> 00:01:42,960 know, Winston doesn't look like a professional athlete, 29 00:01:42,960 --> 00:01:47,979 but he seemed to have no trouble drinking his coffee. 30 00:01:47,979 --> 00:01:51,120 So today's material is going to be pretty easy. 31 00:01:51,120 --> 00:01:54,080 So I want to give you the side problem of thinking about how 32 00:01:54,080 --> 00:01:58,830 it's possible for somebody to do that. 33 00:01:58,830 --> 00:01:59,509 How is it possible? 34 00:01:59,509 --> 00:02:03,140 How would you make a computer program that could reach out 35 00:02:03,140 --> 00:02:07,850 and drink a cup of coffee, if it wanted a cup of coffee? 36 00:02:07,850 --> 00:02:09,370 So that's one puzzle I'd like you to work on. 37 00:02:09,370 --> 00:02:12,030 There's another puzzle, too. 38 00:02:12,030 --> 00:02:15,660 And that puzzle concerns diet drinks. 39 00:02:15,660 --> 00:02:18,070 This is a so-called Diet Coke. 40 00:02:22,900 --> 00:02:25,920 Yeah, it's ripe. 41 00:02:25,920 --> 00:02:33,120 If you take a Diet Coke and ask yourself, what would a dog 42 00:02:33,120 --> 00:02:36,900 think a Diet Coke is for? 43 00:02:36,900 --> 00:02:40,730 That's another puzzle that you can work on while we go 44 00:02:40,730 --> 00:02:42,800 through the material of the day. 45 00:02:42,800 --> 00:02:45,150 So this is our first lecture on learning, and I want to 46 00:02:45,150 --> 00:02:47,790 spend a minute or two in the beginning talking about the 47 00:02:47,790 --> 00:02:48,990 lay of the land. 48 00:02:48,990 --> 00:02:52,190 And then we'll race through some material on nearest 49 00:02:52,190 --> 00:02:52,910 neighbor learning. 50 00:02:52,910 --> 00:02:56,850 And then we'll finish up with the advertised 51 00:02:56,850 --> 00:02:59,010 discussion of sleep. 
52 00:02:59,010 --> 00:03:02,820 Because I know many of you think that because you're MIT 53 00:03:02,820 --> 00:03:04,370 students you're pretty tough, and you don't need 54 00:03:04,370 --> 00:03:05,855 to sleep and stuff. 55 00:03:05,855 --> 00:03:09,950 And we need to address that question before it's too late 56 00:03:09,950 --> 00:03:13,320 in the semester to get back on track. 57 00:03:13,320 --> 00:03:13,730 All right. 58 00:03:13,730 --> 00:03:16,720 So here's the story. 59 00:03:16,720 --> 00:03:18,500 Now the way we're going to look at learning is 60 00:03:18,500 --> 00:03:20,790 there are two kinds. 61 00:03:20,790 --> 00:03:24,780 There's this kind, and there's that kind. 62 00:03:24,780 --> 00:03:27,640 And we're going to talk a little bit about both kinds. 63 00:03:27,640 --> 00:03:32,340 The kind on the right is learning based on observations 64 00:03:32,340 --> 00:03:33,590 of regularity. 65 00:03:39,750 --> 00:03:44,190 And computers are particularly good at this stuff. 66 00:03:44,190 --> 00:03:49,320 And amongst the things that we'll talk about in connection 67 00:03:49,320 --> 00:03:52,130 with regularity based learning are today's topic, which is 68 00:03:52,130 --> 00:03:53,380 nearest neighbors. 69 00:03:59,110 --> 00:04:03,520 Then a little bit downstream we'll talk about neural nets. 70 00:04:08,250 --> 00:04:11,340 And then somewhere near the end of the segment, we'll talk 71 00:04:11,340 --> 00:04:12,900 about boosting. 72 00:04:17,690 --> 00:04:20,209 And these ideas come from all over the place. 73 00:04:20,209 --> 00:04:23,590 In particular, the stuff we're talking about today, nearest 74 00:04:23,590 --> 00:04:27,010 neighbors, is the stuff of which the field of pattern 75 00:04:27,010 --> 00:04:28,260 recognition-- 76 00:04:34,900 --> 00:04:37,690 it's the stuff of which pattern recognition journals 77 00:04:37,690 --> 00:04:38,909 are filled. 78 00:04:38,909 --> 00:04:40,450 This stuff has been around a long time. 
79 00:04:40,450 --> 00:04:43,970 Does that mean it's not good? 80 00:04:43,970 --> 00:04:45,850 I hope not, because that would mean that everything you 81 00:04:45,850 --> 00:04:50,060 learned in 18.01 is not good, because the same course was 82 00:04:50,060 --> 00:04:52,980 taught in 1910. 83 00:04:52,980 --> 00:04:55,950 So it has been around a while, but it's extremely useful. 84 00:04:55,950 --> 00:04:57,980 And it's the first thing to try when you have a learning 85 00:04:57,980 --> 00:05:00,570 problem, because it's the simplest thing. 86 00:05:00,570 --> 00:05:03,200 And you always want to try the simplest thing before you try 87 00:05:03,200 --> 00:05:07,480 something more complex that you will be less likely to 88 00:05:07,480 --> 00:05:09,280 understand. 89 00:05:09,280 --> 00:05:11,360 So that's nearest neighbors and pattern recognition. 90 00:05:11,360 --> 00:05:14,060 And the custodians of knowledge about neural nets, 91 00:05:14,060 --> 00:05:16,090 well this is sort of an attempt to mimic biology. 92 00:05:21,930 --> 00:05:24,370 And I'll cast a lot of calumny on that when we get down there 93 00:05:24,370 --> 00:05:26,000 to talk about it. 94 00:05:26,000 --> 00:05:29,000 And finally, this is the gift of the theoreticians. 95 00:05:31,610 --> 00:05:35,550 So we in AI have invented some stuff, we've borrowed some 96 00:05:35,550 --> 00:05:38,400 stuff, we've stolen some stuff, we've championed some 97 00:05:38,400 --> 00:05:40,520 stuff, and we've improved some stuff. 98 00:05:40,520 --> 00:05:43,210 That's why our discussion of learning will reach around all 99 00:05:43,210 --> 00:05:46,030 of these topics. 100 00:05:46,030 --> 00:05:47,409 So that's regularity based learning. 101 00:05:47,409 --> 00:05:50,100 And you can think of this as the branch 102 00:05:50,100 --> 00:05:51,350 of bulldozer computing. 
103 00:05:54,200 --> 00:05:57,750 Because, when doing these kinds of things, a computer's 104 00:05:57,750 --> 00:06:01,426 processing information like a bulldozer processes gravel. 105 00:06:01,426 --> 00:06:04,530 Now that's not necessarily a good model for all the kinds 106 00:06:04,530 --> 00:06:06,140 of learning that humans do. 107 00:06:06,140 --> 00:06:08,810 And after all, learning is one of the things that we think 108 00:06:08,810 --> 00:06:11,040 characterizes human intelligence. 109 00:06:11,040 --> 00:06:13,310 So if we were to build models of it and understand that we 110 00:06:13,310 --> 00:06:15,800 have to go down this other branch, too. 111 00:06:15,800 --> 00:06:19,530 And down this other branch we find learning ideas that are 112 00:06:19,530 --> 00:06:20,780 based on constraint. 113 00:06:25,490 --> 00:06:26,540 And let's call this the 114 00:06:26,540 --> 00:06:28,140 human-like side of the picture. 115 00:06:33,270 --> 00:06:38,310 And we'll talk about ideas that enable, for example, 116 00:06:38,310 --> 00:06:42,230 one-shot learning, where you learn something definite from 117 00:06:42,230 --> 00:06:44,530 each experience. 118 00:06:44,530 --> 00:06:46,300 And we'll talk about explanation based learning. 119 00:06:56,870 --> 00:07:04,930 By the way, do you learn by self explanation? 120 00:07:04,930 --> 00:07:05,430 I think so. 121 00:07:05,430 --> 00:07:09,220 I had an advisee once, who got nothing but A's and F's. 122 00:07:09,220 --> 00:07:12,600 And I said, what are the subjects that you get A's in? 123 00:07:12,600 --> 00:07:14,420 And why don't you get A's in all of your subjects? 124 00:07:14,420 --> 00:07:17,360 And he said, oh, I get A's in the subjects when I convince 125 00:07:17,360 --> 00:07:19,800 myself the material is true. 126 00:07:19,800 --> 00:07:23,060 So the learning was a byproduct of self explanation, 127 00:07:23,060 --> 00:07:25,720 an important kind of learning. 
128 00:07:25,720 --> 00:07:28,860 But alas, that's downstream. 129 00:07:28,860 --> 00:07:32,630 And what we're going to talk about today is this path 130 00:07:32,630 --> 00:07:36,650 through the tree, nearest neighbor learning. 131 00:07:36,650 --> 00:07:39,280 And here's how it works, in general. 132 00:07:42,310 --> 00:07:45,430 Here's just a general picture of what we're talking about. 133 00:07:45,430 --> 00:07:47,130 When you think of pattern recognition, or nearest 134 00:07:47,130 --> 00:07:50,870 neighbor based learning, you've got some sort of 135 00:07:50,870 --> 00:07:55,940 mechanism that generates a vector of features. 136 00:07:55,940 --> 00:07:57,475 So we'll call this the feature detector. 137 00:08:02,790 --> 00:08:05,652 And out comes a vector of values. 138 00:08:05,652 --> 00:08:09,090 And that vector of values goes into a 139 00:08:09,090 --> 00:08:10,685 comparator of some sort. 140 00:08:17,070 --> 00:08:21,240 And that comparator compares the feature vector with 141 00:08:21,240 --> 00:08:23,696 feature vectors coming from a library of possibilities. 142 00:08:26,960 --> 00:08:31,440 And by finding the closest match the comparator 143 00:08:31,440 --> 00:08:35,460 determines what some object is. 144 00:08:35,460 --> 00:08:36,710 It does recognition. 145 00:08:42,360 --> 00:08:47,870 So let me demonstrate that with these electrical covers. 146 00:08:47,870 --> 00:08:53,580 Suppose they arrived on an assembly line and some robot 147 00:08:53,580 --> 00:08:54,820 wants to sort them. 148 00:08:54,820 --> 00:08:56,830 How would it go about doing that? 149 00:08:56,830 --> 00:08:58,080 Well it could easily use the nearest 150 00:08:58,080 --> 00:09:00,240 neighbor sorting mechanism. 151 00:09:00,240 --> 00:09:01,440 So how would that work? 152 00:09:01,440 --> 00:09:02,690 Well here's how it would work. 153 00:09:05,350 --> 00:09:07,490 You would make some measurements. 
154 00:09:07,490 --> 00:09:10,310 And we'll just make some measurements in two 155 00:09:10,310 --> 00:09:11,950 dimensions. 156 00:09:11,950 --> 00:09:17,030 And one of those measurements might be the total area, 157 00:09:17,030 --> 00:09:18,780 including the area of the holes of 158 00:09:18,780 --> 00:09:20,440 these electrical covers. 159 00:09:20,440 --> 00:09:22,430 Just so you can follow what I'm doing without craning your 160 00:09:22,430 --> 00:09:27,102 neck, let me see if I can find the electrical covers. 161 00:09:27,102 --> 00:09:29,040 Yes, there they are. 162 00:09:29,040 --> 00:09:33,080 So we've got one big blank one, and several others. 163 00:09:33,080 --> 00:09:34,920 So we might also measure the hole area. 164 00:09:40,190 --> 00:09:45,320 And this one here, this guy here, this big white one has 165 00:09:45,320 --> 00:09:48,910 no hole area, and it's got the maximum amount of total area. 166 00:09:48,910 --> 00:09:51,410 So it will find itself at that point in 167 00:09:51,410 --> 00:09:56,590 this space of features. 168 00:09:56,590 --> 00:10:02,710 Then we've got the guy here, with room for 169 00:10:02,710 --> 00:10:04,330 four sockets in it. 170 00:10:04,330 --> 00:10:06,980 That's got the maximum amount of hole area, as well as the 171 00:10:06,980 --> 00:10:08,360 maximum amount of area. 172 00:10:08,360 --> 00:10:14,180 So it will be right straight up, maybe up here. 173 00:10:14,180 --> 00:10:20,230 Then we have, in addition to those two, a blank cover, like 174 00:10:20,230 --> 00:10:24,670 this, that's got about 1/2 the total area that any cover can 175 00:10:24,670 --> 00:10:26,710 have, so we'll put it right here. 176 00:10:26,710 --> 00:10:31,560 And finally, we've got one more of these guys. 177 00:10:31,560 --> 00:10:32,590 Oh yes, this one. 178 00:10:32,590 --> 00:10:36,580 1/2 the hole area, and 1/2 the total area. 179 00:10:36,580 --> 00:10:37,680 So I don't know, let's see. 
180 00:10:37,680 --> 00:10:40,370 Where will that go? 181 00:10:40,370 --> 00:10:44,400 Maybe about right here. 182 00:10:44,400 --> 00:10:47,330 So now our robot is looking on the assembly line and it sees 183 00:10:47,330 --> 00:10:49,570 something coming along, and it measures the area. 184 00:10:49,570 --> 00:10:51,320 And of course, there's noise. 185 00:10:51,320 --> 00:10:53,840 There's manufacturing variability. 186 00:10:53,840 --> 00:10:55,960 So it won't be precisely on top of anything. 187 00:10:55,960 --> 00:10:59,090 But suppose it's right there. 188 00:10:59,090 --> 00:11:01,820 Well it doesn't take any genius, human or 189 00:11:01,820 --> 00:11:04,360 computer, to figure out that this must be one of those guys 190 00:11:04,360 --> 00:11:07,230 with maximum area and maximum hole area. 191 00:11:10,430 --> 00:11:12,880 But now let's ask some other questions. 192 00:11:12,880 --> 00:11:21,595 Where would [TAPPING ON CHALK BOARD], what 193 00:11:21,595 --> 00:11:23,460 would that be? 194 00:11:23,460 --> 00:11:24,780 Or what would this be? 195 00:11:24,780 --> 00:11:28,570 [TAPPING ON CHALK BOARD], and so on. 196 00:11:28,570 --> 00:11:34,980 Well we have to figure out what those newly viewed 197 00:11:34,980 --> 00:11:39,162 objects are closest to in order to do an identification. 198 00:11:39,162 --> 00:11:40,350 But that's easy. 199 00:11:40,350 --> 00:11:46,050 We just calculate the distance to all of those standard, 200 00:11:46,050 --> 00:11:49,980 platonic, ideal descriptions of things, and we find out 201 00:11:49,980 --> 00:11:51,790 which is nearest. 202 00:11:51,790 --> 00:11:55,610 But in general, it's a little easier to think about 203 00:11:55,610 --> 00:12:01,580 producing some boundaries between these various idealized 204 00:12:01,580 --> 00:12:04,530 places, so that we can just say, well which area 205 00:12:04,530 --> 00:12:05,540 is the object in? 
206 00:12:05,540 --> 00:12:08,900 And then we'll know instantaneously to what 207 00:12:08,900 --> 00:12:11,250 category it belongs. 208 00:12:11,250 --> 00:12:15,140 So if we only had two, like the purple one and the yellow 209 00:12:15,140 --> 00:12:16,880 one, it would be easy. 210 00:12:16,880 --> 00:12:21,520 Because, we would just construct a line between the 211 00:12:21,520 --> 00:12:27,290 two, with a line between the purple and yellow as a 212 00:12:27,290 --> 00:12:29,390 perpendicular bisector. 213 00:12:29,390 --> 00:12:32,550 And so drawing it out instead of talking about it, if there 214 00:12:32,550 --> 00:12:35,270 were only two, that would be the boundary line. 215 00:12:35,270 --> 00:12:38,270 Anything south of the dotted line would be purple, and 216 00:12:38,270 --> 00:12:41,180 anything north would be yellow. 217 00:12:41,180 --> 00:12:43,570 And now we can do this with all the points, right? 218 00:12:43,570 --> 00:12:46,350 So we can figure out-- oh could you, Pierre, could you 219 00:12:46,350 --> 00:12:50,800 just close the lap top please? 220 00:12:50,800 --> 00:12:55,620 So if we want to do this with all these guys it would go 221 00:12:55,620 --> 00:12:57,390 something like this-- 222 00:12:57,390 --> 00:12:59,450 I better get rid of these dotted x's before 223 00:12:59,450 --> 00:13:01,600 they confuse me. 224 00:13:01,600 --> 00:13:04,930 Let's see, if these were the only two points, then we would 225 00:13:04,930 --> 00:13:07,460 want to construct a perpendicular bisector between 226 00:13:07,460 --> 00:13:09,762 the line joining them. 227 00:13:09,762 --> 00:13:13,860 And if these two were the only points, I would want to 228 00:13:13,860 --> 00:13:17,060 construct this perpendicular bisector. 229 00:13:17,060 --> 00:13:21,520 And if these two were the only points, I would want to 230 00:13:21,520 --> 00:13:24,100 construct a perpendicular bisector. 
231 00:13:24,100 --> 00:13:27,180 And if these two points were the only ones involved I'd 232 00:13:27,180 --> 00:13:27,960 want to construct-- 233 00:13:27,960 --> 00:13:30,480 oh, you see what I'm doing? 234 00:13:30,480 --> 00:13:33,410 I'm constructing perpendicular bisectors, and those are 235 00:13:33,410 --> 00:13:38,520 exactly the lines that I need in order to 236 00:13:38,520 --> 00:13:40,530 divide up this space. 237 00:13:40,530 --> 00:13:42,070 And it's going to divide up like this. 238 00:13:48,240 --> 00:13:50,230 And I won't say we'll give you a problem like this on an 239 00:13:50,230 --> 00:13:54,630 examination, but we have every year in the past ten. 240 00:13:54,630 --> 00:13:58,310 To divide up a space and produce-- 241 00:13:58,310 --> 00:13:59,730 something we would like to give a name. 242 00:13:59,730 --> 00:14:02,010 You know, Rumpelstiltskin effect, when you have a name 243 00:14:02,010 --> 00:14:03,390 you get power over it. 244 00:14:03,390 --> 00:14:04,865 So we're going to call these decision boundaries. 245 00:14:14,160 --> 00:14:16,760 OK so those are the simple decision boundaries, produced 246 00:14:16,760 --> 00:14:21,050 in a sample space, by a simple idea. 247 00:14:21,050 --> 00:14:26,030 But there is a little bit more to say about this. 248 00:14:26,030 --> 00:14:29,450 Because, I've talked about this as if we're trying to 249 00:14:29,450 --> 00:14:31,510 identify something. 250 00:14:31,510 --> 00:14:34,560 There's another way of thinking about it that's 251 00:14:34,560 --> 00:14:36,500 extremely important. 252 00:14:36,500 --> 00:14:38,180 And that is this. 253 00:14:38,180 --> 00:14:43,340 Suppose I come in with a brand new cover, never before seen. 254 00:14:43,340 --> 00:14:52,780 And I only measure, well let's say I only 255 00:14:52,780 --> 00:14:54,710 measure the hole area. 256 00:14:54,710 --> 00:14:58,695 And the hole area has that value. 257 00:15:01,610 --> 00:15:04,600 What is the most likely total area? 
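[Editorial sketch] The sorting scheme described above, measuring a cover and picking the closest standard description, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the prototype names and the normalized (total area, hole area) values are hypothetical stand-ins for the four covers on the board, not measurements from the lecture.

```python
import math

# Hypothetical prototypes: (total area, hole area), each scaled so that the
# largest possible value is 1.0. Names and numbers are illustrative only.
PROTOTYPES = {
    "large blank cover": (1.0, 0.0),
    "four-socket cover": (1.0, 1.0),
    "small blank cover": (0.5, 0.0),
    "small socket cover": (0.5, 0.5),
}


def nearest_cover(total_area, hole_area):
    """Return the name of the prototype closest (Euclidean) to the measurement."""
    measured = (total_area, hole_area)
    return min(PROTOTYPES, key=lambda name: math.dist(measured, PROTOTYPES[name]))


# A noisy measurement near the top-right corner of the feature space.
print(nearest_cover(0.95, 0.9))
```

Asking which prototype is nearest is equivalent to asking which side of each perpendicular bisector the point falls on, so the decision boundaries never need to be built explicitly in a sketch like this.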
258 00:15:08,610 --> 00:15:10,060 Well I don't know. 259 00:15:10,060 --> 00:15:13,290 But there's a kind of weak principle of, if something is 260 00:15:13,290 --> 00:15:15,250 similar in some respects, it's likely to be 261 00:15:15,250 --> 00:15:16,175 similar in other respects. 262 00:15:16,175 --> 00:15:19,100 So I'm going to guess, if you hold a knife to my throat and 263 00:15:19,100 --> 00:15:22,090 back me into a corner, that its total area is going to be 264 00:15:22,090 --> 00:15:27,410 something like that orange cover's total area. 265 00:15:27,410 --> 00:15:29,340 So this is a contrived example, and I don't make too 266 00:15:29,340 --> 00:15:29,800 much of it. 267 00:15:29,800 --> 00:15:32,910 But I do want to make a lot of that first 268 00:15:32,910 --> 00:15:33,630 principle, over there. 269 00:15:33,630 --> 00:15:36,200 And that is the idea that, if something is similar in some 270 00:15:36,200 --> 00:15:39,740 respects, it's likely to be similar in other respects. 271 00:15:39,740 --> 00:15:45,120 Because that's what most of education is about. 272 00:15:45,120 --> 00:15:47,990 Fairy tales, legal cases, medical 273 00:15:47,990 --> 00:15:49,760 cases, business cases-- 274 00:15:49,760 --> 00:15:52,170 if you can see that they are similar in some respects to a 275 00:15:52,170 --> 00:15:55,070 situation you've got now, then it's likely that they're going 276 00:15:55,070 --> 00:15:57,860 to be similar in other respects, as well. 277 00:15:57,860 --> 00:16:00,115 So when we're learning, we're not just learning to recognize 278 00:16:00,115 --> 00:16:02,740 a category, we're learning because we're attempting to 279 00:16:02,740 --> 00:16:06,390 apply some kind of precedent. 280 00:16:06,390 --> 00:16:08,996 That's the story on that. 281 00:16:08,996 --> 00:16:11,590 Well that's a simple idea but does it have any application? 282 00:16:11,590 --> 00:16:13,810 The answer is sure. 283 00:16:13,810 --> 00:16:15,470 Here's an example. 
284 00:16:15,470 --> 00:16:18,730 My second example, the example of cell identification. 285 00:16:18,730 --> 00:16:20,060 Suppose you have some white blood cells, 286 00:16:20,060 --> 00:16:21,310 what might you do? 287 00:16:23,390 --> 00:16:25,960 You might measure the total area of the cell. 288 00:16:25,960 --> 00:16:28,340 And not the hole area, but maybe the nucleus area. 289 00:16:33,290 --> 00:16:36,685 And maybe you might measure four or five other things, and 290 00:16:36,685 --> 00:16:38,300 put this thing in a high dimensional space. 291 00:16:38,300 --> 00:16:41,860 You can still measure the nearness in a 292 00:16:41,860 --> 00:16:42,700 high dimensional space. 293 00:16:42,700 --> 00:16:44,020 So you can use the idea to do that. 294 00:16:44,020 --> 00:16:45,780 It works pretty well. 295 00:16:45,780 --> 00:16:48,940 A friend of mine once started a company based on this idea. 296 00:16:48,940 --> 00:16:51,490 He got wiped out, of course, but it wasn't his fault. 297 00:16:51,490 --> 00:16:54,670 What happened is that somebody invented a better stain and it 298 00:16:54,670 --> 00:16:56,670 became much easier to just do the 299 00:16:56,670 --> 00:17:00,030 recognition by brute force. 300 00:17:00,030 --> 00:17:02,840 So let's see, that's two examples. 301 00:17:02,840 --> 00:17:06,770 the introductory example of the holes of the electrical 302 00:17:06,770 --> 00:17:09,810 covers, and the example of cells. 303 00:17:09,810 --> 00:17:14,170 And what I want to do now is show you how the idea can 304 00:17:14,170 --> 00:17:17,940 reappear in disguised forms in areas where you might not 305 00:17:17,940 --> 00:17:20,010 expect to see it. 306 00:17:20,010 --> 00:17:22,310 So consider the following problem. 307 00:17:22,310 --> 00:17:29,070 You have a collection of articles from magazines. 308 00:17:29,070 --> 00:17:34,060 And you're interested in learning something about how 309 00:17:34,060 --> 00:17:35,920 to address a particular question. 
310 00:17:35,920 --> 00:17:38,510 How do you go about finding the articles that are relevant 311 00:17:38,510 --> 00:17:40,420 to your question? 312 00:17:40,420 --> 00:17:46,170 So this is a puzzle that has been studied for decades by 313 00:17:46,170 --> 00:17:48,900 people interested in information retrieval. 314 00:17:48,900 --> 00:17:50,390 And here's the simple way to do it. 315 00:17:53,390 --> 00:17:59,010 I'm going to illustrate, once again, in just two dimensions. 316 00:17:59,010 --> 00:18:02,840 But it has to be applied in many, many dimensions. 317 00:18:02,840 --> 00:18:07,930 The idea is you count up the words in the articles in your 318 00:18:07,930 --> 00:18:12,370 library, and you compare the word counts to the word counts 319 00:18:12,370 --> 00:18:13,870 in your probing question. 320 00:18:16,500 --> 00:18:20,480 So you might be interested in 100 words. 321 00:18:20,480 --> 00:18:23,990 I'm only going to write two on the board for illustration. 322 00:18:23,990 --> 00:18:29,850 So we're going to think about articles from two magazines. 323 00:18:29,850 --> 00:18:31,500 Well first of all, what words are we going to use? 324 00:18:31,500 --> 00:18:38,160 One word is going to be hack, and that will include all 325 00:18:38,160 --> 00:18:41,550 derivatives of hack-- hacker, hacking, and so on. 326 00:18:41,550 --> 00:18:43,550 And the other word is going to be computer. 327 00:18:49,390 --> 00:18:53,250 And so it would not be surprising for you to see that 328 00:18:53,250 --> 00:18:56,480 articles from Wired Magazine might appear 329 00:18:56,480 --> 00:18:58,790 in places like this. 330 00:18:58,790 --> 00:19:02,320 They would involve lots of uses of the word computer, and 331 00:19:02,320 --> 00:19:05,670 lots of uses of the word hack. 
332 00:19:05,670 --> 00:19:08,180 And now for the sake of illustration, the second 333 00:19:08,180 --> 00:19:11,680 magazine from which we are going to draw articles is Town 334 00:19:11,680 --> 00:19:13,700 and Country. 335 00:19:13,700 --> 00:19:17,830 It's a very tony magazine, and the people who read Town 336 00:19:17,830 --> 00:19:21,360 and Country tend to be social parasites. 337 00:19:21,360 --> 00:19:25,930 And they still use the word hack. 338 00:19:25,930 --> 00:19:28,330 Because you can talk about hacking, there's some sort of 339 00:19:28,330 --> 00:19:32,080 specialized term of art in dealing with horses. 340 00:19:32,080 --> 00:19:37,760 So all the Town and Country articles would be likely to be 341 00:19:37,760 --> 00:19:40,980 down here somewhere. 342 00:19:40,980 --> 00:19:46,110 And maybe there would be one like that when they talk about 343 00:19:46,110 --> 00:19:48,960 hiring some computer expert to keep track of the results of 344 00:19:48,960 --> 00:19:53,940 the weekly hunt, or something. 345 00:19:53,940 --> 00:19:56,950 And now, in you come with your probe. 346 00:19:56,950 --> 00:19:59,430 And of course your probe question is going to be 347 00:19:59,430 --> 00:20:01,220 relatively small. 348 00:20:01,220 --> 00:20:03,510 It's not going to have a lot of words in it. 349 00:20:03,510 --> 00:20:05,640 So here's your probe question. 350 00:20:05,640 --> 00:20:06,890 Here's your unknown. 351 00:20:11,670 --> 00:20:13,840 Which article's going to be closest? 352 00:20:13,840 --> 00:20:16,944 Which articles are going to be closest? 353 00:20:16,944 --> 00:20:22,580 Well, alas, all those Town and Country articles are closest. 354 00:20:22,580 --> 00:20:27,520 So you can't use the nearest neighbor idea, it would seem. 355 00:20:27,520 --> 00:20:29,230 Anybody got a suggestion for how we might 356 00:20:29,230 --> 00:20:30,570 get out of this dilemma? 357 00:20:30,570 --> 00:20:31,286 Yes, Christopher. 
358 00:20:31,286 --> 00:20:35,087 CHRISTOPHER: If you're looking for word counts and you want 359 00:20:35,087 --> 00:20:38,248 to include some terms of computer, then wouldn't you 360 00:20:38,248 --> 00:20:40,743 want to use that as a threshold, rather than the 361 00:20:40,743 --> 00:20:41,741 nearest neighbor? 362 00:20:41,741 --> 00:20:41,824 PROF. 363 00:20:41,824 --> 00:20:42,740 PATRICK WINSTON: I don't know, it's a good idea. 364 00:20:42,740 --> 00:20:46,492 It might work, who knows. 365 00:20:46,492 --> 00:20:47,486 Doug? 366 00:20:47,486 --> 00:20:50,965 DOUG: Instead of using decision boundaries that are 367 00:20:50,965 --> 00:20:55,530 perpendicular bisectors, if you treated Wired and Town and 368 00:20:55,530 --> 00:20:59,434 Country as sort of this like, [INAUDIBLE] 369 00:20:59,434 --> 00:21:00,410 targets. 370 00:21:00,410 --> 00:21:03,338 And they would look like some [? great radial ?], here. 371 00:21:03,338 --> 00:21:05,290 I guess, some radius around curves. 372 00:21:05,290 --> 00:21:07,730 If it's within a certain radius then-- 373 00:21:11,634 --> 00:21:11,756 PROF. 374 00:21:11,756 --> 00:21:13,098 PATRICK WINSTON: Yes? 375 00:21:13,098 --> 00:21:14,806 [? SPEAKER 1: Are we, ?] necessarily, have it done with 376 00:21:14,806 --> 00:21:16,026 some sort of a [? politidy distance ?] 377 00:21:16,026 --> 00:21:16,550 metric? 378 00:21:16,550 --> 00:21:16,640 PROF. 379 00:21:16,640 --> 00:21:17,650 PATRICK WINSTON: Oh, here we go. 380 00:21:17,650 --> 00:21:19,114 We're not going to use any [? politidy distance ?] 381 00:21:19,114 --> 00:21:19,602 metric. 382 00:21:19,602 --> 00:21:21,066 We're going to use some other metric. 383 00:21:21,066 --> 00:21:22,042 SPEAKER 1: Like alogrithmic, or whatnot? 384 00:21:22,042 --> 00:21:22,164 PROF. 385 00:21:22,164 --> 00:21:23,018 PATRICK WINSTON: Well, algorithmic, 386 00:21:23,018 --> 00:21:24,482 gees, I don't know. 
387 00:21:24,482 --> 00:21:26,440 [LAUGHTER] 388 00:21:26,440 --> 00:21:26,478 PROF. 389 00:21:26,478 --> 00:21:29,040 PATRICK WINSTON: Let me give you a hint. 390 00:21:29,040 --> 00:21:30,880 Let me give you a hint. 391 00:21:30,880 --> 00:21:35,720 There are all those articles up there, out there, and out 392 00:21:35,720 --> 00:21:39,045 there, just for example. 393 00:21:39,045 --> 00:21:41,060 And here are the Town and Country articles. 394 00:21:41,060 --> 00:21:45,210 They're out there, and out there, for example. 395 00:21:45,210 --> 00:21:50,050 And now our unknown is out there. 396 00:21:50,050 --> 00:21:52,020 Anybody got an idea now? 397 00:21:52,020 --> 00:21:53,110 Hey Brett, what do you think? 398 00:21:53,110 --> 00:21:55,870 BRETT: So you sort of want the ratio. 399 00:21:55,870 --> 00:21:58,640 Or in this case, you can take the angle-- 400 00:21:58,640 --> 00:21:58,707 PROF. 401 00:21:58,707 --> 00:22:00,270 PATRICK WINSTON: Let's be-- ah, there we go, we're getting 402 00:22:00,270 --> 00:22:02,060 a little more sophisticated. 403 00:22:02,060 --> 00:22:03,320 The angle between what? 404 00:22:03,320 --> 00:22:05,260 BRETT: The angle between the vectors. 405 00:22:05,260 --> 00:22:05,381 PROF. 406 00:22:05,381 --> 00:22:06,715 PATRICK WINSTON: The vectors. 407 00:22:06,715 --> 00:22:08,170 Good. 408 00:22:08,170 --> 00:22:09,140 So we're going to use a different metric. 409 00:22:09,140 --> 00:22:10,485 What we're going to do is, we're going to forget 410 00:22:10,485 --> 00:22:12,810 including a distance, and we're going to measure the 411 00:22:12,810 --> 00:22:15,150 angle between the vectors. 412 00:22:15,150 --> 00:22:18,460 So the angle between the vectors, well let's actually 413 00:22:18,460 --> 00:22:21,960 measure the cosine of the angle between the vectors. 414 00:22:21,960 --> 00:22:24,180 Let's see how we can calculate that. 
415 00:22:24,180 --> 00:22:27,970 So we'll take the cosine of the angle between the vectors, 416 00:22:27,970 --> 00:22:29,570 we'll call it theta. 417 00:22:29,570 --> 00:22:37,960 That's going to be equal to the sum of the unknown values 418 00:22:37,960 --> 00:22:42,660 times the article values. 419 00:22:42,660 --> 00:22:45,290 Those are just the values in various dimensions. 420 00:22:45,290 --> 00:22:50,660 And then we'll divide that by the magnitude 421 00:22:50,660 --> 00:22:51,550 of the other vectors. 422 00:22:51,550 --> 00:22:54,430 So we'll divide by the magnitude of u, and we'll 423 00:22:54,430 --> 00:23:00,290 divide by the magnitude of the art vector to the article. 424 00:23:00,290 --> 00:23:03,050 So that's just the dot product right? 425 00:23:03,050 --> 00:23:05,860 That's a very fast computation. 426 00:23:05,860 --> 00:23:08,075 So with a very fast computation you can see if 427 00:23:08,075 --> 00:23:10,250 these things are going to be in the same direction. 428 00:23:10,250 --> 00:23:15,670 By the way, if this vector here is actually identical to 429 00:23:15,670 --> 00:23:18,980 one of those articles, what will the value be? 430 00:23:18,980 --> 00:23:22,366 Well then the angle will be 0 and we'll get the maximum value 431 00:23:22,366 --> 00:23:23,616 of the cosine, which is 1. 432 00:23:30,690 --> 00:23:32,540 Yeah, that will do it. 433 00:23:32,540 --> 00:23:35,900 So if we use any of the articles to probe the article 434 00:23:35,900 --> 00:23:39,230 space, they'll find themselves, which is a good 435 00:23:39,230 --> 00:23:43,080 thing to have a mechanism do. 436 00:23:43,080 --> 00:23:43,560 OK. 437 00:23:43,560 --> 00:23:46,300 So that's just the dot product of those two vectors. 438 00:23:46,300 --> 00:23:49,220 And it works like a charm. 439 00:23:49,220 --> 00:23:50,830 It's not the most sophisticated way of doing 440 00:23:50,830 --> 00:23:51,620 these things. 441 00:23:51,620 --> 00:23:54,150 There are hairy ways. 
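[Editorial sketch] The cosine measure just described, the sum of products of the word counts divided by the two vector magnitudes, can be tried out as follows. The word counts below are invented for illustration (the lecture only sketches points on the board), with each vector holding (count of "hack", count of "computer"). Under these assumed counts, plain distance sends a short probe to Town and Country, while the cosine sends it to Wired.

```python
import math


def cosine(u, v):
    """Cosine of the angle between u and v: dot product over both magnitudes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical (hack, computer) word counts; not data from the lecture.
ARTICLES = [
    ("Wired", (40, 50)),
    ("Wired", (60, 45)),
    ("Town and Country", (10, 0)),
    ("Town and Country", (12, 1)),
]

# A short probe question mentions each word only a couple of times.
PROBE = (2, 2)

# Nearest by ordinary distance: dominated by article length.
by_distance = min(ARTICLES, key=lambda art: math.dist(PROBE, art[1]))[0]

# Nearest by angle: the largest cosine means the most similar direction.
by_cosine = max(ARTICLES, key=lambda art: cosine(PROBE, art[1]))[0]

print(by_distance, "|", by_cosine)
```

Because the cosine ignores vector length, a two-word probe and a long article pointing in the same direction score close to 1, which is the behavior the angle metric is meant to deliver here.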
442 00:23:54,150 --> 00:23:56,370 You can get a Ph.D. by doing this sort of stuff in some new 443 00:23:56,370 --> 00:23:57,510 and sophisticated way. 444 00:23:57,510 --> 00:23:59,230 But this is a simple way. 445 00:23:59,230 --> 00:24:00,830 It works pretty well. 446 00:24:00,830 --> 00:24:02,080 And you don't have to strain yourself, much, 447 00:24:02,080 --> 00:24:03,990 to implement it. 448 00:24:03,990 --> 00:24:04,700 So that's cool. 449 00:24:04,700 --> 00:24:07,220 That's an example where we have a very 450 00:24:07,220 --> 00:24:08,470 non-standard metric. 451 00:24:11,920 --> 00:24:14,190 Now let's see, what else can we do? 452 00:24:14,190 --> 00:24:17,980 How about a robotic arm control? 453 00:24:17,980 --> 00:24:19,430 Here we go. 454 00:24:19,430 --> 00:24:20,790 We're going to just have a simple arm. 455 00:24:30,950 --> 00:24:36,820 And what we want to do is, we want to get this arm to move 456 00:24:36,820 --> 00:24:43,200 that ball along some trajectory at a speed, 457 00:24:43,200 --> 00:24:47,040 velocity, and acceleration that we have determined. 458 00:24:47,040 --> 00:24:49,320 So we've got two problems here. 459 00:24:49,320 --> 00:24:52,780 Well let's see, we've got two problems because, first of 460 00:24:52,780 --> 00:24:59,374 all, we've got angles, theta 1 and theta 2. 461 00:24:59,374 --> 00:25:04,470 It's a 2-degree-of-freedom arm, so there are only two angles. 462 00:25:04,470 --> 00:25:07,220 So the first problem we have is the kinematic problem of 463 00:25:07,220 --> 00:25:09,590 translating the (x,y)-coordinates of the ball, 464 00:25:09,590 --> 00:25:13,660 the desired ones, into the theta 1, theta 2 space. 465 00:25:13,660 --> 00:25:15,630 That's a simple kinematic problem. 466 00:25:15,630 --> 00:25:16,680 No f equals ma there. 467 00:25:16,680 --> 00:25:20,110 It doesn't involve forces, or time, or 468 00:25:20,110 --> 00:25:22,240 acceleration, anything. 469 00:25:22,240 --> 00:25:24,680 Pretty simple. 
470 00:25:24,680 --> 00:25:31,990 But then we've got the problem of getting it to go along that 471 00:25:31,990 --> 00:25:36,690 trajectory with positions, speeds, and 472 00:25:36,690 --> 00:25:40,230 accelerations that we desire. 473 00:25:40,230 --> 00:25:48,710 And now you say to me, well I've got 8.01, I can do that. 474 00:25:48,710 --> 00:25:49,920 And that's true, you can. 475 00:25:49,920 --> 00:25:52,480 Because, it's Newtonian mechanics. 476 00:25:52,480 --> 00:25:53,810 All you have to do is solve the equations. 477 00:26:03,830 --> 00:26:06,810 There are the equations. 478 00:26:06,810 --> 00:26:08,060 Good luck. 479 00:26:11,550 --> 00:26:12,940 Why are they so complicated? 480 00:26:12,940 --> 00:26:15,655 Well because of the complicated geometry. 481 00:26:15,655 --> 00:26:18,850 You notice we've got some products of theta 1 and theta 482 00:26:18,850 --> 00:26:20,170 2 in there, somewhere, I think? 483 00:26:20,170 --> 00:26:21,210 You've got theta 2's. 484 00:26:21,210 --> 00:26:23,570 I see an acceleration squared. 485 00:26:23,570 --> 00:26:27,080 And yeah, there's a theta 1 dot times a theta 2 dot. 486 00:26:27,080 --> 00:26:29,520 A velocity times a velocity. 487 00:26:29,520 --> 00:26:30,530 Where the hell did that come from? 488 00:26:30,530 --> 00:26:32,970 I mean it's supposed to be f equals ma, right? 489 00:26:32,970 --> 00:26:34,600 Those are Coriolis forces, because of 490 00:26:34,600 --> 00:26:37,690 the complicated geometry. 491 00:26:37,690 --> 00:26:37,950 OK. 492 00:26:37,950 --> 00:26:40,440 So you hire Berthold Horn, or somebody, to work these 493 00:26:40,440 --> 00:26:41,210 equations out for you. 494 00:26:41,210 --> 00:26:42,590 And he comes up with something like this. 495 00:26:42,590 --> 00:26:45,090 And you try it out and it doesn't work. 496 00:26:45,090 --> 00:26:46,175 Why doesn't it work? 497 00:26:46,175 --> 00:26:47,772 It's Newtonian mechanics, I said. 
498 00:26:50,430 --> 00:26:54,480 It doesn't work because we forgot to tell Berthold that 499 00:26:54,480 --> 00:26:56,860 there's friction in all the joints. 500 00:26:56,860 --> 00:26:58,800 And we forgot to tell him that they've worn a little bit 501 00:26:58,800 --> 00:27:00,470 since yesterday. 502 00:27:00,470 --> 00:27:02,150 And we forgot that the measurements we make on the 503 00:27:02,150 --> 00:27:04,500 lab table are not quite precise. 504 00:27:04,500 --> 00:27:06,580 So people try to do this. 505 00:27:06,580 --> 00:27:09,360 It just doesn't work. 506 00:27:09,360 --> 00:27:11,270 As soon as you get a ball of a different weight you have to 507 00:27:11,270 --> 00:27:11,820 start over. 508 00:27:11,820 --> 00:27:14,310 It's gross. 509 00:27:14,310 --> 00:27:15,560 So I don't know. 510 00:27:15,560 --> 00:27:18,990 I can do this sort of thing effortlessly, and I couldn't 511 00:27:18,990 --> 00:27:21,590 begin to solve those equations. 512 00:27:21,590 --> 00:27:22,410 So let's see. 513 00:27:22,410 --> 00:27:23,940 What we're going to do is we're going to forget about 514 00:27:23,940 --> 00:27:25,310 the problem for a minute. 515 00:27:25,310 --> 00:27:27,030 And we're going to talk about building 516 00:27:27,030 --> 00:27:30,180 ourselves a gigantic table. 517 00:27:30,180 --> 00:27:31,570 And here's what's going to be on the table. 518 00:27:34,320 --> 00:27:40,610 Theta 1, theta 2, theta 3, oops, there are only two. 519 00:27:40,610 --> 00:27:42,960 So that's theta 1 again, but it's the 520 00:27:42,960 --> 00:27:47,260 velocity, angular velocity. 521 00:27:47,260 --> 00:27:48,570 And then we have the accelerations. 522 00:27:53,430 --> 00:27:56,685 So we're going to have a big table of these things. 523 00:27:56,685 --> 00:27:58,780 And what we're going to do, is we're going to 524 00:27:58,780 --> 00:28:02,140 give this arm a childhood. 
525 00:28:02,140 --> 00:28:04,270 And we're going to write down all the combinations we ever 526 00:28:04,270 --> 00:28:07,940 see, every 100 milliseconds, or something. 527 00:28:07,940 --> 00:28:10,947 And the arm is just going to wave around like a kid does in 528 00:28:10,947 --> 00:28:13,160 the cradle. 529 00:28:13,160 --> 00:28:16,350 And then, we're not quite done. 530 00:28:16,350 --> 00:28:18,660 Because there are two other things we're going to record. 531 00:28:18,660 --> 00:28:21,410 Can you guess what they are? 532 00:28:21,410 --> 00:28:24,317 There are going to be the torque on the first motor, and 533 00:28:24,317 --> 00:28:25,839 the torque on the second motor. 534 00:28:29,970 --> 00:28:33,710 And so now, we've got a whole bunch of those records. 535 00:28:36,280 --> 00:28:40,960 The question is, what do we got to do with it? 536 00:28:40,960 --> 00:28:43,620 Well here's what we're going to do it. 537 00:28:43,620 --> 00:28:46,200 We're going to divide this trajectory that we're hoping 538 00:28:46,200 --> 00:28:49,370 to achieve, up into little pieces. 539 00:28:49,370 --> 00:28:50,580 And there's a little piece. 540 00:28:50,580 --> 00:28:52,860 And in that little piece nothing is 541 00:28:52,860 --> 00:28:54,420 going to change much. 542 00:28:54,420 --> 00:28:54,960 There's going to be an 543 00:28:54,960 --> 00:28:58,770 acceleration, velocity, position. 544 00:28:58,770 --> 00:29:02,000 And so we can look those up in the table that 545 00:29:02,000 --> 00:29:03,892 we made in the childhood. 546 00:29:03,892 --> 00:29:08,360 And we'll look around and find the closest match, and this 547 00:29:08,360 --> 00:29:13,960 will be the set of values for the positions, velocities, and 548 00:29:13,960 --> 00:29:17,230 accelerations that are associated with that 549 00:29:17,230 --> 00:29:18,880 particular movement. 550 00:29:18,880 --> 00:29:21,200 And guess what we can do now? 
551 00:29:21,200 --> 00:29:24,460 We can say, in the past, the torques associated with that 552 00:29:24,460 --> 00:29:27,650 particular little piece of movement lie right there. 553 00:29:27,650 --> 00:29:29,950 So we can just look it up. 554 00:29:29,950 --> 00:29:33,690 Now this method was thought up and rejected, because 555 00:29:33,690 --> 00:29:35,690 computers weren't powerful enough. 556 00:29:35,690 --> 00:29:38,170 And then, this is the age of recycling, right? 557 00:29:38,170 --> 00:29:42,625 So the idea got recycled when computers got strong enough. 558 00:29:42,625 --> 00:29:46,336 And it works pretty well, for things like this. 559 00:29:46,336 --> 00:29:51,040 But you might say to me, well can it do the stuff that we 560 00:29:51,040 --> 00:29:52,620 humans can do, like this? 561 00:29:58,540 --> 00:30:03,010 And the answer is, let's look. 562 00:30:19,070 --> 00:30:21,820 So this is a training phase, it's 563 00:30:21,820 --> 00:30:23,070 going through its childhood. 564 00:30:42,830 --> 00:30:44,940 You see what's happening is this. 565 00:30:44,940 --> 00:30:47,200 The initial table won't be very good. 566 00:30:47,200 --> 00:30:48,330 But that's OK. 567 00:30:48,330 --> 00:30:50,950 Because there are only a small number of things that it's 568 00:30:50,950 --> 00:30:53,600 important for you to be able to do. 569 00:30:53,600 --> 00:30:55,450 So when you try those things it's still 570 00:30:55,450 --> 00:30:57,340 writing into the table. 571 00:30:57,340 --> 00:30:59,660 So the next time you try that particular motion, it's going 572 00:30:59,660 --> 00:31:01,800 to be better at it, because it's got better stuff to 573 00:31:01,800 --> 00:31:02,810 interpolate amongst 574 00:31:02,810 --> 00:31:04,300 in that table. 575 00:31:04,300 --> 00:31:07,290 So that's why this thing is getting better and better as 576 00:31:07,290 --> 00:31:08,540 it goes on. 577 00:31:23,250 --> 00:31:24,500 That's as good as I was doing. 
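The table idea can be sketched roughly as follows. This is a minimal sketch, not the system shown in the clip: the class name, the six-number state layout (two angles, two angular velocities, two angular accelerations), and the sample values are all assumptions for illustration. It returns the single closest record rather than interpolating among several neighbors.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


class TorqueTable:
    """Hypothetical lookup table built up during the arm's 'childhood'."""

    def __init__(self):
        self.records = []  # list of (state, torques) pairs

    def record(self, state, torques):
        """Store one observation: what the arm was doing and the torques in effect."""
        self.records.append((tuple(state), tuple(torques)))

    def lookup(self, desired_state):
        """Return the torques from the recorded state closest to the desired one."""
        if not self.records:
            raise LookupError("table is empty; the arm has had no childhood yet")
        _, torques = min(
            self.records,
            key=lambda rec: squared_distance(rec[0], desired_state),
        )
        return torques


table = TorqueTable()
# state = (theta1, theta2, theta1_dot, theta2_dot, theta1_ddot, theta2_ddot)
table.record((0.0, 0.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.0))
table.record((1.0, 0.5, 0.2, 0.1, 0.0, 0.0), (2.5, 0.8))

# For each little piece of the desired trajectory, look up torques to apply.
torques = table.lookup((0.9, 0.5, 0.2, 0.1, 0.0, 0.0))
```

Because practice attempts keep calling `record`, the table becomes denser around the motions that are actually attempted, which is the effect described above.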
578 00:31:38,290 --> 00:31:38,830 Pretty good, don't you think? 579 00:31:38,830 --> 00:31:40,460 There's just one thing I want to show at the end of this 580 00:31:40,460 --> 00:31:43,820 clip just for fun. 581 00:31:43,820 --> 00:31:45,470 Maybe you've seen some old Zorro movies? 582 00:31:52,400 --> 00:31:54,370 So here's a little set up where this thing has learned 583 00:31:54,370 --> 00:31:56,680 to use a lash. 584 00:31:56,680 --> 00:32:00,220 So here's the lash, and there's a candle down there. 585 00:32:00,220 --> 00:32:01,470 So watch this. 586 00:32:11,325 --> 00:32:13,160 Pretty good, don't you think? 587 00:32:13,160 --> 00:32:14,840 So how fast does the learning take place? 588 00:32:14,840 --> 00:32:18,820 Let me go back to those other slides and show you. 589 00:32:18,820 --> 00:32:24,790 So here's some graphs to show you how fast it goes, boom. 590 00:32:24,790 --> 00:32:28,620 That gives you the curves of how well the robot arm can go 591 00:32:28,620 --> 00:32:31,290 along a straight line, after no practice with just some 592 00:32:31,290 --> 00:32:33,200 stuff recorded in the memory. 593 00:32:33,200 --> 00:32:35,270 And then with a couple of practice runs to give it 594 00:32:35,270 --> 00:32:40,120 better values amongst which to interpolate. 595 00:32:40,120 --> 00:32:42,170 So I think that's pretty cool. 596 00:32:42,170 --> 00:32:45,630 So simple, but yet so effective. 597 00:32:45,630 --> 00:32:48,720 But you still might say, well, I don't know, it might be 598 00:32:48,720 --> 00:32:52,280 something that can be done in special cases. 599 00:32:52,280 --> 00:32:55,230 I wonder if old Winston uses something like that when he 600 00:32:55,230 --> 00:32:56,470 drinks his coffee? 601 00:32:56,470 --> 00:32:57,610 Well we ought to do the numbers 602 00:32:57,610 --> 00:33:01,050 and see if it's possible. 603 00:33:01,050 --> 00:33:02,190 But I don't want to use coffee, it's 604 00:33:02,190 --> 00:33:03,740 the baseball season. 
605 00:33:03,740 --> 00:33:06,180 We're approaching the World Series. 606 00:33:06,180 --> 00:33:08,410 We might as well talk about professional athletes. 607 00:33:13,640 --> 00:33:18,320 So let's suppose that this is a baseball pitcher. 608 00:33:18,320 --> 00:33:20,620 And I want to know how much memory I'll need to record a 609 00:33:20,620 --> 00:33:22,590 whole lot of pitches. 610 00:33:22,590 --> 00:33:24,040 Is there a good pitcher these days? 611 00:33:24,040 --> 00:33:27,710 The Red Sox suck so I don't do Red Sox. 612 00:33:27,710 --> 00:33:30,240 Clay Buchholz, I guess. 613 00:33:30,240 --> 00:33:32,560 I don't know, some pitcher. 614 00:33:32,560 --> 00:33:36,380 And what we're going to do, is we're going to say for each of 615 00:33:36,380 --> 00:33:39,890 these little segments we're going to record 616 00:33:39,890 --> 00:33:46,990 100 bytes per joint. 617 00:33:46,990 --> 00:33:49,980 And we've got joints all over the place. 618 00:33:49,980 --> 00:33:52,230 I don't know how many are involved in doing a baseball 619 00:33:52,230 --> 00:33:58,170 pitch, but let's just say we have 100 joints. 620 00:33:58,170 --> 00:34:01,840 And then we have to divide the pitch up 621 00:34:01,840 --> 00:34:04,800 into a bunch of segments. 622 00:34:04,800 --> 00:34:07,470 So let's just say for sake of argument that 623 00:34:07,470 --> 00:34:15,219 there are 100 segments. 624 00:34:15,219 --> 00:34:20,560 And how many pitches does a pitcher throw in a day? 625 00:34:20,560 --> 00:34:20,879 What? 626 00:34:20,879 --> 00:34:21,675 SPEAKER 2: In a day? 627 00:34:21,675 --> 00:34:21,754 PROF. 628 00:34:21,754 --> 00:34:25,010 PATRICK WINSTON: In a day, yeah. 629 00:34:25,010 --> 00:34:28,330 This, we all know, is about 100. 630 00:34:28,330 --> 00:34:30,610 Everybody knows that they take them out 631 00:34:30,610 --> 00:34:37,210 after about 100 pitches. 
632 00:34:37,210 --> 00:34:39,330 So what I want to know is how much memory we need to record 633 00:34:39,330 --> 00:34:42,000 all the pitches a pitcher pitches in his career. 634 00:34:42,000 --> 00:34:44,060 So we still have to work on this a little bit more. 635 00:34:44,060 --> 00:34:47,000 How many days a year does a pitcher pitch? 636 00:34:47,000 --> 00:34:50,750 Well, they've got winter ball, and that sort of thing, so 637 00:34:50,750 --> 00:34:56,938 let's just approximate it as 100. 638 00:34:56,938 --> 00:34:59,070 I don't know, some of these may be a little high, some of 639 00:34:59,070 --> 00:35:00,430 the others may be a little low. 640 00:35:00,430 --> 00:35:02,650 And of course, the career-- 641 00:35:02,650 --> 00:35:04,524 just to make things easy-- 642 00:35:04,524 --> 00:35:07,940 is 100 years. 643 00:35:07,940 --> 00:35:11,000 So that's one, two, three, four, five, six. 644 00:35:11,000 --> 00:35:15,110 So we have 10 to the 12th bytes. 645 00:35:15,110 --> 00:35:18,340 Is that hopelessly big to store in here? 646 00:35:21,184 --> 00:35:23,080 CHRISTOPHER: 10 to 100 [INAUDIBLE] or 647 00:35:23,080 --> 00:35:25,460 just 100 times throwing? 648 00:35:25,460 --> 00:35:25,495 PROF. 649 00:35:25,495 --> 00:35:27,470 PATRICK WINSTON: 100 pitches in a day-- 650 00:35:27,470 --> 00:35:28,860 Christopher's asking some detail-- 651 00:35:28,860 --> 00:35:30,790 and what we're going to do is we're going to record 652 00:35:30,790 --> 00:35:33,270 everything there is to know about one pitch, and then 653 00:35:33,270 --> 00:35:34,472 we're going to see how many pitches he 654 00:35:34,472 --> 00:35:35,890 pitches in his lifetime. 655 00:35:35,890 --> 00:35:37,140 And we're going to record all that. 656 00:35:41,516 --> 00:35:42,444 Trust me. 657 00:35:42,444 --> 00:35:43,805 Trust me. 658 00:35:43,805 --> 00:35:47,800 OK. So we want to know if this is actually a practical scale. 
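The back-of-the-envelope storage estimate can be checked with a few lines of Python. The six factors are the round numbers chosen in the lecture for the sake of argument; the variable names are just labels for this sketch.

```python
# Six factors of 100, as picked in the lecture "for sake of argument".
bytes_per_joint = 100
joints = 100
segments_per_pitch = 100
pitches_per_day = 100
days_per_year = 100
career_years = 100

total_bytes = (
    bytes_per_joint
    * joints
    * segments_per_pitch
    * pitches_per_day
    * days_per_year
    * career_years
)

# Six factors of 10**2 multiply out to 10**12 bytes.
print(f"{total_bytes:.0e} bytes")
```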
659 00:35:47,800 --> 00:35:49,640 And this, by the way, is cocktail conversation, who 660 00:35:49,640 --> 00:35:50,330 knows, right? 661 00:35:50,330 --> 00:35:53,170 But it's useful to work out these numbers, and know some 662 00:35:53,170 --> 00:35:54,780 of these numbers. 663 00:35:54,780 --> 00:35:59,690 So the question we have to ask is, how much 664 00:35:59,690 --> 00:36:02,010 computation is in there? 665 00:36:02,010 --> 00:36:05,240 And the first question relevant to that is, how many 666 00:36:05,240 --> 00:36:06,760 neurons do we have in our brain? 667 00:36:09,530 --> 00:36:10,650 Volunteer? 668 00:36:10,650 --> 00:36:12,740 Neuroscience? 669 00:36:12,740 --> 00:36:15,580 No one to volunteer? 670 00:36:15,580 --> 00:36:15,970 All right. 671 00:36:15,970 --> 00:36:18,690 Well this is a number you should know, because this is 672 00:36:18,690 --> 00:36:21,990 what you've got in there. 673 00:36:21,990 --> 00:36:29,870 There are 10 to the 10th neurons in the brain, of which 674 00:36:29,870 --> 00:36:31,990 10 to the 11th are in the cerebellum, alone. 675 00:36:37,950 --> 00:36:39,670 What the devil do I mean by that? 676 00:36:39,670 --> 00:36:42,390 I mean that your cerebellum is so full of neurons that it 677 00:36:42,390 --> 00:36:44,610 dwarfs the rest of the brain. 678 00:36:44,610 --> 00:36:46,440 So if you exclude the cerebellum, you've got about 679 00:36:46,440 --> 00:36:48,380 10 to the 10th neurons. 680 00:36:48,380 --> 00:36:50,020 And there are about 10 to the 11th neurons in 681 00:36:50,020 --> 00:36:50,750 the cerebellum, alone. 682 00:36:50,750 --> 00:36:53,610 What's the cerebellum for? 683 00:36:53,610 --> 00:36:55,210 Motor control. 684 00:36:55,210 --> 00:36:57,380 Interesting. 685 00:36:57,380 --> 00:36:58,690 So we're a little short. 686 00:36:58,690 --> 00:37:01,170 Oh, but we forget, that's just the number of neurons. 687 00:37:01,170 --> 00:37:04,630 We have to count up the number of synapses. 
688 00:37:04,630 --> 00:37:07,060 Because conceivably, we might be able to adjust those 689 00:37:07,060 --> 00:37:08,650 synapses, right? 690 00:37:08,650 --> 00:37:11,670 So how many synapses does a neuron have? 691 00:37:11,670 --> 00:37:14,020 The answer is, it depends. 692 00:37:14,020 --> 00:37:15,855 But the ones in the cerebellum-- 693 00:37:18,940 --> 00:37:22,990 I should be pointing back there, I guess-- 694 00:37:22,990 --> 00:37:25,550 10 to the 5th. 695 00:37:25,550 --> 00:37:31,970 So if we add all that up we have 10 to the 16th. 696 00:37:31,970 --> 00:37:33,220 No problem. 697 00:37:37,150 --> 00:37:38,950 It's just an existence proof that you don't have to 698 00:37:38,950 --> 00:37:40,470 worry too much about having storage. 699 00:37:40,470 --> 00:37:44,100 So maybe our cerebellum functions, in some way, as a 700 00:37:44,100 --> 00:37:46,150 gigantic table. 701 00:37:46,150 --> 00:37:49,240 And that's maybe how we learn motor skills, by filling up 702 00:37:49,240 --> 00:37:55,700 that table as we run around emerging from the cradle, 703 00:37:55,700 --> 00:38:00,770 learning how to manipulate ourselves as we go on. 704 00:38:00,770 --> 00:38:05,440 So that's the story on arm control. 705 00:38:05,440 --> 00:38:11,320 Now all this is pretty straightforward, easy to 706 00:38:11,320 --> 00:38:12,430 understand. 707 00:38:12,430 --> 00:38:15,515 And of course, there are some problems. 708 00:38:23,370 --> 00:38:34,660 Problem number one, what if the space of 709 00:38:34,660 --> 00:38:36,586 samples looks like this? 710 00:38:36,586 --> 00:38:42,420 [TAPPING ON CHALK BOARD] 711 00:38:42,420 --> 00:38:45,400 What's going to happen in that case? 712 00:38:45,400 --> 00:38:48,870 Well what's going to happen in that case is that the-- 713 00:38:51,800 --> 00:38:53,595 let's see, which values are going to be more important? 714 00:38:56,590 --> 00:38:59,160 The x values, right? 
715 00:38:59,160 --> 00:39:02,000 The y values are spread out all over the place. 716 00:39:02,000 --> 00:39:04,470 So you'd like the spread of the data to sort of be the 717 00:39:04,470 --> 00:39:06,820 same in all the dimensions. 718 00:39:06,820 --> 00:39:08,650 So is there anything we can do to arrange 719 00:39:08,650 --> 00:39:10,710 for that to be true? 720 00:39:10,710 --> 00:39:13,820 Sure, we can just normalize the data. 721 00:39:13,820 --> 00:39:17,470 So we can borrow from our statistics course and say, 722 00:39:17,470 --> 00:39:21,040 well, let's see, we're interested in x. 723 00:39:21,040 --> 00:39:27,250 And we know that the variance of x is equal to 1 over n 724 00:39:27,250 --> 00:39:35,170 times the sum of the values, minus the mean value, squared. 725 00:39:35,170 --> 00:39:39,090 That's a measure of how much the data spreads out. 726 00:39:39,090 --> 00:39:43,990 So now, instead of using x, we can use x prime, which is 727 00:39:43,990 --> 00:39:51,380 equal to x over sigma. 728 00:39:51,380 --> 00:39:53,120 What's the variance of that going to be? 729 00:39:53,120 --> 00:39:58,000 x over sigma sub x. 730 00:39:58,000 --> 00:39:59,820 Anybody see, instantaneously, what the variance of 731 00:39:59,820 --> 00:40:00,630 that's going to be? 732 00:40:00,630 --> 00:40:02,980 Or do we have to work it out? 733 00:40:02,980 --> 00:40:06,250 It's going to be 1. Work out the algebra for me. 734 00:40:06,250 --> 00:40:08,470 It's obvious, it's simple. 735 00:40:08,470 --> 00:40:17,150 Just substitute x prime into this formula for variance, and 736 00:40:17,150 --> 00:40:18,545 do the high school algebraic manipulation. 737 00:40:18,545 --> 00:40:20,720 And you'll see that the variance of this new variable, 738 00:40:20,720 --> 00:40:24,750 this transformed variable, turns out to be 1. 739 00:40:24,750 --> 00:40:30,100 So that problem, the non uniformity problem, the spread 740 00:40:30,100 --> 00:40:33,920 problem, is easy to handle. 
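The algebra left as an exercise can be written out as follows. The symbols $\mu_x$ for the mean of $x$, $x_i$ for the individual values, and $x'_i$ for the transformed values are notation introduced here for the sketch; $\sigma_x$ and $n$ are as in the lecture.

```latex
\begin{align*}
\sigma_x^2 &= \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu_x\right)^2,
\qquad x'_i = \frac{x_i}{\sigma_x},
\qquad \mu_{x'} = \frac{1}{n}\sum_{i=1}^{n}\frac{x_i}{\sigma_x} = \frac{\mu_x}{\sigma_x} \\[4pt]
\sigma_{x'}^2 &= \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i}{\sigma_x} - \frac{\mu_x}{\sigma_x}\right)^2
= \frac{1}{\sigma_x^2}\cdot\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu_x\right)^2
= \frac{\sigma_x^2}{\sigma_x^2} = 1
\end{align*}
```

Dividing every dimension by its own standard deviation therefore gives each one unit variance, so no single dimension dominates the distance calculation merely because of its scale.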
741 00:40:48,280 --> 00:40:51,260 What about that other problem? 742 00:40:51,260 --> 00:40:53,090 No cake without flour? 743 00:40:53,090 --> 00:40:56,050 What if it turns out that the data-- 744 00:40:56,050 --> 00:41:00,470 you have two dimensions and the answer, actually, doesn't 745 00:41:00,470 --> 00:41:04,250 depend on y at all. 746 00:41:04,250 --> 00:41:05,500 What will happen? 747 00:41:08,500 --> 00:41:11,680 Then you're often going to get screwy results, because it'll 748 00:41:11,680 --> 00:41:15,450 be measuring a distance that is merely 749 00:41:15,450 --> 00:41:17,510 confusing the answer. 750 00:41:20,160 --> 00:41:24,160 So problem number two is the what matters problem. 751 00:41:30,722 --> 00:41:32,618 Write it down, what matters. 752 00:41:37,360 --> 00:41:41,440 Problem number three is, what if the answer doesn't depend 753 00:41:41,440 --> 00:41:42,540 on the data at all? 754 00:41:42,540 --> 00:41:46,360 Then you're trying to build a cake without flour. 755 00:41:46,360 --> 00:41:49,430 Once somebody asked me-- 756 00:41:49,430 --> 00:41:53,030 a classmate of mine, who went on to become an important 757 00:41:53,030 --> 00:41:55,210 executive in an important credit card company-- asked me 758 00:41:55,210 --> 00:41:58,780 if we could use artificial intelligence to determine when 759 00:41:58,780 --> 00:42:01,280 somebody was going to go bankrupt? 760 00:42:01,280 --> 00:42:03,220 And the answer was, no. 761 00:42:03,220 --> 00:42:07,930 Because the data available was data that was independent of 762 00:42:07,930 --> 00:42:08,990 that question. 763 00:42:08,990 --> 00:42:11,100 So he was trying to make a cake without flour, and you 764 00:42:11,100 --> 00:42:13,210 can't do that. 765 00:42:13,210 --> 00:42:14,300 So that concludes what I want to say 766 00:42:14,300 --> 00:42:15,040 about nearest neighbors. 767 00:42:15,040 --> 00:42:17,750 Now I want to talk a little bit about sleep. 
768 00:42:17,750 --> 00:42:22,010 Over there on that left-side branch, now disappeared, we 769 00:42:22,010 --> 00:42:24,750 talked about the human side of learning. 770 00:42:28,070 --> 00:42:31,250 And I said something about one-shot and 771 00:42:31,250 --> 00:42:32,390 explanation-based learning. 772 00:42:32,390 --> 00:42:35,110 And what that means is, you don't learn 773 00:42:35,110 --> 00:42:37,000 without problem solving. 774 00:42:37,000 --> 00:42:39,950 And the question is, how is problem solving related to how 775 00:42:39,950 --> 00:42:41,910 much sleep you get? 776 00:42:41,910 --> 00:42:44,330 And to answer questions like that, of course, you want to 777 00:42:44,330 --> 00:42:46,420 go to the people who are the custodians of the kind of 778 00:42:46,420 --> 00:42:47,870 knowledge you are interested in. 779 00:42:47,870 --> 00:42:49,980 And so you would say, who are the custodians of knowledge 780 00:42:49,980 --> 00:42:51,690 about how much sleep you need? 781 00:42:51,690 --> 00:42:53,910 And what happens if you don't get it? 782 00:42:53,910 --> 00:42:58,020 And the answer is the United States Army. 783 00:42:58,020 --> 00:43:00,170 Because they're extremely interested in what happens 784 00:43:00,170 --> 00:43:04,516 when you cross 10 or 12 time zones, and have no sleep, and 785 00:43:04,516 --> 00:43:06,600 have to perform. 786 00:43:06,600 --> 00:43:08,090 So they're very interested in that question. 787 00:43:08,090 --> 00:43:09,730 And they got even more interested after the first 788 00:43:09,730 --> 00:43:12,720 Gulf War, which was the most studied war in 789 00:43:12,720 --> 00:43:14,890 history, up to that time. 790 00:43:14,890 --> 00:43:17,750 Because, there were after-action reports that were full 791 00:43:17,750 --> 00:43:20,490 of examples like this. 792 00:43:20,490 --> 00:43:26,750 The US Forces, in a certain part of the battlefield, had 793 00:43:26,750 --> 00:43:27,680 drawn up for the night. 
794 00:43:27,680 --> 00:43:31,040 And those are Bradley fighting vehicles, there, and back here 795 00:43:31,040 --> 00:43:33,430 Abrams tanks. 796 00:43:33,430 --> 00:43:34,770 And they're all just kind of settling down for 797 00:43:34,770 --> 00:43:37,760 good night's sleep. 798 00:43:37,760 --> 00:43:41,110 They've been up for about 36 hours straight, by the way. 799 00:43:41,110 --> 00:43:48,940 When, much to their amazement, across their field-of-view 800 00:43:48,940 --> 00:43:53,720 came a column of Iraqi vehicles. 801 00:43:53,720 --> 00:43:56,240 And both sides were enormously surprised. 802 00:43:56,240 --> 00:43:58,830 A firefight broke out. 803 00:43:58,830 --> 00:44:02,600 The lead vehicle, over here, on the Iraqi 804 00:44:02,600 --> 00:44:04,970 side caught on fire. 805 00:44:04,970 --> 00:44:08,110 So these guys, in the Bradley fighting vehicles, went around 806 00:44:08,110 --> 00:44:12,370 to investigate, whereupon, these guys started blasting 807 00:44:12,370 --> 00:44:17,590 away, in acts of fratricidal fire. 808 00:44:17,590 --> 00:44:23,450 And the interesting thing is that all these folks here 809 00:44:23,450 --> 00:44:26,160 swore in the after action reports that they were firing 810 00:44:26,160 --> 00:44:28,430 straight ahead. 811 00:44:28,430 --> 00:44:31,330 And what happened was their ability to put ordnance on 812 00:44:31,330 --> 00:44:33,205 target was not impaired at all. 813 00:44:33,205 --> 00:44:36,090 But their idea of where the target was, what the target 814 00:44:36,090 --> 00:44:40,100 was, whether it was a target, was all screwed up. 815 00:44:40,100 --> 00:44:43,950 So this led to a lot of experiments in which people 816 00:44:43,950 --> 00:44:44,810 were sleep deprived. 817 00:44:44,810 --> 00:44:46,150 And by the way, you think you're a tough 818 00:44:46,150 --> 00:44:47,020 MIT student, right? 819 00:44:47,020 --> 00:44:48,500 These are Army Rangers. 
820 00:44:48,500 --> 00:44:51,850 It doesn't get any tougher than this, believe me. 821 00:44:51,850 --> 00:44:52,780 So here's one of the 822 00:44:52,780 --> 00:44:55,290 experiments that was performed. 823 00:44:55,290 --> 00:44:57,740 In those days they had what they called 824 00:44:57,740 --> 00:45:01,110 fire control teams. 825 00:45:01,110 --> 00:45:03,820 And their job is to take information from an observer, 826 00:45:03,820 --> 00:45:08,780 over here, about a target, over here. 827 00:45:11,530 --> 00:45:18,880 And tell the artillery, over here, where to fire. 828 00:45:18,880 --> 00:45:20,170 So they kept some of these folks up 829 00:45:20,170 --> 00:45:22,020 for 36 hours straight. 830 00:45:22,020 --> 00:45:25,780 And after 36 hours they all said, we're doing great. 831 00:45:25,780 --> 00:45:28,620 And at that time they were bringing fire down on 832 00:45:28,620 --> 00:45:35,360 hospitals, mosques, churches, schools, and themselves. 833 00:45:35,360 --> 00:45:39,500 Because, they couldn't do the calculations anymore, after 36 834 00:45:39,500 --> 00:45:41,310 hours without sleep. 835 00:45:41,310 --> 00:45:44,430 And now you say to me, well I'm a MIT student, I want to 836 00:45:44,430 --> 00:45:46,070 see the data. 837 00:45:46,070 --> 00:45:47,390 So let's have a look at the data. 838 00:46:01,790 --> 00:46:02,180 OK. 839 00:46:02,180 --> 00:46:03,090 So there it goes. 840 00:46:03,090 --> 00:46:11,730 That's what happens to you after 72 hours without sleep. 841 00:46:11,730 --> 00:46:15,000 These are simple things to do. 842 00:46:15,000 --> 00:46:17,910 Very simple calculations you have to do in your head, like 843 00:46:17,910 --> 00:46:21,730 adding numbers, spelling words, and things like that. 844 00:46:21,730 --> 00:46:24,400 So after 72 hours without sleep, your performance 845 00:46:24,400 --> 00:46:30,860 relative to what you were at the beginning is about 30%. 846 00:46:30,860 --> 00:46:32,825 So loss of sleep destroys ability. 
847 00:46:38,030 --> 00:46:48,900 [BELL RINGING] 848 00:46:48,900 --> 00:46:50,390 Sleep loss accumulates. 849 00:46:50,390 --> 00:46:52,830 So you say, well I need eight hours of sleep-- 850 00:46:52,830 --> 00:46:55,145 and what you need, by the way, varies-- 851 00:46:55,145 --> 00:46:58,220 but I'm going to get by with seven hours of sleep. 852 00:46:58,220 --> 00:47:02,830 So after 20 days of one hour's worth of sleep deprivation, 853 00:47:02,830 --> 00:47:05,740 you're down about 25%. 854 00:47:05,740 --> 00:47:09,660 If you say, well I need eight hours of sleep, but I'm going 855 00:47:09,660 --> 00:47:13,470 to have to get by with just six, after 20 days of that, 856 00:47:13,470 --> 00:47:19,450 you're down to about 25% of your original capability. 857 00:47:19,450 --> 00:47:21,390 So you might say, well does caffeine help? 858 00:47:21,390 --> 00:47:23,950 Or naps, naps in this case. 859 00:47:23,950 --> 00:47:26,300 And the answer is, yes, a little bit. 860 00:47:26,300 --> 00:47:29,790 Some people argue that you get more effect out of the 861 00:47:29,790 --> 00:47:31,620 sleep that you do get if you divide it into two. 862 00:47:31,620 --> 00:47:33,640 Winston Churchill always took a three 863 00:47:33,640 --> 00:47:34,580 hour nap in the afternoon. 864 00:47:34,580 --> 00:47:37,080 He said that way he got a day and a half's worth of work out 865 00:47:37,080 --> 00:47:39,740 of every day. 866 00:47:39,740 --> 00:47:40,930 He got the full amount of sleep. 867 00:47:40,930 --> 00:47:42,630 But he divided it into two pieces. 868 00:47:42,630 --> 00:47:44,140 Here's the caffeine one. 869 00:47:44,140 --> 00:47:45,430 So caffeine does help. 870 00:47:48,540 --> 00:47:52,680 And now you say, well, shoot, I think I'm going to take it 871 00:47:52,680 --> 00:47:53,710 kind of easy this semester. 872 00:47:53,710 --> 00:47:57,100 And I'll just work hard during the week before finals. 
873 00:47:57,100 --> 00:48:01,550 Maybe I won't even bother sleeping for the 24 hours 874 00:48:01,550 --> 00:48:04,754 before the 6.034 final. 875 00:48:04,754 --> 00:48:05,920 That's OK. 876 00:48:05,920 --> 00:48:07,170 Well let's see what will happen. 877 00:48:10,780 --> 00:48:11,930 So let's work the numbers. 878 00:48:11,930 --> 00:48:14,240 Here is 24 hours. 879 00:48:14,240 --> 00:48:15,260 And that's where your 880 00:48:15,260 --> 00:48:18,370 effectiveness is after 24 hours. 881 00:48:18,370 --> 00:48:20,710 Now let's go over to the same amount of effectiveness on the 882 00:48:20,710 --> 00:48:22,910 blood alcohol curve. 883 00:48:22,910 --> 00:48:26,290 And it's about the level at which you 884 00:48:26,290 --> 00:48:28,880 would be legally drunk. 885 00:48:28,880 --> 00:48:31,370 So I guess what we ought to do is to check everybody as they 886 00:48:31,370 --> 00:48:35,040 come in for the 6.034 final, and arrest you if you've been 887 00:48:35,040 --> 00:48:36,970 24 hours without sleep. 888 00:48:36,970 --> 00:48:41,850 And not let you take any finals again, for a year. 889 00:48:41,850 --> 00:48:46,710 So if you do all that, you might as well get drunk. 890 00:48:46,710 --> 00:48:48,400 And now we have one thing left to do today. 891 00:48:48,400 --> 00:48:50,720 And that is address the original question of, why it 892 00:48:50,720 --> 00:48:54,520 is that the dogs and cats in the world think that the diet 893 00:48:54,520 --> 00:48:58,400 drink makes people fat? 894 00:48:58,400 --> 00:49:00,600 What's the answer? 895 00:49:00,600 --> 00:49:05,160 It's because only fat guys like me drink this crap. 
896 00:49:05,160 --> 00:49:08,580 So since the dogs and cats don't have the ability to tell 897 00:49:08,580 --> 00:49:12,430 themselves stories, don't have that capacity to string 898 00:49:12,430 --> 00:49:14,870 together events into narratives, they don't have 899 00:49:14,870 --> 00:49:19,230 any way of saying, well this is a consequence of desiring 900 00:49:19,230 --> 00:49:20,110 not to be fat. 901 00:49:20,110 --> 00:49:21,870 Not a consequence of being fat. 902 00:49:21,870 --> 00:49:23,970 They don't have that story. 903 00:49:23,970 --> 00:49:25,810 And so what they're doing is something you have to be very 904 00:49:25,810 --> 00:49:26,450 careful about. 905 00:49:26,450 --> 00:49:28,630 And that thing you have to be very careful about is the 906 00:49:28,630 --> 00:49:31,930 confusion of correlation with cause. 907 00:49:31,930 --> 00:49:34,040 They see the correlation, but they don't understand the 908 00:49:34,040 --> 00:49:35,660 cause, so that's why they make a mistake.