The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

TOM MITCHELL: I want to talk about some work that we're doing to try to study language in the brain. Actually, to be honest, this is part of a grander plan. So here is what I'm really doing with my research life. I'm interested in language, and so I'm involved in two different research projects.

One of them is to build a computer to learn to read. We have a project which we call our Never Ending Language Learner, which is an attempt to build a computer program to learn to read the web. NELL, we call it, has been running nonstop, 24 hours a day, since 2010. So it's now five years old. If you have very good eyesight, you can tell that everybody there in the group is wearing a NELL fifth-birthday-party t-shirt.

But it's an effort to try to understand what it would be like to build a computer program that runs forever and gets better every day. In this case, its job is to learn to read the web. It is getting better. It currently has about 100 million beliefs that it has read from the web. It's learning to infer new beliefs from old beliefs. It's a better reader today than it was last year. It was better last year than it was the year before. It's still not anything like as competent as you and I, but it's one line of research that you can follow if you're interested in understanding language understanding.

The other thread, which is what I'm going to talk about tonight, the bottom half here, is to study how the brain processes language by putting people in brain imaging scanners of different types, showing them language stimuli, and getting them to read. So I'm going to focus really on the bottom part.
But I can't really talk about this honestly unless I fess up to the fact that my goal is for these two projects to collide in a monstrous collision. They haven't yet, although you'll see some signs tonight, I hope, of some of the cross-fertilization between the two areas.

When it comes to the brain imaging work, we have a great team of people. One of them, Nicole Rafidi, is sitting right here. Some of you have already met her this week. And so what I'm going to present is really the group work of quite a few people.

The idea is simple, but here's the brainteaser. Suppose you're interested in how the brain processes language, and you have access to some scanning machines. What would you do?

So we started out by showing people in a scanner stimuli like these. Maybe single words, initially nouns like camera, and drill, and house, and saw. Sometimes pictures, sometimes pictures with words under them. But just showing people stimuli to get them to think about some concept. And then we collect a brain image, like this one, which we collected when a person was looking at this particular stimulus, a bottle. This is posterior, the back of the head, on top. This is the front of the head at the bottom here. And these four slices are four out of about 22 slices of the brain that make up the three-dimensional image. So you can see here what the brain activity looks like-- kind of blotchy-- when one particular person thinks about bottle.

So you might ask, what does it look like if they think about something else? Well, I can show you what it looks like on average. If we average over 60 different words, then here's the brain activity. And you can see that it looks a lot like bottle, but maybe there are some differences. And in fact, if I subtract out this mean activity from the brain image we get for bottle, then you can see the residue here.
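To make that subtraction concrete, here is a minimal sketch in Python; the array shapes (60 words, 20,000 voxels) and the synthetic data are assumptions standing in for the real images.

```python
import numpy as np

# Synthetic stand-in for the real data: one row of voxel activity per word.
rng = np.random.default_rng(0)
images = rng.normal(size=(60, 20_000))   # 60 words x 20,000 voxels

mean_image = images.mean(axis=0)         # average activity over all 60 words
residue = images[0] - mean_image         # word-specific deviation for one word
print(residue.shape)                     # (20000,)
```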
There are in fact some differences in the activity we see for bottle compared to the mean activity over many words. Whether that's signal or noise, I guess you can't tell by looking at this picture. But that's the kind of data that we have if we use fMRI to capture brain activity while people read words.

So the first thing you might think of doing, if you had this kind of data, would be to train a machine learning program to decode from these brain images which word somebody is thinking about. And we did in fact begin that way, by training classifiers where we'd give them a brain image, and during training we would tell them which word that brain image corresponds to. Then, after training, we could test the classifier, to see whether it had indeed learned the right pattern of activity, by showing it new brain images and having it tell us, for example, is this person reading the word hammer or bottle?

And, in fact, that works quite well. If you try it over several different participants in our study, you can see the classification accuracies we get for a Boolean classification problem: are they reading a tool word, like hammer, saw, chisel, or a building word, like house, palace, hotel? Then, depending on the individual person, we can get accuracy in the high 90s percent, or a little worse. In fact, if you ask why it's not the same for all people, it turns out the accuracy we get correlates very well with a measure of head motion in the machine. So a lot of this is noise.

But the bottom line here is good. fMRI actually has enough resolution to resolve the differences in neural activity between, say, thinking about house versus hammer. And machine learning methods can discover those distinctions. So that's a good basis. And given that, you can start asking a number of interesting questions. Like we could ask, well, what about you and me? Do we have the same pattern of brain activity to encode hammer, and house, and all the other concepts?
Or does each of us do something different? We can convert that into a machine learning question, right? We could say, well, what if we train on people on that side of the room. We'll collect their brain data and train our program. Then we'll collect data from these people and try to decode which word they're reading based on the patterns that we learned from those people. If that works, then that's overwhelming evidence that we have very similar neural encodings of different word meanings.

So we tried that and, in fact, it works. Here you see in black the accuracies, just like on the first slide, of how well we can decode which word a person is reading, if we train on data from the same person we're testing on. In white you see the accuracies we get if we train on no data at all from this person, but instead train on the data from all the other participants. And you see that on average we do about as well with the white bars as we do with the black bars. In fact, in some cases we do better training on other people. That might be, for example, because we get to use more training examples-- all the other participants' data instead of just one participant's data.

But again, the important thing here is that this is very strong evidence that, even though we're all very different people, we have remarkably similar neural encodings when we think about common nouns. Which is something that, say in the year 2000, I don't think anybody understood.

So I want to kind of wrap up this idea. I want to go through basically four ideas in this talk. Idea number one is, gee, we could train classifiers to try to decode from the neural activity which word a person is reading. And if we do that, then we can actually ask some interesting scientific questions, like: are the patterns similar across our brains? Does it depend whether it's a picture or a word?
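As a concrete illustration of what such a decoder can look like, here is a hedged sketch using a regularized linear classifier from scikit-learn on synthetic stand-in data. The talk doesn't specify the actual classifier, so logistic regression is just one reasonable choice; cross-subject decoding is the same code with the rows pooled from other participants' (anatomically aligned) data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical training set: one row of voxel activity per stimulus
# presentation, labeled 0 = tool word (hammer, saw, ...) or
# 1 = building word (house, palace, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20_000))   # 120 presentations x 20,000 voxels
y = rng.integers(0, 2, size=120)     # tool vs. building labels

# Regularization matters: there are far more voxels than examples.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)   # held-out decoding accuracy
print(scores.mean())   # ~0.5 on pure noise; high 90s reported on real data
```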
And, in fact, we can think of this technique of training a classifier as-- the way I think of it is, it's a way of building a virtual sensor of information content in the neural signal. I think that fMRI was truly a revolution in the study of the brain, because for the first time we could look inside and see the activity. But these classifiers give us a different thing. Now we can look inside and see not just the neural activity, but the information encoded in that neural activity. And so it's a different kind of sensor. You can design your own, train it, and then use it to study information represented in the neural signal in the brain. So it opens up a very large set of methods, and techniques, and experiments that we can now run with brain imaging, where instead of looking just at the activity, we can look at the information content.

OK, so that's idea number one. We were quite pleased with ourselves as we were doing this work. But in the back of our minds was a gnawing question: well, this is good-- now we've trained on a couple of hundred words, so we have a couple hundred different neural patterns of activity. We have a list of the neural codes for a couple of hundred words, but that's not really a theory of neural encodings of meaning. It's a list.

What would it mean to have a theory? Well, scientific theories are logical systems that can make predictions. And if they're interesting theories, they make experimentally testable predictions. So in our case, if we want to study representations of meaning, it would be nice to have a theory where we could input an arbitrary noun and get it to predict for us what the neural representation for that noun would be. At least that would be better than a list. That would be a generative theory, or model.

And so we were interested in this. We worked on it for a while, and our first version looked like this.
It's a computational model that was trained. And once it's trained, it makes a prediction for any input word, like telephone, in two steps. Step one: given a word like telephone, it looks up the word telephone in a trillion words of text collected from the web and represents that word by a set of statistics about how telephone is used. In our case, statistics about which verbs co-occur with that noun. Then, in the second step, it uses that vector, which approximates the meaning of the input noun, as the basis for predicting, at each of 20,000 locations in the brain, how much activity there will be.

So let me push on that a little bit. I said that in step one we look up, for a word like celery, which verbs it occurs with. Well, here are the statistics that we get. This is normalized to be a vector of length 1. You can see that for celery the most common verb is eat, and taste is second most common, but celery doesn't occur very often with ride. On the other hand, airplane occurs a lot with ride, and not very much with manipulate or rub. So these are the verb statistics extracted from the web for two typical nouns. And step one of the model is just to collect those statistics for whatever noun we give it to make the prediction.

Step two is then to predict, at each location in the brain, what the neural activity will be there-- the fMRI activity-- as a function of those statistics we just collected. So for the word celery, we now know it occurs 0.84 with eat and 0.35 with the verb taste. We're now going to make a prediction for this voxel. In particular, the prediction for voxel v is the sum, over those 25 verbs that we're using, of how frequently verb i occurs with the input noun-- celery in this case-- times some coefficient that we have to learn from training. And this coefficient tells us how voxel v is influenced by co-occurrence with verb i. We have 25 verbs and 20,000 voxels, so we have 500,000 of these coefficients to learn.
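In symbols, the predicted activity at voxel v for noun w is: activity(v) = sum over i = 1..25 of f_i(w) * c_{v,i}, where f_i(w) is the co-occurrence of noun w with verb i and c_{v,i} is a learned coefficient. A minimal sketch of just the prediction step, with made-up numbers:

```python
import numpy as np

# f: normalized co-occurrence of the input noun with 25 verbs
#    (e.g., for "celery": eat = 0.84, taste = 0.35, ride ~ 0, ...).
# C: learned coefficients, one row per verb, one column per voxel;
#    C[i, v] says how strongly verb i drives activity at voxel v.
rng = np.random.default_rng(0)
f = rng.random(25)
f /= np.linalg.norm(f)              # length-1 vector, as in the talk
C = rng.normal(size=(25, 20_000))   # 25 x 20,000 = 500,000 coefficients

predicted_image = f @ C             # predicted activity at every voxel
print(predicted_image.shape)        # (20000,)
```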
We learn them by taking nouns and collecting the brain images-- the same data we used to train those classifiers. So we have a collection of nouns and the corresponding brain images. For each of those nouns we can look up the verb statistics, and then we can train on that data to estimate all these half million coefficients.

When you put the coefficients together, say for eat, you can plot the coefficient values. Here's one of those coefficients for the verb eat, in a particular voxel right there. So you can think of the coefficients associated with each verb as forming a kind of activity map for that verb. And a weighted linear sum of those verb-associated activity maps gives us a prediction for celery.

You could ask, how well do these predictions work? One way I can answer that is to show you what happened when we trained on 58 other nouns, not including celery, not including airplane, and then had the system predict these words that were novel to it. For celery, it predicted this image. For airplane, it predicted this image. Unbeknownst to it, here are the actual observed images for celery and airplane. So you can see it correctly predicts some of this structure-- this is, by the way, fusiform gyrus-- but not all of the structure. So it captures some of what's going on.

I can tell you in a more quantitative way how well it's working. We can test the program this way. We can say: here are two words you have not seen; here are two images you have not seen. One of them is celery, one is airplane. You, the program, tell me which is which. If it were just working at chance, it would get an accuracy of 50%-- if you guess randomly, you'll get half of those right by chance. In its case, averaged over nine different subjects in the experiment, we get 79% accuracy.
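Here is a hedged sketch of that whole train-and-test loop: fit the coefficients on 58 nouns with a regularized linear regression, predict images for the two held-out nouns, and match predictions to observations. Ridge regression and cosine similarity are assumptions for illustration; the actual study's regression and matching score may differ.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical data: 60 nouns, each with a 25-verb feature vector and an
# observed fMRI image (synthetic stand-ins for the real data).
F = rng.random((60, 25))            # verb co-occurrence features
Y = rng.normal(size=(60, 20_000))   # observed brain images

# Hold out two nouns ("celery" and "airplane", say); train on the other 58.
held_out, train = np.arange(2), np.arange(2, 60)
model = Ridge(alpha=1.0).fit(F[train], Y[train])   # fits all 500k coefficients
pred = model.predict(F[held_out])                  # two predicted images

# Two-alternative test: which predicted image matches which observed image?
def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

same = cos(pred[0], Y[0]) + cos(pred[1], Y[1])
swapped = cos(pred[0], Y[1]) + cos(pred[1], Y[0])
print("correct" if same > swapped else "wrong")    # chance = 50%
```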
So what does this mean? It means that, three times out of four-- 79%-- we could give this trained model two new nouns that it had never seen and two fMRI images for those nouns, and it could tell us which was which. So this model is extrapolating beyond the words on which it was trained. And it's extrapolating, not perfectly, but somewhat successfully, to other nouns.

Now, why? What's the basis on which it's doing that extrapolation? What are the assumptions built into this model? Well, for one thing, it's assuming that you can predict the neural representation of any word based on corpus statistics summarizing how that word is used on the web. Furthermore, it's assuming that any noun you can think of has a neural representation which lives in a 25-dimensional vector space, where each dimension corresponds to one of those 25 verbs. Every image is some point in this 25-dimensional vector space. That's what that linear equation is doing when it combines a weighted combination of these 25 axes to predict the image.

Now, I don't actually believe that everything you think lives in a 25-dimensional space where the dimensions are those verbs. But the interesting thing is that the model works. And so it does mean that there is some more primitive set of meaning components out of which these neural patterns are being constructed. It's not just a big hash code where every word gets its own pattern. If that were the case, we wouldn't be able to extrapolate and predict new ones by adding together these 25 different components. So patterns are being built up out of more primitive semantic components. And this model is crudely-- only 79%-- capturing some of that substructure that gets combined when you think about an entire word. And the substructure is the different meaning components.

The point here, I think, is that this is a model that's different from training a classifier. This is actually a generative model.
It can make predictions that extrapolate beyond the words on which it was trained. It assumes that there is a space of semantic primitives out of which the patterns of neural activity are built. And it assumes that that space is at least spanned by the corpus statistics of the noun.

Since then, we've extended this work, and we no longer use just that list of 25 verbs. We now use a very high-dimensional vector-- on the order of 100 million dimensions, and generally very sparse-- where every feature comes from a much more precise parse of text on the web. When I say parse, I mean that if we have a simple sentence like "He booked a ticket," this would be a dependency parse. It shows, for example, that booked is a verb whose subject is he and whose direct object is ticket. And now each of these edges in the parse becomes a feature in our new representation of the word (a small sketch of extracting such features appears below). So instead of using verbs, we use dependency parse features. This actually increases the accuracy of our former model slightly, from 79% up a little bit. But importantly, it also lets us work with all parts of speech. So now we're not restricted to just using nouns; we can use these dependency parse vectors for adjectives and all parts of speech. In terms of broadening the model to handle different types of words, this is helpful.
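Here's that small parsing sketch, using spaCy as one example dependency parser (an assumption for illustration; the actual work doesn't necessarily use spaCy). Each printed edge would become one feature, and counting such edges over a web-scale corpus yields the sparse, very high-dimensional vectors described above.

```python
import spacy  # assumes: pip install spacy && python -m spacy download en_core_web_sm

nlp = spacy.load("en_core_web_sm")
doc = nlp("He booked a ticket.")

# Each (head, relation, dependent) edge in the parse is one candidate feature.
for tok in doc:
    if tok.dep_ != "ROOT":
        print(f"{tok.head.lemma_} --{tok.dep_}--> {tok.lemma_}")
# e.g.  book --nsubj--> he   and   book --dobj--> ticket
```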
So at this point you could say, well, this is kind of interesting, because what have we seen? I think the main points so far are: gee, different people have very similar patterns of neural activity that their brains use to encode meaning. Furthermore, those patterns of neural activity decompose into more primitive semantic components. And we can train models that extrapolate to new words on which they weren't trained, by learning those more primitive semantic components and how to combine them for novel words based on corpus statistics. So that's kind of interesting.

But everything that I've said so far is really about the static spatial distribution of neural activity that encodes these things. Now, in truth, your neural activity is not just one little snapshot. When you understand a word-- do you know how long it takes you to understand a word? About 400 milliseconds. It takes about 400 milliseconds to understand a word. Well, it turns out there are interesting brain activity dynamics during those 400 milliseconds. Let me show you.

Up till now, we were looking at fMRI data. But here's some magnetoencephalography data, and this data has a time resolution of one millisecond. So I'll show you this movie, which begins 20 milliseconds before a word appears on the screen. In this case, the word is hand, and this brain is about to read the word hand. You'll see 550 milliseconds of brain activity. I'll read out the numbers so you can just watch the activity over here. So here we go: 20 milliseconds before the word appears on the screen. 0, 100, 200 milliseconds, 300, 400 milliseconds, 500.

OK, so it wasn't a static snapshot of activity. Your brain is doing a lot of things. There's a lot of dynamism during that 400 milliseconds that you're reading the word. fMRI captures an image about once a second, but because of the blood-oxygen-level-dependent mechanism that it uses to capture it, the signal is kind of smeared out over time. So we can't see these dynamics with fMRI, but with MEG we can.

And so now we can ask all kinds of interesting questions, like: what was the information encoded in that movie we just saw? I just showed you a movie of neural activity, but I want a movie of data flow in the brain. I want the movie showing me what information is encoded over time. Given this data, what could we do? Well, here's one thing we can do. In fact, Gus Sudre did this for his PhD thesis.
He said: I want to know what information is flowing around the brain there, so I'm going to train roughly a million different classifiers. I'll train classifiers that look at just 100 milliseconds' worth of that movie, and at just one of 70 or so anatomically defined brain regions. And I'll use a set of features-- he wasn't using our verbs anymore. He was using a set of 218 features that we had made up manually and that were inspired by the game 20 questions. These were features of the word, not like "how often does it co-occur with the verb eat?" but instead features like: would you eat it, yes or no? Is it bigger than a bread box, yes or no? And so forth. He had a set of 218 questions like that, and every word could be described by a set of 218 answers to those questions, analogous to the verbs.

So what Gus did is, for every one of those 218 features, like "is it bigger than a breadbox," he trained a classifier to try to decode the value of that feature for the word you're reading, from just 100 milliseconds' worth of this movie, looking at just one of 70 anatomically defined regions. When he did that, he ended up being able to make us a movie of what information is coded, in which part of the brain, when. He ran this every 50 milliseconds: he'd move forward and use a 100 millisecond window starting there.
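A hedged sketch of that sweep, for a single semantic feature: train a cross-validated classifier on every (region, 100 ms window) pair and record its accuracy. Repeating this over all 218 features gives the full sweep; the data shapes and the choice of logistic regression are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in: 300 word presentations x 70 regions x 550 ms of MEG.
n_words, n_regions, n_ms = 300, 70, 550
meg = rng.normal(size=(n_words, n_regions, n_ms))
is_big = rng.integers(0, 2, size=n_words)   # "is it bigger than a breadbox?"

decodability = {}
for region in range(n_regions):
    for start in range(0, n_ms - 100 + 1, 50):   # slide the window by 50 ms
        X = meg[:, region, start:start + 100]    # one region, one 100 ms window
        clf = LogisticRegression(max_iter=1000)
        acc = cross_val_score(clf, X, is_big, cv=5).mean()
        decodability[(region, start)] = acc      # one pixel of the data-flow movie
```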
493 00:28:21,000 --> 00:28:24,330 At 150 milliseconds, at 200 milliseconds, 494 00:28:24,330 --> 00:28:26,880 you got the first semantic feature. 495 00:28:26,880 --> 00:28:27,510 Is it hairy? 496 00:28:30,240 --> 00:28:36,210 I think this is actually a stand-in for, is it alive? 497 00:28:36,210 --> 00:28:40,230 But the feature he happened to uncover was, is it hairy? 498 00:28:40,230 --> 00:28:43,140 At 200 milliseconds. 499 00:28:43,140 --> 00:28:46,800 At 250, now we start to see more semantic features. 500 00:28:46,800 --> 00:28:52,410 300, 350, 400, 450. 501 00:28:52,410 --> 00:29:04,310 So literally, these are the semantic features trickling 502 00:29:04,310 --> 00:29:07,610 in over time during this 500 milliseconds-- 503 00:29:07,610 --> 00:29:09,740 that's the movie-- 504 00:29:09,740 --> 00:29:12,860 that corresponds to the neural activity 505 00:29:12,860 --> 00:29:16,800 that I showed you in that first movie. 506 00:29:16,800 --> 00:29:23,090 So this is a kind of data flow picture of what information 507 00:29:23,090 --> 00:29:27,560 is flowing around in the brain in that neural activity 508 00:29:27,560 --> 00:29:32,700 during that 450 milliseconds so far. 509 00:29:32,700 --> 00:29:33,840 Here's the set. 510 00:29:33,840 --> 00:29:38,810 Out of those 218 questions, here are the 20 most decodable 511 00:29:38,810 --> 00:29:41,610 features. 512 00:29:41,610 --> 00:29:44,670 So the number one feature that's most decodable, 513 00:29:44,670 --> 00:29:47,670 is that bigger than a loaf of bread? 514 00:29:47,670 --> 00:29:49,740 But actually, if you look at those questions, 515 00:29:49,740 --> 00:29:52,440 you see many of the most incredible ones 516 00:29:52,440 --> 00:29:55,150 are really size. 517 00:29:55,150 --> 00:29:58,960 And many of the next are manipulability. 518 00:29:58,960 --> 00:30:00,820 And many others are animacy. 519 00:30:00,820 --> 00:30:04,110 And some are shelter. 520 00:30:04,110 --> 00:30:10,380 In fact, we've across a diverse set of experiments 521 00:30:10,380 --> 00:30:12,390 keep seeing these kind of features. 522 00:30:12,390 --> 00:30:16,620 Size, manipulability, animacy, shelter, 523 00:30:16,620 --> 00:30:26,620 edibility are recurring as features that have their own-- 524 00:30:26,620 --> 00:30:30,600 they seem to be kind of naturally some 525 00:30:30,600 --> 00:30:33,540 of the primitive components. 526 00:30:33,540 --> 00:30:35,460 And they have their corresponding neural 527 00:30:35,460 --> 00:30:40,740 signatures, out of which the encoding of the full word 528 00:30:40,740 --> 00:30:42,480 is built. 529 00:30:42,480 --> 00:30:44,340 So if you ask me right now, what's 530 00:30:44,340 --> 00:30:47,910 my best guess of what are the semantic primitives out 531 00:30:47,910 --> 00:30:50,280 of which the neural codes are built, I'd say, 532 00:30:50,280 --> 00:30:51,280 I don't really know. 533 00:30:51,280 --> 00:30:56,340 But these features plus edibility, for example, 534 00:30:56,340 --> 00:30:58,890 keep recurring in what we're seeing. 535 00:30:58,890 --> 00:31:01,050 And they have their own spatial regions 536 00:31:01,050 --> 00:31:05,000 where the codes seem to live. 537 00:31:05,000 --> 00:31:10,410 OK, so I want to get to the final part, which 538 00:31:10,410 --> 00:31:14,790 is, so far we've talked about just single words. 539 00:31:14,790 --> 00:31:17,370 And there's plenty of interesting questions 540 00:31:17,370 --> 00:31:18,850 we can ask about single words. 
But really, language is about multiple words. So I want to show you a couple of examples of some more recent work where we've been looking at semantic composition with adjective-noun phrases. This is the work of Alona Fyshe. What she did is present people with simple adjective-noun sequences. She put an adjective on the screen, like tasty, left it there for half a second, then a noun, like tomato. And she was interested in the question of where and when the neural encoding of these two words shows up, and what that encoding looks like.

So I'll show you a couple of things. One is, here is a picture of the classifier weights that were learned to decode the adjective. You have to think of it this way. Here's time. This is the first 500 milliseconds, when the adjective is on the screen. Then there's 300 milliseconds of dead air. Then 500 milliseconds when the noun is on the screen, and then more dead air. The vertical axis is different locations in the sensor helmet of the MEG scanner-- there are about 306 of those. The intensity is showing the weight of a trained classifier that was trained to decode the adjective. And, in fact, this is the pattern of activity associated with the adjective gentle-- like gentle bear.

And so what you see here is that there is neural activity out here, when the noun is on the screen, long after the adjective has disappeared, that's quite relevant to decoding what the adjective was. This is just a quick look, but you can see that if I say tasty tomato, even while you're reading the word tomato there's neural activity that encodes what the adjective had been. And we can see that it's in fact a different pattern of neural activity than was here when the adjective was on.
And in fact, one thing that Alona got interested in is, given that you can decode the adjective across time, is your brain using the same neural encoding across time? Or is it a different neural encoding, maybe for different purposes, across time?

Let me explain what she did. She trained a classifier at one time in this time series of adjective-noun, and then she would test it at some other time point. If you can train at this time-- let's say, right when the adjective comes on the screen-- and use that classifier successfully to decode the adjective way down here, when the noun is on the screen, then we know that it's the same neural encoding, because that's what the classifier is exploiting.

And then she made a two-dimensional plot, where you plot the time at which you trained the classifier on the vertical axis and the time at which you test it on the horizontal axis. Then we can show, for each pair of training and test times, whether you can train at this time and then decode at that time. And that will tell us whether there's a stable neural encoding of the adjective meaning across time.

When she did that, here's what it looks like. On the vertical axis we have the time at which she trained: the adjective is on the screen for the first 500 milliseconds, then the noun. The horizontal axis is the time at which she tried to use any of those trained classifiers to decode the adjective; again, here's when the adjective's on the screen, then the noun. All this intense stuff means high decoding accuracy. What you see is that if you train when the adjective is on the screen, you can use that to decode at other times when the adjective's on the screen. That's good-- so we can decode adjectives. But if you try to use it to decode the adjective when the noun's on the screen, it fails.
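Here is a hedged sketch of that train-at-one-time, test-at-another analysis (often called temporal generalization) on synthetic stand-in data; the simple half/half split and logistic regression are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 trials x 306 MEG sensors x 26 time bins
# (50 ms bins spanning adjective, dead air, then noun).
n_trials, n_sensors, n_times = 200, 306, 26
meg = rng.normal(size=(n_trials, n_sensors, n_times))
adjective = rng.integers(0, 2, size=n_trials)   # e.g. tasty vs. gentle

half = n_trials // 2                 # simple train/test split over trials
G = np.zeros((n_times, n_times))     # the train-time x test-time accuracy map
for t_train in range(n_times):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(meg[:half, :, t_train], adjective[:half])
    for t_test in range(n_times):
        # Accuracy of the t_train decoder applied at t_test: a stable
        # encoding shows up as high accuracy off the diagonal.
        G[t_train, t_test] = clf.score(meg[half:, :, t_test], adjective[half:])
```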
Blue means failure-- no statistically significant decoding accuracy. On the other hand, when the noun is on the screen, if you train using the neural patterns from when the noun is on the screen, then you can in fact decode what the adjective had been. So it's like there are two different encodings of the adjective being used here. One, present when the adjective's on the screen, lets you successfully decode the adjective while it's on the screen, but doesn't work when the noun's on the screen. And then there's a second neural encoding that you can use to decode what the adjective had been while the noun is on the screen.

And then, interestingly, there's also this other region here, which says that if you train when the adjective was on the screen, you can't use that to successfully decode when the noun's on the screen-- but later on, when nothing is on the screen and the phrase is gone, your brain is still thinking about the adjective in a way that uses that very first neural encoding. This is evidence that the neural encoding of the adjective that was present when you saw the adjective is re-emerging a couple of seconds later, after the phrase is off the screen. But the neural encoding of the adjective from when the noun was on the screen doesn't seem to get used again.

Most recently, we've also been looking at stories and passages. Much of this, not all of it, is the work of Leila Wehbe, another PhD student. And here's what she did. She put people in fMRI and in MEG scanners, and she showed them the following kind of stimulus. This goes on for about 40 minutes: one chapter of a Harry Potter story, presented word by word, every 500 milliseconds, so we know exactly when you've seen every word. She collected this data in fMRI and in MEG to try to study the jumble of activity that goes on in your brain when you're reading not an isolated word, but a whole story. For her, with the fMRI we get an image every two seconds. So four words go by and we get an fMRI image.
So here's the kind of data that she had. She trained a model that's very analogous to the first generative model I talked about, where we would input a word, code it with verbs, and then use that to predict neural activity. In her case, she took an approach where, for every word, she would encode that word with a big feature vector. That vector could summarize both the meaning of the individual word and other features that capture the context, the various properties of the story at that point in time. The general framework was to convert the time series of words into a time series of feature vectors that capture individual word meanings plus story content at that time, and then to use that to predict the fMRI and MEG activity.

Here are some of the kinds of features she ended up using. Some of them were motions of the characters: was somebody flying (this was the Harry Potter story), manipulating something, moving, physically colliding? What emotions were being experienced by the characters in the story that you're focused on at this point in time? What were the parts of speech of the different words, and other syntactic features? What was the semantic content? We also used the dependency parse statistics that I mentioned, which capture the semantics of individual words. So altogether she had a feature vector with about 200 features, some manually annotated, some captured by corpus statistics. For every word in the story we then had this feature vector.

Then she trained a model that literally would take as input a sequence of words, convert it into the feature sequence, and then, using the trained regression, predict the time series of brain activity from those feature vectors.
This allowed her to then test, analogous to what we did with our single-word generative model, whether the model had learned well enough that we could give it two different passages, plus one real time series of observed data, and ask it to tell us which passage this person was reading. And these would be novel passages that were not part of the training data. She found that it was, in fact, possible-- imperfectly, but three times out of four-- to take two passages which had never been seen in training, and a time series of neural activity never seen during training, and tell us which of the two passages the activity corresponds to. So it's capturing some of the structure here.

Interestingly, as a side effect of that, you end up with a map of different cortical regions and which of these 200 features are encoded in each of them. So from one analysis of people reading this very complicated, complex story, we end up-- you can go [AUDIO OUT] features and color code them. Some of them have to do with syntax, like part of speech and sentence length. Some have to do with dialogue; some have to do with visual properties or characters in the stories. And you can see here a map of where those different types of information were decodable from the neural activity.

Interestingly, here is a slightly earlier piece of work from Ev Fedorenko, showing where there is neural activity that's selectively associated with language processing. The difference is that in Leila's work, she was also able to indicate not just where the activity was, but what information is encoded there.

And then again, you can drill down on some of these. If you want to know more about syntax, we can look at the different syntax features and see: where's the part of speech encoded? What about the length of the sentence?
What about the specific dependency role, in the parse, of the word that we're reading right now? And so forth.

So this gives us a way of starting to look simultaneously at very complex cognitive function, right? You're reading a story; you're perceiving the words; you're figuring out the parts of speech; you're parsing the sentence. You're thinking about the plot and fitting this into the plot. You're feeling sorry for the hero who just had their broom stolen. All kinds of stuff is going on in your head. Here's an analysis that attempts to simultaneously analyze a diverse range of these features, and I think with some success.

There still remain problems of correlations between different features. It might be hard to know whether we're decoding the fact that somebody is being shouted at versus the fact that their ears are hurting, so to speak. There can be two different properties we're thinking of that are highly correlated, and it can still be hard to tease those apart.

But to me, the interesting thing about Leila's analysis is that it flips away from a style that I would call reductionist. One way that people often study language in the brain is to pick one phenomenon and then run a carefully controlled experiment to vary just that one dimension. Like: we'll use words, and we'll use pronounceable letter strings that are not words, and we'll just look at what's different in those two almost identical situations. Here, instead, we have people doing natural reading-- a complex cognitive function-- and we try to use a multivariate analysis to simultaneously model all of those different functions. So I think this is an interesting methodological position to take. And it also gives us a chance to start looking at some of these phenomena in story reading.