1 00:00:15,000 --> 00:00:19,000 So this lecture is called the future of biology. And I want to associate 2 00:00:19,000 --> 00:00:23,000 somewhat freely. I'm not going to write on the board. 3 00:00:23,000 --> 00:00:27,000 I am going to post this presentation so you can pull 4 00:00:27,000 --> 00:00:32,000 down stuff from it. And I think there are two basic 5 00:00:32,000 --> 00:00:36,000 threads where we're going in biology. One of them is a basic 6 00:00:36,000 --> 00:00:41,000 understanding, what don't we know about biology? 7 00:00:41,000 --> 00:00:45,000 And the second is how are we going to use that information to go 8 00:00:45,000 --> 00:00:50,000 somewhere profound in the future? So the first one is how life works. 9 00:00:50,000 --> 00:00:55,000 And the obvious place to start is the Human Genome Project. 10 00:00:55,000 --> 00:00:59,000 The Human Genome Project was a very bold initiative that was first 11 00:00:59,000 --> 00:01:03,000 spoken about in the late 1980s. And I remember being at some of the 12 00:01:03,000 --> 00:01:07,000 first conversations. I was at the time finishing off my 13 00:01:07,000 --> 00:01:11,000 graduate work. I remember being at some of the 14 00:01:11,000 --> 00:01:15,000 first conversations about the Human Genome Project where the goal was to 15 00:01:15,000 --> 00:01:19,000 sequence the human genome, identify all the genes and DNA, 16 00:01:19,000 --> 00:01:23,000 and then use them to do a bunch of things that I'll talk about in a 17 00:01:23,000 --> 00:01:27,000 moment. And at the time, I have to say, it seemed like a 18 00:01:27,000 --> 00:01:31,000 really stupid idea. It was very difficult to determine 19 00:01:31,000 --> 00:01:35,000 the sequence of DNA. It took hours and days and days to 20 00:01:35,000 --> 00:01:40,000 read even a thousand base pairs. And, as you know, we have more than 21 00:01:40,000 --> 00:01:44,000 ten to the ninth base pairs. 
So the notion of sequencing the 22 00:01:44,000 --> 00:01:48,000 entire human genome seemed incredibly expensive and incredibly 23 00:01:48,000 --> 00:01:53,000 stupid. But that was a reflection, I think, of my naivety. And, in 24 00:01:53,000 --> 00:01:57,000 fact, it's been a very useful exercise. Sequencing has gotten 25 00:01:57,000 --> 00:02:02,000 better, largely because it had to in order for this project to succeed. 26 00:02:02,000 --> 00:02:05,000 And I think there's a real lesson there. If something has to happen, 27 00:02:05,000 --> 00:02:09,000 if you have to get a project done, there are people, 28 00:02:09,000 --> 00:02:13,000 you guys, who can make techniques better and get things done. 29 00:02:13,000 --> 00:02:17,000 So DNA sequencing is much, much orders of magnitude faster than it 30 00:02:17,000 --> 00:02:21,000 was ten years ago. And, in fact, this project was 31 00:02:21,000 --> 00:02:25,000 initiated in 1990. The sequencing, per se, 32 00:02:25,000 --> 00:02:29,000 was completed a couple of years ago. But, still, there are many people 33 00:02:29,000 --> 00:02:33,000 who are looking at the sequence and trying to figure out what it means. 34 00:02:33,000 --> 00:02:37,000 Because, as you remember from everything we've talked about, 35 00:02:37,000 --> 00:02:41,000 DNA sequence is a code. And even if you get three times ten to the ninth 36 00:02:41,000 --> 00:02:45,000 base pairs of the human genome, all you've gotten is a code. And 37 00:02:45,000 --> 00:02:49,000 now you have to crack the code. And we know how to crack the code 38 00:02:49,000 --> 00:02:53,000 kind of, that's what we've been talking about over and over. 39 00:02:53,000 --> 00:02:57,000 What does a promoter look like? What does an RNA look like? What 40 00:02:57,000 --> 00:03:02,000 does a coding region look like? What are the signals in a coding 41 00:03:02,000 --> 00:03:06,000 region that tells a protein synthesis to begin and to end? 
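One way to picture "cracking the code" is a toy scan for the begin and end signals of a coding region. This is only a sketch under simplifying assumptions that are not from the lecture: it looks at one strand, treats ATG as the only start signal and TAA, TAG, TGA as the stop signals, and ignores introns and promoters, which is exactly why real gene finding is much harder. The function name `find_orfs` and the `min_codons` parameter are made up for illustration.

```python
# Toy open-reading-frame (ORF) scan: illustrative only, not a real gene finder.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of ORFs on the given strand.

    An ORF here is an ATG followed, in the same reading frame, by a stop
    codon. `end` is the index just past the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # three possible reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == START:
                j = i + 3
                # Walk codon by codon until a stop codon or the end of the sequence.
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOPS:
                    j += 3
                found_stop = j + 3 <= len(dna)
                if found_stop and (j - i) // 3 >= min_codons:
                    orfs.append((i, j + 3))
                    i = j + 3                   # resume after the stop codon
                    continue
            i += 3
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
```

In this hypothetical sequence the only ORF starts at the ATG at index 2 and ends just past the TAG.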
42 00:03:06,000 --> 00:03:11,000 But when you've just given the, when you're given this huge mass of 43 00:03:11,000 --> 00:03:15,000 information that contains maybe 5% genes and 95% other stuff, 44 00:03:15,000 --> 00:03:19,000 to actually find the genes in the human genome from all the sequence 45 00:03:19,000 --> 00:03:24,000 data is not trivial. And so there is still analysis 46 00:03:24,000 --> 00:03:28,000 going on trying to figure out the identity of a bunch of genes and 47 00:03:28,000 --> 00:03:33,000 indeed the gene number. And you may notice every now and 48 00:03:33,000 --> 00:03:37,000 then revised estimates for gene numbers in the human genome, 49 00:03:37,000 --> 00:03:41,000 and it hovers somewhere around 20,000 to 30,000. Before the sequence 50 00:03:41,000 --> 00:03:45,000 was obtained, it was thought that there were at least 100,000 51 00:03:45,000 --> 00:03:49,000 distinct genes in the genome. The number came down and down and 52 00:03:49,000 --> 00:03:53,000 down. And now we think there are somewhere around 25,000 53 00:03:53,000 --> 00:03:57,000 genes. That's the latest estimate. It doesn't really matter. 54 00:03:57,000 --> 00:04:01,000 Give or take a few thousand. But still the Human Genome Project 55 00:04:01,000 --> 00:04:05,000 is not complete. But it's complete enough that many 56 00:04:05,000 --> 00:04:09,000 people, including Professor Lander who is here at MIT who is over at 57 00:04:09,000 --> 00:04:13,000 the Broad Institute that's being built across Main Street. 58 00:04:13,000 --> 00:04:17,000 If you guys walk up towards the Stata Center and look across Main 59 00:04:17,000 --> 00:04:21,000 Street, there's a new building going up. That is the Broad Institute 60 00:04:21,000 --> 00:04:25,000 that is being organized by Professor Lander who was instrumental, 61 00:04:25,000 --> 00:04:30,000 one of the pivotal people in sequencing the human genome. 
62 00:04:30,000 --> 00:04:36,000 And he and others are now doing the takeoff from the Human Genome 63 00:04:36,000 --> 00:04:42,000 Project, and it goes like this. Basically describe everything else 64 00:04:42,000 --> 00:04:48,000 about molecular biology. And it's a daunting list of what 65 00:04:48,000 --> 00:04:54,000 people want to do. Find all the RNAs, 66 00:04:54,000 --> 00:05:00,000 all the proteins in every cell type at every time during a cell's life. 67 00:05:00,000 --> 00:05:04,000 Figure out all the DNA-protein interactions, so all the 68 00:05:04,000 --> 00:05:08,000 transcription factors that bind to DNA. Figure out all the proteins 69 00:05:08,000 --> 00:05:12,000 that bind to RNA and might regulate their translation or might regulate 70 00:05:12,000 --> 00:05:16,000 their stability. Figure out all the protein-protein 71 00:05:16,000 --> 00:05:20,000 interactions, all those enzyme complexes, all those proteins that 72 00:05:20,000 --> 00:05:24,000 interact in all those signal transduction cascades you've been 73 00:05:24,000 --> 00:05:29,000 talking about. We have no idea what all the 74 00:05:29,000 --> 00:05:33,000 protein-protein interactions that go on in every cell at every point in a 75 00:05:33,000 --> 00:05:37,000 cell's life are. All the signal transduction events. 76 00:05:37,000 --> 00:05:41,000 All the regulatory circuits. I'll talk more about that in a moment. 77 00:05:41,000 --> 00:05:45,000 All gene function. All diseased genes. This is an enormous list. 78 00:05:45,000 --> 00:05:49,000 It's going to take decades of many people to get through this list and 79 00:05:49,000 --> 00:05:53,000 get all this information. And, of course, in the end this is 80 00:05:53,000 --> 00:05:58,000 just information. 
And you have to do something with it 81 00:05:58,000 --> 00:06:02,000 and put it together so that you do land up with understanding gene 82 00:06:02,000 --> 00:06:06,000 function and being able to build circuits of the kind that 83 00:06:06,000 --> 00:06:11,000 bioengineers like to do. One thing I want to point out for 84 00:06:11,000 --> 00:06:15,000 those of you who are interested in computer science. 85 00:06:15,000 --> 00:06:20,000 One of the things that has come out of the Human Genome Project is a lot 86 00:06:20,000 --> 00:06:24,000 of data, but it's actually not that much data. It's a few terabytes. 87 00:06:24,000 --> 00:06:28,000 OK? So 80,000 CDs will store all the information coming from the 88 00:06:28,000 --> 00:06:33,000 Human Genome Project. But that's just DNA sequence. 89 00:06:33,000 --> 00:06:37,000 OK? If you're starting to look at protein-protein interaction, 90 00:06:37,000 --> 00:06:41,000 all the RNAs, everything that I just went through on that list, 91 00:06:41,000 --> 00:06:45,000 we're talking about billions and billions and probably trillions of 92 00:06:45,000 --> 00:06:49,000 terabytes. Where is that information going to go? 93 00:06:49,000 --> 00:06:53,000 Is there a good way to store the information now? 94 00:06:53,000 --> 00:06:57,000 There probably ought to be some real reevaluation of data storage. 95 00:06:57,000 --> 00:07:01,000 And there is. There is some interesting work being done to try 96 00:07:01,000 --> 00:07:05,000 to figure out how to store and how to access the information that's 97 00:07:05,000 --> 00:07:09,000 going to come out of the follow-up of the Human Genome Project. 98 00:07:09,000 --> 00:07:13,000 How do you find proteins that are present in all cells at different 99 00:07:13,000 --> 00:07:17,000 times in a cell life? So here's a piece of real data. 
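As a rough back-of-the-envelope check on the storage point, the finished sequence on its own is small. The sketch below assumes each base (A, C, G or T) is packed into 2 bits, which is an assumption for illustration and not how the project actually stored its data; raw reads, quality scores and annotation are what would push the total toward the terabyte range mentioned above.

```python
BASES_IN_GENOME = 3e9   # "three times ten to the ninth base pairs", as quoted in the lecture
BITS_PER_BASE = 2       # assumption: four possible bases packed into 2 bits each


def genome_megabytes(bases=BASES_IN_GENOME, bits_per_base=BITS_PER_BASE):
    """Size of a bare base sequence in megabytes (1 MB = 10**6 bytes)."""
    return bases * bits_per_base / 8 / 1e6


print(genome_megabytes())  # 750.0
```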
100 00:07:17,000 --> 00:07:21,000 In the study of proteomics the notion is to look for proteins that 101 00:07:21,000 --> 00:07:25,000 are present in one cell type and not in another cell type. 102 00:07:25,000 --> 00:07:29,000 This is a technique that you know, gel electrophoresis. It's called 103 00:07:29,000 --> 00:07:34,000 2-dimensional gel electrophoresis. In the first dimension you separate 104 00:07:34,000 --> 00:07:38,000 proteins by charge. And then you actually turn your gel 105 00:07:38,000 --> 00:07:43,000 around, rerun it and separate proteins according to their size. 106 00:07:43,000 --> 00:07:47,000 And what you get are a constellation of spots, 107 00:07:47,000 --> 00:07:52,000 each of which represents a protein. And you can look at the spectrum of 108 00:07:52,000 --> 00:07:56,000 proteins from one cell type and from another cell type and ask what's 109 00:07:56,000 --> 00:08:01,000 different and what's similar between the two cell types. 110 00:08:01,000 --> 00:08:04,000 So, for example, this arrow up here. 111 00:08:04,000 --> 00:08:08,000 Actually, let's look at this one. In this cell type one there's one, 112 00:08:08,000 --> 00:08:11,000 two, three spots that are in the circle and an arrow pointing to 113 00:08:11,000 --> 00:08:15,000 nothing. If you look at cell type two, here are one, 114 00:08:15,000 --> 00:08:19,000 two, three, the same spots. And here one, two, three. And here 115 00:08:19,000 --> 00:08:22,000 the arrow is pointing to another spot which is a protein that's 116 00:08:22,000 --> 00:08:26,000 present in cell type two and not cell type one. 117 00:08:26,000 --> 00:08:30,000 And this kind of method is the way that people are figuring out which 118 00:08:30,000 --> 00:08:34,000 proteins are present in which cell type. 
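The comparison between the two gels amounts to a set comparison. The sketch below assumes the hard part is already done, namely that spots on the two gels have been matched and given shared labels; the labels `spot_a` to `spot_d` and the function `compare_spots` are hypothetical and only mirror the "one, two, three spots plus one extra" example.

```python
def compare_spots(cell_type_1, cell_type_2):
    """Split matched gel spots into shared and cell-type-specific sets."""
    shared = cell_type_1 & cell_type_2
    only_1 = cell_type_1 - cell_type_2
    only_2 = cell_type_2 - cell_type_1
    return shared, only_1, only_2


# Hypothetical labels: three spots seen on both gels, one extra on gel two.
type_1 = {"spot_a", "spot_b", "spot_c"}
type_2 = {"spot_a", "spot_b", "spot_c", "spot_d"}

shared, only_1, only_2 = compare_spots(type_1, type_2)
print(sorted(shared), sorted(only_1), sorted(only_2))
```

The spot that appears only in the second set is the candidate one would cut out and identify.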
119 00:08:34,000 --> 00:08:38,000 What you can do now is to actually cut this little spot out of the gel 120 00:08:38,000 --> 00:08:42,000 of cell type two, put it through the mass spec and 121 00:08:42,000 --> 00:08:47,000 figure out the identity of that protein. So you can do this stuff. 122 00:08:47,000 --> 00:08:51,000 It's just a lot of work. And there are more sophisticated methods than 123 00:08:51,000 --> 00:08:56,000 this to go about finding all the proteins, but basically you have to 124 00:08:56,000 --> 00:09:00,000 look and you have to identify the protein. And then you have to store 125 00:09:00,000 --> 00:09:05,000 that data and use it somehow. Here's something else that's being 126 00:09:05,000 --> 00:09:10,000 done by Professor Young at MIT. Professor Young is trying to figure 127 00:09:10,000 --> 00:09:14,000 out all the regulatory networks between all the genes in yeast. 128 00:09:14,000 --> 00:09:19,000 So yeast is a small organism. It has just a few thousand genes. 129 00:09:19,000 --> 00:09:24,000 And it has actually just a few hundred transcription factors. 130 00:09:24,000 --> 00:09:29,000 And their names are arrayed around the outside of the circle. 131 00:09:29,000 --> 00:09:33,000 And what he's done, using various techniques, 132 00:09:33,000 --> 00:09:38,000 is to figure out which transcription factor activates the expression or 133 00:09:38,000 --> 00:09:43,000 changes the activity of which other transcription factor. 134 00:09:43,000 --> 00:09:48,000 And so every arrow indicates that there is some kind of interaction 135 00:09:48,000 --> 00:09:53,000 between these different transcription factors. 136 00:09:53,000 --> 00:09:58,000 And this gives a kind of regulatory network of the circuitry involved in 137 00:09:58,000 --> 00:10:03,000 controlling yeast transcription. Now, yeast is a single cell with 138 00:10:03,000 --> 00:10:07,000 very few genes. 
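A regulatory map of this kind can be thought of as a directed graph, with one arrow per "factor X acts on factor Y" observation. The sketch below is a minimal illustration and not Professor Young's method or data: the factor names `TF_A` to `TF_D` and the helper functions are invented, and the edges carry no information about activation versus repression.

```python
from collections import defaultdict


def build_network(edges):
    """Map each regulator to the set of factors it acts on."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    return targets


def downstream(network, start):
    """All factors reachable from `start` by following arrows."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in network.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical arrows, including a feedback loop A -> B -> C -> A.
edges = [("TF_A", "TF_B"), ("TF_B", "TF_C"), ("TF_C", "TF_A"), ("TF_D", "TF_B")]
net = build_network(edges)
print(sorted(downstream(net, "TF_D")))
```

Even in this four-node toy, a single input reaches every factor in the loop, which hints at why maps for organisms with many more genes become hard to read by eye.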
We are, as you know, 139 00:10:07,000 --> 00:10:11,000 multicellular organisms with many genes. And so the regulatory maps 140 00:10:11,000 --> 00:10:16,000 for humans are going to look many, many orders of magnitude more 141 00:10:16,000 --> 00:10:20,000 complex than this one. That's where we're going. 142 00:10:20,000 --> 00:10:24,000 And part of going there is using computational biology. 143 00:10:24,000 --> 00:10:29,000 One of the things that there is focus on in a number of departments 144 00:10:29,000 --> 00:10:33,000 at MIT, including Course 7, is the question of computational 145 00:10:33,000 --> 00:10:38,000 biology or that include systems biology. 146 00:10:38,000 --> 00:10:42,000 And how can you use computational methods to work together with real 147 00:10:42,000 --> 00:10:47,000 data to predict these circuits, to describe these very, very complex 148 00:10:47,000 --> 00:10:52,000 circuits, to describe the circuit of life? And I can tell you something. 149 00:10:52,000 --> 00:10:56,000 It sounds as though it's a doable task, and in theory it is, 150 00:10:56,000 --> 00:11:01,000 but actually we are not able to describe the circuit of life for 151 00:11:01,000 --> 00:11:06,000 even the simple viruses. So there is a virus called phage 152 00:11:06,000 --> 00:11:10,000 Lambda that's been mentioned to you. It has been studied for many, many, 153 00:11:10,000 --> 00:11:15,000 many decades. And we know about its lifecycle in great detail. 154 00:11:15,000 --> 00:11:19,000 It doesn't have that many genes. We know which genes turn on and off. 155 00:11:19,000 --> 00:11:24,000 And yet we still don't have a completely reliable computer model 156 00:11:24,000 --> 00:11:29,000 of how this phage responds to various environmental inputs. 157 00:11:29,000 --> 00:11:32,000 We don't quite know when it's going to lyse the cell or when it's going 158 00:11:32,000 --> 00:11:36,000 to incorporate into the bacterial cell chromosome. 
159 00:11:36,000 --> 00:11:40,000 We don't even have a complete computational description for a 160 00:11:40,000 --> 00:11:43,000 simple virus. So to get it for a cell is a daunting task. 161 00:11:43,000 --> 00:11:47,000 And this is where computational biology is going to have to work 162 00:11:47,000 --> 00:11:51,000 with the real data and where you guys come in to try to bring things 163 00:11:51,000 --> 00:11:55,000 together so we can actually get reasonable equations of life. 164 00:11:55,000 --> 00:11:59,000 Here's an equation that I took from one of my colleagues, 165 00:11:59,000 --> 00:12:03,000 Professor Eric Davidson who is at Caltech, who has been working with 166 00:12:03,000 --> 00:12:07,000 someone else to look at one of the regulatory circuits in drosophila. 167 00:12:07,000 --> 00:12:11,000 And again this is just the tip of the iceberg of gene interactions. 168 00:12:11,000 --> 00:12:15,000 You can look at this on the PowerPoint later. 169 00:12:15,000 --> 00:12:19,000 These are gene interactions, and this is just a little bit of the 170 00:12:19,000 --> 00:12:23,000 circuitry that sets up a little bit of the body plan in the fruit fly 171 00:12:23,000 --> 00:12:28,000 drosophila. Here's another frontier of biology. Imaging. 172 00:12:28,000 --> 00:12:31,000 Imaging in biology is fantastic right now. So we can do stuff like 173 00:12:31,000 --> 00:12:35,000 look at fish that has got its red blood cells fluorescently labeled. 174 00:12:35,000 --> 00:12:39,000 And we can actually see in real-time the movement of the red 175 00:12:39,000 --> 00:12:43,000 blood cells through the different parts of the body. 176 00:12:43,000 --> 00:12:46,000 We can put various drugs on the fish. 
We can use various fish 177 00:12:46,000 --> 00:12:50,000 mutants that are defective in components of the extracellular 178 00:12:50,000 --> 00:12:54,000 matrix, for example, or some other aspect of the animal 179 00:12:54,000 --> 00:12:58,000 that might control red blood cell movement or function. 180 00:12:58,000 --> 00:13:01,000 And we can look in real-time at what happens to the animal. 181 00:13:01,000 --> 00:13:05,000 This works great in fish, but it's only the beginning because 182 00:13:05,000 --> 00:13:09,000 there are things we still cannot see clearly enough, 183 00:13:09,000 --> 00:13:13,000 even in fish which are transparent. So these methods work well. The 184 00:13:13,000 --> 00:13:17,000 challenges become very great in mammals where development occurs 185 00:13:17,000 --> 00:13:21,000 inside the mother and where the animal is opaque. 186 00:13:21,000 --> 00:13:25,000 And how do you actually follow single cells through the animal as 187 00:13:25,000 --> 00:13:29,000 they're doing whatever they're doing? 188 00:13:29,000 --> 00:13:32,000 So, for example, if one wants to know what happens to 189 00:13:32,000 --> 00:13:35,000 a cancer cell when it's introduced into an animal, 190 00:13:35,000 --> 00:13:38,000 does it go directly to the place where it's going to make a tumor or 191 00:13:38,000 --> 00:13:41,000 does it wander around the body until it actually finds where it's going? 192 00:13:41,000 --> 00:13:45,000 You have to be able to image single cells in a very profound way. 193 00:13:45,000 --> 00:13:48,000 And this is one of the current frontiers of biology. 194 00:13:48,000 --> 00:13:51,000 But then you can go deeper down into the cell. 195 00:13:51,000 --> 00:13:54,000 You can expand that by a few orders of magnitude and say, 196 00:13:54,000 --> 00:13:58,000 well, it's not just looking at the outside of the cell. 
197 00:13:58,000 --> 00:14:02,000 You really want to be looking inside the cell in real-time to see 198 00:14:02,000 --> 00:14:06,000 proteins interacting, to see transcription happening in 199 00:14:06,000 --> 00:14:10,000 real-time in the cell. It's not quite clear how to do that 200 00:14:10,000 --> 00:14:14,000 right now. It's a combination of physics. So if you're thinking of a 201 00:14:14,000 --> 00:14:18,000 Course 8.0 major, this is something you might think 202 00:14:18,000 --> 00:14:22,000 about. It's a combination of physics and biology. 203 00:14:22,000 --> 00:14:26,000 How do you get imaging on a resolution high enough that you can 204 00:14:26,000 --> 00:14:30,000 look in real-time at these events that are occurring in a cell? 205 00:14:30,000 --> 00:14:33,000 Fascinating problem. Neurobiology we touched on. 206 00:14:33,000 --> 00:14:37,000 Where is neurobiology going? Well, lots of places. How do you 207 00:14:37,000 --> 00:14:41,000 make the brain? We have no idea how you construct 208 00:14:41,000 --> 00:14:45,000 the 3-dimensional brain. And we don't understand the 209 00:14:45,000 --> 00:14:49,000 significance of the 3-dimensional structure of the brain. 210 00:14:49,000 --> 00:14:53,000 In neurobiology we talked about the circuitry in the brain and about the 211 00:14:53,000 --> 00:14:57,000 billions and billions of circuits that there are in the brain, 212 00:14:57,000 --> 00:15:01,000 probably ten to the fifteenth circuits within the brain itself. 213 00:15:01,000 --> 00:15:05,000 How on earth are we going to actually figure out that circuitry? 214 00:15:05,000 --> 00:15:09,000 I have no idea. I have no idea what we can do in the mammalian 215 00:15:09,000 --> 00:15:13,000 brain. We cannot even do it properly in something fairly simple 216 00:15:13,000 --> 00:15:18,000 like the fruit fly, so how are we going to do it in the 217 00:15:18,000 --> 00:15:22,000 human brain? 
This is a real frontier of biology that, 218 00:15:22,000 --> 00:15:26,000 again, brings together multiple disciplines; physics, 219 00:15:26,000 --> 00:15:31,000 biology, brain and cognitive science. What's the molecular basis for 220 00:15:31,000 --> 00:15:35,000 thought and how can we think about ourselves thinking about ourselves 221 00:15:35,000 --> 00:15:40,000 thinking about ourselves? What does that mean? OK? 222 00:15:40,000 --> 00:15:44,000 You've had one test on action potentials. You know about synapses. 223 00:15:44,000 --> 00:15:48,000 It's got something to do with channels and synapses, 224 00:15:48,000 --> 00:15:53,000 right? But what goes beyond there? How does it come back to something 225 00:15:53,000 --> 00:15:57,000 that allows us to think in such complex ways? Here's one. 226 00:15:57,000 --> 00:16:02,000 Why do we sleep? Simple question. 227 00:16:02,000 --> 00:16:06,000 We sleep. You have to sleep. If you don't sleep, this is 228 00:16:06,000 --> 00:16:10,000 something for you guys to bear in mind. If you don't sleep, 229 00:16:10,000 --> 00:16:14,000 after two weeks you will drop dead. That is true of rats. If you 230 00:16:14,000 --> 00:16:18,000 prevent a rat sleeping for two weeks it drops dead. 231 00:16:18,000 --> 00:16:22,000 There is something that happens during sleep, and it's not clear 232 00:16:22,000 --> 00:16:26,000 what, it's really not clear what. It's thought that it might be some 233 00:16:26,000 --> 00:16:30,000 kind of metabolic restoration of the brain that maybe you run out of some 234 00:16:30,000 --> 00:16:34,000 essential components that you need in order to get normal circuitry. 235 00:16:34,000 --> 00:16:37,000 But you literally have to sleep or you will die. But we don't know why. 236 00:16:37,000 --> 00:16:40,000 OK? So that's a frontier that is particularly interesting. 237 00:16:40,000 --> 00:16:44,000 OK. This is work I wanted to show you from my own laboratory. 
238 00:16:44,000 --> 00:16:47,000 We're interested in the 3-dimensional structure of the brain 239 00:16:47,000 --> 00:16:51,000 and why you have a 3-dimensional structure and how you make the 240 00:16:51,000 --> 00:16:54,000 3-dimensional structure. We're looking in the zebra fish. 241 00:16:54,000 --> 00:16:57,000 This is a normal zebra fish brain. And you can see it's got these 242 00:16:57,000 --> 00:17:01,000 three red regions, which are actually cavities in the 243 00:17:01,000 --> 00:17:04,000 brain. And then this is a whole series of 244 00:17:04,000 --> 00:17:08,000 mutant fish we've isolated that have got really messed up brains. 245 00:17:08,000 --> 00:17:12,000 They turn out to be really messed up animals. They've got abnormal 246 00:17:12,000 --> 00:17:16,000 behavior, their neurons grow in the wrong place, and there's something 247 00:17:16,000 --> 00:17:20,000 profoundly wrong with both the architecture of the brain and the 248 00:17:20,000 --> 00:17:24,000 function of the brain. This is one kind of approach we can 249 00:17:24,000 --> 00:17:28,000 take but, in fact this is asking a rather simple question. 250 00:17:28,000 --> 00:17:32,000 It's not asking the question of how the zebra fish thinks about 251 00:17:32,000 --> 00:17:36,000 itself, if it does. OK. Oh, so what I'd like to do in 252 00:17:36,000 --> 00:17:40,000 the last lecture also is to point you in the direction of relevant 253 00:17:40,000 --> 00:17:44,000 movies, some of which you'll have heard of and some of which you won't. 254 00:17:44,000 --> 00:17:48,000 This is one you've probably heard of because it's a new one. 255 00:17:48,000 --> 00:17:52,000 So I particularly liked this movie with respect to neurobiology because, 256 00:17:52,000 --> 00:17:56,000 actually, I didn't like this movie. 
I thought it was a really 257 00:17:56,000 --> 00:18:00,000 depressing movie, but the part that I thought was 258 00:18:00,000 --> 00:18:04,000 really relevant to this class is that there is a company called 259 00:18:04,000 --> 00:18:08,000 Lacuna Incorporated that will go in and selectively erase memories from 260 00:18:08,000 --> 00:18:12,000 your memory banks. And they actually can plug in, 261 00:18:12,000 --> 00:18:15,000 you know, they put a thing on your head with electrodes coming out. 262 00:18:15,000 --> 00:18:19,000 And then they have a TV monitor. And they can actually see the 263 00:18:19,000 --> 00:18:22,000 circuits that correspond to a particular memory. 264 00:18:22,000 --> 00:18:25,000 And then they hit the erase button or the delete button and that memory 265 00:18:25,000 --> 00:18:29,000 goes. And this movie is about this guy, you look in so 266 00:18:29,000 --> 00:18:32,000 pain, Dr. Gardel. This movie is about this guy, 267 00:18:32,000 --> 00:18:36,000 Jim Carrey, who is trying not to be erased. Anyway, 268 00:18:36,000 --> 00:18:39,000 it's very interesting because I thought, gee, will there ever come a 269 00:18:39,000 --> 00:18:43,000 time when we actually can have a TV screen and we actually can see the 270 00:18:43,000 --> 00:18:47,000 circuits that correspond to a particular memory? 271 00:18:47,000 --> 00:18:50,000 So see it if for no other reason. OK, here we go, basic understanding. 272 00:18:50,000 --> 00:18:54,000 Something we have not talked much about in this course but is really 273 00:18:54,000 --> 00:18:58,000 very important with regard to the future of biology is evolution. 274 00:18:58,000 --> 00:19:02,000 What does evolution mean, especially in molecular terms? 275 00:19:02,000 --> 00:19:06,000 And I actually wanted to throw this out at you because this is in the 276 00:19:06,000 --> 00:19:10,000 news presently, this term "intelligent design" and 277 00:19:10,000 --> 00:19:14,000 the contrast to evolution. 
I think that you guys, even now but 278 00:19:14,000 --> 00:19:18,000 certainly as you develop and go through MIT, really become 279 00:19:18,000 --> 00:19:22,000 spokespeople for science and become commentators on current issues in 280 00:19:22,000 --> 00:19:26,000 science. And I think you really should be aware of some current 281 00:19:26,000 --> 00:19:31,000 issues, so I'm throwing this out at you to increase your awareness. 282 00:19:31,000 --> 00:19:34,000 There is a term floating around called "intelligent design" which is 283 00:19:34,000 --> 00:19:38,000 sort of, I would say, an extension of creationism where 284 00:19:38,000 --> 00:19:42,000 the sense is that things are just so complex and so interesting and seem 285 00:19:42,000 --> 00:19:46,000 to be so well designed that how could this have happened by the 286 00:19:46,000 --> 00:19:50,000 process of evolution? And so if you look in the news, 287 00:19:50,000 --> 00:19:54,000 if you do a Google news or if you just look in the newspapers, 288 00:19:54,000 --> 00:19:58,000 you'll see there are raging controversies about the notion of 289 00:19:58,000 --> 00:20:02,000 intelligent design versus evolution around the country. 290 00:20:02,000 --> 00:20:06,000 And indeed evolution is complex, and we cannot explain how everything 291 00:20:06,000 --> 00:20:10,000 occurs. This is a picture of Darwin's finches, 292 00:20:10,000 --> 00:20:14,000 the thing that got him thinking about evolution. 293 00:20:14,000 --> 00:20:19,000 These finches live in the Galapagos Islands and are believed 294 00:20:19,000 --> 00:20:23,000 to have arisen from a single pair of finches that the wind blew astray 295 00:20:23,000 --> 00:20:27,000 about 100,000 years ago. And that turned into a bunch of 296 00:20:27,000 --> 00:20:31,000 different species that can be picked out by their head shape 297 00:20:31,000 --> 00:20:36,000 and their beak size. 
And it's really not clear how you 298 00:20:36,000 --> 00:20:41,000 actually got this set of different beak shapes and head size. 299 00:20:41,000 --> 00:20:46,000 The sense of selection for particular beaks that allowed the 300 00:20:46,000 --> 00:20:52,000 birds to eat particular foods and so on is a very compelling one. 301 00:20:52,000 --> 00:20:57,000 And there certainly is no doubt in my mind, or I would say in most 302 00:20:57,000 --> 00:21:02,000 people's mind who work in biology, that natural selection and evolution 303 00:21:02,000 --> 00:21:07,000 is the way to go. But I want to raise with you an 304 00:21:07,000 --> 00:21:11,000 interesting question and then I want to tell you about a new paper that I 305 00:21:11,000 --> 00:21:16,000 read concerning natural selection. So natural selection leading to 306 00:21:16,000 --> 00:21:20,000 evolution is thought to act on three different kinds of changes in DNA. 307 00:21:20,000 --> 00:21:25,000 Single-base mutations, you remember those, 308 00:21:25,000 --> 00:21:30,000 frame shifts, missense, nonsense mutations and so on. 309 00:21:30,000 --> 00:21:34,000 Cis-regulatory mutations, those refer to mutations in the 310 00:21:34,000 --> 00:21:38,000 promoter regions of genes. So those would change the 311 00:21:38,000 --> 00:21:42,000 transcription of a gene. A single-base mutation would change 312 00:21:42,000 --> 00:21:47,000 whether a protein is made and what the actual sequence of the protein 313 00:21:47,000 --> 00:21:51,000 is and therefore its potential function. The cis-regulatory 314 00:21:51,000 --> 00:21:55,000 mutations would change how much of a message was made, how much 315 00:21:55,000 --> 00:22:00,000 of a protein was made. And here's one that you've touched 316 00:22:00,000 --> 00:22:04,000 on a bit, but I want to touch on a bit more, which is the repeat number 317 00:22:04,000 --> 00:22:08,000 of motifs within one coding sequence. So what am I talking about? 
318 00:22:08,000 --> 00:22:12,000 Well, I'll tell you what I'm talking about. 319 00:22:12,000 --> 00:22:16,000 And I'll use it to describe the example of dog evolution. 320 00:22:16,000 --> 00:22:20,000 So dogs are really different from one another. They're 321 00:22:20,000 --> 00:22:24,000 extraordinarily different from one another. If you look at their size 322 00:22:24,000 --> 00:22:28,000 and their actual faces and the bones of their face, 323 00:22:28,000 --> 00:22:32,000 the shapes of the bones, the size of the snout and so on are 324 00:22:32,000 --> 00:22:36,000 really different. You know, not only are they cute, 325 00:22:36,000 --> 00:22:40,000 but they're really different from one another. OK. 326 00:22:40,000 --> 00:22:45,000 And if you actually look, over the past 150 years, there has 327 00:22:45,000 --> 00:22:49,000 been a huge increase in the number of breeds and a huge increase in the 328 00:22:49,000 --> 00:22:54,000 changes that you see in dog facial skeleton. Now, 329 00:22:54,000 --> 00:22:59,000 this bothers people who think about evolution. 330 00:22:59,000 --> 00:23:04,000 Because if you look at the number of single-base mutations that are 331 00:23:04,000 --> 00:23:09,000 around in coding sequences, it would not seem to be enough to 332 00:23:09,000 --> 00:23:14,000 accomplish these rapid changes in dog morphology. 333 00:23:14,000 --> 00:23:19,000 And so a very interesting paper came out last year that will lead to 334 00:23:19,000 --> 00:23:24,000 a conclusion I'll tell you in a moment. The conclusion has to do with 335 00:23:24,000 --> 00:23:30,000 variations in the number of motif repeats within a protein. 336 00:23:30,000 --> 00:23:34,000 So what am I talking about? So forget this. This is actually 337 00:23:34,000 --> 00:23:38,000 on your handout. 
So if you look at, 338 00:23:38,000 --> 00:23:43,000 I did give you a handout and I haven't been referring to it, 339 00:23:43,000 --> 00:23:47,000 but this in fact is number seven on your handout. So if you look at 340 00:23:47,000 --> 00:23:52,000 protein A and allele A of protein A or allele B of protein A, 341 00:23:52,000 --> 00:23:56,000 they may differ in the following way. In protein A there may be a small 342 00:23:56,000 --> 00:24:01,000 amino acid stretch that is repeated a couple of times. 343 00:24:01,000 --> 00:24:04,000 It could be directly contiguous or it could be a little far apart from 344 00:24:04,000 --> 00:24:08,000 each other. And then if you look at allele B of protein A, 345 00:24:08,000 --> 00:24:12,000 you might have five copies of that repeat sequence. 346 00:24:12,000 --> 00:24:16,000 And, in fact, that change in number of copies of a particular part of a 347 00:24:16,000 --> 00:24:20,000 protein can profoundly change the function of the protein. 348 00:24:20,000 --> 00:24:24,000 It can change conformation. It can change enzymatic activity. 349 00:24:24,000 --> 00:24:28,000 It can change localization in the cell. It can change stability of 350 00:24:28,000 --> 00:24:32,000 the protein and so on. These repeats and variation in the 351 00:24:32,000 --> 00:24:37,000 number of repeats are actually very easy to get. They are about 100,000 352 00:24:37,000 --> 00:24:42,000 times more frequent than point mutations if you look in genomes. 353 00:24:42,000 --> 00:24:46,000 And they occur during recombination where the DNA sequences might 354 00:24:46,000 --> 00:24:51,000 misalign with one another. And I'm not going to get into this 355 00:24:51,000 --> 00:24:56,000 now, but if you want to come ask me later I'll email you. We 356 00:24:56,000 --> 00:25:01,000 can go into this more.
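[Editor's sketch, not part of the lecture or the handout.] The difference between the two alleles described above can be made concrete by counting copies of a repeated amino acid stretch. The sequences, the motif "PQ", and the function names below are all invented for illustration; real repeat detection in genomes is considerably more involved.

```python
def count_copies(protein: str, motif: str) -> int:
    """Count non-overlapping copies of motif, whether contiguous or spaced apart."""
    if not motif:
        raise ValueError("motif must be non-empty")
    return protein.count(motif)


def longest_tandem_run(protein: str, motif: str) -> int:
    """Length, in copies, of the longest run of back-to-back motif copies."""
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(protein)):
        run = 0
        # Keep extending while another copy follows immediately.
        while protein.startswith(motif, start + run * len(motif)):
            run += 1
        best = max(best, run)
    return best


# Invented sequences: the same hypothetical protein with 2 or 5 copies of "PQ".
allele_a = "MKT" + "PQ" * 2 + "LLSV"
allele_b = "MKT" + "PQ" * 5 + "LLSV"

for name, seq in (("allele A", allele_a), ("allele B", allele_b)):
    print(name, count_copies(seq, "PQ"), longest_tandem_run(seq, "PQ"))
```

Here the two counts agree because the copies are contiguous; for an allele whose copies sit a little apart from each other, the total count would exceed the longest tandem run.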
But they occur because the DNA 357 00:25:01,000 --> 00:25:05,000 sequences don't quite align properly during recombination, 358 00:25:05,000 --> 00:25:10,000 and you get the protein changing a bit with respect to these repeat 359 00:25:10,000 --> 00:25:14,000 sequences. And so Fondon and Garner looked at 92 breeds of dogs, 360 00:25:14,000 --> 00:25:19,000 and they looked in 17 genes that they thought might be important in 361 00:25:19,000 --> 00:25:23,000 shaping the facial skeleton because we know what those genes are. 362 00:25:23,000 --> 00:25:28,000 And they found, very interestingly, that these 17 genes had 37 repeat 363 00:25:28,000 --> 00:25:32,000 regions amongst them which is actually much higher than you find 364 00:25:32,000 --> 00:25:37,000 in just your general spread of genes. 365 00:25:37,000 --> 00:25:41,000 And when they looked from breed to breed they found there was a huge 366 00:25:41,000 --> 00:25:45,000 variation in the numbers of repeats in different genes from one breed of 367 00:25:45,000 --> 00:25:50,000 dog to another breed of dog. Now, I don't know really what this 368 00:25:50,000 --> 00:25:54,000 means. It's very interesting potentially for looking at how 369 00:25:54,000 --> 00:25:59,000 breeds of dogs have evolved or how we have forced their evolution. 370 00:25:59,000 --> 00:26:03,000 Does this have something to do profoundly with evolution and 371 00:26:03,000 --> 00:26:07,000 changes in form in general? Don't know that. But it's 372 00:26:07,000 --> 00:26:11,000 something that you should be aware of as you go on because it's a 373 00:26:11,000 --> 00:26:16,000 slightly different way to think about how rapid evolution can occur. 374 00:26:16,000 --> 00:26:20,000 OK. Clinical understanding. Disease taxonomy, 375 00:26:20,000 --> 00:26:24,000 you've talked a bit about this in cancer.
Over the last few decades 376 00:26:24,000 --> 00:26:28,000 there has been an enormous increase in the number of genes that can be 377 00:26:28,000 --> 00:26:33,000 assigned to be associated with a particular disease. 378 00:26:33,000 --> 00:26:36,000 And my chart here only goes up to 2002. It would probably be 379 00:26:36,000 --> 00:26:40,000 somewhere out here on the roof for 2005. How do you do this? 380 00:26:40,000 --> 00:26:44,000 Well, this is where the Human Genome Project comes in. 381 00:26:44,000 --> 00:26:47,000 One can look. And each of these squares represents a gene and its 382 00:26:47,000 --> 00:26:51,000 expression, and the level of expression is proportional to the 383 00:26:51,000 --> 00:26:55,000 color or is associated with the color. It doesn't matter which. 384 00:26:55,000 --> 00:26:59,000 But you can look in different cancers. 385 00:26:59,000 --> 00:27:02,000 And each of these lines is a cancer. And you can look at different genes. 386 00:27:02,000 --> 00:27:06,000 And you can see that different tumors have got different patterns 387 00:27:06,000 --> 00:27:10,000 of gene expression. And you can use those patterns of 388 00:27:10,000 --> 00:27:14,000 gene expression to classify the tumors. And this has been done by 389 00:27:14,000 --> 00:27:18,000 Professor Lander and Dr. Golub over at the Genome Center. 390 00:27:18,000 --> 00:27:22,000 So, for example, in acute lymphocytic leukemia, 391 00:27:22,000 --> 00:27:26,000 you can see one pattern of gene expression. Again, 392 00:27:26,000 --> 00:27:30,000 each of these squares represents a gene and the color represents the 393 00:27:30,000 --> 00:27:33,000 level of expression of the gene. And you can see in acute myelogenous 394 00:27:33,000 --> 00:27:37,000 leukemia there's a completely different pattern of expression. 395 00:27:37,000 --> 00:27:41,000 This is fantastic because it starts to allow you to classify a disease 396 00:27:41,000 --> 00:27:44,000 in molecular detail.
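[Editor's sketch, not part of the lecture.] One simple way to picture classifying a tumor from its expression pattern is a nearest-centroid rule: average the known profiles for each disease, then assign a new sample to whichever average it sits closest to. This is only an illustration and is not claimed to be the method used at the Genome Center; the four-gene expression values, the labels' training profiles, and the function names are all made up.

```python
import math


def centroid(profiles):
    """Average expression level per gene across a list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(sample, training):
    """Return the label whose centroid is closest to the sample."""
    centroids = {label: centroid(profiles) for label, profiles in training.items()}
    return min(centroids, key=lambda label: distance(sample, centroids[label]))


# Made-up expression levels for four hypothetical genes per tumor sample.
training = {
    "ALL": [[9.0, 8.5, 1.0, 0.5], [8.0, 9.5, 1.5, 1.0]],
    "AML": [[1.0, 0.5, 9.0, 8.0], [1.5, 1.0, 8.5, 9.5]],
}

new_sample = [8.5, 9.0, 2.0, 1.0]
print(classify(new_sample, training))
```

With real data there would be thousands of genes per sample, and choosing which genes are informative is a large part of the work.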
The old way of pathologists looking 397 00:27:44,000 --> 00:27:48,000 at diseases, looking at cells and trying to classify both cancers and 398 00:27:48,000 --> 00:27:52,000 other disorders on the basis of morphology of cells and of staining 399 00:27:52,000 --> 00:27:56,000 of cells is actually not that precise. It's much better 400 00:27:56,000 --> 00:28:00,000 than nothing. But being able to do it at a 401 00:28:00,000 --> 00:28:05,000 molecular level really lets you know what disease you're dealing with and 402 00:28:05,000 --> 00:28:10,000 what spectrum of drugs might be appropriate to treat that disorder. 403 00:28:10,000 --> 00:28:14,000 And so that segues nicely into the future of prediction. 404 00:28:14,000 --> 00:28:19,000 What can we predict in biology? So here are a couple. Will your 405 00:28:19,000 --> 00:28:24,000 specific disorder respond to particular drugs? 406 00:28:24,000 --> 00:28:29,000 If you have acute lymphocytic leukemia, will it respond to a 407 00:28:29,000 --> 00:28:34,000 particular spectrum of drugs? And if you have a particular variant 408 00:28:34,000 --> 00:28:38,000 of acute lymphocytic leukemia, will it respond to particular 409 00:28:38,000 --> 00:28:43,000 variants of drugs? We're already on the cusp of 410 00:28:43,000 --> 00:28:47,000 classifying cancers in a way that you can give a particular spectrum 411 00:28:47,000 --> 00:28:51,000 of drugs for a particular kind of cancer. And this is really going to 412 00:28:51,000 --> 00:28:56,000 escalate to everything. There is almost no disorder that is 413 00:28:56,000 --> 00:29:00,000 treatable by medication where different people are not sensitive 414 00:29:00,000 --> 00:29:05,000 at different levels to a particular medication. 415 00:29:05,000 --> 00:29:08,000 So some people might respond very well and some people might respond 416 00:29:08,000 --> 00:29:12,000 very poorly, not just in cancer but in almost all disorders.
417 00:29:12,000 --> 00:29:15,000 In the future, and I think it's going to be in the near future, 418 00:29:15,000 --> 00:29:19,000 really in the next few years, I think it's going to be possible to 419 00:29:19,000 --> 00:29:22,000 say what specific disorder do you have and should you be taking this 420 00:29:22,000 --> 00:29:26,000 particular combination of drugs? And here's another one. Are you 421 00:29:26,000 --> 00:29:30,000 genetically predetermined to get a specific disease? 422 00:29:30,000 --> 00:29:34,000 That's a really tough one. Maybe you want to know. Maybe you 423 00:29:34,000 --> 00:29:38,000 don't want to know. We'll come to that in a moment. 424 00:29:38,000 --> 00:29:42,000 Here's a movie that I particularly liked that has to do with predicting 425 00:29:42,000 --> 00:29:46,000 who you're going to be and what you're going to get or not get. 426 00:29:46,000 --> 00:29:50,000 And it's called Gattaca. You guys may or may not have seen it, 427 00:29:50,000 --> 00:29:54,000 or used to see it long ago. Have you guys seen Gattaca? 428 00:29:54,000 --> 00:29:58,000 Yes. OK. Good. Fine. I'm not that far out. I was trying 429 00:29:58,000 --> 00:30:02,000 to gauge your level here. I liked that. I particularly liked 430 00:30:02,000 --> 00:30:07,000 "there is no gene for the human spirit". So Gattaca falls into the 431 00:30:07,000 --> 00:30:12,000 prediction aegis quite well. In the next few years, and I would say 432 00:30:12,000 --> 00:30:18,000 within ten years easy, you are going to be able to get 433 00:30:18,000 --> 00:30:23,000 your personal DNA profile, including information about the 434 00:30:23,000 --> 00:30:28,000 approximately 1,000 bad alleles of genes that we all 435 00:30:28,000 --> 00:30:32,000 carry. So that's good, I guess. And that's bad. 436 00:30:32,000 --> 00:30:35,000 Do you want this information? Do you want to know what you're 437 00:30:35,000 --> 00:30:38,000 going to get?
Do you want to know that you're going to get a 438 00:30:38,000 --> 00:30:41,000 neurological disease as you get older? Do you want to know that 439 00:30:41,000 --> 00:30:44,000 you're likely to have a heart attack before you're 50? 440 00:30:44,000 --> 00:30:47,000 Do you want others to have this information? Would you like your 441 00:30:47,000 --> 00:30:50,000 prospective partner to know that you're likely to get some horrible 442 00:30:50,000 --> 00:30:53,000 neurological disease? Would you like your insurance 443 00:30:53,000 --> 00:30:56,000 company to know this? Would you like your children to 444 00:30:56,000 --> 00:31:00,000 know it? I don't know. I don't know what the answer is. 445 00:31:00,000 --> 00:31:04,000 Myself, I prefer actually not to know and to go through life 446 00:31:04,000 --> 00:31:08,000 day-to-day having as good a time as I can and letting whatever, 447 00:31:08,000 --> 00:31:12,000 God or chance take care of the rest. But there is certainly merit in 448 00:31:12,000 --> 00:31:17,000 trying to prevent some things. And this is really going to be a 449 00:31:17,000 --> 00:31:21,000 reality very soon. Already, actually, 450 00:31:21,000 --> 00:31:25,000 you can get for your dog a DNA profile that's actually 451 00:31:25,000 --> 00:31:30,000 fairly detailed. It is RFLP mapping that we talked 452 00:31:30,000 --> 00:31:34,000 about in class where you can make sure that your dog is who you think 453 00:31:34,000 --> 00:31:38,000 it is and who its parents are and who it thinks it is. 454 00:31:38,000 --> 00:31:42,000 So you can get your own in the future. I don't know what they're 455 00:31:42,000 --> 00:31:46,000 going to call the American Kennel Club equivalent for humans, 456 00:31:46,000 --> 00:31:50,000 but you'll be able to get your certificate of DNA analysis. 457 00:31:50,000 --> 00:31:55,000 OK. Design. We talked long ago about rational drug design.
458 00:31:55,000 --> 00:31:58,000 And Gleevec is really one of the shining examples of being able to 459 00:31:58,000 --> 00:32:02,000 look at the structure of a protein and saying, hey, 460 00:32:02,000 --> 00:32:06,000 this protein is a bad protein, it's an abnormal protein, and it's 461 00:32:06,000 --> 00:32:10,000 causative of leukemia. And wouldn't it be great to inhibit 462 00:32:10,000 --> 00:32:13,000 its function? And so let's design the screen molecule which looks as 463 00:32:13,000 --> 00:32:17,000 though it will inhibit ATP binding and prevent this kinase from acting. 464 00:32:17,000 --> 00:32:21,000 And, in fact, Gleevec works really well that way. 465 00:32:21,000 --> 00:32:25,000 And this is something that is being, this approach is being very, 466 00:32:25,000 --> 00:32:29,000 very actively pursued, and will only be more actively pursued. 467 00:32:29,000 --> 00:32:33,000 And if you're thinking of a Course 5. major or if you're thinking of many 468 00:32:33,000 --> 00:32:37,000 of the engineering majors, you may well get into rational drug 469 00:32:37,000 --> 00:32:42,000 design and figuring out how to get it to work. Here's another one, 470 00:32:42,000 --> 00:32:46,000 prediction for the future and Jurassic Park. 471 00:32:46,000 --> 00:32:51,000 Xenotransplantation, using pigs that have immune systems 472 00:32:51,000 --> 00:32:55,000 engineered to look like the human immune system for organ transplants. 473 00:32:55,000 --> 00:33:00,000 There are companies trying to do this. 474 00:33:00,000 --> 00:33:04,000 Bionics. Here is something for all of you interested in mechanical 475 00:33:04,000 --> 00:33:08,000 engineering and other, bionics. Robocop. Here's a movie 476 00:33:08,000 --> 00:33:12,000 to see if you haven't. Artificial hearts. Artificial 477 00:33:12,000 --> 00:33:17,000 hearts are a disaster presently.
AbioCor has a heart now that is 478 00:33:17,000 --> 00:33:21,000 fully implantable and that will take over ventricular function, 479 00:33:21,000 --> 00:33:25,000 but it is a lousy heart. And if one of you could go and make a new heart 480 00:33:25,000 --> 00:33:29,000 that actually would work properly that would be a real service 481 00:33:29,000 --> 00:33:34,000 to humankind. Blood substitutes. 482 00:33:34,000 --> 00:33:38,000 We still do not have a blood substitute that really works. 483 00:33:38,000 --> 00:33:43,000 There is one patented. It's called Oxygent. It's a perfluorocarbon 484 00:33:43,000 --> 00:33:47,000 that will carry oxygen around the blood in the body for a while, 485 00:33:47,000 --> 00:33:52,000 but it's not very good. And finally aging. Here's a movie you probably 486 00:33:52,000 --> 00:33:56,000 haven't heard of, Zardoz. Zardoz, a great movie about 487 00:33:56,000 --> 00:34:01,000 a community where nobody aged and they were really, really unhappy. 488 00:34:01,000 --> 00:34:05,000 And then Sean Connery comes along and introduces this community that 489 00:34:05,000 --> 00:34:09,000 doesn't age and doesn't have sex and is just generally miserable to the 490 00:34:09,000 --> 00:34:13,000 joys of procreation. And they go for it. And then they 491 00:34:13,000 --> 00:34:18,000 age and they die happily ever after. So that's OK. And if you're 492 00:34:18,000 --> 00:34:22,000 looking for a summer book to read, Professor Guarente at MIT has 493 00:34:22,000 --> 00:34:26,000 written a great book about aging, which is what his research focuses 494 00:34:26,000 --> 00:34:31,000 on. And you might want to look at that. 495 00:34:31,000 --> 00:34:35,000 So the challenge I give to you is where are you going to come in? 496 00:34:35,000 --> 00:34:40,000 I wish you all the best of luck. And it's been a pleasure to teach 497 00:34:40,000 --> 00:34:43,000 you. [APPLAUSE]