1 00:00:01,000 --> 00:00:05,000 Hello, everybody. Can we get started? 2 00:00:05,000 --> 00:00:10,000 So my name is Andrew Chess. And I'm lecturing today replacing 3 00:00:10,000 --> 00:00:15,000 Eric Lander for the day. He had to be out of town, 4 00:00:15,000 --> 00:00:20,000 something that he could not reschedule. He really tries to 5 00:00:20,000 --> 00:00:25,000 arrange his very busy schedule so that he is here, 6 00:00:25,000 --> 00:00:30,000 but this was something that could not be rearranged. Anyway. 7 00:00:30,000 --> 00:00:34,000 So I am a professor at Harvard Medical School, 8 00:00:34,000 --> 00:00:38,000 but I have a long history here at MIT, including being on the faculty 9 00:00:38,000 --> 00:00:42,000 here for a number of years. I used to teach undergraduates. 10 00:00:42,000 --> 00:00:46,000 And even going back further than that, I actually took this class. 11 00:00:46,000 --> 00:00:50,000 It was called 7.01 without any extra number at that time. 12 00:00:50,000 --> 00:00:54,000 Now it is what, 7.012 or 7.013? 0-1-2. Anyway. So it was 13 00:00:54,000 --> 00:00:58,000 called 7.01. And it was an extremely interesting introduction 14 00:00:58,000 --> 00:01:01,000 to biology back then. It was some time over 20 years ago. 15 00:01:01,000 --> 00:01:05,000 I'm not sure exactly how many years. I kind of stopped counting at 20. 16 00:01:05,000 --> 00:01:09,000 So when Eric called me, when Professor Lander called me up to ask 17 00:01:09,000 --> 00:01:12,000 me to give this lecture, I talked to him for a while and I 18 00:01:12,000 --> 00:01:16,000 agreed to do it. And then I was thinking about what 19 00:01:16,000 --> 00:01:20,000 to present. And I went over the material that he has presented to you 20 00:01:20,000 --> 00:01:23,000 earlier this week. And I discussed with him some of 21 00:01:23,000 --> 00:01:27,000 the things that he likes to do for the third lecture on neurobiology.
22 00:01:27,000 --> 00:01:31,000 And I also thought about some of the things that I would like to do. 23 00:01:31,000 --> 00:01:34,000 And so one of the things that occurred to me, 24 00:01:34,000 --> 00:01:38,000 so it's probably occurred to you from hearing Eric talk about 25 00:01:38,000 --> 00:01:42,000 neurobiology earlier this week that he is very enthusiastic about the 26 00:01:42,000 --> 00:01:45,000 subject. He always talks about it as being one of the driving forces 27 00:01:45,000 --> 00:01:49,000 that led him to enter biology from the realm of mathematics where he 28 00:01:49,000 --> 00:01:53,000 started his academic career. Anyway. So he's always talked to 29 00:01:53,000 --> 00:01:57,000 me, in the years I've known him, about how he loves neurobiology. 30 00:01:57,000 --> 00:02:01,000 And I was thinking about it. It is, in some ways, 31 00:02:01,000 --> 00:02:05,000 kind of ironic that Eric loves neurobiology so much because in some 32 00:02:05,000 --> 00:02:09,000 ways he's been a big troublemaker for neurobiology. 33 00:02:09,000 --> 00:02:13,000 Let me explain. So Eric, Professor Lander, was, of course, 34 00:02:13,000 --> 00:02:17,000 as you all know, an instrumental driving force behind the sequencing 35 00:02:17,000 --> 00:02:21,000 of the human genome. Before all those efforts, 36 00:02:21,000 --> 00:02:25,000 over the last decade, people used to go around, biologists used to go 37 00:02:25,000 --> 00:02:30,000 around to each other and they would talk. 38 00:02:30,000 --> 00:02:33,000 And they would say there are around 100,000 genes in the human genome, 39 00:02:33,000 --> 00:02:36,000 or 100,000 genes in a mouse genome. Mammalian genomes have around 40 00:02:36,000 --> 00:02:39,000 100,000 genes. Sometimes some people would say 90,000 41 00:02:39,000 --> 00:02:42,000 genes. Sometimes they would say 110,000 genes. 42 00:02:42,000 --> 00:02:45,000 But generally it was around 100,000 genes.
And everybody was 43 00:02:45,000 --> 00:02:48,000 comfortable with that. In fact, right around the turn of 44 00:02:48,000 --> 00:02:52,000 the century when the stock market was going way up, 45 00:02:52,000 --> 00:02:55,000 the Internet bubble and biotech bubble and everything, 46 00:02:55,000 --> 00:02:58,000 estimates of the number of genes in a human genome actually 47 00:02:58,000 --> 00:03:02,000 went higher also. They went up as high as 120,000, 48 00:03:02,000 --> 00:03:06,000 150,000. And this, I think, was because various companies were 49 00:03:06,000 --> 00:03:10,000 competing to have the most genes on their kind of microarray or 50 00:03:10,000 --> 00:03:14,000 whatever they were selling. They wanted to say we have the most, 51 00:03:14,000 --> 00:03:18,000 and so they kept saying more. The academic scientists usually 52 00:03:18,000 --> 00:03:22,000 still stayed around 100,000 in terms of their thinking. 53 00:03:22,000 --> 00:03:26,000 OK. The other thing that was of a lot of 54 00:03:26,000 --> 00:03:30,000 use to neurobiologists in terms of them feeling that their problem of 55 00:03:30,000 --> 00:03:34,000 trying to figure out how the brain is set up was a tractable problem was 56 00:03:34,000 --> 00:03:38,000 that they thought there were going to be lots of genes available. 57 00:03:38,000 --> 00:03:42,000 So there were 100,000 genes in the genome, and around half of them, 58 00:03:42,000 --> 00:03:46,000 people would say, are probably brain-specific. 59 00:03:46,000 --> 00:03:50,000 So there were a variety of pieces of evidence that led people to think 60 00:03:50,000 --> 00:03:54,000 that a lot of the genes in the genome would be brain-specific. 61 00:03:54,000 --> 00:03:58,000 And that still remains the case.
But, as you know from Eric Lander's 62 00:03:58,000 --> 00:04:02,000 and other people's work sequencing the human genome, the mouse genome and 63 00:04:02,000 --> 00:04:06,000 other genomes, it now looks like mammals have only 64 00:04:06,000 --> 00:04:10,000 around 30,000 to 40,000 genes. Now, if Eric told you a different 65 00:04:10,000 --> 00:04:16,000 number, listen to his number. What did he say? [20,000 to 25,000?] 66 00:04:16,000 --> 00:04:22,000 OK. Anyway. So many fewer than 100,000. OK? 67 00:04:22,000 --> 00:04:28,000 A small number. A fly is thought to have only around 15,000 genes. 68 00:04:28,000 --> 00:04:31,000 So Lander is now saying that humans don't have that many more genes than 69 00:04:31,000 --> 00:04:34,000 flies. So this presented a problem for neurobiologists, 70 00:04:34,000 --> 00:04:37,000 because even if you have half of them brain-specific, 71 00:04:37,000 --> 00:04:41,000 you still don't have nearly as many genes to play with to make the 72 00:04:41,000 --> 00:04:44,000 complex structure of the brain as you did when there were 100,000 73 00:04:44,000 --> 00:04:47,000 genes in the genome. So the brain, of course, 74 00:04:47,000 --> 00:04:51,000 is an extremely complicated structure. There are thought to be 75 00:04:51,000 --> 00:04:54,000 somewhere between 100 billion and a trillion different neurons in the 76 00:04:54,000 --> 00:04:58,000 brain, and they also fall into many, many different neuronal types. 77 00:04:58,000 --> 00:05:01,000 And so developmental biology, which is something 78 00:05:01,000 --> 00:05:05,000 that some of you will study in future courses and you'll have had 79 00:05:05,000 --> 00:05:09,000 some introduction to here, is how you get from a single 80 00:05:09,000 --> 00:05:13,000 fertilized egg, a single cell, to the complex 81 00:05:13,000 --> 00:05:16,000 organism.
In the brain it's a particularly difficult problem 82 00:05:16,000 --> 00:05:20,000 because you have so many different types of cells and so many cells. 83 00:05:20,000 --> 00:05:24,000 And each cell then makes all of these complex connections. 84 00:05:24,000 --> 00:05:28,000 A given neuron might connect to 1,000 or a few thousand different other 85 00:05:28,000 --> 00:05:32,000 cells as a normal process. So forming all these different kinds 86 00:05:32,000 --> 00:05:36,000 of neurons and wiring them up is a very daunting problem. 87 00:05:36,000 --> 00:05:40,000 So what I thought I would do today would be to focus on two examples 88 00:05:40,000 --> 00:05:44,000 where in each case, starting with either one gene or a 89 00:05:44,000 --> 00:05:48,000 small number of genes, you get a lot of complexity. 90 00:05:48,000 --> 00:05:52,000 So this would then allow the smaller total number of genes 91 00:05:52,000 --> 00:05:56,000 in the human genome to produce a lot of different kinds of proteins and 92 00:05:56,000 --> 00:06:00,000 maybe provide some explanations for certain parts of the complexity 93 00:06:00,000 --> 00:06:04,000 of the brain. Now, by no means am I going to 94 00:06:04,000 --> 00:06:08,000 attempt to explain all of how the brain develops. 95 00:06:08,000 --> 00:06:12,000 That would take, well, at least one course, 96 00:06:12,000 --> 00:06:16,000 more likely a few different courses to actually get a good appreciation 97 00:06:16,000 --> 00:06:20,000 of that, but I'm going to go through a couple of very intriguing examples 98 00:06:20,000 --> 00:06:24,000 that are approachable at the level of this class. 99 00:06:24,000 --> 00:06:43,000 OK. So the standard way that we 100 00:06:43,000 --> 00:06:49,000 think about how genetic information gets made into proteins is that 101 00:06:49,000 --> 00:06:55,000 there is DNA and then RNA and then proteins. I hope at this point in 7.01 102 00:06:55,000 --> 00:07:01,000 this is all familiar to you. Good. OK.
So the DNA sequence 103 00:07:01,000 --> 00:07:06,000 gets transcribed into an RNA, and then there's a splicing event 104 00:07:06,000 --> 00:07:12,000 which takes bits and pieces of the RNA, puts them together, 105 00:07:12,000 --> 00:07:18,000 and then there's an area of the RNA that tells the ribosome to start 106 00:07:18,000 --> 00:07:24,000 making protein. And so you get the protein 107 00:07:24,000 --> 00:07:29,000 synthesis. Everything is following from the 108 00:07:29,000 --> 00:07:33,000 blueprint that was in the DNA. So what I'm going to talk about 109 00:07:33,000 --> 00:07:37,000 today are two different examples. One of them involves alternative 110 00:07:37,000 --> 00:07:49,000 splicing. 111 00:07:49,000 --> 00:07:54,000 And those will be cases where, instead of there being a static, 112 00:07:54,000 --> 00:07:59,000 always-reproduced way of going from DNA to RNA to a protein, 113 00:07:59,000 --> 00:08:04,000 there are different alternative splicing events that can occur. 114 00:08:04,000 --> 00:08:08,000 And this can allow one gene to make multiple different protein products. 115 00:08:08,000 --> 00:08:13,000 And then I'm going to go over another way that you can violate 116 00:08:13,000 --> 00:08:18,000 this central dogma, this DNA to RNA to protein, 117 00:08:18,000 --> 00:08:22,000 which is something called RNA editing. So, as you might imagine 118 00:08:22,000 --> 00:08:27,000 from the common usage of the word editing, by editing what we mean is 119 00:08:27,000 --> 00:08:32,000 that the RNA sequence itself is actually changed so that it no 120 00:08:32,000 --> 00:08:37,000 longer reflects the exact nucleotide sequence of the DNA. 121 00:08:37,000 --> 00:08:41,000 This can also add to the diversity of potential encoded proteins.
122 00:08:41,000 --> 00:08:46,000 I'm going to talk about 123 00:08:46,000 --> 00:08:50,000 neural-specific examples, but I want to make sure to mention that these 124 00:08:50,000 --> 00:08:55,000 processes, alternative splicing and RNA editing, are also used by other 125 00:08:55,000 --> 00:09:00,000 parts of the developing animal, and also in plants and other 126 00:09:00,000 --> 00:09:05,000 organisms, not just by the brain, to generate diversity. 127 00:09:05,000 --> 00:09:08,000 So these are mechanisms that are widely used. But some of the most 128 00:09:08,000 --> 00:09:12,000 striking examples, as you'll see from my lecture and in 129 00:09:12,000 --> 00:09:16,000 further reading that you might do in the future, some of the most 130 00:09:16,000 --> 00:09:19,000 striking examples come from the nervous system. 131 00:09:19,000 --> 00:09:23,000 And that's not surprising given the complexity of the nervous system and 132 00:09:23,000 --> 00:09:27,000 the fact that there are so many genes out there ready to help with 133 00:09:27,000 --> 00:09:31,000 this complexity. So first I'll turn to 134 00:09:31,000 --> 00:09:44,000 alternative splicing. 135 00:09:44,000 --> 00:09:47,000 So first, before getting into the extremely complex case that I'm 136 00:09:47,000 --> 00:09:51,000 going to focus on, I'm going to just briefly, 137 00:09:51,000 --> 00:09:55,000 by way of introduction, go over a standard alternative 138 00:09:55,000 --> 00:10:15,000 splicing scenario. 139 00:10:15,000 --> 00:10:23,000 OK. So in a gene for which there is no alternative splicing, 140 00:10:23,000 --> 00:10:31,000 if I draw the exons as boxes and the introns as lines, 141 00:10:31,000 --> 00:10:40,000 what winds up happening is you have the first exon spliced to the second 142 00:10:40,000 --> 00:10:48,000 one, second to the third, third to the fourth. And so what you 143 00:10:48,000 --> 00:11:00,000 wind up with is a messenger RNA.
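To make the boxes-and-lines picture concrete, here is a minimal Python sketch of constitutive splicing, assuming a made-up primary transcript in which uppercase letters mark exons and lowercase letters mark introns, with hypothetical half-open exon coordinates. Splicing is modeled simply as keeping the exon intervals in order and dropping everything in between; real splice-site recognition is far more involved.

```python
# Toy model of constitutive splicing: exons are kept in order, introns removed.
# The sequence and coordinates below are hypothetical, for illustration only.


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join half-open [start, end) exon intervals of a primary transcript."""
    previous_end = 0
    for start, end in exons:
        # Exons must be non-empty, in order, non-overlapping and inside the transcript.
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"bad exon interval ({start}, {end})")
        previous_end = end
    return "".join(pre_mrna[start:end] for start, end in exons)


# Uppercase = exon, lowercase = intron (each toy intron starts with gu and ends with ag).
primary = "AUGGCC" + "guaaag" + "CUUGAA" + "guacag" + "AAAUAG"
exons = [(0, 6), (12, 18), (24, 30)]

mrna = splice(primary, exons)
print(mrna)  # AUGGCCCUUGAAAAAUAG
```

Every exon appears exactly once and in genomic order, which is what makes this the "no alternative splicing" case.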
144 00:11:00,000 --> 00:11:03,000 So here is the messenger RNA which has been spliced from the primary 145 00:11:03,000 --> 00:11:07,000 transcript. The primary transcript, of course, reflects the actual 146 00:11:07,000 --> 00:11:11,000 structure of the DNA in terms of sequence also, 147 00:11:11,000 --> 00:11:14,000 because in the genomic DNA you'd also have areas that are going to be 148 00:11:14,000 --> 00:11:18,000 exon, intron, exon, intron exactly like this. So this is just a 149 00:11:18,000 --> 00:11:22,000 general example of alternative splicing, I'm sorry, 150 00:11:22,000 --> 00:11:26,000 of regular splicing. So then alternative splicing would involve 151 00:11:26,000 --> 00:11:32,000 something like this. You'd have 1, 2, 152 00:11:32,000 --> 00:11:41,000 3A, 3B, 4. So then what happens is you have normal splicing from 1 to 2. 153 00:11:41,000 --> 00:11:51,000 And then 2 could either go to 3A, which will then go to 4, leaving 3B 154 00:11:51,000 --> 00:12:00,000 out. Or the alternative is that 2 can skip 3A, go to 3B, 155 00:12:00,000 --> 00:12:07,000 which will then splice to 4. So this allows then two different 156 00:12:07,000 --> 00:12:33,000 messengers to be formed. 157 00:12:33,000 --> 00:12:37,000 So this 3A and 3B might encode slightly different sequences and 158 00:12:37,000 --> 00:12:41,000 might then allow two distinct proteins with different functions to 159 00:12:41,000 --> 00:12:45,000 be formed from one primary transcript. So this is an example, a simple 160 00:12:45,000 --> 00:12:49,000 example, a general example of alternative splicing. 161 00:12:49,000 --> 00:12:54,000 So from one gene you have two proteins. 162 00:12:54,000 --> 00:13:02,000 The example that I'm going to focus 163 00:13:02,000 --> 00:13:07,000 on today, instead of going from one gene to two proteins, 164 00:13:07,000 --> 00:13:13,000 allows you to go from one gene to 38,000 different possibilities.
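The 3A/3B case can be sketched in a few lines of Python. This is a minimal illustration, assuming the gene is written as a list of exon positions where each position lists its mutually exclusive alternatives (the exon names are just the blackboard labels); taking one exon from every position is a Cartesian product, which `itertools.product` enumerates.

```python
from itertools import product

# Blackboard example: exon 3 is a pair of mutually exclusive alternatives,
# and exactly one of them ends up in any given messenger RNA.
gene = [["1"], ["2"], ["3A", "3B"], ["4"]]


def isoforms(layout):
    """Enumerate every mRNA exon order that picks one exon from each position."""
    return [list(choice) for choice in product(*layout)]


for mrna in isoforms(gene):
    print("-".join(mrna))
# 1-2-3A-4
# 1-2-3B-4
```

Because positions with a single exon contribute only one choice, the number of messengers here is 1 x 1 x 2 x 1 = 2.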
165 00:13:13,000 --> 00:13:18,000 It's actually 38,016 to be exact, and I'll explain to you why, but at 38,000 166 00:13:18,000 --> 00:13:23,000 it will have occurred to you that this is larger than the number of 167 00:13:23,000 --> 00:13:29,000 genes that Eric Lander says are in the human genome. 168 00:13:29,000 --> 00:13:34,000 It's certainly larger than the number of genes in the fly genome. 169 00:13:34,000 --> 00:13:39,000 And this example I'm giving you is from a single gene in the fruit fly. 170 00:13:39,000 --> 00:13:45,000 This one gene can come in 38,000 different forms. 171 00:13:45,000 --> 00:13:59,000 The gene is called Drosophila DSCAM. 172 00:13:59,000 --> 00:14:03,000 It's named for a human gene which was cloned and characterized 173 00:14:03,000 --> 00:14:08,000 first, which was called just plain DSCAM. What DSCAM stands for is 174 00:14:08,000 --> 00:14:13,000 Down syndrome cell adhesion molecule. 175 00:14:13,000 --> 00:14:33,000 And let me just explain briefly why 176 00:14:33,000 --> 00:14:36,000 it has this name. I don't think that this name is 177 00:14:36,000 --> 00:14:39,000 actually relevant so much to the biology, and it's certainly not 178 00:14:39,000 --> 00:14:42,000 relevant to the alternative splicing, because the human gene and the mouse 179 00:14:42,000 --> 00:14:45,000 gene, neither of them has a lot of alternative splicing. 180 00:14:45,000 --> 00:14:48,000 This is something that is particular to the fly. 181 00:14:48,000 --> 00:14:51,000 That's something I will return to later in the lecture. 182 00:14:51,000 --> 00:14:54,000 This name Down syndrome cell adhesion molecule came about because 183 00:14:54,000 --> 00:14:57,000 this was cloned first from human and it's located on human 184 00:14:57,000 --> 00:15:05,000 chromosome 21. 185 00:15:05,000 --> 00:15:10,000 Chromosome 21 is normally present in two copies in every individual.
186 00:15:10,000 --> 00:15:15,000 In individuals who wind up with three copies of chromosome 21, 187 00:15:15,000 --> 00:15:21,000 something called trisomy 21, the trisomy causes Down syndrome, 188 00:15:21,000 --> 00:15:26,000 which is a syndrome that has some brain manifestations like mental 189 00:15:26,000 --> 00:15:32,000 retardation, and also has a number of other problems associated 190 00:15:32,000 --> 00:15:37,000 with it. When the people who found this gene 191 00:15:37,000 --> 00:15:42,000 found it, they named it. So they gave the first part of its 192 00:15:42,000 --> 00:15:47,000 name, the DS, from Down syndrome. Cell adhesion molecule 193 00:15:47,000 --> 00:15:52,000 comes from the fact that this gene is similar in structure to a lot of 194 00:15:52,000 --> 00:15:57,000 known cell adhesion molecules that are encoded by many different 195 00:15:57,000 --> 00:16:02,000 loci in the genome. This gene is also expressed in the brain, 196 00:16:02,000 --> 00:16:06,000 and so they initially 197 00:16:06,000 --> 00:16:10,000 thought that perhaps having an extra copy, a third copy of this gene 198 00:16:10,000 --> 00:16:14,000 might be what's causing a lot of the brain phenotypes. 199 00:16:14,000 --> 00:16:19,000 Subsequent work has not provided further evidence for that. 200 00:16:19,000 --> 00:16:23,000 So at this point what I would say is that the name of the gene is Down 201 00:16:23,000 --> 00:16:27,000 syndrome cell adhesion molecule, DSCAM. It's on human chromosome 21. 202 00:16:27,000 --> 00:16:32,000 It may play a role in Down syndrome, but there isn't much evidence for that. 203 00:16:32,000 --> 00:16:37,000 The name is really the best evidence that it plays a role, 204 00:16:37,000 --> 00:16:42,000 just the fact that they named it that. OK. But in the fly this is 205 00:16:42,000 --> 00:16:47,000 an extremely interesting molecule because, as I mentioned, 206 00:16:47,000 --> 00:16:52,000 it can come in 38,000 different forms. OK.
So as for why you would 207 00:16:52,000 --> 00:16:57,000 have a cell adhesion molecule in the brain, I just want to 208 00:16:57,000 --> 00:17:02,000 mention that briefly. So Professor Lander went over with 209 00:17:02,000 --> 00:17:06,000 you the structure of a neuron. Neurons have cell bodies and 210 00:17:06,000 --> 00:17:10,000 axons and growth cones which allow them to get to wherever they're 211 00:17:10,000 --> 00:17:14,000 supposed to connect. One of the ways that 212 00:17:14,000 --> 00:17:18,000 a growing axon's growth cone leads the axon 213 00:17:18,000 --> 00:17:22,000 along a complex path is by interacting with various structures that it 214 00:17:22,000 --> 00:17:26,000 encounters. And so cell adhesion molecules are one of the kinds of 215 00:17:26,000 --> 00:17:30,000 molecules that can allow the growth cone and then the rest of the axon 216 00:17:30,000 --> 00:17:34,000 to interact with various other cells or other extracellular substrates, 217 00:17:34,000 --> 00:17:38,000 proteins that have been deposited by other kinds of cells, as they make 218 00:17:38,000 --> 00:17:43,000 their way and make the appropriate connections in the brain. 219 00:17:43,000 --> 00:17:46,000 So cell adhesion molecules are one of the mechanisms. 220 00:17:46,000 --> 00:17:50,000 There are also mechanisms that allow cells to respond to gradients 221 00:17:50,000 --> 00:17:54,000 of chemical signaling messengers. Question? Yes. Could you explain 222 00:17:54,000 --> 00:17:58,000 what a growth cone is? Oh, I'm sorry. That was not 223 00:17:58,000 --> 00:18:02,000 covered? OK. So the neuron has a cell body 224 00:18:02,000 --> 00:18:06,000 with a nucleus and all the other stuff that's in regular cells that 225 00:18:06,000 --> 00:18:10,000 you learned about. It then has an axon which then 226 00:18:10,000 --> 00:18:14,000 allows it to connect. And this connection could be very 227 00:18:14,000 --> 00:18:19,000 far away.
In the case of, for example, a motor neuron in the 228 00:18:19,000 --> 00:18:23,000 spinal cord that's innervating a muscle in the foot, 229 00:18:23,000 --> 00:18:27,000 that single cell would have its cell body in the spinal cord and its axon 230 00:18:27,000 --> 00:18:32,000 would go all the way out to the foot. OK? So that's just an example of 231 00:18:32,000 --> 00:18:36,000 one very long neuron. Some of them are very long. 232 00:18:36,000 --> 00:18:40,000 Some of them are shorter. So this is the axon. At the tip of the axon 233 00:18:40,000 --> 00:18:44,000 is this thing which looks sort of like my son's mitten when it's cold 234 00:18:44,000 --> 00:18:48,000 outside. But this is the growth cone. And basically what's going on 235 00:18:48,000 --> 00:18:52,000 is as the axon is growing out in this direction it's feeling its way. 236 00:18:52,000 --> 00:18:56,000 And there might be cell adhesion molecules on these different 237 00:18:56,000 --> 00:19:00,000 protrusions that if they attach really well -- 238 00:19:00,000 --> 00:19:06,000 like let's say that this area over here is stickier for this particular 239 00:19:06,000 --> 00:19:12,000 growth cone than this area over here. Then this axon is more 240 00:19:12,000 --> 00:19:18,000 likely to grow in that direction. OK. So cell adhesion molecules are 241 00:19:18,000 --> 00:19:24,000 well known to play important roles in axon guidance. 242 00:19:24,000 --> 00:19:30,000 That's how axons are guided to grow in different directions. 243 00:19:30,000 --> 00:19:34,000 So I will tell you now about the DSCAM gene, and that will give some 244 00:19:34,000 --> 00:19:39,000 insight into how the 38,000 different forms might be used, 245 00:19:39,000 --> 00:19:43,000 because they're going to provide different kinds of stickiness. 246 00:19:43,000 --> 00:19:54,000 Let me explain.
247 00:19:54,000 --> 00:20:00,000 So this is a drawing of the genomic organization of DSCAM that allows 248 00:20:00,000 --> 00:20:13,000 the extensive alternative splicing. 249 00:20:13,000 --> 00:20:17,000 So this diagram is similar to the diagram that I drew by hand for the 250 00:20:17,000 --> 00:20:21,000 simpler cases. But basically in DSCAM what you 251 00:20:21,000 --> 00:20:26,000 start out with is exon 1 gets spliced to exon 2 gets spliced to 252 00:20:26,000 --> 00:20:30,000 exon 3, and then when you reach exon 4 there are 12 distinct 253 00:20:30,000 --> 00:20:35,000 possibilities. And only one of the 12 is chosen. 254 00:20:35,000 --> 00:20:39,000 In this case the diagram shows it choosing, I don't know, 255 00:20:39,000 --> 00:20:43,000 the ninth one perhaps. Then exon 5 is regular, so that always gets 256 00:20:43,000 --> 00:20:47,000 included. And at exon 6 there are 48 distinct choices. 257 00:20:47,000 --> 00:20:51,000 And again only one is chosen. Here, in this example, this one has 258 00:20:51,000 --> 00:20:55,000 been chosen at the expense of all of these other ones. 259 00:20:55,000 --> 00:20:59,000 Exons 7 and 8 are normal. And at exon 9 there are 33 choices. 260 00:20:59,000 --> 00:21:03,000 At exon 17 there are two choices. And if you multiply 2 x 33 x 48 x 12 261 00:21:03,000 --> 00:21:07,000 you wind up with 38,016. There is evidence from a number 262 00:21:07,000 --> 00:21:12,000 of different types of studies, including cloning and sequencing 263 00:21:12,000 --> 00:21:17,000 lots of different messenger RNAs that are already spliced, that 264 00:21:17,000 --> 00:21:21,000 basically almost all of these forms can be made. So what this structure 265 00:21:21,000 --> 00:21:26,000 allows is for there to be diversity generated in important areas of the 266 00:21:26,000 --> 00:21:31,000 cell adhesion part of this molecule.
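The 38,016 figure is just the product of the cluster sizes, because each variable cluster contributes one independent choice. Here is a minimal Python sketch of that arithmetic, assuming the four cluster sizes given in the lecture and treating every other exon as constitutive (a factor of 1). It also assumes every combination is allowed, so the product is an upper bound on the forms actually made; the lecture notes that almost all of them are observed. The `random_isoform` helper and its 1-based variant numbering are hypothetical, for illustration only.

```python
import math
import random

# Sizes of the mutually exclusive exon clusters in Drosophila DSCAM, as given in
# the lecture; all other exons are constitutive and contribute a factor of 1.
VARIABLE_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def isoform_count(clusters):
    """One independent choice per cluster, so the counts multiply."""
    return math.prod(clusters.values())


def random_isoform(clusters, rng):
    """Pick one variant per cluster (hypothetical 1-based variant numbers)."""
    return {name: rng.randint(1, n) for name, n in clusters.items()}


print(isoform_count(VARIABLE_CLUSTERS))  # 38016
print(random_isoform(VARIABLE_CLUSTERS, random.Random(0)))
```

Step by step: 12 x 48 = 576, 576 x 33 = 19,008, and 19,008 x 2 = 38,016.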
267 00:21:31,000 --> 00:21:53,000 The DSCAM molecule starts out with a 268 00:21:53,000 --> 00:21:59,000 number of domains which are called immunoglobulin-like domains. 269 00:21:59,000 --> 00:22:08,000 Immunoglobulin domains are named for 270 00:22:08,000 --> 00:22:12,000 immunoglobulins, which is another name for antibodies. 271 00:22:12,000 --> 00:22:16,000 Antibodies help you fight infection. I don't know if that's been covered. 272 00:22:16,000 --> 00:22:20,000 Has it been covered? Yes. And the particular fold that 273 00:22:20,000 --> 00:22:24,000 they form allows recognition of foreign antigens, and it also allows 274 00:22:24,000 --> 00:22:28,000 stickiness of molecules in general. So cell adhesion molecules often 275 00:22:28,000 --> 00:22:33,000 have these immunoglobulin domains. In DSCAM, 276 00:22:33,000 --> 00:22:39,000 going from the N-terminus towards the C-terminus, 277 00:22:39,000 --> 00:22:45,000 the first nine domains are these Ig type domains. That's then followed 278 00:22:45,000 --> 00:22:51,000 by another kind of domain called a fibronectin type domain, 279 00:22:51,000 --> 00:22:57,000 which isn't important here. Almost all of the diversity is in these 280 00:22:57,000 --> 00:23:02,000 nine immunoglobulin domains. The exon 4 diversity allows 281 00:23:02,000 --> 00:23:08,000 diversity of the second of the nine. The exon 6 alternative splicing 282 00:23:08,000 --> 00:23:14,000 affects the third out of the nine immunoglobulin folds. 283 00:23:14,000 --> 00:23:19,000 And the exon 9 diversity affects the seventh. So of these nine 284 00:23:19,000 --> 00:23:25,000 domains of immunoglobulin folds that allow for different kinds of 285 00:23:25,000 --> 00:23:31,000 stickiness, a lot of them are the same. 286 00:23:31,000 --> 00:23:35,000 Domain 1 is the same. 4, 5 and 6 are the same. 287 00:23:35,000 --> 00:23:40,000 And 8 and 9 are the same.
But 2, 3 and 7 have these 288 00:23:40,000 --> 00:23:44,000 differences, which are encoded by this striking kind of genomic 289 00:23:44,000 --> 00:24:11,000 structure and alternative splicing. 290 00:24:11,000 --> 00:24:29,000 So how is this diversity used? 291 00:24:29,000 --> 00:24:34,000 So the early models for how DSCAM would be used stipulated that 292 00:24:34,000 --> 00:24:40,000 individual different kinds of neurons might express vastly reduced 293 00:24:40,000 --> 00:24:45,000 subsets out of the 38,000. So let's say that one 294 00:24:45,000 --> 00:24:51,000 particular neuron type, of which there might be many 295 00:24:51,000 --> 00:24:57,000 different neurons, might express maybe ten out of the 296 00:24:57,000 --> 00:25:02,000 38,000 or even one out of 38,000. These were the different kinds of 297 00:25:02,000 --> 00:25:06,000 models that were tossed around by people who were thinking about this 298 00:25:06,000 --> 00:25:11,000 problem. But then people started to study it. And what turned out to be 299 00:25:11,000 --> 00:25:16,000 the case is that it looks like every kind of neuron population, to a first 300 00:25:16,000 --> 00:25:20,000 approximation, expresses almost all of the different forms. 301 00:25:20,000 --> 00:25:25,000 OK? So these models are wrong. And each different 302 00:25:25,000 --> 00:25:30,000 neuron type expresses a slightly different repertoire. 303 00:25:30,000 --> 00:25:35,000 But to a first approximation, over 10,000 or 20,000 forms are expressed by 304 00:25:35,000 --> 00:25:41,000 each different neuron type. So that then caused people to 305 00:25:41,000 --> 00:25:47,000 scratch their heads and wonder, well, how is this used then? How is 306 00:25:47,000 --> 00:25:53,000 this used to make different kinds of neurons different from one another, 307 00:25:53,000 --> 00:25:59,000 or to affect their function at all?
308 00:25:59,000 --> 00:26:05,000 So the answer to this question has emerged in part from analyses of 309 00:26:05,000 --> 00:26:12,000 individual single cells. So it turns out that an individual 310 00:26:12,000 --> 00:26:18,000 cell, and I'm using the words cell and neuron interchangeably because 311 00:26:18,000 --> 00:26:25,000 neurons are cells. And not all cells are neurons, but 312 00:26:25,000 --> 00:26:32,000 all neurons are cells. So one cell, or one neuron, 313 00:26:32,000 --> 00:26:39,000 makes somewhere in the range of 10 to 50 forms. 314 00:26:39,000 --> 00:26:44,000 These are randomly chosen, apparently, from the data that's 315 00:26:44,000 --> 00:26:49,000 available, from the tens of thousands of forms that are possible. 316 00:26:49,000 --> 00:26:54,000 So you can imagine that two neighboring cells that are otherwise 317 00:26:54,000 --> 00:26:59,000 identical, that each are picking, let's say ten just to make it easy, 318 00:26:59,000 --> 00:27:04,000 ten different forms of DSCAM, are going to wind up with very different 319 00:27:04,000 --> 00:27:10,000 repertoires of DSCAM from each other. 320 00:27:10,000 --> 00:27:13,000 So what this allows is each individual cell to have a unique 321 00:27:13,000 --> 00:27:17,000 identity. The whole idea that individual neurons might need to 322 00:27:17,000 --> 00:27:21,000 have a unique identity actually is a new concept that's really been 323 00:27:21,000 --> 00:27:24,000 highlighted by this molecule. Because the way that people used to 324 00:27:24,000 --> 00:27:28,000 think of neurons is that they would wind up with unique identities based 325 00:27:28,000 --> 00:27:32,000 on the connections or experience, what they were exposed to in terms 326 00:27:32,000 --> 00:27:36,000 of different stimuli.
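How different would two neighboring repertoires be? Here is a minimal Python sketch of that reasoning, under simplifying assumptions that are not from the lecture's data: each cell draws K = 10 distinct isoforms (the "let's say ten" above) uniformly at random from all N = 38,016. Real isoform frequencies are not uniform, so treat the numbers as an illustration of scale only. The expected overlap is the hypergeometric mean K·K/N, the chance of any overlap at all comes from counting combinations with `math.comb`, and a small Monte Carlo run checks the formula.

```python
import math
import random


def expected_shared(n_isoforms: int, k: int) -> float:
    """Mean number of shared isoforms when two cells each draw k distinct
    isoforms uniformly at random from n_isoforms (hypergeometric mean)."""
    return k * k / n_isoforms


def prob_share_any(n_isoforms: int, k: int) -> float:
    """Probability that the two random repertoires overlap in at least one isoform."""
    return 1 - math.comb(n_isoforms - k, k) / math.comb(n_isoforms, k)


def simulate(n_isoforms: int, k: int, trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the fraction of cell pairs that share any isoform."""
    hits = 0
    for _ in range(trials):
        cell_a = set(rng.sample(range(n_isoforms), k))
        cell_b = set(rng.sample(range(n_isoforms), k))
        hits += bool(cell_a & cell_b)
    return hits / trials


N, K = 38016, 10  # all DSCAM forms; "let's say ten" per cell
print(expected_shared(N, K), prob_share_any(N, K))
print(simulate(N, K, 20_000, random.Random(42)))
```

Both analytic values come out at roughly 0.0026, so under these assumptions two neighboring cells share even a single DSCAM form only about a quarter of a percent of the time.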
But what this indicates is that from 327 00:27:36,000 --> 00:27:42,000 the splicing of an individual gene, and the fact that each time this 328 00:27:42,000 --> 00:27:48,000 gene gets spliced you can wind up with a different form, at any 329 00:27:48,000 --> 00:27:54,000 given time each cell will have a unique set of messenger RNAs, 330 00:27:54,000 --> 00:28:00,000 and therefore proteins, encoded by this DSCAM gene. 331 00:28:00,000 --> 00:28:05,000 OK. So I mentioned earlier that the human DSCAM does not have 332 00:28:05,000 --> 00:28:10,000 extensive alternative splicing. We all like to think of ourselves, 333 00:28:10,000 --> 00:28:15,000 humans and other mammals, as having brains that are at a level of 334 00:28:15,000 --> 00:28:21,000 complexity at least on par with the fly. And so it's odd to think of 335 00:28:21,000 --> 00:28:26,000 all this complexity being there for flies and other insects, so why is 336 00:28:26,000 --> 00:28:31,000 it not there for humans? Well, it turns out that there are 337 00:28:31,000 --> 00:28:36,000 other kinds of genes that do have extensive alternative splicing in 338 00:28:36,000 --> 00:28:40,000 mammals. One family of them is called the neurexins. These are genes that are 339 00:28:40,000 --> 00:28:45,000 involved in the synapse, where the different kinds of cells 340 00:28:45,000 --> 00:28:50,000 communicate with each other at the interface. There are genes called 341 00:28:50,000 --> 00:28:55,000 protocadherins. And there are also other kinds of 342 00:28:55,000 --> 00:29:00,000 genes that all have extensive alternative splicing in mammals. 343 00:29:00,000 --> 00:29:04,000 Interestingly, these genes tend not to have 344 00:29:04,000 --> 00:29:08,000 extensive alternative splicing in flies.
It's as if in each lineage 345 00:29:08,000 --> 00:29:12,000 certain genes have been chosen to get a lot of diversity by this 346 00:29:12,000 --> 00:29:16,000 mechanism of alternative splicing, and other genes are left with 347 00:29:16,000 --> 00:29:20,000 just the standard arrangement where it's one gene, 348 00:29:20,000 --> 00:29:24,000 one RNA, one protein. OK. So I'm going to switch now to 349 00:29:24,000 --> 00:29:29,000 the second example, which is RNA editing. 350 00:29:29,000 --> 00:29:41,000 So, as I mentioned earlier, 351 00:29:41,000 --> 00:29:45,000 RNA editing involves an actual change in the RNA sequence so that 352 00:29:45,000 --> 00:29:49,000 it no longer reflects the exact DNA sequence. Now, 353 00:29:49,000 --> 00:29:53,000 this is different from splicing. Splicing takes different pieces of 354 00:29:53,000 --> 00:29:57,000 RNA and splices them together, leaving out intervening sequences, or 355 00:29:57,000 --> 00:30:01,000 introns. But in RNA editing you actually change the nucleotide 356 00:30:01,000 --> 00:30:05,000 sequence so that it is no longer identical to the DNA. 357 00:30:05,000 --> 00:30:10,000 This is used in a number of parts of the brain. Most of the examples are 358 00:30:10,000 --> 00:30:15,000 brain-specific, though there are some non-brain-specific 359 00:30:15,000 --> 00:30:20,000 examples. Most of the time the editing event changes an adenosine, 360 00:30:20,000 --> 00:30:25,000 an A in the ACGT nomenclature, into an inosine. 361 00:30:25,000 --> 00:30:34,000 This is read differently by the 362 00:30:34,000 --> 00:30:40,000 ribosome than the adenosine: the ribosome reads inosine as if it were guanosine, a G. So this matters, for example, in an 363 00:30:40,000 --> 00:30:45,000 important kind of channel called a glutamate receptor. 364 00:30:45,000 --> 00:30:51,000 And the specific subtype that I'm talking about is something called an 365 00:30:51,000 --> 00:30:56,000 AMPA glutamate receptor.
The name comes from a chemical ligand that 366 00:30:56,000 --> 00:31:02,000 activates this particular kind of glutamate receptor. 367 00:31:02,000 --> 00:31:10,000 This leads to an important change of a glutamine to an arginine in the 368 00:31:10,000 --> 00:31:19,000 protein. So let me draw a quick diagram of what the protein looks 369 00:31:19,000 --> 00:31:27,000 like so we can see the importance of this glutamine to 370 00:31:27,000 --> 00:31:36,000 arginine switch in this glutamate receptor. 371 00:31:36,000 --> 00:31:47,000 So in the absence of detailed 372 00:31:47,000 --> 00:31:51,000 structural information about different kinds of neurotransmitter 373 00:31:51,000 --> 00:31:55,000 receptors, people often draw a diagram like this, where this is the 374 00:31:55,000 --> 00:31:59,000 outside of the cell, this is the inside of the cell, 375 00:31:59,000 --> 00:32:05,000 and this represents the cell membrane. And here's the amino terminus. 376 00:32:05,000 --> 00:32:12,000 And then they draw a transmembrane portion. And this is what's called 377 00:32:12,000 --> 00:32:18,000 a reentrant loop. It doesn't quite pass through the 378 00:32:18,000 --> 00:32:25,000 membrane, but then the chain passes through the membrane again. And here's 379 00:32:25,000 --> 00:32:33,000 the carboxy terminus. The glutamine-to-arginine change is 380 00:32:33,000 --> 00:32:41,000 here. It's in this area, which is involved in making the pore. 381 00:32:41,000 --> 00:32:50,000 So the pore of the channel has this change. 382 00:32:50,000 --> 00:33:04,000 Glutamine-to-arginine change. 383 00:33:04,000 --> 00:33:11,000 This vastly changes the properties of the channel, 384 00:33:11,000 --> 00:33:18,000 compared with a channel that doesn't undergo this editing event. 385 00:33:18,000 --> 00:33:25,000 So let me just state that for the GluR2 AMPA receptor, 386 00:33:25,000 --> 00:33:33,000 which is encoded by one of four different genes, it is 99% edited in adults.
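Why an A-to-I change turns a glutamine into an arginine follows from the genetic code once you know that the ribosome reads inosine as guanosine. The Python sketch below assumes the glutamine at the GluR2 site is encoded by CAG and that the middle A is the edited base, which is how this Q/R site is usually described; the helper names `edit_a_to_i` and `translate` are hypothetical. Inosine is written as `I` and simply substituted with `G` before lookup in the standard codon table.

```python
from itertools import product

# Standard genetic code: amino acids (one-letter codes, '*' = stop) listed for
# codons enumerated in U, C, A, G order at each of the three positions.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def edit_a_to_i(mrna: str, position: int) -> str:
    """Deaminate the adenosine at `position` (0-based) to inosine, written 'I'."""
    if mrna[position] != "A":
        raise ValueError(f"position {position} holds {mrna[position]!r}, not A")
    return mrna[:position] + "I" + mrna[position + 1:]


def translate(mrna: str) -> str:
    """Translate codon by codon, reading inosine as guanosine like the ribosome."""
    as_read = mrna.replace("I", "G")
    return "".join(
        CODON_TABLE[as_read[i:i + 3]] for i in range(0, len(as_read) - 2, 3)
    )


if __name__ == "__main__":
    genomic = "CAG"                    # glutamine (Q) codon, as encoded in the DNA
    edited = edit_a_to_i(genomic, 1)   # assumed edited base: the middle A -> "CIG"
    print(translate(genomic), "->", translate(edited))  # prints: Q -> R
```

Substituting I with G before lookup is a deliberate simplification: it captures how the decoded message changes without modeling the deamination chemistry or the editing enzymes.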
387 00:33:33,000 --> 00:33:37,000 So over 99% of the time this adenosine is made into an inosine, 388 00:33:37,000 --> 00:33:42,000 which leads to a glutamine becoming an arginine in the protein. 389 00:33:42,000 --> 00:33:47,000 What this does to the channel is change its permeability. 390 00:33:47,000 --> 00:33:52,000 So this is a kind of channel that is mostly designed to let sodium in, 391 00:33:52,000 --> 00:33:57,000 but if it doesn't get edited, if the glutamine is there, it also 392 00:33:57,000 --> 00:34:02,000 lets calcium in. So whether or not calcium gets into 393 00:34:02,000 --> 00:34:06,000 the cell is very important. Both sodium and 394 00:34:06,000 --> 00:34:10,000 calcium are cations and can lead to membrane potential disturbances 395 00:34:10,000 --> 00:34:14,000 like you learned about earlier this week, leading to an action potential. 396 00:34:14,000 --> 00:34:18,000 But calcium also has other effects. It can lead ultimately to turning 397 00:34:18,000 --> 00:34:22,000 genes on and off, phosphorylation of various proteins, 398 00:34:22,000 --> 00:34:26,000 and other kinds of effects in the neuron. So it has to be 399 00:34:26,000 --> 00:34:30,000 regulated very tightly. So these channels are designed to 400 00:34:30,000 --> 00:34:34,000 just let sodium through, to be involved in the transmission 401 00:34:34,000 --> 00:34:38,000 of an action potential from one neuron to the next. 402 00:34:38,000 --> 00:34:42,000 So if you had a perturbation in this process, you would then also let 403 00:34:42,000 --> 00:34:46,000 calcium in, because the glutamine-containing channel lets calcium in. 404 00:34:46,000 --> 00:34:50,000 In fact, early in development it's probably true that you don't edit 405 00:34:50,000 --> 00:34:55,000 100%, and you let a little bit of calcium in.
406 00:34:55,000 --> 00:35:00,000 And there are also other glutamate receptors that are not part of the 407 00:35:00,000 --> 00:35:05,000 AMPA family, but they are related. They're encoded by genes called 408 00:35:05,000 --> 00:35:11,000 GluR5 and 6 (GluR1 through 4 encode the AMPA receptors), 409 00:35:11,000 --> 00:35:16,000 and their editing is more regulated. And by regulated I mean 410 00:35:16,000 --> 00:35:22,000 that the editing is sometimes present and sometimes not. 411 00:35:22,000 --> 00:35:27,000 So even within a given neuron, you might have some channels that have 412 00:35:27,000 --> 00:35:33,000 the glutamine and some channels that have the arginine. 413 00:35:33,000 --> 00:35:41,000 There are also other sites. 414 00:35:41,000 --> 00:35:45,000 I've talked about the main site. This is the one that has the most 415 00:35:45,000 --> 00:35:49,000 profound impact on the function of the protein. There are also other 416 00:35:49,000 --> 00:35:53,000 sites in the molecule that are edited, and these also have 417 00:35:53,000 --> 00:35:57,000 important but slightly less prominent roles in the regulation of 418 00:35:57,000 --> 00:36:02,000 these channels. So what leads this gene to become 419 00:36:02,000 --> 00:36:07,000 edited? Why do most genes not become edited, while this gene becomes 420 00:36:07,000 --> 00:36:12,000 edited? Well, people are starting to pursue those 421 00:36:12,000 --> 00:36:17,000 kinds of mechanisms. And one of the mechanisms that's 422 00:36:17,000 --> 00:36:23,000 become very clear is that, for example, in the Q-to-R change in 423 00:36:23,000 --> 00:36:28,000 the pore, or rather the adenosine-to-inosine change in the messenger RNA 424 00:36:28,000 --> 00:36:33,000 that leads to the Q-to-R change in the pore, if you look at 425 00:36:33,000 --> 00:36:38,000 the exon where that A (I'll draw it as an A) 426 00:36:38,000 --> 00:36:44,000 is present.
There's an area, and here's 427 00:36:44,000 --> 00:36:49,000 the intron, a stretch of the intronic sequence which actually loops back and 428 00:36:49,000 --> 00:36:55,000 allows base pairing to form between the exonic sequence and the intron. 429 00:36:55,000 --> 00:37:01,000 And then the enzymes that are involved in mediating this adenosine 430 00:37:01,000 --> 00:37:07,000 to inosine change recognize the base pairing of this short area, plus some 431 00:37:07,000 --> 00:37:13,000 sequence specificity for the RNA sequence that's around here. 432 00:37:13,000 --> 00:37:19,000 It's not just any base pairing, but the base pairing is critical. 433 00:37:19,000 --> 00:37:25,000 And then the enzymes, which are called adenosine deaminases, 434 00:37:25,000 --> 00:37:32,000 can mediate this change of an adenosine to an inosine. 435 00:37:32,000 --> 00:37:37,000 So this gene has been selected so that parts of its intron, 436 00:37:37,000 --> 00:37:42,000 in addition to allowing splicing to occur, are actually able to base 437 00:37:42,000 --> 00:37:47,000 pair and allow this editing function to happen. And so this allows an 438 00:37:47,000 --> 00:37:52,000 individual gene to make more than one form of protein. 439 00:37:52,000 --> 00:37:57,000 OK. One other example of RNA editing that I'll tell you about 440 00:37:57,000 --> 00:38:03,000 involves the serotonin system. It involves a serotonin receptor 441 00:38:03,000 --> 00:38:10,000 which has RNA editing. And this is a serotonin receptor 442 00:38:10,000 --> 00:38:18,000 whose name is serotonin receptor 2C, so it's often written as 5-HT2C, 443 00:38:18,000 --> 00:38:25,000 the 5-hydroxytryptamine 2C serotonin receptor. This receptor is a member 444 00:38:25,000 --> 00:38:33,000 of the G protein-coupled receptor superfamily. 445 00:38:33,000 --> 00:38:39,000 G protein-coupled receptors are seven-transmembrane-domain receptors.
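Going back to the GluR2 mechanism described above, the requirement that a stretch of intron fold back and base-pair with the exon around the edited A can be pictured as a search for an antiparallel partner. The Python sketch below is a naive scan, not how the adenosine deaminases actually find their targets: it counts Watson-Crick and G-U wobble pairs in a fixed window and returns the first intronic position with enough of them. The sequences, the helper names, and the `min_pairs` threshold are hypothetical, invented for illustration rather than taken from the real gene.

```python
from typing import Optional

# Watson-Crick pairs plus the G-U wobble pair that RNA duplexes tolerate.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_pairs(exon_window: str, intron_segment: str) -> int:
    """Count positions where the two strands pair when aligned antiparallel."""
    # Reverse the intronic segment so its 3' end lines up with the exon's 5' end.
    return sum((a, b) in PAIRS for a, b in zip(exon_window, intron_segment[::-1]))


def find_stem(exon_window: str, intron: str, min_pairs: int) -> Optional[int]:
    """Return the 0-based start of the first intronic stretch that pairs with
    exon_window at >= min_pairs positions, or None if there is none."""
    width = len(exon_window)
    for start in range(len(intron) - width + 1):
        if count_pairs(exon_window, intron[start:start + width]) >= min_pairs:
            return start
    return None


if __name__ == "__main__":
    # Hypothetical toy sequences; NOT the real GluR2 exon or intron.
    exon_window = "GGCAGCC"        # the middle A stands in for the edited site
    intron = "AAAAGGCUGCCAAAA"     # contains a stretch complementary to it
    print(find_stem(exon_window, intron, min_pairs=6))  # prints 4
```

A real analysis would use an RNA folding tool that scores stacking energies and loops rather than a raw pair count; the scan only makes the idea of intron-exon complementarity concrete.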
446 00:38:39,000 --> 00:38:46,000 And the editing of the serotonin receptor occurs in the second 447 00:38:46,000 --> 00:38:53,000 intracellular loop and affects the coupling to the G protein. 448 00:38:53,000 --> 00:39:00,000 So the way that these G protein-coupled receptors -- 449 00:39:00,000 --> 00:39:04,000 Did they do this already? G protein-coupled receptors 450 00:39:04,000 --> 00:39:08,000 transduce the signal through a G protein, which often binds to the 451 00:39:08,000 --> 00:39:12,000 intracellular loops, particularly the second loop. 452 00:39:12,000 --> 00:39:16,000 And the editing, which changes adenosines to inosines 453 00:39:16,000 --> 00:39:20,000 (there are actually a few of them in this region here, 454 00:39:20,000 --> 00:39:24,000 a few different sites which can get changed), affects the 455 00:39:24,000 --> 00:39:28,000 efficiency of the transduction, that is, how well, when serotonin is present, a 456 00:39:28,000 --> 00:39:33,000 signal is transduced inside the cell. 457 00:39:33,000 --> 00:39:37,000 There are other serotonin receptors that are ligand-gated channels, but 458 00:39:37,000 --> 00:39:41,000 they don't appear to have the same kind of editing as this serotonin 459 00:39:41,000 --> 00:39:46,000 receptor of the G protein-coupled variety. 460 00:39:46,000 --> 00:39:59,000 So I'll end with just one final note 461 00:39:59,000 --> 00:40:03,000 about this serotonin receptor, which is that a drug called fluoxetine, or 462 00:40:03,000 --> 00:40:07,000 Prozac -- did Eric go over that this year? Prozac? 463 00:40:07,000 --> 00:40:11,000 No? He mentioned it. OK, he mentioned it. So it's 464 00:40:11,000 --> 00:40:16,000 mostly known as something which blocks the reuptake of serotonin. 465 00:40:16,000 --> 00:40:20,000 So one cell releases serotonin, which is a neurotransmitter. 466 00:40:20,000 --> 00:40:24,000 Another cell responds to it.
If you block the reuptake, the 467 00:40:24,000 --> 00:40:28,000 neurotransmitter is present in the synapse for a longer amount of time, 468 00:40:28,000 --> 00:40:33,000 leading to increased signaling. Prozac is widely thought to have its 469 00:40:33,000 --> 00:40:37,000 main effect (and that probably is its main effect) by just blocking the 470 00:40:37,000 --> 00:40:41,000 reuptake, leading to increased serotonin signaling. 471 00:40:41,000 --> 00:40:45,000 Researchers have studied the serotonin receptor in individuals who are 472 00:40:45,000 --> 00:40:49,000 taking Prozac versus individuals who are not taking Prozac. 473 00:40:49,000 --> 00:40:53,000 And what they found was that there were differences in the amount of 474 00:40:53,000 --> 00:40:57,000 editing of various sites in this key area in individuals taking Prozac 475 00:40:57,000 --> 00:41:02,000 versus individuals not taking Prozac. 476 00:41:02,000 --> 00:41:06,000 And the direction of the difference (whether it's switching from edited 477 00:41:06,000 --> 00:41:10,000 to unedited or back, and it varies for the different 478 00:41:10,000 --> 00:41:15,000 sites) was the opposite of what was seen in a 479 00:41:15,000 --> 00:41:19,000 comparison of brains of suicide victims versus brains of 480 00:41:19,000 --> 00:41:24,000 accident victims. So it looks like Prozac is having 481 00:41:24,000 --> 00:41:28,000 an effect which is opposite to the skewing of editing 482 00:41:28,000 --> 00:41:33,000 that one sees in certain cases of depression.
483 00:41:33,000 --> 00:41:38,000 So I bring that example up because it's important to know that, 484 00:41:38,000 --> 00:41:43,000 you know, if you can impact on some of these subtle differences between 485 00:41:43,000 --> 00:41:48,000 different kinds of messages, like whether it's edited or not or 486 00:41:48,000 --> 00:41:53,000 whether the splicing is more towards one kind of alternative splicing or 487 00:41:53,000 --> 00:41:58,000 another kind of alternative splicing, these might provide very interesting 488 00:41:58,000 --> 00:42:03,000 pharmacologic targets for therapies that might impact on a variety of 489 00:42:03,000 --> 00:42:08,000 different human diseases. So what I've hoped to do is give you 490 00:42:08,000 --> 00:42:12,000 a sense of a couple of different examples where you can take a single 491 00:42:12,000 --> 00:42:17,000 gene and make more than one protein, and this can lead to increases in 492 00:42:17,000 --> 00:42:21,000 the diversity of neurons, and therefore increases in the 493 00:42:21,000 --> 00:42:26,000 complexity of the brain. Thank you. [APPLAUSE] Are there 494 00:42:26,000 --> 00:42:31,000 questions?