1 00:00:00,000 --> 00:00:03,000 Julia just mentioned that a few of you had commented, 2 00:00:03,000 --> 00:00:07,000 when we were talking about the genetic code, that some of you 3 00:00:07,000 --> 00:00:11,000 thought the fact that it was degenerate, it had some redundancy 4 00:00:11,000 --> 00:00:15,000 in it, like multiple codons for threonine, that that was kind of 5 00:00:15,000 --> 00:00:19,000 cool, and some of you thought it was sort of a waste and would have maybe 6 00:00:19,000 --> 00:00:23,000 designed the thing differently. That's, you know, part of when you 7 00:00:23,000 --> 00:00:27,000 study biology you don't get to design it from first principles. 8 00:00:27,000 --> 00:00:31,000 You find out what happened during evolution and what got selected for. 9 00:00:31,000 --> 00:00:36,000 And once it gets selected for then that gets sort of fixed in nature. 10 00:00:36,000 --> 00:00:40,000 If there are four nucleotides then you could have one, 11 00:00:40,000 --> 00:00:45,000 two and three-letter words. And if it's going to be a three-letter 12 00:00:45,000 --> 00:00:49,000 word to have at least 20, then you've got some degeneracy or redundancy, 13 00:00:49,000 --> 00:00:54,000 but that's not necessarily a bad thing. And, in fact, 14 00:00:54,000 --> 00:00:58,000 if you go into the evolution of the code more deeply, 15 00:00:58,000 --> 00:01:03,000 people are beginning to suspect it evolved from a simpler one. 16 00:01:03,000 --> 00:01:07,000 And there actually are some relationships between some of the 17 00:01:07,000 --> 00:01:11,000 codons that go back to the similarities, the chemical 18 00:01:11,000 --> 00:01:15,000 similarities between the amino acids. And it also allows some things for 19 00:01:15,000 --> 00:01:19,000 some cells, for example, if they want proteins to be present 20 00:01:19,000 --> 00:01:23,000 at very low levels they will use a codon that has just a very low level 21 00:01:23,000 --> 00:01:27,000 of the corresponding tRNA. 
And if they want to make a lot of 22 00:01:27,000 --> 00:01:32,000 the protein they'll use a tRNA that it makes in abundance. 23 00:01:32,000 --> 00:01:35,000 And so it's sort of another way of controlling levels of proteins. 24 00:01:35,000 --> 00:01:38,000 There are a lot of different subtleties in here. 25 00:01:38,000 --> 00:01:41,000 And also in biology redundancy is not necessarily a bad thing. 26 00:01:41,000 --> 00:01:44,000 It's just like on a space flight, if something goes wrong and if 27 00:01:44,000 --> 00:01:48,000 there's some kind of redundant function then you've got some 28 00:01:48,000 --> 00:01:51,000 backups, too. OK. Well, in any case, 29 00:01:51,000 --> 00:01:54,000 today is a pretty interesting first part of the lecture. 30 00:01:54,000 --> 00:01:57,000 I've heard a few people express the view that why can't I just teach 31 00:01:57,000 --> 00:02:01,000 what's in the textbook and get on with it? 32 00:02:01,000 --> 00:02:06,000 And I think this part, for those of you who are following, 33 00:02:06,000 --> 00:02:11,000 really trying to understand what I'm trying to do with this course, 34 00:02:11,000 --> 00:02:17,000 I hope this will help you to see this. Because what I've talked 35 00:02:17,000 --> 00:02:22,000 about, this thing that Crick called the "central dogma" which was the 36 00:02:22,000 --> 00:02:28,000 direction of information flow in biology which was from DNA 37 00:02:28,000 --> 00:02:33,000 to RNA to proteins. And I'll just remind you, 38 00:02:33,000 --> 00:02:37,000 although proteins do many things they are, for example, 39 00:02:37,000 --> 00:02:42,000 enzymes that are biological catalysts. And it was pretty 40 00:02:42,000 --> 00:02:46,000 well-established, even by the time I was an undergrad 41 00:02:46,000 --> 00:02:50,000 that this was the way information flow went in biology and this was 42 00:02:50,000 --> 00:02:55,000 how it worked. 
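Going back to the counting argument about one-, two- and three-letter words: here is a minimal Python sketch of it, assuming the standard genetic code written with DNA letters. The names `words_of_length`, `CODON_TABLE` and `codons_for` are made up for this illustration; the 64-letter amino acid string is the standard code listed in T, C, A, G codon order, with `*` marking a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one-letter amino acids in TCAG codon order.
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every three-letter word (codon) to its amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def words_of_length(n):
    """Number of distinct n-letter words over a four-letter alphabet."""
    return len(BASES) ** n


def codons_for(amino_acid):
    """All codons that code for the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "letter words:", words_of_length(n))
    # Threonine is 'T' in the one-letter amino acid alphabet.
    print("threonine codons:", codons_for("T"))
```

One- and two-letter words give only 4 and 16 combinations, which is fewer than 20 amino acids, so three letters are the minimum, and 64 words for 20 amino acids plus stop signals is where the redundancy comes from.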
And there were various statements 43 00:02:55,000 --> 00:02:59,000 in the literature that what was true for E. coli was true 44 00:02:59,000 --> 00:03:03,000 for an elephant. And it is still true today in a 45 00:03:03,000 --> 00:03:07,000 broad sense that, as I've tried to emphasize 46 00:03:07,000 --> 00:03:11,000 throughout the course, when you get down to a cellular, 47 00:03:11,000 --> 00:03:14,000 molecular level there's an awful lot in common and things look much more 48 00:03:14,000 --> 00:03:18,000 alike than different compared to what we see at a more macroscopic 49 00:03:18,000 --> 00:03:21,000 scale. However, that doesn't mean that all the 50 00:03:21,000 --> 00:03:25,000 details are the same. And maybe you could begin to get a 51 00:03:25,000 --> 00:03:29,000 glimmering of that when I told you that the genetic code is 52 00:03:29,000 --> 00:03:33,000 virtually universal, that almost every organism, 53 00:03:33,000 --> 00:03:37,000 with only a couple very tiny exceptions, uses exactly the same 54 00:03:37,000 --> 00:03:41,000 genetic code to have three-letter words in 55 00:03:41,000 --> 00:03:45,000 the nucleic acid alphabet correspond to particular amino acids in a 56 00:03:45,000 --> 00:03:49,000 protein. But the other languages that are written in there, such as 57 00:03:49,000 --> 00:03:53,000 the sequence to start transcribing a gene, making an RNA copy, or stop, 58 00:03:53,000 --> 00:03:58,000 those are different between different organisms. Yeah? 59 00:03:58,000 --> 00:04:05,000 Glycolysis enzymes are amazingly 60 00:04:05,000 --> 00:04:09,000 similar. They are very clearly, they arose once, and they have 61 00:04:09,000 --> 00:04:12,000 stayed right through evolution. 
You could, in principle, sometimes 62 00:04:12,000 --> 00:04:16,000 in evolution you get something that creates a function and something 63 00:04:16,000 --> 00:04:19,000 that starts out, and then like what they call 64 00:04:19,000 --> 00:04:23,000 convergent evolution you end up with two things that came from a 65 00:04:23,000 --> 00:04:26,000 different evolutionary origin but have learned to do, 66 00:04:26,000 --> 00:04:30,000 let's say, catalyze the same biochemical reaction or something. 67 00:04:30,000 --> 00:04:34,000 Glycolysis came once. But if you were to look inside E. 68 00:04:34,000 --> 00:04:38,000 coli or yeast, let's say E. coli and look at how those enzymes are 69 00:04:38,000 --> 00:04:43,000 regulated, the thing that says this is the start of a gene, 70 00:04:43,000 --> 00:04:47,000 start making the RNA, it would look totally different than if you looked 71 00:04:47,000 --> 00:04:52,000 in a mouse because the language, the promoter does not have the same 72 00:04:52,000 --> 00:04:56,000 sequence in an E. coli and in a human or in a mouse. 73 00:04:56,000 --> 00:05:01,000 And I'll tell you more about that today. But there were -- 74 00:05:01,000 --> 00:05:05,000 I want to just now tell you sort of three things that were sort of 75 00:05:05,000 --> 00:05:09,000 exceptions to this general way of thinking. Every one of them 76 00:05:09,000 --> 00:05:13,000 generated a Nobel Prize. And this is a fun lecture for me to 77 00:05:13,000 --> 00:05:17,000 give because the individuals involved in all of these things had 78 00:05:17,000 --> 00:05:21,000 a very, very close association with MIT. And when I told you when Crick 79 00:05:21,000 --> 00:05:25,000 called this a central dogma he meant a hypothesis, or at least an idea 80 00:05:25,000 --> 00:05:30,000 for which there was not reasonable evidence. 81 00:05:30,000 --> 00:05:35,000 And he learned later it was something a true believer cannot 82 00:05:35,000 --> 00:05:40,000 doubt. 
And once this gets established it does get in the 83 00:05:40,000 --> 00:05:45,000 textbook and it does get in your thinking. And so information goes 84 00:05:45,000 --> 00:05:50,000 down this way. But there were a few oddities. 85 00:05:50,000 --> 00:05:55,000 I mean there were some viruses that had RNA inside them. 86 00:05:55,000 --> 00:06:00,000 They didn't have DNA. So how were these handled? 87 00:06:00,000 --> 00:06:06,000 Well, there turned out to be two classes of RNA virus. 88 00:06:06,000 --> 00:06:12,000 One that was studied quite heavily called, it's a plant virus called 89 00:06:12,000 --> 00:06:18,000 the tobacco mosaic virus. And it had a coat. And then it had 90 00:06:18,000 --> 00:06:24,000 in it a piece of RNA. Now, you can see if that virus were 91 00:06:24,000 --> 00:06:31,000 to inject RNA in the cell it could encode proteins. 92 00:06:31,000 --> 00:06:39,000 But that genetic material has to be copied. And the RNA was copied -- 93 00:06:39,000 --> 00:06:55,000 -- by an RNA dependent 94 00:06:55,000 --> 00:07:04,000 RNA polymerase. And so it's sort of just like the 95 00:07:04,000 --> 00:07:09,000 RNA polymerase before, except instead of using DNA as its 96 00:07:09,000 --> 00:07:14,000 template it can use RNA. So that sort of somehow would be a 97 00:07:14,000 --> 00:07:19,000 little loop in here about RNA being able to copy itself that hadn't been 98 00:07:19,000 --> 00:07:24,000 anticipated. And although this is an important virus in the plant 99 00:07:24,000 --> 00:07:29,000 industry, for plants and agriculture, it's not so important for humans. 100 00:07:29,000 --> 00:07:37,000 But there's another class of RNA viruses that are very important. 101 00:07:37,000 --> 00:07:46,000 And these are called retroviruses. And the reason these are so 102 00:07:46,000 --> 00:07:55,000 important is that the HIV-1 virus that's associated with AIDS is such 103 00:07:55,000 --> 00:08:04,000 a retrovirus. 
It's a virus that has a coat and it has an RNA that's its 104 00:08:04,000 --> 00:08:11,000 genetic material. And the person who worked out how 105 00:08:11,000 --> 00:08:15,000 this goes was a person at MIT, Dave Baltimore. He was a colleague 106 00:08:15,000 --> 00:08:19,000 of mine here for many years. He was the person who founded the 107 00:08:19,000 --> 00:08:23,000 Whitehead Institute and got that up and going. And he then finally, 108 00:08:23,000 --> 00:08:27,000 to move up one more administrative challenge, went to Caltech 109 00:08:27,000 --> 00:08:32,000 to be president. And that's where he is today. 110 00:08:32,000 --> 00:08:36,000 And David was working on this problem trying to figure out how 111 00:08:36,000 --> 00:08:41,000 these retroviruses work. And they're important. Not only 112 00:08:41,000 --> 00:08:45,000 the HIV-1 virus, but there are certain viruses that 113 00:08:45,000 --> 00:08:50,000 are associated with cancer. In general, what they do is they've 114 00:08:50,000 --> 00:08:54,000 picked up what's called an oncogene which is sort of often a mutated 115 00:08:54,000 --> 00:08:59,000 version of one of your normal genes. 116 00:08:59,000 --> 00:09:03,000 And if that virus gets inside one of your cells and brings in this 117 00:09:03,000 --> 00:09:07,000 mutated gene it's sort of kind of the same consequence as mutating one 118 00:09:07,000 --> 00:09:12,000 of your own genes along that progression of cancer. 119 00:09:12,000 --> 00:09:16,000 So it can kind of, say, bring in a gene that screws up the 120 00:09:16,000 --> 00:09:21,000 control on when cells are supposed to replicate and stop dividing and 121 00:09:21,000 --> 00:09:25,000 so on. So David started to work on these, and what he discovered was 122 00:09:25,000 --> 00:09:29,000 that these viruses encoded, they had information encoding 123 00:09:29,000 --> 00:09:35,000 proteins. 
And one of the proteins encoded in 124 00:09:35,000 --> 00:09:42,000 their RNA is an enzyme he characterized which is given the 125 00:09:42,000 --> 00:09:48,000 name "reverse transcriptase". And what this can do is take an RNA 126 00:09:48,000 --> 00:09:55,000 template and make the corresponding complementary DNA strand in this way. 127 00:09:55,000 --> 00:10:02,000 So that if we took the -- We'll just take this RNA out of the 128 00:10:02,000 --> 00:10:08,000 virus. What this virus encodes then is an enzyme that's able to take 129 00:10:08,000 --> 00:10:14,000 this RNA and make the corresponding DNA copy. So there's the original 130 00:10:14,000 --> 00:10:21,000 RNA that was in the virus. There is the RNA that it started 131 00:10:21,000 --> 00:10:27,000 out. And so what is happening, if you will in that case, is the 132 00:10:27,000 --> 00:10:34,000 information is flowing in the other direction. 133 00:10:34,000 --> 00:10:44,000 That was a marvelous discovery. 134 00:10:44,000 --> 00:10:48,000 And it was discovered by someone who wasn't willing just to take what 135 00:10:48,000 --> 00:10:52,000 was in the textbooks but was trying to figure out what could possibly be 136 00:10:52,000 --> 00:10:56,000 going on here. Now, the way these viruses work 137 00:10:56,000 --> 00:11:00,000 then, once they've done this it's not so bad because they've got their 138 00:11:00,000 --> 00:11:06,000 information now in the form of DNA. So this strand of DNA can be made 139 00:11:06,000 --> 00:11:13,000 into a double-stranded DNA by just using the kinds of enzymes that 140 00:11:13,000 --> 00:11:19,000 we've already talked about. A DNA dependent DNA polymerase will 141 00:11:19,000 --> 00:11:26,000 be able to copy the other thing. And now you've got a DNA copy of 142 00:11:26,000 --> 00:11:33,000 the information that used to be in the virus. But what happens to that 143 00:11:33,000 --> 00:11:40,000 is that you have a piece of the host DNA. 
144 00:11:40,000 --> 00:11:48,000 And this viral DNA then inserts into it, so you end up with this 145 00:11:48,000 --> 00:11:56,000 situation where you have DNA from the host, and this 146 00:11:56,000 --> 00:12:04,000 is the virus DNA. So this is the DNA that encodes the 147 00:12:04,000 --> 00:12:10,000 information needed for the virus. And if this was our DNA then it 148 00:12:10,000 --> 00:12:17,000 would be inserted that way. And there are just a handful of 149 00:12:17,000 --> 00:12:23,000 health messages I've tried to drive home in this thing. 150 00:12:23,000 --> 00:12:30,000 I mentioned smoking the other day. If you smoke -- 151 00:12:30,000 --> 00:12:34,000 If you stop smoking you basically, well, let me try another way. The 152 00:12:34,000 --> 00:12:38,000 risk of smoking is about equal to the sum of everything else you can 153 00:12:38,000 --> 00:12:42,000 possibly do in your life that will affect your chances of getting 154 00:12:42,000 --> 00:12:47,000 cancer, leaving aside what you inherited from mom and dad. 155 00:12:47,000 --> 00:12:51,000 The one single thing to not do if you want to avoid cancer, 156 00:12:51,000 --> 00:12:55,000 or to help loved ones who smoke avoid cancer, is just don't smoke, 157 00:12:55,000 --> 00:13:00,000 or if you do smoke, stop. You freeze the risk of whatever 158 00:13:00,000 --> 00:13:06,000 increased risk you've got, and then just live with that, 159 00:13:06,000 --> 00:13:11,000 but it doesn't keep getting worse with time. The other one is 160 00:13:11,000 --> 00:13:17,000 practice safe sex, and this is why. HIV-1 is a 161 00:13:17,000 --> 00:13:22,000 retrovirus. If you get infected with it, it makes a DNA copy of the 162 00:13:22,000 --> 00:13:28,000 RNA, it makes the other strand of the DNA, and it sticks itself in. 163 00:13:28,000 --> 00:13:34,000 So what you've got is your DNA here, your DNA there. 164 00:13:34,000 --> 00:13:39,000 And HIV-1 is a permanent traveling companion for the rest of your life. 
165 00:13:39,000 --> 00:13:44,000 There's no way of getting that out of there right now. 166 00:13:44,000 --> 00:13:49,000 All the systems for dealing with AIDS are just managing the infection. 167 00:13:49,000 --> 00:13:54,000 So when someone is HIV-1 positive, they've got those viral genes now 168 00:13:54,000 --> 00:14:00,000 permanently integrated into their DNA. 169 00:14:00,000 --> 00:14:04,000 So it's extremely important that you be aware of that, 170 00:14:04,000 --> 00:14:08,000 or if you know people who don't appreciate this because they haven't 171 00:14:08,000 --> 00:14:12,000 got so much of a biology background that you help them understand that. 172 00:14:12,000 --> 00:14:16,000 OK. So I just wanted to show you, I found one other picture last night. 173 00:14:16,000 --> 00:14:20,000 And this is you see all these old scientists, right? 174 00:14:20,000 --> 00:14:24,000 Of course, David didn't look like this when he was doing his work. 175 00:14:24,000 --> 00:14:28,000 In fact, I think he's fairly cleaned up here. 176 00:14:28,000 --> 00:14:32,000 I found this one in the Cold Spring Harbor archives last night. 177 00:14:32,000 --> 00:14:35,000 I've seen pictures of him looking considerably more shaggy and perhaps 178 00:14:35,000 --> 00:14:38,000 disreputable and stuff. But anyway, when David was making 179 00:14:38,000 --> 00:14:42,000 all these discoveries he was still quite a young man. 180 00:14:42,000 --> 00:14:45,000 I believe he got his Nobel Prize when he was still in his thirties. 181 00:14:45,000 --> 00:14:49,000 And so many of these discoveries are made by people that are not all 182 00:14:49,000 --> 00:14:52,000 that much older than you. But, again, it's trying to 183 00:14:52,000 --> 00:14:56,000 understand why we know what we know and then trying to fit 184 00:14:56,000 --> 00:15:01,000 other things into it. 
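The retrovirus steps described above (RNA copied into a complementary DNA strand, a second DNA strand made from that, and the result inserted into host DNA) can be sketched as simple string operations. This is only an illustration under loud assumptions: the sequences are invented, the function names `reverse_transcribe`, `second_strand` and `integrate` are hypothetical, and the real enzymes involve priming, strand transfers and integration machinery that the sketch ignores.

```python
# Base-pairing rules. All sequences are written 5' to 3'.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """RNA template -> complementary DNA strand (what reverse transcriptase makes)."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def second_strand(dna):
    """DNA template -> complementary DNA strand (a DNA-dependent DNA polymerase)."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna))


def integrate(host_dna, viral_dna, position):
    """Insert the viral DNA copy into the host sequence at the given index."""
    return host_dna[:position] + viral_dna + host_dna[position:]


if __name__ == "__main__":
    viral_rna = "AUGGCU"                 # made-up viral RNA
    cdna = reverse_transcribe(viral_rna)  # first DNA strand
    coding = second_strand(cdna)          # same sense as the RNA, with T for U
    print(cdna, coding)
    print(integrate("GGGGCCCC", coding, 4))
```

The second strand comes out as the original RNA sequence with T in place of U, which is the sense in which the information has flowed from RNA back into DNA.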
Now, the next thing I want to tell 185 00:15:01,000 --> 00:15:09,000 you about that has some of this same character, I've sort of told you 186 00:15:09,000 --> 00:15:16,000 that you have a piece of DNA. Let's say there's a gene here and 187 00:15:16,000 --> 00:15:24,000 this is the coding region, and then we make a mRNA copy, 188 00:15:24,000 --> 00:15:32,000 and then we use the genetic code and we make the protein. 189 00:15:32,000 --> 00:15:37,000 And so if we sequence the DNA and find the beginning of this protein 190 00:15:37,000 --> 00:15:42,000 we can read along using that genetic code and away it should go. 191 00:15:42,000 --> 00:15:47,000 And that was beautifully worked out, understood, just like I sort of 192 00:15:47,000 --> 00:15:52,000 finished up telling you the other day. So Phil Sharp, who got a Nobel 193 00:15:52,000 --> 00:15:57,000 Prize for this work, is a colleague in the Biology 194 00:15:57,000 --> 00:16:02,000 Department. He's in the Cancer Center just 195 00:16:02,000 --> 00:16:06,000 across the street from the building I'm in. That was the cancer center 196 00:16:06,000 --> 00:16:11,000 that Salvador Luria, who Jim Watson trained with, 197 00:16:11,000 --> 00:16:16,000 had founded. And Phil was studying this process. It was before we 198 00:16:16,000 --> 00:16:20,000 could sequence DNA. It was in the mid '70s. 199 00:16:20,000 --> 00:16:25,000 And he was working with the tools we had then trying to map the 200 00:16:25,000 --> 00:16:30,000 relationship of an RNA to a gene that was on a virus. 201 00:16:30,000 --> 00:16:35,000 It was a DNA virus, not an RNA virus, so don't get 202 00:16:35,000 --> 00:16:40,000 yourself mixed up with that. But what he had was basically a 203 00:16:40,000 --> 00:16:46,000 fragment of DNA that he knew encoded the gene. So he knew somewhere on 204 00:16:46,000 --> 00:16:51,000 this piece of DNA there was a gene somewhere in here, 205 00:16:51,000 --> 00:16:57,000 and he had isolated the mRNA. 
And one way you could map, 206 00:16:57,000 --> 00:17:02,000 physically see the relationship of an RNA and a DNA would be to take, 207 00:17:02,000 --> 00:17:08,000 let's just take away one of these strands. 208 00:17:08,000 --> 00:17:13,000 So we have the complementary strand of the DNA to the RNA. 209 00:17:13,000 --> 00:17:19,000 And if we mix them together and let them slowly cool down they will form 210 00:17:19,000 --> 00:17:24,000 hydrogen bonds. They'll form a DNA-RNA hybrid just 211 00:17:24,000 --> 00:17:30,000 the same way two strands of DNA come together. And so if the gene was a little 212 00:17:30,000 --> 00:17:36,000 shorter than the piece of DNA then you might have expected to see 213 00:17:36,000 --> 00:17:41,000 something that looked like this. And the way you'd see this, 214 00:17:41,000 --> 00:17:45,000 if you looked in an electron microscope -- 215 00:17:45,000 --> 00:17:52,000 -- perhaps it would look sort of 216 00:17:52,000 --> 00:17:56,000 like this. You cannot actually see the two strands, 217 00:17:56,000 --> 00:18:00,000 but you'd see a thick part. That would be the RNA-DNA duplex. 218 00:18:00,000 --> 00:18:06,000 So this would be just DNA. And the thick part is RNA base 219 00:18:06,000 --> 00:18:12,000 paired with a single strand of DNA. You got it? That's what textbooks 220 00:18:12,000 --> 00:18:18,000 said you should have seen. And so this is more. This is data 221 00:18:18,000 --> 00:18:24,000 from Phil's paper describing this. And let me focus on this one in 222 00:18:24,000 --> 00:18:31,000 particular. That's what he actually saw. 223 00:18:31,000 --> 00:18:38,000 You guys got any idea what's going 224 00:18:38,000 --> 00:18:43,000 on? Why don't you take a minute, find somebody who's near you and see 225 00:18:43,000 --> 00:18:48,000 if you can come up with any ideas. Here's the hybrid. Forget about 226 00:18:48,000 --> 00:18:54,000 this little bit at the 3 prime end. That's not a worry. 
Here is the 227 00:18:54,000 --> 00:18:59,000 thing. And this, I think, is a piece of single 228 00:18:59,000 --> 00:19:05,000 stranded DNA sticking out the end. But it looks a bit more complicated. 229 00:19:05,000 --> 00:19:11,000 Any ideas? Most people put this data in their drawers. 230 00:19:11,000 --> 00:19:17,000 Phil didn't. Phil and his colleagues didn't. 231 00:19:17,000 --> 00:19:23,000 What they realized was, I'm going to try and redraw this 232 00:19:23,000 --> 00:19:30,000 just very slightly to help you see what's going on. 233 00:19:30,000 --> 00:19:35,000 What they were seeing was something that looked rather like what they 234 00:19:35,000 --> 00:19:40,000 were expecting. They were seeing a region of DNA-RNA 235 00:19:40,000 --> 00:19:45,000 hybrid and they were seeing a region of single-stranded DNA like this, 236 00:19:45,000 --> 00:19:50,000 but what it looked like was there were little loops of single-stranded 237 00:19:50,000 --> 00:19:55,000 DNA sticking out. And what Phil had discovered was a 238 00:19:55,000 --> 00:20:01,000 phenomenon we now know as RNA splicing. 239 00:20:01,000 --> 00:20:08,000 And here's what goes on. 240 00:20:08,000 --> 00:20:12,000 In bacteria, with very few exceptions, you can look at the DNA, 241 00:20:12,000 --> 00:20:16,000 you can find the open reading frame and you can just read off the 242 00:20:16,000 --> 00:20:20,000 sequence of the protein. You find the ATG, AUG, methionine 243 00:20:20,000 --> 00:20:24,000 codon, and then it keeps going no stops, and finally you come to a 244 00:20:24,000 --> 00:20:28,000 stop codon and you see there is the protein. So the coding information 245 00:20:28,000 --> 00:20:33,000 is essentially continuous in almost all bacterial genes. 
246 00:20:33,000 --> 00:20:37,000 And there are a few genes like that in eukaryotes, 247 00:20:37,000 --> 00:20:41,000 but many eukaryotic genes are constructed, it's almost as if you 248 00:20:41,000 --> 00:20:45,000 took the gene you'd find in a bacterium and then you'd cut it in a 249 00:20:45,000 --> 00:20:50,000 bunch of places and stuck extra DNA in between all of the pieces. 250 00:20:50,000 --> 00:20:54,000 So you'd get something like this where there's, 251 00:20:54,000 --> 00:20:59,000 in the DNA there'd be coding information. 252 00:20:59,000 --> 00:21:07,000 And then non-coding information and another block of coding information. 253 00:21:07,000 --> 00:21:16,000 And then a block of non-coding and say another one of coding 254 00:21:16,000 --> 00:21:24,000 information. So this is a double-stranded DNA. 255 00:21:24,000 --> 00:21:33,000 And what happens then when the cell makes RNA is the whole thing gets 256 00:21:33,000 --> 00:21:42,000 copied into what's known now as a pre-messenger RNA. 257 00:21:42,000 --> 00:21:47,000 And so there's a bit of coding stuff here, there's a bit of coding stuff 258 00:21:47,000 --> 00:21:52,000 here, and there's some more coding stuff there. But what the cell has 259 00:21:52,000 --> 00:21:57,000 is sort of like your unedited footage from your family summer 260 00:21:57,000 --> 00:22:03,000 vacation when you were running the video camera. 261 00:22:03,000 --> 00:22:07,000 And maybe you don't want to show everybody every second of video that 262 00:22:07,000 --> 00:22:11,000 you took during the thing. So what you do, you get in there 263 00:22:11,000 --> 00:22:16,000 and you edit it. In the old days you used to have to 264 00:22:16,000 --> 00:22:20,000 take the film and splice it. And now you can all do it with 265 00:22:20,000 --> 00:22:25,000 iMovie or something like that. But what you do is take the pieces 266 00:22:25,000 --> 00:22:29,000 of information you want, and this is what the cell is doing. 
267 00:22:29,000 --> 00:22:36,000 It takes this part of the RNA. And this part of the RNA, 268 00:22:36,000 --> 00:22:45,000 and joins it together, and then this part. And when it's done it has the 269 00:22:45,000 --> 00:22:54,000 mRNA that now looks like the kind of mRNA that you would find in a 270 00:22:54,000 --> 00:23:03,000 bacterium where you can find the start codon. 271 00:23:03,000 --> 00:23:09,000 And then you could read in three-letter words all the way 272 00:23:09,000 --> 00:23:15,000 through to the end of the protein. So, in essence, what Phil found was 273 00:23:15,000 --> 00:23:21,000 that in many organisms at least there's another step in here where 274 00:23:21,000 --> 00:23:27,000 we get RNA splicing. And only after that you get down to 275 00:23:27,000 --> 00:23:33,000 proteins. What was quite remarkable about this 276 00:23:33,000 --> 00:23:37,000 result and why I'm kind of hammering on it a little bit is this is the 277 00:23:37,000 --> 00:23:41,000 data that's out of Phil's paper. You can look it up on the Internet. 278 00:23:41,000 --> 00:23:45,000 Type in Phil Sharp 1977 and you'll find this original paper with that 279 00:23:45,000 --> 00:23:49,000 figure in it. The moment Phil realized what it was and talked 280 00:23:49,000 --> 00:23:53,000 about it at a meeting, a whole lot of people suddenly sort 281 00:23:53,000 --> 00:23:57,000 of almost simultaneously discovered RNA splicing because they opened 282 00:23:57,000 --> 00:24:01,000 their drawers and there were all these uninterpretable electron 283 00:24:01,000 --> 00:24:06,000 micrographs they had. And they were in very short order 284 00:24:06,000 --> 00:24:10,000 able to see it in their systems. 
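The editing picture of splicing (copy everything into a pre-messenger RNA, then keep the coding blocks and join them in order) can be sketched in Python. This is a toy model with assumptions stated up front: the sequences are invented, the block boundaries are simply handed to the code rather than recognized from splice-site signals the way a cell does it, and `build_pre_mrna` and `splice` are hypothetical helper names.

```python
def build_pre_mrna(segments):
    """Join (kind, sequence) blocks into one pre-mRNA string.

    Returns the full sequence and the (start, end) positions of the
    coding blocks, so the splice step knows what to keep.
    """
    sequence = ""
    coding_positions = []
    for kind, block in segments:
        if kind == "coding":
            coding_positions.append((len(sequence), len(sequence) + len(block)))
        sequence += block
    return sequence, coding_positions


def splice(pre_mrna, coding_positions):
    """Keep only the coding blocks and join them in their original order."""
    return "".join(pre_mrna[start:end] for start, end in coding_positions)


if __name__ == "__main__":
    segments = [
        ("coding", "AUGGCU"),
        ("non-coding", "GUAAGUUUUCAG"),
        ("coding", "ACUGGU"),
        ("non-coding", "GUACCCCCCAG"),
        ("coding", "UAA"),
    ]
    pre_mrna, coding_positions = build_pre_mrna(segments)
    mrna = splice(pre_mrna, coding_positions)
    print(len(pre_mrna), "letters before splicing")
    print(mrna, "after splicing")
```

After the join, the message reads continuously from the AUG start codon to the UAA stop codon, like the bacterial case.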
The same thing was going on, but it 285 00:24:10,000 --> 00:24:14,000 was just confusing, it didn't fit, and to some extent 286 00:24:14,000 --> 00:24:18,000 most people's minds were set by this paradigm, this central dogma as 287 00:24:18,000 --> 00:24:22,000 something that a true believer cannot doubt. And you had to have a 288 00:24:22,000 --> 00:24:27,000 flexible enough mind to be able to see that. 289 00:24:27,000 --> 00:24:33,000 And so this is an important piece of biology that hadn't been anticipated. 290 00:24:33,000 --> 00:24:39,000 And it can be quite remarkable. I'm just going to give you a couple 291 00:24:39,000 --> 00:24:45,000 of extreme examples. Well, not even extreme examples. 292 00:24:45,000 --> 00:24:51,000 But just show you how much non-coding information there can be. 293 00:24:51,000 --> 00:24:57,000 Factor VIII is a protein that plays a part in blood clotting. 294 00:24:57,000 --> 00:25:03,000 And the gene is 200 kilobase pairs. And the pre-mRNA is just a direct 295 00:25:03,000 --> 00:25:11,000 copy, so it's 200 kilobases. It's just a single strand so it's 296 00:25:11,000 --> 00:25:19,000 not a base pair. And the actual spliced mRNA when 297 00:25:19,000 --> 00:25:27,000 it's done is 10 kilobases. So that means that only 5% of the 298 00:25:27,000 --> 00:25:35,000 gene is coding information and 95% of that information gets thrown away 299 00:25:35,000 --> 00:25:42,000 when the RNA gets spliced. And an even more extreme example is a 300 00:25:42,000 --> 00:25:48,000 protein called dystrophin. This is what's affected in a human 301 00:25:48,000 --> 00:25:55,000 genetic disease known as Duchenne muscular dystrophy. 302 00:25:55,000 --> 00:26:06,000 In this case, the gene is two megabase 303 00:26:06,000 --> 00:26:17,000 pairs. So of course then the pre-mRNA is also two megabases but 304 00:26:17,000 --> 00:26:28,000 the mRNA is 16 kilobases. 
So in this case less than 1% of the 305 00:26:28,000 --> 00:26:40,000 gene has coding information for making a protein. 306 00:26:40,000 --> 00:26:44,000 There are a lot of interesting reasons as to why it would be like 307 00:26:44,000 --> 00:26:49,000 this. One is that things can evolve more rapidly 308 00:26:49,000 --> 00:26:54,000 sometimes because you have parts of proteins that are sort of like 309 00:26:54,000 --> 00:26:59,000 modules and evolution can probably connect them. 310 00:26:59,000 --> 00:27:03,000 In fact, it also provides ways of regulating because we now know there 311 00:27:03,000 --> 00:27:08,000 are alternative ways of splicing RNA. So you can take one RNA and then 312 00:27:08,000 --> 00:27:12,000 splice it in different ways in different cells and end up 313 00:27:12,000 --> 00:27:17,000 generating different proteins that were all encoded by one particular 314 00:27:17,000 --> 00:27:21,000 gene. And so it gives cells different kinds of regulatory 315 00:27:21,000 --> 00:27:26,000 strategies they can use. Now, the third sort of thing that 316 00:27:26,000 --> 00:27:31,000 came out that falls in this same kind of thing of people having their 317 00:27:31,000 --> 00:27:36,000 minds open and not fixed by the current understanding or bounded by 318 00:27:36,000 --> 00:27:41,000 the current understanding is the discovery that RNA can act as an 319 00:27:41,000 --> 00:27:46,000 enzyme. And I've already talked to you about that and I've told you it was 320 00:27:46,000 --> 00:27:51,000 a ribozyme, but it was discovered by Tom Cech. Tom is currently 321 00:27:51,000 --> 00:27:56,000 president of the Howard Hughes Medical Institute, 322 00:27:56,000 --> 00:28:02,000 but he did his post-doctoral work at MIT with Mary Lou Pardue. 323 00:28:02,000 --> 00:28:05,000 I'd been a post-doc at Berkeley when he was just finishing his 324 00:28:05,000 --> 00:28:09,000 graduate work, and I met him out there. 
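As a quick check on the two examples quoted above, the fraction of each gene that survives splicing is just the mRNA length divided by the gene length. The sketch below uses the round numbers from the lecture (a 200 kilobase gene with a 10 kilobase mRNA, and a two megabase gene with a 16 kilobase mRNA); `retained_percent` is a name invented for this illustration.

```python
def retained_percent(gene_kb, mrna_kb):
    """Percentage of the gene's length that remains in the spliced mRNA."""
    return 100.0 * mrna_kb / gene_kb


if __name__ == "__main__":
    examples = {
        "Factor VIII": (200, 10),    # gene and mRNA lengths in kilobases
        "dystrophin": (2000, 16),    # two megabases = 2000 kilobases
    }
    for name, (gene_kb, mrna_kb) in examples.items():
        kept = retained_percent(gene_kb, mrna_kb)
        print(f"{name}: {kept:.1f}% kept, {100 - kept:.1f}% spliced out")
```

With these inputs the calculation gives 5% retained for the 200 kilobase gene and 0.8% for the two megabase gene, in line with the "only 5%" and "less than 1%" figures.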
325 00:28:09,000 --> 00:28:13,000 And then he came to MIT to do his post-doc. And a year later I got a 326 00:28:13,000 --> 00:28:17,000 job, so we'd become friends there and stayed friends when we started here. 327 00:28:17,000 --> 00:28:21,000 So I had a pretty close link to this particular story. 328 00:28:21,000 --> 00:28:25,000 Here's a picture of Tom together with Phil. That's actually my wife 329 00:28:25,000 --> 00:28:29,000 right there who was in this picture. But Tom actually looks much more 330 00:28:29,000 --> 00:28:35,000 like that. He's very colorful, 331 00:28:35,000 --> 00:28:43,000 very fun, a very interesting person. But anyway, when Tom left MIT to 332 00:28:43,000 --> 00:28:51,000 take a faculty position at Boulder he was interested in trying to 333 00:28:51,000 --> 00:29:00,000 understand the biochemistry of RNA splicing. And so he went -- 334 00:29:00,000 --> 00:29:03,000 He did what a good scientist will do. They'll try and find an 335 00:29:03,000 --> 00:29:07,000 experimental system where the question they want to address is 336 00:29:07,000 --> 00:29:10,000 simple enough you can actually get an answer. There's a kind of way of 337 00:29:10,000 --> 00:29:14,000 doing science where you pick a system that's too complicated and 338 00:29:14,000 --> 00:29:17,000 you never actually get an answer. It sounds very important because 339 00:29:17,000 --> 00:29:21,000 you're working on something that's important but you cannot, 340 00:29:21,000 --> 00:29:24,000 you don't have the tools you need to get to the answer. 341 00:29:24,000 --> 00:29:28,000 So Tom wanted to work on the biochemistry of RNA splicing because 342 00:29:28,000 --> 00:29:32,000 that had just been discovered. And so he went to a small little 343 00:29:32,000 --> 00:29:37,000 tiny organism called Tetrahymena. 
And the reason he looked at that 344 00:29:37,000 --> 00:29:41,000 was because it had a ribosomal RNA, so it was an RNA that was made in 345 00:29:41,000 --> 00:29:46,000 great abundance within the organism. And it only had one of these 346 00:29:46,000 --> 00:29:51,000 non-coding regions. I'll tell you the words for these 347 00:29:51,000 --> 00:29:56,000 coding and non-coding. To me they're non-intuitive, 348 00:29:56,000 --> 00:30:01,000 but I guess you should know them. 349 00:30:01,000 --> 00:30:10,000 The coding region is called, the part that codes is called an 350 00:30:10,000 --> 00:30:19,000 exon and the non-coding part is called an intron. 351 00:30:19,000 --> 00:30:29,000 So, anyway, Tom worked on this organism because the pre-mRNA 352 00:30:29,000 --> 00:30:37,000 was basically this. Or the pre-RNA before the splicing 353 00:30:37,000 --> 00:30:45,000 looked like this. This was going to give this like 354 00:30:45,000 --> 00:30:52,000 that. He could get large quantities of this RNA, so he was all set to 355 00:30:52,000 --> 00:31:00,000 make extracts of the cells of this organism and then start cooking up 356 00:31:00,000 --> 00:31:08,000 this RNA substrate with all sorts of cell extracts. 357 00:31:08,000 --> 00:31:00,000 And then his plan was to purify the enzymes that did the RNA splicing. And so I first heard about this, Tom was working on this when he was here. And he went off to, I guess it was Denmark, to learn how 358 00:30:44,000 --> 00:30:36,000 to grow this organism. Then they were back and he was off 359 00:30:36,000 --> 00:30:29,000 at Boulder. And we used to play squash all the time. And whenever I got out to Boulder we'd try and get in a squash game. 360 00:30:40,000 --> 00:30:51,000 So I was out there at a meeting and we were sitting around in the locker 361 00:30:51,000 --> 00:31:03,000 room. And I said so how's the splicing biochemistry project 362 00:31:03,000 --> 00:31:14,000 going? 
Tom says, well, it's going OK, 363 00:31:14,000 --> 00:31:25,000 I guess. There's only one little problem, he says. 364 00:31:25,000 --> 00:31:37,000 The controls are splicing. Now, what he meant was if you were 365 00:31:37,000 --> 00:31:48,000 trying to add cell extract and get this thing to go what you would 366 00:31:48,000 --> 00:32:00,000 start out with is the RNA in a tube basically. 367 00:32:00,000 --> 00:32:04,000 And that would be your control. And then you'd start adding stuff 368 00:32:04,000 --> 00:32:08,000 to it and start looking for splicing. And what Tom was finding was that 369 00:32:08,000 --> 00:32:12,000 if you just took this RNA and let it sit in a test tube that the splicing 370 00:32:12,000 --> 00:32:17,000 happened without him putting anything in. And here he was 371 00:32:17,000 --> 00:32:21,000 all ready to find all the enzymes, the proteins that did it. And Tom 372 00:32:21,000 --> 00:32:25,000 did an absolutely gorgeous piece of science to prove that what was 373 00:32:25,000 --> 00:32:30,000 happening was the RNA was catalyzing its own splicing. 374 00:32:30,000 --> 00:32:33,000 And he had to work very, very hard to prove that it wasn't a 375 00:32:33,000 --> 00:32:37,000 contaminating protein. Remember we had this sort of 376 00:32:37,000 --> 00:32:41,000 discussion? We were talking about is DNA the genetic material and how 377 00:32:41,000 --> 00:32:45,000 would we know that it wasn't just a little tiny bit of something else in 378 00:32:45,000 --> 00:32:49,000 our DNA perhaps that was doing it. Tom had to go through pretty much a 379 00:32:49,000 --> 00:32:53,000 similar exercise, but this was one of these key 380 00:32:53,000 --> 00:32:57,000 insights that led to the proof that RNA could function as a catalyst, 381 00:32:57,000 --> 00:33:02,000 what we now know as a ribozyme. 
And I've shown you that we now sort 382 00:33:02,000 --> 00:33:08,000 of accept that the actual ribosome itself is a ribozyme and that the 383 00:33:08,000 --> 00:33:15,000 formation of the peptide bond, the thing that's the heart of all 384 00:33:15,000 --> 00:33:22,000 proteins, is catalyzed by a ribozyme, by RNA and not 385 00:33:22,000 --> 00:33:28,000 by a protein. OK. So the next topic that I want to 386 00:33:28,000 --> 00:33:35,000 turn to, which we've sort of already set up from this, is that if the 387 00:33:35,000 --> 00:33:42,000 information is all in DNA to begin with then if you make an RNA copy 388 00:33:42,000 --> 00:33:49,000 you're only taking a segment of that information at a time. 389 00:33:49,000 --> 00:33:55,000 And that gives the cells a lot of possibilities for regulating how 390 00:33:55,000 --> 00:34:01,000 they respond to the environment or just controlling what genes are 391 00:34:01,000 --> 00:34:07,000 expressed. And there are basically two kinds of strategies that are 392 00:34:07,000 --> 00:34:14,000 involved in these regulatory decisions. They can either be -- 393 00:34:14,000 --> 00:34:29,000 Can either be reversible changes. 394 00:34:29,000 --> 00:34:33,000 For example, a bacterium and a food source. If you're a bacterium and 395 00:34:33,000 --> 00:34:38,000 you've got enzymes that let you eat a hundred different kinds of food 396 00:34:38,000 --> 00:34:42,000 and you're in an environment where there's only one of them there, 397 00:34:42,000 --> 00:34:47,000 you're really wasting energy if you make the proteins to eat the other 398 00:34:47,000 --> 00:34:52,000 99. So you might guess that somehow evolution has selected for systems 399 00:34:52,000 --> 00:34:56,000 that have learned how to turn on and off the things they need to eat 400 00:34:56,000 --> 00:35:01,000 certain food sources depending on whether the food source 401 00:35:01,000 --> 00:35:06,000 is available. 
We only carry umbrellas when it 402 00:35:06,000 --> 00:35:10,000 rains. If you had to carry an umbrella and a snowsuit and a 403 00:35:10,000 --> 00:35:14,000 surfboard, everything all the time, it would slow you down in evolution. 404 00:35:14,000 --> 00:35:18,000 So the other type, which we've talked about as well when we talked 405 00:35:18,000 --> 00:35:22,000 about starting as a single cell and going up to the 10 to the 14 cells that make 406 00:35:22,000 --> 00:35:26,000 us up, is that many of those changes, as those cells go along and get 407 00:35:26,000 --> 00:35:31,000 progressively more specialized, need to be irreversible. 408 00:35:31,000 --> 00:35:36,000 And this is particularly important 409 00:35:36,000 --> 00:35:40,000 in development. We don't want a cell in our retina 410 00:35:40,000 --> 00:35:44,000 suddenly deciding it should be part of a heart and start to make a heart 411 00:35:44,000 --> 00:35:48,000 in the middle of your eye or something like that. 412 00:35:48,000 --> 00:35:52,000 So things in development tend to be once you're off you're off or once 413 00:35:52,000 --> 00:35:56,000 you're on you're on or something. And just to give you another little 414 00:35:56,000 --> 00:36:00,000 look at that picture I've shown you before of the nematode. 415 00:36:00,000 --> 00:36:03,000 And at the time, the first time I showed you this, 416 00:36:03,000 --> 00:36:07,000 I was just trying to emphasize that we could take the gene encoding 417 00:36:07,000 --> 00:36:11,000 green fluorescent protein and put it in anything and it would go green. 
418 00:36:11,000 --> 00:36:15,000 In this case, Barbara Meyer who is at Berkeley now but used to be my 419 00:36:15,000 --> 00:36:19,000 office-mate at MIT for many years, what she's done is she's taken that 420 00:36:19,000 --> 00:36:23,000 green fluorescent protein, the gene for that, and she's put it 421 00:36:23,000 --> 00:36:27,000 under the control of a regulatory system, a gene that is made to be 422 00:36:27,000 --> 00:36:31,000 expressed in the esophagus of the worm. 423 00:36:31,000 --> 00:36:35,000 And so even though that gene is present in all the cells of that 424 00:36:35,000 --> 00:36:39,000 organism, it's under the control of a system that usually permits the 425 00:36:39,000 --> 00:36:44,000 genes to be made that are needed for making esophagus but not in other 426 00:36:44,000 --> 00:36:48,000 parts of the body. So you probably didn't pick that 427 00:36:48,000 --> 00:36:53,000 part up then, but now sort of take another look at that same thing and see 428 00:36:53,000 --> 00:36:57,000 something different. So how do we learn about gene 429 00:36:57,000 --> 00:37:03,000 regulation? The key work, like so many of these 430 00:37:03,000 --> 00:37:09,000 things, started kind of inauspiciously, 431 00:37:09,000 --> 00:37:16,000 if you will. There were two French scientists, Jacques Monod, 432 00:37:16,000 --> 00:37:23,000 who was a biochemist, and Francois Jacob, who was a geneticist. 433 00:37:23,000 --> 00:37:30,000 And they were working on the metabolism of lactose by E. coli. 434 00:37:30,000 --> 00:37:36,000 Lactose is galactose, beta 1,4 glucose. And you don't 435 00:37:36,000 --> 00:37:42,000 have to know exactly the structure. You can just remember there were a 436 00:37:42,000 --> 00:37:49,000 lot of different hydroxyls, and that was one particular linkage. 437 00:37:49,000 --> 00:37:55,000 And there's an enzyme that cleaves this into galactose and glucose. 
438 00:37:55,000 --> 00:38:01,000 And this can go right into glycolysis and make energy 439 00:38:01,000 --> 00:38:07,000 for the organism. And the galactose undergoes a couple 440 00:38:07,000 --> 00:38:12,000 of different transformations, and it can get in there as well. 441 00:38:12,000 --> 00:38:17,000 But in order to get at the energy that's in those carbohydrates, 442 00:38:17,000 --> 00:38:22,000 this linkage has to be broken. And it was broken by an enzyme called 443 00:38:22,000 --> 00:38:27,000 beta-galactosidase. That's a protein that's able to 444 00:38:27,000 --> 00:38:32,000 catalyze the cleavage of those two sugars. 445 00:38:32,000 --> 00:38:39,000 That's what Jacques Monod and Francois Jacob were studying. 446 00:38:39,000 --> 00:38:46,000 They were helped out in this exercise. I guess part of the 447 00:38:46,000 --> 00:38:53,000 reason they got going on this was people had noticed for many years 448 00:38:53,000 --> 00:39:00,000 that if you grew E. coli in glucose there was no 449 00:39:00,000 --> 00:39:08,000 beta-gal. I'm going to abbreviate this as 450 00:39:08,000 --> 00:39:16,000 beta-gal just so I won't have to keep writing the same thing. 451 00:39:16,000 --> 00:39:25,000 But if they grew E. coli in lactose beta-gal was present. 452 00:39:25,000 --> 00:39:34,000 And they had to be able to assay for this enzyme. And they used -- 453 00:39:34,000 --> 00:39:43,000 There were standard types of 454 00:39:43,000 --> 00:39:47,000 biochemical assays you could use. But some chemists helped 455 00:39:47,000 --> 00:39:52,000 design a very clever kind of substrate that helped them, 456 00:39:52,000 --> 00:39:57,000 that could be used in these kinds of studies, and I'll show you one of 457 00:39:57,000 --> 00:40:01,000 them. What this enzyme really looks at is it looks at, let's 458 00:40:01,000 --> 00:40:06,000 see, galactose. 
What it sees is sort of the 459 00:40:06,000 --> 00:40:10,000 galactose side of this linkage, and then it reaches in and catalyzes 460 00:40:10,000 --> 00:40:14,000 the cleavage of what's joined to it. And it turns out not to be specific 461 00:40:14,000 --> 00:40:18,000 for whether glucose is on the other side. It can accept substrates that 462 00:40:18,000 --> 00:40:22,000 have other things as well. So some chemists made some variants 463 00:40:22,000 --> 00:40:26,000 like this. This is a compound that's commonly known as X-gal. 464 00:40:26,000 --> 00:40:31,000 If you talk about it in the lab, it's got a longer chemical name. 465 00:40:31,000 --> 00:40:36,000 But what happens if beta-galactosidase is there, 466 00:40:36,000 --> 00:40:41,000 it's able to cleave this substrate so you get galactose, 467 00:40:41,000 --> 00:40:46,000 which is colorless. But you also get just X, and this is colored, 468 00:40:46,000 --> 00:40:51,000 whereas up here this original material is also colorless. 469 00:40:51,000 --> 00:40:56,000 So this is very convenient because if you use a substrate such as this 470 00:40:56,000 --> 00:41:02,000 you could put the cells on a plate with this indicator. 471 00:41:02,000 --> 00:41:05,000 And if they are colored, and the color is blue, you'd know 472 00:41:05,000 --> 00:41:09,000 they were making beta-galactosidase. And if you don't see a color, you 473 00:41:09,000 --> 00:41:13,000 know they're not. There are a variety of ways of 474 00:41:13,000 --> 00:41:17,000 assaying for this enzyme. With that I'm just trying to give 475 00:41:17,000 --> 00:41:21,000 you a little bit of flavor of one of the different ways that they could 476 00:41:21,000 --> 00:41:25,000 assay for it. Now, one of the issues was it looked as 477 00:41:25,000 --> 00:41:29,000 though E. coli didn't have any beta-galactosidase activity if 478 00:41:29,000 --> 00:41:33,000 lactose was absent when growing in glucose. 
479 00:41:33,000 --> 00:41:36,000 And they made it if lactose was present. Well, 480 00:41:36,000 --> 00:41:40,000 that would be kind of what you would expect evolution would have figured 481 00:41:40,000 --> 00:41:44,000 out how to do, only make the enzyme for 482 00:41:44,000 --> 00:41:48,000 metabolizing lactose if the lactose is present, but they had to figure 483 00:41:48,000 --> 00:41:52,000 out what the molecular basis of this was. And one of the possibilities 484 00:41:52,000 --> 00:41:56,000 was that the protein was made but it was all sort of unfolded, 485 00:41:56,000 --> 00:42:00,000 and when the substrate came in then it folded all around it and then it 486 00:42:00,000 --> 00:42:03,000 could cleave it. Or another possibility, 487 00:42:03,000 --> 00:42:07,000 which would be the kind we're talking about now, 488 00:42:07,000 --> 00:42:11,000 is the protein is not made until the lactose is present, 489 00:42:11,000 --> 00:42:14,000 and then it makes it new. So they had to figure out, 490 00:42:14,000 --> 00:42:18,000 between these two, which of these two was true. When you see the 491 00:42:18,000 --> 00:42:22,000 lactose present, is it just that beta-galactosidase is 492 00:42:22,000 --> 00:42:26,000 already made but it's inactive, or is it being made de novo when you 493 00:42:26,000 --> 00:42:32,000 add the lactose? So what they did was they grew cells 494 00:42:32,000 --> 00:42:40,000 in glucose plus radioactive C14-leucine for a long time. 495 00:42:40,000 --> 00:42:53,000 So all the proteins -- 496 00:42:53,000 --> 00:42:56,000 -- were radioactive. And once they got, that's going for 497 00:42:56,000 --> 00:43:00,000 a long time. So every protein being made is radioactive. 498 00:43:00,000 --> 00:43:07,000 Then they add excess unlabeled leucine. 
So this means that from 499 00:43:07,000 --> 00:43:14,000 now on any new proteins that are made will not be radioactive because 500 00:43:14,000 --> 00:43:21,000 you're just going to swamp out any radioactive stuff with this. 501 00:43:21,000 --> 00:43:29,000 And they added glucose, excuse me, now they added lactose to the 502 00:43:29,000 --> 00:43:35,000 cells. And then they isolated the beta-gal 503 00:43:35,000 --> 00:43:40,000 enzyme. It was actually pretty easy to do. It's a huge enzyme and it's 504 00:43:40,000 --> 00:43:45,000 a tetramer. So very large. Even in those days it was fairly 505 00:43:45,000 --> 00:43:50,000 easy to isolate this enzyme. And then they looked to see is it 506 00:43:50,000 --> 00:43:55,000 radioactive? If it's radioactive it was there all along and it's 507 00:43:55,000 --> 00:44:00,000 refolded to become the active enzyme. 508 00:44:00,000 --> 00:44:05,000 Or if it had been made only after lactose was added then it would be made de novo in 509 00:44:05,000 --> 00:44:10,000 response to it. And what they found was that it was 510 00:44:10,000 --> 00:44:19,000 non-radioactive. 511 00:44:19,000 --> 00:44:32,000 Which implied that it was made after you added the lactose. 512 00:44:32,000 --> 00:44:38,000 So they knew then that they were studying a system in which a protein 513 00:44:38,000 --> 00:44:44,000 was only made after the cells had experienced a particular growth 514 00:44:44,000 --> 00:44:50,000 substrate. And so a lot of work went into figuring out how this 515 00:44:50,000 --> 00:44:56,000 system worked. Let's see. We're a little short on 516 00:44:56,000 --> 00:45:01,000 time. So I'll tell you what I'll do. 517 00:45:01,000 --> 00:45:06,000 I'll tell you, I'll just put out quickly the mechanics of what they 518 00:45:06,000 --> 00:45:10,000 saw, and we'll start in on the regulation on how this works. 519 00:45:10,000 --> 00:45:15,000 And some of you may be able to figure it out. 
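The reasoning behind the C14-leucine labeling experiment described above can be sketched as a tiny Python model. This is only an illustration: the function name, the two hypothesis labels, and the True/False bookkeeping are assumptions made for the sketch, not part of the original experimental protocol.

```python
# Hypothetical sketch of the C14-leucine labeling logic described above.
# The names and the simple boolean model are illustrative assumptions.

def beta_gal_is_radioactive(hypothesis: str) -> bool:
    """Predict whether the isolated beta-gal carries the C14 label.

    Timeline from the lecture:
      1. Grow in glucose + C14-leucine: every protein made now is labeled.
      2. Add excess unlabeled leucine: proteins made from now on are unlabeled.
      3. Add lactose, then isolate beta-gal and check for radioactivity.
    """
    if hypothesis == "preexisting_refolds":
        # Enzyme was made during step 1 and only becomes active in step 3.
        made_during_labeling = True
    elif hypothesis == "made_de_novo":
        # Enzyme is synthesized only after lactose arrives in step 3.
        made_during_labeling = False
    else:
        raise ValueError(f"unknown hypothesis: {hypothesis}")
    return made_during_labeling


# The lecture reports that the isolated beta-gal was non-radioactive.
observed_radioactive = False

consistent = [
    h for h in ("preexisting_refolds", "made_de_novo")
    if beta_gal_is_radioactive(h) == observed_radioactive
]
print(consistent)  # ['made_de_novo']
```

Only the de novo hypothesis predicts an unlabeled enzyme, which matches the conclusion drawn in the lecture.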
520 00:45:15,000 --> 00:45:20,000 What we now know is that the gene that encodes beta-galactosidase is 521 00:45:20,000 --> 00:45:25,000 in a stretch of DNA that's pretty interesting. It's got three genes. 522 00:45:25,000 --> 00:45:30,000 There's the gene lacZ. This is the gene for beta-galactosidase. 523 00:45:30,000 --> 00:45:37,000 And two other genes called lacY and lacA. There's a promoter. 524 00:45:37,000 --> 00:45:45,000 That's a start signal for transcription. 525 00:45:45,000 --> 00:45:52,000 Remember that? So there's a sequence here that 526 00:45:52,000 --> 00:46:00,000 says start transcription. Down here is a terminator. 527 00:46:00,000 --> 00:46:06,000 Another word written in the nucleic acid alphabet that means stop making 528 00:46:06,000 --> 00:46:12,000 mRNA. And there is one long mRNA, as you can see, that has the 529 00:46:12,000 --> 00:46:19,000 peculiarity of encoding three different genes. 530 00:46:19,000 --> 00:46:25,000 So if you have more than one gene in a single message then that's 531 00:46:25,000 --> 00:46:40,000 called an operon. 532 00:46:40,000 --> 00:46:46,000 You've got one mRNA. But, in any case, so whenever 533 00:46:46,000 --> 00:46:52,000 beta-galactosidase is being made then RNA has to start being made 534 00:46:52,000 --> 00:46:58,000 here, goes to there. And we won't worry about the 535 00:46:58,000 --> 00:47:03,000 functions of these other two genes. But, as you might guess from the way 536 00:47:03,000 --> 00:47:08,000 evolution has selected for it, they have related activities to what 537 00:47:08,000 --> 00:47:12,000 beta-galactosidase does. And for bacteria it's a very 538 00:47:12,000 --> 00:47:17,000 efficient way to control the expression of a bunch of genes at 539 00:47:17,000 --> 00:47:22,000 once. Then there was another gene up here known as lacI that had a 540 00:47:22,000 --> 00:47:27,000 promoter and a terminator, and it made an mRNA as well. 
541 00:47:27,000 --> 00:47:34,000 And that mRNA encoded a protein that's known as the lac repressor. 542 00:47:34,000 --> 00:47:42,000 And what that lac repressor does, it's a protein that has the ability 543 00:47:42,000 --> 00:47:49,000 to recognize a very, very specific DNA sequence and bind 544 00:47:49,000 --> 00:47:57,000 there. And I'm just going to kind of blow up this part of the thing. 545 00:47:57,000 --> 00:48:04,000 So what we have here is the, this is the promoter here. And it 546 00:48:04,000 --> 00:48:11,000 happens that the binding sequence -- 547 00:48:11,000 --> 00:48:22,000 -- for lac repressor overlaps with 548 00:48:22,000 --> 00:48:32,000 the promoter. Weird, right? Maybe not. 549 00:48:32,000 --> 00:48:36,000 So I'll tell you, well, you can think about this over 550 00:48:36,000 --> 00:48:41,000 the weekend, if you haven't run into this system before. 551 00:48:41,000 --> 00:48:45,000 So this gene gets made all the time. So this protein gets made all the 552 00:48:45,000 --> 00:48:50,000 time. What does that protein do if it's just like this? 553 00:48:50,000 --> 00:48:55,000 Its job in life is to look for this sequence and bind to it. 554 00:48:55,000 --> 00:49:00,000 If it binds to it, it covers up the promoter. 555 00:49:00,000 --> 00:49:04,000 And the beta-galactosidase gene is not expressed because the cell 556 00:49:04,000 --> 00:49:09,000 cannot make mRNA. So this may seem a little obscure, 557 00:49:09,000 --> 00:49:14,000 but there's something very important here. Now the conditionality on 558 00:49:14,000 --> 00:49:19,000 whether this gene is expressed or not is controlled by a protein, 559 00:49:19,000 --> 00:49:23,000 right? It's controlled by this lac repressor. If it's on there the 560 00:49:23,000 --> 00:49:28,000 gene will not be made. And if it's off the gene, now you 561 00:49:28,000 --> 00:49:33,000 can make it. There's a promoter and the RNA 562 00:49:33,000 --> 00:49:38,000 polymerase will see it. 
And so you've learned something 563 00:49:38,000 --> 00:49:44,000 about proteins. They can bind various things. 564 00:49:44,000 --> 00:49:49,000 And so what lac repressor has, it's got a little binding site that 565 00:49:49,000 --> 00:49:54,000 lactose is able to bind to and change the conformation of the lac 566 00:49:54,000 --> 00:49:59,000 repressor. So why don't you take those pieces of information and see 567 00:49:59,000 --> 00:50:05,000 if you can figure out how the circuitry goes. 568 00:50:05,000 --> 00:50:11,000 Yeah? Did I do something wrong? Sorry. Oh, sorry. Excuse me. Yes, 569 00:50:11,000 --> 00:50:17,000 Z-Y-A. Excuse me. OK? We'll walk through that on 570 00:50:17,000 --> 00:50:23,000 Monday, but focus on the fact that if the repressor is there and 571 00:50:23,000 --> 00:50:30,000 lactose isn't, it binds to this sequence. 572 00:50:30,000 --> 00:50:34,000 The repressor is made all the time, but this repressor is something that 573 00:50:34,000 --> 00:50:39,000 can tell you whether lactose is there or not. So you can put the 574 00:50:39,000 --> 00:50:42,000 circuit together, OK?
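One way to put the circuit together is as a pair of small Python functions. This is a hedged sketch rather than the worked answer from the course: the function names are made up for illustration, and it assumes that the conformational change caused by lactose binding makes the repressor let go of its DNA sequence. It also ignores any effect of glucose on the system.

```python
# Hypothetical sketch of the lac regulatory logic; all names are illustrative.

def repressor_bound_to_dna(lactose_present: bool) -> bool:
    # The repressor is made all the time. With no lactose it binds its
    # sequence. Assumption: lactose binding changes its conformation so
    # that it no longer holds on to the DNA.
    return not lactose_present


def lac_operon_transcribed(lactose_present: bool) -> bool:
    # The repressor's binding sequence overlaps the promoter, so a bound
    # repressor covers the promoter and RNA polymerase cannot make mRNA.
    promoter_covered = repressor_bound_to_dna(lactose_present)
    return not promoter_covered


for lactose in (False, True):
    print(f"lactose={lactose} -> lacZ-Y-A mRNA made: "
          f"{lac_operon_transcribed(lactose)}")
```

Under these assumptions the operon is off when lactose is absent and on when it is present, which matches the earlier observation that beta-gal only appears when cells are grown in lactose.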