1 00:00:01,000 --> 00:00:06,000 OK, so what I'd like to do today is pick up where we left 2 00:00:06,000 --> 00:00:12,000 off last time, 3 00:00:12,000 --> 00:00:16,000 with respect to how this genetic material actually functions. 4 00:00:16,000 --> 00:00:20,000 We discussed last time the 5 00:00:20,000 --> 00:00:25,000 experiments that identified DNA as the fundamental genetic material, 6 00:00:25,000 --> 00:00:30,000 the transforming principle. We discussed the eventual work 7 00:00:30,000 --> 00:00:33,000 by Crick and Watson on the structure of DNA as a double helix. 8 00:00:33,000 --> 00:00:37,000 We mentioned why that was so tremendously important, 9 00:00:37,000 --> 00:00:41,000 because it contained within it in principle the secret of replication, 10 00:00:41,000 --> 00:00:45,000 namely two strands, each of which contained the full information, 11 00:00:45,000 --> 00:00:48,000 and therefore each of which could in principle serve as a template for 12 00:00:48,000 --> 00:00:52,000 making the other strand. And that, after all, is the big 13 00:00:52,000 --> 00:00:56,000 issue about life: how do you, in fact, copy life? And then, I 14 00:00:56,000 --> 00:01:00,000 mentioned briefly these experiments 15 00:01:00,000 --> 00:01:04,000 by these post-docs, Matt Meselson and Frank Stahl, about 16 00:01:04,000 --> 00:01:09,000 50 years ago to demonstrate that the semi-conservative model of DNA 17 00:01:09,000 --> 00:01:13,000 replication was right by virtue of actually labeling DNA during the 18 00:01:13,000 --> 00:01:18,000 course of its replication in one generation, and demonstrating that 19 00:01:18,000 --> 00:01:22,000 DNA actually changed in its density when you added in an isotope of 20 00:01:22,000 --> 00:01:27,000 nitrogen. And, it changed in its density in such a 21 00:01:27,000 --> 00:01:31,000 way as to be intermediate between what you'd expect from 22 00:01:31,000 --> 00:01:35,000 heavy-heavy and light-light. You have the intermediate.
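To make the density argument just described concrete, here is a minimal sketch of the semi-conservative bookkeeping. It assumes the cells start with both strands heavy ("H") and then replicate in light medium, so every newly made strand is light ("L"); the names `replicate` and `density_classes` are illustrative only and are not from the lecture.

```python
from collections import Counter

def replicate(duplexes):
    """One round of semi-conservative replication in light medium.

    Each duplex is a pair of strand labels. The two old strands separate,
    and each one is paired with a newly synthesized light ("L") strand.
    """
    next_generation = []
    for strand_a, strand_b in duplexes:
        next_generation.append((strand_a, "L"))
        next_generation.append((strand_b, "L"))
    return next_generation

def density_classes(duplexes):
    """Count duplexes by composition: 'HH', 'HL' (intermediate) or 'LL'."""
    return Counter("".join(sorted(duplex)) for duplex in duplexes)

generation_0 = [("H", "H")]             # fully heavy starting DNA (assumption)
generation_1 = replicate(generation_0)  # after one round of copying
generation_2 = replicate(generation_1)  # after a second round

print(density_classes(generation_1))
print(density_classes(generation_2))
```

Under these assumptions, every duplex after one generation is a heavy/light hybrid, which is the intermediate density band; a second generation would give a mix of hybrid and fully light duplexes.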
23 00:01:35,000 --> 00:01:38,000 So, that was all good experimental confirmation that this model was 24 00:01:38,000 --> 00:01:41,000 probably right. But now, how does it really work? 25 00:01:41,000 --> 00:01:44,000 After all the excitement calms down for a moment you say, 26 00:01:44,000 --> 00:01:47,000 OK, that's great. We now know in principle it's there, 27 00:01:47,000 --> 00:01:50,000 but what actually goes on? How is DNA really replicated? 28 00:01:50,000 --> 00:01:53,000 How is it really read out into information? How does it really, 29 00:01:53,000 --> 00:01:56,000 as Archibald Garrod noted, and as Beadle and Tatum noted, 30 00:01:56,000 --> 00:02:00,000 how does it really make protein as well? 31 00:02:00,000 --> 00:02:05,000 How does it encode the instructions for that? Well, 32 00:02:05,000 --> 00:02:10,000 that was what was on people's minds in the late '50s. 33 00:02:10,000 --> 00:02:15,000 And, it was Francis Crick who was the real intellectual thinker about 34 00:02:15,000 --> 00:02:20,000 this. And, the eventual synthesis that you guys all know, 35 00:02:20,000 --> 00:02:25,000 because, again, all this stuff gets taught in elementary school these 36 00:02:25,000 --> 00:02:30,000 days, was encapsulated in the central dogma of molecular biology, 37 00:02:30,000 --> 00:02:36,000 which I will summarize here diagrammatically. 38 00:02:36,000 --> 00:02:43,000 The DNA is replicated to make copies of DNA. 39 00:02:43,000 --> 00:02:49,000 It's read out into the intermediate RNA, and then it is translated into 40 00:02:49,000 --> 00:02:56,000 protein. This process is called translation. This process is called 41 00:02:56,000 --> 00:03:01,000 transcription. And this process is called replication. 42 00:03:01,000 --> 00:03:04,000 And what I'd like to do is go into some detail today about how each of 43 00:03:04,000 --> 00:03:07,000 these processes works.
Now, at the beginning, 44 00:03:07,000 --> 00:03:11,000 when people were trying to patch this together, 45 00:03:11,000 --> 00:03:14,000 it wasn't as obvious as it is to you today, that DNA goes to RNA, 46 00:03:14,000 --> 00:03:17,000 goes to protein. And, in fact, it was a real struggle to figure out 47 00:03:17,000 --> 00:03:20,000 what this RNA stuff was doing in the middle, how it could possibly give 48 00:03:20,000 --> 00:03:23,000 rise to protein. I want to talk about some of that. 49 00:03:23,000 --> 00:03:26,000 Let me briefly mention, though, Francis Crick's term, 50 00:03:26,000 --> 00:03:30,000 the central dogma, because it sometimes 51 00:03:30,000 --> 00:03:33,000 gets criticized, the word dogma there, as being like 52 00:03:33,000 --> 00:03:37,000 religious belief, as if molecular biologists treat it in this way. 53 00:03:37,000 --> 00:03:41,000 I've read a couple of social scientists who sort of say, 54 00:03:41,000 --> 00:03:45,000 dogma. In fact, Francis Crick deliberately named this the central 55 00:03:45,000 --> 00:03:48,000 dogma because he said there was no proof for it at the time it was put 56 00:03:48,000 --> 00:03:52,000 forward. He put it forward with that word precisely to emphasize 57 00:03:52,000 --> 00:03:56,000 that this was a working guess. It was merely a matter of 58 00:03:56,000 --> 00:04:00,000 belief; this is sort of how they were putting together the pieces. 59 00:04:00,000 --> 00:04:04,000 And it was really a question of demonstrating how all these pieces 60 00:04:04,000 --> 00:04:08,000 work. We still call it the central dogma, but it's now, 61 00:04:08,000 --> 00:04:12,000 of course, extraordinarily well established. Let's look at this 62 00:04:12,000 --> 00:04:16,000 first piece. DNA is replicated. All right, so Meselson and Stahl 63 00:04:16,000 --> 00:04:20,000 tell us that, yeah, the DNA would look like that: the new 64 00:04:20,000 --> 00:04:24,000 strand, the old strand, all that.
How would you really 65 00:04:24,000 --> 00:04:28,000 demonstrate DNA replication? If you wanted to show me that DNA 66 00:04:28,000 --> 00:04:32,000 replication really happens, this DNA goes to DNA, 67 00:04:32,000 --> 00:04:38,000 that somehow we had to take a double strand of DNA, 68 00:04:38,000 --> 00:04:43,000 and it gives rise to, it's one thing to show this in a 69 00:04:43,000 --> 00:04:49,000 bacterium by adding the nitrogen and all that. The way to really prove 70 00:04:49,000 --> 00:04:54,000 this was to be in a test tube. In vitro, reconstitute for me DNA 71 00:04:54,000 --> 00:05:00,000 replication. Show me that in a cell free system, 72 00:05:00,000 --> 00:05:05,000 you can take DNA, and you can copy it as you would 73 00:05:05,000 --> 00:05:10,000 expect according to the Crick Watson model here. Well, 74 00:05:10,000 --> 00:05:15,000 that is what Arthur Kornberg set out to do. Arthur Kornberg was a 75 00:05:15,000 --> 00:05:20,000 biochemist, and so his interest was crack open the cell, 76 00:05:20,000 --> 00:05:25,000 and purify an enzyme that was able to copy DNA. Now, 77 00:05:25,000 --> 00:05:30,000 how do you do that? What cells should you pick? 78 00:05:30,000 --> 00:05:33,000 Sorry? Why E coli? What a bacteria? It's simple, 79 00:05:33,000 --> 00:05:37,000 exactly. Good answer. You can grow up a lot of it, 80 00:05:37,000 --> 00:05:41,000 and presumably, if this DNA replication thing is right, 81 00:05:41,000 --> 00:05:45,000 it will apply to any organism. So, we'll go with E coli. So, 82 00:05:45,000 --> 00:05:49,000 what do you do? You just crack open a cell and purify components, 83 00:05:49,000 --> 00:05:53,000 and throw them in a test tube, and look for DNA synthesis? Well, 84 00:05:53,000 --> 00:05:57,000 you've got to put something in the test tube. What should we put in 85 00:05:57,000 --> 00:06:02,000 the test tube? Sorry? Nucleotides, because we think that 86 00:06:02,000 --> 00:06:08,000 this is going to be made out of nucleotides. 
So, 87 00:06:08,000 --> 00:06:14,000 we'd better add some nucleotides to our test tube. 88 00:06:14,000 --> 00:06:20,000 So, actually, deoxynucleotides: we'll add some dATP, dCTP, dGTP, and 89 00:06:20,000 --> 00:06:26,000 dTTP, the deoxynucleotide triphosphates, altogether 90 00:06:26,000 --> 00:06:33,000 known as the dNTPs. OK, that's good. 91 00:06:33,000 --> 00:06:39,000 So, we're going to take different fractions of the cell. 92 00:06:39,000 --> 00:06:45,000 We'll add them here. We'll add some nucleotides, and what else should we 93 00:06:45,000 --> 00:06:51,000 add? Well, if we were going to copy DNA, maybe we ought to put in a DNA 94 00:06:51,000 --> 00:06:57,000 strand. Let's put in a DNA template. So, let's put in a template 95 00:06:57,000 --> 00:07:02,000 strand of DNA that we'll copy, 96 00:07:02,000 --> 00:07:07,000 here we go, and we've got our nucleotides floating around here. 97 00:07:07,000 --> 00:07:12,000 And, here's our template strand, a single strand of DNA, 98 00:07:12,000 --> 00:07:17,000 and now we add enzymes, and we hope that it's going to 99 00:07:17,000 --> 00:07:22,000 somehow copy the DNA. Now, it turns out that that's a 100 00:07:22,000 --> 00:07:27,000 little bit optimistic because in order to copy the DNA, 101 00:07:27,000 --> 00:07:32,000 and I think Kornberg had this insight, 102 00:07:32,000 --> 00:07:37,000 it's helpful to give it a start. So, instead of just adding a single 103 00:07:37,000 --> 00:07:43,000 template strand, he also added a short complementary 104 00:07:43,000 --> 00:07:48,000 primer strand with the hope that he would be able to purify an enzyme, 105 00:07:48,000 --> 00:07:54,000 which even if it couldn't manage to start the synthesis of DNA, 106 00:07:54,000 --> 00:08:00,000 would be able to extend the synthesis of DNA. 107 00:08:00,000 --> 00:08:04,000 That's a reasonable thing. Let's not ask for it all at once. 108 00:08:04,000 --> 00:08:08,000 Maybe it won't be a single fraction.
Maybe multiple enzymes would be 109 00:08:08,000 --> 00:08:12,000 needed to get going. So, he needed a primer strand, 110 00:08:12,000 --> 00:08:17,000 a template strand, and some nucleotides. And then he added 111 00:08:17,000 --> 00:08:21,000 fractions, and he looked to see whether he could get incorporation 112 00:08:21,000 --> 00:08:25,000 of DNA. So now, let's look at this a little more 113 00:08:25,000 --> 00:08:30,000 closely. The primer strand goes like this. Five prime, ah, 114 00:08:30,000 --> 00:08:35,000 This direction is going to matter a lot, I told you. 115 00:08:35,000 --> 00:08:40,000 Phosphate T, phosphate A, phosphate C, phosphate G, phosphate 116 00:08:40,000 --> 00:08:46,000 T, phosphate A, stop there. Template strand, 117 00:08:46,000 --> 00:08:51,000 the complement to that , will start in the opposite direction. 118 00:08:51,000 --> 00:08:57,000 These are anti-parallel. What matches the T: A. Keep 119 00:08:57,000 --> 00:09:03,000 going: T, G, C, A, T, and phosphate, 120 00:09:03,000 --> 00:09:09,000 phosphate, phosphate, phosphate, phosphate; I'll stop 121 00:09:09,000 --> 00:09:15,000 writing the phosphates in a while. Let's say T, A, G, G, C, etc. This 122 00:09:15,000 --> 00:09:21,000 is the five prime end. That is the three prime end, 123 00:09:21,000 --> 00:09:27,000 OK? And, this one will go on further, let's say. 124 00:09:27,000 --> 00:09:32,000 All right, so what is the enzyme that Kornberg hopes to find going to 125 00:09:32,000 --> 00:09:38,000 do? What's it going to add to the strand? It's going to add an A. 126 00:09:38,000 --> 00:09:43,000 All right, it wants to put in an A here. So, it's going to take a 127 00:09:43,000 --> 00:09:49,000 triphosphate, and it's going to catalyze the addition of a 128 00:09:49,000 --> 00:09:54,000 triphosphate to the growing end of this DNA chain, 129 00:09:54,000 --> 00:10:00,000 and which is its growing end? The three prime end 130 00:10:00,000 --> 00:10:04,000 of the chain there, right? 
It's adding it to the three 131 00:10:04,000 --> 00:10:08,000 prime carbon there. And, when it does that, 132 00:10:08,000 --> 00:10:12,000 where is it going to get the energy for this chemical 133 00:10:12,000 --> 00:10:17,000 reaction here? It's going to get it from the 134 00:10:17,000 --> 00:10:21,000 dehydration synthesis and the breaking of this triphosphate bond, 135 00:10:21,000 --> 00:10:25,000 which is a high-energy bond. You'll take off your inorganic 136 00:10:25,000 --> 00:10:30,000 pyrophosphate and you'll add in an A. That's it. 137 00:10:30,000 --> 00:10:34,000 Then, it will go off and it'll look for, what, a T, 138 00:10:34,000 --> 00:10:38,000 a triphosphate with T, dTTP, and then dCTP, etc. 139 00:10:38,000 --> 00:10:42,000 And it adds them in. This enzyme, this hypothetical enzyme, that can 140 00:10:42,000 --> 00:10:47,000 polymerize DNA like that is called polymerase. It's all very simple 141 00:10:47,000 --> 00:10:51,000 stuff. This is DNA polymerase. OK, and the nomenclature here makes 142 00:10:51,000 --> 00:10:55,000 tremendous sense. This is called DNA polymerase. 143 00:10:55,000 --> 00:11:00,000 Anyway, Kornberg, by a lot of work, isolated 144 00:11:00,000 --> 00:11:05,000 DNA polymerase, and was able to demonstrate that it 145 00:11:05,000 --> 00:11:10,000 could in fact catalyze this reaction. This was incredibly exciting. 146 00:11:10,000 --> 00:11:15,000 He got a Nobel Prize for this amongst other things, 147 00:11:15,000 --> 00:11:20,000 but he really demonstrated that there were proteins that could copy 148 00:11:20,000 --> 00:11:25,000 DNA according to this double helical model for replication. 149 00:11:25,000 --> 00:11:30,000 I call your attention to the fact that the replication goes 150 00:11:30,000 --> 00:11:36,000 five prime to three prime always, forever, all the time. This is 151 00:11:36,000 --> 00:11:42,000 universal.
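The primer extension reaction described above can be sketched in a few lines. This is only an illustration of the base-pairing logic, not of the enzyme's chemistry: it assumes the template is written three prime to five prime, the primer five prime to three prime, and that the primer pairs with the start of the template. The function name `extend_primer` is made up for this sketch; the sequences are the ones written on the board.

```python
# Watson-Crick pairing used to choose each incoming dNTP.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_primer(template_3to5, primer_5to3):
    """Extend a primer along a template, adding bases only at the 3' end.

    template_3to5: template strand written 3' -> 5'
    primer_5to3:   primer strand written 5' -> 3', paired with the start
                   of the template (antiparallel)
    Returns the extended strand written 5' -> 3'.
    """
    # The primer must already be base-paired with the template.
    for primer_base, template_base in zip(primer_5to3, template_3to5):
        if COMPLEMENT[template_base] != primer_base:
            raise ValueError("primer does not pair with the template")

    product = list(primer_5to3)
    # Walk along the unpaired part of the template; each step adds the
    # complementary nucleotide to the growing 3' end.
    for template_base in template_3to5[len(primer_5to3):]:
        product.append(COMPLEMENT[template_base])
    return "".join(product)

# Primer 5'-TACGTA-3' on template 3'-ATGCATTAGGC-5'.
print(extend_primer("ATGCATTAGGC", "TACGTA"))
```

The first three bases added are A, then T, then C, matching the order in which the lecture has the enzyme pick up an A, then dTTP, then dCTP.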
No one has ever found a DNA polymerization system in nature 152 00:11:42,000 --> 00:11:48,000 where it goes the other way. And, why would that be? This is 153 00:11:48,000 --> 00:11:54,000 just a digression. But tell me why that would be? 154 00:11:54,000 --> 00:12:00,000 Let's take our strand here, T, G, C, A, 155 00:12:00,000 --> 00:12:10,000 T, T, A, G, C, G, T, why not go this way? 156 00:12:10,000 --> 00:12:20,000 Why not go, let's say, A, G, C, G. Let's see, what base should 157 00:12:20,000 --> 00:12:30,000 we put in? We'll take our triphosphate T, right? 158 00:12:30,000 --> 00:12:38,000 We'll put that in. Let's see, where are we going to get 159 00:12:38,000 --> 00:12:42,000 the triphosphate bond; where are we going to get the energy? 160 00:12:42,000 --> 00:12:46,000 The triphosphate's on the wrong end. Oh, that's not a problem because 161 00:12:46,000 --> 00:12:51,000 when we put this G in, it must be that its triphosphate was 162 00:12:51,000 --> 00:12:55,000 still there, right? So, now we'll take the next one, 163 00:12:55,000 --> 00:13:00,000 a triphosphate T, and now why don't we just 164 00:13:00,000 --> 00:13:03,000 carry out the polymerization using the triphosphate bond, 165 00:13:03,000 --> 00:13:06,000 the energy from the triphosphate bond, on a growing chain going in 166 00:13:06,000 --> 00:13:09,000 that direction? That would work, 167 00:13:09,000 --> 00:13:13,000 right? Just stick this guy here. It'll supply a new triphosphate at 168 00:13:13,000 --> 00:13:16,000 the end, and that triphosphate can be used to catalyze the next monomer. 169 00:13:16,000 --> 00:13:19,000 So, what's the problem? You could put the triphosphates on 170 00:13:19,000 --> 00:13:23,000 the growing chain. If we went this way, 171 00:13:23,000 --> 00:13:26,000 the triphosphate bond would be on the growing chain, 172 00:13:26,000 --> 00:13:30,000 rather than in this way the triphosphate is on the monomer. 173 00:13:30,000 --> 00:13:35,000 But who cares? Who might care? 
174 00:13:35,000 --> 00:13:40,000 If you were designing it, which way would you prefer to do it? 175 00:13:40,000 --> 00:13:45,000 The one with the energy first, well, why do you care whether the 176 00:13:45,000 --> 00:13:51,000 triphosphate is on this big, long chain that you've made, or 177 00:13:51,000 --> 00:13:56,000 whether it's on this monomer because either way you've got a triphosphate 178 00:13:56,000 --> 00:14:01,000 bond that could be on the monomers floating around, 179 00:14:01,000 --> 00:14:07,000 or it could be in that last position with the growing chain. Yeah? 180 00:14:07,000 --> 00:14:12,000 Could be, could be. What kind of mistake might I make? 181 00:14:12,000 --> 00:14:18,000 Yep. And, you know, what other kind of mistakes can happen? 182 00:14:18,000 --> 00:14:24,000 What about these high-energy triphosphate bonds: unstable? 183 00:14:24,000 --> 00:14:30,000 What if they should just spontaneously hydrolyze? 184 00:14:30,000 --> 00:14:33,000 Oops: big trouble, right? You've lost your 185 00:14:33,000 --> 00:14:37,000 triphosphate bond, and but what if this one 186 00:14:37,000 --> 00:14:40,000 spontaneously hydrolyzes? Aren't you in trouble? No, 187 00:14:40,000 --> 00:14:44,000 get another monomer, right? Clearly, it's no big deal if one of the 188 00:14:44,000 --> 00:14:48,000 monomers spontaneously hydrolyzes from a triphosphate to a 189 00:14:48,000 --> 00:14:51,000 monophosphate, but it's a big deal if you've 190 00:14:51,000 --> 00:14:55,000 invested all of this energy going in the other direction, 191 00:14:55,000 --> 00:14:59,000 and it should spontaneously hydrolyze. 192 00:14:59,000 --> 00:15:02,000 So, it makes a great deal more sense to leave that high-energy bond on 193 00:15:02,000 --> 00:15:06,000 the monomer for the growing polymer rather than on the polymer itself. 194 00:15:06,000 --> 00:15:10,000 And, in fact, of course, nature hasn't told me why it chose to do 195 00:15:10,000 --> 00:15:14,000 this. 
This is just my reason for why I think nature chose to do this, 196 00:15:14,000 --> 00:15:17,000 but I think it's very reasonable, and I think it's right. So, this is 197 00:15:17,000 --> 00:15:21,000 not the way it's done. This is the way it's done, 198 00:15:21,000 --> 00:15:25,000 and it's always done that way. No one has ever found a case where 199 00:15:25,000 --> 00:15:29,000 it's not. OK, so now let's look a little more 200 00:15:29,000 --> 00:15:34,000 closely at DNA replication. Suppose I take not just this teeny 201 00:15:34,000 --> 00:15:40,000 little piece that Kornberg gives, but suppose I now look at what's 202 00:15:40,000 --> 00:15:47,000 going on in an organism. An organism might have a big, 203 00:15:47,000 --> 00:15:53,000 long chromosome. DNA replication is occurring along this chromosome. 204 00:15:53,000 --> 00:16:00,000 We've got to go five prime to three prime, five prime to three prime. 205 00:16:00,000 --> 00:16:06,000 Let's suppose there's a primer here. Wait a second, where's the primer 206 00:16:06,000 --> 00:16:12,000 going to come from? If Kornberg's not there to add the 207 00:16:12,000 --> 00:16:18,000 primer, what does the organism do? It's got to kind of make one itself, and it's 208 00:16:18,000 --> 00:16:24,000 going to need some enzyme to make it. So, what enzyme's going to make it? 209 00:16:24,000 --> 00:16:30,000 Primase: it turns out, remarkably, coincidentally, that 210 00:16:30,000 --> 00:16:35,000 it's primase that makes the primer. It's funny how that works out. And 211 00:16:35,000 --> 00:16:40,000 so, primase makes the primer, and then what happens? Then, DNA 212 00:16:40,000 --> 00:16:46,000 polymerase comes along and catalyzes the addition, and works beautifully. 213 00:16:46,000 --> 00:16:51,000 What about on the other strand? So, it's got a what? 214 00:16:51,000 --> 00:16:57,000 Why does it have to play catch-up? Let's see, what kind of primer here? 215 00:16:57,000 --> 00:17:02,000 It's got to go the other way.
216 00:17:02,000 --> 00:17:06,000 OK, so let's get a primer here. So, but wait a second, now it 217 00:17:06,000 --> 00:17:10,000 breathes and opens up a little more. We've got to get a primer here. 218 00:17:10,000 --> 00:17:14,000 And then, when it's going to open up even more we've got to get a primer 219 00:17:14,000 --> 00:17:19,000 there. See, this guy's going the wrong way. So, 220 00:17:19,000 --> 00:17:23,000 in fact, this is what happens. When the DNA opens like this, one 221 00:17:23,000 --> 00:17:27,000 primer here is sufficient to keep going, but here as you begin to open 222 00:17:27,000 --> 00:17:32,000 this up, the other strand needs the continual addition of new 223 00:17:32,000 --> 00:17:36,000 primers, and then what happens when this DNA sequence here, 224 00:17:36,000 --> 00:17:40,000 growing, meets this DNA sequence there? They've got to be ligated 225 00:17:40,000 --> 00:17:44,000 together. They've got to be joined together. So, 226 00:17:44,000 --> 00:17:48,000 this is actually getting kind of complicated. We have little DNA 227 00:17:48,000 --> 00:17:52,000 fragments that have to be ligated together on this strand. 228 00:17:52,000 --> 00:17:56,000 Now, how are you going to ligate them together? 229 00:17:56,000 --> 00:18:00,000 Chemically, you've got to catalyze a 230 00:18:00,000 --> 00:18:04,000 covalent bond between this little growing DNA chain and the previous 231 00:18:04,000 --> 00:18:08,000 growing DNA chain that was there. How are you going to ligate them? 232 00:18:08,000 --> 00:18:12,000 Ligase: yes! Coincidentally, it turns out that ligase does that. 233 00:18:12,000 --> 00:18:16,000 It's just wonderful the way this worked out, that ligase should do 234 00:18:16,000 --> 00:18:20,000 the ligation, and primase should do the primer, and all that. 235 00:18:20,000 --> 00:18:24,000 All right, so this goes on and on. 
Now, this model, which is what 236 00:18:24,000 --> 00:18:28,000 would be compelled by what we're thinking about is experimentally 237 00:18:28,000 --> 00:18:31,000 proven. There was a scientist who 238 00:18:31,000 --> 00:18:35,000 demonstrated that on this strand, this one goes slower, right, because 239 00:18:35,000 --> 00:18:39,000 it's got to, just as you said, catch up. Playing catch up, this is 240 00:18:39,000 --> 00:18:43,000 what's called the lagging strand. This guy is called the leading 241 00:18:43,000 --> 00:18:46,000 strand. The lagging strand plays catch-up to the leading strand. 242 00:18:46,000 --> 00:18:50,000 And, these little fragments can actually be really, 243 00:18:50,000 --> 00:18:54,000 truly identified biochemically. They were identified, in fact, by 244 00:18:54,000 --> 00:18:58,000 somebody called Okazaki. And, do you know what they're 245 00:18:58,000 --> 00:19:02,000 called, those fragments? Okazaki fragments, 246 00:19:02,000 --> 00:19:07,000 exactly. That's what they're called. So, that's how it goes, 247 00:19:07,000 --> 00:19:12,000 and it goes with this continuous replication, and then this 248 00:19:12,000 --> 00:19:17,000 discontinuous replication there. Now, here's another problem. This 249 00:19:17,000 --> 00:19:22,000 upset people a lot. Try to take a long chromosome. 250 00:19:22,000 --> 00:19:27,000 In fact, let's even imagine that it's a circular chromosome like 251 00:19:27,000 --> 00:19:33,000 bacteria have, a big DNA circle. Imagine trying to replicate this. 252 00:19:33,000 --> 00:19:41,000 All right, we're going to pull this apart some. 
We'll start replicating 253 00:19:41,000 --> 00:19:48,000 as we'll continue to pull this apart, etc., etc., but the problem is that 254 00:19:48,000 --> 00:19:56,000 we're going to end up with this DNA helix and this DNA helix wrapped 255 00:19:56,000 --> 00:20:02,000 around each other so that we're going to have double 256 00:20:02,000 --> 00:20:07,000 helices, or we're going to have interlaced double helices. 257 00:20:07,000 --> 00:20:12,000 It's really very messy. Topologically, 258 00:20:12,000 --> 00:20:17,000 if I take a double helix and I copy the two strands, 259 00:20:17,000 --> 00:20:22,000 and the double helices went around each other 800 times before they got 260 00:20:22,000 --> 00:20:27,000 to the end and joined up, I've now got two circles of DNA that 261 00:20:27,000 --> 00:20:32,000 are inextricably linked together with 262 00:20:32,000 --> 00:20:36,000 what's mathematically called the linking number of 800. 263 00:20:36,000 --> 00:20:41,000 That's not very good when I try to now divide my cell and say, 264 00:20:41,000 --> 00:20:46,000 in one chromosome to one cell and one chromosome to the other cell 265 00:20:46,000 --> 00:20:50,000 because I've got these two long, continuous ropes that are just so 266 00:20:50,000 --> 00:20:55,000 totally knotted with each other. This bothered people tremendously. 267 00:20:55,000 --> 00:21:00,000 You can prove, mathematically, some of you take the topology courses 268 00:21:00,000 --> 00:21:04,000 that there is no way without cutting to pull apart two strings that are 269 00:21:04,000 --> 00:21:08,000 so intertwined with each other. So, how in the world is life going 270 00:21:08,000 --> 00:21:12,000 to do that? It's mathematically impossible to do that without 271 00:21:12,000 --> 00:21:17,000 actually cutting. So, it cuts it because it's got no 272 00:21:17,000 --> 00:21:21,000 choice, right? There's a theorem that says you 273 00:21:21,000 --> 00:21:25,000 have to cut it. So, it cuts it. 
274 00:21:25,000 --> 00:21:30,000 It turns out that if you're 275 00:21:30,000 --> 00:21:35,000 going to separate out these two different double helices that are 276 00:21:35,000 --> 00:21:40,000 all wound up around each other, you're going to need to somehow cut 277 00:21:40,000 --> 00:21:46,000 the DNA, separate it, and pass it through to the other side. 278 00:21:46,000 --> 00:21:51,000 And, you're going to need to do that to un-knot this thing. 279 00:21:51,000 --> 00:21:57,000 Now, does it change it chemically when you cut it and bring it around 280 00:21:57,000 --> 00:22:01,000 to the other side of the string? It's still the same molecule, 281 00:22:01,000 --> 00:22:05,000 right? It's the same DNA, but topologically it's different. 282 00:22:05,000 --> 00:22:08,000 The two circles are now not linked 800 times; they're linked 283 00:22:08,000 --> 00:22:12,000 799 times, and I can keep doing that. So, you could call them 284 00:22:12,000 --> 00:22:15,000 topoisomers because they differ only in their topology; 285 00:22:15,000 --> 00:22:19,000 they're topoisomers. So, you would need an enzyme that 286 00:22:19,000 --> 00:22:22,000 actually cuts the DNA, and is clever enough to pass it to 287 00:22:22,000 --> 00:22:26,000 the other side and then seal it back up, and cut the DNA, 288 00:22:26,000 --> 00:22:30,000 and pass it through to the other side and seal it back up. 289 00:22:30,000 --> 00:22:35,000 What enzyme does that? Topoisomerase does that, 290 00:22:35,000 --> 00:22:40,000 that's right. And, there are topoisomerase enzymes that cut and 291 00:22:40,000 --> 00:22:46,000 paste the DNA to resolve this terrible linking number problem. 292 00:22:46,000 --> 00:22:51,000 So, life has worked all this stuff out, and there's just fascinating 293 00:22:51,000 --> 00:22:57,000 work that goes on to understand, whoops, all of the steps there of DNA 294 00:22:57,000 --> 00:23:02,000 replication.
Now, I mentioned that these are 295 00:23:02,000 --> 00:23:06,000 actually pretty important things because processes like this are very 296 00:23:06,000 --> 00:23:10,000 important to rapidly growing cells. It turns out that some very good 297 00:23:10,000 --> 00:23:14,000 anti-cancer drugs are inhibitors of topoisomerase because rapidly 298 00:23:14,000 --> 00:23:18,000 growing cancer cells are highly sensitive to the need to continue to 299 00:23:18,000 --> 00:23:22,000 topologically untangle your DNA. And so, topoisomerase inhibitors 300 00:23:22,000 --> 00:23:26,000 turn out to be pretty good, well, they're not great, but they 301 00:23:26,000 --> 00:23:31,000 turned out to be acceptable cancer drugs. 302 00:23:31,000 --> 00:23:38,000 Here's another issue: fidelity. The fidelity of DNA replication. 303 00:23:38,000 --> 00:23:45,000 If I'm copying the DNA, I'm going to put in my next base. 304 00:23:45,000 --> 00:23:52,000 It's a T. I want to put in an A, a G, I want to put in a C; how do I 305 00:23:52,000 --> 00:24:00,000 get it right? I have my DNA polymerase enzyme here. 306 00:24:00,000 --> 00:24:06,000 How do I manage to get this right? Why don't I put in a G next to the 307 00:24:06,000 --> 00:24:13,000 T instead of an A? Well, it's energetically less 308 00:24:13,000 --> 00:24:19,000 favored, right? Energetically, there's some cost. 309 00:24:19,000 --> 00:24:26,000 There's a delta G, an energetic difference between the right base 310 00:24:26,000 --> 00:24:32,000 and the wrong base. Now, if I know delta G, 311 00:24:32,000 --> 00:24:38,000 I from biochemistry know the equilibrium constant. 312 00:24:38,000 --> 00:24:43,000 I should be able to calculate, based on the energetic difference 313 00:24:43,000 --> 00:24:49,000 between putting in the right base and the wrong base how often DNA 314 00:24:49,000 --> 00:24:54,000 polymerase makes a mistake, and it turns out you can do that. 
315 00:24:54,000 --> 00:25:00,000 It turns out that the equilibrium constant is about 10^3. 316 00:25:00,000 --> 00:25:10,000 That means that DNA polymerase, remarkably, gets it right 99.9% of 317 00:25:10,000 --> 00:25:20,000 the time; it puts in the right base. Isn't that impressive? 318 00:25:20,000 --> 00:25:30,000 No, it's terrible. Why is that terrible? Yeah, 99.9%, 319 00:25:30,000 --> 00:25:33,000 this is no Six Sigma performance or anything. This is pretty 320 00:25:33,000 --> 00:25:37,000 unimpressive stuff. I mean, a typical gene is more than 321 00:25:37,000 --> 00:25:41,000 1,000 letters. That means we're going to actually 322 00:25:41,000 --> 00:25:45,000 make a mistake on average in every gene. This won't do. 323 00:25:45,000 --> 00:25:48,000 So, what happens? Sorry? Well, clearly the energetics say 324 00:25:48,000 --> 00:25:52,000 that the delta G is only enough to get us a factor of 10^3. 325 00:25:52,000 --> 00:25:56,000 We're going to need an additional mechanism, and the additional 326 00:25:56,000 --> 00:26:00,000 mechanism is proofreading. It's absolutely right. 327 00:26:00,000 --> 00:26:04,000 We need to proofread this because we know that initially we're going to 328 00:26:04,000 --> 00:26:09,000 get it wrong at an unacceptably high rate. And so, 329 00:26:09,000 --> 00:26:14,000 it turns out that there are two kinds of DNA proofreading that go on. 330 00:26:14,000 --> 00:26:19,000 First off, DNA polymerase itself has a proofreading activity. 331 00:26:19,000 --> 00:26:24,000 Whenever DNA polymerase adds a base, it kind of also has an activity that 332 00:26:24,000 --> 00:26:29,000 will remove a base. So, it doesn't just add bases going 333 00:26:29,000 --> 00:26:33,000 forward. It also has what's called an 334 00:26:33,000 --> 00:26:37,000 exonuclease activity that removes bases going backwards.
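The link between delta G and a roughly thousandfold preference for the right base can be sketched numerically. This takes the lecture's equilibrium picture at face value and uses the standard relation K = exp(delta G / RT); the temperature of 310 K and the function names are assumptions made for this sketch, not figures from the lecture.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)

def delta_g_from_ratio(ratio, temperature_k=310.0):
    """Free-energy difference (J/mol) that gives a right:wrong ratio."""
    return R * temperature_k * math.log(ratio)

def ratio_from_delta_g(delta_g, temperature_k=310.0):
    """Right:wrong ratio implied by a free-energy difference (J/mol)."""
    return math.exp(delta_g / (R * temperature_k))

# Free-energy gap corresponding to a thousandfold discrimination.
gap = delta_g_from_ratio(1e3)
print(f"{gap / 1000:.1f} kJ/mol")
```

Under these assumptions a factor of a thousand corresponds to a gap of somewhere around 18 kJ/mol, which is why base pairing energetics alone cannot deliver much better than one error in a thousand.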
335 00:26:37,000 --> 00:26:40,000 Now, that may seem silly, right, because it's adding and 336 00:26:40,000 --> 00:26:44,000 subtracting, and adding and subtracting, but it adds more than 337 00:26:44,000 --> 00:26:47,000 it subtracts. And, the trick is that if there's a 338 00:26:47,000 --> 00:26:51,000 mismatched base, it's much more likely to subtract 339 00:26:51,000 --> 00:26:54,000 than to add, or much more likely to subtract than if there's not a 340 00:26:54,000 --> 00:26:58,000 mismatched base. So, the presence of a mismatch 341 00:26:58,000 --> 00:27:02,000 induces the enzyme to do its removal more than if there was a match. 342 00:27:02,000 --> 00:27:08,000 In that fashion, DNA polymerase is able to 343 00:27:08,000 --> 00:27:14,000 substantially improve its accuracy to about one 344 00:27:14,000 --> 00:27:20,000 error in 10^5 or 10^6, much better than one in 10^3. 345 00:27:20,000 --> 00:27:26,000 Then, it turns out that there are mismatch detection and repair 346 00:27:26,000 --> 00:27:32,000 enzymes. They come along after DNA polymerase has done its job, 347 00:27:32,000 --> 00:27:37,000 and they feel along the DNA for any mismatches. Mismatches are going to 348 00:27:37,000 --> 00:27:42,000 create funny structures. They're going to bulge in some way. 349 00:27:42,000 --> 00:27:47,000 And, mismatch repair enzymes are able to detect that something's 350 00:27:47,000 --> 00:27:52,000 funny, and they chop out some sequence, and it gets copied back 351 00:27:52,000 --> 00:27:57,000 in. Now, with the proofreading that comes from these mismatch repair 352 00:27:57,000 --> 00:28:02,000 enzymes, you can get down to the neighborhood of one mistake 353 00:28:02,000 --> 00:28:07,000 in about 10^8 bases. In the course of the human, 354 00:28:07,000 --> 00:28:13,000 yes? Oh, what a great question! Because, when it has a mistake, 355 00:28:13,000 --> 00:28:18,000 how does it know who to correct?
In bacteria, I can tell you the 356 00:28:18,000 --> 00:28:24,000 answer. Wouldn't it be cool if you could leave a mark on the old strand? 357 00:28:24,000 --> 00:28:30,000 If the old strand could be temporarily 358 00:28:30,000 --> 00:28:33,000 marked in some way so that the enzyme, when it sees a mismatch, 359 00:28:33,000 --> 00:28:37,000 would also know which strand to cut out and re-synthesize? 360 00:28:37,000 --> 00:28:40,000 It turns out that bacteria do that. Methylation enzymes actually mark 361 00:28:40,000 --> 00:28:44,000 the old strand. And, it takes a while before those 362 00:28:44,000 --> 00:28:47,000 methylation enzymes come along to mark the new strand, 363 00:28:47,000 --> 00:28:51,000 and it leaves a temporary mark as to who's the old strand. 364 00:28:51,000 --> 00:28:54,000 I wasn't going to mention that today, but it's a great question. 365 00:28:54,000 --> 00:28:58,000 So, it leaves breadcrumbs for a while that tell it who's 366 00:28:58,000 --> 00:29:02,000 the old strand. So, all of this gets worked out. 367 00:29:02,000 --> 00:29:06,000 Yes? So, the exonucleases go backwards. They go three prime to 368 00:29:06,000 --> 00:29:10,000 five prime because, that's right, they only work in that 369 00:29:10,000 --> 00:29:14,000 direction. There are other exos that go in the other direction, 370 00:29:14,000 --> 00:29:19,000 but this exo on the polymerase goes backwards, three prime to five prime. 371 00:29:19,000 --> 00:29:23,000 Now, this is not just theoretical stuff. It turns out that about one 372 00:29:23,000 --> 00:29:27,000 person in 400, that is, probably at least one 373 00:29:27,000 --> 00:29:32,000 person in this class, is heterozygous for a mutation in 374 00:29:32,000 --> 00:29:36,000 one of the mismatch repair enzyme genes. One of the genes like MSH-2 375 00:29:36,000 --> 00:29:41,000 or MLH-1 that encode the mismatch repair enzymes. 
376 00:29:41,000 --> 00:29:46,000 What do you think happens if you are missing one of your two copies 377 00:29:46,000 --> 00:29:50,000 of these mismatch repair enzymes? Nothing much. The other copy's 378 00:29:50,000 --> 00:29:55,000 enough. But, what do you think would happen if by chance a single 379 00:29:55,000 --> 00:30:00,000 cell in your body were to lose the one remaining working copy of that 380 00:30:00,000 --> 00:30:05,000 enzyme, the gene-encoded remaining working copy? 381 00:30:05,000 --> 00:30:09,000 Then it would have no copies. What do you think the response of 382 00:30:09,000 --> 00:30:13,000 the cell would be? High mutation rates, 383 00:30:13,000 --> 00:30:17,000 and cancer. It turns out that hereditary 384 00:30:17,000 --> 00:30:21,000 nonpolyposis colorectal cancer, a familial form of colon cancer, 385 00:30:21,000 --> 00:30:25,000 is caused by, in many cases, mutations in the gene or genes, 386 00:30:25,000 --> 00:30:30,000 actually, encoding the mismatch repair enzymes. 387 00:30:30,000 --> 00:30:33,000 So, our theoretical understanding of the central dogma here bears on an 388 00:30:33,000 --> 00:30:37,000 incredibly practical disease because getting DNA replication right is 389 00:30:37,000 --> 00:30:40,000 important. And, that provides a very good proof that 390 00:30:40,000 --> 00:30:44,000 the difference between 10^5 or 10^6 here and 10^8 accuracy matters a 391 00:30:44,000 --> 00:30:48,000 great deal, that without that mismatch repair enzyme present in 392 00:30:48,000 --> 00:30:51,000 the cells, one is in fact going to create new mutations at an 393 00:30:51,000 --> 00:30:55,000 unacceptably high rate and lead to cancer. I don't know, 394 00:30:55,000 --> 00:30:59,000 a few other random nice facts about DNA polymerases. 395 00:30:59,000 --> 00:31:04,000 They're very fast. The speed of a DNA polymerase is 396 00:31:04,000 --> 00:31:09,000 about 2,000 nucleotides per second: very impressive. 
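The fidelity figures quoted in this part of the lecture can be checked with a little arithmetic. The sketch below reads them as powers of ten (one error in 10^3 from base pairing alone, roughly 10^5 to 10^6 with polymerase proofreading, and about 10^8 with mismatch repair). The 1,000-base gene length is the round number used in the lecture, the names `GENE_LENGTH`, `ERROR_RATES`, `expected_errors` and `prob_at_least_one_error` are illustrative, and treating each base as copied independently is a simplifying assumption.

```python
# Back-of-the-envelope check of the fidelity figures quoted in the lecture.
# GENE_LENGTH is an illustrative round number ("a typical gene is more than
# 1,000 letters"); the rates are the approximate per-base error rates given.

GENE_LENGTH = 1_000

ERROR_RATES = {
    "base pairing alone": 1e-3,
    "plus polymerase proofreading": 1e-5,  # the lecture says 10^5 to 10^6
    "plus mismatch repair": 1e-8,
}


def expected_errors(rate, length=GENE_LENGTH):
    """Mean number of wrong bases when copying `length` bases."""
    return rate * length


def prob_at_least_one_error(rate, length=GENE_LENGTH):
    """Chance that a copy of `length` bases has one or more errors,
    assuming every base is copied independently (a simplification)."""
    return 1.0 - (1.0 - rate) ** length


for stage, rate in ERROR_RATES.items():
    print(f"{stage:30s} "
          f"mean errors per gene = {expected_errors(rate):.0e}, "
          f"P(at least one) = {prob_at_least_one_error(rate):.3g}")
```

At one error per 10^3 bases the mean is one mistake per gene, which is the "mistake on average in every gene" point; at one per 10^8 the mean drops to about one mistake per hundred thousand genes copied.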
397 00:31:09,000 --> 00:31:14,000 And then, one last point I can't help but mention, 398 00:31:14,000 --> 00:31:19,000 Arthur Kornberg discovers this enzyme, shows in a test tube, 399 00:31:19,000 --> 00:31:24,000 it works, people work out, how it works in detail, 400 00:31:24,000 --> 00:31:30,000 leading strands, lagging strands, topoisomerases, work out 401 00:31:30,000 --> 00:31:35,000 fidelity, all these kinds of things, great. But Kornberg's enzyme, the 402 00:31:35,000 --> 00:31:40,000 enzyme he purifies that copies DNA, is it actually the right enzyme? Is 403 00:31:40,000 --> 00:31:46,000 it the enzyme that the bacterial cells he used actually use to copy 404 00:31:46,000 --> 00:31:51,000 their DNA? Well, a biochemist would say, 405 00:31:51,000 --> 00:31:57,000 I cracked open the cell. I purified a component. It's able 406 00:31:57,000 --> 00:32:02,000 to carry out this function. There you go. But, 407 00:32:02,000 --> 00:32:06,000 what would the geneticist say? Sorry? Take out the component, 408 00:32:06,000 --> 00:32:11,000 and demonstrate now what? That the cell can't replicate 409 00:32:11,000 --> 00:32:16,000 its DNA. Until you've shown that, you haven't got the other half of 410 00:32:16,000 --> 00:32:20,000 the proof. So, of course, some geneticists decided 411 00:32:20,000 --> 00:32:25,000 to put this to the test. They took many mutant bacteria. 412 00:32:25,000 --> 00:32:30,000 One at a time, they grew them up, 413 00:32:30,000 --> 00:32:33,000 and they did Kornberg's purification to purify DNA polymerase. 414 00:32:33,000 --> 00:32:37,000 This is unbelievably tedious stuff, guys. You've got to take each one. 415 00:32:37,000 --> 00:32:41,000 You've got to purify it; get DNA polymerase. OK, 416 00:32:41,000 --> 00:32:45,000 it's there. Next one, next one, next one, next one. 
417 00:32:45,000 --> 00:32:48,000 But, suppose you found a mutant which couldn't make Kornberg's DNA 418 00:32:48,000 --> 00:32:52,000 polymerase but still grew and replicated its DNA. 419 00:32:52,000 --> 00:32:56,000 That would prove that Kornberg's enzyme was not essential. 420 00:32:56,000 --> 00:33:00,000 They did. It turns out that Kornberg's enzyme, 421 00:33:00,000 --> 00:33:04,000 DNA polymerase 1, although it can replicate DNA in the 422 00:33:04,000 --> 00:33:08,000 test tube is not the enzyme that cells actually use for their major 423 00:33:08,000 --> 00:33:12,000 DNA replication. It turns out to be a relatively 424 00:33:12,000 --> 00:33:16,000 more minor repair enzyme used to fill in gaps. The actual enzyme is 425 00:33:16,000 --> 00:33:20,000 DNA polymerase 3, not that it matters to you a great 426 00:33:20,000 --> 00:33:24,000 deal, but this duality between the biochemistry and the genetics is 427 00:33:24,000 --> 00:33:28,000 very important because just the biochemical side of the story, 428 00:33:28,000 --> 00:33:31,000 without showing that it was essential to the function in the 429 00:33:31,000 --> 00:33:34,000 organism misses a very important point there. So, 430 00:33:34,000 --> 00:33:38,000 the combination of genetics and biochemistry, biochemistry pointed 431 00:33:38,000 --> 00:33:41,000 us to a class of enzymes. The genetics, then, identifies 432 00:33:41,000 --> 00:33:44,000 which ones are used for which purposes in vivo, 433 00:33:44,000 --> 00:33:48,000 which is not that easy to do in the test tube. Anyway, 434 00:33:48,000 --> 00:33:51,000 I mentioned that, and obviously being a geneticist, 435 00:33:51,000 --> 00:33:54,000 I like tweaking the biochemists about things like that. 436 00:33:54,000 --> 00:33:58,000 All right, onward. 
So, in our picture of DNA replication, 437 00:33:58,000 --> 00:34:04,000 in our picture of the central dogma, 438 00:34:04,000 --> 00:34:13,000 we've got DNA goes to DNA, and what about the step of transcription, 439 00:34:13,000 --> 00:34:22,000 DNA goes to RNA? Well, we've got to copy out our DNA into an 440 00:34:22,000 --> 00:34:31,000 intermediate molecule called RNA, which is going to then be used as a 441 00:34:31,000 --> 00:34:38,000 template for protein synthesis. Where do we start? 442 00:34:38,000 --> 00:34:43,000 Somewhere in here, there's some information. 443 00:34:43,000 --> 00:34:49,000 We want to make a copy of that information. How do we know where 444 00:34:49,000 --> 00:34:54,000 to start? Well, there's something. 445 00:34:54,000 --> 00:35:00,000 There's some information that says start here, right? 446 00:35:00,000 --> 00:35:06,000 There's a little sign that says, start here. Such a thing is called 447 00:35:06,000 --> 00:35:13,000 a promoter. And, the promoter, which we'll come and 448 00:35:13,000 --> 00:35:19,000 talk about more in a while, probably in a lecture or two, 449 00:35:19,000 --> 00:35:26,000 the promoter says here's the place to start copying the 450 00:35:26,000 --> 00:35:33,000 DNA into RNA, and it gets copied into the RNA by an 451 00:35:33,000 --> 00:35:41,000 enzyme that starts here, let's say, I don't know, T, 452 00:35:41,000 --> 00:35:49,000 A, T, G, G, T, A, T. On the other strand I guess it's going to be A, 453 00:35:49,000 --> 00:35:57,000 T, A, C, C, A, T, A. It's going to start copying here, 454 00:35:57,000 --> 00:36:04,000 and it's going to put in an A. Then opposite the A, 455 00:36:04,000 --> 00:36:11,000 it's going to put in a U, because RNA has U, A, C, C, 456 00:36:11,000 --> 00:36:19,000 A, U, A, etc., except this time it's doing it not out of DNA but out of 457 00:36:19,000 --> 00:36:26,000 RNA. How does RNA differ from DNA? 
So, first off, instead of 458 00:36:26,000 --> 00:36:34,000 deoxyribose, this is deoxyribose. 459 00:36:34,000 --> 00:36:42,000 In fact, it's two prime deoxyribose. This is just plain old ribose. 460 00:36:42,000 --> 00:36:51,000 Remember down there on the two prime carbon, DNA had just a 461 00:36:51,000 --> 00:37:00,000 hydrogen, whereas RNA has a hydroxyl. 462 00:37:00,000 --> 00:37:04,000 All right, that's one difference, and it turns out that that hydroxyl 463 00:37:04,000 --> 00:37:08,000 is important because it would interfere in making long double 464 00:37:08,000 --> 00:37:12,000 helices of RNA. RNA doesn't make good, 465 00:37:12,000 --> 00:37:16,000 long double helices. That's entirely due to that oxygen. 466 00:37:16,000 --> 00:37:20,000 And, the other major difference between DNA and RNA? 467 00:37:20,000 --> 00:37:24,000 The only other difference between DNA and RNA is that this has U where 468 00:37:24,000 --> 00:37:29,000 this has T, and what's the difference between T and U? 469 00:37:29,000 --> 00:37:38,000 A single methyl group. That's the only difference between 470 00:37:38,000 --> 00:37:48,000 T and U. In this six-membered ring over here, there is a methyl group. 471 00:37:48,000 --> 00:37:58,000 And here in the six-membered ring, there's no methyl group. 472 00:37:58,000 --> 00:38:08,000 That's it. Why does RNA use U, and DNA use T? Anybody know? 473 00:38:08,000 --> 00:38:12,000 It's not a big difference. That would be interesting, 474 00:38:12,000 --> 00:38:16,000 although I don't think it's true. I actually have no idea. I think 475 00:38:16,000 --> 00:38:21,000 this is fascinating. I've never had a good accounting of 476 00:38:21,000 --> 00:38:25,000 why it uses U and T. 
You need to know this, 477 00:38:25,000 --> 00:38:30,000 and it's true, but I don't actually have a, 478 00:38:30,000 --> 00:38:33,000 whereas I have a good explanation for this I don't have a good 479 00:38:33,000 --> 00:38:37,000 explanation for that, although maybe some of my Origin of 480 00:38:37,000 --> 00:38:40,000 Life colleagues have an explanation. But I've always been a little 481 00:38:40,000 --> 00:38:44,000 puzzled. Why does it use U instead of T? Anyway, 482 00:38:44,000 --> 00:38:47,000 I do know why it doesn't have the hydroxyl. Well, 483 00:38:47,000 --> 00:38:51,000 it has the hydroxyl there. That really does affect the base 484 00:38:51,000 --> 00:38:54,000 stacking, and all sorts of things like that. All right, 485 00:38:54,000 --> 00:38:58,000 so you, don't go away, come back. So, the DNA is used as a template 486 00:38:58,000 --> 00:39:03,000 to copy here a strand of RNA. Some important names: the strand 487 00:39:03,000 --> 00:39:09,000 that is being copied, that is being transcribed, is called the 488 00:39:09,000 --> 00:39:15,000 transcribed strand. This is called the non-transcribed 489 00:39:15,000 --> 00:39:21,000 strand; that makes good sense. This is also called the coding 490 00:39:21,000 --> 00:39:27,000 strand. And, you will find it in your books as the coding strand. 491 00:39:27,000 --> 00:39:34,000 Why is it called the coding strand? 492 00:39:34,000 --> 00:39:39,000 This is called the coding strand. Why is the top strand called the 493 00:39:39,000 --> 00:39:45,000 coding strand? Because the RNA that I copy out 494 00:39:45,000 --> 00:39:51,000 will have the same sequence as the coding strand, 495 00:39:51,000 --> 00:39:57,000 except for T's and U's. So, the RNA copy that is made from 496 00:39:57,000 --> 00:40:01,000 the transcribed strand matches the 497 00:40:01,000 --> 00:40:05,000 sequence of the non-transcribed strand, or the coding strand. 
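The naming point (the RNA is built against the transcribed strand, so it reads like the coding strand with U in place of T) can be seen directly on the short sequence used earlier in the lecture. This is a toy sketch: the function names `transcribe` and `partner_strand` are made up for illustration, and strand orientation (five prime versus three prime) is deliberately ignored, with sequences read position by position as they were written out.

```python
# Toy transcription: pair each base of the transcribed (template) strand
# with its RNA partner. Orientation (5'/3') is ignored in this sketch.

DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(transcribed_strand):
    """RNA built by base pairing against the transcribed strand."""
    return "".join(DNA_TO_RNA_PARTNER[base] for base in transcribed_strand)


def partner_strand(strand):
    """The DNA strand that pairs with `strand`, position by position."""
    return "".join(DNA_PARTNER[base] for base in strand)


transcribed = "TATGGTAT"               # the strand that gets copied
coding = partner_strand(transcribed)   # the non-transcribed (coding) strand
rna = transcribe(transcribed)

print("transcribed strand:", transcribed)
print("coding strand:     ", coding)
print("RNA:               ", rna)

# The RNA matches the coding strand except that T is written as U.
assert rna == coding.replace("T", "U")
```

The final assertion is the whole point of the terminology: the strand that is physically copied is not the one whose sequence you read off the RNA.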
498 00:40:05,000 --> 00:40:09,000 So, you will find this confusing, but you will probably find it on 499 00:40:09,000 --> 00:40:13,000 tests and some things like that to know which strand you're looking at. 500 00:40:13,000 --> 00:40:17,000 The coding strand is this strand which has the code that ends up, 501 00:40:17,000 --> 00:40:21,000 but in fact it's the template for the coding strand, 502 00:40:21,000 --> 00:40:25,000 the complement to the coding strand, the non-coding strand, the 503 00:40:25,000 --> 00:40:29,000 transcribed strand that is copied. Anyway, I've said that now, and you 504 00:40:29,000 --> 00:40:33,000 can, So, how does it know where to stop? 505 00:40:33,000 --> 00:40:37,000 Sorry? Stop codons. Stop codons are actually about translation into 506 00:40:37,000 --> 00:40:41,000 protein, right, because we're going to come to stop 507 00:40:41,000 --> 00:40:45,000 codons in a second. There was some start signal there 508 00:40:45,000 --> 00:40:49,000 called a promoter, which is a start of transcription. 509 00:40:49,000 --> 00:40:53,000 It turns out there was also a stop signal that says stop of 510 00:40:53,000 --> 00:40:57,000 transcription. And, you guys haven't probably met 511 00:40:57,000 --> 00:41:03,000 that before. But, there's a start signal, 512 00:41:03,000 --> 00:41:09,000 a stop signal, and all over the genome there are these things. 513 00:41:09,000 --> 00:41:15,000 So, here's some genome. Here's some gene that's got to be read out. 514 00:41:15,000 --> 00:41:21,000 And, it's read out this way, let's say. This is the coding strand. 515 00:41:21,000 --> 00:41:27,000 This is what, I'll make two strands here. Now, in the next 516 00:41:27,000 --> 00:41:33,000 gene over here, does it go in the same direction? 517 00:41:33,000 --> 00:41:39,000 It might. Or, it might not. 
It turns out that the orientation of 518 00:41:39,000 --> 00:41:45,000 genes along the chromosome, which way you read, is not a fixed 519 00:41:45,000 --> 00:41:51,000 thing across the entire length of the chromosome. 520 00:41:51,000 --> 00:41:57,000 So, when I refer to the transcribed strand or the non-transcribed strand, 521 00:41:57,000 --> 00:42:01,000 that's just a local definition that says, with respect to that gene, 522 00:42:01,000 --> 00:42:04,000 this strand is coding, and this strand is non-coded. 523 00:42:04,000 --> 00:42:08,000 But with respect to the next gene over, it could be the other way. 524 00:42:08,000 --> 00:42:11,000 Now, this is not a very orderly way to do things, right? 525 00:42:11,000 --> 00:42:14,000 If a good engineer did this, they'd probably get all the pieces 526 00:42:14,000 --> 00:42:18,000 going in line and all that. But life did this, and it turns out 527 00:42:18,000 --> 00:42:21,000 that evolvable systems, you know, couldn't possibly maintain 528 00:42:21,000 --> 00:42:24,000 that order. Things are happening all the time, and genes can come in 529 00:42:24,000 --> 00:42:28,000 any order. In addition, how does RNA polymerase know when to 530 00:42:28,000 --> 00:42:32,000 turn on the gene? Oh, sorry, what's the enzyme that 531 00:42:32,000 --> 00:42:36,000 polymerizes RNA? RNA polymerase, yes. 532 00:42:36,000 --> 00:42:40,000 How does it know when to turn on the gene? How does it turn on the 533 00:42:40,000 --> 00:42:44,000 right genes in the right tissues? We'll come to that. That's gene 534 00:42:44,000 --> 00:42:49,000 regulation. That's a big non-trivial thing. 535 00:42:49,000 --> 00:42:53,000 We'll save that one. All right, so we have all of this 536 00:42:53,000 --> 00:42:57,000 transcription. Let's now look at the last 537 00:42:57,000 --> 00:43:03,000 important part of our picture here, which is translation. 538 00:43:03,000 --> 00:43:11,000 So, RNA goes to protein. 
So, if RNA goes to protein, 539 00:43:11,000 --> 00:43:18,000 we take our messenger, our RNA over there. This is an RNA. 540 00:43:18,000 --> 00:43:26,000 What's the direction it's been copied? Five prime 541 00:43:26,000 --> 00:43:34,000 to three prime. It's a single strand of RNA that 542 00:43:34,000 --> 00:43:43,000 we've copied here, a single-stranded molecule, 543 00:43:43,000 --> 00:43:52,000 and let's give it a sequence, A, U, A, C, G, A, U, G, A, A, G, C, C, 544 00:43:52,000 --> 00:44:02,000 C, etc. Eventually we'll get to U, A, G. How is this RNA interpreted? 545 00:44:02,000 --> 00:44:08,000 Well, in an abstract sense, the way this RNA is interpreted is 546 00:44:08,000 --> 00:44:14,000 by a triplet code. The cell could come along and start 547 00:44:14,000 --> 00:44:20,000 reading three-letter codons. But, does it just start anywhere? 548 00:44:20,000 --> 00:44:26,000 No, it always starts at the same codon, and that codon is A, 549 00:44:26,000 --> 00:44:32,000 U, G. This is an initiator codon. 550 00:44:32,000 --> 00:44:40,000 And it encodes a methionine. Then, the next codon down encodes 551 00:44:40,000 --> 00:44:49,000 lysine, proline, etc. The interesting challenge is 552 00:44:49,000 --> 00:44:57,000 how in the world you get from a sequence of nucleotides to a 553 00:44:57,000 --> 00:45:05,000 sequence of amino acids. So, we have to now get this funny 554 00:45:05,000 --> 00:45:12,000 translation step between nucleotides and amino acids. 555 00:45:12,000 --> 00:45:19,000 This concerned people greatly because transcription was pretty 556 00:45:19,000 --> 00:45:26,000 easy. Transcription was going to be the RNA, actually first 557 00:45:26,000 --> 00:45:32,000 replication, each nucleotide would match a 558 00:45:32,000 --> 00:45:37,000 nucleotide on the DNA sequence. Then, RNA polymerization, each 559 00:45:37,000 --> 00:45:42,000 nucleotide of RNA would match. 
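The reading rule described here (scan to the first A-U-G, then read three letters at a time until a stop codon such as U-A-G) can be sketched in a few lines. The codon table below is only an excerpt of the standard genetic code, limited to the codons in the example; under that code A-A-G is lysine and C-C-C is proline. The message string joins the spelled-out start of the sequence directly to the final U-A-G, which is an assumption, since the lecture elides the middle with "etc." The function name `translate` is illustrative.

```python
# Excerpt of the standard genetic code: only the codons needed for the
# lecture's example, plus the three stop codons.
CODON_TABLE = {
    "AUG": "Met",   # initiator codon, methionine
    "AAG": "Lys",
    "CCC": "Pro",
    "UAG": "STOP",
    "UAA": "STOP",
    "UGA": "STOP",
}


def translate(mrna):
    """Read codons from the first AUG until a stop codon.

    Returns a list of three-letter amino acid names. Codons missing from
    the excerpted table are reported as "?" rather than guessed.
    """
    start = mrna.find("AUG")
    if start == -1:
        return []          # no initiator codon, nothing is made
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "?")
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide


# Assumption: the elided middle of the lecture's sequence is left out.
message = "AUACGAUGAAGCCCUAG"
print(translate(message))
```

Note that the leading A-U-A-C-G is skipped entirely: reading starts at the initiator codon, not at the first nucleotide of the message.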
But how are we going to get amino 560 00:45:42,000 --> 00:45:47,000 acids to match specific RNA sequences? How are we going to get 561 00:45:47,000 --> 00:45:52,000 amino acids? Now, this bothered people a great deal. 562 00:45:52,000 --> 00:45:57,000 And, you know what some of the ideas were? Well, proteinase, 563 00:45:57,000 --> 00:46:01,000 right. Some enzyme, well, 564 00:46:01,000 --> 00:46:05,000 actually the first ideas were very physical ideas. 565 00:46:05,000 --> 00:46:09,000 It was that the RNA message there would fold up into some kind of a 566 00:46:09,000 --> 00:46:13,000 funny shape that would just happen to match a lysine, 567 00:46:13,000 --> 00:46:16,000 and then the next little bit would fold up to match, 568 00:46:16,000 --> 00:46:20,000 I don't know, histidine, a methionine, and a serine, 569 00:46:20,000 --> 00:46:24,000 and a this, because people were thinking of the complementarity of DNA 570 00:46:24,000 --> 00:46:28,000 bases, all just physical matching, that it would work that the 571 00:46:28,000 --> 00:46:32,000 amino acids would be directly read off the RNA message. 572 00:46:32,000 --> 00:46:36,000 But, it was kind of crazy to imagine that because the amino acids 573 00:46:36,000 --> 00:46:40,000 all have such wildly different physical properties: positive 574 00:46:40,000 --> 00:46:44,000 charges, negative charges, hydrophilic, hydrophobic, different 575 00:46:44,000 --> 00:46:48,000 sizes. It just didn't make sense, but it bothered people a great deal. 576 00:46:48,000 --> 00:46:52,000 But, I would say that a lot of biochemists thought that that was 577 00:46:52,000 --> 00:46:56,000 sort of how it was going to have to work. The guy who really figured 578 00:46:56,000 --> 00:47:00,000 out what was going on did it with no experimental data whatsoever. 579 00:47:00,000 --> 00:47:04,000 He did it by just sitting down and saying, that doesn't make any sense. 580 00:47:04,000 --> 00:47:08,000 There's got to be another solution. 
And, that was Francis Crick. 581 00:47:08,000 --> 00:47:12,000 Francis Crick just had an incredible mind. 582 00:47:12,000 --> 00:47:16,000 He, Mendel, and a few other people had this incredible insight into 583 00:47:16,000 --> 00:47:20,000 things. He said, look, this just makes no sense that 584 00:47:20,000 --> 00:47:24,000 the physical properties are going to do it. He said, 585 00:47:24,000 --> 00:47:28,000 what's got to be going on is that when I want to put in 586 00:47:28,000 --> 00:47:32,000 a certain amino acid into a growing protein chain, 587 00:47:32,000 --> 00:47:38,000 I'm going to take my amino acid here. I'm going to take my codon here, 588 00:47:38,000 --> 00:47:43,000 and I'm going to build me some kind of an adapter. 589 00:47:43,000 --> 00:47:49,000 And, this adapter molecule will, in fact, solve the problem. So, he 590 00:47:49,000 --> 00:47:54,000 said, because Francis Crick, in addition to being brilliant, 591 00:47:54,000 --> 00:48:00,000 really didn't do any experiments. 592 00:48:00,000 --> 00:48:04,000 He didn't do any experiments both because he wasn't that fond of doing 593 00:48:04,000 --> 00:48:09,000 experiments, and because he was legendarily not very good at the 594 00:48:09,000 --> 00:48:13,000 bench. But, what Francis did was he exhorted all of his colleagues to go 595 00:48:13,000 --> 00:48:18,000 find the adapter. He had what he called the adapter 596 00:48:18,000 --> 00:48:23,000 hypothesis. And sure enough, Crick was dead on, just right. 597 00:48:23,000 --> 00:48:27,000 The adapter hypothesis turned out to be that there was an 598 00:48:27,000 --> 00:48:32,000 adapter molecule that was itself made out of RNA 599 00:48:32,000 --> 00:48:38,000 called transfer RNA. 
And, transfer RNA matched up by 600 00:48:38,000 --> 00:48:44,000 base pairing to each codon you see, and had amino acids attached to it 601 00:48:44,000 --> 00:48:50,000 and so the problem of how you mediate between a three-letter code 602 00:48:50,000 --> 00:48:56,000 of DNA or RNA, of nucleotides, 603 00:48:56,000 --> 00:49:02,000 and amino acids was solved by a clever intermediate. 604 00:49:02,000 --> 00:49:05,000 It turned out that they looked, they found the molecule. So, it's 605 00:49:05,000 --> 00:49:08,000 just one of these great examples of somebody having thought up an idea, 606 00:49:08,000 --> 00:49:11,000 sent people off to look for it, and it was there. And then, 607 00:49:11,000 --> 00:49:14,000 of course, you've got to ask, how did the amino acids get stuck 608 00:49:14,000 --> 00:49:17,000 onto the right transfer RNAs? And the answer is there's a bunch 609 00:49:17,000 --> 00:49:20,000 of specific enzymes that do precisely that job, 610 00:49:20,000 --> 00:49:23,000 that look at the transfer RNA, attach the amino acid, and handle 611 00:49:23,000 --> 00:49:26,000 that whole problem. I will next time briefly end with 612 00:49:26,000 --> 00:49:30,000 the ribosome, and how those transfer RNAs work to 613 00:49:30,000 --> 00:49:34,000 catalyze together the protein chain, and then what I want to do is turn 614 00:49:34,000 --> 00:49:39,000 to how this common picture of DNA, RNA, and protein varies amongst 615 00:49:39,000 --> 00:49:44,000 organisms. Until next time.
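The adapter idea described above can be modeled as a lookup in two steps: base pairing picks the transfer RNA whose anticodon fits the codon, and that transfer RNA carries whatever amino acid was attached to it beforehand by its specific enzyme. The sketch below is a simplification under stated assumptions: strict Watson-Crick pairing only (no wobble), a hypothetical table `CHARGED_TRNAS` with just three adapters, and illustrative function names `anticodon_for` and `read_codon`.

```python
# RNA base-pairing partners.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Anticodon that pairs with `codon`, written 5' to 3'.

    The two RNAs pair antiparallel, so the anticodon is the reverse
    complement of the codon.
    """
    return "".join(PAIR[base] for base in reversed(codon))


# Hypothetical pool of charged adapters: anticodon -> attached amino acid.
# In the cell the attachment is done by specific enzymes, one step removed
# from codon reading; here it is simply a fixed table.
CHARGED_TRNAS = {
    "CAU": "Met",   # pairs with codon AUG
    "CUU": "Lys",   # pairs with codon AAG
    "GGG": "Pro",   # pairs with codon CCC
}


def read_codon(codon):
    """Amino acid delivered for `codon`, or None if no adapter pairs."""
    return CHARGED_TRNAS.get(anticodon_for(codon))


for codon in ("AUG", "AAG", "CCC", "UAG"):
    print(codon, "->", anticodon_for(codon), "->", read_codon(codon))
```

The codon never touches the amino acid in this model: it only selects an adapter, which is why the wildly different chemistry of the amino acids stops being a problem. The stop codon U-A-G finds no adapter in the table and returns `None`.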