1 00:00:00,000 --> 00:00:07,000 So just trying to remind you that the replication fork looks something 2 00:00:07,000 --> 00:00:15,000 like this where 5 prime to 3 prime and 5 prime to 3 prime. 3 00:00:15,000 --> 00:00:22,000 This is what's known as the leading strand because DNA, 4 00:00:22,000 --> 00:00:30,000 the synthesis of the new strand can go -- 5 00:00:30,000 --> 00:00:37,000 Which is going 5 prime to 3 prime is 6 00:00:37,000 --> 00:00:43,000 going in the same direction as the movement of the replication fork. 7 00:00:43,000 --> 00:00:49,000 The other strand, which is known as the lagging strand, 8 00:00:49,000 --> 00:00:55,000 the DNA synthesis is actually going backwards to the movement of the 9 00:00:55,000 --> 00:01:01,000 replication fork, which means it has to go and then 10 00:01:01,000 --> 00:01:06,000 start up here and go again. And it's continually jumping. 11 00:01:06,000 --> 00:01:10,000 And I told you that the little RNA primer is used to start each strand. 12 00:01:10,000 --> 00:01:15,000 And then the DNA polymerase is able to elongate that. 13 00:01:15,000 --> 00:01:20,000 And then at the end these little nicks in here, 14 00:01:20,000 --> 00:01:24,000 the RNA has to be removed, fill in the gap and then it's sealed 15 00:01:24,000 --> 00:01:29,000 up by the enzyme DNA ligase, which we'll talk about when we talk 16 00:01:29,000 --> 00:01:34,000 about recombinant DNA. Someone asked, 17 00:01:34,000 --> 00:01:38,000 I had mentioned why this strategy of using RNA was beneficial, 18 00:01:38,000 --> 00:01:42,000 and that has to do with the fact that the fidelity, 19 00:01:42,000 --> 00:01:46,000 which is going to be the next thing I'm going to focus on of DNA 20 00:01:46,000 --> 00:01:50,000 replication as not, you can get a much higher accuracy 21 00:01:50,000 --> 00:01:54,000 if you have the end of a primer already there and then carry out the 22 00:01:54,000 --> 00:01:58,000 chemistry in there. 
No enzyme has ever achieved the 23 00:01:58,000 --> 00:02:02,000 accuracy that you see in DNA replication if it's 24 00:02:02,000 --> 00:02:06,000 starting a strand. So RNA polymerase, 25 00:02:06,000 --> 00:02:10,000 which constantly starts strands to make RNA copies, 26 00:02:10,000 --> 00:02:14,000 as we'll talk about, is not as accurate as DNA 27 00:02:14,000 --> 00:02:18,000 replication. And by putting a little bit of RNA, 28 00:02:18,000 --> 00:02:22,000 because the cell has to start a new strand. Before it gets here there's 29 00:02:22,000 --> 00:02:26,000 no strand at all on this lagging strand so it needs to make this 30 00:02:26,000 --> 00:02:30,000 little RNA primer. It needs to make a little primer. 31 00:02:30,000 --> 00:02:34,000 And by making it out of RNA then it can tell what doesn't belong there. 32 00:02:34,000 --> 00:02:39,000 It doesn't matter if it's not quite as accurate as the rest of DNA 33 00:02:39,000 --> 00:02:43,000 replications because it's going to take it out anyway and fill it in 34 00:02:43,000 --> 00:02:48,000 using the DNA polymerase. And if you think about that maybe 35 00:02:48,000 --> 00:02:52,000 you can see one of the reasons that the cell has chosen or nature has 36 00:02:52,000 --> 00:02:57,000 chosen through evolution to use little RNAs to begin the strands. 37 00:02:57,000 --> 00:03:01,000 OK. Well, in any case, the fidelity of DNA replication is 38 00:03:01,000 --> 00:03:05,000 really pretty amazing. Incidentally, just speaking of DNA, 39 00:03:05,000 --> 00:03:09,000 many of you wrote some very thoughtful things about Vernon 40 00:03:09,000 --> 00:03:12,000 Ingram's visit. I didn't give him a whole lot of 41 00:03:12,000 --> 00:03:15,000 warning and he had to go and change his schedule and move meetings 42 00:03:15,000 --> 00:03:19,000 around in order to come to talk to you. And it was very nice of you. 
43 00:03:19,000 --> 00:03:22,000 Many of you wrote some very thoughtful things, 44 00:03:22,000 --> 00:03:25,000 which I'm going to pass on to him. I want him to know that many of you 45 00:03:25,000 --> 00:03:29,000 appreciated his visit. I also saw a lot of you reacted to 46 00:03:29,000 --> 00:03:33,000 his advice about crowded labs. That has been my experience, 47 00:03:33,000 --> 00:03:37,000 too. And one thing about the scientific process is it's not just 48 00:03:37,000 --> 00:03:41,000 one person. You're in with a group of people, just as Vernon described, 49 00:03:41,000 --> 00:03:45,000 and that group of people becomes the creative engine that drives all the 50 00:03:45,000 --> 00:03:49,000 science within that lab. And so you're not only picking your 51 00:03:49,000 --> 00:03:53,000 project, you're looking for a group of people to work with. 52 00:03:53,000 --> 00:03:57,000 And, as Vernon said, if the lab is really doing hot stuff they tend to 53 00:03:57,000 --> 00:04:01,000 attract a lot of people. So a crowded lab can sometimes be a 54 00:04:01,000 --> 00:04:04,000 really good indicator. No absolutes, and there's an 55 00:04:04,000 --> 00:04:08,000 exception to everything, but that was a good piece of advice 56 00:04:08,000 --> 00:04:12,000 he gave you if you're looking for UROPs sometimes. 57 00:04:12,000 --> 00:04:16,000 OK. So, anyway, DNA fidelity. Remember I said we've 58 00:04:16,000 --> 00:04:20,000 gone from, our bodies have somewhere from like 10 to 20 billion miles of 59 00:04:20,000 --> 00:04:24,000 DNA in them if we could take all the human DNA and stretch it out? 60 00:04:24,000 --> 00:04:28,000 But that fidelity is done at an error rate of about one mistake to 61 00:04:28,000 --> 00:04:32,000 every ten to the tenth nucleotides replicated. 62 00:04:32,000 --> 00:04:37,000 Which I said if you were typing all the time it would be like sort of 63 00:04:37,000 --> 00:04:43,000 making one mistake every 38 years. 
So it's an astonishing degree of 64 00:04:43,000 --> 00:04:49,000 fidelity. Something that's beyond anything within our experience. 65 00:04:49,000 --> 00:04:55,000 And there are three principles that go into it. One is polymerase is really 66 00:04:55,000 --> 00:05:01,000 good at the base pair recognition, telling that an A is paired with a T 67 00:05:01,000 --> 00:05:06,000 or a G is paired with a C, and discriminating against 68 00:05:06,000 --> 00:05:12,000 everything else. There's a phenomenon known as proofreading, 69 00:05:12,000 --> 00:05:18,000 and I'll tell you how that works. And then there's a third system 70 00:05:18,000 --> 00:05:24,000 called mismatch repair. And all three of these contribute 71 00:05:24,000 --> 00:05:30,000 to this very, very low frequency of errors, one mistake for 72 00:05:30,000 --> 00:05:36,000 approximately every ten to the tenth nucleotides replicated. 73 00:05:36,000 --> 00:05:40,000 So the first thing is I've pointed out to you several times that if you 74 00:05:40,000 --> 00:05:45,000 draw the hydrogen bonds between an A and a T base pair, 75 00:05:45,000 --> 00:05:49,000 the two hydrogen bonds or the three hydrogen bonds between a G and C 76 00:05:49,000 --> 00:05:54,000 base pair, that the shapes of this pair and that pair are virtually 77 00:05:54,000 --> 00:05:59,000 identical. You can pick them up and lay it right down on top. 78 00:05:59,000 --> 00:06:02,000 Now, if you actually look at it you'll see you could draw some base 79 00:06:02,000 --> 00:06:05,000 pairs between, for example, a G and a T. 80 00:06:05,000 --> 00:06:09,000 In fact, you can draw two hydrogen bonds, which is the same as between 81 00:06:09,000 --> 00:06:12,000 an A and a T. 
But the one thing I hope you can see, 82 00:06:12,000 --> 00:06:16,000 just from the shapes even without being able to see the individual 83 00:06:16,000 --> 00:06:19,000 atoms, is that a GT base pair doesn't have the same shape as the 84 00:06:19,000 --> 00:06:23,000 correct base pairs. So when I showed you that little 85 00:06:23,000 --> 00:06:26,000 movie the other day where this is the template nucleotide, 86 00:06:26,000 --> 00:06:30,000 this is the incoming nucleotide and there's this alpha helix 87 00:06:30,000 --> 00:06:34,000 that's swinging up. What's happening in there is that 88 00:06:34,000 --> 00:06:40,000 the enzyme is checking whether the incoming nucleotide is the 89 00:06:40,000 --> 00:06:45,000 correct shape to go with the base pair. And you can sort of see it's 90 00:06:45,000 --> 00:06:50,000 flipping it right into a very narrow little slot in the enzyme. 91 00:06:50,000 --> 00:06:56,000 So it's not only asking for sort of hydrogen bonds, it's asking 92 00:06:56,000 --> 00:07:01,000 for the exact shape. If you just did it by thermodynamic 93 00:07:01,000 --> 00:07:07,000 grounds you'd make about one mistake in a hundred because that's about 94 00:07:07,000 --> 00:07:13,000 the discrimination between the correct base pairs and some of these 95 00:07:13,000 --> 00:07:19,000 other ones. This works so well. You get more like one mistake in 96 00:07:19,000 --> 00:07:25,000 ten to the fourth or ten to the fifth. We're still quite a distance 97 00:07:25,000 --> 00:07:31,000 away from the ten to the tenth, but this is one of the things. It's 98 00:07:31,000 --> 00:07:37,000 looking for the correct shape of the base pair. 99 00:07:37,000 --> 00:07:41,000 Now, the second thing that helps with fidelity is a phenomenon known 100 00:07:41,000 --> 00:07:54,000 as proofreading -- 101 00:07:54,000 --> 00:08:00,000 -- exonuclease. Things called a nuclease. 102 00:08:00,000 --> 00:08:12,000 That means it can degrade DNA. And the exo works at an end. 
103 00:08:12,000 --> 00:08:19,000 And, furthermore, 104 00:08:19,000 --> 00:08:24,000 the directionality of this proofreading was something that 105 00:08:24,000 --> 00:08:30,000 puzzled people initially because it's going 3 prime to 5 prime. 106 00:08:30,000 --> 00:08:35,000 And when people started to purify DNA polymerases or complexes of DNA 107 00:08:35,000 --> 00:08:40,000 polymerases involved in replication there seemed to be a puzzle because 108 00:08:40,000 --> 00:08:46,000 the polymerase, as I've told you, 109 00:08:46,000 --> 00:08:51,000 goes 5 prime to 3 prime, but the same enzyme complex had an 110 00:08:51,000 --> 00:08:57,000 exonuclease that went in the opposite direction. 111 00:08:57,000 --> 00:09:01,000 So this seemed very peculiar at first in the sense if you were 112 00:09:01,000 --> 00:09:06,000 trying to polymerize DNA in this way why in that same enzyme would you 113 00:09:06,000 --> 00:09:10,000 have something that wanted to degrade DNA in the other way? 114 00:09:10,000 --> 00:09:15,000 And the answer turned out that this was known as a proofreading 115 00:09:15,000 --> 00:09:20,000 exonuclease, as I've put up here. And here's the principle of how it 116 00:09:20,000 --> 00:09:24,000 works. Suppose you were replicating the DNA and there was a G. 117 00:09:24,000 --> 00:09:29,000 And if you put a C in there it very quickly goes on and continues 118 00:09:29,000 --> 00:09:34,000 the replication. If it puts in a T, 119 00:09:34,000 --> 00:09:38,000 let's say, this is not a very good base pair. It wouldn't have the 120 00:09:38,000 --> 00:09:42,000 right shape. So when the enzyme came up looking for that 3 prime 121 00:09:42,000 --> 00:09:47,000 hydroxyl, which would be right at the end of that T, 122 00:09:47,000 --> 00:09:51,000 things are not in the right place. And so the polymerase activity 123 00:09:51,000 --> 00:09:56,000 slows down. 
And as that primer terminus, if it sits there for a 124 00:09:56,000 --> 00:10:00,000 little bit, it's able to just peel off the DNA, flip up, 125 00:10:00,000 --> 00:10:04,000 and there's this function that does just what you'd do if you were 126 00:10:04,000 --> 00:10:09,000 typing and you made a mistake. You'd just hit the delete key and 127 00:10:09,000 --> 00:10:13,000 take off the last nucleotide that you did. And I have a little movie 128 00:10:13,000 --> 00:10:17,000 showing you that. This is a crystal structure. 129 00:10:17,000 --> 00:10:21,000 This is the DNA template. And the polymerase catalytic activity site 130 00:10:21,000 --> 00:10:25,000 is right here. And in this little movie it's just 131 00:10:25,000 --> 00:10:30,000 added an incorrect base pair and the polymerase is sort of stalled. 132 00:10:30,000 --> 00:10:35,000 And the actual nuclease function is physically separate on the protein 133 00:10:35,000 --> 00:10:41,000 structure. But what you'll see in the movie is that if the polymerase 134 00:10:41,000 --> 00:10:47,000 cannot go very well eventually this thing will come up and it will chop 135 00:10:47,000 --> 00:10:53,000 off one nucleotide, come back and try it again. 136 00:10:53,000 --> 00:10:59,000 Let's see. I think if we do this, oopsy-daisey. Let me see if I can 137 00:10:59,000 --> 00:11:04,000 get this to work here. Nope, it's not working. 138 00:11:04,000 --> 00:11:09,000 OK. Well, anyway, I'm going to skip it for right now. 139 00:11:09,000 --> 00:11:13,000 I don't want to waste time. But, in any case, the end would go 140 00:11:13,000 --> 00:11:18,000 up here and it would take off one nucleotide. So there at least are 141 00:11:18,000 --> 00:11:23,000 two of the ways that the polymerase is able to work with such fidelity. 142 00:11:23,000 --> 00:11:28,000 It selects for the correct base pair shape. 
143 00:11:28,000 --> 00:11:32,000 And then after it's done in addition it sort of looks back, 144 00:11:32,000 --> 00:11:36,000 just as if you were a very slow typist, and every time you typed a 145 00:11:36,000 --> 00:11:40,000 letter you looked back and said did I make a mistake? 146 00:11:40,000 --> 00:11:44,000 And if you made a mistake then you'd delete and then just try again. 147 00:11:44,000 --> 00:11:48,000 And that gets the cell another maybe two orders of magnitude of 148 00:11:48,000 --> 00:11:52,000 accuracy. So we're up to about one mistake in ten to the seventh base 149 00:11:52,000 --> 00:11:56,000 pairs replicated. The third system, 150 00:11:56,000 --> 00:12:00,000 which is called mismatch repair, turns out to be very important for a 151 00:12:00,000 --> 00:12:04,000 whole variety of reasons. And before I tell you about it, 152 00:12:04,000 --> 00:12:08,000 I want to first introduce the idea of DNA repair in general. 153 00:12:08,000 --> 00:12:12,000 One of the things that's wonderful about DNA -- 154 00:12:12,000 --> 00:12:19,000 -- as you've learned, 155 00:12:19,000 --> 00:12:22,000 is it's got the information in two copies. It's in a complementary 156 00:12:22,000 --> 00:12:25,000 form but it's like having the photograph and the negative. 157 00:12:25,000 --> 00:12:29,000 And if your kid sister pokes a hole with a pair of scissors through the 158 00:12:29,000 --> 00:12:32,000 picture of your boyfriend or your girlfriend, you're not really in 159 00:12:32,000 --> 00:12:35,000 trouble as long as you've got the negative because you can get the 160 00:12:35,000 --> 00:12:39,000 information back again. And that same principle applies in 161 00:12:39,000 --> 00:12:43,000 DNA repair. 
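To make the running tally of error rates concrete, here is a minimal arithmetic sketch in Python. It only multiplies the order-of-magnitude figures quoted in the lecture; the variable names are illustrative, and taking ten to the minus fifth (rather than ten to the minus fourth) for base selection is an assumption chosen so the numbers land on the quoted ten to the seventh.

```python
# Order-of-magnitude figures quoted in the lecture (per nucleotide copied).
base_selection_error = 1e-5   # shape checking: "ten to the fourth or ten to the fifth"
proofreading_escape = 1e-2    # proofreading buys "maybe two orders of magnitude"
overall_error = 1e-10         # overall figure quoted for DNA replication

# Errors that survive both base selection and proofreading.
after_proofreading = base_selection_error * proofreading_escape

# What the third system (mismatch repair) would have to contribute
# for the overall figure to come out at one in ten to the tenth.
mismatch_repair_escape = overall_error / after_proofreading

print(f"after proofreading: about {after_proofreading:.0e}")
print(f"left for mismatch repair: about {mismatch_repair_escape:.0e}")
```

Under these assumed inputs, mismatch repair has to supply roughly three more orders of magnitude to get from one in ten to the seventh to one in ten to the tenth.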
So if you have some kind of lesion in DNA, 162 00:12:43,000 --> 00:12:48,000 and this might have come from going outside in the sunlight, 163 00:12:48,000 --> 00:12:53,000 your DNA absorbs in the UV and it undergoes photoreactions, 164 00:12:53,000 --> 00:12:57,000 they tend, for the most part, to just affect one of the two 165 00:12:57,000 --> 00:13:03,000 strands of DNA. Or if you smoke, 166 00:13:03,000 --> 00:13:09,000 which I hope none of you do, there are many chemicals in smoke 167 00:13:09,000 --> 00:13:15,000 that will react with DNA, and they'll modify one strand. 168 00:13:15,000 --> 00:13:21,000 And so what the cell has is a system that has many kinds of repair 169 00:13:21,000 --> 00:13:27,000 systems, but it has a special type of repair system known as nucleotide 170 00:13:27,000 --> 00:13:33,000 excision repair. And you could think of this as a 171 00:13:33,000 --> 00:13:39,000 protein machine that constantly scans the DNA looking for little 172 00:13:39,000 --> 00:13:44,000 distortions. And if it finds it then what it needs to do is it needs 173 00:13:44,000 --> 00:13:50,000 to make cuts, remove the DNA and make a little gap. 174 00:13:50,000 --> 00:13:55,000 And now you can see what it can do now. Once it's got a little gap the 175 00:13:55,000 --> 00:14:01,000 information over here is in a complementary form. 176 00:14:01,000 --> 00:14:05,000 So if a DNA polymerase were to come along it could fill in that gap and 177 00:14:05,000 --> 00:14:10,000 seal it up and then you'd be back to ordinary DNA, the lesion would be 178 00:14:10,000 --> 00:14:15,000 gone. And I made a silly little PowerPoint thing here to show it. 
179 00:14:15,000 --> 00:14:20,000 So if you were to, say, damage the guanine with something, 180 00:14:20,000 --> 00:14:25,000 say one of the carcinogens you find in cigarette smoke, 181 00:14:25,000 --> 00:14:30,000 you could think of this protein machine as being a sort of pair of 182 00:14:30,000 --> 00:14:35,000 scissors that have a conditionality in them. 183 00:14:35,000 --> 00:14:38,000 As this protein machine scans along the DNA the scissors aren't 184 00:14:38,000 --> 00:14:42,000 activated until it recognizes there's a distortion here, 185 00:14:42,000 --> 00:14:46,000 at which point then it senses that there's some bump in the DNA. 186 00:14:46,000 --> 00:14:50,000 And it's very clever the way it does it because the nuclease 187 00:14:50,000 --> 00:14:54,000 activities, the things that are going to cut the DNA are actually 188 00:14:54,000 --> 00:14:58,000 some distance away, a few nucleotides away from the 189 00:14:58,000 --> 00:15:02,000 lesion. So even if this is distorting the 190 00:15:02,000 --> 00:15:06,000 DNA, the scissors are able to work out here and out here. 191 00:15:06,000 --> 00:15:10,000 It makes two cuts. That was a huge surprise. Nobody expected that when 192 00:15:10,000 --> 00:15:14,000 they started to do the biochemistry. And then in principle once you cut 193 00:15:14,000 --> 00:15:18,000 it now you can remove this little nucleotide and then a DNA polymerase 194 00:15:18,000 --> 00:15:22,000 can just come in, and following those A pairs with T, 195 00:15:22,000 --> 00:15:26,000 G pairs with C, copy it along and then would seal it up to get to the 196 00:15:26,000 --> 00:15:30,000 end. And I've actually shown you a picture of what happens if a human 197 00:15:30,000 --> 00:15:35,000 is missing that system. 
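The cut, remove, and refill logic just described can be sketched as a toy Python model. Everything here is illustrative and assumed for the example: the function name `excision_repair`, the `flank` parameter, the use of `*` to mark a damaged guanine, and the simplification that the two strands are written position by position (index i pairs with index i) rather than antiparallel.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def excision_repair(damaged, template, lesion_index, flank=3):
    """Toy nucleotide excision repair.

    Cuts a few nucleotides away on each side of the lesion, discards
    that stretch, and refills the gap by copying the intact strand.
    """
    start = max(0, lesion_index - flank)               # first cut site
    end = min(len(damaged), lesion_index + flank + 1)  # second cut site
    # Fill the gap using the complementary information on the other strand.
    patch = "".join(COMPLEMENT[base] for base in template[start:end])
    return damaged[:start] + patch + damaged[end:]

template = "TACGCATCCA"   # undamaged strand
damaged = "ATGC*TAGGT"    # '*' stands for a damaged guanine opposite the C
repaired = excision_repair(damaged, template, damaged.index("*"))
print(repaired)
```

The point of the sketch is that the repair never needs to know what the damaged base used to be; the intact strand supplies that information.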
When I was showing you how profound 198 00:15:35,000 --> 00:15:39,000 an effect you could get from just losing one single gene or a mutation 199 00:15:39,000 --> 00:15:44,000 affecting one single gene, this disease called xeroderma 200 00:15:44,000 --> 00:15:48,000 pigmentosum. There are a variety of different groups. 201 00:15:48,000 --> 00:15:53,000 And the one on the left is an example. That's someone who is 202 00:15:53,000 --> 00:15:57,000 missing one of the genes that encodes one of the proteins involved 203 00:15:57,000 --> 00:16:01,000 in nucleotide excision repair. And this is really, 204 00:16:01,000 --> 00:16:05,000 really important for fixing up the damage we get all the time in 205 00:16:05,000 --> 00:16:08,000 sunlight. So if you miss that repair system and you go out in the 206 00:16:08,000 --> 00:16:12,000 sun then you get all kinds of lesions and people are very 207 00:16:12,000 --> 00:16:16,000 susceptible to skin cancer. And I told you fortunately now you 208 00:16:16,000 --> 00:16:19,000 don't find people with this disease looking like that because at least 209 00:16:19,000 --> 00:16:23,000 in developed countries we recognize it. They're kept out of the sun. 210 00:16:23,000 --> 00:16:26,000 And these were the kids who I said are called "children of the moon" 211 00:16:26,000 --> 00:16:30,000 because they, for example, go to summer camps where they do 212 00:16:30,000 --> 00:16:34,000 everything at night so they won't get exposed to sunlight. 213 00:16:34,000 --> 00:16:39,000 But that's what happens to us if we miss that excision repair. 214 00:16:39,000 --> 00:16:45,000 And, again, what makes that possible is that the information is 215 00:16:45,000 --> 00:16:50,000 there twice in a double-stranded DNA. I also showed you a little movie 216 00:16:50,000 --> 00:16:56,000 early on when I was showing you, I'm going to actually run this in 217 00:16:56,000 --> 00:17:02,000 QuickTime because it works a little more smoothly, I think. 
218 00:17:02,000 --> 00:17:07,000 So I showed you this when we were 219 00:17:07,000 --> 00:17:11,000 talking about DNA because I wanted you to sort of get that sense of 220 00:17:11,000 --> 00:17:14,000 what it was like to kind of fly down the groove of a DNA. 221 00:17:14,000 --> 00:17:18,000 But what I didn't emphasize was this protein that was bound to the 222 00:17:18,000 --> 00:17:22,000 DNA. That's a protein that's a DNA repair protein. 223 00:17:22,000 --> 00:17:25,000 And it's one of these things that looks for lesions in the DNA. 224 00:17:25,000 --> 00:17:29,000 And as we fly along the major groove this little green thing is 225 00:17:29,000 --> 00:17:33,000 actually the lesion that that protein is looking for. 226 00:17:33,000 --> 00:17:38,000 And it sort of puts fingers down into the groove and it's able to 227 00:17:38,000 --> 00:17:43,000 sense that. And you can sort of see how this protein is bound to DNA. 228 00:17:43,000 --> 00:17:49,000 This is a lesion that we get all the time from oxidative damage. 229 00:17:49,000 --> 00:17:54,000 And remember I said oxygen is bad for DNA? So our bodies have to have 230 00:17:54,000 --> 00:18:00,000 systems that are able to do that. So DNA repair is very important for 231 00:18:00,000 --> 00:18:05,000 life. We'll just finish flying down the 232 00:18:05,000 --> 00:18:09,000 major groove one more time here. OK. I'm going to go back to 233 00:18:09,000 --> 00:18:20,000 PowerPoint. 234 00:18:20,000 --> 00:18:30,000 OK. So mismatch repair is a form of repair that's got 235 00:18:30,000 --> 00:18:37,000 that same idea. Let's think about it if we had a 236 00:18:37,000 --> 00:18:42,000 replication fork here, and let's say there was a G here and 237 00:18:42,000 --> 00:18:46,000 the T got misincorporated, but in this case it wasn't removed 238 00:18:46,000 --> 00:18:51,000 by the proofreading which happens about one in ten to the seventh 239 00:18:51,000 --> 00:18:56,000 times. 
Now if that strand is fixed up, excuse me, 240 00:18:56,000 --> 00:19:01,000 is continued then you'd end up with a GT base pair. 241 00:19:01,000 --> 00:19:05,000 And the next time you copied it this strand would give rise to a GC but 242 00:19:05,000 --> 00:19:09,000 this one would give rise to an AT. And then you'd have a mutation that 243 00:19:09,000 --> 00:19:13,000 now would have changed. And if it affected an important 244 00:19:13,000 --> 00:19:17,000 gene that could be bad for you. So the cell has what's known as a 245 00:19:17,000 --> 00:19:25,000 mismatch repair -- 246 00:19:25,000 --> 00:19:29,000 -- that works in exactly the same logic as here. That it 247 00:19:29,000 --> 00:19:34,000 basically comes along. It scans the DNA. 248 00:19:34,000 --> 00:19:41,000 It finds the bump because this is not a proper base pair. 249 00:19:41,000 --> 00:19:48,000 And then it fills it in and you're back to ordinary DNA with a GC base 250 00:19:48,000 --> 00:19:55,000 pair. There's one little wrinkle. For this system to work it has to 251 00:19:55,000 --> 00:20:03,000 do one other thing that's different from that kind of DNA repair. 252 00:20:03,000 --> 00:20:08,000 Can anybody see what it is? Why don't you talk to the person 253 00:20:08,000 --> 00:20:14,000 next to you and see if you can figure it out. 254 00:20:14,000 --> 00:20:19,000 This system must be doing something else in order for this to work. 255 00:20:19,000 --> 00:20:25,000 OK, you can ask somebody. What do you think? 256 00:20:25,000 --> 00:20:35,000 What if I removed the G? 257 00:20:35,000 --> 00:20:43,000 Would that work? 258 00:20:43,000 --> 00:20:47,000 What would happen if I took the G instead? Say I made the little gap 259 00:20:47,000 --> 00:20:52,000 over on this strand instead, cut it here? 260 00:20:52,000 --> 00:20:59,000 Yeah. So which one is the one 261 00:20:59,000 --> 00:21:03,000 that's right, the old strand or the new strand? 
262 00:21:03,000 --> 00:21:07,000 The old strand, yeah. See, this is the old and this 263 00:21:07,000 --> 00:21:11,000 is the new. And the term that's usually used, it's known as the 264 00:21:11,000 --> 00:21:16,000 daughter strand, the new strand. So the other thing 265 00:21:16,000 --> 00:21:20,000 this system has to do is it not only has to be able to detect that 266 00:21:20,000 --> 00:21:25,000 there's an incorrect little base pair in there, 267 00:21:25,000 --> 00:21:29,000 but it also has to know which is the parental strand, 268 00:21:29,000 --> 00:21:33,000 the template strand, and which is the daughter strand, 269 00:21:33,000 --> 00:21:37,000 the newly synthesized strand. And this system makes the assumption 270 00:21:37,000 --> 00:21:41,000 that the strand that's old is the one that's correct and the mistake 271 00:21:41,000 --> 00:21:44,000 is on the new one. You guys see that? 272 00:21:44,000 --> 00:21:48,000 OK. So that gets another two or three orders of magnitude in 273 00:21:48,000 --> 00:21:52,000 accuracy and that's what brings it up. 274 00:21:52,000 --> 00:21:55,000 Now, the people who made this, who formulated this model for 275 00:21:55,000 --> 00:21:59,000 mismatch repair, complete with the feature that it 276 00:21:59,000 --> 00:22:03,000 needed to recognize the old and new strand, that's a bit of a trick, 277 00:22:03,000 --> 00:22:07,000 if you think about it because it's DNA on both sides. 278 00:22:07,000 --> 00:22:10,000 And there are several different ways used in nature, 279 00:22:10,000 --> 00:22:14,000 so I'm not going to go into it, but there's at least a couple of 280 00:22:14,000 --> 00:22:18,000 different ways of doing that trick. 
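Why the choice of strand matters can be illustrated with a small Python sketch of the G and T example. This is a toy model, not the cell's actual mechanism: the sequences and function names are made up for the illustration, strands are written position by position rather than antiparallel, and how the cell actually tells old from new is left out, as it is in the lecture.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def copy_strand(strand):
    """Replicate a strand by pairing each base with its complement."""
    return "".join(COMPLEMENT[base] for base in strand)

def mismatch_repair(trusted, other):
    """Rewrite every mispaired base on `other` to match the `trusted` strand."""
    return "".join(
        base if COMPLEMENT[t] == base else COMPLEMENT[t]
        for t, base in zip(trusted, other)
    )

parental = "ATGGC"   # old (template) strand
daughter = "TACTG"   # new strand: a T went in opposite the second G

# No repair: at the next round the two strands give different duplexes.
from_parental = copy_strand(parental)   # still pairs G with C at that site
from_daughter = copy_strand(daughter)   # now carries an A opposite the T

# Repair that trusts the old strand restores the original sequence.
fixed_daughter = mismatch_repair(parental, daughter)

# Repair that trusts the new strand instead locks the mutation in.
wrongly_fixed_parental = mismatch_repair(daughter, parental)

print(from_parental, from_daughter, fixed_daughter, wrongly_fixed_parental)
```

Both repairs remove the bump, but only the one that treats the parental strand as correct gets back the original G and C pair.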
You could sort of see if you were 281 00:22:18,000 --> 00:22:22,000 the replication fork and you talked to that you could certainly, 282 00:22:22,000 --> 00:22:26,000 just from the geometry of that, if you wanted, you could probably 283 00:22:26,000 --> 00:22:29,000 keep track of who's old and new. E. coli has a very cute trick, 284 00:22:29,000 --> 00:22:33,000 but it's not universal so I won't go into it, but the people who did the 285 00:22:33,000 --> 00:22:36,000 seminal stuff, I had to just quickly show you a 286 00:22:36,000 --> 00:22:39,000 couple of pictures. When I showed you that picture of 287 00:22:39,000 --> 00:22:43,000 the DNA 50th, the guy sitting in the front row was Miroslav Radman who 288 00:22:43,000 --> 00:22:46,000 was one of the two people. He's a European scientist 289 00:22:46,000 --> 00:22:49,000 originally from Croatia. And he collaborated with someone 290 00:22:49,000 --> 00:22:53,000 you've heard about before, Matt Meselson, who was up at Harvard. 291 00:22:53,000 --> 00:22:56,000 And it was with the Meselson-Stahl experiment that showed the 292 00:22:56,000 --> 00:23:00,000 semi-conservative mechanism of DNA replication. 293 00:23:00,000 --> 00:23:03,000 This was a little reception. And Matt was talking to Alex Rich 294 00:23:03,000 --> 00:23:07,000 who's in the MIT Biology Department. And I was amused because remember 295 00:23:07,000 --> 00:23:11,000 how Vernon told you how Francis Crick would run up and down the 296 00:23:11,000 --> 00:23:15,000 stairs in the Cambridge lab and he was talking all the time? 297 00:23:15,000 --> 00:23:19,000 And I've heard Vernon say you could never really tell whether an idea 298 00:23:19,000 --> 00:23:23,000 came from Watson or Crick because they'd just talk, 299 00:23:23,000 --> 00:23:27,000 talk, talk all the time. So this was at a sort of nice 300 00:23:27,000 --> 00:23:31,000 reception at the DNA 50th. 
And within a couple of minutes, 301 00:23:31,000 --> 00:23:35,000 I looked over and there were Miroslav Radman and Matt Meselson 302 00:23:35,000 --> 00:23:39,000 talk, talk, talk. They were in the corner drawing 303 00:23:39,000 --> 00:23:43,000 pictures on a board. I also showed you actually a 304 00:23:43,000 --> 00:23:47,000 picture of one of the genes that's involved in recognizing this 305 00:23:47,000 --> 00:23:51,000 mismatch, because there's a protein that recognizes that mismatch and 306 00:23:51,000 --> 00:23:55,000 it's given the name of MutS. And when I was showing you some 307 00:23:55,000 --> 00:23:59,000 proteins it had one that had a lot of alpha helices. 308 00:23:59,000 --> 00:24:04,000 This is actually a picture of MutS. It's a dimer. 309 00:24:04,000 --> 00:24:09,000 That's why some of it's green and some of it's blue. 310 00:24:09,000 --> 00:24:14,000 And this is DNA viewed end on and it's recognizing a GT mismatch in 311 00:24:14,000 --> 00:24:19,000 DNA in that picture. Now, this may sound very esoteric, 312 00:24:19,000 --> 00:24:24,000 you know, and obviously important for life and an important part of 313 00:24:24,000 --> 00:24:29,000 sort of understanding how life works if you're interested in studying 314 00:24:29,000 --> 00:24:34,000 molecular biology. It may not seem to have very much 315 00:24:34,000 --> 00:24:39,000 connection to your real life. But, in fact, in this case mismatch 316 00:24:39,000 --> 00:24:44,000 repair does because it affects the frequency with which, 317 00:24:44,000 --> 00:24:49,000 if you lose it, then when you replicate your DNA you're going to 318 00:24:49,000 --> 00:24:54,000 make more mistakes. And I need to just give you a very 319 00:24:54,000 --> 00:24:59,000 quick introduction to cancer so you can see why this is important. 
320 00:24:59,000 --> 00:25:03,000 Cancer comes from the fact that remember a human cell or a 321 00:25:03,000 --> 00:25:07,000 multi-cell like us that has many kinds of different cells starts out 322 00:25:07,000 --> 00:25:11,000 from one cell. And I talked about first you get 323 00:25:11,000 --> 00:25:15,000 the embryonic stem cells that can become anything. 324 00:25:15,000 --> 00:25:19,000 And the cells become successively more and more and more specialized 325 00:25:19,000 --> 00:25:23,000 as they go along. So ultimately a cell that's in your 326 00:25:23,000 --> 00:25:27,000 retina or in, say, the lining of your colon needs to 327 00:25:27,000 --> 00:25:32,000 know that's where it belongs. And it also needs to know that it 328 00:25:32,000 --> 00:25:37,000 cannot just keep replicating. So if this is actually showing a 329 00:25:37,000 --> 00:25:42,000 little picture of the lining of your intestine. And there's a single 330 00:25:42,000 --> 00:25:46,000 layer of cells right along the inside edge of your intestines. 331 00:25:46,000 --> 00:25:51,000 This is the cells through which all the nutrient exchange happens and 332 00:25:51,000 --> 00:25:56,000 everything else when your body extracts nutrients as food stuff 333 00:25:56,000 --> 00:26:01,000 passes through your intestine. And so what happens with cancer is a 334 00:26:01,000 --> 00:26:05,000 cell that's normally a part of your body has to obey a whole set of 335 00:26:05,000 --> 00:26:10,000 rules. And what you can think of when someone starts to develop 336 00:26:10,000 --> 00:26:15,000 cancer is that what started out as an ordinary cell undergoes some kind 337 00:26:15,000 --> 00:26:19,000 of successive changes in its DNA that gradually causes it to forget 338 00:26:19,000 --> 00:26:24,000 the rules that make it be part of an organized body system. 339 00:26:24,000 --> 00:26:29,000 So if we take a look here at all these different cells. 
340 00:26:29,000 --> 00:26:33,000 But let's imagine just one of them gets a change that makes it forget 341 00:26:33,000 --> 00:26:38,000 to stop, or it should know to stop replicating when it touches its 342 00:26:38,000 --> 00:26:42,000 neighbors, but if a cell were to lose that control what would happen? 343 00:26:42,000 --> 00:26:47,000 Well, it would then begin to proliferate. And then what happens 344 00:26:47,000 --> 00:26:52,000 in cancer is the cell will, now there are more of them, and one 345 00:26:52,000 --> 00:26:56,000 cell will acquire an additional mutation that will lead to a further 346 00:26:56,000 --> 00:27:01,000 loss of growth control. You can see now the cells are 347 00:27:01,000 --> 00:27:05,000 starting to become sort of funny shapes. And then one of the cells 348 00:27:05,000 --> 00:27:09,000 in here will undergo yet another change. And right at this point, 349 00:27:09,000 --> 00:27:13,000 up until now, the cancer has, even though the cells are dividing and 350 00:27:13,000 --> 00:27:18,000 have lost some of their growth control they're still staying in the 351 00:27:18,000 --> 00:27:22,000 same place. So that would be sort of, you know, like a wart or 352 00:27:22,000 --> 00:27:26,000 something like that, or what you would hear as a benign 353 00:27:26,000 --> 00:27:30,000 tumor. You can go in surgically and take it 354 00:27:30,000 --> 00:27:34,000 away. But then the other thing that can happen is cells can forget where 355 00:27:34,000 --> 00:27:38,000 they're supposed to be in the body. And when that happens they say the 356 00:27:38,000 --> 00:27:42,000 cells metastasize and become metastatic or a malignant tumor. 357 00:27:42,000 --> 00:27:46,000 And what that means is the cell is beginning to, it's acquired yet 358 00:27:46,000 --> 00:27:50,000 another change that's made it forget which part of the body it's supposed 359 00:27:50,000 --> 00:27:54,000 to be in. 
And they've signified it here as being a change in this cell 360 00:27:54,000 --> 00:27:58,000 that then leads to, you can see here right now it's 361 00:27:58,000 --> 00:28:02,000 starting to invade into the whole intestine. 362 00:28:02,000 --> 00:28:06,000 Or if one of those cells comes off loose in your bloodstream it can land 363 00:28:06,000 --> 00:28:10,000 somewhere else in your body and then start to grow there. 364 00:28:10,000 --> 00:28:15,000 And that's what happens when somebody has metastatic cancer. 365 00:28:15,000 --> 00:28:19,000 You cannot really cure it because now there are cancer cells all over 366 00:28:19,000 --> 00:28:24,000 the body. And that usually is a very difficult situation to get any 367 00:28:24,000 --> 00:28:28,000 kind of cure on. So to put this in perspective, 368 00:28:28,000 --> 00:28:32,000 you needed to have a number of changes to go from an ordinary cell 369 00:28:32,000 --> 00:28:37,000 to a metastatic cancer cell. So each one of these changes there 370 00:28:37,000 --> 00:28:43,000 was some kind of change in the DNA. Either there was a mutation or 371 00:28:43,000 --> 00:28:48,000 maybe a chromosome was lost or something like this so that you need 372 00:28:48,000 --> 00:28:53,000 a series of successive genetic alterations. So there was a very 373 00:28:53,000 --> 00:28:59,000 key insight that a number of people had after we understood the 374 00:28:59,000 --> 00:29:04,000 mechanism of mismatch repair. Because some people realized that if 375 00:29:04,000 --> 00:29:09,000 a human cell had lost mismatch repair then the frequency of each 376 00:29:09,000 --> 00:29:14,000 one of these changes would go up. It wouldn't affect what the change 377 00:29:14,000 --> 00:29:19,000 was. It wouldn't actually have anything to do, 378 00:29:19,000 --> 00:29:24,000 if you lost mismatch repair it wouldn't affect directly the ability 379 00:29:24,000 --> 00:29:29,000 of this cell to stop dividing when it touches its neighbors. 
380 00:29:29,000 --> 00:29:35,000 But it would increase the chances that a mutation somewhere would have 381 00:29:35,000 --> 00:29:41,000 that effect. And if every one of these steps goes now a hundred or a 382 00:29:41,000 --> 00:29:47,000 thousand times faster, you can see that if somebody loses 383 00:29:47,000 --> 00:29:54,000 mismatch repair in a cell then the chances of that cell coming into a 384 00:29:54,000 --> 00:30:00,000 cancer are very high. So there was a kind of human cancer, 385 00:30:00,000 --> 00:30:06,000 it's a susceptibility to colon cancer called hereditary 386 00:30:06,000 --> 00:30:12,000 nonpolyposis colon cancer. You don't need to remember the name. 387 00:30:12,000 --> 00:30:18,000 It's often abbreviated HNPCC for people who cannot remember the name. 388 00:30:18,000 --> 00:30:23,000 But it was a kind of susceptibility to cancer that ran in families. 389 00:30:23,000 --> 00:30:29,000 So it was thought to be genetically determined in some way. 390 00:30:29,000 --> 00:30:33,000 And one of the interesting things was a number of the people who had 391 00:30:33,000 --> 00:30:38,000 this disease would show a kind of instability of the genome if they 392 00:30:38,000 --> 00:30:42,000 looked in the tumors. They just looked at the DNA. 393 00:30:42,000 --> 00:30:47,000 It seemed to be undergoing changes at a much faster rate. 394 00:30:47,000 --> 00:30:51,000 And the insight that came out was that the people who had this disease 395 00:30:51,000 --> 00:30:56,000 had, for example, a mutation affecting what we can 396 00:30:56,000 --> 00:31:01,000 think of as a human homolog of MutS. 
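To put rough numbers on the point that every step goes "a hundred or a thousand times faster" without mismatch repair, here is a minimal sketch. It rests on a toy model that is not from the lecture: the path to cancer is treated as a fixed number of independent genetic changes, each with a small per-step probability. The step count `ASSUMED_STEPS` and the function name are hypothetical, chosen only for illustration.

```python
# Toy model (an assumption, not from the lecture): progression needs a fixed
# number of independent genetic changes, each with a small probability.
# If every per-step probability is multiplied by the same factor, the chance
# of acquiring ALL the changes is multiplied by that factor once per step.

def combined_fold_increase(per_step_fold: float, steps: int) -> float:
    """Fold increase in the chance of getting all `steps` changes when each
    per-step probability is scaled by `per_step_fold`.

    Only meaningful while the scaled probabilities stay well below 1.
    """
    return per_step_fold ** steps


ASSUMED_STEPS = 4  # hypothetical number of successive changes

for fold in (100, 1000):  # the "hundred or a thousand times" from the lecture
    total = combined_fold_increase(fold, ASSUMED_STEPS)
    print(f"{fold}x per step over {ASSUMED_STEPS} steps -> {total:.0e}x overall")
```

Under these assumptions even a modest number of steps compounds into an enormous overall increase, which is the intuition behind the chances being "very high". Real tumor progression is not this simple, so the output is an illustration only.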
397 00:31:01,000 --> 00:31:06,000 And we'll talk about genetics of humans in a small number of weeks, 398 00:31:06,000 --> 00:31:11,000 but I think most of you know that for most genes, 399 00:31:11,000 --> 00:31:16,000 except for the genes associated with the sex chromosomes, 400 00:31:16,000 --> 00:31:21,000 you get one copy of a gene from mom and another copy of a gene from dad. 401 00:31:21,000 --> 00:31:26,000 So under most circumstances we would have two good copies of this 402 00:31:26,000 --> 00:31:32,000 gene encoding a human homolog of MutS. 403 00:31:32,000 --> 00:31:36,000 What does that human homolog of MutS do? The same thing as the 404 00:31:36,000 --> 00:31:41,000 bacteria. It recognizes a mismatch in DNA and fixes it up. 405 00:31:41,000 --> 00:31:45,000 So it turned out that what the people with this disease have is 406 00:31:45,000 --> 00:31:50,000 they have one of the genes. The gene they got from mom or the 407 00:31:50,000 --> 00:31:55,000 gene they got from dad is broken. So they're still OK. They have one 408 00:31:55,000 --> 00:32:00,000 copy of mismatch repair in every cell. 409 00:32:00,000 --> 00:32:06,000 But if a cell ever had lost that copy of the good version now that 410 00:32:06,000 --> 00:32:13,000 cell and all of its descendants would mutate at something like a 411 00:32:13,000 --> 00:32:20,000 hundred or a thousand times the normal probability. 412 00:32:20,000 --> 00:32:27,000 And so they would progress down this pathway. 413 00:32:27,000 --> 00:32:33,000 And so the polyposis means that if they look in the colons of people 414 00:32:33,000 --> 00:32:39,000 who have this disease they find lots and lots of little growths or polyps 415 00:32:39,000 --> 00:32:45,000 that are on their way to progressing down this disease. 416 00:32:45,000 --> 00:32:51,000 Even in these people it takes quite a while. 
And so once they knew that 417 00:32:51,000 --> 00:32:57,000 they were able to go in and through colonoscopies find these cancers 418 00:32:57,000 --> 00:33:02,000 and remove them. And most of you will not have that 419 00:33:02,000 --> 00:33:06,000 disease, but this is now a kind of cancer that's pretty much 420 00:33:06,000 --> 00:33:11,000 preventable as long as it gets detected. It can take in a normal 421 00:33:11,000 --> 00:33:15,000 person as long as 20 years or something for an initial cell that 422 00:33:15,000 --> 00:33:19,000 underwent this initial change to go all the way down to becoming 423 00:33:19,000 --> 00:33:24,000 metastatic. So when you get older, and this certainly applies to most 424 00:33:24,000 --> 00:33:28,000 of your parents or in this age group, you should ask them if they've 425 00:33:28,000 --> 00:33:32,000 had a colonoscopy. It's not the world's most fun 426 00:33:32,000 --> 00:33:35,000 procedure because, you know, they stick a probe and 427 00:33:35,000 --> 00:33:39,000 look inside your intestine, but it isn't that bad. And what 428 00:33:39,000 --> 00:33:42,000 they do is if they see one of these little polyps they can catch it 429 00:33:42,000 --> 00:33:45,000 before it's progressed far enough to be metastatic. 430 00:33:45,000 --> 00:33:49,000 And then there's no problem. I had my first one done about, 431 00:33:49,000 --> 00:33:52,000 I don't know, three or four years ago and they found one. 432 00:33:52,000 --> 00:33:55,000 And they took it out and I'm fine. But if it had been left there and 433 00:33:55,000 --> 00:33:59,000 allowed to progress then some years down the line I would have 434 00:33:59,000 --> 00:34:02,000 gotten colon cancer. And I'm going to have to go back and 435 00:34:02,000 --> 00:34:06,000 get checked again in another year or two. But it is something that you 436 00:34:06,000 --> 00:34:10,000 should check with your parents because everybody should have a 437 00:34:10,000 --> 00:34:13,000 colonoscopy. 
My hope is by the time you guys reach an age when this 438 00:34:13,000 --> 00:34:17,000 comes they'll probably have some kind of little blood test or 439 00:34:17,000 --> 00:34:20,000 something where you won't have to go through this indignity. 440 00:34:20,000 --> 00:34:24,000 But right at the moment it's something everyone should do, 441 00:34:24,000 --> 00:34:28,000 I think. I just wanted to make one other comment about basic research 442 00:34:28,000 --> 00:34:32,000 because there's another thing here. Actually, my lab was the first lab 443 00:34:32,000 --> 00:34:36,000 to clone the mutS gene. We cloned it, we sequenced it, 444 00:34:36,000 --> 00:34:40,000 and we looked in the databases. And at that time in the late eighties 445 00:34:40,000 --> 00:34:44,000 there was nothing else that looked like it. I thought it would be like, 446 00:34:44,000 --> 00:34:48,000 there were some sort of similar mutants, and here's what it looked 447 00:34:48,000 --> 00:34:52,000 like. This is a culture of E. coli. And there are about ten to 448 00:34:52,000 --> 00:34:56,000 the ninth cells per mil. And we plated about ten to the 449 00:34:56,000 --> 00:35:00,000 ninth or ten to the eighth on a plate with a drug on it. 450 00:35:00,000 --> 00:35:03,000 And you can see they almost all died, but there were maybe three or four 451 00:35:03,000 --> 00:35:07,000 that survived. And then their descendants were 452 00:35:07,000 --> 00:35:11,000 able to grow up and form a colony. This is how we recognized something 453 00:35:11,000 --> 00:35:15,000 was defective in what we now know as mismatch repair. 454 00:35:15,000 --> 00:35:18,000 If you took this mutant of E. coli and plated it out, you'd see 455 00:35:18,000 --> 00:35:22,000 you got a lot more drug-resistant colonies. That's the difference 456 00:35:22,000 --> 00:35:26,000 that I was describing, the importance of mismatch repair. 
457 00:35:26,000 --> 00:35:30,000 If you don't have mismatch repair you can see, you get a lot more 458 00:35:30,000 --> 00:35:33,000 mistakes that show up as mutants. So I was studying that. 459 00:35:33,000 --> 00:35:37,000 And we cloned the mutS and mutL genes, which is another gene that's 460 00:35:37,000 --> 00:35:41,000 involved in this. Didn't see anything in the database, 461 00:35:41,000 --> 00:35:44,000 but there were very similar mutants in Streptococcus pneumoniae that 462 00:35:44,000 --> 00:35:48,000 people had isolated. Remember Streptococcus pneumoniae in 463 00:35:48,000 --> 00:35:51,000 the transformation experiments? So I thought, well, maybe these are 464 00:35:51,000 --> 00:35:55,000 the same genes on an evolutionary basis. So I phoned some labs, 465 00:35:55,000 --> 00:35:59,000 and I found one that was sequencing what turned out to be 466 00:35:59,000 --> 00:36:02,000 a homolog of MutS. We tried to publish our papers in a 467 00:36:02,000 --> 00:36:06,000 medium fancy journal because I thought this was a pretty cool 468 00:36:06,000 --> 00:36:10,000 result that two bacteria that were evolutionarily very diverged had 469 00:36:10,000 --> 00:36:14,000 this conserved mechanism for mismatch repair, but the reviewer said, 470 00:36:14,000 --> 00:36:18,000 you know, this is a pretty specialized topic, 471 00:36:18,000 --> 00:36:22,000 it's not of general interest, it should go in, the phrase they use 472 00:36:22,000 --> 00:36:26,000 is "a more specialized journal". So it was published in the Journal 473 00:36:26,000 --> 00:36:30,000 of Bacteriology which is a really wonderful journal, 474 00:36:30,000 --> 00:36:33,000 but it basically deals with bacteria. And about a week after that paper 475 00:36:33,000 --> 00:36:36,000 came out my phone rang and it was a guy from Emory. 476 00:36:36,000 --> 00:36:39,000 And he said, "I work on mouse. 
We were sequencing a gene," it 477 00:36:39,000 --> 00:36:42,000 doesn't matter what, "and we sequenced in the wrong 478 00:36:42,000 --> 00:36:45,000 direction. And we seem to have something called MutS. 479 00:36:45,000 --> 00:36:48,000 Do you know anything about MutS?" And a couple of days after that I 480 00:36:48,000 --> 00:36:51,000 got a phone call from somebody at NIH. And they said the same thing, 481 00:36:51,000 --> 00:36:54,000 "We were trying to sequence this gene in humans. 482 00:36:54,000 --> 00:36:57,000 We kind of sequenced in the wrong direction and found MutS." 483 00:36:57,000 --> 00:37:00,000 So within a week of the paper coming out I knew there were mouse 484 00:37:00,000 --> 00:37:04,000 and human homologs. And that led from these sorts of 485 00:37:04,000 --> 00:37:09,000 studies, which my first graduate student worked on, 486 00:37:09,000 --> 00:37:14,000 to the identification of the human homologs. And then not me but 487 00:37:14,000 --> 00:37:19,000 others made the connection between mismatch repair and cancer. 488 00:37:19,000 --> 00:37:24,000 But this is the way a lot of things happen with basic research. 489 00:37:24,000 --> 00:37:29,000 This doesn't look like anything that's very important. 490 00:37:29,000 --> 00:37:32,000 And it sure doesn't look like it's going to lead to an insight into 491 00:37:32,000 --> 00:37:36,000 cancer, but this is very much the way it goes. I've had this happen 492 00:37:36,000 --> 00:37:40,000 twice with another set of genes in my life that turned out to be 493 00:37:40,000 --> 00:37:44,000 important for cancer as well. And, as I said, what happens, 494 00:37:44,000 --> 00:37:47,000 if you lose mismatch repair, then all these alterations happen much 495 00:37:47,000 --> 00:37:51,000 more quickly and the cells can become cancerous. 
496 00:37:51,000 --> 00:37:55,000 I've included a couple of outtakes because I actually made this slide 497 00:37:55,000 --> 00:37:59,000 with my son's pillowcase on our dining room counter. 498 00:37:59,000 --> 00:38:04,000 And our cats, who you saw at some point earlier in the year, 499 00:38:04,000 --> 00:38:09,000 thought this was the weirdest thing they had ever seen, 500 00:38:09,000 --> 00:38:14,000 when I brought these plates home. So, OK, anyway. All right. So one 501 00:38:14,000 --> 00:38:19,000 other thing to tell you about DNA replication before I move 502 00:38:19,000 --> 00:38:28,000 on, and that is -- 503 00:38:28,000 --> 00:38:33,000 -- the initiation of DNA replication. In E. coli there's one great big 504 00:38:33,000 --> 00:38:39,000 piece of DNA. And it's all one giant circular chromosome. 505 00:38:39,000 --> 00:38:45,000 And if you realize what I've told you about DNA replication, 506 00:38:45,000 --> 00:38:50,000 I've talked to you only about once you have a replication fork 507 00:38:50,000 --> 00:38:56,000 established how you keep it going. But, as you might guess, a really 508 00:38:56,000 --> 00:39:02,000 important point of biological control is the initiation 509 00:39:02,000 --> 00:39:07,000 of DNA replication. And so the way cells do that is they 510 00:39:07,000 --> 00:39:12,000 have a special sequence in their DNA. It's written just with Gs and Cs 511 00:39:12,000 --> 00:39:17,000 and As and Ts, but it's a word sort of written in a 512 00:39:17,000 --> 00:39:23,000 different language than the kind of genetic code we're going to be 513 00:39:23,000 --> 00:39:28,000 talking about in the next couple of lectures. And what it means is 514 00:39:28,000 --> 00:39:34,000 "start replication here". And so these sequences are 515 00:39:34,000 --> 00:39:41,000 called origins of DNA replication. And, for example, in E. coli it's a 516 00:39:41,000 --> 00:39:48,000 stretch of DNA that's about 250 base pairs long. 
And it's got a sequence 517 00:39:48,000 --> 00:39:55,000 that lets proteins bind and they kind of are able to make a little 518 00:39:55,000 --> 00:40:01,000 bubble like this. And it's at the edges of this little 519 00:40:01,000 --> 00:40:05,000 bubble where it's able to start a replication fork. 520 00:40:05,000 --> 00:40:09,000 And one of the secrets to control of cell division is that cells are 521 00:40:09,000 --> 00:40:13,000 able then to control whether the protein that sees the origin is 522 00:40:13,000 --> 00:40:17,000 there or not. And it won't start a new round of replication unless 523 00:40:17,000 --> 00:40:21,000 everything is right. Then it can make the things that 524 00:40:21,000 --> 00:40:25,000 initiate a new round. And after that it finishes. 525 00:40:25,000 --> 00:40:30,000 Our eukaryotic cells with a lot more DNA use the same thing. 526 00:40:30,000 --> 00:40:34,000 The same idea, but there tend to be multiple 527 00:40:34,000 --> 00:40:39,000 origins. And you get a little bubble and another little one down 528 00:40:39,000 --> 00:40:44,000 here. And once you get the replication forks established then 529 00:40:44,000 --> 00:40:49,000 these kind of merge. And then eventually we end up with 530 00:40:49,000 --> 00:40:53,000 the two strands of DNA. But I just mention that in passing 531 00:40:53,000 --> 00:40:58,000 because it's an example of how even though the DNA is nothing but Gs and 532 00:40:58,000 --> 00:41:03,000 Cs and As and Ts, you can kind of write words in there 533 00:41:03,000 --> 00:41:08,000 that mean different things. Some of them in the genetic code 534 00:41:08,000 --> 00:41:13,000 tell you what the order of amino acids in a protein is, 535 00:41:13,000 --> 00:41:17,000 but everything else has to be encoded in the DNA, 536 00:41:17,000 --> 00:41:22,000 too. And here's a really nice example of how that works. 
537 00:41:22,000 --> 00:41:27,000 Now, we're going to switch at this point from worrying about how DNA is 538 00:41:27,000 --> 00:41:32,000 replicated to how information is stored and interpreted. 539 00:41:32,000 --> 00:41:39,000 And there's a figure that most of 540 00:41:39,000 --> 00:41:44,000 you have probably seen, DNA goes to RNA goes to protein. 541 00:41:44,000 --> 00:41:48,000 This is the usual direction of information flow. 542 00:41:48,000 --> 00:41:53,000 The information for making proteins is encoded in the DNA, 543 00:41:53,000 --> 00:41:58,000 as we'll talk about in more detail, and an RNA copy of some piece of 544 00:41:58,000 --> 00:42:03,000 that, one gene's worth usually, gets made in RNA. 545 00:42:03,000 --> 00:42:12,000 And then that information in the RNA is used to direct the sequences of 546 00:42:12,000 --> 00:42:22,000 amino acids that appear in a protein. And this is a four letter alphabet, 547 00:42:22,000 --> 00:42:31,000 if you want, A, G, T and C. This is a four letter alphabet, 548 00:42:31,000 --> 00:42:39,000 A, G, U and C, where the uracil and the thymine have the same base 549 00:42:39,000 --> 00:42:47,000 pairing capacity. And this is a 20 letter alphabet. 550 00:42:47,000 --> 00:42:54,000 All those 20 amino acids that you 551 00:42:54,000 --> 00:42:58,000 were looking at, at the chart over at the back of the 552 00:42:58,000 --> 00:43:02,000 exam. So from the point of view of 553 00:43:02,000 --> 00:43:06,000 information storage and information flow there are some interesting 554 00:43:06,000 --> 00:43:11,000 things that had to come up in order for the information to flow in that 555 00:43:11,000 --> 00:43:15,000 way. But before I do that I want to just get you to think about DNA as 556 00:43:15,000 --> 00:43:20,000 an information storage device. This is MIT. 
I'm almost sure in 557 00:43:20,000 --> 00:43:24,000 this room there are some people that are experts in high density 558 00:43:24,000 --> 00:43:29,000 information storage. And even if you're not most of us 559 00:43:29,000 --> 00:43:35,000 have now a lot of experience with it. Your computer can do gigabytes of 560 00:43:35,000 --> 00:43:40,000 information. Your iPod probably has a 40 megabyte hard drive in it or 561 00:43:40,000 --> 00:43:46,000 something like that. So you have some experience with 562 00:43:46,000 --> 00:43:51,000 high density information storage. So here's the question. How much 563 00:43:51,000 --> 00:43:57,000 DNA would it take to encode everybody who's alive on earth today, 564 00:43:57,000 --> 00:44:02,000 6 billion and a bit people? And let's argue that all we need is 565 00:44:02,000 --> 00:44:08,000 a single cell's worth of DNA because everybody started out a single 566 00:44:08,000 --> 00:44:14,000 fertilized egg and went on. Yeah? OK. Enough DNA to fill one 567 00:44:14,000 --> 00:44:19,000 human being. Anybody else got any sense? All right. 568 00:44:19,000 --> 00:44:25,000 This is, I think, the most amazing demo. I did this when I was 569 00:44:25,000 --> 00:44:30,000 teaching for the first time. The amount of DNA it would take to 570 00:44:30,000 --> 00:44:34,000 encode everybody who's alive on earth, one cell of everybody who's 571 00:44:34,000 --> 00:44:39,000 alive on earth today is this little thing in here, 572 00:44:39,000 --> 00:44:43,000 which you probably cannot see even, but I took a picture of it. There 573 00:44:43,000 --> 00:44:48,000 are about six times ten to the minus twelfth grams of DNA in a human cell. 574 00:44:48,000 --> 00:44:53,000 And if you multiply that out by 6 billion people it comes out to 36 575 00:44:53,000 --> 00:44:57,000 milligrams of DNA. And I weighed out 40 something 576 00:44:57,000 --> 00:45:02,000 milligrams of DNA. 
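The arithmetic behind the 36 milligram figure can be checked with a short sketch. The two input values are the ones quoted in the lecture (about six times ten to the minus twelfth grams of DNA per human cell, and 6 billion people); the constant and function names are hypothetical, and the 40 milligram comparison stands in for the "40 something" that was weighed out.

```python
# Inputs as quoted in the lecture; the names are illustrative only.
DNA_PER_CELL_GRAMS = 6e-12   # grams of DNA in one human cell
WORLD_POPULATION = 6e9       # "6 billion and a bit" people, rounded down
WEIGHED_OUT_MG = 40.0        # assumed lower bound for "40 something" milligrams


def total_dna_milligrams(grams_per_cell: float, people: float) -> float:
    """Mass of DNA, in milligrams, for one cell's worth per person."""
    grams = grams_per_cell * people
    return grams * 1000.0  # 1 gram = 1000 milligrams


needed_mg = total_dna_milligrams(DNA_PER_CELL_GRAMS, WORLD_POPULATION)
print(f"DNA needed: about {needed_mg:.0f} mg")
print(f"Weighed sample is enough: {WEIGHED_OUT_MG >= needed_mg}")
```

The product is 0.036 grams, so the weighed sample holds slightly more DNA than one cell's worth for every person alive.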
So there's actually more DNA there 577 00:45:02,000 --> 00:45:06,000 than you need to encode everybody who's alive on earth today. 578 00:45:06,000 --> 00:45:11,000 And I don't know how this hits you, but I've been working on DNA my 579 00:45:11,000 --> 00:45:16,000 entire life. And every time I do this, you know, 580 00:45:16,000 --> 00:45:20,000 I think I understand this molecule, but I don't really think I do at 581 00:45:20,000 --> 00:45:25,000 some more fundamental level. It's absolutely amazing how much 582 00:45:25,000 --> 00:45:30,000 information is stored in that molecule. 583 00:45:30,000 --> 00:45:34,000 So the one point I will, actually, I think it's close enough. 584 00:45:34,000 --> 00:45:37,000 Why don't we just call it a day, and I'll pick this stuff --