1 00:00:00,100 --> 00:00:04,000 What we've talked about in recombinant DNA so far is how to get 2 00:00:04,100 --> 00:00:09,000 a piece of DNA from somewhere and make a whole lot of copies of it. 3 00:00:09,100 --> 00:00:14,000 So, instead of working with DNA extracted from my cells, 4 00:00:14,100 --> 00:00:18,000 in which there are 3 billion base pairs of sequence, 5 00:00:18,100 --> 00:00:23,000 we can find a little stretch of DNA. But simply being able to clone it, 6 00:00:23,100 --> 00:00:28,000 it was a long way from being able to get a lot of a fragment of 7 00:00:28,100 --> 00:00:33,000 DNA to being able to figure out what that sequence is. 8 00:00:33,100 --> 00:00:37,000 That sequence of that gene is actually from the xeroderma 9 00:00:37,100 --> 00:00:41,000 gene, the gene that's broken in the xeroderma pigmentosum variant 10 00:00:41,100 --> 00:00:45,000 patients, who are missing one of these translesion DNA polymerases that can 11 00:00:45,100 --> 00:00:50,000 copy over a thymine-thymine pyrimidine 12 00:00:50,100 --> 00:00:54,000 dimer that is induced by UV light. So the reason they have that 13 00:00:54,100 --> 00:00:58,000 problem with their skin after sunlight is because they are missing 14 00:00:58,100 --> 00:01:02,000 a polymerase that can copy accurately over this very common 15 00:01:02,100 --> 00:01:07,000 lesion caused by DNA damage. But how do you get from having a 16 00:01:07,100 --> 00:01:11,000 piece of DNA to having the sequence? So, the first thing that people 17 00:01:11,100 --> 00:01:16,000 learned to do, and you still do this all the time in any molecular 18 00:01:16,100 --> 00:01:20,000 biology lab. We're sort of switching now to engineering, as 19 00:01:20,100 --> 00:01:25,000 you'll see. You're going to see in the next things that I'm going to 20 00:01:25,100 --> 00:01:29,000 say, proteins that I talked to you about because of 21 00:01:29,100 --> 00:01:34,000 their biological roles. 
DNA polymerases, 22 00:01:34,100 --> 00:01:38,000 ligases, we learned what they do. And now you're going to see them 23 00:01:38,100 --> 00:01:42,000 used in manipulative ways. Restriction enzymes: they have a 24 00:01:42,100 --> 00:01:46,000 biological purpose, too. They weren't put on Earth for 25 00:01:46,100 --> 00:01:51,000 me to cut DNA up into fragments and clone. They were there to give the 26 00:01:51,100 --> 00:01:55,000 bacteria some kind of primitive immune system. 27 00:01:55,100 --> 00:01:59,000 But the first thing that you often have to do, when you have, 28 00:01:59,100 --> 00:02:04,000 let's say, a plasmid into which we've inserted a fragment. 29 00:02:04,100 --> 00:02:08,000 And let's say it was the kind of cloning we described the other day, 30 00:02:08,100 --> 00:02:13,000 where I had cut the vector at an EcoRI site, 31 00:02:13,100 --> 00:02:18,000 and the other DNA at an EcoRI site. So 32 00:02:18,100 --> 00:02:23,000 the junction between the inserted fragment and the cut vector has 33 00:02:23,100 --> 00:02:28,000 re-created two EcoRI sites now. And if I cut with EcoRI, I'll just 34 00:02:28,100 --> 00:02:33,000 undo what I did in the cloning, and we should get the vector DNA and 35 00:02:33,100 --> 00:02:39,000 the insert DNA back again. So, I can go from this vector. To 36 00:02:39,100 --> 00:02:45,000 give us an orientation, I'm going to imagine that it has 37 00:02:45,100 --> 00:02:51,000 one more restriction site. This one is called SalI. 38 00:02:51,100 --> 00:02:57,000 It just recognizes a different sequence. So, 39 00:02:57,100 --> 00:03:03,000 if I take the plasmid DNA and cut with EcoRI, I'm just reversing 40 00:03:03,100 --> 00:03:08,000 the cloning. So what I should get is the vector DNA 41 00:03:08,100 --> 00:03:14,000 and the insert that I generated in the first place. 42 00:03:14,100 --> 00:03:20,000 But I have to detect them somehow. 
Unfortunately, they don't just look 43 00:03:20,100 --> 00:03:26,000 in the test tube like that. So, what people do is they use a 44 00:03:26,100 --> 00:03:32,000 very simple principle. It's called gel electrophoresis. 45 00:03:32,100 --> 00:03:43,000 And the idea is you just make a gel 46 00:03:43,100 --> 00:03:49,000 of something. In this particular case, it's just made of agar, 47 00:03:49,100 --> 00:03:54,000 or agarose. These are polysaccharide products, 48 00:03:54,100 --> 00:04:00,000 often derived from seaweed or something like that. 49 00:04:00,100 --> 00:04:03,000 They have the property that if you warm them up, they're liquid, 50 00:04:03,100 --> 00:04:07,000 and that if you let them cool down, they're a gel. You've run into 51 00:04:07,100 --> 00:04:11,000 Jell-O, which has that property. That's actually made of a protein 52 00:04:11,100 --> 00:04:15,000 rather than a carbohydrate. But it's that kind of principle. 53 00:04:15,100 --> 00:04:18,000 So it's very easy to pour something and then let it solidify. 54 00:04:18,100 --> 00:04:22,000 Now you've got a slab. And it's just a network of things that 55 00:04:22,100 --> 00:04:26,000 interact. And the principle of the thing is that, 56 00:04:26,100 --> 00:04:30,000 so you have to get the molecules to move. Well, that's pretty easy with 57 00:04:30,100 --> 00:04:34,000 nucleic acids because they're charged. 58 00:04:34,100 --> 00:04:37,000 They have all those phosphates. They've got a lot of negative 59 00:04:37,100 --> 00:04:41,000 charge. So if you apply an electric field, they'll move. 60 00:04:41,100 --> 00:04:45,000 And the principle of the thing is that if you're big, 61 00:04:45,100 --> 00:04:49,000 it's harder to wiggle through this network than if you are small. 62 00:04:49,100 --> 00:04:53,000 Or you can think of a big, fat person trying to go through a forest 63 00:04:53,100 --> 00:04:57,000 with a lot of trees, and a little skinny one. 
64 00:04:57,100 --> 00:05:01,000 And if we let them have a race, eventually the skinny one will 65 00:05:01,100 --> 00:05:06,000 emerge from the forest first. And so, if we had a set of markers 66 00:05:06,100 --> 00:05:13,000 down on the side where this is big, and this is small, and we take this 67 00:05:13,100 --> 00:05:20,000 piece of DNA, we're going to get two fragments. The bigger one would be 68 00:05:20,100 --> 00:05:27,000 the vector and the smaller one would be an insert. So from that, 69 00:05:27,100 --> 00:05:34,000 we could say, oh, if I didn't know what I started with I could say what 70 00:05:34,100 --> 00:05:40,000 must be in that plasmid is the vector. 71 00:05:40,100 --> 00:05:46,000 And I can run it all by itself and see it's exactly the same size. 72 00:05:46,100 --> 00:05:51,000 And I got an insert of this particular size. 73 00:05:51,100 --> 00:05:57,000 And now, can I learn anything more about that just using restriction 74 00:05:57,100 --> 00:06:03,000 enzymes? And let's say I now take the same. 75 00:06:03,100 --> 00:06:10,000 Actually, maybe I'll do it over here. So let's cut, 76 00:06:10,100 --> 00:06:17,000 this time, with EcoRI plus another restriction enzyme. 77 00:06:17,100 --> 00:06:24,000 They all have these weird names: BamHI, and let's see what 78 00:06:24,100 --> 00:06:32,000 happens. Well, suppose I do that and I get something like this. 79 00:06:32,100 --> 00:06:38,000 Well, it looks like the vector wasn't cut at all. That still seems to be 80 00:06:38,100 --> 00:06:44,000 the same, but it looks as though the insert got cut into two pieces. 81 00:06:44,100 --> 00:06:50,000 Since it was linear, it must have one site in it. 82 00:06:50,100 --> 00:06:56,000 And so, this molecule that I cloned could be one of 83 00:06:56,100 --> 00:07:06,000 two kinds of ways. It could be like this. 84 00:07:06,100 --> 00:07:18,000 Let's say this is the insert. Here's the EcoRI. 
I'll use this 85 00:07:18,100 --> 00:07:31,000 SalI to orient us. So, the BamHI site could either be 86 00:07:31,100 --> 00:07:42,000 close over here. Or it could be over on the other 87 00:07:42,100 --> 00:07:50,000 side. Does that make sense? The logic is pretty simple. How 88 00:07:50,100 --> 00:07:58,000 can I tell which of those is correct? Just doing the kind 89 00:07:58,100 --> 00:08:07,000 of stuff I'm doing. Beautiful, beautiful. 90 00:08:07,100 --> 00:08:16,000 So if we cut with the SalI plus the BamHI, in one case, 91 00:08:16,100 --> 00:08:25,000 one would get a fragment like that. In the other case, you'd get a fragment 92 00:08:25,100 --> 00:08:32,000 like that. That should feel uneasily familiar 93 00:08:32,100 --> 00:08:36,000 to you. It should feel just like what we were doing when we did that 94 00:08:36,100 --> 00:08:40,000 phage cross, and we had some genes that were lined up. 95 00:08:40,100 --> 00:08:45,000 And we were trying to figure out, was the orientation this way? Or 96 00:08:45,100 --> 00:08:49,000 was the orientation that way? That was exactly the same principle. 97 00:08:49,100 --> 00:08:53,000 And so this is usually, in the lab you'd call this 98 00:08:53,100 --> 00:08:58,000 restriction mapping, or making a restriction map. 99 00:08:58,100 --> 00:09:02,000 And it enabled people to manipulate fragments of DNA and make inferences 100 00:09:02,100 --> 00:09:06,000 about their orientation and other features before we could actually even 101 00:09:06,100 --> 00:09:11,000 sequence DNA. And that's just part of the routine sort 102 00:09:11,100 --> 00:09:16,000 of stuff you do in a lab. The equipment is disarmingly simple. 103 00:09:16,100 --> 00:09:21,000 It looks something like that. Usually you put in some colored 104 00:09:21,100 --> 00:09:26,000 dye so you can see that the things are moving down the gel. 105 00:09:26,100 --> 00:09:32,000 And the way you visualize the DNA is you add a molecule. 
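The restriction-mapping logic from the SalI plus BamHI digest can be written out as a small sketch. Everything concrete here is an assumption for illustration: the function names, the base-pair sizes, and the tolerance are hypothetical and do not come from the lecture. The idea is only that each of the two possible maps predicts a different SalI-to-BamHI fragment, and the observed band picks one.

```python
# Hypothetical restriction-mapping helper; all sizes are in base pairs.

def candidate_fragments(sal_to_junction, insert_pieces):
    """Predict the SalI-to-BamHI fragment for each of the two possible maps.

    sal_to_junction: distance from the SalI site in the vector to the
        nearer EcoRI junction (assumed known from the vector map).
    insert_pieces: the two insert pieces seen in the EcoRI + BamHI digest.
    """
    small, large = sorted(insert_pieces)
    return {
        "BamHI near the SalI side": sal_to_junction + small,
        "BamHI far from the SalI side": sal_to_junction + large,
    }


def infer_bam_position(observed, sal_to_junction, insert_pieces, tolerance=50):
    """Return the map whose predicted fragment matches the observed band."""
    predictions = candidate_fragments(sal_to_junction, insert_pieces)
    matches = [
        label
        for label, size in predictions.items()
        if abs(size - observed) <= tolerance
    ]
    # Only commit to an answer when exactly one map fits the gel.
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    pieces = (300, 900)  # assumed insert pieces from EcoRI + BamHI
    print(candidate_fragments(500, pieces))
    print(infer_bam_position(1400, 500, pieces))
```

If the two insert pieces were nearly the same size, both maps would predict the same band and the function returns `None`, which matches the real limitation of this kind of mapping.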
106 00:09:32,100 --> 00:09:35,000 The name of it doesn't particularly matter. It's called ethidium 107 00:09:35,100 --> 00:09:39,000 bromide. But its property is it doesn't fluoresce when it's just in 108 00:09:39,100 --> 00:09:43,000 solution. But it's a flat molecule, and it can intercalate in between 109 00:09:43,100 --> 00:09:46,000 the base pairs in DNA. You have all those stacked base 110 00:09:46,100 --> 00:09:50,000 pairs going down a helix. This molecule's flat, and it likes to 111 00:09:50,100 --> 00:09:54,000 slip inside. And now it's a much more hydrophobic environment. 112 00:09:54,100 --> 00:09:58,000 It's hidden from the water, and it becomes fluorescent. And so, 113 00:09:58,100 --> 00:10:01,000 DNA that's soaked up this dye then will fluoresce when you put a UV 114 00:10:01,100 --> 00:10:05,000 light on it. So if I take the gel out of there after I've run it, 115 00:10:05,100 --> 00:10:09,000 and soak it in this dye and then shine a little handheld UV light on 116 00:10:09,100 --> 00:10:13,000 it, it would look something like that if I photographed it. 117 00:10:13,100 --> 00:10:16,000 And so, you would end up with those patterns that look exactly like that. 118 00:10:16,100 --> 00:10:20,000 Oops, I guess I took the other one out. But you can, 119 00:10:20,100 --> 00:10:23,000 of course, depending on how complicated it is, 120 00:10:23,100 --> 00:10:27,000 you could have a lot of different fragments. OK, 121 00:10:27,100 --> 00:10:30,000 so the next big thing that had to happen in order for us to really 122 00:10:30,100 --> 00:10:34,000 move to where we are in today's molecular biology was somehow, 123 00:10:34,100 --> 00:10:38,000 DNA had to be sequenced. And as I say, when I was an 124 00:10:38,100 --> 00:10:43,000 undergrad, or even when I was just about to start, 125 00:10:43,100 --> 00:10:47,000 when I was a postdoc anyway, again it just seemed like, how would 126 00:10:47,100 --> 00:10:52,000 you ever do it? 
Because every nucleotide was joined 127 00:10:52,100 --> 00:10:57,000 by a phosphodiester bond. The only difference was the base 128 00:10:57,100 --> 00:11:02,000 that was there. It seemed very, very difficult. 129 00:11:02,100 --> 00:11:06,000 It was hard to imagine you would ever be able to sort out the 130 00:11:06,100 --> 00:11:10,000 sequence of a billion base pairs. Of course, you could clone. Now 131 00:11:10,100 --> 00:11:14,000 you've got maybe a fragment of DNA that's a couple hundred base pairs 132 00:11:14,100 --> 00:11:18,000 long, and at least the problem becomes smaller. 133 00:11:18,100 --> 00:11:22,000 Maybe you could work it out. Now, there were a couple of 134 00:11:22,100 --> 00:11:26,000 different ways of doing it. One was by Wally Gilbert, who's up 135 00:11:26,100 --> 00:11:30,000 at Harvard, who got half the Nobel Prize for doing this. 136 00:11:30,100 --> 00:11:34,000 The other principle, the other one that's proved to be most generally 137 00:11:34,100 --> 00:11:39,000 useful, is from Fred Sanger in England. And he and Wally shared the Nobel 138 00:11:39,100 --> 00:11:43,000 Prize for discovering sequencing. And the principle was disarmingly 139 00:11:43,100 --> 00:11:48,000 simple. I think it's one of these great ideas that you look back at 140 00:11:48,100 --> 00:11:52,000 afterward and think, I could have thought of that. 141 00:11:52,100 --> 00:11:57,000 You guys already know everything you need to invent how to sequence 142 00:11:57,100 --> 00:12:02,000 DNA. I've told you all the stuff already. 143 00:12:02,100 --> 00:12:07,000 But nobody's come down to tell me that you've got it. 144 00:12:07,100 --> 00:12:13,000 And I didn't think of it. So here is the principle. 
What 145 00:12:13,100 --> 00:12:18,000 we've talked about, if we take a DNA polymerase plus the 146 00:12:18,100 --> 00:12:24,000 four deoxynucleoside triphosphates, remember we talked about 147 00:12:24,100 --> 00:12:29,000 the deoxyribonucleotides, deoxyadenosine triphosphate, 148 00:12:29,100 --> 00:12:35,000 and so on. There's four different ones. 149 00:12:35,100 --> 00:12:42,000 And we take a primer. And there's a three prime hydroxyl 150 00:12:42,100 --> 00:12:49,000 right there. And so this other strand is going the opposite 151 00:12:49,100 --> 00:12:56,000 direction. If we add that, I think you all know what's going to 152 00:12:56,100 --> 00:13:03,000 happen. We're going to get an extension to the other end. 153 00:13:03,100 --> 00:13:07,000 And what happens every time we add a nucleotide is that three prime 154 00:13:07,100 --> 00:13:12,000 hydroxyl attacks the phosphate of the triphosphate. 155 00:13:12,100 --> 00:13:16,000 We lose two of the phosphates. This is called pyrophosphate, and 156 00:13:16,100 --> 00:13:21,000 we've created a new five to three prime linkage. 157 00:13:21,100 --> 00:13:26,000 That gives us a new three prime hydroxyl, and we repeat the process, 158 00:13:26,100 --> 00:13:31,000 right? That's what we talked about. So, what would happen, 159 00:13:31,100 --> 00:13:46,000 let's spike in a little, let me do it. It's a little dideoxy 160 00:13:46,100 --> 00:14:01,000 TTP. So this is dideoxy. But what would we mean by that? 161 00:14:01,100 --> 00:14:07,000 Well, remember where the deoxy came from? 162 00:14:07,100 --> 00:14:14,000 The ribose at the two prime position has a hydrogen instead of a 163 00:14:14,100 --> 00:14:20,000 hydroxyl, and at the three prime position it has a hydroxyl. 164 00:14:20,100 --> 00:14:27,000 If we made a dideoxy, what we'd do is make that a hydrogen as well. What could 165 00:14:27,100 --> 00:14:33,000 that nucleotide do? 
Well, as long as the polymerase 166 00:14:33,100 --> 00:14:39,000 thought it was useful it would use this end, it would have its 167 00:14:39,100 --> 00:14:45,000 triphosphate up here. So, somebody else's three prime OH 168 00:14:45,100 --> 00:14:51,000 could come down and form a bond to here and we'd lose this. 169 00:14:51,100 --> 00:14:57,000 So it could get incorporated. But with no three prime hydroxyl, that chain is finished. It can't 170 00:14:57,100 --> 00:15:03,000 be elongated anymore. So, let's think what would happen if 171 00:15:03,100 --> 00:15:09,000 we had, let me stretch this out a little bit here, 172 00:15:09,100 --> 00:15:16,000 and let's imagine we had a few A's in the sequence. 173 00:15:16,100 --> 00:15:22,000 So, we are just going to spike it a bit. So, most of the things will 174 00:15:22,100 --> 00:15:28,000 not see a dideoxy. So, this primer will put, 175 00:15:28,100 --> 00:15:34,000 we'll try elongating this. So when we get to this point, 176 00:15:34,100 --> 00:15:39,000 at this point many of them will put in an ordinary T, 177 00:15:39,100 --> 00:15:45,000 but a few will put in a dideoxy. And those will finish. At that 178 00:15:45,100 --> 00:15:50,000 point, they can't go any farther. The rest of them keep going, adding the 179 00:15:50,100 --> 00:15:55,000 various nucleotides. When we get to the next A, 180 00:15:55,100 --> 00:16:00,000 most of them will put in a good T, but the ones that put in a dideoxy 181 00:16:00,100 --> 00:16:06,000 will stop, and they will generate a fragment that looks like that. 182 00:16:06,100 --> 00:16:11,000 You get the idea. Out of this reaction, 183 00:16:11,100 --> 00:16:17,000 we are going to get a set of fragments. And each one terminates 184 00:16:17,100 --> 00:16:23,000 where there was an A up there. Now, in this newer emulation of 185 00:16:23,100 --> 00:16:29,000 this thing, we have a T. And the trick is to put a dye that 186 00:16:29,100 --> 00:16:35,000 you can attach to this nucleotide, so it has a particular color. 
187 00:16:35,100 --> 00:16:41,000 So, suppose we had something that was yellow. Then this particular 188 00:16:41,100 --> 00:16:47,000 set of fragments would be yellow. And maybe you can begin to see what 189 00:16:47,100 --> 00:16:54,000 would happen now. If we did the same game three more 190 00:16:54,100 --> 00:17:00,000 times, each time using a different dideoxy, next time maybe 191 00:17:00,100 --> 00:17:06,000 we'll use dideoxy A. And we'll put a different colored 192 00:17:06,099 --> 00:17:11,000 dye on it. Then every time, in this case, we come to a T in the 193 00:17:11,099 --> 00:17:16,000 template, it would stop, and we'd get a little fragment 194 00:17:16,099 --> 00:17:21,000 that's stopped because it incorporated a dideoxy A, 195 00:17:21,099 --> 00:17:26,000 and those would be, let's say, green. So, by the end of this, 196 00:17:26,099 --> 00:17:31,000 we would have all possible fragments if we mixed them all together, and 197 00:17:31,100 --> 00:17:37,000 the last nucleotide on each fragment would say who it was by its color. 198 00:17:37,100 --> 00:17:40,000 So, if you were to, then, take this whole mixture of DNA 199 00:17:40,100 --> 00:17:43,000 fragments and run them down a gel, in this case it's a different kind 200 00:17:43,100 --> 00:17:46,000 of gel, a polyacrylamide gel, because you're trying to get it to 201 00:17:46,100 --> 00:17:50,000 separate smaller fragments. You could sort of see what would 202 00:17:50,100 --> 00:17:53,000 happen. The big ones would be at the top. The small ones would be at 203 00:17:53,100 --> 00:17:56,000 the bottom. And you'd see each band would have a different color 204 00:17:56,100 --> 00:18:00,000 depending on the dideoxy that terminated its chain. 205 00:18:00,100 --> 00:18:05,000 So, if you had a little scanner that just goes along, 206 00:18:05,100 --> 00:18:10,000 it can read this, and it will print out something. 207 00:18:10,100 --> 00:18:15,000 And these are always slightly idealized. 
This is a real one. 208 00:18:15,100 --> 00:18:20,000 But this is the sort of stuff you get back. If you send a piece of 209 00:18:20,100 --> 00:18:25,000 DNA over to a sequencing center, they'd send this back as a file or 210 00:18:25,100 --> 00:18:30,000 something. And you'd sit there. And it's very good these days. 211 00:18:30,100 --> 00:18:34,000 The technology wasn't as good, but they can almost always now get 212 00:18:34,100 --> 00:18:38,000 the sequence. Occasionally, you'll get something like a run of 213 00:18:38,100 --> 00:18:42,000 G's that gets a little hard, but what they'll do is they'll do 214 00:18:42,100 --> 00:18:46,000 what they call sequencing both strands. You can see, this way 215 00:18:46,100 --> 00:18:50,000 we're really only looking at the information of one strand. 216 00:18:50,100 --> 00:18:54,000 So if we took the other strand, and 217 00:18:54,100 --> 00:18:58,000 we did the same thing, we should get the complementary 218 00:18:58,100 --> 00:19:02,000 piece of information. So, what this DNA sequencing allows 219 00:19:02,100 --> 00:19:06,000 you to do, then, is determine the exact sequence of 220 00:19:06,100 --> 00:19:10,000 nucleotides in some kind of piece. And much of the art of the rest of 221 00:19:10,100 --> 00:19:13,000 it then becomes, how do you assemble all of those 222 00:19:13,100 --> 00:19:16,000 things together? In the case of a bacterium or 223 00:19:16,100 --> 00:19:19,000 something, it wasn't so bad because its DNA was small enough. 224 00:19:19,100 --> 00:19:23,000 You could cut it into a bunch of sort of big fragments, 225 00:19:23,100 --> 00:19:26,000 and then take each one of those, and then the sorting problem was 226 00:19:26,100 --> 00:19:29,000 relatively simple. In the case of something like 227 00:19:29,100 --> 00:19:32,000 humans, it was really complicated because there was 228 00:19:32,100 --> 00:19:35,000 so much more DNA. 
And the other thing is higher 229 00:19:35,100 --> 00:19:39,000 organisms such as yourselves have a lot of repeated DNA. 230 00:19:39,100 --> 00:19:42,000 It's just the same sequence, and sometimes there's quite a bit of 231 00:19:42,100 --> 00:19:45,000 it, a bunch of repeats. And so, if you see that at the end 232 00:19:45,100 --> 00:19:49,000 of your thing, you don't really quite know where 233 00:19:49,100 --> 00:19:52,000 you are in the genome. So a lot of other tricks had to be 234 00:19:52,100 --> 00:19:55,000 brought into play, including knowledge of the human 235 00:19:55,100 --> 00:19:59,000 genetic map. And so you could get yourself anchored at various places 236 00:19:59,100 --> 00:20:02,000 because you knew this particular piece of DNA, because it was 237 00:20:02,100 --> 00:20:06,000 associated with some gene, had to be here on the chromosome. 238 00:20:06,100 --> 00:20:10,000 And therefore, things right beside it 239 00:20:10,100 --> 00:20:15,000 were there on the chromosome. And there were a whole lot of 240 00:20:15,100 --> 00:20:20,000 tricks to putting it together. But the very basic principle of how 241 00:20:20,100 --> 00:20:25,000 we sequence DNA has at its heart the same process that I was talking to 242 00:20:25,100 --> 00:20:30,000 you about when we were doing DNA replication, except in this case 243 00:20:30,100 --> 00:20:35,000 it's just used in a very clever way. And that was an amazing idea. 244 00:20:35,100 --> 00:20:39,000 It got a Nobel Prize, and you've been sitting here for the 245 00:20:39,100 --> 00:20:43,000 last month with all the knowledge to do it. I kept emphasizing that 246 00:20:43,100 --> 00:20:47,000 you've got to have that three prime hydroxyl. But with some of the great 247 00:20:47,100 --> 00:20:51,000 ideas, often when you look back you can see the hurdle was kind 248 00:20:51,100 --> 00:20:55,000 of small. 
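The dideoxy chain-termination idea can be sketched in a few lines. This is an idealized illustration, not the lab protocol: the function names and the example template are made up, every position is assumed to yield a terminated fragment, and only the yellow (dideoxy T) and green (dideoxy A) dye colors come from the lecture, the other two are assumptions.

```python
# Idealized sketch of dideoxy chain-termination sequencing.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Yellow for T and green for A follow the lecture; blue and red are assumed.
DYES = {"T": "yellow", "A": "green", "G": "blue", "C": "red"}


def run_reactions(template):
    """Return (length, terminal_base) for every chain-terminated fragment.

    The template is written 3'->5', so the new strand grows left to right.
    One pass per dideoxy nucleotide mimics the four separate spikes.
    """
    new_strand = [COMPLEMENT[base] for base in template]
    fragments = []
    for dideoxy in "ACGT":
        for length, base in enumerate(new_strand, start=1):
            if base == dideoxy:
                # A chain that took up the dideoxy here stops at this length.
                fragments.append((length, base))
    return fragments


def read_gel(fragments):
    """Read bases from the shortest fragment to the longest (5'->3')."""
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    fragments = run_reactions("TACGGATC")  # hypothetical template, 3'->5'
    for length, base in sorted(fragments):
        print(length, base, DYES[base])
    print(read_gel(fragments))
```

Sorting by length plays the role of the polyacrylamide gel, and the terminal base plays the role of the dye color the scanner reads off each band.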
And they didn't even have to do this with dyes at the 249 00:20:55,100 --> 00:20:59,000 beginning. In fact, that was a later innovation. 250 00:20:59,100 --> 00:21:03,000 The key thing was just the dideoxies stopping in each place. 251 00:21:03,100 --> 00:21:11,000 I was lucky enough to live through some of this, the development of 252 00:21:11,100 --> 00:21:20,000 this technology. OK, so I've got one more really big 253 00:21:20,100 --> 00:21:28,000 thing to tell you, which again was extraordinarily 254 00:21:28,100 --> 00:21:37,000 clever, but extraordinarily simple once you heard about it. 255 00:21:37,100 --> 00:21:44,000 And it was one more technological advance. It wasn't a big insight 256 00:21:44,100 --> 00:21:51,000 into biology in and of itself, but it was a technology that opened 257 00:21:51,100 --> 00:21:58,000 up just incredible experimental possibilities. 258 00:21:58,100 --> 00:22:05,000 And it's something known as the polymerase chain reaction. 259 00:22:05,100 --> 00:22:10,000 And this allows, in principle, someone like me to go 260 00:22:10,100 --> 00:22:15,000 and grab a single cell from you, take its DNA, and get a copy of 261 00:22:15,100 --> 00:22:20,000 any gene I want from your genome. And I can look and see whether you 262 00:22:20,100 --> 00:22:26,000 have any mutations in that gene, or whether there are different 263 00:22:26,100 --> 00:22:31,000 polymorphic alleles in the population, and which one you got 264 00:22:31,100 --> 00:22:37,000 from your mom, or which one you got from your dad. 265 00:22:37,100 --> 00:22:41,000 So, starting from a single DNA molecule, I can make as much as I 266 00:22:41,100 --> 00:22:45,000 want. And this is just like DNA sequencing. You guys already know 267 00:22:45,100 --> 00:22:49,000 everything you need to know to invent this technique as well. 268 00:22:49,100 --> 00:22:54,000 It has very much that same property. 
It's another one of these very 269 00:22:54,100 --> 00:22:58,000 brilliant insights where you just had to put things in the right place. 270 00:22:58,100 --> 00:23:03,000 So let me explain the principle. 271 00:23:03,100 --> 00:23:08,000 So, suppose there's a gene where I know 272 00:23:08,100 --> 00:23:13,000 there's a family history of something, and I would like to know, 273 00:23:13,100 --> 00:23:18,000 did I happen to get the allele that carries that? Or did I get the one 274 00:23:18,100 --> 00:23:23,000 that didn't? So, in principle what I would like to do 275 00:23:23,100 --> 00:23:28,000 is to get a hold of the piece of DNA for that gene from my own cells. 276 00:23:28,100 --> 00:23:33,000 But all I've started with is my entire DNA. 277 00:23:33,100 --> 00:23:36,000 Well, I could clone it. I could make a recombinant library. 278 00:23:36,100 --> 00:23:40,000 I could do everything else. But there's this other simple way. 279 00:23:40,100 --> 00:23:44,000 And what it involves 280 00:23:44,100 --> 00:23:48,000 is, since I know the sequence of the genome now, I know that almost 281 00:23:48,100 --> 00:23:52,000 everything is going to be the same. There will be little differences 282 00:23:52,100 --> 00:23:56,000 between individuals. I'll make a little primer that 283 00:23:56,100 --> 00:24:00,000 corresponds to the sequence at one end of the gene, 284 00:24:00,100 --> 00:24:04,000 and another primer that corresponds to the DNA at the other end of the 285 00:24:04,100 --> 00:24:08,000 gene, or whatever fragment I want to use. 286 00:24:08,100 --> 00:24:12,000 And that's all I have to do in terms of getting anything made. 
287 00:24:12,100 --> 00:24:17,000 Now the rest, we are just going to play games with DNA, 288 00:24:17,100 --> 00:24:22,000 with DNA polymerase, and nucleoside triphosphates, 289 00:24:22,100 --> 00:24:27,000 just all the stuff I dragged you through talking about DNA 290 00:24:27,100 --> 00:24:32,000 replication. So here's the idea. So here's my DNA, let's say, or 291 00:24:32,100 --> 00:24:37,000 part of it. If I could actually see the sequence, 292 00:24:37,100 --> 00:24:41,000 I would know, let's say, the gene I'm interested in is in 293 00:24:41,100 --> 00:24:45,000 here. So, what I would do is make a little primer. 294 00:24:45,100 --> 00:24:50,000 It just has to be long enough to confer specificity. For something like 295 00:24:50,100 --> 00:24:54,000 humans, if I make something probably 30 nucleotides long, 296 00:24:54,100 --> 00:24:58,000 that's enough. It'll only bind one place in the DNA. 297 00:24:58,100 --> 00:25:03,000 And I make one, let's say, for the opposite strand over here. 298 00:25:03,100 --> 00:25:15,000 So remember, this is five prime to three prime, five prime to three 299 00:25:15,100 --> 00:25:27,000 prime. So the principle is we'll heat to 95°C, and that will denature the DNA. 300 00:25:27,100 --> 00:25:39,000 And we'll add an excess of the two primers. 301 00:25:39,100 --> 00:25:46,000 And let's say we'll cool to 55°C, or something. And we'll cool it 302 00:25:46,100 --> 00:25:53,000 down enough so that we can get the primers on. But we are not going to 303 00:25:53,100 --> 00:26:00,000 go all the way and let all the strands find their way back. 304 00:26:00,100 --> 00:26:08,000 And we'll add a DNA polymerase plus the four deoxynucleoside 305 00:26:08,100 --> 00:26:15,000 triphosphates. Well, what will happen? 306 00:26:15,100 --> 00:26:23,000 Well, here's one of the strands. And we'll prime it here, let's say. 307 00:26:23,100 --> 00:26:31,000 So, it will copy down here and go as far as it can go. 
308 00:26:31,100 --> 00:26:38,000 And the other one starts here. And it's going to go down all that 309 00:26:38,100 --> 00:26:44,000 way. Let's just repeat the whole process now, OK? 310 00:26:44,100 --> 00:26:50,000 What'll happen? Now when we pull them apart, we ought to have four 311 00:26:50,100 --> 00:26:56,000 strands. We'll have the original ones here, and when I repeat the 312 00:26:56,100 --> 00:27:02,000 process, the same thing's going to happen again. 313 00:27:02,100 --> 00:27:09,000 This one will go here, and it will copy out. This one will 314 00:27:09,100 --> 00:27:17,000 go here. It will copy out. But what about this guy? So, 315 00:27:17,100 --> 00:27:24,000 this one becomes this one here. So the primer binds here, the polymerase will 316 00:27:24,100 --> 00:27:32,000 copy it, and it can't go any further. 317 00:27:32,100 --> 00:27:37,000 I just generated a piece that's exactly what I wanted. 318 00:27:37,100 --> 00:27:43,000 And the same deal here, as long as I don't get lost, 319 00:27:43,100 --> 00:27:48,000 which, what did I do? So, we've got this guy here. 320 00:27:48,100 --> 00:27:54,000 So, it starts there. So this one becomes this one, 321 00:27:54,100 --> 00:27:59,000 and we'll prime it here. It'll go along and it will stop. 322 00:27:59,100 --> 00:28:05,000 So there's the complementary strand to the one here. 323 00:28:05,100 --> 00:28:09,000 And I think this is sort of like doing a math problem. 324 00:28:09,100 --> 00:28:14,000 You can just look at it and say, well, maybe you will get it. But 325 00:28:14,100 --> 00:28:19,000 there's nothing like sitting down with a pencil and paper, 326 00:28:19,100 --> 00:28:24,000 and taking yourself through several cycles. What you won't believe is 327 00:28:24,100 --> 00:28:29,000 how quickly you get to it being nothing but, almost nothing 328 00:28:29,100 --> 00:28:34,000 but, the sequence that you are trying to amplify. 329 00:28:34,100 --> 00:28:38,000 And so, this again has an astonishing effect. 
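The pencil-and-paper exercise of walking through several PCR cycles can also be done as a short bookkeeping sketch. This is a simplified model under stated assumptions: every strand is copied perfectly in every cycle, and the function name and the three strand categories are my own labels, not terms from the lecture. Copies of an original strand start at a primer but run on past the far end; copies of those are bounded at both ends, which is exactly the target piece.

```python
# Idealized PCR bookkeeping, assuming every strand is copied each cycle.

def pcr_strand_counts(cycles, originals=2):
    """Return (originals, one_end_defined, targets) after the given cycles.

    originals: the two strands of the starting genomic DNA.
    one_end_defined: copies of an original strand; they begin at a primer
        but extend beyond the other primer site.
    targets: strands bounded by a primer at both ends, the desired product.
    """
    one_end_defined = 0
    targets = 0
    for _ in range(cycles):
        # Copying an original always gives a strand defined at one end only.
        new_one_end = originals
        # Copying a one-end-defined strand or a target stops at the
        # template's primer end, so the copy is a full target strand.
        new_targets = one_end_defined + targets
        one_end_defined += new_one_end
        targets += new_targets
    return originals, one_end_defined, targets


if __name__ == "__main__":
    for n in (1, 2, 3, 10, 30):
        originals, one_end, targets = pcr_strand_counts(n)
        total = originals + one_end + targets
        print(n, one_end, targets, round(targets / total, 6))
```

The one-end-defined strands grow only linearly while the targets roughly double each cycle, which is why after a few dozen cycles the mixture is almost nothing but the amplified sequence.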
330 00:28:38,100 --> 00:28:42,000 This is why you hear about DNA testing all the time in forensics, 331 00:28:42,100 --> 00:28:46,000 because you can take a tiny bit of DNA from saliva, 332 00:28:46,100 --> 00:28:50,000 or semen, or blood, or whatever they might find at a 333 00:28:50,100 --> 00:28:54,000 crime scene, and then they can amplify little pieces 334 00:28:54,100 --> 00:28:58,000 and they compare. And there's a trick they use in 335 00:28:58,100 --> 00:29:03,000 forensics, and that is that there are sequences within the human 336 00:29:03,100 --> 00:29:08,000 genome where there are little variable repeats like GT, 337 00:29:08,100 --> 00:29:13,000 GT, GT, GT, GT, GT, and I might have 14 of them in one 338 00:29:13,100 --> 00:29:18,000 of my chromosomes. The one I got from my mom might 339 00:29:18,100 --> 00:29:23,000 have 40. You might have 24 and something else, 340 00:29:23,100 --> 00:29:28,000 and so on. If you were to do PCR around a little region that was 341 00:29:28,100 --> 00:29:33,000 known to be variable, if you had 14 repeats you'd get a 342 00:29:33,100 --> 00:29:38,000 shorter fragment. And if you had 40 repeats, 343 00:29:38,100 --> 00:29:42,000 you'd get a longer fragment. So, I'll come back to that in a sec. 344 00:29:42,100 --> 00:29:47,000 So, if you were to, for example, take something with a long repeat 345 00:29:47,100 --> 00:29:52,000 and a short repeat into this kind of thing, we'd get two fragments, 346 00:29:52,100 --> 00:29:56,000 say, one from the paternal. And if you do this with several such sites 347 00:29:56,100 --> 00:30:01,000 around the genome, pretty soon you run into situations 348 00:30:01,100 --> 00:30:05,000 where the odds of a particular combination of a long one at this 349 00:30:05,100 --> 00:30:10,000 site, a short one at that site, and so on, make it statistically 350 00:30:10,100 --> 00:30:15,000 improbable that it's anyone other than yourself. 
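The two pieces of arithmetic behind the repeat-length trick can be sketched briefly. All specifics here are assumptions for illustration: the function names, the 100-base flanking length, and the genotype frequencies are hypothetical, and the loci are assumed to be statistically independent so that the frequencies can simply be multiplied.

```python
# Sketch of repeat-length typing arithmetic; all numbers are hypothetical.
from math import prod


def amplicon_length(repeats, unit_len=2, flank_len=100):
    """Length of the PCR product around a repeat region.

    unit_len is 2 for a GT repeat; flank_len is the assumed total length of
    the non-repeat sequence between the two primers, primers included.
    """
    return flank_len + repeats * unit_len


def combined_match_probability(genotype_frequencies):
    """Chance that a random person shares the genotype at every locus,
    assuming the loci are independent of one another."""
    return prod(genotype_frequencies)


if __name__ == "__main__":
    # The 14-repeat and 40-repeat alleles from the lecture give two bands.
    print(amplicon_length(14), amplicon_length(40))
    # Three loci with made-up population frequencies for one genotype each.
    print(combined_match_probability([0.1, 0.05, 0.02]))
```

Each additional locus multiplies in another small number, which is the sense in which a particular combination quickly becomes improbable for anyone else.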
351 00:30:15,100 --> 00:30:18,000 So at a crime scene, if they did this, they, 352 00:30:18,100 --> 00:30:21,000 for example, might have three individuals that they were thinking 353 00:30:21,100 --> 00:30:24,000 were possible. And they'd generate patterns like this, 354 00:30:24,100 --> 00:30:28,000 say, using three different loci like this, and then have the 355 00:30:28,100 --> 00:30:31,000 forensic sample. And it was pretty evident who didn't 356 00:30:31,100 --> 00:30:35,000 do it, and who at least remains a suspect. This probably would 357 00:30:35,100 --> 00:30:39,000 improve it. The very last thing, just to close us off, is when people 358 00:30:39,100 --> 00:30:42,000 developed this PCR technique, you had to sit there with your 359 00:30:42,100 --> 00:30:46,000 pipette because every time you raised it to 90° to denature the DNA 360 00:30:46,100 --> 00:30:50,000 you killed your enzyme. So, you'd cool it down to 55, 361 00:30:50,100 --> 00:30:53,000 squirt in new DNA polymerase. And then someone finally said, 362 00:30:53,100 --> 00:30:57,000 another brilliant idea, what if I had a thermoresistant 363 00:30:57,100 --> 00:31:01,000 polymerase? Where would I find those? 364 00:31:01,100 --> 00:31:04,000 Well, Penny was talking to you about those vents where it's really, 365 00:31:04,100 --> 00:31:08,000 really hot, and those black smokers and everything, 366 00:31:08,100 --> 00:31:11,000 so maybe you get a bacterium from there. It would have a temperature 367 00:31:11,100 --> 00:31:15,000 resistant polymerase. So here you are, from the New 368 00:31:15,100 --> 00:31:19,000 England Biolabs catalog, Vent (exo-) DNA polymerase, 369 00:31:19,100 --> 00:31:22,000 Deep Vent DNA polymerase. People went to grab those bacteria from 370 00:31:22,100 --> 00:31:26,000 there, grabbed the DNA polymerase gene. And now, 371 00:31:26,100 --> 00:31:30,000 the DNA polymerase just sits there. It just laughs when you bring it up 372 00:31:30,100 --> 00:31:34,000 to 90°. 
And when you cool it back down and 373 00:31:34,100 --> 00:31:38,000 give it a substrate again, it will do its thing. And so, 374 00:31:38,100 --> 00:31:43,000 this whole thing can be done automatically and you don't have to sit 375 00:31:43,100 --> 00:31:47,000 there and pipette something in at the end of every run, 376 00:31:47,100 --> 00:31:52,000 another little cute sort of engineering trick that combined 377 00:31:52,100 --> 00:31:55,000 ecology together with biology. OK, see you on Friday.