1 00:00:01,000 --> 00:00:05,000 Good morning. So, 2 00:00:05,000 --> 00:00:09,000 we are going to see if my voice holds up through this lecture today. 3 00:00:09,000 --> 00:00:13,000 It is a casualty of having been at Foxborough yesterday, 4 00:00:13,000 --> 00:00:17,000 and then staying up rather late watching the Red Sox game. 5 00:00:17,000 --> 00:00:21,000 On the whole, both seemed to have come through successfully, 6 00:00:21,000 --> 00:00:25,000 but my voice is a bit of a casualty of the events. 7 00:00:25,000 --> 00:00:29,000 So, we'll see. But I'm going to sound a lot 8 00:00:29,000 --> 00:00:35,000 scratchier than normal. 9 00:00:35,000 --> 00:00:41,000 So, how many of you stayed up to the end of the game last night? 10 00:00:41,000 --> 00:00:49,000 Good, excellent. I approve. 11 00:00:49,000 --> 00:00:54,000 OK, last time, we spoke about the idea of cloning DNA, 12 00:00:54,000 --> 00:00:59,000 to create libraries of molecules. 13 00:00:59,000 --> 00:01:03,000 And again, I think this is just one of the most clever inventions 14 00:01:03,000 --> 00:01:07,000 because it's a completely new way to think about purifying molecules. 15 00:01:07,000 --> 00:01:11,000 Rather than purifying molecules, by separating them based on their 16 00:01:11,000 --> 00:01:15,000 biochemical properties, it's purifying molecules by diluting 17 00:01:15,000 --> 00:01:20,000 them into single components, and then amplifying each back up 18 00:01:20,000 --> 00:01:24,000 from its own source. It's really quite a beautiful idea. 19 00:01:24,000 --> 00:01:28,000 And just to go over it, we take, say, human DNA, 20 00:01:28,000 --> 00:01:32,000 or we could take Drosophila DNA, or we could take yeast DNA, or we 21 00:01:32,000 --> 00:01:37,000 could take any other DNA we feel like. 22 00:01:37,000 --> 00:01:42,000 We cut it up in some fashion with a restriction enzyme. 
23 00:01:42,000 --> 00:01:48,000 We'll use our favorite restriction enzyme here, EcoRI, 24 00:01:48,000 --> 00:01:54,000 which cuts at a defined site, GAATTC. We take that. We add our 25 00:01:54,000 --> 00:02:00,000 insert DNA. These are referred to as inserts because they're going to 26 00:02:00,000 --> 00:02:05,000 be inserted into a plasmid. We take a plasmid vector. 27 00:02:05,000 --> 00:02:11,000 The plasmid vector here is a naturally occurring, 28 00:02:11,000 --> 00:02:16,000 although sometimes modified, piece of DNA that bacteria have that 29 00:02:16,000 --> 00:02:22,000 contains an origin of replication that allows it to grow autonomously when 30 00:02:22,000 --> 00:02:28,000 put in a bacterial cell, and a selectable marker. 31 00:02:28,000 --> 00:02:32,000 The selectable marker, for example, ampicillin resistance, 32 00:02:32,000 --> 00:02:37,000 or some other resistance, we add these and then we seal up the pieces 33 00:02:37,000 --> 00:02:42,000 of the DNA using the enzyme ligase. Ligase joins and joins producing 34 00:02:42,000 --> 00:02:47,000 for us molecules of this sort. We make zillions of them in 35 00:02:47,000 --> 00:02:52,000 parallel in one test tube. We then transform them by adding 36 00:02:52,000 --> 00:02:57,000 these molecules to bacterial cells that have been appropriately 37 00:02:57,000 --> 00:03:02,000 prepared to be transformed, that is, their membranes have been 38 00:03:02,000 --> 00:03:07,000 treated in such a way that they're going to be most likely to 39 00:03:07,000 --> 00:03:11,000 suck up pieces of DNA. We then plate them on a plate at a 40 00:03:11,000 --> 00:03:15,000 density so that individual bacterial cells are well separated from each 41 00:03:15,000 --> 00:03:19,000 other. You try a bunch of different densities so you get one right. 42 00:03:19,000 --> 00:03:23,000 And, you let them grow up. 
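To make the cutting step concrete, here is a minimal sketch of what cutting with the restriction enzyme does to a sequence, treating one DNA strand as a plain string. The toy sequence and the function names (`find_sites`, `digest`) are made up for illustration. The sketch follows only the strand shown and ignores the staggered cut on the opposite strand that leaves the sticky ends ligase later joins.

```python
# Illustrative sketch only: one strand of DNA as a Python string.
RECOGNITION_SITE = "GAATTC"  # the site named in the lecture
CUT_OFFSET = 1               # the enzyme cuts between G and AATTC on this strand


def find_sites(seq, site=RECOGNITION_SITE):
    """Return the start index of every occurrence of the recognition site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site=RECOGNITION_SITE, offset=CUT_OFFSET):
    """Cut at every site (a complete digestion) and return the fragments in order."""
    cuts = [p + offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with two sites; real genomes have a great many.
toy_dna = "ATGGAATTCCGTTAGAATTCAA"
fragments = digest(toy_dna)
print(fragments)  # ['ATGG', 'AATTCCGTTAG', 'AATTCAA']
```

Each fragment between two cuts is a candidate insert for the plasmid vector.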
And, every colony here, as we discussed, 43 00:03:23,000 --> 00:03:27,000 is the descendant of a single bacterial cell, 44 00:03:27,000 --> 00:03:31,000 carrying ideally a single plasmid. 45 00:03:31,000 --> 00:03:35,000 And, that single plasmid, we know it's carrying a single 46 00:03:35,000 --> 00:03:39,000 plasmid because we were clever enough to put ampicillin or other 47 00:03:39,000 --> 00:03:44,000 selectable marker on this plate. And so, only bacteria that have 48 00:03:44,000 --> 00:03:48,000 picked up the plasmid are ampicillin resistant. And there you go. 49 00:03:48,000 --> 00:03:53,000 This is called a library. And, at the end of the day, you may have 50 00:03:53,000 --> 00:03:57,000 a library that contains one plate of clones or a library containing 51 00:03:57,000 --> 00:04:02,000 hundreds of plates of clones. We're going to see how we last 52 00:04:02,000 --> 00:04:08,000 through this. Now, a few people asked me at the end of 53 00:04:08,000 --> 00:04:13,000 the last lecture, well, OK, but what about the details. 54 00:04:13,000 --> 00:04:19,000 Is it really going to work like this? How come some of these 55 00:04:19,000 --> 00:04:24,000 plasmid molecules don't automatically get closed back up by 56 00:04:24,000 --> 00:04:30,000 ligase? Why is it that there's always an insert in the plasmid? 57 00:04:30,000 --> 00:04:34,000 What's the answer to that question? Sorry? There's not an answer 58 00:04:34,000 --> 00:04:38,000 because sometimes ligase might close up that molecule. 59 00:04:38,000 --> 00:04:42,000 Now, that would be unfortunate because it would mean that a bunch 60 00:04:42,000 --> 00:04:46,000 of the things in your library just had the vector without any insert. 61 00:04:46,000 --> 00:04:50,000 So, and these are details, but over the course of years, 62 00:04:50,000 --> 00:04:54,000 recombinant DNA specialists have worked out lots of cute tricks to 63 00:04:54,000 --> 00:04:58,000 make better and better libraries. 
I'll just give you an example of 64 00:04:58,000 --> 00:05:03,000 the kinds of things. Remember that in order to ligate DNA, 65 00:05:03,000 --> 00:05:09,000 we had a five prime here. We have a phosphate group here, 66 00:05:09,000 --> 00:05:16,000 three prime hydroxyl phosphate here, double strand of DNA here. We have 67 00:05:16,000 --> 00:05:23,000 a phosphate here. We have a hydroxyl here, 68 00:05:23,000 --> 00:05:30,000 phosphate five prime, three prime. 69 00:05:30,000 --> 00:05:37,000 If ligase is going to come along, it turns out that ligase needs the 70 00:05:37,000 --> 00:05:45,000 phosphate there in order to seal it up and make a chain. 71 00:05:45,000 --> 00:05:53,000 So, for example, suppose we were to arrange that the plasmid vector 72 00:05:53,000 --> 00:06:01,000 didn't have phosphates on its two ends. Then ligase would not be able 73 00:06:01,000 --> 00:06:06,000 to re-seal the plasmid vector. That's a cute trick. 74 00:06:06,000 --> 00:06:10,000 This is just cooking, but I'm giving you an idea of the 75 00:06:10,000 --> 00:06:13,000 kind of cooking tricks we use in all this. So, ideally, 76 00:06:13,000 --> 00:06:17,000 you would like an enzyme that can remove phosphate groups from the end 77 00:06:17,000 --> 00:06:21,000 of DNA. How are you going to invent such an enzyme? 78 00:06:21,000 --> 00:06:24,000 It already exists is the answer to all these questions. 79 00:06:24,000 --> 00:06:28,000 And, bacteria have such an enzyme that can remove phosphate groups. 80 00:06:28,000 --> 00:06:32,000 So, just remove phosphate groups. And of course these enzymes are 81 00:06:32,000 --> 00:06:36,000 developed by bacteria because they need them in the course of DNA 82 00:06:36,000 --> 00:06:41,000 metabolism. And, what do you think the enzyme is 83 00:06:41,000 --> 00:06:45,000 called? Phosphatase, of course. That's what happens, 84 00:06:45,000 --> 00:06:49,000 use phosphatase, and you treat that, and it doesn't seal back up. 
Now, 85 00:06:49,000 --> 00:06:54,000 somebody will say to me, well, OK, but now I've got my vector 86 00:06:54,000 --> 00:06:58,000 here, and I don't have a phosphate on it, and so this is 87 00:06:58,000 --> 00:07:03,000 my vector DNA. And then, I've got my insert DNA, 88 00:07:03,000 --> 00:07:08,000 and sorry, my insert DNA here, it has a hydroxyl here and a phosphate 89 00:07:08,000 --> 00:07:13,000 here. So, the vector has no phosphate. But, 90 00:07:13,000 --> 00:07:19,000 when ligase wants to attach an insert, it's got a phosphate here 91 00:07:19,000 --> 00:07:24,000 but not here. What's going to happen? Well, 92 00:07:24,000 --> 00:07:29,000 it turns out that ligase will seal up this because it's got a phosphate, 93 00:07:29,000 --> 00:07:34,000 but it'll leave this one open. Now, is that a problem? 94 00:07:34,000 --> 00:07:38,000 It turns out, if you just transform it into the bacteria with that hole 95 00:07:38,000 --> 00:07:42,000 there on one strand but not both strands, it's still a covalently 96 00:07:42,000 --> 00:07:47,000 closed circle on one of its strands. The bacteria will repair it. So, 97 00:07:47,000 --> 00:07:51,000 you can take advantage of the bacteria's own DNA repair mechanisms 98 00:07:51,000 --> 00:07:55,000 to just throw the molecule in sealed up on one strand and let its repair 99 00:07:55,000 --> 00:08:00,000 mechanism do the rest; all these tricks we play to our advantage. 100 00:08:00,000 --> 00:08:04,000 Someone else asked after class, what happens if the gene I'm 101 00:08:04,000 --> 00:08:08,000 interested in studying has, here's my gene let's say that I'm 102 00:08:08,000 --> 00:08:13,000 interested in studying. I take human DNA. I cut it with 103 00:08:13,000 --> 00:08:17,000 EcoRI. So, I have cut it at all the EcoRI sites. 104 00:08:17,000 --> 00:08:22,000 Well, golly, what happens if my gene happened to have an EcoRI site 105 00:08:22,000 --> 00:08:26,000 in it? Then my gene's going to be cut up into two pieces. 
106 00:08:26,000 --> 00:08:30,000 Isn't that bad? What do I do about that? 107 00:08:30,000 --> 00:08:34,000 Do I know in advance if my gene has an EcoRI site? Well, 108 00:08:34,000 --> 00:08:38,000 no, I don't, because I don't know what my gene is. 109 00:08:38,000 --> 00:08:42,000 I'm making a library of everything in the genome. 110 00:08:42,000 --> 00:08:46,000 So, some genes will have it, and some won't. And, I might not 111 00:08:46,000 --> 00:08:50,000 know the gene I'm looking for. So, how do I avoid that? Sorry? 112 00:08:50,000 --> 00:08:54,000 Oh, you've tried another enzyme. You've tried BamHI and HindIII, 113 00:08:54,000 --> 00:08:58,000 and make a library with different enzymes. That's one 114 00:08:58,000 --> 00:09:02,000 way. That works. Another way, just to give you a 115 00:09:02,000 --> 00:09:06,000 sense of how fast molecular biologists are with this. 116 00:09:06,000 --> 00:09:10,000 Suppose when we add EcoRI we don't let the reaction go to 117 00:09:10,000 --> 00:09:14,000 completion. Suppose we run the reaction under conditions where it's 118 00:09:14,000 --> 00:09:18,000 somewhat inefficient, and instead of managing to cleave 119 00:09:18,000 --> 00:09:22,000 every EcoRI site, on average it cleaves, 120 00:09:22,000 --> 00:09:26,000 say, one out of every three EcoRI sites. You can do that. 121 00:09:26,000 --> 00:09:30,000 So, that means you can arrange just by your reaction conditions to on 122 00:09:30,000 --> 00:09:35,000 average randomly cleave some but not others. 123 00:09:35,000 --> 00:09:38,000 And, these are called partial digestions. So, 124 00:09:38,000 --> 00:09:41,000 it turns out that all of the kinds of things that people were asking me 125 00:09:41,000 --> 00:09:44,000 about afterwards, I was very glad people were thinking 126 00:09:44,000 --> 00:09:47,000 about would this really work? 
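The partial digestion idea can be sketched as a small simulation. This is only an illustration under simplifying assumptions: each site is cut independently with the same probability (one in three, as in the lecture's example), DNA is a plain string, and the toy genome, the `gene` string, and the function name `partial_digest` are all hypothetical.

```python
import random


def partial_digest(seq, site="GAATTC", offset=1, cut_prob=1/3, rng=None):
    """Cut each recognition site independently with probability cut_prob."""
    rng = rng or random.Random()
    cuts = []
    start = seq.find(site)
    while start != -1:
        if rng.random() < cut_prob:      # this site happens to get cleaved
            cuts.append(start + offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy genome: the gene of interest has a site in its middle.
gene = "CCCGAATTCGGG"
toy_genome = "ATGAATTCAT" + gene + "TTGAATTCTA"

# Digest many copies of the genome, as happens in one test tube.
rng = random.Random(0)
trials = 1000
intact = sum(
    any(gene in fragment for fragment in partial_digest(toy_genome, rng=rng))
    for _ in range(trials)
)
print(f"{intact} of {trials} molecules kept the gene in one piece")
```

Under this model a molecule keeps the gene whole whenever the internal site is skipped, so roughly two thirds of the copies are expected to carry it intact, whereas a complete digestion (`cut_prob=1.0`) would always split it.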
There are tricks to get around all 127 00:09:47,000 --> 00:09:50,000 of it, and there's a whole fat book of protocols about if you want to 128 00:09:50,000 --> 00:09:54,000 make a library really, really carefully, how you would do 129 00:09:54,000 --> 00:09:57,000 that, how you make sure the vector doesn't re-close, 130 00:09:57,000 --> 00:10:00,000 how you make sure that you don't cut every site but random sites, 131 00:10:00,000 --> 00:10:04,000 and things like that. And, all of these rely on lots of 132 00:10:04,000 --> 00:10:08,000 enzymes and things that bacteria have already invented. 133 00:10:08,000 --> 00:10:12,000 So, I'm just going to put these down as cooking tips. 134 00:10:12,000 --> 00:10:16,000 These are not really necessarily, I don't care whether you know the 135 00:10:16,000 --> 00:10:20,000 details or not, rather that there exists a whole 15 136 00:10:20,000 --> 00:10:24,000 years, 20 years worth of ways to make the best possible libraries. 137 00:10:24,000 --> 00:10:28,000 And so, it's quite routine now to be able to make good libraries. 138 00:10:28,000 --> 00:10:34,000 All right, so, having made a library, 139 00:10:34,000 --> 00:10:40,000 the challenge is finding your clone. How to find your clone, the clone 140 00:10:40,000 --> 00:10:46,000 of interest. So, I need to describe a number of ways 141 00:10:46,000 --> 00:10:52,000 that people have for finding a clone of interest. And here, 142 00:10:52,000 --> 00:10:58,000 of course, up to this point, the DNA could be zebra DNA, and it 143 00:10:58,000 --> 00:11:04,000 could be human DNA and yeast DNA, and it could be something that is an 144 00:11:04,000 --> 00:11:11,000 enzyme for arginine, or this, or that. 145 00:11:11,000 --> 00:11:18,000 But now we have to be specific. So, let's suppose we go back to a 146 00:11:18,000 --> 00:11:25,000 problem we talked about before about, say, auxotrophy for a nutrient. 
147 00:11:25,000 --> 00:11:32,000 So, let's suppose that I have a bacterium, maybe even E. coli itself, 148 00:11:32,000 --> 00:11:40,000 where I have selected mutants that are auxotrophic for arginine. 149 00:11:40,000 --> 00:11:52,000 So, arginine auxotrophs will grow on rich medium, but on minimal medium 150 00:11:52,000 --> 00:12:00,000 they don't grow. But, they would grow if I added 151 00:12:00,000 --> 00:12:04,000 arginine to that medium. They don't grow because they have a 152 00:12:04,000 --> 00:12:09,000 mutation in a gene. We know it's a gene because we 153 00:12:09,000 --> 00:12:13,000 crossed together the mutant and the wild type. We show that we can 154 00:12:13,000 --> 00:12:18,000 define this phenotype to be a recessive phenotype. 155 00:12:18,000 --> 00:12:22,000 We can map it in the yeast genome by showing it has linkage to other 156 00:12:22,000 --> 00:12:27,000 phenotypes. That's all great. We can do classical genetics, a la 157 00:12:27,000 --> 00:12:32,000 Mendel, a la Morgan, a la Sturtevant. 158 00:12:32,000 --> 00:12:37,000 But, how are we going to find the gene? How are we going to, 159 00:12:37,000 --> 00:12:42,000 now, use our tools of recombinant DNA to get physically in our hand 160 00:12:42,000 --> 00:12:47,000 the piece of DNA that encodes the gene that is defective in the strain? 161 00:12:47,000 --> 00:12:52,000 So, I have a mutant bacterium. It can't make arginine. It can't 162 00:12:52,000 --> 00:12:57,000 grow in minimal medium. Somewhere in there, you know 163 00:12:57,000 --> 00:13:02,000 there's a mutation in the DNA sequence. 164 00:13:02,000 --> 00:13:07,000 How do we find it? What should we do? 165 00:13:07,000 --> 00:13:13,000 This is the whole point of recombinant DNA, 166 00:13:13,000 --> 00:13:18,000 to make this abstract notion of, there exist genes, they transmit 167 00:13:18,000 --> 00:13:24,000 all this kind of stuff, concrete. How are you going to find 168 00:13:24,000 --> 00:13:30,000 it? Any takers? Sorry? 
Run a gel. 169 00:13:30,000 --> 00:13:34,000 So, I take DNA, cut it up, run a gel. 170 00:13:34,000 --> 00:13:38,000 I have all the DNA from the bacteria schmeered out. 171 00:13:38,000 --> 00:13:42,000 And somewhere in that schmeer is the gene. So, 172 00:13:42,000 --> 00:13:46,000 I take normal DNA from normal bacteria. I take mutant DNA. 173 00:13:46,000 --> 00:13:50,000 One nucleotide is different in the mutant DNA. I run them out, 174 00:13:50,000 --> 00:13:54,000 and I assure you, they just look like a schmeer. 175 00:13:54,000 --> 00:13:58,000 It's just a big schmeer of DNA. It's hard to see one nucleotide 176 00:13:58,000 --> 00:14:03,000 difference out of the 4 million nucleotides 177 00:14:03,000 --> 00:14:07,000 of E. coli. So, how are we going to get that? 178 00:14:07,000 --> 00:14:11,000 This is good. We're thinking practically here. 179 00:14:11,000 --> 00:14:15,000 What else? Sorry? Sorry? Cut it up. I'm assuming 180 00:14:15,000 --> 00:14:19,000 she wanted it cut up and run out on the gel. It still will look like a 181 00:14:19,000 --> 00:14:23,000 schmeer. Forget the gel. Cut it up. Make a library. 182 00:14:23,000 --> 00:14:27,000 OK, so we're going to make a library. Let's assume now we have a 183 00:14:27,000 --> 00:14:31,000 library of different E. coli cells containing individual plasmids, 184 00:14:31,000 --> 00:14:37,000 containing random bits of E. coli. How's that going to help? 185 00:14:37,000 --> 00:14:46,000 Splice it back in. How do I know if I spliced it back in? 186 00:14:46,000 --> 00:14:55,000 Ooh, that's an interesting thought. Suppose I were to make my library 187 00:14:55,000 --> 00:15:04,000 using wild type DNA, DNA from the wild type strain. 188 00:15:04,000 --> 00:15:09,000 So, I'm going to make a library containing lots and lots of 189 00:15:09,000 --> 00:15:14,000 fragments of normal E. coli DNA. This is my library. 
I'm going to 190 00:15:14,000 --> 00:15:20,000 transform it into, what kind of bacteria should I 191 00:15:20,000 --> 00:15:25,000 transform it into, wild type or mutant? 192 00:15:25,000 --> 00:15:31,000 Who votes mutant? Who votes wild type? 193 00:15:31,000 --> 00:15:38,000 We'll go with mutant, then. Mutant. We'll put it in 194 00:15:38,000 --> 00:15:45,000 mutant. So now, all of these mutant cells, 195 00:15:45,000 --> 00:15:52,000 each one is going to suck up a plasmid. We then are going to plate 196 00:15:52,000 --> 00:15:59,000 this, and let colonies grow up. One of these colonies contained, 197 00:15:59,000 --> 00:16:06,000 so this mutant is arg minus. And, one of these colonies is going 198 00:16:06,000 --> 00:16:12,000 to contain the ARG plus gene here. How are we going to know which one? 199 00:16:12,000 --> 00:16:18,000 Sorry? How are we going to know which one has the ARG plus gene? 200 00:16:18,000 --> 00:16:24,000 Yes? So, plate it on minimal medium. If I plate it on minimal 201 00:16:24,000 --> 00:16:31,000 medium, what will happen to most of my mutant bacteria? 202 00:16:31,000 --> 00:16:34,000 They're not going to grow. But, what's going to happen to the 203 00:16:34,000 --> 00:16:38,000 bacterium that happens to be lucky enough to have picked up the plasmid 204 00:16:38,000 --> 00:16:41,000 that contains the ARG plus gene? It'll grow. So, whatever grows on 205 00:16:41,000 --> 00:16:45,000 minimal medium has been rescued. In fact, we've complemented the 206 00:16:45,000 --> 00:16:49,000 defect. Remember, we talked about complementation 207 00:16:49,000 --> 00:16:52,000 tests? In a way, it would be the plasmid is 208 00:16:52,000 --> 00:16:56,000 complementing the defect. Bingo, that's it. So, we can 209 00:16:56,000 --> 00:17:00,000 actually find that gene functionally. 210 00:17:00,000 --> 00:17:09,000 We plate on minimal medium, and we look for growth. The only 211 00:17:09,000 --> 00:17:18,000 things that will grow have been rescued. 
So, this is called cloning 212 00:17:18,000 --> 00:17:27,000 by complementation because we are complementing the defect 213 00:17:27,000 --> 00:17:34,000 in this strain. All right. So, 214 00:17:34,000 --> 00:17:38,000 any time I have a functional defect in my bacteria, 215 00:17:38,000 --> 00:17:43,000 I can find the gene for that functional defect by simply taking a 216 00:17:43,000 --> 00:17:48,000 total library from normal, wild type bacteria, 217 00:17:48,000 --> 00:17:52,000 transforming it into a mutant bacteria, and looking for which 218 00:17:52,000 --> 00:17:57,000 bacteria have suddenly been rescued. Then I'll purify that bacterium, 219 00:17:57,000 --> 00:18:05,000 and I'll purify out the plasmid. And that plasmid will contain the 220 00:18:05,000 --> 00:18:16,000 DNA for the gene. That's pretty cool. 221 00:18:16,000 --> 00:18:28,000 Let's try another one. Suppose, yes? OK, great. 222 00:18:28,000 --> 00:18:32,000 I've got my plate here, and I've said only one of these 223 00:18:32,000 --> 00:18:36,000 bacteria will grow. It's the one that happens to have 224 00:18:36,000 --> 00:18:41,000 within it the plasmid containing the ARG gene. And, 225 00:18:41,000 --> 00:18:45,000 you're fine with that, but you're saying, but how would I 226 00:18:45,000 --> 00:18:50,000 get that plasmid back out of the bacteria because the bacteria's got 227 00:18:50,000 --> 00:18:54,000 its own chromosome, and I'm making this big deal about 228 00:18:54,000 --> 00:18:59,000 how we purified stuff away from all this other DNA. 229 00:18:59,000 --> 00:19:03,000 But, I've thrown this plasmid back into a bacteria that has all 230 00:19:03,000 --> 00:19:08,000 its chromosomal DNA. So, who am I kidding? 231 00:19:08,000 --> 00:19:13,000 How are we going to purify out just that plasmid? If I could purify the 232 00:19:13,000 --> 00:19:18,000 plasmid, it would be OK right? It turns out I can. Plasmids are 233 00:19:18,000 --> 00:19:22,000 little circles of DNA. 
Chromosomes are big pieces of DNA. 234 00:19:22,000 --> 00:19:27,000 It turns out that the coiling of the plasmid as a little circle gives 235 00:19:27,000 --> 00:19:32,000 it different densities and different physical chemical properties to big 236 00:19:32,000 --> 00:19:37,000 chunks of DNA which get broken up. And so, there are a bunch of tricks 237 00:19:37,000 --> 00:19:41,000 that allow me to get a pretty high purification of a plasmid away from 238 00:19:41,000 --> 00:19:46,000 chromosomal DNA based on the different physical properties of a 239 00:19:46,000 --> 00:19:50,000 small circle versus big chromosome. But, good question. Otherwise, how 240 00:19:50,000 --> 00:19:55,000 would I get that plasmid out? But it turns out, you can purify 241 00:19:55,000 --> 00:20:00,000 plasmids. Good question. OK, so now, let's try another one. 242 00:20:00,000 --> 00:20:05,000 Next cloning expedition: we're going to go to the library, 243 00:20:05,000 --> 00:20:10,000 and we want to withdraw a volume from the library. 244 00:20:10,000 --> 00:20:15,000 And, I want now, instead of bacteria that can't make arginine, 245 00:20:15,000 --> 00:20:20,000 let's go with human DNA. Let's try human DNA. And, 246 00:20:20,000 --> 00:20:25,000 I would like you to now please find the gene that encodes beta-globin. 247 00:20:25,000 --> 00:20:30,000 Beta-globin, of course, is one of the two proteins in hemoglobin. 248 00:20:30,000 --> 00:20:34,000 Hemoglobin is a tetramer. It has alpha-globin and beta-globin. 249 00:20:34,000 --> 00:20:39,000 This tetramer is the oxygen carrier in your blood. 250 00:20:39,000 --> 00:20:43,000 It carries oxygen. Beta-globin happens to be the site 251 00:20:43,000 --> 00:20:48,000 of some very important mutations. We know that sickle cell anemia is 252 00:20:48,000 --> 00:20:52,000 caused by mutations in beta-globin. We know that diseases like 253 00:20:52,000 --> 00:20:57,000 thalassemia are caused by mutations in beta-globin. 
254 00:20:57,000 --> 00:21:01,000 And, people knew this before they had recombinant DNA because they 255 00:21:01,000 --> 00:21:06,000 could study red blood cells. There's lots of beta-globin in red 256 00:21:06,000 --> 00:21:10,000 blood cells. They could see that something was funny about the 257 00:21:10,000 --> 00:21:14,000 protein. They could even see that in sickle cell anemia the protein 258 00:21:14,000 --> 00:21:19,000 had a different net charge, and it would run differently. 259 00:21:19,000 --> 00:21:23,000 So, they knew something was funny with the beta globin protein. 260 00:21:23,000 --> 00:21:27,000 All I want you to do now is clone beta-globin for me. 261 00:21:27,000 --> 00:21:32,000 Could we do the same thing? Why not? 262 00:21:32,000 --> 00:21:40,000 Bacteria don't make beta-globin. So, what can we do? Well, we could 263 00:21:40,000 --> 00:21:49,000 make a library of human DNA. And, we could throw it into the 264 00:21:49,000 --> 00:21:58,000 bacteria. So, why don't we just select for a 265 00:21:58,000 --> 00:22:05,000 bacteria that makes beta-globin? Could we do that? 266 00:22:05,000 --> 00:22:11,000 I don't know, how? Do you see how? How would we 267 00:22:11,000 --> 00:22:16,000 select for that? I mean, there, we could see who 268 00:22:16,000 --> 00:22:21,000 grows without arginine. But how are we going to tell which 269 00:22:21,000 --> 00:22:27,000 bacteria has picked up beta-globin? I don't know. Yeah? Use 270 00:22:27,000 --> 00:22:32,000 mammals. We could take a mouse that did not 271 00:22:32,000 --> 00:22:37,000 make beta globin, a mouse that had, say, 272 00:22:37,000 --> 00:22:41,000 thalassemia, isolate a naturally occurring mouse with a defect in 273 00:22:41,000 --> 00:22:46,000 beta-globin. 
Then, do injections of plasmids into mouse 274 00:22:46,000 --> 00:22:51,000 eggs, grow up the mouse eggs by implanting them back into 275 00:22:51,000 --> 00:22:55,000 pseudo-pregnant females, do this for 10^8 individual plasmids 276 00:22:55,000 --> 00:23:00,000 with 10^8 individual mice, and look for the mouse that is 277 00:23:00,000 --> 00:23:04,000 rescued. Intellectually, 278 00:23:04,000 --> 00:23:08,000 you're absolutely right, it works. So, that's exactly the 279 00:23:08,000 --> 00:23:12,000 cloning by complementation we talked about for bacteria, 280 00:23:12,000 --> 00:23:16,000 and you're dead-on right. That would work. Getting it funded 281 00:23:16,000 --> 00:23:19,000 is another matter because it's a hugely expensive experiment to shoot 282 00:23:19,000 --> 00:23:23,000 up each egg with this, but it could work. So, 283 00:23:23,000 --> 00:23:27,000 we need another solution because we can't rescue the function in mice 284 00:23:27,000 --> 00:23:31,000 because it's just not practical to do so. 285 00:23:31,000 --> 00:23:35,000 Of course, if we could do this in mouse cells, maybe we could make it 286 00:23:35,000 --> 00:23:40,000 work in cell culture in mice. But, let's suppose we don't have a 287 00:23:40,000 --> 00:23:44,000 cell culture phenotype. We just have an organism phenotype. 288 00:23:44,000 --> 00:23:49,000 So, it's not going to work to just do this by complementation. 289 00:23:49,000 --> 00:23:53,000 But, good thinking guys. This is good. So, next trick we might have 290 00:23:53,000 --> 00:23:58,000 at our disposal is suppose because beta-globin is so abundant in red 291 00:23:58,000 --> 00:24:02,000 blood cells we have purified beta-globin, and we've done amino 292 00:24:02,000 --> 00:24:09,000 acid sequencing of the protein. By Edman degradation, 293 00:24:09,000 --> 00:24:17,000 you can work out the sequence of globin. 
And, you can learn that 294 00:24:17,000 --> 00:24:25,000 beta-globin has, here at its amino terminal, 295 00:24:25,000 --> 00:24:33,000 val, leu, ser, pro, ala, asp, lys, threonine dot, dot, dot, dot, 296 00:24:33,000 --> 00:24:41,000 dot off to the carboxy terminal, OK? 297 00:24:41,000 --> 00:24:46,000 If I knew that this was the amino acid sequence of the beginning, 298 00:24:46,000 --> 00:24:51,000 just the beginning of beta-globin, couldn't I figure out what that 299 00:24:51,000 --> 00:24:57,000 initial portion of the DNA sequence must be? 300 00:24:57,000 --> 00:25:01,000 Wouldn't this give me a clue? If I knew a little bit of the 301 00:25:01,000 --> 00:25:05,000 protein sequence, wouldn't this give me a clue about 302 00:25:05,000 --> 00:25:09,000 the nucleotide sequence that must be there in the human genome to encode 303 00:25:09,000 --> 00:25:13,000 this protein? So, a biochemist has purified the 304 00:25:13,000 --> 00:25:17,000 protein. Biochemists have studied the protein well enough to know some 305 00:25:17,000 --> 00:25:21,000 of its amino acid sequence. Can I infer the DNA sequence from 306 00:25:21,000 --> 00:25:25,000 the amino acid sequence, or at least a little snippet of it? 307 00:25:25,000 --> 00:25:30,000 Sorry? Multiple possibilities, 308 00:25:30,000 --> 00:25:35,000 but an infinite number? No. How do you encode valine? Well, 309 00:25:35,000 --> 00:25:40,000 GT something; something could be actually A, T, 310 00:25:40,000 --> 00:25:45,000 C, or G. What about leucine? Well, it's either a T or a C 311 00:25:45,000 --> 00:25:50,000 in the first place. There's always a T there. 312 00:25:50,000 --> 00:25:55,000 There you go, and it can be either of those. There's a T, 313 00:25:55,000 --> 00:26:01,000 C, anything, or an A, G, and a T, or a C. 314 00:26:01,000 --> 00:26:07,000 Here, we have C, C anything. Here we have a G, 315 00:26:07,000 --> 00:26:13,000 C anything. We have a G, A, T, or a C. 
For lysine it's an A, an A, 316 00:26:13,000 --> 00:26:19,000 either an A or a G. Here, it's an A, a C, 317 00:26:19,000 --> 00:26:25,000 an anything. Here, it's an A, an A, a T, or a C, here a G, a T, 318 00:26:25,000 --> 00:26:31,000 anything, an A, an A, A or a G. You're right. There are 319 00:26:31,000 --> 00:26:36,000 multiple possibilities. But, it's not an infinite number, 320 00:26:36,000 --> 00:26:41,000 right? There are certain possible DNA sequences that might be encoded 321 00:26:41,000 --> 00:26:46,000 here. If I just work it out, it's either two choices here. There 322 00:26:46,000 --> 00:26:52,000 are four choices here. There's two choices here. 323 00:26:52,000 --> 00:26:57,000 There's four choices here. There's two choices here, two 324 00:26:57,000 --> 00:27:02,000 choices, etc. If I just look at, 325 00:27:02,000 --> 00:27:08,000 let's take a segment of this. Let's try one, two, three, these 326 00:27:08,000 --> 00:27:14,000 six amino acids. Four choices here, 327 00:27:14,000 --> 00:27:20,000 how many possible DNA sequences could encode these six amino acids 328 00:27:20,000 --> 00:27:26,000 in this order? Four times four times two times two 329 00:27:26,000 --> 00:27:32,000 times four times two, what is that? 330 00:27:32,000 --> 00:27:38,000 256, let's see, two, two, to the two, 331 00:27:38,000 --> 00:27:44,000 to the four, to the five, to the six, to the seven, eight, 332 00:27:44,000 --> 00:27:50,000 512. I think it's about 512 possibilities. 333 00:27:50,000 --> 00:27:56,000 So, 512 possible nucleotide sequences could work here. 334 00:27:56,000 --> 00:28:02,000 Well, 512's not infinite. There's 18 bases of sequence, 335 00:28:02,000 --> 00:28:09,000 512 possible 18 base long nucleotide sequences. 336 00:28:09,000 --> 00:28:14,000 Just suppose that you knew which one it was. Now, you have to suspend 337 00:28:14,000 --> 00:28:19,000 your disbelief for a second. 
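The count of 512 can be checked by enumeration. The sketch below assumes the six amino acids are pro, ala, asp, lys, thr and then asn, which matches the factors four, four, two, two, four, two read out above; asparagine is inferred from the "A, A, T or C" codon and is not in the sequence as first written down, so treat that choice as an assumption. The codon lists are the standard genetic code written in DNA letters.

```python
from itertools import product
from math import prod

# Standard genetic code (DNA letters) for just the residues needed here.
CODONS = {
    "pro": ["CCT", "CCC", "CCA", "CCG"],
    "ala": ["GCT", "GCC", "GCA", "GCG"],
    "asp": ["GAT", "GAC"],
    "lys": ["AAA", "AAG"],
    "thr": ["ACT", "ACC", "ACA", "ACG"],
    "asn": ["AAT", "AAC"],
}

# Assumed six-residue segment (see the note above about asparagine).
peptide = ["pro", "ala", "asp", "lys", "thr", "asn"]

# Number of choices multiplies position by position: 4 * 4 * 2 * 2 * 4 * 2.
n_sequences = prod(len(CODONS[aa]) for aa in peptide)

# Spell out every candidate 18-base sequence explicitly.
candidates = ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]

print(n_sequences)         # 512
print(len(candidates[0]))  # 18
```

Any one of the 512 candidates could serve as the known 18-base stretch in the step that follows.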
I'm not going to tell you how you 338 00:28:19,000 --> 00:28:24,000 might know, but suppose you knew which of the 512 it was. 339 00:28:24,000 --> 00:28:29,000 OK, could we use that little fact of knowing a stretch from about 18 340 00:28:29,000 --> 00:28:35,000 bases of the sequence to find the clone? 341 00:28:35,000 --> 00:28:39,000 How could we find that clone in our library that has that 18 bases of 342 00:28:39,000 --> 00:28:43,000 sequence? Google. [LAUGHTER] And, of course, 343 00:28:43,000 --> 00:28:47,000 you are totally right because as we'll come back to, 344 00:28:47,000 --> 00:28:51,000 that is the way you would do it today if it's the human genome 345 00:28:51,000 --> 00:28:55,000 because the entire sequence of the human genome's on the web. 346 00:28:55,000 --> 00:29:00,000 But, you might have an organism where it's not on the web. 347 00:29:00,000 --> 00:29:04,000 But, we'll come back because, of course, the human genome project 348 00:29:04,000 --> 00:29:09,000 changes everything as to how you would approach this. 349 00:29:09,000 --> 00:29:13,000 Google is how you would do it today. But, in the absence of Google or 350 00:29:13,000 --> 00:29:18,000 the absence of the entire sequence of the human genome, 351 00:29:18,000 --> 00:29:23,000 but I'm glad you raise it because it's absolutely right, 352 00:29:23,000 --> 00:29:27,000 how could I find the clone that has that specific 18 base pair sequence? 353 00:29:27,000 --> 00:29:33,000 Who has my 18 base sequence? Well, here's a trick. 354 00:29:33,000 --> 00:29:41,000 I could chemically synthesize an oligonucleotide that matches my 355 00:29:41,000 --> 00:29:48,000 sequence: an 18 base pair long oligonucleotide encoding my sequence. 356 00:29:48,000 --> 00:29:56,000 What I'd like to do is use this oligonucleotide as a chemical probe 357 00:29:56,000 --> 00:30:02,000 to wash over my library. And, by washing it over my library, 358 00:30:02,000 --> 00:30:07,000 I'd like to see where it sticks. 
Now, that's kind of interesting. 359 00:30:07,000 --> 00:30:12,000 What do I mean by that? What I'd really like to do would be to kind 360 00:30:12,000 --> 00:30:18,000 of crack open all the cells of my library, and then the DNA would be 361 00:30:18,000 --> 00:30:23,000 sitting there. And, I'd like to take my 362 00:30:23,000 --> 00:30:28,000 oligonucleotide probe for a little snippet of the gene and wash it over 363 00:30:28,000 --> 00:30:33,000 the library. And then, by the amazing powers of 364 00:30:33,000 --> 00:30:39,000 Crick and Watson base pairing, it should stick to the right place. 365 00:30:39,000 --> 00:30:44,000 Could it do that? Turns out DNA, given time to wash around, 366 00:30:44,000 --> 00:30:49,000 will stick to its own complement. So that's the idea. How in the 367 00:30:49,000 --> 00:30:55,000 world do I do this in practice? So, here's what you do in practice. 368 00:30:55,000 --> 00:31:00,000 In practice, let us grow our 369 00:31:00,000 --> 00:31:06,000 bacteria. Let's plate the bacteria on an agar plate on which we have 370 00:31:06,000 --> 00:31:12,000 put a membrane, a nitrocellulose filter or some other kind of filter. 371 00:31:12,000 --> 00:31:18,000 Just imagine it being a piece of filter paper. And, 372 00:31:18,000 --> 00:31:24,000 I'm going to plate my bacteria on the filter paper that's here. 373 00:31:24,000 --> 00:31:30,000 I'll let them grow up because there's nutrients here. 374 00:31:30,000 --> 00:31:35,000 The nutrients diffuse through the filter paper. And then, 375 00:31:35,000 --> 00:31:40,000 I have a piece of filter paper that I can pick up with my tweezers, 376 00:31:40,000 --> 00:31:45,000 and on that filter paper are bacterial colonies growing. 377 00:31:45,000 --> 00:31:50,000 So, this is a filter. Then, what I'm going to do is I'm going to 378 00:31:50,000 --> 00:31:55,000 take this filter with these glistening bacterial colonies, 379 00:31:55,000 --> 00:32:00,000 and I'm going to stick it in the autoclave.
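The idea that a probe sticks to its own complement can be sketched as a string search. This is a toy model under loud assumptions: each clone is just a DNA string, hybridization is treated as an exact match on either strand, and the function names and the shape of the `library` dictionary are made up for illustration. Real hybridization tolerates some mismatches and depends on temperature and salt.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizing_clones(probe: str, library: dict[str, str]) -> list[str]:
    """Names of clones whose insert contains the probe on either strand.

    `library` maps a hypothetical clone name to its insert sequence.
    Exact matching is an assumption of this sketch, not a property of
    real filter hybridization.
    """
    rc = reverse_complement(probe)
    return [
        name
        for name, insert in library.items()
        if probe in insert or rc in insert
    ]
```

Checking both the probe and its reverse complement reflects the fact that an insert can sit in the vector in either orientation, and the denatured DNA on the filter presents both strands.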
380 00:32:00,000 --> 00:32:04,000 And, I'm going to heat it up in the presence of wet heat, 381 00:32:04,000 --> 00:32:09,000 and the bacterial cells will crack open. And, under these conditions, 382 00:32:09,000 --> 00:32:13,000 the DNA will tend to stick to the filter because I've picked the 383 00:32:13,000 --> 00:32:18,000 filter that the DNA tends to stick to. And, I'm going to wash this 384 00:32:18,000 --> 00:32:23,000 filter in a certain way that all the usual junk, some of the proteins and 385 00:32:23,000 --> 00:32:27,000 cell surface junk washes off. And, the DNA from each bacterial 386 00:32:27,000 --> 00:32:33,000 colony will stick. So now, I have the DNA from each 387 00:32:33,000 --> 00:32:39,000 colony sticking to that spot. Then, what I'm going to do is I'm 388 00:32:39,000 --> 00:32:45,000 going to take my filter and I'm going to add my oligo probe. 389 00:32:45,000 --> 00:32:51,000 This thing is now called a probe. I'm going to add the probe to the 390 00:32:51,000 --> 00:32:57,000 filter, and I'm going to put this in a, I need some sort of a 391 00:32:57,000 --> 00:33:03,000 hybridization device in which the filter and the oligonucleotide and a 392 00:33:03,000 --> 00:33:07,000 little water can swish around. And here, we use a technical device 393 00:33:07,000 --> 00:33:11,000 called a baggy, or some other kind of, 394 00:33:11,000 --> 00:33:15,000 basically, a Ziploc bag or you can heat seal it or something like a 395 00:33:15,000 --> 00:33:18,000 Seal-a-Meal. In fact that's actually what's used in the lab is 396 00:33:18,000 --> 00:33:22,000 Seal-a-Meal. You get these Seal-a-Meal bags, 397 00:33:22,000 --> 00:33:26,000 you toss your filter in, you squirt a little bit of your probe in, 398 00:33:26,000 --> 00:33:30,000 and you put it in the Seal-a-Meal bag, and then you put 399 00:33:30,000 --> 00:33:34,000 it in a water bath. And, it swishes back and forth.
400 00:33:34,000 --> 00:33:40,000 And, the probe just goes washing all over the place. 401 00:33:40,000 --> 00:33:46,000 And, wherever the probe finds its corresponding cognate sequence by 402 00:33:46,000 --> 00:33:51,000 Crick and Watson, it'll stick. And there you go. 403 00:33:51,000 --> 00:33:57,000 That clone contains your sequence. Now, we have a few problems here, 404 00:33:57,000 --> 00:34:03,000 don't we? What are some of the problems with this? Yeah? 405 00:34:03,000 --> 00:34:07,000 Sorry, what if it sticks what? So, the probe, I thought this 406 00:34:07,000 --> 00:34:12,000 filter likes DNA. So, why won't the probe just stick 407 00:34:12,000 --> 00:34:17,000 nonspecifically everywhere? We treat it in some way so that 408 00:34:17,000 --> 00:34:22,000 after we've got the DNA adhering to it it's now not going to stick 409 00:34:22,000 --> 00:34:27,000 everywhere. Good, next problem. Well, 410 00:34:27,000 --> 00:34:31,000 even before that, yes? No, we'll take the whole library. 411 00:34:31,000 --> 00:34:35,000 We've gotten the library scattered out on this filter. 412 00:34:35,000 --> 00:34:39,000 Good, so hang on to that one for a second. First off, 413 00:34:39,000 --> 00:34:42,000 do we even know where that clone is? How did we know where the piece of 414 00:34:42,000 --> 00:34:46,000 DNA stuck? I mean, I drew it as red. But, 415 00:34:46,000 --> 00:34:50,000 how do we know where that red spot is? Yeah? Oh yeah, 416 00:34:50,000 --> 00:34:53,000 you see the problem is if I just wash it over there, 417 00:34:53,000 --> 00:34:57,000 unless you have, you know, Superman vision, you're not going to 418 00:34:57,000 --> 00:35:01,000 know where that probe is. So, you're proposing, the first 419 00:35:01,000 --> 00:35:05,000 thing we better do is radioactively label the probe. 420 00:35:05,000 --> 00:35:08,000 So, let's put a radioactive label on the probe, OK? 
421 00:35:08,000 --> 00:35:12,000 Radio label, and it turns out you can radio label probes by using 422 00:35:12,000 --> 00:35:15,000 these enzymes that can add a radioactive phosphate group, 423 00:35:15,000 --> 00:35:19,000 etc. So, now, when it's radioactive, we put it here. 424 00:35:19,000 --> 00:35:22,000 And now we have a radioactive signal here. How are we going to 425 00:35:22,000 --> 00:35:26,000 find our radioactive signal? We put it up against x-ray films. 426 00:35:26,000 --> 00:35:30,000 We take our filter. We dry it off. 427 00:35:30,000 --> 00:35:33,000 We slap it onto a piece of x-ray film. We let it expose overnight. 428 00:35:33,000 --> 00:35:36,000 We develop the x-ray film. And, we'll see a black dot. 429 00:35:36,000 --> 00:35:39,000 We'd better actually have taken some care to take a little 430 00:35:39,000 --> 00:35:43,000 radioactive pen and make a couple of fiducial marks around the corners. 431 00:35:43,000 --> 00:35:46,000 Otherwise, we're not going to know where our black dot corresponds to. 432 00:35:46,000 --> 00:35:49,000 But, assume we've made a couple of dots and we know how to line up our 433 00:35:49,000 --> 00:35:53,000 x-ray film to our filter. Now, we go back to our filter. 434 00:35:53,000 --> 00:35:56,000 We say, uh-huh, there is a black dot corresponding to the location of 435 00:35:56,000 --> 00:36:00,000 the radioactive probe right there. 436 00:36:00,000 --> 00:36:06,000 That was, as you said, where the colony used to be that we 437 00:36:06,000 --> 00:36:12,000 wished we still had [LAUGHTER] because we cooked it in the 438 00:36:12,000 --> 00:36:19,000 autoclave, which is too bad. So, what should we do about that? 439 00:36:19,000 --> 00:36:25,000 Yep? So, if I did it one colony at a time, I would know exactly which 440 00:36:25,000 --> 00:36:32,000 one it came from. But, it could take a long time. 441 00:36:32,000 --> 00:36:35,000 Sorry? So, plate it first onto a plate of agar. 
442 00:36:35,000 --> 00:36:39,000 Take a filter, and press the filter up against the 443 00:36:39,000 --> 00:36:43,000 plate and make a copy of it. Replica plate that. It turns 444 00:36:43,000 --> 00:36:46,000 out, that'll work. There are two different approaches 445 00:36:46,000 --> 00:36:50,000 and both of you were right. One approach is to replica plate it. 446 00:36:50,000 --> 00:36:54,000 Plate it first on a normal plate, and lay a piece of filter on top of 447 00:36:54,000 --> 00:36:58,000 it, and a little bacteria will stick in the same patterns. 448 00:36:58,000 --> 00:37:01,000 Peel it off, and you now have it. Alternatively, 449 00:37:01,000 --> 00:37:05,000 now in the presence of robotics, you can use a robot to take these 450 00:37:05,000 --> 00:37:08,000 colonies into microtiter plates, and you can screen the individual 451 00:37:08,000 --> 00:37:12,000 wells by stamping them onto a filter, things like that. 452 00:37:12,000 --> 00:37:15,000 And frankly, that's how we do it now. If you want to screen the 453 00:37:15,000 --> 00:37:19,000 human genome, at least set up a library with a few tens of thousands 454 00:37:19,000 --> 00:37:23,000 or hundreds of thousands such things. And, we can read off from a grid 455 00:37:23,000 --> 00:37:26,000 which one it was, and we go back to our master 456 00:37:26,000 --> 00:37:30,000 microtiter plates where we have it. But, either way, we need to have a 457 00:37:30,000 --> 00:37:34,000 living copy of the library. But, that's how you do it. 458 00:37:34,000 --> 00:37:39,000 So now, we're in business. We have a living copy of the 459 00:37:39,000 --> 00:37:43,000 library. We make a filter containing that. 460 00:37:43,000 --> 00:37:48,000 We cook the filter in the autoclave. We add a radioactive probe. 461 00:37:48,000 --> 00:37:53,000 Wherever it sticks, it matches by the wonders of Crick-Watson base 462 00:37:53,000 --> 00:37:58,000 pairing. We're in business. Yes? So now, there was this issue.
463 00:37:58,000 --> 00:38:03,000 I mean, how do I know that that sequence doesn't appear multiple 464 00:38:03,000 --> 00:38:08,000 times in the human genome? That's one issue. So, I'm going to 465 00:38:08,000 --> 00:38:13,000 have to pull out each of the positive hits I get and check it out. 466 00:38:13,000 --> 00:38:18,000 I'm going to have to analyze the clone because just knowing that it 467 00:38:18,000 --> 00:38:23,000 hybridized to that might not tell me it's the beta-globin gene, 468 00:38:23,000 --> 00:38:28,000 but at least it's probably a good start, right? I've narrowed it down. 469 00:38:28,000 --> 00:38:33,000 But, yes? Wait a second, right. We said there were 512 possibilities, 470 00:38:33,000 --> 00:38:39,000 and I said, bear with me, let's suppose we knew which one it 471 00:38:39,000 --> 00:38:45,000 was and we used it. Well, how are we going to know 472 00:38:45,000 --> 00:38:51,000 which one it is? Well, we could do the experiment 473 00:38:51,000 --> 00:38:57,000 512 times, and one of them would work. That's lousy. 474 00:38:57,000 --> 00:39:03,000 We could go and make 512 oligos and simultaneously throw them in the 475 00:39:03,000 --> 00:39:07,000 same Seal-a-Meal bag. That actually works. 476 00:39:07,000 --> 00:39:10,000 How do you make 512 oligos? How do you make an oligo, by the 477 00:39:10,000 --> 00:39:13,000 way? To make an oligonucleotide, there's very fancy chemistry that's 478 00:39:13,000 --> 00:39:16,000 been developed, for which someone won a Nobel Prize. 479 00:39:16,000 --> 00:39:20,000 Nowadays, of course, if you need an oligo made, how do you do it? 480 00:39:20,000 --> 00:39:23,000 Go to the catalog, that's right. In fact, you can go on the web, 481 00:39:23,000 --> 00:39:26,000 type in the sequence you want, and there's a machine that will make 482 00:39:26,000 --> 00:39:29,000 it. You can have it tomorrow. So, it turns out, that's how you 483 00:39:29,000 --> 00:39:32,000 make oligonucleotides today.
There are good machines for it. 484 00:39:32,000 --> 00:39:36,000 And, it turns out that if you wanted to, so what you do is you 485 00:39:36,000 --> 00:39:40,000 type into the computer the following. You type in, please make me an 486 00:39:40,000 --> 00:39:44,000 oligo that starts, put a C in the first position, 487 00:39:44,000 --> 00:39:47,000 a C in the second position. And, what are you going to put in the 488 00:39:47,000 --> 00:39:51,000 third position? Just tell the computer to put in a 489 00:39:51,000 --> 00:39:55,000 random mix of all four. Then, a G in this position, 490 00:39:55,000 --> 00:39:59,000 a C in that position, and then a random mix of all four. 491 00:39:59,000 --> 00:40:03,000 Then, put in a G and an A, and then put in a 50/50 mix of T and 492 00:40:03,000 --> 00:40:06,000 A. In fact, in one synthesis, 493 00:40:06,000 --> 00:40:09,000 by telling the computer to just add a mixture at certain steps, 494 00:40:09,000 --> 00:40:12,000 it'll simultaneously synthesize a mixture of all 512 possibilities for 495 00:40:12,000 --> 00:40:16,000 you. So actually, a single synthesis will suffice to 496 00:40:16,000 --> 00:40:19,000 get a mixture of 512. You take your mixture of 512, 497 00:40:19,000 --> 00:40:22,000 wash it over the filter, etc. Now, your point still stands. How do we 498 00:40:22,000 --> 00:40:25,000 know that there's not something else in the genome that has this, 499 00:40:25,000 --> 00:40:28,000 etc.? But at least we can find all the specific positives associated 500 00:40:28,000 --> 00:40:31,000 with this, and we can analyze them further as we'll talk about next 501 00:40:31,000 --> 00:40:35,000 time more about how you actually analyze them.
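The single-synthesis trick, putting a mixture at the ambiguous positions, can be sketched by expanding a degenerate pattern into every sequence it stands for. The sketch uses the standard IUPAC ambiguity letters (N for any base, R for A or G, Y for C or T, W for A or T). The 18-base pattern `CCNGCNGARAAYACNAAR` is a hypothetical example with the same 4, 4, 2, 2, 4, 2 choices; it is not the sequence written on the board.

```python
from itertools import product

# Subset of the IUPAC nucleotide ambiguity codes needed for this example.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "W": "AT",    # weak pairing
    "N": "ACGT",  # any base
}


def expand_degenerate(pattern: str) -> list[str]:
    """List every concrete sequence represented by a degenerate pattern."""
    choices = (IUPAC[letter] for letter in pattern)
    return ["".join(bases) for bases in product(*choices)]


# Hypothetical degenerate 18-mer (Pro-Ala-Glu-Asn-Thr-Lys read as codons).
PROBE_POOL = expand_degenerate("CCNGCNGARAAYACNAAR")
print(len(PROBE_POOL))
```

The synthesizer never enumerates the pool like this; it simply couples a mix of bases at each ambiguous step, and the combinations arise in the tube. The enumeration only makes the size of the resulting mixture explicit.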
502 00:40:35,000 --> 00:40:38,000 And, of course, whether 18 is the right number of 503 00:40:38,000 --> 00:40:41,000 bases, or you might prefer to have a longer probe or shorter probes, 504 00:40:41,000 --> 00:40:44,000 or two probes, these are all the cooking tips molecular biologists 505 00:40:44,000 --> 00:40:48,000 worry about. But, given an amino acid 506 00:40:48,000 --> 00:40:51,000 sequence, you can infer, although with redundancy, 507 00:40:51,000 --> 00:40:54,000 a nucleotide sequence. Given a nucleotide sequence, 508 00:40:54,000 --> 00:40:58,000 you can make an oligonucleotide probe. Given a nucleotide probe, 509 00:40:58,000 --> 00:41:02,000 you can wash it over the filter. You can find the colonies that have 510 00:41:02,000 --> 00:41:07,000 it, and therefore you could clone by hybridization. 511 00:41:07,000 --> 00:41:12,000 So, we'll call this one cloning by hybridization, 512 00:41:12,000 --> 00:41:17,000 or cloning by sequence. OK, now, there are other ways to do 513 00:41:17,000 --> 00:41:21,000 it, or by sequence here. Of course, as someone correctly 514 00:41:21,000 --> 00:41:26,000 noted, if the entire human genome has already been 515 00:41:26,000 --> 00:41:31,000 sequenced, as it has right now, if you knew the amino acid sequence, 516 00:41:31,000 --> 00:41:36,000 you could do this hybridization not using filters and radioactive probes, 517 00:41:36,000 --> 00:41:42,000 but just doing it in silico. You can do it in the computer, 518 00:41:42,000 --> 00:41:50,000 and that will work as well. So now, let's do the next one. Last cloning 519 00:41:50,000 --> 00:41:57,000 expedition: I'd like to clone the gene for Huntington's disease or 520 00:41:57,000 --> 00:42:05,000 cystic fibrosis or something like that.
Cloning a disease gene: 521 00:42:05,000 --> 00:42:13,000 Huntington's disease, for example, is a dominantly inherited disorder 522 00:42:13,000 --> 00:42:23,000 passed to some of the offspring. It causes a brain degeneration that 523 00:42:23,000 --> 00:42:33,000 onsets typically in the fifth decade of life. 524 00:42:33,000 --> 00:42:36,000 Let's clone that gene. Can we do it by method number one, 525 00:42:36,000 --> 00:42:39,000 cloning by complementation? No, because we don't have a bacterium 526 00:42:39,000 --> 00:42:42,000 that has Huntington's disease. We don't have mice that have 527 00:42:42,000 --> 00:42:46,000 Huntington's disease. And, we certainly can't shoot up 528 00:42:46,000 --> 00:42:49,000 people and try to rescue the phenotype and all that. 529 00:42:49,000 --> 00:42:52,000 That's not going to work. Number two, how about doing it by 530 00:42:52,000 --> 00:42:56,000 number two? Let's just get the protein for Huntington's disease, 531 00:42:56,000 --> 00:42:59,000 get its amino acid sequence, and then find its nucleotide 532 00:42:59,000 --> 00:43:03,000 sequence. Pretty good. What's the protein for Huntington's 533 00:43:03,000 --> 00:43:07,000 disease? Huntase. No, it's actually called huntingtin, 534 00:43:07,000 --> 00:43:11,000 it turns out. But, at the time that people went off 535 00:43:11,000 --> 00:43:15,000 trying to find the gene for Huntington's disease, 536 00:43:15,000 --> 00:43:19,000 I'm afraid they didn't know. They had no idea what the gene was 537 00:43:19,000 --> 00:43:23,000 that caused Huntington's disease. That was the point. They wanted to 538 00:43:23,000 --> 00:43:27,000 use molecular biology to find the gene when they didn't even 539 00:43:27,000 --> 00:43:32,000 know the protein. So, we can't use our method number 540 00:43:32,000 --> 00:43:37,000 two. So, how are we going to find it? The disease does lead to 541 00:43:37,000 --> 00:43:42,000 degeneration of nervous cells. Study nerve cells.
So, we could 542 00:43:42,000 --> 00:43:47,000 take brain biopsies from patients who have died of Huntington's 543 00:43:47,000 --> 00:43:52,000 disease, and people did that. But, nerve cells that die, a lot of 544 00:43:52,000 --> 00:43:57,000 stuff goes on. All sorts of proteins go wrong, 545 00:43:57,000 --> 00:44:02,000 and it's stuff. The problem with studying tissue 546 00:44:02,000 --> 00:44:06,000 from people who have a disease is that it's diseased tissue. 547 00:44:06,000 --> 00:44:10,000 And, just because you see something wrong doesn't mean it's a cause 548 00:44:10,000 --> 00:44:15,000 rather than the effect of the disease. That's why we really want 549 00:44:15,000 --> 00:44:19,000 to find the gene and find its mutation because we know then that's 550 00:44:19,000 --> 00:44:24,000 the primary cause. But, how are we going to do that? 551 00:44:24,000 --> 00:44:28,000 We don't know its sequence. We can't rescue it by complementation. 552 00:44:28,000 --> 00:44:33,000 As a pure geneticist, what can we do? 553 00:44:33,000 --> 00:44:36,000 Yeah, we know the sequence of the human genome. So, 554 00:44:36,000 --> 00:44:40,000 we just sequence the entirety of the genome of somebody with Huntington's 555 00:44:40,000 --> 00:44:43,000 disease and compare it to normal. That actually may become a 556 00:44:43,000 --> 00:44:47,000 reasonable way to do things, but the first sequence of the human 557 00:44:47,000 --> 00:44:50,000 genome cost a couple of billion dollars. Doing it again would be 558 00:44:50,000 --> 00:44:54,000 cheaper. We'd spend about $30 million or so, 559 00:44:54,000 --> 00:44:57,000 but it's pricey. Also, there would be a lot of 560 00:44:57,000 --> 00:45:01,000 genetic variation, just random, meaningless 561 00:45:01,000 --> 00:45:05,000 polymorphism between individuals. The human genome differs between any 562 00:45:05,000 --> 00:45:11,000 two people by about one letter in 1,000.
So, we would see about 3 563 00:45:11,000 --> 00:45:16,000 million differences between the person with Huntington's and the 564 00:45:16,000 --> 00:45:22,000 wild type reference sequence on Google. We wouldn't know which one 565 00:45:22,000 --> 00:45:27,000 causes it. Suppose you have a family tree. How could we use it? 566 00:45:27,000 --> 00:45:33,000 Compare the children and the parents. 567 00:45:33,000 --> 00:45:37,000 That's all right. What does a geneticist do with a 568 00:45:37,000 --> 00:45:42,000 family tree? What did Sturtevant teach us: genetic mapping. 569 00:45:42,000 --> 00:45:47,000 Suppose we were to study a family tree of individuals with 570 00:45:47,000 --> 00:45:52,000 Huntington's disease. And suppose on the chromosome where 571 00:45:52,000 --> 00:45:57,000 the Huntington's disease gene lives, we were to look at genetic markers. 572 00:45:57,000 --> 00:46:03,000 Could we do genetic linkage analysis? 573 00:46:03,000 --> 00:46:09,000 Genetic linkage analysis that would allow us to know that there was a 574 00:46:09,000 --> 00:46:15,000 marker here, some kind of a marker, a DNA marker, a DNA variation that 575 00:46:15,000 --> 00:46:21,000 was co-inherited with that showed linkage with Huntington's disease? 576 00:46:21,000 --> 00:46:27,000 We could do that just by finding that across a family, 577 00:46:27,000 --> 00:46:33,000 there tended to be very little genetic recombination between this 578 00:46:33,000 --> 00:46:38,000 marker and Huntington's disease. Now, how would we know to look here? 579 00:46:38,000 --> 00:46:44,000 You wouldn't. We'd try markers all over the genome. 580 00:46:44,000 --> 00:46:50,000 Next chromosome, next chromosome; if we tried genetic 581 00:46:50,000 --> 00:46:56,000 variations all over the human genome, we would eventually find that some 582 00:46:56,000 --> 00:47:02,000 genetic markers in the human genome tended to be co-inherited along with 583 00:47:02,000 --> 00:47:07,000 Huntington's disease. 
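The linkage test described here comes down to counting how often a marker and the disease are separated by recombination. Below is a minimal sketch under strong simplifying assumptions: the affected parent's phase is known (which marker allele sits on the disease chromosome), the disease is fully penetrant, and every child's status is known. The function name and the data layout are hypothetical, and real linkage analysis uses likelihood-based statistics rather than a raw fraction.

```python
def recombination_fraction(
    children: list[tuple[str, bool]],
    disease_linked_allele: str,
) -> float:
    """Fraction of children in whom marker and disease were separated.

    Each child is (marker allele inherited from the affected parent,
    whether the child is affected). A child is a recombinant when it
    inherited the disease-linked allele without the disease, or the
    disease without that allele.
    """
    if not children:
        raise ValueError("need at least one informative child")
    recombinants = sum(
        (allele == disease_linked_allele) != affected
        for allele, affected in children
    )
    return recombinants / len(children)
```

A value near 0.5 means the marker is inherited independently of the disease, while a value near 0.01 is the kind of tight co-inheritance that places the marker close to the gene.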
It turns out that that's enough. 584 00:47:07,000 --> 00:47:12,000 This will tell us approximately where this unknown gene must live. 585 00:47:12,000 --> 00:47:17,000 Here's a portion of the chromosome where the unknown Huntington's 586 00:47:17,000 --> 00:47:22,000 disease gene lives. Here's a genetic variant, 587 00:47:22,000 --> 00:47:27,000 and here's a genetic variant, a marker, that shows correlation. 588 00:47:27,000 --> 00:47:32,000 Maybe there's only 1% recombination here, and 1% recombination here. 589 00:47:32,000 --> 00:47:36,000 And, that's the powerful thing about Sturtevant's idea. 590 00:47:36,000 --> 00:47:41,000 It works in fruit flies. It works in humans. If I have any 591 00:47:41,000 --> 00:47:45,000 genetic variation and it's 99% correlated, or only recombines 1% of 592 00:47:45,000 --> 00:47:50,000 the time, it tells me that this unknown gene must be nearby. 593 00:47:50,000 --> 00:47:55,000 So, I could use this genetic marker as a DNA probe to wash over a 594 00:47:55,000 --> 00:48:00,000 library to get a big piece of DNA from this region. 595 00:48:00,000 --> 00:48:04,000 I can take this piece of DNA and use it as a probe, 596 00:48:04,000 --> 00:48:09,000 a radioactive probe, to get an overlapping piece of DNA. 597 00:48:09,000 --> 00:48:13,000 I can use the end of this DNA as a probe to wash over a library and get 598 00:48:13,000 --> 00:48:18,000 the next piece of DNA. And, I can do the same thing here. 599 00:48:18,000 --> 00:48:22,000 Once I have any piece of DNA that's even vaguely in the neighborhood, 600 00:48:22,000 --> 00:48:27,000 I can use it as a probe to wash over a library and get a piece of DNA, 601 00:48:27,000 --> 00:48:31,000 use it to get the next piece, the next piece, the next piece, 602 00:48:31,000 --> 00:48:36,000 in a process that was called chromosomal walking. 
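Chromosomal walking can be sketched as a loop: find a clone with the marker, use the end of that clone as the next probe, and repeat. This toy version treats clones as plain strings and hybridization as exact substring matching, walks in one direction only, and uses made-up names (`chromosome_walk`, `probe_len`, `max_steps`) chosen for illustration.

```python
def chromosome_walk(
    library: dict[str, str],
    marker: str,
    probe_len: int = 4,
    max_steps: int = 10,
) -> list[str]:
    """Return clone names in the order a one-directional walk visits them."""
    # Step 1: screen the library with the linked marker as the probe.
    current = next(
        (name for name, seq in library.items() if marker in seq), None
    )
    if current is None:
        return []
    path = [current]
    for _ in range(max_steps):
        # Step 2: use the end of the current clone as the next probe.
        end_probe = library[current][-probe_len:]
        # Step 3: pick a clone not yet visited that contains that probe.
        nxt = next(
            (
                name
                for name, seq in library.items()
                if name not in path and end_probe in seq
            ),
            None,
        )
        if nxt is None:
            break
        path.append(nxt)
        current = nxt
    return path
```

Each pass through the loop stands for a full round of probe labeling, filter screening and clone recovery, so a walk of many steps is slow at the bench.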
603 00:48:36,000 --> 00:48:40,000 That gives me a series of clones that I know must cover the region 604 00:48:40,000 --> 00:48:45,000 for this unknown gene. I then begin to analyze them and I 605 00:48:45,000 --> 00:48:50,000 say, let's look at some more genetic markers, a genetic marker a little 606 00:48:50,000 --> 00:48:55,000 closer and a little closer and a little closer. 607 00:48:55,000 --> 00:49:00,000 Which ones show perfect correlation with Huntington's disease? 608 00:49:00,000 --> 00:49:04,000 And, that narrows me down to a small number of clones that must contain 609 00:49:04,000 --> 00:49:09,000 the gene, even though I had no idea in advance what that gene was. 610 00:49:09,000 --> 00:49:14,000 This is called cloning by position. And, that's a very powerful 611 00:49:14,000 --> 00:49:19,000 technique of genetics because you don't need to know in advance what's 612 00:49:19,000 --> 00:49:24,000 wrong with a diseased gene. You first figure out where it is, 613 00:49:24,000 --> 00:49:29,000 and then you get the clones to figure out what it is. So, 614 00:49:29,000 --> 00:49:33,000 this actually works. Now, the process of getting the next 615 00:49:33,000 --> 00:49:37,000 piece, and the next clone, and the next clone, is unbelievably 616 00:49:37,000 --> 00:49:41,000 boring and tedious. And, for Huntington's disease, 617 00:49:41,000 --> 00:49:45,000 this process took nine years. Of course, now, how would you do it? 618 00:49:45,000 --> 00:49:49,000 Go to the web because with all of this process of the human genome, 619 00:49:49,000 --> 00:49:53,000 you've got all these clones laid out already. And so, 620 00:49:53,000 --> 00:49:57,000 the work that used to take years now is, once you have a genetic marker 621 00:49:57,000 --> 00:50:01,000 that's close to Huntington's you can just look up all the clones in the 622 00:50:01,000 --> 00:50:05,000 neighborhood and actually all the sequences in the neighborhood. 
623 00:50:05,000 --> 00:50:09,000 So, this process has gone from nine years to, if you had to do this again, 624 00:50:09,000 --> 00:50:14,000 you could get that region for Huntington's disease in a couple 625 00:50:14,000 --> 00:50:18,000 weeks. Now the question is, how do you analyze that region? 626 00:50:18,000 --> 00:50:23,000 How do you know what's in that region? How do you know what the 627 00:50:23,000 --> 00:50:27,000 genes are that are in that region? And that's what we'll talk about 628 00:50:27,000 --> 00:50:32,000 next time.