1 00:00:03,000 --> 00:00:17,000 Good morning. 2 00:00:17,000 --> 00:00:23,000 It can't go without at least some acknowledgement dimension. 3 00:00:23,000 --> 00:00:29,000 If you should ever find yourself in 4 00:00:29,000 --> 00:00:35,000 life in a situation where you have or are about to give up all hope, 5 00:00:35,000 --> 00:00:42,000 you think things are utterly impossible and there's no way, 6 00:00:42,000 --> 00:00:49,000 you will remember this week that nothing is impossible. 7 00:00:49,000 --> 00:00:53,000 It is possible to come back three games down in the bottom of the 8 00:00:53,000 --> 00:00:58,000 ninth inning, you've got to believe you can do it, 9 00:00:58,000 --> 00:01:03,000 and remember to have Dave Roberts pinch run. 10 00:01:03,000 --> 00:01:06,000 Just a general bit of good advice. What an amazing week, just 11 00:01:06,000 --> 00:01:09,000 absolutely amazing week. Wow. There are lessons in life to 12 00:01:09,000 --> 00:01:12,000 be taken from it. Please do take them. 13 00:01:12,000 --> 00:01:15,000 You know, there really are. I mean I'd given up hope by that 14 00:01:15,000 --> 00:01:18,000 point, I confess. I wish I could say oh, 15 00:01:18,000 --> 00:01:22,000 I knew they were going to pull it out, but I didn't. 16 00:01:22,000 --> 00:01:25,000 And, boy, they pulled it out one game at a time. 17 00:01:25,000 --> 00:01:28,000 So, all of you think good thoughts this week. This could be a historic 18 00:01:28,000 --> 00:01:33,000 week, you know, you were here. Anyway, onward. 19 00:01:33,000 --> 00:01:45,000 We were talking last time about how 20 00:01:45,000 --> 00:01:49,000 to analyze your clone. 
The notion of cloning random pieces 21 00:01:49,000 --> 00:01:54,000 of DNA, identifying your clone within a library, 22 00:01:54,000 --> 00:01:59,000 purifying the DNA from a clone, doing some preliminary analysis by 23 00:01:59,000 --> 00:02:03,000 maybe cutting with a restriction enzyme, then sequencing it using 24 00:02:03,000 --> 00:02:08,000 these techniques that I'd described would allow you to take the clone 25 00:02:08,000 --> 00:02:13,000 that was, say, able to rescue the yeast that 26 00:02:13,000 --> 00:02:17,000 couldn't grow without arginine and figure out what its 27 00:02:17,000 --> 00:02:22,000 DNA sequence was. You could take the clone that you 28 00:02:22,000 --> 00:02:26,000 had obtained by hybridizing with the DNA sequence corresponding to the 29 00:02:26,000 --> 00:02:31,000 protein sequence for beta-globin and sequence it and see the beta-globin 30 00:02:31,000 --> 00:02:36,000 gene sequence perhaps. This is very powerful. 31 00:02:36,000 --> 00:02:40,000 I want to take a brief moment, we'll come back to it in more detail 32 00:02:40,000 --> 00:02:44,000 in a subsequent lecture, but I really described how you would 33 00:02:44,000 --> 00:02:49,000 sequence one clone. I just want to make a note, 34 00:02:49,000 --> 00:02:53,000 because someone asked about it last time, about how you would sequence 35 00:02:53,000 --> 00:02:58,000 an entire genome. Someone asked about this. 36 00:02:58,000 --> 00:03:04,000 Remember before we pulled out our clone, we sequenced it, 37 00:03:04,000 --> 00:03:10,000 we got its DNA sequence. What if I wanted to sequence the 38 00:03:10,000 --> 00:03:17,000 entirety of a genome? Yeah. Do a lot of this, 39 00:03:17,000 --> 00:03:23,000 right, basically if I got a whole genome. Well, 40 00:03:23,000 --> 00:03:30,000 somebody asked could I put a primer here and just sequence? 41 00:03:30,000 --> 00:03:34,000 It would take a very long time. 
And it turns out that it wouldn't 42 00:03:34,000 --> 00:03:38,000 work because the separation that you can achieve through gels is a 43 00:03:38,000 --> 00:03:43,000 function, the separation between N and N plus 1 in length goes like the 44 00:03:43,000 --> 00:03:47,000 logarithm of the ratio. So, it turns out that when N and N 45 00:03:47,000 --> 00:03:52,000 plus 1 get to like about a thousand, you can achieve very little physical 46 00:03:52,000 --> 00:03:56,000 separation between them. And so, DNA sequencing runs cannot 47 00:03:56,000 --> 00:04:01,000 go much past a thousand bases. So, the problem with sequencing a 48 00:04:01,000 --> 00:04:05,000 genome by putting down a primer on an extraordinarily long piece of DNA, 49 00:04:05,000 --> 00:04:09,000 a hundred million bases, is you cannot separate the little 50 00:04:09,000 --> 00:04:14,000 fragments like that. So, what you do is you break up 51 00:04:14,000 --> 00:04:18,000 your genome into lots of pieces. One strategy, break it up into a 52 00:04:18,000 --> 00:04:22,000 library of some very big pieces. It turns out you can make pieces at 53 00:04:22,000 --> 00:04:27,000 random of a hundred thousand base pairs. 54 00:04:27,000 --> 00:04:31,000 Cloning these in bacterial artificial chromosomes, 55 00:04:31,000 --> 00:04:36,000 as we talked about before. Take a library of bacterial 56 00:04:36,000 --> 00:04:40,000 artificial chromosomes and then begin sequencing them. 57 00:04:40,000 --> 00:04:45,000 And take any given bacterial artificial chromosome and break it 58 00:04:45,000 --> 00:04:49,000 up into a whole lot of pieces that are maybe a thousand bases long, 59 00:04:49,000 --> 00:04:54,000 and you could sequence all of those. How do you arrange to get just a 60 00:04:54,000 --> 00:04:58,000 perfect overlapping set of thousand-base-pair clones that perfectly 61 00:04:58,000 --> 00:05:03,000 tile across the sequence with no redundancy? 62 00:05:03,000 --> 00:05:06,000 You don't.
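Going back to the gel-resolution point made a moment ago, the statement that the separation between fragments of length N and N plus 1 "goes like the logarithm of the ratio" can be written out as a short worked relation. This is only a sketch of that one sentence: the symbol \Delta(N) and the unspecified proportionality constant are assumptions introduced here, not something given in the lecture.

```latex
\[
  \Delta(N) \;\propto\; \ln\frac{N+1}{N}
            \;=\; \ln\!\left(1 + \frac{1}{N}\right)
            \;\approx\; \frac{1}{N}
  \qquad (N \gg 1)
\]
\[
  N = 10:\ \ln\tfrac{11}{10} \approx 0.095, \qquad
  N = 100:\ \ln\tfrac{101}{100} \approx 0.00995, \qquad
  N = 1000:\ \ln\tfrac{1001}{1000} \approx 0.0010
\]
```

Under that reading, the relative separation near a thousand bases is roughly a hundred times smaller than near ten bases, which is consistent with the remark that sequencing runs cannot go much past a thousand bases.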
That's the correct answer. That's how you do it, 63 00:05:06,000 --> 00:05:10,000 you don't. Instead you just randomly take a bunch of things. 64 00:05:10,000 --> 00:05:13,000 And, in fact, typically you might take clones that give you six or 65 00:05:13,000 --> 00:05:17,000 eight-fold redundancy. You just sequence a lot of clones 66 00:05:17,000 --> 00:05:20,000 and then you ask the computer to reassemble it. 67 00:05:20,000 --> 00:05:24,000 And, in fact, all that overlap is very good for being able to stick 68 00:05:24,000 --> 00:05:27,000 these pieces together. Sometimes people do such things as 69 00:05:27,000 --> 00:05:31,000 take pieces that might be four thousand bases long and sequence a 70 00:05:31,000 --> 00:05:35,000 thousand bases here and a thousand bases here by using a primer that 71 00:05:35,000 --> 00:05:39,000 starts there and a primer that starts there. And then you can get 72 00:05:39,000 --> 00:05:42,000 DNA sequences from two ends of a clone. And if you had that for 73 00:05:42,000 --> 00:05:46,000 zillions of clones your computer program might do an even better job 74 00:05:46,000 --> 00:05:50,000 of linking things up. It's one very big crossword puzzle 75 00:05:50,000 --> 00:05:54,000 of putting together all of these pieces, a jigsaw puzzle of putting 76 00:05:54,000 --> 00:05:58,000 together all these pieces. But, in effect, this is how you 77 00:05:58,000 --> 00:06:01,000 sequence a big piece of DNA. You chop it up into medium-sized 78 00:06:01,000 --> 00:06:05,000 pieces of DNA and then tiny pieces of DNA, you sequence them, 79 00:06:05,000 --> 00:06:09,000 and you use computational science to reassemble it. 80 00:06:09,000 --> 00:06:13,000 Some people, for some genomes, take the whole big genome and 81 00:06:13,000 --> 00:06:16,000 immediately go to lots of little pieces. That can work, 82 00:06:16,000 --> 00:06:20,000 too. It depends on exactly how complicated your genome is.
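The "sequence a lot of overlapping pieces and ask the computer to reassemble it" step can be illustrated with a toy greedy assembler. This is a minimal sketch, not the method used by any real genome project: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the 25-base example sequence are all made up for illustration, and real assemblers must also cope with sequencing errors, both strands, and repeats.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len):
    """Repeatedly merge the pair of pieces with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs
               if not any(r != other and r in other for other in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left that overlaps enough
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs)
                if idx not in (best_i, best_j) and c not in merged]
        contigs = rest + [merged]


# Made-up 25-base "genome", read as four 10-base pieces in scrambled order.
genome = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [genome[10:20], genome[0:10], genome[15:25], genome[5:15]]
print(greedy_assemble(reads, min_len=4))
```

The redundancy mentioned in the lecture is what makes this work: each piece shares several bases with a neighbor, so the order can be recovered from the overlaps alone. Near-identical repeats break this simple scheme, which is one reason intermediate-sized clones and paired end reads help.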
83 00:06:20,000 --> 00:06:24,000 In the human genome, there are some parts of a human genome that are 84 00:06:24,000 --> 00:06:28,000 almost identical that might be like 99.91% identical in two different 85 00:06:28,000 --> 00:06:32,000 parts of the genome. And so, if you do that, 86 00:06:32,000 --> 00:06:36,000 you may have trouble telling those pieces apart. So, 87 00:06:36,000 --> 00:06:40,000 for really complicated genomes people like sometimes breaking it up 88 00:06:40,000 --> 00:06:45,000 into intermediate-sized pieces. But basically the idea of 89 00:06:45,000 --> 00:06:49,000 sequencing a big piece of DNA by this process is referred to as 90 00:06:49,000 --> 00:06:53,000 shotgun sequencing. Shotgun sequencing, 91 00:06:53,000 --> 00:06:58,000 in fact, was developed in about 1980 by Fred Sanger, 92 00:06:58,000 --> 00:07:02,000 the same guy who developed the DNA sequencing technique that I told you 93 00:07:02,000 --> 00:07:07,000 about using polymerase and dideoxynucleotides. 94 00:07:07,000 --> 00:07:09,000 Sanger very quickly wanted to go from sequencing a single piece to 95 00:07:09,000 --> 00:07:12,000 sequencing pieces, and so he developed the shotgun 96 00:07:12,000 --> 00:07:15,000 technique there. And it's now been applied in many 97 00:07:15,000 --> 00:07:18,000 different forms of intermediate shotguns, whole genome shotguns, 98 00:07:18,000 --> 00:07:21,000 et cetera. So, that's in reply to the question someone asked last time 99 00:07:21,000 --> 00:07:24,000 about, well, how would you do a whole genome? And, 100 00:07:24,000 --> 00:07:27,000 as a matter of fact, this is not theoretical because, 101 00:07:27,000 --> 00:07:30,000 in fact, people do whole genomes this way. And we do this at MIT. 102 00:07:30,000 --> 00:07:35,000 Lots of genomes get done here in this fashion. Someone else asked 103 00:07:35,000 --> 00:07:40,000 how would you analyze your clone.
And, again, I'll just make a brief 104 00:07:40,000 --> 00:07:46,000 remark on that in response to the question. So, 105 00:07:46,000 --> 00:07:51,000 analyzing some DNA sequence. So, suppose we got some DNA 106 00:07:51,000 --> 00:07:57,000 sequence, A-A-T-A, don't bother writing this down. 107 00:07:57,000 --> 00:08:03,000 I'm just making up letters here. How would we make any sense of it? 108 00:08:03,000 --> 00:08:09,000 Suppose I give you the ones and zeros from your hard-drive, 109 00:08:09,000 --> 00:08:15,000 how would you make any sense out of them? This is about as interesting 110 00:08:15,000 --> 00:08:21,000 as the ones and zeros from your hard-drive, right? 111 00:08:21,000 --> 00:08:27,000 It's got four letters, not two, but this is actually what 112 00:08:27,000 --> 00:08:31,000 you get out of any project. You want to sequence beta-globin? 113 00:08:31,000 --> 00:08:35,000 You'll get something like this. You want to sequence the arginine 114 00:08:35,000 --> 00:08:39,000 gene? You want to sequence the human genome? You get a very long 115 00:08:39,000 --> 00:08:43,000 string of four letters. What do you do with it? 116 00:08:43,000 --> 00:08:46,000 Oh, well, you could compare it to a normal copy of the gene. 117 00:08:46,000 --> 00:08:50,000 And if I did that I might find a bunch of differences. 118 00:08:50,000 --> 00:08:54,000 But how would I even know where the beta-globin gene was within this 119 00:08:54,000 --> 00:08:58,000 sequence? This clone contains beta-globin. How would I even find 120 00:08:58,000 --> 00:09:02,000 the exons? Yes? Or whatever? Look at codons. 121 00:09:02,000 --> 00:09:08,000 So, let's start looking at the codons. This codon here? 122 00:09:08,000 --> 00:09:13,000 Well, or this, maybe it's this codon here. Sorry? 123 00:09:13,000 --> 00:09:19,000 Find it. Do you see any start codons here? Oh, 124 00:09:19,000 --> 00:09:24,000 there's an ATG there.
So, maybe that's the start codon or 125 00:09:24,000 --> 00:09:30,000 maybe not. How often do we expect to find an ATG in some 126 00:09:30,000 --> 00:09:35,000 reading frame? You know, it could happen fairly 127 00:09:35,000 --> 00:09:39,000 easily. Also, how do we know it's going this way? 128 00:09:39,000 --> 00:09:44,000 Maybe we should look for an ATG, we'll put it there, 129 00:09:44,000 --> 00:09:48,000 going this way. Sorry? I drew the arrow there? 130 00:09:48,000 --> 00:09:53,000 Well, that's because it's where the sequence started out on my page. 131 00:09:53,000 --> 00:09:57,000 It doesn't tell me my gene runs that way. Yes? From five 132 00:09:57,000 --> 00:10:02,000 prime to three prime. Ah, but it's a double-stranded piece 133 00:10:02,000 --> 00:10:06,000 of DNA. You see, if it's five prime to three prime on 134 00:10:06,000 --> 00:10:11,000 this strand, the genome has another strand that reads the other 135 00:10:11,000 --> 00:10:16,000 way. What did I get? C-A-T-A, right, C-C-T, 136 00:10:16,000 --> 00:10:20,000 et cetera. And the gene could be encoded on this strand. 137 00:10:20,000 --> 00:10:25,000 This could be the coding strand, that could be the coding strand, and 138 00:10:25,000 --> 00:10:30,000 looking for a mere ATG in one of three possible reading frames on one 139 00:10:30,000 --> 00:10:35,000 of two possible strands I'll find all sorts of stuff. 140 00:10:35,000 --> 00:10:38,000 So, sorry? Guess. Guess. Guess is good. 141 00:10:38,000 --> 00:10:42,000 They don't, don't, won't you, you remember we talked about getting 142 00:10:42,000 --> 00:10:46,000 papers accepted. If you were to write up the paper 143 00:10:46,000 --> 00:10:50,000 that way the reviewers would probably ding it and say, 144 00:10:50,000 --> 00:10:54,000 you know, the guess isn't, isn't good enough. So, that's 145 00:10:54,000 --> 00:10:58,000 actually very interesting. How do you actually find the gene 146 00:10:58,000 --> 00:11:02,000 sequence?
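The point about three reading frames on each of two strands can be made concrete with a few lines of code. This is a minimal sketch under simple assumptions: the helper names `reverse_complement` and `atg_positions` and the 11-letter example sequence are made up here, and the sequence is assumed to contain only the letters A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def atg_positions(seq):
    """Map (strand, frame) to the start indices of every in-frame ATG.

    Indices for the "-" strand are positions within the reverse complement.
    """
    hits = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            hits[(strand, frame)] = [
                i for i in range(frame, len(s) - 2, 3) if s[i:i + 3] == "ATG"
            ]
    return hits


example = "AATATGCCCAT"  # made-up letters, as in the lecture
for key, positions in atg_positions(example).items():
    print(key, positions)
```

Even in this eleven-letter string there is an ATG on each strand, which illustrates why a lone ATG is weak evidence for the start of a gene.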
Well, it turns out to be a 147 00:11:02,000 --> 00:11:06,000 non-trivial problem which often gets glossed over in the textbooks. 148 00:11:06,000 --> 00:11:11,000 What you might do is if something really were exonic, 149 00:11:11,000 --> 00:11:16,000 if this were an exon, does it have any properties that you 150 00:11:16,000 --> 00:11:20,000 can think of? It shouldn't have a stop codon. No stop codon. 151 00:11:20,000 --> 00:11:25,000 How often does a stop codon occur at random in a given reading frame? 152 00:11:25,000 --> 00:11:30,000 How many stop codons are there? Three out of 64 possible codons. 153 00:11:30,000 --> 00:11:33,000 There's about one in 20 codons in any given reading frame is a stop 154 00:11:33,000 --> 00:11:37,000 codon. So, that means if I read for about 20 codons, 155 00:11:37,000 --> 00:11:41,000 and I don't encounter a stop, it's beginning to get more likely 156 00:11:41,000 --> 00:11:45,000 that that's not random. If I read for, say, 60 codons, 157 00:11:45,000 --> 00:11:49,000 180 bases and I've encountered no stop codon in that reading frame, 158 00:11:49,000 --> 00:11:53,000 the chance of that occurring is about e to the minus three or so, 159 00:11:53,000 --> 00:11:57,000 right? Because if I went through three characteristic lengths, 160 00:11:57,000 --> 00:12:01,000 e to the minus three, you know, and I don't know, about 5% or 161 00:12:01,000 --> 00:12:05,000 something like that. If I went for thousands of bases 162 00:12:05,000 --> 00:12:09,000 without any stop codon, would you be impressed? That's 163 00:12:09,000 --> 00:12:14,000 pretty impressive. So, all I have to do is find a 164 00:12:14,000 --> 00:12:18,000 few thousand bases with no stop codon. The problem with that is 165 00:12:18,000 --> 00:12:23,000 that in bacteria there are some genes that are a thousand bases long 166 00:12:23,000 --> 00:12:27,000 and you, there, you can read them and they have no 167 00:12:27,000 --> 00:12:33,000 stop codon.
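The back-of-the-envelope estimate above (three stop codons out of 64, so roughly one in 20, and about e to the minus three for 60 codons) can be checked numerically. This is a small sketch that assumes, as the estimate does, that every codon is equally likely and independent of its neighbors; the function name `prob_no_stop` is made up for illustration.

```python
import math

P_STOP = 3 / 64  # three of the 64 codons are stops, assuming uniform random sequence


def prob_no_stop(n_codons):
    """Chance that n_codons consecutive random codons contain no stop."""
    return (1 - P_STOP) ** n_codons


print(f"characteristic length: {1 / P_STOP:.1f} codons")
print(f"60 codons, exact:      {prob_no_stop(60):.4f}")
print(f"e to the minus three:  {math.exp(-3):.4f}")
print(f"1000 codons:           {prob_no_stop(1000):.2e}")
```

The exact figure for 60 codons comes out a little above 5%, close to the rounded estimate, while a stop-free run of a thousand codons is vanishingly unlikely by chance.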
What's the problem with the human 168 00:12:33,000 --> 00:12:39,000 genome? Introns. It turns out that because the 169 00:12:39,000 --> 00:12:46,000 coding sequences are broken up into small exons, if I found a thousand 170 00:12:46,000 --> 00:12:53,000 bases with no stop codons then it's very likely coding sequence. 171 00:12:53,000 --> 00:13:00,000 But a typical human exon is on the order of 150 to 200 bases. 172 00:13:00,000 --> 00:13:04,000 Very inconvenient because, you know, a typical exon 173 00:13:04,000 --> 00:13:08,000 encodes 50, 60, 70 codons. So, it turns out that 174 00:13:08,000 --> 00:13:12,000 even that is not so easy to do. Well, the answer is it's not a 175 00:13:12,000 --> 00:13:17,000 trivial problem. People do all sorts of things to 176 00:13:17,000 --> 00:13:21,000 figure out how to decode sequences of genomes. You do run computer 177 00:13:21,000 --> 00:13:25,000 filters across there that say, look, there are a bunch of 178 00:13:25,000 --> 00:13:30,000 consecutive codons without stop codons. 179 00:13:30,000 --> 00:13:33,000 There tend to be little preferences, like amongst the synonymous choices 180 00:13:33,000 --> 00:13:36,000 of codons, humans tend to prefer 181 00:13:36,000 --> 00:13:39,000 one codon for a specific amino acid over others. So, 182 00:13:39,000 --> 00:13:43,000 there are some biases as to which codons get used. 183 00:13:43,000 --> 00:13:46,000 And the computer can kind of take a little bit of account of that. 184 00:13:46,000 --> 00:13:49,000 Then you can also have made a library of cDNAs and sequence 185 00:13:49,000 --> 00:13:53,000 cDNA, the mRNA which will help you a lot and look for where they 186 00:13:53,000 --> 00:13:56,000 match up. Then you can take sequences from the human 187 00:13:56,000 --> 00:13:59,000 and the mouse.
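The "computer filter" that looks for runs of consecutive codons without a stop can be sketched as follows. This is an illustrative toy rather than a real gene finder: the function name `stop_free_runs`, the `min_codons` parameter, and the example sequence are assumptions made here, and only one strand is scanned (the other strand would be handled by scanning the reverse complement the same way).

```python
STOPS = {"TAA", "TAG", "TGA"}


def stop_free_runs(seq, min_codons):
    """List (frame, start_index, n_codons) for each stop-free run of codons.

    Only runs of at least `min_codons` codons are reported.
    """
    runs = []
    for frame in range(3):
        run_start, n = frame, 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                if n >= min_codons:
                    runs.append((frame, run_start, n))
                run_start, n = i + 3, 0  # next run begins after the stop
            else:
                n += 1
        if n >= min_codons:  # run that reaches the end of the sequence
            runs.append((frame, run_start, n))
    return runs


example = "ATGGCCAAATAAGGG"  # made-up sequence
print(stop_free_runs(example, min_codons=3))
```

On this short made-up string every frame passes a three-codon threshold, which mirrors the difficulty described above: with exons of only 50 to 70 codons, a length threshold low enough to catch them also lets through many stop-free stretches that arise by chance.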
And it turns out that the sequences 188 00:13:59,000 --> 00:14:03,000 in the mouse and the sequences in the human, if you line them up, 189 00:14:03,000 --> 00:14:06,000 the exons tend to match up better than the introns because evolution 190 00:14:06,000 --> 00:14:09,000 cares a lot about the exons. But it turns out this is not a 191 00:14:09,000 --> 00:14:13,000 trivial problem. And even today, if I give you a 192 00:14:13,000 --> 00:14:16,000 random stretch of human DNA, there is no simple 193 00:14:16,000 --> 00:14:20,000 computer program, not even a 194 00:14:20,000 --> 00:14:23,000 complicated computer program, that on its own would be able, 195 00:14:23,000 --> 00:14:27,000 without auxiliary data, to accurately pick out all the genes. 196 00:14:27,000 --> 00:14:30,000 Even for simple bacteria, we cannot nail perfectly all the 197 00:14:30,000 --> 00:14:34,000 genes. Although, the lack of these introns means that 198 00:14:34,000 --> 00:14:38,000 the exons tend to be pretty big, it means the coding regions are pretty big 199 00:14:38,000 --> 00:14:42,000 and we can kind of do it. So, I just wanted to point out 200 00:14:42,000 --> 00:14:46,000 that there's a lot still to be done there. The cell manages, 201 00:14:46,000 --> 00:14:50,000 thank you, to read this just fine, but we're not as smart yet as the 202 00:14:50,000 --> 00:14:54,000 cell, and so we're not totally able to read out all this stuff. 203 00:14:54,000 --> 00:14:58,000 We'll come back to genomics in a, in a further lecture. Yes? Yeah, 204 00:14:58,000 --> 00:15:01,000 wouldn't -- What a cool idea. 205 00:15:01,000 --> 00:15:04,000 Yes.
There are actually some experiments, which maybe if you 206 00:15:04,000 --> 00:15:08,000 remind me we can, I can work it into a subsequent 207 00:15:08,000 --> 00:15:11,000 lecture, but people have some experiments where they can randomly 208 00:15:11,000 --> 00:15:14,000 mutagenize zillions of bacteria and determine which ones will grow and 209 00:15:14,000 --> 00:15:18,000 which ones won't. And they can do it all in parallel 210 00:15:18,000 --> 00:15:21,000 in a single test-tube. And thereby you can tell which, 211 00:15:21,000 --> 00:15:24,000 which nucleotides in the genome matter and which don't. 212 00:15:24,000 --> 00:15:28,000 It's a kind of cool procedure. OK. Anyway, I just wanted to sort 213 00:15:28,000 --> 00:15:31,000 of tie up that bit here. Now, let's move on to re-sequencing 214 00:15:31,000 --> 00:15:35,000 a gene. So, let's suppose we've managed to 215 00:15:35,000 --> 00:15:39,000 sequence, I don't know, the human genome, the entirety of 216 00:15:39,000 --> 00:15:43,000 the human genome we have before us, OK? Actually, next week in the 217 00:15:43,000 --> 00:15:47,000 journal Nature will appear a paper reporting, in fact, 218 00:15:47,000 --> 00:15:51,000 today, yester, no, yesterday, in fact, it was yesterday, yesterday 219 00:15:51,000 --> 00:15:55,000 appeared in the journal Nature a paper reporting the finished 220 00:15:55,000 --> 00:15:59,000 sequence of the human genome. 221 00:15:59,000 --> 00:16:03,000 http://www.nature.com/nature/journal/v431/n7011/pdf/nature03001.pdf 222 00:16:03,000 --> 00:16:07,000 And so, anybody who wants to go 223 00:16:07,000 --> 00:16:11,000 online, in fact, we can get, we'll get copies for the 224 00:16:11,000 --> 00:16:14,000 class. Why don't we get you copies of the paper? It's not as long as 225 00:16:14,000 --> 00:16:17,000 the last one. But, in fact, I didn't realize it was 226 00:16:17,000 --> 00:16:21,000 yesterday. I thought it was next week.
Yesterday came out the final 227 00:16:21,000 --> 00:16:24,000 report on the finished sequence of the human genome, 228 00:16:24,000 --> 00:16:28,000 which a number of us have been laboring on for quite a long time. 229 00:16:28,000 --> 00:16:31,000 And it just appeared. So, we actually have that now. 230 00:16:31,000 --> 00:16:35,000 It actually, it's been on the Web for a while, but the paper 231 00:16:35,000 --> 00:16:39,000 describing it took a while to write up and it came out yesterday. 232 00:16:39,000 --> 00:16:43,000 So, we'll get you a copy of that. But now you've got that whole 233 00:16:43,000 --> 00:16:46,000 sequence of the human genome here. I've been, you know, I've been 234 00:16:46,000 --> 00:16:50,000 working on this paper with people for so long that, 235 00:16:50,000 --> 00:16:54,000 you know, I hadn't actually paid attention to the fact that it just 236 00:16:54,000 --> 00:16:58,000 came out. You don't want to know how long it took to write this. 237 00:16:58,000 --> 00:17:01,000 The paper, actually, is unusual. It's the only paper that Nature has 238 00:17:01,000 --> 00:17:05,000 ever published where the author list is sufficiently long that we don't 239 00:17:05,000 --> 00:17:09,000 have it in the Journal. There's a website that contains the 240 00:17:09,000 --> 00:17:13,000 author list. There are, I believe, I don't have the final 241 00:17:13,000 --> 00:17:16,000 count, but something in the neighborhood of about 200 authors to 242 00:17:16,000 --> 00:17:20,000 the paper. We decided that everybody who'd worked on it should 243 00:17:20,000 --> 00:17:24,000 be a co-author of the paper, and we just put it all on a website. 244 00:17:24,000 --> 00:17:28,000 So, anyway, I digress. So, suppose we have the beta-globin gene here. 
245 00:17:28,000 --> 00:17:31,000 So, I've got that in -- I've got the normal form of the 246 00:17:31,000 --> 00:17:34,000 beta-globin gene, or I've got one person's form, 247 00:17:34,000 --> 00:17:37,000 in any case, in the human genome sequence. Now I want to take a 248 00:17:37,000 --> 00:17:41,000 patient with sickle cell anemia and I want to re-sequence their gene. 249 00:17:41,000 --> 00:17:44,000 Now, remember what we said, we would, we would make a library from 250 00:17:44,000 --> 00:17:47,000 that person, right? So, we'd get that person's blood, 251 00:17:47,000 --> 00:17:50,000 we'd purify DNA, we'd cut it, we'd clone it, we'd probe the library 252 00:17:50,000 --> 00:17:53,000 with a radioactive probe for the beta-globin gene, 253 00:17:53,000 --> 00:17:56,000 we'd pull out the gene and we'd re-sequence it. 254 00:17:56,000 --> 00:18:00,000 Suppose we wanted to do that to a hundred patients. 255 00:18:00,000 --> 00:18:03,000 For every patient we'd get blood, we'd make DNA, we'd clone in a 256 00:18:03,000 --> 00:18:07,000 plasmid, we'd make a whole plasmid library, plate it out on filters, 257 00:18:07,000 --> 00:18:10,000 probe it with a radioactive probe, pull out the clone and sequence it. 258 00:18:10,000 --> 00:18:14,000 Now, for any such library you probably need to look through a 259 00:18:14,000 --> 00:18:18,000 couple hundred thousand clones to find beta-globin. 260 00:18:18,000 --> 00:18:21,000 So, for your DNA and your DNA and your DNA and your DNA, 261 00:18:21,000 --> 00:18:25,000 we're going to make libraries of a hundred thousand clones, 262 00:18:25,000 --> 00:18:29,000 that's a couple, that's a lot of plates, right?
263 00:18:29,000 --> 00:18:32,000 We're going to put them all on, on nylon filters in these 264 00:18:32,000 --> 00:18:35,000 Seal-a-Meal bags with these radioactive probes, 265 00:18:35,000 --> 00:18:38,000 and we're going to look for your beta-globin clone, 266 00:18:38,000 --> 00:18:42,000 your beta-globin clone, your beta-globin clone, your 267 00:18:42,000 --> 00:18:45,000 beta-globin clone, et cetera, et cetera, 268 00:18:45,000 --> 00:18:48,000 et cetera. This is really boring. Do you realize how off putting it 269 00:18:48,000 --> 00:18:51,000 would be to study sickle cell anemia if we had to do that for each 270 00:18:51,000 --> 00:18:55,000 successive patient, make a whole library? 271 00:18:55,000 --> 00:18:58,000 But that was what you had to do in molecular biology because that was 272 00:18:58,000 --> 00:19:02,000 how you got the gene. You build a whole library, 273 00:19:02,000 --> 00:19:07,000 you withdraw it from the library. However, if you wanted to do this, 274 00:19:07,000 --> 00:19:12,000 could you manage to get the beta-globin sequence from your 275 00:19:12,000 --> 00:19:16,000 genome without having to make the whole library? 276 00:19:16,000 --> 00:19:21,000 It turns out, and I know it's been covered at least in some of the 277 00:19:21,000 --> 00:19:26,000 sections, there's a cool technique to do that. And what is 278 00:19:26,000 --> 00:19:32,000 that technique? PCR. So, it turns out that the next 279 00:19:32,000 --> 00:19:39,000 really great advance in molecular biology was the technique of PCR. 280 00:19:39,000 --> 00:19:47,000 And what PCR was a way, is, is a way to obtain a piece of DNA 281 00:19:47,000 --> 00:19:54,000 corresponding to an already known gene, you have to already know the 282 00:19:54,000 --> 00:20:01,000 gene, and what it allows you to do is then obtain that piece of DNA 283 00:20:01,000 --> 00:20:09,000 based on knowing at least some of its sequence. 
284 00:20:09,000 --> 00:20:14,000 It allows you to amplify just that DNA from a, from any individual. 285 00:20:14,000 --> 00:20:20,000 So, as compared to the experiment where I make a library for you and a 286 00:20:20,000 --> 00:20:25,000 library for you and a library from you and a library from you, 287 00:20:25,000 --> 00:20:31,000 each of which could take a month, PCR would allow us to do it in 288 00:20:31,000 --> 00:20:36,000 principle in five minutes. And, actually, 289 00:20:36,000 --> 00:20:41,000 there are machines that would let you do it in five minutes. 290 00:20:41,000 --> 00:20:46,000 So, let's discuss how this PCR works. Nobody uses the five minute 291 00:20:46,000 --> 00:20:52,000 machines because you usually will then wait an hour or so, 292 00:20:52,000 --> 00:20:57,000 but anyway. Suppose I take my DNA sequence here from 293 00:20:57,000 --> 00:21:04,000 the human genome. Five prime to three prime. 294 00:21:04,000 --> 00:21:14,000 Five prime to three prime. This sequence here beta-globin. 295 00:21:14,000 --> 00:21:23,000 I want to obtain that sequence. The first thing I do is I'm going to 296 00:21:23,000 --> 00:21:33,000 heat my DNA sample to maybe 97 degrees Celsius to denature. 297 00:21:33,000 --> 00:21:42,000 Denaturing means, of course, breaking the hydrogen 298 00:21:42,000 --> 00:21:51,000 bonds that hold the two strands together so that the strands come apart, 299 00:21:51,000 --> 00:22:00,000 five prime to three prime, five prime to three prime. 300 00:22:00,000 --> 00:22:09,000 Now, what I then do is I take a specific DNA primer matching this 301 00:22:09,000 --> 00:22:19,000 stretch just before the beta-globin gene starts. 302 00:22:19,000 --> 00:22:22,000 Or just before where I'm interested in. How do I make a primer that 303 00:22:22,000 --> 00:22:26,000 matches just that sequence? I order, well, how do I know what 304 00:22:26,000 --> 00:22:30,000 to order? I know the sequence, right?
305 00:22:30,000 --> 00:22:34,000 I've got the sequence already. I just look at it and I say I want 306 00:22:34,000 --> 00:22:38,000 that sequence. And then how do I get it? 307 00:22:38,000 --> 00:22:42,000 I order it. I type it into the Web and the machine will synthesize me 308 00:22:42,000 --> 00:22:47,000 this, this primer. Typically a 20-base stretch will 309 00:22:47,000 --> 00:22:51,000 suffice. So, I'll get me a twentymer, a 20 base oligonucleotide 310 00:22:51,000 --> 00:22:55,000 complementary to the sequence on this side of the gene. 311 00:22:55,000 --> 00:23:00,000 What I'm also going to do is the same thing over here. 312 00:23:00,000 --> 00:23:06,000 I'm going to get a second primer. This is primer number one. This is 313 00:23:06,000 --> 00:23:12,000 primer number two. OK? Now, let's see. 314 00:23:12,000 --> 00:23:19,000 Five prime. This is five prime, five prime. Now what I'd like to do 315 00:23:19,000 --> 00:23:25,000 is add polymerase, I'd like to add dNTPs. 316 00:23:25,000 --> 00:23:32,000 So, plus DNA polymerase plus dNTPs. 317 00:23:32,000 --> 00:23:36,000 And what will happen? Polymerase will come along and 318 00:23:36,000 --> 00:23:41,000 start copying my DNA, but it will only copy it starting 319 00:23:41,000 --> 00:23:45,000 from the primers. Now, this will keep going, 320 00:23:45,000 --> 00:23:50,000 of course, but DNA polymerase doesn't go forever, 321 00:23:50,000 --> 00:23:55,000 you know, the reaction sort of stops at some point. 322 00:23:55,000 --> 00:23:59,000 And so you'll get a strand going off here and a strand 323 00:23:59,000 --> 00:24:04,000 going off there. Now, notice what I've done. 324 00:24:04,000 --> 00:24:10,000 I started with an entire human genome, and the number of copies of 325 00:24:10,000 --> 00:24:15,000 beta-globin was one per genome.
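Picking the two primers from a known sequence, as just described, amounts to a small string operation. The sketch below is a simplified illustration under stated assumptions: the function name `design_primers`, its parameters, and the 16-base example template are made up, the input is taken to be one strand written 5' to 3', and real primer design also weighs things not discussed here (melting temperature, uniqueness in the genome, and so on).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(template, start, end, primer_len=20):
    """Return (primer_1, primer_2), both written 5' to 3'.

    primer_1 copies the first `primer_len` bases of the region, so it
    pairs with the opposite strand and extends rightward.
    primer_2 is the reverse complement of the last `primer_len` bases,
    so it pairs with the given strand and extends leftward.
    """
    target = template[start:end]
    if len(target) < primer_len:
        raise ValueError("region is shorter than one primer")
    primer_1 = target[:primer_len]
    primer_2 = reverse_complement(target[-primer_len:])
    return primer_1, primer_2


# Tiny made-up example with 4-base primers so the result is easy to check by eye.
print(design_primers("AAACCCGGGTTTAGGC", 0, 16, primer_len=4))
```

The two primers point toward each other across the region of interest, which is what lets each round of copying stay between them in the cycles described next.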
When I'm done with this process, 326 00:24:15,000 --> 00:24:21,000 how many copies, how many double-stranded copies of 327 00:24:21,000 --> 00:24:27,000 beta-globin do I have? Two. That's still very little, 328 00:24:27,000 --> 00:24:33,000 but it's more than I had before. So, what do I do next? 329 00:24:33,000 --> 00:24:41,000 Repeat. So, let's heat up that sample again. We'll denature at 97 330 00:24:41,000 --> 00:24:48,000 degrees, and now we have our initial strand here, we have our strand that 331 00:24:48,000 --> 00:24:56,000 came off this primer that runs to here and maybe goes forward, 332 00:24:56,000 --> 00:25:05,000 we have this strand here. We have this strand here. 333 00:25:05,000 --> 00:25:15,000 And this was five prime, five prime, five prime, five prime. 334 00:25:15,000 --> 00:25:25,000 Now what do we do? We repeat. We'll take our primer, this is 335 00:25:25,000 --> 00:25:33,000 primer number one, let's see. It matches over there. 336 00:25:33,000 --> 00:25:41,000 Primer number two over here. Number one over here. Number two. 337 00:25:41,000 --> 00:25:48,000 Have I got this right? Yes. Good. Then where does this guy stop? 338 00:25:48,000 --> 00:25:56,000 Right at the end where my other primer was. This guy 339 00:25:56,000 --> 00:26:02,000 runs along here. That guy stops right at the end. 340 00:26:02,000 --> 00:26:07,000 That guy might go a little further. How many copies of the beta-globin 341 00:26:07,000 --> 00:26:12,000 gene do I have now? Four. Two of which, 342 00:26:12,000 --> 00:26:17,000 by the way, perfectly sit between my pink primers. What's going to 343 00:26:17,000 --> 00:26:22,000 happen if I do this again? How many copies will I get? 344 00:26:22,000 --> 00:26:27,000 Eight, six of which will sit perfectly and two might be a little 345 00:26:27,000 --> 00:26:32,000 ragged as to where they go. 
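The copy counts quoted for the first few cycles (two, then four with two sitting exactly between the primers, then eight with six) can be reproduced by bookkeeping on strand types. This is an idealized sketch: it assumes every strand is copied in every cycle, and the labels "original", "long" (one end set by a primer, the other ragged) and "short" (both ends set by primers) as well as the function name `pcr_cycle_counts` are introduced here for illustration.

```python
def pcr_cycle_counts(cycles):
    """Return a list of (cycle, duplex_copies, copies_with_exact_strand).

    A copy counts as "exact" when it contains a strand bounded by both primers.
    """
    original, long_, short = 2, 0, 0  # two strands of the starting duplex
    rows = []
    for n in range(1, cycles + 1):
        new_long = original        # copying an original strand runs past the far primer site
        new_short = long_ + short  # copying a primer-started strand stops at its 5' end
        long_ += new_long
        short += new_short
        duplexes = (original + long_ + short) // 2
        # Each new short strand is paired with its template, so the number of
        # duplexes containing an exact strand equals new_short for this cycle.
        rows.append((n, duplexes, new_short))
    return rows


for cycle, copies, exact in pcr_cycle_counts(5):
    print(f"cycle {cycle}: {copies} copies, {exact} exact, {copies - exact} ragged")
```

In this idealized model the number of ragged copies stays at two while the exact copies roughly double each round, so the ragged ones quickly become a negligible fraction.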
So, initially, 346 00:26:32,000 --> 00:26:38,000 after cycle number zero, that is initial conditions, 347 00:26:38,000 --> 00:26:44,000 the number of copies relative to the genome was one. 348 00:26:44,000 --> 00:26:50,000 After one cycle it's two. After two cycles it's four. 349 00:26:50,000 --> 00:26:56,000 After N cycles it's two to the N copies. Is that clear 350 00:26:56,000 --> 00:27:01,000 how the PCR works? And that on every round you're 351 00:27:01,000 --> 00:27:07,000 doubling. And, with the exception of those two 352 00:27:07,000 --> 00:27:13,000 white things that go off to the side, they're going back and forth and 353 00:27:13,000 --> 00:27:18,000 back and forth between the two primers you chose to put in. 354 00:27:18,000 --> 00:27:24,000 What is when N equals ten, what do you got? A thousand copies. 355 00:27:24,000 --> 00:27:30,000 What happens when N equals 20? A million copies. 356 00:27:30,000 --> 00:27:34,000 What is the copy number of beta-globin? Beta, 357 00:27:34,000 --> 00:27:38,000 let's suppose beta-globin, for the sake of the argument, 358 00:27:38,000 --> 00:27:42,000 sake of argument is about one thousand bases. 359 00:27:42,000 --> 00:27:46,000 What fraction of the human genome does beta-globin represent? 360 00:27:46,000 --> 00:27:50,000 Yeah, about a millionth for the genome. No, actually, 361 00:27:50,000 --> 00:27:54,000 one three millionth, but we'll call it a millionth. 362 00:27:54,000 --> 00:27:58,000 So, after I've made a million-fold amplification of beta-globin, 363 00:27:58,000 --> 00:28:03,000 beta-globin now represents half of the stuff that's in the tube. 364 00:28:03,000 --> 00:28:08,000 What would happen if I go another ten rounds? How many copies do I 365 00:28:08,000 --> 00:28:13,000 have? A billion copies. So, in other words, I started with 366 00:28:13,000 --> 00:28:18,000 something that was only present at about one one-millionth of what was 367 00:28:18,000 --> 00:28:23,000 in my test tube. 
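The doubling arithmetic above (a thousand copies at ten cycles, a million at 20, a billion at 30) and what it does to the makeup of the tube can be worked through in a few lines. This is a rough sketch assuming perfect doubling every cycle and no amplification of anything else; the function names and the `initial_fraction` parameter are made up here, with one millionth used as the rounded starting fraction from the lecture.

```python
def copies_after(cycles):
    """Copies of the target per starting genome after ideal doubling."""
    return 2 ** cycles


def target_to_rest_ratio(cycles, initial_fraction):
    """Amount of target DNA relative to all the other DNA in the tube."""
    rest = 1.0 - initial_fraction
    return copies_after(cycles) * initial_fraction / rest


def target_fraction(cycles, initial_fraction):
    """Share of the DNA in the tube that is target sequence."""
    ratio = target_to_rest_ratio(cycles, initial_fraction)
    return ratio / (1.0 + ratio)


for n in (10, 20, 30):
    print(f"N={n}: {copies_after(n):,} copies, "
          f"target fraction {target_fraction(n, 1e-6):.4f}, "
          f"target:rest ratio {target_to_rest_ratio(n, 1e-6):.3g}")
```

With the rounded starting fraction of one millionth, the target is about half the DNA after 20 cycles and roughly a thousand times more abundant than everything else after 30. Using one three-millionth instead lowers both figures by about a factor of three without changing the conclusion.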
If I could make a billion copies of 368 00:28:23,000 --> 00:28:28,000 that specific molecule, now it so dominates the mixture that 369 00:28:28,000 --> 00:28:33,000 it is a thousand times more abundant than the rest of the genome. 370 00:28:33,000 --> 00:28:39,000 It works. That's the remarkable thing, this works. 371 00:28:39,000 --> 00:28:45,000 Any questions about the technique? Now, yes? Well, I need two primers 372 00:28:45,000 --> 00:28:52,000 in their sequence. How many copies do I need of each 373 00:28:52,000 --> 00:28:58,000 of those primers? Well, I, I obviously need a lot of 374 00:28:58,000 --> 00:29:03,000 copies of those primers. So, primer number one, 375 00:29:03,000 --> 00:29:06,000 it's a single sequence, but when I order it from the company, 376 00:29:06,000 --> 00:29:10,000 I'm going to order me a boat load, a lot of that primer. So, I'm 377 00:29:10,000 --> 00:29:13,000 adding, I better add a billion molecules of that primer because I'm 378 00:29:13,000 --> 00:29:16,000 going to make a billion copies starting from such primers. 379 00:29:16,000 --> 00:29:20,000 But if I have a billion copies of, of number one and a billion copies 380 00:29:20,000 --> 00:29:23,000 of number two and, you know, these days, 381 00:29:23,000 --> 00:29:26,000 billions aren't such big, you know, molecules are Avogadro's 382 00:29:26,000 --> 00:29:30,000 number and all that. It's not hard to get things. 383 00:29:30,000 --> 00:29:34,000 So, you throw in huge excess, a massive excess of primer number 384 00:29:34,000 --> 00:29:38,000 one, a massive excess of primer number two, and you just do this. 385 00:29:38,000 --> 00:29:42,000 Now, I mean, what does it cost to make such a massive excess of a 386 00:29:42,000 --> 00:29:46,000 primer? It's about ten cents a base, so it's two bucks, 387 00:29:46,000 --> 00:29:50,000 two bucks per primer give or take. You know, so I can get you a better 388 00:29:50,000 --> 00:29:54,000 price if you want, but, you know, anyway. 
389 00:29:54,000 --> 00:29:58,000 It's not a bad price to, to buy primer. So, you can just go 390 00:29:58,000 --> 00:30:01,000 out and order a pair of primers. You can have them tomorrow. 391 00:30:01,000 --> 00:30:05,000 And then all you have to do is add the primers to the, 392 00:30:05,000 --> 00:30:09,000 so I take DNA. Do I, I, I need DNA from you. 393 00:30:09,000 --> 00:30:13,000 It turns out I could draw your blood and purify DNA and all that. 394 00:30:13,000 --> 00:30:16,000 But it turns out that if all I wanted to do was amplify one locus, 395 00:30:16,000 --> 00:30:20,000 I could actually take a Popsicle stick and ask you to scrape the 396 00:30:20,000 --> 00:30:24,000 inside of your cheek. That'll get enough cells off from 397 00:30:24,000 --> 00:30:28,000 the inside of your cheek, stick it in a test-tube, and it'll 398 00:30:28,000 --> 00:30:31,000 actually have enough DNA there. It turns out this is a very 399 00:30:31,000 --> 00:30:35,000 sensitive and powerful technique, so, but before we get to that notice 400 00:30:35,000 --> 00:30:39,000 what we had to do. We had to heat our DNA to 97 401 00:30:39,000 --> 00:30:43,000 degrees and add polymerase. Then we heat again to 97, add 402 00:30:43,000 --> 00:30:46,000 polymerase, heat again to 97, add polymerase. Why do I have to 403 00:30:46,000 --> 00:30:50,000 keep adding polymerase? Because polymerase gets ruined at 404 00:30:50,000 --> 00:30:54,000 97 degrees so it's denatured. So, the nuisance about PCR is I 405 00:30:54,000 --> 00:30:57,000 have to go to my Eppendorf plastic tube, pop open the lid, 406 00:30:57,000 --> 00:31:01,000 stick in some DNA polymerase, close it up, stick it back in a 407 00:31:01,000 --> 00:31:05,000 heating bath, let it go for a while, take it out, pop it open, add some 408 00:31:05,000 --> 00:31:09,000 more polymerase, put it back in the heating bath, 409 00:31:09,000 --> 00:31:13,000 pop it out. 
And this is actually the way 410 00:31:13,000 --> 00:31:17,000 primitive scientists did PCR not so long ago, OK? Wouldn't it be cool 411 00:31:17,000 --> 00:31:22,000 if we could engineer a DNA polymerase that didn't denature at 412 00:31:22,000 --> 00:31:26,000 97 degrees? Because then what we could do is just add the polymerase, 413 00:31:26,000 --> 00:31:31,000 close up the tube, put it in a machine that goes heat, 414 00:31:31,000 --> 00:31:35,000 cool, heat, cool, heat, cool, heat, cool, but you would have, 415 00:31:35,000 --> 00:31:40,000 so how do we, what kind of clever biological engineering do we use to 416 00:31:40,000 --> 00:31:45,000 modify polymerase so it won't denature at 97 degrees? 417 00:31:45,000 --> 00:31:49,000 Yes? Get it from a bacteria. What kind of a bacteria would you 418 00:31:49,000 --> 00:31:53,000 ask for an enzyme that could work in, in basically boiling water? 419 00:31:53,000 --> 00:31:57,000 Bacteria that basically live in boiling water. Where would 420 00:31:57,000 --> 00:32:03,000 you look for such? Thermal vents. 421 00:32:03,000 --> 00:32:09,000 You'll, geysers, things like that. Life lives 422 00:32:09,000 --> 00:32:15,000 everywhere. What you do is you find yourself a bacterium, 423 00:32:15,000 --> 00:32:21,000 so you find bacteria that lived in geysers or in thermal vents and you 424 00:32:21,000 --> 00:32:27,000 purify their DNA polymerase. The most famous one comes from the 425 00:32:27,000 --> 00:32:34,000 organism, the bacterium called Thermus aquaticus, aquaticus. 426 00:32:34,000 --> 00:32:38,000 Which of course means hot water, right? That's what the bacteria is 427 00:32:38,000 --> 00:32:43,000 called, Thermus aquaticus. And, or, and its enzyme is called 428 00:32:43,000 --> 00:32:47,000 Taq, Taq. So, we'll refer to it often, 429 00:32:47,000 --> 00:32:52,000 Taq polymerase, meaning from this bacteria Thermus aquaticus, 430 00:32:52,000 --> 00:32:57,000 OK? So, that's Taq. 
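The "heat, cool, heat, cool" machine can be pictured as a simple schedule of temperature steps. The Python sketch below is only an illustration: the 97 degree denaturation comes from the lecture, while the annealing and extension temperatures are placeholder assumptions (typical textbook values, not figures given here), and `build_program` is a hypothetical helper name.

```python
DENATURE_C = 97  # from the lecture: heat to separate the two strands
ANNEAL_C = 55    # assumption: cooler step so primers can pair with the template
EXTEND_C = 72    # assumption: step at which a heat-stable polymerase extends the primers


def build_program(cycles):
    """Return the list of (cycle, step, temperature in Celsius) a closed-tube run would follow."""
    program = []
    for cycle in range(1, cycles + 1):
        program.append((cycle, "denature", DENATURE_C))
        program.append((cycle, "anneal primers", ANNEAL_C))
        program.append((cycle, "extend", EXTEND_C))
    return program


# Print the first two cycles; no step requires opening the tube to add enzyme.
for cycle, step, temp_c in build_program(2):
    print(f"cycle {cycle}: {step} at {temp_c} C")
```

The point of the sketch is that with a polymerase that survives the denaturation step, the whole program is just a loop over temperatures, with nothing added between cycles.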
So, it turns out that you can do this now without 431 00:32:57,000 --> 00:33:02,000 having to open and close the test-tubes. 432 00:33:02,000 --> 00:33:11,000 Oops, I meant to put that here. How sensitive is PCR? It's very 433 00:33:11,000 --> 00:33:20,000 sensitive. You could do, so applications of PCR. Very 434 00:33:20,000 --> 00:33:30,000 versatile. First let's just re-sequence a gene. 435 00:33:30,000 --> 00:33:36,000 Gene from yeast or from human. You just need, you know, any DNA 436 00:33:36,000 --> 00:33:42,000 sample. Get my gene, get my primers, and as I was 437 00:33:42,000 --> 00:33:48,000 indicating with a Popsicle stick, I don't have to have it very pure, 438 00:33:48,000 --> 00:33:54,000 although in a laboratory you go to the trouble of making it pure 439 00:33:54,000 --> 00:34:00,000 because you want it to be pure and all that. Yes? 440 00:34:00,000 --> 00:34:06,000 Correct. Yeah. So, remember I was making a fuss 441 00:34:06,000 --> 00:34:12,000 over the accuracy of replication, right? And I said that on its own a 442 00:34:12,000 --> 00:34:19,000 polymerase might have an accuracy to only about ten to the minus five. 443 00:34:19,000 --> 00:34:25,000 So, now, what were the two mechanisms for, 444 00:34:25,000 --> 00:34:32,000 for repairing DNA, for proofreading DNA? 445 00:34:32,000 --> 00:34:35,000 One was a built-in proofreading activity that the enzyme had. 446 00:34:35,000 --> 00:34:38,000 The enzyme would have put in a base, would check the base, 447 00:34:38,000 --> 00:34:42,000 and that actually helped by an order of magnitude or two. 448 00:34:42,000 --> 00:34:45,000 And some of these polymerases have a proofreading activity. 449 00:34:45,000 --> 00:34:49,000 But then we also discussed the mismatch repair activity that would 450 00:34:49,000 --> 00:34:52,000 later come along and detect mismatches. 
You're absolutely right, 451 00:34:52,000 --> 00:34:56,000 PCR is not as accurate as cells because it doesn't have that 452 00:34:56,000 --> 00:35:00,000 mismatch repair activity. So, when you take a PCR product, 453 00:35:00,000 --> 00:35:05,000 if I were to clone, so if I were to take all the PCR product, 454 00:35:05,000 --> 00:35:10,000 say, from my beta-globin gene, so I'm going to take my test-tube, 455 00:35:10,000 --> 00:35:15,000 I'm going to add my primers and everything, I'm going to PCR, 456 00:35:15,000 --> 00:35:20,000 I'm going to PCR, and then I'm going to get a lot of copies of 457 00:35:20,000 --> 00:35:25,000 beta-globin. If I were to take that beta-globin and just directly 458 00:35:25,000 --> 00:35:30,000 sequence the DNA in the test-tube. Here's my pieces of beta-globin. 459 00:35:30,000 --> 00:35:34,000 I can now sequence it by adding a primer and doing my fluorescent 460 00:35:34,000 --> 00:35:39,000 sequencing and running it on a sequencer and all that. 461 00:35:39,000 --> 00:35:44,000 Sorry, going the other way. I'll run a sequencing reaction. 462 00:35:44,000 --> 00:35:48,000 I could actually do it, and what I do it on is the whole population of 463 00:35:48,000 --> 00:35:53,000 a million or a billion molecules. If any one of them is wrong it's 464 00:35:53,000 --> 00:35:58,000 going to be swamped out by others, OK? Because I could do my 465 00:35:58,000 --> 00:36:03,000 sequencing reaction on the whole PCR product. 466 00:36:03,000 --> 00:36:07,000 And random mistakes in one molecule or the other will still be a tiny 467 00:36:07,000 --> 00:36:12,000 minority of the votes at any given base, right? 
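The "votes at any given base" idea can be sketched as a per-position majority vote. This is a toy Python illustration, not how a sequencing instrument actually computes its signal; the eight-base template and the three error-carrying molecules are invented for the example.

```python
from collections import Counter


def consensus(molecules):
    """Majority base at each position across equal-length sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*molecules))


template = "ACGTTGCA"  # made-up stand-in for the true sequence

# Seven faithful copies plus three copies that each carry one random-looking error.
population = [template] * 7 + ["TCGTTGCA", "ACGATGCA", "ACGTTGCC"]

pooled_read = consensus(population)
print("pooled product reads as:", pooled_read)
print("matches template:", pooled_read == template)
print("one individual copy:   ", population[-1])
```

Sequencing the pooled product recovers the template because each error sits in a small minority at its position, whereas any single molecule picked out on its own may carry an error.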
But suppose I were to 468 00:36:12,000 --> 00:36:17,000 take my PCR product, all these amplified molecules here, 469 00:36:17,000 --> 00:36:22,000 and suppose I were to clone them individually and I were to sequence 470 00:36:22,000 --> 00:36:27,000 each of those individual clones instead of sequencing a, 471 00:36:27,000 --> 00:36:32,000 a mixture of all the products. I would, in fact, 472 00:36:32,000 --> 00:36:36,000 see a higher mutation rate. And you're absolutely right. 473 00:36:36,000 --> 00:36:41,000 When people clone PCR products they have to check them afterwards and 474 00:36:41,000 --> 00:36:46,000 throw out the ones that are wrong, OK? Absolutely right. Good, good, 475 00:36:46,000 --> 00:36:50,000 good. So, you guys are, you know, right on top of the important issues 476 00:36:50,000 --> 00:36:55,000 about, about DNA. So, so I can, I can take a gene and 477 00:36:55,000 --> 00:37:00,000 I can re-sequence it. I can also do things like take 478 00:37:00,000 --> 00:37:05,000 blood and look for the presence of a virus. 479 00:37:05,000 --> 00:37:11,000 So, I could re-sequence beta-globin and study people and see who's got 480 00:37:11,000 --> 00:37:18,000 sickle cell anemia and all that. I could take blood and I might want 481 00:37:18,000 --> 00:37:24,000 to say do I see the HIV virus present in someone's blood? 482 00:37:24,000 --> 00:37:31,000 For example, HIV testing can be done by making PCR primers for the 483 00:37:31,000 --> 00:37:37,000 sequence of the HIV virus. It has a genome. 484 00:37:37,000 --> 00:37:41,000 Taking a human's blood sample and PCR-ing it. If you get a positive 485 00:37:41,000 --> 00:37:45,000 PCR product, a PCR product that is made by these two primers and if, 486 00:37:45,000 --> 00:37:49,000 for example, you checked that it, that it gives you the HIV sequence 487 00:37:49,000 --> 00:37:53,000 then you know that that blood sample has, that person has the HIV virus. 488 00:37:53,000 --> 00:37:57,000 This is a way to do this. 
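To make the "positive PCR product" test concrete, here is a minimal in-silico sketch in Python. It assumes exact primer matches on a single strand written 5' to 3', and every sequence in it is invented for illustration (none is a real HIV or human sequence); `in_silico_pcr` is a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(sample, forward_primer, reverse_primer):
    """Return the stretch bounded by the two primers, or None if either site is missing."""
    start = sample.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer pairs with the top strand, so search for its reverse complement.
    site = reverse_complement(reverse_primer)
    end = sample.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return sample[start:end + len(site)]


# Invented sequences: a sample that contains both primer sites and one that does not.
positive_sample = "AAAA" + "GATTACAGG" + "CCCTTTGGGAAACCC" + "TGCATGCAT" + "TTTT"
negative_sample = "AAAACCCCGGGGTTTTAAAACCCCGGGGTTTT"

product = in_silico_pcr(positive_sample, "GATTACAGG", "ATGCATGCA")
print("positive sample product:", product)
print("negative sample product:", in_silico_pcr(negative_sample, "GATTACAGG", "ATGCATGCA"))
```

A product appears only when both primer sites are present in the sample, which is why a band of the expected sequence is taken as evidence that the target genome is there.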
The PCR reaction itself is fast. 489 00:37:57,000 --> 00:38:01,000 Typically takes hours. In fact, can be forced to go much 490 00:38:01,000 --> 00:38:04,000 more quickly by machines that rapidly thermocycle. 491 00:38:04,000 --> 00:38:08,000 And you can actually PCR in five minutes, although people don't do it 492 00:38:08,000 --> 00:38:11,000 very often, but if you put a thin glass capillary and go heat, 493 00:38:11,000 --> 00:38:14,000 cold, heat, cold very, very quickly, there's a machine from Idaho 494 00:38:14,000 --> 00:38:18,000 Technologies that can do it in five minutes, but it's usually not worth the 495 00:38:18,000 --> 00:38:21,000 trouble. And you just put it in and, you know, in a couple hours you'll 496 00:38:21,000 --> 00:38:24,000 get an answer there as to whether or not somebody has HIV, 497 00:38:24,000 --> 00:38:28,000 for example. So, you can do that to detect relatively low 498 00:38:28,000 --> 00:38:33,000 quantities of virus. How low can you go? 499 00:38:33,000 --> 00:38:39,000 Well, it turns out, what's the limit? What's the 500 00:38:39,000 --> 00:38:45,000 smallest number of molecules you might be able to detect in a sample? 501 00:38:45,000 --> 00:38:51,000 Theoretically. One. You can't have fewer than one molecule, 502 00:38:51,000 --> 00:38:57,000 right? So, one might be the limit. So, how could I arrange to have a 503 00:38:57,000 --> 00:39:01,000 single molecule in a test-tube? I would like to have a test-tube 504 00:39:01,000 --> 00:39:04,000 that has exactly one copy of the beta-globin gene. 505 00:39:04,000 --> 00:39:08,000 What, how's the best, what's the best way to get exactly 506 00:39:08,000 --> 00:39:11,000 one copy of beta-globin and put it in the test-tube? 507 00:39:11,000 --> 00:39:14,000 Sorry? You can't. Why? Just one molecule. 508 00:39:14,000 --> 00:39:18,000 I want to get exactly one copy of beta-globin. 
I could, 509 00:39:18,000 --> 00:39:21,000 I could just take total DNA and dilute it so, on average, 510 00:39:21,000 --> 00:39:24,000 there's only one copy. Or, actually, is there any way to, 511 00:39:24,000 --> 00:39:28,000 I mean can I, I'd just like to buy a package that contains exactly 512 00:39:28,000 --> 00:39:32,000 one beta-globin. Sorry? Bind it to something big. 513 00:39:32,000 --> 00:39:36,000 Let's think biologically. Does biology package up a single copy of 514 00:39:36,000 --> 00:39:41,000 beta-globin? Sorry? Gametes. How about a sperm? 515 00:39:41,000 --> 00:39:46,000 Let's grab a sperm by its tail here, put it in the test-tube. 516 00:39:46,000 --> 00:39:50,000 It's one copy of beta-globin. So, you can actually take cell 517 00:39:50,000 --> 00:39:55,000 sorters and have it cell sort sperm into individual test-tubes. 518 00:39:55,000 --> 00:40:00,000 You now know there's one copy of beta-globin. 519 00:40:00,000 --> 00:40:05,000 Heat it up, it will crack open the sperm, add your primers, 520 00:40:05,000 --> 00:40:10,000 you can amplify beta-globin, it's a single copy. That proves its 521 00:40:10,000 --> 00:40:16,000 extraordinary sensitivity. You can do it with a single sperm. 522 00:40:16,000 --> 00:40:21,000 You can do it with a single egg also, but harder to come by. 523 00:40:21,000 --> 00:40:27,000 So, with that level of sensitivity, you could do the following. So, 524 00:40:27,000 --> 00:40:32,000 single sperm typing. Now, single sperm typing is cool but 525 00:40:32,000 --> 00:40:36,000 sort of useless. What are you going to do with it, 526 00:40:36,000 --> 00:40:41,000 right? But here's another thing you could do. Embryo typing. 527 00:40:41,000 --> 00:40:45,000 Suppose someone has a genetic disease in their family, 528 00:40:45,000 --> 00:40:50,000 maybe it's Huntington's disease. And suppose that the individual with 529 00:40:50,000 --> 00:40:54,000 Huntington's disease wants to have kids. 
Or the individual, 530 00:40:54,000 --> 00:40:59,000 sorry, the individual who is at risk for Huntington's disease or breast 531 00:40:59,000 --> 00:41:04,000 cancer or whatever wants to have kids. 532 00:41:04,000 --> 00:41:13,000 What you can do is with an in vitro fertilization clinic you're able to 533 00:41:13,000 --> 00:41:23,000 obtain eggs, fertilize eggs in vitro, and grow them up in a Petri plate to 534 00:41:23,000 --> 00:41:33,000 8 or 16 cell stage before re-implanting embryos 535 00:41:33,000 --> 00:41:41,000 back in the mother. Wouldn't it be cool if we could 536 00:41:41,000 --> 00:41:48,000 choose to only re-implant an embryo that did not have the genetic 537 00:41:48,000 --> 00:41:56,000 disease? How are we going to do that? PCR. How are we, 538 00:41:56,000 --> 00:42:02,000 so what do we do? We take the embryo. 539 00:42:02,000 --> 00:42:06,000 We make DNA from the embryo. We do PCR and we say, ah-ha, this 540 00:42:06,000 --> 00:42:10,000 embryo did not have the genetic disease. Problem is it has killed 541 00:42:10,000 --> 00:42:15,000 the, the cells there, right, it killed the embryo. 542 00:42:15,000 --> 00:42:19,000 Any ideas? Pull off one cell. Remove a single cell. It turns out 543 00:42:19,000 --> 00:42:24,000 that at that stage the cells are not differentiated. 544 00:42:24,000 --> 00:42:28,000 If I remove one cell from an embryo at that very early stage, 545 00:42:28,000 --> 00:42:33,000 the other cells will make a perfectly happy, healthy baby. 546 00:42:33,000 --> 00:42:37,000 That cell is not necessary. This single cell sensitivity is 547 00:42:37,000 --> 00:42:41,000 very valuable because I can actually do single cell genotyping on in 548 00:42:41,000 --> 00:42:45,000 vitro fertilized embryos and be able to offer parents a chance, 549 00:42:45,000 --> 00:42:49,000 the opportunity to re-implant only those embryos that do not have the 550 00:42:49,000 --> 00:42:53,000 genetic defect. That's cool. That's really cool. 
551 00:42:53,000 --> 00:42:57,000 There are other things you might be able to do. If you're treating a 552 00:42:57,000 --> 00:43:01,000 patient with cancer, a patient, a cancer patient and 553 00:43:01,000 --> 00:43:05,000 you've given chemotherapy you want to know have I managed to eradicate 554 00:43:05,000 --> 00:43:11,000 the cancer cells? And six months later have any of the 555 00:43:11,000 --> 00:43:19,000 cancer cells come back? I could look for very low 556 00:43:19,000 --> 00:43:27,000 quantities of cancer cells. I can, I can do surveillance for 557 00:43:27,000 --> 00:43:35,000 low quantities of cancer cells following chemotherapy. 558 00:43:35,000 --> 00:43:39,000 And, of course, I can also do forensics. 559 00:43:39,000 --> 00:43:44,000 I could take a small sample of blood from the scene of a crime or 560 00:43:44,000 --> 00:43:49,000 saliva from the back of an envelope that someone has licked, 561 00:43:49,000 --> 00:43:53,000 and I could do PCR and look for genetic variations that distinguish 562 00:43:53,000 --> 00:43:58,000 people. And, presumably, you see all that stuff on television 563 00:43:58,000 --> 00:44:04,000 all the time. So, that's what PCR is good for. 564 00:44:04,000 --> 00:44:10,000 It's good. All right. Last topic, very brief topic, but I do want to 565 00:44:10,000 --> 00:44:17,000 mention. This was being able to analyze a gene directed mutagenesis. 566 00:44:17,000 --> 00:44:23,000 And I won't go through the details of all this, but I just want to at 567 00:44:23,000 --> 00:44:30,000 least basically describe the concept. 568 00:44:30,000 --> 00:44:36,000 I could take any piece of DNA, say from a drosophila, and I can 569 00:44:36,000 --> 00:44:42,000 mutate the DNA in vitro. I can change this base from a G to 570 00:44:42,000 --> 00:44:48,000 a C. There's a right, there's a proper protocol and 571 00:44:48,000 --> 00:44:54,000 cooking trick for doing that. 
It involves putting a certain oligo 572 00:44:54,000 --> 00:45:00,000 over it and extending, and it doesn't matter exactly how. 573 00:45:00,000 --> 00:45:04,000 I could insert an extra gene into that. I could use a little 574 00:45:04,000 --> 00:45:08,000 restriction enzyme to open it up and stuff something in. 575 00:45:08,000 --> 00:45:13,000 I could delete something from this. Maybe I'll use a restriction enzyme 576 00:45:13,000 --> 00:45:17,000 to cut it open, et cetera. Basically I could, 577 00:45:17,000 --> 00:45:22,000 I can fuse genes together. I can do whatever kind of construction of 578 00:45:22,000 --> 00:45:26,000 pieces of DNA and modifications of pieces of DNA that I would 579 00:45:26,000 --> 00:45:31,000 like to do in vitro. I can then take that mutated gene, 580 00:45:31,000 --> 00:45:36,000 let's say the gene is an enzyme, encodes an enzyme, 581 00:45:36,000 --> 00:45:41,000 and the enzyme has an active site. I could change the code for the 582 00:45:41,000 --> 00:45:46,000 amino acid right at the active site to see if that amino acid really 583 00:45:46,000 --> 00:45:52,000 matters or not. I can do any of those things. 584 00:45:52,000 --> 00:45:57,000 And I can put this back in an organism. Remember that you, 585 00:45:57,000 --> 00:46:02,000 I said you could transform DNA back into bacteria? 586 00:46:02,000 --> 00:46:07,000 Well, you can also do such things as simply inject DNA into 587 00:46:07,000 --> 00:46:11,000 a fertilized egg. In fact, at the stage where there's 588 00:46:11,000 --> 00:46:15,000 a male and a female pronucleus that haven't fused yet right after 589 00:46:15,000 --> 00:46:19,000 fertilization. You can take your little pipette 590 00:46:19,000 --> 00:46:22,000 and a needle and you can inject some of the DNA you want into the male 591 00:46:22,000 --> 00:46:26,000 pronucleus, and then when the male pronucleus and the female pronucleus 592 00:46:26,000 --> 00:46:30,000 fuse and the embryo grows it will have your DNA. 
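The idea of changing one base to alter the amino acid at a chosen position can be mimicked on a string. The Python sketch below is only a model of the outcome, not of the laboratory protocol; the twelve-base "gene" is invented, the codon table is the standard genetic code, and `point_mutate` and `translate` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a coding sequence codon by codon, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def point_mutate(dna, position, new_base):
    """Return a copy of the sequence with the base at a 0-based position replaced."""
    if new_base not in BASES:
        raise ValueError("new_base must be one of A, C, G, T")
    return dna[:position] + new_base + dna[position + 1:]


original = "ATGCATGGTTAA"                # invented toy gene: Met-His-Gly-stop
mutant = point_mutate(original, 6, "C")  # change a G to a C in the third codon

print(original, "->", translate(original))
print(mutant, "->", translate(mutant))
```

In this toy case the G to C change turns the codon GGT (glycine) into CGT (arginine), the kind of single amino acid swap one might make at an active site to see whether that residue matters.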
593 00:46:30,000 --> 00:46:36,000 You can make mice that carry whatever gene you've modified like 594 00:46:36,000 --> 00:46:42,000 this. You can also not, you, you can also not just modify a 595 00:46:42,000 --> 00:46:49,000 piece of DNA and add, this is gene addition, 596 00:46:49,000 --> 00:46:55,000 you can also do gene subtraction. You can do gene subtraction and, 597 00:46:55,000 --> 00:47:01,000 again, I won't worry about the details here, by taking 598 00:47:01,000 --> 00:47:07,000 embryonic stem cells. Much in the news these days, 599 00:47:07,000 --> 00:47:11,000 and we may come back to them. And in vitro, working with 600 00:47:11,000 --> 00:47:15,000 embryonic stem cells, to transform a piece of DNA that has 601 00:47:15,000 --> 00:47:19,000 been arranged to recombine into the gene of interest and knock it out. 602 00:47:19,000 --> 00:47:24,000 So, if you build, if you build a piece of DNA in vitro and you put it 603 00:47:24,000 --> 00:47:28,000 into a whole bunch of embryonic stem cells you can select, 604 00:47:28,000 --> 00:47:32,000 by various clever techniques, for those embryonic stem cells that 605 00:47:32,000 --> 00:47:38,000 have taken up your gene. And not just taken it up but slammed 606 00:47:38,000 --> 00:47:45,000 it into the normal locus in place of the normal locus. 607 00:47:45,000 --> 00:47:52,000 And that way you can knock out a gene. You can do gene knockout. 608 00:47:52,000 --> 00:47:59,000 So, the basic point of this now, to summarize these many lectures is 609 00:47:59,000 --> 00:48:06,000 we're now at the point where this picture that we saw at the, 610 00:48:06,000 --> 00:48:13,000 at the beginning, function, gene, protein, that we understood now 611 00:48:13,000 --> 00:48:20,000 first as a methodology, genetics, biochemistry. 612 00:48:20,000 --> 00:48:26,000 And then we understood how genes encode proteins through molecular 613 00:48:26,000 --> 00:48:32,000 biology. 
These tools of recombinant DNA allow us to move 614 00:48:32,000 --> 00:48:37,000 in any direction. You want to find the gene underlying 615 00:48:37,000 --> 00:48:41,000 a function, find the gene for Huntington's disease? 616 00:48:41,000 --> 00:48:45,000 We could do it. Clone it based solely on its linkage. 617 00:48:45,000 --> 00:48:49,000 You want to find the gene encoding a protein? If I know its amino acid 618 00:48:49,000 --> 00:48:53,000 sequence, I can find the DNA sequence that corresponds to it. 619 00:48:53,000 --> 00:48:57,000 If I want to find what a certain protein does, its function, 620 00:48:57,000 --> 00:49:01,000 I could get the gene for that protein. I could knock out the gene 621 00:49:01,000 --> 00:49:05,000 for that protein and see what its function is. 622 00:49:05,000 --> 00:49:08,000 Suddenly, for the mathematicians amongst the group, 623 00:49:08,000 --> 00:49:12,000 this becomes a commutative diagram, which you can chase around in any 624 00:49:12,000 --> 00:49:15,000 direction. That is, in a sense, what the 20th century 625 00:49:15,000 --> 00:49:19,000 was about, was intellectually these two disciplines merging through 626 00:49:19,000 --> 00:49:22,000 molecular biology and then recombinant DNA giving you all the 627 00:49:22,000 --> 00:49:26,000 tools that if you're sitting at any place in this triangle you can move 628 00:49:26,000 --> 00:49:29,000 this way and that way, from a gene to a protein, 629 00:49:29,000 --> 00:49:33,000 from a protein to a gene, from a function to a gene, from a 630 00:49:33,000 --> 00:49:36,000 function to a protein. Much of the rest of the course we'll 631 00:49:36,000 --> 00:49:40,000 talk about how you use these tools, but this brings to a close this 632 00:49:40,000 --> 00:49:44,000 first chunk of the course about the concepts and the methodologies of 633 00:49:44,000 --> 00:49:48,000 molecular biology. 
Now, if you hang on one more minute, 634 00:49:48,000 --> 00:49:52,000 this is my last lecture for a while. I won't be, we're having an exam on, 635 00:49:52,000 --> 00:49:56,000 we have a quiz on Monday, and then Bob's taking over again. 636 00:49:56,000 --> 00:50:00,000 So, I won't see you for the next week or so. So, two things. 637 00:50:00,000 --> 00:50:04,000 One, I won't see you before the World Series is over so everyone 638 00:50:04,000 --> 00:50:09,000 please think good thoughts about the Red Sox. Number two, 639 00:50:09,000 --> 00:50:13,000 I will not see you before the election. Vote. 640 00:50:13,000 --> 00:50:18,000 It's your choice who you vote for, but vote. Good-bye.