1 00:00:09,000 --> 00:00:13,000 Good morning. So, I see we have a lot of parents here. 2 00:00:13,000 --> 00:00:17,000 How many parents have we got here? 3 00:00:17,000 --> 00:00:20,000 Welcome to the parents. How many of the parents have done 4 00:00:20,000 --> 00:00:23,000 the reading for today? [LAUGHTER] Good, because we'll call 5 00:00:23,000 --> 00:00:27,000 the parents too, right? We'll see what happens. 6 00:00:27,000 --> 00:00:31,000 All right, so, where are we? We've talked about this diagram that I 7 00:00:31,000 --> 00:00:36,000 keep coming back to. If you want to 8 00:00:36,000 --> 00:00:40,000 study biological function the two traditional ways to do that were to 9 00:00:40,000 --> 00:00:44,000 look at genetics or to look at biochemistry: genetics, the 10 00:00:44,000 --> 00:00:48,000 study of an organism with one broken component, those components being 11 00:00:48,000 --> 00:00:53,000 genes; biochemistry: the study of the purification of 12 00:00:53,000 --> 00:00:55,000 individual components from an organism away from the organism, 13 00:00:55,000 --> 00:00:58,000 particularly the most important such components being proteins. 14 00:00:58,000 --> 00:01:01,000 What do they have to do to each other? 
15 00:01:01,000 --> 00:01:05,000 In the unification in molecular biology that occurred in the middle 16 00:01:05,000 --> 00:01:09,000 of the century from the 1950s into the '60s and really 17 00:01:09,000 --> 00:01:13,000 up to 1970 or so, we came to a conceptual understanding 18 00:01:13,000 --> 00:01:16,000 that genes encode proteins, 19 00:01:16,000 --> 00:01:19,000 and therefore these two different ways of looking at the organism: 20 00:01:19,000 --> 00:01:23,000 organism minus a component, components 21 00:01:23,000 --> 00:01:26,000 minus an organism were complementary points of view, 22 00:01:26,000 --> 00:01:29,000 and in theory, you could go from a gene sequence to a protein 23 00:01:29,000 --> 00:01:32,000 sequence, a protein sequence back to a gene sequence, 24 00:01:32,000 --> 00:01:35,000 to go from a gene sequence to its function, its function 25 00:01:35,000 --> 00:01:40,000 to a protein, except for one tiny detail. This was all just 26 00:01:40,000 --> 00:01:46,000 conceptual. Conceptually we understood by about 1970 27 00:01:46,000 --> 00:01:51,000 that the DNA made the RNA made the protein. The protein carried 28 00:01:51,000 --> 00:01:56,000 out the function but as of then, 29 00:01:56,000 --> 00:02:00,000 you couldn't individually work with or purify the DNA corresponding to 30 00:02:00,000 --> 00:02:04,000 any particular gene. All of the inferences had been 31 00:02:04,000 --> 00:02:07,000 indirect inferences: indirect inferences from bacterial genetics, 32 00:02:07,000 --> 00:02:11,000 bacterial regulation or Meselson-Stahl experiments, 33 00:02:11,000 --> 00:02:15,000 and all sorts of interesting indirect ways 34 00:02:15,000 --> 00:02:19,000 of working out the genetic code, but it didn't let you read anything. 35 00:02:19,000 --> 00:02:23,000 This was a problem. Some people in the 36 00:02:23,000 --> 00:02:26,000 late 1960s said, great, molecular biology is over. 37 00:02:26,000 --> 00:02:29,000 We understand in principle how life works. 
Now let's go understand 38 00:02:29,000 --> 00:02:31,000 how the brain works. And there was an exodus of some 39 00:02:31,000 --> 00:02:34,000 people from molecular biology into neurobiology to now go 40 00:02:34,000 --> 00:02:36,000 nail the brain, figuring that would be worth another 41 00:02:36,000 --> 00:02:39,000 ten years or so. But in fact, remarkably, 42 00:02:39,000 --> 00:02:42,000 people began to focus on how you could get to 43 00:02:42,000 --> 00:02:46,000 work with individual specific genes. Now, what's so hard about that? I 44 00:02:46,000 --> 00:02:49,000 mean, it's not very hard to crack open a 45 00:02:49,000 --> 00:02:53,000 red blood cell and purify different proteins. You can purify hemoglobin. 46 00:02:53,000 --> 00:02:56,000 You can purify different enzymes. Biochemistry allows you to purify 47 00:02:56,000 --> 00:02:59,000 different components from each other. I want to purify an enzyme: let's 48 00:02:59,000 --> 00:03:02,000 crack open a yeast cell, separate the 49 00:03:02,000 --> 00:03:04,000 proteins over some column that separates them based on their size 50 00:03:04,000 --> 00:03:06,000 or their charge, and I'll get purer and purer 51 00:03:06,000 --> 00:03:09,000 fractions. I'll assay each fraction to see 52 00:03:09,000 --> 00:03:11,000 which one has the enzymatic activity. But basically I use the 53 00:03:11,000 --> 00:03:14,000 physical-chemical properties of the proteins to 54 00:03:14,000 --> 00:03:17,000 separate them into different buckets. Why not do that with, 55 00:03:17,000 --> 00:03:21,000 say, the human DNA and purify out the gene for beta 56 00:03:21,000 --> 00:03:25,000 globin, that encodes the beta component of hemoglobin? 57 00:03:25,000 --> 00:03:30,000 What would be the problem of just using physical-chemical purification 58 00:03:30,000 --> 00:03:36,000 to purify one human gene from another? Well, I mean, it's one 59 00:03:36,000 --> 00:03:39,000 very big molecule. Well, I could shear it up. 
60 00:03:39,000 --> 00:03:43,000 Maybe I'll just break it up. Now, let's purify the beta 61 00:03:43,000 --> 00:03:47,000 globin containing part. It all looks the same. It's just 62 00:03:47,000 --> 00:03:51,000 DNA. It's one chemical polymer with pretty boring properties, 63 00:03:51,000 --> 00:03:54,000 and they're not very different. 64 00:03:54,000 --> 00:03:56,000 Any particular DNA sequence and any other DNA sequence have basically about 65 00:03:56,000 --> 00:04:00,000 the same molecular weight, same charges, 66 00:04:00,000 --> 00:04:04,000 there's nothing to separate them by. How are you going to purify beta 67 00:04:04,000 --> 00:04:08,000 globin? That was the problem. That's where 68 00:04:08,000 --> 00:04:11,000 recombinant DNA came in. Recombinant DNA was a remarkable 69 00:04:11,000 --> 00:04:15,000 and totally different way of purifying individual 70 00:04:15,000 --> 00:04:19,000 components. And the basis of it was this notion of cloning. 71 00:04:19,000 --> 00:04:23,000 If I want to purify out from the human 72 00:04:23,000 --> 00:04:27,000 genome, how big is the human genome? The human genome is about three 73 00:04:27,000 --> 00:04:31,000 billion bases long. If I want to purify a particular 74 00:04:31,000 --> 00:04:37,000 gene, let's say beta globin or some other gene, 75 00:04:37,000 --> 00:04:46,000 a typical gene might be on the order of 30,000 letters long. 76 00:04:46,000 --> 00:04:49,000 This is a one part in 10^5 purification I've got to achieve. 77 00:04:49,000 --> 00:04:53,000 Any given gene is only one part in 78 00:04:53,000 --> 00:04:56,000 10^5 of the human genome. And then, what about a typical 79 00:04:56,000 --> 00:04:59,000 mutation? Maybe the mutation that causes 80 00:04:59,000 --> 00:05:05,000 sickle cell anemia by changing a single nucleotide in beta globin, 81 00:05:05,000 --> 00:05:10,000 well, that's one base pair. 
So that means I'm trying to identify 82 00:05:10,000 --> 00:05:14,000 something that's on the order of one part in 10^9 actually, 83 00:05:14,000 --> 00:05:17,000 a little less than one part in 10^9 of the whole genome. 84 00:05:17,000 --> 00:05:21,000 Carrying out purifications like that is really kind of hard 85 00:05:21,000 --> 00:05:25,000 to imagine. But the way it was done was by the invention of cloning. 86 00:05:25,000 --> 00:05:29,000 Let me give a brief overview of the idea 87 00:05:29,000 --> 00:05:37,000 of cloning, and then we'll dive into the details. The idea of cloning 88 00:05:37,000 --> 00:05:42,000 was, the way to purify individual molecules would just be 89 00:05:42,000 --> 00:05:46,000 to take the molecules and just dilute them so that there 90 00:05:46,000 --> 00:05:49,000 was only one of each molecule. That's very pure, 91 00:05:49,000 --> 00:05:52,000 isn't it? The problem is it's not very much, so you need 92 00:05:52,000 --> 00:05:56,000 a way to take a single copy of a molecule, 93 00:05:56,000 --> 00:06:00,000 and then make many copies of it. So purification's not hard. You 94 00:06:00,000 --> 00:06:03,000 just dilute it down so you work with 95 00:06:03,000 --> 00:06:05,000 single molecules but then you need to copy it back again 96 00:06:05,000 --> 00:06:08,000 and again and again, and no biochemical technique 97 00:06:08,000 --> 00:06:10,000 involves, say, fractionating a cell and replicating 98 00:06:10,000 --> 00:06:13,000 some enzyme, you know, copying some enzyme. 99 00:06:13,000 --> 00:06:16,000 You can't copy enzymes, but you can copy DNA, and that was 100 00:06:16,000 --> 00:06:19,000 the basis of it. So here's the way it goes. 
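The purification arithmetic quoted in the lecture (a typical gene as one part in 10^5 of the genome, a single base as a little less than one part in 10^9) can be checked with a short sketch. The constant and function names below are illustrative choices, not anything from the lecture; the sizes are the round numbers the lecturer uses.

```python
# Round numbers quoted in the lecture; names are illustrative.
HUMAN_GENOME_BP = 3_000_000_000   # "about three billion bases long"
TYPICAL_GENE_BP = 30_000          # "on the order of 30,000 letters long"
POINT_MUTATION_BP = 1             # a single changed nucleotide


def one_part_in(target_bp, genome_bp=HUMAN_GENOME_BP):
    """Return N such that the target is one part in N of the genome."""
    return genome_bp / target_bp


gene_ratio = one_part_in(TYPICAL_GENE_BP)
mutation_ratio = one_part_in(POINT_MUTATION_BP)

print(f"typical gene: one part in {gene_ratio:.0e}")
print(f"single base:  one part in {mutation_ratio:.0e}")
```

A single base is one part in 3 x 10^9, which is why the lecturer says it is "a little less than" one part in 10^9.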
The 101 00:06:19,000 --> 00:06:22,000 basic overview we'll look at is take your DNA and cut your DNA 102 00:06:22,000 --> 00:06:28,000 of interest, maybe the human genome, 103 00:06:28,000 --> 00:06:36,000 into pieces at defined sites Then, paste your DNA, which is more 104 00:06:36,000 --> 00:06:44,000 technically ligate, the word we use. 105 00:06:44,000 --> 00:06:53,000 Paste your DNA to some other DNA called a vector. So, 106 00:06:53,000 --> 00:06:57,000 cut your DNA and paste your DNA. Each piece of your, say, human DNA 107 00:06:57,000 --> 00:07:06,000 gets stuck to some piece of vector. 108 00:07:06,000 --> 00:07:18,000 Insert this DNA into vectors that can replicate in bacteria. 109 00:07:18,000 --> 00:07:25,000 So, I'm going to actually take my piece of human DNA 110 00:07:25,000 --> 00:07:28,000 and not just ligate it to any piece of DNA. 111 00:07:28,000 --> 00:07:32,000 I'm going to take my human DNA, and I'm going to ligate it to a 112 00:07:32,000 --> 00:07:35,000 vector that has all of the machinery, 113 00:07:35,000 --> 00:07:39,000 all of the ability to be copied in a bacteria. Then what 114 00:07:39,000 --> 00:07:45,000 I'm going to do is I'm going to transform my DNA into a 115 00:07:45,000 --> 00:07:53,000 host cell, a host bacterial cell. Transform means introduce. When we 116 00:07:53,000 --> 00:07:59,000 talk about transforming DNA, 117 00:07:59,000 --> 00:08:02,000 we're not talking about changing it. It's the word that's used for 118 00:08:02,000 --> 00:08:06,000 taking my DNA, stuck into a vector, 119 00:08:06,000 --> 00:08:10,000 and introducing it into bacterial cells. Ideally, 120 00:08:10,000 --> 00:08:14,000 each bacterial cell would carry one such 121 00:08:14,000 --> 00:08:22,000 DNA molecule, and then what I want to do is I want to plate my cells, 122 00:08:22,000 --> 00:08:31,000 and select those that carry human DNA, my DNA; DNA I've put on 123 00:08:31,000 --> 00:08:38,000 it. 
So, I'm going to put them on a Petri plate and I want only the 124 00:08:38,000 --> 00:08:44,000 bacteria that happen to have picked up an individual piece of human DNA 125 00:08:44,000 --> 00:08:47,000 to grow. So, that's the trick. It's a very simple trick. Take 126 00:08:47,000 --> 00:08:50,000 total human DNA, cut it up into pieces, glue 127 00:08:50,000 --> 00:08:53,000 it to a vector that's able to be copied so that it's able to be 128 00:08:53,000 --> 00:08:56,000 replicated in bacteria, put the vectors into bacterial 129 00:08:56,000 --> 00:08:59,000 cells; every bacterial cell picks up no more than one vector. 130 00:08:59,000 --> 00:09:03,000 You plate it out, and you simply arrange so that the 131 00:09:03,000 --> 00:09:06,000 only cells that grow are those that picked up the piece of human DNA. 132 00:09:06,000 --> 00:09:09,000 And then, every one of these colonies 133 00:09:09,000 --> 00:09:12,000 is the descendant of a single bacterial cell that picked up a 134 00:09:12,000 --> 00:09:15,000 single human molecule, but is obligingly 135 00:09:15,000 --> 00:09:18,000 copying that molecule for you again and again and again and again. 136 00:09:18,000 --> 00:09:21,000 And thus, you have what we refer 137 00:09:21,000 --> 00:09:26,000 to; this whole collection here is called a library of clones. 138 00:09:26,000 --> 00:09:31,000 This is called a recombinant library because 139 00:09:31,000 --> 00:09:35,000 every piece of the human genome is somewhere in here. 140 00:09:35,000 --> 00:09:39,000 You know, this one here probably is actin, and maybe 141 00:09:39,000 --> 00:09:43,000 this one here maybe is collagen-11 and that one there might, 142 00:09:43,000 --> 00:09:47,000 ah, there's beta globin. OK, actually 143 00:09:47,000 --> 00:09:49,000 when you look at the plate there's no way to tell but in principle 144 00:09:49,000 --> 00:09:52,000 they're all there. 
So, there will be this question of, 145 00:09:52,000 --> 00:09:54,000 how do we look at a library and pull out what the right one 146 00:09:54,000 --> 00:09:57,000 is? But somewhere in there should be a bacterial 147 00:09:57,000 --> 00:10:01,000 colony that has pure beta globin gene, the DNA for beta globin. 148 00:10:01,000 --> 00:10:03,000 The next lecture will be about how you actually find it. 149 00:10:03,000 --> 00:10:05,000 But today let's just build this library. So our 150 00:10:05,000 --> 00:10:09,000 goal is to be able to build a library like this. 151 00:10:09,000 --> 00:10:13,000 So, we have to figure out how to cut DNA, 152 00:10:13,000 --> 00:10:15,000 paste DNA, vectors, etc., etc. So that's what our 153 00:10:15,000 --> 00:10:18,000 subject will be today. Let's dive in. First, 154 00:10:18,000 --> 00:10:26,000 cutting DNA, how do you cut DNA? Restriction enzymes, 155 00:10:26,000 --> 00:10:40,000 etc. It turns out that the way you could cut 156 00:10:40,000 --> 00:10:43,000 DNA at particular places is as follows. Let me take a piece of DNA. 157 00:10:43,000 --> 00:10:47,000 Here's a double-stranded piece of DNA. We'll go A, 158 00:10:47,000 --> 00:10:52,000 G, C, T, A, G, A, A, T, T, C, T, T, A, C, C, 159 00:10:52,000 --> 00:10:57,000 hydroxyl there, three prime. Let's go back on the 160 00:10:57,000 --> 00:11:03,000 other strand. What do we have? G, 161 00:11:03,000 --> 00:11:10,000 G, T, A, A, G, A, A, T, T, C, T, A, G, C, T, 162 00:11:10,000 --> 00:11:14,000 hydroxyl there, three prime. There's my double 163 00:11:14,000 --> 00:11:19,000 stranded piece of DNA. It turns out that there exists an 164 00:11:19,000 --> 00:11:29,000 enzyme that recognizes that exact sequence: G, A, 165 00:11:29,000 --> 00:11:39,000 A, T, T, C. The enzyme goes by the name Eco R1. 166 00:11:39,000 --> 00:11:42,000 This protein, this enzyme, scans along the DNA, and it 167 00:11:42,000 --> 00:11:45,000 finds this sequence: G, A, A, T, T, C. 
168 00:11:45,000 --> 00:11:48,000 Actually it's on this strand. What about on the other strand, what does 169 00:11:48,000 --> 00:11:52,000 it say? Same thing. It's a reverse palindrome. 170 00:11:52,000 --> 00:11:56,000 It's symmetric. That's very good. 171 00:11:56,000 --> 00:12:00,000 And it turns out most restriction enzymes do that. OK, 172 00:12:00,000 --> 00:12:04,000 so what it does when it finds that, 173 00:12:04,000 --> 00:12:08,000 with the benefit of colored chalk that has just shown up here, 174 00:12:08,000 --> 00:12:14,000 is it cleaves the DNA fragment like that. 175 00:12:14,000 --> 00:12:23,000 And what it gives you then is a broken double strand with 176 00:12:23,000 --> 00:12:29,000 an overhang, T, T, A, A, five prime, 177 00:12:29,000 --> 00:12:35,000 three prime, three prime, five prime. This has a hydroxyl 178 00:12:35,000 --> 00:12:38,000 here. This has a phosphate there. And then this other fragment 179 00:12:38,000 --> 00:12:47,000 here is A, A, T, T, C, T, T, A, C, C, G, G, T, 180 00:12:47,000 --> 00:12:57,000 A. So, what happens is, and this has a hydroxyl, five 181 00:12:57,000 --> 00:13:04,000 prime, three prime, three prime, five prime, I get 182 00:13:04,000 --> 00:13:10,000 two fragments of DNA that have been 183 00:13:10,000 --> 00:13:16,000 broken there and have an overhang. The overhang is complementary. 184 00:13:16,000 --> 00:13:23,000 Those two sequences match each other. There's what's called a five prime 185 00:13:23,000 --> 00:13:30,000 overhang and they're complementary. So, we have complementary, 186 00:13:30,000 --> 00:13:36,000 that is matching, five prime overhangs. This is called Eco R1 because it's 187 00:13:36,000 --> 00:13:43,000 purified, this particular enzyme, 188 00:13:43,000 --> 00:13:49,000 from E. coli strain R and it's the number one such enzyme that 189 00:13:49,000 --> 00:13:55,000 was purified from it. So, it is very simple nomenclature 190 00:13:55,000 --> 00:14:01,000 here. Now, here's a question. 
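The blackboard example can be replayed as a minimal sketch: check that G, A, A, T, T, C reads the same on both strands, then cut the lecture's 16-letter strand the way Eco R1 does. The function names are illustrative, and the cut position (between the G and the first A on each strand) is the standard behavior for this enzyme, stated here as an assumption since the transcript only shows it on the board.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # assumed: cut falls after the G, giving a 5' AATT overhang


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def digest(strand, site=ECORI_SITE, offset=ECORI_CUT_OFFSET):
    """Cut one strand at every occurrence of the site; return its fragments 5' to 3'."""
    fragments = []
    start = 0
    pos = strand.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(strand[start:cut])
        start = cut
        pos = strand.find(site, pos + 1)
    fragments.append(strand[start:])
    return fragments


top = "AGCTAGAATTCTTACC"           # strand written on the board
bottom = reverse_complement(top)   # the other strand, read 5' to 3'

print(reverse_complement(ECORI_SITE) == ECORI_SITE)  # site is a reverse palindrome
print(digest(top))
print(digest(bottom))
```

Each downstream fragment starts with AATT at its 5' end, and the two AATT stretches pair with each other, which is the complementary five prime overhang described above.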
Why do bacteria have an enzyme like 191 00:14:01,000 --> 00:14:05,000 this? There are some people who feel that the reason 192 00:14:05,000 --> 00:14:09,000 is that this enzyme is here precisely to allow molecular 193 00:14:09,000 --> 00:14:14,000 biologists to cut and paste DNA, and this represents 194 00:14:14,000 --> 00:14:20,000 impressions likely, me among them. 195 00:14:20,000 --> 00:14:24,000 How did anybody find this stuff? Well, shaggy dog story, I have to 196 00:14:24,000 --> 00:14:28,000 tell you the following shaggy dog story. 197 00:14:28,000 --> 00:14:31,000 So, this is a fun shaggy dog story, and it's an MIT shaggy dog story 198 00:14:31,000 --> 00:14:34,000 because it comes from the work of Salvador 199 00:14:34,000 --> 00:14:37,000 Luria, who is a very famous biologist who worked 200 00:14:37,000 --> 00:14:41,000 here at MIT. So, Salvador Luria was studying 201 00:14:41,000 --> 00:14:46,000 bacteriophage. Remember, bacteriophage are the 202 00:14:46,000 --> 00:14:52,000 viruses that infect bacteria. So, he was studying bacteriophage, 203 00:14:52,000 --> 00:15:00,000 and he took his bacteriophage and used it to infect a strain of 204 00:15:00,000 --> 00:15:07,000 bacteria, strain A, and he also used it to infect a strain of 205 00:15:07,000 --> 00:15:13,000 bacteria, strain B. So when he did that, 206 00:15:13,000 --> 00:15:17,000 what you do is you plate a lawn of bacterial cells. 
207 00:15:17,000 --> 00:15:21,000 You kind of have a slush of bacterial cells that you 208 00:15:21,000 --> 00:15:26,000 plate here with virus mixed in, and wherever there's a 209 00:15:26,000 --> 00:15:32,000 virus, the virus grows, replicates, and either kills or 210 00:15:32,000 --> 00:15:35,000 slows down the growth of the cells so that bacterial cells grow 211 00:15:35,000 --> 00:15:39,000 everywhere else, but where a viral particle landed 212 00:15:39,000 --> 00:15:41,000 there's an absence of bacterial cells and that hole in the lawn, 213 00:15:41,000 --> 00:15:44,000 this whole thing is called a lawn of 214 00:15:44,000 --> 00:15:47,000 bacteria, and the holes in the lawn are called plaques. 215 00:15:47,000 --> 00:15:51,000 So, when he did this, he found that when he did it on 216 00:15:51,000 --> 00:15:56,000 strain A he got a bunch of plaques and when he did it 217 00:15:56,000 --> 00:16:02,000 on strain B, he didn't, no plaques. So what's the 218 00:16:02,000 --> 00:16:06,000 simplest explanation for this? 219 00:16:06,000 --> 00:16:10,000 Strain B is different somehow. It's resistant to the virus. I 220 00:16:10,000 --> 00:16:14,000 don't know, the virus has to come in and do 221 00:16:14,000 --> 00:16:18,000 various things, and strain B isn't compatible with 222 00:16:18,000 --> 00:16:23,000 the virus or something like that. No big deal. So it's a resistant 223 00:16:23,000 --> 00:16:28,000 strain. But, occasionally you'd get a plaque. 224 00:16:28,000 --> 00:16:33,000 Very occasionally, you'd have an occasional plaque. 225 00:16:33,000 --> 00:16:38,000 So now, how would this be? I said the strain was 226 00:16:38,000 --> 00:16:44,000 resistant. How could there be an occasional plaque? 227 00:16:44,000 --> 00:16:50,000 Mutation in, could it be a mutation in the bacteria? 228 00:16:50,000 --> 00:16:56,000 Sorry. Well, if it was a mutation in the bacteria 229 00:16:56,000 --> 00:16:59,000 there would be one bacteria that had the mutation. 
It was now 230 00:16:59,000 --> 00:17:03,000 susceptible, and it would die. But, the lawn would 231 00:17:03,000 --> 00:17:06,000 kind of grow because the cells around it wouldn't have a mutation. 232 00:17:06,000 --> 00:17:10,000 So it's probably not a mutation in the bacteria 233 00:17:10,000 --> 00:17:13,000 but what could be? Maybe a mutation of the virus: what 234 00:17:13,000 --> 00:17:17,000 if it was a mutation in the virus that was able to overcome 235 00:17:17,000 --> 00:17:21,000 the resistance? Ah, so that's OK. 236 00:17:21,000 --> 00:17:25,000 So, what this must be is the existence of a resistant 237 00:17:25,000 --> 00:17:30,000 virus that is a virus that can overcome the resistance of 238 00:17:30,000 --> 00:17:36,000 the bacteria. So far: perfectly normal, no problem. Now, let's 239 00:17:36,000 --> 00:17:43,000 do the following experiment. Let's take this resistant virus, 240 00:17:43,000 --> 00:17:50,000 and grow it, again, on strain A and grow it on 241 00:17:50,000 --> 00:17:56,000 strain b. What do you think is going to happen when I grow it on 242 00:17:56,000 --> 00:18:02,000 strain A? It'll grow lots of plaques. It still grows 243 00:18:02,000 --> 00:18:08,000 on strain A, and now what's going to happen when I grow 244 00:18:08,000 --> 00:18:13,000 it on strain B? If this was really a mutation that 245 00:18:13,000 --> 00:18:17,000 made it able to grow on strain b then it gets lots of plaques because 246 00:18:17,000 --> 00:18:21,000 it's now gained the ability to grow on strain B, and sure 247 00:18:21,000 --> 00:18:27,000 enough, that's what happens. So, 248 00:18:27,000 --> 00:18:35,000 there's nothing funky yet. But now, suppose I take one of 249 00:18:35,000 --> 00:18:41,000 these resistant viruses that I isolated here on strain B, 250 00:18:41,000 --> 00:18:44,000 I grow it again here on strain A. It grows. I grow it on strain B. 251 00:18:44,000 --> 00:18:47,000 It grows. 
If I take it again from strain B and 252 00:18:47,000 --> 00:18:50,000 I repeat this, it'll still grow on strain A and 253 00:18:50,000 --> 00:18:54,000 still grow on strain B. Let's take one, 254 00:18:54,000 --> 00:19:00,000 though, from strain A. It's the resistant one which we have just now 255 00:19:00,000 --> 00:19:08,000 happened to have grown on strain A. And now, 256 00:19:08,000 --> 00:19:19,000 let's grow it again on strain A versus on strain B. And sure 257 00:19:19,000 --> 00:19:24,000 enough, it continues to grow on strain A, no problem. 258 00:19:24,000 --> 00:19:29,000 And we grow it now on strain B. And, what shall we 259 00:19:29,000 --> 00:19:32,000 get? Well, it should grow on strain B, right, because it was a mutant 260 00:19:32,000 --> 00:19:36,000 virus, and it gained the ability to grow on either. 261 00:19:36,000 --> 00:19:42,000 We passage it through B, it grows. We passage it through A. 262 00:19:42,000 --> 00:19:48,000 But the answer was nothing, no growth. How can that be? 263 00:19:48,000 --> 00:19:53,000 We had a virus. We agreed that was a mutant virus that 264 00:19:53,000 --> 00:19:55,000 had picked up the ability to grow on strain B, and we demonstrated 265 00:19:55,000 --> 00:19:58,000 it has now on either A or B. 266 00:19:58,000 --> 00:20:02,000 We then reached in, and grabbed a copy of it here from 267 00:20:02,000 --> 00:20:06,000 strain A, having grown on strain A, and we try it again and it now 268 00:20:06,000 --> 00:20:10,000 won't grow on strain B. If this was a mutation, 269 00:20:10,000 --> 00:20:15,000 I mean, maybe the mutation reverted, right? 270 00:20:15,000 --> 00:20:20,000 It was a reversion of the mutation. It mutated back. Is that plausible? 271 00:20:20,000 --> 00:20:24,000 No, come on. The chance that all of the copies there would mutate 272 00:20:24,000 --> 00:20:26,000 back, come on. I mean, you could repeat this 273 00:20:26,000 --> 00:20:30,000 several times and this is always what happens. 
274 00:20:30,000 --> 00:20:34,000 What does that tell you about this mutation in the virus? 275 00:20:34,000 --> 00:20:39,000 It can't be a mutation of the virus because 276 00:20:39,000 --> 00:20:42,000 if it was a mutation, it would be transmitted through. 277 00:20:42,000 --> 00:20:46,000 But, passing through strain A makes it lose 278 00:20:46,000 --> 00:20:50,000 its ability to grow on strain B. But as long as you keep passing it 279 00:20:50,000 --> 00:20:55,000 through strain B, it can grow on strain B. This is 280 00:20:55,000 --> 00:20:59,000 not your typical genetics. So, Salvador Luria loved this. 281 00:20:59,000 --> 00:21:04,000 And, he really worked out what was going on. And somehow, 282 00:21:04,000 --> 00:21:12,000 well, so anyway, they referred to this as strain B 283 00:21:12,000 --> 00:21:20,000 having the ability to restrict the growth of the 284 00:21:20,000 --> 00:21:25,000 virus. Strain B can restrict the growth of the virus. 285 00:21:25,000 --> 00:21:30,000 That's where this word restriction enzyme 286 00:21:30,000 --> 00:21:34,000 comes from. What's really, truly going on here underneath the 287 00:21:34,000 --> 00:21:38,000 shaggy dog story? It took a long time before 288 00:21:38,000 --> 00:21:42,000 the shaggy dog story that Salvador Luria was the one to really 289 00:21:42,000 --> 00:21:47,000 demonstrate is fully worked out. But, what turns out to 290 00:21:47,000 --> 00:21:54,000 be the case is that strain B has a restriction enzyme. 291 00:21:54,000 --> 00:22:02,000 That's how it restricts the growth. It has one of 292 00:22:02,000 --> 00:22:10,000 these enzymes that can cut DNA at a specific place. 293 00:22:10,000 --> 00:22:17,000 When the virus comes into strain B, it injects its DNA, 294 00:22:17,000 --> 00:22:25,000 and the enzyme comes along and cuts the virus's DNA, protecting the 295 00:22:25,000 --> 00:22:31,000 bacteria. It's got its own little defense mechanism: pretty cool, 296 00:22:31,000 --> 00:22:38,000 pretty cool. 
So, any DNA that's introduced, 297 00:22:38,000 --> 00:22:43,000 if it has the sequence here, it'll take G, A, A, T, T, C, the 298 00:22:43,000 --> 00:22:49,000 bacteria cuts it. Wait a second, 299 00:22:49,000 --> 00:22:55,000 the bacteria has its own DNA. Why doesn't it chop up its own 300 00:22:55,000 --> 00:23:01,000 chromosome? Well, I mean, so one simple possibility would be 301 00:23:01,000 --> 00:23:05,000 that if this thing is looking for the sequence, G, A, A, T, T, C 302 00:23:05,000 --> 00:23:10,000 in the genome, maybe it's the case that the 303 00:23:10,000 --> 00:23:15,000 bacteria has arranged that its own DNA 304 00:23:15,000 --> 00:23:19,000 never has a G, A, A, T, T, C. That would be a 305 00:23:19,000 --> 00:23:24,000 simple solution, right? But is it a plausible 306 00:23:24,000 --> 00:23:29,000 solution? Why not? But just statistically, 307 00:23:29,000 --> 00:23:33,000 how often do I expect to encounter a G, A, A, T, T, C? What's 308 00:23:33,000 --> 00:23:39,000 the frequency of any given six letter word in a four letter 309 00:23:39,000 --> 00:23:44,000 alphabet? It's about one in 4^6. So, about one in 4^6 positions will 310 00:23:44,000 --> 00:23:48,000 be a G, A, A, T, T, C, and that's about 4,000 311 00:23:48,000 --> 00:23:52,000 letters. So, every 4,000 letters, I expect to encounter a 312 00:23:52,000 --> 00:23:56,000 G, A, A, T, T, C. How big is the E. coli genome? 313 00:23:56,000 --> 00:23:59,000 4 million letters. So, how many G, A, A, 314 00:23:59,000 --> 00:24:02,000 T, T, Cs will there be? About 1,000 315 00:24:02,000 --> 00:24:06,000 of them. It's just not plausible to imagine that it doesn't have the 316 00:24:06,000 --> 00:24:10,000 sites. So, your idea is that if it has these sites, 317 00:24:10,000 --> 00:24:12,000 it's got to arrange to protect its own sites. So, how is 318 00:24:12,000 --> 00:24:18,000 it going to protect its own sites? 319 00:24:18,000 --> 00:24:26,000 Covers it or something. 
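The estimate of how many G, A, A, T, T, C sites the bacterial chromosome should contain can be written out as a back-of-the-envelope sketch. It assumes, as the lecture implicitly does, that the four bases are equally frequent and that positions are independent; real genomes deviate from this. The variable names are illustrative.

```python
ALPHABET_SIZE = 4              # A, C, G, T
SITE_LENGTH = 6                # GAATTC is a six-letter word
ECOLI_GENOME_BP = 4_000_000    # round figure used in the lecture

# A given six-letter word is expected about once per 4^6 positions.
positions_per_site = ALPHABET_SIZE ** SITE_LENGTH

# Expected number of sites in the genome under the uniform-random assumption.
expected_sites = ECOLI_GENOME_BP / positions_per_site

print(f"one site per {positions_per_site} letters")
print(f"about {expected_sites:.0f} sites in the genome")
```

With roughly a thousand expected sites, a genome that happened to contain none would be wildly improbable, which is the argument for a separate protection mechanism.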
You could imagine something covers 320 00:24:26,000 --> 00:24:32,000 it or something, but you want to alter your own, 321 00:24:32,000 --> 00:24:35,000 so it turns out you're exactly right. What happens is there is 322 00:24:35,000 --> 00:24:40,000 an enzyme that comes along, and at this position, 323 00:24:40,000 --> 00:24:47,000 attaches a methyl group. It modifies the DNA 324 00:24:47,000 --> 00:24:53,000 by attaching a methyl group. It turns out that that methyl group 325 00:24:53,000 --> 00:24:59,000 is enough to prevent the restriction enzyme from binding. 326 00:24:59,000 --> 00:25:07,000 So, this blocks the restriction enzyme. So, that way the bacteria 327 00:25:07,000 --> 00:25:14,000 is able to distinguish between its own DNA, 328 00:25:14,000 --> 00:25:21,000 which is methylated, and the viral DNA. So, wait a second, how does 329 00:25:21,000 --> 00:25:27,000 that explain my virus that manage to grow? How did my virus manage to 330 00:25:27,000 --> 00:25:33,000 grow? It would need to have gotten itself modified also to be protected. 331 00:25:33,000 --> 00:25:40,000 Could that happen by chance? What if the methylation enzyme, 332 00:25:40,000 --> 00:25:48,000 the methylase, which is floating around in the cell, 333 00:25:48,000 --> 00:25:56,000 “accidentally” methylated the virus's DNA? What would 334 00:25:56,000 --> 00:26:02,000 happen then? The virus would become immune. 335 00:26:02,000 --> 00:26:08,000 So, suppose the bacteria was pretty clever, and had a 336 00:26:08,000 --> 00:26:11,000 lot more restriction enzyme, and only a little bit of methylase? 337 00:26:11,000 --> 00:26:14,000 Well, you'd imagine that most of the 338 00:26:14,000 --> 00:26:17,000 time the restriction enzyme would cut up the viral DNA first. 339 00:26:17,000 --> 00:26:20,000 But every once in a while, the methylase 340 00:26:20,000 --> 00:26:24,000 would get there first and protect the virus's DNA. 
341 00:26:24,000 --> 00:26:28,000 That becomes an immune virus because it can't 342 00:26:28,000 --> 00:26:32,000 be cut by the enzyme anymore. And, if I take that, and I grow it 343 00:26:32,000 --> 00:26:36,000 again on strain B, it'll now produce lots of plaques because 344 00:26:36,000 --> 00:26:42,000 it was methylated. And, if I grow it 345 00:26:42,000 --> 00:26:44,000 again on strain B, it remains methylated because once 346 00:26:44,000 --> 00:26:46,000 it's methylated and comes into the cell, it's not 347 00:26:46,000 --> 00:26:50,000 cut. And so, its descendants will get methylated. 348 00:26:50,000 --> 00:26:54,000 But, what happens if I ever grow that methylated virus 349 00:26:54,000 --> 00:26:59,000 on strain A? Strain A doesn't have the restriction enzyme, 350 00:26:59,000 --> 00:27:04,000 and it doesn't have the methylase. So, the 351 00:27:04,000 --> 00:27:08,000 progeny phage that grew up on strain A aren't methylated. 352 00:27:08,000 --> 00:27:13,000 They're no longer protected. The protection that the 353 00:27:13,000 --> 00:27:16,000 virus has is the protection that comes from this methylation enzyme. 354 00:27:16,000 --> 00:27:20,000 It's not the sequence of the DNA. It's 355 00:27:20,000 --> 00:27:23,000 the attachment of these methyl groups. And so, 356 00:27:23,000 --> 00:27:26,000 it turns out that if you ever pass this virus through strain A, 357 00:27:26,000 --> 00:27:39,000 passage through strain A, the resulting DNA is 358 00:27:39,000 --> 00:27:48,000 unmethylated. And now, it can be cut. 359 00:27:48,000 --> 00:27:52,000 And it can be cut. Well, this explained the weird 360 00:27:52,000 --> 00:27:56,000 results of Luria, that somehow bacteria had a complex 361 00:27:56,000 --> 00:27:59,000 defense mechanism of a restriction enzyme 362 00:27:59,000 --> 00:28:02,000 and a cognate methylase. The restriction enzyme would cut 363 00:28:02,000 --> 00:28:06,000 the sequence. 
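The passage experiments can be followed with a toy model in which the only thing a phage stock carries forward is whether its DNA is methylated. This is a deliberately simplified sketch: the class and function names are illustrative, and the rare event in which the methylase reaches incoming DNA before the restriction enzyme is not simulated; it is represented by starting from a stock marked as methylated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    name: str
    has_restriction_enzyme: bool
    has_methylase: bool


STRAIN_A = Strain("A", has_restriction_enzyme=False, has_methylase=False)
STRAIN_B = Strain("B", has_restriction_enzyme=True, has_methylase=True)


def grows_on(phage_methylated, strain):
    """Phage DNA survives if it is methylated or if the host cannot cut it."""
    return phage_methylated or not strain.has_restriction_enzyme


def passage(phage_methylated, strain):
    """Grow a stock on a strain; return (grew, methylation state of the progeny)."""
    if not grows_on(phage_methylated, strain):
        return False, phage_methylated
    # Progeny DNA is newly made in this host, so it is methylated
    # only if the host carries the methylase.
    return True, strain.has_methylase


# Ordinary stock: plaques on A, none on B.
print(grows_on(False, STRAIN_A), grows_on(False, STRAIN_B))

# Rare escapee that got methylated in strain B.
escapee = True
grew_b, after_b = passage(escapee, STRAIN_B)   # keeps growing on B
grew_a, after_a = passage(after_b, STRAIN_A)   # grows on A, progeny unmethylated
print(grew_b, after_b, grew_a, after_a)
print(grows_on(after_a, STRAIN_B))             # restricted again on B
```

The model reproduces the non-genetic pattern in the story: protection is kept through any number of passages on strain B and lost after a single passage on strain A.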
The chromosome would be protected by 364 00:28:06,000 --> 00:28:09,000 methylating that site, and usually it would work fine. 365 00:28:09,000 --> 00:28:12,000 Occasionally the bacterial virus would get methylated. 366 00:28:12,000 --> 00:28:15,000 It would be protected as long as it continues to go through strains that 367 00:28:15,000 --> 00:28:18,000 have this restriction-methylation system. That was it. Now, this 368 00:28:18,000 --> 00:28:21,000 shaggy dog story took a couple of decades to work out, 369 00:28:21,000 --> 00:28:25,000 and eventually led to Nobel prizes for the discovery 370 00:28:25,000 --> 00:28:28,000 of restriction enzymes. They're extremely important because 371 00:28:28,000 --> 00:28:32,000 although bacteria do this to protect themselves, they have also 372 00:28:32,000 --> 00:28:35,000 given us the perfect tool to now cut DNA where we want to 373 00:28:35,000 --> 00:28:38,000 cut DNA. Now, what if you wanted to cut at a G, 374 00:28:38,000 --> 00:28:42,000 A, A, T, T, C? You've got Eco R1. 375 00:28:42,000 --> 00:28:47,000 But what if you wanted to cut it at another sequence? 376 00:28:47,000 --> 00:28:57,000 Well, it turns out that if you want to cut it at G, G, A, T, C, C 377 00:28:57,000 --> 00:29:09,000 there's an enzyme called Bam H1. If you want to cut it at 378 00:29:09,000 --> 00:29:14,000 A, A, G, C, T, T or A, A, G, C, 379 00:29:14,000 --> 00:29:20,000 T, T, there's an enzyme called Hind 3. If you 380 00:29:20,000 --> 00:29:26,000 want to cut it at just G, A, T, C like this, C, T, A, 381 00:29:26,000 --> 00:29:34,000 G, an enzyme called Mbo 1. And, there are enzymes that 382 00:29:34,000 --> 00:29:41,000 cut it this way, enzymes that cut it this way, 383 00:29:41,000 --> 00:29:45,000 enzymes that cut it this way, enzymes that recognize four bases, 384 00:29:45,000 --> 00:29:49,000 six bases. There are even enzymes that recognize eight bases. 
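The four enzymes named here can be put in a small lookup table to show two things: each recognition sequence is a reverse palindrome, and shorter sites are expected to occur more often. This is a sketch only. The enzyme names are written in their conventional form (EcoRI, BamHI, HindIII, MboI), the test sequence is made up for illustration, and the expected spacing assumes equally frequent, independent bases.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Recognition sequences as read out in the lecture.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "MboI": "GATC",
}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(seq, site):
    """Return the start index of every (possibly overlapping) occurrence of site."""
    hits = []
    pos = seq.find(site)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(site, pos + 1)
    return hits


def expected_spacing(site):
    """Average distance between sites under the uniform-random assumption."""
    return 4 ** len(site)


example = "TTGGATCCAAGAATTCGGAAGCTTCC"  # hypothetical sequence, for illustration only

for name, site in RECOGNITION_SITES.items():
    palindromic = reverse_complement(site) == site
    print(name, site, palindromic, expected_spacing(site), find_sites(example, site))
```

Note that the four-base MboI site sits inside the six-base BamHI site, so every BamHI site is also an MboI site. By the same arithmetic, an eight-base recognition site would be expected only about once every 4^8 = 65,536 letters.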
385 00:29:49,000 --> 00:29:53,000 It turns out that bacteria have elaborated zillions of different 386 00:29:53,000 --> 00:29:57,000 restriction enzymes that recognize different sequences. This is 387 00:29:57,000 --> 00:30:00,000 perfect for molecular biologists. Bacteria, 388 00:30:00,000 --> 00:30:02,000 of course, are much smarter than we are, having been at this much 389 00:30:02,000 --> 00:30:06,000 longer, and have developed all of these tools for engineering. 390 00:30:06,000 --> 00:30:10,000 All we have to do is borrow them. So how do you get Eco R1? 391 00:30:10,000 --> 00:30:13,000 You grow up that strain of E. coli; you purify Eco R1. 392 00:30:13,000 --> 00:30:17,000 And how do you get Hind 3? You grow up 393 00:30:17,000 --> 00:30:20,000 a strain of Haemophilus influenzae. You purify the enzyme. At least, 394 00:30:20,000 --> 00:30:24,000 that's how primitive molecular biologists did it. If you 395 00:30:24,000 --> 00:30:27,000 wanted to work with a restriction enzyme, you'd grow up the bacteria. 396 00:30:27,000 --> 00:30:30,000 You'd purify the enzyme yourself, and you would just use 397 00:30:30,000 --> 00:30:34,000 it in your laboratory. Of course today what does a modern 398 00:30:34,000 --> 00:30:39,000 molecular biologist do if he or she should want Hind 3? 399 00:30:39,000 --> 00:30:47,000 It's in the catalog. So the catalog has 200 restriction 400 00:30:47,000 --> 00:30:55,000 enzymes. Yup, PSI-1 is new, on sale, 500 units for $400. 401 00:30:55,000 --> 00:31:01,000 Let's see what Eco R1 is going for. 402 00:31:01,000 --> 00:31:04,000 Eco R1: look at this, 50,000 units $200. That's a good 403 00:31:04,000 --> 00:31:07,000 price for Eco R1 because it's a very famous 404 00:31:07,000 --> 00:31:10,000 enzyme here. So all you have to do is you give them your credit card 405 00:31:10,000 --> 00:31:15,000 number and you have it tomorrow by FedEx. So that's how restriction 406 00:31:15,000 --> 00:31:21,000 enzymes are obtained today. 
So, next up, we can cut DNA any 407 00:31:21,000 --> 00:31:26,000 place we want to. We now need to glue DNA together. 408 00:31:26,000 --> 00:31:32,000 Suppose I cut DNA, human DNA, and I'm 409 00:31:32,000 --> 00:31:34,000 going to cut it. I'll just take human DNA, 410 00:31:34,000 --> 00:31:37,000 your DNA, which I've purified, and I'm going to cut 411 00:31:37,000 --> 00:31:46,000 it at all its Eco R1 sites. I can take any other DNA I want. 412 00:31:46,000 --> 00:31:50,000 I don't know, I could take zebra DNA. I could take anything and I could 413 00:31:50,000 --> 00:31:56,000 also cut it at Eco R1 sites. I could mix them together, 414 00:31:56,000 --> 00:32:02,000 and after mixing them together the fragments will float around and 415 00:32:02,000 --> 00:32:08,000 remember this down here has T, T, 416 00:32:08,000 --> 00:32:16,000 A, A. This fragment over here from some other piece 417 00:32:16,000 --> 00:32:23,000 T, T, A, A, this could be human DNA. This could be zebra DNA if you want 418 00:32:23,000 --> 00:32:29,000 to. It doesn't matter. It could be bacterial DNA. 419 00:32:29,000 --> 00:32:33,000 These fragments overlap. They'll hydrogen bond a little bit, 420 00:32:33,000 --> 00:32:37,000 but that of course won't introduce a covalent bond here. 421 00:32:37,000 --> 00:32:42,000 I'd really like to make a covalent bond. I would like to attach the 422 00:32:42,000 --> 00:32:44,000 piece of DNA from one source to the piece of DNA from the 423 00:32:44,000 --> 00:32:47,000 other source by doing the opposite of the 424 00:32:47,000 --> 00:32:49,000 restriction enzyme. The restriction enzyme cut at these 425 00:32:49,000 --> 00:32:52,000 locations. I would now like to catalyze 426 00:32:52,000 --> 00:32:55,000 the rejoining of the sugar phosphate backbone here. 427 00:32:55,000 --> 00:32:59,000 So I would like to rejoin the sugar phosphate backbone. I 428 00:32:59,000 --> 00:33:03,000 have a hydroxyl here. 
I have a phosphate here, 429 00:33:03,000 --> 00:33:08,000 and I would like to ligate them together. So 430 00:33:08,000 --> 00:33:11,000 how do I manage to ligate? What kind of fancy chemistry do I 431 00:33:11,000 --> 00:33:15,000 do to ligate these pieces of DNA together? I don't do any fancy 432 00:33:15,000 --> 00:33:20,000 chemistry. I again sit at the feet of bacteria who have solved all 433 00:33:20,000 --> 00:33:22,000 these problems before. And I ask bacteria, how do you do 434 00:33:22,000 --> 00:33:24,000 this? And they say, well, we have an enzyme called 435 00:33:24,000 --> 00:33:26,000 ligase. So, you purify ligase from bacteria, 436 00:33:26,000 --> 00:33:29,000 you add that, and ligase ligates the fragments together. Why 437 00:33:29,000 --> 00:33:33,000 do bacteria have an enzyme ligase? 438 00:33:33,000 --> 00:33:38,000 For repair of their own DNA. Things go wrong; this is part of the 439 00:33:38,000 --> 00:33:42,000 DNA maintenance scheme of bacteria. They have an enzyme 440 00:33:42,000 --> 00:33:46,000 ligase to repair their own breaks in DNA and, obligingly, you 441 00:33:46,000 --> 00:33:50,000 can purify DNA ligase. So you add ligase. Today, 442 00:33:50,000 --> 00:33:54,000 of course, if you need a ligase, 443 00:33:54,000 --> 00:33:56,000 how do you get it? It's in the catalog, 444 00:33:56,000 --> 00:33:59,000 absolutely. So, you can glue together any of those things you 445 00:33:59,000 --> 00:34:04,000 want. All right, next up, what DNA do I want to stick 446 00:34:04,000 --> 00:34:10,000 together? I mean, here I made a silly example. 447 00:34:10,000 --> 00:34:15,000 I'm going to stick some human DNA to some zebra 448 00:34:15,000 --> 00:34:19,000 DNA. Why do that? I mean, just to show you that I can 449 00:34:19,000 --> 00:34:22,000 do it, right? I'm just demonstrating that I could stick any 450 00:34:22,000 --> 00:34:25,000 DNA to any DNA. 
Remember, once I've 451 00:34:25,000 --> 00:34:27,000 got a piece of DNA it doesn't know whether it came from a human or a 452 00:34:27,000 --> 00:34:31,000 zebra. It's just the molecule. You can stick the molecules together, 453 00:34:31,000 --> 00:34:36,000 right? But what do I really want to attach my human DNA to? 454 00:34:36,000 --> 00:34:45,000 I want to attach it to some other DNA that 455 00:34:45,000 --> 00:34:59,000 has the ability to grow on its own within bacteria. Vectors: I need 456 00:34:59,000 --> 00:35:05,000 to make, here's what I would really like. I would like to have 457 00:35:05,000 --> 00:35:13,000 a piece of DNA that has some sequences that contain the 458 00:35:13,000 --> 00:35:23,000 recognition sites for replication. I'd like to have some replication 459 00:35:23,000 --> 00:35:31,000 initiation sites here. So, a piece of DNA that, 460 00:35:31,000 --> 00:35:36,000 remember, because the bacterial chromosome itself, 461 00:35:36,000 --> 00:35:42,000 here's my bacteria, the bacteria's chromosome replicates 462 00:35:42,000 --> 00:35:48,000 itself, and it has the ability to start DNA replication at multiple 463 00:35:48,000 --> 00:35:55,000 sites called origins of replication. But, what I 464 00:35:55,000 --> 00:35:59,000 would really like is to be able to construct in the laboratory a 465 00:35:59,000 --> 00:36:07,000 synthetic piece of DNA that also would function as an 466 00:36:07,000 --> 00:36:19,000 origin of replication because then what I could do is in vitro take 467 00:36:19,000 --> 00:36:24,000 my piece of DNA, attach it to this vector, 468 00:36:24,000 --> 00:36:29,000 and it would now have the ability to grow in 469 00:36:29,000 --> 00:36:33,000 the bacteria. How am I going to make a piece of DNA? 
470 00:36:33,000 --> 00:36:37,000 What kind of engineering tricks can we do to create 471 00:36:37,000 --> 00:36:41,000 a small piece of DNA that has all the machinery needed to 472 00:36:41,000 --> 00:36:45,000 be able to be copied and replicated just like bacterial 473 00:36:45,000 --> 00:36:50,000 chromosomes? That's a pretty fancy feat of engineering. 474 00:36:50,000 --> 00:36:55,000 How are you going to do that? Sorry? OK, 475 00:36:55,000 --> 00:36:59,000 so who are you going to ask? If you wanted to do this, you're 476 00:36:59,000 --> 00:37:04,000 going to ask the experts. Who are the experts? Viruses or 477 00:37:04,000 --> 00:37:07,000 bacteria, or basically, if you want to do anything, 478 00:37:07,000 --> 00:37:10,000 the place to ask is the folks who have the most 479 00:37:10,000 --> 00:37:12,000 experience. And, the folks who have the most 480 00:37:12,000 --> 00:37:15,000 experience are almost always prokaryotic organisms because they 481 00:37:15,000 --> 00:37:18,000 are by far the most evolved things on 482 00:37:18,000 --> 00:37:21,000 this planet. Anything that can replicate itself and grow every 20 483 00:37:21,000 --> 00:37:24,000 minutes or something like that has had a lot more 484 00:37:24,000 --> 00:37:26,000 generations of evolution than you have. And therefore, 485 00:37:26,000 --> 00:37:29,000 they are much more optimized than we are. And so you go ask and say, 486 00:37:29,000 --> 00:37:33,000 has any bacteria worked out how to do this? Turns out bacteria have 487 00:37:33,000 --> 00:37:37,000 worked out how to do this just fine. In 488 00:37:37,000 --> 00:37:43,000 fact, most bacteria, at least many bacteria, 489 00:37:43,000 --> 00:37:49,000 contain within them, in addition to their own chromosome, 490 00:37:49,000 --> 00:37:57,000 small circles of DNA. These are called episomes. 491 00:37:57,000 --> 00:38:06,000 This is the chromosome. Epi means on top of 492 00:38:06,000 --> 00:38:10,000 or in addition to. 
So in addition to the chromosome, 493 00:38:10,000 --> 00:38:14,000 there's an episome. The episome is in fact an autonomously replicating 494 00:38:14,000 --> 00:38:22,000 piece of DNA that has an origin. And it replicates. Why do bacteria 495 00:38:22,000 --> 00:38:30,000 have episomes? It turns out episomes 496 00:38:30,000 --> 00:38:34,000 often contain genes. One of the genes they contain, 497 00:38:34,000 --> 00:38:38,000 or some of the types of genes they contain, are resistance 498 00:38:38,000 --> 00:38:44,000 genes. There might be, for example, a penicillin resistance 499 00:38:44,000 --> 00:38:50,000 gene contained on an episome, or a streptomycin resistance gene. 500 00:38:50,000 --> 00:38:56,000 It turns out the bacteria have these 501 00:38:56,000 --> 00:39:02,000 episomes containing resistance genes, and they're not in the chromosome. 502 00:39:02,000 --> 00:39:08,000 They're separate. Now, why would they do that? 503 00:39:08,000 --> 00:39:15,000 It turns out when a bacterium dies and a cell cracks open, the 504 00:39:15,000 --> 00:39:21,000 DNA spills out. The next-door neighbor bacteria has 505 00:39:21,000 --> 00:39:25,000 mechanisms to suck up DNA from the environment. You never 506 00:39:25,000 --> 00:39:29,000 know. It might find something interesting out there. 507 00:39:29,000 --> 00:39:33,000 So, it turns out that bacteria are rather promiscuously exchanging 508 00:39:33,000 --> 00:39:37,000 pieces of DNA all the time. And so, 509 00:39:37,000 --> 00:39:43,000 a bacteria that has an episome that has a penicillin resistance gene 510 00:39:43,000 --> 00:39:47,000 can spread it to other bacteria, and it's very nice. It's compact. 511 00:39:47,000 --> 00:39:51,000 It's on its own little episome, autonomously replicating 512 00:39:51,000 --> 00:39:56,000 piece of DNA. This is great for bacteria wanting to spread drug 513 00:39:56,000 --> 00:40:00,000 resistance. 
It's not good for human populations, 514 00:40:00,000 --> 00:40:03,000 for example, because this is how drug resistance spreads through 515 00:40:03,000 --> 00:40:06,000 populations. This is why we have spreads of 516 00:40:06,000 --> 00:40:10,000 penicillin resistance. Now, of course, wait a second, 517 00:40:10,000 --> 00:40:15,000 this whole mechanism of spreading drug resistance, we've only had 518 00:40:15,000 --> 00:40:20,000 antibiotics since the 1940s. How did bacteria devise this so 519 00:40:20,000 --> 00:40:25,000 quickly? Sorry? Many generations since 1945? 520 00:40:25,000 --> 00:40:30,000 That would be very impressive. 521 00:40:30,000 --> 00:40:34,000 Yeah, but, I mean, why do they have this episome mechanism, the 522 00:40:34,000 --> 00:40:39,000 ability to spread DNA and all that? That's an awful lot 523 00:40:39,000 --> 00:40:45,000 to evolve in 50 years. Yeah? Something natural like 524 00:40:45,000 --> 00:40:51,000 penicillin. It turns out, we didn't 525 00:40:51,000 --> 00:40:54,000 think of penicillin. Who thought of penicillin? 526 00:40:54,000 --> 00:40:58,000 Fungi. Right, again, we learn from the lower organisms. Penicillin 527 00:40:58,000 --> 00:41:01,000 comes from fungi. Bacteria have been fighting 528 00:41:01,000 --> 00:41:04,000 off penicillin for millions and tens of millions of years. So, 529 00:41:04,000 --> 00:41:07,000 we may be very proud of our penicillin and all that. 530 00:41:07,000 --> 00:41:09,000 But, they've been at this for a very long time. 531 00:41:09,000 --> 00:41:11,000 This is about war between bacteria and fungi. 532 00:41:11,000 --> 00:41:14,000 That's what this is, OK? So, that's why these things are 533 00:41:14,000 --> 00:41:17,000 here. They're here so that bacteria can have these 534 00:41:17,000 --> 00:41:20,000 resistance genes against fungi and things like that that make 535 00:41:20,000 --> 00:41:24,000 antibiotics. Antibiotics are natural. 
We've made a few 536 00:41:24,000 --> 00:41:27,000 new ones, but most of the antibiotics have been made by nature. 537 00:41:27,000 --> 00:41:30,000 And so, if I wanted to replicate DNA, 538 00:41:30,000 --> 00:41:35,000 if I wanted to attach my human DNA to a piece of DNA that's 539 00:41:35,000 --> 00:41:41,000 capable of autonomous replication, autonomously replicating circles of 540 00:41:41,000 --> 00:41:47,000 DNA, these autonomously replicating circles of DNA are also called 541 00:41:47,000 --> 00:41:50,000 plasmids. And that's the word we'll mostly use for them, 542 00:41:50,000 --> 00:41:54,000 plasmids. All I need to do is purify a plasmid from 543 00:41:54,000 --> 00:41:57,000 a bacteria. So, I find a bacteria that has plasmids. 544 00:41:57,000 --> 00:42:01,000 I purify the plasmid, and then I can cut open the plasmid 545 00:42:01,000 --> 00:42:07,000 at the Eco R1 site, OK? So, this plasmid will have an 546 00:42:07,000 --> 00:42:13,000 ORI, an origin of replication. I'll cut 547 00:42:13,000 --> 00:42:19,000 it open at the Eco R1 site. I'll take human DNA fragments that 548 00:42:19,000 --> 00:42:24,000 I've cut with Eco R1. I'll mix them with plasmid DNA that has 549 00:42:24,000 --> 00:42:30,000 been opened up, has an origin. Ligase will come 550 00:42:30,000 --> 00:42:34,000 along, join this up, and now I have a circle of DNA that 551 00:42:34,000 --> 00:42:38,000 has all the machinery to autonomously replicate, 552 00:42:38,000 --> 00:42:41,000 plus my human DNA. Now, if I wanted to get a vector, or an 553 00:42:41,000 --> 00:42:45,000 honest to goodness plasmid, I can go to a bacteria, 554 00:42:45,000 --> 00:42:50,000 grow it up, purify the plasmid, and cut it. Or alternatively, 555 00:42:50,000 --> 00:42:55,000 if I needed the plasmid, say, tomorrow, 556 00:42:55,000 --> 00:42:57,000 it's in the catalog. The next section of the catalog has 557 00:42:57,000 --> 00:42:59,000 a long list of plasmids here. 
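The cut-and-ligate step just described can be sketched in a few lines of Python. This is a toy model under loud assumptions: only the top strand is tracked as a string, a circle is represented by a string whose end joins its start, EcoRI is taken to cut between the G and the AATTC (its standard cut position), the plasmid is assumed to carry exactly one site, and insert orientation is ignored. The sequences and the function names `digest_linear`, `open_plasmid`, and `ligate_into` are invented for illustration.

```python
ECORI = "GAATTC"
CUT_OFFSET = 1  # assumed cut position: between G and AATTC on the top strand


def digest_linear(dna):
    """Cut a linear top-strand sequence at every EcoRI site."""
    fragments, start = [], 0
    pos = dna.find(ECORI)
    while pos != -1:
        fragments.append(dna[start:pos + CUT_OFFSET])
        start = pos + CUT_OFFSET
        pos = dna.find(ECORI, pos + 1)
    fragments.append(dna[start:])
    return fragments


def open_plasmid(circle):
    """Linearize a circular plasmid assumed to have exactly one EcoRI site."""
    doubled = circle + circle  # lets a site that spans the wraparound be found
    pos = doubled.find(ECORI)
    if pos == -1:
        raise ValueError("plasmid has no EcoRI site")
    cut = (pos + CUT_OFFSET) % len(circle)
    return circle[cut:] + circle[:cut]


def ligate_into(opened_plasmid, insert):
    """Join an insert to an opened plasmid; the result is read as a circle."""
    return opened_plasmid + insert


def count_sites_circular(circle):
    """Count EcoRI sites in a circle, including one spanning the wraparound."""
    return (circle + circle[:len(ECORI) - 1]).count(ECORI)


plasmid = "ACGTGAATTCTTGCA"            # hypothetical circular vector
human = "TTTGAATTCCATCATGAATTCAAA"     # hypothetical stretch of human DNA

opened = open_plasmid(plasmid)
fragments = digest_linear(human)
insert = fragments[1]                   # internal fragment, sticky at both ends
recombinant = ligate_into(opened, insert)

print("opened plasmid:", opened)
print("fragments:", fragments)
print("recombinant circle:", recombinant)
print("EcoRI sites in recombinant:", count_sites_circular(recombinant))
```

In this model the recombinant circle regains an EcoRI site at each junction, which matches the picture of the insert being flanked by the sites it was cut at.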
There's 558 00:42:59,000 --> 00:43:04,000 a plasmid there, right? It's a nice plasmid. 559 00:43:04,000 --> 00:43:09,000 Oh yes, let's see, pUC is a very good plasmid. 560 00:43:09,000 --> 00:43:12,000 pBR322 is a good plasmid. The whole section, all this purple 561 00:43:12,000 --> 00:43:15,000 stuff are the plasmids. So, you can get the plasmids too. 562 00:43:15,000 --> 00:43:17,000 You place one order, you get the restriction enzymes, 563 00:43:17,000 --> 00:43:20,000 you get the ligases, you get the plasmids, no 564 00:43:20,000 --> 00:43:26,000 problem. So, I can then take total human DNA, cut up, cut 565 00:43:26,000 --> 00:43:32,000 up, cut up, cut up, add in plasmid, 566 00:43:32,000 --> 00:43:38,000 and I'm going to ligate together. And then, having ligated my human 567 00:43:38,000 --> 00:43:45,000 DNA to my plasmids, I'm going to mix with 568 00:43:45,000 --> 00:43:55,000 bacteria. I take some bacterial cells. I add my mixture of these 569 00:43:55,000 --> 00:43:58,000 plasmids containing human DNA. And now all I have to do is 570 00:43:58,000 --> 00:44:02,000 persuade the bacteria to suck up my plasmids containing human DNA. 571 00:44:02,000 --> 00:44:08,000 How do I teach bacteria to suck up DNA? They do that for 572 00:44:08,000 --> 00:44:11,000 a living. That's what they do. They're always spreading material. 573 00:44:11,000 --> 00:44:14,000 They have that ability. All we're doing is we're using 574 00:44:14,000 --> 00:44:16,000 their ability. So you get the sense that the kind 575 00:44:16,000 --> 00:44:19,000 of engineering that really works in biology is engineering that 576 00:44:19,000 --> 00:44:22,000 exploits what nature has been doing for a very long time. 577 00:44:22,000 --> 00:44:24,000 Rather than butting your head up against the problem, usually 578 00:44:24,000 --> 00:44:27,000 somebody has solved it, and it's almost always 579 00:44:27,000 --> 00:44:29,000 bacteria. So, you've transformed the bacteria. 
580 00:44:29,000 --> 00:44:32,000 Now, there are a few tricks you can 581 00:44:32,000 --> 00:44:35,000 use to make them a little more transformable. 582 00:44:35,000 --> 00:44:39,000 You can add calcium phosphate, and blah, blah, blah, 583 00:44:39,000 --> 00:44:43,000 but you can sort of persuade them to take up the DNA. And then 584 00:44:43,000 --> 00:44:48,000 all you have to do is plate them out on a plate. 585 00:44:48,000 --> 00:44:55,000 Plate them out fairly dilutely so there are a lot of single bacterial 586 00:44:55,000 --> 00:45:03,000 cells that land on the plate, and wait for them to grow up. Each 587 00:45:03,000 --> 00:45:10,000 one of these had a single plasmid, a different plasmid than 588 00:45:10,000 --> 00:45:16,000 the next guy over. Wait a second, each one? 589 00:45:16,000 --> 00:45:22,000 How do I guarantee that every bacteria in my test 590 00:45:22,000 --> 00:45:28,000 tube took up a plasmid? Is that plausible? I mean, 591 00:45:28,000 --> 00:45:34,000 I can't guarantee that every bacteria is going to take up a 592 00:45:34,000 --> 00:45:38,000 plasmid. Maybe I'll add so much plasmid that every bacteria will 593 00:45:38,000 --> 00:45:43,000 take one up. Oh, but that's a bad idea 594 00:45:43,000 --> 00:45:45,000 because why? Because then a lot of them will take up more than one. 595 00:45:45,000 --> 00:45:47,000 You don't want to do that. You really only 596 00:45:47,000 --> 00:45:50,000 want to have at most one. So, if you were going to arrange so 597 00:45:50,000 --> 00:45:53,000 that at random you only have about one, you've got to 598 00:45:53,000 --> 00:45:56,000 have a lot that are zero. So, this is a problem. I mean, 599 00:45:56,000 --> 00:45:59,000 it's a real waste. My library is going to have large numbers of 600 00:45:59,000 --> 00:46:02,000 bacteria that don't have any plasmid. In fact, this transformation 601 00:46:02,000 --> 00:46:06,000 process is not so efficient. It's not so efficient. 
So, we 
So, we 602 00:46:06,000 --> 00:46:09,000 have a little bit of a problem here, which is 603 00:46:09,000 --> 00:46:13,000 that some of these guys will have human DNA. 604 00:46:13,000 --> 00:46:19,000 But, most of them won't. So, what can I do to arrange that 605 00:46:19,000 --> 00:46:26,000 any bacteria that did not pick up a plasmid 606 00:46:26,000 --> 00:46:34,000 was incapable of growing? Add a resistance gene to the 607 00:46:34,000 --> 00:46:42,000 plasmid. Suppose I were so clever as 608 00:46:42,000 --> 00:46:46,000 to add to that plasmid, penicillin resistance. So, 609 00:46:46,000 --> 00:46:51,000 not just an origin of replication, but suppose I also had a 610 00:46:51,000 --> 00:46:57,000 resistance gene here, say, for penicillin resistance or 611 00:46:57,000 --> 00:47:03,000 streptomycin resistance, or ampicillin tends to be a very big 612 00:47:03,000 --> 00:47:08,000 favorite, ampicillin resistance. Then, 613 00:47:08,000 --> 00:47:12,000 my plasmid would have an ampicillin resistance gene encoded on it, 614 00:47:12,000 --> 00:47:16,000 an enzyme that can, say, break down ampicillin, 615 00:47:16,000 --> 00:47:22,000 and so on. And then, to my Petri plate, I just add ampicillin, 616 00:47:22,000 --> 00:47:26,000 and now, even though most of the bacteria have not 617 00:47:26,000 --> 00:47:29,000 picked up a plasmid, only those bacteria that have picked 618 00:47:29,000 --> 00:47:32,000 up a plasmid have the ampicillin 619 00:47:32,000 --> 00:47:36,000 resistance gene and can grow on an ampicillin containing plate. 620 00:47:36,000 --> 00:47:39,000 Now, how do I get a plasmid with an ampicillin 621 00:47:39,000 --> 00:47:43,000 resistance gene? It's in the catalog. It's 622 00:47:43,000 --> 00:47:46,000 all there, right? In fact, these occur naturally. 623 00:47:46,000 --> 00:47:49,000 You can, with restriction enzymes, move the ampicillin resistance 624 00:47:49,000 --> 00:47:51,000 gene to your favorite plasmid. 
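The trade-off behind "at most one plasmid per cell means a lot of cells with zero" can be made concrete with a small calculation. The sketch below assumes that the number of plasmids a cell takes up follows a Poisson distribution; that is a modeling assumption chosen for illustration (the lecture only says uptake happens "at random"), and the mean values and function names are made up.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def uptake_fractions(mean):
    """Fractions of cells with zero, exactly one, and more than one plasmid."""
    none = poisson_pmf(0, mean)
    one = poisson_pmf(1, mean)
    several = 1.0 - none - one
    return none, one, several


def colonies_on_ampicillin(plasmid_counts):
    """Keep only cells with at least one plasmid, i.e. those carrying the
    resistance gene; cells with zero plasmids do not grow on the plate."""
    return [n for n in plasmid_counts if n >= 1]


# Hypothetical average numbers of plasmids taken up per cell.
for mean in (0.1, 1.0, 3.0):
    none, one, several = uptake_fractions(mean)
    single_among_survivors = one / (one + several)
    print(f"mean={mean}: none={none:.3f} one={one:.3f} "
          f"several={several:.3f} "
          f"single among survivors={single_among_survivors:.3f}")

# Selection step: only plasmid-bearing cells form colonies.
print(colonies_on_ampicillin([0, 0, 1, 0, 2, 0]))
```

Under this assumption, a low average uptake leaves most cells empty, but nearly all of the cells that survive on ampicillin carry a single plasmid, which is the situation the selection step is designed to exploit.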
If you don't like that, you can put 625 00:47:51,000 --> 00:47:53,000 in kanamycin resistance, etc., etc., 626 00:47:53,000 --> 00:47:59,000 etc. So, that's how you do it. So, we've got the big picture here. 627 00:47:59,000 --> 00:48:06,000 We have now gotten a library, the Library of Human 628 00:48:06,000 --> 00:48:19,000 Fragments contained in E coli. The library is a big Petri plate or 629 00:48:19,000 --> 00:48:28,000 many Petri plates, each one of which is a colony. 630 00:48:28,000 --> 00:48:34,000 Each colony has a single vector with an origin, 631 00:48:34,000 --> 00:48:40,000 a resistance marker, and a distinct piece of 632 00:48:40,000 --> 00:48:47,000 human DNA. In this library lives somewhere the gene for Huntington's 633 00:48:47,000 --> 00:48:53,000 disease. Over here is a gene for cystic fibrosis, 634 00:48:53,000 --> 00:48:57,000 over here a gene for Duchenne muscular dystrophy, 635 00:48:57,000 --> 00:49:02,000 over here a gene for diastrophic dysplasia, 636 00:49:02,000 --> 00:49:08,000 over here a gene for etc. etc. The only detail, now, 637 00:49:08,000 --> 00:49:15,000 you've got a library. You've managed to purify each 638 00:49:15,000 --> 00:49:19,000 piece of human DNA away from every other piece of human DNA. 639 00:49:19,000 --> 00:49:21,000 The only question now is how do you use the library? 640 00:49:21,000 --> 00:49:24,000 How do you go to the library and withdraw the correct 641 00:49:24,000 --> 00:49:28,000 volume from the shelf? How do you find the one 642 00:49:28,000 --> 00:49:32,000 you're looking for? So, we have converted the problem 643 00:49:32,000 --> 00:49:35,000 of purification, which in every other form of biochemistry starts by 644 00:49:35,000 --> 00:49:38,000 saying, I'm going to purify something based on its distinctive 645 00:49:38,000 --> 00:49:42,000 properties, to I'm going to randomly purify everything. 
646 00:49:42,000 --> 00:49:46,000 Everything would be purified in its own bacteria, and I've now converted 647 00:49:46,000 --> 00:49:51,000 to the problem of finding the one that I want in my 648 00:49:51,000 --> 00:49:55,000 library. Next time, we'll talk about how you go to the 649 00:49:55,000 --> 00:49:59,000 library and find what you want. 650 00:49:59,000 --> 00:50:04,000 See you then.