1 00:00:15,000 --> 00:00:19,000 We're going to talk today about recombinant DNA. 2 00:00:19,000 --> 00:00:24,000 This is a new section of the class. We'll spend three lectures talking 3 00:00:24,000 --> 00:00:29,000 about recombinant DNA, just sort of a practical how-to 4 00:00:29,000 --> 00:00:34,000 guide with respect to manipulating DNA now that you've learned a bit 5 00:00:34,000 --> 00:00:39,000 about it and how we use it in research and in other aspects of 6 00:00:39,000 --> 00:00:43,000 medicine and agriculture and so on. Now, recombinant DNA has been around 7 00:00:43,000 --> 00:00:47,000 for a long time. And, frankly, it's got a bit of a 8 00:00:47,000 --> 00:00:50,000 bad rap. When people in the general public hear the term recombinant DNA 9 00:00:50,000 --> 00:00:53,000 they sort of conjure up these ideas of mad scientists going around, 10 00:00:53,000 --> 00:00:57,000 you know, cloning bizarre organisms or creating dangerous 11 00:00:57,000 --> 00:01:01,000 forms of life. Although that's possible, 12 00:01:01,000 --> 00:01:05,000 it's not what we typically do with this technology. 13 00:01:05,000 --> 00:01:09,000 In fact, it has some rather benign and other quite important 14 00:01:09,000 --> 00:01:13,000 applications. So, Claudette, I'm not going to get this 15 00:01:13,000 --> 00:01:24,000 board, am I? OK. 16 00:01:24,000 --> 00:01:29,000 Basically what we're going to be talking about is genetic engineering. 17 00:01:29,000 --> 00:01:35,000 That ought to make the budding engineers in the crowd happy. 18 00:01:35,000 --> 00:01:40,000 We're going to manipulate the DNA of organisms and actually transfer 19 00:01:40,000 --> 00:01:46,000 DNA from one organism to another for various purposes. 20 00:01:46,000 --> 00:01:51,000 There are three general tenets that you need to pay attention to 21 00:01:51,000 --> 00:01:57,000 here. The basic goal is to isolate, from a complex genome, a piece of 22 00:01:57,000 --> 00:02:03,000 DNA of interest.
And amplify that piece of DNA, 23 00:02:03,000 --> 00:02:11,000 which could be a specific gene that you're interested in, 24 00:02:11,000 --> 00:02:18,000 or it might be some other sequence within that DNA. 25 00:02:18,000 --> 00:02:25,000 It doesn't have to necessarily be a coding gene that you're interested 26 00:02:25,000 --> 00:02:33,000 in. The purposes of this are to, for example, produce therapeutic 27 00:02:33,000 --> 00:02:38,000 proteins or compounds. In the context of disease, 28 00:02:38,000 --> 00:02:42,000 for example, where individuals have a genetic disease in which they 29 00:02:42,000 --> 00:02:46,000 cannot make a particular enzyme, you might be able to treat that 30 00:02:46,000 --> 00:02:50,000 disease by producing the relevant enzyme via this technology in 31 00:02:50,000 --> 00:02:54,000 another organism and then supply it to that individual. 32 00:02:54,000 --> 00:02:58,000 The organism that you put it in could be a bacterium that produces 33 00:02:58,000 --> 00:03:01,000 it in large quantities. And you purify the enzyme and then 34 00:03:01,000 --> 00:03:05,000 perhaps inject it into the individual. It doesn't have to be 35 00:03:05,000 --> 00:03:08,000 bacteria. You can actually use recombinant DNA technology, 36 00:03:08,000 --> 00:03:12,000 genetic engineering to produce a human protein in, 37 00:03:12,000 --> 00:03:15,000 for example, some farm animal. And there are actually companies 38 00:03:15,000 --> 00:03:19,000 that will introduce human genes into sheep or cows such that you can 39 00:03:19,000 --> 00:03:23,000 purify the recombinant protein from the milk of that organism. 40 00:03:23,000 --> 00:03:27,000 And perhaps it might be folded more properly or modified more 41 00:03:27,000 --> 00:03:31,000 properly than it would be in bacteria. And it might be more 42 00:03:31,000 --> 00:03:35,000 easily purified from the milk than it would be from the bacterial cells.
43 00:03:35,000 --> 00:03:45,000 We can also engineer organisms. 44 00:03:45,000 --> 00:03:49,000 So this is to make different life forms that have different purposes, 45 00:03:49,000 --> 00:03:53,000 different values to us. And I've given you the example in past 46 00:03:53,000 --> 00:03:57,000 lectures about genetically modified foods. And your book talks 47 00:03:57,000 --> 00:04:02,000 about this as well. You can make crops that are 48 00:04:02,000 --> 00:04:06,000 resistant to pesticides. You can make crops that are 49 00:04:06,000 --> 00:04:10,000 resistant to herbicides. You can make crops that stay 50 00:04:10,000 --> 00:04:15,000 fresher longer, taste better, have better shape for 51 00:04:15,000 --> 00:04:19,000 packing, all sorts of things by manipulating the genome of those 52 00:04:19,000 --> 00:04:23,000 organisms such that they now have new and useful properties. 53 00:04:23,000 --> 00:04:28,000 We'll also teach you about how to make transgenic animals. 54 00:04:28,000 --> 00:04:31,000 I've given you one example of this already where you might want to make 55 00:04:31,000 --> 00:04:35,000 a therapeutic protein in the milk of a transgenic cow, 56 00:04:35,000 --> 00:04:39,000 but there are lots of purposes for making transgenic animals. 57 00:04:39,000 --> 00:04:43,000 One important one, especially for me, is to make disease models. 58 00:04:43,000 --> 00:04:46,000 So once we understand a human genetic disease we can use that 59 00:04:46,000 --> 00:04:50,000 information to create animal species, most typically mice that carry those 60 00:04:50,000 --> 00:04:54,000 very same mutations. And then study the disease process 61 00:04:54,000 --> 00:04:58,000 in the context of a laboratory mouse in ways that are difficult or 62 00:04:58,000 --> 00:05:02,000 impossible to do in humans. And we'll teach you how to do that. 
63 00:05:02,000 --> 00:05:07,000 It all flows from the same general principles of manipulating DNA and 64 00:05:07,000 --> 00:05:13,000 transferring it from one organism to another. The value of doing this is 65 00:05:13,000 --> 00:05:23,000 several-fold. 66 00:05:23,000 --> 00:05:28,000 Understanding our or other DNA sequences. 67 00:05:28,000 --> 00:05:34,000 So without this technology, 68 00:05:34,000 --> 00:05:38,000 we actually couldn't have done the genome project. 69 00:05:38,000 --> 00:05:42,000 We need to purify up a lot of our DNA in order to sequence it to 70 00:05:42,000 --> 00:05:45,000 determine its sequence. And likewise of other organisms. 71 00:05:45,000 --> 00:05:49,000 So this was instrumental in bringing us to the point we are 72 00:05:49,000 --> 00:05:53,000 today of understanding our genetic information to the nucleotide level. 73 00:05:53,000 --> 00:05:57,000 As I mentioned, this technology allows us to understand 74 00:05:57,000 --> 00:06:07,000 disease-causing mutations. 75 00:06:07,000 --> 00:06:10,000 Again, we've taught you about that mutations can occur, 76 00:06:10,000 --> 00:06:13,000 disabled enzymes or other proteins cause disease, 77 00:06:13,000 --> 00:06:16,000 but that doesn't tell you why. And using this technology to sort 78 00:06:16,000 --> 00:06:20,000 of dissect what's happening to that individual gene by manipulating it 79 00:06:20,000 --> 00:06:23,000 in different ways, putting it in different contexts we 80 00:06:23,000 --> 00:06:26,000 can begin to understand what the specific consequences of that 81 00:06:26,000 --> 00:06:30,000 mutation are, which ultimately allows us to deal with the disease 82 00:06:30,000 --> 00:06:34,000 more effectively. And another value of just 83 00:06:34,000 --> 00:06:38,000 manipulating DNA, which we'll come to in a future 84 00:06:38,000 --> 00:06:42,000 lecture, is what you hear about in the press as DNA fingerprinting. 
85 00:06:42,000 --> 00:06:46,000 It's a very practical use of this technology in which you want to 86 00:06:46,000 --> 00:06:51,000 distinguish two individuals based on the specific changes in their DNA. 87 00:06:51,000 --> 00:06:55,000 And we can do that because we can amplify up DNA sequences from very, 88 00:06:55,000 --> 00:06:59,000 very small quantities and figure out the specific nucleotide sequences of 89 00:06:59,000 --> 00:07:04,000 those DNA fragments and be able to tell then two people apart. 90 00:07:04,000 --> 00:07:08,000 And this is useful in paternity cases, it's useful in criminal cases, 91 00:07:08,000 --> 00:07:13,000 it's useful for archeological purposes and otherwise. 92 00:07:13,000 --> 00:07:17,000 So there are really a great number of values to this set of 93 00:07:17,000 --> 00:07:22,000 technologies. So where do they come from? Well, we've been able to do 94 00:07:22,000 --> 00:07:27,000 this sort of thing since the late 1960s, early 1970s. 95 00:07:27,000 --> 00:07:31,000 And this is referred to, in our field, as the Recombinant DNA 96 00:07:31,000 --> 00:07:43,000 Revolution 97 00:07:43,000 --> 00:07:51,000 And there were three things that happened in this time period that 98 00:07:51,000 --> 00:08:00,000 made this technology possible. Firstly was the ability to separate 99 00:08:00,000 --> 00:08:09,000 genomic DNA, specifically into fragments. 100 00:08:09,000 --> 00:08:12,000 So genomic DNA, as you know, is huge. 101 00:08:12,000 --> 00:08:15,000 Our chromosomes are long, long linear pieces of DNA. Even 102 00:08:15,000 --> 00:08:18,000 bacterial chromosomes are a few million nucleotides in length. 103 00:08:18,000 --> 00:08:21,000 That's much too much to deal with for many of these applications. 
104 00:08:21,000 --> 00:08:25,000 And so it was necessary to basically chop the genomic DNA up 105 00:08:25,000 --> 00:08:28,000 into manageable sized pieces and then be able to deal with those 106 00:08:28,000 --> 00:08:31,000 pieces individually. The second thing that 107 00:08:31,000 --> 00:08:38,000 was necessary -- 108 00:08:38,000 --> 00:08:43,000 -- was the ability to transfer the isolated pieces -- 109 00:08:43,000 --> 00:08:50,000 -- into another organism. 110 00:08:50,000 --> 00:08:54,000 And the organism of choice then and largely now is bacteria. 111 00:08:54,000 --> 00:08:58,000 So it was observations made in the 1960s that bacteria would actually 112 00:08:58,000 --> 00:09:02,000 take up from their environment pieces of DNA and sometimes 113 00:09:02,000 --> 00:09:06,000 incorporate them within their cells. 114 00:09:06,000 --> 00:09:12,000 That observation led to the notion that you could do that on purpose, 115 00:09:12,000 --> 00:09:19,000 not just at random, and that would be then using the bacteria basically 116 00:09:19,000 --> 00:09:26,000 as a cargo ship for the introduced DNA. And, finally, 117 00:09:26,000 --> 00:09:33,000 the ability to amplify the fragments of interest to large quantities. 118 00:09:33,000 --> 00:09:42,000 So if I take one piece of DNA from 119 00:09:42,000 --> 00:09:46,000 this guy, it's a single molecule, that's not enough. I cannot use 120 00:09:46,000 --> 00:09:50,000 that for much. I cannot use it to sequence his DNA, 121 00:09:50,000 --> 00:09:54,000 for example. I cannot use it to study specific properties of that 122 00:09:54,000 --> 00:09:58,000 gene. Instead I need to amplify it up to very large quantities. 123 00:09:58,000 --> 00:10:02,000 And, therefore, it was necessary to figure out ways 124 00:10:02,000 --> 00:10:06,000 basically to turn the bacteria into little DNA factories that would 125 00:10:06,000 --> 00:10:10,000 churn out large quantities of DNA of interest. 
And, 126 00:10:10,000 --> 00:10:14,000 again, we'll review how that was possible. And all of these things 127 00:10:14,000 --> 00:10:18,000 came together more or less the same time of this timeframe, 128 00:10:18,000 --> 00:10:22,000 and then the field took off. Now we could manipulate DNA very 129 00:10:22,000 --> 00:10:27,000 precisely and begin to understand DNA at a very specific level. 130 00:10:27,000 --> 00:10:31,000 So I'm going to give you some examples of what we do. 131 00:10:31,000 --> 00:10:36,000 I'm actually going to give you perhaps the first and only practical 132 00:10:36,000 --> 00:10:41,000 demonstration of this. We're going to do an experiment 133 00:10:41,000 --> 00:10:46,000 before you, a dangerous thing to do in front of a live audience. 134 00:10:46,000 --> 00:10:50,000 We're actually going to transfer a gene from one organism to another. 135 00:10:50,000 --> 00:10:55,000 We're going to start it today and we'll finish it in subsequent 136 00:10:55,000 --> 00:11:00,000 lectures. The gene of interest is a toxic gene. It's a toxic gene 137 00:11:00,000 --> 00:11:05,000 that's present in an organism known as S. pyogenes. 138 00:11:05,000 --> 00:11:08,000 You might know of this organism, might have heard of this organism 139 00:11:08,000 --> 00:11:12,000 because it's the flesh eating bacterium. Anybody heard of the 140 00:11:12,000 --> 00:11:16,000 flesh eating bacterium? This is true. If you get infected 141 00:11:16,000 --> 00:11:19,000 with S. pyogenes and you're slightly immunocompromised, 142 00:11:19,000 --> 00:11:23,000 but even if you have a bad strain of it even if you're very healthy, 143 00:11:23,000 --> 00:11:27,000 the bacterium will go about eating your flesh. It actually can be 144 00:11:27,000 --> 00:11:31,000 lethal it's so devastating. The organism, S. 145 00:11:31,000 --> 00:11:35,000 pyogenes is a bacterium that has a genome, of course. 146 00:11:35,000 --> 00:11:40,000 That genome has about a thousand genes. 
And the sequence of this 147 00:11:40,000 --> 00:11:44,000 genome is known. So by standard DNA sequencing 148 00:11:44,000 --> 00:11:49,000 technologies, we'll actually review those for you in a couple of 149 00:11:49,000 --> 00:11:53,000 lectures, specific DNA sequences of this organism is known. 150 00:11:53,000 --> 00:11:58,000 OK? So we know exactly which genes there are there. 151 00:11:58,000 --> 00:12:02,000 And, just for the sake of discussion, let's consider a couple of genes. 152 00:12:02,000 --> 00:12:06,000 Gene A, gene B and gene T. And it's this gene that we're interested 153 00:12:06,000 --> 00:12:11,000 in. It's the gene that encodes a very critical toxin that the 154 00:12:11,000 --> 00:12:15,000 bacterium makes that kills your cells. And so when you're badly 155 00:12:15,000 --> 00:12:20,000 infected by this bacterium, it's dumping out this toxin and it's 156 00:12:20,000 --> 00:12:24,000 killing the cells in the surrounding. And that's why it's 157 00:12:24,000 --> 00:12:28,000 flesh eating. OK? Now, you also heard in the previous 158 00:12:28,000 --> 00:12:32,000 couple of lectures ago that these genomes have other important 159 00:12:32,000 --> 00:12:36,000 properties. And just to remind you they have origins of replication, 160 00:12:36,000 --> 00:12:40,000 ORIs. All genomes, for the purposes of replication, 161 00:12:40,000 --> 00:12:44,000 all pieces of DNA that are going to be replicated need at least one 162 00:12:44,000 --> 00:12:48,000 origin of replication. So this genome has one. 163 00:12:48,000 --> 00:12:52,000 And this will come up in another context shortly, 164 00:12:52,000 --> 00:12:56,000 but what we're really interested in is this gene T here, 165 00:12:56,000 --> 00:13:00,000 the toxin gene. We want to clone it. We want to move it from this 166 00:13:00,000 --> 00:13:04,000 organism to another organism. And specifically the other organism 167 00:13:04,000 --> 00:13:09,000 of choice, and this is often the case, is E. coli. 
168 00:13:09,000 --> 00:13:13,000 E. coli is a common gut bacterium present in all of you in very, 169 00:13:13,000 --> 00:13:18,000 very large quantities. It has been modified for laboratory purposes 170 00:13:18,000 --> 00:13:23,000 largely weakened so that it cannot easily make its way back into you 171 00:13:23,000 --> 00:13:27,000 and cause possible harm. It's used in laboratories in 172 00:13:27,000 --> 00:13:32,000 another context all the time. I'm depicting here an E. 173 00:13:32,000 --> 00:13:36,000 coli cell. This is always confusing because I depicted here with a very 174 00:13:36,000 --> 00:13:41,000 similar circle the genome of S. pyogenes. This is a bacterial cell. 175 00:13:41,000 --> 00:13:45,000 This is the membrane of the bacterium. And, 176 00:13:45,000 --> 00:13:50,000 of course, E. coli has its own genome which has roughly the same 177 00:13:50,000 --> 00:13:54,000 size and roughly the same numbers of genes. What we're going to do is 178 00:13:54,000 --> 00:13:59,000 transfer one more gene from this organism into that organism. 179 00:13:59,000 --> 00:14:05,000 Really, the purpose of today's lecture is how do you do that? 180 00:14:05,000 --> 00:14:11,000 But before we consider how you do it, let's ask the question why you 181 00:14:11,000 --> 00:14:18,000 would want to do it. Anybody have any ideas? 182 00:14:18,000 --> 00:14:24,000 Why would you want to transfer this potentially deadly toxin producing 183 00:14:24,000 --> 00:14:29,000 gene to another organism? Yeah? It could be that in a different 184 00:14:29,000 --> 00:14:31,000 context it's actually beneficial in one organism. You may be able to 185 00:14:31,000 --> 00:14:34,000 make a better E. coli for some purpose if you 186 00:14:34,000 --> 00:14:36,000 transferred this gene. My favorite idea, actually, 187 00:14:36,000 --> 00:14:43,000 is global terror. 
188 00:14:43,000 --> 00:14:47,000 We can laugh about this but, you know, there are terrorists 189 00:14:47,000 --> 00:14:51,000 around who are thinking these very thoughts. Sad but true, 190 00:14:51,000 --> 00:14:55,000 people are making modified organisms with dangerous genes to make them 191 00:14:55,000 --> 00:15:00,000 more "pathogenic" for deliberate release into populations. 192 00:15:00,000 --> 00:15:04,000 So I'm not teaching you how to do this for malicious purposes but just 193 00:15:04,000 --> 00:15:08,000 to know that there are people who do that sort of thing. 194 00:15:08,000 --> 00:15:12,000 Related to this point over here, we want to understand these genes 195 00:15:12,000 --> 00:15:17,000 for the purposes of biological research. We want to understand as 196 00:15:17,000 --> 00:15:21,000 much as we can about disease-related properties of genes and so on. 197 00:15:21,000 --> 00:15:25,000 And sometimes you cannot do the experiments you want to do in the 198 00:15:25,000 --> 00:15:30,000 organism in which the disease occurs. 199 00:15:30,000 --> 00:15:33,000 And it's easier to do it, in a sense, in isolation in another 200 00:15:33,000 --> 00:15:37,000 organism. And then another example might be to produce a vaccine. 201 00:15:37,000 --> 00:15:41,000 If I could produce this protein that's so toxic in large quantities, 202 00:15:41,000 --> 00:15:45,000 I might be able to make a version of it which is not toxic, 203 00:15:45,000 --> 00:15:49,000 very similar but not toxic. And then I could inject it into 204 00:15:49,000 --> 00:15:53,000 people, they would raise an antibody response against it such that if 205 00:15:53,000 --> 00:15:57,000 they were ever challenged with the real dangerous guy they 206 00:15:57,000 --> 00:16:00,000 would be immune. So that would be a very useful thing 207 00:16:00,000 --> 00:16:04,000 to do. And, again, if we were worried about the 208 00:16:04,000 --> 00:16:08,000 terrorists we might do exactly that.
So this is the goal of our 209 00:16:08,000 --> 00:16:11,000 experiment. And we're going to start the experiment today. 210 00:16:11,000 --> 00:16:15,000 So what I've done is acquired from one of my colleagues in the Biology 211 00:16:15,000 --> 00:16:18,000 Department two tubes in which we have placed these bacteria. 212 00:16:18,000 --> 00:16:22,000 S. pyogenes is in this one. E. coli is in this one. There are 213 00:16:22,000 --> 00:16:26,000 bacterial cells in here that grow in suspension. They grow inside 214 00:16:26,000 --> 00:16:30,000 the liquid medium. If we want to isolate the DNA we 215 00:16:30,000 --> 00:16:34,000 first have to isolate the bacteria. So we spin these tubes in a 216 00:16:34,000 --> 00:16:38,000 centrifuge. The bacteria then pellet to the bottom of the tube. 217 00:16:38,000 --> 00:16:43,000 So the bacteria are now collected at the bottom of these tubes. 218 00:16:43,000 --> 00:16:47,000 We then decant off the liquid. I meant to bring a beaker, but I 219 00:16:47,000 --> 00:16:51,000 don't think they'll mind if I use this one. So it's always a little 220 00:16:51,000 --> 00:16:56,000 bit dangerous to do this, and we'll have to take care of this 221 00:16:56,000 --> 00:17:00,000 later because it is a bit dangerous. It's always a little bit dangerous 222 00:17:00,000 --> 00:17:05,000 to do this because you don't want to bump yourself like I just did. 223 00:17:05,000 --> 00:17:13,000 It actually is starting to hurt a little bit. So we'll get rid of the 224 00:17:13,000 --> 00:17:21,000 liquid here. And then we'll add various chemical solutions to 225 00:17:21,000 --> 00:17:29,000 isolate the DNA from this bacterium. Wow. It's really starting to go. 226 00:17:29,000 --> 00:17:37,000 I'm just kidding. Gets them every time. 227 00:17:37,000 --> 00:17:45,000 There really was no bacterium in there of any kind. 228 00:17:45,000 --> 00:17:52,000 I know. OK. But we're still going to go through this example. 
229 00:17:52,000 --> 00:18:00,000 So we want to transfer this gene into this organism. 230 00:18:00,000 --> 00:18:04,000 What are we going to do? Well, the first thing we need to 231 00:18:04,000 --> 00:18:08,000 worry about is the fact that this gene, the T gene here, 232 00:18:08,000 --> 00:18:12,000 is contained on a very large piece of DNA. As I said, 233 00:18:12,000 --> 00:18:16,000 it's got a thousand other genes. It's about 4 million base pairs 234 00:18:16,000 --> 00:18:20,000 long. And I'm only interested in this one, so I need a way to isolate 235 00:18:20,000 --> 00:18:24,000 the T gene away from the other genes present in this genome. 236 00:18:24,000 --> 00:18:28,000 So for this purpose I need a tool. And the tool that I need is a 237 00:18:28,000 --> 00:18:41,000 restriction enzyme 238 00:18:41,000 --> 00:18:44,000 Restriction enzymes are nicely described in your book. 239 00:18:44,000 --> 00:18:48,000 They are enzymes present in bacterial cells which are designed 240 00:18:48,000 --> 00:18:51,000 to cut, to cleave DNA sequences. And they do so in a site-specific 241 00:18:51,000 --> 00:18:55,000 fashion. They are sequence-specific -- 242 00:18:55,000 --> 00:19:03,000 -- DNA cutting enzymes. 243 00:19:03,000 --> 00:19:09,000 We call them endonucleases, but you don't have to worry about 244 00:19:09,000 --> 00:19:15,000 that term. They are DNA cutting enzymes. Some of them produce, 245 00:19:15,000 --> 00:19:20,000 as you'll see in a moment, what we call sticky ends, 246 00:19:20,000 --> 00:19:26,000 based on exactly how they cut the DNA. And others produce what we 247 00:19:26,000 --> 00:19:32,000 call blunt ends because they cut the DNA in a slightly different way. 248 00:19:32,000 --> 00:19:35,000 There are hundreds of restriction enzymes that have been purified from 249 00:19:35,000 --> 00:19:39,000 various bacteria. There are now companies that will 250 00:19:39,000 --> 00:19:42,000 sell you the purified restriction endonuclease. 
There are whole 251 00:19:42,000 --> 00:19:46,000 businesses that are geared around selling this stuff to molecular 252 00:19:46,000 --> 00:19:49,000 biologists around the world. So you can order it up like you 253 00:19:49,000 --> 00:19:53,000 would order up a chemical from a chemical supply house. 254 00:19:53,000 --> 00:19:57,000 As I said, these are site-specific, sequence-specific DNA cutting 255 00:19:57,000 --> 00:20:01,000 enzymes. So they recognize particular 256 00:20:01,000 --> 00:20:05,000 sequences in the DNA. One such enzyme, which was one of 257 00:20:05,000 --> 00:20:09,000 the first discovered, an enzyme called EcoR1, 258 00:20:09,000 --> 00:20:13,000 which comes from E. coli, that's why it's EcoR1, R1 is 259 00:20:13,000 --> 00:20:17,000 probably restriction enzyme number one, recognizes a particular 260 00:20:17,000 --> 00:20:21,000 sequence reading from the 5 prime end of the DNA molecule towards the 261 00:20:21,000 --> 00:20:25,000 3 prime end. It recognizes the sequence G-A-A-T-T-C. 262 00:20:25,000 --> 00:20:29,000 And then the other side of that, of course, is a 3 prime end. 263 00:20:29,000 --> 00:20:33,000 This is the polarity of DNA that you've been shown in many cases 264 00:20:33,000 --> 00:20:37,000 before. Now what is the reverse strand? What does the opposite, 265 00:20:37,000 --> 00:20:41,000 the complementary strand of this sequence look like? 266 00:20:41,000 --> 00:20:45,000 Can anybody tell me? You want to yell it out in unison. 267 00:20:45,000 --> 00:20:52,000 Here is a G and -- 268 00:20:52,000 --> 00:20:57,000 That was very good. Notice anything about this? 269 00:20:57,000 --> 00:21:02,000 Yes. It reads the same ways forwards as backwards. 270 00:21:02,000 --> 00:21:05,000 G-A-A-T-T-C. G-A-A-T-T-C. What do we call that? It's a 271 00:21:05,000 --> 00:21:09,000 palindrome. Your book uses the word MOM, which is a fairly boring 272 00:21:09,000 --> 00:21:13,000 palindrome. It reads the same way forwards as backwards.
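The palindrome property described here can be checked mechanically. The following is a minimal Python sketch, not something shown in the lecture: the helper names `reverse_complement` and `is_palindromic_site` are made up for illustration, and the only thing taken from the lecture is the EcoR1 site G-A-A-T-T-C. A DNA "palindrome" in this sense means the site equals its own reverse complement, so both strands read the same from 5 prime to 3 prime.

```python
# Illustrative helpers (hypothetical names); only the site GAATTC is from the lecture.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the opposite strand, also written 5 prime to 3 prime."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic_site(seq):
    """True when both strands read the same 5 prime to 3 prime."""
    return seq == reverse_complement(seq)


ECORI_SITE = "GAATTC"
print(reverse_complement(ECORI_SITE))   # the opposite strand of the EcoR1 site
print(is_palindromic_site(ECORI_SITE))  # palindromic in the DNA sense
print(is_palindromic_site("GAATCC"))    # a near miss that is not palindromic
```

Note that this is a different test from an ordinary word palindrome like MOM: the sequence is reversed and each base is swapped for its pairing partner.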
273 00:21:13,000 --> 00:21:16,000 I always like "A man, a plan, a canal, Panama." That's a 274 00:21:16,000 --> 00:21:20,000 palindrome, too. It reads the same way forwards as 275 00:21:20,000 --> 00:21:24,000 backwards. Anyway, many, not all, but many restriction 276 00:21:24,000 --> 00:21:28,000 enzymes recognize palindromic sequences. They read the same way 277 00:21:28,000 --> 00:21:32,000 forwards as backwards. OK? And because of that they cleave 278 00:21:32,000 --> 00:21:37,000 the DNA on both strands at the same position. EcoR1, 279 00:21:37,000 --> 00:21:43,000 this enzyme that recognizes this sequence will cleave the DNA between 280 00:21:43,000 --> 00:21:48,000 this phosphodiester bond, between the phosphodiester bond 281 00:21:48,000 --> 00:21:53,000 between the G and the A. And likewise on this strand. 282 00:21:53,000 --> 00:21:58,000 And the consequence of that is to produce a break in the DNA, 283 00:21:58,000 --> 00:22:04,000 so you now have 5 prime G, 3 prime C-T-A-A. 284 00:22:04,000 --> 00:22:07,000 Breaks occur here and the strands get separated. 285 00:22:07,000 --> 00:22:10,000 They get pulled apart. The hydrogen bonds that are holding 286 00:22:10,000 --> 00:22:14,000 these base pairs together are not strong enough to hold the two 287 00:22:14,000 --> 00:22:17,000 molecules together, at least most of the time, 288 00:22:17,000 --> 00:22:21,000 so most of the time they pull apart. So you end up with a fragment that 289 00:22:21,000 --> 00:22:24,000 has an end that looks like this and another fragment that has an end 290 00:22:24,000 --> 00:22:28,000 that looks like this, A-A-T-C, here's the 3 prime end, 291 00:22:28,000 --> 00:22:32,000 G, and here's the 5 prime end. OK? So you snip it at either side 292 00:22:32,000 --> 00:22:36,000 and pull it apart. Oops. I did something wrong. 293 00:22:36,000 --> 00:22:41,000 I missed a T, yup, on both of them. Thank you. I don't think I've ever 294 00:22:41,000 --> 00:22:45,000 drawn those right. 
In all the years I've taught this 295 00:22:45,000 --> 00:22:50,000 course, I always make it. Not the same mistake necessarily 296 00:22:50,000 --> 00:22:54,000 but a mistake. OK, so they make a break. 297 00:22:54,000 --> 00:22:59,000 And that then cleaves the DNA into two pieces. 298 00:22:59,000 --> 00:23:04,000 Now, importantly, at low enough temperatures these 299 00:23:04,000 --> 00:23:09,000 ends can find each other. And we call them sticky because 300 00:23:09,000 --> 00:23:14,000 they are complementary to one another. They can actually reform 301 00:23:14,000 --> 00:23:20,000 those base pairs. So you would get a molecule which 302 00:23:20,000 --> 00:23:25,000 has 5 prime G, 3 prime C-T-T-A-A. 303 00:23:25,000 --> 00:23:30,000 And then base paired to this T-T-A-A would be an A-A-T-T sequence, 304 00:23:30,000 --> 00:23:36,000 which was covalently bound on this side to the C. 305 00:23:36,000 --> 00:23:41,000 No. Yeah, covalently bound on this side to the C and non-covalently 306 00:23:41,000 --> 00:23:47,000 bound to this G. OK? So there are non-covalent 307 00:23:47,000 --> 00:23:53,000 bonds here and here and there are base pairs in the middle that are 308 00:23:53,000 --> 00:23:59,000 holding these strands together. We call this annealing. 309 00:23:59,000 --> 00:24:03,000 And, again, at low enough temperatures you can get sticky ends 310 00:24:03,000 --> 00:24:07,000 of restriction enzyme cleaved sites to come together this way. 311 00:24:07,000 --> 00:24:11,000 Now, importantly another enzyme which we could use, 312 00:24:11,000 --> 00:24:16,000 discovered more or less the same time, is an enzyme called DNA ligase, 313 00:24:16,000 --> 00:24:20,000 which you would have heard about in the replication lectures. 314 00:24:20,000 --> 00:24:24,000 And DNA ligase will tie two non-covalently linked pieces of DNA 315 00:24:24,000 --> 00:24:29,000 together by a covalent bond.
So DNA ligase will reintroduce this 316 00:24:29,000 --> 00:24:35,000 phosphodiester bond and this phosphodiester bond to produce now a 317 00:24:35,000 --> 00:24:40,000 completely covalently-bound sequence. OK? So we can cut the DNA at sites 318 00:24:40,000 --> 00:24:46,000 we want to. If we change the temperature we can get these DNA 319 00:24:46,000 --> 00:24:51,000 molecules to come together in a non-covalent fashion. 320 00:24:51,000 --> 00:24:57,000 And then, if we add this enzyme DNA ligase, they will become covalently 321 00:24:57,000 --> 00:25:02,000 bonded once again. OK? Key enzymes in our tool kit. 322 00:25:02,000 --> 00:25:06,000 As I said, there are other enzymes that will cut DNA in a blunt fashion. 323 00:25:06,000 --> 00:25:10,000 They will not produce these sticky ends. I won't show you a new 324 00:25:10,000 --> 00:25:14,000 example of that today, and it's not terribly important that 325 00:25:14,000 --> 00:25:18,000 you know about them, but you can imagine that an enzyme 326 00:25:18,000 --> 00:25:22,000 that cuts in the middle between the central positions of the recognition 327 00:25:22,000 --> 00:25:27,000 sequence will make a blunt end. They won't be sticky. 328 00:25:27,000 --> 00:25:32,000 And those aren't as useful to us because the sticky ends actually 329 00:25:32,000 --> 00:25:38,000 help us get two pieces of DNA together in order to promote the 330 00:25:38,000 --> 00:25:43,000 cloning step at the end. OK. So the S. pyogenes genome that 331 00:25:43,000 --> 00:25:49,000 we need to isolate our T gene from has a very large sequence. 332 00:25:49,000 --> 00:25:54,000 It's 4.1 times ten to the sixth base pairs long all the way around 333 00:25:54,000 --> 00:26:00,000 the circle. And I was wondering. -- 334 00:26:00,000 --> 00:26:10,000 Well, let me just remind you that EcoR1 recognizes a 6 base pair 335 00:26:10,000 --> 00:26:20,000 recognition site. So how many EcoR1 sites are there 336 00:26:20,000 --> 00:26:30,000 in this genome?
What's the frequency of EcoR1 sites 337 00:26:30,000 --> 00:26:37,000 along a piece of DNA? Well, it has to be this particular 338 00:26:37,000 --> 00:26:42,000 sequence, G-A-A-T-T-C, reading in that direction. 339 00:26:42,000 --> 00:26:47,000 The frequency of any given nucleotide at any given site is one 340 00:26:47,000 --> 00:26:51,000 in four because there are four nucleotides. The frequency of six 341 00:26:51,000 --> 00:26:56,000 of a given sequence in a row is one over four to the sixth, 342 00:26:56,000 --> 00:27:01,000 which is one in 4,096. So every 4,000 base pairs or so, 343 00:27:01,000 --> 00:27:05,000 at random, you will find an EcoR1 site. OK? They won't be equally 344 00:27:05,000 --> 00:27:09,000 distributed. There will be some parts of the genome where there are 345 00:27:09,000 --> 00:27:13,000 many by chance, other parts where there are a few, 346 00:27:13,000 --> 00:27:18,000 but on average there will be about one every 4,000 base pairs. 347 00:27:18,000 --> 00:27:22,000 And since the genome is 4 million base pairs long that means that 348 00:27:22,000 --> 00:27:26,000 there's going to be about a thousand EcoR1 sites scattered around 349 00:27:26,000 --> 00:27:31,000 this genome. OK? And they might be close together in 350 00:27:31,000 --> 00:27:35,000 some places, further apart in other places, and so on around the gene. 351 00:27:35,000 --> 00:27:40,000 Now, the T gene is in a particular place within the genome. 352 00:27:40,000 --> 00:27:44,000 I'm going to draw it blue. Here's the T gene. And when 353 00:27:44,000 --> 00:27:48,000 deciding what restriction endonuclease, what restriction 354 00:27:48,000 --> 00:27:53,000 enzyme to use to cleave the T gene in an intact fashion, 355 00:27:53,000 --> 00:27:57,000 I'll consult the sequence, which I know, I'll choose an enzyme 356 00:27:57,000 --> 00:28:02,000 which I know won't cleave this gene, won't cut this gene. 357 00:28:02,000 --> 00:28:06,000 There are no EcoR1 sites within this gene.
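The site-frequency arithmetic above can be written out as a short Python sketch. It rests on the same simplifying assumption the lecture makes, namely that each of the four nucleotides is equally likely at every position and positions are independent; real genomes deviate from this. The function names are hypothetical, and the genome length is the 4.1 times ten to the sixth base pairs quoted in the lecture.

```python
# Back-of-the-envelope estimate, assuming equal and independent base frequencies.
SITE_LENGTH = 6            # EcoR1 recognizes a 6 base pair site
GENOME_LENGTH = 4_100_000  # genome length quoted in the lecture


def average_spacing(site_length):
    """Average number of base pairs between random occurrences of one site."""
    return 4 ** site_length


def expected_sites(genome_length, site_length):
    """Expected number of occurrences of one specific site in the genome."""
    return genome_length / average_spacing(site_length)


print(average_spacing(SITE_LENGTH))                       # 4096
print(round(expected_sites(GENOME_LENGTH, SITE_LENGTH)))  # about a thousand
```

The same two lines of arithmetic show why the choice of enzyme matters: under these assumptions an enzyme with a 4 base pair site would cut roughly every 256 base pairs, and one with an 8 base pair site roughly every 65,536.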
OK? So there might be one on 358 00:28:06,000 --> 00:28:11,000 this side and there might be one on this side, but I've chosen EcoR1 359 00:28:11,000 --> 00:28:16,000 because I know there are no G-A-A-T-T-C sequences within the T 360 00:28:16,000 --> 00:28:21,000 gene. OK? So to give you a visual depiction of what we're talking 361 00:28:21,000 --> 00:28:26,000 about, more visual aids, I brought with me a very, 362 00:28:26,000 --> 00:28:31,000 very dangerous piece of rope, a flesh eating piece of rope. 363 00:28:31,000 --> 00:28:34,000 They never fall for it twice, do they? In which I've depicted the 364 00:28:34,000 --> 00:28:38,000 S. pyogenes genome as a piece of covalently bound rope here. 365 00:28:38,000 --> 00:28:42,000 And the EcoR1 sites are shown in yellow. OK? That's really what 366 00:28:42,000 --> 00:28:46,000 they are. They're just little tags basically that the specific enzymes 367 00:28:46,000 --> 00:28:50,000 recognize. They're not there for the purposes of the enzyme 368 00:28:50,000 --> 00:28:54,000 recognizing them. They're just there. 369 00:28:54,000 --> 00:28:58,000 And the enzyme can recognize them. OK? And here I've colored in blue 370 00:28:58,000 --> 00:29:01,000 the T gene. And you'll see that there are no 371 00:29:01,000 --> 00:29:05,000 EcoR1 sites within the T gene. So now if I take this purified 372 00:29:05,000 --> 00:29:09,000 piece of DNA that I recovered from S. pyogenes and I put in solution with 373 00:29:09,000 --> 00:29:13,000 the restriction enzyme EcoR1 in the right sort of buffer and so on, 374 00:29:13,000 --> 00:29:16,000 the restriction enzyme will do what it's able to do. 375 00:29:16,000 --> 00:29:20,000 It will go along and find those sites and cut them. 376 00:29:20,000 --> 00:29:24,000 OK? Everywhere it sees one it will cut one. You know what I really 377 00:29:24,000 --> 00:29:28,000 should do? I should figure out how to put the rope back together. 378 00:29:28,000 --> 00:29:32,000 That would be a trick. 
I don't know how to do that. 379 00:29:32,000 --> 00:29:37,000 I should work on that. That would be great. I should do that. 380 00:29:37,000 --> 00:29:41,000 All right. So we've cleaved our DNA molecule into fragments. 381 00:29:41,000 --> 00:29:46,000 OK? And the S. pyogenes genome that we drew up on the board will 382 00:29:46,000 --> 00:29:51,000 likewise be separated into fragments. It will be separated into about a 383 00:29:51,000 --> 00:29:56,000 thousand fragments because there are about a thousand sites. 384 00:29:56,000 --> 00:30:01,000 OK? I now want to separate those fragments from one another. 385 00:30:01,000 --> 00:30:05,000 In order to clone the T gene, I need to separate the T gene away 386 00:30:05,000 --> 00:30:09,000 from all the other genes and all the other DNA sequences in this genome. 387 00:30:09,000 --> 00:30:13,000 So how do I do it? Well, I take advantage of the fact that the 388 00:30:13,000 --> 00:30:17,000 pieces of DNA that I've liberated are different lengths. 389 00:30:17,000 --> 00:30:21,000 You can see that there are some short ones, there are some long ones, 390 00:30:21,000 --> 00:30:25,000 and it's actually quite easy to separate DNA based on its size. 391 00:30:25,000 --> 00:30:29,000 The way you do that is to introduce the DNA solution that contains the 392 00:30:29,000 --> 00:30:34,000 fragments of all sizes into what we call an agarose gel. 393 00:30:34,000 --> 00:30:39,000 It's like a slab of Jell-O. At one end, as shown here, here's 394 00:30:39,000 --> 00:30:44,000 the slab of Jell-O in which little wells have been cut out, 395 00:30:44,000 --> 00:30:48,000 little indentations have been placed, we introduce the DNA solution, 396 00:30:48,000 --> 00:30:53,000 usually with a colored dye to let us know what we're doing, 397 00:30:53,000 --> 00:30:58,000 and then we put buffer on top of the gel and at the two wells at the end 398 00:30:58,000 --> 00:31:03,000 of the gel, and then we apply an electric field. 
399 00:31:03,000 --> 00:31:07,000 This is called gel electrophoresis. We apply a positive charge on one 400 00:31:07,000 --> 00:31:11,000 end, a negative charge on the other end. Which way will the DNA go? 401 00:31:11,000 --> 00:31:15,000 Which way will the DNA go? Is DNA positively charged or negatively 402 00:31:15,000 --> 00:31:19,000 charged? It's negatively charged because of all those phosphates, which 403 00:31:19,000 --> 00:31:23,000 carry a negative charge. So in neutral pH buffer it will 404 00:31:23,000 --> 00:31:27,000 move towards the positive pole. And so the DNA will actually 405 00:31:27,000 --> 00:31:31,000 separate. And that's what you're seeing here. 406 00:31:31,000 --> 00:31:35,000 These are gels of different concentrations of the stuff that 407 00:31:35,000 --> 00:31:39,000 makes them stiff. So at a low percentage you see this 408 00:31:39,000 --> 00:31:42,000 fragmentation of fragments. The larger fragments don't go very 409 00:31:42,000 --> 00:31:46,000 far, the smaller fragments go further into the gel, 410 00:31:46,000 --> 00:31:49,000 and at different concentrations of this stuff you'll get different 411 00:31:49,000 --> 00:31:53,000 separations in different positions. And so with this separation 412 00:31:53,000 --> 00:31:57,000 technique I can isolate this fragment, which is of a particular 413 00:31:57,000 --> 00:32:00,000 size, away from other fragments. So if you imagine this is the gel 414 00:32:00,000 --> 00:32:03,000 here, I put a negative charge here, a positive charge here, the 415 00:32:03,000 --> 00:32:07,000 different DNA fragments are going to separate along the gel with the long 416 00:32:07,000 --> 00:32:10,000 ones at the top and the short ones at the bottom, 417 00:32:10,000 --> 00:32:13,000 and the middle sized ones will separate in the middle and the T 418 00:32:13,000 --> 00:32:17,000 gene will separate right there.
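The cut-then-separate steps can be sketched as a toy simulation. This is a simplification under stated assumptions: only the top strand is modeled, the cut is placed after the G of each G-A-A-T-T-C (the standard EcoRI cut position), the genome string is invented, and `digest_circular` is a hypothetical helper name. Sorting by length stands in for the gel, with the longest fragments staying nearest the wells.

```python
SITE = "GAATTC"
CUT_OFFSET = 1  # cut between G and AATTC on the modeled strand


def digest_circular(seq: str, site: str = SITE, offset: int = CUT_OFFSET) -> list[str]:
    """Cut a circular sequence at every occurrence of site.
    A circle with k sites yields k linear fragments."""
    seq = seq.upper()
    n = len(seq)
    wrapped = seq + seq[: len(site) - 1]  # lets a site span the arbitrary origin
    cuts = sorted((i + offset) % n for i in range(n) if wrapped.startswith(site, i))
    if not cuts:
        return [seq]  # no site: the circle stays uncut
    doubled = seq + seq
    # Each fragment runs from one cut to the next, wrapping around once.
    return [doubled[a:b] for a, b in zip(cuts, cuts[1:] + [cuts[0] + n])]


toy_genome = "AAGAATTCCCGAATTCTTTTTTTT"  # invented 24 bp circle with two sites
fragments = digest_circular(toy_genome)

# "Gel" order: longest fragments travel least, shortest travel farthest.
for frag in sorted(fragments, key=len, reverse=True):
    print(len(frag), frag)
# 16 AATTCTTTTTTTTAAG
# 8 AATTCCCG
```

Each fragment starts with AATTC and ends with G, which is the single-strand picture of the sticky ends described earlier.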
And, if I have a lot of DNA to 419 00:32:17,000 --> 00:32:20,000 start with, I'm going to have a lot of molecules of the T gene in this 420 00:32:20,000 --> 00:32:23,000 position on the gel. And if I shine a UV light, 421 00:32:23,000 --> 00:32:27,000 which was what was done here, I can literally see that position. 422 00:32:27,000 --> 00:32:31,000 And since I know how big it is, because I know the sequence of this 423 00:32:31,000 --> 00:32:35,000 organism, I can know that it's exactly 1,512 base pairs long. 424 00:32:35,000 --> 00:32:39,000 And, therefore, it should run right there in this particular position on 425 00:32:39,000 --> 00:32:43,000 the gel. So I can purify it. Now, usually when I purify it, 426 00:32:43,000 --> 00:32:47,000 I purify a bunch of other things that are of similar size. 427 00:32:47,000 --> 00:32:51,000 So I don't usually get only this molecule, 428 00:32:51,000 --> 00:32:55,000 but I do a pretty good job of purifying it. And now I need to 429 00:32:55,000 --> 00:32:59,000 amplify it. I've got some of that DNA, but I need to have 430 00:32:59,000 --> 00:33:03,000 more of the DNA. I need to have much, 431 00:33:03,000 --> 00:33:07,000 much, many, many more copies in order to be able to do the 432 00:33:07,000 --> 00:33:12,000 manipulations I want to do downstream. So the next thing I 433 00:33:12,000 --> 00:33:16,000 need to do is get this piece of DNA, and the other ones that I'm less 434 00:33:16,000 --> 00:33:21,000 interested in, into bacteria and get the bacteria 435 00:33:21,000 --> 00:33:25,000 to make more and more of it. So what I do in principle is to 436 00:33:25,000 --> 00:33:30,000 introduce the DNA into different bacteria. 437 00:33:30,000 --> 00:33:35,000 There you go. And now I need those bacteria to make more of that stuff 438 00:33:35,000 --> 00:33:41,000 in order to give me what I need. But there are certain problems.
439 00:33:41,000 --> 00:33:46,000 There are certain problems associated with what I've just told 440 00:33:46,000 --> 00:33:52,000 to you. The problems are that I cannot tell which of these folks got 441 00:33:52,000 --> 00:33:57,000 the right gene. That's one problem. 442 00:33:57,000 --> 00:34:03,000 I need to figure out a way to know who got the T gene and who didn't. 443 00:34:03,000 --> 00:34:07,000 I also need to know who got any piece of DNA versus who got nothing. 444 00:34:07,000 --> 00:34:12,000 The rest of you didn't get any DNA. You are of no interest to me. 445 00:34:12,000 --> 00:34:16,000 If you were bacteria, I wish you would die. It's actually true. 446 00:34:16,000 --> 00:34:21,000 You'll see in a second. The rest of these guys I'm somewhat 447 00:34:21,000 --> 00:34:26,000 interested in, and I'm really interested in him 448 00:34:26,000 --> 00:34:30,000 because he's got the T gene. I've picked on you a lot, 449 00:34:30,000 --> 00:34:34,000 haven't I? So those are two of the problems. 450 00:34:34,000 --> 00:34:38,000 And the third problem is that I need these guys to divide, 451 00:34:38,000 --> 00:34:41,000 which they're not terribly willing to do. I need them to divide, 452 00:34:41,000 --> 00:34:44,000 but bacteria will divide. But the bacteria, if these guys were 453 00:34:44,000 --> 00:34:48,000 bacteria, wouldn't know what to do with that little fragment of DNA 454 00:34:48,000 --> 00:34:51,000 that I gave them. They wouldn't know how to replicate 455 00:34:51,000 --> 00:34:54,000 it, how to make more of it because I've just given them a naked piece 456 00:34:54,000 --> 00:34:58,000 of DNA and that's not terribly useful. So we need 457 00:34:58,000 --> 00:35:04,000 to overcome that. And I'm going to tell you how. 458 00:35:04,000 --> 00:35:13,000 The process that we just described, the introduction of DNA into the 459 00:35:13,000 --> 00:35:22,000 bacterium is called transformation, bacterial transformation.
Again, 460 00:35:22,000 --> 00:35:31,000 here's a bacterial cell with its own genome. 461 00:35:31,000 --> 00:35:35,000 We treat the bacterium with chemicals that sort of loosen up the 462 00:35:35,000 --> 00:35:39,000 membrane a little bit or we electroshock the bacteria which 463 00:35:39,000 --> 00:35:43,000 blows holes in the membrane. And then when there are holes in 464 00:35:43,000 --> 00:35:47,000 the membrane, these little fragments of DNA which I'm representing here, 465 00:35:47,000 --> 00:35:51,000 maybe I'll just show it as a single linear piece of DNA, 466 00:35:51,000 --> 00:35:55,000 can float in inside the cytoplasm of the bacteria. They literally just 467 00:35:55,000 --> 00:35:59,000 float in there. OK? Now, the bacterium, 468 00:35:59,000 --> 00:36:04,000 if they knew what to do with this piece of DNA, are remarkably good 469 00:36:04,000 --> 00:36:09,000 biofactories because bacteria will divide at optimal conditions three 470 00:36:09,000 --> 00:36:14,000 times every hour, three divisions every hour, 471 00:36:14,000 --> 00:36:20,000 which means 72 divisions every day. 472 00:36:20,000 --> 00:36:25,000 And if they could duplicate this DNA every time they divided, 473 00:36:25,000 --> 00:36:30,000 if they could and they were able to divide 72 times in a day, 474 00:36:30,000 --> 00:36:35,000 if there were one copy of that piece of DNA per cell at the beginning of 475 00:36:35,000 --> 00:36:41,000 the process, just one cell that had that piece of DNA in it then at the 476 00:36:41,000 --> 00:36:46,000 end of the day there could be two to the seventy-second molecules of DNA. 477 00:36:46,000 --> 00:36:51,000 OK? 72 divisions. Two to the seventy-second which is roughly 478 00:36:51,000 --> 00:36:57,000 equivalent to ten to the twentieth copies. Now, each of you has ten to 479 00:36:57,000 --> 00:37:02,000 the thirteenth cells in you. So you've got ten to the thirteenth 480 00:37:02,000 --> 00:37:06,000 or so copies of any particular piece of DNA. 
This is ten to the seventh 481 00:37:06,000 --> 00:37:10,000 times more than that, so that's 10 million persons' worth 482 00:37:10,000 --> 00:37:14,000 of a given piece of DNA. It's really quite remarkable. 483 00:37:14,000 --> 00:37:18,000 Now, Claudette always makes me point out the fact that you never 484 00:37:18,000 --> 00:37:22,000 reach this theoretical limit because you would need an MIT's worth of 485 00:37:22,000 --> 00:37:26,000 bacterial solution to do it. So in practice, when we do this 486 00:37:26,000 --> 00:37:30,000 sort of experiment, we get about ten to the tenth to ten 487 00:37:30,000 --> 00:37:34,000 to the thirteenth bacteria at the end. 488 00:37:34,000 --> 00:37:37,000 And, if each bacterium were able to copy this DNA faithfully, 489 00:37:37,000 --> 00:37:41,000 we would have about ten to the tenth or ten to the thirteenth copies of 490 00:37:41,000 --> 00:37:45,000 the DNA. Still quite impressive for not a lot of work. 491 00:37:45,000 --> 00:37:48,000 The problem I've already alluded to is that the bacteria don't know how 492 00:37:48,000 --> 00:37:52,000 to deal with this introduced piece of DNA. This piece of DNA, 493 00:37:52,000 --> 00:37:56,000 for it to be brought up to ten to the tenth or ten to the thirteenth 494 00:37:56,000 --> 00:38:00,000 copies needs to be replicated. The DNA needs to be replicated. 495 00:38:00,000 --> 00:38:16,000 What do we know about replication? 496 00:38:16,000 --> 00:38:20,000 What did you learn about replication? What is the key thing 497 00:38:20,000 --> 00:38:23,000 that a piece of DNA needs in order for it to be replicated? 498 00:38:23,000 --> 00:38:26,000 An origin of replication, right? And these fragments most 499 00:38:26,000 --> 00:38:30,000 likely do not carry an origin of replication, so they're not going to 500 00:38:30,000 --> 00:38:34,000 be replicated. So for this purpose we need an 501 00:38:34,000 --> 00:38:38,000 origin of replication.
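The doubling arithmetic from a moment ago is easy to check directly. The sketch below takes the lecture's figure of three divisions per hour and assumes perfect doubling with no limit on nutrients or space. Worked exactly, two to the seventy-second comes out near 4.7 times ten to the twenty-first, somewhat above the round figure quoted aloud, which only strengthens the point. The variable names are illustrative.

```python
import math

divisions_per_hour = 3                         # optimal conditions, per the lecture
divisions_per_day = divisions_per_hour * 24    # 72

# One starting molecule, doubled at every division (idealized upper bound).
theoretical_copies = 2 ** divisions_per_day
print(divisions_per_day)            # 72
print(theoretical_copies)           # 4722366482869645213696
print(f"{theoretical_copies:.1e}")  # 4.7e+21

# Practical yields of 1e10 to 1e13 cells correspond to far fewer doublings.
print(math.ceil(math.log2(1e10)))   # 34
print(math.ceil(math.log2(1e13)))   # 44
```

So a culture that ends at ten to the tenth to ten to the thirteenth cells has gone through roughly 34 to 44 doublings rather than 72.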
And you'll see how we accomplish 502 00:38:38,000 --> 00:38:42,000 that. We also mentioned the fact that most of the bacteria, 503 00:38:42,000 --> 00:38:46,000 when we do this transformation, this is actually a relatively 504 00:38:46,000 --> 00:38:50,000 inefficient process. So most bacteria get no DNA. 505 00:38:50,000 --> 00:38:54,000 And they are, of course, not interesting to us. 506 00:38:54,000 --> 00:38:58,000 That's all of you guys in the back there. So we need to get 507 00:38:58,000 --> 00:39:03,000 rid of all of you. And then we need to find which 508 00:39:03,000 --> 00:39:09,000 bacterium has the clone of interest. Which bacterium has the piece of 509 00:39:09,000 --> 00:39:15,000 DNA of interest? This guy compared to the rest of 510 00:39:15,000 --> 00:39:21,000 them. So the solutions to all of those problems come in the form of 511 00:39:21,000 --> 00:39:27,000 what we call vectors. The piece of DNA that we introduce 512 00:39:27,000 --> 00:39:33,000 at the beginning is not an isolated fragment of DNA, in fact. 513 00:39:33,000 --> 00:39:49,000 Instead, we introduce the piece of 514 00:39:49,000 --> 00:39:56,000 DNA in the context of another DNA molecule called a vector. 515 00:39:56,000 --> 00:40:03,000 Vectors, for the most part, are derived from naturally occurring 516 00:40:03,000 --> 00:40:10,000 small DNA molecules that are carried in many bacteria outside of the main 517 00:40:10,000 --> 00:40:16,000 bacterial genome. This is the main bacterial genome. 518 00:40:16,000 --> 00:40:22,000 Many bacteria carry within their cytoplasm small copies of DNA, 519 00:40:22,000 --> 00:40:27,000 also circular, called plasmids. And these are short. They might be 520 00:40:27,000 --> 00:40:33,000 on average 5,000 base pairs long compared to a couple of million base 521 00:40:33,000 --> 00:40:39,000 pairs long. So they're small circular pieces of DNA. 
522 00:40:39,000 --> 00:40:43,000 They get transferred from bacterium to bacterium in the wild for the 523 00:40:43,000 --> 00:40:47,000 purposes of genetic exchange between bacteria, also for transmitting drug 524 00:40:47,000 --> 00:40:51,000 resistance or pathogenic properties from one bacterium to another. 525 00:40:51,000 --> 00:40:55,000 So they do get exchanged from bacteria to bacteria. 526 00:40:55,000 --> 00:40:59,000 They were discovered a long time ago and it was recognized that they 527 00:40:59,000 --> 00:41:03,000 would be very useful as vehicles, vectors for cloned DNA. 528 00:41:03,000 --> 00:41:07,000 They have three key features, which I will illustrate in this 529 00:41:07,000 --> 00:41:12,000 diagram. One is the origin of replication. So they do replicate 530 00:41:12,000 --> 00:41:16,000 inside bacterial cells. They have their own origin of 531 00:41:16,000 --> 00:41:21,000 replication. So you can see where we're going here. 532 00:41:21,000 --> 00:41:26,000 We're going to put our piece of DNA, along with this, 533 00:41:26,000 --> 00:41:31,000 in order to get the replication of our DNA as well. 534 00:41:31,000 --> 00:41:35,000 They have restriction sites. Of course, they're going to have 535 00:41:35,000 --> 00:41:40,000 restriction sites throughout their DNA sequence. But they're designed 536 00:41:40,000 --> 00:41:45,000 now on purpose to have particular restriction sites in a particular 537 00:41:45,000 --> 00:41:50,000 region of the piece of DNA which we use for cloning purposes. 538 00:41:50,000 --> 00:41:55,000 So there might be an EcoR1 site here and nowhere else in this plasmid, 539 00:41:55,000 --> 00:42:00,000 another enzyme represented once, and so on and so forth. 540 00:42:00,000 --> 00:42:05,000 Because we're going to introduce our DNA into here and we don't want to 541 00:42:05,000 --> 00:42:10,000 disrupt the sequence of the rest of the plasmid.
And the final thing 542 00:42:10,000 --> 00:42:15,000 they have of value is a drug resistance gene, 543 00:42:15,000 --> 00:42:20,000 sometimes multiple drug resistance genes. An example of a drug 544 00:42:20,000 --> 00:42:25,000 resistance gene is the amp resistance gene, 545 00:42:25,000 --> 00:42:30,000 and this confers resistance to a common antibiotic called ampicillin, 546 00:42:30,000 --> 00:42:35,000 very similar to penicillin. If you treat most bacteria with 547 00:42:35,000 --> 00:42:39,000 penicillin they will die, with ampicillin, sorry, they will 548 00:42:39,000 --> 00:42:43,000 die. If you treat a bacterium that's carrying a plasmid that has 549 00:42:43,000 --> 00:42:47,000 an ampicillin resistance gene the bacterium will live, 550 00:42:47,000 --> 00:42:51,000 OK, because it inactivates the ampicillin. It makes it non-toxic 551 00:42:51,000 --> 00:42:55,000 to the bacteria. And now you can see how we're going 552 00:42:55,000 --> 00:42:59,000 to distinguish bacteria that get DNA from bacteria that don't 553 00:42:59,000 --> 00:43:04,000 get any DNA. The bacteria that get DNA are going 554 00:43:04,000 --> 00:43:09,000 to get a drug resistance gene. So they're going to live in the 555 00:43:09,000 --> 00:43:14,000 presence of ampicillin. If you don't get any DNA and you 556 00:43:14,000 --> 00:43:20,000 get treated with ampicillin you're going to die. OK. So, quickly. 557 00:43:20,000 --> 00:43:36,000 I've taken my S. 558 00:43:36,000 --> 00:43:40,000 pyogenes genome and I cut it up into EcoR1 sites. I've cut it with EcoR1, 559 00:43:40,000 --> 00:43:44,000 so I've cut it into EcoR1 fragments. And now I'm representing them as 560 00:43:44,000 --> 00:43:48,000 double-stranded with their sticky ends showing. Here's the end of an 561 00:43:48,000 --> 00:43:52,000 EcoR1 site. Here's the end of an EcoR1 site. I'm just going to show 562 00:43:52,000 --> 00:43:56,000 you two for the sake of simplicity. 
Here's another fragment liberated 563 00:43:56,000 --> 00:44:00,000 by EcoR1 from the S. pyogenes genome. 564 00:44:00,000 --> 00:44:06,000 Maybe these are two of similar size and I purified them together when I 565 00:44:06,000 --> 00:44:12,000 used my agarose gel technique. So these are two pieces of DNA. 566 00:44:12,000 --> 00:44:18,000 And this one contains our friend the T gene. We've liberated these, 567 00:44:18,000 --> 00:44:24,000 they have their sticky ends, and then we mix them with a plasmid that 568 00:44:24,000 --> 00:44:31,000 likewise has been cleaved with EcoR1. 569 00:44:31,000 --> 00:44:36,000 So we have lots and lots of cut plasmid that has the complementary 570 00:44:36,000 --> 00:44:41,000 sticky ends. OK? This plasmid is that thing. 571 00:44:41,000 --> 00:44:46,000 I'm showing it as double-stranded now so I can emphasize the sticky 572 00:44:46,000 --> 00:44:51,000 ends. When I lower the temperature, at some frequency this DNA molecule 573 00:44:51,000 --> 00:44:56,000 will find its way here, and anneal at this end over here, 574 00:44:56,000 --> 00:45:01,000 at this end over here, and this piece of DNA will find its way to a 575 00:45:01,000 --> 00:45:07,000 different plasmid and anneal over here and over here. 576 00:45:07,000 --> 00:45:12,000 If I wait a while for that annealing process to take place and then I add 577 00:45:12,000 --> 00:45:18,000 DNA ligase, the enzyme that seals the nicks that form when those 578 00:45:18,000 --> 00:45:24,000 things come together, I will get fully closed 579 00:45:24,000 --> 00:45:30,000 double-stranded DNA molecules which are the plasmid plus 580 00:45:30,000 --> 00:45:35,000 a new piece of DNA. This one will give some chunk of the 581 00:45:35,000 --> 00:45:40,000 S. pyogenes genome I don't care about. This one will give a plasmid 582 00:45:40,000 --> 00:45:45,000 that has a piece of the S. pyogenes genome that carries the T 583 00:45:45,000 --> 00:45:50,000 gene. This one I do care about. OK?
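The cut, anneal, and ligate sequence just described can be sketched on strings. This is a top-strand-only toy model with invented sequences and hypothetical helper names (`linearize_plasmid`, `ligate`); it ignores the second strand, insert orientation, and the possibility of the vector simply closing back on itself, all of which matter at the bench.

```python
SITE = "GAATTC"  # EcoRI site; the cut falls between G and AATTC


def linearize_plasmid(plasmid: str, site: str = SITE) -> str:
    """Cut a circular plasmid that carries exactly one site.
    Returns the linear molecule, which starts with AATTC and ends with G."""
    p = plasmid.upper()
    n = len(p)
    if (p + p[: len(site) - 1]).count(site) != 1:
        raise ValueError("this sketch expects exactly one site in the vector")
    cut = (p + p).find(site) + 1
    return (p + p)[cut : cut + n]


def ligate(linear_vector: str, insert: str) -> str:
    """Join an EcoRI fragment into the cut vector; the result is circular."""
    if not (insert.startswith("AATTC") and insert.endswith("G")):
        raise ValueError("insert does not have EcoRI-compatible ends")
    return linear_vector + insert


vector = "TTGAATTCAAACCC"       # invented plasmid with a single site
t_fragment = "AATTCTGCATGCAG"   # invented stand-in for the T gene fragment

recombinant = ligate(linearize_plasmid(vector), t_fragment)
print(len(recombinant))  # 28

# Ligation recreates a full site at each junction, flanking the insert.
circular_view = recombinant + recombinant[: len(SITE) - 1]
print(circular_view.count(SITE))  # 2
```

Because both junctions regenerate the site, cutting the recombinant plasmid with the same enzyme later releases the insert again.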
So, in the final minute. 584 00:45:50,000 --> 00:46:04,000 I now take these plasmids that I've 585 00:46:04,000 --> 00:46:10,000 generated by this technique, these recombinant plasmids, and I 586 00:46:10,000 --> 00:46:16,000 introduce them into bacteria in culture using this transformation 587 00:46:16,000 --> 00:46:22,000 technique. And then I plate them onto a Petri dish. 588 00:46:22,000 --> 00:46:28,000 And the Petri dish, importantly, has agar for the 589 00:46:28,000 --> 00:46:35,000 bacterial colonies to grow and it contains ampicillin. 590 00:46:35,000 --> 00:46:40,000 It contains the drug ampicillin. Now, as I said, the transformation 591 00:46:40,000 --> 00:46:45,000 process is inefficient. Most bacteria that I treat in the 592 00:46:45,000 --> 00:46:50,000 transformation process will get no DNA. They'll land on this plate. 593 00:46:50,000 --> 00:46:55,000 And what will happen to them? They'll die. The plasmids that did 594 00:46:55,000 --> 00:47:00,000 get DNA, sorry, the bacterium that did pick up a 595 00:47:00,000 --> 00:47:05,000 plasmid will have picked up an ampicillin resistance gene. 596 00:47:05,000 --> 00:47:09,000 They will grow. And they grow on little colonies, 597 00:47:09,000 --> 00:47:13,000 into little colonies after about a day which have about a million cells 598 00:47:13,000 --> 00:47:18,000 each. And they're separated from one another. So this colony might 599 00:47:18,000 --> 00:47:22,000 have that plasmid in it. Every cell in this colony has that 600 00:47:22,000 --> 00:47:26,000 plasmid in it. And this colony might have that 601 00:47:26,000 --> 00:47:31,000 plasmid in it. OK? I'm interested in this one because I 602 00:47:31,000 --> 00:47:35,000 want to now purify large quantities of that piece of DNA. 603 00:47:35,000 --> 00:47:40,000 I'm actually not interested in this one. 
And next lecture we'll talk 604 00:47:40,000 --> 00:47:44,000 about how you specifically purify this, isolate and identify this 605 00:47:44,000 --> 00:47:47,000 colony compared to the rest.
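The selection step on the ampicillin plate amounts to a simple filter, sketched below. The `Cell` class, the labels, and the assumption that every plasmid in play carries the amp resistance gene are all illustrative choices for this example, not details from the lecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    plasmid: Optional[str]  # None means the cell took up no DNA


def survives_ampicillin(cell: Cell) -> bool:
    # Assumption: every plasmid here carries the amp resistance gene.
    return cell.plasmid is not None


# Transformation is inefficient, so most plated cells carry nothing.
plated = [
    Cell(None),
    Cell("vector + T gene fragment"),
    Cell(None),
    Cell("vector + other fragment"),
    Cell(None),
]

colonies = [cell for cell in plated if survives_ampicillin(cell)]
print(len(colonies))  # 2
for cell in colonies:
    print(cell.plasmid)
```

Note that the drug removes only the cells with no plasmid. Both survivors form colonies, so telling the T gene colony from the other one needs a further identification step.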