1 00:00:00,100 --> 00:00:05,000 So I just want to back up a couple of steps because I think there's still some 2 00:00:05,100 --> 00:00:11,000 confusion about what a restriction enzyme is and exactly what it does, 3 00:00:11,100 --> 00:00:16,000 although I indicated a three prime hydroxyl and a five-prime phosphate. 4 00:00:16,100 --> 00:00:22,000 Let me show you. This is the way a lot, at least a large class of 5 00:00:22,100 --> 00:00:27,000 restriction enzymes, work. We've seen a deoxyribose 6 00:00:27,100 --> 00:00:33,000 backbone before with a phosphate backbone going on up to 7 00:00:33,100 --> 00:00:40,000 the next nucleotide. This is the five-prime position. 8 00:00:40,100 --> 00:00:48,000 This is the three prime position. There is a phosphate here. It goes 9 00:00:48,100 --> 00:00:57,000 down to the next one. It goes down like this. 10 00:00:57,100 --> 00:01:06,000 So what I'm showing you, GAATTC, five prime to three prime, 11 00:01:06,100 --> 00:01:15,000 here's where I indicated the first cut comes from. 12 00:01:15,100 --> 00:01:20,000 And I indicated that the cleavage generates a G with a three prime 13 00:01:20,100 --> 00:01:26,000 hydroxyl, and the A ends up with a five prime phosphate. 14 00:01:26,100 --> 00:01:32,000 So what that means is that the hydrolysis happens right there, 15 00:01:32,100 --> 00:01:38,000 so that after cleavage, what you end up with is a five-prime phosphate 16 00:01:38,100 --> 00:01:44,000 and three prime hydroxyl. This is paired with the C on the 17 00:01:44,100 --> 00:01:51,000 opposite strand. It makes one cut here, 18 00:01:51,100 --> 00:01:57,000 and then it would make an identical cut on the opposite strand. 19 00:01:57,100 --> 00:02:03,000 So if we were to pull those apart this strand here would have G with 20 00:02:03,100 --> 00:02:10,000 its three prime OH, and the other strand would be CTTAA. 21 00:02:10,100 --> 00:02:16,000 And if we were to pull them apart. 
We'd have A A T T C, a three prime 22 00:02:16,100 --> 00:02:23,000 OH here, five prime phosphate there, five prime phosphate there. Again, 23 00:02:23,100 --> 00:02:29,000 you have to remember, this strand is going five prime, 24 00:02:29,100 --> 00:02:36,000 three prime in that direction. This was going five prime, three 25 00:02:36,100 --> 00:02:42,000 prime in the other direction. And the beauty of these restriction 26 00:02:42,100 --> 00:02:46,000 enzymes, and it's not true of all of them, but it's true of a lot 27 00:02:46,100 --> 00:02:50,000 of them, is that they generate what are called sticky ends. 28 00:02:50,100 --> 00:02:54,000 You can pull them apart. They come back together, though, 29 00:02:54,100 --> 00:02:58,000 and reform those base pairs. It's almost like having little bits of 30 00:02:58,100 --> 00:03:02,000 Velcro at the end. And when this end is looking for a 31 00:03:02,100 --> 00:03:06,000 complementary sequence to pair with, it doesn't know what's out here, 32 00:03:06,100 --> 00:03:10,000 and it doesn't know what's out there, all it sees is this. 33 00:03:10,100 --> 00:03:14,000 So I can cut this and rejoin it or I can cut it and pull it apart, 34 00:03:14,100 --> 00:03:19,000 take another piece of DNA that's been cut with the same enzyme and 35 00:03:19,100 --> 00:03:23,000 therefore has the same corresponding little sticky ends on each end, 36 00:03:23,100 --> 00:03:27,000 and it could insert right in the middle. And that's the 37 00:03:27,100 --> 00:03:32,000 principle of cloning. And it was the development of these, 38 00:03:32,100 --> 00:03:38,000 if you will, magic scissors that made it possible to take this DNA 39 00:03:38,100 --> 00:03:43,000 which looks so homogeneous, nothing but G's, A's, T's, and C's, 40 00:03:43,100 --> 00:03:49,000 and then cut it up in defined ways. 
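[Editor's sketch, not part of the lecture] The EcoRI cut described above can be illustrated with a few lines of Python. This is a minimal sketch that models only the top strand as a string; the function names (`ecori_cut`, `reverse_complement`) and the example sequence are made up for illustration. It assumes the cut falls between the G and the first A of GAATTC, as the lecture describes.

```python
SITE = "GAATTC"   # EcoRI recognition sequence, read 5'->3'
CUT_OFFSET = 1    # the cut falls between the G and the first A

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, also read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ecori_cut(top):
    """Cut a top strand (5'->3') at its first EcoRI site.

    Returns (left_top, right_top, overhang), where overhang is the
    single-stranded stretch left unpaired at the 5' end of the
    right-hand piece after the staggered cut.
    """
    i = top.find(SITE)
    if i < 0:
        raise ValueError("no EcoRI site in this sequence")
    cut = i + CUT_OFFSET
    left_top, right_top = top[:cut], top[cut:]
    # The bottom strand is cut at the mirror-image position, so the
    # unpaired stretch is the site minus one base from each end.
    overhang = right_top[:len(SITE) - 2 * CUT_OFFSET]
    return left_top, right_top, overhang


# Hypothetical example sequence containing one site.
left, right, overhang = ecori_cut("CCGGAATTCTT")
print(left, right, overhang)
```

Because GAATTC reads the same on both strands, the overhang (AATT) is its own reverse complement, which is why any two EcoRI-cut ends can pair with each other regardless of what DNA lies beyond them.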
When I was a postdoc, a friend of 41 00:03:49,100 --> 00:03:54,000 mine had just purified $2 million worth of EcoRI because he purified 42 00:03:54,100 --> 00:04:00,000 some of this enzyme. At that point, 43 00:04:00,100 --> 00:04:04,000 the only way to get these things was to produce them yourself and purify 44 00:04:04,100 --> 00:04:08,000 them. Now there are literally hundreds of these and they recognize 45 00:04:08,100 --> 00:04:12,000 different sequences, and once people understood that they 46 00:04:12,100 --> 00:04:16,000 existed, then they just started to look in different, 47 00:04:16,100 --> 00:04:20,000 usually they're from bacteria, and they just looked in different 48 00:04:20,100 --> 00:04:24,000 bacteria until they found another one. And then they purified it. 49 00:04:24,100 --> 00:04:28,000 So if you go to, say, any of the companies that do stuff for 50 00:04:28,100 --> 00:04:32,000 recombinant DNA, you'll find lists like this. 51 00:04:32,100 --> 00:04:36,000 Funny little abbreviations like EcoRI usually have something that 52 00:04:36,100 --> 00:04:40,000 tells you some abbreviation related to the organism, 53 00:04:40,100 --> 00:04:45,000 from which the restriction enzyme was isolated. And you can find 54 00:04:45,100 --> 00:04:49,000 things that will cut, I won't say every sequence, 55 00:04:49,100 --> 00:04:54,000 but very, very, many sequences. There are literally hundreds of 56 00:04:54,100 --> 00:04:58,000 these, and you just order them. And the next day, a FedEx package 57 00:04:58,100 --> 00:05:02,000 arrives with a little bit of the enzyme that will cut 58 00:05:02,100 --> 00:05:08,000 at that sequence. Another concept that seemed to be a 59 00:05:08,100 --> 00:05:16,000 problem, was what's a vector? 
So if you understand that there are 60 00:05:16,100 --> 00:05:23,000 sequence-specific molecular scissors, then if we have a piece of DNA and 61 00:05:23,100 --> 00:05:31,000 there's an EcoRI site here, you would come to think of it like 62 00:05:31,100 --> 00:05:39,000 that because it's going to cut in a slightly skewed way. 63 00:05:39,100 --> 00:05:44,000 Maybe there's another one right here, another one over here. 64 00:05:44,100 --> 00:05:50,000 If we take this DNA and cut it with this particular enzyme, 65 00:05:50,100 --> 00:05:56,000 we get a break here. Then we'll get this piece running from here to 66 00:05:56,100 --> 00:06:02,000 here. We'll get this little piece here. We'll get this piece here, 67 00:06:02,100 --> 00:06:08,000 and we'll get whatever goes off on those sides. So that's naked 68 00:06:08,100 --> 00:06:13,000 DNA in a test tube. So I could cut any piece of DNA at 69 00:06:13,100 --> 00:06:19,000 some sites generating a bunch of fragments and if I just took those 70 00:06:19,100 --> 00:06:24,000 fragments and transformed them into E. coli, I took naked DNA, 71 00:06:24,100 --> 00:06:30,000 took it from the outside, put it inside, is it going 72 00:06:30,100 --> 00:06:35,000 to replicate? No. Why not? Because there's a special 73 00:06:35,100 --> 00:06:39,000 signal called the origin of replication that says "start 74 00:06:39,100 --> 00:06:43,000 replicating DNA here". If this came from a piece of human DNA, 75 00:06:43,100 --> 00:06:47,000 let's say some of my DNA, it would not have a signal in it that says to 76 00:06:47,100 --> 00:06:52,000 the E. coli replication machinery, "start replicating DNA right here". 77 00:06:52,100 --> 00:06:56,000 So the principle, apart from being able to cut DNA into fragments, 78 00:06:56,100 --> 00:07:00,000 is you have to get them to replicate so you can make lots and 79 00:07:00,100 --> 00:07:05,000 lots of copies. 
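[Editor's sketch, not part of the lecture] The digest described above, where a linear piece of DNA with several EcoRI sites is cut into a set of fragments, can be sketched as follows. The function name `digest` and the example sequence are hypothetical; only the top strand is tracked, and the cut position inside GAATTC is the one given earlier in the lecture.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Return the top-strand fragments of a linear DNA cut at every site.

    dna is read 5'->3'. cut_offset is how far into the site the cut
    falls (1 means between G and A for GAATTC).
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])   # piece to the left of this cut
        start = cut
        pos = dna.find(site, pos + 1)      # look for the next site
    fragments.append(dna[start:])          # whatever runs off the far side
    return fragments


# Hypothetical linear DNA with three EcoRI sites.
example = "TTTGAATTCCCCCGAATTCAAGAATTCGG"
pieces = digest(example)
print(pieces)
```

Three sites in a linear molecule give four fragments. None of them carries an origin of replication unless the source DNA happened to include one that the host recognizes, which is the problem the vector solves.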
The trick is to attach the DNA, 80 00:07:05,100 --> 00:07:11,000 at least a widely used trick is to attach the DNA to something that has 81 00:07:11,100 --> 00:07:16,000 an origin of replication that will work in the organism in question. 82 00:07:16,100 --> 00:07:21,000 And that was what we call a vector. So this is an E. coli cell. 83 00:07:21,100 --> 00:07:27,000 Another thing that's very confusing is all the circles that show up in 84 00:07:27,100 --> 00:07:32,000 this course. This is huge. The vector was double-stranded DNA 85 00:07:32,100 --> 00:07:36,000 that maybe, let's say, had a unique EcoRI restriction site 86 00:07:36,100 --> 00:07:40,000 in it. That's the only EcoRI restriction site. 87 00:07:40,100 --> 00:07:44,000 The other thing that we'd need to have is an origin of DNA replication. 88 00:07:44,100 --> 00:07:48,000 So, that's why this plasmid is able to propagate itself. 89 00:07:48,100 --> 00:07:52,000 This little circle of DNA is able to propagate itself, 90 00:07:52,100 --> 00:07:56,000 and then some kind of selectable marker. And, most of the time 91 00:07:56,100 --> 00:08:00,000 that's a drug-resistance, so although it doesn't have to be. 92 00:08:00,100 --> 00:08:05,000 And if we cut that here, 93 00:08:05,100 --> 00:08:13,000 generate sticky ends, then we can take this fragment and stick it in 94 00:08:13,100 --> 00:08:20,000 here, to give this piece here joined to the vector. 95 00:08:20,100 --> 00:08:28,000 And that's an insert. Let's say it was that piece there. 96 00:08:28,100 --> 00:08:31,000 In fact, if you wanted to clone DNA in E. coli, and then have the 97 00:08:31,100 --> 00:08:35,000 plasmid work in yeast, if you just take that plasmid that 98 00:08:35,100 --> 00:08:38,000 works, the vector with its insert that works in E. 99 00:08:38,100 --> 00:08:42,000 coli and put it in the yeast, it won't replicate either. 
And 100 00:08:42,100 --> 00:08:45,000 that's because these other languages other than the genetic code are not 101 00:08:45,100 --> 00:08:49,000 universal. So, you also have to put in a 102 00:08:49,100 --> 00:08:52,000 sequence that says to the yeast replication machinery "start 103 00:08:52,100 --> 00:08:56,000 replicating here". People call that a shuttle vector, 104 00:08:56,100 --> 00:08:59,000 something that will replicate in E. coli or replicate in yeast, 105 00:08:59,100 --> 00:09:03,000 and the same principle applies to other organisms. 106 00:09:03,100 --> 00:09:07,000 Okay, now probably the trickiest thing, the thing where I sort of 107 00:09:07,100 --> 00:09:11,000 muddled it on Friday, and I apologize for that, 108 00:09:11,100 --> 00:09:15,000 was this discovery of restriction enzymes. And again, 109 00:09:15,100 --> 00:09:19,000 some of you were frustrated. You said, why do I waste time? 110 00:09:19,100 --> 00:09:23,000 Why not just tell you stuff that's on the exam? Okay again, 111 00:09:23,100 --> 00:09:27,000 people were talking about molecular scissors when I was an undergrad and 112 00:09:27,100 --> 00:09:31,000 grad student, and chemists were trying to think if they could come 113 00:09:31,100 --> 00:09:36,000 up with some way to get some specificity in cutting DNA. 114 00:09:36,100 --> 00:09:40,000 And the answer, the discovery of restriction enzymes 115 00:09:40,100 --> 00:09:44,000 didn't come from that kind of experiment. It came from somebody 116 00:09:44,100 --> 00:09:48,000 trying to understand what seemed to be a really obscure piece of biology. 117 00:09:48,100 --> 00:09:52,000 Julie made up a lovely little slide. But I think what I basically did was 118 00:09:52,100 --> 00:09:56,000 leave out one of the little layers that I usually show. 119 00:09:56,100 --> 00:10:03,000 So, let me just talk. Here's, again, what Luria saw when 120 00:10:03,100 --> 00:10:10,000 he was doing these experiments. 
So, I'm going to tell you now what 121 00:10:10,100 --> 00:10:17,000 was in strain A and B, and maybe that will help. 122 00:10:17,100 --> 00:10:24,000 But I wanted you first to see it without knowing anything beyond 123 00:10:24,100 --> 00:10:31,000 what Luria knew. OK, strain A has no restriction 124 00:10:31,100 --> 00:10:38,000 enzyme, and no modification enzyme. 125 00:10:38,100 --> 00:10:48,000 And although there are different types of modification enzymes, 126 00:10:48,100 --> 00:10:58,000 many of them are methylases. So, we'll call it that. And this one has a 127 00:10:58,100 --> 00:11:06,000 restriction enzyme. And, it has a corresponding methylase. 128 00:11:06,100 --> 00:11:12,000 Just to review that again, you can pretty much figure this out from 129 00:11:12,100 --> 00:11:18,000 first principles: if you were an organism, and you had a restriction 130 00:11:18,100 --> 00:11:24,000 enzyme that would cut this, there are two things you can 131 00:11:24,100 --> 00:11:31,000 possibly do to keep from cutting up your own DNA. 132 00:11:31,100 --> 00:11:35,000 One is to never have that sequence in your DNA. That would prevent you 133 00:11:35,100 --> 00:11:39,000 from cutting up your own DNA, even though you have the enzyme. It's pretty 134 00:11:39,100 --> 00:11:43,000 constraining, though, because somewhere in a particular 135 00:11:43,100 --> 00:11:48,000 protein, you might need that little bit of sequence to encode something 136 00:11:48,100 --> 00:11:52,000 that you need to make a critical protein. So instead what you find 137 00:11:52,100 --> 00:11:56,000 is organisms have a restriction enzyme, have those sequences, 138 00:11:56,100 --> 00:12:00,000 but they don't cut up their own DNA because they modify their own DNA, 139 00:12:00,100 --> 00:12:05,000 by putting, in the case of this one, they put a methyl here. 140 00:12:05,100 --> 00:12:09,000 And I drew that out the other day. 
You can see, you can put a methyl 141 00:12:09,100 --> 00:12:13,000 group on the exocyclic amino group of adenine, and not interfere with 142 00:12:13,100 --> 00:12:17,000 base pairing. But what you can do is interfere with the way that the 143 00:12:17,100 --> 00:12:21,000 restriction enzyme sees that sequence. And when the cell pulls 144 00:12:21,100 --> 00:12:25,000 the DNA apart, each of the old strands is 145 00:12:25,100 --> 00:12:29,000 methylated, and a new strand is initially not methylated. 146 00:12:29,100 --> 00:12:33,000 But that's enough to keep the restriction enzyme from 147 00:12:33,100 --> 00:12:37,000 doing its thing. And then the methylase will come 148 00:12:37,100 --> 00:12:41,000 along, find a sequence, and then the progeny strand, 149 00:12:41,100 --> 00:12:44,000 the daughter strand, will begin to get methylated. 150 00:12:44,100 --> 00:12:48,000 So, once you get DNA methylated, you can propagate it as long as you 151 00:12:48,100 --> 00:12:51,000 have the methylase. So this was what was really 152 00:12:51,100 --> 00:12:55,000 underlying what Luria did. But he didn't know that. 153 00:12:55,100 --> 00:13:00,000 Let me quickly just go through this again. So he grew the strain, 154 00:13:00,100 --> 00:13:05,000 the phage on strain A, no restriction enzyme, 155 00:13:05,100 --> 00:13:10,000 just plain old DNA. So, he picks a plaque, and there's probably 156 00:13:10,100 --> 00:13:15,000 about a billion phage in a plaque, somewhere around 10^8, 10^9 probably, 157 00:13:15,100 --> 00:13:20,000 no that's not true, a little less than that, 158 00:13:20,100 --> 00:13:25,000 somewhat less than that, but lots and lots of phage particles in the 159 00:13:25,100 --> 00:13:30,000 plaque. He resuspends them and then plates them out. 160 00:13:30,100 --> 00:13:34,000 And of course it grew on strain A. That's what it was growing on. 161 00:13:34,100 --> 00:13:38,000 That wouldn't surprise you at all. 
The surprise was, even though he 162 00:13:38,100 --> 00:13:42,000 knew he had lots and lots of phage, hardly any of them grew on strain B. 163 00:13:42,100 --> 00:13:47,000 But, he found a rare plaque that had learned to grow. 164 00:13:47,100 --> 00:13:51,000 And he tested it then, and this thing grows on strain B. 165 00:13:51,100 --> 00:13:55,000 That was not a surprise because it had to grow on strain B to be up 166 00:13:55,100 --> 00:14:00,000 there. And he tested, and it still grew on strain A. 167 00:14:00,100 --> 00:14:04,000 So up until now, everything we knew, 168 00:14:04,100 --> 00:14:09,000 everything we've talked about in this course, you think, 169 00:14:09,100 --> 00:14:13,000 ah-ha, this original phage couldn't grow on strain B, 170 00:14:13,100 --> 00:14:18,000 but it's mutated. Somehow. It's learned to grow on strain B. 171 00:14:18,100 --> 00:14:22,000 That would be a perfectly reasonable explanation. 172 00:14:22,100 --> 00:14:27,000 But if that were the case, what would've happened? 173 00:14:27,100 --> 00:14:31,000 Well it was growing on those phage, you pick them again, it should still 174 00:14:31,100 --> 00:14:35,000 grow on A and B, and they still do. 175 00:14:35,100 --> 00:14:39,000 The problem with that model was that they grew on strain A. 176 00:14:39,100 --> 00:14:43,000 They had trouble growing on strain B. Now they've learned to grow 177 00:14:43,100 --> 00:14:47,000 on strain B, but they hadn't forgotten how to grow on strain A. 178 00:14:47,100 --> 00:14:51,000 But if you take the ones that were growing on strain A, 179 00:14:51,100 --> 00:14:56,000 they grow on strain A, not a surprise. 180 00:14:56,100 --> 00:15:00,000 This is the problem right here. If this was a mutant, a permanent 181 00:15:00,100 --> 00:15:04,000 change, then we should have been able to have it grow on strain B as 182 00:15:04,100 --> 00:15:08,000 well. So what was really happening in there? 
Well, 183 00:15:08,100 --> 00:15:12,000 at the beginning, the phage DNA was lacking any kind 184 00:15:12,100 --> 00:15:16,000 of modification. It grew fine on strain A, 185 00:15:16,100 --> 00:15:20,000 because there was no restriction enzyme. When it went into strain B, 186 00:15:20,100 --> 00:15:24,000 that now had an enzyme, a restriction enzyme that cut up the DNA any 187 00:15:24,100 --> 00:15:29,000 time it found that sequence. And so, most of the phage that 188 00:15:29,100 --> 00:15:33,000 injected their DNA, those DNAs were trashed by the 189 00:15:33,100 --> 00:15:37,000 restriction enzyme. But there is a methylase in there 190 00:15:37,100 --> 00:15:41,000 that's also able to methylate those sequences. And what happened 191 00:15:41,100 --> 00:15:45,000 somewhere along the way was that the methylase got onto 192 00:15:45,100 --> 00:15:50,000 one phage DNA, and put enough methyls on there that the phage 193 00:15:50,100 --> 00:15:54,000 could be replicated before it got cut up by the restriction enzyme. 194 00:15:54,100 --> 00:15:58,000 Once the phage molecule has methylations on the site, 195 00:15:58,100 --> 00:16:02,000 it's able to grow just fine on strain B, and it will be able to do 196 00:16:02,100 --> 00:16:07,000 that forever. However, if you take that DNA with 197 00:16:07,100 --> 00:16:11,000 the methyls on it, and we put it back on strain A, 198 00:16:11,100 --> 00:16:15,000 it'll still grow, as this one doesn't have any kind of restriction 199 00:16:15,100 --> 00:16:19,000 enzyme, but while it's growing it's busy losing all the methyls again. 200 00:16:19,100 --> 00:16:23,000 We are right back to where we started from. An obscure experiment, 201 00:16:23,100 --> 00:16:27,000 one of the most obscure you could get, many people would have paid no 202 00:16:27,100 --> 00:16:31,000 attention. It does not seem worth it. 203 00:16:31,100 --> 00:16:35,000 So, the phenomenon was called restriction. They had to give it a 204 00:16:35,100 --> 00:16:39,000 name. 
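[Editor's sketch, not part of the lecture] The explanation above can be turned into a small bookkeeping model that follows whether the phage DNA carries methyl groups as it is passed from strain to strain. Everything here is a simplification made up for illustration: the `Strain` class, the `plate` function, and the two-level "high"/"rare" plating outcome are assumptions, not anything Luria measured in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strain:
    name: str
    has_restriction_enzyme: bool
    has_methylase: bool


STRAIN_A = Strain("A", has_restriction_enzyme=False, has_methylase=False)
STRAIN_B = Strain("B", has_restriction_enzyme=True, has_methylase=True)


def plate(phage_methylated, strain):
    """Return (plating, progeny_methylated) for phage grown on a strain.

    plating is "rare" when unmethylated DNA meets a restriction enzyme
    (only an occasional molecule gets methylated before being cut),
    and "high" otherwise. Progeny DNA is methylated only if the host
    it was made in has the methylase.
    """
    if strain.has_restriction_enzyme and not phage_methylated:
        plating = "rare"
    else:
        plating = "high"
    progeny_methylated = strain.has_methylase
    return plating, progeny_methylated


# Follow one phage stock through the series of hosts in the experiment.
history = []
methylated = False  # the stock was originally grown on strain A
for strain in (STRAIN_A, STRAIN_B, STRAIN_B, STRAIN_A, STRAIN_B):
    plating, methylated = plate(methylated, strain)
    history.append((strain.name, plating))
print(history)
```

The last step is the one a mutation cannot explain: after one passage through strain A, the phage that had "learned" to grow on strain B are back to plating rarely on it, because the change was in the methylation of the DNA rather than in its sequence.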
It wasn't mutation. They said this phage DNA was being 205 00:16:39,100 --> 00:16:44,000 restricted somehow when it grew on strain B. People 206 00:16:44,100 --> 00:16:48,000 postulated there must be a restriction enzyme that was doing 207 00:16:48,100 --> 00:16:53,000 this restricting of phage growth. When they found out what was doing 208 00:16:53,100 --> 00:16:57,000 it, they had discovered magic scissors that would cut DNA at 209 00:16:57,100 --> 00:17:01,000 particular sequences. So, the point here again, 210 00:17:01,100 --> 00:17:04,000 to try and go back and do this, I hope some of you will get this 211 00:17:04,099 --> 00:17:08,000 anyway, many of the really important discoveries come out of basic 212 00:17:08,099 --> 00:17:11,000 research. They are easy to ridicule. Why would I spend money meant for cancer, 213 00:17:11,099 --> 00:17:14,000 human disease or something, on somebody studying some little weird 214 00:17:14,099 --> 00:17:18,000 phenomenon about phage? But if you want to trace back to 215 00:17:18,099 --> 00:17:21,000 the experiment that sort of started the biotech 216 00:17:21,099 --> 00:17:24,000 industry, it was the discovery of restriction enzymes. 217 00:17:24,099 --> 00:17:28,000 It took a little while to discover what it was, but the reason they 218 00:17:28,099 --> 00:17:31,000 were discovered was that people were trying to 219 00:17:31,100 --> 00:17:34,000 understand that phenomenology. Once we got restriction enzymes, 220 00:17:34,100 --> 00:17:38,000 we already had ligase, which is sort of the tape we'd need, 221 00:17:38,100 --> 00:17:42,000 and just to show you here, when we go back to this one, 222 00:17:42,100 --> 00:17:46,000 you can see that if we put these together again, 223 00:17:46,100 --> 00:17:50,000 we have a three prime hydroxyl, and a five-prime phosphate. 
That's 224 00:17:50,100 --> 00:17:54,000 what DNA ligase knows how to do, because that's how you seal up the 225 00:17:54,100 --> 00:17:58,000 end of an Okazaki fragment. So, that particular part of the 226 00:17:58,100 --> 00:18:02,000 molecular biology toolkit was already known to molecular 227 00:18:02,100 --> 00:18:06,000 biologists who had been studying DNA replication. 228 00:18:06,100 --> 00:18:13,000 OK, so, if we took some DNA from anything, and we cut it up into pieces 229 00:18:13,100 --> 00:18:21,000 like this, and then we joined them with a vector that had been cut, 230 00:18:21,100 --> 00:18:29,000 so let's just sort of open it up a little bit, this fragment would go 231 00:18:29,100 --> 00:18:37,000 in here into one vector molecule. 232 00:18:37,100 --> 00:18:42,000 This fragment would insert in another vector molecule, 233 00:18:42,100 --> 00:18:47,000 and so on and so forth. Then we would have what I said was a library, 234 00:18:47,100 --> 00:18:52,000 and the problem at this point is, so you transform those into E. coli, 235 00:18:52,100 --> 00:18:58,000 and now we have a whole series of E. coli. They have their own 236 00:18:58,100 --> 00:19:03,000 chromosome, every one of them because they still have 237 00:19:03,100 --> 00:19:10,000 to be a bacterium. So, let's take three members of E. 238 00:19:10,100 --> 00:19:19,000 coli from this library, and they will all have this vector, 239 00:19:19,100 --> 00:19:28,000 but they'll have, let's say, insert number 1, 2, 3. This insert 240 00:19:28,100 --> 00:19:35,000 is little. This insert is bigger, 241 00:19:35,100 --> 00:19:40,000 and so on. If we did it right, we have every possible fragment of 242 00:19:40,100 --> 00:19:46,000 DNA from the original source sitting in its own vector. 243 00:19:46,100 --> 00:19:51,000 And the whole collection of E. 
coli in this population is termed a 244 00:19:51,100 --> 00:19:56,000 library, and the next part of the trick was, 245 00:19:56,100 --> 00:20:02,000 how do you find the thing you want, especially if you take my DNA with 3 246 00:20:02,100 --> 00:20:07,000 billion base pairs. That's an awful lot of restriction 247 00:20:07,100 --> 00:20:11,000 fragments no matter what you do. How do you go about doing it? So 248 00:20:11,100 --> 00:20:15,000 the experiment that I showed you at the end of the lecture, 249 00:20:15,100 --> 00:20:20,000 cloning by complementation, is fairly simple, and it was 250 00:20:20,100 --> 00:20:24,000 basically one of the first methods that was used to find genes in 251 00:20:24,100 --> 00:20:28,000 a recombinant library. And that would be, for example, 252 00:20:28,100 --> 00:20:33,000 something that had a hisG mutation in the chromosomal DNA. 253 00:20:33,100 --> 00:20:38,000 This is the situation I'd described the other day. 254 00:20:38,100 --> 00:20:43,000 So, we put the library into cells such that the bacterium 255 00:20:43,100 --> 00:20:48,000 we transformed the library into was broken for the hisG gene, 256 00:20:48,100 --> 00:20:53,000 and that mutant couldn't grow on minimal medium unless we 257 00:20:53,100 --> 00:20:58,000 added histidine. But, if one member of that library 258 00:20:58,100 --> 00:21:03,000 had the wild-type hisG gene, let's say it was this one 259 00:21:03,100 --> 00:21:08,000 here, maybe it had several genes on it. 260 00:21:08,100 --> 00:21:13,000 But let's say over here we had hisG+, then the strain is 261 00:21:13,100 --> 00:21:19,000 back to being able to synthesize histidine because it's got all the 262 00:21:19,100 --> 00:21:25,000 enzymes. What I was pointing out was this really is complementation, 263 00:21:25,100 --> 00:21:31,000 just like we did in that phage cross. We've got one broken copy of the 264 00:21:31,100 --> 00:21:36,000 gene. We've got a good copy. 
And all you need is one good copy, 265 00:21:36,100 --> 00:21:41,000 and you're back in business. What I was saying at the end of the lecture 266 00:21:41,100 --> 00:21:46,000 was this is not a general solution, though. If I wanted to find the 267 00:21:46,100 --> 00:21:51,000 corresponding histidine gene from my DNA, and all of these biosynthetic 268 00:21:51,100 --> 00:21:56,000 pathways, pretty much they rose so early in evolution, 269 00:21:56,100 --> 00:22:01,000 the biochemistry is essentially identical in all cells. 270 00:22:01,100 --> 00:22:06,000 Can I use this approach to find the same gene for my DNA? 271 00:22:06,100 --> 00:22:09,000 What do you think? Why don't you turn to somebody 272 00:22:09,100 --> 00:22:13,000 beside you and see if you can talk for a minute, and then let's see if, 273 00:22:13,100 --> 00:22:17,000 I can think of at least a couple of problems. Let's see if you can come 274 00:22:17,100 --> 00:22:21,000 up with one or two of them. Find somebody near you and see if 275 00:22:21,100 --> 00:23:06,000 you can come up with anything. 276 00:23:06,100 --> 00:23:14,000 Anybody want to volunteer? An idea of why it would not work? 277 00:23:14,100 --> 00:23:22,000 Or some of you think it would? No ideas? God, it's Monday. 278 00:23:22,100 --> 00:23:30,000 [LAUGHTER] I feel like most of you guys. 279 00:23:30,100 --> 00:23:36,000 Somebody, come on. What do you think? 280 00:23:36,100 --> 00:23:42,000 It's going to work? No idea? What has to happen for it 281 00:23:42,100 --> 00:23:48,000 to work? I'll give you a vector that has my gene corresponding to 282 00:23:48,100 --> 00:23:54,000 that enzyme. It's in E. coli. I need to make the protein. 283 00:23:54,100 --> 00:24:00,000 Yeah? No, it's in the vector. We cloned it into an E. 284 00:24:00,100 --> 00:24:07,000 coli vector. So, that's got it. Yeah? 
285 00:24:07,100 --> 00:24:11,000 Well it'll have a language that will say "start transcription", 286 00:24:11,100 --> 00:24:15,000 but whose language is it going to have? It's going to have my 287 00:24:15,100 --> 00:24:19,000 transcriptional stuff. Will that work in E. coli? 288 00:24:19,100 --> 00:24:23,000 Even though the open reading frame is fine, that's good. 289 00:24:23,100 --> 00:24:27,000 How about translation? I didn't even tell you about that. 290 00:24:27,100 --> 00:24:31,000 There is actually some specific 291 00:24:31,100 --> 00:24:35,000 stuff needed. That's not universal, either. So when you get the RNA you 292 00:24:35,100 --> 00:24:39,000 still have to translate it. There is another thing that might 293 00:24:39,100 --> 00:24:43,000 mess us up. Do you remember anything else about, 294 00:24:43,100 --> 00:24:47,000 yeah? Introns and exons? What if my gene has introns in it, 295 00:24:47,100 --> 00:24:51,000 which it almost surely has? We have to get rid of those. 296 00:24:51,100 --> 00:24:55,000 E. coli doesn't know what they are. It's not used to taking them out. 297 00:24:55,100 --> 00:24:59,000 You see the issues? Although that's a cute thing and 298 00:24:59,100 --> 00:25:03,000 that helps you find a gene from E. coli by complementing an E. coli mutant, 299 00:25:03,100 --> 00:25:07,000 or maybe you could do it with yeast if you had a vector that would 300 00:25:07,100 --> 00:25:11,000 replicate in yeast, it wasn't a general solution. 301 00:25:11,100 --> 00:25:14,000 So people had to use a whole variety of different ways. 302 00:25:14,100 --> 00:25:17,000 Here's another way. You know that you have that genetic 303 00:25:17,100 --> 00:25:21,000 code. That was worked out years ago. 
So let's say I was a biochemist, 304 00:25:21,100 --> 00:25:24,000 and I'd found a protein that I was interested in, 305 00:25:24,100 --> 00:25:28,000 and I purified it, and I got it down to a single 306 00:25:28,100 --> 00:25:31,000 protein, and then I could cut it up with things called proteases that will 307 00:25:31,100 --> 00:25:35,000 cut the protein into pieces, and there are ways of sequencing 308 00:25:35,100 --> 00:25:39,000 protein. I'm not going to tell you how it 309 00:25:39,100 --> 00:25:43,000 works in this course. We just don't have time. 310 00:25:43,100 --> 00:25:47,000 But you can get the sequence of little pieces of protein. 311 00:25:47,100 --> 00:25:52,000 And let's imagine that this was the sequence of part of the protein that 312 00:25:52,100 --> 00:25:56,000 I purified. It's one of my enzymes [SOUND OFF/THEN ON] and I'd like to 313 00:25:56,100 --> 00:26:00,000 find the gene. Well, how could I use that 314 00:26:00,100 --> 00:26:05,000 information to figure out where the gene is in this library? 315 00:26:05,100 --> 00:26:11,000 So here's the strategy. We get out the genetic code, 316 00:26:11,100 --> 00:26:18,000 which Gobind Khorana and Marshall Nirenberg helped work out, 317 00:26:18,100 --> 00:26:24,000 and we say OK, alanine, and if you look it up, what you'll find is that 318 00:26:24,100 --> 00:26:31,000 it's GC, and then it can be A T C or G. It can be any of those. 319 00:26:31,100 --> 00:26:36,000 If we look up the codon for aspartate, we'll find that there's 320 00:26:36,100 --> 00:26:41,000 a G and an A, and then it could be T, or it could be C. We look up 321 00:26:41,100 --> 00:26:46,000 lysine; it'll be A, A, and then it could be A or G. You'll 322 00:26:46,100 --> 00:26:51,000 notice the variation of those things is almost all in the third position of the codon if 323 00:26:51,100 --> 00:26:57,000 that hadn't struck you. 
Same thing with threonine: A, 324 00:26:57,100 --> 00:27:02,000 C, and this is another one; there are four codons that encode this. 325 00:27:02,100 --> 00:27:09,000 And this one, asparagine, is that. So knowing that piece of the protein 326 00:27:09,100 --> 00:27:18,000 doesn't define a unique sequence. So, what we could do is we 327 00:27:18,100 --> 00:27:27,000 could synthesize what's called a mixed probe. 328 00:27:27,100 --> 00:27:32,000 And that would mean when we are going to synthesize this DNA, 329 00:27:32,100 --> 00:27:38,000 we'd start with a G building block. And then we'd add a C building 330 00:27:38,100 --> 00:27:43,000 block. So, now we made G and C. And at the next step, we'd add an 331 00:27:43,100 --> 00:27:49,000 equal mixture of A, T, G, and C. So, what we would get 332 00:27:49,100 --> 00:27:54,000 out of that would be we'd be getting G, and then the next biosynthetic 333 00:27:54,100 --> 00:28:00,000 step would give us G, C. And then the next biochemical 334 00:28:00,100 --> 00:28:06,000 step would give us GCA, GCT, GCC, or GCG. 335 00:28:06,100 --> 00:28:10,000 At the next step, we'd add a G. So every one of these 336 00:28:10,100 --> 00:28:15,000 would get a G. Every one would get an A, 337 00:28:15,100 --> 00:28:20,000 and then the next step, they would branch. And if you follow that out, 338 00:28:20,100 --> 00:28:25,000 you'll see by the end you have a mixture of probes. 339 00:28:25,100 --> 00:28:30,000 One of them is going to be the right one that you find in the DNA. 340 00:28:30,100 --> 00:28:34,000 Now if you work out the number of possibilities, 341 00:28:34,100 --> 00:28:38,000 you'll discover that most of the time there is only going to be one 342 00:28:38,100 --> 00:28:42,000 probe that's unique. Once you get to about 20 343 00:28:42,100 --> 00:28:46,000 nucleotides, any sequence, on average, is represented once in 344 00:28:46,100 --> 00:28:50,000 the human genome. 
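[Editor's sketch, not part of the lecture] The mixed probe described above can be enumerated directly. This sketch assumes the peptide reads alanine, aspartate, lysine, threonine, asparagine in that order, which is the order the amino acids are mentioned; the actual sequence on the board is not in the transcript. The function names are made up for illustration, and the random-sequence estimate at the end is a naive one (equal base frequencies, one strand only).

```python
from itertools import product

# Codons (written as DNA) for the five amino acids named in the lecture.
CODONS = {
    "Ala": ["GCA", "GCT", "GCC", "GCG"],
    "Asp": ["GAT", "GAC"],
    "Lys": ["AAA", "AAG"],
    "Thr": ["ACA", "ACT", "ACC", "ACG"],
    "Asn": ["AAT", "AAC"],
}


def mixed_probe(peptide):
    """Return every DNA sequence that could encode the peptide."""
    choices = [CODONS[aa] for aa in peptide]
    return ["".join(codons) for codons in product(*choices)]


def expected_random_hits(length, genome_size=3_000_000_000):
    """Naive expected number of chance matches for one specific sequence
    of the given length in a random genome of genome_size base pairs."""
    return genome_size / 4 ** length


peptide = ["Ala", "Asp", "Lys", "Thr", "Asn"]   # assumed order
probes = mixed_probe(peptide)
print(len(probes), probes[:3])
print(expected_random_hits(16), expected_random_hits(20))
```

The mixture size is just the product of the codon counts, 4 x 2 x 2 x 4 x 2 = 128 oligonucleotides of 15 bases each. By the naive estimate, a specific sequence is expected less than once in 3 billion base pairs from about 16 bases upward, so a probe of around 20 bases is comfortably in the range the lecture describes.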
So as long as you make the probe 345 00:28:50,100 --> 00:28:54,000 long enough, one of the things in your mixture will be a defined probe. 346 00:28:54,100 --> 00:28:58,000 So, what we have is all these different pieces of DNA that 347 00:28:58,100 --> 00:29:03,000 are the logical variants you can see here. 348 00:29:03,100 --> 00:29:08,000 And then we would label the probe with P32. It's a radioactive 349 00:29:08,100 --> 00:29:14,000 isotope, and it's very easy. You can add it at the five-prime 350 00:29:14,100 --> 00:29:19,000 end. There's a special enzyme that will very easily take the 351 00:29:19,100 --> 00:29:25,000 terminal phosphate from ATP and put it over. It doesn't really 352 00:29:25,100 --> 00:29:31,000 matter for this course how we get it there. 353 00:29:31,100 --> 00:29:36,000 But what we can do is radioactively label the probe. So now we've got this 354 00:29:36,100 --> 00:29:41,000 mixture. And, somewhere in this library is a piece 355 00:29:41,100 --> 00:29:46,000 of DNA that's going to have the gene that's encoding the protein that 356 00:29:46,100 --> 00:29:51,000 we're interested in. So, how do you go about trying to 357 00:29:51,100 --> 00:29:56,000 deal with that? So, what we'll do is we'll plate 358 00:29:56,100 --> 00:30:01,000 our E. coli library onto a bunch of Petri plates. 359 00:30:01,100 --> 00:30:07,000 So, I won't put too many colonies on here, so we can sort of see a 360 00:30:07,100 --> 00:30:13,000 pattern. But, we'd probably have a lot of them, 361 00:30:13,100 --> 00:30:20,000 and we'd have a bunch of plates. You can work out statistically how 362 00:30:20,100 --> 00:30:26,000 many plates you need in order to have a good chance of finding your gene of 363 00:30:26,100 --> 00:30:33,000 interest. Then what we do is lay a membrane on the plate. 
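One way the "how many plates" statistics can be worked out is sketched below. It uses a standard estimate (often attributed to Clarke and Carbon), N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert and P is the desired probability that a given sequence is in the library. The insert size, genome size, and colonies per plate are illustrative assumptions, not numbers from the lecture, and the formula assumes inserts are a random sample of the genome.

```python
import math


def clones_needed(probability, insert_size, genome_size):
    """Number of independent clones needed so that a given sequence is
    present with the requested probability: N = ln(1 - P) / ln(1 - f)."""
    fraction = insert_size / genome_size
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


def plates_needed(n_clones, colonies_per_plate):
    """Number of Petri plates needed to hold n_clones colonies."""
    return math.ceil(n_clones / colonies_per_plate)


# Assumed values for illustration only.
n = clones_needed(probability=0.99, insert_size=20_000, genome_size=3_000_000_000)
print(n, "clones,", plates_needed(n, colonies_per_plate=5_000), "plates")
```

The point of the sketch is the shape of the dependence: asking for a higher probability, or using smaller inserts, pushes the number of colonies (and plates) up quickly.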
364 00:30:33,100 --> 00:30:40,000 It's a particular type of membrane, and what that will do is it will 365 00:30:40,100 --> 00:30:48,000 make a copy of everything that's there, and we're going to save the 366 00:30:48,100 --> 00:30:55,000 plate. And then we're going to treat the membrane, 367 00:30:55,100 --> 00:31:03,000 OK? We've got a membrane that's got an identical pattern. 368 00:31:03,100 --> 00:31:09,000 It's got some of the bacteria from the colonies stuck at the 369 00:31:09,100 --> 00:31:15,000 corresponding parts on the membrane. We're going to lyse the E. coli; 370 00:31:15,100 --> 00:31:22,000 that means break them open so all their insides spill out. 371 00:31:22,100 --> 00:31:28,000 We will denature the DNA by treating it with a denaturing condition. 372 00:31:28,100 --> 00:31:35,000 You can, for example, vary the pH and make the strands come apart. 373 00:31:35,100 --> 00:31:40,000 That gives single stranded DNA; "ss" I'm using as an abbreviation 374 00:31:40,100 --> 00:31:46,000 for single stranded, the strands you pulled apart. And, 375 00:31:46,100 --> 00:31:51,000 this sticks to the membrane. So, now we've got, at every one of 376 00:31:51,100 --> 00:31:57,000 these little positions on the membrane, something that 377 00:31:57,100 --> 00:32:03,000 looks like this. Here's the membrane, 378 00:32:03,100 --> 00:32:09,000 and there's some sort of single stranded DNA that's stuck to the 379 00:32:09,100 --> 00:32:15,000 membrane in that fashion. The DNA that's stuck here came from 380 00:32:15,100 --> 00:32:21,000 the bacterium here that had a particular insert. 381 00:32:21,100 --> 00:32:27,000 Over here, we have all the E. coli DNA and the vector DNA, but we 382 00:32:27,100 --> 00:32:34,000 will have a different piece of DNA in the vector. 383 00:32:34,100 --> 00:32:38,000 Everybody with me? 
OK, so if we were now to take our 384 00:32:38,100 --> 00:32:42,000 radioactive probe that we made up there and get the conditions just 385 00:32:42,100 --> 00:32:46,000 right, that single stranded probe will come in, and it will try and 386 00:32:46,100 --> 00:32:50,000 find its complement. It'll form hydrogen bonds because 387 00:32:50,100 --> 00:32:54,000 that's the lowest energy well, if we think about it 388 00:32:54,100 --> 00:32:58,000 thermodynamically. And, if we can get it right, 389 00:32:58,100 --> 00:33:02,000 the temperature and the conditions right, nothing will stick unless 390 00:33:02,100 --> 00:33:06,000 it's an exact match to the sequence. 391 00:33:06,100 --> 00:33:12,000 And, if we get the right probe that can form hydrogen bonds with 392 00:33:12,100 --> 00:33:19,000 everything on here, and it's got P32 at this point, 393 00:33:19,100 --> 00:33:25,000 what will happen is the probe will stick, say, to this 394 00:33:25,100 --> 00:33:32,000 particular colony, so now there's radioactivity right there. 395 00:33:32,100 --> 00:33:40,000 So, we put some photographic film over 396 00:33:40,100 --> 00:33:48,000 the membrane, and right here there's P32, and that'll expose the film there and 397 00:33:48,100 --> 00:33:57,000 nowhere else. And then, when we develop it, what we'll find is 398 00:33:57,100 --> 00:34:06,000 one, if this works well, anyway, one spot. 399 00:34:06,100 --> 00:34:10,000 So, now we know that that colony had a piece of human DNA 400 00:34:10,100 --> 00:34:14,000 in it that was related to the sequence from the protein that I had 401 00:34:14,100 --> 00:34:18,000 purified. So we go back to this colony. I think our things have 402 00:34:18,100 --> 00:34:22,000 probably migrated around a little bit here. Let's move this up just a 403 00:34:22,100 --> 00:34:27,000 little bit, and make it a little better. 404 00:34:27,100 --> 00:34:32,000 So this one is this one. 
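The logic of the screen (probe sticks only where there is an exact match, and only that colony lights up on the film) can be mimicked in a few lines of Python. This is a toy sketch: the colony names and insert sequences are invented, and plain string matching stands in for hybridization under stringent conditions. Both strands of each insert are checked because the denatured DNA on the membrane includes both.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def positive_colonies(library, probes):
    """Names of colonies whose insert contains an exact match to any
    probe in the mixture, on either strand."""
    hits = []
    for name, insert in library.items():
        strands = (insert, reverse_complement(insert))
        if any(probe in strand for probe in probes for strand in strands):
            hits.append(name)
    return hits


# Hypothetical library: colony name -> insert sequence (one strand shown).
library = {
    "colony_1": "TTGACCGTAGGCTAACGT",
    "colony_2": "CCGCAGATAAAACAAATGG",
    "colony_3": "ATTTGTTTTATCTGCAA",
}
# Two members of an assumed mixed probe.
probes = ["GCAGATAAAACAAAT", "GCTGATAAAACAAAT"]

print(positive_colonies(library, probes))  # ['colony_2', 'colony_3']
```

Here colony_3 carries the same sequence in the opposite orientation, which is why it also scores as a hit.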
So, now I can go back to this 405 00:34:32,100 --> 00:34:37,000 colony and pick it out. And, let's say it's this insert. 406 00:34:37,100 --> 00:34:43,000 So, now that I've found it, I can sequence the rest of that piece of DNA. 407 00:34:43,100 --> 00:34:48,000 We'll talk about how we sequence DNA in the next lecture. 408 00:34:48,100 --> 00:34:54,000 So that's an alternative way of identifying a clone of interest. 409 00:34:54,100 --> 00:34:59,000 There was a particularly painful way of finding a gene in a library 410 00:34:59,100 --> 00:35:05,000 that we for the most part do not have to do any more. 411 00:35:05,100 --> 00:35:18,000 It was called positional cloning. And for example, take the gene that, when 412 00:35:18,100 --> 00:35:32,000 it's broken, causes cystic fibrosis. It's a very difficult disease. 413 00:35:32,100 --> 00:35:37,000 Humans who have cystic fibrosis have a very tough time. 414 00:35:37,100 --> 00:35:42,000 So there was a great deal of interest in finding the gene that was broken 415 00:35:42,100 --> 00:35:47,000 in these patients. Human geneticists, 416 00:35:47,100 --> 00:35:53,000 I showed you something about pedigrees, they would have a 417 00:35:53,100 --> 00:35:58,000 chromosome. They might have banding patterns, and they would have 418 00:35:58,100 --> 00:36:03,000 figured out that the gene for cystic 419 00:36:03,100 --> 00:36:09,000 fibrosis lay somewhere along the chromosome between two genetic markers that they 420 00:36:09,100 --> 00:36:13,000 had identified. Now, the amount of DNA between 421 00:36:13,100 --> 00:36:17,000 this marker and that marker, when all you knew was that 422 00:36:17,100 --> 00:36:20,000 the gene lay somewhere in between, could be huge. 423 00:36:20,100 --> 00:36:24,000 It could be many, many, many times the size of the E. 424 00:36:24,100 --> 00:36:27,000 coli chromosome. 
So, what people would do is they'd clone something 425 00:36:27,100 --> 00:36:31,000 from here, a little piece of DNA from there, and also clone something 426 00:36:31,100 --> 00:36:35,000 and get a little piece of DNA from there. 427 00:36:35,100 --> 00:36:39,000 And then they'd go into the library, and they'd try and find something that 428 00:36:39,100 --> 00:36:44,000 had this DNA and extended in this direction a 429 00:36:44,100 --> 00:36:49,000 little bit. So the sort of thing is, you'd have marker A, and somewhere in the 430 00:36:49,100 --> 00:36:53,000 middle was the cystic fibrosis gene, but you didn't know exactly where. 431 00:36:53,100 --> 00:36:58,000 You'd clone a little piece of DNA and use that to find another one 432 00:36:58,100 --> 00:37:03,000 that overlapped with it. And then you'd 433 00:37:03,100 --> 00:37:07,000 use that to find another piece of DNA. You'd walk your way over this 434 00:37:07,100 --> 00:37:11,000 way, and you'd start the same process at the other end. 435 00:37:11,100 --> 00:37:15,000 And, every one of these things was the same kind of operation that 436 00:37:15,100 --> 00:37:19,000 we've got here, so cycles, and cycles, 437 00:37:19,100 --> 00:37:23,000 and cycles of acquiring the next adjacent piece of DNA, 438 00:37:23,100 --> 00:37:27,000 and working your way along here. And you had to use more than one 439 00:37:27,100 --> 00:37:31,000 restriction enzyme, otherwise you wouldn't be able to 440 00:37:31,100 --> 00:37:35,000 get these overlaps. And by doing that, 441 00:37:35,100 --> 00:37:39,000 eventually they were able to get all the DNA that was between these 442 00:37:39,100 --> 00:37:43,000 markers. They knew the cystic fibrosis gene was there from the 443 00:37:43,100 --> 00:37:47,000 maps they had made by studying human pedigrees. 
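The walking procedure can be sketched as a loop that keeps picking the overlapping clone that reaches farthest toward the other marker. In the toy Python below, each clone is given as a pair of made-up chromosome coordinates; in the real experiment those positions were not known in advance, and each overlap had to be found by another round of library screening. The function name `walk` and all the numbers are hypothetical.

```python
def walk(clones, start_marker, end_marker):
    """Greedy walk across overlapping clones from start_marker toward
    end_marker. clones maps a name to (start, end) coordinates.
    Returns the ordered clone names, or None if a gap stops the walk."""
    path = []
    reach = start_marker
    remaining = dict(clones)
    while reach < end_marker:
        # Clones that overlap the current position and extend beyond it.
        candidates = {
            name: span for name, span in remaining.items()
            if span[0] <= reach < span[1]
        }
        if not candidates:
            return None  # no overlapping clone: the walk is stuck
        best = max(candidates, key=lambda name: candidates[name][1])
        path.append(best)
        reach = candidates[best][1]
        del remaining[best]
    return path


# Hypothetical clones with invented coordinates.
clones = {
    "c1": (0, 40),
    "c2": (30, 75),
    "c3": (35, 60),
    "c4": (70, 120),
    "c5": (200, 240),
}
print(walk(clones, start_marker=10, end_marker=110))  # ['c1', 'c2', 'c4']
print(walk(clones, start_marker=10, end_marker=230))  # None
```

Each pass through the loop corresponds to one of the "cycles and cycles" of screening, which is why the process took so long when every step was a wet-lab experiment.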
And then, 444 00:37:47,100 --> 00:37:51,000 once you knew the sequence, then you'd take candidate genes, 445 00:37:51,100 --> 00:37:55,000 and you'd take a bunch of cystic fibrosis patients, 446 00:37:55,100 --> 00:37:59,000 and you'd start to see if every person who had cystic fibrosis had a 447 00:37:59,100 --> 00:38:04,000 mutation in that gene. And, eventually they got it. 448 00:38:04,100 --> 00:38:10,000 That process in the case of cystic fibrosis took five years 449 00:38:10,100 --> 00:38:16,000 with a huge team of people. And, I guess it was from 1985-1990. 450 00:38:16,100 --> 00:38:21,000 Now, if you wanted to do that experiment today, we come to one of the 451 00:38:21,100 --> 00:38:27,000 most widely used ways of finding a gene of interest, 452 00:38:27,100 --> 00:38:31,000 and that is you go to the computer. The whole human genome sequence is 453 00:38:31,100 --> 00:38:35,000 in there. If we were to do that experiment today, 454 00:38:35,100 --> 00:38:39,000 we'd say, well, I know what this gene is, so you look that gene up in 455 00:38:39,100 --> 00:38:43,000 the database. And you know what this gene is, so you look that up in the 456 00:38:43,100 --> 00:38:46,000 database. Then you just look at all the DNA that's in the middle. 457 00:38:46,100 --> 00:38:50,000 And you'd see a whole series of open reading frames. 458 00:38:50,100 --> 00:38:54,000 And you'd probably say, well, what do I know about the 459 00:38:54,100 --> 00:38:58,000 biology of cystic fibrosis? Could I make a guess? Is it a 460 00:38:58,100 --> 00:39:02,000 membrane protein? Is it not a membrane protein? 461 00:39:02,100 --> 00:39:05,000 And, there are certain characteristics that would probably 462 00:39:05,100 --> 00:39:09,000 allow you to make a guess. And then, you could jump right in 463 00:39:09,100 --> 00:39:13,000 and start sequencing DNA. You could start the experiment 464 00:39:13,100 --> 00:39:16,000 practically that afternoon instead of five years later. 
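As a rough illustration of what "seeing a series of open reading frames" means computationally, here is a minimal Python sketch that scans one strand in all three frames for a start codon followed, in frame, by a stop codon. It is a simplification: it ignores the opposite strand, and human genes are interrupted by introns, so real gene finding needs more than this. The function name `find_orfs`, the `min_codons` cutoff, and the test sequence are made up.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG...stop run on this strand.
    end is exclusive and includes the stop codon; min_codons counts the
    codons before the stop, including the ATG."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame at the first ATG
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close it and look for the next ATG
    return orfs


seq = "CCATGGCTGATAAATAGGG"
print(find_orfs(seq))  # [(2, 17, 2)]
```

In this toy sequence the one reading frame found is ATG GCT GAT AAA followed by the stop codon TAG.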
465 00:39:16,100 --> 00:39:20,000 So, if you look back in the literature, you'll find some of the 466 00:39:20,100 --> 00:39:24,000 key genes, in fact, in human biology were isolated by 467 00:39:24,100 --> 00:39:27,000 this very painful process of positional cloning, 468 00:39:27,100 --> 00:39:31,000 and you hardly ever have to do that now. There may be the odd case where 469 00:39:31,100 --> 00:39:35,000 it's needed, but for most of the stuff now, 470 00:39:35,100 --> 00:39:39,000 there are these amazing databases, and I'll give you the URL for it at 471 00:39:39,100 --> 00:39:43,000 the beginning of next lecture. And I'll show you, 472 00:39:43,100 --> 00:39:48,000 this is an experiment in which someone took this gene for cystic 473 00:39:48,100 --> 00:39:53,000 fibrosis. It encodes a membrane protein, and it's one of those proteins that 474 00:39:53,100 --> 00:39:57,000 mediates the passage of chloride ions across the membrane. 475 00:39:57,100 --> 00:40:02,000 And, if that gets broken, you end up with cystic fibrosis. 476 00:40:02,100 --> 00:40:07,000 What someone has done here is they've taken that green fluorescent 477 00:40:07,100 --> 00:40:12,000 protein gene. And, they've fused it to the end of 478 00:40:12,100 --> 00:40:16,000 the cystic fibrosis gene. So, you can tell where the cystic 479 00:40:16,100 --> 00:40:21,000 fibrosis protein is localized in a lung cell by looking to see where the 480 00:40:21,100 --> 00:40:26,000 fluorescence is. And, I think you can see that the 481 00:40:26,100 --> 00:40:31,000 fluorescence is out there along the membrane. 482 00:40:31,100 --> 00:40:35,000 OK, so at the beginning of next lecture, I'll introduce you to how 483 00:40:35,100 --> 00:40:40,000 we take one of these recombinant plasmids, and make what's called a 484 00:40:40,100 --> 00:40:44,000 restriction map. 
It's using a very simple, 485 00:40:44,100 --> 00:40:49,000 little piece of apparatus like that, and we'll go on and tell you about 486 00:40:49,100 --> 00:40:53,000 DNA sequencing, and this PCR technique, 487 00:40:53,100 --> 00:40:56,000 the polymerase chain reaction, that you've heard so much about, OK?