1 00:00:07,340 --> 00:00:11,290 PROFESSOR: So, first step, we need to cut our DNA. 2 00:00:16,340 --> 00:00:26,446 Step one, cut, which is going to be DNA restriction enzymes. 3 00:00:35,980 --> 00:00:39,910 It turns out, quite remarkably, that if I have a 4 00:00:39,910 --> 00:00:54,640 sequence of DNA, five prime A G C T A G A A T T C T T A C C 5 00:00:54,640 --> 00:00:58,000 three prime, and we'll come backwards 6 00:00:58,000 --> 00:00:59,250 filling in the sequence. 7 00:01:09,170 --> 00:01:17,170 It turns out that molecular biologists are so cool that 8 00:01:17,170 --> 00:01:24,100 they invented a protein that is able to recognize that six 9 00:01:24,100 --> 00:01:28,900 letter sequence, G-A-A-T-T-C. 10 00:01:28,900 --> 00:01:33,360 And it is able-- actually it's G-A-A-T-T-C on this strand, 11 00:01:33,360 --> 00:01:35,190 but what is it coming back? 12 00:01:35,190 --> 00:01:39,170 It's G-A-A-T-T-C. It's the same thing. 13 00:01:39,170 --> 00:01:41,300 So actually this is a palindrome. 14 00:01:41,300 --> 00:01:41,970 That's kind of nice. 15 00:01:41,970 --> 00:01:42,990 It's a palindrome-- 16 00:01:42,990 --> 00:01:45,220 it's a reverse palindrome. 17 00:01:45,220 --> 00:01:50,530 It's the same word spelled backwards on the other strand. 18 00:01:50,530 --> 00:01:56,530 So what it does is it cuts it like that. 19 00:01:56,530 --> 00:02:04,340 And what it produces is a DNA molecule like this and a DNA 20 00:02:04,340 --> 00:02:08,889 molecule like that, that's mostly double stranded, but 21 00:02:08,889 --> 00:02:11,500 has a four base pair overhang. 22 00:02:11,500 --> 00:02:14,640 The overhang reads T-T-A-A here. 23 00:02:14,640 --> 00:02:17,610 It reads A-A-T-T there. 24 00:02:17,610 --> 00:02:21,760 And remember this is five prime to three prime, five 25 00:02:21,760 --> 00:02:24,720 prime to three prime. 26 00:02:24,720 --> 00:02:26,910 And there you go. 27 00:02:26,910 --> 00:02:31,030 This guy has its little phosphate at the end there. 
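The palindrome property and the cut described above can be checked with a short sketch. It is only an illustration under stated assumptions: the function names are hypothetical, and the cut position (one base into the site, between the G and the first A) is the standard published position for this enzyme, chosen here because it gives the four-base A-A-T-T overhang the professor draws.

```python
# Illustrative sketch; function names are hypothetical, not from the lecture.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def cut_top_strand(seq, site="GAATTC", offset=1):
    """Split the top strand at every site, cutting `offset` bases into the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


lecture_seq = "AGCTAGAATTCTTACC"  # the sequence written on the board
print(reverse_complement("GAATTC"))  # GAATTC, the same word on the other strand
print(cut_top_strand(lecture_seq))   # ['AGCTAG', 'AATTCTTACC']
```

The second fragment starts with A-A-T-T, which is the single-stranded overhang left on that piece.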
28 00:02:31,030 --> 00:02:35,156 This guy has his little hydroxyl over there. 29 00:02:35,156 --> 00:02:36,406 And it cuts it. 30 00:02:38,770 --> 00:02:41,340 Now that is an incredible piece of engineering. 31 00:02:41,340 --> 00:02:45,260 To come up with a protein, to devise a protein, that is able 32 00:02:45,260 --> 00:02:48,850 to recognize those six bases and cut at those six bases. 33 00:02:52,160 --> 00:02:53,970 And cut in just this way making a really 34 00:02:53,970 --> 00:02:57,350 clean overhang here. 35 00:02:57,350 --> 00:03:00,850 It's this cool five prime overhang. 36 00:03:00,850 --> 00:03:04,080 Who do you think invented this cool protein? 37 00:03:04,080 --> 00:03:06,600 What engineer came up with this cool protein? 38 00:03:06,600 --> 00:03:08,320 AUDIENCE: MIT engineers. 39 00:03:08,320 --> 00:03:11,600 PROFESSOR: MIT engineers, yeah. 40 00:03:11,600 --> 00:03:14,270 Not a chance. 41 00:03:14,270 --> 00:03:15,960 Not a chance. 42 00:03:15,960 --> 00:03:19,010 This is a really tough feat. 43 00:03:19,010 --> 00:03:21,610 This is something that can only be done by the smartest 44 00:03:21,610 --> 00:03:22,940 engineers on the planet. 45 00:03:22,940 --> 00:03:25,550 And MIT engineers are unfortunately only the 46 00:03:25,550 --> 00:03:27,925 smartest human engineers on the planet. 47 00:03:32,910 --> 00:03:35,196 Who came up with this is E. coli. 48 00:03:35,196 --> 00:03:36,940 AUDIENCE: So you found it somewhere in nature? 49 00:03:36,940 --> 00:03:37,490 PROFESSOR: Sorry? 50 00:03:37,490 --> 00:03:39,020 AUDIENCE: You found it somewhere in nature? 51 00:03:39,020 --> 00:03:40,950 PROFESSOR: Of course you find it somewhere in nature. 52 00:03:40,950 --> 00:03:43,860 Almost everything important that we say molecular 53 00:03:43,860 --> 00:03:47,640 biologists have come up with, it means molecular biologists sat at the 54 00:03:47,640 --> 00:03:52,120 feet of the true masters, bacteria, and learned from the 55 00:03:52,120 --> 00:03:54,310 true masters. 
56 00:03:54,310 --> 00:03:57,080 This protein is found in nature. 57 00:03:57,080 --> 00:03:58,330 And it's found in E. coli. 58 00:04:01,810 --> 00:04:07,090 In fact, it's found in E. coli strain R. And it was the first 59 00:04:07,090 --> 00:04:10,600 such protein found in E. coli strain R, so it 60 00:04:10,600 --> 00:04:12,290 gets the name EcoR1. 61 00:04:16,269 --> 00:04:19,269 And it cuts the DNA like this. 62 00:04:19,269 --> 00:04:21,140 Pretty cool. 63 00:04:21,140 --> 00:04:23,060 Pretty cool. 64 00:04:23,060 --> 00:04:33,820 Now, it turns out that E. coli has this EcoR1. 65 00:04:33,820 --> 00:04:35,860 How often does E. coli-- 66 00:04:35,860 --> 00:04:38,875 so whenever I take EcoR1, this protein, purified from E. 67 00:04:38,875 --> 00:04:43,520 coli, and I add it to DNA it always cuts at this site, 68 00:04:43,520 --> 00:04:46,970 which we call an EcoR1 site. 69 00:04:46,970 --> 00:04:48,760 How frequently do we expect, what's the 70 00:04:48,760 --> 00:04:51,120 frequency of EcoR1 sites? 71 00:04:54,470 --> 00:04:59,430 G-A-A-T-T-C, how often will that occur at random? 72 00:05:02,510 --> 00:05:03,243 One in-- 73 00:05:03,243 --> 00:05:04,030 AUDIENCE: Two to the sixth? 74 00:05:04,030 --> 00:05:05,520 PROFESSOR: One in two to the sixth? 75 00:05:05,520 --> 00:05:07,280 How many letters do I have? 76 00:05:07,280 --> 00:05:08,790 AUDIENCE: Four, oh, four to the sixth. 77 00:05:08,790 --> 00:05:10,090 PROFESSOR: One in four to the sixth. 78 00:05:10,090 --> 00:05:13,850 My frequency should be about one in four to the sixth, 79 00:05:13,850 --> 00:05:15,380 which is about what? 80 00:05:15,380 --> 00:05:17,180 What's four to the sixth? 81 00:05:17,180 --> 00:05:18,100 It's two to the 12th. 82 00:05:18,100 --> 00:05:19,981 It's about 4,000. 83 00:05:19,981 --> 00:05:21,830 It's about one in 4,000 letters. 84 00:05:21,830 --> 00:05:25,028 One in 4,000 bases. 85 00:05:25,028 --> 00:05:26,070 So it's very convenient. 
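The arithmetic in this exchange can be written out as one line. It assumes, as the professor's "at random" implies, that each of the four bases is equally likely at every position and that positions are independent; real genomes only approximate this.

```latex
P(\text{G-A-A-T-T-C at a given position})
  = \left(\tfrac{1}{4}\right)^{6}
  = \frac{1}{4^{6}}
  = \frac{1}{2^{12}}
  = \frac{1}{4096}
  \approx \frac{1}{4000}
```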
86 00:05:26,070 --> 00:05:28,780 One in every 4,000 bases it'll roughly cut. 87 00:05:28,780 --> 00:05:35,000 It'll cut at roughly one in 4,000 bases. 88 00:05:35,000 --> 00:05:37,290 Why doesn't E. coli cut its own DNA? 89 00:05:43,370 --> 00:05:45,020 If it's got this protein floating around in its cell, 90 00:05:45,020 --> 00:05:48,610 why isn't it chopping up its own DNA? 91 00:05:48,610 --> 00:05:51,370 Doesn't have G-A-A-T-T-C? 92 00:05:51,370 --> 00:05:53,480 Yeah, the problem is it's so frequent. 93 00:05:53,480 --> 00:05:54,860 That'd be really hard to make sure-- 94 00:05:54,860 --> 00:05:57,680 I mean, E. coli has 4 million letters in its genome. 95 00:05:57,680 --> 00:05:59,500 This should cut every 4,000 bases. 96 00:05:59,500 --> 00:06:02,430 You expect about 1,000 such sequences. 97 00:06:02,430 --> 00:06:05,450 It might be hard to arrange not to cut-- 98 00:06:05,450 --> 00:06:07,600 not to have any such sequences. 99 00:06:07,600 --> 00:06:11,125 It's a good idea, is not to have any, but an alternative-- 100 00:06:11,125 --> 00:06:12,610 AUDIENCE: [INAUDIBLE]. 101 00:06:12,610 --> 00:06:13,940 PROFESSOR: It protects them. 102 00:06:13,940 --> 00:06:17,830 It turns out E. coli, instead of avoiding the sequence 103 00:06:17,830 --> 00:06:20,690 altogether, has another trick up its sleeve. 104 00:06:20,690 --> 00:06:25,330 E. coli protects this sequence whenever it occurs. 105 00:06:25,330 --> 00:06:29,215 So it turns out that whenever you have a stretch of the E. 106 00:06:29,215 --> 00:06:38,325 coli genome that has this G-A-A-T-T-C in it, what E. 107 00:06:38,325 --> 00:06:41,020 coli does is it puts-- 108 00:06:41,020 --> 00:06:43,800 I'm just writing M-E here for a methyl group. 109 00:06:43,800 --> 00:06:45,400 Right, C-H three up there. 110 00:06:45,400 --> 00:06:46,590 It puts a methyl group-- 111 00:06:46,590 --> 00:06:48,010 I'll write C-H three. 112 00:06:48,010 --> 00:06:49,510 There we go. 
113 00:06:49,510 --> 00:06:54,620 It puts a methyl group on the A, that middle A. 114 00:06:54,620 --> 00:06:59,900 Well, that is a cute trick that E. coli uses, putting a 115 00:06:59,900 --> 00:07:01,150 methyl group there. 116 00:07:03,620 --> 00:07:06,630 Because what happens is, when there's a methyl group, right 117 00:07:06,630 --> 00:07:11,930 at that position, the enzyme no longer recognizes and no 118 00:07:11,930 --> 00:07:13,180 longer cuts there. 119 00:07:15,970 --> 00:07:18,770 So that's kind of clever. 120 00:07:18,770 --> 00:07:21,555 E. coli makes this protein that can recognize 121 00:07:21,555 --> 00:07:25,660 G-A-A-T-T-C, but it has a second protein that puts 122 00:07:25,660 --> 00:07:27,200 methyl groups there. 123 00:07:27,200 --> 00:07:30,830 And this protein happens by accident 124 00:07:30,830 --> 00:07:32,540 to be called a methylase. 125 00:07:36,720 --> 00:07:38,030 It has a methylase. 126 00:07:38,030 --> 00:07:40,930 And the methylase protects that sequence. 127 00:07:40,930 --> 00:07:44,180 So now, this is really cool engineering, but kind of dumb. 128 00:07:44,180 --> 00:07:45,230 What's it doing there? 129 00:07:45,230 --> 00:07:47,230 It has something that cuts the sequence and it protects the 130 00:07:47,230 --> 00:07:50,550 sequence, why bother having this? 131 00:07:50,550 --> 00:07:52,392 Yeah? 132 00:07:52,392 --> 00:07:57,232 AUDIENCE: You can use it to cut it at places to unwrap the 133 00:07:57,232 --> 00:07:58,200 two strands. 134 00:07:58,200 --> 00:07:59,410 PROFESSOR: That's an interesting idea. 135 00:07:59,410 --> 00:08:02,140 We could use it to cut our DNA and open it up to unravel our 136 00:08:02,140 --> 00:08:03,010 two strands. 137 00:08:03,010 --> 00:08:03,820 It's a thought. 138 00:08:03,820 --> 00:08:04,710 Yes? 139 00:08:04,710 --> 00:08:08,150 AUDIENCE: To protect the bacteria from viruses? 140 00:08:08,150 --> 00:08:11,030 PROFESSOR: Protect the bacteria from viruses. 
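The interplay of the cutting enzyme and the methylase described above can be modeled in a few lines. This is a toy sketch, not a biochemical model: the function names are hypothetical, the "foreign" sequence is made up for illustration, and the methyl mark is simply stored as a set of positions on the second A of each G-A-A-T-T-C (the "middle A" in the professor's words).

```python
# Toy model; names and the foreign sequence are hypothetical.
SITE = "GAATTC"
METHYL_INDEX = 2  # assumed: the second A of GAATTC carries the methyl group


def find_sites(seq):
    """Return the start position of every GAATTC in seq."""
    sites = []
    pos = seq.find(SITE)
    while pos != -1:
        sites.append(pos)
        pos = seq.find(SITE, pos + 1)
    return sites


def methylate(seq):
    """Positions a methylase would mark: the middle A of every site."""
    return {pos + METHYL_INDEX for pos in find_sites(seq)}


def cuttable_sites(seq, methylated):
    """Sites the cutting enzyme still recognizes (no methyl on the middle A)."""
    return [pos for pos in find_sites(seq)
            if pos + METHYL_INDEX not in methylated]


own_dna = "AGCTAGAATTCTTACC"        # sequence from the board
foreign_dna = "TTGAATTCAAGAATTCGG"  # made-up unmethylated DNA

print(cuttable_sites(own_dna, methylate(own_dna)))  # []
print(cuttable_sites(foreign_dna, set()))           # [2, 10]
```

DNA that has been through the methylase has no cuttable sites left, while DNA that never met the methylase is cut at every site.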
141 00:08:11,030 --> 00:08:12,485 How do you protect yourself from viruses? 142 00:08:15,450 --> 00:08:17,540 Well, you have an immune system with immune cells and 143 00:08:17,540 --> 00:08:18,570 antibodies and all that. 144 00:08:18,570 --> 00:08:20,930 Does E. coli have an immune system? 145 00:08:20,930 --> 00:08:22,215 Why doesn't it have immune cells? 146 00:08:24,840 --> 00:08:26,010 Because it's like one cell. 147 00:08:26,010 --> 00:08:28,710 How's it going to have an immune system, right? 148 00:08:28,710 --> 00:08:30,320 So suppose E. coli gets a cold. 149 00:08:30,320 --> 00:08:32,700 Suppose it gets infected by a virus. 150 00:08:32,700 --> 00:08:35,610 How's it going to protect itself? 151 00:08:35,610 --> 00:08:39,830 Cut at a frequently occurring DNA sequence. 152 00:08:39,830 --> 00:08:45,720 Now the virus, of course, isn't methylated there, bingo. 153 00:08:45,720 --> 00:08:47,100 That's how it tells its own-- 154 00:08:47,100 --> 00:08:49,080 you can tell self from an invader. 155 00:08:52,710 --> 00:08:55,520 E. coli can tell self from an invader because it's 156 00:08:55,520 --> 00:08:59,590 methylated its own G-A-A-T-T-C sites, but the virus isn't 157 00:08:59,590 --> 00:09:01,130 methylated there. 158 00:09:01,130 --> 00:09:02,980 Way cool. 159 00:09:02,980 --> 00:09:04,880 This is an immune system for E. coli. 160 00:09:08,620 --> 00:09:10,440 Now, it turns out-- so this is protection. 161 00:09:10,440 --> 00:09:21,460 These restriction enzymes protect E. coli from viruses. 162 00:09:30,170 --> 00:09:32,750 It turns out that E. coli is not alone 163 00:09:32,750 --> 00:09:34,580 in this clever trick. 164 00:09:34,580 --> 00:09:38,770 It turns out that other bacteria have also thought of 165 00:09:38,770 --> 00:09:40,270 this trick. 166 00:09:40,270 --> 00:09:43,870 So it turns out that there is another restriction enzyme 167 00:09:43,870 --> 00:09:54,040 that cuts at G-G-A-T-C-C. 
And on the other strand it goes 168 00:09:54,040 --> 00:10:01,750 G-G-A-T-C-C. It, again, cuts in that distinctive pattern. 169 00:10:01,750 --> 00:10:04,005 And it's called BamH1. 170 00:10:08,971 --> 00:10:10,650 And there's another guy. 171 00:10:10,650 --> 00:10:20,060 And he cuts at A-A-G-C-T-T, A-A-G-C-T-T. And it 172 00:10:20,060 --> 00:10:22,040 also cuts like that. 173 00:10:22,040 --> 00:10:24,820 And it's called HindIII. 174 00:10:28,004 --> 00:10:32,960 And there's some that cut at G-A-T-C, just 175 00:10:32,960 --> 00:10:34,390 the four letter word. 176 00:10:36,950 --> 00:10:39,170 And they cut like that. 177 00:10:39,170 --> 00:10:54,790 And there's some that cut at C-A-G-C-T-G, C-A-G-C-T-G, and 178 00:10:54,790 --> 00:10:58,380 this cuts smack in the middle. 179 00:10:58,380 --> 00:11:02,690 In other words, there's a wide number of different tricks. 180 00:11:02,690 --> 00:11:04,370 Some cut at six bases. 181 00:11:04,370 --> 00:11:05,740 Some cut at four bases. 182 00:11:05,740 --> 00:11:07,590 Some cut at eight bases. 183 00:11:07,590 --> 00:11:09,710 Some cut leaving an overhang. 184 00:11:09,710 --> 00:11:11,640 Some cut smack in the middle. 185 00:11:11,640 --> 00:11:15,670 Some cut leaving the overhang in the other direction. 186 00:11:15,670 --> 00:11:18,240 Some allow a degenerate base in the middle. 187 00:11:18,240 --> 00:11:21,350 It doesn't care which base is in the middle. 188 00:11:21,350 --> 00:11:24,110 There's a zillion different solutions that bacteria have 189 00:11:24,110 --> 00:11:26,390 come up with for their immune system. 190 00:11:26,390 --> 00:11:30,390 And so, if I want to cut up some human DNA all I need is 191 00:11:30,390 --> 00:11:35,420 say, this protein EcoR1 or BamH1 or HindIII or MluI or 192 00:11:35,420 --> 00:11:37,695 PvuII or et cetera. 193 00:11:37,695 --> 00:11:40,710 And I can do that by growing up E. coli and 194 00:11:40,710 --> 00:11:41,960 purifying the protein. 
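The six-letter sites named above can be compared side by side in a small sketch. Treat it as an illustration only: the dictionary layout and function names are hypothetical, the cut offsets are the standard published positions for these recognition sequences rather than anything stated in the lecture, and the expected spacing assumes the same random, equal-frequency model used earlier for one in four to the sixth.

```python
# Illustrative sketch; names and layout are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Recognition site -> cut offset on the top strand,
# counted in bases from the 5' end of the site (assumed standard positions).
CUT_OFFSETS = {
    "GAATTC": 1,
    "GGATCC": 1,
    "AAGCTT": 1,
    "CAGCTG": 3,
}


def is_reverse_palindrome(site):
    """True if the site reads the same on the other strand, 5' to 3'."""
    other_strand = "".join(COMPLEMENT[b] for b in reversed(site))
    return other_strand == site


def describe_cut(site, offset):
    """Return (end type, overhang length, expected spacing between sites)."""
    # On a palindromic site the bottom strand is cut at the mirror position,
    # so the single-stranded stretch spans len(site) - 2 * offset bases.
    overhang = len(site) - 2 * offset
    if overhang > 0:
        kind = "5' overhang"
    elif overhang < 0:
        kind = "3' overhang"
    else:
        kind = "blunt"
    expected_spacing = 4 ** len(site)
    return kind, abs(overhang), expected_spacing


for site, offset in CUT_OFFSETS.items():
    print(site, is_reverse_palindrome(site), describe_cut(site, offset))
```

The first three sites give a four-base five prime overhang, while C-A-G-C-T-G, cut in the middle, leaves no overhang at all.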
195 00:11:44,810 --> 00:11:48,010 And if I wanted HindIII, I would grow up Haemophilus 196 00:11:48,010 --> 00:11:51,060 influenzae and purify the protein. 197 00:11:51,060 --> 00:11:53,890 So in a molecular biology lab, today, if you want to cut up 198 00:11:53,890 --> 00:11:58,000 human DNA, you could grow up some E. coli and purify EcoR1 199 00:11:58,000 --> 00:11:59,250 or Haemophilus influenzae. 200 00:12:01,260 --> 00:12:04,710 And that is indeed what ancient molecular biologists 201 00:12:04,710 --> 00:12:09,360 did in prehistoric days in the 1970s and 1980s. 202 00:12:09,360 --> 00:12:13,000 They would purify their own restriction enzymes. 203 00:12:13,000 --> 00:12:14,090 They're still alive today. 204 00:12:14,090 --> 00:12:15,400 You can talk to them. 205 00:12:15,400 --> 00:12:17,140 There are many of them on the faculty. 206 00:12:17,140 --> 00:12:20,010 And they'll tell you how it put hair on their chest to be 207 00:12:20,010 --> 00:12:22,265 able to purify their own restriction enzymes. 208 00:12:27,610 --> 00:12:30,150 What do you do today? 209 00:12:30,150 --> 00:12:33,275 Order it online from the catalog, right? 210 00:12:33,275 --> 00:12:36,065 You know, there's the catalog, the New 211 00:12:36,065 --> 00:12:38,480 England Biolabs catalog. 212 00:12:38,480 --> 00:12:39,730 Let's see what we got here. 213 00:12:43,800 --> 00:12:53,470 Restriction enzymes, modifying enzymes, polymerases, all 214 00:12:53,470 --> 00:12:59,150 right, EcoR1, sale on EcoR1 right now. 215 00:12:59,150 --> 00:13:04,070 $100 buys you 10,000 units of EcoR1. 216 00:13:04,070 --> 00:13:05,030 It's in the catalog. 217 00:13:05,030 --> 00:13:05,780 You can go online. 218 00:13:05,780 --> 00:13:06,270 You can order it. 219 00:13:06,270 --> 00:13:08,390 You can have it tomorrow by FedEx. 220 00:13:08,390 --> 00:13:10,480 So, but that's how it works. 221 00:13:10,480 --> 00:13:11,810 It's in the catalog. 
222 00:13:11,810 --> 00:13:15,930 So you can get any restriction enzyme you want to cut DNA 223 00:13:15,930 --> 00:13:17,180 anywhere you want to.