1 00:00:00,040 --> 00:00:02,460 The following content is provided under a Creative 2 00:00:02,460 --> 00:00:03,870 Commons license. 3 00:00:03,870 --> 00:00:06,910 Your support will help MIT OpenCourseWare continue to 4 00:00:06,910 --> 00:00:10,560 offer high quality educational resources for free. 5 00:00:10,560 --> 00:00:13,460 To make a donation or view additional materials from 6 00:00:13,460 --> 00:00:17,390 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:17,390 --> 00:00:18,640 ocw.mit.edu. 8 00:00:21,960 --> 00:00:22,445 PROFESSOR: Good morning. 9 00:00:22,445 --> 00:00:24,390 AUDIENCE: Good morning. 10 00:00:24,390 --> 00:00:25,000 PROFESSOR: All right. 11 00:00:25,000 --> 00:00:28,650 So today-- 12 00:00:28,650 --> 00:00:29,860 I haven't seen you in a while. 13 00:00:29,860 --> 00:00:36,930 Anyway, today, we're going to turn back to our picture, 14 00:00:36,930 --> 00:00:39,270 function gene protein. 15 00:00:39,270 --> 00:00:41,530 We filled in genetics. 16 00:00:41,530 --> 00:00:43,390 We filled in biochemistry. 17 00:00:43,390 --> 00:00:46,540 We've now got the connection between gene and protein 18 00:00:46,540 --> 00:00:48,520 through molecular biology. 19 00:00:48,520 --> 00:00:51,450 We know gene encodes protein. 20 00:00:51,450 --> 00:00:53,730 We know central dogma, DNA goes to the 21 00:00:53,730 --> 00:00:55,560 RNA goes to the protein. 22 00:00:55,560 --> 00:00:57,100 We know all that in theory. 23 00:00:57,100 --> 00:01:03,820 In fact, people knew this by the middle of the 1960s. 24 00:01:03,820 --> 00:01:07,720 People were so excited that they understood the idea of 25 00:01:07,720 --> 00:01:12,130 how genes give rise to proteins through transcription 26 00:01:12,130 --> 00:01:17,890 and translation, they read the genetic code, that they 27 00:01:17,890 --> 00:01:19,620 declared victory. 28 00:01:19,620 --> 00:01:22,690 Some of them said, done with the secret of life. 
29 00:01:22,690 --> 00:01:24,560 Let's go on and do the brain. 30 00:01:24,560 --> 00:01:26,990 That was actually the thinking of a lot of the great 31 00:01:26,990 --> 00:01:29,380 molecular biologists in the late 1960s. 32 00:01:29,380 --> 00:01:30,740 Let's go do the brain. 33 00:01:30,740 --> 00:01:32,780 Why did they say such a thing? 34 00:01:32,780 --> 00:01:34,100 Well, because they thought they were 35 00:01:34,100 --> 00:01:36,170 done with the problem. 36 00:01:36,170 --> 00:01:39,440 They thought that once you knew in principle how a gene 37 00:01:39,440 --> 00:01:41,980 gave rise to a protein, you could do it. 38 00:01:41,980 --> 00:01:48,220 But in practice, nobody could read a single gene. 39 00:01:48,220 --> 00:01:51,960 Nobody could even identify a single gene. 40 00:01:51,960 --> 00:01:53,980 Maybe that's why they went on to say, let's go study the 41 00:01:53,980 --> 00:01:55,620 brain, because they actually weren't sure what they could 42 00:01:55,620 --> 00:01:58,600 do past that point. 43 00:01:58,600 --> 00:01:59,170 All right. 44 00:01:59,170 --> 00:02:01,430 So wait a second, wait a second. 45 00:02:01,430 --> 00:02:04,440 I said nobody could even purify a single gene. 46 00:02:04,440 --> 00:02:08,661 Didn't we talk about purifying the genetic material? 47 00:02:08,661 --> 00:02:10,910 You're supposed to say yes at that point. 48 00:02:10,910 --> 00:02:11,580 Yes, right? 49 00:02:11,580 --> 00:02:12,630 We talked about that. 50 00:02:12,630 --> 00:02:16,590 Avery, McCarty, MacLeod-- we purified the genetic material. 51 00:02:16,590 --> 00:02:18,876 We did it by using this assay of transforming. 52 00:02:21,480 --> 00:02:24,220 So what do I mean by we can't purify a single gene? 53 00:02:24,220 --> 00:02:28,610 What I mean is that we can purify the hereditary material 54 00:02:28,610 --> 00:02:32,670 away from everything else, but we get all of it together. 
55 00:02:32,670 --> 00:02:36,800 We don't get individual genes separated from each other. 56 00:02:36,800 --> 00:02:39,250 We get the whole mixture of all the genes, 57 00:02:39,250 --> 00:02:41,550 all the genetic material. 58 00:02:41,550 --> 00:02:44,910 As a biochemist, how are we going to ever separate the 59 00:02:44,910 --> 00:02:46,240 gene encoding-- 60 00:02:46,240 --> 00:02:49,520 oh, I don't know, ARG1, our favorite gene for arginine 61 00:02:49,520 --> 00:02:50,580 biosynthesis-- 62 00:02:50,580 --> 00:02:54,670 from the gene encoding ARG2, for example? 63 00:02:54,670 --> 00:02:56,500 What kind of biochemistry can we do to 64 00:02:56,500 --> 00:02:57,880 separate these two genes? 65 00:03:00,580 --> 00:03:05,660 Do they have different biochemical properties? 66 00:03:05,660 --> 00:03:07,350 What's so different about ARG1 and ARG2? 67 00:03:07,350 --> 00:03:09,504 What are their different biochemical properties? 68 00:03:09,504 --> 00:03:11,790 They're both DNA. 69 00:03:11,790 --> 00:03:14,100 They have exactly the same building blocks, in a slightly 70 00:03:14,100 --> 00:03:15,180 different order. 71 00:03:15,180 --> 00:03:17,350 You think you have a purification procedure where I'm 72 00:03:17,350 --> 00:03:19,800 going to run it over some column and separate it by 73 00:03:19,800 --> 00:03:24,040 something that's going to separate ARG1 from ARG2? 74 00:03:24,040 --> 00:03:24,570 No. 75 00:03:24,570 --> 00:03:27,250 From the point of view of a pure biochemist, they look 76 00:03:27,250 --> 00:03:28,800 exactly the same. 77 00:03:28,800 --> 00:03:31,920 All the different genes have the same biochemical 78 00:03:31,920 --> 00:03:33,440 properties.
79 00:03:33,440 --> 00:03:38,320 How in the world would we ever purify ARG1 from ARG2, or in 80 00:03:38,320 --> 00:03:40,830 the human, the gene encoding hemoglobin from the gene 81 00:03:40,830 --> 00:03:44,120 encoding collagen from the gene encoding keratin from the 82 00:03:44,120 --> 00:03:46,130 gene encoding anything else? 83 00:03:46,130 --> 00:03:46,670 Think about it. 84 00:03:46,670 --> 00:03:47,920 That's a tough problem. 85 00:03:50,340 --> 00:03:54,970 There is a brilliant solution that arose in the 1970s to how 86 00:03:54,970 --> 00:03:58,330 we could purify the individual genes away from each other. 87 00:03:58,330 --> 00:04:01,180 But it's like no other piece of biochemistry anybody had 88 00:04:01,180 --> 00:04:02,870 ever seen before. 89 00:04:02,870 --> 00:04:06,080 It has a totally different principle behind it. 90 00:04:06,080 --> 00:04:09,860 Because it isn't just fractionating things according 91 00:04:09,860 --> 00:04:11,550 to their biochemical properties. 92 00:04:11,550 --> 00:04:14,410 It involves something else. 93 00:04:14,410 --> 00:04:16,100 And it's called cloning. 94 00:04:18,740 --> 00:04:19,990 It is called cloning. 95 00:04:22,400 --> 00:04:24,095 Molecular cloning. 96 00:04:24,095 --> 00:04:25,520 You see, the problem is this. 97 00:04:25,520 --> 00:04:28,350 The human genome-- 98 00:04:28,350 --> 00:04:30,760 how big is the human genome? 99 00:04:30,760 --> 00:04:32,630 How many bases? 100 00:04:32,630 --> 00:04:34,630 Three billion bases, three times 10 to the 101 00:04:34,630 --> 00:04:37,440 ninth bases, right? 102 00:04:37,440 --> 00:04:40,150 How big is a typical human gene? 103 00:04:40,150 --> 00:04:46,380 A typical human gene might be 30,000 bases. 104 00:04:46,380 --> 00:04:50,610 How big is a typical mutation that we might want to find in 105 00:04:50,610 --> 00:04:54,340 a typical gene, like causing sickle cell anemia? 106 00:04:54,340 --> 00:04:57,290 One base. 
107 00:04:57,290 --> 00:05:00,580 We've got to purify out genes that are one part in 10 to the 108 00:05:00,580 --> 00:05:03,660 fifth and mutations that are one part in 10 to 109 00:05:03,660 --> 00:05:05,972 the ninth or so. 110 00:05:05,972 --> 00:05:08,680 And how are we going to do that? 111 00:05:08,680 --> 00:05:11,680 Well, the trick is this. 112 00:05:11,680 --> 00:05:14,010 I'll give you the quick overview, and then we'll spend 113 00:05:14,010 --> 00:05:15,180 today looking at it. 114 00:05:15,180 --> 00:05:19,150 Step one is we cut up our DNA. 115 00:05:19,150 --> 00:05:32,790 We cut DNA at defined sites, and we then paste the DNA to 116 00:05:32,790 --> 00:05:36,520 distinct molecules called vectors. 117 00:05:36,520 --> 00:05:40,640 These vectors have a cool property, that when you take a 118 00:05:40,640 --> 00:05:45,360 vector and you insert in it a piece of DNA, that vector is 119 00:05:45,360 --> 00:05:49,880 able to grow in another organism. 120 00:05:49,880 --> 00:05:52,840 You then transform the DNA-- 121 00:05:55,900 --> 00:05:59,970 that's transfer, we use the word transform the DNA-- 122 00:05:59,970 --> 00:06:03,550 into something like E. coli, where you get your little 123 00:06:03,550 --> 00:06:05,490 vector in there. 124 00:06:05,490 --> 00:06:10,110 It grows within E. coli, and as E. coli divides, it makes 125 00:06:10,110 --> 00:06:11,680 copies of itself. 126 00:06:11,680 --> 00:06:23,750 And then you select those bacteria that have received, 127 00:06:23,750 --> 00:06:31,430 that have been transformed, grow them up on a petri plate 128 00:06:31,430 --> 00:06:35,460 so that you have little colonies. 129 00:06:35,460 --> 00:06:40,620 And then you screen the colonies. 130 00:06:44,300 --> 00:06:45,040 Now, what do I mean? 131 00:06:45,040 --> 00:06:45,960 We cut the DNA. 132 00:06:45,960 --> 00:06:47,020 We paste the DNA. 133 00:06:47,020 --> 00:06:48,750 We transform the DNA. 
134 00:06:48,750 --> 00:06:51,200 We select the bacteria that have been successfully 135 00:06:51,200 --> 00:06:52,090 transformed. 136 00:06:52,090 --> 00:06:54,610 And we screen the resulting colonies to find what we're 137 00:06:54,610 --> 00:06:55,510 looking for. 138 00:06:55,510 --> 00:06:56,420 Now, notice-- 139 00:06:56,420 --> 00:06:58,990 the amazing trick here is when we cut up the DNA into 140 00:06:58,990 --> 00:07:01,910 single molecules, lots and lots of single molecules, and 141 00:07:01,910 --> 00:07:06,460 we paste them into vectors, and we transform them into 142 00:07:06,460 --> 00:07:11,880 bacteria, each one of those bacteria gets exactly one 143 00:07:11,880 --> 00:07:14,450 molecule, give or take. 144 00:07:14,450 --> 00:07:17,930 It gets one piece of human DNA. 145 00:07:17,930 --> 00:07:19,810 We then spread them out on a plate and they grow up, and 146 00:07:19,810 --> 00:07:22,840 each one grows up copies for us of 147 00:07:22,840 --> 00:07:25,020 individual pieces of DNA. 148 00:07:25,020 --> 00:07:26,420 That is so cool. 149 00:07:26,420 --> 00:07:28,120 Because we've just accomplished biochemical 150 00:07:28,120 --> 00:07:29,790 purification. 151 00:07:29,790 --> 00:07:31,850 It's not based on any different properties of the 152 00:07:31,850 --> 00:07:33,190 individual molecules. 153 00:07:33,190 --> 00:07:36,060 It's based on the fact that we dilute them, in effect. 154 00:07:36,060 --> 00:07:37,800 They're diluted, and one molecule 155 00:07:37,800 --> 00:07:39,620 ends up in each bacterium. 156 00:07:39,620 --> 00:07:41,440 So they're purified in that sense. 157 00:07:41,440 --> 00:07:43,980 And then when that bacterium grows up, everything it grows 158 00:07:43,980 --> 00:07:47,620 up is a pure copy, a copy of that single piece of DNA that 159 00:07:47,620 --> 00:07:49,020 went into it. 160 00:07:49,020 --> 00:07:53,030 That's a different kind of purification.
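The dilute-then-grow idea described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not the laboratory procedure: the fragment labels, the function names `clone_library` and `grow_colony`, and the rule that every cell takes up exactly one fragment (the lecture says "give or take") are all made up for the example.

```python
import random

def clone_library(fragments, n_cells, seed=0):
    """Dilution step: each bacterium receives one fragment drawn from the mixture.
    Assumption: exactly one fragment per cell."""
    rng = random.Random(seed)
    return [rng.choice(fragments) for _ in range(n_cells)]

def grow_colony(fragment, generations):
    """Growth step: every division copies the insert, giving 2**generations identical copies."""
    return [fragment] * (2 ** generations)

# Hypothetical labels standing in for pieces of the cut-up genome.
fragments = ["ARG1", "ARG2", "ARG10", "other"]

founders = clone_library(fragments, n_cells=6)
colonies = [grow_colony(f, generations=10) for f in founders]

# Each colony holds copies of a single fragment, so it is "pure"
# even though nothing sorted the fragments by chemical properties.
all_pure = all(len(set(colony)) == 1 for colony in colonies)
print(all_pure, len(colonies[0]))
```

Note that the sketch does not know which colony holds which fragment until it looks, which mirrors why a separate screening step is still needed.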
161 00:07:53,030 --> 00:07:54,840 When I'm done-- and we'll go through this whole process. 162 00:07:54,840 --> 00:07:56,365 The point of today's lecture is to go through the 163 00:07:56,365 --> 00:07:56,880 whole process. 164 00:07:56,880 --> 00:07:58,770 When I'm done, I have bacteria spread out. 165 00:07:58,770 --> 00:08:01,260 And right over there, one of these guys has ARG1, and one 166 00:08:01,260 --> 00:08:04,740 of these guys has ARG2, and one of these guys has ARG10. 167 00:08:04,740 --> 00:08:06,800 Now, admittedly, I don't know which one has which, but I've 168 00:08:06,800 --> 00:08:09,740 accomplished the purification. 169 00:08:09,740 --> 00:08:12,070 I'll then have to figure out how to screen and find out 170 00:08:12,070 --> 00:08:15,300 which one is which, but I have separated the molecules away 171 00:08:15,300 --> 00:08:18,950 from each other by this process of cloning, diluting 172 00:08:18,950 --> 00:08:20,540 them in a way-- 173 00:08:20,540 --> 00:08:22,530 one molecule per bacterium-- 174 00:08:22,530 --> 00:08:24,961 and growing them back up. 175 00:08:24,961 --> 00:08:26,300 I could do this for anything. 176 00:08:26,300 --> 00:08:29,690 I could dilute proteins down into test tubes that had one 177 00:08:29,690 --> 00:08:32,919 protein molecule per test tube, and I could say I've 178 00:08:32,919 --> 00:08:35,100 accomplished purification. 179 00:08:35,100 --> 00:08:37,110 The problem with it is I have no way to replicate those 180 00:08:37,110 --> 00:08:39,320 proteins to get a meaningful amount of them. 181 00:08:39,320 --> 00:08:42,480 But when it's DNA, and I've put it back into a bacterium, I 182 00:08:42,480 --> 00:08:44,020 have a way to grow it back up. 183 00:08:44,020 --> 00:08:45,730 And that's why this trick works-- 184 00:08:45,730 --> 00:08:48,940 because DNA is a molecule that can replicate. 185 00:08:48,940 --> 00:08:51,550 No other molecule has that cool property.
186 00:08:51,550 --> 00:08:53,500 And so you can pull this off for DNA. 187 00:08:53,500 --> 00:08:54,050 All right. 188 00:08:54,050 --> 00:08:56,160 Now we have to dive in to understand how this could 189 00:08:56,160 --> 00:08:57,410 possibly work.
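As a quick check on the scale of the problem stated earlier in the lecture, the "one part in 10 to the fifth" and "one part in 10 to the ninth or so" figures follow directly from the quoted sizes. The variable names below are chosen for this sketch; the three numbers are the ones given in the lecture.

```python
GENOME_BASES = 3 * 10**9   # human genome size quoted in the lecture
GENE_BASES = 30_000        # "typical" human gene size quoted in the lecture
MUTATION_BASES = 1         # a single-base change, as in sickle cell anemia

gene_fraction = GENE_BASES / GENOME_BASES          # one part in 10**5
mutation_fraction = MUTATION_BASES / GENOME_BASES  # one part in 3 * 10**9

print(f"gene: 1 part in {1 / gene_fraction:.0e}")
print(f"mutation: 1 part in {1 / mutation_fraction:.0e}")
```

So a single gene is about a hundred-thousandth of the genome, and a single-base mutation is a few parts in ten billion, which is why ordinary fractionation is not a realistic way to isolate either.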