1 00:00:06,500 --> 00:00:09,540 ERIC LANDER: So, now what I'd like to do is turn to 2 00:00:09,540 --> 00:00:13,310 variations on the theme. 3 00:00:13,310 --> 00:00:19,770 One of the best ways to understand what's going on 4 00:00:19,770 --> 00:00:25,040 with DNA goes to RNA goes to protein is to consider how it 5 00:00:25,040 --> 00:00:26,500 works in different organisms. 6 00:00:31,500 --> 00:00:38,580 And the organisms we'll consider are eukaryotes, like 7 00:00:38,580 --> 00:00:50,000 you; prokaryotes, like a bacterium; and viruses. 8 00:00:50,000 --> 00:00:54,990 And each does the same basic copying of nucleic acid to 9 00:00:54,990 --> 00:00:58,630 nucleic acid, transcription, and translation. 10 00:00:58,630 --> 00:01:00,280 But there are some pretty fascinating 11 00:01:00,280 --> 00:01:02,080 variations on the theme. 12 00:01:02,080 --> 00:01:03,860 So let's turn to those variations. 13 00:01:03,860 --> 00:01:06,183 Let's start with DNA replication. 14 00:01:15,760 --> 00:01:29,530 For eukaryotes, your genome are long, double-stranded DNA 15 00:01:29,530 --> 00:01:33,450 molecules, and they're linear. 16 00:01:33,450 --> 00:01:39,310 Long, linear, double stranded-- 17 00:01:39,310 --> 00:01:40,880 ds for double. 18 00:01:40,880 --> 00:01:44,476 I'll write it out-- double stranded DNA. 19 00:01:44,476 --> 00:01:45,850 You know this. 20 00:01:50,170 --> 00:01:51,320 And you've got a lot of it. 21 00:01:51,320 --> 00:02:02,190 The human, you actually have 23 pairs of chromosomes. 22 00:02:02,190 --> 00:02:10,190 And the total length of your DNA is about 3 times 10 to the 23 00:02:10,190 --> 00:02:13,700 ninth base pairs. 24 00:02:13,700 --> 00:02:17,890 So about 3 billion bases or so, typical chromosome on the 25 00:02:17,890 --> 00:02:21,330 order of about 150 million bases or so. 26 00:02:21,330 --> 00:02:25,100 The mouse, pretty similar. 27 00:02:25,100 --> 00:02:28,060 Mouse, 20 pairs of chromosomes. 
28 00:02:28,060 --> 00:02:32,810 And it's something like 2.7 times 10 to the ninth bases. 29 00:02:32,810 --> 00:02:37,030 The dog's about 2.5, the elephant's about 3.3, 30 00:02:37,030 --> 00:02:43,600 essentially all mammals are about 3 times 10 to the ninth. 31 00:02:43,600 --> 00:02:48,790 Tomatoes, 12 chromosomes. 32 00:02:48,790 --> 00:02:50,370 They're also in the neighborhood of 3 33 00:02:50,370 --> 00:02:52,630 times 10 to the ninth. 34 00:02:52,630 --> 00:02:55,400 You don't have more DNA than a tomato, for example. 35 00:02:59,580 --> 00:03:01,830 Yeast, much smaller genome. 36 00:03:01,830 --> 00:03:03,080 Yeast is like-- 37 00:03:05,540 --> 00:03:07,810 it's got 16 chromosomes. 38 00:03:07,810 --> 00:03:14,720 Yeast is in the neighborhood of about 13 million bases 39 00:03:14,720 --> 00:03:17,370 instead of billions of bases. 40 00:03:17,370 --> 00:03:23,910 Fruit flies, four chromosomes, including a pretty measly one. 41 00:03:23,910 --> 00:03:29,870 And it's in the order of 200 million bases, et cetera. 42 00:03:29,870 --> 00:03:31,770 Many different sizes for eukaryotes. 43 00:03:31,770 --> 00:03:34,810 But there's one significant issue that all eukaryotes 44 00:03:34,810 --> 00:03:41,800 have, which is this chromosome here, this linear chromosome, 45 00:03:41,800 --> 00:03:43,770 how do we replicate it? 46 00:03:43,770 --> 00:03:44,470 Well, we told you. 47 00:03:44,470 --> 00:03:45,970 We open up a bubble. 48 00:03:45,970 --> 00:03:48,600 We start making little primers. 49 00:03:48,600 --> 00:03:52,640 Primase makes primers, you extend it, it's all just fine. 50 00:03:52,640 --> 00:03:56,175 Except in one place, the end. 51 00:04:00,940 --> 00:04:05,650 Primase makes a primer, continues 52 00:04:05,650 --> 00:04:07,515 to the end, no problem. 53 00:04:10,030 --> 00:04:12,640 What happens out there at the end? 54 00:04:12,640 --> 00:04:15,370 Suppose primase sits down here. 
55 00:04:23,730 --> 00:04:27,504 What happens at the very end of the chromosome? 56 00:04:27,504 --> 00:04:30,980 If my primer wasn't exactly at the end of the chromosome, 57 00:04:30,980 --> 00:04:32,990 what happens? 58 00:04:32,990 --> 00:04:34,430 STUDENT: A little bit of it doesn't get replicated. 59 00:04:34,430 --> 00:04:36,566 ERIC LANDER: A little bit doesn't get replicated. 60 00:04:36,566 --> 00:04:38,550 Does that matter? 61 00:04:38,550 --> 00:04:40,244 Just a little bit-- 62 00:04:40,244 --> 00:04:41,030 STUDENT: [INAUDIBLE] 63 00:04:41,030 --> 00:04:44,470 ERIC LANDER: --per cell division. 64 00:04:44,470 --> 00:04:46,990 Every cell division, you lose a little bit of information 65 00:04:46,990 --> 00:04:48,240 off the end of your chromosome. 66 00:04:50,970 --> 00:04:53,400 That's not so good. 67 00:04:53,400 --> 00:04:56,820 The word for end is tel. 68 00:04:56,820 --> 00:05:01,510 And so the ends of chromosomes are called telomeres. 69 00:05:01,510 --> 00:05:05,930 And if you have a linear chromosome, you have a problem 70 00:05:05,930 --> 00:05:10,540 with replicating your telomeres because it's going 71 00:05:10,540 --> 00:05:12,375 to get a little shorter each cell division. 72 00:05:16,110 --> 00:05:17,360 So what do you think the cell does? 73 00:05:21,420 --> 00:05:24,010 Special mechanism, special enzyme that 74 00:05:24,010 --> 00:05:26,910 comes along and adds-- 75 00:05:26,910 --> 00:05:29,330 there's a repeat sequence that occurs. 76 00:05:29,330 --> 00:05:31,650 There's a repetitive sequence that occurs out here. 77 00:05:34,540 --> 00:05:40,350 TTAGGG in humans, different things in other organisms. 78 00:05:40,350 --> 00:05:43,750 And there's a specific enzyme that comes along and adds back 79 00:05:43,750 --> 00:05:47,570 telomeres so you don't get in trouble failing to replicate 80 00:05:47,570 --> 00:05:49,710 enough stuff at the end. 81 00:05:49,710 --> 00:05:52,880 By chance, the enzyme happens to be called--?
82 00:05:52,880 --> 00:05:53,740 STUDENT: Telomerase. 83 00:05:53,740 --> 00:05:56,370 ERIC LANDER: Telomerase. 84 00:05:56,370 --> 00:06:00,710 And some folks got a Nobel Prize for this last year, for 85 00:06:00,710 --> 00:06:03,805 understanding how telomerase works. 86 00:06:07,790 --> 00:06:13,540 What cells in your body are in need of replicating and 87 00:06:13,540 --> 00:06:14,380 replicating and replicating? 88 00:06:14,380 --> 00:06:15,410 Well, not necessarily in your body. 89 00:06:15,410 --> 00:06:18,085 What cells in some people are replicating and replicating 90 00:06:18,085 --> 00:06:18,950 and replicating and replicating? 91 00:06:18,950 --> 00:06:20,500 Cancer cells. 92 00:06:20,500 --> 00:06:23,900 Cancer cells probably need their telomerase, right? 93 00:06:23,900 --> 00:06:26,980 So one way to possibly treat cancer might be to inhibit 94 00:06:26,980 --> 00:06:28,476 telomerase. 95 00:06:28,476 --> 00:06:30,880 So you see, all the things I'm telling you about, these are 96 00:06:30,880 --> 00:06:33,890 useful fun facts to know and tell about how the cell works. 97 00:06:33,890 --> 00:06:36,580 They're also the heart of a lot of approaches in medicine. 98 00:06:36,580 --> 00:06:38,440 Because if you could specifically inhibit 99 00:06:38,440 --> 00:06:41,350 telomerase, you might specifically create a 100 00:06:41,350 --> 00:06:43,960 liability for rapidly dividing cells. 101 00:06:43,960 --> 00:06:45,960 And so telomerase, very interesting. 102 00:06:45,960 --> 00:06:49,150 Anyway, so a linear chromosome has that problem. 103 00:06:49,150 --> 00:06:52,980 And if you understood our mechanisms for replication, 104 00:06:52,980 --> 00:06:56,680 you'll understand there why the linear chromosome needs 105 00:06:56,680 --> 00:06:57,860 some special mechanism. 106 00:06:57,860 --> 00:07:00,660 Now, here, prokaryotes are much easier. 
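The shortening of a linear chromosome described above can be put into numbers with a toy calculation. This is only a sketch: the function name and every value in it (starting telomere length, bases lost per division, critical length) are illustrative assumptions, not figures from the lecture. It counts how many divisions a cell lineage gets before its telomere drops below a critical length, with and without telomerase adding sequence back.

```python
# Toy model of the end-replication problem.
# All parameter values used below are illustrative assumptions.


def divisions_until_critical(telomere_bp, loss_per_division_bp,
                             critical_bp, telomerase_adds_bp=0):
    """Count divisions until the telomere is shorter than critical_bp.

    Returns None when telomerase adds back at least as much as is lost,
    because the telomere then never shrinks.
    """
    net_loss = loss_per_division_bp - telomerase_adds_bp
    if net_loss <= 0:
        return None
    divisions = 0
    length = telomere_bp
    while length >= critical_bp:
        length -= net_loss
        divisions += 1
    return divisions


# Without telomerase: 10,000 bp telomere, 100 bp lost per division.
print(divisions_until_critical(10_000, 100, 5_000))        # 51
# With telomerase fully compensating, the lineage is not limited.
print(divisions_until_critical(10_000, 100, 5_000, 100))   # None
```

In this simplified picture, inhibiting telomerase corresponds to setting `telomerase_adds_bp` back to zero, which is why rapidly dividing cells would be the ones that run out of telomere first.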
107 00:07:00,660 --> 00:07:09,450 Because most prokaryotes have circular chromosomes, 108 00:07:09,450 --> 00:07:11,730 double-stranded circular-- 109 00:07:11,730 --> 00:07:13,990 ds circular-- 110 00:07:13,990 --> 00:07:15,240 DNA. 111 00:07:17,260 --> 00:07:18,870 This is much easier because there are no ends. 112 00:07:18,870 --> 00:07:20,720 They don't have to worry about the telomere problem. 113 00:07:20,720 --> 00:07:22,890 They start somewhere, they start replicating around, it 114 00:07:22,890 --> 00:07:24,490 all works fine. 115 00:07:24,490 --> 00:07:33,250 E. coli, for example, 4 million bases of DNA. 116 00:07:33,250 --> 00:07:34,895 The smallest are mycoplasma. 117 00:07:39,290 --> 00:07:43,090 They're on the order of about a million bases of DNA. 118 00:07:43,090 --> 00:07:46,400 And they work just like we talked about. 119 00:07:46,400 --> 00:07:54,580 But now, viruses, they are weird. 120 00:08:02,030 --> 00:08:02,302 Viruses -- 121 00:08:02,302 --> 00:08:12,840 Turns out some viruses have some double-stranded linear 122 00:08:12,840 --> 00:08:17,120 DNA, and some with multiple chromosomes, even. 123 00:08:17,120 --> 00:08:26,110 Some viruses, though, have double-stranded circular DNA. 124 00:08:26,110 --> 00:08:28,800 So it turns out viruses can do either of those. 125 00:08:33,860 --> 00:08:37,070 It turns out they can do more than that. 126 00:08:37,070 --> 00:08:48,390 Some viruses have single-stranded circular DNA. 127 00:08:54,490 --> 00:08:59,130 That is to say, when they're traveling around in their 128 00:08:59,130 --> 00:09:03,590 capsid, in the protein coat, what's in the protein coat 129 00:09:03,590 --> 00:09:08,590 that you get infected with is a single strand of DNA. 130 00:09:08,590 --> 00:09:09,840 Well, how does it replicate? 131 00:09:12,510 --> 00:09:15,975 When it gets in, it becomes double stranded.
132 00:09:15,975 --> 00:09:21,380 The first thing it has to do is polymerases end up making 133 00:09:21,380 --> 00:09:25,640 this guy double stranded. 134 00:09:25,640 --> 00:09:28,820 But it travels around in its single-stranded form. 135 00:09:28,820 --> 00:09:29,320 Why? 136 00:09:29,320 --> 00:09:30,860 Because it decided to. 137 00:09:30,860 --> 00:09:33,210 The great thing about viruses is they are small. 138 00:09:33,210 --> 00:09:35,280 They've had a chance to experiment with a zillion 139 00:09:35,280 --> 00:09:37,160 different things. 140 00:09:37,160 --> 00:09:42,430 But some viruses don't have any DNA at all when they 141 00:09:42,430 --> 00:09:43,640 travel around. 142 00:09:43,640 --> 00:09:51,320 Instead of having DNA, they alternatively could have RNA. 143 00:09:51,320 --> 00:09:54,240 Remember I said Crick already figured out DNA and RNA are 144 00:09:54,240 --> 00:09:55,670 essentially equivalent. 145 00:09:55,670 --> 00:09:57,030 They're both nucleic acid. 146 00:09:57,030 --> 00:09:58,830 You can go from one to the other. 147 00:09:58,830 --> 00:10:05,310 Some viruses decided to bring along single-stranded RNA. 148 00:10:12,300 --> 00:10:17,460 So when the virus attaches to the cell, it injects an RNA. 149 00:10:17,460 --> 00:10:18,710 The RNA is in the cell. 150 00:10:22,690 --> 00:10:24,245 But how is the virus going to do anything? 151 00:10:29,040 --> 00:10:32,490 How's it going to replicate itself? 152 00:10:32,490 --> 00:10:35,940 How do you replicate RNA? 153 00:10:35,940 --> 00:10:40,450 Well, the same stuff I told you about replicating DNA-- 154 00:10:40,450 --> 00:10:44,460 namely, for replicating DNA you use a DNA polymerase. 155 00:10:44,460 --> 00:10:46,885 It's a DNA-directed DNA polymerase. 156 00:10:46,885 --> 00:10:49,670 It uses DNA as a template. 157 00:10:49,670 --> 00:10:54,410 Any reason not to have an RNA-directed RNA polymerase? 158 00:10:54,410 --> 00:10:54,840 No. 
159 00:10:54,840 --> 00:10:56,240 You can have one of those. 160 00:10:56,240 --> 00:11:04,400 So the way this works is this gets replicated 161 00:11:04,400 --> 00:11:07,630 into a strand of RNA. 162 00:11:07,630 --> 00:11:14,590 It makes double-stranded RNA by an RNA-directed RNA 163 00:11:14,590 --> 00:11:15,840 polymerase. 164 00:11:20,490 --> 00:11:22,090 That's a kind of weird enzyme. 165 00:11:22,090 --> 00:11:24,280 It takes RNA as its template. 166 00:11:24,280 --> 00:11:26,520 And it uses RNA as its template, and it 167 00:11:26,520 --> 00:11:27,420 makes another copy. 168 00:11:27,420 --> 00:11:29,160 It makes a strand of RNA to make it double stranded. 169 00:11:29,160 --> 00:11:30,830 And then it goes back and makes another strand of RNA. 170 00:11:34,980 --> 00:11:38,070 But you don't have that enzyme. 171 00:11:38,070 --> 00:11:40,264 Where does that enzyme come from? 172 00:11:40,264 --> 00:11:41,470 STUDENT: The virus. 173 00:11:41,470 --> 00:11:42,845 ERIC LANDER: The virus? 174 00:11:42,845 --> 00:11:45,255 Did it bring it with it? 175 00:11:45,255 --> 00:11:47,200 STUDENT: Another cell that it infects? 176 00:11:47,200 --> 00:11:49,020 ERIC LANDER: Another cell that infects? 177 00:11:49,020 --> 00:11:50,790 STUDENT: The RNA encodes-- 178 00:11:50,790 --> 00:11:51,480 STUDENT: The RNA does-- 179 00:11:51,480 --> 00:11:53,100 ERIC LANDER: Whoa! 180 00:11:53,100 --> 00:11:58,660 Wouldn't it be cool if the RNA was a messenger RNA and it 181 00:11:58,660 --> 00:12:02,700 encoded a protein, and the protein it encoded was the 182 00:12:02,700 --> 00:12:07,060 RNA-directed RNA polymerase? 183 00:12:07,060 --> 00:12:08,040 Bingo. 184 00:12:08,040 --> 00:12:13,280 That is, in fact, what happens with a certain class of what 185 00:12:13,280 --> 00:12:17,600 are called plus-strand viruses. 186 00:12:17,600 --> 00:12:18,980 This is a messenger. 187 00:12:18,980 --> 00:12:20,870 It's a messenger RNA. 
188 00:12:20,870 --> 00:12:24,370 And it actually encodes the instructions to the cell, 189 00:12:24,370 --> 00:12:28,340 please make me an RNA-directed RNA polymerase. 190 00:12:28,340 --> 00:12:29,590 That's way cool. 191 00:12:32,160 --> 00:12:38,310 It also turns out that some viruses are what are called 192 00:12:38,310 --> 00:12:41,870 minus-strand viruses. 193 00:12:41,870 --> 00:12:45,390 They don't bring a messenger RNA. 194 00:12:45,390 --> 00:12:50,340 But instead, like you said, they bring their own 195 00:12:50,340 --> 00:12:52,240 polymerase with them. 196 00:12:52,240 --> 00:12:55,620 The polymerase comes-- 197 00:12:55,620 --> 00:12:59,405 So here, these bring the instructions for a polymerase. 198 00:13:04,710 --> 00:13:06,550 These actually bring the polymerase itself. 199 00:13:11,450 --> 00:13:16,010 The polymerase then copies this strand, which is the 200 00:13:16,010 --> 00:13:17,790 messenger RNA. 201 00:13:17,790 --> 00:13:20,490 And it makes more RNA-directed RNA polymerases. 202 00:13:20,490 --> 00:13:22,510 So both of your two solutions-- 203 00:13:22,510 --> 00:13:26,400 the virus brings a polymerase with itself, or the virus 204 00:13:26,400 --> 00:13:28,500 brings the instructions for the polymerase. 205 00:13:28,500 --> 00:13:31,280 Both of those actually happen. 206 00:13:31,280 --> 00:13:34,720 Pretty much a good rule with viruses is anything that can 207 00:13:34,720 --> 00:13:36,900 happen does happen. 208 00:13:36,900 --> 00:13:39,400 This is pretty much Murphy's rule for viruses there. 209 00:13:42,326 --> 00:13:45,620 Turns out viruses can do one other thing.
210 00:13:45,620 --> 00:13:53,060 It turns out that viruses can take that RNA strand-- 211 00:13:53,060 --> 00:13:55,400 RNA-- 212 00:13:55,400 --> 00:13:59,620 and, although I won't go into all the details, copy that RNA 213 00:13:59,620 --> 00:14:14,340 strand into a DNA strand and then copy that DNA strand to a 214 00:14:14,340 --> 00:14:19,010 second DNA strand, to make double-stranded DNA. 215 00:14:19,010 --> 00:14:24,800 So some viruses that bring RNA with them copy themselves not 216 00:14:24,800 --> 00:14:30,080 into more RNA, but back into DNA. 217 00:14:30,080 --> 00:14:33,270 So instead of an RNA-directed RNA polymerase or a 218 00:14:33,270 --> 00:14:38,040 DNA-directed DNA polymerase, what is this? 219 00:14:38,040 --> 00:14:42,216 It's an RNA-directed DNA polymerase. 220 00:14:46,120 --> 00:14:55,410 So this is an RNA-directed DNA polymerase. 221 00:14:59,580 --> 00:15:03,280 In effect, what is this thing doing? 222 00:15:03,280 --> 00:15:05,960 It's doing the exact opposite of transcription. 223 00:15:05,960 --> 00:15:07,470 What's transcription? 224 00:15:07,470 --> 00:15:09,010 Reading DNA into RNA. 225 00:15:09,010 --> 00:15:10,300 What's this guy doing? 226 00:15:10,300 --> 00:15:11,880 Reading RNA into DNA. 227 00:15:11,880 --> 00:15:15,770 It's the reverse of transcription. 228 00:15:15,770 --> 00:15:18,130 What is the enzyme called? 229 00:15:18,130 --> 00:15:19,380 Reverse transcriptase. 230 00:15:23,280 --> 00:15:25,010 It's called reverse transcriptase. 231 00:15:34,840 --> 00:15:39,070 And then what happens is quite insidious, is that if this is 232 00:15:39,070 --> 00:15:48,560 your own chromosomal DNA, that piece of double-stranded DNA 233 00:15:48,560 --> 00:15:54,000 from the virus can be inserted into your own human 234 00:15:54,000 --> 00:15:55,250 chromosome. 235 00:16:00,260 --> 00:16:02,200 The virus can then make more copies of itself by 236 00:16:02,200 --> 00:16:03,950 transcription of that. 
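The two copying steps just described (RNA strand to DNA strand, then DNA strand to second DNA strand) can be sketched as plain string operations. This is a minimal illustration, assuming sequences are written 5' to 3' and ignoring everything the lecture skips over (primers, enzyme details, integration). The function names and the example sequence are hypothetical.

```python
# Sketch of reverse transcription as base-pairing on strings.
# Sequences are written 5'->3'; names and the example are hypothetical.

RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """RNA-directed DNA synthesis: DNA strand complementary to the RNA."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def second_strand(dna):
    """DNA-directed DNA synthesis: strand complementary to the first DNA."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna))


viral_rna = "AUGGCUUAA"
first_dna = reverse_transcribe(viral_rna)
second_dna = second_strand(first_dna)

print(first_dna)    # TTAAGCCAT
print(second_dna)   # ATGGCTTAA
```

The second DNA strand carries the same sequence as the original RNA with T in place of U, which is why transcribing the inserted double-stranded DNA regenerates the viral RNA.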
237 00:16:03,950 --> 00:16:06,610 This is a truly insidious virus because it doesn't just 238 00:16:06,610 --> 00:16:08,600 infect your cells and grow. 239 00:16:08,600 --> 00:16:12,000 It infects your cells, turns into double-stranded DNA, and 240 00:16:12,000 --> 00:16:12,940 installs itself. 241 00:16:12,940 --> 00:16:15,590 And how do you get that DNA from the virus out of your 242 00:16:15,590 --> 00:16:16,840 chromosome? 243 00:16:20,860 --> 00:16:23,210 You don't. 244 00:16:23,210 --> 00:16:25,480 You can't get it out. 245 00:16:25,480 --> 00:16:26,730 It's stuck there. 246 00:16:28,710 --> 00:16:31,820 These things, because they work in this fashion of back 247 00:16:31,820 --> 00:16:36,286 from RNA into DNA, have the name retroviruses. 248 00:16:36,286 --> 00:16:37,816 That's what these are, retroviruses. 249 00:16:46,210 --> 00:16:48,680 And can anyone name a particular retrovirus? 250 00:16:48,680 --> 00:16:49,530 STUDENT: HIV. 251 00:16:49,530 --> 00:16:50,460 ERIC LANDER: HIV. 252 00:16:50,460 --> 00:16:52,760 And that's how HIV works. 253 00:16:52,760 --> 00:16:56,770 It turns out that David Baltimore won a Nobel Prize 254 00:16:56,770 --> 00:16:59,710 for the discovery of reverse transcriptase. 255 00:16:59,710 --> 00:17:02,520 And again, the reason I tell you about all these mechanisms 256 00:17:02,520 --> 00:17:05,250 is A, they're cool, and they're about biology. 257 00:17:05,250 --> 00:17:07,109 And B, they're medically important. 258 00:17:07,109 --> 00:17:09,780 Why is reverse transcriptase not just cool 259 00:17:09,780 --> 00:17:13,220 but medically important? 260 00:17:13,220 --> 00:17:14,910 Because you could inhibit it. 261 00:17:14,910 --> 00:17:17,390 If you wanted to fight the HIV, if you wanted to fight 262 00:17:17,390 --> 00:17:21,720 the AIDS virus, you could come up with chemicals that inhibit 263 00:17:21,720 --> 00:17:24,240 reverse transcriptase. 
264 00:17:24,240 --> 00:17:28,240 And of course, the cocktails that are given to patients who 265 00:17:28,240 --> 00:17:32,530 have been infected by HIV that now keep them alive very, very 266 00:17:32,530 --> 00:17:36,120 nicely include reverse transcriptase inhibitors. 267 00:17:36,120 --> 00:17:37,260 It's a very important thing. 268 00:17:37,260 --> 00:17:40,260 If you understand the biology of retroviruses, you 269 00:17:40,260 --> 00:17:44,060 understand the targets you can use for drug development. 270 00:17:44,060 --> 00:17:47,600 And they have saved millions of lives. 271 00:17:47,600 --> 00:17:52,660 So again, I say this is not entirely unimportant stuff. 272 00:17:52,660 --> 00:17:55,210 This is kind of cool stuff, to understand how this works. 273 00:17:55,210 --> 00:17:57,990 All right. 274 00:17:57,990 --> 00:18:04,740 Next up, transcription. 275 00:18:04,740 --> 00:18:05,990 Let's turn to transcription. 276 00:18:13,870 --> 00:18:19,350 Transcription is a little bit easier. 277 00:18:19,350 --> 00:18:21,360 Transcription varies. 278 00:18:21,360 --> 00:18:25,810 Here, eukaryotes, prokaryotes, and viruses. 279 00:18:25,810 --> 00:18:27,720 Let's see, let's start now with prokaryotes. 280 00:18:30,860 --> 00:18:33,470 Prokaryotes, pretty simple. 281 00:18:33,470 --> 00:18:38,840 I have a chromosome, I have my promoter, I make my messenger 282 00:18:38,840 --> 00:18:43,190 RNA, I'm done. 283 00:18:43,190 --> 00:18:45,040 Just like I taught you before. 284 00:18:45,040 --> 00:18:48,220 Transcription looks just like I told you. 285 00:18:48,220 --> 00:18:51,815 But for eukaryotes, it's a little weirder. 286 00:18:56,860 --> 00:19:03,931 Transcription, I have my RNA. 287 00:19:03,931 --> 00:19:04,950 I have my promoter. 288 00:19:04,950 --> 00:19:06,970 I make my RNA. 289 00:19:06,970 --> 00:19:11,100 And my RNA, it turns out, gets processed in all sorts of 290 00:19:11,100 --> 00:19:13,040 interesting ways.
291 00:19:13,040 --> 00:19:17,590 The three ways in which eukaryotic RNA are processed-- 292 00:19:17,590 --> 00:19:24,210 first, there is a modification put onto the five-prime end. 293 00:19:24,210 --> 00:19:26,702 It is called-- 294 00:19:26,702 --> 00:19:29,930 well, it's basically a backwards G. It's a G 295 00:19:29,930 --> 00:19:35,190 triphosphate that is put on backwards, so it goes GPPP, 296 00:19:35,190 --> 00:19:36,140 right there. 297 00:19:36,140 --> 00:19:38,220 And this is called a cap. 298 00:19:38,220 --> 00:19:41,270 And it's important for message recognition and stability. 299 00:19:41,270 --> 00:19:44,890 Eukaryotes put a funny little chemical modification there. 300 00:19:44,890 --> 00:19:48,410 Eukaryotes also do something where somewhere near the end, 301 00:19:48,410 --> 00:19:55,000 they cut the message, and they stick on a bunch of A's as a 302 00:19:55,000 --> 00:19:56,900 tail at the end. 303 00:19:56,900 --> 00:19:59,240 And this is also important for message recognition and 304 00:19:59,240 --> 00:20:01,010 message stability and all that. 305 00:20:01,010 --> 00:20:05,220 And this is called the poly(A) tail. 306 00:20:05,220 --> 00:20:06,870 Most of the names are quite reasonable. 307 00:20:06,870 --> 00:20:09,910 The tail of lots of A's is called the poly(A) tail. 308 00:20:09,910 --> 00:20:12,540 So eukaryotic messages have a cap at the front, the poly(A) 309 00:20:12,540 --> 00:20:13,590 tail at the back. 310 00:20:13,590 --> 00:20:17,450 But the truly weird thing that they have is, if this is my 311 00:20:17,450 --> 00:20:42,980 eukaryotic message here, some chunks of the message are cut 312 00:20:42,980 --> 00:20:45,990 out and discarded entirely. 313 00:20:45,990 --> 00:20:49,060 They are spliced out. 314 00:20:49,060 --> 00:20:53,070 You might make a longer message, and whole chunks are 315 00:20:53,070 --> 00:20:56,600 spliced out. 316 00:20:56,600 --> 00:20:58,435 This is called splicing. 
317 00:21:01,090 --> 00:21:07,130 Splicing throws out sequences. 318 00:21:07,130 --> 00:21:11,200 And it could start with a long mRNA and make it a short mRNA. 319 00:21:11,200 --> 00:21:16,970 Now, Phil Sharp, who is on the faculty here at MIT, won a 320 00:21:16,970 --> 00:21:20,990 Nobel Prize some years ago for his discovery, together with 321 00:21:20,990 --> 00:21:24,160 someone else, of the phenomenon of splicing. 322 00:21:24,160 --> 00:21:25,840 So Phil is really cool. 323 00:21:25,840 --> 00:21:29,120 You should talk to Phil. 324 00:21:29,120 --> 00:21:36,290 This splicing involves leaving some things in and excising 325 00:21:36,290 --> 00:21:39,180 other things. 326 00:21:39,180 --> 00:21:41,660 The things that are -- 327 00:21:41,660 --> 00:21:50,250 the things that go out are called introns. 328 00:21:50,250 --> 00:21:53,820 The things that stay in and are not 329 00:21:53,820 --> 00:21:56,265 excised are called exons. 330 00:22:01,080 --> 00:22:03,880 The nomenclature is a little nuts. 331 00:22:03,880 --> 00:22:06,300 If it stays in, it's an exon. 332 00:22:06,300 --> 00:22:10,980 If it goes out, it's an intron. 333 00:22:10,980 --> 00:22:13,860 As I told you, the phenomenon was discovered by 334 00:22:13,860 --> 00:22:15,730 Phil Sharp at MIT. 335 00:22:15,730 --> 00:22:19,800 The nomenclature was proposed by Wally 336 00:22:19,800 --> 00:22:23,780 Gilbert at Harvard, [LAUGHTER] 337 00:22:23,780 --> 00:22:24,540 who's a good friend. 338 00:22:24,540 --> 00:22:26,320 I'm teasing Wally. 339 00:22:26,320 --> 00:22:28,640 I'm teasing Wally, but Wally is responsible for this 340 00:22:28,640 --> 00:22:33,120 nomenclature that confuses generations of students. 341 00:22:33,120 --> 00:22:35,990 Why is this called an intron? 342 00:22:35,990 --> 00:22:38,540 Not because it stays in, but because it's 343 00:22:38,540 --> 00:22:40,840 an intervening sequence. 344 00:22:40,840 --> 00:22:47,230 It's an intervening sequence when it is called an intron.
345 00:22:47,230 --> 00:22:49,840 And once something was called an intron, the other thing 346 00:22:49,840 --> 00:22:51,490 became an exon. 347 00:22:51,490 --> 00:22:54,500 But I've got to say, for all purposes, to me, in means in 348 00:22:54,500 --> 00:22:55,970 and ex means out. 349 00:22:55,970 --> 00:22:57,690 But it's exactly backward. 350 00:22:57,690 --> 00:22:59,310 I've now said that a few times. 351 00:22:59,310 --> 00:23:02,280 That may help you remember that it goes the other way. 352 00:23:02,280 --> 00:23:06,680 Introns are intervening sequences, okay? 353 00:23:06,680 --> 00:23:09,630 So there are some pretty impressive splicing events 354 00:23:09,630 --> 00:23:10,850 that go on. 355 00:23:10,850 --> 00:23:18,530 A typical gene might start off 30 356 00:23:18,530 --> 00:23:21,360 kilobases, 30,000 bases. 357 00:23:21,360 --> 00:23:27,220 And it might get spliced down to 3 kilobases, 3,000 bases. 358 00:23:27,220 --> 00:23:32,620 But for example, the Factor VIII gene that encodes the 359 00:23:32,620 --> 00:23:46,360 factor that hemophiliacs lack, it starts off with 200 360 00:23:46,360 --> 00:23:48,520 kilobases, 200,000 bases. 361 00:23:48,520 --> 00:23:51,680 And all but 10,000 are cut out. 362 00:23:51,680 --> 00:23:54,800 So it starts off 200,000, you throw out 190,000, you get 363 00:23:54,800 --> 00:23:56,920 down to 10. 364 00:23:56,920 --> 00:24:06,520 The winner, the Duchenne muscular dystrophy gene, 365 00:24:06,520 --> 00:24:13,060 starts out at 2 million bases. 366 00:24:13,060 --> 00:24:17,710 And it gets cut down to 16,000 bases. 367 00:24:17,710 --> 00:24:21,410 You make 2 million bases of RNA, and you throw out almost 368 00:24:21,410 --> 00:24:26,000 the entirety of it and retain only 16,000. 369 00:24:26,000 --> 00:24:28,420 What a waste. 370 00:24:28,420 --> 00:24:30,550 Why do this?
371 00:24:30,550 --> 00:24:35,360 Why break up your genes in patches of exons separated by 372 00:24:35,360 --> 00:24:39,045 big intervening spaces and then make a big RNA and splice 373 00:24:39,045 --> 00:24:40,360 it together? 374 00:24:40,360 --> 00:24:41,730 Why do something that dumb? 375 00:24:41,730 --> 00:24:42,328 Yeah? 376 00:24:42,328 --> 00:24:44,200 STUDENT: [INAUDIBLE]? 377 00:24:44,200 --> 00:24:44,530 ERIC LANDER: Sorry? 378 00:24:44,530 --> 00:24:45,270 STUDENT: Are they recycled? 379 00:24:45,270 --> 00:24:47,810 ERIC LANDER: The nucleotides, you mean, in the RNA? 380 00:24:47,810 --> 00:24:48,760 The nucleotides are recycled. 381 00:24:48,760 --> 00:24:51,430 But remember, you spent the triphosphates there. 382 00:24:51,430 --> 00:24:54,430 So that was an energy expenditure. 383 00:24:54,430 --> 00:24:56,060 Well, it turns out the energy expenditure, 384 00:24:56,060 --> 00:24:58,030 big deal, who cares? 385 00:24:58,030 --> 00:25:00,160 But it turns out that this is actually very interesting 386 00:25:00,160 --> 00:25:00,980 evolutionarily. 387 00:25:00,980 --> 00:25:02,630 I'll just tell you for a second. 388 00:25:02,630 --> 00:25:04,330 Maybe you'll forget it. 389 00:25:04,330 --> 00:25:05,990 In a given organism, it might be more 390 00:25:05,990 --> 00:25:09,830 efficient to not have introns. 391 00:25:09,830 --> 00:25:11,680 Well, actually, there's one use for it. 392 00:25:11,680 --> 00:25:16,050 If I had introns, the cell could do alternative splicing. 393 00:25:16,050 --> 00:25:19,760 It could take the same message and splice it different ways 394 00:25:19,760 --> 00:25:21,420 in different cells. 395 00:25:21,420 --> 00:25:23,100 Your liver might splice the message one 396 00:25:23,100 --> 00:25:25,380 way to make one protein. 397 00:25:25,380 --> 00:25:28,130 Your muscle might splice the message a different way to 398 00:25:28,130 --> 00:25:29,110 make another protein.
399 00:25:29,110 --> 00:25:30,640 So you could actually make multiple, 400 00:25:30,640 --> 00:25:32,710 different mature messages. 401 00:25:32,710 --> 00:25:35,400 That's cool, and that's used, and most genes actually have 402 00:25:35,400 --> 00:25:37,130 alternative splice forms. 403 00:25:37,130 --> 00:25:39,810 They can be spliced up in different ways. 404 00:25:39,810 --> 00:25:42,200 It's also cool evolutionarily. 405 00:25:42,200 --> 00:25:45,180 Because it turns out that if your genes are broken up into 406 00:25:45,180 --> 00:25:49,990 patches like that, when random breaks happen in your genome, 407 00:25:49,990 --> 00:25:51,930 and this bit of chromosome attaches to that bit of 408 00:25:51,930 --> 00:25:53,650 chromosome, as happens sometimes-- 409 00:25:53,650 --> 00:25:55,350 you get hit by a little radiation, it breaks 410 00:25:55,350 --> 00:25:58,210 something, it puts it together-- 411 00:25:58,210 --> 00:26:01,290 you could have a functioning gene. 412 00:26:01,290 --> 00:26:04,560 Because since your cell knows how to take a long message and 413 00:26:04,560 --> 00:26:08,300 splice it together, the fact that there was a break and a 414 00:26:08,300 --> 00:26:10,520 reunion, if it happens in one of those intervening 415 00:26:10,520 --> 00:26:13,250 sequences, would give you a new gene, and a 416 00:26:13,250 --> 00:26:14,900 new functional gene. 417 00:26:14,900 --> 00:26:17,490 And so, in fact, some people think that this is one of the 418 00:26:17,490 --> 00:26:20,610 tricks evolution uses to create diversity is by 419 00:26:20,610 --> 00:26:23,260 breaking up its information like that into little patches 420 00:26:23,260 --> 00:26:25,960 of files that can then be recombined with each other in 421 00:26:25,960 --> 00:26:27,100 different ways. 422 00:26:27,100 --> 00:26:28,180 So those are the ideas. 423 00:26:28,180 --> 00:26:30,700 In any case, it happens a great deal. 424 00:26:30,700 --> 00:26:35,537 Now, viruses.
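Splicing and alternative splicing, as described in the preceding passage, amount to joining a chosen set of exon intervals and discarding everything in between. The following is a minimal sketch under simplifying assumptions: the toy pre-mRNA, its exon coordinates, and the function names are all hypothetical, and uppercase and lowercase letters are used only to make exons and introns easy to tell apart. The last line reuses the Duchenne muscular dystrophy numbers quoted in the lecture (2 million bases transcribed, 16,000 retained).

```python
# Sketch of splicing as "keep the exon intervals, drop the rest".
# The toy sequence and coordinates are hypothetical.


def splice(pre_mrna, exons):
    """Join exon intervals (start, end), end exclusive, in the order given."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def retained_fraction(pre_mrna_len, mature_len):
    """Fraction of the primary transcript that survives splicing."""
    return mature_len / pre_mrna_len


# Exons in uppercase, introns (intervening sequences) in lowercase.
pre_mrna = "AAAA" + "guccuag" + "CCCC" + "guaauag" + "GGGG"
exons = [(0, 4), (11, 15), (22, 26)]

# One tissue keeps all three exons.
print(splice(pre_mrna, exons))                   # AAAACCCCGGGG
# Another tissue skips the middle exon (alternative splicing).
print(splice(pre_mrna, [exons[0], exons[2]]))    # AAAAGGGG

# Lecture numbers for the Duchenne muscular dystrophy gene.
print(retained_fraction(2_000_000, 16_000))      # 0.008
```

So for that gene less than 1 percent of the transcribed RNA ends up in the mature message.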
425 00:26:35,537 --> 00:26:39,700 Viruses it turns out, will behave with regard to this 426 00:26:39,700 --> 00:26:41,190 like the organism in which they live. 427 00:26:41,190 --> 00:26:43,890 Prokaryotic viruses will behave like prokes. 428 00:26:43,890 --> 00:26:47,860 Eukaryotic viruses will behave like eukes. 429 00:26:47,860 --> 00:26:53,430 Now, let's turn to translation. 430 00:26:53,430 --> 00:26:55,615 How does translation work between--? 431 00:27:05,150 --> 00:27:12,060 Well, now here, eukaryotes, for the most 432 00:27:12,060 --> 00:27:15,770 part, are well behaved. 433 00:27:15,770 --> 00:27:26,210 A eukaryote makes an mRNA, it goes to the ribosome, the mRNA 434 00:27:26,210 --> 00:27:31,240 makes your protein, just like I taught you. 435 00:27:31,240 --> 00:27:35,180 But now, prokaryotes are a little weird. 436 00:27:35,180 --> 00:27:47,590 Prokaryotes, you know, their messages are fine. 437 00:27:47,590 --> 00:27:49,610 None of this funny cap business, none of these 438 00:27:49,610 --> 00:27:51,900 poly(A) tails, none of this splicing. 439 00:27:51,900 --> 00:27:54,690 But here's the weirdness that prokes do. 440 00:27:54,690 --> 00:28:02,470 This mRNA, one mRNA might encode 441 00:28:02,470 --> 00:28:05,480 multiple different proteins. 442 00:28:05,480 --> 00:28:06,740 Multiple proteins. 443 00:28:06,740 --> 00:28:09,030 So this is one protein. 444 00:28:09,030 --> 00:28:10,970 This, multiple proteins. 445 00:28:14,500 --> 00:28:20,640 So what happens is this RNA, the ribosome gets on here at a 446 00:28:20,640 --> 00:28:22,780 certain site. 447 00:28:22,780 --> 00:28:26,870 Now remember I told you the ribosome finds the first AUG? 448 00:28:26,870 --> 00:28:28,620 Well, it's a little more complicated. 449 00:28:28,620 --> 00:28:31,660 There's a particular ribosome binding site that ribosomes 450 00:28:31,660 --> 00:28:33,125 like to go to. 
451 00:28:33,125 --> 00:28:37,620 It turns out that this prokaryotic message has 452 00:28:37,620 --> 00:28:40,310 several of those ribosome binding sites. 453 00:28:40,310 --> 00:28:42,150 Ribosomes sit down. 454 00:28:42,150 --> 00:28:45,580 They then scan for the first AUG that they see after that 455 00:28:45,580 --> 00:28:49,930 point and start making a protein. 456 00:28:49,930 --> 00:28:52,790 So I might have several different proteins all being 457 00:28:52,790 --> 00:29:00,150 made by one messenger RNA, one RNA making multiple proteins. 458 00:29:00,150 --> 00:29:01,400 Why would I do that? 459 00:29:04,130 --> 00:29:06,140 Wouldn't it be simpler, make your head hurt less, to have 460 00:29:06,140 --> 00:29:12,320 one gene making one protein instead of this one mRNA 461 00:29:12,320 --> 00:29:15,310 encoding multiple proteins? 462 00:29:15,310 --> 00:29:16,620 STUDENT: They're related to one another. 463 00:29:16,620 --> 00:29:17,610 ERIC LANDER: They're related. 464 00:29:17,610 --> 00:29:18,751 Tell me about that. 465 00:29:18,751 --> 00:29:20,329 STUDENT: Well, like their functions, if 466 00:29:20,329 --> 00:29:21,460 they had similar functions. 467 00:29:21,460 --> 00:29:23,220 ERIC LANDER: If they had similar functions, it might be 468 00:29:23,220 --> 00:29:27,370 very efficient to have regulatory controls that made 469 00:29:27,370 --> 00:29:28,410 one message. 470 00:29:28,410 --> 00:29:30,270 And then I get all the proteins made together, rather 471 00:29:30,270 --> 00:29:32,340 than having separate regulatory 472 00:29:32,340 --> 00:29:33,960 controls for each message. 473 00:29:33,960 --> 00:29:36,890 Suppose I'm a prokaryote, and I'm dividing very rapidly, and 474 00:29:36,890 --> 00:29:38,510 I care about having a small genome. 475 00:29:38,510 --> 00:29:40,850 It might be very efficient to do it this way. 476 00:29:40,850 --> 00:29:43,170 So what related genes might you put 477 00:29:43,170 --> 00:29:45,862 together on the same message? 
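[Editor's sketch] The polycistronic reading described above, where a ribosome sits down at each ribosome binding site and starts at the first AUG after it, can be illustrated roughly as follows. This is a toy model, not how the lecture presents it: the motif AGGAGG (the Shine-Dalgarno consensus), the function name proteins_from_message, and the example message are all assumptions made for illustration, and real ribosome binding is considerably more subtle.

```python
# Toy model of a polycistronic (multi-protein) prokaryotic message.
# Assumptions (not from the lecture): the binding-site motif AGGAGG,
# the function name, and the example message below.

BASES = "UCAG"
# Standard genetic code, codons ordered U, C, A, G at each position; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def proteins_from_message(mrna, rbs="AGGAGG"):
    """Return one protein per ribosome binding site found in the message."""
    proteins = []
    search_from = 0
    while True:
        site = mrna.find(rbs, search_from)
        if site == -1:
            break
        # Scan for the first AUG after the binding site.
        start = mrna.find("AUG", site + len(rbs))
        if start == -1:
            break
        # Translate codon by codon until a stop codon or the end of the message.
        protein = []
        for pos in range(start, len(mrna) - 2, 3):
            residue = CODON_TABLE[mrna[pos:pos + 3]]
            if residue == "*":
                break
            protein.append(residue)
        proteins.append("".join(protein))
        search_from = site + len(rbs)
    return proteins

# A hypothetical message with two binding sites, each followed by a short gene.
message = "AGGAGGACCAUGAAAUAACCAGGAGGUAUGGGCUGGUAG"
print(proteins_from_message(message))  # ['MK', 'MGW']
```

One message, two binding sites, two separate proteins: a single regulatory control on the message switches both on together.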
478 00:29:45,862 --> 00:29:46,690 STUDENT: Pathway. 479 00:29:46,690 --> 00:29:48,120 STUDENT: Same pathway. 480 00:29:48,120 --> 00:29:51,540 ERIC LANDER: Maybe if you're in charge of the committee, you would say, 481 00:29:51,540 --> 00:29:54,410 I would like to arrange the genes encoding the multiple 482 00:29:54,410 --> 00:29:56,780 steps of a biochemical pathway. 483 00:29:56,780 --> 00:29:58,430 And that's often what happens. 484 00:29:58,430 --> 00:29:59,830 That's exactly what happens. 485 00:29:59,830 --> 00:30:09,190 So you can get co-regulation of genes, of 486 00:30:09,190 --> 00:30:10,440 genes in the same pathway. 487 00:30:14,620 --> 00:30:20,170 And such a thing is called a polycistronic message. 488 00:30:20,170 --> 00:30:23,470 Polycistronic is an unhelpful name for it, but it's what 489 00:30:23,470 --> 00:30:28,740 they call it sometimes, or an operon, a regulatory thing 490 00:30:28,740 --> 00:30:31,670 that makes many such things. 491 00:30:31,670 --> 00:30:34,280 Now, why do bacteria do it? 492 00:30:34,280 --> 00:30:36,060 To minimize DNA. 493 00:30:36,060 --> 00:30:39,080 They just want to use as little space as possible. 494 00:30:39,080 --> 00:30:42,260 So once it's invented a regulatory control that turns 495 00:30:42,260 --> 00:30:45,690 on this message when I want to make arginine, well, just 496 00:30:45,690 --> 00:30:47,410 stick all the genes on the same message. 497 00:30:47,410 --> 00:30:52,110 It's cheaper, simpler, less DNA needed. 498 00:30:52,110 --> 00:30:54,980 Now, who really has trouble with the amount of 499 00:30:54,980 --> 00:30:58,410 DNA they can have? 500 00:30:58,410 --> 00:31:00,760 Do you have trouble with-- 501 00:31:00,760 --> 00:31:05,270 Is replicating your DNA a rate limiting step in your 502 00:31:05,270 --> 00:31:09,010 replication as an organism? 503 00:31:09,010 --> 00:31:09,150 No.
504 00:31:09,150 --> 00:31:12,340 What is the rate limiting step in your organismal 505 00:31:12,340 --> 00:31:15,564 replication, your having offspring? 506 00:31:15,564 --> 00:31:16,380 STUDENT: [INAUDIBLE]. 507 00:31:16,380 --> 00:31:18,223 ERIC LANDER: Sorry? 508 00:31:18,223 --> 00:31:20,870 STUDENT: When it comes to an age that-- 509 00:31:20,870 --> 00:31:23,390 ERIC LANDER: It's graduating from MIT, getting a job, 510 00:31:23,390 --> 00:31:24,665 things like that, right? 511 00:31:24,665 --> 00:31:26,370 And DNA replication is not rate 512 00:31:26,370 --> 00:31:27,420 limiting, actually, right? 513 00:31:27,420 --> 00:31:30,910 Getting a degree, all that, that's rate limiting. 514 00:31:30,910 --> 00:31:34,830 But for bacteria, who are reproducing not every 20 years 515 00:31:34,830 --> 00:31:37,970 but every 20 minutes, DNA replication is a 516 00:31:37,970 --> 00:31:38,890 rate limiting step. 517 00:31:38,890 --> 00:31:40,730 It's important to their replication. 518 00:31:40,730 --> 00:31:42,290 And they want compact genomes. 519 00:31:42,290 --> 00:31:43,750 You, you don't care about a compact genome. 520 00:31:43,750 --> 00:31:47,350 It's simply not a big metabolic cost to you. 521 00:31:47,350 --> 00:31:48,240 Viruses-- 522 00:31:48,240 --> 00:31:49,730 think about viruses. 523 00:31:49,730 --> 00:31:51,940 Their genomes are tiny. 524 00:31:51,940 --> 00:31:54,710 They're much tinier even than prokaryotes. 525 00:31:54,710 --> 00:31:58,040 Typical viruses might have a few thousand or tens of 526 00:31:58,040 --> 00:31:59,290 thousands of bases. 527 00:32:02,460 --> 00:32:04,500 Bacteria were millions of bases. 528 00:32:04,500 --> 00:32:07,570 Viruses sometimes might only have 10,000 529 00:32:07,570 --> 00:32:10,370 letters of nucleic acid. 530 00:32:10,370 --> 00:32:14,825 They really have to use their information very compactly. 531 00:32:14,825 --> 00:32:17,010 So you know what viruses do sometimes? 
532 00:32:17,010 --> 00:32:18,690 Not all, but some? 533 00:32:18,690 --> 00:32:21,760 Some viruses make a messenger RNA. 534 00:32:24,900 --> 00:32:29,070 They have a ribosome binding site right here. 535 00:32:29,070 --> 00:32:31,720 And they start translating the message-- 536 00:32:34,450 --> 00:32:35,660 let's give it a sequence. 537 00:32:35,660 --> 00:32:49,820 ACUUGAGCAA, and we'll put an AUG in front of that. 538 00:32:49,820 --> 00:32:51,810 They can start translating here. 539 00:33:00,461 --> 00:33:02,000 Whoops, I'm just going to fix that. 540 00:33:07,890 --> 00:33:10,010 But you know what? 541 00:33:10,010 --> 00:33:14,160 There's also an AUG over here. 542 00:33:14,160 --> 00:33:17,810 They could start using that one. 543 00:33:17,810 --> 00:33:19,430 And you know what else? 544 00:33:19,430 --> 00:33:23,640 If they found another AUG off frame, some of 545 00:33:23,640 --> 00:33:25,580 them use that too. 546 00:33:25,580 --> 00:33:31,370 Remember, the first AUG in normal messages gets used to 547 00:33:31,370 --> 00:33:33,830 set the phase of the codons. 548 00:33:33,830 --> 00:33:37,970 But in theory, I could be reading those codons in a 549 00:33:37,970 --> 00:33:41,540 different reading frame, shifted over by one or shifted 550 00:33:41,540 --> 00:33:43,180 over by two. 551 00:33:43,180 --> 00:33:48,050 Some viruses are so clever that they encode three 552 00:33:48,050 --> 00:33:52,440 different proteins smack on top of each other by reading 553 00:33:52,440 --> 00:33:54,865 the same nucleic acid sequence 554 00:33:54,865 --> 00:33:57,720 shifted three different ways. 555 00:34:00,910 --> 00:34:02,920 It makes your head hurt to think, how could I make a 556 00:34:02,920 --> 00:34:03,900 functional protein? 557 00:34:03,900 --> 00:34:06,530 I've got to manage to get three separate functional 558 00:34:06,530 --> 00:34:11,310 proteins built out of a single nucleic acid sequence. 559 00:34:11,310 --> 00:34:12,030 One, it's easy.
560 00:34:12,030 --> 00:34:13,980 Just tell me the amino acids. 561 00:34:13,980 --> 00:34:15,810 Now this becomes an interesting puzzle. 562 00:34:15,810 --> 00:34:19,190 Can I make a sequence where, read one out of frame, it also 563 00:34:19,190 --> 00:34:20,480 makes a protein I want? 564 00:34:20,480 --> 00:34:21,760 And another one where-- 565 00:34:21,760 --> 00:34:25,639 And some viruses have evolved to do that, showing you how 566 00:34:25,639 --> 00:34:29,239 much they care about economizing their DNA. 567 00:34:29,239 --> 00:34:29,719 All right. 568 00:34:29,719 --> 00:34:30,820 So what have we got? 569 00:34:30,820 --> 00:34:33,699 We have DNA replicates. 570 00:34:33,699 --> 00:34:35,530 We've seen how it replicates. 571 00:34:35,530 --> 00:34:39,900 It's mostly the same by nucleic acid copying, but all these 572 00:34:39,900 --> 00:34:40,820 variations-- 573 00:34:40,820 --> 00:34:45,889 linears, circulars, single strand, double strand, RNA, 574 00:34:45,889 --> 00:34:50,239 reverse transcriptase, transcription, translation. 575 00:34:50,239 --> 00:34:52,070 Look over these examples. 576 00:34:52,070 --> 00:34:55,460 And what they should do is help solidify how these things 577 00:34:55,460 --> 00:34:59,040 work and how they're used in slightly different fashions. 578 00:34:59,040 --> 00:35:00,290 Next time.
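[Editor's sketch] The three reading frames mentioned in the viral example can be made concrete with a few lines of Python. This is a minimal sketch under stated assumptions: the function name translate_frame is hypothetical, the sequence is the one as first spoken in the lecture (AUG placed in front of ACUUGAGCAA, before the board correction, whose content the transcript does not record), and the translation simply reads every codon in a frame without modeling start or stop signals.

```python
# Read one RNA sequence in each of its three reading frames.
# Assumption: translate_frame is an illustrative helper, not from the lecture.

BASES = "UCAG"
# Standard genetic code, codons ordered U, C, A, G at each position; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_frame(rna, offset):
    """Translate every complete codon of rna, starting at offset (0, 1, or 2)."""
    return "".join(
        CODON_TABLE[rna[pos:pos + 3]]
        for pos in range(offset, len(rna) - 2, 3)
    )

sequence = "AUG" + "ACUUGAGCAA"  # the sequence as first spoken in the lecture
for offset in range(3):
    print(offset, translate_frame(sequence, offset))
# 0 MT*A
# 1 *LEQ
# 2 DLS
```

The same thirteen letters give three unrelated amino acid strings, which is why getting a useful protein out of more than one frame at once is such a tight constraint on the sequence.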