1 00:00:00,000 --> 00:00:06,000 By the time that Watson and Crick figured out the structure of DNA, 2 00:00:06,000 --> 00:00:12,000 you know, it was sort of obvious that since the two strands were 3 00:00:12,000 --> 00:00:18,000 complementary you could see how it replicated. And they also could see 4 00:00:18,000 --> 00:00:24,000 that somehow the information must be encoded in the sequence of letters 5 00:00:24,000 --> 00:00:30,000 down the strands of the DNA. But it wasn't obvious what the code 6 00:00:30,000 --> 00:00:36,000 was and how it was arranged, how it worked. And in principle it 7 00:00:36,000 --> 00:00:42,000 was anything you could do with four letters. And so I pointed out 8 00:00:42,000 --> 00:00:47,000 the other day this was sort of a four-letter alphabet. 9 00:00:47,000 --> 00:00:53,000 And I think it's useful to think of it this way with A, 10 00:00:53,000 --> 00:00:59,000 G, C and T, and RNA as also being a four-letter alphabet. 11 00:00:59,000 --> 00:01:05,000 But proteins are actually a 20-letter alphabet because there are 12 00:01:05,000 --> 00:01:12,000 20 different amino acids. And so, since this was one of the 13 00:01:12,000 --> 00:01:19,000 key things that the DNA had to do, it somehow had to encode the 14 00:01:19,000 --> 00:01:26,000 information for making the proteins. And there was a lot of work on 15 00:01:26,000 --> 00:01:32,000 protein biosynthesis at the time. And it looked pretty complicated. 16 00:01:32,000 --> 00:01:36,000 People had found that RNA seemed to be important. Cells that were 17 00:01:36,000 --> 00:01:41,000 making lots of protein had lots of RNA in them. And another thing they 18 00:01:41,000 --> 00:01:45,000 noticed was that if you looked in eukaryotic cells the DNA stayed in 19 00:01:45,000 --> 00:01:49,000 the nucleus. The proteins, most of them, were out in the 20 00:01:49,000 --> 00:01:54,000 cytoplasm. And the evidence was that they were made out in the 21 00:01:54,000 --> 00:01:58,000 cytoplasm.
So somehow the information had to get out of the 22 00:01:58,000 --> 00:02:03,000 nucleus where the DNA was and into the cytoplasm. 23 00:02:03,000 --> 00:02:06,000 And biochemists were breaking cells open and trying to make cellular 24 00:02:06,000 --> 00:02:10,000 extracts that would synthesize proteins. And I think it's fair to 25 00:02:10,000 --> 00:02:14,000 say at the time that it looked extremely complicated. 26 00:02:14,000 --> 00:02:18,000 And so thinking about how DNA encoded information and got 27 00:02:18,000 --> 00:02:22,000 translated into proteins was a very complex issue. 28 00:02:22,000 --> 00:02:26,000 But then actually there was a very interesting development that had a 29 00:02:26,000 --> 00:02:30,000 strong influence on Watson and Crick and led to them, 30 00:02:30,000 --> 00:02:34,000 Crick in particular, getting a key insight into the 31 00:02:34,000 --> 00:02:38,000 nature of this coding problem. There's a physicist, 32 00:02:38,000 --> 00:02:43,000 George Gamow, who some of you know. He proposed the "Big Bang Theory". 33 00:02:43,000 --> 00:02:48,000 A very strong theoretical physicist. And he wrote a letter to Watson and 34 00:02:48,000 --> 00:02:53,000 Crick. He thought he'd figured out the basis of the genetic code. 35 00:02:53,000 --> 00:02:58,000 And his idea was you had these sequences of A, G, C and Ts. 36 00:02:58,000 --> 00:03:01,000 And so everywhere the two bases came together there was sort of like a 37 00:03:01,000 --> 00:03:05,000 little different shaped hole. So his idea was the amino acids 38 00:03:05,000 --> 00:03:09,000 would stick into these little holes. And he had a theory showing that 39 00:03:09,000 --> 00:03:13,000 you could encode the sequence of proteins by having the side chains 40 00:03:13,000 --> 00:03:17,000 in the amino acids stick into these little holes along the DNA. 41 00:03:17,000 --> 00:03:21,000 Now, there turned out to be a number of problems with that.
42 00:03:21,000 --> 00:03:25,000 It didn't take into account the involvement of RNA, 43 00:03:25,000 --> 00:03:29,000 which there was quite a bit of evidence for. 44 00:03:29,000 --> 00:03:32,000 And more importantly it didn't take into account the structure of the 45 00:03:32,000 --> 00:03:36,000 side chains of the amino acids, which you guys have been exposed to. 46 00:03:36,000 --> 00:03:39,000 But it had a very profound influence on Watson and Crick. 47 00:03:39,000 --> 00:03:43,000 They read this letter. They immediately realized the idea was 48 00:03:43,000 --> 00:03:47,000 wrong and went out and had lunch at a pub, decided again how they 49 00:03:47,000 --> 00:03:50,000 actually thought there were 25 amino acids, but they realized some of 50 00:03:50,000 --> 00:03:54,000 them were just sort of special ones that were modified only in 51 00:03:54,000 --> 00:03:58,000 particular proteins and there were really 20 amino acids that were 52 00:03:58,000 --> 00:04:02,000 found universally in nature. 53 00:04:02,000 --> 00:04:05,000 And what they, Crick in particular, 54 00:04:05,000 --> 00:04:09,000 realized was that maybe instead of having to think about protein 55 00:04:09,000 --> 00:04:12,000 synthesis through this very complex set of extracts and mixtures a 56 00:04:12,000 --> 00:04:16,000 biochemist would work on, he could think about it at a 57 00:04:16,000 --> 00:04:20,000 purely theoretical level, which basically is up at this kind 58 00:04:20,000 --> 00:04:23,000 of level. If you have a molecule that has four letters and 59 00:04:23,000 --> 00:04:27,000 it's going to be encoding proteins, how does it do it? 60 00:04:27,000 --> 00:04:31,000 Can I work out sort of the basis or a possible theory for how that could 61 00:04:31,000 --> 00:04:35,000 happen without actually knowing all of the biochemical details? 62 00:04:35,000 --> 00:04:40,000 So Crick made a couple of simplifying assumptions.
63 00:04:40,000 --> 00:04:45,000 One was that the DNA only determined -- 64 00:04:45,000 --> 00:04:56,000 -- the linear sequence of amino 65 00:04:56,000 --> 00:05:05,000 acids in a protein. That all this information about the 66 00:05:05,000 --> 00:05:09,000 3-dimensional stuff came from the properties of the linear sequence 67 00:05:09,000 --> 00:05:14,000 once it was made. And I think you hopefully have 68 00:05:14,000 --> 00:05:18,000 enough understanding of hydrophobic and other sorts of interactions that 69 00:05:18,000 --> 00:05:23,000 would cause a linear sequence of amino acids to take a particular 70 00:05:23,000 --> 00:05:27,000 conformation. And the other assumption he made was that 71 00:05:27,000 --> 00:05:32,000 it must be universal. And it would be hard to see how life 72 00:05:32,000 --> 00:05:36,000 could have started if there wasn't some kind of code that was universal 73 00:05:36,000 --> 00:05:41,000 between organisms. And if you start from those kinds 74 00:05:41,000 --> 00:05:46,000 of considerations then what you can see is you cannot just have a 75 00:05:46,000 --> 00:05:50,000 one-to-one correspondence between a letter in the nucleic acid alphabet 76 00:05:50,000 --> 00:05:55,000 and a letter down here. If A stood for valine that would be 77 00:05:55,000 --> 00:06:00,000 fine, but you could only code for four amino acids that way. 78 00:06:00,000 --> 00:06:07,000 So if you had one-letter words in DNA there are four possibilities. 79 00:06:07,000 --> 00:06:14,000 And so it could only make four. If you had two-letter words then 80 00:06:14,000 --> 00:06:21,000 you'd have 16 possibilities, still not enough for all the amino 81 00:06:21,000 --> 00:06:29,000 acids. If you had a three-letter word -- 82 00:06:29,000 --> 00:06:36,000 -- then you could do 64, 83 00:06:36,000 --> 00:06:40,000 and in principle that would be all you'd need. It doesn't rule out 84 00:06:40,000 --> 00:06:44,000 that there could be five or six or seven-letter words.
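The counting argument just given (4, then 16, then 64 possible words) can be checked with a few lines of Python. This is only an illustrative sketch; the function names `words_available` and `min_word_length` are made up for this example and do not come from the lecture.

```python
def words_available(alphabet_size: int, word_length: int) -> int:
    """Number of distinct words of a given length over an alphabet."""
    return alphabet_size ** word_length


def min_word_length(alphabet_size: int, meanings_needed: int) -> int:
    """Shortest fixed word length that gives at least `meanings_needed` words."""
    length = 1
    while words_available(alphabet_size, length) < meanings_needed:
        length += 1
    return length


if __name__ == "__main__":
    # Four nucleotide letters, 20 amino acids to specify.
    for n in (1, 2, 3):
        print(f"{n}-letter words: {words_available(4, n)}")
    print("shortest workable word length:", min_word_length(4, 20))
```

As in the lecture, this shows only that three letters are the minimum that suffices; it says nothing about whether longer words are actually used.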
85 00:06:44,000 --> 00:06:47,000 Or if you think about this as they were thinking about it at the time, 86 00:06:47,000 --> 00:06:51,000 even if it were let's say a three-letter word, 87 00:06:51,000 --> 00:06:55,000 is it a code where you have one word, then the next word, 88 00:06:55,000 --> 00:06:59,000 then the next word? Or could it be an overlapping word? And 89 00:06:59,000 --> 00:07:03,000 what about punctuation? And maybe another thing, 90 00:07:03,000 --> 00:07:07,000 you can see if it's AG, CT, etc., there's a frame of reference 91 00:07:07,000 --> 00:07:11,000 problem, because if I'm going to read them in groups of three, 92 00:07:11,000 --> 00:07:15,000 if I start here I'll get one word, but if I start one letter over the 93 00:07:15,000 --> 00:07:19,000 next group of three won't be the same. So somehow there would have 94 00:07:19,000 --> 00:07:23,000 to be a starting point. And so these are the sort of 95 00:07:23,000 --> 00:07:27,000 considerations that they had to take into account. And, in fact, 96 00:07:27,000 --> 00:07:32,000 Watson, excuse me. Francis Crick and another scientist 97 00:07:32,000 --> 00:07:37,000 Sydney Brenner and some other scientists worked out a very elegant 98 00:07:37,000 --> 00:07:42,000 genetic experiment that demonstrated that it was a three-letter code. 99 00:07:42,000 --> 00:07:47,000 And I don't have the time to go into it in this course. 100 00:07:47,000 --> 00:07:53,000 If you take a genetics course it's a very beautiful experiment. 101 00:07:53,000 --> 00:07:58,000 The principle of the thing, which I could show you rather easily, 102 00:07:58,000 --> 00:08:03,000 is if you're writing a thing where you're reading in three-letter words, 103 00:08:03,000 --> 00:08:08,000 something like this. The cat ran out and, 104 00:08:08,000 --> 00:08:12,000 I don't know, ate the rat or something like that. 
105 00:08:12,000 --> 00:08:16,000 And these were all just continuously run together, 106 00:08:16,000 --> 00:08:20,000 not separated out, but I've put them out here. As you can see they're 107 00:08:20,000 --> 00:08:24,000 three-letter words. If you lost one letter then it 108 00:08:24,000 --> 00:08:28,000 would change to sort of gibberish. You'd get stuff that looked like 109 00:08:28,000 --> 00:08:33,000 this. And if you put one in you'd have the 110 00:08:33,000 --> 00:08:39,000 same problem, but if you were to either take out three letters or put 111 00:08:39,000 --> 00:08:45,000 in three letters then, even though there'd be a little mess 112 00:08:45,000 --> 00:08:51,000 in here somewhere, say I took out two more of these, 113 00:08:51,000 --> 00:08:57,000 what we would have from then on is that the rest of it would now 114 00:08:57,000 --> 00:09:01,000 make sense again. And they did this sort of experiment 115 00:09:01,000 --> 00:09:04,000 genetically. They managed to figure out there were two kinds of 116 00:09:04,000 --> 00:09:07,000 mutations they could get in a particular way. 117 00:09:07,000 --> 00:09:10,000 Some were putting in a letter. Some were taking out a letter. And 118 00:09:10,000 --> 00:09:13,000 they didn't know at the time whether they were adding or deleting, 119 00:09:13,000 --> 00:09:16,000 but they could tell they were in opposite directions. 120 00:09:16,000 --> 00:09:19,000 And then they found if they took three of one class, 121 00:09:19,000 --> 00:09:22,000 like three that would delete a letter, and put them all together 122 00:09:22,000 --> 00:09:25,000 then things would more or less work. Or if they put in three that stuck in 123 00:09:25,000 --> 00:09:28,000 an extra letter then everything would more or less work. 124 00:09:28,000 --> 00:09:34,000 So there was a genetic proof of the three-letter part of the code before 125 00:09:34,000 --> 00:09:41,000 it was figured out exactly how the code itself worked.
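The reading-frame argument can be played out on the same kind of sentence used on the board. The sketch below is an analogy in text only, not a model of the actual genetic experiment; the helper names `read_in_threes` and `delete_letters`, and the particular positions deleted, are assumptions chosen for illustration.

```python
def read_in_threes(text: str) -> list[str]:
    """Split a run-together string into consecutive three-letter words."""
    return [text[i:i + 3] for i in range(0, len(text), 3)]


def delete_letters(text: str, positions: list[int]) -> str:
    """Return `text` with the characters at the given 0-based positions removed."""
    drop = set(positions)
    return "".join(ch for i, ch in enumerate(text) if i not in drop)


if __name__ == "__main__":
    sentence = "THECATRANOUTANDATETHERAT"  # THE CAT RAN OUT AND ATE THE RAT

    print(read_in_threes(sentence))
    # One deletion shifts the frame, so everything downstream is gibberish.
    print(read_in_threes(delete_letters(sentence, [3])))
    # Three nearby deletions leave a short mess, then the frame is restored.
    print(read_in_threes(delete_letters(sentence, [3, 7, 11])))
```

With one letter removed, every word after the deletion changes. With three letters removed, only the words around the deletions are scrambled and the tail of the sentence reads correctly again, which is the behavior expected of a three-letter code.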
126 00:09:41,000 --> 00:09:48,000 And so going from this sort of theoretical insight into the code to 127 00:09:48,000 --> 00:09:55,000 actually figuring out how proteins were made there was still quite a 128 00:09:55,000 --> 00:10:02,000 lot of stuff that had to happen. And one was the concept of 129 00:10:02,000 --> 00:10:08,000 messenger RNA. As I said, there'd been quite a lot 130 00:10:08,000 --> 00:10:13,000 of evidence that RNA was somehow involved in protein synthesis 131 00:10:13,000 --> 00:10:17,000 because cells that made a lot of protein made a lot of RNA. 132 00:10:17,000 --> 00:10:22,000 And it seemed to be in the right sort of place in the cell for the 133 00:10:22,000 --> 00:10:27,000 proteins to be made. So the idea emerged that RNA was 134 00:10:27,000 --> 00:10:32,000 somehow a carrier of information from the DNA to the cytoplasm. 135 00:10:32,000 --> 00:10:39,000 So it could serve as a template for making proteins. 136 00:10:39,000 --> 00:10:47,000 So the idea was that the cell copied the sequence of a portion -- 137 00:10:47,000 --> 00:10:56,000 -- of the DNA. 138 00:10:56,000 --> 00:11:02,000 And we'd probably think of this as a gene right now. 139 00:11:02,000 --> 00:11:07,000 Into RNA. And the RNA would go into the cytoplasm. 140 00:11:07,000 --> 00:11:13,000 That's the part outside the nucleus. And then it would serve 141 00:11:13,000 --> 00:11:25,000 as a template -- 142 00:11:25,000 --> 00:11:30,000 -- for protein synthesis. Because of this thought that if you 143 00:11:30,000 --> 00:11:35,000 had a cell like this with a nucleus and the DNA in here, 144 00:11:35,000 --> 00:11:40,000 that if a piece of RNA were to go out into the cytoplasm and have 145 00:11:40,000 --> 00:11:46,000 those properties it would be functioning more or less as a 146 00:11:46,000 --> 00:11:51,000 messenger. It would be carrying the genetic information from inside the 147 00:11:51,000 --> 00:11:56,000 nucleus out into the cytoplasm.
And so the term began to be used of 148 00:11:56,000 --> 00:12:02,000 a messenger RNA. And so over here I'll put an mRNA to 149 00:12:02,000 --> 00:12:08,000 indicate that. Now, one thing you can also see is 150 00:12:08,000 --> 00:12:15,000 we've talked about the structure of DNA and RNA. And it's essentially 151 00:12:15,000 --> 00:12:22,000 the same with one difference. This is the nucleotide, 152 00:12:22,000 --> 00:12:29,000 which is the fundamental building block of DNA. 153 00:12:29,000 --> 00:12:34,000 And if you recall, in DNA there's a hydroxyl, 154 00:12:34,000 --> 00:12:40,000 excuse me, a hydrogen there, but in RNA there is this extra 155 00:12:40,000 --> 00:12:45,000 hydroxyl. This is 1 prime, 2 prime, 3 prime, 4 prime, excuse me. 156 00:12:45,000 --> 00:12:51,000 Let's just leave it like that for the moment, 1, 2, 3, 157 00:12:51,000 --> 00:12:57,000 4, 5. And so the DNA, as you heard, was deoxyribonucleic acid 158 00:12:57,000 --> 00:13:03,000 because it's missing this. But other than that the backbones 159 00:13:03,000 --> 00:13:11,000 are similar and the letters are almost the same. 160 00:13:11,000 --> 00:13:19,000 The A, the G and the C are exactly the same bases in DNA and RNA. 161 00:13:19,000 --> 00:13:27,000 The only difference is with the T and the uracil. 162 00:13:27,000 --> 00:13:35,000 So this is thymine which is found in DNA. 163 00:13:35,000 --> 00:13:45,000 And this is uracil -- 164 00:13:45,000 --> 00:13:55,000 -- which is found in -- 165 00:13:55,000 --> 00:14:00,000 -- RNA. So the base pairing is over on this part of the molecule. 166 00:14:00,000 --> 00:14:06,000 So whether or not you have a methyl group doesn't really change the base 167 00:14:06,000 --> 00:14:12,000 pairing.
And so this process of copying information in DNA to 168 00:14:12,000 --> 00:14:18,000 information that's in RNA was seen as essentially the same kind of 169 00:14:18,000 --> 00:14:24,000 language, but it's just sort of like taking somebody's word processor 170 00:14:24,000 --> 00:14:28,000 file and writing it out longhand. You'd be transcribing the 171 00:14:28,000 --> 00:14:32,000 information but it would be essentially the same kind of 172 00:14:32,000 --> 00:14:36,000 information in essentially the same form. So this is known 173 00:14:36,000 --> 00:14:44,000 as transcription. 174 00:14:44,000 --> 00:14:48,000 I'll take just one very brief thing. Some of you may wonder why did 175 00:14:48,000 --> 00:14:53,000 nature do it this way? Why didn't it just use uracil in 176 00:14:53,000 --> 00:14:58,000 DNA? So as a very brief aside, I think we understand pretty much 177 00:14:58,000 --> 00:15:04,000 why it does it. And that is cytosine has this 178 00:15:04,000 --> 00:15:11,000 structure. So this is C which is found in DNA but it undergoes, 179 00:15:11,000 --> 00:15:18,000 all of your DNA is a chemical and it's able to undergo spontaneous 180 00:15:18,000 --> 00:15:25,000 kinds of damage. In fact, in every one of our human 181 00:15:25,000 --> 00:15:32,000 cells every day, 10,000 times in any given cell a 182 00:15:32,000 --> 00:15:39,000 base falls off totally just leaving the deoxyribose sitting there. 183 00:15:39,000 --> 00:15:44,000 And the cells have to fix it up. And we have DNA repair systems that 184 00:15:44,000 --> 00:15:50,000 do that. But another very common kind of thing that happens is that 185 00:15:50,000 --> 00:15:56,000 this NH2 group deaminates. And if you do that, if a C happens 186 00:15:56,000 --> 00:16:02,000 to deaminate in DNA it gives you a uracil.
187 00:16:02,000 --> 00:16:07,000 And if that ever happens, the cell is actually able to tell 188 00:16:07,000 --> 00:16:12,000 that something went wrong because uracil is not supposed to be in DNA 189 00:16:12,000 --> 00:16:18,000 and there are repair systems that constantly scan the DNA and take out 190 00:16:18,000 --> 00:16:23,000 any uracils that are in there. And the reason is, if instead of using 191 00:16:23,000 --> 00:16:29,000 thymine it used uracil then the cell wouldn't know whether the 192 00:16:29,000 --> 00:16:34,000 uracil got there because it was supposed to be there as part of the 193 00:16:34,000 --> 00:16:40,000 sequence or whether it had arisen by deamination of a cytosine. 194 00:16:40,000 --> 00:16:45,000 It's a minor point but I think we do have an understanding as to why 195 00:16:45,000 --> 00:16:51,000 there's thymine in DNA and uracil in RNA. This isn't such a worry in 196 00:16:51,000 --> 00:16:56,000 RNA. OK. But anyway. So there's still a really big 197 00:16:56,000 --> 00:17:02,000 problem here, though, that Watson and Crick and others 198 00:17:02,000 --> 00:17:07,000 were grappling with. And it has to do, 199 00:17:07,000 --> 00:17:11,000 as I say, with this fact that the information up here is first in 200 00:17:11,000 --> 00:17:16,000 DNA and RNA. It's written as a sequence of letters, 201 00:17:16,000 --> 00:17:20,000 if you will, chemical letters, but there are only four letters in 202 00:17:20,000 --> 00:17:24,000 the DNA alphabet and essentially the same four letters in 203 00:17:24,000 --> 00:17:29,000 the RNA alphabet. However, the protein language has 204 00:17:29,000 --> 00:17:34,000 got a totally different alphabet so it's somehow like sort of 205 00:17:34,000 --> 00:17:39,000 translating now from English to Japanese or something like that.
206 00:17:39,000 --> 00:17:44,000 Some really fundamental change had to happen because there was a real 207 00:17:44,000 --> 00:17:49,000 conversion from one kind of language to another. And so this process is 208 00:17:49,000 --> 00:17:54,000 known as translation, as going from information that's 209 00:17:54,000 --> 00:17:59,000 written using a four-letter nucleic acid alphabet to information that's 210 00:17:59,000 --> 00:18:05,000 written using a 20-letter amino acid alphabet. 211 00:18:05,000 --> 00:18:09,000 And Crick on purely theoretical grounds figured, 212 00:18:09,000 --> 00:18:14,000 well, if you're going from one language to another what do you need? 213 00:18:14,000 --> 00:18:19,000 You need a translator. And what's a translator? 214 00:18:19,000 --> 00:18:24,000 A translator is someone who speaks both languages. 215 00:18:24,000 --> 00:18:29,000 So his idea was that if there was -- I'm going to just separate out, 216 00:18:29,000 --> 00:18:34,000 let's say this is the messenger RNA. And I, just for clarity here, have 217 00:18:34,000 --> 00:18:39,000 spaced out the three-letter words so we can see them. 218 00:18:39,000 --> 00:18:44,000 These would be three letters like G-A-C or something like that in the RNA. 219 00:18:44,000 --> 00:18:49,000 That there would be some kind of translator. And his idea was that 220 00:18:49,000 --> 00:18:54,000 it would be something that had a particular amino acid at one end and 221 00:18:54,000 --> 00:19:00,000 it had the complementary nucleotides at the other end. 222 00:19:00,000 --> 00:19:05,000 So it could, if you will, read the genetic code that was 223 00:19:05,000 --> 00:19:11,000 written in the RNA using the nucleic acid alphabet, 224 00:19:11,000 --> 00:19:16,000 but it would also be speaking the amino acid language. 225 00:19:16,000 --> 00:19:22,000 Got the idea? So the idea was that this would be, 226 00:19:22,000 --> 00:19:28,000 they used the words adaptor or translator.
So that was on 227 00:19:28,000 --> 00:19:32,000 basically theoretical grounds. If you had to go from a four-letter 228 00:19:32,000 --> 00:19:36,000 language to a 20-letter language you needed some kind of translator or 229 00:19:36,000 --> 00:19:40,000 adaptor. Now, at that same time that these 230 00:19:40,000 --> 00:19:44,000 considerations were going on, biochemists began to find a class of 231 00:19:44,000 --> 00:19:54,000 small RNAs -- 232 00:19:54,000 --> 00:20:02,000 -- that had an amino acid -- 233 00:20:02,000 --> 00:20:08,000 -- attached. And so there were entities that had just the sort of 234 00:20:08,000 --> 00:20:14,000 properties that Crick had envisioned you'd need from theoretical 235 00:20:14,000 --> 00:20:20,000 considerations. These were given the name transfer 236 00:20:20,000 --> 00:20:26,000 RNAs or tRNAs as they're usually referred to now. 237 00:20:26,000 --> 00:20:31,000 And I've told you that RNA has, since it's got nucleic acid bases, 238 00:20:31,000 --> 00:20:37,000 if you have a single strand of either an RNA or a DNA and you don't 239 00:20:37,000 --> 00:20:42,000 have a complementary double strand, then if there are complementary 240 00:20:42,000 --> 00:20:48,000 sequences they can come together and pair just the same way that 241 00:20:48,000 --> 00:20:53,000 complementary sequences can come together in DNA. 242 00:20:53,000 --> 00:20:59,000 And in the case of tRNAs, once the sequence of these was 243 00:20:59,000 --> 00:21:05,000 determined, oops. There we go. They folded up into a 244 00:21:05,000 --> 00:21:11,000 clover leaf shape. And the amino acid is attached up 245 00:21:11,000 --> 00:21:17,000 at the 3 prime end of the chain up here in what's known as the acceptor 246 00:21:17,000 --> 00:21:23,000 part of the molecule. And so that corresponds to this 247 00:21:23,000 --> 00:21:30,000 part up here. And here is what's known as the anticodon.
248 00:21:30,000 --> 00:21:43,000 Each of these three-letter words -- 249 00:21:43,000 --> 00:21:50,000 -- in nucleic acid language is called a codon. And so something that 250 00:21:50,000 --> 00:21:57,000 had a complementary sequence to a codon was called an anticodon. 251 00:21:57,000 --> 00:22:05,000 So if G-G-G is the codon then C-C-C would be the anticodon. 252 00:22:05,000 --> 00:22:09,000 Now, this is just a schematic, as you can see. It shows where the 253 00:22:09,000 --> 00:22:14,000 hydrogen bonds are that form this stuff. When the crystal structures 254 00:22:14,000 --> 00:22:19,000 were done, the first crystal structure of tRNA was actually done 255 00:22:19,000 --> 00:22:24,000 by Alex Rich. He's in the Biology Department at MIT. 256 00:22:24,000 --> 00:22:29,000 And he was in this picture I showed you talking to Matt Meselson. 257 00:22:29,000 --> 00:22:33,000 And although we cannot see this terribly well, 258 00:22:33,000 --> 00:22:37,000 maybe you could hit the lights here, the crystal structure showed that 259 00:22:37,000 --> 00:22:41,000 the molecule didn't look like a clover leaf as in there. 260 00:22:41,000 --> 00:22:45,000 It had more this shape. And I'll show you this more clearly in this 261 00:22:45,000 --> 00:22:49,000 picture. I showed you this little part of the thing when I was showing 262 00:22:49,000 --> 00:22:53,000 you how an RNA could form. For example, if you copy the gene 263 00:22:53,000 --> 00:22:57,000 encoding a tRNA and, for example, the sequence here in 264 00:22:57,000 --> 00:23:01,000 green is complementary to the sequence here, 265 00:23:01,000 --> 00:23:05,000 or the sequence here in sort of blue or purple is complementary 266 00:23:05,000 --> 00:23:08,000 to the sequence here.
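The codon and anticodon relationship defined a moment ago (G-G-G pairing with C-C-C) comes down to a base-by-base lookup. The sketch below is illustrative only; the names `RNA_PAIR`, `pairing_partner` and `anticodon_5_to_3` are invented for this example. It relies on standard Watson-Crick pairing in RNA (A with U, G with C) and on the fact that paired strands run antiparallel, so an anticodon written in the usual 5' to 3' direction is the reverse of the position-by-position partner.

```python
# Watson-Crick partners for RNA bases.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def pairing_partner(codon: str) -> str:
    """Bases that pair with each codon position, in the codon's own left-to-right order."""
    return "".join(RNA_PAIR[base] for base in codon)


def anticodon_5_to_3(codon: str) -> str:
    """The same anticodon written 5' to 3', i.e. reversed, since paired strands are antiparallel."""
    return pairing_partner(codon)[::-1]


if __name__ == "__main__":
    for codon in ("GGG", "GAC", "UUU"):
        print(codon, pairing_partner(codon), anticodon_5_to_3(codon))
```

For a codon like G-G-G the two ways of writing the anticodon look identical, which is why the board example does not need to worry about direction.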
What can happen then, 267 00:23:08,000 --> 00:23:12,000 if you allow a single-stranded RNA like this to fold up, 268 00:23:12,000 --> 00:23:16,000 thermodynamically it will then go to the lower energy state which 269 00:23:16,000 --> 00:23:20,000 involves being able to make these hydrogen bonds. 270 00:23:20,000 --> 00:23:24,000 And I think you can sort of see the clover leaf. Here's one of the 271 00:23:24,000 --> 00:23:28,000 leaves. The other is down here and the others. It's a little 272 00:23:28,000 --> 00:23:32,000 bit distorted here. And the reason is, 273 00:23:32,000 --> 00:23:36,000 because I'm going to continue now to show you how this structure, 274 00:23:36,000 --> 00:23:40,000 once you get to the clover leaf, then it folds up to make other kinds 275 00:23:40,000 --> 00:23:44,000 of interactions and it takes that shape with the amino acid going on at this 276 00:23:44,000 --> 00:23:48,000 end and the anticodon being down here. And what's happening now is 277 00:23:48,000 --> 00:23:52,000 they've morphed on the van der Waals surfaces so you can see what this 278 00:23:52,000 --> 00:23:56,000 would look like, 3-dimensional shape. 279 00:23:56,000 --> 00:24:00,000 The amino acid would be attached at that end and there is the anticodon 280 00:24:00,000 --> 00:24:04,000 that would be able to recognize the codon in the RNA. 281 00:24:04,000 --> 00:24:11,000 I mean the physical reality is pretty close to this simple little 282 00:24:11,000 --> 00:24:18,000 depiction here. OK. So once this basic paradigm 283 00:24:18,000 --> 00:24:25,000 had been straightened out that gave rise to this idea then, 284 00:24:25,000 --> 00:24:32,000 putting it all together, that the information in DNA, 285 00:24:32,000 --> 00:24:39,000 that a portion of it would be copied into RNA and that would go out into 286 00:24:39,000 --> 00:24:45,000 the cytoplasm.
And then in the cytoplasm these 287 00:24:45,000 --> 00:24:51,000 translators, the tRNAs, would be able to decode, read the nucleic acid 288 00:24:51,000 --> 00:24:57,000 information and use that to determine the linear order of amino 289 00:24:57,000 --> 00:25:02,000 acids in a protein. Crick, when he came up with this, 290 00:25:02,000 --> 00:25:07,000 gave this the term "the central dogma". And people still use this 291 00:25:07,000 --> 00:25:12,000 term to apply to this idea of information flow going from DNA to 292 00:25:12,000 --> 00:25:17,000 RNA to protein. And it's still used to this day. 293 00:25:17,000 --> 00:25:23,000 There's actually sort of a little twist to that, 294 00:25:23,000 --> 00:25:28,000 because at the time that Crick proposed the term he actually 295 00:25:28,000 --> 00:25:33,000 thought that the word dogma meant "an idea for which there is not 296 00:25:33,000 --> 00:25:38,000 reasonable evidence". But he was sort of amused years 297 00:25:38,000 --> 00:25:43,000 later to realize that a more reasonable definition of dogma is that it 298 00:25:43,000 --> 00:25:47,000 is something that a true believer cannot doubt. So he kind of 299 00:25:47,000 --> 00:25:52,000 accidentally made an assertion that he was right, but fortunately he was 300 00:25:52,000 --> 00:26:03,000 right. Now -- 301 00:26:03,000 --> 00:26:06,000 -- the next big job, though, in working this out was to 302 00:26:06,000 --> 00:26:14,000 crack the code. 303 00:26:14,000 --> 00:26:19,000 And it's fine to know that it's a 3-letter code and it's fine to know 304 00:26:19,000 --> 00:26:25,000 it goes into RNA and then the tRNAs translate it, but if you cannot 305 00:26:25,000 --> 00:26:31,000 crack the code then you have no idea what any of the information means.
306 00:26:31,000 --> 00:26:34,000 It was sort of like before the Rosetta Stone they could look at the 307 00:26:34,000 --> 00:26:38,000 hieroglyphics in the Egyptian tombs and they could see that it was a lot 308 00:26:38,000 --> 00:26:42,000 of information and there were symbols and so on, 309 00:26:42,000 --> 00:26:46,000 but they didn't know what it meant until finally they got something 310 00:26:46,000 --> 00:26:50,000 that allowed them to relate it to a language they did know and they were 311 00:26:50,000 --> 00:26:54,000 able to work out the principles. So somehow scientists then had to 312 00:26:54,000 --> 00:26:58,000 crack the code. And there were two scientists who 313 00:26:58,000 --> 00:27:02,000 played a really big role. One was Marshall Nirenberg who was 314 00:27:02,000 --> 00:27:08,000 at NIH and is, in fact, still at NIH. 315 00:27:08,000 --> 00:27:14,000 And the other was a scientist who's on the same floor as me at MIT, 316 00:27:14,000 --> 00:27:20,000 Gobind Khorana. And they used two different approaches, 317 00:27:20,000 --> 00:27:26,000 but between these two approaches the genetic code was cracked. 318 00:27:26,000 --> 00:27:32,000 And what Nirenberg did was to take a protein synthesizing -- 319 00:27:32,000 --> 00:27:42,000 -- extract that he knew needed RNA 320 00:27:42,000 --> 00:27:47,000 in order to work. So that wasn't a surprise at this 321 00:27:47,000 --> 00:27:52,000 point because people were thinking the RNA would be the message. 322 00:27:52,000 --> 00:27:57,000 And at that point the ability to synthesize nucleic acids was 323 00:27:57,000 --> 00:28:03,000 quite limited compared to what we do now. 324 00:28:03,000 --> 00:28:08,000 And so there were different ways of making them. Sometimes you could do 325 00:28:08,000 --> 00:28:13,000 it enzymatically. But what Nirenberg, 326 00:28:13,000 --> 00:28:18,000 for example, was able to make was poly-U. So this was an RNA that was 327 00:28:18,000 --> 00:28:23,000 just UUUUUUU.
And then what he did was he set up 20 reactions, 328 00:28:23,000 --> 00:28:28,000 and in every reaction he put some of this extract, he put poly-U and he 329 00:28:28,000 --> 00:28:34,000 put 19 of the amino acids that were unlabeled. 330 00:28:34,000 --> 00:28:39,000 And then only one amino acid that had radiolabel in it. 331 00:28:39,000 --> 00:28:44,000 So he ran these 20 reactions and waited to see in any of these did he 332 00:28:44,000 --> 00:28:49,000 get protein made that would have been coded by the poly-U. 333 00:28:49,000 --> 00:28:55,000 And what he ended up with was polyphenylalanine. 334 00:28:55,000 --> 00:29:07,000 Which you may recall when we were 335 00:29:07,000 --> 00:29:15,000 talking about structures of amino acids, there's the basic backbone. 336 00:29:15,000 --> 00:29:23,000 And phenylalanine is the one that has, if you will, 337 00:29:23,000 --> 00:29:31,000 a benzene ring hanging off the end. And so what that meant was that UUU 338 00:29:31,000 --> 00:29:39,000 must code for Phe, or phenylalanine. 339 00:29:39,000 --> 00:29:45,000 And if it's UUU in the RNA that must mean that the DNA that encodes this 340 00:29:45,000 --> 00:29:51,000 must have that sequence AAA and TTT. And you can see that one of the two 341 00:29:51,000 --> 00:29:57,000 strands of the DNA, since T base-pairs the same as 342 00:29:57,000 --> 00:30:03,000 uracil, one of the strands in the DNA is going to have the same 343 00:30:03,000 --> 00:30:10,000 sequence as the RNA. 344 00:30:10,000 --> 00:30:14,000 Now, I'll just tell you one brief little anecdote. 345 00:30:14,000 --> 00:30:18,000 I heard Marshall Nirenberg at this meeting they had to celebrate the 346 00:30:18,000 --> 00:30:22,000 50th anniversary of the discovery of the structure of DNA. And he posed something that 347 00:30:22,000 --> 00:30:26,000 I'd never thought about in my years of teaching this but might occur to 348 00:30:26,000 --> 00:30:30,000 you guys if we put it on a problem set.
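The logic of the poly-U result, from the DNA strands through the RNA to the amino acid, can be traced in a short sketch. This is a toy illustration under stated assumptions: strand direction is ignored, the names `transcribe`, `coding_strand`, `translate` and `KNOWN_CODONS` are invented here, and the codon table holds only the single assignment established at this point in the lecture (UUU for phenylalanine). Every other codon is reported as unknown rather than guessed.

```python
# RNA base that pairs with each base of the DNA template strand.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Only the assignment established by the poly-U experiment.
KNOWN_CODONS = {"UUU": "Phe"}


def transcribe(template_strand: str) -> str:
    """Copy a DNA template strand into RNA, base by base (direction ignored)."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_strand)


def coding_strand(rna: str) -> str:
    """The DNA strand with the same sequence as the RNA, with T in place of U."""
    return rna.replace("U", "T")


def translate(rna: str) -> list[str]:
    """Read the RNA in threes; codons not yet deciphered are returned as '?'."""
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    return [KNOWN_CODONS.get(codon, "?") for codon in codons]


if __name__ == "__main__":
    template = "AAAAAAAAA"
    rna = transcribe(template)
    print("template strand:", template)
    print("RNA:            ", rna)
    print("coding strand:  ", coding_strand(rna))
    print("protein:        ", translate(rna))
```

The coding strand comes out as all T, matching the point that one DNA strand carries the same sequence as the RNA apart from T versus U.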
349 00:30:30,000 --> 00:30:34,000 You all know that benzene is nothing but 350 00:30:34,000 --> 00:30:38,000 this, as I call it, we even referred to it as a benzene ring, 351 00:30:38,000 --> 00:30:42,000 and it is a very organic kind of solvent. So if we put it on a problem set: 352 00:30:42,000 --> 00:30:46,000 if you've made polyphenylalanine would you expect this to be soluble 353 00:30:46,000 --> 00:30:50,000 in water? Well, this is very, very hydrophobic, 354 00:30:50,000 --> 00:30:54,000 very, very water-hating. And your answer would be correct 355 00:30:54,000 --> 00:30:58,000 if you said no, I wouldn't expect polyphenylalanine to be 356 00:30:58,000 --> 00:31:02,000 soluble in water. In fact, if it were in a protein 357 00:31:02,000 --> 00:31:06,000 you'd expect it to probably be in the core where all the hydrophobic 358 00:31:06,000 --> 00:31:10,000 interactions, the water-hating parts would go. So Marshall Nirenberg 359 00:31:10,000 --> 00:31:14,000 said in his talk, well, he had shown that he had 360 00:31:14,000 --> 00:31:19,000 radioactive phenylalanine, and he still had to prove chemically 361 00:31:19,000 --> 00:31:23,000 that he had polyphenylalanine. But he wasn't much of a biochemist 362 00:31:23,000 --> 00:31:27,000 so he walked down to the lab just below his at NIH and walked in the door and 363 00:31:27,000 --> 00:31:31,000 saw the first person he saw and said how do you solubilize 364 00:31:31,000 --> 00:31:35,000 polyphenylalanine? Just to make sure I got this right. 365 00:31:35,000 --> 00:31:39,000 And the guy said, oh, you just take 33% hydrobromic acid and glacial 366 00:31:39,000 --> 00:31:42,000 acetic acid and it works. So he went back upstairs and 367 00:31:42,000 --> 00:31:45,000 dissolved it. It turned out it dissolved in that. 368 00:31:45,000 --> 00:31:49,000 And he went on and characterized it.
And he said it didn't occur to him 369 00:31:49,000 --> 00:31:52,000 or he didn't learn until about 15 or 20 years later that he just walked 370 00:31:52,000 --> 00:31:56,000 up to the only person in the world who knew how to solubilize 371 00:31:56,000 --> 00:32:00,000 polyphenylalanine. By total coincidence this guy he 372 00:32:00,000 --> 00:32:04,000 had talked to had been working away trying to figure out a way and had 373 00:32:04,000 --> 00:32:08,000 come up with this odd mix of hydrobromic acid and glacial acetic acid. 374 00:32:08,000 --> 00:32:12,000 And he just said of all the places in the world, he walked up to the 375 00:32:12,000 --> 00:32:17,000 one person who knew and got the answer. So the other part of the 376 00:32:17,000 --> 00:32:21,000 story then involves Gobind Khorana who I mentioned when I was telling 377 00:32:21,000 --> 00:32:25,000 you initially about the Nobel Laureates at MIT. 378 00:32:25,000 --> 00:32:30,000 And Gobind is a brilliant organic chemist. He synthesized DNA. 379 00:32:30,000 --> 00:32:35,000 You know, there was a point where a whole issue of a journal came out 380 00:32:35,000 --> 00:32:40,000 and there was nothing but his lab's work on synthesizing DNA. 381 00:32:40,000 --> 00:32:45,000 Well, he was good at nucleic acids. And one of the strategies that they 382 00:32:45,000 --> 00:32:50,000 could use chemically was they would make something like a dinucleotide 383 00:32:50,000 --> 00:32:55,000 like CA. And then they were able to polymerize that to make a piece of 384 00:32:55,000 --> 00:33:00,000 RNA. So they could make an RNA that had the sequence CA, CA, 385 00:33:00,000 --> 00:33:05,000 CA, CA and so on. And what you can see from that is 386 00:33:05,000 --> 00:33:11,000 that there are two different codons in that. One is CAC and the other 387 00:33:11,000 --> 00:33:17,000 is ACA. And the reason he made that one was that he was synthesizing it by 388 00:33:17,000 --> 00:33:22,000 polymerizing dinucleotides.
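To make the point about the repeating CA polymer concrete, here is a minimal Python sketch, not anything from the original experiments. The helper name `codons_in_frame` and the choice of nine repeats are assumptions made purely for illustration. It reads the polymer in each of the three possible reading frames and shows that only two codons, CAC and ACA, ever turn up, and that they alternate.

```python
def codons_in_frame(rna: str, frame: int) -> list[str]:
    """Split an RNA string into complete three-letter codons,
    starting at offset `frame` (0, 1 or 2)."""
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


# Hypothetical stand-in for the synthetic polymer: CA repeated nine times.
poly_ca = "CA" * 9

for frame in range(3):
    codons = codons_in_frame(poly_ca, frame)
    print(frame, codons)
    # Whatever the frame, the same two codons appear, and they alternate.
    assert set(codons) == {"CAC", "ACA"}
    assert all(a != b for a, b in zip(codons, codons[1:]))
```

Because every frame gives the same alternating pair, the product is the same alternating pair of amino acids wherever reading starts, which is also why this experiment alone cannot say which codon goes with which amino acid.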
So in these same kinds of 389 00:33:22,000 --> 00:33:28,000 experiments I was describing before, what they found this synthesized was 390 00:33:28,000 --> 00:33:34,000 alternating histidine and threonine. 391 00:33:34,000 --> 00:33:39,000 And you cannot tell from that experiment alone which codon is which. 392 00:33:39,000 --> 00:33:44,000 One of those codons must be histidine and one of them must be threonine, 393 00:33:44,000 --> 00:33:49,000 but you cannot tell which from that experiment, so more experiments were 394 00:33:49,000 --> 00:33:54,000 needed. And what was learned from those in that case was 395 00:33:54,000 --> 00:34:00,000 that CAC corresponded to histidine and ACA corresponded to threonine. 396 00:34:00,000 --> 00:34:04,000 So these kinds of experiments were then put together to give what's 397 00:34:04,000 --> 00:34:09,000 known as the genetic code, which is the three-letter words encoded in 398 00:34:09,000 --> 00:34:13,000 DNA that specify the sequence of amino acids in proteins. 399 00:34:13,000 --> 00:34:18,000 And it's usually displayed as a table and you read it in this way. 400 00:34:18,000 --> 00:34:22,000 This thing over here is the first base in the codon, 401 00:34:22,000 --> 00:34:27,000 across the top is the second base in the codon, and down over here is the 402 00:34:27,000 --> 00:34:31,000 third base. So take, say, the one for histidine we 403 00:34:31,000 --> 00:34:36,000 were just showing you. C is the first letter. 404 00:34:36,000 --> 00:34:40,000 A is the second letter, so this is the box that we're going 405 00:34:40,000 --> 00:34:45,000 to be looking at. And if C is the third letter we can 406 00:34:45,000 --> 00:34:50,000 see it encodes histidine. Or for ACA, come back to A as the first letter, then C, and then A as the third is 407 00:34:50,000 --> 00:34:54,000 threonine. But you can also see something else here.
408 00:34:54,000 --> 00:34:59,000 And that is, because there were 64 possibilities with this three-letter 409 00:34:59,000 --> 00:35:04,000 word, the code is what's known as degenerate. 410 00:35:04,000 --> 00:35:09,000 That is, there are more words in the genetic code than are needed to 411 00:35:09,000 --> 00:35:14,000 specify the number of amino acids that have to be coded. 412 00:35:14,000 --> 00:35:20,000 So I just want to make a couple of points about this. So 413 00:35:20,000 --> 00:35:30,000 the genetic code -- 414 00:35:30,000 --> 00:35:36,000 It's degenerate. There are 61 codons that correspond 415 00:35:36,000 --> 00:35:42,000 to an amino acid. And that means that for some, 416 00:35:42,000 --> 00:35:48,000 and I think threonine is a good example, there's more than one word 417 00:35:48,000 --> 00:35:54,000 in the genetic code that means threonine. There were three codons 418 00:35:54,000 --> 00:36:00,000 for which there was no corresponding amino acid. And those mean stop. 419 00:36:00,000 --> 00:36:05,000 And that would make sense because if you're reading down a nucleic acid 420 00:36:05,000 --> 00:36:11,000 piece of RNA, at some point you'd have to end the protein. 421 00:36:11,000 --> 00:36:16,000 And so there are actually three that are used for that purpose. 422 00:36:16,000 --> 00:36:22,000 And although there's some small variation on this in nature there's 423 00:36:22,000 --> 00:36:28,000 usually one amino acid that's used for starting a protein, 424 00:36:28,000 --> 00:36:33,000 and that's methionine. And it's AUG right there. 425 00:36:33,000 --> 00:36:37,000 Now, some of this stuff probably sounds like it's been around forever, 426 00:36:37,000 --> 00:36:42,000 and that's certainly true of some of the stuff you hear in your chemistry, 427 00:36:42,000 --> 00:36:46,000 math and physics courses. I just want to drive this home.
428 00:36:46,000 --> 00:36:51,000 When I was an undergrad Watson's first book, called Molecular 429 00:36:51,000 --> 00:36:55,000 Biology of the Gene, had come out, so when I was your age, and I 430 00:36:55,000 --> 00:37:00,000 realize that I look ancient but, you know, at least I'm still here. 431 00:37:00,000 --> 00:37:03,000 When I was an undergrad I had Watson's book. 432 00:37:03,000 --> 00:37:07,000 This was the genetic code that was in the book, the genetic code as of 433 00:37:07,000 --> 00:37:11,000 May 1965. And you'll notice there are gaps in here. 434 00:37:11,000 --> 00:37:15,000 And all the things that are underlined were things for which 435 00:37:15,000 --> 00:37:19,000 there was a tentative assignment. So although you may take this and 436 00:37:19,000 --> 00:37:23,000 think that it's knowledge that's been around forever, 437 00:37:23,000 --> 00:37:27,000 it wasn't even complete in the textbook when I was 438 00:37:27,000 --> 00:37:32,000 an undergrad. OK. So one of the things then that's 439 00:37:32,000 --> 00:37:39,000 important to think about with the nucleic acid stuff: this is the basis of how 440 00:37:39,000 --> 00:37:46,000 proteins are encoded in the DNA. But everything else has to be there, 441 00:37:46,000 --> 00:37:53,000 too. And the genetic code, that's what we've been talking about, 442 00:37:53,000 --> 00:38:01,000 is universal. But there are other languages -- 443 00:38:01,000 --> 00:38:09,000 -- written in the DNA that are not 444 00:38:09,000 --> 00:38:15,000 universal. And one of them was that little example I gave you with an 445 00:38:15,000 --> 00:38:21,000 origin of replication. E. coli only starts DNA replication 446 00:38:21,000 --> 00:38:27,000 at one very particular point in its chromosome, so it is a particular 447 00:38:27,000 --> 00:38:33,000 sequence of DNA. It's actually about 250 nucleotides 448 00:38:33,000 --> 00:38:39,000 long. So you could think of that as a language.
It's like a start 449 00:38:39,000 --> 00:38:45,000 chromosome replication language. It's only got one word in it, and 450 00:38:45,000 --> 00:38:51,000 the word is 250 nucleotides long. Another place that's very important 451 00:38:51,000 --> 00:38:57,000 is if you're going to make an RNA copy, if you're going to do 452 00:38:57,000 --> 00:39:03,000 transcription of a piece of DNA -- And I'll call this the coding 453 00:39:03,000 --> 00:39:09,000 sequence. This would be the sequence of three-letter words that 454 00:39:09,000 --> 00:39:15,000 would specify the amino acids of the protein. If you were going to make 455 00:39:15,000 --> 00:39:21,000 an RNA copy of that, you would have to somewhere have 456 00:39:21,000 --> 00:39:27,000 something here, a sequence up here, that means start 457 00:39:27,000 --> 00:39:33,000 transcription. And one at the end, 458 00:39:33,000 --> 00:39:39,000 some other sequence of letters in the nucleic acid that would mean 459 00:39:39,000 --> 00:39:45,000 stop transcription. The start one is given the technical term 460 00:39:45,000 --> 00:39:51,000 promoter. The stop one is referred to as a 461 00:39:51,000 --> 00:39:58,000 terminator. And we'll say more about these. 462 00:39:58,000 --> 00:40:02,000 Because one of the beauties of having this system of making an RNA copy is that it 463 00:40:02,000 --> 00:40:07,000 provides a beautiful point of regulation. Because the cell can 464 00:40:07,000 --> 00:40:12,000 determine whether or not it's going to make a particular protein by 465 00:40:12,000 --> 00:40:17,000 whether or not it chooses to make the RNA. 466 00:40:17,000 --> 00:40:22,000 And so having this RNA intermediate and being able to control 467 00:40:22,000 --> 00:40:27,000 transcription is a really important part of the whole regulation that 468 00:40:27,000 --> 00:40:33,000 makes life possible.
The transcription is carried out by 469 00:40:33,000 --> 00:40:40,000 an enzyme that's known as RNA polymerase. And let me make one 470 00:40:40,000 --> 00:40:47,000 more point. These promoters and terminators are not universal. 471 00:40:47,000 --> 00:40:54,000 So when we talk about recombinant DNA a little bit in the course, 472 00:40:54,000 --> 00:41:01,000 if I take a mouse gene and I put it in E. coli, 473 00:41:01,000 --> 00:41:05,000 even though the genetic code is the same, and we'd have the same 474 00:41:05,000 --> 00:41:09,000 sequence of amino acids specified, you won't get the RNA made because 475 00:41:09,000 --> 00:41:13,000 the sequences that say start transcription and stop transcription 476 00:41:13,000 --> 00:41:17,000 are different between a mouse and a bacterium even though the genetic 477 00:41:17,000 --> 00:41:21,000 code is the same. So you can kind of see from first 478 00:41:21,000 --> 00:41:25,000 principles. If you're doing recombinant DNA and you wanted to 479 00:41:25,000 --> 00:41:29,000 express the mouse protein in E. coli, you would have to fiddle 480 00:41:29,000 --> 00:41:33,000 around with the sequences up here and the sequences down there, 481 00:41:33,000 --> 00:41:39,000 the parts that are not universal. You guys with me? 482 00:41:39,000 --> 00:41:47,000 OK. So what does an RNA polymerase do? It recognizes this sequence, 483 00:41:47,000 --> 00:41:55,000 and then it teases the strands apart to make a little bubble like this. 484 00:41:55,000 --> 00:42:03,000 So let's say ATAGCTA. So the other strand then would be TATCGAT. 485 00:42:03,000 --> 00:42:08,000 And then RNA polymerase, unlike a DNA polymerase, can begin a 486 00:42:08,000 --> 00:42:13,000 chain de novo. Remember an important thing about 487 00:42:13,000 --> 00:42:18,000 DNA polymerases was they had to have a primer terminus to get started. 488 00:42:18,000 --> 00:42:23,000 That was why they had to use primers for the Okazaki fragments.
489 00:42:23,000 --> 00:42:29,000 So this is DNA. This would be 5 prime, 3 prime, 490 00:42:29,000 --> 00:42:34,000 3 prime and 5 prime. And what an RNA polymerase can do, 491 00:42:34,000 --> 00:42:39,000 it uses dATP, dGTP, dCTP and dUTP. It uses triphosphates, 492 00:42:39,000 --> 00:42:43,000 excuse me. Get rid of these. Excuse me. My mistake. No deoxies 493 00:42:43,000 --> 00:42:47,000 here. Of course this is RNA. It uses ATP, GTP, CTP and UTP as 494 00:42:47,000 --> 00:42:51,000 the substrates. So it uses triphosphates just the 495 00:42:51,000 --> 00:42:55,000 same way DNA polymerases do. And then it's able to start a chain 496 00:42:55,000 --> 00:43:01,000 de novo. And it synthesizes the RNA in a 5 497 00:43:01,000 --> 00:43:08,000 prime to 3 prime direction, the same direction that a strand of 498 00:43:08,000 --> 00:43:15,000 DNA is made by DNA polymerase. So it would copy here. And so it 499 00:43:15,000 --> 00:43:21,000 would put in an A opposite a T. And then because it's RNA it will 500 00:43:21,000 --> 00:43:28,000 put in a U opposite an A, and then an AGCUA and so on. 501 00:43:28,000 --> 00:43:35,000 So this right here is the beginning of the RNA that's being synthesized 502 00:43:35,000 --> 00:43:41,000 by the RNA polymerase. This strand is known as the 503 00:43:41,000 --> 00:43:46,000 transcribed strand. And by default then that one is the 504 00:43:46,000 --> 00:43:51,000 non-transcribed strand. And what you can see by doing this, 505 00:43:51,000 --> 00:43:56,000 it's making an RNA with the same sequence as up here, 506 00:43:56,000 --> 00:44:01,000 except that everywhere there's a T in the DNA there's now a U in the RNA. 507 00:44:01,000 --> 00:44:07,000 So the final thing then is how this information all gets put together to 508 00:44:07,000 --> 00:44:14,000 make proteins. And protein synthesis is done by an 509 00:44:14,000 --> 00:44:21,000 amazing machine known as the ribosome.
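Going back to the transcription step for a moment, the base-pairing rule can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names `complement_strand` and `transcribe` are made up here, strands are plain strings, and the transcribed strand is written position by position beneath the top strand (so it reads 3 prime to 5 prime left to right).

```python
# DNA-DNA pairing, and the pairing RNA polymerase uses when copying DNA into RNA.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def complement_strand(top: str) -> str:
    """Return the partner DNA strand, written base by base beneath `top`."""
    return "".join(DNA_PAIR[base] for base in top)


def transcribe(transcribed_strand: str) -> str:
    """Return the RNA made by pairing a ribonucleotide with each template base."""
    return "".join(RNA_PAIR[base] for base in transcribed_strand)


non_transcribed = "ATAGCTA"                       # the top strand from the board
transcribed = complement_strand(non_transcribed)  # TATCGAT
rna = transcribe(transcribed)                     # AUAGCUA

print(transcribed, rna)
# The RNA matches the non-transcribed strand, with U wherever the DNA has T.
assert rna == non_transcribed.replace("T", "U")
```

The final assertion is the point made at the board: the RNA carries the same sequence as the non-transcribed strand, except for U in place of T.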
It's made up of some 510 00:44:21,000 --> 00:44:30,000 special large RNAs -- 511 00:44:30,000 --> 00:44:36,000 -- called rRNAs, some proteins as well. 512 00:44:36,000 --> 00:44:43,000 These make up the ribosome. And then it needs an mRNA and then 513 00:44:43,000 --> 00:44:49,000 it needs the various tRNAs, each of which carries an amino acid 514 00:44:49,000 --> 00:44:56,000 that's appropriate to its anticodon. And very briefly, 515 00:44:56,000 --> 00:45:02,000 and you can see this in your 516 00:45:02,000 --> 00:45:08,000 textbook, what the ribosome does is this. Let's consider this is the 517 00:45:08,000 --> 00:45:14,000 mRNA. I'm just going to take three codons here. And this mRNA threads 518 00:45:14,000 --> 00:45:19,000 into the ribosome. And I'll sort of show it's able to 519 00:45:19,000 --> 00:45:25,000 recognize the first codon and the second codon. Remember, 520 00:45:25,000 --> 00:45:31,000 of course, there's no spacing like this in the RNA. 521 00:45:31,000 --> 00:45:36,000 And then in the context of this large factory it's able to find the 522 00:45:36,000 --> 00:45:41,000 tRNA that has amino acid one and the anticodon that would correspond to 523 00:45:41,000 --> 00:45:46,000 this. And the tRNA that has the next amino acid attached and its 524 00:45:46,000 --> 00:45:51,000 anticodon. So you can see what's happened. It's been able to order 525 00:45:51,000 --> 00:45:57,000 the first amino acid encoded by that codon and put it physically right 526 00:45:57,000 --> 00:46:02,000 next to the next amino acid that's coded here. And then 527 00:46:02,000 --> 00:46:16,000 it catalyzes -- 528 00:46:16,000 --> 00:46:20,000 -- the formation of a peptide bond. And what makes that work is that, 529 00:46:20,000 --> 00:46:25,000 the way this amino acid is joined to the tRNA, there's energy 530 00:46:25,000 --> 00:46:31,000 stored in that bond.
And so thermodynamically that allows 531 00:46:31,000 --> 00:46:37,000 this bond formation to go. And now you end up essentially with 532 00:46:37,000 --> 00:46:43,000 this. And what happens now is everything clicks over one. 533 00:46:43,000 --> 00:46:49,000 So you could think of it as this whole RNA shifts over one so the one 534 00:46:49,000 --> 00:46:55,000 that used to be here is now sticking outside. Here's part 535 00:46:55,000 --> 00:47:01,000 of the ribosome. Here's the next codon. 536 00:47:01,000 --> 00:47:05,000 What we have here is the tRNA that's got amino acid two joined to 537 00:47:05,000 --> 00:47:10,000 amino acid one. The next codon specifies the next 538 00:47:10,000 --> 00:47:15,000 amino acid which is three. And the process is then able to go 539 00:47:15,000 --> 00:47:19,000 on like that. Now, the structure of the ribosome, 540 00:47:19,000 --> 00:47:24,000 the crystal structure of the ribosome was just finished. 541 00:47:24,000 --> 00:47:29,000 And I guess we've got as many lights out as we can do right now. 542 00:47:29,000 --> 00:47:33,000 It's absolutely remarkable. It's mostly RNA. The gray stuff 543 00:47:33,000 --> 00:47:37,000 and the blue stuff are two huge RNAs that are all folded up in 544 00:47:37,000 --> 00:47:41,000 3-dimensional space. And these things that are sort of 545 00:47:41,000 --> 00:47:45,000 stuck on the outside, these purple things here or the dark 546 00:47:45,000 --> 00:47:49,000 blue things here that sort of look like cherries stuck on the outside 547 00:47:49,000 --> 00:47:53,000 of a cake, those are proteins. So most of this is RNA, big balls 548 00:47:53,000 --> 00:47:58,000 of RNA with proteins kind of decorating the outside. 549 00:47:58,000 --> 00:48:02,000 The mRNA is a green thing that snakes through. 550 00:48:02,000 --> 00:48:06,000 There's the mRNA. See it snaking through? 551 00:48:06,000 --> 00:48:10,000 And maybe you can recognize in the middle this tRNA. 
552 00:48:10,000 --> 00:48:14,000 There's an orange one and a yellow one. Those correspond to the two 553 00:48:14,000 --> 00:48:18,000 tRNAs I depicted here. And I'm just going to see if I can 554 00:48:18,000 --> 00:48:22,000 stop this. There's a viewpoint I'd like you to see when it comes around 555 00:48:22,000 --> 00:48:26,000 again here in just a second. I'll see if I can catch it there. 556 00:48:26,000 --> 00:48:30,000 Right there. Here's one of the tRNAs in yellow. 557 00:48:30,000 --> 00:48:34,000 And its end is right there. And there's the other tRNA. 558 00:48:34,000 --> 00:48:38,000 And its end is right there. So this corresponds to the point at 559 00:48:38,000 --> 00:48:42,000 which there's going to be a peptide bond formed. And something is going 560 00:48:42,000 --> 00:48:47,000 to catalyze the formation of that bond. Well, the next picture sort 561 00:48:47,000 --> 00:48:51,000 of shows what happens if you pull that apart. And what you'll see is 562 00:48:51,000 --> 00:48:55,000 that here's the end of one tRNA, there's the other end, 563 00:48:55,000 --> 00:49:00,000 and there's nothing near it except for RNA. 564 00:49:00,000 --> 00:49:05,000 So RNA is actually catalyzing the formation of the peptide bond. 565 00:49:05,000 --> 00:49:11,000 Another way to say that would be that the ribosome, 566 00:49:11,000 --> 00:49:16,000 which is the protein synthesizing factory, is a ribozyme. 567 00:49:16,000 --> 00:49:22,000 Remember I said most of the chemical reactions that need 568 00:49:22,000 --> 00:49:27,000 catalysts are carried out by proteins but there are a few that 569 00:49:27,000 --> 00:49:33,000 are carried out by RNA where RNA is the catalyst? 570 00:49:33,000 --> 00:49:38,000 And remarkably the formation of the bond, which is at the heart of 571 00:49:38,000 --> 00:49:43,000 proteins which are so important for all life, is catalyzed by RNA. 572 00:49:43,000 --> 00:49:48,000 If you look at what makes proteins, what do you see?
You see huge balls 573 00:49:48,000 --> 00:49:53,000 of RNA, an mRNA threading through, two tRNAs, and the enzyme activity or 574 00:49:53,000 --> 00:49:59,000 the catalytic activity is provided by the RNA as well. 575 00:49:59,000 --> 00:50:03,000 As I said, people think possibly there was an RNA world that preceded 576 00:50:03,000 --> 00:50:08,000 our present-day world with DNA, RNA and protein. And who knows? 577 00:50:08,000 --> 00:50:12,000 But this sort of look at a ribosome could at least make you see that 578 00:50:12,000 --> 00:50:17,000 that's a plausible explanation, that RNA might have been running the show 579 00:50:17,000 --> 00:50:21,000 for a while before anything else got involved. Anyway, we'll 580 00:50:21,000 --> 00:50:24,000 see you on Friday then.
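As a recap of the genetic code table and the codon-by-codon reading done by the ribosome, here is a minimal Python sketch. It assumes the standard genetic code, laid out in the usual first-base, second-base, third-base order (U, C, A, G) with one-letter amino acid symbols and `*` for stop. The function name `translate` and the example mRNA are hypothetical, and a real ribosome does far more than find the first AUG.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon or the end of the RNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        peptide.append(amino_acid)
    return "".join(peptide)


stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
threonine = sorted(c for c, aa in CODON_TABLE.items() if aa == "T")
print(len(CODON_TABLE), len(CODON_TABLE) - len(stops), stops)  # 64 61 ['UAA', 'UAG', 'UGA']
print(threonine)  # ['ACA', 'ACC', 'ACG', 'ACU']

# Hypothetical mRNA: AUG, then UUU, CAC, ACA, then the stop codon UAA.
print(translate("GGAUGUUUCACACAUAAGG"))  # MFHT
```

The counts match the lecture: 64 possible words, 61 that mean an amino acid, three that mean stop, and more than one word for threonine.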