1 00:00:15,000 --> 00:04:43,000 OK. Parts of a gene. 2 00:04:43,000 --> 00:04:47,000 We have our promoter, which is part of the untranscribed 3 00:04:47,000 --> 00:04:51,000 region of a gene, usually in the 5 prime end. 4 00:04:51,000 --> 00:04:55,000 Not always but for the genes we're talking about at the 5 prime end, 5 00:04:55,000 --> 00:04:59,000 the so-called 5 prime end of the gene, or so-called upstream of this 6 00:04:59,000 --> 00:05:04,000 transcribed region. And downstream of that there is more 7 00:05:04,000 --> 00:05:09,000 untranscribed region that interestingly can also contribute to 8 00:05:09,000 --> 00:05:13,000 the promoter, even though it's far away from this more upstream part of 9 00:05:13,000 --> 00:05:18,000 the promoter. But I'm going to call it just for now untranscribed, 10 00:05:18,000 --> 00:05:23,000 two flanking regions of untranscribed DNA sequence and one 11 00:05:23,000 --> 00:05:29,000 region of transcribed sequence. Now, I want to discuss with you very 12 00:05:29,000 --> 00:05:35,000 briefly a phenomenon called splicing. And this is a phenomenon that 13 00:05:35,000 --> 00:05:41,000 occurs within the RNA that is transcribed from a gene and, 14 00:05:41,000 --> 00:05:47,000 therefore, pertains to the transcribed region of the gene. 15 00:05:47,000 --> 00:05:53,000 It turns out that in this transcribed region there are two 16 00:05:53,000 --> 00:05:59,000 kinds of sequences. There are things called exons and 17 00:05:59,000 --> 00:06:05,000 there are regions called introns. The exons code for something, 18 00:06:05,000 --> 00:06:11,000 code for the final function of the RNA or for eventually a protein. 19 00:06:11,000 --> 00:06:17,000 So these are coding. The introns are noncoding. 20 00:06:17,000 --> 00:06:24,000 Both of them are transcribed. You'll see this definition is a 21 00:06:24,000 --> 00:06:30,000 little loose as we move on in today's lecture, but 22 00:06:30,000 --> 00:06:36,000 it's good enough. 
In the transcript that initially is 23 00:06:36,000 --> 00:06:42,000 made from the gene in this transcribed region, 24 00:06:42,000 --> 00:06:48,000 both introns and exons are present. So these are present in what's 25 00:06:48,000 --> 00:06:54,000 called the primary transcript or primary RNA. And primary refers to 26 00:06:54,000 --> 00:07:00,000 the first RNA that is transcribed from the gene. 27 00:07:00,000 --> 00:07:07,000 And subsequent to that, still in the nucleus, those introns 28 00:07:07,000 --> 00:07:14,000 and exons are subject to a process called splicing whereby the introns 29 00:07:14,000 --> 00:07:24,000 are removed -- 30 00:07:24,000 --> 00:07:32,000 -- or spliced out is the term, such that in your mature RNA only 31 00:07:32,000 --> 00:07:49,000 the exons are present. 32 00:07:49,000 --> 00:07:53,000 This process is likely a consequence. I'm going to put up a diagram that 33 00:07:53,000 --> 00:07:57,000 you had on your last time's handout. You can watch it now. And you can 34 00:07:57,000 --> 00:08:02,000 refer back to a previous lecture if you don't have it with you. 35 00:08:02,000 --> 00:08:07,000 This notion of introns and exons is probably a consequence of evolution 36 00:08:07,000 --> 00:08:12,000 whereby different parts of genes were combined and shuffled to give 37 00:08:12,000 --> 00:08:17,000 new kinds of genes and, therefore, new kinds of proteins. 38 00:08:17,000 --> 00:08:22,000 Here on my diagram I have exons in black and introns in blue, 39 00:08:22,000 --> 00:08:27,000 and they're all just DNA sequence, but when the RNA is transcribed in 40 00:08:27,000 --> 00:08:32,000 the first place is primary RNA. It's a copy of the gene. 41 00:08:32,000 --> 00:08:36,000 It has both exons and introns. And then a very complex enzymatic 42 00:08:36,000 --> 00:08:41,000 machinery comes on and it loops out and excises these introns. 43 00:08:41,000 --> 00:08:46,000 OK? 
So this is very interesting: in your mature mRNA there 44 00:08:46,000 --> 00:08:50,000 are no introns. And the introns have been looped 45 00:08:50,000 --> 00:08:55,000 out and they form these little structures that are called lariats. 46 00:08:55,000 --> 00:09:00,000 And at this point your mRNA is mature -- 47 00:09:00,000 --> 00:09:04,000 -- and it moves to the cytoplasm. Now, this process was discovered 48 00:09:04,000 --> 00:09:09,000 by Professor Phillip Sharp here at MIT and he got the Nobel Prize for 49 00:09:09,000 --> 00:09:13,000 it in 1993. It's a very important process because it's absolutely 50 00:09:13,000 --> 00:09:18,000 required for maturation of RNAs. And also, and I'll come to this in 51 00:09:18,000 --> 00:09:23,000 a few lectures' time, it allows different proteins to be 52 00:09:23,000 --> 00:09:28,000 made from the same gene. So here's a rule. 53 00:09:28,000 --> 00:09:33,000 In this RNA there are what are called splice donor sites that I've 54 00:09:33,000 --> 00:09:38,000 put as a circle and splice acceptor sites that I've put as a square. 55 00:09:38,000 --> 00:09:43,000 Just watch this for now because we will come back to 56 00:09:43,000 --> 00:09:48,000 it. So watch what I'm saying rather than trying to madly write it down. 57 00:09:48,000 --> 00:09:53,000 Any splice donor can join to any splice acceptor and remove the stuff 58 00:09:53,000 --> 00:09:58,000 between them. So in this top example I've got each intron being 59 00:09:58,000 --> 00:10:04,000 neatly removed because adjacent splice donors and acceptors interact. 60 00:10:04,000 --> 00:10:08,000 But look at the example below. I've got this splice donor next to 61 00:10:08,000 --> 00:10:12,000 exon one interacting with a splice acceptor next to exon three. 62 00:10:12,000 --> 00:10:16,000 And when that happens you remove the whole of exon two. 63 00:10:16,000 --> 00:10:20,000 So you actually are going to make a different protein. 
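The donor/acceptor rule described above can be sketched as a toy model. This is only an illustration under stated assumptions: the exon sequences are made up, introns are left implicit (they are whatever lies between a donor and the acceptor it joins), and the function name `splice` is hypothetical.

```python
# Toy model of splicing: each join pairs the splice donor after one exon
# with the splice acceptor before a later exon, and everything in between
# (introns and any skipped exons) is removed.
# The exon sequences below are invented for illustration only.
EXONS = {1: "AUGGCU", 2: "CUUUCA", 3: "GGAUGG", 4: "UUUUAA"}


def splice(exons, joins):
    """Return (exon numbers kept, mature RNA sequence).

    joins is a list of (donor_exon, acceptor_exon) pairs in 5'-to-3' order.
    """
    kept = [joins[0][0]]
    for donor, acceptor in joins:
        if donor != kept[-1] or acceptor <= donor:
            raise ValueError("joins must chain from 5' to 3'")
        kept.append(acceptor)
    return kept, "".join(exons[n] for n in kept)


# Top example: every intron is removed, all four exons remain.
all_exons, full_rna = splice(EXONS, [(1, 2), (2, 3), (3, 4)])

# Lower example: the donor next to exon 1 joins the acceptor next to exon 3,
# so exon 2 is lost along with the introns on either side of it.
some_exons, short_rna = splice(EXONS, [(1, 3), (3, 4)])

print(all_exons, full_rna)
print(some_exons, short_rna)
```

The two calls give two different mature RNAs from the same primary transcript, which is the point of the second example on the slide.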
64 00:10:20,000 --> 00:10:24,000 Whereas, in the first case you'll have exons one, 65 00:10:24,000 --> 00:10:28,000 two and three and four. In the second case you'll have 66 00:10:28,000 --> 00:10:32,000 exons one, three and four. OK? So this process is very 67 00:10:32,000 --> 00:10:38,000 important for allowing different kinds of proteins to be made from 68 00:10:38,000 --> 00:10:43,000 the same gene. I want to make you aware of this 69 00:10:43,000 --> 00:10:49,000 now, and I will come back to it in the formation module when we talk 70 00:10:49,000 --> 00:10:54,000 about how different kinds of cells are generated. 71 00:10:54,000 --> 00:11:00,000 All right. So let's move onto the major topic of today's lecture -- 72 00:11:00,000 --> 00:11:03,000 -- which takes us back to the central dogma. 73 00:11:03,000 --> 00:11:07,000 And I want to introduce to you a term that is very important that you 74 00:11:07,000 --> 00:11:11,000 know and you understand. And this is the term gene 75 00:11:11,000 --> 00:11:15,000 expression. And really what we've been talking about is 76 00:11:15,000 --> 00:11:22,000 gene expression. 77 00:11:22,000 --> 00:11:28,000 Gene expression simply refers to the generation of the final product of a 78 00:11:28,000 --> 00:11:34,000 gene from the gene. So we're talking about the 79 00:11:34,000 --> 00:11:40,000 formation of a protein as directed by a particular gene. 80 00:11:40,000 --> 00:11:47,000 OK? So gene expression is, if you like, the readout. Here's 81 00:11:47,000 --> 00:11:53,000 another way of putting it. The readout, the final readout of a 82 00:11:53,000 --> 00:12:00,000 gene, or the generation of the final product of a gene. 83 00:12:00,000 --> 00:12:05,000 I'm going to come back to this term over and over again, 84 00:12:05,000 --> 00:12:10,000 and I will ask you to define it in your own way, but it's a term I want 85 00:12:10,000 --> 00:12:16,000 to throw out at you now because you do need to know it. 
86 00:12:16,000 --> 00:12:21,000 It's very pervasive. Today I want to talk about the step 87 00:12:21,000 --> 00:12:27,000 in gene expression called translation whereby RNA is used 88 00:12:27,000 --> 00:12:33,000 to direct synthesis of a protein. So let's define translation because 89 00:12:33,000 --> 00:12:39,000 it is, I think, one of the most interesting 90 00:12:39,000 --> 00:12:45,000 questions in molecular biology. Certainly from a historical 91 00:12:45,000 --> 00:12:52,000 perspective that was true. And the notion in translation is 92 00:12:52,000 --> 00:12:58,000 that the base sequence of an mRNA somehow leads to the synthesis of a 93 00:12:58,000 --> 00:13:05,000 protein with a defined amino acid sequence. 94 00:13:05,000 --> 00:13:31,000 Now, if you think about DNA 95 00:13:31,000 --> 00:13:35,000 replication, transcription and translation, the relationship 96 00:13:35,000 --> 00:13:40,000 between them, there is a nice analogy that one can make. 97 00:13:40,000 --> 00:13:44,000 DNA uses a base code, four bases. In transcription, RNA uses nearly the same 98 00:13:44,000 --> 00:13:48,000 four bases as a code, but it's slightly different from DNA. 99 00:13:48,000 --> 00:13:53,000 So the synthesis of RNA using a DNA template is kind of like changing 100 00:13:53,000 --> 00:13:57,000 fonts in a document that you have. It's kind of like going from Times 101 00:13:57,000 --> 00:14:02,000 New Roman to Helvetica. You haven't really changed much. 102 00:14:02,000 --> 00:14:07,000 It just looks a bit different. Translation is very different. 103 00:14:07,000 --> 00:14:11,000 The use of mRNA to direct the synthesis of a protein is much more 104 00:14:11,000 --> 00:14:16,000 analogous to changing language where you've taken English and translated 105 00:14:16,000 --> 00:14:21,000 it into Chinese or Russian and translated it into French. 106 00:14:21,000 --> 00:14:26,000 OK? So this is a really different process. 
And it was clear from the 107 00:14:26,000 --> 00:14:31,000 outset, historically, that one had to think in a slightly 108 00:14:31,000 --> 00:14:36,000 different way about how this process was directed. 109 00:14:36,000 --> 00:14:41,000 And I want to talk about four things with respect to translation. 110 00:14:41,000 --> 00:14:47,000 Firstly, I want to talk about the genetic code that allows RNA to 111 00:14:47,000 --> 00:14:53,000 direct protein synthesis. I want to talk about something 112 00:14:53,000 --> 00:14:59,000 called the interpreter of that code. 113 00:14:59,000 --> 00:15:03,000 I'm going to talk about the factory in which the synthesis takes place. 114 00:15:03,000 --> 00:15:08,000 And then I'm going to get to a discussion of the molecular basis for 115 00:15:08,000 --> 00:15:28,000 genotype and phenotype. 116 00:15:28,000 --> 00:15:33,000 So let's think about the code. And thinking about this starts from 117 00:15:33,000 --> 00:15:38,000 a very simple logical place. And the place is this. One starts 118 00:15:38,000 --> 00:15:43,000 with four bases, A, G, C and T or A, 119 00:15:43,000 --> 00:15:49,000 G, C and U, depending on whether you're talking about DNA or RNA. 120 00:15:49,000 --> 00:15:54,000 And somehow those four bases have to be used in some kind of code to 121 00:15:54,000 --> 00:16:00,000 give you an outcome of 20 amino acids. 122 00:16:00,000 --> 00:16:04,000 And I am going to use the abbreviation AA for amino acids. 123 00:16:04,000 --> 00:16:09,000 So you can look at this and immediately understand there has to 124 00:16:09,000 --> 00:16:13,000 be some kind of combinatorial code in order to specify those 20 amino 125 00:16:13,000 --> 00:16:18,000 acids. 
So you can do combinations and you can say, 126 00:16:18,000 --> 00:16:23,000 OK, if two bases were used and you could have combinations of doublets, 127 00:16:23,000 --> 00:16:27,000 how many combinations can you get to and would that be enough to specify 128 00:16:27,000 --> 00:16:34,000 those 20 amino acids? Well, no, because two base 129 00:16:34,000 --> 00:16:42,000 combinations would only give you 16 possible amino acid combinations, 130 00:16:42,000 --> 00:16:50,000 or the ability to specify 16 amino acids. OK? Four squared. 131 00:16:50,000 --> 00:16:59,000 How about three base combinations? Well, that's better. 132 00:16:59,000 --> 00:17:05,000 What you can get out of that is 64 different combinations. 133 00:17:05,000 --> 00:17:11,000 OK? And that is plenty to specify your 20 amino acids with some left 134 00:17:11,000 --> 00:17:17,000 over. And, in fact, this is what is used. 135 00:17:17,000 --> 00:17:23,000 Combinations of three bases. And these combinations of three 136 00:17:23,000 --> 00:17:32,000 bases are termed the triplet code. 137 00:17:32,000 --> 00:17:36,000 The discovery of the triplet code is really fascinating. 138 00:17:36,000 --> 00:17:40,000 I don't have time to go into it in this lecture, but your book is not 139 00:17:40,000 --> 00:17:44,000 too bad on the discovery. And I will post on your website, 140 00:17:44,000 --> 00:17:48,000 for those of you who really want to get into it, a reference to a very 141 00:17:48,000 --> 00:17:52,000 interesting historical account of the discovery of the triplet code 142 00:17:52,000 --> 00:17:56,000 and indeed of much of molecular biology. But it's a fascinating 143 00:17:56,000 --> 00:18:00,000 story. But I'm going to tell you the code is a triplet code. OK. 144 00:18:00,000 --> 00:18:08,000 So what does that mean? It means that three bases 145 00:18:08,000 --> 00:18:16,000 correspond to a particular amino acid. OK? 
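The counting argument above (doublets give 16 combinations, triplets give 64) can be checked directly. A minimal sketch, assuming the four RNA bases and a hypothetical helper named `count_code_words`:

```python
from itertools import product

BASES = "ACGU"          # the four RNA bases
AMINO_ACIDS_NEEDED = 20


def count_code_words(length):
    """Number of distinct base combinations of the given length (4 ** length)."""
    return len(list(product(BASES, repeat=length)))


for length in (1, 2, 3):
    n = count_code_words(length)
    enough = n >= AMINO_ACIDS_NEEDED
    print(f"{length} base(s): {n} combinations, enough for 20 amino acids: {enough}")
```

Three is the smallest word length that covers 20 amino acids, and it leaves 44 combinations over, which is why the code can be redundant.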
So one triplet of bases 146 00:18:16,000 --> 00:18:24,000 correspond, I'm writing this out because it's really important that 147 00:18:24,000 --> 00:18:32,000 you know this, correspond to one amino acid. 148 00:18:32,000 --> 00:18:38,000 And this base triplet gets a special name. It's called a codon. 149 00:18:38,000 --> 00:18:46,000 And the thing that you will have 150 00:18:46,000 --> 00:18:50,000 noticed is that what I've told you is there are 64 possible 151 00:18:50,000 --> 00:18:54,000 combinations of triplets and only 20 amino acids. And so that leaves 152 00:18:54,000 --> 00:18:58,000 some over. What happens? Well, they're all used. 153 00:18:58,000 --> 00:19:04,000 And what happens is that although the code is universal, 154 00:19:04,000 --> 00:19:10,000 as far as we know it arose just once, all living organisms on our planet 155 00:19:10,000 --> 00:19:17,000 use this code, it is a redundant code. 156 00:19:17,000 --> 00:19:23,000 So I will write down it is redundant but not ambiguous, 157 00:19:23,000 --> 00:19:30,000 and tell you what that means. So what that means is that an amino 158 00:19:30,000 --> 00:19:37,000 acid can be specified by more than one triplet, and I'll show you that 159 00:19:37,000 --> 00:19:44,000 in a moment, but that any triplet of bases only corresponds to one amino 160 00:19:44,000 --> 00:19:51,000 acid. Let's look at some diagrams to show you what I mean. 161 00:19:51,000 --> 00:19:58,000 This is a table of your amino acid code. 162 00:19:58,000 --> 00:20:02,000 These letters in columns represent the bases. And next to them are 163 00:20:02,000 --> 00:20:07,000 written the amino acids that correspond to this particular code. 164 00:20:07,000 --> 00:20:11,000 Let's start with an easy one. This is methionine encoded by AUG. 165 00:20:11,000 --> 00:20:16,000 And that's one you should actually remember. OK? 166 00:20:16,000 --> 00:20:20,000 And for methionine there is only one possible codon. 
167 00:20:20,000 --> 00:20:25,000 It is AUG and always AUG. But let's keep going here. 168 00:20:25,000 --> 00:20:30,000 And let's look at the amino acid leucine. 169 00:20:30,000 --> 00:20:36,000 Leucine is encoded by six possible triplets, six possible codons, 170 00:20:36,000 --> 00:20:43,000 UUA, UUG, CUU, CUC, CUA and CUG. Any one of those in an mRNA can 171 00:20:43,000 --> 00:20:49,000 encode leucine. However, CUU only encodes leucine. 172 00:20:49,000 --> 00:20:56,000 It never encodes another amino acid. OK? And that's what 173 00:20:56,000 --> 00:21:04,000 I mean by redundant but not ambiguous. More than one triplet can encode one 174 00:21:04,000 --> 00:21:12,000 amino acid, but any given triplet only corresponds to one particular 175 00:21:12,000 --> 00:21:20,000 amino acid. OK. You will have practice on this kind 176 00:21:20,000 --> 00:21:28,000 of thing as you go along. So let's get some basics down here. 177 00:21:28,000 --> 00:21:36,000 The template in the whole translation process is your mRNA. 178 00:21:36,000 --> 00:21:41,000 OK? It's the code. It contains the code. 179 00:21:41,000 --> 00:21:46,000 It is read to give a protein readout from 5 prime to 3 prime. 180 00:21:46,000 --> 00:21:51,000 And the readout of the protein, as I mentioned to you way back when, 181 00:21:51,000 --> 00:21:57,000 reads out from the amino to the carboxyl end. New amino acids are 182 00:21:57,000 --> 00:22:05,000 added onto the carboxyl end -- -- and the free amino group 183 00:22:05,000 --> 00:22:15,000 corresponds to the first amino acid polymerized. So it is read 5 prime 184 00:22:15,000 --> 00:22:25,000 to 3 prime, and that corresponds to the amino to the carboxyl 185 00:22:25,000 --> 00:22:37,000 growth of the protein. 186 00:22:37,000 --> 00:22:44,000 All proteins start with the same amino acid, and that is methionine. 187 00:22:44,000 --> 00:22:51,000 And the start or initiation codon in all mRNAs is 188 00:22:51,000 --> 00:22:59,000 AUG, which encodes methionine. 
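"Redundant but not ambiguous" can be made concrete with a lookup table. The sketch below is not the table from the lecture slide; it builds the standard genetic code from a compact 64-letter string (one-letter amino acid codes, `*` for stop), which is an assumption brought in for illustration, as are the variable names.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Not ambiguous: a dict gives each codon exactly one meaning.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Redundant: inverting the table shows several codons per amino acid.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

print("Leucine (L):", sorted(codons_for["L"]))
print("Methionine (M):", codons_for["M"])
print("Amino acids specified:", len(codons_for) - 1)  # minus the stop entry
```

Leucine comes back with six codons and methionine with exactly one, matching the two examples read off the table in the lecture.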
189 00:22:59,000 --> 00:23:03,000 Now, not all final proteins have got methionine at their amino ends 190 00:23:03,000 --> 00:23:08,000 because it can be cleaved off. OK? So you don't have to land up 191 00:23:08,000 --> 00:23:13,000 with a protein that has a methionine end, its amino end, 192 00:23:13,000 --> 00:23:18,000 but it starts off with methionine there. And then there are no gaps 193 00:23:18,000 --> 00:23:22,000 in the message. It is read without any punctuation 194 00:23:22,000 --> 00:23:27,000 marks, except for the fact that the codons are next to one another in a 195 00:23:27,000 --> 00:23:33,000 non-overlapping way. OK? So there are no gaps. 196 00:23:33,000 --> 00:23:39,000 And the only punctuation is the start codon and a series of stop 197 00:23:39,000 --> 00:23:45,000 codons which do not encode any amino acids. These are UAA, 198 00:23:45,000 --> 00:23:51,000 UAG and UGA. And you can remember them if you want, 199 00:23:51,000 --> 00:23:57,000 but we're not going to test that you do. OK? You can use your 200 00:23:57,000 --> 00:24:02,000 amino acid tables. OK. So your punctuation is the 201 00:24:02,000 --> 00:24:08,000 start and the end of the message. All right. So let's go on and talk 202 00:24:08,000 --> 00:24:14,000 about the interpreter and what I mean by the interpreter. 203 00:24:14,000 --> 00:24:20,000 In this diagram here I have got, look up here for a moment. This is 204 00:24:20,000 --> 00:24:26,000 quite a nice diagram not from your book. I've got your DNA strand, 205 00:24:26,000 --> 00:24:31,000 which is your template strand. Your corresponding RNA, 206 00:24:31,000 --> 00:24:35,000 your mRNA, and the readout of the RNA to the protein. 207 00:24:35,000 --> 00:24:40,000 And here are the codons, UGG, this is in the middle of the 208 00:24:40,000 --> 00:24:44,000 protein so that's why there's no methionine, UGG corresponding to 209 00:24:44,000 --> 00:24:48,000 tryptophan, UUU corresponding to phenylalanine. 
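The reading rules just listed (start at AUG, read 5 prime to 3 prime in adjacent, non-overlapping triplets, stop at UAA, UAG or UGA) can be sketched as a short function. This is a simplified model: it takes the first AUG it finds as the start, uses the standard genetic code written as a compact string, and the name `translate` and the example message are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")   # "*" = stop codon
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Read an mRNA 5'->3' from the first AUG, one codon at a time.

    Returns the protein in one-letter codes, amino end first.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing is made
    protein = []
    # Step by 3: codons sit next to each other with no gaps and no overlap.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon: end of the message
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up message: GG | AUG UGG UUU UAA | GG
print(translate("GGAUGUGGUUUUAAGG"))   # methionine, tryptophan, phenylalanine
```

The two bases before the AUG and the two after the stop codon are ignored, so only the start and stop codons act as punctuation.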
210 00:24:48,000 --> 00:24:53,000 You can see how the codons are right next to each other, 211 00:24:53,000 --> 00:24:57,000 OK, but do not overlap. In fact, I'm going to write that on the board. 212 00:24:57,000 --> 00:25:02,000 So no gaps and no codon overlap. Very important that you understand 213 00:25:02,000 --> 00:25:07,000 that. So when people looked at this and figured out what the codons 214 00:25:07,000 --> 00:25:11,000 corresponded to in terms of amino acids there was the question of, 215 00:25:11,000 --> 00:25:16,000 well, how do you actually get those amino acids corresponding to those 216 00:25:16,000 --> 00:25:21,000 codons? And there was a sense that you needed some kind of adapter or 217 00:25:21,000 --> 00:25:25,000 interpreter molecule that both recognized the codon and recognized 218 00:25:25,000 --> 00:25:30,000 the amino acid. And that's the next thing that I'm 219 00:25:30,000 --> 00:25:40,000 going to tell you. And -- 220 00:25:40,000 --> 00:25:46,000 -- stop. Well, I apologize on behalf of our 221 00:25:46,000 --> 00:25:53,000 illustrious institute for the boards in this room. OK. 222 00:25:53,000 --> 00:26:00,000 So all right. So let's talk about the interpreter. 223 00:26:00,000 --> 00:26:07,000 And I'll tell you that this is the class of RNA someone brought up 224 00:26:07,000 --> 00:26:14,000 earlier called tRNAs. So tRNAs, as you may recall, 225 00:26:14,000 --> 00:26:21,000 are these very small RNAs. They are about 226 00:26:21,000 --> 00:26:28,000 100 bases in length, and there are a lot of them. And there is a tRNA 227 00:26:28,000 --> 00:26:35,000 that corresponds to every codon that specifies an amino acid. So tRNAs recognize both the amino 228 00:26:35,000 --> 00:26:42,000 acid and the specific codon. And they recognize, 229 00:26:42,000 --> 00:26:49,000 let's talk about the codon first. 
They recognize the codon by DNA 230 00:26:49,000 --> 00:26:56,000 complement, by RNA complementarity, by base pairing to a region on the 231 00:26:56,000 --> 00:27:05,000 tRNA called the anti-codon. 232 00:27:05,000 --> 00:27:15,000 So let's talk about methionine for a moment. The codon for methionine is 233 00:27:15,000 --> 00:27:25,000 AUG. That's the codon. Woops. Hold on one second here. 234 00:27:25,000 --> 00:27:33,000 5 prime AUG, that's your codon. And what will be complementary to 235 00:27:33,000 --> 00:27:40,000 that on the tRNA from the 3 prime end is UAC. 236 00:27:40,000 --> 00:27:47,000 OK? So this anti-codon is on the 237 00:27:47,000 --> 00:27:52,000 tRNA. Anti-codons can either be written from the 3 prime end or you 238 00:27:52,000 --> 00:27:57,000 can switch them around and talk about 5 prime CAU. It's 239 00:27:57,000 --> 00:28:02,000 the same thing. OK? So that's one thing. 240 00:28:02,000 --> 00:28:06,000 I'll show you a picture in a moment. The other thing that a tRNA has to 241 00:28:06,000 --> 00:28:11,000 recognize is the amino acid. And that's more complicated. 242 00:28:11,000 --> 00:28:16,000 For different amino acids there are different parts of the tRNA molecule 243 00:28:16,000 --> 00:28:20,000 that recognizes specific amino acids. And it hasn't actually been figured 244 00:28:20,000 --> 00:28:25,000 out completely which part of which tRNA recognizes a particular amino 245 00:28:25,000 --> 00:28:30,000 acid, but the recognition is also on the tRNA -- 246 00:28:30,000 --> 00:28:35,000 -- and not really on the anti-codon. Or certainly not the anti-codon 247 00:28:35,000 --> 00:28:40,000 alone is probably fair to say. So let me show you a picture of a 248 00:28:40,000 --> 00:28:46,000 tRNA. tRNAs are single-stranded RNAs that fold up on themselves in a 249 00:28:46,000 --> 00:28:51,000 complex way. OK? Here's the representation of the 250 00:28:51,000 --> 00:28:57,000 three-dimensional structure of a tRNA. 
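The two ways of writing an anti-codon mentioned above (3 prime UAC for the AUG codon, or the same thing switched around as 5 prime CAU) can be sketched as follows. This assumes strict A-U and G-C pairing only, and the function names are hypothetical.

```python
# RNA base pairing, assuming strict A-U and G-C pairs only.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_3_to_5(codon):
    """Anti-codon written 3'->5', base by base under the 5'->3' codon."""
    return "".join(PAIR[base] for base in codon)


def anticodon_5_to_3(codon):
    """The same anti-codon written in the usual 5'->3' direction."""
    return anticodon_3_to_5(codon)[::-1]


codon = "AUG"  # methionine codon, written 5'->3'
print("3'", anticodon_3_to_5(codon), "5'")
print("5'", anticodon_5_to_3(codon), "3'")
```

Both strings describe the same three bases on the tRNA; only the reading direction differs.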
And these cross things 251 00:28:57,000 --> 00:29:02,000 are hydrogen bonds. So there's a lot of base-pairing 252 00:29:02,000 --> 00:29:07,000 within the tRNA. Represented more simply, 253 00:29:07,000 --> 00:29:12,000 the tRNA forms this kind of cloverleaf structure, 254 00:29:12,000 --> 00:29:17,000 and the anti-codon is at one end of the tRNA. OK? 255 00:29:17,000 --> 00:29:22,000 So this is the thing that's base pairing to the codon on the mRNA. 256 00:29:22,000 --> 00:29:27,000 The amino acid attaches to the very 3 prime end of the tRNA at this site 257 00:29:27,000 --> 00:29:33,000 which is a CCA. OK? And there is a covalent attachment 258 00:29:33,000 --> 00:29:41,000 of the tRNA to the amino acid at this CCA region. 259 00:29:41,000 --> 00:29:49,000 All right. But the part that recognizes the amino acid can be 260 00:29:49,000 --> 00:29:57,000 somewhere in the rest of the tRNA molecule. It's very complex. 261 00:29:57,000 --> 00:30:03,000 OK. So let's move on now. Actually, let me tell you one more 262 00:30:03,000 --> 00:30:09,000 thing, though I'll tell it to you in a moment. OK. 263 00:30:09,000 --> 00:30:14,000 So let's move on now to the question of the factory. 264 00:30:14,000 --> 00:30:19,000 And by factory I mean the place where protein synthesis or 265 00:30:19,000 --> 00:30:25,000 translation takes place. And the factory here is the 266 00:30:25,000 --> 00:30:30,000 ribosome. We mentioned ribosomes right at the 267 00:30:30,000 --> 00:30:36,000 beginning of the course in the second lecture and haven't said a 268 00:30:36,000 --> 00:30:42,000 whole bunch about them since. Ribosomes are very large structures. 269 00:30:42,000 --> 00:30:48,000 They are not membrane bound, but they are very large. This is a 270 00:30:48,000 --> 00:30:54,000 representation of a ribosome from bacteria that has a small subunit 271 00:30:54,000 --> 00:31:01,000 and a large subunit. 
And, interestingly, 272 00:31:01,000 --> 00:31:09,000 ribosomes are an obligatory complex of the so-called rRNA, 273 00:31:09,000 --> 00:31:17,000 or ribosomal RNA, plus proteins. There is a small subunit, this is 274 00:31:17,000 --> 00:31:25,000 really bad. Let's try this one. Small subunit which consists of one 275 00:31:25,000 --> 00:31:33,000 ribosomal RNA of a particular kind and 33 proteins. 276 00:31:33,000 --> 00:31:37,000 And there is a large subunit. And I tell you this not because you 277 00:31:37,000 --> 00:31:42,000 need to remember this, but because you need to appreciate that this 278 00:31:42,000 --> 00:31:46,000 is a very complex structure. It's a very cool and complex 279 00:31:46,000 --> 00:31:51,000 structure. The large subunit consists of three RNAs 280 00:31:51,000 --> 00:31:56,000 and 45 proteins. You can represent the structure of 281 00:31:56,000 --> 00:32:00,000 the ribosome much more beautifully in this diagram, 282 00:32:00,000 --> 00:32:04,000 or in this representation, where the RNA is shown in gold, 283 00:32:04,000 --> 00:32:08,000 or the two RNAs are shown in gold, or the multiple RNAs are shown in 284 00:32:08,000 --> 00:32:12,000 gold, and some of the proteins are shown as these other structures and 285 00:32:12,000 --> 00:32:16,000 you can see the alpha helices of the proteins. OK? 286 00:32:16,000 --> 00:32:20,000 And what you should be able to see on this diagram, 287 00:32:20,000 --> 00:32:24,000 let me point to this one for a change, is this tunnel, 288 00:32:24,000 --> 00:32:28,000 this hole through the structure. And this is the tunnel through which 289 00:32:28,000 --> 00:32:33,000 the mRNA threads as it is translated. So this is truly a factory. 290 00:32:33,000 --> 00:32:39,000 tRNAs come into this, the mRNA threads through, 291 00:32:39,000 --> 00:32:44,000 and as that takes place the mRNA directs the synthesis of the protein. 292 00:32:44,000 --> 00:32:49,000 OK. This is a representation from your book. 
I don't like most of the 293 00:32:49,000 --> 00:32:54,000 diagrams from your book so I redrew most of them for you, 294 00:32:54,000 --> 00:33:00,000 but I left this one. This is a representation of translation. 295 00:33:00,000 --> 00:33:05,000 The mRNA is shown in green and the large subunit and small subunit of 296 00:33:05,000 --> 00:33:10,000 the ribosome come together, form the complete ribosome, and then 297 00:33:10,000 --> 00:33:15,000 the mRNA actually is threaded through the ribosome and the protein, 298 00:33:15,000 --> 00:33:21,000 well, here they've called it a polypeptide chain, is threaded through. 299 00:33:21,000 --> 00:33:26,000 So let's explore this in a bit more detail. And in order to do so, 300 00:33:26,000 --> 00:33:32,000 I've got to conserve boards here because we are one board short. 301 00:33:32,000 --> 00:33:36,000 In order to do so I need to introduce you to the various parts 302 00:33:36,000 --> 00:33:41,000 of an mRNA. And this is on one of the diagrams that I handed out today. 303 00:33:41,000 --> 00:33:46,000 OK? So you don't need to redraw it. Just look at the diagram. 304 00:33:46,000 --> 00:33:51,000 In the mRNA, and this is crucial for translation, 305 00:33:51,000 --> 00:33:56,000 there are three parts that are really important. Two 306 00:33:56,000 --> 00:34:02,000 of them, excuse me. Two of them are actually added to 307 00:34:02,000 --> 00:34:08,000 the mRNA after it is transcribed. The thing at the very 5 prime end 308 00:34:08,000 --> 00:34:14,000 called the cap and something at the very 3 prime end, 309 00:34:14,000 --> 00:34:21,000 which is a long string of up to a couple of hundred contiguous A 310 00:34:21,000 --> 00:34:27,000 residues, which is called the poly A tail. And these parts of the mRNA 311 00:34:27,000 --> 00:34:34,000 are crucial for the first part of translation which is initiation. 
312 00:34:34,000 --> 00:34:39,000 As in replication and transcription, you can divide up these synthetic 313 00:34:39,000 --> 00:34:45,000 processes into different steps. And initiation is the first step. 314 00:34:45,000 --> 00:34:51,000 And in order for initiation to occur one needs the parts of the RNA 315 00:34:51,000 --> 00:34:57,000 that are added on post-transcriptionally. 316 00:34:57,000 --> 00:35:03,000 You need the cap, this poly A tail -- -- and also a region that is just 317 00:35:03,000 --> 00:35:11,000 upstream or 5 prime of this AUG initiator codon, a region called 318 00:35:11,000 --> 00:35:19,000 the UTR, the 5 prime UTR, which stands for untranslated region. 319 00:35:19,000 --> 00:35:26,000 And you also need the AUG codon. OK. And what happens is that the 320 00:35:26,000 --> 00:35:34,000 ribosome and various initiation proteins bind to the 5 prime cap and 321 00:35:34,000 --> 00:35:40,000 simultaneously to the poly A tail. So this is really cool. 322 00:35:40,000 --> 00:35:45,000 The mRNA is translated as a circle where this poly A tail, 323 00:35:45,000 --> 00:35:49,000 the very 3 prime end, is brought all the way around to the 5 prime end, 324 00:35:49,000 --> 00:35:54,000 and you get a whole mess of proteins sitting on that part of the RNA and 325 00:35:54,000 --> 00:36:00,000 starting translation. So you get initiation proteins, 326 00:36:00,000 --> 00:36:08,000 which are called initiation factors, and you get ribosome assembly where 327 00:36:08,000 --> 00:36:15,000 the small subunit and the large subunit come together, 328 00:36:15,000 --> 00:36:23,000 and you get a tRNA carrying a methionine amino acid coming and 329 00:36:23,000 --> 00:36:36,000 sitting on the AUG. 330 00:36:36,000 --> 00:36:40,000 OK. Let me show you more. 
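The parts of the mRNA needed for initiation (the 5 prime UTR, the AUG codon and the poly A tail) can be picked out of a sequence string in a rough way. This is a toy sketch with assumptions: the cap is a modified guanine and is not represented in the string, the poly A tail is taken to be the run of A residues at the 3 prime end, the first AUG is taken as the start, and the message and the name `mrna_parts` are invented.

```python
def mrna_parts(mrna):
    """Split a 5'->3' mRNA string into the parts named in the lecture.

    Simplification: every trailing A is counted as poly A tail.
    """
    body = mrna.rstrip("A")                 # message without the tail
    poly_a_tail = mrna[len(body):]
    start = body.find("AUG")
    if start == -1:
        raise ValueError("no AUG start codon found")
    return {
        "five_prime_utr": body[:start],     # upstream of the start codon
        "start_codon": body[start:start + 3],
        "downstream": body[start + 3:],     # everything after the AUG
        "poly_a_tail": poly_a_tail,
    }


# Made-up message: 5' UTR, then AUG and a short reading frame, then a tail.
message = "GGCACC" + "AUGUGGUUUUAA" + "GCUC" + "A" * 12
parts = mrna_parts(message)
for name, seq in parts.items():
    print(f"{name:15s} {seq}")
```

A real tail runs to a couple of hundred residues; twelve are used here only to keep the printout short.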
So here we have a cartoon, 331 00:36:40,000 --> 00:36:45,000 you have this in front of you but I'm going to show it to you in a 332 00:36:45,000 --> 00:36:50,000 step-wise fashion, of this ribosome recognition 333 00:36:50,000 --> 00:36:55,000 sequence. Actually, I'm not going to show you now but in 334 00:36:55,000 --> 00:37:00,000 your handout there are pictures of the circular RNAs being translated. 335 00:37:00,000 --> 00:37:04,000 OK? That's something new and it's something very interesting. 336 00:37:04,000 --> 00:37:09,000 I'm not going to dwell on it now. OK? Where the poly A tail comes 337 00:37:09,000 --> 00:37:13,000 all the way around to that 5 prime so-called cap region. 338 00:37:13,000 --> 00:37:18,000 I should just point out, again, I'm not going to dwell on it, 339 00:37:18,000 --> 00:37:22,000 the so-called 5 prime cap region is a modified guanine. 340 00:37:22,000 --> 00:37:27,000 OK? MeG stands for methyl guanine. You can call it the cap. It 341 00:37:27,000 --> 00:37:32,000 designates the very 5 prime end of the message. 342 00:37:32,000 --> 00:37:39,000 OK. So let us look at the sequence of translation. 343 00:37:39,000 --> 00:37:47,000 And what I'm going to tell you, before I go through the cartoon, is 344 00:37:47,000 --> 00:37:54,000 that in the elongation process sequential tRNAs carrying their 345 00:37:54,000 --> 00:38:02,000 particular amino acids are going to come in. 346 00:38:02,000 --> 00:38:06,000 And they're going to sit on these various codons. 347 00:38:06,000 --> 00:38:11,000 And peptide bonds are going to form between adjacent amino acids so you 348 00:38:11,000 --> 00:38:16,000 get the polypeptide chain growing. OK? So let's start off with the 349 00:38:16,000 --> 00:38:20,000 initiator. There's your tRNA that is joined to methionine. 350 00:38:20,000 --> 00:38:25,000 And I need to introduce you to a term now, which is charged. 351 00:38:25,000 --> 00:38:33,000 I didn't have space before. 
The term charged tRNA refers to the 352 00:38:33,000 --> 00:38:43,000 tRNA covalently linked to its amino acid. And then correspondingly the 353 00:38:43,000 --> 00:38:54,000 uncharged tRNA has no amino acid. The amino acid has fallen off or 354 00:38:54,000 --> 00:39:02,000 been used. OK. So there is a tRNA sitting on the 355 00:39:02,000 --> 00:39:06,000 first codon, the AUG, and that's the start of the sentence. 356 00:39:06,000 --> 00:39:11,000 That positions the beginning of the protein. Now, 357 00:39:11,000 --> 00:39:15,000 watch what happens. Here comes another tRNA that 358 00:39:15,000 --> 00:39:19,000 corresponds to leucine, and you're getting base pairing here. 359 00:39:19,000 --> 00:39:24,000 That first tRNA is base paired to the AUG codon through its anticodon. 360 00:39:24,000 --> 00:39:28,000 The second tRNA is base paired to the second codon through 361 00:39:28,000 --> 00:39:34,000 its anticodon. And now you've got a methionine tRNA 362 00:39:34,000 --> 00:39:40,000 sitting next to a leucine tRNA. OK. Everyone with me here? And 363 00:39:40,000 --> 00:39:46,000 what happens now is that a peptide bond forms between the methionine 364 00:39:46,000 --> 00:39:52,000 and the leucine. In particular, this methionine is 365 00:39:52,000 --> 00:39:58,000 going to move over to that leucine over there, and that leads to uncharging of 366 00:39:58,000 --> 00:40:04,000 the methionine tRNA. Take a look. OK, 367 00:40:04,000 --> 00:40:10,000 so I've shown you that methionine is going to form a peptide bond with 368 00:40:10,000 --> 00:40:16,000 the leucine. Now, watch what happens next. 369 00:40:16,000 --> 00:40:21,000 Here's the methionine tRNA. It's lost its amino acid, OK, 370 00:40:21,000 --> 00:40:27,000 so it falls off the message. It's done its thing. It's 371 00:40:27,000 --> 00:40:33,000 no longer needed. 
Now, 372 00:40:33,000 --> 00:40:39,000 sitting there is this leucine tRNA, and its leucine is now covalently attached 373 00:40:39,000 --> 00:40:45,000 through a peptide bond to the methionine. And there's a free amino end here 374 00:40:45,000 --> 00:40:51,000 which designates the first amino acid synthesized in a polypeptide 375 00:40:51,000 --> 00:40:57,000 chain. And here comes in the next tRNA, which is a serine 376 00:40:57,000 --> 00:41:03,000 tRNA, base paired by its anticodon to the 377 00:41:03,000 --> 00:41:08,000 codon on the mRNA. OK? And the same thing is going to 378 00:41:08,000 --> 00:41:13,000 happen again. The leucine and the methionine are going to be 379 00:41:13,000 --> 00:41:17,000 transferred over and make a peptide bond with the serine, 380 00:41:17,000 --> 00:41:22,000 and so you get elongation of the polypeptide chain. 381 00:41:22,000 --> 00:41:27,000 So what I'm going to write under elongation is that adjacent 382 00:41:27,000 --> 00:41:35,000 amino acids join. 383 00:41:35,000 --> 00:41:44,000 Uncharged tRNAs leave, are released, and sequentially new 384 00:41:44,000 --> 00:41:54,000 tRNAs corresponding to codons come in. 385 00:41:54,000 --> 00:42:20,000 OK. All right. 386 00:42:20,000 --> 00:42:26,000 So this whole process goes on until the ribosome and all 387 00:42:26,000 --> 00:42:31,000 these tRNAs reach a place in the mRNA where there is a codon that 388 00:42:31,000 --> 00:42:37,000 doesn't correspond to an amino acid. 389 00:42:37,000 --> 00:42:43,000 A so-called stop codon. And at this point there is a 390 00:42:43,000 --> 00:42:49,000 process called termination, where there is a stop codon that does not 391 00:42:49,000 --> 00:42:55,000 code for any amino acid and therefore doesn't have a corresponding tRNA. 392 00:42:55,000 --> 00:43:01,000 And at this point the polypeptide chain falls 393 00:43:01,000 --> 00:43:12,000 off the message. 
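The initiation, elongation and termination steps just described can be sketched in a few lines of Python. This is only a rough illustration, assuming a tiny hand-picked subset of the standard genetic code and a made-up mRNA string. The names `CODON_TABLE` and `translate` are hypothetical, and picking the first AUG in the string is a simplification of how the ribosome finds the start.

```python
# Illustrative subset of the standard genetic code (mRNA codons), not the full table.
# None marks a stop codon: no tRNA and no amino acid corresponds to it.
CODON_TABLE = {
    "AUG": "Met", "CUG": "Leu", "UCU": "Ser", "UGG": "Trp",
    "UAA": None, "UAG": None, "UGA": None,
}

def translate(mrna):
    """Start at the first AUG, add one amino acid per codon, stop at a stop codon."""
    start = mrna.find("AUG")          # initiation: locate the start codon
    if start == -1:
        return []                     # no start codon, so nothing is made
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE[codon]   # raises KeyError outside this small subset
        if amino_acid is None:            # termination: the chain is released
            break
        peptide.append(amino_acid)        # elongation: the chain grows by one
    return peptide

# Made-up message: 5' leader, then AUG CUG UCU UAG, then trailing bases.
print(translate("GGAUGCUGUCUUAGCC"))  # ['Met', 'Leu', 'Ser']
```

The list grows in the same order the lecture describes: methionine first, at the free amino end, then leucine, then serine.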
394 00:43:12,000 --> 00:43:18,000 All right. You guys OK with that? OK. I'm going to refer you to a movie. I'm 395 00:43:18,000 --> 00:43:24,000 not going to go and watch this movie now. 396 00:43:24,000 --> 00:43:30,000 Go and watch the movie by yourselves. OK? 397 00:43:30,000 --> 00:43:34,000 I don't want to take the time to watch it now. It's an animation of 398 00:43:34,000 --> 00:43:38,000 what I've just told you. There are some diagrams in your book. 399 00:43:38,000 --> 00:43:42,000 You can look at them. They talk about things called A 400 00:43:42,000 --> 00:43:46,000 sites and P sites in the ribosome. To me that is less important than 401 00:43:46,000 --> 00:43:50,000 that you understand the actual interactions between the tRNAs and 402 00:43:50,000 --> 00:43:54,000 the mRNAs. Here is a circular RNA with that poly A tail and the 5 403 00:43:54,000 --> 00:43:58,000 prime cap binding proteins to initiate translation 404 00:43:58,000 --> 00:44:02,000 as a circular RNA. All right. So, 405 00:44:02,000 --> 00:44:07,000 finally, let's move to this complicated, I think fantastic 406 00:44:07,000 --> 00:44:12,000 bringing together of mutation from genotype to phenotype. 407 00:44:12,000 --> 00:44:18,000 You've had a genetics module where you talked about mutations, 408 00:44:18,000 --> 00:44:23,000 you talked about the genotype, you talked about the phenotype. 409 00:44:23,000 --> 00:44:28,000 We've been throwing at you that genotype has got something to do with the DNA 410 00:44:28,000 --> 00:44:34,000 base sequence. Phenotype has got something to do 411 00:44:34,000 --> 00:44:40,000 with the final product, particularly the protein sequence. 412 00:44:40,000 --> 00:44:47,000 Let's explore that in a bit more detail now and ask, 413 00:44:47,000 --> 00:44:53,000 what is the molecular basis for changes in genotype and how do these 414 00:44:53,000 --> 00:45:00,000 correspond to changes in phenotype? OK. So genotype to phenotype. 
415 00:45:00,000 --> 00:45:04,000 And I want to emphasize again that phenotype is an outcome of a change 416 00:45:04,000 --> 00:45:09,000 in function of the final product of a gene. It isn't necessarily the 417 00:45:09,000 --> 00:45:14,000 same as the final product of a gene. OK? So, for example, a phenotype 418 00:45:14,000 --> 00:45:19,000 is gigantism. Someone who is very tall. The molecular basis for that 419 00:45:19,000 --> 00:45:24,000 could be multiple things. It could be production of too much 420 00:45:24,000 --> 00:45:29,000 of a hormone, a protein called growth hormone, so that someone grows 421 00:45:29,000 --> 00:45:34,000 too tall or taller than normal. OK? So the phenotype is 422 00:45:34,000 --> 00:45:39,000 connected to the production of a particular protein, but it's not the 423 00:45:39,000 --> 00:45:44,000 same as that protein. So here's another diagram, something for you to think about. 424 00:45:44,000 --> 00:45:50,000 Mutations, almost anywhere in a gene, can have an effect on the protein 425 00:45:50,000 --> 00:45:55,000 produced. And there are two ways the protein produced 426 00:45:55,000 --> 00:46:00,000 can be affected. One is in the amount of protein and 427 00:46:00,000 --> 00:46:04,000 the other is in the sequence of the protein produced. 428 00:46:04,000 --> 00:46:09,000 Now, if one gets a mutation in this promoter region or often in the 429 00:46:09,000 --> 00:46:13,000 introns, but particularly the promoter I've focused on, 430 00:46:13,000 --> 00:46:18,000 one can change the amount of RNA that is being transcribed from a 431 00:46:18,000 --> 00:46:22,000 particular gene. And that change in the amount of 432 00:46:22,000 --> 00:46:27,000 RNA will lead to a change in the amount of protein. 433 00:46:27,000 --> 00:46:34,000 And you may get a phenotype because you're making too little or too much 434 00:46:34,000 --> 00:46:41,000 protein. 
Conversely, changes in exons can lead to changes 435 00:46:41,000 --> 00:46:48,000 in the actual sequence, the amino acid sequence of the 436 00:46:48,000 --> 00:46:55,000 protein and, therefore, to its function. So those are two 437 00:46:55,000 --> 00:47:02,000 important distinctions to make. OK? So mutations can change the 438 00:47:02,000 --> 00:47:10,000 amount or the sequence of a protein. 439 00:47:10,000 --> 00:47:15,000 I'm going to go through some examples of mutations, 440 00:47:15,000 --> 00:47:20,000 and you will go through more in Section, and you are expected to 441 00:47:20,000 --> 00:47:25,000 know these changes and you are expected to know how the change in 442 00:47:25,000 --> 00:47:30,000 DNA sequence may or may not lead to a change in protein sequence. 443 00:47:30,000 --> 00:47:34,000 So look carefully. I'm not going to get through all my 444 00:47:34,000 --> 00:47:38,000 examples today. You can go and you can do the 445 00:47:38,000 --> 00:47:42,000 examples that are posted on your website. You'll get more practice. 446 00:47:42,000 --> 00:47:47,000 And you really need to know this. OK, so here's a wild type gene. 447 00:47:47,000 --> 00:47:51,000 The top two strands are the DNA. The bottom of the two strands is the 448 00:47:51,000 --> 00:47:55,000 template strand. This DNA is transcribed into an mRNA 449 00:47:55,000 --> 00:48:00,000 and that is translated into the protein indicated here. 450 00:48:00,000 --> 00:48:06,000 OK? Let's look at an example of what happens when there is a change 451 00:48:06,000 --> 00:48:13,000 in the DNA. So here I've got a change in the DNA, 452 00:48:13,000 --> 00:48:19,000 OK, such that this particular base pair has been changed. 453 00:48:19,000 --> 00:48:26,000 The mRNA, oh, this is the wild type again. Here's your wild type 454 00:48:26,000 --> 00:48:33,000 sequence, wild type mRNA, wild type protein. 455 00:48:33,000 --> 00:48:37,000 This is a class of change in the DNA that is called a nonsense mutation. 
456 00:48:37,000 --> 00:48:42,000 Watch carefully. So at this position that I've underlined, 457 00:48:42,000 --> 00:48:46,000 watch this. Don't try to write anything down. 458 00:48:46,000 --> 00:48:51,000 OK? You'll have plenty of practice. This is all posted. 459 00:48:51,000 --> 00:48:55,000 Just watch. At this particular underlined position, 460 00:48:55,000 --> 00:49:00,000 instead of a GC base pair there is now an AT base pair. 461 00:49:00,000 --> 00:49:06,000 And that changes this codon UGG into UAG. And UAG happens to be a stop 462 00:49:06,000 --> 00:49:13,000 codon. So here's your gene, your mutant gene, here's your mRNA 463 00:49:13,000 --> 00:49:19,000 that comes from the mutant gene, and here's the protein. It starts 464 00:49:19,000 --> 00:49:26,000 OK with a methionine. But, look, the next codon is a stop. 465 00:49:26,000 --> 00:49:32,000 OK? So the protein is truncated. Now, there are a number of classes 466 00:49:32,000 --> 00:49:36,000 of mutation. I am going to write these on the board. 467 00:49:36,000 --> 00:49:41,000 I'm going to ask you to go and read your handout carefully. 468 00:49:41,000 --> 00:49:46,000 And you will cover these in Section. Again, you need to know them so let 469 00:49:46,000 --> 00:49:50,000 me list the types of mutation. In the interest of time, I'm not 470 00:49:50,000 --> 00:49:55,000 going to go through them, but you will be able to work through 471 00:49:55,000 --> 00:50:00,000 these examples both in Section and on your own. 
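For practice, the nonsense example can be traced in a short Python sketch. The slide's actual sequence is not in this transcript, so the nine-base template strands below are hypothetical, chosen only so that the second codon is UGG in the wild type and UAG in the mutant. The helper names `transcribe` and `translate` and the small codon table are likewise assumptions for illustration.

```python
# Illustrative subset of the genetic code; None marks a stop codon.
CODON_TABLE = {"AUG": "Met", "UGG": "Trp", "GAA": "Glu", "UAG": None}

# Base pairing from a DNA template base to the RNA base made opposite it.
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the mRNA (5' to 3') made from a template strand written 3' to 5'."""
    return "".join(RNA_COMPLEMENT[base] for base in template_3_to_5)

def translate(mrna):
    """Read codons from the first base until a stop codon or the end."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:      # stop codon: the protein ends here
            break
        peptide.append(amino_acid)
    return peptide

# Hypothetical template strands, written 3' to 5'.
wild_type_template = "TACACCCTT"
mutant_template = "TACATCCTT"   # one C changed to T: the GC pair is now an AT pair

for name, template in [("wild type", wild_type_template), ("mutant", mutant_template)]:
    mrna = transcribe(template)
    print(name, mrna, translate(mrna))
# wild type AUGUGGGAA ['Met', 'Trp', 'Glu']
# mutant AUGUAGGAA ['Met']
```

A single base pair change turns UGG into UAG, so the mutant protein stops after the methionine, which is the truncation described above.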
472 00:50:00,000 --> 00:50:05,000 So, to end off, the mutations in exons that you 473 00:50:05,000 --> 00:50:11,000 should know are silent mutations that don't change the sequence of 474 00:50:11,000 --> 00:50:16,000 the protein, nonsense mutations that I've just covered, 475 00:50:16,000 --> 00:50:22,000 something called missense mutations which change the amino acid sequence, 476 00:50:22,000 --> 00:50:27,000 and something called frameshift mutations which also are likely to 477 00:50:27,000 --> 00:50:33,000 change the amino acid sequence. 478 00:50:33,000 --> 00:50:37,000 OK? As I say, this will be covered. 479 00:50:37,000 --> 00:50:42,000 If you want to come and see me personally in office hours tomorrow 480 00:50:42,000 --> 00:50:46,000 or the next day, please do, and I'll go through these 481 00:50:46,000 --> 00:50:49,000 examples with you.
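As a study aid, the four classes just listed can be told apart with a small Python sketch that compares a wild type mRNA with a mutant one. Everything here is an assumption for illustration: the sequences are made up, the codon table is a small subset of the standard code, and `classify` is a hypothetical helper that uses a simplified rule (a length change that is not a multiple of three is called a frameshift; otherwise the two proteins are compared).

```python
# Illustrative subset of the genetic code; None marks a stop codon.
CODON_TABLE = {
    "AUG": "Met", "UGG": "Trp", "GAA": "Glu", "GAG": "Glu", "GAU": "Asp",
    "UAA": None, "UAG": None,
}

def translate(mrna):
    """Read codons from the first base until a stop codon or the end."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        peptide.append(amino_acid)
    return peptide

def classify(wild_mrna, mutant_mrna):
    """Simplified classification of an exon mutation by its effect on the protein."""
    # Inserting or deleting a number of bases not divisible by 3 shifts the reading frame.
    if (len(mutant_mrna) - len(wild_mrna)) % 3 != 0:
        return "frameshift"
    wild, mutant = translate(wild_mrna), translate(mutant_mrna)
    if mutant == wild:
        return "silent"        # same amino acid sequence
    if len(mutant) < len(wild) and mutant == wild[:len(mutant)]:
        return "nonsense"      # premature stop: truncated protein
    return "missense"          # at least one amino acid changed

wild = "AUGUGGGAAUAA"                      # Met Trp Glu stop (made-up example)
print(classify(wild, "AUGUGGGAGUAA"))      # silent: GAA and GAG both code for Glu
print(classify(wild, "AUGUAGGAAUAA"))      # nonsense: UGG became the stop codon UAG
print(classify(wild, "AUGUGGGAUUAA"))      # missense: Glu became Asp
print(classify(wild, "AUGUUGGGAAUAA"))     # frameshift: one base inserted
```

The length check is only a rough stand-in for a frameshift test; it does not treat insertions or deletions of exactly three bases as their own class.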