1 00:00:15,000 --> 00:00:21,000 OK. And here we are in the molecular biology section. 2 00:00:21,000 --> 00:00:27,000 And the goal of this section, as Professor Jacks started to tell 3 00:00:27,000 --> 00:00:33,000 you during the Genetics module and Professor Baker told you at the 4 00:00:33,000 --> 00:00:39,000 beginning of last lecture, is to try to link together in molecular terms the 5 00:00:39,000 --> 00:00:45,000 question of genotype and the question of phenotype. 6 00:00:45,000 --> 00:00:51,000 And we presented to you this notion that goes by the ponderous name of 7 00:00:51,000 --> 00:00:58,000 the central dogma, that the link between genotype and phenotype is 8 00:00:58,000 --> 00:01:04,000 related to DNA as a genetic material that then proceeds to transmit its 9 00:01:04,000 --> 00:01:11,000 information to a final outcome, which is very often a protein, 10 00:01:11,000 --> 00:01:17,000 through an RNA intermediate. And the point of these molecular 11 00:01:17,000 --> 00:01:22,000 biology lectures is to tell you about the molecular biology, 12 00:01:22,000 --> 00:01:27,000 and then at the end to try to bring together this genotype and phenotype 13 00:01:27,000 --> 00:01:32,000 in molecular terms. Now, last lecture you talked about 14 00:01:32,000 --> 00:01:36,000 DNA replication, DNA as the genetic material required 15 00:01:36,000 --> 00:01:41,000 to be replicated faithfully and accurately so it can transmit its 16 00:01:41,000 --> 00:01:46,000 information to the next generation. Professor Baker I know stressed the 17 00:01:46,000 --> 00:01:50,000 requirement for accurate replication, but she did not do one part of this 18 00:01:50,000 --> 00:01:55,000 lecture that I want to spend a couple of minutes now discussing 19 00:01:55,000 --> 00:02:00,000 with you. And that is the question of DNA repair. 20 00:02:00,000 --> 00:02:13,000 So there are two types of DNA repair. 21 00:02:13,000 --> 00:02:19,000 Actually, three types that I want to talk to you about. 
22 00:02:19,000 --> 00:02:25,000 And the first pertains to the accuracy of the DNA polymerase that 23 00:02:25,000 --> 00:02:32,000 replicates the DNA. So DNA polymerase -- 24 00:02:32,000 --> 00:02:40,000 -- three, or the DNA polymerase that 25 00:02:40,000 --> 00:02:45,000 replicates the DNA makes mistakes. It puts in the wrong nucleotide. 26 00:02:45,000 --> 00:02:51,000 It puts in the wrong base. And it does so about one in ten to 27 00:02:51,000 --> 00:02:56,000 the fifth bases. OK? So one in a hundred thousand 28 00:02:56,000 --> 00:03:01,000 bases is wrong. Now, if you think about the fact 29 00:03:01,000 --> 00:03:05,000 that there are more than ten to the ninth bases in a human genome, 30 00:03:05,000 --> 00:03:09,000 every cell cycle that translates to ten thousand or so mistakes, 31 00:03:09,000 --> 00:03:14,000 that's a lot of changes in the DNA. That's not a very faithful kind of 32 00:03:14,000 --> 00:03:18,000 DNA replication. So this has been selected against 33 00:03:18,000 --> 00:03:22,000 evolutionarily. And there is a mechanism that's 34 00:03:22,000 --> 00:03:30,000 called proofreading -- 35 00:03:30,000 --> 00:03:35,000 -- that allows this high error rate to be corrected. And it's actually 36 00:03:35,000 --> 00:03:40,000 very clever. So this DNA polymerase has what is called an 37 00:03:40,000 --> 00:03:45,000 exonuclease activity. Exo meaning out. Nuclease meaning 38 00:03:45,000 --> 00:03:50,000 to break down nucleic acids. And this exonuclease proceeds from 39 00:03:50,000 --> 00:03:55,000 the 3 prime to the 5 prime direction, the 3 prime nucleotide being the one 40 00:03:55,000 --> 00:04:00,000 that was added last as you should now know. 41 00:04:00,000 --> 00:04:04,000 And so what DNA polymerase does as it is replicating is it kind of 42 00:04:04,000 --> 00:04:08,000 feels whether or not the double helix has reformed in a smooth way. 
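A quick back-of-the-envelope check of the error arithmetic above, written as a minimal Python sketch. The function name `expected_errors` is made up for illustration, and the genome size is the round figure of ten to the ninth bases quoted in the lecture, not a measured value.

```python
def expected_errors(genome_bases: int, bases_per_error: int) -> float:
    """Expected number of wrong bases in one round of replication,
    assuming errors occur independently at a fixed rate."""
    return genome_bases / bases_per_error

# Round numbers quoted in the lecture: a genome of about ten to the ninth bases,
# and one misincorporated base per ten to the fifth bases copied.
uncorrected = expected_errors(10**9, 10**5)
print(f"about {uncorrected:,.0f} mistakes per cell cycle")
```

The same function can be called with any other error rate to see how much a correction mechanism would reduce the count.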
43 00:04:08,000 --> 00:04:12,000 And if it feels that there is a bubble there, a bubble where the two 44 00:04:12,000 --> 00:04:17,000 bases, actually look at me rather than the diagram. 45 00:04:17,000 --> 00:04:21,000 I think it's easier. I can do it better with my hands. 46 00:04:21,000 --> 00:04:25,000 Where you've got a nice smooth helix, if there is a mismatched 47 00:04:25,000 --> 00:04:30,000 nucleotide, the bases do not pair, there will be a bubble. 48 00:04:30,000 --> 00:04:34,000 OK? There will be a bubble in the helix or a space in the helix. 49 00:04:34,000 --> 00:04:39,000 The two strands will not be joined together. And the DNA polymerase 50 00:04:39,000 --> 00:04:43,000 can sense this and it can go back and it excises the wrong nucleotide 51 00:04:43,000 --> 00:04:48,000 and puts in the correct one. OK? This is called proofreading. 52 00:04:48,000 --> 00:04:53,000 And it's extremely necessary and it's actually very good. 53 00:04:53,000 --> 00:04:57,000 And what it does is to decrease the error rate to one in ten 54 00:04:57,000 --> 00:05:02,000 to the ninth bases. OK? So you get four orders of 55 00:05:02,000 --> 00:05:07,000 magnitude improvement in the accuracy of DNA replication. 56 00:05:07,000 --> 00:05:12,000 Now, there is another set of things that can go wrong. 57 00:05:12,000 --> 00:05:18,000 And these actually fall under the heading of mutagens. 58 00:05:18,000 --> 00:05:23,000 Mutagens, as Professor Jacks mentioned to you, 59 00:05:23,000 --> 00:05:28,000 being agents which change the base sequence of the DNA once 60 00:05:28,000 --> 00:05:33,000 the DNA is there. And these can either be chemical or 61 00:05:33,000 --> 00:05:37,000 these can be ionizing radiation. And in those cases also the helix 62 00:05:37,000 --> 00:05:41,000 gets changed because the wrong base gets put in. No, 63 00:05:41,000 --> 00:05:45,000 not because the wrong base gets put in. 
But because there is a chemical 64 00:05:45,000 --> 00:05:49,000 reaction which might modify a base, which might, for example, covalently 65 00:05:49,000 --> 00:05:53,000 link two bases. thymine for example. 66 00:05:53,000 --> 00:05:57,000 If two thymines are sitting next to one another in the helix, 67 00:05:57,000 --> 00:06:02,000 ultraviolet light is very good at cross linking those. 68 00:06:02,000 --> 00:06:06,000 And you now have something called a thymine dimer. 69 00:06:06,000 --> 00:06:10,000 And that is very bad because that is not a normal base sequence. 70 00:06:10,000 --> 00:06:14,000 And when replication time comes along that DNA helix is abnormal and 71 00:06:14,000 --> 00:06:19,000 the replication machinery doesn't know what to do about it, 72 00:06:19,000 --> 00:06:23,000 and that can lead to all sorts of problems and to mutations. 73 00:06:23,000 --> 00:06:28,000 So there are mechanisms that can get rid of abnormal bases. 74 00:06:28,000 --> 00:06:34,000 So mutagens can chemically, actually, maybe not say chemically. 75 00:06:34,000 --> 00:06:41,000 Let me just say change bases. They change base structure either to 76 00:06:41,000 --> 00:06:48,000 something that looks like another normal new base or to something that 77 00:06:48,000 --> 00:06:55,000 looks abnormal. And there are two mechanisms to get 78 00:06:55,000 --> 00:07:02,000 rid of this. One is called excision repair and the other is called 79 00:07:02,000 --> 00:07:08,000 mismatch repair. I have them written in the reverse 80 00:07:08,000 --> 00:07:12,000 order than is on this diagram from your book. In mismatch repair there 81 00:07:12,000 --> 00:07:16,000 is one nucleotide that looks normal, but it's different. It doesn't 82 00:07:16,000 --> 00:07:20,000 match the, usually it looks normal. It doesn't match the one opposite 83 00:07:20,000 --> 00:07:24,000 to it. 
And in that case the repair machinery can go in and remove the 84 00:07:24,000 --> 00:07:28,000 abnormal or the mismatched nucleotide, and there's another 85 00:07:28,000 --> 00:07:33,000 enzyme that will go and correct it. In excision repair, 86 00:07:33,000 --> 00:07:37,000 one very often, excision repair occurs when, for example, 87 00:07:37,000 --> 00:07:41,000 two nucleotides have become covalently linked to one another, 88 00:07:41,000 --> 00:07:45,000 and the one strand of the helix is just a mess. And there is an enzyme, 89 00:07:45,000 --> 00:07:49,000 or enzyme complex that will go in and actually excise a little chunk 90 00:07:49,000 --> 00:07:53,000 of the helix. And then another enzyme will come in and fill in the 91 00:07:53,000 --> 00:07:58,000 gap so that you get the helix repaired. 92 00:07:58,000 --> 00:08:02,000 Now, the challenge in this, and you may be asking yourselves 93 00:08:02,000 --> 00:08:07,000 this, is how does this repair machinery know which the correct 94 00:08:07,000 --> 00:08:12,000 strand was? In the case of proofreading it's very interesting 95 00:08:12,000 --> 00:08:17,000 because initially after replication the newly synthesized DNA strand is 96 00:08:17,000 --> 00:08:22,000 not modified. It's just a normal nucleotide polymer. 97 00:08:22,000 --> 00:08:27,000 However, the template strand, the template strands, the parental 98 00:08:27,000 --> 00:08:32,000 strands over time become chemically modified. 99 00:08:32,000 --> 00:08:35,000 The bases actually get, especially adenine gets some methyl 100 00:08:35,000 --> 00:08:39,000 groups added to it. And this is different than the 101 00:08:39,000 --> 00:08:43,000 newly synthesized strand, which doesn't have these methyl groups. 102 00:08:43,000 --> 00:08:47,000 And so the polymerase knows which strand is the old strand and the 103 00:08:47,000 --> 00:08:50,000 correct one and which is the new strand and the incorrect one. 
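The strand-choice problem described above can be sketched in code. This is a toy model, not the real enzymology: the `Strand` class, the `methylated` flag, and `correct_new_strand` are hypothetical names, and the two strands are written aligned position by position (ignoring their antiparallel orientation) to keep the example short.

```python
from dataclasses import dataclass

# Watson-Crick partners for DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

@dataclass
class Strand:
    bases: str
    methylated: bool  # True for the older, parental strand

def correct_new_strand(a, b):
    """Use the methylated (parental) strand as the reference. Return the
    corrected new strand and the list of positions that had to be changed."""
    if a.methylated == b.methylated:
        # No methyl mark distinguishes old from new: the correct strand is unknown.
        raise ValueError("cannot tell the parental strand from the new one")
    parental, new = (a, b) if a.methylated else (b, a)
    fixed = "".join(PAIR[base] for base in parental.bases)
    changed = [i for i, (old, good) in enumerate(zip(new.bases, fixed)) if old != good]
    return fixed, changed

parental = Strand("GATC", methylated=True)
new = Strand("CTGG", methylated=False)  # position 2 should pair with T, so it should be A
print(correct_new_strand(parental, new))
```

When neither strand (or both) carries the mark, the function refuses to guess, which mirrors the case where the cell cannot tell which strand is correct.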
104 00:08:50,000 --> 00:08:54,000 In the case of excision and mismatch repair, 105 00:08:54,000 --> 00:08:58,000 that's sometimes not clear. Where you've got these thymine 106 00:08:58,000 --> 00:09:02,000 dimers, these Ts that are joined together then that's clearly the 107 00:09:02,000 --> 00:09:06,000 wrong, that's clearly wrong. OK? The enzymatic machinery can 108 00:09:06,000 --> 00:09:10,000 take that out and copy the other strand. Sometimes, 109 00:09:10,000 --> 00:09:14,000 though, if you just have a chemical conversion of one base to another, 110 00:09:14,000 --> 00:09:18,000 the repair machinery does not know which strand is the correct one and 111 00:09:18,000 --> 00:09:22,000 which isn't. And that's when you'll get mutations fixed in the DNA 112 00:09:22,000 --> 00:09:26,000 because at replication you really may get the changing, 113 00:09:26,000 --> 00:09:30,000 you may get the incorrect, you may not get the correct base 114 00:09:30,000 --> 00:09:36,000 repairs. And then that incorrect base will be 115 00:09:36,000 --> 00:09:42,000 passed on to the next generation. OK. 116 00:09:42,000 --> 00:09:48,000 So this is a very rapid zip through DNA repair that I wanted you to be 117 00:09:48,000 --> 00:09:54,000 able to think about. I want to move on to the next step 118 00:09:54,000 --> 00:10:01,000 in the transmission of information from gene to final product today. 119 00:10:01,000 --> 00:10:06,000 And I want to talk to you about the generation of RNA. 120 00:10:06,000 --> 00:10:11,000 And so let us begin with a quiz. 
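Looking back at excision repair of a thymine dimer, the cut-and-refill idea can be sketched as follows. This is an illustrative simplification only: `excision_repair`, the `X` marker for damaged bases, and the `flank` width are all assumptions made for the example, and the strands are written aligned position by position rather than antiparallel.

```python
# Watson-Crick partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def excision_repair(damaged, intact_partner, lesion_start, lesion_end, flank=2):
    """Cut out the lesion plus `flank` bases on each side, then refill the gap
    by complementing the intact partner strand position by position."""
    start = max(0, lesion_start - flank)
    end = min(len(damaged), lesion_end + flank)
    patch = "".join(COMPLEMENT[base] for base in intact_partner[start:end])
    return damaged[:start] + patch + damaged[end:]

# "XX" stands in for two cross-linked thymines at positions 2 and 3.
damaged = "GAXXACAGG"
partner = "CTAATGTCC"  # undamaged strand, aligned with the damaged one
print(excision_repair(damaged, partner, lesion_start=2, lesion_end=4))
```

The repair works here only because the partner strand is intact, which is why a visibly abnormal lesion is the easy case.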
And I have for you a new incentive 121 00:10:11,000 --> 00:10:17,000 to pay attention, a new prize that you can use to 122 00:10:17,000 --> 00:10:22,000 think about the conversion of potential to kinetic energy, 123 00:10:22,000 --> 00:10:27,000 and also you can use to amuse yourself when you're downloading 124 00:10:27,000 --> 00:10:33,000 very poor, when you're downloading things from the Internet and have 125 00:10:33,000 --> 00:10:38,000 nothing better to do. I can usually get this right across 126 00:10:38,000 --> 00:10:42,000 the room. There you go. You can also use it to think about 127 00:10:42,000 --> 00:10:46,000 the nature of amphibians, they're nice flying frogs. 128 00:10:46,000 --> 00:10:51,000 OK. So let us pose the question here, what is RNA? 129 00:10:51,000 --> 00:11:01,000 And you've had some of this on a 130 00:11:01,000 --> 00:11:06,000 problem set, but you really need to know what I'm talking about. 131 00:11:06,000 --> 00:11:12,000 This is a ribonucleotide. How do I know that this is a ribonucleotide? 132 00:11:12,000 --> 00:11:17,000 Think about it. You can put your hands up, but I want everyone to 133 00:11:17,000 --> 00:11:22,000 think about it. OK. And you need to identify the 134 00:11:22,000 --> 00:11:28,000 precise chemical group, please, that tells me. I saw you 135 00:11:28,000 --> 00:11:38,000 two first, so yes. 136 00:11:38,000 --> 00:11:43,000 What does the lower right mean? Give me a name. It's the? There's 137 00:11:43,000 --> 00:11:49,000 a number there. The? Ah, we have a discrepancy of 138 00:11:49,000 --> 00:11:55,000 opinion here. Someone says it's a 3 prime hydroxyl on the ribose and 139 00:11:55,000 --> 00:12:01,000 someone says it's the 2 prime hydroxyl on the ribose. 140 00:12:01,000 --> 00:12:04,000 Let's take a vote. Who thinks that this is identified 141 00:12:04,000 --> 00:12:08,000 as a ribose because of this 2 prime hydroxyl? Thank you. 
142 00:12:08,000 --> 00:12:12,000 And who believes it's the 3 prime hydroxyl that identifies ribose? 143 00:12:12,000 --> 00:12:16,000 OK. We have a smaller but firm contingent. In fact, 144 00:12:16,000 --> 00:12:20,000 it's the 2 prime hydroxyl that identifies this as ribose. 145 00:12:20,000 --> 00:12:24,000 You remember, and you really need to remember that this 3 prime 146 00:12:24,000 --> 00:12:28,000 hydroxyl is the reactive group that allows the sugar phosphate backbone 147 00:12:28,000 --> 00:12:32,000 to polymerize. This 2 prime hydroxyl is a reactive 148 00:12:32,000 --> 00:12:38,000 group. It identifies this as ribose rather than deoxyribose, 149 00:12:38,000 --> 00:12:43,000 and it also is an additional reactive group. 150 00:12:43,000 --> 00:12:49,000 And the fact that it is a reactive group makes RNA rather labile. 151 00:12:49,000 --> 00:12:54,000 OK? So let's write a couple of important things here. 152 00:12:54,000 --> 00:13:00,000 So this is RNA as the nucleic acid polymer. 153 00:13:00,000 --> 00:13:06,000 You should really know this. Ribose has both a 3 prime hydroxyl 154 00:13:06,000 --> 00:13:13,000 and this 2 prime hydroxyl. And this is a reactive group. 155 00:13:13,000 --> 00:13:19,000 And because of this RNA is a much less stable polymer than DNA. 156 00:13:19,000 --> 00:13:26,000 Here's another one. What type of polynucleotide is this and how do 157 00:13:26,000 --> 00:13:33,000 you know? Yes. You. OK. It's RNA. 158 00:13:33,000 --> 00:13:41,000 And we know it's RNA because of these uracil groups. 159 00:13:41,000 --> 00:13:49,000 OK? So uracil is an alternate base to thymine that's found only in RNA. 160 00:13:49,000 --> 00:13:57,000 Here are the Us. It tells you it's RNA. OK? So you need to know those 161 00:13:57,000 --> 00:14:07,000 facts about RNA. 162 00:14:07,000 --> 00:14:12,000 Good. So let me pose a question to you. 
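The uracil test from the quiz is easy to express as a rule of thumb in code. The helper `classify_polynucleotide` below is a hypothetical name, and it only looks at the base letters in a sequence; it cannot see the 2 prime hydroxyl, which is the chemical feature that actually defines ribose.

```python
def classify_polynucleotide(sequence: str) -> str:
    """Guess RNA or DNA from the bases alone: U suggests RNA, T suggests DNA."""
    bases = set(sequence.upper())
    if not bases <= set("ACGTU"):
        raise ValueError("unexpected characters in sequence")
    if "U" in bases and "T" in bases:
        return "ambiguous"
    if "U" in bases:
        return "RNA"
    if "T" in bases:
        return "DNA"
    return "ambiguous"  # only A, C, G present: the bases alone cannot decide

print(classify_polynucleotide("AUGGCUUAA"))
print(classify_polynucleotide("ATGGCTTAA"))
```

A sequence with no U and no T is reported as ambiguous, since the written bases alone do not settle the question.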
In this litany that you've had 163 00:14:12,000 --> 00:14:18,000 several times now where the flow of information moves from DNA to RNA to 164 00:14:18,000 --> 00:14:24,000 protein, why is the RNA there? This is a rhetorical question. 165 00:14:24,000 --> 00:14:30,000 I'm going to try to answer it for you. 166 00:14:30,000 --> 00:14:34,000 Why is the RNA there? Why is there an RNA intermediate? 167 00:14:34,000 --> 00:14:47,000 You could imagine that the DNA 168 00:14:47,000 --> 00:14:52,000 double helix could open up and that nucleic acid could be directly 169 00:14:52,000 --> 00:14:57,000 translated or could be directly converted or the code could be 170 00:14:57,000 --> 00:15:03,000 changed to form a protein without any RNA intermediate. 171 00:15:03,000 --> 00:15:07,000 But, in fact, universally throughout biology, throughout our world anyway, 172 00:15:07,000 --> 00:15:12,000 throughout our earth, RNA is there as an intermediate. 173 00:15:12,000 --> 00:15:16,000 Why? Well, I think the answer actually lies in evolution. 174 00:15:16,000 --> 00:15:21,000 RNA is probably the most ancient of the information polymers. 175 00:15:21,000 --> 00:15:26,000 That is widely believed now. So RNA is ancient. It was the 176 00:15:26,000 --> 00:15:30,000 first, strongly believed now that it was the first information 177 00:15:30,000 --> 00:15:35,000 carrying polymer. RNAs themselves were catalytic. 178 00:15:35,000 --> 00:15:39,000 They became able to replicate. And they also probably became able to be 179 00:15:39,000 --> 00:15:43,000 translated into protein before DNA was invented. OK? 180 00:15:43,000 --> 00:15:47,000 So DNA's chemical structure is different and it's a derivative of 181 00:15:47,000 --> 00:15:52,000 ribonucleic acid, and undoubtedly came second. 
182 00:15:52,000 --> 00:15:56,000 There was an advantage of having DNA because it's so much more stable, 183 00:15:56,000 --> 00:16:00,000 and it made the hereditary material much more stable and much more 184 00:16:00,000 --> 00:16:05,000 faithfully transmitted from generation to generation. 185 00:16:05,000 --> 00:16:10,000 So RNA was ancient. And the relationship between RNA 186 00:16:10,000 --> 00:16:15,000 and protein is probably a very old one, and we'll talk about this 187 00:16:15,000 --> 00:16:20,000 relationship next lecture. And I believe that that 188 00:16:20,000 --> 00:16:25,000 relationship has persisted, and then DNA was kind of an add-on. 189 00:16:25,000 --> 00:16:30,000 And the DNA to RNA to protein does not necessarily reflect the only way 190 00:16:30,000 --> 00:16:36,000 or the best way to do things. Evolution is a capitalization of 191 00:16:36,000 --> 00:16:42,000 various changes. And RNA to DNA, DNA to RNA to 192 00:16:42,000 --> 00:16:48,000 protein is how things work now. But this, I think, is a consequence 193 00:16:48,000 --> 00:16:54,000 of the evolutionary past. Now, however, in our modern world 194 00:16:54,000 --> 00:16:59,000 RNA serves two main purposes. One of the things it does is to 195 00:16:59,000 --> 00:17:03,000 allow one to use just a subset of the genes to make proteins. 196 00:17:03,000 --> 00:17:08,000 So, as you've been told several times, you and I have about 197 00:17:08,000 --> 00:17:13,000 30,000 genes in our genomes. Not all of those genes, and we will discuss 198 00:17:13,000 --> 00:17:17,000 this in great depth as the course goes on. Not all of those genes are 199 00:17:17,000 --> 00:17:22,000 used at any one time. We use just a subset of the genes. 200 00:17:22,000 --> 00:17:27,000 And having them converted into an 201 00:17:27,000 --> 00:17:32,000 RNA intermediate is one of the ways that you can allow just a subset of 202 00:17:32,000 --> 00:17:38,000 the genes to be used. So I'm going to write here subset. 
203 00:17:38,000 --> 00:17:54,000 Subset of gene usage. 204 00:17:54,000 --> 00:17:57,000 OK? Because you can turn just some of those genes, 205 00:17:57,000 --> 00:18:00,000 or you can convert some of those genes into RNA, 206 00:18:00,000 --> 00:18:04,000 the information in some of those genes into RNA. 207 00:18:04,000 --> 00:18:08,000 And the other thing it lets you do is to amplify the signal from each 208 00:18:08,000 --> 00:18:12,000 gene. So there are two copies of each gene in a diploid cell. 209 00:18:12,000 --> 00:18:17,000 When it comes to RNA there can be up to 10,000 copies of RNA per cell 210 00:18:17,000 --> 00:18:21,000 of a particular RNA. OK? So you can get an 211 00:18:21,000 --> 00:18:34,000 amplification of the signal -- 212 00:18:34,000 --> 00:18:42,000 -- from each gene. RNA copy number per cell ranges 213 00:18:42,000 --> 00:18:50,000 from about one copy to about 10,000, that's rare, copies per cell. 214 00:18:50,000 --> 00:18:59,000 All right. So here we are. Why RNA? We've dealt with that. 215 00:18:59,000 --> 00:19:02,000 So I want to talk to you about two things. I want to talk to you about 216 00:19:02,000 --> 00:19:06,000 synthesizing the RNA, and then I'm going to talk to you 217 00:19:06,000 --> 00:19:10,000 about modifying the RNA a bit. And the first thing I want to cover 218 00:19:10,000 --> 00:19:18,000 is something called transcription, 219 00:19:18,000 --> 00:19:22,000 which is also known as RNA synthesis. And you all should have this 220 00:19:22,000 --> 00:19:26,000 handout. So I'm not going to draw it but I will write some salient 221 00:19:26,000 --> 00:19:35,000 features on the board for you. 222 00:19:35,000 --> 00:19:42,000 And we're not quite ready to use that. I'm going to leave this up 223 00:19:42,000 --> 00:19:49,000 here, but I'm going to work on the board for a little bit. 
224 00:19:49,000 --> 00:19:56,000 The basic idea behind transcription, RNA synthesis, 225 00:19:56,000 --> 00:20:04,000 is that one copies a DNA template into a complementary RNA strand, 226 00:20:04,000 --> 00:20:09,000 complementary RNA. And one does this, 227 00:20:09,000 --> 00:20:13,000 as I've alluded to, only from the genes. 228 00:20:13,000 --> 00:20:24,000 And this is an interesting point 229 00:20:24,000 --> 00:20:31,000 because although you have 30,000 genes in your genome, in fact, 230 00:20:31,000 --> 00:20:37,000 those 30,000 genes only take up about 5% of the total amount of DNA 231 00:20:37,000 --> 00:20:44,000 in each of your cells. So 5% of your total DNA of your 232 00:20:44,000 --> 00:20:51,000 genome comprises the genes, the information-carrying entities in 233 00:20:51,000 --> 00:20:58,000 your DNA. And the rest is other stuff. 234 00:20:58,000 --> 00:21:03,000 So the 95% is not genes. It consists of various repeats, 235 00:21:03,000 --> 00:21:09,000 repetitive DNA that can be there at just a few copies per genome or at 236 00:21:09,000 --> 00:21:14,000 10,000 copies per genome. They can be real little, 10 base 237 00:21:14,000 --> 00:21:20,000 pairs, six base pair repeats, or they can be a few kilobases 238 00:21:20,000 --> 00:21:25,000 repeated many times. Oris, origins of replication that 239 00:21:25,000 --> 00:21:31,000 you talked about last time are not genes. 240 00:21:31,000 --> 00:21:36,000 Those are there, too. Centromeres, 241 00:21:36,000 --> 00:21:42,000 the middles of chromosomes. Telomeres, the ends of chromosomes. 242 00:21:42,000 --> 00:21:47,000 All of these things are not genes, and they comprise the bulk of your 243 00:21:47,000 --> 00:21:53,000 DNA. Now, this isn't true in all organisms. OK? 244 00:21:53,000 --> 00:21:59,000 Some organisms have got very little of this repetitive extra DNA. 245 00:21:59,000 --> 00:22:06,000 We happen to have a great deal of it. OK. 
So let's pursue this a bit 246 00:22:06,000 --> 00:22:14,000 more. And let's think a bit more about these genes. 247 00:22:14,000 --> 00:22:22,000 And in particular let's think about the kinds of RNAs that those genes 248 00:22:22,000 --> 00:22:30,000 make. So I'm going to talk about gene classes or classes. 249 00:22:30,000 --> 00:22:37,000 And this is with respect to the RNA and the functional RNA that comes 250 00:22:37,000 --> 00:22:44,000 from those sets of genes. And I want to distinguish two major 251 00:22:44,000 --> 00:22:52,000 classes of genes. The first are the protein-encoding 252 00:22:52,000 --> 00:22:59,000 genes. And protein-encoding genes move through a type of RNA that is 253 00:22:59,000 --> 00:23:07,000 called messenger RNA, abbreviated mRNA. 254 00:23:07,000 --> 00:23:16,000 Messenger RNAs comprise about 1% of the total amount of RNA in a cell. 255 00:23:16,000 --> 00:23:25,000 And they can range in size from let's say 100 base pairs to 256 00:23:25,000 --> 00:23:32,000 10,000 base pairs. OK? So there's a very wide size range. 257 00:23:32,000 --> 00:23:38,000 No, not base pairs. Yell at me. Why not base pairs? 258 00:23:38,000 --> 00:23:43,000 Why was I wrong saying base pairs? Tell me about RNA. Raise your hand. 259 00:23:43,000 --> 00:23:49,000 This is worth a frog. I caught myself, but if you can 260 00:23:49,000 --> 00:23:54,000 catch me, too. Yes. You. Good. 261 00:23:54,000 --> 00:24:00,000 OK. Generally RNA is single, whoops. 262 00:24:00,000 --> 00:24:04,000 RNA is single-stranded. It does not form, it can form a 263 00:24:04,000 --> 00:24:08,000 double helix, OK? It's not as stable as the DNA 264 00:24:08,000 --> 00:24:12,000 double helix, and many RNAs, probably most RNAs have some 265 00:24:12,000 --> 00:24:16,000 double-strandedness to them, but that is an intramolecular double 266 00:24:16,000 --> 00:24:20,000 strand in this. 
There are some RNAs that form 267 00:24:20,000 --> 00:24:24,000 intermolecular double strands, but in general I'm going to assume 268 00:24:24,000 --> 00:24:28,000 that RNAs are single-stranded. So we talk about 100 bases rather 269 00:24:28,000 --> 00:24:33,000 than 100 base pairs. OK? Second class of genes are the 270 00:24:33,000 --> 00:24:39,000 ones that do not code for protein, and in this case the RNA is the 271 00:24:39,000 --> 00:24:45,000 final product. And this litany of DNA to RNA to 272 00:24:45,000 --> 00:24:51,000 protein doesn't hold. You just stop at the RNA. 273 00:24:51,000 --> 00:24:58,000 And the RNA is the functional thing. So here RNA is the final product. 274 00:24:58,000 --> 00:25:06,000 And we can break these into a bunch 275 00:25:06,000 --> 00:25:12,000 of different classes. Ribosomal RNAs, abbreviated rRNA, 276 00:25:12,000 --> 00:25:19,000 are a very abundant class of RNA that comprise about, 277 00:25:19,000 --> 00:25:25,000 I've moved over here, let me move here, 98% of total RNA. 278 00:25:25,000 --> 00:25:31,000 And they are a few thousand bases in length, let's say 279 00:25:31,000 --> 00:25:37,000 2,000 to 4,000 bases in length. OK? So this is 98% ribosomal RNA. 280 00:25:37,000 --> 00:25:42,000 This is fascinating. I'll tell you next time. This is the RNA that 281 00:25:42,000 --> 00:25:47,000 comprises a very large proportion of the ribosome that is the factory 282 00:25:47,000 --> 00:25:52,000 that makes the proteins. OK? And so I will tell you more 283 00:25:52,000 --> 00:25:57,000 about these next time. Some other ones, tRNA, 284 00:25:57,000 --> 00:26:03,000 the T for transfer RNA. tRNAs comprise about 1% of all RNA 285 00:26:03,000 --> 00:26:11,000 and are about 100 base pairs, 100 bases long. OK? 
And then an 286 00:26:11,000 --> 00:26:19,000 interesting one that MIT has had a huge role in discovering and 287 00:26:19,000 --> 00:26:27,000 studying, these things called micro RNAs, abbreviated miRNAs, 288 00:26:27,000 --> 00:26:35,000 which are, they're at relatively low abundance. 289 00:26:35,000 --> 00:26:43,000 Less than 1% of total RNAs. And these are small. In their 290 00:26:43,000 --> 00:26:51,000 mature form they're about 22 bases in length. OK. 291 00:26:51,000 --> 00:27:00,000 So now, and I believe I cannot do anything with these boards. 292 00:27:00,000 --> 00:27:03,000 Ah, I can do something with this one, but that one is stuck. 293 00:27:03,000 --> 00:27:07,000 All right. So I'm going to do something with this one. 294 00:27:07,000 --> 00:27:10,000 And then I'm afraid it's going to disappear, but it's not going to 295 00:27:10,000 --> 00:27:14,000 matter because you have the handout in front of you. 296 00:27:14,000 --> 00:27:18,000 So now I'm ready to move on with you to the basic idea of 297 00:27:18,000 --> 00:27:21,000 transcription. And I'm going to write some facts 298 00:27:21,000 --> 00:27:25,000 on the board, and we're going to look at these cartoons that I drew 299 00:27:25,000 --> 00:27:29,000 for you together because I think your book is kind of difficult. 300 00:27:29,000 --> 00:27:36,000 So I decided to draw some cartoons to help you with the basic idea. 301 00:27:36,000 --> 00:27:44,000 Transcription or RNA synthesis takes place in the nucleus. 302 00:27:44,000 --> 00:27:59,000 Anyone else need a handout? 303 00:27:59,000 --> 00:28:05,000 Why don't you come on down. Actually, one of the TAs, could you 304 00:28:05,000 --> 00:28:11,000 be an emissary and just hand out to those people with raised hands? 305 00:28:11,000 --> 00:28:18,000 Thanks. Transcription takes place in the nucleus. 306 00:28:18,000 --> 00:28:24,000 And the idea is really analogous to DNA replication with a difference. 
307 00:28:24,000 --> 00:28:30,000 The analogy is the synthesis of a complementary strand of nucleic acid 308 00:28:30,000 --> 00:28:37,000 on a template strand. So this is an enormously important 309 00:28:37,000 --> 00:28:43,000 principle that you need to have. Super important that you get the 310 00:28:43,000 --> 00:28:50,000 principle. The basic idea involves synthesis of a complementary strand 311 00:28:50,000 --> 00:28:57,000 of nucleic acid from a template strand. The template, 312 00:28:57,000 --> 00:29:04,000 actually, let me start even earlier than that. 313 00:29:04,000 --> 00:29:09,000 We start with a gene that generally comprises double-stranded DNA. 314 00:29:09,000 --> 00:29:14,000 There are exceptions to almost everything that I will tell you, 315 00:29:14,000 --> 00:29:19,000 or that Professor Jacks will tell you. You should understand that 316 00:29:19,000 --> 00:29:25,000 there are exceptions. Some organisms, particularly 317 00:29:25,000 --> 00:29:30,000 viruses have genomes that are RNA that can be single-stranded or 318 00:29:30,000 --> 00:29:35,000 double-stranded RNA. Some have genomes that are 319 00:29:35,000 --> 00:29:40,000 single-stranded DNA. But in general most genomes are 320 00:29:40,000 --> 00:29:45,000 double-stranded DNA. And the deal is this. 321 00:29:45,000 --> 00:29:50,000 The double-stranded DNA separates its strands, and one of the strands, 322 00:29:50,000 --> 00:29:55,000 and this is the difference between DNA replication and transcription, 323 00:29:55,000 --> 00:30:01,000 one of the strands becomes the template strand. 324 00:30:01,000 --> 00:30:14,000 And this template is copied to form 325 00:30:14,000 --> 00:30:22,000 a complementary strand. And it's copied by an enzyme called 326 00:30:22,000 --> 00:30:31,000 RNA polymerase. So RNA polymerase synthesizes the 327 00:30:31,000 --> 00:30:40,000 complementary strand to the template strand. 328 00:30:40,000 --> 00:30:45,000 complementary strand. 
And it does so, of course, 329 00:30:45,000 --> 00:30:51,000 as RNA, because we're talking about RNA synthesis and this is RNA 330 00:30:51,000 --> 00:30:57,000 polymerase. It does not, unlike DNA polymerization, 331 00:30:57,000 --> 00:31:03,000 require a primer. So this does not require a primer. 332 00:31:03,000 --> 00:31:07,000 OK. You should know, and it should be getting deep within 333 00:31:07,000 --> 00:31:11,000 your neural circuitry that polymerization occurs by adding 334 00:31:11,000 --> 00:31:15,000 nucleotides to the 3 prime end of the growing polymer. 335 00:31:15,000 --> 00:31:20,000 Yes. If that didn't make, you know, if you didn't say "yeah" 336 00:31:20,000 --> 00:31:24,000 to that, go back and think about it, go back and look at problem sets and 337 00:31:24,000 --> 00:31:28,000 you'll get more practice in this. But you really need to know that 338 00:31:28,000 --> 00:31:33,000 the growing chain adds onto the 3 prime end. 339 00:31:33,000 --> 00:31:39,000 OK. So after the polymer, after the RNA polymer is made the 340 00:31:39,000 --> 00:31:46,000 RNA is released from the template strand. As it's being transcribed it 341 00:31:46,000 --> 00:31:52,000 forms this complementary strand. And, as you know, complementary 342 00:31:52,000 --> 00:31:59,000 strands can base pair. After it's made it is released from 343 00:31:59,000 --> 00:32:05,000 the template. And it usually then goes into the 344 00:32:05,000 --> 00:32:09,000 cytoplasm where it does its thing. So if you look at the diagram I 345 00:32:09,000 --> 00:32:13,000 gave you, that's what's up here, here's your double-stranded DNA, 346 00:32:13,000 --> 00:32:17,000 your gene. The strands separate. One strand is transcribed into RNA. 347 00:32:17,000 --> 00:32:21,000 The RNA is released. Obviously, your double-stranded template, 348 00:32:21,000 --> 00:32:25,000 or what was your double-stranded template will reform 349 00:32:25,000 --> 00:32:30,000 its double strand. 
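The copying rule described above (complementary, antiparallel, with U in place of T) can be written as a short Python sketch. The function name `transcribe` and the convention that every sequence is written 5 prime to 3 prime are assumptions for the example; real transcription also depends on start and termination sites, which this ignores.

```python
# DNA template base -> RNA base that pairs with it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5' to 3') that is complementary to a DNA
    template strand (also written 5' to 3')."""
    # The RNA is antiparallel to the template, so the template is read from
    # its 3' end (that is, reversed) while each base is paired.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5_to_3))

print(transcribe("TACGGT"))
```

Reversing the template before pairing is what encodes the antiparallel requirement; pairing without reversing would give the right bases in the wrong orientation.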
So perhaps that's not so obvious, 350 00:32:30,000 --> 00:32:36,000 but the double-stranded, originally double-stranded template will reform 351 00:32:36,000 --> 00:32:41,000 its double strands. The released RNA then goes into 352 00:32:41,000 --> 00:32:47,000 the cytoplasm where it is translated into protein, or where the RNA is 353 00:32:47,000 --> 00:32:53,000 the final product. So let's look at that in a bit more 354 00:32:53,000 --> 00:32:58,000 detail. I've got here a template strand 355 00:32:58,000 --> 00:33:03,000 shown in red. This is, again, the second picture in the 356 00:33:03,000 --> 00:33:09,000 handout in front of you. And I've got three features added 357 00:33:09,000 --> 00:33:14,000 here. I have got a precise start site of transcription. 358 00:33:14,000 --> 00:33:19,000 I've indicated elongation where the polymer is elongating. 359 00:33:19,000 --> 00:33:24,000 And I have a precise termination site where transcription ends. 360 00:33:24,000 --> 00:33:30,000 OK? Now, let me see what I have here. 361 00:33:30,000 --> 00:33:35,000 I have a movie here. Watch the movie. I'll show it to 362 00:33:35,000 --> 00:33:41,000 you once, and then you can go and watch it at your leisure. 363 00:33:41,000 --> 00:33:47,000 This is meant to be RNA polymerase. There's the helix opening up 364 00:33:47,000 --> 00:33:53,000 locally. Here are ribonucleotide triphosphates coming in, 365 00:33:53,000 --> 00:33:59,000 and RNA polymerase is catalyzing their synthesis. OK? 366 00:33:59,000 --> 00:34:03,000 So the template strand is the bottom and here is the RNA being released. 367 00:34:03,000 --> 00:34:08,000 There's RNA polymerase moving along the helix. And the depiction is 368 00:34:08,000 --> 00:34:13,000 that the helix is opening locally and then closing again behind the 369 00:34:13,000 --> 00:34:18,000 RNA polymerase. At transcription termination, 370 00:34:18,000 --> 00:34:23,000 the helix, the gene helix zips up again and the transcript is released.
371 00:34:23,000 --> 00:34:28,000 So this is a very much simplified story. 372 00:34:28,000 --> 00:34:32,000 But it is the basic principle of transcription. 373 00:34:32,000 --> 00:34:37,000 And you should know it. And in particular I have put onto 374 00:34:37,000 --> 00:34:41,000 this second diagram, and because you have it in front of 375 00:34:41,000 --> 00:34:46,000 you I'm not going to write this on the board, I'm going to use this as 376 00:34:46,000 --> 00:34:50,000 something to tell you, I have put the directionality of the 377 00:34:50,000 --> 00:34:55,000 strands of the double-helix on this diagram. This should be something 378 00:34:55,000 --> 00:35:00,000 you can deal with. 5 prime to 3 prime on one strand. 379 00:35:00,000 --> 00:35:04,000 The other strand is anti-parallel. RNA, any nucleic acid is 380 00:35:04,000 --> 00:35:08,000 synthesized by adding onto the 3 prime end. And that newly 381 00:35:08,000 --> 00:35:12,000 synthesizing nucleic acid polymer is anti-parallel to the template. 382 00:35:12,000 --> 00:35:16,000 This is also something that you should be familiar with. 383 00:35:16,000 --> 00:35:20,000 And you will have, will have, have not yet, will have practice on 384 00:35:20,000 --> 00:35:24,000 doing this kind of polymerization, but it should be something you 385 00:35:24,000 --> 00:35:28,000 really, really should be familiar with, this anti-parallel 386 00:35:28,000 --> 00:35:32,000 requirement. So, in fact, you can tell the 387 00:35:32,000 --> 00:35:36,000 direction of transcription because of the directionality of the 388 00:35:36,000 --> 00:35:40,000 template strand. OK. So this is very important for 389 00:35:40,000 --> 00:35:45,000 you to go and think about after class the directionality of the 390 00:35:45,000 --> 00:35:49,000 template and of the newly synthesized polymer. 391 00:35:49,000 --> 00:35:54,000 These are some diagrams from your book, and you can go and look at 392 00:35:54,000 --> 00:35:58,000 them.
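The directionality rule described above (the new strand grows at its 3 prime end and runs anti-parallel to the template) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `transcribe`, the pairing table, and the example sequence are assumptions made here, not anything from the lecture handout, and strands are assumed to be plain strings written 5 prime to 3 prime.

```python
# Minimal sketch of transcription directionality.
# Assumption: every strand is a plain string written 5'->3'.
# The names below (PAIR, transcribe) are illustrative only.

PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA template base -> RNA base


def transcribe(template_5to3: str) -> str:
    """Return the RNA, written 5'->3', copied from a template written 5'->3'."""
    rna = []
    # The template is read from its 3' end toward its 5' end, so we walk
    # the string in reverse. Each new ribonucleotide is appended to the
    # 3' end of the growing RNA, which makes the RNA anti-parallel.
    for base in reversed(template_5to3.upper()):
        rna.append(PAIR[base])
    return "".join(rna)


if __name__ == "__main__":
    template = "ATGC"            # 5'-ATGC-3' (hypothetical template)
    print(transcribe(template))  # RNA written 5'->3'
```

Reading the template string backwards is the whole trick here: it is one simple way to encode that the template is read 3 prime to 5 prime while the RNA is built 5 prime to 3 prime.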
I'm not going to dwell on them. They indicate the difference 393 00:35:58,000 --> 00:36:02,000 between, or the steps in transcription initiation, 394 00:36:02,000 --> 00:36:06,000 elongation and termination. And I've put them up there just to 395 00:36:06,000 --> 00:36:10,000 tell you there are these diagrams in your book and you can go and take a 396 00:36:10,000 --> 00:36:14,000 look at them and read the accompanying text. 397 00:36:14,000 --> 00:36:18,000 OK. So I see three problems with transcription that are very 398 00:36:18,000 --> 00:36:28,000 interesting problems. 399 00:36:28,000 --> 00:36:33,000 One is how to find the genes. I'll write them on the board and 400 00:36:33,000 --> 00:36:43,000 then we'll go through them. 401 00:36:43,000 --> 00:36:47,000 5% of the genome is genes. That's most of it that is not genes. 402 00:36:47,000 --> 00:36:51,000 How does the transcription enzyme, how does the RNA polymerase know 403 00:36:51,000 --> 00:36:56,000 which is a gene and which is not a gene? How does it know, 404 00:36:56,000 --> 00:37:00,000 even if it finds the gene, which strand is the template strand 405 00:37:00,000 --> 00:37:05,000 and which is not the template strand? 406 00:37:05,000 --> 00:37:08,000 I could have drawn your previous diagram where the top strand was the 407 00:37:08,000 --> 00:37:11,000 template, and that what would have happened would be that the RNA 408 00:37:11,000 --> 00:37:15,000 synthesis went in the other direction. So which strand 409 00:37:15,000 --> 00:37:31,000 is the template? 410 00:37:31,000 --> 00:37:38,000 And that also gives you the direction, of course, 411 00:37:38,000 --> 00:37:46,000 of transcription. And the third one I'm going to write, 412 00:37:46,000 --> 00:37:54,000 and then I'll tell you about this in a moment. I'm going to write how to 413 00:37:54,000 --> 00:38:00,000 unwrap chromatin. OK. So in each of your cells, 414 00:38:00,000 --> 00:38:05,000 you have to look at me for this.
In each of your cells there is this 415 00:38:05,000 --> 00:38:11,000 length of DNA. One meter. This is a little over, 416 00:38:11,000 --> 00:38:16,000 but one meter of DNA. How big is the average cell in diameter? 417 00:38:16,000 --> 00:38:21,000 Give it to me in micrometers. Worth a frog. On average. Well, 418 00:38:21,000 --> 00:38:27,000 that's actually a really big cell. It's about ten times 419 00:38:27,000 --> 00:38:32,000 less than that. But whoever that was, 420 00:38:32,000 --> 00:38:36,000 who was it? No way. These are very bad to throw. Very bad to throw. 421 00:38:36,000 --> 00:38:41,000 You can have it because you caught it. See me afterwards. 422 00:38:41,000 --> 00:38:46,000 I'll give you one. OK. [LAUGHTER] OK. So how do you pack 423 00:38:46,000 --> 00:38:50,000 a meter of DNA into a cell that is about ten microns in diameter? 424 00:38:50,000 --> 00:38:55,000 OK. So, OK, Jamie, you want to hazard an answer here? 425 00:38:55,000 --> 00:39:06,000 Your hand was up. 426 00:39:06,000 --> 00:39:10,000 OK. Good. You can wind it up. OK. The other thing you have to do, 427 00:39:10,000 --> 00:39:14,000 of course, is to make it really thin. It has to be a lot thinner than my 428 00:39:14,000 --> 00:39:18,000 piece of rope. But once you've made it really thin 429 00:39:18,000 --> 00:39:22,000 you can then wind it up. OK. It's logical and this is how 430 00:39:22,000 --> 00:39:26,000 it's done. And you can wind it up and then it will fit into 431 00:39:26,000 --> 00:39:31,000 your ten micron cell. Now, in actual fact, 432 00:39:31,000 --> 00:39:36,000 there's a whole process to do that. And I'm going to go through them as 433 00:39:36,000 --> 00:39:41,000 we go through these problems here. So here is problem one exemplified. 434 00:39:41,000 --> 00:39:46,000 I've got red little dots for each of the genes. How does RNA 435 00:39:46,000 --> 00:39:51,000 polymerase find these genes in this vast amount of DNA that is not genes? 
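To put a number on the packing problem raised above, here is a small arithmetic sketch using only the two figures given in the lecture: about one meter of DNA and a cell about ten microns across. The function name `length_to_diameter_ratio` is made up for illustration, and the calculation is kept in whole micrometers to avoid floating-point rounding.

```python
# Rough arithmetic for the packing problem, using the lecture's figures.
# Assumption: ~1 meter of DNA per cell and a ~10 micron cell diameter.
# The function name is illustrative only.

MICROMETERS_PER_METER = 1_000_000


def length_to_diameter_ratio(dna_length_um: int, cell_diameter_um: int) -> int:
    """How many cell diameters long the DNA would be if laid out straight."""
    return dna_length_um // cell_diameter_um


if __name__ == "__main__":
    dna_um = 1 * MICROMETERS_PER_METER   # one meter of DNA, in micrometers
    cell_um = 10                         # ten micron cell
    print(length_to_diameter_ratio(dna_um, cell_um))
```

So the DNA is on the order of a hundred thousand times longer than the cell is wide, which is why it has to be both very thin and wound up.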
436 00:39:51,000 --> 00:39:56,000 Here's the other one. Which strand is the template? 437 00:39:56,000 --> 00:40:01,000 Oh. And here is a nice problem that in 438 00:40:01,000 --> 00:40:05,000 the interest of time I am not going to do here in class with you, 439 00:40:05,000 --> 00:40:10,000 but I want you guys to go and do this as an exercise. 440 00:40:10,000 --> 00:40:14,000 I will tell you that the answer is not on your handout on the Web. 441 00:40:14,000 --> 00:40:18,000 I took it off. Sneaky, huh? So that you can go and think about this. 442 00:40:18,000 --> 00:40:23,000 I want you to go and understand that the products of synthesis from 443 00:40:23,000 --> 00:40:27,000 either strand of a DNA double-stranded helix 444 00:40:27,000 --> 00:40:32,000 are not the same. OK? And I'm going to zip through 445 00:40:32,000 --> 00:40:37,000 this because I want to move on here. OK. And I want to move to problem 446 00:40:37,000 --> 00:40:42,000 three which is this thing I called chromatin. DNA is wound up around 447 00:40:42,000 --> 00:40:46,000 proteins. These are called histones, and we'll have more to say about 448 00:40:46,000 --> 00:40:51,000 them later in the course. And wound up and wound up and wound 449 00:40:51,000 --> 00:40:56,000 up. And there is a very set number and type of proteins that the DNA is 450 00:40:56,000 --> 00:41:01,000 wound around. And once the DNA has been wound 451 00:41:01,000 --> 00:41:05,000 around once, those DNA protein complexes are wound up some more, 452 00:41:05,000 --> 00:41:10,000 and then wound up some more. And eventually you get them wound up and 453 00:41:10,000 --> 00:41:14,000 wrapped up so much you get the characteristic rather large 454 00:41:14,000 --> 00:41:19,000 chromosomes which are very much packed DNA. Now, 455 00:41:19,000 --> 00:41:23,000 this is a great way to fit DNA into a cell.
However, 456 00:41:23,000 --> 00:41:28,000 this wrapping up of the chromatin into, the wrapping up of the DNA 457 00:41:28,000 --> 00:41:33,000 into this chromatin structure inhibits transcription. 458 00:41:33,000 --> 00:41:37,000 And in order to allow transcription to proceed, you have to remove these 459 00:41:37,000 --> 00:41:42,000 proteins from the DNA and allow it to unwind locally. 460 00:41:42,000 --> 00:41:47,000 And that takes a whole series of enzymatic steps, 461 00:41:47,000 --> 00:41:51,000 again that we'll explore more later in the course. 462 00:41:51,000 --> 00:41:56,000 But the problem I throw out at you now is how do you unwrap the 463 00:41:56,000 --> 00:42:01,000 chromatin where transcription is needed? And the answer to all of 464 00:42:01,000 --> 00:42:06,000 these things lies in a specific, no, stop. 465 00:42:06,000 --> 00:42:11,000 Stop. Down. Up. OK. The answer to all of these 466 00:42:11,000 --> 00:42:17,000 questions lies in a specific DNA sequence or a series of specific DNA 467 00:42:17,000 --> 00:42:23,000 sequences that are collectively called -- 468 00:42:23,000 --> 00:42:34,000 -- the promoter. 469 00:42:34,000 --> 00:42:41,000 Here's another one. What is a promoter? And I need to 470 00:42:41,000 --> 00:42:49,000 make the distinction now between transcribed DNA of a gene and 471 00:42:49,000 --> 00:42:56,000 untranscribed DNA of a gene. The promoter is part of a gene but 472 00:42:56,000 --> 00:43:03,000 it is not transcribed. It usually depends on the gene and 473 00:43:03,000 --> 00:43:09,000 the type of gene. It usually lies 5 prime to the 474 00:43:09,000 --> 00:43:16,000 transcriptional start site. And it is a DNA sequence that says 475 00:43:16,000 --> 00:43:23,000 this is a gene, and it also says transcription 476 00:43:23,000 --> 00:43:30,000 should proceed in this direction. OK.
477 00:43:30,000 --> 00:43:34,000 And the way it does these things, I'm going to go through the answers to 478 00:43:34,000 --> 00:43:39,000 each of the problems now, is that it binds proteins that 479 00:43:39,000 --> 00:43:44,000 specifically recognize the sequence of the promoter. 480 00:43:44,000 --> 00:43:48,000 So you talked about the DNA replication origin and proteins that 481 00:43:48,000 --> 00:43:53,000 specifically recognize the nucleotide sequence of the origin. 482 00:43:53,000 --> 00:43:58,000 This is analogous. There are proteins that recognize promoter 483 00:43:58,000 --> 00:44:03,000 sequences which are similar but not identical from gene to gene. 484 00:44:03,000 --> 00:44:09,000 So it binds proteins. And these are called transcription 485 00:44:09,000 --> 00:44:21,000 factors. 486 00:44:21,000 --> 00:44:25,000 And these transcription factors bind in a DNA sequence specific way. 487 00:44:25,000 --> 00:44:30,000 OK. It also binds RNA polymerase which I'm going to abbreviate RNA 488 00:44:30,000 --> 00:44:37,000 polymerase, RNA pol. OK? And the answers to the three 489 00:44:37,000 --> 00:44:46,000 questions are that firstly the protein-DNA interaction is sequence 490 00:44:46,000 --> 00:44:55,000 specific, sequence specific, and so this allows you to actually 491 00:44:55,000 --> 00:45:02,000 find the genes. Secondly, and this is cool and I'll 492 00:45:02,000 --> 00:45:07,000 show you a picture of this in a moment, the proteins interact with the 493 00:45:07,000 --> 00:45:12,000 promoter DNA differently on different strands of the helix so 494 00:45:12,000 --> 00:45:17,000 they bind asymmetrically. They may bind more to one strand 495 00:45:17,000 --> 00:45:22,000 than to the other strand. And this gives directionality to 496 00:45:22,000 --> 00:45:27,000 the transcription so the protein binding is asymmetric 497 00:45:27,000 --> 00:45:36,000 or strand specific.
498 00:45:36,000 --> 00:45:41,000 Not for all of these proteins but for a significant number. 499 00:45:41,000 --> 00:45:46,000 And that helps you decide which strand you're going to use as the 500 00:45:46,000 --> 00:45:52,000 template. And thirdly these proteins have got associated with 501 00:45:52,000 --> 00:45:57,000 them activities that will unwrap the chromatin, that will unwrap the DNA 502 00:45:57,000 --> 00:46:03,000 from its protein complexes and allow it to be accessible to the 503 00:46:03,000 --> 00:46:16,000 transcription machinery. 504 00:46:16,000 --> 00:46:20,000 OK. Let's zip so I can show you this. You can look at this on your 505 00:46:20,000 --> 00:46:25,000 slides. Here are some pictures from your book. Really important in this, 506 00:46:25,000 --> 00:46:30,000 don't move. Really important in this is a protein called TFIID which 507 00:46:30,000 --> 00:46:35,000 recognizes a sequence called, that goes T-A-T-A-A-A. 508 00:46:35,000 --> 00:46:40,000 This is called the TATA binding protein. And it's really important. 509 00:46:40,000 --> 00:46:45,000 And it's the one thing that, the major thing that gives asymmetry to 510 00:46:45,000 --> 00:46:50,000 this transcription, set of transcription factors on the 511 00:46:50,000 --> 00:46:55,000 promoter. Once TFIID has bound to the promoter, other proteins come 512 00:46:55,000 --> 00:47:01,000 along, including these various other things called B, F, H, G and so on. 513 00:47:01,000 --> 00:47:05,000 And here's RNA polymerase. And you can see this complex 514 00:47:05,000 --> 00:47:10,000 positioned asymmetrically on the DNA. And this complex you should know 515 00:47:10,000 --> 00:47:15,000 the name of, I'm going to put it in its own box here, is the 516 00:47:15,000 --> 00:47:23,000 initiation complex. 517 00:47:23,000 --> 00:47:28,000 And I want to show you a crystallographic rendition of the 518 00:47:28,000 --> 00:47:34,000 TATA binding protein called TBP, also sometimes called TFIID.
But 519 00:47:34,000 --> 00:47:40,000 TATA binding protein here shown in purple. And if you look here, 520 00:47:40,000 --> 00:47:45,000 you're looking head on at the double helix. OK? Here's the helix. 521 00:47:45,000 --> 00:47:51,000 You're looking down the helix. And you can see that this protein 522 00:47:51,000 --> 00:47:57,000 is positioned on just one side of the helix, so that gives 523 00:47:57,000 --> 00:48:02,000 you asymmetry. Here's another transcription factor 524 00:48:02,000 --> 00:48:06,000 bound to DNA. This is a protein called GAL4. It binds as a dimer. 525 00:48:06,000 --> 00:48:11,000 And you can see that GAL4 is the blue. And here it is contacting 526 00:48:11,000 --> 00:48:15,000 just one side, one strand of this red double helix. 527 00:48:15,000 --> 00:48:20,000 And I'm going to stop there and finish off the last little 528 00:48:20,000 --> 00:48:23,000 bit on Monday.
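The idea that a sequence-specific, strand-specific protein can both find a gene and set the direction of transcription can be illustrated with a toy scan for the T-A-T-A-A-A sequence mentioned in the lecture. This is a deliberately simplified sketch: real promoter recognition involves many proteins and imperfect sequence matches. The function names, the "+" and "-" strand labels, and the example sequences are assumptions made here for illustration.

```python
# Toy sketch: look for the TATAAA motif on either strand of a duplex.
# Assumption: the input is the top strand written 5'->3'. A "+" hit means
# the motif reads 5'->3' on the top strand; a "-" hit means it reads
# 5'->3' on the bottom strand. Names here are illustrative only.

MOTIF = "TATAAA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tata(top_5to3: str) -> list[tuple[int, str]]:
    """Return (0-based position on the top strand, strand label) per hit."""
    top = top_5to3.upper()
    motif_on_bottom = reverse_complement(MOTIF)  # how a bottom-strand hit looks on top
    hits = []
    for i in range(len(top) - len(MOTIF) + 1):
        window = top[i:i + len(MOTIF)]
        if window == MOTIF:
            hits.append((i, "+"))
        if window == motif_on_bottom:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_tata("GGTATAAAGG"))  # motif on the top strand
    print(find_tata("CCTTTATACC"))  # motif on the bottom strand
```

Because TATAAA is not its own reverse complement, a hit carries orientation information, which is the point made in the lecture about asymmetric binding giving directionality.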