1 00:00:01,000 --> 00:00:04,000 Among the issues that some people asked to have discussed in 2 00:00:04,000 --> 00:00:08,000 greater detail is the structure of proteins. 3 00:00:08,000 --> 00:00:12,000 I'll touch on it very briefly this morning, different kinds of bonding, 4 00:00:12,000 --> 00:00:17,000 tertiary and quaternary structure, condensation or dehydration 5 00:00:17,000 --> 00:00:21,000 reactions. And, in fact, many of those issues should 6 00:00:21,000 --> 00:00:26,000 be addressed in the recitation sections. 7 00:00:26,000 --> 00:00:30,000 That's the ideal place to begin to clarify things which although they 8 00:00:30,000 --> 00:00:34,000 were mentioned here may not have been mentioned in the degree of 9 00:00:34,000 --> 00:00:38,000 detail that you really need to assimilate them properly. 10 00:00:38,000 --> 00:00:42,000 And I urge you to raise these issues with the recitation section 11 00:00:42,000 --> 00:00:46,000 instructors. That's exactly what they're there for. 12 00:00:46,000 --> 00:00:50,000 Having said that I just want to dip back briefly into protein structure, 13 00:00:50,000 --> 00:00:54,000 even though we turned our back on it at the end of last time, 14 00:00:54,000 --> 00:00:58,000 just to reinforce some things that I realized I should have mentioned 15 00:00:58,000 --> 00:01:01,000 perhaps in greater detail. Here, for example, are different 16 00:01:01,000 --> 00:01:05,000 ways of depicting the three-dimensional structure of the 17 00:01:05,000 --> 00:01:08,000 protein. And, by the way, we see that these are 18 00:01:08,000 --> 00:01:11,000 beta pleated sheets in the light brown and these are alpha helices. 19 00:01:11,000 --> 00:01:15,000 There are two of them here in green, one going this way, 20 00:01:15,000 --> 00:01:18,000 the other going this way, a third one going this way. 21 00:01:18,000 --> 00:01:22,000 And the other blue areas are not structured, i.e.,
22 00:01:22,000 --> 00:01:25,000 they're not structured in the sense that they are in any way 23 00:01:25,000 --> 00:01:29,000 obviously alpha helices or beta pleated sheets. 24 00:01:29,000 --> 00:01:32,000 Here's a space-filling model, a space-filling depiction of a 25 00:01:32,000 --> 00:01:36,000 protein. We talked about that last time. Here is a trace of the 26 00:01:36,000 --> 00:01:40,000 backbone, of the peptide backbone of the same protein where the side 27 00:01:40,000 --> 00:01:44,000 chains are left out, and obviously where one is only 28 00:01:44,000 --> 00:01:47,000 plotting the three-dimensional coordinates of each of the backbone 29 00:01:47,000 --> 00:01:51,000 atoms, CCN, CCN, CCN. Here is yet another way of 30 00:01:51,000 --> 00:01:55,000 plotting exactly the same protein in terms of indicating, 31 00:01:55,000 --> 00:01:59,000 as we just said, the structure of these alpha helices in 32 00:01:59,000 --> 00:02:03,000 the other regions. That is the secondary structure of 33 00:02:03,000 --> 00:02:07,000 this protein. And here's yet a fourth way of plotting, 34 00:02:07,000 --> 00:02:11,000 of depicting the same structure of the protein where roughly one is 35 00:02:11,000 --> 00:02:16,000 depicting the configuration of the amino acids in terms of a large 36 00:02:16,000 --> 00:02:20,000 sausage. Excuse me. If one were to use a space-filling 37 00:02:20,000 --> 00:02:24,000 model we'd go up to here. So these are just four ways of 38 00:02:24,000 --> 00:02:29,000 looking at the same protein with different degrees of simplification. 39 00:02:29,000 --> 00:02:33,000 Another point that I thought I would like to reinforce and make was the 40 00:02:33,000 --> 00:02:37,000 following. We've talked about transmembrane proteins in the past. 41 00:02:37,000 --> 00:02:41,000 That is, proteins which protrude through a membrane from one side to 42 00:02:41,000 --> 00:02:46,000 the other. 
And a point that I realized I'd like to make is that if 43 00:02:46,000 --> 00:02:50,000 we look at a transmembrane protein here's one that is starting out in 44 00:02:50,000 --> 00:02:54,000 the cytoplasm of a cell. And, by the way, the soluble part 45 00:02:54,000 --> 00:02:59,000 of the cytoplasm is sometimes called the cytosol. 46 00:02:59,000 --> 00:03:03,000 Here is the lipid bilayer that we talked about at length and here is 47 00:03:03,000 --> 00:03:07,000 the extracellular domain of this same protein. Now, 48 00:03:07,000 --> 00:03:11,000 how is all this organized? Well, the fact of the matter is we 49 00:03:11,000 --> 00:03:15,000 discussed the fact that this hydrophobic space in the lipid 50 00:03:15,000 --> 00:03:19,000 bilayer is so hydrophobic that it really doesn't like to be in the 51 00:03:19,000 --> 00:03:23,000 presence of hydrophilic molecules, including in this case amino acids. 52 00:03:23,000 --> 00:03:27,000 And what we see here is the fact that almost all of the amino acids 53 00:03:27,000 --> 00:03:31,000 in this region of the protein, which is called the transmembrane 54 00:03:31,000 --> 00:03:35,000 region of the protein because it reaches from one side to the other, 55 00:03:35,000 --> 00:03:39,000 are all hydrophobic or neutral amino acids which are reasonably 56 00:03:39,000 --> 00:03:43,000 comfortable in the hydrophobic space of the lipid bilayer. 57 00:03:43,000 --> 00:03:47,000 There happen to be two apparent violators of this, 58 00:03:47,000 --> 00:03:51,000 glutamine and histidine. You see these two here? I mean 59 00:03:51,000 --> 00:03:55,000 glutamic acid and histidine. Glutamic acid and histidine. 60 00:03:55,000 --> 00:03:59,000 One is negatively charged and therefore is highly hydrophilic. 61 00:03:59,000 --> 00:04:03,000 The other is positively charged and is therefore highly hydrophilic. 
62 00:04:03,000 --> 00:04:07,000 And on the surface that would seem to violate the rule I just 63 00:04:07,000 --> 00:04:11,000 articulated. But the fact is that as it turns out in the particular 64 00:04:11,000 --> 00:04:15,000 protein these two charges, these two amino acids are so closely 65 00:04:15,000 --> 00:04:19,000 juxtaposed with one another that their positive and negative charges 66 00:04:19,000 --> 00:04:23,000 are used to neutralize one another. And as a consequence in effect 67 00:04:23,000 --> 00:04:27,000 there is no strong charging or polarity in this area 68 00:04:27,000 --> 00:04:32,000 or in this area. The take-home lesson is that somehow 69 00:04:32,000 --> 00:04:36,000 proteins manage to insert themselves and to remain stable in the lipid 70 00:04:36,000 --> 00:04:41,000 bilayer by virtue of either using only stretches of hydrophobic or 71 00:04:41,000 --> 00:04:45,000 nonpolar amino acids or they use tricks like this of neutralizing any 72 00:04:45,000 --> 00:04:50,000 charges that happen to be there. Note, by the way, that because 73 00:04:50,000 --> 00:04:55,000 there are hydrophilic amino acids down here and there turn out to be 74 00:04:55,000 --> 00:04:59,000 hydrophilic amino acid around here, arginine, and here there's a whole 75 00:04:59,000 --> 00:05:03,000 bunch of basic amino acids. Note that this keeps the 76 00:05:03,000 --> 00:05:07,000 transmembrane protein from getting pulled in one direction or the other 77 00:05:07,000 --> 00:05:10,000 because this arginine likes to associate with the negative 78 00:05:10,000 --> 00:05:14,000 phosphates on the outside of the phospholipids. 79 00:05:14,000 --> 00:05:18,000 And the same thing is here. And all that means is that this 80 00:05:18,000 --> 00:05:21,000 transmembrane protein is firmly anchored in the lipid bilayer, 81 00:05:21,000 --> 00:05:25,000 a point we'll talk about later in greater detail when we talk about 82 00:05:25,000 --> 00:05:29,000 membrane structure. 
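The charge-neutralization point above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the one-letter sequence is hypothetical (it is not the protein on the slide), and the charge table is a textbook simplification in which histidine is treated as fully positive, as the lecture does.

```python
# Simplified side-chain charges near neutral pH (an assumption for
# illustration; real histidine is only partly protonated).
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1, "H": +1}


def charged_residues(segment: str) -> list[str]:
    """Return the charged residues found in a one-letter amino acid sequence."""
    return [residue for residue in segment if residue in CHARGE]


def net_charge(segment: str) -> int:
    """Sum the simplified side-chain charges over the whole segment."""
    return sum(CHARGE.get(residue, 0) for residue in segment)


# Hypothetical transmembrane stretch: mostly nonpolar residues, plus one
# glutamic acid (E) and one histidine (H), as in the lecture's example.
segment = "LIVALLEAGVHLLIAV"
print(charged_residues(segment))  # ['E', 'H']
print(net_charge(segment))        # 0
```

The two "violators" show up as charged residues, yet the net charge of the stretch is zero, which is the sense in which they neutralize one another. The sketch ignores geometry: in the real protein the two side chains must also sit close together for this to work.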
One other little point I'll mention 83 00:05:29,000 --> 00:05:33,000 here in passing, which we'll also get into in greater 84 00:05:33,000 --> 00:05:38,000 detail, is that once a protein has been polymerized that polymerization 85 00:05:38,000 --> 00:05:42,000 is not the last thing that happens to it once it's polymerized and 86 00:05:42,000 --> 00:05:46,000 folded into place because we know that proteins undergo what are called 87 00:05:46,000 --> 00:05:51,000 post-translational modifications. And, as we'll talk about in the 88 00:05:51,000 --> 00:05:55,000 coming weeks, the process of synthesizing a protein 89 00:05:55,000 --> 00:06:00,000 is called translation. And when we talk about 90 00:06:00,000 --> 00:06:04,000 post-translational modification what we're talking about is opening our 91 00:06:04,000 --> 00:06:08,000 eyes to the possibility that even after the primary amino acid 92 00:06:08,000 --> 00:06:13,000 sequence has been polymerized there are chemical alterations that can 93 00:06:13,000 --> 00:06:17,000 subsequently be imposed on the amino acid side chains to further modify 94 00:06:17,000 --> 00:06:22,000 the protein. One such modification, for example, is proteolytic 95 00:06:22,000 --> 00:06:26,000 degradation. And when I talk about proteolytic degradation, 96 00:06:26,000 --> 00:06:31,000 I'm talking about the fact that one can break down a protein. 97 00:06:31,000 --> 00:06:35,000 Proteolysis is the breaking down of a protein. And when we talk about 98 00:06:35,000 --> 00:06:39,000 degradation we're talking about destroying what has been synthesized. 99 00:06:39,000 --> 00:06:44,000 In the case of many proteins, once they're synthesized there may 100 00:06:44,000 --> 00:06:48,000 be a stretch of amino acids at one end or the other that is simply clipped 101 00:06:48,000 --> 00:06:53,000 off therefore creating a protein which is smaller than the initially 102 00:06:53,000 --> 00:06:57,000 synthesized product of protein synthesis, i.e. 
103 00:06:57,000 --> 00:07:02,000 the initially synthesized product of translation. 104 00:07:02,000 --> 00:07:07,000 Here we see yet another kind of post-translational modification, 105 00:07:07,000 --> 00:07:12,000 because it turns out that in many proteins which protrude into the 106 00:07:12,000 --> 00:07:18,000 extracellular space there is yet another kind of covalent 107 00:07:18,000 --> 00:07:23,000 modification which is the process of glycosylation in which a series of 108 00:07:23,000 --> 00:07:28,000 sugar side chains, carbohydrate side chains is 109 00:07:28,000 --> 00:07:34,000 covalently attached to the polypeptide chain usually on serines 110 00:07:34,000 --> 00:07:39,000 or threonines using the hydroxyl of the side chain of serines or 111 00:07:39,000 --> 00:07:45,000 threonines to attach these oligosaccharide side chains. 112 00:07:45,000 --> 00:07:49,000 We know from our discussion the last time oligosaccharide means an 113 00:07:49,000 --> 00:07:53,000 assembly of a small number of monosaccharides. 114 00:07:53,000 --> 00:07:58,000 And each of these blue hexagons represents a monosaccharide which 115 00:07:58,000 --> 00:08:02,000 are covalently linked and also modify the extracellular domain of 116 00:08:02,000 --> 00:08:07,000 this protein as it protrudes into the extracellular space. 117 00:08:07,000 --> 00:08:11,000 So I'm just opening our eyes to the possibility that in the future we're 118 00:08:11,000 --> 00:08:15,000 going to talk about yet other ways in which proteins are modified to 119 00:08:15,000 --> 00:08:20,000 further tune-up their structure to make them more suitable, 120 00:08:20,000 --> 00:08:24,000 more competent to do the various jobs to which they've been assigned. 121 00:08:24,000 --> 00:08:29,000 Let's therefore return to what we talked about the last time, 122 00:08:29,000 --> 00:08:33,000 the fact that the structure of nucleic acids is based on 123 00:08:33,000 --> 00:08:38,000 this simple principle. 
Here, by the way, 124 00:08:38,000 --> 00:08:42,000 I'm returning to the notion of this numbering system. 125 00:08:42,000 --> 00:08:46,000 We're talking about a pentose nucleic acid. The fact that there 126 00:08:46,000 --> 00:08:50,000 are two hydroxyls here right away tells us that we're looking at a 127 00:08:50,000 --> 00:08:54,000 ribose rather than a deoxyribose which, as I said last time, 128 00:08:54,000 --> 00:08:58,000 lacks this oxygen right there. Note, as we've said repeatedly, 129 00:08:58,000 --> 00:09:03,000 that the hydroxyl side chains of carbohydrates offer numerous 130 00:09:03,000 --> 00:09:09,000 opportunities for using dehydration reactions, or as they're sometimes 131 00:09:09,000 --> 00:09:14,000 called condensation reactions where you remove a water, 132 00:09:14,000 --> 00:09:19,000 where you take out a water, dehydration, or we can call them 133 00:09:19,000 --> 00:09:25,000 condensation reactions to attach yet other things. And, 134 00:09:25,000 --> 00:09:30,000 in fact, in principle there are actually four different hydroxyls 135 00:09:30,000 --> 00:09:35,000 that could be used here to do that. There's one here, 136 00:09:35,000 --> 00:09:39,000 there's one here, one here and one here. There are four different 137 00:09:39,000 --> 00:09:44,000 hydroxyls. The 1, the 2, the 3 and the 5 hydroxyl are, 138 00:09:44,000 --> 00:09:48,000 in principle, opportunities for further modification. 139 00:09:48,000 --> 00:09:53,000 In truth the 2-prime hydroxyl is rarely used, as we'll discuss 140 00:09:53,000 --> 00:09:57,000 shortly, but the main actors are therefore this hydroxyl here in 141 00:09:57,000 --> 00:10:02,000 which a condensation reaction has created a glycosidic bond. 142 00:10:02,000 --> 00:10:06,000 That is a bond between a sugar and a non-sugar entity. 143 00:10:06,000 --> 00:10:11,000 Glyco refers obviously to sugars like glycogen or glycosylation we've 144 00:10:11,000 --> 00:10:15,000 talked about before. 
Here a bond has been made between a 145 00:10:15,000 --> 00:10:20,000 base, and we'll talk about the different bases shortly, 146 00:10:20,000 --> 00:10:25,000 and the 1-prime hydroxyl of the ribose. Over here at the 5-prime 147 00:10:25,000 --> 00:10:30,000 hydroxyl yet another condensation reaction. 148 00:10:30,000 --> 00:10:34,000 Sometimes this is called an esterification reaction. 149 00:10:34,000 --> 00:10:39,000 And again esterification refers to these kinds of condensation 150 00:10:39,000 --> 00:10:44,000 reactions where an acid and an alcohol react with one another, 151 00:10:44,000 --> 00:10:48,000 and once again through a condensation reaction, 152 00:10:48,000 --> 00:10:53,000 yield the removal of a water. And let's look at what's happening 153 00:10:53,000 --> 00:10:58,000 here, because not only is one phosphate group attached to the 154 00:10:58,000 --> 00:11:03,000 5-prime carbon, to the 5-prime hydroxyl. 155 00:11:03,000 --> 00:11:07,000 In fact, there are three. And they are located, and each of 156 00:11:07,000 --> 00:11:11,000 them has a name. The inboard one is called alpha, 157 00:11:11,000 --> 00:11:15,000 moving further out is beta, and furthest out is gamma. 158 00:11:15,000 --> 00:11:19,000 And it turns out that this chain of phosphates has very important 159 00:11:19,000 --> 00:11:23,000 implications for energy metabolism and for biosynthesis. 160 00:11:23,000 --> 00:11:27,000 Why? I'm glad I asked that question. Because these are all 161 00:11:27,000 --> 00:11:31,000 three highly negatively charged. This is negatively charged, 162 00:11:31,000 --> 00:11:35,000 this is and this is. And, as you know, negative charges repel one 163 00:11:35,000 --> 00:11:39,000 another. 
And as a consequence, to create a triphosphate linkage 164 00:11:39,000 --> 00:11:43,000 like this represents pushing together negatively charged moieties, 165 00:11:43,000 --> 00:11:47,000 these three phosphates, even though they don't like to be next to one 166 00:11:47,000 --> 00:11:51,000 another. And that pushing together, that creation of the triphosphate 167 00:11:51,000 --> 00:11:55,000 chain represents an investment of energy. And once the three are 168 00:11:55,000 --> 00:11:59,000 pushed together that represents great potential energy much like a 169 00:11:59,000 --> 00:12:03,000 spring that has been compressed together and would just 170 00:12:03,000 --> 00:12:07,000 love to pop apart. These three phosphates would love to 171 00:12:07,000 --> 00:12:11,000 pop apart from one another by virtue of the fact that these negative 172 00:12:11,000 --> 00:12:16,000 charges are mutually repelling. But they cannot as long as they're 173 00:12:16,000 --> 00:12:20,000 in this triphosphate configuration. But once the triphosphate 174 00:12:20,000 --> 00:12:25,000 configuration is broken then the energy released by their leaving one 175 00:12:25,000 --> 00:12:30,000 another can then be exploited for yet other purposes. 176 00:12:30,000 --> 00:12:34,000 Keep in mind, just to reinforce what I said a second ago, 177 00:12:34,000 --> 00:12:38,000 the difference between a ribose and a deoxyribose is the presence or the 178 00:12:38,000 --> 00:12:42,000 absence of this oxygen. And now let's focus in a little 179 00:12:42,000 --> 00:12:46,000 more detail on the bases because the bases are indeed the subject of much 180 00:12:46,000 --> 00:12:50,000 of our discussion today. And we have two basic kinds of 181 00:12:50,000 --> 00:12:54,000 bases. They're called nitrogenous bases, these bases, 182 00:12:54,000 --> 00:12:59,000 because they have nitrogen in them. 
And if you look at the five bases 183 00:12:59,000 --> 00:13:03,000 that are depicted here you'll see that they are not aromatic rings 184 00:13:03,000 --> 00:13:08,000 with just carbons in them like a six carbon benzene. 185 00:13:08,000 --> 00:13:12,000 Rather all of them have a substantial fraction of nitrogens 186 00:13:12,000 --> 00:13:16,000 actually in the ring, two in the case of these pyrimidines. 187 00:13:16,000 --> 00:13:21,000 And here you see the number actually is four. 188 00:13:21,000 --> 00:13:25,000 In fact, one of these nitrogenous bases indicated here, 189 00:13:25,000 --> 00:13:30,000 guanine has actually a fifth one up here as a side chain. 190 00:13:30,000 --> 00:13:34,000 This is outside of the chain, it represents a side group. And if 191 00:13:34,000 --> 00:13:38,000 we begin now to make distinctions between the ring itself and the 192 00:13:38,000 --> 00:13:42,000 entities that protrude out of the ring, they really represent some of 193 00:13:42,000 --> 00:13:46,000 the important distinguishing characteristics. 194 00:13:46,000 --> 00:13:50,000 It's important that we understand that pyrimidines have one ring and 195 00:13:50,000 --> 00:13:54,000 these have two rings in them. The purines have a five and a six 196 00:13:54,000 --> 00:13:58,000 membered ring fused together, as you can see. The pyrimidines 197 00:13:58,000 --> 00:14:02,000 have only a six membered ring. And what's really important in 198 00:14:02,000 --> 00:14:06,000 determining their identity is not the basic pyrimidine or purine 199 00:14:06,000 --> 00:14:10,000 structure. It's once again the side chains that distinguish these one 200 00:14:10,000 --> 00:14:14,000 from the other. Here in the case of cytosine we see 201 00:14:14,000 --> 00:14:18,000 that there's a carbonyl here, an oxygen sticking out, and there's 202 00:14:18,000 --> 00:14:22,000 an amine over here. 
We see uracil which happens to be 203 00:14:22,000 --> 00:14:26,000 present in RNA but not DNA which has two carbonyls here and here. 204 00:14:26,000 --> 00:14:30,000 Obviously, therefore what distinguishes these two from one 205 00:14:30,000 --> 00:14:34,000 another is this oxygen versus this amine. 206 00:14:34,000 --> 00:14:37,000 And here we see the thymine which is present in DNA but not RNA. 207 00:14:37,000 --> 00:14:41,000 And this will become very familiar to you shortly. 208 00:14:41,000 --> 00:14:44,000 This looks just like uracil except for the fact that there's a methyl 209 00:14:44,000 --> 00:14:48,000 group sticking out here. Now, very important for our 210 00:14:48,000 --> 00:14:51,000 understanding of what's happening here is the fact that this methyl 211 00:14:51,000 --> 00:14:55,000 group, although it distinguishes thymine from uracil is itself 212 00:14:55,000 --> 00:14:59,000 biologically actually not very important. 213 00:14:59,000 --> 00:15:03,000 It's there to be sure and it's a distinguishing mark of T versus U, 214 00:15:03,000 --> 00:15:07,000 but the business end of T versus U in terms of encoding information 215 00:15:07,000 --> 00:15:11,000 happens here with these two oxygens sticking out. They're the important 216 00:15:11,000 --> 00:15:15,000 oxygens, here and here. And therefore from the point of 217 00:15:15,000 --> 00:15:20,000 view of information content, as we'll soon see, T and U are 218 00:15:20,000 --> 00:15:24,000 essentially equivalent. It may be that one of them happens 219 00:15:24,000 --> 00:15:28,000 to be in RNA and the other in DNA, but from the point of view of 220 00:15:28,000 --> 00:15:32,000 understanding the coding information they carry it's these two carbonyls 221 00:15:32,000 --> 00:15:37,000 here and here which dictate essentially their identity. 
222 00:15:37,000 --> 00:15:41,000 We have the same kind of dynamics that operate here in the case of A 223 00:15:41,000 --> 00:15:45,000 and G where once again this one has only an amine side chain and this 224 00:15:45,000 --> 00:15:49,000 one has a carbonyl and an amine side chain right here. 225 00:15:49,000 --> 00:15:53,000 Now, very important there is a confusing array of names that are 226 00:15:53,000 --> 00:15:57,000 associated with all this. I don't know if you can, 227 00:15:57,000 --> 00:16:01,000 well, it reads reasonably well. Because once a base, 228 00:16:01,000 --> 00:16:05,000 and I just showed you bases which are unattached to the sugars, 229 00:16:05,000 --> 00:16:10,000 once bases are attached to the sugars they change their name 230 00:16:10,000 --> 00:16:14,000 slightly. So keep in mind that here, when we talk about these nitrogenous 231 00:16:14,000 --> 00:16:18,000 bases, the bases are just free molecules where in each case this 232 00:16:18,000 --> 00:16:23,000 lowest nitrogen is the one that participates in the formation of a 233 00:16:23,000 --> 00:16:27,000 covalent glycosidic bond with the ribose or the deoxyribose 234 00:16:27,000 --> 00:16:32,000 underneath it. And here we can see one indication 235 00:16:32,000 --> 00:16:37,000 of how that, you see this N, in all cases via a condensation 236 00:16:37,000 --> 00:16:42,000 reaction, forms a covalent bond with a five carbon sugar, 237 00:16:42,000 --> 00:16:47,000 once again deoxyribose or ribose. Once the base associates with the 238 00:16:47,000 --> 00:16:52,000 sugar, that is the base plus the sugar is called a nucleoside. 239 00:16:52,000 --> 00:16:57,000 So when we talk in polite company about a nucleoside we're not talking 240 00:16:57,000 --> 00:17:02,000 about free bases. We're talking about the covalent 241 00:17:02,000 --> 00:17:08,000 interaction of a pentose binding to a base. The pentose could be one or 242 00:17:08,000 --> 00:17:13,000 the other of these two. 
And that's what a nucleoside is. 243 00:17:13,000 --> 00:17:19,000 If on top of that we add additionally one or more phosphates 244 00:17:19,000 --> 00:17:24,000 then we even modify our language even further because a base attached 245 00:17:24,000 --> 00:17:30,000 to a sugar which in turn is attached to a phosphate is called 246 00:17:30,000 --> 00:17:34,000 a nucleotide. The nucleotide, 247 00:17:34,000 --> 00:17:38,000 the T is there to designate the fact that there's actually, 248 00:17:38,000 --> 00:17:42,000 in addition to the base and the sugar there's a phosphate which is 249 00:17:42,000 --> 00:17:46,000 attached and extends off the end. And there are slightly different 250 00:17:46,000 --> 00:17:50,000 names. For the purposes of this course we won't get into this very 251 00:17:50,000 --> 00:17:54,000 arcane nomenclature because it is, to be frank, and you know I always 252 00:17:54,000 --> 00:17:58,000 am frank with you, confusing. Here is U. 253 00:17:58,000 --> 00:18:02,000 And when uracil, the base becomes linked to a ribose 254 00:18:02,000 --> 00:18:07,000 it changes its name from uracil to uridine. Cytosine changes its name 255 00:18:07,000 --> 00:18:12,000 to cytidine when it becomes a nucleoside by a covalent linkage to 256 00:18:12,000 --> 00:18:17,000 either ribose or deoxyribose. Thymine becomes thymidine. And the 257 00:18:17,000 --> 00:18:21,000 same nomenclature exists, the shift in their names exists in 258 00:18:21,000 --> 00:18:26,000 the case of the purines as well, adenine becomes adenosine and so 259 00:18:26,000 --> 00:18:31,000 forth. We need to focus mostly on the 260 00:18:31,000 --> 00:18:35,000 notion of A, C, T, G and U. Those are the things we 261 00:18:35,000 --> 00:18:39,000 need to think about. And why is this nomenclature 262 00:18:39,000 --> 00:18:43,000 confusing? Well, here the nucleoside ends with osine, 263 00:18:43,000 --> 00:18:48,000 O-S-I-N-E. You see that here? 
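The naming shift described above (free base, then nucleoside once a sugar is attached, then nucleotide once a phosphate is added) can be summarized in a short sketch. The function name `classify` and its output strings are hypothetical, chosen only for illustration. The lecture names uridine, cytidine, thymidine and adenosine; guanosine is filled in as the standard name for the guanine nucleoside, and the "deoxy" prefix used for deoxyribose forms is deliberately ignored.

```python
# Base name -> nucleoside name (base + sugar). The "deoxy-" prefix for
# deoxyribose forms is left out to keep the sketch small.
NUCLEOSIDE_NAME = {
    "adenine": "adenosine",
    "guanine": "guanosine",
    "cytosine": "cytidine",
    "thymine": "thymidine",
    "uracil": "uridine",
}


def classify(base: str, sugar: bool, phosphates: int = 0) -> str:
    """Name a molecule according to what is attached to the base."""
    if not sugar:
        if phosphates:
            raise ValueError("phosphates attach to the sugar, not the free base")
        return f"{base} (free base)"
    name = NUCLEOSIDE_NAME[base]
    if phosphates == 0:
        return f"{name} (nucleoside)"
    return f"{name} + {phosphates} phosphate(s) (nucleotide)"


print(classify("uracil", sugar=False))               # uracil (free base)
print(classify("uracil", sugar=True))                # uridine (nucleoside)
print(classify("adenine", sugar=True, phosphates=3))  # adenosine + 3 phosphate(s) (nucleotide)
```

The last call corresponds to ATP: a base, a sugar and three phosphates.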
You say that's easy to remember, 264 00:18:48,000 --> 00:18:52,000 but look up here. Here the base ends with O-S-I-N-E. 265 00:18:52,000 --> 00:18:56,000 And so this nomenclature which was cobbled together in the early 20th 266 00:18:56,000 --> 00:19:00,000 century will bedevil us and generations of biology students to 267 00:19:00,000 --> 00:19:05,000 come. Oh well, that's life. Now, one of the things we're 268 00:19:05,000 --> 00:19:09,000 interested in and which I talked about briefly last time was the 269 00:19:09,000 --> 00:19:14,000 whole notion of polymerization, i.e., how we actually polymerize a 270 00:19:14,000 --> 00:19:18,000 chain. Let's look at this illustration which I think is more 271 00:19:18,000 --> 00:19:23,000 useful. Recall the fact that I emphasized with great seriousness 272 00:19:23,000 --> 00:19:27,000 the fact that nucleic acid synthesis always occurs in a certain polarity. 273 00:19:27,000 --> 00:19:32,000 It goes in a certain direction. You cannot add nucleotides on one 274 00:19:32,000 --> 00:19:36,000 end or the other end willy-nilly. You can only add them onto the 275 00:19:36,000 --> 00:19:41,000 3-prime end. And keep in mind that the reason why this is defined as 276 00:19:41,000 --> 00:19:45,000 the 5-prime end is that this is, the last hydroxyl sticking out at 277 00:19:45,000 --> 00:19:49,000 this end comes out of the 5-prime carbon right here, 278 00:19:49,000 --> 00:19:54,000 the 5-prime hydroxyl. And conversely at this end we're 279 00:19:54,000 --> 00:19:58,000 adding another base at the 3-prime hydroxyl, at this end, 280 00:19:58,000 --> 00:20:03,000 which creates the 3-prime end of the DNA or the RNA. 281 00:20:03,000 --> 00:20:08,000 In fact, the polymerization always occurs between the 5-prime end of a 282 00:20:08,000 --> 00:20:13,000 deoxyribonucleotide indicated here where the bases remain anonymous and 283 00:20:13,000 --> 00:20:18,000 the 3-prime hydroxyl. That's the way it always happens. 
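The directionality rule above can be written down as a tiny model. This is a sketch under stated assumptions, not a description of any real library: the class name `Strand` and its methods are hypothetical, and a strand is represented simply as a string of base letters written from the 5-prime end (left) to the 3-prime end (right).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    """A nucleic acid strand written 5-prime to 3-prime, left to right."""

    bases: str

    def extend(self, base: str) -> "Strand":
        """Add one nucleotide at the 3-prime end, the only end that grows."""
        if len(base) != 1 or base not in "ACGTU":
            raise ValueError(f"not a single base letter: {base!r}")
        return Strand(self.bases + base)

    @property
    def five_prime(self) -> str:
        """Base at the 5-prime end; it never changes during elongation."""
        return self.bases[0]

    @property
    def three_prime(self) -> str:
        """Base at the 3-prime end, where the next nucleotide is attached."""
        return self.bases[-1]


strand = Strand("ATG").extend("C")
print(strand.bases)        # ATGC
print(strand.five_prime)   # A
print(strand.three_prime)  # C
```

There is intentionally no method for adding to the 5-prime end, which mirrors the point that nucleotides cannot be added at either end willy-nilly.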
284 00:20:18,000 --> 00:20:24,000 And here we begin to appreciate the role of the high energy 285 00:20:24,000 --> 00:20:29,000 phosphate linkage. Because this high energy 286 00:20:29,000 --> 00:20:34,000 triphosphate linkage, which is synthesized elsewhere in 287 00:20:34,000 --> 00:20:39,000 the cell like a coiled spring and which contains a lot of potential 288 00:20:39,000 --> 00:20:43,000 energy by virtue of this mutual negative repulsion of the phosphate 289 00:20:43,000 --> 00:20:48,000 groups, this energy is used to form the bond here between the phosphate 290 00:20:48,000 --> 00:20:53,000 in this condensation reaction and the 3-prime hydroxyl. 291 00:20:53,000 --> 00:20:58,000 So that requires an investment of energy. And the resulting linkage 292 00:20:58,000 --> 00:21:03,000 which is formed is sometimes called a phosphodiester linkage. 293 00:21:03,000 --> 00:21:06,000 Why phosphodiester? Well, obviously it's phospho. 294 00:21:06,000 --> 00:21:10,000 And there actually are two esterifications that are occurring 295 00:21:10,000 --> 00:21:14,000 here. If we look at one of these phosphodiester bonds we see that an 296 00:21:14,000 --> 00:21:17,000 ester linkage has been made with this hydroxyl and an ester linkage 297 00:21:17,000 --> 00:21:21,000 has been made with this hydroxyl. And for that reason it's called a 298 00:21:21,000 --> 00:21:25,000 phosphodiester linkage. Therefore we come to realize that 299 00:21:25,000 --> 00:21:29,000 polymerization of nucleic acids doesn't take place spontaneously. 300 00:21:29,000 --> 00:21:33,000 It requires the investment of a high-energy molecule, 301 00:21:33,000 --> 00:21:38,000 the investment of the energy that it carries. And when this linkage is 302 00:21:38,000 --> 00:21:42,000 formed the diphosphate here, the beta and the gamma phosphates 303 00:21:42,000 --> 00:21:47,000 float off into interstellar space. 
It's only the alpha phosphate that 304 00:21:47,000 --> 00:21:52,000 is retained to form the resulting diphosphate, a phosphodiester 305 00:21:52,000 --> 00:21:56,000 linkage. And this process can be repeated literally thousands and 306 00:21:56,000 --> 00:22:01,000 millions of times. An average human chromosome 307 00:22:01,000 --> 00:22:06,000 contains on the order of tens, fifty, a hundred mega-bases of DNA. 308 00:22:06,000 --> 00:22:10,000 A mega-base is a million bases or a million nucleotides. 309 00:22:10,000 --> 00:22:14,000 So there you can understand that there's no limit to the extent of 310 00:22:14,000 --> 00:22:18,000 elongation of these various kinds of molecules. Now, 311 00:22:18,000 --> 00:22:22,000 note by the way yet another feature of this which is that the 312 00:22:22,000 --> 00:22:26,000 distinguishing feature between DNA and RNA, the most important 313 00:22:26,000 --> 00:22:30,000 distinguishing feature is this 2-prime hydroxyl. 314 00:22:30,000 --> 00:22:34,000 And here we're talking about DNA, but we could almost in the same 315 00:22:34,000 --> 00:22:38,000 breath be talking about the way that RNA gets polymerized. 316 00:22:38,000 --> 00:22:42,000 Why? Because this 2-prime hydroxyl or this 2-prime hydrogen in this 317 00:22:42,000 --> 00:22:46,000 case is out of the line of fire. The business action is happening 318 00:22:46,000 --> 00:22:50,000 right along here. Look where the business action is 319 00:22:50,000 --> 00:22:54,000 in terms of the backbone. The 2-prime hydroxyl is off to the 320 00:22:54,000 --> 00:22:58,000 side. And whether it's oxygen or just whether it's OH, 321 00:22:58,000 --> 00:23:02,000 that is in ribose, a hydroxyl group or just a hydrogen, 322 00:23:02,000 --> 00:23:06,000 as is indicated here in the case of deoxyribose, is irrelevant 323 00:23:06,000 --> 00:23:10,000 to the polymerization. 
And therefore we can guess or intuit, 324 00:23:10,000 --> 00:23:14,000 and just because we guessed doesn't mean it's wrong, 325 00:23:14,000 --> 00:23:18,000 often it's right, it doesn't really make much 326 00:23:18,000 --> 00:23:22,000 difference whether we look at DNA or RNA. Here's a polymerization scheme 327 00:23:22,000 --> 00:23:26,000 of RNA and it's absolutely identical to that of DNA. 328 00:23:26,000 --> 00:23:30,000 In this case it's ribonucleoside triphosphates that are used for the 329 00:23:30,000 --> 00:23:35,000 polymerization reaction. Now here I just uttered the phrase 330 00:23:35,000 --> 00:23:41,000 ribonucleoside triphosphates. Why did I say that? Well, 331 00:23:41,000 --> 00:23:47,000 ultimately only the good Lord knows why I said that. 332 00:23:47,000 --> 00:23:53,000 But let's look at this phrase. I said ribonucleoside triphosphate 333 00:23:53,000 --> 00:23:59,000 rather than ribonucleotide triphosphate because the fact that I 334 00:23:59,000 --> 00:24:05,000 added this on the end makes the T there unnecessary. 335 00:24:05,000 --> 00:24:09,000 The T is there to indicate the phosphate being attached to the 336 00:24:09,000 --> 00:24:13,000 ribose or the deoxyribose. But if I'm adding this phrase over 337 00:24:13,000 --> 00:24:17,000 here, triphosphate, that obviates, that makes 338 00:24:17,000 --> 00:24:21,000 unnecessary my saying ribonucleotide triphosphate. If I'm looking at UTP 339 00:24:21,000 --> 00:24:25,000 or ATP, I would say it's a ribonucleotide if I don't mention 340 00:24:25,000 --> 00:24:29,000 the triphosphate. But the moment this comes from my 341 00:24:29,000 --> 00:24:33,000 lips then we'll say ribonucleoside indicating that a ribonucleoside, 342 00:24:33,000 --> 00:24:37,000 that is a base and a sugar are then attached to one or more 343 00:24:37,000 --> 00:24:41,000 phosphate linkages. 
Now, the ultimate basis of the 344 00:24:41,000 --> 00:24:45,000 biological revolution comes from the realization that these different 345 00:24:45,000 --> 00:24:50,000 bases have complementarity to one another. That is, they like to be 346 00:24:50,000 --> 00:24:54,000 together with one another. And if we look at this and we think 347 00:24:54,000 --> 00:24:58,000 about the DNA double helix we come to realize that these bases have 348 00:24:58,000 --> 00:25:03,000 affinities for one another. And the general affinity is one 349 00:25:03,000 --> 00:25:07,000 purine likes to be facing opposite one pyrimidine. 350 00:25:07,000 --> 00:25:12,000 One pyrimidine opposite one purine. And if we have two pyrimidines 351 00:25:12,000 --> 00:25:16,000 facing one another they're not close enough to one another to kiss. 352 00:25:16,000 --> 00:25:21,000 And if we have two purines they're too close to one another, 353 00:25:21,000 --> 00:25:25,000 they're bumping into one another, they take up too much space. And 354 00:25:25,000 --> 00:25:30,000 therefore the optimal configuration is one purine and one pyrimidine. 355 00:25:30,000 --> 00:25:34,000 And you can see these two pairings here in the case of what happens 356 00:25:34,000 --> 00:25:39,000 with DNA. In fact, the realization of this diagram 357 00:25:39,000 --> 00:25:44,000 right here is what triggered the discovery of the structure of DNA in 1953. 358 00:25:44,000 --> 00:25:49,000 This diagram right here is what triggered the biological revolution. 359 00:25:49,000 --> 00:25:53,000 And though it's been depicted in many, many ways it's worthwhile 360 00:25:53,000 --> 00:25:58,000 dwelling on it because this is perhaps the most important diagram 361 00:25:58,000 --> 00:26:02,000 that we'll address all semester. Although this doesn't mean we have 362 00:26:02,000 --> 00:26:06,000 to spend all semester assimilating it. It's not so complicated. 363 00:26:06,000 --> 00:26:10,000 It's relatively straightforward. And let's look at its features.
364 00:26:10,000 --> 00:26:14,000 Let's dwell on them momentarily because this is a microscopic 365 00:26:14,000 --> 00:26:17,000 snapshot of what DNA is composed of. You all know it's a double helix 366 00:26:17,000 --> 00:26:21,000 and therefore there are two strands of DNA in a double helix. 367 00:26:21,000 --> 00:26:25,000 And one of the interesting things about the double helix, 368 00:26:25,000 --> 00:26:29,000 although we're not showing it yet, we're just showing a little section 369 00:26:29,000 --> 00:26:33,000 of a double helix, is the polarity of the two chains 370 00:26:33,000 --> 00:26:36,000 that constitute the double helix. Let's look at that polarity. 371 00:26:36,000 --> 00:26:40,000 This one is running in one direction and this one, 372 00:26:40,000 --> 00:26:44,000 the opposite one, the complementary one is running in the other 373 00:26:44,000 --> 00:26:47,000 direction. And therefore we talk about the double helix as being 374 00:26:47,000 --> 00:26:51,000 anti-parallel. Well, I guess I should have a 375 00:26:51,000 --> 00:26:55,000 bandage on the other finger to convince you but you get the idea. 376 00:26:55,000 --> 00:26:59,000 They're running in opposite directions. 377 00:26:59,000 --> 00:27:03,000 They're not both pointed the same. And the other thing to indicate is, 378 00:27:03,000 --> 00:27:08,000 to repeat what I said just seconds ago, that there's a complementarity 379 00:27:08,000 --> 00:27:13,000 between the purines and the pyrimidines. So we use the word 380 00:27:13,000 --> 00:27:18,000 complementary with great frequency, with great promiscuity in biology. 381 00:27:18,000 --> 00:27:22,000 Complementarity refers to the fact that A and T here or A and U because 382 00:27:22,000 --> 00:27:27,000 I said U and T are functionally equivalent, they like to 383 00:27:27,000 --> 00:27:32,000 be opposite one another. There's a purine and a pyrimidine. 
384 00:27:32,000 --> 00:27:36,000 And the same is the case with C and G, they like to be opposite one 385 00:27:36,000 --> 00:27:40,000 another. Now, there is specificity here. 386 00:27:40,000 --> 00:27:44,000 You might say any purine can pair up with any pyrimidine, 387 00:27:44,000 --> 00:27:48,000 but it's not the case. For instance, A doesn't like to be opposite C and 388 00:27:48,000 --> 00:27:52,000 T doesn't like to be opposite G. So one of the things we have to 389 00:27:52,000 --> 00:27:56,000 memorize this semester, and it's not many and it's not hard, 390 00:27:56,000 --> 00:28:00,000 is that A and T are opposite one another, or A and U, 391 00:28:00,000 --> 00:28:04,000 and G and C are opposite one another. That's one of the essential concepts 392 00:28:04,000 --> 00:28:08,000 in molecular biology. There are not a thousand things you 393 00:28:08,000 --> 00:28:11,000 need to learn, but if you don't understand that 394 00:28:11,000 --> 00:28:14,000 then ultimately sooner or later you'll find yourself in a swamp, 395 00:28:14,000 --> 00:28:18,000 literally or figuratively. Now, let's look at the difference between 396 00:28:18,000 --> 00:28:21,000 these two. One of the interesting things is, to state the obvious, 397 00:28:21,000 --> 00:28:24,000 the way they're associating with one another, hand in glove, 398 00:28:24,000 --> 00:28:28,000 is via hydrogen bonds. That's not a covalent interaction, 399 00:28:28,000 --> 00:28:31,000 which means it's reversible. We talked about that. 400 00:28:31,000 --> 00:28:35,000 Which means that if we were to take a solution of double stranded DNA 401 00:28:35,000 --> 00:28:39,000 and boil it we would break those hydrogen bonds. 402 00:28:39,000 --> 00:28:43,000 Remember they only have 8 kilocalories per mole and boiling 403 00:28:43,000 --> 00:28:47,000 water has far higher energetic content.
And consequently if we 404 00:28:47,000 --> 00:28:51,000 heat up a DNA double helix and we break those hydrogen bonds that 405 00:28:51,000 --> 00:28:55,000 hold the two strands together, the two strands come apart, the DNA 406 00:28:55,000 --> 00:28:59,000 ends up being denatured, that is the two strands are 407 00:28:59,000 --> 00:29:03,000 separated one from the other. In fact, if there ever were a 408 00:29:03,000 --> 00:29:07,000 covalent cross-link between the two strands that's really bad news for a 409 00:29:07,000 --> 00:29:11,000 cell carrying such a DNA double helix. A covalent cross-link from 410 00:29:11,000 --> 00:29:15,000 one strand to the other of a DNA double helix often represents a sign that a 411 00:29:15,000 --> 00:29:19,000 cell should go off and die because it has a very hard time dealing with 412 00:29:19,000 --> 00:29:23,000 that by virtue of the fact, as we will soon learn or as you 413 00:29:23,000 --> 00:29:27,000 already know, the cell has, with some frequency, to pull apart 414 00:29:27,000 --> 00:29:31,000 these two strands. And therefore this association must 415 00:29:31,000 --> 00:29:35,000 be tight enough so that it's stable at body temperature but not so tight 416 00:29:35,000 --> 00:29:39,000 that it cannot be pulled apart when certain biological conditions call 417 00:29:39,000 --> 00:29:43,000 for it. You see that in fact here there are three hydrogen bonds and 418 00:29:43,000 --> 00:29:47,000 here there are only two hydrogen bonds. That also has its 419 00:29:47,000 --> 00:29:51,000 implications. It turns out to be the case that 420 00:29:51,000 --> 00:29:55,000 this hydrogen and this oxygen here are far enough apart that for 421 00:29:55,000 --> 00:29:59,000 all practical purposes they don't really make a very good 422 00:29:59,000 --> 00:30:03,000 hydrogen bond. And therefore we think of this as 423 00:30:03,000 --> 00:30:07,000 having two and this having three.
And if you were to try to put C 424 00:30:07,000 --> 00:30:12,000 opposite A or G opposite T you'd see that they cannot form hydrogen bonds 425 00:30:12,000 --> 00:30:16,000 well with one another. Instead they kind of bump into one 426 00:30:16,000 --> 00:30:20,000 another, and therefore are not complementary to one another at all. 427 00:30:20,000 --> 00:30:25,000 There's another corollary that we can deduce from this diagram, 428 00:30:25,000 --> 00:30:29,000 and that is the following. If it's always true that A pairs 429 00:30:29,000 --> 00:30:36,000 with T and G pairs with C, then the amounts must match: 430 00:30:36,000 --> 00:30:40,000 A equals T and G equals C. By the way, this is an interesting 431 00:30:40,000 --> 00:30:45,000 story. This is the Chargaff Rule. About a year or so before 432 00:30:45,000 --> 00:30:49,000 Watson and Crick figured out the structure of the double helix there 433 00:30:49,000 --> 00:30:54,000 was a guy named Erwin Chargaff in New York at Columbia University who 434 00:30:54,000 --> 00:30:58,000 one day figured out that if you looked at a whole bunch of nucleic 435 00:30:58,000 --> 00:31:03,000 acids, different DNAs from different cell types -- 436 00:31:03,000 --> 00:31:09,000 And in certain cell types what he found was that, 437 00:31:09,000 --> 00:31:16,000 for example, G equals 20% of the bases. Therefore, 438 00:31:16,000 --> 00:31:23,000 obviously we know C must also equal 20% because there always has to be a 439 00:31:23,000 --> 00:31:29,000 C opposite a G in the double helix, right? G and C always have to be 440 00:31:29,000 --> 00:31:36,000 equal. And Chargaff discovered that, in fact, A in such DNA was 441 00:31:36,000 --> 00:31:43,000 30% and T was also 30%. Well, these together make up 100% 442 00:31:43,000 --> 00:31:49,000 which is, we're not in higher math yet, but A and T were always the 443 00:31:49,000 --> 00:31:55,000 same.
If you looked at another type of DNA he might find that G equals 444 00:31:55,000 --> 00:32:00,000 23% and C also equals 23%. And in this same DNA then A would 445 00:32:00,000 --> 00:32:05,000 equal 27%, I guess, and T also equals 27%. 446 00:32:05,000 --> 00:32:10,000 And I hope that adds up to 100%. So he looked at a whole bunch of 447 00:32:10,000 --> 00:32:15,000 DNAs and they always tracked one another, A always tracked T, 448 00:32:15,000 --> 00:32:21,000 G always tracked C. And then in 1953 up come these two guys from 449 00:32:21,000 --> 00:32:26,000 Cambridge, England, Watson and Crick, whom Chargaff 450 00:32:26,000 --> 00:32:31,000 regarded as upstarts, as smart-asses who thought they knew 451 00:32:31,000 --> 00:32:35,000 all the answers. And Watson and Crick said, 452 00:32:35,000 --> 00:32:39,000 gee, this Chargaff rule really is very interesting because it suggests 453 00:32:39,000 --> 00:32:42,000 something about the structure of DNA. These cannot just be coincidences. 454 00:32:42,000 --> 00:32:46,000 There's something profoundly important, they said, 455 00:32:46,000 --> 00:32:50,000 correctly, in the fact that there was always an equivalence between A 456 00:32:50,000 --> 00:32:53,000 and T and between G and C. And that represented one of the 457 00:32:53,000 --> 00:32:57,000 conceptual cornerstones of their elucidating the structure 458 00:32:57,000 --> 00:33:01,000 of the double helix. And so Chargaff, who died last year 459 00:33:01,000 --> 00:33:05,000 or the year before last, at an advanced age, was for the next 460 00:33:05,000 --> 00:33:09,000 fifty years a very bitter man, because he was this far away from 461 00:33:09,000 --> 00:33:13,000 figuring it out. Not this far, but this far away 462 00:33:13,000 --> 00:33:17,000 from making the most important discovery 463 00:33:17,000 --> 00:33:22,000 in biology in the 20th century. He had the information right there.
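The Chargaff percentages above follow directly from base pairing, and that can be checked in a few lines. What follows is only a sketch: the function name `duplex_composition` and the five-base example strand are made up for illustration and do not come from the lecture. It builds the partner base for every position of one strand, counts the bases on both strands, and reports percentages.

```python
from collections import Counter

# Watson-Crick partners for the four DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def duplex_composition(strand):
    """Percentage of each base in the double helix built on `strand`."""
    # Base sitting opposite each position (strand direction does not matter for counting).
    partner = "".join(PAIR[base] for base in strand)
    counts = Counter(strand + partner)   # tally bases on both strands
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "ACGT"}

# Hypothetical strand that is lopsided on its own (three A's, no T's).
composition = duplex_composition("AAAGC")
print(composition)
```

Even though the single strand is far from balanced, the duplex as a whole has A equal to T and G equal to C, which is the pattern Chargaff measured.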
464 00:33:22,000 --> 00:33:26,000 And if he had thought a little bit about information theory and thought 465 00:33:26,000 --> 00:33:30,000 a little bit about the way information content is encoded he 466 00:33:30,000 --> 00:33:34,000 could have already predicted, not the detailed structure of the 467 00:33:34,000 --> 00:33:39,000 double helix, but at least the way in which it encodes information. 468 00:33:39,000 --> 00:33:42,000 Because, to state the obvious, and as many of you know already, 469 00:33:42,000 --> 00:33:46,000 if one looks at the structure of a double helix one can, 470 00:33:46,000 --> 00:33:50,000 in principle, depict it in a two or a three-dimensional cartoon. 471 00:33:50,000 --> 00:33:54,000 Here's the way one can think of it. This is the way we've been talking 472 00:33:54,000 --> 00:33:58,000 about it over the last couple of minutes. It's a two-dimensional 473 00:33:58,000 --> 00:34:02,000 double helix. And from the point of view of 474 00:34:02,000 --> 00:34:06,000 information encoding, it doesn't really matter whether we 475 00:34:06,000 --> 00:34:10,000 draw it this way or that way. It happens that the double helix is 476 00:34:10,000 --> 00:34:14,000 turned around like that, it's twisted around. It's very 477 00:34:14,000 --> 00:34:18,000 difficult for biological molecules to be totally flat for an extended 478 00:34:18,000 --> 00:34:22,000 period. And the helix is, in fact, something that is 479 00:34:22,000 --> 00:34:26,000 frequently resorted to. Witness the alpha helix in the 480 00:34:26,000 --> 00:34:30,000 protein. So these are turned around. It turns out that each of these 481 00:34:30,000 --> 00:34:34,000 constitutes a base pair, and each of these base pairs is, 482 00:34:34,000 --> 00:34:39,000 in fact, 3.4 angstroms apart. 3.4 angstroms thick. 483 00:34:39,000 --> 00:34:44,000 So if you have ten of them, the DNA helix advances 34 angstroms 484 00:34:44,000 --> 00:34:49,000 every ten base pairs.
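To make the helix arithmetic concrete, here is a small sketch using only two figures from the lecture: 3.4 angstroms between adjacent base pairs and roughly ten base pairs per turn. The constant and function names are invented for this illustration, and the hundred-megabase chromosome is the round number mentioned earlier, not a measured value.

```python
RISE_PER_BP_ANGSTROM = 3.4   # spacing between adjacent base pairs, per the lecture
BP_PER_TURN = 10             # roughly ten base pairs per helical turn, per the lecture
ANGSTROM_IN_METERS = 1e-10   # definition of the angstrom

def helix_length_angstrom(n_base_pairs):
    """Length along the helix axis covered by n_base_pairs."""
    return n_base_pairs * RISE_PER_BP_ANGSTROM

def helix_turns(n_base_pairs):
    """Approximate number of full turns in n_base_pairs."""
    return n_base_pairs / BP_PER_TURN

# One full turn: ten base pairs.
pitch_angstrom = helix_length_angstrom(BP_PER_TURN)

# A hundred-megabase chromosome, stretched out straight.
chromosome_bp = 100_000_000
chromosome_length_m = helix_length_angstrom(chromosome_bp) * ANGSTROM_IN_METERS

print(pitch_angstrom, helix_turns(chromosome_bp), chromosome_length_m)
```

So one turn spans about 34 angstroms, and a hundred megabases laid end to end would come to roughly 3.4 centimeters of double helix.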
And ten turns is roughly, 485 00:34:49,000 --> 00:34:54,000 oh, I'm sorry. Ten base pairs is roughly one turn of the double helix. 486 00:34:54,000 --> 00:34:59,000 So if you go here and you count up ten, we should start again at the 487 00:34:59,000 --> 00:35:04,000 same orientation. Another ten is another turn. 488 00:35:04,000 --> 00:35:09,000 Another ten is another turn. In fact, I'm just recalling that I 489 00:35:09,000 --> 00:35:15,000 was once a TA in 7.01 in 1965. And there was a physics 490 00:35:15,000 --> 00:35:20,000 professor who became a biologist who always talked about these double 491 00:35:20,000 --> 00:35:25,000 helices. And he always talked about the measurements of different DNA 492 00:35:25,000 --> 00:35:30,000 molecules. Now, you may know that the term angstrom 493 00:35:30,000 --> 00:35:36,000 is named after a Swedish physicist named Ångström. 494 00:35:36,000 --> 00:35:40,000 That's why it got its name. So whenever this professor, 495 00:35:40,000 --> 00:35:45,000 whom I never corrected, God forbid, ever talked about something that was 496 00:35:45,000 --> 00:35:50,000 ten angstroms long, he called these ten angstra. 497 00:35:50,000 --> 00:35:54,000 Now, as you know, when you go in a Latin noun from singular to plural 498 00:35:54,000 --> 00:35:59,000 it's “-um” to “-a”, right? So he pretended this was a 499 00:35:59,000 --> 00:36:04,000 Latin word. What's a good word? 500 00:36:04,000 --> 00:36:08,000 Sorry? What's a common Latin word we use? Sorry? 501 00:36:08,000 --> 00:36:12,000 Millennium. Yeah, millennium, millennia. 502 00:36:12,000 --> 00:36:16,000 So he went from angstrom to angstra. And it went on for a whole year. I 503 00:36:16,000 --> 00:36:20,000 never said anything but I knew better. OK, anyhow. 504 00:36:20,000 --> 00:36:24,000 Here you see the genius of Watson and Crick. And, 505 00:36:24,000 --> 00:36:28,000 by the way, Ångström was a Swede, as I said, and not a Roman soldier. 506 00:36:28,000 --> 00:36:32,000 So here we see.
OK. So here is the genius of their 507 00:36:32,000 --> 00:36:36,000 discovery. And the elegance of it is not how complicated it is. 508 00:36:36,000 --> 00:36:41,000 The elegance of it is how simple it is, because information, we see, is 509 00:36:41,000 --> 00:36:46,000 encoded in two strands. The information is redundant 510 00:36:46,000 --> 00:36:50,000 because if we know the sequence of one strand we can obviously predict 511 00:36:50,000 --> 00:36:55,000 the sequence in the other strand because it's a complementary 512 00:36:55,000 --> 00:37:00,000 sequence. If we always realize that A is 513 00:37:00,000 --> 00:37:05,000 opposite T and G is opposite C we can know directly that if the sequence in 514 00:37:05,000 --> 00:37:10,000 one strand is, say, A, C, T, G, G, C, then in the other strand, 515 00:37:10,000 --> 00:37:16,000 moving in the other, anti-parallel direction, the sequence is like this. 516 00:37:16,000 --> 00:37:21,000 I don't need to know the sequence of the other strand. 517 00:37:21,000 --> 00:37:26,000 I can predict it by using these rules of complementary 518 00:37:26,000 --> 00:37:31,000 sequence structure. And that, in turn, 519 00:37:31,000 --> 00:37:35,000 obviously has important implications. If we look at the three-dimensional 520 00:37:35,000 --> 00:37:39,000 structure, this is more of what's called a space-filling model. 521 00:37:39,000 --> 00:37:44,000 This is the way the x-ray crystallographer would actually 522 00:37:44,000 --> 00:37:48,000 depict it. We talked about space-filling models before. 523 00:37:48,000 --> 00:37:53,000 One of the things we appreciate is the fact that the phosphates are on 524 00:37:53,000 --> 00:37:57,000 the outside and these bases are on the inside. And because these bases 525 00:37:57,000 --> 00:38:01,000 are also able to stack with one another via hydrophobic interactions, 526 00:38:01,000 --> 00:38:05,000 importantly, the bases are protected.
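The rule for predicting one strand from the other can be written down directly. This is a minimal sketch, assuming both strands are written in the conventional 5' to 3' direction; the function name `complementary_strand` is chosen here for illustration. It uses the A, C, T, G, G, C sequence from the passage above.

```python
# Pairing rule from the lecture: A opposite T, G opposite C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand_5_to_3):
    """Return the partner strand, also written 5' to 3'."""
    # Walk the input backwards because the partner runs anti-parallel,
    # swapping each base for the one it pairs with.
    return "".join(COMPLEMENT[base] for base in reversed(strand_5_to_3))

partner = complementary_strand("ACTGGC")
print(partner)  # GCCAGT
```

Applying the function twice returns the original strand, which is another way of seeing that the two strands carry the same information.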
The face where they interact is 527 00:38:05,000 --> 00:38:09,000 protected from the outside world. What do I mean by that? Well, 528 00:38:09,000 --> 00:38:13,000 let's go back to this figure right here. You see the interaction faces 529 00:38:13,000 --> 00:38:16,000 between A and T or C and G, they're not on the outside of the helix. 530 00:38:16,000 --> 00:38:20,000 They're hidden in the middle. And that's important because it means 531 00:38:20,000 --> 00:38:23,000 that these interactions between A and T and G and C, 532 00:38:23,000 --> 00:38:27,000 you can see it up here as well, are biochemically protected from any 533 00:38:27,000 --> 00:38:31,000 accidents that might happen on the outside. 534 00:38:31,000 --> 00:38:35,000 They're sheltered from that. And that's important because the 535 00:38:35,000 --> 00:38:39,000 information content in DNA must be held very stable, 536 00:38:39,000 --> 00:38:43,000 very constant. If it isn't then we have real trouble like cancer. 537 00:38:43,000 --> 00:38:47,000 And therefore whenever a cell divides and copies its DNA, 538 00:38:47,000 --> 00:38:51,000 its three billion base pairs of DNA, whenever that happens the number of 539 00:38:51,000 --> 00:38:55,000 mistakes that are made is only three or four or five out of three billion. 540 00:38:55,000 --> 00:39:00,000 A stunningly low rate. And this DNA can sit around. 541 00:39:00,000 --> 00:39:04,000 I told you about Neanderthal DNA that can sit around for 542 00:39:04,000 --> 00:39:08,000 30,000 years and it's chemically relatively stable. 543 00:39:08,000 --> 00:39:12,000 In part, a testimonial to the fact that this base pairing, 544 00:39:12,000 --> 00:39:16,000 the face where the two bases interact across one another, 545 00:39:16,000 --> 00:39:20,000 this is shielded from the outside world because it's tucked into the 546 00:39:20,000 --> 00:39:24,000 middle, these interaction faces here. This is the inside of the helix.
547 00:39:24,000 --> 00:39:29,000 Here the sugar phosphate groups are on the outside. 548 00:39:29,000 --> 00:39:33,000 In fact, when Watson and Crick were struggling with the structure of the 549 00:39:33,000 --> 00:39:37,000 double helix they were in a horse race with a man named Linus Pauling, 550 00:39:37,000 --> 00:39:41,000 who was really the inventor, the discoverer of the hydrogen bond 551 00:39:41,000 --> 00:39:45,000 pretty much, who actually got two Nobel Prizes in his lifetime, and who 552 00:39:45,000 --> 00:39:49,000 ended his life believing that if you took enough vitamin C, grams of it 553 00:39:49,000 --> 00:39:53,000 every day, you would never get sick. I don't know what he died of, but 554 00:39:53,000 --> 00:39:57,000 probably like Dr. Atkins he probably died of an 555 00:39:57,000 --> 00:40:02,000 illness he was trying to ward off. Or he might have died of kidney 556 00:40:02,000 --> 00:40:06,000 failure from all the vitamin C he was putting into his body. 557 00:40:06,000 --> 00:40:10,000 Who knows? Anyhow, I digress. The fact is that Pauling thought 558 00:40:10,000 --> 00:40:14,000 that, in fact, DNA was constituted of a triple 559 00:40:14,000 --> 00:40:18,000 helix, with three strands, and that the bases were facing 560 00:40:18,000 --> 00:40:22,000 outward. Well, of course, now we can snicker, 561 00:40:22,000 --> 00:40:26,000 now we can laugh, but at the time nobody had any idea. 562 00:40:26,000 --> 00:40:30,000 Now we realize it's only a double helix and the bases 563 00:40:30,000 --> 00:40:33,000 are facing inward. And, of course, 564 00:40:33,000 --> 00:40:37,000 because Pauling worked with that preconception, 565 00:40:37,000 --> 00:40:41,000 he was never able to figure out what was actually going on, 566 00:40:41,000 --> 00:40:45,000 even though Watson and Crick thought that he had the answer and was about 567 00:40:45,000 --> 00:40:49,000 to scoop them.
Implicit in what I've just said is 568 00:40:49,000 --> 00:40:53,000 the notion that the structure of DNA, which we'll talk about later, 569 00:40:53,000 --> 00:40:57,000 allows it to be copied, i.e., now we're referring in passing, 570 00:40:57,000 --> 00:41:01,000 and we'll get into this in greater detail later, to the whole 571 00:41:01,000 --> 00:41:05,000 process of replication. Because if we have genetic material 572 00:41:05,000 --> 00:41:09,000 and we've created in a certain sequence we must be able to make 573 00:41:09,000 --> 00:41:13,000 more copies of it. Keep in mind that each one of us, 574 00:41:13,000 --> 00:41:18,000 as I mentioned to you some lectures ago, we start out with a fertilized 575 00:41:18,000 --> 00:41:22,000 egg with one human genome, and through our lifetimes we produce 576 00:41:22,000 --> 00:41:26,000 how many cells? Anybody remember? 577 00:41:26,000 --> 00:41:31,000 I did mention it, right? Is there one soul who remembers it? 578 00:41:31,000 --> 00:41:36,000 Remember the whole story of Sodom and Gomorrah where the Lord says if 579 00:41:36,000 --> 00:41:41,000 there's one soul, one righteous soul in the city I 580 00:41:41,000 --> 00:41:46,000 will spare the city. And of course there wasn't so he 581 00:41:46,000 --> 00:41:52,000 wiped them all out. 30 trillion? Well, 582 00:41:52,000 --> 00:41:57,000 sorry. What do we do for him? Something nice. [APPLAUSE] 583 00:41:57,000 --> 00:42:02,000 Excellent. OK. You'll remain anonymous, 584 00:42:02,000 --> 00:42:06,000 though. You won't be on that video. OK. Ten to the sixteenth cell 585 00:42:06,000 --> 00:42:10,000 divisions in a human lifetime. And on every one of those occasions 586 00:42:10,000 --> 00:42:14,000 the double helix is copied. 
I'm telling you that only to give 587 00:42:14,000 --> 00:42:18,000 you the most dramatic demonstration of the fact that if you have one set 588 00:42:18,000 --> 00:42:22,000 of DNA molecules you need to be able to copy it, you need to be able to 589 00:42:22,000 --> 00:42:26,000 replicate it. And that replicative ability is inherent in the double 590 00:42:26,000 --> 00:42:30,000 helix as Watson and Crick immediately said and as they noted 591 00:42:30,000 --> 00:42:33,000 at the end of their paper when -- I think the last sentence says it 592 00:42:33,000 --> 00:42:37,000 has not escaped our attention that this structure, 593 00:42:37,000 --> 00:42:41,000 i.e., the structure of the double helix, allows for copying, 594 00:42:41,000 --> 00:42:44,000 allows for replication. Because if you pull the two strands apart, 595 00:42:44,000 --> 00:42:48,000 recall we said earlier that in certain biological situations you 596 00:42:48,000 --> 00:42:51,000 need to do that, if the two strands are pulled apart 597 00:42:51,000 --> 00:42:55,000 not by putting them in boiling water but by enzymes whose dedicated 598 00:42:55,000 --> 00:42:59,000 function it is to separate the two strands. 599 00:42:59,000 --> 00:43:04,000 Then when that happens one can begin to create two new daughter double 600 00:43:04,000 --> 00:43:10,000 helices by simply adding on new bases and thereby replicating the 601 00:43:10,000 --> 00:43:16,000 DNA. And how that happens is, of course, as you know, IO 602 00:43:16,000 --> 00:43:22,000 "Intuitively Obvious". OK. Uh-oh, we're in a dyslexic 603 00:43:22,000 --> 00:43:28,000 moment. Now, the fact is I emphasized with great vigor 604 00:43:28,000 --> 00:43:33,000 and conviction -- And remember, class, 605 00:43:33,000 --> 00:43:37,000 when somebody is convinced of something more often than not 606 00:43:37,000 --> 00:43:41,000 they're just wrong in a loud voice. 
But I nevertheless emphasized with 607 00:43:41,000 --> 00:43:45,000 great conviction that T and U are, from an information standpoint, 608 00:43:45,000 --> 00:43:49,000 functionally equivalent. They're replaceable, 609 00:43:49,000 --> 00:43:53,000 interchangeable. And therefore if we want we can 610 00:43:53,000 --> 00:43:57,000 make an RNA copy of a DNA molecule by realizing that if this were DNA 611 00:43:57,000 --> 00:44:01,000 we could make an RNA that was complementary to a DNA strand 612 00:44:01,000 --> 00:44:05,000 realizing that when the RNA molecule was being polymerized, 613 00:44:05,000 --> 00:44:09,000 instead of using T one would use U. All the other three bases are 614 00:44:09,000 --> 00:44:13,000 functionally equivalent. And so we could, in principle, 615 00:44:13,000 --> 00:44:17,000 and indeed it happens transiently, we could make a DNA-RNA hybrid helix 616 00:44:17,000 --> 00:44:21,000 where a DNA molecule is wrapped around an RNA molecule because the 617 00:44:21,000 --> 00:44:25,000 two molecules are functionally equivalent. The only difference 618 00:44:25,000 --> 00:44:29,000 between the two strands would be, well, there are two differences. 619 00:44:29,000 --> 00:44:33,000 One, in the RNA strand we'd have a U instead of a T. 620 00:44:33,000 --> 00:44:37,000 And, two, in the RNA strand all the sugars would be ribose rather than 621 00:44:37,000 --> 00:44:41,000 deoxyribose. Right on. OK. Good. So this structure, 622 00:44:41,000 --> 00:44:45,000 the simplicity of the structure gives one enormous power in encoding 623 00:44:45,000 --> 00:44:50,000 all kinds of information and replicating it. 
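Making an RNA copy of a DNA strand uses the same pairing rule with one substitution, U in place of T. The sketch below assumes the DNA template is written 5' to 3' and returns the RNA in the same convention; `transcribe_template` is a name invented for this example, and real transcription involves much more machinery than a lookup table.

```python
# RNA base placed opposite each DNA template base: U is used where DNA would use T.
DNA_TO_RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_template(dna_template_5_to_3):
    """Return the RNA complementary to a DNA template strand, written 5' to 3'."""
    # Reverse the template because the RNA runs anti-parallel to it.
    return "".join(DNA_TO_RNA_PARTNER[base] for base in reversed(dna_template_5_to_3))

rna = transcribe_template("ACTGGC")
print(rna)  # GCCAGU
```

Compared with the DNA partner strand of the same template, the only change in the letters is U where T would have been; the ribose versus deoxyribose difference is not visible at the level of sequence.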
624 00:44:50,000 --> 00:44:54,000 What it means, as we'll discuss also in great detail later, 625 00:44:54,000 --> 00:44:58,000 is that if we have a certain sequence of bases in the double 626 00:44:58,000 --> 00:45:02,000 helix of DNA an RNA molecule could be made to copy one of the two 627 00:45:02,000 --> 00:45:07,000 strands to make a complementary copy. 628 00:45:07,000 --> 00:45:11,000 And that RNA molecule could then leave the DNA double helix having 629 00:45:11,000 --> 00:45:15,000 lifted one of the sequences from it and then move to another part of the 630 00:45:15,000 --> 00:45:19,000 cell where it might do something interesting. And therefore to 631 00:45:19,000 --> 00:45:23,000 extract information out of the double helix doesn't necessarily 632 00:45:23,000 --> 00:45:27,000 mean to destroy it. If one can copy one of the two 633 00:45:27,000 --> 00:45:31,000 strands in a complementary form as an RNA molecule that may 634 00:45:31,000 --> 00:45:35,000 enable the information that is encoded in the DNA to be copied 635 00:45:35,000 --> 00:45:39,000 without destroying the double helix itself. 636 00:45:39,000 --> 00:45:43,000 Again, that process, which we'll also talk about later, 637 00:45:43,000 --> 00:45:48,000 is called the process of transcription. 638 00:45:48,000 --> 00:45:52,000 And so in the course of this morning I have uttered the three 639 00:45:52,000 --> 00:45:57,000 words which represent the canon, the basic fundaments of molecular 640 00:45:57,000 --> 00:46:02,000 biology. What are the three words? Replication, transcription and 641 00:46:02,000 --> 00:46:06,000 translation. Transcription means when you make an RNA copy of a 642 00:46:06,000 --> 00:46:11,000 strand of the DNA double helix. Let's just add a couple more 643 00:46:11,000 --> 00:46:15,000 footnotes to what I've been saying just so we are on firm ground for 644 00:46:15,000 --> 00:46:20,000 subsequent discussions.
It turns out that RNA 645 00:46:20,000 --> 00:46:25,000 molecules can often form intramolecular double helices. 646 00:46:25,000 --> 00:46:29,000 There's no reason why you cannot make a double helix out of RNA as 647 00:46:29,000 --> 00:46:34,000 you can make out of DNA. And therefore you see often in many 648 00:46:34,000 --> 00:46:38,000 kinds of RNA molecules they will hydrogen bond to themselves using 649 00:46:38,000 --> 00:46:42,000 these complementary sequences. And this is called a hairpin, by 650 00:46:42,000 --> 00:46:46,000 the way, for obvious reasons. And so many RNA molecules, most of 651 00:46:46,000 --> 00:46:51,000 them in fact, have these intramolecular hydrogen bonded 652 00:46:51,000 --> 00:46:55,000 double helices, which confers on them very specific structure. 653 00:46:55,000 --> 00:46:59,000 One other aspect of the two versus three hydrogen bonds 654 00:46:59,000 --> 00:47:04,000 is the following. If a double helix has many Gs and Cs 655 00:47:04,000 --> 00:47:09,000 then it's going to have more hydrogen bonds holding it together 656 00:47:09,000 --> 00:47:13,000 than if it has few Gs and Cs. So let's look at the Chargaff 657 00:47:13,000 --> 00:47:18,000 example. Chargaff who lived for fifty years stewing in his own bile 658 00:47:18,000 --> 00:47:23,000 in bitterness because he couldn't figure this out, 659 00:47:23,000 --> 00:47:28,000 which is exactly what happened by the way. 660 00:47:28,000 --> 00:47:32,000 And so here this has a higher G plus C content, the one on the right, than 661 00:47:32,000 --> 00:47:36,000 this one. This is 23% G, or 46% G plus C. This is 20% G, or 40% G plus C. 662 00:47:36,000 --> 00:47:40,000 If it's 46% G plus C that means there are more hydrogen bonds 663 00:47:40,000 --> 00:47:44,000 holding the two strands together.
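The link between G plus C content and the number of hydrogen bonds is simple arithmetic: three bonds per G-C pair, two per A-T pair. The sketch below works it out for the two Chargaff examples, 46% and 40% G plus C. The function names are invented for this illustration, and counting bonds is only a rough proxy for how hard a helix is to pull apart.

```python
# Hydrogen bonds per base pair, as counted in the lecture.
BONDS_PER_GC_PAIR = 3
BONDS_PER_AT_PAIR = 2

def hydrogen_bonds_per_100_bp(gc_percent):
    """Hydrogen bonds holding together 100 base pairs at a given G+C percentage."""
    at_percent = 100 - gc_percent
    return BONDS_PER_GC_PAIR * gc_percent + BONDS_PER_AT_PAIR * at_percent

def hydrogen_bonds_in_duplex(strand):
    """Hydrogen bonds in the duplex formed by `strand` and its complement."""
    return sum(BONDS_PER_GC_PAIR if base in "GC" else BONDS_PER_AT_PAIR
               for base in strand)

print(hydrogen_bonds_per_100_bp(46))  # 246
print(hydrogen_bonds_per_100_bp(40))  # 240
```

The difference per hundred base pairs is small, six bonds in this comparison, but it accumulates over the length of the molecule.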
And it turns out that if you want 664 00:47:44,000 --> 00:47:48,000 to denature a double helix that has high G plus C content you need to 665 00:47:48,000 --> 00:47:52,000 put in more energy, you need to heat the double helix up 666 00:47:52,000 --> 00:47:56,000 to a higher temperature. It's more difficult to pull the 667 00:47:56,000 --> 00:48:00,000 strands apart. One other side comment on what I 668 00:48:00,000 --> 00:48:04,000 wanted to say is the following. The presence or the absence of this 669 00:48:04,000 --> 00:48:08,000 hydroxyl here in RNA has an important consequence for the 670 00:48:08,000 --> 00:48:12,000 stability of RNA and DNA. Let's look at what happens to an 671 00:48:12,000 --> 00:48:16,000 RNA chain when a hydroxyl ion, which happens to be floating around 672 00:48:16,000 --> 00:48:20,000 at a low concentration, happens to attack this 673 00:48:20,000 --> 00:48:24,000 phosphodiester bond. What happens is that this 674 00:48:24,000 --> 00:48:28,000 phosphodiester bond will tend to cyclize. It's forming this 675 00:48:28,000 --> 00:48:32,000 five membered ring. And ultimately that will resolve and 676 00:48:32,000 --> 00:48:37,000 break causing a cleavage of the RNA chain. This phosphodiester bond now 677 00:48:37,000 --> 00:48:42,000 forming a cyclic structure here as an intermediate representing the 678 00:48:42,000 --> 00:48:46,000 precursor to the ultimately cleaved chain. That means that if you take 679 00:48:46,000 --> 00:48:51,000 RNA molecules and you put them in alkali they will fall apart very 680 00:48:51,000 --> 00:48:56,000 quickly for this very reason. What happens to DNA molecules when 681 00:48:56,000 --> 00:49:01,000 you put them in alkali? Nothing. They're alkali resistant 682 00:49:01,000 --> 00:49:05,000 because there isn't a hydroxyl there to form this five membered ring. 683 00:49:05,000 --> 00:49:09,000 And therefore alkali cannot cleave apart the DNA or the DNA 684 00:49:09,000 --> 00:49:13,000 phosphodiester bond. 
If we imagine that OH groups, 685 00:49:13,000 --> 00:49:18,000 that hydroxyls, are present at a certain 686 00:49:18,000 --> 00:49:22,000 concentration, albeit a low concentration, in 687 00:49:22,000 --> 00:49:26,000 neutral water, we can see that even at neutral pH, with a certain 688 00:49:26,000 --> 00:49:31,000 frequency, RNA molecules will slowly hydrolyze. 689 00:49:31,000 --> 00:49:35,000 They'll certainly be slowly broken down by the hydroxyl ions. 690 00:49:35,000 --> 00:49:40,000 DNA molecules, however, will not. And that represents yet another 691 00:49:40,000 --> 00:49:45,000 important biochemical reason why DNA is chemically stable and why it can 692 00:49:45,000 --> 00:49:50,000 carry information over years, decades or tens of thousands of 693 00:49:50,000 --> 00:49:55,000 years, because the phosphodiester linkage in DNA, unlike that in RNA, is 694 00:49:55,000 --> 00:50:00,000 very stable chemically and can hold these adjacent nucleotides together, 695 00:50:00,000 --> 00:50:05,000 one to the other. See you on Friday morning.