1 00:00:00,090 --> 00:00:02,520 The following content is provided under a Creative 2 00:00:02,520 --> 00:00:04,059 Commons license. 3 00:00:04,059 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,720 continue to offer high-quality, educational resources for free. 5 00:00:10,720 --> 00:00:13,350 To make a donation or view additional materials 6 00:00:13,350 --> 00:00:17,280 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,280 --> 00:00:18,190 at ocw.mit.edu. 8 00:00:31,614 --> 00:00:35,540 BOGDAN FEDELES: Hi, and welcome to 5.07 Biochemistry Online, 9 00:00:35,540 --> 00:00:41,150 the biochemistry course on MIT OpenCourseWare. 10 00:00:41,150 --> 00:00:43,290 I'm Dr. Bogdan Fedeles. 11 00:00:43,290 --> 00:00:46,187 Let's metabolize some problems. 12 00:00:46,187 --> 00:00:48,270 Now today we're going to do a really nice problem. 13 00:00:48,270 --> 00:00:51,300 This is Problem 1 in Problem Set 2. 14 00:00:51,300 --> 00:00:53,340 Now, this is a problem about elucidating 15 00:00:53,340 --> 00:00:56,280 the primary structure of a protein. 16 00:00:56,280 --> 00:00:58,890 Problems like this one that we're about to discuss 17 00:00:58,890 --> 00:01:02,850 are a lot of fun because they're really biochemistry puzzles. 18 00:01:02,850 --> 00:01:06,240 We're given a number of pieces of data or clues, 19 00:01:06,240 --> 00:01:09,510 if you want, and then we'd have to use, 20 00:01:09,510 --> 00:01:12,450 not only our biochemical sense, but also 21 00:01:12,450 --> 00:01:15,390 deduction and elimination process in order 22 00:01:15,390 --> 00:01:17,520 to come up with a final answer. 23 00:01:17,520 --> 00:01:20,850 Well, in practice, elucidating the primary structure 24 00:01:20,850 --> 00:01:24,900 of a protein is now a largely automated process 25 00:01:24,900 --> 00:01:28,020 and utilizes high resolution mass spectrometry. 
26 00:01:28,020 --> 00:01:30,270 Some of the traditional chemical methods 27 00:01:30,270 --> 00:01:33,930 that we're discussing here are still occasionally useful. 28 00:01:33,930 --> 00:01:37,710 But most importantly, the logical step-wise process 29 00:01:37,710 --> 00:01:41,640 through which we analyze and use each piece of data 30 00:01:41,640 --> 00:01:44,880 to construct a big picture result 31 00:01:44,880 --> 00:01:49,200 is really representative of the process by which we make 32 00:01:49,200 --> 00:01:50,970 discoveries in biochemistry. 33 00:01:50,970 --> 00:01:54,000 Before we begin, let me just say that this problem assumes 34 00:01:54,000 --> 00:01:58,320 familiarity with the structures and abbreviations of the 20 35 00:01:58,320 --> 00:01:59,850 natural amino acids. 36 00:01:59,850 --> 00:02:01,620 Feel free to pause this video and review 37 00:02:01,620 --> 00:02:04,830 the relevant chapters in the book and the lecture notes 38 00:02:04,830 --> 00:02:06,067 before continuing. 39 00:02:08,810 --> 00:02:11,630 One important tool that we have for elucidating 40 00:02:11,630 --> 00:02:15,230 the primary structure of proteins is proteases. 41 00:02:15,230 --> 00:02:19,700 Proteases are enzymes that can hydrolyze the peptide bonds 42 00:02:19,700 --> 00:02:22,800 of a polypeptide chain. 43 00:02:22,800 --> 00:02:25,310 Now, proteases that can cleave in the middle 44 00:02:25,310 --> 00:02:29,720 of a polypeptide chain are also called endopeptidases. 45 00:02:29,720 --> 00:02:32,060 Now you notice a lot of these names 46 00:02:32,060 --> 00:02:35,920 that end in "ase" denote enzymes, and peptidase 47 00:02:35,920 --> 00:02:38,440 means an enzyme that acts on a peptide. 48 00:02:38,440 --> 00:02:41,300 It's the enzyme that hydrolyzes the peptide bond. 49 00:02:41,300 --> 00:02:43,400 Endo, in this case, refers to the fact 50 00:02:43,400 --> 00:02:47,160 that it acts in the middle of a polypeptide chain. 
51 00:02:47,160 --> 00:02:50,800 Now, we're going to be learning in this problem about trypsin 52 00:02:50,800 --> 00:02:52,250 and chymotrypsin. 53 00:02:52,250 --> 00:02:54,800 These are proteases that cleave in the middle 54 00:02:54,800 --> 00:02:56,060 of a polypeptide chain. 55 00:02:56,060 --> 00:02:58,340 They are endopeptidases. 56 00:02:58,340 --> 00:03:00,380 One important feature of proteases 57 00:03:00,380 --> 00:03:02,420 is that they are specific. 58 00:03:02,420 --> 00:03:07,610 They don't just cut any which one peptide bond, but rather 59 00:03:07,610 --> 00:03:11,780 they recognize a specific sequence of amino acids. 60 00:03:11,780 --> 00:03:13,470 In the case of trypsin, for example, 61 00:03:13,470 --> 00:03:16,880 we are told that it cleaves adjacent to positively 62 00:03:16,880 --> 00:03:18,940 charged amino acid. 63 00:03:18,940 --> 00:03:21,770 As you know, positively charged amino acids 64 00:03:21,770 --> 00:03:24,560 would be lysine or arginine. 65 00:03:24,560 --> 00:03:26,710 So trypsin will always be cutting 66 00:03:26,710 --> 00:03:29,510 after arginine or lysine. 67 00:03:29,510 --> 00:03:30,850 Let's take a look. 68 00:03:30,850 --> 00:03:35,360 Now, here is a polypeptide chain with amino acid residues, 69 00:03:35,360 --> 00:03:39,350 R1, R2, R3, R4. 70 00:03:39,350 --> 00:03:42,110 Now, let's look at this peptide bond 71 00:03:42,110 --> 00:03:48,240 right here in the middle of the chain. 72 00:03:48,240 --> 00:03:50,310 Now let's say in order for trypsin 73 00:03:50,310 --> 00:03:53,670 to cut, to hydrolyze this peptide bond, 74 00:03:53,670 --> 00:03:59,700 it means that R2, the amino acid residue adjacent to it, 75 00:03:59,700 --> 00:04:02,040 should be a positively charged one. 76 00:04:02,040 --> 00:04:11,580 So if R2 is lysine or arginine, then this bond 77 00:04:11,580 --> 00:04:14,700 here becomes a good substrate for trypsin. 
78 00:04:14,700 --> 00:04:18,370 And what it's going to do, it's going to use a water molecule-- 79 00:04:18,370 --> 00:04:19,709 it's going to put here trypsin-- 80 00:04:22,430 --> 00:04:24,710 and it's going to hydrolyze forming 81 00:04:24,710 --> 00:04:30,150 a carboxyl end and an amine end of this peptide bond. 82 00:04:30,150 --> 00:04:39,080 So we're getting this is a carboxyl end 83 00:04:39,080 --> 00:04:53,160 and this is the amine end of that original peptide bond. 84 00:04:53,160 --> 00:04:55,640 So what I want you to remember is 85 00:04:55,640 --> 00:04:59,510 that if we're having a reaction with trypsin 86 00:04:59,510 --> 00:05:04,640 on a polypeptide chain, then the carboxy end of the peptide 87 00:05:04,640 --> 00:05:07,790 that results, which is this particular amino acid, 88 00:05:07,790 --> 00:05:14,540 is going to be one of either lysine or arginine. 89 00:05:14,540 --> 00:05:17,100 So in other words in a trypsin digest, 90 00:05:17,100 --> 00:05:19,760 all the smaller peptides that we obtain 91 00:05:19,760 --> 00:05:23,990 are going to end in arginine or lysine except, of course, 92 00:05:23,990 --> 00:05:26,180 the very end of the chain, which might 93 00:05:26,180 --> 00:05:29,450 have a different amino acid at its carboxyl end. 94 00:05:29,450 --> 00:05:33,320 Now, this consideration we've just made about the trypsin 95 00:05:33,320 --> 00:05:37,520 digest, in fact, answers part 1 of the problem, which 96 00:05:37,520 --> 00:05:42,110 is asking what is common about all these peptides generated 97 00:05:42,110 --> 00:05:44,060 by a trypsin digest. 98 00:05:44,060 --> 00:05:46,580 And as we've just explained, all these peptides 99 00:05:46,580 --> 00:05:49,700 should be ending with a positively charged amino acid 100 00:05:49,700 --> 00:05:51,440 such as lysine or arginine. 
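The trypsin rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `trypsin_digest` and the example sequence are made up for this sketch, and the rule is reduced to "cut after every K or R", ignoring the exceptions that real trypsin shows (for example when the next residue is proline).

```python
def trypsin_digest(sequence: str) -> list[str]:
    """Cut a one-letter amino acid sequence after every K or R.

    Simplified rule from the lecture; real trypsin has exceptions
    that are not modeled here.
    """
    fragments = []
    current = ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            # The peptide bond after K/R is hydrolyzed, so this
            # fragment ends with a positively charged residue.
            fragments.append(current)
            current = ""
    if current:
        # The piece from the original C terminus may end in any residue.
        fragments.append(current)
    return fragments


# Hypothetical sequence, chosen only to illustrate the rule.
peptides = trypsin_digest("GAKVLRSTKME")
print(peptides)  # ['GAK', 'VLR', 'STK', 'ME']
```

Every fragment except the last one ends in K or R, which is the common feature that part 1 of the problem asks about.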
101 00:05:51,440 --> 00:05:55,760 This problem also mentions another protease chymotrypsin, 102 00:05:55,760 --> 00:05:58,670 which is a protease that has a different specificity 103 00:05:58,670 --> 00:05:59,820 from trypsin. 104 00:05:59,820 --> 00:06:02,540 It actually cleaves after amino acids 105 00:06:02,540 --> 00:06:08,600 that are either very hydrophobic and large or aromatic. 106 00:06:08,600 --> 00:06:11,790 So let's write that down and remember. 107 00:06:11,790 --> 00:06:20,130 So if we were to do a digest with chymotrypsin, 108 00:06:20,130 --> 00:06:25,560 then our R2, the residue that's recognized by the protease, 109 00:06:25,560 --> 00:06:32,770 is going to have to be either aromatic, so phenylalanine, 110 00:06:32,770 --> 00:06:37,380 tyrosine, or tryptophan, or something that's 111 00:06:37,380 --> 00:06:41,700 large and hydrophobic such as leucine, isoleucine, 112 00:06:41,700 --> 00:06:43,450 or even valine sometimes. 113 00:06:43,450 --> 00:06:45,060 Put this in parentheses. 114 00:06:45,060 --> 00:06:48,210 So if any of these residues are at this position, 115 00:06:48,210 --> 00:06:50,960 R2, then this protease, chymotrypsin, 116 00:06:50,960 --> 00:06:53,310 is going to generate two peptides. 117 00:06:53,310 --> 00:06:59,670 And once again, the resulting peptide, R2 at the carboxy end 118 00:06:59,670 --> 00:07:03,300 is going to be one of these amino acids. 119 00:07:03,300 --> 00:07:07,350 That's going to be the signature of a chymotrypsin digest. 120 00:07:07,350 --> 00:07:08,970 Now, keeping these things in mind, 121 00:07:08,970 --> 00:07:11,070 we're ready to tackle the rest of the problem. 122 00:07:14,060 --> 00:07:20,270 Question 2 asks about the use of DTT, or dithiothreitol. 123 00:07:20,270 --> 00:07:23,990 Now, DTT is a commonly used reducing agent, 124 00:07:23,990 --> 00:07:27,840 which can reduce disulfide bridges in proteins. 125 00:07:27,840 --> 00:07:31,160 There are many reasons why we want to use DTT. 
126 00:07:31,160 --> 00:07:36,010 For example, when proteins form disulfide bridges, 127 00:07:36,010 --> 00:07:38,990 you may shield certain amino acids 128 00:07:38,990 --> 00:07:42,020 from being accessible by proteases, 129 00:07:42,020 --> 00:07:44,090 and therefore, they're not going to be cleaved 130 00:07:44,090 --> 00:07:46,850 and we'll get a mixture of peptides. 131 00:07:46,850 --> 00:07:50,900 So it makes our analysis and our results very difficult. 132 00:07:50,900 --> 00:07:55,910 Now, disulfide bridges can also hold two peptides together 133 00:07:55,910 --> 00:07:59,610 that have no other covalent attachment between them. 134 00:07:59,610 --> 00:08:02,150 So in that case, we get one fragment 135 00:08:02,150 --> 00:08:05,540 instead of two fragments, and once again, it complicates 136 00:08:05,540 --> 00:08:08,120 our analysis of the protein. 137 00:08:08,120 --> 00:08:10,580 But more importantly, because we're typically 138 00:08:10,580 --> 00:08:14,810 purifying proteins in the air, in an oxygen atmosphere, 139 00:08:14,810 --> 00:08:18,245 proteins can acquire disulfide bonds, which weren't there 140 00:08:18,245 --> 00:08:20,100 in the beginning. 141 00:08:20,100 --> 00:08:24,230 So in that case, we can get very unusual results, 142 00:08:24,230 --> 00:08:26,510 unreproducible and artefactual. 143 00:08:26,510 --> 00:08:32,450 That's why using DTT can prevent formation of spurious disulfide 144 00:08:32,450 --> 00:08:33,740 bridges. 145 00:08:33,740 --> 00:08:37,039 And finally, DTT is also useful to tell 146 00:08:37,039 --> 00:08:39,350 if there were any disulfide bridges in the protein 147 00:08:39,350 --> 00:08:40,470 to begin with. 148 00:08:40,470 --> 00:08:42,440 Because if we're looking at the analysis 149 00:08:42,440 --> 00:08:48,220 before and after using DTT, we can tell if the results change, 150 00:08:48,220 --> 00:08:50,900 and that will tell us whether disulfide bridges were there 151 00:08:50,900 --> 00:08:52,180 to begin with. 
152 00:08:52,180 --> 00:08:54,680 Now, let's take a look at the DTT chemistry. 153 00:08:58,050 --> 00:09:13,090 Here we have a disulfide bond or bridge in a protein, 154 00:09:13,090 --> 00:09:16,240 and we're going to treat this with DTT, 155 00:09:16,240 --> 00:09:19,140 which looks like this. 156 00:09:26,490 --> 00:09:31,440 This is DTT or dithiothreitol. 157 00:09:31,440 --> 00:09:34,800 Now, if we substitute the SH groups with OH's, you 158 00:09:34,800 --> 00:09:38,340 notice we're going to have four OH's and four carbons. 159 00:09:38,340 --> 00:09:42,480 That's just an alcohol derived from a sugar, which 160 00:09:42,480 --> 00:09:47,050 is called threose. That's where the threitol part of the name comes from. 161 00:09:47,050 --> 00:09:47,550 All right. 162 00:09:47,550 --> 00:09:50,320 So when we do this chemical reaction, 163 00:09:50,320 --> 00:09:55,110 this disulfide bridge is going to transfer between the two 164 00:09:55,110 --> 00:09:57,440 sulfur atoms of DTT. 165 00:09:57,440 --> 00:10:03,780 They're going to form an intramolecular disulfide bond. 166 00:10:03,780 --> 00:10:12,000 So our cysteines are going to get reduced, 167 00:10:12,000 --> 00:10:20,590 and from DTT, we're going to get this intramolecular disulfide. 168 00:10:26,190 --> 00:10:32,910 And because entropic considerations 169 00:10:32,910 --> 00:10:34,440 make the formation of the six-membered 170 00:10:34,440 --> 00:10:38,610 ring very, very easy, this equilibrium will typically 171 00:10:38,610 --> 00:10:41,520 shift to the right. 172 00:10:41,520 --> 00:10:44,400 Of course, we're also using vast quantities of DTT 173 00:10:44,400 --> 00:10:48,360 to make sure that the entire protein becomes reduced, 174 00:10:48,360 --> 00:10:52,290 and then our agent is going to pick up this [INAUDIBLE] 175 00:10:52,290 --> 00:10:53,400 off the bond. 176 00:10:53,400 --> 00:10:55,560 This sums up question 2 of the problem.
177 00:10:58,180 --> 00:11:01,600 Next, we're going to look at distinguishing 178 00:11:01,600 --> 00:11:03,940 between a couple of different peptides 179 00:11:03,940 --> 00:11:07,660 that we generate during our analysis of our mystery 180 00:11:07,660 --> 00:11:08,770 protein. 181 00:11:08,770 --> 00:11:13,810 So we're told that we're isolating by HPLC peptides 182 00:11:13,810 --> 00:11:15,970 of the following composition. 183 00:11:15,970 --> 00:11:20,590 One has tryptophan, phenylalanine, 184 00:11:20,590 --> 00:11:26,730 valine, aspartate, lysine, cysteine. 185 00:11:26,730 --> 00:11:28,780 That's peptide one. 186 00:11:28,780 --> 00:11:32,820 Another one has phenylalanine, serine, cysteine, 187 00:11:32,820 --> 00:11:35,290 and an unknown amino acid. 188 00:11:35,290 --> 00:11:40,251 Finally, the third one has alanine and lysine. 189 00:11:40,251 --> 00:11:40,750 All right. 190 00:11:40,750 --> 00:11:44,320 Now, obviously, these peptides have different compositions, 191 00:11:44,320 --> 00:11:49,751 so if you could just put them through mass spectrometry, 192 00:11:49,751 --> 00:11:51,250 we're going to get different masses, 193 00:11:51,250 --> 00:11:54,550 so we can tell very quickly which one is which. 194 00:11:54,550 --> 00:11:57,380 But if we don't have mass spec available, 195 00:11:57,380 --> 00:12:01,300 we can also tell them apart only using UV-Vis spectroscopy. 196 00:12:01,300 --> 00:12:05,530 So all you need to remember is that an amino acid 197 00:12:05,530 --> 00:12:07,040 like tryptophan that we have here, 198 00:12:07,040 --> 00:12:14,650 W, has a very strong absorption in the 280 to 300 nanometers. 199 00:12:14,650 --> 00:12:18,220 Whereas, amino acids like phenylalanine, 200 00:12:18,220 --> 00:12:22,960 present here or here, they absorb only around 260 201 00:12:22,960 --> 00:12:24,310 nanometers. 202 00:12:24,310 --> 00:12:27,340 Most of the other amino acids don't absorb in this range 203 00:12:27,340 --> 00:12:28,670 at all. 
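The UV rule just stated can be written down as a small lookup. The sketch below is qualitative only and rests on assumptions: the helper name `uv_signature` is invented for illustration, the unknown residue is written as `X` and assumed not to absorb, and the only facts used are the ones from the lecture (tryptophan absorbs strongly in the 280 to 300 nanometer range, phenylalanine only around 260, most other amino acids not at all in this range).

```python
# One-letter compositions of the three HPLC peptides from the problem.
# "X" stands for the unknown amino acid (assumed here not to absorb).
PEPTIDES = {
    "peptide 1": "WFVDKC",
    "peptide 2": "FSCX",
    "peptide 3": "AK",
}


def uv_signature(composition: str) -> dict[str, bool]:
    """Return whether a peptide is expected to absorb near 260 and near 300 nm.

    Qualitative rule only: W contributes across roughly 260-300 nm,
    F contributes only around 260 nm.
    """
    has_trp = "W" in composition
    has_phe = "F" in composition
    return {
        "absorbs_near_260": has_trp or has_phe,
        "absorbs_near_300": has_trp,
    }


for name, composition in PEPTIDES.items():
    print(name, uv_signature(composition))
```

The three peptides give three different patterns (both bands, only the 260 band, neither band), which is why two wavelengths are enough to tell them apart.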
204 00:12:28,670 --> 00:12:35,458 So if we were to plot the UV-Vis spectra of these peptides, 205 00:12:35,458 --> 00:12:37,720 this is going to be our absorption, 206 00:12:37,720 --> 00:12:41,840 and this is going to be lambda in nanometers, the wavelength. 207 00:12:41,840 --> 00:12:47,140 So let's look in the range from 200 to 400. 208 00:12:47,140 --> 00:12:48,700 300 is about here. 209 00:12:48,700 --> 00:12:49,930 250 is about here. 210 00:12:52,690 --> 00:12:55,150 And let's label these. 211 00:12:55,150 --> 00:13:03,680 Let's say this peptide is red, this peptide is blue, 212 00:13:03,680 --> 00:13:06,860 and this one is green. 213 00:13:06,860 --> 00:13:08,460 All right. 214 00:13:08,460 --> 00:13:11,190 So now the red peptide, as I told you, 215 00:13:11,190 --> 00:13:14,310 contains both tryptophan and phenylalanine, 216 00:13:14,310 --> 00:13:17,370 so it's going to absorb both around 260 217 00:13:17,370 --> 00:13:19,770 and around 280 to 300. 218 00:13:19,770 --> 00:13:24,420 So it's going to have a pretty big hump in this area 219 00:13:24,420 --> 00:13:26,860 from 250 to 300. 220 00:13:26,860 --> 00:13:29,130 Now, the blue peptide only has phenylalanine, 221 00:13:29,130 --> 00:13:33,020 so above 280 nanometers or so, it's going to drop off. 222 00:13:33,020 --> 00:13:37,200 So it's going to look more like this, 223 00:13:37,200 --> 00:13:42,210 whereas the green one, well, it has neither phenylalanine 224 00:13:42,210 --> 00:13:45,540 nor tryptophan, so above 240 or so, 225 00:13:45,540 --> 00:13:47,830 it's going to have no UV absorption at all. 226 00:13:47,830 --> 00:13:51,940 So it's going to drop off right here. 227 00:13:51,940 --> 00:13:54,540 So basically, if we're just looking, 228 00:13:54,540 --> 00:13:58,980 say, around 260 nanometers, we should see a stronger peak 229 00:13:58,980 --> 00:14:01,920 from the red peptide and a weaker peak from the blue one.
230 00:14:01,920 --> 00:14:04,180 But if we're going to look around 300, 231 00:14:04,180 --> 00:14:07,320 we should only see the red peptide. 232 00:14:07,320 --> 00:14:11,160 So by just using the UV-Vis absorption, 233 00:14:11,160 --> 00:14:13,240 we can tell these three peptides apart. 234 00:14:16,360 --> 00:14:19,660 Now we're ready to figure out the structure of our mystery 235 00:14:19,660 --> 00:14:21,430 protein. 236 00:14:21,430 --> 00:14:23,980 This is exactly what the last part of the problem is asking. 237 00:14:23,980 --> 00:14:27,400 So we're going to take each one of the clues provided-- 238 00:14:27,400 --> 00:14:28,750 A, B, C, and D-- 239 00:14:28,750 --> 00:14:30,820 and analyze and see what information we can 240 00:14:30,820 --> 00:14:32,800 derive from each one of them. 241 00:14:32,800 --> 00:14:35,050 The first piece of information that we're given 242 00:14:35,050 --> 00:14:37,870 is the result of a total hydrolysis 243 00:14:37,870 --> 00:14:41,695 of our peptide with six molar HCl. 244 00:14:41,695 --> 00:14:44,260 So we're told we get the following amino acid 245 00:14:44,260 --> 00:14:45,338 composition. 246 00:14:48,930 --> 00:14:57,180 We get two phenylalanines, one methionine, alanine, valine, 247 00:14:57,180 --> 00:15:05,630 two lysines, one serine, two cysteines, and one aspartate. 248 00:15:05,630 --> 00:15:08,990 Now, recall this is not a comprehensive list 249 00:15:08,990 --> 00:15:12,020 because the hydrolysis in six molar HCl 250 00:15:12,020 --> 00:15:14,690 may destroy some of the amino acids. 251 00:15:14,690 --> 00:15:18,080 And specifically, we know for sure something 252 00:15:18,080 --> 00:15:22,997 like tryptophan, threonine, and tyrosine. 253 00:15:22,997 --> 00:15:24,830 They're going to be destroyed, and we're not 254 00:15:24,830 --> 00:15:25,920 going to see them here.
255 00:15:25,920 --> 00:15:29,060 So these amino acids may still be present in our protein, 256 00:15:29,060 --> 00:15:34,880 but we're not going to be able to see them in this situation. 257 00:15:34,880 --> 00:15:38,440 Now, let's analyze these amino acids 258 00:15:38,440 --> 00:15:40,500 and see what we can derive from this. 259 00:15:40,500 --> 00:15:43,700 So first of all, we have cysteines. 260 00:15:43,700 --> 00:15:45,770 So two cysteines, it means our protein 261 00:15:45,770 --> 00:15:48,770 can form disulfide bridges. 262 00:15:48,770 --> 00:15:51,440 So from the get go, two cysteine, 263 00:15:51,440 --> 00:15:54,170 it means we can form an internal disulfide bridge 264 00:15:54,170 --> 00:15:59,570 or our protein contains a disulfide bridge between two 265 00:15:59,570 --> 00:16:02,480 otherwise unconnected peptides. 266 00:16:02,480 --> 00:16:05,390 So here are some possibilities. 267 00:16:05,390 --> 00:16:09,020 For example, our peptide is one chain, 268 00:16:09,020 --> 00:16:14,590 and our cysteines are not actually connected 269 00:16:14,590 --> 00:16:16,600 by a disulfide bridge. 270 00:16:16,600 --> 00:16:19,660 Another possibility is we do have a disulfide bridge 271 00:16:19,660 --> 00:16:24,770 between them, like that. 272 00:16:24,770 --> 00:16:29,960 Or yet another possibility is that we have two pieces, two 273 00:16:29,960 --> 00:16:33,740 polypeptide chains, and the disulfide bridge 274 00:16:33,740 --> 00:16:37,940 is the only thing that's holding them together. 275 00:16:37,940 --> 00:16:40,760 Now, in each one of these cases, obviously, 276 00:16:40,760 --> 00:16:44,300 these polypeptide chains are oriented. 277 00:16:44,300 --> 00:16:48,582 So in one end, they're going to have the amine group. 278 00:16:48,582 --> 00:16:50,770 Let me draw it here, NH3. 279 00:16:55,430 --> 00:17:01,680 Here we're going to have two, and the other end 280 00:17:01,680 --> 00:17:04,735 is going to be the carboxyl group, COO minus. 
281 00:17:07,630 --> 00:17:09,240 And once again, in this case, we're 282 00:17:09,240 --> 00:17:11,558 going to have two of them. 283 00:17:11,558 --> 00:17:12,359 All right. 284 00:17:12,359 --> 00:17:17,619 Now, how can we narrow down from these couple of possibilities? 285 00:17:17,619 --> 00:17:22,349 Well, the fact that we're getting two lysines here 286 00:17:22,349 --> 00:17:25,210 is a very important clue. 287 00:17:25,210 --> 00:17:29,730 Now remember, we're told that our peptide, this mystery 288 00:17:29,730 --> 00:17:35,190 protein, it actually comes as a result of a trypsin digest. 289 00:17:35,190 --> 00:17:39,300 And if you recall our discussion earlier, 290 00:17:39,300 --> 00:17:41,550 we said that upon trypsin digest, 291 00:17:41,550 --> 00:17:44,130 all the smaller peptides that were obtained 292 00:17:44,130 --> 00:17:48,270 will end in lysine or an arginine. 293 00:17:48,270 --> 00:17:50,520 The fact that we have two lysines here, 294 00:17:50,520 --> 00:17:53,570 it means both of them they need to be 295 00:17:53,570 --> 00:17:58,500 at the carboxy ends of peptides. 296 00:17:58,500 --> 00:18:01,080 Because if they were in the middle of a chain, 297 00:18:01,080 --> 00:18:06,090 the trypsin would have cut that chain in half. 298 00:18:06,090 --> 00:18:16,400 So two lysines means we've got to have two carboxy ends. 299 00:18:16,400 --> 00:18:22,400 So therefore, this seems to be the only possibility in which 300 00:18:22,400 --> 00:18:24,170 we can accommodate two lysines. 301 00:18:24,170 --> 00:18:28,980 Basically, the two carboxy ends that we see here, 302 00:18:28,980 --> 00:18:31,580 each one has to be a lysine. 303 00:18:31,580 --> 00:18:34,250 This possibility only has one carboxy end, 304 00:18:34,250 --> 00:18:36,710 so the other lysine that we have to place 305 00:18:36,710 --> 00:18:39,860 will not be at the end, so would not 306 00:18:39,860 --> 00:18:42,860 be compatible with a trypsin digest. 
307 00:18:42,860 --> 00:18:44,660 Same applies for this case. 308 00:18:44,660 --> 00:18:52,790 So these possibilities are not consistent with our data. 309 00:18:52,790 --> 00:18:54,920 So we know already that our protein must 310 00:18:54,920 --> 00:18:57,710 look like this, two polypeptide chains, 311 00:18:57,710 --> 00:19:00,680 each one ends with lysine, and there 312 00:19:00,680 --> 00:19:03,660 must be a disulfide bridge. 313 00:19:03,660 --> 00:19:07,430 The second clue we're given is that the Edman degradation 314 00:19:07,430 --> 00:19:10,430 of the protein yields valine. 315 00:19:10,430 --> 00:19:12,770 Now, as you know, Edman degradation 316 00:19:12,770 --> 00:19:16,730 is a chemical reaction by which we can digest the protein 317 00:19:16,730 --> 00:19:19,760 from the N terminus, from the amino terminus 318 00:19:19,760 --> 00:19:21,920 of a polypeptide chain. 319 00:19:21,920 --> 00:19:26,880 Now as you recall, we have established in the first part 320 00:19:26,880 --> 00:19:32,450 that we have two amino termini in our protein. 321 00:19:32,450 --> 00:19:35,990 So the fact that we only get one amino acid and that is valine, 322 00:19:35,990 --> 00:19:38,390 it says that the other amino terminus 323 00:19:38,390 --> 00:19:43,760 might be blocked or somehow unavailable 324 00:19:43,760 --> 00:19:45,540 for the Edman degradation. 325 00:19:45,540 --> 00:19:48,800 So let's update the structure of our protein 326 00:19:48,800 --> 00:19:50,550 to take into account the second clue. 327 00:19:55,460 --> 00:19:59,300 So we said we have two polypeptide chains. 328 00:19:59,300 --> 00:20:02,670 There's a disulfide bond in between them. 329 00:20:02,670 --> 00:20:06,580 Now, each one of them ends with a lysine. 330 00:20:06,580 --> 00:20:08,450 This is a carboxy end, and now we 331 00:20:08,450 --> 00:20:12,020 know from the Edman degradation that amino end of one of them 332 00:20:12,020 --> 00:20:13,730 has to be valine. 
333 00:20:13,730 --> 00:20:17,090 The other one-- we're going to put a box like that-- 334 00:20:17,090 --> 00:20:18,200 is a blocked end. 335 00:20:18,200 --> 00:20:22,110 So the amino terminus is not available. 336 00:20:22,110 --> 00:20:25,420 So this is as much as we can tell from these first two 337 00:20:25,420 --> 00:20:27,090 clues. 338 00:20:27,090 --> 00:20:30,510 We're given the products of the chymotrypsin 339 00:20:30,510 --> 00:20:33,810 digest of our protein after it was previously 340 00:20:33,810 --> 00:20:35,910 treated with DTT. 341 00:20:35,910 --> 00:20:39,690 So we know first DTT is going to cleave the disulfide bond, 342 00:20:39,690 --> 00:20:42,230 and then chymotrypsin, as we talked previously, 343 00:20:42,230 --> 00:20:47,040 is going to cut after large, hydrophobic, or aromatic 344 00:20:47,040 --> 00:20:48,610 amino acids. 345 00:20:48,610 --> 00:20:52,751 Now, we're told we're getting five smaller peptides. 346 00:20:52,751 --> 00:20:53,500 Let's take a look. 347 00:20:56,430 --> 00:20:58,500 Here are the five fragments. 348 00:20:58,500 --> 00:21:09,900 One is tryptophan, valine; one is cysteine, phenylalanine, 349 00:21:09,900 --> 00:21:15,690 another one is aspartate lysine, another one 350 00:21:15,690 --> 00:21:20,700 is methionine, alanine, cysteine, and lysine; 351 00:21:20,700 --> 00:21:25,830 and finally, the last one has serine, phenylalanine, 352 00:21:25,830 --> 00:21:28,920 and something that's not an amino acid. 353 00:21:28,920 --> 00:21:33,428 It's going to make it like a small x. 354 00:21:33,428 --> 00:21:35,390 All right. 355 00:21:35,390 --> 00:21:39,280 Now, from what we know about chymotrypsin digest, 356 00:21:39,280 --> 00:21:43,610 we should be able to orient these peptides basically 357 00:21:43,610 --> 00:21:46,580 to tell which amino acid is at the N terminus 358 00:21:46,580 --> 00:21:52,070 and which amino acid is at the C terminus of each one of them. 359 00:21:52,070 --> 00:21:53,680 Now the first one. 
360 00:21:53,680 --> 00:21:56,830 Well, we know from the second clue 361 00:21:56,830 --> 00:21:58,770 that valine is at the N terminus, 362 00:21:58,770 --> 00:22:00,630 so that makes it pretty easy. 363 00:22:00,630 --> 00:22:05,770 Then the sequence has to be V-W. So valine is the N terminus, 364 00:22:05,770 --> 00:22:09,070 W is the C terminus, and as we said, 365 00:22:09,070 --> 00:22:11,470 tryptophan is one of the amino acids recognized 366 00:22:11,470 --> 00:22:13,870 by chymotrypsin, so it's going to end up 367 00:22:13,870 --> 00:22:17,610 at the carboxy terminus. 368 00:22:17,610 --> 00:22:22,540 C-F, that has to go C and F, phenylalanine, 369 00:22:22,540 --> 00:22:26,950 another chymotrypsin amino acid that's 370 00:22:26,950 --> 00:22:29,460 left at the carboxy terminus. 371 00:22:29,460 --> 00:22:33,070 D-K. Now, neither of these amino acids 372 00:22:33,070 --> 00:22:38,680 is recognized by chymotrypsin, but we remember from clue one 373 00:22:38,680 --> 00:22:41,390 that K must be at the carboxy terminus. 374 00:22:41,390 --> 00:22:45,970 So the sequence can only be D-K. 375 00:22:45,970 --> 00:22:49,420 Now, here M, A, C, and K, none of these 376 00:22:49,420 --> 00:22:52,930 is actually on our recognition list for chymotrypsin, 377 00:22:52,930 --> 00:22:56,410 but once again, we know that K must be at the carboxy end. 378 00:22:56,410 --> 00:22:58,840 So for now we're going to have M, A, 379 00:22:58,840 --> 00:23:04,240 and C in some order, which we cannot establish just 380 00:23:04,240 --> 00:23:07,940 yet, and K at the carboxy end. 381 00:23:07,940 --> 00:23:11,390 And finally, we have S, F, and x. 382 00:23:11,390 --> 00:23:13,570 Well, x is not even an amino acid, 383 00:23:13,570 --> 00:23:18,760 and F is an amino acid that's left at the carboxy terminus 384 00:23:18,760 --> 00:23:26,110 by chymotrypsin, so it's probably x, S, and F. Now, 385 00:23:26,110 --> 00:23:27,580 what about x?
386 00:23:27,580 --> 00:23:41,780 We're told that x is not an amino acid and is hydrophobic 387 00:23:41,780 --> 00:23:46,865 and it has a molecular weight of 256 Daltons. 388 00:23:50,381 --> 00:23:50,880 All right. 389 00:23:50,880 --> 00:23:52,860 So in order to figure out what x is, 390 00:23:52,860 --> 00:23:56,520 we have to go back to the beginning of the problem, which 391 00:23:56,520 --> 00:23:59,460 tells us that we're looking at a protein that's associated 392 00:23:59,460 --> 00:24:01,140 with a plasma membrane. 393 00:24:01,140 --> 00:24:03,420 And one way for proteins to associate with a plasma 394 00:24:03,420 --> 00:24:08,370 membrane is to be modified, to incorporate a fatty acid, 395 00:24:08,370 --> 00:24:10,230 such as palmitate. 396 00:24:10,230 --> 00:24:17,040 So perhaps x, which is blocking the N terminus of this peptide, 397 00:24:17,040 --> 00:24:18,410 is a fatty acid. 398 00:24:21,930 --> 00:24:24,180 Now, the general formula for fatty acids 399 00:24:24,180 --> 00:24:29,160 is something like this, CH3, then CH2 repeated, say, 400 00:24:29,160 --> 00:24:33,660 n times, and then COOH. 401 00:24:33,660 --> 00:24:37,680 So let's see which fatty acid 402 00:24:37,680 --> 00:24:40,890 has a molecular weight of 256 Daltons. 403 00:24:40,890 --> 00:24:45,780 Well, the mass of a methyl group is 15. 404 00:24:45,780 --> 00:24:49,800 The mass of this carboxyl group is 45, and each CH2 is 14, so we have 60 405 00:24:49,800 --> 00:24:58,290 plus 14n equals 256 or 14n equals 196. 406 00:24:58,290 --> 00:25:02,130 Therefore, n is 14. 407 00:25:02,130 --> 00:25:06,000 So the fatty acid that would fit these criteria-- 408 00:25:06,000 --> 00:25:08,980 it's hydrophobic, it has a mass of 256-- 409 00:25:08,980 --> 00:25:16,770 will be CH3, CH2 14 times, COOH, or the fatty acid 410 00:25:16,770 --> 00:25:18,280 with 16 carbons. 411 00:25:18,280 --> 00:25:19,830 This is palmitic acid.
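The arithmetic above can be checked with a short calculation. This is a sketch using the same rounded integer atomic masses as the lecture (C = 12, H = 1, O = 16) and the general formula CH3, CH2 repeated n times, COOH; the function name `methylene_count` is made up for this example.

```python
# Rounded integer atomic masses, as used in the lecture's estimate.
CARBON, HYDROGEN, OXYGEN = 12, 1, 16

METHYL = CARBON + 3 * HYDROGEN                 # CH3  = 15
METHYLENE = CARBON + 2 * HYDROGEN              # CH2  = 14
CARBOXYL = CARBON + 2 * OXYGEN + HYDROGEN      # COOH = 45


def methylene_count(total_mass: int) -> int:
    """Solve 15 + 14*n + 45 = total_mass for n, the number of CH2 groups."""
    remainder = total_mass - METHYL - CARBOXYL
    n, leftover = divmod(remainder, METHYLENE)
    if leftover != 0 or n < 0:
        raise ValueError("mass does not fit CH3-(CH2)n-COOH with integer n")
    return n


n = methylene_count(256)
total_carbons = n + 2  # the CH2 groups plus the methyl and carboxyl carbons
print(n, total_carbons)  # 14 16
```

With n = 14 the chain has 16 carbons in total, which matches palmitic acid.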
412 00:25:24,060 --> 00:25:26,580 Now, the answer that we got, palmitic acid, 413 00:25:26,580 --> 00:25:29,730 was anticipated in the text of the problem 414 00:25:29,730 --> 00:25:33,380 because it gave us an example: one way to associate proteins 415 00:25:33,380 --> 00:25:36,630 with the plasma membrane is to form 416 00:25:36,630 --> 00:25:40,960 a covalent linkage with a fatty acid such as palmitate. 417 00:25:40,960 --> 00:25:44,400 But how does a protein associate with a plasma 418 00:25:44,400 --> 00:25:50,910 membrane when it has a palmitic acid residue as part of it? 419 00:25:50,910 --> 00:25:53,580 Well, let's take a look at a diagram. 420 00:25:53,580 --> 00:25:56,670 Here we have a representation of the plasma membrane, 421 00:25:56,670 --> 00:25:59,730 where we have the phospholipids with the hydrophilic heads 422 00:25:59,730 --> 00:26:04,410 pointing inside and outside the cell and the hydrophobic tails 423 00:26:04,410 --> 00:26:09,150 of the fatty acids lined up next to each other. 424 00:26:09,150 --> 00:26:11,880 Now, imagine we have a protein that's 425 00:26:11,880 --> 00:26:15,450 modified to contain one of these fatty acids. 426 00:26:15,450 --> 00:26:20,100 Then this fatty acid can just insert right next 427 00:26:20,100 --> 00:26:23,680 to the phospholipids of the plasma membrane. 428 00:26:23,680 --> 00:26:26,460 And that way it tethers the protein right 429 00:26:26,460 --> 00:26:28,500 at the plasma membrane. 430 00:26:28,500 --> 00:26:31,260 The final clue that we're given in order to figure out 431 00:26:31,260 --> 00:26:33,540 the structure of our mystery protein 432 00:26:33,540 --> 00:26:37,770 is the digest with an inorganic agent this time. 433 00:26:37,770 --> 00:26:39,710 It's called cyanogen bromide.
434 00:26:39,710 --> 00:26:44,340 Cyanogen bromide reacts quite specifically with methionines, 435 00:26:44,340 --> 00:26:47,760 and it cleaves the peptide bond after methionine, 436 00:26:47,760 --> 00:26:51,750 leaving behind an unnatural amino acid called 437 00:26:51,750 --> 00:26:54,570 a homoserine lactone. 438 00:26:54,570 --> 00:26:56,730 Now, let's take a look at what happens 439 00:26:56,730 --> 00:26:59,340 when we treat our protein with cyanogen bromide. 440 00:26:59,340 --> 00:27:09,266 So we're told we're getting the following peptides: A, K; 441 00:27:09,266 --> 00:27:32,470 and W, F, V, D, K, C and F, S, C, and an unnatural amino acid. 442 00:27:32,470 --> 00:27:37,120 As I just explained to you, the unnatural amino acid probably 443 00:27:37,120 --> 00:27:40,250 is this homoserine lactone. 444 00:27:45,210 --> 00:27:51,680 So most likely, in place of this amino acid 445 00:27:51,680 --> 00:27:56,170 in the actual sequence, we had a methionine. 446 00:27:56,170 --> 00:27:59,010 So we know these four residues-- 447 00:27:59,010 --> 00:28:01,720 F, S, C, and M-- 448 00:28:01,720 --> 00:28:04,260 go together. 449 00:28:04,260 --> 00:28:05,310 All right. 450 00:28:05,310 --> 00:28:08,070 Now, we can start putting together the clues 451 00:28:08,070 --> 00:28:13,020 from parts 3 and 4 and try to figure out the final structure 452 00:28:13,020 --> 00:28:13,700 of our protein. 453 00:28:17,780 --> 00:28:26,210 So the peptide A, K, we know has to have K at the C terminus. 454 00:28:26,210 --> 00:28:28,890 Now, from part 3, we found out that there 455 00:28:28,890 --> 00:28:33,430 was a peptide that contained M, A, C, and K, 456 00:28:33,430 --> 00:28:37,380 after the chymotrypsin digest.
457 00:28:37,380 --> 00:28:41,553 So now we know K has to be at the carboxy end 458 00:28:41,553 --> 00:28:45,870 and A has to be right before K. So in order 459 00:28:45,870 --> 00:28:49,080 to get an A-K peptide, A has 460 00:28:49,080 --> 00:28:51,810 to be right after methionine, because that's 461 00:28:51,810 --> 00:28:55,035 where cyanogen bromide is going to cleave the peptide bond. 462 00:28:57,750 --> 00:29:02,010 So the only sequence that we can have here is 463 00:29:02,010 --> 00:29:05,220 going to be C followed by M followed by A 464 00:29:05,220 --> 00:29:08,540 followed by K. So when we treat with cyanogen bromide, 465 00:29:08,540 --> 00:29:11,420 we're going to be cleaving this bond between M and A, 466 00:29:11,420 --> 00:29:14,460 generating M as a homoserine lactone. 467 00:29:14,460 --> 00:29:21,730 Now, we also know that M has to be in this small peptide, 468 00:29:21,730 --> 00:29:25,390 and we know it has to be C, M as a sequence. 469 00:29:25,390 --> 00:29:31,540 So therefore, S and F have to be on the amino terminus 470 00:29:31,540 --> 00:29:33,370 of this peptide. 471 00:29:33,370 --> 00:29:36,100 And we also have a clue from part 3, which 472 00:29:36,100 --> 00:29:43,950 said that S, F, and this non-amino-acid moiety 473 00:29:43,950 --> 00:29:49,180 were in the same peptide. 474 00:29:49,180 --> 00:29:51,970 So from here we said that, well, the sequence there is 475 00:29:51,970 --> 00:29:57,280 probably x modifying the amino group of S, and then F at the carboxy end. 476 00:29:57,280 --> 00:30:04,760 So putting these three things together, 477 00:30:04,760 --> 00:30:07,400 we can come up with a sequence for this strand, which 478 00:30:07,400 --> 00:30:16,300 is going to be x, S, F, then C, M and then A 479 00:30:16,300 --> 00:30:23,150 and K. So this is probably one of the peptide chains 480 00:30:23,150 --> 00:30:25,670 in our mystery protein.
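The cyanogen bromide step can be sketched in Python as a check on the chain just proposed. This is a simplified model, not a full treatment of the chemistry: it cuts after every methionine and relabels that methionine as a homoserine lactone. The function name `cnbr_cleave`, the label `"Hsl"` for homoserine lactone, and the letter `"X"` for the fatty acid group are all assumptions made for this illustration.

```python
def cnbr_cleave(chain: list[str]) -> list[list[str]]:
    """Cut after each methionine ("M"), which becomes homoserine lactone ("Hsl")."""
    fragments: list[list[str]] = []
    current: list[str] = []
    for residue in chain:
        if residue == "M":
            current.append("Hsl")      # methionine is converted at the new C terminus
            fragments.append(current)  # the peptide bond after M is cleaved
            current = []
        else:
            current.append(residue)
    if current:                        # whatever follows the last methionine
        fragments.append(current)
    return fragments

# Proposed chain, with "X" standing in for the N-terminal fatty acid.
chain = ["X", "S", "F", "C", "M", "A", "K"]
print(cnbr_cleave(chain))
# [['X', 'S', 'F', 'C', 'Hsl'], ['A', 'K']]
```

The second fragment is the A-K peptide from the digest, and the first carries F, S, C, and the unnatural amino acid, which is consistent with the clue.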
481 00:30:25,670 --> 00:30:27,350 Then, of course, the other is going 482 00:30:27,350 --> 00:30:30,600 to be composed of these amino acids. 483 00:30:30,600 --> 00:30:33,980 Now, we already know something about the sequence of these. 484 00:30:33,980 --> 00:30:39,130 For example, we know V goes before W. We also 485 00:30:39,130 --> 00:30:50,600 know D goes before K, and we also know C goes before F. Now, we 486 00:30:50,600 --> 00:30:57,880 know this has to be the carboxy end of the peptide chain. 487 00:30:57,880 --> 00:31:02,220 Let's write it here, COO minus. 488 00:31:02,220 --> 00:31:06,720 And V has to be the amino end. 489 00:31:10,060 --> 00:31:14,230 So then C-F, there's no other way, has to be in the middle. 490 00:31:14,230 --> 00:31:19,210 So the only possible sequence for our second chain 491 00:31:19,210 --> 00:31:27,950 is going to be V, W, C, F, D, and K. 492 00:31:27,950 --> 00:31:31,250 Now that we've established the exact sequence 493 00:31:31,250 --> 00:31:33,200 of each one of these peptide chains, 494 00:31:33,200 --> 00:31:35,450 then we can put together the final structure 495 00:31:35,450 --> 00:31:36,890 of our mystery protein. 496 00:31:36,890 --> 00:31:42,344 So I'm going to transcribe these here, first the chain V, W, C, 497 00:31:42,344 --> 00:31:48,980 F, D, and K. And we know the cysteine is going 498 00:31:48,980 --> 00:31:54,940 to have our disulfide bridge to the other cysteine, which 499 00:31:54,940 --> 00:32:03,770 goes C, M, A, K toward the carboxy end, and then F, and then S toward the amino end. 500 00:32:03,770 --> 00:32:10,370 And now we know that at the N terminus of this peptide, 501 00:32:10,370 --> 00:32:19,990 we're going to have our fatty acid residue, CH2 14 times, then CH3. 502 00:32:22,700 --> 00:32:29,480 So let's just mark, once again, the carboxy ends here 503 00:32:29,480 --> 00:32:35,260 and the amino end here. 504 00:32:35,260 --> 00:32:41,980 So this is the final answer for our problem 505 00:32:41,980 --> 00:32:45,700 and the structure of our mystery peptide.
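As a consistency check on the two chains, here is a minimal Python sketch of the chymotrypsin digest referred to in part 3. It assumes a simplified specificity, cutting after every F, W, or Y and ignoring the enzyme's weaker sites and sequence exceptions. The function name `chymotrypsin_cleave` and the letter `X` for the fatty acid group are hypothetical, introduced only for this illustration.

```python
AROMATIC = {"F", "W", "Y"}  # simplified chymotrypsin specificity (assumption)

def chymotrypsin_cleave(chain: str) -> list[str]:
    """Cut on the carboxy side of each aromatic residue."""
    fragments: list[str] = []
    current = ""
    for residue in chain:
        current += residue
        if residue in AROMATIC:
            fragments.append(current)
            current = ""
    if current:  # residues after the last cut site
        fragments.append(current)
    return fragments

chain_1 = "XSFCMAK"  # X stands in for the N-terminal palmitate
chain_2 = "VWCFDK"

print(chymotrypsin_cleave(chain_1))  # ['XSF', 'CMAK']
print(chymotrypsin_cleave(chain_2))  # ['VW', 'CF', 'DK']
```

The fragments reproduce the ordering clues used above: x, S, and F together, a peptide containing C, M, A, and K, and the pairs V-W, C-F, and D-K.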
506 00:32:45,700 --> 00:32:47,590 Well, that's it for this problem. 507 00:32:47,590 --> 00:32:51,940 I hope you enjoyed this little protein mystery hunt. 508 00:32:51,940 --> 00:32:55,360 Now, remember that the strategy that we used here, 509 00:32:55,360 --> 00:33:00,850 in which we logically string together pieces of data 510 00:33:00,850 --> 00:33:04,780 to build a big picture, is really the same strategy 511 00:33:04,780 --> 00:33:08,530 that has been used and is being used right now to advance 512 00:33:08,530 --> 00:33:11,880 our knowledge about living systems and their underlying 513 00:33:11,880 --> 00:33:14,030 biochemistry.