1 00:00:15,000 --> 00:00:19,000 Professor Jacks is out of town so I am going to tell you about 2 00:00:19,000 --> 00:00:24,000 Recombinant DNA 3, then he's going to come back and 3 00:00:24,000 --> 00:00:29,000 tell you about Cell Biology, and then you will have finished the 4 00:00:29,000 --> 00:00:34,000 foundations part of the course. And we'll move onto things that 5 00:00:34,000 --> 00:00:38,000 build on the foundation, the Formation Module and the part of 6 00:00:38,000 --> 00:00:43,000 the Systems Module, which I'll be teaching you for the 7 00:00:43,000 --> 00:00:48,000 next few weeks, but today is Recombinant DNA 3. 8 00:00:48,000 --> 00:00:52,000 And, as you've been hearing for the last couple of lectures, 9 00:00:52,000 --> 00:00:57,000 this is one of the How-To Modules that we've put in the course. 10 00:00:57,000 --> 00:01:01,000 How to make use of the information that you have been learning in 11 00:01:01,000 --> 00:01:06,000 Molecular Biology and in Biochemistry and in Genetics to use 12 00:01:06,000 --> 00:01:11,000 these disciplines or these pieces of information to do something useful. 13 00:01:11,000 --> 00:01:15,000 And recombinant DNA is really an extraordinary set of technologies 14 00:01:15,000 --> 00:01:19,000 that just keeps getting more and more extraordinary. 15 00:01:19,000 --> 00:01:23,000 And the way one can manipulate biological systems now is really 16 00:01:23,000 --> 00:01:27,000 very exciting. And it continues to be exciting. 17 00:01:27,000 --> 00:01:31,000 When I was a beginning graduate student we were able to clone the 18 00:01:31,000 --> 00:01:35,000 first pieces of DNA. And now we can really do a lot more 19 00:01:35,000 --> 00:01:39,000 than just clone DNA. 
So I want to tell you about some of 20 00:01:39,000 --> 00:01:44,000 the things that are really essential to understand about this technology, 21 00:01:44,000 --> 00:01:48,000 and then take you through some of the forefronts of where recombinant 22 00:01:48,000 --> 00:01:52,000 DNA technology is now. We're going to cover three things 23 00:01:52,000 --> 00:02:00,000 in this lecture. 24 00:02:00,000 --> 00:02:10,000 DNA sequencing, using genetic polymorphisms for 25 00:02:10,000 --> 00:02:20,000 various genotyping analyses, and then I'm going to try to touch 26 00:02:20,000 --> 00:02:30,000 on, and we'll have to see how we do here, making animals that are 27 00:02:30,000 --> 00:02:38,000 so-called transgenic. So transgenic technology. 28 00:02:38,000 --> 00:02:44,000 And I'm going to use PowerPoint pretty much for most of the lecture, 29 00:02:44,000 --> 00:02:50,000 so you have most of the relevant stuff in front of you. 30 00:02:50,000 --> 00:02:56,000 I'm going to frame this in terms of a human disease, familial 31 00:02:56,000 --> 00:03:02,000 hypercholesterolemia. So you may remember way back when in 32 00:03:02,000 --> 00:03:06,000 biochemistry we talked about cholesterol. Anyone remember what 33 00:03:06,000 --> 00:03:10,000 class of macromolecules cholesterol belongs to? Lipids. 34 00:03:10,000 --> 00:03:14,000 Thank you. Lipids. OK. I'm not even going to give a frog 35 00:03:14,000 --> 00:03:19,000 for that. And we have this sense of cholesterol being a really bad kind 36 00:03:19,000 --> 00:03:23,000 of molecule but, in fact, cholesterol is an essential 37 00:03:23,000 --> 00:03:27,000 lipid. It's extremely important. Without cholesterol you'd die and 38 00:03:27,000 --> 00:03:32,000 you need it for many things. 
Not only for building membranes in 39 00:03:32,000 --> 00:03:36,000 your cells but also, if you think way back, 40 00:03:36,000 --> 00:03:40,000 you may remember me telling you that cholesterol was part of or had a 41 00:03:40,000 --> 00:03:44,000 chemical structure that was very similar to the steroid hormone 42 00:03:44,000 --> 00:03:48,000 family. And steroid hormones, and we'll discuss this more in the 43 00:03:48,000 --> 00:03:52,000 future, are very important molecules that tell one part of the body what 44 00:03:52,000 --> 00:03:56,000 to do, that regulate what different parts of the body are doing. 45 00:03:56,000 --> 00:04:00,000 So cholesterol is part of this whole signaling system. 46 00:04:00,000 --> 00:04:04,000 And really it's not actually understood all of what cholesterol 47 00:04:04,000 --> 00:04:08,000 does, but it's very important. However, too much of it is not good. 48 00:04:08,000 --> 00:04:12,000 And it's probably not good because, but it's not actually clear. I'll 49 00:04:12,000 --> 00:04:16,000 tell you what happens if you have too much cholesterol, 50 00:04:16,000 --> 00:04:20,000 but actually why it happens is not that clear. So let me talk about 51 00:04:20,000 --> 00:04:24,000 this slide up here, and then we'll talk about what too 52 00:04:24,000 --> 00:04:28,000 much cholesterol does for you. So familial hypercholesterolemia is 53 00:04:28,000 --> 00:04:33,000 an inherited disease, and it's caused by mutations in a 54 00:04:33,000 --> 00:04:39,000 gene called the LDL receptor gene, that encodes something called 55 00:04:39,000 --> 00:04:44,000 the LDL receptor. Now, LDL stands for low density 56 00:04:44,000 --> 00:04:49,000 lipoprotein. And you had this in a previous lecture because I'd 57 00:04:49,000 --> 00:04:55,000 mentioned these to you. Low density lipoproteins. 58 00:04:55,000 --> 00:05:00,000 And these bind to various lipids, including cholesterol, and are taken 59 00:05:00,000 --> 00:05:05,000 up into the cell. 
And some of them are OK, 60 00:05:05,000 --> 00:05:09,000 you probably need some LDLs, but too much LDL is bad. And if you 61 00:05:09,000 --> 00:05:14,000 have too much LDL receptor, the thing that actually binds to the 62 00:05:14,000 --> 00:05:18,000 LDLs, you get too much LDL taken up into the cell. 63 00:05:18,000 --> 00:05:23,000 So this LDL receptor, you'll talk more about this in cell 64 00:05:23,000 --> 00:05:27,000 biology, this LDL receptor, and you've already had some of this, 65 00:05:27,000 --> 00:05:32,000 the LDL receptor is a protein that binds to these LDLs, 66 00:05:32,000 --> 00:05:37,000 takes them into the cell, and then your cell gets full of LDLs. 67 00:05:37,000 --> 00:05:41,000 OK? And as a consequence of this, your cholesterol levels go way up. 68 00:05:41,000 --> 00:05:46,000 Now, you can be heterozygote or homozygote for familial 69 00:05:46,000 --> 00:05:50,000 hypercholesterolemia, for the LDL receptor gene. 70 00:05:50,000 --> 00:05:55,000 OK? For the familial hypercholesterolemia gene. 71 00:05:55,000 --> 00:06:00,000 Try to say that one quickly. All right. 72 00:06:00,000 --> 00:06:06,000 So if you're heterozygote, you have an increased risk of heart 73 00:06:06,000 --> 00:06:13,000 disease. In particular for this thing called atherosclerosis, which I'll 74 00:06:13,000 --> 00:06:19,000 talk more about in a moment. If you are homozygote, so you have 75 00:06:19,000 --> 00:06:26,000 two copies of a mutated LDL receptor gene, you get severe heart symptoms 76 00:06:26,000 --> 00:06:32,000 and you die early. OK? What is atherosclerosis? 77 00:06:32,000 --> 00:06:38,000 Atherosclerosis is a disease that occurs because you get these 78 00:06:38,000 --> 00:06:44,000 buildups of stuff in the blood vessels. And the stuff is fat and 79 00:06:44,000 --> 00:06:49,000 it's proteins, and it basically makes a big lump 80 00:06:49,000 --> 00:06:55,000 that eventually occludes or blocks the blood vessel. 
81 00:06:55,000 --> 00:07:01,000 And so atherosclerosis is bad because it impedes blood flow. 82 00:07:01,000 --> 00:07:06,000 And if you impede blood flow, eventually your heart will seize up 83 00:07:06,000 --> 00:07:12,000 and you will have a heart attack, and that can have, obviously, very 84 00:07:12,000 --> 00:07:17,000 severe consequences. So atherosclerosis occurs because 85 00:07:17,000 --> 00:07:23,000 you have high levels of LDL. And it's really, the actual 86 00:07:23,000 --> 00:07:29,000 etiology of atherosclerosis is not really clear. 87 00:07:29,000 --> 00:07:33,000 Part of it may be that there's just too much fat around and that starts 88 00:07:33,000 --> 00:07:37,000 actually getting deposited out of solution, but it's much more 89 00:07:37,000 --> 00:07:42,000 complicated than that. And there seems to be a very 90 00:07:42,000 --> 00:07:46,000 complicated chain of events by which you get these atherosclerosis 91 00:07:46,000 --> 00:07:50,000 plaques sitting on the lining of blood vessels and impeding blood 92 00:07:50,000 --> 00:07:55,000 flow. OK. So there is a lot of interest medically in 93 00:07:55,000 --> 00:07:59,000 atherosclerosis, particularly in countries such as 94 00:07:59,000 --> 00:08:04,000 ours where food is plentiful and people tend to have too much. 95 00:08:04,000 --> 00:08:08,000 And obesity is a problem anyway because that is part of the set of 96 00:08:08,000 --> 00:08:13,000 risk factors for atherosclerosis. So here are the risk factors. High 97 00:08:13,000 --> 00:08:18,000 levels of LDL, high blood pressure, 98 00:08:18,000 --> 00:08:23,000 diabetes, cigarette smoke and so on. And familial hypercholesterolemia 99 00:08:23,000 --> 00:08:28,000 is contributory to high levels of LDL and atherosclerosis. OK. 100 00:08:28,000 --> 00:08:34,000 So one of the things I want to do is to keep thinking about this disorder 101 00:08:34,000 --> 00:08:40,000 and walk you through how you figure out who's got FH. 
102 00:08:40,000 --> 00:08:46,000 OK. What you can do is to get blood cells from people at risk, 103 00:08:46,000 --> 00:08:52,000 and you can actually examine the LDL receptor gene in the blood cells of 104 00:08:52,000 --> 00:08:59,000 people who are at risk for familial hypercholesterolemia. 105 00:08:59,000 --> 00:09:03,000 And what I'll tell you about is how you can actually sequence the gene, 106 00:09:03,000 --> 00:09:08,000 the FH gene, see if you can find the mutation and see whether or not you 107 00:09:08,000 --> 00:09:13,000 can then identify people who are at risk for the disorder. 108 00:09:13,000 --> 00:09:18,000 So the first thing I want to tell you about today is DNA sequencing. 109 00:09:18,000 --> 00:09:23,000 DNA sequencing. What is DNA sequencing? Does someone care to 110 00:09:23,000 --> 00:09:28,000 give me a definition or think about what I might mean by 111 00:09:28,000 --> 00:09:33,000 DNA sequencing? In particular, 112 00:09:33,000 --> 00:09:38,000 what part of the DNA are we sequencing? Thank you, 113 00:09:38,000 --> 00:09:43,000 Jamie. You want to say it louder? The bases. Yes. So in DNA 114 00:09:43,000 --> 00:09:48,000 sequencing, and maybe I even wrote this, what is this, 115 00:09:48,000 --> 00:09:53,000 what you want to do is to determine the base sequence of the DNA. 116 00:09:53,000 --> 00:09:58,000 OK? You want to determine the sequence of AGCT along 117 00:09:58,000 --> 00:10:03,000 a DNA fragment. This technique is powerful beyond 118 00:10:03,000 --> 00:10:07,000 almost anything else. It's an extraordinary technique. 119 00:10:07,000 --> 00:10:11,000 The ability to sequence DNA is extraordinary. 120 00:10:11,000 --> 00:10:16,000 And it's extraordinary because you can get out of it information that 121 00:10:16,000 --> 00:10:20,000 is absolutely essential for understanding life. 
122 00:10:20,000 --> 00:10:25,000 What you can get from DNA sequencing is an understanding of 123 00:10:25,000 --> 00:10:29,000 the coding capacity of a gene. So, just like you did in your exam, 124 00:10:29,000 --> 00:10:33,000 we gave you a string of DNA and you conceptually translated 125 00:10:33,000 --> 00:10:38,000 it into the protein. Well, you can do that in real life 126 00:10:38,000 --> 00:10:42,000 by looking through the genome, the human genome and finding 127 00:10:42,000 --> 00:10:46,000 stretches of DNA and conceptually turning them into RNA and into 128 00:10:46,000 --> 00:10:50,000 protein and saying, OK, is this a gene? 129 00:10:50,000 --> 00:10:54,000 Does it code for something? And what does it code for? So you 130 00:10:54,000 --> 00:10:58,000 can figure out the coding capacity of a gene. Part of that is actually 131 00:10:58,000 --> 00:11:02,000 identifying is a gene a gene? So we've sequenced the entire human 132 00:11:02,000 --> 00:11:06,000 genome. And I've told you previously that only about 5% of the 133 00:11:06,000 --> 00:11:10,000 genome is actually genes and the rest is other stuff. 134 00:11:10,000 --> 00:11:14,000 So one of the things you want to do with DNA sequencing is to identify 135 00:11:14,000 --> 00:11:18,000 genes. And that's actually very difficult to do, it turns out. 136 00:11:18,000 --> 00:11:22,000 But that's one of the things you can do with DNA sequencing. 137 00:11:22,000 --> 00:11:26,000 I'll talk more about identifying genes that are associated with 138 00:11:26,000 --> 00:11:30,000 disease, that are causative of disease. 139 00:11:30,000 --> 00:11:33,000 And particularly alleles that are associated with disease such as in 140 00:11:33,000 --> 00:11:37,000 the case of familial hypercholesterolemia. 141 00:11:37,000 --> 00:11:40,000 One can figure out evolutionary relationships between organisms. 
142 00:11:40,000 --> 00:11:44,000 So you've probably heard for years about how similar we are to 143 00:11:44,000 --> 00:11:48,000 chimpanzees or how similar we are to dogs or to dolphins or whatever. 144 00:11:48,000 --> 00:11:51,000 But, actually, we didn't really know. Now we can sequence a human 145 00:11:51,000 --> 00:11:55,000 genome, we can sequence a chimp genome, a dog genome, 146 00:11:55,000 --> 00:11:59,000 a dolphin genome, and we can actually look and see 147 00:11:59,000 --> 00:12:02,000 how similar we are. And we can try to figure out, 148 00:12:02,000 --> 00:12:06,000 in evolutionary time, what's changed between the dolphin and ourselves 149 00:12:06,000 --> 00:12:09,000 and what makes a dolphin a dolphin and ourselves ourselves. 150 00:12:09,000 --> 00:12:13,000 It's a very tough question, but DNA sequencing is essential for 151 00:12:13,000 --> 00:12:16,000 trying to answer that kind of question. And then one can ask 152 00:12:16,000 --> 00:12:20,000 about the genome in other ways. Can one find the promoters of all 153 00:12:20,000 --> 00:12:23,000 the different genes? Remember promoters that make genes 154 00:12:23,000 --> 00:12:27,000 be transcribed? The centromeres, 155 00:12:27,000 --> 00:12:31,000 the middle of chromosomes. Various other elements in the genome 156 00:12:31,000 --> 00:12:36,000 that are essential for its function. So I'm going to spend quite some 157 00:12:36,000 --> 00:12:41,000 time talking about DNA sequencing and tell you that DNA sequencing, 158 00:12:41,000 --> 00:12:45,000 most of the DNA sequencing we do uses a trick. And it's a terrific 159 00:12:45,000 --> 00:12:50,000 trick. It really is. So this DNA sequencing, 160 00:12:50,000 --> 00:12:55,000 I'll write it because I don't think I have this on one of 161 00:12:55,000 --> 00:13:01,000 your PowerPoints. The method of DNA sequencing I'm 162 00:13:01,000 --> 00:13:09,000 going to tell you about was devised by a scientist called Fred Sanger. 
163 00:13:09,000 --> 00:13:17,000 So I'll tell you about it. It's called dideoxy, 164 00:13:17,000 --> 00:13:25,000 it's also called chain termination, and it's also called Sanger 165 00:13:25,000 --> 00:13:30,000 sequencing. Professor Sanger is a British 166 00:13:30,000 --> 00:13:34,000 scientist who received two Nobel Prizes. The first was for figuring 167 00:13:34,000 --> 00:13:37,000 out how to sequence proteins, 168 00:13:37,000 --> 00:13:41,000 and the second was for figuring out how to sequence DNA. 169 00:13:41,000 --> 00:13:44,000 When I was a student, I heard Professor Sanger talk. 170 00:13:44,000 --> 00:13:48,000 And he gave a lecture which was really memorable. 171 00:13:48,000 --> 00:13:51,000 It was packed, a packed auditorium. And he spoke the entire time like 172 00:13:51,000 --> 00:13:55,000 this. I don't think he looked up once. He gave the entire lecture 173 00:13:55,000 --> 00:13:59,000 like this, and he was barely audible. 174 00:13:59,000 --> 00:14:03,000 But at the end of the lecture he got a standing ovation from everybody 175 00:14:03,000 --> 00:14:07,000 because really what he's done, figuring out how to sequence 176 00:14:07,000 --> 00:14:12,000 proteins and how to sequence DNA was really an extraordinary 177 00:14:12,000 --> 00:14:16,000 accomplishment. So that's the method I'll tell you 178 00:14:16,000 --> 00:14:21,000 about. And it uses a cool trick. So you know now that the sugar in 179 00:14:21,000 --> 00:14:25,000 DNA has a 3 prime hydroxyl group, and that hydroxyl group is the group 180 00:14:25,000 --> 00:14:30,000 onto which the phosphate gets added. 181 00:14:30,000 --> 00:14:35,000 Right? And without that hydroxyl group you could not add on the next 182 00:14:35,000 --> 00:14:40,000 nucleotide, right? It's a question. Think about it. 183 00:14:40,000 --> 00:14:46,000 OK? I don't mean it to be rhetorical. 
I want you to really be 184 00:14:46,000 --> 00:14:51,000 thinking, OK, about this, because otherwise you won't 185 00:14:51,000 --> 00:14:57,000 understand the method. So here's the 3 prime hydroxyl on 186 00:14:57,000 --> 00:15:02,000 regular deoxyribose. OK? In the Sanger or dideoxy method one 187 00:15:02,000 --> 00:15:07,000 uses in the reaction mix, and I'll go through this with you in 188 00:15:07,000 --> 00:15:13,000 a moment, a sugar or nucleotide that's a dideoxy nucleotide. 189 00:15:13,000 --> 00:15:18,000 In other words, on both the 2 prime and the 3 prime of the sugar, 190 00:15:18,000 --> 00:15:23,000 of the ribose there is no hydroxyl group. There are just 191 00:15:23,000 --> 00:15:28,000 those hydrogens. Now, a dideoxy nucleotide such as 192 00:15:28,000 --> 00:15:34,000 this one can get incorporated into DNA just fine because this phosphate, 193 00:15:34,000 --> 00:15:40,000 the triphosphate here can react with a regular nucleotide that's got a 3 194 00:15:40,000 --> 00:15:46,000 prime hydroxyl. However, once it's been 195 00:15:46,000 --> 00:15:52,000 incorporated you cannot elongate the chain anymore because there is no 196 00:15:52,000 --> 00:15:58,000 reactive hydroxyl group. OK. So based on this principle let 197 00:15:58,000 --> 00:16:03,000 me explain. I've got one of your handouts here. 198 00:16:03,000 --> 00:16:07,000 OK. So here we go. Revision, your template, your primer, 199 00:16:07,000 --> 00:16:11,000 here's your template strand, always goes 3 prime to 5 prime. 200 00:16:11,000 --> 00:16:15,000 Here's your 5 prime to 3 prime primer. If you add nucleotides, 201 00:16:15,000 --> 00:16:19,000 deoxynucleotide triphosphates and DNA polymerase, 202 00:16:19,000 --> 00:16:24,000 you will polymerize the whole fragment. 
203 00:16:24,000 --> 00:16:29,000 If you add, however, to the mix of dNTPs and DNA 204 00:16:29,000 --> 00:16:35,000 polymerase a low level of dideoxy nucleotide triphosphates, 205 00:16:35,000 --> 00:16:41,000 every time you add on a nucleotide the polymerase can either use a 206 00:16:41,000 --> 00:16:46,000 regular nucleotide triphosphate, in which case the chain can elongate 207 00:16:46,000 --> 00:16:52,000 subsequently, or it can use a dideoxy nucleotide triphosphate. 208 00:16:52,000 --> 00:16:58,000 If it uses one of the dideoxy NTPs the chain will terminate. 209 00:16:58,000 --> 00:17:03,000 It cannot be elongated any further. So you get something like this. 210 00:17:03,000 --> 00:17:08,000 And the trick here is really this low level of ddNTPs. 211 00:17:08,000 --> 00:17:14,000 OK? So if you have your template and your primer and you do a 212 00:17:14,000 --> 00:17:19,000 reaction with your dNTPs at a reasonable level and you spike the 213 00:17:19,000 --> 00:17:24,000 reaction with a low level of dideoxy NTPs, you get a whole bunch of 214 00:17:24,000 --> 00:17:30,000 different length chains polymerized. 215 00:17:30,000 --> 00:17:35,000 Because there is some probability, at every position, that you're 216 00:17:35,000 --> 00:17:40,000 either going to get a ddNTP incorporated, in which case the 217 00:17:40,000 --> 00:17:45,000 chain terminates, or you're going to get a regular 218 00:17:45,000 --> 00:17:50,000 nucleotide incorporated in which case the chain can continue for a 219 00:17:50,000 --> 00:17:56,000 bit. OK? So that is paramount to dideoxy sequencing. 220 00:17:56,000 --> 00:18:01,000 So let's continue now by looking at a specific polymer and following 221 00:18:01,000 --> 00:18:06,000 through exactly what happens. So here I've given you a template 222 00:18:06,000 --> 00:18:10,000 and a primer. And we're going to do the same reaction that we just did 223 00:18:10,000 --> 00:18:15,000 conceptually. 
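The role of the low ddNTP level can be made concrete with a small simulation. This is only a sketch under stated assumptions: the newly made strand GAGTAACGGTATGCA is the one the lecture works through, but the function name `extend_once`, the 10% chance of picking a dideoxy A, and the count of 10,000 molecules are illustrative choices, not values from the lecture. Lengths count only the nucleotides added after the primer.

```python
import random
from collections import Counter


def extend_once(new_strand, dd_base, dd_fraction, rng):
    """Extend one primer along the template and return how many
    nucleotides were added before the chain stopped.

    new_strand  -- the strand the polymerase would make if never stopped
    dd_base     -- the base supplied in dideoxy form (e.g. "A" for ddATP)
    dd_fraction -- assumed chance that the dideoxy form is used at a
                   position calling for dd_base (hypothetical parameter)
    """
    for length, base in enumerate(new_strand, start=1):
        if base == dd_base and rng.random() < dd_fraction:
            return length  # dideoxy nucleotide incorporated: chain terminates
    return len(new_strand)  # never terminated: full-length product


rng = random.Random(0)  # fixed seed so repeated runs give the same mix
strand = "GAGTAACGGTATGCA"
lengths = [extend_once(strand, "A", 0.1, rng) for _ in range(10_000)]

# Tally how many molecules ended at each length.
for length, count in sorted(Counter(lengths).items()):
    print(length, count)
```

Every length that shows up is a position of A in the new strand, which is the point of the spike: with the dideoxy fraction set to 1.0 every chain would stop at the first A, and with 0.0 none would stop at all. In this toy model a full-length product and a chain terminated at the final A have the same length.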
We're going to do it again conceptually except with 224 00:18:15,000 --> 00:18:19,000 letters. We're going to mix together. And we're going to do, 225 00:18:19,000 --> 00:18:24,000 and I see a mistake up here already, but that's OK. You'll bear with me. 226 00:18:24,000 --> 00:18:29,000 What I've done here is to put in some dideoxy ATP. 227 00:18:29,000 --> 00:18:33,000 And I meant to say here I've got dATP at high levels. 228 00:18:33,000 --> 00:18:37,000 And I've got all the other nucleotides here, 229 00:18:37,000 --> 00:18:41,000 too, at high levels. OK? That's my error and I will 230 00:18:41,000 --> 00:18:45,000 correct it. You should correct it now in your handout. 231 00:18:45,000 --> 00:18:49,000 So where it says dATP high, that should actually say dNTPs high, 232 00:18:49,000 --> 00:18:53,000 not just dATP. OK? All right. So let's look and see what happens to 233 00:18:53,000 --> 00:18:57,000 this reaction. And I've noted here that this 234 00:18:57,000 --> 00:19:02,000 dideoxy ATP can be radioactive or fluorescent. 235 00:19:02,000 --> 00:19:06,000 Or actually it doesn't have to work that way but let's just leave it 236 00:19:06,000 --> 00:19:10,000 that way for now. OK. That actually is not 237 00:19:10,000 --> 00:19:15,000 necessarily true. So let's just focus on the ddATP 238 00:19:15,000 --> 00:19:19,000 plus the high dNTPs, and let's see what happens. 239 00:19:19,000 --> 00:19:24,000 OK. So one thing that can happen is that, here's your primer in red 240 00:19:24,000 --> 00:19:28,000 and here's the polymerized DNA in blue, you get a bit 241 00:19:28,000 --> 00:19:33,000 of DNA polymerized. Now here's an A. 242 00:19:33,000 --> 00:19:38,000 See? It goes GAGTAA. And I've given you a reaction where 243 00:19:38,000 --> 00:19:42,000 the first two As use regular dATP. And so the chain will continue 244 00:19:42,000 --> 00:19:47,000 after that. All right? So here we go, GAGTA. 
And then the 245 00:19:47,000 --> 00:19:52,000 next A that's put in is a dideoxy A. And that's the end of that 246 00:19:52,000 --> 00:19:57,000 polymerization reaction, and the fragment you're going to 247 00:19:57,000 --> 00:20:02,000 get out of it is this little red and blue composite there. 248 00:20:02,000 --> 00:20:06,000 You can do the same thing where you say actually in some molecules you 249 00:20:06,000 --> 00:20:10,000 get polymerization past the second A, and you keep going until you get to 250 00:20:10,000 --> 00:20:15,000 the next A. And at that point, by chance, you get a dideoxy ATP 251 00:20:15,000 --> 00:20:19,000 added to some molecules. That is the end of polymerization 252 00:20:19,000 --> 00:20:24,000 for those molecules. The chain terminates. 253 00:20:24,000 --> 00:20:28,000 For some molecules, however, you'll put in a regular dATP and the 254 00:20:28,000 --> 00:20:33,000 chain will continue. But it will terminate, 255 00:20:33,000 --> 00:20:38,000 excuse me, at the next A that's put in because you put a dideoxy A in. 256 00:20:38,000 --> 00:20:43,000 So in different molecules you're going to land up with a spectrum of 257 00:20:43,000 --> 00:20:47,000 elongated products of different length. All right? 258 00:20:47,000 --> 00:20:52,000 And what's crucial here is that the lengths of the molecules that chain 259 00:20:52,000 --> 00:20:57,000 terminate, because they incorporated a dideoxy nucleotide, 260 00:20:57,000 --> 00:21:02,000 correspond to the positions of that particular nucleotide 261 00:21:02,000 --> 00:21:07,000 along the chain. So you're only going to get a 262 00:21:07,000 --> 00:21:13,000 molecule chain terminating with A when there was a T on the template 263 00:21:13,000 --> 00:21:18,000 strand. OK? And so you can map the positions of the T on the template 264 00:21:18,000 --> 00:21:23,000 or the A on the elongated strand by the length of the elongated products 265 00:21:23,000 --> 00:21:29,000 that come out of this reaction. 
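The mapping between fragment length and base position can be written out as a short sketch. The new strand GAGTAACGGTATGCA is the sequence the lecture reads off the gel; the function name `termination_lengths` is a made-up helper for illustration, and lengths here are counted from the end of the primer, which is a simplifying assumption (on a real gel the primer length is added to every fragment).

```python
def termination_lengths(new_strand, dd_base):
    """Return the lengths (nucleotides added after the primer) at which
    a chain can terminate when dd_base is supplied in dideoxy form.

    A chain can only stop where dd_base is the base being added, so the
    possible lengths are exactly the positions of dd_base in the strand.
    """
    return [pos for pos, base in enumerate(new_strand, start=1)
            if base == dd_base]


new_strand = "GAGTAACGGTATGCA"

# ddATP reaction: one possible fragment per A in the new strand,
# i.e. one per T on the template.
print(termination_lengths(new_strand, "A"))  # [2, 5, 6, 11, 15]
```

The same call with "G", "C" or "T" gives the fragment lengths expected in the other three reactions, and between them the four lists cover every position exactly once.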
I'm going to assume you're with me 266 00:21:29,000 --> 00:21:33,000 here. OK. So the point is the polymerized 267 00:21:33,000 --> 00:21:37,000 fragments terminate where dideoxy A incorporates. Now, 268 00:21:37,000 --> 00:21:40,000 you've got to do four reactions to determine the sequence of something. 269 00:21:40,000 --> 00:21:44,000 OK. And I've noted here. And the length of the terminated fragment 270 00:21:44,000 --> 00:21:48,000 indicates the position of A. You may need to go and work with 271 00:21:48,000 --> 00:21:51,000 this a bit. OK? It's a very clever method but it 272 00:21:51,000 --> 00:21:55,000 may not be something that's immediately apparent, 273 00:21:55,000 --> 00:21:59,000 so go and work with it if you need to. 274 00:21:59,000 --> 00:22:03,000 So the length of the terminated fragments indicates the positions of 275 00:22:03,000 --> 00:22:07,000 A in the elongated strand, or, if you want, of T in the template 276 00:22:07,000 --> 00:22:11,000 strand. In order to get the positions of all the different 277 00:22:11,000 --> 00:22:16,000 nucleotides along that DNA fragment you have to do four separate 278 00:22:16,000 --> 00:22:20,000 reactions. One that includes dideoxy ATP, one that includes dideoxy CTP, 279 00:22:20,000 --> 00:22:25,000 one dideoxy GTP and one dideoxy TTP. 280 00:22:25,000 --> 00:22:29,000 And you do those separately so that you can monitor the positions of 281 00:22:29,000 --> 00:22:33,000 each of those four nucleotides by the position of chain terminating as 282 00:22:33,000 --> 00:22:38,000 you're going along. OK. So assuming that you guys are 283 00:22:38,000 --> 00:22:42,000 with me here at this point, are you? No. That's an honest 284 00:22:42,000 --> 00:22:46,000 answer. Raise your hands if you're with me. OK. If you're not with me, 285 00:22:46,000 --> 00:22:51,000 don't worry about it. You have to go work with it. 286 00:22:51,000 --> 00:22:55,000 It's not intuitive. It's very clever. 
I mean there's a reason 287 00:22:55,000 --> 00:23:00,000 this guy got the Nobel Prize for this. OK? 288 00:23:00,000 --> 00:23:03,000 It's a really clever method. OK. So the deal is this. So now 289 00:23:03,000 --> 00:23:07,000 what you get out of this is a whole mix of fragments of different 290 00:23:07,000 --> 00:23:11,000 lengths that have terminated at positions of particular nucleotides, 291 00:23:11,000 --> 00:23:14,000 depending on how you've spiked the reaction. And you've got to 292 00:23:14,000 --> 00:23:18,000 separate them from one another somehow to figure out what those 293 00:23:18,000 --> 00:23:22,000 positions are. And you can do this in a couple of 294 00:23:22,000 --> 00:23:26,000 ways. You can use gel electrophoresis, 295 00:23:26,000 --> 00:23:31,000 which was discussed with you previously, where you separate the 296 00:23:31,000 --> 00:23:36,000 DNA on the basis of size where the DNA migrates in a gel in an electric 297 00:23:36,000 --> 00:23:40,000 field and long fragments stay near the top of the gel and short 298 00:23:40,000 --> 00:23:45,000 fragments go to the bottom of the gel because they migrate quickly. 299 00:23:45,000 --> 00:23:50,000 And what you can do on a gel, and you've somehow labeled, 300 00:23:50,000 --> 00:23:55,000 don't worry about this right now, but somehow you're able to detect 301 00:23:55,000 --> 00:24:00,000 each of the fragments that has come out of your mix. 302 00:24:00,000 --> 00:24:04,000 OK? So remember you're doing the sequencing reaction on millions and 303 00:24:04,000 --> 00:24:08,000 millions or billions of molecules. And so you've got this kind of 304 00:24:08,000 --> 00:24:12,000 stochastic mix of molecules of different lengths. 305 00:24:12,000 --> 00:24:16,000 And you want to separate this mix of molecules of different lengths. 306 00:24:16,000 --> 00:24:20,000 OK. 
So what you can end up with, once you've separated all these 307 00:24:20,000 --> 00:24:24,000 different molecules, is in your dideoxy A reaction mix a 308 00:24:24,000 --> 00:24:28,000 series of one, two, three, four, 309 00:24:28,000 --> 00:24:33,000 five different sized fragments. In your ddG mix, 310 00:24:33,000 --> 00:24:37,000 you got out of that also a series of five different sized fragments. 311 00:24:37,000 --> 00:24:42,000 And notice that they're different in size from the ones in the ddA 312 00:24:42,000 --> 00:24:47,000 lane, the ones in the ddC lane and the ones in the ddT lane. 313 00:24:47,000 --> 00:24:51,000 And the reason they're different in size is because their size indicates 314 00:24:51,000 --> 00:24:56,000 the position of where a particular nucleotide is in the DNA fragment or 315 00:24:56,000 --> 00:25:01,000 particular bases in the DNA fragment. 316 00:25:01,000 --> 00:25:06,000 And then the trick is you could look at this gel and you could read off 317 00:25:06,000 --> 00:25:11,000 the sequence. So the shortest fragments that you're going to get 318 00:25:11,000 --> 00:25:16,000 are the ones that are nearest the beginning of that molecule you made, 319 00:25:16,000 --> 00:25:21,000 nearest the 5 prime end. So the bottom one is G, 320 00:25:21,000 --> 00:25:26,000 here's the band in the ddG lane. Then up above it there is this band 321 00:25:26,000 --> 00:25:32,000 indicating a fragment in the ddA lane. 322 00:25:32,000 --> 00:25:38,000 Above it there's one in the G lane again. Above it there's one in the 323 00:25:38,000 --> 00:25:44,000 T lane. So the sequence goes G-A-G-T, and then you can keep 324 00:25:44,000 --> 00:25:50,000 reading A-A-C-G-G-T-A-T-G-C-A. OK? Literally like that on a gel. 325 00:25:50,000 --> 00:25:56,000 OK? So you can do that on a gel. It's really fantastic. 326 00:25:56,000 --> 00:26:02,000 And this is what old sequencing gels look like. 327 00:26:02,000 --> 00:26:05,000 And, actually, I used to run them. 
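Reading the gel from the bottom up amounts to sorting all the bands by fragment length and noting which lane each one sits in. Here is a minimal sketch of that readout, assuming the band lengths implied by the sequence read in the lecture (G-A-G-T-A-A-C-G-G-T-A-T-G-C-A); the function name `read_gel` and the dictionary layout are illustrative, and lengths are counted from the end of the primer.

```python
def read_gel(lanes):
    """Reconstruct the newly made strand, 5 prime to 3 prime, from four lanes.

    lanes -- dict mapping each dideoxy base to the fragment lengths
             (bands) seen in that lane
    """
    # Shortest fragments ran furthest, so sorting by length is the same
    # as reading the bands from the bottom of the gel to the top.
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)


lanes = {
    "A": [2, 5, 6, 11, 15],  # ddA lane: five bands
    "C": [7, 14],            # ddC lane
    "G": [1, 3, 8, 9, 13],   # ddG lane: five bands
    "T": [4, 10, 12],        # ddT lane
}

print(read_gel(lanes))  # GAGTAACGGTATGCA
```

Because each position in the new strand terminates in exactly one of the four reactions, no two lanes share a band at the same length, which is why the sort gives an unambiguous sequence.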
328 00:26:05,000 --> 00:26:09,000 I used to spend hours and hours running these gels. 329 00:26:09,000 --> 00:26:13,000 They're very, very thin. They're about a millimeter thick 330 00:26:13,000 --> 00:26:16,000 acrylamide so that you can resolve the fragments that are one 331 00:26:16,000 --> 00:26:20,000 nucleotide different in size. Think about that. OK? Each of 332 00:26:20,000 --> 00:26:24,000 these fragments, indicated by a band, 333 00:26:24,000 --> 00:26:28,000 is one nucleotide different in size. Otherwise, you couldn't get the one 334 00:26:28,000 --> 00:26:32,000 nucleotide resolution. So you do that by running very, 335 00:26:32,000 --> 00:26:37,000 very thin gels so that you can resolve the fragments well, 336 00:26:37,000 --> 00:26:42,000 and then you read off the bottom. OK? I've thrown out all my old 337 00:26:42,000 --> 00:26:46,000 sequencing gels. And the reason that I have is that 338 00:26:46,000 --> 00:26:51,000 there is new technology where you don't use this kind of display 339 00:26:51,000 --> 00:26:56,000 anymore. This is a display where your fragments were labeled with 340 00:26:56,000 --> 00:27:01,000 radioactivity and you exposed them to x-ray film and you read the 341 00:27:01,000 --> 00:27:06,000 sequence after exposure. Nowadays this is done by machine. 342 00:27:06,000 --> 00:27:11,000 And the dideoxy nucleotides are labeled fluorescently. 343 00:27:11,000 --> 00:27:15,000 OK? So they're not labeled with radioactivity. 344 00:27:15,000 --> 00:27:20,000 They're literally labeled with labels that fluoresce with different 345 00:27:20,000 --> 00:27:25,000 colors when you put UV light on them. And you do your dideoxy reaction 346 00:27:25,000 --> 00:27:30,000 and you run a gel. Again, it's a gel. 347 00:27:30,000 --> 00:27:35,000 It's actually a very thin tube of a gel mostly, but you run your gel. 348 00:27:35,000 --> 00:27:41,000 And, again, it's the same idea. 
You resolve fragments at single base 349 00:27:41,000 --> 00:27:46,000 resolution, single nucleotide resolution, and they keep, 350 00:27:46,000 --> 00:27:52,000 the gel keeps running and running. And single fragments actually run 351 00:27:52,000 --> 00:27:57,000 off the bottom of the gel. And as they're passing down the gel 352 00:27:57,000 --> 00:28:03,000 they are detected by a laser. A laser excites the fluorochrome. 353 00:28:03,000 --> 00:28:06,000 And the detector, there is a detector which will 354 00:28:06,000 --> 00:28:10,000 detect whether or not it's yellow, orange, blue or green. OK? And 355 00:28:10,000 --> 00:28:14,000 that will tell you which base is being, has been incorporated at that 356 00:28:14,000 --> 00:28:18,000 position. So you get things that come out. It's kind of small but 357 00:28:18,000 --> 00:28:22,000 you can go back and look, where instead of getting a gel with 358 00:28:22,000 --> 00:28:26,000 those bands that I showed you, you get these peaks and valleys that 359 00:28:26,000 --> 00:28:30,000 are different colors. And that's what current DNA 360 00:28:30,000 --> 00:28:35,000 sequencing readout looks like. And, in fact, there are machines. 361 00:28:35,000 --> 00:28:40,000 What did I do? Lots of primers. Well, it depends. 362 00:28:40,000 --> 00:28:45,000 Many copies of the same primer, right. Yes. Dr. Gardel is pointing 363 00:28:45,000 --> 00:28:50,000 out that there are many copies of the same primer in a reaction mix. 364 00:28:50,000 --> 00:28:55,000 Certainly there are. There are billions of molecules in the 365 00:28:55,000 --> 00:29:00,000 reaction mix, and so there are billions of primers. 366 00:29:00,000 --> 00:29:03,000 OK, so you have to have a primer for each molecule. 367 00:29:03,000 --> 00:29:06,000 OK. And each band, you should realize, is not a single 368 00:29:06,000 --> 00:29:09,000 molecule. 
It's a composite of many, many molecules, many thousands of 369 00:29:09,000 --> 00:29:12,000 molecules that have all chain terminated at the same position. 370 00:29:12,000 --> 00:29:15,000 So what I want to point out here is that this is what today's readout 371 00:29:15,000 --> 00:29:19,000 looks like. And, in fact, nowadays you just get a 372 00:29:19,000 --> 00:29:22,000 printout from the company or from the machine that tells 373 00:29:22,000 --> 00:29:26,000 you a DNA sequence. And it's this improvement in 374 00:29:26,000 --> 00:29:31,000 technology, but that basically uses this chain termination method, 375 00:29:31,000 --> 00:29:36,000 that has allowed one to sequence rapidly enough to sequence the human 376 00:29:36,000 --> 00:29:41,000 genome and to sequence multiple human genomes and multiple animals. 377 00:29:41,000 --> 00:29:46,000 OK. So let's see. Actually, I have a movie. I guess we can take 378 00:29:46,000 --> 00:29:51,000 the time to watch this movie. Let's see if it will work. All 379 00:29:51,000 --> 00:29:56,000 right. So primer template. Four reactions, each with lots of 380 00:29:56,000 --> 00:30:01,000 molecules, each with their primer. DNA polymerase, 381 00:30:01,000 --> 00:30:05,000 dNTPs, dATP, dGTP, dCTP, dTTP, dCTP, excuse me. 382 00:30:05,000 --> 00:30:09,000 OK. They're your four reactions. OK. I think this is a less dorky movie 383 00:30:09,000 --> 00:30:13,000 than some. OK. So here we go. Here's your primer 384 00:30:13,000 --> 00:30:17,000 and your template, and here's polymerization. 385 00:30:17,000 --> 00:30:21,000 And, ah, there we go, chain termination, dideoxy nucleotide 386 00:30:21,000 --> 00:30:25,000 incorporation, and you cannot get elongation. 387 00:30:25,000 --> 00:30:30,000 The poor G is thwarted in its desire to elongate. OK?
388 00:30:30,000 --> 00:30:34,000 So you land up with this mix, just like I showed you, and you land 389 00:30:34,000 --> 00:30:38,000 up with a set of four reactions, each with molecules of different 390 00:30:38,000 --> 00:30:42,000 lengths in them. And here's your gel, 391 00:30:42,000 --> 00:30:47,000 and you load them on your gel, and they migrate through your 392 00:30:47,000 --> 00:30:51,000 electric field. And there you have your things, 393 00:30:51,000 --> 00:30:55,000 you have your fragments. This is a piece of x-ray film you put on top. 394 00:30:55,000 --> 00:31:00,000 There are your little bands, your radioactive bands, and here we go. 395 00:31:00,000 --> 00:31:04,000 GT, you can read it. OK. Enough. Enough. 396 00:31:04,000 --> 00:31:09,000 OK. You can go and look at this yourself. This is an old gel 397 00:31:09,000 --> 00:31:13,000 apparatus that one used to do DNA sequencing on. 398 00:31:13,000 --> 00:31:18,000 This was the first generation of machine that you could do the 399 00:31:18,000 --> 00:31:22,000 fluorescent sequencing on. This is a room full of sequencing 400 00:31:22,000 --> 00:31:27,000 machines of the kind that was used to sequence the human genome. 401 00:31:27,000 --> 00:31:30,000 In fact, many rooms of machines going all day and all night 402 00:31:30,000 --> 00:31:34,000 sequencing and sequencing and sequencing. We have a lot of 403 00:31:34,000 --> 00:31:37,000 nucleotides. And it takes a long time to sequence. 404 00:31:37,000 --> 00:31:41,000 Although, in retrospect it's not such a long time. 405 00:31:41,000 --> 00:31:45,000 And now all the sequencing machines that sequence the human genome are 406 00:31:45,000 --> 00:31:48,000 sitting around looking for other work because they all exist. 
407 00:31:48,000 --> 00:31:52,000 And so that is why we are sequencing things like dolphins and 408 00:31:52,000 --> 00:31:56,000 dogs and multiple strains of dogs, multiple breeds, excuse me, of dogs 409 00:31:56,000 --> 00:32:00,000 because we have all these sequencing machines sitting around. 410 00:32:00,000 --> 00:32:04,000 OK. Honestly, I think that's true, 411 00:32:04,000 --> 00:32:08,000 not that it's not useful. All right. So I'm going to move on 412 00:32:08,000 --> 00:32:13,000 here. This is Professor Jacks's joke that I decided to use also. 413 00:32:13,000 --> 00:32:17,000 OK. This is something about DNA sequencing and the implications of 414 00:32:17,000 --> 00:32:21,000 being able to use DNA sequencing for genotyping. So I'm going to use 415 00:32:21,000 --> 00:32:26,000 that. You can go and read that on your thing. I'm going to move on 416 00:32:26,000 --> 00:32:30,000 right to talking about familial hypercholesterolemia and the notion 417 00:32:30,000 --> 00:32:35,000 of a disease allele. So here's part of the normal FH gene, 418 00:32:35,000 --> 00:32:40,000 the LDL receptor gene, and here it is. And there is a T 419 00:32:40,000 --> 00:32:45,000 here in red. And here is the mutant gene sequence and there is an A. 420 00:32:45,000 --> 00:32:50,000 So if you're wild type you have a T at this position that's arrowed and 421 00:32:50,000 --> 00:32:55,000 if you're a mutant you have an A. And if you do your conceptual 422 00:32:55,000 --> 00:33:00,000 protein translation here you get your amino acid, part of 423 00:33:00,000 --> 00:33:05,000 the amino acid chain. Obviously it's not at the beginning. 424 00:33:05,000 --> 00:33:09,000 And obviously this is DNA and this is protein, so we've removed the RNA 425 00:33:09,000 --> 00:33:14,000 here, the RNA step. And you can see here is the amino 426 00:33:14,000 --> 00:33:19,000 acid sequence encoded by your wild type gene.
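A conceptual translation like this one can be sketched in a few lines of Python, which also shows how a single T-to-A change can turn an ordinary codon into a stop codon. The nine-base sequence and the partial codon table below are hypothetical, chosen for illustration only. They are not the LDL receptor sequence on the slide, and `translate` is just an illustrative name.

```python
# Partial codon table: only the codons used in this made-up example.
CODON_TABLE = {
    "GAG": "Glu",
    "TAT": "Tyr",
    "TGC": "Cys",
    "TAA": "Stop",
}

def translate(dna):
    """Translate a coding-strand DNA sequence codon by codon, halting at a stop codon."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

wild_type = "GAGTATTGC"  # hypothetical sequence, not the real LDL receptor gene
mutant = "GAGTAATGC"     # same sequence with a single T-to-A change

print(translate(wild_type))  # ['Glu', 'Tyr', 'Cys']
print(translate(mutant))     # ['Glu']
```

In this toy case the change converts TAT (tyrosine) into TAA (stop), so the mutant chain ends early.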
427 00:33:19,000 --> 00:33:23,000 And in your LDL receptor mutant there is a stop codon at this 428 00:33:23,000 --> 00:33:28,000 position that terminates the LDL receptor. And so the receptor gene 429 00:33:28,000 --> 00:33:33,000 is mutant and does not function as it should. 430 00:33:33,000 --> 00:33:38,000 OK. All right. So let me move onto the next thing 431 00:33:38,000 --> 00:33:43,000 I want to talk about, which is this question of 432 00:33:43,000 --> 00:33:48,000 polymorphisms. What is a polymorphism? 433 00:33:48,000 --> 00:34:03,000 Anyone. All right. 434 00:34:03,000 --> 00:34:07,000 I'll tell you what a polymorphism is. A polymorphism is defined as 435 00:34:07,000 --> 00:34:12,000 some kind of variation in DNA sequence. 436 00:34:12,000 --> 00:34:23,000 And it's defined as a variation in 437 00:34:23,000 --> 00:34:27,000 DNA sequence at a particular position. 438 00:34:27,000 --> 00:34:40,000 So our DNA, all of us have very 439 00:34:40,000 --> 00:34:45,000 similar DNA. If we were to sequence me and we were to sequence you and 440 00:34:45,000 --> 00:34:49,000 we were to sequence you, we would find that our DNA was 441 00:34:49,000 --> 00:34:54,000 greater than 99% identical. If we lined up our three times ten 442 00:34:54,000 --> 00:34:59,000 to the ninth base pairs in a very long line, we would find 443 00:34:59,000 --> 00:35:04,000 it was very similar. There was about 1% difference in 444 00:35:04,000 --> 00:35:10,000 sequence between each of us. And most of that, some of that 445 00:35:10,000 --> 00:35:15,000 corresponds to disease gene alleles. We all are supposed to carry about 446 00:35:15,000 --> 00:35:20,000 a thousand bad genes, or a thousand genes that if 447 00:35:20,000 --> 00:35:26,000 homozygous would give us something bad, and sometimes do. 448 00:35:26,000 --> 00:35:31,000 And some of those correspond to changes in differences in DNA 449 00:35:31,000 --> 00:35:37,000 sequence that are not directly in genes. 
450 00:35:37,000 --> 00:35:41,000 All of these differences between different individuals are called 451 00:35:41,000 --> 00:35:46,000 polymorphisms, DNA sequence variation. 452 00:35:46,000 --> 00:35:50,000 And you can use these to help figure out whether or not someone 453 00:35:50,000 --> 00:35:55,000 has a particular disease allele, and also you can use it to figure 454 00:35:55,000 --> 00:35:59,000 out whether the DNA from a sample comes from me or from you 455 00:35:59,000 --> 00:36:04,000 or from Dr. Gardel. OK? And I'll talk about this, 456 00:36:04,000 --> 00:36:08,000 using polymorphisms to map genotype. I'm going to talk about a 457 00:36:08,000 --> 00:36:12,000 particular kind of polymorphism, and these are called SNPs which is 458 00:36:12,000 --> 00:36:17,000 pronounced "snip". This stands for single nucleotide 459 00:36:17,000 --> 00:36:21,000 polymorphisms. So I've said again that human 460 00:36:21,000 --> 00:36:25,000 genomes are 99% identical, but there are throughout the genome 461 00:36:25,000 --> 00:36:30,000 changes, differences between regions. 462 00:36:30,000 --> 00:36:34,000 Single nucleotide polymorphisms are variations in one region. 463 00:36:34,000 --> 00:36:38,000 Here's a sample sequence I made up. Here's a G in one individual and an 464 00:36:38,000 --> 00:36:42,000 A in another individual. And if you take the population, 465 00:36:42,000 --> 00:36:47,000 you find very often that there just is a choice of two, 466 00:36:47,000 --> 00:36:51,000 sometimes more, but often just a choice of two nucleotides in one 467 00:36:51,000 --> 00:36:55,000 position. Most of the genomes are identical, but you find these little 468 00:36:55,000 --> 00:36:59,000 regions where in many individuals of a population there are 469 00:36:59,000 --> 00:37:04,000 these variations. In fact, these variations have to be 470 00:37:04,000 --> 00:37:08,000 present in more than 1% of the population for this thing to be 471 00:37:08,000 --> 00:37:12,000 called a SNP.
This is a definition that humans have given but it's a 472 00:37:12,000 --> 00:37:16,000 useful definition as a genetic tool. So if there is a polymorphism 473 00:37:16,000 --> 00:37:20,000 present in about 1% of the population, whereby I might have an 474 00:37:20,000 --> 00:37:24,000 A here, excuse me, and Dr. Gardel has a G at that 475 00:37:24,000 --> 00:37:28,000 position, that would be a SNP, and we would be polymorphic for that 476 00:37:28,000 --> 00:37:32,000 SNP. In fact, my two chromosomes, 477 00:37:32,000 --> 00:37:38,000 OK, that are homologous chromosomes might on one copy carry an A and on 478 00:37:38,000 --> 00:37:43,000 the other copy carry a G. Now, these different bases are 479 00:37:43,000 --> 00:37:49,000 present at different frequencies. So, for example, it might be very 480 00:37:49,000 --> 00:37:54,000 common to have a G at this position in the sequence and it might be very 481 00:37:54,000 --> 00:38:00,000 rare to have an A at that position. All right? 482 00:38:00,000 --> 00:38:04,000 And that's useful because you can use the frequency of these different 483 00:38:04,000 --> 00:38:09,000 nucleotides, these different bases to help you use the SNP to genotype. 484 00:38:09,000 --> 00:38:13,000 And I want to point out that usually SNPs occur outside coding 485 00:38:13,000 --> 00:38:18,000 regions because 95%, actually more than that, 486 00:38:18,000 --> 00:38:22,000 99% of the genome is not coding per se. 95% is not genes, 487 00:38:22,000 --> 00:38:27,000 but then if you remove all the introns and promoters and so on, 488 00:38:27,000 --> 00:38:32,000 99% does not code for any protein. 489 00:38:32,000 --> 00:38:36,000 OK. So usually these SNPs are present outside coding regions. 490 00:38:36,000 --> 00:38:40,000 So here's to explore this a bit more. You can find lots of these 491 00:38:40,000 --> 00:38:44,000 SNPs. 
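The 1% rule and the idea of common and rare bases can be made concrete with a small Python sketch. It assumes the threshold is applied to allele frequency, counting both homologous chromosomes of each person, which is one reasonable reading of "present in more than 1% of the population". The sample population is hypothetical, and the function names are illustrative.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Count every allele on both chromosomes and return each allele's frequency."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}

def is_snp(genotypes, threshold=0.01):
    """Treat a position as a SNP if a second allele exceeds the threshold frequency."""
    frequencies = sorted(allele_frequencies(genotypes).values(), reverse=True)
    return len(frequencies) > 1 and frequencies[1] > threshold

# Hypothetical sample of 100 people: G is common, A is rare.
population = ["GG"] * 95 + ["AG"] * 4 + ["AA"] * 1

print(allele_frequencies(population))  # G is about 0.97, A is about 0.03
print(is_snp(population))
```

Here the rare A allele sits on 6 of 200 chromosomes, above the 1% cutoff, so the position would count as a SNP under this assumption.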
There are about three million SNPs in the human genome, 492 00:38:44,000 --> 00:38:49,000 and a very large percentage of those SNPs has been identified by DNA 493 00:38:49,000 --> 00:38:53,000 sequencing. So you can get the idea. You have to sequence DNA from lots 494 00:38:53,000 --> 00:38:57,000 and lots of individuals to identify these SNPs, but people 495 00:38:57,000 --> 00:39:02,000 have done it. And we know now more than a million 496 00:39:02,000 --> 00:39:06,000 SNPs in the human genome that are located all over different 497 00:39:06,000 --> 00:39:10,000 chromosomes, and we know where they're located on different 498 00:39:10,000 --> 00:39:14,000 chromosomes. And so you can use these SNPs to make kind of a map, 499 00:39:14,000 --> 00:39:19,000 I'll tell you in a moment. So here are some possible genotypes. 500 00:39:19,000 --> 00:39:23,000 I've given you a choice of two for each of these. 501 00:39:23,000 --> 00:39:27,000 OK? So, for example, for this red SNP here you can be AA, 502 00:39:27,000 --> 00:39:32,000 AC or CC on the two homologous chromosomes. 503 00:39:32,000 --> 00:39:36,000 All right. So let's keep going with this thread. So because you have 504 00:39:36,000 --> 00:39:41,000 these SNPs all over your genome and you know where they are, 505 00:39:41,000 --> 00:39:46,000 you can use them to make a map of your entire genome. 506 00:39:46,000 --> 00:39:51,000 That doesn't depend on the genes. It just depends on the sequence. 507 00:39:51,000 --> 00:39:56,000 And knowing these SNPs is a lot easier to work with than having to 508 00:39:56,000 --> 00:40:01,000 sequence the entire genome of somebody every time you 509 00:40:01,000 --> 00:40:06,000 want some information. So you can use these SNPs to 510 00:40:06,000 --> 00:40:11,000 identify each person. So I have a SNP map of all these 511 00:40:11,000 --> 00:40:16,000 hundreds of thousands of SNPs, or up to a million. 
The usual maps 512 00:40:16,000 --> 00:40:20,000 presently used are about 300,000 SNPs per genome. I have a map of 513 00:40:20,000 --> 00:40:25,000 300,000 SNPs where there are different, actually, 514 00:40:25,000 --> 00:40:30,000 I don't, but I could, where there are different alleles at 515 00:40:30,000 --> 00:40:35,000 different frequencies, different bases present at different 516 00:40:35,000 --> 00:40:40,000 frequencies at specific positions. And we could pick any one of you and 517 00:40:40,000 --> 00:40:44,000 make a SNP map for you. And it would look really different 518 00:40:44,000 --> 00:40:48,000 from mine, not because the SNPs themselves are different, 519 00:40:48,000 --> 00:40:53,000 they'd be the same SNPs, but the actual bases and the combination of 520 00:40:53,000 --> 00:40:57,000 bases between all these different SNPs would be different between 521 00:40:57,000 --> 00:41:01,000 different individuals. And this SNP-type map is the basis 522 00:41:01,000 --> 00:41:05,000 for DNA fingerprinting that is used in forensics and to figure out 523 00:41:05,000 --> 00:41:09,000 disease alleles. I'll talk more about this in a 524 00:41:09,000 --> 00:41:13,000 second. I want to point out that there are other kinds of 525 00:41:13,000 --> 00:41:16,000 polymorphisms that are used in genotyping, restriction fragment 526 00:41:16,000 --> 00:41:20,000 length polymorphisms and things called simple repeat polymorphisms. 527 00:41:20,000 --> 00:41:24,000 And you can look in your book for these restriction fragment length 528 00:41:24,000 --> 00:41:28,000 polymorphisms, but let's talk more about SNPs. 529 00:41:28,000 --> 00:41:32,000 So SNP genotyping, here's a whole list, 530 00:41:32,000 --> 00:41:36,000 but the ones I'm going to focus on are disease gene mapping and 531 00:41:36,000 --> 00:41:41,000 forensics. Also, you use SNP genotyping for paternity 532 00:41:41,000 --> 00:41:45,000 suits. OK?
So if someone comes and, you know, if someone says it's my 533 00:41:45,000 --> 00:41:50,000 kid and the other one says it's my kid, you can figure out very easily 534 00:41:50,000 --> 00:41:54,000 whose it is by looking at these various SNPs and figuring out what 535 00:41:54,000 --> 00:41:59,000 pattern of SNPs is present in the offspring. OK. 536 00:41:59,000 --> 00:42:02,000 So let me actually consider, let me not deal with genotyping for 537 00:42:02,000 --> 00:42:06,000 disease alleles at this point. Let me talk about forensics a bit 538 00:42:06,000 --> 00:42:09,000 because it's kind of interesting. So how do you do this? Let's look 539 00:42:09,000 --> 00:42:13,000 through this slide. You have it as a handout. 540 00:42:13,000 --> 00:42:17,000 Here are SNPs. And I've just given you two chromosomes each with two 541 00:42:17,000 --> 00:42:20,000 SNPs. OK? And different people will have different bases at these 542 00:42:20,000 --> 00:42:24,000 particular SNPs, or they'll have different 543 00:42:24,000 --> 00:42:28,000 combinations of these bases. So here's the spot of blood at the 544 00:42:28,000 --> 00:42:32,000 crime scene. OK? Our red blood cells do not have 545 00:42:32,000 --> 00:42:37,000 nuclei so you cannot get DNA from those, but there are enough white 546 00:42:37,000 --> 00:42:42,000 blood cells that do have nuclei so you can. And, 547 00:42:42,000 --> 00:42:47,000 actually, you know from PCR now that you need very little to amplify 548 00:42:47,000 --> 00:42:52,000 something up by PCR. One cell is sufficient, 549 00:42:52,000 --> 00:42:58,000 right? It's pushing the technology, but you can really use one cell. 550 00:42:58,000 --> 00:43:03,000 So there are plenty of cells in a spot of blood at a crime scene to 551 00:43:03,000 --> 00:43:08,000 isolate the DNA and to PCR amplify the regions surrounding the SNP. 
552 00:43:08,000 --> 00:43:13,000 So you're not just dealing with these two nucleotides or the choice 553 00:43:13,000 --> 00:43:19,000 of these two nucleotides at the SNP. You've got a little piece of DNA 554 00:43:19,000 --> 00:43:24,000 that's usually maybe 20 or so bases that includes this choice of single 555 00:43:24,000 --> 00:43:29,000 nucleotide polymorphism. So you amplify the SNP region, 556 00:43:29,000 --> 00:43:34,000 OK, a region that's constant, that includes the nucleotide polymorphism, 557 00:43:34,000 --> 00:43:39,000 and you determine the sequence at the different single nucleotide 558 00:43:39,000 --> 00:43:44,000 polymorphism regions. So you might get someone who, 559 00:43:44,000 --> 00:43:49,000 at the red position you can be A or C, at the green you can be G. 560 00:43:49,000 --> 00:43:54,000 OK, let's have an example here. You can get genotypes where at red 561 00:43:54,000 --> 00:43:59,000 you're A or C, green you're G or G, 562 00:43:59,000 --> 00:44:04,000 purple GT, and yellow you can be A or C. 563 00:44:04,000 --> 00:44:07,000 And here the example is C and C. So here are the four suspects, 564 00:44:07,000 --> 00:44:11,000 numbers one to four. OK. And here are their genotypes. 565 00:44:11,000 --> 00:44:15,000 OK. And here is the spot of blood at the crime scene that actually has 566 00:44:15,000 --> 00:44:18,000 this genotype. OK. So let me go back here. 567 00:44:18,000 --> 00:44:22,000 This is the genotype in the blood at the crime scene. 568 00:44:22,000 --> 00:44:26,000 OK. So the red sequence on one chromosome is an A, 569 00:44:26,000 --> 00:44:30,000 on the other is a C, so you have AC. 570 00:44:30,000 --> 00:44:34,000 On the other, the green sequence you have GG, purple you have GT, 571 00:44:34,000 --> 00:44:38,000 and yellow CC. So you're looking to see whether or not any of the 572 00:44:38,000 --> 00:44:43,000 suspect genotypes match up with the spot of blood, right?
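The comparison itself is simple enough to sketch in Python. The crime scene genotype below is the one just read out (red AC, green GG, purple GT, yellow CC). The four suspect genotypes are hypothetical stand-ins for the ones on the slide, arranged here so that exactly one suspect matches, and the function names are illustrative.

```python
# Genotype of the blood spot as read out in the lecture:
# red AC, green GG, purple GT, yellow CC.
crime_scene = {"red": "AC", "green": "GG", "purple": "GT", "yellow": "CC"}

# Hypothetical suspect genotypes, invented for illustration.
suspects = {
    1: {"red": "AA", "green": "GG", "purple": "GT", "yellow": "AC"},
    2: {"red": "AC", "green": "GG", "purple": "TT", "yellow": "CC"},
    3: {"red": "CA", "green": "GG", "purple": "TG", "yellow": "CC"},
    4: {"red": "CC", "green": "GG", "purple": "GG", "yellow": "AC"},
}

def normalize(genotype):
    """Order the two alleles so that 'CA' and 'AC' compare as equal."""
    return "".join(sorted(genotype))

def matching_suspects(sample, suspects):
    """Return the suspects whose genotype equals the sample's at every SNP."""
    return [
        number
        for number, genotype in suspects.items()
        if all(
            normalize(genotype[snp]) == normalize(alleles)
            for snp, alleles in sample.items()
        )
    ]

print(matching_suspects(crime_scene, suspects))  # [3]
```

The two alleles are sorted before comparing because the readout does not say which homologous chromosome carries which base.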
573 00:44:43,000 --> 00:44:47,000 So we're assuming that a spot of blood, you know, 574 00:44:47,000 --> 00:44:52,000 comes from one of the suspects that was attacked by the person who was 575 00:44:52,000 --> 00:44:56,000 the victim. OK. So you have a victim with scratch. 576 00:44:56,000 --> 00:45:01,000 Someone has a spot of blood. And you see whether or not, 577 00:45:01,000 --> 00:45:05,000 or you can use semen samples, you can see whether or not the DNA 578 00:45:05,000 --> 00:45:09,000 in the human tissue that is believed to come from the attacker 579 00:45:09,000 --> 00:45:13,000 matches any of the suspects' genotypes. So there are a lot of 580 00:45:13,000 --> 00:45:17,000 assumptions there, right? You have to have tissue at 581 00:45:17,000 --> 00:45:21,000 the crime scene that you believe to come from the attacker. 582 00:45:21,000 --> 00:45:25,000 And then, once you have that, you can determine its genotype and 583 00:45:25,000 --> 00:45:30,000 compare it to the genotypes of the suspects. 584 00:45:30,000 --> 00:45:35,000 And you find, for example, here that, let's see, yeah, 585 00:45:35,000 --> 00:45:40,000 so I believe the suspect number three has the same genotype as the 586 00:45:40,000 --> 00:45:45,000 DNA that was in the spot of blood at the crime scene. 587 00:45:45,000 --> 00:45:50,000 And that would be some evidence that this suspect number three was 588 00:45:50,000 --> 00:45:55,000 the person who did it. Now, in actual fact, you do this 589 00:45:55,000 --> 00:46:00,000 not just for four SNPs, you do it for thousands of SNPs. 590 00:46:00,000 --> 00:46:04,000 You don't usually do this for 300,000 SNPs because that's expensive and 591 00:46:04,000 --> 00:46:08,000 it's a lot of work. And forensics doesn't put that much 592 00:46:08,000 --> 00:46:12,000 money into this. However, the more SNPs you use for 593 00:46:12,000 --> 00:46:17,000 genotyping the more sure you are of the suspect's identity. 594 00:46:17,000 --> 00:46:21,000 OK?
Because it's really a matter of frequency of whether or not 595 00:46:21,000 --> 00:46:25,000 you're going to get the same combination of these different SNP 596 00:46:25,000 --> 00:46:30,000 bases in different potential suspects. 597 00:46:30,000 --> 00:46:33,000 So the greater the spectrum of SNPs you look at, the more sure you are 598 00:46:33,000 --> 00:46:37,000 of the suspect's identity. Now, in some cases this has been 599 00:46:37,000 --> 00:46:41,000 very, very useful. And there are a number of people on 600 00:46:41,000 --> 00:46:45,000 Death Row who have been exonerated by going back to DNA recovered from 601 00:46:45,000 --> 00:46:49,000 the crime scene sometimes years ago, doing SNP mapping and showing that 602 00:46:49,000 --> 00:46:53,000 they really couldn't have done it because the genotypes did not match 603 00:46:53,000 --> 00:46:57,000 up. Usually these were rape cases and the semen genotype just did not 604 00:46:57,000 --> 00:47:01,000 match up with the genotype of the person on Death Row. 605 00:47:01,000 --> 00:47:05,000 So this is very valuable technology. OK. It was used in the O.J. 606 00:47:05,000 --> 00:47:10,000 Simpson trial, but not as well as it could have 607 00:47:10,000 --> 00:47:14,000 been which led to equivocation there. OK. So time is fleeting. 608 00:47:14,000 --> 00:47:19,000 I'm going to mention a technology to you in the last couple of minutes, 609 00:47:19,000 --> 00:47:24,000 and then we'll come back to it as we go on through later parts of the 610 00:47:24,000 --> 00:47:29,000 course. So I've talked today about DNA sequencing. 611 00:47:29,000 --> 00:47:33,000 I've talked about using polymorphisms to genotype people 612 00:47:33,000 --> 00:47:38,000 either, well, for disease alleles I focused on who-done-its. 613 00:47:38,000 --> 00:47:43,000 Something else that I want to throw out at you at this point is the 614 00:47:43,000 --> 00:47:48,000 notion of transgenic technology.
And I'm going to tell you what 615 00:47:48,000 --> 00:47:52,000 transgenic organisms are as part of completing the Recombinant DNA 616 00:47:52,000 --> 00:47:57,000 Module. And then we'll come back in future modules and talk more about 617 00:47:57,000 --> 00:48:02,000 how you make these things. But I want to have this as part of 618 00:48:02,000 --> 00:48:07,000 your compendium now. A transgenic animal or transgenic 619 00:48:07,000 --> 00:48:12,000 organism is an organism where you have manipulated its genome in some 620 00:48:12,000 --> 00:48:17,000 way, where you've either inserted extra DNA into its genome or you've 621 00:48:17,000 --> 00:48:22,000 removed DNA from its genome or you've done something to its genome 622 00:48:22,000 --> 00:48:27,000 such that it was not the organism that you started off with. 623 00:48:27,000 --> 00:48:32,000 Genetically modified organisms. The food that you eat that is 624 00:48:32,000 --> 00:48:36,000 genetically modified has had its genome tampered with. 625 00:48:36,000 --> 00:48:40,000 This type of transgenic technology is very, very useful, 626 00:48:40,000 --> 00:48:44,000 not only for creating genetically modified foods, 627 00:48:44,000 --> 00:48:49,000 but it's very, very useful for creating disease models of animals. 628 00:48:49,000 --> 00:48:53,000 And I'll tell you now that there is a mouse model of human familial 629 00:48:53,000 --> 00:48:57,000 hypercholesterolemia that has been created by making a specific 630 00:48:57,000 --> 00:49:02,000 mutation, that T to A mutation in the mouse LDL receptor gene. 631 00:49:02,000 --> 00:49:06,000 Another thing that is extremely useful about transgenic animals is 632 00:49:06,000 --> 00:49:10,000 that you can get them to make specific proteins. 
633 00:49:10,000 --> 00:49:14,000 So, for example, there are goats that have had inserted into their 634 00:49:14,000 --> 00:49:18,000 genomes genes that encode for particular medications, 635 00:49:18,000 --> 00:49:22,000 for particular drugs. And you can get these drugs out of the milk of 636 00:49:22,000 --> 00:49:26,000 the goats usually or out of the serum of the goats because they are 637 00:49:26,000 --> 00:49:30,000 constitutively producing them because you've put various genes 638 00:49:30,000 --> 00:49:35,000 into their genome. So I'm going to leave it there and 639 00:49:35,000 --> 00:49:38,000 we'll talk about how to make transgenics in a future lecture.