1 00:00:00,000 --> 00:00:04,000 There was a little confusion with dideoxies in one sense, 2 00:00:04,000 --> 00:00:08,000 and some of these things like the PCR you're going to have to sort of 3 00:00:08,000 --> 00:00:12,000 sit down and actually think about, but the principle of the dideoxies is this: 4 00:00:12,000 --> 00:00:17,000 if we were making a chain of beads that had a hook on one end and a 5 00:00:17,000 --> 00:00:21,000 little hole on the other, and we were joining these things 6 00:00:21,000 --> 00:00:25,000 together, we could obviously make a chain that went on. And then we 7 00:00:25,000 --> 00:00:30,000 could hook another one in, and so on. 8 00:00:30,000 --> 00:00:34,000 And if we had a bunch of beads like this and every now and then we threw in 9 00:00:34,000 --> 00:00:39,000 a very small number that didn't have the hook on the end, 10 00:00:39,000 --> 00:00:44,000 any time this particular chain was elongating and we put on one of 11 00:00:44,000 --> 00:00:49,000 these things, the chain would stop because you haven't got anywhere to 12 00:00:49,000 --> 00:00:54,000 join onto it. It's the same when you add a dideoxy into a polymerase reaction. 13 00:00:54,000 --> 00:00:59,000 A chain that gets the dideoxy doesn't have anything to join onto at the 14 00:00:59,000 --> 00:01:04,000 end, and that chain will stop. If we only added this, 15 00:01:04,000 --> 00:01:10,000 the entire reaction would stop, and every chain would stop the first 16 00:01:10,000 --> 00:01:16,000 time a dideoxy got incorporated. The trick is to put a little bit in. 17 00:01:16,000 --> 00:01:21,000 So, a few of the molecules stop. Everything else keeps going. The next 18 00:01:21,000 --> 00:01:27,000 time a dideoxy gets incorporated, the chain will stop there. And out 19 00:01:27,000 --> 00:01:33,000 of this, you will generate a family of polymers that are of 20 00:01:33,000 --> 00:01:38,000 different lengths. 
Each one will terminate with a 21 00:01:38,000 --> 00:01:43,000 dideoxy nucleotide, and if the dideoxy nucleotide we 22 00:01:43,000 --> 00:01:49,000 used in that particular reaction was, let's say, dideoxy ATP, 23 00:01:49,000 --> 00:01:54,000 that means that an A was the last nucleotide added to every one of 24 00:01:54,000 --> 00:02:00,000 those. And we can separate these on the basis of size. 25 00:02:00,000 --> 00:02:04,000 And if I ran them out on a gel, I'd see something like that. And 26 00:02:04,000 --> 00:02:09,000 that would tell me that when that polymerase was coming along, 27 00:02:09,000 --> 00:02:13,000 that was the first time it saw an A. A few stopped there, polymerized a 28 00:02:13,000 --> 00:02:18,000 few more. Then it put in another A, put in some other things, and so on. 29 00:02:18,000 --> 00:02:22,000 And that by itself wouldn't tell us the sequence. But if I did that 30 00:02:22,000 --> 00:02:27,000 reaction four times in a row, then I could tell. 31 00:02:27,000 --> 00:02:31,000 In the old days, they didn't used to use dyes. 32 00:02:31,000 --> 00:02:35,000 We just did P32 on it as a label, and then you'd run the four 33 00:02:35,000 --> 00:02:39,000 reactions side-by-side. And this would be with dideoxy ATP. 34 00:02:39,000 --> 00:02:43,000 You'd see a pattern like that, and maybe with dideoxy TTP, 35 00:02:43,000 --> 00:02:47,000 you'd see something like this. And when you got the rest of them, 36 00:02:47,000 --> 00:02:51,000 you'd kind of end up working out what the sequence was by looking 37 00:02:51,000 --> 00:02:55,000 across the four lanes. This business of using the dye is 38 00:02:55,000 --> 00:02:59,000 just one more step up in the engineering side that enables the 39 00:02:59,000 --> 00:03:03,000 thing to be done automatically. And it's pretty well explained in 40 00:03:03,000 --> 00:03:07,000 your textbooks. OK, PCR, someone was confused as to 41 00:03:07,000 --> 00:03:11,000 why we didn't just let the cell do it. 
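To make the four-lane readout concrete, here is a minimal sketch of the idea in Python. It is a toy model, not a lab protocol: the sequence `new_strand` is made up for illustration, and the function names are hypothetical. Each lane holds the fragment lengths that end in one particular dideoxy nucleotide, and reading across the lanes from the shortest fragment to the longest recovers the sequence.

```python
# Toy model of dideoxy (chain-termination) sequencing.
# `new_strand` is a hypothetical sequence of the strand being synthesized.

def termination_lengths(new_strand, dideoxy_base):
    """Lengths of the fragments that end where `dideoxy_base` was incorporated."""
    return [i + 1 for i, base in enumerate(new_strand) if base == dideoxy_base]

def read_gel(lanes):
    """Read the sequence by going from the shortest fragment to the longest."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))

new_strand = "GATTACAGT"

# One reaction (one gel lane) per dideoxy nucleotide.
lanes = {base: termination_lengths(new_strand, base) for base in "ACGT"}

recovered = read_gel(lanes)
print(lanes)
print(recovered)
```

A single lane only says where one base occurs, which is why one reaction by itself does not give the sequence; the four lanes together do.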
Well, the cell does a great job, 42 00:03:11,000 --> 00:03:16,000 but if you are a molecular biologist trying to understand the basis of 43 00:03:16,000 --> 00:03:20,000 life or if you're a biological engineer, and you want to produce 44 00:03:20,000 --> 00:03:25,000 something, you need to get hold of a particular piece of DNA. 45 00:03:25,000 --> 00:03:29,000 Or, if you're a forensic investigator, and you've got a tiny, 46 00:03:29,000 --> 00:03:33,000 tiny sample of human DNA, and you want to know whose it is, 47 00:03:33,000 --> 00:03:38,000 you have to make more of it. And, that's what PCR is all about. 48 00:03:38,000 --> 00:03:42,000 So I'm going to switch just over to the net for a minute. 49 00:03:42,000 --> 00:03:46,000 I think this first site, I just want to show you something, 50 00:03:46,000 --> 00:03:51,000 how somebody functions in a lab now with all these genes out there. 51 00:03:51,000 --> 00:03:55,000 And then, I'm going to show you a little animation for PCR that will 52 00:03:55,000 --> 00:03:59,000 help. So, if you just go to Google and type NCBI, 53 00:03:59,000 --> 00:04:04,000 that's the National Center for Biotechnology Information. 54 00:04:04,000 --> 00:04:10,000 And, the Dolan Learning Center is a center that Cold Spring Harbor 55 00:04:10,000 --> 00:04:16,000 Laboratory has set up to teach people about DNA. 56 00:04:16,000 --> 00:04:22,000 So let me just see here. So, let's just go to, OK, 57 00:04:22,000 --> 00:04:28,000 let's use, whoops, this is going to seize on us. OK, let's 58 00:04:28,000 --> 00:04:43,000 find how it happens. 59 00:04:43,000 --> 00:04:47,000 OK, so here's this National Center for Biotechnology Information. 60 00:04:47,000 --> 00:04:51,000 There's all sorts of things you can search for, and I'm not expecting 61 00:04:51,000 --> 00:04:55,000 you to know the site. I just want to sort of give you a 62 00:04:55,000 --> 00:04:59,000 demo. 
If I was sitting in my office, this is the sort of thing I can do 63 00:04:59,000 --> 00:05:03,000 easily. Rather than looking for DNA sequence, 64 00:05:03,000 --> 00:05:07,000 I'm going to look for the translated sequence of the protein that's 65 00:05:07,000 --> 00:05:11,000 encoded by the gene, where the computer's gone through and used the 66 00:05:11,000 --> 00:05:15,000 genetic code to tell me the sequence of the protein. I told you about 67 00:05:15,000 --> 00:05:19,000 sequencing a mismatch repair gene back in the '80s. 68 00:05:19,000 --> 00:05:23,000 It was called MutS, and I'll put in Walker GC, 69 00:05:23,000 --> 00:05:27,000 and hopefully that will get us to the thing. 70 00:05:27,000 --> 00:05:31,000 And there, the very first hit is DNA repair protein MutS, Salmonella 71 00:05:31,000 --> 00:05:36,000 typhimurium. That's the one I sequenced. So I'll just go to that. 72 00:05:36,000 --> 00:05:41,000 It has various ways of displaying the sequence. I'm going to switch 73 00:05:41,000 --> 00:05:45,000 to FASTA, which is a very easy way to see it. Now what you see is the 74 00:05:45,000 --> 00:05:50,000 sequence of the protein using a one-letter code, where one letter stands 75 00:05:50,000 --> 00:05:55,000 for each amino acid. K is lysine. A is alanine, 76 00:05:55,000 --> 00:06:00,000 and so on. I'm just going to copy that piece of sequence. 77 00:06:00,000 --> 00:06:03,000 OK, so that's the bacterial gene for mismatch repair. 78 00:06:03,000 --> 00:06:06,000 At the time I put that in the database, there wasn't anything else 79 00:06:06,000 --> 00:06:09,000 like it, except for the gene from Streptococcus pneumoniae, 80 00:06:09,000 --> 00:06:13,000 which I found out someone else was sequencing by phoning around in the 81 00:06:13,000 --> 00:06:16,000 field. I'm going to go back to the main site and I'm going to use a 82 00:06:16,000 --> 00:06:19,000 program called BLAST, which lets you search the entire 83 00:06:19,000 --> 00:06:23,000 database. 
I'll use a protein BLAST. I'm going to take a protein 84 00:06:23,000 --> 00:06:26,000 sequence, and I'm going to ask what else is out there in terms 85 00:06:26,000 --> 00:06:32,000 of protein sequences. I'll paste in this bacterial 86 00:06:32,000 --> 00:06:40,000 sequence, and then I'm going to, if I can, manage this thing. Let's 87 00:06:40,000 --> 00:06:49,000 see if I can get myself down here. OK, over here I'll probably do, OK, 88 00:06:49,000 --> 00:06:57,000 so I'm going to limit it, let's just search the human genome. 89 00:06:57,000 --> 00:07:05,000 That's all we've got to do. And what did I have to do to get 90 00:07:05,000 --> 00:07:13,000 this thing to fit? Which button? Go to the right. 91 00:07:13,000 --> 00:07:19,000 Can you just come up here for a second to help me get this set up? 92 00:07:19,000 --> 00:07:25,000 I'm computer limited here apparently. OK, 93 00:07:25,000 --> 00:07:31,000 that one, OK, great, so why don't we try this again? 94 00:07:31,000 --> 00:07:51,000 [PAUSE] Sorry about this. 95 00:07:51,000 --> 00:08:01,000 We'll see if we can get this thing to go. I have what? 96 00:08:01,000 --> 00:08:15,000 Yeah, that's OK though. That should be fine. Try again. 97 00:08:15,000 --> 00:08:30,000 Let's see if I can get the thing to work. 98 00:08:30,000 --> 00:08:35,000 OK, so it's got it now. It'll tell me. It's searching all 99 00:08:35,000 --> 00:08:40,000 the sequence that's out there. There's just an unbelievable amount 100 00:08:40,000 --> 00:08:45,000 of sequence. That's just how long it took. It's showing me here a 101 00:08:45,000 --> 00:08:51,000 diagrammatic representation of the hits. Then I can see that the 102 00:08:51,000 --> 00:08:56,000 very first hit was MutS homolog 3 for humans. And if I go down here, 103 00:08:56,000 --> 00:09:01,000 we can actually see an alignment, with the bacterial gene on the top line, 104 00:09:01,000 --> 00:09:07,000 and the line below is the sequence of that particular human homolog. 
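One simple way to put a number on an alignment like this is to count the aligned positions where the two proteins carry the same amino acid. The sketch below does that in Python. The two short sequences are invented for illustration and are not the real MutS sequences, and `percent_identity` is a hypothetical helper, not something the BLAST site provides under that name.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions (ignoring gap characters '-') with the same residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

# Made-up aligned fragments in the one-letter amino acid code.
bacterial = "GKSTYMRQ"
human = "GKSTFMRQ"

identity = percent_identity(bacterial, human)
print(identity)
```

Here seven of the eight aligned positions match, so the score is 87.5 percent.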
105 00:09:07,000 --> 00:09:10,000 And you can see in between all the things that are in common, 106 00:09:10,000 --> 00:09:13,000 and particularly down at the C-terminus of the protein, 107 00:09:13,000 --> 00:09:17,000 you can see there's very strong conservation. You may not think 108 00:09:17,000 --> 00:09:20,000 that that's impressive, but remember for every one of those 109 00:09:20,000 --> 00:09:24,000 positions, there's 20 possibilities. So, if you get that many in a row, 110 00:09:24,000 --> 00:09:27,000 that's the same gene basically. And when you take the structure, 111 00:09:27,000 --> 00:09:31,000 the structure's going to be very, very similar. 112 00:09:31,000 --> 00:09:38,000 And it does mismatch repair in both. Just to try and give you an idea of 113 00:09:38,000 --> 00:09:45,000 how you do sequence now, because with all these genomes done, 114 00:09:45,000 --> 00:09:52,000 you do the vast majority of it by computer, rather than some other way. 115 00:09:52,000 --> 00:09:59,000 I want to take you and show you this. DNA learning: if you go there, 116 00:09:59,000 --> 00:10:06,000 the second thing is a set of animations. 117 00:10:06,000 --> 00:10:11,000 Go to the animations. There's one on polymerase chain 118 00:10:11,000 --> 00:10:16,000 reaction. And, I'm going to just show you this 119 00:10:16,000 --> 00:10:21,000 because this is a nice little, let's see if we can get this thing 120 00:10:21,000 --> 00:10:26,000 to center. OK we'll have to see whether this is going to work. 121 00:10:26,000 --> 00:10:31,000 OK, so this is the principle of, you can go do this at your leisure, 122 00:10:31,000 --> 00:10:37,000 but the idea is to heat the DNA up, the strands come apart. 
123 00:10:37,000 --> 00:10:40,000 Then, we're going to take these two little primers, 124 00:10:40,000 --> 00:10:43,000 not promoters, which I think someone was confused about, 125 00:10:43,000 --> 00:10:46,000 little pieces of DNA that are complementary, 126 00:10:46,000 --> 00:10:50,000 and anneal them. Then we add DNA polymerase. 127 00:10:50,000 --> 00:10:53,000 You know what happens then. We extend those primers. 128 00:10:53,000 --> 00:10:56,000 That was the first cycle. Do the same thing again. Go to the 129 00:10:56,000 --> 00:11:00,000 second cycle. This is what I was drawing on the board the other day. 130 00:11:00,000 --> 00:11:06,000 Now we're going to denature the DNA. The strands come apart. Let's let 131 00:11:06,000 --> 00:11:13,000 the polymerase extend them. Let's go to the third cycle, 132 00:11:13,000 --> 00:11:19,000 denature the DNA, anneal the primers, extend the primers, 133 00:11:19,000 --> 00:11:26,000 and now for the first time, we've got what we were shooting for. 134 00:11:26,000 --> 00:11:35,000 We have a double-stranded copy of 135 00:11:35,000 --> 00:11:45,000 just the DNA that was defined between those primers. 136 00:11:45,000 --> 00:11:55,000 OK, I think this actually, I'm going to go back one. Oops, 137 00:11:55,000 --> 00:12:02,000 OK. If you go, then, 138 00:12:02,000 --> 00:12:06,000 to the amplification graph, what they're doing here is they're 139 00:12:06,000 --> 00:12:11,000 showing you what happens as you do successive cycles. 140 00:12:11,000 --> 00:12:16,000 So, at the first, oh, it's down here. Just a minute. 141 00:12:16,000 --> 00:12:20,000 If we do the first cycle, we end up with two DNA copies. 142 00:12:20,000 --> 00:12:25,000 That's just plotting what I showed you. The next one: we have four. 143 00:12:25,000 --> 00:12:30,000 We haven't yet got to this target sequence. 
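The counts on the amplification graph can be reproduced with a small bookkeeping sketch in Python. It is an idealized model under stated assumptions: we start from one double-stranded molecule, every strand is copied in every cycle, and the function name `pcr_cycle_counts` is made up for illustration. Strands fall into three kinds: the two original strands, long products copied from an original (one end set by a primer), and target strands with both ends set by the primers.

```python
def pcr_cycle_counts(cycles):
    """Return (cycle, total double-stranded copies, copies that are exactly the target)."""
    original, long_product, target = 2, 0, 0  # strand counts before the first cycle
    rows = []
    for cycle in range(1, cycles + 1):
        # Every existing strand is a template, so it ends the cycle in one duplex.
        total_duplexes = original + long_product + target
        # Only a target strand templates a duplex in which both strands are target length.
        target_duplexes = target
        # New strands: originals give long products; long products and targets give targets.
        long_product, target = long_product + original, target + long_product + target
        rows.append((cycle, total_duplexes, target_duplexes))
    return rows

rows = pcr_cycle_counts(7)
for cycle, total, targets in rows:
    print(cycle, total, targets)
```

In this model the total doubles every cycle, while the number of exact target copies after n cycles works out to 2^n minus 2n: none after cycles one and two, and two after cycle three.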
144 00:12:30,000 --> 00:12:36,000 By the next cycle, we now have two copies of the target 145 00:12:36,000 --> 00:12:43,000 sequence plus these other things. But if you keep going, let's say by 146 00:12:43,000 --> 00:12:50,000 the time we're at seven cycles, the number of targets is up to 114. 147 00:12:50,000 --> 00:12:56,000 The number of DNA copies is 128. But, if we keep going like this, 148 00:12:56,000 --> 00:13:03,000 we'll find out that the target copies become the vast majority of 149 00:13:03,000 --> 00:13:09,000 the sequences that are in there. So, by the time that you're up in 150 00:13:09,000 --> 00:13:13,000 the 30 cycles, or something like that, 151 00:13:13,000 --> 00:13:18,000 there's only a handful of the original things, 152 00:13:18,000 --> 00:13:22,000 or almost all, and that, I hope will help some of you who 153 00:13:22,000 --> 00:13:27,000 might have had problems with understanding the PCR. 154 00:13:27,000 --> 00:13:32,000 So, what I'm going to do is tell you a few more things about what you can 155 00:13:32,000 --> 00:13:37,000 do with recombinant DNA, this recombinant DNA technology, 156 00:13:37,000 --> 00:13:42,000 because it's just so powerful. And I can only sort of give you a few 157 00:13:42,000 --> 00:13:47,000 ideas, and show you a few variations. But, most of these things are just 158 00:13:47,000 --> 00:13:53,000 taking principles that you've already learned as part of the basic 159 00:13:53,000 --> 00:13:58,000 biology I'm trying to tell you, and then using them like an engineer 160 00:13:58,000 --> 00:14:03,000 to achieve some applied purpose. For example suppose I wanted to 161 00:14:03,000 --> 00:14:07,000 produce a human protein, and try and produce it in a 162 00:14:07,000 --> 00:14:11,000 bacterium. That would be great. I could take the one gene. I could 163 00:14:11,000 --> 00:14:15,000 grow a fermenter load of E. 
coli, and if I got it right, 164 00:14:15,000 --> 00:14:19,000 then I'd be able to make a lot of this protein instead of trying to 165 00:14:19,000 --> 00:14:23,000 isolate it from some human source or something like that. 166 00:14:23,000 --> 00:14:27,000 There's a couple of problems. We talked about them. One is the 167 00:14:27,000 --> 00:14:32,000 problem of promoters. Another one is that human DNA would 168 00:14:32,000 --> 00:14:36,000 have introns in it. And, a bacterium doesn't recognize the 169 00:14:36,000 --> 00:14:41,000 human promoters. It wouldn't start to make an RNA in 170 00:14:41,000 --> 00:14:46,000 the right place, and it doesn't know what to do about 171 00:14:46,000 --> 00:14:50,000 splicing out the introns. So, let's address the introns first. 172 00:14:50,000 --> 00:14:55,000 There is a way of handling that that's quite easy. 173 00:14:55,000 --> 00:15:00,000 And that's to make what's called a cDNA library. 174 00:15:00,000 --> 00:15:08,000 So, if we have DNA, and then we get the RNA, 175 00:15:08,000 --> 00:15:17,000 we get the RNA copy [SOUND OFF/THEN ON] including these intron sequences. 176 00:15:17,000 --> 00:15:25,000 And then what happens, this is RNA splicing that we talked 177 00:15:25,000 --> 00:15:34,000 about. And what we get out of that would be an mRNA, 178 00:15:34,000 --> 00:15:43,000 in which the introns have been removed. 179 00:15:43,000 --> 00:15:47,000 So, eukaryotic cells, my cells, know how to express the 180 00:15:47,000 --> 00:15:51,000 gene, so they make the RNA, and they know how to get rid of the 181 00:15:51,000 --> 00:15:55,000 introns. So, if I were to isolate messenger RNA 182 00:15:55,000 --> 00:15:59,000 that's been spliced, from me, or you, or anything, what we would 183 00:15:59,000 --> 00:16:03,000 have is a population of RNA molecules that don't have 184 00:16:03,000 --> 00:16:10,000 introns anymore. Anybody remember any way we could 185 00:16:10,000 --> 00:16:19,000 get from RNA back to DNA? 
Reverse transcriptase. So, 186 00:16:19,000 --> 00:16:28,000 if we used reverse transcriptase, that protein that David Baltimore 187 00:16:28,000 --> 00:16:37,000 discovered in viruses, and which retroviruses use, 188 00:16:37,000 --> 00:16:46,000 now we would have a single-stranded DNA copy of the mRNA. 189 00:16:46,000 --> 00:16:49,000 And then, we could use an ordinary DNA polymerase to get ourselves to 190 00:16:49,000 --> 00:16:53,000 double-stranded DNA. We'd be doing, in essence, 191 00:16:53,000 --> 00:16:57,000 exactly what one of these retroviruses does. 192 00:16:57,000 --> 00:17:03,000 This would give us what's known as a cDNA library, where the genes don't 193 00:17:03,000 --> 00:17:09,000 have introns anymore. So, if I wanted to get at one of my 194 00:17:09,000 --> 00:17:15,000 proteins, one of my genes, and think about expressing it in E. 195 00:17:15,000 --> 00:17:21,000 coli, what I would do is go looking in a cDNA library using the same sort of 196 00:17:21,000 --> 00:17:27,000 approach as we've done, trying to find my gene of interest, 197 00:17:27,000 --> 00:17:33,000 because if I use the cDNA library, now it would just be like 198 00:17:33,000 --> 00:17:37,000 a bacterial gene. You could see the ATG start. 199 00:17:37,000 --> 00:17:41,000 You could get out your handy little genetic code, and you could walk 200 00:17:41,000 --> 00:17:44,000 along, and read out the sequence of the protein. So, 201 00:17:44,000 --> 00:17:48,000 that's part of what you need to do if you wanted to make, 202 00:17:48,000 --> 00:17:52,000 say, a human protein inside of a bacterium. The other problem, which we talked about, 203 00:17:52,000 --> 00:17:55,000 is that since promoters are not a universal language, 204 00:17:55,000 --> 00:17:59,000 what E. coli RNA polymerase sees is different than what human RNA 205 00:17:59,000 --> 00:18:04,000 polymerase sees as a start site. 
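The route from a spliced message to a readable open reading frame can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the pre-mRNA, its single intron, and the helper names (`splice`, `reverse_transcribe`, `second_strand`, `translate`) are invented for illustration, and the intron position is simply given rather than recognized by splice signals. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

def splice(pre_mrna, introns):
    """Remove introns given as (start, end) 0-based, end-exclusive positions."""
    pieces, pos = [], 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[pos:start])
        pos = end
    pieces.append(pre_mrna[pos:])
    return "".join(pieces)

def reverse_transcribe(mrna):
    """First-strand cDNA: the DNA reverse complement of the mRNA."""
    pair = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(mrna))

def second_strand(cdna):
    """Second strand made by an ordinary DNA polymerase (reverse complement)."""
    pair = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(cdna))

def translate(dna):
    """Start at the first ATG and read codons until a stop codon."""
    start = dna.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical pre-mRNA: exon, intron (positions 6 to 15), exon.
pre_mrna = "AUGAAA" + "GUAAGUCCAG" + "GCUUAA"
mrna = splice(pre_mrna, [(6, 16)])
coding_strand = second_strand(reverse_transcribe(mrna))
protein = translate(coding_strand)
print(mrna, coding_strand, protein)
```

With the intron gone, the cDNA reads straight through from ATG to the stop codon, which is what makes it look like a bacterial gene.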
I would have to add in a promoter 206 00:18:04,000 --> 00:18:10,000 that would drive the expression of this open reading frame if I wanted 207 00:18:10,000 --> 00:18:17,000 it to work in E. coli. And that's fairly easily done, 208 00:18:17,000 --> 00:18:23,000 too. The general term for this is expression cloning. 209 00:18:23,000 --> 00:18:30,000 And it would be more or less the same idea. 210 00:18:30,000 --> 00:18:34,000 We'd have a vector that had a cloning site. It has an origin of 211 00:18:34,000 --> 00:18:39,000 replication, and maybe there's a selectable marker such as a drug 212 00:18:39,000 --> 00:18:44,000 resistance. That's the basic kind of vector that I talked about before. 213 00:18:44,000 --> 00:18:49,000 However, if I clone a piece of DNA into that, 214 00:18:49,000 --> 00:18:54,000 it has to have a promoter that can be read in the organism I'm working with, 215 00:18:54,000 --> 00:18:59,000 because it's just got whatever nature gave it, 216 00:18:59,000 --> 00:19:04,000 whatever promoter would be in front of that. 217 00:19:04,000 --> 00:19:09,000 But, if I were to now put into this vector an E. 218 00:19:09,000 --> 00:19:15,000 coli promoter right there, now, if I put just downstream of that 219 00:19:15,000 --> 00:19:21,000 any open reading frame, a human protein let's say, from which we've 220 00:19:21,000 --> 00:19:27,000 gotten rid of the introns, the human gene minus its introns, 221 00:19:27,000 --> 00:19:33,000 which you got from the cDNA library, now when the bacterial polymerase 222 00:19:33,000 --> 00:19:39,000 came along, it would be copying, making a messenger RNA for the human 223 00:19:39,000 --> 00:19:45,000 protein, and we could get it out of that. 224 00:19:45,000 --> 00:19:51,000 And the beauty of that: suppose we took the front part of 225 00:19:51,000 --> 00:19:57,000 the Lac operon, we would have a regulated promoter. 
226 00:19:57,000 --> 00:20:03,000 It would be just everything we studied about Lac if we 227 00:20:03,000 --> 00:20:08,000 were to starve it for, you know, we have to get rid of 228 00:20:08,000 --> 00:20:12,000 glucose, and then if we added lactose or some kind of synthetic 229 00:20:12,000 --> 00:20:16,000 inducer, we can turn the promoter on and off. So, you could grow an 230 00:20:16,000 --> 00:20:20,000 entire fermenter load of bacteria without expressing the gene. 231 00:20:20,000 --> 00:20:24,000 And then, once you had the bacteria all grown up, you could throw in 232 00:20:24,000 --> 00:20:28,000 something that would normally induce the expression of the Lac 233 00:20:28,000 --> 00:20:32,000 regulatory system. And now, instead of making beta 234 00:20:32,000 --> 00:20:36,000 galactosidase, instead it would make the protein 235 00:20:36,000 --> 00:20:40,000 that you are interested in, you with me? It's very pretty. 236 00:20:40,000 --> 00:20:44,000 And in fact, so much of what you can see in this is, 237 00:20:44,000 --> 00:20:47,000 these really basic studies, since the Lac system was one of the 238 00:20:47,000 --> 00:20:51,000 first to really be worked out in detail, we use its parts. 239 00:20:51,000 --> 00:20:55,000 And, there are many vectors around now that have exactly that. 240 00:20:55,000 --> 00:20:59,000 They have the Lac promoter, and you can turn things on and off 241 00:20:59,000 --> 00:21:04,000 in a regulated way, so not only provide a promoter that 242 00:21:04,000 --> 00:21:11,000 works in the organism, but it also gives you a measure of 243 00:21:11,000 --> 00:21:19,000 control. There's another very cute trick, and what we've done sort of 244 00:21:19,000 --> 00:21:26,000 here is we took, say, the promoter for Lac in the 245 00:21:26,000 --> 00:21:32,000 regulatory region. I'll use R to stand for regulatory 246 00:21:32,000 --> 00:21:37,000 region, and then this would be the LacZ sequence. 
247 00:21:37,000 --> 00:21:41,000 What we've really done is we've taken a gene from somewhere 248 00:21:41,000 --> 00:21:46,000 else, let's call it gene X, that had a promoter from gene X. 249 00:21:46,000 --> 00:21:51,000 And, in essence, we're cutting each of them here. And, 250 00:21:51,000 --> 00:21:56,000 now we take the regulatory promoter region from Lac and we put down 251 00:21:56,000 --> 00:22:00,000 below it gene X. And, now we've got this gene whose 252 00:22:00,000 --> 00:22:05,000 product we're interested in producing in a fermenter under the 253 00:22:05,000 --> 00:22:10,000 control of the Lac operon. Well, there's another kind of thing 254 00:22:10,000 --> 00:22:16,000 we can do. We can do it the other way around. We could take LacZ, 255 00:22:16,000 --> 00:22:21,000 which makes beta galactosidase. We could put it under the promoter 256 00:22:21,000 --> 00:22:26,000 regulatory region of gene X. Well, what will happen then? 257 00:22:26,000 --> 00:22:32,000 If that construct is sitting in a cell, anytime that the cell decides 258 00:22:32,000 --> 00:22:37,000 to make gene X, instead it will make beta 259 00:22:37,000 --> 00:22:43,000 galactosidase, which is really easy to assay for. 260 00:22:43,000 --> 00:22:51,000 And, in this sort of strategy, you'd use something like LacZ as a 261 00:22:51,000 --> 00:23:00,000 reporter. In this case beta galactosidase 262 00:23:00,000 --> 00:23:11,000 synthesis, which you can assay for, reports when the promoter of gene 263 00:23:11,000 --> 00:23:23,000 X is functioning. This reporter gene now has the 264 00:23:23,000 --> 00:23:34,000 regulatory characteristics that are imposed upon it by that 265 00:23:34,000 --> 00:23:42,000 particular promoter. So, there's this picture that I've 266 00:23:42,000 --> 00:23:46,000 shown you, this little movie I showed you early on. 267 00:23:46,000 --> 00:23:50,000 You've seen it a couple of times. 
In this case, the reporter is green 268 00:23:50,000 --> 00:23:55,000 fluorescent protein. What Barbara Meyer, who made this 269 00:23:55,000 --> 00:23:59,000 particular construct, did was take the gene for green 270 00:23:59,000 --> 00:24:04,000 fluorescent protein, which started out in a jellyfish as you may remember. 271 00:24:04,000 --> 00:24:09,000 And, the protein folds up, and ends up being fluorescent. 272 00:24:09,000 --> 00:24:15,000 So, we can tell when it's expressed very easily. And in this case, 273 00:24:15,000 --> 00:24:21,000 you'll notice the whole worm isn't glowing. 274 00:24:21,000 --> 00:24:27,000 And so, it's under the control of a promoter regulatory region that 275 00:24:27,000 --> 00:24:33,000 is expressed only in specific body parts. 276 00:24:33,000 --> 00:24:36,000 And so, you can see where that promoter is working by just looking 277 00:24:36,000 --> 00:24:39,000 at the worm. In the case of something like the mouse that we 278 00:24:39,000 --> 00:24:43,000 talked about, it's a pretty uniform expression, at least in the skin. 279 00:24:43,000 --> 00:24:46,000 So, in that case, the green fluorescent protein was 280 00:24:46,000 --> 00:24:50,000 probably put under a promoter that's expressed in most of the 281 00:24:50,000 --> 00:24:53,000 cells in the body, at least certainly all the ones in 282 00:24:53,000 --> 00:24:57,000 the mouse skin. I don't know the details of that. 283 00:24:57,000 --> 00:25:01,000 Ditto over here. It was probably something that was 284 00:25:01,000 --> 00:25:07,000 expressed in most of the body cells, but you also could have put 285 00:25:07,000 --> 00:25:12,000 in something that was just expressed in some very little bits. 286 00:25:12,000 --> 00:25:18,000 So, depending on how you do the construct, there are a lot of 287 00:25:18,000 --> 00:25:24,000 different things that people can do with this sort of thing. 
288 00:25:24,000 --> 00:25:29,000 OK, one more category that comes out of the sort of thing, 289 00:25:29,000 --> 00:25:35,000 is if we have a gene of some type, I don't know what it does but I'd 290 00:25:35,000 --> 00:25:40,000 like to find out. You know, at least budding 291 00:25:40,000 --> 00:25:44,000 geneticists, know what we'd like to do. We'd probably just like to 292 00:25:44,000 --> 00:25:48,000 disable that gene very specifically, and then look at the live organism 293 00:25:48,000 --> 00:25:53,000 to see what happens. And, the principle is the same 294 00:25:53,000 --> 00:25:57,000 whether you're doing it in E. coli or a mouse. It gets a little 295 00:25:57,000 --> 00:26:02,000 more complicated for technical reasons doing it in a mouse. 296 00:26:02,000 --> 00:26:06,000 But the idea is exactly the same. And here's the strategy. So, we'll 297 00:26:06,000 --> 00:26:11,000 just take a piece of DNA from the organism. And sitting at here is 298 00:26:11,000 --> 00:26:15,000 this open reading frame that we've seen. We don't know what its 299 00:26:15,000 --> 00:26:20,000 function is. We think if I could knock it out, get rid of its 300 00:26:20,000 --> 00:26:24,000 function, I'll look at the organism. Maybe I can make a guess then. So, 301 00:26:24,000 --> 00:26:29,000 if we were to cut the gene somewhere with a restriction site, 302 00:26:29,000 --> 00:26:33,000 and then we were to take, for example, a gene encoding a drug 303 00:26:33,000 --> 00:26:38,000 resistance or something like that, and insert it at that point, what we 304 00:26:38,000 --> 00:26:43,000 would end up with is this piece of the organism's DNA. 305 00:26:43,000 --> 00:26:51,000 The first part of gene X, then a drug resistance, then the 306 00:26:51,000 --> 00:26:59,000 last part of gene X, and some more sequence from the 307 00:26:59,000 --> 00:27:06,000 organism. Now, this would be, 308 00:27:06,000 --> 00:27:11,000 we'd have this in a test tube. 
We could do it by the kind of 309 00:27:11,000 --> 00:27:16,000 recombinant DNA manipulations that we have. And what would happen if I 310 00:27:16,000 --> 00:27:21,000 were to put, now, let's keep it with bacteria where 311 00:27:21,000 --> 00:27:26,000 it's easy to see. If I were to take that piece of DNA, 312 00:27:26,000 --> 00:27:31,000 put it inside a living cell, what's going to happen? 313 00:27:31,000 --> 00:27:38,000 Well, let's make this, say, here's the end of it. 314 00:27:38,000 --> 00:27:45,000 That's all we've got. Well, inside the living cell, we of course 315 00:27:45,000 --> 00:27:52,000 have the entire genome. And then we come to this part. 316 00:27:52,000 --> 00:28:00,000 We have gene X. Then we'd have this going. 317 00:28:00,000 --> 00:28:04,000 That would be the whole thing. Well, this particular piece doesn't 318 00:28:04,000 --> 00:28:09,000 have an origin of replication. It's not joined to a vector. It's 319 00:28:09,000 --> 00:28:13,000 just sitting there. So, if the cell divides, 320 00:28:13,000 --> 00:28:18,000 it's not going to get replicated. So, if I select for a drug 321 00:28:18,000 --> 00:28:22,000 resistance that's on that piece of DNA, unless something happens I'm 322 00:28:22,000 --> 00:28:27,000 not going to get a drug-resistant bacteria. But you do know a way 323 00:28:27,000 --> 00:28:32,000 that this thing could join to an origin of replication. 324 00:28:32,000 --> 00:28:36,000 It could join to the origin of replication that's on the bacteria 325 00:28:36,000 --> 00:28:41,000 chromosome. And, the way to do it would be by 326 00:28:41,000 --> 00:28:46,000 undergoing genetic recombination over here, because this DNA is 327 00:28:46,000 --> 00:28:51,000 exactly the same as on that side, and over here it's the same thing. 328 00:28:51,000 --> 00:28:56,000 This DNA is the same as that side. 
So, if that genetic exchange 329 00:28:56,000 --> 00:29:01,000 happened, what would happen, even if it happened rarely, was this 330 00:29:01,000 --> 00:29:06,000 piece of DNA would replace the piece of DNA that's in there. 331 00:29:06,000 --> 00:29:10,000 I'd be able to tell it was there because I'd just select for drug 332 00:29:10,000 --> 00:29:14,000 resistance. And even if it only happened in one in 500,000 333 00:29:14,000 --> 00:29:18,000 cells, it wouldn't matter because up would grow the colony that now 334 00:29:18,000 --> 00:29:22,000 has the drug resistance in the middle of gene X. Gene X is gone, 335 00:29:22,000 --> 00:29:26,000 and I could look at the organism if it's alive and see if 336 00:29:26,000 --> 00:29:31,000 it has a phenotype. If it's an essential gene, 337 00:29:31,000 --> 00:29:35,000 that strategy obviously won't work, and when people do the more 338 00:29:35,000 --> 00:29:40,000 complicated thing of doing this kind of experiment to make a transgenic 339 00:29:40,000 --> 00:29:44,000 mouse, it takes about a year to go from the DNA manipulation all the 340 00:29:44,000 --> 00:29:49,000 way to the live mouse with a disrupted gene. 341 00:29:49,000 --> 00:29:53,000 And sometimes what they find after spending half of a PhD 342 00:29:53,000 --> 00:29:58,000 is that that was an essential gene. And, there's no live mouse, or it 343 00:29:58,000 --> 00:30:03,000 made it two days into being an embryo and it tanked at that point. 344 00:30:03,000 --> 00:30:07,000 But, this again, you could see, we talked about going 345 00:30:07,000 --> 00:30:11,000 back and forth between gene and protein, and trying to figure out 346 00:30:11,000 --> 00:30:15,000 function. All I can sort of do is give you the flavor of what's going 347 00:30:15,000 --> 00:30:19,000 on. 
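The gene disruption strategy can be mimicked with plain strings in Python. This is only a cartoon under stated assumptions: the chromosome sequence is invented, `[drugR]` is a placeholder for a drug resistance gene, the helper names are hypothetical, and recombination is modeled as an exact match of the construct's two end regions against the chromosome.

```python
def build_knockout_construct(fragment, cut_site, marker):
    """Insert the marker into the cloned fragment at the restriction cut site."""
    return fragment[:cut_site] + marker + fragment[cut_site:]

def recombine(chromosome, construct, arm_length):
    """Replace the chromosomal region between the two matching arms with the construct."""
    left_arm, right_arm = construct[:arm_length], construct[-arm_length:]
    left = chromosome.find(left_arm)
    right = chromosome.find(right_arm, left + arm_length) if left >= 0 else -1
    if left < 0 or right < 0:
        return chromosome  # no homology on both sides, so no exchange
    return chromosome[:left] + construct + chromosome[right + arm_length:]

# Invented chromosome: lower case is flanking DNA, upper case is gene X.
chromosome = "ccgtaa" + "ATGGCTGAAACGTAA" + "ttgcac"

# The cloned piece carries gene X plus a little flanking sequence on each side.
fragment = chromosome[2:-2]
construct = build_knockout_construct(fragment, cut_site=10, marker="[drugR]")

disrupted = recombine(chromosome, construct, arm_length=6)
print(disrupted)
```

The result is the original chromosome with the marker sitting in the middle of gene X, which is the only arrangement that lets the marker be replicated along with the rest of the genome.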
But one sort of overarching thing I hope you remember going 348 00:30:19,000 --> 00:30:23,000 through this is DNA sequencing, PCR, all these kinds of 349 00:30:23,000 --> 00:30:27,000 manipulations we're talking about are just exploiting these basic 350 00:30:27,000 --> 00:30:31,000 cellular components that we learned about studying, how does 351 00:30:31,000 --> 00:30:36,000 DNA replicate? How is information coded? 352 00:30:36,000 --> 00:30:40,000 How do genes get expressed? How does genetic information get 353 00:30:40,000 --> 00:30:45,000 sorted between cells? It's simply applying those 354 00:30:45,000 --> 00:30:50,000 relatively well understood tools, or sort of biological principles and 355 00:30:50,000 --> 00:30:54,000 parts that we learned about, and now using them as tools in an 356 00:30:54,000 --> 00:30:59,000 engineering way, and it has just completely transformed 357 00:30:59,000 --> 00:31:04,000 the way biology has been done in the last couple decades. 358 00:31:04,000 --> 00:31:11,000 And it's just, as I say, things are changing so, 359 00:31:11,000 --> 00:31:18,000 so fast. It's almost breathtaking. So, the last little bit of sort of 360 00:31:18,000 --> 00:31:25,000 technique oriented stuff, I just want to at least make sure 361 00:31:25,000 --> 00:31:32,000 I've mentioned what are called microarrays. You often hear these 362 00:31:32,000 --> 00:31:38,000 referred to as DNA chips. The principle here is this is a way 363 00:31:38,000 --> 00:31:42,000 that lets you ask, not only whether one gene is being 364 00:31:42,000 --> 00:31:46,000 expressed or not under a particular condition, whether its RNA is being 365 00:31:46,000 --> 00:31:50,000 made, and in most cases that means making protein, 366 00:31:50,000 --> 00:31:54,000 or whether it's off, or whether it's at some intermediate 367 00:31:54,000 --> 00:31:58,000 level. A microarray lets you do that experiment with many, 368 00:31:58,000 --> 00:32:02,000 many, many genes at once.
And here's the principle. 369 00:32:02,000 --> 00:32:11,000 You take some surface, and there will be a bunch of, 370 00:32:11,000 --> 00:32:20,000 if you will, sites on the surface, on this chip or whatever. And, what's 371 00:32:20,000 --> 00:32:30,000 going to be attached here would be a little piece of DNA 372 00:32:30,000 --> 00:32:39,000 from gene one. I mean, let's say, 373 00:32:39,000 --> 00:32:47,000 maybe a hundred nucleotides: that would be far more than enough to 374 00:32:47,000 --> 00:32:55,000 make it absolutely specific that they could only hybridize to a 375 00:32:55,000 --> 00:33:03,000 messenger RNA from gene one, and not from anything else. And, 376 00:33:03,000 --> 00:33:11,000 this one, then, would have from gene two, this one, from gene 377 00:33:11,000 --> 00:33:22,000 three, and so on. Then, if we were to take a messenger 378 00:33:22,000 --> 00:33:32,000 RNA preparation from, say, an organism if it's a little 379 00:33:32,000 --> 00:33:37,000 one, or maybe a tissue, or something like that, anywhere you 380 00:33:37,000 --> 00:33:41,000 could isolate RNA from. And then, we'll label it in some 381 00:33:41,000 --> 00:33:46,000 kind of way, and we can label it radioactively, 382 00:33:46,000 --> 00:33:51,000 we can label it with dyes. It's usually done with dyes, 383 00:33:51,000 --> 00:33:55,000 and there are a variety of variations on this. 384 00:33:55,000 --> 00:34:00,000 Those are sort of technical details of how to do it. But here's 385 00:34:00,000 --> 00:34:05,000 the principle. Let's just, for the moment, 386 00:34:05,000 --> 00:34:11,000 just consider that it's got a label on it.
So if we take the messenger 387 00:34:11,000 --> 00:34:17,000 RNA, and take this little chip that has samples, in the extreme, 388 00:34:17,000 --> 00:34:22,000 it could be a sample of every single gene that's in the genome of that 389 00:34:22,000 --> 00:34:28,000 organism, we take this labeled RNA, actually what we would usually do is 390 00:34:28,000 --> 00:34:34,000 to use this to make a labeled cDNA preparation, which would be a copy 391 00:34:34,000 --> 00:34:39,000 of each one of these things. That's the technically easy way to get 392 00:34:39,000 --> 00:34:43,000 label into it. But what we do have is if the gene 393 00:34:43,000 --> 00:34:47,000 was on, its messenger RNA would be there, and we'd have a bunch of stuff 394 00:34:47,000 --> 00:34:52,000 corresponding to gene one that had label on it. And if we give it a 395 00:34:52,000 --> 00:34:56,000 chance, that will, then, hybridize here. 396 00:34:56,000 --> 00:35:01,000 And there would be some way of detecting this label. 397 00:35:01,000 --> 00:35:05,000 If gene two was off in that sample, there won't be any hybridization. 398 00:35:05,000 --> 00:35:10,000 There won't be any signal. You can sort of see in principle 399 00:35:10,000 --> 00:35:14,000 what you're doing is you're interrogating each gene, in the 400 00:35:14,000 --> 00:35:19,000 extreme, each gene in the organism under some condition. 401 00:35:19,000 --> 00:35:23,000 Is it on? Is it off? If you did various samples, 402 00:35:23,000 --> 00:35:28,000 you could see maybe it's at an intermediate level, and so on. 403 00:35:28,000 --> 00:35:32,000 So, the chips look like that, 50-100,000 genes perhaps, something 404 00:35:32,000 --> 00:35:37,000 like that. These things are really small. Here's sort of a display of 405 00:35:37,000 --> 00:35:41,000 a simple one, and this is one where they're taking RNA. 406 00:35:41,000 --> 00:35:46,000 The samples are from two conditions.
One's labeled with a dye that's 407 00:35:46,000 --> 00:35:50,000 green, and one's labeled with a dye that's red. And, 408 00:35:50,000 --> 00:35:55,000 if you get equal amounts, it looks yellow. So, they mix the 409 00:35:55,000 --> 00:35:59,000 two things together, and if the gene is the same under 410 00:35:59,000 --> 00:36:04,000 two conditions it would be yellow. Under one condition, 411 00:36:04,000 --> 00:36:08,000 if the gene was on in condition one then it would be green, 412 00:36:08,000 --> 00:36:12,000 and off in two, and back and forth. So, without trying to get lost in 413 00:36:12,000 --> 00:36:16,000 the technical details right now, which don't matter, the principle 414 00:36:16,000 --> 00:36:20,000 of this thing is that you can take, you can sort of, by making a 415 00:36:20,000 --> 00:36:24,000 preparation of RNA, then you can use these DNA chips and 416 00:36:24,000 --> 00:36:28,000 say, is each gene on or off? Or if I switch conditions, who 417 00:36:28,000 --> 00:36:31,000 comes on and off? So, it's a little like, 418 00:36:31,000 --> 00:36:35,000 I think of it this way. It's like having, all right, 419 00:36:35,000 --> 00:36:39,000 who's on today? And a number of hands go up, or something. 420 00:36:39,000 --> 00:36:43,000 And the rest of you would be off. But, come back on Monday, and I say 421 00:36:43,000 --> 00:36:46,000 to the something or other, and a different set of you would put 422 00:36:46,000 --> 00:36:50,000 up hands. And what I'm kind of looking at are the changes between 423 00:36:50,000 --> 00:36:54,000 that. And the sort of thing where this has been so powerful, 424 00:36:54,000 --> 00:36:58,000 for example, is there are kinds of cancers for which there is a 425 00:36:58,000 --> 00:37:02,000 treatment, but it was only 20% successful.
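The green, red, and yellow readout described above can be sketched as a comparison of two intensities per spot. This is a minimal sketch, not how any particular scanner software works: the function name `call_spot`, the two-fold cutoff, the detection floor, and the intensity numbers are all assumptions chosen for illustration.

```python
import math


def call_spot(green: float, red: float, fold: float = 2.0, floor: float = 1.0) -> str:
    """Classify one spot on a two-dye chip.

    green: signal from the condition-one sample (arbitrary units, assumed)
    red:   signal from the condition-two sample (arbitrary units, assumed)
    fold:  hypothetical cutoff for calling a difference between conditions
    floor: hypothetical detection limit, also used as a pseudocount
    """
    if green < floor and red < floor:
        return "dark"  # no label bound: gene off in both samples
    # Log ratio of the two dyes; the pseudocount avoids dividing by zero.
    log_ratio = math.log2((green + floor) / (red + floor))
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "green"  # on (or higher) in condition one
    if log_ratio <= -cutoff:
        return "red"  # on (or higher) in condition two
    return "yellow"  # roughly equal amounts in both


# Hypothetical spots: (gene name, green signal, red signal)
spots = [("gene 1", 1000.0, 1000.0), ("gene 2", 1000.0, 0.0),
         ("gene 3", 0.0, 800.0), ("gene 4", 0.0, 0.0)]
calls = {name: call_spot(g, r) for name, g, r in spots}
print(calls)
```

Running the same classification over every spot is the "who's on today?" show of hands, done for the whole chip at once.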
426 00:37:02,000 --> 00:37:07,000 And, when people started to study these cancers and then looked to see 427 00:37:07,000 --> 00:37:12,000 what genes were on, what they realized was even though 428 00:37:12,000 --> 00:37:17,000 physicians had given these cancers a particular name, 429 00:37:17,000 --> 00:37:22,000 if you looked at which genes were being expressed, 430 00:37:22,000 --> 00:37:27,000 they fell into two classes, class A and class B. 431 00:37:27,000 --> 00:37:35,000 And what they then realized was that the treatment they were using was 432 00:37:35,000 --> 00:37:43,000 100% effective on tumors of class A, and wasn't doing anything for the 433 00:37:43,000 --> 00:37:51,000 tumors of class B. The physician couldn't tell the 434 00:37:51,000 --> 00:38:00,000 difference between these two types of tumors, but a microarray can. 435 00:38:00,000 --> 00:38:05,000 So, again we have so little time in this class. I could go on basically 436 00:38:05,000 --> 00:38:11,000 for ages. There's the output of the real sort of DNA chip. 437 00:38:11,000 --> 00:38:17,000 You can see things are very dense, and the great cleverness in doing 438 00:38:17,000 --> 00:38:23,000 these things, people now use the technology that goes with inkjet 439 00:38:23,000 --> 00:38:29,000 printers to actually synthesize little pieces of either DNA or 440 00:38:29,000 --> 00:38:35,000 proteins starting on a little spot on each membrane, or on 441 00:38:35,000 --> 00:38:39,000 the chip, or whatever. You put on one nucleotide and 442 00:38:39,000 --> 00:38:43,000 then you put on the next, and the next, and the next. You 443 00:38:43,000 --> 00:38:47,000 could synthesize it using technology that's already around for inkjet 444 00:38:47,000 --> 00:38:51,000 printers and that kind of thing. So here you've seen a fusion of 445 00:38:51,000 --> 00:38:55,000 different types of engineering.
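The arithmetic behind the cancer example can be spelled out. If the treatment works on every class A tumor and on no class B tumor, then an overall success rate of 20% is what you would see when about 20% of the patients carry class A tumors. That 20/80 split is an inference from the numbers quoted above, not something stated directly, and the function name below is hypothetical.

```python
def overall_response_rate(class_fractions: dict, class_response: dict) -> float:
    """Weighted average of per-class response rates.

    class_fractions: share of patients in each tumor class (should sum to 1)
    class_response:  fraction of each class that responds to the treatment
    """
    return sum(class_fractions[c] * class_response[c] for c in class_fractions)


# Assumed split, inferred from the 20% overall figure.
fractions = {"A": 0.2, "B": 0.8}
response = {"A": 1.0, "B": 0.0}  # 100% effective on A, nothing for B

blended = overall_response_rate(fractions, response)
print(f"overall success: {blended:.0%}")
```

Once expression patterns separate the two classes, the same treatment can be offered only to class A patients, for whom the response rate in this sketch is 100% rather than the blended 20%.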
OK, the last thing I'm going to 446 00:38:55,000 --> 00:38:59,000 tell you about is a little bit about the immune system. 447 00:38:59,000 --> 00:39:03,000 We've run into this. This is a movie that some of you 448 00:39:03,000 --> 00:39:08,000 liked, got the biggest aw I think of the last part of the course anyway. 449 00:39:08,000 --> 00:39:13,000 But what we are seeing here is a white blood cell pushing aside the 450 00:39:13,000 --> 00:39:18,000 red blood cells, which are stationary, 451 00:39:18,000 --> 00:39:22,000 and chasing a bacterium. It's obvious that it can trace it. 452 00:39:22,000 --> 00:39:27,000 It's able to recognize some things, and at some point, then, it took it 453 00:39:27,000 --> 00:39:32,000 up. The principle of what happened there was this white blood cell had 454 00:39:32,000 --> 00:39:37,000 a capacity to recognize the bacterium, then bind to it. 455 00:39:37,000 --> 00:39:43,000 And then, its membrane, this is the membrane, and this is a 456 00:39:43,000 --> 00:39:50,000 white blood cell. There are many types of these, 457 00:39:50,000 --> 00:39:56,000 and it pinches the membrane off. So you have the bacterium. This is 458 00:39:56,000 --> 00:40:02,000 the bacterium. And, it's inside a little membrane 459 00:40:02,000 --> 00:40:06,000 compartment, as if the bacterium is in a little soap bubble. 460 00:40:06,000 --> 00:40:11,000 And the principle of what happens is the white blood cell has another 461 00:40:11,000 --> 00:40:15,000 soap bubble that's full of poison. And, if you took two soap bubbles 462 00:40:15,000 --> 00:40:19,000 and pushed them together, you know what happens. They'll fuse, 463 00:40:19,000 --> 00:40:24,000 and you'll get a bigger soap bubble. And that's, in essence, how these 464 00:40:24,000 --> 00:40:28,000 white blood cells normally kill bacteria. They would bring together 465 00:40:28,000 --> 00:40:33,000 these two compartments.
Now you have a bacterium and a poison 466 00:40:33,000 --> 00:40:38,000 within a white blood cell, and the bacterium would get killed. 467 00:40:38,000 --> 00:40:43,000 And we talked about how bacteria fought back. That was a 468 00:40:43,000 --> 00:40:48,000 streptococcus that has a capsule. And the capsule, by having 469 00:40:48,000 --> 00:40:53,000 polysaccharide on the outside, prevents the white blood cell from 470 00:40:53,000 --> 00:40:58,000 being able to grab hold of some feature of the bacterium 471 00:40:58,000 --> 00:41:04,000 that starts this process of killing it. 472 00:41:04,000 --> 00:41:08,000 And when I told you the story of how DNA was found, 473 00:41:08,000 --> 00:41:12,000 it was people studying pneumonia. If you remember, it was 474 00:41:12,000 --> 00:41:17,000 streptococcus. If the streptococcus had a capsule, 475 00:41:17,000 --> 00:41:21,000 the people would get very sick. And after five or six days, 476 00:41:21,000 --> 00:41:26,000 there would be a crisis where they either lived, or they died. 477 00:41:26,000 --> 00:41:30,000 And what would happen in that time is that, the last thing I'll tell 478 00:41:30,000 --> 00:41:35,000 you about, what's called the adaptive immune system would have 479 00:41:35,000 --> 00:41:39,000 generated special recognition molecules called antibodies that 480 00:41:39,000 --> 00:41:44,000 would have learned to recognize the capsule of that bacterium. 481 00:41:44,000 --> 00:41:49,000 And once those were there, now those white blood cells would be 482 00:41:49,000 --> 00:41:54,000 able to capture the bacterium, because the antibodies give it a 483 00:41:54,000 --> 00:41:59,000 hand in recognizing there was something there that needed 484 00:41:59,000 --> 00:42:04,000 to be killed. The way this adaptive immune system 485 00:42:04,000 --> 00:42:09,000 works, it's almost like science fiction. And I'll tell you the 486 00:42:09,000 --> 00:42:14,000 molecular basis of it.
The key insight came from Susumu 487 00:42:14,000 --> 00:42:19,000 Tonegawa, another member of the MIT biology faculty, 488 00:42:19,000 --> 00:42:24,000 who also runs the Picower Center, and who got a Nobel Prize for 489 00:42:24,000 --> 00:42:30,000 understanding the basis of the diversity of the immune system. 490 00:42:30,000 --> 00:42:35,000 What I just want to do for the moment is just sort of point out the 491 00:42:35,000 --> 00:42:40,000 key features of what's called the adaptive immune system. 492 00:42:40,000 --> 00:42:45,000 And this is one of the reasons that we are able to live. 493 00:42:45,000 --> 00:42:50,000 And even though we get sick from time to time, and we've all had one 494 00:42:50,000 --> 00:42:55,000 thing or another get us for a little while during the semester, 495 00:42:55,000 --> 00:43:00,000 the reason we aren't sick all the time, and the reason we recover when 496 00:43:00,000 --> 00:43:06,000 we get sick, is we have what's called an adaptive immune system. 497 00:43:06,000 --> 00:43:10,000 What happens when people get infected with HIV is the cells 498 00:43:10,000 --> 00:43:15,000 that it lives in and destroys are key players in your adaptive immune 499 00:43:15,000 --> 00:43:19,000 system. And so people don't die from the HIV infection itself, 500 00:43:19,000 --> 00:43:24,000 they die because lots of things that we just, bacteria or fungi, 501 00:43:24,000 --> 00:43:29,000 whatever things we just have on us and we live with all the time 502 00:43:29,000 --> 00:43:33,000 suddenly become killers because you lack the immune system 503 00:43:33,000 --> 00:43:39,000 that fights them off. So the so-called adaptive immune 504 00:43:39,000 --> 00:43:47,000 system is absolutely amazing.
So several features: it's got an 505 00:43:47,000 --> 00:43:55,000 incredible diversity. There is a general word that's used 506 00:43:55,000 --> 00:44:03,000 to describe all sorts of chemical entities, and 507 00:44:03,000 --> 00:44:09,000 it's called an antigen. So anyway, this can recognize many, 508 00:44:09,000 --> 00:44:13,000 many what are called antigens. And at the moment I think you can just 509 00:44:13,000 --> 00:44:17,000 think of them as some kind of chemical entity. 510 00:44:17,000 --> 00:44:22,000 It could be a carbohydrate. It could be a little piece of an 511 00:44:22,000 --> 00:44:26,000 organic molecule. It could be a few amino acids on a 512 00:44:26,000 --> 00:44:30,000 protein. But it's something that's potentially capable of being 513 00:44:30,000 --> 00:44:35,000 recognized by your immune system. So, there are many, 514 00:44:35,000 --> 00:44:40,000 many, many things. And the amazing thing is I can go into a lab and 515 00:44:40,000 --> 00:44:45,000 synthesize a molecule that's never been seen on this Earth before and 516 00:44:45,000 --> 00:44:49,000 challenge somebody with it, and they'll produce an immune 517 00:44:49,000 --> 00:44:54,000 response that will be mounted against that even though it's never 518 00:44:54,000 --> 00:44:59,000 been on Earth before. There's also the specificity. 519 00:44:59,000 --> 00:45:05,000 It is completely amazing. If I were to take a protein, 520 00:45:05,000 --> 00:45:12,000 and then, let's say, put on a phenyl ring with a methyl there, 521 00:45:12,000 --> 00:45:19,000 inject it into someone, the immune system would figure out how to 522 00:45:19,000 --> 00:45:27,000 recognize this thing with the phenyl ring, and the methyl.
523 00:45:27,000 --> 00:45:32,000 But, the response it generated, it would see this but not, let's say, 524 00:45:32,000 --> 00:45:38,000 that. If I wanted to get an immune response to something with methyl here, 525 00:45:38,000 --> 00:45:44,000 I'd have to put that into the organism and let the immune system 526 00:45:44,000 --> 00:45:50,000 figure out a response. So, the specificity is at the same 527 00:45:50,000 --> 00:45:56,000 kind of level that you are used to here. If restriction enzymes can 528 00:45:56,000 --> 00:46:02,000 read different sequences in DNA or a protein can tell one optical isomer 529 00:46:02,000 --> 00:46:08,000 of a small molecule from another, it's this fitting of complementary 530 00:46:08,000 --> 00:46:13,000 shapes. So, what the immune system is all 531 00:46:13,000 --> 00:46:17,000 about is figuring out how to get a complementary shape somehow that's 532 00:46:17,000 --> 00:46:21,000 able to recognize essentially any kind of chemical shape and structure 533 00:46:21,000 --> 00:46:26,000 you can think of. It's just mind blowing. 534 00:46:26,000 --> 00:46:30,000 And, you could already see, right from the beginning, 535 00:46:30,000 --> 00:46:35,000 the fundamental problem that people could see from the start. 536 00:46:35,000 --> 00:46:39,000 It sounds like we would need a genome that's infinitely big, 537 00:46:39,000 --> 00:46:43,000 full of things that are ready to recognize anything. 538 00:46:43,000 --> 00:46:47,000 And so, one of the real surprises is, and now we know, 539 00:46:47,000 --> 00:46:51,000 is there are 20,000 genes or so in the human genome. 540 00:46:51,000 --> 00:46:55,000 There can't possibly be a zillion genes, each one specific for one of 541 00:46:55,000 --> 00:46:59,000 these structures. There had to be some underlying 542 00:46:59,000 --> 00:47:04,000 principle that we had to learn. And that was one of the big 543 00:47:04,000 --> 00:47:09,000 challenges in the immune system for a long time.
Another one was that 544 00:47:09,000 --> 00:47:14,000 if you have an organism that has this capacity, 545 00:47:14,000 --> 00:47:19,000 and you could recognize anything, why don't you do yourself in? 546 00:47:19,000 --> 00:47:24,000 Because you yourself are full of entities that could, 547 00:47:24,000 --> 00:47:29,000 in principle, generate an immune response. 548 00:47:29,000 --> 00:47:35,000 So, one of the other things the immune system had to deal with was 549 00:47:35,000 --> 00:47:41,000 avoiding self recognition. If you're able to recognize 550 00:47:41,000 --> 00:47:48,000 anything, how do I avoid killing my own cells? 551 00:47:48,000 --> 00:47:54,000 So, that is another really fundamental problem in this immune 552 00:47:54,000 --> 00:48:00,000 system. This is exciting. We can all stop and watch, 553 00:48:00,000 --> 00:48:05,000 but I think I'll just try and keep soldiering along for the last minute 554 00:48:05,000 --> 00:48:10,000 or so. So, one other feature that was interesting about the immune 555 00:48:10,000 --> 00:48:14,000 system is it has a memory. And I'll tell you more about 556 00:48:14,000 --> 00:48:19,000 antibodies at the beginning of next lecture. But, 557 00:48:19,000 --> 00:48:24,000 these are the kinds of molecules that are able to recognize these 558 00:48:24,000 --> 00:48:29,000 different entities. And you've all heard the term in 559 00:48:29,000 --> 00:48:34,000 your ordinary life. But, if we look at the level of 560 00:48:34,000 --> 00:48:39,000 antibodies that are made in the body, 561 00:48:39,000 --> 00:48:48,000 at the first exposure to an antigen you would get some kind of response 562 00:48:48,000 --> 00:48:57,000 that comes up upon the first exposure, and this is time here. 563 00:48:57,000 --> 00:49:03,000 If we let there be some delay, it could be even into years, and 564 00:49:03,000 --> 00:49:09,000 then we get a second exposure to the antigen, the response is much 565 00:49:09,000 --> 00:49:15,000 higher.
And this could be a log scale. So, it could be dramatically 566 00:49:15,000 --> 00:49:21,000 higher. So, what was the basis of that? How does it work? 567 00:49:21,000 --> 00:49:27,000 You see right there the principle of vaccination in the sense that if 568 00:49:27,000 --> 00:49:33,000 you ever got chickenpox as a kid, your body has learned how to make 569 00:49:33,000 --> 00:49:37,000 antibodies. So if you ever see it again, 570 00:49:37,000 --> 00:49:41,000 it mounts a really big immune response. If you want protection against a 571 00:49:41,000 --> 00:49:45,000 disease, something like tetanus, that you haven't seen, 572 00:49:45,000 --> 00:49:48,000 you go to the doctor and they squirt in a bit of the stuff that doesn't 573 00:49:48,000 --> 00:49:52,000 make you sick, but it gives you the initial immune 574 00:49:52,000 --> 00:49:56,000 response. Then, if you ever step on a rusty nail, 575 00:49:56,000 --> 00:50:00,000 you get a very powerful response against tetanus. 576 00:50:00,000 --> 00:50:04,000 And that's sort of the underlying principle of vaccination, this 577 00:50:04,000 --> 00:50:09,000 concept of memory. We'll pick that up on Wednesday. 578 00:50:09,000 --> 00:50:12,000 So, have a great Patriots Day weekend.