1
00:00:01,000 --> 00:00:05,000
Good
morning. So,

2
00:00:05,000 --> 00:00:09,000
we are going to see if my voice
holds up through this lecture today.

3
00:00:09,000 --> 00:00:13,000
It is a casualty of having
been at Foxborough yesterday,

4
00:00:13,000 --> 00:00:17,000
and then staying up rather
late watching the Red Sox game.

5
00:00:17,000 --> 00:00:21,000
On the whole, both seemed to
have come through successfully,

6
00:00:21,000 --> 00:00:25,000
but my voice is a bit of
a casualty of the events.

7
00:00:25,000 --> 00:00:29,000
So, we'll see. But I'm
going to sound a lot

8
00:00:29,000 --> 00:00:35,000
scratchier
than normal.

9
00:00:35,000 --> 00:00:41,000
So, how many of you stayed up to
the end of the game last night?

10
00:00:41,000 --> 00:00:49,000
Good, excellent.
I approve.

11
00:00:49,000 --> 00:00:54,000
OK, last time, we spoke about
the idea of cloning DNA,

12
00:00:54,000 --> 00:00:59,000
to create libraries
of molecules.

13
00:00:59,000 --> 00:01:03,000
And again, I think this is just
one of the most clever inventions

14
00:01:03,000 --> 00:01:07,000
because it's a completely new way
to think about purifying molecules.

15
00:01:07,000 --> 00:01:11,000
Rather than purifying molecules,
by separating them based on their

16
00:01:11,000 --> 00:01:15,000
biochemical properties, it's
purifying molecules by diluting

17
00:01:15,000 --> 00:01:20,000
them into single components,
and then amplifying each back up

18
00:01:20,000 --> 00:01:24,000
from its own source. It's
really quite a beautiful idea.

19
00:01:24,000 --> 00:01:28,000
And just to go over it,
we take, say, human DNA,

20
00:01:28,000 --> 00:01:32,000
or we could take drosophila DNA,
or we could take yeast DNA, or we

21
00:01:32,000 --> 00:01:37,000
could take any other
DNA we feel like.

22
00:01:37,000 --> 00:01:42,000
We cut it up in some fashion
with a restriction enzyme.

23
00:01:42,000 --> 00:01:48,000
We'll use our favorite
restriction enzyme here, EcoRI,

24
00:01:48,000 --> 00:01:54,000
which cuts at a defined site,
GAATTC. We take that. We add our

25
00:01:54,000 --> 00:02:00,000
insert DNA. These are referred to
as inserts because they're going to

26
00:02:00,000 --> 00:02:05,000
be inserted into a plasmid.
We take a plasmid vector.

27
00:02:05,000 --> 00:02:11,000
The plasmid vector here
is a naturally occurring,

28
00:02:11,000 --> 00:02:16,000
although sometimes modified, piece
of DNA that bacteria have that

29
00:02:16,000 --> 00:02:22,000
has an origin of replication that
allows it to grow autonomously when

30
00:02:22,000 --> 00:02:28,000
put in a bacterial cell,
and a selectable marker.

31
00:02:28,000 --> 00:02:32,000
The selectable marker, for
example, ampicillin resistance,

32
00:02:32,000 --> 00:02:37,000
or some other resistance, we add
these and then we seal up the pieces

33
00:02:37,000 --> 00:02:42,000
of the DNA using the enzyme ligase.
Ligase joins and joins producing

34
00:02:42,000 --> 00:02:47,000
for us molecules of this sort.
We make zillions of them in

35
00:02:47,000 --> 00:02:52,000
parallel in one test tube. We
then transform them by adding

36
00:02:52,000 --> 00:02:57,000
these molecules to bacterial
cells that have been appropriately

37
00:02:57,000 --> 00:03:02,000
prepared to be transformed, that
is, their membranes have been

38
00:03:02,000 --> 00:03:07,000
treated in such a way that
they're going to be most likely to

39
00:03:07,000 --> 00:03:11,000
suck up pieces of DNA. We
then plate them on a plate at a

40
00:03:11,000 --> 00:03:15,000
density so that individual bacterial
cells are well separated from each

41
00:03:15,000 --> 00:03:19,000
other. You try a bunch of different
densities so you get one right.

42
00:03:19,000 --> 00:03:23,000
And, you let them grow up. And,
every colony here, as we discussed,

43
00:03:23,000 --> 00:03:27,000
is the descendant of a
single bacterial cell,

44
00:03:27,000 --> 00:03:31,000
carrying ideally
a single plasmid.

45
00:03:31,000 --> 00:03:35,000
And, that single plasmid, we
know it's carrying a single

46
00:03:35,000 --> 00:03:39,000
plasmid because we were clever
enough to put ampicillin or other

47
00:03:39,000 --> 00:03:44,000
selectable marker on this plate.
And so, only bacteria that have

48
00:03:44,000 --> 00:03:48,000
picked up the plasmid are ampicillin
resistant. And there you go.

49
00:03:48,000 --> 00:03:53,000
This is called a library. And, at
the end of the day, you may have

50
00:03:53,000 --> 00:03:57,000
a library that contains one plate
of clones or a library containing

51
00:03:57,000 --> 00:04:02,000
hundreds of plates of clones.
We're going to see how we last

52
00:04:02,000 --> 00:04:08,000
through this. Now, a few
people asked me at the end of

53
00:04:08,000 --> 00:04:13,000
the last lecture, well, OK,
but what about the details.

54
00:04:13,000 --> 00:04:19,000
Is it really going to work like
this? How come some of these

55
00:04:19,000 --> 00:04:24,000
plasmid molecules don't
automatically get closed back up by

56
00:04:24,000 --> 00:04:30,000
ligase? Why is it that there's
always an insert in the plasmid?

57
00:04:30,000 --> 00:04:34,000
What's the answer to that question?
Sorry? There's not an answer

58
00:04:34,000 --> 00:04:38,000
because sometimes ligase
might close up that molecule.

59
00:04:38,000 --> 00:04:42,000
Now, that would be unfortunate
because it would mean that a bunch

60
00:04:42,000 --> 00:04:46,000
of the things in your library just
had the vector without any insert.

61
00:04:46,000 --> 00:04:50,000
So, and these are details,
but over the course of years,

62
00:04:50,000 --> 00:04:54,000
recombinant DNA specialists have
worked out lots of cute tricks to

63
00:04:54,000 --> 00:04:58,000
make better and better libraries.
I'll just give you an example of

64
00:04:58,000 --> 00:05:03,000
the kinds of things. Remember
that in order to ligate DNA,

65
00:05:03,000 --> 00:05:09,000
we had a five prime here. We
have a phosphate group here,

66
00:05:09,000 --> 00:05:16,000
three prime hydroxyl phosphate here,
double strand of DNA here. We have

67
00:05:16,000 --> 00:05:23,000
a phosphate here. We
have a hydroxyl here,

68
00:05:23,000 --> 00:05:30,000
phosphate five
prime, three prime.

69
00:05:30,000 --> 00:05:37,000
If ligase is going to come along,
it turns out that ligase needs the

70
00:05:37,000 --> 00:05:45,000
phosphate there in order to
seal it up and make a chain.

71
00:05:45,000 --> 00:05:53,000
So, for example, suppose we were
to arrange that the plasmid vector

72
00:05:53,000 --> 00:06:01,000
didn't have phosphates on its two
ends. Then ligase would not be able

73
00:06:01,000 --> 00:06:06,000
to re-seal the plasmid
vector. That's a cute trick.

74
00:06:06,000 --> 00:06:10,000
This is just cooking, but
I'm giving you an idea of the

75
00:06:10,000 --> 00:06:13,000
kind of cooking tricks we
use in all this. So, ideally,

76
00:06:13,000 --> 00:06:17,000
you would like an enzyme that can
remove phosphate groups from the end

77
00:06:17,000 --> 00:06:21,000
of DNA. How are you going
to invent such an enzyme?

78
00:06:21,000 --> 00:06:24,000
It already exists is the
answer to all these questions.

79
00:06:24,000 --> 00:06:28,000
And, bacteria have such an enzyme
that can remove phosphate groups.

80
00:06:28,000 --> 00:06:32,000
So, just remove phosphate groups.
And of course these enzymes are

81
00:06:32,000 --> 00:06:36,000
developed by bacteria because
they need them in the course of DNA

82
00:06:36,000 --> 00:06:41,000
metabolism. And, what do
you think the enzyme is

83
00:06:41,000 --> 00:06:45,000
called? Phosphatase, of
course. That's what happens,

84
00:06:45,000 --> 00:06:49,000
use phosphatase, and you treat that,
and it doesn't seal back up. Now,

85
00:06:49,000 --> 00:06:54,000
somebody will say to me, well,
OK, but now I've got my vector

86
00:06:54,000 --> 00:06:58,000
here, and I don't have a
phosphate on it, and so this is

87
00:06:58,000 --> 00:07:03,000
my vector DNA. And then,
I've got my insert DNA,

88
00:07:03,000 --> 00:07:08,000
and sorry, my insert DNA here, it
has a hydroxyl here and a phosphate

89
00:07:08,000 --> 00:07:13,000
here. So, the vector
has no phosphate. But,

90
00:07:13,000 --> 00:07:19,000
when ligase wants to attach an
insert, it's got a phosphate here

91
00:07:19,000 --> 00:07:24,000
but not here. What's
going to happen? Well,

92
00:07:24,000 --> 00:07:29,000
it turns out that ligase will seal
up this because it's got a phosphate,

93
00:07:29,000 --> 00:07:34,000
but it'll leave this one
open. Now, is that a problem?

94
00:07:34,000 --> 00:07:38,000
It turns out, if you just transform
it into the bacteria with that hole

95
00:07:38,000 --> 00:07:42,000
there on one strand but not both
strands, it's still a covalently

96
00:07:42,000 --> 00:07:47,000
closed circle on one of its strands.
The bacteria will repair it. So,

97
00:07:47,000 --> 00:07:51,000
you can take advantage of the
bacteria's own DNA repair mechanisms

98
00:07:51,000 --> 00:07:55,000
to just throw the molecule in sealed
up on one strand and let its repair

99
00:07:55,000 --> 00:08:00,000
mechanism do the rest. All these tricks
we play to our advantage.

100
00:08:00,000 --> 00:08:04,000
Someone else asked after class,
what happens if the gene I'm

101
00:08:04,000 --> 00:08:08,000
interested in studying has,
here's my gene let's say that I'm

102
00:08:08,000 --> 00:08:13,000
interested in studying. I
take human DNA. I cut it with

103
00:08:13,000 --> 00:08:17,000
EcoRI. So, I have cut
it at all the EcoRI sites.

104
00:08:17,000 --> 00:08:22,000
Well, golly, what happens if my
gene happened to have an EcoRI site

105
00:08:22,000 --> 00:08:26,000
in it? Then my gene's going
to be cut up into two pieces.

106
00:08:26,000 --> 00:08:30,000
Isn't that bad? What
do I do about that?

107
00:08:30,000 --> 00:08:34,000
Do I know in advance if my
gene has an EcoRI site? Well,

108
00:08:34,000 --> 00:08:38,000
no, I don't, because I
don't know what my gene is.

109
00:08:38,000 --> 00:08:42,000
I'm making a library of
everything in the genome.

110
00:08:42,000 --> 00:08:46,000
So, some genes will have it,
and some won't. And, I might not

111
00:08:46,000 --> 00:08:50,000
know the gene I'm looking for.
So, how do I avoid that? Sorry?

112
00:08:50,000 --> 00:08:54,000
Oh, you'd try another enzyme.
You'd try BamHI and HindIII,

113
00:08:54,000 --> 00:08:58,000
and make a library with
different enzymes. That's one

114
00:08:58,000 --> 00:09:02,000
way. That works. Another
way, just to give you a

115
00:09:02,000 --> 00:09:06,000
sense of how fast molecular
biologists are with this.

116
00:09:06,000 --> 00:09:10,000
Suppose when we add EcoRI
we don't let the reaction go to

117
00:09:10,000 --> 00:09:14,000
completion. Suppose we run the
reaction under conditions where it's

118
00:09:14,000 --> 00:09:18,000
somewhat inefficient, and
instead of managing to cleave

119
00:09:18,000 --> 00:09:22,000
every EcoRI site, on
average it cleaves,

120
00:09:22,000 --> 00:09:26,000
say, one out of every three EcoRI
sites. You can do that.

121
00:09:26,000 --> 00:09:30,000
So, that means you can arrange just
by your reaction conditions to on

122
00:09:30,000 --> 00:09:35,000
average randomly cleave
some but not others.
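
A minimal sketch of the partial digestion idea, assuming a single strand written as a string, a made-up example sequence, and an independent cut probability at each GAATTC site. The names `find_sites`, `digest`, and `DNA` are hypothetical, and the one-in-three probability is just the figure mentioned in the lecture.

```python
import random

ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cleaves between the G and the AATTC

def find_sites(dna, site=ECORI_SITE):
    """Return the 0-based start position of every recognition site."""
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i)
        i = dna.find(site, i + 1)
    return positions

def digest(dna, cut_probability, rng):
    """Cut each site independently with the given probability (simplified model)."""
    cuts = [p + CUT_OFFSET for p in find_sites(dna) if rng.random() < cut_probability]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]

# Made-up sequence containing three EcoRI sites.
DNA = "ATGCGAATTCCCGGTTAAGAATTCTTTTGAATTCAGCT"

complete = digest(DNA, 1.0, random.Random(0))    # every site is cut
partial = digest(DNA, 1 / 3, random.Random(1))   # roughly one site in three is cut
print(len(complete), "fragments from a complete digest")
print(len(partial), "fragments from this partial digest")
```

Running the partial digest on many copies of the same DNA gives overlapping fragments, so a gene with an internal site still survives intact in some of them.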

123
00:09:35,000 --> 00:09:38,000
And, these are called
partial digestions. So,

124
00:09:38,000 --> 00:09:41,000
it turns out that all of the kinds
of things that people were asking me

125
00:09:41,000 --> 00:09:44,000
about afterwards, I was very
glad people were thinking

126
00:09:44,000 --> 00:09:47,000
about would this really work?
There are tricks to get around all

127
00:09:47,000 --> 00:09:50,000
of it, and there's a whole fat book
of protocols about if you want to

128
00:09:50,000 --> 00:09:54,000
make a library really, really
carefully, how you would do

129
00:09:54,000 --> 00:09:57,000
that, how you make sure
the vector doesn't re-close,

130
00:09:57,000 --> 00:10:00,000
how you make sure that you don't
cut every site but random sites,

131
00:10:00,000 --> 00:10:04,000
and things like that. And,
all of these rely on lots of

132
00:10:04,000 --> 00:10:08,000
enzymes and things that
bacteria have already invented.

133
00:10:08,000 --> 00:10:12,000
So, I'm just going to put
these down as cooking tips.

134
00:10:12,000 --> 00:10:16,000
These are not really necessarily,
I don't care whether you know the

135
00:10:16,000 --> 00:10:20,000
details or not, rather
that there exists a whole 15

136
00:10:20,000 --> 00:10:24,000
years, 20 years worth of ways to
make the best possible libraries.

137
00:10:24,000 --> 00:10:28,000
And so, it's quite routine now
to be able to make good libraries.

138
00:10:28,000 --> 00:10:34,000
All right, so,
having made a library,

139
00:10:34,000 --> 00:10:40,000
the challenge is finding your clone.
How to find your clone, the clone

140
00:10:40,000 --> 00:10:46,000
of interest. So, I need to
describe a number of ways

141
00:10:46,000 --> 00:10:52,000
that people have for finding
a clone of interest. And here,

142
00:10:52,000 --> 00:10:58,000
of course, up to this point, the
DNA could be zebra DNA, and it

143
00:10:58,000 --> 00:11:04,000
could be human DNA and yeast DNA,
and it could be something that is an

144
00:11:04,000 --> 00:11:11,000
enzyme for arginine,
or this, or that.

145
00:11:11,000 --> 00:11:18,000
But now we have to be specific.
So, let's suppose we go back to a

146
00:11:18,000 --> 00:11:25,000
problem we talked about before
about, say, auxotrophy for a nutrient.

147
00:11:25,000 --> 00:11:32,000
So, let's suppose that I have a
bacterium, maybe even E. coli itself,

148
00:11:32,000 --> 00:11:40,000
where I have selected mutants
that are auxotrophic for arginine.

149
00:11:40,000 --> 00:11:52,000
So, arginine auxotrophs will grow
on rich medium, but on minimal medium

150
00:11:52,000 --> 00:12:00,000
they don't grow. But,
they would grow if I added

151
00:12:00,000 --> 00:12:04,000
arginine to that medium. They
don't grow because they have a

152
00:12:04,000 --> 00:12:09,000
mutation in a gene. We
know it's a gene because we

153
00:12:09,000 --> 00:12:13,000
crossed together the mutant and
the wild type. We show that we can

154
00:12:13,000 --> 00:12:18,000
define this phenotype to
be a recessive phenotype.

155
00:12:18,000 --> 00:12:22,000
We can map it in the yeast genome
by showing it has linkage to other

156
00:12:22,000 --> 00:12:27,000
phenotypes. That's all great. We
can do classical genetics, a la

157
00:12:27,000 --> 00:12:32,000
Mendel, a la Morgan,
a la Sturtevant.

158
00:12:32,000 --> 00:12:37,000
But, how are we going to find
the gene? How are we going to,

159
00:12:37,000 --> 00:12:42,000
now, use our tools of recombinant
DNA to get physically in our hand

160
00:12:42,000 --> 00:12:47,000
the piece of DNA that encodes the
gene that is defective in the strain?

161
00:12:47,000 --> 00:12:52,000
So, I have a mutant bacterium. It
can't make arginine. It can't

162
00:12:52,000 --> 00:12:57,000
grow in minimal medium.
Somewhere in there, you know

163
00:12:57,000 --> 00:13:02,000
there's a mutation
in the DNA sequence.

164
00:13:02,000 --> 00:13:07,000
How do we find it?
What should we do?

165
00:13:07,000 --> 00:13:13,000
This is the whole point
of recombinant DNA,

166
00:13:13,000 --> 00:13:18,000
to make this abstract notion of,
there exist genes, they transmit

167
00:13:18,000 --> 00:13:24,000
all this kind of stuff, concrete.
How are you going to find

168
00:13:24,000 --> 00:13:30,000
it? Any takers?
Sorry? Run a gel.

169
00:13:30,000 --> 00:13:34,000
So, I take DNA, cut
it up, run a gel.

170
00:13:34,000 --> 00:13:38,000
I have all the DNA from the
bacteria schmeered (sic) out.

171
00:13:38,000 --> 00:13:42,000
And somewhere in that
schmeer is the gene. So,

172
00:13:42,000 --> 00:13:46,000
I take normal DNA from normal
bacteria. I take mutant DNA.

173
00:13:46,000 --> 00:13:50,000
One nucleotide is different in
the mutant DNA. I run them out,

174
00:13:50,000 --> 00:13:54,000
and I assure you, they
just look like a schmeer.

175
00:13:54,000 --> 00:13:58,000
It's just a big schmeer of DNA.
It's hard to see one nucleotide

176
00:13:58,000 --> 00:14:03,000
difference out of the
4 million nucleotides.

177
00:14:03,000 --> 00:14:07,000
In E. coli, say, how are
we going to get that?

178
00:14:07,000 --> 00:14:11,000
This is good. We're
thinking practically here.

179
00:14:11,000 --> 00:14:15,000
What else? Sorry? Sorry?
Cut it up. I'm assuming

180
00:14:15,000 --> 00:14:19,000
she wanted it cut up and run out on
the gel. It still will look like a

181
00:14:19,000 --> 00:14:23,000
schmeer. Forget the gel.
Cut it up. Make a library.

182
00:14:23,000 --> 00:14:27,000
OK, so we're going to make a
library. Let's assume now we have a

183
00:14:27,000 --> 00:14:31,000
library of different E. coli cells
containing individual plasmids,

184
00:14:31,000 --> 00:14:37,000
containing random bits of E. coli
DNA. How's that going to help?

185
00:14:37,000 --> 00:14:46,000
Splice it back in. How do I
know if I spliced it back in?

186
00:14:46,000 --> 00:14:55,000
Ooh, that's an interesting thought.
Suppose I were to make my library

187
00:14:55,000 --> 00:15:04,000
using wild type DNA, DNA
from the wild type strain.

188
00:15:04,000 --> 00:15:09,000
So, I'm going to make a library
containing lots and lots of

189
00:15:09,000 --> 00:15:14,000
fragments of normal E. coli DNA.
This is my library. I'm going to

190
00:15:14,000 --> 00:15:20,000
transform it into, what
kind of bacteria should I

191
00:15:20,000 --> 00:15:25,000
transform it into,
wild type or mutant?

192
00:15:25,000 --> 00:15:31,000
Who votes mutant?
Who votes wild type?

193
00:15:31,000 --> 00:15:38,000
We'll go with mutant, then.
Mutant. We'll put it in

194
00:15:38,000 --> 00:15:45,000
mutant. So now, all
of these mutant cells,

195
00:15:45,000 --> 00:15:52,000
each one is going to suck up a
plasmid. We then are going to plate

196
00:15:52,000 --> 00:15:59,000
this, and let colonies grow up.
One of these colonies contained,

197
00:15:59,000 --> 00:16:06,000
so this mutant is ARG minus.
And, one of these colonies is going

198
00:16:06,000 --> 00:16:12,000
to contain the ARG plus gene here.
How are we going to know which one?

199
00:16:12,000 --> 00:16:18,000
Sorry? How are we going to know
which one has the ARG plus gene?

200
00:16:18,000 --> 00:16:24,000
Yes? So, plate it on minimal
medium. If I plate it on minimal

201
00:16:24,000 --> 00:16:31,000
medium, what will happen to
most of my mutant bacteria?

202
00:16:31,000 --> 00:16:34,000
They're not going to grow. But,
what's going to happen to the

203
00:16:34,000 --> 00:16:38,000
bacterium that happens to be lucky
enough to have picked up the plasmid

204
00:16:38,000 --> 00:16:41,000
that contains the ARG plus gene?
It'll grow. So, whatever grows on

205
00:16:41,000 --> 00:16:45,000
minimal medium has been rescued.
In fact, we've complemented the

206
00:16:45,000 --> 00:16:49,000
defect. Remember, we
talked about complementation

207
00:16:49,000 --> 00:16:52,000
tests? In a way, it
would be the plasmid is

208
00:16:52,000 --> 00:16:56,000
complementing the defect.
Bingo, that's it. So, we can

209
00:16:56,000 --> 00:17:00,000
actually find that
gene functionally.

210
00:17:00,000 --> 00:17:09,000
We plate on minimal medium,
and we look for growth. The only

211
00:17:09,000 --> 00:17:18,000
things that will grow have been
rescued. So, this is called cloning

212
00:17:18,000 --> 00:17:27,000
by complementation because we
are complementing the defect

213
00:17:27,000 --> 00:17:34,000
in this strain.
All right. So,

214
00:17:34,000 --> 00:17:38,000
any time I have a functional
defect in my bacteria,

215
00:17:38,000 --> 00:17:43,000
I can find the gene for that
functional defect by simply taking a

216
00:17:43,000 --> 00:17:48,000
total library of normal DNA
from wild type bacteria,

217
00:17:48,000 --> 00:17:52,000
transforming it into a mutant
bacterium, and looking for which

218
00:17:52,000 --> 00:17:57,000
bacteria have suddenly been rescued.
Then I'll purify that bacterium,

219
00:17:57,000 --> 00:18:05,000
and I'll purify out the plasmid.
And that plasmid will contain the

220
00:18:05,000 --> 00:18:16,000
DNA for the gene.
That's pretty cool.

221
00:18:16,000 --> 00:18:28,000
Let's try another one.
Suppose, yes? OK, great.

222
00:18:28,000 --> 00:18:32,000
I've got my plate here, and
I've said only one of these

223
00:18:32,000 --> 00:18:36,000
bacteria will grow. It's
the one that happens to have

224
00:18:36,000 --> 00:18:41,000
within it the plasmid
containing the ARG gene. And,

225
00:18:41,000 --> 00:18:45,000
you're fine with that, but
you're saying, but how would I

226
00:18:45,000 --> 00:18:50,000
get that plasmid back out of the
bacteria because the bacteria's got

227
00:18:50,000 --> 00:18:54,000
its own chromosome, and I'm
making this big deal about

228
00:18:54,000 --> 00:18:59,000
how we purified stuff away
from all this other DNA.

229
00:18:59,000 --> 00:19:03,000
But, I've thrown this plasmid
back into a bacteria that has all

230
00:19:03,000 --> 00:19:08,000
its chromosomal DNA.
So, who am I kidding?

231
00:19:08,000 --> 00:19:13,000
How are we going to purify out just
that plasmid? If I could purify the

232
00:19:13,000 --> 00:19:18,000
plasmid, it would be OK right?
It turns out I can. Plasmids are

233
00:19:18,000 --> 00:19:22,000
little circles of DNA.
Chromosomes are big pieces of DNA.

234
00:19:22,000 --> 00:19:27,000
It turns out that the coiling of
the plasmid as a little circle gives

235
00:19:27,000 --> 00:19:32,000
it different densities and different
physical chemical properties to big

236
00:19:32,000 --> 00:19:37,000
chunks of DNA which get broken up.
And so, there are a bunch of tricks

237
00:19:37,000 --> 00:19:41,000
that allow me to get a pretty high
purification of a plasmid away from

238
00:19:41,000 --> 00:19:46,000
chromosomal DNA based on the
different physical properties of a

239
00:19:46,000 --> 00:19:50,000
small circle versus big chromosome.
But, good question. Otherwise, how

240
00:19:50,000 --> 00:19:55,000
would I get that plasmid out?
But it turns out, you can purify

241
00:19:55,000 --> 00:20:00,000
plasmids. Good question. OK,
so now, let's try another one.

242
00:20:00,000 --> 00:20:05,000
Next cloning expedition: we're
going to go to the library,

243
00:20:05,000 --> 00:20:10,000
and we want to withdraw
a volume from the library.

244
00:20:10,000 --> 00:20:15,000
And, I want now, instead of
bacteria that can't make arginine,

245
00:20:15,000 --> 00:20:20,000
let's go with human DNA.
Let's try human DNA. And,

246
00:20:20,000 --> 00:20:25,000
I would like you to now please find
the gene that encodes beta-globin.

247
00:20:25,000 --> 00:20:30,000
Beta-globin, of course, is one
of the two proteins in hemoglobin.

248
00:20:30,000 --> 00:20:34,000
Hemoglobin is a tetramer. It
has alpha-globin and beta-globin.

249
00:20:34,000 --> 00:20:39,000
This tetramer is the oxygen
carrier in your blood.

250
00:20:39,000 --> 00:20:43,000
It carries oxygen. Beta-globin
happens to be the site

251
00:20:43,000 --> 00:20:48,000
of some very important mutations.
We know that sickle cell anemia is

252
00:20:48,000 --> 00:20:52,000
caused by mutations in beta-globin.
We know that diseases like

253
00:20:52,000 --> 00:20:57,000
thalassemia are caused by
mutations in beta-globin.

254
00:20:57,000 --> 00:21:01,000
And, people knew this before they
had recombinant DNA because they

255
00:21:01,000 --> 00:21:06,000
could study red blood cells.
There's lots of beta-globin in red

256
00:21:06,000 --> 00:21:10,000
blood cells. They could see that
something was funny about the

257
00:21:10,000 --> 00:21:14,000
protein. They could even see that
in sickle cell anemia the protein

258
00:21:14,000 --> 00:21:19,000
had a different net charge,
and it would run differently.

259
00:21:19,000 --> 00:21:23,000
So, they knew something was funny
with the beta-globin protein.

260
00:21:23,000 --> 00:21:27,000
All I want you to do now
is clone beta-globin for me.

261
00:21:27,000 --> 00:21:32,000
Could we do the
same thing? Why not?

262
00:21:32,000 --> 00:21:40,000
Bacteria don't make beta-globin.
So, what can we do? Well, we could

263
00:21:40,000 --> 00:21:49,000
make a library of human DNA.
And, we could throw it into the

264
00:21:49,000 --> 00:21:58,000
bacteria. So, why don't
we just select for a

265
00:21:58,000 --> 00:22:05,000
bacteria that makes
beta-globin? Could we do that?

266
00:22:05,000 --> 00:22:11,000
I don't know, how? Do
you see how? How would we

267
00:22:11,000 --> 00:22:16,000
select for that? I mean,
there, we could see who

268
00:22:16,000 --> 00:22:21,000
grows without arginine. But
how are we going to tell which

269
00:22:21,000 --> 00:22:27,000
bacteria has picked up
beta-globin? I don't know. Yeah? Use

270
00:22:27,000 --> 00:22:32,000
mammals. We could take
a mouse that did not

271
00:22:32,000 --> 00:22:37,000
make beta-globin, a
mouse that had, say,

272
00:22:37,000 --> 00:22:41,000
thalassemia, isolate a naturally
occurring mouse with a defect in

273
00:22:41,000 --> 00:22:46,000
beta-globin. Then, do
injections of plasmids into mouse

274
00:22:46,000 --> 00:22:51,000
eggs, grow up the mouse eggs
by implanting them back into

275
00:22:51,000 --> 00:22:55,000
pseudo-pregnant females, do
this for 10^8 individual plasmids

276
00:22:55,000 --> 00:23:00,000
with 10^8 individual mice,
and look for the mouse that is

277
00:23:00,000 --> 00:23:04,000
rescued. Intellectually,

278
00:23:04,000 --> 00:23:08,000
you're absolutely right, it
works. So, that's exactly the

279
00:23:08,000 --> 00:23:12,000
cloning by complementation
we talked about for bacteria,

280
00:23:12,000 --> 00:23:16,000
and you're dead-on right. That
would work. Getting it funded

281
00:23:16,000 --> 00:23:19,000
is another matter because it's a
hugely expensive experiment to shoot

282
00:23:19,000 --> 00:23:23,000
up each egg with this,
but it could work. So,

283
00:23:23,000 --> 00:23:27,000
we need another solution because
we can't rescue the function in mice

284
00:23:27,000 --> 00:23:31,000
because it's just not
practical to do so.

285
00:23:31,000 --> 00:23:35,000
Of course, if we could do this in
mouse cells, maybe we could make it

286
00:23:35,000 --> 00:23:40,000
work in cell culture in mice.
But, let's suppose we don't have a

287
00:23:40,000 --> 00:23:44,000
cell culture phenotype. We
just have an organism phenotype.

288
00:23:44,000 --> 00:23:49,000
So, it's not going to work to
just do this by complementation.

289
00:23:49,000 --> 00:23:53,000
But, good thinking guys. This is
good. So, next trick we might have

290
00:23:53,000 --> 00:23:58,000
at our disposal is suppose because
beta-globin is so abundant in red

291
00:23:58,000 --> 00:24:02,000
blood cells we have purified
beta-globin, and we've done amino

292
00:24:02,000 --> 00:24:09,000
acid sequencing of the
protein. By Edman degradation,

293
00:24:09,000 --> 00:24:17,000
you can work out the sequence of
globin. And, you can learn that

294
00:24:17,000 --> 00:24:25,000
beta-globin has, here
at its amino terminal,

295
00:24:25,000 --> 00:24:33,000
val, leu, ser, pro, ala, asp,
lys, threonine dot, dot, dot, dot,

296
00:24:33,000 --> 00:24:41,000
dot off to the
carboxy terminal, OK?

297
00:24:41,000 --> 00:24:46,000
If I knew that this was the amino
acid sequence of the beginning,

298
00:24:46,000 --> 00:24:51,000
just the beginning of beta-globin,
couldn't I figure out what that

299
00:24:51,000 --> 00:24:57,000
initial portion of the
DNA sequence must be?

300
00:24:57,000 --> 00:25:01,000
Wouldn't this give me a clue?
If I knew a little bit of the

301
00:25:01,000 --> 00:25:05,000
protein sequence, wouldn't
this give me a clue about

302
00:25:05,000 --> 00:25:09,000
the nucleotide sequence that must be
there in the human genome to encode

303
00:25:09,000 --> 00:25:13,000
this protein? So, a
biochemist has purified the

304
00:25:13,000 --> 00:25:17,000
protein. Biochemists have studied
the protein well enough to know some

305
00:25:17,000 --> 00:25:21,000
of its amino acid sequence. Can
I infer the DNA sequence from

306
00:25:21,000 --> 00:25:25,000
the amino acid sequence, or at
least a little snippet of it?

307
00:25:25,000 --> 00:25:30,000
Sorry? Multiple possibilities,

308
00:25:30,000 --> 00:25:35,000
but an infinite number? No.
How do you encode valine? Well,

309
00:25:35,000 --> 00:25:40,000
GT something; something
could be actually A, T,

310
00:25:40,000 --> 00:25:45,000
C, or G. What about leucine?
Well, it's either a T and a C,

311
00:25:45,000 --> 00:25:50,000
or is T in the first place?
There's always a T there.

312
00:25:50,000 --> 00:25:55,000
There you go, and it can be
either of those. There's a T,

313
00:25:55,000 --> 00:26:01,000
C, anything, or an
A, G, and a T, or a C.

314
00:26:01,000 --> 00:26:07,000
Here, we have C, C
anything. Here we have a G,

315
00:26:07,000 --> 00:26:13,000
C anything. We have a G, A, T, or
a C. For lysine it's an A, an A,

316
00:26:13,000 --> 00:26:19,000
either an A or a G.
Here, it's an A, a C,

317
00:26:19,000 --> 00:26:25,000
an anything. Here, it's an A,
an A, a T, or a C, here a G, a T,

318
00:26:25,000 --> 00:26:31,000
anything, an A, an A, A or
a G. You're right. There are

319
00:26:31,000 --> 00:26:36,000
multiple possibilities. But,
it's not an infinite number,

320
00:26:36,000 --> 00:26:41,000
right? There are certain possible
DNA sequences that might be encoded

321
00:26:41,000 --> 00:26:46,000
here. If I just work it out, it's
either two choices here. There

322
00:26:46,000 --> 00:26:52,000
are four choices here.
There's two choices here.

323
00:26:52,000 --> 00:26:57,000
There's four choices here.
There's two choices here, two

324
00:26:57,000 --> 00:27:02,000
choices, etc. If
I just look at,

325
00:27:02,000 --> 00:27:08,000
let's take a segment of this.
Let's try one, two, three, these

326
00:27:08,000 --> 00:27:14,000
six amino acids.
Four choices here,

327
00:27:14,000 --> 00:27:20,000
how many possible DNA sequences
could encode these six amino acids

328
00:27:20,000 --> 00:27:26,000
in this order? Four times
four times two times two

329
00:27:26,000 --> 00:27:32,000
times four times
two, what is that?

330
00:27:32,000 --> 00:27:38,000
256, let's see,
two, two, to the two,

331
00:27:38,000 --> 00:27:44,000
to the four, to the five, to
the six, to the seven, eight,

332
00:27:44,000 --> 00:27:50,000
512. I think it's
about 512 possibilities.

333
00:27:50,000 --> 00:27:56,000
So, 512 possible nucleotide
sequences could work here.

334
00:27:56,000 --> 00:28:02,000
Well, 512's not infinite.
There's 18 bases of sequence,

335
00:28:02,000 --> 00:28:09,000
512 possible 18 base
long nucleotide sequences.
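
The counting argument above can be checked with a short sketch. It assumes the six residues with 4, 4, 2, 2, 4 and 2 codon choices are Pro, Ala, Asp, Lys, Thr and Asn, which is an inference from the board work rather than something the lecture states outright. The codon lists come from the standard genetic code; `count_encodings` and `all_encodings` are hypothetical helper names.

```python
from itertools import product
from math import prod

# Standard genetic code for the six residues used here (DNA codons, coding strand).
CODONS = {
    "Pro": ["CCA", "CCC", "CCG", "CCT"],
    "Ala": ["GCA", "GCC", "GCG", "GCT"],
    "Asp": ["GAC", "GAT"],
    "Lys": ["AAA", "AAG"],
    "Thr": ["ACA", "ACC", "ACG", "ACT"],
    "Asn": ["AAC", "AAT"],
}

# Assumed segment matching the 4 x 4 x 2 x 2 x 4 x 2 pattern from the lecture.
PEPTIDE = ["Pro", "Ala", "Asp", "Lys", "Thr", "Asn"]

def count_encodings(peptide):
    """Multiply the number of codon choices at each position."""
    return prod(len(CODONS[aa]) for aa in peptide)

def all_encodings(peptide):
    """Enumerate every DNA sequence that could encode the peptide."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]

candidates = all_encodings(PEPTIDE)
print(count_encodings(PEPTIDE), len(candidates), len(candidates[0]))
```

The enumeration agrees with the multiplication: 512 candidate sequences, each 18 bases long.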

336
00:28:09,000 --> 00:28:14,000
Just suppose that you knew which
one it was. Now, you have to suspend

337
00:28:14,000 --> 00:28:19,000
your disbelief for a second.
I'm not going to tell you how you

338
00:28:19,000 --> 00:28:24,000
might know, but suppose you
knew which of the 512 it was.

339
00:28:24,000 --> 00:28:29,000
OK, could we use that little fact
of knowing a stretch from about 18

340
00:28:29,000 --> 00:28:35,000
bases of the sequence
to find the clone?

341
00:28:35,000 --> 00:28:39,000
How could we find that clone in our
library that has that 18 bases of

342
00:28:39,000 --> 00:28:43,000
sequence? Google.
[LAUGHTER] And, of course,

343
00:28:43,000 --> 00:28:47,000
you are totally right
because as we'll come back to,

344
00:28:47,000 --> 00:28:51,000
that is the way you would do it
today if it's the human genome

345
00:28:51,000 --> 00:28:55,000
because the entire sequence of
the human genome's on the web.

346
00:28:55,000 --> 00:29:00,000
But, you might have an organism
where it's not on the web.

347
00:29:00,000 --> 00:29:04,000
But, we'll come back because, of
course, the human genome project

348
00:29:04,000 --> 00:29:09,000
changes everything as to
how you would approach this.

349
00:29:09,000 --> 00:29:13,000
Google is how you would do it today.
But, in the absence of Google or

350
00:29:13,000 --> 00:29:18,000
the absence of the entire
sequence of the human genome,

351
00:29:18,000 --> 00:29:23,000
but I'm glad you raise it
because it's absolutely right,

352
00:29:23,000 --> 00:29:27,000
how could I find the clone that has
that specific 18 base pair sequence?

353
00:29:27,000 --> 00:29:33,000
Who has my 18 base sequence?
Well, here's a trick.

354
00:29:33,000 --> 00:29:41,000
I could chemically synthesize an
oligonucleotide that matches my

355
00:29:41,000 --> 00:29:48,000
sequence: an 18 base pair long
oligonucleotide encoding my sequence.

356
00:29:48,000 --> 00:29:56,000
What I'd like to do is use this
oligonucleotide as a chemical probe

357
00:29:56,000 --> 00:30:02,000
to wash over my library. And,
by washing it over my library,

358
00:30:02,000 --> 00:30:07,000
I'd like to see where it sticks.
Now, that's kind of interesting.

359
00:30:07,000 --> 00:30:12,000
What do I mean by that? What I'd
really like to do would be to kind

360
00:30:12,000 --> 00:30:18,000
of crack open all the cells of my
library, and then the DNA would be

361
00:30:18,000 --> 00:30:23,000
sitting there. And,
I'd like to take my

362
00:30:23,000 --> 00:30:28,000
oligonucleotide probe for a little
snippet of the gene and wash it over

363
00:30:28,000 --> 00:30:33,000
the library. And then,
by the amazing powers of

364
00:30:33,000 --> 00:30:39,000
Crick and Watson base pairing, it
should stick to the right place.
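
As a rough computational analogy for "washing a probe over the library", the sketch below treats each clone's insert as a string and asks whether the probe could base-pair with either strand. The clone names, the insert sequences, and the probe are all made up for illustration. Real hybridization also tolerates mismatches and depends on temperature and salt, which this ignores.

```python
# Watson-Crick pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizes(probe, insert):
    """True if the probe can pair with one strand of the insert.

    The insert is stored as its top strand. The probe pairs with the
    bottom strand when the probe sequence occurs in the top strand, and
    with the top strand when its reverse complement occurs there.
    """
    return probe in insert or reverse_complement(probe) in insert


# Hypothetical 18-base probe and toy library (assumptions for illustration).
probe = "CCTGCAGATAAAGTTAAC"
library = {
    "clone_1": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "clone_2": "TTGACG" + probe + "GGATCA",
    "clone_3": "ACGT" + reverse_complement(probe) + "TTAA",
}

positives = [name for name, insert in library.items() if hybridizes(probe, insert)]
print(positives)
```

Checking both orientations matters because a fragment can be ligated into the vector either way around.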

365
00:30:39,000 --> 00:30:44,000
Could it do that? Turns out
DNA, given time to wash around,

366
00:30:44,000 --> 00:30:49,000
will stick to its own complement.
So that's the idea. How in the

367
00:30:49,000 --> 00:30:55,000
world do I do this in practice?
So, here's what you do in practice.

368
00:30:55,000 --> 00:31:00,000
In practice,
let us grow our

369
00:31:00,000 --> 00:31:06,000
bacteria. Let's plate the bacteria
on an agar plate on which we have

370
00:31:06,000 --> 00:31:12,000
put a membrane, a nitrocellulose
filter or some other kind of filter.

371
00:31:12,000 --> 00:31:18,000
Just imagine it being a
piece of filter paper. And,

372
00:31:18,000 --> 00:31:24,000
I'm going to plate my bacteria
on the filter paper that's here.

373
00:31:24,000 --> 00:31:30,000
I'll let them grow up because
there's nutrients here.

374
00:31:30,000 --> 00:31:35,000
The nutrients diffuse through
the filter paper. And then,

375
00:31:35,000 --> 00:31:40,000
I have a piece of filter paper
that I can pick up with my tweezers,

376
00:31:40,000 --> 00:31:45,000
and on that filter paper are
bacterial colonies growing.

377
00:31:45,000 --> 00:31:50,000
So, this is a filter. Then, what
I'm going to do is I'm going to

378
00:31:50,000 --> 00:31:55,000
take this filter with these
glistening bacterial colonies,

379
00:31:55,000 --> 00:32:00,000
and I'm going to stick
it in the autoclave.

380
00:32:00,000 --> 00:32:04,000
And, I'm going to heat it up
in the presence of wet heat,

381
00:32:04,000 --> 00:32:09,000
and the bacterial cells will crack
open. And, under these conditions,

382
00:32:09,000 --> 00:32:13,000
the DNA will tend to stick to
the filter because I've picked the

383
00:32:13,000 --> 00:32:18,000
filter that the DNA tends to stick
to. And, I'm going to wash this

384
00:32:18,000 --> 00:32:23,000
filter in a certain way that all the
usual junk, some of the proteins and

385
00:32:23,000 --> 00:32:27,000
cell surface junk washes off.
And, the DNA from each bacterial

386
00:32:27,000 --> 00:32:33,000
colony will stick. So now,
I have the DNA from each

387
00:32:33,000 --> 00:32:39,000
colony sticking to that spot.
Then, what I'm going to do is I'm

388
00:32:39,000 --> 00:32:45,000
going to take my filter and
I'm going to add my oligo probe.

389
00:32:45,000 --> 00:32:51,000
This thing is now called a probe.
I'm going to add the probe to the

390
00:32:51,000 --> 00:32:57,000
filter, and I'm going to put
this in a, I need some sort of a

391
00:32:57,000 --> 00:33:03,000
hybridization device in which the
probe and the oligonucleotide and a

392
00:33:03,000 --> 00:33:07,000
little water can swish around.
And here, we use a technical device

393
00:33:07,000 --> 00:33:11,000
called a baggy, or
some other kind of,

394
00:33:11,000 --> 00:33:15,000
basically, a Ziploc bag or you can
heat seal it or something like a

395
00:33:15,000 --> 00:33:18,000
Seal-a-Meal. In fact, that's
actually what's used in the lab is

396
00:33:18,000 --> 00:33:22,000
Seal-a-Meal. You get
these Seal-a-Meal bags,

397
00:33:22,000 --> 00:33:26,000
you toss your filter in, you squirt
a little bit of your probe in,

398
00:33:26,000 --> 00:33:30,000
and you put it in the
Seal-a-Meal bag, and then you put

399
00:33:30,000 --> 00:33:34,000
it in a water bath. And,
it swishes back and forth.

400
00:33:34,000 --> 00:33:40,000
And, the probe just goes
washing all over the place.

401
00:33:40,000 --> 00:33:46,000
And, wherever the probe finds its
corresponding cognate sequence by

402
00:33:46,000 --> 00:33:51,000
Crick and Watson, it'll
stick. And there you go.

403
00:33:51,000 --> 00:33:57,000
That clone contains your sequence.
Now, we have a few problems here,

404
00:33:57,000 --> 00:34:03,000
don't we? What are some of
the problems with this? Yeah?

405
00:34:03,000 --> 00:34:07,000
Sorry, what if it sticks what?
So, the probe, I thought this

406
00:34:07,000 --> 00:34:12,000
filter likes DNA. So, why
won't the probe just stick

407
00:34:12,000 --> 00:34:17,000
nonspecifically everywhere?
We treat it in some way so that

408
00:34:17,000 --> 00:34:22,000
after we've got the DNA adhering
to it it's now not going to stick

409
00:34:22,000 --> 00:34:27,000
everywhere. Good,
next problem. Well,

410
00:34:27,000 --> 00:34:31,000
even before that, yes? No,
we'll take the whole library.

411
00:34:31,000 --> 00:34:35,000
We've gotten the library
scattered out on this filter.

412
00:34:35,000 --> 00:34:39,000
Good, so hang on to that
one for a second. First off,

413
00:34:39,000 --> 00:34:42,000
do we even know where that clone is?
How did we know where the piece of

414
00:34:42,000 --> 00:34:46,000
DNA stuck? I mean, I
drew it as red. But,

415
00:34:46,000 --> 00:34:50,000
how do we know where that
red spot is? Yeah? Oh yeah,

416
00:34:50,000 --> 00:34:53,000
you see the problem is if
I just wash it over there,

417
00:34:53,000 --> 00:34:57,000
unless you have, you know,
Superman vision, you're not going to

418
00:34:57,000 --> 00:35:01,000
know where that probe is. So,
you're proposing, the first

419
00:35:01,000 --> 00:35:05,000
thing we better do is
radioactively label the probe.

420
00:35:05,000 --> 00:35:08,000
So, let's put a radioactive
label on the probe, OK?

421
00:35:08,000 --> 00:35:12,000
Radiolabel, and it turns out you
can radiolabel probes by using

422
00:35:12,000 --> 00:35:15,000
these enzymes that can add a
radioactive phosphate group,

423
00:35:15,000 --> 00:35:19,000
etc. So, now, when it's
radioactive, we put it here.

424
00:35:19,000 --> 00:35:22,000
And now we have a radioactive
signal here. How are we going to

425
00:35:22,000 --> 00:35:26,000
find our radioactive signal? We
put it up against x-ray films.

426
00:35:26,000 --> 00:35:30,000
We take our filter.
We dry it off.

427
00:35:30,000 --> 00:35:33,000
We slap it onto a piece of x-ray
film. We let it expose overnight.

428
00:35:33,000 --> 00:35:36,000
We develop the x-ray film.
And, we'll see a black dot.

429
00:35:36,000 --> 00:35:39,000
We'd better actually have
taken some care to take a little

430
00:35:39,000 --> 00:35:43,000
radioactive pen and make a couple
of fiducial marks around the corners.

431
00:35:43,000 --> 00:35:46,000
Otherwise, we're not going to know
where our black dot corresponds to.

432
00:35:46,000 --> 00:35:49,000
But, assume we've made a couple of
dots and we know how to line up our

433
00:35:49,000 --> 00:35:53,000
x-ray film to our filter.
Now, we go back to our filter.

434
00:35:53,000 --> 00:35:56,000
We say, uh-huh, there is a black
dot corresponding to the location of

435
00:35:56,000 --> 00:36:00,000
the radioactive
probe right there.

436
00:36:00,000 --> 00:36:06,000
That was, as you said, where
the colony used to be that we

437
00:36:06,000 --> 00:36:12,000
wished we still had [LAUGHTER]
because we cooked it in the

438
00:36:12,000 --> 00:36:19,000
autoclave, which is too bad.
So, what should we do about that?

439
00:36:19,000 --> 00:36:25,000
Yep? So, if I did it one colony at
a time, I would know exactly which

440
00:36:25,000 --> 00:36:32,000
one it came from. But,
it could take a long time.

441
00:36:32,000 --> 00:36:35,000
Sorry? So, plate it first
onto a plate of agar.

442
00:36:35,000 --> 00:36:39,000
Take a filter, and press
the filter up against the

443
00:36:39,000 --> 00:36:43,000
plate and make a copy of it.
Replica plate that. It turns

444
00:36:43,000 --> 00:36:46,000
out, that'll work. There
are two different approaches

445
00:36:46,000 --> 00:36:50,000
and both of you were right. One
approach is to replica plate it.

446
00:36:50,000 --> 00:36:54,000
Plate it first on a normal plate,
and lay a piece of filter on top of

447
00:36:54,000 --> 00:36:58,000
it, and a little bacteria will
stick in the same patterns.

448
00:36:58,000 --> 00:37:01,000
Peel it off, and you now
have it. Alternatively,

449
00:37:01,000 --> 00:37:05,000
now in the presence of robotics,
you can use a robot to take these

450
00:37:05,000 --> 00:37:08,000
colonies into microtiter plates,
and you can screen the individual

451
00:37:08,000 --> 00:37:12,000
wells by stamping them onto
a filter, things like that.

452
00:37:12,000 --> 00:37:15,000
And frankly, that's how we do
it now. If you want to screen the

453
00:37:15,000 --> 00:37:19,000
human genome, at least set up a
library with a few tens of thousands

454
00:37:19,000 --> 00:37:23,000
or hundreds of thousands such things.
And, we can read off from a grid

455
00:37:23,000 --> 00:37:26,000
which one it was, and
we go back to our master

456
00:37:26,000 --> 00:37:30,000
microtiter plates where we have.
But, either way, we need to have a

457
00:37:30,000 --> 00:37:34,000
living copy of the library.
But, that's how you do it.

458
00:37:34,000 --> 00:37:39,000
So now, we're in business.
We have a living copy of the

459
00:37:39,000 --> 00:37:43,000
library. We make a
filter containing that.

460
00:37:43,000 --> 00:37:48,000
We cook the filter in the autoclave.
We add a radioactive probe.

461
00:37:48,000 --> 00:37:53,000
Wherever it sticks, it matches by
the wonders of Crick-Watson base

462
00:37:53,000 --> 00:37:58,000
pairing. We're in business. Yes?
So now, there was this issue.

463
00:37:58,000 --> 00:38:03,000
I mean, how do I know that that
sequence doesn't appear multiple

464
00:38:03,000 --> 00:38:08,000
times in the human genome?
That's one issue. So, I'm going to

465
00:38:08,000 --> 00:38:13,000
have to pull out each of the
positive hits I get and check it out.

466
00:38:13,000 --> 00:38:18,000
I'm going to have to analyze the
clone because just knowing that it

467
00:38:18,000 --> 00:38:23,000
hybridized to that might not
tell me it's the beta-globin gene,

468
00:38:23,000 --> 00:38:28,000
but at least it's probably a good
start, right? I've narrowed it down.

469
00:38:28,000 --> 00:38:33,000
But, yes? Wait a second, right. We
said there were 512 possibilities,

470
00:38:33,000 --> 00:38:39,000
and I said, bear with me, let's
suppose we knew which one it

471
00:38:39,000 --> 00:38:45,000
was and we used it. Well,
how are we going to know

472
00:38:45,000 --> 00:38:51,000
which one it is? Well,
we could do the experiment

473
00:38:51,000 --> 00:38:57,000
512 times, and one of them
would work. That's lousy.

474
00:38:57,000 --> 00:39:03,000
We could go and make 512 oligos
and simultaneously throw them in the

475
00:39:03,000 --> 00:39:07,000
same seal-a-meal bag.
That actually works.

476
00:39:07,000 --> 00:39:10,000
How do you make 512 oligos?
How do you make an oligo, by the

477
00:39:10,000 --> 00:39:13,000
way? To make an oligonucleotide,
there's very fancy chemistry that's

478
00:39:13,000 --> 00:39:16,000
been developed, for which
someone won a Nobel Prize.

479
00:39:16,000 --> 00:39:20,000
Nowadays, of course, if you need
an oligo made, how do you do it?

480
00:39:20,000 --> 00:39:23,000
Go to the catalog, that's right.
In fact, you can go on the web,

481
00:39:23,000 --> 00:39:26,000
type in the sequence you want, and
there's a machine that will make

482
00:39:26,000 --> 00:39:29,000
it. You can have it tomorrow.
So, it turns out, that's how you

483
00:39:29,000 --> 00:39:32,000
make oligonucleotides today.
There are good machines for it.

484
00:39:32,000 --> 00:39:36,000
And, it turns out that if you
wanted to, so what you do is you

485
00:39:36,000 --> 00:39:40,000
type into the computer the following.
You type in, please make me an

486
00:39:40,000 --> 00:39:44,000
oligo that starts, put
a C in the first position,

487
00:39:44,000 --> 00:39:47,000
a C in the second position. And,
what are you going to put in the

488
00:39:47,000 --> 00:39:51,000
third position? Just tell
the computer to put in a

489
00:39:51,000 --> 00:39:55,000
random mix of all four.
Then, a G in this position,

490
00:39:55,000 --> 00:39:59,000
a C in that position, and
then a random mix of all four.

491
00:39:59,000 --> 00:40:03,000
Then, put in a G and an A, and
then put in a 50/50 mix of T and

492
00:40:03,000 --> 00:40:06,000
A. In fact, in
one synthesis,

493
00:40:06,000 --> 00:40:09,000
by telling the computer to just
add a mixture at certain steps,

494
00:40:09,000 --> 00:40:12,000
it'll simultaneously synthesize a
mixture of all 512 possibilities for

495
00:40:12,000 --> 00:40:16,000
you. So actually, a single
synthesis will suffice to

496
00:40:16,000 --> 00:40:19,000
get a mixture of 512. You
take your mixture of 512,

497
00:40:19,000 --> 00:40:22,000
wash it over the filter, etc. Now,
your point still stands. How do we

498
00:40:22,000 --> 00:40:25,000
know that there's not something
else in the genome that has this,

499
00:40:25,000 --> 00:40:28,000
etc.? But at least we can find all
the specific positives associated

500
00:40:28,000 --> 00:40:31,000
with this, and we can analyze them
further as we'll talk about next

501
00:40:31,000 --> 00:40:35,000
time more about how you
actually analyze them.
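
The "mixture at certain steps" idea can be sketched by listing the bases allowed at each position and expanding every combination. Only the nine positions dictated in the lecture are included, exactly as spoken. The remaining nine positions of the 18-mer are not spelled out in the transcript, so they are left out rather than guessed.

```python
from itertools import product
from math import prod

# Bases allowed at each synthesis step, as dictated in the lecture:
# C, C, any of four, G, C, any of four, G, A, then a two-way mix.
positions = ["C", "C", "ACGT", "G", "C", "ACGT", "G", "A", "TA"]


def pool_size(positions):
    """Number of distinct oligos a single degenerate synthesis yields."""
    return prod(len(choices) for choices in positions)


def expand(positions):
    """Enumerate every oligo present in the synthesized mixture."""
    return ["".join(p) for p in product(*positions)]


pool = expand(positions)
print(pool_size(positions))  # size of the mixture for these nine positions
print(pool[:4])              # a few members of the mixture
```

Extending `positions` with the remaining codons would multiply the pool size by their degeneracies, which is how one synthesis can cover all 512 candidates.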

502
00:40:35,000 --> 00:40:38,000
And, of course, whether
18 is the right number of

503
00:40:38,000 --> 00:40:41,000
bases, or you might prefer to have
a longer probe or shorter probes,

504
00:40:41,000 --> 00:40:44,000
or two probes, these are all the
cooking tips molecular biologists

505
00:40:44,000 --> 00:40:48,000
worry about. But, given a
sequence of an amino acid

506
00:40:48,000 --> 00:40:51,000
sequence, you can infer,
although with redundancy,

507
00:40:51,000 --> 00:40:54,000
a nucleotide sequence.
Given a nucleotide sequence,

508
00:40:54,000 --> 00:40:58,000
you can make an oligonucleotide
probe. Given a nucleotide probe,

509
00:40:58,000 --> 00:41:02,000
you can wash it over the filter.
You can find the colonies that have

510
00:41:02,000 --> 00:41:07,000
it, and therefore you could
clone by hybridization.

511
00:41:07,000 --> 00:41:12,000
So, we'll call this one
cloning by hybridization,

512
00:41:12,000 --> 00:41:17,000
or cloning by sequence. OK,
now, there are other ways to do

513
00:41:17,000 --> 00:41:21,000
it, or by sequence here. Of
course, as someone correctly

514
00:41:21,000 --> 00:41:26,000
noted, if the entire sequence of
the human genome has been already

515
00:41:26,000 --> 00:41:31,000
sequenced as it has right now, if
you knew the amino acid sequence,

516
00:41:31,000 --> 00:41:36,000
you could do this hybridization not
using filters and radioactive probes,

517
00:41:36,000 --> 00:41:42,000
but just doing it in silico.
You can do it in the computer,

518
00:41:42,000 --> 00:41:50,000
and that will work as well. So now,
let's do the next one. Last cloning

519
00:41:50,000 --> 00:41:57,000
expedition: I'd like to clone the
gene for Huntington's disease or

520
00:41:57,000 --> 00:42:05,000
cystic fibrosis or something
like that. Cloning a disease gene:

521
00:42:05,000 --> 00:42:13,000
Huntington's disease is
a dominantly inherited disorder

522
00:42:13,000 --> 00:42:23,000
passed to some of the offspring. It
causes a brain degeneration that

523
00:42:23,000 --> 00:42:33,000
onsets typically in the
fifth decade of life.

524
00:42:33,000 --> 00:42:36,000
Let's clone that gene. Can
we do it by method number one,

525
00:42:36,000 --> 00:42:39,000
cloning by complementation? No,
because we don't have a bacteria

526
00:42:39,000 --> 00:42:42,000
that has Huntington's disease.
We don't have mice that have

527
00:42:42,000 --> 00:42:46,000
Huntington's disease. And,
we can't certainly shoot up

528
00:42:46,000 --> 00:42:49,000
people and try to rescue
the phenotype and all that.

529
00:42:49,000 --> 00:42:52,000
That's not going to work.
Number two, how about doing it by

530
00:42:52,000 --> 00:42:56,000
number two? Let's just get the
protein for Huntington's disease,

531
00:42:56,000 --> 00:42:59,000
get its amino acid sequence,
and then find its nucleotide

532
00:42:59,000 --> 00:43:03,000
sequence. Pretty good. What's
the protein for Huntington's

533
00:43:03,000 --> 00:43:07,000
disease? Huntase. No, it's
actually called huntingtin,

534
00:43:07,000 --> 00:43:11,000
it turns out. But, at the
time that people went off

535
00:43:11,000 --> 00:43:15,000
trying to find the gene
for Huntington's disease,

536
00:43:15,000 --> 00:43:19,000
I'm afraid they didn't know.
They had no idea what the gene was

537
00:43:19,000 --> 00:43:23,000
that caused Huntington's disease.
That was the point. They wanted to

538
00:43:23,000 --> 00:43:27,000
use molecular biology to find
the gene when they didn't even

539
00:43:27,000 --> 00:43:32,000
know the protein. So, we
can't use our method number

540
00:43:32,000 --> 00:43:37,000
two. So, how are we going to
find it? The disease does lead to

541
00:43:37,000 --> 00:43:42,000
degeneration of nervous cells.
Study nerve cells. So, we could

542
00:43:42,000 --> 00:43:47,000
take brain biopsies from patients
who have died of Huntington's

543
00:43:47,000 --> 00:43:52,000
disease, and people did that.
But, nerve cells that die, a lot of

544
00:43:52,000 --> 00:43:57,000
stuff goes on. All sorts
of proteins go wrong,

545
00:43:57,000 --> 00:44:02,000
and it's stuff. The
problem with studying tissue

546
00:44:02,000 --> 00:44:06,000
from people who have a disease
is that it's diseased tissue.

547
00:44:06,000 --> 00:44:10,000
And, just because you see something
wrong doesn't mean it's a cause

548
00:44:10,000 --> 00:44:15,000
rather than the effect of the
disease. That's why we really want

549
00:44:15,000 --> 00:44:19,000
to find the gene and find its
mutation because we know then that's

550
00:44:19,000 --> 00:44:24,000
the primary cause. But,
how are we going to do that?

551
00:44:24,000 --> 00:44:28,000
We don't know its sequence. We
can't rescue it by complementation.

552
00:44:28,000 --> 00:44:33,000
As a pure geneticist,
what can we do?

553
00:44:33,000 --> 00:44:36,000
Yeah, we know the sequence
of the human genome. So,

554
00:44:36,000 --> 00:44:40,000
we just sequence the entirety of the
genome of somebody with Huntington's

555
00:44:40,000 --> 00:44:43,000
disease and compare it to
normal. That actually may become a

556
00:44:43,000 --> 00:44:47,000
reasonable way to do things, but
the first sequence of the human

557
00:44:47,000 --> 00:44:50,000
genome cost a couple of billion
dollars. Doing it again would be

558
00:44:50,000 --> 00:44:54,000
cheaper. We'd spend
about $30 million or so,

559
00:44:54,000 --> 00:44:57,000
but it's pricey. Also,
there would be a lot of

560
00:44:57,000 --> 00:45:01,000
genetic variation,
just random, meaningless

561
00:45:01,000 --> 00:45:05,000
polymorphism between individuals.
The human genome differs between any

562
00:45:05,000 --> 00:45:11,000
two people by about one letter
in 1,000. So, we would see about 3

563
00:45:11,000 --> 00:45:16,000
million differences between the
person with Huntington's and the

564
00:45:16,000 --> 00:45:22,000
wild type reference sequence on
Google. We wouldn't know which one

565
00:45:22,000 --> 00:45:27,000
causes it. Suppose you have a
family tree. How could we use it?

566
00:45:27,000 --> 00:45:33,000
Compare the children
and the parents.

567
00:45:33,000 --> 00:45:37,000
That's all right. What
does a geneticist do with a

568
00:45:37,000 --> 00:45:42,000
family tree? What did Sturtevant
teach us: genetic mapping.

569
00:45:42,000 --> 00:45:47,000
Suppose we were to study a
family tree of individuals with

570
00:45:47,000 --> 00:45:52,000
Huntington's disease. And
suppose on the chromosome where

571
00:45:52,000 --> 00:45:57,000
the Huntington's disease gene lives,
we were to look at genetic markers.

572
00:45:57,000 --> 00:46:03,000
Could we do genetic
linkage analysis?

573
00:46:03,000 --> 00:46:09,000
Genetic linkage analysis that would
allow us to know that there was a

574
00:46:09,000 --> 00:46:15,000
marker here, some kind of a marker,
a DNA marker, a DNA variation that

575
00:46:15,000 --> 00:46:21,000
was co-inherited with that showed
linkage with Huntington's disease?

576
00:46:21,000 --> 00:46:27,000
We could do that just by
finding that across a family,

577
00:46:27,000 --> 00:46:33,000
there tended to be very little
genetic recombination between this

578
00:46:33,000 --> 00:46:38,000
marker and Huntington's disease.
Now, how would we know to look here?

579
00:46:38,000 --> 00:46:44,000
You wouldn't. We'd try
markers all over the genome.

580
00:46:44,000 --> 00:46:50,000
Next chromosome, next
chromosome; if we tried genetic

581
00:46:50,000 --> 00:46:56,000
variations all over the human genome,
we would eventually find that some

582
00:46:56,000 --> 00:47:02,000
genetic markers in the human genome
tended to be co-inherited along with

583
00:47:02,000 --> 00:47:07,000
Huntington's disease. It
turns out that that's enough.

584
00:47:07,000 --> 00:47:12,000
This will tell us approximately
where this unknown gene must live.

585
00:47:12,000 --> 00:47:17,000
Here's a portion of the chromosome
where the unknown Huntington's

586
00:47:17,000 --> 00:47:22,000
disease gene lives.
Here's a genetic variant,

587
00:47:22,000 --> 00:47:27,000
and here's a genetic variant, a
marker, that shows correlation.

588
00:47:27,000 --> 00:47:32,000
Maybe there's only 1% recombination
here, and 1% recombination here.

589
00:47:32,000 --> 00:47:36,000
And, that's the powerful
thing about Sturtevant's idea.

590
00:47:36,000 --> 00:47:41,000
It works in fruit flies. It
works in humans. If I have any

591
00:47:41,000 --> 00:47:45,000
genetic variation and it's 99%
correlated, or only recombines 1% of

592
00:47:45,000 --> 00:47:50,000
the time, it tells me that this
unknown gene must be nearby.
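
A minimal sketch of the counting behind that statement follows. It assumes an idealized pedigree: one affected parent whose disease-carrying chromosome is known to carry marker allele "A" (phase known), full penetrance, and every child scored correctly. The data are invented. Real linkage studies use likelihood-based statistics and must handle late onset, neither of which is modeled here.

```python
def recombination_fraction(meioses):
    """Fraction of children in whom marker and disease were separated.

    `meioses` is a list of (marker_allele, affected) pairs, one per child
    of the affected parent. Allele "A" is assumed to sit on the same
    chromosome as the disease gene, so a child is a recombinant when
    exactly one of "inherited A" and "affected" is true.
    """
    recombinants = sum(
        1 for allele, affected in meioses if (allele == "A") != affected
    )
    return recombinants / len(meioses)


# Invented data: 100 scored children, one apparent recombinant.
children = [("A", True)] * 50 + [("a", False)] * 49 + [("A", False)] * 1
theta = recombination_fraction(children)
print(theta)  # a small value suggests the marker lies close to the gene
```

A fraction near 0.5 would mean the marker is unlinked, which is why markers across the whole genome have to be tried.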

593
00:47:50,000 --> 00:47:55,000
So, I could use this genetic
marker as a DNA probe to wash over a

594
00:47:55,000 --> 00:48:00,000
library to get a big piece
of DNA from this region.

595
00:48:00,000 --> 00:48:04,000
I can take this piece of
DNA and use it as a probe,

596
00:48:04,000 --> 00:48:09,000
a radioactive probe, to get
an overlapping piece of DNA.

597
00:48:09,000 --> 00:48:13,000
I can use the end of this DNA as a
probe to wash over a library and get

598
00:48:13,000 --> 00:48:18,000
the next piece of DNA. And,
I can do the same thing here.

599
00:48:18,000 --> 00:48:22,000
Once I have any piece of DNA that's
even vaguely in the neighborhood,

600
00:48:22,000 --> 00:48:27,000
I can use it as a probe to wash over
a library and get a piece of DNA,

601
00:48:27,000 --> 00:48:31,000
use it to get the next piece,
the next piece, the next piece,

602
00:48:31,000 --> 00:48:36,000
in a process that was
called chromosomal walking.
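
The walk can be sketched as a loop: take the end of the current clone as a probe, find a clone not yet visited that contains it, and repeat. The genome string, clone boundaries, and probe length below are toy assumptions. The sketch walks in one direction only and assumes the probe sequence is unique in the library, which repeats in a real genome would violate.

```python
def walk(library, start_clone, probe_len=6, max_steps=10):
    """Return the ordered list of clone names reached from `start_clone`."""
    path = [start_clone]
    current = start_clone
    for _ in range(max_steps):
        # Use the far end of the current clone as the next probe.
        probe = library[current][-probe_len:]
        hits = [
            name
            for name, seq in library.items()
            if name not in path and probe in seq
        ]
        if not hits:
            break  # no overlapping clone left; the walk ends here
        current = hits[0]
        path.append(current)
    return path


# Toy genome and overlapping clones (hypothetical data).
genome = "ATGGCGTACCTTGAGCATTCGGACTAAGCTTGCATCCGATTGACGGTA"
library = {
    "clone_c": genome[20:36],
    "clone_a": genome[0:16],
    "clone_d": genome[30:46],
    "clone_b": genome[10:26],
}

print(walk(library, "clone_a"))
```

Each step here is a dictionary scan; at the bench each step was a full library screen, which is why the real walk was so slow.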

603
00:48:36,000 --> 00:48:40,000
That gives me a series of clones
that I know must cover the region

604
00:48:40,000 --> 00:48:45,000
for this unknown gene. I then
begin to analyze them and I

605
00:48:45,000 --> 00:48:50,000
say, let's look at some more genetic
markers, a genetic marker a little

606
00:48:50,000 --> 00:48:55,000
closer and a little
closer and a little closer.

607
00:48:55,000 --> 00:49:00,000
Which ones show perfect correlation
with Huntington's disease?

608
00:49:00,000 --> 00:49:04,000
And, that narrows me down to a small
number of clones that must contain

609
00:49:04,000 --> 00:49:09,000
the gene, even though I had no
idea in advance what that gene was.

610
00:49:09,000 --> 00:49:14,000
This is called cloning by position.
And, that's a very powerful

611
00:49:14,000 --> 00:49:19,000
technique of genetics because you
don't need to know in advance what's

612
00:49:19,000 --> 00:49:24,000
wrong with a diseased gene. You
first figure out where it is,

613
00:49:24,000 --> 00:49:29,000
and then you get the clones
to figure out what it is. So,

614
00:49:29,000 --> 00:49:33,000
this actually works. Now, the
process of getting the next

615
00:49:33,000 --> 00:49:37,000
piece, and the next clone, and
the next clone, is unbelievably

616
00:49:37,000 --> 00:49:41,000
boring and tedious. And,
for Huntington's disease,

617
00:49:41,000 --> 00:49:45,000
this process took nine years. Of
course, now, how would you do it?

618
00:49:45,000 --> 00:49:49,000
Go to the web because with all of
this process of the human genome,

619
00:49:49,000 --> 00:49:53,000
you've got all these clones
laid out already. And so,

620
00:49:53,000 --> 00:49:57,000
the work that used to take years now
is, once you have a genetic marker

621
00:49:57,000 --> 00:50:01,000
that's close to Huntington's you can
just look up all the clones in the

622
00:50:01,000 --> 00:50:05,000
neighborhood and actually all
the sequences in the neighborhood.

623
00:50:05,000 --> 00:50:09,000
So, this process has gone from nine
years to, if you had to do this again,

624
00:50:09,000 --> 00:50:14,000
you could get that region for
Huntington's disease in a couple

625
00:50:14,000 --> 00:50:18,000
weeks. Now the question is,
how do you analyze that region?

626
00:50:18,000 --> 00:50:23,000
How do you know what's in that
region? How do you know what the

627
00:50:23,000 --> 00:50:27,000
genes are that are in that region?
And that's what we'll talk about

628
00:50:27,000 --> 00:50:32,000
next time.