1
00:00:00,060 --> 00:00:01,780
The following
content is provided

2
00:00:01,780 --> 00:00:04,019
under a Creative
Commons license.

3
00:00:04,019 --> 00:00:06,870
Your support will help MIT
OpenCourseWare continue

4
00:00:06,870 --> 00:00:10,730
to offer high-quality
educational resources for free.

5
00:00:10,730 --> 00:00:13,330
To make a donation or
view additional materials

6
00:00:13,330 --> 00:00:15,780
from hundreds of
MIT courses, visit

7
00:00:15,780 --> 00:00:26,370
MIT OpenCourseWare
at ocw.mit.edu

8
00:00:26,370 --> 00:00:27,980
PROFESSOR: Thank you.

9
00:00:27,980 --> 00:00:29,700
And please feel
free to interrupt.

10
00:00:29,700 --> 00:00:33,370
I'd just as soon run this as
a discussion, if you'd like.

11
00:00:33,370 --> 00:00:34,664
Is that permitted, do you know?

12
00:00:34,664 --> 00:00:35,580
MODERATOR: Absolutely.

13
00:00:35,580 --> 00:00:38,210
PROFESSOR: OK, so
these are conflicts

14
00:00:38,210 --> 00:00:40,160
of interest for
those of you who care,

15
00:00:40,160 --> 00:00:46,650
or you can get it in more detail
here by going to this website.

16
00:00:46,650 --> 00:00:52,070
And I thought I would talk
about this topic of causality.

17
00:00:52,070 --> 00:00:54,330
You've learned quite a
bit already in this course

18
00:00:54,330 --> 00:01:01,120
about tools for analyzing
genomes from various aspects,

19
00:01:01,120 --> 00:01:03,180
but what you do after
you analyze it is you

20
00:01:03,180 --> 00:01:05,010
want to test your hypotheses.

21
00:01:05,010 --> 00:01:11,340
And this is a very richly
enabling idea, in the sense

22
00:01:11,340 --> 00:01:13,600
that you can go to very
small cohort sizes,

23
00:01:13,600 --> 00:01:16,480
as we'll see-- N of
one cohort sizes--

24
00:01:16,480 --> 00:01:20,370
and your false positives
are less of a concern

25
00:01:20,370 --> 00:01:22,690
if you have a high throughput
way of testing them.

26
00:01:22,690 --> 00:01:24,150
And so I think
it's very important

27
00:01:24,150 --> 00:01:28,060
to know the possibilities
for testing causality.

28
00:01:28,060 --> 00:01:30,620
And that gets us into
engineering genomes--

29
00:01:30,620 --> 00:01:33,520
and, in particular, about
computer-aided design.

30
00:01:33,520 --> 00:01:36,180
So you've talked about
computer-aided analysis;

31
00:01:36,180 --> 00:01:38,340
now let's talk about
computer-aided design

32
00:01:38,340 --> 00:01:43,020
of genomes, both
bacterial and human.

33
00:01:43,020 --> 00:01:45,770
So I just want to
illustrate the idea.

34
00:01:45,770 --> 00:01:50,060
You might say, well, why would
we want to design genomes?

35
00:01:50,060 --> 00:01:51,790
You can test
causality, typically,

36
00:01:51,790 --> 00:01:53,580
by changing one base pair.

37
00:01:53,580 --> 00:01:56,270
Why would you want to change
more than one base pair?

38
00:01:56,270 --> 00:01:58,200
If you have a SNP, that's great.

39
00:01:58,200 --> 00:02:00,840
Well, sometimes you have
multiple SNPs interacting

40
00:02:00,840 --> 00:02:05,740
in multigenic-- and we'll
get to humans in a moment.

41
00:02:05,740 --> 00:02:07,873
But here's a radical
example, something

42
00:02:07,873 --> 00:02:09,289
from the extreme
edge, where you'd

43
00:02:09,289 --> 00:02:11,850
want to change almost every
base pair in the genome--

44
00:02:11,850 --> 00:02:14,523
not make a copy of a
genome but actually design,

45
00:02:14,523 --> 00:02:19,660
in an intelligent way--
semi-intelligent--

46
00:02:19,660 --> 00:02:22,460
combinatorial as
well-- a genome that

47
00:02:22,460 --> 00:02:24,240
has new functions,
new properties.

48
00:02:24,240 --> 00:02:26,760
And the four functions I
submit for your consideration

49
00:02:26,760 --> 00:02:29,055
here are that you might
want to be genetically

50
00:02:29,055 --> 00:02:33,040
and metabolically
isolated for safety

51
00:02:33,040 --> 00:02:37,180
reasons or public
relations reasons or both.

52
00:02:37,180 --> 00:02:39,150
You want to have new
chemistry, new protein

53
00:02:39,150 --> 00:02:40,830
chemistry, new amino acids.

54
00:02:40,830 --> 00:02:44,590
And finally, you want to
have multi-virus resistance.

55
00:02:44,590 --> 00:02:47,250
This is probably the most
powerful of the four,

56
00:02:47,250 --> 00:02:50,540
where imagine that you have
an organism-- whether it's

57
00:02:50,540 --> 00:02:53,940
industrial, agricultural,
or even human-- that

58
00:02:53,940 --> 00:02:58,010
was resistant to all viruses,
past and present-- even ones

59
00:02:58,010 --> 00:02:59,995
you haven't analyzed.

60
00:02:59,995 --> 00:03:00,870
So how do we do this?

61
00:03:00,870 --> 00:03:02,390
How do we get new functionality?

62
00:03:02,390 --> 00:03:06,330
How do we design a genome in
such a way that it doesn't break?

63
00:03:06,330 --> 00:03:09,390
Because if you change
the genome enough,

64
00:03:09,390 --> 00:03:10,819
you get your comeuppance.

65
00:03:10,819 --> 00:03:13,110
You learn you don't know as
much as you think you know.

66
00:03:13,110 --> 00:03:16,330
You have your beautiful computer
simulations from your analysis,

67
00:03:16,330 --> 00:03:21,030
and as soon as you test them,
you start getting surprises.

68
00:03:21,030 --> 00:03:26,530
So anyway, I'm going to focus
on this process of designing

69
00:03:26,530 --> 00:03:29,360
and building and then testing.

70
00:03:29,360 --> 00:03:30,860
And then, so this
part of the design

71
00:03:30,860 --> 00:03:32,276
has to have an
analytic component.

72
00:03:32,276 --> 00:03:35,160
So we'll get back to your
old friends in analytics.

73
00:03:35,160 --> 00:03:39,690
So as I go down this list, maybe
just show of hands of how many

74
00:03:39,690 --> 00:03:44,670
have been exposed to these
computational tools already.

75
00:03:44,670 --> 00:03:47,450
So Bowtie, anybody?

76
00:03:47,450 --> 00:03:48,150
OK, good.

77
00:03:48,150 --> 00:03:51,400
See, you covered that, so
I don't need to cover that.

78
00:03:51,400 --> 00:03:54,150
Number two-- no?

79
00:03:54,150 --> 00:03:54,910
Some?

80
00:03:54,910 --> 00:03:57,300
SnpEff?

81
00:03:57,300 --> 00:03:59,410
JBrowse-- SQL, you've
all heard of SQL, right?

82
00:03:59,410 --> 00:04:01,455
OK, good.

83
00:04:01,455 --> 00:04:02,940
Let's see.

84
00:04:02,940 --> 00:04:05,200
So the point is
each of these things

85
00:04:05,200 --> 00:04:09,360
is integrated into this system
we call "Millstone," which

86
00:04:09,360 --> 00:04:12,130
is all about design
and analysis.

87
00:04:12,130 --> 00:04:14,630
So it's this loop that goes
around and around, as you'll

88
00:04:14,630 --> 00:04:17,440
see in just a moment--
actually, you may have seen already

89
00:04:17,440 --> 00:04:19,209
back here.

90
00:04:19,209 --> 00:04:20,760
So we design it.

91
00:04:20,760 --> 00:04:21,260
We build it.

92
00:04:21,260 --> 00:04:22,110
We test it.

93
00:04:22,110 --> 00:04:22,930
And we analyze it.

94
00:04:22,930 --> 00:04:28,080
And the analysis--
sometimes when you build it,

95
00:04:28,080 --> 00:04:29,540
you build a large number.

96
00:04:29,540 --> 00:04:31,560
You build a combinatorial set.

97
00:04:31,560 --> 00:04:35,530
So this is something
that's fairly unique

98
00:04:35,530 --> 00:04:37,550
to biological
engineering-- or even

99
00:04:37,550 --> 00:04:39,710
to certain branches of
biological engineering--

100
00:04:39,710 --> 00:04:43,720
that you don't see every
day in civil engineering

101
00:04:43,720 --> 00:04:45,340
or aeronautics.

102
00:04:45,340 --> 00:04:49,970
You don't build a
trillion different 787s

103
00:04:49,970 --> 00:04:53,720
and see which one
works the best.

104
00:04:53,720 --> 00:04:54,840
But you can in biology.

105
00:04:54,840 --> 00:04:56,507
And I'll give you
some examples of that.

106
00:04:56,507 --> 00:04:58,131
And part of the reason
we could do this

107
00:04:58,131 --> 00:05:00,170
is just as there's
next-generation sequencing,

108
00:05:00,170 --> 00:05:04,080
which you've heard
about in this course--

109
00:05:04,080 --> 00:05:07,430
and we were also involved
in next-generation synthesis

110
00:05:07,430 --> 00:05:11,040
and next-generation inserting
synthetic DNA into genomes.

111
00:05:11,040 --> 00:05:12,357
And you'll see all about that.

112
00:05:12,357 --> 00:05:13,940
There are four
different ways of doing

113
00:05:13,940 --> 00:05:20,120
next-generation
synthesis, and it's not

114
00:05:20,120 --> 00:05:22,602
important for this
particular class.

115
00:05:22,602 --> 00:05:24,810
And there are various ways
of doing error correction.

116
00:05:24,810 --> 00:05:27,645
And these are kind of analogous
to the kind of error correction

117
00:05:27,645 --> 00:05:31,980
that you have in electronics
and computational systems,

118
00:05:31,980 --> 00:05:35,100
but we won't stress
that analogy too much.

119
00:05:35,100 --> 00:05:37,360
Here's an example,
just practically,

120
00:05:37,360 --> 00:05:40,460
what you get when you build
these oligonucleotides

121
00:05:40,460 --> 00:05:41,430
on chips.

122
00:05:41,430 --> 00:05:45,520
You might get oligos up
to 300 nucleotides long.

123
00:05:45,520 --> 00:05:49,660
As they get longer, they tend to
accumulate errors a little bit

124
00:05:49,660 --> 00:05:52,570
more towards the end.

125
00:05:52,570 --> 00:05:55,360
And so you can see that with
the length, the number of errors

126
00:05:55,360 --> 00:06:01,310
goes up from 1 in 1,300
raw error rate to 1

127
00:06:01,310 --> 00:06:04,450
in 250 raw error rate.

128
00:06:04,450 --> 00:06:08,350
And then we can get rid
of some of those errors

129
00:06:08,350 --> 00:06:11,650
with an enzymatic system called
ErrASE-- it doesn't really

130
00:06:11,650 --> 00:06:12,800
matter in this case.

131
00:06:12,800 --> 00:06:14,942
We can get to 1 in 6,000
without sequencing.

132
00:06:14,942 --> 00:06:16,400
And then with
sequencing, if you're

133
00:06:16,400 --> 00:06:18,620
willing to clone
and sequence, you

134
00:06:18,620 --> 00:06:20,660
can get error rates even lower.

135
00:06:20,660 --> 00:06:24,440
And it's important to know
that fundamental limitation.

136
00:06:24,440 --> 00:06:29,910
You always need to think about
background error in computing

137
00:06:29,910 --> 00:06:32,770
as well as synthesis.

138
00:06:32,770 --> 00:06:36,900
You can now do combined
synthesis and sequencing

139
00:06:36,900 --> 00:06:40,120
very closely by making
cis-regulatory elements, which

140
00:06:40,120 --> 00:06:47,290
we did in this paper that's
published-- Sri Kosuri and Dan

141
00:06:47,290 --> 00:06:50,250
Goodman, in particular-- where
you could basically synthesize

142
00:06:50,250 --> 00:06:54,940
cis-regulatory elements in
the genome or in a plasmid.

143
00:06:54,940 --> 00:06:57,680
And then you could read out the
RNA simply by RNA sequencing.

144
00:06:57,680 --> 00:06:59,896
The number of times you see
this bar code in the RNA

145
00:06:59,896 --> 00:07:02,270
tells you how many times that
particular construct, which

146
00:07:02,270 --> 00:07:05,910
could be heavily
engineered-- it isn't like

147
00:07:05,910 --> 00:07:09,801
randomers-- you're making
interesting, cis-regulatory

148
00:07:09,801 --> 00:07:10,300
elements.

149
00:07:10,300 --> 00:07:11,590
And you can make
tens of thousands

150
00:07:11,590 --> 00:07:13,256
of these-- millions
of these constructs.

151
00:07:13,256 --> 00:07:14,380
We did tens of thousands.

152
00:07:17,060 --> 00:07:18,930
Then you can measure
protein levels

153
00:07:18,930 --> 00:07:20,180
as a result of cis-regulatory.

154
00:07:20,180 --> 00:07:22,450
So you can have
promoter elements,

155
00:07:22,450 --> 00:07:25,750
ribosome binding sites,
and coding region mutations

156
00:07:25,750 --> 00:07:30,460
that you think might
influence RNA and protein.

157
00:07:30,460 --> 00:07:33,230
And here we do
proteins by having

158
00:07:33,230 --> 00:07:35,990
two fluorescent proteins--
a red and a green.

159
00:07:35,990 --> 00:07:38,950
The red is the control, and it
has a very tight distribution,

160
00:07:38,950 --> 00:07:41,000
as you can see here.

161
00:07:41,000 --> 00:07:44,610
And then the green is subject
to these cis-regulatory mutations

162
00:07:44,610 --> 00:07:46,100
made on chips.

163
00:07:46,100 --> 00:07:47,570
And it has a big distribution.

164
00:07:47,570 --> 00:07:50,410
And you divide it up in a
fluorescence-activated sorter.

165
00:07:50,410 --> 00:07:51,650
And you can read it out.

166
00:07:51,650 --> 00:07:55,590
So here, every pixel on these
two plots for RNA and protein

167
00:07:55,590 --> 00:07:58,050
is a separate experiment.

168
00:07:58,050 --> 00:08:01,132
And you can drill down and
get some more information

169
00:08:01,132 --> 00:08:01,840
on each of these.

170
00:08:01,840 --> 00:08:04,940
But the basic idea is each
of these was individually

171
00:08:04,940 --> 00:08:10,520
synthesized on the chip and
individually sequenced later

172
00:08:10,520 --> 00:08:11,950
to determine.

173
00:08:11,950 --> 00:08:16,010
And the bar codes can be
read out in proportion

174
00:08:16,010 --> 00:08:18,302
to the RNA and
protein expression.

175
00:08:18,302 --> 00:08:20,510
And here's an example of
some surprises that come out

176
00:08:20,510 --> 00:08:27,480
of such studies-- and we're not
just doing this for our health.

177
00:08:27,480 --> 00:08:29,490
So, for example, when
we went into this,

178
00:08:29,490 --> 00:08:32,390
it was well known that
codon usage effect

179
00:08:32,390 --> 00:08:37,990
was correlated with, and could
even causally influence--

180
00:08:37,990 --> 00:08:39,990
so here's an example
of causality--

181
00:08:39,990 --> 00:08:42,390
the expression of a protein.

182
00:08:42,390 --> 00:08:47,230
If you have very commonly
used codons, which typically

183
00:08:47,230 --> 00:08:49,920
have high levels of the
corresponding transfer

184
00:08:49,920 --> 00:08:54,180
RNA in the cell, then the
observation-- and it makes

185
00:08:54,180 --> 00:08:56,770
sense-- is that
those proteins would

186
00:08:56,770 --> 00:08:58,950
be expressed at higher levels.

187
00:08:58,950 --> 00:09:00,810
The thing that was
new was we discovered

188
00:09:00,810 --> 00:09:04,370
that at the N-terminus, close
to the cis-regulatory elements,

189
00:09:04,370 --> 00:09:05,040
it flips.

190
00:09:05,040 --> 00:09:06,200
It's the opposite.

191
00:09:06,200 --> 00:09:11,190
There's almost no correlation
with abundant codons,

192
00:09:11,190 --> 00:09:13,310
and there's essentially
a negative correlation

193
00:09:13,310 --> 00:09:18,900
here with an r squared
of 0.73, right here,

194
00:09:18,900 --> 00:09:27,740
that shows that there's a
higher expression with very

195
00:09:27,740 --> 00:09:28,890
rare codons.

196
00:09:28,890 --> 00:09:30,500
This was published in Science.

197
00:09:30,500 --> 00:09:35,340
And so a lot of them
tend to be AT-rich,

198
00:09:35,340 --> 00:09:38,580
but we can separate
out that component.

199
00:09:38,580 --> 00:09:40,840
We can separate out things
like ribosome binding

200
00:09:40,840 --> 00:09:43,690
sites, which are AG-rich.

201
00:09:43,690 --> 00:09:48,540
And there's just a general
trend where rare codons help

202
00:09:48,540 --> 00:09:51,086
expression if they're at
the beginning of the gene.

203
00:09:51,086 --> 00:09:53,460
And you could find that out
from this kind of experiment.

204
00:09:57,190 --> 00:10:05,430
So now we want, if we're going
to build the genome that's

205
00:10:05,430 --> 00:10:08,360
radically different-- let's say
"radically different," here,

206
00:10:08,360 --> 00:10:14,000
defined as 7 to 13 codons
changed genome-wide, freed up--

207
00:10:14,000 --> 00:10:19,310
liberated-- meaning
that we use the synonyms

208
00:10:19,310 --> 00:10:21,360
in the genetic code.

209
00:10:24,480 --> 00:10:27,340
So there's anywhere
from one to six codons

210
00:10:27,340 --> 00:10:31,230
for each amino acid-- three
codons for stop codons.

211
00:10:31,230 --> 00:10:35,910
We can use that synonymous
substitution table

212
00:10:35,910 --> 00:10:37,590
to move things
around and completely

213
00:10:37,590 --> 00:10:39,755
free up-- get rid of
every instance of a UAG

214
00:10:39,755 --> 00:10:41,620
and turn it into UAA.

215
00:10:41,620 --> 00:10:43,180
That's the first example.

216
00:10:43,180 --> 00:10:46,300
And we did that genome-wide
and thereby derisked it.

217
00:10:46,300 --> 00:10:48,750
We can now build on
top of that, because we

218
00:10:48,750 --> 00:10:52,660
can get genomes that grow well
under a variety of conditions.

219
00:10:52,660 --> 00:10:54,640
They're still
genetically engineerable.

220
00:10:54,640 --> 00:10:57,710
And everywhere
there's a bar there,

221
00:10:57,710 --> 00:11:03,740
this refers to a
successful mutation

222
00:11:03,740 --> 00:11:05,310
and the height of
the bar refers

223
00:11:05,310 --> 00:11:07,615
to the efficiency of
introducing those mutations.

224
00:11:10,470 --> 00:11:16,500
Now we wanted to derisk another
special category-- remember,

225
00:11:16,500 --> 00:11:20,710
I said AGA and AGG are
special, in that they're

226
00:11:20,710 --> 00:11:24,480
the rarest coding codons.

227
00:11:24,480 --> 00:11:26,360
So UAG is a stop codon.

228
00:11:26,360 --> 00:11:31,432
AGA and AGG are
arginine-encoding codons.

229
00:11:31,432 --> 00:11:32,390
And they're the rarest.

230
00:11:32,390 --> 00:11:34,590
And they also are
complicated, because they

231
00:11:34,590 --> 00:11:36,660
tend to represent
Shine-Dalgarno sites, which

232
00:11:36,660 --> 00:11:40,765
tend to be AG-rich regions
that are involved in initiation

233
00:11:40,765 --> 00:11:42,960
of protein synthesis.

234
00:11:42,960 --> 00:11:46,910
Anyway, so there, the
number was a little large

235
00:11:46,910 --> 00:11:50,480
to do genome-wide, so we
focused on essential genes.

236
00:11:50,480 --> 00:11:55,000
And so you can computationally
find all the essential genes

237
00:11:55,000 --> 00:12:01,770
and design strategies for
getting all the AGG and AGAs.

238
00:12:01,770 --> 00:12:04,925
And then when you
synthesize those genomes,

239
00:12:04,925 --> 00:12:07,050
you can do them one at a
time with a process called

240
00:12:07,050 --> 00:12:10,330
MAGE, which we won't
go into-- too experimental.

241
00:12:10,330 --> 00:12:12,130
But basically, you
can essentially just

242
00:12:12,130 --> 00:12:14,280
go straight from
oligos into the genome,

243
00:12:14,280 --> 00:12:16,834
and you can do multiple
ones simultaneously.

244
00:12:16,834 --> 00:12:18,625
And you can see which
ones are hard to make

245
00:12:18,625 --> 00:12:21,350
and which ones are easy-- again,
that's the sort of efficiency

246
00:12:21,350 --> 00:12:22,352
number there.

247
00:12:22,352 --> 00:12:24,560
You can see which ones-- if
they're selected against.

248
00:12:24,560 --> 00:12:26,559
And some of them were
actually selected against.

249
00:12:26,559 --> 00:12:28,510
We could not find them.

250
00:12:28,510 --> 00:12:30,550
And so these are discoveries.

251
00:12:30,550 --> 00:12:35,480
These are examples where
synonymous is not synonymous.

252
00:12:35,480 --> 00:12:39,740
It could mean that there's
some other function, hidden,

253
00:12:39,740 --> 00:12:43,850
layered on top of
the synonyms-- might

254
00:12:43,850 --> 00:12:45,300
be a ribosome binding site.

255
00:12:45,300 --> 00:12:49,810
And so what we find is
that we can try other,

256
00:12:49,810 --> 00:12:51,900
let's say other arginine
codons, rather than

257
00:12:51,900 --> 00:12:53,440
the one we targeted.

258
00:12:53,440 --> 00:12:55,700
Or you sometimes can
try out other codons

259
00:12:55,700 --> 00:12:58,440
that are not even synonymous.

260
00:12:58,440 --> 00:13:01,460
And eventually we found
every single one of them.

261
00:13:01,460 --> 00:13:03,510
So there were about a dozen.

262
00:13:03,510 --> 00:13:05,220
They were hard at
first, and then

263
00:13:05,220 --> 00:13:08,930
we eventually found an
engineering workaround.

264
00:13:08,930 --> 00:13:12,400
And that illustrates a
number of interesting points.

265
00:13:12,400 --> 00:13:14,640
So those were all successful
in essential genes.

266
00:13:14,640 --> 00:13:17,432
And it's our
observation that if you

267
00:13:17,432 --> 00:13:18,890
get it to work for
essential genes,

268
00:13:18,890 --> 00:13:20,765
getting it to work for
the nonessential genes

269
00:13:20,765 --> 00:13:21,900
is even easier.

270
00:13:21,900 --> 00:13:25,520
So then we went on, and so
that's one codon at a time,

271
00:13:25,520 --> 00:13:26,470
two more at a time.

272
00:13:26,470 --> 00:13:29,130
So we've derisked three
codons at this point.

273
00:13:29,130 --> 00:13:35,770
So we went on to derisk all
13 codons-- or 13 of the 64.

274
00:13:35,770 --> 00:13:39,280
And we did that in an even
smaller set of genes.

275
00:13:39,280 --> 00:13:42,400
So there are 290 essential
genes in E. coli.

276
00:13:42,400 --> 00:13:44,610
We did 42.

277
00:13:44,610 --> 00:13:50,200
And in that case,
there were 400.

278
00:13:50,200 --> 00:13:53,080
And some examples of those--
and every one of them

279
00:13:53,080 --> 00:13:54,490
worked except for one.

280
00:13:54,490 --> 00:13:58,620
And just like the arginine
codons-- that one,

281
00:13:58,620 --> 00:14:00,210
we tried a number
of different codons,

282
00:14:00,210 --> 00:14:03,760
and they worked-- including
non-synonymous codons.

283
00:14:03,760 --> 00:14:08,970
So in almost every case, you
can find something that works.

284
00:14:08,970 --> 00:14:13,045
And then we do biological
assays to check that the four functions

285
00:14:13,045 --> 00:14:17,440
that we felt should be
changed were actually changed.

286
00:14:17,440 --> 00:14:22,290
And here's two slides
on the virus resistance.

287
00:14:22,290 --> 00:14:24,340
You can do it in a
variety of ways,

288
00:14:24,340 --> 00:14:29,800
determining how effective
the virus resistance is.

289
00:14:29,800 --> 00:14:32,260
Here you have about
a factor of 1,000

290
00:14:32,260 --> 00:14:36,705
for phage lambda,
which has been mutated

291
00:14:36,705 --> 00:14:39,000
to be highly
virulent in E. coli.

292
00:14:39,000 --> 00:14:42,310
This is a very pathogenic
version of phage lambda.

293
00:14:42,310 --> 00:14:46,010
This is T7, which is
naturally quite lytic.

294
00:14:46,010 --> 00:14:50,717
And you can show that this is
resistant to two of the three

295
00:14:50,717 --> 00:14:52,200
viruses that we tested.

296
00:14:52,200 --> 00:14:55,260
And our hypothesis is
if we change more codons

297
00:14:55,260 --> 00:14:57,620
than just-- that
was just one codon.

298
00:14:57,620 --> 00:15:02,090
If we change seven or so,
which is what we're doing now,

299
00:15:02,090 --> 00:15:03,840
then it will be resistant
to all viruses--

300
00:15:03,840 --> 00:15:05,730
and very heavily
resistant-- so resistant

301
00:15:05,730 --> 00:15:10,710
that the population
of viruses can't

302
00:15:10,710 --> 00:15:13,432
mutate enough to
become resistant.

303
00:15:13,432 --> 00:15:14,890
So all of you should
be questioning

304
00:15:14,890 --> 00:15:18,500
that-- do I really believe that?

305
00:15:18,500 --> 00:15:21,200
And we can talk about
that in the discussion.

306
00:15:21,200 --> 00:15:23,680
So now the other
big functionality

307
00:15:23,680 --> 00:15:28,870
is-- can we genetically,
metabolically isolate these?

308
00:15:28,870 --> 00:15:32,650
And to do this,
we took advantage

309
00:15:32,650 --> 00:15:34,765
of its new genetic code.

310
00:15:34,765 --> 00:15:37,690
Not only have we
freed up a codon, we

311
00:15:37,690 --> 00:15:41,862
can now make that codon
code for a new amino acid

312
00:15:41,862 --> 00:15:44,260
by another set of biochemistry.

313
00:15:44,260 --> 00:15:47,720
And here's some examples.

314
00:15:47,720 --> 00:15:51,030
The amino acids look kind of
like tyrosine or phenylalanine.

315
00:15:51,030 --> 00:15:54,510
Here's one that's
a biphenylalanine,

316
00:15:54,510 --> 00:15:57,330
so it's got two benzene
rings instead of one.

317
00:15:57,330 --> 00:15:58,470
And so it's bulkier.

318
00:15:58,470 --> 00:16:00,650
It's bulkier than any
other amino acid, any

319
00:16:00,650 --> 00:16:01,910
naturally occurring one.

320
00:16:01,910 --> 00:16:04,487
And we wanted to ask-- can
we make those essential genes

321
00:16:04,487 --> 00:16:06,070
that we've been
talking about-- can we

322
00:16:06,070 --> 00:16:09,080
make them addicted
to this amino acid?

323
00:16:09,080 --> 00:16:13,660
And so we did by this
computational protein design

324
00:16:13,660 --> 00:16:14,340
strategy.

325
00:16:14,340 --> 00:16:18,840
And the idea is we looked
through every crystal structure

326
00:16:18,840 --> 00:16:24,120
of every essential protein in E.
coli-- there's 129 or something

327
00:16:24,120 --> 00:16:29,950
like that, 120
crystal structures--

328
00:16:29,950 --> 00:16:32,490
and systematically
ask, were there

329
00:16:32,490 --> 00:16:37,030
any places where we could
fit in a larger amino acid

330
00:16:37,030 --> 00:16:41,300
by carving away
adjacent amino acids,

331
00:16:41,300 --> 00:16:46,560
such that when we then replace
that larger one with a smaller

332
00:16:46,560 --> 00:16:50,270
one-- still keeping its
surroundings mutated,

333
00:16:50,270 --> 00:16:55,210
so we could mutate it two,
three, four, eight times--

334
00:16:55,210 --> 00:16:56,730
however many amino
acids nearby you

335
00:16:56,730 --> 00:16:59,090
need to accommodate the
big amino acid-- if it

336
00:16:59,090 --> 00:17:01,244
no longer accommodates
the small amino acids?

337
00:17:01,244 --> 00:17:02,660
So you basically
systematically go

338
00:17:02,660 --> 00:17:08,069
through every amino acid
for every crystal structure

339
00:17:08,069 --> 00:17:11,380
and find a short list
of a half dozen or so

340
00:17:11,380 --> 00:17:12,880
that looked promising.

341
00:17:12,880 --> 00:17:16,420
And so the idea is, you put
in these two phenyl groups--

342
00:17:16,420 --> 00:17:20,210
and now, when you accommodate
it and shrink it down,

343
00:17:20,210 --> 00:17:22,203
it won't work.

344
00:17:22,203 --> 00:17:24,020
OK, that's the basic idea.

345
00:17:27,400 --> 00:17:28,920
And in context,
we wanted to have

346
00:17:28,920 --> 00:17:30,394
a really tough test for this.

347
00:17:30,394 --> 00:17:33,060
We wanted to say, not only do we
want it to be addicted to this,

348
00:17:33,060 --> 00:17:37,515
but we don't want it to be able
to escape-- either by mutation

349
00:17:37,515 --> 00:17:39,609
and evolution, we don't
want it to escape.

350
00:17:39,609 --> 00:17:41,150
We don't want it to
be able to escape

351
00:17:41,150 --> 00:17:48,960
by eating its fellow-- its
classmates-- its other E. coli.

352
00:17:48,960 --> 00:17:52,050
And so the test
we do is we do a--

353
00:17:52,050 --> 00:17:55,100
did you have a
question, anybody?

354
00:17:55,100 --> 00:18:01,165
We would lyse the cells--
lyse cells of a wild-type E.

355
00:18:01,165 --> 00:18:05,500
coli or certain
mutant strains that

356
00:18:05,500 --> 00:18:07,410
would produce large
amounts of these.

357
00:18:07,410 --> 00:18:11,550
And one of the more classic ways
of making an organism that's

358
00:18:11,550 --> 00:18:13,990
metabolically isolated so it
can't survive in the wild--

359
00:18:13,990 --> 00:18:16,680
it can only survive
in an industrial plant

360
00:18:16,680 --> 00:18:19,020
or in a laboratory--
and we did this

361
00:18:19,020 --> 00:18:22,340
with the classic ones,
which people have avoided

362
00:18:22,340 --> 00:18:24,640
using lysates, because it
gives them bad news, which

363
00:18:24,640 --> 00:18:30,116
is if you grow them on lysates,
you get a lot of survivors.

364
00:18:30,116 --> 00:18:31,240
These are the classic ones.

365
00:18:31,240 --> 00:18:37,870
The deletions of these
two genes makes them--

366
00:18:37,870 --> 00:18:39,760
they will still grow.

367
00:18:39,760 --> 00:18:45,330
But this is an example of one
of our designed, nonstandard

368
00:18:45,330 --> 00:18:47,080
amino acid strains.

369
00:18:47,080 --> 00:18:51,310
And we get much
lower escape rates.

370
00:18:51,310 --> 00:18:53,090
And you'll say, even
this low number here,

371
00:18:53,090 --> 00:18:54,680
we want to get
that down to zero.

372
00:18:54,680 --> 00:18:56,670
And you'll see how
we do that in a moment.

373
00:18:56,670 --> 00:18:59,050
This is Mike Mee as
a graduate student.

374
00:18:59,050 --> 00:19:02,720
So here's a close-up of-- this
is not of the active site.

375
00:19:02,720 --> 00:19:05,030
This just could be any
place in the protein

376
00:19:05,030 --> 00:19:08,560
where putting in a big amino
acid is going to be disruptive.

377
00:19:08,560 --> 00:19:12,890
So we change this
leucine, innocent leucine,

378
00:19:12,890 --> 00:19:15,480
that's packed all around
with other amino acids.

379
00:19:15,480 --> 00:19:19,295
Have you guys done protein
design in this class at all?

380
00:19:19,295 --> 00:19:19,795
Yeah?

381
00:19:19,795 --> 00:19:21,570
OK, so you know what
I'm talking about.

382
00:19:21,570 --> 00:19:22,910
Rosetta, right?

383
00:19:22,910 --> 00:19:23,597
OK.

384
00:19:23,597 --> 00:19:24,930
So that's what we're using here.

385
00:19:24,930 --> 00:19:27,970
We had to modify it to use
nonstandard amino acids,

386
00:19:27,970 --> 00:19:31,180
because normally people design
proteins with 20 amino acids.

387
00:19:31,180 --> 00:19:34,705
So we took this leucine--
we made it into this bipA.

388
00:19:34,705 --> 00:19:36,330
And you can see now,
it's got all kinds

389
00:19:36,330 --> 00:19:38,390
of clashes-- three
initial clashes.

390
00:19:38,390 --> 00:19:40,120
That's not good.

391
00:19:40,120 --> 00:19:44,270
So we identify those clashes
and we make them smaller--

392
00:19:44,270 --> 00:19:45,135
no clashes anymore.

393
00:19:45,135 --> 00:19:46,510
This is all done
in the computer.

394
00:19:46,510 --> 00:19:47,670
This is all theoretical.

395
00:19:47,670 --> 00:19:50,775
Can you believe that?

396
00:19:50,775 --> 00:19:51,275
We'll see.

397
00:19:53,850 --> 00:19:58,842
So then-- this is putting
back in a small amino acid.

398
00:19:58,842 --> 00:20:00,550
These are some of the
people that did it.

399
00:20:00,550 --> 00:20:05,010
So Marc and Dan are
post-docs in the lab,

400
00:20:05,010 --> 00:20:09,420
and Ryo and Barry did
the crystallography.

401
00:20:09,420 --> 00:20:10,880
I'm a crystallographer
by training,

402
00:20:10,880 --> 00:20:13,750
but I'm a little
out of practice.

403
00:20:13,750 --> 00:20:18,760
So here is the design again, and
there's the electron density.

404
00:20:18,760 --> 00:20:20,160
So now you can
believe it, right?

405
00:20:20,160 --> 00:20:21,784
Because it's not just
a computer model.

406
00:20:21,784 --> 00:20:24,900
Well, it's still a computer
model, but it's based on data.

407
00:20:24,900 --> 00:20:27,670
And here's a comparison of
the design with the X-ray

408
00:20:27,670 --> 00:20:29,640
structure-- not too shabby.

409
00:20:29,640 --> 00:20:30,920
OK.

410
00:20:30,920 --> 00:20:37,330
But the question is, how well
does this work in living cells?

411
00:20:37,330 --> 00:20:41,440
So these are cells where we've
gone-- changed the whole genome

412
00:20:41,440 --> 00:20:45,260
so that now the stop
codon, UAG, is free.

413
00:20:45,260 --> 00:20:50,340
It's never used, which means
we can delete the release

414
00:20:50,340 --> 00:20:52,970
factor that normally recognizes
a stop codon, which otherwise

415
00:20:52,970 --> 00:20:54,070
would have been lethal.

416
00:20:54,070 --> 00:20:55,970
And we can replace it
with a transfer RNA

417
00:20:55,970 --> 00:21:00,220
and a tRNA synthetase that brings
in this [INAUDIBLE] amino acid.

418
00:21:00,220 --> 00:21:03,270
And now-- this is the one
we were just looking at,

419
00:21:03,270 --> 00:21:05,600
the crystal structure
in bold here.

420
00:21:05,600 --> 00:21:08,480
And it has an escape
frequency which

421
00:21:08,480 --> 00:21:14,240
is higher-- we can
crank up mutagenesis

422
00:21:14,240 --> 00:21:16,497
by putting it in a
mutS minus background.

423
00:21:16,497 --> 00:21:18,580
Basically, one of the
mismatch repair proteins--

424
00:21:18,580 --> 00:21:21,690
we can knock it out, which
increases, sort of accelerates,

425
00:21:21,690 --> 00:21:23,080
evolution.

426
00:21:23,080 --> 00:21:27,230
And it has a noticeable
escape frequency.

427
00:21:27,230 --> 00:21:32,654
So a more realistic scenario
would be this mutS plus.

428
00:21:32,654 --> 00:21:34,570
And we can get escape
frequencies as low as 10

429
00:21:34,570 --> 00:21:35,540
to the minus 8th.

430
00:21:35,540 --> 00:21:38,520
These are for other mutations
in that same protein.

431
00:21:38,520 --> 00:21:41,540
And here are mutations
in another protein.

432
00:21:41,540 --> 00:21:44,330
So then we said, OK, but
none of these are perfect.

433
00:21:44,330 --> 00:21:48,580
We want something that's
undetectable levels of escape.

434
00:21:48,580 --> 00:21:53,510
So how would we, how
would you, fix this?

435
00:21:53,510 --> 00:21:55,857
Anybody?

436
00:21:55,857 --> 00:21:57,690
I'm trying to encourage
you to interrupt me,

437
00:21:57,690 --> 00:22:00,480
so I'm interrupting you.

438
00:22:00,480 --> 00:22:01,980
Anybody?

439
00:22:01,980 --> 00:22:08,000
You've got these things
that are escaping

440
00:22:08,000 --> 00:22:09,350
at very low frequencies.

441
00:22:09,350 --> 00:22:10,590
We should be proud of that.

442
00:22:10,590 --> 00:22:12,131
But we want to
drive it even more.

443
00:22:12,131 --> 00:22:13,630
Rather than 10 to
the minus 8th, you

444
00:22:13,630 --> 00:22:16,050
want to get down to 10 to the minus
10th, or something like that.

445
00:22:16,050 --> 00:22:16,760
Suggestions?

446
00:22:16,760 --> 00:22:18,731
AUDIENCE: So this is
reversion of the mutations

447
00:22:18,731 --> 00:22:20,270
of the [INAUDIBLE].

448
00:22:20,270 --> 00:22:25,440
PROFESSOR: Well, so this means
that you can take the bipA,

449
00:22:25,440 --> 00:22:27,935
and you mutate the codon so it
doesn't encode bipA anymore.

450
00:22:27,935 --> 00:22:29,060
It encodes something else.

451
00:22:29,060 --> 00:22:32,710
So it doesn't need
bipA from the media.

452
00:22:32,710 --> 00:22:35,320
And it puts in another amino
acid, and it somehow survives.

453
00:22:35,320 --> 00:22:37,810
So even though it's
not a perfect fit,

454
00:22:37,810 --> 00:22:43,190
it does well enough
that the enzyme is made.

455
00:22:43,190 --> 00:22:46,070
AUDIENCE: So then modified
multiple essential genes?

456
00:22:46,070 --> 00:22:48,320
PROFESSOR: Multiple
essential genes-- wow.

457
00:22:48,320 --> 00:22:50,100
Couldn't have said
it better myself.

458
00:22:50,100 --> 00:22:52,200
That's what we did.

459
00:22:52,200 --> 00:22:57,980
So before we could choose
which two we wanted

460
00:22:57,980 --> 00:23:01,080
to use-- or three-- we wanted
to know what the spectrum was.

461
00:23:01,080 --> 00:23:04,860
So we forced in all 20
standard amino acids

462
00:23:04,860 --> 00:23:06,750
to replace the bipA.

463
00:23:06,750 --> 00:23:10,690
So we said, let's mutate them
intentionally-- synthetically--

464
00:23:10,690 --> 00:23:12,800
and see what the spectrum is.

465
00:23:12,800 --> 00:23:15,620
Now this is not going to be
the natural spectrum, the sort

466
00:23:15,620 --> 00:23:20,000
of mutagenic spectrum--
this is our intentional--

467
00:23:20,000 --> 00:23:22,190
so what we do is, we
put in each of the 20.

468
00:23:22,190 --> 00:23:26,340
And then we do a quick
selection at 20 doublings.

469
00:23:26,340 --> 00:23:29,910
It's a very fast evolution,
not three billion years.

470
00:23:29,910 --> 00:23:32,470
My students didn't want to wait.

471
00:23:32,470 --> 00:23:34,610
So in 20 doublings,
you get a spectrum

472
00:23:34,610 --> 00:23:37,147
of which amino acids
will substitute for bipA.

473
00:23:37,147 --> 00:23:38,730
In an ideal world,
none of them would.

474
00:23:38,730 --> 00:23:41,960
But we forced them to, and
these are the survivors.

475
00:23:41,960 --> 00:23:45,210
And so the ones we've
been talking about here,

476
00:23:45,210 --> 00:23:47,892
W, tryptophan, is what
will substitute for bipA.

477
00:23:47,892 --> 00:23:49,100
And that kind of makes sense.

478
00:23:49,100 --> 00:23:51,390
It's the biggest amino acid.

479
00:23:51,390 --> 00:23:53,335
And that works for
the [? tyrS ?],

480
00:23:53,335 --> 00:23:56,440
which happens to be
the tRNA synthetase.

481
00:23:56,440 --> 00:23:57,880
And then we picked
this other one

482
00:23:57,880 --> 00:24:06,130
under this big red arrow
for AdK-- adenosine kinase--

483
00:24:06,130 --> 00:24:08,540
[INAUDIBLE] kinase--
where there's

484
00:24:08,540 --> 00:24:10,950
very little tryptophan
that will work in that one.

485
00:24:10,950 --> 00:24:12,990
But you get some
escapees if you force

486
00:24:12,990 --> 00:24:16,140
it to take these hydrophobic
aliphatics like leucine.

487
00:24:18,710 --> 00:24:24,620
So we made the double mutant of
the-- we don't have it here--

488
00:24:24,620 --> 00:24:29,690
but we've made the double
mutant of the AdK and the tyrS,

489
00:24:29,690 --> 00:24:33,230
and it's vanishingly small.

490
00:24:33,230 --> 00:24:34,230
We're probably not done.

491
00:24:34,230 --> 00:24:35,400
We'll keep doing this.

492
00:24:35,400 --> 00:24:40,249
But this is the way that
you do a radical recoding

493
00:24:40,249 --> 00:24:41,165
and get new functions.

494
00:24:47,705 --> 00:24:48,830
Any questions on that part?

495
00:24:48,830 --> 00:24:51,468
We're going to move on to
human genome engineering.

496
00:24:51,468 --> 00:24:51,968
Yeah.

497
00:24:51,968 --> 00:24:54,129
AUDIENCE: [INAUDIBLE]
and recognize

498
00:24:54,129 --> 00:24:55,170
the different amino acid.

499
00:24:55,170 --> 00:24:57,041
PROFESSOR: Yeah, I skipped
over that because that's

500
00:24:57,041 --> 00:24:58,957
a little more on the
biological, a little less

501
00:24:58,957 --> 00:25:00,840
on the computational side.

502
00:25:00,840 --> 00:25:05,200
So this was work from Peter
Schultz's lab and other groups.

503
00:25:05,200 --> 00:25:10,235
And what you do is you take a
synthetase that's orthogonal,

504
00:25:10,235 --> 00:25:12,360
meaning it's from a completely
different organism--

505
00:25:12,360 --> 00:25:14,920
in this case,
Methanococcus jannaschii,

506
00:25:14,920 --> 00:25:17,290
which is a hyperthermophile.

507
00:25:17,290 --> 00:25:21,620
You take that synthetase--
it's about as far as you

508
00:25:21,620 --> 00:25:24,235
can get on the evolutionary
phylogenetic tree-- you bring

509
00:25:24,235 --> 00:25:26,420
it into E. coli.

510
00:25:26,420 --> 00:25:29,220
You bring in its cognate tRNA.

511
00:25:29,220 --> 00:25:30,870
You change the
anticodon so that it

512
00:25:30,870 --> 00:25:35,910
will recognize UAG, which
is not what typically

513
00:25:35,910 --> 00:25:39,976
any tRNA normally recognizes.

514
00:25:39,976 --> 00:25:41,850
And that only works with
certain synthetases.

515
00:25:41,850 --> 00:25:46,780
So only certain synthetases are
blind to the anticodon-- mainly

516
00:25:46,780 --> 00:25:50,350
serine and leucine
synthetases in E. coli.

517
00:25:50,350 --> 00:25:58,170
Anyway, so you can now
evolve the active site

518
00:25:58,170 --> 00:26:02,400
that binds to the
amino acid and the ATP.

519
00:26:02,400 --> 00:26:06,280
So the amino acid and
ATP cause the amino acid

520
00:26:06,280 --> 00:26:08,910
to be [INAUDIBLE]
the transfer RNA.

521
00:26:08,910 --> 00:26:10,730
Anyway, you can
change the active site

522
00:26:10,730 --> 00:26:12,540
so that now it
recognizes any amino acid

523
00:26:12,540 --> 00:26:14,649
you want to a first
approximation.

524
00:26:14,649 --> 00:26:16,440
And you could do that
through a combination

525
00:26:16,440 --> 00:26:22,600
of intelligent design
and random mutagenesis,

526
00:26:22,600 --> 00:26:25,750
and there are selections
for that as well.

527
00:26:25,750 --> 00:26:27,490
So in general, if
you're going to be

528
00:26:27,490 --> 00:26:30,690
doing random or
semirandom mutagenesis,

529
00:26:30,690 --> 00:26:32,322
it's always great
to have a selection

530
00:26:32,322 --> 00:26:34,030
so there are selections
for these things.

531
00:26:34,030 --> 00:26:36,840
And there now are
dozens of amino acids

532
00:26:36,840 --> 00:26:38,675
that work fairly well
in that scenario.

533
00:26:41,454 --> 00:26:43,120
The main thing that
was limiting was not

534
00:26:43,120 --> 00:26:45,610
the synthetase-- I mean,
you could get synthetases.

535
00:26:45,610 --> 00:26:47,910
It's that the tRNA then had to
compete with the release

536
00:26:47,910 --> 00:26:51,060
factor in the stop codon or had
to compete with another tRNA

537
00:26:51,060 --> 00:26:53,240
if you use a
different anticodon.

538
00:26:53,240 --> 00:26:58,680
And so freeing up this codon
means there's no competition.

539
00:26:58,680 --> 00:27:03,120
And now it works about as
well as a regular amino acid.

540
00:27:03,120 --> 00:27:06,770
But when it has to compete,
it's at a great disadvantage.

541
00:27:06,770 --> 00:27:07,527
Yeah.

542
00:27:07,527 --> 00:27:10,866
AUDIENCE: Can you explain
why changing the genetic code

543
00:27:10,866 --> 00:27:14,450
will cause all virus resistance?

544
00:27:14,450 --> 00:27:16,450
PROFESSOR: I planted that,
but thank you anyway.

545
00:27:23,940 --> 00:27:26,320
So there's a genetic code
up there in circular form--

546
00:27:26,320 --> 00:27:29,200
probably you're more used
to seeing it in rectangular.

547
00:27:29,200 --> 00:27:37,900
But imagine that we've now
derisked this UAG stop codon

548
00:27:37,900 --> 00:27:43,580
and these AGA and AGG codons
here-- R for arginine.

549
00:27:43,580 --> 00:27:47,150
And we're in the process
of putting all those three

550
00:27:47,150 --> 00:27:51,309
codons together with another
four for serine and leucine.

551
00:27:51,309 --> 00:27:53,600
And remember, I said serine
and leucine is interesting,

552
00:27:53,600 --> 00:27:56,570
because you could swap
out the anticodon--

553
00:27:56,570 --> 00:27:58,310
the synthetase doesn't care.

554
00:27:58,310 --> 00:28:02,750
So that's why we picked those
ones-- the three rarest ones,

555
00:28:02,750 --> 00:28:05,480
plus four where you can
swap out the anticodon.

556
00:28:05,480 --> 00:28:09,600
So we could swap serine
and leucine, for example.

557
00:28:09,600 --> 00:28:13,110
So serine and leucine
also are examples of tRNAs

558
00:28:13,110 --> 00:28:15,980
that bind to six
different codons.

559
00:28:15,980 --> 00:28:18,920
So moving two of them
is not a big deal.

560
00:28:18,920 --> 00:28:20,750
So you still got four left.

561
00:28:20,750 --> 00:28:24,290
So anyway, imagine that we
remove them or swap them

562
00:28:24,290 --> 00:28:26,320
and do weird stuff with them.

563
00:28:26,320 --> 00:28:27,770
Every time the
phage comes in, it

564
00:28:27,770 --> 00:28:31,370
has lots of serines and
leucines that are using these,

565
00:28:31,370 --> 00:28:32,950
and arginines and stops.

566
00:28:32,950 --> 00:28:37,200
And every time it wants
to put in a leucine,

567
00:28:37,200 --> 00:28:40,580
the ribosome puts in a serine.

568
00:28:40,580 --> 00:28:44,570
Well, you can note, leucine
and serine aren't that similar,

569
00:28:44,570 --> 00:28:47,400
and that's going to cause a
mess for every single protein it

570
00:28:47,400 --> 00:28:48,370
makes.

571
00:28:48,370 --> 00:28:52,570
And there might be dozens--
maybe even hundreds

572
00:28:52,570 --> 00:28:55,700
for big phage-- of those codons.

573
00:28:55,700 --> 00:28:59,700
And so you can do the math--
that the chance of mutating

574
00:28:59,700 --> 00:29:03,730
one of those codons to something
that will work is fairly high.

575
00:29:03,730 --> 00:29:08,180
Two is squared, three
to the nth power, where

576
00:29:08,180 --> 00:29:10,100
n is the number of
changes it has to make.

577
00:29:10,100 --> 00:29:15,610
And so if you make
enough changes,

578
00:29:15,610 --> 00:29:18,100
population sizes have to
become astronomical in order

579
00:29:18,100 --> 00:29:23,710
to contain one member that
has changed all of its codons

580
00:29:23,710 --> 00:29:25,500
the right way and
hasn't changed a bunch

581
00:29:25,500 --> 00:29:27,726
of codons that would be lethal.

582
00:29:27,726 --> 00:29:30,141
AUDIENCE: So the ones
that you chose, were they

583
00:29:30,141 --> 00:29:32,080
the rarest of the codons--

584
00:29:32,080 --> 00:29:34,412
PROFESSOR: So the first
three were the rarest.

585
00:29:34,412 --> 00:29:35,870
And part of that
is because we felt

586
00:29:35,870 --> 00:29:38,665
we would run into the
most trouble there.

587
00:29:38,665 --> 00:29:40,260
They may be rare for a reason.

588
00:29:40,260 --> 00:29:42,210
And we wanted to
discover those reasons,

589
00:29:42,210 --> 00:29:44,450
both for biological
curiosity, but also

590
00:29:44,450 --> 00:29:49,070
to derisk the
subsequent engineering.

591
00:29:49,070 --> 00:29:51,320
But the leucine and
serine ones are normal.

592
00:29:51,320 --> 00:29:53,930
They're not that rare.

593
00:29:53,930 --> 00:29:54,960
But we derisked them.

594
00:29:54,960 --> 00:29:59,780
And remember that one
where we did 13 codons

595
00:29:59,780 --> 00:30:01,200
on 42 essential genes?

596
00:30:01,200 --> 00:30:03,910
That's how we showed
that, in general, it's

597
00:30:03,910 --> 00:30:06,114
not toxic to individual genes.

598
00:30:06,114 --> 00:30:08,280
But there are examples of
things where you derisk it

599
00:30:08,280 --> 00:30:11,434
on individual genes and you
start making lots of them,

600
00:30:11,434 --> 00:30:13,350
and then you get so-called
"synthetic lethals"

601
00:30:13,350 --> 00:30:17,780
where various pairs
of genes conspire.

602
00:30:17,780 --> 00:30:21,780
But so far, most of
the deleterious nature

603
00:30:21,780 --> 00:30:24,360
of the genomes-- where the
genomes are a little bit slower

604
00:30:24,360 --> 00:30:27,020
growing-- it's usually due
to hitchhiker mutations,

605
00:30:27,020 --> 00:30:31,274
not due to our design-- except
in cases where it's completely

606
00:30:31,274 --> 00:30:32,690
not working, in
which case we have

607
00:30:32,690 --> 00:30:36,000
to find an alternative codon.

608
00:30:36,000 --> 00:30:38,160
But we have to deal
with all these things--

609
00:30:38,160 --> 00:30:42,180
design errors, biological
discovery, and hitchhikers.

610
00:30:45,550 --> 00:30:46,130
Yeah.

611
00:30:46,130 --> 00:30:47,505
AUDIENCE: If you've
already found

612
00:30:47,505 --> 00:30:50,549
that multiple, simultaneous
mutations is unlikely,

613
00:30:50,549 --> 00:30:52,927
works, if they all had
happened at the same time,

614
00:30:52,927 --> 00:30:54,843
but if you have this
engineered system, if you

615
00:30:54,843 --> 00:30:58,405
have some way of migrating
code to other-- you

616
00:30:58,405 --> 00:31:01,964
could end up with the spreading
of your non-secret codes

617
00:31:01,964 --> 00:31:04,250
so that you can mutate
things, one of them at a time,

618
00:31:04,250 --> 00:31:06,530
and accumulate.

619
00:31:06,530 --> 00:31:09,170
PROFESSOR: Well, so,
first of all, a phage

620
00:31:09,170 --> 00:31:13,120
doesn't carry
along its own code.

621
00:31:13,120 --> 00:31:21,060
If it did, we could preempt
that by making lethal genes--

622
00:31:21,060 --> 00:31:24,540
that if you bring in the
tRNA that has the old code,

623
00:31:24,540 --> 00:31:27,040
you activate the lethal gene.

624
00:31:27,040 --> 00:31:30,140
But I think you were
talking about more

625
00:31:30,140 --> 00:31:31,910
a Darwinian
perspective, where you

626
00:31:31,910 --> 00:31:34,990
have incremental changes
that allow you to slog along

627
00:31:34,990 --> 00:31:36,930
well enough that you
can get more mutations.

628
00:31:36,930 --> 00:31:40,780
The problem is, this
collection of mutations--

629
00:31:40,780 --> 00:31:43,340
there is no growth.

630
00:31:43,340 --> 00:31:46,880
Every protein is
majorly messed up.

631
00:31:46,880 --> 00:31:50,940
And so you're not talking about,
say, antibiotic resistance,

632
00:31:50,940 --> 00:31:54,070
where there will be kind of
a gradient of antibiotics.

633
00:31:54,070 --> 00:31:55,820
And somewhere on the
edge of the gradient,

634
00:31:55,820 --> 00:31:59,320
there will be just enough
antibiotic to be selective,

635
00:31:59,320 --> 00:32:00,720
but not enough to kill it.

636
00:32:00,720 --> 00:32:03,610
This is something where, the
instant they get into the cell,

637
00:32:03,610 --> 00:32:04,450
there's no gradient.

638
00:32:04,450 --> 00:32:08,445
They only have one code choice,
and that code is something--

639
00:32:08,445 --> 00:32:10,820
I think the difference between
this and regular evolution

640
00:32:10,820 --> 00:32:14,680
is, regular evolution-- if the
bacteria tried this strategy,

641
00:32:14,680 --> 00:32:16,560
it would be changing
a little bit at a time

642
00:32:16,560 --> 00:32:18,380
and the phage would be
keeping up with it.

643
00:32:18,380 --> 00:32:22,900
But we took it offline, so to
speak, did major code revision,

644
00:32:22,900 --> 00:32:24,190
and moved it back.

645
00:32:24,190 --> 00:32:27,880
And the phage was not watching.

646
00:32:27,880 --> 00:32:31,750
And the phage isn't as
intelligent as hackers are.

647
00:32:31,750 --> 00:32:35,564
OK, any other questions?

648
00:32:35,564 --> 00:32:36,730
We could stay on this topic.

649
00:32:36,730 --> 00:32:39,540
We don't have to
go on to humans.

650
00:32:39,540 --> 00:32:43,170
OK, just for fun let's
go on to the human genome.

651
00:32:43,170 --> 00:32:46,860
How many people here want
to have their genome edited?

652
00:32:46,860 --> 00:32:47,805
All right.

653
00:32:47,805 --> 00:32:50,180
We'll ask in just a moment
what you want to have changed.

654
00:32:53,170 --> 00:32:56,170
So these are some of the
tools that my colleagues and I

655
00:32:56,170 --> 00:32:57,350
have worked on.

656
00:32:57,350 --> 00:32:59,480
I've been doing this
most of my career,

657
00:32:59,480 --> 00:33:01,860
is coming up with new tools
for engineering genomes

658
00:33:01,860 --> 00:33:03,430
and sequencing genomes.

659
00:33:03,430 --> 00:33:05,620
And the one I've been
talking about so far

660
00:33:05,620 --> 00:33:09,795
is down here at the bottom--
is RecA and Red Beta.

661
00:33:12,810 --> 00:33:17,760
And the star for going
forward is this Cas9 protein.

662
00:33:17,760 --> 00:33:22,410
But we color-coded them here
so that the recognition--

663
00:33:22,410 --> 00:33:24,100
the critical thing
about genome editing

664
00:33:24,100 --> 00:33:25,750
is finding the needle
in the haystack.

665
00:33:25,750 --> 00:33:27,124
You want to change
one base pair.

666
00:33:27,124 --> 00:33:29,510
You don't want to
change anything else.

667
00:33:29,510 --> 00:33:31,430
And so something has
to do that recognition.

668
00:33:31,430 --> 00:33:33,380
That recognition
can be Watson-Crick,

669
00:33:33,380 --> 00:33:36,670
so you can have
DNA-DNA-- searching

670
00:33:36,670 --> 00:33:40,070
through the entire genome
with DNA-DNA interactions,

671
00:33:40,070 --> 00:33:42,670
or RNA-DNA interactions,
also Watson-Crick,

672
00:33:42,670 --> 00:33:44,620
or protein-DNA
interactions, which

673
00:33:44,620 --> 00:33:46,370
I'm sure you've learned
about quite a bit.

674
00:33:46,370 --> 00:33:49,150
And so we have examples of
each of these-- two examples

675
00:33:49,150 --> 00:33:53,950
are RNA, in blue; two examples
of DNA, down in the box;

676
00:33:53,950 --> 00:33:55,470
and then all the
rest are protein,

677
00:33:55,470 --> 00:33:58,570
where the protein-- the
amino acid side chains

678
00:33:58,570 --> 00:34:00,790
are recognizing, typically,
some kind of alpha

679
00:34:00,790 --> 00:34:02,690
helix in the major groove.

680
00:34:06,670 --> 00:34:11,010
OK, so Cas9 was
something that was

681
00:34:11,010 --> 00:34:13,750
a nice case of computational
biology, in my opinion.

682
00:34:13,750 --> 00:34:20,500
It was found in 1987 in E.
coli by Ishino and colleagues.

683
00:34:20,500 --> 00:34:23,730
And it was essentially junk DNA.

684
00:34:23,730 --> 00:34:25,389
It was not conserved.

685
00:34:25,389 --> 00:34:28,210
It was repetitive,
which were two

686
00:34:28,210 --> 00:34:30,690
of the hallmarks
of junk DNA, which

687
00:34:30,690 --> 00:34:32,371
was very popular
to talk about in 1987.

688
00:34:32,371 --> 00:34:34,370
They were trying to shut
down the Genome Project

689
00:34:34,370 --> 00:34:37,310
before it started, three
years before it started--

690
00:34:37,310 --> 00:34:40,120
before the NIH part of it
started-- because they didn't

691
00:34:40,120 --> 00:34:44,120
want to sequence anything
in the human genome that

692
00:34:44,120 --> 00:34:46,124
wasn't coding for proteins.

693
00:34:46,124 --> 00:34:46,624
I'm serious.

694
00:34:50,239 --> 00:34:53,570
So anyway, this languished
as junk DNA for many years.

695
00:34:53,570 --> 00:34:57,820
It eventually became
clear to the cognoscenti

696
00:34:57,820 --> 00:35:01,560
bacteriologists that it might
be an interesting, adaptive

697
00:35:01,560 --> 00:35:03,920
immunity-- kind of
like antibodies--

698
00:35:03,920 --> 00:35:06,729
rather than the fixed or
innate immunity, which

699
00:35:06,729 --> 00:35:07,770
were restriction enzymes.

700
00:35:07,770 --> 00:35:10,542
So this is kind of the adaptive
version of restriction enzymes.

701
00:35:10,542 --> 00:35:12,000
But it still didn't
really catch on

702
00:35:12,000 --> 00:35:17,570
until 2013, when a
couple of my post-docs

703
00:35:17,570 --> 00:35:23,150
and ex-post-doc and
graduate students in January

704
00:35:23,150 --> 00:35:24,950
got it to work in
humans-- so moved it

705
00:35:24,950 --> 00:35:28,070
from bacteria to humans--
kind of a big jump.

706
00:35:28,070 --> 00:35:31,660
And then it became
surprisingly easy,

707
00:35:31,660 --> 00:35:34,950
once it made that jump, to get
it to work in every organism

708
00:35:34,950 --> 00:35:37,270
that we and others have tried.

709
00:35:37,270 --> 00:35:39,830
So now 20 different
organisms, at least,

710
00:35:39,830 --> 00:35:43,750
that this works in-- fungi,
plants, and even elephants.

711
00:35:47,220 --> 00:35:48,850
We haven't published
the elephant yet,

712
00:35:48,850 --> 00:35:53,150
but we have our
reasons for doing that.

713
00:35:53,150 --> 00:35:56,200
And the most frequently asked
question-- and this, of course,

714
00:35:56,200 --> 00:36:00,320
should appeal to computational
biologists trying

715
00:36:00,320 --> 00:36:03,780
to do design-- is,
what about off-target?

716
00:36:03,780 --> 00:36:06,590
And it turns out now there
are many ways of dealing

717
00:36:06,590 --> 00:36:10,090
with off-target-- so much
so that I would be so bold--

718
00:36:10,090 --> 00:36:12,560
and this is a
slight speculation--

719
00:36:12,560 --> 00:36:14,680
but I would say we're
currently at the point

720
00:36:14,680 --> 00:36:18,410
where it's almost not
measurable, the off-target.

721
00:36:18,410 --> 00:36:20,540
And these are the different
ways you can do it.

722
00:36:20,540 --> 00:36:24,700
So we started out,
in our January 2013,

723
00:36:24,700 --> 00:36:26,780
with theoretical, where
you would basically

724
00:36:26,780 --> 00:36:29,980
look for-- anybody in this
room would know immediately

725
00:36:29,980 --> 00:36:33,490
how to do this-- would look
for potential off-targets that

726
00:36:33,490 --> 00:36:35,840
are off by one or
two nucleotides

727
00:36:35,840 --> 00:36:38,590
and ban those from
consideration.

728
00:36:38,590 --> 00:36:41,920
And then you take a shorter
list and do an empirical search,

729
00:36:41,920 --> 00:36:43,490
because this is so inexpensive.

730
00:36:43,490 --> 00:36:49,070
Basically, you
have this guide RNA

731
00:36:49,070 --> 00:36:50,400
which is making a triple helix.

732
00:36:50,400 --> 00:36:52,025
It's binding the one
strand of the DNA.

733
00:36:52,025 --> 00:36:53,608
It's so easy to make
those guide RNAs.

734
00:36:53,608 --> 00:36:55,440
It's just 20 nucleotides
you have to make.

735
00:36:55,440 --> 00:36:57,360
You pop it into a vector
where everything else

736
00:36:57,360 --> 00:36:58,460
is taken care of.

737
00:36:58,460 --> 00:37:01,369
It's so easy to do that that
you can make a lot of them,

738
00:37:01,369 --> 00:37:02,660
and you do an empirical search.

739
00:37:02,660 --> 00:37:04,430
You find places that
are particularly

740
00:37:04,430 --> 00:37:09,060
hot for the right sites and very
cold for the wrong off-targets.

741
00:37:09,060 --> 00:37:10,930
So those are the
first two methods.

742
00:37:10,930 --> 00:37:16,210
Then paired nickases--
they don't make

743
00:37:16,210 --> 00:37:17,890
a double-strand
break, which is what

744
00:37:17,890 --> 00:37:19,600
it does out of the
box from nature.

745
00:37:19,600 --> 00:37:21,020
It makes a double-strand break.

746
00:37:21,020 --> 00:37:22,730
You have it make a
single-strand nick.

747
00:37:22,730 --> 00:37:25,190
Then you require two of these
to be coincident and near one

748
00:37:25,190 --> 00:37:25,690
another.

749
00:37:25,690 --> 00:37:27,550
It's like the concept of PCR.

750
00:37:27,550 --> 00:37:30,144
You have to have two primers
that are near one another.

751
00:37:30,144 --> 00:37:31,060
So it's a coincidence.

752
00:37:31,060 --> 00:37:35,290
So it's like a p squared--
if the probability is

753
00:37:35,290 --> 00:37:41,182
one is off by one or two
or however many it takes,

754
00:37:41,182 --> 00:37:43,390
the chances of getting two
such sites near each other

755
00:37:43,390 --> 00:37:47,050
is roughly p squared.

756
00:37:47,050 --> 00:37:48,970
Truncated guide RNA
is not something

757
00:37:48,970 --> 00:37:51,880
that you would necessarily guess
that, if you make the guide RNA

758
00:37:51,880 --> 00:37:53,660
smaller, it's
going to be better.

759
00:37:53,660 --> 00:37:55,160
But there's obviously
some optimum.

760
00:37:55,160 --> 00:38:00,790
If you make it too long, then
it can bind by any subset--

761
00:38:00,790 --> 00:38:03,390
any kind of mismatched subset.

762
00:38:03,390 --> 00:38:08,750
If you make it too short, then
from an informatics standpoint,

763
00:38:08,750 --> 00:38:11,380
it doesn't have enough
bits to recognize

764
00:38:11,380 --> 00:38:12,490
a place in the genome.

765
00:38:12,490 --> 00:38:14,660
So it turned out that
the optimal length

766
00:38:14,660 --> 00:38:16,784
was a little bit different
from the natural length.

767
00:38:16,784 --> 00:38:18,100
It was about two shorter.

768
00:38:18,100 --> 00:38:20,810
And finally-- and
this just came out.

769
00:38:20,810 --> 00:38:27,080
And this is from Keith
Joung and David Liu's lab,

770
00:38:27,080 --> 00:38:33,190
where you get rid of the
beautiful, double-strand break

771
00:38:33,190 --> 00:38:34,330
capacity.

772
00:38:34,330 --> 00:38:36,490
You can turn it into
a nickase, or you

773
00:38:36,490 --> 00:38:39,410
can make it completely
nonfunctional as a nuclease

774
00:38:39,410 --> 00:38:41,260
and then add nuclease
domains back.

775
00:38:41,260 --> 00:38:43,567
And you say, well, it
seems kind of bizarre

776
00:38:43,567 --> 00:38:45,650
that you're doing all that
work-- that you get rid

777
00:38:45,650 --> 00:38:47,150
of the nuclease and
you add it back--

778
00:38:47,150 --> 00:38:51,030
add in a different one, the
FokI bacterial restriction

779
00:38:51,030 --> 00:38:52,080
endonuclease.

780
00:38:52,080 --> 00:38:55,561
But it turns out this
is the way that people

781
00:38:55,561 --> 00:38:57,560
have taken other DNA-binding
proteins-- the zinc

782
00:38:57,560 --> 00:38:59,890
fingers and then
the TAL proteins.

783
00:38:59,890 --> 00:39:04,792
And so it had to be tried,
and it works extremely well.

784
00:39:04,792 --> 00:39:06,250
And it's like the
paired nickases--

785
00:39:06,250 --> 00:39:10,260
you need two of these sites
in order to get cleavage.

786
00:39:10,260 --> 00:39:11,040
And stay tuned.

787
00:39:11,040 --> 00:39:13,240
I'm sure there's more.

788
00:39:13,240 --> 00:39:16,690
So I just want to close on
this idea of causality again.

789
00:39:16,690 --> 00:39:17,510
I opened on it.

790
00:39:17,510 --> 00:39:19,030
I'll close on it.

791
00:39:19,030 --> 00:39:24,200
Here's an example of a
double null-- myostatin

792
00:39:24,200 --> 00:39:28,420
double null, as in both
maternal and paternal copies

793
00:39:28,420 --> 00:39:29,167
are missing.

794
00:39:29,167 --> 00:39:31,000
There are a lot of
examples of double nulls.

795
00:39:31,000 --> 00:39:35,240
We could talk about some later.

796
00:39:35,240 --> 00:39:36,460
And they're often rare.

797
00:39:36,460 --> 00:39:38,454
So at one point, there
was only one person

798
00:39:38,454 --> 00:39:40,370
in the world that was
characterized with this.

799
00:39:40,370 --> 00:39:47,580
And it's hard to do a
large cohort study on this.

800
00:39:47,580 --> 00:39:49,150
And they weren't really sick.

801
00:39:49,150 --> 00:39:54,570
The phenotype-- this little
baby had heavy musculature,

802
00:39:54,570 --> 00:39:58,490
as if he was working out next
to Arnold Schwarzenegger.

803
00:39:58,490 --> 00:40:03,050
But he came out this way,
and he stayed that way.

804
00:40:03,050 --> 00:40:04,472
But it's striking.

805
00:40:04,472 --> 00:40:05,930
You look at the
genome and you say,

806
00:40:05,930 --> 00:40:08,940
wow-- a double null and a
highly conserved protein.

807
00:40:08,940 --> 00:40:10,634
That's got to mean something.

808
00:40:10,634 --> 00:40:12,050
And then you can
have a hypothesis

809
00:40:12,050 --> 00:40:15,950
of what it means based on what
was known about that pathway.

810
00:40:15,950 --> 00:40:17,490
And it coincides
with the phenotype.

811
00:40:17,490 --> 00:40:18,990
And so you have a
strong hypothesis,

812
00:40:18,990 --> 00:40:20,490
and you can test it in animals.

813
00:40:20,490 --> 00:40:23,490
And so here, you
don't normally test it

814
00:40:23,490 --> 00:40:24,920
in three different
animal species.

815
00:40:24,920 --> 00:40:30,040
But this one, there happened to
be either preexisting or easy

816
00:40:30,040 --> 00:40:33,190
tests in cows, dogs, and mice.

817
00:40:36,750 --> 00:40:39,030
So that's one thing you
can do to get a causality.

818
00:40:39,030 --> 00:40:40,404
And the other
thing is, there are

819
00:40:40,404 --> 00:40:42,385
cases where the animal
models don't work.

820
00:40:42,385 --> 00:40:44,010
Either you knew in
advance they weren't

821
00:40:44,010 --> 00:40:47,860
going to work because they
don't have that brain structure.

822
00:40:47,860 --> 00:40:49,522
There's nothing
other than humans

823
00:40:49,522 --> 00:40:51,480
that have a particular
kind of brain structure,

824
00:40:51,480 --> 00:40:55,030
so it's hard to make mutants,
because you're already

825
00:40:55,030 --> 00:40:57,400
a mutant.

826
00:40:57,400 --> 00:41:03,490
And so another option is
organs on chips or organoids,

827
00:41:03,490 --> 00:41:06,690
because they're not really
fully physiologically faithful.

828
00:41:06,690 --> 00:41:11,590
And this, at least, is human,
but just like animal models

829
00:41:11,590 --> 00:41:13,860
can have artifacts,
human organoids

830
00:41:13,860 --> 00:41:15,372
can have artifacts as well.

831
00:41:15,372 --> 00:41:16,830
Here's an example
of something that

832
00:41:16,830 --> 00:41:21,490
will be coming out in a few
days that we did together

833
00:41:21,490 --> 00:41:25,780
with Kit Parker's
lab and Bill Pu's lab.

834
00:41:25,780 --> 00:41:28,250
And I think this
is a nice example

835
00:41:28,250 --> 00:41:30,870
of where you can take a
hypothesis, where one base here

836
00:41:30,870 --> 00:41:37,300
is changed-- this G
right here, is deleted--

837
00:41:37,300 --> 00:41:40,950
and that's putatively what
causes this cardiomyopathy that

838
00:41:40,950 --> 00:41:42,920
affects mitochondrial function.

839
00:41:42,920 --> 00:41:47,260
And you can mutate that
using the CRISPR technology

840
00:41:47,260 --> 00:41:49,510
I was talking about, where
you use homologous recombination

841
00:41:49,510 --> 00:41:51,710
to go in, find that
one base, change it.

842
00:41:51,710 --> 00:41:53,850
Or you can just make
a mess near there.

843
00:41:53,850 --> 00:41:56,190
So one control is
to not change it,

844
00:41:56,190 --> 00:41:59,670
and the other control is to put
a little insertion, deletion

845
00:41:59,670 --> 00:42:01,020
in there.

846
00:42:01,020 --> 00:42:04,250
And of course that
messes it up as well.

847
00:42:04,250 --> 00:42:11,390
And so you've now constructed
three isogenic strains.

848
00:42:11,390 --> 00:42:12,940
These are actually my cells.

849
00:42:12,940 --> 00:42:17,480
In the Personal Genome Project,
we take volunteers like myself

850
00:42:17,480 --> 00:42:18,974
and establish stem cell lines.

851
00:42:18,974 --> 00:42:20,390
And then from the
stem cell lines,

852
00:42:20,390 --> 00:42:23,620
we can establish, in this
case, very well-ordered

853
00:42:23,620 --> 00:42:26,680
cardiac tissue we'll
see in the next slide.

854
00:42:26,680 --> 00:42:31,860
And that cardiac tissue, you
can test for lipid biochemistry,

855
00:42:31,860 --> 00:42:35,990
for other physiological
parameters,

856
00:42:35,990 --> 00:42:39,160
for the morphology and
the contractility-- so

857
00:42:39,160 --> 00:42:43,609
diastole and systole that you
get in the cardiac muscle.

858
00:42:43,609 --> 00:42:45,650
So you basically make
something where you've only

859
00:42:45,650 --> 00:42:47,470
changed one base
pair in my genome,

860
00:42:47,470 --> 00:42:51,210
and we've made, essentially,
a version of me that's mutant.

861
00:42:51,210 --> 00:42:55,120
Unfortunately, I don't think
I had the picture of that.

862
00:42:55,120 --> 00:42:56,160
I thought I did.

863
00:43:03,390 --> 00:43:04,480
Oh, there it is.

864
00:43:07,070 --> 00:43:08,950
So here's an
example-- how you get

865
00:43:08,950 --> 00:43:11,790
this beautiful, ribbon-like
striated pattern

866
00:43:11,790 --> 00:43:14,620
that you expect
of cardiac muscle.

867
00:43:14,620 --> 00:43:17,540
This is reprogrammed from
my fibroblasts, turned

868
00:43:17,540 --> 00:43:19,795
into stem cells into muscle.

869
00:43:19,795 --> 00:43:21,670
And then if you introduce
the two mutations--

870
00:43:21,670 --> 00:43:25,120
either the one that corresponds
to a patient or one that's just

871
00:43:25,120 --> 00:43:28,920
a mess-- you get a
morphological mess.

872
00:43:28,920 --> 00:43:32,020
And then you can restore those
by putting in the messenger

873
00:43:32,020 --> 00:43:36,580
RNA that will cover
for the mutation.

874
00:43:36,580 --> 00:43:41,330
So I'm going to open it up
for questions at that point.

875
00:43:41,330 --> 00:43:45,576
That's causality-- I think.

876
00:43:45,576 --> 00:43:46,075
Questions?

877
00:43:48,870 --> 00:43:50,556
While we're waiting,
anybody wants

878
00:43:50,556 --> 00:43:51,930
to volunteer what
they would like

879
00:43:51,930 --> 00:43:54,860
to change about themselves?

880
00:43:54,860 --> 00:43:56,970
You can mention a
specific base pair or kind

881
00:43:56,970 --> 00:44:02,122
of a general idea of what
you'd like to change,

882
00:44:02,122 --> 00:44:04,580
whether you think there's any
safety considerations that we

883
00:44:04,580 --> 00:44:05,720
should keep in mind.

884
00:44:09,224 --> 00:44:10,890
AUDIENCE: The problem's
delivery, right?

885
00:44:10,890 --> 00:44:11,370
That's the--

886
00:44:11,370 --> 00:44:11,825
PROFESSOR: Delivery.

887
00:44:11,825 --> 00:44:12,757
AUDIENCE: Yeah.

888
00:44:12,757 --> 00:44:14,190
PROFESSOR: Yeah, fair enough.

889
00:44:14,190 --> 00:44:21,190
So gene therapy had a crack.

890
00:44:21,190 --> 00:44:22,565
People were a
little overanxious,

891
00:44:22,565 --> 00:44:27,280
a little overambitious
about over a decade ago.

892
00:44:27,280 --> 00:44:34,000
And a small number of
patients died from cancers,

893
00:44:34,000 --> 00:44:36,000
because there was
random integration.

894
00:44:36,000 --> 00:44:37,902
Rather than this precise
genome manipulation

895
00:44:37,902 --> 00:44:39,360
we're talking about
here, there was

896
00:44:39,360 --> 00:44:41,370
kind of random
lentiviral integration

897
00:44:41,370 --> 00:44:43,580
of extra copies of genes.

898
00:44:43,580 --> 00:44:45,150
And if you land in
the wrong place,

899
00:44:45,150 --> 00:44:48,500
then your lentiviral
or retroviral promoter

900
00:44:48,500 --> 00:44:51,920
will go off into
oncogenes, like LMO2.

901
00:44:51,920 --> 00:44:54,060
So that delivery
was viral delivery,

902
00:44:54,060 --> 00:44:55,990
and it was random integration.

903
00:44:55,990 --> 00:44:57,710
We now have delivery
mechanisms that

904
00:44:57,710 --> 00:45:01,180
are nonintegrative or
integrative in a specific place

905
00:45:01,180 --> 00:45:05,642
or, in this case, can make
precise base pair changes.

906
00:45:05,642 --> 00:45:07,100
So there's two
levels of delivery--

907
00:45:07,100 --> 00:45:08,440
one is to get it to
the right tissue,

908
00:45:08,440 --> 00:45:10,580
and the other is to get
it to the right base pair.

909
00:45:10,580 --> 00:45:14,580
I think both are
semisolved problems.

910
00:45:14,580 --> 00:45:17,970
So you can do ex vivo delivery.

911
00:45:17,970 --> 00:45:20,100
So you can take T
cells out of a body.

912
00:45:20,100 --> 00:45:23,370
You can use a previous
generation-- the zinc finger

913
00:45:23,370 --> 00:45:26,700
nuclease-- to cleave both
copies of the CCR5 gene.

914
00:45:26,700 --> 00:45:31,120
And now people that
had full-blown AIDS,

915
00:45:31,120 --> 00:45:33,220
you put these T cells
back in their body,

916
00:45:33,220 --> 00:45:35,800
and then now they're
AIDS resistant.

917
00:45:35,800 --> 00:45:38,990
Those T cells that have
both copies of the CCR5 gene

918
00:45:38,990 --> 00:45:46,810
missing, which is the AIDS
coreceptor, are now resistant.

919
00:45:46,810 --> 00:45:48,040
So that's ex vivo.

920
00:45:48,040 --> 00:45:49,980
That's one way to do it.

921
00:45:49,980 --> 00:45:51,970
Delivery to the
liver is quite easy.

922
00:45:51,970 --> 00:45:54,780
You can do that with
nonviral vectors,

923
00:45:54,780 --> 00:45:57,290
and a [INAUDIBLE] virus is
one that's very popular.

924
00:45:57,290 --> 00:45:59,970
You can get it to go to
almost every cell in the body,

925
00:45:59,970 --> 00:46:02,826
either selectively or generally.

926
00:46:02,826 --> 00:46:05,075
So you just want to make
sure that once it goes there,

927
00:46:05,075 --> 00:46:08,626
it doesn't cause any
damage other than the base

928
00:46:08,626 --> 00:46:09,625
pair you want to change.

929
00:46:12,930 --> 00:46:16,740
So there are now 2,000
gene therapy trials

930
00:46:16,740 --> 00:46:19,396
in phase one, two, and three.

931
00:46:19,396 --> 00:46:21,520
It's a big change
from a decade ago,

932
00:46:21,520 --> 00:46:25,645
where I think people had pretty
much given up on gene therapy.

933
00:46:25,645 --> 00:46:27,520
There's now 2,000
clinical trials.

934
00:46:27,520 --> 00:46:31,920
And one has emerged all
the way out of phase three

935
00:46:31,920 --> 00:46:35,690
into full approval in Europe.

936
00:46:35,690 --> 00:46:40,580
Ironically, they now have
genetically engineered humans

937
00:46:40,580 --> 00:46:43,950
in a land where they don't eat
genetically modified foods.

938
00:46:47,010 --> 00:46:50,550
But I think they're
better for it. So far, it's

939
00:46:50,550 --> 00:46:52,910
curing diseases.

940
00:46:52,910 --> 00:46:53,410
Yeah.

941
00:46:53,410 --> 00:46:56,812
AUDIENCE: For your
noncanonical amino acids,

942
00:46:56,812 --> 00:46:59,890
does this open up
enzymatic reactions that

943
00:46:59,890 --> 00:47:02,158
would be, say,
impossible, do you think,

944
00:47:02,158 --> 00:47:07,020
with if you add a new amino
acid that can [INAUDIBLE]?

945
00:47:07,020 --> 00:47:10,270
PROFESSOR: So I'll just repeat
the question for our viewing

946
00:47:10,270 --> 00:47:12,820
audience.

947
00:47:12,820 --> 00:47:16,480
Do nonstandard amino acids open
up new enzymatic reactions?

948
00:47:16,480 --> 00:47:20,160
And there's already a couple
of examples in the literature.

949
00:47:20,160 --> 00:47:23,352
This was done prior to this
wonderful strain, where

950
00:47:23,352 --> 00:47:24,310
there's no competition.

951
00:47:24,310 --> 00:47:26,162
It was done at low efficiency.

952
00:47:26,162 --> 00:47:28,370
But putting in one amino
acid at low efficiency-- you

953
00:47:28,370 --> 00:47:30,190
could still get an enzyme.

954
00:47:30,190 --> 00:47:33,870
So even if it's,
like, 10% efficiency,

955
00:47:33,870 --> 00:47:36,350
you produce 10 times as
much enzyme, and it works.

956
00:47:36,350 --> 00:47:39,220
So there were some
redox-coumarin derivatives

957
00:47:39,220 --> 00:47:40,920
of amino acids.

958
00:47:40,920 --> 00:47:43,420
So coumarin-redox
capability is not

959
00:47:43,420 --> 00:47:46,040
present in any of the
other amino acids.

960
00:47:46,040 --> 00:47:51,730
And they took a protein that
was very well studied-- where

961
00:47:51,730 --> 00:47:55,980
they had by protein design,
and by random mutagenesis,

962
00:47:55,980 --> 00:47:58,610
and they threw the book
at it-- and they could not

963
00:47:58,610 --> 00:48:03,220
budge the activity beyond
the apparently optimal,

964
00:48:03,220 --> 00:48:05,900
naturally occurring activity.

965
00:48:05,900 --> 00:48:08,410
They put in this amino
acid, which was not randomly

966
00:48:08,410 --> 00:48:11,600
chosen-- it was a redox-coumarin
derivative-- they put it

967
00:48:11,600 --> 00:48:15,380
in the active site.

968
00:48:15,380 --> 00:48:18,562
I think they tried out a
few different things that

969
00:48:18,562 --> 00:48:20,020
made a small
combinatorial library.

970
00:48:20,020 --> 00:48:23,370
But the point is, they
got a tenfold improvement

971
00:48:23,370 --> 00:48:25,740
in the catalytic rate constants.

972
00:48:25,740 --> 00:48:27,170
So that's an example.

973
00:48:27,170 --> 00:48:29,370
Another example, which
isn't really catalytic,

974
00:48:29,370 --> 00:48:33,400
but it's very popular, is that
you can put in polyethylene

975
00:48:33,400 --> 00:48:40,290
glycol-modified amino
acids wherever you want

976
00:48:40,290 --> 00:48:41,510
rather than kind of randomly.

977
00:48:41,510 --> 00:48:42,760
You can put it in precisely.

978
00:48:42,760 --> 00:48:47,480
And this will greatly
extend the serum half-life,

979
00:48:47,480 --> 00:48:51,910
so that normal proteins like
human growth hormone, which

980
00:48:51,910 --> 00:48:55,655
is an approved pharmaceutical
for certain uses--

981
00:48:55,655 --> 00:48:58,920
not all the uses that you find
on the internet, but other

982
00:48:58,920 --> 00:49:03,010
uses-- but it turns over
very quickly in the serum.

983
00:49:03,010 --> 00:49:05,540
And so if you put a polyethylene
glycol in the right place

984
00:49:05,540 --> 00:49:08,190
on human growth hormone--
or other human protein

985
00:49:08,190 --> 00:49:11,040
pharmaceuticals--
they last longer.

986
00:49:11,040 --> 00:49:14,580
Those are two examples-- one
of them definitely active site.

987
00:49:17,050 --> 00:49:17,550
Yeah.

988
00:49:17,550 --> 00:49:19,258
AUDIENCE: This is
actually a small detail

989
00:49:19,258 --> 00:49:22,144
from your [INAUDIBLE]
study, where

990
00:49:22,144 --> 00:49:26,369
you looked at the structure and
mutated one of the amino acids

991
00:49:26,369 --> 00:49:28,348
to this phenyl
thing, and then you

992
00:49:28,348 --> 00:49:29,848
changed a bunch of
other amino acids

993
00:49:29,848 --> 00:49:31,836
to compensate for that size.

994
00:49:31,836 --> 00:49:36,425
So I noticed most of the changes
were to either [INAUDIBLE],

995
00:49:36,425 --> 00:49:37,800
but one of them
was a tryptophan.

996
00:49:37,800 --> 00:49:39,241
So why was that?

997
00:49:39,241 --> 00:49:40,615
PROFESSOR: Let's
go back to that,

998
00:49:40,615 --> 00:49:43,267
and see if we can find that.

999
00:49:43,267 --> 00:49:44,758
AUDIENCE: It was
a previous slide.

1000
00:49:44,758 --> 00:49:45,752
Yeah, this one.

1001
00:49:45,752 --> 00:49:50,730
So it was amino acid 271.

1002
00:49:50,730 --> 00:49:51,580
PROFESSOR: Yeah, OK.

1003
00:49:51,580 --> 00:49:54,666
So in each of these lines--
I didn't spend much time

1004
00:49:54,666 --> 00:49:56,040
on this-- in each
of these lines,

1005
00:49:56,040 --> 00:49:59,410
there's one amino acid
we've changed to bipA.

1006
00:49:59,410 --> 00:50:02,670
So these three are
all the same protein,

1007
00:50:02,670 --> 00:50:06,890
and it's all the same
mutation-- leucine 303 to bipA.

1008
00:50:06,890 --> 00:50:09,420
And then all the other
ones are compensating.

1009
00:50:09,420 --> 00:50:14,010
And then here, you can see
it's a different leucine

1010
00:50:14,010 --> 00:50:16,570
and a different protein.

1011
00:50:16,570 --> 00:50:18,600
They're all leucines--
different proteins.

1012
00:50:18,600 --> 00:50:20,595
Now what's your question about?

1013
00:50:20,595 --> 00:50:22,920
AUDIENCE: My question
was the compensating

1014
00:50:22,920 --> 00:50:25,710
mutations are generally all
the smaller amino acids, right?

1015
00:50:25,710 --> 00:50:26,623
PROFESSOR: Oh, I see.

1016
00:50:26,623 --> 00:50:28,122
So why phenylalanine
and tryptophan?

1017
00:50:28,122 --> 00:50:29,358
AUDIENCE: Yeah.

1018
00:50:29,358 --> 00:50:32,990
PROFESSOR: Well, those
are pretty close.

1019
00:50:32,990 --> 00:50:36,600
So these are done by
energy, not by eyeball.

1020
00:50:36,600 --> 00:50:40,130
They're all done
computationally, in Rosetta, where

1021
00:50:40,130 --> 00:50:43,990
we combinatorially go
through lots of side chains.

1022
00:50:43,990 --> 00:50:45,750
So we combinatorially
went through lots

1023
00:50:45,750 --> 00:50:49,650
of proteins, lots of positions
to substitute amino acid,

1024
00:50:49,650 --> 00:50:53,200
then lots of accommodating
mutations-- which

1025
00:50:53,200 --> 00:50:56,840
is not necessarily the typical
way you use this software.

1026
00:50:56,840 --> 00:50:59,410
Anyway, that probably
is some stacking

1027
00:50:59,410 --> 00:51:05,656
of one of the two aromatic
rings onto the tryptophan.

1028
00:51:05,656 --> 00:51:06,156
Yeah.

1029
00:51:10,490 --> 00:51:12,360
And we tried many combinations.

1030
00:51:12,360 --> 00:51:14,780
No doubt, we tried
the phenylalanine

1031
00:51:14,780 --> 00:51:16,750
and the tryptophan in
various combinations

1032
00:51:16,750 --> 00:51:18,410
with the other ones, and
the tryptophan empirically

1033
00:51:18,410 --> 00:51:19,010
works better.

1034
00:51:22,310 --> 00:51:24,160
MODERATOR: Are there
any more questions?