1
00:00:00,090 --> 00:00:02,500
The following content is
provided under a Creative

2
00:00:02,500 --> 00:00:04,019
Commons license.

3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare

4
00:00:06,360 --> 00:00:10,730
continue to offer high quality
educational resources for free.

5
00:00:10,730 --> 00:00:13,340
To make a donation or
view additional materials

6
00:00:13,340 --> 00:00:17,217
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,217 --> 00:00:17,842
at ocw.mit.edu.

8
00:00:21,680 --> 00:00:25,510
PROFESSOR: So today, our goal
is to really go through

9
00:00:25,510 --> 00:00:28,240
the paper that you
read maybe last night

10
00:00:28,240 --> 00:00:31,130
by Dekel and Alon "Optimality
and Evolutionary Tuning

11
00:00:31,130 --> 00:00:32,855
of the Expression
Level of a Protein."

12
00:00:32,855 --> 00:00:35,690
It was published
in Nature in 2005.

13
00:00:35,690 --> 00:00:39,140
I think that it's a very
interesting paper, exploring

14
00:00:39,140 --> 00:00:42,060
some kind of big general ideas.

15
00:00:42,060 --> 00:00:46,960
I think it's also, in some
ways, rather misleading.

16
00:00:46,960 --> 00:00:49,970
And we'll try to
understand or discuss

17
00:00:49,970 --> 00:00:54,630
the ways in which the
connections between experiment,

18
00:00:54,630 --> 00:00:57,760
theory, prediction,
and so forth,

19
00:00:57,760 --> 00:01:02,190
how they all play out in
the context of this problem.

20
00:01:02,190 --> 00:01:04,360
Before we get going too
much on the science,

21
00:01:04,360 --> 00:01:07,310
I just want to remind
everyone that Andrew will not

22
00:01:07,310 --> 00:01:08,790
be having office hours today.

23
00:01:08,790 --> 00:01:13,800
He is off interviewing for
MD-PhD programs right now.

24
00:01:13,800 --> 00:01:16,370
But if you had questions
about the problems,

25
00:01:16,370 --> 00:01:18,770
I hope that you asked
[? Sarab ?] last night.

26
00:01:18,770 --> 00:01:21,640
You might be able to grab
him after the lecture

27
00:01:21,640 --> 00:01:25,230
today, but yes.

28
00:01:25,230 --> 00:01:30,895
Any other questions about
anything before we get going?

29
00:01:30,895 --> 00:01:33,830
No.

30
00:01:33,830 --> 00:01:37,120
So I think that this
paper in general, I guess,

31
00:01:37,120 --> 00:01:40,460
the lecture today is
really a combination

32
00:01:40,460 --> 00:01:43,750
of trying to start thinking
about maybe laboratory

33
00:01:43,750 --> 00:01:48,480
evolution or kind of population
level phenomena in general,

34
00:01:48,480 --> 00:01:52,090
as well as this
question of optimization

35
00:01:52,090 --> 00:01:55,630
in terms of protein expression.

36
00:01:55,630 --> 00:02:00,870
So can somebody
just maybe summarize

37
00:02:00,870 --> 00:02:02,750
the big idea of this paper?

38
00:02:10,150 --> 00:02:12,394
Yes, please.

39
00:02:12,394 --> 00:02:15,262
AUDIENCE: Protein
expression levels

40
00:02:15,262 --> 00:02:20,998
evolve to optimal values
for cost-benefit questions.

41
00:02:20,998 --> 00:02:23,920
PROFESSOR: Right, so that's
the argument at least.

42
00:02:23,920 --> 00:02:28,000
And they have a very
nice first sentence here.

43
00:02:28,000 --> 00:02:31,220
"Different proteins have
different expression levels."

44
00:02:31,220 --> 00:02:33,540
You know, it's hard to
argue with that statement,

45
00:02:33,540 --> 00:02:34,660
nice, concise.

46
00:02:34,660 --> 00:02:37,600
But the question is, well, why?

47
00:02:37,600 --> 00:02:40,470
And I'd say that
there is a range,

48
00:02:40,470 --> 00:02:44,910
different philosophical
opinions out in the world.

49
00:02:44,910 --> 00:02:48,470
I'd say that some group
that is very much reflected

50
00:02:48,470 --> 00:02:51,610
in this study is trying
to think about this

51
00:02:51,610 --> 00:02:54,180
in the context of optimization.

52
00:02:54,180 --> 00:02:57,900
Well, maybe the reason that we
see a given level of expression

53
00:02:57,900 --> 00:03:01,780
of some protein is because, at
least over evolutionary time,

54
00:03:01,780 --> 00:03:04,600
in some ancestral environment
that we don't know, but maybe

55
00:03:04,600 --> 00:03:09,671
it evolved to optimize
some cost-benefit problem.

56
00:03:09,671 --> 00:03:11,420
And then I'd say that
there's another kind

57
00:03:11,420 --> 00:03:13,060
of general
philosophical approach

58
00:03:13,060 --> 00:03:16,740
that tends to be a little
bit more agnostic or just

59
00:03:16,740 --> 00:03:21,270
maybe more of a sense that
certainly things could have

60
00:03:21,270 --> 00:03:23,020
evolved to optimize something.

61
00:03:23,020 --> 00:03:25,880
But we can never really
know where they evolved in,

62
00:03:25,880 --> 00:03:28,730
so we shouldn't be going out
on a limb on these things.

63
00:03:28,730 --> 00:03:32,150
And given that
this is philosophy,

64
00:03:32,150 --> 00:03:34,839
I will maybe not
require that you agree

65
00:03:34,839 --> 00:03:36,130
with any particular standpoint.

66
00:03:36,130 --> 00:03:38,420
But I will say that it's
at least worth thinking

67
00:03:38,420 --> 00:03:40,070
about the question
and maybe you can

68
00:03:40,070 --> 00:03:44,150
do measurements to illuminate
whether all these ideas might

69
00:03:44,150 --> 00:03:45,390
make sense.

70
00:03:45,390 --> 00:03:49,560
And then we'll try to, over
the next hour and a half,

71
00:03:49,560 --> 00:03:53,660
figure out to what degree this
paper maybe should convince us

72
00:03:53,660 --> 00:03:55,740
of this optimization
in the context

73
00:03:55,740 --> 00:03:58,640
of this particular protein.

74
00:03:58,640 --> 00:04:01,360
Now, even if it's the
case that somebody

75
00:04:01,360 --> 00:04:05,490
convinces you maybe that
expression of the lac operon

76
00:04:05,490 --> 00:04:07,730
maybe does optimize some
cost-benefit analysis,

77
00:04:07,730 --> 00:04:11,760
that does not prove that
every protein optimizes things.

78
00:04:11,760 --> 00:04:15,310
So don't get overwhelmed
or underwhelmed

79
00:04:15,310 --> 00:04:17,920
or whatever it might be.

80
00:04:17,920 --> 00:04:20,500
Let's just first
make sure that we

81
00:04:20,500 --> 00:04:23,960
understand what we mean by
costs and benefits in this case.

82
00:04:23,960 --> 00:04:27,330
Can somebody pick one of them?

83
00:04:35,340 --> 00:04:37,860
Now, what is a cost and
benefit in the context

84
00:04:37,860 --> 00:04:39,060
of maybe this paper?

85
00:04:46,272 --> 00:04:46,772
Yes.

86
00:04:46,772 --> 00:04:50,146
AUDIENCE: Producing protein
requires some kind of resource.

87
00:04:50,146 --> 00:04:52,295
PROFESSOR: Right, requires--

88
00:04:52,295 --> 00:04:54,550
AUDIENCE: [INAUDIBLE] energy--

89
00:04:54,550 --> 00:04:59,297
PROFESSOR: --requires resources
of some sort or another

90
00:04:59,297 --> 00:05:00,380
to express these proteins.

91
00:05:04,460 --> 00:05:07,780
And this can manifest
in many different ways.

92
00:05:07,780 --> 00:05:11,342
But certainly, if you were
not making these proteins,

93
00:05:11,342 --> 00:05:13,300
you could have been making
some other proteins.

94
00:05:13,300 --> 00:05:15,110
And so if these proteins
are not helping you, then

95
00:05:15,110 --> 00:05:16,640
maybe something else would have.

96
00:05:16,640 --> 00:05:20,190
But there are many different
ways of looking at this.

97
00:05:20,190 --> 00:05:22,840
But there is some finite number
of things that the cell can do.

98
00:05:25,550 --> 00:05:28,250
And the benefits, of course,
in the case-- and this

99
00:05:28,250 --> 00:05:30,700
is in particular in
the case of the lac

100
00:05:30,700 --> 00:05:38,925
operon, what does this
network allow us to do?

101
00:05:38,925 --> 00:05:39,425
Yeah.

102
00:05:39,425 --> 00:05:42,400
AUDIENCE: You get to consume
the energy of lactose.

103
00:05:42,400 --> 00:05:43,076
PROFESSOR: Yes.

104
00:05:43,076 --> 00:05:44,284
AUDIENCE: Lets you grow faster.

105
00:05:44,284 --> 00:05:48,419
PROFESSOR: That's right,
you get to consume lactose

106
00:05:48,419 --> 00:05:48,960
in this case.

107
00:05:51,720 --> 00:05:55,190
Now, we've already spent some
time thinking or discussing

108
00:05:55,190 --> 00:05:57,230
the lac operon.

109
00:05:57,230 --> 00:06:08,180
What were the two key components
in here in the lac operon?

110
00:06:08,180 --> 00:06:11,237
If you were a cell and
you wanted to eat lactose,

111
00:06:11,237 --> 00:06:12,320
what would you need to do?

112
00:06:25,550 --> 00:06:27,581
I'm picking somebody
to-- yes, please.

113
00:06:27,581 --> 00:06:31,189
AUDIENCE: It's a gene that you
should express, the lac gene?

114
00:06:31,189 --> 00:06:32,090
PROFESSOR: OK, right.

115
00:06:32,090 --> 00:06:33,030
So the lac genes.

116
00:06:33,030 --> 00:06:35,816
But maybe in a little
bit more detail,

117
00:06:35,816 --> 00:06:37,565
what do we mean when
we say the lac genes?

118
00:06:40,900 --> 00:06:44,561
Well, I mean, it's
not just lactose.

119
00:06:44,561 --> 00:06:46,060
But I mean, what
are the things that

120
00:06:46,060 --> 00:06:49,234
have to happen if you want
to eat anything, I guess?

121
00:06:49,234 --> 00:06:49,733
Your cell--

122
00:06:52,320 --> 00:06:53,670
AUDIENCE: Import.

123
00:06:53,670 --> 00:06:57,700
PROFESSOR: Right, so you
first have to import it.

124
00:06:57,700 --> 00:07:01,970
Now, in some cases, this can
be done maybe for some-- maybe

125
00:07:01,970 --> 00:07:03,900
nutrients, it could be
done even passively,

126
00:07:03,900 --> 00:07:05,620
if it crosses the
membrane easily.

127
00:07:05,620 --> 00:07:08,600
But for most of the things
that you might think about,

128
00:07:08,600 --> 00:07:10,710
you actually have
to do active import.

129
00:07:10,710 --> 00:07:12,352
So this is done by what?

130
00:07:12,352 --> 00:07:13,060
Anybody remember?

131
00:07:16,451 --> 00:07:16,950
lacY.

132
00:07:20,660 --> 00:07:27,050
So lacY is a membrane
protein that imports lactose.

133
00:07:33,330 --> 00:07:35,751
And then what do you need to do?

134
00:07:35,751 --> 00:07:38,613
AUDIENCE: Break the two apart,
and then you can metabolize.

135
00:07:38,613 --> 00:07:43,120
PROFESSOR: Right, then you
have to eat it somehow.

136
00:07:43,120 --> 00:07:46,300
Now, of course, metabolism
is a very complicated thing.

137
00:07:46,300 --> 00:07:48,930
But the key thing that's
different between lactose

138
00:07:48,930 --> 00:07:52,450
and maybe the simple
sugars is that you first

139
00:07:52,450 --> 00:07:55,760
have to break down the lactose
into its constituent parts.

140
00:07:55,760 --> 00:07:58,530
Lactose is a
disaccharide composed

141
00:07:58,530 --> 00:08:01,340
of two simple monosaccharides.

142
00:08:01,340 --> 00:08:05,210
So what you need is you need
this lacZ, beta-galactosidase,

143
00:08:05,210 --> 00:08:07,670
in order to cleave that bond.

144
00:08:07,670 --> 00:08:09,850
And then you have the two
simple monosaccharides

145
00:08:09,850 --> 00:08:10,600
that can be eaten.

146
00:08:13,990 --> 00:08:16,100
Now, the lac operon
also has this lacA.

147
00:08:16,100 --> 00:08:18,200
And it's not quite obvious
what that thing does,

148
00:08:18,200 --> 00:08:19,460
so nobody ever talks about it.

149
00:08:19,460 --> 00:08:23,180
But there is a
third protein there.

150
00:08:23,180 --> 00:08:24,929
But what we always
talk about is lacY,

151
00:08:24,929 --> 00:08:27,220
that's required to import the
lactose and then lacZ that

152
00:08:27,220 --> 00:08:32,429
is required to break the lactose
down into its monosaccharides.

153
00:08:32,429 --> 00:08:35,440
And then the idea-- and
that's not sufficient.

154
00:08:35,440 --> 00:08:36,940
You don't take those
monosaccharides

155
00:08:36,940 --> 00:08:39,390
and instantly make
more cells out of it.

156
00:08:39,390 --> 00:08:43,150
But the idea is that the rest
of the metabolic machinery

157
00:08:43,150 --> 00:08:46,210
is kind of there anyway
to do other-- that's

158
00:08:46,210 --> 00:08:47,320
kind of some assumptions.

159
00:08:58,570 --> 00:09:04,840
Can somebody explain how it
is that they measured the cost

160
00:09:04,840 --> 00:09:06,450
of expressing these proteins?

161
00:09:11,641 --> 00:09:12,140
Yes.

162
00:09:12,140 --> 00:09:19,535
AUDIENCE: So they [INAUDIBLE]
expressed these proteins

163
00:09:19,535 --> 00:09:22,493
at different levels using
different concentrations

164
00:09:22,493 --> 00:09:23,479
of IPTG.

165
00:09:23,479 --> 00:09:24,250
PROFESSOR: Right.

166
00:09:24,250 --> 00:09:26,272
AUDIENCE: There was
no lactose around,

167
00:09:26,272 --> 00:09:28,062
so it was only the
cost to no benefits.

168
00:09:28,062 --> 00:09:29,520
And then they
measured [INAUDIBLE].

169
00:09:29,520 --> 00:09:30,770
PROFESSOR: All right, perfect.

170
00:09:30,770 --> 00:09:32,760
OK, so there are several
key things in here.

171
00:09:32,760 --> 00:09:34,330
So first of all,
normally, what happens

172
00:09:34,330 --> 00:09:38,710
is it's lactose inside the
cell that causes this lac

173
00:09:38,710 --> 00:09:40,550
repressor to fall
off and then you get

174
00:09:40,550 --> 00:09:46,200
expression of the lac operon.

175
00:09:46,200 --> 00:09:50,010
But in order to kind of
sidestep or circumvent

176
00:09:50,010 --> 00:09:52,760
that normal network, what
we are doing in this case

177
00:09:52,760 --> 00:09:54,672
is adding IPTG.

178
00:09:54,672 --> 00:10:04,070
So IPTG allows one to
get expression of--

179
00:10:04,070 --> 00:10:12,770
and what IPTG does is that it stops
the inhibition of this lac

180
00:10:12,770 --> 00:10:14,480
promoter, where you
get lacZ and lacY.

181
00:10:20,330 --> 00:10:24,120
Now, the idea here
is that you can

182
00:10:24,120 --> 00:10:28,540
control the level of
expression of this operon,

183
00:10:28,540 --> 00:10:30,600
because what we
really want is we

184
00:10:30,600 --> 00:10:33,840
want to measure a
plot of something

185
00:10:33,840 --> 00:10:38,350
that you would call cost--
and we'll explore a little bit

186
00:10:38,350 --> 00:10:42,341
more what that means-- as a
function of the lac operon

187
00:10:42,341 --> 00:10:42,840
expression.

188
00:10:46,860 --> 00:10:52,500
And this is often done
relative to the full induction

189
00:10:52,500 --> 00:10:54,490
of the wild type lac operon.

190
00:10:57,440 --> 00:11:00,370
And this is a relative
growth rate reduction.

191
00:11:00,370 --> 00:11:03,250
So basically, this
is a percentage,

192
00:11:03,250 --> 00:11:06,180
say, decrease in growth rate.
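
A minimal sketch of the "relative growth rate reduction" described here, assuming the cost is defined as the fractional drop in growth rate relative to a reference culture with no lac expression. The function name `relative_cost` and both growth-rate values are hypothetical, chosen only for illustration; they are not numbers from the paper.

```python
def relative_cost(growth_rate: float, reference_growth_rate: float) -> float:
    """Fractional reduction in growth rate relative to a reference culture.

    Assumption: the reference is the same strain grown without inducing
    the lac operon (and without lactose), so any reduction is pure cost.
    """
    if reference_growth_rate <= 0:
        raise ValueError("reference growth rate must be positive")
    return (reference_growth_rate - growth_rate) / reference_growth_rate


# Hypothetical growth rates (per hour), for illustration only.
uninduced = 1.00
fully_induced = 0.96

cost = relative_cost(fully_induced, uninduced)
print(f"relative cost = {cost:.3f}")  # prints "relative cost = 0.040"
```

A value of 0.040 here reads as a 4% decrease in growth rate.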

193
00:11:10,770 --> 00:11:12,790
Now, there was a key
thing that you brought up,

194
00:11:12,790 --> 00:11:15,160
which is that you want
to measure the growth

195
00:11:15,160 --> 00:11:17,430
rate in the absence of lactose.

196
00:11:17,430 --> 00:11:20,650
Because otherwise as we
increase the level of expression

197
00:11:20,650 --> 00:11:23,370
here-- so we're
controlling this by IPTG,

198
00:11:23,370 --> 00:11:26,210
so there's some mapping
from IPTG concentration

199
00:11:26,210 --> 00:11:29,390
to the level of expression here.

200
00:11:29,390 --> 00:11:32,630
But we want to be able
to measure the cost

201
00:11:32,630 --> 00:11:35,430
separate from the benefits.

202
00:11:35,430 --> 00:11:39,760
So it's important then to grow
this in the absence of lactose.

203
00:11:39,760 --> 00:11:41,140
So say, no lactose.

204
00:11:44,180 --> 00:11:47,920
But if I just take
bacteria and I put them

205
00:11:47,920 --> 00:11:52,760
in a tube with say
minimal media, salt,

206
00:11:52,760 --> 00:11:57,834
so forth, but no lactose,
are they going to grow?

207
00:11:57,834 --> 00:11:59,000
They need to grow on something.

208
00:12:01,820 --> 00:12:03,802
So what is it that
the authors have done?

209
00:12:11,174 --> 00:12:11,674
Yes.

210
00:12:11,674 --> 00:12:12,658
AUDIENCE: Glycerol.

211
00:12:12,658 --> 00:12:16,300
PROFESSOR: That's right,
they added some glycerol

212
00:12:16,300 --> 00:12:18,270
and in different parts.

213
00:12:18,270 --> 00:12:20,100
I think it's 1% glycerol.

214
00:12:20,100 --> 00:12:23,950
Does anybody happen to remember?

215
00:12:23,950 --> 00:12:26,740
I think, for most
of it, it was 0.1%.

216
00:12:26,740 --> 00:12:28,510
I tell you what,
we'll say a little bit

217
00:12:28,510 --> 00:12:30,940
of small concentrations
of glycerol.

218
00:12:35,215 --> 00:12:40,360
So the idea is that this is kind
of a second rate carbon source.

219
00:12:40,360 --> 00:12:44,630
The bacteria are not super
happy, but they're OK.

220
00:12:44,630 --> 00:12:49,605
And then given this,
what they were able

221
00:12:49,605 --> 00:12:51,480
to demonstrate is that, if
they did add lactose,

222
00:12:51,480 --> 00:12:54,100
they would have grown faster.

223
00:12:54,100 --> 00:12:57,832
So there's a sense that the
lactose does help the cells.

224
00:12:57,832 --> 00:12:59,290
But you have to
have some glycerol.

225
00:12:59,290 --> 00:13:03,069
Otherwise, you can't really
measure these things.

226
00:13:03,069 --> 00:13:03,569
Yeah.

227
00:13:03,569 --> 00:13:05,755
AUDIENCE: Why is it that-- you
were saying if you put like

228
00:13:05,755 --> 00:13:07,050
a very good carbon source--

229
00:13:07,050 --> 00:13:07,460
PROFESSOR: Well--

230
00:13:07,460 --> 00:13:08,820
AUDIENCE: You're not going
to see any [INAUDIBLE].

231
00:13:08,820 --> 00:13:10,065
PROFESSOR: OK, so first
of all what I was saying

232
00:13:10,065 --> 00:13:11,590
is that you have to
have some carbon source.

233
00:13:11,590 --> 00:13:12,270
AUDIENCE: Sure.

234
00:13:12,270 --> 00:13:16,030
PROFESSOR: Right, so you
have to do something.

235
00:13:16,030 --> 00:13:17,820
And it's just good
conceptually to make

236
00:13:17,820 --> 00:13:19,130
sure you think about
how you would actually

237
00:13:19,130 --> 00:13:19,940
do this experiment.

238
00:13:19,940 --> 00:13:21,685
Now, you have to add
some carbon source.

239
00:13:21,685 --> 00:13:23,810
But the question is, well,
what happens if you just

240
00:13:23,810 --> 00:13:24,960
added a bunch of glucose?

241
00:13:27,780 --> 00:13:31,670
Now, in that case actually, for
some of the other experiments,

242
00:13:31,670 --> 00:13:34,040
I think that would have
caused problems in the sense

243
00:13:34,040 --> 00:13:36,940
that then there would not
be any benefits associated

244
00:13:36,940 --> 00:13:43,040
with growing or with
increasing lac operon

245
00:13:43,040 --> 00:13:44,470
expression.

246
00:13:44,470 --> 00:13:45,940
For this experiment,
in principle,

247
00:13:45,940 --> 00:13:48,529
one could have done
that, although you really

248
00:13:48,529 --> 00:13:50,570
want to measure the costs
and associated benefits

249
00:13:50,570 --> 00:13:51,690
in some environment,
which you're

250
00:13:51,690 --> 00:13:53,064
to be doing in
later experiments.

251
00:13:53,064 --> 00:13:56,790
So I think it's really from
a conceptual standpoint,

252
00:13:56,790 --> 00:14:00,850
in principle, you can
measure this in glucose,

253
00:14:00,850 --> 00:14:03,696
but then you'd always worry,
oh, well, maybe it's different.

254
00:14:03,696 --> 00:14:04,195
Yeah.

255
00:14:04,195 --> 00:14:07,251
AUDIENCE: [INAUDIBLE].

256
00:14:07,251 --> 00:14:08,630
PROFESSOR: Oh, yeah, right.

257
00:14:08,630 --> 00:14:13,440
So you could have broken down--
So the other issue is that,

258
00:14:13,440 --> 00:14:15,840
in principle-- and they
don't talk about this here--

259
00:14:15,840 --> 00:14:17,790
but yeah, if you add a
bunch of glucose, then

260
00:14:17,790 --> 00:14:19,720
you would have to have
another mutant in order

261
00:14:19,720 --> 00:14:22,480
to break the glucose
repression, because if you

262
00:14:22,480 --> 00:14:24,020
have this preferred
carbon source,

263
00:14:24,020 --> 00:14:28,730
glucose, then you'll
naturally repress the CRP, all

264
00:14:28,730 --> 00:14:31,860
of the alternative modes
of carbon metabolism

265
00:14:31,860 --> 00:14:33,650
just because glucose
was kind of the best.

266
00:14:38,100 --> 00:14:47,700
And what was the key conclusion
from this first data plot?

267
00:14:47,700 --> 00:14:49,176
AUDIENCE: It's nonlinear.

268
00:14:49,176 --> 00:14:51,370
PROFESSOR: All right, nonlinear.

269
00:14:51,370 --> 00:14:53,660
The cost is a function
of the lac expression.

270
00:14:53,660 --> 00:14:54,827
And it grows super linearly.

271
00:14:54,827 --> 00:14:56,284
I always forget
what the difference

272
00:14:56,284 --> 00:14:57,469
between concave and convex is.

273
00:14:57,469 --> 00:14:59,760
I don't know if other people
have this particular brain

274
00:14:59,760 --> 00:15:00,259
problem.

275
00:15:00,259 --> 00:15:03,559
But the second
derivative is positive.

276
00:15:03,559 --> 00:15:05,100
In particular, that
means that if you

277
00:15:05,100 --> 00:15:10,610
do draw some sort
of like line, then

278
00:15:10,610 --> 00:15:13,985
they have data that looks
something like-- so here

279
00:15:13,985 --> 00:15:15,980
is 0.5.

280
00:15:15,980 --> 00:15:19,790
We have something that
kind of falls below here.

281
00:15:22,770 --> 00:15:24,880
They had about a 0.25.

282
00:15:24,880 --> 00:15:29,555
And it was also a little
bit below that crossed.

283
00:15:29,555 --> 00:15:34,100
They had a 0.75.

284
00:15:34,100 --> 00:15:35,550
And then they had a 1.

285
00:15:35,550 --> 00:15:37,630
Why is it that they
can't go above 1 here?

286
00:15:47,724 --> 00:15:49,390
Why do they not have
more data out here?

287
00:15:54,130 --> 00:15:56,500
AUDIENCE: Because you
can't have more expression

288
00:15:56,500 --> 00:15:57,450
than full expression.

289
00:15:57,450 --> 00:15:58,900
PROFESSOR: You can't have more
expression than full expression

290
00:15:58,900 --> 00:16:01,020
with this promoter,
because what they are doing

291
00:16:01,020 --> 00:16:03,420
is they're adding
IPTG, so they titrate

292
00:16:03,420 --> 00:16:07,752
between 0 and maximal
expression from this promoter.

293
00:16:07,752 --> 00:16:09,710
In principle, you could
always get another one.

294
00:16:09,710 --> 00:16:12,083
And then you should be able
to go out further, right?

295
00:16:15,640 --> 00:16:18,280
And at maximal
expression, they measure

296
00:16:18,280 --> 00:16:26,695
about a 4% growth
deficit, 0.04, just

297
00:16:26,695 --> 00:16:28,370
to give you a sense of scale.

298
00:16:28,370 --> 00:16:31,220
So this is 4% deficit.

299
00:16:34,800 --> 00:16:38,310
Now, I want to ask a
more general question.

300
00:16:38,310 --> 00:16:40,560
So let's imagine that you
are measuring some quantity.

301
00:16:43,950 --> 00:16:48,320
So we'll say this is some
quantity y as a function of x.

302
00:16:48,320 --> 00:16:53,390
And let's imagine that the
true y as a function of x

303
00:16:53,390 --> 00:16:59,570
looks like something.

304
00:16:59,570 --> 00:17:04,670
Now, you go and you
measure at multiple values

305
00:17:04,670 --> 00:17:13,440
of x this curve, because
we're very interested in what

306
00:17:13,440 --> 00:17:15,640
this curve looks like.

307
00:17:15,640 --> 00:17:22,470
Now, the question is, what
fraction of the error bars

308
00:17:22,470 --> 00:17:42,550
will contain this
curve and, of course,

309
00:17:42,550 --> 00:17:47,500
contain this true curve?

310
00:17:47,500 --> 00:17:53,804
So I'm assuming that this curve
is the god-given actual thing

311
00:17:53,804 --> 00:17:54,720
that you're measuring.

312
00:17:54,720 --> 00:17:57,020
And so you measure this
quantity with noise.

313
00:18:01,480 --> 00:18:04,670
So we measure this some number
of times, some number of times.

314
00:18:11,579 --> 00:18:12,870
Do you understand the question?

315
00:18:12,870 --> 00:18:19,980
So here, it contained the curve.

316
00:18:19,980 --> 00:18:21,970
There, it didn't.

317
00:18:21,970 --> 00:18:29,965
So what fraction of error
bars will contain that curve?

318
00:18:29,965 --> 00:18:31,785
AUDIENCE: [INAUDIBLE].

319
00:18:31,785 --> 00:18:34,990
PROFESSOR: Right.

320
00:18:34,990 --> 00:18:37,530
And indeed, what we
want-- it's always good--

321
00:18:37,530 --> 00:18:41,260
what were the error bars
in figure 2A in this paper?

322
00:18:45,160 --> 00:18:48,420
Right, well, OK, so they're
experimental error, right.

323
00:18:48,420 --> 00:18:50,170
Incidentally, how is
it that they actually

324
00:18:50,170 --> 00:18:51,045
measure these things?

325
00:18:51,045 --> 00:19:00,510
Does anybody-- And
so these are actually

326
00:19:00,510 --> 00:19:04,600
a result of growing on
a nice [INAUDIBLE] well,

327
00:19:04,600 --> 00:19:06,400
like a microtiter
plate, where they

328
00:19:06,400 --> 00:19:07,650
used a checkerboard pattern.

329
00:19:07,650 --> 00:19:10,320
And they take 48
different cultures.

330
00:19:10,320 --> 00:19:11,727
And they measure
the growth rates

331
00:19:11,727 --> 00:19:12,810
for each one individually.

332
00:19:12,810 --> 00:19:16,240
And then they're plotting the
standard error of the mean.

333
00:19:49,240 --> 00:19:51,200
Do you understand what
I'm trying to ask you?

334
00:19:54,062 --> 00:19:56,924
AUDIENCE: So in
that case, I mean,

335
00:19:56,924 --> 00:20:00,077
the size of the error bars,
you just want a scaling

336
00:20:00,077 --> 00:20:01,910
or something,
[? if that's right, ?] because

337
00:20:01,910 --> 00:20:03,190
the size of the error bars--

338
00:20:03,190 --> 00:20:04,550
PROFESSOR: Right, well--

339
00:20:04,550 --> 00:20:05,300
AUDIENCE: I just--

340
00:20:05,300 --> 00:20:06,450
PROFESSOR: Yeah, OK, so
this is a good question.

341
00:20:06,450 --> 00:20:07,075
We'll find out.

342
00:20:32,150 --> 00:20:35,160
So it depends on n, where
n is the number of samples

343
00:20:35,160 --> 00:20:37,075
that we took at each location.

344
00:20:41,700 --> 00:20:42,400
Question, yeah.

345
00:20:42,400 --> 00:20:45,774
AUDIENCE: Yeah, the standard
error is just the [INAUDIBLE]?

346
00:20:48,412 --> 00:20:50,620
PROFESSOR: Right, so standard
error of the mean, this

347
00:20:50,620 --> 00:20:51,661
is an important question.

348
00:20:54,190 --> 00:20:57,570
What you do is you calculate
the standard deviation, divide

349
00:20:57,570 --> 00:21:01,560
by the square root
of n-- OK, now,

350
00:21:01,560 --> 00:21:04,760
I always forget whether
it's n or n minus 1, now.

351
00:21:04,760 --> 00:21:09,020
We already did one
n minus 1, right?

352
00:21:09,020 --> 00:21:12,210
So it's you measure
the standard deviation

353
00:21:12,210 --> 00:21:15,380
of the data, the standard
deviation in y divided

354
00:21:15,380 --> 00:21:18,200
by root n, where n is the
number of measurements

355
00:21:18,200 --> 00:21:20,390
you took at that point.

356
00:21:20,390 --> 00:21:26,140
But of course, when you
measure the standard deviation,

357
00:21:26,140 --> 00:21:29,590
there was already
an n minus 1, right?

358
00:21:29,590 --> 00:21:31,390
Have I lost a minus 1?

359
00:21:31,390 --> 00:21:34,510
Do you guys-- OK.
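
A short sketch of where the n minus 1 and the root n each go, assuming the usual convention: the sample standard deviation divides by n minus 1 inside the square root, and the standard error of the mean then divides that standard deviation by the square root of n. The function name and the five growth-rate values are hypothetical, picked so the arithmetic is easy to check by hand.

```python
import math
import statistics


def standard_error_of_mean(samples):
    """Sample standard deviation divided by sqrt(n)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    sd = statistics.stdev(samples)
    # The extra division here uses n, not n - 1.
    return sd / math.sqrt(n)


# Hypothetical repeated growth-rate measurements at one IPTG level.
growth_rates = [0.95, 0.99, 0.93, 0.97, 0.96]

sd = statistics.stdev(growth_rates)
sem = standard_error_of_mean(growth_rates)
print(f"mean = {statistics.mean(growth_rates):.3f}")
print(f"standard deviation = {sd:.4f}")
print(f"standard error of the mean = {sem:.4f}")
```

With these five values the squared deviations sum to 0.0020, so the variance is 0.0020 / 4 = 0.0005, the standard deviation is about 0.0224, and the standard error is about 0.0100.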

360
00:21:37,412 --> 00:21:37,912
Yeah.

361
00:21:37,912 --> 00:21:41,476
AUDIENCE: Isn't the standard--
I thought the standard error

362
00:21:41,476 --> 00:21:45,202
of the mean and not the actual
standard deviation [INAUDIBLE]?

363
00:21:45,202 --> 00:21:46,740
PROFESSOR: Yes.

364
00:21:46,740 --> 00:21:49,540
And we're going to spend a
lot of time talking about what

365
00:21:49,540 --> 00:21:51,890
the difference is between
a standard deviation

366
00:21:51,890 --> 00:21:53,290
and a standard
error of the mean.

367
00:21:53,290 --> 00:21:55,123
And it depends on what
you're trying to ask.

368
00:22:04,975 --> 00:22:07,100
Do you guys understand what
I'm trying to ask here?

369
00:22:09,625 --> 00:22:11,500
All right, well, let's
just see where we are,

370
00:22:11,500 --> 00:22:12,790
and then we'll discuss.

371
00:22:12,790 --> 00:22:14,030
OK, ready?

372
00:22:14,030 --> 00:22:16,805
3, 2, 1.

373
00:22:22,840 --> 00:22:29,470
All right, so we got
many A's, B's, C's.

374
00:22:29,470 --> 00:22:36,250
Nobody likes D. OK, but it's
very common to see that.

375
00:22:36,250 --> 00:22:38,920
Let's go ahead and--
it's worthwhile,

376
00:22:38,920 --> 00:22:45,240
I think there's enough
variation to decide.

377
00:22:45,240 --> 00:22:47,970
And in particular,
between your neighbor,

378
00:22:47,970 --> 00:22:50,640
try to agree on
why or why not it

379
00:22:50,640 --> 00:22:54,313
might depend on n and so forth.

380
00:22:54,313 --> 00:22:56,187
We'll just have a minute
to think about this.

381
00:22:59,369 --> 00:23:00,660
AUDIENCE: [INTERPOSING VOICES].

382
00:24:07,755 --> 00:24:09,516
PROFESSOR: So what
do you guys think?

383
00:24:09,516 --> 00:24:12,780
AUDIENCE: We're
still [INAUDIBLE].

384
00:24:12,780 --> 00:24:14,100
PROFESSOR: OK, no, that's fine.

385
00:24:14,100 --> 00:24:15,391
AUDIENCE: [INTERPOSING VOICES].

386
00:24:50,072 --> 00:24:52,030
PROFESSOR: Why don't we
go ahead and reconvene,

387
00:24:52,030 --> 00:24:56,650
so we can kind of try to figure
out what is going on here.

388
00:24:56,650 --> 00:24:58,330
I just want to
see if anybody has

389
00:24:58,330 --> 00:25:00,647
changed their opinion as
a result of discussing

390
00:25:00,647 --> 00:25:01,480
with their neighbor.

391
00:25:01,480 --> 00:25:02,590
All right, let's see it.

392
00:25:02,590 --> 00:25:03,740
3, 2, 1.

393
00:25:06,280 --> 00:25:08,280
Some people are not even
willing to-- all right,

394
00:25:08,280 --> 00:25:09,280
OK, so it's interesting.

395
00:25:09,280 --> 00:25:11,230
So now, actually,
it seems like there

396
00:25:11,230 --> 00:25:15,430
is some convergence to this.

397
00:25:15,430 --> 00:25:19,790
Should I feel like that
you guys in general

398
00:25:19,790 --> 00:25:24,630
have more accurate votes
than past years somehow.

399
00:25:24,630 --> 00:25:26,820
I don't know.

400
00:25:26,820 --> 00:25:30,950
So let's try to figure
out why it might be that

401
00:25:30,950 --> 00:25:33,143
and what this thing
standard deviation is.

402
00:25:33,143 --> 00:25:35,226
Let's try to figure out
what all these things are.

403
00:25:39,510 --> 00:25:44,710
So the idea is that we're
going to measure some quantity,

404
00:25:44,710 --> 00:25:46,820
but it's a measurement
with error.

405
00:25:46,820 --> 00:25:49,210
And for now, we'll just assume
that the measurement error

406
00:25:49,210 --> 00:25:54,880
is Gaussian distributed, because
otherwise, we get confused

407
00:25:54,880 --> 00:25:56,290
and everything.

408
00:25:56,290 --> 00:25:58,866
So let's say-- so
what we're going to do

409
00:25:58,866 --> 00:26:01,020
is we're going to measure
some quantity with error.

410
00:26:01,020 --> 00:26:12,750
OK, so it's-- Now, what we're
interested in is not really

411
00:26:12,750 --> 00:26:14,560
the width of the
resulting distribution,

412
00:26:14,560 --> 00:26:17,200
because that's a
result of how accurate,

413
00:26:17,200 --> 00:26:19,950
how good we are as
experimentalists.

414
00:26:19,950 --> 00:26:24,110
What we're really interested
in is this true quantity, so

415
00:26:24,110 --> 00:26:27,520
the mean of our distribution.

416
00:26:27,520 --> 00:26:28,500
We want to know mean.

417
00:26:34,840 --> 00:26:36,550
Now, if you read the
supplemental section

418
00:26:36,550 --> 00:26:38,466
of this paper, what
you'll see is that there's

419
00:26:38,466 --> 00:26:42,260
a significant standard
deviation to their measurements,

420
00:26:42,260 --> 00:26:44,929
where the standard deviation,
they don't actually

421
00:26:44,929 --> 00:26:45,970
quote exactly what it is.

422
00:26:45,970 --> 00:26:49,350
But they have plots
of the histograms,

423
00:26:49,350 --> 00:26:52,750
where like, for example,
this is a histogram

424
00:26:52,750 --> 00:26:55,270
of the different growth rate
measurements across those 48

425
00:26:55,270 --> 00:26:59,080
samples, and actually, in this
case, even more than that.

426
00:27:01,960 --> 00:27:04,160
But what you see is that
the standard deviation

427
00:27:04,160 --> 00:27:08,210
might be 3%, 4%.

428
00:27:08,210 --> 00:27:16,400
So the standard deviation is
actually something that's big.

429
00:27:16,400 --> 00:27:18,570
Now the question
is, what we really

430
00:27:18,570 --> 00:27:21,202
want to know is how the
mean of these distributions

431
00:27:21,202 --> 00:27:23,160
are shifting, because we
want to know something

432
00:27:23,160 --> 00:27:27,600
about this true underlying
growth rate deficit,

433
00:27:27,600 --> 00:27:29,330
because each
individual measurement

434
00:27:29,330 --> 00:27:32,750
is a rather noisy measurement.

435
00:27:32,750 --> 00:27:34,340
And indeed, in this
case, the noise

436
00:27:34,340 --> 00:27:36,500
is larger than the signal.

437
00:27:36,500 --> 00:27:38,830
But if we believe that
we don't have a shifting

438
00:27:38,830 --> 00:27:41,546
systematic error, then
we can average that out

439
00:27:41,546 --> 00:27:42,920
just by making
many measurements.

440
00:27:49,290 --> 00:27:51,400
So the question is,
so the standard error

441
00:27:51,400 --> 00:27:53,930
of the mean, what
it's telling us about

442
00:27:53,930 --> 00:27:58,800
is that if you measure
this quantity n times,

443
00:27:58,800 --> 00:27:59,930
you get some mean.

444
00:28:03,420 --> 00:28:06,030
So let's say that
this is a-- ooh,

445
00:28:06,030 --> 00:28:08,170
it's a little bit of a
broad somehow Gaussian.

446
00:28:14,250 --> 00:28:18,520
So this is a histogram of our
measurements of this thing.

447
00:28:18,520 --> 00:28:23,840
And what we want to know is
the mean of this distribution.

448
00:28:23,840 --> 00:28:26,830
So this is similar to our
discussion of super resolution

449
00:28:26,830 --> 00:28:29,440
microscopy.

450
00:28:29,440 --> 00:28:31,850
And the question is,
how will the mean

451
00:28:31,850 --> 00:28:34,445
be distributed if you
have these n measurements?

452
00:28:40,070 --> 00:28:42,240
It's a Gaussian distribution.

453
00:28:42,240 --> 00:28:44,660
And it's certainly a
Gaussian distribution,

454
00:28:44,660 --> 00:28:46,640
because of course, if
we-- what we're doing

455
00:28:46,640 --> 00:28:48,480
is we're measuring a
bunch of Gaussians.

456
00:28:48,480 --> 00:28:51,229
And we're going to
add them all together.

457
00:28:51,229 --> 00:28:53,020
And then we're going
to calculate the mean.

458
00:28:53,020 --> 00:28:56,120
So we definitely get a Gaussian.

459
00:28:56,120 --> 00:28:58,942
And indeed, because of
the central limit theorem,

460
00:28:58,942 --> 00:29:01,150
this is also saying that
even if your errors were not

461
00:29:01,150 --> 00:29:04,892
distributed super
Gaussian, even if they

462
00:29:04,892 --> 00:29:07,350
were a little bit funny shaped,
the resulting distributions

463
00:29:07,350 --> 00:29:09,310
of the means will look
more like a Gaussian.

464
00:29:13,320 --> 00:29:16,650
Now, what we often plot
is the standard error

465
00:29:16,650 --> 00:29:21,330
of the mean, which is kind
of the plus or minus 1 sigma

466
00:29:21,330 --> 00:29:24,490
of the distribution of the mean.

467
00:29:24,490 --> 00:29:28,070
So if we go and we sample from
this distribution n times,

468
00:29:28,070 --> 00:29:29,215
we'll get some value.

469
00:29:29,215 --> 00:29:30,590
If we sample from
it again, we'll

470
00:29:30,590 --> 00:29:32,032
get some other value, so forth.

471
00:29:32,032 --> 00:29:34,490
Now, the distribution of the
means we're going to calculate

472
00:29:34,490 --> 00:29:35,948
is not going to be
a representation

473
00:29:35,948 --> 00:29:38,450
of the full standard deviation.

474
00:29:38,450 --> 00:29:40,150
But rather, it's
going to be suppressed

475
00:29:40,150 --> 00:29:42,790
by this root n, where n is the
number that we're sampling.

476
00:29:42,790 --> 00:29:45,510
So if you look at the
histogram of the means,

477
00:29:45,510 --> 00:29:47,710
you're going to get a
Gaussian in here-- OK, that's

478
00:29:47,710 --> 00:29:54,860
not a very nice Gaussian,
but-- with a width that

479
00:29:54,860 --> 00:29:57,930
is the standard deviation
divided by root n.
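
[Editor's sketch] The root-n suppression described above can be checked numerically. This is a minimal simulation, not anything from the paper: the true value of 1.0, the 4% spread, and the names `sample_mean` and `spread_of_means` are all illustrative assumptions.

```python
import math
import random
import statistics

def sample_mean(n, mu, sigma, rng):
    """Mean of n Gaussian measurements with true value mu and spread sigma."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))

def spread_of_means(n, mu=1.0, sigma=0.04, trials=4000, seed=0):
    """Empirical standard deviation of the mean over many repeated experiments."""
    rng = random.Random(seed)
    means = [sample_mean(n, mu, sigma, rng) for _ in range(trials)]
    return statistics.stdev(means)

# Compare the simulated width of the distribution of means with sigma / sqrt(n).
for n in (4, 16, 48):
    predicted = 0.04 / math.sqrt(n)
    print(n, round(spread_of_means(n), 4), round(predicted, 4))
```

The two printed columns should agree to within a few percent, which is the sense in which the histogram of means is narrower than the histogram of individual measurements.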

480
00:30:01,800 --> 00:30:07,270
Now, if we assume that we don't
have any systematic error, then

481
00:30:07,270 --> 00:30:12,650
this distribution of means that
you would have calculated--

482
00:30:12,650 --> 00:30:14,940
it's Gaussian, it's
centered on the right value,

483
00:30:14,940 --> 00:30:19,680
but about a third
of the time, it'll

484
00:30:19,680 --> 00:30:23,230
be beyond the plus
or minus 1 sigma.

485
00:30:23,230 --> 00:30:27,537
And what that means is
that about a third of the time,

486
00:30:27,537 --> 00:30:29,370
if you plot this standard
error of the mean,

487
00:30:29,370 --> 00:30:32,060
it should fall off of
the kind of true curve.

488
00:30:36,060 --> 00:30:41,330
And this basically
does not depend on n.

489
00:30:41,330 --> 00:30:45,182
And can somebody
say why it doesn't?

490
00:30:45,182 --> 00:30:46,150
Yeah.

491
00:30:46,150 --> 00:30:49,054
AUDIENCE: Yeah, I think I
was sort of confusing myself,

492
00:30:49,054 --> 00:30:50,506
but this makes sense.

493
00:30:50,506 --> 00:30:55,081
So yeah, I mean, you know that
these error bars will shrink,

494
00:30:55,081 --> 00:30:56,524
if you take more measurements.

495
00:30:56,524 --> 00:30:59,420
But on the other hand,
the actual measurements

496
00:30:59,420 --> 00:31:00,090
will be closer--

497
00:31:00,090 --> 00:31:01,090
PROFESSOR: That's right.

498
00:31:01,090 --> 00:31:02,352
AUDIENCE: --to the true value.

499
00:31:02,352 --> 00:31:03,351
PROFESSOR: That's right.

500
00:31:03,351 --> 00:31:05,510
So what happens is
that as you sample

501
00:31:05,510 --> 00:31:08,160
from this distribution
a larger number n times,

502
00:31:08,160 --> 00:31:10,420
then your error bars shrink,
but your measurements

503
00:31:10,420 --> 00:31:12,230
get closer to the curve.

504
00:31:12,230 --> 00:31:14,170
And those two effects cancel.

505
00:31:14,170 --> 00:31:19,220
So you should end up roughly
with 2/3 of the error bars

506
00:31:19,220 --> 00:31:23,200
containing this curve,
or 1/3 falling off.

507
00:31:23,200 --> 00:31:25,890
And I think that this is
a little bit surprising,

508
00:31:25,890 --> 00:31:29,940
because there's always a sense
that we feel that there's

509
00:31:29,940 --> 00:31:32,790
something wrong with our
measurements or something

510
00:31:32,790 --> 00:31:40,320
wrong with our model or whatnot,
if any error bar does not

511
00:31:40,320 --> 00:31:41,210
contain the line.

512
00:31:41,210 --> 00:31:44,420
I mean, I feel like I often see
there's this effort that people

513
00:31:44,420 --> 00:31:51,220
have to try to make it so that
these error bars always overlap

514
00:31:51,220 --> 00:31:53,565
with some underlying
curve that is supposed

515
00:31:53,565 --> 00:31:54,830
to represent reality.

516
00:31:54,830 --> 00:31:58,880
But that's not, in principle,
supposed to be true.

517
00:32:02,016 --> 00:32:05,772
Are there any questions
about where we are right now?

518
00:32:05,772 --> 00:32:07,580
OK.

519
00:32:07,580 --> 00:32:10,160
Now, what I want to do is
something slightly different,

520
00:32:10,160 --> 00:32:15,560
which is ask-- let's say that
this is a curve that is not

521
00:32:15,560 --> 00:32:19,235
the underlying reality but
is instead a fit to the data.

522
00:32:21,920 --> 00:32:28,421
How does this change
anything that we've said?

523
00:32:31,310 --> 00:32:33,462
Or does it?

524
00:32:33,462 --> 00:32:39,510
All right, well, OK, let's-- OK,
so we're going to say we do a fit.

525
00:32:39,510 --> 00:32:45,660
The question is does this
change the thing here?

526
00:32:45,660 --> 00:32:46,410
Do you understand?

527
00:32:51,950 --> 00:32:57,475
Change, A is Yes, B is No.

528
00:33:02,270 --> 00:33:02,770
Yes.

529
00:33:02,770 --> 00:33:05,752
AUDIENCE: Do you have the
same modeling for the fit

530
00:33:05,752 --> 00:33:07,740
as we did for the original--

531
00:33:07,740 --> 00:33:12,190
PROFESSOR: Yeah, well, let's say
that this was a curve predicted

532
00:33:12,190 --> 00:33:16,900
by some fancy
theory but that you

533
00:33:16,900 --> 00:33:21,324
have to specify the mass
of something and the-- so I

534
00:33:21,324 --> 00:33:23,490
don't know, there are two
things that are specified.

535
00:33:23,490 --> 00:33:25,390
So what you do is you fit.

536
00:33:25,390 --> 00:33:27,270
And the question
is, does it change

537
00:33:27,270 --> 00:33:29,530
what fraction of the
error bars you expect

538
00:33:29,530 --> 00:33:32,320
to contain the true curve?

539
00:33:32,320 --> 00:33:34,100
Ready?

540
00:33:34,100 --> 00:33:35,647
Is it not clear what I'm asking?

541
00:33:35,647 --> 00:33:39,623
AUDIENCE: But the true curve
is determined by the god.

542
00:33:39,623 --> 00:33:40,617
[LAUGHTER]

543
00:33:40,617 --> 00:33:42,890
PROFESSOR: Right, so the
true curve-- we don't

544
00:33:42,890 --> 00:33:46,750
need to get too much into this.

545
00:33:46,750 --> 00:33:49,160
But I mean, the
reason we're doing

546
00:33:49,160 --> 00:33:52,730
science is to try to look
into the mind of God, right?

547
00:33:52,730 --> 00:33:55,140
So we were doing
a fit to try to--

548
00:33:55,140 --> 00:33:58,104
AUDIENCE: But you can
fit anything to anything.

549
00:33:58,104 --> 00:34:00,952
You know, what does that mean?

550
00:34:00,952 --> 00:34:01,910
Do you see what I mean?

551
00:34:01,910 --> 00:34:02,860
Like, I could--

552
00:34:02,860 --> 00:34:03,810
AUDIENCE: It
depends on whether--

553
00:34:03,810 --> 00:34:05,435
AUDIENCE: I could
get a curve that passes

554
00:34:05,435 --> 00:34:08,082
through each and every
one of these points,

555
00:34:08,082 --> 00:34:10,330
if you give me
enough time with it.

556
00:34:10,330 --> 00:34:12,589
So I guess I don't
understand the question.

557
00:34:12,589 --> 00:34:13,422
[INTERPOSING VOICES]

558
00:34:19,494 --> 00:34:20,650
PROFESSOR: All right, yeah.

559
00:34:20,650 --> 00:34:24,820
OK, but I think you're arguing
for something already maybe.

560
00:34:24,820 --> 00:34:28,870
But let's just say that
this was a-- I mean,

561
00:34:28,870 --> 00:34:33,969
let's just for concreteness
let's say that I measured at 15

562
00:34:33,969 --> 00:34:35,340
values of x.

563
00:34:35,340 --> 00:34:37,630
I have some error
bars and some error.

564
00:34:37,630 --> 00:34:40,530
But then I needed
three parameters

565
00:34:40,530 --> 00:34:41,880
to characterize this curve.

566
00:34:41,880 --> 00:34:45,246
And so those I used to fit.

567
00:34:45,246 --> 00:34:47,120
Are you happier with
three fitting parameters

568
00:34:47,120 --> 00:34:49,824
and 15 measurements?

569
00:34:49,824 --> 00:34:51,449
All right, let's just
see where we are.

570
00:34:51,449 --> 00:34:56,460
OK, ready, 3, 2, 1.

571
00:34:56,460 --> 00:35:00,220
OK, so we have a majority of
A but a significant minority

572
00:35:00,220 --> 00:35:04,540
of B. So just to be
a lot more concrete,

573
00:35:04,540 --> 00:35:06,270
can somebody say why
they're saying yes?

574
00:35:13,594 --> 00:35:14,094
Yeah.

575
00:35:14,094 --> 00:35:17,191
AUDIENCE: I guess
intuitively, [INAUDIBLE] we

576
00:35:17,191 --> 00:35:19,962
try to optimize the number of
error bars that go through.

577
00:35:19,962 --> 00:35:21,800
PROFESSOR: Yeah, so
the fit is somehow

578
00:35:21,800 --> 00:35:25,929
trying to get the curve
to go near the error bars.

579
00:35:25,929 --> 00:35:27,470
And typically when
we do a fit, we're

580
00:35:27,470 --> 00:35:30,180
typically trying to
minimize this mean squared

581
00:35:30,180 --> 00:35:34,200
error or deviation from our
curve to the data point.

582
00:35:38,940 --> 00:35:41,631
How much do you expect this
to make a difference?

583
00:35:41,631 --> 00:35:43,130
So for concreteness
again, let's say

584
00:35:43,130 --> 00:35:53,300
that I had 15 values of x that
I was measuring things at.

585
00:35:53,300 --> 00:36:00,070
Now, we expect say five
of them-- five will miss

586
00:36:00,070 --> 00:36:06,590
the true curve, we decided roughly.

587
00:36:06,590 --> 00:36:10,720
Now the question is, what
happens if we, instead

588
00:36:10,720 --> 00:36:13,370
of having this true curve, if
we do a fit using these three

589
00:36:13,370 --> 00:36:15,800
parameters?

590
00:36:15,800 --> 00:36:20,270
How much of a difference
should it make to this very,

591
00:36:20,270 --> 00:36:22,390
very roughly?

592
00:36:22,390 --> 00:36:38,480
We'll see-- Now, I'm asking
roughly how many of these error

593
00:36:38,480 --> 00:36:41,190
bars do you expect to then
miss the fitted curve?

594
00:36:43,890 --> 00:36:49,310
And this is we used three
fitting parameters, say.

595
00:36:54,610 --> 00:36:56,640
That was parameters over there.

596
00:36:56,640 --> 00:36:58,550
Do you understand the question?

597
00:36:58,550 --> 00:37:01,190
So instead of plotting
this god-given curve,

598
00:37:01,190 --> 00:37:03,220
instead we're plotting
a curve that I'm

599
00:37:03,220 --> 00:37:06,170
giving you, where I use three
fitting parameters to fit

600
00:37:06,170 --> 00:37:06,670
to the data.

601
00:37:09,410 --> 00:37:11,290
And I'm just trying
to get it roughly.

602
00:37:11,290 --> 00:37:14,355
I think that this is
not a rigorous statement

603
00:37:14,355 --> 00:37:16,230
I'm about to make, but
just so that we're all

604
00:37:16,230 --> 00:37:17,350
roughly on the same page.

605
00:37:17,350 --> 00:37:20,415
All right, ready, 3, 2, 1.

606
00:37:27,900 --> 00:37:31,030
Right, so it'll be
somewhere in here.

607
00:37:31,030 --> 00:37:34,240
And I think this
is not quite true.

608
00:37:34,240 --> 00:37:39,100
But the idea is
that, in particular,

609
00:37:39,100 --> 00:37:43,450
if you make n
measurements and then you

610
00:37:43,450 --> 00:37:47,190
use n fitting
parameters, in general

611
00:37:47,190 --> 00:37:50,960
you will get a perfect fit, i.e.

612
00:37:50,960 --> 00:37:53,780
the curve will go
through every single data

613
00:37:53,780 --> 00:37:57,170
point amazingly perfectly.

614
00:37:57,170 --> 00:38:01,830
So if I give you 15
measurements across here

615
00:38:01,830 --> 00:38:05,060
and then I give you a
15-degree polynomial-- I guess,

616
00:38:05,060 --> 00:38:07,020
we only need a
14-degree polynomial

617
00:38:07,020 --> 00:38:10,180
with 15 free parameters--
then that polynomial

618
00:38:10,180 --> 00:38:14,960
will go through every one of
your data points spot on,

619
00:38:14,960 --> 00:38:18,790
not even a question of whether
it goes through the error bars.

620
00:38:18,790 --> 00:38:31,742
I'm saying
literally-- and that's

621
00:38:31,742 --> 00:38:34,200
just because you're just solving
an equation at that stage.
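
[Editor's sketch] The point about 15 measurements and a 14-degree polynomial can be seen directly with Lagrange interpolation, which constructs the unique polynomial of degree n-1 through n points. The data below are hypothetical (a shallow line plus unit Gaussian noise), and `lagrange_eval` is an illustrative helper name.

```python
import random

def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through the n points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

rng = random.Random(2)
xs = [float(i) for i in range(15)]                 # 15 measurement positions
ys = [0.1 * x + rng.gauss(0.0, 1.0) for x in xs]   # hypothetical noisy data

# 15 free parameters, 15 points: the residual at every data point vanishes.
residuals = [abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys)]
print(max(residuals))
```

The perfect agreement says nothing about the underlying line; the polynomial is simply solving 15 equations for 15 unknowns, noise included.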

622
00:38:37,060 --> 00:38:40,510
Now, this is a stupid statement,
except that once you're

623
00:38:40,510 --> 00:38:44,020
kind of like in the
heat of the moment,

624
00:38:44,020 --> 00:38:46,480
eagerly trying to do some
fitting for your advisor

625
00:38:46,480 --> 00:38:49,920
or whatever, it's easy to
fall into this trap, where

626
00:38:49,920 --> 00:38:53,169
you just kind of like
add extra parameters.

627
00:38:53,169 --> 00:38:55,210
I mean, I definitely
remember in graduate school,

628
00:38:55,210 --> 00:38:55,940
I was surprised.

629
00:38:55,940 --> 00:38:57,981
I was like, oh, this thing,
it works wonderfully.

630
00:38:57,981 --> 00:39:01,380
It's like it seems to magically
go through all my data.

631
00:39:01,380 --> 00:39:05,690
And then I felt very stupid
like 30 seconds later.

632
00:39:05,690 --> 00:39:08,400
But this is just
a very easy thing

633
00:39:08,400 --> 00:39:11,880
to screw up and forget about.

634
00:39:11,880 --> 00:39:13,680
So what this is
saying is that, if you

635
00:39:13,680 --> 00:39:16,930
see a curve-- if in
the course of your work

636
00:39:16,930 --> 00:39:23,020
or if you're reading a
paper and you see some curve

637
00:39:23,020 --> 00:39:27,430
and you want to know something
about how much information is

638
00:39:27,430 --> 00:39:30,810
in it or whether things look
reasonable given the data,

639
00:39:30,810 --> 00:39:33,750
it's useful to kind
of orient yourself

640
00:39:33,750 --> 00:39:38,040
relative to these
statements, that depending

641
00:39:38,040 --> 00:39:40,960
on how many free parameters
you're kind of using,

642
00:39:40,960 --> 00:39:45,810
you expect a larger or
smaller number of these data

643
00:39:45,810 --> 00:39:48,650
points to kind of go through
the curve that you see.
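
[Editor's sketch] A rough numerical version of the question posed above: with 15 measurements, how many 1-sigma error bars miss the true curve, and how many miss a curve fitted with three parameters? Everything specific here is an assumption for illustration: the "true" curve is a made-up quadratic, the fit is an ordinary least-squares quadratic, each point carries a Gaussian error bar of known size, and the helper names (`solve`, `polyfit`, `polyval`, `average_misses`) are hypothetical.

```python
import random

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients via the normal equations (constant first)."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)

def polyval(coeffs, x):
    """Evaluate a polynomial given coefficients with the constant term first."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def average_misses(trials=2000, sigma=1.0, seed=3):
    """Average number of the 15 error bars that miss the true and the fitted curve."""
    rng = random.Random(seed)
    xs = [i / 14 for i in range(15)]
    true = [1.0, -2.0, 3.0]  # hypothetical "true" quadratic
    miss_true = miss_fit = 0
    for _ in range(trials):
        ys = [polyval(true, x) + rng.gauss(0.0, sigma) for x in xs]
        fit = polyfit(xs, ys, 2)  # three free parameters
        miss_true += sum(abs(y - polyval(true, x)) > sigma for x, y in zip(xs, ys))
        miss_fit += sum(abs(y - polyval(fit, x)) > sigma for x, y in zip(xs, ys))
    return miss_true / trials, miss_fit / trials

print(average_misses())
```

Under these assumptions the first number should come out close to 5 of 15, and the second should be smaller, because the fit has used some of the noise to pull the curve toward the points.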

644
00:39:53,910 --> 00:39:57,397
But I would just want
to stress that you

645
00:39:57,397 --> 00:39:59,855
don't want to be anywhere close
to the point where you have

646
00:39:59,855 --> 00:40:04,132
a number of parameters
equal to the number of kind

647
00:40:04,132 --> 00:40:05,590
of measurements
that you're making.

648
00:40:05,590 --> 00:40:09,720
And for any sort
of reasonable curve

649
00:40:09,720 --> 00:40:11,580
describing what you
hope is a reality,

650
00:40:11,580 --> 00:40:15,445
you expect some of those data
points with their error bars

651
00:40:15,445 --> 00:40:16,920
to kind of miss the curve.

652
00:40:16,920 --> 00:40:18,340
And that doesn't
mean that they're

653
00:40:18,340 --> 00:40:19,300
sloppy experimentalists.

654
00:40:19,300 --> 00:40:20,341
It doesn't mean whatever.

655
00:40:25,250 --> 00:40:31,640
OK, now coming back
to the task at hand,

656
00:40:31,640 --> 00:40:33,460
do you understand
why they're plotting

657
00:40:33,460 --> 00:40:34,670
the standard error
of the mean rather

658
00:40:34,670 --> 00:40:35,836
than the standard deviation?

659
00:40:38,500 --> 00:40:41,781
Because what you're interested
in, in principle, is not--

660
00:40:41,781 --> 00:40:43,280
the question you're
trying to answer

661
00:40:43,280 --> 00:40:46,720
is not how variable
are their measurements

662
00:40:46,720 --> 00:40:49,900
but to what certainty
can they claim

663
00:40:49,900 --> 00:40:56,847
to know the actual god-given,
real cost associated

664
00:40:56,847 --> 00:40:59,430
with expressing these proteins
as a function of the expression

665
00:40:59,430 --> 00:41:00,000
level.

666
00:41:00,000 --> 00:41:01,480
And for that, you really want
to ask about the standard error

667
00:41:01,480 --> 00:41:02,158
of the mean.

668
00:41:07,051 --> 00:41:07,550
Great.

669
00:41:10,500 --> 00:41:15,450
So now, we can come
back and ask about,

670
00:41:15,450 --> 00:41:17,750
why did I just
spend half an hour

671
00:41:17,750 --> 00:41:22,820
talking about standard error of
the mean, standard deviations,

672
00:41:22,820 --> 00:41:23,650
fitting to data?

673
00:41:28,057 --> 00:41:30,640
Well, you guys are probably all
asking yourself that question.

674
00:41:30,640 --> 00:41:33,814
But does anybody have
an answer if I-- Yeah.

675
00:41:33,814 --> 00:41:40,506
AUDIENCE: You can fit
with different curves

676
00:41:40,506 --> 00:41:42,312
if you use different things.

677
00:41:42,312 --> 00:41:44,770
PROFESSOR: You can fit with
different curves if you-- yeah,

678
00:41:44,770 --> 00:41:48,614
I think that it's hard
to argue with that statement.

679
00:41:48,614 --> 00:41:51,030
But the statement is a little
bit like "different proteins

680
00:41:51,030 --> 00:41:55,350
have different expression
levels," but a little bit more

681
00:41:55,350 --> 00:41:57,749
concrete maybe.

682
00:41:57,749 --> 00:41:58,249
Yeah.

683
00:41:58,249 --> 00:42:00,503
AUDIENCE: So in
this case, I didn't

684
00:42:00,503 --> 00:42:03,079
check their calculations, but
if you have a natural line,

685
00:42:03,079 --> 00:42:07,426
then you can't make this
calculation of optimization.

686
00:42:07,426 --> 00:42:10,121
PROFESSOR: Yeah, but I
think that-- right, so--

687
00:42:10,121 --> 00:42:12,550
AUDIENCE: In the sense that
there won't be [INAUDIBLE].

688
00:42:12,550 --> 00:42:14,450
PROFESSOR: Yeah, OK.

689
00:42:14,450 --> 00:42:18,000
So I think that this
is a tricky thing.

690
00:42:18,000 --> 00:42:24,630
The data certainly do argue
for a super linear cost.

691
00:42:24,630 --> 00:42:29,640
But I would say that they
argue for it rather weakly,

692
00:42:29,640 --> 00:42:35,350
in that if you look at their
data and you just fit a line,

693
00:42:35,350 --> 00:42:39,510
you would say, it's maybe OK.

694
00:42:39,510 --> 00:42:41,640
And of course, once
again, should we

695
00:42:41,640 --> 00:42:46,110
be surprised that the
quadratic fits better?

696
00:42:46,110 --> 00:42:47,200
No.

697
00:42:47,200 --> 00:42:50,480
And this is a very
dangerous thing,

698
00:42:50,480 --> 00:42:51,639
if you're comparing models.

699
00:42:51,639 --> 00:42:53,930
It'll always be the case, if
you add another parameter,

700
00:42:53,930 --> 00:42:57,240
it will look better.
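
[Editor's sketch] The warning about extra parameters can be made concrete: for nested least-squares models, the residual sum of squares of the quadratic can never exceed that of the line, even when the truth is exactly a line. The data, noise level, and helper names below are illustrative assumptions, not the paper's measurements.

```python
import random

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients via the normal equations (constant first)."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)

def rss(xs, ys, degree):
    """Residual sum of squares of the best-fit polynomial of the given degree."""
    coeffs = polyfit(xs, ys, degree)
    fitted = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return sum((y - f) ** 2 for y, f in zip(ys, fitted))

rng = random.Random(4)
xs = [i / 14 for i in range(15)]
# Hypothetical data: the truth is exactly a line, plus Gaussian noise.
ys = [0.5 - 1.0 * x + rng.gauss(0.0, 0.1) for x in xs]

print("line     :", rss(xs, ys, 1))
print("quadratic:", rss(xs, ys, 2))
```

Because the quadratic always wins on raw residuals, a fair comparison has to charge it for the extra parameter, for example by dividing by the number of degrees of freedom or using an F-test.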

701
00:42:57,240 --> 00:43:01,730
But the question then
is how strong of a case

702
00:43:01,730 --> 00:43:03,670
should we make of this?

703
00:43:03,670 --> 00:43:10,980
And then how important is it for
the conclusions of the study?

704
00:43:10,980 --> 00:43:16,390
Now, in addition to the
line and the quadratic,

705
00:43:16,390 --> 00:43:19,820
they had another
curve in here, which

706
00:43:19,820 --> 00:43:22,650
looks like-- let me see if I
can get it right for you guys.

707
00:43:28,080 --> 00:43:32,860
So this is fine, tricky thing.

708
00:43:32,860 --> 00:43:34,900
So it's the dashed
line that looks

709
00:43:34,900 --> 00:43:37,650
very similar to the
solid quadratic line.

710
00:43:37,650 --> 00:43:39,400
Can somebody remind
us what the difference

711
00:43:39,400 --> 00:43:43,950
was between those two
non-linear curves that they had?

712
00:43:53,350 --> 00:43:56,950
Why do they have two curves
that look so similar?

713
00:44:07,730 --> 00:44:10,610
AUDIENCE: I think the dashed
line corresponds to some model

714
00:44:10,610 --> 00:44:14,750
where there's only so much of
this certain resource that--

715
00:44:14,750 --> 00:44:17,840
PROFESSOR: Right, OK, so my
dashed line is their red line,

716
00:44:17,840 --> 00:44:22,600
just to-- OK dashed
red in the paper.

717
00:44:22,600 --> 00:44:27,060
So it's this line where there's
a finite amount of resources

718
00:44:27,060 --> 00:44:29,890
or protein-making machinery
that the cell has.

719
00:44:29,890 --> 00:44:32,920
And if you use them up, then
you don't get any growth.

720
00:44:32,920 --> 00:44:36,020
And of course, that
statement kind of

721
00:44:36,020 --> 00:44:37,540
has to be true on some level.

722
00:44:37,540 --> 00:44:39,455
And the question is whether--

723
00:44:39,455 --> 00:44:40,826
AUDIENCE: --that scale is--

724
00:44:40,826 --> 00:44:42,450
PROFESSOR: --it's
relevant here, right.

725
00:44:46,620 --> 00:44:48,890
Certainly, I would
say that one question

726
00:44:48,890 --> 00:44:52,920
is whether you can reject
the hypothesis that this cost

727
00:44:52,920 --> 00:44:53,810
function is a line.

728
00:44:53,810 --> 00:44:56,940
Another question is whether you
can distinguish between the two

729
00:44:56,940 --> 00:45:01,490
quadratic or the two non-linear
curves based on the data.

730
00:45:01,490 --> 00:45:03,720
And I think the answer
to the second question

731
00:45:03,720 --> 00:45:05,960
is certainly not.

732
00:45:05,960 --> 00:45:07,510
And they don't
claim that they can.

733
00:45:10,410 --> 00:45:12,590
But it's important to
just note that it's

734
00:45:12,590 --> 00:45:15,320
just impossible for
them to assume--

735
00:45:15,320 --> 00:45:19,180
I mean, those curves are so, so
similar over the entire range

736
00:45:19,180 --> 00:45:20,900
where they have
data that it's going

737
00:45:20,900 --> 00:45:24,420
to be impossible to
distinguish those two things.

738
00:45:24,420 --> 00:45:28,620
But does it matter which
of the two cost functions

739
00:45:28,620 --> 00:45:30,630
is the true cost function?

740
00:45:33,595 --> 00:45:34,095
Yeah.

741
00:45:34,095 --> 00:45:37,560
AUDIENCE: Is it because
the [INAUDIBLE] where

742
00:45:37,560 --> 00:45:39,045
the marginal
benefits become zero

743
00:45:39,045 --> 00:45:43,005
is like inside the range where
the cost functions are still

744
00:45:43,005 --> 00:45:44,985
exactly the same?

745
00:45:44,985 --> 00:45:46,060
PROFESSOR: OK, right.

746
00:45:46,060 --> 00:45:48,435
So what you're saying is that
the two cost functions they

747
00:45:48,435 --> 00:45:53,100
have they behave similarly
over the range that is relevant

748
00:45:53,100 --> 00:45:56,350
maybe, so then therefore,
it doesn't matter.

749
00:45:56,350 --> 00:45:59,749
Is that-- or am I-- OK.

750
00:45:59,749 --> 00:46:01,790
So why do they have two
cost functions there then,

751
00:46:01,790 --> 00:46:03,390
why two non-linear
cost functions?

752
00:46:08,790 --> 00:46:14,134
Just to provide variety
in our modeling?

753
00:46:14,134 --> 00:46:14,634
Yep.

754
00:46:14,634 --> 00:46:17,094
AUDIENCE: They were doing
another experiment later on,

755
00:46:17,094 --> 00:46:22,506
and they said something
like something was saturated

756
00:46:22,506 --> 00:46:24,774
and that was modeled by
the second cost function.

757
00:46:24,774 --> 00:46:25,690
PROFESSOR: Right, yes.

758
00:46:25,690 --> 00:46:26,770
That's right.

759
00:46:26,770 --> 00:46:28,947
And what's the later
experiment they're going to do,

760
00:46:28,947 --> 00:46:29,780
just so that we're--

761
00:46:34,532 --> 00:46:36,115
AUDIENCE: You should
ask somebody else

762
00:46:36,115 --> 00:46:37,202
to explain that, not me.

763
00:46:37,202 --> 00:46:38,910
PROFESSOR: You regret
opening your mouth.

764
00:46:38,910 --> 00:46:40,984
No, OK.

765
00:46:40,984 --> 00:46:42,400
So yeah, so what
is the experiment

766
00:46:42,400 --> 00:46:45,291
that they're going to do?

767
00:46:45,291 --> 00:46:47,159
AUDIENCE: Measuring the benefit?

768
00:46:47,159 --> 00:46:49,533
PROFESSOR: So next, they're
going to measure the benefit.

769
00:46:49,533 --> 00:46:52,730
But this question about
the two cost functions

770
00:46:52,730 --> 00:46:54,802
is not somehow relevant
yet for the benefit part.

771
00:47:01,760 --> 00:47:02,420
Yes.

772
00:47:02,420 --> 00:47:04,900
AUDIENCE: So they're doing it
in different concentrations

773
00:47:04,900 --> 00:47:08,620
of lactose and seeing if
the protein expression could

774
00:47:08,620 --> 00:47:09,370
adapt [INAUDIBLE].

775
00:47:09,370 --> 00:47:11,590
PROFESSOR: Right,
after a long time.

776
00:47:11,590 --> 00:47:13,880
So they actually do laboratory
evolution experiments,

777
00:47:13,880 --> 00:47:16,780
where they grow these
bacterial populations

778
00:47:16,780 --> 00:47:18,820
in different lactose
concentrations.

779
00:47:18,820 --> 00:47:21,630
And then they look to see
what level of the lac operon

780
00:47:21,630 --> 00:47:25,146
expression does the
population evolve to.

781
00:47:25,146 --> 00:47:26,520
So what they're
trying to do here

782
00:47:26,520 --> 00:47:28,470
is they're trying
to say, OK, well,

783
00:47:28,470 --> 00:47:31,680
we can measure some cost as
a function of expression.

784
00:47:31,680 --> 00:47:33,540
Maybe we can measure
some benefits

785
00:47:33,540 --> 00:47:34,792
as a function of expression.

786
00:47:34,792 --> 00:47:36,250
And then from that,
we'd like to be

787
00:47:36,250 --> 00:47:38,940
able to predict where the
population will evolve to.

788
00:47:49,260 --> 00:47:52,604
And they had these two
non-linear cost functions,

789
00:47:52,604 --> 00:47:55,020
which based on the data they
have, they can't distinguish.

790
00:47:55,020 --> 00:47:57,519
But they say, oh, well, they're
both kind of reasonable cost

791
00:47:57,519 --> 00:47:58,640
functions.

792
00:47:58,640 --> 00:48:00,250
And in some ways,
maybe the problem

793
00:48:00,250 --> 00:48:03,550
here is that the two
cost functions end up

794
00:48:03,550 --> 00:48:07,620
being wildly different in terms
of predicting what happens

795
00:48:07,620 --> 00:48:11,849
for large lactose concentrations,
where you would want

796
00:48:11,849 --> 00:48:13,140
to express more of the protein.

797
00:48:20,140 --> 00:48:23,351
Do you guys-- do you
remember this or not?

798
00:48:23,351 --> 00:48:23,850
Sort of.

799
00:48:26,687 --> 00:48:28,770
And that's actually-- well,
you might as well just

800
00:48:28,770 --> 00:48:30,230
look at that.

801
00:48:30,230 --> 00:48:33,590
So that's figure 4.

802
00:48:33,590 --> 00:48:35,900
That's the normalized
lacZ activity

803
00:48:35,900 --> 00:48:37,336
that the populations
evolve to as

804
00:48:37,336 --> 00:48:38,960
a function of the
lactose concentration

805
00:48:38,960 --> 00:48:39,820
they're evolving in.

806
00:48:39,820 --> 00:48:43,710
And what you see is that
this red curve corresponding

807
00:48:43,710 --> 00:48:49,020
to the finite resources cost
function, it explains the data.

808
00:48:49,020 --> 00:48:50,800
Whereas, the other
ones very much do not.

809
00:48:55,570 --> 00:48:59,230
And that's just because
these other models then

810
00:48:59,230 --> 00:49:02,570
would predict that if you grow
the cells in a lot of lactose

811
00:49:02,570 --> 00:49:05,800
they should express
out to five times

812
00:49:05,800 --> 00:49:08,442
the lac expression, much,
much, much more, which is not

813
00:49:08,442 --> 00:49:09,650
what they see experimentally.

814
00:49:13,860 --> 00:49:14,360
Yes.

815
00:49:14,360 --> 00:49:17,762
AUDIENCE: Is there
another way to put

816
00:49:17,762 --> 00:49:21,636
a bound on the expression,
because of this expression

817
00:49:21,636 --> 00:49:22,136
we have?

818
00:49:22,136 --> 00:49:26,024
You mentioned for that
promoter, it's not possible to--

819
00:49:26,024 --> 00:49:28,080
PROFESSOR: OK, but
the idea of evolution

820
00:49:28,080 --> 00:49:31,890
is that evolution can make
it a stronger promoter.

821
00:49:31,890 --> 00:49:35,070
So you guys, one statement
is, given this DNA sequence

822
00:49:35,070 --> 00:49:37,290
at that promoter, how much
expression can you get?

823
00:49:37,290 --> 00:49:40,630
And the most you can get is this
amount that's normalized to 1.

824
00:49:40,630 --> 00:49:43,071
But if you make mutations
in that promoter,

825
00:49:43,071 --> 00:49:44,320
then you could go out further.

826
00:49:48,050 --> 00:49:51,990
So the question now is,
after we kind of tell you

827
00:49:51,990 --> 00:49:54,210
the results of these
evolution experiments,

828
00:49:54,210 --> 00:49:59,200
how much should that favor
this dashed red line,

829
00:49:59,200 --> 00:50:04,810
this super linear cost
function with finite resources?

830
00:50:04,810 --> 00:50:11,810
And on one level, you'd say, oh,
well, that's pretty compelling.

831
00:50:11,810 --> 00:50:16,540
On another level,
later people that

832
00:50:16,540 --> 00:50:21,230
have come and measured this
find that it's basically a line.

833
00:50:21,230 --> 00:50:23,390
So are there any questions?

834
00:50:23,390 --> 00:50:29,360
So it seems to basically be
not true within this range.

835
00:50:29,360 --> 00:50:31,420
It is the case that if
you go out far enough,

836
00:50:31,420 --> 00:50:32,890
then the growth does go to 0.

837
00:50:32,890 --> 00:50:36,341
But that's much further out.

838
00:50:36,341 --> 00:50:36,840
Yes.

839
00:50:36,840 --> 00:50:39,325
AUDIENCE: After they--
on the experiment,

840
00:50:39,325 --> 00:50:42,801
they had [INAUDIBLE] expression
protein at the [INAUDIBLE]

841
00:50:42,801 --> 00:50:43,301
level.

842
00:50:43,301 --> 00:50:44,957
Why didn't they go back and
do the experiment again, just

843
00:50:44,957 --> 00:50:45,786
to see [INAUDIBLE]?

844
00:50:45,786 --> 00:50:47,550
PROFESSOR: OK, so
actually, one of these

845
00:50:47,550 --> 00:50:53,220
curves-- so the triangle,
the sort of teal triangle,

846
00:50:53,220 --> 00:50:54,870
it is indeed higher up.

847
00:50:54,870 --> 00:50:57,330
And it's kind of here.

848
00:50:57,330 --> 00:51:01,000
So they do have a data point
that is further beyond and is,

849
00:51:01,000 --> 00:51:03,030
again, above that curve.

850
00:51:03,030 --> 00:51:06,560
So that does provide
somewhat further support

851
00:51:06,560 --> 00:51:07,892
for a non-linear model.

852
00:51:10,870 --> 00:51:13,860
But again, there's a question
of how strong that should be

853
00:51:13,860 --> 00:51:15,650
and so forth.

854
00:51:15,650 --> 00:51:19,030
And indeed, I'd
say, for example,

855
00:51:19,030 --> 00:51:21,620
Terry Hwa has
spent a lot of time

856
00:51:21,620 --> 00:51:26,450
characterizing growth rates as
a function of many, many things.

857
00:51:26,450 --> 00:51:33,000
And if you measure the
relative growth rate

858
00:51:33,000 --> 00:51:41,504
as a function of a non-useful
protein expression--

859
00:51:41,504 --> 00:51:43,420
and what he finds is
that this thing basically

860
00:51:43,420 --> 00:51:46,430
looks like a line in this axis.

861
00:51:46,430 --> 00:51:48,805
And it saturates
at around if you're

862
00:51:48,805 --> 00:51:55,740
at 30% maybe of total
protein expression.

863
00:51:55,740 --> 00:51:57,600
So this is a lot.

864
00:51:57,600 --> 00:52:01,125
But this is kind of where the
cell just can't handle that.

865
00:52:04,190 --> 00:52:07,530
So Terry Hwa has
recently been exploring

866
00:52:07,530 --> 00:52:09,780
a lot of these sort of
phenomenological growth

867
00:52:09,780 --> 00:52:14,830
laws, where he imposes
costs of various sorts

868
00:52:14,830 --> 00:52:17,570
and then looks at how the
cell kind of responds.

869
00:52:17,570 --> 00:52:21,020
And what he finds is just
a remarkably large number

870
00:52:21,020 --> 00:52:27,410
of lines in various spaces
that I find very surprising,

871
00:52:27,410 --> 00:52:29,070
but that he can
understand using kind

872
00:52:29,070 --> 00:52:31,376
of some phenomenological
modeling.

873
00:52:31,376 --> 00:52:34,020
But this is one of
like a dozen lines

874
00:52:34,020 --> 00:52:37,060
that he sees of various
axes doing things.

875
00:52:37,060 --> 00:52:39,340
But the point here is
that, as a function

876
00:52:39,340 --> 00:52:42,575
of the level of expression of
these non-useful proteins, what

877
00:52:42,575 --> 00:52:45,390
he sees is that for a variety of
different proteins-- including

878
00:52:45,390 --> 00:52:48,897
beta-gal but also beta-lactamase
and other proteins that are not

879
00:52:48,897 --> 00:52:51,230
being used in that particular
environment-- what he sees

880
00:52:51,230 --> 00:52:54,850
is that there's
basically a linear cost

881
00:52:54,850 --> 00:53:03,900
to growth, as you impose this
non-useful protein expression.

882
00:53:03,900 --> 00:53:07,120
So I'd say that
this basic statement

883
00:53:07,120 --> 00:53:10,660
of it being not-- the statement
of cost being super linear, I

884
00:53:10,660 --> 00:53:15,130
think, ends up not being true.

885
00:53:15,130 --> 00:53:18,280
Now, what does it
mean for this paper?

886
00:53:29,104 --> 00:53:32,056
AUDIENCE: I mean, they still
presented with same hypothesis

887
00:53:32,056 --> 00:53:35,132
and had these data to
back up some of it.

888
00:53:35,132 --> 00:53:36,090
PROFESSOR: Yeah, right.

889
00:53:36,090 --> 00:53:39,770
So it's a very
interesting hypothesis.

890
00:53:39,770 --> 00:53:41,480
They did nice
evolution experiments,

891
00:53:41,480 --> 00:53:45,430
where they saw the population
adapt to different levels.

892
00:53:45,430 --> 00:53:48,930
But what does it mean
about the predictions,

893
00:53:48,930 --> 00:53:50,950
in particular, in the
sense that if you measure

894
00:53:50,950 --> 00:53:52,590
cost and benefits, then
you want to predict

895
00:53:52,590 --> 00:53:53,715
where it's going to evolve.

896
00:53:57,350 --> 00:53:58,170
What happens?

897
00:53:58,170 --> 00:54:01,510
If it's the case that cost
as a function of expression

898
00:54:01,510 --> 00:54:02,930
is actually linear,
then what does

899
00:54:02,930 --> 00:54:05,750
that mean for their ability to
predict what's going to happen?

900
00:54:23,462 --> 00:54:26,232
AUDIENCE: Seems like if
they use their same model

901
00:54:26,232 --> 00:54:29,148
for the benefit and
this linear cost,

902
00:54:29,148 --> 00:54:32,042
that their predictions would
be really off [INAUDIBLE].

903
00:54:32,042 --> 00:54:33,000
PROFESSOR: Yeah, right.

904
00:54:33,000 --> 00:54:36,660
So the problem is that if you
actually use a linear function

905
00:54:36,660 --> 00:54:40,170
here, then their
model doesn't even

906
00:54:40,170 --> 00:54:44,635
predict that there should be an
optimum, because their benefit

907
00:54:44,635 --> 00:54:46,260
function ends up also
being essentially

908
00:54:46,260 --> 00:54:49,390
linear with the amount of
this protein expressed.

909
00:54:49,390 --> 00:54:56,460
So if you have two lines-- so
overall growth is something

910
00:54:56,460 --> 00:54:59,320
like goes as
benefits minus costs.

911
00:55:02,490 --> 00:55:07,340
And maybe this is
a relative growth.

912
00:55:07,340 --> 00:55:15,340
So if you have a line here
and a line here, no optimum.

913
00:55:15,340 --> 00:55:18,480
So that's kind of a bummer.

914
00:55:18,480 --> 00:55:24,560
But it doesn't mean that
that's-- in biology,

915
00:55:24,560 --> 00:55:26,050
eventually things
are non-linear,

916
00:55:26,050 --> 00:55:27,590
so there should be some optimum.

917
00:55:27,590 --> 00:55:28,965
And actually, what
I would say is

918
00:55:28,965 --> 00:55:31,330
that I think that the
non-linearity is probably

919
00:55:31,330 --> 00:55:32,830
actually here.

920
00:55:32,830 --> 00:55:34,790
That is, the non-linearity
that's relevant,

921
00:55:34,790 --> 00:55:39,150
maybe, is dominated on the
benefit side rather than

922
00:55:39,150 --> 00:55:41,572
the cost side.

923
00:55:41,572 --> 00:55:43,030
My guess as to
what's going on here

924
00:55:43,030 --> 00:55:46,590
is that rather than the
costs growing super linearly

925
00:55:46,590 --> 00:55:48,510
with the expression
level, rather

926
00:55:48,510 --> 00:55:52,280
the benefits will be sub-linear
with the expression level.

927
00:55:55,840 --> 00:55:58,040
And why might that be?

928
00:56:06,626 --> 00:56:08,740
AUDIENCE: We're just
seeing them apart,

929
00:56:08,740 --> 00:56:11,480
splitting up more lactose
that's useful, just so it

930
00:56:11,480 --> 00:56:12,824
can't metabolize more of it.

931
00:56:12,824 --> 00:56:14,640
PROFESSOR: Right, you
know, at some point,

932
00:56:14,640 --> 00:56:18,310
it's just that the cell
doesn't need more sugar.

933
00:56:18,310 --> 00:56:20,140
And then it's not
going to be as useful.

934
00:56:20,140 --> 00:56:22,640
And even before you
get to that regime,

935
00:56:22,640 --> 00:56:25,700
I think there are various
ways in which cells

936
00:56:25,700 --> 00:56:28,990
may be able to use
the sugar more or less

937
00:56:28,990 --> 00:56:31,490
efficiently, depending on how
much they have it, which means

938
00:56:31,490 --> 00:56:35,360
that as-- and this is just
like for us, the first slice

939
00:56:35,360 --> 00:56:36,517
of pizza is great.

940
00:56:36,517 --> 00:56:38,100
But then once you're
at the fifth one,

941
00:56:38,100 --> 00:56:40,580
you start to feel
a little bit full.

942
00:56:40,580 --> 00:56:47,020
So in general, benefits as
a function of anything

943
00:56:47,020 --> 00:56:48,880
should have some
saturating behavior.

944
00:56:53,560 --> 00:56:55,630
And my sense is that
this is basically

945
00:56:55,630 --> 00:56:58,510
why there's an optimum here.
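
A minimal numerical sketch of the argument above, not taken from the paper: net growth is modeled as benefit minus cost, as a function of expression level z (with z = 1 meaning full wild-type induction). The 4.5% cost slope echoes the figure quoted in the lecture. The benefit parameters (0.17 and 0.4) and both functional forms are illustrative assumptions.

```python
import math

# Hypothetical parameters, chosen only for illustration.
COST_SLOPE = 0.045    # relative growth cost per unit expression (lecture: ~4.5%)
BENEFIT_MAX = 0.17    # assumed ceiling (or slope) of the benefit
HALF_SAT = 0.4        # assumed expression level giving half the maximal benefit


def net_growth_linear(z):
    """Linear benefit minus linear cost: a straight line in z."""
    return BENEFIT_MAX * z - COST_SLOPE * z


def net_growth_saturating(z):
    """Saturating (sub-linear) benefit minus linear cost."""
    return BENEFIT_MAX * z / (HALF_SAT + z) - COST_SLOPE * z


# Scan expression levels from 0 to 3 in steps of 0.01.
grid = [i / 100 for i in range(0, 301)]
best_linear = max(grid, key=net_growth_linear)
best_saturating = max(grid, key=net_growth_saturating)

# For the saturating case, setting the derivative to zero gives
# z* = sqrt(BENEFIT_MAX * HALF_SAT / COST_SLOPE) - HALF_SAT.
analytic_optimum = math.sqrt(BENEFIT_MAX * HALF_SAT / COST_SLOPE) - HALF_SAT

print("two lines: best z sits at the edge of the scan:", best_linear)
print("saturating benefit: interior optimum near", best_saturating)
```

With two lines, the best value is always at one end of whatever range is scanned, so there is no interior optimum. With a saturating benefit, the optimum lands inside the range even though the cost stays linear.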

946
00:56:58,510 --> 00:57:04,640
Now, of course, I'd say that
all these cost functions

947
00:57:04,640 --> 00:57:06,520
behave very similarly in here.

948
00:57:06,520 --> 00:57:08,500
So the predictions
that they make in here

949
00:57:08,500 --> 00:57:11,020
are really not very sensitive
to which of the cost functions

950
00:57:11,020 --> 00:57:11,520
they use.

951
00:57:11,520 --> 00:57:15,710
And those are all still
then relevant and valid.

952
00:57:15,710 --> 00:57:17,640
It's just that
trying to predict

953
00:57:17,640 --> 00:57:19,690
what happens beyond the
range that you have data

954
00:57:19,690 --> 00:57:21,570
is very hard, because
it depends very

955
00:57:21,570 --> 00:57:24,060
much on what your curve
does past that region.

956
00:57:31,160 --> 00:57:33,560
So I guess I've made
an argument that I

957
00:57:33,560 --> 00:57:35,060
think that maybe
what's happening

958
00:57:35,060 --> 00:57:37,059
is that the benefit
function here is non-linear.

959
00:57:37,059 --> 00:57:39,540
But what did they actually
do to measure the benefits,

960
00:57:39,540 --> 00:57:42,920
because this is not I think
totally obvious either?

961
00:58:06,510 --> 00:58:08,649
So what should I be plotting?

962
00:58:08,649 --> 00:58:10,440
Well, this is still a
relative growth rate.

963
00:58:15,210 --> 00:58:19,530
And here, this was actually
lactose concentrations.

964
00:58:19,530 --> 00:58:21,964
So this is not lac
expression, which

965
00:58:21,964 --> 00:58:24,130
is the most obvious thing
that you would want to do,

966
00:58:24,130 --> 00:58:26,517
but that's harder.

967
00:58:26,517 --> 00:58:28,100
And what they show
is that their model

968
00:58:28,100 --> 00:58:30,690
is sort of consistent
on this axis.

969
00:58:30,690 --> 00:58:31,910
This is external lactose.

970
00:58:36,360 --> 00:58:41,870
And the idea is
that-- here is 0--

971
00:58:41,870 --> 00:58:45,860
in the absence of any lactose,
if you induce the lac operon,

972
00:58:45,860 --> 00:58:50,400
then you're at this minus
4 and 1/2% or whatnot.

973
00:58:50,400 --> 00:58:54,650
So it kind of starts
out down here.

974
00:58:54,650 --> 00:58:58,300
And then up here, it
comes out up to above 0.1.

975
00:58:58,300 --> 00:59:01,620
So this is the
first 4, maybe 4.5%.

976
00:59:01,620 --> 00:59:03,970
This is up here at 10%.

977
00:59:03,970 --> 00:59:09,730
And you end up with
a curve that kind of

978
00:59:09,730 --> 00:59:15,760
goes from 4% or 5% deficit
up to 10% or 11% advantage.

979
00:59:15,760 --> 00:59:18,820
And this is at full
induction of the lac operon.

980
00:59:22,870 --> 00:59:26,860
What this is saying
is that if you're

981
00:59:26,860 --> 00:59:31,890
making the proteins to break
down and consume lactose,

982
00:59:31,890 --> 00:59:33,880
then there's a cost.

983
00:59:33,880 --> 00:59:35,500
That's just how they plotted it.

984
00:59:35,500 --> 00:59:38,620
But that the benefits do
indeed outweigh the costs

985
00:59:38,620 --> 00:59:40,180
at some concentration
of lactose.

986
00:59:44,230 --> 00:59:48,160
But then here,
there's a saturation.

987
00:59:48,160 --> 00:59:50,380
And here, the saturation
in their model--

988
00:59:50,380 --> 00:59:53,430
they get a saturation
just because

989
00:59:53,430 --> 00:59:56,672
of the dynamics of import.

990
00:59:56,672 --> 00:59:58,130
So what they assume
is that there's

991
00:59:58,130 --> 01:00:01,990
a Michaelis-Menten
kinetics for import.

992
01:00:01,990 --> 01:00:10,790
So the import rate kind of
goes as the concentration

993
01:00:10,790 --> 01:00:16,390
of the lactose divided by some
k plus the concentration again

994
01:00:16,390 --> 01:00:21,615
of lactose, so
Michaelis-Menten dynamics.

995
01:00:25,140 --> 01:00:27,932
But of course, if you have
more of the protein lacY,

996
01:00:27,932 --> 01:00:29,390
then you'll be able
to import more.

997
01:00:32,452 --> 01:00:33,910
So just because
you have saturation

998
01:00:33,910 --> 01:00:36,430
as a function of
lactose does not

999
01:00:36,430 --> 01:00:38,340
mean that you'll have
saturation in terms

1000
01:00:38,340 --> 01:00:43,027
of the number of proteins
that you're making.
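
A small sketch of the point just made, assuming the simplest Michaelis-Menten form in which the import rate is proportional to the amount of LacY times L / (K + L). The function name, the value of K, and the units are hypothetical and only illustrate the shape of the dependence.

```python
def import_rate(lactose, lacy, k_m=0.4, rate_per_transporter=1.0):
    """Michaelis-Menten import: saturates in lactose, linear in LacY.

    k_m and rate_per_transporter are assumed values in arbitrary units.
    """
    return rate_per_transporter * lacy * lactose / (k_m + lactose)


base = import_rate(lactose=10.0, lacy=1.0)

# Doubling external lactose when it is already well above k_m barely helps.
more_lactose = import_rate(lactose=20.0, lacy=1.0)

# Doubling the amount of transporter doubles the import rate.
more_lacy = import_rate(lactose=10.0, lacy=2.0)

print("gain from doubling lactose:", more_lactose / base)
print("gain from doubling LacY:", more_lacy / base)
```

In this form, a curve that flattens as a function of external lactose says nothing by itself about whether the benefit flattens as a function of protein level.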

1001
01:00:43,027 --> 01:00:44,610
Do you understand
why I'm saying that?

1002
01:00:47,890 --> 01:00:51,657
And indeed, I would say
that many underlying models

1003
01:00:51,657 --> 01:00:53,740
could have been consistent
with this data as well.

1004
01:00:53,740 --> 01:00:57,540
So I'd say that
their data does not

1005
01:00:57,540 --> 01:01:02,260
reject the hypothesis that the
benefit function is sublinear.

1006
01:01:06,044 --> 01:01:06,544
Yeah.

1007
01:01:06,544 --> 01:01:08,764
AUDIENCE: So that
you just said if you

1008
01:01:08,764 --> 01:01:12,350
have more lacY and
import more and it would

1009
01:01:12,350 --> 01:01:13,700
saturate to an [INAUDIBLE].

1010
01:01:13,700 --> 01:01:18,321
So you could imagine that by
evolution something happened

1011
01:01:18,321 --> 01:01:18,820
there.

1012
01:01:18,820 --> 01:01:22,250
So why would you even
expect the prediction

1013
01:01:22,250 --> 01:01:24,130
of this cost-benefit analysis?

1014
01:01:27,210 --> 01:01:28,300
You see what I mean?

1015
01:01:28,300 --> 01:01:31,760
PROFESSOR: OK, so you're
saying that evolution

1016
01:01:31,760 --> 01:01:33,710
might be able to change
other things as well

1017
01:01:33,710 --> 01:01:35,020
to kind of fiddle-- yeah.

1018
01:01:35,020 --> 01:01:36,787
I think this is an
important question.

1019
01:01:36,787 --> 01:01:38,370
I think the basic
answer is that there

1020
01:01:38,370 --> 01:01:40,495
are some things that are
easier for evolution to do

1021
01:01:40,495 --> 01:01:41,850
than others.

1022
01:01:41,850 --> 01:01:46,240
And also that somethings have
maybe already been optimized.

1023
01:01:46,240 --> 01:01:48,820
Now, relevant to
this point, so they

1024
01:01:48,820 --> 01:01:51,100
did these laboratory
evolution experiments,

1025
01:01:51,100 --> 01:01:57,161
and there was one category of
mutation that they did not see.

1026
01:01:57,161 --> 01:01:58,660
Does anybody remember
what that was?

1027
01:02:09,009 --> 01:02:10,800
What's the most
straightforward way of kind

1028
01:02:10,800 --> 01:02:14,080
of getting around all this
cost-benefit discussion

1029
01:02:14,080 --> 01:02:15,160
that we've just had?

1030
01:02:37,137 --> 01:02:38,720
So the one thing
that they did not see

1031
01:02:38,720 --> 01:02:43,100
was significant
improvements in the enzyme.

1032
01:02:43,100 --> 01:02:46,190
So they checked, and they
found that they did not

1033
01:02:46,190 --> 01:02:49,550
see any increase in the
lacZ activity normalized

1034
01:02:49,550 --> 01:02:53,400
by the amount of the
lacZ that was being made.

1035
01:02:53,400 --> 01:02:56,610
Now, that might make sense,
because if this enzyme has

1036
01:02:56,610 --> 01:03:00,510
already gone
through millions of years

1037
01:03:00,510 --> 01:03:03,820
of optimization to
break down lactose,

1038
01:03:03,820 --> 01:03:06,430
then it's reasonable to say,
oh, well, in the next 500

1039
01:03:06,430 --> 01:03:10,484
generations in the lab,
it maybe won't improve.

1040
01:03:10,484 --> 01:03:12,650
Of course, you always have
to be careful about this,

1041
01:03:12,650 --> 01:03:17,470
because it could be that some
sequence slash structure is

1042
01:03:17,470 --> 01:03:20,269
best when you're thinking
about-- when is it

1043
01:03:20,269 --> 01:03:21,560
that E. coli might see lactose?

1044
01:03:26,400 --> 01:03:28,350
Our gut.

1045
01:03:28,350 --> 01:03:30,810
So you imagine you have
bacteria in the gut.

1046
01:03:30,810 --> 01:03:33,520
That's a different
environment than in the lab.

1047
01:03:33,520 --> 01:03:36,650
So it could be very
well that the enzyme,

1048
01:03:36,650 --> 01:03:39,120
because of the pH and
all these other things,

1049
01:03:39,120 --> 01:03:42,120
the enzyme actually
could adapt to the lab,

1050
01:03:42,120 --> 01:03:44,871
even though it may have already
been adapted to our gut.

1051
01:03:44,871 --> 01:03:47,370
So you have got to be careful
about this kind of argument

1052
01:03:47,370 --> 01:03:49,680
always.

1053
01:03:49,680 --> 01:03:51,970
But of course, once you see
the result, then you say,

1054
01:03:51,970 --> 01:03:53,345
oh, well, that's
because of this.

1055
01:03:56,086 --> 01:03:57,960
So I just want to make
sure that we know what

1056
01:03:57,960 --> 01:03:59,126
these experiments look like.

1057
01:03:59,126 --> 01:04:01,115
So they went for
500 generations.

1058
01:04:07,270 --> 01:04:11,180
So it's useful to ask how
long this experiment should

1059
01:04:11,180 --> 01:04:11,680
have taken.

1060
01:04:21,110 --> 01:04:35,695
Is it closest to three
days, three weeks?

1061
01:04:51,030 --> 01:04:53,940
Anytime you read
about an experiment,

1062
01:04:53,940 --> 01:04:57,174
it's useful just to have some
notion of what the authors went

1063
01:04:57,174 --> 01:04:59,174
through in order to bring
you the results you're

1064
01:04:59,174 --> 01:04:59,876
reading about.

1065
01:05:05,500 --> 01:05:10,420
If you are not sure, you
can just make a guess.

1066
01:05:10,420 --> 01:05:15,210
OK, ready, 3, 2, 1.

1067
01:05:15,210 --> 01:05:18,040
All right, so we have some
number of A's, some number

1068
01:05:18,040 --> 01:05:21,150
of B's, and a couple of C's.

1069
01:05:21,150 --> 01:05:24,390
Well, one thing
you might say is,

1070
01:05:24,390 --> 01:05:27,690
how fast can E. coli divide?

1071
01:05:27,690 --> 01:05:30,015
OK, on one level, you may
say oh, about 20 minutes.

1072
01:05:32,930 --> 01:05:35,950
That should give us what?

1073
01:05:35,950 --> 01:05:38,570
75-ish generations a day.

1074
01:05:38,570 --> 01:05:46,940
So we should be able to get here
in a week or something, maybe.

1075
01:05:46,940 --> 01:05:51,540
But that's not what they
did, for several reasons.

1076
01:05:51,540 --> 01:05:54,790
First of all, this
would be in rich media.

1077
01:05:54,790 --> 01:05:57,480
In the environment that
they are doing this in,

1078
01:05:57,480 --> 01:05:58,800
it's a bit slower.

1079
01:05:58,800 --> 01:06:04,420
But that would get you maybe
to the two or three-week mark.

1080
01:06:04,420 --> 01:06:06,162
But that still is
not what happened.

1081
01:06:06,162 --> 01:06:07,870
They actually had to
go for three months.

1082
01:06:10,780 --> 01:06:14,130
And this is because
experiments are not always

1083
01:06:14,130 --> 01:06:16,070
keeping cells
constantly dividing

1084
01:06:16,070 --> 01:06:18,720
at their maximal rates.

1085
01:06:18,720 --> 01:06:20,340
The standard way
that we do this is

1086
01:06:20,340 --> 01:06:22,570
what's known as kind
of daily batch culture.

1087
01:06:25,250 --> 01:06:28,610
And does anybody know how
much they diluted by each day?

1088
01:06:34,330 --> 01:06:38,950
Yeah, so I think it was
diluting by a factor of 100.

1089
01:06:38,950 --> 01:06:47,810
So it's daily batch
culture with 100x dilution,

1090
01:06:47,810 --> 01:06:52,130
which corresponds to about
6.6 generations per day.

1091
01:06:56,250 --> 01:06:59,085
So this is very
far from what you

1092
01:06:59,085 --> 01:07:01,740
would think of as kind of the
best they could possibly do.

1093
01:07:06,862 --> 01:07:08,320
And what it means
is that, yeah, it

1094
01:07:08,320 --> 01:07:09,920
does take about
three months for them

1095
01:07:09,920 --> 01:07:13,010
to have done this experiment.

1096
01:07:13,010 --> 01:07:19,900
It also means that if you look
at the number over the course

1097
01:07:19,900 --> 01:07:27,750
of each day, this is n max.

1098
01:07:27,750 --> 01:07:31,600
And they dilute-- this
is n max over 100--

1099
01:07:31,600 --> 01:07:36,170
so they dilute by
a factor of 100.

1100
01:07:36,170 --> 01:07:40,066
When you transfer cells
from a saturated state

1101
01:07:40,066 --> 01:07:42,440
into a new environment, do they
start dividing immediately,

1102
01:07:42,440 --> 01:07:44,410
for those of you who have
done this experiment?

1103
01:07:44,410 --> 01:07:45,130
No.

1104
01:07:45,130 --> 01:07:50,362
It's going to take an hour
or two for them to get going.

1105
01:07:50,362 --> 01:07:52,070
But then they're going
to start dividing.

1106
01:07:52,070 --> 01:08:00,135
And this is on a log scale maybe--
log N. And what you'll see

1107
01:08:00,135 --> 01:08:02,260
is they kind of go-- they're
dividing exponentially

1108
01:08:02,260 --> 01:08:04,190
and then they saturate.

1109
01:08:04,190 --> 01:08:06,850
Indeed, they're going
to be saturated for

1110
01:08:06,850 --> 01:08:07,860
a fair amount of time.

1111
01:08:07,860 --> 01:08:10,560
So this might be an hour or two.

1112
01:08:10,560 --> 01:08:14,710
This might be say five hours.

1113
01:08:14,710 --> 01:08:16,950
But then you still have
another roughly 20 hours

1114
01:08:16,950 --> 01:08:20,270
to go before the next dilution.

1115
01:08:23,279 --> 01:08:25,790
And then we repeat.

1116
01:08:25,790 --> 01:08:28,443
So they're actually saturated for
a fair fraction of the day.

1117
01:08:32,700 --> 01:08:35,879
Now, in all these discussions
of laboratory evolution--

1118
01:08:35,879 --> 01:08:37,420
and in many of the
calculations we're

1119
01:08:37,420 --> 01:08:41,350
going to be doing over
the next couple of weeks--

1120
01:08:41,350 --> 01:08:44,180
we'll typically assume that
what is being optimized

1121
01:08:44,180 --> 01:08:47,339
is the growth rate,
the rate of division.

1122
01:08:47,339 --> 01:08:49,380
But you can imagine there
being other things that

1123
01:08:49,380 --> 01:08:51,621
might possibly be
optimized in the course

1124
01:08:51,621 --> 01:08:52,870
of these sorts of experiments.

1125
01:08:52,870 --> 01:08:55,910
Can somebody volunteer
what are other things?

1126
01:09:00,338 --> 01:09:02,306
AUDIENCE: Maximum density?

1127
01:09:02,306 --> 01:09:04,569
PROFESSOR: Right, so
you could imagine,

1128
01:09:04,569 --> 01:09:08,279
if you could just eke out
one more division out there,

1129
01:09:08,279 --> 01:09:10,529
then you could get an advantage.

1130
01:09:10,529 --> 01:09:12,680
And there's a whole set
of interesting things,

1131
01:09:12,680 --> 01:09:13,729
these growth advantage.

1132
01:09:13,729 --> 01:09:16,540
It's stationary phase
or the GASP mutants,

1133
01:09:16,540 --> 01:09:20,040
where the focus is on
trying to do well here.

1134
01:09:20,040 --> 01:09:23,660
And also you can
imagine related maybe,

1135
01:09:23,660 --> 01:09:27,090
if you do better
out for this period,

1136
01:09:27,090 --> 01:09:29,310
cells will start
dying eventually.

1137
01:09:29,310 --> 01:09:31,434
So if you have a lower rate
of death at saturation,

1138
01:09:31,434 --> 01:09:34,930
then you can also spread.

1139
01:09:34,930 --> 01:09:36,136
Other-- yep.

1140
01:09:36,136 --> 01:09:38,571
AUDIENCE: Sorry, can I
ask a quick question?

1141
01:09:38,571 --> 01:09:43,207
What's a possible reason for
the initial [INAUDIBLE] used

1142
01:09:43,207 --> 01:09:44,415
[INAUDIBLE] at the beginning?

1143
01:09:44,415 --> 01:09:45,899
PROFESSOR: Yeah, right.

1144
01:09:45,899 --> 01:09:50,870
So I think it's basically that
when the cells are saturated,

1145
01:09:50,870 --> 01:09:55,570
they generally enter a rather
distinct physiological state,

1146
01:09:55,570 --> 01:09:58,070
as compared to the
dividing state.

1147
01:09:58,070 --> 01:10:01,340
And I think the longer they
sit in this saturated phase,

1148
01:10:01,340 --> 01:10:03,580
the longer it's going to
take them to get going

1149
01:10:03,580 --> 01:10:06,620
in the next day, for example.

1150
01:10:06,620 --> 01:10:10,260
And it's also the case that
cells in saturated culture

1151
01:10:10,260 --> 01:10:12,840
tend to be more resistant to
a variety of perturbations

1152
01:10:12,840 --> 01:10:17,340
of various sorts-- so if you're
talking about heat, salt, this,

1153
01:10:17,340 --> 01:10:19,550
that, and the other thing.

1154
01:10:19,550 --> 01:10:22,210
What's something else that
could be optimized here?

1155
01:10:29,906 --> 01:10:31,990
If you were imagining
you're a cell,

1156
01:10:31,990 --> 01:10:35,004
you want to spread,
what would you do?

1157
01:10:39,620 --> 01:10:41,935
AUDIENCE: [INAUDIBLE].

1158
01:10:41,935 --> 01:10:42,810
PROFESSOR: OK, right.

1159
01:10:42,810 --> 01:10:45,280
So we're saying that
the media is specified

1160
01:10:45,280 --> 01:10:46,580
by the experimentalist.

1161
01:10:46,580 --> 01:10:53,510
So you're the cell in
this Gedanken experiment.

1162
01:10:53,510 --> 01:10:54,919
AUDIENCE: You'd divide yourself.

1163
01:10:54,919 --> 01:10:57,210
PROFESSOR: Right, so you can
eat the other cells, yeah.

1164
01:10:59,990 --> 01:11:01,990
Well, and in particular,
actually out here, this

1165
01:11:01,990 --> 01:11:03,970
is part of how the
GASP mutants spread,

1166
01:11:03,970 --> 01:11:06,230
is that when other
cells start to die,

1167
01:11:06,230 --> 01:11:07,700
they lyse and release their contents.

1168
01:11:07,700 --> 01:11:10,740
And then the cells
that are surviving

1169
01:11:10,740 --> 01:11:12,760
can actually eat
the contents, yeah.

1170
01:11:12,760 --> 01:11:16,204
AUDIENCE: Is this
a way to coordinate

1171
01:11:16,204 --> 01:11:21,124
between different cells, so
that they can sort of evenly

1172
01:11:21,124 --> 01:11:22,900
distribute themselves
in the media,

1173
01:11:22,900 --> 01:11:24,835
so you don't have too many--

1174
01:11:24,835 --> 01:11:25,710
PROFESSOR: OK, right.

1175
01:11:25,710 --> 01:11:28,260
So I'm actually assuming
here it's well mixed,

1176
01:11:28,260 --> 01:11:31,950
so that in principle
would not be an issue.

1177
01:11:31,950 --> 01:11:34,490
But yeah, so you can
imagine spatial effects

1178
01:11:34,490 --> 01:11:36,520
of various sorts being relevant.

1179
01:11:36,520 --> 01:11:39,470
I guess I just drew this
up here to highlight

1180
01:11:39,470 --> 01:11:43,690
that, in principle, you can
also decrease the lag time.

1181
01:11:46,450 --> 01:11:50,680
So if you start
dividing more rapidly

1182
01:11:50,680 --> 01:11:53,762
at the beginning of
the day, then you'll

1183
01:11:53,762 --> 01:11:55,220
get to spread before
your neighbors

1184
01:11:55,220 --> 01:11:57,646
and your genotypes
will indeed spread.
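
A rough sketch of how a shorter lag time translates into spread under daily batch culture. It assumes two genotypes with the same doubling time that both stop growing at the same moment, when the shared medium runs out. The lag difference, doubling time, and target fold-change below are hypothetical numbers, not values from the paper.

```python
import math


def daily_ratio_gain(lag_difference_h, doubling_time_h):
    """Factor by which the (short-lag : long-lag) ratio grows each day.

    Both genotypes grow exponentially with the same doubling time and stop
    together, so the head start is worth 2 ** (lag difference / doubling time).
    """
    return 2 ** (lag_difference_h / doubling_time_h)


# Assumed values: a 15-minute shorter lag and a 45-minute doubling time.
gain = daily_ratio_gain(lag_difference_h=0.25, doubling_time_h=0.75)

# Days of daily transfers for the ratio to rise a million-fold.
days_for_million_fold = math.log(1e6) / math.log(gain)

print("ratio gain per day:", gain)
print("days for a 10^6-fold change in ratio:", days_for_million_fold)
```

Under these assumptions, a head start of a fraction of one doubling time compounds every day, so a modest lag advantage can take over within the span of a months-long experiment.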

1185
01:11:57,646 --> 01:11:58,440
Yep.

1186
01:11:58,440 --> 01:12:02,540
AUDIENCE: So I just know
so little about cells,

1187
01:12:02,540 --> 01:12:07,832
but is it true that
a lot of the cells

1188
01:12:07,832 --> 01:12:12,470
could survive and be the same
cell for that whole duration

1189
01:12:12,470 --> 01:12:14,852
when they were in
the stationary phase?

1190
01:12:14,852 --> 01:12:16,600
PROFESSOR: You're
asking whether the--

1191
01:12:16,600 --> 01:12:18,065
AUDIENCE: Yeah,
whether a cell that

1192
01:12:18,065 --> 01:12:19,981
entered the beginning
of the stationary phase,

1193
01:12:19,981 --> 01:12:22,400
that same cell would have
a pretty good chance of--

1194
01:12:22,400 --> 01:12:26,300
PROFESSOR: Yeah, over I think
this sort of 12-hour type

1195
01:12:26,300 --> 01:12:28,590
period, I think
the answer is yes.

1196
01:12:28,590 --> 01:12:31,870
But if you go for
an extra day or two,

1197
01:12:31,870 --> 01:12:34,531
then I think you can start
getting extensive cell death.

1198
01:12:34,531 --> 01:12:36,252
AUDIENCE: Because
then who knows?

1199
01:12:36,252 --> 01:12:38,752
Maybe long enough, though, they
would develop a little clock

1200
01:12:38,752 --> 01:12:40,450
to let them know that
it was about to split.

1201
01:12:40,450 --> 01:12:41,366
PROFESSOR: Yes, right.

1202
01:12:41,366 --> 01:12:42,932
Yeah, so people
have thought about--

1203
01:12:42,932 --> 01:12:44,015
and I'm not sure if this--

1204
01:12:44,015 --> 01:12:45,556
AUDIENCE: And it
seems like it would.

1205
01:12:45,556 --> 01:12:47,589
PROFESSOR: --particular
effect is-- yeah, right.

1206
01:12:47,589 --> 01:12:49,630
But I just want to mention
that this is something

1207
01:12:49,630 --> 01:12:51,463
that you kind of maybe
would expect, indeed,

1208
01:12:51,463 --> 01:12:54,640
you see in these-- so there's a
famous set of experiments done

1209
01:12:54,640 --> 01:13:01,920
by Richard Lenski at Michigan
State, where he's been dividing

1210
01:13:01,920 --> 01:13:07,280
six or eight-- doing daily
batch dilutions of E. coli

1211
01:13:07,280 --> 01:13:09,940
cultures now for decades.

1212
01:13:09,940 --> 01:13:13,980
So he started, I don't
know, late '80s or so.

1213
01:13:13,980 --> 01:13:15,710
I don't know if
you guys remember.

1214
01:13:15,710 --> 01:13:18,340
So he's gone tens of
thousands of generations

1215
01:13:18,340 --> 01:13:21,480
and has seen a bunch
of remarkable things.

1216
01:13:21,480 --> 01:13:23,012
One of the things
that he has seen,

1217
01:13:23,012 --> 01:13:24,720
as you might have
expected, is a decrease

1218
01:13:24,720 --> 01:13:27,275
in the lag time of
the bacteria.

1219
01:13:34,050 --> 01:13:40,500
So what we have now is a
situation where they add IPTG,

1220
01:13:40,500 --> 01:13:44,200
so that all the cells,
in principle,

1221
01:13:44,200 --> 01:13:46,660
start out expressing
the lac operon.

1222
01:13:46,660 --> 01:13:50,190
And then they grow
the cells over time.

1223
01:13:50,190 --> 01:13:58,420
And what they see is that the
lacZ activity, it starts out

1224
01:13:58,420 --> 01:14:03,660
at being 1, normalized,
for all the cultures,

1225
01:14:03,660 --> 01:14:05,920
because there's IPTG,
so it doesn't matter

1226
01:14:05,920 --> 01:14:07,270
how much lactose there is.

1227
01:14:07,270 --> 01:14:13,871
But what they see
is that over time,

1228
01:14:13,871 --> 01:14:15,370
they see things
that look like this.

1229
01:14:19,100 --> 01:14:25,740
So the 0.5 millimolar lactose
didn't change very much.

1230
01:14:25,740 --> 01:14:32,580
But if you look at some of
the others, like no lactose,

1231
01:14:32,580 --> 01:14:35,380
there was significant
decrease in expression.

1232
01:14:35,380 --> 01:14:38,945
Whereas, up here at, for
example, 2 millimolar lactose,

1233
01:14:38,945 --> 01:14:39,820
they see an increase.

1234
01:14:43,610 --> 01:14:46,290
So what you see is
that there really

1235
01:14:46,290 --> 01:14:53,150
are evolutionary changes
of these strains, because--

1236
01:14:53,150 --> 01:14:57,000
and it's very, very
relevant that they

1237
01:14:57,000 --> 01:14:59,150
had IPTG in the media.

1238
01:14:59,150 --> 01:15:01,305
So if they did this
experiment without IPTG,

1239
01:15:01,305 --> 01:15:05,770
do you have any sense of
what would kind of happen

1240
01:15:05,770 --> 01:15:06,350
to the cells?

1241
01:15:06,350 --> 01:15:09,894
I mean, how would that
change the results?

1242
01:15:09,894 --> 01:15:10,394
Yep.

1243
01:15:10,394 --> 01:15:11,964
AUDIENCE: The
expression level would

1244
01:15:11,964 --> 01:15:13,691
be determined by [INAUDIBLE].

1245
01:15:13,691 --> 01:15:15,830
PROFESSOR: Right, so
the expression would

1246
01:15:15,830 --> 01:15:17,660
be determined by the lactose.

1247
01:15:17,660 --> 01:15:21,020
But let's say that
after 500 generations,

1248
01:15:21,020 --> 01:15:23,412
we put them all in a
millimolar lactose.

1249
01:15:23,412 --> 01:15:25,370
How different do you
think they're going to be?

1250
01:15:34,240 --> 01:15:37,180
I mean, do you think that the
culture grown, for example,

1251
01:15:37,180 --> 01:15:38,950
in the absence of
lactose, do you

1252
01:15:38,950 --> 01:15:42,870
think that it would still be
able to eat lactose after 500

1253
01:15:42,870 --> 01:15:44,775
generations in that experiment?

1254
01:15:51,560 --> 01:15:52,674
Hm?

1255
01:15:52,674 --> 01:15:53,560
AUDIENCE: Yes.

1256
01:15:53,560 --> 01:15:54,410
PROFESSOR: Yes, OK.

1257
01:15:54,410 --> 01:15:55,868
And yeah, so what's
the difference?

1258
01:15:55,868 --> 01:15:58,472
I mean, why are you
saying yes or what's the--

1259
01:15:58,472 --> 01:15:59,930
AUDIENCE: [INAUDIBLE].

1260
01:15:59,930 --> 01:16:01,390
I don't know how hard it is--

1261
01:16:01,390 --> 01:16:02,832
PROFESSOR: Well, yeah, right.

1262
01:16:02,832 --> 01:16:04,290
So this is an
experiment with IPTG.

1263
01:16:08,580 --> 01:16:10,990
And now, I'm just trying
to think about or imagine

1264
01:16:10,990 --> 01:16:13,580
what would have happened if they
had done the same experiment

1265
01:16:13,580 --> 01:16:16,950
without IPTG just growing
in that environment,

1266
01:16:16,950 --> 01:16:24,912
in particular, if you grow minus
IPTG and then minus lactose

1267
01:16:24,912 --> 01:16:25,745
for 500 generations?

1268
01:16:29,902 --> 01:16:31,360
And then what I
want to ask is, OK,

1269
01:16:31,360 --> 01:16:33,776
let's say that you go over
there and you just add lactose.

1270
01:16:35,597 --> 01:16:38,180
Will the cells, do you think,
be able to grow on the lactose?

1271
01:16:41,200 --> 01:16:41,700
OK.

1272
01:16:41,700 --> 01:16:47,100
And so why is it that here
the answer seems to be no?

1273
01:16:57,880 --> 01:16:59,710
So here, we have
evolved a population

1274
01:16:59,710 --> 01:17:07,060
that not only is not
expressing the lacZ

1275
01:17:07,060 --> 01:17:07,700
activity here.

1276
01:17:07,700 --> 01:17:11,970
But indeed, if you put lactose
in there, it doesn't express.

1277
01:17:11,970 --> 01:17:16,280
So these cells can no
longer grow on lactose.

1278
01:17:16,280 --> 01:17:18,544
So what's the key
difference here?

1279
01:17:18,544 --> 01:17:19,044
Yep.

1280
01:17:19,044 --> 01:17:21,996
AUDIENCE: So I mean, there's
no [INAUDIBLE] in this case.

1281
01:17:21,996 --> 01:17:22,980
PROFESSOR: Right.

1282
01:17:22,980 --> 01:17:25,350
Now I think this is
just really important.

1283
01:17:25,350 --> 01:17:28,630
So in this case, there is
approximately, we'll say,

1284
01:17:28,630 --> 01:17:35,290
no cost to having the lac operon
on there, because it's just not

1285
01:17:35,290 --> 01:17:36,542
being expressed.

1286
01:17:36,542 --> 01:17:38,000
So then the only
cost is associated

1287
01:17:38,000 --> 01:17:39,780
with DNA replication.

1288
01:17:39,780 --> 01:17:44,410
So the advantage associated
with shutting off or removing

1289
01:17:44,410 --> 01:17:46,660
the ability to grow on lactose
is just really minimal.

1290
01:17:46,660 --> 01:17:53,020
And indeed, in this culture, the
authors did say what happened.

1291
01:17:53,020 --> 01:17:57,372
AUDIENCE: Yeah, the entire
gene is deleted, right?

1292
01:17:57,372 --> 01:17:58,330
PROFESSOR: Yeah, right.

1293
01:17:58,330 --> 01:18:01,410
So almost a kb was
just removed from the genome.

1294
01:18:01,410 --> 01:18:04,510
And that kb included
the promoter.

1295
01:18:04,510 --> 01:18:07,844
And so that it just-- yeah.

1296
01:18:07,844 --> 01:18:10,177
So it's not going to be able
to grow on lactose anymore.

1297
01:18:12,980 --> 01:18:15,310
But the key thing
is, here these cells

1298
01:18:15,310 --> 01:18:17,400
were subject to this
5% cost associated

1299
01:18:17,400 --> 01:18:21,980
with making the lac operon,
which means that that mutant

1300
01:18:21,980 --> 01:18:23,729
that appeared, it
had a 5% advantage,

1301
01:18:23,729 --> 01:18:26,020
and so it was able to spread
throughout the population.

1302
01:18:29,700 --> 01:18:37,240
Whereas, what they could see is
that the evolved lacZ activity

1303
01:18:37,240 --> 01:18:40,790
indeed was different,
depending on how much lactose

1304
01:18:40,790 --> 01:18:42,340
they had in the culture.

1305
01:18:42,340 --> 01:18:43,850
And this is in the
presence of IPTG,

1306
01:18:43,850 --> 01:18:47,128
so they removed
that feedback loop.

1307
01:18:53,160 --> 01:18:56,670
And in these experiments,
any way you slice it,

1308
01:18:56,670 --> 01:19:03,000
the normalized lacZ activity did
not go above around 1.2 or 1.3.

1309
01:19:03,000 --> 01:19:05,820
So there is some
non-linearity that is somehow

1310
01:19:05,820 --> 01:19:07,810
constraining those
cells from going up

1311
01:19:07,810 --> 01:19:12,776
to increased expression very
much beyond the wild type.

1312
01:19:18,620 --> 01:19:22,890
We are out of time,
but on Tuesday, we'll

1313
01:19:22,890 --> 01:19:25,470
start talking about
evolution, and in particular,

1314
01:19:25,470 --> 01:19:26,970
in the context of
neutral evolution,

1315
01:19:26,970 --> 01:19:29,570
as kind of a null model to try
to understand these dynamics.

1316
01:19:29,570 --> 01:19:32,170
And we will also
talk more about why

1317
01:19:32,170 --> 01:19:34,999
it takes as long as it
does before you start

1318
01:19:34,999 --> 01:19:36,290
seeing anything happening here.

1319
01:19:36,290 --> 01:19:39,890
If you have any questions,
please feel free to come on up.