1
00:00:00,060 --> 00:00:01,780
The following
content is provided

2
00:00:01,780 --> 00:00:04,019
under a Creative
Commons license.

3
00:00:04,019 --> 00:00:06,870
Your support will help MIT
OpenCourseWare continue

4
00:00:06,870 --> 00:00:10,730
to offer high quality
educational resources for free.

5
00:00:10,730 --> 00:00:13,340
To make a donation or
view additional materials

6
00:00:13,340 --> 00:00:17,217
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,217 --> 00:00:17,842
at ocw.mit.edu.

8
00:00:26,570 --> 00:00:30,580
DOUG LAUFFENBURGER:
So we shall start.

9
00:00:30,580 --> 00:00:33,092
I haven't had the pleasure
of meeting most of you.

10
00:00:33,092 --> 00:00:34,050
I'm Doug Lauffenburger.

11
00:00:34,050 --> 00:00:40,700
I'm gratefully invited for
a guest presentation here.

12
00:00:40,700 --> 00:00:44,229
So I'll definitely enjoy it.

13
00:00:44,229 --> 00:00:45,520
There should be plenty of time.

14
00:00:45,520 --> 00:00:48,510
I'm not racing through
a lot of material,

15
00:00:48,510 --> 00:00:52,080
so feel free to interrupt
me with questions.

16
00:00:52,080 --> 00:00:55,155
And of course I'll try
to respond as best I can.

17
00:00:57,621 --> 00:00:58,120
OK.

18
00:00:58,120 --> 00:01:02,460
Who has looked at the
background materials that

19
00:01:02,460 --> 00:01:06,340
were posted on the web a
long time ago, last night?

20
00:01:06,340 --> 00:01:08,770
Who already will admit
to having looked at it?

21
00:01:11,290 --> 00:01:11,790
Good.

22
00:01:11,790 --> 00:01:13,606
All right.

23
00:01:13,606 --> 00:01:15,200
I guess that means
I should do this

24
00:01:15,200 --> 00:01:17,320
because otherwise if
you've read it already then

25
00:01:17,320 --> 00:01:18,480
there'd be no point, right?

26
00:01:18,480 --> 00:01:20,550
OK.

27
00:01:20,550 --> 00:01:21,300
OK.

28
00:01:21,300 --> 00:01:24,210
Well, where we are
in your semester--

29
00:01:24,210 --> 00:01:28,290
you're learning a lot of things
across the whole spectrum

30
00:01:28,290 --> 00:01:30,940
of computational
systems biology.

31
00:01:30,940 --> 00:01:33,870
I hope I'll add
something in here.

32
00:01:33,870 --> 00:01:36,620
It's actually a
very specific topic.

33
00:01:36,620 --> 00:01:40,020
We talk about modeling of
cell signaling networks,

34
00:01:40,020 --> 00:01:44,490
and in particular, one approach
is worth going through today

35
00:01:44,490 --> 00:01:47,190
and that's the logic
modeling framework.

36
00:01:47,190 --> 00:01:51,710
So I'll give you a little bit
of a conceptual background

37
00:01:51,710 --> 00:01:54,090
for the first 10 or 15 minutes.

38
00:01:54,090 --> 00:02:00,030
Then we'll launch into
the particular example

39
00:02:00,030 --> 00:02:02,630
that was in the main paper.

40
00:02:02,630 --> 00:02:05,900
And a little sidelight
with an application of it

41
00:02:05,900 --> 00:02:08,830
to a particular cancer problem.

42
00:02:08,830 --> 00:02:11,290
And then that should take
us pretty much to the end.

43
00:02:11,290 --> 00:02:11,790
OK.

44
00:02:16,060 --> 00:02:16,600
OK.

45
00:02:16,600 --> 00:02:21,940
The biological topic here
is cell signaling, primarily

46
00:02:21,940 --> 00:02:23,930
mammalian cells.

47
00:02:23,930 --> 00:02:27,220
Certainly applicable
to microbial cells

48
00:02:27,220 --> 00:02:28,990
in a simpler sense.

49
00:02:28,990 --> 00:02:36,940
So just to place the context,
in mammalian cell biology,

50
00:02:36,940 --> 00:02:40,430
I'm a bio-engineer and a cell
biologist at the same time.

51
00:02:40,430 --> 00:02:43,720
We're very interested in what
controls the cell behavior,

52
00:02:43,720 --> 00:02:45,880
their phenotypic response.

53
00:02:45,880 --> 00:02:48,680
We know that it's
in fact controlled

54
00:02:48,680 --> 00:02:51,110
by what it sees in their
environment, growth

55
00:02:51,110 --> 00:02:55,890
factors, hormones, extracellular
matrix, mechanical forces,

56
00:02:55,890 --> 00:02:59,320
cell-cell contacts.

57
00:02:59,320 --> 00:03:03,730
A variety of cues
in the environment

58
00:03:03,730 --> 00:03:08,670
and the way these govern
phenotype or control phenotype

59
00:03:08,670 --> 00:03:11,750
is that they influence,
they regulate

60
00:03:11,750 --> 00:03:14,350
what I would call the
execution processes.

61
00:03:14,350 --> 00:03:17,960
The crucial execution processes
such as gene expression,

62
00:03:17,960 --> 00:03:22,620
transcription, and
translation are

63
00:03:22,620 --> 00:03:24,860
governed by
extracellular factors.

64
00:03:24,860 --> 00:03:28,830
Metabolism, synthesis
of new molecules,

65
00:03:28,830 --> 00:03:32,130
cytoskeleton, motors,
force generation.

66
00:03:32,130 --> 00:03:34,200
These things all
carry out phenotype

67
00:03:34,200 --> 00:03:37,980
governed by the extracellular
stimuli or cues.

68
00:03:37,980 --> 00:03:41,710
And it happens via these
biochemical signaling

69
00:03:41,710 --> 00:03:45,550
pathways that are activated
primarily by cell surface

70
00:03:45,550 --> 00:03:49,000
receptors in the plasma
membrane-- cascades

71
00:03:49,000 --> 00:03:52,930
of biochemical reactions,
mostly enzymatic.

72
00:03:52,930 --> 00:03:56,580
Some protein-protein
docking, mostly

73
00:03:56,580 --> 00:03:58,620
post-translational
modifications.

74
00:03:58,620 --> 00:04:02,100
Kinase and phosphatase reactions
adding and taking off

75
00:04:02,100 --> 00:04:06,020
phosphate groups that change
protein activities and locations

76
00:04:06,020 --> 00:04:07,119
and so forth.

77
00:04:07,119 --> 00:04:09,535
It could be other types of
post-translational modifications.

78
00:04:09,535 --> 00:04:15,800
It could be second messengers,
calcium, ATPases, and so forth.

79
00:04:15,800 --> 00:04:19,860
So, the extracellular--
now my battery's dead.

80
00:04:19,860 --> 00:04:20,970
That's not good.

81
00:04:20,970 --> 00:04:22,079
Oh, there we go.

82
00:04:22,079 --> 00:04:25,760
Extracellular stimuli
generate the signals.

83
00:04:25,760 --> 00:04:29,610
They regulate gene expression,
metabolism, cytoskeleton.

84
00:04:29,610 --> 00:04:32,770
They carry out phenotype.

85
00:04:32,770 --> 00:04:33,270
OK.

86
00:04:36,060 --> 00:04:41,840
So we want to learn about cell
signaling network operations.

87
00:04:41,840 --> 00:04:43,790
There's actually multiple
pathways involved.

88
00:04:43,790 --> 00:04:46,180
We really need to
study many of them

89
00:04:46,180 --> 00:04:49,380
in concert to understand
what the cells are doing.

90
00:04:49,380 --> 00:04:52,680
And a big question is,
what kind of information

91
00:04:52,680 --> 00:04:54,630
do we need to study this?

92
00:04:54,630 --> 00:04:56,880
And in the end, we'd
be interested in how

93
00:04:56,880 --> 00:05:01,750
phenotypic behavior does arise
from variations and mutations

94
00:05:01,750 --> 00:05:05,270
in the genomic content of cells.

95
00:05:05,270 --> 00:05:10,050
But that genomic content,
of course, is not modified,

96
00:05:10,050 --> 00:05:12,630
but its effects are
influenced by what's

97
00:05:12,630 --> 00:05:15,610
in the environment, these
extracellular cues,

98
00:05:15,610 --> 00:05:17,480
ligands and so forth.

99
00:05:17,480 --> 00:05:22,070
So they influence what
message is expressed.

100
00:05:22,070 --> 00:05:23,700
From that message
they influence what's

101
00:05:23,700 --> 00:05:25,630
actually translated
into protein.

102
00:05:25,630 --> 00:05:27,230
From those proteins
they influence

103
00:05:27,230 --> 00:05:28,730
the post-translational
modifications

104
00:05:28,730 --> 00:05:31,420
and what the proteins
are actually doing.

105
00:05:31,420 --> 00:05:34,280
And so, in the
end, the phenotype

106
00:05:34,280 --> 00:05:37,210
is carried out by these
protein operations.

107
00:05:37,210 --> 00:05:39,430
And the question
is, what information

108
00:05:39,430 --> 00:05:41,880
level that we might
want to study.

109
00:05:41,880 --> 00:05:44,250
And of course you would
love to have the information

110
00:05:44,250 --> 00:05:47,310
content at all levels--
genomic information,

111
00:05:47,310 --> 00:05:50,114
transcriptional information,
translational information,

112
00:05:50,114 --> 00:05:51,405
post-translational information.

113
00:05:54,180 --> 00:05:57,570
So integrating all those
different data levels

114
00:05:57,570 --> 00:05:59,232
can be extremely valuable.

115
00:05:59,232 --> 00:06:01,440
In terms of the models I'm
going to talk about today,

116
00:06:01,440 --> 00:06:03,610
they've essentially
been living at the level

117
00:06:03,610 --> 00:06:07,880
of protein activities in
these signaling pathways.

118
00:06:07,880 --> 00:06:08,380
OK.

119
00:06:08,380 --> 00:06:09,796
That will be the
kind of data sets

120
00:06:09,796 --> 00:06:13,310
you'll see that will be analyzed
with respect to the models.

121
00:06:13,310 --> 00:06:18,540
Obviously, they arise from
these underlying mechanisms

122
00:06:18,540 --> 00:06:22,440
that are influenced by
the environmental context,

123
00:06:22,440 --> 00:06:25,070
altering the signaling
protein activities.

124
00:06:27,790 --> 00:06:28,290
OK.

125
00:06:28,290 --> 00:06:31,930
And what's very
interesting and there's

126
00:06:31,930 --> 00:06:36,170
going to be more and more
progress in the coming years

127
00:06:36,170 --> 00:06:39,520
is relating what's in the
genomic information-- mutations

128
00:06:39,520 --> 00:06:42,810
and variations to what's
happening at the protein level.

129
00:06:42,810 --> 00:06:45,487
And some of the other
instructors in this class

130
00:06:45,487 --> 00:06:47,070
are really some of
the world's experts

131
00:06:47,070 --> 00:06:48,640
in figuring out how to do this.

132
00:06:48,640 --> 00:06:50,750
I'd like to just
show this example

133
00:06:50,750 --> 00:06:54,830
as a motivation for
this kind of approach.

134
00:06:54,830 --> 00:06:58,890
And that is if you do gene
sequencing of many patient

135
00:06:58,890 --> 00:07:00,550
tumors-- in this
case, I believe this

136
00:07:00,550 --> 00:07:02,539
was a paper on
pancreatic tumors.

137
00:07:02,539 --> 00:07:05,080
This has been shown for pretty
much every other type of tumor

138
00:07:05,080 --> 00:07:06,190
since then.

139
00:07:06,190 --> 00:07:09,740
In any given patient tumor,
each one of these bars,

140
00:07:09,740 --> 00:07:13,340
there's dozens of
mutations in each tumor.

141
00:07:13,340 --> 00:07:15,850
And a variety of
types-- deletions,

142
00:07:15,850 --> 00:07:19,140
amplifications, mutations--
and by and large, they're

143
00:07:19,140 --> 00:07:20,710
all different.

144
00:07:20,710 --> 00:07:22,780
There's very few
mutations themselves

145
00:07:22,780 --> 00:07:25,515
that really carry over to
a substantial proportion

146
00:07:25,515 --> 00:07:27,770
of one patient's
tumor to another.

147
00:07:27,770 --> 00:07:31,160
There's some special cases
that are fairly pervasive,

148
00:07:31,160 --> 00:07:34,210
but the predominance of these
dozens and dozens of mutations

149
00:07:34,210 --> 00:07:39,190
and variations are
different from one patient

150
00:07:39,190 --> 00:07:43,020
to another, and even
in the same patient.

151
00:07:43,020 --> 00:07:46,260
So what's emerging
as a productive way

152
00:07:46,260 --> 00:07:51,220
to think about this-- How
do all these different types

153
00:07:51,220 --> 00:07:52,990
of mutations and
specific mutations

154
00:07:52,990 --> 00:07:56,950
actually lead to classes
of similar pathologies?

155
00:07:56,950 --> 00:07:59,800
And that is they
tend to reside in

156
00:07:59,800 --> 00:08:05,362
what can be identified
as pathways-- circuits,

157
00:08:05,362 --> 00:08:07,320
machines, things that
are actually carrying out

158
00:08:07,320 --> 00:08:09,220
function at the protein level.

159
00:08:09,220 --> 00:08:12,150
So for instance-- I'm
losing this again.

160
00:08:12,150 --> 00:08:14,510
For these pancreatic
cancers on this wheel

161
00:08:14,510 --> 00:08:18,430
are about a dozen different
signaling pathways

162
00:08:18,430 --> 00:08:20,720
and cell-cycle control
pathways and apoptosis

163
00:08:20,720 --> 00:08:22,790
control pathways.

164
00:08:22,790 --> 00:08:26,110
And if you look at any
individual patient tumors,

165
00:08:26,110 --> 00:08:29,505
like this green one or this red
one-- two different patients.

166
00:08:29,505 --> 00:08:33,409
If you actually look at the
mutations at the genomic level,

167
00:08:33,409 --> 00:08:36,250
they're entirely different
in the green patient

168
00:08:36,250 --> 00:08:38,169
tumor versus the
red patient tumor.

169
00:08:38,169 --> 00:08:40,919
So if you're just trying
to match gene mutation

170
00:08:40,919 --> 00:08:43,171
to pancreatic cancer,
these two patients

171
00:08:43,171 --> 00:08:44,420
would look entirely different.

172
00:08:46,960 --> 00:08:50,630
But, it turns out, that you
can line up their mutations

173
00:08:50,630 --> 00:08:53,260
into the same
pathways and say, OK,

174
00:08:53,260 --> 00:08:54,660
the red tumor and
the green tumor

175
00:08:54,660 --> 00:08:58,515
both have mutations that
affect the TGF beta pathway.

176
00:08:58,515 --> 00:09:00,390
They're different
mutations, but they've

177
00:09:00,390 --> 00:09:02,090
dysregulated that pathway.

178
00:09:02,090 --> 00:09:04,740
And similarly, you can
do that with pretty much

179
00:09:04,740 --> 00:09:06,610
all of the other mutations.

180
00:09:06,610 --> 00:09:09,380
That these tumors
have been dysregulated

181
00:09:09,380 --> 00:09:11,740
in terms of particular pathways.

182
00:09:11,740 --> 00:09:13,350
But patient to
patient to patient,

183
00:09:13,350 --> 00:09:16,960
it's happened by different
genomic gene sequence

184
00:09:16,960 --> 00:09:18,190
mutations.

185
00:09:18,190 --> 00:09:21,550
So that the ability to
look at these protein

186
00:09:21,550 --> 00:09:23,780
level pathways is
a way of making

187
00:09:23,780 --> 00:09:28,170
really good productive sense
of the gene sequencing data.

188
00:09:28,170 --> 00:09:31,560
So there's lots of labs trying
to go from gene sequence

189
00:09:31,560 --> 00:09:33,815
up to pathway modulation.

190
00:09:33,815 --> 00:09:37,760
In our case, we're not
going to show you that here.

191
00:09:37,760 --> 00:09:39,600
We're going to say,
this is a motivation

192
00:09:39,600 --> 00:09:42,940
for starting at
the protein level.

193
00:09:42,940 --> 00:09:45,450
And I'd like to show
this picture too.

194
00:09:45,450 --> 00:09:49,120
Number one because it's
such an anachronism.

195
00:09:49,120 --> 00:09:52,000
This is a circuit board from
decades and decades and decades

196
00:09:52,000 --> 00:09:54,680
ago that none of
you would recognize.

197
00:09:54,680 --> 00:10:00,560
But, in the molecular biology
world, this kind of a picture,

198
00:10:00,560 --> 00:10:04,760
and in its modern form is viewed
as a very appealing metaphor

199
00:10:04,760 --> 00:10:07,390
for how to think
about these signaling

200
00:10:07,390 --> 00:10:10,830
pathways and signaling networks
that take the extracellular

201
00:10:10,830 --> 00:10:14,410
information and turn it into
governance of transcription,

202
00:10:14,410 --> 00:10:17,030
metabolism, cytoskeleton,
and phenotype.

203
00:10:17,030 --> 00:10:19,280
So, just this
metaphor of circuitry,

204
00:10:19,280 --> 00:10:21,360
where in white, the
extracellular ligands,

205
00:10:21,360 --> 00:10:24,710
growth factors are
somehow wired to the blue.

206
00:10:24,710 --> 00:10:28,380
The cell surface receptors,
ErbB for instance--

207
00:10:28,380 --> 00:10:30,310
they're wired to

208
00:10:30,310 --> 00:10:32,920
kinases and other
signaling proteins--

209
00:10:32,920 --> 00:10:37,050
they're wired to transcription
factors, cell-cycle control

210
00:10:37,050 --> 00:10:40,990
regulators,
apoptosis regulators.

211
00:10:40,990 --> 00:10:44,920
So these very famous folks
in cancer biology say,

212
00:10:44,920 --> 00:10:47,580
what you've got to understand
is, these signalling networks

213
00:10:47,580 --> 00:10:49,470
as circuitry.

214
00:10:49,470 --> 00:10:51,740
And if the circuitry is
dysregulated somehow,

215
00:10:51,740 --> 00:10:54,450
the wiring is different,
then that's what's

216
00:10:54,450 --> 00:10:58,910
underlying malignant behavior.

217
00:10:58,910 --> 00:11:01,160
So, this is really
beautiful but it's

218
00:11:01,160 --> 00:11:02,800
pretty much useless, right.

219
00:11:02,800 --> 00:11:07,620
Because there's no prediction or
calculation or even hypothesis

220
00:11:07,620 --> 00:11:10,650
generation one can do
from a picture like this.

221
00:11:10,650 --> 00:11:14,310
Yes, it's circuitry, but
what do I do with it?

222
00:11:14,310 --> 00:11:16,110
So, what I want
to show you today

223
00:11:16,110 --> 00:11:18,040
are efforts to
turn them into what

224
00:11:18,040 --> 00:11:21,470
I would call an actionable
model, a computable model.

225
00:11:21,470 --> 00:11:23,789
Yes, it looks kind
of like circuitry,

226
00:11:23,789 --> 00:11:26,080
but in fact you would know
how to do a calculation that

227
00:11:26,080 --> 00:11:29,370
would fit it to data
and predict new data.

228
00:11:29,370 --> 00:11:32,540
And then you have, in fact, a
model rather than a metaphor.

229
00:11:32,540 --> 00:11:35,200
That's the idea.

230
00:11:35,200 --> 00:11:37,910
So, one question is, if
you want to turn that

231
00:11:37,910 --> 00:11:43,750
into a formal mathematical
framework for circuitry

232
00:11:43,750 --> 00:11:46,530
that you can calculate--
what kind of mathematics

233
00:11:46,530 --> 00:11:47,190
might you use?

234
00:11:47,190 --> 00:11:48,565
And in this class
you're learning

235
00:11:48,565 --> 00:11:51,730
a whole spectrum of things.

236
00:11:51,730 --> 00:11:54,930
And one can think
about it on one hand,

237
00:11:54,930 --> 00:11:56,960
if we knew all of
those components

238
00:11:56,960 --> 00:12:01,790
and how they interacted, and
could estimate rate constants

239
00:12:01,790 --> 00:12:04,340
and so forth, we could
write differential equations

240
00:12:04,340 --> 00:12:08,240
for maybe the dozens and dozens
of components and interactions

241
00:12:08,240 --> 00:12:11,690
and predict how they would
play out dynamically with time.

242
00:12:11,690 --> 00:12:14,440
For most systems with
the complexity that's

243
00:12:14,440 --> 00:12:17,280
really controlling cell
biology, at this point in time,

244
00:12:17,280 --> 00:12:18,990
this is almost impossible.

245
00:12:18,990 --> 00:12:21,530
There's only rare
cases where enough

246
00:12:21,530 --> 00:12:24,290
is known about
signaling biochemistry

247
00:12:24,290 --> 00:12:26,800
to really write
down differential

248
00:12:26,800 --> 00:12:29,690
equations for what's going on.

249
00:12:29,690 --> 00:12:31,570
At the other extreme,
of course, is the type

250
00:12:31,570 --> 00:12:35,830
of mathematics one gets out
of very, very large data sets,

251
00:12:35,830 --> 00:12:40,190
sequencing data sets,
transcriptional, and so forth.

252
00:12:40,190 --> 00:12:42,510
More informatics
type of analysis,

253
00:12:42,510 --> 00:12:44,990
where it has to do
with multivariate

254
00:12:44,990 --> 00:12:50,290
regression and clustering,
mutual information.

255
00:12:50,290 --> 00:12:55,050
And what we've been working on
is someplace up in the middle

256
00:12:55,050 --> 00:12:58,530
where you don't have enough
mechanistic prior knowledge

257
00:12:58,530 --> 00:13:02,360
to write this formal
of physics, and yet

258
00:13:02,360 --> 00:13:06,640
takes you someplace beyond
statistical associations.

259
00:13:06,640 --> 00:13:08,710
And this is one
of the areas that

260
00:13:08,710 --> 00:13:12,470
might be worth your
learning in this class.

261
00:13:12,470 --> 00:13:15,720
OK, this is really the same
set of computational methods,

262
00:13:15,720 --> 00:13:20,130
just like it's cast in a
little bit different form that

263
00:13:20,130 --> 00:13:23,380
delineates computational
modeling, really

264
00:13:23,380 --> 00:13:26,430
into two kinds of classes.

265
00:13:26,430 --> 00:13:30,070
What's traditionally appreciated
in most fields of engineering

266
00:13:30,070 --> 00:13:32,330
and physics are
differential equations

267
00:13:32,330 --> 00:13:33,910
that are very theory driven.

268
00:13:33,910 --> 00:13:34,810
You have a theory.

269
00:13:34,810 --> 00:13:37,010
You have prior knowledge
for what's happening.

270
00:13:37,010 --> 00:13:39,140
You're writing down the
components involved,

271
00:13:39,140 --> 00:13:41,570
you're writing down
how they interact.

272
00:13:41,570 --> 00:13:43,780
And typically,
algebraic equations

273
00:13:43,780 --> 00:13:47,960
or those differential
equations describe your theory,

274
00:13:47,960 --> 00:13:49,960
describe your prior knowledge.

275
00:13:49,960 --> 00:13:53,260
And now it's formalized and
you estimate rate constants

276
00:13:53,260 --> 00:13:55,020
and so forth.

277
00:13:55,020 --> 00:13:56,520
Another whole class
of information

278
00:13:56,520 --> 00:13:58,640
is data driven, in
which, you really

279
00:13:58,640 --> 00:14:01,370
don't have a good theory
about what components matter

280
00:14:01,370 --> 00:14:03,350
and how they interact.

281
00:14:03,350 --> 00:14:07,450
And so you start with
data sets and from it

282
00:14:07,450 --> 00:14:11,970
you do classification or
typologies or associations

283
00:14:11,970 --> 00:14:15,150
with different types of
mathematics that at least try

284
00:14:15,150 --> 00:14:18,290
to make sense and get hypotheses
out of these large data

285
00:14:18,290 --> 00:14:21,240
sets, where you don't
have any theory.

286
00:14:21,240 --> 00:14:25,440
One reason that logic
modeling appeals to me,

287
00:14:25,440 --> 00:14:26,960
is that it actually
can be applied

288
00:14:26,960 --> 00:14:31,210
in either the theory driven
or the data driven mode.

289
00:14:31,210 --> 00:14:34,770
You can say, I know
nothing about my system.

290
00:14:34,770 --> 00:14:39,370
I just generate large data sets
of signaling network activities

291
00:14:39,370 --> 00:14:42,000
induced by different
stimuli, but I'm

292
00:14:42,000 --> 00:14:44,260
going to try to fit a
logic model to it that

293
00:14:44,260 --> 00:14:46,620
says how the different
components are influencing

294
00:14:46,620 --> 00:14:50,120
each other in a logic way.

295
00:14:50,120 --> 00:14:52,530
Or, you could say,
well, I know something.

296
00:14:52,530 --> 00:14:54,190
I have some prior knowledge.

297
00:14:54,190 --> 00:14:59,800
I may have interactomes that say
what molecular components are

298
00:14:59,800 --> 00:15:02,390
present in signaling networks.

299
00:15:02,390 --> 00:15:05,690
And so in principle, I
kind of know who's involved

300
00:15:05,690 --> 00:15:07,970
and who might be
influencing whom.

301
00:15:07,970 --> 00:15:12,550
And I could write a logic model
based on that prior knowledge.

302
00:15:12,550 --> 00:15:15,170
And then run calculations
and see if it actually

303
00:15:15,170 --> 00:15:17,670
makes predictions about
experimental data.

304
00:15:17,670 --> 00:15:18,830
So that's one nice thing.

305
00:15:18,830 --> 00:15:21,800
It's a mathematical
formalism that

306
00:15:21,800 --> 00:15:25,970
can either be run in data driven
mode or in theory driven mode

307
00:15:25,970 --> 00:15:27,390
and go back and forth.

308
00:15:27,390 --> 00:15:30,665
So that's one reason--
given one lecture to offer,

309
00:15:30,665 --> 00:15:34,550
I've decided to offer
it on this topic.

310
00:15:34,550 --> 00:15:35,840
All right, with me so far?

311
00:15:35,840 --> 00:15:37,360
Any questions?

312
00:15:37,360 --> 00:15:37,860
Philosophy?

313
00:15:41,250 --> 00:15:41,830
OK.

314
00:15:41,830 --> 00:15:46,490
So, what we're
going to do today is

315
00:15:46,490 --> 00:15:50,510
almost take a
hybrid of these two.

316
00:15:50,510 --> 00:15:54,330
We're going to say, what
prior knowledge do we have,

317
00:15:54,330 --> 00:15:57,360
and then recognize that
it's really not enough.

318
00:15:57,360 --> 00:16:02,310
And so how do we now integrate
that with empirical data

319
00:16:02,310 --> 00:16:05,260
to now come up with
logic modeling that,

320
00:16:05,260 --> 00:16:08,640
in fact, is actionable
and computable?

321
00:16:08,640 --> 00:16:09,140
OK.

322
00:16:09,140 --> 00:16:12,704
So what kind of prior
knowledge do we have?

323
00:16:12,704 --> 00:16:14,870
Let's say we wanted to have
a logic model for what's

324
00:16:14,870 --> 00:16:17,400
in these signaling networks
downstream of growth factor

325
00:16:17,400 --> 00:16:21,620
receptors, or hormone receptors,
or things like that, that then

326
00:16:21,620 --> 00:16:26,030
govern gene expression,
metabolism and so forth.

327
00:16:26,030 --> 00:16:28,250
What prior knowledge do we have?

328
00:16:28,250 --> 00:16:30,129
And you folks
probably have already

329
00:16:30,129 --> 00:16:31,420
seen some of this in the class.

330
00:16:35,264 --> 00:16:36,930
There's all kinds of
databases of stuff.

331
00:16:36,930 --> 00:16:39,180
What's in those databases
that might be relevant here?

332
00:16:42,054 --> 00:16:45,547
AUDIENCE: [INAUDIBLE] that the
protein-protein interactions--

333
00:16:45,547 --> 00:16:48,425
if you switch proteins, they
interact with each other,

334
00:16:48,425 --> 00:16:50,550
but maybe not necessarily
what pathways they're in.

335
00:16:50,550 --> 00:16:51,758
DOUG LAUFFENBURGER: OK, good.

336
00:16:51,758 --> 00:16:55,506
And have you seen
databases like that?

337
00:16:55,506 --> 00:16:57,130
AUDIENCE: Several of
them have come up.

338
00:16:57,130 --> 00:16:57,630
DOUG LAUFFENBURGER: OK.

339
00:16:57,630 --> 00:16:59,580
Are you the only
one who's seen them?

340
00:16:59,580 --> 00:17:01,306
Or is there anybody
else that kind

341
00:17:01,306 --> 00:17:03,426
of noticed them in passing too?

342
00:17:03,426 --> 00:17:04,099
OK, good.

343
00:17:04,099 --> 00:17:06,490
Second, third,
fourth, all right.

344
00:17:06,490 --> 00:17:10,480
That's a critical mass
if I ever saw one.

345
00:17:10,480 --> 00:17:11,010
OK.

346
00:17:11,010 --> 00:17:16,980
So, I'm just going
to allude to those.

347
00:17:16,980 --> 00:17:20,520
So there are pathway databases.

348
00:17:20,520 --> 00:17:23,282
And this is actually an
old slide of a few years

349
00:17:23,282 --> 00:17:25,240
ago, so I'm sure the
numbers are all different.

350
00:17:25,240 --> 00:17:26,640
And, in fact, there's new ones.

351
00:17:26,640 --> 00:17:29,080
I just haven't taken
to updating the slide.

352
00:17:29,080 --> 00:17:33,540
But they'll, based on literature,
take certain numbers

353
00:17:33,540 --> 00:17:36,420
of gene products, a
few hundred of them,

354
00:17:36,420 --> 00:17:41,760
and organize them into pathways
based on biological knowledge.

355
00:17:41,760 --> 00:17:45,440
There's other databases that
are more interactomes, usually

356
00:17:45,440 --> 00:17:49,230
based on other kinds of
experimental data-- yeast

357
00:17:49,230 --> 00:17:53,280
two-hybrid, mass spectrometry,
literature curation,

358
00:17:53,280 --> 00:17:58,610
that also tries to say who's
physically interacting.

359
00:17:58,610 --> 00:18:01,610
So these node-- these pathway
databases don't necessarily

360
00:18:01,610 --> 00:18:04,080
say, somebody's
physically interacting,

361
00:18:04,080 --> 00:18:06,890
they say somebody might
be upstream and downstream

362
00:18:06,890 --> 00:18:08,580
and so forth.

363
00:18:08,580 --> 00:18:10,060
And then the
interactome databases

364
00:18:10,060 --> 00:18:12,632
say, component a
and component b,

365
00:18:12,632 --> 00:18:15,090
there's some evidence that they
have a physical association

366
00:18:15,090 --> 00:18:16,230
someplace along the way.

367
00:18:16,230 --> 00:18:18,720
So these are two
complementary types

368
00:18:18,720 --> 00:18:25,280
of databases that, in
fact, can be put together.

369
00:18:25,280 --> 00:18:26,190
OK.

370
00:18:26,190 --> 00:18:29,360
So an interesting
thing about these--

371
00:18:29,360 --> 00:18:31,860
there's a number
of these databases.

372
00:18:31,860 --> 00:18:34,200
And so in principle
you could say, well

373
00:18:34,200 --> 00:18:37,110
if I want then to start-- if
I want to generate a logic

374
00:18:37,110 --> 00:18:40,219
model for signaling
networks, all

375
00:18:40,219 --> 00:18:42,010
I have to do is take
what's in the database

376
00:18:42,010 --> 00:18:43,890
and say what pathways
are there and what's

377
00:18:43,890 --> 00:18:46,190
known with their
interactions, and now I've

378
00:18:46,190 --> 00:18:47,645
got a starting point.

379
00:18:47,645 --> 00:18:49,660
You know, I can
actually draw a graph

380
00:18:49,660 --> 00:18:55,750
with lots of molecular nodes and
lots of molecular interactions.

381
00:18:55,750 --> 00:18:57,680
So, you can do that.

382
00:18:57,680 --> 00:19:01,240
And so you can choose
one of these databases

383
00:19:01,240 --> 00:19:03,450
and say I'm going
to draw a graph that

384
00:19:03,450 --> 00:19:07,890
has what's believed to be
true about nodes and pathways

385
00:19:07,890 --> 00:19:10,730
and interactions and
signaling networks.

386
00:19:10,730 --> 00:19:13,870
But then you choose a different
database and another database.

387
00:19:13,870 --> 00:19:16,810
And you'll actually get
different information.

388
00:19:16,810 --> 00:19:17,310
OK.

389
00:19:17,310 --> 00:19:19,143
We actually did a study
on this-- I probably

390
00:19:19,143 --> 00:19:21,320
should have given you the
citation of that-- that

391
00:19:21,320 --> 00:19:23,770
said if you look at six or
seven of these databases,

392
00:19:23,770 --> 00:19:25,470
they are not coincident.

393
00:19:25,470 --> 00:19:28,770
They have very
small intersections.

394
00:19:28,770 --> 00:19:32,970
Most of their information
is non-redundant.

395
00:19:32,970 --> 00:19:35,189
And so you could try
to put it all together.

396
00:19:35,189 --> 00:19:36,730
And we did this,
again, in this paper

397
00:19:36,730 --> 00:19:39,760
that I'm not giving
you a citation for.

398
00:19:39,760 --> 00:19:42,420
And so here's a number of
nodes and signaling pathways

399
00:19:42,420 --> 00:19:46,680
downstream of receptors.

400
00:19:46,680 --> 00:19:50,940
And all the colored
nodes are those

401
00:19:50,940 --> 00:19:54,120
in which they appear in
only one of these-- one, two

402
00:19:54,120 --> 00:19:57,460
three, four, five,
six databases.

403
00:19:57,460 --> 00:20:00,580
So if something's colored
green, it's only in GeneGo

404
00:20:00,580 --> 00:20:02,175
and it's not in
any of the others.

405
00:20:02,175 --> 00:20:03,710
If something's
colored purple, it's

406
00:20:03,710 --> 00:20:06,180
in PANTHER and
none of the others.

407
00:20:06,180 --> 00:20:07,240
OK.

408
00:20:07,240 --> 00:20:09,280
If they're gray-- some
of these gray ones,

409
00:20:09,280 --> 00:20:11,630
they're in at least two.

410
00:20:11,630 --> 00:20:15,270
But out of these six, there's
an exceedingly small number

411
00:20:15,270 --> 00:20:18,360
of nodes and interactions that
are in all six databases.

412
00:20:18,360 --> 00:20:22,620
Which was a real surprise
to us when we did this.

413
00:20:22,620 --> 00:20:24,080
So what this means
is, if you want

414
00:20:24,080 --> 00:20:27,400
to start with some
prior knowledge graph

415
00:20:27,400 --> 00:20:30,090
that you're now going to fit
a logic model to by mapping it

416
00:20:30,090 --> 00:20:32,540
against data, you first
even have the choice, well,

417
00:20:32,540 --> 00:20:33,790
what am I going to start with?

418
00:20:33,790 --> 00:20:35,520
What is my prior knowledge?

419
00:20:35,520 --> 00:20:38,540
There's not really
consensus prior knowledge.

420
00:20:38,540 --> 00:20:42,810
So you can start with six
different interaction graphs.

421
00:20:42,810 --> 00:20:44,530
Or you could try to
put them all together

422
00:20:44,530 --> 00:20:46,570
and get a consensus graph.

423
00:20:46,570 --> 00:20:48,600
So you have all these choices.

424
00:20:48,600 --> 00:20:50,860
And right now, it's
not as if there's

425
00:20:50,860 --> 00:20:53,640
detailed analysis of what
the best choice would

426
00:20:53,640 --> 00:20:57,150
be for your starting point.

427
00:20:57,150 --> 00:20:59,990
But I want to stress that,
with respect to our approach,

428
00:20:59,990 --> 00:21:04,090
this is a starting
point because one

429
00:21:04,090 --> 00:21:07,170
of the issues with the
database information

430
00:21:07,170 --> 00:21:10,990
is that it's typically
very diverse with respect

431
00:21:10,990 --> 00:21:12,560
to context.

432
00:21:12,560 --> 00:21:15,020
What cell type did this
information come from?

433
00:21:15,020 --> 00:21:18,310
What treatment conditions
did it come from?

434
00:21:18,310 --> 00:21:21,240
If there's different cell
types, different species,

435
00:21:21,240 --> 00:21:22,780
different mutations.

436
00:21:22,780 --> 00:21:28,080
So if I see interactions or
if I don't see interactions,

437
00:21:28,080 --> 00:21:29,040
are they in conflict?

438
00:21:29,040 --> 00:21:31,650
Or they're just-- this
one was in a lymphocyte,

439
00:21:31,650 --> 00:21:33,330
this one was in a
hepatocyte, this one

440
00:21:33,330 --> 00:21:38,500
was in a cardiac myocyte, and
they're actually different.

441
00:21:38,500 --> 00:21:42,380
OK, so if I had a cell
type specific database,

442
00:21:42,380 --> 00:21:44,520
or pulled that information
out, that would be good.

443
00:21:44,520 --> 00:21:46,202
It would be a smaller
number of things.

444
00:21:46,202 --> 00:21:47,910
But then under what
treatment conditions?

445
00:21:47,910 --> 00:21:50,795
Because remember I said starting
with the genomic content, what

446
00:21:50,795 --> 00:21:52,920
you actually see in terms
of molecular interactions

447
00:21:52,920 --> 00:21:55,580
will be very strongly
affected by what

448
00:21:55,580 --> 00:21:57,200
matrix were the
cells growing on?

449
00:21:57,200 --> 00:21:58,820
Or was this in vivo?

450
00:21:58,820 --> 00:22:03,750
Was this in a multicellular
culture situation?

451
00:22:03,750 --> 00:22:06,610
So, that's why this
is a starting point

452
00:22:06,610 --> 00:22:09,880
and can't really
be used to describe

453
00:22:09,880 --> 00:22:14,279
any particular experimental
situation with much confidence.

454
00:22:14,279 --> 00:22:15,695
The other thing--
and this is what

455
00:22:15,695 --> 00:22:17,290
I've been trying to
emphasize from the start-- is

456
00:22:17,290 --> 00:22:19,320
that there's no calculation
you can do on this.

457
00:22:22,380 --> 00:22:27,260
There's a group of
folks in this field who

458
00:22:27,260 --> 00:22:31,270
propose some ideas that I
think are very intriguing,

459
00:22:31,270 --> 00:22:33,000
but which, at least
to me personally,

460
00:22:33,000 --> 00:22:35,450
there's not that
much evidence for.

461
00:22:35,450 --> 00:22:41,040
And that is, that there's
topological characteristics

462
00:22:41,040 --> 00:22:45,520
of these graphs, that then
tell you what's important.

463
00:22:45,520 --> 00:22:47,330
So if I have a
node that's somehow

464
00:22:47,330 --> 00:22:49,760
connected to more
other nodes, that

465
00:22:49,760 --> 00:22:51,630
is going to be a
more important node,

466
00:22:51,630 --> 00:22:54,380
and might be associated with the
disease, versus a node that's

467
00:22:54,380 --> 00:22:55,930
connected to fewer.

468
00:22:55,930 --> 00:22:56,430
OK.

469
00:22:56,430 --> 00:22:59,810
Some of these are very, very
appealing ideas conceptually.

470
00:22:59,810 --> 00:23:03,740
If you actually look for
the experimental evidence

471
00:23:03,740 --> 00:23:07,450
that they're valid
notions, it's very thin.

472
00:23:07,450 --> 00:23:09,830
But, that's where some
folks would claim,

473
00:23:09,830 --> 00:23:12,950
oh, you can do predictions
and hypotheses based
on the hypotheses based

474
00:23:12,950 --> 00:23:15,990
on these graphs because
there are these graph theory

475
00:23:15,990 --> 00:23:17,630
characteristics
that somehow might

476
00:23:17,630 --> 00:23:20,480
be biologically meaningful.

477
00:23:20,480 --> 00:23:21,090
OK.

478
00:23:21,090 --> 00:23:24,240
But I'd say, jury's out on
whether, in fact, any of that

479
00:23:24,240 --> 00:23:26,130
is true.

480
00:23:26,130 --> 00:23:30,690
So, our view is-- OK, this
is a good starting point,

481
00:23:30,690 --> 00:23:33,960
but in fact, needs to be
mapped to empirical data

482
00:23:33,960 --> 00:23:39,540
in order to gain confidence
about calculations you can do.

483
00:23:39,540 --> 00:23:44,200
So that's the goal of
this kind of approach,

484
00:23:44,200 --> 00:23:48,870
is to say, let's
stipulate that we start

485
00:23:48,870 --> 00:23:51,690
with some prior
knowledge scaffold.

486
00:23:51,690 --> 00:23:54,710
This particular one is from
the Ingenuity database.

487
00:23:54,710 --> 00:23:56,850
You could get one from
any other database.

488
00:23:56,850 --> 00:24:00,160
You could get a consensus one
from three or four if you want.

489
00:24:00,160 --> 00:24:03,590
And so it has, up here,
extracellular stimuli,

490
00:24:03,590 --> 00:24:06,200
growth factors, cytokines.

491
00:24:06,200 --> 00:24:11,150
They're connected in their
interactome to receptors.

492
00:24:11,150 --> 00:24:14,810
They're connected to scaffolding
proteins and signaling proteins

493
00:24:14,810 --> 00:24:17,110
and kinases and so forth.

494
00:24:17,110 --> 00:24:19,520
They're connected to
transcription factors,

495
00:24:19,520 --> 00:24:20,970
metabolic enzymes.

496
00:24:20,970 --> 00:24:22,880
So you can draw this graph.

497
00:24:22,880 --> 00:24:25,675
Say this might be what's
going on in my cell.

498
00:24:25,675 --> 00:24:27,550
And then what we'd like
to do is to turn this

499
00:24:27,550 --> 00:24:31,970
into a formal logic
framework that's

500
00:24:31,970 --> 00:24:36,340
capable of then fitting
experimental data,

501
00:24:36,340 --> 00:24:38,380
predicting new
experimental data,

502
00:24:38,380 --> 00:24:41,990
and giving you a chance
at biological hypothesis

503
00:24:41,990 --> 00:24:43,512
and testing.

504
00:24:43,512 --> 00:24:46,250
All right, so
conceptually you get it?

505
00:24:46,250 --> 00:24:50,240
Two aspects-- some kind of
starting prior knowledge,

506
00:24:50,240 --> 00:24:53,230
that's kind of a scaffold,
a graph, for your network.

507
00:24:53,230 --> 00:24:56,450
And now you're going to turn it
into a computable logic model

508
00:24:56,450 --> 00:25:00,720
by mapping it against
empirical data.

509
00:25:00,720 --> 00:25:04,960
So, merely what it takes is
the kind of conceptual diagram

510
00:25:04,960 --> 00:25:08,280
you see in any cell biology
paper, any signaling paper,

511
00:25:08,280 --> 00:25:13,140
that says, well, a and b
both influence e positively,

512
00:25:13,140 --> 00:25:17,880
and b influences f negatively,
and c influences f positively.

513
00:25:17,880 --> 00:25:19,780
Then there's a
feedback from g to a.

514
00:25:19,780 --> 00:25:20,980
That's inhibitory.

515
00:25:20,980 --> 00:25:22,800
You can draw those.

516
00:25:22,800 --> 00:25:27,770
But now, how do you turn it
into a computable algorithm?

517
00:25:27,770 --> 00:25:30,610
So, what I'm going to spend
most of the day on is,

518
00:25:30,610 --> 00:25:33,680
just conversion of
this to a Boolean logic

519
00:25:33,680 --> 00:25:38,870
model that any one of these
interactions is an AND-- a and b

520
00:25:38,870 --> 00:25:41,870
being active makes e active.

521
00:25:41,870 --> 00:25:44,770
c being active, but
b not being active,

522
00:25:44,770 --> 00:25:47,220
allows f to be
active, and so forth.

523
00:25:47,220 --> 00:25:49,320
You turn these into
formal logic statements

524
00:25:49,320 --> 00:25:50,640
that you can compute on.
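
A minimal sketch of the two logic statements just described, written as Boolean functions. The function name `evaluate` is made up for illustration, and the inhibitory feedback from g to a is left out because the inputs to g are not specified here.

```python
from itertools import product


def evaluate(a: bool, b: bool, c: bool) -> dict:
    """Evaluate the two example gates for one input combination."""
    e = a and b          # AND gate: a and b both active makes e active
    f = c and not b      # c active, but b not active, allows f to be active
    return {"e": e, "f": f}


# Print the full truth table over the three inputs.
for a, b, c in product([False, True], repeat=3):
    print(a, b, c, evaluate(a, b, c))
```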

525
00:25:50,640 --> 00:25:52,720
At the very end,
if we have time,

526
00:25:52,720 --> 00:25:55,880
I'll show how to relax this from
a Boolean framework that's just

527
00:25:55,880 --> 00:25:59,450
on off, to something that
can be more quantitative.

528
00:26:02,490 --> 00:26:02,990
All right.

529
00:26:02,990 --> 00:26:04,050
So that's the notion.

530
00:26:04,050 --> 00:26:07,400
Now what I'm going to do
for the rest of the time

531
00:26:07,400 --> 00:26:10,820
is go through the specific
example paper that says, OK,

532
00:26:10,820 --> 00:26:13,665
how do we in fact do this?

533
00:26:13,665 --> 00:26:16,440
What is a way to
accomplish this?

534
00:26:16,440 --> 00:26:19,050
So now let's go back
to a biological problem

535
00:26:19,050 --> 00:26:21,686
where there's going to be
empirical, experimental data

536
00:26:21,686 --> 00:26:23,310
that we're now going
to map against one

537
00:26:23,310 --> 00:26:27,070
of these prior knowledge
interactome graphs.

538
00:26:27,070 --> 00:26:30,080
This particular study-- this was
done with Peter Sorger, who's

539
00:26:30,080 --> 00:26:35,380
now at Harvard Medical School--
had to do with liver cells.

540
00:26:35,380 --> 00:26:39,720
Liver cancer-- you'll see
some application of that

541
00:26:39,720 --> 00:26:44,450
at the end-- that says we
have liver cell hepatocytes.

542
00:26:44,450 --> 00:26:47,780
And we want to know how they
respond to different growth

543
00:26:47,780 --> 00:26:50,770
factors and cytokines
in their environment.

544
00:26:50,770 --> 00:26:52,960
How that'll change their
proliferation or death?

545
00:26:52,960 --> 00:26:56,600
Or the inflammatory
cytokines that they produce.

546
00:26:56,600 --> 00:26:59,600
And we'd like to take-- this is
just a pictorial diagram that

547
00:26:59,600 --> 00:27:02,250
could be in any
cell biology paper,

548
00:27:02,250 --> 00:27:04,200
and make this calculable.

549
00:27:04,200 --> 00:27:09,040
So we could say what's
different from a primary normal

550
00:27:09,040 --> 00:27:11,470
hepatocyte liver cell
that's not cancerous?

551
00:27:11,470 --> 00:27:13,780
It might have a signaling logic.

552
00:27:13,780 --> 00:27:15,530
But if then we compare
the signaling logic

553
00:27:15,530 --> 00:27:20,030
to a liver tumor cell type, or
four different liver tumor cell

554
00:27:20,030 --> 00:27:21,815
types, what's different?

555
00:27:21,815 --> 00:27:25,270
If we can find some logic that's
different for the tumor cell

556
00:27:25,270 --> 00:27:29,490
lines versus the normal primary
lines-- some logic from here

557
00:27:29,490 --> 00:27:32,610
to there or to there-- that now
tells you biologically, where

558
00:27:32,610 --> 00:27:34,110
the differences
might be that have

559
00:27:34,110 --> 00:27:37,040
arisen from the
genetic mutations.

560
00:27:37,040 --> 00:27:38,870
And where good drug
targets might be,

561
00:27:38,870 --> 00:27:42,000
or predictions if
I intervene here,

562
00:27:42,000 --> 00:27:44,420
if there's no difference in
that logic, between normal

563
00:27:44,420 --> 00:27:46,632
and tumor, well then that
won't have any effect.

564
00:27:46,632 --> 00:27:48,090
I want to look for
the places where

565
00:27:48,090 --> 00:27:50,610
there is a difference
in the signaling logic.

566
00:27:50,610 --> 00:27:53,350
And that would be a
better drug target.

567
00:27:53,350 --> 00:27:55,810
OK, so the measurements
are made in

568
00:27:55,810 --> 00:28:01,860
across 17 of these different
signaling molecules

569
00:28:01,860 --> 00:28:05,020
here, pretty much
all by measurement

570
00:28:05,020 --> 00:28:06,870
of a phosphorylation state.

571
00:28:06,870 --> 00:28:09,280
So if you've done cell
biology or biochemistry--

572
00:28:09,280 --> 00:28:11,040
in these signaling
pathways, many

573
00:28:11,040 --> 00:28:13,890
of the activities in
these kinds of pathways

574
00:28:13,890 --> 00:28:16,680
that regulate this
kind of cell behavior

575
00:28:16,680 --> 00:28:19,680
are kinases that end up
affecting transcription factor

576
00:28:19,680 --> 00:28:21,250
activities and so forth.

577
00:28:21,250 --> 00:28:23,160
And it's the
phosphorylation state

578
00:28:23,160 --> 00:28:25,335
of any of these proteins
that matters.

579
00:28:25,335 --> 00:28:29,760
If a phosphate is on some
particular amino acid,

580
00:28:29,760 --> 00:28:31,340
the enzyme might be active.

581
00:28:31,340 --> 00:28:33,950
If it's not there it might
be inactive and so forth.

582
00:28:33,950 --> 00:28:36,075
So, just measurement of
phosphorylation states

583
00:28:36,075 --> 00:28:38,620
of 17 different proteins
in these pathways

584
00:28:38,620 --> 00:28:42,550
distributed across
multiple pathways.

585
00:28:42,550 --> 00:28:45,420
I've made these measurements on
five different cell types, four

586
00:28:45,420 --> 00:28:47,949
tumor cell types,
and the primaries

587
00:28:47,949 --> 00:28:50,240
in order to try to see what's
different between primary

588
00:28:50,240 --> 00:28:51,820
and tumor.

589
00:28:51,820 --> 00:28:55,970
And then what might be
different, patient to patient.

590
00:28:55,970 --> 00:28:59,500
In response to seven different
extracellular stimuli,

591
00:28:59,500 --> 00:29:01,340
some of them growth
factors, some of them

592
00:29:01,340 --> 00:29:06,850
cytokines, some of them actually
bacterial metabolic products.

593
00:29:06,850 --> 00:29:10,780
We all know about the effects
of microbiome these days.

594
00:29:10,780 --> 00:29:14,050
And, to further
populate a database that

595
00:29:14,050 --> 00:29:16,570
might be capable
of helping validate

596
00:29:16,570 --> 00:29:20,856
a model, a number of seven,
in fact-- intracellular

597
00:29:20,856 --> 00:29:21,355
inhibitors.

598
00:29:21,355 --> 00:29:23,990
A small molecule,
these things in black.

599
00:29:23,990 --> 00:29:25,680
One that might
inhibit this kinase.

600
00:29:25,680 --> 00:29:27,350
One might inhibit that kinase.

601
00:29:27,350 --> 00:29:29,420
One might inhibit that kinase.

602
00:29:29,420 --> 00:29:31,950
So now if you add all
those inhibitors too,

603
00:29:31,950 --> 00:29:34,310
then you start to change
the network activities

604
00:29:34,310 --> 00:29:38,160
and the downstream behavior.

605
00:29:38,160 --> 00:29:40,440
So that's how
extensive the data is.

606
00:29:40,440 --> 00:29:45,910
And this is actually for a
few different time points.

607
00:29:45,910 --> 00:29:48,190
So the data looks
something like this.

608
00:29:48,190 --> 00:29:50,230
Let's focus on the
one on the left.

609
00:29:50,230 --> 00:29:53,180
This is just the primary,
normal, human cells.

610
00:29:53,180 --> 00:29:56,940
It came from a liver donor.

611
00:29:56,940 --> 00:29:57,720
OK.

612
00:29:57,720 --> 00:30:01,890
Each row is one of the
17 different signals,

613
00:30:01,890 --> 00:30:04,650
essentially measurement
of the phosphorylation

614
00:30:04,650 --> 00:30:12,150
state of Akt or CREB
or P52 or STAT3.

615
00:30:12,150 --> 00:30:12,650
OK?

616
00:30:12,650 --> 00:30:14,270
So measurements of
its phosphorylation

617
00:30:14,270 --> 00:30:17,740
state that has something to do
with its signaling activity.

618
00:30:17,740 --> 00:30:20,620
Each of the big columns are the
seven different treatments--

619
00:30:20,620 --> 00:30:24,780
the different growth factors
and cytokines and so forth.

620
00:30:24,780 --> 00:30:25,730
And the control.

621
00:30:25,730 --> 00:30:27,580
No stimulation.

622
00:30:27,580 --> 00:30:31,950
And within each one
of these treatments,

623
00:30:31,950 --> 00:30:34,270
in each one of these
stimuli, then there's

624
00:30:34,270 --> 00:30:35,770
seven different
inhibitors that were

625
00:30:35,770 --> 00:30:38,010
used for the different pathways.

626
00:30:38,010 --> 00:30:43,970
So seven stimuli by seven
inhibitors plus controls.

627
00:30:43,970 --> 00:30:46,480
And then three
different time points.

628
00:30:46,480 --> 00:30:49,880
Sort of zero, 30
minutes, and three hours.

629
00:30:49,880 --> 00:30:52,520
So the data looks
something like this.

630
00:30:52,520 --> 00:30:56,230
If there's really no change,
due to the stimulation

631
00:30:56,230 --> 00:30:58,550
or the inhibitor, you'll
see something in gray.

632
00:30:58,550 --> 00:31:01,080
So in these gray bars,
there was already

633
00:31:01,080 --> 00:31:03,840
phosphorylation of this
transcription factor

634
00:31:03,840 --> 00:31:07,980
[INAUDIBLE] and it didn't really
change under most treatments.

635
00:31:07,980 --> 00:31:11,550
If it was yellow,
what it meant was,

636
00:31:11,550 --> 00:31:15,140
whatever the treatment was,
you got a quick activation

637
00:31:15,140 --> 00:31:19,080
of that signal and
then it went away.

638
00:31:19,080 --> 00:31:22,740
If it's late-- purple, then it
didn't happen in the first half

639
00:31:22,740 --> 00:31:25,500
hour, but it started to
show up a few hours later.

640
00:31:25,500 --> 00:31:28,310
And if it's green it showed
up in the first half hour

641
00:31:28,310 --> 00:31:29,450
and it stayed sustained.

642
00:31:29,450 --> 00:31:30,741
So that's what the color means.
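
A minimal sketch of how a three-point time course could be mapped to the four color classes just described. The threshold value and the function name `classify` are assumptions for illustration; the talk does not say how the cutoffs were chosen.

```python
COLORS = {
    "no change": "gray",
    "transient": "yellow",   # quick activation that then went away
    "late": "purple",        # absent at 30 minutes, present at 3 hours
    "sustained": "green",    # present at 30 minutes and still present at 3 hours
}


def classify(t0: float, t30: float, t180: float, threshold: float = 0.5) -> str:
    """Label a signal by when it rises above its time-zero level.

    `threshold` is a hypothetical cutoff on normalized signal change.
    """
    early = (t30 - t0) > threshold
    late = (t180 - t0) > threshold
    if early and late:
        return "sustained"
    if early:
        return "transient"
    if late:
        return "late"
    return "no change"


label = classify(0.1, 0.9, 0.1)
print(label, COLORS[label])
```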

643
00:31:30,741 --> 00:31:33,240
But this is the real
experimental data.

644
00:31:33,240 --> 00:31:36,720
And over here on the right is
one of the tumor cell lines.

645
00:31:36,720 --> 00:31:39,010
And you can just
see by inspection,

646
00:31:39,010 --> 00:31:40,300
it's different, right.

647
00:31:40,300 --> 00:31:42,730
The colors here are different
from the colors there.

648
00:31:42,730 --> 00:31:45,470
All the same treatments,
stimuli inhibitors.

649
00:31:45,470 --> 00:31:46,809
The colors are very different.

650
00:31:46,809 --> 00:31:48,850
You know, therefore that
the signaling activities

651
00:31:48,850 --> 00:31:50,080
are very different.

652
00:31:50,080 --> 00:31:50,580
OK.

653
00:31:50,580 --> 00:31:53,760
Just by visual inspection.

654
00:31:53,760 --> 00:31:54,260
OK.

655
00:31:54,260 --> 00:31:56,030
So what we're going to try
to do is build a logic model

656
00:31:56,030 --> 00:31:56,780
for this.

657
00:31:56,780 --> 00:31:58,580
A logic model for this.

658
00:31:58,580 --> 00:32:01,700
Compare them and say, oh where
are the key differences in how

659
00:32:01,700 --> 00:32:03,930
the signaling pathways
are getting activated?

660
00:32:03,930 --> 00:32:07,740
Downstream of the same stimuli.

661
00:32:07,740 --> 00:32:10,570
So, we start with
our prior knowledge.

662
00:32:10,570 --> 00:32:15,260
This is from the Ingenuity
database, which actually

663
00:32:15,260 --> 00:32:19,130
happened to be missing,
even basic information

664
00:32:19,130 --> 00:32:20,540
about insulin signaling.

665
00:32:20,540 --> 00:32:22,460
So we just added
our own information

666
00:32:22,460 --> 00:32:24,902
about what the
insulin receptor does.

667
00:32:24,902 --> 00:32:26,110
It's kind of hard to believe.

668
00:32:26,110 --> 00:32:27,901
This is a database that
cost a lot of money

669
00:32:27,901 --> 00:32:29,890
and they didn't have
really much information

670
00:32:29,890 --> 00:32:31,720
about insulin
receptor signaling.

671
00:32:31,720 --> 00:32:34,190
Very strange.

672
00:32:34,190 --> 00:32:38,330
So, downstream of
our seven stimuli,

673
00:32:38,330 --> 00:32:40,640
down to the transcription
factors of interest,

674
00:32:40,640 --> 00:32:44,390
there are about 82 molecular
nodes and a hundred some edges

675
00:32:44,390 --> 00:32:46,530
that you'd pull out of
the Ingenuity database.

676
00:32:46,530 --> 00:32:50,180
So here's our starting guess
at what this looks like.

677
00:32:50,180 --> 00:32:53,200
There's no logic in here, but
this is just, potentially,

678
00:32:53,200 --> 00:32:55,760
the things that the
logic might operate on,

679
00:32:55,760 --> 00:33:00,286
downstream of stimuli, and
when inhibited, and so forth.

680
00:33:00,286 --> 00:33:01,240
All right.

681
00:33:01,240 --> 00:33:02,500
So here's the process.

682
00:33:02,500 --> 00:33:04,900
This was the actual
algorithmic process

683
00:33:04,900 --> 00:33:07,730
that I'll walk you through.

684
00:33:07,730 --> 00:33:10,870
On the left-hand side
is the computer part.

685
00:33:10,870 --> 00:33:13,310
It said, OK, from the
Ingenuity database,

686
00:33:13,310 --> 00:33:16,620
we had this prior
knowledge about who

687
00:33:16,620 --> 00:33:20,110
was upstream, downstream,
who affected whom.

688
00:33:20,110 --> 00:33:22,810
We strip this down
some, because in terms

689
00:33:22,810 --> 00:33:25,610
of the measurements
on the perturbations,

690
00:33:25,610 --> 00:33:28,400
there are some of the nodes
that you just would not

691
00:33:28,400 --> 00:33:30,190
be able to see any
measurable difference.

692
00:33:30,190 --> 00:33:36,550
OK, there was no stimulus
upstream, or no perturbation.

693
00:33:36,550 --> 00:33:38,220
And it was not
measured so you really

694
00:33:38,220 --> 00:33:40,136
wouldn't be able to tell
if it changed or not.

695
00:33:40,136 --> 00:33:42,070
So you just take those out.

696
00:33:42,070 --> 00:33:45,070
Of everything remaining, now
you don't know the logic.

697
00:33:45,070 --> 00:33:46,210
You know the potential.

698
00:33:46,210 --> 00:33:48,730
And so you say, well, of all
the nodes and interactions

699
00:33:48,730 --> 00:33:52,320
remaining, I could have AND
gates, I could have OR gates,

700
00:33:52,320 --> 00:33:53,690
you could have NOTs.

701
00:33:53,690 --> 00:33:56,000
And you say, OK, in
principle, I could have,

702
00:33:56,000 --> 00:34:00,560
then, many, many, many,
many, many different logic

703
00:34:00,560 --> 00:34:06,050
models that could work.

704
00:34:06,050 --> 00:34:07,424
So how do I know which one?

705
00:34:07,424 --> 00:34:09,090
Well now you skip
over to the other side

706
00:34:09,090 --> 00:34:11,520
and say, well, but we have
all this experimental data.

707
00:34:11,520 --> 00:34:13,610
We have the data from
all the different stimuli

708
00:34:13,610 --> 00:34:17,500
and all the different inhibitors
for any given cell type.

709
00:34:17,500 --> 00:34:21,033
And so, we have that data under
all these different conditions.

710
00:34:21,033 --> 00:34:22,449
And what we're
going to do is just

711
00:34:22,449 --> 00:34:28,270
run hundreds or thousands of
these potentially appropriate

712
00:34:28,270 --> 00:34:29,980
models.

713
00:34:29,980 --> 00:34:33,320
Compare them to the data of
whether any given node is

714
00:34:33,320 --> 00:34:36,060
activated or not activated under treatment conditions,
under treatment conditions,

715
00:34:36,060 --> 00:34:38,030
stimuli inhibitors.

716
00:34:38,030 --> 00:34:39,920
And we'll calculate the error.

717
00:34:39,920 --> 00:34:41,800
How good was any
one of those models

718
00:34:41,800 --> 00:34:44,460
at actually matching those data?

719
00:34:44,460 --> 00:34:46,396
Simple as that.

720
00:34:46,396 --> 00:34:47,770
And then it's a
matter of finding

721
00:34:47,770 --> 00:34:50,530
what are the best fit ones.
From the best fit ones,

722
00:34:50,530 --> 00:34:52,860
could you improve them and
make them fit even better?

723
00:34:52,860 --> 00:34:57,350
And in the end, how did you go
from an initial prior knowledge

724
00:34:57,350 --> 00:35:02,200
scaffold to something that, in
fact, fit the data really well,

725
00:35:02,200 --> 00:35:04,710
from which you could
make new predictions.

726
00:35:04,710 --> 00:35:05,210
OK.

727
00:35:05,210 --> 00:35:09,370
So you get the approach here?

728
00:35:09,370 --> 00:35:12,460
All right, good.

729
00:35:12,460 --> 00:35:16,320
Now, in terms of figuring
out how well any given

730
00:35:16,320 --> 00:35:20,630
model matches the data and how
to go through model selection,

731
00:35:20,630 --> 00:35:24,059
there's a myriad of
different approaches to this.

732
00:35:24,059 --> 00:35:25,600
And I'm not claiming
that what we did

733
00:35:25,600 --> 00:35:28,730
was the absolute best approach.

734
00:35:28,730 --> 00:35:30,850
There's alternatives to
it that one could consider

735
00:35:30,850 --> 00:35:33,560
and then perhaps could
work even better.

736
00:35:33,560 --> 00:35:35,660
If you read the paper,
you'll read the reasons

737
00:35:35,660 --> 00:35:37,215
for these choices.

738
00:35:37,215 --> 00:35:38,400
OK.

739
00:35:38,400 --> 00:35:40,450
So I'll let you do that.

740
00:35:40,450 --> 00:35:44,520
The way the model
quality was calculated

741
00:35:44,520 --> 00:35:47,060
was to have an
objective function that

742
00:35:47,060 --> 00:35:50,820
said we want to minimize
some number, theta.

743
00:35:50,820 --> 00:35:52,590
And how do we calculate theta?

744
00:35:52,590 --> 00:35:56,100
Well, first of all, for
whatever that model is,

745
00:35:56,100 --> 00:35:59,250
we're going to fit-- whether
the model says some nodes

746
00:35:59,250 --> 00:36:02,690
should be on or
off, one or zero.

747
00:36:02,690 --> 00:36:06,800
And we're going to compare
it to the experimental data.

748
00:36:06,800 --> 00:36:10,220
Now the experimental
data, I need to emphasize,

749
00:36:10,220 --> 00:36:12,850
isn't one or zero,
it's normalized

750
00:36:12,850 --> 00:36:14,160
to go between one and zero.

751
00:36:14,160 --> 00:36:19,050
But the actual measurement
might be 0.7 or 0.25.

752
00:36:19,050 --> 00:36:23,135
OK, so you're going to have
error against the Boolean model

753
00:36:23,135 --> 00:36:25,260
even if all the edges are
absolutely correct, you're

754
00:36:25,260 --> 00:36:29,500
still going to get some
quantitative error.

755
00:36:29,500 --> 00:36:30,920
So you calculate that.

756
00:36:30,920 --> 00:36:32,550
The Boolean model
says zero or one.

757
00:36:32,550 --> 00:36:35,710
The experimental
data says 0.25, 0.7.

758
00:36:35,710 --> 00:36:39,680
And you say, OK,
I'll calculate that.

759
00:36:39,680 --> 00:36:43,600
But then you might
think, all right, well,

760
00:36:43,600 --> 00:36:46,280
somehow I've got to penalize
bigger models with more

761
00:36:46,280 --> 00:36:49,160
nodes and more edges because
surely the more nodes and edges

762
00:36:49,160 --> 00:36:53,390
I put in, I could
capture more of the data.

763
00:36:53,390 --> 00:36:55,770
And I don't want to make
the model infinitely large

764
00:36:55,770 --> 00:36:57,240
just to get the best fit.

765
00:36:57,240 --> 00:37:00,260
So I need to penalize that.

766
00:37:00,260 --> 00:37:04,760
Turns out it's not true, but
nonetheless it's worth doing.

767
00:37:04,760 --> 00:37:08,180
So, you take a parameter
that's the size of the model.

768
00:37:08,180 --> 00:37:10,020
It's basically just
the number of nodes.

769
00:37:10,020 --> 00:37:12,350
The more nodes in
it, the more you

770
00:37:12,350 --> 00:37:14,540
would be suspicious of
the model for just fitting

771
00:37:14,540 --> 00:37:17,480
because it has too
many components.

772
00:37:17,480 --> 00:37:23,400
And you multiply that size by
a penalty parameter, alpha.

773
00:37:23,400 --> 00:37:25,450
So you have a bad
objective function

774
00:37:25,450 --> 00:37:27,480
if there's a lot of
error with the data,

775
00:37:27,480 --> 00:37:30,020
or if your model's too big.

776
00:37:30,020 --> 00:37:35,060
A better model would be, better
fit to the data and smaller.

777
00:37:35,060 --> 00:37:38,560
That's the calculation.

778
00:37:38,560 --> 00:37:39,090
OK.

779
00:37:39,090 --> 00:37:42,250
And in the end-- and I'm going
to show you how we did this.

780
00:37:45,190 --> 00:37:47,620
And I think the field is
now really believing this.

781
00:37:47,620 --> 00:37:51,170
That what you're not after
is a single best fit model.

782
00:37:51,170 --> 00:37:55,810
That one single model that gives
you the very smallest theta.

783
00:37:55,810 --> 00:37:58,250
Because honestly,
within the uncertainty

784
00:37:58,250 --> 00:38:02,710
of the experimental
data-- OK, there's

785
00:38:02,710 --> 00:38:04,480
a substantial number
of models that

786
00:38:04,480 --> 00:38:09,080
could fit the data
within that noise.

787
00:38:09,080 --> 00:38:11,210
So if you demanded
the single best one,

788
00:38:11,210 --> 00:38:13,940
you say, well, but
these other 50 actually

789
00:38:13,940 --> 00:38:16,759
fit it almost as good and within
the uncertainty of the data.

790
00:38:16,759 --> 00:38:18,050
How can you really reject them?

791
00:38:18,050 --> 00:38:18,990
And you can't.

792
00:38:18,990 --> 00:38:23,090
So in the end, what's
being striven for in most

793
00:38:23,090 --> 00:38:25,522
of the field is a
family of models.

794
00:38:25,522 --> 00:38:26,980
And then you see
what the consensus

795
00:38:26,980 --> 00:38:31,670
is and the differences
within that family.
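
A minimal sketch of picking a family of models rather than a single winner, assuming each candidate is a set of edges with an objective value attached. The tolerance, the edge labels, and the function names are hypothetical.

```python
from collections import Counter


def model_family(scored_models, tolerance):
    """Keep every model whose theta is within `tolerance` of the best theta."""
    best = min(theta for _, theta in scored_models)
    return [edges for edges, theta in scored_models if theta <= best + tolerance]


def edge_frequency(family):
    """Fraction of family members that contain each edge."""
    counts = Counter()
    for edges in family:
        counts.update(edges)
    return {edge: n / len(family) for edge, n in counts.items()}


# Hypothetical candidates: (edge set, theta).
scored = [
    (frozenset({"a->e", "b->e"}), 0.10),
    (frozenset({"a->e", "c->f"}), 0.11),
    (frozenset({"c->f"}), 0.30),
]
family = model_family(scored, tolerance=0.02)
freq = edge_frequency(family)
print(freq)
```

Edges with frequency 1.0 are the consensus across the family; edges with lower frequency are where the well-fitting models disagree.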

796
00:38:31,670 --> 00:38:35,180
The particular algorithm
for generating and running

797
00:38:35,180 --> 00:38:38,240
through different
potential models--

798
00:38:38,240 --> 00:38:39,720
because you just
can't exhaustively

799
00:38:39,720 --> 00:38:40,630
sample all of them.

800
00:38:40,630 --> 00:38:45,520
OK, these networks are so large,
that you can't exhaustively

801
00:38:45,520 --> 00:38:48,640
test all possibilities of
all their logic and so forth.

802
00:38:48,640 --> 00:38:50,720
It's really prohibitive.

803
00:38:50,720 --> 00:38:52,989
So there's many different
ways you can go about it.

804
00:38:52,989 --> 00:38:54,780
This particular method
maybe you've already

805
00:38:54,780 --> 00:38:57,780
learned this in class
for other applications

806
00:38:57,780 --> 00:38:59,560
is a genetic algorithm.

807
00:38:59,560 --> 00:39:01,300
So you start with
some population.

808
00:39:01,300 --> 00:39:03,350
You start with your
Ingenuity scaffold

809
00:39:03,350 --> 00:39:07,790
and then you randomly remove
or take edges and things

810
00:39:07,790 --> 00:39:08,300
like that.

811
00:39:08,300 --> 00:39:09,841
So that if you've
got a whole family,

812
00:39:09,841 --> 00:39:11,870
that's slightly different.

813
00:39:11,870 --> 00:39:14,810
For each one of them you
evaluate the objective function

814
00:39:14,810 --> 00:39:16,690
against the data.

815
00:39:16,690 --> 00:39:20,900
And you get some of those that
then are the most attractive.

816
00:39:20,900 --> 00:39:22,910
They seem to be the best fit.

817
00:39:22,910 --> 00:39:26,580
But, by no means would you
imagine they are yet optimal.

818
00:39:26,580 --> 00:39:30,530
So, now you create a next
generation from this population

819
00:39:30,530 --> 00:39:34,907
by the analog of genetics.

820
00:39:34,907 --> 00:39:36,740
Some of the very best--
you say, OK, they're

821
00:39:36,740 --> 00:39:39,800
going to survive so I'm just
going to take them as is.

822
00:39:39,800 --> 00:39:41,640
Some I'm going to
mutate, I'm going

823
00:39:41,640 --> 00:39:44,830
to have a probability of
mutating an edge here or there.

824
00:39:44,830 --> 00:39:46,970
You can have crossover,
actually mating

825
00:39:46,970 --> 00:39:49,290
between one model
and another model,

826
00:39:49,290 --> 00:39:50,890
so that the daughter
model gets some

827
00:39:50,890 --> 00:39:53,140
of the arcs from the mother
model and some of the arcs

828
00:39:53,140 --> 00:39:54,580
from the father model.

829
00:39:54,580 --> 00:39:57,920
So you just generate a
next population, do it again.

830
00:39:57,920 --> 00:40:00,610
And once you've
reached a set of models

831
00:40:00,610 --> 00:40:04,120
that fit your data
within the criteria

832
00:40:04,120 --> 00:40:07,490
that you want, then you say,
this is now my population.

833
00:40:07,490 --> 00:40:09,130
And these are now
my best-fit models.
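
A minimal sketch of the loop described above, assuming each model is a 0/1 mask over the scaffold's edges (1 means the edge is kept). The function `evolve`, its parameters, and the toy scoring function are assumptions for illustration, not the settings used in the study. A fixed number of generations stands in for the stopping criterion.

```python
import random


def evolve(score, n_edges, pop_size=30, n_elite=2, p_mutate=0.05,
           generations=50, seed=0):
    """Genetic search over edge masks; lower score is better."""
    rng = random.Random(seed)
    # Initial population: the scaffold with edges randomly removed.
    population = [tuple(rng.randint(0, 1) for _ in range(n_edges))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        elite = population[:n_elite]            # elite survival, taken as is
        parents = population[:pop_size // 2]    # better half may reproduce
        children = []
        while len(children) < pop_size - n_elite:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_edges)     # one-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each edge with a small mutation probability.
            child = tuple(bit ^ 1 if rng.random() < p_mutate else bit
                          for bit in child)
            children.append(child)
        population = elite + children
    population.sort(key=score)
    return population


# Toy stand-in for the data fit: mismatches against a hidden edge mask.
target = (1, 0, 1, 1, 0, 0, 1, 0)


def toy_score(mask):
    return sum(x != y for x, y in zip(mask, target))


final_population = evolve(toy_score, n_edges=len(target))
print(final_population[0], toy_score(final_population[0]))
```

Because the elite are carried over unchanged, the best score can never get worse from one generation to the next, but nothing here guarantees a global minimum.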

834
00:40:09,130 --> 00:40:11,849
So it's not exhaustive.

835
00:40:11,849 --> 00:40:13,640
You can definitely find
local minimum here.

836
00:40:13,640 --> 00:40:14,931
There's no question about that.

837
00:40:14,931 --> 00:40:15,608
Yeah?

838
00:40:15,608 --> 00:40:22,082
AUDIENCE: Do you always take the
best model into the next round?

839
00:40:22,082 --> 00:40:23,078
Or do you--

840
00:40:23,078 --> 00:40:25,474
DOUG LAUFFENBURGER: Yeah,
that's the elite survival.

841
00:40:25,474 --> 00:40:26,890
If you don't
incorporate that, you

842
00:40:26,890 --> 00:40:30,075
might lose the best
ones in any given round.

843
00:40:30,075 --> 00:40:33,940
But this ensures you
take the best subset.

844
00:40:33,940 --> 00:40:34,980
Let them go for it.

845
00:40:34,980 --> 00:40:36,438
AUDIENCE: Is there
a worry that you

846
00:40:36,438 --> 00:40:38,141
might get stuck in [INAUDIBLE]?

847
00:40:38,141 --> 00:40:39,140
DOUG LAUFFENBURGER: Yes.

848
00:40:39,140 --> 00:40:41,750
Yes, absolutely.

849
00:40:41,750 --> 00:40:46,990
So now you run this with a
number of different starting

850
00:40:46,990 --> 00:40:48,120
populations.

851
00:40:48,120 --> 00:40:52,010
And you see if you get to
similar consensus models.

852
00:40:52,010 --> 00:40:54,030
Yeah, because
absolutely, this does not

853
00:40:54,030 --> 00:40:56,240
guarantee any kind
of a global minimum.

854
00:40:56,240 --> 00:40:58,260
You will always get local.

855
00:40:58,260 --> 00:41:01,622
So you have to condition
it on a different set

856
00:41:01,622 --> 00:41:02,580
of initial populations.

857
00:41:06,660 --> 00:41:07,160
OK.

858
00:41:07,160 --> 00:41:10,730
Once you do this-- I'm going
to show you some results

859
00:41:10,730 --> 00:41:15,810
first and then dig into some
other ways to think about it.

860
00:41:15,810 --> 00:41:17,210
So it's plotted here.

861
00:41:17,210 --> 00:41:20,860
This is one of the
tumor cell lines.

862
00:41:20,860 --> 00:41:24,160
What's plotted here, is again,
all the rows are all the signals

863
00:41:24,160 --> 00:41:25,310
that were measured.

864
00:41:25,310 --> 00:41:28,550
All the big columns are
all the different stimuli,

865
00:41:28,550 --> 00:41:32,910
and all the little columns
are the different inhibitors.

866
00:41:32,910 --> 00:41:36,920
And I should point out, this
is only for the 30 minute data.

867
00:41:36,920 --> 00:41:37,420
OK.

868
00:41:37,420 --> 00:41:39,840
This isn't for the
three hour or both,

869
00:41:39,840 --> 00:41:41,780
this is just the 30 minute data.

870
00:41:41,780 --> 00:41:45,450
And basically where
there's green,

871
00:41:45,450 --> 00:41:53,530
the model and data
fit was considered OK.

872
00:41:53,530 --> 00:41:55,820
Where it's red, it's not OK.

873
00:41:55,820 --> 00:41:59,240
Where it's pink it's less bad.

874
00:41:59,240 --> 00:42:01,010
So by the shaded.

875
00:42:01,010 --> 00:42:03,950
And the yellow
actually, the model

876
00:42:03,950 --> 00:42:05,897
really couldn't
make a prediction.

877
00:42:05,897 --> 00:42:07,980
Now, why that's the case
is what's showing up here

878
00:42:07,980 --> 00:42:10,300
is just the initial
Ingenuity scaffold.

879
00:42:10,300 --> 00:42:14,200
The very best one that
didn't add or remove any

880
00:42:14,200 --> 00:42:17,165
arcs or nodes from the
Ingenuity prior knowledge.

881
00:42:17,165 --> 00:42:18,790
It's that all we're
going to do is just

882
00:42:18,790 --> 00:42:23,130
run the best fit Boolean
logic model we can on that.

883
00:42:23,130 --> 00:42:24,130
And it wasn't very good.

884
00:42:24,130 --> 00:42:27,110
It was about 45% error.

885
00:42:27,110 --> 00:42:30,020
Almost half of the
nodes it got wrong.

886
00:42:30,020 --> 00:42:32,500
So what that tells you if you
just take a scaffold from one

887
00:42:32,500 --> 00:42:37,890
of these interactome databases
and without adulterating it,

888
00:42:37,890 --> 00:42:41,190
just fit the best logic
model to some data--

889
00:42:41,190 --> 00:42:43,850
OK, at least in this case, and
we've done a number of others,

890
00:42:43,850 --> 00:42:46,770
it actually doesn't
fit very well.

891
00:42:46,770 --> 00:42:49,040
And the reasons being,
you're trying to fit this now

892
00:42:49,040 --> 00:42:52,670
to a very specific
biological context.

893
00:42:52,670 --> 00:42:54,870
Hepatocyte tumor
cells under these

894
00:42:54,870 --> 00:42:57,090
growth factor and
cytokine treatments.

895
00:42:57,090 --> 00:42:58,990
That network is
likely very different

896
00:42:58,990 --> 00:43:01,850
from whatever aggregate you
got from literature curation

897
00:43:01,850 --> 00:43:03,510
and so forth in a database.

898
00:43:03,510 --> 00:43:05,980
There's going to be a lot of
stuff in the database that's

899
00:43:05,980 --> 00:43:08,580
not applicable, because it
came from a different cell

900
00:43:08,580 --> 00:43:10,950
type, a different
condition, or there just

901
00:43:10,950 --> 00:43:15,210
wasn't enough experiments in
the literature for hepatocytes.

902
00:43:15,210 --> 00:43:17,160
Maybe it was never
measured under treatment

903
00:43:17,160 --> 00:43:18,840
with interferon gamma.

904
00:43:18,840 --> 00:43:21,750
So there's data here
that the database never

905
00:43:21,750 --> 00:43:23,960
had access to literature
that it had explored.

906
00:43:23,960 --> 00:43:26,950
So lots of reasons.

907
00:43:26,950 --> 00:43:29,460
Now when you go through the
processes we just talked about,

908
00:43:29,460 --> 00:43:33,020
and in the end,
the best fit models

909
00:43:33,020 --> 00:43:35,370
give you something like
less than 10% error.

910
00:43:35,370 --> 00:43:39,079
So less than 10% of these
squares are red or pink.

911
00:43:39,079 --> 00:43:40,620
OK, so that's the
kind of improvement

912
00:43:40,620 --> 00:43:44,420
that you can take by
generating an improved model.

913
00:43:44,420 --> 00:43:48,050
By adding and subtracting
arcs and nodes.

914
00:43:51,340 --> 00:43:54,820
So this is what the model looks
like in the end for this tumor

915
00:43:54,820 --> 00:43:55,460
cell line.

916
00:43:55,460 --> 00:44:02,240
And this is a consensus model
from the 20 or so best fit.

917
00:44:02,240 --> 00:44:05,470
And so the thickness
of a line is

918
00:44:05,470 --> 00:44:08,050
how strong the consensus was.

919
00:44:08,050 --> 00:44:11,200
The strongest would
be all 20 had it.

920
00:44:11,200 --> 00:44:16,470
And the point here being,
you see some purple.

921
00:44:16,470 --> 00:44:18,810
And I wish my pen wasn't
fading in and out.

922
00:44:18,810 --> 00:44:23,090
If anybody has a pointer
I'll be happy to have it.

923
00:44:23,090 --> 00:44:26,450
Where you see purple,
those were arcs that

924
00:44:26,450 --> 00:44:28,240
weren't in the
Ingenuity database

925
00:44:28,240 --> 00:44:34,166
and had to be put in to get
the data to fit this well.

926
00:44:34,166 --> 00:44:35,540
And it turns out,
if you actually

927
00:44:35,540 --> 00:44:40,110
go back to the literature, you
find that those purple arcs

928
00:44:40,110 --> 00:44:41,940
were already described
in the literature.

929
00:44:41,940 --> 00:44:44,190
It's just that they weren't
captured in that database.

930
00:44:48,850 --> 00:44:51,470
Well that's green and purple.

931
00:44:51,470 --> 00:44:56,820
Then you see some
blue and they were

932
00:44:56,820 --> 00:44:58,880
in some of the other
tumor cell types

933
00:44:58,880 --> 00:45:00,740
but not in this
particular HepG2.

934
00:45:00,740 --> 00:45:04,530
But you can generate a
model that works very well.

935
00:45:04,530 --> 00:45:08,990
And see that it's consistent
with much of literature.

936
00:45:08,990 --> 00:45:11,880
It's more stripped down
than what's in the databases.

937
00:45:11,880 --> 00:45:13,920
And there's some new
things in it, that in fact,

938
00:45:13,920 --> 00:45:15,836
if you go back to the
literature you can find,

939
00:45:15,836 --> 00:45:18,521
because they just weren't
captured in the database.

940
00:45:18,521 --> 00:45:19,060
All right.

941
00:45:19,060 --> 00:45:22,570
A few insights
about the analysis.

942
00:45:22,570 --> 00:45:26,150
So I want to show you, here
is the objective function.

943
00:45:26,150 --> 00:45:30,100
How well the model
fit and that's in red.

944
00:45:30,100 --> 00:45:30,810
OK.

945
00:45:30,810 --> 00:45:35,690
And in blue is the actual
fit to the experimental data.

946
00:45:35,690 --> 00:45:37,770
And again, the higher
it is the worse it is.

947
00:45:37,770 --> 00:45:40,320
And the green gives you
essentially the size.

948
00:45:40,320 --> 00:45:43,730
And this is plotted
against the size penalty.

949
00:45:43,730 --> 00:45:46,600
And what's very interesting,
is even for very small size

950
00:45:46,600 --> 00:45:54,090
penalties, almost negligible,
that the size of the model that

951
00:45:54,090 --> 00:45:58,270
turns out to be best fit
is substantially smaller

952
00:45:58,270 --> 00:46:00,186
than what was in the database.

953
00:46:00,186 --> 00:46:03,575
OK, so you actually generate
a small model immediately.

954
00:46:03,575 --> 00:46:07,770
A smaller model immediately,
even without any size penalty.

955
00:46:07,770 --> 00:46:10,270
So your intuition
that a bigger model

956
00:46:10,270 --> 00:46:13,820
was going to be better actually
turns out to be incorrect.

957
00:46:13,820 --> 00:46:17,290
That even without a size
penalty, the model strips down.

958
00:46:17,290 --> 00:46:20,400
And why is that?

959
00:46:20,400 --> 00:46:20,980
Why is that?

960
00:46:20,980 --> 00:46:23,120
Let me make that
question number two.

961
00:46:23,120 --> 00:46:26,050
Just to see who's still awake.

962
00:46:26,050 --> 00:46:30,910
Why, in fitting this
hepatocyte data,

963
00:46:30,910 --> 00:46:34,100
would a model that leaves out
a lot of stuff in the Ingenuity

964
00:46:34,100 --> 00:46:38,780
database that's presumably
going on actually

965
00:46:38,780 --> 00:46:40,582
fit the data better?

966
00:46:40,582 --> 00:46:42,636
A smaller model fits better?

967
00:46:42,636 --> 00:46:46,110
Why is that?

968
00:46:46,110 --> 00:46:48,430
Yeah.

969
00:46:48,430 --> 00:46:50,020
AUDIENCE: This is a [INAUDIBLE].

970
00:46:50,020 --> 00:46:53,114
Maybe the strength
of the interactions

971
00:46:53,114 --> 00:46:54,960
aren't really taken
into account here?

972
00:46:54,960 --> 00:46:59,830
And so removing things
out, essentially means

973
00:46:59,830 --> 00:47:03,180
that you're not sealing
everything in the [INAUDIBLE].

974
00:47:03,180 --> 00:47:05,129
You're just taking one.

975
00:47:05,129 --> 00:47:06,170
DOUG LAUFFENBURGER: Yeah.

976
00:47:06,170 --> 00:47:07,260
That's essentially it.

977
00:47:07,260 --> 00:47:11,550
I think you've cast it in
almost quantitative terms,

978
00:47:11,550 --> 00:47:15,190
but I think it's true
even in qualitative terms.

979
00:47:15,190 --> 00:47:18,490
And one way to think
about it is-- let's

980
00:47:18,490 --> 00:47:21,090
say I have an extra
arc or extra node.

981
00:47:21,090 --> 00:47:22,180
OK.

982
00:47:22,180 --> 00:47:24,480
I might capture some
more true positives.

983
00:47:24,480 --> 00:47:26,910
I might actually
capture more of my data,

984
00:47:26,910 --> 00:47:32,590
but I could actually now gain
more conflicts with my data.

985
00:47:32,590 --> 00:47:35,840
Because now I've put
in logic that, yes,

986
00:47:35,840 --> 00:47:38,370
it captures this
measurement, but now maybe it

987
00:47:38,370 --> 00:47:41,180
messes up these other two
or three measurements.

988
00:47:41,180 --> 00:47:43,020
So you actually can
make your model worse

989
00:47:43,020 --> 00:47:47,250
trying to capture some small
piece, that in fact, adversely

990
00:47:47,250 --> 00:47:51,020
influences the effects on
the other measurements.

991
00:47:51,020 --> 00:47:54,150
So you get false
positives, false negatives,

992
00:47:54,150 --> 00:47:56,110
along with anything that's true.
and that's true.

993
00:47:56,110 --> 00:48:00,090
And it just so happens that
in these kind of situations

994
00:48:00,090 --> 00:48:01,910
those can outweigh.

995
00:48:01,910 --> 00:48:04,810
Then of course, as you
increase the size penalty

996
00:48:04,810 --> 00:48:08,770
you can drive your model to
be even smaller, fewer arcs,

997
00:48:08,770 --> 00:48:11,630
and now that of course
does come at the expense

998
00:48:11,630 --> 00:48:13,340
of not fitting the data as well.

999
00:48:13,340 --> 00:48:14,670
OK.

1000
00:48:14,670 --> 00:48:18,130
So where we decided that
the size penalty best lived

1001
00:48:18,130 --> 00:48:22,930
was where it was large enough
to ensure stripping down

1002
00:48:22,930 --> 00:48:26,040
of nonessential nodes and
arcs, but not large enough

1003
00:48:26,040 --> 00:48:29,100
to start compromising the
actual experimental fit.

1004
00:48:29,100 --> 00:48:29,600
OK.

1005
00:48:29,600 --> 00:48:31,308
And so that lived
someplace around there.

1006
00:48:34,890 --> 00:48:35,390
OK.

1007
00:48:35,390 --> 00:48:37,014
An important thing--
and this goes back

1008
00:48:37,014 --> 00:48:38,600
to the consensus model.

1009
00:48:38,600 --> 00:48:40,700
If you think about, quote,
model identification,

1010
00:48:40,700 --> 00:48:45,560
can you uniquely specify
one model, a best-fit model?

1011
00:48:45,560 --> 00:48:46,450
You really can't.

1012
00:48:46,450 --> 00:48:48,590
What's plotted here is
for any of the arcs that

1013
00:48:48,590 --> 00:48:50,892
would end up in a model.

1014
00:48:50,892 --> 00:48:53,100
Let's say we
numbered them from one to I

1015
00:48:53,100 --> 00:48:55,980
think it was 113
in the first place.

1016
00:48:55,980 --> 00:48:58,040
One arc, another arc,
another arc, another arc.

1017
00:48:58,040 --> 00:48:59,970
And you say, how
frequently did they

1018
00:48:59,970 --> 00:49:03,880
end up in the best fit models?

1019
00:49:03,880 --> 00:49:07,190
Basically, only a small
proportion of them

1020
00:49:07,190 --> 00:49:08,870
were in all the best fit models.

1021
00:49:08,870 --> 00:49:13,610
Some of them were in
some models and some not.

1022
00:49:13,610 --> 00:49:17,650
Of course the higher
the tolerance,

1023
00:49:17,650 --> 00:49:20,310
the more error you
allowed and now you

1024
00:49:20,310 --> 00:49:22,130
started to get
models that all fit

1025
00:49:22,130 --> 00:49:27,330
to within whatever that criteria
was in which most of their arcs

1026
00:49:27,330 --> 00:49:28,160
weren't the same.

1027
00:49:28,160 --> 00:49:30,326
You could have a lot of
different network structures

1028
00:49:30,326 --> 00:49:32,370
that give you that same fit.

1029
00:49:32,370 --> 00:49:37,080
If you require a very, very
tiny fit error,

1030
00:49:37,080 --> 00:49:40,830
something like this, then
more of the arcs in the models

1031
00:49:40,830 --> 00:49:42,600
have to be in common.

1032
00:49:42,600 --> 00:49:43,100
OK.

1033
00:49:43,100 --> 00:49:44,670
So that makes some sense.

1034
00:49:44,670 --> 00:49:48,220
But you can't really completely
identify a unique model.

1035
00:49:48,220 --> 00:49:50,800
That goes to what I said before.

1036
00:49:50,800 --> 00:49:51,300
OK.

1037
00:49:51,300 --> 00:49:53,217
I was talking before
about trade-offs

1038
00:49:53,217 --> 00:49:55,050
between false positives
and false negatives.

1039
00:49:55,050 --> 00:49:59,180
You must know, I'm sure from
previous things in this class,

1040
00:49:59,180 --> 00:50:01,870
the receiver operating
characteristic curves,

1041
00:50:01,870 --> 00:50:05,530
where for every one of your
model parameter choices,

1042
00:50:05,530 --> 00:50:07,200
you say, what are
my results in terms

1043
00:50:07,200 --> 00:50:10,390
of false positives
versus true positives?

1044
00:50:10,390 --> 00:50:14,620
And you're trying to
find the optimal location

1045
00:50:14,620 --> 00:50:19,750
along this type of path.

1046
00:50:19,750 --> 00:50:27,400
And so, what's shown here is
that the best predictive model,

1047
00:50:27,400 --> 00:50:32,160
in fact, is the one where
we have the size penalty

1048
00:50:32,160 --> 00:50:34,760
to be right on the edge of not
making the experimental data

1049
00:50:34,760 --> 00:50:38,060
fit worse, but still
strips out the most arcs.

1050
00:50:38,060 --> 00:50:41,670
So again, that demonstrates
that the smaller model actually

1051
00:50:41,670 --> 00:50:47,540
is in fact better, in terms
of finding this type of--

1052
00:50:47,540 --> 00:50:50,650
And this shows if we actually
put in some more arcs that

1053
00:50:50,650 --> 00:50:53,140
tried to capture some
more data, yes we decrease

1054
00:50:53,140 --> 00:50:55,680
the false negatives,
but, in fact, we

1055
00:50:55,680 --> 00:50:56,900
increase the false positives.

1056
00:50:56,900 --> 00:50:59,780
We actually shift
ourselves on this curve.

1057
00:50:59,780 --> 00:51:03,310
And so you decide whether
that's desirable or not.

1058
00:51:03,310 --> 00:51:04,620
Where you'd like to live.

1059
00:51:04,620 --> 00:51:08,940
So you can analyze what you
like about your best fit class

1060
00:51:08,940 --> 00:51:11,270
of models in this kind of way.

1061
00:51:14,550 --> 00:51:17,920
OK, so now we have some
confidence in this.

1062
00:51:17,920 --> 00:51:19,360
What are you going
to do with it?

1063
00:51:19,360 --> 00:51:22,950
And one thing I'd like to do is
just make a priori predictions.

1064
00:51:22,950 --> 00:51:26,040
Say I now believe that on
these hepatocytes or tumor

1065
00:51:26,040 --> 00:51:29,810
cells stimulated with
these kind of things,

1066
00:51:29,810 --> 00:51:32,870
I can calculate what the
experimental signaling

1067
00:51:32,870 --> 00:51:35,121
activities should be.

1068
00:51:35,121 --> 00:51:35,620
All right.

1069
00:51:35,620 --> 00:51:37,020
Let's see if we
do that a priori.

1070
00:51:37,020 --> 00:51:43,440
So let's now use new inhibitors
that hadn't been used before.

1071
00:51:43,440 --> 00:51:45,650
Combination of inhibitors,
especially in cancer.

1072
00:51:45,650 --> 00:51:47,960
People are always interested
in combinatorial drugs.

1073
00:51:47,960 --> 00:51:50,460
Experimentally it's
prohibitive to run

1074
00:51:50,460 --> 00:51:52,190
through all possible
combinations.

1075
00:51:52,190 --> 00:51:54,190
So this is one thing in
the pharmaceutical field

1076
00:51:54,190 --> 00:51:56,900
people believe these kind of
models are really useful for.

1077
00:51:56,900 --> 00:51:58,780
Let's try all possible
drug combinations

1078
00:51:58,780 --> 00:52:01,270
and see which ones
are most promising.

1079
00:52:01,270 --> 00:52:04,810
And instead of just one
ligand growth factor cytokine

1080
00:52:04,810 --> 00:52:07,660
at a time, do
different combinations.

1081
00:52:07,660 --> 00:52:10,270
So this is all an
entirely new data set.

1082
00:52:10,270 --> 00:52:12,830
So different treatments that
are different combinations,

1083
00:52:12,830 --> 00:52:16,110
different inhibitors, different
combinations of inhibitors.

1084
00:52:16,110 --> 00:52:18,760
And now you just run the model--
it's not trained on this.

1085
00:52:18,760 --> 00:52:20,520
It was trained on
the previous data.

1086
00:52:20,520 --> 00:52:23,300
And now a priori
predicts this data set.

1087
00:52:23,300 --> 00:52:26,760
And now, again, you look for
the model fit in the bottom.

1088
00:52:26,760 --> 00:52:30,660
And again, you want the smallest
number of red and pink boxes.

1089
00:52:30,660 --> 00:52:33,430
In effect it predicted to
within about 11% error.

1090
00:52:33,430 --> 00:52:38,360
About 11% of the boxes didn't
fit well, but 89% did.

1091
00:52:38,360 --> 00:52:41,050
And that's, in fact
pretty close to the 9%

1092
00:52:41,050 --> 00:52:43,430
that was on the
original training model.

1093
00:52:43,430 --> 00:52:47,140
So in terms of this, in
this realm of studies,

1094
00:52:47,140 --> 00:52:49,970
these a priori treatment
conditions-- drug combinations,

1095
00:52:49,970 --> 00:52:52,480
growth factors,
cytokine combinations--

1096
00:52:52,480 --> 00:52:56,120
this is a pretty good validation
that this model wasn't

1097
00:52:56,120 --> 00:52:58,330
just kind of trained and fit.

1098
00:52:58,330 --> 00:53:00,130
That it, in fact,
could predict then

1099
00:53:00,130 --> 00:53:02,316
what was happening
in these pathways.

1100
00:53:02,316 --> 00:53:04,190
And then of course, what
it allows you to do,

1101
00:53:04,190 --> 00:53:05,898
where all the red
boxes are-- it says, OK,

1102
00:53:05,898 --> 00:53:08,042
that's where we need
more intensive study.

1103
00:53:08,042 --> 00:53:10,000
Now maybe we go back to
the literature and say,

1104
00:53:10,000 --> 00:53:12,050
is there more known
about those nodes that

1105
00:53:12,050 --> 00:53:15,780
was captured in whatever
our interactome database

1106
00:53:15,780 --> 00:53:17,390
that we started with?

1107
00:53:17,390 --> 00:53:21,880
Maybe we need to supplement the
scaffold with more information.

1108
00:53:21,880 --> 00:53:24,730
That's out in the literature
where more and more dedicated

1109
00:53:24,730 --> 00:53:25,690
experiments are done.

1110
00:53:25,690 --> 00:53:29,190
So it narrows down where the
next set of investigations

1111
00:53:29,190 --> 00:53:32,030
need to be, whether from the
literature or from yourself.

1112
00:53:35,286 --> 00:53:35,785
OK.

1113
00:53:40,900 --> 00:53:43,240
So this is just then
some biological results.

1114
00:53:43,240 --> 00:53:48,820
If you do this for the four
different hepatocellular lines.

1115
00:53:48,820 --> 00:53:50,810
Some of the signaling
activities are the same,

1116
00:53:50,810 --> 00:53:51,768
and some are different.

1117
00:53:54,700 --> 00:53:58,400
I think I'll skip that.

1118
00:53:58,400 --> 00:53:59,750
All right, let me show this.

1119
00:53:59,750 --> 00:54:03,850
So this says, where are the
similarities and differences

1120
00:54:03,850 --> 00:54:07,220
between the normal hepatocytes
versus the tumor lines.

1121
00:54:07,220 --> 00:54:09,160
Because this is
where you would want

1122
00:54:09,160 --> 00:54:11,540
to get the ideas for where
the right drugs would be.

1123
00:54:11,540 --> 00:54:15,340
Where is the logic different,
between a normal liver

1124
00:54:15,340 --> 00:54:19,420
cell and one of these
transformed types.

1125
00:54:19,420 --> 00:54:22,220
So, this is the same
kind of scaffold.

1126
00:54:22,220 --> 00:54:24,210
It'll get us the
consensus models

1127
00:54:24,210 --> 00:54:26,820
and the thickness of
the line is how strong--

1128
00:54:26,820 --> 00:54:30,330
what proportion of the models
did that arc show up in?

1129
00:54:30,330 --> 00:54:31,840
Among the best.

1130
00:54:31,840 --> 00:54:37,760
If it's black, the arc was
in the primary hepatocytes

1131
00:54:37,760 --> 00:54:39,690
and all the cell lines.

1132
00:54:39,690 --> 00:54:44,380
So black is just sort
of consensus core.

1133
00:54:44,380 --> 00:54:45,970
This is just invariably there.

1134
00:54:49,020 --> 00:54:53,480
The blue was in the models
for the primary hepatocytes,

1135
00:54:53,480 --> 00:54:57,030
but for some reason didn't
exist in the tumor cell lines.

1136
00:54:57,030 --> 00:55:00,300
So there's signaling logic
that normal hepatocytes use,

1137
00:55:00,300 --> 00:55:03,870
that the tumor cell
lines have somehow lost.

1138
00:55:03,870 --> 00:55:07,977
Red, are arcs that weren't
in the primary cells,

1139
00:55:07,977 --> 00:55:09,560
but showed up in the
tumor cell lines.

1140
00:55:09,560 --> 00:55:12,160
So there's logic that the normal
liver cells apparently

1141
00:55:12,160 --> 00:55:16,912
didn't use, but now showed
up in the tumor cell lines.

1142
00:55:16,912 --> 00:55:18,620
And why would there
be these differences?

1143
00:55:18,620 --> 00:55:21,850
Well this is where it goes back
to then the genetic mutations

1144
00:55:21,850 --> 00:55:24,170
and variations.

1145
00:55:24,170 --> 00:55:27,880
Because going from a primary
to some tumor cell line,

1146
00:55:27,880 --> 00:55:32,540
there's enough of the genetic
mutations, that in this case

1147
00:55:32,540 --> 00:55:36,460
said, OK, I've got some genetic
mutation that interrupts

1148
00:55:36,460 --> 00:55:39,680
the link between MAP3
kinase and IKK.

1149
00:55:39,680 --> 00:55:41,700
There was some docking
protein or something

1150
00:55:41,700 --> 00:55:45,250
that's now missing, not
expressed as highly.

1151
00:55:45,250 --> 00:55:46,730
It's got a mutation
of amino acids

1152
00:55:46,730 --> 00:55:48,300
and no longer docks right.

1153
00:55:48,300 --> 00:55:50,506
It has a lower
enzymatic activity.

1154
00:55:50,506 --> 00:55:51,880
So now you can go
back and trace.

1155
00:55:51,880 --> 00:55:54,410
Can I find some
genetic mutation that

1156
00:55:54,410 --> 00:55:57,290
has to do with the
loss of that arc?

1157
00:55:57,290 --> 00:55:59,015
Or if I've got a red
arc that shows up--

1158
00:55:59,015 --> 00:56:00,640
like I said because
there was something

1159
00:56:00,640 --> 00:56:06,230
in my genetic mutations that
now adds an activity here

1160
00:56:06,230 --> 00:56:07,270
that wasn't there.

1161
00:56:07,270 --> 00:56:09,920
Maybe something is now
constitutively active.

1162
00:56:09,920 --> 00:56:12,360
Maybe something is just
expressed at a higher level.

1163
00:56:12,360 --> 00:56:15,179
And all of a sudden that
pathway comes into play.

1164
00:56:15,179 --> 00:56:16,220
So that's the cool thing.

1165
00:56:16,220 --> 00:56:19,270
You can trace what's actually
in the genetic mutations

1166
00:56:19,270 --> 00:56:21,400
if you have some
methodology for that,

1167
00:56:21,400 --> 00:56:24,510
to what's actually been
altered in the network logic.

1168
00:56:24,510 --> 00:56:26,576
Yeah?

1169
00:56:26,576 --> 00:56:28,950
AUDIENCE: Are the primary
lines considered healthy lines?

1170
00:56:28,950 --> 00:56:29,530
Or are they--

1171
00:56:29,530 --> 00:56:29,970
DOUG LAUFFENBURGER: Yes.

1172
00:56:29,970 --> 00:56:30,887
AUDIENCE: OK, so the--

1173
00:56:30,887 --> 00:56:31,928
DOUG LAUFFENBURGER: Yeah.

1174
00:56:31,928 --> 00:56:33,370
So they're from
donors but they're

1175
00:56:33,370 --> 00:56:40,120
mainly like motorcycle accident
donors that don't need their

1176
00:56:40,120 --> 00:56:43,370
liver anymore but
the liver was fine.

1177
00:56:43,370 --> 00:56:45,130
So, yeah, they're
from healthy donors.

1178
00:56:45,130 --> 00:56:45,810
AUDIENCE: [INAUDIBLE].

1179
00:56:45,810 --> 00:56:46,360
DOUG LAUFFENBURGER: Yeah.

1180
00:56:46,360 --> 00:56:46,860
Yeah.

1181
00:56:46,860 --> 00:56:49,530
It was the lines at some
point came from a tumor

1182
00:56:49,530 --> 00:56:51,591
and have been propagated
in a culture, yeah.

1183
00:56:56,310 --> 00:56:56,950
OK.

1184
00:56:56,950 --> 00:57:01,040
What do I want to-- got
a little bit more time.

1185
00:57:01,040 --> 00:57:01,740
Let me do this.

1186
00:57:01,740 --> 00:57:02,250
OK.

1187
00:57:02,250 --> 00:57:05,100
So here's another interesting
thing that can happen.

1188
00:57:05,100 --> 00:57:06,730
If you take these
models seriously,

1189
00:57:06,730 --> 00:57:10,050
it can tell you something
about the biochemistry, perhaps

1190
00:57:10,050 --> 00:57:10,910
of what's going on.

1191
00:57:16,440 --> 00:57:21,120
So see there's this dashed line
here that I want to emphasize

1192
00:57:21,120 --> 00:57:23,510
and we'll emphasize it
again on another slide.

1193
00:57:23,510 --> 00:57:24,950
That was one that
had to be added.

1194
00:57:24,950 --> 00:57:30,970
It just wasn't in the
Ingenuity pathway scaffold.

1195
00:57:30,970 --> 00:57:33,440
Actually couldn't find it
in any literature anywhere.

1196
00:57:33,440 --> 00:57:37,100
But nonetheless you needed
it to fit some data.

1197
00:57:37,100 --> 00:57:38,760
So we kind of kept
our eye on that one.

1198
00:57:38,760 --> 00:57:41,090
What the heck is going on here?

1199
00:57:41,090 --> 00:57:45,450
This dashed line from I kappa
kinase up to STAT3.

1200
00:57:45,450 --> 00:57:49,620
No evidence for that signaling
linkage in the literature

1201
00:57:49,620 --> 00:57:51,200
anywhere.

1202
00:57:51,200 --> 00:57:52,272
What could that tell you?

1203
00:57:56,150 --> 00:57:56,650
All right.

1204
00:57:56,650 --> 00:57:58,730
Well, you go back to the
data now and you say,

1205
00:57:58,730 --> 00:58:01,590
well what of the data set, of
the experimental measurements

1206
00:58:01,590 --> 00:58:04,400
that we made, caused
that arc to have

1207
00:58:04,400 --> 00:58:06,071
to be there to
fit the data well?

1208
00:58:06,071 --> 00:58:06,570
OK.

1209
00:58:06,570 --> 00:58:08,590
You can now ask that
kind of question.

1210
00:58:08,590 --> 00:58:11,437
Well remember I said in the
data set were inhibitors.

1211
00:58:11,437 --> 00:58:13,520
Some small molecule
inhibitors against this kinase

1212
00:58:13,520 --> 00:58:16,660
or that kinase or that kinase
that would perturb the network

1213
00:58:16,660 --> 00:58:19,005
and then give us relationships
that the logic model

1214
00:58:19,005 --> 00:58:21,820
had to account for.

1215
00:58:21,820 --> 00:58:24,810
Well, this one had
to be there, mainly

1216
00:58:24,810 --> 00:58:29,270
to account for data that came
from an inhibitor of IKK.

1217
00:58:29,270 --> 00:58:32,250
That one of the kinases
that we had a small molecule

1218
00:58:32,250 --> 00:58:36,000
inhibitor for,
inhibited this kinase.

1219
00:58:36,000 --> 00:58:37,690
And somehow there
turned out to be

1220
00:58:37,690 --> 00:58:40,740
an effect on STAT3
phosphorylation.

1221
00:58:40,740 --> 00:58:44,810
And so you needed
that arc to be there.

1222
00:58:44,810 --> 00:58:46,810
So the explanation is
that either there's,

1223
00:58:46,810 --> 00:58:49,800
in fact, some real
mechanism going on here.

1224
00:58:49,800 --> 00:58:51,360
It might have been
transcriptional

1225
00:58:51,360 --> 00:58:53,880
that somehow the
activity of this kinase

1226
00:58:53,880 --> 00:58:57,760
affects the levels of expression
and the responsiveness

1227
00:58:57,760 --> 00:58:59,260
of STAT3.

1228
00:58:59,260 --> 00:59:02,370
Or you say, ah, maybe it's
a problem with the drug?

1229
00:59:02,370 --> 00:59:04,074
It's a problem
with the inhibitor.

1230
00:59:04,074 --> 00:59:06,490
That, in fact, what you thought
was an inhibitor that just

1231
00:59:06,490 --> 00:59:09,320
affected this kinase, has
an off-target effect

1232
00:59:09,320 --> 00:59:10,980
on that other kinase.

1233
00:59:10,980 --> 00:59:12,880
And it's just an artifact.

1234
00:59:12,880 --> 00:59:14,820
That's an alternative
explanation.

1235
00:59:14,820 --> 00:59:16,820
Right, so that's the sort
of thing you can test.

1236
00:59:16,820 --> 00:59:19,420
And we did test it.

1237
00:59:19,420 --> 00:59:21,110
And here's the data here.

1238
00:59:21,110 --> 00:59:26,130
At the bottom is the kinase that
you wanted the inhibition to.

1239
00:59:26,130 --> 00:59:30,230
And in the blue was the
inhibitor that was actually

1240
00:59:30,230 --> 00:59:33,990
used in the study, both
in vivo and in vitro

1241
00:59:33,990 --> 00:59:37,080
and it inhibited that kinase.

1242
00:59:37,080 --> 00:59:39,930
But then we looked at the
potential off-target effect

1243
00:59:39,930 --> 00:59:43,550
on that other-- the
JAK2 STAT3

1244
00:59:43,550 --> 00:59:46,560
and it also did have
activity on that.

1245
00:59:46,560 --> 00:59:50,400
So it meant that
that inhibitor had

1246
00:59:50,400 --> 00:59:55,410
an effect, not just on the IKK,
but also on the JAK STAT

1247
00:59:55,410 --> 00:59:57,030
3.

1248
00:59:57,030 --> 01:00:01,380
And so that's why that
arc had to be there,

1249
01:00:01,380 --> 01:00:03,130
is because, in fact,
that inhibitor,

1250
01:00:03,130 --> 01:00:05,240
inhibited this kinase as well.

1251
01:00:05,240 --> 01:00:07,830
So if we took that into account
in terms of the algorithm,

1252
01:00:07,830 --> 01:00:09,980
then we wouldn't have to
have that arc because it

1253
01:00:09,980 --> 01:00:11,900
was spurious and came
from the artifact

1254
01:00:11,900 --> 01:00:13,090
of that inhibitor.

1255
01:00:13,090 --> 01:00:15,670
But the interesting thing
is that, by taking the model

1256
01:00:15,670 --> 01:00:18,500
seriously, we can
actually find that.

1257
01:00:18,500 --> 01:00:22,400
Because it was not previously
known that this inhibitor had

1258
01:00:22,400 --> 01:00:25,560
an off-target effect
on that kinase.

1259
01:00:25,560 --> 01:00:28,620
In effect, the interesting
thing, pharmacologically,

1260
01:00:28,620 --> 01:00:34,840
was that this
small molecule that

1261
01:00:34,840 --> 01:00:37,550
was aimed to be an inhibitor
against this kinase

1262
01:00:37,550 --> 01:00:42,180
was the best by far in treating
lung airway inflammation,

1263
01:00:42,180 --> 01:00:43,960
compared against
a whole other set

1264
01:00:43,960 --> 01:00:46,690
of other types of inhibitors
for the same kinase.

1265
01:00:46,690 --> 01:00:48,520
So now the reason
might be that it's

1266
01:00:48,520 --> 01:00:51,010
better because it's also
hitting this other kinase.

1267
01:00:51,010 --> 01:00:52,610
That this off-target
effect actually

1268
01:00:52,610 --> 01:00:55,920
is therapeutically
efficacious and in fact

1269
01:00:55,920 --> 01:00:58,830
a combination of drugs
against this kinase

1270
01:00:58,830 --> 01:01:01,270
and the other kinase
is what's required

1271
01:01:01,270 --> 01:01:03,600
for the therapeutic benefit.

1272
01:01:03,600 --> 01:01:05,750
So that's something
that could be explored.

1273
01:01:05,750 --> 01:01:07,920
And that's the sort of
thing this model leads to.

1274
01:01:10,770 --> 01:01:11,270
OK.

1275
01:01:14,096 --> 01:01:20,220
Let me end by digging into
this difference a little bit.

1276
01:01:20,220 --> 01:01:24,700
Because I said, you
see these differences

1277
01:01:24,700 --> 01:01:31,080
between primary hepatocytes
and the tumor cell lines.

1278
01:01:31,080 --> 01:01:34,290
And the model said, just
from examining the data sets,

1279
01:01:34,290 --> 01:01:36,175
that the logic is different.

1280
01:01:36,175 --> 01:01:36,820
OK.

1281
01:01:36,820 --> 01:01:40,280
Is there any
validation for that?

1282
01:01:40,280 --> 01:01:43,315
Well, so let's go back and
look at those differences

1283
01:01:43,315 --> 01:01:44,440
with respect to literature.

1284
01:01:44,440 --> 01:01:47,920
So if you just blow up
that part of the model,

1285
01:01:47,920 --> 01:01:50,830
there's eight edges
that are strongly

1286
01:01:50,830 --> 01:01:53,940
disparate between the primary,
normal cell types and the tumor

1287
01:01:53,940 --> 01:01:55,940
cells and they're
all enumerated here.

1288
01:01:55,940 --> 01:01:59,240
One, two, three, four,
five, six, seven, eight.

1289
01:01:59,240 --> 01:02:02,582
And they're essentially in
three different pathways.

1290
01:02:02,582 --> 01:02:04,040
So what the model
is telling you is

1291
01:02:04,040 --> 01:02:08,670
that there's three different
pathways that are substantially

1292
01:02:08,670 --> 01:02:13,468
different between a normal liver
cell and a liver tumor cell.

1293
01:02:13,468 --> 01:02:14,460
OK.

1294
01:02:14,460 --> 01:02:19,470
So is there any evidence
that this is really true?

1295
01:02:19,470 --> 01:02:20,750
So let's look at one.

1296
01:02:20,750 --> 01:02:23,165
On to this pathway that
I've got differences.

1297
01:02:23,165 --> 01:02:25,090
And you see blue
here and red here.

1298
01:02:29,020 --> 01:02:32,385
It says that this particular
signaling node in normal cells

1299
01:02:32,385 --> 01:02:34,650
is activated by this pathway.

1300
01:02:34,650 --> 01:02:36,430
In the tumors, that
regulation is lost

1301
01:02:36,430 --> 01:02:40,160
and that actually comes
through another pathway.

1302
01:02:40,160 --> 01:02:45,550
And it turns out this is
consistent with literature

1303
01:02:45,550 --> 01:02:47,910
that, in fact, in
the tumor cells,

1304
01:02:47,910 --> 01:02:52,350
you get a higher activity
of this downstream node.

1305
01:02:52,350 --> 01:02:53,940
And now I've lost
my light again.

1306
01:02:53,940 --> 01:02:56,470
This HSP27.

1307
01:02:56,470 --> 01:02:59,560
Even though it's
overexpressed, you

1308
01:02:59,560 --> 01:03:03,930
get less activation because
this pathway is less strongly

1309
01:03:03,930 --> 01:03:07,490
activated in red than
the blue pathway is.

1310
01:03:07,490 --> 01:03:08,930
So if you went by
gene expression,

1311
01:03:08,930 --> 01:03:10,500
you'd think in the
tumor cells, this

1312
01:03:10,500 --> 01:03:12,706
is a higher activated pathway.

1313
01:03:12,706 --> 01:03:14,080
Turns out the
logic is different,

1314
01:03:14,080 --> 01:03:15,870
and you actually get
less activation of it

1315
01:03:15,870 --> 01:03:17,745
because it's coming from
a different pathway.

1316
01:03:17,745 --> 01:03:21,740
So that turns out to be true
in the liver tumor literature.

1317
01:03:21,740 --> 01:03:24,530
Another one-- I find this
one really interesting.

1318
01:03:24,530 --> 01:03:29,110
That in normal liver cells,
to activate this IKK pathway--

1319
01:03:29,110 --> 01:03:30,910
that's a very important
kinase pathway,

1320
01:03:30,910 --> 01:03:33,660
governing the transcription
factor NF-kappa B.

1321
01:03:33,660 --> 01:03:37,210
In a primary cell, I
need this combined logic

1322
01:03:37,210 --> 01:03:39,455
between a pathway downstream
of insulin receptor

1323
01:03:39,455 --> 01:03:42,060
and a pathway downstream
of a cytokine.

1324
01:03:42,060 --> 01:03:46,520
Only if both of those pathways
are on, do I now turn this on.

1325
01:03:46,520 --> 01:03:48,890
In the tumor cells,
that check is lost.

1326
01:03:48,890 --> 01:03:51,690
Only one pathway is required.

1327
01:03:51,690 --> 01:03:52,190
OK.

1328
01:03:52,190 --> 01:03:53,700
If this one is
activated, I'm going

1329
01:03:53,700 --> 01:03:56,140
to get this transcription
factor activated.

1330
01:03:56,140 --> 01:03:58,820
I don't have to wait for
simultaneous activation

1331
01:03:58,820 --> 01:03:59,800
of this pathway.

1332
01:03:59,800 --> 01:04:02,650
Whereas a normal
says I have to.

1333
01:04:02,650 --> 01:04:03,470
OK.

1334
01:04:03,470 --> 01:04:06,220
That turns out to be true
that in the liver cells,

1335
01:04:06,220 --> 01:04:09,040
the progression is
associated with a looser

1336
01:04:09,040 --> 01:04:13,810
regulation of this
transcription factor.

1337
01:04:13,810 --> 01:04:15,130
And one more.

1338
01:04:15,130 --> 01:04:18,870
I won't go into too much
detail, but again, you

1339
01:04:18,870 --> 01:04:21,310
see reds and blues here.

1340
01:04:21,310 --> 01:04:23,280
In the tumor cell
lines, you've now

1341
01:04:23,280 --> 01:04:25,810
got activities
downstream of insulin.

1342
01:04:25,810 --> 01:04:27,770
That's normally just
a survival factor,

1343
01:04:27,770 --> 01:04:30,490
that's just not found
in the primary cells.

1344
01:04:30,490 --> 01:04:33,870
And that, in fact, is shown
in the literature too,

1345
01:04:33,870 --> 01:04:37,500
that insulin signaling
shifts from metabolism

1346
01:04:37,500 --> 01:04:38,785
to proliferation.

1347
01:04:38,785 --> 01:04:40,384
It's mainly a metabolic stimulus

1348
01:04:40,384 --> 01:04:42,800
in the normal cells; it turns
into a proliferative stimulus

1349
01:04:42,800 --> 01:04:44,500
in the tumor cells.

1350
01:04:44,500 --> 01:04:45,000
OK.

1351
01:04:45,000 --> 01:04:50,090
So, what this says is,
just by mapping this logic

1352
01:04:50,090 --> 01:04:53,670
scaffold, the scaffold
against empirical data,

1353
01:04:53,670 --> 01:04:55,390
developing a logic
model, you in fact

1354
01:04:55,390 --> 01:04:59,650
can find loci of differences
between the normal cell

1355
01:04:59,650 --> 01:05:02,060
signaling logic and tumor
cell signaling logic

1356
01:05:02,060 --> 01:05:05,200
for which there's evidence in
the literature, none of which

1357
01:05:05,200 --> 01:05:07,085
was in the original databases.

1358
01:05:09,726 --> 01:05:11,100
Finally, I'm just going

1359
01:05:11,100 --> 01:05:15,560
to say that it turns out in
another study, what you could

1360
01:05:15,560 --> 01:05:19,259
show is those three pathways
that the model predicts

1361
01:05:19,259 --> 01:05:21,050
are the differences
between the liver tumor

1362
01:05:21,050 --> 01:05:22,800
cells and the normal cells.

1363
01:05:22,800 --> 01:05:27,410
That in order to kill
these liver tumor cells,

1364
01:05:27,410 --> 01:05:30,460
you need inhibitors against all
three pathways simultaneously.

1365
01:05:30,460 --> 01:05:32,220
You actually need
combination drugs

1366
01:05:32,220 --> 01:05:35,760
of three different pathway
inhibitors to kill these cells.

1367
01:05:35,760 --> 01:05:37,730
And it's exactly
the three pathways

1368
01:05:37,730 --> 01:05:40,610
that the model predicted of the
differences between the normals

1369
01:05:40,610 --> 01:05:42,030
and the tumor cells.

1370
01:05:45,790 --> 01:05:46,600
OK.

1371
01:05:46,600 --> 01:05:48,700
All right, so I will
end here and then see

1372
01:05:48,700 --> 01:05:50,510
if there's any more questions.

1373
01:05:50,510 --> 01:05:53,380
Something that
comes up a lot is--

1374
01:05:53,380 --> 01:05:56,520
there's discomfort with Boolean
logic because of zero, one.

1375
01:05:56,520 --> 01:05:58,880
It's off, on, and of
course we know biology,

1376
01:05:58,880 --> 01:06:01,260
biochemistry doesn't
work that way.

1377
01:06:01,260 --> 01:06:03,430
And so there can be
so many artifacts,

1378
01:06:03,430 --> 01:06:05,390
so many places that you
can get things wrong,

1379
01:06:05,390 --> 01:06:09,210
because you're trying to fit a
model where the measurement is

1380
01:06:09,210 --> 01:06:11,147
supposed to be
either zero or one,

1381
01:06:11,147 --> 01:06:13,230
and you're comparing it
against a measurement that

1382
01:06:13,230 --> 01:06:15,840
might be 0.6.

1383
01:06:15,840 --> 01:06:19,900
Well, 0.6, is that closer
to 1, is it closer to 0?

1384
01:06:19,900 --> 01:06:22,624
Is there some normalization
that would shift it

1385
01:06:22,624 --> 01:06:23,540
from one to the other?

1386
01:06:23,540 --> 01:06:26,510
And instead of being a correct
fit, it's now an incorrect fit.

1387
01:06:26,510 --> 01:06:28,360
So you can see the
room for artifacts

1388
01:06:28,360 --> 01:06:34,470
by mapping quantitative data
against a qualitative model.

1389
01:06:34,470 --> 01:06:37,490
So, one thing done more
recently is to admit that

1390
01:06:37,490 --> 01:06:40,820
and say, well, let's
just relax this a bit.

1391
01:06:40,820 --> 01:06:45,470
And instead of having step
functions from off to on,

1392
01:06:45,470 --> 01:06:46,820
that they're more graded.

1393
01:06:46,820 --> 01:06:50,600
It's like an analog
transfer function.

1394
01:06:50,600 --> 01:06:53,940
So what you've essentially
done is add one more parameter

1395
01:06:53,940 --> 01:06:57,010
to every node, to every gate.

1396
01:06:57,010 --> 01:06:58,920
Because in Boolean
logic, there's

1397
01:06:58,920 --> 01:07:00,670
essentially one
hidden parameter.

1398
01:07:00,670 --> 01:07:02,910
That's where you shift
from off to on, right?

1399
01:07:02,910 --> 01:07:07,280
There's some location of
the level of the signal

1400
01:07:07,280 --> 01:07:09,554
that you've decided is 0 or 1.

1401
01:07:09,554 --> 01:07:11,470
So there's some parameter
that you shift from,

1402
01:07:11,470 --> 01:07:13,710
saying it's off to on.

1403
01:07:13,710 --> 01:07:16,430
Well here now in this
formalism there's that,

1404
01:07:16,430 --> 01:07:19,570
but there's also then the slope
of shifting from off to on.

1405
01:07:19,570 --> 01:07:21,580
Is it still fairly steep?

1406
01:07:21,580 --> 01:07:23,040
Is it really mild?

1407
01:07:23,040 --> 01:07:25,611
Is it someplace in between?

1408
01:07:25,611 --> 01:07:26,110
OK?

1409
01:07:26,110 --> 01:07:29,310
And this can go with
AND and OR gates too.

1410
01:07:29,310 --> 01:07:30,830
Now, instead of
just one dimension,

1411
01:07:30,830 --> 01:07:34,020
one component being
off to on or on to off,

1412
01:07:34,020 --> 01:07:38,460
now you got AND and OR gates
that have these slopes as well.

1413
01:07:38,460 --> 01:07:42,960
So what this means is
you require more data

1414
01:07:42,960 --> 01:07:46,090
to fit this-- we call it a
constrained fuzzy logic model

1415
01:07:46,090 --> 01:07:49,120
because you've got-- if I've
got 50 nodes in my system,

1416
01:07:49,120 --> 01:07:51,985
I've got 50 more
parameters I've got to fit.

1417
01:07:51,985 --> 01:07:54,450
OK, so that requires more data.

1418
01:07:54,450 --> 01:07:59,930
The benefit of it is
that your predictions now,

1419
01:07:59,930 --> 01:08:01,441
in fact, can be quantitative.

1420
01:08:01,441 --> 01:08:02,940
So you can go into
the model and say

1421
01:08:02,940 --> 01:08:05,340
here's a transcription
factor CREB.

1422
01:08:05,340 --> 01:08:08,320
I'm going to predict its
phosphorylation state

1423
01:08:08,320 --> 01:08:10,600
and its transcriptional
activity, perhaps,

1424
01:08:10,600 --> 01:08:13,694
based on the activities
of two upstream kinases.

1425
01:08:13,694 --> 01:08:16,069
And so if I had an inhibitor
for one of these kinases

1426
01:08:16,069 --> 01:08:19,279
or another, how much would
I shift the phosphorylation

1427
01:08:19,279 --> 01:08:21,270
of this transcription factor?

1428
01:08:21,270 --> 01:08:25,390
And what you actually see is
these gradual curves, that if I

1429
01:08:25,390 --> 01:08:29,109
start to inhibit
[INAUDIBLE], OK, it gradually

1430
01:08:29,109 --> 01:08:31,399
changes the
phosphorylation of CREB.

1431
01:08:31,399 --> 01:08:34,789
Or if I inhibit the
activity of p38,

1432
01:08:34,789 --> 01:08:39,250
it even more gradually
affects the activity of CREB.

1433
01:08:39,250 --> 01:08:42,295
So you can turn these into
quantitative predictions

1434
01:08:42,295 --> 01:08:45,000
of strong effects, weak effects.

1435
01:08:45,000 --> 01:08:47,149
And again, look at
drug combinations.

1436
01:08:47,149 --> 01:08:51,649
So that's the advantage of going
to this more analog transfer

1437
01:08:51,649 --> 01:08:53,950
function logic model.

1438
01:08:53,950 --> 01:08:57,430
You can deal with
quantification much better,

1439
01:08:57,430 --> 01:08:59,938
but at the cost of
requiring more data.

1440
01:08:59,938 --> 01:09:01,229
OK, I think I'll leave it here.

1441
01:09:01,229 --> 01:09:03,140
It's about 3:15
and so if there's

1442
01:09:03,140 --> 01:09:08,090
more questions we can take
them about any aspect of this.

1443
01:09:08,090 --> 01:09:11,933
Most of you have stayed awake,
I think that's a good thing.

1444
01:09:11,933 --> 01:09:12,710
OK.

1445
01:09:12,710 --> 01:09:13,366
More questions?

1446
01:09:20,104 --> 01:09:21,020
AUDIENCE: [INAUDIBLE].

1447
01:09:21,020 --> 01:09:25,970
When you have the model for
the template of the IKK story.

1448
01:09:25,970 --> 01:09:28,069
And then it seems
like it may not

1449
01:09:28,069 --> 01:09:31,822
be as easy to back out
the original data that

1450
01:09:31,822 --> 01:09:33,485
led to that specific model.

1451
01:09:33,485 --> 01:09:35,100
For example, you
showed that one arc

1452
01:09:35,100 --> 01:09:38,035
was from this one treatment.

1453
01:09:38,035 --> 01:09:41,182
But because if you trained
the model the same as that

1454
01:09:41,182 --> 01:09:45,000
and it's not deterministic,
then what-- could you just add--

1455
01:09:45,000 --> 01:09:48,720
DOUG LAUFFENBURGER: I think
that's a great question.

1456
01:09:48,720 --> 01:09:52,827
So let's say there's
new arcs that you add,

1457
01:09:52,827 --> 01:09:54,410
that weren't in the
original scaffold.

1458
01:09:54,410 --> 01:09:56,660
I mean that's what you got
the biggest questions from.

1459
01:09:56,660 --> 01:09:58,620
If you delete one,
you say, ah, it's

1460
01:09:58,620 --> 01:10:01,420
easy to believe why
you would delete one.

1461
01:10:01,420 --> 01:10:03,480
Any arc that you add
to get a best fit,

1462
01:10:03,480 --> 01:10:05,920
I think you've got to
ask questions about.

1463
01:10:05,920 --> 01:10:07,450
So in all those
cases where there

1464
01:10:07,450 --> 01:10:10,320
are arcs that were added that
led to a better fit model,

1465
01:10:10,320 --> 01:10:12,610
the first thing we did
was go to the literature.

1466
01:10:12,610 --> 01:10:17,480
Say, OK, is there literature
on some effect of this node

1467
01:10:17,480 --> 01:10:19,160
to that node?

1468
01:10:19,160 --> 01:10:20,950
And it's just that
that literature

1469
01:10:20,950 --> 01:10:22,560
wasn't curated
into that database

1470
01:10:22,560 --> 01:10:24,830
or something like that.

1471
01:10:24,830 --> 01:10:27,730
And most of the time
we could find it there.

1472
01:10:27,730 --> 01:10:28,490
OK.

1473
01:10:28,490 --> 01:10:31,720
So then there were the cases,
and this was the most prominent

1474
01:10:31,720 --> 01:10:35,270
one, where from some
added arc, we just

1475
01:10:35,270 --> 01:10:36,770
couldn't find it
in the literature.

1476
01:10:36,770 --> 01:10:38,280
In this particular
case, it was very

1477
01:10:38,280 --> 01:10:40,830
easy to trace it to
this particular effect

1478
01:10:40,830 --> 01:10:43,412
of this inhibitor.

1479
01:10:43,412 --> 01:10:45,870
I would say there's no reason
to believe that that's always

1480
01:10:45,870 --> 01:10:46,960
going to be the case.

1481
01:10:46,960 --> 01:10:49,790
I don't have another example to
show you where it was harder.

1482
01:10:49,790 --> 01:10:52,804
Everything else we actually
found in the literature.

1483
01:10:52,804 --> 01:10:55,220
But you could imagine, having
some new arc that you really

1484
01:10:55,220 --> 01:10:57,250
couldn't find in the
literature and there's

1485
01:10:57,250 --> 01:11:00,330
no artifactual
explanation for it.

1486
01:11:00,330 --> 01:11:04,250
And now, how you trace it
back to what the data was

1487
01:11:04,250 --> 01:11:07,274
that might give you
a more nuanced hint.

1488
01:11:07,274 --> 01:11:08,190
It's a great question.

1489
01:11:08,190 --> 01:11:11,760
I don't really know
how we'll do that.

1490
01:11:11,760 --> 01:11:14,450
I think, we and other
practitioners who use this,

1491
01:11:14,450 --> 01:11:17,310
I'm sure we'll run
into it at some point.

1492
01:11:17,310 --> 01:11:22,108
That's a great challenge
to be thinking about.

1493
01:11:22,108 --> 01:11:22,608
Yeah?

1494
01:11:22,608 --> 01:11:24,596
AUDIENCE: I might have
missed this earlier,

1495
01:11:24,596 --> 01:11:30,680
but I was wondering,
is this model actually

1496
01:11:30,680 --> 01:11:34,600
able to incorporate the
heterogeneity of a tumor,

1497
01:11:34,600 --> 01:11:35,580
for example?

1498
01:11:35,580 --> 01:11:40,010
Or the population heterogeneity?

1499
01:11:40,010 --> 01:11:44,068
DOUG LAUFFENBURGER: That's also
a really interesting question.

1500
01:11:44,068 --> 01:11:47,470
Let me try to show
something here.

1501
01:11:47,470 --> 01:11:50,380
Yeah.

1502
01:11:50,380 --> 01:11:52,320
So two things.

1503
01:11:52,320 --> 01:11:58,210
One is, what's shown here--
this is the four different tumor

1504
01:11:58,210 --> 01:12:00,270
cells that we did.

1505
01:12:00,270 --> 01:12:05,480
And what's shown in color is
the arcs for each one of them.

1506
01:12:05,480 --> 01:12:09,510
So yellow, orange, brown, red.

1507
01:12:09,510 --> 01:12:12,840
So some places you see all
four of those colors there.

1508
01:12:12,840 --> 01:12:15,150
In some places only two or one.

1509
01:12:15,150 --> 01:12:17,650
It says if I had four
different tumor types,

1510
01:12:17,650 --> 01:12:21,450
there's some slight differences
in logic among them.

1511
01:12:21,450 --> 01:12:23,340
You could translate
to that is, well

1512
01:12:23,340 --> 01:12:25,230
I could imagine
then having a tumor

1513
01:12:25,230 --> 01:12:31,360
that's a mixture of subtypes
and how would I discern that?

1514
01:12:31,360 --> 01:12:35,120
One possible idea that's
attractive to me--

1515
01:12:35,120 --> 01:12:37,450
although we didn't really
explore this in any form of

1516
01:12:37,450 --> 01:12:37,640
[INAUDIBLE].

1517
01:12:37,640 --> 01:12:39,350
We didn't really have
the means to make

1518
01:12:39,350 --> 01:12:40,933
experimental
measurements on the tumor

1519
01:12:40,933 --> 01:12:42,790
heterogeneity at the time.

1520
01:12:42,790 --> 01:12:45,570
It's when you get to a
set of consensus models.

1521
01:12:45,570 --> 01:12:46,070
Right.

1522
01:12:46,070 --> 01:12:49,450
So let's say you get the 50
best fit models and you say,

1523
01:12:49,450 --> 01:12:55,530
some arc is in 80% of them,
but it's not in 20% of them,

1524
01:12:55,530 --> 01:12:57,980
is it possible that
that represents

1525
01:12:57,980 --> 01:13:02,250
some heterogeneity
because you're

1526
01:13:02,250 --> 01:13:05,860
getting an average of
different subtypes?

1527
01:13:05,860 --> 01:13:06,730
I don't know.

1528
01:13:06,730 --> 01:13:09,650
Sometimes that appeals to
me as potentially valid.

1529
01:13:09,650 --> 01:13:13,820
Sometimes I think there's
a flaw in that reasoning.

1530
01:13:13,820 --> 01:13:17,695
Just because you get an average
that's not then as strong.

1531
01:13:17,695 --> 01:13:21,580
Does that necessarily
reflect a sub-population?

1532
01:13:21,580 --> 01:13:24,230
I don't know.

1533
01:13:24,230 --> 01:13:26,700
So what we do know is,
we can see differences

1534
01:13:26,700 --> 01:13:28,770
when there are differences.

1535
01:13:28,770 --> 01:13:31,380
How you actually see
them then, if all you

1536
01:13:31,380 --> 01:13:34,670
have is averaged
data, maybe it's

1537
01:13:34,670 --> 01:13:38,110
reflected in the heterogeneity
of the consensus models.

1538
01:13:38,110 --> 01:13:38,860
Maybe not.

1539
01:13:41,710 --> 01:13:43,500
It would be
interesting to explore.

1540
01:13:48,349 --> 01:13:48,890
Anybody else?

1541
01:13:54,940 --> 01:13:55,440
All right.

1542
01:13:55,440 --> 01:13:55,940
All set.

1543
01:13:55,940 --> 01:13:57,280
Thanks.