1 00:00:00,060 --> 00:00:01,780 The following content is provided 2 00:00:01,780 --> 00:00:04,019 under a Creative Commons license. 3 00:00:04,019 --> 00:00:06,870 Your support will help MIT OpenCourseWare continue 4 00:00:06,870 --> 00:00:10,730 to offer high quality educational resources for free. 5 00:00:10,730 --> 00:00:13,340 To make a donation or view additional materials 6 00:00:13,340 --> 00:00:17,217 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,217 --> 00:00:17,842 at ocw.mit.edu. 8 00:00:26,570 --> 00:00:30,580 DOUG LAUFFENBURGER: So we shall start. 9 00:00:30,580 --> 00:00:33,092 I haven't had the pleasure of meeting most of you. 10 00:00:33,092 --> 00:00:34,050 I'm Doug Lauffenburger. 11 00:00:34,050 --> 00:00:40,700 I'm gratefully invited for a guest presentation here. 12 00:00:40,700 --> 00:00:44,229 So I'll definitely enjoy it. 13 00:00:44,229 --> 00:00:45,520 There should be plenty of time. 14 00:00:45,520 --> 00:00:48,510 I'm not racing through a lot of material, 15 00:00:48,510 --> 00:00:52,080 so feel free to interrupt me with questions. 16 00:00:52,080 --> 00:00:55,155 And of course I'll try to respond as best I can. 17 00:00:57,621 --> 00:00:58,120 OK. 18 00:00:58,120 --> 00:01:02,460 Who has looked at the background materials that 19 00:01:02,460 --> 00:01:06,340 were posted on the web a long time ago, last night? 20 00:01:06,340 --> 00:01:08,770 Who already will admit to having looked at it? 21 00:01:11,290 --> 00:01:11,790 Good. 22 00:01:11,790 --> 00:01:13,606 All right. 23 00:01:13,606 --> 00:01:15,200 I guess that means I should do this 24 00:01:15,200 --> 00:01:17,320 because otherwise if you've read it already then 25 00:01:17,320 --> 00:01:18,480 there'd be no point, right? 26 00:01:18,480 --> 00:01:20,550 OK. 27 00:01:20,550 --> 00:01:21,300 OK. 
28 00:01:21,300 --> 00:01:24,210 Well, where we are in your semester-- 29 00:01:24,210 --> 00:01:28,290 you're learning a lot of things across the whole spectrum 30 00:01:28,290 --> 00:01:30,940 of computational systems biology. 31 00:01:30,940 --> 00:01:33,870 I hope I'll add something in here. 32 00:01:33,870 --> 00:01:36,620 It's actually a very specific topic. 33 00:01:36,620 --> 00:01:40,020 We talk about modeling of cell signaling networks, 34 00:01:40,020 --> 00:01:44,490 and in particular, one approach is worth going through today 35 00:01:44,490 --> 00:01:47,190 and that's the logic modeling framework. 36 00:01:47,190 --> 00:01:51,710 So I'll give you a little bit of a conceptual background 37 00:01:51,710 --> 00:01:54,090 for the first 10 or 15 minutes. 38 00:01:54,090 --> 00:02:00,030 Then we'll launch into the particular example 39 00:02:00,030 --> 00:02:02,630 that was in the main paper. 40 00:02:02,630 --> 00:02:05,900 And a little side light with an application of it 41 00:02:05,900 --> 00:02:08,830 to a particular cancer problem. 42 00:02:08,830 --> 00:02:11,290 And then that should take us pretty much to the end. 43 00:02:11,290 --> 00:02:11,790 OK. 44 00:02:16,060 --> 00:02:16,600 OK. 45 00:02:16,600 --> 00:02:21,940 The biological topic here is cell signaling, primarily 46 00:02:21,940 --> 00:02:23,930 mammalian cells. 47 00:02:23,930 --> 00:02:27,220 Certainly applicable to microbial cells 48 00:02:27,220 --> 00:02:28,990 in a simpler sense. 49 00:02:28,990 --> 00:02:36,940 So just to place the context, in mammalian cell biology, 50 00:02:36,940 --> 00:02:40,430 I'm a bio-engineer and a cell biologist at the same time. 51 00:02:40,430 --> 00:02:43,720 We're very interested in what controls the cell behavior, 52 00:02:43,720 --> 00:02:45,880 their phenotypic response. 
53 00:02:45,880 --> 00:02:48,680 We know that it's in fact controlled 54 00:02:48,680 --> 00:02:51,110 by what it sees in their environment, growth 55 00:02:51,110 --> 00:02:55,890 factors, hormones, extracellular matrix, mechanical forces, 56 00:02:55,890 --> 00:02:59,320 cell-cell contacts. 57 00:02:59,320 --> 00:03:03,730 A variety of cues in the environment 58 00:03:03,730 --> 00:03:08,670 and the way these govern phenotype or control phenotype 59 00:03:08,670 --> 00:03:11,750 is that they influence, they regulate 60 00:03:11,750 --> 00:03:14,350 what I would call the execution processes. 61 00:03:14,350 --> 00:03:17,960 The crucial execution processes such as gene expression, 62 00:03:17,960 --> 00:03:22,620 transcription, and translation are 63 00:03:22,620 --> 00:03:24,860 governed by extracellular factors. 64 00:03:24,860 --> 00:03:28,830 Metabolism, synthesis of new molecules, 65 00:03:28,830 --> 00:03:32,130 cytoskeleton, motors, force generation. 66 00:03:32,130 --> 00:03:34,200 These things all carry out phenotype 67 00:03:34,200 --> 00:03:37,980 governed by the extracellular stimuli or cues. 68 00:03:37,980 --> 00:03:41,710 And it happens via these biochemical signaling 69 00:03:41,710 --> 00:03:45,550 pathways that are activated primarily by cell surface 70 00:03:45,550 --> 00:03:49,000 receptors in the plasma membrane-- cascades 71 00:03:49,000 --> 00:03:52,930 of biochemical reactions, mostly enzymatic. 72 00:03:52,930 --> 00:03:56,580 Some protein-protein docking, mostly 73 00:03:56,580 --> 00:03:58,620 post-translational modifications. 74 00:03:58,620 --> 00:04:02,100 Kinase-phosphatase reactions adding and taking off 75 00:04:02,100 --> 00:04:06,020 phosphate groups that change protein activities, locations 76 00:04:06,020 --> 00:04:07,119 and so forth. 77 00:04:07,119 --> 00:04:09,535 It could be other types of post-translational modifications. 78 00:04:09,535 --> 00:04:15,800 It could be second messengers, calcium, ATPases and so forth.
79 00:04:15,800 --> 00:04:19,860 So, the extracellular-- now my battery's dead. 80 00:04:19,860 --> 00:04:20,970 That's not good. 81 00:04:20,970 --> 00:04:22,079 Oh, there we go. 82 00:04:22,079 --> 00:04:25,760 Extracellular stimuli generate the signals. 83 00:04:25,760 --> 00:04:29,610 They regulate gene expression, metabolism, cytoskeleton. 84 00:04:29,610 --> 00:04:32,770 They carry out phenotype. 85 00:04:32,770 --> 00:04:33,270 OK. 86 00:04:36,060 --> 00:04:41,840 So we want to learn about cell signaling network operations. 87 00:04:41,840 --> 00:04:43,790 There's actually multiple pathways involved. 88 00:04:43,790 --> 00:04:46,180 We really need to study many of them 89 00:04:46,180 --> 00:04:49,380 in concert to understand what the cells are doing. 90 00:04:49,380 --> 00:04:52,680 And a big question is, what kind of information 91 00:04:52,680 --> 00:04:54,630 do we need to study this? 92 00:04:54,630 --> 00:04:56,880 And in the end, we'd be interested in how 93 00:04:56,880 --> 00:05:01,750 phenotypic behavior does arise from variations and mutations 94 00:05:01,750 --> 00:05:05,270 in the genomic content of cells. 95 00:05:05,270 --> 00:05:10,050 But that genomic content, of course, is not modified, 96 00:05:10,050 --> 00:05:12,630 but its effects are influenced by what's in 97 00:05:12,630 --> 00:05:15,610 the environment, these extracellular cues, 98 00:05:15,610 --> 00:05:17,480 ligands and so forth. 99 00:05:17,480 --> 00:05:22,070 So they influence what message is expressed. 100 00:05:22,070 --> 00:05:23,700 From that message they influence what's 101 00:05:23,700 --> 00:05:25,630 actually translated into protein. 102 00:05:25,630 --> 00:05:27,230 From those proteins they influence 103 00:05:27,230 --> 00:05:28,730 the post-translational modifications 104 00:05:28,730 --> 00:05:31,420 and what the proteins are actually doing.
105 00:05:31,420 --> 00:05:34,280 And so, in the end, the phenotype 106 00:05:34,280 --> 00:05:37,210 is carried out by these protein operations. 107 00:05:37,210 --> 00:05:39,430 And the question is, what information 108 00:05:39,430 --> 00:05:41,880 level that we might want to study. 109 00:05:41,880 --> 00:05:44,250 And of course you would love to have the information 110 00:05:44,250 --> 00:05:47,310 content at all levels-- genomic information, 111 00:05:47,310 --> 00:05:50,114 transcriptional information, translational information, 112 00:05:50,114 --> 00:05:51,405 post-translational information. 113 00:05:54,180 --> 00:05:57,570 So integrating all those different data levels 114 00:05:57,570 --> 00:05:59,232 can be extremely valuable. 115 00:05:59,232 --> 00:06:01,440 In terms of the models I'm going to talk about today, 116 00:06:01,440 --> 00:06:03,610 they've essentially been living at the level 117 00:06:03,610 --> 00:06:07,880 of protein activities in these signaling pathways. 118 00:06:07,880 --> 00:06:08,380 OK. 119 00:06:08,380 --> 00:06:09,796 That will be the kind of data sets 120 00:06:09,796 --> 00:06:13,310 you'll see that will be analyzed with respect to the models. 121 00:06:13,310 --> 00:06:18,540 Obviously, they arise from these underlying mechanisms 122 00:06:18,540 --> 00:06:22,440 that, as influenced by the environmental context, 123 00:06:22,440 --> 00:06:25,070 altering the signaling protein activities. 124 00:06:27,790 --> 00:06:28,290 OK. 125 00:06:28,290 --> 00:06:31,930 And what's very interesting and there's 126 00:06:31,930 --> 00:06:36,170 going to be more and more progress in the coming years 127 00:06:36,170 --> 00:06:39,520 is relating what's in the genomic information-- mutations 128 00:06:39,520 --> 00:06:42,810 and variations to what's happening at the protein level. 
129 00:06:42,810 --> 00:06:45,487 And some of the other instructors in this class 130 00:06:45,487 --> 00:06:47,070 are really some of the world's experts 131 00:06:47,070 --> 00:06:48,640 in figuring out how to do this. 132 00:06:48,640 --> 00:06:50,750 I'd like to just show this example 133 00:06:50,750 --> 00:06:54,830 as a motivation for this kind of approach. 134 00:06:54,830 --> 00:06:58,890 And that is if you do gene sequencing of many patient 135 00:06:58,890 --> 00:07:00,550 tumors-- in this case, I believe this 136 00:07:00,550 --> 00:07:02,539 was a paper on pancreatic tumors. 137 00:07:02,539 --> 00:07:05,080 This has been shown for pretty much every other type of tumor 138 00:07:05,080 --> 00:07:06,190 since then. 139 00:07:06,190 --> 00:07:09,740 In any given patient tumor, each one of these bars, 140 00:07:09,740 --> 00:07:13,340 there's dozens of mutations in each tumor. 141 00:07:13,340 --> 00:07:15,850 And a variety of types-- deletions, 142 00:07:15,850 --> 00:07:19,140 amplifications mutations and by and large, they're 143 00:07:19,140 --> 00:07:20,710 all different. 144 00:07:20,710 --> 00:07:22,780 There's very few mutations themselves 145 00:07:22,780 --> 00:07:25,515 that really carry over to a substantial proportion 146 00:07:25,515 --> 00:07:27,770 of one patient's tumor to another. 147 00:07:27,770 --> 00:07:31,160 There's some special cases that are fairly pervasive, 148 00:07:31,160 --> 00:07:34,210 but the predominant of these dozens and dozens of mutations 149 00:07:34,210 --> 00:07:39,190 and variations are different from one patient 150 00:07:39,190 --> 00:07:43,020 to another, and even in the same patient. 151 00:07:43,020 --> 00:07:46,260 So what's emerging as a productive way 152 00:07:46,260 --> 00:07:51,220 to think about this-- How do all these different types 153 00:07:51,220 --> 00:07:52,990 of mutations and specific mutations 154 00:07:52,990 --> 00:07:56,950 actually lead to classes of similar pathologies? 
155 00:07:56,950 --> 00:07:59,800 And that is they tend to reside in 156 00:07:59,800 --> 00:08:05,362 what can be identified as pathways-- circuits, 157 00:08:05,362 --> 00:08:07,320 machines, things that are actually carrying out 158 00:08:07,320 --> 00:08:09,220 function at the protein level. 159 00:08:09,220 --> 00:08:12,150 So for instance-- I'm losing this again. 160 00:08:12,150 --> 00:08:14,510 For these pancreatic cancers on this wheel 161 00:08:14,510 --> 00:08:18,430 are about a dozen different signaling pathways 162 00:08:18,430 --> 00:08:20,720 and cell-cycle control pathways and apoptosis 163 00:08:20,720 --> 00:08:22,790 control pathways. 164 00:08:22,790 --> 00:08:26,110 And if you look at any individual patient tumors, 165 00:08:26,110 --> 00:08:29,505 like this green one or this red one-- two different patients. 166 00:08:29,505 --> 00:08:33,409 If you actually look at the mutations at the genomic level, 167 00:08:33,409 --> 00:08:36,250 they're entirely different in the green patient 168 00:08:36,250 --> 00:08:38,169 tumor versus the red patient tumor. 169 00:08:38,169 --> 00:08:40,919 So if you're just trying to match gene mutation 170 00:08:40,919 --> 00:08:43,171 to pancreatic cancer, these two patients 171 00:08:43,171 --> 00:08:44,420 would look entirely different. 172 00:08:46,960 --> 00:08:50,630 But, it turns out, that you can line up their mutations 173 00:08:50,630 --> 00:08:53,260 into the same pathways and say, OK, 174 00:08:53,260 --> 00:08:54,660 the red tumor and the green tumor 175 00:08:54,660 --> 00:08:58,515 both have mutations that affect the TGF beta pathway. 176 00:08:58,515 --> 00:09:00,390 They're different mutations, but they've 177 00:09:00,390 --> 00:09:02,090 dysregulated that pathway. 178 00:09:02,090 --> 00:09:04,740 And similarly, you can do that with pretty much 179 00:09:04,740 --> 00:09:06,610 all of the other mutations.
180 00:09:06,610 --> 00:09:09,380 That these tumors have been dysregulated 181 00:09:09,380 --> 00:09:11,740 in terms of particular pathways. 182 00:09:11,740 --> 00:09:13,350 But patient to patient to patient, 183 00:09:13,350 --> 00:09:16,960 it's happened by different genomic gene sequence 184 00:09:16,960 --> 00:09:18,190 mutations. 185 00:09:18,190 --> 00:09:21,550 So that the ability to look at these protein 186 00:09:21,550 --> 00:09:23,780 level pathways is a way of making 187 00:09:23,780 --> 00:09:28,170 really good productive sense of the gene sequencing data. 188 00:09:28,170 --> 00:09:31,560 So there's lots of labs trying to go from gene sequence 189 00:09:31,560 --> 00:09:33,815 up to pathway modulation. 190 00:09:33,815 --> 00:09:37,760 In our case, we're not going to show you that here. 191 00:09:37,760 --> 00:09:39,600 We're going to say, this is a motivation 192 00:09:39,600 --> 00:09:42,940 for starting at the protein level. 193 00:09:42,940 --> 00:09:45,450 And I'd like to show this picture too. 194 00:09:45,450 --> 00:09:49,120 Number one because it's such an anachronism. 195 00:09:49,120 --> 00:09:52,000 This is a circuit board from decades and decades and decades 196 00:09:52,000 --> 00:09:54,680 ago that none of you would recognize. 197 00:09:54,680 --> 00:10:00,560 But, in the molecular biology world, this kind of a picture, 198 00:10:00,560 --> 00:10:04,760 and in its modern form is viewed as a very appealing metaphor 199 00:10:04,760 --> 00:10:07,390 for how to think about these signaling 200 00:10:07,390 --> 00:10:10,830 pathways and signaling networks that take the extracellular 201 00:10:10,830 --> 00:10:14,410 information and turn it into governance of transcription, 202 00:10:14,410 --> 00:10:17,030 metabolism, cytoskeleton, and phenotype. 
203 00:10:17,030 --> 00:10:19,280 So, just this metaphor of circuitry, 204 00:10:19,280 --> 00:10:21,360 where in white, the extracellular ligands, 205 00:10:21,360 --> 00:10:24,710 growth factors are somehow wired to the blue. 206 00:10:24,710 --> 00:10:28,380 The cell surface receptors, ErbB for instance-- 207 00:10:28,380 --> 00:10:30,310 they're wired to 208 00:10:30,310 --> 00:10:32,920 kinases and other signaling proteins-- 209 00:10:32,920 --> 00:10:37,050 they're wired to transcription factors, cell-cycle control 210 00:10:37,050 --> 00:10:40,990 regulators, apoptosis regulators. 211 00:10:40,990 --> 00:10:44,920 So these very famous folks in cancer biology say, 212 00:10:44,920 --> 00:10:47,580 what you've got to understand is, these signalling networks 213 00:10:47,580 --> 00:10:49,470 as circuitry. 214 00:10:49,470 --> 00:10:51,740 And if the circuitry is dysregulated somehow, 215 00:10:51,740 --> 00:10:54,450 the wiring is different, then that's what's 216 00:10:54,450 --> 00:10:58,910 underlying malignant behavior. 217 00:10:58,910 --> 00:11:01,160 So, this is really beautiful but it's 218 00:11:01,160 --> 00:11:02,800 pretty much useless, right. 219 00:11:02,800 --> 00:11:07,620 Because there's no prediction or calculation or even hypothesis 220 00:11:07,620 --> 00:11:10,650 generation one can do from a picture like this. 221 00:11:10,650 --> 00:11:14,310 Yes, it's circuitry, but what do I do with it? 222 00:11:14,310 --> 00:11:16,110 So, what I want to show you today 223 00:11:16,110 --> 00:11:18,040 are efforts to turn them into what 224 00:11:18,040 --> 00:11:21,470 I would call an actionable model, a computable model. 225 00:11:21,470 --> 00:11:23,789 Yes, it looks kind of like circuitry, 226 00:11:23,789 --> 00:11:26,080 but in fact you would know how to do a calculation that 227 00:11:26,080 --> 00:11:29,370 would fit it to data and predict new data. 228 00:11:29,370 --> 00:11:32,540 And then you have, in fact, a model rather than a metaphor.
229 00:11:32,540 --> 00:11:35,200 That's the idea. 230 00:11:35,200 --> 00:11:37,910 So, one question is, if you want to turn that 231 00:11:37,910 --> 00:11:43,750 into a formal mathematical framework for circuitry 232 00:11:43,750 --> 00:11:46,530 that you can calculate-- what kind of mathematics 233 00:11:46,530 --> 00:11:47,190 might you use? 234 00:11:47,190 --> 00:11:48,565 And in this class you're learning 235 00:11:48,565 --> 00:11:51,730 a whole spectrum of things. 236 00:11:51,730 --> 00:11:54,930 And one can think about it on one hand, 237 00:11:54,930 --> 00:11:56,960 if we knew all of those components 238 00:11:56,960 --> 00:12:01,790 and how they interacted, and could estimate rate constants 239 00:12:01,790 --> 00:12:04,340 and so forth, we could write differential equations 240 00:12:04,340 --> 00:12:08,240 for maybe the dozens and dozens of components and interactions 241 00:12:08,240 --> 00:12:11,690 and predict how they would play out dynamically with time. 242 00:12:11,690 --> 00:12:14,440 For most systems with the complexity that's 243 00:12:14,440 --> 00:12:17,280 really controlling cell biology, at this point in time, 244 00:12:17,280 --> 00:12:18,990 this is almost impossible. 245 00:12:18,990 --> 00:12:21,530 There's only rare cases where enough 246 00:12:21,530 --> 00:12:24,290 is known about signaling biochemistry 247 00:12:24,290 --> 00:12:26,800 to really write down differential 248 00:12:26,800 --> 00:12:29,690 equations for what's going on. 249 00:12:29,690 --> 00:12:31,570 At the other extreme, of course, is the type 250 00:12:31,570 --> 00:12:35,830 of mathematics one gets out of very, very large data sets, 251 00:12:35,830 --> 00:12:40,190 sequencing data sets, transcriptional, and so forth. 252 00:12:40,190 --> 00:12:42,510 More informatics type of analysis, 253 00:12:42,510 --> 00:12:44,990 where it has to do with multivariate 254 00:12:44,990 --> 00:12:50,290 regression and clustering, mutual information.
255 00:12:50,290 --> 00:12:55,050 And what we've been working on is someplace up in the middle 256 00:12:55,050 --> 00:12:58,530 where you don't have enough mechanistic prior knowledge 257 00:12:58,530 --> 00:13:02,360 to write this formal of physics, and yet 258 00:13:02,360 --> 00:13:06,640 takes you someplace beyond statistical associations. 259 00:13:06,640 --> 00:13:08,710 And this is one of the areas that 260 00:13:08,710 --> 00:13:12,470 might be worth your learning in this class. 261 00:13:12,470 --> 00:13:15,720 OK, this is really the same set of computational methods, 262 00:13:15,720 --> 00:13:20,130 just like it's cast in a little bit different form that 263 00:13:20,130 --> 00:13:23,380 delineates computational modeling, really 264 00:13:23,380 --> 00:13:26,430 into two kinds of classes. 265 00:13:26,430 --> 00:13:30,070 What's traditionally appreciated in most fields of engineering 266 00:13:30,070 --> 00:13:32,330 and physics are differential equations 267 00:13:32,330 --> 00:13:33,910 that are very theory driven. 268 00:13:33,910 --> 00:13:34,810 You have a theory. 269 00:13:34,810 --> 00:13:37,010 You have prior knowledge for what's happening. 270 00:13:37,010 --> 00:13:39,140 You're writing down the components involved, 271 00:13:39,140 --> 00:13:41,570 you're writing down how they interact. 272 00:13:41,570 --> 00:13:43,780 And typically, algebraic equations 273 00:13:43,780 --> 00:13:47,960 for those differential equations describe your theory, 274 00:13:47,960 --> 00:13:49,960 describe your prior knowledge. 275 00:13:49,960 --> 00:13:53,260 And now it's formalized and you estimate rate constants 276 00:13:53,260 --> 00:13:55,020 and so forth. 277 00:13:55,020 --> 00:13:56,520 Another whole class of information 278 00:13:56,520 --> 00:13:58,640 is data driven, in which, you really 279 00:13:58,640 --> 00:14:01,370 don't have a good theory about what components matter 280 00:14:01,370 --> 00:14:03,350 and how they interact.
281 00:14:03,350 --> 00:14:07,450 And so you start with data sets and from it 282 00:14:07,450 --> 00:14:11,970 you do classification or typologies or associations 283 00:14:11,970 --> 00:14:15,150 with different types of mathematics that at least try 284 00:14:15,150 --> 00:14:18,290 to make sense and get hypotheses out of these large data 285 00:14:18,290 --> 00:14:21,240 sets, where you don't have any theory. 286 00:14:21,240 --> 00:14:25,440 One reason that logic modeling appeals to me, 287 00:14:25,440 --> 00:14:26,960 is that it actually can be applied 288 00:14:26,960 --> 00:14:31,210 in either the theory driven or the data driven mode. 289 00:14:31,210 --> 00:14:34,770 You can say, I know nothing about my system. 290 00:14:34,770 --> 00:14:39,370 I just generate large data sets of signaling network activities 291 00:14:39,370 --> 00:14:42,000 induced by different stimuli, but I'm 292 00:14:42,000 --> 00:14:44,260 going to try to fit a logic model to it that 293 00:14:44,260 --> 00:14:46,620 says how the different components are influencing 294 00:14:46,620 --> 00:14:50,120 each other in a logic way. 295 00:14:50,120 --> 00:14:52,530 Or, you could say, well, I know something. 296 00:14:52,530 --> 00:14:54,190 I have some prior knowledge. 297 00:14:54,190 --> 00:14:59,800 I may have interactomes that say what molecular components are 298 00:14:59,800 --> 00:15:02,390 present in signaling networks. 299 00:15:02,390 --> 00:15:05,690 And so in principle, I kind of know who's involved 300 00:15:05,690 --> 00:15:07,970 and who might be influencing whom. 301 00:15:07,970 --> 00:15:12,550 And I could write a logic model based on that prior knowledge. 302 00:15:12,550 --> 00:15:15,170 And then run calculations and see if it actually 303 00:15:15,170 --> 00:15:17,670 makes predictions about experimental data. 304 00:15:17,670 --> 00:15:18,830 So that's one nice thing.
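A minimal sketch of the theory-driven mode described above, assuming a tiny made-up network. The node names (`ligand`, `inhibitor`, `receptor`, `kinase`, `tf`), the rules, and the synchronous update scheme are all hypothetical choices for illustration; none of them come from the lecture or the paper it refers to.

```python
# Hypothetical Boolean logic model: each rule computes a node's next value
# from the current state of the nodes assumed to influence it.
RULES = {
    "receptor": lambda s: s["ligand"],                        # on when ligand is present
    "kinase": lambda s: s["receptor"] and not s["inhibitor"],  # AND-NOT gate
    "tf": lambda s: s["kinase"],                              # follows the kinase
}


def step(state):
    """One synchronous update: every rule reads the old state."""
    new_state = dict(state)
    for node, rule in RULES.items():
        new_state[node] = bool(rule(state))
    return new_state


def steady_state(state, max_steps=20):
    """Iterate until the state stops changing."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no steady state reached")


def simulate(ligand, inhibitor):
    """Predict the steady state for one stimulus/inhibitor condition."""
    start = {"ligand": ligand, "inhibitor": inhibitor,
             "receptor": False, "kinase": False, "tf": False}
    return steady_state(start)


if __name__ == "__main__":
    for ligand in (False, True):
        for inhibitor in (False, True):
            print(ligand, inhibitor, simulate(ligand, inhibitor)["tf"])
```

In the data-driven mode the rules themselves would be unknown, and one would search over candidate rule sets for the one whose predictions best match measured activities under each condition.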
305 00:15:18,830 --> 00:15:21,800 It's a mathematical formalism that 306 00:15:21,800 --> 00:15:25,970 can either be run in data driven mode or in theory driven mode 307 00:15:25,970 --> 00:15:27,390 and go back and forth. 308 00:15:27,390 --> 00:15:30,665 So that's one reason-- given one lecture to offer, 309 00:15:30,665 --> 00:15:34,550 I've decided to offer it on this topic. 310 00:15:34,550 --> 00:15:35,840 All right, with me so far? 311 00:15:35,840 --> 00:15:37,360 Any questions? 312 00:15:37,360 --> 00:15:37,860 Philosophy? 313 00:15:41,250 --> 00:15:41,830 OK. 314 00:15:41,830 --> 00:15:46,490 So, what we're going to do today is 315 00:15:46,490 --> 00:15:50,510 almost take a hybrid of these two. 316 00:15:50,510 --> 00:15:54,330 We're going to say, what prior knowledge do we have, 317 00:15:54,330 --> 00:15:57,360 and then recognize that it's really not enough. 318 00:15:57,360 --> 00:16:02,310 And so how do we now integrate that with empirical data 319 00:16:02,310 --> 00:16:05,260 to now come up with logic modeling that, 320 00:16:05,260 --> 00:16:08,640 in fact, is actionable and computable? 321 00:16:08,640 --> 00:16:09,140 OK. 322 00:16:09,140 --> 00:16:12,704 So what kind of prior knowledge do we have? 323 00:16:12,704 --> 00:16:14,870 Let's say we wanted to have a logic model for what's 324 00:16:14,870 --> 00:16:17,400 in these signaling networks down stream of growth factor 325 00:16:17,400 --> 00:16:21,620 receptors, or hormone receptors, or things like that, that then 326 00:16:21,620 --> 00:16:26,030 govern gene expression, metabolism and so forth. 327 00:16:26,030 --> 00:16:28,250 What prior knowledge do we have? 328 00:16:28,250 --> 00:16:30,129 And you folks probably have already 329 00:16:30,129 --> 00:16:31,420 seen some of this in the class. 330 00:16:35,264 --> 00:16:36,930 There's all kinds of databases of stuff. 331 00:16:36,930 --> 00:16:39,180 What's in those databases that might be relevant here? 
332 00:16:42,054 --> 00:16:45,547 AUDIENCE: [INAUDIBLE] that the protein-protein interactions-- 333 00:16:45,547 --> 00:16:48,425 if you switch proteins, they interact with each other, 334 00:16:48,425 --> 00:16:50,550 but maybe not necessarily what pathways they're in. 335 00:16:50,550 --> 00:16:51,758 DOUG LAUFFENBURGER: OK, good. 336 00:16:51,758 --> 00:16:55,506 And have you seen databases like that? 337 00:16:55,506 --> 00:16:57,130 AUDIENCE: Several of them have come up. 338 00:16:57,130 --> 00:16:57,630 DOUG LAUFFENBURGER: OK. 339 00:16:57,630 --> 00:16:59,580 Are you the only one who's seen them? 340 00:16:59,580 --> 00:17:01,306 Or is there anybody else that kind 341 00:17:01,306 --> 00:17:03,426 of noticed them in passing too? 342 00:17:03,426 --> 00:17:04,099 OK, good. 343 00:17:04,099 --> 00:17:06,490 Second, third, fourth, all right. 344 00:17:06,490 --> 00:17:10,480 That's a critical mass if I ever saw one. 345 00:17:10,480 --> 00:17:11,010 OK. 346 00:17:11,010 --> 00:17:16,980 So, I'm just going to allude to those. 347 00:17:16,980 --> 00:17:20,520 So there are pathway databases. 348 00:17:20,520 --> 00:17:23,282 And this is actually an old slide of a few years 349 00:17:23,282 --> 00:17:25,240 ago, so I'm sure the numbers are all different. 350 00:17:25,240 --> 00:17:26,640 And, in fact, there's new ones. 351 00:17:26,640 --> 00:17:29,080 I just haven't taken to updating the slide. 352 00:17:29,080 --> 00:17:33,540 But we'll, based on literature, take certain numbers 353 00:17:33,540 --> 00:17:36,420 of gene products, a few hundred of them, 354 00:17:36,420 --> 00:17:41,760 and organize them into pathways based on biological knowledge. 
355 00:17:41,760 --> 00:17:45,440 There's other databases that are more interactomes, usually 356 00:17:45,440 --> 00:17:49,230 based on other kinds of experimental data-- yeast two- 357 00:17:49,230 --> 00:17:53,280 hybrid, mass spectrometry, literature curation, 358 00:17:53,280 --> 00:17:58,610 that also tries to say who's physically interacting. 359 00:17:58,610 --> 00:18:01,610 So these node-- these pathway databases don't necessarily 360 00:18:01,610 --> 00:18:04,080 say, somebody's physically interacting, 361 00:18:04,080 --> 00:18:06,890 they say somebody might be upstream and downstream 362 00:18:06,890 --> 00:18:08,580 and so forth. 363 00:18:08,580 --> 00:18:10,060 And then the interactome databases 364 00:18:10,060 --> 00:18:12,632 say, component a and component b, 365 00:18:12,632 --> 00:18:15,090 there's some evidence that they have a physical association 366 00:18:15,090 --> 00:18:16,230 someplace along the way. 367 00:18:16,230 --> 00:18:18,720 So these are two complementary types 368 00:18:18,720 --> 00:18:25,280 of databases that, in fact, can be put together. 369 00:18:25,280 --> 00:18:26,190 OK. 370 00:18:26,190 --> 00:18:29,360 So an interesting thing about these-- 371 00:18:29,360 --> 00:18:31,860 there's a number of these databases. 372 00:18:31,860 --> 00:18:34,200 And so in principle you could say, well 373 00:18:34,200 --> 00:18:37,110 if I want then to start-- if I want to generate a logic 374 00:18:37,110 --> 00:18:40,219 model for signaling networks, all 375 00:18:40,219 --> 00:18:42,010 I have to do is take what's in the database 376 00:18:42,010 --> 00:18:43,890 and say what pathways are there and what's 377 00:18:43,890 --> 00:18:46,190 known with their interactions, and now I've 378 00:18:46,190 --> 00:18:47,645 got a starting point. 379 00:18:47,645 --> 00:18:49,660 You know, I can actually draw a graph 380 00:18:49,660 --> 00:18:55,750 with lots of molecular nodes and lots of molecular interactions.
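One way to picture putting the two database types together is a single graph whose edges remember where they came from. This is only a sketch under stated assumptions: the component names `A` through `E` and both record lists are invented placeholders, and real pathway or interactome databases have their own formats that are not modeled here.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only.
PATHWAY_EDGES = [("A", "B"), ("B", "C"), ("C", "D")]      # directed: upstream -> downstream
INTERACTOME_PAIRS = [("B", "E"), ("E", "C"), ("B", "C")]  # undirected physical association


def build_prior_graph(pathway_edges, interactome_pairs):
    """Merge both sources into graph[src][dst] = set of evidence labels."""
    graph = defaultdict(dict)
    for src, dst in pathway_edges:
        graph[src].setdefault(dst, set()).add("pathway")
    for a, b in interactome_pairs:
        # A physical association has no direction, so record it both ways.
        for src, dst in ((a, b), (b, a)):
            graph[src].setdefault(dst, set()).add("physical")
    return graph


if __name__ == "__main__":
    graph = build_prior_graph(PATHWAY_EDGES, INTERACTOME_PAIRS)
    for src in sorted(graph):
        for dst in sorted(graph[src]):
            print(src, "->", dst, sorted(graph[src][dst]))
```

Keeping the evidence labels on each edge makes it possible to see later which links are supported by both kinds of source and which by only one.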
381 00:18:55,750 --> 00:18:57,680 So, you can do that. 382 00:18:57,680 --> 00:19:01,240 And so you can choose one of these databases 383 00:19:01,240 --> 00:19:03,450 and say I'm going to draw a graph that 384 00:19:03,450 --> 00:19:07,890 has what's believed to be true about nodes and pathways 385 00:19:07,890 --> 00:19:10,730 and interactions and signaling networks. 386 00:19:10,730 --> 00:19:13,870 But then you choose a different database and another database. 387 00:19:13,870 --> 00:19:16,810 And you'll actually get different information. 388 00:19:16,810 --> 00:19:17,310 OK. 389 00:19:17,310 --> 00:19:19,143 We actually did a study on this-- I probably 390 00:19:19,143 --> 00:19:21,320 should have given you the citation of that-- that 391 00:19:21,320 --> 00:19:23,770 said if you look at six or seven of these databases, 392 00:19:23,770 --> 00:19:25,470 they are not coincident. 393 00:19:25,470 --> 00:19:28,770 They have very small intersections. 394 00:19:28,770 --> 00:19:32,970 Most of their information is non-redundant. 395 00:19:32,970 --> 00:19:35,189 And so you could try to put it all together. 396 00:19:35,189 --> 00:19:36,730 And we did this, again, in this paper 397 00:19:36,730 --> 00:19:39,760 that I'm not giving you a citation for. 398 00:19:39,760 --> 00:19:42,420 And so here's a number of nodes and signaling pathways 399 00:19:42,420 --> 00:19:46,680 downstream of receptors. 400 00:19:46,680 --> 00:19:50,940 And all the colored nodes are those 401 00:19:50,940 --> 00:19:54,120 in which they appear in only one of these-- one, two, 402 00:19:54,120 --> 00:19:57,460 three, four, five, six databases. 403 00:19:57,460 --> 00:20:00,580 So if something's colored green, it's only in GeneGo 404 00:20:00,580 --> 00:20:02,175 and it's not in any of the others. 405 00:20:02,175 --> 00:20:03,710 If something's colored purple, it's 406 00:20:03,710 --> 00:20:06,180 in PANTHER and none of the others. 407 00:20:06,180 --> 00:20:07,240 OK.
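The coloring scheme just described amounts to counting, for each node, how many databases contain it. A minimal sketch, assuming each database has already been reduced to a set of node names; the database labels `db_a`, `db_b`, `db_c` and their contents are invented toy values, not the contents of GeneGo, PANTHER, or any real resource.

```python
from collections import Counter

# Toy stand-ins for the node sets of three databases (invented values).
DATABASES = {
    "db_a": {"n1", "n2", "n3"},
    "db_b": {"n1", "n4"},
    "db_c": {"n1", "n2", "n5"},
}


def membership_counts(databases):
    """How many databases contain each node."""
    return Counter(node for nodes in databases.values() for node in nodes)


def unique_to(databases, name):
    """Nodes found in one database and in none of the others (a 'colored' node)."""
    others = set().union(*(nodes for n, nodes in databases.items() if n != name))
    return databases[name] - others


def in_all(databases):
    """Nodes present in every database."""
    return set.intersection(*databases.values())


if __name__ == "__main__":
    print(membership_counts(DATABASES))
    print({name: unique_to(DATABASES, name) for name in DATABASES})
    print(in_all(DATABASES))
```

The same three functions work on sets of interactions if each interaction is represented as a hashable pair of node names.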
408 00:20:07,240 --> 00:20:09,280 If they're gray-- some of these gray ones, 409 00:20:09,280 --> 00:20:11,630 they're in at least two. 410 00:20:11,630 --> 00:20:15,270 But out of these six, there's an exceedingly small number 411 00:20:15,270 --> 00:20:18,360 of nodes and interactions that are in all six databases. 412 00:20:18,360 --> 00:20:22,620 Which was a real surprise to us when we did this. 413 00:20:22,620 --> 00:20:24,080 So what this means is, if you want 414 00:20:24,080 --> 00:20:27,400 to start with some prior knowledge graph 415 00:20:27,400 --> 00:20:30,090 that you're now going to fit a logic model to by mapping it 416 00:20:30,090 --> 00:20:32,540 against data, you first even have the choice, well, 417 00:20:32,540 --> 00:20:33,790 what am I going to start with? 418 00:20:33,790 --> 00:20:35,520 What is my prior knowledge? 419 00:20:35,520 --> 00:20:38,540 There's not really consensus prior knowledge. 420 00:20:38,540 --> 00:20:42,810 So you can start with six different interaction graphs. 421 00:20:42,810 --> 00:20:44,530 Or you could try to put them all together 422 00:20:44,530 --> 00:20:46,570 and get a consensus graph. 423 00:20:46,570 --> 00:20:48,600 So you have all these choices. 424 00:20:48,600 --> 00:20:50,860 And right now, it's not as if there's 425 00:20:50,860 --> 00:20:53,640 detailed analysis of what the best choice would 426 00:20:53,640 --> 00:20:57,150 be for your starting point. 427 00:20:57,150 --> 00:20:59,990 But I want to stress that, with respect to our approach, 428 00:20:59,990 --> 00:21:04,090 this is a starting point because one 429 00:21:04,090 --> 00:21:07,170 of the issues with the database information 430 00:21:07,170 --> 00:21:10,990 is that it's typically very diverse with respect 431 00:21:10,990 --> 00:21:12,560 to context. 432 00:21:12,560 --> 00:21:15,020 What cell type did this information come from? 433 00:21:15,020 --> 00:21:18,310 What treatment conditions did it come from?
434 00:21:18,310 --> 00:21:21,240 If there's different cell types, different species, 435 00:21:21,240 --> 00:21:22,780 different mutations. 436 00:21:22,780 --> 00:21:28,080 So if I see interactions or if I don't see interactions, 437 00:21:28,080 --> 00:21:29,040 are they in conflict? 438 00:21:29,040 --> 00:21:31,650 Or they're just-- this one was in a lymphocyte, 439 00:21:31,650 --> 00:21:33,330 this one was in a hepatocyte, this one 440 00:21:33,330 --> 00:21:38,500 was in a cardiac myocyte, and they're actually different. 441 00:21:38,500 --> 00:21:42,380 OK, so if I had a cell type specific database, 442 00:21:42,380 --> 00:21:44,520 or pulled that information out, that would be good. 443 00:21:44,520 --> 00:21:46,202 It would be a smaller number of things. 444 00:21:46,202 --> 00:21:47,910 But then under what treatment conditions? 445 00:21:47,910 --> 00:21:50,795 Because remember I said starting with the genomic content, what 446 00:21:50,795 --> 00:21:52,920 you actually see in terms of molecular interactions 447 00:21:52,920 --> 00:21:55,580 will be very strongly affected by what 448 00:21:55,580 --> 00:21:57,200 matrix were the cells growing on? 449 00:21:57,200 --> 00:21:58,820 Or was this in vivo? 450 00:21:58,820 --> 00:22:03,750 Was this in a multicellular culture situation? 451 00:22:03,750 --> 00:22:06,610 So, that's why this is a starting point 452 00:22:06,610 --> 00:22:09,880 and can't really be used to describe 453 00:22:09,880 --> 00:22:14,279 any particular experimental situation with much confidence. 454 00:22:14,279 --> 00:22:15,695 The other thing-- and this is what 455 00:22:15,695 --> 00:22:17,290 I've been trying to emphasize from the start-- is 456 00:22:17,290 --> 00:22:19,320 that there's no calculation you can do on this.
457 00:22:22,380 --> 00:22:27,260 There's a group of folks in this field who 458 00:22:27,260 --> 00:22:31,270 propose some ideas that I think are very intriguing, 459 00:22:31,270 --> 00:22:33,000 but which, at least to me personally, 460 00:22:33,000 --> 00:22:35,450 there's not that much evidence for. 461 00:22:35,450 --> 00:22:41,040 And that is, that there's topological characteristics 462 00:22:41,040 --> 00:22:45,520 of these graphs, that then tell you what's important. 463 00:22:45,520 --> 00:22:47,330 So if I have a node that's somehow 464 00:22:47,330 --> 00:22:49,760 connected to more other nodes, that 465 00:22:49,760 --> 00:22:51,630 is going to be a more important node, 466 00:22:51,630 --> 00:22:54,380 and might be associated with the disease, versus a node that's 467 00:22:54,380 --> 00:22:55,930 connected to fewer. 468 00:22:55,930 --> 00:22:56,430 OK. 469 00:22:56,430 --> 00:22:59,810 Some of these are very, very appealing ideas conceptually. 470 00:22:59,810 --> 00:23:03,740 If you actually look for the experimental evidence 471 00:23:03,740 --> 00:23:07,450 that they're valid notions, it's very thin. 472 00:23:07,450 --> 00:23:09,830 But, that's where some folks would claim, 473 00:23:09,830 --> 00:23:12,950 oh, you can do predictions on the hypotheses based 474 00:23:12,950 --> 00:23:15,990 on these graphs because there are these graph theory 475 00:23:15,990 --> 00:23:17,630 characteristics that somehow might 476 00:23:17,630 --> 00:23:20,480 be biologically meaningful. 477 00:23:20,480 --> 00:23:21,090 OK. 478 00:23:21,090 --> 00:23:24,240 But I'd say, jury's out on whether, in fact, any of that 479 00:23:24,240 --> 00:23:26,130 is true. 480 00:23:26,130 --> 00:23:30,690 So, our view is-- OK, this is a good starting point, 481 00:23:30,690 --> 00:23:33,960 but in fact, needs to be mapped to empirical data 482 00:23:33,960 --> 00:23:39,540 in order to gain confidence about calculations you can do. 
483 00:23:39,540 --> 00:23:44,200 So that's the goal of this kind of approach, 484 00:23:44,200 --> 00:23:48,870 is to say, let's stipulate that we start 485 00:23:48,870 --> 00:23:51,690 with some prior knowledge scaffold. 486 00:23:51,690 --> 00:23:54,710 This particular one is from the Ingenuity database. 487 00:23:54,710 --> 00:23:56,850 You could get one from any other database. 488 00:23:56,850 --> 00:24:00,160 You could get a consensus one from three or four if you want. 489 00:24:00,160 --> 00:24:03,590 And so it has, up here, extracellular stimuli, 490 00:24:03,590 --> 00:24:06,200 growth factors, cytokines. 491 00:24:06,200 --> 00:24:11,150 They're connected in their interactome to receptors. 492 00:24:11,150 --> 00:24:14,810 They're connected to scaffolding proteins and signaling proteins 493 00:24:14,810 --> 00:24:17,110 and kinases and so forth. 494 00:24:17,110 --> 00:24:19,520 They're connected to transcription factors, 495 00:24:19,520 --> 00:24:20,970 metabolic enzymes. 496 00:24:20,970 --> 00:24:22,880 So you can draw this graph. 497 00:24:22,880 --> 00:24:25,675 Say this might be what's going on in my cell. 498 00:24:25,675 --> 00:24:27,550 And then what we'd like to do is to turn this 499 00:24:27,550 --> 00:24:31,970 into a formal logic framework that's 500 00:24:31,970 --> 00:24:36,340 capable of then fitting experimental data, 501 00:24:36,340 --> 00:24:38,380 predicting new experimental data, 502 00:24:38,380 --> 00:24:41,990 and giving you a chance at biological hypothesis 503 00:24:41,990 --> 00:24:43,512 and testing. 504 00:24:43,512 --> 00:24:46,250 All right, so conceptually you get it? 505 00:24:46,250 --> 00:24:50,240 Two aspects-- some kind of starting prior knowledge, 506 00:24:50,240 --> 00:24:53,230 that's kind of a scaffold, a graph, for your network. 507 00:24:53,230 --> 00:24:56,450 And now you're going to turn it into a computable logic model 508 00:24:56,450 --> 00:25:00,720 by mapping it against empirical data.
509 00:25:00,720 --> 00:25:04,960 So, merely what it takes is the kind of conceptual diagram 510 00:25:04,960 --> 00:25:08,280 you see in any cell biology paper, any signaling paper, 511 00:25:08,280 --> 00:25:13,140 that says, well, a and b both influence e positively, 512 00:25:13,140 --> 00:25:17,880 and b influences f negatively, and c influences f positively. 513 00:25:17,880 --> 00:25:19,780 Then there's a feedback from g to a. 514 00:25:19,780 --> 00:25:20,980 That's inhibitory. 515 00:25:20,980 --> 00:25:22,800 You can draw those. 516 00:25:22,800 --> 00:25:27,770 But now, how do you turn it into a computable algorithm? 517 00:25:27,770 --> 00:25:30,610 So, what I'm going to spend most of the day on is, 518 00:25:30,610 --> 00:25:33,680 just conversion of this to a Boolean logic 519 00:25:33,680 --> 00:25:38,870 model that any one of these interactions is and-- a and b 520 00:25:38,870 --> 00:25:41,870 being active makes e active. 521 00:25:41,870 --> 00:25:44,770 c being active, but b not being active, 522 00:25:44,770 --> 00:25:47,220 allows f to be active, and so forth. 523 00:25:47,220 --> 00:25:49,320 You turn these into formal logic statements 524 00:25:49,320 --> 00:25:50,640 that you can compute on. 525 00:25:50,640 --> 00:25:52,720 At the very end, if we have time, 526 00:25:52,720 --> 00:25:55,880 I'll show how to relax this from a Boolean framework that's just 527 00:25:55,880 --> 00:25:59,450 on off, to something that can be more quantitative. 528 00:26:02,490 --> 00:26:02,990 All right. 529 00:26:02,990 --> 00:26:04,050 So that's the notion. 530 00:26:04,050 --> 00:26:07,400 Now what I'm going to do for the rest of the time 531 00:26:07,400 --> 00:26:10,820 is go through the specific example paper that says, OK, 532 00:26:10,820 --> 00:26:13,665 how do we in fact do this? 533 00:26:13,665 --> 00:26:16,440 What is a way to accomplish this? 
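A minimal sketch of the Boolean conversion described above, in Python. The two rules (a AND b activates e; c AND NOT b activates f) are the ones stated in the lecture. The function name `step` and the synchronous update are assumptions, and the inhibitory feedback from g to a is left out because the lecture does not say what drives g.

```python
def step(state):
    """Apply the two Boolean rules once to a dict of node -> bool.

    Input nodes a, b, c are treated as fixed stimuli or inhibitors;
    only the downstream nodes e and f are recomputed.
    """
    new_state = dict(state)
    new_state["e"] = state["a"] and state["b"]      # a AND b -> e
    new_state["f"] = state["c"] and not state["b"]  # c AND NOT b -> f
    return new_state


# Example: b is active, so e turns on and f is blocked.
with_b = step({"a": True, "b": True, "c": True, "e": False, "f": False})

# Example: b is inactive, so e stays off and c alone activates f.
without_b = step({"a": True, "b": False, "c": True, "e": False, "f": False})
```

Swapping the AND in the first rule for an OR gives a different candidate model over the same graph, which is why the graph alone does not fix the logic.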
534 00:26:16,440 --> 00:26:19,050 So now let's go back to a biological problem 535 00:26:19,050 --> 00:26:21,686 where there's going to be empirical, experimental data 536 00:26:21,686 --> 00:26:23,310 that we're now going to map against one 537 00:26:23,310 --> 00:26:27,070 of these prior knowledge interactome graphs. 538 00:26:27,070 --> 00:26:30,080 This particular study-- this was done with Peter Sorger, who's 539 00:26:30,080 --> 00:26:35,380 now at Harvard Medical School-- had to do with liver cells. 540 00:26:35,380 --> 00:26:39,720 Liver cancer-- you'll see some application of that 541 00:26:39,720 --> 00:26:44,450 at the end-- that says we have liver cell hepatocytes. 542 00:26:44,450 --> 00:26:47,780 And we want to know how they respond to different growth 543 00:26:47,780 --> 00:26:50,770 factors and cytokines in their environment. 544 00:26:50,770 --> 00:26:52,960 How that'll change their proliferation or death? 545 00:26:52,960 --> 00:26:56,600 Or the inflammatory cytokines that they produce. 546 00:26:56,600 --> 00:26:59,600 And we'd like to take-- this is just a pictorial diagram that 547 00:26:59,600 --> 00:27:02,250 could be in any cell biology paper, 548 00:27:02,250 --> 00:27:04,200 and make this calculable. 549 00:27:04,200 --> 00:27:09,040 So we could say what's different from a primary normal 550 00:27:09,040 --> 00:27:11,470 hepatocyte liver cell that's not cancerous? 551 00:27:11,470 --> 00:27:13,780 It might have a signaling logic. 552 00:27:13,780 --> 00:27:15,530 But if then we compare the signaling logic 553 00:27:15,530 --> 00:27:20,030 to a liver tumor cell type, or four different liver tumor cell 554 00:27:20,030 --> 00:27:21,815 types, what's different?
555 00:27:21,815 --> 00:27:25,270 If we can find some logic that's different for the tumor cell 556 00:27:25,270 --> 00:27:29,490 lines versus the normal primary lines-- some logic from here 557 00:27:29,490 --> 00:27:32,610 to there or to there-- that now tells you biologically, where 558 00:27:32,610 --> 00:27:34,110 the differences might be that have 559 00:27:34,110 --> 00:27:37,040 arisen from the genetic mutations. 560 00:27:37,040 --> 00:27:38,870 And where good drug targets might be, 561 00:27:38,870 --> 00:27:42,000 or predictions if I intervene here, 562 00:27:42,000 --> 00:27:44,420 if there's no difference in that logic, between normal 563 00:27:44,420 --> 00:27:46,632 and tumor, well then that won't have any effect. 564 00:27:46,632 --> 00:27:48,090 I want to look for the places where 565 00:27:48,090 --> 00:27:50,610 there is a difference in the signaling logic. 566 00:27:50,610 --> 00:27:53,350 And that would be a better drug target. 567 00:27:53,350 --> 00:27:55,810 OK, so the measurements are made 568 00:27:55,810 --> 00:28:01,860 across 17 of these different signaling molecules 569 00:28:01,860 --> 00:28:05,020 here, pretty much all by measurement 570 00:28:05,020 --> 00:28:06,870 of a phosphorylation state. 571 00:28:06,870 --> 00:28:09,280 So if you've done cell biology or biochemistry-- 572 00:28:09,280 --> 00:28:11,040 in these signaling pathways, many 573 00:28:11,040 --> 00:28:13,890 of the activities in these kinds of pathways 574 00:28:13,890 --> 00:28:16,680 that regulate this kind of cell behavior 575 00:28:16,680 --> 00:28:19,680 are kinases that end up affecting transcription factor 576 00:28:19,680 --> 00:28:21,250 activities and so forth. 577 00:28:21,250 --> 00:28:23,160 And it's the phosphorylation state 578 00:28:23,160 --> 00:28:25,335 of any of these proteins that matters. 579 00:28:25,335 --> 00:28:29,760 If a phosphate is on some particular amino acid, 580 00:28:29,760 --> 00:28:31,340 the enzyme might be active.
581 00:28:31,340 --> 00:28:33,950 If it's not there it might be inactive and so forth. 582 00:28:33,950 --> 00:28:36,075 So, just measurement of phosphorylation states 583 00:28:36,075 --> 00:28:38,620 of 17 different proteins in these pathways 584 00:28:38,620 --> 00:28:42,550 distributed across multiple pathways. 585 00:28:42,550 --> 00:28:45,420 I've made these measurements on five different cell types, four 586 00:28:45,420 --> 00:28:47,949 tumor cell types, and the primaries 587 00:28:47,949 --> 00:28:50,240 in order to try to see what's different between primary 588 00:28:50,240 --> 00:28:51,820 and tumor. 589 00:28:51,820 --> 00:28:55,970 And then what might be different, patient to patient. 590 00:28:55,970 --> 00:28:59,500 In response to seven different extracellular stimuli, 591 00:28:59,500 --> 00:29:01,340 some of them growth factors, some of them 592 00:29:01,340 --> 00:29:06,850 cytokines, some of them actually bacterial metabolic products. 593 00:29:06,850 --> 00:29:10,780 We all know about the effects of the microbiome these days. 594 00:29:10,780 --> 00:29:14,050 And, to further populate a database that 595 00:29:14,050 --> 00:29:16,570 might be capable of helping validate 596 00:29:16,570 --> 00:29:20,856 a model, a number of-- seven, in fact-- intracellular 597 00:29:20,856 --> 00:29:21,355 inhibitors. 598 00:29:21,355 --> 00:29:23,990 A small molecule, these things in black. 599 00:29:23,990 --> 00:29:25,680 One that might inhibit this kinase. 600 00:29:25,680 --> 00:29:27,350 One might inhibit that kinase. 601 00:29:27,350 --> 00:29:29,420 One might inhibit that kinase. 602 00:29:29,420 --> 00:29:31,950 So now if you add all those inhibitors too, 603 00:29:31,950 --> 00:29:34,310 then you start to change the network activities 604 00:29:34,310 --> 00:29:38,160 and the downstream behavior. 605 00:29:38,160 --> 00:29:40,440 So that's how extensive the data is. 606 00:29:40,440 --> 00:29:45,910 And this is actually for a few different time points.
607 00:29:45,910 --> 00:29:48,190 So the data looks something like this. 608 00:29:48,190 --> 00:29:50,230 Let's focus on the one on the left. 609 00:29:50,230 --> 00:29:53,180 This is just the primary, normal, human cells. 610 00:29:53,180 --> 00:29:56,940 It came from a liver donor. 611 00:29:56,940 --> 00:29:57,720 OK. 612 00:29:57,720 --> 00:30:01,890 Each row is one of the 17 different signals, 613 00:30:01,890 --> 00:30:04,650 essentially measurement of the phosphorylation 614 00:30:04,650 --> 00:30:12,150 state of Akt or CREB or p53 or STAT3. 615 00:30:12,150 --> 00:30:12,650 OK? 616 00:30:12,650 --> 00:30:14,270 So measurements of its phosphorylation 617 00:30:14,270 --> 00:30:17,740 state that has something to do with its signaling activity. 618 00:30:17,740 --> 00:30:20,620 Each of the big columns are the seven different treatments-- 619 00:30:20,620 --> 00:30:24,780 the different growth factors and cytokines and so forth. 620 00:30:24,780 --> 00:30:25,730 And the control. 621 00:30:25,730 --> 00:30:27,580 No stimulation. 622 00:30:27,580 --> 00:30:31,950 And within each one of these treatments, 623 00:30:31,950 --> 00:30:34,270 in each one of these stimuli, then there's 624 00:30:34,270 --> 00:30:35,770 seven different inhibitors that were 625 00:30:35,770 --> 00:30:38,010 used for the different pathways. 626 00:30:38,010 --> 00:30:43,970 So seven stimuli by seven inhibitors plus controls. 627 00:30:43,970 --> 00:30:46,480 And then three different time points. 628 00:30:46,480 --> 00:30:49,880 Sort of zero, 30 minutes, and three hours. 629 00:30:49,880 --> 00:30:52,520 So the data looks something like this. 630 00:30:52,520 --> 00:30:56,230 If there's really no change, due to the stimulation 631 00:30:56,230 --> 00:30:58,550 or the inhibitor, you'll see something in gray.
632 00:30:58,550 --> 00:31:01,080 So in these gray bars, there was already 633 00:31:01,080 --> 00:31:03,840 phosphorylation of this transcription factor 634 00:31:03,840 --> 00:31:07,980 [INAUDIBLE] and it didn't really change under most treatments. 635 00:31:07,980 --> 00:31:11,550 If it was yellow, what it meant was, 636 00:31:11,550 --> 00:31:15,140 whatever the treatment was, you got a quick activation 637 00:31:15,140 --> 00:31:19,080 of that signal and then it went away. 638 00:31:19,080 --> 00:31:22,740 If it's late-- purple, then it didn't happen in the first half 639 00:31:22,740 --> 00:31:25,500 hour, but it started to show up a few hours later. 640 00:31:25,500 --> 00:31:28,310 And if it's green it showed up in the first half hour 641 00:31:28,310 --> 00:31:29,450 and it stayed sustained. 642 00:31:29,450 --> 00:31:30,741 So that's what the color means. 643 00:31:30,741 --> 00:31:33,240 But this is the real experimental data. 644 00:31:33,240 --> 00:31:36,720 And over here on the right is one of the tumor cell lines. 645 00:31:36,720 --> 00:31:39,010 And you can just see by inspection, 646 00:31:39,010 --> 00:31:40,300 it's different, right. 647 00:31:40,300 --> 00:31:42,730 The colors here are different from the colors there. 648 00:31:42,730 --> 00:31:45,470 All the same treatments, stimuli inhibitors. 649 00:31:45,470 --> 00:31:46,809 The colors are very different. 650 00:31:46,809 --> 00:31:48,850 You know, therefore that the signaling activities 651 00:31:48,850 --> 00:31:50,080 are very different. 652 00:31:50,080 --> 00:31:50,580 OK. 653 00:31:50,580 --> 00:31:53,760 Just by visual inspection. 654 00:31:53,760 --> 00:31:54,260 OK. 655 00:31:54,260 --> 00:31:56,030 So what we're going to try to do is build a logic model 656 00:31:56,030 --> 00:31:56,780 for this. 657 00:31:56,780 --> 00:31:58,580 A logic model for this. 
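A minimal sketch of the color coding just described, in Python. The four classes (gray for no change, yellow for early and transient, purple for late, green for early and sustained) and the three time points come from the lecture. The function name `classify`, the use of a difference from the zero time point, and the threshold value are assumptions made for illustration.

```python
def classify(t0, t30, t180, threshold=0.5):
    """Map one normalized time course (0 min, 30 min, 3 h) to a color class.

    `threshold` is a hypothetical cutoff on the change relative to t0.
    """
    early = (t30 - t0) > threshold   # activated within the first half hour
    late = (t180 - t0) > threshold   # activated at three hours
    if early and late:
        return "green"    # showed up early and stayed sustained
    if early:
        return "yellow"   # quick activation that then went away
    if late:
        return "purple"   # only showed up a few hours later
    return "gray"         # no real change under this treatment


# One cell of the heat map per (signal, stimulus, inhibitor) combination.
transient = classify(0.1, 0.9, 0.1)
sustained = classify(0.1, 0.9, 0.8)
delayed = classify(0.1, 0.2, 0.9)
unchanged = classify(0.8, 0.8, 0.8)
```

Note that a signal that is already phosphorylated at time zero and stays that way comes out gray, which matches the lecture's description of the gray bars.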
658 00:31:58,580 --> 00:32:01,700 Compare them and say, oh where are the key differences in how 659 00:32:01,700 --> 00:32:03,930 the signaling pathways are getting activated? 660 00:32:03,930 --> 00:32:07,740 Downstream of the same stimuli. 661 00:32:07,740 --> 00:32:10,570 So, we start with our prior knowledge. 662 00:32:10,570 --> 00:32:15,260 This is from the Ingenuity database, which actually 663 00:32:15,260 --> 00:32:19,130 happened to be missing, even basic information 664 00:32:19,130 --> 00:32:20,540 about insulin signaling. 665 00:32:20,540 --> 00:32:22,460 So we just added our own information 666 00:32:22,460 --> 00:32:24,902 about what the insulin receptor does. 667 00:32:24,902 --> 00:32:26,110 It's kind of hard to believe. 668 00:32:26,110 --> 00:32:27,901 This is a database that cost a lot of money 669 00:32:27,901 --> 00:32:29,890 and they didn't have really much information 670 00:32:29,890 --> 00:32:31,720 about insulin receptor signaling. 671 00:32:31,720 --> 00:32:34,190 Very strange. 672 00:32:34,190 --> 00:32:38,330 So, downstream of our seven stimuli, 673 00:32:38,330 --> 00:32:40,640 down to the transcription factors of interest, 674 00:32:40,640 --> 00:32:44,390 there are about 82 molecular nodes and a hundred some edges 675 00:32:44,390 --> 00:32:46,530 that you'd pull out of the Ingenuity database. 676 00:32:46,530 --> 00:32:50,180 So here's our starting guess at what this looks like. 677 00:32:50,180 --> 00:32:53,200 There's no logic in here, but this is just, potentially, 678 00:32:53,200 --> 00:32:55,760 the things that the logic might operate on, 679 00:32:55,760 --> 00:33:00,286 downstream of stimuli, and when inhibited, and so forth. 680 00:33:00,286 --> 00:33:01,240 All right. 681 00:33:01,240 --> 00:33:02,500 So here's the process. 682 00:33:02,500 --> 00:33:04,900 This was the actual algorithmic process 683 00:33:04,900 --> 00:33:07,730 that I'll walk you through. 
684 00:33:07,730 --> 00:33:10,870 On the left-hand side is the computer part. 685 00:33:10,870 --> 00:33:13,310 It said, OK, from the Ingenuity database, 686 00:33:13,310 --> 00:33:16,620 we had this prior knowledge about who 687 00:33:16,620 --> 00:33:20,110 was upstream, downstream, who affected whom. 688 00:33:20,110 --> 00:33:22,810 We strip this down some, because in terms 689 00:33:22,810 --> 00:33:25,610 of the measurements on the perturbations, 690 00:33:25,610 --> 00:33:28,400 there are some of the nodes that you just would not 691 00:33:28,400 --> 00:33:30,190 be able to see any measurable difference. 692 00:33:30,190 --> 00:33:36,550 OK, there was no stimulus upstream, or no perturbation. 693 00:33:36,550 --> 00:33:38,220 And it was not measured so you really 694 00:33:38,220 --> 00:33:40,136 wouldn't be able to tell if it changed or not. 695 00:33:40,136 --> 00:33:42,070 So you just take those out. 696 00:33:42,070 --> 00:33:45,070 Of everything remaining, now you don't know the logic. 697 00:33:45,070 --> 00:33:46,210 You know the potential. 698 00:33:46,210 --> 00:33:48,730 And so you say, well, of all the nodes and interactions 699 00:33:48,730 --> 00:33:52,320 remaining, I could have AND gates, I could have OR gates, 700 00:33:52,320 --> 00:33:53,690 you could have NOTS. 701 00:33:53,690 --> 00:33:56,000 And you say, OK, in principle, I could have, 702 00:33:56,000 --> 00:34:00,560 then, many, many, many, many, many different logic 703 00:34:00,560 --> 00:34:06,050 models that could work. 704 00:34:06,050 --> 00:34:07,424 So how do I know which one? 705 00:34:07,424 --> 00:34:09,090 Well now you skip over to the other side 706 00:34:09,090 --> 00:34:11,520 and say, well, but we have all this experimental data. 707 00:34:11,520 --> 00:34:13,610 We have the data from all the different stimuli 708 00:34:13,610 --> 00:34:17,500 and all the different inhibitors for any given cell type. 
709 00:34:17,500 --> 00:34:21,033 And so, we have that data under all these different conditions. 710 00:34:21,033 --> 00:34:22,449 And what we're going to do is just 711 00:34:22,449 --> 00:34:28,270 run hundreds or thousands of these potentially appropriate 712 00:34:28,270 --> 00:34:29,980 models. 713 00:34:29,980 --> 00:34:33,320 Compare them to the data of whether any given node is 714 00:34:33,320 --> 00:34:36,060 activated or not activated under treatment conditions, 715 00:34:36,060 --> 00:34:38,030 stimuli, inhibitors. 716 00:34:38,030 --> 00:34:39,920 And we'll calculate the error. 717 00:34:39,920 --> 00:34:41,800 How good was any one of those models 718 00:34:41,800 --> 00:34:44,460 at actually matching those data? 719 00:34:44,460 --> 00:34:46,396 Simple as that. 720 00:34:46,396 --> 00:34:47,770 And then it's a matter of finding 721 00:34:47,770 --> 00:34:50,530 what are the best fit ones. From the best fit ones, 722 00:34:50,530 --> 00:34:52,860 could you improve them and make them fit even better? 723 00:34:52,860 --> 00:34:57,350 And in the end, how did you go from an initial prior knowledge 724 00:34:57,350 --> 00:35:02,200 scaffold to something that, in fact, fit the data really well, 725 00:35:02,200 --> 00:35:04,710 from which you could make new predictions. 726 00:35:04,710 --> 00:35:05,210 OK. 727 00:35:05,210 --> 00:35:09,370 So you get the approach here? 728 00:35:09,370 --> 00:35:12,460 All right, good. 729 00:35:12,460 --> 00:35:16,320 Now, in terms of figuring out how well any given 730 00:35:16,320 --> 00:35:20,630 model matches the data and how to go through model selection, 731 00:35:20,630 --> 00:35:24,059 there's a myriad of different approaches to this. 732 00:35:24,059 --> 00:35:25,600 And I'm not claiming that what we did 733 00:35:25,600 --> 00:35:28,730 was the absolute best approach.
734 00:35:28,730 --> 00:35:30,850 There's alternatives to it that one could consider 735 00:35:30,850 --> 00:35:33,560 and then perhaps could work even better. 736 00:35:33,560 --> 00:35:35,660 If you read the paper, you'll read the reasons 737 00:35:35,660 --> 00:35:37,215 for these choices. 738 00:35:37,215 --> 00:35:38,400 OK. 739 00:35:38,400 --> 00:35:40,450 So I'll let you do that. 740 00:35:40,450 --> 00:35:44,520 The way the model quality was calculated 741 00:35:44,520 --> 00:35:47,060 was to have an objective function that 742 00:35:47,060 --> 00:35:50,820 said we want to minimize some number, theta. 743 00:35:50,820 --> 00:35:52,590 And how do we calculate theta? 744 00:35:52,590 --> 00:35:56,100 Well, first of all, for whatever that model is, 745 00:35:56,100 --> 00:35:59,250 we're going to fit-- whether the model says some nodes 746 00:35:59,250 --> 00:36:02,690 should be on or off, one or zero. 747 00:36:02,690 --> 00:36:06,800 And we're going to compare it to the experimental data. 748 00:36:06,800 --> 00:36:10,220 Now the experimental data, I need to emphasize, 749 00:36:10,220 --> 00:36:12,850 isn't one or zero, it's normalized 750 00:36:12,850 --> 00:36:14,160 to go between one and zero. 751 00:36:14,160 --> 00:36:19,050 But the actual measurement might be 0.7 or 0.25. 752 00:36:19,050 --> 00:36:23,135 OK, so you're going to have error against the Boolean model 753 00:36:23,135 --> 00:36:25,260 even if all the edges are absolutely correct you're 754 00:36:25,260 --> 00:36:29,500 still going to get some quantitative error. 755 00:36:29,500 --> 00:36:30,920 So you calculate that. 756 00:36:30,920 --> 00:36:32,550 The Boolean model says zero or one. 757 00:36:32,550 --> 00:36:35,710 The experimental data says 0.25 or 0.7. 758 00:36:35,710 --> 00:36:39,680 And you say, OK, I'll calculate that.
759 00:36:39,680 --> 00:36:43,600 But then you might think, all right, well, 760 00:36:43,600 --> 00:36:46,280 somehow I've got to penalize bigger models with more 761 00:36:46,280 --> 00:36:49,160 nodes and more edges because surely the more nodes and edges 762 00:36:49,160 --> 00:36:53,390 I put in, I could capture more of the data. 763 00:36:53,390 --> 00:36:55,770 And I don't want to make the model infinitely large 764 00:36:55,770 --> 00:36:57,240 just to get the best fit. 765 00:36:57,240 --> 00:37:00,260 So I need to penalize that. 766 00:37:00,260 --> 00:37:04,760 Turns out it's not true, but nonetheless it's worth doing. 767 00:37:04,760 --> 00:37:08,180 So, you take a parameter that's the size of the model. 768 00:37:08,180 --> 00:37:10,020 It's basically just the number of nodes. 769 00:37:10,020 --> 00:37:12,350 The more nodes in it, the more you 770 00:37:12,350 --> 00:37:14,540 would be suspicious of the model for just fitting 771 00:37:14,540 --> 00:37:17,480 because it has too many components. 772 00:37:17,480 --> 00:37:23,400 And you multiply that size by a penalty parameter, alpha. 773 00:37:23,400 --> 00:37:25,450 So you have a bad objective function 774 00:37:25,450 --> 00:37:27,480 if there's a lot of error with the data, 775 00:37:27,480 --> 00:37:30,020 or if your model's too big. 776 00:37:30,020 --> 00:37:35,060 A better model would be, better fit to the data and smaller. 777 00:37:35,060 --> 00:37:38,560 That's the calculation. 778 00:37:38,560 --> 00:37:39,090 OK. 779 00:37:39,090 --> 00:37:42,250 And in the end-- and I'm going to show you how we did this. 780 00:37:45,190 --> 00:37:47,620 And I think the field is now really believing this. 781 00:37:47,620 --> 00:37:51,170 That what you're not after is a single best fit model. 782 00:37:51,170 --> 00:37:55,810 That one single model that gives you the very smallest theta.
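A minimal sketch of the objective function described above, in Python. The structure (a data-mismatch term plus a size term multiplied by a penalty parameter alpha) follows the lecture. The use of a mean squared error, the default value of `alpha`, and the function name `theta` are assumptions; the paper may normalize these terms differently.

```python
def theta(predicted, measured, model_size, alpha=0.01):
    """Score a Boolean model: lower is better.

    predicted  -- model outputs, each 0 or 1
    measured   -- normalized data, each between 0 and 1
    model_size -- number of nodes in the model (the lecture's size measure)
    alpha      -- hypothetical penalty weight on model size
    """
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    # Mismatch between Boolean predictions and continuous measurements.
    mse = sum((p - m) ** 2 for p, m in zip(predicted, measured)) / len(measured)
    # Bigger models pay a penalty proportional to their size.
    return mse + alpha * model_size


# The lecture's example values: the model says 1 and 0, the data say 0.7 and 0.25.
score_big = theta([1, 0], [0.7, 0.25], model_size=10)
score_small = theta([1, 0], [0.7, 0.25], model_size=5)
```

Even a model with every edge correct keeps a nonzero mismatch term here, because the predictions are 0 or 1 while the data are continuous.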
783 00:37:55,810 --> 00:37:58,250 Because honestly, within the uncertainty 784 00:37:58,250 --> 00:38:02,710 of the experimental data-- OK, there's 785 00:38:02,710 --> 00:38:04,480 a substantial number of models that 786 00:38:04,480 --> 00:38:09,080 could fit the data within that noise. 787 00:38:09,080 --> 00:38:11,210 So if you demanded the single best one, 788 00:38:11,210 --> 00:38:13,940 you say, well, but these other 50 actually 789 00:38:13,940 --> 00:38:16,759 fit it almost as good and within the uncertainty of the data. 790 00:38:16,759 --> 00:38:18,050 How can you really reject them? 791 00:38:18,050 --> 00:38:18,990 And you can't. 792 00:38:18,990 --> 00:38:23,090 So in the end, what's being striven for in most 793 00:38:23,090 --> 00:38:25,522 of the field is a family of models. 794 00:38:25,522 --> 00:38:26,980 And then you see what the consensus 795 00:38:26,980 --> 00:38:31,670 is and the differences within that family. 796 00:38:31,670 --> 00:38:35,180 The particular algorithm for generating and running 797 00:38:35,180 --> 00:38:38,240 through different potential models-- 798 00:38:38,240 --> 00:38:39,720 because you just can't exhaustively 799 00:38:39,720 --> 00:38:40,630 sample all of them. 800 00:38:40,630 --> 00:38:45,520 OK, these networks are so large, that you can't exhaustively 801 00:38:45,520 --> 00:38:48,640 test all possibilities of all their logic and so forth. 802 00:38:48,640 --> 00:38:50,720 It's really prohibitive. 803 00:38:50,720 --> 00:38:52,989 So there's many different ways you can go about it. 804 00:38:52,989 --> 00:38:54,780 This particular method-- maybe you've already 805 00:38:54,780 --> 00:38:57,780 learned this in class for other applications-- 806 00:38:57,780 --> 00:38:59,560 is a genetic algorithm. 807 00:38:59,560 --> 00:39:01,300 So you start with some population.
808 00:39:01,300 --> 00:39:03,350 You start with your Ingenuity scaffold 809 00:39:03,350 --> 00:39:07,790 and then you randomly remove or take edges and things 810 00:39:07,790 --> 00:39:08,300 like that. 811 00:39:08,300 --> 00:39:09,841 So that you've got a whole family 812 00:39:09,841 --> 00:39:11,870 that's slightly different. 813 00:39:11,870 --> 00:39:14,810 For each one of them you evaluate the objective function 814 00:39:14,810 --> 00:39:16,690 against the data. 815 00:39:16,690 --> 00:39:20,900 And you get some of those that then are the most attractive. 816 00:39:20,900 --> 00:39:22,910 They seem to be the best fit. 817 00:39:22,910 --> 00:39:26,580 But, by no means would you imagine they are yet optimal. 818 00:39:26,580 --> 00:39:30,530 So, now you create a next generation from this population 819 00:39:30,530 --> 00:39:34,907 by the analog of genetics. 820 00:39:34,907 --> 00:39:36,740 Some of the very best-- you say, OK, they're 821 00:39:36,740 --> 00:39:39,800 going to survive so I'm just going to take them as is. 822 00:39:39,800 --> 00:39:41,640 Some I'm going to mutate, I'm going 823 00:39:41,640 --> 00:39:44,830 to have a probability of mutating an edge here or there. 824 00:39:44,830 --> 00:39:46,970 You can have crossover, actually mating 825 00:39:46,970 --> 00:39:49,290 between one model and another model, 826 00:39:49,290 --> 00:39:50,890 so that the daughter model gets some 827 00:39:50,890 --> 00:39:53,140 of the arcs from the mother model and some of the arcs 828 00:39:53,140 --> 00:39:54,580 from the father model. 829 00:39:54,580 --> 00:39:57,920 So you just generate a next population, do it again. 830 00:39:57,920 --> 00:40:00,610 And once you've reached a set of models 831 00:40:00,610 --> 00:40:04,120 that fit your data within the criteria 832 00:40:04,120 --> 00:40:07,490 that you want, then you say, this is now my population. 833 00:40:07,490 --> 00:40:09,130 And these are now my best-fit models.
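A minimal sketch of a genetic algorithm of the kind described above, in Python. Survival of the best models, random mutation of edges, and crossover between a mother and a father model are from the lecture. Everything else is an assumption for illustration: each model is encoded as a tuple of bits marking which scaffold edges are kept, crossover is single-point, parents are drawn from the better half of the population, and `toy_score` with its hidden `TRUE_EDGES` stands in for the real objective function evaluated against data.

```python
import random


def evolve(score, n_edges, pop_size=30, n_elite=4, p_mut=0.05,
           generations=50, seed=0):
    """Minimize `score` over bit-tuples; return the final population, best first."""
    rng = random.Random(seed)
    # Initial population: random subsets of the scaffold's edges.
    pop = [tuple(rng.randint(0, 1) for _ in range(n_edges))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        elite = pop[:n_elite]                    # the very best survive as is
        children = []
        while len(children) < pop_size - n_elite:
            mother, father = rng.sample(pop[:pop_size // 2], 2)
            cut = rng.randrange(1, n_edges)      # single-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each edge with probability p_mut.
            child = tuple(b ^ 1 if rng.random() < p_mut else b for b in child)
            children.append(child)
        pop = elite + children
    return sorted(pop, key=score)


# Hypothetical stand-in for the data: the edge pattern the toy score rewards.
TRUE_EDGES = (1, 0, 1, 1, 0, 0, 1, 0)


def toy_score(model):
    """Mismatches against TRUE_EDGES plus a small penalty on model size."""
    mismatches = sum(m != t for m, t in zip(model, TRUE_EDGES))
    return mismatches + 0.01 * sum(model)


ranked = evolve(toy_score, n_edges=len(TRUE_EDGES))
best = ranked[0]
```

Because the search is stochastic and can settle in a local minimum, one would rerun `evolve` with several different `seed` values and compare the resulting populations rather than trust a single run.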
834 00:40:09,130 --> 00:40:11,849 So it's not exhaustive. 835 00:40:11,849 --> 00:40:13,640 You can definitely find local minima here. 836 00:40:13,640 --> 00:40:14,931 There's no question about that. 837 00:40:14,931 --> 00:40:15,608 Yeah? 838 00:40:15,608 --> 00:40:22,082 AUDIENCE: Do you always take the best model into the next round? 839 00:40:22,082 --> 00:40:23,078 Or do you-- 840 00:40:23,078 --> 00:40:25,474 DOUG LAUFFENBURGER: Yeah, that's the elite survival. 841 00:40:25,474 --> 00:40:26,890 If you don't incorporate that, you 842 00:40:26,890 --> 00:40:30,075 might lose the best ones in any given round. 843 00:40:30,075 --> 00:40:33,940 But this ensures you take the best subset. 844 00:40:33,940 --> 00:40:34,980 Let them go for it. 845 00:40:34,980 --> 00:40:36,438 AUDIENCE: Is there a worry that you 846 00:40:36,438 --> 00:40:38,141 might get stuck in [INAUDIBLE]? 847 00:40:38,141 --> 00:40:39,140 DOUG LAUFFENBURGER: Yes. 848 00:40:39,140 --> 00:40:41,750 Yes, absolutely. 849 00:40:41,750 --> 00:40:46,990 So now you run this with a number of different starting 850 00:40:46,990 --> 00:40:48,120 populations. 851 00:40:48,120 --> 00:40:52,010 And you see if you get to similar consensus models. 852 00:40:52,010 --> 00:40:54,030 Yeah, because absolutely, this does not 853 00:40:54,030 --> 00:40:56,240 guarantee any kind of a global minimum. 854 00:40:56,240 --> 00:40:58,260 You will always get local. 855 00:40:58,260 --> 00:41:01,622 So you have to condition it on a different set 856 00:41:01,622 --> 00:41:02,580 of initial populations. 857 00:41:06,660 --> 00:41:07,160 OK. 858 00:41:07,160 --> 00:41:10,730 Once you do this-- I'm going to show you some results 859 00:41:10,730 --> 00:41:15,810 first and then dig into some other ways to think about it. 860 00:41:15,810 --> 00:41:17,210 So it's plotted here. 861 00:41:17,210 --> 00:41:20,860 This is one of the tumor cell lines.
862 00:41:20,860 --> 00:41:24,160 What's plotted here, again: all the rows are all the signals 863 00:41:24,160 --> 00:41:25,310 that were measured. 864 00:41:25,310 --> 00:41:28,550 All the big columns are all the different stimuli, 865 00:41:28,550 --> 00:41:32,910 and all the little columns are the different inhibitors. 866 00:41:32,910 --> 00:41:36,920 And I should point out, this is only for the 30 minute data. 867 00:41:36,920 --> 00:41:37,420 OK. 868 00:41:37,420 --> 00:41:39,840 This isn't for the three hour or both, 869 00:41:39,840 --> 00:41:41,780 this is just the 30 minute data. 870 00:41:41,780 --> 00:41:45,450 And basically where there's green, 871 00:41:45,450 --> 00:41:53,530 the model and data fit was considered OK. 872 00:41:53,530 --> 00:41:55,820 Where it's red, it's not OK. 873 00:41:55,820 --> 00:41:59,240 Where it's pink it's less bad. 874 00:41:59,240 --> 00:42:01,010 So by the shading. 875 00:42:01,010 --> 00:42:03,950 And the yellow actually, the model 876 00:42:03,950 --> 00:42:05,897 really couldn't make a prediction. 877 00:42:05,897 --> 00:42:07,980 Now, why that's the case is what's showing up here 878 00:42:07,980 --> 00:42:10,300 is just the initial Ingenuity scaffold. 879 00:42:10,300 --> 00:42:14,200 The very best one that didn't add or remove any 880 00:42:14,200 --> 00:42:17,165 arcs or nodes from the Ingenuity prior knowledge. 881 00:42:17,165 --> 00:42:18,790 It's that all we're going to do is just 882 00:42:18,790 --> 00:42:23,130 run the best fit Boolean logic model we can on that. 883 00:42:23,130 --> 00:42:24,130 And it wasn't very good. 884 00:42:24,130 --> 00:42:27,110 It was about 45% error. 885 00:42:27,110 --> 00:42:30,020 Almost half of the nodes it got wrong.
886 00:42:30,020 --> 00:42:32,500 So what that tells you is, if you just take a scaffold from one 887 00:42:32,500 --> 00:42:37,890 of these interactome databases and, without adulterating it, 888 00:42:37,890 --> 00:42:41,190 just fit the best logic model to some data-- 889 00:42:41,190 --> 00:42:43,850 OK, at least in this case, and we've done a number of others, 890 00:42:43,850 --> 00:42:46,770 it actually doesn't fit very well. 891 00:42:46,770 --> 00:42:49,040 And the reason being, you're trying to fit this now 892 00:42:49,040 --> 00:42:52,670 to a very specific biological context. 893 00:42:52,670 --> 00:42:54,870 Hepatocyte tumor cells under these 894 00:42:54,870 --> 00:42:57,090 growth factor and cytokine treatments. 895 00:42:57,090 --> 00:42:58,990 That network is likely very different 896 00:42:58,990 --> 00:43:01,850 from whatever aggregate you got from literature curation 897 00:43:01,850 --> 00:43:03,510 and so forth in a database. 898 00:43:03,510 --> 00:43:05,980 There's going to be a lot of stuff in the database that's 899 00:43:05,980 --> 00:43:08,580 not applicable, because it came from a different cell 900 00:43:08,580 --> 00:43:10,950 type, a different condition, or there just 901 00:43:10,950 --> 00:43:15,210 weren't enough experiments in the literature for hepatocytes. 902 00:43:15,210 --> 00:43:17,160 Maybe it was never measured under treatment 903 00:43:17,160 --> 00:43:18,840 with interferon gamma. 904 00:43:18,840 --> 00:43:21,750 So there's data here that the database never 905 00:43:21,750 --> 00:43:23,960 had access to in the literature that it had explored. 906 00:43:23,960 --> 00:43:26,950 So lots of reasons. 907 00:43:26,950 --> 00:43:29,460 Now when you go through the processes we just talked about, 908 00:43:29,460 --> 00:43:33,020 and in the end, the best fit models 909 00:43:33,020 --> 00:43:35,370 give you something like less than 10% error. 910 00:43:35,370 --> 00:43:39,079 So less than 10% of these squares are red or pink.
911 00:43:39,079 --> 00:43:40,620 OK, so that's the kind of improvement 912 00:43:40,620 --> 00:43:44,420 that you can make by generating an improved model. 913 00:43:44,420 --> 00:43:48,050 By adding and subtracting arcs and nodes. 914 00:43:51,340 --> 00:43:54,820 So this is what the model looks like in the end for this tumor 915 00:43:54,820 --> 00:43:55,460 cell line. 916 00:43:55,460 --> 00:44:02,240 And this is a consensus model from the 20 or so best fit. 917 00:44:02,240 --> 00:44:05,470 And so the thickness of a line is 918 00:44:05,470 --> 00:44:08,050 how strong the consensus was. 919 00:44:08,050 --> 00:44:11,200 The strongest would be all 20 had it. 920 00:44:11,200 --> 00:44:16,470 And the point here being, you see some purple. 921 00:44:16,470 --> 00:44:18,810 And I wish my pen wasn't fading in and out. 922 00:44:18,810 --> 00:44:23,090 If anybody has a pointer I'll be happy to have it. 923 00:44:23,090 --> 00:44:26,450 Where you see purple, those were arcs that 924 00:44:26,450 --> 00:44:28,240 weren't in the Ingenuity database 925 00:44:28,240 --> 00:44:34,166 and had to be put in to get the data to fit this well. 926 00:44:34,166 --> 00:44:35,540 And it turns out, if you actually 927 00:44:35,540 --> 00:44:40,110 go back to the literature, you find that those purple arcs 928 00:44:40,110 --> 00:44:41,940 were already described in the literature. 929 00:44:41,940 --> 00:44:44,190 It's just that they weren't captured in that database. 930 00:44:48,850 --> 00:44:51,470 Well that's green and purple. 931 00:44:51,470 --> 00:44:56,820 Then you see some blue and they were 932 00:44:56,820 --> 00:44:58,880 in some of the other tumor cell types 933 00:44:58,880 --> 00:45:00,740 but not in this particular HepG2. 934 00:45:00,740 --> 00:45:04,530 But you can generate a model that works very well. 935 00:45:04,530 --> 00:45:08,990 And see that it's consistent with much of literature.
936 00:45:08,990 --> 00:45:11,880 It's more stripped down than what's in the databases. 937 00:45:11,880 --> 00:45:13,920 And there's some new things in it, that in fact, 938 00:45:13,920 --> 00:45:15,836 if you go back to the literature you can find, 939 00:45:15,836 --> 00:45:18,521 because they just weren't captured in the database. 940 00:45:18,521 --> 00:45:19,060 All right. 941 00:45:19,060 --> 00:45:22,570 A few insights about the analysis. 942 00:45:22,570 --> 00:45:26,150 So I want to show you, here is the objective function. 943 00:45:26,150 --> 00:45:30,100 How well the model fit and that's in red. 944 00:45:30,100 --> 00:45:30,810 OK. 945 00:45:30,810 --> 00:45:35,690 And in blue is the actual fit to the experimental data. 946 00:45:35,690 --> 00:45:37,770 And again, the higher it is the worse it is. 947 00:45:37,770 --> 00:45:40,320 And the green gives you essentially the size. 948 00:45:40,320 --> 00:45:43,730 And this is plotted against the size penalty. 949 00:45:43,730 --> 00:45:46,600 And what's very interesting, is even for very small size 950 00:45:46,600 --> 00:45:54,090 penalties, almost negligible, that the size of the model that 951 00:45:54,090 --> 00:45:58,270 turns out to be best fit is substantially smaller 952 00:45:58,270 --> 00:46:00,186 than what was in the database. 953 00:46:00,186 --> 00:46:03,575 OK, so you actually generate a small model immediately. 954 00:46:03,575 --> 00:46:07,770 A smaller model immediately, even without any size penalty. 955 00:46:07,770 --> 00:46:10,270 So your intuition that a bigger model 956 00:46:10,270 --> 00:46:13,820 was going to be better actually turns out to be incorrect. 957 00:46:13,820 --> 00:46:17,290 That even without a size penalty, the model strips down. 958 00:46:17,290 --> 00:46:20,400 And why is that? 959 00:46:20,400 --> 00:46:20,980 Why is that? 960 00:46:20,980 --> 00:46:23,120 Let me make that question number two. 961 00:46:23,120 --> 00:46:26,050 Just to see who's still awake.
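[Editor's illustration] One way to picture the objective being plotted is as a data-mismatch term plus a size penalty times the number of arcs. The sketch below assumes that additive form, which the lecture does not spell out, and uses a deliberately tiny one-step OR logic. The ligand and kinase names and all measurements are made up for illustration and are not from the hepatocyte data set.

```python
def predict(arcs, stimuli, readouts):
    # One-step OR logic: a readout is ON if any applied stimulus has an arc to it.
    return {r: int(any((s, r) in arcs for s in stimuli)) for r in readouts}

def objective(arcs, experiments, size_penalty):
    """Fraction of mismatched measurements plus a penalty per arc (assumed form)."""
    total = mismatched = 0
    for stimuli, measured in experiments:
        predicted = predict(arcs, stimuli, measured)
        for readout, value in measured.items():
            total += 1
            mismatched += predicted[readout] != value
    return mismatched / total + size_penalty * len(arcs)

# Made-up measurements: (stimuli applied, observed ON/OFF readouts).
experiments = [
    ({"ligand_A"}, {"kinase_X": 1, "kinase_Y": 0}),
    ({"ligand_B"}, {"kinase_X": 0, "kinase_Y": 1}),
    ({"ligand_A", "ligand_B"}, {"kinase_X": 1, "kinase_Y": 1}),
]

# Hypothetical database scaffold with one arc that does not hold in this context.
scaffold = {("ligand_A", "kinase_X"), ("ligand_B", "kinase_Y"),
            ("ligand_B", "kinase_X")}
pruned = scaffold - {("ligand_B", "kinase_X")}

for penalty in (0.0, 0.05, 0.5):
    print(penalty,
          objective(scaffold, experiments, penalty),
          objective(pruned, experiments, penalty),
          objective(set(), experiments, penalty))
```

In this toy case the pruned model already fits better at zero penalty, because the extra arc predicts an activation that was not observed. Only at a large penalty does the empty model win, and it does so by giving up the fit.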
962 00:46:26,050 --> 00:46:30,910 Why, in fitting this hepatocyte data, 963 00:46:30,910 --> 00:46:34,100 would a model that leaves out a lot of stuff in the Ingenuity 964 00:46:34,100 --> 00:46:38,780 database that's presumably going on actually 965 00:46:38,780 --> 00:46:40,582 fit the data better? 966 00:46:40,582 --> 00:46:42,636 A smaller model fits better? 967 00:46:42,636 --> 00:46:46,110 Why is that? 968 00:46:46,110 --> 00:46:48,430 Yeah. 969 00:46:48,430 --> 00:46:50,020 AUDIENCE: This is a [INAUDIBLE]. 970 00:46:50,020 --> 00:46:53,114 Maybe the strength of the interactions 971 00:46:53,114 --> 00:46:54,960 aren't really taken into account here? 972 00:46:54,960 --> 00:46:59,830 And so the moving things out, essentially means 973 00:46:59,830 --> 00:47:03,180 that you're not sealing everything in the [INAUDIBLE]. 974 00:47:03,180 --> 00:47:05,129 You're just taking one. 975 00:47:05,129 --> 00:47:06,170 DOUG LAUFFENBURGER: Yeah. 976 00:47:06,170 --> 00:47:07,260 That's essentially it. 977 00:47:07,260 --> 00:47:11,550 I think you've cast it in almost quantitative terms, 978 00:47:11,550 --> 00:47:15,190 but I think it's true even in qualitative terms. 979 00:47:15,190 --> 00:47:18,490 And one way to think about it is-- let's 980 00:47:18,490 --> 00:47:21,090 say I have an extra arc or extra node. 981 00:47:21,090 --> 00:47:22,180 OK. 982 00:47:22,180 --> 00:47:24,480 I might capture some more true positives. 983 00:47:24,480 --> 00:47:26,910 I might actually capture more of my data, 984 00:47:26,910 --> 00:47:32,590 but I could actually now, gain more conflicts with my data. 985 00:47:32,590 --> 00:47:35,840 Because now I've put in logic that, yes, 986 00:47:35,840 --> 00:47:38,370 it captures this measurement, but now maybe it 987 00:47:38,370 --> 00:47:41,180 messes up these other two or three measurements.
988 00:47:41,180 --> 00:47:43,020 So you actually can make your model worse 989 00:47:43,020 --> 00:47:47,250 trying to capture some small piece, that in fact, adversely 990 00:47:47,250 --> 00:47:51,020 influences the effects on the other measurements. 991 00:47:51,020 --> 00:47:54,150 So you get false positives, false negatives, 992 00:47:54,150 --> 00:47:56,110 along with anything that's true. 993 00:47:56,110 --> 00:48:00,090 And it just so happens that in these kind of situations 994 00:48:00,090 --> 00:48:01,910 those can outweigh. 995 00:48:01,910 --> 00:48:04,810 Then of course, as you increase the size penalty 996 00:48:04,810 --> 00:48:08,770 you can drive your model to be even smaller, fewer arcs, 997 00:48:08,770 --> 00:48:11,630 and now that of course does come at the expense 998 00:48:11,630 --> 00:48:13,340 of not fitting the data as well. 999 00:48:13,340 --> 00:48:14,670 OK. 1000 00:48:14,670 --> 00:48:18,130 So where we decided that the size penalty best lived 1001 00:48:18,130 --> 00:48:22,930 was where it was large enough to ensure stripping down 1002 00:48:22,930 --> 00:48:26,040 of nonessential nodes and arcs, but not large enough 1003 00:48:26,040 --> 00:48:29,100 to start compromising the actual experimental fit. 1004 00:48:29,100 --> 00:48:29,600 OK. 1005 00:48:29,600 --> 00:48:31,308 And so that lived someplace around there. 1006 00:48:34,890 --> 00:48:35,390 OK. 1007 00:48:35,390 --> 00:48:37,014 An important thing-- and this goes back 1008 00:48:37,014 --> 00:48:38,600 to the consensus model. 1009 00:48:38,600 --> 00:48:40,700 If you think about, quote, model identification, 1010 00:48:40,700 --> 00:48:45,560 can you uniquely specify one model, a best fit model? 1011 00:48:45,560 --> 00:48:46,450 You really can't. 1012 00:48:46,450 --> 00:48:48,590 What's plotted here is for any of the arcs that 1013 00:48:48,590 --> 00:48:50,892 would end up in a model.
1014 00:48:50,892 --> 00:48:53,100 Let's say we numbered them from one to, I 1015 00:48:53,100 --> 00:48:55,980 think it was, 113 in the first place. 1016 00:48:55,980 --> 00:48:58,040 One arc, another arc, another arc, another arc. 1017 00:48:58,040 --> 00:48:59,970 And you say, how frequently did they 1018 00:48:59,970 --> 00:49:03,880 end up in the best fit models? 1019 00:49:03,880 --> 00:49:07,190 Basically, only a small proportion of them 1020 00:49:07,190 --> 00:49:08,870 were in all the best fit models. 1021 00:49:08,870 --> 00:49:13,610 Some of them were in some models and some not. 1022 00:49:13,610 --> 00:49:17,650 Of course the higher the tolerance, 1023 00:49:17,650 --> 00:49:20,310 the more error you allowed and now you 1024 00:49:20,310 --> 00:49:22,130 started to get models that all fit 1025 00:49:22,130 --> 00:49:27,330 to within whatever that criteria was in which most of their arcs 1026 00:49:27,330 --> 00:49:28,160 weren't the same. 1027 00:49:28,160 --> 00:49:30,326 You could have a lot of different network structures 1028 00:49:30,326 --> 00:49:32,370 that give you that same fit. 1029 00:49:32,370 --> 00:49:37,080 If you require a very, very tight fit, compared to error, 1030 00:49:37,080 --> 00:49:40,830 something like this, then more of the arcs in the models 1031 00:49:40,830 --> 00:49:42,600 have to be in common. 1032 00:49:42,600 --> 00:49:43,100 OK. 1033 00:49:43,100 --> 00:49:44,670 So that makes some sense. 1034 00:49:44,670 --> 00:49:48,220 But you can't really completely identify a unique model. 1035 00:49:48,220 --> 00:49:50,800 That goes to what I said before. 1036 00:49:50,800 --> 00:49:51,300 OK. 1037 00:49:51,300 --> 00:49:53,217 I was talking before about trade-offs 1038 00:49:53,217 --> 00:49:55,050 between false positives and false negatives.
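[Editor's illustration] The arc-frequency idea described above, counting how often each arc appears among the models that fit to within a tolerance of the best one, can be sketched as follows. The function name, arc labels, scores, and tolerance values are hypothetical and chosen only to show the effect of loosening the tolerance.

```python
from collections import Counter

def arc_frequencies(models, scores, tolerance):
    """Fraction of near-best models containing each arc, and how many were kept."""
    best = min(scores)
    kept = [m for m, s in zip(models, scores) if s <= best + tolerance]
    counts = Counter(arc for model in kept for arc in model)
    return {arc: n / len(kept) for arc, n in counts.items()}, len(kept)

# Made-up candidate models (sets of arc labels) and their fit errors.
models = [frozenset("ab"), frozenset("ac"), frozenset("abd"), frozenset("cd")]
scores = [0.08, 0.09, 0.10, 0.20]

tight, n_tight = arc_frequencies(models, scores, tolerance=0.0)
loose, n_loose = arc_frequencies(models, scores, tolerance=0.05)
print(n_tight, tight)
print(n_loose, loose)
```

With a tight tolerance only the single best model survives, so every arc it contains has frequency 1. Loosening the tolerance admits structurally different models, and only the arcs they all share keep a frequency of 1. That shared fraction is what the line thickness in a consensus diagram encodes.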
1039 00:49:55,050 --> 00:49:59,180 You must know, I'm sure from previous things in this class, 1040 00:49:59,180 --> 00:50:01,870 the receiver operating characteristic curves, 1041 00:50:01,870 --> 00:50:05,530 where for every of your model parameter choices, 1042 00:50:05,530 --> 00:50:07,200 you say, what are my results in terms 1043 00:50:07,200 --> 00:50:10,390 of false positives versus true positives? 1044 00:50:10,390 --> 00:50:14,620 And you're trying to find the optimal location 1045 00:50:14,620 --> 00:50:19,750 along this type of path. 1046 00:50:19,750 --> 00:50:27,400 And so, what's shown here is that the best predictive model, 1047 00:50:27,400 --> 00:50:32,160 in fact, is the one where we have the size penalty 1048 00:50:32,160 --> 00:50:34,760 to be right on the edge of not making the experimental data 1049 00:50:34,760 --> 00:50:38,060 fit worse, but still strips out the most arcs. 1050 00:50:38,060 --> 00:50:41,670 So again, that demonstrates that the smaller model actually 1051 00:50:41,670 --> 00:50:47,540 is in fact better, in terms of finding this type of-- 1052 00:50:47,540 --> 00:50:50,650 And this shows if we actually put in some more arcs that 1053 00:50:50,650 --> 00:50:53,140 tried to capture some more data, yes we decrease 1054 00:50:53,140 --> 00:50:55,680 the false negatives, but, in fact, we 1055 00:50:55,680 --> 00:50:56,900 increase the false positives. 1056 00:50:56,900 --> 00:50:59,780 We actually shift ourselves on this curve. 1057 00:50:59,780 --> 00:51:03,310 And so you decide whether that's desirable or not. 1058 00:51:03,310 --> 00:51:04,620 Where you'd like to live. 1059 00:51:04,620 --> 00:51:08,940 So you can analyze what you like about your best fit class 1060 00:51:08,940 --> 00:51:11,270 of models in this kind of way. 1061 00:51:14,550 --> 00:51:17,920 OK, so now we have some confidence in this. 1062 00:51:17,920 --> 00:51:19,360 What are you going to do with it? 
1063 00:51:19,360 --> 00:51:22,950 And one thing I'd like to do is just make a priori predictions. 1064 00:51:22,950 --> 00:51:26,040 Say I now believe that on these hepatocytes or tumor 1065 00:51:26,040 --> 00:51:29,810 cells stimulated with these kind of things, 1066 00:51:29,810 --> 00:51:32,870 I can calculate what the experimental signaling 1067 00:51:32,870 --> 00:51:35,121 activities should be. 1068 00:51:35,121 --> 00:51:35,620 All right. 1069 00:51:35,620 --> 00:51:37,020 Let's see if we do that a priori. 1070 00:51:37,020 --> 00:51:43,440 So let's now use new inhibitors that hadn't been used before. 1071 00:51:43,440 --> 00:51:45,650 Combination of inhibitors, especially in cancer. 1072 00:51:45,650 --> 00:51:47,960 People are always interested in combinatorial drugs. 1073 00:51:47,960 --> 00:51:50,460 Experimentally it's prohibitive to run 1074 00:51:50,460 --> 00:51:52,190 through all possible combinations. 1075 00:51:52,190 --> 00:51:54,190 So this is one thing in the pharmaceutical field 1076 00:51:54,190 --> 00:51:56,900 people believe these kind of models are really useful for. 1077 00:51:56,900 --> 00:51:58,780 Let's try all possible drug combinations 1078 00:51:58,780 --> 00:52:01,270 and see which ones are most promising. 1079 00:52:01,270 --> 00:52:04,810 And instead of just one ligand growth factor cytokine 1080 00:52:04,810 --> 00:52:07,660 at a time, do different combinations. 1081 00:52:07,660 --> 00:52:10,270 So this is all an entirely new data set. 1082 00:52:10,270 --> 00:52:12,830 So different treatments that are different combinations, 1083 00:52:12,830 --> 00:52:16,110 different inhibitors, different combinations of inhibitors. 1084 00:52:16,110 --> 00:52:18,760 And now you just run the model-- it's not trained on this. 1085 00:52:18,760 --> 00:52:20,520 It was trained on the previous data. 1086 00:52:20,520 --> 00:52:23,300 And now a priori predicts this data set. 
1087 00:52:23,300 --> 00:52:26,760 And now, again, you look for the model fit in the bottom. 1088 00:52:26,760 --> 00:52:30,660 And again, you want the smallest number of red and pink boxes. 1089 00:52:30,660 --> 00:52:33,430 In effect it predicted to within about 11% error. 1090 00:52:33,430 --> 00:52:38,360 About 11% of the boxes didn't fit well, but 89% did. 1091 00:52:38,360 --> 00:52:41,050 And that's, in fact pretty close to the 9% 1092 00:52:41,050 --> 00:52:43,430 that was on the original training model. 1093 00:52:43,430 --> 00:52:47,140 So in terms of this, in this realm of studies, 1094 00:52:47,140 --> 00:52:49,970 these a priori treatment conditions-- drug combinations, 1095 00:52:49,970 --> 00:52:52,480 growth factors, cytokine combinations-- 1096 00:52:52,480 --> 00:52:56,120 this is a pretty good validation that this model wasn't 1097 00:52:56,120 --> 00:52:58,330 just kind of trained and fit. 1098 00:52:58,330 --> 00:53:00,130 That it, in fact, could predict then 1099 00:53:00,130 --> 00:53:02,316 what was happening in these pathways. 1100 00:53:02,316 --> 00:53:04,190 And then of course, what it allows you to do, 1101 00:53:04,190 --> 00:53:05,898 where all the red boxes are-- it says, OK, 1102 00:53:05,898 --> 00:53:08,042 that's where we need more intensive study. 1103 00:53:08,042 --> 00:53:10,000 Now maybe we go back to the literature and say, 1104 00:53:10,000 --> 00:53:12,050 is there more known about those nodes that 1105 00:53:12,050 --> 00:53:15,780 wasn't captured in whatever interactome database 1106 00:53:15,780 --> 00:53:17,390 that we started with? 1107 00:53:17,390 --> 00:53:21,880 Maybe we need to supplement the scaffold with more information. 1108 00:53:21,880 --> 00:53:24,730 That's out in the literature where more and more dedicated 1109 00:53:24,730 --> 00:53:25,690 experiments are done.
1110 00:53:25,690 --> 00:53:29,190 So it narrows down where the next set of investigations 1111 00:53:29,190 --> 00:53:32,030 need to be, whether from the literature or from yourself. 1112 00:53:35,286 --> 00:53:35,785 OK. 1113 00:53:40,900 --> 00:53:43,240 So this is just then some biological results. 1114 00:53:43,240 --> 00:53:48,820 If you do this for the four different hepatocellular lines. 1115 00:53:48,820 --> 00:53:50,810 Some of the signaling activities are the same, 1116 00:53:50,810 --> 00:53:51,768 and some are different. 1117 00:53:54,700 --> 00:53:58,400 I think I'll skip that. 1118 00:53:58,400 --> 00:53:59,750 All right, let me show this. 1119 00:53:59,750 --> 00:54:03,850 So this says, where are the similarities and differences 1120 00:54:03,850 --> 00:54:07,220 between the normal hepatocytes versus the tumor lines. 1121 00:54:07,220 --> 00:54:09,160 Because this is where you would want 1122 00:54:09,160 --> 00:54:11,540 to get the ideas for where the right drugs would be. 1123 00:54:11,540 --> 00:54:15,340 Where is the logic different, between a normal liver 1124 00:54:15,340 --> 00:54:19,420 cell and one of these transformed types. 1125 00:54:19,420 --> 00:54:22,220 So, this is the same kind of scaffold. 1126 00:54:22,220 --> 00:54:24,210 It'll get us the consensus models 1127 00:54:24,210 --> 00:54:26,820 and the thickness of the line is how strong-- 1128 00:54:26,820 --> 00:54:30,330 what proportion of the models did that arc show up in? 1129 00:54:30,330 --> 00:54:31,840 Along the best. 1130 00:54:31,840 --> 00:54:37,760 If it's black, the arc was in the primary hepatocytes 1131 00:54:37,760 --> 00:54:39,690 and all the cell lines. 1132 00:54:39,690 --> 00:54:44,380 So black is just sort of consensus core. 1133 00:54:44,380 --> 00:54:45,970 This is just invariably there. 
1134 00:54:49,020 --> 00:54:53,480 The blue was in the models for the primary hepatocytes, 1135 00:54:53,480 --> 00:54:57,030 but for some reason didn't exist in the tumor cell lines. 1136 00:54:57,030 --> 00:55:00,300 So there's signaling logic that normal hepatocytes use, 1137 00:55:00,300 --> 00:55:03,870 that the tumor cell lines have somehow lost. 1138 00:55:03,870 --> 00:55:07,977 Red are arcs that weren't in the primary cells, 1139 00:55:07,977 --> 00:55:09,560 but showed up in the tumor cell lines. 1140 00:55:09,560 --> 00:55:12,160 So there was logic that the normal liver cells apparently 1141 00:55:12,160 --> 00:55:16,912 didn't use, but now showed up in the tumor cell lines. 1142 00:55:16,912 --> 00:55:18,620 And why would there be these differences? 1143 00:55:18,620 --> 00:55:21,850 Well this is where it goes back to then the genetic mutations 1144 00:55:21,850 --> 00:55:24,170 and variations. 1145 00:55:24,170 --> 00:55:27,880 Because going from a primary to some tumor cell line, 1146 00:55:27,880 --> 00:55:32,540 there's enough of the genetic mutations, that in this case 1147 00:55:32,540 --> 00:55:36,460 said, OK, I've got some genetic mutation that interrupts 1148 00:55:36,460 --> 00:55:39,680 the link between MAP3 kinase and IKK. 1149 00:55:39,680 --> 00:55:41,700 There was some docking protein or something 1150 00:55:41,700 --> 00:55:45,250 that's now missing, not expressed as highly. 1151 00:55:45,250 --> 00:55:46,730 It's got a mutation of amino acids 1152 00:55:46,730 --> 00:55:48,300 and no longer docks right. 1153 00:55:48,300 --> 00:55:50,506 It has a lower enzymatic activity. 1154 00:55:50,506 --> 00:55:51,880 So now you can go back and trace. 1155 00:55:51,880 --> 00:55:54,410 Can I find some genetic mutation that 1156 00:55:54,410 --> 00:55:57,290 has to do with the loss of that arc?
1157 00:55:57,290 --> 00:55:59,015 Or if I've got a red arc that shows up-- 1158 00:55:59,015 --> 00:56:00,640 like I said because there was something 1159 00:56:00,640 --> 00:56:06,230 in my genetic mutations that now adds an activity here 1160 00:56:06,230 --> 00:56:07,270 that wasn't there. 1161 00:56:07,270 --> 00:56:09,920 Maybe something is now constitutively active. 1162 00:56:09,920 --> 00:56:12,360 Maybe something is just expressed at a higher level. 1163 00:56:12,360 --> 00:56:15,179 And all of a sudden that pathway comes into play. 1164 00:56:15,179 --> 00:56:16,220 So that's the cool thing. 1165 00:56:16,220 --> 00:56:19,270 You can trace what's actually in the genetic mutations 1166 00:56:19,270 --> 00:56:21,400 if you have some methodology for that, 1167 00:56:21,400 --> 00:56:24,510 to what's actually been altered in the network logic. 1168 00:56:24,510 --> 00:56:26,576 Yeah? 1169 00:56:26,576 --> 00:56:28,950 AUDIENCE: Are the primary lines considered healthy lines? 1170 00:56:28,950 --> 00:56:29,530 Or are they-- 1171 00:56:29,530 --> 00:56:29,970 DOUG LAUFFENBURGER: Yes. 1172 00:56:29,970 --> 00:56:30,887 AUDIENCE: OK, so the-- 1173 00:56:30,887 --> 00:56:31,928 DOUG LAUFFENBURGER: Yeah. 1174 00:56:31,928 --> 00:56:33,370 So they're from donors but they're 1175 00:56:33,370 --> 00:56:40,120 mainly like motorcycle accident donors that don't need their 1176 00:56:40,120 --> 00:56:43,370 liver anymore but the liver was fine. 1177 00:56:43,370 --> 00:56:45,130 So, yeah, they're from healthy donors. 1178 00:56:45,130 --> 00:56:45,810 AUDIENCE: [INAUDIBLE]. 1179 00:56:45,810 --> 00:56:46,360 DOUG LAUFFENBURGER: Yeah. 1180 00:56:46,360 --> 00:56:46,860 Yeah. 1181 00:56:46,860 --> 00:56:49,530 It was the lines at some point came from a tumor 1182 00:56:49,530 --> 00:56:51,591 and have been propagated in a culture, yeah. 1183 00:56:56,310 --> 00:56:56,950 OK. 1184 00:56:56,950 --> 00:57:01,040 What do I want to-- got a little bit more time.
1185 00:57:01,040 --> 00:57:01,740 Let me do this. 1186 00:57:01,740 --> 00:57:02,250 OK. 1187 00:57:02,250 --> 00:57:05,100 So here's another interesting thing that can happen. 1188 00:57:05,100 --> 00:57:06,730 If you take these models seriously, 1189 00:57:06,730 --> 00:57:10,050 it can tell you something about the biochemistry, perhaps 1190 00:57:10,050 --> 00:57:10,910 of what's going on. 1191 00:57:16,440 --> 00:57:21,120 So see there's this dashed line here that I want to emphasize 1192 00:57:21,120 --> 00:57:23,510 and we'll emphasize it again on another slide. 1193 00:57:23,510 --> 00:57:24,950 That was one that had to be added. 1194 00:57:24,950 --> 00:57:30,970 It just wasn't in the Ingenuity pathway scaffold. 1195 00:57:30,970 --> 00:57:33,440 Actually couldn't find it in any literature anywhere. 1196 00:57:33,440 --> 00:57:37,100 But nonetheless you needed it to fit some data. 1197 00:57:37,100 --> 00:57:38,760 So we kind of kept our eye on that one. 1198 00:57:38,760 --> 00:57:41,090 What the heck is going on here? 1199 00:57:41,090 --> 00:57:45,450 This dashed line from I-kappa kinase up to STAT3. 1200 00:57:45,450 --> 00:57:49,620 No evidence for that signaling linkage in the literature 1201 00:57:49,620 --> 00:57:51,200 anywhere. 1202 00:57:51,200 --> 00:57:52,272 What could that tell you? 1203 00:57:56,150 --> 00:57:56,650 All right. 1204 00:57:56,650 --> 00:57:58,730 Well, you go back to the data now and you say, 1205 00:57:58,730 --> 00:58:01,590 well what of the data set, of the experimental measurements 1206 00:58:01,590 --> 00:58:04,400 that we made, caused that arc to have 1207 00:58:04,400 --> 00:58:06,071 to be there to fit the data well? 1208 00:58:06,071 --> 00:58:06,570 OK. 1209 00:58:06,570 --> 00:58:08,590 You can now ask that kind of question. 1210 00:58:08,590 --> 00:58:11,437 Well remember I said in the data set were inhibitors.
1211 00:58:11,437 --> 00:58:13,520 Some small molecule inhibitors against this kinase 1212 00:58:13,520 --> 00:58:16,660 or that kinase or that kinase that would perturb the network 1213 00:58:16,660 --> 00:58:19,005 and then give us relationships that the logic model 1214 00:58:19,005 --> 00:58:21,820 then had to account for. 1215 00:58:21,820 --> 00:58:24,810 Well, this one had to be there, mainly 1216 00:58:24,810 --> 00:58:29,270 to account for data that came from an inhibitor of IKK. 1217 00:58:29,270 --> 00:58:32,250 That one of the kinases that we had a small molecule 1218 00:58:32,250 --> 00:58:36,000 inhibitor for, inhibited this kinase. 1219 00:58:36,000 --> 00:58:37,690 And somehow there turned out to be 1220 00:58:37,690 --> 00:58:40,740 an effect on STAT3 phosphorylation. 1221 00:58:40,740 --> 00:58:44,810 And so you needed that arc to be there. 1222 00:58:44,810 --> 00:58:46,810 So the explanation is that either there's, 1223 00:58:46,810 --> 00:58:49,800 in fact, some real mechanism going on here. 1224 00:58:49,800 --> 00:58:51,360 It might have been transcriptional 1225 00:58:51,360 --> 00:58:53,880 that somehow the activity of this kinase 1226 00:58:53,880 --> 00:58:57,760 affects the levels of expression and the responsiveness 1227 00:58:57,760 --> 00:58:59,260 of STAT3. 1228 00:58:59,260 --> 00:59:02,370 Or you say, ah, maybe it's a problem with the drug? 1229 00:59:02,370 --> 00:59:04,074 It's a problem with the inhibitor. 1230 00:59:04,074 --> 00:59:06,490 That, in fact, what you thought was an inhibitor that just 1231 00:59:06,490 --> 00:59:09,320 affected this kinase, has an off-target effect 1232 00:59:09,320 --> 00:59:10,980 on that other kinase. 1233 00:59:10,980 --> 00:59:12,880 And it's just an artifact. 1234 00:59:12,880 --> 00:59:14,820 That's an alternative explanation. 1235 00:59:14,820 --> 00:59:16,820 Right, so that's the sort of thing you can test. 1236 00:59:16,820 --> 00:59:19,420 And we did test it.
1237 00:59:19,420 --> 00:59:21,110 And here's the data here. 1238 00:59:21,110 --> 00:59:26,130 At the bottom is the kinase that you wanted to inhibit. 1239 00:59:26,130 --> 00:59:30,230 And in the blue was the inhibitor that was actually 1240 00:59:30,230 --> 00:59:33,990 used in the study, both in vivo and in vitro 1241 00:59:33,990 --> 00:59:37,080 and it inhibited that kinase. 1242 00:59:37,080 --> 00:59:39,930 But then we looked at the potential off-target effect 1243 00:59:39,930 --> 00:59:43,550 on that other-- the JAK2 STAT3 1244 00:59:43,550 --> 00:59:46,560 and it also did have activity on that. 1245 00:59:46,560 --> 00:59:50,400 So it meant that that inhibitor had 1246 00:59:50,400 --> 00:59:55,410 an effect, not just on the IKK, but also on the JAK STAT3. 1247 00:59:55,410 --> 00:59:57,030 1248 00:59:57,030 --> 01:00:01,380 And so that's why that arc had to be there, 1249 01:00:01,380 --> 01:00:03,130 is because, in fact, that inhibitor, 1250 01:00:03,130 --> 01:00:05,240 inhibited this kinase as well. 1251 01:00:05,240 --> 01:00:07,830 So if we took that into account in terms of the algorithm, 1252 01:00:07,830 --> 01:00:09,980 then we wouldn't have to have that arc because it 1253 01:00:09,980 --> 01:00:11,900 was spurious and came from the artifact 1254 01:00:11,900 --> 01:00:13,090 of that inhibitor. 1255 01:00:13,090 --> 01:00:15,670 But the interesting thing is that, by taking the model 1256 01:00:15,670 --> 01:00:18,500 seriously, we can actually find that. 1257 01:00:18,500 --> 01:00:22,400 Because it was not previously known that this inhibitor had 1258 01:00:22,400 --> 01:00:25,560 an off-target effect on that kinase.
1259 01:00:25,560 --> 01:00:28,620 In effect, the interesting thing, pharmacologically, 1260 01:00:28,620 --> 01:00:34,840 was that this small molecule that 1261 01:00:34,840 --> 01:00:37,550 was aimed to be an inhibitor against this kinase 1262 01:00:37,550 --> 01:00:42,180 was the best by far in treating lung airway inflammation, 1263 01:00:42,180 --> 01:00:43,960 compared against a whole other set 1264 01:00:43,960 --> 01:00:46,690 of other types of inhibitors for the same kinase. 1265 01:00:46,690 --> 01:00:48,520 So now the reason might be is, it's 1266 01:00:48,520 --> 01:00:51,010 better because it's also hitting this other kinase. 1267 01:00:51,010 --> 01:00:52,610 That this off-target effect actually 1268 01:00:52,610 --> 01:00:55,920 is therapeutically efficacious and in fact 1269 01:00:55,920 --> 01:00:58,830 a combination of drugs against this kinase 1270 01:00:58,830 --> 01:01:01,270 and the other kinase is what's required 1271 01:01:01,270 --> 01:01:03,600 for the therapeutic benefit. 1272 01:01:03,600 --> 01:01:05,750 So that's something that could be explored. 1273 01:01:05,750 --> 01:01:07,920 And that's the sort of thing this model leads to. 1274 01:01:10,770 --> 01:01:11,270 OK. 1275 01:01:14,096 --> 01:01:20,220 Let me end by digging into this difference a little bit. 1276 01:01:20,220 --> 01:01:24,700 Because I said, you see these differences 1277 01:01:24,700 --> 01:01:31,080 between primary hepatocytes and the tumor cell lines. 1278 01:01:31,080 --> 01:01:34,290 And the model said, just from examining the data sets, 1279 01:01:34,290 --> 01:01:36,175 that the logic is different. 1280 01:01:36,175 --> 01:01:36,820 OK. 1281 01:01:36,820 --> 01:01:40,280 Is there any validation for that? 1282 01:01:40,280 --> 01:01:43,315 Well, so let's go back and look at those differences 1283 01:01:43,315 --> 01:01:44,440 with respect to literature. 
1284 01:01:44,440 --> 01:01:47,920 So if you just blow up that part of the model, 1285 01:01:47,920 --> 01:01:50,830 there's eight edges that are strongly 1286 01:01:50,830 --> 01:01:53,940 disparate between the primary, normal cell types and the tumor 1287 01:01:53,940 --> 01:01:55,940 cells and they're all enumerated here. 1288 01:01:55,940 --> 01:01:59,240 One, two, three, four, five, six, seven, eight. 1289 01:01:59,240 --> 01:02:02,582 And they're essentially in three different pathways. 1290 01:02:02,582 --> 01:02:04,040 So what the model is telling you is 1291 01:02:04,040 --> 01:02:08,670 that there's three different pathways that are substantially 1292 01:02:08,670 --> 01:02:13,468 different between a normal liver cell and a liver tumor cell. 1293 01:02:13,468 --> 01:02:14,460 OK. 1294 01:02:14,460 --> 01:02:19,470 So is there any evidence that this is really true? 1295 01:02:19,470 --> 01:02:20,750 So let's look at one. 1296 01:02:20,750 --> 01:02:23,165 On to this pathway that I've got differences. 1297 01:02:23,165 --> 01:02:25,090 And you see blue here and red here. 1298 01:02:29,020 --> 01:02:32,385 It says that this particular signaling node in normal cells 1299 01:02:32,385 --> 01:02:34,650 is activated by this pathway. 1300 01:02:34,650 --> 01:02:36,430 In the tumors, that regulation is lost 1301 01:02:36,430 --> 01:02:40,160 and that actually comes through another pathway. 1302 01:02:40,160 --> 01:02:45,550 And it turns out this is consistent with the literature 1303 01:02:45,550 --> 01:02:47,910 that, in fact, in the tumor cells, 1304 01:02:47,910 --> 01:02:52,350 you get a higher activity of this downstream node. 1305 01:02:52,350 --> 01:02:53,940 And now I've lost my light again. 1306 01:02:53,940 --> 01:02:56,470 This HSP27.
1307 01:02:56,470 --> 01:02:59,560 Even though it's overexpressed, you 1308 01:02:59,560 --> 01:03:03,930 get less activation because this pathway is less strongly 1309 01:03:03,930 --> 01:03:07,490 activated in red than the blue pathway is. 1310 01:03:07,490 --> 01:03:08,930 So if you went by gene expression, 1311 01:03:08,930 --> 01:03:10,500 you'd think in the tumor cells, this 1312 01:03:10,500 --> 01:03:12,706 is a higher activated pathway. 1313 01:03:12,706 --> 01:03:14,080 Turns out the logic is different, 1314 01:03:14,080 --> 01:03:15,870 and you actually get less activation of it 1315 01:03:15,870 --> 01:03:17,745 because it's coming from a different pathway. 1316 01:03:17,745 --> 01:03:21,740 So that turns out to be true in the liver tumor literature. 1317 01:03:21,740 --> 01:03:24,530 Another one-- I find this one really interesting. 1318 01:03:24,530 --> 01:03:29,110 That in normal liver cells, to activate this IKK pathway-- 1319 01:03:29,110 --> 01:03:30,910 that's a very important kinase pathway, 1320 01:03:30,910 --> 01:03:33,660 governing the transcription factor NF-kappa B. 1321 01:03:33,660 --> 01:03:37,210 In a primary cell, I need this combined logic 1322 01:03:37,210 --> 01:03:39,455 between a pathway downstream of insulin receptor 1323 01:03:39,455 --> 01:03:42,060 and a pathway downstream of a cytokine. 1324 01:03:42,060 --> 01:03:46,520 Only if both of those pathways are on, do I now turn this on. 1325 01:03:46,520 --> 01:03:48,890 In the tumor cells, that check is lost. 1326 01:03:48,890 --> 01:03:51,690 Only one pathway is required. 1327 01:03:51,690 --> 01:03:52,190 OK. 1328 01:03:52,190 --> 01:03:53,700 If this one is activated, I'm going 1329 01:03:53,700 --> 01:03:56,140 to get this transcription factor activated. 1330 01:03:56,140 --> 01:03:58,820 I don't have to wait for simultaneous activation 1331 01:03:58,820 --> 01:03:59,800 of this pathway. 1332 01:03:59,800 --> 01:04:02,650 Whereas a normal says I have to.
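[Editor's sketch] The gate difference just described can be written as two small Boolean functions. This is only an illustration, not the model from the study: the function names are hypothetical, and because the lecture does not say which single pathway suffices in the tumor cells, the cytokine pathway is assumed here.

```python
from itertools import product


def ikk_on_normal(insulin_path_on: bool, cytokine_path_on: bool) -> bool:
    """Primary cell: AND gate, both upstream pathways must be on."""
    return insulin_path_on and cytokine_path_on


def ikk_on_tumor(insulin_path_on: bool, cytokine_path_on: bool) -> bool:
    """Tumor cell: the check is lost, one pathway alone is enough.

    Assumption: the cytokine pathway is taken as the sufficient input;
    the transcript does not identify which pathway it is.
    """
    return cytokine_path_on


# Print the full truth table so the lost check is visible.
print("insulin cytokine normal tumor")
for insulin, cytokine in product([False, True], repeat=2):
    print(insulin, cytokine,
          ikk_on_normal(insulin, cytokine),
          ikk_on_tumor(insulin, cytokine))
```

The two functions disagree only when the cytokine pathway is on and the insulin pathway is off, which is the looser regulation described for the tumor cells.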
1333 01:04:02,650 --> 01:04:03,470 OK. 1334 01:04:03,470 --> 01:04:06,220 That turns out to be true that in the liver cells, 1335 01:04:06,220 --> 01:04:09,040 the progression is associated with a looser 1336 01:04:09,040 --> 01:04:13,810 regulation of this transcription factor. 1337 01:04:13,810 --> 01:04:15,130 And one more. 1338 01:04:15,130 --> 01:04:18,870 I won't go into too much detail, but again, you 1339 01:04:18,870 --> 01:04:21,310 see reds and blues here. 1340 01:04:21,310 --> 01:04:23,280 In the tumor cell lines, you've now 1341 01:04:23,280 --> 01:04:25,810 got activities downstream of insulin. 1342 01:04:25,810 --> 01:04:27,770 That's normally just a survival factor, 1343 01:04:27,770 --> 01:04:30,490 that's just not found in the primary cells. 1344 01:04:30,490 --> 01:04:33,870 And that, in fact, is shown in the literature too, 1345 01:04:33,870 --> 01:04:37,500 that insulin signaling shifts from metabolism 1346 01:04:37,500 --> 01:04:38,785 to proliferation. 1347 01:04:38,785 --> 01:04:40,384 It's mainly a metabolic stimulus 1348 01:04:40,384 --> 01:04:42,800 in the normal cells. It turns into a proliferative stimulus 1349 01:04:42,800 --> 01:04:44,500 in the tumor cells. 1350 01:04:44,500 --> 01:04:45,000 OK. 1351 01:04:45,000 --> 01:04:50,090 So, what this says is, just by mapping this logic 1352 01:04:50,090 --> 01:04:53,670 scaffold, the scaffold against empirical data, 1353 01:04:53,670 --> 01:04:55,390 developing a logic model, you in fact 1354 01:04:55,390 --> 01:04:59,650 can find loci of differences between the normal cell 1355 01:04:59,650 --> 01:05:02,060 signaling logic and tumor cell signaling logic 1356 01:05:02,060 --> 01:05:05,200 for which there's evidence in the literature, none of which 1357 01:05:05,200 --> 01:05:07,085 was in the original databases.
1358 01:05:09,726 --> 01:05:11,100 Finally, I'm just going 1359 01:05:11,100 --> 01:05:15,560 to say that it turns out in another study, what you could 1360 01:05:15,560 --> 01:05:19,259 show is those three pathways that the model predicts 1361 01:05:19,259 --> 01:05:21,050 are the differences between the liver tumor 1362 01:05:21,050 --> 01:05:22,800 cells and the normal cells. 1363 01:05:22,800 --> 01:05:27,410 That in order to kill these liver tumor cells, 1364 01:05:27,410 --> 01:05:30,460 you need inhibitors against all three pathways simultaneously. 1365 01:05:30,460 --> 01:05:32,220 You actually need combination drugs 1366 01:05:32,220 --> 01:05:35,760 of three different pathway inhibitors to kill these cells. 1367 01:05:35,760 --> 01:05:37,730 And it's exactly the three pathways 1368 01:05:37,730 --> 01:05:40,610 that the model predicted as the differences between the normals 1369 01:05:40,610 --> 01:05:42,030 and the tumor cells. 1370 01:05:45,790 --> 01:05:46,600 OK. 1371 01:05:46,600 --> 01:05:48,700 All right, so I will end here and then see 1372 01:05:48,700 --> 01:05:50,510 if there's any more questions. 1373 01:05:50,510 --> 01:05:53,380 Something that comes up a lot is-- 1374 01:05:53,380 --> 01:05:56,520 there's discomfort with Boolean logic because of zero, one. 1375 01:05:56,520 --> 01:05:58,880 It's off, on, and of course we know biology, 1376 01:05:58,880 --> 01:06:01,260 biochemistry doesn't work that way. 1377 01:06:01,260 --> 01:06:03,430 And so there can be so many artifacts, 1378 01:06:03,430 --> 01:06:05,390 so many places that you can get things wrong, 1379 01:06:05,390 --> 01:06:09,210 because you're trying to fit a model where the measurement is 1380 01:06:09,210 --> 01:06:11,147 supposed to be either zero or one, 1381 01:06:11,147 --> 01:06:13,230 and you're comparing it against a measurement that 1382 01:06:13,230 --> 01:06:15,840 might be 0.6.
1383 01:06:15,840 --> 01:06:19,900 Well, 0.6, is that closer to 1, is it closer to 0? 1384 01:06:19,900 --> 01:06:22,624 Is there some normalization that would shift it 1385 01:06:22,624 --> 01:06:23,540 from one to the other? 1386 01:06:23,540 --> 01:06:26,510 And instead of being a correct fit, it's now an incorrect fit. 1387 01:06:26,510 --> 01:06:28,360 So you can see the room for artifacts 1388 01:06:28,360 --> 01:06:34,470 by mapping quantitative data against a qualitative model. 1389 01:06:34,470 --> 01:06:37,490 So, one thing done more recently is to admit that 1390 01:06:37,490 --> 01:06:40,820 and say, well, let's just relax this a bit. 1391 01:06:40,820 --> 01:06:45,470 And instead of having step functions from off to on, 1392 01:06:45,470 --> 01:06:46,820 that they're more graded. 1393 01:06:46,820 --> 01:06:50,600 It's like an analog transfer function. 1394 01:06:50,600 --> 01:06:53,940 So what you've essentially done is add one more parameter 1395 01:06:53,940 --> 01:06:57,010 to every node, to every gate. 1396 01:06:57,010 --> 01:06:58,920 Because in Boolean logic, there's 1397 01:06:58,920 --> 01:07:00,670 essentially one hidden parameter. 1398 01:07:00,670 --> 01:07:02,910 That's where you shift from off to on, right? 1399 01:07:02,910 --> 01:07:07,280 There's some location of the level of the signal 1400 01:07:07,280 --> 01:07:09,554 that you've decided is 0 or 1. 1401 01:07:09,554 --> 01:07:11,470 So there's some parameter that you shift from, 1402 01:07:11,470 --> 01:07:13,710 saying it's off to on. 1403 01:07:13,710 --> 01:07:16,430 Well here now in this formalism there's that, 1404 01:07:16,430 --> 01:07:19,570 but there's also then the slope of shifting from off to on. 1405 01:07:19,570 --> 01:07:21,580 Is it still fairly steep? 1406 01:07:21,580 --> 01:07:23,040 Is it really mild? 1407 01:07:23,040 --> 01:07:25,611 Is it someplace in between? 1408 01:07:25,611 --> 01:07:26,110 OK?
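[Editor's sketch] The contrast between a Boolean step and a graded transfer function can be made concrete as follows. The lecture does not name the functional form, so a normalized Hill curve is assumed here as one common choice; the names `boolean_step`, `graded_transfer`, `threshold`, `ec50` and `n` are all hypothetical.

```python
def boolean_step(x: float, threshold: float = 0.5) -> float:
    """Boolean node: one hidden parameter, the off-to-on threshold."""
    return 1.0 if x >= threshold else 0.0


def graded_transfer(x: float, ec50: float = 0.5, n: float = 3.0) -> float:
    """Graded node: ec50 sets where the shift happens, n sets how steep it is.

    Assumed form: a Hill curve rescaled so that an input of 0 gives 0
    and an input of 1 gives 1.
    """
    hill = x**n / (ec50**n + x**n)
    return hill * (1.0 + ec50**n)


# The ambiguous 0.6 measurement: the Boolean call flips with the threshold,
# while the graded output stays an intermediate value.
x = 0.6
print("step, threshold 0.5:", boolean_step(x, 0.5))
print("step, threshold 0.7:", boolean_step(x, 0.7))
for steepness in (1.0, 3.0, 10.0):
    print("graded, n =", steepness, ":", round(graded_transfer(x, 0.5, steepness), 3))
```

As `n` grows the graded curve approaches the step function, which is one way to see the Boolean model as the steep-slope limit of the graded one.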
1409 01:07:26,110 --> 01:07:29,310 And this can go with AND and OR gates too. 1410 01:07:29,310 --> 01:07:30,830 Now, instead of just one dimension, 1411 01:07:30,830 --> 01:07:34,020 one component being off to on or on to off, 1412 01:07:34,020 --> 01:07:38,460 now you got AND and OR gates that have these slopes as well. 1413 01:07:38,460 --> 01:07:42,960 So what this means is you require more data 1414 01:07:42,960 --> 01:07:46,090 to fit this-- we call it a constrained fuzzy logic model 1415 01:07:46,090 --> 01:07:49,120 because you've got-- if I've got 50 nodes in my system, 1416 01:07:49,120 --> 01:07:51,985 I've got 50 more parameters I've got to fit. 1417 01:07:51,985 --> 01:07:54,450 OK, so that requires more data. 1418 01:07:54,450 --> 01:07:59,930 The benefit of it is that your predictions now, 1419 01:07:59,930 --> 01:08:01,441 in fact, can be quantitative. 1420 01:08:01,441 --> 01:08:02,940 So you can go into the model and say 1421 01:08:02,940 --> 01:08:05,340 here's a transcription factor CREB. 1422 01:08:05,340 --> 01:08:08,320 I'm going to predict its phosphorylation state 1423 01:08:08,320 --> 01:08:10,600 and its transcriptional activity, perhaps, 1424 01:08:10,600 --> 01:08:13,694 based on the activities of two upstream kinases. 1425 01:08:13,694 --> 01:08:16,069 And so if I had had an inhibitor for one of these kinases 1426 01:08:16,069 --> 01:08:19,279 or another, how much would I shift the phosphorylation 1427 01:08:19,279 --> 01:08:21,270 of this transcription factor? 1428 01:08:21,270 --> 01:08:25,390 And what you actually see is these gradual curves, that if I 1429 01:08:25,390 --> 01:08:29,109 start to inhibit [INAUDIBLE], OK, it gradually 1430 01:08:29,109 --> 01:08:31,399 changes the phosphorylation of CREB. 1431 01:08:31,399 --> 01:08:34,789 Or if I inhibit the activity of p38, 1432 01:08:34,789 --> 01:08:39,250 it even more gradually affects the activity of CREB.
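[Editor's sketch] A two-input graded gate of the kind described for CREB might look like the following. Everything specific here is an assumption made for illustration: the first kinase is called `kinase_a` because its name is inaudible in the recording, the gate is taken to be a fuzzy AND implemented as a product (taking the minimum is another common choice), and the Hill parameters are invented rather than fitted values.

```python
def hill(x: float, ec50: float, n: float) -> float:
    """Normalized Hill transfer function mapping [0, 1] onto [0, 1]."""
    return (x**n / (ec50**n + x**n)) * (1.0 + ec50**n)


def creb_phospho(kinase_a: float, p38: float) -> float:
    """Predicted CREB phosphorylation from two upstream kinase activities.

    Hypothetical parameters: a steeper response to kinase_a and a
    shallower response to p38, combined with a product-type fuzzy AND.
    """
    return hill(kinase_a, ec50=0.5, n=4.0) * hill(p38, ec50=0.2, n=1.0)


# Dose-response style scan: leave one kinase fully active and reduce the
# other, as a stand-in for increasing inhibitor dose.
print("remaining  inhibit_kinase_a  inhibit_p38")
for remaining in (1.0, 0.75, 0.5, 0.25, 0.0):
    print(f"{remaining:9.2f}  {creb_phospho(remaining, 1.0):16.3f}  "
          f"{creb_phospho(1.0, remaining):11.3f}")
```

With these made-up parameters, inhibiting `kinase_a` lowers the output faster than inhibiting p38 by the same fraction, which is the kind of strong-versus-weak comparison such a model is meant to quantify.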
1433 01:08:39,250 --> 01:08:42,295 So you can turn these into quantitative predictions 1434 01:08:42,295 --> 01:08:45,000 of strong effects, weak effects. 1435 01:08:45,000 --> 01:08:47,149 And again, look at drug combinations. 1436 01:08:47,149 --> 01:08:51,649 So that's the advantage of going to this more analog transfer 1437 01:08:51,649 --> 01:08:53,950 function logic model. 1438 01:08:53,950 --> 01:08:57,430 You can deal with quantification much better, 1439 01:08:57,430 --> 01:08:59,938 but at the cost of requiring more data. 1440 01:08:59,938 --> 01:09:01,229 OK, I think I'll leave it here. 1441 01:09:01,229 --> 01:09:03,140 It's about 3:15 and so if there's 1442 01:09:03,140 --> 01:09:08,090 more questions we can take them about any aspect of this. 1443 01:09:08,090 --> 01:09:11,933 Most of you have stayed awake, I think that's a good thing. 1444 01:09:11,933 --> 01:09:12,710 OK. 1445 01:09:12,710 --> 01:09:13,366 More questions? 1446 01:09:20,104 --> 01:09:21,020 AUDIENCE: [INAUDIBLE]. 1447 01:09:21,020 --> 01:09:25,970 When you have the model for the template of the IKK story. 1448 01:09:25,970 --> 01:09:28,069 And then it seems like it may not 1449 01:09:28,069 --> 01:09:31,822 be as easy to back out the original data that 1450 01:09:31,822 --> 01:09:33,485 led to that specific mode. 1451 01:09:33,485 --> 01:09:35,100 For example, you showed that one arc 1452 01:09:35,100 --> 01:09:38,035 was from this one treatment. 1453 01:09:38,035 --> 01:09:41,182 But because if you trained the model the same as that 1454 01:09:41,182 --> 01:09:45,000 and it's not deterministic, then what-- could you just add-- 1455 01:09:45,000 --> 01:09:48,720 DOUG LAUFFENBURGER: I think that's a great question. 1456 01:09:48,720 --> 01:09:52,827 So let's say there's new arcs that you add, 1457 01:09:52,827 --> 01:09:54,410 that weren't in the original scaffold. 1458 01:09:54,410 --> 01:09:56,660 I mean that's what you got the biggest questions from.
1459 01:09:56,660 --> 01:09:58,620 If you delete one, you say, ah, it's 1460 01:09:58,620 --> 01:10:01,420 easy to believe why you would delete one. 1461 01:10:01,420 --> 01:10:03,480 Any arc that you add to get a best fit, 1462 01:10:03,480 --> 01:10:05,920 I think you've got to ask questions about. 1463 01:10:05,920 --> 01:10:07,450 So in all those cases where there 1464 01:10:07,450 --> 01:10:10,320 are arcs that were added that led to a better fit model, 1465 01:10:10,320 --> 01:10:12,610 the first thing we did was go to the literature. 1466 01:10:12,610 --> 01:10:17,480 Say, OK, is there literature on some effect of this node 1467 01:10:17,480 --> 01:10:19,160 to that node? 1468 01:10:19,160 --> 01:10:20,950 And it's just that that literature 1469 01:10:20,950 --> 01:10:22,560 wasn't curated into that database 1470 01:10:22,560 --> 01:10:24,830 or something like that. 1471 01:10:24,830 --> 01:10:27,730 And most of the time we could find it there. 1472 01:10:27,730 --> 01:10:28,490 OK. 1473 01:10:28,490 --> 01:10:31,720 So then there were the cases, and this was the most prominent 1474 01:10:31,720 --> 01:10:35,270 one, where from some added arc, we just 1475 01:10:35,270 --> 01:10:36,770 couldn't find it in the literature. 1476 01:10:36,770 --> 01:10:38,280 In this particular case, it was very 1477 01:10:38,280 --> 01:10:40,830 easy to trace it to this particular effect 1478 01:10:40,830 --> 01:10:43,412 of this inhibitor. 1479 01:10:43,412 --> 01:10:45,870 I would say there's no reason to believe that that's always 1480 01:10:45,870 --> 01:10:46,960 going to be the case. 1481 01:10:46,960 --> 01:10:49,790 I don't have another example to show you where it was harder. 1482 01:10:49,790 --> 01:10:52,804 Everything else we actually found in the literature.
1483 01:10:52,804 --> 01:10:55,220 But you could imagine, having some new arc that you really 1484 01:10:55,220 --> 01:10:57,250 couldn't find in the literature and there's 1485 01:10:57,250 --> 01:11:00,330 no artifactual explanation for it. 1486 01:11:00,330 --> 01:11:04,250 And now, how you trace it back to what the data was 1487 01:11:04,250 --> 01:11:07,274 that might give you a more nuanced hint. 1488 01:11:07,274 --> 01:11:08,190 It's a great question. 1489 01:11:08,190 --> 01:11:11,760 I don't really know how we'll do that. 1490 01:11:11,760 --> 01:11:14,450 I think, we and other practitioners who use this, 1491 01:11:14,450 --> 01:11:17,310 I'm sure we'll run into it at some point. 1492 01:11:17,310 --> 01:11:22,108 That's a great challenge to be thinking about. 1493 01:11:22,108 --> 01:11:22,608 Yeah? 1494 01:11:22,608 --> 01:11:24,596 AUDIENCE: I might have missed this earlier, 1495 01:11:24,596 --> 01:11:30,680 but I was wondering, is this model actually 1496 01:11:30,680 --> 01:11:34,600 able to incorporate the heterogeneity of a tumor, 1497 01:11:34,600 --> 01:11:35,580 for example? 1498 01:11:35,580 --> 01:11:40,010 Or the population heterogeneity? 1499 01:11:40,010 --> 01:11:44,068 DOUG LAUFFENBURGER: That's also a really interesting question. 1500 01:11:44,068 --> 01:11:47,470 Let me try to show something here. 1501 01:11:47,470 --> 01:11:50,380 Yeah. 1502 01:11:50,380 --> 01:11:52,320 So two things. 1503 01:11:52,320 --> 01:11:58,210 One is, what's shown here-- this is the four different tumor 1504 01:11:58,210 --> 01:12:00,270 cells that we did. 1505 01:12:00,270 --> 01:12:05,480 And what's shown in color is the arcs for each one of them. 1506 01:12:05,480 --> 01:12:09,510 So yellow, orange, brown, red. 1507 01:12:09,510 --> 01:12:12,840 So some places you see all four of those colors there. 1508 01:12:12,840 --> 01:12:15,150 In some places only two or one. 
1509 01:12:15,150 --> 01:12:17,650 It says if I had four different tumor types, 1510 01:12:17,650 --> 01:12:21,450 there's some slight differences in logic among them. 1511 01:12:21,450 --> 01:12:23,340 You could translate that to, well, 1512 01:12:23,340 --> 01:12:25,230 I could imagine then having a tumor 1513 01:12:25,230 --> 01:12:31,360 that's a mixture of subtypes and how would I discern that? 1514 01:12:31,360 --> 01:12:35,120 One possible idea that's attractive to me-- 1515 01:12:35,120 --> 01:12:37,450 although we didn't really explore this in any form of 1516 01:12:37,450 --> 01:12:37,640 [INAUDIBLE]. 1517 01:12:37,640 --> 01:12:39,350 We didn't really have the means to make 1518 01:12:39,350 --> 01:12:40,933 experimental measurements on the tumor 1519 01:12:40,933 --> 01:12:42,790 heterogeneity at the time. 1520 01:12:42,790 --> 01:12:45,570 It's when you get to a set of consensus models. 1521 01:12:45,570 --> 01:12:46,070 Right. 1522 01:12:46,070 --> 01:12:49,450 So let's say you get the 50 best fit models and you say, 1523 01:12:49,450 --> 01:12:55,530 some arc is in 80% of them, but it's not in 20% of them, 1524 01:12:55,530 --> 01:12:57,980 is it possible that that represents 1525 01:12:57,980 --> 01:13:02,250 some heterogeneity because you're 1526 01:13:02,250 --> 01:13:05,860 getting an average of different subtypes? 1527 01:13:05,860 --> 01:13:06,730 I don't know. 1528 01:13:06,730 --> 01:13:09,650 Sometimes that appeals to me as potentially valid. 1529 01:13:09,650 --> 01:13:13,820 Sometimes I think there's a flaw in that reasoning. 1530 01:13:13,820 --> 01:13:17,695 Just because you get an average that's not then as strong. 1531 01:13:17,695 --> 01:13:21,580 Does that necessarily reflect a subpopulation? 1532 01:13:21,580 --> 01:13:24,230 I don't know. 1533 01:13:24,230 --> 01:13:26,700 So what we do know is, we can see differences 1534 01:13:26,700 --> 01:13:28,770 when there are differences.
1535 01:13:28,770 --> 01:13:31,380 How you actually see them then, if all you 1536 01:13:31,380 --> 01:13:34,670 have is averaged data, maybe it's 1537 01:13:34,670 --> 01:13:38,110 reflected in the heterogeneity of the consensus models. 1538 01:13:38,110 --> 01:13:38,860 Maybe not. 1539 01:13:41,710 --> 01:13:43,500 It would be interesting to explore. 1540 01:13:48,349 --> 01:13:48,890 Anybody else? 1541 01:13:54,940 --> 01:13:55,440 All right. 1542 01:13:55,440 --> 01:13:55,940 All set. 1543 01:13:55,940 --> 01:13:57,280 Thanks.