1 00:00:00,090 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high-quality educational resources for free. 5 00:00:10,730 --> 00:00:13,340 To make a donation or view additional materials 6 00:00:13,340 --> 00:00:17,217 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,217 --> 00:00:17,842 at ocw.mit.edu. 8 00:00:21,680 --> 00:00:25,510 PROFESSOR: So today, our goal is to really go through 9 00:00:25,510 --> 00:00:28,240 the paper that you read, maybe last night, 10 00:00:28,240 --> 00:00:31,130 by Dekel and Alon, "Optimality and Evolutionary Tuning 11 00:00:31,130 --> 00:00:32,855 of the Expression Level of a Protein." 12 00:00:32,855 --> 00:00:35,690 It was published in Nature in 2005. 13 00:00:35,690 --> 00:00:39,140 I think that it's a very interesting paper, exploring 14 00:00:39,140 --> 00:00:42,060 some big, general ideas. 15 00:00:42,060 --> 00:00:46,960 I think it's also, in some ways, rather misleading. 16 00:00:46,960 --> 00:00:49,970 And we'll try to understand, or discuss, 17 00:00:49,970 --> 00:00:54,630 how the connections between experiment, 18 00:00:54,630 --> 00:00:57,760 theory, prediction, and so forth 19 00:00:57,760 --> 00:01:02,190 all play out in the context of this problem. 20 00:01:02,190 --> 00:01:04,360 Before we get going too much on the science, 21 00:01:04,360 --> 00:01:07,310 I just want to remind everyone that Andrew will not 22 00:01:07,310 --> 00:01:08,790 be having office hours today. 23 00:01:08,790 --> 00:01:13,800 He is off interviewing for MD-PhD programs right now. 24 00:01:13,800 --> 00:01:16,370 But if you had questions about the problems, 25 00:01:16,370 --> 00:01:18,770 I hope that you asked [? Sarab ?] last night.
26 00:01:18,770 --> 00:01:21,640 You might be able to grab him after the lecture 27 00:01:21,640 --> 00:01:25,230 today. 28 00:01:25,230 --> 00:01:30,895 Any other questions about anything before we get going? 29 00:01:30,895 --> 00:01:33,830 No. 30 00:01:33,830 --> 00:01:37,120 So I think that, in general, 31 00:01:37,120 --> 00:01:40,460 the lecture today is really a combination 32 00:01:40,460 --> 00:01:43,750 of starting to think about laboratory 33 00:01:43,750 --> 00:01:48,480 evolution, or population-level phenomena in general, 34 00:01:48,480 --> 00:01:52,090 as well as this question of optimization 35 00:01:52,090 --> 00:01:55,630 in terms of protein expression. 36 00:01:55,630 --> 00:02:00,870 So can somebody 37 00:02:00,870 --> 00:02:02,750 summarize the big idea of this paper? 38 00:02:10,150 --> 00:02:12,394 Yes, please. 39 00:02:12,394 --> 00:02:15,262 AUDIENCE: Protein expression levels 40 00:02:15,262 --> 00:02:20,998 evolve to optimal values for cost-benefit questions. 41 00:02:20,998 --> 00:02:23,920 PROFESSOR: Right, so that's the argument, at least. 42 00:02:23,920 --> 00:02:28,000 And they have a very nice first sentence here. 43 00:02:28,000 --> 00:02:31,220 "Different proteins have different expression levels." 44 00:02:31,220 --> 00:02:33,540 You know, it's hard to argue with that statement, 45 00:02:33,540 --> 00:02:34,660 nice and concise. 46 00:02:34,660 --> 00:02:37,600 But the question is, well, why? 47 00:02:37,600 --> 00:02:40,470 And I'd say that there is a range 48 00:02:40,470 --> 00:02:44,910 of different philosophical opinions out in the world. 49 00:02:44,910 --> 00:02:48,470 I'd say that one group, which is very much reflected 50 00:02:48,470 --> 00:02:51,610 in this study, tries to think about this 51 00:02:51,610 --> 00:02:54,180 in the context of optimization.
52 00:02:54,180 --> 00:02:57,900 Maybe the reason that we see a given level of expression 53 00:02:57,900 --> 00:03:01,780 of some protein is that, over evolutionary time, 54 00:03:01,780 --> 00:03:04,600 in some ancestral environment that we don't know, 55 00:03:04,600 --> 00:03:09,671 it evolved to optimize some cost-benefit problem. 56 00:03:09,671 --> 00:03:11,420 And then I'd say that there's another 57 00:03:11,420 --> 00:03:13,060 general philosophical approach 58 00:03:13,060 --> 00:03:16,740 that tends to be a little bit more agnostic, 59 00:03:16,740 --> 00:03:21,270 more of a sense that certainly things could have 60 00:03:21,270 --> 00:03:23,020 evolved to optimize something, 61 00:03:23,020 --> 00:03:25,880 but we can never really know the environment they evolved in, 62 00:03:25,880 --> 00:03:28,730 so we shouldn't be going out on a limb on these things. 63 00:03:28,730 --> 00:03:32,150 And given that this is philosophy, 64 00:03:32,150 --> 00:03:34,839 I will not require that you agree 65 00:03:34,839 --> 00:03:36,130 with any particular standpoint. 66 00:03:36,130 --> 00:03:38,420 But I will say that it's at least worth thinking 67 00:03:38,420 --> 00:03:40,070 about the question, and maybe you can 68 00:03:40,070 --> 00:03:44,150 do measurements to illuminate whether these ideas 69 00:03:44,150 --> 00:03:45,390 make sense. 70 00:03:45,390 --> 00:03:49,560 And then we'll try to, over the next hour and a half, 71 00:03:49,560 --> 00:03:53,660 figure out to what degree this paper should convince us 72 00:03:53,660 --> 00:03:55,740 of optimization in the context 73 00:03:55,740 --> 00:03:58,640 of this particular protein. 74 00:03:58,640 --> 00:04:01,360 Now, suppose that somebody 75 00:04:01,360 --> 00:04:05,490 convinces you that expression of the lac operon 76 00:04:05,490 --> 00:04:07,730 does optimize some cost-benefit trade-off.
77 00:04:07,730 --> 00:04:11,760 That does not prove that every protein optimizes things. 78 00:04:11,760 --> 00:04:15,310 So don't get overwhelmed or underwhelmed 79 00:04:15,310 --> 00:04:17,920 or whatever it might be. 80 00:04:17,920 --> 00:04:20,500 Let's just first make sure that we 81 00:04:20,500 --> 00:04:23,960 understand what we mean by costs and benefits in this case. 82 00:04:23,960 --> 00:04:27,330 Can somebody pick one of them? 83 00:04:35,340 --> 00:04:37,860 So what are the costs and benefits in the context 84 00:04:37,860 --> 00:04:39,060 of this paper? 85 00:04:46,272 --> 00:04:46,772 Yes. 86 00:04:46,772 --> 00:04:50,146 AUDIENCE: Producing protein requires some kind of resource. 87 00:04:50,146 --> 00:04:52,295 PROFESSOR: Right, it requires-- 88 00:04:52,295 --> 00:04:54,550 AUDIENCE: [INAUDIBLE] energy-- 89 00:04:54,550 --> 00:04:59,297 PROFESSOR: --it requires resources of some sort or another 90 00:04:59,297 --> 00:05:00,380 to express these proteins. 91 00:05:04,460 --> 00:05:07,780 And this can manifest in many different ways. 92 00:05:07,780 --> 00:05:11,342 But certainly, if you were not making these proteins, 93 00:05:11,342 --> 00:05:13,300 you could have been making some other proteins. 94 00:05:13,300 --> 00:05:15,110 And so if these proteins are not helping you, then 95 00:05:15,110 --> 00:05:16,640 maybe something else would have. 96 00:05:16,640 --> 00:05:20,190 There are many different ways of looking at this, 97 00:05:20,190 --> 00:05:22,840 but there is a finite number of things that the cell can do. 98 00:05:25,550 --> 00:05:28,250 And the benefits, of course-- and this 99 00:05:28,250 --> 00:05:30,700 is in particular in the case of the lac 100 00:05:30,700 --> 00:05:38,925 operon-- what does this network allow us to do? 101 00:05:38,925 --> 00:05:39,425 Yeah. 102 00:05:39,425 --> 00:05:42,400 AUDIENCE: You get to consume the energy of lactose. 103 00:05:42,400 --> 00:05:43,076 PROFESSOR: Yes.
104 00:05:43,076 --> 00:05:44,284 AUDIENCE: Lets you go faster. 105 00:05:44,284 --> 00:05:48,419 PROFESSOR: That's right, you get to consume lactose 106 00:05:48,419 --> 00:05:48,960 in this case. 107 00:05:51,720 --> 00:05:55,190 Now, we've already spent some time thinking about and discussing 108 00:05:55,190 --> 00:05:57,230 the lac operon. 109 00:05:57,230 --> 00:06:08,180 What were the two key components of the lac operon? 110 00:06:08,180 --> 00:06:11,237 If you were a cell and you wanted to eat lactose, 111 00:06:11,237 --> 00:06:12,320 what would you need to do? 112 00:06:25,550 --> 00:06:27,581 I'm picking somebody to-- yes, please. 113 00:06:27,581 --> 00:06:31,189 AUDIENCE: It's a gene that you should express, the lac gene? 114 00:06:31,189 --> 00:06:32,090 PROFESSOR: OK, right. 115 00:06:32,090 --> 00:06:33,030 So the lac genes. 116 00:06:33,030 --> 00:06:35,816 But in a little bit more detail, 117 00:06:35,816 --> 00:06:37,565 what do we mean when we say the lac genes? 118 00:06:40,900 --> 00:06:44,561 Well, it's not just lactose. 119 00:06:44,561 --> 00:06:46,060 What are the things that 120 00:06:46,060 --> 00:06:49,234 have to happen if you want to eat anything? 121 00:06:49,234 --> 00:06:49,733 Your cell-- 122 00:06:52,320 --> 00:06:53,670 AUDIENCE: Import. 123 00:06:53,670 --> 00:06:57,700 PROFESSOR: Right, so you first have to import it. 124 00:06:57,700 --> 00:07:01,970 Now, in some cases, for some 125 00:07:01,970 --> 00:07:03,900 nutrients, this could even be done passively, 126 00:07:03,900 --> 00:07:05,620 if the molecule crosses the membrane easily. 127 00:07:05,620 --> 00:07:08,600 But for most of the things that you might think about, 128 00:07:08,600 --> 00:07:10,710 you actually have to do active import. 129 00:07:10,710 --> 00:07:12,352 So this is done by what? 130 00:07:12,352 --> 00:07:13,060 Anybody remember? 131 00:07:16,451 --> 00:07:16,950 lacY.
132 00:07:20,660 --> 00:07:27,050 So lacY is a membrane protein that imports lactose. 133 00:07:33,330 --> 00:07:35,751 And then what do you need to do? 134 00:07:35,751 --> 00:07:38,613 AUDIENCE: Break the two apart, and then you can metabolize. 135 00:07:38,613 --> 00:07:43,120 PROFESSOR: Right, then you have to eat it somehow. 136 00:07:43,120 --> 00:07:46,300 Now, of course, metabolism is a very complicated thing. 137 00:07:46,300 --> 00:07:48,930 But the key thing that's different between lactose 138 00:07:48,930 --> 00:07:52,450 and the simple sugars is that you first 139 00:07:52,450 --> 00:07:55,760 have to break down the lactose into its constituent parts. 140 00:07:55,760 --> 00:07:58,530 Lactose is a disaccharide composed 141 00:07:58,530 --> 00:08:01,340 of two simple monosaccharides, glucose and galactose. 142 00:08:01,340 --> 00:08:05,210 So what you need is lacZ, beta-galactosidase, 143 00:08:05,210 --> 00:08:07,670 in order to cleave that bond. 144 00:08:07,670 --> 00:08:09,850 And then you have the two simple monosaccharides 145 00:08:09,850 --> 00:08:10,600 that can be eaten. 146 00:08:13,990 --> 00:08:16,100 Now, the lac operon also has this lacA. 147 00:08:16,100 --> 00:08:18,200 And it's not quite obvious what that thing does, 148 00:08:18,200 --> 00:08:19,460 so nobody ever talks about it. 149 00:08:19,460 --> 00:08:23,180 But there is a third protein there. 150 00:08:23,180 --> 00:08:24,929 What we always talk about is lacY, 151 00:08:24,929 --> 00:08:27,220 which is required to import the lactose, and lacZ, which 152 00:08:27,220 --> 00:08:32,429 is required to break the lactose down into its monosaccharides. 153 00:08:32,429 --> 00:08:35,440 Now, that's not sufficient on its own. 154 00:08:35,440 --> 00:08:36,940 You don't take those monosaccharides 155 00:08:36,940 --> 00:08:39,390 and instantly make more cells out of them.
156 00:08:39,390 --> 00:08:43,150 But the idea is that the rest of the metabolic machinery 157 00:08:43,150 --> 00:08:46,210 is there anyway, for other purposes. That's 158 00:08:46,210 --> 00:08:47,320 one of the assumptions. 159 00:08:58,570 --> 00:09:04,840 Can somebody explain how they measured the cost 160 00:09:04,840 --> 00:09:06,450 of expressing these proteins? 161 00:09:11,641 --> 00:09:12,140 Yes. 162 00:09:12,140 --> 00:09:19,535 AUDIENCE: So they [INAUDIBLE] expressed these proteins 163 00:09:19,535 --> 00:09:22,493 at different levels using different concentrations 164 00:09:22,493 --> 00:09:23,479 of IPTG. 165 00:09:23,479 --> 00:09:24,250 PROFESSOR: Right. 166 00:09:24,250 --> 00:09:26,272 AUDIENCE: There was no lactose around, 167 00:09:26,272 --> 00:09:28,062 so it was only the cost, no benefits. 168 00:09:28,062 --> 00:09:29,520 And then they measured [INAUDIBLE]. 169 00:09:29,520 --> 00:09:30,770 PROFESSOR: All right, perfect. 170 00:09:30,770 --> 00:09:32,760 OK, so there are several key things in here. 171 00:09:32,760 --> 00:09:34,330 First of all, normally, 172 00:09:34,330 --> 00:09:38,710 it's lactose inside the cell that causes the lac 173 00:09:38,710 --> 00:09:40,550 repressor to fall off, and then you get 174 00:09:40,550 --> 00:09:46,200 expression of the lac operon. 175 00:09:46,200 --> 00:09:50,010 But in order to sidestep or circumvent 176 00:09:50,010 --> 00:09:52,760 that normal network, what they are doing in this case 177 00:09:52,760 --> 00:09:54,672 is adding IPTG. 178 00:09:54,672 --> 00:10:04,070 So IPTG allows one to get expression of-- 179 00:10:04,070 --> 00:10:12,770 well, what IPTG does is bind the lac repressor and stop its inhibition of the lac 180 00:10:12,770 --> 00:10:14,480 promoter, so you get lacZ and lacY. And because IPTG is not metabolized, it brings no benefit of its own.
181 00:10:20,330 --> 00:10:24,120 Now, the idea here is that you can 182 00:10:24,120 --> 00:10:28,540 control the level of expression of this operon, 183 00:10:28,540 --> 00:10:30,600 because what we really want 184 00:10:30,600 --> 00:10:33,840 is to measure a plot of something 185 00:10:33,840 --> 00:10:38,350 that we call cost-- and we'll explore a little bit 186 00:10:38,350 --> 00:10:42,341 more what that means-- as a function of lac operon 187 00:10:42,341 --> 00:10:42,840 expression. 188 00:10:46,860 --> 00:10:52,500 And the expression is given relative to the full induction 189 00:10:52,500 --> 00:10:54,490 of the wild-type lac operon. 190 00:10:57,440 --> 00:11:00,370 And the cost is a relative growth rate reduction. 191 00:11:00,370 --> 00:11:03,250 So basically, this is a percentage 192 00:11:03,250 --> 00:11:06,180 decrease in growth rate. 193 00:11:10,770 --> 00:11:12,790 Now, there was a key thing that you brought up, 194 00:11:12,790 --> 00:11:15,160 which is that you want to measure the growth 195 00:11:15,160 --> 00:11:17,430 rate in the absence of lactose. 196 00:11:17,430 --> 00:11:20,650 As we increase the level of expression 197 00:11:20,650 --> 00:11:23,370 here-- and we're controlling this by IPTG, 198 00:11:23,370 --> 00:11:26,210 so there's some mapping from IPTG concentration 199 00:11:26,210 --> 00:11:29,390 to the level of expression here-- 200 00:11:29,390 --> 00:11:32,630 we want to be able to measure the cost 201 00:11:32,630 --> 00:11:35,430 separately from the benefits. 202 00:11:35,430 --> 00:11:39,760 So it's important to grow the cells in the absence of lactose. 203 00:11:39,760 --> 00:11:41,140 So, no lactose. 204 00:11:44,180 --> 00:11:47,920 But if I just take bacteria and I put them 205 00:11:47,920 --> 00:11:52,760 in a tube with, say, minimal media, salts, 206 00:11:52,760 --> 00:11:57,834 and so forth, but no lactose, are they going to grow?
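As a rough illustration of the two axes of this cost plot, here is a minimal Python sketch. It is not the paper's analysis code: it assumes the growth rate of each culture is taken as the least-squares slope of ln(OD) against time during exponential growth, and that the cost is the fractional drop in growth rate relative to an uninduced baseline grown in the same medium without lactose. The function names and the synthetic OD curves are hypothetical.

```python
import math

def growth_rate(times, ods):
    """Least-squares slope of ln(OD) versus time, i.e. the exponential growth rate."""
    logs = [math.log(od) for od in ods]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

def relative_growth_reduction(rate_induced, rate_baseline):
    """Cost as a fractional decrease in growth rate (0.04 means a 4% deficit)."""
    return 1.0 - rate_induced / rate_baseline

def relative_expression(level, full_induction_level):
    """Expression normalized so that full induction of the wild-type operon is 1."""
    return level / full_induction_level

# Hypothetical synthetic cultures: the induced one grows 4% more slowly.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
baseline = [0.01 * math.exp(0.50 * t) for t in times]
induced = [0.01 * math.exp(0.48 * t) for t in times]
cost = relative_growth_reduction(growth_rate(times, induced), growth_rate(times, baseline))
print(f"relative growth rate reduction: {cost:.3f}")  # prints 0.040
```

With these made-up curves the induced culture grows 4% more slowly, the same scale as the deficit quoted for full induction.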
207 00:11:57,834 --> 00:11:59,000 They need something to grow on. 208 00:12:01,820 --> 00:12:03,802 So what did the authors do? 209 00:12:11,174 --> 00:12:11,674 Yes. 210 00:12:11,674 --> 00:12:12,658 AUDIENCE: Glycerol. 211 00:12:12,658 --> 00:12:16,300 PROFESSOR: That's right, they added some glycerol, 212 00:12:16,300 --> 00:12:18,270 at different concentrations in different parts of the paper. 213 00:12:18,270 --> 00:12:20,100 I think it's 1% glycerol. 214 00:12:20,100 --> 00:12:23,950 Does anybody happen to remember? 215 00:12:23,950 --> 00:12:26,740 I think, for most of it, it was 0.1%. 216 00:12:26,740 --> 00:12:28,510 Tell you what, we'll just say 217 00:12:28,510 --> 00:12:30,940 a small concentration of glycerol. 218 00:12:35,215 --> 00:12:40,360 So the idea is that this is kind of a second-rate carbon source. 219 00:12:40,360 --> 00:12:44,630 The bacteria are not super happy, but they're OK. 220 00:12:44,630 --> 00:12:49,605 And given this, what they were able 221 00:12:49,605 --> 00:12:51,480 to demonstrate is that, if they did add lactose, 222 00:12:51,480 --> 00:12:54,100 the cells grew faster. 223 00:12:54,100 --> 00:12:57,832 So there's a sense that the lactose does help the cells. 224 00:12:57,832 --> 00:12:59,290 But you have to have some glycerol. 225 00:12:59,290 --> 00:13:03,069 Otherwise, you can't really measure these things. 226 00:13:03,069 --> 00:13:03,569 Yeah. 227 00:13:03,569 --> 00:13:05,755 AUDIENCE: Why is it that-- you were saying if you put like 228 00:13:05,755 --> 00:13:07,050 a very good carbon source-- 229 00:13:07,050 --> 00:13:07,460 PROFESSOR: Well-- 230 00:13:07,460 --> 00:13:08,820 AUDIENCE: You're not going to see any [INAUDIBLE]. 231 00:13:08,820 --> 00:13:10,065 PROFESSOR: OK, so first of all, what I was saying 232 00:13:10,065 --> 00:13:11,590 is that you have to have some carbon source. 233 00:13:11,590 --> 00:13:12,270 AUDIENCE: Sure. 234 00:13:12,270 --> 00:13:16,030 PROFESSOR: Right, so you have to do something.
235 00:13:16,030 --> 00:13:17,820 And it's just good conceptually to make 236 00:13:17,820 --> 00:13:19,130 sure you think about how you would actually 237 00:13:19,130 --> 00:13:19,940 do this experiment. 238 00:13:19,940 --> 00:13:21,685 Now, you have to add some carbon source. 239 00:13:21,685 --> 00:13:23,810 But the question is, well, what happens if you just 240 00:13:23,810 --> 00:13:24,960 added a bunch of glucose? 241 00:13:27,780 --> 00:13:31,670 Now, in that case, for some of the other experiments, 242 00:13:31,670 --> 00:13:34,040 I think that would have caused problems in the sense 243 00:13:34,040 --> 00:13:36,940 that then there would not be any benefit associated 244 00:13:36,940 --> 00:13:43,040 with increasing lac operon 245 00:13:43,040 --> 00:13:44,470 expression. 246 00:13:44,470 --> 00:13:45,940 For this experiment, in principle, 247 00:13:45,940 --> 00:13:48,529 one could have done that, although you really 248 00:13:48,529 --> 00:13:50,570 want to measure the costs and associated benefits 249 00:13:50,570 --> 00:13:51,690 in the same environment, which they're 250 00:13:51,690 --> 00:13:53,064 going to be doing in later experiments. 251 00:13:53,064 --> 00:13:56,790 So it's really a conceptual point: 252 00:13:56,790 --> 00:14:00,850 in principle, you can measure this in glucose, 253 00:14:00,850 --> 00:14:03,696 but then you'd always worry, oh, well, maybe it's different. 254 00:14:03,696 --> 00:14:04,195 Yeah. 255 00:14:04,195 --> 00:14:07,251 AUDIENCE: [INAUDIBLE]. 256 00:14:07,251 --> 00:14:08,630 PROFESSOR: Oh, yeah, right.
257 00:14:08,630 --> 00:14:13,440 So the other issue is that, 258 00:14:13,440 --> 00:14:15,840 in principle-- and they don't talk about this here-- 259 00:14:15,840 --> 00:14:17,790 if you add a bunch of glucose, then 260 00:14:17,790 --> 00:14:19,720 you would need another mutant in order 261 00:14:19,720 --> 00:14:22,480 to break glucose repression, because if you 262 00:14:22,480 --> 00:14:24,020 have this preferred carbon source, 263 00:14:24,020 --> 00:14:28,730 glucose, then you naturally repress, through CRP, all 264 00:14:28,730 --> 00:14:31,860 of the alternative modes of carbon metabolism, 265 00:14:31,860 --> 00:14:33,650 just because glucose is kind of the best. 266 00:14:38,100 --> 00:14:47,700 And what was the key conclusion from this first data plot? 267 00:14:47,700 --> 00:14:49,176 AUDIENCE: It's nonlinear. 268 00:14:49,176 --> 00:14:51,370 PROFESSOR: All right, nonlinear. 269 00:14:51,370 --> 00:14:53,660 The cost as a function of the lac expression 270 00:14:53,660 --> 00:14:54,827 grows superlinearly. 271 00:14:54,827 --> 00:14:56,284 I always forget the difference 272 00:14:56,284 --> 00:14:57,469 between concave and convex. 273 00:14:57,469 --> 00:14:59,760 I don't know if other people have this particular brain 274 00:14:59,760 --> 00:15:00,259 problem. 275 00:15:00,259 --> 00:15:03,559 But the second derivative is positive, so the curve is convex. 276 00:15:03,559 --> 00:15:05,100 In particular, that means that if you 277 00:15:05,100 --> 00:15:10,610 draw a straight line from zero to full expression, then 278 00:15:10,610 --> 00:15:13,985 they have data that looks something like-- so here 279 00:15:13,985 --> 00:15:15,980 is 0.5. 280 00:15:15,980 --> 00:15:19,790 We have a point that falls below the line here. 281 00:15:22,770 --> 00:15:24,880 They had one at about 0.25. 282 00:15:24,880 --> 00:15:29,555 And it was also a little bit below that line. 283 00:15:29,555 --> 00:15:34,100 They had a 0.75.
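A positive second derivative means the curve is convex, and for a convex curve every interior point lies below the straight chord joining the end points; that is the pattern described for the points at 0.25, 0.5 and 0.75. The sketch below checks both criteria on evenly spaced samples. The quadratic `hypothetical_cost` is an assumption chosen only to reach about 0.04 at full induction; it is not the function fitted in the paper.

```python
def second_differences(ys):
    """Discrete second derivative for evenly spaced samples; all positive suggests convexity."""
    return [ys[i - 1] - 2 * ys[i] + ys[i + 1] for i in range(1, len(ys) - 1)]

def below_chord(xs, ys):
    """True if every interior point lies strictly below the line joining the end points."""
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    return all(y < ys[0] + slope * (x - xs[0]) for x, y in zip(xs[1:-1], ys[1:-1]))

def hypothetical_cost(x):
    # Illustrative convex cost that reaches 0.04 at full induction (x = 1).
    return 0.04 * x ** 2

xs = [0.0, 0.25, 0.5, 0.75, 1.0]  # relative lac expression
ys = [hypothetical_cost(x) for x in xs]
print([round(d, 6) for d in second_differences(ys)])
print(below_chord(xs, ys))
```

The chord test needs no assumption about even spacing, so it is the more direct translation of "the points fall below the line".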
284 00:15:34,100 --> 00:15:35,550 And then they had a 1. 285 00:15:35,550 --> 00:15:37,630 Why is it that they can't go above 1 here? 286 00:15:47,724 --> 00:15:49,390 Why do they not have more data out here? 287 00:15:54,130 --> 00:15:56,500 AUDIENCE: Because you can't have more expression 288 00:15:56,500 --> 00:15:57,450 than full expression. 289 00:15:57,450 --> 00:15:58,900 PROFESSOR: You can't have more expression than full expression 290 00:15:58,900 --> 00:16:01,020 with this promoter, because what they're doing 291 00:16:01,020 --> 00:16:03,420 is adding IPTG, so they titrate 292 00:16:03,420 --> 00:16:07,752 between zero and maximal expression from this promoter. 293 00:16:07,752 --> 00:16:09,710 In principle, you could always use another, stronger promoter. 294 00:16:09,710 --> 00:16:12,083 And then you should be able to go out further, right? 295 00:16:15,640 --> 00:16:18,280 And at maximal expression, they measure 296 00:16:18,280 --> 00:16:26,695 about a 4% growth deficit, 0.04, just 297 00:16:26,695 --> 00:16:28,370 to give you a sense of scale. 298 00:16:28,370 --> 00:16:31,220 So this is a 4% deficit. 299 00:16:34,800 --> 00:16:38,310 Now, I want to ask a more general question. 300 00:16:38,310 --> 00:16:40,560 So let's imagine that you are measuring some quantity. 301 00:16:43,950 --> 00:16:48,320 So we'll say this is some quantity y as a function of x. 302 00:16:48,320 --> 00:16:53,390 And let's imagine that the true y as a function of x 303 00:16:53,390 --> 00:16:59,570 looks like this. 304 00:16:59,570 --> 00:17:04,670 Now, you go and measure this curve at multiple values 305 00:17:04,670 --> 00:17:13,440 of x, because we're very interested in what 306 00:17:13,440 --> 00:17:15,640 this curve looks like. 307 00:17:15,640 --> 00:17:22,470 Now, the question is, what fraction of the error bars 308 00:17:22,470 --> 00:17:42,550 will contain this curve-- and, of course, 309 00:17:42,550 --> 00:17:47,500 I mean the true curve?
310 00:17:47,500 --> 00:17:53,804 So I'm assuming that this curve is the God-given, actual thing 311 00:17:53,804 --> 00:17:54,720 that you're measuring. 312 00:17:54,720 --> 00:17:57,020 And you measure this quantity with noise. 313 00:18:01,480 --> 00:18:04,670 So we measure it some number of times. 314 00:18:11,579 --> 00:18:12,870 Do you understand the question? 315 00:18:12,870 --> 00:18:19,980 So here, the error bar contained the curve. 316 00:18:19,980 --> 00:18:21,970 There, it didn't. 317 00:18:21,970 --> 00:18:29,965 So what fraction of error bars will contain that curve? 318 00:18:29,965 --> 00:18:31,785 AUDIENCE: [INAUDIBLE]. 319 00:18:31,785 --> 00:18:34,990 PROFESSOR: Right. 320 00:18:34,990 --> 00:18:37,530 And indeed, it's always good to check-- 321 00:18:37,530 --> 00:18:41,260 what were the error bars in Figure 2A of this paper? 322 00:18:45,160 --> 00:18:48,420 Right, well, OK, so they're experimental error, right. 323 00:18:48,420 --> 00:18:50,170 Incidentally, how did they actually 324 00:18:50,170 --> 00:18:51,045 measure these things? 325 00:18:51,045 --> 00:19:00,510 Does anybody-- So these are actually 326 00:19:00,510 --> 00:19:04,600 a result of growing in a nice [INAUDIBLE] well, 327 00:19:04,600 --> 00:19:06,400 like a microtiter plate, where they 328 00:19:06,400 --> 00:19:07,650 used a checkerboard pattern. 329 00:19:07,650 --> 00:19:10,320 And they take 48 different cultures. 330 00:19:10,320 --> 00:19:11,727 And they measure the growth rates 331 00:19:11,727 --> 00:19:12,810 for each one individually. 332 00:19:12,810 --> 00:19:16,240 And then they're plotting the standard error of the mean. 333 00:19:49,240 --> 00:19:51,200 Do you understand what I'm trying to ask you? 334 00:19:54,062 --> 00:19:56,924 AUDIENCE: So in that case, I mean, 335 00:19:56,924 --> 00:20:00,077 the size of the error bars, you just want a scaling 336 00:20:00,077 --> 00:20:01,910 or something, [? if that's right, ?]
because 337 00:20:01,910 --> 00:20:03,190 the size of the error bars-- 338 00:20:03,190 --> 00:20:04,550 PROFESSOR: Right, well-- 339 00:20:04,550 --> 00:20:05,300 AUDIENCE: I just-- 340 00:20:05,300 --> 00:20:06,450 PROFESSOR: Yeah, OK, so this is a good question. 341 00:20:06,450 --> 00:20:07,075 We'll find out. 342 00:20:32,150 --> 00:20:35,160 So it may depend on n, where n is the number of samples 343 00:20:35,160 --> 00:20:37,075 that we took at each value of x. 344 00:20:41,700 --> 00:20:42,400 Question, yeah. 345 00:20:42,400 --> 00:20:45,774 AUDIENCE: Yeah, the standard error is just the [INAUDIBLE]? 346 00:20:48,412 --> 00:20:50,620 PROFESSOR: Right, so the standard error of the mean, this 347 00:20:50,620 --> 00:20:51,661 is an important question. 348 00:20:54,190 --> 00:20:57,570 What you do is you calculate the standard deviation and divide 349 00:20:57,570 --> 00:21:01,560 by the square root of n-- OK, now, 350 00:21:01,560 --> 00:21:04,760 I always forget whether it's n or n minus 1. 351 00:21:04,760 --> 00:21:09,020 We already did one n minus 1, right? 352 00:21:09,020 --> 00:21:12,210 So you measure the standard deviation 353 00:21:12,210 --> 00:21:15,380 of the data, the standard deviation in y, and divide 354 00:21:15,380 --> 00:21:18,200 by root n, where n is the number of measurements 355 00:21:18,200 --> 00:21:20,390 you took at that point. 356 00:21:20,390 --> 00:21:26,140 But of course, when you calculate the standard deviation, 357 00:21:26,140 --> 00:21:29,590 there's already an n minus 1 in there, right? 358 00:21:29,590 --> 00:21:31,390 Have I lost a minus 1? 359 00:21:31,390 --> 00:21:34,510 Do you guys-- OK. 360 00:21:37,412 --> 00:21:37,912 Yeah. 361 00:21:37,912 --> 00:21:41,476 AUDIENCE: Isn't the standard-- I thought the standard error 362 00:21:41,476 --> 00:21:45,202 of the mean and not the actual standard deviation [INAUDIBLE]? 363 00:21:45,202 --> 00:21:46,740 PROFESSOR: Yes.
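The n versus n minus 1 question has a tidy answer: the n minus 1 (Bessel's correction) belongs inside the sample standard deviation, and that result is then divided by the square root of n. A minimal Python sketch using only the standard library, where `statistics.stdev` already applies the n minus 1 correction; `standard_error_of_mean` is our own name, not a library function.

```python
import math
import statistics

def standard_error_of_mean(samples):
    """Sample standard deviation (n - 1 in its denominator) divided by sqrt(n)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    return statistics.stdev(samples) / math.sqrt(n)

measurements = [1.0, 2.0, 3.0, 4.0, 5.0]
print(statistics.stdev(measurements))         # about 1.581: spread of single measurements
print(standard_error_of_mean(measurements))   # about 0.707: uncertainty of their mean
```

The standard deviation describes the scatter of individual measurements and does not shrink with more data; the standard error describes how well the mean is known and falls as 1 over root n.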
364 00:21:46,740 --> 00:21:49,540 And we're going to spend a lot of time talking about what 365 00:21:49,540 --> 00:21:51,890 the difference is between a standard deviation 366 00:21:51,890 --> 00:21:53,290 and a standard error of the mean. 367 00:21:53,290 --> 00:21:55,123 Which one you want depends on what you're trying to ask. 368 00:22:04,975 --> 00:22:07,100 Do you guys understand what I'm trying to ask here? 369 00:22:09,625 --> 00:22:11,500 All right, well, let's just see where we are, 370 00:22:11,500 --> 00:22:12,790 and then we'll discuss. 371 00:22:12,790 --> 00:22:14,030 OK, ready? 372 00:22:14,030 --> 00:22:16,805 3, 2, 1. 373 00:22:22,840 --> 00:22:29,470 All right, so we got many A's, B's, C's. 374 00:22:29,470 --> 00:22:36,250 Nobody likes D. OK, but it's very common to see that. 375 00:22:36,250 --> 00:22:38,920 Let's go ahead and discuss-- it's worthwhile; 376 00:22:38,920 --> 00:22:45,240 I think there's enough variation here. 377 00:22:45,240 --> 00:22:47,970 And in particular, with your neighbor, 378 00:22:47,970 --> 00:22:50,640 try to agree on whether or not it 379 00:22:50,640 --> 00:22:54,313 might depend on n, and why. 380 00:22:54,313 --> 00:22:56,187 We'll just take a minute to think about this. 381 00:22:59,369 --> 00:23:00,660 AUDIENCE: [INTERPOSING VOICES]. 382 00:24:07,755 --> 00:24:09,516 PROFESSOR: So what do you guys think? 383 00:24:09,516 --> 00:24:12,780 AUDIENCE: We're still [INAUDIBLE]. 384 00:24:12,780 --> 00:24:14,100 PROFESSOR: OK, no, that's fine. 385 00:24:14,100 --> 00:24:15,391 AUDIENCE: [INTERPOSING VOICES]. 386 00:24:50,072 --> 00:24:52,030 PROFESSOR: Why don't we go ahead and reconvene, 387 00:24:52,030 --> 00:24:56,650 so we can try to figure out what is going on here. 388 00:24:56,650 --> 00:24:58,330 I just want to see if anybody has 389 00:24:58,330 --> 00:25:00,647 changed their opinion as a result of discussing 390 00:25:00,647 --> 00:25:01,480 with their neighbor.
391 00:25:01,480 --> 00:25:02,590 All right, let's see it. 392 00:25:02,590 --> 00:25:03,740 3, 2, 1. 393 00:25:06,280 --> 00:25:08,280 Some people are not even willing to-- all right, 394 00:25:08,280 --> 00:25:09,280 OK, so it's interesting. 395 00:25:09,280 --> 00:25:11,230 So now, actually, it seems like there 396 00:25:11,230 --> 00:25:15,430 is some convergence. 397 00:25:15,430 --> 00:25:19,790 Should I feel that you guys in general 398 00:25:19,790 --> 00:25:24,630 have more accurate votes than past years somehow? 399 00:25:24,630 --> 00:25:26,820 I don't know. 400 00:25:26,820 --> 00:25:30,950 So let's try to figure out why that might be, 401 00:25:30,950 --> 00:25:33,143 and what this standard deviation actually is. 402 00:25:33,143 --> 00:25:35,226 Let's try to figure out what all these things are. 403 00:25:39,510 --> 00:25:44,710 So the idea is that we're going to measure some quantity, 404 00:25:44,710 --> 00:25:46,820 but it's a measurement with error. 405 00:25:46,820 --> 00:25:49,210 And for now, we'll just assume that the measurement error 406 00:25:49,210 --> 00:25:54,880 is Gaussian distributed, because otherwise 407 00:25:54,880 --> 00:25:56,290 things get confusing. 408 00:25:56,290 --> 00:25:58,866 So what we're going to do 409 00:25:58,866 --> 00:26:01,020 is measure some quantity with error. 410 00:26:01,020 --> 00:26:12,750 Now, what we're interested in is not really 411 00:26:12,750 --> 00:26:14,560 the width of the resulting distribution, 412 00:26:14,560 --> 00:26:17,200 because that's a result of how accurate, 413 00:26:17,200 --> 00:26:19,950 how good we are as experimentalists. 414 00:26:19,950 --> 00:26:24,110 What we're really interested in is this true quantity, so 415 00:26:24,110 --> 00:26:27,520 the mean of our distribution. 416 00:26:27,520 --> 00:26:28,500 We want to know the mean.
417 00:26:34,840 --> 00:26:36,550 Now, if you read the supplemental section 418 00:26:36,550 --> 00:26:38,466 of this paper, what you'll see is that there's 419 00:26:38,466 --> 00:26:42,260 a significant standard deviation to their measurements, 420 00:26:42,260 --> 00:26:44,929 although they don't actually 421 00:26:44,929 --> 00:26:45,970 quote exactly what it is. 422 00:26:45,970 --> 00:26:49,350 But they have plots of the histograms, 423 00:26:49,350 --> 00:26:52,750 where, for example, this is a histogram 424 00:26:52,750 --> 00:26:55,270 of the different growth rate measurements across those 48 425 00:26:55,270 --> 00:26:59,080 samples, and actually, in this case, even more than that. 426 00:27:01,960 --> 00:27:04,160 What you see is that the standard deviation 427 00:27:04,160 --> 00:27:08,210 might be 3% or 4%. 428 00:27:08,210 --> 00:27:16,400 So the standard deviation is actually something that's big. 429 00:27:16,400 --> 00:27:18,570 Now, what we really 430 00:27:18,570 --> 00:27:21,202 want to know is how the means of these distributions 431 00:27:21,202 --> 00:27:23,160 are shifting, because we want to know something 432 00:27:23,160 --> 00:27:27,600 about the true underlying growth rate deficit, 433 00:27:27,600 --> 00:27:29,330 and each individual measurement 434 00:27:29,330 --> 00:27:32,750 is rather noisy. 435 00:27:32,750 --> 00:27:34,340 And indeed, in this case, the noise 436 00:27:34,340 --> 00:27:36,500 is larger than the signal. 437 00:27:36,500 --> 00:27:38,830 But if we believe that we don't have a 438 00:27:38,830 --> 00:27:41,546 systematic error shifting the mean, then we can average the noise out 439 00:27:41,546 --> 00:27:42,920 just by making many measurements.
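Back-of-the-envelope numbers show why averaging rescues an effect that is no bigger than the per-culture noise. This sketch plugs in the values mentioned in the lecture (a standard deviation of roughly 3% to 4%, about 48 cultures per condition, a 4% deficit at full induction) and assumes independent measurements with no systematic error; it is not a recomputation of the paper's statistics.

```python
import math

def sem_from_sd(sd, n):
    """Standard error of the mean for n independent measurements with spread sd."""
    return sd / math.sqrt(n)

signal = 0.04    # ~4% growth deficit at full induction
n_cultures = 48  # assumed number of independent cultures per condition
for sd in (0.03, 0.04):
    sem = sem_from_sd(sd, n_cultures)
    print(f"sd={sd:.2f}  sem={sem:.4f}  signal/sem={signal / sem:.1f}")
# sd=0.03  sem=0.0043  signal/sem=9.2
# sd=0.04  sem=0.0058  signal/sem=6.9
```

Even though a single culture is about as noisy as the effect itself, the mean over 48 cultures is pinned down to roughly half a percent.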
440 00:27:49,290 --> 00:27:51,400 So the question is, so the standard error 441 00:27:51,400 --> 00:27:53,930 of the mean, what it's telling us about 442 00:27:53,930 --> 00:27:58,800 is that if you measure this quantity n times, 443 00:27:58,800 --> 00:27:59,930 you get some mean. 444 00:28:03,420 --> 00:28:06,030 So let's say that this is a-- ooh, 445 00:28:06,030 --> 00:28:08,170 it's a somewhat broad Gaussian. 446 00:28:14,250 --> 00:28:18,520 So this is a histogram of our measurements of this thing. 447 00:28:18,520 --> 00:28:23,840 And what we want to know is the mean of this distribution. 448 00:28:23,840 --> 00:28:26,830 So this is similar to our discussion of super-resolution 449 00:28:26,830 --> 00:28:29,440 microscopy. 450 00:28:29,440 --> 00:28:31,850 And the question is, how will the mean 451 00:28:31,850 --> 00:28:34,445 be distributed if you have these n measurements? 452 00:28:40,070 --> 00:28:42,240 It's a Gaussian distribution. 453 00:28:42,240 --> 00:28:44,660 And it's certainly a Gaussian distribution, 454 00:28:44,660 --> 00:28:46,640 because of course, if we-- what we're doing 455 00:28:46,640 --> 00:28:48,480 is we're measuring a bunch of Gaussians. 456 00:28:48,480 --> 00:28:51,229 And we're going to add them all together. 457 00:28:51,229 --> 00:28:53,020 And then we're going to calculate the mean. 458 00:28:53,020 --> 00:28:56,120 So we definitely get a Gaussian. 459 00:28:56,120 --> 00:28:58,942 And indeed, because of the central limit theorem, 460 00:28:58,942 --> 00:29:01,150 this is also saying that even if your errors were not 461 00:29:01,150 --> 00:29:04,892 perfectly Gaussian distributed, even if they 462 00:29:04,892 --> 00:29:07,350 were a little bit funny-shaped, the resulting distributions 463 00:29:07,350 --> 00:29:09,310 of the means will look more like a Gaussian.
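A quick numerical check of the σ/√n suppression and of the central-limit point, written as a sketch. The 4% spread and the 48 replicates echo the numbers mentioned above. The 1% "true deficit" and the uniform error model are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

sigma = 0.04          # per-measurement SD (~4%, the order of magnitude quoted above)
true_mean = 0.01      # hypothetical true growth-rate deficit, smaller than the noise
n = 48                # measurements averaged per experiment
n_experiments = 20_000
predicted_sem = sigma / np.sqrt(n)

# Gaussian errors: each row is one experiment of n measurements; average each row.
gauss_means = rng.normal(true_mean, sigma, size=(n_experiments, n)).mean(axis=1)

# Non-Gaussian (uniform) errors with the same SD: half-width = sigma * sqrt(3).
half_width = sigma * np.sqrt(3.0)
uniform_means = (true_mean + rng.uniform(-half_width, half_width,
                                         size=(n_experiments, n))).mean(axis=1)

print(f"predicted SEM sigma/sqrt(n):  {predicted_sem:.5f}")
print(f"SD of means, Gaussian errors: {gauss_means.std():.5f}")
print(f"SD of means, uniform errors:  {uniform_means.std():.5f}")
```

Both simulated spreads should land close to σ/√n. Even though a single measurement's noise (4%) is larger than the 1% signal, the average of 48 measurements pins the mean down to well under 1%.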
464 00:29:13,320 --> 00:29:16,650 Now, what we often plot is the standard error 465 00:29:16,650 --> 00:29:21,330 of the mean, which is kind of the plus or minus 1 sigma 466 00:29:21,330 --> 00:29:24,490 of the distribution of the mean. 467 00:29:24,490 --> 00:29:28,070 So if we go and we sample from this distribution n times, 468 00:29:28,070 --> 00:29:29,215 we'll get some value. 469 00:29:29,215 --> 00:29:30,590 If we sample from it again, we'll 470 00:29:30,590 --> 00:29:32,032 get some other value, and so forth. 471 00:29:32,032 --> 00:29:34,490 Now, the distribution of the means we're going to calculate 472 00:29:34,490 --> 00:29:35,948 is not going to be a representation 473 00:29:35,948 --> 00:29:38,450 of the full standard deviation. 474 00:29:38,450 --> 00:29:40,150 But rather, it's going to be suppressed 475 00:29:40,150 --> 00:29:42,790 by this root n, where n is the number that we're sampling. 476 00:29:42,790 --> 00:29:45,510 So if you look at the histogram of the means, 477 00:29:45,510 --> 00:29:47,710 you're going to get a Gaussian in here-- OK, that's 478 00:29:47,710 --> 00:29:54,860 not a very nice Gaussian, but-- with a width that 479 00:29:54,860 --> 00:29:57,930 is the standard deviation divided by root n. 480 00:30:01,800 --> 00:30:07,270 Now, if we assume that we don't have any systematic error, then 481 00:30:07,270 --> 00:30:12,650 this distribution of means that you would have calculated-- 482 00:30:12,650 --> 00:30:14,940 it's Gaussian, it's centered on the right value, 483 00:30:14,940 --> 00:30:19,680 but about a third of the time, it'll 484 00:30:19,680 --> 00:30:23,230 be beyond the plus or minus 1 sigma. 485 00:30:23,230 --> 00:30:27,537 And what that means is that about a third of the time, 486 00:30:27,537 --> 00:30:29,370 if you plot this standard error of the mean, 487 00:30:29,370 --> 00:30:32,060 it should miss the true curve. 488 00:30:36,060 --> 00:30:41,330 And this basically does not depend on n.
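The claim that about a third of the bars miss, whatever n is, can be checked by simulation. This sketch assumes Gaussian errors with a known σ, so each error bar is exactly σ/√n. Under that assumption, the fraction of means falling outside ±1 SEM should sit near 0.32, that is, 2(1 − Φ(1)) for the standard normal CDF Φ, for every n. If σ were instead estimated from only a few points, the bars would miss somewhat more often.

```python
import numpy as np

rng = np.random.default_rng(1)

def miss_fraction(n, sigma=1.0, true_value=0.0, trials=20_000):
    """Fraction of experiments whose mean +/- 1 SEM excludes the true value."""
    data = rng.normal(true_value, sigma, size=(trials, n))
    means = data.mean(axis=1)
    sem = sigma / np.sqrt(n)          # sigma assumed known, so the SEM is exact
    return float(np.mean(np.abs(means - true_value) > sem))

fractions = {n: miss_fraction(n) for n in (3, 10, 100)}
for n, frac in fractions.items():
    print(f"n = {n:3d}: fraction of error bars missing the truth = {frac:.3f}")
```

The bars shrink as 1/√n, but the means move closer to the truth at the same rate, so the miss fraction stays put.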
489 00:30:41,330 --> 00:30:45,182 And can somebody say why it doesn't? 490 00:30:45,182 --> 00:30:46,150 Yeah. 491 00:30:46,150 --> 00:30:49,054 AUDIENCE: Yeah, I think I was sort of confusing myself, 492 00:30:49,054 --> 00:30:50,506 but this makes sense. 493 00:30:50,506 --> 00:30:55,081 So yeah, I mean, you know that these error bars will shrink 494 00:30:55,081 --> 00:30:56,524 if you take more measurements. 495 00:30:56,524 --> 00:30:59,420 But on the other hand, the actual measurements 496 00:30:59,420 --> 00:31:00,090 will be closer-- 497 00:31:00,090 --> 00:31:01,090 PROFESSOR: That's right. 498 00:31:01,090 --> 00:31:02,352 AUDIENCE: --to the true value. 499 00:31:02,352 --> 00:31:03,351 PROFESSOR: That's right. 500 00:31:03,351 --> 00:31:05,510 So what happens is that as you sample 501 00:31:05,510 --> 00:31:08,160 from this distribution a larger number n times, 502 00:31:08,160 --> 00:31:10,420 then your error bars shrink, but your measurements 503 00:31:10,420 --> 00:31:12,230 get closer to the curve. 504 00:31:12,230 --> 00:31:14,170 And those two effects cancel. 505 00:31:14,170 --> 00:31:19,220 So you should end up roughly with 2/3 of the error bars 506 00:31:19,220 --> 00:31:23,200 containing this curve, or 1/3 falling off. 507 00:31:23,200 --> 00:31:25,890 And I think that this is a little bit surprising, 508 00:31:25,890 --> 00:31:29,940 because there's always a sense that we feel that there's 509 00:31:29,940 --> 00:31:32,790 something wrong with our measurements or something 510 00:31:32,790 --> 00:31:40,320 wrong with our model or whatnot, if any error bar does not 511 00:31:40,320 --> 00:31:41,210 contain the line. 512 00:31:41,210 --> 00:31:44,420 I mean, I feel like I often see there's this effort that people 513 00:31:44,420 --> 00:31:51,220 have to try to make it so that these error bars always overlap 514 00:31:51,220 --> 00:31:53,565 with some underlying curve that is supposed 515 00:31:53,565 --> 00:31:54,830 to represent reality.
516 00:31:54,830 --> 00:31:58,880 But that's not, in principle, supposed to be true. 517 00:32:02,016 --> 00:32:05,772 Are there any questions about where we are right now? 518 00:32:05,772 --> 00:32:07,580 OK. 519 00:32:07,580 --> 00:32:10,160 Now, what I want to do is something slightly different, 520 00:32:10,160 --> 00:32:15,560 which is ask-- let's say that this is a curve that is not 521 00:32:15,560 --> 00:32:19,235 the underlying reality but is instead a fit to the data. 522 00:32:21,920 --> 00:32:28,421 How does this change anything that we've said? 523 00:32:31,310 --> 00:32:33,462 Or does it? 524 00:32:33,462 --> 00:32:39,510 All right, well, OK, let's-- OK, so we're going to say we do a fit. 525 00:32:39,510 --> 00:32:45,660 The question is, does this change the thing here? 526 00:32:45,660 --> 00:32:46,410 Do you understand? 527 00:32:51,950 --> 00:32:57,475 Change, A is Yes, B is No. 528 00:33:02,270 --> 00:33:02,770 Yes. 529 00:33:02,770 --> 00:33:05,752 AUDIENCE: Do you have the same modeling for the fit 530 00:33:05,752 --> 00:33:07,740 as we did for the original-- 531 00:33:07,740 --> 00:33:12,190 PROFESSOR: Yeah, well, let's say that this was a curve predicted 532 00:33:12,190 --> 00:33:16,900 by some fancy theory but that you 533 00:33:16,900 --> 00:33:21,324 have to specify the mass of something and the-- so I 534 00:33:21,324 --> 00:33:23,490 don't know, there are two things that are specified. 535 00:33:23,490 --> 00:33:25,390 So what you do is you fit. 536 00:33:25,390 --> 00:33:27,270 And the question is, does it change 537 00:33:27,270 --> 00:33:29,530 what fraction of the error bars you expect 538 00:33:29,530 --> 00:33:32,320 to contain the true curve? 539 00:33:32,320 --> 00:33:34,100 Ready? 540 00:33:34,100 --> 00:33:35,647 Is it not clear what I'm asking? 541 00:33:35,647 --> 00:33:39,623 AUDIENCE: But the true curve is determined by God.
542 00:33:39,623 --> 00:33:40,617 [LAUGHTER] 543 00:33:40,617 --> 00:33:42,890 PROFESSOR: Right, so the true curve-- we don't 544 00:33:42,890 --> 00:33:46,750 need to get too much into this. 545 00:33:46,750 --> 00:33:49,160 But I mean, the reason we're doing 546 00:33:49,160 --> 00:33:52,730 science is to try to look into the mind of God, right? 547 00:33:52,730 --> 00:33:55,140 So we're doing a fit to try to-- 548 00:33:55,140 --> 00:33:58,104 AUDIENCE: But you can fit anything to anything. 549 00:33:58,104 --> 00:34:00,952 You know, what does that mean? 550 00:34:00,952 --> 00:34:01,910 Do you see what I mean? 551 00:34:01,910 --> 00:34:02,860 Like, I could-- 552 00:34:02,860 --> 00:34:03,810 AUDIENCE: It depends on whether-- 553 00:34:03,810 --> 00:34:05,435 AUDIENCE: I could get a curve that passes 554 00:34:05,435 --> 00:34:08,082 through each and every one of these points, 555 00:34:08,082 --> 00:34:10,330 if you give me enough time with it. 556 00:34:10,330 --> 00:34:12,589 So I guess I don't understand the question. 557 00:34:12,589 --> 00:34:13,422 [INTERPOSING VOICES] 558 00:34:19,494 --> 00:34:20,650 PROFESSOR: All right, yeah. 559 00:34:20,650 --> 00:34:24,820 OK, but I think you're arguing for something already maybe. 560 00:34:24,820 --> 00:34:28,870 But let's just say that this was a-- I mean, 561 00:34:28,870 --> 00:34:33,969 just for concreteness, let's say that I measured at 15 562 00:34:33,969 --> 00:34:35,340 values of x. 563 00:34:35,340 --> 00:34:37,630 I have some error bars and some error. 564 00:34:37,630 --> 00:34:40,530 But then I needed three parameters 565 00:34:40,530 --> 00:34:41,880 to characterize this curve. 566 00:34:41,880 --> 00:34:45,246 And so those I used to fit. 567 00:34:45,246 --> 00:34:47,120 Are you happier with three fitting parameters 568 00:34:47,120 --> 00:34:49,824 and 15 measurements? 569 00:34:49,824 --> 00:34:51,449 All right, let's just see where we are.
570 00:34:51,449 --> 00:34:56,460 OK, ready, 3, 2, 1. 571 00:34:56,460 --> 00:35:00,220 OK, so we have a majority of A but a significant minority 572 00:35:00,220 --> 00:35:04,540 of B. So just to be a lot more concrete, 573 00:35:04,540 --> 00:35:06,270 can somebody say why they're saying yes? 574 00:35:13,594 --> 00:35:14,094 Yeah. 575 00:35:14,094 --> 00:35:17,191 AUDIENCE: I guess intuitively, [INAUDIBLE] we 576 00:35:17,191 --> 00:35:19,962 try to optimize the number of error bars that go through. 577 00:35:19,962 --> 00:35:21,800 PROFESSOR: Yeah, so the fit is somehow 578 00:35:21,800 --> 00:35:25,929 trying to get the curve to go near the error bars. 579 00:35:25,929 --> 00:35:27,470 And when we do a fit, we're 580 00:35:27,470 --> 00:35:30,180 typically trying to minimize this mean squared 581 00:35:30,180 --> 00:35:34,200 error or deviation from our curve to the data points. 582 00:35:38,940 --> 00:35:41,631 How much do you expect this to make a difference? 583 00:35:41,631 --> 00:35:43,130 So for concreteness again, let's say 584 00:35:43,130 --> 00:35:53,300 that I had 15 values of x that I was measuring things at. 585 00:35:53,300 --> 00:36:00,070 Now, we expect say five of them-- five will miss 586 00:36:00,070 --> 00:36:06,590 the true curve, we decided roughly. 587 00:36:06,590 --> 00:36:10,720 Now the question is, what happens if, instead 588 00:36:10,720 --> 00:36:13,370 of using this true curve, we do a fit using these three 589 00:36:13,370 --> 00:36:15,800 parameters? 590 00:36:15,800 --> 00:36:20,270 How much of a difference should it make to this very, 591 00:36:20,270 --> 00:36:22,390 very roughly? 592 00:36:22,390 --> 00:36:38,480 We'll see-- Now, I'm asking roughly how many of these error 593 00:36:38,480 --> 00:36:41,190 bars do you expect to then miss the fitted curve? 594 00:36:43,890 --> 00:36:49,310 And say we used three fitting parameters. 595 00:36:54,610 --> 00:36:56,640 Those are the parameters over there.
596 00:36:56,640 --> 00:36:58,550 Do you understand the question? 597 00:36:58,550 --> 00:37:01,190 So instead of plotting this god-given curve, 598 00:37:01,190 --> 00:37:03,220 instead we're plotting a curve that I'm 599 00:37:03,220 --> 00:37:06,170 giving you, where I use three fitting parameters to fit 600 00:37:06,170 --> 00:37:06,670 to the data. 601 00:37:09,410 --> 00:37:11,290 And I'm just trying to get it roughly. 602 00:37:11,290 --> 00:37:14,355 I think that this is not a rigorous statement 603 00:37:14,355 --> 00:37:16,230 I'm about to make, but just so that we're all 604 00:37:16,230 --> 00:37:17,350 roughly on the same page. 605 00:37:17,350 --> 00:37:20,415 All right, ready, 3, 2, 1. 606 00:37:27,900 --> 00:37:31,030 Right, so it'll be somewhere in here. 607 00:37:31,030 --> 00:37:34,240 And I think this is not quite true. 608 00:37:34,240 --> 00:37:39,100 But the idea is that, in particular, 609 00:37:39,100 --> 00:37:43,450 if you make n measurements and then you 610 00:37:43,450 --> 00:37:47,190 use n fitting parameters, in general 611 00:37:47,190 --> 00:37:50,960 you will get a perfect fit, i.e., 612 00:37:50,960 --> 00:37:53,780 the curve will go through every single data 613 00:37:53,780 --> 00:37:57,170 point amazingly perfectly. 614 00:37:57,170 --> 00:38:01,830 So if I give you 15 measurements across here 615 00:38:01,830 --> 00:38:05,060 and then I give you a 15-degree polynomial-- I guess, 616 00:38:05,060 --> 00:38:07,020 we only need a 14-degree polynomial 617 00:38:07,020 --> 00:38:10,180 with 15 free parameters-- then that polynomial 618 00:38:10,180 --> 00:38:14,960 will go through every one of your data points spot on, 619 00:38:14,960 --> 00:38:18,790 not even a question of whether it goes through the error bars. 620 00:38:18,790 --> 00:38:31,742 I'm saying literally-- and that's 621 00:38:31,742 --> 00:38:34,200 just because you're just solving 15 equations for 15 unknowns at that stage.
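A rough simulation of the 15-measurement example can make the "somewhere in here" answer concrete. The quadratic "true" curve and the noise level are assumptions chosen for illustration. NumPy's `Chebyshev.fit` is used only because it is better conditioned than a raw power-basis fit at degree 14. The script counts how many ±1σ error bars miss each of three curves: the true curve, a 3-parameter fit, and a 15-parameter fit.

```python
import numpy as np
from numpy.polynomial import Chebyshev

rng = np.random.default_rng(2)

n_points = 15
x = np.linspace(0.0, 1.0, n_points)
sigma = 0.1                                  # error bar half-width = 1 SD (assumed known)
true_curve = 1.0 + 2.0 * x - 1.5 * x**2      # hypothetical "god-given" curve

def misses(y, curve):
    """Number of +/- sigma error bars that do not contain the curve."""
    return int(np.sum(np.abs(y - curve) > sigma))

trials = 2000
miss_true = miss_fit3 = miss_fit15 = 0
for _ in range(trials):
    y = true_curve + rng.normal(0.0, sigma, n_points)
    fit3 = Chebyshev.fit(x, y, deg=2)     # 3 free parameters
    fit15 = Chebyshev.fit(x, y, deg=14)   # 15 free parameters: interpolates every point
    miss_true += misses(y, true_curve)
    miss_fit3 += misses(y, fit3(x))
    miss_fit15 += misses(y, fit15(x))

print("average misses vs true curve:     ", miss_true / trials)
print("average misses vs 3-parameter fit: ", miss_fit3 / trials)
print("average misses vs 15-parameter fit:", miss_fit15 / trials)
```

Against the true curve, about 15 × 0.32 ≈ 4.8 bars miss on average. The 3-parameter fit absorbs part of the noise, so fewer bars miss it. The degree-14 polynomial passes through every point, so none miss.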
622 00:38:37,060 --> 00:38:40,510 Now, this is a stupid statement, except that once you're 623 00:38:40,510 --> 00:38:44,020 kind of like in the heat of the moment, 624 00:38:44,020 --> 00:38:46,480 eagerly trying to do some fitting for your advisor 625 00:38:46,480 --> 00:38:49,920 or whatever, it's easy to fall into this trap, where 626 00:38:49,920 --> 00:38:53,169 you just kind of like add extra parameters. 627 00:38:53,169 --> 00:38:55,210 I mean, I definitely remember in graduate school, 628 00:38:55,210 --> 00:38:55,940 I was surprised. 629 00:38:55,940 --> 00:38:57,981 I was like, oh, this thing, it works wonderfully. 630 00:38:57,981 --> 00:39:01,380 It's like it seems to magically go through all my data. 631 00:39:01,380 --> 00:39:05,690 And then I felt very stupid like 30 seconds later. 632 00:39:05,690 --> 00:39:08,400 But this is just a very easy thing 633 00:39:08,400 --> 00:39:11,880 to screw up and forget about. 634 00:39:11,880 --> 00:39:13,680 So what this is saying is that, if you 635 00:39:13,680 --> 00:39:16,930 see a curve-- if in the course of your work 636 00:39:16,930 --> 00:39:23,020 or if you're reading a paper and you see some curve 637 00:39:23,020 --> 00:39:27,430 and you want to know something about how much information is 638 00:39:27,430 --> 00:39:30,810 in it or whether things look reasonable given the data, 639 00:39:30,810 --> 00:39:33,750 it's useful to kind of orient yourself 640 00:39:33,750 --> 00:39:38,040 relative to these statements, that depending 641 00:39:38,040 --> 00:39:40,960 on how many free parameters are being used, 642 00:39:40,960 --> 00:39:45,810 you expect a larger or smaller number of these data 643 00:39:45,810 --> 00:39:48,650 points to go through the curve that you see.
644 00:39:53,910 --> 00:39:57,397 But I just want to stress that you 645 00:39:57,397 --> 00:39:59,855 don't want to be anywhere close to the point where you have 646 00:39:59,855 --> 00:40:04,132 a number of parameters equal to the number 647 00:40:04,132 --> 00:40:05,590 of measurements that you're making. 648 00:40:05,590 --> 00:40:09,720 And for any sort of reasonable curve 649 00:40:09,720 --> 00:40:11,580 describing what you hope is a reality, 650 00:40:11,580 --> 00:40:15,445 you expect some of those data points with their error bars 651 00:40:15,445 --> 00:40:16,920 to miss the curve. 652 00:40:16,920 --> 00:40:18,340 And that doesn't mean that they're 653 00:40:18,340 --> 00:40:19,300 sloppy experimentalists. 654 00:40:19,300 --> 00:40:20,341 It doesn't mean whatever. 655 00:40:25,250 --> 00:40:31,640 OK, now coming back to the task at hand, 656 00:40:31,640 --> 00:40:33,460 do you understand why they're plotting 657 00:40:33,460 --> 00:40:34,670 the standard error of the mean rather 658 00:40:34,670 --> 00:40:35,836 than the standard deviation? 659 00:40:38,500 --> 00:40:41,781 Because what you're interested in, in principle, is not-- 660 00:40:41,781 --> 00:40:43,280 the question you're trying to answer 661 00:40:43,280 --> 00:40:46,720 is not how variable their measurements are 662 00:40:46,720 --> 00:40:49,900 but with what certainty they can claim 663 00:40:49,900 --> 00:40:56,847 to know the actual god-given, real cost associated 664 00:40:56,847 --> 00:40:59,430 with expressing these proteins as a function of the expression 665 00:40:59,430 --> 00:41:00,000 level. 666 00:41:00,000 --> 00:41:01,480 And for that, you really want to ask about the standard error 667 00:41:01,480 --> 00:41:02,158 of the mean. 668 00:41:07,051 --> 00:41:07,550 Great.
669 00:41:10,500 --> 00:41:15,450 So now, we can come back and ask about, 670 00:41:15,450 --> 00:41:17,750 why did I just spend half an hour 671 00:41:17,750 --> 00:41:22,820 talking about standard error of the mean, standard deviations, 672 00:41:22,820 --> 00:41:23,650 fitting to data? 673 00:41:28,057 --> 00:41:30,640 Well, you guys are probably all asking yourselves that question. 674 00:41:30,640 --> 00:41:33,814 But does anybody have an answer if I-- Yeah. 675 00:41:33,814 --> 00:41:40,506 AUDIENCE: You can fit with different curves 676 00:41:40,506 --> 00:41:42,312 if you use different things. 677 00:41:42,312 --> 00:41:44,770 PROFESSOR: You can fit with different curves if you-- yeah, 678 00:41:44,770 --> 00:41:48,614 I think that it's hard to argue with that statement. 679 00:41:48,614 --> 00:41:51,030 But the statement is a little bit like "different proteins 680 00:41:51,030 --> 00:41:55,350 have different expression levels," but a little bit more 681 00:41:55,350 --> 00:41:57,749 concrete maybe. 682 00:41:57,749 --> 00:41:58,249 Yeah. 683 00:41:58,249 --> 00:42:00,503 AUDIENCE: So in this case, I didn't 684 00:42:00,503 --> 00:42:03,079 check their calculations, but if you have a natural line, 685 00:42:03,079 --> 00:42:07,426 then you can't make this calculation of optimization. 686 00:42:07,426 --> 00:42:10,121 PROFESSOR: Yeah, but I think that-- right, so-- 687 00:42:10,121 --> 00:42:12,550 AUDIENCE: In the sense that there won't be [INAUDIBLE]. 688 00:42:12,550 --> 00:42:14,450 PROFESSOR: Yeah, OK. 689 00:42:14,450 --> 00:42:18,000 So I think that this is a tricky thing. 690 00:42:18,000 --> 00:42:24,630 The data certainly do argue for a superlinear cost. 691 00:42:24,630 --> 00:42:29,640 But I would say that they argue for it rather weakly, 692 00:42:29,640 --> 00:42:35,350 in that if you look at their data and you just fit a line, 693 00:42:35,350 --> 00:42:39,510 you would say, it's maybe OK.
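The following sketch compares a line with a quadratic fit. A quadratic contains the line as a special case, so its least-squares residual can never be larger, even when the truth is exactly linear. The setup uses assumed values: 15 points, a linear truth, and Gaussian noise. It also applies a standard nested-model F-test as one way to ask whether the quadratic's improvement exceeds what chance alone would produce. The critical value 4.75 is the approximate 95th percentile of F(1, 12).

```python
import numpy as np

rng = np.random.default_rng(3)

x = np.linspace(0.0, 1.0, 15)   # 15 measurement points (hypothetical)
sigma = 0.05                    # hypothetical noise SD
trials = 5000
F_CRIT = 4.75                   # approx. 95th percentile of F(1, 12); 12 = 15 points - 3 params

def rss(deg, y):
    """Residual sum of squares of a least-squares polynomial fit of degree deg."""
    coeffs = np.polyfit(x, y, deg)
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

quad_never_worse = True
significant = 0
for _ in range(trials):
    y = 1.0 - 0.3 * x + rng.normal(0.0, sigma, x.size)   # the truth is a straight line
    rss_lin, rss_quad = rss(1, y), rss(2, y)
    quad_never_worse = quad_never_worse and rss_quad <= rss_lin + 1e-12
    # Nested-model F statistic: improvement from the extra parameter vs. residual variance.
    f_stat = (rss_lin - rss_quad) / (rss_quad / (x.size - 3))
    significant += int(f_stat > F_CRIT)

print("quadratic RSS never larger than linear RSS:", quad_never_worse)
print("fraction where the quadratic is 'significantly' better:", significant / trials)
```

The quadratic always "fits better", but when the truth is a line it is judged significantly better only about 5% of the time, which is the false-positive rate the test is built to allow.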
694 00:42:39,510 --> 00:42:41,640 And of course, once again, should we 695 00:42:41,640 --> 00:42:46,110 be surprised that the quadratic fits better? 696 00:42:46,110 --> 00:42:47,200 No. 697 00:42:47,200 --> 00:42:50,480 And this is a very dangerous thing, 698 00:42:50,480 --> 00:42:51,639 if you're comparing models. 699 00:42:51,639 --> 00:42:53,930 It'll always be the case that if you add another parameter, 700 00:42:53,930 --> 00:42:57,240 the fit will look better. 701 00:42:57,240 --> 00:43:01,730 But the question then is how strong a case 702 00:43:01,730 --> 00:43:03,670 should we make of this? 703 00:43:03,670 --> 00:43:10,980 And then how important is it for the conclusions of the study? 704 00:43:10,980 --> 00:43:16,390 Now, in addition to the line and the quadratic, 705 00:43:16,390 --> 00:43:19,820 they had another curve in here, which 706 00:43:19,820 --> 00:43:22,650 looks like-- let me see if I can get it right for you guys. 707 00:43:28,080 --> 00:43:32,860 So this is fine, tricky thing. 708 00:43:32,860 --> 00:43:34,900 So it's the dashed line that looks 709 00:43:34,900 --> 00:43:37,650 very similar to the solid quadratic line. 710 00:43:37,650 --> 00:43:39,400 Can somebody remind us what the difference 711 00:43:39,400 --> 00:43:43,950 was between those two non-linear curves that they had? 712 00:43:53,350 --> 00:43:56,950 Why do they have two curves that look so similar? 713 00:44:07,730 --> 00:44:10,610 AUDIENCE: I think the dashed line corresponds to some model 714 00:44:10,610 --> 00:44:14,750 where there's only so much of this certain resource that-- 715 00:44:14,750 --> 00:44:17,840 PROFESSOR: Right, OK, so my dashed line is their red line, 716 00:44:17,840 --> 00:44:22,600 just to-- OK, dashed red in the paper. 717 00:44:22,600 --> 00:44:27,060 So it's this line where there's a finite amount of resources 718 00:44:27,060 --> 00:44:29,890 or protein-making machinery that the cell has.
719 00:44:29,890 --> 00:44:32,920 And if you use them up, then you don't get any growth. 720 00:44:32,920 --> 00:44:36,020 And of course, that statement kind of 721 00:44:36,020 --> 00:44:37,540 has to be true on some level. 722 00:44:37,540 --> 00:44:39,455 And the question is whether-- 723 00:44:39,455 --> 00:44:40,826 AUDIENCE: --that scale is-- 724 00:44:40,826 --> 00:44:42,450 PROFESSOR: --it's relevant here, right. 725 00:44:46,620 --> 00:44:48,890 Certainly, I would say that one question 726 00:44:48,890 --> 00:44:52,920 is whether you can reject the hypothesis that this cost 727 00:44:52,920 --> 00:44:53,810 function is a line. 728 00:44:53,810 --> 00:44:56,940 Another question is whether you can distinguish between the two 729 00:44:56,940 --> 00:45:01,490 quadratic or the two non-linear curves based on the data. 730 00:45:01,490 --> 00:45:03,720 And I think the answer to the second question 731 00:45:03,720 --> 00:45:05,960 is certainly not. 732 00:45:05,960 --> 00:45:07,510 And they don't claim that they can. 733 00:45:10,410 --> 00:45:12,590 But it's important to just note that it's 734 00:45:12,590 --> 00:45:15,320 just impossible for them to assume-- 735 00:45:15,320 --> 00:45:19,180 I mean, those curves are so, so similar over the entire range 736 00:45:19,180 --> 00:45:20,900 where they have data that it's going 737 00:45:20,900 --> 00:45:24,420 to be impossible to distinguish those two things. 738 00:45:24,420 --> 00:45:28,620 But does it matter which of the two cost functions 739 00:45:28,620 --> 00:45:30,630 is the true cost function? 740 00:45:33,595 --> 00:45:34,095 Yeah. 741 00:45:34,095 --> 00:45:37,560 AUDIENCE: Is it because the [INAUDIBLE] where 742 00:45:37,560 --> 00:45:39,045 the marginal benefits become zero 743 00:45:39,045 --> 00:45:43,005 is like inside the range where the cost functions are still 744 00:45:43,005 --> 00:45:44,985 exactly the same? 745 00:45:44,985 --> 00:45:46,060 PROFESSOR: OK, right.
746 00:45:46,060 --> 00:45:48,435 So what you're saying is that the two cost functions they 747 00:45:48,435 --> 00:45:53,100 have behave similarly over the range that is relevant, 748 00:45:53,100 --> 00:45:56,350 maybe, so therefore, it doesn't matter. 749 00:45:56,350 --> 00:45:59,749 Is that-- or am I-- OK. 750 00:45:59,749 --> 00:46:01,790 So why do they have two cost functions there then, 751 00:46:01,790 --> 00:46:03,390 why two non-linear cost functions? 752 00:46:08,790 --> 00:46:14,134 Just to provide variety in our modeling? 753 00:46:14,134 --> 00:46:14,634 Yep. 754 00:46:14,634 --> 00:46:17,094 AUDIENCE: They were doing another experiment later on, 755 00:46:17,094 --> 00:46:22,506 and they said something like something was saturated 756 00:46:22,506 --> 00:46:24,774 and that was modeled by the second cost function. 757 00:46:24,774 --> 00:46:25,690 PROFESSOR: Right, yes. 758 00:46:25,690 --> 00:46:26,770 That's right. 759 00:46:26,770 --> 00:46:28,947 And what's the later experiment they're going to do, 760 00:46:28,947 --> 00:46:29,780 just so that we're-- 761 00:46:34,532 --> 00:46:36,115 AUDIENCE: You should ask somebody else 762 00:46:36,115 --> 00:46:37,202 to explain that, not me. 763 00:46:37,202 --> 00:46:38,910 PROFESSOR: You regret opening your mouth. 764 00:46:38,910 --> 00:46:40,984 No, OK. 765 00:46:40,984 --> 00:46:42,400 So yeah, so what is the experiment 766 00:46:42,400 --> 00:46:45,291 that they're going to do? 767 00:46:45,291 --> 00:46:47,159 AUDIENCE: Measuring the benefit? 768 00:46:47,159 --> 00:46:49,533 PROFESSOR: So next, they're going to measure the benefit. 769 00:46:49,533 --> 00:46:52,730 But this question about the two cost functions 770 00:46:52,730 --> 00:46:54,802 is not really relevant yet for the benefit part. 771 00:47:01,760 --> 00:47:02,420 Yes.
772 00:47:02,420 --> 00:47:04,900 AUDIENCE: So they're doing it in different concentrations 773 00:47:04,900 --> 00:47:08,620 of lactose and seeing if the protein expression could 774 00:47:08,620 --> 00:47:09,370 adapt [INAUDIBLE]. 775 00:47:09,370 --> 00:47:11,590 PROFESSOR: Right, after a long time. 776 00:47:11,590 --> 00:47:13,880 So they actually do laboratory evolution experiments, 777 00:47:13,880 --> 00:47:16,780 where they grow these bacterial populations 778 00:47:16,780 --> 00:47:18,820 in different lactose concentrations. 779 00:47:18,820 --> 00:47:21,630 And then they look to see what level of lac operon 780 00:47:21,630 --> 00:47:25,146 expression the population evolves to. 781 00:47:25,146 --> 00:47:26,520 So what they're trying to do here 782 00:47:26,520 --> 00:47:28,470 is they're trying to say, OK, well, 783 00:47:28,470 --> 00:47:31,680 we can measure some cost as a function of expression. 784 00:47:31,680 --> 00:47:33,540 Maybe we can measure some benefits 785 00:47:33,540 --> 00:47:34,792 as a function of expression. 786 00:47:34,792 --> 00:47:36,250 And then from that, we'd like to be 787 00:47:36,250 --> 00:47:38,940 able to predict where the population will evolve to. 788 00:47:49,260 --> 00:47:52,604 And they had these two non-linear cost functions, 789 00:47:52,604 --> 00:47:55,020 which based on the data they have, they can't distinguish. 790 00:47:55,020 --> 00:47:57,519 But they say, oh, well, they're both kind of reasonable cost 791 00:47:57,519 --> 00:47:58,640 functions. 792 00:47:58,640 --> 00:48:00,250 And in some ways, maybe the problem 793 00:48:00,250 --> 00:48:03,550 here is that the two cost functions end up 794 00:48:03,550 --> 00:48:07,620 being wildly different in terms of predicting what happens 795 00:48:07,620 --> 00:48:11,849 for large lactose concentrations, where you would want 796 00:48:11,849 --> 00:48:13,140 to express more of the protein.
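Here is a sketch, with made-up parameters, of how two cost curves that are indistinguishable over the measured range can predict very different evolved expression levels. It assumes three ingredients. The first is a finite-resource cost of the form a·z/(1 − z/M), which blows up as expression z approaches a capacity M. The second is a quadratic fitted to that cost over 0 ≤ z ≤ 1. The third is a benefit that is linear in z, whose slope b stands in for growing in more lactose. None of these numbers or forms are taken from the paper.

```python
import numpy as np

a, M = 0.02, 1.8   # hypothetical cost scale and capacity (expression units: wild-type max = 1)

def cost_finite(z):
    """Finite-resource cost: grows without bound as expression z approaches M."""
    return a * z / (1.0 - z / M)

# Quadratic fitted to the finite-resource cost over the "measured" range 0 <= z <= 1.
z_meas = np.linspace(0.0, 1.0, 50)
p2, p1, p0 = np.polyfit(z_meas, cost_finite(z_meas), 2)

def cost_quad(z):
    return p0 + p1 * z + p2 * z**2

def optimum_finite(b):
    """argmax of b*z - cost_finite(z): solve a / (1 - z/M)**2 = b."""
    return M * (1.0 - np.sqrt(a / b)) if b > a else 0.0

def optimum_quad(b):
    """argmax of b*z - cost_quad(z): solve p1 + 2*p2*z = b (p2 > 0 here)."""
    return max((b - p1) / (2.0 * p2), 0.0)

max_gap = np.max(np.abs(cost_finite(z_meas) - cost_quad(z_meas)))
print(f"largest cost difference on 0..1: {max_gap:.4f}")
for b in (0.05, 0.2, 1.0):
    print(f"benefit slope {b}: z* finite-resource = {optimum_finite(b):.2f}, "
          f"z* quadratic = {optimum_quad(b):.2f}")
```

Over the measured range the two costs differ by well under 0.01, and at a small benefit slope they place the optimum at similar expression. At a large slope, the quadratic predicts expression many times beyond the measured range, while the finite-resource cost always keeps z* below M.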
797 00:48:20,140 --> 00:48:23,351 Do you guys-- do you remember this or not? 798 00:48:23,351 --> 00:48:23,850 Sort of. 799 00:48:26,687 --> 00:48:28,770 And that's actually-- well, you might as well just 800 00:48:28,770 --> 00:48:30,230 look at that. 801 00:48:30,230 --> 00:48:33,590 So that's figure 4. 802 00:48:33,590 --> 00:48:35,900 That's the normalized lacZ activity 803 00:48:35,900 --> 00:48:37,336 that the populations evolve to as 804 00:48:37,336 --> 00:48:38,960 a function of the lactose concentration 805 00:48:38,960 --> 00:48:39,820 they're evolving in. 806 00:48:39,820 --> 00:48:43,710 And what you see is that this red curve corresponding 807 00:48:43,710 --> 00:48:49,020 to the finite resources cost function, it explains the data. 808 00:48:49,020 --> 00:48:50,800 Whereas, the other ones very much do not. 809 00:48:55,570 --> 00:48:59,230 And that's just because these other models then 810 00:48:59,230 --> 00:49:02,570 would predict that if you grow the cells in a lot of lactose 811 00:49:02,570 --> 00:49:05,800 they should express out to five times 812 00:49:05,800 --> 00:49:08,442 the lac expression, much, much, much more, which is not 813 00:49:08,442 --> 00:49:09,650 what they see experimentally. 814 00:49:13,860 --> 00:49:14,360 Yes. 815 00:49:14,360 --> 00:49:17,762 AUDIENCE: Is there another way to put 816 00:49:17,762 --> 00:49:21,636 a bound on the expression, because of this expression 817 00:49:21,636 --> 00:49:22,136 we have? 818 00:49:22,136 --> 00:49:26,024 You mentioned for that promoter, it's not possible to-- 819 00:49:26,024 --> 00:49:28,080 PROFESSOR: OK, but the idea of evolution 820 00:49:28,080 --> 00:49:31,890 is that evolution can make it a stronger promoter. 821 00:49:31,890 --> 00:49:35,070 So you guys, one statement is, given this DNA sequence 822 00:49:35,070 --> 00:49:37,290 at that promoter, how much expression can you get? 823 00:49:37,290 --> 00:49:40,630 And the most you can get is this amount that's normalized to 1. 
824 00:49:40,630 --> 00:49:43,071 But if you make mutations in that promoter, 825 00:49:43,071 --> 00:49:44,320 then you could go out further. 826 00:49:48,050 --> 00:49:51,990 So the question now is, after we kind of tell you 827 00:49:51,990 --> 00:49:54,210 the results of these evolution experiments, 828 00:49:54,210 --> 00:49:59,200 how much should that favor this dashed red line, 829 00:49:59,200 --> 00:50:04,810 this super linear cost function with finite resources? 830 00:50:04,810 --> 00:50:11,810 And on one level, you'd say, oh, well, that's pretty compelling. 831 00:50:11,810 --> 00:50:16,540 On another level, later people that 832 00:50:16,540 --> 00:50:21,230 have come and measured this find that it's basically a line. 833 00:50:21,230 --> 00:50:23,390 So are there any questions? 834 00:50:23,390 --> 00:50:29,360 So it seems to basically be not true within this range. 835 00:50:29,360 --> 00:50:31,420 It is the case that if you go out far enough, 836 00:50:31,420 --> 00:50:32,890 then the growth does go to 0. 837 00:50:32,890 --> 00:50:36,341 But that's much further out. 838 00:50:36,341 --> 00:50:36,840 Yes. 839 00:50:36,840 --> 00:50:39,325 AUDIENCE: After they-- on the experiment, 840 00:50:39,325 --> 00:50:42,801 they had [INAUDIBLE] expression protein at the [INAUDIBLE] 841 00:50:42,801 --> 00:50:43,301 level. 842 00:50:43,301 --> 00:50:44,957 Why didn't they go back and do the experiment again, just 843 00:50:44,957 --> 00:50:45,786 to see [INAUDIBLE]? 844 00:50:45,786 --> 00:50:47,550 PROFESSOR: OK, so actually, one of these 845 00:50:47,550 --> 00:50:53,220 curves-- so the triangle, the sort of teal triangle, 846 00:50:53,220 --> 00:50:54,870 it is indeed higher up. 847 00:50:54,870 --> 00:50:57,330 And it's kind of here. 848 00:50:57,330 --> 00:51:01,000 So they do have a data point that is further beyond and is, 849 00:51:01,000 --> 00:51:03,030 again, above that curve. 
850 00:51:03,030 --> 00:51:06,560 So that does provide somewhat further support 851 00:51:06,560 --> 00:51:07,892 for a non-linear model. 852 00:51:10,870 --> 00:51:13,860 But again, there's a question of how strong that should be 853 00:51:13,860 --> 00:51:15,650 and so forth. 854 00:51:15,650 --> 00:51:19,030 And indeed, I'd say, for example, 855 00:51:19,030 --> 00:51:21,620 Terry Hwa has spent a lot of time 856 00:51:21,620 --> 00:51:26,450 characterizing growth rates as a function of many, many things. 857 00:51:26,450 --> 00:51:33,000 And if you measure the relative growth rate 858 00:51:33,000 --> 00:51:41,504 as a function of non-useful protein expression, 859 00:51:41,504 --> 00:51:43,420 what he finds is that this thing basically 860 00:51:43,420 --> 00:51:46,430 looks like a line on these axes. 861 00:51:46,430 --> 00:51:48,805 And it saturates if you're 862 00:51:48,805 --> 00:51:55,740 at around 30%, maybe, of total protein expression. 863 00:51:55,740 --> 00:51:57,600 So this is a lot. 864 00:51:57,600 --> 00:52:01,125 But this is kind of where the cell just can't handle that. 865 00:52:04,190 --> 00:52:07,530 So Terry Hwa has recently been exploring 866 00:52:07,530 --> 00:52:09,780 a lot of these sort of phenomenological growth 867 00:52:09,780 --> 00:52:14,830 laws, where he imposes costs of various sorts 868 00:52:14,830 --> 00:52:17,570 and then looks at how the cell kind of responds. 869 00:52:17,570 --> 00:52:21,020 And what he finds is just a remarkably large number 870 00:52:21,020 --> 00:52:27,410 of lines in various spaces that I find very surprising, 871 00:52:27,410 --> 00:52:29,070 but that he can understand using kind 872 00:52:29,070 --> 00:52:31,376 of some phenomenological modeling. 873 00:52:31,376 --> 00:52:34,020 But this is one of like a dozen lines 874 00:52:34,020 --> 00:52:37,060 that he sees on various axes.
875 00:52:37,060 --> 00:52:39,340 But the point here is that, as a function 876 00:52:39,340 --> 00:52:42,575 of the level of expression of these non-useful proteins, what 877 00:52:42,575 --> 00:52:45,390 he sees for a variety of different proteins-- including 878 00:52:45,390 --> 00:52:48,897 beta-gal but also beta-lactamase and other proteins that are not 879 00:52:48,897 --> 00:52:51,230 being used in that particular environment-- 880 00:52:51,230 --> 00:52:54,850 is that there's basically a linear cost 881 00:52:54,850 --> 00:53:03,900 to growth as you impose this non-useful protein expression. 882 00:53:03,900 --> 00:53:07,120 So I'd say that the basic statement 883 00:53:07,120 --> 00:53:10,660 that cost is superlinear, I 884 00:53:10,660 --> 00:53:15,130 think, ends up not being true. 885 00:53:15,130 --> 00:53:18,280 Now, what does it mean for this paper? 886 00:53:29,104 --> 00:53:32,056 AUDIENCE: I mean, they still presented the hypothesis 887 00:53:32,056 --> 00:53:35,132 and had these data to back up some of it. 888 00:53:35,132 --> 00:53:36,090 PROFESSOR: Yeah, right. 889 00:53:36,090 --> 00:53:39,770 So it's a very interesting hypothesis. 890 00:53:39,770 --> 00:53:41,480 They did nice evolution experiments, 891 00:53:41,480 --> 00:53:45,430 where they saw the population adapt to different levels. 892 00:53:45,430 --> 00:53:48,930 But what does it mean about the predictions, 893 00:53:48,930 --> 00:53:50,950 in particular, in the sense that if you measure 894 00:53:50,950 --> 00:53:52,590 costs and benefits, then you want to predict 895 00:53:52,590 --> 00:53:53,715 where it's going to evolve? 896 00:53:57,350 --> 00:53:58,170 What happens? 897 00:53:58,170 --> 00:54:01,510 If it's the case that cost as a function of expression 898 00:54:01,510 --> 00:54:02,930 is actually linear, then what does 899 00:54:02,930 --> 00:54:05,750 that mean for their ability to predict what's going to happen?
900 00:54:23,462 --> 00:54:26,232 AUDIENCE: Seems like if they used their same model 901 00:54:26,232 --> 00:54:29,148 for the benefit with this linear cost, 902 00:54:29,148 --> 00:54:32,042 their predictions would be really off [INAUDIBLE]. 903 00:54:32,042 --> 00:54:33,000 PROFESSOR: Yeah, right. 904 00:54:33,000 --> 00:54:36,660 So the problem is that if you actually use a linear function 905 00:54:36,660 --> 00:54:40,170 here, then their model doesn't even 906 00:54:40,170 --> 00:54:44,635 predict that there should be an optimum, because their benefit 907 00:54:44,635 --> 00:54:46,260 function ends up also being essentially 908 00:54:46,260 --> 00:54:49,390 linear with the amount of this protein expressed. 909 00:54:49,390 --> 00:54:56,460 So if you have two lines-- so overall growth goes something 910 00:54:56,460 --> 00:54:59,320 like benefits minus costs. 911 00:55:02,490 --> 00:55:07,340 And maybe this is a relative growth. 912 00:55:07,340 --> 00:55:15,340 So if you have a line here and a line here, no optimum. 913 00:55:15,340 --> 00:55:18,480 So that's kind of a bummer. 914 00:55:18,480 --> 00:55:24,560 But it doesn't mean that that's-- in biology, 915 00:55:24,560 --> 00:55:26,050 eventually things are non-linear, 916 00:55:26,050 --> 00:55:27,590 so there should be some optimum. 917 00:55:27,590 --> 00:55:28,965 And actually, what I would say is 918 00:55:28,965 --> 00:55:31,330 that I think that the non-linearity is probably 919 00:55:31,330 --> 00:55:32,830 actually here. 920 00:55:32,830 --> 00:55:34,790 The non-linearity that's relevant, 921 00:55:34,790 --> 00:55:39,150 maybe, is dominated by the benefit side rather than 922 00:55:39,150 --> 00:55:41,572 the cost side.
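The "two lines, no optimum" argument can be made concrete with a small numerical sketch. It assumes, purely for illustration, that relative growth is benefit minus cost, G(x) = B(x) - C(x), with x the expression level in arbitrary units. The parameter values b, c, and K are made up and are not fitted to the paper's data. With a linear benefit and a linear cost, the best expression level always sits at an edge of the allowed range. A saturating benefit b·x/(K + x) against a linear cost, by contrast, has an interior optimum at x* = sqrt(bK/c) - K.

```python
import math

# Illustrative model only: growth = benefit(x) - cost(x), with x the expression
# level in arbitrary units. Functional forms and parameter values are assumptions.

def growth_linear(x, b=0.10, c=0.05):
    """Linear benefit b*x minus linear cost c*x: monotonic in x, so no interior optimum."""
    return b * x - c * x

def growth_saturating(x, b=0.20, K=1.0, c=0.05):
    """Saturating benefit b*x/(K + x) minus linear cost c*x."""
    return b * x / (K + x) - c * x

def best_on_grid(growth, x_max=10.0, n=10001):
    """Brute-force the expression level in [0, x_max] that maximizes growth."""
    xs = [x_max * i / (n - 1) for i in range(n)]
    return max(xs, key=growth)

def analytic_optimum(b=0.20, K=1.0, c=0.05):
    """Set dB/dx = b*K/(K + x)**2 equal to c, giving x* = sqrt(b*K/c) - K (clipped at 0)."""
    return max(0.0, math.sqrt(b * K / c) - K)

if __name__ == "__main__":
    # The linear model simply runs to whichever edge of the range is favored.
    print("linear:", best_on_grid(growth_linear))            # 10.0, the upper edge
    print("saturating:", best_on_grid(growth_saturating))    # 1.0
    print("analytic:", analytic_optimum())                   # 1.0
```

A superlinear cost would also create an optimum. The sketch only shows that some nonlinearity is needed, and that it can just as well sit on the benefit side.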
923 00:55:41,572 --> 00:55:43,030 My guess as to what's going on here 924 00:55:43,030 --> 00:55:46,590 is that rather than the costs growing superlinearly 925 00:55:46,590 --> 00:55:48,510 with the expression level, 926 00:55:48,510 --> 00:55:52,280 the benefits will be sublinear with the expression level. 927 00:55:55,840 --> 00:55:58,040 And why might that be? 928 00:56:06,626 --> 00:56:08,740 AUDIENCE: [INAUDIBLE] 929 00:56:08,740 --> 00:56:11,480 splitting up more lactose isn't useful if it 930 00:56:11,480 --> 00:56:12,824 can't metabolize more of it. 931 00:56:12,824 --> 00:56:14,640 PROFESSOR: Right, you know, at some point, 932 00:56:14,640 --> 00:56:18,310 it's just that the cell doesn't need more sugar. 933 00:56:18,310 --> 00:56:20,140 And then it's not going to be as useful. 934 00:56:20,140 --> 00:56:22,640 And even before you get to that regime, 935 00:56:22,640 --> 00:56:25,700 I think there are various ways in which cells 936 00:56:25,700 --> 00:56:28,990 may be able to use the sugar more or less 937 00:56:28,990 --> 00:56:31,490 efficiently, depending on how much they have, which means 938 00:56:31,490 --> 00:56:35,360 that as-- and this is just like for us, the first slice 939 00:56:35,360 --> 00:56:36,517 of pizza is great. 940 00:56:36,517 --> 00:56:38,100 But then once you're at the fifth one, 941 00:56:38,100 --> 00:56:40,580 you start to feel a little bit full. 942 00:56:40,580 --> 00:56:47,020 So in general, benefits as a function of anything 943 00:56:47,020 --> 00:56:48,880 should have some saturating behavior. 944 00:56:53,560 --> 00:56:55,630 And my sense is that this is basically 945 00:56:55,630 --> 00:56:58,510 why there's an optimum here. 946 00:56:58,510 --> 00:57:04,640 Now, of course, I'd say that all these cost functions 947 00:57:04,640 --> 00:57:06,520 behave very similarly in here.
948 00:57:06,520 --> 00:57:08,500 So the predictions that they make in here 949 00:57:08,500 --> 00:57:11,020 are really not very sensitive to which of the cost functions 950 00:57:11,020 --> 00:57:11,520 they use. 951 00:57:11,520 --> 00:57:15,710 And those are all still then relevant and valid. 952 00:57:15,710 --> 00:57:17,640 The issue is just that trying to predict 953 00:57:17,640 --> 00:57:19,690 what happens beyond the range where you have data 954 00:57:19,690 --> 00:57:21,570 is very hard, because it depends very 955 00:57:21,570 --> 00:57:24,060 much on what your curve does past that region. 956 00:57:31,160 --> 00:57:33,560 So I guess I've made an argument that I 957 00:57:33,560 --> 00:57:35,060 think maybe what's happening 958 00:57:35,060 --> 00:57:37,059 is that the benefit function here is non-linear. 959 00:57:37,059 --> 00:57:39,540 But what did they actually do to measure the benefits? 960 00:57:39,540 --> 00:57:42,920 Because this is not, I think, totally obvious either. 961 00:58:06,510 --> 00:58:08,649 So what should I be plotting? 962 00:58:08,649 --> 00:58:10,440 Well, this is still a relative growth rate. 963 00:58:15,210 --> 00:58:19,530 And here, this was actually lactose concentration. 964 00:58:19,530 --> 00:58:21,964 So this is not lac expression, which 965 00:58:21,964 --> 00:58:24,130 is the most obvious thing that you would want to do, 966 00:58:24,130 --> 00:58:26,517 but that's harder. 967 00:58:26,517 --> 00:58:28,100 And what they show is that their model 968 00:58:28,100 --> 00:58:30,690 is sort of consistent on this axis. 969 00:58:30,690 --> 00:58:31,910 This is external lactose. 970 00:58:36,360 --> 00:58:41,870 And the idea is that-- here is 0-- 971 00:58:41,870 --> 00:58:45,860 in the absence of any lactose, if you induce the lac operon, 972 00:58:45,860 --> 00:58:50,400 then you're at this minus 4 and 1/2% or whatnot. 973 00:58:50,400 --> 00:58:54,650 So it kind of starts out down here.
974 00:58:54,650 --> 00:58:58,300 And then up here, it comes out up to above 0.1. 975 00:58:58,300 --> 00:59:01,620 So this is minus 4, maybe 4.5%. 976 00:59:01,620 --> 00:59:03,970 This is up here at 10%. 977 00:59:03,970 --> 00:59:09,730 And you end up with a curve that kind of 978 00:59:09,730 --> 00:59:15,760 goes from a 4% or 5% deficit up to a 10% or 11% advantage. 979 00:59:15,760 --> 00:59:18,820 And this is at full induction of the lac operon. 980 00:59:22,870 --> 00:59:26,860 What this is saying is that if you're 981 00:59:26,860 --> 00:59:31,890 making the proteins to break down and consume lactose, 982 00:59:31,890 --> 00:59:33,880 then there's a cost. 983 00:59:33,880 --> 00:59:35,500 That's just how they plotted it. 984 00:59:35,500 --> 00:59:38,620 But the benefits do indeed outweigh the costs 985 00:59:38,620 --> 00:59:40,180 at some concentration of lactose. 986 00:59:44,230 --> 00:59:48,160 But then here, there's a saturation. 987 00:59:48,160 --> 00:59:50,380 And here, the saturation in their model-- 988 00:59:50,380 --> 00:59:53,430 they get a saturation just because 989 00:59:53,430 --> 00:59:56,672 of the dynamics of import. 990 00:59:56,672 --> 00:59:58,130 So what they assume is that there's 991 00:59:58,130 --> 01:00:01,990 Michaelis-Menten kinetics for import. 992 01:00:01,990 --> 01:00:10,790 So the import rate kind of goes as the concentration 993 01:00:10,790 --> 01:00:16,390 of lactose divided by some K plus the concentration 994 01:00:16,390 --> 01:00:21,615 of lactose again, so Michaelis-Menten dynamics. 995 01:00:25,140 --> 01:00:27,932 But of course, if you have more of the protein lacY, 996 01:00:27,932 --> 01:00:29,390 then you'll be able to import more.
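Here is a minimal sketch of the Michaelis-Menten import argument. It assumes an import rate of v = k_cat · Y · L/(K_m + L), where L is the external lactose concentration and Y is the amount of LacY transporter. Scaling the maximal rate by transporter amount is a standard way to write this, not necessarily the paper's exact model, and the names and the values of k_cat and K_m are hypothetical. The rate saturates in L but stays proportional to Y. That is why saturation as a function of lactose does not, by itself, imply saturation as a function of protein level.

```python
def import_rate(lactose_mM, lacY, k_cat=1.0, K_m=0.5):
    """Michaelis-Menten import: saturating in external lactose, proportional to transporter.

    lacY is the amount of transporter in arbitrary units; k_cat and K_m are
    illustrative values, not measured constants for LacY.
    """
    return k_cat * lacY * lactose_mM / (K_m + lactose_mM)

if __name__ == "__main__":
    # Tenfold more lactose at fixed lacY: the rate barely moves once L >> K_m.
    print(import_rate(5.0, lacY=1.0), import_rate(50.0, lacY=1.0))
    # Twice the lacY at fixed lactose: the rate doubles, however high L is.
    print(import_rate(50.0, lacY=1.0), import_rate(50.0, lacY=2.0))
```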
997 01:00:32,452 --> 01:00:33,910 So just because you have saturation 998 01:00:33,910 --> 01:00:36,430 as a function of lactose does not 999 01:00:36,430 --> 01:00:38,340 mean that you'll have saturation in terms 1000 01:00:38,340 --> 01:00:43,027 of the number of proteins that you're making. 1001 01:00:43,027 --> 01:00:44,610 Do you understand why I'm saying that? 1002 01:00:47,890 --> 01:00:51,657 And indeed, I would say that many underlying models 1003 01:00:51,657 --> 01:00:53,740 could have been consistent with this data as well. 1004 01:00:53,740 --> 01:00:57,540 So I'd say that their data does not 1005 01:00:57,540 --> 01:01:02,260 reject the hypothesis that the benefit function is sublinear. 1006 01:01:06,044 --> 01:01:06,544 Yeah. 1007 01:01:06,544 --> 01:01:08,764 AUDIENCE: So you just said if you 1008 01:01:08,764 --> 01:01:12,350 have more lacY, you import more, and it would 1009 01:01:12,350 --> 01:01:13,700 saturate to an [INAUDIBLE]. 1010 01:01:13,700 --> 01:01:18,321 So you could imagine that by evolution something happened 1011 01:01:18,321 --> 01:01:18,820 there. 1012 01:01:18,820 --> 01:01:22,250 So why would you even expect the prediction 1013 01:01:22,250 --> 01:01:24,130 of this cost-benefit analysis? 1014 01:01:27,210 --> 01:01:28,300 You see what I mean? 1015 01:01:28,300 --> 01:01:31,760 PROFESSOR: OK, so you're saying that evolution 1016 01:01:31,760 --> 01:01:33,710 might be able to change other things as well 1017 01:01:33,710 --> 01:01:35,020 to kind of fiddle-- yeah. 1018 01:01:35,020 --> 01:01:36,787 I think this is an important question. 1019 01:01:36,787 --> 01:01:38,370 I think the basic answer is that there 1020 01:01:38,370 --> 01:01:40,495 are some things that are easier for evolution to do 1021 01:01:40,495 --> 01:01:41,850 than others. 1022 01:01:41,850 --> 01:01:46,240 And also that some things have maybe already been optimized.
1023 01:01:46,240 --> 01:01:48,820 Now, relevant to this point, so they 1024 01:01:48,820 --> 01:01:51,100 did these laboratory evolution experiments, 1025 01:01:51,100 --> 01:01:57,161 and there was one category of mutation that they did not see. 1026 01:01:57,161 --> 01:01:58,660 Does anybody remember what that was? 1027 01:02:09,009 --> 01:02:10,800 What's the most straightforward way of kind 1028 01:02:10,800 --> 01:02:14,080 of getting around all this cost-benefit discussion 1029 01:02:14,080 --> 01:02:15,160 that we've just had? 1030 01:02:37,137 --> 01:02:38,720 So the one thing that they did not see 1031 01:02:38,720 --> 01:02:43,100 was significant improvements in the enzyme. 1032 01:02:43,100 --> 01:02:46,190 So they checked, and they found that they did not 1033 01:02:46,190 --> 01:02:49,550 see any increase in lacZ activity normalized 1034 01:02:49,550 --> 01:02:53,400 by the amount of lacZ that was being made. 1035 01:02:53,400 --> 01:02:56,610 Now, that might make sense, because if this enzyme has 1036 01:02:56,610 --> 01:03:00,510 already gone through millions of years 1037 01:03:00,510 --> 01:03:03,820 of optimization to break down lactose, 1038 01:03:03,820 --> 01:03:06,430 then it's reasonable to say, oh, well, in the next five 1039 01:03:06,430 --> 01:03:10,484 generations in the lab, it maybe won't improve. 1040 01:03:10,484 --> 01:03:12,650 Of course, you always have to be careful about this, 1041 01:03:12,650 --> 01:03:17,470 because it could be that some sequence slash structure is 1042 01:03:17,470 --> 01:03:20,269 best when you're thinking about-- when is it 1043 01:03:20,269 --> 01:03:21,560 that E. coli might see lactose? 1044 01:03:26,400 --> 01:03:28,350 Our gut. 1045 01:03:28,350 --> 01:03:30,810 So you imagine you have bacteria in the gut. 1046 01:03:30,810 --> 01:03:33,520 That's a different environment than in the lab.
1047 01:03:33,520 --> 01:03:36,650 So it could very well be that, 1048 01:03:36,650 --> 01:03:39,120 because of the pH and all these other things, 1049 01:03:39,120 --> 01:03:42,120 the enzyme actually could adapt to the lab, 1050 01:03:42,120 --> 01:03:44,871 even though it may have already been adapted to our gut. 1051 01:03:44,871 --> 01:03:47,370 So you've got to be careful about this kind of argument 1052 01:03:47,370 --> 01:03:49,680 always. 1053 01:03:49,680 --> 01:03:51,970 But of course, once you see the result, then you say, 1054 01:03:51,970 --> 01:03:53,345 oh, well, that's because of this. 1055 01:03:56,086 --> 01:03:57,960 So I just want to make sure that we know what 1056 01:03:57,960 --> 01:03:59,126 these experiments look like. 1057 01:03:59,126 --> 01:04:01,115 So they went for 500 generations. 1058 01:04:07,270 --> 01:04:11,180 So it's useful to ask how long this experiment should 1059 01:04:11,180 --> 01:04:11,680 have taken. 1060 01:04:21,110 --> 01:04:35,695 Is it closest to three days, three weeks, or three months? 1061 01:04:51,030 --> 01:04:53,940 Anytime you read about an experiment, 1062 01:04:53,940 --> 01:04:57,174 it's useful just to have some notion of what the authors went 1063 01:04:57,174 --> 01:04:59,174 through in order to bring you the results you're 1064 01:04:59,174 --> 01:04:59,876 reading about. 1065 01:05:05,500 --> 01:05:10,420 If you are not sure, you can just make a guess. 1066 01:05:10,420 --> 01:05:15,210 OK, ready, 3, 2, 1. 1067 01:05:15,210 --> 01:05:18,040 All right, so we have some number of A's, some number 1068 01:05:18,040 --> 01:05:21,150 of B's, and a couple of C's. 1069 01:05:21,150 --> 01:05:24,390 Well, one thing you might say is, 1070 01:05:24,390 --> 01:05:27,690 how fast can E. coli divide? 1071 01:05:27,690 --> 01:05:30,015 OK, on one level, you may say, oh, about 20 minutes. 1072 01:05:32,930 --> 01:05:35,950 That should give us what? 1073 01:05:35,950 --> 01:05:38,570 75-ish generations a day.
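The back-of-the-envelope estimate is easy to check. A 20-minute doubling time gives 24 × 60 / 20 = 72 generations per day, which is the "75-ish" figure, so 500 generations of uninterrupted growth would take about a week. The 20-minute doubling time is the rough rich-media number quoted in the lecture, not a value from the paper.

```python
def generations_per_day(doubling_minutes):
    """Doublings in 24 hours of uninterrupted exponential growth."""
    return 24 * 60 / doubling_minutes

def days_needed(target_generations, gens_per_day):
    """Days of growth needed to accumulate a given number of generations."""
    return target_generations / gens_per_day

if __name__ == "__main__":
    g = generations_per_day(20)                  # 72.0
    print(g, round(days_needed(500, g), 1))      # 72.0 6.9
```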
1074 01:05:38,570 --> 01:05:46,940 So we should be able to get here in a week or something, maybe. 1075 01:05:46,940 --> 01:05:51,540 But that's not what they did, for several reasons. 1076 01:05:51,540 --> 01:05:54,790 First of all, this would be in rich media. 1077 01:05:54,790 --> 01:05:57,480 In the environment that they are doing this in, 1078 01:05:57,480 --> 01:05:58,800 it's a bit slower. 1079 01:05:58,800 --> 01:06:04,420 But that would get you maybe to the two or three-week mark. 1080 01:06:04,420 --> 01:06:06,162 But that still is not what happened. 1081 01:06:06,162 --> 01:06:07,870 They actually had to go for three months. 1082 01:06:10,780 --> 01:06:14,130 And this is because experiments do not always 1083 01:06:14,130 --> 01:06:16,070 keep cells constantly dividing 1084 01:06:16,070 --> 01:06:18,720 at their maximal rates. 1085 01:06:18,720 --> 01:06:20,340 The standard way that we do this is 1086 01:06:20,340 --> 01:06:22,570 what's known as kind of daily batch culture. 1087 01:06:25,250 --> 01:06:28,610 And does anybody know how much they diluted by each day? 1088 01:06:34,330 --> 01:06:38,950 Yeah, so I think it was diluting by a factor of 100. 1089 01:06:38,950 --> 01:06:47,810 So it's daily batch culture with 100x dilution, 1090 01:06:47,810 --> 01:06:52,130 which corresponds to about 6.6 generations per day, since 100 is about 2 to the 6.6. 1091 01:06:56,250 --> 01:06:59,085 So this is very far from what you 1092 01:06:59,085 --> 01:07:01,740 would think of as kind of the best they could possibly do. 1093 01:07:06,862 --> 01:07:08,320 And what it means is that, yeah, it 1094 01:07:08,320 --> 01:07:09,920 does take about three months for them 1095 01:07:09,920 --> 01:07:13,010 to have done this experiment. 1096 01:07:13,010 --> 01:07:19,900 It also means that if you look at the number of cells over the course 1097 01:07:19,900 --> 01:07:27,750 of each day, this is n max.
1098 01:07:27,750 --> 01:07:31,600 And they dilute-- this is n max over 100-- 1099 01:07:31,600 --> 01:07:36,170 so they dilute by a factor of 100. 1100 01:07:36,170 --> 01:07:40,066 When you transfer cells from a saturated state 1101 01:07:40,066 --> 01:07:42,440 into a new environment, do they start dividing immediately, 1102 01:07:42,440 --> 01:07:44,410 for those of you who have done this experiment? 1103 01:07:44,410 --> 01:07:45,130 No. 1104 01:07:45,130 --> 01:07:50,362 It's going to take an hour or two for them to get going. 1105 01:07:50,362 --> 01:07:52,070 But then they're going to start dividing. 1106 01:07:52,070 --> 01:08:00,135 And this is on a log scale, maybe-- log N. And what you'll see 1107 01:08:00,135 --> 01:08:02,260 is they kind of go-- they're dividing exponentially 1108 01:08:02,260 --> 01:08:04,190 and then they saturate. 1109 01:08:04,190 --> 01:08:06,850 Indeed, they're going to stay saturated for 1110 01:08:06,850 --> 01:08:07,860 a fair amount of time. 1111 01:08:07,860 --> 01:08:10,560 So this might be an hour or two. 1112 01:08:10,560 --> 01:08:14,710 This might be, say, five hours. 1113 01:08:14,710 --> 01:08:16,950 But then you still have another roughly 20 hours 1114 01:08:16,950 --> 01:08:20,270 to go before the next dilution. 1115 01:08:23,279 --> 01:08:25,790 And then we repeat. 1116 01:08:25,790 --> 01:08:28,443 So they're actually saturated for a fair fraction of the day. 1117 01:08:32,700 --> 01:08:35,879 Now, in all these discussions of laboratory evolution-- 1118 01:08:35,879 --> 01:08:37,420 and in many of the calculations we're 1119 01:08:37,420 --> 01:08:41,350 going to be doing over the next couple of weeks-- 1120 01:08:41,350 --> 01:08:44,180 we'll typically assume that what is being optimized 1121 01:08:44,180 --> 01:08:47,339 is the growth rate, the rate of division.
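The daily batch-culture cycle can be sketched as a simple time budget. After a 100-fold dilution, the population only has to regrow 100-fold, so each transfer yields log2(100) ≈ 6.6 generations no matter how fast the cells divide. Whatever is left of the day after the lag and growth phases is spent saturated. The lag (1.5 h) and doubling time (45 min) below are assumed round numbers, chosen to match the rough "hour or two" and "five hours" figures in the lecture. They are not values from the paper.

```python
import math

def daily_cycle(dilution=100, lag_h=1.5, doubling_h=0.75, day_h=24.0):
    """Split one batch-culture day into lag, exponential growth, and stationary phase.

    The population regrows from N_max/dilution back to N_max, so each transfer
    yields log2(dilution) generations regardless of the doubling time.
    Lag and doubling times are illustrative assumptions.
    """
    generations = math.log2(dilution)
    growth_h = generations * doubling_h
    stationary_h = day_h - lag_h - growth_h
    return generations, growth_h, stationary_h

if __name__ == "__main__":
    gens, grow, stat = daily_cycle()
    print(f"{gens:.2f} generations/day, {grow:.1f} h growing, {stat:.1f} h saturated")
    print(f"days for 500 generations: {500 / gens:.0f}")
```

With these defaults, the sketch gives about 17.5 hours in stationary phase and roughly 75 days for 500 generations. That is in line with the "about three months" the experiment took.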
1122 01:08:47,339 --> 01:08:49,380 But you can imagine there being other things that 1123 01:08:49,380 --> 01:08:51,621 might possibly be optimized in the course 1124 01:08:51,621 --> 01:08:52,870 of these sorts of experiments. 1125 01:08:52,870 --> 01:08:55,910 Can somebody volunteer some other things? 1126 01:09:00,338 --> 01:09:02,306 AUDIENCE: Maximum density? 1127 01:09:02,306 --> 01:09:04,569 PROFESSOR: Right, so you could imagine, 1128 01:09:04,569 --> 01:09:08,279 if you could just eke out one more division out there, 1129 01:09:08,279 --> 01:09:10,529 then you could get an advantage. 1130 01:09:10,529 --> 01:09:12,680 And there's a whole set of interesting things, 1131 01:09:12,680 --> 01:09:13,729 these growth advantage 1132 01:09:13,729 --> 01:09:16,540 in stationary phase, or GASP, mutants, 1133 01:09:16,540 --> 01:09:20,040 where the focus is on trying to do well here. 1134 01:09:20,040 --> 01:09:23,660 And also, related maybe, you can imagine doing 1135 01:09:23,660 --> 01:09:27,090 better out in this period: 1136 01:09:27,090 --> 01:09:29,310 cells will start dying eventually. 1137 01:09:29,310 --> 01:09:31,434 So if you have a lower rate of death at saturation, 1138 01:09:31,434 --> 01:09:34,930 then you can also spread. 1139 01:09:34,930 --> 01:09:36,136 Other-- yep. 1140 01:09:36,136 --> 01:09:38,571 AUDIENCE: Sorry, can I ask a quick question? 1141 01:09:38,571 --> 01:09:43,207 What's a possible reason for the initial [INAUDIBLE] used 1142 01:09:43,207 --> 01:09:44,415 [INAUDIBLE] at the beginning? 1143 01:09:44,415 --> 01:09:45,899 PROFESSOR: Yeah, right. 1144 01:09:45,899 --> 01:09:50,870 So I think it's basically that when the cells are saturated, 1145 01:09:50,870 --> 01:09:55,570 they generally enter a rather distinct physiological state, 1146 01:09:55,570 --> 01:09:58,070 as compared to the dividing state.
1147 01:09:58,070 --> 01:10:01,340 And I think the longer they sit in this saturated phase, 1148 01:10:01,340 --> 01:10:03,580 the longer it's going to take them to get going 1149 01:10:03,580 --> 01:10:06,620 the next day, for example. 1150 01:10:06,620 --> 01:10:10,260 And it's also the case that cells in saturated culture 1151 01:10:10,260 --> 01:10:12,840 tend to be more resistant to a variety of perturbations 1152 01:10:12,840 --> 01:10:17,340 of various sorts-- so if you're talking about heat, salt, this, 1153 01:10:17,340 --> 01:10:19,550 that, and the other thing. 1154 01:10:19,550 --> 01:10:22,210 What's something else that could be optimized here? 1155 01:10:29,906 --> 01:10:31,990 If you imagine you're a cell 1156 01:10:31,990 --> 01:10:35,004 and you want to spread, what would you do? 1157 01:10:39,620 --> 01:10:41,935 AUDIENCE: [INAUDIBLE]. 1158 01:10:41,935 --> 01:10:42,810 PROFESSOR: OK, right. 1159 01:10:42,810 --> 01:10:45,280 So we're saying that the media is specified 1160 01:10:45,280 --> 01:10:46,580 by the experimentalist. 1161 01:10:46,580 --> 01:10:53,510 So you're the cell in this Gedanken experiment. 1162 01:10:53,510 --> 01:10:54,919 AUDIENCE: You'd divide yourself. 1163 01:10:54,919 --> 01:10:57,210 PROFESSOR: Right, so you can eat the other cells, yeah. 1164 01:10:59,990 --> 01:11:01,990 Well, and in particular, actually out here, this 1165 01:11:01,990 --> 01:11:03,970 is part of how the GASP mutants spread: 1166 01:11:03,970 --> 01:11:06,230 when other cells start to die, 1167 01:11:06,230 --> 01:11:07,700 they lyse and release their contents. 1168 01:11:07,700 --> 01:11:10,740 And then the cells that are surviving 1169 01:11:10,740 --> 01:11:12,760 can actually eat the contents, yeah.
1170 01:11:12,760 --> 01:11:16,204 AUDIENCE: Is this a way to coordinate 1171 01:11:16,204 --> 01:11:21,124 between different cells, so that they can sort of evenly 1172 01:11:21,124 --> 01:11:22,900 distribute themselves in the media, 1173 01:11:22,900 --> 01:11:24,835 so you don't have too many-- 1174 01:11:24,835 --> 01:11:25,710 PROFESSOR: OK, right. 1175 01:11:25,710 --> 01:11:28,260 So I'm actually assuming here it's well mixed, 1176 01:11:28,260 --> 01:11:31,950 so in principle that would not be an issue. 1177 01:11:31,950 --> 01:11:34,490 But yeah, so you can imagine spatial effects 1178 01:11:34,490 --> 01:11:36,520 of various sorts being relevant. 1179 01:11:36,520 --> 01:11:39,470 I guess I just drew this up here to highlight 1180 01:11:39,470 --> 01:11:43,690 that, in principle, you can also decrease the lag time. 1181 01:11:46,450 --> 01:11:50,680 So if you start dividing more rapidly 1182 01:11:50,680 --> 01:11:53,762 at the beginning of the day, then you'll 1183 01:11:53,762 --> 01:11:55,220 get to spread before your neighbors 1184 01:11:55,220 --> 01:11:57,646 and your genotype will indeed spread. 1185 01:11:57,646 --> 01:11:58,440 Yep. 1186 01:11:58,440 --> 01:12:02,540 AUDIENCE: So I just know so little about cells, 1187 01:12:02,540 --> 01:12:07,832 but is it true that a lot of the cells 1188 01:12:07,832 --> 01:12:12,470 could survive and be the same cell for that whole duration 1189 01:12:12,470 --> 01:12:14,852 when they were in the stationary phase? 1190 01:12:14,852 --> 01:12:16,600 PROFESSOR: You're asking whether the-- 1191 01:12:16,600 --> 01:12:18,065 AUDIENCE: Yeah, whether a cell that 1192 01:12:18,065 --> 01:12:19,981 entered the beginning of the stationary phase, 1193 01:12:19,981 --> 01:12:22,400 that same cell would have a pretty good chance of-- 1194 01:12:22,400 --> 01:12:26,300 PROFESSOR: Yeah, over this sort of 12-hour type 1195 01:12:26,300 --> 01:12:28,590 period, I think the answer is yes.
1196 01:12:28,590 --> 01:12:31,870 But if you go for an extra day or two, 1197 01:12:31,870 --> 01:12:34,531 then I think you can start getting extensive cell death. 1198 01:12:34,531 --> 01:12:36,252 AUDIENCE: Because then who knows? 1199 01:12:36,252 --> 01:12:38,752 Maybe long enough, though, they would develop a little clock 1200 01:12:38,752 --> 01:12:40,450 to let them know that it was about to split. 1201 01:12:40,450 --> 01:12:41,366 PROFESSOR: Yes, right. 1202 01:12:41,366 --> 01:12:42,932 Yeah, so people have thought about-- 1203 01:12:42,932 --> 01:12:44,015 and I'm not sure if this-- 1204 01:12:44,015 --> 01:12:45,556 AUDIENCE: And it seems like it would. 1205 01:12:45,556 --> 01:12:47,589 PROFESSOR: --particular effect is-- yeah, right. 1206 01:12:47,589 --> 01:12:49,630 But I just want to mention that this is something 1207 01:12:49,630 --> 01:12:51,463 that you kind of maybe would expect and, indeed, 1208 01:12:51,463 --> 01:12:54,640 you see in these-- so there's a famous set of experiments done 1209 01:12:54,640 --> 01:13:01,920 by Richard Lenski at Michigan State, where he's been dividing 1210 01:13:01,920 --> 01:13:07,280 six or eight-- doing daily batch dilutions of E. coli 1211 01:13:07,280 --> 01:13:09,940 cultures now for decades. 1212 01:13:09,940 --> 01:13:13,980 So he started, I don't know, late '80s or so. 1213 01:13:13,980 --> 01:13:15,710 I don't know if you guys remember. 1214 01:13:15,710 --> 01:13:18,340 So he's gone tens of thousands of generations 1215 01:13:18,340 --> 01:13:21,480 and has seen a bunch of remarkable things. 1216 01:13:21,480 --> 01:13:23,012 One of the things that he has seen, 1217 01:13:23,012 --> 01:13:24,720 as you might have expected, is a decrease 1218 01:13:24,720 --> 01:13:27,275 in the lag time of the bacteria.
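Why a shorter lag is favored under daily batch culture can be sketched with one extra simplification. Suppose a hypothetical mutant grows at exactly the ancestor's rate but starts dividing a little earlier, and both strains stop at the same moment when the culture saturates. Each day the mutant then gains a factor exp(r·Δ) on the ancestor, where r is the growth rate and Δ is the lag reduction, and dilution leaves the ratio unchanged. All parameter values (45-minute doubling, 15-minute head start, 1% starting frequency) are illustrative assumptions.

```python
import math

def short_lag_frequency(days, p0=0.01, doubling_h=0.75, lag_gain_h=0.25):
    """Frequency of a hypothetical mutant that starts dividing lag_gain_h hours earlier.

    Simplifying assumptions: both strains grow exponentially at the same rate and stop
    at the same moment when the culture saturates, so each day the mutant:ancestor
    ratio is multiplied by exp(r * lag_gain_h); the 100x dilution leaves the ratio unchanged.
    """
    r = math.log(2) / doubling_h          # exponential growth rate per hour
    ratio = p0 / (1 - p0)
    for _ in range(days):
        ratio *= math.exp(r * lag_gain_h)
    return ratio / (1 + ratio)

if __name__ == "__main__":
    for d in (0, 10, 20, 40):
        print(d, round(short_lag_frequency(d), 3))
```

With these numbers, the head start is worth a third of a doubling per day. That is enough to take the mutant from 1% to roughly half the population in about 20 transfers.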
1219 01:13:34,050 --> 01:13:40,500 So what we have now is a situation where they add IPTG, 1220 01:13:40,500 --> 01:13:44,200 so that all the cells in principle 1221 01:13:44,200 --> 01:13:46,660 start out expressing the lac operon. 1222 01:13:46,660 --> 01:13:50,190 And then they grow the cells over time. 1223 01:13:50,190 --> 01:13:58,420 And what they see is that the lacZ activity starts out 1224 01:13:58,420 --> 01:14:03,660 at 1, normalized, for all the cultures, 1225 01:14:03,660 --> 01:14:05,920 because there's IPTG, so it doesn't matter 1226 01:14:05,920 --> 01:14:07,270 how much lactose there is. 1227 01:14:07,270 --> 01:14:13,871 But over time, 1228 01:14:13,871 --> 01:14:15,370 they see things that look like this. 1229 01:14:19,100 --> 01:14:25,740 So the 0.5 millimolar lactose didn't change very much. 1230 01:14:25,740 --> 01:14:32,580 But if you look at some of the others, like no lactose, 1231 01:14:32,580 --> 01:14:35,380 there was a significant decrease in expression. 1232 01:14:35,380 --> 01:14:38,945 Whereas, up here at, for example, 2 millimolar lactose, 1233 01:14:38,945 --> 01:14:39,820 they see an increase. 1234 01:14:43,610 --> 01:14:46,290 So what you see is that there really 1235 01:14:46,290 --> 01:14:53,150 are evolutionary changes in these strains, because-- 1236 01:14:53,150 --> 01:14:57,000 and it's very, very relevant that they 1237 01:14:57,000 --> 01:14:59,150 had IPTG in the media. 1238 01:14:59,150 --> 01:15:01,305 So if they did this experiment without IPTG, 1239 01:15:01,305 --> 01:15:05,770 do you have any sense of what would kind of happen 1240 01:15:05,770 --> 01:15:06,350 to the cells? 1241 01:15:06,350 --> 01:15:09,894 I mean, how would that change the results? 1242 01:15:09,894 --> 01:15:10,394 Yep. 1243 01:15:10,394 --> 01:15:11,964 AUDIENCE: The expression level would 1244 01:15:11,964 --> 01:15:13,691 be determined by [INAUDIBLE].
1245 01:15:13,691 --> 01:15:15,830 PROFESSOR: Right, so the expression would 1246 01:15:15,830 --> 01:15:17,660 be determined by the lactose. 1247 01:15:17,660 --> 01:15:21,020 But let's say that after 500 generations, 1248 01:15:21,020 --> 01:15:23,412 we put them all in 1 millimolar lactose. 1249 01:15:23,412 --> 01:15:25,370 How different do you think they're going to be? 1250 01:15:34,240 --> 01:15:37,180 I mean, do you think that the culture grown, for example, 1251 01:15:37,180 --> 01:15:38,950 in the absence of lactose-- do you 1252 01:15:38,950 --> 01:15:42,870 think that it would still be able to eat lactose after 500 1253 01:15:42,870 --> 01:15:44,775 generations in that experiment? 1254 01:15:51,560 --> 01:15:52,674 Hm? 1255 01:15:52,674 --> 01:15:53,560 AUDIENCE: Yes. 1256 01:15:53,560 --> 01:15:54,410 PROFESSOR: Yes, OK. 1257 01:15:54,410 --> 01:15:55,868 And yeah, so what's the difference? 1258 01:15:55,868 --> 01:15:58,472 I mean, why are you saying yes or what's the-- 1259 01:15:58,472 --> 01:15:59,930 AUDIENCE: [INAUDIBLE]. 1260 01:15:59,930 --> 01:16:01,390 I don't know how hard it is-- 1261 01:16:01,390 --> 01:16:02,832 PROFESSOR: Well, yeah, right. 1262 01:16:02,832 --> 01:16:04,290 So this is an experiment with IPTG. 1263 01:16:08,580 --> 01:16:10,990 And now, I'm just trying to think about or imagine 1264 01:16:10,990 --> 01:16:13,580 what would have happened if they had done the same experiment 1265 01:16:13,580 --> 01:16:16,950 without IPTG, just growing in that environment, 1266 01:16:16,950 --> 01:16:24,912 in particular, if you grow minus IPTG and minus lactose 1267 01:16:24,912 --> 01:16:25,745 for 500 generations. 1268 01:16:29,902 --> 01:16:31,360 And then what I want to ask is, OK, 1269 01:16:31,360 --> 01:16:33,776 let's say that you go over there and you just add lactose. 1270 01:16:35,597 --> 01:16:38,180 Will the cells, do you think, be able to grow on the lactose? 1271 01:16:41,200 --> 01:16:41,700 OK.
1272 01:16:41,700 --> 01:16:47,100 And so why is it that here the answer seems to be no? 1273 01:16:57,880 --> 01:16:59,710 So here, we have evolved a population 1274 01:16:59,710 --> 01:17:07,060 that is not only not expressing lacZ 1275 01:17:07,060 --> 01:17:07,700 activity here. 1276 01:17:07,700 --> 01:17:11,970 But indeed, if you put lactose in there, it doesn't express. 1277 01:17:11,970 --> 01:17:16,280 So these cells can no longer grow on lactose. 1278 01:17:16,280 --> 01:17:18,544 So what's the key difference here? 1279 01:17:18,544 --> 01:17:19,044 Yep. 1280 01:17:19,044 --> 01:17:21,996 AUDIENCE: So I mean, there's no [INAUDIBLE] in this case. 1281 01:17:21,996 --> 01:17:22,980 PROFESSOR: Right. 1282 01:17:22,980 --> 01:17:25,350 Now I think this is just really important. 1283 01:17:25,350 --> 01:17:28,630 So in this case, there is approximately, we'll say, 1284 01:17:28,630 --> 01:17:35,290 no cost to having the lac operon on there, because it's just not 1285 01:17:35,290 --> 01:17:36,542 being expressed. 1286 01:17:36,542 --> 01:17:38,000 So then the only cost is associated 1287 01:17:38,000 --> 01:17:39,780 with DNA replication. 1288 01:17:39,780 --> 01:17:44,410 So the advantage associated with shutting off or removing 1289 01:17:44,410 --> 01:17:46,660 the ability to grow on lactose is just really minimal. 1290 01:17:46,660 --> 01:17:53,020 And indeed, in this culture, the authors did say what happened. 1291 01:17:53,020 --> 01:17:57,372 AUDIENCE: Yeah, the entire gene is deleted, right? 1292 01:17:57,372 --> 01:17:58,330 PROFESSOR: Yeah, right. 1293 01:17:58,330 --> 01:18:01,410 So almost a kb was just removed from the genome. 1294 01:18:01,410 --> 01:18:04,510 And that kb included the promoter. 1295 01:18:04,510 --> 01:18:07,844 And so that it just-- yeah. 1296 01:18:07,844 --> 01:18:10,177 So it's not going to be able to grow on lactose anymore.
1297 01:18:12,980 --> 01:18:15,310 But the key thing is that here, these cells 1298 01:18:15,310 --> 01:18:17,400 were subject to this 5% cost associated 1299 01:18:17,400 --> 01:18:21,980 with making the lac operon, which means that the mutant 1300 01:18:21,980 --> 01:18:23,729 that appeared had a 5% advantage, 1301 01:18:23,729 --> 01:18:26,020 and so it was able to spread throughout the population. 1302 01:18:29,700 --> 01:18:37,240 Meanwhile, what they could see is that the evolved lacZ activity 1303 01:18:37,240 --> 01:18:40,790 indeed was different, depending on how much lactose 1304 01:18:40,790 --> 01:18:42,340 they had in the culture. 1305 01:18:42,340 --> 01:18:43,850 And this is in the presence of IPTG, 1306 01:18:43,850 --> 01:18:47,128 so they removed that feedback loop. 1307 01:18:53,160 --> 01:18:56,670 And in these experiments, any way you slice it, 1308 01:18:56,670 --> 01:19:03,000 the normalized lacZ activity did not go above around 1.2 or 1.3. 1309 01:19:03,000 --> 01:19:05,820 So there is some non-linearity that is somehow 1310 01:19:05,820 --> 01:19:07,810 constraining those cells from going up 1311 01:19:07,810 --> 01:19:12,776 to increased expression very much beyond the wild type. 1312 01:19:18,620 --> 01:19:22,890 We are out of time, but on Tuesday, we'll 1313 01:19:22,890 --> 01:19:25,470 start talking about evolution, and in particular, 1314 01:19:25,470 --> 01:19:26,970 in the context of neutral evolution, 1315 01:19:26,970 --> 01:19:29,570 as kind of a null model to try to understand these dynamics. 1316 01:19:29,570 --> 01:19:32,170 And we will also talk more about why 1317 01:19:32,170 --> 01:19:34,999 it takes as long as it does before you start 1318 01:19:34,999 --> 01:19:36,290 seeing anything happening here. 1319 01:19:36,290 --> 01:19:39,890 If you have any questions, please feel free to come on up.
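The 5% advantage of the lac-deletion mutant also gives a feel for how long a sweep takes. Here is a minimal sketch under a few assumptions. The selective advantage s is taken as a constant per generation, so the mutant:wild-type ratio grows by a factor (1 + s) each generation. Drift, clonal interference, and the waiting time for the mutation to arise are all ignored. The starting frequency of 1e-7 is a hypothetical value, not taken from the paper.

```python
import math

def generations_to_reach(p_target, p0, s):
    """Generations for a mutant with per-generation advantage s to go from p0 to p_target.

    Deterministic approximation: the mutant:wild-type ratio grows by a factor (1 + s)
    each generation. Ignores drift, clonal interference, and the wait for the mutation.
    """
    ratio0 = p0 / (1 - p0)
    ratio1 = p_target / (1 - p_target)
    return math.log(ratio1 / ratio0) / math.log(1 + s)

if __name__ == "__main__":
    gens = generations_to_reach(0.5, p0=1e-7, s=0.05)   # p0 is a hypothetical starting frequency
    print(f"{gens:.0f} generations, about {gens / math.log2(100):.0f} days at 100x daily dilution")
```

Under these assumptions, a 5% advantage takes roughly 330 generations to go from 1e-7 to 50%. At about 6.6 generations per day, that is roughly 50 days.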