1 00:00:00,070 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high quality educational resources for free. 5 00:00:10,730 --> 00:00:13,340 To make a donation or view additional materials 6 00:00:13,340 --> 00:00:17,236 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,236 --> 00:00:17,861 at ocw.mit.edu. 8 00:00:20,782 --> 00:00:22,240 PROFESSOR: Today what we want to do 9 00:00:22,240 --> 00:00:24,660 is discuss various approaches that you 10 00:00:24,660 --> 00:00:27,360 might want to take towards trying to understand 11 00:00:27,360 --> 00:00:28,330 stochastic systems. 12 00:00:28,330 --> 00:00:31,085 In particular, how is it that we might model or simulate 13 00:00:31,085 --> 00:00:32,820 a stochastic system? 14 00:00:32,820 --> 00:00:35,870 Now, we will kind of continue our discussion of the master 15 00:00:35,870 --> 00:00:37,280 equation from last time. 16 00:00:37,280 --> 00:00:39,160 Hopefully now you've kind of thought about it a bit more 17 00:00:39,160 --> 00:00:40,410 in the context of the reading. 18 00:00:40,410 --> 00:00:43,280 And we'll discuss kind of what it means to be using the master 19 00:00:43,280 --> 00:00:45,717 equation and how to formulate the master equation for more 20 00:00:45,717 --> 00:00:47,800 complicated situations, for example, when you have 21 00:00:47,800 --> 00:00:50,240 more than one chemical species. 22 00:00:50,240 --> 00:00:53,020 And then we'll talk about the idea of this Gillespie method, 23 00:00:53,020 --> 00:00:57,730 which is an exact way to simulate stochastic systems, 24 00:00:57,730 --> 00:01:02,550 and it's both exact and computationally tractable as 25 00:01:02,550 --> 00:01:06,120 compared to what you might call various naive methods. 26 00:01:06,120 --> 00:01:09,040 And the Gillespie method is really sort of 27 00:01:09,040 --> 00:01:11,150 qualitatively different from the master equation 28 00:01:11,150 --> 00:01:12,860 because in the master equation, you're 29 00:01:12,860 --> 00:01:16,730 looking at the evolution of probability distributions 30 00:01:16,730 --> 00:01:19,800 across the system, whereas the Gillespie method is really 31 00:01:19,800 --> 00:01:23,670 a way to generate individual stochastic trajectories. 32 00:01:23,670 --> 00:01:26,060 So if you start with somehow similar initial conditions, 33 00:01:26,060 --> 00:01:28,430 then you can actually get-- you can 34 00:01:28,430 --> 00:01:30,781 get, for example, the probability distributions 35 00:01:30,781 --> 00:01:32,280 from the Gillespie method by running 36 00:01:32,280 --> 00:01:33,870 many individual trajectories. 37 00:01:33,870 --> 00:01:36,949 But it's kind of conceptually rather different 38 00:01:36,949 --> 00:01:38,740 because of this notion of whether you think 39 00:01:38,740 --> 00:01:40,323 about probabilities or you're thinking 40 00:01:40,323 --> 00:01:44,150 about individual instantiations of some stochastic trajectory. 41 00:01:44,150 --> 00:01:46,390 So we'll try to make sense of when you might 42 00:01:46,390 --> 00:01:47,990 want to use one or the other. 43 00:01:47,990 --> 00:01:50,400 And then finally we'll talk about this Fokker-Planck 44 00:01:50,400 --> 00:01:53,850 approximation, which, as the reading indicated, 45 00:01:53,850 --> 00:01:58,390 for intermediate ends, it's useful to make 46 00:01:58,390 --> 00:02:00,206 this kind of continuous approximation, 47 00:02:00,206 --> 00:02:01,830 and then you can get a lot of intuition 48 00:02:01,830 --> 00:02:04,710 from your knowledge about diffusion 49 00:02:04,710 --> 00:02:08,680 on effective [INAUDIBLE] landscapes. 50 00:02:08,680 --> 00:02:13,740 Are there any questions about this or administrative things 51 00:02:13,740 --> 00:02:17,020 before we get going? 52 00:02:17,020 --> 00:02:20,210 I just want to remind you that the midterm is indeed 53 00:02:20,210 --> 00:02:24,600 next Thursday evening, 7-9 PM. 54 00:02:24,600 --> 00:02:26,600 If you have a problem with that time, 55 00:02:26,600 --> 00:02:29,592 then you should have emailed [? Sarab. ?] 56 00:02:29,592 --> 00:02:31,050 And if you haven't emailed him yet, 57 00:02:31,050 --> 00:02:32,930 you should do it right now. 58 00:02:32,930 --> 00:02:33,790 And-- yes. 59 00:02:37,130 --> 00:02:37,630 All right. 60 00:02:37,630 --> 00:02:41,740 So let's think about the master equation a little bit more. 61 00:02:41,740 --> 00:02:44,459 Now before what we did is we thought about the simplest 62 00:02:44,459 --> 00:02:46,250 possible case of the master equation, which 63 00:02:46,250 --> 00:02:49,230 is, if you just have something being created 64 00:02:49,230 --> 00:02:52,710 at a constant rate and then being degraded at a rate that's 65 00:02:52,710 --> 00:02:56,600 proportional to the number of that chemical species. 66 00:02:56,600 --> 00:02:59,640 And I'm going to be using the nomenclature that's 67 00:02:59,640 --> 00:03:02,440 a little bit closer to what was in your reading, just 68 00:03:02,440 --> 00:03:03,810 for, hopefully, clarity. 69 00:03:03,810 --> 00:03:07,580 And I think that some of my choices from last lecture 70 00:03:07,580 --> 00:03:09,640 were maybe unfortunate. 71 00:03:09,640 --> 00:03:14,670 So here, this is, for example, m would be the number of mRNA, 72 00:03:14,670 --> 00:03:16,200 for example, in the cell. 73 00:03:16,200 --> 00:03:19,460 This is the rate of creation of the mRNA, 74 00:03:19,460 --> 00:03:23,800 and then the rate of degradation of the mRNA. 75 00:03:23,800 --> 00:03:26,450 So m is the number of mRNA. 76 00:03:29,820 --> 00:03:31,845 And if we want understand gene expression, 77 00:03:31,845 --> 00:03:33,720 we might include an equation for the protein, 78 00:03:33,720 --> 00:03:37,120 so we might have some p dot, where some Kp. 79 00:03:42,850 --> 00:03:44,550 Now-- oh, sorry. 80 00:03:44,550 --> 00:03:46,420 Again, I always do this. 81 00:03:46,420 --> 00:03:46,920 All right. 82 00:03:46,920 --> 00:03:48,890 So we're going to have this be an n dot. 83 00:03:52,030 --> 00:03:54,630 So now n is going to be the number of the protein. 84 00:04:00,790 --> 00:04:04,110 Now this really is kind of the simplest possible model 85 00:04:04,110 --> 00:04:06,630 that you might write down for gene expression that includes 86 00:04:06,630 --> 00:04:09,080 the mRNA and the protein. 87 00:04:09,080 --> 00:04:13,370 So there's no autoregulation of any sort. 88 00:04:13,370 --> 00:04:16,820 It's just that the mRNA is involved 89 00:04:16,820 --> 00:04:19,284 in increasing the protein, but then we 90 00:04:19,284 --> 00:04:20,950 have degradation of the protein as well. 91 00:04:23,510 --> 00:04:25,010 So what we want to do is kind of try 92 00:04:25,010 --> 00:04:27,760 to understand how to formulate the master equation here. 93 00:04:27,760 --> 00:04:29,250 But then also, we want to make sure 94 00:04:29,250 --> 00:04:31,541 that we understand what the master equation is actually 95 00:04:31,541 --> 00:04:34,430 telling us and how it might be used. 96 00:04:34,430 --> 00:04:38,020 So first of all, in this model, I want to know 97 00:04:38,020 --> 00:04:41,260 is there, in principle, protein bursts? 98 00:04:44,920 --> 00:04:51,580 So before we talked about the fact that in-- at least 99 00:04:51,580 --> 00:04:54,170 in [? Sunny's ?] paper that we read-- 100 00:04:54,170 --> 00:04:55,750 they could observe protein bursts, 101 00:04:55,750 --> 00:04:58,460 at least in those experiments in e Coli. 102 00:04:58,460 --> 00:05:06,160 Question is, should this model somehow exhibit protein bursts, 103 00:05:06,160 --> 00:05:09,130 and why or why not? 104 00:05:09,130 --> 00:05:11,770 I just want to see where we are on this. 105 00:05:16,570 --> 00:05:18,934 I think this is something that, depending 106 00:05:18,934 --> 00:05:20,350 on how you interpret the question, 107 00:05:20,350 --> 00:05:22,058 you might decide the answer is yes or no. 108 00:05:22,058 --> 00:05:25,229 But I'm curious-- I think it's worth discussing 109 00:05:25,229 --> 00:05:26,520 what the implications are here. 110 00:05:32,917 --> 00:05:34,500 And the relevant part of this is going 111 00:05:34,500 --> 00:05:36,125 to be the discussion afterwards, so I'd 112 00:05:36,125 --> 00:05:39,170 say don't worry too much about what you think right now. 113 00:05:39,170 --> 00:05:41,120 But I'm just curious. 114 00:05:41,120 --> 00:05:45,240 This model, does it include, somehow, protein bursts? 115 00:05:45,240 --> 00:05:45,990 Ready? 116 00:05:45,990 --> 00:05:49,130 Three, two, one. 117 00:05:49,130 --> 00:05:49,630 OK. 118 00:05:49,630 --> 00:05:53,110 So we got-- I'd say at least a majority of people 119 00:05:53,110 --> 00:05:53,790 are saying no. 120 00:05:56,061 --> 00:05:57,560 But then some people are saying yes. 121 00:05:57,560 --> 00:06:05,120 So can somebody volunteer why or why not? 122 00:06:05,120 --> 00:06:05,620 Yes? 123 00:06:05,620 --> 00:06:09,010 AUDIENCE: I think the difference is if we're-- are we using this 124 00:06:09,010 --> 00:06:12,820 in a continuous fashion or are we using this in a discrete 125 00:06:12,820 --> 00:06:16,004 fashion [INAUDIBLE]. 126 00:06:16,004 --> 00:06:16,670 PROFESSOR: Yeah. 127 00:06:16,670 --> 00:06:16,940 OK. 128 00:06:16,940 --> 00:06:17,210 All right. 129 00:06:17,210 --> 00:06:17,710 All right. 130 00:06:17,710 --> 00:06:21,690 So he's answered both possible sides of the argument. 131 00:06:21,690 --> 00:06:24,530 And the point here is that if you just 132 00:06:24,530 --> 00:06:26,470 simulate this from the standpoint-- 133 00:06:26,470 --> 00:06:29,250 certainly, for example, this continuous, 134 00:06:29,250 --> 00:06:31,720 this discrete-- so if you just simulate 135 00:06:31,720 --> 00:06:35,690 this as a deterministic pair of differential equations, 136 00:06:35,690 --> 00:06:37,940 then will there be bursts? 137 00:06:37,940 --> 00:06:40,280 No. 138 00:06:40,280 --> 00:06:43,510 Because everything is well-behaved here. 139 00:06:43,510 --> 00:06:46,770 On the other hand, if we go and we do a full Gillespie 140 00:06:46,770 --> 00:06:51,350 simulation of this pair of equations, then 141 00:06:51,350 --> 00:06:53,210 in the proper parameter regime, we actually 142 00:06:53,210 --> 00:06:58,690 will get protein bursts, which is, in some ways, 143 00:06:58,690 --> 00:07:02,210 weird, that depending upon the framework that you're 144 00:07:02,210 --> 00:07:03,730 going to be analyzing this in, you 145 00:07:03,730 --> 00:07:07,630 can get qualitatively different behaviors for things. 146 00:07:07,630 --> 00:07:13,270 But there's a sense here that the deterministic, continuous 147 00:07:13,270 --> 00:07:15,780 evolution of these quantities would be the average over many 148 00:07:15,780 --> 00:07:16,840 of these stochastic trajectories, 149 00:07:16,840 --> 00:07:18,130 and the stochastic ones do have bursts, 150 00:07:18,130 --> 00:07:19,921 but if you average over many, many of them, 151 00:07:19,921 --> 00:07:23,910 then you end up getting some well-behaved pair of equations. 152 00:07:23,910 --> 00:07:26,290 So we'll kind of try to make sense of this more later on. 153 00:07:26,290 --> 00:07:29,560 But I think this just highlights that you 154 00:07:29,560 --> 00:07:31,800 can get really qualitatively different behaviors 155 00:07:31,800 --> 00:07:33,690 for the same set of equations depending 156 00:07:33,690 --> 00:07:35,770 upon what you're looking at. 157 00:07:38,460 --> 00:07:41,990 And these protein bursts can be dramatic events, right, 158 00:07:41,990 --> 00:07:45,900 where the protein number pops up by a lot. 159 00:07:45,900 --> 00:07:47,510 So this really, then, if you look 160 00:07:47,510 --> 00:07:48,570 at the individual trajectories here, 161 00:07:48,570 --> 00:07:50,460 they would look very different whether you 162 00:07:50,460 --> 00:07:52,170 were doing kind of a stochastic treatment 163 00:07:52,170 --> 00:07:53,211 or the deterministic one. 164 00:07:56,040 --> 00:07:58,600 Can somebody remind us the situation 165 00:07:58,600 --> 00:08:05,250 in which we get protein bursts in the stochastic model? 166 00:08:05,250 --> 00:08:08,390 In particular, will we always get these discrete protein 167 00:08:08,390 --> 00:08:09,285 bursts? 168 00:08:09,285 --> 00:08:11,540 Or what determines the size of a protein burst? 169 00:08:14,921 --> 00:08:15,420 Yes. 170 00:08:15,420 --> 00:08:17,128 AUDIENCE: Does it have to do with the lag 171 00:08:17,128 --> 00:08:19,704 time between when the mRNA is created [INAUDIBLE]? 172 00:08:23,987 --> 00:08:24,570 PROFESSOR: OK. 173 00:08:24,570 --> 00:08:25,069 Right. 174 00:08:25,069 --> 00:08:28,850 So there's a lag time between the time that mRNA is created, 175 00:08:28,850 --> 00:08:31,040 and then the next thing would be-- 176 00:08:31,040 --> 00:08:32,908 AUDIENCE: When the protein [INAUDIBLE]. 177 00:08:35,667 --> 00:08:37,750 PROFESSOR: When the protein is [? totaled-- ?] OK. 178 00:08:37,750 --> 00:08:40,929 So there are multiple time scales, right? 179 00:08:40,929 --> 00:08:43,150 So after an mRNA is created, and that's 180 00:08:43,150 --> 00:08:48,210 through this process here-- so now out pops an mRNA-- 181 00:08:48,210 --> 00:08:51,430 now there are multiple time scales. 182 00:08:51,430 --> 00:08:54,870 There's the time scale for mRNA degradation. 183 00:08:54,870 --> 00:08:58,050 That goes as 1 over gamma m. 184 00:08:58,050 --> 00:09:00,800 There's a time scale for protein degradation 185 00:09:00,800 --> 00:09:02,380 after a protein is made. 186 00:09:02,380 --> 00:09:04,590 That goes as 1 over gamma p. 187 00:09:04,590 --> 00:09:07,120 But then there's also a time scale associated 188 00:09:07,120 --> 00:09:09,530 with kind of the rate of protein production 189 00:09:09,530 --> 00:09:14,010 from each of those mRNAs, and that's determined by Kp. 190 00:09:14,010 --> 00:09:16,997 So we get big protein bursts if what? 191 00:09:19,840 --> 00:09:22,620 What determines the size of these protein bursts? 192 00:09:26,593 --> 00:09:27,093 Yes. 193 00:09:27,093 --> 00:09:27,968 AUDIENCE: [INAUDIBLE] 194 00:09:39,172 --> 00:09:39,880 PROFESSOR: Right. 195 00:09:39,880 --> 00:09:40,796 It's always confusing. 196 00:09:40,796 --> 00:09:42,100 We talk about times. 197 00:09:42,100 --> 00:09:44,270 But in particular, we have protein bursts 198 00:09:44,270 --> 00:09:48,630 in the stochastic situation if we do a stochastic simulation. 199 00:09:48,630 --> 00:09:54,360 And that's in the regime if Kp, the rate 200 00:09:54,360 --> 00:09:56,370 of protein synthesis from the mRNA 201 00:09:56,370 --> 00:09:59,530 is somehow much larger than this gamma m. 202 00:10:04,850 --> 00:10:06,060 Have I screwed up? 203 00:10:06,060 --> 00:10:07,540 Yes. 204 00:10:07,540 --> 00:10:10,305 AUDIENCE: So this is also-- in the sense 205 00:10:10,305 --> 00:10:12,643 of being different from the deterministic equations, 206 00:10:12,643 --> 00:10:17,824 we probably also want the total number of mRNAs [INAUDIBLE]. 207 00:10:17,824 --> 00:10:18,770 Is that sort of-- 208 00:10:18,770 --> 00:10:19,500 PROFESSOR: Right. 209 00:10:19,500 --> 00:10:22,400 Yeah, I think that it-- and the question of what 210 00:10:22,400 --> 00:10:23,850 mRNA number you need. 211 00:10:23,850 --> 00:10:26,100 I mean, it depends on what you mean by protein bursts. 212 00:10:26,100 --> 00:10:29,960 I would say that so long as this is true, 213 00:10:29,960 --> 00:10:33,110 what that means is that each mRNA will, indeed, kind of lead 214 00:10:33,110 --> 00:10:36,090 to a burst of proteins being made, where the burst is, 215 00:10:36,090 --> 00:10:39,120 again, geometrically distributed with some-- 216 00:10:39,120 --> 00:10:41,970 now there's another question, which is, are those protein 217 00:10:41,970 --> 00:10:44,790 bursts kind of large compared to the steady state protein 218 00:10:44,790 --> 00:10:46,670 concentration? 219 00:10:46,670 --> 00:10:51,390 And that's going to depend upon Km and gamma p as well. 220 00:10:51,390 --> 00:10:52,125 Is that-- 221 00:10:52,125 --> 00:10:52,750 AUDIENCE: Yeah. 222 00:10:52,750 --> 00:10:56,670 So I guess [INAUDIBLE] which is, I guess it would also 223 00:10:56,670 --> 00:10:58,187 depend on how big [INAUDIBLE]. 224 00:10:58,187 --> 00:11:00,770 PROFESSOR: All right, well-- and you're saying time resolution 225 00:11:00,770 --> 00:11:02,336 in terms of just measuring-- 226 00:11:02,336 --> 00:11:03,460 AUDIENCE: Yeah. [INAUDIBLE] 227 00:11:03,460 --> 00:11:04,070 PROFESSOR: Yeah. 228 00:11:04,070 --> 00:11:05,430 Well, OK, but right now we're kind 229 00:11:05,430 --> 00:11:07,388 of imagining that we live in this perfect world 230 00:11:07,388 --> 00:11:09,480 where we know at every moment of time 231 00:11:09,480 --> 00:11:11,220 exactly how many of everything there is. 232 00:11:11,220 --> 00:11:13,580 So in some ways we haven't really said anything yet 233 00:11:13,580 --> 00:11:14,540 about time resolution. 234 00:11:14,540 --> 00:11:17,750 We're assuming that our time resolution and our number 235 00:11:17,750 --> 00:11:20,454 resolution is actually perfect. 236 00:11:20,454 --> 00:11:22,620 But still, depending upon the regime that you're in, 237 00:11:22,620 --> 00:11:25,640 the protein numbers could look something 238 00:11:25,640 --> 00:11:30,790 like-- so if you look at the protein number, which 239 00:11:30,790 --> 00:11:36,150 is defined as this n as a function of time, then 240 00:11:36,150 --> 00:11:38,820 in one regime, you're going to see where it's kind of low. 241 00:11:38,820 --> 00:11:42,590 You get a big burst and then it kind of comes down, and then 242 00:11:42,590 --> 00:11:45,276 a big burst, and then it kind of comes down, and burst, 243 00:11:45,276 --> 00:11:46,650 and it kind of comes down, right? 244 00:11:46,650 --> 00:11:50,710 So this is in the regime where you have infrequent mRNAs being 245 00:11:50,710 --> 00:11:57,220 produced, and then large size bursts from each mRNA. 246 00:11:57,220 --> 00:11:59,510 And then you kind of get this effective degradation 247 00:11:59,510 --> 00:12:02,740 or dilution of the protein numbers over time. 248 00:12:02,740 --> 00:12:07,700 And this distribution, if you take a histogram of it, 249 00:12:07,700 --> 00:12:08,335 is what? 250 00:12:18,390 --> 00:12:20,332 AUDIENCE: [INAUDIBLE] 251 00:12:20,332 --> 00:12:21,040 PROFESSOR: Right. 252 00:12:21,040 --> 00:12:24,940 So I'm imagining that we look at this for a long period of time. 253 00:12:24,940 --> 00:12:27,370 And then we come over here and we histogram it. 254 00:12:27,370 --> 00:12:30,020 So now we come over here, we turn to the left, 255 00:12:30,020 --> 00:12:35,200 we say number has a function of-- this is number n. 256 00:12:35,200 --> 00:12:38,490 The frequency that we observe, some number of proteins. 257 00:12:38,490 --> 00:12:41,320 So frequency. 258 00:12:41,320 --> 00:12:43,145 And this is going to do something. 259 00:12:47,120 --> 00:12:50,050 So what about-- it may not be a beautiful drawing, 260 00:12:50,050 --> 00:12:52,690 but you're supposed to know the answer. 261 00:12:52,690 --> 00:12:54,870 I'm trying to review things for you because I 262 00:12:54,870 --> 00:12:56,760 hear that you have a big exam coming up, 263 00:12:56,760 --> 00:12:58,010 and I want to make sure that-- 264 00:13:01,565 --> 00:13:02,065 Gamma. 265 00:13:02,065 --> 00:13:03,470 It's a gamma, right? 266 00:13:03,470 --> 00:13:07,520 So this is what we learned earlier. 267 00:13:07,520 --> 00:13:09,250 So this is a gamma distribution. 268 00:13:09,250 --> 00:13:13,375 And you should know what this gamma distribution looks like. 269 00:13:13,375 --> 00:13:15,420 In particular, there are these two parameters 270 00:13:15,420 --> 00:13:16,961 that describe this gamma distribution 271 00:13:16,961 --> 00:13:19,500 as a function of underlying parameters in the model. 272 00:13:19,500 --> 00:13:20,456 AUDIENCE: [INAUDIBLE] 273 00:13:25,240 --> 00:13:26,840 PROFESSOR: Maybe. 274 00:13:26,840 --> 00:13:31,590 I don't want to get too much into this because, well, 275 00:13:31,590 --> 00:13:34,960 on Thursday we spent a long time talking about it. 276 00:13:34,960 --> 00:13:36,960 Once we get going, we'll spend another long time 277 00:13:36,960 --> 00:13:38,250 taling about it again. 278 00:13:38,250 --> 00:13:43,920 But you should review your notes from Thursday before the exam. 279 00:13:47,150 --> 00:13:48,670 So this thing is gamma distributed. 280 00:13:48,670 --> 00:13:51,710 And if we looked at the mRNA number as a function of time 281 00:13:51,710 --> 00:13:54,990 and we did a histogram of that, the mRNA distribution 282 00:13:54,990 --> 00:13:57,320 would be what? 283 00:13:57,320 --> 00:13:59,290 It's poisson. 284 00:13:59,290 --> 00:14:02,989 So it's important to remember that just because I tell you 285 00:14:02,989 --> 00:14:04,780 that a protein number is gamma distributed, 286 00:14:04,780 --> 00:14:07,160 that doesn't immediately tell you exactly what you should 287 00:14:07,160 --> 00:14:10,660 be expecting for the distribution of, say, 288 00:14:10,660 --> 00:14:12,520 the number of protein as a function of time. 289 00:14:12,520 --> 00:14:14,490 I mean, there are many different things 290 00:14:14,490 --> 00:14:17,090 I could plot over here that would all kind of come down 291 00:14:17,090 --> 00:14:19,645 to a gamma distribution over here. 292 00:14:19,645 --> 00:14:21,020 So it's important to kind of keep 293 00:14:21,020 --> 00:14:23,350 in mind the different representations that you might 294 00:14:23,350 --> 00:14:24,558 want to think about the data. 295 00:14:33,075 --> 00:14:35,700 So what we want to do now is we want to think a little bit more 296 00:14:35,700 --> 00:14:39,400 about this master equation in the context of if we're 297 00:14:39,400 --> 00:14:41,980 going to divide it up into these states. 298 00:14:41,980 --> 00:14:44,570 Now I would say that any time that you 299 00:14:44,570 --> 00:14:49,240 are asked to write down the master equation for something-- 300 00:14:49,240 --> 00:14:57,990 so now how many equations will the master equation-- I 301 00:14:57,990 --> 00:15:00,730 say master equation, but there is really more than one, maybe. 302 00:15:00,730 --> 00:15:04,800 So how many equations will be involved in the master equation 303 00:15:04,800 --> 00:15:06,260 kind of description of this model? 304 00:15:12,440 --> 00:15:13,320 Infinitely many. 305 00:15:13,320 --> 00:15:15,111 But there were infinitely many already when 306 00:15:15,111 --> 00:15:19,510 we had just one, when we just had the mRNA distribution. 307 00:15:19,510 --> 00:15:23,420 Well, you know, infinite times infinite is still infinite. 308 00:15:23,420 --> 00:15:28,160 So long as it's a countably infinite number. 309 00:15:28,160 --> 00:15:31,490 But yeah, but it's still infinite, always. 310 00:15:31,490 --> 00:15:32,430 All right. 311 00:15:32,430 --> 00:15:34,800 So what we want to do is divide up the states. 312 00:15:34,800 --> 00:15:38,580 So when somebody asks you for-- the equation's describing how 313 00:15:38,580 --> 00:15:40,410 those probabilities are going to vary, 314 00:15:40,410 --> 00:15:43,260 really what we're interested in is some derivative with respect 315 00:15:43,260 --> 00:15:47,410 to time of some probabilities described by m,n. 316 00:15:47,410 --> 00:15:52,220 We want to know the derivative with respect to time for all 317 00:15:52,220 --> 00:15:53,140 m,n's. 318 00:15:53,140 --> 00:15:56,470 So that's why there are infinite number, because m goes in one 319 00:15:56,470 --> 00:15:58,400 direction, n goes in another. 320 00:15:58,400 --> 00:15:59,560 Lots of them, OK? 321 00:15:59,560 --> 00:16:04,680 Now it's always tempting to just write down this derivative 322 00:16:04,680 --> 00:16:08,600 and then just write down the equation. 323 00:16:08,600 --> 00:16:10,270 If you do that, that's fine, but I 324 00:16:10,270 --> 00:16:11,930 would recommend that in general what 325 00:16:11,930 --> 00:16:14,540 you do is you try to write a little chart out 326 00:16:14,540 --> 00:16:18,280 to keep track of what directions things can go. 327 00:16:18,280 --> 00:16:22,220 So for example, here we have the probability of being the m,n 328 00:16:22,220 --> 00:16:25,000 state. 329 00:16:25,000 --> 00:16:27,180 Now there's going to be ways of going here. 330 00:16:27,180 --> 00:16:30,165 And this is going to be going probability of being an m plus 331 00:16:30,165 --> 00:16:30,665 1,n. 332 00:16:38,006 --> 00:16:40,000 What I'm going to do is I'm going to give you 333 00:16:40,000 --> 00:16:41,330 just a couple minutes. 334 00:16:41,330 --> 00:16:45,590 And in two minutes, I want you to try to write down as many 335 00:16:45,590 --> 00:16:50,240 of the rates, the f's and n's that correspond 336 00:16:50,240 --> 00:16:51,472 to all these transitions. 337 00:16:51,472 --> 00:16:53,430 You may not be able to get through all of them, 338 00:16:53,430 --> 00:16:56,590 but if you don't try to figure out some of them, 339 00:16:56,590 --> 00:16:58,404 then you're going to have trouble doing it 340 00:16:58,404 --> 00:16:59,070 at a later date. 341 00:17:14,819 --> 00:17:18,230 Do you understand what I'm asking you to do? 342 00:17:18,230 --> 00:17:19,819 So next to each one of these arrows, 343 00:17:19,819 --> 00:17:20,944 you should write something. 344 00:17:24,480 --> 00:17:26,290 So I'll give you two minutes to kind of do 345 00:17:26,290 --> 00:17:28,617 your best of writing these things down. 346 00:19:38,410 --> 00:19:38,910 All right. 347 00:19:38,910 --> 00:19:44,660 Why don't we reconvene, and we'll see how we are? 348 00:19:44,660 --> 00:19:49,290 So this is very similar to what we did on Thursday. 349 00:19:49,290 --> 00:19:51,195 We have to remember that m's are the mRNAs, 350 00:19:51,195 --> 00:19:54,750 and this is what we solved before, 351 00:19:54,750 --> 00:19:58,430 where it's just a long row. 352 00:19:58,430 --> 00:20:03,910 Now first of all, the mRNA distributions and the rates, 353 00:20:03,910 --> 00:20:08,607 do they depend on the protein numbers? 354 00:20:08,607 --> 00:20:09,950 No. 355 00:20:09,950 --> 00:20:12,250 So what that mean about, say, this arrow as 356 00:20:12,250 --> 00:20:15,980 compared to the arrow that would be down here? 357 00:20:15,980 --> 00:20:18,550 It's going to be the same, because n does not 358 00:20:18,550 --> 00:20:21,310 appear in that equation describing mRNA. 359 00:20:21,310 --> 00:20:23,630 If we had autoregulation of some sort, then it would. 360 00:20:26,550 --> 00:20:28,641 So let's go through. 361 00:20:28,641 --> 00:20:29,140 All right. 362 00:20:29,140 --> 00:20:30,681 What we're going to do is we're going 363 00:20:30,681 --> 00:20:31,880 to do a verbal yelling out. 364 00:20:31,880 --> 00:20:33,055 OK, ready. 365 00:20:33,055 --> 00:20:33,555 This arrow. 366 00:20:33,555 --> 00:20:34,770 AUDIENCE: Km. 367 00:20:38,100 --> 00:20:39,655 PROFESSOR: This one here is, 3,2,1-- 368 00:20:39,655 --> 00:20:40,360 AUDIENCE: Km. 369 00:20:40,360 --> 00:20:41,060 PROFESSOR: Km. 370 00:20:41,060 --> 00:20:42,720 All right. 371 00:20:42,720 --> 00:20:43,890 All right. 372 00:20:43,890 --> 00:20:46,754 Ready, 3, 2, 1. 373 00:20:46,754 --> 00:20:48,080 AUDIENCE: Gamma m times m. 374 00:20:48,080 --> 00:20:51,300 PROFESSOR: Gamma m times m. 375 00:20:51,300 --> 00:20:52,880 3, 2, 1. 376 00:20:52,880 --> 00:20:54,810 AUDIENCE: Gamma n times m plus 1. 377 00:20:54,810 --> 00:20:57,720 PROFESSOR: Gamma m times m plus 1. 378 00:20:57,720 --> 00:21:00,910 Now remember that there are more mRNA over here then there 379 00:21:00,910 --> 00:21:03,050 are here, which means that the rate of degradation 380 00:21:03,050 --> 00:21:03,633 will increase. 381 00:21:06,900 --> 00:21:09,040 Now coming here, now this is talking 382 00:21:09,040 --> 00:21:10,710 about the creation and destruction 383 00:21:10,710 --> 00:21:13,326 of the proteins, changes in n. 384 00:21:13,326 --> 00:21:14,450 All right, this arrow here. 385 00:21:14,450 --> 00:21:16,080 Ready, 3, 2, 1. 386 00:21:16,080 --> 00:21:17,984 AUDIENCE: Kp times m. 387 00:21:17,984 --> 00:21:21,100 PROFESSOR: It's Kp times m. 388 00:21:21,100 --> 00:21:26,980 So this is the rate of creation, going from n minus 1 to n. 389 00:21:26,980 --> 00:21:28,070 That's fine. 390 00:21:28,070 --> 00:21:31,010 You know, I was looking at my notes from last year, 391 00:21:31,010 --> 00:21:34,440 and I got one of these things incorrect, so-- and then, 392 00:21:34,440 --> 00:21:35,030 OK, ready. 393 00:21:35,030 --> 00:21:38,440 This one here, 3, 2, 1. 394 00:21:38,440 --> 00:21:40,600 Kp times m. 395 00:21:40,600 --> 00:21:46,805 So here the same rate, and should we be surprised by that? 396 00:21:46,805 --> 00:21:48,430 So the number of proteins are changing, 397 00:21:48,430 --> 00:21:49,760 but here it's the number of mRNA that 398 00:21:49,760 --> 00:21:52,410 matters, because we're talking about the rate of translation, 399 00:21:52,410 --> 00:21:54,170 right? 400 00:21:54,170 --> 00:21:58,590 Now this one here, 3, 2, 1. 401 00:21:58,590 --> 00:22:00,720 Gamma p times n. 402 00:22:00,720 --> 00:22:03,204 And here, 3, 2, 1. 403 00:22:03,204 --> 00:22:04,860 AUDIENCE: Gamma p times n plus 1. 404 00:22:04,860 --> 00:22:06,590 PROFESSOR: Gamma p times n plus 1. 405 00:22:06,590 --> 00:22:07,220 All right. 406 00:22:07,220 --> 00:22:08,380 Perfect. 407 00:22:08,380 --> 00:22:11,350 Now this is, of course, as you can imagine, the simplest 408 00:22:11,350 --> 00:22:13,790 possible kind of set of equations that we 409 00:22:13,790 --> 00:22:15,170 could have written down. 410 00:22:15,170 --> 00:22:18,750 If you have other crazy things, you 411 00:22:18,750 --> 00:22:21,500 get different distributions, if you have autoregulation 412 00:22:21,500 --> 00:22:23,790 or if you have interactions of something 413 00:22:23,790 --> 00:22:26,940 with something else, or the same thing, so forth. 414 00:22:26,940 --> 00:22:30,190 But I think it's really very useful 415 00:22:30,190 --> 00:22:33,930 to kind of write this thing down to clarify your thinking 416 00:22:33,930 --> 00:22:35,220 in these problems. 417 00:22:35,220 --> 00:22:39,640 And then you can fill out-- for change of probability, 418 00:22:39,640 --> 00:22:40,640 you have mn. 419 00:22:40,640 --> 00:22:42,490 You come here and you just go around 420 00:22:42,490 --> 00:22:45,010 and you count, take all the arrows coming in, 421 00:22:45,010 --> 00:22:48,300 and those are ways of increasing your probability. 422 00:22:48,300 --> 00:22:52,260 And ways going out are ways of decreasing your probability. 423 00:22:52,260 --> 00:22:54,830 Now in all those cases you have to multiply these raw rates 424 00:22:54,830 --> 00:22:57,163 by the probabilities of being in all these other states. 425 00:23:06,220 --> 00:23:07,850 So can you use the master equation 426 00:23:07,850 --> 00:23:12,332 to get these probabilities if you're out of equilibrium, 427 00:23:12,332 --> 00:23:13,165 out of steady state? 428 00:23:16,776 --> 00:23:17,650 So that's a question. 429 00:23:17,650 --> 00:23:23,220 So the master equation useful out of steady state? 430 00:23:30,390 --> 00:23:32,540 Yes. 431 00:23:32,540 --> 00:23:33,350 Ready. 432 00:23:33,350 --> 00:23:35,425 3, 2, 1. 433 00:23:39,020 --> 00:23:39,520 All right. 434 00:23:39,520 --> 00:23:46,780 So we got a fair number of-- there is some disagreement, 435 00:23:46,780 --> 00:23:47,280 but yeah. 436 00:23:47,280 --> 00:23:49,360 So it actually-- the answer is yes. 437 00:23:49,360 --> 00:23:52,649 And that's because you can start with any distribution 438 00:23:52,649 --> 00:23:54,940 of probabilities across all the states that you'd like. 439 00:23:54,940 --> 00:23:57,231 It could be that all of the probabilities at one state. 440 00:23:57,231 --> 00:23:58,930 It could be however you like. 441 00:23:58,930 --> 00:24:00,310 And the master equation tells you 442 00:24:00,310 --> 00:24:01,934 about how that probability distribution 443 00:24:01,934 --> 00:24:04,800 will change over time. 444 00:24:04,800 --> 00:24:07,770 Now if you let that run forever, then you 445 00:24:07,770 --> 00:24:10,870 come to some equilibrium steady state. 446 00:24:10,870 --> 00:24:12,965 And that's a very interesting quantity, 447 00:24:12,965 --> 00:24:15,298 is the steady state distribution of these probabilities. 448 00:24:15,298 --> 00:24:18,450 But you can actually calculate from any initial distribution 449 00:24:18,450 --> 00:24:21,329 of probabilities evolving to any later time t what 450 00:24:21,329 --> 00:24:22,620 the probability would be later. 451 00:24:25,470 --> 00:24:28,950 This comes to another question here. 452 00:24:28,950 --> 00:24:29,450 All right. 453 00:24:29,450 --> 00:24:32,270 So let's imagine that at time t equal to 0, 454 00:24:32,270 --> 00:24:35,530 I tell you that there are m not mRNA and P 455 00:24:35,530 --> 00:24:41,056 not-- I always do this. 456 00:24:41,056 --> 00:24:46,780 I don't know, somehow my brain does not like this. 457 00:24:46,780 --> 00:24:48,770 Because the P's we want to be probabilities. 458 00:24:52,400 --> 00:24:54,670 We start with m not mRNA, n not protein. 459 00:24:58,050 --> 00:25:00,270 And maybe it's a complicated situation. 460 00:25:00,270 --> 00:25:01,925 We can't calculate this analytically. 461 00:25:01,925 --> 00:25:03,550 So what we do is we go to our computer, 462 00:25:03,550 --> 00:25:08,150 and we have it solve how this probability distribution will 463 00:25:08,150 --> 00:25:11,330 evolve so that time T equal to some time-- 464 00:25:11,330 --> 00:25:13,920 if we'd like we can say this is T1. 465 00:25:13,920 --> 00:25:17,141 I'll tell you, oh, the probability of having m and n 466 00:25:17,141 --> 00:25:19,390 mRNA and protein is going to be equal to something P1. 467 00:25:22,210 --> 00:25:25,450 Now the question is, let's say I then go 468 00:25:25,450 --> 00:25:27,090 and I do this simulation again. 469 00:25:30,110 --> 00:25:34,150 Now I calculate some other at time T1 again, 470 00:25:34,150 --> 00:25:36,420 the probability that you're in the m,n state. 471 00:25:36,420 --> 00:25:38,530 The question is, will you again get P1? 472 00:25:41,905 --> 00:25:43,030 So this is a question mark. 473 00:25:43,030 --> 00:25:46,770 And A is yes, B is no. 474 00:25:56,750 --> 00:25:57,250 All right. 475 00:25:57,250 --> 00:26:00,120 I'm going to give you 15 seconds. 476 00:26:00,120 --> 00:26:03,970 I think this is very important that you understand 477 00:26:03,970 --> 00:26:06,610 what the master equation is doing and what it is not doing. 478 00:26:15,884 --> 00:26:18,850 AUDIENCE: [INAUDIBLE] 479 00:26:18,850 --> 00:26:21,040 PROFESSOR: I'm sorry, what's that? 480 00:26:21,040 --> 00:26:21,540 Right. 481 00:26:21,540 --> 00:26:22,040 OK. 482 00:26:22,040 --> 00:26:25,950 So I mean, this is just-- you know, 483 00:26:25,950 --> 00:26:32,979 you program in your computer to use the master 484 00:26:32,979 --> 00:26:34,770 equation to solve how the probabilities are 485 00:26:34,770 --> 00:26:36,150 going to evolve. 486 00:26:36,150 --> 00:26:38,760 I'm just telling you, start with some initial distribution. 487 00:26:38,760 --> 00:26:40,725 And if you do it once, it says, oh, 488 00:26:40,725 --> 00:26:42,100 the probability that you're going 489 00:26:42,100 --> 00:26:46,250 to have m-- this time you're going to have mRNA proteins is 490 00:26:46,250 --> 00:26:49,550 going to be P1, so it's 10%. 491 00:26:49,550 --> 00:26:50,460 Great. 492 00:26:50,460 --> 00:26:53,350 Now I'm asking just, if you go back and do it again, 493 00:26:53,350 --> 00:26:57,510 will you again get 10%, or is this output stochastic? 494 00:27:04,990 --> 00:27:08,080 It's OK that if you're confused by this distinction. 495 00:27:08,080 --> 00:27:09,830 I think that it's easy to get confused by, 496 00:27:09,830 --> 00:27:11,420 which is why I'm doing this. 497 00:27:11,420 --> 00:27:13,250 But let's just see where we are. 498 00:27:13,250 --> 00:27:14,020 Ready? 499 00:27:14,020 --> 00:27:16,180 3, 2, 1. 500 00:27:18,150 --> 00:27:18,650 All right. 501 00:27:18,650 --> 00:27:20,880 So I'd say a majority again. 502 00:27:20,880 --> 00:27:24,320 We're kind of at the 80-20, 75-25. 503 00:27:24,320 --> 00:27:27,090 A majority here are saying that, yes, you 504 00:27:27,090 --> 00:27:30,270 will get the same probability. 505 00:27:30,270 --> 00:27:34,020 And this is very important that we understand 506 00:27:34,020 --> 00:27:37,520 kind of where this where the stochasticity is somehow 507 00:27:37,520 --> 00:27:39,640 embedded in these different representations 508 00:27:39,640 --> 00:27:40,560 of these modelings. 509 00:27:44,500 --> 00:27:48,210 The master equation is a set of differential equations telling 510 00:27:48,210 --> 00:27:50,720 you about how the probabilities change over time 511 00:27:50,720 --> 00:27:52,800 given some initial conditions. 512 00:27:52,800 --> 00:27:56,580 Now we're using these things to calculate 513 00:27:56,580 --> 00:27:58,239 the evolution of some random process, 514 00:27:58,239 --> 00:28:00,655 but the probabilities themselves evolve deterministically. 515 00:28:06,090 --> 00:28:09,080 So what that means is that although these things are 516 00:28:09,080 --> 00:28:12,232 probabilities, if you start somewhere 517 00:28:12,232 --> 00:28:13,940 and you use the master equation to solve, 518 00:28:13,940 --> 00:28:16,520 you get the same thing every time you do it. 519 00:28:16,520 --> 00:28:20,045 Now this is not true for the Gillespie simulation, 520 00:28:20,045 --> 00:28:23,160 because that, you're looking at an individual trajectory. 521 00:28:23,160 --> 00:28:26,830 An individual trajectory, then the stochasticity 522 00:28:26,830 --> 00:28:30,340 is embedded in that trajectory itself, 523 00:28:30,340 --> 00:28:33,322 whereas in the master equation, the stochasticity arises 524 00:28:33,322 --> 00:28:35,530 because these are probabilities that are calculating, 525 00:28:35,530 --> 00:28:38,635 so any individual instantiation will be probabilistic 526 00:28:38,635 --> 00:28:41,010 because you are sampling from those different probability 527 00:28:41,010 --> 00:28:41,593 distributions. 528 00:28:44,750 --> 00:28:47,350 Now this is, I think, a sufficiently important point 529 00:28:47,350 --> 00:28:49,820 that if there are questions about it, 530 00:28:49,820 --> 00:28:52,260 we should talk about it. 531 00:28:56,071 --> 00:28:56,571 Yeah. 532 00:28:56,571 --> 00:29:00,220 AUDIENCE: How do you make the simulations? 533 00:29:00,220 --> 00:29:02,480 Would you essentially-- can you take 534 00:29:02,480 --> 00:29:05,390 a sum over different Gillespie? 535 00:29:05,390 --> 00:29:07,085 PROFESSOR: So it's true that you can do 536 00:29:07,085 --> 00:29:09,100 a sum over different Gillespie. 537 00:29:09,100 --> 00:29:11,690 But we haven't yet told you about, 538 00:29:11,690 --> 00:29:14,760 what the Gillespie algorithm is, so I can't use that. 539 00:29:14,760 --> 00:29:20,250 But indeed, you can just use a standard solver 540 00:29:20,250 --> 00:29:23,740 of differential equations. 541 00:29:23,740 --> 00:29:25,410 So whatever program you use is going 542 00:29:25,410 --> 00:29:28,445 to have some way of doing this. 543 00:29:28,445 --> 00:29:30,320 And once you've written down these equations, 544 00:29:30,320 --> 00:29:32,903 the fact that these are actually probabilities doesn't matter. 545 00:29:32,903 --> 00:29:35,170 So those could have been something else. 546 00:29:35,170 --> 00:29:40,240 So this could be the number of eggs, whatever, right? 547 00:29:40,240 --> 00:29:43,115 So once you've gotten the equations, 548 00:29:43,115 --> 00:29:44,990 then equations just tell you how the problems 549 00:29:44,990 --> 00:29:47,176 are going to change over time. 550 00:29:47,176 --> 00:29:49,628 Yeah. 551 00:29:49,628 --> 00:29:51,336 AUDIENCE: Maybe this is a silly question, 552 00:29:51,336 --> 00:29:57,520 but in practice, do you have to assume all the probabilities 553 00:29:57,520 --> 00:29:59,940 are 0 above some number? 554 00:29:59,940 --> 00:30:05,510 PROFESSOR: Oh no, that's not at all a silly question, because-- 555 00:30:05,510 --> 00:30:06,480 AUDIENCE: [INAUDIBLE] 556 00:30:06,480 --> 00:30:07,271 PROFESSOR: Exactly. 557 00:30:07,271 --> 00:30:08,510 Right. 558 00:30:08,510 --> 00:30:12,020 And yes, it's a very good question. 559 00:30:12,020 --> 00:30:14,550 So I told you this is an infinite set of differential 560 00:30:14,550 --> 00:30:15,555 equations. 561 00:30:15,555 --> 00:30:18,180 But at the same time I told you this master equation's supposed 562 00:30:18,180 --> 00:30:21,080 to be useful for something, and kind of at the face of it, 563 00:30:21,080 --> 00:30:23,560 these are incompatible ideas. 564 00:30:23,560 --> 00:30:27,530 And the basic answer is that you have 565 00:30:27,530 --> 00:30:29,200 to include all the states where there 566 00:30:29,200 --> 00:30:33,195 is a sort of non-negligible probability. 567 00:30:35,902 --> 00:30:37,110 We could be concrete, though. 568 00:30:37,110 --> 00:30:41,390 So let's imagine that I tell you we want to look at the mRNA 569 00:30:41,390 --> 00:30:44,120 number here. 570 00:30:44,120 --> 00:30:50,550 And I tell you that OK, Km is equal to-- 571 00:30:50,550 --> 00:30:52,540 well, let me make sure. 572 00:30:52,540 --> 00:30:54,370 Gamma m. 573 00:30:54,370 --> 00:30:58,240 What are typical lifetimes of mRNAs in bacteria again? 574 00:30:58,240 --> 00:30:59,292 AUDIENCE: [INAUDIBLE] 575 00:30:59,292 --> 00:31:00,000 PROFESSOR: Right. 576 00:31:00,000 --> 00:31:00,790 Order a minute. 577 00:31:00,790 --> 00:31:09,790 So that means that-- let's say this is 0.5 minutes minus 1. 578 00:31:09,790 --> 00:31:12,595 To get a lifetime of around 2 minutes. 579 00:31:15,180 --> 00:31:19,770 And then let's imagine that this is then 50 per minute. 580 00:31:22,920 --> 00:31:26,870 So an mRNA is kind of made once a minute. 581 00:31:26,870 --> 00:31:28,000 There's 50 of them. 582 00:31:28,000 --> 00:31:29,604 That's a lot, but whatever. 583 00:31:29,604 --> 00:31:30,520 There are a few genes. 584 00:31:30,520 --> 00:31:31,850 Minute. 585 00:31:31,850 --> 00:31:33,520 I wanted the number to be something. 586 00:31:33,520 --> 00:31:37,620 So there's a fair rate of mRNA production. 587 00:31:37,620 --> 00:31:40,970 Now how many equations do you think 588 00:31:40,970 --> 00:31:44,590 you might need to simulate? 589 00:31:44,590 --> 00:31:45,997 So we'll think about this. 590 00:31:45,997 --> 00:31:48,330 First of all, does it depend upon the initial conditions 591 00:31:48,330 --> 00:31:48,829 or not? 592 00:31:51,295 --> 00:31:53,170 AUDIENCE: Maybe. 593 00:31:53,170 --> 00:31:54,200 PROFESSOR: Yeah. 594 00:31:54,200 --> 00:31:56,710 It does. 595 00:31:56,710 --> 00:31:59,540 So be careful. 596 00:31:59,540 --> 00:32:06,930 But let's say that I tell you that we start with 50 mRNA. 597 00:32:06,930 --> 00:32:08,620 The question is, how many equations 598 00:32:08,620 --> 00:32:10,370 do you think you might have to write down? 599 00:32:10,370 --> 00:32:13,041 And let's say we want to understand this once it 600 00:32:13,041 --> 00:32:14,290 gets to, say, the equilibrium. 601 00:32:17,320 --> 00:32:17,820 All right. 602 00:32:17,820 --> 00:32:18,695 Number of equations. 603 00:32:22,090 --> 00:32:25,010 Give me a moment to come up with some reasonable options. 604 00:32:45,430 --> 00:32:57,790 Well, these are-- let's say that this 605 00:32:57,790 --> 00:32:59,230 could show up on your homework. 606 00:32:59,230 --> 00:33:00,490 So the question is, how many equations 607 00:33:00,490 --> 00:33:02,614 are you going to program into your intersimulation? 608 00:33:05,170 --> 00:33:07,810 And it may be-- doesn't have to be exactly any of these number, 609 00:33:07,810 --> 00:33:09,200 but order. 610 00:33:13,520 --> 00:33:15,020 Do you guys understand the question? 611 00:33:19,830 --> 00:33:23,042 So we need a different equation for each 612 00:33:23,042 --> 00:33:24,000 of these probabilities. 613 00:33:24,000 --> 00:33:27,940 So in principle we have-- the master equation 614 00:33:27,940 --> 00:33:30,910 gives us an infinite number of equations. 615 00:33:30,910 --> 00:33:34,760 So we have d the probability of having 0 mRNA with respect 616 00:33:34,760 --> 00:33:37,170 to time. 617 00:33:37,170 --> 00:33:42,620 That's going to be-- any idea what this is going to be? 618 00:33:46,806 --> 00:33:48,302 AUDIENCE: [INAUDIBLE] 619 00:33:48,302 --> 00:33:49,010 PROFESSOR: Right. 620 00:33:49,010 --> 00:33:52,665 So we have a minus Km times what? 621 00:33:57,840 --> 00:33:59,980 Times p0, right. 622 00:33:59,980 --> 00:34:03,710 So this is because if we start out down here at P0. 623 00:34:03,710 --> 00:34:07,602 Now we have Km. 624 00:34:07,602 --> 00:34:09,185 So I was just about to violate my rule 625 00:34:09,185 --> 00:34:14,730 and just write down an equation without drawing this thing. 626 00:34:14,730 --> 00:34:16,084 So it's Km times p0. 627 00:34:16,084 --> 00:34:17,500 That's a way you lose probability, 628 00:34:17,500 --> 00:34:20,100 but you can also gain probability 629 00:34:20,100 --> 00:34:26,325 at a rate that goes as gamma m times P1. 630 00:34:31,630 --> 00:34:35,340 So that's how this probability is going to change over time. 631 00:34:35,340 --> 00:34:38,730 But we have a different equation for you for p1, for p2, for p3, 632 00:34:38,730 --> 00:34:44,060 for p4, all the way, in principle, to p1,683,000, 633 00:34:44,060 --> 00:34:45,949 bla bla bla, right? 634 00:34:45,949 --> 00:34:48,150 So that's problematic, because if we 635 00:34:48,150 --> 00:34:52,860 have to actually in our program code up 100,000,000 equations, 636 00:34:52,860 --> 00:34:55,190 or it could be worse. 637 00:34:55,190 --> 00:34:59,580 Then we're going have trouble with our computers. 638 00:34:59,580 --> 00:35:01,400 So you always have to have some notion 639 00:35:01,400 --> 00:35:03,790 of what you should be doing. 640 00:35:03,790 --> 00:35:07,480 And this also highlights that it's really important 641 00:35:07,480 --> 00:35:11,290 to have some intuitive notion of what's going on in your system 642 00:35:11,290 --> 00:35:13,230 before you go and you start programming, 643 00:35:13,230 --> 00:35:16,507 because in that case-- well, you're 644 00:35:16,507 --> 00:35:18,340 likely to write down something that's wrong. 645 00:35:18,340 --> 00:35:20,173 You won't know if you have the right answer, 646 00:35:20,173 --> 00:35:23,580 and you might do something that doesn't make any sense. 647 00:35:23,580 --> 00:35:25,769 So you have to have some notion of what 648 00:35:25,769 --> 00:35:27,560 the system should look like before you even 649 00:35:27,560 --> 00:35:29,502 start coding it. 650 00:35:33,880 --> 00:35:36,300 My question is, how many of these equations 651 00:35:36,300 --> 00:35:38,050 should we simulate? 652 00:35:50,590 --> 00:35:51,994 OK. 653 00:35:51,994 --> 00:35:53,160 Let's just see where we are. 654 00:35:53,160 --> 00:35:53,659 Ready. 655 00:35:53,659 --> 00:35:55,520 3, 2, 1. 656 00:35:58,300 --> 00:35:58,800 OK. 657 00:35:58,800 --> 00:36:02,190 So I'd say that we have, it's basically 658 00:36:02,190 --> 00:36:05,590 between C and D. Yeah. 659 00:36:05,590 --> 00:36:09,770 I would say some people are maybe more careful than I am. 660 00:36:09,770 --> 00:36:15,000 Can one of the Ds maybe defend why they're saying D? 661 00:36:20,195 --> 00:36:21,070 AUDIENCE: [INAUDIBLE] 662 00:36:26,120 --> 00:36:28,370 PROFESSOR: The mean is 100, and when you say-- I mean, 663 00:36:28,370 --> 00:36:30,411 I think that whatever you're thinking is correct, 664 00:36:30,411 --> 00:36:32,780 but I think that the words are a little dangerous. 665 00:36:32,780 --> 00:36:40,280 And why am I concerned about-- you said-- 666 00:36:40,280 --> 00:36:42,280 is the mean 100 for all time? 667 00:36:44,975 --> 00:36:47,264 AUDIENCE: [INAUDIBLE] and steady state. 668 00:36:47,264 --> 00:36:48,430 PROFESSOR: And steady state. 669 00:36:48,430 --> 00:36:48,929 Right. 670 00:36:48,929 --> 00:36:54,430 I think that was the-- for long times, the mean number of mRNA 671 00:36:54,430 --> 00:36:56,170 will, indeed, be 100. 672 00:36:56,170 --> 00:36:59,160 So the mean number of m, in this case, 673 00:36:59,160 --> 00:37:03,010 will be Km divided by gamma m, which 674 00:37:03,010 --> 00:37:04,980 is going to be equal to 50 divided by that. 675 00:37:04,980 --> 00:37:09,560 That gets us 100. 676 00:37:09,560 --> 00:37:12,050 Now will it be exactly 100? 677 00:37:12,050 --> 00:37:12,550 No. 678 00:37:12,550 --> 00:37:14,216 It's going to be 100 plus or minus what? 679 00:37:16,570 --> 00:37:17,890 Plus or minus 10. 680 00:37:17,890 --> 00:37:18,490 Right. 681 00:37:18,490 --> 00:37:21,462 Because this distribution at steady state is what? 682 00:37:21,462 --> 00:37:22,610 AUDIENCE: It's Poisson. 683 00:37:22,610 --> 00:37:23,700 PROFESSOR: It's Poisson. 684 00:37:23,700 --> 00:37:28,360 What's the variance of a Poisson distribution? 685 00:37:28,360 --> 00:37:29,360 It's equal to the mean. 686 00:37:29,360 --> 00:37:34,330 So for Poisson, the variance is equal to the mean. 687 00:37:37,380 --> 00:37:40,230 Variance is the square of the standard deviation, 688 00:37:40,230 --> 00:37:42,480 which means that this is going to be plus or minus 10. 689 00:37:45,800 --> 00:37:48,180 That's kind of the typical width of the distribution. 690 00:37:48,180 --> 00:37:51,350 So what it means is that at equilibrium, 691 00:37:51,350 --> 00:37:54,780 we're going to be at 100 and it's going to kind of look 692 00:37:54,780 --> 00:37:57,380 like this. 693 00:37:57,380 --> 00:38:04,270 So this might be 2 sigma, so this could be 20. 694 00:38:04,270 --> 00:38:05,850 But each of these is 10. 695 00:38:09,290 --> 00:38:10,990 So if you want to capture this, you 696 00:38:10,990 --> 00:38:12,560 might want to go out to a few sigma. 697 00:38:15,950 --> 00:38:17,830 So let's say you want to go out to 3 sigma, 698 00:38:17,830 --> 00:38:19,663 then you might want to get out to 130 maybe. 699 00:38:21,930 --> 00:38:23,750 So then, if you want to be more careful 700 00:38:23,750 --> 00:38:25,510 you go out to 140 or 150. 701 00:38:25,510 --> 00:38:27,510 But this thing is going to decay exponentially, 702 00:38:27,510 --> 00:38:30,920 so you don't need to go up TO 1,000, 703 00:38:30,920 --> 00:38:33,840 because the probability's going to be 0 0 0. 704 00:38:33,840 --> 00:38:36,690 Certain once you're at 200 you don't have to worry about it. 705 00:38:36,690 --> 00:38:38,290 But of course, you have to remember the initial condition 706 00:38:38,290 --> 00:38:39,260 we started at 50. 707 00:38:42,930 --> 00:38:45,411 So we started at this point, which means we definitely 708 00:38:45,411 --> 00:38:46,660 have to include that equation. 709 00:38:46,660 --> 00:38:48,990 Otherwise we're in trouble. 710 00:38:48,990 --> 00:38:54,860 Now how much do we have to go to below 50 Any-- 711 00:39:04,132 --> 00:39:07,090 AUDIENCE: My guess would be that it would be not much 712 00:39:07,090 --> 00:39:11,881 more than the [? few ?] times 5, because if it were already 713 00:39:11,881 --> 00:39:13,464 at equilibrium that would be the mean. 714 00:39:13,464 --> 00:39:15,047 But it's not, and so the driving force 715 00:39:15,047 --> 00:39:17,141 is still going to push it back to [INAUDIBLE]. 716 00:39:17,141 --> 00:39:18,140 PROFESSOR: That's right. 717 00:39:18,140 --> 00:39:20,190 So it's going to be a bias random 718 00:39:20,190 --> 00:39:22,460 walk here, where it's going to be sort of maybe 719 00:39:22,460 --> 00:39:25,180 twice as likely at each step to be moving right as to be moving 720 00:39:25,180 --> 00:39:26,059 left. 721 00:39:26,059 --> 00:39:27,850 That means it could very well go to 49, 48. 722 00:39:27,850 --> 00:39:30,249 But it's not really going to go below 40, say. 723 00:39:30,249 --> 00:39:32,040 Of course you have to quantify these things 724 00:39:32,040 --> 00:39:33,123 if you want to be careful. 725 00:39:33,123 --> 00:39:38,540 But certainly I would say going from, I don't know, 35 to 135 726 00:39:38,540 --> 00:39:40,180 would be fine with me. 727 00:39:40,180 --> 00:39:43,510 You would get full credit on your problem set. 728 00:39:43,510 --> 00:39:49,350 So we'll say-- I'm going to make this up-- from 35 to 135, 729 00:39:49,350 --> 00:39:51,900 134 just so it could be 100 equations. 730 00:39:51,900 --> 00:39:54,273 So I'd say I'd be fine with 100 equations. 731 00:39:57,100 --> 00:39:59,410 So you would simulate the change in the probabilities 732 00:39:59,410 --> 00:40:05,936 of P35 to P134, for example. 733 00:40:05,936 --> 00:40:07,810 So although in principle, the master equation 734 00:40:07,810 --> 00:40:10,870 specifies how the probabilities for an infinite number 735 00:40:10,870 --> 00:40:12,519 of equations are going to change, 736 00:40:12,519 --> 00:40:14,560 you only need to simulate a finite number of them 737 00:40:14,560 --> 00:40:16,700 depending upon the dynamics of your system. 738 00:40:19,250 --> 00:40:20,100 Yes. 739 00:40:20,100 --> 00:40:22,550 Thank you for the question, because it's a very important 740 00:40:22,550 --> 00:40:23,457 practical thing. 741 00:40:23,457 --> 00:40:27,245 AUDIENCE: So in practice, you don't 742 00:40:27,245 --> 00:40:30,756 know what the solution is, which is sort of why you would 743 00:40:30,756 --> 00:40:31,256 [INAUDIBLE]. 744 00:40:31,256 --> 00:40:35,210 Do you explain your range and see if the solution changes? 745 00:40:35,210 --> 00:40:37,350 PROFESSOR: So the question is, in this case, 746 00:40:37,350 --> 00:40:39,766 it's a little bit cheating because we already kind of knew 747 00:40:39,766 --> 00:40:40,980 the answer. 748 00:40:40,980 --> 00:40:43,370 We didn't know exactly how the time dependence 749 00:40:43,370 --> 00:40:44,410 was going to go. 750 00:40:44,410 --> 00:40:47,076 How is it that the mean is going to change over time on average? 751 00:40:50,905 --> 00:40:51,780 Exponentially, right? 752 00:40:51,780 --> 00:40:53,430 So on average you will start at 50. 753 00:40:53,430 --> 00:40:55,990 You exponentially relax to 100. 754 00:40:55,990 --> 00:40:59,280 But in many cases, we don't know so much about the system. 755 00:40:59,280 --> 00:41:01,310 And I'd say that what, in general, you can do 756 00:41:01,310 --> 00:41:04,060 is, you have to always specify a finite number of equations. 757 00:41:04,060 --> 00:41:06,210 But then what you can do is, you can put in, like, 758 00:41:06,210 --> 00:41:09,040 reflecting boundary conditions or so on the end, 759 00:41:09,040 --> 00:41:12,399 so you don't allow probability to escape. 760 00:41:12,399 --> 00:41:14,690 But then what you can do is you can run the simulation, 761 00:41:14,690 --> 00:41:17,596 and if you have some reasonable probability to any 762 00:41:17,596 --> 00:41:19,720 of your boundaries, then you know you're in trouble 763 00:41:19,720 --> 00:41:22,320 and you have to extend it from there. 764 00:41:22,320 --> 00:41:24,720 So you can look to say, oh, is it above 10 765 00:41:24,720 --> 00:41:27,067 to the minus 3 or 4, whatever. 766 00:41:27,067 --> 00:41:29,400 And then if it is, then you know you have to go further. 767 00:41:34,742 --> 00:41:36,700 Any other questions about how-- you're actually 768 00:41:36,700 --> 00:41:38,949 going to be doing simulations of this, so these 769 00:41:38,949 --> 00:41:40,240 are relevant questions for you. 770 00:41:46,880 --> 00:41:47,380 All right. 771 00:41:50,880 --> 00:41:54,490 So that's the master equation. 772 00:41:54,490 --> 00:41:56,870 But I'd say the key, key thing to remember 773 00:41:56,870 --> 00:41:59,360 is that it tells you how to calculate 774 00:41:59,360 --> 00:42:04,070 the deterministic evolution of the probability of these states 775 00:42:04,070 --> 00:42:06,300 given some potentially complicated set 776 00:42:06,300 --> 00:42:07,580 of interactions. 777 00:42:07,580 --> 00:42:11,060 Now, a rather orthogonal view to the master equation 778 00:42:11,060 --> 00:42:13,080 is to use the Gillespie algorithm, 779 00:42:13,080 --> 00:42:15,486 or in general, to do direct stochastic simulations 780 00:42:15,486 --> 00:42:16,610 of individual trajectories. 781 00:42:16,610 --> 00:42:17,110 Yeah. 782 00:42:17,110 --> 00:42:19,890 Question before we go. 783 00:42:19,890 --> 00:42:22,300 AUDIENCE: So if we just set it to 0, the probabilities 784 00:42:22,300 --> 00:42:23,264 outside the range we think we need, 785 00:42:23,264 --> 00:42:24,710 would we be losing probability? 786 00:42:26,905 --> 00:42:29,030 PROFESSOR: So the question is whether we're somehow 787 00:42:29,030 --> 00:42:29,821 losing probability. 788 00:42:29,821 --> 00:42:33,110 So what I was proposing before is that you always 789 00:42:33,110 --> 00:42:34,640 want probabilities to sum to 1. 790 00:42:34,640 --> 00:42:37,730 Otherwise it's not our probability 791 00:42:37,730 --> 00:42:40,690 and the mathematicians get upset. 792 00:42:40,690 --> 00:42:44,310 And the key thing there is that you 793 00:42:44,310 --> 00:42:48,380 want to start with-- you have to include all the states that 794 00:42:48,380 --> 00:42:52,050 have probability at the beginning. 795 00:42:52,050 --> 00:42:54,462 So in that sense, you're given an initial distribution, 796 00:42:54,462 --> 00:42:56,170 and you have to include all those states. 797 00:42:56,170 --> 00:42:58,896 Otherwise you're definitely going to do something funny. 798 00:42:58,896 --> 00:43:01,270 You start out with a normalized probability distribution. 799 00:43:01,270 --> 00:43:02,390 And then I guess what I was proposing 800 00:43:02,390 --> 00:43:04,570 is that you have a finite number of equations, 801 00:43:04,570 --> 00:43:06,870 but you don't let the probability leave 802 00:43:06,870 --> 00:43:10,280 or come in from those ends. 803 00:43:10,280 --> 00:43:12,010 And if you do that, then you will always 804 00:43:12,010 --> 00:43:14,470 have a normalized probability distribution. 805 00:43:14,470 --> 00:43:16,340 Of course, at the ends you've kind of 806 00:43:16,340 --> 00:43:20,434 violated the actual equations, and that's 807 00:43:20,434 --> 00:43:21,850 why you have to make sure that you 808 00:43:21,850 --> 00:43:23,700 don't have significant probability at any 809 00:43:23,700 --> 00:43:26,440 of your boundaries. 810 00:43:26,440 --> 00:43:28,890 Does that answer? 811 00:43:28,890 --> 00:43:29,390 Not quite? 812 00:43:29,390 --> 00:43:31,348 AUDIENCE: Because I'm wondering if [INAUDIBLE]. 813 00:43:37,112 --> 00:43:39,070 PROFESSOR: So I was not suggesting that you set 814 00:43:39,070 --> 00:43:40,630 the probabilities equal to 0. 815 00:43:40,630 --> 00:43:43,114 I was suggesting that you do what's kind of like what 816 00:43:43,114 --> 00:43:45,280 the equations actually here, which is that you don't 817 00:43:45,280 --> 00:43:46,920 allow any probability to leave. 818 00:43:46,920 --> 00:43:49,540 There's no probably flux on this edge. 819 00:43:49,540 --> 00:43:54,780 So for example, out at P134, I would just say, OK, well, 820 00:43:54,780 --> 00:43:57,500 here's the probability that you have 134 mRNA. 821 00:43:57,500 --> 00:44:01,350 And in principle there are these two arrows, 822 00:44:01,350 --> 00:44:04,580 but you can just get rid of them. 823 00:44:04,580 --> 00:44:09,900 So now any probability that enters here can only come back. 824 00:44:09,900 --> 00:44:11,750 And I've somehow violated my equations. 825 00:44:11,750 --> 00:44:15,020 But if P134 is essentially 0, then it doesn't matter. 826 00:44:22,120 --> 00:44:28,880 So instead of looking at these probabilities evolve kind of as 827 00:44:28,880 --> 00:44:31,520 a whole, we can instead look at individual trajectories, right? 828 00:44:31,520 --> 00:44:36,210 So the idea here is that if we start with the situation-- 829 00:44:36,210 --> 00:44:39,492 actually, we can take this thing here. 830 00:44:39,492 --> 00:44:41,700 So we know that at steady state it's going to be 100. 831 00:44:41,700 --> 00:44:43,180 Starts out at 50. 832 00:44:43,180 --> 00:44:45,560 And in this case, with the master equation you say, 833 00:44:45,560 --> 00:44:48,060 OK, well, you start out with all the probability right here. 834 00:44:48,060 --> 00:44:50,734 So you have kind of a delta function at 50. 835 00:44:50,734 --> 00:44:52,900 But then what happens is this thing kind of evolves, 836 00:44:52,900 --> 00:44:59,442 and over time this thing kind of spreads 837 00:44:59,442 --> 00:45:00,900 until you have something that looks 838 00:45:00,900 --> 00:45:05,690 like this, where you have a Poisson distribution centered 839 00:45:05,690 --> 00:45:06,959 around 100. 840 00:45:06,959 --> 00:45:08,500 And this Poisson distribution's going 841 00:45:08,500 --> 00:45:11,060 to be very close to a Gaussian, because you 842 00:45:11,060 --> 00:45:12,680 have a significant number. 843 00:45:12,680 --> 00:45:14,888 So the master equation tells you how this probability 844 00:45:14,888 --> 00:45:16,610 distribution evolves. 845 00:45:16,610 --> 00:45:23,440 Now this is the number m and this is kind of the frequency 846 00:45:23,440 --> 00:45:25,330 that you observe it. 847 00:45:25,330 --> 00:45:26,910 So we can also kind of flip things 848 00:45:26,910 --> 00:45:34,310 so we instead plot the number m on the y-axis. 849 00:45:34,310 --> 00:45:38,347 And we already said the deterministic equations 850 00:45:38,347 --> 00:45:39,180 will look like this. 851 00:45:39,180 --> 00:45:42,585 And the characteristic time scale for this is what? 852 00:45:49,670 --> 00:45:50,590 1 over mm, right? 853 00:45:50,590 --> 00:45:54,230 So this thing relaxes to the equilibrium, 854 00:45:54,230 --> 00:45:57,840 time scale determined by the degradation time of the mRNA. 855 00:45:57,840 --> 00:46:00,270 So these are things that should be really-- 856 00:46:00,270 --> 00:46:03,145 you want to be kind of drilled into your head, 857 00:46:03,145 --> 00:46:05,790 and I'm trying to drill, so you'll hear them again 858 00:46:05,790 --> 00:46:07,110 and again. 859 00:46:07,110 --> 00:46:11,880 Now the master equation, indeed, since everything's linear here, 860 00:46:11,880 --> 00:46:14,970 the expectation value over the probability distributions 861 00:46:14,970 --> 00:46:16,821 actually does behave like this. 862 00:46:16,821 --> 00:46:19,070 So the mean of the distributions as a function of time 863 00:46:19,070 --> 00:46:20,600 look like that. 864 00:46:20,600 --> 00:46:22,710 And in some ways, if we were to plot this, 865 00:46:22,710 --> 00:46:25,550 we would say, OK, well, first of all it's all here. 866 00:46:25,550 --> 00:46:29,180 Then it kind of looks like this. 867 00:46:29,180 --> 00:46:31,640 So this is somehow how those probability distributions are 868 00:46:31,640 --> 00:46:34,390 kind of expanding over time. 869 00:46:34,390 --> 00:46:36,120 Now for an individual trajectories, 870 00:46:36,120 --> 00:46:38,314 if we run a bunch of stochastic simulations, 871 00:46:38,314 --> 00:46:40,480 we'll get something that on average looks like this, 872 00:46:40,480 --> 00:46:44,440 but it might look like this. 873 00:46:44,440 --> 00:46:50,010 A different one might look like this, and so on, 874 00:46:50,010 --> 00:46:52,860 although they shouldn't converge there 875 00:46:52,860 --> 00:46:55,630 because that's not consistent. 876 00:46:55,630 --> 00:46:58,420 And if you did a histogram at all those different times 877 00:46:58,420 --> 00:47:00,170 of the individual stochastic trajectories, 878 00:47:00,170 --> 00:47:02,259 you should recover the probability distribution 879 00:47:02,259 --> 00:47:03,800 that you got for the master equation. 880 00:47:06,480 --> 00:47:08,860 So this is a powerful way just to make sure 881 00:47:08,860 --> 00:47:11,330 that, for example, your simulations are working, 882 00:47:11,330 --> 00:47:14,150 that you can check to make sure that everything behaves 883 00:47:14,150 --> 00:47:16,869 in a consistent way. 884 00:47:16,869 --> 00:47:18,410 Now there's a major question, though, 885 00:47:18,410 --> 00:47:20,020 of how is it that you should generate 886 00:47:20,020 --> 00:47:23,550 these stochastic trajectories? 887 00:47:23,550 --> 00:47:28,150 And the sort of most straightforward thing to do 888 00:47:28,150 --> 00:47:32,230 is to just divide up time into a bunch of little delta t's, 889 00:47:32,230 --> 00:47:34,450 and just ask whether anything happened. 890 00:47:34,450 --> 00:47:35,380 So let me-- 891 00:47:47,080 --> 00:47:54,870 So what we want to do is we want to imagine we have maybe 892 00:47:54,870 --> 00:47:56,400 m chemical species. 893 00:47:56,400 --> 00:47:58,390 So now these are different m's and n's. 894 00:47:58,390 --> 00:47:58,930 Be careful. 895 00:47:58,930 --> 00:48:03,030 m chemical species, they could be anything, could be proteins, 896 00:48:03,030 --> 00:48:06,400 they could be small molecules, something. 897 00:48:06,400 --> 00:48:09,690 And there are n possible reactions. 898 00:48:09,690 --> 00:48:11,630 And indeed, in some cases people want 899 00:48:11,630 --> 00:48:16,770 to study the stochastic dynamics of large networks. 900 00:48:16,770 --> 00:48:20,500 So you could have 50 chemical species 901 00:48:20,500 --> 00:48:23,665 and 300 different reactions. 902 00:48:23,665 --> 00:48:25,165 So this could be rather complicated. 903 00:48:27,820 --> 00:48:34,080 And these m chemical species have, we'll say, numbers 904 00:48:34,080 --> 00:48:38,030 or if you'd like, in some cases it could be concentrations, Xi, 905 00:48:38,030 --> 00:48:44,140 so then the whole thing can be described as some vector X. 906 00:48:44,140 --> 00:48:48,740 And the question is, how should we assimilate this? 907 00:48:48,740 --> 00:48:51,890 The so-called, what we often call the naive protocol-- 908 00:48:51,890 --> 00:48:55,500 and this is indeed what I did in graduate school 909 00:48:55,500 --> 00:48:59,400 because nobody told me that I wasn't supposed to do it-- 910 00:48:59,400 --> 00:49:03,170 is that you divide time into little time segments delta t. 911 00:49:09,550 --> 00:49:10,422 Small delta t. 912 00:49:10,422 --> 00:49:11,880 And you just do this over and over. 913 00:49:11,880 --> 00:49:16,440 And for each delta t you ask, did anything happen? 914 00:49:16,440 --> 00:49:18,130 If it did, then you update. 915 00:49:18,130 --> 00:49:20,450 If not, you keep on going. 916 00:49:20,450 --> 00:49:23,216 Now the problem with this approach-- well, 917 00:49:23,216 --> 00:49:24,841 what is the problem with this approach? 918 00:49:29,170 --> 00:49:31,224 AUDIENCE: [INAUDIBLE] 919 00:49:31,224 --> 00:49:31,890 PROFESSOR: Yeah. 920 00:49:31,890 --> 00:49:32,830 Time is continuous. 921 00:49:32,830 --> 00:49:39,050 So one problem is that, well, you don't like discrete time. 922 00:49:39,050 --> 00:49:41,820 That's understandable. 923 00:49:41,820 --> 00:49:45,470 But I'm going to say, well, you know, the details-- a delta t 924 00:49:45,470 --> 00:49:48,360 may be small, so you won't notice. 925 00:49:51,140 --> 00:49:53,181 I'm saying, if I said delta t being small, 926 00:49:53,181 --> 00:49:55,680 then I'm going to claim that you're not going to notice that 927 00:49:55,680 --> 00:49:56,704 I've-- 928 00:49:56,704 --> 00:49:57,880 AUDIENCE: [INAUDIBLE] 929 00:49:57,880 --> 00:49:59,310 PROFESSOR: But then the simulation is slow, right? 930 00:49:59,310 --> 00:50:01,080 So there's a fundamental trade-off here. 931 00:50:01,080 --> 00:50:03,500 And in particular, the problem with this protocol 932 00:50:03,500 --> 00:50:07,070 is that for it to behave reasonably, 933 00:50:07,070 --> 00:50:09,070 delta t has to be very small. 934 00:50:09,070 --> 00:50:12,610 And what do I mean by very small, though? 935 00:50:18,222 --> 00:50:19,097 AUDIENCE: [INAUDIBLE] 936 00:50:25,090 --> 00:50:27,910 PROFESSOR: That's right. 937 00:50:27,910 --> 00:50:33,210 For this to work, delta t has to be 938 00:50:33,210 --> 00:50:36,403 such that unlikely for anything to happen. 939 00:50:41,599 --> 00:50:43,390 But this is already a problem, because that 940 00:50:43,390 --> 00:50:45,570 means that we're doing a lot of simulations, 941 00:50:45,570 --> 00:50:47,510 and then just nothing's happening. 942 00:50:51,420 --> 00:50:55,710 How do we figure out what that probability is? 943 00:51:03,680 --> 00:51:07,600 So in particular, we can ask about-- well, 944 00:51:07,600 --> 00:51:15,370 given possible reactions, we'll say with rates rs of i. 945 00:51:15,370 --> 00:51:22,140 So the probability that the i'th reaction occurs is equal to r i 946 00:51:22,140 --> 00:51:30,600 times delta t for small delta t, because each of these reactions 947 00:51:30,600 --> 00:51:34,270 will occur kind of at a rate-- they're going to be exponential 948 00:51:34,270 --> 00:51:37,330 distributions of the times for them to occur. 949 00:51:37,330 --> 00:51:40,440 This is a Poisson process because it's random. 950 00:51:40,440 --> 00:51:44,960 Now what we want to know is the probability that nothing 951 00:51:44,960 --> 00:51:49,457 is going to happen because that's how we're 952 00:51:49,457 --> 00:51:50,540 going to have set delta t. 953 00:51:53,680 --> 00:51:55,460 Well, what we can imagine is, then 954 00:51:55,460 --> 00:52:00,670 we say, well, what's the probability that is, say, 955 00:52:00,670 --> 00:52:10,870 not reaction 1 and not 2 and dot dot dot. 956 00:52:10,870 --> 00:52:11,500 OK. 957 00:52:11,500 --> 00:52:14,846 Well, and this is in some time delta t. 958 00:52:20,050 --> 00:52:25,475 Well, actually, we know that if the fundamental process just 959 00:52:25,475 --> 00:52:26,850 looks like this, then we're going 960 00:52:26,850 --> 00:52:28,975 to get exponential distributions for each of those. 961 00:52:32,170 --> 00:52:36,450 So we end up with e to the r1, and indeed, 962 00:52:36,450 --> 00:52:40,330 once we write an exponential, we don't have to write delta t. 963 00:52:40,330 --> 00:52:43,580 This is just some time t. 964 00:52:43,580 --> 00:52:46,830 For this to be true requires a delta t is very small. 965 00:52:46,830 --> 00:52:49,060 But if we want to just ask, what's 966 00:52:49,060 --> 00:52:52,360 the probability that reaction 1 has not happened in some time 967 00:52:52,360 --> 00:52:54,260 t, this actually is, indeed, precisely equal 968 00:52:54,260 --> 00:52:55,060 to e to the r1t. 969 00:53:00,400 --> 00:53:00,983 Yeah, details. 970 00:53:06,080 --> 00:53:11,890 And this is e to the minus r2t dot dot dot minus. 971 00:53:11,890 --> 00:53:17,375 And we go up to n, r to the nt, because each of those chemical 972 00:53:17,375 --> 00:53:19,500 reactions are going to be exponentially distributed 973 00:53:19,500 --> 00:53:21,875 in terms of how long you have to wait for them to happen. 974 00:53:26,090 --> 00:53:28,870 And what's neat about this is that this means 975 00:53:28,870 --> 00:53:31,440 that if you just ask about the probability 976 00:53:31,440 --> 00:53:39,450 distribution for all of them combined by saying that none 977 00:53:39,450 --> 00:53:41,130 of them have happened, this is actually 978 00:53:41,130 --> 00:53:44,920 just equal to the exponent of minus-- 979 00:53:44,920 --> 00:53:48,850 now we might pull the t out and we just sum over ri. 980 00:53:54,130 --> 00:53:56,880 So this is actually, somehow, a little bit surprising, 981 00:53:56,880 --> 00:53:59,300 which is that each of those chemical reactions 982 00:53:59,300 --> 00:54:01,300 occur, and they're occurring at different rates. 983 00:54:01,300 --> 00:54:03,591 Some of them might be fast, some of them might be slow. 984 00:54:03,591 --> 00:54:08,470 The ri's can be different by orders of magnitude. 985 00:54:08,470 --> 00:54:11,334 But still, over these hundreds of chemical reactions, 986 00:54:11,334 --> 00:54:12,750 if the only thing you want to know 987 00:54:12,750 --> 00:54:14,750 is, oh, what's the probability that none of them 988 00:54:14,750 --> 00:54:16,150 have happened, that is also going 989 00:54:16,150 --> 00:54:18,625 to end up-- that's going to decay exponentially. 990 00:54:20,880 --> 00:54:23,130 And this actually tells us something very interesting, 991 00:54:23,130 --> 00:54:26,470 which is that if we want to know the distribution of times 992 00:54:26,470 --> 00:54:29,429 for the first thing to happen, that's 993 00:54:29,429 --> 00:54:31,220 also going to be exponentially distributed. 994 00:54:34,295 --> 00:54:35,920 And it's just exponentially distributed 995 00:54:35,920 --> 00:54:40,980 with a rate that is given by the sum of these rates. 996 00:54:40,980 --> 00:54:43,720 Now that's the basic insight behind 997 00:54:43,720 --> 00:54:48,560 this GIllespie algorithm, where instead of dividing things up 998 00:54:48,560 --> 00:54:51,920 into a bunch of little times delta t, instead what you do 999 00:54:51,920 --> 00:54:55,200 is you ask, how long am I going to have to wait 1000 00:54:55,200 --> 00:54:58,480 before the first thing happens? 1001 00:54:58,480 --> 00:55:01,899 And you just sample from an exponential with this rate 1002 00:55:01,899 --> 00:55:03,190 r that is the sum of the rates. 1003 00:55:11,827 --> 00:55:13,410 Maybe it's even worth saying that, OK, 1004 00:55:13,410 --> 00:55:15,450 so there's the naive algorithm where you just 1005 00:55:15,450 --> 00:55:17,610 divide a bunch of delta t's, you just take a little steps, 1006 00:55:17,610 --> 00:55:19,320 you say, OK, nothing, nothing, nothing, nothing, and then 1007 00:55:19,320 --> 00:55:20,420 eventually something happens, and then 1008 00:55:20,420 --> 00:55:22,030 you update, you keep on going. 1009 00:55:22,030 --> 00:55:27,390 There's the somewhat less naive algorithm, which is exact, 1010 00:55:27,390 --> 00:55:29,610 so it's not the same concerns, the j hat which 1011 00:55:29,610 --> 00:55:34,840 is that you could just sample from n different exponentials, 1012 00:55:34,840 --> 00:55:36,400 each with their own rates, and then 1013 00:55:36,400 --> 00:55:38,800 just take the minimum of them and say, OK, 1014 00:55:38,800 --> 00:55:42,420 that's the that happened first, and then update from that. 1015 00:55:42,420 --> 00:55:44,310 And that's an exact algorithm. 1016 00:55:44,310 --> 00:55:47,480 But the problem is that you have to sample from possibly 1017 00:55:47,480 --> 00:55:49,790 many different exponentials. 1018 00:55:49,790 --> 00:55:51,890 And that's not a disaster, but again, it's 1019 00:55:51,890 --> 00:55:53,000 computationally slow. 1020 00:55:53,000 --> 00:55:56,580 So the Gillespie algorithm removes the requirement 1021 00:55:56,580 --> 00:55:59,470 to from those n exponentials, because instead what you 1022 00:55:59,470 --> 00:56:05,780 do is you just say, the numbers, or the concentrations, 1023 00:56:05,780 --> 00:56:13,465 give all of the ri, give you all the rates. 1024 00:56:21,290 --> 00:56:24,640 And then what you do is you sample 1025 00:56:24,640 --> 00:56:34,120 from an exponential with rate r, which 1026 00:56:34,120 --> 00:56:37,770 is the sum over all the ri. 1027 00:56:37,770 --> 00:56:40,520 That tells you, when is the first reaction going to occur. 1028 00:56:44,020 --> 00:56:47,450 And then what you do is you ask, well, which reaction did occur? 1029 00:56:47,450 --> 00:56:51,990 Because you actually don't know that yet. 1030 00:56:51,990 --> 00:56:55,180 And there, it's just the probabilities of each of them. 1031 00:56:55,180 --> 00:56:57,530 So the probabilities Pi is just going 1032 00:56:57,530 --> 00:57:06,440 to be the ri divided by the sum over the ri, so this big R. 1033 00:57:06,440 --> 00:57:10,150 So it may be that you had 300 possible chemical reactions, 1034 00:57:10,150 --> 00:57:12,030 but you only have to do two things here. 1035 00:57:12,030 --> 00:57:13,020 And they're both kind of simple, right? 1036 00:57:13,020 --> 00:57:14,950 You sample from one exponential, gives you 1037 00:57:14,950 --> 00:57:17,680 how long you had to wait for something to happen. 1038 00:57:17,680 --> 00:57:20,165 And then you just sample from another simple probability 1039 00:57:20,165 --> 00:57:24,600 thing here that just tells you which of the n 1040 00:57:24,600 --> 00:57:27,400 possible chemical reactions was it that actually occurred. 1041 00:57:27,400 --> 00:57:29,340 And of course, the chemical reactions 1042 00:57:29,340 --> 00:57:30,900 that were occurring at a faster rate 1043 00:57:30,900 --> 00:57:34,560 have a higher probability of being chosen. 1044 00:57:34,560 --> 00:57:37,370 So this actually is an exact procedure in the sense 1045 00:57:37,370 --> 00:57:43,340 that there's no digitization of time or anything of the sort. 1046 00:57:43,340 --> 00:57:47,330 So this actually is computationally efficient 1047 00:57:47,330 --> 00:57:52,080 and is exact, assuming that your description of the chemical 1048 00:57:52,080 --> 00:57:55,459 reactions was accurate to begin with. 1049 00:57:55,459 --> 00:57:57,000 So then what we do is we update time. 1050 00:58:00,820 --> 00:58:03,880 This is in some ways-- when you do computations, 1051 00:58:03,880 --> 00:58:05,880 when you actually do simulations-- this is maybe 1052 00:58:05,880 --> 00:58:08,240 the annoying part about the Gillespie algorithm, which 1053 00:58:08,240 --> 00:58:10,800 is that now your times are not equally spaced, 1054 00:58:10,800 --> 00:58:12,420 and so then you just have to make 1055 00:58:12,420 --> 00:58:15,190 sure you remember that, you don't plot something that's 1056 00:58:15,190 --> 00:58:16,150 incorrect. 1057 00:58:16,150 --> 00:58:21,250 Because your times are going to hop at different time 1058 00:58:21,250 --> 00:58:22,700 intervals. 1059 00:58:22,700 --> 00:58:23,846 But that's doable. 1060 00:58:23,846 --> 00:58:25,346 You have to update your time and you 1061 00:58:25,346 --> 00:58:29,300 have to update your abundances. 1062 00:58:29,300 --> 00:58:32,380 And then what you do is repeat. 1063 00:58:37,751 --> 00:58:40,250 I think the notes kind of allude to this Gillespie algorithm 1064 00:58:40,250 --> 00:58:43,790 but are not quite explicit about what you actually 1065 00:58:43,790 --> 00:58:47,274 do to go through this process. 1066 00:58:47,274 --> 00:58:49,690 For the simulations that you're going to do in this class, 1067 00:58:49,690 --> 00:58:54,785 I would say that you don't get the full benefits 1068 00:58:54,785 --> 00:58:56,660 of the Gillespie in the sense that you're not 1069 00:58:56,660 --> 00:58:59,540 going to be simulating hundreds of differential equations 1070 00:58:59,540 --> 00:59:01,820 with hundreds of different things. 1071 00:59:01,820 --> 00:59:04,820 But it's in those complicated models 1072 00:59:04,820 --> 00:59:07,470 that you really have to do this kind of Gillespie approach, 1073 00:59:07,470 --> 00:59:10,769 as compared to even this somewhat better model, which is 1074 00:59:10,769 --> 00:59:12,560 you sample from the different exponentials. 1075 00:59:16,187 --> 00:59:18,270 Are there any questions about why this might work, 1076 00:59:18,270 --> 00:59:20,240 why you might want to do it? 1077 00:59:20,240 --> 00:59:20,990 Yes. 1078 00:59:20,990 --> 00:59:23,930 AUDIENCE: What do you mean by sample the exponentials? 1079 00:59:23,930 --> 00:59:26,000 PROFESSOR: Right. 1080 00:59:26,000 --> 00:59:30,670 What I mean is that you go to Matlab and you say, 1081 00:59:30,670 --> 00:59:40,730 random-- I'm sort of serious, but-- sorry, 1082 00:59:40,730 --> 00:59:43,270 I'm trying to get a new-- All right. 1083 00:59:43,270 --> 00:59:46,170 So you the exponential. 1084 00:59:46,170 --> 00:59:49,460 So it's a probability distribution. 1085 00:59:49,460 --> 00:59:56,530 So this is the probability is a function of time and then t. 1086 00:59:56,530 --> 00:59:58,902 And it's going to look something like this. 1087 01:00:01,500 --> 01:00:05,840 This thing is going to be some-- given that, in general, it's 1088 01:00:05,840 --> 01:00:13,750 going to be the probability t is going to be e to the minus rt. 1089 01:00:13,750 --> 01:00:16,878 And then do I put r here or do I put 1 over r? 1090 01:00:16,878 --> 01:00:18,699 AUDIENCE: [INAUDIBLE] 1091 01:00:18,699 --> 01:00:19,782 PROFESSOR: Is it 1 over r? 1092 01:00:23,820 --> 01:00:26,520 Well, what should be the units of a probability distribution? 1093 01:00:30,300 --> 01:00:31,400 1 over time, in this case. 1094 01:00:31,400 --> 01:00:33,120 It's 1 over whatever's on this x-axis, 1095 01:00:33,120 --> 01:00:35,740 because if you want to get the actual, honest to goodness 1096 01:00:35,740 --> 01:00:40,820 probability-- so if you want the probability that t is, 1097 01:00:40,820 --> 01:00:50,864 say, between t1 and t1 plus delta t. 1098 01:00:50,864 --> 01:00:52,280 If you want an actual probability, 1099 01:00:52,280 --> 01:00:59,090 then this thing is equal to the probability density at t1, 1100 01:00:59,090 --> 01:01:02,170 in this case, times delta t. 1101 01:01:06,240 --> 01:01:08,530 So that means thing has to have a 1 over time, 1102 01:01:08,530 --> 01:01:09,931 and that gives us r here. 1103 01:01:14,960 --> 01:01:17,800 So this is probability density, and what 1104 01:01:17,800 --> 01:01:20,700 I'm saying is that when I say sample from this probability 1105 01:01:20,700 --> 01:01:24,470 distribution, what it means is that it's like rolling a die, 1106 01:01:24,470 --> 01:01:26,340 but that it's a biased die because it's 1107 01:01:26,340 --> 01:01:28,460 continuous thing over the time. 1108 01:01:28,460 --> 01:01:32,380 But just like when you have a six-sided die and I say, OK, 1109 01:01:32,380 --> 01:01:34,450 sample from the die, you're playing Monopoly, 1110 01:01:34,450 --> 01:01:37,350 you throw the die and you get 1, 2, 3, 4, 5, 6. 1111 01:01:37,350 --> 01:01:39,114 And you do that over and over again. 1112 01:01:39,114 --> 01:01:39,780 Same thing here. 1113 01:01:39,780 --> 01:01:43,040 You kind of roll the die and see what happens. 1114 01:01:43,040 --> 01:01:45,720 And indeed, you're going to get some practice with probability 1115 01:01:45,720 --> 01:01:48,010 distributions on the homework that you're 1116 01:01:48,010 --> 01:01:52,910 doing right now because you're asked to demonstrate that you 1117 01:01:52,910 --> 01:01:55,855 can sample from a uniform distribution, which something 1118 01:01:55,855 --> 01:01:59,430 that's just equally probable across the unit line, 1119 01:01:59,430 --> 01:02:03,080 and do a transformation and get an exponential distribution. 1120 01:02:03,080 --> 01:02:05,710 And it used to be that everybody knew all these tricks 1121 01:02:05,710 --> 01:02:07,260 because you had to kind of know them 1122 01:02:07,260 --> 01:02:09,280 in order to do computation. 1123 01:02:09,280 --> 01:02:12,199 But now, Matlab, or whatever program you use, 1124 01:02:12,199 --> 01:02:13,740 they know all the tricks, so you just 1125 01:02:13,740 --> 01:02:17,005 ask it to sample from an exponential with this property 1126 01:02:17,005 --> 01:02:17,999 and it does it for you. 1127 01:02:17,999 --> 01:02:19,790 But you still need to know what it's doing. 1128 01:02:23,100 --> 01:02:27,390 So just to be clear, what is the most likely time 1129 01:02:27,390 --> 01:02:29,473 that you're going to get out from the exponential? 1130 01:02:32,790 --> 01:02:35,720 0. 1131 01:02:35,720 --> 01:02:37,750 It has a peak here but the mean is over here. 1132 01:02:44,190 --> 01:02:48,815 Any other questions about how the Gillespie algorithm works? 1133 01:02:54,370 --> 01:02:59,370 Can somebody tell me how a protein burst arises? 1134 01:02:59,370 --> 01:03:01,605 So we had this original question about whether there 1135 01:03:01,605 --> 01:03:03,000 were protein bursts in that model 1136 01:03:03,000 --> 01:03:07,520 that I wrote down, where we just had m dot is equal to-- 1137 01:03:18,840 --> 01:03:24,610 Now what we said was that the master equation would not-- 1138 01:03:24,610 --> 01:03:27,280 the protein burst would somehow be 1139 01:03:27,280 --> 01:03:28,950 there are but you would never see them, 1140 01:03:28,950 --> 01:03:32,131 or somehow the protein burst would influence how the mean 1141 01:03:32,131 --> 01:03:34,380 and everything have evolved, but you wouldn't actually 1142 01:03:34,380 --> 01:03:35,510 see any big jumps. 1143 01:03:35,510 --> 01:03:38,051 But then we said, oh, but if you did a stochastic simulation, 1144 01:03:38,051 --> 01:03:38,900 you would. 1145 01:03:38,900 --> 01:03:41,910 So the claim here is that the Gillespie algorithm, 1146 01:03:41,910 --> 01:03:44,975 what I've just told you here, will lead to protein bursts. 1147 01:03:47,500 --> 01:03:50,055 When I make that statement, what is it that I actually mean? 1148 01:03:54,910 --> 01:03:57,610 If we do a Gillespie of this, will the-- OK, 1149 01:03:57,610 --> 01:03:58,590 let's just hold on. 1150 01:03:58,590 --> 01:04:00,360 Let me do a quick vote. 1151 01:04:00,360 --> 01:04:07,757 Will we have cases where delta n is greater than 1? 1152 01:04:07,757 --> 01:04:10,090 If I go through this process, if I'm using the Gillespie 1153 01:04:10,090 --> 01:04:13,240 and I'm tracking how mRNA and protein number are changing 1154 01:04:13,240 --> 01:04:16,980 over time, will I get these things, protein bursts, 1155 01:04:16,980 --> 01:04:20,775 where delta n is larger than 1 in one of these time cycles? 1156 01:04:28,460 --> 01:04:29,010 Ready? 1157 01:04:29,010 --> 01:04:32,270 3, 2, 1. 1158 01:04:36,510 --> 01:04:40,580 So most of the group is saying that it's going to be no. 1159 01:04:40,580 --> 01:04:43,770 But again, it's mixed. 1160 01:04:43,770 --> 01:04:47,760 So can somebody say why we don't get-- 1161 01:04:47,760 --> 01:04:52,910 AUDIENCE: [INAUDIBLE] It seems like the structure 1162 01:04:52,910 --> 01:04:55,760 of the simulation is to make sure [INAUDIBLE]. 1163 01:04:55,760 --> 01:04:56,760 PROFESSOR: That's right. 1164 01:04:56,760 --> 01:04:57,260 Yeah. 1165 01:04:57,260 --> 01:04:59,920 So the simulation as written-- you 1166 01:04:59,920 --> 01:05:02,547 could imagine some sort of phenomenological version 1167 01:05:02,547 --> 01:05:04,880 of this where you allowed, actually, for protein bursts. 1168 01:05:04,880 --> 01:05:09,300 But as kind of specified is that we ask, 1169 01:05:09,300 --> 01:05:11,080 what's the time for one thing to happen? 1170 01:05:14,870 --> 01:05:17,110 But the claim somehow is, OK, well, we can still 1171 01:05:17,110 --> 01:05:18,540 get protein bursts from this. 1172 01:05:18,540 --> 01:05:20,396 And how does that happen? 1173 01:05:24,284 --> 01:05:27,200 AUDIENCE: You can have the rate for something happening 1174 01:05:27,200 --> 01:05:32,475 increase suddenly, and that would happen if we go from m 1175 01:05:32,475 --> 01:05:34,604 equals 0 to m equals 1-- 1176 01:05:34,604 --> 01:05:37,480 PROFESSOR: Yeah, for example, if we didn't have an mRNA before 1177 01:05:37,480 --> 01:05:38,610 and we got an mRNA. 1178 01:05:38,610 --> 01:05:40,340 What it means that if you look at n 1179 01:05:40,340 --> 01:05:43,820 as a function of time during one of these protein 1180 01:05:43,820 --> 01:05:47,462 bursts-- before, I was drawing it just hopping up, but really, 1181 01:05:47,462 --> 01:05:48,920 in the context of the Gillespie, it 1182 01:05:48,920 --> 01:05:51,175 would be that it would hop, hop. 1183 01:05:51,175 --> 01:05:52,850 So there would be little time jumps. 1184 01:05:56,370 --> 01:05:58,880 So this is a protein burst, but it's really 1185 01:05:58,880 --> 01:06:01,210 before this mRNA is degraded, you get 1, 1, 1, 1. 1186 01:06:01,210 --> 01:06:05,800 So each of these as is delta n of 1. 1187 01:06:05,800 --> 01:06:08,170 So this is whatever, 6, 7. 1188 01:06:08,170 --> 01:06:12,680 And then what can happen is that we get the mRNA degraded. 1189 01:06:12,680 --> 01:06:14,997 And so then we're going to get a slower thing 1190 01:06:14,997 --> 01:06:22,060 where it-- looks like that. 1191 01:06:22,060 --> 01:06:26,720 So the Gillespie, everything is being created and destroyed 1192 01:06:26,720 --> 01:06:27,740 in units of 1. 1193 01:06:27,740 --> 01:06:30,024 But it could be that the time interval over this burst 1194 01:06:30,024 --> 01:06:32,190 is just very short, so then it goes up very quickly, 1195 01:06:32,190 --> 01:06:33,580 but then it's slower to go away. 1196 01:06:39,001 --> 01:06:41,000 So what I want to do in just the last 15 minutes 1197 01:06:41,000 --> 01:06:44,380 is talk a bit about the Fokker-Planck approximation. 1198 01:06:44,380 --> 01:06:47,650 I would say that all these different approaches are 1199 01:06:47,650 --> 01:06:50,470 useful to varying degrees in terms of actually doing 1200 01:06:50,470 --> 01:06:53,930 simulations, doing analytic calculations, 1201 01:06:53,930 --> 01:06:55,050 getting intuition. 1202 01:06:55,050 --> 01:07:00,284 And the Fokker-Planck approach, I'd say it's 1203 01:07:00,284 --> 01:07:01,950 more or less useful for different people 1204 01:07:01,950 --> 01:07:03,890 depending on what you're doing. 1205 01:07:03,890 --> 01:07:06,750 So the basic idea, as kind of you 1206 01:07:06,750 --> 01:07:09,410 answered in the pre-class reading, 1207 01:07:09,410 --> 01:07:12,696 is that in cases where n is large enough 1208 01:07:12,696 --> 01:07:14,070 that you don't feel like you need 1209 01:07:14,070 --> 01:07:16,870 to take into account the discrete nature 1210 01:07:16,870 --> 01:07:19,207 of the molecules, yet at the same time 1211 01:07:19,207 --> 01:07:20,790 it's not so large that you can totally 1212 01:07:20,790 --> 01:07:23,670 ignore the fluctuations, then the Fokker-Planck approach 1213 01:07:23,670 --> 01:07:27,030 is nice because it allows you to get some sense of what's 1214 01:07:27,030 --> 01:07:31,490 going on without all of the crazy details of, for example, 1215 01:07:31,490 --> 01:07:34,540 the master equation. 1216 01:07:34,540 --> 01:07:36,130 And then it also, because of this idea 1217 01:07:36,130 --> 01:07:37,830 of an effective potential, it allows 1218 01:07:37,830 --> 01:07:41,790 you to bring all the intuition from that into your study 1219 01:07:41,790 --> 01:07:43,780 of these gene circuits. 1220 01:07:43,780 --> 01:07:46,580 Now I'm not going to go through the whole derivation, 1221 01:07:46,580 --> 01:07:48,180 but if you have questions about that, 1222 01:07:48,180 --> 01:07:50,270 please come up after class and I'm 1223 01:07:50,270 --> 01:07:54,930 happy to go through it with you, because it's sort of fun. 1224 01:07:57,620 --> 01:08:01,490 But the notes do go over it. 1225 01:08:01,490 --> 01:08:04,930 I think that's what's perhaps useful to just remind ourselves 1226 01:08:04,930 --> 01:08:11,020 of is how it maybe leads to a Gaussian with some width 1227 01:08:11,020 --> 01:08:13,890 depending upon the shapes of the production degradation curves. 1228 01:08:21,520 --> 01:08:24,090 So the basic notion here is that, depending 1229 01:08:24,090 --> 01:08:29,090 on the f's and g's, the production degradation terms, 1230 01:08:29,090 --> 01:08:31,644 we get different shaped effective potentials. 1231 01:08:41,671 --> 01:08:43,170 So in general we have something that 1232 01:08:43,170 --> 01:08:46,020 looks like-- we have some n dot, there's some fn, 1233 01:08:46,020 --> 01:08:49,250 and then there's a minus gn. 1234 01:08:49,250 --> 01:08:53,956 So for example, for something that is just simple expression, 1235 01:08:53,956 --> 01:08:55,580 in the case of-- let's just imagine now 1236 01:08:55,580 --> 01:08:58,424 that there is-- if you want we can say it's a protein where 1237 01:08:58,424 --> 01:09:02,310 it's just some k minus gamma n. 1238 01:09:02,310 --> 01:09:04,800 Or if you'd like, we could say, oh, this is mRNA number. 1239 01:09:04,800 --> 01:09:07,475 But something that's just simple production, and then 1240 01:09:07,475 --> 01:09:08,475 first order degradation. 1241 01:09:12,229 --> 01:09:16,120 The question is, how do we go about understanding 1242 01:09:16,120 --> 01:09:18,790 this in the context of the Fokker-Planck approximation? 1243 01:09:18,790 --> 01:09:22,550 And it turns out that you can write it 1244 01:09:22,550 --> 01:09:26,220 in what is essentially a diffusion equation where 1245 01:09:26,220 --> 01:09:30,140 you have some probability flux that's moving around. 1246 01:09:30,140 --> 01:09:34,000 And within that realm, you can write 1247 01:09:34,000 --> 01:09:37,380 that the probability distribution of the number 1248 01:09:37,380 --> 01:09:39,210 is going to be something that-- so there's 1249 01:09:39,210 --> 01:09:40,700 going to be some constant. 1250 01:09:40,700 --> 01:09:42,140 There's f plus g. 1251 01:09:42,140 --> 01:09:45,109 And these are both functions of n. 1252 01:09:45,109 --> 01:09:49,000 And then you have e to the minus [INAUDIBLE] 1253 01:09:49,000 --> 01:09:52,554 So the idea here is that this behaves 1254 01:09:52,554 --> 01:09:53,720 as some effective potential. 1255 01:10:02,370 --> 01:10:05,860 Of course, it's not quite true because f and g also 1256 01:10:05,860 --> 01:10:08,680 are functions of n, they're are not in here. 1257 01:10:08,680 --> 01:10:10,600 But this is the dominant term because it's 1258 01:10:10,600 --> 01:10:11,391 in the exponential. 1259 01:10:13,770 --> 01:10:18,770 And here phi n is defined as the following. 1260 01:10:18,770 --> 01:10:23,400 So it's minus this integral over n of the f minus g 1261 01:10:23,400 --> 01:10:29,370 and f plus g dn that we integrate over n prime. 1262 01:10:31,980 --> 01:10:33,800 And we're going to kind of go through what 1263 01:10:33,800 --> 01:10:35,508 some of these different f's and g's might 1264 01:10:35,508 --> 01:10:38,730 look like to try to get a sense of why this happened. 1265 01:10:38,730 --> 01:10:40,560 It is worth mentioning that you can 1266 01:10:40,560 --> 01:10:44,900 do this for any f and g when it's just in one dimension, 1267 01:10:44,900 --> 01:10:48,010 so you just have n. 1268 01:10:48,010 --> 01:10:50,559 Once you have it in two dimensions, 1269 01:10:50,559 --> 01:10:52,350 so once you actually have mRNA and protein, 1270 01:10:52,350 --> 01:10:54,200 for example, you're not guaranteed 1271 01:10:54,200 --> 01:10:57,960 to to be able to write it as an effective potential. 1272 01:10:57,960 --> 01:11:01,040 Although I guess if you're willing to invoke a vector 1273 01:11:01,040 --> 01:11:05,770 potential, then maybe you can. 1274 01:11:05,770 --> 01:11:08,190 But in terms of just a simple potential, 1275 01:11:08,190 --> 01:11:11,110 then you can do it one dimension, but not necessarily 1276 01:11:11,110 --> 01:11:11,610 in more. 1277 01:11:16,320 --> 01:11:18,930 And I think that, in general, our intuition is not as useful 1278 01:11:18,930 --> 01:11:22,630 when you have the equivalent of magnetic fields and so forth 1279 01:11:22,630 --> 01:11:25,070 here anyway. 1280 01:11:25,070 --> 01:11:27,280 What I want to do is just try to understand 1281 01:11:27,280 --> 01:11:33,370 why this thing looks the way it does for this simple regulation 1282 01:11:33,370 --> 01:11:34,260 case. 1283 01:11:34,260 --> 01:11:37,560 And then we're going to ask if we change one thing or another, 1284 01:11:37,560 --> 01:11:41,675 how does it affect the resulting variance. 1285 01:11:55,940 --> 01:12:02,210 So for unregulated expression, such as here, 1286 01:12:02,210 --> 01:12:08,180 if we look at the production and degradation as a function of n, 1287 01:12:08,180 --> 01:12:15,380 fn is just some constant k, whereas gn is 1288 01:12:15,380 --> 01:12:17,230 a line that goes up as gamma n. 1289 01:12:22,430 --> 01:12:28,560 Now in this situation, if you do this integral-- and really, 1290 01:12:28,560 --> 01:12:31,750 what you can imagine is what this integral looks like right 1291 01:12:31,750 --> 01:12:35,160 around that steady state, because that's 1292 01:12:35,160 --> 01:12:37,660 kind of what we want to know, if we want to something about, 1293 01:12:37,660 --> 01:12:40,630 for example, the width of a distribution. 1294 01:12:40,630 --> 01:12:42,740 Well, there's going to b e two terms. 1295 01:12:42,740 --> 01:12:44,810 In the numerator there's an f minus g. 1296 01:12:44,810 --> 01:12:48,520 In the denominator there's an f plus g. 1297 01:12:48,520 --> 01:12:52,450 Now f minus g is actually equal to 0 right 1298 01:12:52,450 --> 01:12:54,050 at that steady state, and that's why 1299 01:12:54,050 --> 01:12:57,000 it's a steady state, because production and degradation are 1300 01:12:57,000 --> 01:12:59,440 equal. 1301 01:12:59,440 --> 01:13:02,282 Now as you go away from that location, what you're 1302 01:13:02,282 --> 01:13:04,615 doing is you're integrating the difference between the f 1303 01:13:04,615 --> 01:13:06,130 and the g. 1304 01:13:06,130 --> 01:13:09,000 And you can see that around here these things are 1305 01:13:09,000 --> 01:13:14,080 separating kind of-- well, everything's a line here. 1306 01:13:14,080 --> 01:13:17,630 And indeed, even if f and g were not linear, 1307 01:13:17,630 --> 01:13:20,794 close to that steady state they would be linear. 1308 01:13:20,794 --> 01:13:22,710 What we can see is that as you're integrating, 1309 01:13:22,710 --> 01:13:24,168 you're integrating across something 1310 01:13:24,168 --> 01:13:26,149 that is growing linearly. 1311 01:13:26,149 --> 01:13:27,565 That's what gives you a quadratic. 1312 01:13:31,010 --> 01:13:33,090 And that's why this effect of potential 1313 01:13:33,090 --> 01:13:39,800 ends up behaving as if you're in a quadratic trap. 1314 01:13:39,800 --> 01:13:42,090 Now I encourage you to go ahead and do that integral 1315 01:13:42,090 --> 01:13:42,800 at some point. 1316 01:13:42,800 --> 01:13:45,690 I was planning on doing it for you today, 1317 01:13:45,690 --> 01:13:47,120 but we are running out of time. 1318 01:13:47,120 --> 01:13:50,176 Once again, I'm happy to do it, just after class. 1319 01:13:50,176 --> 01:13:52,800 And indeed, what you can see is that because you're integrating 1320 01:13:52,800 --> 01:13:56,180 across here, you end up getting a quadratic increase 1321 01:13:56,180 --> 01:13:57,630 in the effective potential. 1322 01:13:57,630 --> 01:13:59,880 And if you look at what the variance of that thing is, 1323 01:13:59,880 --> 01:14:02,170 you indeed find that the variance is equal to the mean 1324 01:14:02,170 --> 01:14:02,670 here. 1325 01:14:07,380 --> 01:14:09,670 So what I want to ask in terms of trying 1326 01:14:09,670 --> 01:14:11,660 to get intuition is, what happens 1327 01:14:11,660 --> 01:14:13,700 if we pull these curves down? 1328 01:14:16,340 --> 01:14:18,450 So in particular, let's imagine that we 1329 01:14:18,450 --> 01:14:22,200 have a situation where-- I'm going 1330 01:14:22,200 --> 01:14:23,950 to re-parameterize things, so again, we're 1331 01:14:23,950 --> 01:14:29,172 kind of keeping the number of the equilibrium constant. 1332 01:14:29,172 --> 01:14:30,630 But now what I'm going to do is I'm 1333 01:14:30,630 --> 01:14:32,783 going to have an fn that looks like this, 1334 01:14:32,783 --> 01:14:38,840 and gn looks like-- now gn is going to be some 1/2 of lambda, 1335 01:14:38,840 --> 01:14:44,287 and this fn is equal to k minus 1/2 of gamma n. 1336 01:14:49,780 --> 01:14:52,754 Now the question is, in this situation, 1337 01:14:52,754 --> 01:14:54,420 what will be the variance over the mean? 1338 01:15:01,790 --> 01:15:03,960 Well, first of all, the variance over the mean 1339 01:15:03,960 --> 01:15:05,000 here was equal to what? 1340 01:15:12,190 --> 01:15:13,540 Although should we do vote? 1341 01:15:18,574 --> 01:15:19,990 Here are going to be some options. 1342 01:15:36,920 --> 01:15:39,411 Question is variance over the mean in this situation. 1343 01:15:47,109 --> 01:15:48,900 I'm worried that this is not going to work, 1344 01:15:48,900 --> 01:15:50,410 but let's just see where are. 1345 01:15:50,410 --> 01:15:53,170 Ready, 3, 2, 1. 1346 01:15:56,211 --> 01:15:56,710 All right. 1347 01:15:56,710 --> 01:15:59,887 So I'd say that at least broadly, people 1348 01:15:59,887 --> 01:16:01,720 are agreeing that the variance over the mean 1349 01:16:01,720 --> 01:16:02,511 here is equal to 1. 1350 01:16:02,511 --> 01:16:04,770 And again, this is the situation that we've 1351 01:16:04,770 --> 01:16:07,370 analyzed many times, which is that in this situation 1352 01:16:07,370 --> 01:16:10,540 we get a poisson, where the poisson only 1353 01:16:10,540 --> 01:16:14,190 has one free parameter, and that parameter specifies 1354 01:16:14,190 --> 01:16:16,911 both the mean and the variance. 1355 01:16:16,911 --> 01:16:18,660 So for a poisson, the variance of the mean 1356 01:16:18,660 --> 01:16:20,550 is indeed equal to 1. 1357 01:16:20,550 --> 01:16:23,620 So the Fokker-Planck approximation actually 1358 01:16:23,620 --> 01:16:25,170 accurately recapitulates that. 1359 01:16:27,986 --> 01:16:30,360 Now the question is, what will the variance over the mean 1360 01:16:30,360 --> 01:16:35,060 be in the situation that I've just drawn here? 1361 01:16:35,060 --> 01:16:36,510 So I'm going to give you a minute 1362 01:16:36,510 --> 01:16:39,720 to try to think about what this means. 1363 01:16:39,720 --> 01:16:42,100 And there are multiple ways of figuring it out. 1364 01:16:42,100 --> 01:16:44,080 You can look at, maybe, the integral. 1365 01:16:44,080 --> 01:16:47,250 You can think about the biological intuition 1366 01:16:47,250 --> 01:16:50,630 to make at least a guess of of what it should do. 1367 01:16:50,630 --> 01:16:53,990 The question is, if the production 1368 01:16:53,990 --> 01:16:55,930 rate and the degradation rate look like this, 1369 01:16:55,930 --> 01:17:00,526 what does that mean for the variance over the mean? 1370 01:17:00,526 --> 01:17:02,609 So I'll give you a minute to kind of play with it. 1371 01:17:56,005 --> 01:17:57,505 Why don't we go ahead and vote, just 1372 01:17:57,505 --> 01:17:59,380 so I can get a sense of where we are? 1373 01:17:59,380 --> 01:18:02,540 And also, it's OK if you can't actually figure this out 1374 01:18:02,540 --> 01:18:03,520 or you're confused. 1375 01:18:03,520 --> 01:18:05,930 But go ahead and make your best guess anyways, 1376 01:18:05,930 --> 01:18:08,781 because it's also useful if you can guess 1377 01:18:08,781 --> 01:18:11,280 kind of the direction it'll go, even if you can't figure out 1378 01:18:11,280 --> 01:18:12,520 its magnitude. 1379 01:18:12,520 --> 01:18:13,500 So let's vote. 1380 01:18:13,500 --> 01:18:18,260 Ready, 3, 2, 1. 1381 01:18:18,260 --> 01:18:18,760 OK. 1382 01:18:18,760 --> 01:18:24,430 So it's a mixture now, I'd say, of A, B, C, Ds. 1383 01:18:27,740 --> 01:18:32,860 Yeah, I think this is, I think, hard and confusing. 1384 01:18:32,860 --> 01:18:42,180 I maybe won't have-- all right. 1385 01:18:42,180 --> 01:18:43,260 I'll maybe say something. 1386 01:18:43,260 --> 01:18:46,860 It may be that talking to each other won't help that much. 1387 01:18:46,860 --> 01:18:50,140 OK, so in this case, what's relevant 1388 01:18:50,140 --> 01:18:54,740 is both the f minus g and the f plus g. 1389 01:18:54,740 --> 01:18:58,190 And it turns out that f minus g actually behaves the same way, 1390 01:18:58,190 --> 01:19:01,380 because at the fixed point, or at the equilibrium, 1391 01:19:01,380 --> 01:19:03,940 it starts at 0 and then it actually grows in the same way 1392 01:19:03,940 --> 01:19:05,310 as you go away from it. 1393 01:19:05,310 --> 01:19:09,410 The difference is the f plus g, where that's very much not 1394 01:19:09,410 --> 01:19:10,860 equal to 0. 1395 01:19:10,860 --> 01:19:14,920 And f plus g at the equilibrium, this f plus g 1396 01:19:14,920 --> 01:19:21,710 here is around 2k, whereas f plus g over here is around 1k. 1397 01:19:24,540 --> 01:19:27,230 What that means is that in both cases 1398 01:19:27,230 --> 01:19:29,180 you have a quadratic potential. 1399 01:19:29,180 --> 01:19:32,990 But here the quadratic potential actually ends up being steeper. 1400 01:19:32,990 --> 01:19:35,010 So if this were unregulated, then over here 1401 01:19:35,010 --> 01:19:40,970 we still get a quadratic, but it's with steeper walls. 1402 01:19:40,970 --> 01:19:45,221 So actually here, this, the variance over the mean, 1403 01:19:45,221 --> 01:19:45,970 ends up being 1/2. 1404 01:19:49,110 --> 01:19:52,190 It's useful to go ahead and just play with these equations 1405 01:19:52,190 --> 01:19:54,890 to see why that happens. 1406 01:19:54,890 --> 01:19:57,025 And I think that's a nice way to think about this 1407 01:19:57,025 --> 01:19:59,358 is, in this limit, where we pull this crossing point all 1408 01:19:59,358 --> 01:20:04,370 the way down to 0, now we have something 1409 01:20:04,370 --> 01:20:05,807 that looks kind of like this. 1410 01:20:10,120 --> 01:20:14,000 So very, very low rate of degradation. 1411 01:20:14,000 --> 01:20:15,840 But then also the production rate 1412 01:20:15,840 --> 01:20:20,894 essentially goes to 0 when we're at this point. 1413 01:20:20,894 --> 01:20:22,810 So we could still parameterize as k over gamma 1414 01:20:22,810 --> 01:20:27,090 if we want, with some-- but we could just 1415 01:20:27,090 --> 01:20:31,870 think about this as being at 100 of these mRNAs, say. 1416 01:20:31,870 --> 01:20:34,910 But then we're changing the production degradation rate. 1417 01:20:34,910 --> 01:20:37,150 And the variance over the mean here-- 1418 01:20:37,150 --> 01:20:39,025 does anybody have a guess of where that goes? 1419 01:20:43,462 --> 01:20:44,920 In this case it actually goes to 0. 1420 01:20:47,950 --> 01:20:52,500 And this is an interesting situation, because really, 1421 01:20:52,500 --> 01:20:54,562 in the limit where there's no degradation, 1422 01:20:54,562 --> 01:20:56,270 and it's all at the production side, what 1423 01:20:56,270 --> 01:20:58,720 it's saying is that you produce, you produce, you produce, 1424 01:20:58,720 --> 01:21:01,320 until you get to this number, which might be 100, 1425 01:21:01,320 --> 01:21:03,260 and then you simply stop doing anything. 1426 01:21:03,260 --> 01:21:05,820 You're not degrading, you're not producing. 1427 01:21:05,820 --> 01:21:13,380 In that case all the cells will have exactly 100, maybe, mRNA. 1428 01:21:13,380 --> 01:21:16,420 And what the Fokker-Planck kind of formalism 1429 01:21:16,420 --> 01:21:19,420 tells you is that just because production and degradation 1430 01:21:19,420 --> 01:21:22,570 rates are equal, f minus g is equal to 0, 1431 01:21:22,570 --> 01:21:24,374 doesn't mean that-- that tells you 1432 01:21:24,374 --> 01:21:26,540 that that's the equilibrium, but it doesn't tell you 1433 01:21:26,540 --> 01:21:28,998 how much spread there's going to be around the equilibrium. 1434 01:21:28,998 --> 01:21:31,960 If f and g are each larger, that leads to a larger spread 1435 01:21:31,960 --> 01:21:35,140 because there's more randomness, whereas here, f and g are 1436 01:21:35,140 --> 01:21:37,420 both essentially 0 at that point. 1437 01:21:37,420 --> 01:21:39,260 What that means is that you kind of just 1438 01:21:39,260 --> 01:21:40,786 pile up right at that precise value. 1439 01:21:44,764 --> 01:21:46,680 We are out of time, so I think we should quit. 1440 01:21:46,680 --> 01:21:48,170 But I am available for the next half hour 1441 01:21:48,170 --> 01:21:49,378 if anybody has any questions. 1442 01:21:49,378 --> 01:21:50,910 Thanks.