1 00:00:00,060 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high quality educational resources for free. 5 00:00:10,730 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,236 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,236 --> 00:00:17,861 at ocw.mit.edu. 8 00:00:20,445 --> 00:00:22,070 PROFESSOR: Today what we're going to do 9 00:00:22,070 --> 00:00:26,661 is finish off our discussion about oscillators. 10 00:00:26,661 --> 00:00:29,160 In particular, we're going to talk about alternative designs 11 00:00:29,160 --> 00:00:30,200 for oscillators. 12 00:00:30,200 --> 00:00:32,650 So rather than having these loops that 13 00:00:32,650 --> 00:00:35,670 are purely composed of negative interactions, 14 00:00:35,670 --> 00:00:37,320 negative feedback, instead we're going 15 00:00:37,320 --> 00:00:39,920 to talk about cases where you have both positive and negative 16 00:00:39,920 --> 00:00:41,130 interactions. 17 00:00:41,130 --> 00:00:45,155 So in using this kind of combined network structure, 18 00:00:45,155 --> 00:00:47,030 you can generate what are known as relaxation 19 00:00:47,030 --> 00:00:51,010 oscillators, which have some really wonderful properties. 20 00:00:51,010 --> 00:00:53,860 In particular you can get more robust oscillations, 21 00:00:53,860 --> 00:00:55,130 relative to the parameters. 22 00:00:55,130 --> 00:00:58,350 But also the oscillations become tunable, 23 00:00:58,350 --> 00:01:00,071 i.e. you can change the frequency, 24 00:01:00,071 --> 00:01:02,070 without compromising, for example, the amplitude 25 00:01:02,070 --> 00:01:03,280 of the oscillations. 
26 00:01:03,280 --> 00:01:07,670 So for both natural and synthetic oscillators 27 00:01:07,670 --> 00:01:09,220 these so-called relaxation oscillators 28 00:01:09,220 --> 00:01:11,614 are perhaps the way to go. 29 00:01:11,614 --> 00:01:13,030 And then we're going to transition 30 00:01:13,030 --> 00:01:17,410 to more of the global structure of some of these networks 31 00:01:17,410 --> 00:01:20,560 in the context of transcription networks within cells. 32 00:01:20,560 --> 00:01:22,810 And discuss this paper that you guys just 33 00:01:22,810 --> 00:01:27,290 read, the Barabasi paper, which is one of the world's most 34 00:01:27,290 --> 00:01:28,470 cited papers, I think. 35 00:01:31,952 --> 00:01:34,410 And then after thinking about this global structure, of how 36 00:01:34,410 --> 00:01:36,701 you might be able to generate these so-called power law 37 00:01:36,701 --> 00:01:38,870 structures, we're going to look a little bit more 38 00:01:38,870 --> 00:01:41,681 in detail to try to understand something about these network 39 00:01:41,681 --> 00:01:42,180 motifs. 40 00:01:42,180 --> 00:01:44,510 We've already talked about them a little bit 41 00:01:44,510 --> 00:01:47,200 in the context of auto regulatory loops, 42 00:01:47,200 --> 00:01:50,360 but now we'll talk about them in a little bit more 43 00:01:50,360 --> 00:01:54,480 generality, in particular in the context of feed forward loops. 44 00:01:54,480 --> 00:01:58,390 And then on Thursday we will get into some 45 00:01:58,390 --> 00:02:02,220 of the possible beneficial features of feed forward loops. 46 00:02:02,220 --> 00:02:05,730 On Thursday we talked about the repressilator. 47 00:02:05,730 --> 00:02:09,930 So if you have x inhibiting y inhibiting z 48 00:02:09,930 --> 00:02:13,860 coming back and inhibiting x, then it's 49 00:02:13,860 --> 00:02:16,880 reasonable to expect that it might generate oscillations. 
50 00:02:16,880 --> 00:02:20,380 And indeed in the Elowitz paper that we read, 51 00:02:20,380 --> 00:02:25,610 such a synthetic circuit did indeed generate oscillations, 52 00:02:25,610 --> 00:02:28,460 but there were perhaps a few problems there, right? 53 00:02:28,460 --> 00:02:30,740 So one is that only about 40% of the cells 54 00:02:30,740 --> 00:02:38,260 actually oscillated, who knows why not. 55 00:02:38,260 --> 00:02:39,880 But also there were other problems 56 00:02:39,880 --> 00:02:42,620 that the oscillations seemed rather noisy, 57 00:02:42,620 --> 00:02:45,080 there was relatively rapid desynchronization. 58 00:02:47,760 --> 00:02:52,125 Moreover, if you go and you ask, well is it possible, 59 00:02:52,125 --> 00:02:55,380 or how easy would it be to change the period of the oscillations 60 00:02:55,380 --> 00:02:58,000 just by changing something like the degradation rate, what 61 00:02:58,000 --> 00:03:00,970 you'll find is that the oscillations are not 62 00:03:00,970 --> 00:03:03,700 very tunable. 63 00:03:03,700 --> 00:03:10,500 So I'll say the period, or the frequency, is not very tunable, 64 00:03:10,500 --> 00:03:15,190 and indeed this is a general feature of oscillatory networks 65 00:03:15,190 --> 00:03:19,390 that have purely negative interactions. 66 00:03:19,390 --> 00:03:22,440 We talked about a couple of these cases, for example, 67 00:03:22,440 --> 00:03:26,327 you can get oscillations just with negative auto regulation. 68 00:03:26,327 --> 00:03:27,660 And what is it that's necessary? 69 00:03:32,280 --> 00:03:33,240 AUDIENCE: [INAUDIBLE]. 70 00:03:33,240 --> 00:03:34,040 PROFESSOR: What's that? 71 00:03:34,040 --> 00:03:35,200 AUDIENCE: High coordination. 72 00:03:35,200 --> 00:03:36,408 PROFESSOR: High coordination? 73 00:03:36,408 --> 00:03:37,584 You me-- oh you're-- 74 00:03:37,584 --> 00:03:38,500 AUDIENCE: [INAUDIBLE]. 
75 00:03:38,500 --> 00:03:40,833 PROFESSOR: Cooperativity in the repression, that I think 76 00:03:40,833 --> 00:03:43,295 is necessary, but is it going to be sufficient? 77 00:03:45,800 --> 00:03:49,050 Even in this case where I just have, let's say that I say, 78 00:03:49,050 --> 00:03:53,050 x dot the rate of production, if this thing is just 79 00:03:53,050 --> 00:03:57,240 as a function of x, the sharpest it could be, 80 00:03:57,240 --> 00:04:00,122 this is infinite cooperativity, so it's maximal expression. 81 00:04:00,122 --> 00:04:02,580 And then when you get above some x critical all of a sudden 82 00:04:02,580 --> 00:04:04,500 you fully repress. 83 00:04:04,500 --> 00:04:08,120 If I just have this be the formula-- 84 00:04:08,120 --> 00:04:10,470 did you guys understand what I'm referring to here? 85 00:04:10,470 --> 00:04:12,690 What would this generate? 86 00:04:12,690 --> 00:04:15,410 Would this generate oscillations? 87 00:04:15,410 --> 00:04:16,839 So it actually doesn't. 88 00:04:16,839 --> 00:04:21,120 In the simple equation, where if we have x dot 89 00:04:21,120 --> 00:04:23,280 is equal to this function. 90 00:04:23,280 --> 00:04:26,310 So I guess this is a theta, I want 91 00:04:26,310 --> 00:04:29,450 to make sure I get this x, less than x critical, 92 00:04:29,450 --> 00:04:31,240 that's what this means. 93 00:04:31,240 --> 00:04:36,100 So with some [INAUDIBLE] rate beta, minus some alpha x. 94 00:04:36,100 --> 00:04:39,070 Does this thing oscillate? 95 00:04:39,070 --> 00:04:42,690 No, and we had a simple argument for why it did not oscillate, 96 00:04:42,690 --> 00:04:43,190 as well. 97 00:04:47,970 --> 00:04:50,620 Yes? 98 00:04:50,620 --> 00:04:55,660 Yell it out somebody, I'm sure somebody was here on Thursday. 99 00:04:55,660 --> 00:04:57,013 [LAUGHTER] 100 00:04:57,512 --> 00:04:59,301 AUDIENCE: [INAUDIBLE]. 101 00:04:59,301 --> 00:05:00,300 PROFESSOR: That's right. 
102 00:05:00,300 --> 00:05:03,560 So this is just-- This is just an x dot, 103 00:05:03,560 --> 00:05:05,370 there's no x double dot, so that means 104 00:05:05,370 --> 00:05:07,020 the derivative of x is a single-valued 105 00:05:07,020 --> 00:05:09,190 function of x, that means that we 106 00:05:09,190 --> 00:05:14,450 can't get any oscillation here. 107 00:05:14,450 --> 00:05:17,930 And then remember we analyzed this model where we explicitly 108 00:05:17,930 --> 00:05:22,890 included the mRNA, so then we just had that x comes, 109 00:05:22,890 --> 00:05:25,580 and what it does, is it represses 110 00:05:25,580 --> 00:05:29,390 expression of this mRNA for x, and then this mRNA comes back 111 00:05:29,390 --> 00:05:31,350 and makes x. 112 00:05:31,350 --> 00:05:33,140 Right? 113 00:05:33,140 --> 00:05:35,880 And in this model, was this sufficient? 114 00:05:35,880 --> 00:05:37,880 Did this give oscillations? 115 00:05:37,880 --> 00:05:42,610 No, so this also here, there were no oscillations. 116 00:05:42,610 --> 00:05:44,360 Again here, there were no oscillations. 117 00:05:44,360 --> 00:05:47,744 But I did tell you that you could do something more 118 00:05:47,744 --> 00:05:50,160 to get oscillations, just with a single protein repressing 119 00:05:50,160 --> 00:05:50,660 itself. 120 00:05:53,000 --> 00:05:54,940 So you need more delays. 121 00:05:54,940 --> 00:06:02,410 So if you add delays, then it's possible to get oscillations. 
122 00:06:05,440 --> 00:06:09,680 So those delays could be in the form of having a model 123 00:06:09,680 --> 00:06:13,280 where you explicitly take into account that first mRNA is 124 00:06:13,280 --> 00:06:15,310 made, and then that goes, and then 125 00:06:15,310 --> 00:06:17,150 you translate that to make some monomer, 126 00:06:17,150 --> 00:06:19,860 and then the monomer has to maybe fold, 127 00:06:19,860 --> 00:06:21,680 and then the folded protein maybe 128 00:06:21,680 --> 00:06:23,790 has to dimerize in order to do a repression. 129 00:06:23,790 --> 00:06:26,409 So if you have a more detailed mechanistic model, that 130 00:06:26,409 --> 00:06:28,450 includes all these steps, that kind of introduces 131 00:06:28,450 --> 00:06:30,550 some sort of delay, that in principle 132 00:06:30,550 --> 00:06:32,470 can lead to oscillations in such a circuit. 133 00:06:32,470 --> 00:06:35,880 Or if you wanted to, you could just explicitly put in a delay. 134 00:06:35,880 --> 00:06:37,760 So you could say that x dot, instead 135 00:06:37,760 --> 00:06:42,210 of being a function of x, instead what you can do, 136 00:06:42,210 --> 00:06:44,470 is you could say, well it's actually a function of x 137 00:06:44,470 --> 00:06:49,410 at some time, t minus tau. 138 00:06:49,410 --> 00:06:52,710 So instead of having the rate of production of x 139 00:06:52,710 --> 00:06:54,774 be a function of x, at that moment in time, 140 00:06:54,774 --> 00:06:57,190 instead it could be a function of x at some previous time. 141 00:06:57,190 --> 00:06:59,231 Doing that, that's a very explicit form of delay. 142 00:06:59,231 --> 00:07:01,900 And that can also be used to generate oscillations 143 00:07:01,900 --> 00:07:04,390 in a simple negative auto regulatory loop. 144 00:07:08,170 --> 00:07:10,670 These are all different kinds of approaches 145 00:07:10,670 --> 00:07:18,600 for encoding delays into a model and various approaches 146 00:07:18,600 --> 00:07:20,786 will give you oscillations. 
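The explicit-delay idea can be sketched numerically. This is only a minimal illustration, not the model from the lecture notes: it takes the step-function repression described above, x dot = beta * theta(x(t - tau) < x critical) - alpha * x, and integrates it with a plain Euler scheme. The parameter values (beta = alpha = 1, x critical = 0.5, tau = 1), the constant history before t = 0, and the function names are all assumptions chosen for illustration.

```python
def simulate_autorepression(tau, beta=1.0, alpha=1.0, x_crit=0.5,
                            t_end=40.0, dt=0.001, x0=0.0):
    """Euler integration of dx/dt = beta*[x(t - tau) < x_crit] - alpha*x.

    All parameter values are illustrative assumptions, not fitted numbers.
    """
    lag = int(round(tau / dt))            # delay expressed in time steps
    n_steps = int(round(t_end / dt))
    xs = [x0]
    for i in range(n_steps):
        j = i - lag
        x_delayed = xs[j] if j >= 0 else x0   # assumed constant history for t < 0
        production = beta if x_delayed < x_crit else 0.0
        xs.append(xs[-1] + dt * (production - alpha * xs[-1]))
    return xs


def late_amplitude(xs):
    """Peak-to-trough swing over the second half of the trajectory."""
    tail = xs[len(xs) // 2:]
    return max(tail) - min(tail)


no_delay = late_amplitude(simulate_autorepression(tau=0.0))
with_delay = late_amplitude(simulate_autorepression(tau=1.0))
print(f"amplitude without delay: {no_delay:.4f}")
print(f"amplitude with delay:    {with_delay:.4f}")
```

Under these assumptions the undelayed system just settles onto x critical, with only step-sized numerical chatter, while the delayed one keeps overshooting the threshold in both directions and so sustains a sizable swing.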
147 00:07:20,786 --> 00:07:21,285 Yes? 148 00:07:21,285 --> 00:07:22,993 AUDIENCE: Question for the repressilator, 149 00:07:22,993 --> 00:07:25,494 when you say the period is not tunable, 150 00:07:25,494 --> 00:07:29,810 it's because the mRNA lifetime is very difficult to-- 151 00:07:29,810 --> 00:07:31,950 PROFESSOR: All right, when we say-- 152 00:07:31,950 --> 00:07:33,367 AUDIENCE: --in the model you can-- 153 00:07:33,367 --> 00:07:34,575 PROFESSOR: Yes, that's right. 154 00:07:34,575 --> 00:07:36,270 In the model you can, in principle-- So 155 00:07:36,270 --> 00:07:41,432 what I mean when I say this is that in this class of model, 156 00:07:41,432 --> 00:07:43,640 so you could also have, instead of this repressilator 157 00:07:43,640 --> 00:07:46,450 with three, you have the so-called pentalator, 158 00:07:46,450 --> 00:07:49,400 where you have five proteins and each is repressing the next. 159 00:07:49,400 --> 00:07:52,530 So these all have similar features, 160 00:07:52,530 --> 00:07:55,480 so all have these odd numbers of proteins going around 161 00:07:55,480 --> 00:07:56,860 and repressing one another. 162 00:07:56,860 --> 00:07:59,950 And so you can write down the model with seven, if you want. 163 00:07:59,950 --> 00:08:04,600 But in all these cases, it's not tunable. 164 00:08:04,600 --> 00:08:09,320 What we mean by that is that, when you tune the frequency, 165 00:08:09,320 --> 00:08:13,380 you in general lose the amplitude of the oscillation. 166 00:08:13,380 --> 00:08:15,050 So the amplitude will go down. 167 00:08:15,050 --> 00:08:18,360 There was a very nice paper that was 168 00:08:18,360 --> 00:08:20,780 written in 2008 on this topic, written 169 00:08:20,780 --> 00:08:23,260 by Jim Ferrell at Stanford. 170 00:08:23,260 --> 00:08:25,160 So I just want to mention this. 
171 00:08:25,160 --> 00:08:32,230 So it's Ferrell, at Stanford, this 172 00:08:32,230 --> 00:08:41,600 is a paper in Science 2008, and it's 173 00:08:41,600 --> 00:08:44,680 called Robust, Tunable Biological Oscillations 174 00:08:44,680 --> 00:08:48,870 from Interlinked Positive and Negative Feedback Loops. 175 00:08:48,870 --> 00:08:52,480 So nice title, I like titles that say something. 176 00:08:52,480 --> 00:08:54,860 So it's sort of the ultimate short version 177 00:08:54,860 --> 00:08:57,910 of an abstract, right, if you can do it I recommend it. 178 00:08:57,910 --> 00:08:59,950 Incidentally in graduate school, I once 179 00:08:59,950 --> 00:09:04,170 wrote a paper with four words, short words, DNA over-winds 180 00:09:04,170 --> 00:09:06,040 when stretched. 181 00:09:06,040 --> 00:09:07,880 Nice statement, you may or may not 182 00:09:07,880 --> 00:09:09,340 actually know what I mean by that, 183 00:09:09,340 --> 00:09:13,580 but it's a nice short title, it's a statement. 184 00:09:13,580 --> 00:09:15,830 I encourage you to think about that when 185 00:09:15,830 --> 00:09:19,110 you're writing your papers. 186 00:09:19,110 --> 00:09:23,650 So he wrote this paper where, he said, all right, well, 187 00:09:23,650 --> 00:09:26,100 oscillations are really important. 188 00:09:26,100 --> 00:09:29,810 Thinking about context of heart rhythms, or cell cycle, 189 00:09:29,810 --> 00:09:31,052 or this or that. 190 00:09:31,052 --> 00:09:32,760 Oscillations are important, but if you go 191 00:09:32,760 --> 00:09:35,110 and you look at the circuits that 192 00:09:35,110 --> 00:09:37,060 are generating oscillations in biology, 193 00:09:37,060 --> 00:09:39,580 they often have so-called interlinked positive 194 00:09:39,580 --> 00:09:41,790 and negative feedback loops. 
195 00:09:41,790 --> 00:09:44,930 There are many cases where you have, 196 00:09:44,930 --> 00:09:49,460 some x that actually is positively, 197 00:09:49,460 --> 00:09:51,410 it's kind of activating itself. 198 00:09:51,410 --> 00:09:54,110 And this is very much something that will not lead 199 00:09:54,110 --> 00:09:56,580 to oscillations on its own. 200 00:09:56,580 --> 00:09:59,050 It might be bistable, which is interesting, 201 00:09:59,050 --> 00:10:00,880 but not oscillations on its own. 202 00:10:00,880 --> 00:10:04,240 But then there's also maybe a negative feedback 203 00:10:04,240 --> 00:10:09,030 loop through another protein. 204 00:10:09,030 --> 00:10:10,670 And the idea is that this one somehow 205 00:10:10,670 --> 00:10:15,540 operates-- this one's fast, and this one's slow. 206 00:10:15,540 --> 00:10:18,329 And the key feature of these relaxation oscillators 207 00:10:18,329 --> 00:10:19,495 is there are two time scales. 208 00:10:24,770 --> 00:10:28,680 And it's the slow time scale that 209 00:10:28,680 --> 00:10:31,780 specifies the period of the oscillation, 210 00:10:31,780 --> 00:10:34,220 and this fast one kind of locks the system 211 00:10:34,220 --> 00:10:36,770 into these alternative states. 212 00:10:36,770 --> 00:10:39,490 And this helps maintain the amplitude, 213 00:10:39,490 --> 00:10:42,340 because it has this nature of being bistable, right, 214 00:10:42,340 --> 00:10:44,290 so it's on or off. 215 00:10:44,290 --> 00:10:47,440 So this helps you maintain amplitude, 216 00:10:47,440 --> 00:10:50,580 so this is kind of in charge of amplitude, 217 00:10:50,580 --> 00:10:55,220 and this one over here is in charge of the period. 218 00:10:55,220 --> 00:10:57,740 So you can imagine that by changing this time scale, 219 00:10:57,740 --> 00:10:59,650 you change the period of the oscillation, 220 00:10:59,650 --> 00:11:05,600 whereas this loop allows you to maintain the amplitude. 
221 00:11:05,600 --> 00:11:09,340 And what Jim's group did computationally in this paper, 222 00:11:09,340 --> 00:11:12,810 is they analyzed many different circuit designs that 223 00:11:12,810 --> 00:11:14,760 can lead to oscillations, and they 224 00:11:14,760 --> 00:11:18,370 showed that for the loops that are 225 00:11:18,370 --> 00:11:20,170 made of purely negative interactions 226 00:11:20,170 --> 00:11:23,640 like this, if you change a parameter in order 227 00:11:23,640 --> 00:11:26,000 to change the period, you'll also in general 228 00:11:26,000 --> 00:11:28,990 make the amplitude of the oscillations drop dramatically. 229 00:11:28,990 --> 00:11:31,424 So that's the sense in which they're not tunable. 230 00:11:31,424 --> 00:11:33,090 Whereas if you have this kind of design, 231 00:11:33,090 --> 00:11:37,510 you can actually tune over, in some cases a very wide range, 232 00:11:37,510 --> 00:11:40,730 but maintain the amplitude of the oscillation. 233 00:11:40,730 --> 00:11:42,170 And in addition to being tunable, 234 00:11:42,170 --> 00:11:45,130 these things also end up being robust in various ways. 235 00:11:45,130 --> 00:11:49,300 The oscillation is maintained subject to various kinds 236 00:11:49,300 --> 00:11:51,994 of-- If you twiddle with the parameters, you double this, 237 00:11:51,994 --> 00:11:54,160 you halve that, you still get nice oscillations here, 238 00:11:54,160 --> 00:11:57,850 whereas in those designs you tend to lose 239 00:11:57,850 --> 00:11:59,610 the oscillations more easily. 240 00:11:59,610 --> 00:12:01,500 So they claim that based on that, that these 241 00:12:01,500 --> 00:12:03,650 might be more evolvable. 
242 00:12:03,650 --> 00:12:06,180 So even in cases where you don't need to tune the period, 243 00:12:06,180 --> 00:12:09,060 maybe you still end up evolving towards this design, 244 00:12:09,060 --> 00:12:12,497 just because it's robust to stochastic fluctuations 245 00:12:12,497 --> 00:12:14,330 in the concentrations of things, but also it 246 00:12:14,330 --> 00:12:17,630 might be easier to evolve these sorts of oscillations. 247 00:12:20,490 --> 00:12:23,270 Are there any questions about the kind of intuition 248 00:12:23,270 --> 00:12:26,370 behind this for now? 249 00:12:26,370 --> 00:12:28,670 There's a nice kind of circuit analogy 250 00:12:28,670 --> 00:12:34,140 that people often talk about in the context of this. 251 00:12:34,140 --> 00:12:38,220 So if you imagine you have some battery, with some voltage, v, 252 00:12:38,220 --> 00:12:50,040 well, we'll say v battery, some capacitor over here, 253 00:12:50,040 --> 00:12:54,950 but over here you have something that 254 00:12:54,950 --> 00:13:03,950 will spark at some voltage, some v t, you get a spark. 255 00:13:03,950 --> 00:13:08,110 Now the question is, well, what happens over time 256 00:13:08,110 --> 00:13:12,250 if the threshold is less than v battery? 257 00:13:12,250 --> 00:13:14,335 We maybe should have a resistor in here. 258 00:13:18,800 --> 00:13:21,410 So the threshold is less than v battery, 259 00:13:21,410 --> 00:13:26,352 then this can generate nice oscillations 260 00:13:26,352 --> 00:13:28,560 in the voltage say across the capacitor as a function 261 00:13:28,560 --> 00:13:30,910 of time, that are tunable. 262 00:13:30,910 --> 00:13:34,020 Because if you plot as a function of time. 263 00:13:34,020 --> 00:13:37,900 This is the voltage across the capacitor, where up here we 264 00:13:37,900 --> 00:13:45,160 might have the v, the battery, here we might have v threshold. 
265 00:13:45,160 --> 00:13:47,960 Now in the absence of-- this thing that's 266 00:13:47,960 --> 00:13:49,510 going to short periodically, we're 267 00:13:49,510 --> 00:13:51,399 just going to charge up the capacitor. 268 00:13:51,399 --> 00:13:53,690 So in principle, there's going to be this standard RC 269 00:13:53,690 --> 00:13:58,630 time constant, coming up to here, but before we get there, 270 00:13:58,630 --> 00:14:00,070 we get the spark. 271 00:14:00,070 --> 00:14:03,475 So then we discharge across here and this drops. 272 00:14:11,220 --> 00:14:14,820 So you get something that looks like this. 273 00:14:14,820 --> 00:14:17,890 Now you can imagine by changing, for example, the resistor, 274 00:14:17,890 --> 00:14:21,270 you can change the rate that this thing, the capacitor, 275 00:14:21,270 --> 00:14:22,799 will charge up. 276 00:14:22,799 --> 00:14:24,340 But the amplitude of the oscillations 277 00:14:24,340 --> 00:14:27,790 stays constant, because that's set by the voltage threshold 278 00:14:27,790 --> 00:14:33,270 across this-- where it shorts. 279 00:14:33,270 --> 00:14:35,520 This is capturing this dynamic of the separation 280 00:14:35,520 --> 00:14:36,170 of time scales. 281 00:14:36,170 --> 00:14:39,145 So there's a slow time scale, which is this RC 282 00:14:39,145 --> 00:14:41,370 time constant, and then there's the rapid time 283 00:14:41,370 --> 00:14:45,060 scale, which is where this shorts out. 284 00:14:45,060 --> 00:14:46,820 So you can imagine that this is an example 285 00:14:46,820 --> 00:14:49,880 of an oscillatory signal where we can 286 00:14:49,880 --> 00:14:52,798 tune the frequency without sacrificing the amplitude. 287 00:15:00,200 --> 00:15:04,580 What we've said so far is that there are engineering analogs 288 00:15:04,580 --> 00:15:06,830 to these sorts of relaxation oscillators. 
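The circuit analogy lends itself to a quick numerical check. The sketch below is a simplified stand-in for the circuit on the board, under stated assumptions: the capacitor charges toward the battery voltage through the resistor, and the spark is modeled as an instantaneous, complete discharge to zero once the threshold is reached. The component values and the name `spark_gap_oscillator` are hypothetical.

```python
def spark_gap_oscillator(R, C=1.0, v_battery=10.0, v_threshold=5.0,
                         t_end=20.0, dt=1e-4):
    """Charge a capacitor through R; discharge fully at v_threshold.

    Assumes an idealized spark (instant reset to 0 V) and made-up values.
    Returns (mean period between sparks, peak capacitor voltage).
    """
    v, t = 0.0, 0.0
    spark_times, peaks = [], []
    for _ in range(int(round(t_end / dt))):
        v += dt * (v_battery - v) / (R * C)   # slow step: RC charging
        t += dt
        if v >= v_threshold:                   # fast step: the spark
            spark_times.append(t)
            peaks.append(v)
            v = 0.0
    gaps = [b - a for a, b in zip(spark_times, spark_times[1:])]
    period = sum(gaps) / len(gaps)
    amplitude = max(peaks)
    return period, amplitude


for R in (1.0, 2.0):
    period, amplitude = spark_gap_oscillator(R)
    print(f"R = {R}: period = {period:.3f}, amplitude = {amplitude:.3f}")
```

In this idealization the period is R C ln(v battery / (v battery - v threshold)), so doubling the resistor doubles the period, while the amplitude stays pinned at the threshold voltage. That is the sense in which the slow time scale sets the period and the fast switch sets the amplitude.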
289 00:15:06,830 --> 00:15:10,140 We can model various synthetic circuits, 290 00:15:10,140 --> 00:15:13,799 or we can look at natural oscillatory networks, 291 00:15:13,799 --> 00:15:15,590 in order to get a sense of what's going on. 292 00:15:15,590 --> 00:15:19,420 But of course, a major goal of this kind 293 00:15:19,420 --> 00:15:22,660 of system synthetic approach to the field, 294 00:15:22,660 --> 00:15:24,690 is that if all this stuff is really true, 295 00:15:24,690 --> 00:15:27,084 we should be able to build it. 296 00:15:27,084 --> 00:15:29,000 And there's a very nice demonstration of this, 297 00:15:29,000 --> 00:15:33,680 also in 2008, by Jeff Hasty's group. 298 00:15:33,680 --> 00:15:38,470 So Jeff Hasty was actually trained as a high energy 299 00:15:38,470 --> 00:15:43,020 theorist, and then I think it was during his postdoc, 300 00:15:43,020 --> 00:15:45,850 maybe he switched into experimental biology. 301 00:15:45,850 --> 00:15:49,600 Went and did his postdoc, I think, with Jim Collins. 302 00:15:49,600 --> 00:15:52,860 And then eventually now has his own group doing systems 303 00:15:52,860 --> 00:15:54,160 synthetic biology. 304 00:15:54,160 --> 00:16:01,980 In this paper, it was a Nature paper in 2008, 305 00:16:01,980 --> 00:16:06,050 it's called A Fast Robust and Tunable Synthetic Gene 306 00:16:06,050 --> 00:16:08,640 Oscillator. 307 00:16:08,640 --> 00:16:12,230 It's a nice statement, tells you what he's about to do. 308 00:16:12,230 --> 00:16:16,830 The data here, this is again using this basic insight 309 00:16:16,830 --> 00:16:20,140 of having both interlinked positive negative feedback 310 00:16:20,140 --> 00:16:21,420 loops in E. coli. 
311 00:16:21,420 --> 00:16:24,740 He demonstrated that he can get really beautiful oscillations, 312 00:16:24,740 --> 00:16:27,560 in essentially all the cells, and that they're tunable, 313 00:16:27,560 --> 00:16:30,970 in their period, by a factor of three, or four, 314 00:16:30,970 --> 00:16:32,510 or so, by a fair amount. 315 00:16:32,510 --> 00:16:35,750 And indeed as fast as 13 minutes, 316 00:16:35,750 --> 00:16:37,430 the oscillatory period. 317 00:16:37,430 --> 00:16:41,060 Which is pretty nice, right? 318 00:16:41,060 --> 00:16:44,520 So I encourage you to check out this paper. 319 00:16:44,520 --> 00:16:52,180 This paper was also an example of how 320 00:16:52,180 --> 00:16:56,570 it was in principle possible to get oscillations just by doing 321 00:16:56,570 --> 00:16:58,370 negative auto regulation. 322 00:16:58,370 --> 00:16:59,980 Right, so this was a case where they 323 00:16:59,980 --> 00:17:03,890 designed a gene network that they could tune 324 00:17:03,890 --> 00:17:05,920 and had this wonderful property. 325 00:17:05,920 --> 00:17:08,040 But then after they did that they noticed 326 00:17:08,040 --> 00:17:09,480 that in their model at least, they 327 00:17:09,480 --> 00:17:12,069 could get oscillations in some parameter regime, 328 00:17:12,069 --> 00:17:15,099 just by having the negative auto-regulatory loop. 329 00:17:15,099 --> 00:17:17,565 And as a result of all these intermediate processes, 330 00:17:17,565 --> 00:17:19,440 of protein maturation, and so forth, and then 331 00:17:19,440 --> 00:17:20,790 they went and they constructed that network, 332 00:17:20,790 --> 00:17:22,748 and they showed that that could also oscillate. 
333 00:17:22,748 --> 00:17:26,040 So again this is an example of the interplay between modeling, 334 00:17:26,040 --> 00:17:29,190 experiment theory, modeling, And Jeff Hasty 335 00:17:29,190 --> 00:17:32,050 has gone on to write another several, 336 00:17:32,050 --> 00:17:35,840 really beautiful papers looking at these sorts of oscillations, 337 00:17:35,840 --> 00:17:39,490 looking at how you can get synchronization of oscillators, 338 00:17:39,490 --> 00:17:41,550 and you get period doubling ideas. 339 00:17:41,550 --> 00:17:44,571 It's really a whole string of wonderful, wonderful papers. 340 00:17:44,571 --> 00:17:47,070 So I encourage you to, if you're interested in oscillations, 341 00:17:47,070 --> 00:17:49,700 to look at Jeff Hasty's work over the years. 342 00:17:55,100 --> 00:17:59,170 If you want a quick introduction to these papers, 343 00:17:59,170 --> 00:18:03,200 I also wrote a news and views in Nature on these two papers. 344 00:18:03,200 --> 00:18:07,364 So you can read that, it's only a page. 345 00:18:07,364 --> 00:18:09,030 Although I guess you won't hear anything 346 00:18:09,030 --> 00:18:10,696 that you haven't already heard probably. 347 00:18:13,330 --> 00:18:15,300 Any other questions about this idea 348 00:18:15,300 --> 00:18:19,560 of how we can use both positive and negative feedback in order 349 00:18:19,560 --> 00:18:21,685 to get some nice oscillatory properties? 350 00:18:27,470 --> 00:18:30,040 OK, then let's move on. 351 00:18:33,340 --> 00:18:37,150 What did you guys think of this paper, the Barabasi paper? 352 00:18:41,170 --> 00:18:43,696 Good, bad, difficult, easy? 353 00:18:43,696 --> 00:18:45,570 AUDIENCE: Why does it have so many citations? 354 00:18:45,570 --> 00:18:46,780 PROFESSOR: Why does it have so many citations? 
355 00:18:46,780 --> 00:18:48,540 All right that's an inter-- and you 356 00:18:48,540 --> 00:18:50,400 should look at how many-- according to Google Scholar, 357 00:18:50,400 --> 00:18:52,670 I haven't checked this year, but it's probably 20,000 citations. 358 00:18:52,670 --> 00:18:53,211 I mean it's-- 359 00:18:53,211 --> 00:18:55,092 AUDIENCE: Is it a cult thing? 360 00:18:55,092 --> 00:18:56,300 PROFESSOR: It's a cult thing. 361 00:18:56,300 --> 00:18:58,510 Well, I don't know. 362 00:18:58,510 --> 00:19:00,110 That might be exaggerating. 363 00:19:00,110 --> 00:19:01,720 AUDIENCE: I mean it's a nice paper. 364 00:19:01,720 --> 00:19:03,455 PROFESSOR: Yeah, right. 365 00:19:03,455 --> 00:19:05,830 So this is interesting, and I think that the basic answer 366 00:19:05,830 --> 00:19:08,950 is that there are networks that are 367 00:19:08,950 --> 00:19:12,690 relevant in many, many, many fields, which they allude to. 368 00:19:12,690 --> 00:19:15,240 And there are many researchers that 369 00:19:15,240 --> 00:19:17,570 have been excited about studying those networks 370 00:19:17,570 --> 00:19:23,115 in many, many fields, and many, many, many of the networks that 371 00:19:23,115 --> 00:19:27,460 are observed in nature or social science, the web, 372 00:19:27,460 --> 00:19:29,890 everywhere, they have these power law structures. 373 00:19:29,890 --> 00:19:37,220 And this is the first clear simple mechanism 374 00:19:37,220 --> 00:19:38,290 to generate it. 375 00:19:38,290 --> 00:19:40,220 My understanding is that actually 376 00:19:40,220 --> 00:19:42,530 a mathematician decades before, actually 377 00:19:42,530 --> 00:19:44,590 did demonstrate that this kind of thing 378 00:19:44,590 --> 00:19:46,510 could be constructed, that would lead to this, 379 00:19:46,510 --> 00:19:49,685 but that paper doesn't have 20,000 citations. 
380 00:19:49,685 --> 00:19:51,310 I mean like it's a lot of these things, 381 00:19:51,310 --> 00:19:53,101 you have to be the right time, right place, 382 00:19:53,101 --> 00:19:54,960 and have the right idea. 383 00:19:54,960 --> 00:19:57,410 AUDIENCE: Yeah, I guess my main thought about the paper 384 00:19:57,410 --> 00:20:01,460 is exactly that, the interesting thing about it was, 385 00:20:01,460 --> 00:20:03,964 it came out at about the time that data on large networks 386 00:20:03,964 --> 00:20:04,900 was readily available. 387 00:20:04,900 --> 00:20:05,900 PROFESSOR: That's right. 388 00:20:05,900 --> 00:20:10,490 There's a reason that this paper was published at this time, 389 00:20:10,490 --> 00:20:14,010 and of course if Barabasi didn't do it here, 390 00:20:14,010 --> 00:20:16,520 someone else would have done it a year or two later. 391 00:20:16,520 --> 00:20:19,610 But it was really that the data were available everywhere, 392 00:20:19,610 --> 00:20:23,160 and we were seeing these power law distributions, 393 00:20:23,160 --> 00:20:26,900 and it's really crying out for an explanation. 394 00:20:26,900 --> 00:20:29,425 I think it's-- You know sometimes people complain, 395 00:20:29,425 --> 00:20:32,050 that they say, oh yeah, you know I could have come up with this 396 00:20:32,050 --> 00:20:35,380 idea, it's not that deep. 397 00:20:35,380 --> 00:20:38,220 And maybe you could have, but you didn't. 398 00:20:38,220 --> 00:20:39,625 [LAUGHTER] 399 00:20:40,320 --> 00:20:43,760 PROFESSOR: And also I'd say that Barabasi 400 00:20:43,760 --> 00:20:49,040 has a record of doing interesting things, 401 00:20:49,040 --> 00:20:52,460 and being the first to point out a simple idea. 402 00:20:52,460 --> 00:20:55,800 If you can reliably be the first to point out 403 00:20:55,800 --> 00:20:57,590 a simple explanation for important things, 404 00:20:57,590 --> 00:21:00,780 then that's another kind of genius, right? 
405 00:21:00,780 --> 00:21:03,040 I mean-- and it's the kind of genius 406 00:21:03,040 --> 00:21:07,500 that I aspire to, because I know that I'm not going to reach 407 00:21:07,500 --> 00:21:09,680 the other kind of genius. 408 00:21:09,680 --> 00:21:12,970 I mean there are some things you look at, oh well, I would never 409 00:21:12,970 --> 00:21:14,530 be able to do that, right? 410 00:21:14,530 --> 00:21:17,530 And everyone agrees that that's hard. 411 00:21:17,530 --> 00:21:19,830 But I think that there is something 412 00:21:19,830 --> 00:21:23,920 about being able to see what the scientific opportunities are 413 00:21:23,920 --> 00:21:26,340 at a given time, and you don't have 414 00:21:26,340 --> 00:21:29,330 to come up with a really complicated model 415 00:21:29,330 --> 00:21:33,280 or proof in order to have really important impact. 416 00:21:33,280 --> 00:21:40,730 And this paper is way beyond, in terms 417 00:21:40,730 --> 00:21:44,120 of number of people that have read it, cited it, 418 00:21:44,120 --> 00:21:46,090 and so forth, it's way beyond, probably 419 00:21:46,090 --> 00:21:50,320 any other paper you'll likely read in your life. 420 00:21:50,320 --> 00:21:50,985 Yes? 421 00:21:50,985 --> 00:21:51,850 AUDIENCE: Just more thoughts. 422 00:21:51,850 --> 00:21:52,891 PROFESSOR: More thoughts. 423 00:21:52,891 --> 00:21:54,584 Yeah, that's fine. 424 00:21:54,584 --> 00:21:57,140 AUDIENCE: I guess, what's interesting is, 425 00:21:57,140 --> 00:22:01,452 it starts a conversation, so we can analyze these networks, 426 00:22:01,452 --> 00:22:06,798 and the feature that works the best [INAUDIBLE] 427 00:22:06,798 --> 00:22:08,499 starts that conversation, there's 428 00:22:08,499 --> 00:22:12,920 still a lot more that we can do with it [INAUDIBLE]. 429 00:22:12,920 --> 00:22:15,370 I think that's why I like it. 430 00:22:15,370 --> 00:22:18,370 PROFESSOR: That's a totally-- so this is the Barabasi and Reka 431 00:22:18,370 --> 00:22:20,880 Albert. 
432 00:22:20,880 --> 00:22:26,140 So Barabasi is a professor over at Northeastern now, 433 00:22:26,140 --> 00:22:28,192 Reka Albert is a professor at Penn State, 434 00:22:28,192 --> 00:22:30,400 and I think they've both gone on to do what, I think, 435 00:22:30,400 --> 00:22:34,290 are really very interesting things in this network space, 436 00:22:34,290 --> 00:22:36,310 and more generally. 437 00:22:36,310 --> 00:22:38,840 So I encourage you to check out what each of them 438 00:22:38,840 --> 00:22:41,940 has been doing over the years. 439 00:22:41,940 --> 00:22:45,395 All right, so this model, what are the two key ingredients 440 00:22:45,395 --> 00:22:47,895 of this model? 441 00:22:47,895 --> 00:22:49,770 AUDIENCE: Growth and preferential attachment. 442 00:22:49,770 --> 00:22:51,228 PROFESSOR: Right, so the two things 443 00:22:51,228 --> 00:22:55,750 you should be able to recapitulate on an exam 444 00:22:55,750 --> 00:22:58,070 is that there are two assumptions here, 445 00:22:58,070 --> 00:23:00,630 there's growth, and there's preferential attachment. 446 00:23:00,630 --> 00:23:02,714 And we'll talk about the degree to which we 447 00:23:02,714 --> 00:23:04,630 think each of those things might be necessary, 448 00:23:04,630 --> 00:23:11,270 but what's the key-- Can somebody be a little more 449 00:23:11,270 --> 00:23:12,770 explicit than what we've been so far 450 00:23:12,770 --> 00:23:14,380 about what the key observation is 451 00:23:14,380 --> 00:23:17,700 that we're trying to explain? 452 00:23:17,700 --> 00:23:21,060 AUDIENCE: There are nodes that, inactive nodes, 453 00:23:21,060 --> 00:23:25,860 that have more edges than you expect, either by random or-- 454 00:23:25,860 --> 00:23:26,710 PROFESSOR: Right. 455 00:23:26,710 --> 00:23:27,480 Perfect, right. 456 00:23:27,480 --> 00:23:35,205 Observation-- So some nodes have lots of edges.
457 00:23:35,205 --> 00:23:37,630 AUDIENCE: I mean it is sort of a meta irony 458 00:23:37,630 --> 00:23:40,055 that this paper is now very widely cited. 459 00:23:40,055 --> 00:23:41,520 [LAUGHTER] 460 00:23:41,520 --> 00:23:42,205 PROFESSOR: Yes. 461 00:23:42,205 --> 00:23:43,830 AUDIENCE: Every time we talk about it-- 462 00:23:43,830 --> 00:23:45,700 PROFESSOR: Yes. 463 00:23:45,700 --> 00:23:50,044 Indeed it is ironic, and we'll talk 464 00:23:50,044 --> 00:23:51,960 about how the scaling of the citation networks 465 00:23:51,960 --> 00:23:53,510 go in a moment. 466 00:23:53,510 --> 00:23:55,670 Right, so some nodes have lots of edges, 467 00:23:55,670 --> 00:23:59,915 and you want to be very clear about this, 468 00:23:59,915 --> 00:24:02,520 this is what you're trying to explain. 469 00:24:02,520 --> 00:24:05,630 So it's a power law distribution, so in particular, 470 00:24:05,630 --> 00:24:07,380 you quantify this thing. 471 00:24:07,380 --> 00:24:11,020 That the probability of having k nodes, 472 00:24:11,020 --> 00:24:13,300 or I'm sorry, the probability that a node has k edges, 473 00:24:13,300 --> 00:24:19,780 falls off as a power law, 1 over k to some power 2, 3, 4. 474 00:24:19,780 --> 00:24:26,450 So the probability of having k edges, goes as 1 over k 475 00:24:26,450 --> 00:24:28,310 to some power alpha. 476 00:24:28,310 --> 00:24:33,610 Where alpha is maybe between 2 and 4, for a lot of these. 477 00:24:33,610 --> 00:24:37,774 Now it's important to make sure that you 478 00:24:37,774 --> 00:24:39,440 keep this qualitative statement in mind, 479 00:24:39,440 --> 00:24:43,980 because it's true that it falls off, and sort of rapidly. 480 00:24:43,980 --> 00:24:47,250 Right, 1 over k squared, k cubed, or k to the fourth, 481 00:24:47,250 --> 00:24:49,190 you'd say, oh, that's a pretty rapid fall off. 482 00:24:49,190 --> 00:24:50,010 Right? 483 00:24:50,010 --> 00:24:52,540 But it's not rapid compared to what? 
484 00:24:52,540 --> 00:24:54,054 AUDIENCE: [INAUDIBLE]. 485 00:24:54,054 --> 00:24:54,970 PROFESSOR: --exponent. 486 00:24:54,970 --> 00:24:55,470 Right. 487 00:24:55,470 --> 00:24:59,050 So for these other models then, it falls off exponentially. 488 00:24:59,050 --> 00:24:59,550 Right? 489 00:24:59,550 --> 00:25:02,320 So even faster. 490 00:25:02,320 --> 00:25:05,570 So it's easy to look at 1 over k to the fourth, 491 00:25:05,570 --> 00:25:07,710 and think, oh, that's a fast fall off. 492 00:25:07,710 --> 00:25:10,070 We have to remember that it's slow 493 00:25:10,070 --> 00:25:11,890 compared to some other things. 494 00:25:11,890 --> 00:25:16,330 So in particular, if you look at the data for real networks, 495 00:25:16,330 --> 00:25:19,700 and you see that the probability distribution in many cases 496 00:25:19,700 --> 00:25:22,490 goes over orders of magnitude in terms of this probability. 497 00:25:22,490 --> 00:25:24,950 You think oh, that's a big range. 498 00:25:24,950 --> 00:25:27,180 And it is a big range, but the fact 499 00:25:27,180 --> 00:25:30,980 is that you actually see some nodes with a thousand 500 00:25:30,980 --> 00:25:32,480 edges or whatnot, which is something 501 00:25:32,480 --> 00:25:35,100 that you would just never see, if it were a random network, 502 00:25:35,100 --> 00:25:40,500 or if it were not a power law distributed network. 503 00:25:40,500 --> 00:25:43,700 And I think that this is also highlighting 504 00:25:43,700 --> 00:25:49,417 another statement, which is that a powerful way to make 505 00:25:49,417 --> 00:25:51,750 a difference, for example, if you're going to write down 506 00:25:51,750 --> 00:25:53,710 a model, or you're going to do a theory, 507 00:25:53,710 --> 00:25:57,840 is that it's nice if there's a clear observation that 508 00:25:57,840 --> 00:26:00,405 needs to be explained.
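To put rough numbers on the point that a power law is slow compared with an exponential-type tail, here is a minimal sketch. It assumes, purely for illustration, a power law with alpha = 3 and a random network whose degree distribution is Poisson with mean 10; neither value nor the function names come from the lecture.

```python
import math

def log10_power_law_ratio(k_small, k_large, alpha):
    """log10 of P(k_large) / P(k_small) when P(k) is proportional to k**-alpha."""
    return -alpha * math.log10(k_large / k_small)

def log10_poisson_pmf(k, mean):
    """log10 of the Poisson probability of exactly k, computed in log space
    so that very small probabilities do not underflow to zero."""
    log_p = k * math.log(mean) - mean - math.lgamma(k + 1)
    return log_p / math.log(10)

# Going from 10 edges to 1000 edges under a power law with alpha = 3
# (illustrative exponent) costs six orders of magnitude in probability.
print(log10_power_law_ratio(10, 1000, alpha=3))

# Under a Poisson degree distribution with mean 10 (illustrative), a node
# with 1000 edges has a probability whose log10 is far below -1000.
print(log10_poisson_pmf(1000, mean=10))
```

Six orders of magnitude is a big range, but it is still something a large data set can show; the Poisson figure is so small that such a node would effectively never appear.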
509 00:26:00,405 --> 00:26:01,780 Because you can always write down 510 00:26:01,780 --> 00:26:06,120 a model of something, and maybe you'll 511 00:26:06,120 --> 00:26:08,110 find something interesting. 512 00:26:08,110 --> 00:26:11,950 But a way to massively increase the probability 513 00:26:11,950 --> 00:26:14,830 that you're going to discover something interesting 514 00:26:14,830 --> 00:26:17,710 is if you already know there's something interesting there 515 00:26:17,710 --> 00:26:20,880 and that you're trying to explain it. 516 00:26:20,880 --> 00:26:22,710 And I think that this is an example 517 00:26:22,710 --> 00:26:24,876 of that, right there, it was already an observation, 518 00:26:24,876 --> 00:26:26,160 it was already known. 519 00:26:26,160 --> 00:26:29,510 It's not that he was the first person to make those plots. 520 00:26:29,510 --> 00:26:32,910 There are other plots of citation networks before. 521 00:26:32,910 --> 00:26:34,800 So Sid Redner, for example, had already 522 00:26:34,800 --> 00:26:37,530 done some analyses of citation networks, 523 00:26:37,530 --> 00:26:41,410 he's a theoretical statistical physicist over at BU, 524 00:26:41,410 --> 00:26:47,830 but just now, I guess, moving over to the Santa Fe Institute. 525 00:26:47,830 --> 00:26:49,550 But it's not that he was the first person 526 00:26:49,550 --> 00:26:52,715 to make that observation, but he knew 527 00:26:52,715 --> 00:26:55,090 there was something interesting that needed to be explained. 528 00:26:55,090 --> 00:26:58,810 So I'd say that for any of you that 529 00:26:58,810 --> 00:27:01,880 are thinking about doing theory, or writing down models, 530 00:27:01,880 --> 00:27:05,537 I would say, whenever possible start 531 00:27:05,537 --> 00:27:06,870 with an interesting observation.
532 00:27:09,920 --> 00:27:12,510 So can somebody-- maybe you guys could just throw out, 533 00:27:12,510 --> 00:27:16,470 what are some examples of nodes and edges that 534 00:27:16,470 --> 00:27:17,780 were given there or elsewhere? 535 00:27:22,469 --> 00:27:24,577 AUDIENCE: Web pages and links. 536 00:27:24,577 --> 00:27:26,160 PROFESSOR: Right, web pages and links. 537 00:27:31,050 --> 00:27:35,319 And is this a directed or undirected? 538 00:27:35,319 --> 00:27:36,110 AUDIENCE: Directed. 539 00:27:36,110 --> 00:27:37,693 PROFESSOR: So this is indeed directed. 540 00:27:41,460 --> 00:27:42,576 Some others? 541 00:27:42,576 --> 00:27:43,950 AUDIENCE: Movie stars and movies. 542 00:27:43,950 --> 00:27:45,366 PROFESSOR: Movie stars and movies. 543 00:27:45,366 --> 00:27:53,610 This one's a funny one, rig-- So movie stars and then this 544 00:27:53,610 --> 00:27:55,650 is like being in a movie together, right? 545 00:27:55,650 --> 00:27:57,140 So co-starring or so. 546 00:28:02,360 --> 00:28:04,778 Others? 547 00:28:04,778 --> 00:28:06,359 AUDIENCE: Articles and Citations. 548 00:28:06,359 --> 00:28:07,775 PROFESSOR: Articles and citations. 549 00:28:20,151 --> 00:28:22,650 And this is again directed, and this is not directed, right? 550 00:28:25,340 --> 00:28:27,970 And we can maybe even try to remind ourselves, 551 00:28:27,970 --> 00:28:31,490 this fell off as alpha was equal to what? 552 00:28:31,490 --> 00:28:33,110 I guess it was 3, I think they said. 553 00:28:39,280 --> 00:28:48,080 Actors were around 2.3, I guess they said. 554 00:28:48,080 --> 00:28:53,540 The web was 2.1. 555 00:28:53,540 --> 00:28:55,110 Just because it's a power law doesn't 556 00:28:55,110 --> 00:28:57,960 mean that it's always going to have the same alpha right?
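One way to read the exponents just quoted: if P(k) goes as 1 over k to the alpha, then doubling k divides the probability by 2 to the alpha. The sketch below only evaluates that factor for the three values mentioned (3 for citations, 2.3 for actors, 2.1 for the web); the function name is illustrative.

```python
def doubling_factor(alpha):
    """Factor by which P(k) drops when k doubles, for P(k) proportional to k**-alpha."""
    return 2.0 ** alpha

for name, alpha in [("citations", 3.0), ("actors", 2.3), ("web", 2.1)]:
    print(f"{name}: P(k) / P(2k) = {doubling_factor(alpha):.1f}")
```

For alpha = 3 the factor is 8, so for each node with 2k edges there are about eight with k edges; the smaller exponents give a factor of between 4 and 5.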
557 00:28:57,960 --> 00:29:02,320 But for example, what this means is that for every paper that 558 00:29:02,320 --> 00:29:08,380 has say 200 citations, there are going 559 00:29:08,380 --> 00:29:12,610 to be roughly 10 papers that have 100 citations. 560 00:29:12,610 --> 00:29:14,630 If you increase k by a factor of 2, 561 00:29:14,630 --> 00:29:16,382 you get almost an order of magnitude 562 00:29:16,382 --> 00:29:18,090 in terms of the probability distribution. 563 00:29:27,300 --> 00:29:29,390 So this is an interesting observation, 564 00:29:29,390 --> 00:29:31,110 and where Barabasi came in and said, 565 00:29:31,110 --> 00:29:35,610 well, what would be a model that would recapitulate this? 566 00:29:35,610 --> 00:29:38,485 And what are the models that did not recapitulate it? 567 00:29:43,214 --> 00:29:45,142 AUDIENCE: [INAUDIBLE]. 568 00:29:45,142 --> 00:29:45,850 PROFESSOR: Right. 569 00:29:45,850 --> 00:29:50,300 So the Erdos Renyi-- so other models, 570 00:29:50,300 --> 00:29:56,550 there's the E R, other models, there's 571 00:29:56,550 --> 00:30:04,580 the Erdos Renyi network, random network, 572 00:30:04,580 --> 00:30:10,470 and that's because here the degree distribution is peaked 573 00:30:10,470 --> 00:30:12,830 around something and then falls off exponentially 574 00:30:12,830 --> 00:30:14,270 as you go above that. 575 00:30:14,270 --> 00:30:16,740 And this is actually where I think the equations are 576 00:30:16,740 --> 00:30:20,620 wrong in this paper. 
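For reference on the random-network case: if each of a node's N - 1 possible partners is linked independently with probability p, the node's degree is binomial, and for small p that binomial is closely approximated by a Poisson with mean (N - 1) p. The sketch below compares the two. N = 1000 and p = 0.005 are illustrative values, not taken from the paper, and the function names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """Poisson probability of exactly k events."""
    return math.exp(-mean) * mean**k / math.factorial(k)

n_nodes, p = 1000, 0.005      # illustrative values
n_partners = n_nodes - 1      # every other node is a possible partner
mean = n_partners * p         # mean degree, just under 5

for k in (0, 5, 10, 20, 40):
    print(k, binomial_pmf(k, n_partners, p), poisson_pmf(k, mean))
```

Both columns peak near the mean degree and then collapse very quickly, which is the "peaked around something and then falls off" behavior described above.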
577 00:30:20,620 --> 00:30:24,720 Because if you look at the paper, page 510, 578 00:30:24,720 --> 00:30:26,970 where they say the Erdos Renyi, you 579 00:30:26,970 --> 00:30:29,400 connect the edges with probability p, 580 00:30:29,400 --> 00:30:33,640 and then they say you get a Poisson distribution, p of k, 581 00:30:33,640 --> 00:30:36,180 where lambda the mean is something, but then they say, 582 00:30:36,180 --> 00:30:39,440 oh lambda is equal to some binomial of something of k, 583 00:30:39,440 --> 00:30:40,790 and so forth. 584 00:30:40,790 --> 00:30:45,560 So I think this is all not true, but rather that you 585 00:30:45,560 --> 00:30:48,450 can approximate the binomial with a Poisson 586 00:30:48,450 --> 00:30:50,300 in the limit of small p. 587 00:30:54,740 --> 00:30:57,840 So be aware if you're looking at that. 588 00:31:00,900 --> 00:31:03,210 I know-- there was another network that-- 589 00:31:03,210 --> 00:31:04,496 Do you have a question? 590 00:31:04,496 --> 00:31:05,371 AUDIENCE: Oh, no, no. 591 00:31:05,371 --> 00:31:07,120 I was-- 592 00:31:07,120 --> 00:31:11,430 PROFESSOR: So we're going to spend 593 00:31:11,430 --> 00:31:14,200 a lot of time talking about probability distributions 594 00:31:14,200 --> 00:31:17,440 in the coming weeks, but I just wanted to highlight that there, 595 00:31:17,440 --> 00:31:19,750 as far as I can tell, that is not true what they say. 596 00:31:22,330 --> 00:31:25,590 But there was one other model for a network 597 00:31:25,590 --> 00:31:27,760 that they talk about, or they mention. 598 00:31:27,760 --> 00:31:29,094 Does anybody-- 599 00:31:29,094 --> 00:31:30,010 AUDIENCE: Small world. 600 00:31:30,010 --> 00:31:32,960 PROFESSOR: The so-called small world network, right? 601 00:31:32,960 --> 00:31:36,830 And this is-- small world network, 602 00:31:36,830 --> 00:31:44,640 and this is based on a paper by Strogatz-- Watts and Strogatz, 603 00:31:44,640 --> 00:31:46,170 small world.
604 00:31:46,170 --> 00:31:52,574 That's Watts and Strogatz, and this 605 00:31:52,574 --> 00:31:54,490 was a paper where they demonstrated that there 606 00:31:54,490 --> 00:31:56,650 was a very simple mechanism. 607 00:31:56,650 --> 00:31:58,880 Just by rewiring a network that you 608 00:31:58,880 --> 00:32:01,880 could get this so-called small world phenomenon. 609 00:32:01,880 --> 00:32:06,110 Where the Kevin Bacon thing, where you can take any-- 610 00:32:06,110 --> 00:32:09,090 You're right, from Kevin Bacon, and this is actually the actor 611 00:32:09,090 --> 00:32:11,770 network, so you could say, starting with Kevin Bacon 612 00:32:11,770 --> 00:32:14,981 can you construct a list of actors 613 00:32:14,981 --> 00:32:16,480 that costarred with each person that 614 00:32:16,480 --> 00:32:17,810 gets you to any given actor. 615 00:32:17,810 --> 00:32:19,185 And the statement is that you are 616 00:32:19,185 --> 00:32:23,180 supposed to be able to do that from a path of six. 617 00:32:23,180 --> 00:32:24,760 So that all the actors are supposed 618 00:32:24,760 --> 00:32:28,094 to be connected to Kevin Bacon by six. 619 00:32:28,094 --> 00:32:29,510 Although maybe you guys don't even 620 00:32:29,510 --> 00:32:32,230 remember who Kevin Bacon is anymore. 621 00:32:32,230 --> 00:32:32,896 Oh, you do? 622 00:32:32,896 --> 00:32:35,430 OK. 623 00:32:35,430 --> 00:32:40,800 This rule works for anybody so just insert your favorite actor 624 00:32:40,800 --> 00:32:43,840 into that sentence. 625 00:32:43,840 --> 00:32:45,470 And it's important, just to mention 626 00:32:45,470 --> 00:32:48,280 that just because something is a small world network, 627 00:32:48,280 --> 00:32:51,880 does not mean that it has power law distributions. 
628 00:32:51,880 --> 00:32:56,170 It may be the case that many power law networks also 629 00:32:56,170 --> 00:32:58,960 have this small world character, and I'd 630 00:32:58,960 --> 00:33:01,750 say maybe even most of them, because some 631 00:33:01,750 --> 00:33:04,100 of those highly connected nodes are 632 00:33:04,100 --> 00:33:06,354 going to be useful for connecting anybody 633 00:33:06,354 --> 00:33:07,020 to anybody else. 634 00:33:07,020 --> 00:33:09,960 But that's not required to get the small world character. 635 00:33:13,600 --> 00:33:15,715 Any questions about that statement? 636 00:33:20,857 --> 00:33:23,148 AUDIENCE: So you can go from this small world statement 637 00:33:23,148 --> 00:33:26,136 to any sort of strong statement concerning connectivity? 638 00:33:29,140 --> 00:33:31,950 PROFESSOR: Well stron-- I guess that the strong statement is 639 00:33:31,950 --> 00:33:38,890 that this property does not imply that property. 640 00:33:38,890 --> 00:33:41,190 AUDIENCE: You're not saying that the converse is 641 00:33:41,190 --> 00:33:43,948 true [INAUDIBLE] because it seems 642 00:33:43,948 --> 00:33:46,700 like, at least the examples we've listed, 643 00:33:46,700 --> 00:33:47,724 ought to be small world. 644 00:33:47,724 --> 00:33:48,390 PROFESSOR: Yeah. 645 00:33:48,390 --> 00:33:49,090 I agree. 646 00:33:49,090 --> 00:33:52,300 I think that this small world property, 647 00:33:52,300 --> 00:33:57,820 that's why I'm saying that, it's-- What I do not know, 648 00:33:57,820 --> 00:34:01,444 it's whether it would be possible to construct a power 649 00:34:01,444 --> 00:34:03,860 law distributed network that does not have the small world 650 00:34:03,860 --> 00:34:07,750 property, but what I would say is that the ones that I'm aware 651 00:34:07,750 --> 00:34:09,960 of would have the small world property. 652 00:34:14,679 --> 00:34:16,840 Any other questions about where we are?
653 00:34:16,840 --> 00:34:19,260 So there's interesting properties of networks 654 00:34:19,260 --> 00:34:21,530 that we would like to explain. 655 00:34:21,530 --> 00:34:25,469 And I would say that what this paper does, 656 00:34:25,469 --> 00:34:27,929 I think kind of convincingly, is that they demonstrate 657 00:34:27,929 --> 00:34:31,810 that at least this model, and we'll get into the assumptions, 658 00:34:31,810 --> 00:34:35,078 does lead to a power law distributed network. 659 00:34:42,150 --> 00:34:44,060 The answer to the reading questions 660 00:34:44,060 --> 00:34:48,350 about whether both of these are strictly necessary, 661 00:34:48,350 --> 00:34:51,860 I think was an interesting one, and I'd 662 00:34:51,860 --> 00:34:55,699 say that this gets into the wider issue of there's 663 00:34:55,699 --> 00:34:59,400 an observation that is maybe interesting. 664 00:34:59,400 --> 00:35:01,910 And then we want to understand why that might be, 665 00:35:01,910 --> 00:35:04,500 and then what you can do is you can write down a model that 666 00:35:04,500 --> 00:35:05,736 leads to that behavior. 667 00:35:05,736 --> 00:35:06,860 We've already talked about. 668 00:35:06,860 --> 00:35:08,901 Does that prove that the assumptions of the model 669 00:35:08,901 --> 00:35:10,330 are correct? 670 00:35:10,330 --> 00:35:11,580 No. 671 00:35:11,580 --> 00:35:14,850 In this case, these are pretty generic features 672 00:35:14,850 --> 00:35:16,190 of lots and lots of networks. 673 00:35:16,190 --> 00:35:19,780 So when you read it you kind of believe 674 00:35:19,780 --> 00:35:22,510 that this is a dominant mechanism, 675 00:35:22,510 --> 00:35:25,770 but it very much does not prove that these are the only, 676 00:35:25,770 --> 00:35:28,710 this is not at all the only way to get a power law distributed 677 00:35:28,710 --> 00:35:29,354 network.
678 00:35:29,354 --> 00:35:31,270 I'd say that some of the language in the paper 679 00:35:31,270 --> 00:35:35,350 might kind of lead you to believe that that is the case, 680 00:35:35,350 --> 00:35:38,046 and I think this is a standard logical fallacy that we have 681 00:35:38,046 --> 00:35:39,420 to be careful of, and something I 682 00:35:39,420 --> 00:35:43,044 think that some of the language is a little bit dangerous. 683 00:35:43,044 --> 00:35:44,710 The development of the power law scaling 684 00:35:44,710 --> 00:35:47,168 in the model indicates that growth and preferential attachment 685 00:35:47,168 --> 00:35:50,120 play an important role in networ-- I'd 686 00:35:50,120 --> 00:35:54,260 say that it's quite true, but once again this question 687 00:35:54,260 --> 00:35:56,050 of-- This is certainly not a proof, 688 00:35:56,050 --> 00:35:59,040 that those assumptions are relevant for any given network. 689 00:35:59,040 --> 00:36:05,340 Of course, in all of these cases, the network does grow, 690 00:36:05,340 --> 00:36:07,850 and there is preferential attachment. 691 00:36:07,850 --> 00:36:09,850 But there are other things that are also true, 692 00:36:09,850 --> 00:36:11,340 that may be important, for example, 693 00:36:11,340 --> 00:36:14,230 in determining exactly what alpha is or in other things. 694 00:36:14,230 --> 00:36:16,000 And I think that as indicated that there 695 00:36:16,000 --> 00:36:18,470 are other ways of getting power law 696 00:36:18,470 --> 00:36:20,790 networks without making the exact assumptions that 697 00:36:20,790 --> 00:36:21,600 are here. 698 00:36:21,600 --> 00:36:26,340 But it's, in my mind, it's probably 699 00:36:26,340 --> 00:36:29,160 a, or the, dominant mechanism in a lot of these networks. 700 00:36:29,160 --> 00:36:31,220 I think it's a fine paper, but just 701 00:36:31,220 --> 00:36:34,361 remember that it doesn't prove that those are 702 00:36:34,361 --> 00:36:35,610 the only two important things.
703 00:36:39,660 --> 00:36:40,342 Yes? 704 00:36:40,342 --> 00:36:42,425 AUDIENCE: Just about the preferential attachment, 705 00:36:42,425 --> 00:36:44,900 I think you mentioned that you tried different ways, 706 00:36:44,900 --> 00:36:47,170 and only the linear one was 707 00:36:47,170 --> 00:36:48,437 AUDIENCE: [INAUDIBLE]. 708 00:36:48,437 --> 00:36:49,270 PROFESSOR: Troubling. 709 00:36:49,270 --> 00:36:51,490 AUDIENCE: [INAUDIBLE]. 710 00:36:51,490 --> 00:36:52,340 PROFESSOR: I agree. 711 00:36:52,340 --> 00:36:52,840 I agree. 712 00:36:55,580 --> 00:36:59,220 And what they assume in the model here 713 00:36:59,220 --> 00:37:02,180 is that the preferential attachment goes linearly 714 00:37:02,180 --> 00:37:06,130 with the number of existing edges. 715 00:37:06,130 --> 00:37:09,290 And I would say that I very much believe 716 00:37:09,290 --> 00:37:12,414 that preferential attachment is present in all those things, 717 00:37:12,414 --> 00:37:14,330 but I'm sure that if you go and you measure it 718 00:37:14,330 --> 00:37:15,800 you're not going to find that it's linear 719 00:37:15,800 --> 00:37:16,841 with the number of edges. 720 00:37:16,841 --> 00:37:19,585 It's going to-- actually, I don't 721 00:37:19,585 --> 00:37:21,460 know what you'll find in each of those cases, 722 00:37:21,460 --> 00:37:24,760 but there's no reason to believe it has to be linear. 723 00:37:24,760 --> 00:37:26,940 That being said it may be, the question is 724 00:37:26,940 --> 00:37:30,020 how strong of a deviation from linearity is there? 725 00:37:30,020 --> 00:37:33,290 And then how sensitive is the power law behavior to that? 726 00:37:33,290 --> 00:37:35,150 And that's the kind of thing that I'm 727 00:37:35,150 --> 00:37:37,970 sure that one of the 20,000 papers that 728 00:37:37,970 --> 00:37:40,380 have cited this paper in the last 15 years 729 00:37:40,380 --> 00:37:43,522 addresses this issue.
730 00:37:43,522 --> 00:37:45,980 Yeah, but I mean, this is also why there are so many papers 731 00:37:45,980 --> 00:37:47,600 that have cite-- It's like you read this paper, like oh, 732 00:37:47,600 --> 00:37:50,016 you know, it would be really interesting to do this, tha-- 733 00:37:50,016 --> 00:37:52,870 and people have been following that interest. 734 00:37:56,940 --> 00:37:59,730 Let's go and-- I think that the derivation is a little bit 735 00:37:59,730 --> 00:38:03,790 tricky, and so I think it's worth just walking through it. 736 00:38:06,375 --> 00:38:08,000 Especially since some people apparently 737 00:38:08,000 --> 00:38:12,000 couldn't even get the equations, which is going to be a problem. 738 00:38:31,700 --> 00:38:33,410 Maybe while we're on this question 739 00:38:33,410 --> 00:38:39,790 of preferential attachment-- How do 740 00:38:39,790 --> 00:38:43,510 you guys feel about this question of networks 741 00:38:43,510 --> 00:38:45,850 within, say the transcriptional network 742 00:38:45,850 --> 00:38:47,210 of E. coli or other cells? 743 00:38:47,210 --> 00:38:51,230 I mean do you think that these properties are 744 00:38:51,230 --> 00:38:53,140 relevant in the cell or-- 745 00:39:10,870 --> 00:39:15,362 So what would growth mean? 746 00:39:15,362 --> 00:39:17,810 AUDIENCE: [INAUDIBLE]. 747 00:39:17,810 --> 00:39:19,865 PROFESSOR: So growth would correspond 748 00:39:19,865 --> 00:39:20,740 to adding a new gene. 749 00:39:20,740 --> 00:39:22,269 Does that ever happen? 750 00:39:22,269 --> 00:39:22,852 AUDIENCE: Yes. 751 00:39:26,390 --> 00:39:30,130 PROFESSOR: Can someone give a possible mechanism 752 00:39:30,130 --> 00:39:33,260 by which a new gene is added to the genome? 753 00:39:33,260 --> 00:39:34,550 AUDIENCE: Duplication. 754 00:39:34,550 --> 00:39:38,010 PROFESSOR: For example, duplication is common, right? 755 00:39:38,010 --> 00:39:39,151 e.g. 756 00:39:39,151 --> 00:39:39,650 duplication.
757 00:39:43,040 --> 00:39:47,910 So what does this mean for preferential attachment? 758 00:39:56,557 --> 00:39:58,890 AUDIENCE: --duplicate the gene and it will probably also 759 00:39:58,890 --> 00:40:02,572 duplicate the promoter region, which means-- 760 00:40:02,572 --> 00:40:03,280 PROFESSOR: Right. 761 00:40:03,280 --> 00:40:05,690 So this, I think, is very interesting. 762 00:40:05,690 --> 00:40:08,370 So duplication, in general you'll 763 00:40:08,370 --> 00:40:11,130 duplicate both the coding region that makes protein, but also maybe 764 00:40:11,130 --> 00:40:14,610 the promoter region that specifies the regulation. 765 00:40:14,610 --> 00:40:18,820 So if you imagine you have some x here 766 00:40:18,820 --> 00:40:23,290 that is-- And we can remind ourselves, 767 00:40:23,290 --> 00:40:26,780 are both the incoming and outgoing edges power law 768 00:40:26,780 --> 00:40:29,190 distributed in transcription networks? 769 00:40:29,190 --> 00:40:32,320 No, I know this was in the pre-class reading, 770 00:40:32,320 --> 00:40:33,860 but just in case. 771 00:40:33,860 --> 00:40:38,460 So what you find is that some transcription factors regulate 772 00:40:38,460 --> 00:40:42,570 many genes, but we don't have any proteins that 773 00:40:42,570 --> 00:40:49,220 are regulated by 200 genes, so in that sense typically 774 00:40:49,220 --> 00:40:51,230 we have the things that are regulated, 775 00:40:51,230 --> 00:40:56,360 there's maybe some x1, x2, x3. 776 00:40:56,360 --> 00:40:58,740 And there might be a few incoming edges, 777 00:40:58,740 --> 00:41:01,950 so the expression of a gene is typically specified 778 00:41:01,950 --> 00:41:04,460 by a few transcription factors. 779 00:41:04,460 --> 00:41:06,180 Whereas some transcription factors 780 00:41:06,180 --> 00:41:09,700 might have 100 outgoing edges.
781 00:41:09,700 --> 00:41:12,140 So it's the outgoing edges that are power law distributed, 782 00:41:12,140 --> 00:41:16,510 and the ingoing are closer to being Poisson or so. 783 00:41:16,510 --> 00:41:21,420 So you can imagine that this guy might have 100 or so, 784 00:41:21,420 --> 00:41:23,410 whereas over here some y transcription 785 00:41:23,410 --> 00:41:27,690 factor that is just regulating two genes, say y1, and y2. 786 00:41:30,520 --> 00:41:35,290 Now, question is, if gene duplication 787 00:41:35,290 --> 00:41:38,580 occurs kind of randomly throughout the genome, 788 00:41:38,580 --> 00:41:43,740 which transcription factor x or y 789 00:41:43,740 --> 00:41:48,282 is more likely to have a target that's duplicated? 790 00:41:48,282 --> 00:41:49,250 AUDIENCE: x. 791 00:41:49,250 --> 00:41:51,150 PROFESSOR: x, all right. 792 00:41:51,150 --> 00:41:53,215 Interestingly, how does that scale 793 00:41:53,215 --> 00:41:54,340 with the number of targets? 794 00:41:57,258 --> 00:41:58,150 AUDIENCE: Linear? 795 00:41:58,150 --> 00:42:00,540 PROFESSOR: This actually is linear, right? 796 00:42:00,540 --> 00:42:03,930 So I'd say that gene duplication does 797 00:42:03,930 --> 00:42:07,970 give growth and preferential attachment that 798 00:42:07,970 --> 00:42:11,680 is basically linear with the number of targets. 799 00:42:11,680 --> 00:42:16,930 It's interesting I'd say I find this kind of observation 800 00:42:16,930 --> 00:42:18,860 quite interesting, and compelling, 801 00:42:18,860 --> 00:42:22,080 and makes me feel kind of comfortable about this 802 00:42:22,080 --> 00:42:25,540 as a mechanism for some of the global properties.
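The duplication argument can be checked with a small simulation. This is a minimal sketch under stated assumptions: every gene is equally likely to be duplicated, the copy keeps its promoter so its regulator gains one target, x has 100 targets and y has 2 as in the example above, and the genome size of 4000 genes and the function name are illustrative choices not given in the lecture.

```python
import random

def count_target_duplications(targets_x, targets_y, genome_size, events, seed=0):
    """Count how often a randomly duplicated gene is a target of x or of y."""
    rng = random.Random(seed)
    hits_x = hits_y = 0
    for _ in range(events):
        gene = rng.randrange(genome_size)   # every gene equally likely to duplicate
        if gene in targets_x:
            hits_x += 1                     # x would gain one outgoing edge
        elif gene in targets_y:
            hits_y += 1                     # y would gain one outgoing edge
    return hits_x, hits_y

targets_x = set(range(100))   # x regulates 100 genes
targets_y = {100, 101}        # y regulates 2 genes
hits_x, hits_y = count_target_duplications(targets_x, targets_y, 4000, 200_000)
print(hits_x, hits_y, hits_x / hits_y)
```

The expected ratio of gains is 100 to 2, the same as the ratio of existing targets, which is what linear preferential attachment means here.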
803 00:42:25,540 --> 00:42:27,750 I mean there's no selection, there's 804 00:42:27,750 --> 00:42:30,010 no way to explain the interesting network motifs 805 00:42:30,010 --> 00:42:31,468 and so forth here, but I'd say just 806 00:42:31,468 --> 00:42:33,210 in terms of some general properties I 807 00:42:33,210 --> 00:42:34,640 think it's interesting. 808 00:42:34,640 --> 00:42:37,390 Of course, once again not a proof. 809 00:42:37,390 --> 00:42:40,100 Evolution can do whatever it wants with these gene 810 00:42:40,100 --> 00:42:44,360 duplication events, but also I would say not everybody 811 00:42:44,360 --> 00:42:47,140 finds this argument very, very compelling. 812 00:42:47,140 --> 00:42:49,540 But I'd say I think it's kind of-- I 813 00:42:49,540 --> 00:42:51,844 get a warm fuzzy feeling inside. 814 00:42:51,844 --> 00:42:54,229 AUDIENCE: We're talking about transcription network, 815 00:42:54,229 --> 00:42:56,770 it's different from the other networks you were talking about 816 00:42:56,770 --> 00:43:00,760 in that you also lose genes, and so is there any discussion-- 817 00:43:00,760 --> 00:43:06,647 PROFESSOR: Well you know, you could lose web pages, you can-- 818 00:43:06,647 --> 00:43:08,272 AUDIENCE: Are you losing them nearly as 819 00:43:08,272 --> 00:43:13,090 fast as you're adding them? 820 00:43:13,090 --> 00:43:15,510 PROFESSOR: Yeah, I don't know. 821 00:43:15,510 --> 00:43:20,190 I find that lots of links to my web pages just 822 00:43:20,190 --> 00:43:26,090 disappear over time, and I-- It's a reasonable question. 823 00:43:26,090 --> 00:43:31,450 I don't-- In some of these you say, oh well right, 824 00:43:31,450 --> 00:43:33,970 so with the web has been growing a lot recently, 825 00:43:33,970 --> 00:43:37,370 and so then we'd say the birth dominates over death there. 
826 00:43:37,370 --> 00:43:40,830 Where if you talk about genome sizes along different lineages, 827 00:43:40,830 --> 00:43:42,686 it certainly is not growing exponentially 828 00:43:42,686 --> 00:43:43,560 the way the web pag-- 829 00:43:43,560 --> 00:43:46,340 I think that that's fair and true, 830 00:43:46,340 --> 00:43:49,410 but we haven't really actually specified or made clear, 831 00:43:49,410 --> 00:43:53,400 within a model what happens if you allow for birth and death. 832 00:43:53,400 --> 00:43:55,830 But I think that you could introduce death 833 00:43:55,830 --> 00:43:57,590 and recapitulate these behaviors, so it's 834 00:43:57,590 --> 00:44:01,030 not-- I think just because some nodes disappear, 835 00:44:01,030 --> 00:44:02,640 doesn't mean that we have to throw 836 00:44:02,640 --> 00:44:03,962 the whole idea out the window. 837 00:44:07,600 --> 00:44:11,040 But in the presence of evolution this 838 00:44:11,040 --> 00:44:12,880 is all very complicated, right? 839 00:44:12,880 --> 00:44:15,525 So you can't carry this argument too far. 840 00:44:19,700 --> 00:44:20,950 AUDIENCE: So it's [INAUDIBLE]. 841 00:44:26,850 --> 00:44:28,570 PROFESSOR: Well what we're assuming 842 00:44:28,570 --> 00:44:33,122 is that there is some segment of DNA that's in front of the gene 843 00:44:33,122 --> 00:44:34,580 that specifies-- gives instructions 844 00:44:34,580 --> 00:44:40,050 of when to transcribe the gene. 845 00:44:40,050 --> 00:44:43,410 So the linearity is really just assuming 846 00:44:43,410 --> 00:44:48,782 that genes have the same rate of being duplicated on average. 847 00:44:48,782 --> 00:44:50,240 And this is a very global property, 848 00:44:50,240 --> 00:44:55,982 so I think that it's kind of roughly-- 849 00:44:55,982 --> 00:44:58,190 I would say it's the null model that you would use, 850 00:44:58,190 --> 00:44:59,737 if you had to write a null model.
851 00:45:05,735 --> 00:45:07,860 AUDIENCE: Is there anything in looking for evidence 852 00:45:07,860 --> 00:45:09,870 to support [INAUDIBLE]. 853 00:45:09,870 --> 00:45:12,910 PROFESSOR: That's an interesting question. 854 00:45:12,910 --> 00:45:14,840 It's hard to know what it would even 855 00:45:14,840 --> 00:45:17,710 mean to collect the evidence to support it in the sense 856 00:45:17,710 --> 00:45:21,260 that-- You're saying along different evolutionary 857 00:45:21,260 --> 00:45:27,640 lineages, could we say that it's more likely to grow. 858 00:45:27,640 --> 00:45:31,360 Of course the other thing to say is that, the rate of death 859 00:45:31,360 --> 00:45:33,210 would also scale linearly. 860 00:45:33,210 --> 00:45:37,739 In the sense that a gene being stochastically removed 861 00:45:37,739 --> 00:45:39,530 from the genome should also scale linearly, 862 00:45:39,530 --> 00:45:41,570 so it's not that you don't actually 863 00:45:41,570 --> 00:45:45,110 then expect there to be any systematic change. 864 00:45:45,110 --> 00:45:47,010 I mean it's not as simple as just saying, oh 865 00:45:47,010 --> 00:45:50,990 the number of targets of a transcription factor 866 00:45:50,990 --> 00:45:53,002 with many targets should grow faster. 867 00:45:53,002 --> 00:45:54,460 It's really that the expectation is 868 00:45:54,460 --> 00:45:59,230 that it should be changing faster because both duplication 869 00:45:59,230 --> 00:46:01,020 and removal would both be increasing. 870 00:46:01,020 --> 00:46:04,370 So I think the signature is not totally obvious in that sense. 871 00:46:10,966 --> 00:46:12,340 So how many people actually tried 872 00:46:12,340 --> 00:46:16,710 to piece this derivation apart? 873 00:46:16,710 --> 00:46:18,570 Anybody? 874 00:46:18,570 --> 00:46:22,790 All right, and were you happy with it at the end of your-- 875 00:46:22,790 --> 00:46:23,790 AUDIENCE: I think that-- 876 00:46:23,790 --> 00:46:25,120 PROFESSOR: --permissions? 
877 00:46:25,120 --> 00:46:28,380 AUDIENCE: --that I was a little bit iffy about. 878 00:46:28,380 --> 00:46:32,030 PROFESSOR: There is like a crux of the climb at the end. 879 00:46:32,030 --> 00:46:36,370 So let's make sure that we can understand what happened there. 880 00:46:36,370 --> 00:46:39,240 It's worth-- since we read the paper it's worth 881 00:46:39,240 --> 00:46:40,240 trying to figure it out. 882 00:46:44,000 --> 00:46:50,720 So what we're going to assume is that we start with m0 nodes. 883 00:46:54,080 --> 00:46:56,700 So they're going to be here, and the idea 884 00:46:56,700 --> 00:46:59,010 is it doesn't really matter how we start this thing. 885 00:46:59,010 --> 00:47:01,480 They might start out being unconnected, 886 00:47:01,480 --> 00:47:03,180 or they might be connected. 887 00:47:03,180 --> 00:47:06,910 But over time the signature of how we start 888 00:47:06,910 --> 00:47:09,892 is not supposed to be that important. 889 00:47:09,892 --> 00:47:12,050 What we're going to do is at each time point we're 890 00:47:12,050 --> 00:47:14,830 going to add one more node. 891 00:47:14,830 --> 00:47:20,500 And as we do that we're going to add m edges as well. 892 00:47:20,500 --> 00:47:25,420 So we then have the number of, we'll say, nodes, 893 00:47:25,420 --> 00:47:30,860 N, as a function of time, is going to be equal to what? 894 00:47:35,510 --> 00:47:37,124 [INTERPOSING VOICES] 895 00:47:37,124 --> 00:47:37,790 PROFESSOR: Right. 896 00:47:37,790 --> 00:47:40,120 This is just going to be-- we're going to start at m0 897 00:47:40,120 --> 00:47:44,530 and we're going to add 1 each time, m0 plus t. 898 00:47:44,530 --> 00:47:49,000 Number of edges is just going to be equal to the number 899 00:47:49,000 --> 00:47:54,190 that we add each time point, times the time. 900 00:47:54,190 --> 00:47:56,142 So here we're assuming that we start out 901 00:47:56,142 --> 00:47:57,600 with these nodes being unconnected.
902 00:48:00,830 --> 00:48:04,540 Now we're given the assumption that there's 903 00:48:04,540 --> 00:48:06,840 preferential attachment, so that means 904 00:48:06,840 --> 00:48:11,150 that the probability of connecting 905 00:48:11,150 --> 00:48:15,430 to some i-th node that has k edges 906 00:48:15,430 --> 00:48:19,650 is going to be k sub i divided 907 00:48:19,650 --> 00:48:28,350 by the sum over all the edges. 908 00:48:30,856 --> 00:48:31,355 Yes? 909 00:48:31,355 --> 00:48:33,765 AUDIENCE: Why is [INAUDIBLE]? 910 00:48:33,765 --> 00:48:35,390 PROFESSOR: All right, so the assumption 911 00:48:35,390 --> 00:48:41,010 is at each time point we add a new node, let's say this node, 912 00:48:41,010 --> 00:48:45,440 and with that we bring in some number, m, of new edges. 913 00:48:45,440 --> 00:48:48,780 So this could be 3, and then we go randomly 914 00:48:48,780 --> 00:48:52,710 to 3 of the existing nodes. 915 00:48:52,710 --> 00:48:56,083 So each time point we add m edges. 916 00:48:56,083 --> 00:48:59,464 AUDIENCE: Do we necessarily add them to the new node? 917 00:48:59,464 --> 00:49:00,430 Like [INAUDIBLE]. 918 00:49:07,741 --> 00:49:09,990 PROFESSOR: I'm sorry I don't understa-- oh yeah right, 919 00:49:09,990 --> 00:49:14,150 so the assumption is that the new node is indeed 920 00:49:14,150 --> 00:49:18,300 being connected to-- that all m edges that we're adding 921 00:49:18,300 --> 00:49:19,470 are to this new node. 922 00:49:23,570 --> 00:49:25,576 So this is the linear preferential attachment 923 00:49:25,576 --> 00:49:26,700 that we were talking about. 924 00:49:42,510 --> 00:49:44,120 So what we want to know first, is 925 00:49:44,120 --> 00:49:49,240 how after a node is connected, how is it that the number of edges 926 00:49:49,240 --> 00:49:52,010 will grow over time. 927 00:49:52,010 --> 00:49:55,400 What we know is that when it's first 928 00:49:55,400 --> 00:50:00,000 added it has exactly m edges, right?
929 00:50:00,000 --> 00:50:02,797 But then as new nodes come, then we'll maybe get some more 930 00:50:02,797 --> 00:50:03,630 and then it'll grow. 931 00:50:07,400 --> 00:50:12,090 And in particular we want to get-- 932 00:50:12,090 --> 00:50:15,860 We're told that it's going to grow as this differential 933 00:50:15,860 --> 00:50:24,400 equation, so we want to kind of get to this. 934 00:50:24,400 --> 00:50:26,050 And the way to think about this is 935 00:50:26,050 --> 00:50:29,200 that, all right well, how is it that the number of edges 936 00:50:29,200 --> 00:50:33,580 will change at each time point, so delta k i. 937 00:50:33,580 --> 00:50:38,300 Well the expected number of edges 938 00:50:38,300 --> 00:50:41,680 that will be attached to some node, well that's 939 00:50:41,680 --> 00:50:46,870 going to be m, this is the number 940 00:50:46,870 --> 00:50:50,170 of edges that were attached by this incoming node, 941 00:50:50,170 --> 00:50:56,350 times this probability of attaching to this node. 942 00:50:56,350 --> 00:51:00,350 So this is the probability of k i. 943 00:51:00,350 --> 00:51:03,680 Now this is in one time step. 944 00:51:03,680 --> 00:51:04,970 So this is really a delta k i. 945 00:51:04,970 --> 00:51:07,261 If we want, we could say over some delta t, which is 1. 946 00:51:09,980 --> 00:51:12,462 So from that standpoint, we can actually then write it 947 00:51:12,462 --> 00:51:13,920 as differential equation, where you 948 00:51:13,920 --> 00:51:17,320 say the change in this number of edges with respect to time 949 00:51:17,320 --> 00:51:21,130 is indeed going to be equal to m times 950 00:51:21,130 --> 00:51:27,870 this guy here, which is the number of edges 951 00:51:27,870 --> 00:51:34,410 that that node has at this time, divided by this sum over all 952 00:51:34,410 --> 00:51:34,910 those edges. 
953 00:51:40,240 --> 00:51:44,340 This is just kind of the expected number of edges 954 00:51:44,340 --> 00:51:46,403 to be added to that node at each time point. 955 00:51:49,360 --> 00:51:52,516 What does this thing-- What does that thing equal to? 956 00:51:52,516 --> 00:51:54,488 Yes? 957 00:51:54,488 --> 00:51:56,131 AUDIENCE: --that equation, because it 958 00:51:56,131 --> 00:51:58,925 seemed like you just wrote the same equation on the line 959 00:51:58,925 --> 00:52:00,404 above that line. 960 00:52:00,404 --> 00:52:03,855 You just substituted it-- 961 00:52:03,855 --> 00:52:04,841 PROFESSOR: I did. 962 00:52:04,841 --> 00:52:12,314 AUDIENCE: OK, but [INAUDIBLE] wrote it as [INAUDIBLE]. 963 00:52:14,917 --> 00:52:17,250 PROFESSOR: Yeah, so this is kind of the discrete version 964 00:52:17,250 --> 00:52:19,269 of this differential equation. 965 00:52:19,269 --> 00:52:19,810 AUDIENCE: Oh. 966 00:52:19,810 --> 00:52:21,200 PROFESSOR: Right. 967 00:52:21,200 --> 00:52:24,120 Yeah that's right, that's right. 968 00:52:24,120 --> 00:52:28,120 And of course the beginning could be highly stochastic 969 00:52:28,120 --> 00:52:29,950 but we're just thinking about in the limit 970 00:52:29,950 --> 00:52:32,010 of if it's deterministic. 971 00:52:35,330 --> 00:52:40,230 What is this thing in terms of-- from here 972 00:52:40,230 --> 00:52:42,530 this is just a normalization constant, right? 973 00:52:42,530 --> 00:52:46,076 Because each edge has to be attached somewhere, 974 00:52:46,076 --> 00:52:47,700 we're assuming it's linear with respect 975 00:52:47,700 --> 00:52:49,744 to the number of edges at each node, right? 976 00:52:49,744 --> 00:52:51,410 And that means that for normalization we 977 00:52:51,410 --> 00:52:56,910 have to divide by the sum over all those edges, the edges 978 00:52:56,910 --> 00:52:59,490 that each of the nodes might have. 
979 00:52:59,490 --> 00:53:02,580 What is this thing equal to in terms of something else 980 00:53:02,580 --> 00:53:04,540 that we might have on the board? 981 00:53:04,540 --> 00:53:05,040 Yeah? 982 00:53:05,040 --> 00:53:09,062 AUDIENCE: These have edges with respect to [INAUDIBLE]. 983 00:53:09,062 --> 00:53:09,770 PROFESSOR: Right. 984 00:53:09,770 --> 00:53:12,920 So I guess the question is this, can we write this? 985 00:53:17,580 --> 00:53:18,920 Where E is a function of time? 986 00:53:23,270 --> 00:53:25,940 Is that correct? 987 00:53:25,940 --> 00:53:27,240 So we're getting some shakes. 988 00:53:27,240 --> 00:53:28,762 AUDIENCE: Isn't it 2E? 989 00:53:28,762 --> 00:53:29,470 PROFESSOR: Right. 990 00:53:29,470 --> 00:53:30,310 So it's actually 2E. 991 00:53:32,990 --> 00:53:34,860 Because what you notice here is that this 992 00:53:34,860 --> 00:53:38,400 is the sum over all of the edges that each of the nodes have. 993 00:53:38,400 --> 00:53:40,810 But each edge is connecting 2 nodes. 994 00:53:40,810 --> 00:53:43,620 So the sum over all these edge distributions 995 00:53:43,620 --> 00:53:46,330 is twice the number of edges. 996 00:53:46,330 --> 00:53:49,540 Now I would say as a physicist, working in biology, 997 00:53:49,540 --> 00:53:51,970 my general attitude is that a factor of 2 here, 998 00:53:51,970 --> 00:53:55,630 factor of 2 there, doesn't really matter. 999 00:53:55,630 --> 00:53:57,640 But this factor of 2 actually is relevant 1000 00:53:57,640 --> 00:54:01,140 because it ends up determining the scaling over time. 1001 00:54:01,140 --> 00:54:04,420 So not all factors of 2 are created equal, 1002 00:54:04,420 --> 00:54:08,930 and this is one that is worth paying attention to. 1003 00:54:08,930 --> 00:54:10,729 Does everyone here understand why this 1004 00:54:10,729 --> 00:54:12,020 is 2 times the number of edges? 
1005 00:54:17,370 --> 00:54:21,247 k1 is equal to 1, k2 is equal to 1, number of edges 1006 00:54:21,247 --> 00:54:21,830 is equal to 1. 1007 00:54:25,750 --> 00:54:27,019 Yeah. 1008 00:54:27,019 --> 00:54:30,135 AUDIENCE: So that means we're in an undirected network, 1009 00:54:30,135 --> 00:54:31,510 if we were in a directed network, 1010 00:54:31,510 --> 00:54:34,695 then we would not have that factor of 2. 1011 00:54:34,695 --> 00:54:35,320 PROFESSOR: Yes. 1012 00:54:35,320 --> 00:54:37,065 So we are indeed in an undirected, 1013 00:54:37,065 --> 00:54:38,440 and I'd say in a directed network 1014 00:54:38,440 --> 00:54:40,481 you have to then be more careful about what you-- 1015 00:54:40,481 --> 00:54:43,430 you have to specify the k's in and k's out. 1016 00:54:43,430 --> 00:54:44,970 So actually, already just by writing 1017 00:54:44,970 --> 00:54:46,300 this we've already assumed it's undirected, 1018 00:54:46,300 --> 00:54:48,410 because we haven't specified what we mean by k. 1019 00:54:54,620 --> 00:54:56,820 We're here, but very conveniently we 1020 00:54:56,820 --> 00:54:58,640 already know how many edges there 1021 00:54:58,640 --> 00:55:00,680 are as a function of time. 1022 00:55:00,680 --> 00:55:02,680 This is just equal to m times t. 1023 00:55:02,680 --> 00:55:07,740 So we get something that's very convenient ki divided by 2 t. 1024 00:55:12,810 --> 00:55:16,220 From here we can solve the differential equation. 1025 00:55:16,220 --> 00:55:17,890 This is what we want to show. 1026 00:55:20,850 --> 00:55:22,310 The fact that we're doing partials 1027 00:55:22,310 --> 00:55:24,870 doesn't really matter, because it's just time here. 1028 00:55:24,870 --> 00:55:28,780 So it's really-- so we have d ki over ki, 1029 00:55:28,780 --> 00:55:32,220 is equal to dt over 2t. 
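For reference, the steps up to this point can be written out compactly. This is a sketch in the lecture's notation; the symbol $\Pi$ for the attachment probability is an assumption introduced here for convenience and does not appear in the transcript.

```latex
\begin{align}
\Pi(k_i) &= \frac{k_i}{\sum_j k_j},
  \qquad \sum_j k_j = 2E(t) = 2mt, \\
\frac{\partial k_i}{\partial t} &= m\,\Pi(k_i)
  = \frac{m\,k_i}{2mt} = \frac{k_i}{2t}, \\
\frac{dk_i}{k_i} &= \frac{dt}{2t}.
\end{align}
```

The factor of 2 in the denominator comes from each undirected edge being counted once at each of its two endpoints.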
1030 00:55:35,620 --> 00:55:37,280 This 2, really again, is going to make 1031 00:55:37,280 --> 00:55:40,450 a difference, because when we go and we integrate, 1032 00:55:40,450 --> 00:55:44,370 we get the logs and so forth. 1033 00:55:44,370 --> 00:55:46,620 And so we get that ki as a function of time 1034 00:55:46,620 --> 00:55:50,130 is going to grow with time, with some constant c, 1035 00:55:50,130 --> 00:55:53,024 proportional to the square root of time. 1036 00:55:53,024 --> 00:55:55,690 So if we didn't have the half it would just be linear with time. 1037 00:55:59,090 --> 00:56:01,800 Now how do we know what c-- in general 1038 00:56:01,800 --> 00:56:05,640 how do we get constants of integration in life? 1039 00:56:05,640 --> 00:56:06,890 AUDIENCE: Boundary conditions. 1040 00:56:06,890 --> 00:56:07,980 PROFESSOR: Yeah, boundary conditions, 1041 00:56:07,980 --> 00:56:09,270 in this case, the initial condition. 1042 00:56:09,270 --> 00:56:10,436 And what is it that we know? 1043 00:56:14,424 --> 00:56:15,952 AUDIENCE: ki. 1044 00:56:15,952 --> 00:56:16,660 PROFESSOR: Right. 1045 00:56:16,660 --> 00:56:21,400 So what we know is that ki, so this i-th node, 1046 00:56:21,400 --> 00:56:24,570 when it's added at time ti, it should be equal to what? 1047 00:56:27,366 --> 00:56:28,314 AUDIENCE: m. 1048 00:56:28,314 --> 00:56:28,980 PROFESSOR: Yeah. 1049 00:56:28,980 --> 00:56:29,646 It's equal to m. 1050 00:56:29,646 --> 00:56:31,470 So when it's first added, at some time ti, 1051 00:56:31,470 --> 00:56:33,910 its number of edges is equal to m. 1052 00:56:33,910 --> 00:56:36,670 Because that's what we've assumed, is that we add a node 1053 00:56:36,670 --> 00:56:38,920 and we connect it randomly to other things, 1054 00:56:38,920 --> 00:56:41,820 so it has m edges initially. 1055 00:56:41,820 --> 00:56:46,560 So from this k of t, this is then equal to m 1056 00:56:46,560 --> 00:56:51,620 times the square root of t divided by t sub i.
1057 00:56:51,620 --> 00:56:55,520 Where ti is the time that i-th node was added to the network. 1058 00:56:58,110 --> 00:57:00,068 Are there any questions about how we got there? 1059 00:57:07,580 --> 00:57:10,040 So I think that this is relatively straightforward. 1060 00:57:10,040 --> 00:57:14,000 The part that gets confusing is this later part 1061 00:57:14,000 --> 00:57:17,630 about the probabilities and keeping everything straight. 1062 00:57:17,630 --> 00:57:20,709 And so what Barabasi did next, is 1063 00:57:20,709 --> 00:57:22,750 he said, all right, well, what we're going to do, 1064 00:57:22,750 --> 00:57:26,620 is we're going to talk about the probability, P. 1065 00:57:26,620 --> 00:57:31,670 Now this is an actual honest to goodness probability. 1066 00:57:31,670 --> 00:57:34,026 The big P is actually a probability, 1067 00:57:34,026 --> 00:57:35,650 and that's as compared to a probability 1068 00:57:35,650 --> 00:57:40,310 distribution, little p. 1069 00:57:40,310 --> 00:57:44,239 And I'll put in a little curly here thing, so it's a little p. 1070 00:57:44,239 --> 00:57:46,780 This is saying if you want to get an actual probability here, 1071 00:57:46,780 --> 00:57:49,071 then you have to multiply that probability distribution 1072 00:57:49,071 --> 00:57:51,890 times some range delta k. 1073 00:57:51,890 --> 00:57:53,960 If you want to know that the probability 1074 00:57:53,960 --> 00:57:58,310 that some node has between some number and some number 1075 00:57:58,310 --> 00:58:02,220 of edges, then you multiply it by that range. 1076 00:58:02,220 --> 00:58:02,720 Right? 1077 00:58:05,420 --> 00:58:09,070 Probability distribution, this is an actual probability. 1078 00:58:09,070 --> 00:58:11,310 And as befits an actual probability, 1079 00:58:11,310 --> 00:58:16,230 we're going to say, OK the probability that the i-th node 1080 00:58:16,230 --> 00:58:22,300 has k edges, that are less than some value k. 
1081 00:58:22,300 --> 00:58:24,640 And remember this thing is actually a function of time. 1082 00:58:29,260 --> 00:58:31,590 But we have an expression for ki as a function of time, 1083 00:58:31,590 --> 00:58:32,381 it's equal to this. 1084 00:58:34,750 --> 00:58:38,480 So we can solve when we show that this probability is also 1085 00:58:38,480 --> 00:58:40,960 the same as this other probability. 1086 00:58:40,960 --> 00:58:45,400 That the i-th node was added after some time 1087 00:58:45,400 --> 00:58:46,940 t that can be written as this. 1088 00:58:52,710 --> 00:58:55,050 So this is saying, the probability 1089 00:58:55,050 --> 00:59:00,310 that some random, say i-th node, has fewer than k edges, 1090 00:59:00,310 --> 00:59:03,990 is the same as saying it's the probability 1091 00:59:03,990 --> 00:59:06,360 that the i-th node was added after some time, t, 1092 00:59:06,360 --> 00:59:10,430 which is this thing. 1093 00:59:10,430 --> 00:59:12,450 Because the number of edges will grow over 1094 00:59:12,450 --> 00:59:14,610 time for each of these nodes. 1095 00:59:20,050 --> 00:59:22,230 Do you understand that kind of conceptual statement 1096 00:59:22,230 --> 00:59:23,426 that was made there? 1097 00:59:26,540 --> 00:59:27,040 Yes? 1098 00:59:27,040 --> 00:59:27,710 Any questions? 1099 00:59:32,220 --> 00:59:35,270 All right, so the probability that this i-th node was added 1100 00:59:35,270 --> 00:59:41,082 after this time, is also of course 1 minus the probability 1101 00:59:41,082 --> 00:59:42,540 that it was added before that time. 1102 00:59:55,050 --> 00:59:58,350 Whereas time, little t here, this is at the time 1103 00:59:58,350 --> 01:00:00,310 that you're actually looking. 
1104 01:00:00,310 --> 01:00:03,002 So this is saying, oh well, if little t is 100, for example, 1105 01:00:03,002 --> 01:00:04,710 it's saying all right, at that time point 1106 01:00:04,710 --> 01:00:07,240 after I got 100 nodes, we want to say, all right, what's 1107 01:00:07,240 --> 01:00:09,500 the probability that some random i-th node was 1108 01:00:09,500 --> 01:00:11,420 added before this quantity. 1109 01:00:11,420 --> 01:00:13,430 And this is just again some other kind of time, 1110 01:00:13,430 --> 01:00:14,855 if you'd like. 1111 01:00:19,810 --> 01:00:22,710 I think this is the part that is especially kind of weird. 1112 01:00:22,710 --> 01:00:24,210 So this is also equal to this thing. 1113 01:00:26,720 --> 01:00:28,260 And I think reasonable people can 1114 01:00:28,260 --> 01:00:31,770 argue about exactly what you should write here, 1115 01:00:31,770 --> 01:00:36,610 but let's figure out the basic argument first. 1116 01:00:36,610 --> 01:00:40,730 So this probability is equal to this thing. 1117 01:00:40,730 --> 01:00:49,020 So this statement is really that at some time t we have how many 1118 01:00:49,020 --> 01:00:49,520 nodes? 1119 01:00:49,520 --> 01:00:55,490 We have m0 plus t nodes, right? 1120 01:00:55,490 --> 01:00:58,650 So this is something here. 1121 01:00:58,650 --> 01:01:02,170 And of course there are edges going around doing things. 1122 01:01:02,170 --> 01:01:04,400 And what we want to know is, what's the probability 1123 01:01:04,400 --> 01:01:06,500 if I grab one of them, we're going 1124 01:01:06,500 --> 01:01:08,610 to call that the i-th node. 1125 01:01:08,610 --> 01:01:11,050 What's the probability if I grab one of them 1126 01:01:11,050 --> 01:01:17,310 that it was added before some time here. 1127 01:01:17,310 --> 01:01:20,290 And it's useful to just imagine this as just being some time 1128 01:01:20,290 --> 01:01:28,350 T, just so that we don't get confused by all the symbols.
1129 01:01:28,350 --> 01:01:31,080 You say, oh well, that probability 1130 01:01:31,080 --> 01:01:35,167 is really just the probability-- well how many nodes total do 1131 01:01:35,167 --> 01:01:36,840 we have here, m0 plus t. 1132 01:01:36,840 --> 01:01:39,820 How many nodes were there that were added before this time T? 1133 01:01:39,820 --> 01:01:45,334 Well that's going to be T, you might want to say T plus m0. 1134 01:01:45,334 --> 01:01:47,750 There's a question of whether you include those nodes that 1135 01:01:47,750 --> 01:01:50,100 started there or not. 1136 01:01:50,100 --> 01:01:53,650 Given the equations that Barabasi wrote down, 1137 01:01:53,650 --> 01:01:55,137 he kind of assumes that we're only 1138 01:01:55,137 --> 01:01:56,845 counting the nodes that were added later. 1139 01:01:59,480 --> 01:02:02,180 So I'd say if you want, you could either 1140 01:02:02,180 --> 01:02:04,350 add an m0 up there, or get rid of this m0, 1141 01:02:04,350 --> 01:02:06,300 depending on what you like. 1142 01:02:06,300 --> 01:02:10,270 But broadly there's this idea that we have this many nodes, 1143 01:02:10,270 --> 01:02:14,262 and this many of them were added before some time T. 1144 01:02:14,262 --> 01:02:16,470 And that's how we get this: m squared t over k squared 1145 01:02:16,470 --> 01:02:20,040 was just that time T divided by the total number of nodes. 1146 01:02:23,199 --> 01:02:24,990 And this whole discussion about whether you 1147 01:02:24,990 --> 01:02:26,479 count the initial m0 nodes or not, 1148 01:02:26,479 --> 01:02:28,020 it doesn't matter because we're going 1149 01:02:28,020 --> 01:02:29,590 to take the limit as t goes to infinity, 1150 01:02:29,590 --> 01:02:30,548 and that all goes away. 1151 01:02:33,709 --> 01:02:35,000 Are there questions about this?
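Written out, the chain of equalities in this part of the argument looks roughly as follows. This is a sketch in the lecture's notation, assuming the addition times $t_i$ are spread uniformly over the nodes and using the convention described above, where the denominator counts all $m_0 + t$ nodes while the numerator counts only nodes added after the start.

```latex
\begin{align}
k_i(t) &= m\sqrt{\frac{t}{t_i}}, \\
P\bigl(k_i(t) < k\bigr)
  &= P\!\left(t_i > \frac{m^2 t}{k^2}\right)
   = 1 - P\!\left(t_i \le \frac{m^2 t}{k^2}\right), \\
P\!\left(t_i \le \frac{m^2 t}{k^2}\right)
  &= \frac{m^2 t}{k^2\,(t + m_0)}
  \quad\Longrightarrow\quad
  P\bigl(k_i(t) < k\bigr) = 1 - \frac{m^2 t}{k^2\,(t + m_0)}.
\end{align}
```

The quantity $m^2 t / k^2$ plays the role of the "other time" in the discussion: it is the addition time at which a node would have exactly $k$ edges when observed at time $t$.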
1152 01:02:37,690 --> 01:02:44,360 There is something kind of mind twisting about this argument, 1153 01:02:44,360 --> 01:02:49,890 even though we're really just picking big T objects out 1154 01:02:49,890 --> 01:02:53,357 of essentially little t objects, but somehow something funny 1155 01:02:53,357 --> 01:02:53,940 goes on there. 1156 01:02:58,310 --> 01:03:02,347 Any questions about that? 1157 01:03:02,347 --> 01:03:05,700 AUDIENCE: Could you just go through the argument one more 1158 01:03:05,700 --> 01:03:06,200 time? 1159 01:03:06,200 --> 01:03:09,200 PROFESSOR: Yeah, sure, sure Right so 1160 01:03:09,200 --> 01:03:13,330 I think that what's confusing about it is the fact that we're 1161 01:03:13,330 --> 01:03:18,970 asking whether the i-th node was added before some time t. 1162 01:03:18,970 --> 01:03:22,890 And this time t is equal to something that's funny 1163 01:03:22,890 --> 01:03:25,100 based on what we've just done. 1164 01:03:25,100 --> 01:03:32,560 But it's useful to just ask, if at time little t 1165 01:03:32,560 --> 01:03:35,000 you look at this network and I ask 1166 01:03:35,000 --> 01:03:37,760 you, all right, was it added before this time, 1167 01:03:37,760 --> 01:03:40,130 big T. Let's just for concreteness 1168 01:03:40,130 --> 01:03:45,480 say m0 is equal to-- we start with 10 nodes. 1169 01:03:45,480 --> 01:03:50,710 And we say, OK, at time t equal to 100, I ask you, 1170 01:03:50,710 --> 01:03:53,880 what's the probability that if I grab a random node, what's 1171 01:03:53,880 --> 01:03:56,050 the probability it was added before some time 1172 01:03:56,050 --> 01:03:57,555 big T equal 10. 1173 01:04:02,010 --> 01:04:07,800 Well you would say, very roughly actually. 1174 01:04:07,800 --> 01:04:10,520 We can say let's actually, we can even if you'd like, 1175 01:04:10,520 --> 01:04:13,380 say we're not going to count-- we're not going to count 1176 01:04:13,380 --> 01:04:14,530 those m0 initial nodes. 
1177 01:04:14,530 --> 01:04:16,613 So we're just going to be looking at nodes that we 1178 01:04:16,613 --> 01:04:17,790 added later, if you'd like. 1179 01:04:17,790 --> 01:04:20,810 And then you would say, all right well, at time t equal to 100, 1180 01:04:20,810 --> 01:04:23,320 we've added 100 nodes. 1181 01:04:23,320 --> 01:04:25,640 And I'm asking, if I grab one of the nodes, what's 1182 01:04:25,640 --> 01:04:30,360 the probability that the node I grab was added in the first 10 1183 01:04:30,360 --> 01:04:31,900 time steps. 1184 01:04:31,900 --> 01:04:34,400 Well you'd say, it's going to be 10%, 1185 01:04:34,400 --> 01:04:38,070 because there were 10 nodes that were added before time big T, 1186 01:04:38,070 --> 01:04:41,824 and we added 100, so it's really just this divided by this. 1187 01:04:41,824 --> 01:04:43,990 And with the question of whether you want to include 1188 01:04:43,990 --> 01:04:46,990 m0's or not. 1189 01:04:46,990 --> 01:04:53,210 So I think that that argument is surprisingly straightforward, 1190 01:04:53,210 --> 01:04:56,900 but what somehow gets really confusing is 1191 01:04:56,900 --> 01:05:01,600 that the time T we're referring to depends on the k's 1192 01:05:01,600 --> 01:05:02,960 and t's and so forth. 1193 01:05:02,960 --> 01:05:04,450 But that's a way of keeping track 1194 01:05:04,450 --> 01:05:07,600 of how are things scaling as a function of time. 1195 01:05:07,600 --> 01:05:09,540 But if you boil the argument down to this, 1196 01:05:09,540 --> 01:05:12,640 then it makes sense, but then of course 1197 01:05:12,640 --> 01:05:15,730 then you look back at this and you get confused again. 1198 01:05:15,730 --> 01:05:20,280 Which is how I feel every year when I prepare this lecture, 1199 01:05:20,280 --> 01:05:23,840 but I think it all does make sense if you-- 1200 01:05:27,317 --> 01:05:29,400 Any questions about this argument or that argument 1201 01:05:29,400 --> 01:05:31,980 or any part of it?
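The counting argument in the concrete example (10 initial nodes, look at time t = 100, ask about big T = 10) can be sketched in a few lines of Python. The function name `fraction_added_before` and the three convention labels are hypothetical, chosen here only to illustrate the point that the choice of whether to count the m0 initial nodes stops mattering at long times.

```python
def fraction_added_before(big_t, t, m0=0, convention="added_only"):
    """Chance that a uniformly chosen node was added before time big_t,
    when the network is inspected at time t.

    convention (hypothetical labels, not from the lecture):
      "added_only"      -> ignore the m0 initial nodes: big_t / t
      "include_initial" -> count them top and bottom: (big_t + m0) / (t + m0)
      "mixed"           -> count them only in the total: big_t / (t + m0)
    """
    if not 0 <= big_t <= t:
        raise ValueError("need 0 <= big_t <= t")
    if convention == "added_only":
        return big_t / t
    if convention == "include_initial":
        return (big_t + m0) / (t + m0)
    if convention == "mixed":
        return big_t / (t + m0)
    raise ValueError(f"unknown convention: {convention}")


# The lecture's example: 10 of the 100 added nodes arrived in the first 10 steps.
print(fraction_added_before(10, 100))

# With big_t / t held fixed, the three conventions agree more and more as t grows.
for t in (100, 10_000, 1_000_000):
    big_t = t // 10
    values = [
        fraction_added_before(big_t, t, m0=10, convention=c)
        for c in ("added_only", "include_initial", "mixed")
    ]
    print(t, values)
```

The "mixed" convention is the one that matches the expression with t in the numerator and t plus m0 in the denominator.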
1202 01:05:31,980 --> 01:05:32,742 Yes? 1203 01:05:32,742 --> 01:05:34,950 AUDIENCE: So the ti's are very important [INAUDIBLE]? 1204 01:05:38,245 --> 01:05:38,870 PROFESSOR: Yes. 1205 01:05:41,440 --> 01:05:48,639 So this is just saying that if I pick some random node, 1206 01:05:48,639 --> 01:05:49,930 we're calling it the i-th node. 1207 01:05:49,930 --> 01:05:52,362 I'm asking what's the probability that the time that 1208 01:05:52,362 --> 01:05:54,460 it was added was before something. 1209 01:05:54,460 --> 01:05:56,770 So this is not one of the variables, 1210 01:05:56,770 --> 01:05:58,760 and you'll see the ti doesn't appear down here. 1211 01:05:58,760 --> 01:06:01,870 Because this is just saying-- I'm asking you, 1212 01:06:01,870 --> 01:06:03,877 if I grab some random node, the i-th node. 1213 01:06:03,877 --> 01:06:05,460 I'm asking you, what's the probability 1214 01:06:05,460 --> 01:06:09,800 that it was added before some other time, which is all this. 1215 01:06:09,800 --> 01:06:14,290 And what you can see is that it's a function of the time 1216 01:06:14,290 --> 01:06:19,590 that we look, because if I go to longer times 1217 01:06:19,590 --> 01:06:23,880 you know then indeed this probability should go-- 1218 01:06:23,880 --> 01:06:26,478 What should it do? 1219 01:06:26,478 --> 01:06:27,394 AUDIENCE: [INAUDIBLE]. 1220 01:06:27,394 --> 01:06:31,975 PROFESSOR: OK, but it depends on k's as well, right? 1221 01:06:31,975 --> 01:06:33,466 What do I want to say? 1222 01:06:42,010 --> 01:06:47,300 Ultimately what we see here is that as time goes to infinity, 1223 01:06:47,300 --> 01:06:49,780 so after a long time, then we reach 1224 01:06:49,780 --> 01:06:51,910 this stationary distribution where 1225 01:06:51,910 --> 01:06:53,430 the base structure of the network 1226 01:06:53,430 --> 01:06:55,920 is not changing anymore. 1227 01:06:55,920 --> 01:06:58,510 And that's because there's a t in both the numerator 1228 01:06:58,510 --> 01:06:59,340 and denominator.
1229 01:06:59,340 --> 01:07:01,230 So then the only thing that is left 1230 01:07:01,230 --> 01:07:05,294 is this behavior as a function of k. 1231 01:07:05,294 --> 01:07:07,210 And this is really saying that the probability 1232 01:07:07,210 --> 01:07:11,420 that some node was added before some time, 1233 01:07:11,420 --> 01:07:14,980 is kind of the same as saying that, 1234 01:07:14,980 --> 01:07:17,942 well, that you have a lot of edges. 1235 01:07:17,942 --> 01:07:19,650 And that's how we got here to begin with, 1236 01:07:19,650 --> 01:07:22,190 because the nodes that were added early end up 1237 01:07:22,190 --> 01:07:23,600 with a lot of edges. 1238 01:07:23,600 --> 01:07:27,320 This is the so-called rich get richer phenomenon. 1239 01:07:27,320 --> 01:07:29,607 So if you're sitting on a manuscript, 1240 01:07:29,607 --> 01:07:31,440 and you're not submitting it for publication 1241 01:07:31,440 --> 01:07:34,060 you should get on it because the earlier 1242 01:07:34,060 --> 01:07:38,220 that it's published the more citations it's going to get. 1243 01:07:38,220 --> 01:07:41,080 But this is saying that the probability 1244 01:07:41,080 --> 01:07:46,540 that some random node has a small number of edges 1245 01:07:46,540 --> 01:07:49,170 is the same as that the probability 1246 01:07:49,170 --> 01:07:50,835 that the node was added late. 1247 01:07:53,285 --> 01:07:55,285 And that makes sense, because if it's added late 1248 01:07:55,285 --> 01:07:58,670 it doesn't have very many edges, hasn't had time to grow. 1249 01:07:58,670 --> 01:08:03,170 And then from those calculations you 1250 01:08:03,170 --> 01:08:04,730 get it at this degree distribution. 1251 01:08:08,649 --> 01:08:09,149 Yes? 1252 01:08:09,149 --> 01:08:13,077 AUDIENCE: So for this analytical [INAUDIBLE] 1253 01:08:13,077 --> 01:08:16,023 we're assuming the links could be [INAUDIBLE]. 1254 01:08:20,785 --> 01:08:21,410 PROFESSOR: Yes. 
1255 01:08:21,410 --> 01:08:24,660 So we're taking, in principle it's a discrete problem 1256 01:08:24,660 --> 01:08:28,550 and converting it into a differential equation. 1257 01:08:28,550 --> 01:08:30,710 And it's an interesting question of I 1258 01:08:30,710 --> 01:08:35,680 don't know how big of an error this ends up making, 1259 01:08:35,680 --> 01:08:39,950 and of course this expression doesn't actually 1260 01:08:39,950 --> 01:08:43,640 end up having integers. 1261 01:08:43,640 --> 01:08:46,180 But this is a way of making it so that the errors don't 1262 01:08:46,180 --> 01:08:48,090 grow or so, right? 1263 01:08:48,090 --> 01:08:51,350 I think that it basically works. 1264 01:08:51,350 --> 01:08:54,069 If you'd like you could actually do the simulation with all 1265 01:08:54,069 --> 01:08:55,430 the discrete-- I think that is actually 1266 01:08:55,430 --> 01:08:57,388 going to be the stochastic dynamics that end up 1267 01:08:57,388 --> 01:09:02,080 being more relevant than the integer kind of issue, 1268 01:09:02,080 --> 01:09:04,533 but I haven't actually looked into that though. 1269 01:09:09,270 --> 01:09:13,840 Any other questions about that so far? 1270 01:09:13,840 --> 01:09:14,367 Yes? 1271 01:09:14,367 --> 01:09:15,283 AUDIENCE: [INAUDIBLE]. 1272 01:09:19,279 --> 01:09:22,420 PROFESSOR: So there's no loss of edges, no loss of nodes, 1273 01:09:22,420 --> 01:09:23,520 strictly verboten. 1274 01:09:30,359 --> 01:09:31,970 I spent a lot of time trying to plan 1275 01:09:31,970 --> 01:09:37,148 an upcoming trip to Germany last night so German is on my mind. 1276 01:09:44,410 --> 01:09:47,870 So are we done yet incidentally? 1277 01:09:47,870 --> 01:09:48,910 Nearly right? 1278 01:09:48,910 --> 01:09:51,790 Because we have-- What we really wanted 1279 01:09:51,790 --> 01:09:56,130 is the degree distribution, not this probability. 
1280 01:09:56,130 --> 01:09:58,770 So we have to take a derivative still, 1281 01:09:58,770 --> 01:10:04,120 but as t goes to infinity, regardless 1282 01:10:04,120 --> 01:10:09,320 of how you treat the m0's, actually what we-- maybe we'll 1283 01:10:09,320 --> 01:10:10,630 take the derivative first. 1284 01:10:10,630 --> 01:10:14,730 So this probability density is going 1285 01:10:14,730 --> 01:10:17,500 to be the derivative with respect 1286 01:10:17,500 --> 01:10:22,030 to k of the actual probability here. 1287 01:10:27,660 --> 01:10:30,460 So we take a derivative: this 1, the derivative of 1288 01:10:30,460 --> 01:10:33,370 that, nothing happens; the k squared, 1289 01:10:33,370 --> 01:10:36,690 it's going to turn into a k cubed. 1290 01:10:36,690 --> 01:10:44,200 So we get 2m squared t over k cubed, 1291 01:10:44,200 --> 01:10:52,110 we still have the t plus m0, but when we let t go to infinity, 1292 01:10:52,110 --> 01:10:58,550 so after this thing has reached its stationary distribution, 1293 01:10:58,550 --> 01:11:02,370 then we end up just getting 2m squared over k cubed. 1294 01:11:05,030 --> 01:11:08,490 I just want to be clear this is to the k. 1295 01:11:08,490 --> 01:11:14,050 The key feature here is that the probability distribution 1296 01:11:14,050 --> 01:11:16,920 goes as 1 over k cubed. 1297 01:11:23,400 --> 01:11:26,430 What is interesting is that when I first read the paper 1298 01:11:26,430 --> 01:11:31,260 I actually thought that this exponent here 1299 01:11:31,260 --> 01:11:33,910 would be a function of the linearity 1300 01:11:33,910 --> 01:11:36,160 of the preferential attachment.
1301 01:11:36,160 --> 01:11:39,360 So I actually-- and of course they say that it's not true, 1302 01:11:39,360 --> 01:11:42,020 but when I was halfway through the paper I thought, oh well, 1303 01:11:42,020 --> 01:11:45,490 if you just let this go as some power to the beta, 1304 01:11:45,490 --> 01:11:47,900 or so, that you would maybe get something 1305 01:11:47,900 --> 01:11:51,580 like this was 2 plus beta-- I thought 1306 01:11:51,580 --> 01:11:53,990 something like that, but apparently it's not true. 1307 01:11:53,990 --> 01:11:58,040 That if you do not have linear attachment here 1308 01:11:58,040 --> 01:12:00,220 then you just don't get power law distributions. 1309 01:12:00,220 --> 01:12:02,053 They suggest other ways that you could maybe 1310 01:12:02,053 --> 01:12:04,860 get different exponents, which is very relevant given 1311 01:12:04,860 --> 01:12:07,260 the fact that different real networks indeed 1312 01:12:07,260 --> 01:12:09,850 have different exponents. 1313 01:12:09,850 --> 01:12:13,240 But I'd say that their proffered explanation, which 1314 01:12:13,240 --> 01:12:18,590 is to include directed edges, feels unsatisfying because not 1315 01:12:18,590 --> 01:12:21,240 all networks are directed. 1316 01:12:21,240 --> 01:12:24,530 And this network here is not directed, 1317 01:12:24,530 --> 01:12:26,950 it has an exponent closer to 2. 1318 01:12:26,950 --> 01:12:30,010 So you really want to have other mechanisms. 1319 01:12:30,010 --> 01:12:33,230 But this is, as we mentioned, a thriving field 1320 01:12:33,230 --> 01:12:35,474 and people have explored many different aspects 1321 01:12:35,474 --> 01:12:36,140 of this problem. 1322 01:12:43,400 --> 01:12:46,650 Are there any other questions about this derivation, how 1323 01:12:46,650 --> 01:12:49,562 we got there, how convincing maybe you 1324 01:12:49,562 --> 01:12:50,770 think it should be or not be?
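The differentiation step in the derivation above can be written out as a short sketch. The cumulative form on the first line is an assumption, reconstructed from the spoken description (a constant 1, a k squared in the denominator, and a factor of t over t plus m0); the notation on the board may have differed. Here m is the number of edges each new node brings, m0 is the initial number of nodes, and k_i(t) is the degree of node i at time t.

```latex
% Assumed cumulative probability (inferred from the spoken description)
P\bigl(k_i(t) < k\bigr) = 1 - \frac{m^2\, t}{k^2\,(t + m_0)}

% Differentiate with respect to k: the constant 1 drops out,
% and 1/k^2 becomes 2/k^3
p(k) = \frac{\partial}{\partial k} P\bigl(k_i(t) < k\bigr)
     = \frac{2 m^2\, t}{t + m_0}\,\frac{1}{k^3}

% Long-time limit: t/(t + m_0) -> 1
\lim_{t \to \infty} p(k) = \frac{2 m^2}{k^3}
```

The exponent 3 comes entirely from differentiating 1/k^2; the prefactor only depends on m, which is why the lecture stresses that the distribution goes as 1 over k cubed.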
1325 01:12:56,120 --> 01:12:58,870 So I want to just spend the last five minutes of the class 1326 01:12:58,870 --> 01:13:02,360 kind of setting up the discussion of how we should be 1327 01:13:02,360 --> 01:13:05,010 searching for network motifs. 1328 01:13:05,010 --> 01:13:06,760 In particular there's a natural question 1329 01:13:06,760 --> 01:13:10,200 which is, we have to decide what the right null model is, 1330 01:13:10,200 --> 01:13:14,842 in terms of deciding what the expected frequency of a network 1331 01:13:14,842 --> 01:13:16,550 motif, like a feed forward loop might be. 1332 01:13:22,590 --> 01:13:30,490 So first of all, why is it that we maybe should not 1333 01:13:30,490 --> 01:13:31,890 use an Erdos Renyi network? 1334 01:13:43,930 --> 01:13:44,627 Yes? 1335 01:13:44,627 --> 01:13:46,668 AUDIENCE: Because it's not very good for handling 1336 01:13:46,668 --> 01:13:49,072 directed networks? 1337 01:13:49,072 --> 01:13:49,780 PROFESSOR: Right. 1338 01:13:49,780 --> 01:13:52,460 So you'd say, oh, not very good-- I can maybe 1339 01:13:52,460 --> 01:13:57,960 make-- there's a clear analog to it-- you could take 1340 01:13:57,960 --> 01:14:04,170 a random undirected ER network and say put arrows randomly 1341 01:14:04,170 --> 01:14:08,450 on each-- I mean I think that there's a natural ER 1342 01:14:08,450 --> 01:14:10,710 version of a directed network. 1343 01:14:10,710 --> 01:14:12,455 AUDIENCE: There are constraints. 1344 01:14:12,455 --> 01:14:13,330 PROFESSOR: Like what? 1345 01:14:13,330 --> 01:14:20,092 AUDIENCE: Like when you [INAUDIBLE] duplication, 1346 01:14:20,092 --> 01:14:23,291 you don't randomly assign the edge. 1347 01:14:23,291 --> 01:14:24,290 PROFESSOR: That's right. 1348 01:14:24,290 --> 01:14:27,030 OK, so one thing is that it may be that biologically there 1349 01:14:27,030 --> 01:14:31,010 are constraints, but that should manifest itself somehow. 
1350 01:14:31,010 --> 01:14:34,040 In the sense that, you know, all that may be well and good, 1351 01:14:34,040 --> 01:14:36,410 it may be true, what you're saying, but if we go 1352 01:14:36,410 --> 01:14:38,160 and we look at a transcription network, 1353 01:14:38,160 --> 01:14:41,160 if it looks like an ER network, then I 1354 01:14:41,160 --> 01:14:43,160 would say it just doesn't matter. 1355 01:14:43,160 --> 01:14:46,240 The fact that there's microscopic things going on, 1356 01:14:46,240 --> 01:14:48,740 I mean if at the end of the day it looks like an ER network, 1357 01:14:48,740 --> 01:14:53,047 then maybe it's fine anyways, right? 1358 01:14:53,047 --> 01:14:54,480 AUDIENCE: Hum. 1359 01:14:54,480 --> 01:14:56,380 PROFESSOR: Or maybe not. 1360 01:14:56,380 --> 01:14:58,477 You can argue either way. 1361 01:14:58,477 --> 01:15:00,060 AUDIENCE: It depends on what you want. 1362 01:15:00,060 --> 01:15:02,370 If a particular motif occurs a lot 1363 01:15:02,370 --> 01:15:06,814 it might be because it's selected for, 1364 01:15:06,814 --> 01:15:09,230 but it's not what you were-- it's for some other reason. 1365 01:15:09,230 --> 01:15:10,229 PROFESSOR: That's right. 1366 01:15:10,229 --> 01:15:13,290 So this is an important point, that I 1367 01:15:13,290 --> 01:15:15,200 would say that in Uri's approach, 1368 01:15:15,200 --> 01:15:18,389 he basically says if we see a network motif more 1369 01:15:18,389 --> 01:15:19,930 frequently than we would expect based 1370 01:15:19,930 --> 01:15:22,590 on some null model, some null network, 1371 01:15:22,590 --> 01:15:24,850 then it's kind of prima facie evidence 1372 01:15:24,850 --> 01:15:26,815 that maybe evolution was selecting 1373 01:15:26,815 --> 01:15:28,480 for it for some reason.
1374 01:15:28,480 --> 01:15:31,180 And what you're saying is that it could be there's 1375 01:15:31,180 --> 01:15:33,040 a microscopic mechanism that just leads 1376 01:15:33,040 --> 01:15:35,690 to those things happening, and so it doesn't 1377 01:15:35,690 --> 01:15:37,590 have to be selection, it could be 1378 01:15:37,590 --> 01:15:40,980 just due to the mechanistic processes below. 1379 01:15:40,980 --> 01:15:43,150 And I think that's a fair concern. 1380 01:15:43,150 --> 01:15:45,890 And it's related to a lot of these other things, in that 1381 01:15:45,890 --> 01:15:48,020 just for example, duplication will naturally 1382 01:15:48,020 --> 01:15:52,470 lead to something-- if you start out with x regulating Y, 1383 01:15:52,470 --> 01:15:55,710 and Y is duplicated then now you have x regulating 1384 01:15:55,710 --> 01:16:00,440 some Y1 and also some Y2. 1385 01:16:00,440 --> 01:16:04,050 And this is the beginnings of a network motif, 1386 01:16:04,050 --> 01:16:07,270 and so it's a reasonable thing to worry about 1387 01:16:07,270 --> 01:16:10,570 but maybe we can correct for at least a majority of this 1388 01:16:10,570 --> 01:16:12,280 by using the proper null model. 1389 01:16:12,280 --> 01:16:16,380 At least that would be the hope. 1390 01:16:16,380 --> 01:16:19,516 AUDIENCE: Well, that's why you don't want necessarily-- 1391 01:16:19,516 --> 01:16:20,640 PROFESSOR: OK, that's fair. 1392 01:16:20,640 --> 01:16:23,130 But then the question is, what null model should we 1393 01:16:23,130 --> 01:16:25,420 be using? 1394 01:16:25,420 --> 01:16:26,522 Yeah? 1395 01:16:26,522 --> 01:16:28,605 AUDIENCE: So you feel like having 1396 01:16:28,605 --> 01:16:31,585 the microscopic constraints does not necessarily 1397 01:16:31,585 --> 01:16:33,510 need to be in the null model.
1398 01:16:33,510 --> 01:16:36,738 I feel we can have a null model but without using 1399 01:16:36,738 --> 01:16:39,690 the microscopic constraints and then just say, oh well 1400 01:16:39,690 --> 01:16:41,430 that's another possibility for why we 1401 01:16:41,430 --> 01:16:43,969 might have these divergences. 1402 01:16:43,969 --> 01:16:45,968 I don't think they need to be in the null model. 1403 01:16:45,968 --> 01:16:48,258 AUDIENCE: Yeah, it's just that then you can't say anything 1404 01:16:48,258 --> 01:16:48,924 about evolution. 1405 01:16:48,924 --> 01:16:51,090 AUDIENCE: Well fair, but I don't think you should 1406 01:16:51,090 --> 01:16:53,270 have to-- I don't think you have to say something 1407 01:16:53,270 --> 01:16:55,420 about evolution afterwards necessarily. 1408 01:16:55,420 --> 01:16:57,378 PROFESSOR: Yeah, and I think that this question 1409 01:16:57,378 --> 01:17:00,030 about how strongly you can argue that evolution selected 1410 01:17:00,030 --> 01:17:03,180 for something, this is a little bit of a judgment call, 1411 01:17:03,180 --> 01:17:05,260 because most of these evolutionary arguments 1412 01:17:05,260 --> 01:17:08,220 are not ironclad, it's more a matter 1413 01:17:08,220 --> 01:17:13,650 of making you feel kind of comfortable with looking 1414 01:17:13,650 --> 01:17:16,670 for what the evolutionary explanation might have been. 1415 01:17:16,670 --> 01:17:21,560 This is just the nature of looking at historical science, 1416 01:17:21,560 --> 01:17:22,490 right? 1417 01:17:22,490 --> 01:17:24,217 I mean, you can speculate about what 1418 01:17:24,217 --> 01:17:26,550 would have happened if Napoleon had done something else, 1419 01:17:26,550 --> 01:17:27,050 or whatever. 1420 01:17:27,050 --> 01:17:30,380 But it's a speculation.
1421 01:17:30,380 --> 01:17:32,540 Of course the hope is that we can collect 1422 01:17:32,540 --> 01:17:34,959 multiple pieces of evidence that make us more and more 1423 01:17:34,959 --> 01:17:36,500 comfortable with it and in some cases 1424 01:17:36,500 --> 01:17:39,290 we can do laboratory evolution to get more comfort, 1425 01:17:39,290 --> 01:17:41,590 but laboratory evolution doesn't prove 1426 01:17:41,590 --> 01:17:45,300 that that's what happened a million years ago either. 1427 01:17:45,300 --> 01:17:48,770 But I'd say it's more the accumulation of evidence 1428 01:17:48,770 --> 01:17:51,470 to make you feel comfortable with an argument. 1429 01:17:51,470 --> 01:17:53,730 But you know, let's first make sure we understand 1430 01:17:53,730 --> 01:17:58,809 what the null model is, and then on Thursday we'll decide, 1431 01:17:58,809 --> 01:18:00,850 well we won't decide, we'll discuss what we think 1432 01:18:00,850 --> 01:18:01,974 that means about evolution. 1433 01:18:01,974 --> 01:18:02,870 Yeah? 1434 01:18:02,870 --> 01:18:08,517 AUDIENCE: So I think the other part of the appendix 1435 01:18:08,517 --> 01:18:10,475 that we read about the in and out distributions 1436 01:18:10,475 --> 01:18:12,060 is important for the null model. 1437 01:18:12,060 --> 01:18:12,685 PROFESSOR: Yes. 1438 01:18:12,685 --> 01:18:17,388 AUDIENCE: Because it seems to me that the Erdos Renyi 1439 01:18:17,388 --> 01:18:23,292 network might be a good model for the in distributions, 1440 01:18:23,292 --> 01:18:24,770 but not for the out distributions. 1441 01:18:24,770 --> 01:18:27,529 PROFESSOR: That's right. 1442 01:18:27,529 --> 01:18:29,070 And I think this is really important. 1443 01:18:29,070 --> 01:18:32,330 I think that it's clear that the actual transcription 1444 01:18:32,330 --> 01:18:35,270 network of E.
coli, for example, is not well described 1445 01:18:35,270 --> 01:18:37,780 as an Erdos Renyi random network, 1446 01:18:37,780 --> 01:18:40,310 but then it does beg the question of what should we 1447 01:18:40,310 --> 01:18:41,590 be using. 1448 01:18:41,590 --> 01:18:44,566 And you could say, well, we just make a power law network, 1449 01:18:44,566 --> 01:18:45,940 but then you say, oh, but there's 1450 01:18:45,940 --> 01:18:48,420 the in degree, and the out degree. 1451 01:18:48,420 --> 01:18:50,400 How much do you want to keep track of that? 1452 01:18:50,400 --> 01:18:53,152 And I think that there is a fairly strong argument 1453 01:18:53,152 --> 01:18:54,860 that what you should do is what they call 1454 01:18:54,860 --> 01:18:56,540 this degree preserving network. 1455 01:19:00,180 --> 01:19:07,610 In particular what that means is that you take the real network, 1456 01:19:07,610 --> 01:19:09,360 so you take the actual network that you're 1457 01:19:09,360 --> 01:19:15,650 going to be analyzing, and there is some actual degree 1458 01:19:15,650 --> 01:19:17,140 distribution. 1459 01:19:17,140 --> 01:19:25,805 So there's 1 node has-- so k1 might be 106, k2 might be 73, 1460 01:19:25,805 --> 01:19:30,040 dot, dot, dot, dot, up to kn which is equal to 1. 1461 01:19:30,040 --> 01:19:31,520 And of course I'm not even talking 1462 01:19:31,520 --> 01:19:33,603 about it being directed, but you do the same thing 1463 01:19:33,603 --> 01:19:35,526 with directed. 1464 01:19:35,526 --> 01:19:37,650 But then what you do, is you kind of mix things up. 1465 01:19:37,650 --> 01:19:39,358 So you start with a real network and then 1466 01:19:39,358 --> 01:19:42,522 you do something to randomize it. 1467 01:19:42,522 --> 01:19:44,230 And it's a rather clever scheme, I'm just 1468 01:19:44,230 --> 01:19:45,470 going to describe it briefly here 1469 01:19:45,470 --> 01:19:47,386 and then we'll talk more about it on Thursday. 
1470 01:19:47,386 --> 01:19:51,210 What you do is you take all of the actual-- 1471 01:19:51,210 --> 01:19:57,580 so let's say we have x1, some x2 and here we have a Y1, 1472 01:19:57,580 --> 01:20:03,790 Y2, Y3, now let's say that these guys are regulating something 1473 01:20:03,790 --> 01:20:05,230 like this. 1474 01:20:05,230 --> 01:20:09,520 What you do is you take two edges randomly, 1475 01:20:09,520 --> 01:20:12,880 we'll pick this one and that one, and what we do 1476 01:20:12,880 --> 01:20:16,570 is we swap the targets. 1477 01:20:16,570 --> 01:20:19,660 So what we do is we make this guy come over here, 1478 01:20:19,660 --> 01:20:21,440 and then this one comes over here. 1479 01:20:24,040 --> 01:20:27,240 So now what we do is we erase this, and we erase this, 1480 01:20:27,240 --> 01:20:33,670 now we have a new network, but intriguingly, the degree 1481 01:20:33,670 --> 01:20:36,170 distributions for both incoming and outgoing edges 1482 01:20:36,170 --> 01:20:41,000 are identical to what we had before this. 1483 01:20:41,000 --> 01:20:45,290 Every guy has the same number of outgoing edges, incoming edges, 1484 01:20:45,290 --> 01:20:47,120 but they just have different targets. 1485 01:20:47,120 --> 01:20:49,710 So if you just do this procedure many, 1486 01:20:49,710 --> 01:20:51,960 many times then what you do is you 1487 01:20:51,960 --> 01:20:55,929 achieve some randomized version of the real network. 1488 01:20:55,929 --> 01:20:58,470 And then what you can do is you can ask how many feed forward 1489 01:20:58,470 --> 01:20:59,360 loops are there. 1490 01:20:59,360 --> 01:21:02,700 How many, this, that-- 1491 01:21:02,700 --> 01:21:05,720 And so there's a fair argument that this 1492 01:21:05,720 --> 01:21:08,182 is in some ways the proper null model 1493 01:21:08,182 --> 01:21:09,390 to be asking the question in.
1494 01:21:09,390 --> 01:21:10,100 And indeed, for example, there are 1495 01:21:10,100 --> 01:21:12,600 many more feed forward loops than there would be in an Erdos 1496 01:21:12,600 --> 01:21:16,421 Renyi, but still what you see is that you lose many feed forward 1497 01:21:16,421 --> 01:21:16,920 loops. 1498 01:21:16,920 --> 01:21:19,870 So this is then the argument for feed 1499 01:21:19,870 --> 01:21:21,270 forward loops being selected for. 1500 01:21:21,270 --> 01:21:24,060 We'll talk about this and we'll quantify it on Thursday, 1501 01:21:24,060 --> 01:21:26,180 but I'm available for the next half hour 1502 01:21:26,180 --> 01:21:28,090 if anybody has any questions.
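The target-swapping scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure from the paper or the textbook: the function names (`swap_targets`, `count_ffl`, `degrees`), the toy edge list, and the choice to reject swaps that would create a self-loop or a duplicate edge are all hypothetical choices made here and were not specified in the lecture.

```python
import random
from collections import Counter


def degrees(edges):
    """Return (out-degree, in-degree) counters for a directed edge list."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return out_deg, in_deg


def swap_targets(edges, n_swaps, seed=0, max_tries=None):
    """Randomize a directed network by repeatedly swapping edge targets.

    Two edges a->b and c->d are picked at random and replaced by a->d and
    c->b, so every node keeps its in-degree and out-degree.
    Assumption: swaps creating self-loops or duplicate edges are skipped.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    if max_tries is None:
        max_tries = 100 * n_swaps  # guard against networks with no valid swap
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges


def count_ffl(edges):
    """Count feed forward loops: x->y, y->z and x->z (self-loops ignored)."""
    succ = {}
    for src, dst in edges:
        if src != dst:
            succ.setdefault(src, set()).add(dst)
    total = 0
    for x, targets in succ.items():
        for y in targets:
            # z must be regulated by both x and y
            total += len((targets & succ.get(y, set())) - {x, y})
    return total


# Hypothetical toy network: x1 -> y1 -> y2 with x1 -> y2 forms one FFL.
toy = [("x1", "y1"), ("x1", "y2"), ("x2", "y2"), ("x2", "y3"), ("y1", "y2")]
randomized = swap_targets(toy, n_swaps=20, seed=1)
print(count_ffl(toy))                        # 1
print(degrees(randomized) == degrees(toy))   # True
```

To compare a real network against this null model, one would generate many randomized copies with different seeds, count feed forward loops in each, and see where the real count falls in that distribution.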