1
00:00:00,060 --> 00:00:02,500
The following content is
provided under a Creative

2
00:00:02,500 --> 00:00:04,019
Commons license.

3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare

4
00:00:06,360 --> 00:00:10,730
continue to offer high quality
educational resources for free.

5
00:00:10,730 --> 00:00:13,330
To make a donation or
view additional materials

6
00:00:13,330 --> 00:00:17,236
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,236 --> 00:00:17,861
at ocw.mit.edu.

8
00:00:20,445 --> 00:00:22,070
PROFESSOR: Today what
we're going to do

9
00:00:22,070 --> 00:00:26,661
is finish off our discussion
about oscillators.

10
00:00:26,661 --> 00:00:29,160
In particular, we're going to
talk about alternative designs

11
00:00:29,160 --> 00:00:30,200
for oscillators.

12
00:00:30,200 --> 00:00:32,650
So rather than having
these loops that

13
00:00:32,650 --> 00:00:35,670
are purely composed of
negative interactions,

14
00:00:35,670 --> 00:00:37,320
negative feedback,
instead we're going

15
00:00:37,320 --> 00:00:39,920
to talk about cases where you
have both positive and negative

16
00:00:39,920 --> 00:00:41,130
interactions.

17
00:00:41,130 --> 00:00:45,155
So in using this kind of
combined network structure,

18
00:00:45,155 --> 00:00:47,030
you can generate what
are known as relaxation

19
00:00:47,030 --> 00:00:51,010
oscillators, which have some
really wonderful properties.

20
00:00:51,010 --> 00:00:53,860
In particular you can get
more robust oscillations,

21
00:00:53,860 --> 00:00:55,130
relative to the parameters.

22
00:00:55,130 --> 00:00:58,350
But also the oscillations
become tunable,

23
00:00:58,350 --> 00:01:00,071
i.e. you can change
the frequency,

24
00:01:00,071 --> 00:01:02,070
without compromising, for
example, the amplitude

25
00:01:02,070 --> 00:01:03,280
of the oscillations.

26
00:01:03,280 --> 00:01:07,670
So for both natural and
synthetic oscillators

27
00:01:07,670 --> 00:01:09,220
these so-called
relaxation oscillators

28
00:01:09,220 --> 00:01:11,614
are perhaps the way to go.

29
00:01:11,614 --> 00:01:13,030
And then we're
going to transition

30
00:01:13,030 --> 00:01:17,410
to more of the global structure
of some of these networks

31
00:01:17,410 --> 00:01:20,560
in the context of transcription
networks within cells.

32
00:01:20,560 --> 00:01:22,810
And discuss this paper
that you guys just

33
00:01:22,810 --> 00:01:27,290
read, the Barabasi paper, which
is one of the world's most

34
00:01:27,290 --> 00:01:28,470
cited papers, I think.

35
00:01:31,952 --> 00:01:34,410
And then after thinking about
this global structure, of how

36
00:01:34,410 --> 00:01:36,701
you might be able to generate
these so-called power law

37
00:01:36,701 --> 00:01:38,870
structures, we're going
to look a little bit more

38
00:01:38,870 --> 00:01:41,681
in detail to try to understand
something about these network

39
00:01:41,681 --> 00:01:42,180
motifs.

40
00:01:42,180 --> 00:01:44,510
We've already talked
about them a little bit

41
00:01:44,510 --> 00:01:47,200
in the context of
auto regulatory loops,

42
00:01:47,200 --> 00:01:50,360
but now we'll talk about
them in a little bit more

43
00:01:50,360 --> 00:01:54,480
generality, in particular in the
context of feed forward loops.

44
00:01:54,480 --> 00:01:58,390
And then on Thursday
we will get into some

45
00:01:58,390 --> 00:02:02,220
of the possible beneficial
features of feed forward loops.

46
00:02:02,220 --> 00:02:05,730
On Thursday we talked
about the repressilator.

47
00:02:05,730 --> 00:02:09,930
So if you have x
inhibiting y inhibiting z

48
00:02:09,930 --> 00:02:13,860
coming back and
inhibiting x, then it's

49
00:02:13,860 --> 00:02:16,880
reasonable to expect that it
might generate oscillations.
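
A minimal sketch of why a ring of three repressors might oscillate, assuming a dimensionless protein-only model with Hill-type repression. The function names, the production rate `beta`, the Hill coefficient `n`, and the starting values are illustrative assumptions and are not taken from the Elowitz paper, which also tracks mRNA.

```python
def simulate_repressilator(beta=20.0, n=4, dt=0.01, t_end=100.0,
                           start=(1.0, 1.5, 2.0)):
    """Euler-integrate the ring x -| y -| z -| x; return the time course of x.

    Assumed (hypothetical) model: each protein is produced at rate
    beta / (1 + repressor**n) and decays at rate 1 (dimensionless units).
    """
    x, y, z = start  # unequal starting values break the symmetry
    xs = [x]
    for _ in range(int(round(t_end / dt))):
        dx = beta / (1.0 + z ** n) - x  # z represses x
        dy = beta / (1.0 + x ** n) - y  # x represses y
        dz = beta / (1.0 + y ** n) - z  # y represses z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs.append(x)
    return xs


def late_swing(values):
    """Peak-to-trough range over the second half of a time course."""
    tail = values[len(values) // 2:]
    return max(tail) - min(tail)


if __name__ == "__main__":
    print("late swing, n=4:", late_swing(simulate_repressilator()))
    print("late swing, n=1:", late_swing(simulate_repressilator(n=1)))
```

In this sketch the swing persists for `n=4` but dies away for `n=1`, which is consistent with cooperative repression being needed for sustained oscillations in this class of model.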

50
00:02:16,880 --> 00:02:20,380
And indeed in the Elowitz
paper that we read,

51
00:02:20,380 --> 00:02:25,610
such a synthetic circuit did
indeed generate oscillations,

52
00:02:25,610 --> 00:02:28,460
but there were perhaps a
few problems there, right?

53
00:02:28,460 --> 00:02:30,740
So one is that only
about 40% of the cells

54
00:02:30,740 --> 00:02:38,260
actually oscillated,
who knows why not.

55
00:02:38,260 --> 00:02:39,880
But also there
were other problems

56
00:02:39,880 --> 00:02:42,620
that the oscillations
seemed rather noisy,

57
00:02:42,620 --> 00:02:45,080
there was relatively
rapid desynchronization.

58
00:02:47,760 --> 00:02:52,125
Moreover, if you go and you
ask, well is it possible,

59
00:02:52,125 --> 00:02:55,380
or how easy would it be to change
the period of the oscillations

60
00:02:55,380 --> 00:02:58,000
just by changing something
like the degradation rate, what

61
00:02:58,000 --> 00:03:00,970
you'll find is that the
oscillations are not

62
00:03:00,970 --> 00:03:03,700
very tunable.

63
00:03:03,700 --> 00:03:10,500
So I'll say the period, or the
frequency, is not very tunable,

64
00:03:10,500 --> 00:03:15,190
and indeed this is a general
feature of oscillatory networks

65
00:03:15,190 --> 00:03:19,390
that have purely
negative interactions.

66
00:03:19,390 --> 00:03:22,440
We talked about a couple of
these cases, for example,

67
00:03:22,440 --> 00:03:26,327
you can get oscillations just
with negative auto regulation.

68
00:03:26,327 --> 00:03:27,660
And what is it that's necessary?

69
00:03:32,280 --> 00:03:33,240
AUDIENCE: [INAUDIBLE].

70
00:03:33,240 --> 00:03:34,040
PROFESSOR: What's that?

71
00:03:34,040 --> 00:03:35,200
AUDIENCE: High coordination.

72
00:03:35,200 --> 00:03:36,408
PROFESSOR: High coordination?

73
00:03:36,408 --> 00:03:37,584
You me-- oh you're--

74
00:03:37,584 --> 00:03:38,500
AUDIENCE: [INAUDIBLE].

75
00:03:38,500 --> 00:03:40,833
PROFESSOR: Cooperativity in
the repression, that I think

76
00:03:40,833 --> 00:03:43,295
is necessary, but is it
going to be sufficient?

77
00:03:45,800 --> 00:03:49,050
Even in this case where I just
have, let's say that I say,

78
00:03:49,050 --> 00:03:53,050
x dot the rate of production,
if this thing is just

79
00:03:53,050 --> 00:03:57,240
as a function of x, the
sharpest it could be,

80
00:03:57,240 --> 00:04:00,122
this is infinite cooperativity,
so it's maximal expression.

81
00:04:00,122 --> 00:04:02,580
And then when you get above
some x critical all of a sudden

82
00:04:02,580 --> 00:04:04,500
you fully repress.

83
00:04:04,500 --> 00:04:08,120
If I just have this
be the formula--

84
00:04:08,120 --> 00:04:10,470
did you guys understand
what I'm referring to here?

85
00:04:10,470 --> 00:04:12,690
What would this generate?

86
00:04:12,690 --> 00:04:15,410
Would this generate
oscillations?

87
00:04:15,410 --> 00:04:16,839
So it actually doesn't.

88
00:04:16,839 --> 00:04:21,120
In the simple equation,
where if we have x dot

89
00:04:21,120 --> 00:04:23,280
is equal to this function.

90
00:04:23,280 --> 00:04:26,310
So I guess this
is a theta, I want

91
00:04:26,310 --> 00:04:29,450
to make sure I get this
x, less than x critical,

92
00:04:29,450 --> 00:04:31,240
that's what this means.

93
00:04:31,240 --> 00:04:36,100
So with some [INAUDIBLE] rate
beta, minus some alpha x.

94
00:04:36,100 --> 00:04:39,070
Does this thing oscillate?

95
00:04:39,070 --> 00:04:42,690
No, and we had a simple argument
for why it did not oscillate,

96
00:04:42,690 --> 00:04:43,190
as well.

97
00:04:47,970 --> 00:04:50,620
Yes?

98
00:04:50,620 --> 00:04:55,660
Yell it out somebody, I'm sure
somebody was here on Thursday.

99
00:04:55,660 --> 00:04:57,013
[LAUGHTER]

100
00:04:57,512 --> 00:04:59,301
AUDIENCE: [INAUDIBLE].

101
00:04:59,301 --> 00:05:00,300
PROFESSOR: That's right.

102
00:05:00,300 --> 00:05:03,560
So this is just--
This is just an x dot,

103
00:05:03,560 --> 00:05:05,370
there's no x double
dot, so that means

104
00:05:05,370 --> 00:05:07,020
the derivative of
x is a single-valued

105
00:05:07,020 --> 00:05:09,190
function of x,
that means that we

106
00:05:09,190 --> 00:05:14,450
can't get any oscillation here.
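
A minimal sketch of this argument, assuming a smooth Hill function with a finite coefficient `n` as a stand-in for the sharp threshold (a literal step function makes a fixed-step integrator chatter at the threshold). The function names and parameter values are illustrative assumptions. Because the rate of change depends only on the current value of x, the trajectory can only rise or only fall toward the fixed point; it never turns around.

```python
def simulate_one_variable(beta=10.0, alpha=1.0, K=1.0, n=4,
                          x0=0.0, dt=0.01, t_end=20.0):
    """Euler-integrate dx/dt = beta / (1 + (x/K)**n) - alpha * x."""
    xs = [x0]
    for _ in range(int(round(t_end / dt))):
        x = xs[-1]
        xs.append(x + dt * (beta / (1.0 + (x / K) ** n) - alpha * x))
    return xs


def count_turning_points(values, tol=1e-12):
    """Count switches between rising and falling (steps below tol are ignored)."""
    turns, last_sign = 0, 0
    for a, b in zip(values, values[1:]):
        step = b - a
        sign = 1 if step > tol else -1 if step < -tol else 0
        if sign and last_sign and sign != last_sign:
            turns += 1
        if sign:
            last_sign = sign
    return turns


if __name__ == "__main__":
    for start in (0.0, 5.0):
        xs = simulate_one_variable(x0=start)
        print(start, "->", round(xs[-1], 3),
              "turning points:", count_turning_points(xs))
```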

107
00:05:14,450 --> 00:05:17,930
And then remember we analyzed
this model where we explicitly

108
00:05:17,930 --> 00:05:22,890
included the mRNA, so then
we just had that x comes,

109
00:05:22,890 --> 00:05:25,580
and what it does,
is it represses

110
00:05:25,580 --> 00:05:29,390
expression of this mRNA for x,
and then this mRNA comes back

111
00:05:29,390 --> 00:05:31,350
and makes x.

112
00:05:31,350 --> 00:05:33,140
Right?

113
00:05:33,140 --> 00:05:35,880
And in this model,
was this sufficient?

114
00:05:35,880 --> 00:05:37,880
Did this give oscillations?

115
00:05:37,880 --> 00:05:42,610
No, so this also here,
there was no oscillations.

116
00:05:42,610 --> 00:05:44,360
Again here, there
were no oscillations.

117
00:05:44,360 --> 00:05:47,744
But I did tell you that
you could do something more

118
00:05:47,744 --> 00:05:50,160
to get oscillations, just with
a single protein repressing

119
00:05:50,160 --> 00:05:50,660
itself.

120
00:05:53,000 --> 00:05:54,940
So you need more delays.

121
00:05:54,940 --> 00:06:02,410
So if you add delays, then it's
possible to get oscillations.

122
00:06:05,440 --> 00:06:09,680
So those delays could be in
the form of having a model

123
00:06:09,680 --> 00:06:13,280
where you explicitly take into
account that first mRNA is

124
00:06:13,280 --> 00:06:15,310
made, and then
that goes, and then

125
00:06:15,310 --> 00:06:17,150
you translate that
to make some monomer,

126
00:06:17,150 --> 00:06:19,860
and then the monomer
has to maybe fold,

127
00:06:19,860 --> 00:06:21,680
and then the folded
protein maybe

128
00:06:21,680 --> 00:06:23,790
has to dimerize in order
to do a repression.

129
00:06:23,790 --> 00:06:26,409
So if you have a more detailed
mechanistic model, that

130
00:06:26,409 --> 00:06:28,450
includes all these steps,
that kind of introduces

131
00:06:28,450 --> 00:06:30,550
some sort of delay,
that in principle

132
00:06:30,550 --> 00:06:32,470
can lead to oscillations
in such a circuit.

133
00:06:32,470 --> 00:06:35,880
Or if you wanted to, you could
just explicitly put in a delay.

134
00:06:35,880 --> 00:06:37,760
So you could say
that x dot, instead

135
00:06:37,760 --> 00:06:42,210
of being a function of x,
instead what you can do,

136
00:06:42,210 --> 00:06:44,470
is you could say, well it's
actually a function of x

137
00:06:44,470 --> 00:06:49,410
at some time, t minus tau.

138
00:06:49,410 --> 00:06:52,710
So instead of having the
rate of production of x

139
00:06:52,710 --> 00:06:54,774
be a function of x, at
that moment in time,

140
00:06:54,774 --> 00:06:57,190
instead it could be a function
of x at some previous time.

141
00:06:57,190 --> 00:06:59,231
Doing that, that's a very
explicit form of delay.

142
00:06:59,231 --> 00:07:01,900
And that can also be used
to generate oscillations

143
00:07:01,900 --> 00:07:04,390
in a simple negative
auto regulatory loop.
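
A minimal sketch of the explicit-delay idea, assuming the production term is a Hill function of x at time t minus tau. The function names, parameter values, and the constant history before t = 0 are illustrative assumptions. With no delay the late-time swing vanishes; with a long enough delay it persists.

```python
def simulate_delayed_repression(tau, beta=10.0, alpha=1.0, K=1.0, n=4,
                                x0=0.5, dt=0.01, t_end=100.0):
    """Euler-integrate dx/dt = beta / (1 + (x(t - tau)/K)**n) - alpha * x(t).

    Assumption: x is held at x0 for all times before t = 0.
    """
    lag = int(round(tau / dt))  # delay expressed in time steps
    xs = [x0]
    for i in range(int(round(t_end / dt))):
        x_now = xs[i]
        x_past = xs[i - lag] if i >= lag else x0
        production = beta / (1.0 + (x_past / K) ** n)
        xs.append(x_now + dt * (production - alpha * x_now))
    return xs


def late_amplitude(values, fraction=0.3):
    """Peak-to-trough range over the last `fraction` of the time course."""
    tail = values[int(len(values) * (1.0 - fraction)):]
    return max(tail) - min(tail)


if __name__ == "__main__":
    for tau in (0.0, 2.0):
        xs = simulate_delayed_repression(tau)
        print("tau =", tau, "late amplitude =", late_amplitude(xs))
```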

144
00:07:08,170 --> 00:07:10,670
These are all different
kinds of approaches

145
00:07:10,670 --> 00:07:18,600
for encoding delays into a
model, and various approaches

146
00:07:18,600 --> 00:07:20,786
will give you oscillations.

147
00:07:20,786 --> 00:07:21,285
Yes?

148
00:07:21,285 --> 00:07:22,993
AUDIENCE: Question
for the repressilator,

149
00:07:22,993 --> 00:07:25,494
when you say the
period is not tunable,

150
00:07:25,494 --> 00:07:29,810
it's because the mRNA lifetime
is very difficult to--

151
00:07:29,810 --> 00:07:31,950
PROFESSOR: All
right, when we say--

152
00:07:31,950 --> 00:07:33,367
AUDIENCE: --in the
model you can--

153
00:07:33,367 --> 00:07:34,575
PROFESSOR: Yes, that's right.

154
00:07:34,575 --> 00:07:36,270
In the model you can,
in principle-- So

155
00:07:36,270 --> 00:07:41,432
what I mean when I say this is
that in this class of model,

156
00:07:41,432 --> 00:07:43,640
so you could also have,
instead of this repressilator

157
00:07:43,640 --> 00:07:46,450
with three, you have the
so-called pentalator,

158
00:07:46,450 --> 00:07:49,400
where you have five proteins
and each is repressing the next.

159
00:07:49,400 --> 00:07:52,530
So these all have
similar features,

160
00:07:52,530 --> 00:07:55,480
so all have these odd numbers
of proteins going around

161
00:07:55,480 --> 00:07:56,860
and repressing one another.

162
00:07:56,860 --> 00:07:59,950
And so you can write down the
model with seven, if you want.

163
00:07:59,950 --> 00:08:04,600
But in all these cases,
it's not tunable.

164
00:08:04,600 --> 00:08:09,320
What we mean by that is that,
when you tune the frequency,

165
00:08:09,320 --> 00:08:13,380
you in general lose the
amplitude of the oscillation.

166
00:08:13,380 --> 00:08:15,050
So the amplitude will go down.

167
00:08:15,050 --> 00:08:18,360
There was a very
nice paper that was

168
00:08:18,360 --> 00:08:20,780
written in 2008 on
this topic, written

169
00:08:20,780 --> 00:08:23,260
by Jim Ferrell at Stanford.

170
00:08:23,260 --> 00:08:25,160
So I just want to mention this.

171
00:08:25,160 --> 00:08:32,230
So it's Ferrell,
at Stanford, this

172
00:08:32,230 --> 00:08:41,600
is a paper in Science
2008, and it's

173
00:08:41,600 --> 00:08:44,680
called Robust, Tunable
Biological Oscillations

174
00:08:44,680 --> 00:08:48,870
from Interlinked Positive
and Negative Feedback Loops.

175
00:08:48,870 --> 00:08:52,480
So nice title, I like
titles that say something.

176
00:08:52,480 --> 00:08:54,860
So it's sort of the
ultimate short version

177
00:08:54,860 --> 00:08:57,910
of an abstract, right, if
you can do it I recommend it.

178
00:08:57,910 --> 00:08:59,950
Incidentally in
graduate school, I once

179
00:08:59,950 --> 00:09:04,170
wrote a paper with four words,
short words, DNA over-winds

180
00:09:04,170 --> 00:09:06,040
when stretched.

181
00:09:06,040 --> 00:09:07,880
Nice statement,
you may or may not

182
00:09:07,880 --> 00:09:09,340
actually know what
I mean by that,

183
00:09:09,340 --> 00:09:13,580
but it's a nice, short
title; it's a statement.

184
00:09:13,580 --> 00:09:15,830
I encourage you to
think about that when

185
00:09:15,830 --> 00:09:19,110
you're writing your papers.

186
00:09:19,110 --> 00:09:23,650
So he wrote this paper where,
he said, all right, well,

187
00:09:23,650 --> 00:09:26,100
oscillations are
really important.

188
00:09:26,100 --> 00:09:29,810
Thinking about the context of
heart rhythms, or cell cycle,

189
00:09:29,810 --> 00:09:31,052
or this or that.

190
00:09:31,052 --> 00:09:32,760
Oscillations are
important, but if you go

191
00:09:32,760 --> 00:09:35,110
and you look at
the circuits that

192
00:09:35,110 --> 00:09:37,060
are generating
oscillations in biology,

193
00:09:37,060 --> 00:09:39,580
they often have so-called
interlinked positive

194
00:09:39,580 --> 00:09:41,790
and negative feedback loops.

195
00:09:41,790 --> 00:09:44,930
There are many cases
where you have,

196
00:09:44,930 --> 00:09:49,460
some x that actually
is positively,

197
00:09:49,460 --> 00:09:51,410
it's kind of activating itself.

198
00:09:51,410 --> 00:09:54,110
And this is very much
something that will not lead

199
00:09:54,110 --> 00:09:56,580
to oscillations on its own.

200
00:09:56,580 --> 00:09:59,050
It might be bistable,
which is interesting,

201
00:09:59,050 --> 00:10:00,880
but not oscillations on its own.

202
00:10:00,880 --> 00:10:04,240
But then there's also
maybe a negative feedback

203
00:10:04,240 --> 00:10:09,030
loop through another protein.

204
00:10:09,030 --> 00:10:10,670
And the idea is that
this one somehow

205
00:10:10,670 --> 00:10:15,540
operates-- this one's
fast, and this one's slow.

206
00:10:15,540 --> 00:10:18,329
And the key feature of
these relaxation oscillators

207
00:10:18,329 --> 00:10:19,495
is that there are two time scales.

208
00:10:24,770 --> 00:10:28,680
And it's the slow
time scale that

209
00:10:28,680 --> 00:10:31,780
specifies the period
of the oscillation,

210
00:10:31,780 --> 00:10:34,220
and this fast one kind
of locks the system

211
00:10:34,220 --> 00:10:36,770
into these alternative states.

212
00:10:36,770 --> 00:10:39,490
And this helps
maintain the amplitude,

213
00:10:39,490 --> 00:10:42,340
because it has this nature
of being bistable, right,

214
00:10:42,340 --> 00:10:44,290
so it's on or off.

215
00:10:44,290 --> 00:10:47,440
So this helps you
maintain amplitude,

216
00:10:47,440 --> 00:10:50,580
so this is kind of in
charge of amplitude,

217
00:10:50,580 --> 00:10:55,220
and this one over here is
in charge of the period.

218
00:10:55,220 --> 00:10:57,740
So you can imagine that
by changing this time scale,

219
00:10:57,740 --> 00:10:59,650
you change the period
of the oscillation,

220
00:10:59,650 --> 00:11:05,600
whereas this loop allows you
to maintain the amplitude.
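
A minimal sketch of the two-time-scale idea, using a textbook van der Pol type relaxation oscillator as a stand-in; it is not the model analyzed in the Ferrell paper. Here the term `x - x**3/3` plays the role of the fast, bistable positive feedback, `y` is an assumed slow negative-feedback variable, and `eps` sets how slow it is. All names and parameter values are illustrative assumptions. Slowing `y` lengthens the period, while the amplitude stays close to the same value.

```python
def simulate_fast_slow(eps, dt=0.005, t_end=600.0, x0=0.5, y0=0.0):
    """Euler-integrate dx/dt = x - x**3/3 - y (fast), dy/dt = eps * x (slow)."""
    x, y = x0, y0
    times, xs = [0.0], [x]
    for i in range(1, int(round(t_end / dt)) + 1):
        x, y = x + dt * (x - x ** 3 / 3.0 - y), y + dt * eps * x
        times.append(i * dt)
        xs.append(x)
    return times, xs


def amplitude_and_period(times, xs, skip=100.0):
    """Peak |x| and mean spacing of upward zero crossings after a transient."""
    late = [(t, x) for t, x in zip(times, xs) if t >= skip]
    peak = max(abs(x) for _, x in late)
    crossings = [t1 for (_, a), (t1, b) in zip(late, late[1:]) if a < 0.0 <= b]
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return peak, period


if __name__ == "__main__":
    for eps in (0.05, 0.025):
        peak, period = amplitude_and_period(*simulate_fast_slow(eps))
        print("eps =", eps, "amplitude =", peak, "period =", period)
```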

221
00:11:05,600 --> 00:11:09,340
And what Jim's group did
computationally in this paper,

222
00:11:09,340 --> 00:11:12,810
is they analyzed many
different circuit designs that

223
00:11:12,810 --> 00:11:14,760
can lead to
oscillations, and they

224
00:11:14,760 --> 00:11:18,370
showed that for
the loops that are

225
00:11:18,370 --> 00:11:20,170
made of purely
negative interactions

226
00:11:20,170 --> 00:11:23,640
like this, if you change
a parameter in order

227
00:11:23,640 --> 00:11:26,000
to change the period,
you'll also in general

228
00:11:26,000 --> 00:11:28,990
make the amplitude of the
oscillations drop dramatically.

229
00:11:28,990 --> 00:11:31,424
So that's the sense in
which they're not tunable.

230
00:11:31,424 --> 00:11:33,090
Whereas if you have
this kind of design,

231
00:11:33,090 --> 00:11:37,510
you can actually tune over, in
some cases a very wide range,

232
00:11:37,510 --> 00:11:40,730
but maintain the amplitude
of the oscillation.

233
00:11:40,730 --> 00:11:42,170
And in addition
to being tunable,

234
00:11:42,170 --> 00:11:45,130
these things also end up
being robust in various ways.

235
00:11:45,130 --> 00:11:49,300
The oscillation is maintained
subject to various kinds

236
00:11:49,300 --> 00:11:51,994
of-- If you twiddle with the
parameters, you double this,

237
00:11:51,994 --> 00:11:54,160
you halve that, you still
get nice oscillations here,

238
00:11:54,160 --> 00:11:57,850
whereas in those
designs you tend to lose

239
00:11:57,850 --> 00:11:59,610
the oscillations more easily.

240
00:11:59,610 --> 00:12:01,500
So they claim that based
on that, that these

241
00:12:01,500 --> 00:12:03,650
might be more evolvable.

242
00:12:03,650 --> 00:12:06,180
So even in cases where you
don't need to tune the period,

243
00:12:06,180 --> 00:12:09,060
maybe you still end up
evolving towards this design,

244
00:12:09,060 --> 00:12:12,497
just because it's robust
to stochastic fluctuations

245
00:12:12,497 --> 00:12:14,330
in the concentrations
of things, but also it

246
00:12:14,330 --> 00:12:17,630
might be easier to evolve
these sorts of oscillations.

247
00:12:20,490 --> 00:12:23,270
Are there any questions
about the kind of intuition

248
00:12:23,270 --> 00:12:26,370
behind this for now?

249
00:12:26,370 --> 00:12:28,670
There's a nice kind
of circuit analogy

250
00:12:28,670 --> 00:12:34,140
that people often talk about
in the context of this.

251
00:12:34,140 --> 00:12:38,220
So if you imagine you have some
battery, with some voltage, v,

252
00:12:38,220 --> 00:12:50,040
well, we'll say v battery,
some capacitor over here,

253
00:12:50,040 --> 00:12:54,950
but over here you
have something that

254
00:12:54,950 --> 00:13:03,950
will spark at some voltage,
some v t, you get a spark.

255
00:13:03,950 --> 00:13:08,110
Now the question is, well,
what happens over time

256
00:13:08,110 --> 00:13:12,250
if the threshold is
less than v battery?

257
00:13:12,250 --> 00:13:14,335
We maybe should have
a resistor in here.

258
00:13:18,800 --> 00:13:21,410
So the threshold is
less than v battery,

259
00:13:21,410 --> 00:13:26,352
then this can generate
nice oscillations

260
00:13:26,352 --> 00:13:28,560
in the voltage say across
the capacitor as a function

261
00:13:28,560 --> 00:13:30,910
of time, that are tunable.

262
00:13:30,910 --> 00:13:34,020
Because if you plot
as a function of time.

263
00:13:34,020 --> 00:13:37,900
This is the voltage across the
capacitor, where up here we

264
00:13:37,900 --> 00:13:45,160
might have the v, the battery,
here we might have v threshold.

265
00:13:45,160 --> 00:13:47,960
Now in the absence of
this thing that's

266
00:13:47,960 --> 00:13:49,510
going to short
periodically, we're

267
00:13:49,510 --> 00:13:51,399
just going to charge
up the capacitor.

268
00:13:51,399 --> 00:13:53,690
So in principle, there's
going to be this standard RC

269
00:13:53,690 --> 00:13:58,630
time constant, coming up to
here, but before we get there,

270
00:13:58,630 --> 00:14:00,070
we get the spark.

271
00:14:00,070 --> 00:14:03,475
So then we discharge
across here and this drops.

272
00:14:11,220 --> 00:14:14,820
So you get something
that looks like this.

273
00:14:14,820 --> 00:14:17,890
Now you can imagine by changing,
for example, the resistor,

274
00:14:17,890 --> 00:14:21,270
you can change the rate that
this thing, the capacitor,

275
00:14:21,270 --> 00:14:22,799
will charge up.

276
00:14:22,799 --> 00:14:24,340
But the amplitude
of the oscillations

277
00:14:24,340 --> 00:14:27,790
stays constant, because that's
set by the voltage threshold

278
00:14:27,790 --> 00:14:33,270
across this-- where it shorts.

279
00:14:33,270 --> 00:14:35,520
This is capturing this
dynamic of the separation

280
00:14:35,520 --> 00:14:36,170
of time scales.

281
00:14:36,170 --> 00:14:39,145
So there's a slow time
scale, which is this r,

282
00:14:39,145 --> 00:14:41,370
c time constant, and then
there's the rapid time

283
00:14:41,370 --> 00:14:45,060
scale is where this shorts out.

284
00:14:45,060 --> 00:14:46,820
So you can imagine
that this is an example

285
00:14:46,820 --> 00:14:49,880
of an oscillatory
signal where we can

286
00:14:49,880 --> 00:14:52,798
tune the frequency without
sacrificing the amplitude.
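
A minimal sketch of the spark-gap circuit, assuming the spark is instantaneous and discharges the capacitor fully to 0 V (neither assumption is stated in the lecture). Under those assumptions the charging time to the threshold is R * C * ln(Vb / (Vb - Vt)), so the period scales with the resistor while the peak voltage is pinned at the threshold. The function names and component values are illustrative.

```python
import math


def rc_period(R, C, v_battery, v_threshold):
    """Charging time from 0 V to v_threshold: R * C * ln(Vb / (Vb - Vt))."""
    return R * C * math.log(v_battery / (v_battery - v_threshold))


def simulate_spark_gap(R, C, v_battery, v_threshold, dt=1e-4, t_end=5.0):
    """Euler-integrate the capacitor voltage, resetting to 0 V at each spark.

    Returns (spark_times, peak_voltage).
    """
    v, peak, sparks = 0.0, 0.0, []
    for i in range(1, int(round(t_end / dt)) + 1):
        v += dt * (v_battery - v) / (R * C)  # RC charging toward the battery
        peak = max(peak, v)
        if v >= v_threshold:  # spark: assumed instantaneous, full discharge
            sparks.append(i * dt)
            v = 0.0
    return sparks, peak


if __name__ == "__main__":
    for R in (1.0, 2.0):
        sparks, peak = simulate_spark_gap(R, 1.0, 10.0, 6.0)
        print("R =", R, "period =", sparks[1] - sparks[0], "peak =", peak)
```

Doubling `R` in this sketch doubles the time between sparks and leaves the peak voltage essentially unchanged.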

287
00:15:00,200 --> 00:15:04,580
What we've said so far is that
there are engineering analogs

288
00:15:04,580 --> 00:15:06,830
to these sorts of
relaxation oscillators.

289
00:15:06,830 --> 00:15:10,140
We can model various
synthetic circuits,

290
00:15:10,140 --> 00:15:13,799
or we can look at natural
oscillatory networks,

291
00:15:13,799 --> 00:15:15,590
in order to get a sense
of what's going on.

292
00:15:15,590 --> 00:15:19,420
But of course, a major
goal of this kind

293
00:15:19,420 --> 00:15:22,660
of system synthetic
approach to the field,

294
00:15:22,660 --> 00:15:24,690
is that if all this
stuff is really true,

295
00:15:24,690 --> 00:15:27,084
we should be able to build it.

296
00:15:27,084 --> 00:15:29,000
And there's a very nice
demonstration of this,

297
00:15:29,000 --> 00:15:33,680
also in 2008, by
Jeff Hasty's group.

298
00:15:33,680 --> 00:15:38,470
So Jeff Hasty was actually
trained as a high energy

299
00:15:38,470 --> 00:15:43,020
theorist, and then I think
it was during his postdoc,

300
00:15:43,020 --> 00:15:45,850
maybe he switched into
experimental biology.

301
00:15:45,850 --> 00:15:49,600
Went and did his postdoc,
I think, with Jim Collins.

302
00:15:49,600 --> 00:15:52,860
And then eventually now has
his own group doing systems

303
00:15:52,860 --> 00:15:54,160
synthetic biology.

304
00:15:54,160 --> 00:16:01,980
In this paper, it was
a Nature paper in 2008,

305
00:16:01,980 --> 00:16:06,050
it's called A Fast, Robust
and Tunable Synthetic Gene

306
00:16:06,050 --> 00:16:08,640
Oscillator.

307
00:16:08,640 --> 00:16:12,230
It's a nice statement, tells
you what he's about to do.

308
00:16:12,230 --> 00:16:16,830
The data here, this is again
using this basic insight

309
00:16:16,830 --> 00:16:20,140
of having both interlinked
positive and negative feedback

310
00:16:20,140 --> 00:16:21,420
loops in E. coli.

311
00:16:21,420 --> 00:16:24,740
He demonstrated that he can get
really beautiful oscillations,

312
00:16:24,740 --> 00:16:27,560
in essentially all the cells,
and that they're tunable,

313
00:16:27,560 --> 00:16:30,970
in their period, by a
factor of three, or four,

314
00:16:30,970 --> 00:16:32,510
or so, by a fair amount.

315
00:16:32,510 --> 00:16:35,750
And indeed as fast
as 13 minutes,

316
00:16:35,750 --> 00:16:37,430
the oscillatory period.

317
00:16:37,430 --> 00:16:41,060
Which is pretty nice, right?

318
00:16:41,060 --> 00:16:44,520
So I encourage you to
check out this paper.

319
00:16:44,520 --> 00:16:52,180
This paper was also
an example of how

320
00:16:52,180 --> 00:16:56,570
it was in principle possible to
get oscillations just by doing

321
00:16:56,570 --> 00:16:58,370
negative auto regulation.

322
00:16:58,370 --> 00:16:59,980
Right, so this was
a case where they

323
00:16:59,980 --> 00:17:03,890
designed a gene network
that they could tune

324
00:17:03,890 --> 00:17:05,920
and had this wonderful property.

325
00:17:05,920 --> 00:17:08,040
But then after they
did that they noticed

326
00:17:08,040 --> 00:17:09,480
that in their model
at least, they

327
00:17:09,480 --> 00:17:12,069
could get oscillations
in some parameter regime,

328
00:17:12,069 --> 00:17:15,099
just by having the negative
auto-regulatory loop.

329
00:17:15,099 --> 00:17:17,565
And as a result of all these
intermediate processes,

330
00:17:17,565 --> 00:17:19,440
of protein maturation,
and so forth, and then

331
00:17:19,440 --> 00:17:20,790
they went and they
constructed that network,

332
00:17:20,790 --> 00:17:22,748
and they showed that that
could also oscillate.

333
00:17:22,748 --> 00:17:26,040
So again this is an example of
the interplay between modeling,

334
00:17:26,040 --> 00:17:29,190
experiment, theory,
modeling. And Jeff Hasty

335
00:17:29,190 --> 00:17:32,050
has gone on to write
another several,

336
00:17:32,050 --> 00:17:35,840
really beautiful papers looking
at these sorts of oscillations,

337
00:17:35,840 --> 00:17:39,490
looking at how you can get
synchronization of oscillators,

338
00:17:39,490 --> 00:17:41,550
and you get period
doubling ideas.

339
00:17:41,550 --> 00:17:44,571
It's really a whole string of
wonderful, wonderful papers.

340
00:17:44,571 --> 00:17:47,070
So I encourage you to, if you're
interested in oscillations,

341
00:17:47,070 --> 00:17:49,700
to look at Jeff Hasty's
work over the years.

342
00:17:55,100 --> 00:17:59,170
If you want a quick
introduction to these papers,

343
00:17:59,170 --> 00:18:03,200
I also wrote a News and Views
in Nature on these two papers.

344
00:18:03,200 --> 00:18:07,364
So you can read that,
it's only a page.

345
00:18:07,364 --> 00:18:09,030
Although I guess you
won't hear anything

346
00:18:09,030 --> 00:18:10,696
that you haven't
already heard probably.

347
00:18:13,330 --> 00:18:15,300
Any other questions
about this idea

348
00:18:15,300 --> 00:18:19,560
of how we can use both positive
and negative feedback in order

349
00:18:19,560 --> 00:18:21,685
to get some nice
oscillatory properties?

350
00:18:27,470 --> 00:18:30,040
OK, then let's move on.

351
00:18:33,340 --> 00:18:37,150
What did you guys think of
this paper, the Barabasi paper?

352
00:18:41,170 --> 00:18:43,696
Good, bad, difficult, easy?

353
00:18:43,696 --> 00:18:45,570
AUDIENCE: Why does it
have so many citations?

354
00:18:45,570 --> 00:18:46,780
PROFESSOR: Why does it
have so many citations?

355
00:18:46,780 --> 00:18:48,540
All right that's
an inter-- and you

356
00:18:48,540 --> 00:18:50,400
should look at how many--
according to Google Scholar,

357
00:18:50,400 --> 00:18:52,670
I haven't checked this year, but
it's probably 20,000 citations.

358
00:18:52,670 --> 00:18:53,211
I mean it's--

359
00:18:53,211 --> 00:18:55,092
AUDIENCE: Is it a cult thing?

360
00:18:55,092 --> 00:18:56,300
PROFESSOR: It's a cult thing.

361
00:18:56,300 --> 00:18:58,510
Well, I don't know.

362
00:18:58,510 --> 00:19:00,110
That might be exaggerating.

363
00:19:00,110 --> 00:19:01,720
AUDIENCE: I mean
it's a nice paper.

364
00:19:01,720 --> 00:19:03,455
PROFESSOR: Yeah, right.

365
00:19:03,455 --> 00:19:05,830
So this is interesting, and
I think that the basic answer

366
00:19:05,830 --> 00:19:08,950
is that there are
networks that are

367
00:19:08,950 --> 00:19:12,690
relevant in many, many, many
fields, which they allude to.

368
00:19:12,690 --> 00:19:15,240
And there are many
researchers that

369
00:19:15,240 --> 00:19:17,570
have been excited about
studying those networks

370
00:19:17,570 --> 00:19:23,115
in many, many fields, and many,
many, many of the networks that

371
00:19:23,115 --> 00:19:27,460
are observed in nature or
social science, the web,

372
00:19:27,460 --> 00:19:29,890
everywhere, they have
these power law structures.

373
00:19:29,890 --> 00:19:37,220
And this is the first
clear simple mechanism

374
00:19:37,220 --> 00:19:38,290
to generate it.

375
00:19:38,290 --> 00:19:40,220
My understanding
is that actually

376
00:19:40,220 --> 00:19:42,530
a mathematician decades
before, actually

377
00:19:42,530 --> 00:19:44,590
did demonstrate that
this kind of thing

378
00:19:44,590 --> 00:19:46,510
could be constructed,
that would lead to this,

379
00:19:46,510 --> 00:19:49,685
but that paper doesn't
have 20,000 citations.

380
00:19:49,685 --> 00:19:51,310
I mean like it's a
lot of these things,

381
00:19:51,310 --> 00:19:53,101
you have to be the
right time, right place,

382
00:19:53,101 --> 00:19:54,960
and have the right idea.

383
00:19:54,960 --> 00:19:57,410
AUDIENCE: Yeah, I guess my
main thought about the paper

384
00:19:57,410 --> 00:20:01,460
is exactly that, the
interesting thing about it was,

385
00:20:01,460 --> 00:20:03,964
it came out at about the time
that data on large networks

386
00:20:03,964 --> 00:20:04,900
was readily available.

387
00:20:04,900 --> 00:20:05,900
PROFESSOR: That's right.

388
00:20:05,900 --> 00:20:10,490
There's a reason that this paper
was published at this time,

389
00:20:10,490 --> 00:20:14,010
and of course if Barabasi
didn't do it here,

390
00:20:14,010 --> 00:20:16,520
someone else would have
done it a year or two later.

391
00:20:16,520 --> 00:20:19,610
But it was really that the
data were available everywhere,

392
00:20:19,610 --> 00:20:23,160
and we were seeing these
power law distributions,

393
00:20:23,160 --> 00:20:26,900
and it's really crying
out for an explanation.

394
00:20:26,900 --> 00:20:29,425
I think it's-- You know
sometimes people complain,

395
00:20:29,425 --> 00:20:32,050
that they say, oh yeah, you know
I could have come up with this

396
00:20:32,050 --> 00:20:35,380
idea, it's not that deep.

397
00:20:35,380 --> 00:20:38,220
And maybe you could
have, but you didn't.

398
00:20:38,220 --> 00:20:39,625
[LAUGHTER]

399
00:20:40,320 --> 00:20:43,760
PROFESSOR: And also
I'd say that Barabasi

400
00:20:43,760 --> 00:20:49,040
has a record of doing
interesting things,

401
00:20:49,040 --> 00:20:52,460
and being the first to
point out a simple idea.

402
00:20:52,460 --> 00:20:55,800
If you can reliably be
the first to point out

403
00:20:55,800 --> 00:20:57,590
a simple explanation
for important things,

404
00:20:57,590 --> 00:21:00,780
then that's another
kind of genius, right?

405
00:21:00,780 --> 00:21:03,040
I mean-- and it's
the kind of genius

406
00:21:03,040 --> 00:21:07,500
that I aspire to, because I
know that I'm not going to reach

407
00:21:07,500 --> 00:21:09,680
the other kind of genius.

408
00:21:09,680 --> 00:21:12,970
I mean there are some things you
look at, oh well, I would never

409
00:21:12,970 --> 00:21:14,530
be able to do that, right?

410
00:21:14,530 --> 00:21:17,530
And everyone agrees
that that's hard.

411
00:21:17,530 --> 00:21:19,830
But I think that
there is something

412
00:21:19,830 --> 00:21:23,920
about being able to see what
the scientific opportunities are

413
00:21:23,920 --> 00:21:26,340
at a given time,
and you don't have

414
00:21:26,340 --> 00:21:29,330
to come up with a
really complicated model

415
00:21:29,330 --> 00:21:33,280
or proof in order to have
really important impact.

416
00:21:33,280 --> 00:21:40,730
And this paper is
way beyond, in terms

417
00:21:40,730 --> 00:21:44,120
of number of people that
have read it, cited it,

418
00:21:44,120 --> 00:21:46,090
and so forth, it's
way beyond, probably

419
00:21:46,090 --> 00:21:50,320
any other paper you'll
likely read in your life.

420
00:21:50,320 --> 00:21:50,985
Yes?

421
00:21:50,985 --> 00:21:51,850
AUDIENCE: Just more thoughts.

422
00:21:51,850 --> 00:21:52,891
PROFESSOR: More thoughts.

423
00:21:52,891 --> 00:21:54,584
Yeah, that's fine.

424
00:21:54,584 --> 00:21:57,140
AUDIENCE: I guess,
what's interesting is,

425
00:21:57,140 --> 00:22:01,452
it starts a conversation, so
we can analyze these networks,

426
00:22:01,452 --> 00:22:06,798
and the feature that
works the best [INAUDIBLE]

427
00:22:06,798 --> 00:22:08,499
starts that
conversation, there's

428
00:22:08,499 --> 00:22:12,920
still a lot more that we
can do with it [INAUDIBLE].

429
00:22:12,920 --> 00:22:15,370
I think that's why I like it.

430
00:22:15,370 --> 00:22:18,370
PROFESSOR: That's a totally--
so this is the Barabasi and Reka

431
00:22:18,370 --> 00:22:20,880
Albert.

432
00:22:20,880 --> 00:22:26,140
So Barabasi is a professor
over at Northeastern now,

433
00:22:26,140 --> 00:22:28,192
Reka Albert is a
professor at Penn State,

434
00:22:28,192 --> 00:22:30,400
and I think they've both
gone on to do what, I think,

435
00:22:30,400 --> 00:22:34,290
are really very interesting
things in this network space,

436
00:22:34,290 --> 00:22:36,310
and more generally.

437
00:22:36,310 --> 00:22:38,840
So I encourage you to
check out what each of them

438
00:22:38,840 --> 00:22:41,940
have been doing over the years.

439
00:22:41,940 --> 00:22:45,395
All right, so this model, what
are the two key ingredients

440
00:22:45,395 --> 00:22:47,895
of this model?

441
00:22:47,895 --> 00:22:49,770
AUDIENCE: Growth and
preferential attachment.

442
00:22:49,770 --> 00:22:51,228
PROFESSOR: Right,
so the two things

443
00:22:51,228 --> 00:22:55,750
you should be able to
recapitulate on an exam

444
00:22:55,750 --> 00:22:58,070
is that there are
two assumptions here,

445
00:22:58,070 --> 00:23:00,630
there's growth, and there's
preferential attachment.

446
00:23:00,630 --> 00:23:02,714
And we'll talk about
the degree to which we

447
00:23:02,714 --> 00:23:04,630
think each of those
things might be necessary,

448
00:23:04,630 --> 00:23:11,270
but what's the key-- Can
somebody be a little more

449
00:23:11,270 --> 00:23:12,770
explicit than what
we've been so far

450
00:23:12,770 --> 00:23:14,380
about what the
key observation is

451
00:23:14,380 --> 00:23:17,700
that we're trying to explain?

452
00:23:17,700 --> 00:23:21,060
AUDIENCE: There are nodes
that, inactive nodes,

453
00:23:21,060 --> 00:23:25,860
that have more edges than you
expect, either by random or--

454
00:23:25,860 --> 00:23:26,710
PROFESSOR: Right.

455
00:23:26,710 --> 00:23:27,480
Perfect, right.

456
00:23:27,480 --> 00:23:35,205
Observation-- So some
nodes have lots of edges.

457
00:23:35,205 --> 00:23:37,630
AUDIENCE: I mean it is
sort of a meta irony

458
00:23:37,630 --> 00:23:40,055
that this paper is
now very widely cited.

459
00:23:40,055 --> 00:23:41,520
[LAUGHTER]

460
00:23:41,520 --> 00:23:42,205
PROFESSOR: Yes.

461
00:23:42,205 --> 00:23:43,830
AUDIENCE: Every time
we talk about it--

462
00:23:43,830 --> 00:23:45,700
PROFESSOR: Yes.

463
00:23:45,700 --> 00:23:50,044
Indeed it is ironic,
and we'll talk

464
00:23:50,044 --> 00:23:51,960
about how the scaling
of the citation networks

465
00:23:51,960 --> 00:23:53,510
go in a moment.

466
00:23:53,510 --> 00:23:55,670
Right, so some nodes
have lots of edges,

467
00:23:55,670 --> 00:23:59,915
and you want to be
very clear about this,

468
00:23:59,915 --> 00:24:02,520
this is what you're
trying to explain.

469
00:24:02,520 --> 00:24:05,630
So it's a power law
distribution, so in particular,

470
00:24:05,630 --> 00:24:07,380
you quantify this thing.

471
00:24:07,380 --> 00:24:11,020
That the probability
of having k nodes,

472
00:24:11,020 --> 00:24:13,300
or I'm sorry, the probability
that a node has k edges,

473
00:24:13,300 --> 00:24:19,780
falls off as a power law, 1
over k to some power 2, 3, 4.

474
00:24:19,780 --> 00:24:26,450
So the probability of having
k edges, goes as 1 over k

475
00:24:26,450 --> 00:24:28,310
to some power alpha.

476
00:24:28,310 --> 00:24:33,610
Where alpha is maybe between
2 and 4, for a lot of these.

477
00:24:33,610 --> 00:24:37,774
Now it's important
to make sure that you

478
00:24:37,774 --> 00:24:39,440
keep this qualitative
statement in mind,

479
00:24:39,440 --> 00:24:43,980
because it's true that it
falls off, and sort of rapidly.

480
00:24:43,980 --> 00:24:47,250
Right, 1 over k squared, k
cubed, or k to the fourth,

481
00:24:47,250 --> 00:24:49,190
you'd say, oh, that's a
pretty rapid fall off.

482
00:24:49,190 --> 00:24:50,010
Right?

483
00:24:50,010 --> 00:24:52,540
But it's not rapid
compared to what?

484
00:24:52,540 --> 00:24:54,054
AUDIENCE: [INAUDIBLE].

485
00:24:54,054 --> 00:24:54,970
PROFESSOR: --exponential.

486
00:24:54,970 --> 00:24:55,470
Right.

487
00:24:55,470 --> 00:24:59,050
So for these other models then,
it falls off exponentially.

488
00:24:59,050 --> 00:24:59,550
Right?

489
00:24:59,550 --> 00:25:02,320
So even faster.

490
00:25:02,320 --> 00:25:05,570
So it's easy to look at
1 over k to the fourth,

491
00:25:05,570 --> 00:25:07,710
and think, oh, that's
a fast fall off.

492
00:25:07,710 --> 00:25:10,070
We have to remember
that it's slow

493
00:25:10,070 --> 00:25:11,890
compared to some other things.

494
00:25:11,890 --> 00:25:16,330
So in particular, if you look
at the data for real networks,

495
00:25:16,330 --> 00:25:19,700
and you see that the probability
distribution in many cases

496
00:25:19,700 --> 00:25:22,490
goes over orders of magnitude
in terms of this probability.

497
00:25:22,490 --> 00:25:24,950
You think oh,
that's a big range.

498
00:25:24,950 --> 00:25:27,180
And it is a big
range, but the fact

499
00:25:27,180 --> 00:25:30,980
is that you actually see
some nodes with a thousand

500
00:25:30,980 --> 00:25:32,480
edges or whatnot,
which is something

501
00:25:32,480 --> 00:25:35,100
that you would just never see,
if it were a random network,

502
00:25:35,100 --> 00:25:40,500
or if it were not a power
law distributed network.

503
00:25:40,500 --> 00:25:43,700
And I think that this
is also highlighting

504
00:25:43,700 --> 00:25:49,417
another statement, which is
that a powerful way to make

505
00:25:49,417 --> 00:25:51,750
a difference, for example,
if you're going to write down

506
00:25:51,750 --> 00:25:53,710
a model, or you're
going to do a theory,

507
00:25:53,710 --> 00:25:57,840
is that it's nice if there's
a clear observation that

508
00:25:57,840 --> 00:26:00,405
needs to be explained.

509
00:26:00,405 --> 00:26:01,780
Because you can
always write down

510
00:26:01,780 --> 00:26:06,120
a model of something,
and maybe you'll

511
00:26:06,120 --> 00:26:08,110
find something interesting.

512
00:26:08,110 --> 00:26:11,950
But a way to massively
increase the probability

513
00:26:11,950 --> 00:26:14,830
that you're going to discover
something interesting

514
00:26:14,830 --> 00:26:17,710
is if you already know there's
something interesting there

515
00:26:17,710 --> 00:26:20,880
and that you're
trying to explain it.

516
00:26:20,880 --> 00:26:22,710
And I think that
this is an example

517
00:26:22,710 --> 00:26:24,876
of that, right? There, it
was already an observation,

518
00:26:24,876 --> 00:26:26,160
it was already known.

519
00:26:26,160 --> 00:26:29,510
It's not that he was the first
person to make those plots.

520
00:26:29,510 --> 00:26:32,910
There are other plots of
citation networks before.

521
00:26:32,910 --> 00:26:34,800
So Sid Redner, for
example, had already

522
00:26:34,800 --> 00:26:37,530
done some analyses
of citation networks,

523
00:26:37,530 --> 00:26:41,410
he's a theoretical statistical
physicist over at BU,

524
00:26:41,410 --> 00:26:47,830
but just now, I guess, moving
over to the Santa Fe Institute.

525
00:26:47,830 --> 00:26:49,550
But it's not that he
was the first person

526
00:26:49,550 --> 00:26:52,715
to make that
observation, but he knew

527
00:26:52,715 --> 00:26:55,090
there was something interesting
that needed to be explained.

528
00:26:55,090 --> 00:26:58,810
So I'd say that
for any of you that

529
00:26:58,810 --> 00:27:01,880
are thinking about doing
theory, or writing down models,

530
00:27:01,880 --> 00:27:05,537
I would say, whenever
possible start

531
00:27:05,537 --> 00:27:06,870
with an interesting observation.

532
00:27:09,920 --> 00:27:12,510
So can somebody-- maybe you
guys could just throw out,

533
00:27:12,510 --> 00:27:16,470
what are some examples
of nodes and edges that

534
00:27:16,470 --> 00:27:17,780
were given there or elsewhere?

535
00:27:22,469 --> 00:27:24,577
AUDIENCE: Web pages and links.

536
00:27:24,577 --> 00:27:26,160
PROFESSOR: Right,
web pages and links.

537
00:27:31,050 --> 00:27:35,319
And is this a directed
or undirected?

538
00:27:35,319 --> 00:27:36,110
AUDIENCE: Directed.

539
00:27:36,110 --> 00:27:37,693
PROFESSOR: So this
is indeed directed.

540
00:27:41,460 --> 00:27:42,576
Some others?

541
00:27:42,576 --> 00:27:43,950
AUDIENCE: Movie
stars and movies.

542
00:27:43,950 --> 00:27:45,366
PROFESSOR: Movie
stars and movies.

543
00:27:45,366 --> 00:27:53,610
This one's a funny one, rig--
So movie stars and then this

544
00:27:53,610 --> 00:27:55,650
is like being in a
movie together, right?

545
00:27:55,650 --> 00:27:57,140
So co-starring or so.

546
00:28:02,360 --> 00:28:04,778
Others?

547
00:28:04,778 --> 00:28:06,359
AUDIENCE: Articles
and Citations.

548
00:28:06,359 --> 00:28:07,775
PROFESSOR: Articles
and citations.

549
00:28:20,151 --> 00:28:22,650
And this is again directed, and
this is not directed, right?

550
00:28:25,340 --> 00:28:27,970
And we can maybe even
try to remind ourselves,

551
00:28:27,970 --> 00:28:31,490
this fell off as alpha
was equal to what?

552
00:28:31,490 --> 00:28:33,110
I guess it was 3,
I think they said.

553
00:28:39,280 --> 00:28:48,080
Actors were around
2.3, I guess they said.

554
00:28:48,080 --> 00:28:53,540
The web was 2.1.

555
00:28:53,540 --> 00:28:55,110
Just because it's
a power law doesn't

556
00:28:55,110 --> 00:28:57,960
mean that it's always going
to have the same alpha right?

557
00:28:57,960 --> 00:29:02,320
But for example, what this means
is that for every paper that

558
00:29:02,320 --> 00:29:08,380
has say 200 citations,
there are going

559
00:29:08,380 --> 00:29:12,610
to be roughly 10 papers
that have 100 citations.

560
00:29:12,610 --> 00:29:14,630
If you increase k
by a factor of 2,

561
00:29:14,630 --> 00:29:16,382
you get almost an
order of magnitude

562
00:29:16,382 --> 00:29:18,090
in terms of the
probability distribution.
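[Editor's sketch] The arithmetic behind this statement can be checked directly. This is a minimal illustration, not something from the lecture or the paper: the function names are hypothetical, and the Poisson mean of 10 is an assumed value chosen only for comparison. With alpha = 3, doubling k changes the probability by 2^3 = 8, which is the "roughly 10" quoted above.

```python
import math

def power_law_ratio(k_small, k_large, alpha):
    """Return P(k_small) / P(k_large) when P(k) is proportional to k**(-alpha)."""
    return (k_large / k_small) ** alpha

def poisson_log10_pmf(k, mean):
    """Return log10 of the Poisson probability of exactly k events.

    Working in log space avoids underflow for very unlikely values of k.
    """
    log_p = k * math.log(mean) - mean - math.lgamma(k + 1)
    return log_p / math.log(10)

# Citation example from the lecture: alpha = 3, comparing 100 and 200 citations.
print(power_law_ratio(100, 200, alpha=3.0))

# Hypothetical comparison: how much rarer is a node with 1000 edges than a node
# with 10 edges? First under a power law with alpha = 3, then under a Poisson
# distribution with an assumed mean degree of 10. Both lines print a log10 ratio.
print(math.log10(1 / power_law_ratio(10, 1000, alpha=3.0)))
print(poisson_log10_pmf(1000, mean=10.0) - poisson_log10_pmf(10, mean=10.0))
```

Under the power law the thousand-edge node is six orders of magnitude rarer; under the assumed Poisson it is rarer by well over a thousand orders of magnitude, which is why such hubs would effectively never appear in a random network.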

563
00:29:27,300 --> 00:29:29,390
So this is an
interesting observation,

564
00:29:29,390 --> 00:29:31,110
and where Barabasi
came in and said,

565
00:29:31,110 --> 00:29:35,610
well, what would be a model
that would recapitulate this?

566
00:29:35,610 --> 00:29:38,485
And what are the models that
did not recapitulate it?

567
00:29:43,214 --> 00:29:45,142
AUDIENCE: [INAUDIBLE].

568
00:29:45,142 --> 00:29:45,850
PROFESSOR: Right.

569
00:29:45,850 --> 00:29:50,300
So the Erdos Renyi--
so other models,

570
00:29:50,300 --> 00:29:56,550
there's the E R,
other models, there's

571
00:29:56,550 --> 00:30:04,580
the Erdos Renyi
network, random network,

572
00:30:04,580 --> 00:30:10,470
and that's because here the
degree distribution is peaked

573
00:30:10,470 --> 00:30:12,830
around something and then
falls off exponentially

574
00:30:12,830 --> 00:30:14,270
as you go above that.

575
00:30:14,270 --> 00:30:16,740
And this is actually where
I think the equations are

576
00:30:16,740 --> 00:30:20,620
wrong in this paper.

577
00:30:20,620 --> 00:30:24,720
Because if you look at
the paper, page 510,

578
00:30:24,720 --> 00:30:26,970
where they say the
Erdos Renyi, you

579
00:30:26,970 --> 00:30:29,400
connect the edges
with probability p,

580
00:30:29,400 --> 00:30:33,640
and then they say you get a
Poisson distribution, p of k,

581
00:30:33,640 --> 00:30:36,180
where lambda the mean is
something, but then they say,

582
00:30:36,180 --> 00:30:39,440
oh lambda is equal to some
binomial of something of k,

583
00:30:39,440 --> 00:30:40,790
and so forth.

584
00:30:40,790 --> 00:30:45,560
So I think this is all not
true, but rather that you

585
00:30:45,560 --> 00:30:48,450
can approximate the
binomial with a Poisson

586
00:30:48,450 --> 00:30:50,300
in the limit of small p.

587
00:30:54,740 --> 00:30:57,840
So be aware if you're
looking at that.
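[Editor's sketch] The approximation mentioned here can be checked numerically. This is a minimal sketch under stated assumptions, not the paper's equations: in a random network where each possible edge exists with probability p, a node with n potential partners has a binomially distributed degree, and for small p this is close to a Poisson distribution with mean n times p. The function names and the example values of n and p are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def max_abs_difference(n, p, k_max=20):
    """Largest pointwise gap between Binomial(n, p) and Poisson(n * p) for k <= k_max."""
    lam = n * p
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(k_max, n) + 1)
    )

# Same mean degree (4) in both cases; only the size of p differs.
print(max_abs_difference(n=1000, p=0.004))  # small p: close agreement
print(max_abs_difference(n=10, p=0.4))      # large p: visibly worse
```

The two cases share the same mean, so the difference in agreement comes only from how small p is.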

588
00:31:00,900 --> 00:31:03,210
I know-- there was
another network that--

589
00:31:03,210 --> 00:31:04,496
Do you have a question?

590
00:31:04,496 --> 00:31:05,371
AUDIENCE: Oh, no, no.

591
00:31:05,371 --> 00:31:07,120
I was--

592
00:31:07,120 --> 00:31:11,430
PROFESSOR: So we're
going to spend

593
00:31:11,430 --> 00:31:14,200
a lot of time talking about
probability distributions

594
00:31:14,200 --> 00:31:17,440
in the coming weeks, but I just
wanted to highlight that there,

595
00:31:17,440 --> 00:31:19,750
as far as I can tell, that is
not true what they say.

596
00:31:22,330 --> 00:31:25,590
But there was one other
model for a network

597
00:31:25,590 --> 00:31:27,760
that they talk about,
or they mention.

598
00:31:27,760 --> 00:31:29,094
Does anybody--

599
00:31:29,094 --> 00:31:30,010
AUDIENCE: Small world.

600
00:31:30,010 --> 00:31:32,960
PROFESSOR: The so-called
small world network, right?

601
00:31:32,960 --> 00:31:36,830
And this is-- small
world network,

602
00:31:36,830 --> 00:31:44,640
and this is based on a paper by
Strogatz-- Watts and Strogatz,

603
00:31:44,640 --> 00:31:46,170
small world.

604
00:31:46,170 --> 00:31:52,574
That's Watts and
Strogatz, and this

605
00:31:52,574 --> 00:31:54,490
was a paper where they
demonstrated that there

606
00:31:54,490 --> 00:31:56,650
was a very simple mechanism.

607
00:31:56,650 --> 00:31:58,880
Just by rewiring
a network that you

608
00:31:58,880 --> 00:32:01,880
could get this so-called
small world phenomenon.

609
00:32:01,880 --> 00:32:06,110
Where the Kevin Bacon thing,
where you can take any--

610
00:32:06,110 --> 00:32:09,090
You're right, from Kevin Bacon,
and this is actually the actor

611
00:32:09,090 --> 00:32:11,770
network, so you could say,
starting with Kevin Bacon

612
00:32:11,770 --> 00:32:14,981
can you construct
a list of actors

613
00:32:14,981 --> 00:32:16,480
that costarred with
each person that

614
00:32:16,480 --> 00:32:17,810
gets you to any given actor.

615
00:32:17,810 --> 00:32:19,185
And the statement
is that you are

616
00:32:19,185 --> 00:32:23,180
supposed to be able to do
that from a path of six.

617
00:32:23,180 --> 00:32:24,760
So that all the
actors are supposed

618
00:32:24,760 --> 00:32:28,094
to be connected to
Kevin Bacon by six.

619
00:32:28,094 --> 00:32:29,510
Although maybe you
guys don't even

620
00:32:29,510 --> 00:32:32,230
remember who Kevin
Bacon is anymore.

621
00:32:32,230 --> 00:32:32,896
Oh, you do?

622
00:32:32,896 --> 00:32:35,430
OK.

623
00:32:35,430 --> 00:32:40,800
This rule works for anybody so
just insert your favorite actor

624
00:32:40,800 --> 00:32:43,840
into that sentence.

625
00:32:43,840 --> 00:32:45,470
And it's important,
just to mention

626
00:32:45,470 --> 00:32:48,280
that just because something
is a small world network,

627
00:32:48,280 --> 00:32:51,880
does not mean that it has
power law distributions.

628
00:32:51,880 --> 00:32:56,170
It may be the case that
many power law networks also

629
00:32:56,170 --> 00:32:58,960
have this small world
character, and I'd

630
00:32:58,960 --> 00:33:01,750
say maybe even most
of them, because some

631
00:33:01,750 --> 00:33:04,100
of those highly
connected nodes are

632
00:33:04,100 --> 00:33:06,354
going to be useful
for connecting anybody

633
00:33:06,354 --> 00:33:07,020
to anybody else.

634
00:33:07,020 --> 00:33:09,960
But that's not required to
get the small world character.

635
00:33:13,600 --> 00:33:15,715
Any questions about
that statement?

636
00:33:20,857 --> 00:33:23,148
AUDIENCE: So you can go from
this small world statement

637
00:33:23,148 --> 00:33:26,136
to any sort of strong statement
concerning connectivity?

638
00:33:29,140 --> 00:33:31,950
PROFESSOR: Well stron-- I guess
that the strong statement is

639
00:33:31,950 --> 00:33:38,890
that this property does
not imply that property.

640
00:33:38,890 --> 00:33:41,190
AUDIENCE: You're not
saying that the converse is

641
00:33:41,190 --> 00:33:43,948
true [INAUDIBLE]
because it seems

642
00:33:43,948 --> 00:33:46,700
like, at least the
examples we've listed,

643
00:33:46,700 --> 00:33:47,724
ought to be small world.

644
00:33:47,724 --> 00:33:48,390
PROFESSOR: Yeah.

645
00:33:48,390 --> 00:33:49,090
I agree.

646
00:33:49,090 --> 00:33:52,300
I think that this
small world property,

647
00:33:52,300 --> 00:33:57,820
that's why I'm saying that,
it's-- What I do not know,

648
00:33:57,820 --> 00:34:01,444
it's whether it would be
possible to construct a power

649
00:34:01,444 --> 00:34:03,860
law distributed network that
does not have the small world

650
00:34:03,860 --> 00:34:07,750
property, but I would say is
that the ones that I'm aware

651
00:34:07,750 --> 00:34:09,960
of would have the small
world property.

652
00:34:14,679 --> 00:34:16,840
Any other questions
about where we are?

653
00:34:16,840 --> 00:34:19,260
So there's interesting
properties of networks

654
00:34:19,260 --> 00:34:21,530
that we would like to explain.

655
00:34:21,530 --> 00:34:25,469
And I would say that
what this paper does,

656
00:34:25,469 --> 00:34:27,929
I think kind of convincingly,
is that they demonstrate

657
00:34:27,929 --> 00:34:31,810
that at least this model, and
we'll get into the assumptions,

658
00:34:31,810 --> 00:34:35,078
does lead to a power
law distributed network.

659
00:34:42,150 --> 00:34:44,060
The answer to the
reading questions

660
00:34:44,060 --> 00:34:48,350
about whether both of these
are strictly necessary,

661
00:34:48,350 --> 00:34:51,860
I think was an
interesting one, and I'd

662
00:34:51,860 --> 00:34:55,699
say that this gets into
the wider issue of there's

663
00:34:55,699 --> 00:34:59,400
an observation that
is maybe interesting.

664
00:34:59,400 --> 00:35:01,910
And then we want to
understand why that might be,

665
00:35:01,910 --> 00:35:04,500
and then what you can do is
you can write down a model that

666
00:35:04,500 --> 00:35:05,736
leads to that behavior.

667
00:35:05,736 --> 00:35:06,860
We've already talked about.

668
00:35:06,860 --> 00:35:08,901
Does that prove that the
assumptions of the model

669
00:35:08,901 --> 00:35:10,330
are correct?

670
00:35:10,330 --> 00:35:11,580
No.

671
00:35:11,580 --> 00:35:14,850
In this case, these are
pretty generic features

672
00:35:14,850 --> 00:35:16,190
of lots and lots of networks.

673
00:35:16,190 --> 00:35:19,780
So when you read it
you kind of believe

674
00:35:19,780 --> 00:35:22,510
that this is a
dominant mechanism,

675
00:35:22,510 --> 00:35:25,770
but it very much does not
prove that these are the only,

676
00:35:25,770 --> 00:35:28,710
this is not at all the only way
to get a power law distributed

677
00:35:28,710 --> 00:35:29,354
network.

678
00:35:29,354 --> 00:35:31,270
I'd say that some of the
language in the paper

679
00:35:31,270 --> 00:35:35,350
might kind of lead you to
believe that that is the case,

680
00:35:35,350 --> 00:35:38,046
and I think this is a standard
logical fallacy that we have

681
00:35:38,046 --> 00:35:39,420
to be careful of,
and something I

682
00:35:39,420 --> 00:35:43,044
think that some of the language
is a little bit dangerous.

683
00:35:43,044 --> 00:35:44,710
The development of
the power law scaling

684
00:35:44,710 --> 00:35:47,168
in the model indicates that growth
and preferential attachment

685
00:35:47,168 --> 00:35:50,120
play an important
role in networ-- I'd

686
00:35:50,120 --> 00:35:54,260
say that it's quite true,
but once again this question

687
00:35:54,260 --> 00:35:56,050
of-- This is
certainly not a proof,

688
00:35:56,050 --> 00:35:59,040
that those assumptions are
relevant for any given network.

689
00:35:59,040 --> 00:36:05,340
Of course, in all of these
cases, the network does grow,

690
00:36:05,340 --> 00:36:07,850
and there is
preferential attachment.

691
00:36:07,850 --> 00:36:09,850
But there are other
things that are also true,

692
00:36:09,850 --> 00:36:11,340
that may be important,
for example,

693
00:36:11,340 --> 00:36:14,230
in determining exactly what
alpha is or in other things.

694
00:36:14,230 --> 00:36:16,000
And I think that, as
indicated, there

695
00:36:16,000 --> 00:36:18,470
are other ways of
getting power law

696
00:36:18,470 --> 00:36:20,790
networks without making
the exact assumptions that

697
00:36:20,790 --> 00:36:21,600
are here.

698
00:36:21,600 --> 00:36:26,340
But it's, in my
mind, it's probably

699
00:36:26,340 --> 00:36:29,160
a or the dominant mechanism
in a lot of these networks.

700
00:36:29,160 --> 00:36:31,220
I think it's a fine
paper, but just

701
00:36:31,220 --> 00:36:34,361
remember that it doesn't
prove that those are

702
00:36:34,361 --> 00:36:35,610
the only two important things.

703
00:36:39,660 --> 00:36:40,342
Yes?

704
00:36:40,342 --> 00:36:42,425
AUDIENCE: Just about the
preferential attachment,

705
00:36:42,425 --> 00:36:44,900
I think you mentioned that
you tried different ways,

706
00:36:44,900 --> 00:36:47,170
and only the linear one was

707
00:36:47,170 --> 00:36:48,437
AUDIENCE: [INAUDIBLE].

708
00:36:48,437 --> 00:36:49,270
PROFESSOR: Troubling.

709
00:36:49,270 --> 00:36:51,490
AUDIENCE: [INAUDIBLE].

710
00:36:51,490 --> 00:36:52,340
PROFESSOR: I agree.

711
00:36:52,340 --> 00:36:52,840
I agree.

712
00:36:55,580 --> 00:36:59,220
And what they assume
in the model here

713
00:36:59,220 --> 00:37:02,180
is that the preferential
attachment goes linearly

714
00:37:02,180 --> 00:37:06,130
with the number
of existing edges.
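[Editor's sketch] The two assumptions, growth and linear preferential attachment, can be put into a short simulation. This is a minimal sketch and not the authors' code: the function name, the choice of m = 2 edges per new node, the fully connected starting core, and the network size are all assumptions made for illustration.

```python
import random

def grow_network(n_nodes, m=2, seed=0):
    """Grow a network one node at a time and return the list of node degrees.

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    degree = [0] * n_nodes
    # Every edge stores both endpoints here, so a uniform pick from this list
    # selects a node with probability proportional to its degree.
    endpoints = []
    # Assumed starting point: a fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            endpoints += [i, j]
            degree[i] += 1
            degree[j] += 1
    # Growth: add the remaining nodes one by one.
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            endpoints += [new, target]
            degree[new] += 1
            degree[target] += 1
    return degree

degree = grow_network(2000, m=2, seed=0)
mean_degree = sum(degree) / len(degree)
print("mean degree:", mean_degree)
print("largest degree:", max(degree))
```

The mean degree stays close to 2m, while the oldest nodes accumulate many more edges than that, which is the qualitative signature being explained. Replacing the degree-proportional pick with a uniform pick over existing nodes is one way to explore what growth alone produces.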

715
00:37:06,130 --> 00:37:09,290
And I would say that
I very much believe

716
00:37:09,290 --> 00:37:12,414
that preferential attachment
is present in all those things,

717
00:37:12,414 --> 00:37:14,330
but I'm sure that if you
go and you measure it

718
00:37:14,330 --> 00:37:15,800
you're not going to
find that it's linear

719
00:37:15,800 --> 00:37:16,841
with the number of edges.

720
00:37:16,841 --> 00:37:19,585
It's going to--
actually, I don't

721
00:37:19,585 --> 00:37:21,460
know what you'll find
in each of those cases,

722
00:37:21,460 --> 00:37:24,760
but there's no reason to
believe it has to be linear.

723
00:37:24,760 --> 00:37:26,940
That being said it may
be, the question is

724
00:37:26,940 --> 00:37:30,020
how strong of a deviation
from linearity is there?

725
00:37:30,020 --> 00:37:33,290
And then how sensitive is the
power law behavior to that?

726
00:37:33,290 --> 00:37:35,150
And that's the kind
of thing that I'm

727
00:37:35,150 --> 00:37:37,970
sure that one of the
20,000 papers that

728
00:37:37,970 --> 00:37:40,380
have cited this paper
in the last 15 years

729
00:37:40,380 --> 00:37:43,522
addresses this issue.

730
00:37:43,522 --> 00:37:45,980
Yeah, but I mean, this is also
why there are so many papers

731
00:37:45,980 --> 00:37:47,600
that have cite-- It's like
you read this paper, like oh,

732
00:37:47,600 --> 00:37:50,016
you know, it would be really
interesting to do this, tha--

733
00:37:50,016 --> 00:37:52,870
and people have been
following that interest.

734
00:37:56,940 --> 00:37:59,730
Let's go and-- I think that
the derivation is a little bit

735
00:37:59,730 --> 00:38:03,790
tricky, and so I think it's
worth just walking through it.

736
00:38:06,375 --> 00:38:08,000
Especially since some
people apparently

737
00:38:08,000 --> 00:38:12,000
couldn't even get the equations,
which is going to be a problem.

738
00:38:31,700 --> 00:38:33,410
Maybe while we're
on this question

739
00:38:33,410 --> 00:38:39,790
of preferential
attachment-- How do

740
00:38:39,790 --> 00:38:43,510
you guys feel about this
question of networks

741
00:38:43,510 --> 00:38:45,850
within, say the
transcriptional network

742
00:38:45,850 --> 00:38:47,210
of E. coli or other cells?

743
00:38:47,210 --> 00:38:51,230
I mean do you think that
these properties are

744
00:38:51,230 --> 00:38:53,140
relevant in the cell or--

745
00:39:10,870 --> 00:39:15,362
So what would growth mean?

746
00:39:15,362 --> 00:39:17,810
AUDIENCE: [INAUDIBLE].

747
00:39:17,810 --> 00:39:19,865
PROFESSOR: So growth
would correspond

748
00:39:19,865 --> 00:39:20,740
to adding a new gene.

749
00:39:20,740 --> 00:39:22,269
Does that ever happen?

750
00:39:22,269 --> 00:39:22,852
AUDIENCE: Yes.

751
00:39:26,390 --> 00:39:30,130
PROFESSOR: Can someone give
a possible mechanism

752
00:39:30,130 --> 00:39:33,260
by which a new gene is
added to the genome?

753
00:39:33,260 --> 00:39:34,550
AUDIENCE: Duplication.

754
00:39:34,550 --> 00:39:38,010
PROFESSOR: For example,
duplication is common, right?

755
00:39:38,010 --> 00:39:39,151
e.g.

756
00:39:39,151 --> 00:39:39,650
duplication.

757
00:39:43,040 --> 00:39:47,910
So what does this mean for
preferential attachment?

758
00:39:56,557 --> 00:39:58,890
AUDIENCE: --duplicate the
gene and it will probably also

759
00:39:58,890 --> 00:40:02,572
duplicate the promoter
region, which means--

760
00:40:02,572 --> 00:40:03,280
PROFESSOR: Right.

761
00:40:03,280 --> 00:40:05,690
So this, I think,
is very interesting.

762
00:40:05,690 --> 00:40:08,370
So duplication,
in general you'll

763
00:40:08,370 --> 00:40:11,130
duplicate both the coding region
that makes protein, but also maybe

764
00:40:11,130 --> 00:40:14,610
the promoter region that
specifies the regulation.

765
00:40:14,610 --> 00:40:18,820
So if you imagine
you have some x here

766
00:40:18,820 --> 00:40:23,290
that is-- And we can
remind ourselves,

767
00:40:23,290 --> 00:40:26,780
are both the incoming and
outgoing edges power law

768
00:40:26,780 --> 00:40:29,190
distributed in
transcription networks?

769
00:40:29,190 --> 00:40:32,320
No, I know this was in
the pre-class reading,

770
00:40:32,320 --> 00:40:33,860
but just in case.

771
00:40:33,860 --> 00:40:38,460
So what you find is that some
transcription factors regulate

772
00:40:38,460 --> 00:40:42,570
many genes, but we don't
have any proteins that

773
00:40:42,570 --> 00:40:49,220
are regulated by 200 genes,
so in that sense typically

774
00:40:49,220 --> 00:40:51,230
we have the things
that are regulated,

775
00:40:51,230 --> 00:40:56,360
there's maybe some x1, x2, x3.

776
00:40:56,360 --> 00:40:58,740
And there might be a
few incoming edges,

777
00:40:58,740 --> 00:41:01,950
so the expression of a gene
is typically specified

778
00:41:01,950 --> 00:41:04,460
by a few transcription factors.

779
00:41:04,460 --> 00:41:06,180
Whereas some
transcription factors

780
00:41:06,180 --> 00:41:09,700
might have 100 outgoing edges.

781
00:41:09,700 --> 00:41:12,140
So it's the outgoing edges
that are power law distributed,

782
00:41:12,140 --> 00:41:16,510
and the ingoing are closer
to being Poisson or so.

783
00:41:16,510 --> 00:41:21,420
So you can imagine that this
guy might have 100 or so,

784
00:41:21,420 --> 00:41:23,410
whereas over here
some y transcription

785
00:41:23,410 --> 00:41:27,690
factor that is just regulating
two genes, say y1, and y2.

786
00:41:30,520 --> 00:41:35,290
Now, question is,
if gene duplication

787
00:41:35,290 --> 00:41:38,580
occurs kind of randomly
throughout the genome,

788
00:41:38,580 --> 00:41:43,740
which transcription
factor x or y

789
00:41:43,740 --> 00:41:48,282
is more likely to have a
target that's duplicated?

790
00:41:48,282 --> 00:41:49,250
AUDIENCE: x.

791
00:41:49,250 --> 00:41:51,150
PROFESSOR: x, all right.

792
00:41:51,150 --> 00:41:53,215
Interestingly, how
does that scale

793
00:41:53,215 --> 00:41:54,340
with the number of targets?

794
00:41:57,258 --> 00:41:58,150
AUDIENCE: Linear?

795
00:41:58,150 --> 00:42:00,540
PROFESSOR: This actually
is linear, right?

796
00:42:00,540 --> 00:42:03,930
So I'd say that gene
duplication does

797
00:42:03,930 --> 00:42:07,970
give growth and
preferential attachment that

798
00:42:07,970 --> 00:42:11,680
is basically linear with
the number of targets.

799
00:42:11,680 --> 00:42:16,930
It's interesting I'd say I
find this kind of observation

800
00:42:16,930 --> 00:42:18,860
quite interesting,
and compelling,

801
00:42:18,860 --> 00:42:22,080
and makes me feel kind
of comfortable about this

802
00:42:22,080 --> 00:42:25,540
as a mechanism for some
of the global properties.
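[Editor's sketch] The linear scaling in the duplication argument can be written out explicitly. This is a minimal sketch under two assumptions taken from the discussion: every gene is equally likely to be duplicated, and the copy keeps its promoter, so it keeps its regulators. The function name and the genome size of 4000 genes are hypothetical; the target counts of 100 and 2 are the ones used for x and y above.

```python
def gain_probability(targets_per_tf, genome_size):
    """Probability that one random gene duplication adds a target to each factor.

    Assumes uniform duplication over genes and that the duplicate inherits
    the promoter, and therefore the incoming regulatory edges, of the original.
    """
    return {tf: k / genome_size for tf, k in targets_per_tf.items()}

# x regulates 100 genes and y regulates 2, as in the example on the board.
# The genome size is an assumed, illustrative value.
probs = gain_probability({"x": 100, "y": 2}, genome_size=4000)
print(probs)
print("x is favored over y by a factor of", probs["x"] / probs["y"])
```

The chance of gaining a new outgoing edge is proportional to the number of targets a factor already has, which is linear preferential attachment without any selection being involved.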

803
00:42:25,540 --> 00:42:27,750
I mean there's no
selection, there's

804
00:42:27,750 --> 00:42:30,010
no way to explain the
interesting network motifs

805
00:42:30,010 --> 00:42:31,468
and so forth here,
but I'd say just

806
00:42:31,468 --> 00:42:33,210
in terms of some
general properties I

807
00:42:33,210 --> 00:42:34,640
think it's interesting.

808
00:42:34,640 --> 00:42:37,390
Of course, once
again not a proof.

809
00:42:37,390 --> 00:42:40,100
Evolution can do whatever
it wants with these gene

810
00:42:40,100 --> 00:42:44,360
duplication events, but also
I would say not everybody

811
00:42:44,360 --> 00:42:47,140
finds this argument
very, very compelling.

812
00:42:47,140 --> 00:42:49,540
But I'd say I think
it's kind of-- I

813
00:42:49,540 --> 00:42:51,844
get a warm fuzzy feeling inside.

814
00:42:51,844 --> 00:42:54,229
AUDIENCE: We're talking
about transcription network,

815
00:42:54,229 --> 00:42:56,770
it's different from the other
networks you were talking about

816
00:42:56,770 --> 00:43:00,760
in that you also lose genes,
and so is there any discussion--

817
00:43:00,760 --> 00:43:06,647
PROFESSOR: Well you know, you
could lose web pages, you can--

818
00:43:06,647 --> 00:43:08,272
AUDIENCE: Are you
losing them nearly as

819
00:43:08,272 --> 00:43:13,090
fast as you're adding them?

820
00:43:13,090 --> 00:43:15,510
PROFESSOR: Yeah, I don't know.

821
00:43:15,510 --> 00:43:20,190
I find that lots of links
to my web pages just

822
00:43:20,190 --> 00:43:26,090
disappear over time, and I--
It's a reasonable question.

823
00:43:26,090 --> 00:43:31,450
I don't-- In some of these
you say, oh well right,

824
00:43:31,450 --> 00:43:33,970
so the web has been
growing a lot recently,

825
00:43:33,970 --> 00:43:37,370
and so then we'd say the birth
dominates over death there.

826
00:43:37,370 --> 00:43:40,830
Where if you talk about genome
sizes along different lineages,

827
00:43:40,830 --> 00:43:42,686
it certainly is not
growing exponentially

828
00:43:42,686 --> 00:43:43,560
the way the web pag--

829
00:43:43,560 --> 00:43:46,340
I think that that's
fair and true,

830
00:43:46,340 --> 00:43:49,410
but we haven't really actually
specified or made clear,

831
00:43:49,410 --> 00:43:53,400
within a model what happens if
you allow for birth and death.

832
00:43:53,400 --> 00:43:55,830
But I think that you
could introduce death

833
00:43:55,830 --> 00:43:57,590
and recapitulate these
behaviors, so it's

834
00:43:57,590 --> 00:44:01,030
not-- I think just because
some nodes disappear,

835
00:44:01,030 --> 00:44:02,640
doesn't mean that
we have to throw

836
00:44:02,640 --> 00:44:03,962
the whole idea out the window.

837
00:44:07,600 --> 00:44:11,040
But in the presence
of evolution this

838
00:44:11,040 --> 00:44:12,880
is all very complicated, right?

839
00:44:12,880 --> 00:44:15,525
So you can't carry
this argument too far.

840
00:44:19,700 --> 00:44:20,950
AUDIENCE: So it's [INAUDIBLE].

841
00:44:26,850 --> 00:44:28,570
PROFESSOR: Well
what we're assuming

842
00:44:28,570 --> 00:44:33,122
is that there is some segment of
DNA that's in front of the gene

843
00:44:33,122 --> 00:44:34,580
that specifies--
gives instructions

844
00:44:34,580 --> 00:44:40,050
of when to transcribe the gene.

845
00:44:40,050 --> 00:44:43,410
So the linearity is
really just assuming

846
00:44:43,410 --> 00:44:48,782
that genes have the same rate
of being duplicated on average.

847
00:44:48,782 --> 00:44:50,240
And this is a very
global property,

848
00:44:50,240 --> 00:44:55,982
so I think that it's
kind of roughly--

849
00:44:55,982 --> 00:44:58,190
I would say it's the null
model that you would use,

850
00:44:58,190 --> 00:44:59,737
if you had to
write a null model.

851
00:45:05,735 --> 00:45:07,860
AUDIENCE: Is there anything
in looking for evidence

852
00:45:07,860 --> 00:45:09,870
to support [INAUDIBLE].

853
00:45:09,870 --> 00:45:12,910
PROFESSOR: That's an
interesting question.

854
00:45:12,910 --> 00:45:14,840
It's hard to know
what it would even

855
00:45:14,840 --> 00:45:17,710
mean to collect the evidence
to support it in the sense

856
00:45:17,710 --> 00:45:21,260
that-- You're saying along
different evolutionary

857
00:45:21,260 --> 00:45:27,640
lineages, could we say that
it's more likely to grow.

858
00:45:27,640 --> 00:45:31,360
Of course the other thing to
say is that, the rate of death

859
00:45:31,360 --> 00:45:33,210
would also scale linearly.

860
00:45:33,210 --> 00:45:37,739
In the sense that a gene
being stochastically removed

861
00:45:37,739 --> 00:45:39,530
from the genome should
also scale linearly,

862
00:45:39,530 --> 00:45:41,570
so it's not that
you don't actually

863
00:45:41,570 --> 00:45:45,110
then expect there to be
any systematic change.

864
00:45:45,110 --> 00:45:47,010
I mean it's not as
simple as just saying, oh

865
00:45:47,010 --> 00:45:50,990
the number of targets of
a transcription factor

866
00:45:50,990 --> 00:45:53,002
with many targets
should grow faster.

867
00:45:53,002 --> 00:45:54,460
It's really that
the expectation is

868
00:45:54,460 --> 00:45:59,230
that it should be changing
faster because both duplication

869
00:45:59,230 --> 00:46:01,020
and removal would
both be increasing.

870
00:46:01,020 --> 00:46:04,370
So I think the signature is not
totally obvious in that sense.

871
00:46:10,966 --> 00:46:12,340
So how many people
actually tried

872
00:46:12,340 --> 00:46:16,710
to piece this derivation apart?

873
00:46:16,710 --> 00:46:18,570
Anybody?

874
00:46:18,570 --> 00:46:22,790
All right, and were you happy
with it at the end of your--

875
00:46:22,790 --> 00:46:23,790
AUDIENCE: I think that--

876
00:46:23,790 --> 00:46:25,120
PROFESSOR: --permissions?

877
00:46:25,120 --> 00:46:28,380
AUDIENCE: --that I was
a little bit iffy about.

878
00:46:28,380 --> 00:46:32,030
PROFESSOR: There is like a
crux of the climb at the end.

879
00:46:32,030 --> 00:46:36,370
So let's make sure that we can
understand what happened there.

880
00:46:36,370 --> 00:46:39,240
It's worth-- since we
read the paper it's worth

881
00:46:39,240 --> 00:46:40,240
trying to figure it out.

882
00:46:44,000 --> 00:46:50,720
So what we're going to assume
is that we start with m0 nodes.

883
00:46:54,080 --> 00:46:56,700
So they're going to
be here, and the idea

884
00:46:56,700 --> 00:46:59,010
is it doesn't really matter
how we start this thing.

885
00:46:59,010 --> 00:47:01,480
They might start out
being unconnected,

886
00:47:01,480 --> 00:47:03,180
or they might be connected.

887
00:47:03,180 --> 00:47:06,910
But over time the
signature of how we start

888
00:47:06,910 --> 00:47:09,892
is not supposed to
be that important.

889
00:47:09,892 --> 00:47:12,050
What we're going to do is
at each time point we're

890
00:47:12,050 --> 00:47:14,830
going to add one more node.

891
00:47:14,830 --> 00:47:20,500
And as we do that we're
going to add m edges as well.

892
00:47:20,500 --> 00:47:25,420
So we then have the number
of, we'll say, nodes,

893
00:47:25,420 --> 00:47:30,860
N, as a function of time, is
going to be equal to what?

894
00:47:35,510 --> 00:47:37,124
[INTERPOSING VOICES]

895
00:47:37,124 --> 00:47:37,790
PROFESSOR: Right.

896
00:47:37,790 --> 00:47:40,120
This is just going to be--
we're going to start at m0

897
00:47:40,120 --> 00:47:44,530
and we're going to add
1 each time, m0 plus t.

898
00:47:44,530 --> 00:47:49,000
Number of edges is just going
to be equal to the number

899
00:47:49,000 --> 00:47:54,190
that we add each time
point, times the time.

900
00:47:54,190 --> 00:47:56,142
So here we're assuming
that we start out

901
00:47:56,142 --> 00:47:57,600
with these nodes
being unconnected.

902
00:48:00,830 --> 00:48:04,540
Now we're given the
assumption that there's

903
00:48:04,540 --> 00:48:06,840
preferential attachment,
so that means

904
00:48:06,840 --> 00:48:11,150
that the probability
of connecting

905
00:48:11,150 --> 00:48:15,430
to some i-th node
that has k edges

906
00:48:15,430 --> 00:48:19,650
is going to be k
sub i divided

907
00:48:19,650 --> 00:48:28,350
by the sum over all the edges.
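The setup so far can be written compactly in the lecture's notation. This is only a sketch: the symbol \Pi for the attachment probability and the summation index j are assumed names, not taken from the board, and E(t) = m t assumes the initial m0 nodes start out unconnected.

```latex
N(t) = m_0 + t, \qquad E(t) = m\,t, \qquad
\Pi(k_i) = \frac{k_i}{\sum_j k_j}
```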

908
00:48:30,856 --> 00:48:31,355
Yes?

909
00:48:31,355 --> 00:48:33,765
AUDIENCE: Why is [INAUDIBLE]?

910
00:48:33,765 --> 00:48:35,390
PROFESSOR: All right,
so the assumption

911
00:48:35,390 --> 00:48:41,010
is at each time point we add a
new node, let's say this node,

912
00:48:41,010 --> 00:48:45,440
and with that we bring in
some number, m, of new edges.

913
00:48:45,440 --> 00:48:48,780
So this could be 3,
and then we go randomly

914
00:48:48,780 --> 00:48:52,710
to 3 of the existing nodes.

915
00:48:52,710 --> 00:48:56,083
So each time point
we add m edges.

916
00:48:56,083 --> 00:48:59,464
AUDIENCE: How do we necessarily
add them to the new node?

917
00:48:59,464 --> 00:49:00,430
Like [INAUDIBLE].

918
00:49:07,741 --> 00:49:09,990
PROFESSOR: I'm sorry I don't
understa-- oh yeah right,

919
00:49:09,990 --> 00:49:14,150
so the assumption is that
the new node is indeed

920
00:49:14,150 --> 00:49:18,300
being connected to-- that
all m edges that we're adding

921
00:49:18,300 --> 00:49:19,470
are to this new node.

922
00:49:23,570 --> 00:49:25,576
So this is the linear
preferential attachment

923
00:49:25,576 --> 00:49:26,700
that we were talking about.

924
00:49:42,510 --> 00:49:44,120
So what we want
to know first, is

925
00:49:44,120 --> 00:49:49,240
how after a node is connected,
how is it that its number of edges

926
00:49:49,240 --> 00:49:52,010
will grow over time.

927
00:49:52,010 --> 00:49:55,400
What we know is
that when it's first

928
00:49:55,400 --> 00:50:00,000
added it has
exactly m edges, right?

929
00:50:00,000 --> 00:50:02,797
But then as new nodes come,
then we'll maybe get some more

930
00:50:02,797 --> 00:50:03,630
and then it'll grow.

931
00:50:07,400 --> 00:50:12,090
And in particular
we want to get--

932
00:50:12,090 --> 00:50:15,860
We're told that it's going
to grow as this differential

933
00:50:15,860 --> 00:50:24,400
equation, so we want
to kind of get to this.

934
00:50:24,400 --> 00:50:26,050
And the way to
think about this is

935
00:50:26,050 --> 00:50:29,200
that, all right well, how is
it that the number of edges

936
00:50:29,200 --> 00:50:33,580
will change at each time
point, so delta k i.

937
00:50:33,580 --> 00:50:38,300
Well the expected
number of edges

938
00:50:38,300 --> 00:50:41,680
that will be attached to
some node, well that's

939
00:50:41,680 --> 00:50:46,870
going to be m,
this is the number

940
00:50:46,870 --> 00:50:50,170
of edges that were attached
by this incoming node,

941
00:50:50,170 --> 00:50:56,350
times this probability of
attaching to this node.

942
00:50:56,350 --> 00:51:00,350
So this is the
probability of k i.

943
00:51:00,350 --> 00:51:03,680
Now this is in one time step.

944
00:51:03,680 --> 00:51:04,970
So this is really a delta k i.

945
00:51:04,970 --> 00:51:07,261
If we want, we could say over
some delta t, which is 1.

946
00:51:09,980 --> 00:51:12,462
So from that standpoint, we
can actually then write it

947
00:51:12,462 --> 00:51:13,920
as a differential
equation, where you

948
00:51:13,920 --> 00:51:17,320
say the change in this number
of edges with respect to time

949
00:51:17,320 --> 00:51:21,130
is indeed going to
be equal to m times

950
00:51:21,130 --> 00:51:27,870
this guy here, which
is the number of edges

951
00:51:27,870 --> 00:51:34,410
that that node has at this time,
divided by this sum over all

952
00:51:34,410 --> 00:51:34,910
those edges.

953
00:51:40,240 --> 00:51:44,340
This is just kind of the
expected number of edges

954
00:51:44,340 --> 00:51:46,403
to be added to that
node at each time point.

955
00:51:49,360 --> 00:51:52,516
What does this thing-- What
is that thing equal to?

956
00:51:52,516 --> 00:51:54,488
Yes?

957
00:51:54,488 --> 00:51:56,131
AUDIENCE: --that
equation, because it

958
00:51:56,131 --> 00:51:58,925
seemed like you just wrote
the same equation on the line

959
00:51:58,925 --> 00:52:00,404
above that line.

960
00:52:00,404 --> 00:52:03,855
You just substituted it--

961
00:52:03,855 --> 00:52:04,841
PROFESSOR: I did.

962
00:52:04,841 --> 00:52:12,314
AUDIENCE: OK, but [INAUDIBLE]
wrote it as [INAUDIBLE].

963
00:52:14,917 --> 00:52:17,250
PROFESSOR: Yeah, so this is
kind of the discrete version

964
00:52:17,250 --> 00:52:19,269
of this differential equation.

965
00:52:19,269 --> 00:52:19,810
AUDIENCE: Oh.

966
00:52:19,810 --> 00:52:21,200
PROFESSOR: Right.

967
00:52:21,200 --> 00:52:24,120
Yeah that's right, that's right.

968
00:52:24,120 --> 00:52:28,120
And of course the beginning
could be highly stochastic

969
00:52:28,120 --> 00:52:29,950
but we're just thinking
about in the limit

970
00:52:29,950 --> 00:52:32,010
of if it's deterministic.

971
00:52:35,330 --> 00:52:40,230
What is this thing in
terms of-- from here

972
00:52:40,230 --> 00:52:42,530
this is just a normalization
constant, right?

973
00:52:42,530 --> 00:52:46,076
Because each edge has to
be attached somewhere,

974
00:52:46,076 --> 00:52:47,700
we're assuming it's
linear with respect

975
00:52:47,700 --> 00:52:49,744
to the number of edges
at each node, right?

976
00:52:49,744 --> 00:52:51,410
And that means that
for normalization we

977
00:52:51,410 --> 00:52:56,910
have to divide by the sum over
all those edges, the edges

978
00:52:56,910 --> 00:52:59,490
that each of the
nodes might have.

979
00:52:59,490 --> 00:53:02,580
What is this thing equal to
in terms of something else

980
00:53:02,580 --> 00:53:04,540
that we might have on the board?

981
00:53:04,540 --> 00:53:05,040
Yeah?

982
00:53:05,040 --> 00:53:09,062
AUDIENCE: These have edges
with respect to [INAUDIBLE].

983
00:53:09,062 --> 00:53:09,770
PROFESSOR: Right.

984
00:53:09,770 --> 00:53:12,920
So I guess the question is
this, can we write this?

985
00:53:17,580 --> 00:53:18,920
Where E is a function of time?

986
00:53:23,270 --> 00:53:25,940
Is that correct?

987
00:53:25,940 --> 00:53:27,240
So we're getting some shakes.

988
00:53:27,240 --> 00:53:28,762
AUDIENCE: Isn't it 2E?

989
00:53:28,762 --> 00:53:29,470
PROFESSOR: Right.

990
00:53:29,470 --> 00:53:30,310
So it's actually 2E.

991
00:53:32,990 --> 00:53:34,860
Because what you notice
here is that this

992
00:53:34,860 --> 00:53:38,400
is the sum over all of the edges
that each of the nodes have.

993
00:53:38,400 --> 00:53:40,810
But each edge is
connecting 2 nodes.

994
00:53:40,810 --> 00:53:43,620
So the sum over all
these edge distributions

995
00:53:43,620 --> 00:53:46,330
is twice the number of edges.

996
00:53:46,330 --> 00:53:49,540
Now I would say as a
physicist, working in biology,

997
00:53:49,540 --> 00:53:51,970
my general attitude is
that a factor of 2 here,

998
00:53:51,970 --> 00:53:55,630
factor of 2 there,
doesn't really matter.

999
00:53:55,630 --> 00:53:57,640
But this factor of 2
actually is relevant

1000
00:53:57,640 --> 00:54:01,140
because it ends up determining
the scaling over time.

1001
00:54:01,140 --> 00:54:04,420
So not all factors of
2 are created equal,

1002
00:54:04,420 --> 00:54:08,930
and this is one that is
worth paying attention to.

1003
00:54:08,930 --> 00:54:10,729
Does everyone here
understand why this

1004
00:54:10,729 --> 00:54:12,020
is 2 times the number of edges?

1005
00:54:17,370 --> 00:54:21,247
k1 is equal to 1, k2 is
equal to 1, number of edges

1006
00:54:21,247 --> 00:54:21,830
is equal to 1.
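The two-node example on the board generalizes: in any undirected network the degrees sum to twice the number of edges. A minimal sketch that checks this on a small made-up edge list (the graph and the variable names are hypothetical, chosen only for illustration):

```python
# Hypothetical toy graph: each pair is one undirected edge between two node labels.
edges = [(1, 2), (2, 3), (3, 1), (3, 4)]

# Count how many edges touch each node (its degree k).
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1

# Every edge contributes 1 to the degree of each of its two endpoints.
total_degree = sum(degree.values())
print(total_degree, 2 * len(edges))
```

In a directed network one would instead track in-degree and out-degree separately, and each of those sums would equal the number of edges once.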

1007
00:54:25,750 --> 00:54:27,019
Yeah.

1008
00:54:27,019 --> 00:54:30,135
AUDIENCE: So that means we're
in an undirected network,

1009
00:54:30,135 --> 00:54:31,510
if we were in a
directed network,

1010
00:54:31,510 --> 00:54:34,695
then we would not
have that factor of 2.

1011
00:54:34,695 --> 00:54:35,320
PROFESSOR: Yes.

1012
00:54:35,320 --> 00:54:37,065
So we are indeed
in an undirected,

1013
00:54:37,065 --> 00:54:38,440
and I'd say in a
directed network

1014
00:54:38,440 --> 00:54:40,481
you have to then be more
careful about what you--

1015
00:54:40,481 --> 00:54:43,430
you have to specify
the k's in and k's out.

1016
00:54:43,430 --> 00:54:44,970
So actually, already
just by writing

1017
00:54:44,970 --> 00:54:46,300
this we've already
assumed it's undirected,

1018
00:54:46,300 --> 00:54:48,410
because we haven't
specified what we mean by k.

1019
00:54:54,620 --> 00:54:56,820
We're here, but
very conveniently we

1020
00:54:56,820 --> 00:54:58,640
already know how
many edges there

1021
00:54:58,640 --> 00:55:00,680
are as a function of time.

1022
00:55:00,680 --> 00:55:02,680
This is just equal to m times t.

1023
00:55:02,680 --> 00:55:07,740
So we get something that's very
convenient ki divided by 2 t.
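Written out, the chain of substitutions that leads here is presumably the following. It is a sketch in the lecture's notation, where \Pi is an assumed name for the attachment probability and the sum over j runs over all nodes:

```latex
\frac{\partial k_i}{\partial t}
  = m\,\Pi(k_i)
  = \frac{m\,k_i}{\sum_j k_j}
  = \frac{m\,k_i}{2E(t)}
  = \frac{m\,k_i}{2\,m\,t}
  = \frac{k_i}{2t}
```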

1024
00:55:12,810 --> 00:55:16,220
From here we can solve
the differential equation.

1025
00:55:16,220 --> 00:55:17,890
This is what we want to show.

1026
00:55:20,850 --> 00:55:22,310
The fact that we're
doing partials

1027
00:55:22,310 --> 00:55:24,870
doesn't really matter,
because it's just time here.

1028
00:55:24,870 --> 00:55:28,780
So it's really-- so
we have d ki over ki,

1029
00:55:28,780 --> 00:55:32,220
is equal to dt over 2t.

1030
00:55:35,620 --> 00:55:37,280
This 2, really again,
is going to make

1031
00:55:37,280 --> 00:55:40,450
a difference, because when
we go and we integrate,

1032
00:55:40,450 --> 00:55:44,370
we get the logs and so forth.

1033
00:55:44,370 --> 00:55:46,620
And so we get that ki
as a function of time

1034
00:55:46,620 --> 00:55:50,130
is going to grow with
time, with some constant c,

1035
00:55:50,130 --> 00:55:53,024
proportional to the
square root of time.

1036
00:55:53,024 --> 00:55:55,690
So if we didn't have the half it
would just be linear with time.

1037
00:55:59,090 --> 00:56:01,800
Now how do we know
what c-- in general

1038
00:56:01,800 --> 00:56:05,640
how do we get constants
of integration in life?

1039
00:56:05,640 --> 00:56:06,890
AUDIENCE: Boundary conditions.

1040
00:56:06,890 --> 00:56:07,980
PROFESSOR: Yeah,
boundary conditions,

1041
00:56:07,980 --> 00:56:09,270
in this case, the
initial condition.

1042
00:56:09,270 --> 00:56:10,436
And what is it that we know?

1043
00:56:14,424 --> 00:56:15,952
AUDIENCE: ki.

1044
00:56:15,952 --> 00:56:16,660
PROFESSOR: Right.

1045
00:56:16,660 --> 00:56:21,400
So what we know is that
ki, so this i-th node,

1046
00:56:21,400 --> 00:56:24,570
when it's added at time ti,
it should be equal to what?

1047
00:56:27,366 --> 00:56:28,314
AUDIENCE: m.

1048
00:56:28,314 --> 00:56:28,980
PROFESSOR: Yeah.

1049
00:56:28,980 --> 00:56:29,646
It's equal to m.

1050
00:56:29,646 --> 00:56:31,470
So when it's first
added, at some time ti,

1051
00:56:31,470 --> 00:56:33,910
its number of edges
is equal to m.

1052
00:56:33,910 --> 00:56:36,670
Because that's what we've
assumed, is that we add a node

1053
00:56:36,670 --> 00:56:38,920
and we connect it
randomly and other things,

1054
00:56:38,920 --> 00:56:41,820
so it has m edges initially.

1055
00:56:41,820 --> 00:56:46,560
So from this ki of t, this
is then equal to m

1056
00:56:46,560 --> 00:56:51,620
times the square root of
t divided by t initial.

1057
00:56:51,620 --> 00:56:55,520
Where ti is the time that the i-th
node was added to the network.
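For reference, the integration step can be sketched as follows, using only the differential equation and the initial condition stated in the lecture (c is the constant of integration):

```latex
\frac{dk_i}{k_i} = \frac{dt}{2t}
\;\Rightarrow\;
\ln k_i = \tfrac{1}{2}\ln t + \text{const}
\;\Rightarrow\;
k_i(t) = c\,\sqrt{t}
\\[4pt]
k_i(t_i) = m
\;\Rightarrow\;
c = \frac{m}{\sqrt{t_i}}
\;\Rightarrow\;
k_i(t) = m\,\sqrt{\frac{t}{t_i}}
```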

1058
00:56:58,110 --> 00:57:00,068
Are there any questions
about how we got there?

1059
00:57:07,580 --> 00:57:10,040
So I think that this is
relatively straightforward.

1060
00:57:10,040 --> 00:57:14,000
The part that gets
confusing is this later part

1061
00:57:14,000 --> 00:57:17,630
about the probabilities and
keeping everything straight.

1062
00:57:17,630 --> 00:57:20,709
And so what Barabasi
did next, is

1063
00:57:20,709 --> 00:57:22,750
he said, all right, well,
what we're going to do,

1064
00:57:22,750 --> 00:57:26,620
is we're going to talk
about the probability, P.

1065
00:57:26,620 --> 00:57:31,670
Now this is an actual honest
to goodness probability.

1066
00:57:31,670 --> 00:57:34,026
The big P is actually
a probability,

1067
00:57:34,026 --> 00:57:35,650
and that's as compared
to a probability

1068
00:57:35,650 --> 00:57:40,310
distribution, little p.

1069
00:57:40,310 --> 00:57:44,239
And I'll put in a little curly
thing here, so it's a little p.

1070
00:57:44,239 --> 00:57:46,780
This is saying if you want to
get an actual probability here,

1071
00:57:46,780 --> 00:57:49,071
then you have to multiply
that probability distribution

1072
00:57:49,071 --> 00:57:51,890
times some range delta k.

1073
00:57:51,890 --> 00:57:53,960
If you want to know
that the probability

1074
00:57:53,960 --> 00:57:58,310
that some node has between
some number and some number

1075
00:57:58,310 --> 00:58:02,220
of edges, then you
multiply it by that range.

1076
00:58:02,220 --> 00:58:02,720
Right?

1077
00:58:05,420 --> 00:58:09,070
Probability distribution,
this is an actual probability.

1078
00:58:09,070 --> 00:58:11,310
And as befits an
actual probability,

1079
00:58:11,310 --> 00:58:16,230
we're going to say, OK the
probability that the i-th node

1080
00:58:16,230 --> 00:58:22,300
has k edges, that are
less than some value k.

1081
00:58:22,300 --> 00:58:24,640
And remember this thing is
actually a function of time.

1082
00:58:29,260 --> 00:58:31,590
But we have an expression
for ki as a function of time,

1083
00:58:31,590 --> 00:58:32,381
it's equal to this.

1084
00:58:34,750 --> 00:58:38,480
So we can solve, and we show
that this probability is also

1085
00:58:38,480 --> 00:58:40,960
the same as this
other probability.

1086
00:58:40,960 --> 00:58:45,400
That the i-th node was
added after some time

1087
00:58:45,400 --> 00:58:46,940
t that can be written as this.

1088
00:58:52,710 --> 00:58:55,050
So this is saying,
the probability

1089
00:58:55,050 --> 00:59:00,310
that some random, say i-th
node, has fewer than k edges,

1090
00:59:00,310 --> 00:59:03,990
is the same as saying
it's the probability

1091
00:59:03,990 --> 00:59:06,360
that the i-th node was
added after some time, t,

1092
00:59:06,360 --> 00:59:10,430
which is this thing.

1093
00:59:10,430 --> 00:59:12,450
Because the number of
edges will grow over

1094
00:59:12,450 --> 00:59:14,610
time for each of these nodes.

1095
00:59:20,050 --> 00:59:22,230
Do you understand that kind
of conceptual statement

1096
00:59:22,230 --> 00:59:23,426
that was made there?

1097
00:59:26,540 --> 00:59:27,040
Yes?

1098
00:59:27,040 --> 00:59:27,710
Any questions?

1099
00:59:32,220 --> 00:59:35,270
All right, so the probability
that this i-th node was added

1100
00:59:35,270 --> 00:59:41,082
after this time, is also of
course 1 minus the probability

1101
00:59:41,082 --> 00:59:42,540
that it was added
before that time.

1102
00:59:55,050 --> 00:59:58,350
Whereas time, little t
here, this is at the time

1103
00:59:58,350 --> 01:00:00,310
that you're actually looking.

1104
01:00:00,310 --> 01:00:03,002
So this is saying, oh well, if
little t is 100, for example,

1105
01:00:03,002 --> 01:00:04,710
it's saying all right,
at that time point

1106
01:00:04,710 --> 01:00:07,240
after I got 100 nodes, we
want to say, all right, what's

1107
01:00:07,240 --> 01:00:09,500
the probability that some
random i-th node was

1108
01:00:09,500 --> 01:00:11,420
added before this quantity.

1109
01:00:11,420 --> 01:00:13,430
And this is just again
some other kind of time,

1110
01:00:13,430 --> 01:00:14,855
if you'd like.

1111
01:00:19,810 --> 01:00:22,710
I think this is the part that
is especially kind of weird.

1112
01:00:22,710 --> 01:00:24,210
So this is also
equal to this thing.

1113
01:00:26,720 --> 01:00:28,260
And I think
reasonable people can

1114
01:00:28,260 --> 01:00:31,770
argue about exactly what
you should write here,

1115
01:00:31,770 --> 01:00:36,610
but let's figure out the
basic argument first.

1116
01:00:36,610 --> 01:00:40,730
So there's this probability
is equal to this thing.

1117
01:00:40,730 --> 01:00:49,020
So this statement is really that
at some time t we have how many

1118
01:00:49,020 --> 01:00:49,520
nodes?

1119
01:00:49,520 --> 01:00:55,490
We have m0 plus t nodes, right?

1120
01:00:55,490 --> 01:00:58,650
So this is something here.

1121
01:00:58,650 --> 01:01:02,170
And of course there are edges
going around doing things.

1122
01:01:02,170 --> 01:01:04,400
And what we want to know
is, what's the probability

1123
01:01:04,400 --> 01:01:06,500
if I grab one of
them, we're going

1124
01:01:06,500 --> 01:01:08,610
to call that the i-th node.

1125
01:01:08,610 --> 01:01:11,050
What's the probability
if I grab one of them

1126
01:01:11,050 --> 01:01:17,310
that it was added
before some time here.

1127
01:01:17,310 --> 01:01:20,290
And it's useful to just imagine
this as just being some time

1128
01:01:20,290 --> 01:01:28,350
big T, just so that we don't get
confused by all the symbols.

1129
01:01:28,350 --> 01:01:31,080
You say, oh well,
that probability

1130
01:01:31,080 --> 01:01:35,167
is really just the probability--
well how many nodes total do

1131
01:01:35,167 --> 01:01:36,840
we have here, m0 plus t.

1132
01:01:36,840 --> 01:01:39,820
How many nodes were there that
were added before this time big T?

1133
01:01:39,820 --> 01:01:45,334
Well that's going to be big T, you
might want to say t plus m0.

1134
01:01:45,334 --> 01:01:47,750
There's a question of whether
you include those nodes that

1135
01:01:47,750 --> 01:01:50,100
started there or not.

1136
01:01:50,100 --> 01:01:53,650
Given the equations that
Barabasi wrote down,

1137
01:01:53,650 --> 01:01:55,137
he kind of assumes
that we're only

1138
01:01:55,137 --> 01:01:56,845
counting the nodes
that were added later.

1139
01:01:59,480 --> 01:02:02,180
So I'd say if you
want, you could either

1140
01:02:02,180 --> 01:02:04,350
add an m0 up there,
or get rid of this m0,

1141
01:02:04,350 --> 01:02:06,300
depending on what you like.

1142
01:02:06,300 --> 01:02:10,270
But broadly there's this idea
that we have this many nodes,

1143
01:02:10,270 --> 01:02:14,262
and this many of them were
added before some time big T.

1144
01:02:14,262 --> 01:02:16,470
And that's how we get this
m squared t over k squared

1145
01:02:16,470 --> 01:02:20,040
was just that time big T divided
by the total number of nodes.

1146
01:02:23,199 --> 01:02:24,990
And this whole discussion
about whether you

1147
01:02:24,990 --> 01:02:26,479
count the initial
m0 nodes or not,

1148
01:02:26,479 --> 01:02:28,020
it doesn't matter
because we're going

1149
01:02:28,020 --> 01:02:29,590
to take the limit as
t goes to infinity,

1150
01:02:29,590 --> 01:02:30,548
and that all goes away.
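Collected in one place, the probability argument can be sketched like this. It follows the convention described in the lecture (only added nodes counted in the numerator, all m0 + t nodes in the denominator); T is the label used here for the cutoff time m^2 t / k^2:

```latex
P\big(k_i(t) < k\big)
  = P\!\left(t_i > \frac{m^2 t}{k^2}\right)
  = 1 - P\!\left(t_i \le \frac{m^2 t}{k^2}\right)
  = 1 - \frac{m^2 t}{k^2\,(m_0 + t)}
\\[4pt]
\text{with } T = \frac{m^2 t}{k^2}:\quad
P(t_i \le T) = \frac{T}{m_0 + t},
\qquad
\lim_{t\to\infty} P\big(k_i(t) < k\big) = 1 - \frac{m^2}{k^2}
```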

1151
01:02:33,709 --> 01:02:35,000
Are there questions about this?

1152
01:02:37,690 --> 01:02:44,360
There is something kind of mind
twisting about this argument,

1153
01:02:44,360 --> 01:02:49,890
even though we're really just
picking big T objects out

1154
01:02:49,890 --> 01:02:53,357
of essentially little t objects,
but somehow something funny

1155
01:02:53,357 --> 01:02:53,940
goes on there.

1156
01:02:58,310 --> 01:03:02,347
Any questions about that?

1157
01:03:02,347 --> 01:03:05,700
AUDIENCE: Could you just go
through the argument one more

1158
01:03:05,700 --> 01:03:06,200
time?

1159
01:03:06,200 --> 01:03:09,200
PROFESSOR: Yeah,
sure, sure. Right, so

1160
01:03:09,200 --> 01:03:13,330
I think that what's confusing
about it is the fact that we're

1161
01:03:13,330 --> 01:03:18,970
asking whether the i-th node
was added before some time big T.

1162
01:03:18,970 --> 01:03:22,890
And this time t is equal
to something that's funny

1163
01:03:22,890 --> 01:03:25,100
based on what we've just done.

1164
01:03:25,100 --> 01:03:32,560
But it's useful to just
ask, if at time little t

1165
01:03:32,560 --> 01:03:35,000
you look at this
network and I ask

1166
01:03:35,000 --> 01:03:37,760
you, all right, was it
added before this time,

1167
01:03:37,760 --> 01:03:40,130
big T. Let's just
for concreteness

1168
01:03:40,130 --> 01:03:45,480
say m0 is equal to--
we start with 10 nodes.

1169
01:03:45,480 --> 01:03:50,710
And we say, OK, at time t
equal to 100, I ask you,

1170
01:03:50,710 --> 01:03:53,880
what's the probability that if
I grab a random node, what's

1171
01:03:53,880 --> 01:03:56,050
the probability it was
added before some time

1172
01:03:56,050 --> 01:03:57,555
big T equal to 10.

1173
01:04:02,010 --> 01:04:07,800
Well you would say,
very roughly actually.

1174
01:04:07,800 --> 01:04:10,520
We can say let's actually,
we can even if you'd like,

1175
01:04:10,520 --> 01:04:13,380
say we're not going to count--
we're not going to count

1176
01:04:13,380 --> 01:04:14,530
those m0 initial nodes.

1177
01:04:14,530 --> 01:04:16,613
So we're just going to be
looking at nodes that we

1178
01:04:16,613 --> 01:04:17,790
added later, if you'd like.

1179
01:04:17,790 --> 01:04:20,810
And then when you would say,
all right well, at time t equal to 100,

1180
01:04:20,810 --> 01:04:23,320
we've added 100 nodes.

1181
01:04:23,320 --> 01:04:25,640
And I'm asking, if I grab
one of the nodes, what's

1182
01:04:25,640 --> 01:04:30,360
the probability that the node I
grab was added in the first 10

1183
01:04:30,360 --> 01:04:31,900
time steps.

1184
01:04:31,900 --> 01:04:34,400
Well you'd say, it's
going to be 10%,

1185
01:04:34,400 --> 01:04:38,070
because there were 10 nodes that
were added before time big T,

1186
01:04:38,070 --> 01:04:41,824
and we added 100, so it's really
just this divided by this.

1187
01:04:41,824 --> 01:04:43,990
And with the question of
whether you want to include

1188
01:04:43,990 --> 01:04:46,990
m0's or not.
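The numbers in this example can be checked directly. The sketch below uses the values from the lecture (m0 = 10, t = 100, big T = 10) and compares three bookkeeping conventions for the initial nodes; the variable names are made up for illustration.

```python
m0 = 10   # initial nodes
t = 100   # observation time: one node added per time step, so 100 added nodes
T = 10    # cutoff time "big T"

# Convention 1: ignore the m0 initial nodes entirely.
p_ignore_initial = T / t

# Convention 2: count all m0 + t nodes in the total, but only added nodes as "early".
p_initial_in_total = T / (m0 + t)

# Convention 3: count the initial nodes both as early nodes and in the total.
p_initial_in_both = (T + m0) / (m0 + t)

print(p_ignore_initial, p_initial_in_total, p_initial_in_both)

# If t and T grow together with T/t held fixed, the three conventions converge.
big_t = 10**9
big_T = big_t // 10
spread = (big_T + m0) / (m0 + big_t) - big_T / (m0 + big_t)
print(spread)
```

The last line illustrates why the choice does not matter in the long-time limit: the difference between conventions shrinks like m0 / t.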

1189
01:04:46,990 --> 01:04:53,210
So I think that that argument
is surprisingly straightforward,

1190
01:04:53,210 --> 01:04:56,900
but somehow what gets
really confusing is

1191
01:04:56,900 --> 01:05:01,600
that the time big T we're referring
to depends on the k's

1192
01:05:01,600 --> 01:05:02,960
and t's and so forth.

1193
01:05:02,960 --> 01:05:04,450
But that's a way
of keeping track

1194
01:05:04,450 --> 01:05:07,600
of how are things scaling
as a function of time.

1195
01:05:07,600 --> 01:05:09,540
But if you boil the
argument down to this,

1196
01:05:09,540 --> 01:05:12,640
then it makes sense,
but then of course

1197
01:05:12,640 --> 01:05:15,730
then you look back at this and
you get confused again.

1198
01:05:15,730 --> 01:05:20,280
Which is how I feel every year
when I prepare this lecture,

1199
01:05:20,280 --> 01:05:23,840
but I think it all does
make sense if you--

1200
01:05:27,317 --> 01:05:29,400
Any questions about this
argument or that argument

1201
01:05:29,400 --> 01:05:31,980
or any part of it?

1202
01:05:31,980 --> 01:05:32,742
Yes?

1203
01:05:32,742 --> 01:05:34,950
AUDIENCE: So the ti's are
very important [INAUDIBLE]?

1204
01:05:38,245 --> 01:05:38,870
PROFESSOR: Yes.

1205
01:05:41,440 --> 01:05:48,639
So this is just saying that
if I pick some random node,

1206
01:05:48,639 --> 01:05:49,930
we're calling it the i-th node.

1207
01:05:49,930 --> 01:05:52,362
I'm asking what's the
probability that the time that

1208
01:05:52,362 --> 01:05:54,460
it was added was before something.

1209
01:05:54,460 --> 01:05:56,770
So this is not one
of the variables,

1210
01:05:56,770 --> 01:05:58,760
and you'll see the ti
doesn't appear down here.

1211
01:05:58,760 --> 01:06:01,870
Because this is just
saying-- I'm asking you,

1212
01:06:01,870 --> 01:06:03,877
if I grab some random
node, the i-th node.

1213
01:06:03,877 --> 01:06:05,460
I'm asking you,
what's the probability

1214
01:06:05,460 --> 01:06:09,800
that it was added before some
other time, which is all this.

1215
01:06:09,800 --> 01:06:14,290
And what you can see is that
it's a function of the time

1216
01:06:14,290 --> 01:06:19,590
that we look, because
if I go to longer times

1217
01:06:19,590 --> 01:06:23,880
you know then indeed this
probability should go--

1218
01:06:23,880 --> 01:06:26,478
What should it do?

1219
01:06:26,478 --> 01:06:27,394
AUDIENCE: [INAUDIBLE].

1220
01:06:27,394 --> 01:06:31,975
PROFESSOR: OK, but it depends
on k's as well, right?

1221
01:06:31,975 --> 01:06:33,466
What do I want to say?

1222
01:06:42,010 --> 01:06:47,300
Ultimately what we see here
is that as time goes to infinity,

1223
01:06:47,300 --> 01:06:49,780
so after a long
time, then we reach

1224
01:06:49,780 --> 01:06:51,910
this stationary
distribution where

1225
01:06:51,910 --> 01:06:53,430
the base structure
of the network

1226
01:06:53,430 --> 01:06:55,920
is not changing anymore.

1227
01:06:55,920 --> 01:06:58,510
And that's because there's
a t in both the numerator

1228
01:06:58,510 --> 01:06:59,340
and denominator.

1229
01:06:59,340 --> 01:07:01,230
So then the only
thing that is left

1230
01:07:01,230 --> 01:07:05,294
is this behavior
as a function of k.

1231
01:07:05,294 --> 01:07:07,210
And this is really saying
that the probability

1232
01:07:07,210 --> 01:07:11,420
that some node was
added before some time,

1233
01:07:11,420 --> 01:07:14,980
is kind of the same
as saying that,

1234
01:07:14,980 --> 01:07:17,942
well, that you have
a lot of edges.

1235
01:07:17,942 --> 01:07:19,650
And that's how we got
here to begin with,

1236
01:07:19,650 --> 01:07:22,190
because the nodes that
were added early end up

1237
01:07:22,190 --> 01:07:23,600
with a lot of edges.

1238
01:07:23,600 --> 01:07:27,320
This is the so-called rich
get richer phenomenon.

1239
01:07:27,320 --> 01:07:29,607
So if you're sitting
on a manuscript,

1240
01:07:29,607 --> 01:07:31,440
and you're not submitting
it for publication

1241
01:07:31,440 --> 01:07:34,060
you should get on it
because the earlier

1242
01:07:34,060 --> 01:07:38,220
that it's published the more
citations it's going to get.

1243
01:07:38,220 --> 01:07:41,080
But this is saying
that the probability

1244
01:07:41,080 --> 01:07:46,540
that some random node has
a small number of edges

1245
01:07:46,540 --> 01:07:49,170
is the same as
the probability

1246
01:07:49,170 --> 01:07:50,835
that the node was added late.

1247
01:07:53,285 --> 01:07:55,285
And that makes sense,
because if it's added late

1248
01:07:55,285 --> 01:07:58,670
it doesn't have very many
edges, hasn't had time to grow.

1249
01:07:58,670 --> 01:08:03,170
And then from those
calculations you

1250
01:08:03,170 --> 01:08:04,730
get at this
degree distribution.

1251
01:08:08,649 --> 01:08:09,149
Yes?

1252
01:08:09,149 --> 01:08:13,077
AUDIENCE: So for this
analytical [INAUDIBLE]

1253
01:08:13,077 --> 01:08:16,023
we're assuming the links
could be [INAUDIBLE].

1254
01:08:20,785 --> 01:08:21,410
PROFESSOR: Yes.

1255
01:08:21,410 --> 01:08:24,660
So we're taking, in principle
it's a discrete problem

1256
01:08:24,660 --> 01:08:28,550
and converting it into
a differential equation.

1257
01:08:28,550 --> 01:08:30,710
And it's an interesting
question of I

1258
01:08:30,710 --> 01:08:35,680
don't know how big of an
error this ends up making,

1259
01:08:35,680 --> 01:08:39,950
and of course this
expression doesn't actually

1260
01:08:39,950 --> 01:08:43,640
end up having integers.

1261
01:08:43,640 --> 01:08:46,180
But this is a way of making
it so that the errors don't

1262
01:08:46,180 --> 01:08:48,090
grow or so, right?

1263
01:08:48,090 --> 01:08:51,350
I think that it basically works.

1264
01:08:51,350 --> 01:08:54,069
If you'd like you could actually
do the simulation with all

1265
01:08:54,069 --> 01:08:55,430
the discrete-- I
think that is actually

1266
01:08:55,430 --> 01:08:57,388
going to be the stochastic
dynamics that end up

1267
01:08:57,388 --> 01:09:02,080
being more relevant than
the integer kind of issue,

1268
01:09:02,080 --> 01:09:04,533
but I haven't actually
looked into that though.
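
The discrete simulation mentioned here can be sketched roughly as follows. This is a minimal sketch, not the procedure from the paper: the function names, the ring of m0 starting nodes, and the rule that a new node attaches to m distinct existing nodes are all assumptions made for illustration.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m=2, m0=3, seed=0):
    """Grow an undirected network one node at a time (hypothetical helper).

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree. Requires m <= m0.
    """
    rng = random.Random(seed)
    # Assumption: start from a ring of m0 nodes so every node has nonzero degree.
    edges = [(i, (i + 1) % m0) for i in range(m0)]
    # Each node appears in 'stubs' once per edge end, so a uniform draw
    # from 'stubs' picks a node with probability proportional to its degree.
    stubs = [v for e in edges for v in e]
    for new in range(m0, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        for old in chosen:
            edges.append((new, old))
            stubs.extend((new, old))
    return edges


def degree_counts(edges):
    """Return a Counter mapping node -> degree."""
    return Counter(v for e in edges for v in e)


def degree_histogram(edges):
    """Return a Counter mapping degree k -> number of nodes with that degree."""
    return Counter(degree_counts(edges).values())
```

Dividing the histogram from a large run by the number of nodes gives an empirical degree distribution that can be compared against the continuum result on a log-log plot.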

1269
01:09:09,270 --> 01:09:13,840
Any other questions
about that so far?

1270
01:09:13,840 --> 01:09:14,367
Yes?

1271
01:09:14,367 --> 01:09:15,283
AUDIENCE: [INAUDIBLE].

1272
01:09:19,279 --> 01:09:22,420
PROFESSOR: So there's no loss
of edges, no loss of nodes,

1273
01:09:22,420 --> 01:09:23,520
strictly verboten.

1274
01:09:30,359 --> 01:09:31,970
I spent a lot of
time trying to plan

1275
01:09:31,970 --> 01:09:37,148
an upcoming trip to Germany last
night so German is on my mind.

1276
01:09:44,410 --> 01:09:47,870
So are we done yet incidentally?

1277
01:09:47,870 --> 01:09:48,910
Nearly right?

1278
01:09:48,910 --> 01:09:51,790
Because we have--
What we really wanted

1279
01:09:51,790 --> 01:09:56,130
is the degree distribution,
not this probability.

1280
01:09:56,130 --> 01:09:58,770
So we have to take
a derivative still,

1281
01:09:58,770 --> 01:10:04,120
but as t goes to
infinity, regardless

1282
01:10:04,120 --> 01:10:09,320
of how you treat the m0's,
actually what we-- maybe we'll

1283
01:10:09,320 --> 01:10:10,630
take the derivative first.

1284
01:10:10,630 --> 01:10:14,730
So this probability
density is going

1285
01:10:14,730 --> 01:10:17,500
to be the derivative
with respect

1286
01:10:17,500 --> 01:10:22,030
to k of the actual
probability here.

1287
01:10:27,660 --> 01:10:30,460
So we take a derivative,
this one derivative

1288
01:10:30,460 --> 01:10:33,370
that nothing happens,
k squared,

1289
01:10:33,370 --> 01:10:36,690
it's going to turn
into a k cubed.

1290
01:10:36,690 --> 01:10:44,200
So we get 2m squared
t over k cubed,

1291
01:10:44,200 --> 01:10:52,110
we still have the t plus m0, but
when we let t go to infinity,

1292
01:10:52,110 --> 01:10:58,550
so after this thing has reached
its stationary distribution,

1293
01:10:58,550 --> 01:11:02,370
then we end up just getting
2m squared over k cubed.

1294
01:11:05,030 --> 01:11:08,490
I just want to be
clear this is p of k.

1295
01:11:08,490 --> 01:11:14,050
The key feature here is that
the probability distribution

1296
01:11:14,050 --> 01:11:16,920
goes as 1 over k cubed.
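
The derivative step can be written out explicitly. This is a reconstruction from the spoken derivation, assuming the cumulative probability on the board has the form below, with m the number of edges each new node brings and m0 the number of starting nodes:

```latex
\begin{align}
P\bigl(k_i(t) < k\bigr) &= 1 - \frac{m^2\, t}{k^2\,(t + m_0)} \\
p(k) = \frac{\partial}{\partial k} P\bigl(k_i(t) < k\bigr)
     &= \frac{2 m^2\, t}{k^3\,(t + m_0)}
     \;\xrightarrow[\;t \to \infty\;]{}\; \frac{2 m^2}{k^3}
\end{align}
```

The constant 1 drops out under the derivative, the k squared in the denominator becomes k cubed with a factor of 2, and the ratio t over t plus m0 goes to 1 at long times.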

1297
01:11:23,400 --> 01:11:26,430
What is interesting is that
when I first read the paper

1298
01:11:26,430 --> 01:11:31,260
I actually thought
that this exponent here

1299
01:11:31,260 --> 01:11:33,910
would be a function
of the linearity

1300
01:11:33,910 --> 01:11:36,160
of the preferential attachment.

1301
01:11:36,160 --> 01:11:39,360
So I actually-- and of course
they say that it's not true,

1302
01:11:39,360 --> 01:11:42,020
but when I was halfway through
the paper I thought, oh well,

1303
01:11:42,020 --> 01:11:45,490
if you just let this go
as some power to the beta,

1304
01:11:45,490 --> 01:11:47,900
or so, that you would
maybe get something

1305
01:11:47,900 --> 01:11:51,580
like this was 2 plus
beta-- I thought

1306
01:11:51,580 --> 01:11:53,990
something like that, but
apparently it's not true.

1307
01:11:53,990 --> 01:11:58,040
That if you do not have
linear attachment here

1308
01:11:58,040 --> 01:12:00,220
then you just don't get
power law distributions.

1309
01:12:00,220 --> 01:12:02,053
They suggest other ways
that you could maybe

1310
01:12:02,053 --> 01:12:04,860
get different exponents,
which is very relevant given

1311
01:12:04,860 --> 01:12:07,260
the fact that different
real networks indeed

1312
01:12:07,260 --> 01:12:09,850
have different exponents.

1313
01:12:09,850 --> 01:12:13,240
But I'd say that their
proffered explanation, which

1314
01:12:13,240 --> 01:12:18,590
is to include directed edges,
feels unsatisfying because not

1315
01:12:18,590 --> 01:12:21,240
all networks are directed.

1316
01:12:21,240 --> 01:12:24,530
And this network
here is not directed,

1317
01:12:24,530 --> 01:12:26,950
it has exponents
closer to 2.

1318
01:12:26,950 --> 01:12:30,010
So you really want to
have other mechanisms.

1319
01:12:30,010 --> 01:12:33,230
But this, as we mentioned,
is a thriving field

1320
01:12:33,230 --> 01:12:35,474
and people have explored
many different aspects

1321
01:12:35,474 --> 01:12:36,140
of this problem.

1322
01:12:43,400 --> 01:12:46,650
Are there any other questions
about this derivation, how

1323
01:12:46,650 --> 01:12:49,562
we got there, how
convincing maybe you

1324
01:12:49,562 --> 01:12:50,770
think it should be or not be?

1325
01:12:56,120 --> 01:12:58,870
So I want to just spend the
last five minutes of the class

1326
01:12:58,870 --> 01:13:02,360
kind of setting up the
discussion of how we should be

1327
01:13:02,360 --> 01:13:05,010
searching for network motifs.

1328
01:13:05,010 --> 01:13:06,760
In particular there's
a natural question

1329
01:13:06,760 --> 01:13:10,200
which is, we have to decide
what the right null model is,

1330
01:13:10,200 --> 01:13:14,842
in terms of deciding what the
expected frequency of a network

1331
01:13:14,842 --> 01:13:16,550
motif, like a feed
forward loop might be.

1332
01:13:22,590 --> 01:13:30,490
So first of all, why is it
that we maybe should not

1333
01:13:30,490 --> 01:13:31,890
use an Erdos Renyi network?

1334
01:13:43,930 --> 01:13:44,627
Yes?

1335
01:13:44,627 --> 01:13:46,668
AUDIENCE: Because it's
not very good for handling

1336
01:13:46,668 --> 01:13:49,072
directed networks?

1337
01:13:49,072 --> 01:13:49,780
PROFESSOR: Right.

1338
01:13:49,780 --> 01:13:52,460
So you'd say, oh, not
very good-- I can maybe

1339
01:13:52,460 --> 01:13:57,960
make-- there's a clear
analog to it-- you could take

1340
01:13:57,960 --> 01:14:04,170
a random undirected ER network
and say put arrows randomly

1341
01:14:04,170 --> 01:14:08,450
on each-- I mean I think
that there's a natural ER

1342
01:14:08,450 --> 01:14:10,710
version of a directed network.

1343
01:14:10,710 --> 01:14:12,455
AUDIENCE: There are constraints.

1344
01:14:12,455 --> 01:14:13,330
PROFESSOR: Like what?

1345
01:14:13,330 --> 01:14:20,092
AUDIENCE: Like when you
[INAUDIBLE] duplication,

1346
01:14:20,092 --> 01:14:23,291
you don't randomly
assign the edge.

1347
01:14:23,291 --> 01:14:24,290
PROFESSOR: That's right.

1348
01:14:24,290 --> 01:14:27,030
OK, so one thing is that it
may be that biologically there

1349
01:14:27,030 --> 01:14:31,010
are constraints, but that
should manifest itself somehow.

1350
01:14:31,010 --> 01:14:34,040
In the sense that if, you know
all that may be well and good,

1351
01:14:34,040 --> 01:14:36,410
it may be true, what
you're saying, but if we go

1352
01:14:36,410 --> 01:14:38,160
and we look at a
transcription network,

1353
01:14:38,160 --> 01:14:41,160
if it looks like an
ER network, then I

1354
01:14:41,160 --> 01:14:43,160
would say it just
doesn't matter.

1355
01:14:43,160 --> 01:14:46,240
The fact that there's
microscopic things going on,

1356
01:14:46,240 --> 01:14:48,740
I mean if at the end of the day
it looks like an ER network,

1357
01:14:48,740 --> 01:14:53,047
then maybe it's
fine anyways, right?

1358
01:14:53,047 --> 01:14:54,480
AUDIENCE: Hum.

1359
01:14:54,480 --> 01:14:56,380
PROFESSOR: Or maybe not.

1360
01:14:56,380 --> 01:14:58,477
You can argue either way.

1361
01:14:58,477 --> 01:15:00,060
AUDIENCE: It depends
on what you want.

1362
01:15:00,060 --> 01:15:02,370
If a particular
motif occurs a lot

1363
01:15:02,370 --> 01:15:06,814
it might be because
it's selected for it,

1364
01:15:06,814 --> 01:15:09,230
but it's not what you were--
it's for some other reason.

1365
01:15:09,230 --> 01:15:10,229
PROFESSOR: That's right.

1366
01:15:10,229 --> 01:15:13,290
So this is an
important point, that I

1367
01:15:13,290 --> 01:15:15,200
would say that in
Erdos approach,

1368
01:15:15,200 --> 01:15:18,389
he basically says if we
see a network motif more

1369
01:15:18,389 --> 01:15:19,930
frequently than we
would expect based

1370
01:15:19,930 --> 01:15:22,590
on some null model,
some null network,

1371
01:15:22,590 --> 01:15:24,850
then it's kind of
prima facie evidence

1372
01:15:24,850 --> 01:15:26,815
that maybe evolution
was selecting

1373
01:15:26,815 --> 01:15:28,480
for it for some reason.

1374
01:15:28,480 --> 01:15:31,180
And what you're saying is
that it could be there's

1375
01:15:31,180 --> 01:15:33,040
a microscopic mechanism
that just leads

1376
01:15:33,040 --> 01:15:35,690
to those things happening,
and so it doesn't

1377
01:15:35,690 --> 01:15:37,590
have to be selection,
it could be

1378
01:15:37,590 --> 01:15:40,980
just due to the mechanistic
processes below.

1379
01:15:40,980 --> 01:15:43,150
And I think that's
a fair concern.

1380
01:15:43,150 --> 01:15:45,890
And it's related to a lot of
these other things, in that

1381
01:15:45,890 --> 01:15:48,020
just for example,
duplication will naturally

1382
01:15:48,020 --> 01:15:52,470
lead to something-- if you
start out with X regulating Y,

1383
01:15:52,470 --> 01:15:55,710
and Y is duplicated then
now you have X regulating

1384
01:15:55,710 --> 01:16:00,440
some Y1 and also some Y2.

1385
01:16:00,440 --> 01:16:04,050
And this is the beginnings
of a network motif,

1386
01:16:04,050 --> 01:16:07,270
and so it's a reasonable
thing to worry about

1387
01:16:07,270 --> 01:16:10,570
but maybe we can correct for
at least a majority of this

1388
01:16:10,570 --> 01:16:12,280
by using the proper null model.

1389
01:16:12,280 --> 01:16:16,380
At least that would be the hope.

1390
01:16:16,380 --> 01:16:19,516
AUDIENCE: Well, that's why
you don't want necessarily--

1391
01:16:19,516 --> 01:16:20,640
PROFESSOR: OK, that's fair.

1392
01:16:20,640 --> 01:16:23,130
But then the question is,
what null model should we

1393
01:16:23,130 --> 01:16:25,420
be using?

1394
01:16:25,420 --> 01:16:26,522
Yeah?

1395
01:16:26,522 --> 01:16:28,605
AUDIENCE: So you
feel like having

1396
01:16:28,605 --> 01:16:31,585
the microscopic constraints
does not necessarily

1397
01:16:31,585 --> 01:16:33,510
need to be in the null model.

1398
01:16:33,510 --> 01:16:36,738
I feel we can have a null
model but without using

1399
01:16:36,738 --> 01:16:39,690
the microscopic constraints
and then just say, oh well

1400
01:16:39,690 --> 01:16:41,430
that's another
possibility for why we

1401
01:16:41,430 --> 01:16:43,969
might have these divergences.

1402
01:16:43,969 --> 01:16:45,968
I don't think they need
to be in the null model.

1403
01:16:45,968 --> 01:16:48,258
AUDIENCE: Yeah, it's just that
then you can't say anything

1404
01:16:48,258 --> 01:16:48,924
about evolution.

1405
01:16:48,924 --> 01:16:51,090
AUDIENCE: Well fair,
but you shouldn't

1406
01:16:51,090 --> 01:16:53,270
have to-- I don't think
you have to say something

1407
01:16:53,270 --> 01:16:55,420
about evolution
afterwards necessarily.

1408
01:16:55,420 --> 01:16:57,378
PROFESSOR: Yeah, and I
think that this question

1409
01:16:57,378 --> 01:17:00,030
about how strongly you can
argue that evolution selected

1410
01:17:00,030 --> 01:17:03,180
for something, this is a
little bit of a judgment call,

1411
01:17:03,180 --> 01:17:05,260
because most of these
evolutionary arguments

1412
01:17:05,260 --> 01:17:08,220
are not ironclad,
it's more a matter

1413
01:17:08,220 --> 01:17:13,650
of making you feel kind of
comfortable with looking

1414
01:17:13,650 --> 01:17:16,670
for what the evolutionary
explanation might have been.

1415
01:17:16,670 --> 01:17:21,560
This is just the nature of
looking at historical science,

1416
01:17:21,560 --> 01:17:22,490
right?

1417
01:17:22,490 --> 01:17:24,217
I mean, you can
speculate about what

1418
01:17:24,217 --> 01:17:26,550
would have happened if Napoleon
had done something else,

1419
01:17:26,550 --> 01:17:27,050
or whatever.

1420
01:17:27,050 --> 01:17:30,380
But it's a speculation.

1421
01:17:30,380 --> 01:17:32,540
Of course the hope is
that we can collect

1422
01:17:32,540 --> 01:17:34,959
multiple pieces of evidence
that make us more and more

1423
01:17:34,959 --> 01:17:36,500
comfortable with it
and in some cases

1424
01:17:36,500 --> 01:17:39,290
we can do laboratory
evolution to get more comfort,

1425
01:17:39,290 --> 01:17:41,590
but laboratory
evolution doesn't prove

1426
01:17:41,590 --> 01:17:45,300
that that's what happened
a million years ago either.

1427
01:17:45,300 --> 01:17:48,770
But I'd say it's more the
accumulation of evidence

1428
01:17:48,770 --> 01:17:51,470
to make you feel comfortable
with an argument.

1429
01:17:51,470 --> 01:17:53,730
But you know, let's first
make sure we understand

1430
01:17:53,730 --> 01:17:58,809
what the null model is, and
then on Thursday we'll decide,

1431
01:17:58,809 --> 01:18:00,850
well we won't decide,
we'll discuss what we think

1432
01:18:00,850 --> 01:18:01,974
that means about evolution.

1433
01:18:01,974 --> 01:18:02,870
Yeah?

1434
01:18:02,870 --> 01:18:08,517
AUDIENCE: So I think
the other part of the appendix

1435
01:18:08,517 --> 01:18:10,475
that we read about the
in and out distributions

1436
01:18:10,475 --> 01:18:12,060
is important for the null model.

1437
01:18:12,060 --> 01:18:12,685
PROFESSOR: Yes.

1438
01:18:12,685 --> 01:18:17,388
AUDIENCE: Because it seems
to me that the Erdos Renyi

1439
01:18:17,388 --> 01:18:23,292
network might be a good model
for the in distributions,

1440
01:18:23,292 --> 01:18:24,770
but not for the
out distributions.

1441
01:18:24,770 --> 01:18:27,529
PROFESSOR: That's right.

1442
01:18:27,529 --> 01:18:29,070
And I think this is
really important.

1443
01:18:29,070 --> 01:18:32,330
I think that it's clear that
the actual transcription

1444
01:18:32,330 --> 01:18:35,270
network of E. coli, for
example, is not well described

1445
01:18:35,270 --> 01:18:37,780
as an Erdos Renyi
random network,

1446
01:18:37,780 --> 01:18:40,310
but then it does beg the
question of what should we

1447
01:18:40,310 --> 01:18:41,590
be using.

1448
01:18:41,590 --> 01:18:44,566
And you could say, well, we
just make a power law network,

1449
01:18:44,566 --> 01:18:45,940
but then you say,
oh, but there's

1450
01:18:45,940 --> 01:18:48,420
the in degree, and
the out degree.

1451
01:18:48,420 --> 01:18:50,400
How much do you want
to keep track of that?

1452
01:18:50,400 --> 01:18:53,152
And I think that there is
a fairly strong argument

1453
01:18:53,152 --> 01:18:54,860
that what you should
do is what they call

1454
01:18:54,860 --> 01:18:56,540
this degree preserving network.

1455
01:19:00,180 --> 01:19:07,610
In particular what that means is
that you take the real network,

1456
01:19:07,610 --> 01:19:09,360
so you take the actual
network that you're

1457
01:19:09,360 --> 01:19:15,650
going to be analyzing, and
there is some actual degree

1458
01:19:15,650 --> 01:19:17,140
distribution.

1459
01:19:17,140 --> 01:19:25,805
So there's 1 node has-- so k1
might be 106, k2 might be 73,

1460
01:19:25,805 --> 01:19:30,040
dot, dot, dot, dot, up to
kn which is equal to 1.

1461
01:19:30,040 --> 01:19:31,520
And of course I'm
not even talking

1462
01:19:31,520 --> 01:19:33,603
about it being directed,
but you do the same thing

1463
01:19:33,603 --> 01:19:35,526
with directed.

1464
01:19:35,526 --> 01:19:37,650
But then what you do, is
you kind of mix things up.

1465
01:19:37,650 --> 01:19:39,358
So you start with a
real network and then

1466
01:19:39,358 --> 01:19:42,522
you do something
to randomize it.

1467
01:19:42,522 --> 01:19:44,230
And it's a rather
clever scheme, I'm just

1468
01:19:44,230 --> 01:19:45,470
going to describe
it briefly here

1469
01:19:45,470 --> 01:19:47,386
and then we'll talk more
about it on Thursday.

1470
01:19:47,386 --> 01:19:51,210
What you do is you take
all of the actual--

1471
01:19:51,210 --> 01:19:57,580
so let's say we have X1, some
x2 and here we have a Y1,

1472
01:19:57,580 --> 01:20:03,790
Y2, Y3, now let's say that these
guys are regulating something

1473
01:20:03,790 --> 01:20:05,230
like this.

1474
01:20:05,230 --> 01:20:09,520
What you do is you take
two edges randomly,

1475
01:20:09,520 --> 01:20:12,880
we'll pick this one and that one, and what we do
that one, and what we do

1476
01:20:12,880 --> 01:20:16,570
is we swap the targets.

1477
01:20:16,570 --> 01:20:19,660
So what we do is we make
this guy come over here,

1478
01:20:19,660 --> 01:20:21,440
and then this one
comes over here.

1479
01:20:24,040 --> 01:20:27,240
So now what we do is we erase
this, and we erase this,

1480
01:20:27,240 --> 01:20:33,670
now we have a new network,
but intriguingly, the degree

1481
01:20:33,670 --> 01:20:36,170
distributions for both
incoming and outgoing edges

1482
01:20:36,170 --> 01:20:41,000
are identical to what
we had before this.

1483
01:20:41,000 --> 01:20:45,290
Every guy has the same number of
outgoing edges, incoming edges,

1484
01:20:45,290 --> 01:20:47,120
but they're just
different targets.

1485
01:20:47,120 --> 01:20:49,710
So if you just do
this procedure many,

1486
01:20:49,710 --> 01:20:51,960
many times then
what you do is you

1487
01:20:51,960 --> 01:20:55,929
achieve some randomized
version of the real network.

1488
01:20:55,929 --> 01:20:58,470
And then what you can do is you
can ask how many feed forward

1489
01:20:58,470 --> 01:20:59,360
loops are there.

1490
01:20:59,360 --> 01:21:02,700
How many, this, that--
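
The target-swapping procedure just described can be sketched in code. This is a minimal sketch under stated assumptions, not the implementation from the paper: the function names are hypothetical, the network is a list of (regulator, target) pairs with no repeated edges, and swaps that would create a self-loop or a duplicate edge are simply skipped.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize a directed network by repeatedly swapping the targets of two edges."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done = 0
    attempts = 0
    max_attempts = 100 * n_swaps  # guard so small networks cannot loop forever
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # After the swap the edges would be a -> d and c -> b.
        # Assumption: reject swaps that create a self-loop or a duplicate edge.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degrees(edges):
    """Return (out-degree Counter, in-degree Counter) for a directed edge list."""
    out_deg = Counter(src for src, _ in edges)
    in_deg = Counter(dst for _, dst in edges)
    return out_deg, in_deg


def count_feed_forward_loops(edges):
    """Count ordered triples (x, y, z) with edges x -> y, y -> z and x -> z."""
    present = set(edges)
    targets = {}
    for src, dst in present:
        targets.setdefault(src, set()).add(dst)
    count = 0
    for x, y in present:
        if x == y:
            continue
        for z in targets.get(y, ()):
            if z != x and z != y and (x, z) in present:
                count += 1
    return count
```

Running the shuffle many times with different seeds and counting feed forward loops in each result gives a distribution of counts under the null model, which the count in the real network can then be compared against.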

1491
01:21:02,700 --> 01:21:05,720
And so there's a fair
argument that this

1492
01:21:05,720 --> 01:21:08,182
is in some ways the
proper null model

1493
01:21:08,182 --> 01:21:09,390
to be asking the question in.

1494
01:21:09,390 --> 01:21:10,100
And indeed, for
example, there are

1495
01:21:10,100 --> 01:21:12,600
many more feed forward loops
than there would be in an Erdos

1496
01:21:12,600 --> 01:21:16,421
Renyi, but still what you see is
that you lose many feed forward

1497
01:21:16,421 --> 01:21:16,920
loops.

1498
01:21:16,920 --> 01:21:19,870
So this is then the
argument for feed

1499
01:21:19,870 --> 01:21:21,270
forward loops
being selected for.

1500
01:21:21,270 --> 01:21:24,060
We'll talk about this and
we'll quantify it on Thursday,

1501
01:21:24,060 --> 01:21:26,180
but I'm available for
the next half hour

1502
01:21:26,180 --> 01:21:28,090
if anybody has any questions.