1
00:00:00,060 --> 00:00:02,500
The following content is
provided under a Creative

2
00:00:02,500 --> 00:00:04,019
Commons license.

3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare

4
00:00:06,360 --> 00:00:10,730
continue to offer high quality,
educational resources for free.

5
00:00:10,730 --> 00:00:13,330
To make a donation or
view additional materials

6
00:00:13,330 --> 00:00:17,236
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,236 --> 00:00:17,861
at ocw.mit.edu.

8
00:00:26,600 --> 00:00:29,342
NANCY LYNCH: OK so today, you're
going to see something new.

9
00:00:29,342 --> 00:00:30,800
In fact all this
week, you're going

10
00:00:30,800 --> 00:00:33,510
to see something that's quite
different from what you've

11
00:00:33,510 --> 00:00:36,960
been studying in this course.

12
00:00:36,960 --> 00:00:37,950
These are algorithms.

13
00:00:37,950 --> 00:00:42,380
But they're for a completely
different sort of model.

14
00:00:42,380 --> 00:00:46,110
Distributed algorithms,
OK, so what are they?

15
00:00:46,110 --> 00:00:48,190
So now instead of
having algorithms

16
00:00:48,190 --> 00:00:50,480
that run on a typical
computer, you're

17
00:00:50,480 --> 00:00:55,400
going to have algorithms that
run on a network of processors.

18
00:00:55,400 --> 00:00:57,350
Or it could be on
one machine that

19
00:00:57,350 --> 00:01:01,330
has multiple processors,
multiprocessors that share memory.

20
00:01:05,970 --> 00:01:09,370
Much of computing is
distributed algorithms now.

21
00:01:09,370 --> 00:01:11,550
They solve problems
like communication

22
00:01:11,550 --> 00:01:18,560
on the internet, data
management over a network,

23
00:01:18,560 --> 00:01:21,260
allocating resources
in a network setting,

24
00:01:21,260 --> 00:01:23,540
synchronizing,
reaching agreement

25
00:01:23,540 --> 00:01:28,990
among different agents
at remote locations.

26
00:01:28,990 --> 00:01:31,470
So these are all distributed
problems, not things

27
00:01:31,470 --> 00:01:34,610
that you just solve
on one computer.

28
00:01:34,610 --> 00:01:38,120
The kinds of algorithms you
design for these settings

29
00:01:38,120 --> 00:01:45,420
have to work under extremely
difficult platforms

30
00:01:45,420 --> 00:01:48,360
because what you have is
concurrent activity that's

31
00:01:48,360 --> 00:01:51,840
going on at many locations,
many processors doing things

32
00:01:51,840 --> 00:01:53,220
at the same time.

33
00:01:53,220 --> 00:01:55,800
And you don't know exactly
when everybody's going

34
00:01:55,800 --> 00:01:57,970
to perform their activities.

35
00:01:57,970 --> 00:02:02,090
You can have different
sorts of timing uncertainty.

36
00:02:02,090 --> 00:02:05,010
The order of events isn't clear.

37
00:02:05,010 --> 00:02:08,830
There could be inputs that
arrive at different locations.

38
00:02:08,830 --> 00:02:12,150
And then you also have to
deal with failure and recovery

39
00:02:12,150 --> 00:02:15,190
of some of the processors or
some of the channels involved

40
00:02:15,190 --> 00:02:16,522
in the computation.

41
00:02:16,522 --> 00:02:17,980
You don't think of
any of this when

42
00:02:17,980 --> 00:02:20,315
you're just trying to run an
algorithm on one computer.

43
00:02:22,920 --> 00:02:25,990
So distributed algorithms
can be pretty complicated.

44
00:02:25,990 --> 00:02:28,210
It's not easy to design them.

45
00:02:28,210 --> 00:02:30,654
And after you design
them, you still

46
00:02:30,654 --> 00:02:32,070
have to make sure
they're correct.

47
00:02:32,070 --> 00:02:34,290
So there are issues involved
in proving them correct

48
00:02:34,290 --> 00:02:35,730
and analyzing them.

49
00:02:35,730 --> 00:02:37,980
A little bit of
history, the field

50
00:02:37,980 --> 00:02:42,000
pretty much started
around the late '60s.

51
00:02:42,000 --> 00:02:46,330
Edsger Dijkstra was one of the
earliest leaders in the field.

52
00:02:46,330 --> 00:02:49,850
He won one of the first
Turing Awards.

53
00:02:49,850 --> 00:02:52,780
Leslie Lamport won the
Turing Award last year.

54
00:02:52,780 --> 00:02:55,590
Although he actually
started as a very young guy,

55
00:02:55,590 --> 00:02:59,470
way back in the early
days of the field.

56
00:02:59,470 --> 00:03:01,770
If you want to look at some
sources, I have a book.

57
00:03:01,770 --> 00:03:04,390
There's another textbook
by Attiya and Welch.

58
00:03:04,390 --> 00:03:06,710
There's a new series of
monographs that basically

59
00:03:06,710 --> 00:03:10,190
try to summarize many of the
important research topics

60
00:03:10,190 --> 00:03:12,250
in distributed computing theory.

61
00:03:12,250 --> 00:03:16,170
And the last two lines have a
couple of the main conferences

62
00:03:16,170 --> 00:03:18,620
in the field.

63
00:03:18,620 --> 00:03:21,750
OK so I can't do that
much in one week.

64
00:03:21,750 --> 00:03:24,610
What I'll do is just
introduce the area,

65
00:03:24,610 --> 00:03:29,140
by showing you two common
models for distributed networks.

66
00:03:29,140 --> 00:03:32,686
And just introduce a very
few fundamental algorithms,

67
00:03:32,686 --> 00:03:35,060
and you'll see along the way
some techniques for modeling

68
00:03:35,060 --> 00:03:37,030
and analyzing them.

69
00:03:37,030 --> 00:03:39,740
OK the two models here are
synchronous distributed

70
00:03:39,740 --> 00:03:44,820
networks, and asynchronous
distributed networks.

71
00:03:44,820 --> 00:03:47,050
The problems I'll look at
in the synchronous setting

72
00:03:47,050 --> 00:03:50,860
are a simple problem of leader
election, which is a symmetry

73
00:03:50,860 --> 00:03:53,400
breaking problem, basically.

74
00:03:53,400 --> 00:03:58,227
Maximal independent set
problem, and then a couple

75
00:03:58,227 --> 00:04:00,060
of problems that should
look familiar to you

76
00:04:00,060 --> 00:04:04,530
from the settings of this
class, establishing structures

77
00:04:04,530 --> 00:04:08,360
like breadth-first spanning
trees and shortest paths trees.

78
00:04:08,360 --> 00:04:10,800
In the asynchronous case
I'll revisit these last two

79
00:04:10,800 --> 00:04:13,290
problems, setting up
breadth-first and shortest path

80
00:04:13,290 --> 00:04:15,100
trees.

81
00:04:15,100 --> 00:04:17,620
OK so I mentioned
something about modeling

82
00:04:17,620 --> 00:04:19,430
and proofs and analysis.

83
00:04:19,430 --> 00:04:23,030
Turns out, getting the
formal models right

84
00:04:23,030 --> 00:04:25,730
and getting real
proofs tends to be

85
00:04:25,730 --> 00:04:27,830
pretty important for
distributed algorithms

86
00:04:27,830 --> 00:04:31,180
because with all the stuff
going on, they're complicated.

87
00:04:31,180 --> 00:04:34,030
And it's easy to make mistakes.

88
00:04:34,030 --> 00:04:38,110
The kinds of models that we use
are interacting state machines,

89
00:04:38,110 --> 00:04:39,320
with inputs and outputs.

90
00:04:39,320 --> 00:04:41,640
They send each other messages.

91
00:04:41,640 --> 00:04:44,050
But the kinds of
proofs you do typically

92
00:04:44,050 --> 00:04:46,210
use invariants, a
technique that you're very

93
00:04:46,210 --> 00:04:47,680
familiar with from this class.

94
00:04:47,680 --> 00:04:50,670
You can still use them
in a distributed setting.

95
00:04:50,670 --> 00:04:53,910
And you still prove them
the same way, by induction.

96
00:04:53,910 --> 00:04:56,640
Something else that comes up a
lot in the distributed setting

97
00:04:56,640 --> 00:05:00,690
is modeling and proofs
using levels of abstraction.

98
00:05:00,690 --> 00:05:02,810
You might want to give
an abstract description

99
00:05:02,810 --> 00:05:04,860
of an algorithm and
prove that that works.

100
00:05:04,860 --> 00:05:07,720
And then you have a very
detailed, complicated,

101
00:05:07,720 --> 00:05:11,480
lower level description that you
can prove implements the higher

102
00:05:11,480 --> 00:05:13,286
level description.

103
00:05:13,286 --> 00:05:15,460
That's another
popular technique.

104
00:05:15,460 --> 00:05:17,670
You use different kinds
of complexity measures.

105
00:05:17,670 --> 00:05:21,510
For time complexity,
you would measure rounds

106
00:05:21,510 --> 00:05:25,610
if it's the synchronous
model, or some approximation

107
00:05:25,610 --> 00:05:28,660
to real time, if it's
the asynchronous model.

108
00:05:28,660 --> 00:05:31,410
You also count communication,
either the number

109
00:05:31,410 --> 00:05:33,660
of messages you send, or
the total number of bits

110
00:05:33,660 --> 00:05:35,421
that you send in an algorithm.

111
00:05:38,000 --> 00:05:40,550
So throughout
these two lectures,

112
00:05:40,550 --> 00:05:44,300
we'll be looking at
distributed networks.

113
00:05:44,300 --> 00:05:45,740
So you start with a graph.

114
00:05:45,740 --> 00:05:49,830
Let's just look at
undirected graphs this week.

115
00:05:49,830 --> 00:05:52,050
We use n in this
field for what you're

116
00:05:52,050 --> 00:05:56,490
calling v, the total number
of nodes in the network

117
00:05:56,490 --> 00:06:01,780
or vertices in the graph.

118
00:06:01,780 --> 00:06:05,200
We use the notation gamma of
u to mean the neighbors of u

119
00:06:05,200 --> 00:06:06,910
in the graph.

120
00:06:06,910 --> 00:06:11,060
So every vertex of the graph has
a set of immediate neighboring

121
00:06:11,060 --> 00:06:11,810
vertices.

122
00:06:11,810 --> 00:06:13,730
That's gamma of u.

123
00:06:13,730 --> 00:06:19,310
And the degree of u is the size
of the neighborhood, the number

124
00:06:19,310 --> 00:06:22,090
of neighbors of the vertex.
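
The notation just introduced can be sketched in a few lines of Python. This is only an illustration: the adjacency-set representation, the example graph, and the names `graph`, `gamma`, and `degree` are assumptions, not something from the lecture slides.

```python
# Hypothetical undirected graph stored as adjacency sets (an assumed representation).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

def gamma(g, u):
    """Return the set of immediate neighbors of vertex u."""
    return g[u]

def degree(g, u):
    """Return the number of neighbors of vertex u."""
    return len(gamma(g, u))

n = len(graph)  # n is the total number of vertices (nodes) in the network
```

With this example graph, `n` is 4, `gamma(graph, "c")` is the set of three neighbors of `c`, and `degree(graph, "d")` is 1.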

125
00:06:22,090 --> 00:06:24,050
OK so we start with the graph.

126
00:06:24,050 --> 00:06:25,740
But now we're
going to plunk down

127
00:06:25,740 --> 00:06:29,350
a process, some kind
of active entity

128
00:06:29,350 --> 00:06:31,900
at each vertex of the graph.

129
00:06:31,900 --> 00:06:33,520
So this is some
kind of automaton.

130
00:06:33,520 --> 00:06:36,130
If you've taken automata
theory, it's not really

131
00:06:36,130 --> 00:06:39,960
finite state machines, it's more
like infinite state automata

132
00:06:39,960 --> 00:06:43,760
that can interact
with each other.

133
00:06:43,760 --> 00:06:47,820
So we usually talk about
vertices in a graph, processes

134
00:06:47,820 --> 00:06:49,160
at the vertices of a graph.

135
00:06:49,160 --> 00:06:51,930
But sometimes we get
sloppy and just say nodes.

136
00:06:51,930 --> 00:06:54,980
And we could mean either the
vertex or the active thing

137
00:06:54,980 --> 00:06:56,740
running at the vertex.

138
00:06:56,740 --> 00:06:59,840
Can't keep them
straight all the time.

139
00:06:59,840 --> 00:07:02,120
OK and then with the
edges of the graph,

140
00:07:02,120 --> 00:07:05,120
we would put
communication channels,

141
00:07:05,120 --> 00:07:08,690
one in each direction,
so that the processes

142
00:07:08,690 --> 00:07:11,700
can communicate over the edges.

143
00:07:11,700 --> 00:07:13,840
This week I'm not going
to talk about what

144
00:07:13,840 --> 00:07:16,750
happens when you introduce
failures because we just

145
00:07:16,750 --> 00:07:17,610
don't have time.

146
00:07:17,610 --> 00:07:20,800
A lot of distributed computing
theory deals with what

147
00:07:20,800 --> 00:07:24,340
happens when some of the
components in your system fail.

148
00:07:24,340 --> 00:07:27,330
How do you cope with that?

149
00:07:27,330 --> 00:07:29,880
So we'll start right in
with synchronous distributed

150
00:07:29,880 --> 00:07:30,380
algorithms.

151
00:07:32,956 --> 00:07:34,580
A source for that,
if you're interested

152
00:07:34,580 --> 00:07:38,180
is the first technical
chapter in my book.

153
00:07:38,180 --> 00:07:41,060
OK so you have processes
at the nodes of a graph,

154
00:07:41,060 --> 00:07:42,270
like I just said.

155
00:07:42,270 --> 00:07:45,830
They communicate using messages.

156
00:07:45,830 --> 00:07:49,460
So think of each process as not
knowing who his neighbors are,

157
00:07:49,460 --> 00:07:51,930
not knowing anything
about the graph.

158
00:07:51,930 --> 00:07:53,080
So what do they have?

159
00:07:53,080 --> 00:07:53,990
They have ports.

160
00:07:53,990 --> 00:07:57,410
You could say they have output
ports, on which they could send

161
00:07:57,410 --> 00:08:01,360
a message, and then some
input ports on which messages

162
00:08:01,360 --> 00:08:02,900
can come in.

163
00:08:02,900 --> 00:08:06,060
So in general, the
process doesn't know

164
00:08:06,060 --> 00:08:08,660
who the ports are connected to.

165
00:08:08,660 --> 00:08:10,470
It just has local
names for the ports,

166
00:08:10,470 --> 00:08:13,800
like one, two, three,
up to the degree.

167
00:08:13,800 --> 00:08:17,110
If you have any questions
just stop me and ask,

168
00:08:17,110 --> 00:08:19,190
if something's not clear.

169
00:08:19,190 --> 00:08:20,802
Otherwise I'll go pretty fast.

170
00:08:20,802 --> 00:08:22,510
And I know that none
of this is familiar.

171
00:08:25,620 --> 00:08:27,470
So in general, the
processes don't have

172
00:08:27,470 --> 00:08:31,070
to be distinguishable at all.

173
00:08:31,070 --> 00:08:35,127
So they don't have to have
special unique identifiers

174
00:08:35,127 --> 00:08:36,710
so you could tell
the processes apart.

175
00:08:36,710 --> 00:08:38,995
They could be
completely identical.

176
00:08:38,995 --> 00:08:40,870
Well if they have
different numbers of ports,

177
00:08:40,870 --> 00:08:43,370
they're not exactly identical.

178
00:08:43,370 --> 00:08:44,870
They certainly
know how many ports

179
00:08:44,870 --> 00:08:47,540
they have, and at least the
local names for the ports.

180
00:08:51,320 --> 00:08:52,817
Good so these are
processes sitting

181
00:08:52,817 --> 00:08:53,900
at the nodes of the graph.

182
00:08:53,900 --> 00:08:55,350
What do they do?

183
00:08:55,350 --> 00:08:56,490
So they execute.

184
00:08:56,490 --> 00:09:00,900
And we talk about an
execution of this network.

185
00:09:00,900 --> 00:09:04,310
It goes in synchronous
rounds, and every round,

186
00:09:04,310 --> 00:09:06,620
every process
looks at its state,

187
00:09:06,620 --> 00:09:08,820
and decides what
messages it's going

188
00:09:08,820 --> 00:09:12,460
to send on all of the ports.

189
00:09:12,460 --> 00:09:15,425
So it could send different
messages on different ports.

190
00:09:18,110 --> 00:09:20,200
So then what happens
is all the messages

191
00:09:20,200 --> 00:09:23,540
that the processes decide to
send get put onto the channels

192
00:09:23,540 --> 00:09:26,810
and they get delivered to
the process at the other end.

193
00:09:26,810 --> 00:09:30,520
So the process at the
other end is in some state.

194
00:09:30,520 --> 00:09:32,090
All these messages come in.

195
00:09:32,090 --> 00:09:34,840
It updates its state, based
on the arriving messages.

196
00:09:34,840 --> 00:09:37,750
So it changes state in
response to whatever comes in.
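
One synchronous round, as described above, might be simulated along the following lines. This is a minimal sketch under stated assumptions: the function `run_round`, the dictionary-based wiring, and the `send` and `receive` callbacks are hypothetical names chosen for illustration, not a formal model from the lecture.

```python
def run_round(states, channels, send, receive):
    """Execute one synchronous round and return the new states.

    states:   dict mapping node -> current state
    channels: dict mapping (node, out_port) -> (neighbor, in_port)
    send:     function(state, out_port) -> message to put on that port
    receive:  function(state, inbox) -> new state, where inbox maps in_port -> message
    """
    # Phase 1: every process decides its messages from its state at the start of the round.
    inboxes = {u: {} for u in states}
    for (u, out_port), (v, in_port) in channels.items():
        inboxes[v][in_port] = send(states[u], out_port)
    # Phase 2: every process updates its state based on the messages that arrived.
    return {u: receive(states[u], inboxes[u]) for u in states}


# Usage: two processes joined by one channel in each direction.
channels = {("p", 1): ("q", 1), ("q", 1): ("p", 1)}
states = {"p": 3, "q": 7}
after = run_round(
    states,
    channels,
    send=lambda state, port: state,                              # send own state on every port
    receive=lambda state, inbox: max([state, *inbox.values()]),  # keep the largest value seen
)
```

All messages are computed before any state changes, which is what makes the round synchronous. Local computation inside `send` and `receive` is treated as free, matching the cost model described here.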

197
00:09:42,880 --> 00:09:46,780
And this is completely different
from this semester so far.

198
00:09:46,780 --> 00:09:48,880
We're going to completely
ignore the costs

199
00:09:48,880 --> 00:09:51,560
of the local computation.

200
00:09:51,560 --> 00:09:54,642
So each node can compute
some complicated algorithm

201
00:09:54,642 --> 00:09:56,600
of the sort you've been
studying in this class,

202
00:09:56,600 --> 00:09:59,310
and we usually don't
consider that cost.

203
00:09:59,310 --> 00:10:03,610
We're more worried about
the communication costs.

204
00:10:03,610 --> 00:10:07,690
And so we'll be focusing on the
number of rounds that it takes,

205
00:10:07,690 --> 00:10:11,170
in the synchronous case, and
the number of communication

206
00:10:11,170 --> 00:10:14,710
messages or bits.

207
00:10:14,710 --> 00:10:15,460
OK so far?

208
00:10:18,650 --> 00:10:20,150
So let's start on
the first problem.

209
00:10:20,150 --> 00:10:22,250
Here's a graph.

210
00:10:22,250 --> 00:10:24,460
The nodes start out
possibly identical,

211
00:10:24,460 --> 00:10:27,300
but you want to somehow
distinguish one of them

212
00:10:27,300 --> 00:10:30,200
to be a leader.

213
00:10:30,200 --> 00:10:34,150
So you have this arbitrary,
connected, undirected graph.

214
00:10:34,150 --> 00:10:36,280
And exactly one
process is supposed

215
00:10:36,280 --> 00:10:37,650
to elect itself the leader.

216
00:10:37,650 --> 00:10:40,645
That means it outputs a
special leader signal.

217
00:10:43,360 --> 00:10:45,554
so exactly one should do that.

218
00:10:45,554 --> 00:10:46,720
So why do you want a leader?

219
00:10:46,720 --> 00:10:52,080
Well in practice, leaders
can coordinate things.

220
00:10:52,080 --> 00:10:54,160
They can take charge
of communication,

221
00:10:54,160 --> 00:10:56,110
and inform other
nodes when they're

222
00:10:56,110 --> 00:10:57,380
allowed to send messages.

223
00:10:57,380 --> 00:10:59,440
They can coordinate
the processing of data.

224
00:10:59,440 --> 00:11:01,610
Basically it allows
you to centralize

225
00:11:01,610 --> 00:11:03,190
some of the computation.

226
00:11:03,190 --> 00:11:05,400
It can schedule the
other processes.

227
00:11:05,400 --> 00:11:07,680
It can allocate the resources.

228
00:11:07,680 --> 00:11:10,020
It could help to reach
agreement among the processes,

229
00:11:10,020 --> 00:11:12,280
if they start out with
different opinions about what

230
00:11:12,280 --> 00:11:13,196
is supposed to happen.

231
00:11:15,782 --> 00:11:17,990
All right so let's start
out with a very simple case.

232
00:11:17,990 --> 00:11:18,740
You have a clique.

233
00:11:18,740 --> 00:11:22,500
Here's a four clique, where
all the vertices are directly

234
00:11:22,500 --> 00:11:24,490
connected to all
the other vertices,

235
00:11:24,490 --> 00:11:27,932
with bidirectional channels.

236
00:11:27,932 --> 00:11:29,265
And the processes are identical.

237
00:11:31,962 --> 00:11:33,420
So I should have
asked you, instead

238
00:11:33,420 --> 00:11:36,790
of just giving the
answer here, but are they

239
00:11:36,790 --> 00:11:39,550
able to elect a leader?

240
00:11:39,550 --> 00:11:42,770
So this theorem says that in
general, that's impossible.

241
00:11:42,770 --> 00:11:46,880
Or it's not possible, in
the most general case.

242
00:11:46,880 --> 00:11:48,730
If you have, no
matter what n is,

243
00:11:48,730 --> 00:11:53,100
let's just say we have an
n vertex clique for some n.

244
00:11:53,100 --> 00:11:57,040
It's not possible to have
any algorithm that you

245
00:11:57,040 --> 00:12:01,760
can have all the processes
run, if it's deterministic

246
00:12:01,760 --> 00:12:04,550
and the processes start
out all indistinguishable.

247
00:12:04,550 --> 00:12:07,940
There's no way
that they can elect

248
00:12:07,940 --> 00:12:09,430
a single node as a leader.

249
00:12:09,430 --> 00:12:12,030
So do you have an intuition
for why that might be the case?

250
00:12:14,982 --> 00:12:16,434
Yeah.

251
00:12:16,434 --> 00:12:17,934
AUDIENCE: They're
all connected, and

252
00:12:17,934 --> 00:12:21,378
the cross-problem
communication in one round

253
00:12:21,378 --> 00:12:23,697
is equal, then to
be equally likely

254
00:12:23,697 --> 00:12:24,822
to select each one of them.

255
00:12:24,822 --> 00:12:26,300
It would be--

256
00:12:26,300 --> 00:12:30,290
NANCY LYNCH: It's deterministic;
there's no likelihood here.

257
00:12:30,290 --> 00:12:34,410
And nobody is doing
any selecting.

258
00:12:34,410 --> 00:12:36,640
You're talking as if there's
somebody who's choosing

259
00:12:36,640 --> 00:12:38,260
a process to do something.

260
00:12:38,260 --> 00:12:40,870
There isn't anyone in charge.

261
00:12:40,870 --> 00:12:43,700
So this is a really
different way of thinking.

262
00:12:43,700 --> 00:12:46,180
AUDIENCE: So every node is
essentially the exact same.

263
00:12:46,180 --> 00:12:49,370
So if it says, OK, let's
assume I'm going to be leader,

264
00:12:49,370 --> 00:12:53,400
everyone is going to assume
they're going to be leader.

265
00:12:53,400 --> 00:12:55,420
NANCY LYNCH: That's exactly
the right intuition.

266
00:12:55,420 --> 00:12:56,810
They can't
distinguish themselves

267
00:12:56,810 --> 00:12:59,960
because they're always
going to do the same thing.

268
00:12:59,960 --> 00:13:01,340
Let's look at a
very simple case.

269
00:13:01,340 --> 00:13:05,420
Suppose we have just two nodes,
two node clique, two nodes

270
00:13:05,420 --> 00:13:07,970
connected by channels.

271
00:13:07,970 --> 00:13:09,470
These are identical.

272
00:13:09,470 --> 00:13:10,600
They're deterministic.

273
00:13:10,600 --> 00:13:11,900
What can they do?

274
00:13:11,900 --> 00:13:14,530
Well you could try to design
algorithms for one of them

275
00:13:14,530 --> 00:13:16,680
to elect itself as the leader.

276
00:13:16,680 --> 00:13:18,960
But you can show,
by using induction,

277
00:13:18,960 --> 00:13:20,530
that the processes
are actually going

278
00:13:20,530 --> 00:13:25,230
to remain in the same state
as each other forever,

279
00:13:25,230 --> 00:13:28,260
however many rounds you execute.

280
00:13:28,260 --> 00:13:30,130
So let's slow down.

281
00:13:30,130 --> 00:13:32,090
We can work by contradiction.

282
00:13:32,090 --> 00:13:36,460
Suppose you have an algorithm
that solves this problem.

283
00:13:36,460 --> 00:13:38,270
Both of the processes,
they're identical.

284
00:13:38,270 --> 00:13:40,676
They start in the
same start state.

285
00:13:40,676 --> 00:13:42,300
Let's say there's a
unique start state.

286
00:13:46,350 --> 00:13:49,750
So we could prove by induction
on the number of rounds

287
00:13:49,750 --> 00:13:53,250
that after any number
of rounds, say r rounds,

288
00:13:53,250 --> 00:13:57,770
the processes are still
in identical states.

289
00:13:57,770 --> 00:13:59,770
So the inductive step
is, all right, they're

290
00:13:59,770 --> 00:14:01,960
in identical states after
some number of rounds.

291
00:14:01,960 --> 00:14:03,677
Let's look at the next round.

292
00:14:03,677 --> 00:14:04,760
They're in the same state.

293
00:14:04,760 --> 00:14:08,330
So they generate
the same messages.

294
00:14:08,330 --> 00:14:09,900
So they send each other
the same messages.

295
00:14:09,900 --> 00:14:12,680
They receive the same message.

296
00:14:12,680 --> 00:14:15,140
And then they make
the same state change.

297
00:14:15,140 --> 00:14:17,080
So they stay in the same state.

298
00:14:19,800 --> 00:14:22,260
And you can tweak
this, and say how this

299
00:14:22,260 --> 00:14:24,025
works for-- yeah, question?

300
00:14:24,025 --> 00:14:28,010
AUDIENCE: So in what ways is
the proof a contradiction?

301
00:14:28,010 --> 00:14:29,260
NANCY LYNCH: I'm not finished.

302
00:14:29,260 --> 00:14:30,620
You're exactly right.

303
00:14:30,620 --> 00:14:33,850
We have to finish by using the
requirements of the problem.

304
00:14:33,850 --> 00:14:38,410
Since the algorithm has to solve
the leader election problem,

305
00:14:38,410 --> 00:14:41,170
the requirements say that
eventually, one of them

306
00:14:41,170 --> 00:14:45,460
has to output leader.

307
00:14:45,460 --> 00:14:46,820
And what happens when he does?

308
00:14:50,940 --> 00:14:51,440
Anyone?

309
00:14:51,440 --> 00:14:51,730
Yeah.

310
00:14:51,730 --> 00:14:54,240
AUDIENCE: You have the other node also
outputting the leader signal.

311
00:14:54,240 --> 00:14:56,790
NANCY LYNCH: Yeah the other one
would also do the same thing.

312
00:14:56,790 --> 00:14:59,820
We're saying round by round,
they stay in the same state.

313
00:14:59,820 --> 00:15:04,360
So as someone said before,
when one guy outputs leader,

314
00:15:04,360 --> 00:15:08,210
at the same round the other
guy will output leader as well.

315
00:15:08,210 --> 00:15:10,545
So that's a contradiction
to the problem requirements.

316
00:15:10,545 --> 00:15:12,170
Notice we didn't
assume anything at all

317
00:15:12,170 --> 00:15:15,040
about exactly how
the algorithm works.

318
00:15:15,040 --> 00:15:17,990
We're just saying, however it
works, it can't solve this,

319
00:15:17,990 --> 00:15:20,110
under the assumptions
that the nodes

320
00:15:20,110 --> 00:15:21,830
are indistinguishable
and deterministic.

321
00:15:24,780 --> 00:15:26,710
So as you can see,
this will extend if you

322
00:15:26,710 --> 00:15:30,680
have larger cliques of size n.

323
00:15:30,680 --> 00:15:33,710
So now the process has
not just one output port,

324
00:15:33,710 --> 00:15:38,080
it has n minus 1 output ports to
connect to all the other nodes.

325
00:15:38,080 --> 00:15:41,370
Let's say they're numbered
1 through n minus 1.

326
00:15:41,370 --> 00:15:45,240
And one of the possibilities,
and one I'll use in this proof

327
00:15:45,240 --> 00:15:47,980
is that the ports happen to
be numbered consistently.

328
00:15:47,980 --> 00:15:52,470
So that if you have output
port number k at one node,

329
00:15:52,470 --> 00:15:57,320
it's connected to input port
number k at the other end.

330
00:15:57,320 --> 00:16:00,207
So that's one way
things can match up.

331
00:16:00,207 --> 00:16:01,790
All right if that's
the case, we could

332
00:16:01,790 --> 00:16:03,230
do the same proof we just did.

333
00:16:03,230 --> 00:16:06,560
Show by induction that all
the processes in the clique

334
00:16:06,560 --> 00:16:09,580
remain in the same
state forever.

335
00:16:09,580 --> 00:16:10,652
So same proof.

336
00:16:10,652 --> 00:16:12,610
Suppose you have an
algorithm that solves it.

337
00:16:12,610 --> 00:16:14,620
They all begin in
the same state.

338
00:16:14,620 --> 00:16:17,080
You show by induction that
they all remain in the same state.

339
00:16:19,690 --> 00:16:21,640
Well so now we slow
down a little bit.

340
00:16:21,640 --> 00:16:25,920
Each process sends a possibly
different message on each port.

341
00:16:25,920 --> 00:16:28,540
But everybody sends the
same message on port k

342
00:16:28,540 --> 00:16:30,980
because they're all
indistinguishable.

343
00:16:30,980 --> 00:16:33,080
And then because the
way the ports match up,

344
00:16:33,080 --> 00:16:36,370
everybody receives the
same message on port k.

345
00:16:36,370 --> 00:16:38,120
And then they make the
same state changes.

346
00:16:41,030 --> 00:16:43,456
AUDIENCE: Does this
proof imply that there's

347
00:16:43,456 --> 00:16:46,442
a kernel for simplifying the
graph when you find a clique?

348
00:16:50,240 --> 00:16:53,250
NANCY LYNCH: No because
if you have a graph that

349
00:16:53,250 --> 00:16:55,060
consists of a clique
and then let's say,

350
00:16:55,060 --> 00:16:57,330
some other stuff,
maybe the leader

351
00:16:57,330 --> 00:16:59,770
could be somebody
outside the clique.

352
00:16:59,770 --> 00:17:01,920
So you can't just
say because there's

353
00:17:01,920 --> 00:17:04,619
a clique that you can't elect
a leader because you could

354
00:17:04,619 --> 00:17:08,091
break the symmetry of the graph
with other stuff in the graph.

355
00:17:08,091 --> 00:17:09,055
Yeah?

356
00:17:09,055 --> 00:17:11,947
AUDIENCE: What assumptions do
we make to know that for each k,

357
00:17:11,947 --> 00:17:14,035
they receive the same message?

358
00:17:14,035 --> 00:17:15,410
NANCY LYNCH:
Because everybody is

359
00:17:15,410 --> 00:17:18,109
going to send the same message
on the same numbered port,

360
00:17:18,109 --> 00:17:19,192
because they're identical.

361
00:17:22,079 --> 00:17:23,933
And one way the ports
can be hooked up,

362
00:17:23,933 --> 00:17:26,349
and we have to tolerate all
ways they could be hooked up--

363
00:17:26,349 --> 00:17:28,800
say an adversary
hooks them up-- is

364
00:17:28,800 --> 00:17:32,430
that port k,
somebody's output port,

365
00:17:32,430 --> 00:17:36,890
is the other end's
input port numbered k.

366
00:17:36,890 --> 00:17:38,710
So then they all
receive the same message

367
00:17:38,710 --> 00:17:41,374
on their port number k.

368
00:17:41,374 --> 00:17:41,874
Yeah?

369
00:17:41,874 --> 00:17:43,317
AUDIENCE: Is it actually
possible to always hook up

370
00:17:43,317 --> 00:17:44,150
the ports that way?

371
00:17:44,150 --> 00:17:48,480
I mean, it's like wrapped
with three vertices.

372
00:17:48,480 --> 00:17:51,310
NANCY LYNCH: Well I'm
just doing it for cliques.

373
00:17:51,310 --> 00:17:53,390
Yeah it is.

374
00:17:53,390 --> 00:17:54,610
Yeah you could do it.

375
00:17:54,610 --> 00:17:57,470
I mean you could have port
one always going clockwise,

376
00:17:57,470 --> 00:18:00,780
and port two going
counterclockwise,

377
00:18:00,780 --> 00:18:03,560
I mean, there's always a
way to do that in a clique.

378
00:18:03,560 --> 00:18:06,330
I checked that.
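
One way to check the claim for an n-vertex clique is to build such a wiring explicitly. The construction below is an assumption made for illustration (nodes arranged on a ring, with output port k leading k steps clockwise); the lecture only states that some consistent numbering exists.

```python
def consistent_ports(n):
    """Return a wiring for an n-clique as a dict (node, out_port) -> (neighbor, in_port).

    Nodes are 0..n-1 and ports are 1..n-1. Output port k of node i leads to node
    (i + k) mod n and arrives on that node's input port k.
    """
    wiring = {}
    for i in range(n):
        for k in range(1, n):
            wiring[(i, k)] = ((i + k) % n, k)
    return wiring

def is_valid_clique_wiring(n, wiring):
    """Check every ordered pair of distinct nodes has a channel and no input port is reused."""
    pairs = {(i, j) for (i, _), (j, _) in wiring.items()}
    in_ports = [target for target in wiring.values()]
    all_pairs = {(i, j) for i in range(n) for j in range(n) if i != j}
    return pairs == all_pairs and len(set(in_ports)) == len(in_ports)

wiring = consistent_ports(4)
ok = is_valid_clique_wiring(4, wiring)
```

Under this wiring, the message arriving on input port k of any node was sent on some other node's output port k, which is the property the symmetry argument relies on.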

379
00:18:06,330 --> 00:18:09,240
So what you've just seen is
one of the very basic problems

380
00:18:09,240 --> 00:18:11,780
for distributed algorithms,
which is breaking symmetry

381
00:18:11,780 --> 00:18:13,680
among identical processes.

382
00:18:13,680 --> 00:18:17,850
And you see that deterministic,
indistinguishable processes

383
00:18:17,850 --> 00:18:19,140
just can't do it.

384
00:18:19,140 --> 00:18:21,610
So we have to have
something more.

385
00:18:21,610 --> 00:18:23,100
So what do you
think we could add

386
00:18:23,100 --> 00:18:24,545
to make this problem solvable?

387
00:18:27,260 --> 00:18:28,680
AUDIENCE: [INAUDIBLE] processes.

388
00:18:28,680 --> 00:18:30,174
NANCY LYNCH: I can't hear.

389
00:18:30,174 --> 00:18:31,090
AUDIENCE: Probability.

390
00:18:31,090 --> 00:18:33,135
NANCY LYNCH: Probability, OK, anything else?

391
00:18:36,210 --> 00:18:39,320
So we could have the processes
actually distinguishable.

392
00:18:39,320 --> 00:18:42,710
The common way in this area is
to say that each process has

393
00:18:42,710 --> 00:18:43,720
an identifier.

394
00:18:43,720 --> 00:18:47,690
Like, you buy a chip and it's
got some identifier burned in.

395
00:18:47,690 --> 00:18:50,160
OK so you have some kind
of unique identifiers.

396
00:18:50,160 --> 00:18:53,520
Or you can use randomness.

397
00:18:53,520 --> 00:18:57,230
OK for unique
identifiers, you assume

398
00:18:57,230 --> 00:19:00,430
everybody has some
number or some identifier

399
00:19:00,430 --> 00:19:01,890
that it knows what it is.

400
00:19:01,890 --> 00:19:07,050
It's built into its state, let's
say, a special state variable.

401
00:19:07,050 --> 00:19:08,740
They're totally
ordered, generally.

402
00:19:08,740 --> 00:19:15,170
They could be integers, or
from some totally ordered set.

403
00:19:15,170 --> 00:19:16,760
When you say unique
identifiers,

404
00:19:16,760 --> 00:19:20,810
it means that different
identifiers could

405
00:19:20,810 --> 00:19:23,360
appear any place in the graph.

406
00:19:23,360 --> 00:19:27,430
But each identifier can
appear at most once.

407
00:19:27,430 --> 00:19:29,870
You can have a huge identifier
space in a small graph.

408
00:19:29,870 --> 00:19:32,700
But you're just selecting
some identifiers

409
00:19:32,700 --> 00:19:36,790
to put in the
processes in the graph.

410
00:19:36,790 --> 00:19:37,720
So that's one setup.

411
00:19:37,720 --> 00:19:41,880
And the other one, of
course, is using randomness.

412
00:19:41,880 --> 00:19:44,930
So let's look at the
unique identifiers first.

413
00:19:44,930 --> 00:19:46,270
Now the problem becomes easy.

414
00:19:46,270 --> 00:19:48,330
Let's look at the clique again.

415
00:19:48,330 --> 00:19:51,970
Suppose there's an
algorithm-- well, let's

416
00:19:51,970 --> 00:19:53,920
construct an algorithm
that consists

417
00:19:53,920 --> 00:19:58,760
of deterministic processes
with unique identifiers.

418
00:19:58,760 --> 00:20:02,250
And we're going to guarantee
to elect a leader in the graph.

419
00:20:02,250 --> 00:20:03,990
And moreover, it's
just going to take

420
00:20:03,990 --> 00:20:06,180
one round of communication.

421
00:20:06,180 --> 00:20:10,210
And it's only going to
use n squared messages.

422
00:20:10,210 --> 00:20:11,190
How could that work?

423
00:20:17,160 --> 00:20:20,340
Everybody in this clique
has a unique identifier.

424
00:20:20,340 --> 00:20:22,540
What would they do?

425
00:20:22,540 --> 00:20:23,400
Send it out, right?

426
00:20:23,400 --> 00:20:25,860
So you can just send
it on all your ports.

427
00:20:25,860 --> 00:20:28,250
Everybody would send its
unique identifier on all

428
00:20:28,250 --> 00:20:29,740
its output ports.

429
00:20:29,740 --> 00:20:33,540
And then they collect the unique
identifiers from everyone else.

430
00:20:33,540 --> 00:20:37,360
So everybody sees the
same set of identifiers.

431
00:20:37,360 --> 00:20:40,870
And so the process with the
maximum unique identifier

432
00:20:40,870 --> 00:20:43,409
knows that it's the only
one with that identifier.

433
00:20:43,409 --> 00:20:44,450
And it's the biggest one.

434
00:20:44,450 --> 00:20:46,120
So it can elect
itself the leader.

435
00:20:49,250 --> 00:20:51,790
So all you need is unique
identifiers and the ability

436
00:20:51,790 --> 00:20:54,070
to exchange them reliably.

437
00:20:54,070 --> 00:20:55,930
And you can elect
somebody easily.
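
A minimal sketch of the one-round clique election just described, assuming a synchronous round is modeled as plain list operations. The function name `elect_leader_in_clique` and the list-of-UIDs input are illustrative assumptions, not part of the lecture.

```python
def elect_leader_in_clique(uids):
    """Simulate one synchronous round of leader election in a clique.

    uids[i] is the unique identifier held by process i (hypothetical input).
    Returns (is_leader, messages_sent).
    """
    n = len(uids)
    if len(set(uids)) != n:
        raise ValueError("UIDs must be unique")
    # Every process sends its UID on all its ports, so process i
    # receives the UID of every other process.
    inbox = [[uids[j] for j in range(n) if j != i] for i in range(n)]
    messages_sent = sum(len(box) for box in inbox)  # n * (n - 1), under n squared
    # A process elects itself iff its own UID beats everything it received.
    is_leader = [all(uids[i] > other for other in inbox[i]) for i in range(n)]
    return is_leader, messages_sent
```

Calling it on four processes with UIDs 7, 42, 3, 19 marks only the process holding 42 as leader and counts 12 messages.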

438
00:20:58,810 --> 00:21:03,050
Randomness, well,
various ways to do it.

439
00:21:03,050 --> 00:21:07,270
But one idea is the processes
could just choose identifiers

440
00:21:07,270 --> 00:21:08,700
randomly.

441
00:21:08,700 --> 00:21:13,420
You take a sufficiently large
set of possible identifiers,

442
00:21:13,420 --> 00:21:16,540
and so if they just choose
uniformly at random,

443
00:21:16,540 --> 00:21:19,640
they're likely to choose
all different identifiers.

444
00:21:19,640 --> 00:21:22,590
Once you have these
randomly chosen identifiers

445
00:21:22,590 --> 00:21:26,770
you could use them like the
really unique identifiers.

446
00:21:26,770 --> 00:21:29,700
The only thing is you might,
there's a small chance

447
00:21:29,700 --> 00:21:31,170
that you'll have a duplicate.

448
00:21:31,170 --> 00:21:34,520
In which case, you want to be
able to detect that and repeat

449
00:21:34,520 --> 00:21:36,100
this.

450
00:21:36,100 --> 00:21:40,112
So first of all, how big
a set do you need?

451
00:21:40,112 --> 00:21:41,070
Well here's an example.

452
00:21:43,950 --> 00:21:46,410
Suppose that you have
the n processes choosing

453
00:21:46,410 --> 00:21:51,390
at random, independently
from a space of size r.

454
00:21:51,390 --> 00:21:57,030
Identifiers are the
numbers one through r.

455
00:21:57,030 --> 00:22:01,290
OK and r is going
to depend on n.

456
00:22:01,290 --> 00:22:03,230
It's going to be like n
squared, but it's also

457
00:22:03,230 --> 00:22:06,230
going to depend on epsilon,
which is the error probability

458
00:22:06,230 --> 00:22:08,270
that you're interested in.

459
00:22:08,270 --> 00:22:11,710
Turns out that n squared over
2 epsilon is good enough.

460
00:22:11,710 --> 00:22:15,300
OK so you have your ID
space at least that large.

461
00:22:15,300 --> 00:22:18,820
And then you can guarantee that
with probability at least 1

462
00:22:18,820 --> 00:22:22,130
minus epsilon, all the
numbers that everybody chooses

463
00:22:22,130 --> 00:22:24,342
are different.

464
00:22:24,342 --> 00:22:25,300
It's a very easy proof.

465
00:22:25,300 --> 00:22:27,980
The probability-- just look
at two particular processes--

466
00:22:27,980 --> 00:22:31,050
what's the probability that
they choose the same number?

467
00:22:31,050 --> 00:22:32,594
It's just 1 over r, right.

468
00:22:32,594 --> 00:22:34,260
Because they're both
choosing at random.

469
00:22:34,260 --> 00:22:35,690
The first one chooses something.

470
00:22:35,690 --> 00:22:37,470
The probability
that the second one

471
00:22:37,470 --> 00:22:41,080
chooses the same thing
is just 1 over r.

472
00:22:41,080 --> 00:22:42,600
But now you can
take a union bound,

473
00:22:42,600 --> 00:22:49,020
just add up the probabilities
of any pair having a duplicate.

474
00:22:49,020 --> 00:22:52,520
And so you have
around n squared over 2 pairs.

475
00:22:52,520 --> 00:22:57,500
And so multiplying 1 over
r by n squared over 2

476
00:22:57,500 --> 00:23:00,590
still keeps your probability
less than or equal to epsilon,

477
00:23:00,590 --> 00:23:02,820
your error probability.
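
The union-bound arithmetic can be checked numerically. This is a sketch only: the helper names `id_space_size` and `union_bound` are made up for illustration, and exact fractions are used so rounding does not blur the comparison with epsilon.

```python
import math
from fractions import Fraction

def id_space_size(n, epsilon):
    """Smallest integer r with r >= n^2 / (2 * epsilon)."""
    return math.ceil(Fraction(n * n) / (2 * Fraction(epsilon)))

def union_bound(n, r):
    """Upper bound on Pr[some pair picks the same ID].

    There are n * (n - 1) / 2 pairs (roughly n^2 / 2), and each pair
    collides with probability 1 / r when choices are uniform and independent.
    """
    pairs = n * (n - 1) // 2
    return Fraction(pairs, r)

n = 10
epsilon = Fraction(1, 100)
r = id_space_size(n, epsilon)
bound = union_bound(n, r)
```

With 10 processes and epsilon of 1/100, the space needs 5000 identifiers, and the collision bound comes out to 45/5000, which is below epsilon.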

478
00:23:02,820 --> 00:23:08,740
So you can choose
identifiers using randomness.

479
00:23:08,740 --> 00:23:11,640
With large enough space,
with very high probability,

480
00:23:11,640 --> 00:23:15,910
you can get them to
be all different.

481
00:23:15,910 --> 00:23:17,795
And now here's how
the algorithm works.

482
00:23:20,460 --> 00:23:24,630
So you get an algorithm that
would finish in only one round,

483
00:23:24,630 --> 00:23:26,980
with probability
1 minus epsilon.

484
00:23:26,980 --> 00:23:28,300
But it will be correct.

485
00:23:28,300 --> 00:23:30,640
And it will have
repeated rounds,

486
00:23:30,640 --> 00:23:32,840
in case the first
round doesn't work.

487
00:23:32,840 --> 00:23:35,900
But the expected
time is just 1 over 1

488
00:23:35,900 --> 00:23:39,130
minus epsilon, not very big.

489
00:23:39,130 --> 00:23:40,380
What's the algorithm?

490
00:23:40,380 --> 00:23:43,880
Well processes just choose the
random IDs from the big space,

491
00:23:43,880 --> 00:23:45,200
like we just said.

492
00:23:45,200 --> 00:23:47,770
They exchange their IDs.

493
00:23:47,770 --> 00:23:50,030
And now, everybody
can see everyone's ID,

494
00:23:50,030 --> 00:23:52,750
but they also can tell
if there's a duplicate,

495
00:23:52,750 --> 00:23:55,030
if the maximum is not unique.

496
00:23:55,030 --> 00:23:57,680
So if the maximum is unique,
fine, the maximum wins.

497
00:23:57,680 --> 00:23:59,500
And everyone knows that.

498
00:23:59,500 --> 00:24:01,190
Otherwise you have a problem.

499
00:24:01,190 --> 00:24:02,190
And you have to repeat.

500
00:24:02,190 --> 00:24:06,200
And you just keep doing
that until you succeed.

501
00:24:06,200 --> 00:24:08,650
So this can just
continue, but it's

502
00:24:08,650 --> 00:24:11,860
likely to finish very fast,
if you have a high likelihood

503
00:24:11,860 --> 00:24:13,560
of having no duplicates.
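
The repeat-until-success loop can be sketched as below, assuming a clique where every process sees every chosen ID. The function name, the `rng` parameter, and the return shape are illustrative assumptions; the lecture gives no code.

```python
import random

def randomized_leader_election(n, r, rng=None):
    """Repeat one-round elections until the maximum random ID is unique.

    n: number of processes (n >= 1), r: size of the ID space {1, ..., r}.
    Returns (leader_index, rounds_used).
    """
    rng = rng or random.Random()
    rounds = 0
    while True:
        rounds += 1
        # Each process independently picks an ID uniformly from 1..r.
        ids = [rng.randint(1, r) for _ in range(n)]
        top = max(ids)
        # Everyone sees all IDs, so everyone can tell whether the maximum
        # is held by exactly one process.
        if ids.count(top) == 1:
            return ids.index(top), rounds
        # Otherwise the maximum is duplicated: start over with fresh choices.
```

Passing a seeded `random.Random` makes a run reproducible, which helps when estimating the average number of rounds empirically.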

504
00:24:17,310 --> 00:24:20,440
Questions about the
leader election?

505
00:24:20,440 --> 00:24:23,910
So the story was, it's
impossible without something

506
00:24:23,910 --> 00:24:27,640
to help you distinguish
some processes.

507
00:24:27,640 --> 00:24:29,286
You can do it with
unique identifiers.

508
00:24:29,286 --> 00:24:30,410
You can do it with randomness.

509
00:24:36,680 --> 00:24:42,240
Second problem is called
maximal independent set.

510
00:24:42,240 --> 00:24:44,820
So you have a picture of
a maximal independent set

511
00:24:44,820 --> 00:24:47,020
in a graph here.

512
00:24:47,020 --> 00:24:49,790
Let's try this.

513
00:24:49,790 --> 00:24:51,120
Yeah cursor.

514
00:24:51,120 --> 00:24:53,670
So the maximal independent
set in the graph is here.

515
00:24:53,670 --> 00:24:57,300
But this is something I'll
come back to in a minute.

516
00:24:57,300 --> 00:25:00,010
This is actually a use of
the maximal independent set

517
00:25:00,010 --> 00:25:02,600
to model what happens
in a certain kind

518
00:25:02,600 --> 00:25:06,140
of biological system.

519
00:25:06,140 --> 00:25:07,660
What's a maximal
independent set?

520
00:25:07,660 --> 00:25:13,750
So you start with a general,
undirected graph network.

521
00:25:13,750 --> 00:25:18,280
And the problem is to choose
a subset of the nodes so that

522
00:25:18,280 --> 00:25:21,000
they form what we call
a maximal independent set.

523
00:25:21,000 --> 00:25:22,180
Let's break that down.

524
00:25:22,180 --> 00:25:23,430
What does this mean?

525
00:25:23,430 --> 00:25:26,810
Independent means you don't
have any two neighbors that

526
00:25:26,810 --> 00:25:30,310
are both in the set.

527
00:25:30,310 --> 00:25:32,960
So you don't want to get
two neighbors in the set.

528
00:25:32,960 --> 00:25:37,510
Maximal means that
whatever set you choose,

529
00:25:37,510 --> 00:25:42,480
you can't add any more nodes
without violating independence.

530
00:25:42,480 --> 00:25:44,010
So now this should
look something

531
00:25:44,010 --> 00:25:45,860
like a couple of
homework problems

532
00:25:45,860 --> 00:25:48,800
that you had from the
beginning and recently.

533
00:25:48,800 --> 00:25:52,180
But I'm not saying that it's
maximum independent set.

534
00:25:52,180 --> 00:25:54,420
I'm not saying you have to
have the global, largest

535
00:25:54,420 --> 00:25:55,970
number of nodes.

536
00:25:55,970 --> 00:25:58,960
I'm just saying it has
to be a local optimum,

537
00:25:58,960 --> 00:26:01,850
in the sense that you can't
add any more nodes to your set

538
00:26:01,850 --> 00:26:05,180
without violating the
independence property.

539
00:26:05,180 --> 00:26:06,820
Make sense?

540
00:26:06,820 --> 00:26:09,560
There's two examples,
the same graph,

541
00:26:09,560 --> 00:26:12,910
two different maximal
independent sets.

542
00:26:12,910 --> 00:26:18,350
The green nodes, here
we have four green nodes

543
00:26:18,350 --> 00:26:22,135
that are independent, not
neighbors of each other.

544
00:26:22,135 --> 00:26:23,760
And they're maximal,
in that I couldn't

545
00:26:23,760 --> 00:26:26,540
add any of the red
nodes into a set

546
00:26:26,540 --> 00:26:31,150
without violating the
independence property.

547
00:26:31,150 --> 00:26:34,080
But then over here, we have a
second maximal independent set

548
00:26:34,080 --> 00:26:35,850
for the same graph.

549
00:26:35,850 --> 00:26:39,160
Now we just have two nodes.

550
00:26:39,160 --> 00:26:41,760
And you can't add
any of the red nodes

551
00:26:41,760 --> 00:26:44,960
without violating the
independence property.

552
00:26:44,960 --> 00:26:48,550
In other words, every
node is either in the MIS,

553
00:26:48,550 --> 00:26:51,810
or has a neighbor in the MIS.

554
00:26:51,810 --> 00:26:56,620
There's nothing else you can
do to add nodes to the MIS.

555
00:26:56,620 --> 00:27:00,175
So the notion of maximal
independence, that make sense?
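
The two conditions can be written as a small checker. This is a sketch under an assumed representation: the graph is a dict mapping each node to a list of its neighbors, and the function names are hypothetical.

```python
def is_independent(adj, chosen):
    """No two neighbors are both in the chosen set."""
    return all(v not in chosen for u in chosen for v in adj[u])

def is_maximal_independent_set(adj, chosen):
    """Independent, and no further node can be added.

    Equivalently: every node is in the set or has a neighbor in the set.
    """
    chosen = set(chosen)
    if not is_independent(adj, chosen):
        return False
    return all(any(v in chosen for v in adj[u])
               for u in adj if u not in chosen)

# A star: center 0 with leaves 1, 2, 3 (an example graph chosen here,
# not the one on the slide).
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

On the star, both `{0}` and `{1, 2, 3}` pass the check, which shows that maximal independent sets of the same graph can have different sizes.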

556
00:27:04,120 --> 00:27:08,430
All right, so to make this
a distributed problem,

557
00:27:08,430 --> 00:27:11,490
let's start out assuming we
have no unique identifier.

558
00:27:11,490 --> 00:27:12,869
Actually, for this
whole problem,

559
00:27:12,869 --> 00:27:14,660
we're not going to have
unique identifiers.

560
00:27:14,660 --> 00:27:17,580
They're all going
to be identical.

561
00:27:17,580 --> 00:27:19,990
The processes do need
one piece of information,

562
00:27:19,990 --> 00:27:24,010
which is some approximation
to n, the size of the network,

563
00:27:24,010 --> 00:27:27,160
the total number of vertices.

564
00:27:27,160 --> 00:27:29,990
So we would like to
have these nodes somehow

565
00:27:29,990 --> 00:27:35,860
cooperate to compute an MIS
of the entire network graph.

566
00:27:35,860 --> 00:27:39,570
What that means is every process
should find out whether it

567
00:27:39,570 --> 00:27:41,380
is in the MIS or not.

568
00:27:41,380 --> 00:27:43,780
If it is, it should output in.

569
00:27:43,780 --> 00:27:46,060
And if it's not,
it'll just output out.

570
00:27:49,110 --> 00:27:51,570
So you don't have to
actually compute this,

571
00:27:51,570 --> 00:27:53,150
like you're used
to solving problems

572
00:27:53,150 --> 00:27:55,950
like this, where somebody
has to gather all

573
00:27:55,950 --> 00:27:57,990
the information in one place.

574
00:27:57,990 --> 00:27:59,280
Nobody gathers anything.

575
00:27:59,280 --> 00:28:01,360
Everybody just has to
know whether or not

576
00:28:01,360 --> 00:28:02,228
they're in the MIS.

577
00:28:05,880 --> 00:28:07,760
So as you can
imagine, this is going

578
00:28:07,760 --> 00:28:10,000
to be unsolvable
in certain graphs

579
00:28:10,000 --> 00:28:14,870
by deterministic algorithms,
by the same kind of symmetry

580
00:28:14,870 --> 00:28:19,810
breaking problems that you
saw for leader election.

581
00:28:19,810 --> 00:28:22,320
So we're going to move right
to randomized algorithms

582
00:28:22,320 --> 00:28:25,400
for this problem.

583
00:28:25,400 --> 00:28:28,180
Some applications
of distributed MIS,

584
00:28:28,180 --> 00:28:30,230
well they come up in
communication networks,

585
00:28:30,230 --> 00:28:32,860
where you want to choose--
let's say you have a very

586
00:28:32,860 --> 00:28:35,040
dense network of processes.

587
00:28:35,040 --> 00:28:37,830
You want to choose just
a few nodes, which would

588
00:28:37,830 --> 00:28:39,770
be like an overlay network.

589
00:28:39,770 --> 00:28:41,850
You would choose some
nodes who could take charge

590
00:28:41,850 --> 00:28:44,710
of communication that you can
communicate on this overlay

591
00:28:44,710 --> 00:28:46,910
network, and then in
the end, each node

592
00:28:46,910 --> 00:28:50,960
can take care of communicating
with its many neighbors.

593
00:28:50,960 --> 00:28:54,250
So that's a common
sort of application.

594
00:28:54,250 --> 00:28:56,670
But it also comes
up in other places.

595
00:28:56,670 --> 00:28:59,300
A great example is in
developmental biology, where

596
00:28:59,300 --> 00:29:04,170
a couple of years ago, there
was a paper in Science by Afek,

597
00:29:04,170 --> 00:29:05,980
Alon-- there's like
eight authors on that.

598
00:29:05,980 --> 00:29:11,380
But Ziv Bar-Joseph was the
lead author of this paper.

599
00:29:11,380 --> 00:29:15,730
So the idea is you have a
bunch of cells in a fruit fly.

600
00:29:15,730 --> 00:29:18,830
And during development,
some of those cells

601
00:29:18,830 --> 00:29:21,300
are supposed to
distinguish themselves

602
00:29:21,300 --> 00:29:24,730
as being what's called
sensory organ precursor cells.

603
00:29:24,730 --> 00:29:28,880
The property that you
want is that actually, you

604
00:29:28,880 --> 00:29:31,940
would like a maximal independent
set of the cells to become

605
00:29:31,940 --> 00:29:34,370
distinguished in this way.

606
00:29:34,370 --> 00:29:36,800
So they wrote a paper about
it, got published in Science.

607
00:29:36,800 --> 00:29:39,790
They basically designed a
new distributed algorithm

608
00:29:39,790 --> 00:29:43,709
that closely mirrored what
happened in the fruit fly

609
00:29:43,709 --> 00:29:44,500
during development.

610
00:29:48,420 --> 00:29:52,240
So what I'm going to show you
is a very well-known algorithm,

611
00:29:52,240 --> 00:29:55,780
a classical algorithm for MIS.

612
00:29:55,780 --> 00:29:58,690
This is by Michael Luby.

613
00:29:58,690 --> 00:30:02,070
Very simple algorithm,
it executes in phases.

614
00:30:02,070 --> 00:30:05,430
Each phase has two rounds.

615
00:30:05,430 --> 00:30:07,690
So you start out with all
the nodes being active.

616
00:30:07,690 --> 00:30:08,810
They're all involved.

617
00:30:08,810 --> 00:30:12,060
They don't know what they're
going to end up with.

618
00:30:12,060 --> 00:30:15,410
And at each phase, some
of the active nodes

619
00:30:15,410 --> 00:30:18,580
are going to decide
they're in the MIS.

620
00:30:18,580 --> 00:30:21,970
Some others will decide
they're out of the MIS.

621
00:30:21,970 --> 00:30:24,400
And some others won't know yet.

622
00:30:24,400 --> 00:30:27,230
So then you just continue
to the next phase,

623
00:30:27,230 --> 00:30:30,880
with all the remaining nodes
and the edges between them.

624
00:30:30,880 --> 00:30:32,670
So you're basically
going to settle

625
00:30:32,670 --> 00:30:35,150
what happens with some
subset of the nodes,

626
00:30:35,150 --> 00:30:36,945
and then reduce the
graph and continue.

627
00:30:39,870 --> 00:30:40,870
So that's the algorithm.

628
00:30:40,870 --> 00:30:43,000
So what do you do in each phase?

629
00:30:43,000 --> 00:30:46,115
Here's what an active
node does at a phase.

630
00:30:46,115 --> 00:30:47,810
Two rounds.

631
00:30:47,810 --> 00:30:50,930
The first round, it
picks a random value

632
00:30:50,930 --> 00:30:54,640
in a large space, the same
kind of idea as before.

633
00:30:54,640 --> 00:30:57,680
This time it's 1 up
to n to the fifth.

634
00:30:57,680 --> 00:31:01,790
It sends that random value
to all its neighbors,

635
00:31:01,790 --> 00:31:06,360
receives the values from all
its still active neighbors,

636
00:31:06,360 --> 00:31:11,310
and then it just looks to see
if its value is greater than all

637
00:31:11,310 --> 00:31:13,190
the values it received.

638
00:31:13,190 --> 00:31:14,450
So then it's a local maximum.

639
00:31:14,450 --> 00:31:16,830
It has chosen a value
that's strictly greater

640
00:31:16,830 --> 00:31:19,640
than the values chosen
by all its neighbors.

641
00:31:19,640 --> 00:31:24,040
So then it decides to join
the MIS and it outputs in.

642
00:31:24,040 --> 00:31:26,372
But now you want to make
sure none of its neighbors--

643
00:31:26,372 --> 00:31:27,830
you know that none
of its neighbors

644
00:31:27,830 --> 00:31:31,200
are going to join
the MIS at round one.

645
00:31:31,200 --> 00:31:34,040
Because you know this
guy's chosen value

646
00:31:34,040 --> 00:31:36,930
is larger, strictly larger,
than all its neighbors.

647
00:31:36,930 --> 00:31:39,450
But now you want to tell them
that they should not join.

648
00:31:39,450 --> 00:31:40,650
They should be out.

649
00:31:40,650 --> 00:31:49,080
So if you join the
MIS you're going

650
00:31:49,080 --> 00:31:54,740
to announce that by sending
messages to all your neighbors.

651
00:31:54,740 --> 00:32:02,510
And then anybody who
receives an announcement can

652
00:32:02,510 --> 00:32:05,470
decide it's not going to be
in the MIS and it outputs out.

653
00:32:05,470 --> 00:32:10,260
Because it knows it has a
neighbor that's in the MIS.

654
00:32:10,260 --> 00:32:15,050
So if you decided in or out
at this phase, you're done.

655
00:32:15,050 --> 00:32:16,420
You become inactive.

656
00:32:16,420 --> 00:32:18,190
And only the
remaining active guys

657
00:32:18,190 --> 00:32:20,570
continue to the next phase.
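
A centralized simulation of the phases just described might look like the following. It is a sketch, not the distributed protocol itself: the two rounds of a phase are collapsed into set operations, the graph is assumed to be a dict from node to neighbor list, and the name `luby_mis` is chosen for illustration.

```python
import random

def luby_mis(adj, rng=None):
    """Simulate the phase-based randomized MIS algorithm on graph adj.

    Returns (in_mis, phases_used).
    """
    rng = rng or random.Random()
    space = len(adj) ** 5          # values are drawn from 1 .. n^5
    active = set(adj)
    in_mis = set()
    phases = 0
    while active:
        phases += 1
        # Round 1: every active node draws a fresh value and compares it
        # with the values of its still-active neighbors.
        value = {u: rng.randint(1, space) for u in active}
        joiners = {u for u in active
                   if all(value[u] > value[v] for v in adj[u] if v in active)}
        # Round 2: joiners announce themselves; any active neighbor that
        # hears an announcement outputs "out".
        losers = {v for u in joiners for v in adj[u] if v in active}
        in_mis |= joiners
        # Decided nodes become inactive; the rest go on to the next phase.
        active -= joiners | losers
    return in_mis, phases
```

Because joining requires a strictly greater value, two neighbors can never join in the same phase, so the result stays independent even when a phase happens to produce ties.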

658
00:32:20,570 --> 00:32:21,200
Make sense?

659
00:32:24,240 --> 00:32:26,220
Any questions about how
the algorithm works?

660
00:32:32,480 --> 00:32:34,020
An animation.

661
00:32:34,020 --> 00:32:37,770
All right so all the
nodes start out identical.

662
00:32:37,770 --> 00:32:39,347
They all pick IDs.

663
00:32:39,347 --> 00:32:40,930
So here's some numbers
that they pick.

664
00:32:40,930 --> 00:32:45,410
So which nodes are going
to now join the MIS?

665
00:32:45,410 --> 00:32:50,480
16, and the one that chose 13.

666
00:32:50,480 --> 00:32:52,230
Good, so they're in the MIS.

667
00:32:52,230 --> 00:32:55,070
And then at the same phase,
all of their neighbors,

668
00:32:55,070 --> 00:33:02,750
those four red nodes, are going
to decide to be out of the MIS

669
00:33:02,750 --> 00:33:04,840
And now you're left with
the remaining four nodes.

670
00:33:04,840 --> 00:33:07,290
We don't keep going
with the same IDs.

671
00:33:07,290 --> 00:33:08,140
We start over.

672
00:33:08,140 --> 00:33:10,840
We want the rounds
to be independent.

673
00:33:10,840 --> 00:33:14,610
So if they choose
again, they get new IDs.

674
00:33:14,610 --> 00:33:19,200
And now the guy with the
12 and the guy with the 18

675
00:33:19,200 --> 00:33:22,180
are going to join the
MIS at this phase.

676
00:33:22,180 --> 00:33:27,280
And their neighbors will
decide not to be in the MIS.

677
00:33:27,280 --> 00:33:30,800
That leaves us with just one
node, the guy who had four.

678
00:33:30,800 --> 00:33:33,240
Next phase, he
chooses another ID.

679
00:33:33,240 --> 00:33:36,652
But he has no neighbors
so by default,

680
00:33:36,652 --> 00:33:38,110
he's bigger than
all the neighbors.

681
00:33:38,110 --> 00:33:39,340
So he just joins the MIS.

682
00:33:42,719 --> 00:33:43,760
So that's how this works.

683
00:33:43,760 --> 00:33:45,600
Very simple algorithm,
and it actually

684
00:33:45,600 --> 00:33:48,240
works to find an
MIS very quickly.

685
00:33:53,380 --> 00:33:57,150
Why does this give
you independence?

686
00:33:57,150 --> 00:34:00,310
How do we know that if
this ever terminates,

687
00:34:00,310 --> 00:34:02,960
if everybody decides, how do
we know that we don't ever

688
00:34:02,960 --> 00:34:08,440
have two neighbors that
decided to be in the MIS?

689
00:34:08,440 --> 00:34:09,600
Yeah.

690
00:34:09,600 --> 00:34:11,580
AUDIENCE: Because once
a node joins the MIS,

691
00:34:11,580 --> 00:34:14,550
it broadcasts to
its neighbors that--

692
00:34:14,550 --> 00:34:16,630
NANCY LYNCH: Right.

693
00:34:16,630 --> 00:34:18,750
The only way you join
the MIS is if you

694
00:34:18,750 --> 00:34:21,750
have the unique maximum
value in your neighborhood.

695
00:34:21,750 --> 00:34:25,469
And when you do, all your
neighbors become inactive.

696
00:34:25,469 --> 00:34:29,020
So you're certainly going
to have independence.

697
00:34:29,020 --> 00:34:33,199
Maximality, if it
terminates, the final set

698
00:34:33,199 --> 00:34:37,159
is not going to allow you
to add any more nodes.

699
00:34:37,159 --> 00:34:37,659
Why?

700
00:34:37,659 --> 00:34:40,290
Because a node is only
going to become inactive

701
00:34:40,290 --> 00:34:45,460
if it joins the MIS, or
a neighbor joins the MIS.

702
00:34:45,460 --> 00:34:47,159
And we just continue
this algorithm

703
00:34:47,159 --> 00:34:51,080
until all the nodes
become inactive.

704
00:34:51,080 --> 00:34:55,010
So either the node is in the
MIS or a neighbor is in the MIS.

705
00:34:55,010 --> 00:34:58,170
So you can't possibly
add any more.

706
00:34:58,170 --> 00:35:00,350
Yes?

707
00:35:00,350 --> 00:35:01,970
So this has the
basic correctness

708
00:35:01,970 --> 00:35:04,590
properties, but what
you're probably wondering,

709
00:35:04,590 --> 00:35:07,940
is why is this efficient enough?

710
00:35:07,940 --> 00:35:10,120
Why is it efficient?

711
00:35:10,120 --> 00:35:13,850
Well we could say that with high
probability, of probability 1,

712
00:35:13,850 --> 00:35:15,065
it will eventually terminate.

713
00:35:17,590 --> 00:35:25,020
More quantitative, we can
state this theorem that says,

714
00:35:25,020 --> 00:35:28,770
with probability at
least 1 minus 1 over n,

715
00:35:28,770 --> 00:35:33,490
all the nodes decide
within four log n phases.

716
00:35:33,490 --> 00:35:35,540
Since n is the
number of nodes, this

717
00:35:35,540 --> 00:35:38,670
doesn't tell us that you get
probability 1 of eventually

718
00:35:38,670 --> 00:35:39,290
terminating.

719
00:35:39,290 --> 00:35:42,310
But we can repeat this
and get the same sort

720
00:35:42,310 --> 00:35:47,520
of bound repeatedly
for successive phases.

721
00:35:47,520 --> 00:35:50,860
But let's just focus on getting
probability at least 1 minus 1

722
00:35:50,860 --> 00:35:57,245
over n that all nodes decide
within about four log n phases.

723
00:36:00,270 --> 00:36:01,680
So let's see what
this is saying.

724
00:36:01,680 --> 00:36:04,580
You have this big
complicated graph.

725
00:36:04,580 --> 00:36:08,920
And in one round, for this to
be like log n behavior, what

726
00:36:08,920 --> 00:36:10,885
has to happen at each phase?

727
00:36:13,740 --> 00:36:16,120
You have to reduce it by
some constant fraction.

728
00:36:16,120 --> 00:36:18,650
The number of nodes,
say, should go down.

729
00:36:18,650 --> 00:36:23,220
So it's sort of how
the proof will go.

730
00:36:23,220 --> 00:36:25,310
So we start out
with a Lemma saying,

731
00:36:25,310 --> 00:36:27,660
you're choosing
these IDs at random.

732
00:36:27,660 --> 00:36:30,584
You want a high probability
that they're all different.

733
00:36:30,584 --> 00:36:32,500
So we have a lemma like
the one we had before.

734
00:36:32,500 --> 00:36:35,920
It says, with probability
at least 1 minus 1

735
00:36:35,920 --> 00:36:38,530
over n squared, in each phase.

736
00:36:38,530 --> 00:36:41,310
All these phases
up to four log n,

737
00:36:41,310 --> 00:36:44,650
everybody's choosing a
different random value.

738
00:36:44,650 --> 00:36:48,880
All the nodes choose different
values at each phase.

739
00:36:48,880 --> 00:36:51,810
So this lets us
ignore the possibility

740
00:36:51,810 --> 00:36:54,000
that you have repeats.

741
00:36:54,000 --> 00:36:55,610
So we'll come back
to that at the end.

742
00:36:58,147 --> 00:37:00,480
All right, so we're going to
pretend that in each phase,

743
00:37:00,480 --> 00:37:02,340
all the random
numbers are different.

744
00:37:04,960 --> 00:37:09,070
So the key idea of this is
to show that the graph has

745
00:37:09,070 --> 00:37:13,019
to shrink enough at each phase.

746
00:37:13,019 --> 00:37:14,560
So the way we're
going to say that is

747
00:37:14,560 --> 00:37:17,500
not in terms of the
nodes, but in terms

748
00:37:17,500 --> 00:37:18,740
of the number of edges.

749
00:37:18,740 --> 00:37:22,670
We're going to say at
each phase, the expected

750
00:37:22,670 --> 00:37:26,140
number of edges that are
live-- why is that shaking?

751
00:37:31,680 --> 00:37:32,240
OK.

752
00:37:32,240 --> 00:37:33,760
The expected number
of edges that

753
00:37:33,760 --> 00:37:38,300
are live at the end of the
phase is at most half the number

754
00:37:38,300 --> 00:37:41,570
that were live at the
beginning of the phase.

755
00:37:41,570 --> 00:37:45,410
So an edge is live, if its
endpoints are still live.

756
00:37:45,410 --> 00:37:48,795
So instead of talking about
reducing the number of nodes

757
00:37:48,795 --> 00:37:50,170
by a constant
fraction, I'm going

758
00:37:50,170 --> 00:37:52,550
to reduce the number
of remaining edges

759
00:37:52,550 --> 00:37:56,710
by a constant fraction
at each phase.

760
00:37:56,710 --> 00:37:58,860
So this is what
I'm going to prove.

761
00:37:58,860 --> 00:38:02,260
So now I've got only three
slides, but the only three

762
00:38:02,260 --> 00:38:04,690
slides today that have
calculations on them.

763
00:38:04,690 --> 00:38:07,300
So probably have
to pay attention,

764
00:38:07,300 --> 00:38:09,320
if you want to follow
the calculations online.

765
00:38:09,320 --> 00:38:11,390
So let's see why.

766
00:38:11,390 --> 00:38:12,560
But the goal is clear?

767
00:38:12,560 --> 00:38:14,570
We have to reduce
the number of edges

768
00:38:14,570 --> 00:38:16,750
that remain by a factor of two.

769
00:38:19,270 --> 00:38:23,470
So this is actually a new
proof of this algorithm's

770
00:38:23,470 --> 00:38:24,180
performance.

771
00:38:24,180 --> 00:38:28,150
The proof in the original
papers is pretty complicated.

772
00:38:28,150 --> 00:38:32,770
This is a very
intuitive, neat proof.

773
00:38:32,770 --> 00:38:35,020
So the first line
of the proof says

774
00:38:35,020 --> 00:38:38,820
if you have a node that
has a neighbor that

775
00:38:38,820 --> 00:38:43,200
chooses a value that's bigger
than all of its own neighbors--

776
00:38:43,200 --> 00:38:45,006
so u has a neighbor w.

777
00:38:45,006 --> 00:38:48,760
W chooses a value that's
bigger than all w's neighbors.

778
00:38:48,760 --> 00:38:49,640
But let's say more.

779
00:38:49,640 --> 00:38:53,220
Let's say it's also bigger than
all of u's other neighbors,

780
00:38:53,220 --> 00:38:55,670
besides w.

781
00:38:55,670 --> 00:38:59,270
So w is really big, bigger
than all w's neighbors,

782
00:38:59,270 --> 00:39:02,960
bigger than all of
u's other neighbors.

783
00:39:02,960 --> 00:39:08,240
If that happens, then
what happens to u?

784
00:39:08,240 --> 00:39:12,160
Well we know that w is going
to decide to join the MIS.

785
00:39:12,160 --> 00:39:16,260
And u is going to
definitely die,

786
00:39:16,260 --> 00:39:18,230
is not going to join the MIS.

787
00:39:18,230 --> 00:39:20,560
Right?

788
00:39:20,560 --> 00:39:21,060
OK?

789
00:39:21,060 --> 00:39:23,920
I don't want to lose
people in the first line.

790
00:39:23,920 --> 00:39:26,310
Question?

791
00:39:26,310 --> 00:39:27,890
Here's a picture.

792
00:39:27,890 --> 00:39:28,490
Here's u.

793
00:39:31,870 --> 00:39:34,800
And it has a neighbor w.

794
00:39:34,800 --> 00:39:40,780
And let's say that w's chosen
value is greater than all

795
00:39:40,780 --> 00:39:43,090
of w's neighbors, but
also greater than all

796
00:39:43,090 --> 00:39:44,785
of u's other neighbors.

797
00:39:47,510 --> 00:39:49,650
Yes?

798
00:39:49,650 --> 00:39:53,160
If w has that, w is
going to join the MIS,

799
00:39:53,160 --> 00:39:56,480
and u is going to
definitely not join the MIS.

800
00:39:56,480 --> 00:40:00,470
It's going to decide
out in this phase.

801
00:40:00,470 --> 00:40:02,076
OK so far?

802
00:40:02,076 --> 00:40:05,830
AUDIENCE: Why do you need
w to have value greater

803
00:40:05,830 --> 00:40:07,750
than u's neighbors?

804
00:40:07,750 --> 00:40:10,630
Because if w is greater than
all of its neighbors then it's--

805
00:40:10,630 --> 00:40:13,630
NANCY LYNCH: --be in the MIS
and u will not be in the MIS.

806
00:40:13,630 --> 00:40:15,900
And that seems like
it ought to be enough.

807
00:40:15,900 --> 00:40:19,240
But look at the next line.

808
00:40:19,240 --> 00:40:21,510
Well the line after this one.

809
00:40:21,510 --> 00:40:24,960
What's the probability that
w chooses a value like that?

810
00:40:28,080 --> 00:40:31,570
So if it's going to be bigger
than all u's neighbors, and all

811
00:40:31,570 --> 00:40:33,450
of w's neighbors,
and keeping in mind

812
00:40:33,450 --> 00:40:35,320
that they are each
other's neighbors,

813
00:40:35,320 --> 00:40:37,920
turns out that
there are

814
00:40:37,920 --> 00:40:43,410
at most degree u plus degree
w nodes involved here.

815
00:40:43,410 --> 00:40:45,790
W has to have the
biggest of all of those,

816
00:40:45,790 --> 00:40:48,810
so it's going to
have the probability

817
00:40:48,810 --> 00:40:53,060
1 over the number of nodes
of being the biggest one.

818
00:40:53,060 --> 00:40:55,540
So it's just 1 over
the degree of u

819
00:40:55,540 --> 00:40:59,180
plus the degree of
w, the probability

820
00:40:59,180 --> 00:41:01,320
that w will choose
a big enough value.

821
00:41:06,580 --> 00:41:09,400
But you ask, this
is pessimistic.

822
00:41:09,400 --> 00:41:13,950
Why don't I just say that w
is bigger than its own neighbors?

823
00:41:13,950 --> 00:41:15,490
Because I want to
do this next step.

824
00:41:15,490 --> 00:41:19,180
I want to say the probability
that node u gets killed

825
00:41:19,180 --> 00:41:24,730
by one of its neighbors, any one
of its neighbors in this phase.

826
00:41:24,730 --> 00:41:26,640
I can calculate that as the sum.

827
00:41:29,540 --> 00:41:32,220
The probability that node
u is killed by a neighbor

828
00:41:32,220 --> 00:41:35,520
is at least the sum over
all of its neighbors.

829
00:41:35,520 --> 00:41:40,240
You look at all the vertices
in the neighbor set,

830
00:41:40,240 --> 00:41:43,990
and you add up this fraction.

831
00:41:43,990 --> 00:41:49,090
So why did I need to make that
additional assumption before?

832
00:41:49,090 --> 00:41:51,850
That w is greater than
all of u's neighbors,

833
00:41:51,850 --> 00:41:54,390
as well as all of
its own neighbors.

834
00:41:54,390 --> 00:41:55,078
Yeah?

835
00:41:55,078 --> 00:41:56,910
AUDIENCE: So you can
add the probabilities--

836
00:41:56,910 --> 00:41:58,618
NANCY LYNCH: Yeah
because otherwise these

837
00:41:58,618 --> 00:42:00,840
would be overlapping events.

838
00:42:00,840 --> 00:42:03,260
But this way I know they're
definitely disjoint events.

839
00:42:03,260 --> 00:42:06,991
We can't have-- if we
have w and w prime,

840
00:42:06,991 --> 00:42:08,490
you can't have both
of those holding

841
00:42:08,490 --> 00:42:12,590
because the requirement for
w is saying that its ID is

842
00:42:12,590 --> 00:42:16,540
bigger than w prime's ID.

843
00:42:16,540 --> 00:42:18,790
Because you have
these disjoint events,

844
00:42:18,790 --> 00:42:21,140
you can just add
the probabilities.

845
00:42:21,140 --> 00:42:22,520
And you know that
the probability

846
00:42:22,520 --> 00:42:25,460
that u gets killed
by some neighbor

847
00:42:25,460 --> 00:42:28,130
is at least this summation.

848
00:42:28,130 --> 00:42:29,491
OK so far?

849
00:42:29,491 --> 00:42:30,740
So now I'm going to calculate.

850
00:42:33,260 --> 00:42:34,870
But I wanted to
focus on the edges.

851
00:42:34,870 --> 00:42:38,560
So let's see, this tells us a
way that a node can get killed.

852
00:42:38,560 --> 00:42:42,990
But let's look at what happens
for an edge getting killed.

853
00:42:42,990 --> 00:42:48,230
This is the probability
that a node is killed.

854
00:42:48,230 --> 00:42:53,070
So the probability that
an edge dies at this phase

855
00:42:53,070 --> 00:42:56,180
is at least the maximum
of the probability

856
00:42:56,180 --> 00:43:02,766
that either of its
two endpoints die.

857
00:43:02,766 --> 00:43:04,390
And let's just write
it as the average.

858
00:43:04,390 --> 00:43:06,790
The probability
that an edge dies

859
00:43:06,790 --> 00:43:09,050
is at least the average
of the probability

860
00:43:09,050 --> 00:43:11,760
that its two endpoints
are killed, in this way.

861
00:43:15,200 --> 00:43:18,010
So for an edge, an edge is
definitely going to die,

862
00:43:18,010 --> 00:43:20,380
if one of its endpoints dies.

863
00:43:20,380 --> 00:43:23,130
And then the edge dies if it
dies in this particular way.

864
00:43:26,110 --> 00:43:29,050
So the probability an edge dies
is at least the probability

865
00:43:29,050 --> 00:43:32,390
that one of the-- half the
sum of the probabilities

866
00:43:32,390 --> 00:43:36,130
that the two endpoints die.

867
00:43:36,130 --> 00:43:38,490
It's the average probability.

868
00:43:38,490 --> 00:43:39,950
Makes sense?

869
00:43:39,950 --> 00:43:42,740
You might have to
read this later.

870
00:43:42,740 --> 00:43:45,770
So now we can go from that to
the expected number of edges

871
00:43:45,770 --> 00:43:47,809
that die.

872
00:43:47,809 --> 00:43:48,350
What is that?

873
00:43:48,350 --> 00:43:51,110
You just add up, over all
the edges, the probability

874
00:43:51,110 --> 00:43:53,250
that the edge dies.

875
00:43:53,250 --> 00:43:55,710
The expected number
of edges that die

876
00:43:55,710 --> 00:44:00,910
is at least the sum over all
of the edges of the probability

877
00:44:00,910 --> 00:44:03,070
that the two endpoints die.

878
00:44:09,040 --> 00:44:12,170
So you have the sum,
over all of the edges.

879
00:44:12,170 --> 00:44:13,570
You add up for all the edges.

880
00:44:13,570 --> 00:44:16,300
The probability that
one endpoint is killed,

881
00:44:16,300 --> 00:44:20,360
and the probability the
other endpoint is killed.

882
00:44:20,360 --> 00:44:22,240
So what we have is this
great big summation

883
00:44:22,240 --> 00:44:26,810
involving now the kill
probabilities for vertices.

884
00:44:26,810 --> 00:44:29,050
So we have the kill
probability for each vertex.

885
00:44:29,050 --> 00:44:32,360
How many times does that occur?

886
00:44:32,360 --> 00:44:38,100
If you have a vertex, u, it
appears once for every edge

887
00:44:38,100 --> 00:44:39,912
that u is an endpoint of.

888
00:44:43,200 --> 00:44:47,610
So you have the kill probability
for each node occurring exactly

889
00:44:47,610 --> 00:44:50,500
its degree number of times.

890
00:44:50,500 --> 00:44:53,580
So that lets me rewrite
this in terms of vertices.

891
00:44:53,580 --> 00:44:58,420
This sum is just 1/2 the
sum over all the nodes

892
00:44:58,420 --> 00:45:02,580
of the probability that the node
gets killed times its degree.

893
00:45:05,200 --> 00:45:09,370
So I'm calculating by
replacing the description

894
00:45:09,370 --> 00:45:11,100
in terms of edges,
by a description

895
00:45:11,100 --> 00:45:13,150
in terms of vertices.

896
00:45:13,150 --> 00:45:16,780
More or less OK so far?

897
00:45:16,780 --> 00:45:18,000
So now what do I do?

898
00:45:18,000 --> 00:45:21,040
Well, I know the probability
that u is killed.

899
00:45:21,040 --> 00:45:23,900
I have a bound for that
up on the first line.

900
00:45:23,900 --> 00:45:26,960
So I'm just going
to plug that in.

901
00:45:26,960 --> 00:45:29,460
So I get 1/2 the sum
over all the nodes,

902
00:45:29,460 --> 00:45:36,320
the degree of the node times
this summation that gives me

903
00:45:36,320 --> 00:45:39,092
the kill probability
for that node.

904
00:45:39,092 --> 00:45:40,550
And now I play
around with the sum.

905
00:45:40,550 --> 00:45:45,170
I can move the degree
inside the second summation,

906
00:45:45,170 --> 00:45:48,360
and I get this.

907
00:45:48,360 --> 00:45:49,750
So now let's stare
at this again.

908
00:45:49,750 --> 00:45:54,760
I have the sum over all
nodes of the sum over all

909
00:45:54,760 --> 00:45:58,400
of its neighbors
of some expression.

910
00:45:58,400 --> 00:46:01,702
But if I'm considering a
node, every node and every one

911
00:46:01,702 --> 00:46:03,410
of its neighbors,
that's like considering

912
00:46:03,410 --> 00:46:05,610
all the directed edges.

913
00:46:05,610 --> 00:46:08,760
I look at every u, and I
look at every edge that

914
00:46:08,760 --> 00:46:11,600
connects u to something else.

915
00:46:11,600 --> 00:46:14,890
So I can write it as the sum
over all the directed edges

916
00:46:14,890 --> 00:46:18,080
of this expression.

917
00:46:18,080 --> 00:46:20,160
So I get half of
the sum over all the

918
00:46:20,160 --> 00:46:23,454
directed edges of
this expression.

919
00:46:23,454 --> 00:46:25,245
But we were talking
about undirected edges.

920
00:46:28,380 --> 00:46:32,229
And the undirected edges
are being counted twice here, once

921
00:46:32,229 --> 00:46:33,020
for each direction.

922
00:46:35,950 --> 00:46:39,690
I can change this sum to a
sum over undirected edges.

923
00:46:39,690 --> 00:46:42,270
But now I have the two
endpoints to deal with.

924
00:46:42,270 --> 00:46:48,390
So I get the degree of u and
the degree of v in the numerator

925
00:46:48,390 --> 00:46:50,390
because I'm looking at
it from the point of view

926
00:46:50,390 --> 00:46:53,810
both of the endpoints
of each edge.

927
00:46:53,810 --> 00:46:55,780
Well something
drops out, so I have

928
00:46:55,780 --> 00:47:00,860
1/2 the sum over all the
undirected edges of 1.

929
00:47:00,860 --> 00:47:03,645
So that's 1/2 of the
number of undirected edges.
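
[Editor's sketch, not from the slides: the chain of steps just described, written out in one place. The notation is assumed: d(u) is the degree of u, Γ(u) is its neighbor set, E is the set of undirected edges, and the per-neighbor bound 1/(d(u)+d(w)) is the fraction referred to on the first line.]

```latex
\begin{align*}
\mathbb{E}[\#\text{edges that die}]
  &\ge \sum_{\{u,v\} \in E} \tfrac{1}{2}\bigl(\Pr[u \text{ killed}] + \Pr[v \text{ killed}]\bigr) \\
  &= \tfrac{1}{2} \sum_{u \in V} d(u)\,\Pr[u \text{ killed}]
     && \text{(each vertex appears once per incident edge)} \\
  &\ge \tfrac{1}{2} \sum_{u \in V} d(u) \sum_{w \in \Gamma(u)} \frac{1}{d(u) + d(w)} \\
  &= \tfrac{1}{2} \sum_{u \in V} \sum_{w \in \Gamma(u)} \frac{d(u)}{d(u) + d(w)} \\
  &= \tfrac{1}{2} \sum_{(u,w)\ \text{directed}} \frac{d(u)}{d(u) + d(w)} \\
  &= \tfrac{1}{2} \sum_{\{u,w\} \in E} \frac{d(u) + d(w)}{d(u) + d(w)}
   = \frac{|E|}{2}.
\end{align*}
```

[The last step pairs the two directions of each undirected edge, so the numerators d(u) and d(w) add up to the common denominator.]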

930
00:47:06,550 --> 00:47:08,800
So I don't expect you to
get every step of this,

931
00:47:08,800 --> 00:47:10,850
but it's on three
slides, so you can

932
00:47:10,850 --> 00:47:14,250
stare at this when you go home
and make sure the steps work.

933
00:47:14,250 --> 00:47:16,070
But remember the
point of this is

934
00:47:16,070 --> 00:47:18,090
to show that you reduce
the number of edges

935
00:47:18,090 --> 00:47:21,640
by a factor of two, and it's
done in sort of a clever way

936
00:47:21,640 --> 00:47:24,185
by counting the kill
probabilities of vertices.

937
00:47:30,700 --> 00:47:34,020
So we get this, reducing
the number of edges.

938
00:47:34,020 --> 00:47:35,910
And now we can just
plug that back in

939
00:47:35,910 --> 00:47:41,072
to get our complexity bound
for the entire algorithm.

940
00:47:41,072 --> 00:47:43,280
Remember the original theorem
we were trying to prove

941
00:47:43,280 --> 00:47:47,740
is a probability bound for
deciding within log n phases.

942
00:47:47,740 --> 00:47:49,420
Well you should have
a pretty good idea

943
00:47:49,420 --> 00:47:51,370
of why that works
because if at each phase,

944
00:47:51,370 --> 00:47:53,120
you're going to reduce
the number of edges

945
00:47:53,120 --> 00:47:55,800
by around a factor
of two, then it's

946
00:47:55,800 --> 00:48:00,530
going to take something
like log n phases to finish.

947
00:48:00,530 --> 00:48:02,120
And I just put a proof sketch.

948
00:48:04,940 --> 00:48:07,710
The number of edges that are
still alive after four log n

949
00:48:07,710 --> 00:48:11,010
phases, well you divide
by 2 four log n times,

950
00:48:11,010 --> 00:48:13,310
so you get down to
practically nothing.

951
00:48:13,310 --> 00:48:19,690
The probability any edges are
alive at the end is very small.

952
00:48:19,690 --> 00:48:23,910
So you get a small probability
the algorithm doesn't terminate

953
00:48:23,910 --> 00:48:26,700
within four log n phases.

954
00:48:26,700 --> 00:48:29,569
There's an extra little
term I threw in here.

955
00:48:29,569 --> 00:48:30,610
You might have forgotten.

956
00:48:30,610 --> 00:48:33,030
There was a term that I needed
for the small probability,

957
00:48:33,030 --> 00:48:36,090
that somebody chose
duplicate IDs.

958
00:48:36,090 --> 00:48:37,760
So I'm bringing them
back in at the end,

959
00:48:37,760 --> 00:48:40,430
in a little union bound.

960
00:48:40,430 --> 00:48:43,857
And we get our 1 over
n probability this way.
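
[Editor's sketch of the proof sketch, under stated assumptions: X_k denotes the number of live edges after k phases, logarithms are base 2, the graph is simple so |E| is at most n^2/2, and the halving lemma is used in its conditional form. The size of the duplicate-ID term is not reconstructed here.]

```latex
\begin{align*}
\mathbb{E}[X_{k+1} \mid X_k] &\le \tfrac{1}{2} X_k
  \quad\Longrightarrow\quad
  \mathbb{E}[X_k] \le \frac{|E|}{2^k} \le \frac{n^2/2}{2^k}, \\
k = 4\log_2 n:\qquad
  \mathbb{E}[X_k] &\le \frac{n^2/2}{n^4} = \frac{1}{2n^2}, \\
\Pr[X_k \ge 1] &\le \mathbb{E}[X_k] \le \frac{1}{2n^2}
  \qquad \text{(Markov's inequality)}.
\end{align*}
```

[A union bound with the small probability of duplicate IDs then gives the overall failure probability quoted in the theorem.]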

961
00:48:43,857 --> 00:48:45,940
But the key idea is you
reduce the number of edges

962
00:48:45,940 --> 00:48:49,150
by half at each stage.

963
00:48:49,150 --> 00:48:52,790
Enough for you to look at later,
I guess to figure this out

964
00:48:52,790 --> 00:48:55,670
or you have any
questions about this?

965
00:48:55,670 --> 00:49:00,190
So that's the last of the
equations and calculations.

966
00:49:00,190 --> 00:49:06,470
I'm going to go onto a new
idea, more conceptual stuff.

967
00:49:06,470 --> 00:49:09,800
Familiar problem,
breadth-first spanning trees,

968
00:49:09,800 --> 00:49:14,070
setting up breadth-first
paths to every node,

969
00:49:14,070 --> 00:49:19,130
but we're going to study
it in our new setting.

970
00:49:19,130 --> 00:49:21,080
We have a connected graph.

971
00:49:21,080 --> 00:49:24,390
This time, let's suppose that
it has a distinguished vertex,

972
00:49:24,390 --> 00:49:26,310
like it already has a leader.

973
00:49:26,310 --> 00:49:28,450
So it has a distinguished
vertex in the graph

974
00:49:28,450 --> 00:49:31,930
that's going to become
the root of the BFS tree.

975
00:49:34,920 --> 00:49:37,620
And the processes don't
need any knowledge

976
00:49:37,620 --> 00:49:39,220
about the graph for this one.

977
00:49:44,930 --> 00:49:48,490
For the rest of the
time today and Thursday,

978
00:49:48,490 --> 00:49:51,250
we'll assume the processes
have unique identifiers,

979
00:49:51,250 --> 00:49:53,460
and I don't think we're
using any probabilities.

980
00:49:53,460 --> 00:49:56,970
So this is just going to be
using the unique identifiers

981
00:49:56,970 --> 00:49:59,570
to solve our problems.

982
00:49:59,570 --> 00:50:02,700
So everybody knows its
own unique identifier.

983
00:50:02,700 --> 00:50:05,710
The root has a distinguished,
generally known,

984
00:50:05,710 --> 00:50:08,720
unique identifier say i0.

985
00:50:08,720 --> 00:50:10,880
And the process that
has i0 knows hey,

986
00:50:10,880 --> 00:50:13,060
I'm at the root of the graph.

987
00:50:13,060 --> 00:50:14,380
So does the setup make sense?

988
00:50:17,647 --> 00:50:19,230
We might as well
assume that everybody

989
00:50:19,230 --> 00:50:21,710
knows the unique identifiers
of their neighbors

990
00:50:21,710 --> 00:50:23,990
because they could easily
exchange information

991
00:50:23,990 --> 00:50:27,200
now, and match up who's
connected on which port

992
00:50:27,200 --> 00:50:28,385
by a unique identifier.

993
00:50:31,830 --> 00:50:33,499
We'll just do deterministic.

994
00:50:33,499 --> 00:50:35,540
There'll be a little bit
of non-determinism here.

995
00:50:35,540 --> 00:50:36,770
I'll say more about that.

996
00:50:36,770 --> 00:50:42,470
But I'm not going to worry
about probabilities for this.

997
00:50:42,470 --> 00:50:45,100
Well that told you
about the general setup.

998
00:50:45,100 --> 00:50:47,880
What are the processes
supposed to do?

999
00:50:47,880 --> 00:50:50,720
Well they're supposed to compute
a breadth-first spanning tree,

1000
00:50:50,720 --> 00:50:53,210
rooted at vertex v0.

1001
00:50:53,210 --> 00:50:56,140
The branches are
going to be directed

1002
00:50:56,140 --> 00:51:00,040
paths in this undirected
graph, coming from v0.

1003
00:51:00,040 --> 00:51:03,520
Spanning means they should
reach all the vertices.

1004
00:51:03,520 --> 00:51:06,370
And breadth-first means that
if a vertex is at a distance

1005
00:51:06,370 --> 00:51:12,600
d from v0, it will appear at
depth d in this spanning tree.

1006
00:51:12,600 --> 00:51:17,910
So everybody should get a
shortest path from the root.

1007
00:51:17,910 --> 00:51:20,610
Now how are we going to compute
this in a distributed setting?

1008
00:51:20,610 --> 00:51:23,400
Well now the output
of a process is just

1009
00:51:23,400 --> 00:51:26,850
going to be its
parent in the tree.

1010
00:51:26,850 --> 00:51:29,590
So we're not actually going
to compute this tree anywhere

1011
00:51:29,590 --> 00:51:30,680
as a whole.

1012
00:51:30,680 --> 00:51:33,546
Everybody's just going to
know its parent in the tree.

1013
00:51:37,810 --> 00:51:38,420
Questions?

1014
00:51:38,420 --> 00:51:39,340
Problem make sense?

1015
00:51:43,920 --> 00:51:47,694
So this is just an example
of a spanning tree,

1016
00:51:47,694 --> 00:51:48,860
breadth-first spanning tree.

1017
00:51:48,860 --> 00:51:53,000
This gives you shortest
paths to all of the nodes,

1018
00:51:53,000 --> 00:51:55,200
shortest in terms of
the number of hops.

1019
00:51:58,600 --> 00:52:01,740
So we can have a very,
very simple algorithm.

1020
00:52:01,740 --> 00:52:06,270
We're going to let the processes
mark themselves as they

1021
00:52:06,270 --> 00:52:08,520
get included in the tree.

1022
00:52:08,520 --> 00:52:12,970
Starts out only the first
process, i0, is marked.

1023
00:52:12,970 --> 00:52:17,470
So do you want to give an idea,
maybe, of how this might work?

1024
00:52:17,470 --> 00:52:19,421
Sketch out-- yeah?

1025
00:52:19,421 --> 00:52:21,504
AUDIENCE: The root will
send out to its neighbors.

1026
00:52:21,504 --> 00:52:22,968
And they will then
mark themselves

1027
00:52:22,968 --> 00:52:25,408
as the parent of
whoever they heard from.

1028
00:52:25,408 --> 00:52:27,369
Then they will--

1029
00:52:27,369 --> 00:52:28,910
NANCY LYNCH: This
is all synchronous.

1030
00:52:28,910 --> 00:52:29,710
So that's great.

1031
00:52:29,710 --> 00:52:31,720
They'll be doing this
in synchronous rounds.

1032
00:52:31,720 --> 00:52:34,030
So everybody at
a certain distance

1033
00:52:34,030 --> 00:52:37,790
is going to get the message
at the right number of rounds

1034
00:52:37,790 --> 00:52:40,320
to mark their distance.

1035
00:52:40,320 --> 00:52:45,000
OK so in round one,
process i0 will

1036
00:52:45,000 --> 00:52:48,550
send a special
message, say search,

1037
00:52:48,550 --> 00:52:50,660
to all of its neighbors.

1038
00:52:50,660 --> 00:52:52,950
And anybody who receives
a message in round one

1039
00:52:52,950 --> 00:52:57,320
will mark itself,
decide i0 is its parent,

1040
00:52:57,320 --> 00:53:01,850
could output that i0 is
my parent, parent i0.

1041
00:53:01,850 --> 00:53:03,970
And then it can get
ready for the next round,

1042
00:53:03,970 --> 00:53:09,210
when it's supposed to
send to continue this.

1043
00:53:09,210 --> 00:53:13,000
So at later rounds, if you
decided you're going to send,

1044
00:53:13,000 --> 00:53:16,070
if you know you're supposed to
send from the previous round,

1045
00:53:16,070 --> 00:53:20,000
then you send a search message
to all of your neighbors.

1046
00:53:20,000 --> 00:53:22,530
Now the process is
sitting there and it

1047
00:53:22,530 --> 00:53:25,040
receives a search message.

1048
00:53:25,040 --> 00:53:30,340
If he's already marked, then he
should just ignore the message.

1049
00:53:30,340 --> 00:53:32,430
Once you're included
in the tree,

1050
00:53:32,430 --> 00:53:35,160
you don't care if you
get other messages,

1051
00:53:35,160 --> 00:53:37,840
search messages on other paths.

1052
00:53:37,840 --> 00:53:41,170
So you only do anything
if you're not yet marked

1053
00:53:41,170 --> 00:53:43,120
and you receive a message.

1054
00:53:43,120 --> 00:53:44,855
And in that case, then
you mark yourself.

1055
00:53:48,020 --> 00:53:49,970
Then you mark
yourself, and then you

1056
00:53:49,970 --> 00:53:53,980
choose one of your neighbors
to be your parent.

1057
00:53:53,980 --> 00:53:55,920
Now because this
is synchronous, you

1058
00:53:55,920 --> 00:53:58,970
have several nodes that could
be sending at the same time.

1059
00:53:58,970 --> 00:54:02,280
So one node could be
receiving search messages

1060
00:54:02,280 --> 00:54:05,040
from several different
neighbors at once.

1061
00:54:05,040 --> 00:54:07,660
Well, it wants to choose
one of them as its parent,

1062
00:54:07,660 --> 00:54:10,380
doesn't matter which
one it chooses.

1063
00:54:10,380 --> 00:54:13,000
So it can just choose
nondeterministically, just

1064
00:54:13,000 --> 00:54:15,160
arbitrarily.

1065
00:54:15,160 --> 00:54:19,932
And then it decides that it
will send the next round.

1066
00:54:19,932 --> 00:54:21,170
Is the algorithm clear?
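
[Editor's sketch, not the lecture's code: a small single-machine simulation of the round-based marking algorithm described above. The function name sync_bfs and the adjacency-dictionary graph format are assumptions made for illustration. Ties between several senders are broken by smallest ID, which is only one of the permitted choices.]

```python
def sync_bfs(graph, root):
    """Simulate synchronous BFS; return each process's chosen parent.

    graph maps a process ID to the list of its neighbors' IDs.
    """
    parent = {root: None}   # a process is "marked" once it has an entry here
    senders = [root]        # processes that send "search" in the coming round
    while senders:
        # Deliver every search message sent in this round.
        inbox = {}
        for u in senders:
            for v in graph[u]:
                inbox.setdefault(v, []).append(u)
        senders = []
        for v, sources in inbox.items():
            if v in parent:
                continue             # already marked: ignore the message
            parent[v] = min(sources) # arbitrary choice, here the smallest ID
            senders.append(v)        # newly marked: send in the next round
    return parent
```

[On a four-cycle with root 0, nodes 1 and 2 pick 0 as parent in round one, and node 3 hears from both 1 and 2 in round two and picks one of them.]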

1067
00:54:26,770 --> 00:54:29,380
So there's, I mentioned, a
little bit of nondeterminism

1068
00:54:29,380 --> 00:54:31,970
here, only in that a process
can choose arbitrarily

1069
00:54:31,970 --> 00:54:34,120
among several possible parents.

1070
00:54:36,770 --> 00:54:38,490
And then we could
put in a default,

1071
00:54:38,490 --> 00:54:40,830
saying that it chooses the
one with the smallest ID,

1072
00:54:40,830 --> 00:54:43,170
if we really want to
make it deterministic.

1073
00:54:43,170 --> 00:54:45,542
But it's also OK to leave
distributed algorithms

1074
00:54:45,542 --> 00:54:46,250
nondeterministic.

1075
00:54:49,690 --> 00:54:51,530
And here I should
make a remark that

1076
00:54:51,530 --> 00:54:54,230
shows how differently
nondeterminism

1077
00:54:54,230 --> 00:54:56,910
is regarded in the
distributed setting,

1078
00:54:56,910 --> 00:55:00,520
from the way it is for
sequential algorithms.

1079
00:55:00,520 --> 00:55:03,410
For distributed algorithms,
there can be many options.

1080
00:55:03,410 --> 00:55:04,840
And maybe they're all OK.

1081
00:55:04,840 --> 00:55:07,560
But the algorithm is
supposed to work correctly,

1082
00:55:07,560 --> 00:55:12,960
no matter how you resolve
the nondeterministic choices.

1083
00:55:12,960 --> 00:55:15,850
So think about like
NP, and the other ways

1084
00:55:15,850 --> 00:55:18,160
that you've seen
nondeterminism so far.

1085
00:55:18,160 --> 00:55:21,390
There you say you're lucky if
there is a path to a choice.

1086
00:55:21,390 --> 00:55:24,200
Here when you make a
nondeterministic choice,

1087
00:55:24,200 --> 00:55:26,590
or when the algorithm
behaves nondeterministically,

1088
00:55:26,590 --> 00:55:28,330
all the choices are
supposed to work.

1089
00:55:28,330 --> 00:55:30,890
It's like all the paths have to
come up with correct answers.

1090
00:55:30,890 --> 00:55:32,259
Do you have a question?

1091
00:55:32,259 --> 00:55:34,384
AUDIENCE: Yes, whenever
there's a sub- [INAUDIBLE],

1092
00:55:34,384 --> 00:55:36,740
whenever there's
a race condition,

1093
00:55:36,740 --> 00:55:38,740
we locally assume that
there wasn't a difference

1094
00:55:38,740 --> 00:55:41,160
in local computation time.

1095
00:55:41,160 --> 00:55:42,830
But if there is, even
in the slightest,

1096
00:55:42,830 --> 00:55:45,330
then they would get a parent
[INAUDIBLE] before another one,

1097
00:55:45,330 --> 00:55:47,129
it would still be a valid--

1098
00:55:47,129 --> 00:55:48,670
NANCY LYNCH: So the
synchronous model

1099
00:55:48,670 --> 00:55:50,030
is more abstract than that.

1100
00:55:50,030 --> 00:55:52,540
You don't model the
local computation time.

1101
00:55:52,540 --> 00:55:54,660
You're moving more toward
an asynchronous model,

1102
00:55:54,660 --> 00:55:58,280
where the steps can take
differing amounts of time.

1103
00:55:58,280 --> 00:56:01,250
Here we just assume you have
an abstract model, where

1104
00:56:01,250 --> 00:56:04,280
everybody does stuff
at once, in each round.

1105
00:56:04,280 --> 00:56:05,900
But you still have
nondeterminism

1106
00:56:05,900 --> 00:56:11,560
because they can all arrive
at the same round somewhere.

1107
00:56:11,560 --> 00:56:12,240
But it's OK.

1108
00:56:12,240 --> 00:56:14,100
You can pick any one
and it still works.

1109
00:56:17,560 --> 00:56:20,600
So it should be not hard to
see that this does give you

1110
00:56:20,600 --> 00:56:23,830
a BFS tree because you're
creating all the branches

1111
00:56:23,830 --> 00:56:25,040
synchronously.

1112
00:56:25,040 --> 00:56:28,790
And you're growing
one hop at each round.

1113
00:56:28,790 --> 00:56:30,690
It reaches all the
nodes eventually

1114
00:56:30,690 --> 00:56:32,080
because the graph is connected.

1115
00:56:32,080 --> 00:56:36,400
And everybody sends messages
once a node gets marked.

1116
00:56:36,400 --> 00:56:38,520
It sends messages
to its neighbors.

1117
00:56:38,520 --> 00:56:40,400
So eventually, the
markings are going

1118
00:56:40,400 --> 00:56:46,640
to reach all the neighbors,
all the nodes in the graph.

1119
00:56:46,640 --> 00:56:50,970
So here's how you get the
example I showed before,

1120
00:56:50,970 --> 00:56:53,460
simple breadth-first search.

1121
00:56:53,460 --> 00:56:57,270
That's a search message
sent by this guy.

1122
00:56:57,270 --> 00:56:59,360
I put it to the
right of the edge

1123
00:56:59,360 --> 00:57:02,990
to indicate-- it's kind
of hard to distinguish.

1124
00:57:02,990 --> 00:57:04,880
But I put them on
the right of the edge

1125
00:57:04,880 --> 00:57:06,750
from the point of
view of the sender.

1126
00:57:06,750 --> 00:57:09,770
So he sends a search message.

1127
00:57:09,770 --> 00:57:10,650
It gets there.

1128
00:57:10,650 --> 00:57:13,720
This arrow just indicates
that it reached the other end.

1129
00:57:13,720 --> 00:57:16,160
And this guy has
chosen the sender,

1130
00:57:16,160 --> 00:57:19,790
which is the other direction
on the arrow, as its parent.

1131
00:57:19,790 --> 00:57:25,540
Now the recipient is going
to send some search messages.

1132
00:57:25,540 --> 00:57:28,370
So he sends four of them.

1133
00:57:28,370 --> 00:57:29,820
They all get to the other end.

1134
00:57:29,820 --> 00:57:32,770
And OK, so all these
guys now get marked.

1135
00:57:32,770 --> 00:57:36,230
They're included
in the BFS tree.

1136
00:57:36,230 --> 00:57:40,000
And now the next round,
they all send some messages.

1137
00:57:40,000 --> 00:57:44,270
I'm not putting in the messages
where somebody would send back

1138
00:57:44,270 --> 00:57:45,970
to a guy who sent to him.

1139
00:57:45,970 --> 00:57:47,810
But I put in all the others.

1140
00:57:47,810 --> 00:57:51,350
Some of them are
going to be ignored.

1141
00:57:51,350 --> 00:57:53,820
But you do get to a
few new nodes this way.

1142
00:57:53,820 --> 00:57:55,460
That's round three.

1143
00:57:55,460 --> 00:57:57,900
Round four, everybody sends.

1144
00:57:57,900 --> 00:58:00,490
And now you have all
the nodes included.

1145
00:58:03,250 --> 00:58:05,540
So this gives you
the spanning tree

1146
00:58:05,540 --> 00:58:07,970
that I showed at the
beginning of this topic.

1147
00:58:12,450 --> 00:58:14,650
This is not a very
complicated algorithm.

1148
00:58:14,650 --> 00:58:17,830
But I think you can see
that things can get worse.

1149
00:58:17,830 --> 00:58:22,820
And you want to argue about why
the algorithms work correctly.

1150
00:58:22,820 --> 00:58:25,970
So as I said before,
a popular method

1151
00:58:25,970 --> 00:58:28,300
of reasoning about
the algorithms

1152
00:58:28,300 --> 00:58:30,300
is to state invariants.

1153
00:58:30,300 --> 00:58:32,010
So here, suppose
I want to describe

1154
00:58:32,010 --> 00:58:35,525
the state of the entire
network, after some number, r,

1155
00:58:35,525 --> 00:58:37,921
of rounds.

1156
00:58:37,921 --> 00:58:39,170
What could you say about that?

1157
00:58:39,170 --> 00:58:41,400
What's the case after r
rounds of this algorithm?

1158
00:58:49,010 --> 00:58:50,483
Yeah.

1159
00:58:50,483 --> 00:58:53,429
AUDIENCE: All nodes at
distance r from the root

1160
00:58:53,429 --> 00:58:55,260
have been marked.

1161
00:58:55,260 --> 00:58:58,280
NANCY LYNCH: All the nodes
at distance r from the root

1162
00:58:58,280 --> 00:58:59,530
have been marked.

1163
00:58:59,530 --> 00:59:03,000
In fact, only those by
round r, only the ones

1164
00:59:03,000 --> 00:59:06,330
with distances up through
r have been marked.

1165
00:59:06,330 --> 00:59:09,350
So to state the invariants, if
you want to state invariants,

1166
00:59:09,350 --> 00:59:12,540
I have to say what's in
the state of the processes.

1167
00:59:12,540 --> 00:59:14,770
So all right, what can we say?

1168
00:59:14,770 --> 00:59:18,570
So the process has a Boolean
that says whether or not

1169
00:59:18,570 --> 00:59:19,740
it's marked.

1170
00:59:19,740 --> 00:59:23,570
It has a place to
record a parent.

1171
00:59:23,570 --> 00:59:29,150
And it has someplace
where it puts information

1172
00:59:29,150 --> 00:59:30,750
about whether it's
supposed to send

1173
00:59:30,750 --> 00:59:33,100
a message at the next round.

1174
00:59:33,100 --> 00:59:36,180
And we also should
know its UID, so I'll

1175
00:59:36,180 --> 00:59:38,800
put that in another
state variable.

1176
00:59:38,800 --> 00:59:43,570
So here is something I
can say in invariants.

1177
00:59:43,570 --> 00:59:48,920
At the end of r rounds, as you
said, at the end of r rounds

1178
00:59:48,920 --> 00:59:52,390
exactly the processes
at distance at most r

1179
00:59:52,390 --> 00:59:57,511
from the source node, the
root node, are marked.

1180
00:59:57,511 --> 00:59:58,510
I can say a little more.

1181
00:59:58,510 --> 01:00:02,750
I can say a process has its
parent defined if and only

1182
01:00:02,750 --> 01:00:04,390
if it's marked.

1183
01:00:04,390 --> 01:00:05,640
So it doesn't just get marked.

1184
01:00:05,640 --> 01:00:08,050
It also computes a
parent, and the parent

1185
01:00:08,050 --> 01:00:13,030
gets computed at the point
where it gets marked.

1186
01:00:13,030 --> 01:00:15,950
Then I should say that
the parent is correct.

1187
01:00:15,950 --> 01:00:21,400
So for any process that's at
distance d from the source,

1188
01:00:21,400 --> 01:00:23,340
if the parent is
defined, then it's

1189
01:00:23,340 --> 01:00:26,410
in fact the UID of a process
at distance d minus 1

1190
01:00:26,410 --> 01:00:28,670
from the source.

1191
01:00:28,670 --> 01:00:30,220
So that says it's
actually getting

1192
01:00:30,220 --> 01:00:33,590
a correct breadth-first tree.

1193
01:00:33,590 --> 01:00:36,890
It's getting the parent
on a shortest path.

1194
01:00:36,890 --> 01:00:37,566
Yeah?

1195
01:00:37,566 --> 01:00:39,946
AUDIENCE: Do these invariants
[INAUDIBLE] for i0?

1196
01:00:42,810 --> 01:00:44,460
NANCY LYNCH:
Distance 0 is marked.

1197
01:00:47,090 --> 01:00:52,200
i0 doesn't ever-- I
see what you're saying.

1198
01:00:52,200 --> 01:00:54,330
i0 doesn't have a parent.

1199
01:00:54,330 --> 01:00:56,910
So I guess that we
should say for i

1200
01:00:56,910 --> 01:01:01,200
not equal to i0 in this case.

1201
01:01:01,200 --> 01:01:03,310
So this would be a
process other than i0.

1202
01:01:03,310 --> 01:01:04,886
It would have its
parent defined,

1203
01:01:04,886 --> 01:01:06,010
if and only if it's marked.

1204
01:01:06,010 --> 01:01:09,180
Well as I think
you just noticed,

1205
01:01:09,180 --> 01:01:12,000
the root node is marked but
it doesn't have a parent.

1206
01:01:12,000 --> 01:01:15,240
So it's an exception.

1207
01:01:15,240 --> 01:01:19,500
But this should be,
this doesn't involve i0.

1208
01:01:19,500 --> 01:01:22,777
So the second one, I
can fix that a bit.

1209
01:01:22,777 --> 01:01:23,860
Other comments, questions?
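
[Editor's sketch, not from the lecture: the three invariants, checked by assertion after every simulated round. The names check_invariants and true_distances are hypothetical, the graph is assumed connected, and ties are broken by smallest ID. The second invariant excludes the root, as just discussed.]

```python
from collections import deque


def true_distances(graph, root):
    """Reference hop distances from root, computed by ordinary sequential BFS."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def check_invariants(graph, root):
    """Run the synchronous algorithm, asserting the invariants after each round.

    Returns the number of rounds executed until nobody sends any more.
    """
    dist = true_distances(graph, root)
    marked, parent, senders = {root}, {}, [root]
    r = 0
    while senders:
        r += 1
        inbox = {}
        for u in senders:
            for v in graph[u]:
                inbox.setdefault(v, []).append(u)
        senders = []
        for v, sources in inbox.items():
            if v not in marked:
                marked.add(v)
                parent[v] = min(sources)
                senders.append(v)
        # 1. Exactly the processes at distance at most r are marked.
        assert marked == {v for v in graph if dist[v] <= r}
        # 2. A process other than the root has a parent iff it is marked.
        assert set(parent) == marked - {root}
        # 3. The parent of a process at distance d is at distance d - 1.
        assert all(dist[parent[v]] == dist[v] - 1 for v in parent)
    return r
```

[An inductive proof follows the same shape: assume the three statements after round r and show that one more round preserves them.]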

1210
01:01:27,890 --> 01:01:31,000
So if somebody wanted to
do a formal correctness

1211
01:01:31,000 --> 01:01:33,040
proof of an algorithm
like this one,

1212
01:01:33,040 --> 01:01:34,637
you would use these invariants.

1213
01:01:34,637 --> 01:01:35,720
You prove it by induction.

1214
01:01:35,720 --> 01:01:37,510
In fact there's
quite a few people

1215
01:01:37,510 --> 01:01:43,030
who use interactive theorem
provers to do proofs

1216
01:01:43,030 --> 01:01:46,480
like this because the algorithms
can get pretty complicated,

1217
01:01:46,480 --> 01:01:48,520
with a lot of variables.

1218
01:01:48,520 --> 01:01:50,470
So you have to do
some bookkeeping.

1219
01:01:50,470 --> 01:01:52,460
You keep track of
all these invariants,

1220
01:01:52,460 --> 01:01:55,880
and then you want to prove that
they're all true by induction.

1221
01:01:55,880 --> 01:01:58,440
They all hold through
an inductive step.

1222
01:01:58,440 --> 01:02:00,780
So you can use an
interactive theorem prover

1223
01:02:00,780 --> 01:02:03,540
to help you do the bookkeeping.

1224
01:02:03,540 --> 01:02:06,390
But even a manual proof
in a research paper

1225
01:02:06,390 --> 01:02:08,790
would use invariants
in this style.

1226
01:02:12,000 --> 01:02:14,940
OK complexity.

1227
01:02:14,940 --> 01:02:19,660
So the number of rounds until
everybody outputs their parent

1228
01:02:19,660 --> 01:02:23,440
would be the maximum
distance of any node from v0.

1229
01:02:23,440 --> 01:02:25,780
So we can say that's at most
the diameter of the graph.

1230
01:02:25,780 --> 01:02:26,620
It could be less.

1231
01:02:26,620 --> 01:02:28,030
It's just the
maximum distance

1232
01:02:28,030 --> 01:02:31,440
from this particular node.

1233
01:02:31,440 --> 01:02:33,020
Message complexity?

1234
01:02:33,020 --> 01:02:38,140
Well how many messages are
sent in this algorithm?

1235
01:02:38,140 --> 01:02:40,290
So everybody is going
to send messages

1236
01:02:40,290 --> 01:02:43,880
only once on all of its edges.

1237
01:02:43,880 --> 01:02:47,320
So that means all the
edges get a message sent

1238
01:02:47,320 --> 01:02:48,714
in each direction just once.

1239
01:02:48,714 --> 01:02:50,255
So it's order of
the number of edges.
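
[Editor's sketch, not from the lecture: counting rounds and messages in a simulation to illustrate the two bounds. The name bfs_cost is hypothetical and the graph is assumed connected. The first returned value is the round in which the last process gets marked, and the second is the total number of search messages.]

```python
def bfs_cost(graph, root):
    """Return (round in which the last process is marked, total messages sent)."""
    marked = {root}
    senders = [root]
    round_no = 0
    last_marking_round = 0
    messages = 0
    while senders:
        round_no += 1
        receivers = set()
        for u in senders:
            messages += len(graph[u])   # one search message per incident edge
            receivers.update(graph[u])
        senders = sorted(receivers - marked)  # only unmarked receivers react
        if senders:
            marked.update(senders)
            last_marking_round = round_no
    return last_marking_round, messages
```

[Every process sends exactly once on each of its edges, so the message total is the sum of the degrees, which is twice the number of undirected edges.]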

1240
01:02:55,360 --> 01:02:58,720
All right, so we can
play around with this.

1241
01:02:58,720 --> 01:03:01,560
So this algorithm just tells
everybody who his parent is.

1242
01:03:01,560 --> 01:03:03,280
But maybe when you're
finished, you'd

1243
01:03:03,280 --> 01:03:05,460
like to who your
children are as well.

1244
01:03:08,400 --> 01:03:10,380
For many uses of
these trees, you'd

1245
01:03:10,380 --> 01:03:14,040
like to have a parent be
able to talk to its children

1246
01:03:14,040 --> 01:03:15,250
in the tree.

1247
01:03:15,250 --> 01:03:16,140
So how to do that?

1248
01:03:16,140 --> 01:03:20,080
Well you can add a child
pointer because anybody

1249
01:03:20,080 --> 01:03:22,520
who gets a search message
and selects its parent

1250
01:03:22,520 --> 01:03:24,830
could send back a message
to that parent saying, hey,

1251
01:03:24,830 --> 01:03:26,330
I'm your child.

1252
01:03:26,330 --> 01:03:29,330
And if you get a search message,
and you decide that that's not

1253
01:03:29,330 --> 01:03:31,760
your parent, you can
help that guy out

1254
01:03:31,760 --> 01:03:34,407
by sending a message saying
you're not my parent.

1255
01:03:34,407 --> 01:03:35,990
In the synchronous
case, he would just

1256
01:03:35,990 --> 01:03:37,864
know that, if he didn't
get a parent message.

1257
01:03:37,864 --> 01:03:40,970
But things are going to
get more complicated.

1258
01:03:40,970 --> 01:03:43,517
So we'll send parent
or non-parent responses

1259
01:03:43,517 --> 01:03:44,475
to the search messages.

1260
01:03:49,770 --> 01:03:52,300
Suppose we want to compute
the distances from v0,

1261
01:03:52,300 --> 01:03:53,630
not just who the parents are.

1262
01:03:53,630 --> 01:03:55,310
Well that's easy.

1263
01:03:55,310 --> 01:03:58,190
Everybody can just record
its distance, as well as

1264
01:03:58,190 --> 01:04:01,670
its parent and the mark.

1265
01:04:01,670 --> 01:04:04,670
And then you just include
your own distance value

1266
01:04:04,670 --> 01:04:06,170
in your search message.

1267
01:04:06,170 --> 01:04:09,340
And when somebody
receives a search message,

1268
01:04:09,340 --> 01:04:13,820
it sets its own distance to
the received distance plus 1.

1269
01:04:13,820 --> 01:04:17,750
So we can just keep track
and add one to the distance.

1270
01:04:17,750 --> 01:04:20,380
It's easy to augment
this algorithm

1271
01:04:20,380 --> 01:04:21,630
to get this extra information.
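
The augmented search described above can be sketched as a small round-by-round simulation. This is only an illustration under assumptions, not the lecture's own code: the function name `sync_bfs`, the adjacency-dictionary input, and the rule of picking the smallest sender id when several search messages arrive in the same round are all hypothetical choices.

```python
from collections import defaultdict


def sync_bfs(adj, root):
    """Simulate synchronous BFS; return (parent, children, dist) dictionaries."""
    parent = {root: None}
    dist = {root: 0}
    children = defaultdict(list)
    frontier = [root]  # nodes that were marked in the previous round
    while frontier:
        # Each newly marked node sends search(own distance) to every neighbor.
        inbox = defaultdict(list)
        for u in frontier:
            for v in adj[u]:
                inbox[v].append((u, dist[u]))
        next_frontier = []
        for v, msgs in inbox.items():
            if v in parent:
                continue  # already marked: every sender gets a "non-parent" reply
            sender, d = min(msgs)  # assumed tie-break: smallest sender id
            parent[v] = sender
            dist[v] = d + 1  # received distance plus 1
            children[sender].append(v)  # models the "parent" reply to the sender
            next_frontier.append(v)
        frontier = next_frontier
    return parent, dict(children), dist


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
    print(sync_bfs(graph, 0))
```

The simulation is centralized for readability; in the actual model each node would only see its own inbox.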

1272
01:04:24,630 --> 01:04:26,380
All right, now how
do the processes know

1273
01:04:26,380 --> 01:04:27,463
when this is all finished?

1274
01:04:30,140 --> 01:04:32,011
So everybody was able
to output parent.

1275
01:04:32,011 --> 01:04:33,010
I know who my parent is.

1276
01:04:33,010 --> 01:04:36,870
But how does anybody
know when the entire tree

1277
01:04:36,870 --> 01:04:39,770
has been produced?

1278
01:04:39,770 --> 01:04:42,820
Not so obvious.

1279
01:04:42,820 --> 01:04:46,270
So in some settings, you
might know an upper bound

1280
01:04:46,270 --> 01:04:48,350
on the depth of the tree.

1281
01:04:48,350 --> 01:04:51,477
And then you could just wait
for that number of rounds.

1282
01:04:51,477 --> 01:04:52,810
But what if you don't know that?

1283
01:04:52,810 --> 01:04:54,476
You don't know anything
about the graph.

1284
01:04:54,476 --> 01:04:57,360
Nobody knows.

1285
01:04:57,360 --> 01:04:59,460
So let's come up
with an algorithm

1286
01:04:59,460 --> 01:05:04,660
for process i0, the root,
to know definitively

1287
01:05:04,660 --> 01:05:07,700
that the tree has been
completely constructed.

1288
01:05:07,700 --> 01:05:08,200
Ideas?

1289
01:05:14,100 --> 01:05:15,890
You're creating this
by search messages.

1290
01:05:15,890 --> 01:05:17,889
How is i0 going to
know when it's done?

1291
01:05:25,872 --> 01:05:26,372
Yeah.

1292
01:05:26,372 --> 01:05:27,869
AUDIENCE: Every time
you mark a node,

1293
01:05:27,869 --> 01:05:29,865
the node can send a
message back to its parent,

1294
01:05:29,865 --> 01:05:30,863
saying hi, I've been marked.

1295
01:05:30,863 --> 01:05:33,154
Then you can probably get
all the way back to the root.

1296
01:05:33,154 --> 01:05:36,601
And then the root can count
the number of-- actually,

1297
01:05:36,601 --> 01:05:37,860
no if the root doesn't--

1298
01:05:37,860 --> 01:05:39,985
NANCY LYNCH: Root doesn't
know the number of nodes.

1299
01:05:39,985 --> 01:05:41,543
So that's a good idea.

1300
01:05:41,543 --> 01:05:43,042
AUDIENCE: If you
don't have a child,

1301
01:05:43,042 --> 01:05:45,507
you can tell your parent
that you don't have a child.

1302
01:05:48,545 --> 01:05:49,920
NANCY LYNCH: That's
a good start.

1303
01:05:49,920 --> 01:05:51,230
Was there another?

1304
01:05:51,230 --> 01:05:51,737
Yeah.

1305
01:05:51,737 --> 01:05:53,153
AUDIENCE: More
generally, you just

1306
01:05:53,153 --> 01:05:55,885
send a signal when you
know your sub-tree is done.

1307
01:05:55,885 --> 01:05:58,010
NANCY LYNCH: When you know
your sub-tree is done,

1308
01:05:58,010 --> 01:06:00,770
so that means you're going
to be communicating something

1309
01:06:00,770 --> 01:06:02,170
up the tree.

1310
01:06:02,170 --> 01:06:05,640
Right, so that's the idea
that you're working toward.

1311
01:06:05,640 --> 01:06:08,980
So a termination
algorithm to inform i0

1312
01:06:08,980 --> 01:06:11,550
when the tree is
completely constructed.

1313
01:06:11,550 --> 01:06:15,080
So let's say that the search
messages get their responses.

1314
01:06:15,080 --> 01:06:17,700
So everybody knows
which nodes are their,

1315
01:06:17,700 --> 01:06:22,290
which neighbors are its
children, and which are not.

1316
01:06:22,290 --> 01:06:24,810
So suppose a node
has gotten responses

1317
01:06:24,810 --> 01:06:30,830
to all of its search messages,
knows who all its children are.

1318
01:06:30,830 --> 01:06:33,260
Now the leaves in
this tree are going

1319
01:06:33,260 --> 01:06:34,730
to know that they're leaves.

1320
01:06:34,730 --> 01:06:37,880
How do they know that?

1321
01:06:37,880 --> 01:06:41,860
Propagating all these search
messages, and I'm a leaf.

1322
01:06:41,860 --> 01:06:43,524
How do I know I'm a leaf?

1323
01:06:43,524 --> 01:06:44,940
AUDIENCE: You can't
have children.

1324
01:06:44,940 --> 01:06:47,410
NANCY LYNCH: Yeah, you send
all these search messages,

1325
01:06:47,410 --> 01:06:51,810
and everybody says, sorry
you're not my parent.

1326
01:06:51,810 --> 01:06:54,390
So you know you have no
children because of the kind

1327
01:06:54,390 --> 01:06:57,140
of responses you get.

1328
01:06:57,140 --> 01:06:58,540
So now we're going
to use what we

1329
01:06:58,540 --> 01:07:01,300
call a convergecast strategy.

1330
01:07:01,300 --> 01:07:03,160
Broadcast is sending things out.

1331
01:07:03,160 --> 01:07:06,320
Convergecast is fanning
in information back

1332
01:07:06,320 --> 01:07:09,560
to the top of the tree.

1333
01:07:09,560 --> 01:07:11,800
So the convergecast
would say, all right,

1334
01:07:11,800 --> 01:07:15,200
so the leaves would send
a message to their parents

1335
01:07:15,200 --> 01:07:18,060
saying they're done.

1336
01:07:18,060 --> 01:07:23,600
Now if I'm some node in
the middle of the tree,

1337
01:07:23,600 --> 01:07:24,780
how do I know I'm done?

1338
01:07:24,780 --> 01:07:28,420
Well it's what you said.

1339
01:07:28,420 --> 01:07:31,750
You know that you
can figure out when

1340
01:07:31,750 --> 01:07:33,990
your entire sub-tree is done.

1341
01:07:33,990 --> 01:07:37,750
Well first of all, you have
to know who your children are.

1342
01:07:37,750 --> 01:07:39,760
It's kind of a
two stage process.

1343
01:07:39,760 --> 01:07:42,610
You have to know who
your children are,

1344
01:07:42,610 --> 01:07:46,530
by having received responses
to all your search messages.

1345
01:07:46,530 --> 01:07:49,410
And you wait to receive
done messages from all

1346
01:07:49,410 --> 01:07:51,247
of your actual children.

1347
01:07:51,247 --> 01:07:53,080
So if I'm sitting in
the middle of the tree,

1348
01:07:53,080 --> 01:07:56,020
and I've got done messages
from all my children,

1349
01:07:56,020 --> 01:07:57,850
I know my whole
sub-tree is done.

1350
01:07:57,850 --> 01:08:02,140
Then I can send the done
message to my parent.

1351
01:08:02,140 --> 01:08:04,180
Got that?

1352
01:08:04,180 --> 01:08:05,870
That's how convergecast works.

1353
01:08:05,870 --> 01:08:09,690
And when it reaches
the top, if i0

1354
01:08:09,690 --> 01:08:12,580
knows who its children are,
and it receives done messages

1355
01:08:12,580 --> 01:08:15,540
from all its children, it
knows the whole tree is done.

1356
01:08:15,540 --> 01:08:20,100
So it can output that the
tree construction is complete.

1357
01:08:20,100 --> 01:08:22,420
And it could tell
the others by sending

1358
01:08:22,420 --> 01:08:25,859
a message down the tree,
so they all know as well.

1359
01:08:25,859 --> 01:08:27,194
Questions?

1360
01:08:27,194 --> 01:08:32,674
AUDIENCE: Wouldn't i0
be the last one to know?

1361
01:08:32,674 --> 01:08:34,090
NANCY LYNCH: He'd
be the last one.

1362
01:08:34,090 --> 01:08:37,390
No, he'd be the first one to
know that the whole tree is

1363
01:08:37,390 --> 01:08:38,670
complete.

1364
01:08:38,670 --> 01:08:41,450
Everybody else knows when
their sub-tree is complete.

1365
01:08:41,450 --> 01:08:45,410
So i0 still has to now send
another message down the tree

1366
01:08:45,410 --> 01:08:48,276
to tell everyone else the
entire tree is complete.

1367
01:08:48,276 --> 01:08:49,359
Is there another question?

1368
01:08:52,289 --> 01:08:53,830
All right so this
isn't showing that.

1369
01:08:53,830 --> 01:08:56,279
This is just showing done
messages, which are actually

1370
01:08:56,279 --> 01:08:58,560
going in the opposite
direction from these edges,

1371
01:08:58,560 --> 01:08:59,390
going up the tree.

1372
01:08:59,390 --> 01:09:02,060
But you can just see
how they propagate up

1373
01:09:02,060 --> 01:09:04,670
until the root says done.

1374
01:09:04,670 --> 01:09:05,180
No big deal.

1375
01:09:08,149 --> 01:09:10,819
Complexity for termination.

1376
01:09:10,819 --> 01:09:14,130
Well it just takes at most
diameter rounds and n messages

1377
01:09:14,130 --> 01:09:16,880
for this done information
to come up to the top,

1378
01:09:16,880 --> 01:09:19,229
once the tree
actually is finished.

1379
01:09:19,229 --> 01:09:21,130
Because now you're
just sending messages

1380
01:09:21,130 --> 01:09:25,130
on the paths in this
tree, which are only,

1381
01:09:25,130 --> 01:09:29,029
at most, diameter in length.

1382
01:09:29,029 --> 01:09:32,920
And then process
i0 can tell everybody else.

1383
01:09:32,920 --> 01:09:34,540
It doesn't take
very long either.
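
The convergecast of done messages can be sketched as follows. This is a minimal illustration that assumes the tree is already built and that every node already knows its children; the name `convergecast_done` and the `children` dictionary format are hypothetical. It counts the rounds and the done messages needed before the root learns that the whole tree is complete.

```python
def convergecast_done(children, root):
    """Return (rounds, messages) until the root has heard 'done' from all children."""
    # Derive each node's parent from the child lists.
    parent = {c: p for p, cs in children.items() for c in cs}
    nodes = set(parent) | {root}
    # Number of done messages each node is still waiting for.
    pending = {u: len(children.get(u, [])) for u in nodes}
    # Leaves have no children, so they are done immediately.
    ready = [u for u in nodes if pending[u] == 0]
    rounds = messages = 0
    while root not in ready:
        rounds += 1
        next_ready = []
        for u in ready:
            p = parent[u]
            messages += 1  # u sends "done" to its parent
            pending[p] -= 1
            if pending[p] == 0:
                next_ready.append(p)  # p's whole sub-tree is now done
        ready = next_ready
    return rounds, messages


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3]}
    print(convergecast_done(tree, 0))
```

In this sketch every non-root node sends exactly one done message, and the number of rounds equals the depth of the tree.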

1384
01:09:37,260 --> 01:09:41,149
Applications, well suppose you
construct a tree like this.

1385
01:09:41,149 --> 01:09:44,460
And process i0 now wants
to use it to communicate.

1386
01:09:44,460 --> 01:09:46,450
It wants to send a
whole batch of messages

1387
01:09:46,450 --> 01:09:47,819
to all the other nodes.

1388
01:09:47,819 --> 01:09:49,990
It can just send
them now on the tree.

1389
01:09:49,990 --> 01:09:52,790
It's an easy way to
make sure messages reach

1390
01:09:52,790 --> 01:09:54,240
everybody else in the network.

1391
01:09:54,240 --> 01:09:57,310
Just send them on the edges
of the breadth-first spanning

1392
01:09:57,310 --> 01:09:59,580
tree.

1393
01:09:59,580 --> 01:10:03,610
So now the messages,
each individual message

1394
01:10:03,610 --> 01:10:07,370
takes at most n
message instances

1395
01:10:07,370 --> 01:10:09,650
along the edges of the
tree, because you only have

1396
01:10:09,650 --> 01:10:11,570
to traverse the tree edges.

1397
01:10:11,570 --> 01:10:15,920
No more dependence on the total
number of edges in the network.

1398
01:10:15,920 --> 01:10:19,000
And in fact, you can
save time by pipelining

1399
01:10:19,000 --> 01:10:20,280
a series of messages.

1400
01:10:20,280 --> 01:10:23,410
So you can send them one
round after the other.

1401
01:10:28,180 --> 01:10:31,740
The other way, suppose you want
to compute something globally.

1402
01:10:31,740 --> 01:10:34,870
Suppose everybody starts
with some initial value.

1403
01:10:34,870 --> 01:10:38,590
And process i0 is going
to try to determine

1404
01:10:38,590 --> 01:10:42,650
the value of some function
of everybody's initial value,

1405
01:10:42,650 --> 01:10:46,530
like the minimum or maximum
or the sum or anything.

1406
01:10:46,530 --> 01:10:48,990
Well you can do this
while convergecasting

1407
01:10:48,990 --> 01:10:52,910
on an already built BFS tree.

1408
01:10:52,910 --> 01:10:56,470
So everybody can just send
their information up the tree,

1409
01:10:56,470 --> 01:10:58,290
and i0 can collect it all.

1410
01:10:58,290 --> 01:11:00,933
In general, you
can accumulate, you

1411
01:11:00,933 --> 01:11:04,610
can do data aggregation as you
go up the paths of the tree.

1412
01:11:04,610 --> 01:11:09,910
So the message size
doesn't blow up.

1413
01:11:09,910 --> 01:11:13,520
So if you want, for example,
the sum of everybody's values,

1414
01:11:13,520 --> 01:11:16,260
everybody just sends their
values up in a convergecast.

1415
01:11:16,260 --> 01:11:18,890
And each node computes
the sum of all the values

1416
01:11:18,890 --> 01:11:21,550
in its sub-tree.
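
As a rough sketch of the aggregation idea, assuming an already built tree given as a hypothetical `children` dictionary and a hypothetical `value` dictionary of initial values: each node forwards a single number, its own value plus the partial sums reported by its children, so the message size stays constant. The recursion here is a centralized stand-in for the messages going up the tree.

```python
def convergecast_sum(children, root, value):
    """Return the sum of all initial values, combined sub-tree by sub-tree."""

    def subtree_sum(u):
        # What node u would send to its parent: one number for its whole sub-tree.
        return value[u] + sum(subtree_sum(c) for c in children.get(u, []))

    return subtree_sum(root)


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3]}
    initial = {0: 5, 1: 1, 2: 2, 3: 4}
    print(convergecast_sum(tree, 0, initial))
```

Replacing the addition with `min` or `max` would give the other aggregates mentioned in the lecture, under the same assumptions.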

1417
01:11:21,550 --> 01:11:24,100
So this is pretty efficient.

1418
01:11:24,100 --> 01:11:26,722
Make sense?

1419
01:11:26,722 --> 01:11:27,680
I'm going to skip this.

1420
01:11:27,680 --> 01:11:30,110
But you could do leader
election in a general graph,

1421
01:11:30,110 --> 01:11:32,470
if you don't have
a leader i0 already,

1422
01:11:32,470 --> 01:11:35,550
by having everybody
run a breadth-first search

1423
01:11:35,550 --> 01:11:36,160
in parallel.

1424
01:11:36,160 --> 01:11:37,359
But we'll skip that.

1425
01:11:37,359 --> 01:11:39,400
Because I just wanted to
have a couple of minutes

1426
01:11:39,400 --> 01:11:43,900
to start the last topic, and
we'll pick it up next time.

1427
01:11:43,900 --> 01:11:47,060
So it's the obvious extension.

1428
01:11:47,060 --> 01:11:49,350
Instead of just
breadth-first search trees,

1429
01:11:49,350 --> 01:11:51,810
let's put weights
on the edges and try

1430
01:11:51,810 --> 01:11:57,170
to compute shortest paths trees
in terms of the total weight

1431
01:11:57,170 --> 01:11:58,631
of the path.

1432
01:12:01,560 --> 01:12:04,350
So we're going to add weights.

1433
01:12:04,350 --> 01:12:05,440
It's an undirected graph.

1434
01:12:05,440 --> 01:12:08,160
So it's just a weight
for each undirected edge.

1435
01:12:11,290 --> 01:12:19,160
I'll still have a starting
node, vertex v0 with process i0.

1436
01:12:19,160 --> 01:12:22,110
Still have unique identifiers.

1437
01:12:22,110 --> 01:12:24,670
And I'll assume the processes
know who their neighbors are.

1438
01:12:24,670 --> 01:12:27,270
And they know the
weights of the incident

1439
01:12:27,270 --> 01:12:29,659
edges, their adjacent edges.

1440
01:12:29,659 --> 01:12:31,200
But otherwise they
don't need to know

1441
01:12:31,200 --> 01:12:34,590
anything else about the graph.

1442
01:12:34,590 --> 01:12:36,890
So again, this is
a familiar problem.

1443
01:12:36,890 --> 01:12:38,990
But we're looking at it
in a very different way,

1444
01:12:38,990 --> 01:12:40,160
by distributing it.

1445
01:12:43,360 --> 01:12:47,640
So the processes are supposed to
compute a shortest paths tree,

1446
01:12:47,640 --> 01:12:49,960
in the sense that
everybody should

1447
01:12:49,960 --> 01:12:52,050
output its parent in the tree.

1448
01:12:52,050 --> 01:12:55,440
And let's say they output
the distance as well,

1449
01:12:55,440 --> 01:12:58,680
the weighted distance
from the root node.

1450
01:13:03,540 --> 01:13:06,920
So this is called
Bellman-Ford's algorithm.

1451
01:13:06,920 --> 01:13:11,970
Again it's got the same name
in the distributed setting.

1452
01:13:11,970 --> 01:13:13,870
The Bellman-Ford
shortest paths algorithm.

1453
01:13:17,230 --> 01:13:20,710
So everybody is keeping
track of their current best

1454
01:13:20,710 --> 01:13:23,630
distance that they
know, and their parent.

1455
01:13:23,630 --> 01:13:27,270
And they know their
unique identifier.

1456
01:13:27,270 --> 01:13:29,040
And here's how the
algorithm works.

1457
01:13:29,040 --> 01:13:31,170
This will look
familiar from when

1458
01:13:31,170 --> 01:13:34,130
you had Bellman-Ford earlier.

1459
01:13:34,130 --> 01:13:37,650
At every round,
everybody is going

1460
01:13:37,650 --> 01:13:40,752
to send its distance
to its neighbors.

1461
01:13:40,752 --> 01:13:42,460
Instead of just sending
a search message,

1462
01:13:42,460 --> 01:13:46,450
now it will send its actual
distance information.

1463
01:13:46,450 --> 01:13:50,720
And you receive the messages
from your neighbors.

1464
01:13:50,720 --> 01:13:55,240
And now you do a relaxation
step, as you've seen before.

1465
01:13:55,240 --> 01:13:56,990
You look at the current
distance you have.

1466
01:13:56,990 --> 01:13:59,610
And you see if you've
gotten a new distance

1467
01:13:59,610 --> 01:14:03,650
from a neighbor, such that if
you add the new distance you

1468
01:14:03,650 --> 01:14:06,600
receive to the weight of
the edge between yourself

1469
01:14:06,600 --> 01:14:08,690
and that neighbor, you
get something better

1470
01:14:08,690 --> 01:14:10,350
than what you had before.

1471
01:14:10,350 --> 01:14:14,220
If you get that, then you're
going to improve your distance.

1472
01:14:14,220 --> 01:14:16,170
And if you improve
your distance,

1473
01:14:16,170 --> 01:14:19,070
then you're going
to reset your parent

1474
01:14:19,070 --> 01:14:24,720
to the sender of this new,
better distance information.

1475
01:14:24,720 --> 01:14:26,610
So does this
algorithm make sense?

1476
01:14:26,610 --> 01:14:28,580
It's like what you saw before.

1477
01:14:28,580 --> 01:14:32,470
But there's no running
through all the nodes.

1478
01:14:32,470 --> 01:14:34,310
Each node is doing
its own thing.

1479
01:14:34,310 --> 01:14:37,040
It's waiting to get better
distance information

1480
01:14:37,040 --> 01:14:39,100
and re-computing.

1481
01:14:39,100 --> 01:14:41,750
And then it's going to
be sending out its better

1482
01:14:41,750 --> 01:14:43,316
information at the next round.
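
The round structure just described can be sketched as a small simulation. This is an illustration under assumptions rather than the lecture's own pseudocode: the name `sync_bellman_ford`, the nested-dictionary weight format, and the explicit `rounds` parameter are hypothetical. The three-node example uses the weights 16, 1, and 14 from the animation that follows; the node names `a` and `b` are made up.

```python
import math


def sync_bellman_ford(weights, root, rounds):
    """weights[u][v] is the weight of undirected edge {u, v}; run `rounds` rounds."""
    dist = {u: math.inf for u in weights}
    parent = {u: None for u in weights}
    dist[root] = 0
    for _ in range(rounds):
        # Everyone sends the distance it held at the start of the round.
        sent = dict(dist)
        for u in weights:
            for v, w in weights[u].items():
                # Relaxation step: received distance plus the edge weight.
                if sent[v] + w < dist[u]:
                    dist[u] = sent[v] + w
                    parent[u] = v  # the sender of the better distance
    return dist, parent


if __name__ == "__main__":
    g = {
        "v0": {"a": 16, "b": 1},
        "a": {"v0": 16, "b": 14},
        "b": {"v0": 1, "a": 14},
    }
    print(sync_bellman_ford(g, "v0", 1))
    print(sync_bellman_ford(g, "v0", 2))
```

After one round the node called `a` holds the estimate 16 from the direct edge; after two rounds it improves to 15 through `b`, and its parent changes accordingly.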

1483
01:14:46,100 --> 01:14:46,710
Question?

1484
01:14:46,710 --> 01:14:49,660
So this is kind of a jump
in the way of thinking.

1485
01:14:54,060 --> 01:14:56,990
All right, so now I'm just
going to end basically

1486
01:14:56,990 --> 01:14:59,560
with an animation that'll
show you the kinds of things

1487
01:14:59,560 --> 01:15:01,930
that happen here.

1488
01:15:01,930 --> 01:15:07,100
All right so you start
out with the initial node.

1489
01:15:07,100 --> 01:15:10,590
And what's recorded in the
circle is the best distances.

1490
01:15:10,590 --> 01:15:14,522
The rest of these, the best
distance they know is infinity.

1491
01:15:14,522 --> 01:15:15,480
So I didn't write that.

1492
01:15:15,480 --> 01:15:23,470
So this guy knows 0. After one
round, he sent two messages.

1493
01:15:23,470 --> 01:15:25,920
The best distance each
of these guys knows

1494
01:15:25,920 --> 01:15:30,360
is just the weight of the
edge between v0 and itself.

1495
01:15:30,360 --> 01:15:33,410
So this guy's now estimating
its distance at 16

1496
01:15:33,410 --> 01:15:36,080
and this guy at 1.

1497
01:15:36,080 --> 01:15:38,930
16 is not very good because it's
actually very roundabout routes

1498
01:15:38,930 --> 01:15:40,070
that can get there.

1499
01:15:40,070 --> 01:15:45,310
But it's going to take us some
time to make that adjustment.

1500
01:15:45,310 --> 01:15:50,240
After two rounds, everybody
is sending their distance

1501
01:15:50,240 --> 01:15:50,880
information.

1502
01:15:50,880 --> 01:15:54,700
But now we get a
correction here.

1503
01:15:54,700 --> 01:15:57,110
This used to say 16.

1504
01:15:57,110 --> 01:15:59,710
But now we have a
two hop path that

1505
01:15:59,710 --> 01:16:02,170
gives you a better distance.

1506
01:16:02,170 --> 01:16:04,000
So you get the 1 plus the 14.

1507
01:16:04,000 --> 01:16:08,850
So he's going to hear
about the distance of 15

1508
01:16:08,850 --> 01:16:11,390
as a result of what 1 sends.

1509
01:16:11,390 --> 01:16:16,740
And some new guys get their
distances calculated.

1510
01:16:16,740 --> 01:16:21,680
And then after three rounds, it
gets a little bit complicated.

1511
01:16:21,680 --> 01:16:24,910
So maybe I'm just going to
flip through it quickly and let

1512
01:16:24,910 --> 01:16:26,500
you study later.

1513
01:16:26,500 --> 01:16:29,340
But you see that you keep
getting improvements,

1514
01:16:29,340 --> 01:16:32,390
as you perform relaxation steps.

1515
01:16:32,390 --> 01:16:36,270
As information gets to
somebody by better paths that

1516
01:16:36,270 --> 01:16:38,680
happen to have
more hops, they're

1517
01:16:38,680 --> 01:16:40,560
going to be reducing
their estimates.

1518
01:16:40,560 --> 01:16:44,640
I'm going to flip, and you
see that this guy's estimate

1519
01:16:44,640 --> 01:16:47,050
is going down.

1520
01:16:47,050 --> 01:16:49,920
And in the end, after
eight rounds of this,

1521
01:16:49,920 --> 01:16:52,180
you end up with a
very roundabout path

1522
01:16:52,180 --> 01:16:56,430
that actually gives this
guy a much better estimate.

1523
01:16:56,430 --> 01:16:58,640
So you can see how that works.

1524
01:17:01,190 --> 01:17:03,660
So the claim is that
eventually, every process

1525
01:17:03,660 --> 01:17:08,270
will have its distance being
a correct minimum weight

1526
01:17:08,270 --> 01:17:12,710
of the path, and its
parent will be correct.

1527
01:17:12,710 --> 01:17:14,710
I think maybe this is
a good place to stop.

1528
01:17:14,710 --> 01:17:17,810
We'll pick up with this
algorithm and its analysis.

1529
01:17:17,810 --> 01:17:19,910
Most of next time
is going to be spent

1530
01:17:19,910 --> 01:17:22,440
on asynchronous
algorithms, which

1531
01:17:22,440 --> 01:17:25,560
is a whole other
level of complication.

1532
01:17:25,560 --> 01:17:27,820
So I'll see you on Thursday.