1 00:00:00,060 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high quality, educational resources for free. 5 00:00:10,730 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,236 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,236 --> 00:00:17,861 at ocw.mit.edu. 8 00:00:26,600 --> 00:00:29,342 NANCY LYNCH: OK so today, you're going to see something new. 9 00:00:29,342 --> 00:00:30,800 In fact all this week, you're going 10 00:00:30,800 --> 00:00:33,510 to see something that's quite different from what you've 11 00:00:33,510 --> 00:00:36,960 been studying in this course. 12 00:00:36,960 --> 00:00:37,950 These are algorithms. 13 00:00:37,950 --> 00:00:42,380 But they're for a completely different sort of model. 14 00:00:42,380 --> 00:00:46,110 Distributed algorithms, OK, so what are they? 15 00:00:46,110 --> 00:00:48,190 So now instead of having algorithms 16 00:00:48,190 --> 00:00:50,480 that run on a typical computer, you're 17 00:00:50,480 --> 00:00:55,400 going to have algorithms that run on a network of processors. 18 00:00:55,400 --> 00:00:57,350 Or it could be on one machine that 19 00:00:57,350 --> 00:01:01,330 has multiple processors, multiprocessors that share memory. 20 00:01:05,970 --> 00:01:09,370 Much of computing is distributed algorithms now. 21 00:01:09,370 --> 00:01:11,550 They solve problems like communication 22 00:01:11,550 --> 00:01:18,560 on the internet, data management over a network, 23 00:01:18,560 --> 00:01:21,260 allocating resources in a network setting, 24 00:01:21,260 --> 00:01:23,540 synchronizing, reaching agreement 25 00:01:23,540 --> 00:01:28,990 among different agents at remote locations. 
26 00:01:28,990 --> 00:01:31,470 So these are all distributed problems, not things 27 00:01:31,470 --> 00:01:34,610 that you just solve on one computer. 28 00:01:34,610 --> 00:01:38,120 The kinds of algorithms you design for these settings 29 00:01:38,120 --> 00:01:45,420 have to work under extremely difficult platforms 30 00:01:45,420 --> 00:01:48,360 because what you have is concurrent activity that's 31 00:01:48,360 --> 00:01:51,840 going on at many locations, many processors doing things 32 00:01:51,840 --> 00:01:53,220 at the same time. 33 00:01:53,220 --> 00:01:55,800 And you don't know exactly when everybody's going 34 00:01:55,800 --> 00:01:57,970 to perform their activities. 35 00:01:57,970 --> 00:02:02,090 You can have different sorts of timing uncertainty. 36 00:02:02,090 --> 00:02:05,010 The order of events isn't clear. 37 00:02:05,010 --> 00:02:08,830 There could be inputs that arrive at different locations. 38 00:02:08,830 --> 00:02:12,150 And then you also have to deal with failure and recovery 39 00:02:12,150 --> 00:02:15,190 of some of the processors or some of the channels involved 40 00:02:15,190 --> 00:02:16,522 in the computation. 41 00:02:16,522 --> 00:02:17,980 You don't think of any of this when 42 00:02:17,980 --> 00:02:20,315 you're just trying to run an algorithm on one computer. 43 00:02:22,920 --> 00:02:25,990 So distributed algorithms can be pretty complicated. 44 00:02:25,990 --> 00:02:28,210 It's not easy to design them. 45 00:02:28,210 --> 00:02:30,654 And after you design them, you still 46 00:02:30,654 --> 00:02:32,070 have to make sure they're correct. 47 00:02:32,070 --> 00:02:34,290 So there are issues involved in proving them correct 48 00:02:34,290 --> 00:02:35,730 and analyzing them. 49 00:02:35,730 --> 00:02:37,980 A little bit of history, the field 50 00:02:37,980 --> 00:02:42,000 pretty much started around the late '60s. 51 00:02:42,000 --> 00:02:46,330 Edsger Dijkstra was one of the earliest leaders in the field. 
52 00:02:46,330 --> 00:02:49,850 He won one of the first Turing Awards. 53 00:02:49,850 --> 00:02:52,780 Leslie Lamport won the Turing Award last year. 54 00:02:52,780 --> 00:02:55,590 Although he actually started as a very young guy, 55 00:02:55,590 --> 00:02:59,470 way back in the early days of the field. 56 00:02:59,470 --> 00:03:01,770 If you want to look at some sources, I have a book. 57 00:03:01,770 --> 00:03:04,390 There's another textbook by Attiya and Welch. 58 00:03:04,390 --> 00:03:06,710 There's a new series of monographs that basically 59 00:03:06,710 --> 00:03:10,190 try to summarize many of the important research topics 60 00:03:10,190 --> 00:03:12,250 in distributed computing theory. 61 00:03:12,250 --> 00:03:16,170 And the last two lines have a couple of the main conferences 62 00:03:16,170 --> 00:03:18,620 in the field. 63 00:03:18,620 --> 00:03:21,750 OK so I can't do that much in one week. 64 00:03:21,750 --> 00:03:24,610 What I'll do is just introduce the area, 65 00:03:24,610 --> 00:03:29,140 by showing you two common models for distributed networks. 66 00:03:29,140 --> 00:03:32,686 And just introduce a very few fundamental algorithms, 67 00:03:32,686 --> 00:03:35,060 and you'll see along the way some techniques for modeling 68 00:03:35,060 --> 00:03:37,030 and analyzing them. 69 00:03:37,030 --> 00:03:39,740 OK the two models here are synchronous distributed 70 00:03:39,740 --> 00:03:44,820 networks, and asynchronous distributed networks. 71 00:03:44,820 --> 00:03:47,050 The problems I'll look at in the synchronous setting 72 00:03:47,050 --> 00:03:50,860 are a simple problem of leader election, which is a symmetry 73 00:03:50,860 --> 00:03:53,400 breaking problem, basically. 
74 00:03:53,400 --> 00:03:58,227 Maximal independent set problem, and then a couple 75 00:03:58,227 --> 00:04:00,060 of problems that should look familiar to you 76 00:04:00,060 --> 00:04:04,530 from the settings of this class, establishing structures 77 00:04:04,530 --> 00:04:08,360 like breadth-first spanning trees and shortest path trees. 78 00:04:08,360 --> 00:04:10,800 In the asynchronous case I'll revisit these last two 79 00:04:10,800 --> 00:04:13,290 problems, setting up breadth-first and shortest path 80 00:04:13,290 --> 00:04:15,100 trees. 81 00:04:15,100 --> 00:04:17,620 OK so I mentioned something about modeling 82 00:04:17,620 --> 00:04:19,430 and proofs and analysis. 83 00:04:19,430 --> 00:04:23,030 Turns out, getting the formal models right 84 00:04:23,030 --> 00:04:25,730 and getting real proofs tends to be 85 00:04:25,730 --> 00:04:27,830 pretty important for distributed algorithms 86 00:04:27,830 --> 00:04:31,180 because with all the stuff going on, they're complicated. 87 00:04:31,180 --> 00:04:34,030 And it's easy to make mistakes. 88 00:04:34,030 --> 00:04:38,110 The kinds of models that we use are interacting state machines, 89 00:04:38,110 --> 00:04:39,320 inputs and outputs. 90 00:04:39,320 --> 00:04:41,640 They send each other messages. 91 00:04:41,640 --> 00:04:44,050 But the kinds of proofs you do typically 92 00:04:44,050 --> 00:04:46,210 use invariants, a technique that you're very 93 00:04:46,210 --> 00:04:47,680 familiar with from this class. 94 00:04:47,680 --> 00:04:50,670 You can still use them in a distributed setting. 95 00:04:50,670 --> 00:04:53,910 And you still prove them the same way, by induction. 96 00:04:53,910 --> 00:04:56,640 Something else that comes up a lot in the distributed setting 97 00:04:56,640 --> 00:05:00,690 is modeling and proofs using levels of abstraction. 
98 00:05:00,690 --> 00:05:02,810 You might want to give an abstract description 99 00:05:02,810 --> 00:05:04,860 of an algorithm and prove that that works. 100 00:05:04,860 --> 00:05:07,720 And then you have a very detailed, complicated, 101 00:05:07,720 --> 00:05:11,480 lower level description that you can prove implements the higher 102 00:05:11,480 --> 00:05:13,286 level description. 103 00:05:13,286 --> 00:05:15,460 That's another popular technique. 104 00:05:15,460 --> 00:05:17,670 You use different kinds of complexity measures. 105 00:05:17,670 --> 00:05:21,510 For time complexity, you would measure rounds 106 00:05:21,510 --> 00:05:25,610 if it's the synchronous model, or some approximation 107 00:05:25,610 --> 00:05:28,660 to real time, if it's the asynchronous model. 108 00:05:28,660 --> 00:05:31,410 You also count communication, either the number 109 00:05:31,410 --> 00:05:33,660 of messages you send, or the total number of bits 110 00:05:33,660 --> 00:05:35,421 that you send in an algorithm. 111 00:05:38,000 --> 00:05:40,550 So throughout these two lectures, 112 00:05:40,550 --> 00:05:44,300 we'll be looking at distributed networks. 113 00:05:44,300 --> 00:05:45,740 So you start with a graph. 114 00:05:45,740 --> 00:05:49,830 Let's just look at undirected graphs this week. 115 00:05:49,830 --> 00:05:52,050 We use n in this field for what you're 116 00:05:52,050 --> 00:05:56,490 calling v, the total number of nodes in the network 117 00:05:56,490 --> 00:06:01,780 or vertices in the graph. 118 00:06:01,780 --> 00:06:05,200 We use the notation gamma of u to mean the neighbors of u 119 00:06:05,200 --> 00:06:06,910 in the graph. 120 00:06:06,910 --> 00:06:11,060 So every vertex of the graph has a set of immediate neighboring 121 00:06:11,060 --> 00:06:11,810 vertices. 122 00:06:11,810 --> 00:06:13,730 That's gamma of u. 
123 00:06:13,730 --> 00:06:19,310 And the degree of u is the size of the neighborhood, the number 124 00:06:19,310 --> 00:06:22,090 of neighbors of the vertex. 125 00:06:22,090 --> 00:06:24,050 OK so we start with the graph. 126 00:06:24,050 --> 00:06:25,740 But now we're going to plunk down 127 00:06:25,740 --> 00:06:29,350 a process, some kind of active entity 128 00:06:29,350 --> 00:06:31,900 at each vertex of the graph. 129 00:06:31,900 --> 00:06:33,520 So this is some kind of automaton. 130 00:06:33,520 --> 00:06:36,130 If you've taken automata theory, it's not really 131 00:06:36,130 --> 00:06:39,960 finite state machines, it's more like infinite state automata 132 00:06:39,960 --> 00:06:43,760 that can interact with each other. 133 00:06:43,760 --> 00:06:47,820 So we usually talk about vertices in a graph, processes 134 00:06:47,820 --> 00:06:49,160 at the vertices of a graph. 135 00:06:49,160 --> 00:06:51,930 But sometimes we get sloppy and just say nodes. 136 00:06:51,930 --> 00:06:54,980 And we could mean either the vertex or the active thing 137 00:06:54,980 --> 00:06:56,740 running at the vertex. 138 00:06:56,740 --> 00:06:59,840 Can't keep them straight all the time. 139 00:06:59,840 --> 00:07:02,120 OK and then with the edges of the graph, 140 00:07:02,120 --> 00:07:05,120 we would put communication channels, 141 00:07:05,120 --> 00:07:08,690 one in each direction, so that the processes 142 00:07:08,690 --> 00:07:11,700 can communicate over the edges. 143 00:07:11,700 --> 00:07:13,840 This week I'm not going to talk about what 144 00:07:13,840 --> 00:07:16,750 happens when you introduce failures because we just 145 00:07:16,750 --> 00:07:17,610 don't have time. 146 00:07:17,610 --> 00:07:20,800 A lot of distributed computing theory deals with what 147 00:07:20,800 --> 00:07:24,340 happens when some of the components in your system fail. 148 00:07:24,340 --> 00:07:27,330 How do you cope with that? 
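As a rough illustration of the network model described above, the sketch below builds gamma of u, the degree of each vertex, and one channel in each direction for every undirected edge. The function name `build_network` and the example edges are made up for this illustration and do not come from the lecture.

```python
from collections import defaultdict


def build_network(edges):
    """Return (neighbors, channels) for an undirected graph.

    neighbors[u] plays the role of gamma(u); channels holds one
    directed channel (u, v) per direction of every undirected edge.
    """
    neighbors = defaultdict(set)
    channels = set()
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
        channels.add((u, v))  # channel from u to v
        channels.add((v, u))  # channel from v to u
    return dict(neighbors), channels


# Hypothetical example: a path a - b - c plus an extra edge b - d.
edges = [("a", "b"), ("b", "c"), ("b", "d")]
neighbors, channels = build_network(edges)

n = len(neighbors)  # number of vertices, called n in this field
degree = {u: len(nbrs) for u, nbrs in neighbors.items()}
```

Here `degree["b"]` is the size of `neighbors["b"]`, and the number of channels is twice the number of edges.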
149 00:07:27,330 --> 00:07:29,880 So we'll start right in with synchronous distributed 150 00:07:29,880 --> 00:07:30,380 algorithms. 151 00:07:32,956 --> 00:07:34,580 A source for that, if you're interested 152 00:07:34,580 --> 00:07:38,180 is the first technical chapter in my book. 153 00:07:38,180 --> 00:07:41,060 OK so you have processes at the nodes of a graph, 154 00:07:41,060 --> 00:07:42,270 like I just said. 155 00:07:42,270 --> 00:07:45,830 They communicate using messages. 156 00:07:45,830 --> 00:07:49,460 So think of each process as not knowing who his neighbors are, 157 00:07:49,460 --> 00:07:51,930 not knowing anything about the graph. 158 00:07:51,930 --> 00:07:53,080 So what do they have? 159 00:07:53,080 --> 00:07:53,990 They have ports. 160 00:07:53,990 --> 00:07:57,410 You could say they have output ports, on which they could send 161 00:07:57,410 --> 00:08:01,360 a message, and then some input ports on which messages 162 00:08:01,360 --> 00:08:02,900 can come in. 163 00:08:02,900 --> 00:08:06,060 So in general, the process doesn't know 164 00:08:06,060 --> 00:08:08,660 who the ports are connected to. 165 00:08:08,660 --> 00:08:10,470 It just has local names for the ports, 166 00:08:10,470 --> 00:08:13,800 like one, two, three, up to the degree. 167 00:08:13,800 --> 00:08:17,110 If you have any questions just stop me and ask, 168 00:08:17,110 --> 00:08:19,190 if something's not clear. 169 00:08:19,190 --> 00:08:20,802 Otherwise I'll go pretty fast. 170 00:08:20,802 --> 00:08:22,510 And I know that none of this is familiar. 171 00:08:25,620 --> 00:08:27,470 So in general, the processes don't have 172 00:08:27,470 --> 00:08:31,070 to be distinguishable at all. 173 00:08:31,070 --> 00:08:35,127 So they don't have to have special unique identifiers 174 00:08:35,127 --> 00:08:36,710 so you could tell the processes apart. 175 00:08:36,710 --> 00:08:38,995 They could be completely identical. 
176 00:08:38,995 --> 00:08:40,870 Well if they have different numbers of ports, 177 00:08:40,870 --> 00:08:43,370 they're not exactly identical. 178 00:08:43,370 --> 00:08:44,870 They certainly know how many ports 179 00:08:44,870 --> 00:08:47,540 they have, and at least the local names for the ports. 180 00:08:51,320 --> 00:08:52,817 Good so these are processes sitting 181 00:08:52,817 --> 00:08:53,900 at the nodes of the graph. 182 00:08:53,900 --> 00:08:55,350 What do they do? 183 00:08:55,350 --> 00:08:56,490 So they execute. 184 00:08:56,490 --> 00:09:00,900 And we talk about an execution of this network. 185 00:09:00,900 --> 00:09:04,310 It goes in synchronous rounds, and every round, 186 00:09:04,310 --> 00:09:06,620 every process looks at its state, 187 00:09:06,620 --> 00:09:08,820 and decides what messages it's going 188 00:09:08,820 --> 00:09:12,460 to send on all of the ports. 189 00:09:12,460 --> 00:09:15,425 So it could send different messages on different ports. 190 00:09:18,110 --> 00:09:20,200 So then what happens is all the messages 191 00:09:20,200 --> 00:09:23,540 that the processes decide to send get put onto the channels 192 00:09:23,540 --> 00:09:26,810 and they get delivered to the process at the other end. 193 00:09:26,810 --> 00:09:30,520 So the process at the other end is in some state. 194 00:09:30,520 --> 00:09:32,090 All these messages come in. 195 00:09:32,090 --> 00:09:34,840 It updates its state, based on the arriving messages. 196 00:09:34,840 --> 00:09:37,750 So it changes state in response to whatever comes in. 197 00:09:42,880 --> 00:09:46,780 And this is completely different from this semester so far. 198 00:09:46,780 --> 00:09:48,880 We're going to completely ignore the costs 199 00:09:48,880 --> 00:09:51,560 of the local computation. 
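One way to picture a synchronous round is the small simulator sketched below: every process computes its outgoing messages from its current state, all messages are delivered, and then every process updates its state from what arrived. The names `run_round`, `send`, `update`, and the port table layout are assumptions chosen for this sketch, not notation from the lecture.

```python
def run_round(states, ports, send, update):
    """Execute one synchronous round and return the new states.

    states: dict node -> state
    ports:  dict node -> list of (neighbor, neighbor_input_port);
            the list index is the local output port number
    send(state, num_ports) -> list with one message per output port
    update(state, inbox)   -> new state; inbox is indexed by input port
    """
    inboxes = {u: [None] * len(ports[u]) for u in states}
    # Phase 1: every process decides its messages from its current state.
    for u, state in states.items():
        outgoing = send(state, len(ports[u]))
        for out_port, (v, in_port) in enumerate(ports[u]):
            inboxes[v][in_port] = outgoing[out_port]
    # Phase 2: every process changes state based on what arrived.
    return {u: update(states[u], inboxes[u]) for u in states}


# Hypothetical example: path 0 - 1 - 2, each process forwards the largest value seen.
ports = {0: [(1, 0)], 1: [(0, 0), (2, 0)], 2: [(1, 1)]}
states = {0: 5, 1: 1, 2: 9}
send = lambda state, k: [state] * k
update = lambda state, inbox: max([state] + inbox)

after_one = run_round(states, ports, send, update)
after_two = run_round(after_one, ports, send, update)
```

Only the number of rounds and the number of messages are counted here; the work done inside `send` and `update` is treated as free, which matches the cost model the lecture adopts.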
200 00:09:51,560 --> 00:09:54,642 So each node can compute some complicated algorithm 201 00:09:54,642 --> 00:09:56,600 of the sort you've been studying in this class, 202 00:09:56,600 --> 00:09:59,310 and we usually don't consider that cost. 203 00:09:59,310 --> 00:10:03,610 We're more worried about the communication costs. 204 00:10:03,610 --> 00:10:07,690 And so we'll be focusing on the number of rounds that it takes, 205 00:10:07,690 --> 00:10:11,170 in the synchronous case, and the number of communication 206 00:10:11,170 --> 00:10:14,710 messages or bits. 207 00:10:14,710 --> 00:10:15,460 OK so far? 208 00:10:18,650 --> 00:10:20,150 So let's start on the first problem. 209 00:10:20,150 --> 00:10:22,250 Here's a graph. 210 00:10:22,250 --> 00:10:24,460 The nodes start out possibly identical, 211 00:10:24,460 --> 00:10:27,300 but you want to somehow distinguish one of them 212 00:10:27,300 --> 00:10:30,200 to be a leader. 213 00:10:30,200 --> 00:10:34,150 So you have this arbitrary, connected, undirected graph. 214 00:10:34,150 --> 00:10:36,280 And exactly one process is supposed 215 00:10:36,280 --> 00:10:37,650 to elect itself the leader. 216 00:10:37,650 --> 00:10:40,645 That means it outputs a special leader signal. 217 00:10:43,360 --> 00:10:45,554 So exactly one should do that. 218 00:10:45,554 --> 00:10:46,720 So why do you want a leader? 219 00:10:46,720 --> 00:10:52,080 Well in practice, leaders can coordinate things. 220 00:10:52,080 --> 00:10:54,160 They can take charge of communication, 221 00:10:54,160 --> 00:10:56,110 and inform other nodes when they're 222 00:10:56,110 --> 00:10:57,380 allowed to send messages. 223 00:10:57,380 --> 00:10:59,440 They can coordinate the processing of data. 224 00:10:59,440 --> 00:11:01,610 Basically it allows you to centralize 225 00:11:01,610 --> 00:11:03,190 some of the computation. 226 00:11:03,190 --> 00:11:05,400 It can schedule the other processes. 
227 00:11:05,400 --> 00:11:07,680 It can allocate the resources. 228 00:11:07,680 --> 00:11:10,020 It could help to reach agreement among the processes, 229 00:11:10,020 --> 00:11:12,280 if they start out with different opinions about what 230 00:11:12,280 --> 00:11:13,196 is supposed to happen. 231 00:11:15,782 --> 00:11:17,990 All right so let's start out with a very simple case. 232 00:11:17,990 --> 00:11:18,740 You have a clique. 233 00:11:18,740 --> 00:11:22,500 Here's a four clique, where all the vertices are directly 234 00:11:22,500 --> 00:11:24,490 connected to all the other vertices, 235 00:11:24,490 --> 00:11:27,932 with two directional channels. 236 00:11:27,932 --> 00:11:29,265 And the processes are identical. 237 00:11:31,962 --> 00:11:33,420 So I should have asked you, instead 238 00:11:33,420 --> 00:11:36,790 of just giving the answer here, but are they 239 00:11:36,790 --> 00:11:39,550 able to elect a leader? 240 00:11:39,550 --> 00:11:42,770 So this theorem says that in general, that's impossible. 241 00:11:42,770 --> 00:11:46,880 Or it's not possible, in the most general case. 242 00:11:46,880 --> 00:11:48,730 If you have, no matter what n is, 243 00:11:48,730 --> 00:11:53,100 let's just say we have an n vertex clique for some n. 244 00:11:53,100 --> 00:11:57,040 It's not possible to have any algorithm that you 245 00:11:57,040 --> 00:12:01,760 can have all the processes run, if it's deterministic 246 00:12:01,760 --> 00:12:04,550 and the processes start out all indistinguishable. 247 00:12:04,550 --> 00:12:07,940 There's no way that they can elect 248 00:12:07,940 --> 00:12:09,430 a single node as a leader. 249 00:12:09,430 --> 00:12:12,030 So do you have an intuition for why that might be the case? 250 00:12:14,982 --> 00:12:16,434 Yeah. 
251 00:12:16,434 --> 00:12:17,934 AUDIENCE: They're all connected, and 252 00:12:17,934 --> 00:12:21,378 the cross-problem communication in one round 253 00:12:21,378 --> 00:12:23,697 is equal, then to be equal likely 254 00:12:23,697 --> 00:12:24,822 to select each one of them. 255 00:12:24,822 --> 00:12:26,300 It would be-- 256 00:12:26,300 --> 00:12:30,290 NANCY LYNCH: It's deterministic. There's no likelihood here. 257 00:12:30,290 --> 00:12:34,410 And nobody is doing any selecting. 258 00:12:34,410 --> 00:12:36,640 You're talking as if there's somebody who's choosing 259 00:12:36,640 --> 00:12:38,260 a process to do something. 260 00:12:38,260 --> 00:12:40,870 There isn't anyone in charge. 261 00:12:40,870 --> 00:12:43,700 So this is a really different way of thinking. 262 00:12:43,700 --> 00:12:46,180 AUDIENCE: So every node is essentially the exact same. 263 00:12:46,180 --> 00:12:49,370 So if it says, OK, let's assume I'm going to be leader, 264 00:12:49,370 --> 00:12:53,400 everyone is going to assume they're going to be leader. 265 00:12:53,400 --> 00:12:55,420 NANCY LYNCH: That's exactly the right intuition. 266 00:12:55,420 --> 00:12:56,810 They can't distinguish themselves 267 00:12:56,810 --> 00:12:59,960 because they're always going to do the same thing. 268 00:12:59,960 --> 00:13:01,340 Let's look at a very simple case. 269 00:13:01,340 --> 00:13:05,420 Suppose we have just two nodes, a two-node clique, two nodes 270 00:13:05,420 --> 00:13:07,970 connected by channels. 271 00:13:07,970 --> 00:13:09,470 These are identical. 272 00:13:09,470 --> 00:13:10,600 They're deterministic. 273 00:13:10,600 --> 00:13:11,900 What can they do? 274 00:13:11,900 --> 00:13:14,530 Well you could try to design algorithms for one of them 275 00:13:14,530 --> 00:13:16,680 to elect itself as the leader. 
276 00:13:16,680 --> 00:13:18,960 But you can show, by using induction, 277 00:13:18,960 --> 00:13:20,530 that the processes are actually going 278 00:13:20,530 --> 00:13:25,230 to remain in the same state as each other forever, 279 00:13:25,230 --> 00:13:28,260 however many rounds you execute. 280 00:13:28,260 --> 00:13:30,130 So let's slow down. 281 00:13:30,130 --> 00:13:32,090 We can work by contradiction. 282 00:13:32,090 --> 00:13:36,460 Suppose you have an algorithm that solves this problem. 283 00:13:36,460 --> 00:13:38,270 Both of the processes, they're identical. 284 00:13:38,270 --> 00:13:40,676 They start in the same start state. 285 00:13:40,676 --> 00:13:42,300 Let's say there's a unique start state. 286 00:13:46,350 --> 00:13:49,750 So we could prove by induction on the number of rounds 287 00:13:49,750 --> 00:13:53,250 that after any number of rounds, say r rounds, 288 00:13:53,250 --> 00:13:57,770 the processes are still in identical states. 289 00:13:57,770 --> 00:13:59,770 So the inductive step is, all right, they're 290 00:13:59,770 --> 00:14:01,960 in identical states after some number of rounds. 291 00:14:01,960 --> 00:14:03,677 Let's look at the next round. 292 00:14:03,677 --> 00:14:04,760 They're in the same state. 293 00:14:04,760 --> 00:14:08,330 So they generate the same messages. 294 00:14:08,330 --> 00:14:09,900 So they send each other the same messages. 295 00:14:09,900 --> 00:14:12,680 They receive the same message. 296 00:14:12,680 --> 00:14:15,140 And then they make the same state change. 297 00:14:15,140 --> 00:14:17,080 So they stay in the same state. 298 00:14:19,800 --> 00:14:22,260 And you can tweak this, and say how this 299 00:14:22,260 --> 00:14:24,025 works for-- yeah, question? 300 00:14:24,025 --> 00:14:28,010 AUDIENCE: So in what ways is the proof a contradiction? 301 00:14:28,010 --> 00:14:29,260 NANCY LYNCH: I'm not finished. 302 00:14:29,260 --> 00:14:30,620 You're exactly right. 
303 00:14:30,620 --> 00:14:33,850 We have to finish by using the requirements of the problem. 304 00:14:33,850 --> 00:14:38,410 Since the algorithm has to solve the leader election problem, 305 00:14:38,410 --> 00:14:41,170 the requirements say that eventually, one of them 306 00:14:41,170 --> 00:14:45,460 has to output leader. 307 00:14:45,460 --> 00:14:46,820 And what happens when he does? 308 00:14:50,940 --> 00:14:51,440 Anyone? 309 00:14:51,440 --> 00:14:51,730 Yeah. 310 00:14:51,730 --> 00:14:54,240 AUDIENCE: You have the other node also outputting the leader signal. 311 00:14:54,240 --> 00:14:56,790 NANCY LYNCH: Yeah, the other one would also do the same thing. 312 00:14:56,790 --> 00:14:59,820 We're saying round by round, they stay in the same state. 313 00:14:59,820 --> 00:15:04,360 So as someone said before, when one guy outputs leader, 314 00:15:04,360 --> 00:15:08,210 at the same round the other guy will output leader as well. 315 00:15:08,210 --> 00:15:10,545 So that's a contradiction to the problem requirements. 316 00:15:10,545 --> 00:15:12,170 Notice we didn't assume anything at all 317 00:15:12,170 --> 00:15:15,040 about exactly how the algorithm works. 318 00:15:15,040 --> 00:15:17,990 We're just saying, however it works, it can't solve this, 319 00:15:17,990 --> 00:15:20,110 under the assumptions that the nodes 320 00:15:20,110 --> 00:15:21,830 are indistinguishable and deterministic. 321 00:15:24,780 --> 00:15:26,710 So as you can see, this will extend if you 322 00:15:26,710 --> 00:15:30,680 have larger cliques of size n. 323 00:15:30,680 --> 00:15:33,710 So now the process has not just one output port, 324 00:15:33,710 --> 00:15:38,080 it has n minus 1 output ports to connect to all the other nodes. 325 00:15:38,080 --> 00:15:41,370 Let's say they're numbered 1 through n minus 1. 
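The two-node argument above can be watched in a small simulation. This is only an illustration of the induction, not a proof, and the particular `send` and `update` functions are arbitrary made-up choices: any deterministic pair gives the same outcome, because both processes start in the same state and run the same code.

```python
def run_two_node_clique(start_state, send, update, rounds):
    """Run two identical deterministic processes joined by a pair of channels."""
    a = b = start_state  # same unique start state for both processes
    history = []
    for _ in range(rounds):
        msg_a, msg_b = send(a), send(b)            # same state, so same message
        a, b = update(a, msg_b), update(b, msg_a)  # same input, so same new state
        history.append((a, b))
    return history


# An arbitrary deterministic algorithm, invented for this sketch.
send = lambda state: 3 * state + 1
update = lambda state, msg: (state + 7 * msg) % 101

history = run_two_node_clique(start_state=0, send=send, update=update, rounds=20)
still_symmetric = all(a == b for a, b in history)
```

Because the two states agree after every round, any round in which one process would output leader is a round in which the other does too.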
326 00:15:41,370 --> 00:15:45,240 And one of the possibilities, and one I'll use in this proof 327 00:15:45,240 --> 00:15:47,980 is that the ports happen to be numbered consistently. 328 00:15:47,980 --> 00:15:52,470 So that if you have output port number k at one node, 329 00:15:52,470 --> 00:15:57,320 it's connected to input port number k at the other end. 330 00:15:57,320 --> 00:16:00,207 So that's one way things can match up. 331 00:16:00,207 --> 00:16:01,790 All right if that's the case, we could 332 00:16:01,790 --> 00:16:03,230 do the same proof we just did. 333 00:16:03,230 --> 00:16:06,560 Show by induction that all the processes in the clique 334 00:16:06,560 --> 00:16:09,580 remain in the same state forever. 335 00:16:09,580 --> 00:16:10,652 So same proof. 336 00:16:10,652 --> 00:16:12,610 Suppose you have an algorithm that solves it. 337 00:16:12,610 --> 00:16:14,620 They all begin in the same state. 338 00:16:14,620 --> 00:16:17,080 You show by induction that they all remain in the same state. 339 00:16:19,690 --> 00:16:21,640 Well so now we slow down a little bit. 340 00:16:21,640 --> 00:16:25,920 Each process sends a possibly different message on each port. 341 00:16:25,920 --> 00:16:28,540 But everybody sends the same message on port k 342 00:16:28,540 --> 00:16:30,980 because they're all indistinguishable. 343 00:16:30,980 --> 00:16:33,080 And then because of the way the ports match up, 344 00:16:33,080 --> 00:16:36,370 everybody receives the same message on port k. 345 00:16:36,370 --> 00:16:38,120 And then they make the same state changes. 346 00:16:41,030 --> 00:16:43,456 AUDIENCE: Does this proof imply that there's 347 00:16:43,456 --> 00:16:46,442 a kernel for simplifying the graph when you find a clique? 
348 00:16:50,240 --> 00:16:53,250 NANCY LYNCH: No because if you have a graph that 349 00:16:53,250 --> 00:16:55,060 consists of a clique and then let's say, 350 00:16:55,060 --> 00:16:57,330 some other stuff, maybe the leader 351 00:16:57,330 --> 00:16:59,770 could be somebody outside the clique. 352 00:16:59,770 --> 00:17:01,920 So you can't just say because there's 353 00:17:01,920 --> 00:17:04,619 a clique that you can't elect a leader because you could 354 00:17:04,619 --> 00:17:08,091 break the symmetry of the graph with other stuff in the graph. 355 00:17:08,091 --> 00:17:09,055 Yeah? 356 00:17:09,055 --> 00:17:11,947 AUDIENCE: What assumptions do we make to know that for each k, 357 00:17:11,947 --> 00:17:14,035 they receive the same message? 358 00:17:14,035 --> 00:17:15,410 NANCY LYNCH: Because everybody is 359 00:17:15,410 --> 00:17:18,109 going to send the same message on the same numbered port, 360 00:17:18,109 --> 00:17:19,192 because they're identical. 361 00:17:22,079 --> 00:17:23,933 And one way the ports can be hooked up, 362 00:17:23,933 --> 00:17:26,349 and we have to tolerate all ways they could be hooked up-- 363 00:17:26,349 --> 00:17:28,800 say an adversary hooks them up-- is 364 00:17:28,800 --> 00:17:32,430 that port k, somebody's output port, 365 00:17:32,430 --> 00:17:36,890 is the other end's input port numbered k. 366 00:17:36,890 --> 00:17:38,710 So then they all receive the same message 367 00:17:38,710 --> 00:17:41,374 on their port number k. 368 00:17:41,374 --> 00:17:41,874 Yeah? 369 00:17:41,874 --> 00:17:43,317 AUDIENCE: Is it actually possible to always hook up 370 00:17:43,317 --> 00:17:44,150 the ports that way? 371 00:17:44,150 --> 00:17:48,480 I mean, it's like wrapped with three vertices. 372 00:17:48,480 --> 00:17:51,310 NANCY LYNCH: Well I'm just doing it for cliques. 373 00:17:51,310 --> 00:17:53,390 Yeah it is. 374 00:17:53,390 --> 00:17:54,610 Yeah you could do it. 
375 00:17:54,610 --> 00:17:57,470 I mean you could have port one always going clockwise, 376 00:17:57,470 --> 00:18:00,780 and port two going counterclockwise, 377 00:18:00,780 --> 00:18:03,560 I mean, there's always a way to do that in a clique. 378 00:18:03,560 --> 00:18:06,330 I checked that. 379 00:18:06,330 --> 00:18:09,240 So what you've just seen is one of the very basic problems 380 00:18:09,240 --> 00:18:11,780 for distributed algorithms, which is breaking symmetry 381 00:18:11,780 --> 00:18:13,680 among identical processes. 382 00:18:13,680 --> 00:18:17,850 And you see that deterministic, indistinguishable processes 383 00:18:17,850 --> 00:18:19,140 just can't do it. 384 00:18:19,140 --> 00:18:21,610 So we have to have something more. 385 00:18:21,610 --> 00:18:23,100 So what do you think we could add 386 00:18:23,100 --> 00:18:24,545 to make this problem solvable? 387 00:18:27,260 --> 00:18:28,680 AUDIENCE: [INAUDIBLE] processes. 388 00:18:28,680 --> 00:18:30,174 NANCY LYNCH: I can't hear. 389 00:18:30,174 --> 00:18:31,090 AUDIENCE: Probability. 390 00:18:31,090 --> 00:18:33,135 NANCY LYNCH: Probability, OK, anything else? 391 00:18:36,210 --> 00:18:39,320 So we could have the processes actually distinguishable. 392 00:18:39,320 --> 00:18:42,710 The common way in this area is to say that each process has 393 00:18:42,710 --> 00:18:43,720 an identifier. 394 00:18:43,720 --> 00:18:47,690 Like, you buy a chip and it's got some identifier burned in. 395 00:18:47,690 --> 00:18:50,160 OK so you have some kind of unique identifiers. 396 00:18:50,160 --> 00:18:53,520 Or you can use randomness. 397 00:18:53,520 --> 00:18:57,230 OK for unique identifiers, you assume 398 00:18:57,230 --> 00:19:00,430 everybody has some number or some identifier 399 00:19:00,430 --> 00:19:01,890 that it knows what it is. 400 00:19:01,890 --> 00:19:07,050 It's built into its state, let's say, a special state variable. 401 00:19:07,050 --> 00:19:08,740 They're totally ordered, generally. 
402 00:19:08,740 --> 00:19:15,170 They could be integers, or from some totally ordered set. 403 00:19:15,170 --> 00:19:16,760 When you say unique identifiers, 404 00:19:16,760 --> 00:19:20,810 it means that different identifiers could 405 00:19:20,810 --> 00:19:23,360 appear any place in the graph. 406 00:19:23,360 --> 00:19:27,430 But each identifier can appear at most once. 407 00:19:27,430 --> 00:19:29,870 You can have a huge identifier space in a small graph. 408 00:19:29,870 --> 00:19:32,700 But you're just selecting some identifiers 409 00:19:32,700 --> 00:19:36,790 to put in the processes in the graph. 410 00:19:36,790 --> 00:19:37,720 So that's one set up. 411 00:19:37,720 --> 00:19:41,880 And the other one, of course, is using randomness. 412 00:19:41,880 --> 00:19:44,930 So let's look at the unique identifiers first. 413 00:19:44,930 --> 00:19:46,270 Now the problem becomes easy. 414 00:19:46,270 --> 00:19:48,330 Let's look at the clique again. 415 00:19:48,330 --> 00:19:51,970 Suppose there's an algorithm-- well, let's 416 00:19:51,970 --> 00:19:53,920 construct an algorithm that consists 417 00:19:53,920 --> 00:19:58,760 of deterministic processes with unique identifiers. 418 00:19:58,760 --> 00:20:02,250 And we're going to guarantee to elect a leader in the graph. 419 00:20:02,250 --> 00:20:03,990 And moreover, it's just going to take 420 00:20:03,990 --> 00:20:06,180 one round of communication. 421 00:20:06,180 --> 00:20:10,210 And it's only going to use n squared messages. 422 00:20:10,210 --> 00:20:11,190 How could that work? 423 00:20:17,160 --> 00:20:20,340 Everybody in this clique has a unique identifier. 424 00:20:20,340 --> 00:20:22,540 What would they do? 425 00:20:22,540 --> 00:20:23,400 Send it out, right? 426 00:20:23,400 --> 00:20:25,860 So you can just send it on all your ports. 427 00:20:25,860 --> 00:20:28,250 Everybody would send its unique identifier on all 428 00:20:28,250 --> 00:20:29,740 its output ports. 
429 00:20:29,740 --> 00:20:33,540 And then they collect the unique identifiers from everyone else. 430 00:20:33,540 --> 00:20:37,360 So everybody sees the same set of identifiers. 431 00:20:37,360 --> 00:20:40,870 And so the process with the maximum unique identifier 432 00:20:40,870 --> 00:20:43,409 knows that it's the only one with that identifier. 433 00:20:43,409 --> 00:20:44,450 And it's the biggest one. 434 00:20:44,450 --> 00:20:46,120 So it can elect itself the leader. 435 00:20:49,250 --> 00:20:51,790 So all you need is unique identifiers and the ability 436 00:20:51,790 --> 00:20:54,070 to exchange them reliably. 437 00:20:54,070 --> 00:20:55,930 And you can elect somebody easily. 438 00:20:58,810 --> 00:21:03,050 Randomness, well, various ways to do it. 439 00:21:03,050 --> 00:21:07,270 But one idea is the processes could just choose identifiers 440 00:21:07,270 --> 00:21:08,700 randomly. 441 00:21:08,700 --> 00:21:13,420 You take a sufficiently large set of possible identifiers, 442 00:21:13,420 --> 00:21:16,540 and so if they just choose uniformly at random, 443 00:21:16,540 --> 00:21:19,640 they're likely to choose all different identifiers. 444 00:21:19,640 --> 00:21:22,590 Once you have these randomly chosen identifiers 445 00:21:22,590 --> 00:21:26,770 you could use them like the really unique identifiers. 446 00:21:26,770 --> 00:21:29,700 The only thing is, there's a small chance 447 00:21:29,700 --> 00:21:31,170 that you'll have a duplicate. 448 00:21:31,170 --> 00:21:34,520 In which case, you want to be able to detect that and repeat 449 00:21:34,520 --> 00:21:36,100 this. 450 00:21:36,100 --> 00:21:40,112 So first of all, how big a set do you need? 451 00:21:40,112 --> 00:21:41,070 Well here's an example. 452 00:21:43,950 --> 00:21:46,410 Suppose that you have the n processes choosing 453 00:21:46,410 --> 00:21:51,390 at random, independently from a space of size r.
454 00:21:51,390 --> 00:21:57,030 Identifiers are the numbers one through r. 455 00:21:57,030 --> 00:22:01,290 OK and r is going to depend on n. 456 00:22:01,290 --> 00:22:03,230 It's going to be like n squared, but it's also 457 00:22:03,230 --> 00:22:06,230 going to depend on epsilon, which is the error probability 458 00:22:06,230 --> 00:22:08,270 that you're interested in. 459 00:22:08,270 --> 00:22:11,710 Turns out that n squared over 2 epsilon is good enough. 460 00:22:11,710 --> 00:22:15,300 OK so you have your ID space at least that large. 461 00:22:15,300 --> 00:22:18,820 And then you can guarantee that with probability at least 1 462 00:22:18,820 --> 00:22:22,130 minus epsilon, all the numbers that everybody chooses 463 00:22:22,130 --> 00:22:24,342 are different. 464 00:22:24,342 --> 00:22:25,300 It's a very easy proof. 465 00:22:25,300 --> 00:22:27,980 The probability-- just look at two particular processes-- 466 00:22:27,980 --> 00:22:31,050 what's the probability that they choose the same number? 467 00:22:31,050 --> 00:22:32,594 It's just 1 over r, right. 468 00:22:32,594 --> 00:22:34,260 Because they're both choosing at random. 469 00:22:34,260 --> 00:22:35,690 The first one chooses something. 470 00:22:35,690 --> 00:22:37,470 The probability that the second one 471 00:22:37,470 --> 00:22:41,080 chooses the same thing is just 1 over r. 472 00:22:41,080 --> 00:22:42,600 But now you can take a union bound, 473 00:22:42,600 --> 00:22:49,020 just add up the probabilities of any pair having a duplicate. 474 00:22:49,020 --> 00:22:52,520 And so you have around n squared over 2 pairs. 475 00:22:52,520 --> 00:22:57,500 And so multiplying 1 over r by n squared over 2 476 00:22:57,500 --> 00:23:00,590 still keeps your probability less than or equal to epsilon, 477 00:23:00,590 --> 00:23:02,820 your error probability. 478 00:23:02,820 --> 00:23:08,740 So you can choose identifiers using randomness.
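The union-bound argument just described can be written out as a short calculation. This is a sketch of the spoken proof, with $n$ processes choosing independently and uniformly from $\{1, \dots, r\}$ and $\epsilon$ the target error probability; the symbols $X_i$ for the chosen values are notation introduced here, not taken from the lecture slides.

```latex
\Pr[\exists\, i \neq j : X_i = X_j]
  \;\le\; \sum_{\{i,j\}} \Pr[X_i = X_j]
  \;=\; \binom{n}{2} \cdot \frac{1}{r}
  \;\le\; \frac{n^2}{2r}
  \;\le\; \epsilon
  \qquad \text{whenever } r \ge \frac{n^2}{2\epsilon}.
```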
479 00:23:08,740 --> 00:23:11,640 With large enough space, with very high probability, 480 00:23:11,640 --> 00:23:15,910 you can get them to be all different. 481 00:23:15,910 --> 00:23:17,795 And now here's how the algorithm works. 482 00:23:20,460 --> 00:23:24,630 So you get an algorithm that would finish in only one round, 483 00:23:24,630 --> 00:23:26,980 with probability 1 minus epsilon. 484 00:23:26,980 --> 00:23:28,300 But it will be correct. 485 00:23:28,300 --> 00:23:30,640 And it will have repeated rounds, 486 00:23:30,640 --> 00:23:32,840 in case the first round doesn't work. 487 00:23:32,840 --> 00:23:35,900 But the expected time is just 1 over 1 488 00:23:35,900 --> 00:23:39,130 minus epsilon, not very big. 489 00:23:39,130 --> 00:23:40,380 What's the algorithm? 490 00:23:40,380 --> 00:23:43,880 Well processes just choose the random IDs from the big space, 491 00:23:43,880 --> 00:23:45,200 like we just said. 492 00:23:45,200 --> 00:23:47,770 They exchange their IDs. 493 00:23:47,770 --> 00:23:50,030 And now, everybody can see everyone's ID, 494 00:23:50,030 --> 00:23:52,750 but they also can tell if there's a duplicate, 495 00:23:52,750 --> 00:23:55,030 if the maximum is not unique. 496 00:23:55,030 --> 00:23:57,680 So if the maximum is unique, fine, the maximum wins. 497 00:23:57,680 --> 00:23:59,500 And everyone knows that. 498 00:23:59,500 --> 00:24:01,190 Otherwise you have a problem. 499 00:24:01,190 --> 00:24:02,190 And you have to repeat. 500 00:24:02,190 --> 00:24:06,200 And you just keep doing that until you succeed. 501 00:24:06,200 --> 00:24:08,650 So this can just continue, but it's 502 00:24:08,650 --> 00:24:11,860 likely to finish very fast, if you have a high likelihood 503 00:24:11,860 --> 00:24:13,560 of having no duplicates. 504 00:24:17,310 --> 00:24:20,440 Questions about the leader election?
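The randomized leader election described above can be sketched as a small centralized simulation. This is only an illustration under stated assumptions: the function name `elect_leader`, the use of a list to stand in for the clique's all-to-all exchange, and the explicit `rng` argument are all choices made here, not part of the lecture.

```python
import math
import random


def elect_leader(n, epsilon, rng):
    """Simulate randomized leader election among n processes in a clique.

    Returns (leader_index, rounds_used). The list `ids` stands in for the
    exchange in which every process sees every other process's chosen ID.
    """
    # ID space {1, ..., r} with r >= n^2 / (2 * epsilon), as in the lecture.
    r = math.ceil(n * n / (2 * epsilon))
    rounds = 0
    while True:
        rounds += 1
        # Each process independently picks an ID uniformly at random.
        ids = [rng.randint(1, r) for _ in range(n)]
        top = max(ids)
        # If the maximum is unique, its owner wins and everyone knows it;
        # otherwise all processes detect the tie and repeat.
        if ids.count(top) == 1:
            return ids.index(top), rounds


if __name__ == "__main__":
    leader, rounds = elect_leader(10, 0.01, random.Random())
    print(f"process {leader} elected after {rounds} round(s)")
```

The loop repeats only when the maximum is tied, which matches the spoken rule that a round succeeds whenever the maximum is unique.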
505 00:24:20,440 --> 00:24:23,910 So the story was, it's impossible without something 506 00:24:23,910 --> 00:24:27,640 to help you distinguish some processes. 507 00:24:27,640 --> 00:24:29,286 You can do it with unique identifiers. 508 00:24:29,286 --> 00:24:30,410 You can do it with randomness. 509 00:24:36,680 --> 00:24:42,240 Second problem is called maximal independent set. 510 00:24:42,240 --> 00:24:44,820 So you have a picture of a maximal independent set 511 00:24:44,820 --> 00:24:47,020 in a graph here. 512 00:24:47,020 --> 00:24:49,790 Let's try this. 513 00:24:49,790 --> 00:24:51,120 Yeah cursor. 514 00:24:51,120 --> 00:24:53,670 So the maximal independent set in the graph is here. 515 00:24:53,670 --> 00:24:57,300 But this is something I'll come back to in a minute. 516 00:24:57,300 --> 00:25:00,010 This is actually a use of the maximal independent set 517 00:25:00,010 --> 00:25:02,600 to model what happens in a certain kind 518 00:25:02,600 --> 00:25:06,140 of biological system. 519 00:25:06,140 --> 00:25:07,660 What's a maximal independent set? 520 00:25:07,660 --> 00:25:13,750 So you start with a general, undirected graph network. 521 00:25:13,750 --> 00:25:18,280 And the problem is to choose a subset of the nodes so that 522 00:25:18,280 --> 00:25:21,000 they form what we call a maximal independent set. 523 00:25:21,000 --> 00:25:22,180 Let's break that down. 524 00:25:22,180 --> 00:25:23,430 What does this mean? 525 00:25:23,430 --> 00:25:26,810 Independent means you don't have any two neighbors that 526 00:25:26,810 --> 00:25:30,310 are both in the set. 527 00:25:30,310 --> 00:25:32,960 So you don't want to get two neighbors in the set. 528 00:25:32,960 --> 00:25:37,510 Maximal means that whatever set you choose, 529 00:25:37,510 --> 00:25:42,480 you can't add any more nodes without violating independence.
530 00:25:42,480 --> 00:25:44,010 So now this should look something 531 00:25:44,010 --> 00:25:45,860 like a couple of homework problems 532 00:25:45,860 --> 00:25:48,800 that you had from the beginning and recently. 533 00:25:48,800 --> 00:25:52,180 But I'm not saying that it's maximum independent set. 534 00:25:52,180 --> 00:25:54,420 I'm not saying you have to have the global, largest 535 00:25:54,420 --> 00:25:55,970 number of nodes. 536 00:25:55,970 --> 00:25:58,960 I'm just saying it has to be a local optimum, 537 00:25:58,960 --> 00:26:01,850 in the sense that you can't add any more nodes to your set 538 00:26:01,850 --> 00:26:05,180 without violating the independence property. 539 00:26:05,180 --> 00:26:06,820 Make sense? 540 00:26:06,820 --> 00:26:09,560 There's two examples, the same graph, 541 00:26:09,560 --> 00:26:12,910 two different maximal independent sets. 542 00:26:12,910 --> 00:26:18,350 The green nodes, here we have four green nodes 543 00:26:18,350 --> 00:26:22,135 that are independent, not neighbors of each other. 544 00:26:22,135 --> 00:26:23,760 And they're maximal, in that I couldn't 545 00:26:23,760 --> 00:26:26,540 add any of the red nodes into a set 546 00:26:26,540 --> 00:26:31,150 without violating the independence property. 547 00:26:31,150 --> 00:26:34,080 But then over here, we have a second maximal independent set 548 00:26:34,080 --> 00:26:35,850 for the same graph. 549 00:26:35,850 --> 00:26:39,160 Now we just have two nodes. 550 00:26:39,160 --> 00:26:41,760 And you can't add any of the red nodes 551 00:26:41,760 --> 00:26:44,960 without violating the independence property. 552 00:26:44,960 --> 00:26:48,550 In other words, every node is either in the MIS, 553 00:26:48,550 --> 00:26:51,810 or has a neighbor in the MIS. 554 00:26:51,810 --> 00:26:56,620 There's nothing else you can do to add nodes to the MIS. 555 00:26:56,620 --> 00:27:00,175 So the notion of maximal independence, does that make sense?
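The two conditions (independent, and every node either in the set or adjacent to it) can be checked mechanically. The following is a minimal sketch; the adjacency-dictionary representation, the function names, and the star-graph example are assumptions chosen here for illustration and are not the graph from the lecture slides.

```python
def is_independent(adj, chosen):
    """True if no two nodes in `chosen` are neighbors in the graph `adj`."""
    return all(v not in chosen for u in chosen for v in adj[u])


def is_maximal_independent_set(adj, chosen):
    """True if `chosen` is independent and no further node can be added.

    Maximality is tested via the equivalent condition from the lecture:
    every node is either in the set or has a neighbor in the set.
    """
    if not is_independent(adj, chosen):
        return False
    return all(u in chosen or any(v in chosen for v in adj[u]) for u in adj)


if __name__ == "__main__":
    # Hypothetical example: a star with center 0 and leaves 1..4.
    star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
    # Two maximal independent sets of different sizes on the same graph.
    print(is_maximal_independent_set(star, {0}))           # center alone
    print(is_maximal_independent_set(star, {1, 2, 3, 4}))  # all the leaves
    # Independent but not maximal: leaf 3 could still be added.
    print(is_maximal_independent_set(star, {1, 2, 4}))
```

The star shows the gap between maximal and maximum: the center alone is maximal with one node, while the leaves form a maximal set with four.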
556 00:27:04,120 --> 00:27:08,430 All right, so to make this a distributed problem, 557 00:27:08,430 --> 00:27:11,490 let's start out assuming we have no unique identifiers. 558 00:27:11,490 --> 00:27:12,869 Actually, for this whole problem, 559 00:27:12,869 --> 00:27:14,660 we're not going to have unique identifiers. 560 00:27:14,660 --> 00:27:17,580 They're all going to be identical. 561 00:27:17,580 --> 00:27:19,990 The processes do need one piece of information, 562 00:27:19,990 --> 00:27:24,010 which is some approximation to n, the size of the network, 563 00:27:24,010 --> 00:27:27,160 the total number of vertices. 564 00:27:27,160 --> 00:27:29,990 So we would like to have these nodes somehow 565 00:27:29,990 --> 00:27:35,860 cooperate to compute an MIS of the entire network graph. 566 00:27:35,860 --> 00:27:39,570 What that means is every process should find out whether it 567 00:27:39,570 --> 00:27:41,380 is in the MIS or not. 568 00:27:41,380 --> 00:27:43,780 If it is, it should output in. 569 00:27:43,780 --> 00:27:46,060 And if it's not, it'll just output out. 570 00:27:49,110 --> 00:27:51,570 So you don't have to actually compute this, 571 00:27:51,570 --> 00:27:53,150 like you're used to solving problems 572 00:27:53,150 --> 00:27:55,950 like this, where somebody has to gather all 573 00:27:55,950 --> 00:27:57,990 the information in one place. 574 00:27:57,990 --> 00:27:59,280 Nobody gathers anything. 575 00:27:59,280 --> 00:28:01,360 Everybody just has to know whether or not 576 00:28:01,360 --> 00:28:02,228 they're in the MIS. 577 00:28:05,880 --> 00:28:07,760 So as you can imagine, this is going 578 00:28:07,760 --> 00:28:10,000 to be unsolvable in certain graphs 579 00:28:10,000 --> 00:28:14,870 by deterministic algorithms, by the same kind of symmetry 580 00:28:14,870 --> 00:28:19,810 breaking problems that you saw for leader election.
581 00:28:19,810 --> 00:28:22,320 So we're going to move right to randomized algorithms 582 00:28:22,320 --> 00:28:25,400 for this problem. 583 00:28:25,400 --> 00:28:28,180 Some applications of distributed MIS, 584 00:28:28,180 --> 00:28:30,230 well they come up in communication networks, 585 00:28:30,230 --> 00:28:32,860 where you want to choose-- let's say you have a very 586 00:28:32,860 --> 00:28:35,040 dense network of processes. 587 00:28:35,040 --> 00:28:37,830 You want to choose just a few nodes, which would 588 00:28:37,830 --> 00:28:39,770 be like an overlay network. 589 00:28:39,770 --> 00:28:41,850 You would choose some nodes who could take charge 590 00:28:41,850 --> 00:28:44,710 of communication so that you can communicate on this overlay 591 00:28:44,710 --> 00:28:46,910 network, and then in the end, each node 592 00:28:46,910 --> 00:28:50,960 can take care of communicating with its many neighbors. 593 00:28:50,960 --> 00:28:54,250 So that's a common sort of application. 594 00:28:54,250 --> 00:28:56,670 But it also comes up in other places. 595 00:28:56,670 --> 00:28:59,300 A great example is in developmental biology, where 596 00:28:59,300 --> 00:29:04,170 a couple of years ago, there was a paper in Science by Afek, 597 00:29:04,170 --> 00:29:05,980 Alon-- there's like eight authors on that. 598 00:29:05,980 --> 00:29:11,380 But Ziv Bar-Joseph was the lead author of this paper. 599 00:29:11,380 --> 00:29:15,730 So the idea is you have a bunch of cells in a fruit fly. 600 00:29:15,730 --> 00:29:18,830 And during development, some of those cells 601 00:29:18,830 --> 00:29:21,300 are supposed to distinguish themselves 602 00:29:21,300 --> 00:29:24,730 as being what's called sensory organ precursor cells. 603 00:29:24,730 --> 00:29:28,880 The property that you want is that, actually, you 604 00:29:28,880 --> 00:29:31,940 would like a maximal independent set of the cells to become 605 00:29:31,940 --> 00:29:34,370 distinguished in this way.
606 00:29:34,370 --> 00:29:36,800 So they wrote a paper about it, got published in Science. 607 00:29:36,800 --> 00:29:39,790 They basically designed a new distributed algorithm 608 00:29:39,790 --> 00:29:43,709 that closely mirrored what happened in the fruit fly 609 00:29:43,709 --> 00:29:44,500 during development. 610 00:29:48,420 --> 00:29:52,240 So what I'm going to show you is a very well-known algorithm, 611 00:29:52,240 --> 00:29:55,780 a classical algorithm for MIS. 612 00:29:55,780 --> 00:29:58,690 This is by Michael Luby. 613 00:29:58,690 --> 00:30:02,070 Very simple algorithm, it executes in phases. 614 00:30:02,070 --> 00:30:05,430 Each phase has two rounds. 615 00:30:05,430 --> 00:30:07,690 So you start out with all the nodes being active. 616 00:30:07,690 --> 00:30:08,810 They're all involved. 617 00:30:08,810 --> 00:30:12,060 They don't know what they're going to end up with. 618 00:30:12,060 --> 00:30:15,410 And at each phase, some of the active nodes 619 00:30:15,410 --> 00:30:18,580 are going to decide they're in the MIS. 620 00:30:18,580 --> 00:30:21,970 Some others will decide they're out of the MIS. 621 00:30:21,970 --> 00:30:24,400 And some others won't know yet. 622 00:30:24,400 --> 00:30:27,230 So then you just continue to the next phase, 623 00:30:27,230 --> 00:30:30,880 with all the remaining nodes and the edges between them. 624 00:30:30,880 --> 00:30:32,670 So you're basically going to settle 625 00:30:32,670 --> 00:30:35,150 what happens with some subset of the nodes, 626 00:30:35,150 --> 00:30:36,945 and then reduce the graph and continue. 627 00:30:39,870 --> 00:30:40,870 So that's the algorithm. 628 00:30:40,870 --> 00:30:43,000 So what do you do in each phase? 629 00:30:43,000 --> 00:30:46,115 Here's what an active node does at a phase. 630 00:30:46,115 --> 00:30:47,810 Two rounds.
631 00:30:47,810 --> 00:30:50,930 The first round, it picks a random value 632 00:30:50,930 --> 00:30:54,640 in a large space, the same kind of idea as before. 633 00:30:54,640 --> 00:30:57,680 This time it's 1 up to n to the fifth. 634 00:30:57,680 --> 00:31:01,790 It sends that random value to all its neighbors, 635 00:31:01,790 --> 00:31:06,360 receives the values from all its still active neighbors, 636 00:31:06,360 --> 00:31:11,310 and then it just looks to see if its value is greater than all 637 00:31:11,310 --> 00:31:13,190 the values it received. 638 00:31:13,190 --> 00:31:14,450 So then it's a local maximum. 639 00:31:14,450 --> 00:31:16,830 It has chosen a value that's strictly greater 640 00:31:16,830 --> 00:31:19,640 than the values chosen by all its neighbors. 641 00:31:19,640 --> 00:31:24,040 So then it decides to join the MIS and it outputs in. 642 00:31:24,040 --> 00:31:26,372 But now you want to make sure none of its neighbors-- 643 00:31:26,372 --> 00:31:27,830 you know that none of its neighbors 644 00:31:27,830 --> 00:31:31,200 are going to join the MIS at round one. 645 00:31:31,200 --> 00:31:34,040 Because you know this guy's chosen value 646 00:31:34,040 --> 00:31:36,930 is larger, strictly larger, than all its neighbors. 647 00:31:36,930 --> 00:31:39,450 But now you want to tell them that they should not join. 648 00:31:39,450 --> 00:31:40,650 They should be out. 649 00:31:40,650 --> 00:31:49,080 So if you join the MIS you're going 650 00:31:49,080 --> 00:31:54,740 to announce that by sending messages to all your neighbors. 651 00:31:54,740 --> 00:32:02,510 And then anybody who receives an announcement can 652 00:32:02,510 --> 00:32:05,470 decide it's not going to be in the MIS and it outputs out. 653 00:32:05,470 --> 00:32:10,260 Because it knows it has a neighbor that's in the MIS. 654 00:32:10,260 --> 00:32:15,050 So if you decided in or out at this phase, you're done. 655 00:32:15,050 --> 00:32:16,420 You become inactive.
656 00:32:16,420 --> 00:32:18,190 And only the remaining active guys 657 00:32:18,190 --> 00:32:20,570 continue to the next phase. 658 00:32:20,570 --> 00:32:21,200 Make sense? 659 00:32:24,240 --> 00:32:26,220 Any questions about how the algorithm works? 660 00:32:32,480 --> 00:32:34,020 An animation. 661 00:32:34,020 --> 00:32:37,770 All right so all the nodes start out identical. 662 00:32:37,770 --> 00:32:39,347 They all pick IDs. 663 00:32:39,347 --> 00:32:40,930 So here's some numbers that they pick. 664 00:32:40,930 --> 00:32:45,410 So which nodes are going to now join the MIS? 665 00:32:45,410 --> 00:32:50,480 16, and the one that chose 13. 666 00:32:50,480 --> 00:32:52,230 Good, so they're in the MIS. 667 00:32:52,230 --> 00:32:55,070 And then at the same phase, all of their neighbors, 668 00:32:55,070 --> 00:33:02,750 those four red nodes, are going to decide to be out of the MIS. 669 00:33:02,750 --> 00:33:04,840 And now you're left with the remaining four nodes. 670 00:33:04,840 --> 00:33:07,290 We don't keep going with the same IDs. 671 00:33:07,290 --> 00:33:08,140 We start over. 672 00:33:08,140 --> 00:33:10,840 We want the rounds to be independent. 673 00:33:10,840 --> 00:33:14,610 So if they choose again, they get new IDs. 674 00:33:14,610 --> 00:33:19,200 And now the guy with the 12 and the guy with the 18 675 00:33:19,200 --> 00:33:22,180 are going to join the MIS at this phase. 676 00:33:22,180 --> 00:33:27,280 And their neighbors will decide not to be in the MIS. 677 00:33:27,280 --> 00:33:30,800 That leaves us with just one node, the guy who had four. 678 00:33:30,800 --> 00:33:33,240 Next phase, he chooses another ID. 679 00:33:33,240 --> 00:33:36,652 But he has no neighbors so by default, 680 00:33:36,652 --> 00:33:38,110 he's bigger than all the neighbors. 681 00:33:38,110 --> 00:33:39,340 So he just joins the MIS. 682 00:33:42,719 --> 00:33:43,760 So that's how this works.
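The phase structure just walked through can be sketched as a centralized simulation of the algorithm. This is an illustration under stated assumptions, not a distributed implementation: the function name `luby_mis`, the adjacency-dictionary graph, the 8-node ring example, and the use of the exact node count n (the lecture only requires an approximation to n) are all choices made here.

```python
import random


def luby_mis(adj, rng):
    """Simulate Luby's MIS algorithm on an undirected graph.

    `adj` maps each node to the set of its neighbors.
    Returns (set of nodes that output "in", number of phases used).
    """
    n = len(adj)
    active = set(adj)
    in_mis = set()
    phases = 0
    while active:
        phases += 1
        # Round 1: every active node draws a fresh value in 1..n^5 and
        # exchanges it with its still-active neighbors.
        value = {u: rng.randint(1, n ** 5) for u in active}
        # A node joins if its value is strictly greater than every value
        # it received (trivially true if it has no active neighbors).
        joiners = {
            u for u in active
            if all(value[u] > value[v] for v in adj[u] if v in active)
        }
        # Round 2: joiners announce themselves; any active neighbor that
        # hears an announcement outputs "out".
        dropped = {v for u in joiners for v in adj[u] if v in active}
        in_mis |= joiners
        # Decided nodes become inactive; the rest continue to the next phase.
        active -= joiners | dropped
    return in_mis, phases


if __name__ == "__main__":
    # Hypothetical example graph: a ring of 8 nodes.
    ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
    mis, phases = luby_mis(ring, random.Random())
    print(sorted(mis), "in", phases, "phase(s)")
```

Because the comparison is strict, two neighbors that happen to draw the same value simply both stay active and try again in the next phase with fresh values.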
683 00:33:43,760 --> 00:33:45,600 Very simple algorithm, and it actually 684 00:33:45,600 --> 00:33:48,240 works to find an MIS very quickly. 685 00:33:53,380 --> 00:33:57,150 Why does this give you independence? 686 00:33:57,150 --> 00:34:00,310 How do we know that if this ever terminates, 687 00:34:00,310 --> 00:34:02,960 if everybody decides, how do we know that we don't ever 688 00:34:02,960 --> 00:34:08,440 have two neighbors that decided to be in the MIS? 689 00:34:08,440 --> 00:34:09,600 Yeah. 690 00:34:09,600 --> 00:34:11,580 AUDIENCE: Because once a node joins the MIS, 691 00:34:11,580 --> 00:34:14,550 it broadcasts to its neighbors that-- 692 00:34:14,550 --> 00:34:16,630 NANCY LYNCH: Right. 693 00:34:16,630 --> 00:34:18,750 The only way you join the MIS is if you 694 00:34:18,750 --> 00:34:21,750 have the unique maximum value in your neighborhood. 695 00:34:21,750 --> 00:34:25,469 And when you do, all your neighbors become inactive. 696 00:34:25,469 --> 00:34:29,020 So you're certainly going to have independence. 697 00:34:29,020 --> 00:34:33,199 Maximality, if it terminates, the final set 698 00:34:33,199 --> 00:34:37,159 is not going to allow you to add any more nodes. 699 00:34:37,159 --> 00:34:37,659 Why? 700 00:34:37,659 --> 00:34:40,290 Because a node is only going to become inactive 701 00:34:40,290 --> 00:34:45,460 if it joins the MIS, or a neighbor joins the MIS. 702 00:34:45,460 --> 00:34:47,159 And we just continue this algorithm 703 00:34:47,159 --> 00:34:51,080 until all the nodes become inactive. 704 00:34:51,080 --> 00:34:55,010 So either the node is in the MIS or a neighbor is in the MIS. 705 00:34:55,010 --> 00:34:58,170 So you can't possibly add any more. 706 00:34:58,170 --> 00:35:00,350 Yes? 707 00:35:00,350 --> 00:35:01,970 So this has the basic correctness 708 00:35:01,970 --> 00:35:04,590 properties, but what you're probably wondering, 709 00:35:04,590 --> 00:35:07,940 is why is this efficient enough? 
710 00:35:07,940 --> 00:35:10,120 Why is it efficient? 711 00:35:10,120 --> 00:35:13,850 Well we could say that with high probability, of probability 1, 712 00:35:13,850 --> 00:35:15,065 it will eventually terminate. 713 00:35:17,590 --> 00:35:25,020 More quantitative, we can state this theorem that says, 714 00:35:25,020 --> 00:35:28,770 with probability at least 1 minus 1 over n, 715 00:35:28,770 --> 00:35:33,490 all the nodes decide within four log n phases. 716 00:35:33,490 --> 00:35:35,540 Since n is the number of nodes, this 717 00:35:35,540 --> 00:35:38,670 doesn't tell us that you get probability 1 of eventually 718 00:35:38,670 --> 00:35:39,290 terminating. 719 00:35:39,290 --> 00:35:42,310 But we can repeat this and get the same sort 720 00:35:42,310 --> 00:35:47,520 of bound repeatedly for successive phases. 721 00:35:47,520 --> 00:35:50,860 But let's just focus on getting probability at least 1 minus 1 722 00:35:50,860 --> 00:35:57,245 over n that all nodes decide within about four log n phases. 723 00:36:00,270 --> 00:36:01,680 So let's see what this is saying. 724 00:36:01,680 --> 00:36:04,580 You have this big complicated graph. 725 00:36:04,580 --> 00:36:08,920 And in one round, for this to be like log n behavior, what 726 00:36:08,920 --> 00:36:10,885 has to happen at each phase? 727 00:36:13,740 --> 00:36:16,120 You have to reduce it by some constant fraction. 728 00:36:16,120 --> 00:36:18,650 The number of nodes, say, should go down. 729 00:36:18,650 --> 00:36:23,220 So it's sort of how the proof will go. 730 00:36:23,220 --> 00:36:25,310 So we start out with a Lemma saying, 731 00:36:25,310 --> 00:36:27,660 you're choosing these IDs at random. 732 00:36:27,660 --> 00:36:30,584 You want a high probability that they're all different. 733 00:36:30,584 --> 00:36:32,500 So we have a lemma like the one we had before. 
734 00:36:32,500 --> 00:36:35,920 It says, with probability at least, we use 1 minus 1 735 00:36:35,920 --> 00:36:38,530 over n squared, in each phase, 736 00:36:38,530 --> 00:36:41,310 in all these phases up to four log n, 737 00:36:41,310 --> 00:36:44,650 everybody's choosing a different random value. 738 00:36:44,650 --> 00:36:48,880 All the nodes choose different values at each phase. 739 00:36:48,880 --> 00:36:51,810 So this lets us ignore the possibility 740 00:36:51,810 --> 00:36:54,000 that you have repeats. 741 00:36:54,000 --> 00:36:55,610 So we'll come back to that at the end. 742 00:36:58,147 --> 00:37:00,480 All right, so we're going to pretend that in each phase, 743 00:37:00,480 --> 00:37:02,340 all the random numbers are different. 744 00:37:04,960 --> 00:37:09,070 So the key idea of this is to show that the graph has 745 00:37:09,070 --> 00:37:13,019 to shrink enough at each phase. 746 00:37:13,019 --> 00:37:14,560 So the way we're going to say that is 747 00:37:14,560 --> 00:37:17,500 not in terms of the nodes, but in terms 748 00:37:17,500 --> 00:37:18,740 of the number of edges. 749 00:37:18,740 --> 00:37:22,670 We're going to say at each phase, the expected 750 00:37:22,670 --> 00:37:26,140 number of edges that are live-- why is that shaking? 751 00:37:31,680 --> 00:37:32,240 OK. 752 00:37:32,240 --> 00:37:33,760 The expected number of edges that 753 00:37:33,760 --> 00:37:38,300 are live at the end of the phase is at most half the number 754 00:37:38,300 --> 00:37:41,570 that were live at the beginning of the phase. 755 00:37:41,570 --> 00:37:45,410 So an edge is live, if its endpoints are still live. 756 00:37:45,410 --> 00:37:48,795 So instead of talking about reducing the number of nodes 757 00:37:48,795 --> 00:37:50,170 by a constant fraction, I'm going 758 00:37:50,170 --> 00:37:52,550 to reduce the number of remaining edges 759 00:37:52,550 --> 00:37:56,710 by a constant fraction at each phase.
760 00:37:56,710 --> 00:37:58,860 So this is what I'm going to prove. 761 00:37:58,860 --> 00:38:02,260 So now I've got only three slides, but the only three 762 00:38:02,260 --> 00:38:04,690 slides today that have calculations on them. 763 00:38:04,690 --> 00:38:07,300 So probably have to pay attention, 764 00:38:07,300 --> 00:38:09,320 if you want to follow the calculations online. 765 00:38:09,320 --> 00:38:11,390 So let's see why. 766 00:38:11,390 --> 00:38:12,560 But the goal is clear? 767 00:38:12,560 --> 00:38:14,570 We have to reduce the number of edges 768 00:38:14,570 --> 00:38:16,750 that remain by a factor of two. 769 00:38:19,270 --> 00:38:23,470 So this is actually a new proof of this algorithm's 770 00:38:23,470 --> 00:38:24,180 performance. 771 00:38:24,180 --> 00:38:28,150 The proof in the original papers is pretty complicated. 772 00:38:28,150 --> 00:38:32,770 This is a very intuitive, neat proof. 773 00:38:32,770 --> 00:38:35,020 So the first line of the proof says 774 00:38:35,020 --> 00:38:38,820 if you have a node that has a neighbor that 775 00:38:38,820 --> 00:38:43,200 chooses a value that's bigger than all of its own neighbors-- 776 00:38:43,200 --> 00:38:45,006 so u has a neighbor w. 777 00:38:45,006 --> 00:38:48,760 W chooses a value that's bigger than all w's neighbors. 778 00:38:48,760 --> 00:38:49,640 But let's say more. 779 00:38:49,640 --> 00:38:53,220 Let's say it's also bigger than all of u's other neighbors, 780 00:38:53,220 --> 00:38:55,670 besides w. 781 00:38:55,670 --> 00:38:59,270 So w is really big, bigger than all w's neighbors, 782 00:38:59,270 --> 00:39:02,960 bigger than all of u's other neighbors. 783 00:39:02,960 --> 00:39:08,240 If that happens, then what happens to u? 784 00:39:08,240 --> 00:39:12,160 Well we know that w is going to decide to join the MIS. 785 00:39:12,160 --> 00:39:16,260 And u is going to definitely die, 786 00:39:16,260 --> 00:39:18,230 is not going to join the MIS. 
787 00:39:18,230 --> 00:39:20,560 Right? 788 00:39:20,560 --> 00:39:21,060 OK? 789 00:39:21,060 --> 00:39:23,920 I don't want to lose people in the first line. 790 00:39:23,920 --> 00:39:26,310 Question? 791 00:39:26,310 --> 00:39:27,890 Here's a picture. 792 00:39:27,890 --> 00:39:28,490 Here's u. 793 00:39:31,870 --> 00:39:34,800 And it has a neighbor w. 794 00:39:34,800 --> 00:39:40,780 And let's say that w's chosen value is greater than all 795 00:39:40,780 --> 00:39:43,090 of w's neighbors, but also greater than all 796 00:39:43,090 --> 00:39:44,785 of u's other neighbors. 797 00:39:47,510 --> 00:39:49,650 Yes? 798 00:39:49,650 --> 00:39:53,160 If w has that, w is going to join the MIS, 799 00:39:53,160 --> 00:39:56,480 and u is going to definitely not join the MIS. 800 00:39:56,480 --> 00:40:00,470 It's going to decide out in this phase. 801 00:40:00,470 --> 00:40:02,076 OK so far? 802 00:40:02,076 --> 00:40:05,830 AUDIENCE: Why do you need w to have value greater 803 00:40:05,830 --> 00:40:07,750 than u's neighbors? 804 00:40:07,750 --> 00:40:10,630 Because if w is greater than all of its neighbors then it's-- 805 00:40:10,630 --> 00:40:13,630 NANCY LYNCH: --be in the MIS and u will not be in the MIS. 806 00:40:13,630 --> 00:40:15,900 And that seems like it ought to be enough. 807 00:40:15,900 --> 00:40:19,240 But look at the next line. 808 00:40:19,240 --> 00:40:21,510 Well the line after this one. 809 00:40:21,510 --> 00:40:24,960 What's the probability that w chooses a value like that? 810 00:40:28,080 --> 00:40:31,570 So if it's going to be bigger than all u's neighbors, and all 811 00:40:31,570 --> 00:40:33,450 of w's neighbors, and keeping in mind 812 00:40:33,450 --> 00:40:35,320 that they are each other's neighbors, 813 00:40:35,320 --> 00:40:37,920 turns out that there is degree u, 814 00:40:37,920 --> 00:40:43,410 at most degree u plus degree w nodes involved here.
815 00:40:43,410 --> 00:40:45,790 W has to have the biggest of all of those, 816 00:40:45,790 --> 00:40:48,810 so it's going to have the probability 817 00:40:48,810 --> 00:40:53,060 1 over the number of nodes of being the biggest one. 818 00:40:53,060 --> 00:40:55,540 So it's just 1 over the degree of u 819 00:40:55,540 --> 00:40:59,180 plus the degree of w, the probability 820 00:40:59,180 --> 00:41:01,320 that w will choose a big enough value. 821 00:41:06,580 --> 00:41:09,400 But you ask, this is pessimistic. 822 00:41:09,400 --> 00:41:13,950 Why don't I just say that w is bigger than its own neighbors? 823 00:41:13,950 --> 00:41:15,490 Because I want to do this next step. 824 00:41:15,490 --> 00:41:19,180 I want to say the probability that node u gets killed 825 00:41:19,180 --> 00:41:24,730 by one of its neighbors, any one of its neighbors in this phase. 826 00:41:24,730 --> 00:41:26,640 I can calculate that as the sum. 827 00:41:29,540 --> 00:41:32,220 The probability that node u is killed by a neighbor 828 00:41:32,220 --> 00:41:35,520 is at least the sum over all of its neighbors. 829 00:41:35,520 --> 00:41:40,240 You look at all the vertices in the neighbor set, 830 00:41:40,240 --> 00:41:43,990 and you add up this fraction. 831 00:41:43,990 --> 00:41:49,090 So why did I need to make that additional assumption before? 832 00:41:49,090 --> 00:41:51,850 That w is greater than all of u's neighbors, 833 00:41:51,850 --> 00:41:54,390 as well as all of its own neighbors. 834 00:41:54,390 --> 00:41:55,078 Yeah? 835 00:41:55,078 --> 00:41:56,910 AUDIENCE: So you can add a problem to-- 836 00:41:56,910 --> 00:41:58,618 NANCY LYNCH: Yeah because otherwise these 837 00:41:58,618 --> 00:42:00,840 would be overlapping events. 838 00:42:00,840 --> 00:42:03,260 But this way I know they're definitely disjoint events.
839 00:42:03,260 --> 00:42:06,991 We can't have-- if we have w and w prime, 840 00:42:06,991 --> 00:42:08,490 you can't have both of those holding 841 00:42:08,490 --> 00:42:12,590 because the requirement for w is saying that its ID is 842 00:42:12,590 --> 00:42:16,540 bigger than w prime's ID. 843 00:42:16,540 --> 00:42:18,790 Because you have these disjoint events, 844 00:42:18,790 --> 00:42:21,140 you can just add the probabilities. 845 00:42:21,140 --> 00:42:22,520 And you know that the probability 846 00:42:22,520 --> 00:42:25,460 that u gets killed by some neighbor 847 00:42:25,460 --> 00:42:28,130 is at least this summation. 848 00:42:28,130 --> 00:42:29,491 OK so far? 849 00:42:29,491 --> 00:42:30,740 So now I'm going to calculate. 850 00:42:33,260 --> 00:42:34,870 But I wanted to focus on the edges. 851 00:42:34,870 --> 00:42:38,560 So let's see, this tells us a way that a node can get killed. 852 00:42:38,560 --> 00:42:42,990 But let's look at what happens for an edge getting killed. 853 00:42:42,990 --> 00:42:48,230 This is the probability that a node is killed. 854 00:42:48,230 --> 00:42:53,070 So the probability that an edge dies at this phase 855 00:42:53,070 --> 00:42:56,180 is at least the maximum of the probability 856 00:42:56,180 --> 00:43:02,766 that either of its two endpoints die. 857 00:43:02,766 --> 00:43:04,390 And let's just write it as the average. 858 00:43:04,390 --> 00:43:06,790 The probability that an edge dies 859 00:43:06,790 --> 00:43:09,050 is at least the average of the probability 860 00:43:09,050 --> 00:43:11,760 that its two endpoints are killed, in this way. 861 00:43:15,200 --> 00:43:18,010 So for an edge, an edge is definitely going to die, 862 00:43:18,010 --> 00:43:20,380 if one of its endpoints dies. 863 00:43:20,380 --> 00:43:23,130 And then the edge dies if it dies in this particular way.
864 00:43:26,110 --> 00:43:29,050 So the probability an edge dies is at least the probability 865 00:43:29,050 --> 00:43:32,390 that one of the-- half the sum of the probabilities 866 00:43:32,390 --> 00:43:36,130 that the two endpoints die. 867 00:43:36,130 --> 00:43:38,490 It's the average probability. 868 00:43:38,490 --> 00:43:39,950 Makes sense? 869 00:43:39,950 --> 00:43:42,740 You might have to read this later. 870 00:43:42,740 --> 00:43:45,770 So now we can go from that to the expected number of edges 871 00:43:45,770 --> 00:43:47,809 that die. 872 00:43:47,809 --> 00:43:48,350 What is that? 873 00:43:48,350 --> 00:43:51,110 You just add up, over all the edges, the probability 874 00:43:51,110 --> 00:43:53,250 that the edge dies. 875 00:43:53,250 --> 00:43:55,710 The expected number of edges that die 876 00:43:55,710 --> 00:44:00,910 is at least the sum over all of the edges of the average probability 877 00:44:00,910 --> 00:44:03,070 that its two endpoints die. 878 00:44:09,040 --> 00:44:12,170 So you have the sum over all of the edges. 879 00:44:12,170 --> 00:44:13,570 You add up, for all the edges, 880 00:44:13,570 --> 00:44:16,300 the probability that one endpoint is killed 881 00:44:16,300 --> 00:44:20,360 and the probability the other endpoint is killed. 882 00:44:20,360 --> 00:44:22,240 So what we have is this great big summation 883 00:44:22,240 --> 00:44:26,810 involving now the kill probabilities for vertices. 884 00:44:26,810 --> 00:44:29,050 So we have the kill probability for each vertex. 885 00:44:29,050 --> 00:44:32,360 How many times does that occur? 886 00:44:32,360 --> 00:44:38,100 If you have a vertex, u, it appears once for every edge 887 00:44:38,100 --> 00:44:39,912 that u is an endpoint of. 888 00:44:43,200 --> 00:44:47,610 So you have the kill probability for each node occurring exactly 889 00:44:47,610 --> 00:44:50,500 its degree number of times. 890 00:44:50,500 --> 00:44:53,580 So that lets me rewrite this in terms of vertices.
891 00:44:53,580 --> 00:44:58,420 This sum is just 1/2 the sum over all the nodes 892 00:44:58,420 --> 00:45:02,580 of the probability that the node gets killed times its degree. 893 00:45:05,200 --> 00:45:09,370 So I'm calculating by replacing the description 894 00:45:09,370 --> 00:45:11,100 in terms of edges by a description 895 00:45:11,100 --> 00:45:13,150 in terms of vertices. 896 00:45:13,150 --> 00:45:16,780 More or less OK so far? 897 00:45:16,780 --> 00:45:18,000 So now what do I do? 898 00:45:18,000 --> 00:45:21,040 Well, I know the probability that u is killed. 899 00:45:21,040 --> 00:45:23,900 I have a bound for that up on the first line. 900 00:45:23,900 --> 00:45:26,960 So I'm just going to plug that in. 901 00:45:26,960 --> 00:45:29,460 So I get 1/2 the sum over all the nodes, 902 00:45:29,460 --> 00:45:36,320 the degree of the node times this summation that gives me 903 00:45:36,320 --> 00:45:39,092 the kill probability for that node. 904 00:45:39,092 --> 00:45:40,550 And now I play around with the sum. 905 00:45:40,550 --> 00:45:45,170 I can move the degree inside the second summation, 906 00:45:45,170 --> 00:45:48,360 and I get this. 907 00:45:48,360 --> 00:45:49,750 So now let's stare at this again. 908 00:45:49,750 --> 00:45:54,760 I have the sum over all nodes of the sum over all 909 00:45:54,760 --> 00:45:58,400 of its neighbors of some expression. 910 00:45:58,400 --> 00:46:01,702 But if I'm considering every node and every one 911 00:46:01,702 --> 00:46:03,410 of its neighbors, that's like considering 912 00:46:03,410 --> 00:46:05,610 all the directed edges. 913 00:46:05,610 --> 00:46:08,760 I look at every u, and I look at every edge that 914 00:46:08,760 --> 00:46:11,600 connects u to something else. 915 00:46:11,600 --> 00:46:14,890 So I can write it as the sum over all the directed edges 916 00:46:14,890 --> 00:46:18,080 of this expression.
917 00:46:18,080 --> 00:46:20,160 So I get half of the sum over all the 918 00:46:20,160 --> 00:46:23,454 directed edges of this expression. 919 00:46:23,454 --> 00:46:25,245 But we were talking about undirected edges. 920 00:46:28,380 --> 00:46:32,229 And the undirected edges are being counted twice here, once 921 00:46:32,229 --> 00:46:33,020 for each direction. 922 00:46:35,950 --> 00:46:39,690 I can change this sum to a sum over undirected edges. 923 00:46:39,690 --> 00:46:42,270 But now I have the two endpoints to deal with. 924 00:46:42,270 --> 00:46:48,390 So I get the degree of u and the degree of v in the numerator 925 00:46:48,390 --> 00:46:50,390 because I'm looking at it from the point of view 926 00:46:50,390 --> 00:46:53,810 of both of the endpoints of each edge. 927 00:46:53,810 --> 00:46:55,780 Well, something drops out, so I have 928 00:46:55,780 --> 00:47:00,860 1/2 the sum over all the undirected edges of 1. 929 00:47:00,860 --> 00:47:03,645 So that's 1/2 of the number of undirected edges. 930 00:47:06,550 --> 00:47:08,800 So I don't expect you to get every step of this, 931 00:47:08,800 --> 00:47:10,850 but it's on three slides, so you can 932 00:47:10,850 --> 00:47:14,250 stare at this when you go home and make sure the steps work. 933 00:47:14,250 --> 00:47:16,070 But remember the point of this is 934 00:47:16,070 --> 00:47:18,090 to show that you reduce the number of edges 935 00:47:18,090 --> 00:47:21,640 by a factor of two, and it's done in sort of a clever way 936 00:47:21,640 --> 00:47:24,185 by counting the kill probabilities of vertices. 937 00:47:30,700 --> 00:47:34,020 So we get this, reducing the number of edges. 938 00:47:34,020 --> 00:47:35,910 And now we can just plug that back in 939 00:47:35,910 --> 00:47:41,072 to get our complexity bound for the entire algorithm.
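For reference, the chain of steps just described can be written out in one place. This is only a sketch reconstructed from the spoken description, not a copy of the slides. The notation is an assumption made here: $\Gamma(u)$ is the neighbor set of $u$, $d(u)$ is its degree, and $E$ is the set of undirected edges still alive in the phase.

```latex
\begin{align*}
\Pr[u \text{ killed}] &\ge \sum_{w \in \Gamma(u)} \frac{1}{d(u) + d(w)}, \\
\Pr[\{u,v\} \text{ dies}] &\ge \tfrac{1}{2}\bigl(\Pr[u \text{ killed}] + \Pr[v \text{ killed}]\bigr), \\
\mathbb{E}[\#\text{edges that die}]
  &\ge \sum_{\{u,v\} \in E} \tfrac{1}{2}\bigl(\Pr[u \text{ killed}] + \Pr[v \text{ killed}]\bigr) \\
  &= \tfrac{1}{2} \sum_{u \in V} d(u)\,\Pr[u \text{ killed}] \\
  &\ge \tfrac{1}{2} \sum_{u \in V} \sum_{w \in \Gamma(u)} \frac{d(u)}{d(u) + d(w)} \\
  &= \tfrac{1}{2} \sum_{\{u,w\} \in E} \frac{d(u) + d(w)}{d(u) + d(w)} \\
  &= \tfrac{1}{2}\,|E|.
\end{align*}
```

The second-to-last step is the switch from directed to undirected edges: each undirected edge contributes one term with $d(u)$ in the numerator and one with $d(w)$, and the two terms share a denominator.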
940 00:47:41,072 --> 00:47:43,280 Remember the original theorem we were trying to prove 941 00:47:43,280 --> 00:47:47,740 is a probability bound for deciding within log n phases. 942 00:47:47,740 --> 00:47:49,420 Well you should have a pretty good idea 943 00:47:49,420 --> 00:47:51,370 of why that works because if at each phase, 944 00:47:51,370 --> 00:47:53,120 you're going to reduce the number of edges 945 00:47:53,120 --> 00:47:55,800 by around a factor of two, then it's 946 00:47:55,800 --> 00:48:00,530 going to take something like log n phases to finish. 947 00:48:00,530 --> 00:48:02,120 And I just put a proof sketch. 948 00:48:04,940 --> 00:48:07,710 The number of edges that are still alive after four log n 949 00:48:07,710 --> 00:48:11,010 phases, well you divide by 2 four log n times, 950 00:48:11,010 --> 00:48:13,310 so you get down to practically nothing. 951 00:48:13,310 --> 00:48:19,690 The probability any edges are alive at the end is very small. 952 00:48:19,690 --> 00:48:23,910 So you get a small probability the algorithm doesn't terminate 953 00:48:23,910 --> 00:48:26,700 within four log n phases. 954 00:48:26,700 --> 00:48:29,569 There's an extra little term I threw in here. 955 00:48:29,569 --> 00:48:30,610 You might have forgotten. 956 00:48:30,610 --> 00:48:33,030 There was a term that I needed for the small probability 957 00:48:33,030 --> 00:48:36,090 that somebody chose duplicate IDs. 958 00:48:36,090 --> 00:48:37,760 So I'm bringing that back in at the end, 959 00:48:37,760 --> 00:48:40,430 in a little union bound. 960 00:48:40,430 --> 00:48:43,857 And we get our 1 over n probability this way. 961 00:48:43,857 --> 00:48:45,940 But the key idea is you reduce the number of edges 962 00:48:45,940 --> 00:48:49,150 by half at each stage. 963 00:48:49,150 --> 00:48:52,790 Enough for you to look at later, I guess, to figure this out, 964 00:48:52,790 --> 00:48:55,670 or do you have any questions about this?
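The "divide by 2 four log n times" step can be made concrete with a little arithmetic. This is a sketch under assumptions that are not stated in the lecture: logarithms are base 2, the graph has at most $n^2/2$ edges, and $X_k$ (a name introduced here) is the number of edges alive after $k$ phases. The duplicate-ID term is left out, since its exact value is not given in this part of the lecture.

```latex
\begin{align*}
\mathbb{E}[X_k] &\le \tfrac{1}{2}\,\mathbb{E}[X_{k-1}]
  \quad\Longrightarrow\quad
  \mathbb{E}[X_{4\log n}] \le \frac{|E|}{2^{4\log n}} = \frac{|E|}{n^{4}}
  \le \frac{n^{2}/2}{n^{4}} = \frac{1}{2n^{2}}, \\
\Pr[X_{4\log n} \ge 1] &\le \mathbb{E}[X_{4\log n}] \le \frac{1}{2n^{2}}
  \quad\text{(Markov's inequality)}.
\end{align*}
```

So the chance that any edge survives is far below $1/n$, which leaves room in the union bound for the duplicate-ID term.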
965 00:48:55,670 --> 00:49:00,190 So that's the last equations and calculation. 966 00:49:00,190 --> 00:49:06,470 I'm going to go onto a new idea, more conceptual stuff. 967 00:49:06,470 --> 00:49:09,800 Familiar problem, breadth-first spanning trees, 968 00:49:09,800 --> 00:49:14,070 setting up breadth-first paths to every node, 969 00:49:14,070 --> 00:49:19,130 but we're going to study it in our new setting. 970 00:49:19,130 --> 00:49:21,080 We have a connected graph. 971 00:49:21,080 --> 00:49:24,390 This time, let's suppose that it has a distinguished vertex, 972 00:49:24,390 --> 00:49:26,310 like it already has a leader. 973 00:49:26,310 --> 00:49:28,450 So it has a distinguished vertex in the graph 974 00:49:28,450 --> 00:49:31,930 that's going to become the root of the BFS tree. 975 00:49:34,920 --> 00:49:37,620 And the processes don't need any knowledge 976 00:49:37,620 --> 00:49:39,220 about the graph for this one. 977 00:49:44,930 --> 00:49:48,490 For the rest of the time today and Thursday, 978 00:49:48,490 --> 00:49:51,250 we'll assume the processes have unique identifiers, 979 00:49:51,250 --> 00:49:53,460 and I don't think we're using any probabilities. 980 00:49:53,460 --> 00:49:56,970 So this is just going to be using the unique identifiers 981 00:49:56,970 --> 00:49:59,570 to solve our problems. 982 00:49:59,570 --> 00:50:02,700 So everybody knows its own unique identifier. 983 00:50:02,700 --> 00:50:05,710 The root has a distinguished, generally known, 984 00:50:05,710 --> 00:50:08,720 unique identifier say i0. 985 00:50:08,720 --> 00:50:10,880 And the process that has i0 knows hey, 986 00:50:10,880 --> 00:50:13,060 I'm at the root of the graph. 987 00:50:13,060 --> 00:50:14,380 So the set up make sense? 
988 00:50:17,647 --> 00:50:19,230 We might as well assume that everybody 989 00:50:19,230 --> 00:50:21,710 knows the unique identifiers of their neighbors 990 00:50:21,710 --> 00:50:23,990 because they could easily exchange information 991 00:50:23,990 --> 00:50:27,200 now, and match up who's connected on which port 992 00:50:27,200 --> 00:50:28,385 by a unique identifier. 993 00:50:31,830 --> 00:50:33,499 We'll just do deterministic. 994 00:50:33,499 --> 00:50:35,540 There'll be a little bit of non-determinism here. 995 00:50:35,540 --> 00:50:36,770 I'll say more about that. 996 00:50:36,770 --> 00:50:42,470 But I'm not going to worry about probabilities for this. 997 00:50:42,470 --> 00:50:45,100 Well that told you about the general setup. 998 00:50:45,100 --> 00:50:47,880 What are the processes supposed to do? 999 00:50:47,880 --> 00:50:50,720 Well they're supposed to compute a breadth-first spanning tree, 1000 00:50:50,720 --> 00:50:53,210 rooted at vertex v0. 1001 00:50:53,210 --> 00:50:56,140 The branches are going to be directed 1002 00:50:56,140 --> 00:51:00,040 paths in this undirected graph, coming from v0. 1003 00:51:00,040 --> 00:51:03,520 Spanning means they should reach all the vertices. 1004 00:51:03,520 --> 00:51:06,370 And breadth-first means that if a vertex is at a distance 1005 00:51:06,370 --> 00:51:12,600 d from v0, it will appear at depth d in this spanning tree. 1006 00:51:12,600 --> 00:51:17,910 So everybody should get a shortest path from the root. 1007 00:51:17,910 --> 00:51:20,610 Now how are we going to compute this in a distributed setting? 1008 00:51:20,610 --> 00:51:23,400 Well now the output of a process is just 1009 00:51:23,400 --> 00:51:26,850 going to be its parent in the tree. 1010 00:51:26,850 --> 00:51:29,590 So we're not actually going to compute this tree anywhere 1011 00:51:29,590 --> 00:51:30,680 as a whole. 1012 00:51:30,680 --> 00:51:33,546 Everybody's just going to know its parent in the tree. 
1013 00:51:37,810 --> 00:51:38,420 Questions? 1014 00:51:38,420 --> 00:51:39,340 Problem make sense? 1015 00:51:43,920 --> 00:51:47,694 So this is just an example of a spanning tree, 1016 00:51:47,694 --> 00:51:48,860 breadth-first spanning tree. 1017 00:51:48,860 --> 00:51:53,000 This gives you shortest paths to all of the nodes, 1018 00:51:53,000 --> 00:51:55,200 shortest in terms of the number of hops. 1019 00:51:58,600 --> 00:52:01,740 So we can have a very, very simple algorithm. 1020 00:52:01,740 --> 00:52:06,270 We're going to let the processes mark themselves as they 1021 00:52:06,270 --> 00:52:08,520 get included in the tree. 1022 00:52:08,520 --> 00:52:12,970 To start, only the first process, i0, is marked. 1023 00:52:12,970 --> 00:52:17,470 So do you want to give an idea, maybe, of how this might work? 1024 00:52:17,470 --> 00:52:19,421 Sketch out-- yeah? 1025 00:52:19,421 --> 00:52:21,504 AUDIENCE: The root will send out to its neighbors. 1026 00:52:21,504 --> 00:52:22,968 And they will then mark themselves 1027 00:52:22,968 --> 00:52:25,408 as the parent of whoever they heard from. 1028 00:52:25,408 --> 00:52:27,369 Then they will-- 1029 00:52:27,369 --> 00:52:28,910 NANCY LYNCH: This is all synchronous. 1030 00:52:28,910 --> 00:52:29,710 So that's great. 1031 00:52:29,710 --> 00:52:31,720 They'll be doing this in synchronous rounds. 1032 00:52:31,720 --> 00:52:34,030 So everybody at a certain distance 1033 00:52:34,030 --> 00:52:37,790 is going to get the message at the right number of rounds 1034 00:52:37,790 --> 00:52:40,320 to mark their distance. 1035 00:52:40,320 --> 00:52:45,000 OK so in round one, process i0 will 1036 00:52:45,000 --> 00:52:48,550 send a special message, say search, 1037 00:52:48,550 --> 00:52:50,660 to all of its neighbors.
1038 00:52:50,660 --> 00:52:52,950 And anybody who receives a message in round one 1039 00:52:52,950 --> 00:52:57,320 will mark itself, decide i0 is its parent, 1040 00:52:57,320 --> 00:53:01,850 could output that i0 is my parent, parent i0. 1041 00:53:01,850 --> 00:53:03,970 And then it can get ready for the next round, 1042 00:53:03,970 --> 00:53:09,210 when it's supposed to send to continue this. 1043 00:53:09,210 --> 00:53:13,000 So at later rounds, if you decided you're going to send, 1044 00:53:13,000 --> 00:53:16,070 if you know you're supposed to send from the previous round, 1045 00:53:16,070 --> 00:53:20,000 then you send a search message to all of your neighbors. 1046 00:53:20,000 --> 00:53:22,530 Now the process is sitting there and it 1047 00:53:22,530 --> 00:53:25,040 receives a search message. 1048 00:53:25,040 --> 00:53:30,340 If he's already marked, then he should just ignore the message. 1049 00:53:30,340 --> 00:53:32,430 Once you're included in the tree, 1050 00:53:32,430 --> 00:53:35,160 you don't care if you get other messages, 1051 00:53:35,160 --> 00:53:37,840 search messages on other paths. 1052 00:53:37,840 --> 00:53:41,170 So you only do anything if you're not yet marked 1053 00:53:41,170 --> 00:53:43,120 and you receive a message. 1054 00:53:43,120 --> 00:53:44,855 And in that case, then you mark yourself. 1055 00:53:48,020 --> 00:53:49,970 Then you mark yourself, and then you 1056 00:53:49,970 --> 00:53:53,980 choose one of your neighbors to be your parent. 1057 00:53:53,980 --> 00:53:55,920 Now because this is synchronous, you 1058 00:53:55,920 --> 00:53:58,970 have several nodes that could be sending at the same time. 1059 00:53:58,970 --> 00:54:02,280 So one node could be receiving search messages 1060 00:54:02,280 --> 00:54:05,040 from several different neighbors at once. 1061 00:54:05,040 --> 00:54:07,660 Well, it wants to choose one of them as its parent, 1062 00:54:07,660 --> 00:54:10,380 doesn't matter which one it chooses.
1063 00:54:10,380 --> 00:54:13,000 So it can just choose nondeterministically, just 1064 00:54:13,000 --> 00:54:15,160 arbitrarily. 1065 00:54:15,160 --> 00:54:19,932 And then it decides that it will send in the next round. 1066 00:54:19,932 --> 00:54:21,170 Is the algorithm clear? 1067 00:54:26,770 --> 00:54:29,380 So there's, I mentioned, a little bit of nondeterminism 1068 00:54:29,380 --> 00:54:31,970 here, only in that a process can choose arbitrarily 1069 00:54:31,970 --> 00:54:34,120 among several possible parents. 1070 00:54:36,770 --> 00:54:38,490 And then we could put in a default, 1071 00:54:38,490 --> 00:54:40,830 saying that it chooses the one with the smallest ID, 1072 00:54:40,830 --> 00:54:43,170 if we really want to make it deterministic. 1073 00:54:43,170 --> 00:54:45,542 But it's also OK to leave distributed algorithms 1074 00:54:45,542 --> 00:54:46,250 nondeterministic. 1075 00:54:49,690 --> 00:54:51,530 And here I should make a remark that 1076 00:54:51,530 --> 00:54:54,230 shows how differently nondeterminism 1077 00:54:54,230 --> 00:54:56,910 is regarded in the distributed setting, 1078 00:54:56,910 --> 00:55:00,520 from the way it is for sequential algorithms. 1079 00:55:00,520 --> 00:55:03,410 For distributed algorithms, there can be many options. 1080 00:55:03,410 --> 00:55:04,840 And maybe they're all OK. 1081 00:55:04,840 --> 00:55:07,560 But the algorithm is supposed to work correctly, 1082 00:55:07,560 --> 00:55:12,960 no matter how you resolve the nondeterministic choices. 1083 00:55:12,960 --> 00:55:15,850 So think about like NP, and the other ways 1084 00:55:15,850 --> 00:55:18,160 that you've seen nondeterminism so far. 1085 00:55:18,160 --> 00:55:21,390 There you say you're lucky if there is a path to a choice.
1086 00:55:21,390 --> 00:55:24,200 Here when you make a nondeterministic choice, 1087 00:55:24,200 --> 00:55:26,590 or when the algorithm behaves nondeterministically, 1088 00:55:26,590 --> 00:55:28,330 all the choices are supposed to work. 1089 00:55:28,330 --> 00:55:30,890 It's like all the paths have to come up with correct answers. 1090 00:55:30,890 --> 00:55:32,259 Do you have a question? 1091 00:55:32,259 --> 00:55:34,384 AUDIENCE: Yes, whenever there's a sub- [INAUDIBLE], 1092 00:55:34,384 --> 00:55:36,740 whenever there's a race condition, 1093 00:55:36,740 --> 00:55:38,740 we locally assume that there wasn't a difference 1094 00:55:38,740 --> 00:55:41,160 in local computation time. 1095 00:55:41,160 --> 00:55:42,830 But if there is, even in the slightest, 1096 00:55:42,830 --> 00:55:45,330 then they would get a parent [INAUDIBLE] before another one, 1097 00:55:45,330 --> 00:55:47,129 it would still be a valid-- 1098 00:55:47,129 --> 00:55:48,670 NANCY LYNCH: So the synchronous model 1099 00:55:48,670 --> 00:55:50,030 is more abstract than that. 1100 00:55:50,030 --> 00:55:52,540 You don't model the local computation time. 1101 00:55:52,540 --> 00:55:54,660 You're moving more toward an asynchronous model, 1102 00:55:54,660 --> 00:55:58,280 where the steps can take differing amounts of time. 1103 00:55:58,280 --> 00:56:01,250 Here we just assume you have an abstract model, where 1104 00:56:01,250 --> 00:56:04,280 everybody does stuff at once, in each round. 1105 00:56:04,280 --> 00:56:05,900 But you still have nondeterminism 1106 00:56:05,900 --> 00:56:11,560 because they can all arrive at the same round somewhere. 1107 00:56:11,560 --> 00:56:12,240 But it's OK. 1108 00:56:12,240 --> 00:56:14,100 You can pick any one and it still works. 
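The search algorithm described above can be sketched as a small round-by-round simulation. This is an illustration only, not code from the lecture: the function name `sync_bfs`, the adjacency-list representation, and the sample graph are all assumptions made here, and the tie among several senders is broken by smallest UID, which is just one of the allowed choices.

```python
def sync_bfs(adj, root):
    """Simulate synchronous BFS on an undirected, connected graph.

    adj maps each UID to a list of neighbor UIDs (hypothetical input format).
    Returns a dict mapping every non-root UID to its chosen parent UID.
    """
    marked = {root}          # only the root starts out marked
    parent = {}
    senders = [root]         # processes that send "search" this round
    while senders:
        # Deliver all search messages of this round at once.
        inbox = {}           # receiver -> list of senders heard this round
        for u in senders:
            for v in adj[u]:
                inbox.setdefault(v, []).append(u)
        senders = []
        for v, heard in inbox.items():
            if v in marked:
                continue     # already in the tree: ignore the message
            marked.add(v)
            parent[v] = min(heard)   # any choice is fine; smallest UID here
            senders.append(v)        # v sends in the next round
    return parent


# Small example graph (an assumption, not the one on the slides).
graph = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3],
}
print(sync_bfs(graph, 0))
```

Node 3 hears from both 1 and 2 in the same round, which is exactly the situation where the arbitrary choice comes up; either parent gives a valid breadth-first tree.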
1109 00:56:17,560 --> 00:56:20,600 So it should be not hard to see that this does give you 1110 00:56:20,600 --> 00:56:23,830 a BFS tree because you're creating all the branches 1111 00:56:23,830 --> 00:56:25,040 synchronously. 1112 00:56:25,040 --> 00:56:28,790 And you're growing one hop at each round. 1113 00:56:28,790 --> 00:56:30,690 It reaches all the nodes eventually 1114 00:56:30,690 --> 00:56:32,080 because the graph is connected. 1115 00:56:32,080 --> 00:56:36,400 And everybody sends messages: once a node gets marked, 1116 00:56:36,400 --> 00:56:38,520 it sends messages to its neighbors. 1117 00:56:38,520 --> 00:56:40,400 So eventually, the markings are going 1118 00:56:40,400 --> 00:56:46,640 to reach all the neighbors, all the nodes in the graph. 1119 00:56:46,640 --> 00:56:50,970 So here's how you get the example I showed before, 1120 00:56:50,970 --> 00:56:53,460 simple breadth-first search. 1121 00:56:53,460 --> 00:56:57,270 That's a search message sent by this guy. 1122 00:56:57,270 --> 00:56:59,360 I put it to the right of the edge 1123 00:56:59,360 --> 00:57:02,990 to indicate-- it's kind of hard to distinguish. 1124 00:57:02,990 --> 00:57:04,880 But I put them on the right of the edge 1125 00:57:04,880 --> 00:57:06,750 from the point of view of the sender. 1126 00:57:06,750 --> 00:57:09,770 So he sends a search message. 1127 00:57:09,770 --> 00:57:10,650 It gets there. 1128 00:57:10,650 --> 00:57:13,720 This arrow just indicates that it reached the other end. 1129 00:57:13,720 --> 00:57:16,160 And this guy has chosen the sender, 1130 00:57:16,160 --> 00:57:19,790 which is the other direction on the arrow, as its parent. 1131 00:57:19,790 --> 00:57:25,540 Now the recipient is going to send some search messages. 1132 00:57:25,540 --> 00:57:28,370 So he sends four of them. 1133 00:57:28,370 --> 00:57:29,820 They all get to the other end. 1134 00:57:29,820 --> 00:57:32,770 And OK, so all these guys now get marked.
1135 00:57:32,770 --> 00:57:36,230 They're included in the BFS tree. 1136 00:57:36,230 --> 00:57:40,000 And now the next round, they all send some messages. 1137 00:57:40,000 --> 00:57:44,270 I'm not putting in the messages where somebody would send back 1138 00:57:44,270 --> 00:57:45,970 to a guy who sent to him. 1139 00:57:45,970 --> 00:57:47,810 But I put in all the others. 1140 00:57:47,810 --> 00:57:51,350 Some of them are going to be ignored. 1141 00:57:51,350 --> 00:57:53,820 But you do get to a few new nodes this way. 1142 00:57:53,820 --> 00:57:55,460 That's round three. 1143 00:57:55,460 --> 00:57:57,900 Round four, everybody sends. 1144 00:57:57,900 --> 00:58:00,490 And now you have all the nodes included. 1145 00:58:03,250 --> 00:58:05,540 So this gives you the spanning tree 1146 00:58:05,540 --> 00:58:07,970 that I showed at the beginning of this topic. 1147 00:58:12,450 --> 00:58:14,650 This is not a very complicated algorithm. 1148 00:58:14,650 --> 00:58:17,830 But I think you can see that things can get worse. 1149 00:58:17,830 --> 00:58:22,820 And you want to argue about why the algorithms work correctly. 1150 00:58:22,820 --> 00:58:25,970 So as I said before, a popular method 1151 00:58:25,970 --> 00:58:28,300 of reasoning about the algorithms 1152 00:58:28,300 --> 00:58:30,300 is to state invariants. 1153 00:58:30,300 --> 00:58:32,010 So here, suppose I want to describe 1154 00:58:32,010 --> 00:58:35,525 the state of the entire network, after some number, r, 1155 00:58:35,525 --> 00:58:37,921 of rounds. 1156 00:58:37,921 --> 00:58:39,170 What could you say about that? 1157 00:58:39,170 --> 00:58:41,400 What's the case after r rounds of this algorithm? 1158 00:58:49,010 --> 00:58:50,483 Yeah. 1159 00:58:50,483 --> 00:58:53,429 AUDIENCE: All nodes at distance r from the root 1160 00:58:53,429 --> 00:58:55,260 have been marked.
1161 00:58:55,260 --> 00:58:58,280 NANCY LYNCH: All the nodes at distance r from the root 1162 00:58:58,280 --> 00:58:59,530 have been marked. 1163 00:58:59,530 --> 00:59:03,000 In fact, only those by round r, only the ones 1164 00:59:03,000 --> 00:59:06,330 with distances up through r have been marked. 1165 00:59:06,330 --> 00:59:09,350 So to state the invariants, if you want to state invariants, 1166 00:59:09,350 --> 00:59:12,540 I have to say what's in the state of the processes. 1167 00:59:12,540 --> 00:59:14,770 So all right, what can we say? 1168 00:59:14,770 --> 00:59:18,570 So the process has a Boolean that says whether or not 1169 00:59:18,570 --> 00:59:19,740 it's marked. 1170 00:59:19,740 --> 00:59:23,570 It has a place to record a parent. 1171 00:59:23,570 --> 00:59:29,150 And it has someplace where it puts information 1172 00:59:29,150 --> 00:59:30,750 about whether it's supposed to send 1173 00:59:30,750 --> 00:59:33,100 a message at the next round. 1174 00:59:33,100 --> 00:59:36,180 And we also should know its UID, so I'll 1175 00:59:36,180 --> 00:59:38,800 put that in another state variable. 1176 00:59:38,800 --> 00:59:43,570 So here is something I can say as an invariant. 1177 00:59:43,570 --> 00:59:48,920 At the end of r rounds, as you said, at the end of r rounds 1178 00:59:48,920 --> 00:59:52,390 exactly the processes at distance at most r 1179 00:59:52,390 --> 00:59:57,511 from the source node, the root node, are marked. 1180 00:59:57,511 --> 00:59:58,510 I can say a little more. 1181 00:59:58,510 --> 01:00:02,750 I can say a process has its parent defined if and only 1182 01:00:02,750 --> 01:00:04,390 if it's marked. 1183 01:00:04,390 --> 01:00:05,640 So it doesn't just get marked. 1184 01:00:05,640 --> 01:00:08,050 It also computes a parent, and the parent 1185 01:00:08,050 --> 01:00:13,030 gets computed at the point where it gets marked. 1186 01:00:13,030 --> 01:00:15,950 Then I should say that the parent is correct.
1187 01:00:15,950 --> 01:00:21,400 So for any process that's at distance d from the source, 1188 01:00:21,400 --> 01:00:23,340 if the parent is defined, then it's 1189 01:00:23,340 --> 01:00:26,410 in fact the UID of a process at distance d minus 1 1190 01:00:26,410 --> 01:00:28,670 from the source. 1191 01:00:28,670 --> 01:00:30,220 So that says it's actually getting 1192 01:00:30,220 --> 01:00:33,590 a correct breadth-first tree. 1193 01:00:33,590 --> 01:00:36,890 It's getting the parent on a shortest path. 1194 01:00:36,890 --> 01:00:37,566 Yeah? 1195 01:00:37,566 --> 01:00:39,946 AUDIENCE: Do these invariants [INAUDIBLE] for i0? 1196 01:00:42,810 --> 01:00:44,460 NANCY LYNCH: Distance 0 is marked. 1197 01:00:47,090 --> 01:00:52,200 i0 doesn't ever-- I see what you're saying. 1198 01:00:52,200 --> 01:00:54,330 i0 doesn't have a parent. 1199 01:00:54,330 --> 01:00:56,910 So I guess that we should say for i 1200 01:00:56,910 --> 01:01:01,200 not equal to i0 in this case. 1201 01:01:01,200 --> 01:01:03,310 So this would be a process other than i0. 1202 01:01:03,310 --> 01:01:04,886 It would have its parent defined, 1203 01:01:04,886 --> 01:01:06,010 if and only if it's marked. 1204 01:01:06,010 --> 01:01:09,180 Well as I think you just noticed, 1205 01:01:09,180 --> 01:01:12,000 the root node is marked but it doesn't have a parent. 1206 01:01:12,000 --> 01:01:15,240 So it's an exception. 1207 01:01:15,240 --> 01:01:19,500 But this should be, this doesn't involve i0. 1208 01:01:19,500 --> 01:01:22,777 So the second one, I can fix that a bit. 1209 01:01:22,777 --> 01:01:23,860 Other comments, questions? 1210 01:01:27,890 --> 01:01:31,000 So if somebody wanted to do a formal correctness 1211 01:01:31,000 --> 01:01:33,040 proof of an algorithm like this one, 1212 01:01:33,040 --> 01:01:34,637 you would use these invariants. 1213 01:01:34,637 --> 01:01:35,720 You prove it by induction. 
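The three invariants can also be checked mechanically on a small example, which is a useful sanity check before attempting the inductive proof. The sketch below is an assumption-laden illustration, not the lecture's formalization: the state variables `marked`, `parent`, and `send` mirror the ones just listed, the helper names are made up here, true distances come from an ordinary sequential BFS, and the graph is assumed connected.

```python
from collections import deque


def true_distances(adj, root):
    """Ordinary sequential BFS, used only as a reference for the checks."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def run_and_check(adj, root):
    """Run synchronous BFS, asserting the invariants after every round.

    Returns the number of rounds in which some message was sent.
    """
    dist = true_distances(adj, root)
    marked = {u: (u == root) for u in adj}
    parent = {u: None for u in adj}
    send = {u: (u == root) for u in adj}
    r = 0
    while any(send.values()):
        r += 1
        inbox = {u: [] for u in adj}
        for u in adj:
            if send[u]:
                for v in adj[u]:
                    inbox[v].append(u)
        for u in adj:
            send[u] = False
            if inbox[u] and not marked[u]:
                marked[u] = True
                parent[u] = min(inbox[u])   # arbitrary choice, fixed here
                send[u] = True
        for u in adj:
            # Invariant 1: exactly the processes at distance <= r are marked.
            assert marked[u] == (dist[u] <= r)
            if u != root:
                # Invariant 2 (non-root only): parent defined iff marked.
                assert (parent[u] is not None) == marked[u]
                # Invariant 3: a defined parent is one hop closer to the root.
                if parent[u] is not None:
                    assert dist[parent[u]] == dist[u] - 1
    return r


graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(run_and_check(graph, 0))
```

A run like this only tests one graph and one way of resolving the choices, so it does not replace the induction; it just makes the statements of the invariants concrete.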
1214 01:01:35,720 --> 01:01:37,510 In fact there's quite a few people 1215 01:01:37,510 --> 01:01:43,030 who use interactive theorem provers to do proofs 1216 01:01:43,030 --> 01:01:46,480 like this because the algorithms can get pretty complicated, 1217 01:01:46,480 --> 01:01:48,520 with a lot of variables. 1218 01:01:48,520 --> 01:01:50,470 So you have to do some bookkeeping. 1219 01:01:50,470 --> 01:01:52,460 You keep track of all these invariants, 1220 01:01:52,460 --> 01:01:55,880 and then you want to prove that they're all true by induction. 1221 01:01:55,880 --> 01:01:58,440 They all hold through an inductive step. 1222 01:01:58,440 --> 01:02:00,780 So you can use an interactive theorem prover 1223 01:02:00,780 --> 01:02:03,540 to help you do the bookkeeping. 1224 01:02:03,540 --> 01:02:06,390 But even a manual proof in a research paper 1225 01:02:06,390 --> 01:02:08,790 would use invariants in this style. 1226 01:02:12,000 --> 01:02:14,940 OK complexity. 1227 01:02:14,940 --> 01:02:19,660 So the number of rounds until everybody outputs their parent 1228 01:02:19,660 --> 01:02:23,440 would be the maximum distance of any node from v0. 1229 01:02:23,440 --> 01:02:25,780 So we can say that's at most the diameter of the graph. 1230 01:02:25,780 --> 01:02:26,620 It could be less. 1231 01:02:26,620 --> 01:02:28,030 It's just the maximum distance 1232 01:02:28,030 --> 01:02:31,440 from this particular node. 1233 01:02:31,440 --> 01:02:33,020 Message complexity? 1234 01:02:33,020 --> 01:02:38,140 Well how many messages are sent in this algorithm? 1235 01:02:38,140 --> 01:02:40,290 So everybody is going to send messages 1236 01:02:40,290 --> 01:02:43,880 only once on all of its edges. 1237 01:02:43,880 --> 01:02:47,320 So that means all the edges get a message sent 1238 01:02:47,320 --> 01:02:48,714 in each direction just once. 1239 01:02:48,714 --> 01:02:50,255 So it's order of the number of edges.
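Both costs can be counted directly in a simulation. The following is a minimal sketch with made-up names (`bfs_costs` and the sample graph are assumptions, not from the lecture). It counts a message on every incident edge of each sender, including the edge back toward its own parent, which matches "a message sent in each direction just once".

```python
def bfs_costs(adj, root):
    """Return (rounds until every node is marked, total search messages)."""
    marked = {root}
    senders = [root]
    rounds_to_mark_all = 0
    round_no = 0
    messages = 0
    while senders:
        round_no += 1
        receivers = set()
        for u in senders:
            messages += len(adj[u])      # one search message per incident edge
            receivers.update(adj[u])
        senders = sorted(receivers - marked)   # newly marked nodes send next
        if senders:
            rounds_to_mark_all = round_no
        marked.update(senders)
    return rounds_to_mark_all, messages


graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
rounds, msgs = bfs_costs(graph, 0)
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
print(rounds, msgs, 2 * num_edges)
```

On this example the farthest node is 3 hops from the root, and the graph has 5 edges, so the counts come out as 3 rounds and 10 messages, twice the number of edges.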
1240 01:02:55,360 --> 01:02:58,720 All right, so we can play around with this. 1241 01:02:58,720 --> 01:03:01,560 So this algorithm just tells everybody who his parent is. 1242 01:03:01,560 --> 01:03:03,280 But maybe when you're finished, you'd 1243 01:03:03,280 --> 01:03:05,460 like to know who your children are as well. 1244 01:03:08,400 --> 01:03:10,380 For many uses of these trees, you'd 1245 01:03:10,380 --> 01:03:14,040 like to have a parent be able to talk to its children 1246 01:03:14,040 --> 01:03:15,250 in the tree. 1247 01:03:15,250 --> 01:03:16,140 So how to do that? 1248 01:03:16,140 --> 01:03:20,080 Well you can add a child pointer because anybody 1249 01:03:20,080 --> 01:03:22,520 who gets a search message and selects its parent 1250 01:03:22,520 --> 01:03:24,830 could send back a message to that parent saying, hey, 1251 01:03:24,830 --> 01:03:26,330 I'm your child. 1252 01:03:26,330 --> 01:03:29,330 And if you get a search message, and you decide that that's not 1253 01:03:29,330 --> 01:03:31,760 your parent, you can help that guy out 1254 01:03:31,760 --> 01:03:34,407 by sending a message saying you're not my parent. 1255 01:03:34,407 --> 01:03:35,990 In the synchronous case, he would just 1256 01:03:35,990 --> 01:03:37,864 know that, if he didn't get a parent message. 1257 01:03:37,864 --> 01:03:40,970 But things are going to get more complicated. 1258 01:03:40,970 --> 01:03:43,517 So we'll send parent or non-parent responses 1259 01:03:43,517 --> 01:03:44,475 to the search messages. 1260 01:03:49,770 --> 01:03:52,300 Suppose we want to compute the distances from v0, 1261 01:03:52,300 --> 01:03:53,630 not just who the parents are. 1262 01:03:53,630 --> 01:03:55,310 Well that's easy. 1263 01:03:55,310 --> 01:03:58,190 Everybody can just record its distance, as well as 1264 01:03:58,190 --> 01:04:01,670 its parent and the mark.
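Both augmentations, child pointers and distances, fit into the same round structure. The sketch below is one way it could look, with names invented for illustration (`bfs_with_children` and the sample graph are assumptions). The "parent" and "non-parent" responses are not modeled as separate messages; the parent's child list is updated directly as a stand-in for receiving a "parent" response. Each search message carries the sender's distance, and the receiver records that value plus one.

```python
def bfs_with_children(adj, root):
    """Synchronous BFS that also records distances and child pointers."""
    marked = {root}
    parent = {}
    dist = {root: 0}
    children = {u: [] for u in adj}
    senders = [root]
    while senders:
        inbox = {}   # receiver -> list of (sender UID, sender distance)
        for u in senders:
            for v in adj[u]:
                inbox.setdefault(v, []).append((u, dist[u]))
        senders = []
        for v, heard in inbox.items():
            if v in marked:
                continue                 # v would answer "non-parent" here
            p, d = min(heard)            # arbitrary choice: smallest sender UID
            marked.add(v)
            parent[v] = p
            dist[v] = d + 1              # received distance plus one
            children[p].append(v)        # stands in for v's "parent" response
            senders.append(v)
    return parent, dist, children


graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
parent, dist, children = bfs_with_children(graph, 0)
print(dist)
print(children)
```

In this run node 2 ends up with an empty child list even though it has an unmarked-at-the-time neighbor, because node 3 chose node 1; that is the case where a "non-parent" response would be sent.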
1265 01:04:01,670 --> 01:04:04,670 And then you just include your own distance value 1266 01:04:04,670 --> 01:04:06,170 in your search message. 1267 01:04:06,170 --> 01:04:09,340 And when somebody receives a search message, 1268 01:04:09,340 --> 01:04:13,820 it sets its own distance to the received distance plus 1. 1269 01:04:13,820 --> 01:04:17,750 So we can just keep track and add one to the distance. 1270 01:04:17,750 --> 01:04:20,380 It's easy to augment this algorithm 1271 01:04:20,380 --> 01:04:21,630 to get this extra information. 1272 01:04:24,630 --> 01:04:26,380 All right, now how do the processes know 1273 01:04:26,380 --> 01:04:27,463 when this is all finished? 1274 01:04:30,140 --> 01:04:32,011 So everybody was able to output parent. 1275 01:04:32,011 --> 01:04:33,010 I know who my parent is. 1276 01:04:33,010 --> 01:04:36,870 But how does anybody know when the entire tree 1277 01:04:36,870 --> 01:04:39,770 has been produced? 1278 01:04:39,770 --> 01:04:42,820 Not so obvious. 1279 01:04:42,820 --> 01:04:46,270 So in some settings, you might know an upper bound 1280 01:04:46,270 --> 01:04:48,350 on the depth of the tree. 1281 01:04:48,350 --> 01:04:51,477 And then you could just wait for that number of rounds. 1282 01:04:51,477 --> 01:04:52,810 But what if you don't know that? 1283 01:04:52,810 --> 01:04:54,476 You don't know anything about the graph. 1284 01:04:54,476 --> 01:04:57,360 Nobody knows. 1285 01:04:57,360 --> 01:04:59,460 So let's come up with an algorithm 1286 01:04:59,460 --> 01:05:04,660 for process i0, the root, to know definitively 1287 01:05:04,660 --> 01:05:07,700 that the tree has been completely constructed. 1288 01:05:07,700 --> 01:05:08,200 Ideas? 1289 01:05:14,100 --> 01:05:15,890 You're creating this by search messages. 1290 01:05:15,890 --> 01:05:17,889 How is i0 going to know when it's done? 1291 01:05:25,872 --> 01:05:26,372 Yeah.
1292 01:05:26,372 --> 01:05:27,869 AUDIENCE: Every time you mark a node, 1293 01:05:27,869 --> 01:05:29,865 the node can send a message back to its parent, 1294 01:05:29,865 --> 01:05:30,863 saying hi, I've been marked. 1295 01:05:30,863 --> 01:05:33,154 Then you can probably get all the way back to the root. 1296 01:05:33,154 --> 01:05:36,601 And then the root can count the number of-- actually, 1297 01:05:36,601 --> 01:05:37,860 no if the root doesn't-- 1298 01:05:37,860 --> 01:05:39,985 NANCY LYNCH: Root doesn't know the number of nodes. 1299 01:05:39,985 --> 01:05:41,543 So that's a good idea. 1300 01:05:41,543 --> 01:05:43,042 AUDIENCE: If you don't have a child, 1301 01:05:43,042 --> 01:05:45,507 you can tell your parent that you don't have a child. 1302 01:05:48,545 --> 01:05:49,920 NANCY LYNCH: That's a good start. 1303 01:05:49,920 --> 01:05:51,230 Was there another? 1304 01:05:51,230 --> 01:05:51,737 Yeah. 1305 01:05:51,737 --> 01:05:53,153 AUDIENCE: More generally, you just 1306 01:05:53,153 --> 01:05:55,885 send a signal when you know your sub-tree is done. 1307 01:05:55,885 --> 01:05:58,010 NANCY LYNCH: When you know your sub-tree is done, 1308 01:05:58,010 --> 01:06:00,770 so that means you're going to be communicating something 1309 01:06:00,770 --> 01:06:02,170 up the tree. 1310 01:06:02,170 --> 01:06:05,640 Right, so that's the idea that you're working toward. 1311 01:06:05,640 --> 01:06:08,980 So a termination algorithm to inform i0 1312 01:06:08,980 --> 01:06:11,550 when the tree is completely constructed. 1313 01:06:11,550 --> 01:06:15,080 So let's say that the search messages get their responses. 1314 01:06:15,080 --> 01:06:17,700 So everybody knows which nodes are their-- 1315 01:06:17,700 --> 01:06:22,290 which neighbors are its children, and which are not. 1316 01:06:22,290 --> 01:06:24,810 So suppose a node has gotten responses 1317 01:06:24,810 --> 01:06:30,830 to all of its search messages, knows who all its children are.
1318 01:06:30,830 --> 01:06:33,260 Now the leaves in this tree are going 1319 01:06:33,260 --> 01:06:34,730 to know that they're leaves. 1320 01:06:34,730 --> 01:06:37,880 How do they know that? 1321 01:06:37,880 --> 01:06:41,860 Propagating all these search messages, and I'm a leaf. 1322 01:06:41,860 --> 01:06:43,524 How do I know I'm a leaf? 1323 01:06:43,524 --> 01:06:44,940 AUDIENCE: You can't have children. 1324 01:06:44,940 --> 01:06:47,410 NANCY LYNCH: Yeah, you send all these search messages, 1325 01:06:47,410 --> 01:06:51,810 and everybody says, sorry you're not my parent. 1326 01:06:51,810 --> 01:06:54,390 So you know you have no children because of the kind 1327 01:06:54,390 --> 01:06:57,140 of responses you get. 1328 01:06:57,140 --> 01:06:58,540 So now we're going to use what we 1329 01:06:58,540 --> 01:07:01,300 call a convergecast strategy. 1330 01:07:01,300 --> 01:07:03,160 Broadcast is sending things out. 1331 01:07:03,160 --> 01:07:06,320 Convergecast is fanning in information back 1332 01:07:06,320 --> 01:07:09,560 to the top of the tree. 1333 01:07:09,560 --> 01:07:11,800 So the convergecast would say, all right, 1334 01:07:11,800 --> 01:07:15,200 so the leaves would send a message to their parents 1335 01:07:15,200 --> 01:07:18,060 saying they're done. 1336 01:07:18,060 --> 01:07:23,600 Now if I'm some node in the middle of the tree, 1337 01:07:23,600 --> 01:07:24,780 how do I know I'm done? 1338 01:07:24,780 --> 01:07:28,420 Well it's what you said. 1339 01:07:28,420 --> 01:07:31,750 You know that you can figure out when 1340 01:07:31,750 --> 01:07:33,990 your entire sub-tree is done. 1341 01:07:33,990 --> 01:07:37,750 Well first of all, you have to know who your children are. 1342 01:07:37,750 --> 01:07:39,760 It's kind of a two stage process. 1343 01:07:39,760 --> 01:07:42,610 You have to know who your children are, 1344 01:07:42,610 --> 01:07:46,530 by having received responses to all your search messages.
1345 01:07:46,530 --> 01:07:49,410 And you wait to receive done messages from all 1346 01:07:49,410 --> 01:07:51,247 of your actual children. 1347 01:07:51,247 --> 01:07:53,080 So if I'm sitting in the middle of the tree, 1348 01:07:53,080 --> 01:07:56,020 and I've got done messages from all my children, 1349 01:07:56,020 --> 01:07:57,850 I know my whole sub-tree is done. 1350 01:07:57,850 --> 01:08:02,140 Then I can send the done message to my parent. 1351 01:08:02,140 --> 01:08:04,180 Got that? 1352 01:08:04,180 --> 01:08:05,870 That's how convergecast works. 1353 01:08:05,870 --> 01:08:09,690 And when it reaches the top, if i0 1354 01:08:09,690 --> 01:08:12,580 knows who its children are, and it receives done messages 1355 01:08:12,580 --> 01:08:15,540 from all its children, it knows the whole tree is done. 1356 01:08:15,540 --> 01:08:20,100 So it can output that the tree construction is complete. 1357 01:08:20,100 --> 01:08:22,420 And it could tell the others by sending 1358 01:08:22,420 --> 01:08:25,859 a message down the tree, so they all know as well. 1359 01:08:25,859 --> 01:08:27,194 Questions? 1360 01:08:27,194 --> 01:08:32,674 AUDIENCE: Wouldn't i0 be the last one to know? 1361 01:08:32,674 --> 01:08:34,090 NANCY LYNCH: He'd be the last one. 1362 01:08:34,090 --> 01:08:37,390 No, he'd be the first one to know that the whole tree is 1363 01:08:37,390 --> 01:08:38,670 complete. 1364 01:08:38,670 --> 01:08:41,450 Everybody else knows when their sub-tree is complete. 1365 01:08:41,450 --> 01:08:45,410 So i0 still has to now send another message down the tree 1366 01:08:45,410 --> 01:08:48,276 to tell everyone else the entire tree is complete. 1367 01:08:48,276 --> 01:08:49,359 Is there another question? 1368 01:08:52,289 --> 01:08:53,830 All right so this isn't showing that. 
1369 01:08:53,830 --> 01:08:56,279 This is just showing done messages, which are actually 1370 01:08:56,279 --> 01:08:58,560 going in the opposite direction from these edges, 1371 01:08:58,560 --> 01:08:59,390 going up the tree. 1372 01:08:59,390 --> 01:09:02,060 But you can just see how they propagate up 1373 01:09:02,060 --> 01:09:04,670 until the root says done. 1374 01:09:04,670 --> 01:09:05,180 No big deal. 1375 01:09:08,149 --> 01:09:10,819 Complexity for termination. 1376 01:09:10,819 --> 01:09:14,130 Well it just takes at most diameter rounds and n messages 1377 01:09:14,130 --> 01:09:16,880 for this done information to come up to the top, 1378 01:09:16,880 --> 01:09:19,229 once the tree actually is finished. 1379 01:09:19,229 --> 01:09:21,130 Because now you're just sending messages 1380 01:09:21,130 --> 01:09:25,130 on the paths in this tree, which are only, 1381 01:09:25,130 --> 01:09:29,029 at most, diameter in length. 1382 01:09:29,029 --> 01:09:32,920 And then the process i0 can tell everybody else. 1383 01:09:32,920 --> 01:09:34,540 It doesn't take very long either. 1384 01:09:37,260 --> 01:09:41,149 Applications, well suppose you construct a tree like this. 1385 01:09:41,149 --> 01:09:44,460 And process i0 now wants to use it to communicate. 1386 01:09:44,460 --> 01:09:46,450 It wants to send a whole batch of messages 1387 01:09:46,450 --> 01:09:47,819 to all the other nodes. 1388 01:09:47,819 --> 01:09:49,990 It can just send them now on the tree. 1389 01:09:49,990 --> 01:09:52,790 It's an easy way to make sure messages reach 1390 01:09:52,790 --> 01:09:54,240 everybody else in the network. 1391 01:09:54,240 --> 01:09:57,310 Just send them on the edges of the breadth-first spanning 1392 01:09:57,310 --> 01:09:59,580 tree.
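[Editor's sketch, not part of the lecture.] The tree construction with parent and non-parent responses, the distance bookkeeping, and the convergecast of done messages described above can be simulated round by round on one machine. This is a minimal sketch under stated assumptions: the function name bfs_with_termination and the adjacency-dictionary input are hypothetical, and the sequential loop only imitates synchronous rounds; it is not the lecturer's own code.

```python
from collections import defaultdict


def bfs_with_termination(adj, root):
    """Simulate synchronous BFS, then a convergecast of "done" messages.

    adj  : dict mapping each node to a list of its neighbors (undirected).
    root : the starting node (i0 in the lecture).
    Returns (parent, children, dist, done_rounds), where done_rounds is the
    number of convergecast rounds until the root hears from all its children.
    """
    parent = {root: None}
    dist = {root: 0}
    children = defaultdict(set)
    frontier = [root]  # nodes that were marked in the previous round

    while frontier:
        newly_marked = []
        for u in frontier:
            for v in adj[u]:  # u sends search(dist[u]) to every neighbor
                if v not in parent:
                    # v picks u as parent and answers "parent".
                    parent[v] = u
                    dist[v] = dist[u] + 1
                    children[u].add(v)
                    newly_marked.append(v)
                # Otherwise v answers "non-parent"; u records no child.
        frontier = newly_marked

    # Convergecast: a node sends "done" once all its children have.
    pending = {u: len(children[u]) for u in parent}
    ready = [u for u in parent if u != root and pending[u] == 0]  # leaves
    done_rounds = 0
    while pending[root] > 0:
        done_rounds += 1
        next_ready = []
        for u in ready:
            p = parent[u]  # u sends "done" to its parent
            pending[p] -= 1
            if pending[p] == 0 and p != root:
                next_ready.append(p)
        ready = next_ready
    return parent, dict(children), dist, done_rounds
```

In this simulation each non-root node sends exactly one done message, and the number of convergecast rounds equals the depth of the tree, which matches the at-most-diameter bound quoted above.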
1393 01:09:59,580 --> 01:10:03,610 So now the messages, each individual message 1394 01:10:03,610 --> 01:10:07,370 takes at most n message instances 1395 01:10:07,370 --> 01:10:09,650 along the edges of the tree, because you only have 1396 01:10:09,650 --> 01:10:11,570 to traverse the tree edges. 1397 01:10:11,570 --> 01:10:15,920 No more dependence on the total number of edges in the network. 1398 01:10:15,920 --> 01:10:19,000 And in fact, you can save time by pipelining 1399 01:10:19,000 --> 01:10:20,280 a series of messages. 1400 01:10:20,280 --> 01:10:23,410 So you can send them one round after the other. 1401 01:10:28,180 --> 01:10:31,740 The other way, suppose you want to compute something globally. 1402 01:10:31,740 --> 01:10:34,870 Suppose everybody starts with some initial value. 1403 01:10:34,870 --> 01:10:38,590 And process i0 is going to try to determine 1404 01:10:38,590 --> 01:10:42,650 the value of some function of everybody's initial value, 1405 01:10:42,650 --> 01:10:46,530 like the minimum or maximum or the sum or anything. 1406 01:10:46,530 --> 01:10:48,990 Well you can do this while convergecasting 1407 01:10:48,990 --> 01:10:52,910 on an already built BFS tree. 1408 01:10:52,910 --> 01:10:56,470 So everybody can just send their information up the tree, 1409 01:10:56,470 --> 01:10:58,290 and i0 can collect it all. 1410 01:10:58,290 --> 01:11:00,933 In general, you can accumulate, you 1411 01:11:00,933 --> 01:11:04,610 can do data aggregation as you go up the paths of the tree. 1412 01:11:04,610 --> 01:11:09,910 So the message size doesn't blow up. 1413 01:11:09,910 --> 01:11:13,520 So if you want, for example, the sum of everybody's values, 1414 01:11:13,520 --> 01:11:16,260 everybody just sends their values up in a convergecast. 1415 01:11:16,260 --> 01:11:18,890 And each node computes the sum of all the values 1416 01:11:18,890 --> 01:11:21,550 in its sub-tree. 1417 01:11:21,550 --> 01:11:24,100 So this is pretty efficient. 
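[Editor's sketch, not part of the lecture.] The aggregation idea above, where each node combines the values from its sub-tree and passes a single value up, can be written as a short function. This is a sketch under assumptions: the name convergecast, the children dictionary, and the combine parameter are hypothetical, and the tree is assumed to be already built.

```python
import operator


def convergecast(children, values, root, combine=operator.add):
    """Aggregate one value per node up a rooted tree.

    children : dict mapping a node to an iterable of its children.
    values   : dict mapping every node to its initial value.
    combine  : associative, commutative function such as add, min, or max.
    Returns (result_at_root, per_node_subtree_result).
    """
    # Visit nodes so that every parent appears before its descendants.
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, ()))

    subtree = dict(values)
    # Walk the order backwards: children are finished before their parent,
    # so each node forwards a single combined value, not a list of values.
    for u in reversed(order):
        for c in children.get(u, ()):
            subtree[u] = combine(subtree[u], subtree[c])
    return subtree[root], subtree
```

Passing min or max as combine gives the minimum or maximum instead of the sum; in every case a node sends one value to its parent, which is why the message size does not blow up.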
1418 01:11:24,100 --> 01:11:26,722 Make sense? 1419 01:11:26,722 --> 01:11:27,680 I'm going to skip this. 1420 01:11:27,680 --> 01:11:30,110 But you could do leader election in a general graph, 1421 01:11:30,110 --> 01:11:32,470 if you don't already have a leader 1422 01:11:32,470 --> 01:11:35,550 i0, by having everybody run a breadth-first search 1423 01:11:35,550 --> 01:11:36,160 in parallel. 1424 01:11:36,160 --> 01:11:37,359 But we'll skip that. 1425 01:11:37,359 --> 01:11:39,400 Because I just wanted to have a couple of minutes 1426 01:11:39,400 --> 01:11:43,900 to start the last topic, and we'll pick it up next time. 1427 01:11:43,900 --> 01:11:47,060 So it's the obvious extension. 1428 01:11:47,060 --> 01:11:49,350 Instead of just breadth-first search trees, 1429 01:11:49,350 --> 01:11:51,810 let's put weights on the edges and try 1430 01:11:51,810 --> 01:11:57,170 to compute shortest paths trees in terms of the total weight 1431 01:11:57,170 --> 01:11:58,631 of the path. 1432 01:12:01,560 --> 01:12:04,350 So we're going to add weights. 1433 01:12:04,350 --> 01:12:05,440 It's an undirected graph. 1434 01:12:05,440 --> 01:12:08,160 So it's just a weight for each undirected edge. 1435 01:12:11,290 --> 01:12:19,160 I'll still have a starting node, vertex v0 with process i0. 1436 01:12:19,160 --> 01:12:22,110 Still have unique identifiers. 1437 01:12:22,110 --> 01:12:24,670 And I'll assume the processes know who their neighbors are. 1438 01:12:24,670 --> 01:12:27,270 And they know the weights of the incident 1439 01:12:27,270 --> 01:12:29,659 edges, their adjacent edges. 1440 01:12:29,659 --> 01:12:31,200 But otherwise they don't need to know 1441 01:12:31,200 --> 01:12:34,590 anything else about the graph. 1442 01:12:34,590 --> 01:12:36,890 So again, this is a familiar problem. 1443 01:12:36,890 --> 01:12:38,990 But we're looking at it in a very different way, 1444 01:12:38,990 --> 01:12:40,160 by distributing it.
1445 01:12:43,360 --> 01:12:47,640 So the processes are supposed to compute a shortest paths tree, 1446 01:12:47,640 --> 01:12:49,960 in the sense that everybody should 1447 01:12:49,960 --> 01:12:52,050 output its parent in the tree. 1448 01:12:52,050 --> 01:12:55,440 And let's say they output the distance as well, 1449 01:12:55,440 --> 01:12:58,680 the weighted distance from the root node. 1450 01:13:03,540 --> 01:13:06,920 So this is called Bellman-Ford's algorithm. 1451 01:13:06,920 --> 01:13:11,970 Again it's got the same name in the distributed setting. 1452 01:13:11,970 --> 01:13:13,870 The Bellman-Ford shortest paths algorithm. 1453 01:13:17,230 --> 01:13:20,710 So everybody is keeping track of their current best 1454 01:13:20,710 --> 01:13:23,630 distance that they know, and their parent. 1455 01:13:23,630 --> 01:13:27,270 And they know their unique identifier. 1456 01:13:27,270 --> 01:13:29,040 And here's how the algorithm works. 1457 01:13:29,040 --> 01:13:31,170 This will look familiar from when 1458 01:13:31,170 --> 01:13:34,130 you had Bellman-Ford earlier. 1459 01:13:34,130 --> 01:13:37,650 At every round, everybody is going 1460 01:13:37,650 --> 01:13:40,752 to send its distance to its neighbors. 1461 01:13:40,752 --> 01:13:42,460 Instead of just sending a search message, 1462 01:13:42,460 --> 01:13:46,450 now it will send its actual distance information. 1463 01:13:46,450 --> 01:13:50,720 And you receive the messages from your neighbors. 1464 01:13:50,720 --> 01:13:55,240 And now you do a relaxation step, as you've seen before. 1465 01:13:55,240 --> 01:13:56,990 You look at the current distance you have.
1466 01:13:56,990 --> 01:13:59,610 And you see if you've gotten a new distance 1467 01:13:59,610 --> 01:14:03,650 from a neighbor, such that if you add the new distance you 1468 01:14:03,650 --> 01:14:06,600 receive to the weight of the edge between yourself 1469 01:14:06,600 --> 01:14:08,690 and that neighbor, you get something better 1470 01:14:08,690 --> 01:14:10,350 than what you had before. 1471 01:14:10,350 --> 01:14:14,220 If you get that, then you're going to improve your distance. 1472 01:14:14,220 --> 01:14:16,170 And if you improve your distance, 1473 01:14:16,170 --> 01:14:19,070 then you're going to reset your parent 1474 01:14:19,070 --> 01:14:24,720 to the sender of this new, better distance information. 1475 01:14:24,720 --> 01:14:26,610 So does this algorithm make sense? 1476 01:14:26,610 --> 01:14:28,580 It's like what you saw before. 1477 01:14:28,580 --> 01:14:32,470 But there's no running through all the nodes. 1478 01:14:32,470 --> 01:14:34,310 Each node is doing its own thing. 1479 01:14:34,310 --> 01:14:37,040 It's waiting to get better distance information 1480 01:14:37,040 --> 01:14:39,100 and re-computing. 1481 01:14:39,100 --> 01:14:41,750 And then it's going to be sending out its better 1482 01:14:41,750 --> 01:14:43,316 information at the next round. 1483 01:14:46,100 --> 01:14:46,710 Question? 1484 01:14:46,710 --> 01:14:49,660 So this is kind of a jump in the way of thinking. 1485 01:14:54,060 --> 01:14:56,990 All right, so now I'm just going to end basically 1486 01:14:56,990 --> 01:14:59,560 with an animation that'll show you the kinds of things 1487 01:14:59,560 --> 01:15:01,930 that happen here. 1488 01:15:01,930 --> 01:15:07,100 All right so you start out with the initial node. 1489 01:15:07,100 --> 01:15:10,590 And what's recorded in the circle is the best distances. 1490 01:15:10,590 --> 01:15:14,522 The rest of these, the best distance they know is infinity. 1491 01:15:14,522 --> 01:15:15,480 So I didn't write that. 
1492 01:15:15,480 --> 01:15:23,470 So this guy knows 0. After one round, he sent two messages. 1493 01:15:23,470 --> 01:15:25,920 The best distance each of these guys knows 1494 01:15:25,920 --> 01:15:30,360 is just the weight of the edge between v0 and itself. 1495 01:15:30,360 --> 01:15:33,410 So this guy's now estimating its distance at 16 1496 01:15:33,410 --> 01:15:36,080 and this guy at 1. 1497 01:15:36,080 --> 01:15:38,930 16 is not very good because there are actually very roundabout routes 1498 01:15:38,930 --> 01:15:40,070 that can get there. 1499 01:15:40,070 --> 01:15:45,310 But it's going to take us some time to make that adjustment. 1500 01:15:45,310 --> 01:15:50,240 After two rounds, everybody is sending their distance 1501 01:15:50,240 --> 01:15:50,880 information. 1502 01:15:50,880 --> 01:15:54,700 But now we get a correction here. 1503 01:15:54,700 --> 01:15:57,110 This used to say 16. 1504 01:15:57,110 --> 01:15:59,710 But now we have a two hop path that 1505 01:15:59,710 --> 01:16:02,170 gives you a better distance. 1506 01:16:02,170 --> 01:16:04,000 So you get the 1 plus the 14. 1507 01:16:04,000 --> 01:16:08,850 So he's going to hear about the distance of 15 1508 01:16:08,850 --> 01:16:11,390 as a result of what 1 sends. 1509 01:16:11,390 --> 01:16:16,740 And some new guys get their distances calculated. 1510 01:16:16,740 --> 01:16:21,680 And then after three rounds, it gets a little bit complicated. 1511 01:16:21,680 --> 01:16:24,910 So maybe I'm just going to flip through it quickly and let 1512 01:16:24,910 --> 01:16:26,500 you study later. 1513 01:16:26,500 --> 01:16:29,340 But you see that you keep getting improvements, 1514 01:16:29,340 --> 01:16:32,390 as you perform relaxation steps. 1515 01:16:32,390 --> 01:16:36,270 As information gets to somebody by better paths that 1516 01:16:36,270 --> 01:16:38,680 happen to have more hops, they're 1517 01:16:38,680 --> 01:16:40,560 going to be reducing their estimates.
1518 01:16:40,560 --> 01:16:44,640 I'm going to flip, and you see that this guy's estimate 1519 01:16:44,640 --> 01:16:47,050 is going down. 1520 01:16:47,050 --> 01:16:49,920 And in the end, after eight rounds of this, 1521 01:16:49,920 --> 01:16:52,180 you end up with a very roundabout path 1522 01:16:52,180 --> 01:16:56,430 that actually gives this guy a much better estimate. 1523 01:16:56,430 --> 01:16:58,640 So you can see how that works. 1524 01:17:01,190 --> 01:17:03,660 So the claim is that eventually, every process 1525 01:17:03,660 --> 01:17:08,270 will have its distance being the correct minimum weight 1526 01:17:08,270 --> 01:17:12,710 of a path, and its parent will be correct. 1527 01:17:12,710 --> 01:17:14,710 I think maybe this is a good place to stop. 1528 01:17:14,710 --> 01:17:17,810 We'll pick up with this algorithm and its analysis. 1529 01:17:17,810 --> 01:17:19,910 Most of next time is going to be spent 1530 01:17:19,910 --> 01:17:22,440 on asynchronous algorithms, which 1531 01:17:22,440 --> 01:17:25,560 is a whole other level of complication. 1532 01:17:25,560 --> 01:17:27,820 So I'll see you on Thursday.
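[Editor's sketch, not part of the lecture.] The round structure of the synchronous Bellman-Ford algorithm described above (send your distance, receive from neighbors, relax, reset your parent on improvement) can be simulated as follows. This is a sketch under assumptions: the function name sync_bellman_ford, the nested-dictionary graph, and the node names "a" and "b" are hypothetical. The default of n - 1 rounds is the standard bound for Bellman-Ford, not something stated in this part of the lecture. The small example reuses only the edge weights 16, 1, and 14 mentioned in the animation, not the full graph from the slides.

```python
import math


def sync_bellman_ford(weights, root, rounds=None):
    """Simulate synchronous distributed Bellman-Ford.

    weights : dict u -> {v: w}, symmetric, for an undirected weighted graph.
    root    : the starting node (v0 in the lecture).
    rounds  : how many rounds to run; defaults to n - 1 (an assumption here).
    Returns (dist, parent) after the requested number of rounds.
    """
    dist = {u: math.inf for u in weights}
    parent = {u: None for u in weights}
    dist[root] = 0
    total = len(weights) - 1 if rounds is None else rounds

    for _ in range(total):
        # Send phase: everybody sends the distance it held at the start
        # of the round, so take a snapshot before anyone updates.
        sent = dict(dist)
        # Receive phase: each node relaxes against every neighbor's message.
        for u in weights:
            for v, w in weights[u].items():
                if sent[v] + w < dist[u]:
                    dist[u] = sent[v] + w
                    parent[u] = v  # the sender of the better distance
    return dist, parent


# Fragment of the lecture's example: v0-a has weight 16, v0-b has weight 1,
# and b-a has weight 14. Node names "a" and "b" are made up for illustration.
graph = {
    "v0": {"a": 16, "b": 1},
    "a": {"v0": 16, "b": 14},
    "b": {"v0": 1, "a": 14},
}
after_one, _ = sync_bellman_ford(graph, "v0", rounds=1)
after_two, parents = sync_bellman_ford(graph, "v0", rounds=2)
```

On this fragment the estimate for "a" is 16 after one round and drops to 15 after two rounds, with "b" as its parent, mirroring the correction shown in the animation.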