1 00:00:00,070 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high quality educational resources for free. 5 00:00:10,730 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,217 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,217 --> 00:00:17,842 at ocw.mit.edu. 8 00:00:20,930 --> 00:00:23,190 ERIK DEMAINE: All right, let's get started. 9 00:00:23,190 --> 00:00:28,520 Today, we have another cool graph algorithm or problem. 10 00:00:28,520 --> 00:00:31,430 Actually, we'll have two algorithms. 11 00:00:31,430 --> 00:00:34,350 The problem is called minimum spanning tree. 12 00:00:34,350 --> 00:00:37,250 You can probably guess from the title what it's trying to do. 13 00:00:37,250 --> 00:00:38,990 We'll see two algorithms for doing it. 14 00:00:38,990 --> 00:00:42,200 Both of them are in the category of greedy algorithms, which 15 00:00:42,200 --> 00:00:44,710 is something we've seen a couple of times 16 00:00:44,710 --> 00:00:48,960 already in 6.046, starting with lecture 1. 17 00:00:48,960 --> 00:00:51,430 This is the definition of greedy algorithm from lecture 1, 18 00:00:51,430 --> 00:00:52,710 roughly. 19 00:00:52,710 --> 00:00:56,850 The idea is to always make greedy choices, meaning 20 00:00:56,850 --> 00:00:59,130 the choice is locally best. 21 00:00:59,130 --> 00:01:02,360 For right now, it seems like a good thing to do, 22 00:01:02,360 --> 00:01:06,540 but maybe in the future it will screw you over. 23 00:01:06,540 --> 00:01:08,800 And if you have a correct greedy algorithm, 24 00:01:08,800 --> 00:01:10,900 you prove that it won't screw you over. 
25 00:01:10,900 --> 00:01:13,400 So it's sort of like Cookie Monster here, always 26 00:01:13,400 --> 00:01:15,670 locally seems like a good idea to eat another cookie, 27 00:01:15,670 --> 00:01:19,020 but maybe it'll bite you in the future. 28 00:01:19,020 --> 00:01:23,300 So today we will embrace our inner Cookie Monster 29 00:01:23,300 --> 00:01:26,530 and eat as many-- eat the largest cookie first, 30 00:01:26,530 --> 00:01:29,397 would be the standard algorithm for Cookie Monster. 31 00:01:29,397 --> 00:01:31,480 I don't know if you learned that in Sesame Street, 32 00:01:31,480 --> 00:01:34,470 but-- all right. 33 00:01:34,470 --> 00:01:36,720 So what's the problem? 34 00:01:36,720 --> 00:01:37,930 Minimum spanning tree. 35 00:01:37,930 --> 00:01:40,350 Can anyone tell me what a tree is? 36 00:01:40,350 --> 00:01:44,240 Formally, not the outside thing. 37 00:01:44,240 --> 00:01:46,590 In graph land. 38 00:01:46,590 --> 00:01:47,730 Acyclic graph, close. 39 00:01:50,450 --> 00:01:52,912 Connected acyclic graph, good. 40 00:01:52,912 --> 00:01:53,620 That's important. 41 00:01:58,850 --> 00:02:07,900 This is 6.042 stuff. 42 00:02:07,900 --> 00:02:11,400 OK, so how about a spanning tree? 43 00:02:15,331 --> 00:02:15,830 Sorry? 44 00:02:15,830 --> 00:02:17,110 AUDIENCE: It contains all the vertices. 45 00:02:17,110 --> 00:02:18,901 ERIK DEMAINE: It contains all the vertices. 46 00:02:18,901 --> 00:02:19,590 Yeah. 47 00:02:19,590 --> 00:02:21,705 So let me go over here. 48 00:02:27,350 --> 00:02:31,890 Spanning means it contains all the vertices, so implicit here, 49 00:02:31,890 --> 00:02:34,210 I guess, is subtree or subgraph. 50 00:02:34,210 --> 00:02:35,810 You're given a graph. 51 00:02:35,810 --> 00:02:38,240 You want a spanning tree of that graph. 52 00:02:44,210 --> 00:02:47,000 It's going to be a tree that lives inside the graph.
53 00:02:47,000 --> 00:02:49,330 So we're going to take some of the edges of G, 54 00:02:49,330 --> 00:02:52,390 make a tree out of them, make a connected acyclic graph. 55 00:02:52,390 --> 00:02:56,320 And that tree should hit all the vertices in G. 56 00:02:56,320 --> 00:03:00,700 So this is going to be a subset of the edges, or subgraph. 57 00:03:06,060 --> 00:03:08,160 Those edges should form a tree. 58 00:03:13,520 --> 00:03:26,300 And, I'll say, hit all vertices of G. 59 00:03:26,300 --> 00:03:28,280 OK, if I just said they should form a tree, 60 00:03:28,280 --> 00:03:30,600 then I could say, well, I'll take no edges, 61 00:03:30,600 --> 00:03:33,160 and here's a tree with one vertex. 62 00:03:33,160 --> 00:03:34,594 That's not very interesting. 63 00:03:34,594 --> 00:03:36,260 You want a vertex-- you want, basically, 64 00:03:36,260 --> 00:03:38,660 the vertex set of the tree to be the same as the vertex 65 00:03:38,660 --> 00:03:40,000 set of the graph. 66 00:03:40,000 --> 00:03:41,547 That's the spanning property. 67 00:03:41,547 --> 00:03:43,005 But you still want it to be a tree, 68 00:03:43,005 --> 00:03:46,840 so you want it to be connected and you want it to be acyclic. 69 00:03:46,840 --> 00:03:50,600 Now if G is disconnected, this is impossible. 70 00:03:50,600 --> 00:03:52,690 And for that, you could define a spanning forest 71 00:03:52,690 --> 00:03:55,260 to be like a maximal thing like this, 72 00:03:55,260 --> 00:03:58,240 but we'll focus on the case here as G is connected. 73 00:03:58,240 --> 00:03:59,900 That's the interesting case. 74 00:03:59,900 --> 00:04:03,170 And so we can get a spanning tree. 75 00:04:03,170 --> 00:04:03,670 All right? 76 00:04:03,670 --> 00:04:06,090 So what is this minimum spanning tree problem? 77 00:04:09,540 --> 00:04:11,370 Minimum spanning tree. 78 00:04:11,370 --> 00:04:22,400 We're given a weighted graph, just like last time, 79 00:04:22,400 --> 00:04:23,380 with shortest paths. 
80 00:04:23,380 --> 00:04:34,720 We have an edge weight function W giving me a real number, say, 81 00:04:34,720 --> 00:04:36,970 for every edge. 82 00:04:36,970 --> 00:04:42,125 And we want to find a spanning tree of minimum total weight. 83 00:04:56,330 --> 00:04:59,460 So I'm going to define the weight of a tree T 84 00:04:59,460 --> 00:05:03,830 to be the sum over all edges in T, 85 00:05:03,830 --> 00:05:07,250 because I'm viewing a spanning tree as a set of edges, 86 00:05:07,250 --> 00:05:11,000 of the weight of that edge. 87 00:05:11,000 --> 00:05:13,070 OK, so pretty much what you would expect. 88 00:05:13,070 --> 00:05:14,520 Minimum weight spanning tree. 89 00:05:17,050 --> 00:05:19,320 It's a relatively simple problem, 90 00:05:19,320 --> 00:05:22,110 but it's not so easy to find an algorithm. 91 00:05:22,110 --> 00:05:25,390 You need to prove a lot to make sure that you really 92 00:05:25,390 --> 00:05:27,550 find the right tree. 93 00:05:27,550 --> 00:05:30,710 I guess the really naive algorithm here 94 00:05:30,710 --> 00:05:35,480 would be to try all spanning trees, 95 00:05:35,480 --> 00:05:37,580 compute the weight of each spanning tree 96 00:05:37,580 --> 00:05:39,760 and return the minimum. 97 00:05:39,760 --> 00:05:40,830 That sounds reasonable. 98 00:05:40,830 --> 00:05:42,890 That's correct. 99 00:05:42,890 --> 00:05:54,620 But it's bad, because-- n to the fourth, that would be nice. 100 00:05:54,620 --> 00:05:55,570 It's larger than that. 101 00:06:01,290 --> 00:06:05,850 Maybe not so obvious, but it can be exponential. 102 00:06:05,850 --> 00:06:08,096 Here's a graph where the number of spanning trees 103 00:06:08,096 --> 00:06:08,720 is exponential. 
104 00:06:12,980 --> 00:06:16,690 This is a complete bipartite graph 105 00:06:16,690 --> 00:06:21,190 with two vertices on one side and n vertices on the other, 106 00:06:21,190 --> 00:06:24,990 and so you can-- let's say we put these two 107 00:06:24,990 --> 00:06:26,910 edges into the spanning tree. 108 00:06:26,910 --> 00:06:30,230 And now, for each of these vertices, 109 00:06:30,230 --> 00:06:32,460 we can choose whether it connects to the left vertex 110 00:06:32,460 --> 00:06:33,320 or the right vertex. 111 00:06:33,320 --> 00:06:35,861 It can only do one, but it could do either one independently. 112 00:06:35,861 --> 00:06:37,630 So maybe this guy chooses the left one, 113 00:06:37,630 --> 00:06:39,172 this one chooses the right one. 114 00:06:39,172 --> 00:06:40,880 This one chooses the left one, and so on. 115 00:06:40,880 --> 00:06:44,740 If I have n vertices down here, I have 2 to the n different 116 00:06:44,740 --> 00:06:46,780 spanning trees. 117 00:06:46,780 --> 00:06:50,632 So there can be an exponential number. 118 00:06:50,632 --> 00:06:52,480 So that algorithm is not so good. 119 00:06:59,400 --> 00:07:00,630 Exponential bad. 120 00:07:00,630 --> 00:07:01,414 Polynomial good. 121 00:07:01,414 --> 00:07:03,580 So today, we're going to get a polynomial algorithm. 122 00:07:03,580 --> 00:07:07,320 In fact, we will get an almost linear time algorithm as fast 123 00:07:07,320 --> 00:07:08,740 as Dijkstra's algorithm. 124 00:07:08,740 --> 00:07:11,970 But we can't use Dijkstra's algorithm, 125 00:07:11,970 --> 00:07:13,640 there's no shortest paths here. 126 00:07:13,640 --> 00:07:16,350 Plus, one of the algorithms will actually look pretty similar.
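[Editor's sketch, not part of the lecture.] The claim that the complete bipartite graph with 2 and n vertices has exponentially many spanning trees can be checked by brute force on small cases. The helper names below (count_spanning_trees, complete_bipartite_2_n) are hypothetical, chosen only for this illustration. For this graph the exact count works out to n * 2^(n-1), which is within a factor of n/2 of the rough 2-to-the-n estimate above and is still exponential.

```python
from itertools import combinations


def count_spanning_trees(num_vertices, edges):
    """Brute force: count edge subsets of size V-1 that connect every vertex.

    A connected subgraph with exactly V-1 edges is a spanning tree.
    """
    count = 0
    for subset in combinations(edges, num_vertices - 1):
        parent = list(range(num_vertices))  # fresh union-find per subset

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        components = num_vertices
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
        if components == 1:
            count += 1
    return count


def complete_bipartite_2_n(n):
    """Vertices 0 and 1 on one side, vertices 2..n+1 on the other side."""
    edges = [(side, 2 + j) for side in (0, 1) for j in range(n)]
    return n + 2, edges


for n in range(1, 6):
    num_vertices, edges = complete_bipartite_2_n(n)
    print(n, count_spanning_trees(num_vertices, edges))
```

The enumeration itself takes exponential time, which is exactly why trying all spanning trees is not a usable algorithm.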
127 00:07:19,410 --> 00:07:22,340 Two lectures ago, the dynamic programming lecture, 128 00:07:22,340 --> 00:07:26,510 we saw an example where we tried to do greedy, 129 00:07:26,510 --> 00:07:28,500 and it gave the wrong answer, and so 130 00:07:28,500 --> 00:07:30,010 we fell back on dynamic programming. 131 00:07:30,010 --> 00:07:33,330 Today, we're going to try to do dynamic programming, 132 00:07:33,330 --> 00:07:36,590 it's going to fail, and we're going to fall back on greedy. 133 00:07:36,590 --> 00:07:37,665 It's like the reverse. 134 00:07:37,665 --> 00:07:39,040 But the way it's going to fail is 135 00:07:39,040 --> 00:07:41,700 we're going to get exponential time initially, 136 00:07:41,700 --> 00:07:44,870 and then greedy will let us get polynomial time. 137 00:07:44,870 --> 00:07:46,390 This is actually a bit unusual. 138 00:07:46,390 --> 00:07:49,250 I would say more typically, dynamic programming 139 00:07:49,250 --> 00:07:51,890 can solve anything, but, you know, with n 140 00:07:51,890 --> 00:07:54,599 to the seventh running time, something slow. 141 00:07:54,599 --> 00:07:56,390 And then you apply greedy, and you get down 142 00:07:56,390 --> 00:07:58,290 to like n or n log n running time. 143 00:07:58,290 --> 00:07:59,560 So that's more common. 144 00:07:59,560 --> 00:08:01,435 But today, we're going to go from exponential 145 00:08:01,435 --> 00:08:02,532 down to polynomial. 146 00:08:02,532 --> 00:08:03,490 And that's pretty nice. 147 00:08:07,280 --> 00:08:07,910 Cool. 148 00:08:07,910 --> 00:08:14,280 So let me tell you a little bit about greedy algorithm 149 00:08:14,280 --> 00:08:15,375 theory, so to speak. 150 00:08:20,697 --> 00:08:21,780 This is from the textbook. 151 00:08:27,070 --> 00:08:32,159 If your problem can be solved by greedy algorithm, 152 00:08:32,159 --> 00:08:34,200 usually you can prove two properties 153 00:08:34,200 --> 00:08:36,880 about that algorithm. 
154 00:08:36,880 --> 00:08:38,975 One of them is called optimal substructure. 155 00:08:49,400 --> 00:08:51,700 And the other is called the greedy choice property. 156 00:08:51,700 --> 00:08:59,850 Optimal substructure should be familiar idea 157 00:08:59,850 --> 00:09:02,460 because it's essentially an encapsulation 158 00:09:02,460 --> 00:09:03,515 of dynamic programming. 159 00:09:33,429 --> 00:09:34,970 Greedy algorithms are, in some sense, 160 00:09:34,970 --> 00:09:36,635 a special form of dynamic programming. 161 00:09:48,300 --> 00:09:50,000 So this is saying something like, 162 00:09:50,000 --> 00:09:52,634 if you can solve subproblems optimally, 163 00:09:52,634 --> 00:09:54,550 smaller subproblems, or whatever, then you can 164 00:09:54,550 --> 00:09:56,480 solve your original problem. 165 00:09:56,480 --> 00:09:58,640 And this may happen recursively, whatever. 166 00:09:58,640 --> 00:10:00,820 That's essentially what makes a recurrence 167 00:10:00,820 --> 00:10:03,620 work for dynamic programming. 168 00:10:03,620 --> 00:10:09,470 And with dynamic programming, for this to be possible, 169 00:10:09,470 --> 00:10:13,137 we need to guess some feature of the solution. 170 00:10:13,137 --> 00:10:14,720 For example, in minimum spanning tree, 171 00:10:14,720 --> 00:10:17,430 maybe you guess one of the edges that's in the right answer. 172 00:10:20,290 --> 00:10:22,710 And then, once you do that, you can reduce it 173 00:10:22,710 --> 00:10:24,636 to some other subproblems. 174 00:10:24,636 --> 00:10:26,260 And if you can solve those subproblems, 175 00:10:26,260 --> 00:10:27,790 you combine them and get an optimal solution 176 00:10:27,790 --> 00:10:28,748 to your original thing. 177 00:10:28,748 --> 00:10:30,930 So this is a familiar property. 178 00:10:30,930 --> 00:10:34,140 I don't usually think of it this way for dynamic programming, 179 00:10:34,140 --> 00:10:39,002 but that is essentially what we're doing via guessing. 
180 00:10:39,002 --> 00:10:41,210 But with greedy algorithms, we're not going to guess. 181 00:10:41,210 --> 00:10:43,020 We're just going to be greedy. 182 00:10:43,020 --> 00:10:44,490 Eat the largest cookie. 183 00:10:44,490 --> 00:10:46,795 And so that's the greedy choice property. 184 00:10:58,990 --> 00:11:01,590 This says that eating the largest cookie 185 00:11:01,590 --> 00:11:03,650 is actually a good thing to do. 186 00:11:10,546 --> 00:11:13,950 If we keep making locally optimal choices, 187 00:11:13,950 --> 00:11:16,100 we'll end up with a globally optimal solution. 188 00:11:25,650 --> 00:11:27,010 No tummy ache. 189 00:11:32,710 --> 00:11:35,310 This is something you wouldn't expect to be true in general, 190 00:11:35,310 --> 00:11:38,382 but it's going to be true for minimum spanning tree. 191 00:11:38,382 --> 00:11:40,300 And it's true for a handful of other problems. 192 00:11:40,300 --> 00:11:42,258 You'll see a bunch more in recitation tomorrow. 193 00:11:46,190 --> 00:11:48,330 This is sort of general theory, but I'm actually 194 00:11:48,330 --> 00:11:51,650 going to have a theorem like this for minimum spanning tree 195 00:11:51,650 --> 00:11:54,716 and a theorem like this for minimum spanning tree. 196 00:11:54,716 --> 00:11:58,290 This is the prototype, but most of today is all about minimum 197 00:11:58,290 --> 00:11:58,970 spanning tree. 198 00:12:02,202 --> 00:12:04,160 And for minimum spanning tree, neither of these 199 00:12:04,160 --> 00:12:05,880 is very obvious. 200 00:12:05,880 --> 00:12:08,290 So I'm just going to show you these theorems. 201 00:12:08,290 --> 00:12:12,324 They're fairly easy to prove, in fact, but finding them 202 00:12:12,324 --> 00:12:13,490 is probably the tricky part. 203 00:12:26,880 --> 00:12:29,990 Actually, I guess optimal substructure is probably 204 00:12:29,990 --> 00:12:34,496 the least intuitive or the least obvious. Greedy choice--
205 00:12:34,496 --> 00:12:35,870 You're probably already thinking, 206 00:12:35,870 --> 00:12:37,820 what are good greedy choices? 207 00:12:37,820 --> 00:12:41,100 Minimum weight edge seems like a good starting point, 208 00:12:41,100 --> 00:12:43,160 which we will get to. 209 00:12:43,160 --> 00:12:45,260 But there's even a stronger version 210 00:12:45,260 --> 00:12:47,100 of that, which we will prove. 211 00:12:47,100 --> 00:12:49,620 And first, optimal substructure. 212 00:12:49,620 --> 00:12:53,750 So here, I'm going to think like a dynamic program. 213 00:12:53,750 --> 00:12:58,110 Let's suppose that we know an edge that's in our solution. 214 00:12:58,110 --> 00:12:59,820 Suppose we know an edge that lives 215 00:12:59,820 --> 00:13:01,650 in a minimum spanning tree. 216 00:13:01,650 --> 00:13:02,950 We could guess that. 217 00:13:02,950 --> 00:13:05,880 We're not going to, but we could. 218 00:13:05,880 --> 00:13:11,850 Either way, let's just suppose that an edge e-- 219 00:13:11,850 --> 00:13:14,020 I should mention, I guess I didn't say, 220 00:13:14,020 --> 00:13:16,250 this graph is undirected. 221 00:13:16,250 --> 00:13:18,460 A minimum spanning tree doesn't quite make sense 222 00:13:18,460 --> 00:13:19,540 with directed graphs. 223 00:13:19,540 --> 00:13:21,164 There are other versions of the problem 224 00:13:21,164 --> 00:13:24,090 but here, the graph is undirected. 225 00:13:24,090 --> 00:13:28,032 So probably, I should write this as an unordered set, u, 226 00:13:28,032 --> 00:13:41,416 v. And there are possibly many minimum spanning trees. 227 00:13:41,416 --> 00:13:43,540 There could be many solutions with the same weight. 228 00:13:43,540 --> 00:13:46,170 For example, if all of these edges have weight 1, 229 00:13:46,170 --> 00:13:49,610 all of these trees are actually minimum.
230 00:13:49,610 --> 00:13:51,790 If all the edges have weight 1, every spanning tree 231 00:13:51,790 --> 00:13:54,440 is minimum, because every spanning tree has exactly 232 00:13:54,440 --> 00:13:55,315 n minus 1 edges. 233 00:13:57,840 --> 00:13:59,810 But let's suppose we know an edge that's 234 00:13:59,810 --> 00:14:05,060 guaranteed to be in some minimum spanning tree, at least one. 235 00:14:05,060 --> 00:14:09,400 What I would like to do is take this, so let me draw a picture. 236 00:14:09,400 --> 00:14:12,070 I have a graph. 237 00:14:12,070 --> 00:14:16,400 We've identified some edge in the graph, e, that lives 238 00:14:16,400 --> 00:14:18,970 in some minimum spanning tree. 239 00:14:18,970 --> 00:14:21,610 I'm going to draw some kind of tree structure here. 240 00:14:29,400 --> 00:14:30,860 OK. 241 00:14:30,860 --> 00:14:33,185 The wiggly lines are the tree. 242 00:14:33,185 --> 00:14:34,644 There are some other edges in here, 243 00:14:34,644 --> 00:14:37,226 which I don't want to draw too many of them because it's ugly. 244 00:14:37,226 --> 00:14:38,900 Those are other edges in the graph. 245 00:14:38,900 --> 00:14:39,941 Who knows where they are? 246 00:14:39,941 --> 00:14:43,130 They could be all sorts of things. 247 00:14:43,130 --> 00:14:43,630 OK? 248 00:14:43,630 --> 00:14:48,230 But I've highlighted the graph in a particular way. 249 00:14:48,230 --> 00:14:50,730 Because the minimum spanning tree 250 00:14:50,730 --> 00:14:54,470 is a tree, if I delete e from the tree, then 251 00:14:54,470 --> 00:14:56,850 I get two components. 252 00:14:56,850 --> 00:14:59,950 Every edge I remove-- I'm minimally connected. 253 00:14:59,950 --> 00:15:02,680 So if I delete an edge, I disconnect into two parts, 254 00:15:02,680 --> 00:15:06,530 so I've drawn that as the left circle and the right circle. 255 00:15:06,530 --> 00:15:08,780 It's just a general way to think about a tree. 
256 00:15:08,780 --> 00:15:11,160 Now there are other unused edges in this picture, 257 00:15:11,160 --> 00:15:13,961 who knows where they live? 258 00:15:13,961 --> 00:15:14,460 OK? 259 00:15:14,460 --> 00:15:18,770 What I would like to do is somehow simplify this graph 260 00:15:18,770 --> 00:15:22,045 and get a smaller problem, say a graph with fewer edges. 261 00:15:25,820 --> 00:15:27,360 Any suggestions on how to do that? 262 00:15:32,090 --> 00:15:34,790 I don't actually know where all these white edges are, 263 00:15:34,790 --> 00:15:39,240 but what I'd like to do is-- I'm supposing I know where e is, 264 00:15:39,240 --> 00:15:41,810 and that's an edge in my minimum spanning tree. 265 00:15:41,810 --> 00:15:44,800 So how could I get rid of it? 266 00:15:48,252 --> 00:15:48,752 Yeah. 267 00:15:48,752 --> 00:15:50,234 AUDIENCE: Find the minimum weight 268 00:15:50,234 --> 00:15:51,720 spanning tree of the two edges. 269 00:15:51,720 --> 00:15:52,960 ERIK DEMAINE: I'd like to divide and conquer. 270 00:15:52,960 --> 00:15:54,410 Maybe find the minimum weight over 271 00:15:54,410 --> 00:15:55,701 here, minimum weight over here. 272 00:15:55,701 --> 00:15:58,330 Of course, I don't know which nodes are in what side. 273 00:15:58,330 --> 00:16:00,180 So that's a little trickier. 274 00:16:00,180 --> 00:16:02,064 But what do I do with e itself? 275 00:16:02,064 --> 00:16:02,980 Let's start with that. 276 00:16:06,917 --> 00:16:07,417 Yeah. 277 00:16:07,417 --> 00:16:08,417 AUDIENCE: You remove it? 278 00:16:08,417 --> 00:16:09,882 ERIK DEMAINE: You could remove it. 279 00:16:09,882 --> 00:16:10,868 That's a good idea. 280 00:16:10,868 --> 00:16:16,810 Doesn't work, but worth a Frisbee nonetheless. 281 00:16:16,810 --> 00:16:19,520 If I delete this edge, one problem 282 00:16:19,520 --> 00:16:22,540 is maybe none of these red edges exist and then my graph 283 00:16:22,540 --> 00:16:23,860 is disconnected.
284 00:16:23,860 --> 00:16:26,740 Well, maybe that's actually a good case. 285 00:16:26,740 --> 00:16:28,380 That probably would be a good case. 286 00:16:28,380 --> 00:16:30,384 Then I know how to divide and conquer. 287 00:16:30,384 --> 00:16:32,050 I just look at the connected components. 288 00:16:32,050 --> 00:16:36,170 In general, if I delete the edge, 289 00:16:36,170 --> 00:16:39,066 and I have these red edges, then I maybe 290 00:16:39,066 --> 00:16:40,940 find a minimum spanning tree on what remains. 291 00:16:43,750 --> 00:16:46,180 Maybe I'll end up including one of these edges. 292 00:16:46,180 --> 00:16:49,660 Maybe this edge ends up in the spanning tree, 293 00:16:49,660 --> 00:16:51,790 and then I can't put e in. 294 00:16:51,790 --> 00:16:53,700 So it's a little awkward. 295 00:16:53,700 --> 00:16:54,200 Yeah? 296 00:16:54,200 --> 00:16:56,158 AUDIENCE: Can you merge the two nodes into one? 297 00:16:56,158 --> 00:16:57,560 ERIK DEMAINE: Merge the two nodes into one. 298 00:16:57,560 --> 00:16:58,570 Yes. 299 00:16:58,570 --> 00:17:01,450 Purple Frisbee. 300 00:17:01,450 --> 00:17:02,920 Impressive. 301 00:17:02,920 --> 00:17:05,189 This is what we call contracting the edge. 302 00:17:09,920 --> 00:17:13,260 It just means merge the endpoints. 303 00:17:13,260 --> 00:17:23,249 Merge u and v. So I will draw a new version of the graph. 304 00:17:29,840 --> 00:17:32,580 So this was u and v before. 305 00:17:32,580 --> 00:17:34,880 You've got to put the label inside. 306 00:17:34,880 --> 00:17:39,310 And now we have a new vertex here, which is uv. 307 00:17:39,310 --> 00:17:42,040 Or you can think of it as the set u, v. We 308 00:17:42,040 --> 00:17:44,680 won't really need to keep track of names. 309 00:17:44,680 --> 00:17:50,120 And whatever edges you had over here, 310 00:17:50,120 --> 00:17:52,930 you're going to have over here. 311 00:17:52,930 --> 00:17:53,430 OK? 312 00:17:53,430 --> 00:17:56,223 Just collapse u and v. The edge e disappears.
313 00:17:59,910 --> 00:18:03,180 And one other thing can happen. 314 00:18:03,180 --> 00:18:08,280 Let me-- go over here. 315 00:18:14,040 --> 00:18:17,420 We could end up with duplicate edges by this process. 316 00:18:17,420 --> 00:18:21,750 So for example, suppose we have u and v, 317 00:18:21,750 --> 00:18:24,050 and they have a common neighbor. 318 00:18:24,050 --> 00:18:27,280 Might have many common neighbors, who knows. 319 00:18:27,280 --> 00:18:31,160 Add some other edges, uncommon neighbors. 320 00:18:31,160 --> 00:18:40,250 When I merge, I'd like to just have 321 00:18:40,250 --> 00:18:44,065 a single edge to that vertex and a single edge to that vertex. 322 00:18:44,065 --> 00:18:45,440 And what I'm going to do is, if I 323 00:18:45,440 --> 00:18:49,070 have some weights on these edges, let's say a and b, 324 00:18:49,070 --> 00:18:52,700 and c and d, I'm just going to take the minimum. 325 00:18:57,780 --> 00:19:00,420 Because what I'm about to do is compute a minimum spanning tree 326 00:19:00,420 --> 00:19:02,810 in this graph. 327 00:19:02,810 --> 00:19:04,700 And if I take the minimum spanning tree here, 328 00:19:04,700 --> 00:19:07,250 and I had multiple edges-- one weight a, 329 00:19:07,250 --> 00:19:10,150 one weight b-- do you think I would choose the larger weight 330 00:19:10,150 --> 00:19:11,170 edge? 331 00:19:11,170 --> 00:19:12,870 It does-- they're exactly the same edge, 332 00:19:12,870 --> 00:19:13,927 but one is higher weight. 333 00:19:13,927 --> 00:19:16,010 There's no point in keeping the higher weight one, 334 00:19:16,010 --> 00:19:18,840 so I'm just going to throw away the higher weight one. 335 00:19:18,840 --> 00:19:20,500 Take the min. 336 00:19:20,500 --> 00:19:26,300 So this is a particular form of edge contraction in graphs.
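[Editor's sketch, not part of the lecture.] The contraction just described (merge u and v, drop the edge e, keep the minimum weight among duplicate edges) might look like this in Python. The representation is an assumption made for illustration: an undirected simple graph stored as a dict from frozenset({a, b}) to weight, with the merged vertex named by the tuple (u, v). The function name contract is hypothetical.

```python
def contract(graph, u, v):
    """Return G/e for e = {u, v}, keeping the lighter of any parallel edges.

    graph maps frozenset({a, b}) -> weight for an undirected simple graph.
    The merged vertex is represented by the tuple (u, v).
    """
    merged = (u, v)
    contracted_edge = frozenset((u, v))
    result = {}
    for edge, weight in graph.items():
        if edge == contracted_edge:
            continue  # the contracted edge itself disappears
        # Redirect any endpoint that was u or v to the merged vertex.
        new_edge = frozenset(merged if x in (u, v) else x for x in edge)
        # Duplicate edges collapse to one edge carrying the minimum weight.
        if new_edge not in result or weight < result[new_edge]:
            result[new_edge] = weight
    return result


# u and v share the common neighbors x and y.
g = {
    frozenset(("u", "v")): 1,
    frozenset(("u", "x")): 5,
    frozenset(("v", "x")): 3,
    frozenset(("u", "y")): 2,
    frozenset(("v", "y")): 7,
}
for edge, weight in contract(g, "u", "v").items():
    print(sorted(edge, key=str), weight)
```

Returning a new dict rather than mutating the input keeps the original graph available, which makes the later "decontract" step trivial.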
337 00:19:26,300 --> 00:19:30,180 And I claim it's a good thing to do, in the sense 338 00:19:30,180 --> 00:19:32,190 that if I can find a minimum spanning tree 339 00:19:32,190 --> 00:19:34,470 in this new graph-- this is usually 340 00:19:34,470 --> 00:19:41,190 called G slash e-- slash instead of minus, which would remove 341 00:19:41,190 --> 00:19:41,983 e. 342 00:19:41,983 --> 00:19:44,552 I'm contracting e. 343 00:19:44,552 --> 00:19:46,680 So this is G slash e. 344 00:19:46,680 --> 00:19:52,150 This is G. If I can find a minimum spanning tree in G 345 00:19:52,150 --> 00:19:57,570 slash e, I claim I can find one in the original graph G 346 00:19:57,570 --> 00:20:00,680 just by adding the edge e. 347 00:20:00,680 --> 00:20:08,690 So I'm going to say if T prime is 348 00:20:08,690 --> 00:20:14,350 a minimum spanning tree of G slash e, 349 00:20:14,350 --> 00:20:29,630 then T prime union e is a minimum spanning tree of G. 350 00:20:29,630 --> 00:20:31,270 So overall, you can think of this 351 00:20:31,270 --> 00:20:33,620 as a recurrence in a dynamic program, 352 00:20:33,620 --> 00:20:36,010 and let me write down that dynamic program. 353 00:20:39,130 --> 00:20:41,200 It won't be a very good dynamic program, 354 00:20:41,200 --> 00:20:42,350 but it's a starting point. 355 00:20:52,880 --> 00:20:55,010 This is conceptually what we want to do. 356 00:20:55,010 --> 00:21:00,340 We're trying to guess an edge e that's 357 00:21:00,340 --> 00:21:03,960 in a minimum spanning tree. 358 00:21:03,960 --> 00:21:05,600 Then we're going to contract that edge. 359 00:21:10,030 --> 00:21:13,720 Then we're going to recurse, find the minimum spanning tree 360 00:21:13,720 --> 00:21:18,310 on what remains, and then we find the minimum spanning tree. 361 00:21:18,310 --> 00:21:20,750 Then we want to decontract the edge, 362 00:21:20,750 --> 00:21:24,670 put it back, put the graph back the way it was. 363 00:21:24,670 --> 00:21:27,150 And then add e to the minimum spanning tree.
364 00:21:31,700 --> 00:21:35,217 And what this lemma tells us, is that this 365 00:21:35,217 --> 00:21:36,175 is a correct algorithm. 366 00:21:39,110 --> 00:21:41,990 If you're lucky-- and we're going to force luckiness 367 00:21:41,990 --> 00:21:44,710 by trying all edges-- but if we start with an edge that 368 00:21:44,710 --> 00:21:48,140 is guaranteed to be in some minimum spanning tree, call it 369 00:21:48,140 --> 00:21:51,630 a safe edge, and we contract, and we 370 00:21:51,630 --> 00:21:53,710 find a minimum spanning tree on what remains, 371 00:21:53,710 --> 00:21:56,760 then we can put e back in at the end, 372 00:21:56,760 --> 00:22:00,230 and we'll get a minimum spanning tree of the original graph. 373 00:22:00,230 --> 00:22:03,780 So this gives us correctness of this algorithm. 374 00:22:03,780 --> 00:22:06,380 Now, this algorithm's bad, again, 375 00:22:06,380 --> 00:22:08,360 from a complexity standpoint. 376 00:22:08,360 --> 00:22:11,166 The running time is going to be exponential. 377 00:22:11,166 --> 00:22:13,540 The number of subproblems we might have to consider here 378 00:22:13,540 --> 00:22:14,960 is all subsets of edges. 379 00:22:14,960 --> 00:22:18,940 There's no particular way-- because at every step, 380 00:22:18,940 --> 00:22:22,240 we're guessing an arbitrary edge in the graph, 381 00:22:22,240 --> 00:22:23,530 there's no structure. 382 00:22:23,530 --> 00:22:25,860 Like, we can't say well, it's the first k edges, 383 00:22:25,860 --> 00:22:27,890 or some substring of edges. 384 00:22:27,890 --> 00:22:30,210 It's just going to be some subset of edges. 385 00:22:30,210 --> 00:22:33,080 There's exponentially many subsets, 2 to the e, 386 00:22:33,080 --> 00:22:34,090 so this is exponential. 387 00:22:39,370 --> 00:22:42,980 But we're going to make it polynomial 388 00:22:42,980 --> 00:22:44,794 by removing the guessing.
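[Editor's sketch, not part of the lecture.] The guess, contract, recurse, decontract, add-e procedure can be written down directly, with the guess implemented by trying every edge and keeping the best result. This is the exponential-time version described above, not the greedy algorithm the lecture is heading toward. The representation is an assumption for illustration: the graph is a dict from frozenset({a, b}) to (weight, label), where label names the original edge, so "decontracting" amounts to reading the labels back. The function name mst_by_guessing is hypothetical.

```python
def mst_by_guessing(graph):
    """Exponential-time MST: try every edge as the guess, contract, recurse.

    graph maps frozenset({a, b}) -> (weight, label), where label names the
    edge of the original graph that this (possibly merged) edge stands for.
    Returns (total_weight, set_of_labels). Assumes the graph is connected.
    """
    if not graph:
        return 0, set()  # one vertex left: the empty tree spans it
    best = None
    for edge, (weight, label) in graph.items():
        u, v = tuple(edge)
        merged = (u, v)
        # Contract the guessed edge, keeping the lighter of parallel edges.
        contracted = {}
        for other, (w, lab) in graph.items():
            if other == edge:
                continue
            new_edge = frozenset(merged if x in edge else x for x in other)
            if new_edge not in contracted or w < contracted[new_edge][0]:
                contracted[new_edge] = (w, lab)
        # Recurse on G/e, then add e back to the tree that comes out.
        sub_weight, sub_tree = mst_by_guessing(contracted)
        candidate = (sub_weight + weight, sub_tree | {label})
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best


# A triangle a-b-c with a pendant vertex d.
example = {
    frozenset(("a", "b")): (1, "ab"),
    frozenset(("b", "c")): (2, "bc"),
    frozenset(("a", "c")): (3, "ac"),
    frozenset(("c", "d")): (4, "cd"),
}
total, tree = mst_by_guessing(example)
print(total, sorted(tree))
```

Each level of the recursion branches over every remaining edge, so this is only usable on tiny graphs. Replacing the loop over all edges with a single provably safe choice is what turns it into a polynomial-time algorithm.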
389 00:22:44,794 --> 00:22:46,460 This is actually a really good prototype 390 00:22:46,460 --> 00:22:47,910 for a greedy algorithm. 391 00:22:47,910 --> 00:22:50,400 If instead of guessing, trying all edges, 392 00:22:50,400 --> 00:22:52,440 if we could find a good edge to choose 393 00:22:52,440 --> 00:22:55,870 that's guaranteed to be in a minimum spanning tree, 394 00:22:55,870 --> 00:22:58,290 then we could actually follow this procedure, 395 00:22:58,290 --> 00:23:01,750 and this would be like an iterative algorithm. 396 00:23:01,750 --> 00:23:04,610 If you-- you don't guess-- you correctly 397 00:23:04,610 --> 00:23:08,090 choose a good-- you take the biggest cookie, 398 00:23:08,090 --> 00:23:11,014 you contract it, and then you repeat that process 399 00:23:11,014 --> 00:23:12,680 over and over, that would be a prototype 400 00:23:12,680 --> 00:23:14,760 for a greedy algorithm and that's what's going to work. 401 00:23:14,760 --> 00:23:16,440 There's different ways to choose this greedy edge, 402 00:23:16,440 --> 00:23:18,398 and we're going to get two different algorithms 403 00:23:18,398 --> 00:23:19,232 accordingly. 404 00:23:19,232 --> 00:23:20,440 But that's where we're going. 405 00:23:20,440 --> 00:23:22,736 First, I should prove this claim, cause, 406 00:23:22,736 --> 00:23:25,010 you know, where did edge contraction come from? 407 00:23:25,010 --> 00:23:25,820 Why does it work? 408 00:23:29,422 --> 00:23:31,970 It's not too hard to prove. 409 00:23:31,970 --> 00:23:32,550 Let's do it. 410 00:23:46,000 --> 00:23:47,070 Question? 411 00:23:47,070 --> 00:23:47,990 Oh. 412 00:23:47,990 --> 00:23:50,290 All right. 413 00:23:50,290 --> 00:23:53,670 I should be able to do this without looking. 414 00:23:53,670 --> 00:23:55,550 So-- 415 00:23:55,550 --> 00:23:57,500 Proof of optimal substructure. 416 00:23:57,500 --> 00:23:58,630 So we're given a lot. 417 00:23:58,630 --> 00:24:01,200 We're told that e belongs to a minimum spanning tree.
418 00:24:01,200 --> 00:24:03,780 Let's give that spanning tree a name. 419 00:24:03,780 --> 00:24:10,700 Say we have a minimum spanning tree T star, which contains e. 420 00:24:10,700 --> 00:24:15,090 So we're assuming that exists, then we contract e. 421 00:24:15,090 --> 00:24:16,600 And then we're given T prime, which 422 00:24:16,600 --> 00:24:20,450 is a minimum spanning tree of G slash e. 423 00:24:20,450 --> 00:24:22,250 And then we want to analyze this thing. 424 00:24:22,250 --> 00:24:24,280 So I want to claim that this thing is 425 00:24:24,280 --> 00:24:26,410 a minimum spanning tree, in other words, 426 00:24:26,410 --> 00:24:29,360 that the weight of that spanning tree 427 00:24:29,360 --> 00:24:32,310 is equal to the weight of this spanning tree, 428 00:24:32,310 --> 00:24:34,600 because this one is minimum. 429 00:24:34,600 --> 00:24:38,710 This is a minimum spanning tree of G. And this is also 430 00:24:38,710 --> 00:24:44,680 supposed to be a minimum spanning tree of G. 431 00:24:44,680 --> 00:24:46,460 OK. 432 00:24:46,460 --> 00:24:49,920 Sounds easy, right? 433 00:24:49,920 --> 00:24:51,130 I'm going to cheat, sorry. 434 00:25:00,330 --> 00:25:00,900 I see. 435 00:25:00,900 --> 00:25:01,400 Right. 436 00:25:01,400 --> 00:25:03,050 Duh. 437 00:25:03,050 --> 00:25:04,640 Easy, once you know how. 438 00:25:04,640 --> 00:25:09,744 So what we're going to do is think about contracting e. 439 00:25:09,744 --> 00:25:11,160 OK, we already know we're supposed 440 00:25:11,160 --> 00:25:13,760 to be thinking about contracting e in the graph. 441 00:25:13,760 --> 00:25:16,880 Let's look at how it changes that given minimum spanning 442 00:25:16,880 --> 00:25:17,720 tree. 443 00:25:17,720 --> 00:25:20,240 So we have T star, minimum spanning 444 00:25:20,240 --> 00:25:25,900 tree of the whole graph, and then I'm going to contract e.
445 00:25:25,900 --> 00:25:28,060 What I mean is, if that edge happens 446 00:25:28,060 --> 00:25:31,640 to be in the spanning tree-- it is, actually. 447 00:25:31,640 --> 00:25:33,540 We assumed that e is in there. 448 00:25:33,540 --> 00:25:38,090 So I'm basically removing, I'm just deleting that edge, 449 00:25:38,090 --> 00:25:40,220 maybe I should call it minus e. 450 00:25:43,160 --> 00:25:53,930 Then that should be a spanning tree of G slash e. 451 00:25:53,930 --> 00:25:56,580 So when I contract the edge in the graph, 452 00:25:56,580 --> 00:25:59,440 if I throw away the edge from this spanning tree, 453 00:25:59,440 --> 00:26:00,950 I should still have a spanning tree, 454 00:26:00,950 --> 00:26:02,750 and I don't know whether it's minimum. 455 00:26:02,750 --> 00:26:06,080 Probably, it is, but we won't prove that right now. 456 00:26:08,930 --> 00:26:11,105 I claim it's still a spanning tree. 457 00:26:11,105 --> 00:26:11,980 What would that take? 458 00:26:11,980 --> 00:26:17,720 It still hits all the vertices, because if I removed the edge, 459 00:26:17,720 --> 00:26:19,520 things would not be connected together. 460 00:26:19,520 --> 00:26:23,780 But this edge was in the spanning tree, 461 00:26:23,780 --> 00:26:26,600 and then I fused those two vertices together, 462 00:26:26,600 --> 00:26:29,390 so whatever spanning-- I mean, whatever was connected 463 00:26:29,390 --> 00:26:30,820 before is still connected. 464 00:26:30,820 --> 00:26:33,994 Contraction generally preserves connectivity. 465 00:26:33,994 --> 00:26:36,410 If these things were already connected directly by an edge 466 00:26:36,410 --> 00:26:39,270 when I contract, I still have a connected structure, 467 00:26:39,270 --> 00:26:42,000 so I'm still hitting all the vertices. 468 00:26:42,000 --> 00:26:45,260 And also, the number of edges is still exactly right. 469 00:26:45,260 --> 00:26:47,517 Before, I had n minus 1 edges. 
470 00:26:47,517 --> 00:26:49,350 Afterwards, I'll still have n minus 1 edges, 471 00:26:49,350 --> 00:26:51,660 because I removed one edge and I removed one vertex, 472 00:26:51,660 --> 00:26:52,990 in terms of the count. 473 00:26:52,990 --> 00:26:55,870 So that proves that it's still a spanning tree, 474 00:26:55,870 --> 00:26:57,410 using properties of trees. 475 00:27:00,330 --> 00:27:02,370 Cool. 476 00:27:02,370 --> 00:27:07,390 So that means the minimum spanning tree, this thing, 477 00:27:07,390 --> 00:27:10,690 T prime, the minimum spanning tree of G slash e, 478 00:27:10,690 --> 00:27:13,077 has a smaller weight than this one. 479 00:27:13,077 --> 00:27:14,910 Because this is a spanning tree, the minimum 480 00:27:14,910 --> 00:27:16,990 is smaller than all spanning trees. 481 00:27:16,990 --> 00:27:19,600 So we know the weight of T prime is 482 00:27:19,600 --> 00:27:25,350 less than or equal to the weight of T star minus e. 483 00:27:29,850 --> 00:27:31,340 Cool. 484 00:27:31,340 --> 00:27:37,310 And now we want to know about this thing, the weight of T 485 00:27:37,310 --> 00:27:41,020 prime plus e. 486 00:27:41,020 --> 00:27:46,650 Well, that's just the weight of T prime plus the weight of e, 487 00:27:46,650 --> 00:27:48,560 because the weight of a tree is just 488 00:27:48,560 --> 00:27:50,670 the sum of the weights of the edges. 489 00:27:50,670 --> 00:27:53,293 So this is less than or equal to w 490 00:27:53,293 --> 00:28:02,130 of T star minus e plus e, which is just the weight of T star. 491 00:28:05,390 --> 00:28:08,640 So we proved that the weight of our proposed spanning tree 492 00:28:08,640 --> 00:28:10,820 is less than or equal to the weight of the minimum 493 00:28:10,820 --> 00:28:15,180 spanning tree in G, and therefore, T prime union 494 00:28:15,180 --> 00:28:17,170 e actually is a minimum spanning tree. 495 00:28:17,170 --> 00:28:17,670 OK? 496 00:28:17,670 --> 00:28:18,720 This is really easy. 
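For reference, the chain of inequalities from this proof can be written out in one line. This is a sketch in notation assumed here rather than taken from the board: $T^*$ is the given minimum spanning tree of $G$ containing $e$, $T'$ is a minimum spanning tree of $G/e$, and $T^* - e$ is the spanning tree of $G/e$ obtained by dropping $e$ from $T^*$.

```latex
\[
w(T' \cup \{e\}) \;=\; w(T') + w(e)
\;\le\; w(T^* - e) + w(e)
\;=\; w(T^*).
\]
```

Since $T^*$ is minimum in $G$ and $T' \cup \{e\}$ is a spanning tree of $G$, the inequality must in fact hold with equality.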
497 00:28:28,527 --> 00:28:30,610 It actually implies that all of these inequalities 498 00:28:30,610 --> 00:28:32,030 have to be equalities, because we 499 00:28:32,030 --> 00:28:33,321 started with something minimum. 500 00:28:35,810 --> 00:28:36,530 Clear? 501 00:28:36,530 --> 00:28:38,260 That's the easier half. 502 00:28:38,260 --> 00:28:38,760 The 503 00:28:38,760 --> 00:28:41,218 more interesting property is going to be this greedy choice 504 00:28:41,218 --> 00:28:42,560 property. 505 00:28:42,560 --> 00:28:45,040 This is sort of where the action is for greedy algorithms, 506 00:28:45,040 --> 00:28:47,020 and this is usually the heart of proving 507 00:28:47,020 --> 00:28:48,649 greedy algorithms are correct. 508 00:28:48,649 --> 00:28:50,190 We don't yet have a greedy algorithm, 509 00:28:50,190 --> 00:28:51,930 but we're thinking about it. 510 00:28:51,930 --> 00:28:56,540 We need some way to intelligently choose an edge e, 511 00:28:56,540 --> 00:28:58,580 and I'm going to give you a whole bunch of ways 512 00:28:58,580 --> 00:29:00,060 to intelligently choose an edge e. 513 00:29:41,130 --> 00:29:42,720 So here's a really powerful lemma, 514 00:29:42,720 --> 00:29:44,928 and we're going to make it even stronger in a moment. 515 00:30:01,100 --> 00:30:04,210 So I'm going to introduce the notion of a cut, that's 516 00:30:04,210 --> 00:30:07,900 going to be a similar picture to what I had before. 517 00:30:07,900 --> 00:30:09,760 I'm going to look at some set of vertices. 518 00:30:09,760 --> 00:30:14,080 S here is a subset of the vertices, 519 00:30:14,080 --> 00:30:17,100 and that leaves in the graph, everything else. 520 00:30:17,100 --> 00:30:21,640 This would be V minus S. OK, so there's 521 00:30:21,640 --> 00:30:25,650 some vertices over here, some vertices over here, 522 00:30:25,650 --> 00:30:27,390 there's some edges that are purely 523 00:30:27,390 --> 00:30:30,610 inside one side of the cut.
524 00:30:30,610 --> 00:30:32,280 And then what I'm interested in are 525 00:30:32,280 --> 00:30:33,960 the edges that cross the cut. 526 00:30:38,380 --> 00:30:40,430 OK, whatever they look like, these edges. 527 00:30:40,430 --> 00:30:44,540 If an edge has one vertex in S and one vertex not in S, 528 00:30:44,540 --> 00:30:46,500 I call that edge a crossing edge. 529 00:30:51,680 --> 00:31:09,160 OK, so let's suppose that e is a least-weight edge crossing 530 00:31:09,160 --> 00:31:09,660 the cut. 531 00:31:15,160 --> 00:31:24,120 So let's say, let me be specific, if e is uv, 532 00:31:24,120 --> 00:31:28,046 then I want one of the endpoints, let's say u, to be in S, 533 00:31:28,046 --> 00:31:31,400 and I want the other one to be not in S, 534 00:31:31,400 --> 00:31:35,140 so it's in capital V minus S. And that 535 00:31:35,140 --> 00:31:37,810 would be a crossing edge, and among all the crossing edges, 536 00:31:37,810 --> 00:31:41,360 I want to take one of minimum weight. 537 00:31:41,360 --> 00:31:46,140 There might be many, but pick any one. 538 00:31:46,140 --> 00:31:49,090 Then I claim that edge is in a minimum spanning tree. 539 00:32:00,030 --> 00:32:02,320 This is our golden ticket, right? 540 00:32:02,320 --> 00:32:05,590 If we can guarantee an edge is in the minimum spanning tree, 541 00:32:05,590 --> 00:32:07,210 then we plug that in here. 542 00:32:07,210 --> 00:32:09,950 Instead of guessing, we'll just take that edge-- 543 00:32:09,950 --> 00:32:12,370 we know it's in a minimum spanning tree-- 544 00:32:12,370 --> 00:32:15,650 and then we'll contract it and repeat this process. 545 00:32:15,650 --> 00:32:19,310 So the tricky part-- I mean, it is true that the minimum weight 546 00:32:19,310 --> 00:32:22,950 edge is in a minimum spanning tree, I'll give that away. 547 00:32:22,950 --> 00:32:25,335 But the question is, what do you do then?
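To make the setup of the lemma concrete, here is a minimal Python sketch, not from the lecture, that picks a least-weight edge crossing a cut. The edge-list format `(u, v, w)` and the function name `least_weight_crossing_edge` are assumptions made for illustration only.

```python
def least_weight_crossing_edge(edges, S):
    """Return a minimum-weight edge (u, v, w) with exactly one endpoint in S.

    edges: iterable of (u, v, w) tuples for an undirected graph (assumed format).
    S: set of vertices on one side of the cut; the other side is V minus S.
    Returns None if no edge crosses the cut.
    """
    # An edge crosses the cut when exactly one of its endpoints lies in S.
    crossing = [(u, v, w) for (u, v, w) in edges if (u in S) != (v in S)]
    if not crossing:
        return None
    # Ties are broken arbitrarily; the lemma allows any least-weight edge.
    return min(crossing, key=lambda edge: edge[2])
```

For instance, with edges `[("a", "b", 4), ("b", "c", 2), ("a", "c", 7), ("c", "d", 1)]` and `S = {"a", "b"}`, the crossing edges are b-c and a-c, and the function picks b-c.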
548 00:32:28,220 --> 00:32:29,820 And I guess you contract and repeat, 549 00:32:29,820 --> 00:32:32,640 but that will be Kruskal's algorithm. 550 00:32:32,640 --> 00:32:36,720 But this is, in some sense, a more general tool 551 00:32:36,720 --> 00:32:38,940 that will let us identify edges that are guaranteed 552 00:32:38,940 --> 00:32:40,398 to be in the minimum spanning tree, 553 00:32:40,398 --> 00:32:42,900 even after we've already identified some edges as being 554 00:32:42,900 --> 00:32:46,820 in the minimum spanning tree, so it's a little more powerful. 555 00:32:46,820 --> 00:32:51,260 Let's prove this claim. 556 00:32:51,260 --> 00:32:53,750 This is where things get particularly cool. 557 00:33:09,714 --> 00:33:11,630 And this is where we're going to use something 558 00:33:11,630 --> 00:33:13,020 called a cut and paste argument. 559 00:33:19,200 --> 00:33:20,770 And if you are ever trying to prove 560 00:33:20,770 --> 00:33:22,890 a greedy algorithm correct, the first thing 561 00:33:22,890 --> 00:33:25,895 that should come to your mind is cut and paste. 562 00:33:25,895 --> 00:33:29,080 This is almost universally how you prove greedy algorithms 563 00:33:29,080 --> 00:33:34,530 to be correct, which is, suppose you have some optimal solution 564 00:33:34,530 --> 00:33:36,590 which doesn't have the property you want, 565 00:33:36,590 --> 00:33:38,810 like that it includes e here. 566 00:33:38,810 --> 00:33:41,330 And then you modify it, usually by cutting out 567 00:33:41,330 --> 00:33:43,820 one part of the solution and pasting in a different part, 568 00:33:43,820 --> 00:33:48,180 like e, and prove that you still have an optimal solution, 569 00:33:48,180 --> 00:33:50,580 and therefore, there is an optimal solution. 570 00:33:50,580 --> 00:33:56,150 There is an MST that has the property you want. 571 00:33:56,150 --> 00:33:59,620 OK, so we're going to do that by starting from an arbitrary 572 00:33:59,620 --> 00:34:02,130 minimum spanning tree.
573 00:34:02,130 --> 00:34:10,330 So let T star be a minimum spanning tree of G, 574 00:34:10,330 --> 00:34:13,000 and if the edge e is in there, we're done. 575 00:34:13,000 --> 00:34:17,770 So presumably, e is not in that minimum spanning tree. 576 00:34:20,420 --> 00:34:24,389 We're going to modify T star to include e. 577 00:34:24,389 --> 00:34:25,870 So again, let me draw the cut. 578 00:34:28,672 --> 00:34:32,770 There's S and V minus S. We have some edge e 579 00:34:32,770 --> 00:34:37,530 which crosses the cut, goes from u to v, 580 00:34:37,530 --> 00:34:39,300 that's not in the minimum spanning tree. 581 00:34:39,300 --> 00:34:44,350 Let's say in blue, I draw the minimum spanning tree. 582 00:34:44,350 --> 00:34:46,520 So you know, the minimum spanning tree 583 00:34:46,520 --> 00:34:48,025 connects everything together here. 584 00:34:51,750 --> 00:34:56,774 I claim it's got to have some edges that cross the cut, 585 00:34:56,774 --> 00:34:58,690 because if it has no edges that cross the cut, 586 00:34:58,690 --> 00:35:02,030 it doesn't connect vertices over here with vertices over here. 587 00:35:02,030 --> 00:35:06,910 So it may not use e, but some of the edges must cross the cut. 588 00:35:06,910 --> 00:35:13,290 So here's a possible minimum spanning tree. 589 00:35:13,290 --> 00:35:15,900 It happens to have sort of two components over here 590 00:35:15,900 --> 00:35:16,600 in S, maybe. 591 00:35:16,600 --> 00:35:19,420 Who knows? 592 00:35:19,420 --> 00:35:24,150 But there's got to be at least one edge that crosses over. 593 00:35:24,150 --> 00:35:31,760 In fact, the minimum spanning tree, T star, 594 00:35:31,760 --> 00:35:37,890 has to connect vertex u to vertex v, somehow. 595 00:35:37,890 --> 00:35:40,710 It doesn't use e, but there's got to be-- it's a tree, 596 00:35:40,710 --> 00:35:43,610 so in fact, there has to be a unique path from u 597 00:35:43,610 --> 00:35:50,710 to v in the minimum spanning tree.
598 00:35:50,710 --> 00:35:55,130 And now u is in S, v is not in S. So if you look at that path, 599 00:35:55,130 --> 00:35:56,630 for a while, you might stay in S, 600 00:35:56,630 --> 00:35:59,410 but eventually you have to leave S, which 601 00:35:59,410 --> 00:36:03,650 means there has to be an edge like this one, which 602 00:36:03,650 --> 00:36:08,060 I'll call it e prime, which transitions 603 00:36:08,060 --> 00:36:11,000 from S to V minus S. 604 00:36:11,000 --> 00:36:20,950 So there must be an edge e prime in the minimum spanning tree 605 00:36:20,950 --> 00:36:30,000 that crosses the cut, because u and v are connected by a path 606 00:36:30,000 --> 00:36:32,650 and that path starts in S, ends not in S, so it's got 607 00:36:32,650 --> 00:36:34,190 to transition at least once. 608 00:36:34,190 --> 00:36:36,570 It might transition many times, but there has 609 00:36:36,570 --> 00:36:38,600 to be at least one such edge. 610 00:36:38,600 --> 00:36:42,420 And now what I'm going to do is cut and paste. 611 00:36:42,420 --> 00:36:47,230 I'm going to remove e prime and add an e instead. 612 00:36:47,230 --> 00:36:55,985 So I'm going to look at T star minus e prime plus e. 613 00:36:59,510 --> 00:37:02,440 I claim that is a minimum spanning tree. 614 00:37:02,440 --> 00:37:05,940 First I want to claim, this is maybe the more annoying part, 615 00:37:05,940 --> 00:37:07,420 that it is a spanning tree. 616 00:37:16,590 --> 00:37:20,130 This is more of a graph theory thing. 617 00:37:20,130 --> 00:37:22,930 I guess one comforting thing is that you've 618 00:37:22,930 --> 00:37:26,080 preserved the number of edges, so it should still 619 00:37:26,080 --> 00:37:29,090 be if you get one property, you get 620 00:37:29,090 --> 00:37:32,180 the other, because I remove one edge, add in one edge, 621 00:37:32,180 --> 00:37:34,960 I'm still going to have n minus 1 edges. 
622 00:37:34,960 --> 00:37:37,520 The worry, I guess, is that things become disconnected 623 00:37:37,520 --> 00:37:41,490 when you do that, but that's essentially not 624 00:37:41,490 --> 00:37:43,240 going to happen because if I think 625 00:37:43,240 --> 00:37:46,680 of removing e prime, again, that disconnects the tree into two 626 00:37:46,680 --> 00:37:48,290 parts. 627 00:37:48,290 --> 00:37:53,860 And I know, by this path, that one part contains this vertex, 628 00:37:53,860 --> 00:37:55,530 another part contains this vertex, 629 00:37:55,530 --> 00:37:58,300 and I know that this vertex is connected to u 630 00:37:58,300 --> 00:38:00,380 and this vertex is connected to v. Maybe I should 631 00:38:00,380 --> 00:38:03,220 call this u prime and v prime. 632 00:38:03,220 --> 00:38:06,030 I know u and u prime are connected by a path. 633 00:38:06,030 --> 00:38:08,380 I know v and v prime are connected by a path. 634 00:38:08,380 --> 00:38:10,160 But I know that by deleting e prime, 635 00:38:10,160 --> 00:38:12,900 u prime and v prime are not connected to each other. 636 00:38:12,900 --> 00:38:15,300 Therefore, u and v are not connected to each other, 637 00:38:15,300 --> 00:38:16,970 after removing e prime. 638 00:38:16,970 --> 00:38:20,750 So when I add in e, I newly connect u and v again, 639 00:38:20,750 --> 00:38:24,382 and so everything's connected back together. 640 00:38:24,382 --> 00:38:26,090 I have exactly the right number of edges. 641 00:38:26,090 --> 00:38:27,790 Therefore, I'm a spanning tree. 642 00:38:30,300 --> 00:38:31,910 So that's the graph theory part. 643 00:38:31,910 --> 00:38:33,910 Now the interesting part from a greedy algorithm 644 00:38:33,910 --> 00:38:39,840 is to prove that this is minimum, that the weight is not too big. 645 00:38:39,840 --> 00:38:41,290 So let's do that over here. 646 00:38:50,500 --> 00:39:02,430 So I have the weight of T star minus e plus-- minus e 647 00:39:02,430 --> 00:39:04,500 prime plus e.
648 00:39:04,500 --> 00:39:07,210 By linearity, this is just the weight 649 00:39:07,210 --> 00:39:13,170 of T star minus the weight of e prime plus the weight of e. 650 00:39:16,492 --> 00:39:18,200 And now we're going to use this property, 651 00:39:18,200 --> 00:39:21,850 we haven't used that yet, e is a least-weight edge crossing 652 00:39:21,850 --> 00:39:23,190 the cut. 653 00:39:23,190 --> 00:39:26,295 So e prime crosses the cut, so does e, 654 00:39:26,295 --> 00:39:28,100 but e is the smallest possible weight you 655 00:39:28,100 --> 00:39:29,590 could have crossing the cut. 656 00:39:29,590 --> 00:39:34,880 That means that-- I'll put that over here-- the weight of e 657 00:39:34,880 --> 00:39:37,350 is less than or equal to the weight of e 658 00:39:37,350 --> 00:39:40,960 prime, because e prime is a particular edge crossing 659 00:39:40,960 --> 00:39:43,620 the cut, e was the smallest weight of them. 660 00:39:43,620 --> 00:39:47,610 So that tells us something about this. 661 00:39:47,610 --> 00:39:51,850 Signs are so difficult. I think that means 662 00:39:51,850 --> 00:39:53,720 that this is negative or zero. 663 00:39:56,820 --> 00:40:02,630 So this should be less than or equal to w of T star, 664 00:40:02,630 --> 00:40:04,240 and that's what I want, because that 665 00:40:04,240 --> 00:40:06,530 says the weight of this spanning tree is less than 666 00:40:06,530 --> 00:40:09,424 or equal to the optimum weight, the minimum weight. 667 00:40:09,424 --> 00:40:11,340 So that means, actually, this must be minimum. 668 00:40:14,960 --> 00:40:17,310 So what I've done is I've constructed a new minimum 669 00:40:17,310 --> 00:40:17,910 spanning tree. 670 00:40:17,910 --> 00:40:22,554 It's just as good as T star, but now it includes my edge e, 671 00:40:22,554 --> 00:40:23,970 and that's what I wanted to prove.
672 00:40:23,970 --> 00:40:25,820 There is a minimum spanning tree that 673 00:40:25,820 --> 00:40:28,987 contains e, provided e is the minimum weight 674 00:40:28,987 --> 00:40:29,820 edge crossing a cut. 675 00:40:33,110 --> 00:40:37,900 So that proves this greedy choice property. 676 00:40:37,900 --> 00:40:43,680 And I'm going to observe one extra feature of this proof, 677 00:40:43,680 --> 00:40:46,610 which is that-- so we cut and paste, 678 00:40:46,610 --> 00:40:50,060 in the sense that we removed one thing, which was e prime, 679 00:40:50,060 --> 00:40:53,760 and we added a different thing, e. 680 00:40:53,760 --> 00:41:00,420 And a useful feature is that the things that we change only 681 00:41:00,420 --> 00:41:03,010 are edges that cross the cut. 682 00:41:03,010 --> 00:41:11,870 So we only, let's say, modified edges that cross the cut. 683 00:41:22,860 --> 00:41:24,576 I'm going to use that later. 684 00:41:24,576 --> 00:41:27,200 We removed one edge that crossed the cut, and we put in the one 685 00:41:27,200 --> 00:41:29,548 that we wanted. 686 00:41:29,548 --> 00:41:31,330 OK so far? 687 00:41:31,330 --> 00:41:34,692 There's a bunch of lemmas. 688 00:41:34,692 --> 00:41:37,025 Now we actually get to do algorithms using these lemmas. 689 00:41:39,930 --> 00:41:44,790 We'll start with maybe the less obvious algorithm, 690 00:41:44,790 --> 00:41:47,480 but it's nice because it's very much like Dijkstra. 691 00:41:47,480 --> 00:41:50,810 It follows very closely to the Dijkstra model. 692 00:41:50,810 --> 00:41:53,021 And then we'll get to the one that we've all 693 00:41:53,021 --> 00:41:55,270 been thinking about, which was choose a minimum weight 694 00:41:55,270 --> 00:41:57,460 edge, contract, and repeat. 695 00:41:57,460 --> 00:42:03,790 That doesn't-- well, that does work, but the obvious way is, 696 00:42:03,790 --> 00:42:05,400 maybe, slow. 697 00:42:05,400 --> 00:42:07,870 We want to do it in near linear time. 
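The weight calculation in the cut and paste step can be summarized as follows. This is a sketch in assumed notation: $T^*$ is the starting minimum spanning tree, $e'$ is the edge of $T^*$ on the path from $u$ to $v$ that crosses the cut, and $e$ is the least-weight crossing edge.

```latex
\[
w(T^* - e' + e) \;=\; w(T^*) - w(e') + w(e) \;\le\; w(T^*),
\qquad \text{because } w(e) \le w(e').
\]
```

Since $T^*$ already has minimum weight, the new spanning tree $T^* - e' + e$ has the same weight, so it is a minimum spanning tree that contains $e$.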
698 00:42:20,075 --> 00:42:21,950 Let's start with the Dijkstra-like algorithm. 699 00:42:26,130 --> 00:42:27,340 This is Prim's algorithm. 700 00:42:36,410 --> 00:42:38,485 Maybe I'll start by writing down the algorithm. 701 00:42:38,485 --> 00:42:39,760 It's a little long. 702 00:42:43,080 --> 00:42:46,710 In general, the idea-- we want to apply this greedy choice 703 00:42:46,710 --> 00:42:47,210 property. 704 00:42:47,210 --> 00:42:48,710 To apply the greedy choice property, 705 00:42:48,710 --> 00:42:51,570 you need to choose a cut. 706 00:42:51,570 --> 00:42:54,800 With Prim, we're going to start out with an obvious cut, which 707 00:42:54,800 --> 00:42:56,490 is a single vertex. 708 00:42:56,490 --> 00:42:59,310 If we have a single vertex s, and we 709 00:42:59,310 --> 00:43:02,976 say that is our set capital S, then you know, 710 00:43:02,976 --> 00:43:05,100 there's some edges coming out of it. 711 00:43:05,100 --> 00:43:08,900 There's basically S versus everyone else. 712 00:43:08,900 --> 00:43:10,040 That's a cut. 713 00:43:10,040 --> 00:43:11,910 And so I could take the minimum weight edge 714 00:43:11,910 --> 00:43:15,280 coming out of that cut and put that 715 00:43:15,280 --> 00:43:17,160 in my minimum spanning tree. 716 00:43:17,160 --> 00:43:22,420 So when I do that, I put it in my minimum spanning tree 717 00:43:22,420 --> 00:43:24,810 because I know it's in some minimum spanning tree. 718 00:43:24,810 --> 00:43:28,890 Now, I'm going to make capital S grow a little bit to include 719 00:43:28,890 --> 00:43:32,100 that vertex, and repeat. 720 00:43:32,100 --> 00:43:34,310 That's actually also a very natural algorithm. 721 00:43:34,310 --> 00:43:37,780 Start with a tiny S and just keep growing it one by one. 722 00:43:37,780 --> 00:43:41,660 At each stage use this lemma to guarantee the edge I'm adding 723 00:43:41,660 --> 00:43:45,040 is still in the minimum spanning tree.
724 00:43:45,040 --> 00:43:48,640 So to make that work out, we're always 725 00:43:48,640 --> 00:43:52,630 going to need to choose the minimum weight edge that's 726 00:43:52,630 --> 00:43:56,550 coming out of the cut. 727 00:43:56,550 --> 00:44:01,690 And we'll do that using a priority queue, 728 00:44:01,690 --> 00:44:02,920 just like we do in Dijkstra. 729 00:44:07,820 --> 00:44:11,460 So for every vertex that's in V minus S, 730 00:44:11,460 --> 00:44:15,380 we're going to have that vertex in the priority queue. 731 00:44:15,380 --> 00:44:21,870 And the question is, what is the key value of that node 732 00:44:21,870 --> 00:44:24,360 stored in the priority queue? 733 00:44:24,360 --> 00:44:29,760 So the invariant I'm going to have is that the key of v 734 00:44:29,760 --> 00:44:36,070 is the minimum of the weights of the edges 735 00:44:36,070 --> 00:44:41,600 that cross the cut into v. So for vertex v, 736 00:44:41,600 --> 00:44:43,360 I want to look at the-- I'm not going 737 00:44:43,360 --> 00:44:46,270 to compute this every time, I'm only going to maintain it. 738 00:44:46,270 --> 00:44:49,885 I want the minimum weight of an edge that starts in S 739 00:44:49,885 --> 00:44:54,870 and goes to v, which is not in S because v in Q-- Q only stores 740 00:44:54,870 --> 00:44:58,740 vertices that are not in S-- I want the key value 741 00:44:58,740 --> 00:45:00,760 to be that minimum weight so if I choose 742 00:45:00,760 --> 00:45:03,790 the overall minimum vertex, that gives me 743 00:45:03,790 --> 00:45:06,610 the edge of minimum weight that crosses the cut. 744 00:45:06,610 --> 00:45:07,110 OK? 745 00:45:07,110 --> 00:45:12,610 I've sort of divided this minimum vertex by vertex. 746 00:45:12,610 --> 00:45:15,790 For every vertex over here, I'm going 747 00:45:15,790 --> 00:45:18,740 to say, what's the minimum incoming weight from somebody 748 00:45:18,740 --> 00:45:19,750 over here? 
749 00:45:19,750 --> 00:45:21,230 What's the minimum incoming weight 750 00:45:21,230 --> 00:45:23,040 from someone over here to there? 751 00:45:23,040 --> 00:45:24,716 To here? 752 00:45:24,716 --> 00:45:26,112 Take the minimum of those things. 753 00:45:26,112 --> 00:45:27,570 And of course, the min of all those 754 00:45:27,570 --> 00:45:29,595 will be the min of all those edges. 755 00:45:29,595 --> 00:45:32,450 OK, that's how I'm dividing things up. 756 00:45:32,450 --> 00:45:35,090 And this will be easier to maintain, but let me 757 00:45:35,090 --> 00:45:37,055 first initialize everything. 758 00:45:46,484 --> 00:45:48,400 OK, I guess we're going to actually initialize 759 00:45:48,400 --> 00:45:54,580 with S being the empty set, so Q will store everybody, 760 00:45:54,580 --> 00:45:59,790 except I'm going to get things started by setting 761 00:45:59,790 --> 00:46:02,430 for a particular vertex little s. 762 00:46:02,430 --> 00:46:03,830 I'm going to set its key to zero. 763 00:46:07,310 --> 00:46:11,210 It doesn't matter who little s is. 764 00:46:11,210 --> 00:46:12,490 That's just your start vertex. 765 00:46:15,960 --> 00:46:19,750 Just pick one vertex and set its key to zero. 766 00:46:19,750 --> 00:46:22,350 That will force it to be chosen first 767 00:46:22,350 --> 00:46:32,180 because for everyone else, for v not equal to s, 768 00:46:32,180 --> 00:46:36,380 I'm going to set the key to infinity, 769 00:46:36,380 --> 00:46:40,304 because we haven't yet seen any edges that go in there, 770 00:46:40,304 --> 00:46:41,720 but we'll change that in a moment. 771 00:46:56,930 --> 00:47:01,050 OK, so that was the initialization, now 772 00:47:01,050 --> 00:47:03,020 we're going to do a loop.
773 00:47:03,020 --> 00:47:06,010 We're going to keep going until the Q is empty, 774 00:47:06,010 --> 00:47:10,550 because when the Q is empty, that means S is everybody, 775 00:47:10,550 --> 00:47:12,660 and at that point, we'll have a spanning tree 776 00:47:12,660 --> 00:47:16,005 on the whole graph, and it better be minimum. 777 00:47:16,005 --> 00:47:21,210 OK, and we're going to do that by extracting 778 00:47:21,210 --> 00:47:29,520 the minimum from our priority queue. When 779 00:47:29,520 --> 00:47:41,390 we remove Q-- we remove vertex u from the queue Q, 780 00:47:41,390 --> 00:47:44,920 this means that we're adding u to S. OK, 781 00:47:44,920 --> 00:47:48,030 by taking it out of Q, that means it enters S, 782 00:47:48,030 --> 00:47:51,070 by the invariant at the top. 783 00:47:51,070 --> 00:47:55,585 So now we need to update this invariant, 784 00:47:55,585 --> 00:47:57,520 that all the key values are correct. 785 00:47:57,520 --> 00:47:59,940 As soon as we move a vertex into S, 786 00:47:59,940 --> 00:48:05,490 now there are new edges we have to consider from S to not S, 787 00:48:05,490 --> 00:48:12,840 and we do that just by looking at all of the neighbors of u. 788 00:48:12,840 --> 00:48:14,490 I haven't written this in a long time, 789 00:48:14,490 --> 00:48:17,200 but this is how it's usually written in Dijkstra, 790 00:48:17,200 --> 00:48:20,560 except in Dijkstra, these are the outgoing edges from u 791 00:48:20,560 --> 00:48:21,730 and v are the neighbors. 792 00:48:21,730 --> 00:48:23,700 Here, it's an undirected graph, so these are 793 00:48:23,700 --> 00:48:25,580 all of the neighbors v of u. 794 00:48:25,580 --> 00:48:28,630 This is an adjacency list. 795 00:48:28,630 --> 00:48:38,039 OK, so we're looking at u, which has just been added to S, 796 00:48:38,039 --> 00:48:39,330 and we're looking at the edges. 797 00:48:39,330 --> 00:48:42,830 We want to look at the edges that go to V minus S, only 798 00:48:42,830 --> 00:48:44,040 those ones.
799 00:48:44,040 --> 00:48:48,390 And then for those vertices v, we need to update their keys, 800 00:48:48,390 --> 00:48:52,050 because it used to just count all of these edges that 801 00:48:52,050 --> 00:48:54,860 went from the rest of S to v. And now we have a new edge 802 00:48:54,860 --> 00:49:01,240 uv that v needs to consider, because u just got added to S. 803 00:49:01,240 --> 00:49:05,020 So the first thing I'm going to say is if v is in Q. 804 00:49:05,020 --> 00:49:09,020 So we're just going to store a Boolean for every vertex 805 00:49:09,020 --> 00:49:11,196 about whether it's in the queue, and so 806 00:49:11,196 --> 00:49:12,570 when I extract it from the queue, 807 00:49:12,570 --> 00:49:14,850 I just set that Boolean to false. 808 00:49:14,850 --> 00:49:17,530 Being in the queue is the same as being not in S, 809 00:49:17,530 --> 00:49:20,286 this is what Q represents. 810 00:49:20,286 --> 00:49:24,000 So Q is over here, kind of. 811 00:49:24,000 --> 00:49:31,560 So if we're in the queue, same as saying v is not in S, 812 00:49:31,560 --> 00:49:33,970 then we're going to do a check which 813 00:49:33,970 --> 00:49:35,440 lets us compute the minimum. 814 00:49:35,440 --> 00:49:37,370 This is going to look a lot like a relaxation. 815 00:49:44,117 --> 00:49:44,617 Sorry. 816 00:50:02,120 --> 00:50:03,800 A couple things going on because I 817 00:50:03,800 --> 00:50:05,244 want to compute not just the value 818 00:50:05,244 --> 00:50:06,910 of the minimum spanning tree, I actually 819 00:50:06,910 --> 00:50:08,535 want to find the minimum spanning tree, 820 00:50:08,535 --> 00:50:11,350 so I'm going to store parent pointers. 821 00:50:11,350 --> 00:50:13,180 But this is just basically taking a min. 822 00:50:13,180 --> 00:50:14,967 I say, if the weight of this edge 823 00:50:14,967 --> 00:50:16,800 is smaller than what's currently in the key, 824 00:50:16,800 --> 00:50:20,410 then update the key, because the key is supposed to be the min.
825 00:50:20,410 --> 00:50:23,561 OK, that's all we need to do to maintain this invariant, this 826 00:50:23,561 --> 00:50:24,060 for loop. 827 00:50:24,060 --> 00:50:27,830 After the for loop, this property will be restored, 828 00:50:27,830 --> 00:50:30,750 v dot key will be that minimum. 829 00:50:30,750 --> 00:50:33,370 And furthermore, we kept track of where the minimums came 830 00:50:33,370 --> 00:50:38,260 from, so when you end up extracting a vertex, 831 00:50:38,260 --> 00:50:42,810 you've already figured out which edge you 832 00:50:42,810 --> 00:50:45,530 added to put that into the set. 833 00:50:45,530 --> 00:50:50,140 So in fact, u already had a parent, 834 00:50:50,140 --> 00:50:56,810 this would be u dot parent, and we 835 00:50:56,810 --> 00:51:00,770 want to add that edge into the minimum spanning tree 836 00:51:00,770 --> 00:51:07,950 when we add u to S. Overall, let me write why this is happening. 837 00:51:10,810 --> 00:51:15,030 At the end of the algorithm, for every vertex v, 838 00:51:15,030 --> 00:51:16,450 we want the v dot parent. 839 00:51:22,710 --> 00:51:25,317 And that will be our minimum spanning tree. 840 00:51:25,317 --> 00:51:27,650 Those are the edges that form the minimum spanning tree. 841 00:51:33,460 --> 00:51:37,745 Let's prove that this works. 842 00:51:52,995 --> 00:51:54,245 Actually, let's do an example. 843 00:51:57,472 --> 00:51:59,995 We've done enough proofs for a while. 844 00:51:59,995 --> 00:52:01,190 Let's do it over here. 845 00:52:10,350 --> 00:52:12,280 I need a little break. 846 00:52:12,280 --> 00:52:16,420 Examples are fun, though easy to make mistakes, so correct me 847 00:52:16,420 --> 00:52:18,780 if you see me making a mistake. 848 00:52:18,780 --> 00:52:22,035 And let me draw a graph. 849 00:52:41,174 --> 00:52:42,920 OK, weights. 850 00:52:42,920 --> 00:52:53,890 14, 3, 8, 5, 6, 12, 7, 9, 15. 851 00:52:57,706 --> 00:52:58,206 10. 852 00:53:01,550 --> 00:53:03,580 OK. 
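The algorithm described so far can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code written on the board: the adjacency-list format (a dict mapping each vertex to a list of `(neighbor, weight)` pairs) and the name `prim_mst` are made up for illustration, and vertices are assumed to be mutually comparable (for example, all strings) so heap ties can be broken. Python's `heapq` has no decrease-key operation, so this sketch swaps in the common workaround of pushing a fresh entry and skipping stale ones when they are popped.

```python
import heapq
import math


def prim_mst(adj, s):
    """Return MST edges as (parent, vertex, weight) for a connected graph.

    adj: dict mapping vertex -> list of (neighbor, weight), undirected (assumed).
    s: the start vertex, whose key is set to zero.
    """
    key = {v: math.inf for v in adj}      # v.key: best known edge weight into v
    parent = {v: None for v in adj}       # v.parent: other end of that edge
    in_queue = {v: True for v in adj}     # the per-vertex Boolean: v not yet in S
    key[s] = 0
    heap = [(0, s)]
    while heap:
        _, u = heapq.heappop(heap)        # extract-min
        if not in_queue[u]:
            continue                      # stale entry left by an earlier update
        in_queue[u] = False               # u leaves Q, so it enters S
        for v, w in adj[u]:
            # Only edges that cross the cut (v still in Q) can lower a key.
            if in_queue[v] and w < key[v]:
                key[v] = w
                parent[v] = u
                heapq.heappush(heap, (w, v))  # stands in for decrease-key
    return [(parent[v], v, key[v]) for v in adj if parent[v] is not None]
```

The returned list is the set of edges `{v, v.parent}` for every vertex other than the start vertex, which is the tree the lecture reads off at the end of the algorithm.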
853 00:53:03,580 --> 00:53:06,180 Colors. 854 00:53:06,180 --> 00:53:09,580 So I want to start at this vertex 855 00:53:09,580 --> 00:53:12,470 just because I know it does an interesting thing, 856 00:53:12,470 --> 00:53:14,490 or it's a nice example. 857 00:53:14,490 --> 00:53:16,360 Here's my weighted undirected graph. 858 00:53:16,360 --> 00:53:18,340 I want to compute minimum spanning tree. 859 00:53:18,340 --> 00:53:21,950 I'm going to start with a capital 860 00:53:21,950 --> 00:53:27,030 S being-- well actually, I start with capital S being nothing, 861 00:53:27,030 --> 00:53:30,040 and all of the weights-- all of the key values 862 00:53:30,040 --> 00:53:31,260 are initially infinity. 863 00:53:31,260 --> 00:53:34,520 So I'm going to write the key values in blue. 864 00:53:34,520 --> 00:53:45,230 So initially everything is infinity for every vertex, 865 00:53:45,230 --> 00:53:48,000 except for s the value is zero. 866 00:53:50,720 --> 00:53:53,530 So all of these things are in my priority queue, 867 00:53:53,530 --> 00:53:58,490 and so when I extract from the queue, I of course get s. OK, 868 00:53:58,490 --> 00:54:00,990 that's the point of that set up. 869 00:54:00,990 --> 00:54:04,640 So that's when I draw the red circle containing little s. 870 00:54:04,640 --> 00:54:08,480 The red circle here is supposed to be capital S. 871 00:54:08,480 --> 00:54:12,160 So at this point, I've added capital S-- little s 872 00:54:12,160 --> 00:54:19,560 to capital S, and then I look at all of the neighbors v of s. 873 00:54:19,560 --> 00:54:22,870 And I make sure that they are outside of S. In this case, 874 00:54:22,870 --> 00:54:23,500 they all are. 875 00:54:23,500 --> 00:54:27,885 All three neighbors, these three guys, are not in S. 876 00:54:27,885 --> 00:54:30,130 And then I look at the weights of the edges. 877 00:54:30,130 --> 00:54:31,920 Here I have a weight 7 edge.
878 00:54:31,920 --> 00:54:33,990 That's smaller than infinity, so I'm 879 00:54:33,990 --> 00:54:36,970 going to cross out infinity and write 7. 880 00:54:36,970 --> 00:54:39,430 And 15 is smaller than infinity, so I'm 881 00:54:39,430 --> 00:54:41,674 going to cross out infinity and write 15. 882 00:54:41,674 --> 00:54:45,139 And 10, surprise, is smaller than infinity. 883 00:54:45,139 --> 00:54:46,930 So I'm going to cross out infinity and write 10. 884 00:54:46,930 --> 00:54:51,070 So now I've updated the key values for those three nodes. 885 00:54:51,070 --> 00:54:53,500 I should mention in the priority queue, 886 00:54:53,500 --> 00:54:58,210 to do that, that is a decrease-key operation. 887 00:54:58,210 --> 00:55:01,445 This thing here is a decrease-key. 888 00:55:01,445 --> 00:55:03,320 You need to update the priority queue to say, 889 00:55:03,320 --> 00:55:07,090 hey look, the key of this node changed. 890 00:55:07,090 --> 00:55:09,920 And so you're going to have to move it around in the heap, 891 00:55:09,920 --> 00:55:12,400 or whatever. 892 00:55:12,400 --> 00:55:14,610 Just like Dijkstra, same thing happens. 893 00:55:14,610 --> 00:55:16,860 OK, so I've decreased the key of those three nodes. 894 00:55:16,860 --> 00:55:18,230 Now I do another iteration. 895 00:55:18,230 --> 00:55:20,450 I look at all of the key values stored. 896 00:55:20,450 --> 00:55:27,200 The smallest one is 7, because this node's no longer in there. 897 00:55:27,200 --> 00:55:30,840 So I'm going to add this node to capital S. 898 00:55:30,840 --> 00:55:34,920 So capital S is going to grow to include that node. 899 00:55:34,920 --> 00:55:36,800 I've extracted it from the queue. 900 00:55:36,800 --> 00:55:39,570 And now I look at all the neighbors of that node. 901 00:55:39,570 --> 00:55:42,520 So, for example, here's a neighbor. 902 00:55:42,520 --> 00:55:46,670 9 is less than infinity, so I write 9. 903 00:55:46,670 --> 00:55:47,630 Here's a neighbor. 
904 00:55:47,630 --> 00:55:51,160 12 is less than infinity, so I write 12. 905 00:55:51,160 --> 00:55:53,810 5 is less than infinity, so I write 5. 906 00:55:53,810 --> 00:55:57,030 Here's a neighbor, but s is in big S, 907 00:55:57,030 --> 00:55:59,210 so we're not going to touch that edge. 908 00:55:59,210 --> 00:56:01,790 I'm not going to touch s. 909 00:56:01,790 --> 00:56:02,460 OK? 910 00:56:02,460 --> 00:56:06,070 I will end up looking at every edge twice, so no big deal. 911 00:56:06,070 --> 00:56:07,770 Right now, who's smallest? 912 00:56:07,770 --> 00:56:09,390 5, I think. 913 00:56:09,390 --> 00:56:12,440 It's the smallest blue key. 914 00:56:12,440 --> 00:56:14,530 So we're going to add 5 to the set. 915 00:56:17,730 --> 00:56:20,430 Sorry, add this vertex to the set S, 916 00:56:20,430 --> 00:56:24,370 and then look at all of the outgoing edges from here. 917 00:56:24,370 --> 00:56:28,720 So 6 is actually less than 12, so this edge 918 00:56:28,720 --> 00:56:31,200 is better than that one was. 919 00:56:31,200 --> 00:56:33,391 Then, what's that, an 8? 920 00:56:33,391 --> 00:56:34,990 8 is less than 10. 921 00:56:37,820 --> 00:56:40,670 14 is definitely less than infinity. 922 00:56:40,670 --> 00:56:43,020 And we look at this edge, but that edge 923 00:56:43,020 --> 00:56:46,840 stays inside the red set, so we forget about it. 924 00:56:46,840 --> 00:56:50,280 Next smallest value is 6. 925 00:56:50,280 --> 00:56:59,240 So 6, we add this guy in. 926 00:56:59,240 --> 00:57:01,390 We look at the edges from that vertex, 927 00:57:01,390 --> 00:57:03,940 but actually nothing happens because all those vertices 928 00:57:03,940 --> 00:57:08,700 are inside capital S, so we don't care about those edges. 929 00:57:08,700 --> 00:57:20,350 Next one is 8, so we'll add in this vertex. 930 00:57:20,350 --> 00:57:23,830 And there's only one edge that leaves the cut, so that's 3, 931 00:57:23,830 --> 00:57:27,074 and 3 is indeed better than 14. 
932 00:57:27,074 --> 00:57:31,700 So never mind. 933 00:57:31,700 --> 00:57:34,470 Stop. 934 00:57:34,470 --> 00:57:38,757 So good, now I think the smallest key is 3. 935 00:57:38,757 --> 00:57:40,590 Notice smallest key is smaller than anything 936 00:57:40,590 --> 00:57:43,390 we've seen before, other than 0, but that's OK. 937 00:57:43,390 --> 00:57:47,260 I'll just add it in, and there's no edges 938 00:57:47,260 --> 00:57:48,920 leaving the cut from there. 939 00:57:48,920 --> 00:57:51,140 And then over here, we have 9 and 15. 940 00:57:51,140 --> 00:57:53,320 So first we'll add 9. 941 00:57:53,320 --> 00:57:54,570 There's no edges there. 942 00:57:54,570 --> 00:57:55,670 Then we add 15. 943 00:57:55,670 --> 00:57:56,701 OK, now s is everything. 944 00:57:56,701 --> 00:57:57,200 We're done. 945 00:57:57,200 --> 00:57:58,544 Q is empty. 946 00:57:58,544 --> 00:57:59,960 Where's the minimal spanning tree? 947 00:57:59,960 --> 00:58:02,980 I forgot to draw it. 948 00:58:02,980 --> 00:58:06,200 Luckily, all of the edges here have different numbers 949 00:58:06,200 --> 00:58:07,010 as labels. 950 00:58:07,010 --> 00:58:09,780 So when I have a 3 here, what I mean is, 951 00:58:09,780 --> 00:58:11,970 include 3 in the minimum spanning tree, 952 00:58:11,970 --> 00:58:13,650 the edge that was labeled 3. 953 00:58:13,650 --> 00:58:16,670 OK, so this will be a minimum spanning tree edge. 954 00:58:16,670 --> 00:58:19,560 5 will be a minimum spanning tree edge. 955 00:58:19,560 --> 00:58:21,230 These are actually the parent pointers. 956 00:58:21,230 --> 00:58:24,020 6 will be a minimum spanning tree edge. 957 00:58:24,020 --> 00:58:31,645 7, 9, 15, and 8. 958 00:58:34,720 --> 00:58:37,460 Every vertex except the starting one 959 00:58:37,460 --> 00:58:40,030 will have a parent, which means we'll have exactly n minus 1 960 00:58:40,030 --> 00:58:42,880 edges, that's a good sign. 961 00:58:42,880 --> 00:58:45,320 And in fact, this will be a minimum spanning tree. 
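The procedure traced in the example can be sketched in code. This is a minimal sketch, not the lecture's exact pseudocode: Python's `heapq` has no decrease-key, so the sketch swaps in lazy deletion (push a fresh entry and skip stale ones when popped). The function name `prim_mst`, the adjacency-dict graph format, and the small four-vertex graph are assumptions made for illustration; the graph is not the one drawn on the board.

```python
import heapq
import math


def prim_mst(graph, start):
    """Grow S from `start`; return (parent pointers, total MST weight).

    graph: dict mapping vertex -> dict of neighbor -> edge weight
    (undirected, so each edge appears in both directions).
    """
    key = {v: math.inf for v in graph}   # cheapest known edge into S
    parent = {v: None for v in graph}    # where that cheapest edge came from
    key[start] = 0
    in_S = set()
    heap = [(0, start)]
    while heap:
        _, u = heapq.heappop(heap)       # extract-min
        if u in in_S:
            continue                     # stale entry (stands in for decrease-key)
        in_S.add(u)
        for v, w in graph[u].items():
            if v not in in_S and w < key[v]:
                key[v] = w               # the "cross out and write" step
                parent[v] = u
                heapq.heappush(heap, (w, v))
    return parent, sum(key.values())


# Hypothetical example graph (not the one from the lecture).
example = {
    "a": {"b": 7, "c": 10},
    "b": {"a": 7, "c": 5, "d": 9},
    "c": {"a": 10, "b": 5, "d": 6},
    "d": {"b": 9, "c": 6},
}
parent, weight = prim_mst(example, "a")
```

With lazy deletion the heap can hold up to E entries, so this sketch runs in O(E log V) rather than the Fibonacci-heap bound; the MST edges are the pairs (v, parent[v]) for every vertex other than the start.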
962 00:58:45,320 --> 00:58:49,460 That's the claim, because every time we grew the circle 963 00:58:49,460 --> 00:58:51,350 to include a bigger thing, we were 964 00:58:51,350 --> 00:58:55,970 guaranteed that this edge was in the minimum spanning tree 965 00:58:55,970 --> 00:59:00,100 by applying this property with that cut. 966 00:59:02,710 --> 00:59:04,250 Let me just write that down. 967 00:59:13,430 --> 00:59:15,030 OK, to prove correctness, you need 968 00:59:15,030 --> 00:59:19,750 to prove an invariant that this key, the key of every vertex, 969 00:59:19,750 --> 00:59:21,840 always remains this minimum. 970 00:59:21,840 --> 00:59:22,847 So this is an invariant. 971 00:59:22,847 --> 00:59:24,305 You should prove that by induction. 972 00:59:33,770 --> 00:59:34,810 I won't prove it here. 973 00:59:37,820 --> 00:59:44,280 But we have another invariant, a more interesting one 974 00:59:44,280 --> 00:59:46,146 from an MST perspective, you know, 975 00:59:46,146 --> 00:59:49,680 it's just a sort of algorithm implementation detail, 976 00:59:49,680 --> 00:59:59,640 that the tree T sub S, within S is always contained 977 00:59:59,640 --> 01:00:05,615 in a minimum spanning tree of G. So over here, 978 01:00:05,615 --> 01:00:07,740 we have this way of computing minimum spanning tree 979 01:00:07,740 --> 01:00:10,360 for all vertices v, but what I'd like 980 01:00:10,360 --> 01:00:12,470 to do is just look at v that's currently 981 01:00:12,470 --> 01:00:15,680 in S. By the end, that will be the whole thing, 982 01:00:15,680 --> 01:00:19,460 but if I look at v in S, and I always look at the edge from v 983 01:00:19,460 --> 01:00:24,580 to v dot parent, that gives me this tree TS. 984 01:00:24,580 --> 01:00:27,590 I claim it will be contained in a minimum spanning tree 985 01:00:27,590 --> 01:00:32,850 of the entire graph, proof by induction. 
986 01:00:32,850 --> 01:00:39,840 So by induction, let's assume-- induction hypothesis will 987 01:00:39,840 --> 01:00:45,430 be that, let's say there is a minimum spanning tree T 988 01:00:45,430 --> 01:00:50,010 star, which contains T sub S, and then what 989 01:00:50,010 --> 01:00:54,020 the algorithm does, is it repeatedly grows S by adding 990 01:00:54,020 --> 01:01:05,240 this vertex u to S. So let's suppose that it adds u to S. 991 01:01:05,240 --> 01:01:08,230 So I'm actually going to look at the edge that it adds. 992 01:01:18,990 --> 01:01:27,630 So we have S and V minus S, and we do this thing, like we just 993 01:01:27,630 --> 01:01:30,060 saw, of growing by one. 994 01:01:30,060 --> 01:01:34,370 We add one new vertex over here to S, 995 01:01:34,370 --> 01:01:39,240 and that vertex has a parent edge, has a parent pointer. 996 01:01:39,240 --> 01:01:41,320 So this edge, I'm going to call e. 997 01:01:41,320 --> 01:01:44,620 So we're adding some vertex u that we extract at the minimum, 998 01:01:44,620 --> 01:01:49,230 and we also added an edge e to this TS, 999 01:01:49,230 --> 01:01:52,780 because we grew S by 1. 1000 01:01:52,780 --> 01:01:56,680 OK, when I do that, all I do is say, look, 1001 01:01:56,680 --> 01:01:58,980 greedy choice property guarantees 1002 01:01:58,980 --> 01:02:03,160 there's a minimum spanning tree that contains e. 1003 01:02:03,160 --> 01:02:05,480 Because we extracted the min from the queue, 1004 01:02:05,480 --> 01:02:08,320 and the key values are this, as I was arguing before, 1005 01:02:08,320 --> 01:02:13,180 that is the minimum overall edge that crosses the cut. 1006 01:02:13,180 --> 01:02:15,600 e is a minimum weight edge that crosses the cut, 1007 01:02:15,600 --> 01:02:29,370 and so by greedy choice property, 1008 01:02:29,370 --> 01:02:35,010 there is some minimum spanning tree that contains e. 
1009 01:02:35,010 --> 01:02:37,250 But actually, I need that the minimum spanning tree 1010 01:02:37,250 --> 01:02:41,400 not only contains e, but also contains all the other spanning 1011 01:02:41,400 --> 01:02:46,050 tree edges that we had already said were in T star. 1012 01:02:46,050 --> 01:02:50,460 OK, so here's where I'm going to use the stronger property. 1013 01:02:50,460 --> 01:03:05,960 I can modify T star to include e and T sub S. 1014 01:03:05,960 --> 01:03:09,400 So we already assumed that T star includes T sub S. I just 1015 01:03:09,400 --> 01:03:11,860 don't want to break that. 1016 01:03:11,860 --> 01:03:14,760 And if you remember the proof of this greedy choice property, 1017 01:03:14,760 --> 01:03:19,360 we said, well all we need to do is remove one edge that crosses 1018 01:03:19,360 --> 01:03:22,045 the cut and replace it with e. 1019 01:03:22,045 --> 01:03:23,920 So here what I'm saying is there's some edge, 1020 01:03:23,920 --> 01:03:27,480 yeah, maybe there's some edge over here in T star 1021 01:03:27,480 --> 01:03:30,850 that we had to remove, and then we put e in. 1022 01:03:30,850 --> 01:03:34,170 And then we get a minimum spanning tree again, 1023 01:03:34,170 --> 01:03:36,760 T star prime. 1024 01:03:36,760 --> 01:03:42,530 OK, this edge that I remove cannot be one of the TS edges 1025 01:03:42,530 --> 01:03:45,660 because the TS edges are all inside S. 1026 01:03:45,660 --> 01:03:49,270 So because I'm only removing an edge that crosses the cut, 1027 01:03:49,270 --> 01:03:50,910 I'm not disturbing TS. 1028 01:03:50,910 --> 01:03:56,230 TS will remain inside T star, but then I get the new property 1029 01:03:56,230 --> 01:04:02,615 that e is inside T star, and so I prove this invariant holds. 1030 01:04:02,615 --> 01:04:03,115 OK? 
1031 01:04:03,115 --> 01:04:06,180 I keep changing T star, but I always preserve the property 1032 01:04:06,180 --> 01:04:09,110 that all of the spanning tree edges that are inside S 1033 01:04:09,110 --> 01:04:12,255 are contained in some minimum spanning tree of G. Maybe 1034 01:04:12,255 --> 01:04:13,630 I'll add in some for emphasis. 1035 01:04:17,520 --> 01:04:18,630 Cool? 1036 01:04:18,630 --> 01:04:20,430 So that's how we use the greedy choice 1037 01:04:20,430 --> 01:04:24,640 property to get correctness of Prim's algorithm. 1038 01:04:26,895 --> 01:04:28,728 What's the running time of Prim's algorithm? 1039 01:04:36,050 --> 01:04:37,690 Same as Dijkstra, good answer. 1040 01:04:41,190 --> 01:04:43,645 I guess it depends what priority queue you use, 1041 01:04:43,645 --> 01:04:46,500 but whatever priority queue you use, it's the same as Dijkstra. 1042 01:04:55,790 --> 01:04:58,430 And so in particular, if we use Fibonacci heaps, which, 1043 01:04:58,430 --> 01:05:09,646 again, we're not covering, we get V log V plus E. In general, 1044 01:05:09,646 --> 01:05:11,520 for every edge, we have to do a decrease-key. 1045 01:05:11,520 --> 01:05:14,160 Actually, for every edge we do two decrease-key operations, 1046 01:05:14,160 --> 01:05:16,720 potentially, if you think about it. 1047 01:05:16,720 --> 01:05:21,780 But this for loop over the adjacency, the cost 1048 01:05:21,780 --> 01:05:23,790 of this stuff is constant. 1049 01:05:23,790 --> 01:05:30,530 The cost of this is the degree of the vertex u. 1050 01:05:30,530 --> 01:05:32,910 And so we're basically doing the sum 1051 01:05:32,910 --> 01:05:36,650 of the degrees of the vertices, which 1052 01:05:36,650 --> 01:05:38,580 is the number of edges times 2. 1053 01:05:38,580 --> 01:05:40,080 That's the handshaking lemma. 
1054 01:05:40,080 --> 01:05:42,310 So for every edge, we're potentially 1055 01:05:42,310 --> 01:05:44,340 doing one decrease-key operation, 1056 01:05:44,340 --> 01:05:46,880 and with Fibonacci heaps, that's constant time. 1057 01:05:46,880 --> 01:05:50,207 But we're also doing V extract mins; those cost log V time, 1058 01:05:50,207 --> 01:05:51,790 because the size of the queue is at most 1059 01:05:51,790 --> 01:05:55,150 V, and so that is actually the right running time. 1060 01:05:55,150 --> 01:05:57,700 Just like Dijkstra, so easy formula to remember. 1061 01:06:00,630 --> 01:06:07,020 All right, let's do one more algorithm, Kruskal's algorithm. 1062 01:06:41,280 --> 01:06:44,860 Kruskal's algorithm is a little bit weirder from the S 1063 01:06:44,860 --> 01:06:48,132 perspective, I guess. 1064 01:06:48,132 --> 01:06:52,802 We'll see what cuts we're using in a moment, 1065 01:06:52,802 --> 01:06:54,260 but it's based around this idea of, 1066 01:06:54,260 --> 01:06:56,220 well, the globally minimum weight 1067 01:06:56,220 --> 01:07:01,390 edge is the minimum weight edge for all cuts that cross it, 1068 01:07:01,390 --> 01:07:03,760 or for all cuts that it crosses. 1069 01:07:03,760 --> 01:07:07,160 The globally minimum weight edge is going to be a valid choice, 1070 01:07:07,160 --> 01:07:08,650 and so, by this theorem, you pick 1071 01:07:08,650 --> 01:07:13,050 some S that partitions the endpoints of e, 1072 01:07:13,050 --> 01:07:14,800 therefore e is in a minimum spanning tree. 1073 01:07:14,800 --> 01:07:18,510 So let's choose that one first, and then repeat. 1074 01:07:18,510 --> 01:07:21,680 Conceptually, what we want to do is that DP idea of contract 1075 01:07:21,680 --> 01:07:24,390 the vertex, sorry, contract the edge 1076 01:07:24,390 --> 01:07:27,580 and then find the minimum weight edge that remains. 
1077 01:07:27,580 --> 01:07:30,400 But the way I'm going to phrase it doesn't explicitly contract, 1078 01:07:30,400 --> 01:07:33,960 although implicitly, it's doing that. 1079 01:07:33,960 --> 01:07:36,820 And there's a catch. 1080 01:07:39,800 --> 01:07:47,444 The catch is suppose I've picked some edges out to be 1081 01:07:47,444 --> 01:07:48,610 in my minimum spanning tree. 1082 01:07:48,610 --> 01:07:50,160 Suppose this was the minimum weight 1083 01:07:50,160 --> 01:07:52,065 and this was the next minimum, next minimum, next minimum, 1084 01:07:52,065 --> 01:07:52,810 next minimum. 1085 01:07:52,810 --> 01:07:56,450 Suppose that the next lar-- at this point, 1086 01:07:56,450 --> 01:07:59,710 after contracting those edges, the minimum weight edge 1087 01:07:59,710 --> 01:08:02,740 is this one. 1088 01:08:02,740 --> 01:08:06,180 Do I want to put this edge in my minimum spanning tree? 1089 01:08:06,180 --> 01:08:07,380 No. 1090 01:08:07,380 --> 01:08:08,340 That would add a cycle. 1091 01:08:08,340 --> 01:08:10,710 Cycles are bad. 1092 01:08:10,710 --> 01:08:12,560 This is the tricky part of this algorithm. 1093 01:08:12,560 --> 01:08:16,560 I have to keep track of whether I should actually 1094 01:08:16,560 --> 01:08:19,590 add an edge, in other words, whether this vertex 1095 01:08:19,590 --> 01:08:23,840 and this vertex have already been connected to each other. 1096 01:08:23,840 --> 01:08:26,189 And it turns out you've already seen a data structure 1097 01:08:26,189 --> 01:08:27,630 to do that. 1098 01:08:27,630 --> 01:08:30,229 This is what I call union-find and the textbook 1099 01:08:30,229 --> 01:08:32,090 calls it disjoint-set data structure. 1100 01:08:37,609 --> 01:08:40,700 So it's in recitation. 1101 01:08:40,700 --> 01:08:41,535 Recitation 3. 1102 01:08:48,430 --> 01:08:51,120 So I want to maintain for my MST so far, 1103 01:08:51,120 --> 01:08:53,090 so I'm adding edges one at a time. 
1104 01:08:53,090 --> 01:08:55,600 And I have some tree-- well, it's actually a forest, 1105 01:08:55,600 --> 01:08:59,550 but I'm still going to call it T, 1106 01:08:59,550 --> 01:09:05,740 and I'm going to maintain it in a union-find structure, 1107 01:09:05,740 --> 01:09:08,770 disjoint-set data structure. 1108 01:09:08,770 --> 01:09:12,770 Remember, this had three operations, make set, union, 1109 01:09:12,770 --> 01:09:14,100 and find set. 1110 01:09:14,100 --> 01:09:19,539 Tell me given an item which set does it belong to? 1111 01:09:19,539 --> 01:09:21,330 We're going to use that, the sets are going 1112 01:09:21,330 --> 01:09:23,560 to be the connected components. 1113 01:09:23,560 --> 01:09:27,664 So after I've added these edges, these guys, these vertices 1114 01:09:27,664 --> 01:09:29,330 here, will form one connected component, 1115 01:09:29,330 --> 01:09:31,315 and, you know, everybody else will just 1116 01:09:31,315 --> 01:09:33,740 be in its own separate component. 1117 01:09:33,740 --> 01:09:40,500 So to get started, I'm not going to have any edges in my tree, 1118 01:09:40,500 --> 01:09:43,890 and so every vertex is in its own connected component. 1119 01:09:43,890 --> 01:09:51,420 So I represent that by calling make-set v for all vertices. 1120 01:09:51,420 --> 01:09:56,000 So every vertex lives in its own singleton set. 1121 01:09:56,000 --> 01:10:00,245 OK, now I'd like to do the minimum weight edge, and then 1122 01:10:00,245 --> 01:10:02,620 the next minimum weight edge, and the next minimum weight 1123 01:10:02,620 --> 01:10:03,120 edge. 1124 01:10:03,120 --> 01:10:05,450 That's also known as sorting, so I'm 1125 01:10:05,450 --> 01:10:16,210 going to sort E by weight, increasing weight, 1126 01:10:16,210 --> 01:10:20,080 so I get to start with the minimum weight edge. 1127 01:10:30,690 --> 01:10:43,010 So now I'm going to do a for-loop over the edges, 1128 01:10:43,010 --> 01:10:44,520 increasing order by weight. 
1129 01:10:49,640 --> 01:10:53,360 Now I want to know-- I have an edge, 1130 01:10:53,360 --> 01:10:55,610 it's basically the minimum weight edge among the edges 1131 01:10:55,610 --> 01:10:58,740 that remain, and so I want to know whether I should add it. 1132 01:10:58,740 --> 01:11:01,894 I'm going to add it provided the endpoints of the edge 1133 01:11:01,894 --> 01:11:03,560 are not in the same connected component. 1134 01:11:06,890 --> 01:11:09,160 How can I find out whether two vertices 1135 01:11:09,160 --> 01:11:11,548 are in the same connected component, given this setup? 1136 01:11:16,528 --> 01:11:17,364 Yeah? 1137 01:11:17,364 --> 01:11:19,030 AUDIENCE: Call find-set twice and then-- 1138 01:11:19,030 --> 01:11:20,738 ERIK DEMAINE: Call find-set twice and see 1139 01:11:20,738 --> 01:11:23,110 whether they're equal, exactly. 1140 01:11:23,110 --> 01:11:23,880 Good answer. 1141 01:11:26,840 --> 01:11:38,520 So if find-set of u is different from find-set of v, 1142 01:11:38,520 --> 01:11:41,394 find-set just returns some identifier. 1143 01:11:41,394 --> 01:11:43,060 We don't really care what it is, as long 1144 01:11:43,060 --> 01:11:45,547 as it returns the same thing for the same set. 1145 01:11:45,547 --> 01:11:47,630 So if u and v are in the same set, in other words, 1146 01:11:47,630 --> 01:11:48,810 they're in the same connected component, 1147 01:11:48,810 --> 01:11:51,290 then find-set will return the same thing for both. 1148 01:11:51,290 --> 01:11:53,410 But provided they're not equal, then 1149 01:11:53,410 --> 01:12:01,890 we can add this edge into our tree. 1150 01:12:01,890 --> 01:12:06,330 So we add e to the set T, and then 1151 01:12:06,330 --> 01:12:08,150 we have to represent the fact that we just 1152 01:12:08,150 --> 01:12:10,590 merged the connected components of u and v, 1153 01:12:10,590 --> 01:12:12,360 and we do that with a union call. 
1154 01:12:17,336 --> 01:12:18,710 And if you're ever wondering what 1155 01:12:18,710 --> 01:12:22,270 the heck do we use union-find for, this is the answer. 1156 01:12:22,270 --> 01:12:24,640 The union-find data structure was invented in order 1157 01:12:24,640 --> 01:12:27,710 to implement Kruskal's algorithm faster, OK? 1158 01:12:27,710 --> 01:12:29,110 In fact, a lot of data structures 1159 01:12:29,110 --> 01:12:31,210 come from graph algorithms. 1160 01:12:31,210 --> 01:12:34,200 The reason Fibonacci heaps were invented 1161 01:12:34,200 --> 01:12:35,970 was because there was Dijkstra's algorithm 1162 01:12:35,970 --> 01:12:37,450 and we wanted it to run fast. 1163 01:12:37,450 --> 01:12:40,700 So same deal here, you just saw it in the reverse order. 1164 01:12:40,700 --> 01:12:41,780 First you saw union-find. 1165 01:12:41,780 --> 01:12:43,710 Now, union-find, you know you can solve it 1166 01:12:43,710 --> 01:12:46,570 in alpha of n time, the inverse Ackermann function, 1167 01:12:46,570 --> 01:12:49,770 super, super tiny, slow growing function, smaller than log 1168 01:12:49,770 --> 01:12:51,370 log log log log log log. 1169 01:12:53,940 --> 01:12:55,560 Really small. 1170 01:12:55,560 --> 01:12:58,000 But we have this sorting, which is kind of annoying. 1171 01:12:58,000 --> 01:13:00,395 So the overall running time-- we'll 1172 01:13:00,395 --> 01:13:02,740 worry about correctness in a moment. 1173 01:13:02,740 --> 01:13:09,940 We have to sort-- to sort E by weight. 1174 01:13:09,940 --> 01:13:12,430 So I'll just call that sort of E. 1175 01:13:12,430 --> 01:13:15,900 Then we have to do some unions. 1176 01:13:15,900 --> 01:13:20,420 I guess for every edge, potentially, we do a union. 1177 01:13:20,420 --> 01:13:29,280 I'll just write E times alpha of v. And then we have to do, 1178 01:13:29,280 --> 01:13:31,765 well, we also have to do find-sets, but same deal. 
1179 01:13:31,765 --> 01:13:34,990 So find-set and union cost alpha amortized, 1180 01:13:34,990 --> 01:13:36,960 so the total cost for doing this for all edges 1181 01:13:36,960 --> 01:13:40,340 is going to be the number of edges times alpha, 1182 01:13:40,340 --> 01:13:45,700 and then there's like plus v, I guess, but that's smaller. 1183 01:13:45,700 --> 01:13:47,230 That's a connected graph. 1184 01:13:47,230 --> 01:13:51,000 So other than the sorting time, this algorithm is really good. 1185 01:13:51,000 --> 01:13:53,910 It's faster. 1186 01:13:53,910 --> 01:13:57,120 But if you're sorting by an n log n algorithm, 1187 01:13:57,120 --> 01:14:00,070 this is not so great. 1188 01:14:00,070 --> 01:14:01,030 That's how it goes. 1189 01:14:01,030 --> 01:14:04,720 I think you can reduce this to sorting just v things, instead 1190 01:14:04,720 --> 01:14:07,490 of E things, with a little bit of effort, 1191 01:14:07,490 --> 01:14:09,060 like doing a select operation. 1192 01:14:09,060 --> 01:14:11,690 But when this algorithm is really good 1193 01:14:11,690 --> 01:14:14,240 is if your weights are integers. 1194 01:14:14,240 --> 01:14:23,540 If you have weights, let's say weight of e is 0 or 1 1195 01:14:23,540 --> 01:14:27,330 or, say, n to the c, for some constant c, 1196 01:14:27,330 --> 01:14:31,340 then I can use radix sort, linear time sorting, 1197 01:14:31,340 --> 01:14:32,920 and then this will be linear time, 1198 01:14:32,920 --> 01:14:34,950 and I'm only paying E times alpha. 1199 01:14:34,950 --> 01:14:38,100 So if you have reasonably small weights, 1200 01:14:38,100 --> 01:14:39,620 Kruskal's algorithm is better. 1201 01:14:39,620 --> 01:14:44,920 Otherwise, I guess you prefer Prim's algorithm. 1202 01:14:44,920 --> 01:14:45,730 But either way. 1203 01:14:48,870 --> 01:14:51,660 I actually used a variation of this algorithm recently. 
1204 01:14:51,660 --> 01:14:54,270 If you want to generate a random spanning tree, 1205 01:14:54,270 --> 01:14:56,800 then you can use exactly the same algorithm. 1206 01:14:56,800 --> 01:14:59,880 You pick a random edge that you haven't picked already, you 1207 01:14:59,880 --> 01:15:03,240 see, can I add this edge with this test? 1208 01:15:03,240 --> 01:15:04,981 If you can, add it and repeat. 1209 01:15:04,981 --> 01:15:06,730 That will give you a random spanning tree. 1210 01:15:06,730 --> 01:15:10,990 It will generate all spanning trees uniformly likely. 1211 01:15:10,990 --> 01:15:16,080 So that's a fun fact, useful thing for union-find. 1212 01:15:16,080 --> 01:15:17,850 Let me tell you briefly about correctness. 1213 01:15:34,012 --> 01:15:35,970 Again, we proved correctness with an invariant. 1214 01:15:50,720 --> 01:15:53,070 Claim that at all times the tree T 1215 01:15:53,070 --> 01:15:54,920 of edges that we've picked so far 1216 01:15:54,920 --> 01:16:00,790 is contained in some minimum spanning tree, T star. 1217 01:16:00,790 --> 01:16:03,480 T star is going to change, but I always 1218 01:16:03,480 --> 01:16:06,680 want the edges I've chosen to be inside a minimum spanning tree. 1219 01:16:06,680 --> 01:16:09,000 Again, we can prove this by induction. 1220 01:16:09,000 --> 01:16:15,000 So assume by induction that this is true 1221 01:16:15,000 --> 01:16:21,910 so far, and then suppose that we're adding an edge here. 1222 01:16:21,910 --> 01:16:27,770 So we're converting T into T prime, which is T union e. 1223 01:16:31,650 --> 01:16:34,690 By the data structural setup, I know 1224 01:16:34,690 --> 01:16:36,880 that the endpoints of e, u, and v 1225 01:16:36,880 --> 01:16:40,470 are in different connected components. 1226 01:16:40,470 --> 01:16:42,230 In general, what my picture looks like, 1227 01:16:42,230 --> 01:16:44,610 is I have some various connected components, 1228 01:16:44,610 --> 01:16:48,320 maybe there's a single vertex, whatever. 
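The loop just described (make-set for every vertex, sort the edges, then test with find-set and merge with union) can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `kruskal_mst`, the `(weight, u, v)` edge tuples, and the small example graph are hypothetical, and the union-find uses path compression with union by rank, which is one standard way to get the near-constant amortized bound rather than necessarily the version from recitation.

```python
def kruskal_mst(vertices, edges):
    """Return the list of MST edges, each as a (weight, u, v) tuple.

    vertices: iterable of hashable vertex names.
    edges: iterable of (weight, u, v) tuples for an undirected graph.
    """
    parent = {v: v for v in vertices}    # make-set: every vertex is its own set
    rank = {v: 0 for v in vertices}

    def find_set(x):
        root = x
        while parent[root] != root:      # walk up to the representative
            root = parent[root]
        while parent[x] != root:         # path compression
            parent[x], x = root, parent[x]
        return root

    def union(x, y):
        rx, ry = find_set(x), find_set(y)
        if rx == ry:
            return
        if rank[rx] < rank[ry]:          # union by rank: attach shorter under taller
            rx, ry = ry, rx
        parent[ry] = rx
        if rank[rx] == rank[ry]:
            rank[rx] += 1

    tree = []
    for w, u, v in sorted(edges):        # increasing order by weight
        if find_set(u) != find_set(v):   # endpoints in different components?
            tree.append((w, u, v))       # safe to add: no cycle is created
            union(u, v)                  # merge the two components
    return tree


# Hypothetical example graph (not the one from the lecture).
example_edges = [
    (7, "a", "b"), (10, "a", "c"), (5, "b", "c"), (9, "b", "d"), (6, "c", "d"),
]
mst = kruskal_mst("abcd", example_edges)
```

Here `sorted` is a comparison sort, so the sketch pays O(E log E) for sorting; with small integer weights a linear-time sort could be substituted for that one call.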
1229 01:16:48,320 --> 01:16:50,780 I've built a minimum spanning tree for each one. 1230 01:16:50,780 --> 01:16:53,430 I built some tree, and I actually 1231 01:16:53,430 --> 01:16:56,520 know that these trees are contained in one global minimum 1232 01:16:56,520 --> 01:16:59,445 spanning tree. 1233 01:16:59,445 --> 01:17:01,750 OK, and now we're looking at an edge that 1234 01:17:01,750 --> 01:17:05,330 goes from some vertex u in one connected component 1235 01:17:05,330 --> 01:17:08,450 to some vertex v in a different connected component. 1236 01:17:08,450 --> 01:17:11,000 This is our edge e. 1237 01:17:11,000 --> 01:17:13,170 That's our setup. 1238 01:17:13,170 --> 01:17:14,947 Because the union-find data structure 1239 01:17:14,947 --> 01:17:16,530 maintains connected components, that's 1240 01:17:16,530 --> 01:17:18,570 another invariant to prove. 1241 01:17:18,570 --> 01:17:21,360 We're considering adding this edge, which connects two 1242 01:17:21,360 --> 01:17:23,490 different connected components. 1243 01:17:23,490 --> 01:17:29,850 So I want to use the greedy choice property with some S. 1244 01:17:29,850 --> 01:17:31,210 What should S be? 1245 01:17:51,710 --> 01:17:55,929 I want e to cross a cut, so what's a good cut? 1246 01:18:03,913 --> 01:18:04,911 Yeah? 1247 01:18:04,911 --> 01:18:06,286 AUDIENCE: The connected component 1248 01:18:06,286 --> 01:18:07,785 of u and then everything else. 1249 01:18:07,785 --> 01:18:09,160 ERIK DEMAINE: Connected component 1250 01:18:09,160 --> 01:18:10,770 of u and everything else? 1251 01:18:10,770 --> 01:18:11,440 AUDIENCE: Yeah. 1252 01:18:11,440 --> 01:18:12,939 ERIK DEMAINE: That would work, which 1253 01:18:12,939 --> 01:18:15,550 is also the opposite of the connected component containing 1254 01:18:15,550 --> 01:18:18,570 v. There are many choices that work. 
1255 01:18:18,570 --> 01:18:20,899 I could take basically this cut, which 1256 01:18:20,899 --> 01:18:22,940 is the connected component of u with everything 1257 01:18:22,940 --> 01:18:25,250 else versus the connected component of v. 1258 01:18:25,250 --> 01:18:28,340 I could take this cut, which is the connected component of u 1259 01:18:28,340 --> 01:18:30,690 only versus everybody else. 1260 01:18:30,690 --> 01:18:32,730 Either of those will work. 1261 01:18:32,730 --> 01:18:33,230 Good. 1262 01:18:36,370 --> 01:18:40,030 Good curve, all right. 1263 01:18:40,030 --> 01:18:44,470 So let's say S equals the connected component of u, 1264 01:18:44,470 --> 01:18:48,820 or connected component of v. e crosses that, all right? 1265 01:18:48,820 --> 01:18:51,870 Because it goes from u to v, and u is on one side, 1266 01:18:51,870 --> 01:18:55,110 v is on the other side. 1267 01:18:55,110 --> 01:18:57,500 I wanted to include an entire connected component 1268 01:18:57,500 --> 01:19:00,840 because when I apply the greedy choice property, 1269 01:19:00,840 --> 01:19:03,140 I modify T star, and I don't want 1270 01:19:03,140 --> 01:19:06,410 to modify, I don't want to delete any of these edges that 1271 01:19:06,410 --> 01:19:08,380 are already in my connected components, 1272 01:19:08,380 --> 01:19:10,330 that I've already put in there. 1273 01:19:10,330 --> 01:19:13,280 But if I choose my cut to just be this, 1274 01:19:13,280 --> 01:19:15,950 I know that the edge that I potentially remove 1275 01:19:15,950 --> 01:19:17,524 will cross this cut, which means it 1276 01:19:17,524 --> 01:19:19,440 goes between connected components, which means 1277 01:19:19,440 --> 01:19:22,060 I haven't added that yet to T. 1278 01:19:22,060 --> 01:19:25,290 So when I apply this greedy choice property, 1279 01:19:25,290 --> 01:19:30,470 I'm not deleting anything from T. Everything that 1280 01:19:30,470 --> 01:19:33,680 was in T is still in T star. 
1281 01:19:33,680 --> 01:19:43,730 So that tells me that T prime is contained in T star prime. 1282 01:19:43,730 --> 01:19:48,150 The new T star that I get when I apply the cut and paste 1283 01:19:48,150 --> 01:19:50,350 argument, I modify T star potentially 1284 01:19:50,350 --> 01:19:53,120 by removing one edge and putting e in. 1285 01:19:53,120 --> 01:19:54,600 And the edge that I remove was not 1286 01:19:54,600 --> 01:19:58,850 already in T, which means I preserve this part, 1287 01:19:58,850 --> 01:20:02,580 but I also get that my new edge e is 1288 01:20:02,580 --> 01:20:04,360 in the minimum spanning tree. 1289 01:20:04,360 --> 01:20:07,220 And so that's how you prove by induction that at all times 1290 01:20:07,220 --> 01:20:11,600 the edges that you've chosen so far are in T star. 1291 01:20:11,600 --> 01:20:14,270 Actually, to apply the greedy choice property, 1292 01:20:14,270 --> 01:20:17,710 I need not only that e is cut-- sorry, that e crosses the cut, 1293 01:20:17,710 --> 01:20:19,650 I also need that e is the minimum weight 1294 01:20:19,650 --> 01:20:21,470 edge crossing the cut. 1295 01:20:21,470 --> 01:20:26,590 That's a little more argument to prove. 1296 01:20:26,590 --> 01:20:29,642 The rough idea is that if you forget 1297 01:20:29,642 --> 01:20:31,350 about the edges we've already dealt with, 1298 01:20:31,350 --> 01:20:34,240 e is the globally minimum weight edge. 1299 01:20:34,240 --> 01:20:36,870 OK, but what about the edges we've already dealt with? 1300 01:20:36,870 --> 01:20:40,160 Some of them are in the tree. 1301 01:20:40,160 --> 01:20:43,411 The edges that are in these-- that are in T, those 1302 01:20:43,411 --> 01:20:44,660 obviously don't cross the cut. 1303 01:20:44,660 --> 01:20:46,200 That's how we designed the cut. 1304 01:20:46,200 --> 01:20:48,260 The cut was designed not to cross, 1305 01:20:48,260 --> 01:20:50,870 not to separate any of these connected components. 
1306 01:20:50,870 --> 01:20:54,730 So all the edges that we've added to T, those are OK. 1307 01:20:54,730 --> 01:20:59,650 They're not related to the edges that cross this cut. 1308 01:20:59,650 --> 01:21:02,590 But we may have already considered some lower weight 1309 01:21:02,590 --> 01:21:08,334 edges that we didn't add to T. If we didn't add an edge to T, 1310 01:21:08,334 --> 01:21:10,500 that means actually they were in the same set, which 1311 01:21:10,500 --> 01:21:17,000 means also those are-- I'm going to use my other color, blue. 1312 01:21:17,000 --> 01:21:19,540 Those are extra edges in here that 1313 01:21:19,540 --> 01:21:23,290 are inside a connected component, 1314 01:21:23,290 --> 01:21:25,050 have smaller weight than e, but they're 1315 01:21:25,050 --> 01:21:26,400 inside the connected component. 1316 01:21:26,400 --> 01:21:28,580 So again, they're not crossed. 1317 01:21:28,580 --> 01:21:31,640 So they don't cross the cut, rather. 1318 01:21:31,640 --> 01:21:34,240 So e is basically the first edge that we're 1319 01:21:34,240 --> 01:21:36,050 considering that crosses this cut, 1320 01:21:36,050 --> 01:21:39,240 because otherwise we would have added that other edge first. 1321 01:21:39,240 --> 01:21:42,910 So here, we have to do sort of the greedy argument again, 1322 01:21:42,910 --> 01:21:45,520 considering edges by weight and e 1323 01:21:45,520 --> 01:21:47,370 is going to be the first edge that 1324 01:21:47,370 --> 01:21:49,160 crosses this particular cut, which 1325 01:21:49,160 --> 01:21:52,090 is this connected component versus everyone else. 1326 01:21:52,090 --> 01:21:54,640 So e has to be the minimum weight edge crossing the cut, 1327 01:21:54,640 --> 01:21:56,290 so the greedy choice property applies. 1328 01:21:56,290 --> 01:21:59,700 So we can put e in the minimum spanning tree, 1329 01:21:59,700 --> 01:22:01,560 and this algorithm is correct. 1330 01:22:01,560 --> 01:22:02,060 OK? 
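The inductive step argued above can be written out compactly. This is only a sketch of the spoken argument: the symbols $T$, $T^*$, $T'$, $T^{*\prime}$, $S$, and $e = (u, v)$ follow the lecture, while $e'$ (the edge possibly swapped out of $T^*$) and $\mathrm{CC}(u)$ (the connected component of $u$ in the forest $T$) are names introduced here for illustration.

```latex
\begin{align*}
&\textbf{Invariant: } T \subseteq T^* \text{ for some minimum spanning tree } T^*. \\
&\textbf{Cut: } S = \mathrm{CC}(u), \quad e = (u, v), \quad u \in S,\ v \notin S. \\
&\textbf{Minimality: every edge considered before } e \text{ lies inside one component of } T, \\
&\qquad \text{so it does not cross } (S, V \setminus S); \text{ hence } e \text{ is a minimum-weight edge crossing the cut.} \\
&\textbf{Greedy choice: } T^{*\prime} = (T^* \setminus \{e'\}) \cup \{e\} \text{ is an MST, where } e' \text{ crosses } (S, V \setminus S). \\
&\textbf{No damage: } e' \text{ crosses the cut} \;\Rightarrow\; e' \notin T \;\Rightarrow\; T \subseteq T^{*\prime}. \\
&\textbf{Conclusion: } T' = T \cup \{e\} \subseteq T^{*\prime}.
\end{align*}
```

If $e$ is already in $T^*$, no swap is needed and $T^{*\prime} = T^*$.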
1331 01:22:02,060 --> 01:22:05,730 So we've used that lemma a zillion times by now. 1332 01:22:05,730 --> 01:22:09,430 That's minimum spanning tree in nearly linear time.
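For reference, the algorithm whose correctness was just argued (consider edges by increasing weight, and add an edge only when its endpoints lie in different connected components) might be sketched as follows. This is a minimal illustration, not the lecture's own code: the function name `kruskal`, the `(weight, u, v)` edge format, and the union-find helpers `find` and `union` are all assumptions made for this example.

```python
def kruskal(n, edges):
    """Sketch of Kruskal's algorithm.

    n     -- number of vertices, labeled 0..n-1 (assumed labeling)
    edges -- iterable of (weight, u, v) tuples (assumed format)
    Returns (total_weight, tree_edges), with tree_edges as (u, v, weight).
    """
    parent = list(range(n))  # union-find forest: parent pointer per vertex
    rank = [0] * n           # upper bound on the height of each root's tree

    def find(x):
        # Walk to the representative of x's component, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the components of a and b; report whether they were distinct.
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # same component: the edge would not cross any cut
        if rank[ra] < rank[rb]:
            ra, rb = rb, ra
        parent[rb] = ra   # attach the shallower tree under the deeper one
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    tree, total = [], 0
    for w, u, v in sorted(edges):  # consider edges by increasing weight
        if union(u, v):            # u and v were in different components
            tree.append((u, v, w))
            total += w
    return total, tree
```

On a small graph with edges `[(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)]` on 4 vertices, the edge of weight 3 is skipped because vertices 0 and 2 are already connected, which is exactly the "inside a connected component" case from the proof. The sort dominates the running time; the union-find operations are what make the rest nearly linear.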