1 00:00:01,040 --> 00:00:03,230 PROFESSOR: Directed acyclic graphs 2 00:00:03,230 --> 00:00:07,510 are a special class of graphs that really have and warrant 3 00:00:07,510 --> 00:00:09,225 a theory of their own. 4 00:00:09,225 --> 00:00:11,780 Of course, "directed acyclic graphs" is a lot of syllables, 5 00:00:11,780 --> 00:00:14,380 so they're called "DAGs" for short. 6 00:00:14,380 --> 00:00:14,880 OK. 7 00:00:14,880 --> 00:00:17,170 So here's why they come up all the time. 8 00:00:17,170 --> 00:00:20,220 Let's look at a diagram that may be familiar to you. 9 00:00:20,220 --> 00:00:21,860 This shows the prerequisite structure 10 00:00:21,860 --> 00:00:26,647 of required courses in the 6-3 program of MIT Electrical 11 00:00:26,647 --> 00:00:28,480 Engineering and Computer Science department. 12 00:00:28,480 --> 00:00:31,820 There are similar charts for the other sub-majors of EECS 13 00:00:31,820 --> 00:00:33,550 and in other departments as well. 14 00:00:33,550 --> 00:00:35,080 So what does it mean? 15 00:00:35,080 --> 00:00:39,540 Well, let's look at this vertex corresponding to the first term 16 00:00:39,540 --> 00:00:40,960 calculus class 18.01. 17 00:00:40,960 --> 00:00:43,610 And there's an edge that points to 6.042. 18 00:00:43,610 --> 00:00:46,150 And that's because, if you look at the catalog, 19 00:00:46,150 --> 00:00:50,905 6.042 lists 18.01 as a prerequisite. 20 00:00:50,905 --> 00:00:53,280 If you look at the algorithms-- introductory algorithms-- 21 00:00:53,280 --> 00:00:56,190 class 6.006, you'll find, if you look in the catalog, 22 00:00:56,190 --> 00:01:01,020 that it has listed two prerequisites, 6.042 and 6.01. 23 00:01:01,020 --> 00:01:03,240 And the fact that they're explicitly 24 00:01:03,240 --> 00:01:05,430 listed in the catalog as prerequisites 25 00:01:05,430 --> 00:01:09,128 is why there's an arrow from 6.01 to 6.006 26 00:01:09,128 --> 00:01:13,310 and from 6.042 to 6.006. 
27 00:01:13,310 --> 00:01:15,110 Now when you're planning on when you 28 00:01:15,110 --> 00:01:19,310 want to take 6.006, of course, you have to attend to not just 29 00:01:19,310 --> 00:01:23,440 the fact that you have to take 6.042 first and 6.01 first, 30 00:01:23,440 --> 00:01:25,150 but you've got to take the prerequisites 31 00:01:25,150 --> 00:01:26,358 of those prerequisites first. 32 00:01:26,358 --> 00:01:30,310 So you really have to take 18.01 before you can take 6.006. 33 00:01:30,310 --> 00:01:35,470 And you need to take 8.02 before you can take 6.006. 34 00:01:35,470 --> 00:01:37,122 There are corequisites here. 35 00:01:37,122 --> 00:01:38,580 Let's just ignore those and pretend 36 00:01:38,580 --> 00:01:40,650 that they were prerequisites, because they're 37 00:01:40,650 --> 00:01:43,940 another kind of arrow that needn't distract us. 38 00:01:43,940 --> 00:01:44,640 OK. 39 00:01:44,640 --> 00:01:48,530 So that's what this diagram is telling us. 40 00:01:48,530 --> 00:01:50,230 And this is a DAG. 41 00:01:50,230 --> 00:01:53,090 It's simply a bunch of vertices, the course 42 00:01:53,090 --> 00:01:58,980 labels in rectangular boxes, and directed arrows 43 00:01:58,980 --> 00:02:00,720 showing catalog listings. 44 00:02:00,720 --> 00:02:03,410 And what I said was that when you're planning your course 45 00:02:03,410 --> 00:02:06,600 work, you're really interested in the indirect prerequisites. 46 00:02:06,600 --> 00:02:10,340 So "one class, u, is an indirect prerequisite 47 00:02:10,340 --> 00:02:12,900 of another class, v" means that there's 48 00:02:12,900 --> 00:02:15,730 a sequence of prerequisites starting from u 49 00:02:15,730 --> 00:02:19,140 and going to v. It means that you really have to have taken u 50 00:02:19,140 --> 00:02:25,030 some time before you took v. And that's a crucial fact and thing 51 00:02:25,030 --> 00:02:28,000 that you need to take account of when you're planning a course 52 00:02:28,000 --> 00:02:29,350 schedule. 
53 00:02:29,350 --> 00:02:32,360 So in terms of graph digraph language, 54 00:02:32,360 --> 00:02:34,060 "u is an indirect prerequisite of v" 55 00:02:34,060 --> 00:02:38,100 just means that there's a positive length walk that goes 56 00:02:38,100 --> 00:02:40,980 from u to v in the digraph. 57 00:02:40,980 --> 00:02:43,200 In this case, we're talking about the 6-3 digraph 58 00:02:43,200 --> 00:02:44,650 of prerequisites. 59 00:02:44,650 --> 00:02:50,190 So "there's a positive length walk from 18.01 to 6.006" 60 00:02:50,190 --> 00:02:52,730 means that you really have to have taken 18.01 61 00:02:52,730 --> 00:02:55,742 before you take 6.006. 62 00:02:55,742 --> 00:02:57,200 And of course, we're talking, then, 63 00:02:57,200 --> 00:03:01,460 about the positive length walk relation D plus of the digraph 64 00:03:01,460 --> 00:03:05,410 D. If D is the digraph shown in the prerequisite chart-- 65 00:03:05,410 --> 00:03:08,250 direct prerequisite chart-- then we're interested in D plus. 66 00:03:08,250 --> 00:03:11,260 And u D plus v just means there's a positive length 67 00:03:11,260 --> 00:03:16,430 walk-- that's what the plus is for-- going from u to v. 68 00:03:16,430 --> 00:03:20,480 Now what happens if you have a closed walk? 69 00:03:20,480 --> 00:03:24,320 Well, a closed walk is a walk that starts and ends 70 00:03:24,320 --> 00:03:27,630 at the same vertex. 71 00:03:27,630 --> 00:03:29,400 And we can ask this question-- suppose 72 00:03:29,400 --> 00:03:32,090 there was a closed walk that started at 6.042 73 00:03:32,090 --> 00:03:33,320 and ended at 6.042. 74 00:03:33,320 --> 00:03:35,610 How long does it take to graduate then? 75 00:03:35,610 --> 00:03:37,170 Well, it takes a long time. 76 00:03:37,170 --> 00:03:41,350 Because you can't take 6.042 until you've taken 6.042 77 00:03:41,350 --> 00:03:43,830 and you're never going to be able to take it. 78 00:03:43,830 --> 00:03:44,760 That's a bad thing. 
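To make the relation D plus concrete, here is a minimal Python sketch that computes every pair (u, v) joined by a positive length walk. The function name `positive_walk_relation` and the edge list are illustrative assumptions: the list contains only the three catalog arrows named in the lecture, not the full 6-3 chart.

```python
def positive_walk_relation(edges):
    """Return all pairs (u, v) such that a walk of length >= 1 goes from u to v."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
        successors.setdefault(v, set())

    relation = set()
    for start in successors:
        # Begin from the direct successors, so 'start' itself is only
        # recorded if some positive length walk leads back to it.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            relation.add((start, node))
            stack.extend(successors[node])
    return relation


# Hypothetical fragment of the chart: only the arrows mentioned in the lecture.
prereqs = [("18.01", "6.042"), ("6.042", "6.006"), ("6.01", "6.006")]
d_plus = positive_walk_relation(prereqs)
print(("18.01", "6.006") in d_plus)  # 18.01 is an indirect prerequisite of 6.006
```

A pair of the form (v, v) shows up in the result exactly when there is a positive length closed walk through v, which is the situation the prerequisite chart must avoid.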
79 00:03:44,760 --> 00:03:48,940 We definitely don't want the prerequisite structure 80 00:03:48,940 --> 00:03:52,080 of courses in a department to have a closed 81 00:03:52,080 --> 00:03:54,646 walk of positive length. 82 00:03:54,646 --> 00:03:56,520 And in fact, there's a faculty committee that 83 00:03:56,520 --> 00:03:58,860 checks for this kind of thing. 84 00:03:58,860 --> 00:04:00,350 Bugs like this occasionally creep 85 00:04:00,350 --> 00:04:03,110 in when some busy curricular office of a department 86 00:04:03,110 --> 00:04:06,440 is planning a complicated program with dozens, if not 100 87 00:04:06,440 --> 00:04:07,630 courses. 88 00:04:07,630 --> 00:04:10,130 And the Committee on Curricula's job 89 00:04:10,130 --> 00:04:11,852 is to check for that kind of thing. 90 00:04:11,852 --> 00:04:13,310 There's a whole staff that does it. 91 00:04:13,310 --> 00:04:15,018 I used to be the chair of that committee. 92 00:04:15,018 --> 00:04:20,010 And we did spend a lot of time with proposals from departments 93 00:04:20,010 --> 00:04:24,040 and making sure that those proposed course requirements 94 00:04:24,040 --> 00:04:25,990 satisfied faculty rules. 95 00:04:25,990 --> 00:04:26,930 OK. 96 00:04:26,930 --> 00:04:31,200 So a special case of a closed walk is a cycle. 97 00:04:31,200 --> 00:04:36,090 A cycle is a walk whose only repeat vertex is its start 98 00:04:36,090 --> 00:04:37,052 and end. 99 00:04:39,930 --> 00:04:43,240 Let me remark, because we keep talking about positive length 100 00:04:43,240 --> 00:04:47,145 cycles, that a single vertex all by itself is a length-0 cycle. 101 00:04:47,145 --> 00:04:49,770 So you're never going to be able to get rid of length-0 cycles, 102 00:04:49,770 --> 00:04:52,040 because they're the same as vertices. 103 00:04:52,040 --> 00:04:55,630 But positive length cycles, you can hope to ensure 104 00:04:55,630 --> 00:04:57,370 are not there. 
105 00:04:57,370 --> 00:05:00,490 So if you're going to represent a cycle as a path, 106 00:05:00,490 --> 00:05:04,970 you'd show the sequence of vertices and edges, v0, v1, v2, 107 00:05:04,970 --> 00:05:08,840 where the understanding is that all of the vertices from v0 up 108 00:05:08,840 --> 00:05:10,430 to v n minus 1 are different-- that's 109 00:05:10,430 --> 00:05:13,920 what makes it a cycle-- except that the last vertex, v0, is 110 00:05:13,920 --> 00:05:15,200 a repeat of the first one. 111 00:05:15,200 --> 00:05:17,420 That's the one repeat that's allowed in a cycle. 112 00:05:17,420 --> 00:05:20,470 So it's natural to draw it in a circle 113 00:05:20,470 --> 00:05:23,610 like this where you start at v0, you follow the edges around 114 00:05:23,610 --> 00:05:27,270 from v i to v i plus 1 all the way back around to v0. 115 00:05:27,270 --> 00:05:30,690 And that's kind of what a cycle is going to look like. 116 00:05:30,690 --> 00:05:34,730 So we have a very straightforward lemma 117 00:05:34,730 --> 00:05:37,200 about cycles and closed walks, namely 118 00:05:37,200 --> 00:05:40,320 that the shortest positive length closed 119 00:05:40,320 --> 00:05:43,180 walk from a vertex to itself-- "it's closed" 120 00:05:43,180 --> 00:05:46,200 means it starts and ends at v-- is a positive length 121 00:05:46,200 --> 00:05:50,040 cycle starting and ending at v. And the reasoning and proof is 122 00:05:50,040 --> 00:05:52,240 basically the same proof that said that the shortest 123 00:05:52,240 --> 00:05:54,540 walk between one place and another 124 00:05:54,540 --> 00:05:57,650 is a path from one place to the other. 125 00:05:57,650 --> 00:06:01,460 The logic is that if I have a closed walk from v to v 126 00:06:01,460 --> 00:06:04,080 and there was a repeat in it other than at v, 127 00:06:04,080 --> 00:06:07,770 I could clip out the piece of the walk between the repeat 128 00:06:07,770 --> 00:06:09,930 occurrences and I'd get a shorter walk. 
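The clipping step in that argument can be sketched in Python. This is only an illustration under stated assumptions: a walk is represented as a list of vertices whose first and last entries are equal, and `clip_to_cycle` is a hypothetical helper name. It returns some positive length cycle obtained by clipping, not necessarily the shortest one in the graph.

```python
def clip_to_cycle(walk):
    """Clip repeats out of a positive length closed walk until it is a cycle.

    'walk' is a list of vertices with walk[0] == walk[-1]. Each consecutive
    pair in the result is also a consecutive pair in the input, so the
    result only uses edges that the original walk used.
    """
    if len(walk) < 2 or walk[0] != walk[-1]:
        raise ValueError("expected a positive length closed walk")

    result = []      # vertices kept so far, all distinct
    position = {}    # vertex -> its index in 'result'
    for vertex in walk[:-1]:
        if vertex in position:
            # Repeat found: drop the piece of the walk between the
            # two occurrences, keeping the first occurrence.
            cut = position[vertex]
            for dropped in result[cut + 1:]:
                del position[dropped]
            del result[cut + 1:]
        else:
            position[vertex] = len(result)
            result.append(vertex)
    result.append(walk[0])  # close the cycle at the start vertex
    return result


print(clip_to_cycle(["v", "a", "b", "a", "v"]))  # the detour a, b, a is clipped
```

Because every clip strictly shortens the walk, a closed walk that is already the shortest possible cannot contain any repeat to clip, which is the content of the lemma.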
129 00:06:09,930 --> 00:06:13,020 So the shortest closed walk can't have any repeats. 130 00:06:13,020 --> 00:06:16,990 It's got to be a positive length cycle. 131 00:06:16,990 --> 00:06:21,130 So a directed acyclic graph now is defined simply 132 00:06:21,130 --> 00:06:24,190 as a digraph that has no positive length cycles. 133 00:06:24,190 --> 00:06:26,790 It's acyclic, no positive length cycles. 134 00:06:26,790 --> 00:06:29,030 And of course, we can equally well 135 00:06:29,030 --> 00:06:32,520 define it, since cycles are a special case 136 00:06:32,520 --> 00:06:35,070 of closed walks and closed walks of positive length 137 00:06:35,070 --> 00:06:41,100 imply cycles, as a digraph that has no positive length closed 138 00:06:41,100 --> 00:06:41,600 walk. 139 00:06:44,250 --> 00:06:46,420 Some examples of DAGs that come up-- well, 140 00:06:46,420 --> 00:06:49,630 the prerequisite graph is going to be one. 141 00:06:49,630 --> 00:06:54,365 And in general, any kind of set of constraints on tasks, 142 00:06:54,365 --> 00:06:56,490 which ones you have to do before you do other ones, 143 00:06:56,490 --> 00:07:01,267 is going to be defining a DAG structure. 144 00:07:01,267 --> 00:07:02,850 One that you might not have thought of 145 00:07:02,850 --> 00:07:05,180 is, the successor function defines 146 00:07:05,180 --> 00:07:10,020 a relation on the integers, say, going from n to n plus 1. 147 00:07:10,020 --> 00:07:12,530 So I'm going to have an arrow that goes directly 148 00:07:12,530 --> 00:07:14,180 from n to n plus 1. 149 00:07:14,180 --> 00:07:18,200 And what's the walk relation then, the positive length walk 150 00:07:18,200 --> 00:07:19,800 relation, in this graph? 151 00:07:19,800 --> 00:07:22,430 Well, there's a positive length walk from n 152 00:07:22,430 --> 00:07:26,530 to m precisely when n is less than m. 153 00:07:26,530 --> 00:07:30,470 So the successor DAG, its paths represent 154 00:07:30,470 --> 00:07:32,090 the less than relation. 
155 00:07:32,090 --> 00:07:35,120 And of course, less than, it doesn't have any cycles. 156 00:07:35,120 --> 00:07:37,070 Because if a is less than b, you're 157 00:07:37,070 --> 00:07:40,420 never going to get around from b back to something 158 00:07:40,420 --> 00:07:42,960 that's less than it, like back to a. 159 00:07:42,960 --> 00:07:47,220 So there can't be any cycles in the successor DAG. 160 00:07:47,220 --> 00:07:48,480 And that's why it is a DAG. 161 00:07:48,480 --> 00:07:52,290 Another similar one is the proper subset relation 162 00:07:52,290 --> 00:07:52,990 between sets. 163 00:07:52,990 --> 00:07:55,400 So here I'm going to draw an arrow from this set 164 00:07:55,400 --> 00:07:57,694 to that set if this set is contained in that set 165 00:07:57,694 --> 00:07:58,610 but they're not equal. 166 00:07:58,610 --> 00:08:01,916 So {a, b} is a subset of {a, b, d}, but {a, b, 167 00:08:01,916 --> 00:08:03,780 d} has this extra element, d. 168 00:08:03,780 --> 00:08:07,299 So the left-hand set is a proper subset of the right-hand set. 169 00:08:07,299 --> 00:08:08,840 And I'm going to draw an arrow there. 170 00:08:08,840 --> 00:08:10,620 And by the same reasoning, there can't 171 00:08:10,620 --> 00:08:14,932 be any cycles in this graph-- a positive length cycle-- 172 00:08:14,932 --> 00:08:16,390 because if there was, it would mean 173 00:08:16,390 --> 00:08:21,070 that the set had to be a proper subset of itself, which 174 00:08:21,070 --> 00:08:21,740 doesn't happen. 175 00:08:21,740 --> 00:08:24,540 So this would be another basic example of a DAG. 176 00:08:24,540 --> 00:08:26,610 And I hope you begin to see, from these examples, 177 00:08:26,610 --> 00:08:31,580 why DAGs are really all-pervasive 178 00:08:31,580 --> 00:08:34,980 in mathematics and in other areas 179 00:08:34,980 --> 00:08:38,030 and why they merit attention. 
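As a rough check of the definition, the sketch below tests whether a finite digraph has any positive length cycle. It uses Kahn's algorithm, a standard technique that the lecture does not mention: keep removing vertices that have no incoming edges, and the digraph is a DAG exactly when every vertex eventually gets removed. The name `is_dag` and the small finite stand-ins for the successor and proper subset relations are assumptions made for illustration.

```python
from collections import deque


def is_dag(vertices, edges):
    """Return True if the digraph has no positive length cycle."""
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Vertices with no incoming edges cannot lie on any cycle.
    ready = deque(v for v in indegree if indegree[v] == 0)
    removed = 0
    while ready:
        u = ready.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    # Whatever is left over must sit on, or downstream of, a cycle.
    return removed == len(indegree)


# Successor relation restricted to 0..4: an arrow from n to n + 1.
numbers = list(range(5))
successor_edges = [(n, n + 1) for n in range(4)]

# Proper subset relation on a small family of sets ('<' on frozensets).
family = [frozenset("ab"), frozenset("abd"), frozenset("a"), frozenset("abcd")]
subset_edges = [(s, t) for s in family for t in family if s < t]

print(is_dag(numbers, successor_edges))      # True
print(is_dag(family, subset_edges))          # True
print(is_dag(["x", "y"], [("x", "y"), ("y", "x")]))  # False
```

Both example relations come out acyclic, matching the argument that nothing is less than itself and no set is a proper subset of itself.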
180 00:08:38,030 --> 00:08:40,059 So when we're looking at a DAG though, 181 00:08:40,059 --> 00:08:42,980 we're basically usually interested in just the walk 182 00:08:42,980 --> 00:08:44,730 relation of the DAG. 183 00:08:44,730 --> 00:08:48,690 So if we're only interested in the walk relation of the DAG, 184 00:08:48,690 --> 00:08:51,504 then it would be typically the case 185 00:08:51,504 --> 00:08:52,920 that many different DAGs are going 186 00:08:52,920 --> 00:08:54,260 to have the same walk relation. 187 00:08:54,260 --> 00:08:56,640 And it's natural to ask, what's the most economical one. 188 00:08:56,640 --> 00:08:59,540 Is there a minimum, say, DAG that 189 00:08:59,540 --> 00:09:01,160 defines a given walk relation? 190 00:09:01,160 --> 00:09:02,620 So let's look at this example. 191 00:09:02,620 --> 00:09:04,400 Here's a simple DAG. 192 00:09:04,400 --> 00:09:07,490 And you can check that there are no cycles in it. 193 00:09:07,490 --> 00:09:10,150 What's the smallest DAG with the same relation as this one? 194 00:09:10,150 --> 00:09:11,780 And the way I can get it is by going 195 00:09:11,780 --> 00:09:13,531 through the edges one at a time and asking 196 00:09:13,531 --> 00:09:14,905 whether I can get rid of the edge 197 00:09:14,905 --> 00:09:16,590 because it's not contributing anything. 198 00:09:16,590 --> 00:09:17,210 So look here. 199 00:09:17,210 --> 00:09:21,590 There's a path from a to e that goes through b. 200 00:09:21,590 --> 00:09:25,410 Well, that tells me that having this direct edge from a to e 201 00:09:25,410 --> 00:09:29,250 is not contributing anything in terms of connectedness. 202 00:09:29,250 --> 00:09:30,810 And that means I could get rid of it 203 00:09:30,810 --> 00:09:34,050 and I'm still going to wind up with the same possibility 204 00:09:34,050 --> 00:09:35,770 of walking from one place to another. 
205 00:09:35,770 --> 00:09:37,940 Because I can always walk from a to e 206 00:09:37,940 --> 00:09:40,470 going through b instead of going directly from a to e. 207 00:09:40,470 --> 00:09:42,270 I didn't need that edge. 208 00:09:42,270 --> 00:09:45,110 Another example is, here's a walk from a 209 00:09:45,110 --> 00:09:46,830 to d that goes through c. 210 00:09:46,830 --> 00:09:49,420 There's no need for me to walk directly from a to d. 211 00:09:49,420 --> 00:09:51,820 As long as I'm walking, I can take the longer walk 212 00:09:51,820 --> 00:09:56,070 and get rid of the short circuit from a to d. 213 00:09:56,070 --> 00:10:00,330 Likewise, if I look at this path from c to d to f, 214 00:10:00,330 --> 00:10:02,940 I don't need that edge from c to f. 215 00:10:02,940 --> 00:10:04,740 And as a matter of fact, now if I 216 00:10:04,740 --> 00:10:08,140 look at this length 3 path from a to c to d to f, 217 00:10:08,140 --> 00:10:10,859 there's no need for me-- in order to get from a to f, 218 00:10:10,859 --> 00:10:12,900 there's no need for me-- to take the direct edge. 219 00:10:12,900 --> 00:10:14,010 I can get rid of that too. 220 00:10:14,010 --> 00:10:16,010 It's kind of a redundant extra edge. 221 00:10:16,010 --> 00:10:20,250 Finally, if I look at the path from b to d to f, 222 00:10:20,250 --> 00:10:23,470 I can get rid of the direct edge from b to f. 223 00:10:23,470 --> 00:10:26,690 And at this point, I'm done. 224 00:10:26,690 --> 00:10:32,660 I'm left with a set of edges called covering edges, which 225 00:10:32,660 --> 00:10:39,720 have the property that the only way to get from one vertex 226 00:10:39,720 --> 00:10:43,650 to another is going to have to be to use a covering 227 00:10:43,650 --> 00:10:45,340 edge to the target vertex. 228 00:10:45,340 --> 00:10:48,830 Or more precisely, the only way to get, say, from a to b 229 00:10:48,830 --> 00:10:51,390 is going to be to use that covering edge. 
230 00:10:51,390 --> 00:10:54,400 If there was any other path that went from a elsewhere 231 00:10:54,400 --> 00:10:57,020 and got back to b without using this edge, 232 00:10:57,020 --> 00:10:59,000 then it wouldn't be a covering edge anymore. 233 00:10:59,000 --> 00:11:00,416 The fact that it's a covering edge 234 00:11:00,416 --> 00:11:02,580 means that if you broke it, there's no way 235 00:11:02,580 --> 00:11:04,180 anymore to get from a to b. 236 00:11:04,180 --> 00:11:06,180 So that's the definition of covering edges 237 00:11:06,180 --> 00:11:09,190 and you'll do a class problem about them, more precisely, 238 00:11:09,190 --> 00:11:09,690 in a minute. 239 00:11:09,690 --> 00:11:13,480 So the other edges are unneeded to define the walk relation. 240 00:11:13,480 --> 00:11:16,880 And all we need to keep are the covering relations 241 00:11:16,880 --> 00:11:20,220 to get the minimum representation of the walk 242 00:11:20,220 --> 00:11:22,870 relation in terms of a DAG.
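The edge-by-edge pruning described above can be sketched in Python. Two assumptions are worth flagging: the edge list is reconstructed only from the edges named in the narration, so the actual slide may contain others, and the helper `covering_edges` is a hypothetical name that assumes its input really is a DAG.

```python
def covering_edges(edges):
    """Return the covering edges of a DAG, sorted.

    An edge (u, v) is kept exactly when there is no other way to get
    from u to v, that is, no path from u to v that avoids this edge.
    Assumes the input digraph has no positive length cycle.
    """
    edges = set(edges)
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
        successors.setdefault(v, set())

    def reaches_without(u, v):
        """True if v is reachable from u without using the edge (u, v)."""
        # Leave u through any successor other than v; in a DAG the
        # search can never come back to u and reuse the edge (u, v).
        stack = [w for w in successors[u] if w != v]
        seen = set()
        while stack:
            node = stack.pop()
            if node == v:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors[node])
        return False

    return sorted(e for e in edges if not reaches_without(*e))


# Edges reconstructed from the narration (an assumption about the slide).
example = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"),
    ("b", "d"), ("b", "e"), ("b", "f"),
    ("c", "d"), ("c", "f"),
    ("d", "f"),
]
print(covering_edges(example))
```

On this example the five direct edges a to e, a to d, c to f, a to f, and b to f are discarded, and the six remaining edges still give the same positive length walk relation.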