1 00:00:00,090 --> 00:00:01,800 The following content is provided 2 00:00:01,800 --> 00:00:04,040 under a Creative Commons license. 3 00:00:04,040 --> 00:00:06,880 Your support will help MIT OpenCourseWare continue 4 00:00:06,880 --> 00:00:10,740 to offer high quality educational resources for free. 5 00:00:10,740 --> 00:00:13,350 To make a donation or view additional materials 6 00:00:13,350 --> 00:00:15,800 from hundreds of MIT courses, visit 7 00:00:15,800 --> 00:00:21,994 MIT OpenCourseWare at ocw.mit.edu 8 00:00:21,994 --> 00:00:24,850 PROFESSOR: All right, let's get started. 9 00:00:24,850 --> 00:00:27,810 We return today to graph search. 10 00:00:27,810 --> 00:00:29,950 Last time we saw breadth-first search, today we're 11 00:00:29,950 --> 00:00:31,672 going to do depth-first search. 12 00:00:31,672 --> 00:00:34,130 It's a simple algorithm, but you can do lots of cool things 13 00:00:34,130 --> 00:00:34,570 with it. 14 00:00:34,570 --> 00:00:36,236 And that's what I'll spend most of today 15 00:00:36,236 --> 00:00:39,680 on, in particular, telling whether your graph has a cycle, 16 00:00:39,680 --> 00:00:42,680 and something called topological sort. 17 00:00:42,680 --> 00:00:45,970 As usual, basically in all graph algorithms 18 00:00:45,970 --> 00:00:48,790 in this class, the input, the way the graph is specified 19 00:00:48,790 --> 00:00:52,840 is as an adjacency list, or I guess adjacency list plural. 20 00:00:52,840 --> 00:00:56,460 So you have a bunch of lists, each one says for each vertex, 21 00:00:56,460 --> 00:00:58,550 what are the vertices I'm connected to? 22 00:00:58,550 --> 00:01:02,900 What are the vertices I can get to in one step via an edge? 23 00:01:02,900 --> 00:01:05,040 So that's our input and our goal, 24 00:01:05,040 --> 00:01:08,630 in general, with graph search is to explore the graph. 
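The adjacency-list input described above can be sketched in Python as a dictionary of lists. This is only an illustration: the vertex names, the edges, and the helper `out_neighbors` are made up here and do not come from the lecture.

```python
# Adjacency lists as a dict: each vertex maps to the list of vertices
# reachable from it in one step via an edge.
# Hypothetical three-vertex directed graph, for illustration only.
Adj = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
}


def out_neighbors(adj, u):
    """Return the vertices reachable from u via a single edge."""
    return adj[u]


num_vertices = len(Adj)
# Each directed edge appears exactly once, in the list of its tail vertex.
num_edges = sum(len(neighbors) for neighbors in Adj.values())
```

Storing the graph this way takes space proportional to the number of vertices plus the number of edges, which is the input size the running-time analysis later in the lecture is measured against.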
25 00:01:08,630 --> 00:01:10,390 In particular, the kind of exploration 26 00:01:10,390 --> 00:01:13,430 we're going to be doing today is to visit all the vertices, 27 00:01:13,430 --> 00:01:17,416 in some order, and visit each vertex only once. 28 00:01:17,416 --> 00:01:19,040 So the way we did breadth-first search, 29 00:01:19,040 --> 00:01:20,581 breadth-first search was really good. 30 00:01:20,581 --> 00:01:22,830 It explored things layer by layer, 31 00:01:22,830 --> 00:01:25,340 and that was nice because it gave us shortest paths, 32 00:01:25,340 --> 00:01:28,830 it gave us the fastest way to get to everywhere, 33 00:01:28,830 --> 00:01:31,490 from a particular source, vertex s. 34 00:01:31,490 --> 00:01:34,210 But if you can't get from s to your vertex, 35 00:01:34,210 --> 00:01:37,410 then the shortest way to get there is infinity, 36 00:01:37,410 --> 00:01:39,440 there's no way to get there. 37 00:01:39,440 --> 00:01:41,790 And BFS is good for detecting that, it can tell you 38 00:01:41,790 --> 00:01:46,490 which vertices are unreachable from s. 39 00:01:46,490 --> 00:01:50,130 DFS can do that as well, but it's often 40 00:01:50,130 --> 00:01:52,390 used to explore the whole graph, not just 41 00:01:52,390 --> 00:01:54,134 the part reachable from s, and so 42 00:01:54,134 --> 00:01:55,800 we're going to see how to do that today. 43 00:01:55,800 --> 00:01:58,580 This trick could be used for BFS or for DFS, 44 00:01:58,580 --> 00:02:01,590 but we're going to do it here for DFS, because that's 45 00:02:01,590 --> 00:02:02,840 more common, let's say. 46 00:02:07,080 --> 00:02:09,014 So DFS. 47 00:02:21,110 --> 00:02:24,930 So depth-first search is kind of like how you solve a maze.
48 00:02:24,930 --> 00:02:27,350 Like, the other weekend I was at the big corn 49 00:02:27,350 --> 00:02:32,050 maze in central Massachusetts, and it's 50 00:02:32,050 --> 00:02:34,584 easy to get lost in there, in particular, 51 00:02:34,584 --> 00:02:36,250 because I didn't bring any bread crumbs. 52 00:02:36,250 --> 00:02:39,160 The proper way to solve a maze, if you're in there 53 00:02:39,160 --> 00:02:41,950 and all you can do is see which way to go next and then walk 54 00:02:41,950 --> 00:02:43,730 a little bit to the next junction, 55 00:02:43,730 --> 00:02:45,970 and then you have to keep making decisions. 56 00:02:45,970 --> 00:02:49,490 Unless you have a really good memory, which I do not, 57 00:02:49,490 --> 00:02:53,780 teaching staff can attest to that, then an easy way to do it 58 00:02:53,780 --> 00:02:55,720 is to leave bread crumbs behind, say, 59 00:02:55,720 --> 00:02:58,710 this is the last way I went from this node, 60 00:02:58,710 --> 00:03:00,974 so that when I reach a dead end, I 61 00:03:00,974 --> 00:03:02,390 have to turn around and backtrack. 62 00:03:02,390 --> 00:03:04,450 I reach a breadcrumb that says, oh, last time you 63 00:03:04,450 --> 00:03:07,160 went this way, next time you should go this way, 64 00:03:07,160 --> 00:03:10,570 and in particular, keep track at each node, which of the edges 65 00:03:10,570 --> 00:03:14,890 have I visited, which ones are still left to visit. 66 00:03:14,890 --> 00:03:18,910 And this can be done very easily on a computer using recursion. 67 00:03:30,520 --> 00:03:32,140 So high-level description is we're 68 00:03:32,140 --> 00:03:37,400 going to just recursively explore the graph, 69 00:03:37,400 --> 00:03:42,495 backtracking as necessary, kind of like how you solve a maze. 70 00:03:54,980 --> 00:03:59,210 In fact, when I was seven years old, 71 00:03:59,210 --> 00:04:00,960 one of the first computer programs I wrote 72 00:04:00,960 --> 00:04:01,918 was for solving a maze.
73 00:04:01,918 --> 00:04:04,140 I didn't know it was depth-first search at the time, 74 00:04:04,140 --> 00:04:04,810 but now I know. 75 00:04:11,050 --> 00:04:12,690 It was so much harder doing algorithms 76 00:04:12,690 --> 00:04:15,540 when I didn't know what they were. 77 00:04:15,540 --> 00:04:20,779 Anyway, I'm going to write some code for depth-first search, 78 00:04:20,779 --> 00:04:26,900 it is super simple code, the simplest graph algorithm. 79 00:04:49,175 --> 00:04:50,255 It's four lines. 80 00:05:05,590 --> 00:05:06,090 That's it. 81 00:05:06,090 --> 00:05:08,280 I'm going to write a little bit of code after this, 82 00:05:08,280 --> 00:05:11,500 but this is basic depth-first search. 83 00:05:11,500 --> 00:05:13,580 This will visit all the vertices reachable 84 00:05:13,580 --> 00:05:16,480 from a given source, vertex s. 85 00:05:16,480 --> 00:05:19,780 So we're given the adjacency list. 86 00:05:19,780 --> 00:05:22,130 I don't know why I put v here, you could erase it, 87 00:05:22,130 --> 00:05:24,380 it's not necessary. 88 00:05:24,380 --> 00:05:29,030 And all we do is, we have our vertex v, sorry, 89 00:05:29,030 --> 00:05:31,300 we have our vertex s. 90 00:05:31,300 --> 00:05:35,930 We look at all of the outgoing edges from s. 91 00:05:35,930 --> 00:05:40,950 For each one, we'll call it v, we check, 92 00:05:40,950 --> 00:05:42,770 have I visited this vertex already? 93 00:05:45,422 --> 00:05:46,880 A place where we need to be careful 94 00:05:46,880 --> 00:05:49,160 is to not repeat vertices. 95 00:05:49,160 --> 00:05:50,980 We need to do this in BFS as well. 96 00:05:56,110 --> 00:05:58,430 So, the way we're going to do that 97 00:05:58,430 --> 00:06:00,450 is by setting the parent of a node, 98 00:06:00,450 --> 00:06:03,210 we'll see what that actually means later. 99 00:06:03,210 --> 00:06:05,940 But for now, it's just, are you in the parent structure or not?
100 00:06:05,940 --> 00:06:09,600 This is initially, we've seen s, so we give it 101 00:06:09,600 --> 00:06:14,250 a parent of nothing, but it exists in this dictionary. 102 00:06:14,250 --> 00:06:16,830 If the vertex v that we're looking at 103 00:06:16,830 --> 00:06:19,300 is not in our dictionary, we haven't seen it yet, 104 00:06:19,300 --> 00:06:23,190 we mark it as seen by setting its parent to s, 105 00:06:23,190 --> 00:06:25,060 and then we recursively visit it. 106 00:06:25,060 --> 00:06:26,310 That's it. 107 00:06:26,310 --> 00:06:29,120 Super simple, just recurse. 108 00:06:29,120 --> 00:06:32,130 Sort of the magical part is the preventing yourself 109 00:06:32,130 --> 00:06:34,070 from repeating. 110 00:06:34,070 --> 00:06:37,250 As you explore the graph, if you reach something 111 00:06:37,250 --> 00:06:39,950 you've already seen before you just skip it again. 112 00:06:39,950 --> 00:06:45,010 So you only visit every vertex once, at most once. 113 00:06:45,010 --> 00:06:47,260 This will not visit the entire graph, 114 00:06:47,260 --> 00:06:50,840 it will only visit the vertices reachable from s. 115 00:06:50,840 --> 00:06:52,940 The next part of the code I'd like to give you 116 00:06:52,940 --> 00:06:56,920 is for visiting all the vertices, and in the textbook 117 00:06:56,920 --> 00:06:58,820 this is called the DFS, whereas this is just 118 00:06:58,820 --> 00:07:02,180 called DFS visit, that's sort of the recursive part, 119 00:07:02,180 --> 00:07:08,330 and this is sort of a top level algorithm. 120 00:07:08,330 --> 00:07:19,840 Here we are going to use the set of vertices, V, 121 00:07:19,840 --> 00:07:22,040 and here we're just going to iterate over the s's. 122 00:07:47,960 --> 00:07:51,150 So it looks almost the same, but what we're iterating over 123 00:07:51,150 --> 00:07:52,200 is different.
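The recursive DFS-visit routine described above can be sketched roughly as follows. The board code is not in the transcript, so this is a reconstruction from the spoken description; the name `dfs_visit`, the argument order, and the small test graph are assumptions made for illustration.

```python
def dfs_visit(adj, s, parent):
    """Visit every vertex reachable from s that is not yet in parent.

    parent doubles as the "seen" structure: a vertex has been seen
    exactly when it is a key of the dictionary.
    """
    for v in adj[s]:            # every outgoing edge (s, v)
        if v not in parent:     # v has not been seen yet
            parent[v] = s       # mark it as seen, remembering how we got there
            dfs_visit(adj, v, parent)


# Hypothetical graph: z has an edge into s, but nothing leads to z.
adj = {"s": ["x", "y"], "x": ["y"], "y": ["s"], "z": ["s"]}
parent = {"s": None}            # the source is seen, with a parent of nothing
dfs_visit(adj, "s", parent)
```

After the call, `parent` holds `s`, `x`, and `y` but not `z`, which matches the remark that this routine only visits what is reachable from the source. One practical caveat that is not from the lecture: Python limits recursion depth, so a very long path would need a raised limit or an explicit stack.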
124 00:07:52,200 --> 00:07:55,720 Here we're iterating over the outgoing edges from s, 125 00:07:55,720 --> 00:07:57,855 here we're iterating over the choices of s. 126 00:08:03,190 --> 00:08:05,239 So the idea here is we don't really 127 00:08:05,239 --> 00:08:06,530 know where to start our search. 128 00:08:06,530 --> 00:08:09,154 If it's a disconnected graph or not a strongly connected graph, 129 00:08:09,154 --> 00:08:12,330 we might have to start our search multiple times. 130 00:08:12,330 --> 00:08:15,520 This DFS algorithm is finding all the possible places 131 00:08:15,520 --> 00:08:19,290 you might start the search and trying them all. 132 00:08:19,290 --> 00:08:21,320 So it's like, OK, let's try the first vertex. 133 00:08:21,320 --> 00:08:23,778 If that hasn't been visited, which initially nothing's been 134 00:08:23,778 --> 00:08:27,380 visited, then visit it, recursively, everything 135 00:08:27,380 --> 00:08:29,010 reachable from s. 136 00:08:29,010 --> 00:08:30,630 Then you go on to the second vertex. 137 00:08:30,630 --> 00:08:32,480 Now, you may have already visited it, then you skip it. 138 00:08:32,480 --> 00:08:34,271 Third vertex, maybe you visited it already. 139 00:08:34,271 --> 00:08:36,250 Third, fourth vertex, keep going, 140 00:08:36,250 --> 00:08:39,049 until you find some vertex you haven't visited at all. 141 00:08:39,049 --> 00:08:42,990 And then you recursively visit everything reachable from it, 142 00:08:42,990 --> 00:08:45,400 and you repeat. 143 00:08:45,400 --> 00:08:48,400 This will find all the different clusters, 144 00:08:48,400 --> 00:08:50,480 all the different strongly connected components 145 00:08:50,480 --> 00:08:51,630 of your graph. 146 00:08:51,630 --> 00:08:54,190 Most of the work is being done by this recursion, 147 00:08:54,190 --> 00:08:55,970 but then there's this top level, just 148 00:08:55,970 --> 00:08:59,090 to make sure that all the vertices get visited.
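A minimal sketch of the top-level loop described above, again reconstructed from the spoken description rather than copied from the board. The names `dfs` and `dfs_visit` and the four-vertex graph are assumptions for illustration.

```python
def dfs_visit(adj, s, parent):
    """Recursive part: visit everything reachable from s not yet seen."""
    for v in adj[s]:
        if v not in parent:
            parent[v] = s
            dfs_visit(adj, v, parent)


def dfs(vertices, adj):
    """Top level: try every vertex as a possible starting point."""
    parent = {}
    for s in vertices:
        if s not in parent:      # s was not reached by any earlier search
            parent[s] = None     # s becomes the root of a new tree
            dfs_visit(adj, s, parent)
    return parent


# Hypothetical graph with two pieces that are not connected to each other.
vertices = [1, 2, 3, 4]
adj = {1: [2], 2: [], 3: [4], 4: [3]}
parent = dfs(vertices, adj)
roots = [v for v in vertices if parent[v] is None]
```

Here the search has to restart at vertex 3 because nothing reachable from vertex 1 leads there, so `roots` ends up with two entries.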
149 00:08:59,090 --> 00:09:03,380 Let's do a little example, so this is super clear, 150 00:09:03,380 --> 00:09:07,410 and then it will also let me do something 151 00:09:07,410 --> 00:09:09,480 called edge classification. 152 00:09:09,480 --> 00:09:13,340 Once we see every edge in the graph 153 00:09:13,340 --> 00:09:15,870 gets visited by DFS in one way or another, 154 00:09:15,870 --> 00:09:18,820 and it's really helpful to think about the different ways 155 00:09:18,820 --> 00:09:20,910 they can be visited. 156 00:09:20,910 --> 00:09:25,810 So here's a graph. 157 00:09:25,810 --> 00:09:29,010 I think it's similar to one from last class. 158 00:09:46,160 --> 00:09:50,220 It's not strongly connected, I don't think, 159 00:09:50,220 --> 00:09:53,960 so you can't get from these vertices to c. 160 00:09:53,960 --> 00:09:55,510 You can get from c to everywhere, 161 00:09:55,510 --> 00:10:00,110 it looks like, but not strongly connected. 162 00:10:00,110 --> 00:10:02,820 And we're going to run DFS, and I think, basically 163 00:10:02,820 --> 00:10:06,480 in alphabetical order is how we're imagining-- 164 00:10:06,480 --> 00:10:08,230 these vertices have to be ordered somehow, 165 00:10:08,230 --> 00:10:12,680 we don't really care how, but for sake of example I care. 166 00:10:12,680 --> 00:10:15,610 So we're going to start with a, that's 167 00:10:15,610 --> 00:10:17,029 the first vertex in here. 168 00:10:17,029 --> 00:10:19,570 We're going to recursively visit everything reachable from a, 169 00:10:19,570 --> 00:10:22,750 so we enter here with s equals a. 170 00:10:22,750 --> 00:10:30,275 So I'll mark this s1, to be the first value of s at this level. 171 00:10:33,070 --> 00:10:37,180 So we consider-- I'm going to check the order here-- 172 00:10:37,180 --> 00:10:39,345 first edge we look at, there's two outgoing edges, 173 00:10:39,345 --> 00:10:40,845 let's say we look at this one first.
174 00:10:46,230 --> 00:10:48,950 We look at b, b has not been visited yet, 175 00:10:48,950 --> 00:10:50,570 has no parent pointer. 176 00:10:50,570 --> 00:10:54,040 This one has a parent pointer of 0. 177 00:10:54,040 --> 00:10:59,560 B we're going to give a parent pointer of a, that's here. 178 00:10:59,560 --> 00:11:01,970 Then we recursively visit everything for b. 179 00:11:01,970 --> 00:11:04,670 So we look at all the outgoing edges from b, there's only one. 180 00:11:04,670 --> 00:11:05,750 So we visit this edge. 181 00:11:09,230 --> 00:11:11,160 from b to e. e has not been visited, 182 00:11:11,160 --> 00:11:15,200 so we set its parent pointer to b, and now we recursively visit 183 00:11:15,200 --> 00:11:16,451 e. 184 00:11:16,451 --> 00:11:22,590 e has only one outgoing edge, so we look at it, over here to d. 185 00:11:25,230 --> 00:11:29,286 d has not been visited, so we set a parent pointer to e, 186 00:11:29,286 --> 00:11:31,160 and we look at all the outgoing edges from d. 187 00:11:31,160 --> 00:11:33,170 d has one outgoing edge, which is 188 00:11:33,170 --> 00:11:35,760 to b. b has already been visited, 189 00:11:35,760 --> 00:11:38,530 so we skip that one, nothing to do. 190 00:11:38,530 --> 00:11:42,720 That's the else case of this if, so we 191 00:11:42,720 --> 00:11:45,730 do nothing in the else case, we just go to the next edge. 192 00:11:45,730 --> 00:11:48,450 But there's no next edge for d, so we're done. 193 00:11:48,450 --> 00:11:52,440 So this algorithm returns to the next level up. 194 00:11:52,440 --> 00:11:54,220 Next level up was e, we were iterating 195 00:11:54,220 --> 00:11:55,690 over the outgoing edges from e. 196 00:11:55,690 --> 00:11:59,870 But there was only one, so we're done, so e finishes. 197 00:11:59,870 --> 00:12:05,340 Then we backtrack to b, which is always going back 198 00:12:05,340 --> 00:12:07,420 along the parent pointer, but it's also just 199 00:12:07,420 --> 00:12:08,500 in the recursion.
200 00:12:08,500 --> 00:12:10,915 We know where to go back to. 201 00:12:10,915 --> 00:12:13,540 We were going over the outgoing edges from b, there's only one, 202 00:12:13,540 --> 00:12:15,610 we're done. 203 00:12:15,610 --> 00:12:16,960 So we go back to a. 204 00:12:16,960 --> 00:12:18,910 We only looked at one outgoing edge from a. 205 00:12:18,910 --> 00:12:22,130 There's another outgoing edge, which is this one, 206 00:12:22,130 --> 00:12:24,880 but we've already visited d, so we skip over that one, 207 00:12:24,880 --> 00:12:27,240 too, so we're done recursively visiting 208 00:12:27,240 --> 00:12:30,970 everything reachable from a. 209 00:12:30,970 --> 00:12:34,190 Now we go back to this loop, the outer loop. 210 00:12:34,190 --> 00:12:38,310 So we did a, next we look at b, we say, oh b has been visited, 211 00:12:38,310 --> 00:12:40,000 we don't need to do anything from there. 212 00:12:40,000 --> 00:12:42,430 Then we go to c, c hasn't been visited 213 00:12:42,430 --> 00:12:46,210 so we're going to loop from c, and so this 214 00:12:46,210 --> 00:12:50,390 is our second choice of s in this recursion, 215 00:12:50,390 --> 00:12:53,460 or in this outer loop. 216 00:12:53,460 --> 00:12:56,200 And so we look at the outgoing edges from s2, 217 00:12:56,200 --> 00:12:59,210 let me match the order in the notes. 218 00:12:59,210 --> 00:13:03,516 Let's say first we go to f. 219 00:13:03,516 --> 00:13:08,150 f has not been visited, so we set its parent pointer to c. 220 00:13:08,150 --> 00:13:10,130 Then we look at all the outgoing edges from f. 221 00:13:10,130 --> 00:13:13,710 There's one outgoing edge from f, it goes to f. 222 00:13:13,710 --> 00:13:18,860 I guess I shouldn't really bold this, sorry. 223 00:13:18,860 --> 00:13:21,040 I'll say what the bold edges mean in a moment. 224 00:13:23,570 --> 00:13:25,300 This is just a regular edge. 225 00:13:25,300 --> 00:13:27,570 We follow the edge from f to f. 
226 00:13:27,570 --> 00:13:29,385 We see, oh, f has already been visited, 227 00:13:29,385 --> 00:13:31,400 it already has a parent pointer, so there's 228 00:13:31,400 --> 00:13:33,389 no point going down there. 229 00:13:33,389 --> 00:13:35,430 We're done with f, that's the only outgoing edge. 230 00:13:35,430 --> 00:13:37,650 We go back to c, there's one other outgoing edge, 231 00:13:37,650 --> 00:13:40,900 but it leads to a vertex we've already visited, namely e, 232 00:13:40,900 --> 00:13:44,600 and so we're done with visiting everything reachable from c. 233 00:13:44,600 --> 00:13:46,100 We didn't visit everything reachable 234 00:13:46,100 --> 00:13:49,250 from c, because some of it was already visited from a. 235 00:13:49,250 --> 00:13:51,685 Then we go back to the outer loop, say, OK, what about d? 236 00:13:51,685 --> 00:13:53,060 D has been visited, what about e? 237 00:13:53,060 --> 00:13:54,351 E's been visited, what about f? 238 00:13:54,351 --> 00:13:55,590 F's been visited. 239 00:13:55,590 --> 00:13:57,790 So we're visiting these vertices again, 240 00:13:57,790 --> 00:14:03,640 but should only be twice in total, and in the end 241 00:14:03,640 --> 00:14:06,230 we visit all the vertices, and, in a certain sense, 242 00:14:06,230 --> 00:14:07,170 all the edges as well. 243 00:14:12,440 --> 00:14:18,070 Let's talk about running time. 244 00:14:27,597 --> 00:14:29,930 What do you think the running time of this algorithm is? 245 00:14:38,120 --> 00:14:39,590 Anyone? 246 00:14:39,590 --> 00:14:42,935 Time to wake up. 247 00:14:42,935 --> 00:14:43,897 AUDIENCE: Upper bound? 248 00:14:43,897 --> 00:14:45,810 PROFESSOR: Upper bound, sure. 249 00:14:45,810 --> 00:14:46,310 AUDIENCE: V? 250 00:14:46,310 --> 00:14:46,851 PROFESSOR: V? 251 00:14:46,851 --> 00:14:48,690 AUDIENCE: [INAUDIBLE]. 252 00:14:48,690 --> 00:14:55,690 PROFESSOR: V is a little bit optimistic, plus e, good, 253 00:14:55,690 --> 00:14:57,720 collaborative effort. 
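The walk-through above can be replayed in code. The board drawing is not part of the transcript, so the edge list below is an assumption reconstructed from the spoken trace: a to b and d, b to e, e to d, d to b, c to f and e, and a self-loop on f. The function name `dfs_trace` and the `order` list are illustrative additions.

```python
# Example graph as reconstructed from the spoken walk-through (assumption).
ADJ = {
    "a": ["b", "d"],
    "b": ["e"],
    "c": ["f", "e"],
    "d": ["b"],
    "e": ["d"],
    "f": ["f"],
}


def dfs_trace(vertices, adj):
    """Run DFS over all vertices, recording parents and first-visit order."""
    parent = {}
    order = []

    def visit(s):
        order.append(s)              # s is being visited for the first time
        for v in adj[s]:
            if v not in parent:
                parent[v] = s
                visit(v)

    for s in vertices:               # alphabetical order, as in the lecture
        if s not in parent:
            parent[s] = None
            visit(s)
    return parent, order


parent, order = dfs_trace("abcdef", ADJ)
```

With these adjacency lists the first search follows a, b, e, d and the second starts at c and reaches f, so the parent pointers form the path a, b, e, d plus the single edge c to f.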
254 00:14:57,720 --> 00:15:00,070 It's linear time, just like BFS. 255 00:15:00,070 --> 00:15:02,520 This is what we call linear time, 256 00:15:02,520 --> 00:15:07,550 because this is the size of the input. 257 00:15:07,550 --> 00:15:11,342 It's theta V plus E for the whole thing. 258 00:15:11,342 --> 00:15:12,800 The size of the input was v plus e. 259 00:15:12,800 --> 00:15:15,300 We needed v slots in an array, plus we 260 00:15:15,300 --> 00:15:20,400 needed e items in these linked lists, one for each edge. 261 00:15:20,400 --> 00:15:22,560 We have to traverse that whole structure. 262 00:15:22,560 --> 00:15:27,030 The reason it's order v plus e is-- first, as you were saying, 263 00:15:27,030 --> 00:15:30,320 you're visiting every vertex once in this outer loop, 264 00:15:30,320 --> 00:15:46,160 so not worrying about the recursion in DFS alone, 265 00:15:46,160 --> 00:15:48,480 so that's order V. 266 00:15:48,480 --> 00:15:51,040 Then have to worry about this recursion, 267 00:15:51,040 --> 00:15:56,160 but we know that whenever we call DFS visit on a vertex, 268 00:15:56,160 --> 00:15:58,961 that it did not have a parent before. 269 00:15:58,961 --> 00:16:00,830 Right before we called DFS visit, 270 00:16:00,830 --> 00:16:03,170 we set its parent for the first time. 271 00:16:03,170 --> 00:16:05,590 Right before we called DFS visit on v here, 272 00:16:05,590 --> 00:16:07,580 we set its parent for the first time, 273 00:16:07,580 --> 00:16:09,930 because it wasn't set before. 274 00:16:09,930 --> 00:16:17,880 So DFS visit, and I'm going to just write of v, 275 00:16:17,880 --> 00:16:19,840 meaning the last argument here. 276 00:16:25,520 --> 00:16:32,660 It's called once, at most once, per vertex v. 277 00:16:35,800 --> 00:16:37,580 But it does not take constant time. 278 00:16:37,580 --> 00:16:41,310 This takes constant time per vertex, plus a recursive call.
279 00:16:41,310 --> 00:16:44,320 This thing, this takes constant time, but there's a for loop 280 00:16:44,320 --> 00:16:44,820 here. 281 00:16:44,820 --> 00:16:47,140 We have to pay for however many outgoing edges 282 00:16:47,140 --> 00:16:49,300 there are from v, that's the part you're missing. 283 00:16:52,880 --> 00:17:00,560 And we pay length of adjacency of v for that vertex. 284 00:17:00,560 --> 00:17:03,046 So the total in addition to this v 285 00:17:03,046 --> 00:17:08,300 is going to be the order, sum over all vertices, v in capital 286 00:17:08,300 --> 00:17:13,400 V, of length of the adjacency list for v, 287 00:17:13,400 --> 00:17:22,150 which is E. This is the handshaking 288 00:17:22,150 --> 00:17:24,592 lemma from last time. 289 00:17:24,592 --> 00:17:27,010 It's twice e for undirected graphs, 290 00:17:27,010 --> 00:17:29,550 it's e for directed graphs. 291 00:17:29,550 --> 00:17:33,970 I've drawn directed graphs here, it's a little more interesting. 292 00:17:33,970 --> 00:17:37,560 OK, so it's linear time, just like the BFS, so you could say, 293 00:17:37,560 --> 00:17:42,240 who cares, but DFS offers a lot of different properties 294 00:17:42,240 --> 00:17:42,870 than BFS. 295 00:17:42,870 --> 00:17:44,660 They each have their niche. 296 00:17:44,660 --> 00:17:46,250 BFS is great for shortest paths. 297 00:17:46,250 --> 00:17:49,080 You want to know the fastest way to solve the Rubik's cube, 298 00:17:49,080 --> 00:17:50,560 BFS will find it. 299 00:17:50,560 --> 00:17:53,330 You want to find the fastest way to solve the Rubik's cube, 300 00:17:53,330 --> 00:17:55,150 DFS will not find it. 301 00:17:55,150 --> 00:17:57,090 It's not following shortest paths here. 302 00:17:57,090 --> 00:17:59,300 Going from a to d, we use the path 303 00:17:59,300 --> 00:18:01,324 of length 3, that's the bold edges.
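The linear-time argument above can be checked empirically by counting the work DFS does. This is a sketch with made-up counter names; the graph is the example from the walk-through as reconstructed from the spoken trace, which is an assumption since the drawing is not in the transcript.

```python
# Example graph as reconstructed from the spoken walk-through (assumption).
ADJ = {
    "a": ["b", "d"],
    "b": ["e"],
    "c": ["f", "e"],
    "d": ["b"],
    "e": ["d"],
    "f": ["f"],
}


def dfs_with_counts(vertices, adj):
    """Run DFS and count outer-loop steps, visit calls, and edge checks."""
    parent = {}
    counts = {"outer_iterations": 0, "visit_calls": 0, "edge_checks": 0}

    def visit(s):
        counts["visit_calls"] += 1       # at most once per vertex
        for v in adj[s]:
            counts["edge_checks"] += 1   # len(adj[s]) checks for this call
            if v not in parent:
                parent[v] = s
                visit(v)

    for s in vertices:
        counts["outer_iterations"] += 1  # exactly once per vertex
        if s not in parent:
            parent[s] = None
            visit(s)
    return counts


counts = dfs_with_counts("abcdef", ADJ)
total_adjacency_length = sum(len(neighbors) for neighbors in ADJ.values())
```

On this directed graph the outer loop runs once per vertex, the recursive routine is called once per vertex, and the edge checks add up to the total length of the adjacency lists, which is the number of edges. That is the V plus E bound from the argument above.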
304 00:18:01,324 --> 00:18:02,740 We could have gone directly from a 305 00:18:02,740 --> 00:18:05,170 to d, so it's a different kind of search, 306 00:18:05,170 --> 00:18:07,340 but sort of the inverse. 307 00:18:07,340 --> 00:18:10,560 But it's extremely useful, in particular, in the way 308 00:18:10,560 --> 00:18:13,082 that it classifies edges. 309 00:18:13,082 --> 00:18:14,790 So let me talk about edge classification. 310 00:18:27,630 --> 00:18:31,540 You can check every edge in this graph gets visited. 311 00:18:31,540 --> 00:18:34,060 In a directed graph every edge gets visited once, 312 00:18:34,060 --> 00:18:35,740 in an undirected graph, every edge 313 00:18:35,740 --> 00:18:37,660 gets visited twice, once from each side. 314 00:18:40,200 --> 00:18:42,240 And when you visit that edge, there's 315 00:18:42,240 --> 00:18:45,710 sort of different categories of what could happen to it. 316 00:18:45,710 --> 00:18:50,920 Maybe the edge led to something unvisited, when you went there. 317 00:18:50,920 --> 00:18:52,190 We call those tree edges. 318 00:19:10,360 --> 00:19:12,920 That's what the parent pointers are specifying 319 00:19:12,920 --> 00:19:16,420 and all the bold edges here are called tree edges. 320 00:19:16,420 --> 00:19:27,410 This is when we visit a new vertex via that edge. 321 00:19:29,832 --> 00:19:31,540 So we look at the other side of the edge, 322 00:19:31,540 --> 00:19:33,024 we discover a new vertex. 323 00:19:33,024 --> 00:19:34,440 Those are what we call tree edges, 324 00:19:34,440 --> 00:19:37,830 it turns out they form a tree, a directed tree. 325 00:19:37,830 --> 00:19:39,930 That's a lemma you can prove. 326 00:19:39,930 --> 00:19:40,810 You can see it here. 327 00:19:40,810 --> 00:19:44,650 We just have a path, actually a forest would be more accurate. 328 00:19:44,650 --> 00:19:48,916 We have a path abed, and we have an edge cf, 329 00:19:48,916 --> 00:19:51,209 but, in general, it's a forest.
330 00:19:51,209 --> 00:19:53,250 So for example, if there was another thing coming 331 00:19:53,250 --> 00:19:57,540 from e here, let's modify my graph, we would, at some point, 332 00:19:57,540 --> 00:19:59,720 visit that edge and say, oh, here's a new way to go, 333 00:19:59,720 --> 00:20:04,250 and now that bold structure forms an actual tree. 334 00:20:04,250 --> 00:20:06,850 These are called tree edges, you can call them forest edges 335 00:20:06,850 --> 00:20:10,080 if you feel like it. 336 00:20:10,080 --> 00:20:13,120 There are other edges in there, the nonbold edges, 337 00:20:13,120 --> 00:20:17,260 and the textbook distinguishes three types, three types? 338 00:20:17,260 --> 00:20:19,950 Three types, so many types. 339 00:20:22,500 --> 00:20:40,580 They are forward edges, backward edges, and cross edges. 340 00:20:44,720 --> 00:20:47,740 Some of these are more useful to distinguish than others, 341 00:20:47,740 --> 00:20:51,490 but it doesn't hurt to have them all. 342 00:20:51,490 --> 00:20:57,590 So, for example, this edge I'm going to call a forward edge, 343 00:20:57,590 --> 00:21:01,260 just write f, that's unambiguous, 344 00:21:01,260 --> 00:21:04,430 because it goes, in some sense, forward along the tree. 345 00:21:04,430 --> 00:21:09,730 It goes from the root of this tree to a descendant. 346 00:21:09,730 --> 00:21:12,130 There is a path in the tree from a 347 00:21:12,130 --> 00:21:14,720 to d, so we call it a forward edge. 348 00:21:14,720 --> 00:21:20,770 By contrast, this edge I'm going to call a backward edge, 349 00:21:20,770 --> 00:21:24,570 because it goes from a node in the tree 350 00:21:24,570 --> 00:21:26,390 to an ancestor in the trees. 
351 00:21:26,390 --> 00:21:28,914 If you think of parents, I can go from d to its parent 352 00:21:28,914 --> 00:21:30,830 to its parent, and that's where the edge goes, 353 00:21:30,830 --> 00:21:33,460 so that's a backward edge-- double check I 354 00:21:33,460 --> 00:21:36,870 got these not reversed, yeah, that's right. 355 00:21:36,870 --> 00:21:39,334 Forward edge because I could go from d to its parent 356 00:21:39,334 --> 00:21:41,000 to its parent to its parent and the edge 357 00:21:41,000 --> 00:21:44,220 went the other way, that's a forward edge. 358 00:21:44,220 --> 00:21:49,170 So forward edge goes from a node to a descendant in the tree. 359 00:21:52,540 --> 00:21:56,660 Backward edge goes from a node to an ancestor in the tree. 360 00:22:02,670 --> 00:22:04,180 And when I say, tree, I mean forest. 361 00:22:07,080 --> 00:22:10,170 And then all the other edges are cross edges. 362 00:22:12,940 --> 00:22:17,670 So I guess, here, this is a cross edge. 363 00:22:17,670 --> 00:22:20,840 In this case, it goes from one tree to another, doesn't 364 00:22:20,840 --> 00:22:22,540 have to go between different trees. 365 00:22:22,540 --> 00:22:28,540 For example, let's say I'm visiting d, then 366 00:22:28,540 --> 00:22:32,942 I go back to e, I visit g, or there could be this edge. 367 00:22:32,942 --> 00:22:37,720 If this edge existed, it would be a cross edge, 368 00:22:37,720 --> 00:22:40,970 because g and d are not ancestor related, 369 00:22:40,970 --> 00:22:42,980 neither one is an ancestor of the other, 370 00:22:42,980 --> 00:22:46,329 they are siblings actually. 371 00:22:46,329 --> 00:22:47,870 So there's, in general, there's going 372 00:22:47,870 --> 00:22:51,210 to be some subtree over here, some subtree over here, 373 00:22:51,210 --> 00:22:55,760 and this is a cross edge between two different subtrees. 
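The definitions above (tree, forward, backward, cross) can be applied directly to the parent pointers by walking up toward the root. This is a slow but simple sketch, not the method from the lecture or the textbook. It assumes a graph without parallel edges, the names `is_ancestor` and `classify` are hypothetical, and the parent forest is the one from the walk-through as reconstructed from the spoken trace.

```python
# Parent pointers produced by the walk-through (reconstructed, assumption).
PARENT = {"a": None, "b": "a", "e": "b", "d": "e", "c": None, "f": "c"}


def is_ancestor(parent, u, v):
    """Return True if u equals v or u is an ancestor of v in the forest."""
    while v is not None:
        if v == u:
            return True
        v = parent[v]       # step up one level; roots have parent None
    return False


def classify(parent, u, v):
    """Classify the directed edge (u, v) relative to the DFS forest."""
    if u != v and parent[v] == u:
        return "tree"       # v was discovered via this edge
    if is_ancestor(parent, v, u):
        return "backward"   # v is an ancestor of u (self-loops land here)
    if is_ancestor(parent, u, v):
        return "forward"    # v is a descendant of u, but not via a tree edge
    return "cross"          # neither endpoint is an ancestor of the other


edges = [("a", "b"), ("a", "d"), ("d", "b"), ("c", "e"), ("f", "f")]
kinds = {edge: classify(PARENT, *edge) for edge in edges}
```

On the example this labels a to d as forward, d to b as backward, and c to e as cross, matching the labels given in the lecture. Treating the self-loop on f as a backward edge is the usual textbook convention and is an assumption here, since the lecture does not label it.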
374 00:22:55,760 --> 00:23:07,960 This cross edge is between two, sort of, non ancestor related, 375 00:23:07,960 --> 00:23:16,955 I think is the shortest way to write this, subtrees or nodes. 376 00:23:26,520 --> 00:23:29,065 A little puzzle for you, well, I guess 377 00:23:29,065 --> 00:23:31,620 the first question is, how do you compute this structure? 378 00:23:31,620 --> 00:23:34,212 How do you compute which edges are which? 379 00:23:34,212 --> 00:23:36,670 This is not hard, although I haven't written it in the code 380 00:23:36,670 --> 00:23:37,200 here. 381 00:23:37,200 --> 00:23:42,290 You can check the textbook for one way to do it. 382 00:23:42,290 --> 00:23:45,800 The parent structure tells you which edges are tree edges. 383 00:23:45,800 --> 00:23:47,980 So that part we have done. 384 00:23:47,980 --> 00:23:52,670 Every parent pointer corresponds to the reverse of a tree edge, 385 00:23:52,670 --> 00:23:55,250 so at the same time you could mark that edge a tree edge, 386 00:23:55,250 --> 00:23:56,958 and you'd know which edges are tree edges 387 00:23:56,958 --> 00:23:58,874 and which edges are nontree edges. 388 00:23:58,874 --> 00:24:01,290 If you want to know which are forward, which are backward, 389 00:24:01,290 --> 00:24:06,130 which are cross edges, the key thing you need to know 390 00:24:06,130 --> 00:24:14,140 is, well, in particular, for backward edges, one way 391 00:24:14,140 --> 00:24:16,850 to compute them is to mark which nodes 392 00:24:16,850 --> 00:24:19,880 you are currently exploring. 393 00:24:19,880 --> 00:24:22,660 So when we do a DFS visit on a node, 394 00:24:22,660 --> 00:24:25,160 we could say at the beginning here, 395 00:24:25,160 --> 00:24:31,230 basically, we're starting to visit s, say, start s, 396 00:24:31,230 --> 00:24:33,569 and then at the end of this for loop, we write, 397 00:24:33,569 --> 00:24:34,485 we're finished with s. 398 00:24:38,190 --> 00:24:40,130 And you could mark that in the s structure. 
399 00:24:40,130 --> 00:24:43,720 You could say s dot in process is true up here, 400 00:24:43,720 --> 00:24:46,730 s dot in process equals false down here. 401 00:24:46,730 --> 00:24:49,470 Keep track of which nodes are currently in the recursion 402 00:24:49,470 --> 00:24:53,120 stack, just by marking them and unmarking them 403 00:24:53,120 --> 00:24:55,430 at the beginning and the end. 404 00:24:55,430 --> 00:24:58,210 Then we'll know, if we follow an edge and it's an edge 405 00:24:58,210 --> 00:25:01,220 to somebody who's already in the stack, 406 00:25:01,220 --> 00:25:06,020 then it's a backward edge, because that's-- everyone 407 00:25:06,020 --> 00:25:10,690 in the stack is an ancestor of our current node. 408 00:25:10,690 --> 00:25:15,400 Detecting forward edges, it's a little trickier. 409 00:25:18,940 --> 00:25:23,330 Forward edges versus cross edges, 410 00:25:23,330 --> 00:25:25,220 any suggestions on an easy way to do that? 411 00:25:28,480 --> 00:25:31,840 I don't think I know an easy way to do that. 412 00:25:31,840 --> 00:25:33,560 It can be done. 413 00:25:33,560 --> 00:25:35,750 The way the textbook does it is a little bit more 414 00:25:35,750 --> 00:25:41,030 sophisticated, in that when they start visiting a vertex, 415 00:25:41,030 --> 00:25:44,890 they record the time that it got visited. 416 00:25:44,890 --> 00:25:46,620 What's time? 417 00:25:46,620 --> 00:25:49,220 You could think of it as the clock on your computer, 418 00:25:49,220 --> 00:25:51,140 another way to do it is, every time 419 00:25:51,140 --> 00:25:55,000 you do a step in this algorithm, you increment a counter.
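The in-process marking described above, used to spot backward edges, might look like the following. A Python set stands in for the per-vertex "in process" flag, which is an implementation choice made here and not something the lecture specifies. The function name is hypothetical and the graph is the walk-through example as reconstructed from the spoken trace.

```python
# Example graph as reconstructed from the spoken walk-through (assumption).
ADJ = {
    "a": ["b", "d"],
    "b": ["e"],
    "c": ["f", "e"],
    "d": ["b"],
    "e": ["d"],
    "f": ["f"],
}


def find_backward_edges(vertices, adj):
    """Return the edges that lead to a vertex still on the recursion stack."""
    parent = {}
    in_process = set()      # vertices whose visit has started but not finished
    backward = []

    def visit(s):
        in_process.add(s)               # mark at the start of the visit
        for v in adj[s]:
            if v in in_process:
                backward.append((s, v)) # v is an ancestor of s (or s itself)
            elif v not in parent:
                parent[v] = s
                visit(v)
        in_process.discard(s)           # unmark once the for loop is done

    for s in vertices:
        if s not in parent:
            parent[s] = None
            visit(s)
    return backward


backward_edges = find_backward_edges("abcdef", ADJ)
has_cycle = len(backward_edges) > 0
```

On the example graph this reports the edge from d to b and the self-loop on f. A directed graph has a cycle exactly when DFS finds such an edge, which connects to the cycle-detection application mentioned at the start of the lecture.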
420 00:25:55,000 --> 00:25:58,351 So every time anything happens, you increment a counter, 421 00:25:58,351 --> 00:25:59,850 and then you store the value of that 422 00:25:59,850 --> 00:26:02,910 counter here for s, that would be the start time for s, 423 00:26:02,910 --> 00:26:06,100 you store the finish time for s down here, 424 00:26:06,100 --> 00:26:08,040 and then this gives you, this tells you 425 00:26:08,040 --> 00:26:09,970 when a node was visited, and you can 426 00:26:09,970 --> 00:26:12,520 use that to compute when an edge is a forward edge 427 00:26:12,520 --> 00:26:14,924 and otherwise it's a cross edge. 428 00:26:14,924 --> 00:26:16,840 It's not terribly exciting, though, so I'm not 429 00:26:16,840 --> 00:26:18,810 going to detail that. 430 00:26:18,810 --> 00:26:22,450 You can look at the textbook if you're interested. 431 00:26:22,450 --> 00:26:24,140 But here's a fun puzzle. 432 00:26:24,140 --> 00:26:32,920 In an undirected graph, which of these edges can exist? 433 00:26:32,920 --> 00:26:38,790 We can have a vote, do some democratic mathematics. 434 00:26:38,790 --> 00:26:41,910 How many people think tree edges exist in undirected graphs? 435 00:26:44,510 --> 00:26:46,170 You, OK. 436 00:26:46,170 --> 00:26:46,670 Sarini does. 437 00:26:46,670 --> 00:26:47,740 That's a good sign. 438 00:26:47,740 --> 00:26:49,340 How many people think forward edges 439 00:26:49,340 --> 00:26:50,920 exist in an undirected graph? 440 00:26:54,310 --> 00:26:54,870 A couple. 441 00:26:54,870 --> 00:26:56,370 How many people think backward edges 442 00:26:56,370 --> 00:26:59,500 exist in an undirected graph? 443 00:26:59,500 --> 00:27:00,000 Couple. 444 00:27:00,000 --> 00:27:01,850 How many people think cross edges 445 00:27:01,850 --> 00:27:03,980 exist in undirected graph? 446 00:27:03,980 --> 00:27:05,250 More people, OK. 447 00:27:05,250 --> 00:27:07,870 I think voting worked. 448 00:27:07,870 --> 00:27:10,830 They all exist, no, that's not true. 
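A sketch of the counter-based idea described above: record a start time and a finish time for each vertex and use them to tell the edge types apart. This follows the spirit of the textbook approach mentioned in the lecture but is not copied from it. The function name, the labels, and the exact tests are assumptions, and the graph is the walk-through example as reconstructed from the spoken trace.

```python
# Example graph as reconstructed from the spoken walk-through (assumption).
ADJ = {
    "a": ["b", "d"],
    "b": ["e"],
    "c": ["f", "e"],
    "d": ["b"],
    "e": ["d"],
    "f": ["f"],
}


def classify_edges(vertices, adj):
    """Label every directed edge as tree, backward, forward, or cross."""
    parent, start, finish, kinds = {}, {}, {}, {}
    clock = 0

    def visit(s):
        nonlocal clock
        clock += 1
        start[s] = clock                    # time the visit of s begins
        for v in adj[s]:
            if v not in parent:
                parent[v] = s
                kinds[(s, v)] = "tree"
                visit(v)
            elif v not in finish:
                kinds[(s, v)] = "backward"  # v is still on the stack
            elif start[s] < start[v]:
                kinds[(s, v)] = "forward"   # v started and finished inside s
            else:
                kinds[(s, v)] = "cross"     # v finished before s started
        clock += 1
        finish[s] = clock                   # time the visit of s ends

    for s in vertices:
        if s not in parent:
            parent[s] = None
            visit(s)
    return kinds, start, finish


kinds, start, finish = classify_edges("abcdef", ADJ)
```

The key observation is that a finished vertex that started after the current one must be a descendant, while a finished vertex that started earlier belongs to a different subtree or an earlier tree.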
449 00:27:10,830 --> 00:27:13,217 This one can exist and this one can exist. 450 00:27:13,217 --> 00:27:15,050 I actually wrote the wrong ones in my notes, 451 00:27:15,050 --> 00:27:19,020 so it's good to trick you, no, I made a mistake. 452 00:27:19,020 --> 00:27:20,870 It's very easy to get these mixed up 453 00:27:20,870 --> 00:27:24,360 and you can think about why this is true, 454 00:27:24,360 --> 00:27:26,200 maybe I'll draw some pictures to clarify. 455 00:27:30,080 --> 00:27:35,570 This is something, you remember the-- there was a BFS diagram, 456 00:27:35,570 --> 00:27:38,460 I talked a little bit about this last class. 457 00:27:38,460 --> 00:27:40,650 Tree edges better exist, those are the things 458 00:27:40,650 --> 00:27:42,370 you use to visit new vertices. 459 00:27:42,370 --> 00:27:45,640 So that always happens, undirected or otherwise. 460 00:27:45,640 --> 00:27:47,640 Forward edges, though, a forward edge 461 00:27:47,640 --> 00:27:51,590 would be, OK, I visited this, then I visited this. 462 00:27:51,590 --> 00:27:52,770 Those were tree edges. 463 00:27:55,370 --> 00:27:58,552 Then I backtrack and I follow an edge like this. 464 00:27:58,552 --> 00:27:59,760 This would be a forward edge. 465 00:27:59,760 --> 00:28:03,470 And in a directed graph that can happen. 466 00:28:03,470 --> 00:28:11,320 In an undirected graph, it can also happen, right? 467 00:28:11,320 --> 00:28:12,540 Oh, no, it can't, it can't. 468 00:28:12,540 --> 00:28:14,530 OK. 469 00:28:14,530 --> 00:28:15,720 So confusing. 470 00:28:15,720 --> 00:28:17,970 In an undirected graph, if it looks like this, 471 00:28:17,970 --> 00:28:20,300 you start-- let's say this is s. 472 00:28:20,300 --> 00:28:24,000 You start here, and suppose we follow this edge. 473 00:28:24,000 --> 00:28:27,180 We get to here, then we follow this edge, we get to here.
474 00:28:27,180 --> 00:28:31,390 Then we will follow this edge in the other direction, 475 00:28:31,390 --> 00:28:35,240 and that's guaranteed to finish before we get back to s. 476 00:28:35,240 --> 00:28:36,970 So, in order to be a forward edge, 477 00:28:36,970 --> 00:28:39,110 this one has to be visited after this one, 478 00:28:39,110 --> 00:28:43,030 from s, but in this scenario, if you follow this one first, 479 00:28:43,030 --> 00:28:44,530 you'll eventually get to this vertex 480 00:28:44,530 --> 00:28:47,440 and then you will come back, and then that will be classified 481 00:28:47,440 --> 00:28:49,670 as a backward edge in an undirected graph. 482 00:28:49,670 --> 00:28:53,335 So you can never have forward edges in an undirected graph. 483 00:29:00,900 --> 00:29:04,490 But I have a backward edge here, that would suggest 484 00:29:04,490 --> 00:29:08,190 I can have backward edges here, and no cross edges. 485 00:29:08,190 --> 00:29:14,410 Well, democracy did not work, I was swayed by the popular vote. 486 00:29:14,410 --> 00:29:17,700 So I claim, apparently, cross edges do not exist. 487 00:29:17,700 --> 00:29:18,660 Let's try to draw this. 488 00:29:18,660 --> 00:29:26,240 So a typical cross edge scenario would be, here, 489 00:29:26,240 --> 00:29:29,900 you follow this edge, you backtrack, 490 00:29:29,900 --> 00:29:31,950 you follow another edge, and then 491 00:29:31,950 --> 00:29:34,670 you discover there was an edge back to some other subtree 492 00:29:34,670 --> 00:29:36,020 that you've already visited. 493 00:29:36,020 --> 00:29:38,365 That can't happen in an undirected graph. 494 00:29:38,365 --> 00:29:41,930 For the same reason, if I follow this one first, 495 00:29:41,930 --> 00:29:46,240 and this edge exists undirected, then I will go down that way. 496 00:29:46,240 --> 00:29:50,260 So it will actually be a tree edge, not a cross edge. 497 00:29:50,260 --> 00:29:51,670 OK, phew.
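A small sketch of the undirected claim, assuming a simple undirected graph stored as a dict in which each edge appears in both endpoints' lists. The function name classify_undirected and the example graph are illustrative assumptions. Each undirected edge is labeled once, the first time DFS examines it from either endpoint, and the labels that come out are only tree and back.

```python
def classify_undirected(adj):
    """Label each undirected edge the first time DFS examines it."""
    state = {}    # vertex -> "in_process" or "done"
    labels = {}   # frozenset({u, v}) -> label

    def visit(u):
        state[u] = "in_process"
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in labels:
                continue                     # already labeled from the other endpoint
            if v not in state:
                labels[edge] = "tree"
                visit(v)
            elif state[v] == "in_process":
                labels[edge] = "back"        # v is an ancestor still on the stack
            else:
                # By the argument in the lecture this branch is never taken:
                # a finished neighbor would already have labeled the edge.
                labels[edge] = "forward or cross"
        state[u] = "done"

    for s in adj:
        if s not in state:
            visit(s)
    return labels


# Hypothetical triangle s-a-b with a pendant vertex c attached to b.
triangle = {"s": ["a", "b"], "a": ["s", "b"], "b": ["a", "s", "c"], "c": ["b"]}
undirected_labels = classify_undirected(triangle)
```

On this example the edge between b and s comes out as a back edge, and the other three edges are tree edges.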
498 00:29:51,670 --> 00:29:56,494 That means my notes were correct. 499 00:29:56,494 --> 00:29:57,910 I was surprised, because they were 500 00:29:57,910 --> 00:30:04,355 copied from the textbook, uncorrect my correction. 501 00:30:04,355 --> 00:30:04,855 Good. 502 00:30:10,080 --> 00:30:13,140 So what? 503 00:30:13,140 --> 00:30:15,930 Why do I care about these edge classifications? 504 00:30:15,930 --> 00:30:21,970 I claim they're super handy for two problems, cycle detection, 505 00:30:21,970 --> 00:30:24,140 which is pretty intuitive problem. 506 00:30:24,140 --> 00:30:26,760 Does my graph have any cycles? 507 00:30:26,760 --> 00:30:29,890 In the directed case, this is particularly interesting. 508 00:30:29,890 --> 00:30:33,390 I want to know, does a graph have any directed cycles? 509 00:30:33,390 --> 00:30:35,360 And another problem called topological sort, 510 00:30:35,360 --> 00:30:36,390 which we will get to. 511 00:30:41,500 --> 00:30:45,360 So let's start with cycle detection. 512 00:30:45,360 --> 00:30:48,870 This is actually a warmup for topological sort. 513 00:30:52,760 --> 00:30:55,680 So does my graph have any cycles? 514 00:30:55,680 --> 00:31:00,600 G has a cycle, I claim. 515 00:31:00,600 --> 00:31:10,660 This happens, if and only if, G has a back edge, or let's say, 516 00:31:10,660 --> 00:31:13,940 a depth-first search of that graph has a back edge. 517 00:31:17,250 --> 00:31:19,840 So it doesn't matter where I start from 518 00:31:19,840 --> 00:31:22,944 or how this algorithm-- I run this top level DFS algorithm, 519 00:31:22,944 --> 00:31:24,360 explore the whole graph, because I 520 00:31:24,360 --> 00:31:26,970 want to know in the whole graph is there a cycle? 521 00:31:26,970 --> 00:31:29,580 I claim, if there's a back edge, then there's a cycle. 522 00:31:33,030 --> 00:31:35,729 So it all comes down to back edges. 523 00:31:35,729 --> 00:31:38,020 This will work for both directed and undirected graphs. 
524 00:31:38,020 --> 00:31:41,070 Detecting cycles is pretty easy in undirected graphs. 525 00:31:41,070 --> 00:31:43,370 It's a little more subtle with directed graphs, 526 00:31:43,370 --> 00:31:46,750 because you have to worry about the edge directions. 527 00:31:46,750 --> 00:31:49,610 So let's prove this. 528 00:31:49,610 --> 00:31:52,770 We haven't done a serious proof in a while, 529 00:31:52,770 --> 00:31:57,110 so this is still a pretty easy one, let's think about it. 530 00:31:57,110 --> 00:31:58,880 What do you think is the easier direction 531 00:31:58,880 --> 00:32:02,780 to prove here, left or right? 532 00:32:02,780 --> 00:32:03,720 More democracy. 533 00:32:03,720 --> 00:32:07,292 How many people think left is easy? 534 00:32:07,292 --> 00:32:08,360 A couple. 535 00:32:08,360 --> 00:32:10,240 How many people think right is easy? 536 00:32:10,240 --> 00:32:12,410 A whole bunch more. 537 00:32:12,410 --> 00:32:14,890 I disagree with you. 538 00:32:14,890 --> 00:32:18,320 I guess it depends what you consider easy. 539 00:32:18,320 --> 00:32:21,210 Let me show you how easy left is. 540 00:32:21,210 --> 00:32:25,780 Left is, I have a back edge, I want to claim there's a cycle. 541 00:32:25,780 --> 00:32:27,610 What does the back edge look like? 542 00:32:27,610 --> 00:32:34,050 Well, it's an edge to an ancestor in the tree. 543 00:32:34,050 --> 00:32:35,796 If this node is a descendant of this node 544 00:32:35,796 --> 00:32:39,920 and this node is an ancestor of this node, that's 545 00:32:39,920 --> 00:32:42,860 saying there are tree edges, there's 546 00:32:42,860 --> 00:32:45,820 a path, a tree path, that connects one to the other. 547 00:32:49,340 --> 00:32:54,160 So these are tree edges, because this 548 00:32:54,160 --> 00:32:57,859 is supposed to be an ancestor, and this 549 00:32:57,859 --> 00:32:59,150 is supposed to be a descendant. 550 00:33:03,670 --> 00:33:08,770 And that's the definition of a back edge.
551 00:33:08,770 --> 00:33:11,540 Do you see a cycle? 552 00:33:11,540 --> 00:33:12,820 I see a cycle. 553 00:33:12,820 --> 00:33:17,550 This is a cycle, directed cycle. 554 00:33:17,550 --> 00:33:21,970 So if there's a back edge, by definition, it makes a cycle. 555 00:33:21,970 --> 00:33:24,290 Now, it's harder to say if I have 10 back edges, 556 00:33:24,290 --> 00:33:25,400 how many cycles are there? 557 00:33:25,400 --> 00:33:26,560 Could be many. 558 00:33:26,560 --> 00:33:28,880 But if there's a back edge, there's 559 00:33:28,880 --> 00:33:30,410 definitely at least one cycle. 560 00:33:34,082 --> 00:33:35,790 The other direction is also not too hard, 561 00:33:35,790 --> 00:33:38,600 but I would hesitate to call it easy. 562 00:33:38,600 --> 00:33:42,690 Any suggestions if, I know there is a cycle, 563 00:33:42,690 --> 00:33:46,910 how do I prove that there's a back edge somewhere? 564 00:33:46,910 --> 00:33:49,110 Think about that, let me draw a cycle. 565 00:34:11,439 --> 00:34:12,480 There's a length k cycle. 566 00:34:16,214 --> 00:34:17,880 Where do you think, which of these edges 567 00:34:17,880 --> 00:34:19,260 do you think is going to be a back edge? 568 00:34:19,260 --> 00:34:20,835 Let's hope it's one of these edges. 569 00:34:23,350 --> 00:34:24,190 Sorry? 570 00:34:24,190 --> 00:34:25,420 AUDIENCE: Vk to v zero. 571 00:34:25,420 --> 00:34:26,560 PROFESSOR: Vk to v zero. 572 00:34:26,560 --> 00:34:31,000 That's a good idea, maybe this is a back edge. 573 00:34:31,000 --> 00:34:34,670 Of course, this is symmetric, why that edge? 574 00:34:34,670 --> 00:34:36,780 I labeled it in a suggestive way, 575 00:34:36,780 --> 00:34:39,389 but I need to say something before I know actually which 576 00:34:39,389 --> 00:34:42,404 edge is going to be the back edge. 577 00:34:42,404 --> 00:34:44,320 AUDIENCE: You have to say you start to v zero? 578 00:34:44,320 --> 00:34:45,850 PROFESSOR: Start at v zero. 
579 00:34:45,850 --> 00:34:48,460 If I started a search at v zero, that 580 00:34:48,460 --> 00:34:49,839 looks good, because the search is 581 00:34:49,839 --> 00:34:51,719 kind of going to go in this direction. 582 00:34:51,719 --> 00:34:53,949 vk will maybe be the last thing to be visited, 583 00:34:53,949 --> 00:34:55,480 that's not actually true. 584 00:34:55,480 --> 00:34:57,710 Could be there's an edge directly from v zero to vk, 585 00:34:57,710 --> 00:35:00,700 but intuitively vk will kind of be later, 586 00:35:00,700 --> 00:35:02,470 and then when this edge gets visited, 587 00:35:02,470 --> 00:35:05,350 this will be an ancestor and it will be a back edge. 588 00:35:05,350 --> 00:35:10,270 Of course, we may not start a search here, 589 00:35:10,270 --> 00:35:12,240 so calling it the start of the search 590 00:35:12,240 --> 00:35:16,079 is not quite right, a little different. 591 00:35:16,079 --> 00:35:18,800 AUDIENCE: First vertex that gets hit [INAUDIBLE]. 592 00:35:18,800 --> 00:35:21,550 PROFESSOR: First vertex that gets hit, good. 593 00:35:21,550 --> 00:35:24,820 I'm going to start the numbering at v zero, 594 00:35:24,820 --> 00:35:38,460 let's assume v zero is the first vertex in the cycle, 595 00:35:38,460 --> 00:35:40,040 visited by the depth-first search. 596 00:35:47,100 --> 00:35:54,060 Together, if you want some pillows if you like them, 597 00:35:54,060 --> 00:35:56,640 especially convenient that they're in front. 598 00:35:56,640 --> 00:35:59,130 So right, if it's not v zero, say 599 00:35:59,130 --> 00:36:00,470 v3 was the first one visited. 600 00:36:00,470 --> 00:36:01,845 We will just change the labeling, 601 00:36:01,845 --> 00:36:06,260 so that's v zero, that's v1, that's v2, and so on. 602 00:36:06,260 --> 00:36:09,340 So set this labeling, so that v zero is the first one, 603 00:36:09,340 --> 00:36:12,430 the first vertex that gets visited. 604 00:36:12,430 --> 00:36:20,230 Then, I claim that-- let me just write the claim first.
605 00:36:20,230 --> 00:36:23,610 This edge from vk to v zero will be a back edge. 606 00:36:26,350 --> 00:36:29,252 We'll just say, is back edge. 607 00:36:29,252 --> 00:36:32,780 And I would say this is not obvious, so be a little careful. 608 00:36:50,420 --> 00:36:54,460 We have to somehow exploit the depth-first nature of DFS, 609 00:36:54,460 --> 00:36:58,820 the fact that it goes deep-- it goes as deep as it can before 610 00:36:58,820 --> 00:37:00,396 backtracking. 611 00:37:00,396 --> 00:37:02,820 If you think about it, we're starting, 612 00:37:02,820 --> 00:37:05,690 at this point we are starting a search relative to this cycle. 613 00:37:05,690 --> 00:37:08,550 No one has been visited, except v zero just 614 00:37:08,550 --> 00:37:10,930 got visited, has a parent pointer off somewhere else. 615 00:37:15,990 --> 00:37:16,880 What do we do next? 616 00:37:16,880 --> 00:37:19,309 Well, we visit all the outgoing edges from v zero, 617 00:37:19,309 --> 00:37:20,850 there might be many of them. It could 618 00:37:20,850 --> 00:37:23,480 be the edge from v zero to v1, it could be an edge from v zero 619 00:37:23,480 --> 00:37:28,750 to v3, it could be an edge from v zero to something else. 620 00:37:28,750 --> 00:37:31,980 We don't know which one's going to happen first. 621 00:37:31,980 --> 00:37:39,760 But the one thing I can claim is that v1 622 00:37:39,760 --> 00:37:46,610 will be visited before we finish visiting v zero. 623 00:37:52,124 --> 00:37:53,790 From v zero, we might go somewhere else, 624 00:37:53,790 --> 00:37:55,790 we might go somewhere else that might eventually 625 00:37:55,790 --> 00:37:58,130 lead to v1 by some other route, but in particular, we 626 00:37:58,130 --> 00:38:01,440 look at that edge from v zero to v1.
627 00:38:01,440 --> 00:38:03,730 And so, at some point, we're searching, 628 00:38:03,730 --> 00:38:06,580 we're visiting all the things reachable from v zero, that 629 00:38:06,580 --> 00:38:09,830 includes v1, and that will happen, 630 00:38:09,830 --> 00:38:11,950 we will touch v1 for the first time, 631 00:38:11,950 --> 00:38:13,800 because it hasn't been touched yet. 632 00:38:13,800 --> 00:38:17,932 We will visit it before we finish visiting v zero. 633 00:38:17,932 --> 00:38:21,660 The same goes actually for all of the vi's, because they're all 634 00:38:21,660 --> 00:38:23,510 reachable from v zero. 635 00:38:23,510 --> 00:38:25,760 You can prove this by induction. 636 00:38:25,760 --> 00:38:29,860 You'll have to visit v1 before you finish visiting v zero. 637 00:38:29,860 --> 00:38:32,480 You'll have to visit v2 before you finish visiting 638 00:38:32,480 --> 00:38:35,592 v1, although you might actually visit v2 before v1. 639 00:38:35,592 --> 00:38:37,050 You would definitely finish, you'll 640 00:38:37,050 --> 00:38:41,880 finish v2 before you finish v1, and so on. 641 00:38:41,880 --> 00:38:47,424 So vi will be visited before you finish vi minus 1, 642 00:38:47,424 --> 00:38:49,090 but in particular, what we care about is 643 00:38:49,090 --> 00:38:58,760 that vk is visited before we finish v zero. 644 00:39:02,040 --> 00:39:03,670 And it will be entirely visited. 645 00:39:03,670 --> 00:39:05,930 We will finish visiting vk before we 646 00:39:05,930 --> 00:39:07,570 finish visiting v zero. 647 00:39:07,570 --> 00:39:10,280 We will start vk after we start v zero, 648 00:39:10,280 --> 00:39:12,330 because v zero is first. 649 00:39:12,330 --> 00:39:16,580 So the order is going to look like, start v zero, 650 00:39:16,580 --> 00:39:20,940 at some point we will start vk. 651 00:39:20,940 --> 00:39:27,950 Then we'll finish vk, then we'll finish v zero.
652 00:39:27,950 --> 00:39:30,340 This is something the textbook likes to call, 653 00:39:30,340 --> 00:39:33,200 and I like to call, balanced parentheses. 654 00:39:33,200 --> 00:39:38,690 You can think of it as, we start v zero, then we start vk, 655 00:39:38,690 --> 00:39:42,390 then we finish vk, then we finish v zero. 656 00:39:42,390 --> 00:39:44,290 And these match up and they're balanced. 657 00:39:46,970 --> 00:39:48,720 Depth-first search always looks like that, 658 00:39:48,720 --> 00:39:50,630 because once you start a vertex, you 659 00:39:50,630 --> 00:39:53,060 keep chugging until you've visited all the things reachable 660 00:39:53,060 --> 00:39:54,460 from it. 661 00:39:54,460 --> 00:39:55,500 Then you finish it. 662 00:39:55,500 --> 00:39:57,560 You won't finish v zero before you finish vk, 663 00:39:57,560 --> 00:40:00,114 because it's part of the recursion. 664 00:40:00,114 --> 00:40:01,530 You can't return at a higher level 665 00:40:01,530 --> 00:40:04,942 before you return at the lower levels. 666 00:40:04,942 --> 00:40:06,400 So we've just argued that the order 667 00:40:06,400 --> 00:40:08,025 is like this, because v zero was first, 668 00:40:08,025 --> 00:40:11,600 so vk starts after v zero, and also we're going to finish vk 669 00:40:11,600 --> 00:40:14,550 before we finish v zero, because it's reachable, and hasn't 670 00:40:14,550 --> 00:40:17,000 been visited before. 671 00:40:17,000 --> 00:40:25,200 So, in here, we consider the edge from vk to v zero. 672 00:40:28,000 --> 00:40:32,070 When we consider that edge, it will be a back edge. 673 00:40:34,750 --> 00:40:35,710 Why? 674 00:40:35,710 --> 00:40:39,640 Because v zero is currently on the recursion stack, 675 00:40:39,640 --> 00:40:42,427 and so you will have marked v zero as currently in process. 676 00:40:42,427 --> 00:40:44,760 So when you look at that edge, you see it's a back edge, 677 00:40:44,760 --> 00:40:47,660 it's an edge to your ancestor. 678 00:40:47,660 --> 00:40:48,430 That's the proof.
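The proof above suggests a cycle detector: run the top-level DFS, and the first edge that points to a vertex still on the recursion stack is a back edge, which closes a cycle through tree edges. Here is a minimal sketch, assuming a dict-of-lists directed graph. The name find_cycle and the use of parent pointers to recover the cycle are illustrative choices, not the lecture's own code.

```python
def find_cycle(adj):
    """Return the vertices of one directed cycle in order, or None if acyclic."""
    parent = {}          # tree parent of each visited vertex; doubles as "visited"
    in_process = set()   # vertices currently on the recursion stack

    def visit(u):
        in_process.add(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                cycle = visit(v)
                if cycle:
                    return cycle
            elif v in in_process:
                # Back edge u -> v: walk tree edges from u up to ancestor v.
                cycle = [u]
                while cycle[-1] != v:
                    cycle.append(parent[cycle[-1]])
                cycle.reverse()      # now reads v, ..., u, and u -> v closes it
                return cycle
        in_process.discard(u)
        return None

    for s in adj:
        if s not in parent:
            parent[s] = None
            cycle = visit(s)
            if cycle:
                return cycle
    return None


# Hypothetical examples: the first graph contains the cycle b -> c -> b.
cyclic = {"a": ["b"], "b": ["c"], "c": ["d", "b"], "d": []}
acyclic = {"a": ["b", "c"], "b": ["c"], "c": []}
found = find_cycle(cyclic)
none_found = find_cycle(acyclic)
```

Each vertex is visited once and each edge examined once, so the sketch runs in time linear in the number of vertices plus edges.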
679 00:40:51,700 --> 00:40:52,790 Any questions about that? 680 00:40:55,490 --> 00:40:59,460 It's pretty easy once you set up the starting point, which 681 00:40:59,460 --> 00:41:01,470 is look at the first time you visit the cycle, 682 00:41:01,470 --> 00:41:03,732 then just think about how you walk around the cycle. 683 00:41:03,732 --> 00:41:05,940 There's lots of ways you might walk around the cycle, 684 00:41:05,940 --> 00:41:08,579 but it's guaranteed you'll visit vk at some point, 685 00:41:08,579 --> 00:41:10,870 then you'll look at the edge. v0 is still in the stack, 686 00:41:10,870 --> 00:41:12,730 so it's a back edge. 687 00:41:12,730 --> 00:41:14,575 And so this proves that having a cycle 688 00:41:14,575 --> 00:41:16,260 is equivalent to having a back edge. 689 00:41:16,260 --> 00:41:18,980 This gives you an easy linear time algorithm to tell, 690 00:41:18,980 --> 00:41:20,902 does my graph have a cycle? 691 00:41:20,902 --> 00:41:22,860 And if it does, it's actually easy to find one, 692 00:41:22,860 --> 00:41:26,102 because we find a back edge, just follow the tree edges, 693 00:41:26,102 --> 00:41:27,060 and you get your cycle. 694 00:41:29,564 --> 00:41:31,230 So if someone gives you a graph and says, 695 00:41:31,230 --> 00:41:34,350 hey, I think this is acyclic, you can very quickly say, 696 00:41:34,350 --> 00:41:36,590 no, it's not, here's a cycle, or say, 697 00:41:36,590 --> 00:41:40,490 yeah, I agree, no back edges, I only have tree, forward, 698 00:41:40,490 --> 00:41:41,611 and cross edges. 699 00:41:49,150 --> 00:41:50,545 OK, that was application 1. 700 00:41:56,610 --> 00:41:58,990 Application 2 is topological sort, 701 00:41:58,990 --> 00:42:02,790 which we're going to think about in the setting 702 00:42:02,790 --> 00:42:04,320 of a problem called job scheduling. 703 00:42:07,700 --> 00:42:14,860 So job scheduling, we are given a directed acyclic graph.
704 00:42:21,770 --> 00:42:39,090 I want to order the vertices so that all edges point 705 00:42:39,090 --> 00:42:46,090 from lower order to high order. 706 00:42:52,520 --> 00:42:54,405 Directed acyclic graph is called a DAG, 707 00:42:54,405 --> 00:42:59,830 you should know that from 042. 708 00:42:59,830 --> 00:43:02,790 And maybe I'll draw one for kicks. 709 00:43:32,030 --> 00:43:34,760 Now, I've drawn the graph so all the edges go left to right, 710 00:43:34,760 --> 00:43:37,110 so you can see that there's no cycles here, 711 00:43:37,110 --> 00:43:41,090 but generally you'd run DFS and you'd detect there's no cycles. 712 00:43:41,090 --> 00:43:43,170 And now, imagine these vertices represent 713 00:43:43,170 --> 00:43:45,746 things you need to do. 714 00:43:45,746 --> 00:43:49,080 The textbook has a funny example where you're getting dressed, 715 00:43:49,080 --> 00:43:50,820 so you have these constraints that say, 716 00:43:50,820 --> 00:43:53,579 well, I've got to put my socks on before put my shoes on. 717 00:43:53,579 --> 00:43:55,620 And then I've got to put my underwear on before I 718 00:43:55,620 --> 00:43:59,350 put my pants on, and all these kinds of things. 719 00:43:59,350 --> 00:44:01,460 You would code that as a directed acyclic graph. 720 00:44:01,460 --> 00:44:03,293 You hope there's no cycles, because then you 721 00:44:03,293 --> 00:44:05,100 can't get dressed. 722 00:44:05,100 --> 00:44:06,830 And there's some things, like, well, I 723 00:44:06,830 --> 00:44:09,050 could put my glasses on whenever, although actually I 724 00:44:09,050 --> 00:44:11,174 should put my glasses on before I do anything else, 725 00:44:11,174 --> 00:44:12,730 otherwise there's problems. 726 00:44:12,730 --> 00:44:14,980 I don't know, you could put your watch on at any time, 727 00:44:14,980 --> 00:44:17,110 unless you need to know what time is. 728 00:44:17,110 --> 00:44:20,287 So there's some disconnected parts, whatever. 
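The getting-dressed constraints can be written down as a small adjacency list, and the goal stated above (every edge points from an earlier position to a later one) becomes a short check. This is a sketch under assumptions: the graph holds only the constraints mentioned in the lecture (socks before shoes, underwear before pants, watch unconstrained), and the names dressing and respects_constraints are made up for illustration.

```python
def respects_constraints(adj, order):
    """True if every edge u -> v has u placed before v in the given order."""
    position = {task: i for i, task in enumerate(order)}
    return all(position[u] < position[v] for u in adj for v in adj[u])


# Only the constraints mentioned in the lecture; "watch" has no edges at all.
dressing = {
    "socks": ["shoes"],
    "underwear": ["pants"],
    "pants": [],
    "shoes": [],
    "watch": [],
}

good_order = ["watch", "socks", "underwear", "pants", "shoes"]
bad_order = ["shoes", "socks", "underwear", "pants", "watch"]
good_ok = respects_constraints(dressing, good_order)
bad_ok = respects_constraints(dressing, bad_order)
```

The checker only verifies a proposed order. Producing one is the job of the topological sort.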
729 00:44:20,287 --> 00:44:21,870 There's some unrelated things, like, I 730 00:44:21,870 --> 00:44:24,955 don't care about the order between my shirt and my pants 731 00:44:24,955 --> 00:44:28,780 or whatever, some things aren't constrained. 732 00:44:28,780 --> 00:44:31,760 What you'd like to do is choose an actual order to do things. 733 00:44:31,760 --> 00:44:33,275 Say you're a sequential being, you 734 00:44:33,275 --> 00:44:35,630 can only do one thing at a time, so I 735 00:44:35,630 --> 00:44:37,050 want to compute a total order. 736 00:44:37,050 --> 00:44:39,510 First I'll do g, then I'll do a, then 737 00:44:39,510 --> 00:44:42,900 I can do h, because I've done both of the predecessors. 738 00:44:42,900 --> 00:44:45,160 Then I can't do b, because I haven't done d, 739 00:44:45,160 --> 00:44:49,040 so maybe I'll do d first, and then b, and then e, then c, 740 00:44:49,040 --> 00:44:50,090 then f, then i. 741 00:44:50,090 --> 00:44:53,180 That would be a valid order, because all edges point 742 00:44:53,180 --> 00:44:55,580 from an earlier number to a later number. 743 00:44:55,580 --> 00:44:56,930 So that's the goal. 744 00:44:56,930 --> 00:44:59,300 And these are real job scheduling problems 745 00:44:59,300 --> 00:45:01,670 that come up, you'll see more applications 746 00:45:01,670 --> 00:45:04,710 in your problem set. 747 00:45:04,710 --> 00:45:07,199 How do we do this? 748 00:45:07,199 --> 00:45:08,990 Well, at this point we have two algorithms, 749 00:45:08,990 --> 00:45:10,880 and I pretty much revealed it is DFS. 750 00:45:10,880 --> 00:45:13,100 DFS will do this. 751 00:45:13,100 --> 00:45:16,650 It's a topological sort, is what this algorithm is usually 752 00:45:16,650 --> 00:45:17,150 called. 753 00:45:20,010 --> 00:45:23,280 Topological sort because you're given a graph, which 754 00:45:23,280 --> 00:45:25,070 you could think of as a topology. 755 00:45:25,070 --> 00:45:26,912 You want to sort it, in a certain sense.
756 00:45:26,912 --> 00:45:28,370 It's not like sorting numbers, it's 757 00:45:28,370 --> 00:45:32,370 sorting vertices in a graph, so, hence, topological sort. 758 00:45:32,370 --> 00:45:34,150 That's the name of the algorithm. 759 00:45:34,150 --> 00:45:46,250 And it's run DFS, and output the reverse 760 00:45:46,250 --> 00:45:55,192 of the finishing times of vertices. 761 00:45:55,192 --> 00:45:57,150 So this is another application where you really 762 00:45:57,150 --> 00:45:58,983 want to visit all the vertices in the graph, 763 00:45:58,983 --> 00:46:05,100 so we use this top level DFS, so everybody gets visited. 764 00:46:05,100 --> 00:46:07,350 And there are these finishing times, 765 00:46:07,350 --> 00:46:11,470 so every time I finish a vertex, I could add it to a list. 766 00:46:11,470 --> 00:46:13,294 Say OK, that one was finished next, 767 00:46:13,294 --> 00:46:15,460 then this one is finished, then this one's finished. 768 00:46:15,460 --> 00:46:18,320 I take that order and I reverse it. 769 00:46:18,320 --> 00:46:21,588 That will be a topological order. 770 00:46:21,588 --> 00:46:22,900 Why? 771 00:46:22,900 --> 00:46:24,190 Who knows. 772 00:46:24,190 --> 00:46:24,880 Let's prove it. 773 00:46:34,440 --> 00:46:38,610 We've actually done pretty much the hard work, which 774 00:46:38,610 --> 00:46:42,560 is to say-- we're assuming our graph has no cycles, 775 00:46:42,560 --> 00:46:46,150 so that tells us by this cycle detection 776 00:46:46,150 --> 00:46:47,410 that there are no back edges. 777 00:46:47,410 --> 00:46:49,780 Back edges are kind of the annoying part. 778 00:46:49,780 --> 00:46:51,500 Now they don't exist here. 779 00:46:51,500 --> 00:46:56,970 So all the edges are tree edges, forward edges, and cross edges, 780 00:46:56,970 --> 00:47:01,765 and we use that to prove the theorem. 781 00:47:05,020 --> 00:47:10,570 So we want to prove that all the edges point from an earlier 782 00:47:10,570 --> 00:47:12,170 number to a later number.
783 00:47:15,320 --> 00:47:17,080 So what that means is for an edge, 784 00:47:17,080 --> 00:47:22,830 uv, we want to show that v finishes before u. 785 00:47:32,010 --> 00:47:34,750 That's the reverse, because what we're taking 786 00:47:34,750 --> 00:47:38,610 is the reverse of the finishing order. 787 00:47:38,610 --> 00:47:41,790 So edge uv, I want to make sure v finishes first, 788 00:47:41,790 --> 00:47:43,595 so that u will be ordered first. 789 00:47:45,917 --> 00:47:47,000 Well, there are two cases. 790 00:47:51,290 --> 00:47:59,010 Case 1 is that u starts before v. Case 2 791 00:47:59,010 --> 00:48:01,460 is that v starts before u. 792 00:48:06,690 --> 00:48:08,220 At some point they start, because we 793 00:48:08,220 --> 00:48:09,136 visit the whole graph. 794 00:48:13,160 --> 00:48:16,400 This top loop guarantees that. 795 00:48:16,400 --> 00:48:21,440 So consider what order we visit them first, at the beginning, 796 00:48:21,440 --> 00:48:23,960 and then we'll think about how they finish. 797 00:48:23,960 --> 00:48:27,400 Well, this case is kind of something we've seen before. 798 00:48:27,400 --> 00:48:31,480 We visit u, we have not yet visited v, 799 00:48:31,480 --> 00:48:35,440 but v is reachable from u, so maybe via this edge, 800 00:48:35,440 --> 00:48:38,320 or maybe via some other path, we will eventually 801 00:48:38,320 --> 00:48:41,190 visit v in the recursion for u. 802 00:48:41,190 --> 00:48:48,950 So before u finishes, we will visit v, visit v 803 00:48:48,950 --> 00:48:53,070 before u finishes. 804 00:48:53,070 --> 00:48:58,560 That sentence is just like this sentence, 805 00:48:58,560 --> 00:48:59,849 so same kind of argument. 806 00:48:59,849 --> 00:49:01,640 We won't go into detail, because we already 807 00:49:01,640 --> 00:49:04,470 did that several times.
808 00:49:04,470 --> 00:49:07,710 So that means we'll visit v, we will completely visit v, 809 00:49:07,710 --> 00:49:10,040 we will finish v before we finish u 810 00:49:10,040 --> 00:49:12,100 and that's what we wanted to prove. 811 00:49:12,100 --> 00:49:14,580 So that case is good. 812 00:49:14,580 --> 00:49:18,820 The other case is that v starts before u. 813 00:49:18,820 --> 00:49:21,764 Here, you might get slightly worried. 814 00:49:21,764 --> 00:49:24,810 So we have an edge, uv, still, same direction. 815 00:49:24,810 --> 00:49:29,930 But now we start at v, u has not yet been visited. 816 00:49:29,930 --> 00:49:35,646 Well, now we worry that we visit u. 817 00:49:35,646 --> 00:49:38,510 If we visit u, we're going to finish u before we finish v, 818 00:49:38,510 --> 00:49:40,640 but we want it to be the other way around. 819 00:49:40,640 --> 00:49:43,096 Why can't that happen? 820 00:49:43,096 --> 00:49:44,013 AUDIENCE: [INAUDIBLE]. 821 00:49:44,013 --> 00:49:46,262 PROFESSOR: Because there's a back edge somewhere here. 822 00:49:46,262 --> 00:49:48,610 In particular, the graph would have to be cyclic. 823 00:49:48,610 --> 00:49:54,830 This is a cycle, so this can't happen, a contradiction. 824 00:49:54,830 --> 00:50:00,350 So v will finish before we visit u at all. 825 00:50:04,690 --> 00:50:07,830 So v will still finish first, because we don't even touch u, 826 00:50:07,830 --> 00:50:10,080 because there's no cycles. 827 00:50:10,080 --> 00:50:13,280 So that's actually the proof that topological sort gives you 828 00:50:13,280 --> 00:50:18,195 a valid job schedule, and it's kind of-- there 829 00:50:18,195 --> 00:50:21,200 are even more things you can do with DFS. 830 00:50:21,200 --> 00:50:24,520 We'll see some in recitations, more in the textbook. 831 00:50:24,520 --> 00:50:28,280 But simple algorithm, can do a lot of nifty things with it, 832 00:50:28,280 --> 00:50:30,930 very fast, linear time.
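The algorithm proved correct above (run the top-level DFS, append each vertex to a list when it finishes, then reverse the list) can be sketched as follows. The sketch assumes the input is a DAG given as a dict of out-neighbor lists and does not check for cycles. The name topological_sort and the example graph are illustrative.

```python
def topological_sort(adj):
    """Return the vertices of a DAG in reverse order of DFS finishing."""
    visited = set()
    finished = []        # vertices in the order their visits complete

    def visit(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
        finished.append(u)           # u finishes after everything reachable from it

    for s in adj:                    # top-level loop so every vertex gets visited
        if s not in visited:
            visit(s)
    finished.reverse()
    return finished


# Hypothetical DAG: a -> b, a -> c, b -> d, c -> d, e -> c.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["c"]}
order = topological_sort(dag)
```

With the dict iterated in insertion order, the vertices finish as d, b, c, a, e, so the returned order is e, a, c, b, d. A different iteration order could give a different but equally valid schedule.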