1 00:00:00,050 --> 00:00:01,770 The following content is provided 2 00:00:01,770 --> 00:00:04,010 under a Creative Commons license. 3 00:00:04,010 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 4 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,226 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,226 --> 00:00:17,851 at ocw.mit.edu. 8 00:00:21,730 --> 00:00:23,980 PROFESSOR: Today we're going to introduce graph search 9 00:00:23,980 --> 00:00:25,896 in general and talk about one algorithm, which 10 00:00:25,896 --> 00:00:29,340 is breadth-first search, and understand how in principle you 11 00:00:29,340 --> 00:00:33,180 can solve a puzzle like the Rubik's Cube. 12 00:00:33,180 --> 00:00:37,990 So before I get to Rubik's Cubes let me remind you 13 00:00:37,990 --> 00:00:41,942 of some basic stuff about graphs. 14 00:00:41,942 --> 00:00:50,380 Or I can tell you to start out with, graph search is 15 00:00:50,380 --> 00:00:52,905 about exploring a graph. 16 00:00:55,890 --> 00:00:58,980 And there's many different notions of exploring a graph. 17 00:00:58,980 --> 00:01:01,950 Maybe I give you some node in a graph, s, 18 00:01:01,950 --> 00:01:03,790 and some other node in a graph, t, 19 00:01:03,790 --> 00:01:06,940 and I'd like to find a path that's 20 00:01:06,940 --> 00:01:09,390 going to represent a problem like I give you 21 00:01:09,390 --> 00:01:13,140 a particular state of a Rubik's Cube and I want to know 22 00:01:13,140 --> 00:01:16,534 is there some path that gets me into a solved state? 23 00:01:16,534 --> 00:01:18,200 Do I really want to solve this on stage? 24 00:01:18,200 --> 00:01:19,230 What the hell? 25 00:01:19,230 --> 00:01:21,650 We started. 
26 00:01:21,650 --> 00:01:24,410 So this is a particularly easy state to solve, 27 00:01:24,410 --> 00:01:26,816 which is why I set up this way. 28 00:01:26,816 --> 00:01:27,940 All right, so there you go. 29 00:01:27,940 --> 00:01:31,510 Seven by seven by seven Rubik's Cube solved in 10 seconds. 30 00:01:31,510 --> 00:01:33,650 Amazing. 31 00:01:33,650 --> 00:01:36,030 New world record. 32 00:01:36,030 --> 00:01:39,327 So you're given some initial state of the Rubik's Cube. 33 00:01:39,327 --> 00:01:40,702 You're given the targets that you 34 00:01:40,702 --> 00:01:42,160 know what solved looks like. 35 00:01:42,160 --> 00:01:45,000 You want to find this path. 36 00:01:45,000 --> 00:01:47,887 Maybe you want to find all paths from s. 37 00:01:47,887 --> 00:01:49,720 Maybe you just want to explore all the nodes 38 00:01:49,720 --> 00:01:51,160 in a graph you can reach from s. 39 00:01:51,160 --> 00:01:53,460 Maybe you want to explore all the nodes in a graph or maybe 40 00:01:53,460 --> 00:01:54,501 all the edges in a graph. 41 00:01:54,501 --> 00:01:56,077 These are all exploration problems. 42 00:01:56,077 --> 00:01:57,910 They're all going to be solved by algorithms 43 00:01:57,910 --> 00:02:01,640 from this class and next class. 44 00:02:01,640 --> 00:02:04,350 So before we go further though, I 45 00:02:04,350 --> 00:02:07,740 should remind you what a graph is and sort 46 00:02:07,740 --> 00:02:12,230 of basic features of graphs that we're going to be using. 47 00:02:12,230 --> 00:02:15,390 This is also 6042 material so you should know it very well. 48 00:02:15,390 --> 00:02:17,120 If you don't, there's an appendix 49 00:02:17,120 --> 00:02:18,820 in the textbook about it. 50 00:02:18,820 --> 00:02:20,097 We have a set of vertices. 51 00:02:20,097 --> 00:02:21,055 We have a set of edges. 52 00:02:33,360 --> 00:02:41,460 Edges are either unordered pairs-- 53 00:02:41,460 --> 00:02:50,630 some sets of two items-- or ordered pairs. 
54 00:02:58,230 --> 00:03:02,270 In this case, we call the graph undirected. 55 00:03:02,270 --> 00:03:04,920 In this case, we call the graph directed. 56 00:03:04,920 --> 00:03:07,630 Usually, there's only one type. 57 00:03:07,630 --> 00:03:09,030 Either all the edges are directed 58 00:03:09,030 --> 00:03:11,690 or all the edges are undirected. 59 00:03:11,690 --> 00:03:13,440 There is a study of graphs that have both, 60 00:03:13,440 --> 00:03:15,397 but we are not doing that here. 61 00:03:18,720 --> 00:03:20,430 Some simple examples. 62 00:03:23,344 --> 00:03:24,010 Here is a graph. 63 00:03:29,570 --> 00:03:30,795 This is an undirected graph. 64 00:03:42,820 --> 00:03:44,070 This is a directed graph. 65 00:03:48,490 --> 00:03:50,115 The set of vertices here is a, b, c, d. 66 00:03:50,115 --> 00:03:52,525 The set of vertices here is a, b, c. 67 00:03:52,525 --> 00:03:56,990 The set of edges here is-- E is going 68 00:03:56,990 --> 00:04:05,550 to be things like a, b; b, c; c, d-- I think you get the idea. 69 00:04:09,780 --> 00:04:12,750 Just for completeness, V is a, b, c, d. 70 00:04:12,750 --> 00:04:14,640 Just so you remember notations and so on. 71 00:04:17,214 --> 00:04:19,589 One of the issues we're going to talk about in this class 72 00:04:19,589 --> 00:04:23,562 is how do you represent a graph like this for an algorithm? 73 00:04:23,562 --> 00:04:25,770 So it's all fine to say, oh, this is a set of things. 74 00:04:25,770 --> 00:04:27,070 This is a set of things. 75 00:04:27,070 --> 00:04:28,490 An obvious representation is, you 76 00:04:28,490 --> 00:04:31,120 have a list or an array of vertices. 77 00:04:31,120 --> 00:04:32,520 You have an array of edges. 78 00:04:32,520 --> 00:04:34,320 Each edge knows its two end points. 
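As a rough sketch of the edge-list representation just described, here is one way it might look in Python. The edge set is an assumption for illustration (the lecture only lists some of the edges of the undirected example), and `neighbors_from_edge_list` is a hypothetical helper name. The point is that answering "who are the neighbors of this vertex?" means scanning every edge.

```python
# Edge-list representation: a list of vertices plus a list of edges,
# where each edge records its two endpoints.
# The exact edge set below is an assumption for illustration.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def neighbors_from_edge_list(edges, u):
    """Return the neighbors of u in an undirected graph by scanning all edges."""
    result = []
    for x, y in edges:  # touches every edge, so time is proportional to |E|
        if x == u:
            result.append(y)
        elif y == u:
            result.append(x)
    return result


print(neighbors_from_edge_list(edges, "a"))  # ['b', 'c']
```

Even for a vertex with only two neighbors, the loop still visits all of the edges.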
79 00:04:34,320 --> 00:04:37,650 That would be a horrible representation for a graph 80 00:04:37,650 --> 00:04:39,920 because if you're, I don't know, at vertex a, 81 00:04:39,920 --> 00:04:42,340 and you want to know, well what are the neighbors of a? 82 00:04:42,340 --> 00:04:43,260 b and c. 83 00:04:43,260 --> 00:04:45,290 You'd have to go through the entire edge list 84 00:04:45,290 --> 00:04:47,380 to figure out the neighbors of a. 85 00:04:47,380 --> 00:04:50,424 So you'd spend linear time just to know where you can go from a. 86 00:04:50,424 --> 00:04:52,340 So we're not going to use that representation. 87 00:04:52,340 --> 00:04:55,430 We're going to use some better representations. 88 00:04:55,430 --> 00:04:57,160 Something called an adjacency list. 89 00:05:01,560 --> 00:05:07,970 Over here, you've got things like a, c; b, c; and c, b. 90 00:05:07,970 --> 00:05:10,716 So you can have edges in both directions. 91 00:05:10,716 --> 00:05:11,620 What am I missing? 92 00:05:11,620 --> 00:05:12,120 b, a. 93 00:05:14,892 --> 00:05:17,480 So that's E, in that case. 94 00:05:22,180 --> 00:05:25,484 There are a whole lot of applications of graph search. 95 00:05:25,484 --> 00:05:28,430 I'll make you a little list to talk about a few of them. 96 00:05:32,270 --> 00:05:35,095 So we've got web crawling. 97 00:05:39,010 --> 00:05:39,605 You're Google. 98 00:05:39,605 --> 00:05:42,100 You want to find all the pages on the web. 99 00:05:42,100 --> 00:05:45,050 Most people don't just tell you, hey, I've got a new page, 100 00:05:45,050 --> 00:05:46,450 please index it. 101 00:05:46,450 --> 00:05:48,274 You have to just keep following links-- 102 00:05:48,274 --> 00:05:49,690 in the early days of the web, this 103 00:05:49,690 --> 00:05:51,920 was a big deal-- following links, finding 104 00:05:51,920 --> 00:05:53,659 everything that's out there. 
105 00:05:53,659 --> 00:05:56,200 It's a little bit of an issue because if you define it wrong, 106 00:05:56,200 --> 00:05:59,690 the internet is infinite because of all those dynamically 107 00:05:59,690 --> 00:06:00,600 generated pages. 108 00:06:00,600 --> 00:06:03,330 But to deal with that, Google goes 109 00:06:03,330 --> 00:06:05,350 sort of breadth-first for the most part. 110 00:06:05,350 --> 00:06:09,710 It's prioritized. You want to see all the things you 111 00:06:09,710 --> 00:06:15,440 can reach from pages you already have and keep going. 112 00:06:15,440 --> 00:06:19,130 At some point, you give up when you run out of time. 113 00:06:19,130 --> 00:06:21,130 Social networking. 114 00:06:21,130 --> 00:06:22,000 You're on Facebook. 115 00:06:22,000 --> 00:06:23,500 You use Friend Finder. 116 00:06:23,500 --> 00:06:26,260 It tries to find the friends that are nearest to you. 117 00:06:26,260 --> 00:06:31,140 Or friends of friends is sort of a level-two search. 118 00:06:31,140 --> 00:06:33,340 That's essentially a graph search problem. 119 00:06:33,340 --> 00:06:37,130 You want to know what's two levels or three 120 00:06:37,130 --> 00:06:39,180 levels of separation from you. 121 00:06:39,180 --> 00:06:43,529 And then you loop over those and look for other signs 122 00:06:43,529 --> 00:06:44,820 that you might be good friends. 123 00:06:49,560 --> 00:06:54,460 You are on a network like the internet or some intranet. 124 00:06:54,460 --> 00:06:56,110 You want to broadcast a message. 125 00:06:56,110 --> 00:06:57,340 So here's you. 126 00:06:57,340 --> 00:06:59,350 You want to send data out. 127 00:06:59,350 --> 00:07:01,830 That's essentially a graph exploration problem. 128 00:07:01,830 --> 00:07:04,850 That message, that packet, is going to explore the graph. 129 00:07:09,460 --> 00:07:10,325 Garbage collection. 130 00:07:14,340 --> 00:07:17,369 I hope you all know that modern languages have 131 00:07:17,369 --> 00:07:18,160 garbage collection. 
132 00:07:18,160 --> 00:07:21,740 This is why you don't have to worry about freeing things. 133 00:07:21,740 --> 00:07:23,800 Even in Python-- even in CPython, 134 00:07:23,800 --> 00:07:27,600 I learned-- there is a garbage collector as of version two. 135 00:07:27,600 --> 00:07:32,070 But also in PyPy, and JPython and in Java-- pretty much 136 00:07:32,070 --> 00:07:36,730 every fairly modern language you have garbage collection. 137 00:07:36,730 --> 00:07:40,900 Meaning, if there's some data that's unreachable from-- So 138 00:07:40,900 --> 00:07:43,820 you have your variables. 139 00:07:43,820 --> 00:07:45,890 Variables that can be accessed by the program. 140 00:07:45,890 --> 00:07:48,410 Everything that's reachable from there you have to keep. 141 00:07:48,410 --> 00:07:52,020 But if some data structure becomes no longer reachable, 142 00:07:52,020 --> 00:07:57,895 you can throw it away and regain memory. 143 00:07:57,895 --> 00:08:00,020 So that's happening behind the scenes all the time, 144 00:08:00,020 --> 00:08:01,750 and the way it's being done is with 145 00:08:01,750 --> 00:08:03,170 a breadth-first search, which 146 00:08:03,170 --> 00:08:04,990 is what we're going to talk about today. 147 00:08:07,620 --> 00:08:08,450 Another one. 148 00:08:08,450 --> 00:08:09,095 Model checking. 149 00:08:14,890 --> 00:08:21,250 Model checking is-- you have some finite model of either 150 00:08:21,250 --> 00:08:24,205 a piece of code, or a circuit, or chip, whatever, 151 00:08:24,205 --> 00:08:26,220 and you want to prove that it actually 152 00:08:26,220 --> 00:08:27,620 does what you think it does. 153 00:08:27,620 --> 00:08:29,530 And so you've drawn a graph. 154 00:08:29,530 --> 00:08:31,790 The graph is all the possible states 155 00:08:31,790 --> 00:08:36,000 that your circuit or your computer program could reach, 156 00:08:36,000 --> 00:08:38,393 or that it could possibly have. 
157 00:08:38,393 --> 00:08:40,059 You start in some initial state, and you 158 00:08:40,059 --> 00:08:42,267 want to know among all the states that you can reach, 159 00:08:42,267 --> 00:08:43,400 does it have some property. 160 00:08:43,400 --> 00:08:46,050 And so you need to visit all the vertices that 161 00:08:46,050 --> 00:08:48,500 are reachable from a particular place. 162 00:08:48,500 --> 00:08:53,720 And usually people do that using breadth-first search. 163 00:08:53,720 --> 00:08:55,390 I use breadth-first search a lot, 164 00:08:55,390 --> 00:08:59,860 myself, to check mathematical conjectures. 165 00:08:59,860 --> 00:09:06,270 So if you're a mathematician, and you think something 166 00:09:06,270 --> 00:09:07,722 is true. 167 00:09:07,722 --> 00:09:11,600 Like maybe-- It's hard to give an example of that. 168 00:09:11,600 --> 00:09:15,690 But you can imagine some graph of all the possible inputs 169 00:09:15,690 --> 00:09:17,700 to that theorem, and you need to check 170 00:09:17,700 --> 00:09:20,379 them for every possible input-- If this is true-- 171 00:09:20,379 --> 00:09:22,170 the typical way to do that is breadth-first 172 00:09:22,170 --> 00:09:27,100 searching through that entire graph of states. 173 00:09:27,100 --> 00:09:29,694 Usually, we're testing finite, special cases 174 00:09:29,694 --> 00:09:32,110 of a general conjecture, but if we find a counter-example, 175 00:09:32,110 --> 00:09:32,900 we're done. 176 00:09:32,900 --> 00:09:34,591 Don't have to work on it anymore. 177 00:09:34,591 --> 00:09:36,590 If we don't find a counter-example, usually then 178 00:09:36,590 --> 00:09:38,270 we have to do the mathematics. 179 00:09:38,270 --> 00:09:42,695 It doesn't solve everything, but it's helpful. 
180 00:09:47,600 --> 00:09:52,430 And then, the fun thing we're going 181 00:09:52,430 --> 00:09:54,070 to talk about a little bit today, 182 00:09:54,070 --> 00:09:55,903 is if you want to solve something like a two 183 00:09:55,903 --> 00:09:57,939 by two by two Rubik's Cube optimally, 184 00:09:57,939 --> 00:09:59,730 you can do that using breadth-first search. 185 00:09:59,730 --> 00:10:02,935 And you're going to do that on your problem set. 186 00:10:02,935 --> 00:10:04,850 To do it solving this one optimally 187 00:10:04,850 --> 00:10:09,560 using breadth-first search would probably-- would definitely-- 188 00:10:09,560 --> 00:10:12,330 take more than the lifetime of the universe. 189 00:10:12,330 --> 00:10:14,780 So don't try seven by seven by seven. 190 00:10:17,610 --> 00:10:21,180 Leave that to the cubing experts, I guess. 191 00:10:21,180 --> 00:10:23,610 I think no one will ever solve a seven by seven by seven 192 00:10:23,610 --> 00:10:26,150 Rubik's Cube optimally. 193 00:10:26,150 --> 00:10:30,190 There are ways to find a solution just not the best one. 194 00:10:30,190 --> 00:10:33,395 So let me tell you just for fun, as an example. 195 00:10:36,930 --> 00:10:41,530 This Pocket Cube, which is a two by two by two Rubik's Cube. 196 00:10:41,530 --> 00:10:45,780 What we have in mind is called the configuration graph 197 00:10:45,780 --> 00:10:48,480 or sometimes configuration space. 198 00:10:48,480 --> 00:10:50,701 But it's a graph, so we'll call it a graph. 199 00:10:54,040 --> 00:11:01,465 This graph has a vertex for each possible state of the cube. 200 00:11:10,340 --> 00:11:12,535 So this is a state. 201 00:11:15,140 --> 00:11:16,820 This is a state. 202 00:11:16,820 --> 00:11:17,640 This is a state. 203 00:11:17,640 --> 00:11:19,240 This is a state. 204 00:11:19,240 --> 00:11:21,790 Now I'm hopelessly lost. 205 00:11:21,790 --> 00:11:23,540 Anyone want to work on this? 206 00:11:23,540 --> 00:11:25,802 Bored? 
207 00:11:25,802 --> 00:11:26,560 No one? 208 00:11:26,560 --> 00:11:28,510 Alright, I'll leave it unsolved then. 209 00:11:31,040 --> 00:11:32,522 So all those are vertices. 210 00:11:32,522 --> 00:11:33,980 There's actually a lot of vertices. 211 00:11:33,980 --> 00:11:38,690 There are 264 million vertices or so. 212 00:11:38,690 --> 00:11:39,310 If you want. 213 00:11:39,310 --> 00:11:41,560 To the side here. 214 00:11:41,560 --> 00:11:49,300 Number of vertices is something like 8 factorial times 3 215 00:11:49,300 --> 00:11:51,950 to the 8. 216 00:11:51,950 --> 00:11:57,637 And one way to see that is to draw a two by two 217 00:11:57,637 --> 00:11:58,470 by two Rubik's Cube. 218 00:12:01,458 --> 00:12:12,890 So these are what you might call cubelets, 219 00:12:12,890 --> 00:12:16,370 or cubies I think is the standard term in Rubik's Cube 220 00:12:16,370 --> 00:12:16,870 land. 221 00:12:21,220 --> 00:12:23,125 There's eight of them in a two by two by two. 222 00:12:23,125 --> 00:12:24,750 Two cubed. 223 00:12:24,750 --> 00:12:28,452 You can essentially permute those cubies within the cube 224 00:12:28,452 --> 00:12:29,160 however you like. 225 00:12:29,160 --> 00:12:31,020 That's 8 factorial. 226 00:12:31,020 --> 00:12:33,234 And then each of them has three possible twists. 227 00:12:33,234 --> 00:12:34,150 It could be like this. 228 00:12:34,150 --> 00:12:35,180 It could be like this. 229 00:12:35,180 --> 00:12:37,700 Or it could be like this. 230 00:12:37,700 --> 00:12:39,356 So you've got three for each. 231 00:12:39,356 --> 00:12:40,980 And this is actually an accurate count. 232 00:12:40,980 --> 00:12:43,229 You're not over-counting the number of configurations. 233 00:12:43,229 --> 00:12:45,760 All of those are, at least in principle, conceivable. 234 00:12:45,760 --> 00:12:48,060 If you take apart the cube, you can reassemble it 235 00:12:48,060 --> 00:12:49,590 in each of those states. 
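The count just described, 8 factorial placements times 3 to the 8 twists, can be checked with a few lines of Python. This is only arithmetic on the numbers from the lecture; the variable names are made up for illustration.

```python
import math

cubies = 8                             # a 2x2x2 cube has 2**3 = 8 cubies
placements = math.factorial(cubies)    # 8! ways to permute the cubies
twists = 3 ** cubies                   # 3 possible twists for each cubie
total_states = placements * twists     # every reassembled configuration

print(placements)    # 40320
print(twists)        # 6561
print(total_states)  # 264539520
```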
236 00:12:49,590 --> 00:12:53,930 And that number is about 264 million. 237 00:12:57,720 --> 00:13:00,000 Which is not so bad for computers. 238 00:13:00,000 --> 00:13:01,764 You could search that. 239 00:13:01,764 --> 00:13:02,930 Life is a little bit easier. 240 00:13:02,930 --> 00:13:04,940 You get to divide by 24 because there's 241 00:13:04,940 --> 00:13:06,800 24 symmetries of the cube. 242 00:13:06,800 --> 00:13:08,190 Eight times three. 243 00:13:08,190 --> 00:13:12,329 You can divide by three, also, because only a third 244 00:13:12,329 --> 00:13:14,370 of the configuration space is actually reachable. 245 00:13:14,370 --> 00:13:16,070 If you're not allowed to take the parts apart, 246 00:13:16,070 --> 00:13:17,720 if you have to get there by a motion, 247 00:13:17,720 --> 00:13:21,096 you can only get to 1/3 of the two by two by two. 248 00:13:21,096 --> 00:13:22,720 So it's a little bit smaller than that, 249 00:13:22,720 --> 00:13:24,550 if you're actually doing a breadth-first search, which 250 00:13:24,550 --> 00:13:26,758 is what you're going to be doing on your problem set. 251 00:13:26,758 --> 00:13:29,350 But in any case, it's feasible. 252 00:13:29,350 --> 00:13:30,370 That was vertices. 253 00:13:30,370 --> 00:13:31,575 We should talk about edges. 254 00:13:42,240 --> 00:13:47,570 For every move-- every move takes you 255 00:13:47,570 --> 00:13:49,870 from one configuration to another. 256 00:13:49,870 --> 00:13:52,960 You could traverse it in one direction and make that move. 257 00:13:52,960 --> 00:13:54,210 You could also undo that move. 258 00:13:54,210 --> 00:13:57,230 Because every move is undoable in a Rubik's Cube, 259 00:13:57,230 --> 00:13:58,940 this graph is undirected. 260 00:13:58,940 --> 00:14:02,570 Or you can think of it as every edge works in both directions. 261 00:14:02,570 --> 00:14:03,610 So this is a move. 262 00:14:03,610 --> 00:14:05,830 It's called a quarter twist. 
263 00:14:05,830 --> 00:14:07,640 This is a controversy if you will. 264 00:14:07,640 --> 00:14:10,430 Some people allow a whole half twist as a single move. 265 00:14:10,430 --> 00:14:13,030 Whether you define that as a single move or a double move 266 00:14:13,030 --> 00:14:14,380 is not that big a deal. 267 00:14:14,380 --> 00:14:17,920 It just changes some of the answers. 268 00:14:17,920 --> 00:14:20,970 But you're still exploring essentially the same graph. 269 00:14:23,454 --> 00:14:24,870 So that's the graph and you'd like 270 00:14:24,870 --> 00:14:26,244 to know some properties about it. 271 00:14:26,244 --> 00:14:28,580 So let me draw a picture of the graph. 272 00:14:28,580 --> 00:14:31,560 I'm not going to draw all 264 million vertices. 273 00:14:31,560 --> 00:14:34,960 But in particular, there's the solved state-- 274 00:14:34,960 --> 00:14:38,100 we kind of care about that one, where all the colors are 275 00:14:38,100 --> 00:14:44,390 aligned-- then there's all of the configurations you could 276 00:14:44,390 --> 00:14:46,040 reach by one move. 277 00:14:46,040 --> 00:14:49,950 So these are the possible moves from the solved state. 278 00:14:52,950 --> 00:14:55,400 And then from those configurations, 279 00:14:55,400 --> 00:14:57,970 there's more places you can go. 280 00:14:57,970 --> 00:15:00,448 Maybe there's multiple ways to get to the same node. 281 00:15:03,320 --> 00:15:05,250 But these would be all the configurations 282 00:15:05,250 --> 00:15:07,190 you can reach in two moves. 283 00:15:15,170 --> 00:15:16,940 And so on. 284 00:15:16,940 --> 00:15:19,100 And at some point, you run out of graph. 285 00:15:19,100 --> 00:15:26,250 So there might be a few nodes out here. 286 00:15:26,250 --> 00:15:28,580 The way I'm drawing this, this is everything 287 00:15:28,580 --> 00:15:31,240 you can reach in one move, in two moves, in three moves. 
288 00:15:31,240 --> 00:15:35,010 At the end, this would be 11 moves, 289 00:15:35,010 --> 00:15:37,760 if you allow half twists. 290 00:15:37,760 --> 00:15:41,470 And as puzzlers, we're particularly 291 00:15:41,470 --> 00:15:47,020 interested in this number, which you would call, as a graph 292 00:15:47,020 --> 00:15:50,760 theorist, the diameter of the graph. 293 00:15:50,760 --> 00:15:53,200 Puzzlers call it God's number. 294 00:15:53,200 --> 00:15:59,020 If you were God or some omni-- something being. 295 00:15:59,020 --> 00:16:01,860 You have the optimal algorithm for solving the Rubik's Cube. 296 00:16:01,860 --> 00:16:04,220 How many moves do you need if you always 297 00:16:04,220 --> 00:16:06,100 follow the best path? 298 00:16:06,100 --> 00:16:08,650 And the answer is, in the worst case, 11. 299 00:16:08,650 --> 00:16:14,130 So we're interested in the worst case of the best algorithm. 300 00:16:14,130 --> 00:16:16,880 For two by two by two, the answer is 11. 301 00:16:16,880 --> 00:16:20,050 For three by three by three, the answer is 20. 302 00:16:20,050 --> 00:16:23,070 That was just proved last summer with a couple 303 00:16:23,070 --> 00:16:24,650 years of computer time. 304 00:16:24,650 --> 00:16:26,734 For four by four by four-- I don't have one here-- 305 00:16:26,734 --> 00:16:28,233 I think we'll never know the answer. 306 00:16:28,233 --> 00:16:30,525 For five by five by five, we'll never know the answer. 307 00:16:30,525 --> 00:16:33,820 For six, for seven, same deal. 308 00:16:33,820 --> 00:16:36,330 But for two by two by two, you can compute it. 309 00:16:36,330 --> 00:16:38,180 You will compute it on your problem set. 310 00:16:38,180 --> 00:16:42,010 And it's kind of nice to know because it says whatever 311 00:16:42,010 --> 00:16:46,800 configuration I'm in, I can solve it in 11 moves. 
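The picture drawn above, with everything reachable in one move, then two moves, and so on until you run out of graph, can be sketched in Python on a toy graph. This is only an illustration under stated assumptions: the six-vertex cycle and the names `bfs_layers` and `toy_adj` are hypothetical stand-ins for the cube's configuration graph, not the problem-set solution. The index of the last non-empty layer is the largest distance from the start vertex, which is the number being discussed when the start is the solved state.

```python
def bfs_layers(adj, start):
    """Return a list of layers; layers[i] holds the vertices exactly i moves from start."""
    layers = [[start]]
    seen = {start}
    while True:
        next_layer = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in seen:  # several paths may lead to the same vertex
                    seen.add(v)
                    next_layer.append(v)
        if not next_layer:  # we ran out of graph
            return layers
        layers.append(next_layer)


# Hypothetical toy graph: a cycle on six vertices, 0-1-2-3-4-5-0.
toy_adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

layers = bfs_layers(toy_adj, 0)
print(layers)           # [[0], [5, 1], [4, 2], [3]]
print(len(layers) - 1)  # 3
```

On the toy cycle the farthest vertex is three steps away; the same loop on the two by two by two graph would just have many more, and much larger, layers.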
312 00:16:46,800 --> 00:16:49,440 But the best known way to compute it, 313 00:16:49,440 --> 00:16:54,220 is basically to construct this graph one layer at a time 314 00:16:54,220 --> 00:16:55,470 until you're done. 315 00:16:55,470 --> 00:16:57,310 And then you know what the diameter is. 316 00:16:57,310 --> 00:17:00,910 The trouble is, in between here this grows exponentially. 317 00:17:00,910 --> 00:17:03,180 At some point, it decreases a little bit. 318 00:17:03,180 --> 00:17:04,970 But getting over that exponential hump 319 00:17:04,970 --> 00:17:06,564 is really hard. 320 00:17:06,564 --> 00:17:08,980 And for three by three by three, they used a lot of tricks 321 00:17:08,980 --> 00:17:13,490 to speed up the algorithm, but in the end 322 00:17:13,490 --> 00:17:15,760 it's essentially a breadth-first search. 323 00:17:15,760 --> 00:17:17,359 What's a breadth-first search? 324 00:17:17,359 --> 00:17:19,060 This going layer by layer. 325 00:17:19,060 --> 00:17:22,430 So we're going to formalize that in a moment. 326 00:17:22,430 --> 00:17:25,089 But that is the problem. 327 00:17:25,089 --> 00:17:37,920 So just for fun, any guesses what the right answer 328 00:17:37,920 --> 00:17:40,020 is for an n by n by n Rubik's cube? 329 00:17:40,020 --> 00:17:41,880 What's the diameter? 330 00:17:41,880 --> 00:17:43,380 Not an exact answer, because I think 331 00:17:43,380 --> 00:17:44,850 we'll never know the exact answer. 332 00:17:44,850 --> 00:17:48,558 But if I want theta something, what 333 00:17:48,558 --> 00:17:50,280 do you think the something is? 334 00:17:54,649 --> 00:17:56,732 How many people here have solved the Rubik's Cube? 335 00:17:56,732 --> 00:17:58,130 Ever? 336 00:17:58,130 --> 00:18:00,510 So you know what we're talking about here. 337 00:18:00,510 --> 00:18:04,390 Most people have worked on it. 338 00:18:04,390 --> 00:18:08,280 To think about an n by n by n Rubik's Cube, 339 00:18:08,280 --> 00:18:11,420 each side has area n squared. 
340 00:18:11,420 --> 00:18:14,210 So total surface area is 6 n squared. 341 00:18:14,210 --> 00:18:18,718 So there's, roughly, theta n squared little cubies here. 342 00:18:18,718 --> 00:18:21,301 So what do you think the right [INAUDIBLE] is for n by n by n? 343 00:18:26,702 --> 00:18:27,480 No guesses? 344 00:18:32,095 --> 00:18:33,450 AUDIENCE: n cubed? 345 00:18:33,450 --> 00:18:34,900 PROFESSOR: n cubed? 346 00:18:34,900 --> 00:18:36,540 Reasonable guess. 347 00:18:36,540 --> 00:18:38,045 But wrong. 348 00:18:38,045 --> 00:18:39,690 It's an upper bound. 349 00:18:39,690 --> 00:18:40,890 Why n cubed? 350 00:18:43,432 --> 00:18:44,348 AUDIENCE: [INAUDIBLE]. 351 00:18:48,834 --> 00:18:51,000 PROFESSOR: Oh, you're guessing based on the numbers. 352 00:18:51,000 --> 00:18:51,499 Yeah. 353 00:18:51,499 --> 00:18:53,500 The numbers are misleading, unfortunately. 354 00:18:53,500 --> 00:18:56,466 It's the law of small numbers I guess. 355 00:18:56,466 --> 00:18:59,300 It doesn't really look right. 356 00:18:59,300 --> 00:19:00,636 I know the answer. 357 00:19:00,636 --> 00:19:02,010 I know the answer because we just 358 00:19:02,010 --> 00:19:03,259 wrote a paper with the answer. 359 00:19:03,259 --> 00:19:04,490 This is a new result. 360 00:19:04,490 --> 00:19:05,940 From this summer. 361 00:19:05,940 --> 00:19:08,330 But I'm curious. 362 00:19:08,330 --> 00:19:10,410 To me the obvious answer is n squared 363 00:19:10,410 --> 00:19:12,700 because there's about n squared cubies. 364 00:19:12,700 --> 00:19:15,340 And it's not so hard to show in a constant number of moves 365 00:19:15,340 --> 00:19:19,011 you can solve a constant number of cubies. 
366 00:19:19,011 --> 00:19:20,760 If you think about the general algorithms, 367 00:19:20,760 --> 00:19:22,676 like if you've ever looked up professor's cube 368 00:19:22,676 --> 00:19:25,200 and how to solve it, you're doing like 10 moves, 369 00:19:25,200 --> 00:19:27,720 and then maybe you swap two cubies 370 00:19:27,720 --> 00:19:30,530 which you can use to solve a couple of cubies 371 00:19:30,530 --> 00:19:31,910 in a constant number of moves. 372 00:19:31,910 --> 00:19:36,400 So n squared would be the standard answer 373 00:19:36,400 --> 00:19:38,600 if you're following standard algorithms. 374 00:19:38,600 --> 00:19:41,225 But it turns out, you can do a little bit better. 375 00:19:41,225 --> 00:19:43,350 And the right answer is n squared divided by log n. 376 00:19:43,350 --> 00:19:45,050 I think it's cool. 377 00:19:45,050 --> 00:19:46,814 Hopefully, you guys can appreciate that. 378 00:19:46,814 --> 00:19:48,980 Not a lot of people can appreciate n squared divided 379 00:19:48,980 --> 00:19:52,365 by log n, but here in algorithms, we're all about n 380 00:19:52,365 --> 00:19:53,290 squared over log n. 381 00:19:57,770 --> 00:20:00,420 If you're interested, the paper's on my website. 382 00:20:00,420 --> 00:20:03,590 I think it's called, Algorithms For Solving Rubik's Cubes. 383 00:20:03,590 --> 00:20:05,110 There's a constant there. 384 00:20:05,110 --> 00:20:06,980 Current constant is not so good. 385 00:20:06,980 --> 00:20:08,320 Let's say it's in the millions. 386 00:20:08,320 --> 00:20:11,760 [LAUGHTER] 387 00:20:11,760 --> 00:20:13,010 You've got to start somewhere. 388 00:20:15,754 --> 00:20:17,420 The next open problem will be to improve 389 00:20:17,420 --> 00:20:19,170 that constant to something reasonable that 390 00:20:19,170 --> 00:20:20,720 maybe is close to 20. 391 00:20:20,720 --> 00:20:25,250 But we're far from that. 392 00:20:25,250 --> 00:20:27,145 Let's talk about graph representation. 
393 00:20:31,289 --> 00:20:33,080 Before we can talk about exploring a graph, 394 00:20:33,080 --> 00:20:36,680 we need to know what we're given as input. 395 00:20:36,680 --> 00:20:39,950 And there's basically one standard representation 396 00:20:39,950 --> 00:20:43,510 and a bunch of variations of it. 397 00:20:43,510 --> 00:20:45,125 And they're called adjacency lists. 398 00:20:48,090 --> 00:20:49,720 So the idea with an adjacency list, 399 00:20:49,720 --> 00:20:58,436 is you have an array called Adj, for adjacency 400 00:20:58,436 --> 00:21:02,290 of size V. Each element in the array 401 00:21:02,290 --> 00:21:03,735 is a pointer to a linked list. 402 00:21:07,610 --> 00:21:12,260 And the idea is that this array is indexed by a vertex. 403 00:21:18,960 --> 00:21:21,170 So we're imagining a world where we 404 00:21:21,170 --> 00:21:23,500 can index arrays by vertices. 405 00:21:23,500 --> 00:21:25,960 So maybe, you just label your vertices 406 00:21:25,960 --> 00:21:27,730 zero through v minus 1. 407 00:21:27,730 --> 00:21:29,760 Then that's a regular array. 408 00:21:29,760 --> 00:21:31,802 Or, if you want to get fancy, you 409 00:21:31,802 --> 00:21:35,200 can think of a vertex as an arbitrary hashable thing, 410 00:21:35,200 --> 00:21:37,700 and Adj is actually a hash table. 411 00:21:37,700 --> 00:21:39,810 And that's how you probably do it in Python. 412 00:21:39,810 --> 00:21:42,910 Maybe your vertices are objects, and this is just 413 00:21:42,910 --> 00:21:44,764 hashing based on the address of the object. 414 00:21:44,764 --> 00:21:46,430 But we're not going to worry about that. 415 00:21:46,430 --> 00:21:48,370 We're just going to write Adj of u. 416 00:21:48,370 --> 00:21:50,560 Assume that somehow you can get to the linked list 417 00:21:50,560 --> 00:21:51,768 corresponding to that vertex. 
418 00:22:00,680 --> 00:22:02,420 And the idea is, for every vertex 419 00:22:02,420 --> 00:22:06,260 we just store its neighbors, namely 420 00:22:06,260 --> 00:22:10,220 the vertices you can reach by one step from u. 421 00:22:10,220 --> 00:22:13,150 So I'm going to define that a little more formally. 422 00:22:13,150 --> 00:22:17,080 Adj of u is going to be the set of all vertices, 423 00:22:17,080 --> 00:22:22,430 v, such that (u, v) is an edge. 424 00:22:31,320 --> 00:22:35,990 So if I have a vertex like b, Adj of b 425 00:22:35,990 --> 00:22:38,685 is going to be both a and c because in one step 426 00:22:38,685 --> 00:22:42,010 there are outgoing edges from b to a and b to c. 427 00:22:42,010 --> 00:22:44,730 So Adj of b is a, c. 428 00:22:52,260 --> 00:22:53,530 In that graph. 429 00:22:53,530 --> 00:22:56,620 I should have labeled the vertices something different. 430 00:22:56,620 --> 00:23:02,310 Adj of a is going to be just c because you 431 00:23:02,310 --> 00:23:05,145 can't get with one step from a to b. 432 00:23:05,145 --> 00:23:08,080 The edge is in the wrong direction. 433 00:23:08,080 --> 00:23:13,240 And Adj of c is b. 434 00:23:17,480 --> 00:23:19,260 I think that definition's pretty clear. 435 00:23:19,260 --> 00:23:23,290 For undirected graphs, you just put braces here. 436 00:23:23,290 --> 00:23:25,980 Which means you store-- I mean, it's the same thing. 437 00:23:25,980 --> 00:23:29,180 Here Adj of c is going to be a, b, and d, as you 438 00:23:29,180 --> 00:23:33,200 can get in one step from c to a, from c to b, from c to d. 439 00:23:33,200 --> 00:23:36,700 For pretty much every-- At least for graph exploration problems, 440 00:23:36,700 --> 00:23:38,580 this is the representation you want. 441 00:23:38,580 --> 00:23:39,850 Because you're at some vertex, and you want to know, 442 00:23:39,850 --> 00:23:40,930 where can I go next. 443 00:23:40,930 --> 00:23:44,560 And Adj of that vertex tells you exactly where you can go next. 
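The definition above can be sketched in Python with a hash table (a dict) standing in for the Adj array. This is a minimal sketch under stated assumptions: `build_adj` is a hypothetical helper, Python lists stand in for the linked lists, the directed edges are the four listed in the lecture, and the undirected edge set is a guess that is consistent with Adj of c being a, b, and d.

```python
from collections import defaultdict


def build_adj(edges, directed):
    """Build Adj, mapping each vertex u to the list of v with (u, v) an edge."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # an undirected edge works in both directions
    return adj


# Directed example from the lecture: edges (a, c), (b, c), (c, b), (b, a).
directed_adj = build_adj(
    [("a", "c"), ("b", "c"), ("c", "b"), ("b", "a")], directed=True
)
print(sorted(directed_adj["b"]))  # ['a', 'c']
print(directed_adj["a"])          # ['c']
print(directed_adj["c"])          # ['b']

# Undirected example; this exact edge set is an assumption for illustration.
undirected_adj = build_adj(
    [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")], directed=False
)
print(sorted(undirected_adj["c"]))  # ['a', 'b', 'd']
```

Looking up `adj[u]` takes expected constant time, and iterating over it touches only the neighbors of u rather than every edge in the graph.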
444 00:23:44,560 --> 00:23:45,830 So this is what you want. 445 00:23:50,030 --> 00:23:53,040 There's a lot of different ways to actually implement 446 00:23:53,040 --> 00:23:56,540 adjacency lists. 447 00:23:56,540 --> 00:23:59,440 I've talked about two of them. 448 00:23:59,440 --> 00:24:02,720 You could have the vertices labeled zero to v minus 1, 449 00:24:02,720 --> 00:24:05,020 and then this is, literally, an array. 450 00:24:05,020 --> 00:24:08,460 And you have-- I guess I should draw. 451 00:24:08,460 --> 00:24:13,730 In this picture, Adj is an array. 452 00:24:13,730 --> 00:24:17,080 So you've got a, b, and c. 453 00:24:17,080 --> 00:24:20,490 Each one of them is a pointer to a linked list. 454 00:24:20,490 --> 00:24:27,510 This one's actually going to be a, c, and we're done. 455 00:24:27,510 --> 00:24:30,820 Sorry, that was b. 456 00:24:30,820 --> 00:24:33,520 Who said it had to be alphabetical order? 457 00:24:33,520 --> 00:24:38,110 A is a pointer to c, c is a pointer to b. 458 00:24:38,110 --> 00:24:40,570 That's explicitly how you might represent it. 459 00:24:40,570 --> 00:24:43,350 This might be a hash table instead of an array, 460 00:24:43,350 --> 00:24:45,870 if you have weirder vertices. 461 00:24:45,870 --> 00:24:48,370 You can also do it in a more object-oriented fashion. 462 00:24:55,590 --> 00:24:59,090 For every vertex, v, you can make the vertices objects, 463 00:24:59,090 --> 00:25:05,930 and v dot neighbors could store what 464 00:25:05,930 --> 00:25:08,460 we're defining over there to be Adj 465 00:25:08,460 --> 00:25:13,630 of v. This would be the more object-oriented way to do it 466 00:25:13,630 --> 00:25:16,040 I've thought a lot about this, and I like this, 467 00:25:16,040 --> 00:25:19,010 and usually when I implement graphs this is what I do. 468 00:25:19,010 --> 00:25:23,200 But it is actually convenient to have this representation. 469 00:25:23,200 --> 00:25:25,665 There's a reason the textbook uses this representation. 
470 00:25:25,665 --> 00:25:28,040 Because, if you've already got some vertices lying around 471 00:25:28,040 --> 00:25:31,150 and you want to have multiple graphs on those vertices, 472 00:25:31,150 --> 00:25:33,710 this lets you do that. 473 00:25:33,710 --> 00:25:37,934 You can define multiple Adj arrays, one for graph one, one 474 00:25:37,934 --> 00:25:39,350 for graph two, one for graph three 475 00:25:39,350 --> 00:25:41,790 but they can all talk about the same vertices. 476 00:25:41,790 --> 00:25:45,410 Whereas here, vertex can only belong to one graph. 477 00:25:45,410 --> 00:25:48,260 It can only have one neighbor structure 478 00:25:48,260 --> 00:25:49,390 that says what happens. 479 00:25:49,390 --> 00:25:51,000 If you're only dealing with one graph, 480 00:25:51,000 --> 00:25:52,850 this is probably cleaner. 481 00:25:52,850 --> 00:25:56,220 But with multiple graphs, which will happen even in this class, 482 00:25:56,220 --> 00:26:00,534 adjacency lists are kind of the way to go. 483 00:26:00,534 --> 00:26:02,450 You can also do implicitly-represented graphs. 484 00:26:13,580 --> 00:26:20,595 Which would be to say, Adj of u is a function. 485 00:26:23,960 --> 00:26:36,660 Or v dot neighbors is a method of the vertex class. 486 00:26:36,660 --> 00:26:39,210 Meaning, it's not just stored there explicitly. 487 00:26:39,210 --> 00:26:41,270 Whenever you need it, you call this function 488 00:26:41,270 --> 00:26:45,500 and it computes what you want. 489 00:26:45,500 --> 00:26:47,420 This is useful because it uses less space. 490 00:26:47,420 --> 00:26:52,030 You could say this uses zero space or maybe v space. 491 00:26:52,030 --> 00:26:53,470 One for each vertex. 492 00:26:53,470 --> 00:26:54,139 It depends. 493 00:26:54,139 --> 00:26:56,180 Maybe you don't even need to explicitly represent 494 00:26:56,180 --> 00:26:58,080 all the vertices. 
495 00:26:58,080 --> 00:27:03,500 You start with some vertex, and given a vertex, somehow 496 00:27:03,500 --> 00:27:06,610 you know how to compute, let's say in constant time or linear 497 00:27:06,610 --> 00:27:10,270 time or something, the neighbors of that vertex. 498 00:27:10,270 --> 00:27:11,840 And then from there, you can keep 499 00:27:11,840 --> 00:27:13,340 searching, keep computing neighbors, 500 00:27:13,340 --> 00:27:14,590 until you find what you want. 501 00:27:14,590 --> 00:27:16,506 Maybe you don't have to build the whole graph, 502 00:27:16,506 --> 00:27:19,774 you just need to build enough of it until you find your answer. 503 00:27:19,774 --> 00:27:21,315 Whatever answer you're searching for. 504 00:27:21,315 --> 00:27:23,850 Can you think of a situation where that might be the case? 505 00:27:27,205 --> 00:27:29,330 Where implicit representation would be a good idea? 506 00:27:29,330 --> 00:27:29,830 Yes. 507 00:27:29,830 --> 00:27:30,710 Rubik's Cubes. 508 00:27:30,710 --> 00:27:31,543 They're really good. 509 00:27:31,543 --> 00:27:33,170 I never want to build this space. 510 00:27:33,170 --> 00:27:36,060 It has a bajillion states. 511 00:27:36,060 --> 00:27:37,170 A bajillion vertices. 512 00:27:37,170 --> 00:27:38,780 It would take forever. 513 00:27:38,780 --> 00:27:41,580 There's more configurations of this cube 514 00:27:41,580 --> 00:27:45,590 than there are particles in the known universe. 515 00:27:45,590 --> 00:27:47,701 I just computed that in my head. 516 00:27:47,701 --> 00:27:50,120 [LAUGHTER] 517 00:27:50,120 --> 00:27:52,130 I have done this computation recently, 518 00:27:52,130 --> 00:27:55,620 and for five by five by five it's like 10 to the 40 states. 519 00:27:55,620 --> 00:27:58,359 Or 10 to the 40, 10 to the 60. 520 00:27:58,359 --> 00:28:00,400 There's about 10 to the 80 particles in the known 521 00:28:00,400 --> 00:28:00,940 universe. 522 00:28:00,940 --> 00:28:02,240 10 to the 83 or something. 
523 00:28:02,240 --> 00:28:06,750 So this is probably 10 to the 200 or so. 524 00:28:06,750 --> 00:28:07,862 It's a lot. 525 00:28:07,862 --> 00:28:09,070 You never want to build that. 526 00:28:09,070 --> 00:28:11,820 But, it's very easy to represent this state. 527 00:28:11,820 --> 00:28:13,520 Just store where all the cubies are. 528 00:28:13,520 --> 00:28:16,410 And it's very easy to see what are all the configurations you 529 00:28:16,410 --> 00:28:17,640 can reach in one move. 530 00:28:17,640 --> 00:28:20,630 Just try this move, try this move, try this move. 531 00:28:20,630 --> 00:28:22,371 Put it back and try the next move. 532 00:28:22,371 --> 00:28:22,870 And so on. 533 00:28:25,660 --> 00:28:27,600 For an n by n by n cube in order n 534 00:28:27,600 --> 00:28:30,210 time, you can list all the order n next states. 535 00:28:30,210 --> 00:28:32,050 You can list all the order n neighbors. 536 00:28:32,050 --> 00:28:35,224 And so you can keep exploring, searching for your state. 537 00:28:35,224 --> 00:28:37,390 Now you don't want to explore too far for that cube, 538 00:28:37,390 --> 00:28:41,020 but at least you're not hosed just 539 00:28:41,020 --> 00:28:44,190 from the problem of representing the graph. 540 00:28:44,190 --> 00:28:46,030 So even for two by two by two, it's 541 00:28:46,030 --> 00:28:48,165 useful to do this mostly to save space. 542 00:28:48,165 --> 00:28:50,310 You're not really saving time. 543 00:28:50,310 --> 00:28:54,960 But you'd like to not have to store all 264 million states 544 00:28:54,960 --> 00:29:01,850 because it's going to be several gigabytes and it's annoying. 545 00:29:01,850 --> 00:29:05,650 Speaking of space-- ignoring the implicit representation-- 546 00:29:05,650 --> 00:29:08,755 how much space does this representation require? 547 00:29:19,820 --> 00:29:22,590 V plus E. This is going to be the bread 548 00:29:22,590 --> 00:29:24,609 and butter of our graph algorithms. 
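The implicit representation discussed above, where Adj of u is a function rather than a stored list, can be sketched as follows. This is a toy stand-in, not a Rubik's Cube: a state is a tuple, and one move swaps two adjacent entries. The function name neighbors and the puzzle itself are assumptions chosen only for illustration.

```python
def neighbors(state):
    """Compute the states reachable from `state` in one move, on demand.

    Hypothetical toy puzzle: a move swaps two adjacent entries of the tuple.
    Nothing is stored ahead of time; the list is rebuilt on every call.
    """
    result = []
    for i in range(len(state) - 1):
        moved = list(state)                              # copy the state
        moved[i], moved[i + 1] = moved[i + 1], moved[i]  # try this move
        result.append(tuple(moved))                      # original is untouched
    return result
```

A search can call neighbors on each state it reaches, so only the part of the graph that is actually explored ever exists in memory.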
549 00:29:24,609 --> 00:29:27,150 Most of the things we're going to talk about achieve V plus E 550 00:29:27,150 --> 00:29:27,650 time. 551 00:29:27,650 --> 00:29:29,090 This is essentially optimal. 552 00:29:29,090 --> 00:29:32,162 It's linear in the size of your graph. 553 00:29:32,162 --> 00:29:34,700 You've got V vertices, E edges. 554 00:29:34,700 --> 00:29:37,160 Technically, in case you're curious, 555 00:29:37,160 --> 00:29:40,560 this is really the size of V plus the size of E. 556 00:29:40,560 --> 00:29:44,667 But in the textbook, and I guess in the world, 557 00:29:44,667 --> 00:29:46,500 we just omit those sizes whenever they're 558 00:29:46,500 --> 00:29:49,180 in a theta notation or Big O notation. 559 00:29:49,180 --> 00:29:50,930 So number of vertices plus number of edges. 560 00:29:50,930 --> 00:29:52,304 That's sort of the bare minimum you 561 00:29:52,304 --> 00:29:55,456 need if you want an explicit representation of the graph. 562 00:29:55,456 --> 00:29:56,830 And we achieve that because we've 563 00:29:56,830 --> 00:30:00,880 got v space just to store the vertices in an array. 564 00:30:00,880 --> 00:30:06,677 And then if you add up-- Each of these is an edge. 565 00:30:06,677 --> 00:30:08,010 You have to be a little careful. 566 00:30:08,010 --> 00:30:11,570 In undirected graphs, each of these is a half edge. 567 00:30:11,570 --> 00:30:15,340 So there's actually two times e nodes over here. 568 00:30:15,340 --> 00:30:19,390 But it's theta E. So theta V plus E 569 00:30:19,390 --> 00:30:22,140 is the amount of space we need. 570 00:30:22,140 --> 00:30:25,090 And ideally, all our algorithms will run in this much time. 571 00:30:25,090 --> 00:30:28,550 Because that's what you need just to look at the graph. 572 00:31:06,710 --> 00:31:11,590 So let's do an actual algorithm, which is breadth-first search. 573 00:31:14,920 --> 00:31:18,420 So it's the simplest algorithm you can think of in graphs. 
574 00:31:18,420 --> 00:31:21,090 I've already outlined it several times. 575 00:31:21,090 --> 00:31:22,640 You start at some node. 576 00:31:22,640 --> 00:31:24,810 You look at all the nodes you can get to from there. 577 00:31:24,810 --> 00:31:26,976 You look at all the nodes you can get to from there. 578 00:31:26,976 --> 00:31:29,300 Keep going until you're done. 579 00:31:29,300 --> 00:31:32,110 So this is going to explore all of the vertices that 580 00:31:32,110 --> 00:31:34,030 are reachable from a node. 581 00:31:36,720 --> 00:31:39,270 The challenge-- The one annoying thing 582 00:31:39,270 --> 00:31:41,520 about breadth-first search and why this is not trivial 583 00:31:41,520 --> 00:31:44,280 is that there can be some edges that 584 00:31:44,280 --> 00:31:52,340 go sort of backwards, like that, to some previous layer. 585 00:31:52,340 --> 00:31:54,242 Actually, that's not true, is it? 586 00:31:58,090 --> 00:31:59,275 This can't happen. 587 00:31:59,275 --> 00:32:02,220 You see why? 588 00:32:02,220 --> 00:32:06,280 Because if that edge existed, then from this node 589 00:32:06,280 --> 00:32:08,205 you'd be able to get here. 590 00:32:08,205 --> 00:32:10,180 So in an undirected graph, that can't happen. 591 00:32:10,180 --> 00:32:12,520 In a directed graph, you could conceivably 592 00:32:12,520 --> 00:32:13,660 have a back edge like that. 593 00:32:13,660 --> 00:32:16,270 You'd have to realize, oh, that's a vertex I've already 594 00:32:16,270 --> 00:32:19,450 seen, I don't want to put it here, even though it's 595 00:32:19,450 --> 00:32:21,270 something I can reach from this node, 596 00:32:21,270 --> 00:32:22,961 because I've already been there. 597 00:32:22,961 --> 00:32:24,710 We've got to worry about things like that. 598 00:32:27,260 --> 00:32:29,970 That's, I guess, the main thing to worry about. 
599 00:32:34,620 --> 00:32:40,950 So our goal is to visit all the nodes-- 600 00:32:40,950 --> 00:32:47,555 the vertices-- reachable from given node, s. 601 00:32:51,470 --> 00:32:54,720 We want to achieve V plus E time. 602 00:33:00,190 --> 00:33:10,440 And the idea is to look at the nodes that 603 00:33:10,440 --> 00:33:15,780 are reachable first in zero moves. 604 00:33:15,780 --> 00:33:17,040 Zero moves. 605 00:33:17,040 --> 00:33:17,820 That's s. 606 00:33:20,390 --> 00:33:23,480 Then in one move. 607 00:33:23,480 --> 00:33:27,650 Well that's everything you can reach from s in one step. 608 00:33:27,650 --> 00:33:29,430 That's adjacency of s. 609 00:33:29,430 --> 00:33:32,810 And then two moves, and three moves, and so 610 00:33:32,810 --> 00:33:36,910 on until we run out of graph. 611 00:33:36,910 --> 00:33:47,800 But we need to be careful to avoid duplicates. 612 00:33:47,800 --> 00:33:51,090 We want to avoid revisiting vertices 613 00:33:51,090 --> 00:33:52,090 for a couple of reasons. 614 00:33:52,090 --> 00:33:55,110 One is if we didn't, we would spend infinite time. 615 00:33:55,110 --> 00:33:56,945 Because we'd just go there and come back, 616 00:33:56,945 --> 00:33:58,070 and go there and come back. 617 00:33:58,070 --> 00:33:59,740 As long as there's at least one cycle, 618 00:33:59,740 --> 00:34:00,920 you're going to keep going around the cycle 619 00:34:00,920 --> 00:34:03,280 forever and ever if you don't try to avoid duplicates. 620 00:34:05,707 --> 00:34:07,790 So let me write down some code for this algorithm. 621 00:34:07,790 --> 00:34:09,580 It's pretty straightforward. 622 00:34:09,580 --> 00:34:12,190 So straightforward, we can be completely explicit 623 00:34:12,190 --> 00:34:14,115 and write [INAUDIBLE] code. 624 00:34:18,824 --> 00:34:21,199 There's a few different ways to implement this algorithm. 625 00:34:21,199 --> 00:34:23,780 I'll show you my favorite. 626 00:34:23,780 --> 00:34:25,659 The textbook has a different favorite. 
627 00:34:42,040 --> 00:34:44,739 I'm going to write in pure Python, I believe. 628 00:35:57,370 --> 00:35:58,100 Almost done. 629 00:36:30,650 --> 00:36:33,160 I think I got that right. 630 00:36:33,160 --> 00:36:36,594 So this is at the end of the while-loop. 631 00:36:36,594 --> 00:36:39,272 And at that point we should be done. 632 00:36:39,272 --> 00:36:40,730 We can do an actual example, maybe. 633 00:37:16,610 --> 00:37:19,560 I'm going to do it on an undirected graph, 634 00:37:19,560 --> 00:37:21,220 but this algorithm works just as well 635 00:37:21,220 --> 00:37:22,970 on directed and undirected graphs. 636 00:37:28,330 --> 00:37:30,950 There's an undirected graph. 637 00:37:30,950 --> 00:37:34,890 We're given some start vertex, s, 638 00:37:34,890 --> 00:37:37,430 and we're given the graph by being 639 00:37:37,430 --> 00:37:39,850 given the adjacency lists. 640 00:37:39,850 --> 00:37:42,600 So you could iterate over the vertices of that thing. 641 00:37:42,600 --> 00:37:44,420 Given a vertex, you can list all the edges 642 00:37:44,420 --> 00:37:47,112 you can reach in one step. 643 00:37:47,112 --> 00:37:48,570 And then the top of the algorithm's 644 00:37:48,570 --> 00:37:50,400 just some initialization. 645 00:37:50,400 --> 00:37:52,570 The basic structure-- We have this thing 646 00:37:52,570 --> 00:37:55,890 called the frontier, which is what we just 647 00:37:55,890 --> 00:37:58,920 reached on the previous level. 648 00:37:58,920 --> 00:38:04,480 I think that's going to be level i minus one. 649 00:38:04,480 --> 00:38:06,225 Just don't want to make an index error. 650 00:38:08,614 --> 00:38:10,280 These are going to be all the things you 651 00:38:10,280 --> 00:38:14,970 can reach using exactly i minus one moves. 652 00:38:14,970 --> 00:38:17,020 And then next is going to be all the things 653 00:38:17,020 --> 00:38:18,560 you can reach in i moves. 654 00:38:21,310 --> 00:38:24,976 So to get started, what we know is s. 
655 00:38:24,976 --> 00:38:28,580 s is what you can reach in zero moves. 656 00:38:28,580 --> 00:38:31,540 So we set the level of s to be zero. 657 00:38:31,540 --> 00:38:33,219 That's the first line of the code. 658 00:38:33,219 --> 00:38:35,010 There's this other thing called the parent. 659 00:38:35,010 --> 00:38:36,650 We'll worry about that later. 660 00:38:36,650 --> 00:38:37,720 It's optional. 661 00:38:37,720 --> 00:38:40,700 It gives us some other fun structure. 662 00:38:40,700 --> 00:38:44,950 We set i to be one because we just finished level zero. 663 00:38:44,950 --> 00:38:49,600 Frontier of what you can reach in level zero is just s itself. 664 00:38:49,600 --> 00:38:51,560 So we're going to put that on the list. 665 00:38:51,560 --> 00:38:54,870 That is level zero. i equals one. So one minus one is zero. 666 00:38:54,870 --> 00:38:56,360 All good. 667 00:38:56,360 --> 00:38:57,810 And then we're going to iterate. 668 00:38:57,810 --> 00:39:00,250 And this is going to be looking at-- The end 669 00:39:00,250 --> 00:39:02,347 of the iteration is to increment i. 670 00:39:02,347 --> 00:39:03,930 So you could also call this a for-loop 671 00:39:03,930 --> 00:39:05,763 except we don't know when it's going to end. 672 00:39:05,763 --> 00:39:09,609 So it's easier to think of i incrementing each step 673 00:39:09,609 --> 00:39:11,150 not knowing when we're going to stop. 674 00:39:11,150 --> 00:39:13,191 We're going to stop whenever we run out of nodes. 675 00:39:13,191 --> 00:39:16,787 So we keep going whenever frontier is a non-empty list. 676 00:39:16,787 --> 00:39:18,370 The bulk of the work here is computing 677 00:39:18,370 --> 00:39:19,520 what the next level is. 678 00:39:19,520 --> 00:39:20,860 That's called next. 679 00:39:20,860 --> 00:39:22,330 It's going to be level i. 680 00:39:22,330 --> 00:39:23,310 We do some computation. 681 00:39:23,310 --> 00:39:26,020 Eventually we have what's on the next level. 
682 00:39:26,020 --> 00:39:28,172 Then we set frontier to next. 683 00:39:28,172 --> 00:39:29,380 Because that's our new level. 684 00:39:29,380 --> 00:39:31,820 We increment i, and then the invariant 685 00:39:31,820 --> 00:39:35,642 of frontier being level i minus 1 is preserved. 686 00:39:35,642 --> 00:39:36,350 Right after here. 687 00:39:36,350 --> 00:39:40,100 And then we just keep going till we run out of nodes. 688 00:39:40,100 --> 00:39:42,230 How do we compute next? 689 00:39:42,230 --> 00:39:44,210 Well, we look at every node in the frontier, 690 00:39:44,210 --> 00:39:47,660 and we look at all the nodes you can reach from those nodes. 691 00:39:47,660 --> 00:39:49,120 So every node, u, in the frontier 692 00:39:49,120 --> 00:39:51,400 and then we look at-- So this means 693 00:39:51,400 --> 00:39:55,660 there is an edge from u to v through the picture. 694 00:39:55,660 --> 00:39:58,520 We look at all the edges from all the frontier nodes 695 00:39:58,520 --> 00:39:59,870 where you can go. 696 00:39:59,870 --> 00:40:02,330 And then the key thing is we check for duplicates. 697 00:40:02,330 --> 00:40:04,830 We see, have we seen this node before? 698 00:40:04,830 --> 00:40:08,270 If we have, we would have set its level to be something. 699 00:40:08,270 --> 00:40:09,870 If we haven't seen it, it will not 700 00:40:09,870 --> 00:40:14,160 be in the level hash table or the level dictionary. 701 00:40:14,160 --> 00:40:18,350 And so if it's not in there, we'll put it in there 702 00:40:18,350 --> 00:40:20,640 and add it to the next layer. 703 00:40:20,640 --> 00:40:22,860 So that's how you avoid duplicates. 704 00:40:22,860 --> 00:40:25,890 You set its level to make sure you will never visit it again, 705 00:40:25,890 --> 00:40:28,870 you add it to the next frontier, you iterate, you're done. 706 00:40:31,359 --> 00:40:32,900 This is one version of what you might 707 00:40:32,900 --> 00:40:34,240 call a breadth-first search. 
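The code written on the board is not captured in the transcript. The sketch below is a reconstruction from the spoken description only (level, parent, i, frontier, and the next list), so it may differ in detail from what was actually written. The name nxt replaces next to avoid shadowing the Python built-in, and returning the two dictionaries is an added convenience; both are assumptions.

```python
def bfs(s, Adj):
    """Breadth-first search from s over adjacency lists Adj.

    Returns (level, parent): level[v] is the number of moves needed to
    reach v from s, and parent[v] is the vertex v was first reached from.
    """
    level = {s: 0}        # s is reachable in zero moves
    parent = {s: None}    # the parent of s is nobody
    i = 1                 # we just finished level zero
    frontier = [s]        # invariant: everything at level i - 1
    while frontier:       # keep going until we run out of nodes
        nxt = []          # will hold everything at level i
        for u in frontier:
            for v in Adj[u]:
                if v not in level:   # not seen before: avoids duplicates
                    level[v] = i
                    parent[v] = u
                    nxt.append(v)
        frontier = nxt
        i += 1
    return level, parent
```

Because a vertex gets a level the moment it is first seen, it can never be appended to a later frontier, which is what keeps the loop from cycling forever.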
708 00:40:34,240 --> 00:40:36,270 And it achieves this goal, visiting 709 00:40:36,270 --> 00:40:39,220 all the nodes reachable from s, in linear time. 710 00:40:39,220 --> 00:40:41,640 Let's see how it works on a real example. 711 00:40:41,640 --> 00:40:43,740 So first frontier is this thing. 712 00:40:46,670 --> 00:40:49,120 Frontier just has the node s, so we just look at s, 713 00:40:49,120 --> 00:40:50,930 and we look at all the edges from s. 714 00:40:50,930 --> 00:40:52,440 We get a and x. 715 00:40:52,440 --> 00:40:56,460 So those get added to the next frontier. 716 00:40:56,460 --> 00:41:01,040 Maybe before I go too far, let me switch colors. 717 00:41:05,700 --> 00:41:08,080 Multimedia here. 718 00:41:08,080 --> 00:41:12,576 So here's level one. 719 00:41:12,576 --> 00:41:17,270 All of these guys, we're going to set their level to one. 720 00:41:17,270 --> 00:41:18,877 They can be reached in one step. 721 00:41:18,877 --> 00:41:19,710 That's pretty clear. 722 00:41:19,710 --> 00:41:22,570 So now frontier is a and x. 723 00:41:22,570 --> 00:41:24,380 That's what next becomes. 724 00:41:24,380 --> 00:41:26,240 Then frontier becomes next. 725 00:41:26,240 --> 00:41:28,510 And so we look at all the edges from a. 726 00:41:28,510 --> 00:41:31,110 That's going to be s and z. 727 00:41:31,110 --> 00:41:33,730 s, we've already looked at, it already has a level set, 728 00:41:33,730 --> 00:41:35,200 so we ignore that. 729 00:41:35,200 --> 00:41:35,880 So we look at z. 730 00:41:35,880 --> 00:41:38,200 Z does not have a level indicated here, 731 00:41:38,200 --> 00:41:39,990 so we're going to set it to i which 732 00:41:39,990 --> 00:41:42,340 happens to be two at this point. 733 00:41:42,340 --> 00:41:43,200 And we look at x. 734 00:41:43,200 --> 00:41:45,160 It has neighbors s, d, and c. 735 00:41:45,160 --> 00:41:46,330 We look at s again. 736 00:41:46,330 --> 00:41:48,680 We say, oh, we've already seen that yet again. 
737 00:41:48,680 --> 00:41:50,800 So we're worried about this taking a lot of time 738 00:41:50,800 --> 00:41:54,090 because we look at s three times in total. 739 00:41:54,090 --> 00:41:56,290 Then we look at d. 740 00:41:56,290 --> 00:41:59,240 d hasn't been set, so we set it to two. c hasn't been set, 741 00:41:59,240 --> 00:42:00,220 so we set it to two. 742 00:42:00,220 --> 00:42:05,945 So the frontier at level two is that. 743 00:42:05,945 --> 00:42:07,570 Then we look at all the neighbors of z. 744 00:42:07,570 --> 00:42:09,489 There's a. a's already been set. 745 00:42:09,489 --> 00:42:10,780 Look at all the neighbors of d. 746 00:42:10,780 --> 00:42:11,370 There's x. 747 00:42:11,370 --> 00:42:11,870 There's c. 748 00:42:11,870 --> 00:42:12,703 Those have been set. 749 00:42:12,703 --> 00:42:13,850 There's f. 750 00:42:13,850 --> 00:42:16,440 This one gets added. 751 00:42:16,440 --> 00:42:17,480 Then we look at c. 752 00:42:17,480 --> 00:42:18,020 There's x. 753 00:42:18,020 --> 00:42:19,770 That's been done. d's been done. 754 00:42:19,770 --> 00:42:20,690 f's been done. 755 00:42:20,690 --> 00:42:23,370 v has not been done. 756 00:42:23,370 --> 00:42:27,300 So this becomes a frontier at level three. 757 00:42:27,300 --> 00:42:28,930 Then we look at level three. 758 00:42:28,930 --> 00:42:29,577 There's f. 759 00:42:29,577 --> 00:42:31,410 D's been done, c's been done, v's been done. 760 00:42:31,410 --> 00:42:34,530 We look at v. c's been done. f's been done. 761 00:42:34,530 --> 00:42:35,710 Nothing to add to next. 762 00:42:35,710 --> 00:42:36,780 Next becomes empty. 763 00:42:36,780 --> 00:42:38,240 Frontier becomes empty. 764 00:42:38,240 --> 00:42:39,530 The while-loop finishes. 765 00:42:39,530 --> 00:42:40,760 TA DA! 766 00:42:40,760 --> 00:42:43,510 We've computed-- we've visited all the vertices. 767 00:42:43,510 --> 00:42:44,453 Question. 768 00:42:44,453 --> 00:42:45,369 AUDIENCE: [INAUDIBLE]. 769 00:42:51,325 --> 00:42:52,910 What notation? 
770 00:42:52,910 --> 00:42:54,460 PROFESSOR: This is Python notation. 771 00:42:54,460 --> 00:42:56,860 You may have heard of Python. 772 00:42:56,860 --> 00:43:01,720 This is a dictionary which has one key value, 773 00:43:01,720 --> 00:43:03,720 s, and has one value, zero. 774 00:43:03,720 --> 00:43:07,280 So you could-- That's shorthand in Python 775 00:43:07,280 --> 00:43:10,440 for-- Usually you have a comma separated list. 776 00:43:10,440 --> 00:43:14,310 The colon is specifying key value pairs. 777 00:43:17,300 --> 00:43:19,760 I didn't talk about parent. 778 00:43:19,760 --> 00:43:23,310 We can do that for a little bit. 779 00:43:23,310 --> 00:43:28,280 So parent we're initializing to say, the parent of s is nobody, 780 00:43:28,280 --> 00:43:30,890 and then whenever we visit a new vertex, 781 00:43:30,890 --> 00:43:34,900 v, we set its parent to be the vertex that we came from. 782 00:43:34,900 --> 00:43:36,650 So we had this vertex, v. We had an edge 783 00:43:36,650 --> 00:43:38,560 to v from some vertex, u. 784 00:43:38,560 --> 00:43:40,720 We set the parent of v to be u. 785 00:43:40,720 --> 00:43:44,140 So let me add in what that becomes. 786 00:43:44,140 --> 00:43:47,300 I'll change colors yet again. 787 00:43:47,300 --> 00:43:51,820 Although it gets hard to see any color but red. 788 00:43:51,820 --> 00:43:55,340 So we have s. 789 00:43:55,340 --> 00:44:00,810 When we visited a, then the parent of a would become s. 790 00:44:00,810 --> 00:44:05,150 When we visited z, the parent of z would be a. 791 00:44:05,150 --> 00:44:07,640 Parent of x is going to be s. 792 00:44:07,640 --> 00:44:09,925 Parent of d is going to be x. 793 00:44:09,925 --> 00:44:12,765 The parent of c is going to be x. 794 00:44:12,765 --> 00:44:15,620 The parent of f-- it could have been either way, 795 00:44:15,620 --> 00:44:18,860 but the way I did it, d went first, 796 00:44:18,860 --> 00:44:21,200 and so that became its parent. 
797 00:44:21,200 --> 00:44:25,327 And I think for v, c was its parent. 798 00:44:25,327 --> 00:44:27,410 So that's what the parent pointers will look like. 799 00:44:27,410 --> 00:44:28,820 They always follow edges. 800 00:44:28,820 --> 00:44:30,760 They actually follow edges backwards. 801 00:44:30,760 --> 00:44:32,620 If this was a directed graph, the graph 802 00:44:32,620 --> 00:44:35,420 might be directed that way but the parent pointers 803 00:44:35,420 --> 00:44:37,270 go back along the edges. 804 00:44:37,270 --> 00:44:38,390 So it's a way to return. 805 00:44:38,390 --> 00:44:41,710 It's a way to return to s. 806 00:44:41,710 --> 00:44:44,580 If you follow these pointers, all roads lead to s. 807 00:44:48,140 --> 00:44:50,550 Because we started at s, that's the property we have. 808 00:44:50,550 --> 00:44:54,180 In fact, these pointers always form a tree, 809 00:44:54,180 --> 00:44:56,280 and the root of the tree is s. 810 00:44:56,280 --> 00:44:59,860 In fact, these pointers form what are called shortest paths. 811 00:44:59,860 --> 00:45:05,730 Let me write down a little bit about this. 812 00:45:19,970 --> 00:45:21,285 Shortest path properties. 813 00:45:45,910 --> 00:45:51,400 If you take a node, and you take its parent, 814 00:45:51,400 --> 00:45:53,170 and you take the parent of the parent, 815 00:45:53,170 --> 00:45:56,600 and so on, eventually you get to s. 816 00:45:56,600 --> 00:45:59,180 And if you read it backwards, that 817 00:45:59,180 --> 00:46:01,880 will actually be a path in the graph. 818 00:46:01,880 --> 00:46:10,050 And it will be a shortest path, in the graph, from s 819 00:46:10,050 --> 00:46:16,100 to v. Meaning, if you look at all paths in the graph that 820 00:46:16,100 --> 00:46:18,830 go from s to v-- So say we're going from s to v, 821 00:46:18,830 --> 00:46:23,101 how about that, we compute this path out of BFS. 
822 00:46:23,101 --> 00:46:24,475 Which is, follow a parent of v is 823 00:46:24,475 --> 00:46:27,670 c, parent of c is x, parent of x is s. 824 00:46:27,670 --> 00:46:28,600 Read it backwards. 825 00:46:28,600 --> 00:46:30,590 That gives us a path from s to v. 826 00:46:30,590 --> 00:46:32,140 The claim is, that is the shortest 827 00:46:32,140 --> 00:46:35,420 way to get from s to v. It might not be the only one. 828 00:46:35,420 --> 00:46:38,330 Like if you're going from s to f, there's two short paths. 829 00:46:38,330 --> 00:46:40,330 There's this one of length three. 830 00:46:40,330 --> 00:46:42,160 There's this one of length three. 831 00:46:42,160 --> 00:46:43,390 Uses three edges. 832 00:46:43,390 --> 00:46:45,300 Same length. 833 00:46:45,300 --> 00:46:47,110 And in the parent pointers, we can only 834 00:46:47,110 --> 00:46:48,640 afford to encode one of those paths 835 00:46:48,640 --> 00:46:51,015 because in general there might be exponentially many ways 836 00:46:51,015 --> 00:46:52,690 to get from one node to another. 837 00:46:52,690 --> 00:46:56,940 We find a shortest path, not necessarily the only one. 838 00:46:56,940 --> 00:47:01,075 And the length of that path-- So shortest 839 00:47:01,075 --> 00:47:03,750 here means that you use the fewest edges. 840 00:47:03,750 --> 00:47:07,690 And the length will be level of v. 841 00:47:07,690 --> 00:47:10,729 That's what we're keeping track of. 842 00:47:10,729 --> 00:47:13,020 If the level's zero, you can get there with zero steps. 843 00:47:13,020 --> 00:47:15,060 If the level's one, you get there with one step. 844 00:47:15,060 --> 00:47:17,143 Because we're visiting everything you can possibly 845 00:47:17,143 --> 00:47:19,500 get in k steps, the level is telling you 846 00:47:19,500 --> 00:47:21,411 what that shortest path distance is. 847 00:47:21,411 --> 00:47:22,910 And the parent pointers are actually 848 00:47:22,910 --> 00:47:25,030 giving you the shortest path. 
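Following parent pointers back to s and reading the result backwards can be sketched like this. The helper name shortest_path is an assumption, and the parent dictionary is copied from the walkthrough above (s is the start vertex) rather than computed here.

```python
def shortest_path(parent, v):
    """Follow parent pointers from v back to the start, then reverse."""
    if v not in parent:
        return None           # v was never reached by the search
    path = []
    while v is not None:      # the start vertex has parent None
        path.append(v)
        v = parent[v]
    path.reverse()            # read it backwards: start first, v last
    return path


# Parent pointers as described in the walkthrough.
parent = {"s": None, "a": "s", "x": "s", "z": "a",
          "d": "x", "c": "x", "f": "d", "v": "c"}
```

The number of edges on the returned path is one less than the number of vertices in it, and it equals the level of v.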
849 00:47:25,030 --> 00:47:27,090 That's the cool thing about BFS. 850 00:47:27,090 --> 00:47:28,620 Yeah, BFS explores the vertices. 851 00:47:28,620 --> 00:47:30,416 Sometimes, that's all you care about. 852 00:47:30,416 --> 00:47:32,040 But in some sense, what really matters, 853 00:47:32,040 --> 00:47:36,200 is it finds the shortest way to get from anywhere to anywhere. 854 00:47:36,200 --> 00:47:40,200 For a Rubik's Cube, that's nice because you 855 00:47:40,200 --> 00:47:43,260 run BFS from the start state of the Rubik's Cube. 856 00:47:43,260 --> 00:47:45,190 Then you say, oh, I'm in this state. 857 00:47:45,190 --> 00:47:46,724 You look up this state. 858 00:47:46,724 --> 00:47:47,640 You look at its level. 859 00:47:47,640 --> 00:47:50,590 It says, oh, you can get there in nine steps. 860 00:47:50,590 --> 00:47:52,260 That's, I think, the average. 861 00:47:52,260 --> 00:47:53,501 So I'm guessing. 862 00:47:53,501 --> 00:47:55,250 I don't know how to do this in nine steps. 863 00:47:58,470 --> 00:48:00,220 Great, so now you know how to solve it. 864 00:48:00,220 --> 00:48:01,720 You just look at the parent pointer. 865 00:48:01,720 --> 00:48:03,095 The parent pointer gives you another configuration. 866 00:48:03,095 --> 00:48:05,030 You say, oh, what move was that? 867 00:48:05,030 --> 00:48:06,480 And then you do that move. 868 00:48:06,480 --> 00:48:07,940 I'm not going to solve it. 869 00:48:07,940 --> 00:48:09,360 Then you look at the parent pointer of that. 870 00:48:09,360 --> 00:48:10,045 You do that move. 871 00:48:10,045 --> 00:48:11,510 You look at the parent pointer of that. 872 00:48:11,510 --> 00:48:12,080 You do that move. 873 00:48:12,080 --> 00:48:13,871 Eventually, you'll get to the solved state, 874 00:48:13,871 --> 00:48:16,430 and you will do it using the fewest possible moves. 
875 00:48:16,430 --> 00:48:20,590 So if you can afford to put the whole graph in memory, which 876 00:48:20,590 --> 00:48:23,450 you can't for a big Rubik's Cube but you can for a small one, 877 00:48:23,450 --> 00:48:27,560 then this will give you a strategy, the optimal strategy, 878 00:48:27,560 --> 00:48:32,400 God's algorithm if you will, for every configuration. 879 00:48:32,400 --> 00:48:34,200 It solves all of them. 880 00:48:34,200 --> 00:48:36,199 Which is great. 881 00:48:36,199 --> 00:48:37,990 What is the running time of this algorithm? 882 00:48:37,990 --> 00:48:41,870 I claim it's order V plus E. But it looked a little wasteful 883 00:48:41,870 --> 00:48:45,300 because it was checking vertices over and over and over. 884 00:48:45,300 --> 00:48:47,260 But if you think about it carefully, 885 00:48:47,260 --> 00:48:50,110 you're only looking-- what's the right way 886 00:48:50,110 --> 00:48:55,584 to say this-- you only check every edge once. 887 00:48:55,584 --> 00:48:57,500 Or in undirected graphs, you check them twice, 888 00:48:57,500 --> 00:49:00,820 once from each side. 889 00:49:00,820 --> 00:49:04,290 A vertex enters the frontier only once. 890 00:49:04,290 --> 00:49:07,610 Because once it's in the frontier, it gets a level set. 891 00:49:07,610 --> 00:49:11,450 And once it has a level set, it'll never go in again. 892 00:49:11,450 --> 00:49:14,450 It'll never get added to next. 893 00:49:14,450 --> 00:49:17,440 So s gets added once then we check all the neighbors of s. 894 00:49:17,440 --> 00:49:19,910 a gets added once, then we check all the neighbors of a. 895 00:49:19,910 --> 00:49:21,520 Each of these guys gets added once. 896 00:49:21,520 --> 00:49:22,950 We check all the neighbors. 897 00:49:22,950 --> 00:49:24,410 So the total running time is going 898 00:49:24,410 --> 00:49:27,490 to be the sum over all vertices of the size 899 00:49:27,490 --> 00:49:33,530 of the adjacency list of v. 
So this is the number of neighbors 900 00:49:33,530 --> 00:49:35,220 that v has. 901 00:49:35,220 --> 00:49:37,579 And this is going to be? 902 00:49:37,579 --> 00:49:38,079 Answer? 903 00:49:42,336 --> 00:49:44,004 AUDIENCE: Two times the number of edges. 904 00:49:44,004 --> 00:49:44,670 PROFESSOR: Sorry? 905 00:49:44,670 --> 00:49:46,211 AUDIENCE: Double the number of edges. 906 00:49:46,211 --> 00:49:48,930 PROFESSOR: Twice the number of edges for undirected graphs. 907 00:49:48,930 --> 00:49:51,330 It's going to be the number of edges for directed graphs. 908 00:49:51,330 --> 00:49:52,770 This is the Handshaking Lemma. 909 00:49:52,770 --> 00:49:54,670 If you don't remember the Handshaking Lemma, 910 00:49:54,670 --> 00:49:57,330 you should read the textbook. 911 00:49:57,330 --> 00:49:59,116 6.042 stuff. 912 00:50:03,300 --> 00:50:06,780 Basically you visit every edge twice. 913 00:50:06,780 --> 00:50:10,870 For directed graphs, you visit every edge once. 914 00:50:10,870 --> 00:50:13,590 But it's order E. We also spend order V 915 00:50:13,590 --> 00:50:15,790 because we touch every vertex. 916 00:50:15,790 --> 00:50:18,770 So the total running time is order V plus E. 917 00:50:18,770 --> 00:50:23,240 In fact, the way this is going, you can be a little tighter 918 00:50:23,240 --> 00:50:25,330 and say it's order E. I just want 919 00:50:25,330 --> 00:50:27,244 to mention in reality-- Sometimes 920 00:50:27,244 --> 00:50:29,410 you don't care about just what you can reach from s, 921 00:50:29,410 --> 00:50:31,410 you really want to visit every vertex. 922 00:50:31,410 --> 00:50:33,710 Then you need another outer loop that's 923 00:50:33,710 --> 00:50:38,170 iterating over all the vertices as potential choices for s. 924 00:50:38,170 --> 00:50:41,160 And you then can visit all the vertices in the entire graph 925 00:50:41,160 --> 00:50:42,930 even if it's disconnected. 926 00:50:42,930 --> 00:50:45,040 We'll talk more about that next class.
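Both points above, the edge count and the outer loop over all vertices, can be checked with a short Python sketch. It is an illustration under stated assumptions rather than the lecture's code: the graph is a hypothetical undirected one stored as an adjacency-list dict, and the names `bfs_all`, `scanned`, and `next_frontier` are made up for this example. The counter `scanned` goes up by one for every adjacency-list entry examined, so on an undirected graph it should come out to twice the number of edges.

```python
def bfs_all(adj):
    """Run BFS from every not-yet-visited vertex of an undirected graph.

    Returns (parent, scanned): parent[v] is v's BFS-tree parent (None for
    the root of each component) and scanned counts the adjacency-list
    entries examined in total.
    """
    parent = {}
    scanned = 0
    for s in adj:                       # outer loop over candidate sources
        if s in parent:
            continue                    # already reached from an earlier source
        parent[s] = None
        frontier = [s]
        while frontier:
            next_frontier = []
            for u in frontier:
                for v in adj[u]:
                    scanned += 1        # one unit of work per list entry
                    if v not in parent:
                        parent[v] = u
                        next_frontier.append(v)
            frontier = next_frontier
    return parent, scanned


# Hypothetical undirected graph with two components.
adj = {
    "s": ["a", "x"],
    "a": ["s", "x"],
    "x": ["s", "a"],
    "p": ["q"],
    "q": ["p"],
}
# Each undirected edge appears in two adjacency lists.
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
parent, scanned = bfs_all(adj)
```

Each vertex enters a frontier once, so its adjacency list is scanned once. That is why the total work is the sum of the list sizes plus one step per vertex, giving order V plus E.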
927 00:50:45,040 --> 00:50:46,607 That's it for BFS.