1 00:00:00,580 --> 00:00:03,010 PROFESSOR: We've briefly looked at graph isomorphism 2 00:00:03,010 --> 00:00:04,710 in the context of digraphs. 3 00:00:04,710 --> 00:00:07,920 And it comes up in an even more fundamental way really 4 00:00:07,920 --> 00:00:11,840 for simple graphs where the definition is a bit simpler. 5 00:00:11,840 --> 00:00:17,080 So let's just look at this graph abstraction idea 6 00:00:17,080 --> 00:00:19,840 and how isomorphism connects with it. 7 00:00:19,840 --> 00:00:22,480 This is an example of two different ways 8 00:00:22,480 --> 00:00:24,610 of drawing the same graph. 9 00:00:24,610 --> 00:00:28,300 That is, here's a 257, and there's 257. 10 00:00:28,300 --> 00:00:31,050 It's connected directly to 122, as here. 11 00:00:31,050 --> 00:00:34,920 And also 257 is connected to 99, as here. 12 00:00:34,920 --> 00:00:38,010 And if you check, it's exactly the same six vertices 13 00:00:38,010 --> 00:00:44,290 and exactly the same eight edges. 14 00:00:44,290 --> 00:00:45,780 But they're just drawn differently. 15 00:00:45,780 --> 00:00:49,300 So we don't want to confuse a drawing of a graph, 16 00:00:49,300 --> 00:00:51,920 like these two, with the graph itself. 17 00:00:51,920 --> 00:00:55,260 The graph itself consists of just the set of nodes 18 00:00:55,260 --> 00:00:56,390 and the set of edges. 19 00:00:56,390 --> 00:00:58,860 And if you extracted that from these two diagrams, 20 00:00:58,860 --> 00:01:00,400 you would get the same set of nodes 21 00:01:00,400 --> 00:01:03,280 and the same set of edges. 22 00:01:03,280 --> 00:01:05,519 So same graph, different layouts. 23 00:01:05,519 --> 00:01:09,230 But here's a case where it's really the same layout. 24 00:01:09,230 --> 00:01:12,975 You can see these two pictures, if you ignore the labels, 25 00:01:12,975 --> 00:01:15,350 are exactly the same with the two grays and the two grays 26 00:01:15,350 --> 00:01:17,230 and the red and the red.
27 00:01:17,230 --> 00:01:21,080 The difference now is that I've renamed the vertices. 28 00:01:21,080 --> 00:01:26,150 So we've assigned different labels to those vertices. 29 00:01:26,150 --> 00:01:31,150 And the connection between the two graphs now, 30 00:01:31,150 --> 00:01:33,720 this graph with vertices which are integers 31 00:01:33,720 --> 00:01:37,260 and this graph with vertices that are the names of people, 32 00:01:37,260 --> 00:01:40,290 is that they are isomorphic. 33 00:01:40,290 --> 00:01:45,110 And what isomorphism means is that all that matters 34 00:01:45,110 --> 00:01:48,490 between two graphs are their connections. 35 00:01:48,490 --> 00:01:50,950 And so graphs with the same connections 36 00:01:50,950 --> 00:01:54,310 among the same number of vertices 37 00:01:54,310 --> 00:01:55,486 are said to be isomorphic. 38 00:01:58,500 --> 00:02:00,000 To say it more precisely, two graphs 39 00:02:00,000 --> 00:02:02,880 are isomorphic when there's an edge preserving matching 40 00:02:02,880 --> 00:02:04,082 between their vertices. 41 00:02:04,082 --> 00:02:05,540 Matching meaning bijection 42 00:02:05,540 --> 00:02:06,820 between their vertices. 43 00:02:06,820 --> 00:02:08,360 And edge preserving means that where 44 00:02:08,360 --> 00:02:09,990 there is an edge on one side there's 45 00:02:09,990 --> 00:02:12,570 an edge between the corresponding vertices 46 00:02:12,570 --> 00:02:13,320 on the other side. 47 00:02:13,320 --> 00:02:15,370 Let's look at an example. 48 00:02:15,370 --> 00:02:16,870 Here are two graphs. 49 00:02:16,870 --> 00:02:18,800 And I claim that they are isomorphic. 50 00:02:18,800 --> 00:02:22,030 On the left, we've got a bunch of animals, dog, pig, cow, cat. 51 00:02:22,030 --> 00:02:25,080 And on the right we have a bunch of animal foods, hay, corn, 52 00:02:25,080 --> 00:02:26,240 beef, tuna. 53 00:02:26,240 --> 00:02:29,830 And it's a hint on how we're going to do the matching.
54 00:02:29,830 --> 00:02:33,340 So I'm going to tell you that the dog vertex on the left 55 00:02:33,340 --> 00:02:35,890 corresponds to the beef vertex on the right. 56 00:02:35,890 --> 00:02:38,610 So I'm defining a function, a bijection, 57 00:02:38,610 --> 00:02:40,420 from the vertices on the left in blue 58 00:02:40,420 --> 00:02:42,430 to the vertices on the right in red. 59 00:02:42,430 --> 00:02:45,290 And f of dog is beef. 60 00:02:45,290 --> 00:02:47,780 Likewise, f of cat, cats eat tuna. 61 00:02:47,780 --> 00:02:50,420 I'm going to map cat to tuna. 62 00:02:50,420 --> 00:02:53,390 And continuing for the remaining two vertices, 63 00:02:53,390 --> 00:02:56,310 I'm going to map cow to hay, which is what they eat, 64 00:02:56,310 --> 00:03:00,580 and pig to corn, which is frequently what's fed to pigs. 65 00:03:00,580 --> 00:03:03,850 OK, so this is a bijection. 66 00:03:03,850 --> 00:03:05,829 I mean, it's a perfect correspondence 67 00:03:05,829 --> 00:03:07,370 between the four vertices on the left 68 00:03:07,370 --> 00:03:09,140 and the four vertices on the right. 69 00:03:09,140 --> 00:03:11,980 But I have to check now that the edges are preserved. 70 00:03:11,980 --> 00:03:13,630 What does that mean? 71 00:03:13,630 --> 00:03:15,250 Well, let's do an example. 72 00:03:15,250 --> 00:03:18,450 There's an edge on the left between dog and pig. 73 00:03:18,450 --> 00:03:21,700 That means that there should be an edge on the right 74 00:03:21,700 --> 00:03:23,710 between where they go to. 75 00:03:23,710 --> 00:03:27,990 So there ought to be an edge between beef and corn, 76 00:03:27,990 --> 00:03:29,770 because that's where dog and pig go. 77 00:03:29,770 --> 00:03:31,400 And indeed, there's an edge there. 78 00:03:31,400 --> 00:03:32,840 So that part's good. 79 00:03:32,840 --> 00:03:34,309 And you can check the others.
80 00:03:34,309 --> 00:03:36,350 The other thing that we have to check on the left 81 00:03:36,350 --> 00:03:39,310 is since the edge preserving is an if and only 82 00:03:39,310 --> 00:03:41,220 if, there's an edge on the right if and only 83 00:03:41,220 --> 00:03:42,803 if there's an edge on the left, that's 84 00:03:42,803 --> 00:03:45,587 the same as saying there's no edge on the left if and only 85 00:03:45,587 --> 00:03:46,920 if there's no edge on the right. 86 00:03:46,920 --> 00:03:49,460 So let's check non-edges on the left. 87 00:03:49,460 --> 00:03:52,460 There's no edge between cow and pig. 88 00:03:52,460 --> 00:03:58,090 And indeed, cow goes to hay, and pig goes to corn. 89 00:03:58,090 --> 00:04:00,030 And sure enough, there is no edge 90 00:04:00,030 --> 00:04:02,100 on the right between hay and corn. 91 00:04:02,100 --> 00:04:04,490 And you can check the remaining cases. 92 00:04:04,490 --> 00:04:06,510 These two graphs are isomorphic. 93 00:04:06,510 --> 00:04:13,650 And that function f is in fact the edge preserving bijection. 94 00:04:13,650 --> 00:04:19,110 So stating it again, an isomorphism between two 95 00:04:19,110 --> 00:04:23,310 graphs G1 and G2 is a bijection between the vertices V1 of G1 96 00:04:23,310 --> 00:04:26,660 and the vertices V2 of G2 with the property 97 00:04:26,660 --> 00:04:32,880 that there's an edge uv in G1, an E1 edge, if 98 00:04:32,880 --> 00:04:40,550 and only if f of u f of v is an edge in the second graph in E2. 99 00:04:40,550 --> 00:04:42,900 And it's an if and only if that's edge preserving. 100 00:04:42,900 --> 00:04:45,460 So if there's an edge here, there's an edge there. 101 00:04:45,460 --> 00:04:49,770 If there's no edge on the left, there's no edge on the right. 102 00:04:49,770 --> 00:04:52,420 And that's a definition that's worth remembering. 103 00:04:52,420 --> 00:04:55,280 It's basically the same as the digraph case.
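
The definition just stated can be sketched as a small check in Python. This is only an illustration: the function name `is_isomorphism` and the two edge lists are assumptions, since the transcript names just one edge (dog to pig) and one non-edge (cow to pig) of the pictured graphs. The mapping `f` is the one described in the lecture.

```python
def is_isomorphism(f, vertices1, edges1, vertices2, edges2):
    """Return True if the dict f is an edge-preserving bijection."""
    # f must be defined on exactly the vertices of the first graph.
    if set(f) != set(vertices1):
        return False
    # f must hit every vertex of the second graph exactly once.
    images = list(f.values())
    if len(set(images)) != len(images) or set(images) != set(vertices2):
        return False
    # Simple-graph edges are unordered, so store each one as a frozenset.
    mapped = {frozenset((f[u], f[v])) for u, v in edges1}
    target = {frozenset(e) for e in edges2}
    # Because f is a bijection, equality of these two sets is exactly
    # the "if and only if" in the definition.
    return mapped == target


animals = ["dog", "pig", "cow", "cat"]
foods = ["hay", "corn", "beef", "tuna"]

# Hypothetical edge lists, chosen only to agree with the two facts the
# lecture states: dog-pig is an edge, cow-pig is not.
animal_edges = [("dog", "pig"), ("dog", "cat"), ("cat", "cow"), ("pig", "cat")]
food_edges = [("beef", "corn"), ("beef", "tuna"), ("tuna", "hay"), ("corn", "tuna")]

# The matching from the lecture: each animal goes to what it eats.
f = {"dog": "beef", "cat": "tuna", "cow": "hay", "pig": "corn"}

# A different bijection that does not preserve edges in this example.
g = {"dog": "hay", "cat": "tuna", "cow": "beef", "pig": "corn"}

print(is_isomorphism(f, animals, animal_edges, foods, food_edges))  # True
print(is_isomorphism(g, animals, animal_edges, foods, food_edges))  # False
```

Comparing the mapped edge set with the target edge set checks edges and non-edges at once, which is why no separate loop over non-edges is needed.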
104 00:04:55,280 --> 00:04:58,170 Except in the digraph case, the edges have a direction. 105 00:04:58,170 --> 00:05:01,320 So it would be an edge from u to v if and only 106 00:05:01,320 --> 00:05:04,570 if there is an edge from f of u to f of v. 107 00:05:04,570 --> 00:05:06,850 But since we don't have to worry about direction 108 00:05:06,850 --> 00:05:10,050 in the simple case, the definition gets 109 00:05:10,050 --> 00:05:12,560 slightly simpler. 110 00:05:12,560 --> 00:05:14,455 What about non-isomorphism? 111 00:05:14,455 --> 00:05:16,580 How do you show that two graphs are not isomorphic? 112 00:05:16,580 --> 00:05:17,954 I can show you the two graphs are 113 00:05:17,954 --> 00:05:21,300 isomorphic by simply telling you what the bijection 114 00:05:21,300 --> 00:05:22,462 between their vertices is. 115 00:05:22,462 --> 00:05:23,920 And then it becomes a simple matter 116 00:05:23,920 --> 00:05:26,820 of checking whether the edges that should be there 117 00:05:26,820 --> 00:05:28,244 are there or not. 118 00:05:28,244 --> 00:05:30,535 How do you figure out the two graphs are not isomorphic 119 00:05:30,535 --> 00:05:33,220 and that there isn't any bijection 120 00:05:33,220 --> 00:05:34,450 that preserves edges? 121 00:05:34,450 --> 00:05:37,180 Well, for a start, these both have four vertices, 122 00:05:37,180 --> 00:05:38,250 so it's perfect. 123 00:05:38,250 --> 00:05:40,200 There are lots of bijections between the four 124 00:05:40,200 --> 00:05:43,250 vertices on the left and the four vertices on the right. 125 00:05:43,250 --> 00:05:45,740 Why isn't there an edge preserving one? 126 00:05:45,740 --> 00:05:49,190 Well, if you look at the graph on the left, 127 00:05:49,190 --> 00:05:53,520 it's actually got two vertices of degree 2 marked in red here. 128 00:05:53,520 --> 00:05:54,860 There's a degree 2 vertex. 129 00:05:54,860 --> 00:05:56,960 There's a degree 2 vertex.
130 00:05:56,960 --> 00:06:01,830 And on the right, every vertex is degree 3, if you check. 131 00:06:01,830 --> 00:06:05,540 Now one of the properties of isomorphism 132 00:06:05,540 --> 00:06:13,580 is that the edges that come out of the red, these two edges, 133 00:06:13,580 --> 00:06:17,520 have to correspond to two edges that come out 134 00:06:17,520 --> 00:06:19,120 of wherever it's mapped to. 135 00:06:19,120 --> 00:06:23,140 So a degree 2 vertex can only map to a degree 2 vertex. 136 00:06:23,140 --> 00:06:24,320 There aren't any. 137 00:06:24,320 --> 00:06:27,300 That's a proof that there can't be an isomorphism 138 00:06:27,300 --> 00:06:29,300 between the two graphs. 139 00:06:29,300 --> 00:06:32,120 So in general, the idea is that we're 140 00:06:32,120 --> 00:06:34,890 looking at properties that are preserved by isomorphism. 141 00:06:34,890 --> 00:06:38,810 This is almost like a state machine invariant kind of idea. 142 00:06:38,810 --> 00:06:43,570 So a property is preserved by isomorphism 143 00:06:43,570 --> 00:06:46,670 means that if two graphs-- if graph one has 144 00:06:46,670 --> 00:06:49,720 the property and graph one is isomorphic to graph two, 145 00:06:49,720 --> 00:06:52,880 then graph two has the property. 146 00:06:52,880 --> 00:06:55,630 And clearly if there's a property that's 147 00:06:55,630 --> 00:06:58,270 preserved by isomorphism and one graph has it 148 00:06:58,270 --> 00:06:59,830 and the other graph doesn't have it, 149 00:06:59,830 --> 00:07:02,512 that's a proof that they can't be isomorphic. 150 00:07:02,512 --> 00:07:04,220 So what are some of these properties that 151 00:07:04,220 --> 00:07:05,510 are preserved by isomorphism? 152 00:07:05,510 --> 00:07:06,740 Well, the number of nodes. 153 00:07:06,740 --> 00:07:08,600 Clearly there's got to be a bijection, 154 00:07:08,600 --> 00:07:11,446 so they have to have the same number of nodes.
155 00:07:11,446 --> 00:07:12,820 They have to have the same number 156 00:07:12,820 --> 00:07:14,210 of edges for similar reasons. 157 00:07:14,210 --> 00:07:15,890 Because the edges are preserved. 158 00:07:15,890 --> 00:07:20,034 An edge on one side corresponds to an edge on the other side. 159 00:07:20,034 --> 00:07:21,450 Another thing that matters is we've 160 00:07:21,450 --> 00:07:23,540 just made this argument that the degrees 161 00:07:23,540 --> 00:07:28,920 are preserved as a consequence of the preserving of the edges. 162 00:07:28,920 --> 00:07:32,520 And all sorts of other structural properties 163 00:07:32,520 --> 00:07:35,610 are going to be preserved by isomorphism, like for example, 164 00:07:35,610 --> 00:07:39,650 the existence of circular paths, and distances between vertices, 165 00:07:39,650 --> 00:07:40,800 and things like that. 166 00:07:40,800 --> 00:07:42,390 Those will all be properties that 167 00:07:42,390 --> 00:07:46,330 are preserved by isomorphism. 168 00:07:46,330 --> 00:07:48,680 So that gives you a hook on trying 169 00:07:48,680 --> 00:07:51,810 to figure out whether or not two graphs are or are not 170 00:07:51,810 --> 00:07:53,150 isomorphic. 171 00:07:53,150 --> 00:07:56,220 But in general, there will be, if you've 172 00:07:56,220 --> 00:07:58,790 got a graph with a few hundred or a thousand vertices, 173 00:07:58,790 --> 00:08:01,380 there are an awful lot of potential bijections 174 00:08:01,380 --> 00:08:03,330 between them to check. 175 00:08:03,330 --> 00:08:06,330 And the question is, how do you do it? 176 00:08:06,330 --> 00:08:11,460 It's a huge search that can't really be effectively done 177 00:08:11,460 --> 00:08:12,960 exhaustively. 178 00:08:12,960 --> 00:08:15,250 So what you look for is properties 179 00:08:15,250 --> 00:08:17,370 that are preserved by isomorphisms 180 00:08:17,370 --> 00:08:18,690 that give you a guide.
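
The simplest preserved properties mentioned so far (number of nodes, number of edges, degrees) can be turned into a quick screening test. The sketch below is one possible way to do that in Python. The names `degree_sequence` and `may_be_isomorphic` are made up for illustration, and the two four-vertex graphs are assumptions chosen to match the description of the non-isomorphic pair (two degree 2 vertices on the left, every vertex of degree 3 on the right), not a transcription of the slide.

```python
def degree_sequence(vertices, edges):
    """Sorted list of vertex degrees of a simple graph."""
    degree = {v: 0 for v in vertices}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.values())


def may_be_isomorphic(vertices1, edges1, vertices2, edges2):
    """False means provably not isomorphic; True means only 'not ruled out'."""
    return (
        len(vertices1) == len(vertices2)
        and len(edges1) == len(edges2)
        and degree_sequence(vertices1, edges1) == degree_sequence(vertices2, edges2)
    )


# Assumed left graph: four vertices, two of them (c and d) of degree 2.
left_vertices = ["a", "b", "c", "d"]
left_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]

# Assumed right graph: four vertices, every one of degree 3.
right_vertices = ["w", "x", "y", "z"]
right_edges = [("w", "x"), ("w", "y"), ("w", "z"),
               ("x", "y"), ("x", "z"), ("y", "z")]

print(degree_sequence(left_vertices, left_edges))    # [2, 2, 3, 3]
print(degree_sequence(right_vertices, right_edges))  # [3, 3, 3, 3]
print(may_be_isomorphic(left_vertices, left_edges,
                        right_vertices, right_edges))  # False
```

A `False` answer is a proof of non-isomorphism. A `True` answer proves nothing, because two graphs can agree on all three counts and still fail to be isomorphic.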
181 00:08:18,690 --> 00:08:22,030 So for example, if the graph on the left 182 00:08:22,030 --> 00:08:24,810 happens to have a degree 4 vertex and that degree 183 00:08:24,810 --> 00:08:28,580 4 vertex is adjacent to a degree 3 vertex, 184 00:08:28,580 --> 00:08:31,470 then the adjacency of a degree 4 and a degree 3 185 00:08:31,470 --> 00:08:34,330 is a typical property that's preserved by isomorphism. 186 00:08:34,330 --> 00:08:35,830 So you know for sure that if there's 187 00:08:35,830 --> 00:08:39,250 going to be a bijection between the first graph 188 00:08:39,250 --> 00:08:43,120 and the second graph, this pair of adjacent vertices of degree 189 00:08:43,120 --> 00:08:47,080 4 and degree 3 can only map to another pair 190 00:08:47,080 --> 00:08:49,150 of adjacent vertices in the second graph that 191 00:08:49,150 --> 00:08:51,530 also have degrees 4 and 3. 192 00:08:51,530 --> 00:08:54,810 So that will cut down enormously the number of places 193 00:08:54,810 --> 00:08:58,550 that this given vertex can map to in the other graph. 194 00:08:58,550 --> 00:09:02,010 And it gives you some structure to use 195 00:09:02,010 --> 00:09:05,430 to try to narrow down the search for the number of isomorphisms, 196 00:09:05,430 --> 00:09:10,710 and where the isomorphism is, and whether or not it exists. 197 00:09:10,710 --> 00:09:16,140 So having a degree 4 adjacent to a degree 3, for example, 198 00:09:16,140 --> 00:09:20,370 is a typical property that's preserved under isomorphism. 199 00:09:20,370 --> 00:09:26,080 But even so, if I give you two very large graphs, 200 00:09:26,080 --> 00:09:29,160 and these are actually extracted graphs from some communication 201 00:09:29,160 --> 00:09:32,750 network, an image of them, it's very hard 202 00:09:32,750 --> 00:09:34,500 to tell whether or not they're isomorphic. 203 00:09:34,500 --> 00:09:35,770 Well, you could guess, because of course, we 204 00:09:35,770 --> 00:09:37,519 took the same picture and copied it twice.
205 00:09:37,519 --> 00:09:40,730 But if there was some subtle difference between these two, 206 00:09:40,730 --> 00:09:43,742 like I erased one edge somewhere in the middle of that mess, 207 00:09:43,742 --> 00:09:45,950 how would you figure out that the two graphs were not 208 00:09:45,950 --> 00:09:47,700 isomorphic in that case? 209 00:09:47,700 --> 00:09:52,170 And the answer is that like these NP complete problems, 210 00:09:52,170 --> 00:09:55,790 there is no known procedure to check whether or not 211 00:09:55,790 --> 00:10:00,330 two graphs are isomorphic that is guaranteed to be efficient 212 00:10:00,330 --> 00:10:03,040 and to run in polynomial time. 213 00:10:03,040 --> 00:10:05,341 On the other hand, there are technical reasons, 214 00:10:05,341 --> 00:10:06,840 there are technical properties, that 215 00:10:06,840 --> 00:10:09,660 say that graph isomorphism is not 216 00:10:09,660 --> 00:10:12,500 one of these NP complete problems, 217 00:10:12,500 --> 00:10:16,100 unless P equals NP or something like that. 218 00:10:16,100 --> 00:10:21,520 And so that's one distinguishing characteristic of this problem. 219 00:10:21,520 --> 00:10:23,680 The important one is that, as a matter of fact, 220 00:10:23,680 --> 00:10:27,830 in practice there are some really good isomorphism 221 00:10:27,830 --> 00:10:33,070 programs around that will in many cases 222 00:10:33,070 --> 00:10:35,540 figure out, given two graphs, whether or not 223 00:10:35,540 --> 00:10:38,870 they are isomorphic in time that's approximately 224 00:10:38,870 --> 00:10:40,670 the size of the two graphs. 225 00:10:40,670 --> 00:10:42,730 So pragmatically, graph isomorphism 226 00:10:42,730 --> 00:10:45,120 seems to be a manageable problem. 227 00:10:45,120 --> 00:10:49,970 Although theoretically you can't be sure 228 00:10:49,970 --> 00:10:52,590 that these efficient procedures that work most of the time 229 00:10:52,590 --> 00:10:54,090 are going to work always.
230 00:10:54,090 --> 00:10:57,350 Well, known procedures in fact blow up exponentially 231 00:10:57,350 --> 00:11:00,430 on some example or another.
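
To make the idea of a guided search concrete, here is a minimal backtracking sketch in Python. It is not one of the practical isomorphism programs referred to in the lecture, and the function name `find_isomorphism` and the test graphs are hypothetical. It uses only the degree pruning described earlier (a vertex may map only to a vertex of the same degree) plus a consistency check against vertices already matched, and it can still take exponential time on bad inputs.

```python
def find_isomorphism(vertices1, edges1, vertices2, edges2):
    """Return an edge-preserving bijection as a dict, or None if none exists."""
    if len(vertices1) != len(vertices2) or len(edges1) != len(edges2):
        return None

    # Adjacency sets for both simple graphs.
    adj1 = {v: set() for v in vertices1}
    adj2 = {v: set() for v in vertices2}
    for u, v in edges1:
        adj1[u].add(v)
        adj1[v].add(u)
    for u, v in edges2:
        adj2[u].add(v)
        adj2[v].add(u)

    order = list(vertices1)

    def extend(mapping, used):
        if len(mapping) == len(order):
            return dict(mapping)
        u = order[len(mapping)]
        for w in vertices2:
            # Degree pruning: degrees are preserved by isomorphism.
            if w in used or len(adj1[u]) != len(adj2[w]):
                continue
            # Edges and non-edges to already matched vertices must agree.
            if all((x in adj1[u]) == (mapping[x] in adj2[w]) for x in mapping):
                mapping[u] = w
                used.add(w)
                result = extend(mapping, used)
                if result is not None:
                    return result
                # Undo the choice and try the next candidate.
                del mapping[u]
                used.discard(w)
        return None

    return extend({}, set())


# Hypothetical test graphs: a 4-cycle with integer labels and the same
# cycle with letter labels listed in a scrambled order.
cycle_vertices = [1, 2, 3, 4]
cycle_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
relabeled_vertices = ["a", "b", "c", "d"]
relabeled_edges = [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")]

# A path and a star both have four vertices and three edges, yet no
# edge-preserving bijection exists between them.
path_edges = [(1, 2), (2, 3), (3, 4)]
star_edges = [("a", "b"), ("a", "c"), ("a", "d")]

print(find_isomorphism(cycle_vertices, cycle_edges,
                       relabeled_vertices, relabeled_edges) is not None)  # True
print(find_isomorphism(cycle_vertices, path_edges,
                       relabeled_vertices, star_edges))  # None
```

The search returns the first bijection it finds, so a successful result doubles as a certificate that can be checked edge by edge.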