1 00:00:02,300 --> 00:00:05,430 PROFESSOR: Trees are about the most basic data structure 2 00:00:05,430 --> 00:00:07,050 that you're ever going to come across. 3 00:00:07,050 --> 00:00:09,910 They pervade computer science and other subjects. 4 00:00:09,910 --> 00:00:11,500 So let's talk about them. 5 00:00:11,500 --> 00:00:13,540 And the simplest definition of a tree 6 00:00:13,540 --> 00:00:16,660 is that a tree is a connected graph with no cycles. 7 00:00:16,660 --> 00:00:20,470 In this setting we're talking about simple graphs, and trees 8 00:00:20,470 --> 00:00:22,949 with undirected edges. 9 00:00:22,949 --> 00:00:24,490 Well, in order to make sense of that, 10 00:00:24,490 --> 00:00:26,114 we better have a definition of a cycle. 11 00:00:26,114 --> 00:00:28,720 There's a picture of a typical tree, 12 00:00:28,720 --> 00:00:32,299 but to be precise, what's a cycle in a simple graph? 13 00:00:32,299 --> 00:00:35,850 Well, it's a closed walk of length greater than 2 14 00:00:35,850 --> 00:00:37,630 that doesn't cross itself. 15 00:00:37,630 --> 00:00:40,330 So the not crossing itself is the standard definition 16 00:00:40,330 --> 00:00:43,410 of a cycle that we were using in a directed graph. 17 00:00:43,410 --> 00:00:47,580 It simply means that it's a path, except that the beginning 18 00:00:47,580 --> 00:00:48,950 and end vertex are the same. 19 00:00:48,950 --> 00:00:52,610 So it looks like you start someplace at v, 20 00:00:52,610 --> 00:00:54,540 and then you go around to a and to w, 21 00:00:54,540 --> 00:00:58,020 and it's all distinct vertices as you go around in this path. 22 00:00:58,020 --> 00:01:00,180 Except that the path ends where it 23 00:01:00,180 --> 00:01:02,660 starts at v, which is what keeps it from being a path 24 00:01:02,660 --> 00:01:04,330 and makes it a cycle. 25 00:01:04,330 --> 00:01:05,890 Now, the length greater than 2 is 26 00:01:05,890 --> 00:01:08,380 what is the difference between the definition of cycle 27 00:01:08,380 --> 00:01:12,490 [? between ?] simple graphs and directed graphs. 28 00:01:12,490 --> 00:01:14,490 And in a directed graph, it's perfectly possible 29 00:01:14,490 --> 00:01:17,130 to have a self-loop of length, 1, 30 00:01:17,130 --> 00:01:19,390 that is an interesting and important kind of cycle 31 00:01:19,390 --> 00:01:19,890 to have. 32 00:01:19,890 --> 00:01:22,210 But we forbid them in simple graphs, 33 00:01:22,210 --> 00:01:26,620 because there's no way to avoid having a cycle of length 2, 34 00:01:26,620 --> 00:01:28,660 because you always have the ability 35 00:01:28,660 --> 00:01:31,240 to go back and forth across an edge-- that's not interesting. 36 00:01:31,240 --> 00:01:34,590 And so we don't consider that to be a cycle. 37 00:01:34,590 --> 00:01:37,320 A cycle, then, has to be of length greater than 2. 38 00:01:37,320 --> 00:01:40,230 It also rules out the cycle of length 0, 39 00:01:40,230 --> 00:01:43,600 which you get by taking a vertex all by itself. 40 00:01:43,600 --> 00:01:45,521 OK, with that technical definition, 41 00:01:45,521 --> 00:01:47,520 we now know what a cycle is in the simple graph, 42 00:01:47,520 --> 00:01:50,500 and we understand the definition of tree. 43 00:01:50,500 --> 00:01:52,100 Here are some more pictures of trees-- 44 00:01:52,100 --> 00:01:55,710 simple graphs with no cycles. 45 00:01:55,710 --> 00:01:58,040 Now, they really come up all the time. 46 00:01:58,040 --> 00:01:58,750 And why is that? 47 00:01:58,750 --> 00:02:00,410 Well, there are family trees, which 48 00:02:00,410 --> 00:02:02,530 you may be familiar with-- where you're 49 00:02:02,530 --> 00:02:06,460 drawing a picture of the descendants of a given person, 50 00:02:06,460 --> 00:02:09,830 and they keep branching out in a tree structure, 51 00:02:09,830 --> 00:02:13,057 as traditionally displayed. 52 00:02:13,057 --> 00:02:15,140 There are search trees, which come up all the time 53 00:02:15,140 --> 00:02:18,220 in computer science, where you branch on the answer 54 00:02:18,220 --> 00:02:21,076 to some question, which tells you which way to search next. 55 00:02:21,076 --> 00:02:22,940 There are game trees, which we've already 56 00:02:22,940 --> 00:02:24,920 discussed in this class, which are used 57 00:02:24,920 --> 00:02:26,930 to define games and strategies. 58 00:02:26,930 --> 00:02:31,670 There are parse trees, that come up in compiler technology, 59 00:02:31,670 --> 00:02:33,592 and in language theory. 60 00:02:33,592 --> 00:02:35,550 And there are spanning trees, which we're going 61 00:02:35,550 --> 00:02:38,900 to be talking about some today. 62 00:02:38,900 --> 00:02:42,557 Now, in addition to these places where trees come up, 63 00:02:42,557 --> 00:02:44,390 there are a lot of different kinds of trees. 64 00:02:44,390 --> 00:02:46,790 There's rooted trees, where there's some designated 65 00:02:46,790 --> 00:02:48,930 vertex called the root, and you think 66 00:02:48,930 --> 00:02:54,080 of getting to all the other vertices from the root. 67 00:02:54,080 --> 00:02:57,620 There are ordered trees, where when you're at a given vertex, 68 00:02:57,620 --> 00:03:04,440 there's a distinct order in which the exit 69 00:03:04,440 --> 00:03:07,867 edges from a vertex-- there's a first one, and a second one, 70 00:03:07,867 --> 00:03:10,450 and a third one, or a left one, and a next, left, most, and so 71 00:03:10,450 --> 00:03:12,750 on, so that there is an order in which you 72 00:03:12,750 --> 00:03:15,330 can choose to leave the vertex. 73 00:03:15,330 --> 00:03:17,310 There are a binary trees, in which 74 00:03:17,310 --> 00:03:22,740 each vertex has two ways out exactly-- or no ways out 75 00:03:22,740 --> 00:03:24,050 if it's a so-called leaf. 76 00:03:24,050 --> 00:03:28,532 And then there are a complete trees, whose definition 77 00:03:28,532 --> 00:03:30,240 is not important to us, because we're not 78 00:03:30,240 --> 00:03:31,531 going to consider any of these. 79 00:03:31,531 --> 00:03:35,430 There's also, by the way, directed trees in which edges 80 00:03:35,430 --> 00:03:38,870 have a direction, as in a [? di-graph ?] tree. 81 00:03:38,870 --> 00:03:40,700 But we're not considering any of these. 82 00:03:40,700 --> 00:03:43,510 We're going to focus on so-called pure trees, which 83 00:03:43,510 --> 00:03:46,310 are unordered, unrooted, undirected, 84 00:03:46,310 --> 00:03:48,220 and that's what we're talking about. 85 00:03:48,220 --> 00:03:50,030 So let's examine some more properties 86 00:03:50,030 --> 00:03:52,640 of trees and equivalent definitions of trees. 87 00:03:52,640 --> 00:03:55,060 It will be important for theoretical reasons 88 00:03:55,060 --> 00:03:58,180 and convenience to know lots of different characterizations 89 00:03:58,180 --> 00:03:58,780 of trees. 90 00:03:58,780 --> 00:04:01,220 So we're starting off with a definition which says, 91 00:04:01,220 --> 00:04:05,550 it's a connected simple graph with no cycles, 92 00:04:05,550 --> 00:04:08,050 but there's other ways to characterize it. 93 00:04:08,050 --> 00:04:11,760 So an edge in a simple graph is called a cut edge, 94 00:04:11,760 --> 00:04:15,370 if, when you remove it from the graph, 95 00:04:15,370 --> 00:04:17,329 two vertices that used to be connected-- 96 00:04:17,329 --> 00:04:19,079 that is used to have a path between them-- 97 00:04:19,079 --> 00:04:21,079 cease to have a path between them. 98 00:04:21,079 --> 00:04:24,200 So here's a simple graph illustration. 99 00:04:24,200 --> 00:04:29,760 And that edge, e, is a cut edge, because if I delete it, then 100 00:04:29,760 --> 00:04:31,340 there are now two components. 101 00:04:31,340 --> 00:04:34,320 There used to be two vertices-- actually any 102 00:04:34,320 --> 00:04:36,100 of the vertices here used to be connected 103 00:04:36,100 --> 00:04:38,510 to any of the vertices there via that edge. 104 00:04:38,510 --> 00:04:41,100 But once I've deleted that edge, all of those 105 00:04:41,100 --> 00:04:44,161 vertices here and there that used to be connected no longer 106 00:04:44,161 --> 00:04:44,660 are. 107 00:04:44,660 --> 00:04:47,710 So that makes e a cut edge. 108 00:04:47,710 --> 00:04:48,870 f is not a cut edge. 109 00:04:48,870 --> 00:04:51,900 Because even if I delete f, there 110 00:04:51,900 --> 00:04:54,220 is, in fact, still a path from every vertex 111 00:04:54,220 --> 00:04:56,330 to every other vertex here, so that f is not 112 00:04:56,330 --> 00:04:57,380 disconnecting anything. 113 00:05:00,160 --> 00:05:03,070 And as I say, it's still connected after you delete it. 114 00:05:03,070 --> 00:05:06,610 So now we get a simple way to characterize trees 115 00:05:06,610 --> 00:05:11,380 in terms of cut edges-- because an edge is not a cut edge if 116 00:05:11,380 --> 00:05:12,890 and only if it's on a cycle. 117 00:05:12,890 --> 00:05:15,690 If you think about that, if it's on a cycle, 118 00:05:15,690 --> 00:05:18,150 and you cut an edge out of a cycle, 119 00:05:18,150 --> 00:05:19,791 then everything on the cycle is still 120 00:05:19,791 --> 00:05:22,040 connected by going the other way around the cycle that 121 00:05:22,040 --> 00:05:24,290 doesn't use that edge. 122 00:05:24,290 --> 00:05:27,780 And if it's not on a cycle, then, in fact, 123 00:05:27,780 --> 00:05:29,830 you can think through that deleting it 124 00:05:29,830 --> 00:05:35,470 means that there's not going to be two paths between two 125 00:05:35,470 --> 00:05:37,470 things at its endpoints. 126 00:05:37,470 --> 00:05:41,410 And so it will separate them. 127 00:05:41,410 --> 00:05:44,110 OK, so another way, then, to define 128 00:05:44,110 --> 00:05:47,390 a tree is to say a tree is a connected graph where 129 00:05:47,390 --> 00:05:50,480 every edge is a cut edge-- that is 130 00:05:50,480 --> 00:05:52,860 as soon as you cut any edge out of a tree 131 00:05:52,860 --> 00:05:55,850 it stops being connected. 132 00:05:55,850 --> 00:05:58,480 That yields another way to say that something 133 00:05:58,480 --> 00:06:02,760 is a tree-- a tree is a simple graph that is connected 134 00:06:02,760 --> 00:06:04,970 and is edge-minimal, which, again, 135 00:06:04,970 --> 00:06:07,620 means that if you remove any edge, 136 00:06:07,620 --> 00:06:10,070 it stops having that property of being connected. 137 00:06:10,070 --> 00:06:12,512 So its an edge-minimal connected graph. 138 00:06:12,512 --> 00:06:14,720 That's kind of the reason why trees are so important, 139 00:06:14,720 --> 00:06:17,000 because if you're trying to figure out a way 140 00:06:17,000 --> 00:06:19,650 to get a whole bunch of things-- a whole bunch of vertices-- 141 00:06:19,650 --> 00:06:22,540 connected, a tree is going to have 142 00:06:22,540 --> 00:06:25,980 the minimum number of edges that are sufficient to get them all 143 00:06:25,980 --> 00:06:26,690 connected. 144 00:06:26,690 --> 00:06:30,530 If you think about different nodes in a network that 145 00:06:30,530 --> 00:06:32,030 need to communicate with each other, 146 00:06:32,030 --> 00:06:34,488 and you want to know how many direct connections that there 147 00:06:34,488 --> 00:06:37,690 have to be between these communication 148 00:06:37,690 --> 00:06:40,470 centers in order for everybody to talk to everybody else, 149 00:06:40,470 --> 00:06:44,160 the answer is-- it's got to be a tree on n vertices. 150 00:06:44,160 --> 00:06:45,680 And a tree on n vertices is going 151 00:06:45,680 --> 00:06:48,500 to have exactly n minus 1 edges. 152 00:06:48,500 --> 00:06:51,300 So that gives you still another equivalent definition 153 00:06:51,300 --> 00:06:52,210 of a tree. 154 00:06:52,210 --> 00:06:55,325 A tree is a connected graph that has n vertices, 155 00:06:55,325 --> 00:06:57,070 and n minus 1 edges. 156 00:06:59,770 --> 00:07:01,870 A kind of dual way to think about it 157 00:07:01,870 --> 00:07:08,000 is that a tree is an acyclic graph that 158 00:07:08,000 --> 00:07:10,780 has as many edges as it possibly could 159 00:07:10,780 --> 00:07:12,660 without having any cycles. 160 00:07:12,660 --> 00:07:16,210 So typically, an acyclic graph might not be connected, 161 00:07:16,210 --> 00:07:17,640 but as long as it's not connected, 162 00:07:17,640 --> 00:07:20,340 you can keep adding edges that will connect things up 163 00:07:20,340 --> 00:07:22,200 without creating cycles. 164 00:07:22,200 --> 00:07:24,030 But the minute you get a tree, so 165 00:07:24,030 --> 00:07:27,500 that everything is connected, you can't add another edge. 166 00:07:27,500 --> 00:07:31,780 So an edge maximal acyclic graph is still another way 167 00:07:31,780 --> 00:07:33,150 to characterize trees. 168 00:07:33,150 --> 00:07:36,480 And maybe the most useful way is to say 169 00:07:36,480 --> 00:07:40,840 that a graph, in which there is a unique path between any two 170 00:07:40,840 --> 00:07:43,010 vertices, is a tree. 171 00:07:43,010 --> 00:07:45,810 So of course, if there is a unique path-- in particular, 172 00:07:45,810 --> 00:07:48,730 there's a path, so all the vertices have to be connected. 173 00:07:48,730 --> 00:07:51,860 But what makes it a tree is that there aren't two different ways 174 00:07:51,860 --> 00:07:54,565 to connect between two vertices, because as soon as there 175 00:07:54,565 --> 00:07:56,500 were there would be a cycle. 176 00:07:56,500 --> 00:08:00,430 And those are some of the basic ways that trees 177 00:08:00,430 --> 00:08:03,300 can be formulated equivalently. 178 00:08:03,300 --> 00:08:07,310 And in fact, there's lots more, but this is enough for today.