1
00:00:07,000 --> 00:00:10,000
Good morning. It looks like 9:30 is getting

2
00:00:10,000 --> 00:00:13,000
earlier and earlier for everyone.

3
00:00:13,000 --> 00:00:17,000
Hello to all the people watching at home.

4
00:00:17,000 --> 00:00:22,000
I think there should be a requirement that if you're

5
00:00:22,000 --> 00:00:26,000
watching the video, you can only watch it

6
00:00:26,000 --> 00:00:31,000
9:30-11:00 on Sunday, or at least start watching then

7
00:00:31,000 --> 00:00:36,000
just so you can all feel our mornings.

8
00:00:36,000 --> 00:00:40,000
Today, we're going to talk about balanced search trees.

9
00:00:40,000 --> 00:00:42,000
Now, we've hinted at this for a while.

10
00:00:42,000 --> 00:00:46,000
Our goal today is to get a search tree data structure,

11
00:00:46,000 --> 00:00:50,000
so we can insert, delete, and search all at log n

12
00:00:50,000 --> 00:00:53,000
time per operation. So, we want a tree that's

13
00:00:53,000 --> 00:00:56,000
guaranteed to be log n in height.

14
00:00:56,000 --> 00:01:01,000
So, that's a balanced search tree data structure.

15
00:01:13,000 --> 00:01:18,000
And, we want a data structure that can maintain a dynamic set

16
00:01:18,000 --> 00:01:22,000
of n elements in log n time per operation.

17
00:01:32,000 --> 00:01:35,000
So, we'll say, using a tree of height order

18
00:01:35,000 --> 00:01:38,000
log n. Now, if you look very closely,

19
00:01:38,000 --> 00:01:42,000
we haven't actually defined what a search tree data

20
00:01:42,000 --> 00:01:45,000
structure is. We've defined what a binary

21
00:01:45,000 --> 00:01:50,000
search tree data structure is, and that's one particular kind.

22
00:01:50,000 --> 00:01:54,000
And that's what we will be focusing on today.

23
00:01:54,000 --> 00:01:57,000
In recitation on Friday, we will look at,

24
00:01:57,000 --> 00:02:01,000
or you will look at, balanced search trees that are

25
00:02:01,000 --> 00:02:07,000
not necessarily binary.
Each node can have a constant

26
00:02:07,000 --> 00:02:09,000
number of children, not just two.

27
00:02:09,000 --> 00:02:13,000
So, I'm defining it generally. You'll actually see what a search

28
00:02:13,000 --> 00:02:15,000
tree is in the general case later on.

29
00:02:15,000 --> 00:02:19,000
Today, we will just be focusing on the binary case.

30
00:02:19,000 --> 00:02:22,000
So, I won't define this yet. So, there are a lot of

31
00:02:22,000 --> 00:02:26,000
different balanced search tree data structures.

32
00:02:26,000 --> 00:02:30,000
So, these are the main ones that I know of.

33
00:02:30,000 --> 00:02:35,000
The first one was AVL trees. This was invented in 1962.

34
00:02:35,000 --> 00:02:41,000
So, that was the beginning of fast data structures.

35
00:02:41,000 --> 00:02:47,000
The next three sort of come together and this is what you

36
00:02:47,000 --> 00:02:51,000
will cover in recitation this week.

37
00:02:51,000 --> 00:02:56,000
So, these are non-binary trees. Instead of binary,

38
00:02:56,000 --> 00:03:02,000
we have maybe binary and ternary, or maybe binary,

39
00:03:02,000 --> 00:03:08,000
ternary, and quaternary, or, more generally, a constant degree,

40
00:03:08,000 --> 00:03:13,000
B. So, that's another way you can

41
00:03:13,000 --> 00:03:16,000
get balance. Two-three trees,

42
00:03:16,000 --> 00:03:22,000
which were the second trees to be invented, they were invented

43
00:03:22,000 --> 00:03:27,000
in 1970 by Hopcroft. The trees we will cover today

44
00:03:27,000 --> 00:03:33,000
are called red black trees. These are binary search trees

45
00:03:33,000 --> 00:03:38,000
of guaranteed logarithmic height.

46
00:03:38,000 --> 00:03:44,000
So then, there are some others. So, skip lists are ones that we

47
00:03:44,000 --> 00:03:49,000
will cover next week.
It's not exactly a tree,

48
00:03:49,000 --> 00:03:55,000
but it's more or less a tree, and one that you will see in

49
00:03:55,000 --> 00:04:02,000
your problem set this week is treaps, which I won't talk too

50
00:04:02,000 --> 00:04:06,000
much about here. But they are in some sense

51
00:04:06,000 --> 00:04:09,000
easier to get because they essentially just rely on the

52
00:04:09,000 --> 00:04:13,000
material from last Monday. So, on Monday we saw that if we

53
00:04:13,000 --> 00:04:16,000
just randomly built a binary search tree, it's going to have

54
00:04:16,000 --> 00:04:19,000
log n height most of the time, in expectation.

55
00:04:19,000 --> 00:04:22,000
So, treaps are a way to make that dynamic,

56
00:04:22,000 --> 00:04:25,000
so that instead of just having a static set of n items,

57
00:04:25,000 --> 00:04:28,000
you can insert into and delete from that set and still

58
00:04:28,000 --> 00:04:33,000
effectively randomly permute the items and put them in a tree.

59
00:04:33,000 --> 00:04:35,000
So in some sense, it's the easiest.

60
00:04:35,000 --> 00:04:39,000
It's also one of the most recent search tree data

61
00:04:39,000 --> 00:04:42,000
structures. That was invented in 1996 by a

62
00:04:42,000 --> 00:04:45,000
couple of geometers, Raimund Seidel and Cecilia Aragon.

63
00:04:45,000 --> 00:04:49,000
So, those are just some search tree data structures.

64
00:04:49,000 --> 00:04:53,000
The only ones we will not cover in this class are AVL trees.

65
00:04:53,000 --> 00:04:57,000
They're not too hard. If you're interested,

66
00:04:57,000 --> 00:05:00,000
you should read about them because they're fun.

67
00:05:00,000 --> 00:05:05,000
I think they are a problem in the textbook.

68
00:05:05,000 --> 00:05:10,000
OK, but today, we're going to focus on red

69
00:05:10,000 --> 00:05:17,000
black trees, which is a fairly simple idea, red black trees.
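Because treaps come up in the problem set, here is a minimal sketch of one common formulation. It assumes a min-heap on priorities drawn uniformly at random, and the names `TreapNode`, `insert`, `rotate_left`, and `rotate_right` are illustrative, not from the lecture. Keys are kept in binary-search-tree order and the random priorities in heap order. As a result, the tree has the same shape as a BST built by inserting the keys in random order, which is the "effectively randomly permute them" idea above.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreapNode:
    key: int
    # Random priority; a parent's priority is never larger than its children's (min-heap).
    priority: float = field(default_factory=random.random)
    left: Optional["TreapNode"] = None
    right: Optional["TreapNode"] = None


def rotate_right(y: TreapNode) -> TreapNode:
    x = y.left
    y.left, x.right = x.right, y      # x's right subtree moves under y
    return x


def rotate_left(x: TreapNode) -> TreapNode:
    y = x.right
    x.right, y.left = y.left, x       # y's left subtree moves under x
    return y


def insert(root: Optional[TreapNode], key: int) -> TreapNode:
    """Ordinary BST insertion, then rotate the new node up until heap order holds."""
    if root is None:
        return TreapNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.priority < root.priority:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key)
        if root.right.priority < root.priority:
            root = rotate_left(root)
    return root


def inorder(t: Optional[TreapNode]) -> List[int]:
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


def height(t: Optional[TreapNode]) -> int:
    return 0 if t is None else 1 + max(height(t.left), height(t.right))


random.seed(1)
root = None
for k in range(1000):          # sorted input: a plain BST would reach height 1000
    root = insert(root, k)
print(inorder(root) == list(range(1000)), height(root))
```

Even though the keys arrive in sorted order, the printed height should be a small multiple of lg 1000 (about 10), because the random priorities, not the arrival order, decide the shape.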

70
00:05:17,000 --> 00:05:24,000
And, it's a particular way of guaranteeing this logarithmic

71
00:05:24,000 --> 00:05:30,000
height so that all the operations can be supported in

72
00:05:30,000 --> 00:05:36,000
log n time. So, they are binary search

73
00:05:36,000 --> 00:05:40,000
trees. And, they have a little bit of

74
00:05:40,000 --> 00:05:47,000
extra information in each node called the color field.

75
00:06:02,000 --> 00:06:06,000
And there are several properties that a tree with a

76
00:06:06,000 --> 00:06:11,000
color field has to satisfy in order to be called a red black

77
00:06:11,000 --> 00:06:14,000
tree. These are called the red black

78
00:06:14,000 --> 00:06:17,000
properties. And, this will take a little

79
00:06:17,000 --> 00:06:21,000
bit of time to write down, but it's all pretty simple.

80
00:06:21,000 --> 00:06:26,000
So once I write them down I will just say what they really

81
00:06:26,000 --> 00:06:30,000
mean. There are four properties.

82
00:06:30,000 --> 00:06:34,000
The first one's pretty simple. Every node is either red or

83
00:06:34,000 --> 00:06:37,000
black, hence the name of red black trees.

84
00:06:37,000 --> 00:06:42,000
So, the color field is just a single bit specifying red or

85
00:06:42,000 --> 00:06:43,000
black. And red nodes,

86
00:06:43,000 --> 00:06:48,000
I'm going to denote by a double circle because I don't have

87
00:06:48,000 --> 00:06:51,000
colored chalk here, and black nodes will be a

88
00:06:51,000 --> 00:06:54,000
single circle. And you probably don't have

89
00:06:54,000 --> 00:07:00,000
colored pens either, so it will save us some grief.

90
00:07:00,000 --> 00:07:04,000
Red is double circle; black is single circle.

91
00:07:04,000 --> 00:07:09,000
And, we sort of prefer black nodes in some sense.

92
00:07:09,000 --> 00:07:13,000
Red nodes are a pain, as we'll see.

93
00:07:13,000 --> 00:07:19,000
OK, the second property is that the root and the leaves are all

94
00:07:19,000 --> 00:07:22,000
black.
And, I'm going to pull a little

95
00:07:22,000 --> 00:07:26,000
trick here. I'll treat binary search trees a

96
00:07:26,000 --> 00:07:33,000
little bit differently than we have in the past.

97
00:07:33,000 --> 00:07:36,000
Normally, you think of the tree as a bunch of nodes.

98
00:07:36,000 --> 00:07:39,000
Each node could have zero or one or two children,

99
00:07:39,000 --> 00:07:43,000
something like this. I'm going to append something

100
00:07:43,000 --> 00:07:46,000
at every place where a node does not have a child.

101
00:07:46,000 --> 00:07:50,000
I'm going to put a little dot here, an external node,

102
00:07:50,000 --> 00:07:53,000
which I call a leaf. So, normally leaves would have

103
00:07:53,000 --> 00:07:57,000
been these items. I'm just going to add a leaf at every

104
00:07:57,000 --> 00:08:02,000
absent child pointer. And, these will be my leaves.

105
00:08:02,000 --> 00:08:07,000
These are really the nil pointers from each of these

106
00:08:07,000 --> 00:08:11,000
nodes. So now, every internal node has

107
00:08:11,000 --> 00:08:15,000
exactly two children, and every leaf has zero

108
00:08:15,000 --> 00:08:19,000
children. OK, so these are the leaves I'm

109
00:08:19,000 --> 00:08:22,000
referring to. These are black,

110
00:08:22,000 --> 00:08:26,000
and this guy, the root, is black according to rule two.

111
00:08:26,000 --> 00:08:31,000
Now the properties get a little bit more interesting.

112
00:08:31,000 --> 00:08:37,000
The parent of every red node is black.

113
00:08:37,000 --> 00:08:42,000
So, whenever I have a red node, its parent has to be black,

114
00:08:42,000 --> 00:08:45,000
a single circle. OK, so in other words,

115
00:08:45,000 --> 00:08:50,000
if you look at a path in the tree, you can never have two

116
00:08:50,000 --> 00:08:53,000
consecutive red nodes. You can have,

117
00:08:53,000 --> 00:08:55,000
at most, red, black, red, black.
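A minimal Python sketch of the representation just described. It assumes that `None` stands in for each appended nil leaf, so every internal node has exactly two children, and that a boolean `red` field is the one-bit color. The names `is_red`, `no_red_red`, and `properties_1_to_3` are illustrative. Property 4 needs black heights, so it is left out here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    red: bool                        # property 1: the color field is a single bit
    left: Optional["Node"] = None    # None stands for an appended nil leaf
    right: Optional["Node"] = None


def is_red(x: Optional[Node]) -> bool:
    # Property 2 for leaves: every nil leaf (None) counts as black.
    return x is not None and x.red


def no_red_red(x: Optional[Node]) -> bool:
    """Property 3: every red node has a black parent, i.e. no red node has a red child."""
    if x is None:
        return True
    if x.red and (is_red(x.left) or is_red(x.right)):
        return False
    return no_red_red(x.left) and no_red_red(x.right)


def properties_1_to_3(root: Optional[Node]) -> bool:
    # Property 2 for the root: it must be black.
    return not is_red(root) and no_red_red(root)


ok = Node(10, False, Node(5, True), Node(15, True))
bad = Node(10, False, Node(5, True, Node(1, True)), None)   # red 5 has a red child
print(properties_1_to_3(ok), properties_1_to_3(bad))  # True False
```

Treating `None` as the leaf matches the remark that these leaves "are really the nil pointers". A shared sentinel node would be an alternative representation.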

118
00:08:55,000 --> 00:08:59,000
You can have several consecutive black nodes,

119
00:08:59,000 --> 00:09:06,000
but never two red nodes. OK, and then one more rule.

120
00:09:06,000 --> 00:09:13,000
It says a little bit more about such paths.

121
00:09:13,000 --> 00:09:21,000
So, if we take a simple path, meaning it doesn't repeat any

122
00:09:21,000 --> 00:09:28,000
vertices, from a node, x, to a descendant leaf of x,

123
00:09:28,000 --> 00:09:36,000
all such paths to all descendant leaves have the same

124
00:09:36,000 --> 00:09:42,000
number of black nodes on them.

125
00:09:59,000 --> 00:10:02,000
So, let me draw a picture. We have some tree.

126
00:10:02,000 --> 00:10:05,000
We have some node, x, in the tree.

127
00:10:05,000 --> 00:10:09,000
And, I'm looking at all the paths from x down to some

128
00:10:09,000 --> 00:10:14,000
descendant leaf down here at the bottom of the tree.

129
00:10:14,000 --> 00:10:19,000
All of these paths should have the same number of black nodes.

130
00:10:19,000 --> 00:10:23,000
So, here I'll draw that each one has four black nodes,

131
00:10:23,000 --> 00:10:28,000
the leaf, and three above it. We know that from property

132
00:10:28,000 --> 00:10:31,000
three, at most, half of the nodes are red

133
00:10:31,000 --> 00:10:38,000
because whenever I have a red node, the parent must be black.

134
00:10:38,000 --> 00:10:44,000
But I want all of these paths to have exactly the same number

135
00:10:44,000 --> 00:10:48,000
of black nodes. One subtlety here concerns the

136
00:10:48,000 --> 00:10:52,000
black height. I didn't really leave room.

137
00:10:52,000 --> 00:10:58,000
So I'll write it over here. This should be the same for all

138
00:10:58,000 --> 00:11:04,000
paths, but in particular, the count I'm interested in

139
00:11:04,000 --> 00:11:09,000
does not include x itself. OK, so if x is black,

140
00:11:09,000 --> 00:11:12,000
I'm still not counting it in the black height.
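Written as a recurrence, property 4 says that the black height bh is well defined. This is a restatement in our own notation: c ranges over the two children of x, [·] is 1 when the condition holds and 0 otherwise, and nil leaves count as black.

```latex
\mathrm{bh}(\mathit{nil}) = 0,
\qquad
\mathrm{bh}(x) = \mathrm{bh}(c) + [\, c \text{ is black} \,]
\quad \text{for each child } c \text{ of } x .
```

Property 4 is exactly the requirement that both children of x give the same value on the right-hand side. The color of x itself never enters bh(x), which is the subtlety mentioned above.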

141
00:11:12,000 --> 00:11:15,000
So, the black height of x is this count, four.

142
00:11:15,000 --> 00:11:19,000
And even if x is black, the black height is four.

143
00:11:19,000 --> 00:11:23,000
So, these are just some minor details to get all of the

144
00:11:23,000 --> 00:11:27,000
algorithms a bit clean. So, let's look at an example of

145
00:11:27,000 --> 00:11:31,000
a red black tree. So, yeah, I'll show you an

146
00:11:31,000 --> 00:11:34,000
example. Then I'll say why we care about

147
00:11:34,000 --> 00:11:36,000
these properties.

148
00:12:16,000 --> 00:12:18,000
OK, so this tree has several properties.

149
00:12:18,000 --> 00:12:21,000
The first thing is that it's a binary search tree.

150
00:12:21,000 --> 00:12:24,000
OK, so you can check it with an in-order traversal.

151
00:12:24,000 --> 00:12:26,000
It should give these numbers in sorted order:

152
00:12:26,000 --> 00:12:28,000
three, seven, eight, ten, 11,

153
00:12:28,000 --> 00:12:30,000
18, 22, 26. So, it's a valid binary search

154
00:12:30,000 --> 00:12:33,000
tree. We've appended these leaves

155
00:12:33,000 --> 00:12:36,000
with no keys in them. They are just hanging around.

156
00:12:36,000 --> 00:12:39,000
Those are the nil pointers. So, each of these,

157
00:12:39,000 --> 00:12:43,000
you can call them nil. They are all just marked there,

158
00:12:43,000 --> 00:12:47,000
wherever there is an absent child. And then, I've double-circled

159
00:12:47,000 --> 00:12:49,000
some of the nodes to color them red.

160
00:12:49,000 --> 00:12:52,000
OK, if I didn't, the black heights wouldn't

161
00:12:52,000 --> 00:12:54,000
match up. So, I have to be a little bit

162
00:12:54,000 --> 00:12:56,000
careful. From every node,

163
00:12:56,000 --> 00:12:59,000
we'd like to measure the number of black nodes from that node

164
00:12:59,000 --> 00:13:02,000
down to any descendant leaf. So, for example,

165
00:13:02,000 --> 00:13:07,000
the nil pointers, their black height is zero.
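The board is not visible in the transcript, so the example tree below is reconstructed from the spoken walkthrough. Its root is 7 with a red child 18; 10 has red children 8 and 11; 22 has a red right child 26. Treat that shape as an assumption. The sketch, with the illustrative names `Node`, `example`, and `inorder`, confirms that an in-order traversal yields the keys in sorted order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    red: bool                        # True = double circle on the board
    left: Optional["Node"] = None    # None = nil leaf
    right: Optional["Node"] = None


# The lecture's example tree, reconstructed from the spoken description.
example = Node(7, False,
               Node(3, False),
               Node(18, True,
                    Node(10, False, Node(8, True), Node(11, True)),
                    Node(22, False, None, Node(26, True))))


def inorder(x: Optional[Node]) -> List[int]:
    """In-order traversal; nil leaves carry no key and contribute nothing."""
    if x is None:
        return []
    return inorder(x.left) + [x.key] + inorder(x.right)


print(inorder(example))  # [3, 7, 8, 10, 11, 18, 22, 26]
```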

166
00:13:07,000 --> 00:13:09,000
Good. That's always the answer.

167
00:13:09,000 --> 00:13:12,000
So, these guys always have black height zero.

168
00:13:12,000 --> 00:13:17,000
I'll just represent that here. Black height equals zero.

169
00:13:17,000 --> 00:13:19,000
OK, what's the black height of three?

170
00:13:19,000 --> 00:13:22,000
Zero? Not quite, because these nodes

171
00:13:22,000 --> 00:13:25,000
are black. So the black height is one.

172
00:13:25,000 --> 00:13:29,000
You're right that we don't count three even though it's

173
00:13:29,000 --> 00:13:34,000
black. It's not included in the count.

174
00:13:34,000 --> 00:13:38,000
But the leaves count. And there are only two paths

175
00:13:38,000 --> 00:13:43,000
here, and they each have the same number of black nodes as

176
00:13:43,000 --> 00:13:46,000
they should. Over here, let's say eight also

177
00:13:46,000 --> 00:13:50,000
has black height one even though it's red.

178
00:13:50,000 --> 00:13:53,000
OK: same with 11, same with 26.

179
00:13:53,000 --> 00:13:55,000
Each of them only has two paths.

180
00:13:55,000 --> 00:14:00,000
Each path has one black node on it.

181
00:14:00,000 --> 00:14:02,000
Ten: what's the black height? It's still one,

182
00:14:02,000 --> 00:14:05,000
good, because we don't count ten.

183
00:14:05,000 --> 00:14:07,000
There are now four paths to leaves.

184
00:14:07,000 --> 00:14:10,000
Each of them contains exactly one black node,

185
00:14:10,000 --> 00:14:12,000
plus the root, which we don't count.

186
00:14:12,000 --> 00:14:14,000
22: same thing, hopefully.

187
00:14:14,000 --> 00:14:17,000
This is getting a little more interesting.

188
00:14:17,000 --> 00:14:20,000
There's one path here which has one black node.

189
00:14:20,000 --> 00:14:23,000
There are other paths here, which are longer.

190
00:14:23,000 --> 00:14:25,000
But they still only have one black node.
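This walkthrough can be reproduced mechanically. The sketch below computes bh for every key of the example tree as reconstructed from the transcript, which is an assumption since the board is not visible, and raises an error if two paths disagree (a property 4 violation). `black_height` and its `table` argument are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    key: int
    red: bool
    left: Optional["Node"] = None    # None = black nil leaf
    right: Optional["Node"] = None


example = Node(7, False,
               Node(3, False),
               Node(18, True,
                    Node(10, False, Node(8, True), Node(11, True)),
                    Node(22, False, None, Node(26, True))))


def black_height(x: Optional[Node], table: Dict[int, int]) -> int:
    """Return bh(x): black nodes on a path from x down to a nil leaf, counting the
    leaf but not x. Records bh of every key in `table`; raises if property 4 fails."""
    if x is None:
        return 0                                        # nil leaves have bh 0
    per_child = []
    for c in (x.left, x.right):
        counted = 1 if c is None or not c.red else 0    # black child (or nil leaf) counts
        per_child.append(black_height(c, table) + counted)
    if per_child[0] != per_child[1]:
        raise ValueError(f"property 4 fails at key {x.key}")
    table[x.key] = per_child[0]
    return per_child[0]


bh: Dict[int, int] = {}
black_height(example, bh)
print(dict(sorted(bh.items())))
# {3: 1, 7: 2, 8: 1, 10: 1, 11: 1, 18: 2, 22: 1, 26: 1}
```

The values match the lecture. Every node just above the leaves, red or black, has black height one, while 18 and the root have black height two.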

191
00:14:25,000 --> 00:14:28,000
So, if we just sort of ignore the red nodes,

192
00:14:28,000 --> 00:14:31,000
all these paths have the same length.

193
00:14:31,000 --> 00:14:34,000
OK: 18 should be bigger, hopefully: black height two,

194
00:14:34,000 --> 00:14:38,000
because each of these paths now has one black node here

195
00:14:38,000 --> 00:14:41,000
and one black node in the leaves, or one black node here

196
00:14:41,000 --> 00:14:46,000
and one black node in the leaves.

197
00:14:46,000 --> 00:14:49,000
And finally, the root should have a black

198
00:14:49,000 --> 00:14:53,000
height of two. It's easier to see over here,

199
00:14:53,000 --> 00:14:56,000
I guess. Each of these paths has two

200
00:14:56,000 --> 00:15:00,000
black nodes. Same over here.

201
00:15:00,000 --> 00:15:02,000
OK, so hopefully these properties make sense.

202
00:15:02,000 --> 00:15:06,000
We didn't check all of them. Every red node has a black

203
00:15:06,000 --> 00:15:08,000
parent. If you look at all of these

204
00:15:08,000 --> 00:15:11,000
paths, we sort of alternate, red, black at most.

205
00:15:11,000 --> 00:15:13,000
Or we have just a bunch of blacks.

206
00:15:13,000 --> 00:15:15,000
But we never repeat two reds in a row.

207
00:15:15,000 --> 00:15:19,000
The root and the leaves are black; we get that pretty much

208
00:15:19,000 --> 00:15:22,000
by definition. Every node is red or black.

209
00:15:22,000 --> 00:15:24,000
OK, that's easy. This is a particular set of

210
00:15:24,000 --> 00:15:27,000
properties. It may seem a bit arbitrary at

211
00:15:27,000 --> 00:15:30,000
this point. They will make a lot more sense

212
00:15:30,000 --> 00:15:34,000
as we see what consequences they have.

213
00:15:34,000 --> 00:15:37,000
But there are a couple of goals that we are trying to achieve

214
00:15:37,000 --> 00:15:40,000
here.
One is that these properties

215
00:15:40,000 --> 00:15:43,000
should force the tree to have logarithmic height,

216
00:15:43,000 --> 00:15:45,000
order log n height. And, they do,

217
00:15:45,000 --> 00:15:48,000
although that's probably not obvious at this point.

218
00:15:48,000 --> 00:15:51,000
It follows mainly from all the properties.

219
00:15:51,000 --> 00:15:53,000
Three and four are the main ones.

220
00:15:53,000 --> 00:15:56,000
But you pretty much need all of them.

221
00:15:56,000 --> 00:15:59,000
The other desire we have from these properties is that they

222
00:15:59,000 --> 00:16:05,000
are somehow easy to maintain. OK, I can create a tree in the

223
00:16:05,000 --> 00:16:07,000
beginning that has these properties.

224
00:16:07,000 --> 00:16:09,000
For example, I could make,

225
00:16:09,000 --> 00:16:14,000
I have to be a little bit careful, but certainly if I take

226
00:16:14,000 --> 00:16:19,000
a perfectly balanced binary tree and make all of the nodes black,

227
00:16:19,000 --> 00:16:21,000
it will satisfy those properties.

228
00:16:21,000 --> 00:16:25,000
OK, this is a red black tree. OK, so it's not too hard to

229
00:16:25,000 --> 00:16:30,000
make these properties hold at the beginning.

230
00:16:30,000 --> 00:16:34,000
The tricky part is to maintain them.

231
00:16:34,000 --> 00:16:38,000
When I insert a node into this tree, or delete a node from this

232
00:16:38,000 --> 00:16:40,000
tree, I want to make it not too hard.

233
00:16:40,000 --> 00:16:43,000
In log n time, I've got to be able to restore

234
00:16:43,000 --> 00:16:47,000
all these properties. OK, that will be the hardest

235
00:16:47,000 --> 00:16:49,000
part. The first thing we will do is

236
00:16:49,000 --> 00:16:53,000
prove that these properties imply that the tree has to have

237
00:16:53,000 --> 00:16:56,000
height order log n. Therefore, all searches and

238
00:16:56,000 --> 00:16:59,000
queries on the data structure will run fast.
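Here is a sketch of a checker for all four properties plus search-tree order, used to test the remark about all-black trees. It assumes distinct keys, and `check`, `is_red_black`, and `all_black` are illustrative names. The second call shows why one has to "be a little bit careful": coloring every node black only works when the tree is perfectly balanced, for example with 2^k - 1 keys, because property 4 then forces all nil leaves to the same depth.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    red: bool
    left: Optional["Node"] = None    # None = black nil leaf
    right: Optional["Node"] = None


def is_red(x: Optional[Node]) -> bool:
    return x is not None and x.red


def check(x: Optional[Node], lo: float, hi: float) -> Optional[int]:
    """Return bh(x) if x roots a valid red black subtree with keys in (lo, hi), else None."""
    if x is None:
        return 0
    if not (lo < x.key < hi):                              # binary search tree order
        return None
    if x.red and (is_red(x.left) or is_red(x.right)):      # property 3
        return None
    bl = check(x.left, lo, x.key)
    br = check(x.right, x.key, hi)
    if bl is None or br is None:
        return None
    bl += 0 if is_red(x.left) else 1                       # black child or nil leaf counts
    br += 0 if is_red(x.right) else 1
    return bl if bl == br else None                        # property 4


def is_red_black(root: Optional[Node]) -> bool:
    # Property 1 holds by construction; property 2: the root (and every nil leaf) is black.
    return not is_red(root) and check(root, float("-inf"), float("inf")) is not None


def all_black(keys: List[int]) -> Optional[Node]:
    """Balanced BST on sorted keys with every node colored black."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], False, all_black(keys[:mid]), all_black(keys[mid + 1:]))


print(is_red_black(all_black(list(range(7)))))  # True: 7 = 2**3 - 1 keys, a perfect tree
print(is_red_black(all_black([1, 2])))          # False: the two paths' black counts differ
```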

239
00:16:59,000 --> 00:17:03,000
The hard part will be to make sure these properties stay true,

240
00:17:03,000 --> 00:17:09,000
if they initially held true, when we make changes to the tree.

241
00:17:09,000 --> 00:17:17,000
So, let's look at the height of a red black tree.

242
00:17:34,000 --> 00:17:38,000
And from this we will start to see where these properties come

243
00:17:38,000 --> 00:17:41,000
from, why we chose these properties.

244
00:18:06,000 --> 00:18:09,000
So, the claim is that the height of a red black tree with

245
00:18:09,000 --> 00:18:13,000
n keys (I'm not saying nodes here because I really only want

246
00:18:13,000 --> 00:18:17,000
to count the internal nodes, not these extra leaves that

247
00:18:17,000 --> 00:18:20,000
we've added) has height, at most, two times log of

248
00:18:20,000 --> 00:18:24,000
(n plus one), so order log n. But, we have a pretty precise

249
00:18:24,000 --> 00:18:28,000
bound of a factor of two. There is a proof of this in the

250
00:18:28,000 --> 00:18:32,000
textbook by induction, and you should read that.

251
00:18:32,000 --> 00:18:35,000
What I'm going to give is more of a proof sketch.

252
00:18:35,000 --> 00:18:40,000
But you should read the proof by induction because all the

253
00:18:40,000 --> 00:18:44,000
practice you can get with proof by induction is good.

254
00:18:44,000 --> 00:18:49,000
The proof sketch, on the other hand, gives a lot more intuition

255
00:18:49,000 --> 00:18:54,000
into what's going on with red black trees and connects up with

256
00:18:54,000 --> 00:18:58,000
recitation on Friday. So, let me tell you that

257
00:18:58,000 --> 00:19:02,000
instead. I'm going to leave that board

258
00:19:02,000 --> 00:19:04,000
blank and go over here.

259
00:19:30,000 --> 00:19:36,000
So, the first thing I'm going to do, I'm going to manipulate

260
00:19:36,000 --> 00:19:41,000
this tree until it looks like something that I know.
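The proof sketch that follows can be summarized as the chain below. Here h is the height of the red black tree, h' is the height of the tree obtained by merging each red node into its black parent (equal to the black height of the root), and n is the number of keys. The notation is ours, not the lecture's.

```latex
2^{h'} \;\le\; \#\text{leaves} \;=\; n + 1
  \quad\Longrightarrow\quad h' \le \lg(n+1),
\qquad
h \;\le\; 2h'
  \quad\Longrightarrow\quad h \;\le\; 2\lg(n+1) = O(\lg n).
```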

261
00:19:41,000 --> 00:19:48,000
The main change I'm going to make is to merge each red node

262
00:19:48,000 --> 00:19:53,000
into its parent. And we know that the parent of

263
00:19:53,000 --> 00:19:59,000
a red node must be black. So, merge each red node into

264
00:19:59,000 --> 00:20:04,000
its black parent. So, let's look at that here.

265
00:20:04,000 --> 00:20:08,000
So, I'm going to take this red node, merge it into its parent,

266
00:20:08,000 --> 00:20:11,000
take this red node, merge it into its parent,

267
00:20:11,000 --> 00:20:14,000
and so on. There's one up there which I

268
00:20:14,000 --> 00:20:17,000
can't reach. But I'm going to redraw this

269
00:20:17,000 --> 00:20:21,000
picture now. So, seven, so the top node now

270
00:20:21,000 --> 00:20:23,000
becomes, in some sense, seven and 18.

271
00:20:23,000 --> 00:20:29,000
They got merged together, but no one else joined them.

272
00:20:29,000 --> 00:20:31,000
Then, on the left, we have three.

273
00:20:31,000 --> 00:20:35,000
OK, nothing joined that, and there are some leaves as

274
00:20:35,000 --> 00:20:37,000
usual. Now, if you look at,

275
00:20:37,000 --> 00:20:40,000
maybe, I'm going to have to draw this.

276
00:20:40,000 --> 00:20:43,000
Uh-oh. I heard that sound before.

277
00:20:43,000 --> 00:20:47,000
So, I'm merging these nodes together, and I'm merging all of

278
00:20:47,000 --> 00:20:52,000
these nodes together because each of these red nodes merges

279
00:20:52,000 --> 00:20:56,000
into that black node. And, I'm merging these two

280
00:20:56,000 --> 00:20:59,000
nodes together. So, I'm putting this red node

281
00:20:59,000 --> 00:21:06,000
into that black node. So, now you can see from the

282
00:21:06,000 --> 00:21:12,000
root, which is now 7/18, that there are three children

283
00:21:12,000 --> 00:21:16,000
hanging off. So, in that picture,

284
00:21:16,000 --> 00:21:23,000
I'd like to draw that fact, assuming I can get this board

285
00:21:23,000 --> 00:21:24,000
back down. Good.
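The merge step can be sketched as a recursive function over the same kind of hypothetical `Node` class (nil leaves as `None`), producing a multiway `Node234`. Both names and the `collapse` function are illustrative. Applied to the example tree reconstructed from the walkthrough, it yields the 7/18 root with three children described next.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    red: bool
    left: Optional["Node"] = None    # None = black nil leaf
    right: Optional["Node"] = None


@dataclass
class Node234:
    keys: List[int]
    children: List[Optional["Node234"]]   # None entries are the (unchanged) leaves


def is_red(x: Optional[Node]) -> bool:
    return x is not None and x.red


def collapse(x: Optional[Node]) -> Optional[Node234]:
    """Merge every red child into its black parent. Assumes x is black (or a nil
    leaf) and property 3 holds, so a red child's own children are black."""
    if x is None:
        return None
    keys: List[int] = []
    children: List[Optional[Node234]] = []
    if is_red(x.left):                      # absorb a red left child
        keys.append(x.left.key)
        children += [collapse(x.left.left), collapse(x.left.right)]
    else:
        children.append(collapse(x.left))
    keys.append(x.key)
    if is_red(x.right):                     # absorb a red right child
        keys.append(x.right.key)
        children += [collapse(x.right.left), collapse(x.right.right)]
    else:
        children.append(collapse(x.right))
    return Node234(keys, children)


example = Node(7, False,
               Node(3, False),
               Node(18, True,
                    Node(10, False, Node(8, True), Node(11, True)),
                    Node(22, False, None, Node(26, True))))

top = collapse(example)
print(top.keys, [c.keys for c in top.children])
# [7, 18] [[3], [8, 10, 11], [22, 26]]
```

Keys stay in sorted order inside each merged node, and every merged node has exactly one more child than it has keys.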

286
00:21:24,000 --> 00:21:31,000
So, between seven and 18, I have this conglomerate node,

287
00:21:31,000 --> 00:21:37,000
eight, ten, 11. And, there are four leaves

288
00:21:37,000 --> 00:21:42,000
hanging off of that node. And, off to the right,

289
00:21:42,000 --> 00:21:49,000
after 18, I have a conglomerate node, 22/26, and there are three

290
00:21:49,000 --> 00:21:54,000
leaves hanging off of there. OK, kind of a weird tree

291
00:21:54,000 --> 00:22:00,000
because we've dealt mainly with binary trees so far,

292
00:22:00,000 --> 00:22:05,000
but this is a foreshadowing of what will come on Friday.

293
00:22:05,000 --> 00:22:12,000
This is something called a two-three-four tree.

294
00:22:12,000 --> 00:22:16,000
Any guesses why it's called a two-three-four tree?

295
00:22:16,000 --> 00:22:20,000
Every node can have two, three, or four kids,

296
00:22:20,000 --> 00:22:24,000
yeah, except the leaves. They have zero.

297
00:22:24,000 --> 00:22:29,000
There is another nice property of two-three-four trees, maybe

298
00:22:29,000 --> 00:22:33,000
hinted at. So, there's really no control

299
00:22:33,000 --> 00:22:39,000
over whether you have two children or three children or

300
00:22:39,000 --> 00:22:43,000
four children. But, there is another nice

301
00:22:43,000 --> 00:22:45,000
property. Yeah?

302
00:22:45,000 --> 00:22:49,000
All of the leaves have the same depth, exactly.

303
00:22:49,000 --> 00:22:54,000
All of these guys have the same depth in the tree.

304
00:22:54,000 --> 00:23:00,000
Why is that? Because of property four.

305
00:23:00,000 --> 00:23:03,000
On Friday, you will see just how to maintain that property.

306
00:23:03,000 --> 00:23:07,000
But out of this transformation, we get that all the leaves have

307
00:23:07,000 --> 00:23:09,000
the same depth, because their depth

308
00:23:09,000 --> 00:23:12,000
now, or let's say the height of the tree, is the black

309
00:23:12,000 --> 00:23:15,000
height.
And, the depth of these leaves

310
00:23:15,000 --> 00:23:17,000
will be the black height of the root.

311
00:23:17,000 --> 00:23:20,000
We are erasing all the red nodes, and we said if we look at

312
00:23:20,000 --> 00:23:24,000
a path, and we ignore all the red nodes, then the number of

313
00:23:24,000 --> 00:23:26,000
black nodes along a path is the same.

314
00:23:26,000 --> 00:23:31,000
Now we are basically just leaving all the black nodes.

315
00:23:31,000 --> 00:23:39,000
And so, along all these paths we'll have the same number of

316
00:23:39,000 --> 00:23:43,000
black nodes. And therefore,

317
00:23:43,000 --> 00:23:49,000
every leaf will have the same depth.

318
00:23:49,000 --> 00:23:55,000
Let me write down some of these properties.

319
00:23:55,000 --> 00:24:04,000
So, every internal node has between two and four children.

320
00:24:04,000 --> 00:24:10,000
And every leaf has the same depth, namely,

321
00:24:10,000 --> 00:24:17,000
the black height of the root.

322
00:24:28,000 --> 00:24:31,000
This is by property four. OK, so this is telling us a

323
00:24:31,000 --> 00:24:34,000
lot. So, essentially what this

324
00:24:34,000 --> 00:24:37,000
transformation is doing is ignoring the red nodes.

325
00:24:37,000 --> 00:24:42,000
Then, if you just focus on the black nodes, height equals black

326
00:24:42,000 --> 00:24:44,000
height. And then, black height is

327
00:24:44,000 --> 00:24:49,000
telling us that all the root-to-leaf paths have the same length.

328
00:24:49,000 --> 00:24:53,000
Therefore, all these leaves are at the same level.

329
00:24:53,000 --> 00:24:57,000
Having leaves at the same level is a good thing because it means

330
00:24:57,000 --> 00:25:02,000
that your tree is pretty much balanced.
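Here is a small sketch that checks the two properties just written down on the merged example tree, entered by hand from the board description. `Node234`, `leaf_slots`, `branching_ok`, and `leaf_depths` are illustrative names. The single leaf depth it reports, 2, equals the black height of the root found in the earlier walkthrough.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node234:
    keys: List[int]
    children: List[Optional["Node234"]]   # None = leaf


def leaf_slots(n: int) -> List[Optional[Node234]]:
    return [None] * n


def branching_ok(t: Optional[Node234]) -> bool:
    """Every internal node has between two and four children (one more than its keys)."""
    if t is None:
        return True
    return (2 <= len(t.children) <= 4 and len(t.children) == len(t.keys) + 1
            and all(branching_ok(c) for c in t.children))


def leaf_depths(t: Optional[Node234], depth: int = 0) -> Set[int]:
    """Depths at which leaves occur; a single value means all leaves are level."""
    if t is None:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in t.children))


# The merged tree from the board: root 7/18 over 3, 8/10/11, and 22/26.
tree = Node234([7, 18], [Node234([3], leaf_slots(2)),
                         Node234([8, 10, 11], leaf_slots(4)),
                         Node234([22, 26], leaf_slots(3))])
print(branching_ok(tree), leaf_depths(tree))  # True {2}
```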

331
00:25:02,000 --> 00:25:05,000
If you have a tree where all the nodes are branching,

332
00:25:05,000 --> 00:25:10,000
so they have at least two children, and all the leaves are

333
00:25:10,000 --> 00:25:13,000
at the same level, that's pretty balanced.

334
00:25:13,000 --> 00:25:16,000
OK, we will prove some form of that now.

335
00:25:16,000 --> 00:25:19,000
I'm going to call the height of this tree h prime.

336
00:25:19,000 --> 00:25:22,000
The height of the original tree is h.

337
00:25:22,000 --> 00:25:24,000
That's what we want to bound here.

338
00:25:24,000 --> 00:25:27,000
So, the first thing is to bound h prime.

339
00:25:27,000 --> 00:25:32,000
And then we want to relate h and h prime.

340
00:25:32,000 --> 00:25:36,000
OK, so the first question is how many leaves are there in

341
00:25:36,000 --> 00:25:39,000
this tree? And, it doesn't really matter

342
00:25:39,000 --> 00:25:43,000
which tree I'm looking at because I didn't really do

343
00:25:43,000 --> 00:25:46,000
anything to the leaves. All the leaves are black.

344
00:25:46,000 --> 00:25:51,000
So the leaves didn't change. How many leaves are there in

345
00:25:51,000 --> 00:25:53,000
this tree, and therefore in this one?

346
00:25:53,000 --> 00:25:54,000
Sorry? Nine.

347
00:25:54,000 --> 00:25:58,000
Indeed, there are nine, but I meant in general,

348
00:25:58,000 --> 00:26:03,000
sorry. In this example there are nine.

349
00:26:03,000 --> 00:26:07,000
How many keys are there? Eight.

350
00:26:07,000 --> 00:26:12,000
So, in general, how do you write nine as a

351
00:26:12,000 --> 00:26:18,000
function of eight for large values of nine or eight?

352
00:26:18,000 --> 00:26:20,000
Sorry? Plus one, good,

353
00:26:20,000 --> 00:26:24,000
correct answer, by guessing.

354
00:26:24,000 --> 00:26:30,000
n plus one. OK, why is it n plus one?

355
00:26:30,000 --> 00:26:34,000
Let's look at the binary tree case, where we sort of understand

356
00:26:34,000 --> 00:26:37,000
what's going on.
Well, wherever you have a key,

357
00:26:37,000 --> 00:26:40,000
there are two branches. And, that's not a very good

358
00:26:40,000 --> 00:26:43,000
argument. OK, we have what is here called

359
00:26:43,000 --> 00:26:47,000
a branching binary tree. Every internal node has exactly

360
00:26:47,000 --> 00:26:50,000
two children. And, we are counting the number

361
00:26:50,000 --> 00:26:54,000
of leaves that you get from that process in terms of the number

362
00:26:54,000 --> 00:26:58,000
of internal nodes. The number of leaves in a tree,

363
00:26:58,000 --> 00:27:01,000
in a branching tree, is always one plus the number

364
00:27:01,000 --> 00:27:05,000
of internal nodes. You should know that.

365
00:27:05,000 --> 00:27:10,000
You can prove it by induction. OK, so the number of leaves is

366
00:27:10,000 --> 00:27:14,000
n plus one. It doesn't hold if you have a

367
00:27:14,000 --> 00:27:18,000
single child. It holds if every internal node

368
00:27:18,000 --> 00:27:22,000
has a branching factor of two. OK, this holds in either tree.

369
00:27:22,000 --> 00:27:27,000
And now, we want to pull out some relation between the number

370
00:27:27,000 --> 00:27:32,000
of leaves and the height of the tree.

371
00:27:32,000 --> 00:27:35,000
So, what's a good relation to use here?

372
00:27:35,000 --> 00:27:38,000
We know exactly how many leaves there are.

373
00:27:38,000 --> 00:27:40,000
That will somehow connect us to n.

374
00:27:40,000 --> 00:27:43,000
What we care about is the height.

375
00:27:43,000 --> 00:27:46,000
And let's look at the height of this tree.

376
00:27:46,000 --> 00:27:50,000
So, if I have a two-three-four tree of height h prime,

377
00:27:50,000 --> 00:27:55,000
how many leaves could it have? What's the minimum and maximum

378
00:27:55,000 --> 00:28:01,000
number of leaves it could have? 2^h' to 4^h', with h prime.
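Here is a quick numeric check of both facts on the example tree reconstructed from the transcript (an assumption, as before). First, a tree in which every internal node has two children has n + 1 leaves. Second, 2^h' ≤ n + 1 ≤ 4^h', where h' is the black height of the root. `counts` and `root_black_height` are illustrative helpers. The latter assumes property 4 already holds, so following any single path suffices.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    red: bool
    left: Optional["Node"] = None    # None = nil leaf
    right: Optional["Node"] = None


def counts(x: Optional[Node]) -> Tuple[int, int]:
    """Return (internal nodes, nil leaves) in the subtree rooted at x."""
    if x is None:
        return 0, 1
    li, ll = counts(x.left)
    ri, rl = counts(x.right)
    return li + ri + 1, ll + rl


def root_black_height(x: Optional[Node]) -> int:
    """Count black nodes below x along the leftmost path (any path works by property 4)."""
    bh = 0
    while x is not None:
        x = x.left
        if x is None or not x.red:   # nil leaves are black
            bh += 1
    return bh


example = Node(7, False,
               Node(3, False),
               Node(18, True,
                    Node(10, False, Node(8, True), Node(11, True)),
                    Node(22, False, None, Node(26, True))))

n, n_leaves = counts(example)
h_prime = root_black_height(example)   # height of the merged two-three-four tree
print(n, n_leaves, 2 ** h_prime <= n_leaves <= 4 ** h_prime)  # 8 9 True
```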

379
00:28:01,000 --> 00:28:05,000
So, we also know in the two-three-four tree,

380
00:28:05,000 --> 00:28:10,000
the number of leaves has to be at most four to the h prime,

381
00:28:10,000 --> 00:28:15,000
because at most I could branch four ways in each node.

382
00:28:15,000 --> 00:28:21,000
And, it's at least two to the h prime because I know that every

383
00:28:21,000 --> 00:28:25,000
node branches at least two ways. That's key.

384
00:28:25,000 --> 00:28:31,000
So, I only care about one of these, I think this one.

385
00:28:31,000 --> 00:28:34,000
So, I get that two to the h prime is, at most,

386
00:28:34,000 --> 00:28:37,000
n plus one. So the number of leaves is n

387
00:28:37,000 --> 00:28:39,000
plus one. We know that exactly.

388
00:28:39,000 --> 00:28:42,000
So, we rewrite, we take logs of both sides.

389
00:28:42,000 --> 00:28:45,000
It says h prime is at most log of (n plus one).

390
00:28:45,000 --> 00:28:47,000
So, we have a nice, balanced tree.

391
00:28:47,000 --> 00:28:51,000
This should be intuitive. If I had every node branching

392
00:28:51,000 --> 00:28:54,000
two ways, and all the leaves at the same level,

393
00:28:54,000 --> 00:28:58,000
that's a perfect tree. It should be exactly log base

394
00:28:58,000 --> 00:29:03,000
two of (n plus one), which turns out not to be quite log n.

395
00:29:03,000 --> 00:29:05,000
That should be the height of the tree.

396
00:29:05,000 --> 00:29:08,000
Here, I might have even more branching, which is making

397
00:29:08,000 --> 00:29:10,000
things even shallower in some sense.

398
00:29:10,000 --> 00:29:13,000
So, I get more leaves out of the same height.

399
00:29:13,000 --> 00:29:16,000
But that's only better for me. That will only decrease the

400
00:29:16,000 --> 00:29:19,000
height in terms of the number of leaves.

401
00:29:19,000 --> 00:29:21,000
n plus one here is the number of leaves.

402
00:29:21,000 --> 00:29:23,000
So: cool.
That's an easy upper bound on 403 00:29:23,000 --> 00:29:27,000 the height of the tree. Now, what we really care about 404 00:29:27,000 --> 00:29:30,000 is the height of this tree. So, we want to relate h and h 405 00:29:30,000 --> 00:29:34,000 prime. Any suggestions on how we might 406 00:29:34,000 --> 00:29:37,000 do that? How do we know that the height 407 00:29:37,000 --> 00:29:41,000 of this reduced tree is not too much smaller than this one? 408 00:29:41,000 --> 00:29:44,000 We know that this one is, at most, log n. 409 00:29:44,000 --> 00:29:47,000 We want this to be, at most, two log of n plus one. 410 00:29:47,000 --> 00:29:50,000 We know the answer. We've said the theorem. 411 00:29:50,000 --> 00:29:51,000 Sorry? Right. 412 00:29:51,000 --> 00:29:55,000 So, property three tells us that we can only have one red 413 00:29:55,000 --> 00:29:58,000 node for every black one. We can, at most, 414 00:29:58,000 --> 00:30:04,000 alternate red and black. So, if we look at one of these 415 00:30:04,000 --> 00:30:09,000 paths that goes from a root to a leaf, the number of red nodes 416 00:30:09,000 --> 00:30:13,000 can be, at most, half the length of the path. 417 00:30:13,000 --> 00:30:18,000 And if we take the max over all paths, that's the height of the 418 00:30:18,000 --> 00:30:21,000 tree. So, we know that h is, 419 00:30:21,000 --> 00:30:26,000 at most, two times h prime, or maybe it's easier to think 420 00:30:26,000 --> 00:30:28,000 of it as h prime is at least h over two. 421 00:30:28,000 --> 00:30:33,000 Assuming I got that right, because at most a half of the 422 00:30:33,000 --> 00:30:38,000 nodes on any root to leaf path -- 423 00:30:48,000 --> 00:30:52,000 -- are red. So, at least half of them have 424 00:30:52,000 --> 00:30:56,000 to be black.
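The counting argument can also be checked mechanically. Below is a small Python sketch of a red black validator. It assumes a hypothetical `RBNode` dataclass in which `None` stands for a black nil leaf. The example tree is our reconstruction of the board from the transcript: 7 black at the root, 18 red, 10 and 22 black, and 8, 11 and 26 red. Treat its exact colors as an assumption.

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class RBNode:
    key: int
    color: str
    left: Optional["RBNode"] = None    # None plays the role of a black nil leaf
    right: Optional["RBNode"] = None


def black_height(node: Optional[RBNode]) -> int:
    """Check property 3 (no red node has a red child) and property 4 (equal
    black counts) below `node`; return the number of black nodes on every
    path from `node` down to a nil leaf, counting `node` and the leaf."""
    if node is None:
        return 1
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError(f"red node {node.key} has a red child")
    lh, rh = black_height(node.left), black_height(node.right)
    if lh != rh:
        raise ValueError(f"black counts differ below {node.key}")
    return lh + (1 if node.color == BLACK else 0)


def height(node: Optional[RBNode]) -> int:
    """Number of internal nodes on a longest root-to-leaf path."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


# The tree from the board, as reconstructed from the transcript (an assumption).
lecture = RBNode(7, BLACK,
                 RBNode(3, BLACK),
                 RBNode(18, RED,
                        RBNode(10, BLACK, RBNode(8, RED), RBNode(11, RED)),
                        RBNode(22, BLACK, None, RBNode(26, RED))))
bh = black_height(lecture)
# With a black root and no two reds in a row, at least half of the h nodes
# on a longest path are black, so h <= 2 * (black internal nodes on a path).
assert height(lecture) <= 2 * (bh - 1)
```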
And, all the black nodes are 425 00:30:56,000 --> 00:31:01,000 captured in this picture so we have this relation, 426 00:31:01,000 --> 00:31:04,000 and therefore, h is, at most, 427 00:31:04,000 --> 00:31:08,000 two times log of n plus one. OK: pretty easy. 428 00:31:08,000 --> 00:31:12,000 But you have to remember that this tree is balanced, 429 00:31:12,000 --> 00:31:15,000 and the two trees are not too far away from each other. 430 00:31:15,000 --> 00:31:18,000 OK, so in Friday's recitation, you will see how to manipulate 431 00:31:18,000 --> 00:31:21,000 trees with this form. There is a cool way to do it. 432 00:31:21,000 --> 00:31:25,000 That's two-three-four trees. Today, we're going to see how 433 00:31:25,000 --> 00:31:28,000 to manipulate trees in this form as red black trees. 434 00:31:28,000 --> 00:31:31,000 And, you'll see today's lecture, and you'll see Friday's 435 00:31:31,000 --> 00:31:36,000 recitation, and they won't really seem to relate at all. 436 00:31:36,000 --> 00:31:40,000 But they're the same, just a bit hidden. 437 00:31:40,000 --> 00:31:47,000 OK, so this is good news. We now know that all red black 438 00:31:47,000 --> 00:31:53,000 trees are balanced. So as long as we can make sure 439 00:31:53,000 --> 00:32:00,000 that our tree stays a red black tree, we'll be OK. 440 00:32:00,000 --> 00:32:05,000 We'll be OK in the sense that the height is always order log n. 441 00:32:05,000 --> 00:32:10,000 And therefore, queries in a red black tree, 442 00:32:10,000 --> 00:32:15,000 so queries are things like search, find a given key, 443 00:32:15,000 --> 00:32:18,000 find the minimum, find the maximum, 444 00:32:18,000 --> 00:32:22,000 find a successor, find a predecessor. 445 00:32:22,000 --> 00:32:28,000 These are all queries that we know how to support in a binary 446 00:32:28,000 --> 00:32:35,000 search tree. And we know how to do them in 447 00:32:35,000 --> 00:32:44,000 order height time.
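For reference, the whole height argument fits in two lines. Here n is the number of keys, h is the height of the red black tree, and h' is the height of the merged two-three-four tree:

```latex
\begin{aligned}
2^{h'} \;\le\; \#\text{leaves} \;=\; n+1
  &\quad\Longrightarrow\quad h' \le \lg(n+1), \\
h' \;\ge\; h/2
  &\quad\Longrightarrow\quad h \;\le\; 2h' \;\le\; 2\lg(n+1).
\end{aligned}
```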
And the height here is log n so 448 00:32:44,000 --> 00:32:53,000 we know that all of these operations take order log n in a 449 00:32:53,000 --> 00:32:58,000 red black tree. OK -- 450 00:33:13,000 --> 00:33:18,000 So, queries are easy. We are done with queries, 451 00:33:18,000 --> 00:33:22,000 just from balance: not a surprise. 452 00:33:22,000 --> 00:33:29,000 We know that balance is good. The hard part for us will be to 453 00:33:29,000 --> 00:33:33,000 do updates. And in this context, 454 00:33:33,000 --> 00:33:35,000 updates means insert and delete. 455 00:33:35,000 --> 00:33:38,000 In general, in a data structure, we talk about queries 456 00:33:38,000 --> 00:33:42,000 which ask questions about the data in the structure, 457 00:33:42,000 --> 00:33:45,000 and updates which modify the data in the structure. 458 00:33:45,000 --> 00:33:48,000 And most of the time here, we are always thinking about 459 00:33:48,000 --> 00:33:51,000 dynamic sets. So, you can change the dynamic 460 00:33:51,000 --> 00:33:53,000 set by adding or deleting an element. 461 00:33:53,000 --> 00:33:55,000 You can ask all sorts of questions. 462 00:33:55,000 --> 00:33:59,000 In priority queues, there were other updates like 463 00:33:59,000 --> 00:34:02,000 delete-min. Here we have find-min, 464 00:34:02,000 --> 00:34:05,000 but we could then delete it. Typically these are the 465 00:34:05,000 --> 00:34:09,000 operations we care about. And we'll talk about updates to 466 00:34:09,000 --> 00:34:12,000 include these, and queries to include all of 467 00:34:12,000 --> 00:34:15,000 these, or whatever happens to be relevant. 468 00:34:15,000 --> 00:34:18,000 In problem sets especially, you'll see all sorts of 469 00:34:18,000 --> 00:34:21,000 different queries that you can support. 470 00:34:21,000 --> 00:34:23,000 OK, so how do we support updates? 471 00:34:23,000 --> 00:34:27,000 Well, we have binary search tree insert, which we call tree 472 00:34:27,000 --> 00:34:30,000 insert.
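Below is a minimal Python sketch of the standard BST queries. It makes the order-height claim concrete: each query follows a single downward or upward path, so it costs O(h). The `Node` fields and the helper names (`tree_insert`, `search`, `minimum`, `maximum`, `successor`) are illustrative assumptions. `tree_insert` is the plain, unbalanced insert, used here only to build a test tree.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None


def tree_insert(root, key):
    """Plain BST insert (no balancing); returns the (possibly new) root."""
    parent, cur = None, root
    while cur is not None:
        parent, cur = cur, (cur.left if key < cur.key else cur.right)
    node = Node(key, parent)
    if parent is None:
        return node
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root


def search(x, k):
    """Walk down one root-to-leaf path: O(h)."""
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x


def minimum(x):
    while x.left is not None:
        x = x.left
    return x


def maximum(x):
    while x.right is not None:
        x = x.right
    return x


def successor(x):
    """Next key in sorted order, or None: one path down or up, so O(h)."""
    if x.right is not None:
        return minimum(x.right)
    y = x.parent
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y


root = None
for k in [7, 3, 18, 10, 22, 8, 11, 26]:
    root = tree_insert(root, k)
```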
We have binary search tree 473 00:34:30,000 --> 00:34:33,000 delete, tree delete. They will preserve the binary 474 00:34:33,000 --> 00:34:36,000 search tree property, but we know they don't 475 00:34:36,000 --> 00:34:39,000 necessarily preserve balance. We can insert a bunch of nodes. 476 00:34:39,000 --> 00:34:42,000 Just keep adding new minimum elements and you will get a 477 00:34:42,000 --> 00:34:45,000 really long path off the end. So, presumably, 478 00:34:45,000 --> 00:34:49,000 they do not preserve the red black properties because we know 479 00:34:49,000 --> 00:34:51,000 red black implies balance. In particular, 480 00:34:51,000 --> 00:34:54,000 they won't satisfy property one, which I've erased, 481 00:34:54,000 --> 00:34:56,000 which is every node is red or black. 482 00:34:56,000 --> 00:34:59,000 It'll add a node, and not assign it a color. 483 00:34:59,000 --> 00:35:01,000 So, we've got to assign it a color. 484 00:35:01,000 --> 00:35:04,000 And, as soon as we do that, we'll probably violate some 485 00:35:04,000 --> 00:35:06,000 other property. And then we have to fix that 486 00:35:06,000 --> 00:35:09,000 property, and so on. So, it's a bit tricky, 487 00:35:09,000 --> 00:35:14,000 but you play around with it and it's not too hard. 488 00:35:14,000 --> 00:35:18,000 OK, so updates must modify the tree. 489 00:35:18,000 --> 00:35:27,000 And to preserve the red black properties, they're going to do 490 00:35:27,000 --> 00:35:34,000 it in three different kinds of modifications. 491 00:35:34,000 --> 00:35:37,000 The first thing we will indeed do is just use the BST 492 00:35:37,000 --> 00:35:40,000 operation, tree insert or tree delete. 493 00:35:40,000 --> 00:35:42,000 That's something we know how to do. 494 00:35:42,000 --> 00:35:45,000 Let's just do it. We are going to have to change 495 00:35:45,000 --> 00:35:48,000 the colors of some of the nodes. In particular, 496 00:35:48,000 --> 00:35:52,000 the one that we insert better be colored somehow. 
497 00:35:52,000 --> 00:35:54,000 And in general, if we just rip out a node, 498 00:35:54,000 --> 00:36:00,000 we are going to have to recolor it, recolor some nearby nodes. 499 00:36:00,000 --> 00:36:03,000 There is one other kind of operation we're going to do. 500 00:36:03,000 --> 00:36:06,000 So, recoloring just means set to red or black. 501 00:36:06,000 --> 00:36:09,000 The other thing you might do is rearrange the tree, 502 00:36:09,000 --> 00:36:12,000 change the pointers, change the links from one node 503 00:36:12,000 --> 00:36:15,000 to another. And, we're going to do that in 504 00:36:15,000 --> 00:36:18,000 a very structured way. And, this is one of the main 505 00:36:18,000 --> 00:36:21,000 reasons that red black trees are interesting. 506 00:36:21,000 --> 00:36:24,000 The kinds of changes they make are very simple, 507 00:36:24,000 --> 00:36:27,000 and they also don't make very many of them. 508 00:36:27,000 --> 00:36:32,000 So, they're called rotations. So, here's a rotation. 509 00:36:46,000 --> 00:36:48,000 OK, this is a way of drawing a generic part of a tree. 510 00:36:48,000 --> 00:36:50,000 We have two nodes, A and B. 511 00:36:50,000 --> 00:36:53,000 There are some subtrees hanging off, which we draw as triangles. 512 00:36:53,000 --> 00:36:56,000 We don't know how big they are. We know they better all have 513 00:36:56,000 --> 00:37:00,000 the same black height if it's a red black tree. 514 00:37:00,000 --> 00:37:02,000 But in general, it just looks like this. 515 00:37:02,000 --> 00:37:06,000 There is some parent, and there's some rest of the 516 00:37:06,000 --> 00:37:08,000 tree out here which we don't draw. 517 00:37:08,000 --> 00:37:12,000 I'll give these subtrees names, Greek names, 518 00:37:12,000 --> 00:37:13,000 alpha, beta, gamma. 519 00:37:13,000 --> 00:37:16,000 And, I'll define the operation right rotate of B.
520 00:37:16,000 --> 00:37:21,000 So in general, if I have a node, B, and I want to 521 00:37:21,000 --> 00:37:24,000 do a right rotation, I look at its left child, draw 522 00:37:24,000 --> 00:37:30,000 this picture, and name the subtrees of those two nodes. 523 00:37:30,000 --> 00:37:34,000 And, I create this tree. 524 00:37:45,000 --> 00:37:47,000 So, all I've done is turn this edge 90°. 525 00:37:47,000 --> 00:37:51,000 What was the parent of B is now the parent of A. 526 00:37:51,000 --> 00:37:54,000 A is now the new parent of B. The subtrees rearrange. 527 00:37:54,000 --> 00:37:59,000 Before, they were both subtrees of, these two were subtrees of 528 00:37:59,000 --> 00:38:02,000 A. And, gamma was a subtree of B. 529 00:38:02,000 --> 00:38:06,000 Gamma is still a subtree of B, and alpha still is a subtree of 530 00:38:06,000 --> 00:38:08,000 A. But, beta switched to being a 531 00:38:08,000 --> 00:38:11,000 subtree of B. OK, the main thing we want to 532 00:38:11,000 --> 00:38:15,000 check here is that this operation preserves the binary 533 00:38:15,000 --> 00:38:19,000 search tree property. Remember, the binary search 534 00:38:19,000 --> 00:38:23,000 tree property says that all the elements in the left subtree of 535 00:38:23,000 --> 00:38:28,000 a node are less than or equal to the node, and all the elements 536 00:38:28,000 --> 00:38:34,000 in the right subtree are greater than or equal to that value. 537 00:38:34,000 --> 00:38:37,000 So, in particular, if we take some node, 538 00:38:37,000 --> 00:38:40,000 little a in alpha, little b in beta, 539 00:38:40,000 --> 00:38:45,000 and little c in gamma, then a is less than or equal to 540 00:38:45,000 --> 00:38:50,000 capital A, is less than or equal to little b, is less than or 541 00:38:50,000 --> 00:38:54,000 equal to capital B, is less than or equal to little 542 00:38:54,000 --> 00:38:57,000 c.
And, this is the condition both 543 00:38:57,000 --> 00:39:03,000 on the left side and on the right side because alpha is left 544 00:39:03,000 --> 00:39:06,000 of everything. Beta is in between A and B, 545 00:39:06,000 --> 00:39:11,000 and gamma is after B. And the same thing is true over 546 00:39:11,000 --> 00:39:12,000 here. Beta is still, 547 00:39:12,000 --> 00:39:15,000 it's supposed to be all the nodes that come between capital 548 00:39:15,000 --> 00:39:17,000 A and capital B. So, this is good. 549 00:39:17,000 --> 00:39:20,000 We could definitely do this operation, still have the binary 550 00:39:20,000 --> 00:39:23,000 search tree, and we are going to use rotations in a particularly 551 00:39:23,000 --> 00:39:26,000 careful way to make sure that we maintain all these properties. 552 00:39:26,000 --> 00:39:30,000 That's the hard part. But, rotations will be our key. 553 00:39:30,000 --> 00:39:32,000 This was the right rotate operation. 554 00:39:32,000 --> 00:39:35,000 The reverse operation is left rotate. 555 00:39:35,000 --> 00:39:40,000 So, this is left rotate of A. In general, of the two nodes 556 00:39:40,000 --> 00:39:43,000 that are involved, we list the top one. 557 00:39:43,000 --> 00:39:47,000 So, right rotate of B will give you this. 558 00:39:47,000 --> 00:39:50,000 Left rotate of A will give you this. 559 00:39:50,000 --> 00:39:54,000 So, these are reversible operations, which feels good. 560 00:39:54,000 --> 00:39:58,000 The other thing is that they only take constant time 561 00:39:58,000 --> 00:40:03,000 because we are only changing a constant number of 562 00:40:03,000 --> 00:40:07,000 pointers. As long as you know the node, 563 00:40:07,000 --> 00:40:11,000 B, that you are interested in, you set the left pointer of B 564 00:40:11,000 --> 00:40:14,000 to be, if you want it to be beta, so you set left of B to be 565 00:40:14,000 --> 00:40:16,000 right of A, and so on, and so on.
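Below is a minimal Python sketch of the two rotations. It assumes a hypothetical `Node` with `key`, `left`, `right` and `parent` fields, and a `Tree` object that holds the root. It follows the picture: right rotate of B promotes its left child A, and beta moves from A to B. Only a constant number of pointers change, so each rotation takes constant time.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


class Tree:
    def __init__(self, root=None):
        self.root = root

    def _replace(self, old, new):
        """Put `new` where `old` was: under old's parent, or at the root."""
        new.parent = old.parent
        if old.parent is None:
            self.root = new
        elif old is old.parent.left:
            old.parent.left = new
        else:
            old.parent.right = new

    def right_rotate(self, b):
        """Right rotate of B: its left child A moves up, B becomes A's right child."""
        a = b.left
        b.left = a.right              # beta moves from A to B
        if a.right is not None:
            a.right.parent = b
        self._replace(b, a)           # A takes B's old place
        a.right, b.parent = b, a

    def left_rotate(self, a):
        """Left rotate of A: the inverse of right_rotate."""
        b = a.right
        a.right = b.left              # beta moves back from B to A
        if b.left is not None:
            b.left.parent = a
        self._replace(a, b)
        b.left, a.parent = a, b


def link(parent, left=None, right=None):
    """Hypothetical helper: set both children of `parent` and their parent links."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child is not None:
            child.parent = parent
    return parent


def inorder(x):
    return [] if x is None else inorder(x.left) + [x.key] + inorder(x.right)
```

The `link` helper is hypothetical and only exists to build a test subtree. Because both rotations keep alpha before A, beta between A and B, and gamma after B, the in-order sequence of keys never changes.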
566 00:40:16,000 --> 00:40:18,000 You make a constant number of those changes. 567 00:40:18,000 --> 00:40:22,000 You update the parents as well. It's only a constant number of 568 00:40:22,000 --> 00:40:25,000 links that are changing, so, a constant number of 569 00:40:25,000 --> 00:40:28,000 assignments you need to do. So, you've probably seen 570 00:40:28,000 --> 00:40:37,000 rotations before. But we are going to use them in 571 00:40:37,000 --> 00:40:47,000 a complicated way. So, let's look at how to do 572 00:40:47,000 --> 00:40:56,000 insertion. We'll see it three times in 573 00:40:56,000 --> 00:41:02,000 some sense. First, I'll tell you the basic 574 00:41:02,000 --> 00:41:07,000 idea, which is pretty simple. I mentioned some of it already. 575 00:41:07,000 --> 00:41:11,000 Then, we'll do it on an example, feel it in our bones, 576 00:41:11,000 --> 00:41:15,000 and then we'll give the pseudocode so that you could go 577 00:41:15,000 --> 00:41:18,000 home and implement it if you wanted. 578 00:41:18,000 --> 00:41:20,000 OK, this is, I should say, 579 00:41:20,000 --> 00:41:24,000 red black insert, which in the book is called RB 580 00:41:24,000 --> 00:41:29,000 insert, not for root beer, but for red black. 581 00:41:29,000 --> 00:41:32,000 OK, so the first thing we're going to do, as I said, 582 00:41:32,000 --> 00:41:35,000 is a binary search tree insert of that node. 583 00:41:35,000 --> 00:41:39,000 So, x now becomes a new leaf. We search for where x is 584 00:41:39,000 --> 00:41:42,000 supposed to go. We create, I shouldn't call it 585 00:41:42,000 --> 00:41:45,000 a leaf now. It's now a node hanging off. 586 00:41:45,000 --> 00:41:49,000 It's an internal node hanging off one of the original nodes. 587 00:41:49,000 --> 00:41:53,000 Maybe we added it right here. It now gets two new leaves 588 00:41:53,000 --> 00:41:57,000 hanging off of it. It has no internal children. 589 00:41:57,000 --> 00:41:59,000 And, we get to pick a color for it.
590 00:41:59,000 --> 00:42:03,000 And, we will pick the color red. 591 00:42:03,000 --> 00:42:06,000 OK, why red? We definitely have to pick one 592 00:42:06,000 --> 00:42:08,000 of two colors. We could flip a coin. 593 00:42:08,000 --> 00:42:11,000 That might work, but it's going to make our job 594 00:42:11,000 --> 00:42:15,000 even messier. So, we are adding a new node. 595 00:42:15,000 --> 00:42:19,000 It's not a root or a leaf presumably, so we don't really 596 00:42:19,000 --> 00:42:21,000 need it to be black by property two. 597 00:42:21,000 --> 00:42:24,000 Property three, every red node has a black 598 00:42:24,000 --> 00:42:26,000 parent. That might be a problem. 599 00:42:26,000 --> 00:42:31,000 So, the problem is if its parent is red. 600 00:42:31,000 --> 00:42:35,000 Then we violate property two. The parent might be red, 601 00:42:35,000 --> 00:42:37,000 property three, sorry. 602 00:42:37,000 --> 00:42:42,000 OK, the good news is that property four is still true 603 00:42:42,000 --> 00:42:47,000 because property four is just counting numbers of black nodes 604 00:42:47,000 --> 00:42:52,000 down various paths. That's really the hard property 605 00:42:52,000 --> 00:42:55,000 to maintain. If we just add a new red node, 606 00:42:55,000 --> 00:43:00,000 none of the black heights change. 607 00:43:00,000 --> 00:43:05,000 None of the number of black nodes along the path changes. 608 00:43:05,000 --> 00:43:11,000 So, this still has to hold. The only thing we can violate 609 00:43:11,000 --> 00:43:14,000 is property three. That's reasonable. 610 00:43:14,000 --> 00:43:20,000 We know we've got to violate something at the beginning. 611 00:43:20,000 --> 00:43:24,000 We can't just do a binary search tree insert. 612 00:43:24,000 --> 00:43:30,000 OK, so, let's give it a try on this tree. 613 00:43:30,000 --> 00:43:34,000 I should say how we are going to fix this. 614 00:43:34,000 --> 00:43:40,000 How do we fix property three? 
We are going to move the 615 00:43:40,000 --> 00:43:47,000 violation of three up the tree. So, we're going to start at 616 00:43:47,000 --> 00:43:51,000 node x, and move up towards the root. 617 00:43:51,000 --> 00:43:55,000 This is via recoloring. Initially, 618 00:43:55,000 --> 00:44:01,000 the only thing we'll do is recoloring until we get to some 619 00:44:01,000 --> 00:44:09,000 point where we can fix the violation using a rotation -- 620 00:44:20,000 --> 00:44:24,000 -- and probably also recoloring. 621 00:44:24,000 --> 00:44:32,000 OK, so let's see this algorithm in action. 622 00:44:32,000 --> 00:44:41,000 I want to copy this tree, and you are going to have to 623 00:44:41,000 --> 00:44:49,000 copy it, too. So, I'll just redraw it instead 624 00:44:49,000 --> 00:44:59,000 of modifying that diagram. So, we have this nice red black 625 00:44:59,000 --> 00:45:05,000 tree. And, we'll try inserting a new 626 00:45:05,000 --> 00:45:10,000 value of 15. 22 black. 627 00:45:10,000 --> 00:45:16,000 22 is the new black. OK, that should be the same 628 00:45:16,000 --> 00:45:20,000 tree. So now, I'm choosing the number 629 00:45:20,000 --> 00:45:26,000 15 to insert, because that will show a fairly 630 00:45:26,000 --> 00:45:32,000 interesting insertion. Sometimes, the insertion 631 00:45:32,000 --> 00:45:36,000 doesn't take very much work. We just do the rotation and 632 00:45:36,000 --> 00:45:39,000 we're done. I just like to look at an 633 00:45:39,000 --> 00:45:42,000 interesting case. So, we insert 15. 634 00:45:42,000 --> 00:45:45,000 15 is bigger than seven. It's less than 18. 635 00:45:45,000 --> 00:45:49,000 It's bigger than ten. It's bigger than 11. 636 00:45:49,000 --> 00:45:53,000 So, 15 goes here. So, we add a new red node of 15. 637 00:45:53,000 --> 00:45:55,000 And, it has two black leaves 638 00:45:55,000 --> 00:45:59,000 hanging off of it, replacing one black leaf. 639 00:45:59,000 --> 00:46:02,000 Now we have two.
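This is a sanity check of the claim that inserting a red node leaves all black counts unchanged. The small Python sketch below runs on the tree from this example, as reconstructed from the transcript, so the colors are an assumption. `insert_red`, `path_black_counts` and `red_red_pairs` are hypothetical helpers. After inserting 15, every root-to-leaf path still has the same number of black nodes. The only violation is the red-red pair 11 and 15.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

RED, BLACK = "red", "black"


@dataclass
class RBNode:
    key: int
    color: str
    left: Optional["RBNode"] = None    # None = black nil leaf
    right: Optional["RBNode"] = None


def insert_red(node: Optional[RBNode], key: int) -> RBNode:
    """Plain BST insert that colors the new node red; returns the subtree root."""
    if node is None:
        return RBNode(key, RED)
    if key < node.key:
        node.left = insert_red(node.left, key)
    else:
        node.right = insert_red(node.right, key)
    return node


def path_black_counts(node: Optional[RBNode]) -> Set[int]:
    """Distinct black-node counts over all paths from `node` to a nil leaf."""
    if node is None:
        return {1}
    here = 1 if node.color == BLACK else 0
    below = path_black_counts(node.left) | path_black_counts(node.right)
    return {here + c for c in below}


def red_red_pairs(node: Optional[RBNode]) -> List[Tuple[int, int]]:
    """All (parent, child) key pairs that violate property 3."""
    if node is None:
        return []
    pairs = []
    for child in (node.left, node.right):
        if child is not None and node.color == RED and child.color == RED:
            pairs.append((node.key, child.key))
        pairs += red_red_pairs(child)
    return pairs


lecture = RBNode(7, BLACK, RBNode(3, BLACK),
                 RBNode(18, RED,
                        RBNode(10, BLACK, RBNode(8, RED), RBNode(11, RED)),
                        RBNode(22, BLACK, None, RBNode(26, RED))))
before = path_black_counts(lecture)
insert_red(lecture, 15)    # 15 > 7, < 18, > 10, > 11: right child of 11
```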
OK, now, we violate property 640 00:46:02,000 --> 00:46:09,000 three because we added a new red child of a red node. 641 00:46:09,000 --> 00:46:13,000 So, now we have two consecutive red nodes in a root to leaf 642 00:46:13,000 --> 00:46:16,000 path. We'd like to make this black, 643 00:46:16,000 --> 00:46:20,000 but that would screw up the black heights because now this 644 00:46:20,000 --> 00:46:25,000 node would have one black node over here, and two black nodes 645 00:46:25,000 --> 00:46:27,000 down this path. So, that's not good. 646 00:46:27,000 --> 00:46:32,000 What can we do? Well, let's try to re-color. 647 00:46:32,000 --> 00:46:34,000 Yes. This always takes a little 648 00:46:34,000 --> 00:46:37,000 while to remember. So, our fix is going to be to 649 00:46:37,000 --> 00:46:40,000 recolor. And, the first thing that 650 00:46:40,000 --> 00:46:44,000 struck me, which doesn't work, is we try to recolor around 651 00:46:44,000 --> 00:46:46,000 here. It doesn't look so good because 652 00:46:46,000 --> 00:46:51,000 we've got red stuff out here, but we've got a black node over 653 00:46:51,000 --> 00:46:53,000 here. So we can't make this one red, 654 00:46:53,000 --> 00:46:56,000 and this one black. It wouldn't quite work. 655 00:46:56,000 --> 00:47:00,000 If we look up a little higher at the grandparent of 15 up 656 00:47:00,000 --> 00:47:06,000 here, we have a black node here and two red children. 657 00:47:06,000 --> 00:47:08,000 That's actually pretty good news because we could, 658 00:47:08,000 --> 00:47:12,000 instead, make that two black children and a red parent. 659 00:47:12,000 --> 00:47:14,000 Locally, that's going to be fine. 660 00:47:14,000 --> 00:47:17,000 It's not going to change any black heights because any path 661 00:47:17,000 --> 00:47:21,000 that went through these nodes before will still go through the 662 00:47:21,000 --> 00:47:24,000 same number of black nodes. 
Instead of going through a 663 00:47:24,000 --> 00:47:27,000 black node always here, it will go through a black node 664 00:47:27,000 --> 00:47:30,000 either here or here because paths always go down to the 665 00:47:30,000 --> 00:47:34,000 leaves. So, that's what we're going to 666 00:47:34,000 --> 00:47:38,000 do, recolor these guys. And, we will get ten, 667 00:47:38,000 --> 00:47:40,000 which is red. We'll get eight, 668 00:47:40,000 --> 00:47:43,000 which is black, 11 which is black, 669 00:47:43,000 --> 00:47:48,000 and these things don't change. Everything else doesn't change. 670 00:47:48,000 --> 00:47:53,000 We are going to leave 15 red. It's no longer in violation. 671 00:47:53,000 --> 00:47:57,000 15 is great because now its parent is black. 672 00:47:57,000 --> 00:48:02,000 We now have a new violation up here with 18 because 18 is also 673 00:48:02,000 --> 00:48:07,000 red. That's the only violation we 674 00:48:07,000 --> 00:48:10,000 have. In general, we'll have, 675 00:48:10,000 --> 00:48:17,000 at most, one violation at any time until we fix it. 676 00:48:17,000 --> 00:48:20,000 Then we'll have zero violations. 677 00:48:20,000 --> 00:48:27,000 OK, so, now we have a violation between ten and 18: 678 00:48:27,000 --> 00:48:33,000 somehow always counterintuitive to me. 679 00:48:33,000 --> 00:48:35,000 I had to look at the cheat sheet again. 680 00:48:35,000 --> 00:48:37,000 Really? No, OK, good. 681 00:48:37,000 --> 00:48:40,000 I was going to say, we can't recolor anymore. 682 00:48:40,000 --> 00:48:41,000 Good. I'm not that bad. 683 00:48:41,000 --> 00:48:45,000 So, what we'd like to do is, again, look at the grandparent 684 00:48:45,000 --> 00:48:49,000 of ten, which is now seven, the root of the tree. 685 00:48:49,000 --> 00:48:51,000 It is black, but one of its children is 686 00:48:51,000 --> 00:48:53,000 black. The other is red. 
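The recoloring step used here is case 1. It can be sketched in Python as follows, assuming the same hypothetical `RBNode` representation with `None` as a black nil leaf. The subtree is the one rooted at ten right after inserting 15, reconstructed from the transcript. The check shows that pushing ten's blackness down to eight and 11 leaves the black count on every path unchanged.

```python
from dataclasses import dataclass
from typing import Optional, Set

RED, BLACK = "red", "black"


@dataclass
class RBNode:
    key: int
    color: str
    left: Optional["RBNode"] = None    # None = black nil leaf
    right: Optional["RBNode"] = None


def recolor_case1(g: RBNode) -> RBNode:
    """Case 1: black grandparent g with two red children. Move g's blackness
    down to both children; g turns red and becomes the new x to check."""
    if not (g.color == BLACK and g.left and g.right
            and g.left.color == RED and g.right.color == RED):
        raise ValueError("case 1 needs a black node with two red children")
    g.color = RED
    g.left.color = g.right.color = BLACK
    return g


def path_black_counts(node: Optional[RBNode]) -> Set[int]:
    """Distinct black-node counts over all paths from `node` to a nil leaf."""
    if node is None:
        return {1}
    here = 1 if node.color == BLACK else 0
    return {here + c for c in path_black_counts(node.left) | path_black_counts(node.right)}


# Subtree rooted at 10 right after inserting 15 (reconstructed from the transcript).
ten = RBNode(10, BLACK, RBNode(8, RED), RBNode(11, RED, None, RBNode(15, RED)))
before = path_black_counts(ten)
x = recolor_case1(ten)    # the violation moves up two levels, from 15 to 10
```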
687 00:48:53,000 --> 00:48:57,000 So, we can't play the same game of taking the blackness of 688 00:48:57,000 --> 00:49:02,000 seven, and moving it down to the two children. 689 00:49:02,000 --> 00:49:04,000 Never mind that the root is supposed to stay black. 690 00:49:04,000 --> 00:49:06,000 We'll ignore that property for now. 691 00:49:06,000 --> 00:49:09,000 We can't make these two black and make this one red, 692 00:49:09,000 --> 00:49:11,000 because then we'd get an imbalance. 693 00:49:11,000 --> 00:49:14,000 This was already black. So now, paths going down here 694 00:49:14,000 --> 00:49:17,000 will have one fewer black node than paths going out here. 695 00:49:17,000 --> 00:49:20,000 So, we can't just recolor seven and its children. 696 00:49:20,000 --> 00:49:22,000 So, instead, we've got to do a rotation. 697 00:49:22,000 --> 00:49:26,000 We'd better be near the end. So, what I will do is rotate 698 00:49:26,000 --> 00:49:28,000 this edge. I'm going to rotate 18 to 699 00:49:28,000 --> 00:49:35,000 the right. So that's the next operation: 700 00:49:35,000 --> 00:49:39,000 rotate right of 18. 701 00:50:03,000 --> 00:50:07,000 We'll need one more operation after this. 702 00:50:07,000 --> 00:50:11,000 So, we rotate right 18. So, the root stays the same: 703 00:50:11,000 --> 00:50:14,000 seven, three, its children. 704 00:50:14,000 --> 00:50:18,000 Now, the right child of seven is no longer 18. 705 00:50:18,000 --> 00:50:22,000 It's now ten. 18 becomes the red child of 706 00:50:22,000 --> 00:50:25,000 ten. OK, we have eight over here 707 00:50:25,000 --> 00:50:33,000 with its two children. 11 and 15: that subtree fits in 708 00:50:33,000 --> 00:50:37,000 between ten and 18. So, it goes here: 709 00:50:37,000 --> 00:50:42,000 11 and 15. And then, there's the right 710 00:50:42,000 --> 00:50:47,000 subtree. Everything to the right of 18, 711 00:50:47,000 --> 00:50:51,000 that goes over here: 22 and 26.
712 00:50:51,000 --> 00:50:58,000 And hopefully I'm not changing any colors during that 713 00:50:58,000 --> 00:51:04,000 operation. If I did, let me know. 714 00:51:04,000 --> 00:51:06,000 OK, it looks good. So, I still have this 715 00:51:06,000 --> 00:51:09,000 violation, still in trouble between ten and 18. 716 00:51:09,000 --> 00:51:12,000 But, I've made this straighter. OK, that's what we want to do, 717 00:51:12,000 --> 00:51:15,000 it turns out, is make the connection between 718 00:51:15,000 --> 00:51:17,000 18, the violator, and its grandparent, 719 00:51:17,000 --> 00:51:20,000 a straight connection: two rights or two lefts. 720 00:51:20,000 --> 00:51:22,000 Here we had to zigzag right, left. 721 00:51:22,000 --> 00:51:25,000 We like to make it straight. OK, it doesn't look like a much 722 00:51:25,000 --> 00:51:27,000 more balanced tree than this one. 723 00:51:27,000 --> 00:51:31,000 In fact, it looks a little worse. 724 00:51:31,000 --> 00:51:36,000 What we can do is now rotate these guys, or rather, 725 00:51:36,000 --> 00:51:40,000 rotate this edge. I'm going to rotate seven to 726 00:51:40,000 --> 00:51:46,000 the left, make ten the root, and then things will start to 727 00:51:46,000 --> 00:51:50,000 look balanced. This is a rotate left of seven. 728 00:51:50,000 --> 00:51:57,000 And, I'm also going to do some recoloring at the same time just 729 00:51:57,000 --> 00:52:02,000 to save me drawing one more picture because the root has to 730 00:52:02,000 --> 00:52:07,000 be black. I'm going to make 10 black 731 00:52:07,000 --> 00:52:11,000 immediately. I'll make seven red. 732 00:52:11,000 --> 00:52:16,000 That's the change. And the rest is just a 733 00:52:16,000 --> 00:52:20,000 rotation. So, we have 18 over here. 734 00:52:20,000 --> 00:52:25,000 I think I actually have to rotate to keep some red 735 00:52:25,000 --> 00:52:31,000 blackness here. Eight comes between seven and 736 00:52:31,000 --> 00:52:34,000 ten. So it goes here.
737 00:52:34,000 --> 00:52:40,000 11 goes between ten and 18, so it goes here. 738 00:52:40,000 --> 00:52:46,000 22 and 26 come after 18. 739 00:52:46,000 --> 00:52:52,000 OK, now, if I'm lucky, 740 00:52:52,000 --> 00:52:58,000 I should satisfy all the 741 00:52:58,000 --> 00:53:04,000 properties that I want. Every node is red or black. 742 00:53:04,000 --> 00:53:08,000 The root and leaves are black. This is the last place we 743 00:53:08,000 --> 00:53:11,000 changed. Red nodes have black children, 744 00:53:11,000 --> 00:53:14,000 and all the black heights should be well defined. 745 00:53:14,000 --> 00:53:18,000 For every node, the number of black nodes along 746 00:53:18,000 --> 00:53:20,000 any node to leaf path is the same. 747 00:53:20,000 --> 00:53:23,000 And you check, that was true before, 748 00:53:23,000 --> 00:53:27,000 and I did a little bit of trickery with the recoloring 749 00:53:27,000 --> 00:53:30,000 here. But it's still true. 750 00:53:30,000 --> 00:53:34,000 I mean, you can check that just locally around this rotation. 751 00:53:34,000 --> 00:53:36,000 OK, we'll do that in a little bit. 752 00:53:36,000 --> 00:53:40,000 For now, it's just an example. It's probably not terribly 753 00:53:40,000 --> 00:53:44,000 clear where these re-colorings and rotations come from 754 00:53:44,000 --> 00:53:48,000 necessarily, but it worked, and it at least convinces you 755 00:53:48,000 --> 00:53:52,000 that it's possible. And now, we'll give a general 756 00:53:52,000 --> 00:53:55,000 algorithm for doing it. Any questions before we go on? 757 00:53:55,000 --> 00:53:59,000 So, it's not exactly, I mean, just writing down the 758 00:53:59,000 --> 00:54:03,000 algorithm is not terribly intuitive. 759 00:54:03,000 --> 00:54:06,000 Red black trees are the sort of thing where you play around a 760 00:54:06,000 --> 00:54:07,000 bit.
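The two rotations and the recoloring in this example can be replayed in a few lines of Python. This is a sketch under three assumptions. First, the tree is our reconstruction of the board after the case 1 recoloring. Second, nodes are a hypothetical `RBNode` dataclass without parent pointers. Third, `rotate_right` and `rotate_left` return the new subtree root, so the caller relinks it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

RED, BLACK = "red", "black"


@dataclass
class RBNode:
    key: int
    color: str
    left: Optional["RBNode"] = None    # None = black nil leaf
    right: Optional["RBNode"] = None


def rotate_right(b: RBNode) -> RBNode:
    """Right rotate of B: its left child A becomes the subtree root."""
    a = b.left
    b.left, a.right = a.right, b      # beta moves from A over to B
    return a


def rotate_left(a: RBNode) -> RBNode:
    """Left rotate of A: its right child B becomes the subtree root."""
    b = a.right
    a.right, b.left = b.left, a       # beta moves from B over to A
    return b


def preorder(n: Optional[RBNode]) -> List[Tuple[int, str]]:
    return [] if n is None else [(n.key, n.color)] + preorder(n.left) + preorder(n.right)


# State after the case 1 recoloring: 10 is red under red 18 (reconstruction).
root = RBNode(7, BLACK, RBNode(3, BLACK),
              RBNode(18, RED,
                     RBNode(10, RED, RBNode(8, BLACK),
                            RBNode(11, BLACK, None, RBNode(15, RED))),
                     RBNode(22, BLACK, None, RBNode(26, RED))))
root.right = rotate_right(root.right)     # rotate right of 18: straighten the zigzag
root = rotate_left(root)                  # rotate left of 7: 10 becomes the root
root.color, root.left.color = BLACK, RED  # recolor 10 black, 7 red
```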
You say, OK, 761 00:54:07,000 --> 00:54:10,000 I'm going to just think about recoloring and rotations. 762 00:54:10,000 --> 00:54:12,000 Let me restrict myself to those operations. 763 00:54:12,000 --> 00:54:14,000 What could I do? Well, I'll try to recolor. 764 00:54:14,000 --> 00:54:17,000 If that works, great: it pushes the problem up 765 00:54:17,000 --> 00:54:19,000 higher. And, there are only log n levels, 766 00:54:19,000 --> 00:54:22,000 order log n levels, so that's going to take order 767 00:54:22,000 --> 00:54:23,000 log n time. At some point, 768 00:54:23,000 --> 00:54:25,000 I'll get stuck. I can't recolor anymore. 769 00:54:25,000 --> 00:54:28,000 Then it turns out, a couple of rotations will do 770 00:54:28,000 --> 00:54:32,000 it. At most two rotations will 771 00:54:32,000 --> 00:54:36,000 suffice. And you just play with it, 772 00:54:36,000 --> 00:54:40,000 and that turns out to work. And here's how. 773 00:54:40,000 --> 00:54:45,000 OK, so let's suppose we have a red black tree 774 00:54:45,000 --> 00:54:48,000 and a value, x, we want to insert. 775 00:54:48,000 --> 00:54:53,000 Here's the algorithm. First, we insert it into the 776 00:54:53,000 --> 00:54:55,000 BST. That we know how to do. 777 00:54:55,000 --> 00:55:01,000 Then, we color the node red. And here, I'm going to use a 778 00:55:01,000 --> 00:55:08,000 slightly more precise notation. Color is a field of x. 779 00:55:08,000 --> 00:55:14,000 And now, we are going to walk our way up the tree with a while 780 00:55:14,000 --> 00:55:21,000 loop until we get to the root, or until we reach a black node. 781 00:55:21,000 --> 00:55:26,000 So, in general, x initially is going to be the 782 00:55:26,000 --> 00:55:32,000 element that we inserted. But, we're going to move x up 783 00:55:32,000 --> 00:55:36,000 the tree. If ever we find that x is a 784 00:55:36,000 --> 00:55:40,000 black node, we're happy: maybe its parent is red. 785 00:55:40,000 --> 00:55:42,000 Maybe it isn't. I don't care.
786 00:55:42,000 --> 00:55:45,000 Black nodes can have arbitrarily colored parents. 787 00:55:45,000 --> 00:55:48,000 It's red nodes that we worry about. 788 00:55:48,000 --> 00:55:51,000 So, if x is red, we have to keep doing this 789 00:55:51,000 --> 00:55:54,000 loop. Of course, I just wrote the 790 00:55:54,000 --> 00:55:56,000 wrong one. While the color is red, 791 00:55:56,000 --> 00:56:03,000 we're going to keep doing this. So, there are three cases, 792 00:56:03,000 --> 00:56:07,000 or six, depending on how you count. 793 00:56:07,000 --> 00:56:13,000 That's what makes this a little bit tricky to memorize. 794 00:56:13,000 --> 00:56:18,000 OK, but there are some symmetric situations. 795 00:56:18,000 --> 00:56:23,000 Let me draw them. What we care about, 796 00:56:23,000 --> 00:56:27,000 I've argued, is between x and its 797 00:56:27,000 --> 00:56:32,000 grandparent. So, I'm using p of x here to 798 00:56:32,000 --> 00:56:36,000 denote parent of x just because it's shorter. 799 00:56:36,000 --> 00:56:40,000 So, p of p of x is x's grandparent. Left of p of p of x is its left 800 00:56:40,000 --> 00:56:43,000 child. So, what I'm interested in is I 801 00:56:43,000 --> 00:56:46,000 look at x. And, if I don't assign any 802 00:56:46,000 --> 00:56:51,000 directions, x is the child of some p of x, and p of x is the 803 00:56:51,000 --> 00:56:55,000 child of the grandparent, p of p of x. 804 00:56:55,000 --> 00:56:57,000 Now, these edges aren't vertical. 805 00:56:57,000 --> 00:57:02,000 They are either left or right. And, I care about which one. 806 00:57:02,000 --> 00:57:05,000 In particular, I'm looking at whether the 807 00:57:05,000 --> 00:57:09,000 parent is the left child of the grandparent. 808 00:57:09,000 --> 00:57:15,000 So, I want to know, does it look like this? 809 00:57:15,000 --> 00:57:19,000 OK, and I don't know whether x is to the left or to the right 810 00:57:19,000 --> 00:57:22,000 of the parent.
But, is the parent of x the left 811 00:57:22,000 --> 00:57:25,000 child of p of p of x, or is it the right child? 812 00:57:25,000 --> 00:57:28,000 And these two cases are totally symmetric. 813 00:57:28,000 --> 00:57:31,000 But I need to assume it's one way or the other. 814 00:57:31,000 --> 00:57:35,000 Otherwise, I can't draw the pictures. 815 00:57:35,000 --> 00:57:39,000 OK, so this will be, let's call it category A. 816 00:57:39,000 --> 00:57:44,000 And, this is category B. And, I'm going to tell you what 817 00:57:44,000 --> 00:57:48,000 to do in category A. And category B is symmetric. 818 00:57:48,000 --> 00:57:52,000 You just flip left and right. OK, so this is A. 819 00:57:52,000 --> 00:57:56,000 So, within category A, there are three cases. 820 00:57:56,000 --> 00:58:01,000 And within category B, there are the same three cases, 821 00:58:01,000 --> 00:58:06,000 just reversed. So, what we're going to do is look 822 00:58:06,000 --> 00:58:09,000 at the other child of the grandparent. 823 00:58:09,000 --> 00:58:14,000 This is one reason why we sort of need to know which way we are 824 00:58:14,000 --> 00:58:17,000 looking. If the parent of x is the left 825 00:58:17,000 --> 00:58:21,000 child of the grandparent, we're going to look at the 826 00:58:21,000 --> 00:58:27,000 other child of the grandparent, which would be the right child 827 00:58:27,000 --> 00:58:31,000 of the grandparent, and call that node y. 828 00:58:31,000 --> 00:58:34,000 This is also known as the uncle or the aunt of x, 829 00:58:34,000 --> 00:58:37,000 depending on whether y is male or female. 830 00:58:37,000 --> 00:58:40,000 OK, so this is uncle or aunt. Unfortunately, 831 00:58:40,000 --> 00:58:44,000 in English, there is no gender-free version of this as 832 00:58:44,000 --> 00:58:47,000 far as I know. There's parent and child, 833 00:58:47,000 --> 00:58:51,000 but no uncle-aunt. I'm sure we could come up with 834 00:58:51,000 --> 00:58:53,000 one. I'm not going to try.
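Deciding between category A and category B, and finding y, only needs parent pointers. The sketch below is illustrative: `link` and `classify` are hypothetical helper names, not from the lecture, and a `None` uncle or aunt would stand for a black NIL leaf.

```python
class Node:
    def __init__(self, key, color):
        self.key, self.color = key, color
        self.left = self.right = self.parent = None


def link(parent, child, side):
    """Hypothetical helper: make child the `side` ("left" or "right") child of parent."""
    setattr(parent, side, child)
    child.parent = parent


def classify(x):
    """Return (category, y) for a node x that has a grandparent.
    Category "A": p(x) is the left child of p(p(x)); "B": it is the right child.
    y is the grandparent's other child (x's uncle or aunt); None means a black NIL leaf."""
    p = x.parent
    g = p.parent
    if p is g.left:
        return "A", g.right
    return "B", g.left


# The case-one picture: C with children A and D, and x = B under A.
c, a, b, d = Node("C", "black"), Node("A", "red"), Node("B", "red"), Node("D", "red")
link(c, a, "left")
link(c, d, "right")
link(a, b, "right")
category, y = classify(b)
print(category, y.key, y.color)   # A D red
```

Category B is handled by the same code with left and right exchanged, which is exactly the "flip left and right" symmetry described above.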
835 00:58:53,000 --> 00:58:57,000 It's going to sound bad. OK, so why do I care about y? 836 00:58:57,000 --> 00:59:03,000 Because I want to see if I can do this recoloring step. 837 00:59:03,000 --> 00:59:06,000 The recoloring idea was, well, the grandparent, 838 00:59:06,000 --> 00:59:09,000 let's say it's black. If I can push the blackness of 839 00:59:09,000 --> 00:59:12,000 the grandparent down into the two children, 840 00:59:12,000 --> 00:59:15,000 in other words, if both of these are red, 841 00:59:15,000 --> 00:59:18,000 then I'd be happy. Then I'd push the problem up. 842 00:59:18,000 --> 00:59:20,000 This guy is now red. This guy is black. 843 00:59:20,000 --> 00:59:24,000 So these two are all right. This one may violate the 844 00:59:24,000 --> 00:59:27,000 great-grandparent. But we will just keep going up, 845 00:59:27,000 --> 00:59:30,000 and that will be fine. So, if we're lucky, 846 00:59:30,000 --> 00:59:35,000 y is red. Then we can just do recoloring. 847 00:59:35,000 --> 00:59:41,000 So, if the color of y is red, then we will recolor. 848 00:59:41,000 --> 00:59:49,000 And, I'm going to defer this to a picture called case one. 849 00:59:49,000 --> 00:59:54,000 OK, let me first tell you how the cases break up, 850 00:59:54,000 --> 01:00:00,000 and then we will see how they work. 851 01:00:16,000 --> 01:00:23,546 So, if we're not in case one, so this else should be aligned 852 01:00:23,546 --> 01:00:29,744 with that, then we are either in case two 853 01:00:29,744 --> 01:00:35,000 or three. So, here's the dichotomy. 854 01:00:35,000 --> 01:00:39,066 It turns out we've actually seen all of the cases, 855 01:00:39,066 --> 01:00:43,299 maybe not A versus B, but we've seen the case at the 856 01:00:43,299 --> 01:00:46,287 very beginning where we just recolor. 857 01:00:46,287 --> 01:00:49,690 That's case one.
The next thing we saw is, 858 01:00:49,690 --> 01:00:54,338 well, it's kind of annoying that the grandparent and ten, 859 01:00:54,338 --> 01:00:57,159 so seven and ten, were not straight. 860 01:00:57,159 --> 01:01:01,226 They were zigzagged. So, case two is when they are 861 01:01:01,226 --> 01:01:04,546 zigzagged. It turns out that if x is the right 862 01:01:04,546 --> 01:01:08,364 child of its parent, and the parent is the left 863 01:01:08,364 --> 01:01:12,929 child of the grandparent, as we've assumed so far, 864 01:01:12,929 --> 01:01:18,972 that is case two. OK, the other case is that x is 865 01:01:18,972 --> 01:01:24,630 the left child of its parent. So, then we have a left chain, 866 01:01:24,630 --> 01:01:27,698 x, parent of x, grandparent of x. 867 01:01:27,698 --> 01:01:32,881 That is case three. OK, I did not write else here 868 01:01:32,881 --> 01:01:38,247 because case two reduces to case three. 869 01:01:38,247 --> 01:01:42,619 So, in case two, we are going to do the stuff 870 01:01:42,619 --> 01:01:46,892 that's here. And then, we're going to do the 871 01:01:46,892 --> 01:01:49,475 stuff here. For case three, 872 01:01:49,475 --> 01:01:53,549 we just do the stuff here. Or in case one, 873 01:01:53,549 --> 01:01:58,816 we just do the stuff here. And then, that finishes the 874 01:01:58,816 --> 01:02:03,486 three cases on the A side, then back to this if. 875 01:02:03,486 --> 01:02:06,169 We say else, this is category B, 876 01:02:06,169 --> 01:02:11,236 which is the same as A, but reversing the notions of 877 01:02:11,236 --> 01:02:17,000 left and right, OK, in the natural way. 878 01:02:17,000 --> 01:02:20,375 Every time we write left of something, we instead write 879 01:02:20,375 --> 01:02:22,500 right of something, and vice versa. 880 01:02:22,500 --> 01:02:25,500 So, this is really just flipping everything over. 881 01:02:25,500 --> 01:02:29,187 We'll just focus on category A.
And, let's see what we do in 882 01:02:29,187 --> 01:02:33,000 each of the three cases. We've seen it in an example. 883 01:02:33,000 --> 01:02:38,212 But let's do it generically. Let's do it here. 884 01:02:38,212 --> 01:02:43,424 Sorry, there's one more line to the algorithm, 885 01:02:43,424 --> 01:02:48,057 I should say. It's not aligned with here. 886 01:02:48,057 --> 01:02:53,386 We color the root. There's a chance when you do 887 01:02:53,386 --> 01:02:57,671 all of this that the root becomes red. 888 01:02:57,671 --> 01:03:03,000 We always want the root to be black. 889 01:03:03,000 --> 01:03:06,157 If it's red, we set it to black at the very 890 01:03:06,157 --> 01:03:09,992 end of the algorithm. This does not change the black 891 01:03:09,992 --> 01:03:13,375 height property. Everything will still be fine 892 01:03:13,375 --> 01:03:17,586 because every path either goes to the root or it doesn't, 893 01:03:17,586 --> 01:03:21,421 every x to leaf path. So, changing the root from red 894 01:03:21,421 --> 01:03:25,105 to black is no problem. It will increase the black 895 01:03:25,105 --> 01:03:28,714 length of every path from the root, but all the paths will still 896 01:03:28,714 --> 01:03:33,000 have the same value. It will be one larger. 897 01:03:33,000 --> 01:03:37,224 So, let's look at the three cases. 898 01:03:37,224 --> 01:03:41,704 And, I'm going to use some notation. 899 01:03:41,704 --> 01:03:48,615 Remember, we had triangles in order to denote arbitrary 900 01:03:48,615 --> 01:03:52,967 subtrees when we defined a rotation. 901 01:03:52,967 --> 01:04:00,519 I'm going to use a triangle with a dot on top to say that this 902 01:04:00,519 --> 01:04:08,448 subtree has a black root. So, when I fill something 903 01:04:08,448 --> 01:04:15,344 white, it means black because I'm on a blackboard. 904 01:04:15,344 --> 01:04:19,344 Sorry.
OK, and I also have the 905 01:04:19,344 --> 01:04:27,068 property that each of these triangles has the same black 906 01:04:27,068 --> 01:04:31,569 height. So, this will let me make sure 907 01:04:31,569 --> 01:04:35,424 that the black height property, property four, 908 01:04:35,424 --> 01:04:39,450 is being observed. So, let me just show you case 909 01:04:39,450 --> 01:04:42,105 one. We always want to make sure 910 01:04:42,105 --> 01:04:46,988 property four is preserved because it's really hard to get 911 01:04:46,988 --> 01:04:50,500 that back. It's essentially the balance of 912 01:04:50,500 --> 01:04:53,840 the tree. So, let's suppose we have some 913 01:04:53,840 --> 01:04:56,838 node, C, left child, A, right child, 914 01:04:56,838 --> 01:05:00,778 B, and some subtrees hanging off of those guys. 915 01:05:00,778 --> 01:05:05,318 And, all of those subtrees have the same black height. 916 01:05:05,318 --> 01:05:09,258 So, in other words, these things are all at the 917 01:05:09,258 --> 01:05:14,466 same level. OK, this is not quite what I 918 01:05:14,466 --> 01:05:17,462 wanted, sorry. So, I'm considering, 919 01:05:17,462 --> 01:05:21,162 this is node x. x is red, and its parent is 920 01:05:21,162 --> 01:05:23,718 red. Therefore, we need to fix 921 01:05:23,718 --> 01:05:26,361 something. We look at the node, 922 01:05:26,361 --> 01:05:29,885 y, which is over here. And, I'll call it, 923 01:05:29,885 --> 01:05:33,435 the key is D. The node is called y. 924 01:05:33,435 --> 01:05:37,740 OK, it has subtrees hanging off as well, all with the same black 925 01:05:37,740 --> 01:05:39,721 height. So, that will be true. 926 01:05:39,721 --> 01:05:43,685 If all of these nodes are red, then all of these nodes have 927 01:05:43,685 --> 01:05:46,145 the same black height. And therefore, 928 01:05:46,145 --> 01:05:49,494 all of the child subtrees, which have black roots, 929 01:05:49,494 --> 01:05:52,569 all had to have the same black height as well.
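To make properties three and four concrete on the case-one picture, here is a hedged sketch of two checkers. Assumptions: the function names are illustrative, the equal-black-height triangles are replaced by NIL leaves (which trivially share one black height), and `black_height` here counts the node itself and the NIL leaf, which differs from the textbook convention; only equality between paths matters for property four.

```python
class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right = left, right


def red_red_violations(node):
    """Property 3 check: (parent, child) key pairs where a red node has a red child."""
    if node is None:
        return []
    bad = []
    for child in (node.left, node.right):
        if child is not None and node.color == "red" and child.color == "red":
            bad.append((node.key, child.key))
        bad += red_red_violations(child)
    return bad


def black_height(node):
    """Property 4 check: black nodes on every path from node down to a NIL leaf
    (counting node itself and the NIL). Raises ValueError if two paths disagree."""
    if node is None:
        return 1
    lh, rh = black_height(node.left), black_height(node.right)
    if lh != rh:
        raise ValueError(f"black heights differ below {node.key}")
    return lh + (1 if node.color == "black" else 0)


# Case one before recoloring: x = B and its parent A are both red, as is y = D.
before = Node("C", "black", Node("A", "red", None, Node("B", "red")), Node("D", "red"))
print(red_red_violations(before), black_height(before))   # [('A', 'B')] 2
```

So the picture breaks only property three; property four already holds, and the fix must keep it that way.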
930 01:05:52,569 --> 01:05:56,738 OK, so we're looking at a big chunk of red nodes in the subtree of 931 01:05:56,738 --> 01:05:59,608 a black node, looking at all the stuff that 932 01:05:59,608 --> 01:06:02,870 happens to be red. In case one, 933 01:06:02,870 --> 01:06:07,153 y is red, so it participates. So, a way to think of this is, 934 01:06:07,153 --> 01:06:10,346 if we converted it into a two-three-four tree, 935 01:06:10,346 --> 01:06:13,104 or tried to, we would merge all of this 936 01:06:13,104 --> 01:06:16,661 stuff into one node. That's essentially what we're 937 01:06:16,661 --> 01:06:19,491 doing here. This is not a two-three-four 938 01:06:19,491 --> 01:06:22,322 tree, though. We now have five children, 939 01:06:22,322 --> 01:06:25,443 which is bad. This is why we want to fix it. 940 01:06:25,443 --> 01:06:28,201 So, we're going to recolor in case one. 941 01:06:28,201 --> 01:06:32,048 And, we're going to take C. Instead of making C black, 942 01:06:32,048 --> 01:06:35,241 and A and D red, we are going to make A and D 943 01:06:35,241 --> 01:06:39,173 black, and C red. So, C is red. 944 01:06:39,173 --> 01:06:41,158 A is black. D is black. 945 01:06:41,158 --> 01:06:45,220 And, the subtrees are the same. B is the same. 946 01:06:45,220 --> 01:06:49,191 It's still red. OK, now we need to check that 947 01:06:49,191 --> 01:06:54,245 we preserve property four, that all of the paths have the 948 01:06:54,245 --> 01:06:59,480 same number of black nodes. That follows because we know we 949 01:06:59,480 --> 01:07:04,733 didn't touch these subtrees. They all have the same black 950 01:07:04,733 --> 01:07:06,963 height. And, if you look at any path, 951 01:07:06,963 --> 01:07:10,802 like, all the paths from A are going to have that black height.
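The case-one recoloring just described can be sketched as a function that pushes the grandparent's blackness down and returns the new x. This is a minimal sketch under stated assumptions: `case_one` is a hypothetical name, nodes carry parent pointers, and only the recoloring step is shown, not the surrounding loop.

```python
class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right, self.parent = left, right, None
        for child in (left, right):        # keep parent pointers consistent
            if child is not None:
                child.parent = self


def case_one(x):
    """Case 1: x and p(x) are red and so is the uncle/aunt y (p(p(x)) is then black).
    Push the grandparent's blackness down into both children and return the new x,
    the grandparent, which is now red and may clash with its own parent."""
    p = x.parent
    g = p.parent
    y = g.right if p is g.left else g.left     # works for category A or B
    assert y is not None and y.color == "red", "not case 1"
    p.color = "black"
    y.color = "black"
    g.color = "red"
    return g


b = Node("B", "red")
c = Node("C", "black", Node("A", "red", None, b), Node("D", "red"))
x = case_one(b)
print(x.key, [n.color for n in (c, c.left, b, c.right)])
# C ['red', 'black', 'red', 'black']
```

No pointers change, so the tree's shape is untouched; x simply moves up two levels, which is what bounds the number of case-one iterations.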
952 01:07:10,802 --> 01:07:14,518 All the paths from C are going to have that black height plus 953 01:07:14,518 --> 01:07:17,862 one because there's a black node in all the left paths, 954 01:07:17,862 --> 01:07:20,834 and there is a black node in all the right paths. 955 01:07:20,834 --> 01:07:23,064 So, all the black lengths are the same. 956 01:07:23,064 --> 01:07:25,045 So, this preserves property four. 957 01:07:25,045 --> 01:07:28,327 And, it fixes property three locally because B used to 958 01:07:28,327 --> 01:07:31,884 violate A. Now B does not violate 959 01:07:31,884 --> 01:07:34,882 anything. C, now, might be a problem. 960 01:07:34,882 --> 01:07:39,593 So, what we're going to do is set x, our new value of x, 961 01:07:39,593 --> 01:07:42,162 to be C. So, it used to be B. 962 01:07:42,162 --> 01:07:46,873 We move it up a couple levels. Or, in the original tree, 963 01:07:46,873 --> 01:07:50,299 yeah, we also move it up a couple levels. 964 01:07:50,299 --> 01:07:53,468 So, we're making progress up the tree. 965 01:07:53,468 --> 01:07:57,494 And then we continue this loop. That's case one: 966 01:07:57,494 --> 01:08:01,092 recolor, go up. C may violate its parent, in 967 01:08:01,092 --> 01:08:05,460 which case we have to recurse. So, we are recursing, 968 01:08:05,460 --> 01:08:10,000 in some sense, or continuing on C. 969 01:08:10,000 --> 01:08:13,000 So now, let's look at case two. 970 01:08:39,000 --> 01:08:43,929 So, I'm still, in some sense, 971 01:08:43,929 --> 01:08:49,915 defining this algorithm by picture. 972 01:08:49,915 --> 01:08:56,957 This is some nice graphical programming 973 01:08:56,957 --> 01:09:04,000 language. So, let's draw case two. 974 01:09:04,000 --> 01:09:07,299 Yeah, I forgot to mention something about case one. 975 01:09:07,299 --> 01:09:10,862 So, I drew some things here. What do I actually know is 976 01:09:10,862 --> 01:09:13,238 true?
So, let's look at the algorithm 977 01:09:13,238 --> 01:09:16,868 in which I've now reversed. But, we are assuming that we 978 01:09:16,868 --> 01:09:19,045 are in category A. In other words, 979 01:09:19,045 --> 01:09:22,147 the parent is the left child of the grandparent. 980 01:09:22,147 --> 01:09:25,182 So, A is the left child of C. That much I knew. 981 01:09:25,182 --> 01:09:27,228 Therefore, y is the right child. 982 01:09:27,228 --> 01:09:32,712 D is the right child of C. I didn't actually know whether 983 01:09:32,712 --> 01:09:36,242 B was the right child or the left child. 984 01:09:36,242 --> 01:09:38,865 It didn't matter. In case one, 985 01:09:38,865 --> 01:09:42,665 it doesn't matter. OK, so I should've said, 986 01:09:42,665 --> 01:09:45,649 the children of A may be reversed. 987 01:09:45,649 --> 01:09:48,635 But it's just the same picture. 988 01:09:48,635 --> 01:09:53,158 OK, I thought of this because in case two, we care. 989 01:09:53,158 --> 01:09:56,325 So, case one: we didn't really care. 990 01:09:56,325 --> 01:09:58,586 In case two, we say, well, 991 01:09:58,586 --> 01:10:02,929 case two is up there, is x the right child of the 992 01:10:02,929 --> 01:10:08,864 parent, or the left child? If it's the right child, 993 01:10:08,864 --> 01:10:12,672 we are in case two. So now, I really know that 994 01:10:12,672 --> 01:10:16,013 x here, which is B, is the right child of A. 995 01:10:16,013 --> 01:10:19,043 Before, I didn't know and I didn't care. 996 01:10:19,043 --> 01:10:21,841 Now, I'm assuming that it's this way. 997 01:10:21,841 --> 01:10:26,114 OK, y is still over here. And now we know that y is 998 01:10:26,114 --> 01:10:28,601 black. So, y over here is a black 999 01:10:28,601 --> 01:10:32,121 node. So now, if I did the 1000 01:10:32,121 --> 01:10:36,045 contraction trick, all of these nodes, 1001 01:10:36,045 --> 01:10:40,287 A, B, and C, would conglomerate into one.
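In case two, the zigzag is straightened by a left rotation on the parent of x (A in this picture), which keeps the in-order sequence of keys. A sketch of a generic left rotation with parent pointers, assuming nodes store `left`, `right` and `parent` fields; `left_rotate` and `inorder` are illustrative names, not the lecture's.

```python
class Node:
    def __init__(self, key, color="red", left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right, self.parent = left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def left_rotate(root, a):
    """Rotate the edge between a and its right child b so that b takes a's place.
    The in-order sequence of keys is unchanged. Returns the (possibly new) tree root."""
    b = a.right
    a.right = b.left                  # b's left subtree now hangs off a
    if b.left is not None:
        b.left.parent = a
    b.parent = a.parent               # b hangs wherever a used to hang
    if a.parent is None:
        root = b
    elif a is a.parent.left:
        a.parent.left = b
    else:
        a.parent.right = b
    b.left = a
    a.parent = b
    return root


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


# Case two: x = B is the right child of A, A is the left child of C, and y = D is black.
c = Node("C", "black", Node("A", "red", None, Node("B", "red")), Node("D", "black"))
a = c.left
c = left_rotate(c, a)                 # rotate A left: B moves up, A becomes B's left child
print(inorder(c), c.left.key, c.left.left.key)   # ['A', 'B', 'C', 'D'] B A
```

After the rotation, A, B and C form a straight left chain, so the algorithm can fall straight through into case three with A as the new x.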
1002 01:10:40,287 --> 01:10:45,803 I only have four children. That actually looks pretty 1003 01:10:45,803 --> 01:10:49,621 good. y would not be involved because 1004 01:10:49,621 --> 01:10:52,590 it's black. So, in this case, 1005 01:10:52,590 --> 01:10:58,000 we are going to do a left rotation on A. 1006 01:10:58,000 --> 01:11:00,806 So, we take the edge, we turn it 90°. 1007 01:11:00,806 --> 01:11:04,702 What we get is A on the left, B on the right still. 1008 01:11:04,702 --> 01:11:09,146 It should preserve the in-order traversal, C up top still. 1009 01:11:09,146 --> 01:11:12,576 We have the y subtree hanging off, as before. 1010 01:11:12,576 --> 01:11:16,708 We have one of the other three subtrees hanging off B, 1011 01:11:16,708 --> 01:11:19,202 and the other two now hang off A. 1012 01:11:19,202 --> 01:11:23,723 So, this is just a generic rotation picture applied to this 1013 01:11:23,723 --> 01:11:25,594 edge. OK, what that does, 1014 01:11:25,594 --> 01:11:29,881 is before we had a zigzag between x and its grandparent. 1015 01:11:29,881 --> 01:11:36,875 Now, we have a zigzig. We have a straight path between 1016 01:11:36,875 --> 01:11:40,250 x and its grandparent. So, x is still down here. 1017 01:11:40,250 --> 01:11:47,625 I'm not changing x in this case because after I do case two, 1018 01:11:47,625 --> 01:11:54,500 I immediately do case three. So, this is what case three 1019 01:11:54,500 --> 01:12:00,125 will look like. And now, I continue on to case 1020 01:12:00,125 --> 01:12:03,207 three. So, finally, 1021 01:12:03,207 --> 01:12:08,137 here's case three. And, this will finally complete 1022 01:12:08,137 --> 01:12:12,664 the insertion algorithm. We have a black node, 1023 01:12:12,664 --> 01:12:15,783 C. We have a red left child of 1024 01:12:15,783 --> 01:12:17,393 C. We have a red 1025 01:12:17,393 --> 01:12:23,128 left grandchild, which is x.
And then, we have these black 1026 01:12:23,128 --> 01:12:28,057 subtrees all of the same black height hanging off, 1027 01:12:28,057 --> 01:12:35,000 OK, which is exactly what we had at the end of case two. 1028 01:12:35,000 --> 01:12:37,401 So, that definitely connects over. 1029 01:12:37,401 --> 01:12:40,458 And remember, this is the only case left in 1030 01:12:40,458 --> 01:12:43,369 category A. In category A, we assumed that B, 1031 01:12:43,369 --> 01:12:46,644 the parent of x, was the left child of the 1032 01:12:46,644 --> 01:12:49,336 grandparent, C. So, we know that. 1033 01:12:49,336 --> 01:12:52,757 We already did case one, where y over here is red. 1034 01:12:52,757 --> 01:12:56,323 That was case one. So, we are assuming y is black. 1035 01:12:56,323 --> 01:13:00,398 Now, we look at whether x was the left child or the right 1036 01:13:00,398 --> 01:13:04,054 child. If it was the right child, 1037 01:13:04,054 --> 01:13:08,637 we made it into the left child. x actually did change here. 1038 01:13:08,637 --> 01:13:10,850 Before, x was B. Now, x is A. 1039 01:13:10,850 --> 01:13:15,117 OK, and then case three, finally, is when x is the left 1040 01:13:15,117 --> 01:13:19,858 child of the parent, which is the left child of the grandparent. 1041 01:13:19,858 --> 01:13:23,335 This is the last case we have to worry about. 1042 01:13:23,335 --> 01:13:27,523 And, what we do is another rotation just like the last 1043 01:13:27,523 --> 01:13:34,419 rotation we did in the example. That was case three. 1044 01:13:34,419 --> 01:13:43,652 So, in this case, we're going to do a right rotation of C. 1045 01:13:43,652 --> 01:13:52,544 And, we are going to recolor. OK, so, what do we get? 1046 01:13:52,544 --> 01:14:01,777 Well, B now becomes the root. And, I'm going to make it 1047 01:14:01,777 --> 01:14:04,832 black. OK, remember, 1048 01:14:04,832 --> 01:14:06,816 this is the root of the subtree. 1049 01:14:06,816 --> 01:14:09,184 There is other stuff hanging off here.
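Case three, as described here, is one right rotation of the grandparent C plus a recoloring that makes B black and C red. A minimal sketch: `right_rotate` and `case_three` are hypothetical names, nodes carry parent pointers, and only category A is shown (category B swaps left and right).

```python
class Node:
    def __init__(self, key, color="red", left=None, right=None):
        self.key, self.color = key, color
        self.left, self.right, self.parent = left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self


def right_rotate(root, c):
    """Mirror image of a left rotation: c's left child b takes c's place.
    Returns the (possibly new) tree root."""
    b = c.left
    c.left = b.right                   # b's right subtree now hangs off c
    if b.right is not None:
        b.right.parent = c
    b.parent = c.parent
    if c.parent is None:
        root = b
    elif c is c.parent.left:
        c.parent.left = b
    else:
        c.parent.right = b
    b.right = c
    c.parent = b
    return root


def case_three(root, x):
    """Case 3, category A: x is the left child of p(x), p(x) is the left child of
    p(p(x)), and the uncle/aunt is black. Recolor, then right-rotate the grandparent."""
    p, g = x.parent, x.parent.parent
    p.color = "black"                  # B becomes the black root of this subtree
    g.color = "red"                    # C becomes B's red right child
    return right_rotate(root, g)


a = Node("A", "red")
c = Node("C", "black", Node("B", "red", a, None), Node("D", "black"))
root = case_three(c, a)
print(root.key, root.color, root.left.key, root.left.color, root.right.key, root.right.color)
# B black A red C red
```

Because the subtree's new top node B is black, x's parent is now black and the while loop stops.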
1050 01:14:09,184 --> 01:14:12,706 I really should have drawn extra parents in all of these 1051 01:14:12,706 --> 01:14:14,947 pictures. This was somewhere in the 1052 01:14:14,947 --> 01:14:17,379 middle of the tree. I don't know where. 1053 01:14:17,379 --> 01:14:21,284 It could be a rightward branch; it could be a leftward branch. 1054 01:14:21,284 --> 01:14:23,781 We don't know. C becomes the child of B, 1055 01:14:23,781 --> 01:14:26,086 and I'm going to make it a red child. 1056 01:14:26,086 --> 01:14:28,646 A becomes a child of B, as it was before, 1057 01:14:28,646 --> 01:14:31,399 and we keep it red. And, everything else just hangs 1058 01:14:31,399 --> 01:14:34,996 off. So, there were four subtrees 1059 01:14:34,996 --> 01:14:38,418 all at the same black height. And, in particular, 1060 01:14:38,418 --> 01:14:41,983 this last one had y, but we don't particularly care 1061 01:14:41,983 --> 01:14:44,977 about y anymore. Now, we are in really good 1062 01:14:44,977 --> 01:14:48,327 shape because we should have no more violations. 1063 01:14:48,327 --> 01:14:51,964 Before, we had a violation between x and its parent, 1064 01:14:51,964 --> 01:14:54,388 A and B. Well, A and B still have a 1065 01:14:54,388 --> 01:14:57,311 parent-child relation. But B is now black. 1066 01:14:57,311 --> 01:15:00,234 And, since B is black, we don't care what its 1067 01:15:00,234 --> 01:15:03,300 parent looks like. It could be red or black. 1068 01:15:03,300 --> 01:15:06,151 Both are fine. We are no longer violating 1069 01:15:06,151 --> 01:15:11,000 property three. We should be done in this case. 1070 01:15:11,000 --> 01:15:13,626 Property three is now true. If you want, 1071 01:15:13,626 --> 01:15:16,184 you can say, well, x becomes this node. 1072 01:15:16,184 --> 01:15:19,350 And then, the loop says, oh, x is no longer red. 1073 01:15:19,350 --> 01:15:22,447 Therefore, I'm done.
We also need to check that 1074 01:15:22,447 --> 01:15:25,545 property four is preserved during this process. 1075 01:15:25,545 --> 01:15:29,181 Again, it's not hard because of the two-three-four tree 1076 01:15:29,181 --> 01:15:32,464 transformation. If I contract all the red 1077 01:15:32,464 --> 01:15:35,804 things into their parents, everything else has a constant, 1078 01:15:35,804 --> 01:15:39,496 I mean, every path in that tree has the same length because they 1079 01:15:39,496 --> 01:15:41,898 have the same black length. And over here, 1080 01:15:41,898 --> 01:15:44,827 that will still be true. It's a little bit trickier 1081 01:15:44,827 --> 01:15:47,640 here, because we are recoloring at the same time. 1082 01:15:47,640 --> 01:15:50,863 But, if you look at a path that comes through this tree, 1083 01:15:50,863 --> 01:15:54,437 it used to go through a black node, C, and then maybe some red 1084 01:15:54,437 --> 01:15:57,425 stuff; I don't care. And then, it went through these 1085 01:15:57,425 --> 01:16:01,000 trees, which all have the same black height. 1086 01:16:01,000 --> 01:16:03,785 So they were all the same. Now, you come in, 1087 01:16:03,785 --> 01:16:06,376 and you go through a black node called B. 1088 01:16:06,376 --> 01:16:08,902 And then, you go through some red nodes. 1089 01:16:08,902 --> 01:16:12,400 It doesn't really matter. But all the trees that you go 1090 01:16:12,400 --> 01:16:15,251 through down here have the same black height. 1091 01:16:15,251 --> 01:16:18,878 So, every path through this tree will have the same black 1092 01:16:18,878 --> 01:16:21,663 length, OK, if it starts from the same node. 1093 01:16:21,663 --> 01:16:25,032 So, we preserve property four. We fix property three. 1094 01:16:25,032 --> 01:16:27,040 That is the insertion algorithm. 1095 01:16:27,040 --> 01:16:29,696 It's pretty long. This is something you'll 1096 01:16:29,696 --> 01:16:34,625 probably just have to memorize.
If you try a few examples, 1097 01:16:34,625 --> 01:16:37,562 it's not so hard. We can see that all the things 1098 01:16:37,562 --> 01:16:40,250 we did in this example were the three cases. 1099 01:16:40,250 --> 01:16:42,937 The first step, which unfortunately I had to 1100 01:16:42,937 --> 01:16:45,375 erase for space, all we did was recolor. 1101 01:16:45,375 --> 01:16:47,562 We recolored ten, and eight, and 11. 1102 01:16:47,562 --> 01:16:50,687 That was case one. Ten was the grandparent of 15. 1103 01:16:50,687 --> 01:16:53,437 Then, we looked at ten. Ten was the violator. 1104 01:16:53,437 --> 01:16:56,437 It was a zigzag case relative to its grandparent. 1105 01:16:56,437 --> 01:16:59,875 So, we did a right rotation to fix that, took this edge, 1106 01:16:59,875 --> 01:17:04,000 and turned it so that ten became next to seven. 1107 01:17:04,000 --> 01:17:07,545 That's the picture on the top. Then, 18, which is the new 1108 01:17:07,545 --> 01:17:10,268 violator, with its grandparent, is a zigzig. 1109 01:17:10,268 --> 01:17:12,864 They are both going in the same direction. 1110 01:17:12,864 --> 01:17:15,713 And, now, we do one more rotation to fix that. 1111 01:17:15,713 --> 01:17:18,816 That's really the only thing you have to remember. 1112 01:17:18,816 --> 01:17:21,032 Recolor your grandparent if you can. 1113 01:17:21,032 --> 01:17:24,641 Otherwise, make it zigzig. And then, do one last rotation. 1114 01:17:24,641 --> 01:17:26,604 And recolor. And that will work. 1115 01:17:26,604 --> 01:17:30,403 I mean, if you remember that, you will figure out the rest on 1116 01:17:30,403 --> 01:17:34,363 any particular example. We rotate ten over. 1117 01:17:34,363 --> 01:17:37,665 That had better be black, because in this case it's 1118 01:17:37,665 --> 01:17:41,039 becoming the root. But, we will make it black no 1119 01:17:41,039 --> 01:17:45,131 matter what happens because there has to be one black node 1120 01:17:45,131 --> 01:17:47,500 there.
If we didn't recolor at the 1121 01:17:47,500 --> 01:17:50,443 same time, we would violate property four. 1122 01:17:50,443 --> 01:17:54,606 Why don't I draw that just for, OK, because I have a couple 1123 01:17:54,606 --> 01:17:57,405 minutes. So, if we just did the rotation 1124 01:17:57,405 --> 01:18:00,061 here, so let's say, not the following, 1125 01:18:00,061 --> 01:18:02,614 we take B. B is red. 1126 01:18:02,614 --> 01:18:06,707 This will give some intuition as to why the algorithm is this 1127 01:18:06,707 --> 01:18:09,709 way, and not some other way. And, C is black. 1128 01:18:09,709 --> 01:18:13,461 That's what we would have gotten if we just rotated this 1129 01:18:13,461 --> 01:18:16,327 tree, rotated B, or rather, rotated C to the right. 1130 01:18:16,327 --> 01:18:19,261 So, these subtrees hang off in the same way. 1131 01:18:19,261 --> 01:18:23,013 Subtrees look great because they all have the same black 1132 01:18:23,013 --> 01:18:24,378 height. But, you see, 1133 01:18:24,378 --> 01:18:27,448 there's a problem. If we look at all the paths 1134 01:18:27,448 --> 01:18:31,064 starting from B and going down to a leaf, on the left, 1135 01:18:31,064 --> 01:18:34,884 the number of black nodes is whatever the black height is 1136 01:18:34,884 --> 01:18:39,237 over here. Label that: black height, 1137 01:18:39,237 --> 01:18:44,517 whereas all the paths on the right will be that black height 1138 01:18:44,517 --> 01:18:49,797 plus one because C is black. So now, we've violated property 1139 01:18:49,797 --> 01:18:52,750 four. So, we don't do this in case 1140 01:18:52,750 --> 01:18:55,525 three. After we do the rotation, 1141 01:18:55,525 --> 01:19:00,000 we also do a recoloring. So, we get this.
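Putting the three cases together, here is one possible end-to-end Python sketch of red-black insertion, modeled on the structure described in this lecture (and on the textbook RB-INSERT): BST-insert, color red, loop while x's parent is red, mirror everything for category B, and finally color the root black. It also counts rotations so the "at most two rotations per insertion" claim can be checked. The class and method names, and the use of `None` for the black NIL leaves, are illustrative assumptions.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.color = RED                   # inserted nodes start out red
        self.left = self.right = None      # None stands for a black NIL leaf
        self.parent = parent


class RBTree:
    def __init__(self):
        self.root = None
        self.rotations = 0                 # running count of rotations

    def _rotate(self, x, left):
        """Left-rotate x if left is True, else right-rotate it; x's child takes x's place."""
        y = x.right if left else x.left
        inner = y.left if left else y.right    # the subtree that switches parents
        if left:
            x.right = inner
        else:
            x.left = inner
        if inner is not None:
            inner.parent = x
        y.parent = x.parent
        if x.parent is None:
            self.root = y
        elif x is x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        if left:
            y.left = x
        else:
            y.right = x
        x.parent = y
        self.rotations += 1

    def insert(self, key):
        parent, cur = None, self.root          # ordinary BST insertion
        while cur is not None:
            parent, cur = cur, (cur.left if key < cur.key else cur.right)
        x = Node(key, parent)                  # colored red
        if parent is None:
            self.root = x
        elif key < parent.key:
            parent.left = x
        else:
            parent.right = x
        # Loop while x (always red here) has a red parent, i.e. property 3 fails.
        while x is not self.root and x.parent.color == RED:
            p = x.parent
            g = p.parent                       # exists: a red parent is never the root
            cat_a = p is g.left                # category A; category B is the mirror image
            y = g.right if cat_a else g.left   # x's uncle or aunt (None = black NIL)
            if y is not None and y.color == RED:
                p.color = y.color = BLACK      # case 1: push g's blackness down,
                g.color = RED                  # then continue two levels up
                x = g
                continue
            if x is (p.right if cat_a else p.left):
                x = p                          # case 2: zigzag -> zigzig
                self._rotate(x, left=cat_a)
                p = x.parent
            p.color = BLACK                    # case 3: recolor and rotate the grandparent
            g.color = RED
            self._rotate(g, left=not cat_a)
        self.root.color = BLACK                # last line: color the root black


def check(node):
    """Return the black height of node (NIL counts as 1), asserting properties 3 and 4."""
    if node is None:
        return 1
    if node.color == RED:
        assert all(c is None or c.color == BLACK for c in (node.left, node.right)), "red-red"
    lh, rh = check(node.left), check(node.right)
    assert lh == rh, "unequal black heights"
    return lh + (1 if node.color == BLACK else 0)


tree = RBTree()
for k in range(1, 11):                         # sorted input: worst case for a plain BST
    tree.insert(k)
print(tree.root.key, tree.rotations, check(tree.root))  # 4 5 4
```

Sorted insertions would turn a plain BST into a path; here the ten keys end up under root 4 with only five rotations in total, each insertion using at most two.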
1142 01:19:00,000 --> 01:19:02,652 In other words, we are putting the black node 1143 01:19:02,652 --> 01:19:06,027 at the top because then every path has to go through that 1144 01:19:06,027 --> 01:19:09,282 node, whereas over here, some of the paths went through 1145 01:19:09,282 --> 01:19:11,331 C. Some of them went through A. 1146 01:19:11,331 --> 01:19:13,983 So, this is bad. Also, we would have violated 1147 01:19:13,983 --> 01:19:16,575 property three. But, the really bad thing is 1148 01:19:16,575 --> 01:19:19,287 that we are violating property four over here. 1149 01:19:19,287 --> 01:19:22,000 OK, let me sum up a little bit. 1150 01:19:32,000 --> 01:19:38,734 So, we've seen that if we insert into a red black 1151 01:19:38,734 --> 01:19:44,397 tree, we can keep it a red black tree. 1152 01:19:44,397 --> 01:19:53,734 So, RB insert adds x to the dynamic set that we are 1153 01:19:53,734 --> 01:20:02,000 trying to maintain, and preserves red blackness. 1154 01:20:02,000 --> 01:20:06,036 So, it keeps the tree a red black tree, which is good 1155 01:20:06,036 --> 01:20:09,762 because we know then it keeps logarithmic height. 1156 01:20:09,762 --> 01:20:14,187 Therefore, all queries in red black trees will keep taking 1157 01:20:14,187 --> 01:20:17,835 logarithmic time. How long does red black insert 1158 01:20:17,835 --> 01:20:20,630 take? We know we are aiming for log n 1159 01:20:20,630 --> 01:20:24,278 time per operation. We are not going to prove that 1160 01:20:24,278 --> 01:20:27,616 formally, but it should be pretty intuitive. 1161 01:20:27,616 --> 01:20:31,652 So, cases two and three, sorry, pointing at the wrong 1162 01:20:31,652 --> 01:20:36,000 place, cases two and three are terminal. 1163 01:20:36,000 --> 01:20:38,059 When we do case three, we are done. 1164 01:20:38,059 --> 01:20:41,090 When we do case two, we are about to do case three, 1165 01:20:41,090 --> 01:20:44,242 and then we are done.
OK, so the only thing we really 1166 01:20:44,242 --> 01:20:47,030 have to count is case one because each of these 1167 01:20:47,030 --> 01:20:50,363 operations, recoloring and rotation, takes 1168 01:20:50,363 --> 01:20:52,484 constant time. So, it's a matter of, 1169 01:20:52,484 --> 01:20:55,454 how many are there? Case one does some recoloring, 1170 01:20:55,454 --> 01:21:00,000 doesn't change the shape of the tree at all, and moves x up by two levels. 1171 01:21:00,000 --> 01:21:04,226 We know that the height of the tree is, at most, 1172 01:21:04,226 --> 01:21:08,722 two log of n plus one. So, the number of case ones is, 1173 01:21:08,722 --> 01:21:13,577 at most, log of n plus one. OK, so the number of case ones 1174 01:21:13,577 --> 01:21:17,623 is order log n. So, those take log n time. 1175 01:21:17,623 --> 01:21:21,850 And then, the number of case twos and threes is, 1176 01:21:21,850 --> 01:21:25,177 at most, one of each. 1177 01:21:25,177 --> 01:21:28,234 Well, together, twos and threes are, 1178 01:21:28,234 --> 01:21:31,111 at most, two. OK, so, log n time, 1179 01:21:31,111 --> 01:21:34,778 cool. The other thing that is 1180 01:21:34,778 --> 01:21:39,264 interesting about red black insertion is that it only makes 1181 01:21:39,264 --> 01:21:42,898 order one rotations. So, most of the changes are 1182 01:21:42,898 --> 01:21:46,146 recolorings. Case one just does recoloring, 1183 01:21:46,146 --> 01:21:48,930 no rotations. Case two does one 1184 01:21:48,930 --> 01:21:52,023 rotation. Case three does one rotation if 1185 01:21:52,023 --> 01:21:56,895 you happen to be in those cases. So, the number of rotations is, 1186 01:21:56,895 --> 01:22:00,066 at most, two. It's zero, one, or two in an 1187 01:22:00,066 --> 01:22:03,666 insertion. It's kind of nice because 1188 01:22:03,666 --> 01:22:07,733 rotating a tree is a bit more annoying than recoloring a tree. 1189 01:22:07,733 --> 01:22:09,266 Why?
Because if you have, 1190 01:22:09,266 --> 01:22:12,266 say, a data structure, like a search tree, 1191 01:22:12,266 --> 01:22:16,133 presumably, people are using the search tree for something. 1192 01:22:16,133 --> 01:22:18,133 They are, like, making queries. 1193 01:22:18,133 --> 01:22:20,933 For example, the search tree represents all 1194 01:22:20,933 --> 01:22:24,266 the documents matching the word computer in Google. 1195 01:22:24,266 --> 01:22:28,199 You've got the Google T-shirt on here, so let's use a Google 1196 01:22:28,199 --> 01:22:31,201 reference. You have the search tree. 1197 01:22:31,201 --> 01:22:33,652 It stores all the things containing the word Google. 1198 01:22:33,652 --> 01:22:36,440 You'd like to search, maybe, for the ones that were modified 1199 01:22:36,440 --> 01:22:38,843 after a certain date, or whatever it is you want to 1200 01:22:38,843 --> 01:22:40,381 do. So, you're doing some queries 1201 01:22:40,381 --> 01:22:42,207 on this tree. And, people are pummeling 1202 01:22:42,207 --> 01:22:45,043 Google like crazy with queries. They get a zillion a second. 1203 01:22:45,043 --> 01:22:47,638 Don't quote me on that. The number may not be accurate. 1204 01:22:47,638 --> 01:22:49,849 It's a zillion. But, people are making searches 1205 01:22:49,849 --> 01:22:51,627 all the time. If you recolor the tree, 1206 01:22:51,627 --> 01:22:54,559 people can still make searches. It's just a bit that you are 1207 01:22:54,559 --> 01:22:56,145 flipping. I don't care in a search 1208 01:22:56,145 --> 01:22:58,885 whether a node is red or black because I know it will have 1209 01:22:58,885 --> 01:23:02,608 logarithmic height. So, you can come along and make 1210 01:23:02,608 --> 01:23:05,824 your occasional updates as your crawler surfs the Web and finds 1211 01:23:05,824 --> 01:23:07,536 changes. And, recoloring is great.
1212 01:23:07,536 --> 01:23:10,493 Rotation is a bit expensive because you have to lock those 1213 01:23:10,493 --> 01:23:13,657 nodes, make sure no one touches them for the duration that you 1214 01:23:13,657 --> 01:23:15,370 rotate them, and then unlock them. 1215 01:23:15,370 --> 01:23:18,016 So, it's nice that the number of rotations is small, 1216 01:23:18,016 --> 01:23:20,246 really small, at most two, whereas the time has 1217 01:23:20,246 --> 01:23:23,048 to be log n because we are inserting into a sorted list 1218 01:23:23,048 --> 01:23:25,175 essentially. So, there is an n log n lower 1219 01:23:25,175 --> 01:23:28,184 bound if we do n insertions. OK, deletion I'm not going 1220 01:23:28,184 --> 01:23:31,625 to cover here. You should read it in the book. 1221 01:23:31,625 --> 01:23:34,455 It's a little bit more complicated, but it uses the same ideas. 1222 01:23:34,455 --> 01:23:37,338 It gets the same bounds: log n time, order one rotations. 1223 01:23:37,338 --> 01:23:39,434 So, check it out. That's red black trees. 1224 01:23:39,434 --> 01:23:42,161 Now, you can maintain data in log n time per operation: 1225 01:23:42,161 --> 01:23:43,996 cool. We'll now see three ways to do 1226 01:23:43,996 --> 01:23:46,000 it.