1 00:00:00,080 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high quality educational resources for free. 5 00:00:10,730 --> 00:00:13,340 To make a donation or view additional materials 6 00:00:13,340 --> 00:00:17,236 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,236 --> 00:00:17,861 at ocw.mit.edu. 8 00:00:21,190 --> 00:00:23,410 PROFESSOR: All right, let's get started. 9 00:00:23,410 --> 00:00:26,010 Today we have another data structures topic 10 00:00:26,010 --> 00:00:30,360 which is, Data Structure Augmentation. 11 00:00:30,360 --> 00:00:33,230 The idea here is we're going to take some existing data 12 00:00:33,230 --> 00:00:36,715 structure and augment it to do extra cool things. 13 00:00:46,390 --> 00:00:49,560 Take some other data structure there we've covered. 14 00:00:49,560 --> 00:00:53,640 Typically, that'll be a balanced search tree, 15 00:00:53,640 --> 00:00:55,870 like an AVL tree or a 2-3 tree. 16 00:00:59,480 --> 00:01:04,300 And then we'll modify it to store extra information which 17 00:01:04,300 --> 00:01:09,270 will enable additional kinds of searches, typically, 18 00:01:09,270 --> 00:01:11,240 and sometimes to do updates better. 19 00:01:21,120 --> 00:01:24,040 And in 006, you've seen an example of this where you took 20 00:01:24,040 --> 00:01:28,200 AVL trees and augmented AVL trees so that every node knew 21 00:01:28,200 --> 00:01:31,290 the number of nodes in that rooted subtree. 22 00:01:31,290 --> 00:01:33,630 Today we're going to see that example but also 23 00:01:33,630 --> 00:01:36,650 a bunch of other examples, different types of augmentation 24 00:01:36,650 --> 00:01:37,719 you could do. 
25 00:01:37,719 --> 00:01:39,760 And we'll start out with a very simple one, which 26 00:01:39,760 --> 00:01:49,400 I call easy tree augmentation, which will include 27 00:01:49,400 --> 00:01:51,235 subtree size as a special case. 28 00:01:57,760 --> 00:02:02,220 So with easy tree augmentation, the idea 29 00:02:02,220 --> 00:02:06,240 is you have a tree, like an AVL tree, or 2-3 tree, 30 00:02:06,240 --> 00:02:07,480 or something like that. 31 00:02:07,480 --> 00:02:09,620 And you'd like to store, for every node 32 00:02:09,620 --> 00:02:13,480 x, some function of the subtree, rooted at x. 33 00:02:13,480 --> 00:02:15,220 Such as the number of nodes in there, 34 00:02:15,220 --> 00:02:17,382 or the sum of the weights of the nodes, 35 00:02:17,382 --> 00:02:19,090 or the sum of the squares of the weights, 36 00:02:19,090 --> 00:02:26,610 or the min, or the max, or the median maybe, I'm not sure. 37 00:02:26,610 --> 00:02:32,140 Some function f of x which is a function of that. 38 00:02:32,140 --> 00:02:35,710 Maybe not f of x, but we want to store 39 00:02:35,710 --> 00:02:37,430 some function of that subtree. 40 00:02:42,120 --> 00:02:49,325 Say the goal is to store f of the subtree rooted 41 00:02:49,325 --> 00:03:07,584 at x at each node x in a field which I'll call x.f. 42 00:03:07,584 --> 00:03:11,530 So, normally nodes have a left child, right child, parent. 43 00:03:11,530 --> 00:03:13,770 But we're going to store an extra field 44 00:03:13,770 --> 00:03:18,270 x.f for some function that you define. 45 00:03:18,270 --> 00:03:22,590 This is not always possible, but here's 46 00:03:22,590 --> 00:03:25,350 a case where it is possible. 47 00:03:25,350 --> 00:03:28,730 That's going to be the easy case. 48 00:03:28,730 --> 00:03:34,710 Suppose x.f can be computed locally 49 00:03:34,710 --> 00:03:37,905 using lower information, lower nodes. 
50 00:03:47,410 --> 00:03:48,880 And we'll say, let's suppose it can 51 00:03:48,880 --> 00:03:52,530 be computed in constant time from information in the node 52 00:03:52,530 --> 00:04:02,500 x, from x's children, and from the f value 53 00:04:02,500 --> 00:04:04,500 that's stored in the children. 54 00:04:04,500 --> 00:04:05,750 I'll call that children.f. 55 00:04:05,750 --> 00:04:08,560 But really, I mean left child.f, right child.f, 56 00:04:08,560 --> 00:04:10,290 or if you have a 2-3 tree you have 57 00:04:10,290 --> 00:04:12,100 three children, potentially. 58 00:04:12,100 --> 00:04:14,250 And the .f of each of them. 59 00:04:14,250 --> 00:04:14,750 OK. 60 00:04:14,750 --> 00:04:19,050 So suppose you can compute x.f locally just 61 00:04:19,050 --> 00:04:23,020 using one level down in constant time. 62 00:04:23,020 --> 00:04:26,420 Then, as you might expect, you can update whenever 63 00:04:26,420 --> 00:04:30,510 a node ends up changing. 64 00:04:30,510 --> 00:04:33,560 So more formally. 65 00:04:33,560 --> 00:04:48,510 If some set of nodes change-- call this set S. 66 00:05:18,970 --> 00:05:22,540 So I'm stating a very general theorem here. 67 00:05:22,540 --> 00:05:28,920 If there is some set of nodes, which we changed something 68 00:05:28,920 --> 00:05:30,310 about them. 69 00:05:30,310 --> 00:05:33,070 We change either their f field, we 70 00:05:33,070 --> 00:05:35,220 change some of the data that's in the node, 71 00:05:35,220 --> 00:05:39,420 or we do a rotation, moving those around. 72 00:05:39,420 --> 00:05:44,840 Then we count the total number of ancestors of these nodes. 73 00:05:44,840 --> 00:05:47,100 So this subtree. 74 00:05:47,100 --> 00:05:49,080 Those are the nodes that need to be updated 75 00:05:49,080 --> 00:05:52,870 because we're assuming we can compute x.f just given 76 00:05:52,870 --> 00:05:54,080 the children data. 
77 00:05:54,080 --> 00:05:56,880 So if this data is changing, we have 78 00:05:56,880 --> 00:05:58,850 to update its parent's value of f 79 00:05:58,850 --> 00:06:01,350 because it depends on this child value. 80 00:06:01,350 --> 00:06:03,540 We have to update all those parents, 81 00:06:03,540 --> 00:06:06,820 all the way up to the root. 82 00:06:06,820 --> 00:06:10,880 So however many nodes there are there, that's the total cost. 83 00:06:10,880 --> 00:06:17,010 Now, luckily, in an AVL tree, or 2-3 tree, most balanced search 84 00:06:17,010 --> 00:06:20,610 structures, the updates you do are very localized. 85 00:06:20,610 --> 00:06:23,520 When we do splits in a 2-3 tree we only 86 00:06:23,520 --> 00:06:27,590 do it up a single path to the root. 87 00:06:27,590 --> 00:06:31,130 So the number of ancestors here is just going to be log n. 88 00:06:31,130 --> 00:06:32,360 Same thing with an AVL tree. 89 00:06:32,360 --> 00:06:34,340 If you look at the rotations you do, 90 00:06:34,340 --> 00:06:39,640 they are up a single leaf-to-root path. 91 00:06:39,640 --> 00:06:42,080 And so the number of ancestors that 92 00:06:42,080 --> 00:06:44,960 need to be updated is always order log n. 93 00:06:44,960 --> 00:06:49,820 Things change, and there are order log n ancestors of them. 94 00:06:49,820 --> 00:06:52,830 So this is a little more general than we need, 95 00:06:52,830 --> 00:06:56,770 but it's just to point out if we did log n rotations spread out 96 00:06:56,770 --> 00:06:59,550 somewhere in the tree, that would actually be bad 97 00:06:59,550 --> 00:07:03,490 because the total number of ancestors could be log squared n. 98 00:07:03,490 --> 00:07:07,860 But because in the structures we've seen, 99 00:07:07,860 --> 00:07:12,000 we just work on a single path to the root, we get log n. 100 00:07:33,330 --> 00:07:38,790 So in a little more detail here, whenever we 101 00:07:38,790 --> 00:07:41,060 do a rotation in an AVL tree. 
102 00:07:49,610 --> 00:07:53,350 Let's say A, B, C, x, y. 103 00:07:56,160 --> 00:07:57,050 Remember rotations? 104 00:07:57,050 --> 00:08:00,680 Been a while since we've done rotations. 105 00:08:00,680 --> 00:08:04,750 So we haven't changed any of the nodes in A, B, C, 106 00:08:04,750 --> 00:08:08,160 but we have changed the nodes x and y. 107 00:08:08,160 --> 00:08:13,010 So we're going to have to trigger an update of y. 108 00:08:13,010 --> 00:08:16,540 First, we'd want to update y.f and then 109 00:08:16,540 --> 00:08:21,420 we're going to trigger the update to x.f. 110 00:08:21,420 --> 00:08:24,940 And as long as this one can be computed from its children, 111 00:08:24,940 --> 00:08:29,810 then we compute y.f, then we can compute x from its children. 112 00:08:29,810 --> 00:08:30,310 All right. 113 00:08:30,310 --> 00:08:31,810 So a constant number of extra things 114 00:08:31,810 --> 00:08:33,850 we need to do whenever we do rotation. 115 00:08:33,850 --> 00:08:38,610 And because the rotations lie on a single path, total cost 116 00:08:38,610 --> 00:08:43,659 that-- once we stop doing the rotations, in AVL insert say, 117 00:08:43,659 --> 00:08:46,100 then we still have to keep updating up to the root. 118 00:08:46,100 --> 00:08:50,440 But there's only log n at most log n nodes to do that. 119 00:08:50,440 --> 00:08:50,940 OK. 120 00:08:50,940 --> 00:08:53,590 Same thing with 2-3 trees. 121 00:08:53,590 --> 00:08:55,860 We have a node split. 122 00:08:55,860 --> 00:08:58,730 So we have, I guess, three keys, four children. 123 00:08:58,730 --> 00:09:00,190 That's too many. 124 00:09:00,190 --> 00:09:06,860 So we split to two nodes and an extra node up here. 125 00:09:09,810 --> 00:09:12,380 Then we just trigger an update of this f value, 126 00:09:12,380 --> 00:09:15,650 an update of this f value, and an update of that f value. 127 00:09:15,650 --> 00:09:21,510 And because that just follows a single path everything's log n. 
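[Editor's sketch, not part of the lecture.] The rotation update described above can be sketched roughly as follows. This assumes a binary tree with parent pointers and uses subtree size as the stored function f; the names `Node`, `update`, `rotate_right`, and `update_ancestors` are hypothetical, and the rotation direction drawn on the board is assumed to be the one shown in the docstring.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self
        update(self)


def update(x):
    """Recompute x.f in O(1) from x and its children's stored f.

    Here f is the subtree size; any locally computable function would do.
    """
    x.f = 1 + sum(c.f for c in (x.left, x.right) if c is not None)


def rotate_right(y):
    """Rotate right at y and return x, the new root of this subtree.

          y              x
         / \            / \
        x   C   -->    A   y
       / \                / \
      A   B              B   C
    """
    x = y.left
    # Subtree B moves from x.right to y.left.
    y.left = x.right
    if x.right is not None:
        x.right.parent = y
    # x takes y's place under y's old parent.
    x.parent = y.parent
    if y.parent is not None:
        if y.parent.left is y:
            y.parent.left = x
        else:
            y.parent.right = x
    x.right = y
    y.parent = x
    # A, B, C are untouched. y is now the lower node, so fix it first.
    update(y)
    update(x)
    return x


def update_ancestors(x):
    """After x changes, refresh f on every proper ancestor of x."""
    node = x.parent
    while node is not None:
        update(node)
        node = node.parent
```

Only two fields are recomputed per rotation, and `update_ancestors` walks a single path to the root, which is where the log n bound in a balanced tree comes from.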
128 00:09:21,510 --> 00:09:23,820 So this is a general theorem about augmentation. 129 00:09:23,820 --> 00:09:26,850 Any function that's well behaved in this sense, 130 00:09:26,850 --> 00:09:32,260 we can maintain in AVL trees and 2-3 trees. 131 00:09:32,260 --> 00:09:37,690 And I'll remind you and state, a little more generally, 132 00:09:37,690 --> 00:09:41,674 what you did in 006, which are called order statistic trees 133 00:09:41,674 --> 00:09:42,340 in the textbook. 134 00:09:50,540 --> 00:09:54,850 So here we're going to-- let me first tell you 135 00:09:54,850 --> 00:09:56,220 what we're trying to achieve. 136 00:09:56,220 --> 00:09:58,939 This is the abstract data type, or the interface 137 00:09:58,939 --> 00:09:59,855 of the data structure. 138 00:10:03,360 --> 00:10:13,610 We want to do insert, delete, and say, successor searches. 139 00:10:16,590 --> 00:10:19,230 It's the usual thing we want out of a binary search tree. 140 00:10:19,230 --> 00:10:22,210 Predecessor too, sure. 141 00:10:22,210 --> 00:10:29,800 We want to do rank of a given key which is, tell me 142 00:10:29,800 --> 00:10:38,210 what is the index of that key in the overall sorted order 143 00:10:38,210 --> 00:10:40,770 of the items, of the keys? 144 00:10:40,770 --> 00:10:43,500 We've talked about rank a few times already in this class. 145 00:10:46,170 --> 00:10:48,220 Depends whether you start at 0 or 1, 146 00:10:48,220 --> 00:10:52,890 but let's say we start at one. 147 00:10:52,890 --> 00:10:56,017 So if you say rank of the key that happens to be the minimum, 148 00:10:56,017 --> 00:10:56,850 you want to get one. 149 00:10:56,850 --> 00:10:59,224 If you say rank of the key that happens to be the median, 150 00:10:59,224 --> 00:11:05,430 you want to get n over 2 plus 1, and so on. 151 00:11:05,430 --> 00:11:08,960 So it's a natural thing you might want to find out. 
152 00:11:08,960 --> 00:11:14,100 And the converse operation is select, 153 00:11:14,100 --> 00:11:19,280 let's say of i, which is, give me the key of rank i. 154 00:11:26,710 --> 00:11:31,620 We've talked about select as an offline operation. 155 00:11:31,620 --> 00:11:34,720 Given an array, find me the median. 156 00:11:34,720 --> 00:11:38,960 Or find me the n over seventh rank item. 157 00:11:38,960 --> 00:11:42,810 And we can do that in linear time given no data structure. 158 00:11:42,810 --> 00:11:44,720 Here, we want a data structure so 159 00:11:44,720 --> 00:11:48,900 that we can find the median, or the seventh item, 160 00:11:48,900 --> 00:11:54,630 or the n over seventh key, whatever in log n time. 161 00:11:54,630 --> 00:11:58,995 We want to do all of these in log n per operation. 162 00:12:05,241 --> 00:12:05,740 OK. 163 00:12:05,740 --> 00:12:10,120 So in particular, rank of select of i should equal i. 164 00:12:10,120 --> 00:12:12,210 We're trying to find the item of that rank. 165 00:12:14,970 --> 00:12:17,520 So far, so good. 166 00:12:17,520 --> 00:12:22,410 And just to plug these two parts together. 167 00:12:22,410 --> 00:12:25,290 We have this data structure augmentation tool, 168 00:12:25,290 --> 00:12:27,170 we have this goal we want to achieve, 169 00:12:27,170 --> 00:12:29,710 we're going to achieve this goal by applying 170 00:12:29,710 --> 00:12:33,125 this technique where f is just the subtree size. 171 00:12:33,125 --> 00:12:36,350 It's the number of nodes in that subtree 172 00:12:36,350 --> 00:12:39,880 because that will let us compute rank. 173 00:12:39,880 --> 00:13:03,890 So we're going to use easy tree augmentation with f of subtree 174 00:13:03,890 --> 00:13:06,220 equal to the number of nodes in the subtree. 175 00:13:11,230 --> 00:13:13,310 So in order for this to apply, we 176 00:13:13,310 --> 00:13:18,090 need to check that given a node x we can compute x.f just using 177 00:13:18,090 --> 00:13:19,160 its children. 
178 00:13:19,160 --> 00:13:19,890 This is easy. 179 00:13:23,330 --> 00:13:24,780 We just add everything up. 180 00:13:24,780 --> 00:13:29,092 So x.f would be equal to 1. 181 00:13:29,092 --> 00:13:30,770 That's for x. 182 00:13:30,770 --> 00:13:37,640 Plus the sum of c.f for every child c. 183 00:13:41,980 --> 00:13:46,250 I'll write this as a python interpolation 184 00:13:46,250 --> 00:13:48,690 so it looks a little more like an algorithm. 185 00:13:48,690 --> 00:13:50,330 I'm trying to be generic here. 186 00:13:50,330 --> 00:13:52,000 If it's a binary search tree you just 187 00:13:52,000 --> 00:13:56,840 do x.left.f, plus x.right.f. 188 00:13:56,840 --> 00:13:59,240 But this will work also for 2-3 trees. 189 00:13:59,240 --> 00:14:02,320 Pick your favorite data structure. 190 00:14:02,320 --> 00:14:05,660 As long as there's a constant number of children then 191 00:14:05,660 --> 00:14:07,420 this will take constant time. 192 00:14:07,420 --> 00:14:09,440 So we satisfied this condition. 193 00:14:09,440 --> 00:14:11,710 So we can do easy tree augmentation. 194 00:14:11,710 --> 00:14:13,600 And now we know we have subtree sizes. 195 00:14:13,600 --> 00:14:14,520 So given any node. 196 00:14:14,520 --> 00:14:19,930 We know the number of descendants below that node. 197 00:14:19,930 --> 00:14:20,790 So that's cool. 198 00:14:20,790 --> 00:14:24,240 It lets us compute rank and select. 199 00:14:24,240 --> 00:14:29,130 I'll just give you those algorithms, quickly. 200 00:14:29,130 --> 00:14:31,540 We can check that they're log n time. 201 00:14:39,131 --> 00:14:39,630 Yeah. 202 00:14:39,630 --> 00:14:44,670 So the idea is pretty simple. 203 00:14:44,670 --> 00:14:47,630 You have some key-- let's think about binary trees 204 00:14:47,630 --> 00:14:50,360 now, because it's a little bit easier. 205 00:14:50,360 --> 00:14:55,250 We have some item x. 206 00:14:55,250 --> 00:14:58,810 It has a left subtree, right subtree. 
207 00:14:58,810 --> 00:15:02,160 And now let's look up from x. 208 00:15:02,160 --> 00:15:05,670 Just keep calling x.parent. 209 00:15:05,670 --> 00:15:08,950 So sometimes the parent is to the right of us 210 00:15:08,950 --> 00:15:11,890 and sometimes the parent is to the left of us. 211 00:15:11,890 --> 00:15:14,600 I'm going to draw this in a, kind of, funny way. 212 00:15:18,530 --> 00:15:22,810 But this funny way has a very special property, 213 00:15:22,810 --> 00:15:25,890 which is that the x-coordinate in this diagram 214 00:15:25,890 --> 00:15:27,220 is the key value. 215 00:15:27,220 --> 00:15:29,450 Or is the sorted order of the keys, right? 216 00:15:29,450 --> 00:15:34,040 Everything in the left subtree of x has a value less than x. 217 00:15:34,040 --> 00:15:35,940 If we say all the keys are different. 218 00:15:35,940 --> 00:15:38,760 Everything to the right of x has a value greater than x. 219 00:15:38,760 --> 00:15:42,130 If x was the left child of its parent, 220 00:15:42,130 --> 00:15:45,860 that means this thing is also greater than x. 221 00:15:45,860 --> 00:15:48,740 And if we follow a parent and this was 222 00:15:48,740 --> 00:15:50,990 the right child of that parent, that 223 00:15:50,990 --> 00:15:52,910 means this thing is less than x. 224 00:15:52,910 --> 00:15:55,330 So that's why I drew it all the way over to the left. 225 00:15:55,330 --> 00:15:58,010 This thing is also less than x because it was a, 226 00:15:58,010 --> 00:15:59,720 I'll call it a left parent. 227 00:15:59,720 --> 00:16:01,270 Here we have a right parent, so that 228 00:16:01,270 --> 00:16:04,060 means this is something greater than x. 229 00:16:04,060 --> 00:16:05,980 And over here we have a left parent, so this 230 00:16:05,980 --> 00:16:07,021 is something less than x. 231 00:16:07,021 --> 00:16:08,152 Let's say that's the root. 
232 00:16:08,152 --> 00:16:09,860 In general, there's going to be some left 233 00:16:09,860 --> 00:16:13,530 edges and some right edges as we go up. 234 00:16:13,530 --> 00:16:18,170 These arrows will go either left or right in a binary tree. 235 00:16:18,170 --> 00:16:23,060 So the rank of x is just 1 plus the number of nodes 236 00:16:23,060 --> 00:16:24,200 that are less than x. 237 00:16:24,200 --> 00:16:26,410 Number of keys that are less than x. 238 00:16:26,410 --> 00:16:30,370 So there's these guys, there's these guys, 239 00:16:30,370 --> 00:16:33,390 and there's whatever's hanging off-- OK. 240 00:16:33,390 --> 00:16:36,477 Here I've almost violated my x-coordinate rule. 241 00:16:36,477 --> 00:16:38,310 If I make these really narrow, that's right. 242 00:16:40,940 --> 00:16:44,300 All of these things, all of these nodes 243 00:16:44,300 --> 00:16:46,470 in the left subtrees of these less than x nodes 244 00:16:46,470 --> 00:16:48,644 will also be less than x. 245 00:16:48,644 --> 00:16:50,310 If you think about these other subtrees, 246 00:16:50,310 --> 00:16:51,726 they're going to be bigger than x. 247 00:16:51,726 --> 00:16:54,760 So we don't really care about them. 248 00:16:54,760 --> 00:17:02,420 So we just want to count up all these nodes and all 249 00:17:02,420 --> 00:17:03,050 of these nodes. 250 00:17:06,660 --> 00:17:08,845 So the algorithm to do that is pretty simple. 251 00:17:16,030 --> 00:17:20,000 We're just going to start out with-- 252 00:17:33,550 --> 00:17:37,512 I'm going to switch from this f notation to size. 253 00:17:37,512 --> 00:17:38,720 That's a little more natural. 254 00:17:38,720 --> 00:17:42,210 In general, you might have many functions. 255 00:17:42,210 --> 00:17:48,600 Size is the usual notation for subtree size. 256 00:17:48,600 --> 00:17:52,100 So we start out by counting up how many items are here. 
257 00:17:52,100 --> 00:17:54,690 And if we want to start at a rank of 1, 258 00:17:54,690 --> 00:17:56,460 if the min has rank 1, then I should also 259 00:17:56,460 --> 00:17:58,820 do plus 1 for x itself. 260 00:17:58,820 --> 00:18:02,490 If you wanted to start at zero you just omit that plus 1. 261 00:18:02,490 --> 00:18:09,940 And then, all I do is walk up from x to the root of the tree. 262 00:18:09,940 --> 00:18:22,340 And whenever we go left from, say x to x prime. 263 00:18:22,340 --> 00:18:25,160 So that means we have an x prime. 264 00:18:25,160 --> 00:18:27,150 Its right child is x. 265 00:18:27,150 --> 00:18:31,280 And so when we went from x to its parent we went to the left. 266 00:18:31,280 --> 00:18:44,910 Then we say rank plus equals x prime.left.size 267 00:18:44,910 --> 00:18:48,620 plus 1 for x prime itself. 268 00:18:48,620 --> 00:18:52,074 And maybe x prime.left.size is zero. 269 00:18:52,074 --> 00:18:53,490 Maybe there's no nodes over there. 270 00:18:53,490 --> 00:18:57,650 But at the very least we have to count those nodes that 271 00:18:57,650 --> 00:18:59,290 are to the left of us. 272 00:18:59,290 --> 00:19:00,960 And if there's anything down here 273 00:19:00,960 --> 00:19:02,550 we add up all those things. 274 00:19:02,550 --> 00:19:04,310 So that lets us compute rank. 275 00:19:04,310 --> 00:19:05,840 How long does it take? 276 00:19:05,840 --> 00:19:09,770 Well, we're just walking up one path from a leaf 277 00:19:09,770 --> 00:19:12,590 to a root-- or not necessarily a leaf, but from some node x 278 00:19:12,590 --> 00:19:13,540 to the root. 279 00:19:13,540 --> 00:19:16,050 And as long as we're using a balanced structure 280 00:19:16,050 --> 00:19:18,340 like AVL trees. 281 00:19:18,340 --> 00:19:22,390 I guess I want binary here, so let's say AVL trees. 282 00:19:22,390 --> 00:19:25,190 Then this will take log n time. 
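[Editor's sketch, not part of the lecture.] The rank walk just described might look like the following, with ranks starting at 1. To keep the example short it uses a plain unbalanced BST insert that maintains `size`; that is an assumption made for brevity, so the log n bound only holds if a balanced tree such as an AVL tree is substituted. `Node`, `size`, `insert`, and `rank` are hypothetical names.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None
        self.size = 1  # number of nodes in the subtree rooted here


def size(node):
    """Subtree size, treating a missing child as 0."""
    return node.size if node is not None else 0


def insert(root, key):
    """Unbalanced BST insert that keeps sizes correct.

    Returns (root, new_node). A real implementation would rebalance.
    """
    new = Node(key)
    if root is None:
        return new, new
    cur = root
    while True:
        cur.size += 1  # the new node ends up somewhere below cur
        nxt = cur.left if key < cur.key else cur.right
        if nxt is None:
            break
        cur = nxt
    if key < cur.key:
        cur.left = new
    else:
        cur.right = new
    new.parent = cur
    return root, new


def rank(x):
    """1-based position of x.key in sorted order, walking up to the root."""
    r = size(x.left) + 1  # x's left subtree, plus x itself
    while x.parent is not None:
        if x.parent.right is x:
            # We went up and to the left: the parent and everything in its
            # left subtree are smaller than the original node.
            r += size(x.parent.left) + 1
        x = x.parent
    return r
```

For 0-based ranks, drop the initial `+ 1`, as noted in the lecture.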
283 00:19:25,190 --> 00:19:28,270 So I'm spending constant work per step, 284 00:19:28,270 --> 00:19:30,620 and there's log n steps. 285 00:19:30,620 --> 00:19:32,140 Clear? 286 00:19:32,140 --> 00:19:34,717 So that's good old rank. 287 00:19:34,717 --> 00:19:36,300 Easy to do once you have subtree size. 288 00:19:36,300 --> 00:19:37,650 Let's do select for fun. 289 00:19:54,980 --> 00:19:57,360 This may seem like review, but I drew out this picture 290 00:19:57,360 --> 00:20:00,110 explicitly because we're going to do it a lot today. 291 00:20:00,110 --> 00:20:02,436 We'll have pictures like this a bunch of times. 292 00:20:02,436 --> 00:20:03,810 Really helps to think about where 293 00:20:03,810 --> 00:20:05,750 the nodes are, which ones are less than x, 294 00:20:05,750 --> 00:20:08,970 which ones are greater than x. 295 00:20:08,970 --> 00:20:10,380 Let's do select first. 296 00:20:13,500 --> 00:20:19,375 This you may not have seen in 006. 297 00:20:19,375 --> 00:20:20,750 So we're going to do the reverse. 298 00:20:20,750 --> 00:20:25,020 We're going to start at the root and we're going to walk down. 299 00:20:25,020 --> 00:20:26,570 Sounds easy enough. 300 00:20:26,570 --> 00:20:29,784 But now walking down is kind of like doing a search 301 00:20:29,784 --> 00:20:31,950 but we don't have a key we're searching for, we have 302 00:20:31,950 --> 00:20:34,380 a rank we're searching for. 303 00:20:34,380 --> 00:20:37,420 So what is that rank? 304 00:20:40,410 --> 00:20:41,550 Rank is i. 305 00:20:41,550 --> 00:20:42,300 OK. 306 00:20:42,300 --> 00:20:44,910 So on the other hand, we have the node x. 307 00:20:44,910 --> 00:20:47,616 We'd like to know the rank of x and compare that to i. 308 00:20:47,616 --> 00:20:49,990 That will tell us whether we should go left, or go right, 309 00:20:49,990 --> 00:20:52,260 or whether we happen to find the item. 
310 00:20:52,260 --> 00:20:54,250 Now one possibility is we call rank 311 00:20:54,250 --> 00:20:56,620 of x to find the rank of x. 312 00:20:56,620 --> 00:21:02,190 But that's dangerous because I'm going to have a for loop here 313 00:21:02,190 --> 00:21:05,500 and it's going to take log n iterations. 314 00:21:05,500 --> 00:21:07,460 If at every iteration I'm computing rank 315 00:21:07,460 --> 00:21:11,310 of x, and rank costs log n, then overall cost 316 00:21:11,310 --> 00:21:13,750 might be log squared n. 317 00:21:13,750 --> 00:21:17,510 So I can't afford to-- I want to know what the rank of x 318 00:21:17,510 --> 00:21:22,470 is but I can't afford to say rank, open paren, x. 319 00:21:22,470 --> 00:21:25,110 Because that recursive call will be too expensive. 320 00:21:25,110 --> 00:21:27,040 So what is the rank of x in this case? 321 00:21:27,040 --> 00:21:30,290 This is a little special. 322 00:21:30,290 --> 00:21:31,230 What's that? 323 00:21:31,230 --> 00:21:32,947 AUDIENCE: Number of left children plus 1. 324 00:21:32,947 --> 00:21:34,530 PROFESSOR: Number of left, or the size 325 00:21:34,530 --> 00:21:35,980 of the left subtree plus 1. 326 00:21:35,980 --> 00:21:36,480 Yep. 327 00:21:41,200 --> 00:21:44,900 Plus 1 if we're counting, starting at one. 328 00:21:44,900 --> 00:21:45,400 Very good. 329 00:21:48,529 --> 00:21:51,600 I'm slowly getting better. 330 00:21:51,600 --> 00:21:52,990 Didn't hit anyone this time. 331 00:21:52,990 --> 00:21:53,490 OK. 332 00:21:53,490 --> 00:21:55,680 So at least for the root, this is the rank, 333 00:21:55,680 --> 00:21:58,160 and that only takes us constant time in the special case. 334 00:21:58,160 --> 00:21:59,320 So we'll have to check that it still 335 00:21:59,320 --> 00:22:00,580 holds after I do the loop. 336 00:22:00,580 --> 00:22:02,270 But it will. 337 00:22:02,270 --> 00:22:02,920 So, cool. 338 00:22:02,920 --> 00:22:05,040 Now there are three cases. 339 00:22:05,040 --> 00:22:06,569 If i equals rank. 
340 00:22:06,569 --> 00:22:08,360 If the rank we're searching for is the rank 341 00:22:08,360 --> 00:22:10,318 that we happen to have, then we're done, right? 342 00:22:10,318 --> 00:22:13,000 We just return x. 343 00:22:13,000 --> 00:22:14,900 That's the easy case. 344 00:22:14,900 --> 00:22:18,510 More likely is that i will be either less than or greater 345 00:22:18,510 --> 00:22:20,790 than the rank of x. 346 00:22:28,710 --> 00:22:29,600 OK. 347 00:22:29,600 --> 00:22:33,970 So if i is less than the rank, this is fairly easy. 348 00:22:33,970 --> 00:22:36,490 We just say x equals x.left. 349 00:22:40,756 --> 00:22:41,630 Did I get that right? 350 00:22:41,630 --> 00:22:43,460 Yep. 351 00:22:43,460 --> 00:22:44,710 In this case, the rank. 352 00:22:44,710 --> 00:22:46,090 So here we have x. 353 00:22:46,090 --> 00:22:48,120 It's at rank, rank. 354 00:22:48,120 --> 00:22:52,476 And then we have the left subtree and the right subtree. 355 00:22:52,476 --> 00:22:53,850 And so if the rank we're searching 356 00:22:53,850 --> 00:22:56,750 for is less than rank, that means we know it's in here. 357 00:22:56,750 --> 00:22:57,900 So we should go left. 358 00:22:57,900 --> 00:23:01,230 And if we just said x equals x.left you might ask, 359 00:23:01,230 --> 00:23:03,420 well what rank are we searching for in here? 360 00:23:03,420 --> 00:23:06,560 Well, exactly the same rank. 361 00:23:06,560 --> 00:23:07,370 Fine. 362 00:23:07,370 --> 00:23:09,160 That's the easy case. 363 00:23:09,160 --> 00:23:11,520 In the other situation, if we're searching in here, 364 00:23:11,520 --> 00:23:15,090 we're searching for rank greater than rank. 365 00:23:15,090 --> 00:23:19,110 Then I want to go right but the new rank 366 00:23:19,110 --> 00:23:23,750 that I'm searching for is local to this subtree. 367 00:23:23,750 --> 00:23:29,190 I'm searching for i minus this stuff. 368 00:23:29,190 --> 00:23:31,380 This stuff is rank. 
369 00:23:31,380 --> 00:23:37,290 So I'm going to let i be i minus rank. 370 00:23:37,290 --> 00:23:39,200 Make sure I don't have any off by 1 errors. 371 00:23:39,200 --> 00:23:42,541 That seems to be right. 372 00:23:42,541 --> 00:23:43,040 OK. 373 00:23:43,040 --> 00:23:44,110 And then I do a loop. 374 00:23:44,110 --> 00:23:45,140 So I'll write repeat. 375 00:23:50,570 --> 00:23:53,860 So then I'm going to go up here and say, OK. 376 00:23:53,860 --> 00:23:56,150 Now relative to this thing. 377 00:23:56,150 --> 00:23:59,480 What is the rank of the root of this subtree? 378 00:23:59,480 --> 00:24:04,020 Well, it's again going to be that node .left.size plus 1. 379 00:24:04,020 --> 00:24:07,410 And now I have the new rank I'm searching for, i. 380 00:24:07,410 --> 00:24:08,610 And I just keep going. 381 00:24:08,610 --> 00:24:12,040 You could write this recursively if you like, but here's 382 00:24:12,040 --> 00:24:14,620 an iterative version. 383 00:24:14,620 --> 00:24:18,230 So it's actually very similar to the select algorithm 384 00:24:18,230 --> 00:24:22,370 that we had, like when we did deterministic linear time 385 00:24:22,370 --> 00:24:25,550 median finding or randomized median finding. 386 00:24:25,550 --> 00:24:29,140 They had a very similar kind of recursion. 387 00:24:29,140 --> 00:24:31,150 But in that case, they were spending linear time 388 00:24:31,150 --> 00:24:33,970 to do the partition and that was expensive. 389 00:24:33,970 --> 00:24:36,540 Here, we're just spending constant time at each node 390 00:24:36,540 --> 00:24:39,520 and so the overall cost is log n. 391 00:24:39,520 --> 00:24:40,280 So that's nice. 392 00:24:40,280 --> 00:24:41,440 Any questions about that? 393 00:24:44,480 --> 00:24:45,570 OK. 394 00:24:45,570 --> 00:24:47,330 I have a note here. 395 00:24:47,330 --> 00:24:51,840 Subtree size is obvious once you know that's what you should do. 
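[Editor's sketch, not part of the lecture.] The iterative select above can be sketched like this, with 1-based ranks. The `build` helper, which constructs a balanced tree from a sorted list, is a hypothetical stand-in for a real AVL tree and exists only to make the example runnable.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)


def size(node):
    """Subtree size, treating a missing child as 0."""
    return node.size if node is not None else 0


def build(keys):
    """Build a balanced BST from an already sorted list (illustration only)."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))


def select(root, i):
    """Return the node of rank i (1-based), or None if i is out of range."""
    x = root
    while x is not None:
        r = size(x.left) + 1  # rank of x within its own subtree, O(1)
        if i == r:
            return x
        if i < r:
            x = x.left        # same rank, smaller subtree
        else:
            i -= r            # skip x and its whole left subtree
            x = x.right
    return None
```

The key point is that `r` is always a rank local to the current subtree, so it never requires a call to a separate log n rank routine.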
396 00:24:51,840 --> 00:24:53,560 Another natural thing to try to do 397 00:24:53,560 --> 00:24:55,910 would be to augment, for each node, 398 00:24:55,910 --> 00:24:57,370 what is the rank of that node? 399 00:24:57,370 --> 00:24:59,820 Because then rank is really easy to find. 400 00:24:59,820 --> 00:25:02,200 And then select would basically be a regular search. 401 00:25:02,200 --> 00:25:03,827 I just look at the rank of the root, 402 00:25:03,827 --> 00:25:05,410 I see whether the rank I'm looking for 403 00:25:05,410 --> 00:25:08,380 is too big, or too small, and I go left or right, accordingly. 404 00:25:08,380 --> 00:25:11,890 What would be bad about augmenting with rank of a node? 405 00:25:14,430 --> 00:25:15,400 Updates. 406 00:25:15,400 --> 00:25:15,900 Why? 407 00:25:19,300 --> 00:25:22,395 What's a bad example for an update? 408 00:25:22,395 --> 00:25:25,132 AUDIENCE: If you add a new minimum element. 409 00:25:25,132 --> 00:25:25,840 PROFESSOR: Right. 410 00:25:25,840 --> 00:25:27,650 Say we insert a new minimum element. 411 00:25:31,810 --> 00:25:33,810 Good catch, cameraman. 412 00:25:33,810 --> 00:25:36,510 That was for the camera, obviously. 413 00:25:36,510 --> 00:25:37,610 So, right. 414 00:25:37,610 --> 00:25:41,560 If we insert, this is off to the side, but say we insert, 415 00:25:41,560 --> 00:25:43,600 I'll call it minus infinity. 416 00:25:43,600 --> 00:25:46,170 A new key that is smaller than all other keys, 417 00:25:46,170 --> 00:25:49,140 then the rank of every node changes. 418 00:25:49,140 --> 00:25:53,280 So that's bad. 419 00:25:53,280 --> 00:25:55,695 It means that easy tree augmentation, in particular, 420 00:25:55,695 --> 00:25:56,570 isn't going to apply. 421 00:25:56,570 --> 00:25:59,807 And furthermore, it would take linear time to do this. 
422 00:25:59,807 --> 00:26:02,390 And you could keep inserting, if you insert keys in decreasing 423 00:26:02,390 --> 00:26:04,850 order from there, every time you do an insert, 424 00:26:04,850 --> 00:26:06,730 all the ranks increase by one. 425 00:26:06,730 --> 00:26:09,360 Maintaining that's going to cost linear time per update. 426 00:26:09,360 --> 00:26:14,230 So you have to be really careful that the function you 427 00:26:14,230 --> 00:26:16,630 want to store actually can be maintained. 428 00:26:16,630 --> 00:26:20,179 Be very careful about that, say, on the quiz coming up, 429 00:26:20,179 --> 00:26:21,720 that when you're augmenting something 430 00:26:21,720 --> 00:26:23,280 you can actually maintain it. 431 00:26:23,280 --> 00:26:27,010 For example, it's very hard to maintain the depths of nodes 432 00:26:27,010 --> 00:26:31,410 because when you do a rotation a whole lot of depths change. 433 00:26:31,410 --> 00:26:32,770 Depth is counting from the root. 434 00:26:32,770 --> 00:26:34,430 How deep am I? 435 00:26:34,430 --> 00:26:36,660 When I do a rotation then this entire subtree 436 00:26:36,660 --> 00:26:37,560 went down by one. 437 00:26:37,560 --> 00:26:40,830 This entire subtree went up by one. 438 00:26:40,830 --> 00:26:42,100 In this picture. 439 00:26:42,100 --> 00:26:44,820 But it's very easy to maintain heights, for example. 440 00:26:44,820 --> 00:26:46,590 Height counting from the bottom is OK, 441 00:26:46,590 --> 00:26:49,740 because I don't affect the height of a, b, and c. 442 00:26:49,740 --> 00:26:52,090 I affect it for x and y but that's just two nodes. 443 00:26:52,090 --> 00:26:53,890 That I can afford. 444 00:26:53,890 --> 00:26:56,984 So that's what you want to be careful of in the easy tree 445 00:26:56,984 --> 00:26:57,525 augmentation. 446 00:27:00,470 --> 00:27:03,960 So most of the time easy tree augmentation does the job. 
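[Editor's sketch, not part of the lecture.] The contrast between depth and height under a rotation can be checked on a small example. This is a minimal sketch with hypothetical names (`Node`, `height`, `update_height`, `depths`, `rotate_right`); parent pointers are omitted, and the rotation direction is an assumption.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        update_height(self)


def height(node):
    """Stored height, with an empty subtree counted as -1."""
    return node.height if node is not None else -1


def update_height(x):
    """Height is computable in O(1) from the children's stored heights."""
    x.height = 1 + max(height(x.left), height(x.right))


def depths(root, d=0, out=None):
    """Map key -> depth by a full traversal (depth is not stored anywhere)."""
    out = {} if out is None else out
    if root is not None:
        out[root.key] = d
        depths(root.left, d + 1, out)
        depths(root.right, d + 1, out)
    return out


def rotate_right(y):
    """Rotate right at y; only y and x need their heights recomputed."""
    x = y.left
    y.left, x.right = x.right, y
    update_height(y)  # y is now below x, so update it first
    update_height(x)
    return x
```

In a run on a small tree, every node in subtrees A and C changes depth after the rotation, while the stored heights inside A, B, and C are untouched. That is why height fits the easy tree augmentation pattern and depth does not.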
447 00:27:03,960 --> 00:27:07,340 But in the remaining two examples, 448 00:27:07,340 --> 00:27:10,747 I want to show you cooler examples of augmentation. 449 00:27:10,747 --> 00:27:12,330 These are things you probably wouldn't 450 00:27:12,330 --> 00:27:17,780 be expected to come up with on your own, but they're cool. 451 00:27:17,780 --> 00:27:20,055 And they let us do more sophisticated operations. 452 00:27:27,970 --> 00:27:31,220 So the first one is called level linking. 453 00:27:36,490 --> 00:27:41,290 And here we're going to do it in the context of 2-3 trees, 454 00:27:41,290 --> 00:27:42,255 partly for variety. 455 00:27:45,690 --> 00:27:49,410 So the idea of level linking is very simple. 456 00:27:49,410 --> 00:27:50,450 Let me draw a 2-3 tree. 457 00:28:03,340 --> 00:28:05,040 Not a very impressive 2-3 tree. 458 00:28:05,040 --> 00:28:08,030 I guess I don't feel like drawing too much. 459 00:28:08,030 --> 00:28:10,710 Level linking is the idea of, in addition to 460 00:28:10,710 --> 00:28:13,540 these child and parent pointers, we're 461 00:28:13,540 --> 00:28:15,815 going to add links on all the levels. 462 00:28:19,557 --> 00:28:21,140 Horizontal links, you might call them. 463 00:28:54,180 --> 00:28:54,820 OK. 464 00:28:54,820 --> 00:28:57,370 So that's nice. 465 00:28:57,370 --> 00:28:59,350 Two questions-- can we do this? 466 00:28:59,350 --> 00:29:00,710 And what's it good for? 467 00:29:00,710 --> 00:29:03,050 So let's start with can we do this. 468 00:29:03,050 --> 00:29:05,420 Remember in 2-3 trees all we have to think about 469 00:29:05,420 --> 00:29:07,280 are splits and merges. 470 00:29:07,280 --> 00:29:12,570 So in a split, we have, for a brief period, 471 00:29:12,570 --> 00:29:14,950 let's say three keys, four children. 472 00:29:14,950 --> 00:29:17,610 That's too many. 473 00:29:17,610 --> 00:29:19,130 So we change that to-- 474 00:29:26,524 --> 00:29:28,512 I'm going to change this in a moment. 
475 00:29:28,512 --> 00:29:31,494 For now, this is the split you know and love, maybe. 476 00:29:31,494 --> 00:29:32,500 At least know. 477 00:29:32,500 --> 00:29:35,610 And if we think about where the level-link pointers are, 478 00:29:35,610 --> 00:29:38,910 we have one before. 479 00:29:38,910 --> 00:29:42,330 And then we just need to distribute those pointers 480 00:29:42,330 --> 00:29:44,540 to the two resulting nodes. 481 00:29:44,540 --> 00:29:47,980 And then we have to create a new pointer between the nodes 482 00:29:47,980 --> 00:29:49,100 that we just created. 483 00:29:49,100 --> 00:29:50,430 This is, of course, easy to do. 484 00:29:50,430 --> 00:29:50,890 We're here. 485 00:29:50,890 --> 00:29:51,848 We're taking this node. 486 00:29:51,848 --> 00:29:53,870 We're splitting it in half. 487 00:29:53,870 --> 00:29:55,860 So we have the nodes right in our hands so just 488 00:29:55,860 --> 00:29:57,710 add pointers between them. 489 00:29:57,710 --> 00:30:00,840 And the key thing is, there's some node over here on the left. 490 00:30:00,840 --> 00:30:02,360 It used to point to this node, now 491 00:30:02,360 --> 00:30:04,760 we have to change it to point to the left version. 492 00:30:04,760 --> 00:30:06,030 The left half of the node. 493 00:30:06,030 --> 00:30:07,696 And there's some node over on the right. 494 00:30:07,696 --> 00:30:10,140 We have to change its left pointer to point 495 00:30:10,140 --> 00:30:13,897 to this right half of the node. 496 00:30:13,897 --> 00:30:14,480 But that's it. 497 00:30:14,480 --> 00:30:16,670 Constant time. 498 00:30:16,670 --> 00:30:20,010 So this doesn't fall under the category of easy tree 499 00:30:20,010 --> 00:30:23,870 augmentation because this is not isolated to the subtree. 500 00:30:23,870 --> 00:30:27,380 We're also dealing with its left and right subtrees. 501 00:30:27,380 --> 00:30:30,450 But still easy to do in constant time. 502 00:30:36,599 --> 00:30:38,140 Merging nodes is going to be similar.
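[Editor's sketch] The pointer updates described for a split can be sketched in Python as follows. This is a simplification under stated assumptions: it models only a node's list of children and its two horizontal links, and leaves out keys and the promotion of the middle key to the parent. The names LevelNode, link, and split_with_level_links are hypothetical.

```python
class LevelNode:
    """One node on a level; only the fields needed for this sketch."""
    def __init__(self, children):
        self.children = list(children)
        self.level_left = None
        self.level_right = None

def link(a, b):
    """Make a and b horizontal neighbors (either one may be None)."""
    if a is not None:
        a.level_right = b
    if b is not None:
        b.level_left = a

def split_with_level_links(node):
    """Split an overfull node in half and patch the level links."""
    half = len(node.children) // 2
    left = LevelNode(node.children[:half])
    right = LevelNode(node.children[half:])
    link(node.level_left, left)    # old left neighbor now points at the left half
    link(left, right)              # new pointer between the two halves
    link(right, node.level_right)  # old right neighbor now points at the right half
    return left, right

# Three neighbors on one level; the middle one has four children.
west, middle, east = LevelNode("ab"), LevelNode("cdef"), LevelNode("gh")
link(west, middle)
link(middle, east)
left, right = split_with_level_links(middle)
```

Only a fixed number of pointers are written, regardless of the size of the tree, which is the constant-time claim made above.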
503 00:30:48,130 --> 00:30:52,870 If we steal a node from our parents or former sibling, 504 00:30:52,870 --> 00:30:55,380 nothing happens in terms of level links. 505 00:30:55,380 --> 00:31:00,040 But if we have, say, an empty node and a node that cannot 506 00:31:00,040 --> 00:31:01,290 afford any stealing. 507 00:31:01,290 --> 00:31:03,880 So we have single child here, two children, 508 00:31:03,880 --> 00:31:05,030 and we merge it into-- 509 00:31:09,840 --> 00:31:12,140 We're taking something from our parent. 510 00:31:12,140 --> 00:31:13,310 Bringing it down. 511 00:31:13,310 --> 00:31:15,550 Then we have three children afterwards. 512 00:31:15,550 --> 00:31:20,070 Again, we used to have these level pointers. 513 00:31:20,070 --> 00:31:22,100 Now we just have these level pointers. 514 00:31:22,100 --> 00:31:23,255 It's easy to maintain. 515 00:31:23,255 --> 00:31:24,880 It's just a constant size neighborhood. 516 00:31:24,880 --> 00:31:26,910 Because we have the level links, we 517 00:31:26,910 --> 00:31:28,940 can get to our left and right neighbors 518 00:31:28,940 --> 00:31:31,200 and change where the links point to. 519 00:31:31,200 --> 00:31:34,730 So easy to maintain in constant time. 520 00:31:43,390 --> 00:31:45,690 I'll call it constant overhead. 521 00:31:45,690 --> 00:31:47,300 Every time we do a split or merge 522 00:31:47,300 --> 00:31:50,339 we spend additional constant time to do it. 523 00:31:50,339 --> 00:31:51,880 We're already spending constant time. 524 00:31:51,880 --> 00:31:56,410 So just changes everything by constant factor. 525 00:31:56,410 --> 00:31:58,440 So far, so good. 526 00:31:58,440 --> 00:32:01,600 Now, I'm going to have to tweak this data 527 00:32:01,600 --> 00:32:02,560 structure a little bit. 528 00:32:02,560 --> 00:32:03,810 But let me first tell you why. 529 00:32:03,810 --> 00:32:06,030 What am I trying to achieve with this data structure? 
530 00:32:16,680 --> 00:32:20,510 What I'm trying to achieve is something called the finger 531 00:32:20,510 --> 00:32:21,270 search property. 532 00:32:34,164 --> 00:32:35,580 So let's just think about the case 533 00:32:35,580 --> 00:32:38,220 where I'm doing a successful search. 534 00:32:38,220 --> 00:32:41,840 I'm searching for key x and I find it in the data structure. 535 00:32:41,840 --> 00:32:43,130 I find it in the tree. 536 00:32:45,890 --> 00:32:49,890 Suppose I found one-- I search for x, I found it. 537 00:32:49,890 --> 00:32:52,089 And then I search for another key y. 538 00:32:52,089 --> 00:32:53,630 Actually I think I'll do the reverse. 539 00:32:53,630 --> 00:32:56,460 First I found y, now I'm searching for x. 540 00:32:56,460 --> 00:32:59,380 If x and y are nearby in the tree, 541 00:32:59,380 --> 00:33:02,440 I want this to run especially fast. 542 00:33:02,440 --> 00:33:05,090 For example, if x is the successor of y 543 00:33:05,090 --> 00:33:07,790 I want this to take constant time. 544 00:33:07,790 --> 00:33:10,350 That would be nice. 545 00:33:10,350 --> 00:33:12,880 In the worst case x and y are very far away from me 546 00:33:12,880 --> 00:33:16,150 in the tree then I want it to take log n time. 547 00:33:16,150 --> 00:33:19,570 So how could I interpolate between constant time 548 00:33:19,570 --> 00:33:22,900 for finding the successor and log n time 549 00:33:22,900 --> 00:33:28,080 for finding the worst case search. 550 00:33:28,080 --> 00:33:35,040 So I'm going to call this search of x from y. 551 00:33:35,040 --> 00:33:37,940 Meaning, this is a little imprecise, but what 552 00:33:37,940 --> 00:33:41,720 I mean is when I call search, I tell it 553 00:33:41,720 --> 00:33:43,330 where I've already found y. 554 00:33:43,330 --> 00:33:44,160 And here it is. 555 00:33:44,160 --> 00:33:46,690 Here's the node storing y. 556 00:33:46,690 --> 00:33:49,160 And now I'm given a key x. 
557 00:33:49,160 --> 00:33:51,780 And I want to find that key x given 558 00:33:51,780 --> 00:33:54,600 the node that stores key y. 559 00:33:54,600 --> 00:33:57,835 So how long should this take? 560 00:33:57,835 --> 00:33:59,210 What would be a good way to interpolate 561 00:33:59,210 --> 00:34:01,420 between constant time at one extreme. 562 00:34:01,420 --> 00:34:03,570 The good case, when x and y are basically 563 00:34:03,570 --> 00:34:08,620 neighbors in sorted order, versus 564 00:34:08,620 --> 00:34:12,535 log n time, in the worst case. 565 00:34:12,535 --> 00:34:14,120 AUDIENCE: Distance along the graph. 566 00:34:14,120 --> 00:34:15,620 PROFESSOR: Distance along the graph. 567 00:34:15,620 --> 00:34:18,600 That would be one reasonable definition. 568 00:34:18,600 --> 00:34:22,050 So I have a tree which you could think of as a graph. 569 00:34:22,050 --> 00:34:25,920 Measure the shortest path length from x to y. 570 00:34:25,920 --> 00:34:29,340 Or we have a more sophisticated graph over here. 571 00:34:29,340 --> 00:34:30,717 Maybe that length. 572 00:34:30,717 --> 00:34:32,800 The trouble with the distance in the graph, that's 573 00:34:32,800 --> 00:34:35,400 a reasonable suggestion, but it's very data structure 574 00:34:35,400 --> 00:34:36,090 specific. 575 00:34:36,090 --> 00:34:38,840 If I use an AVL tree without level links, 576 00:34:38,840 --> 00:34:42,230 then the distance could be one thing, 577 00:34:42,230 --> 00:34:45,886 whereas if I use a 2-3 tree, even without level links, 578 00:34:45,886 --> 00:34:47,469 it's going to be a different distance. 579 00:34:47,469 --> 00:34:49,397 If I use a 2-3 tree with level links 580 00:34:49,397 --> 00:34:50,980 it's going to be yet another distance. 581 00:34:50,980 --> 00:34:53,350 So that's a little unsatisfying. 582 00:34:53,350 --> 00:34:56,236 I want this to be an answer to a question.
583 00:34:56,236 --> 00:34:58,610 I don't want to phrase the question in terms of that data 584 00:34:58,610 --> 00:34:59,464 structure. 585 00:34:59,464 --> 00:35:01,380 AUDIENCE: Difference between ranks of x and y? 586 00:35:01,380 --> 00:35:03,720 PROFESSOR: Difference between ranks of x and y. 587 00:35:03,720 --> 00:35:04,695 That's close. 588 00:35:09,830 --> 00:35:12,580 So I'm going to look at the rank of x and rank of y. 589 00:35:12,580 --> 00:35:15,010 Let's say, take the absolute difference. 590 00:35:15,010 --> 00:35:18,390 That's kind of how far away they are in sorted order. 591 00:35:18,390 --> 00:35:20,506 Do you want to add anything? 592 00:35:20,506 --> 00:35:21,350 AUDIENCE: Log? 593 00:35:21,350 --> 00:35:21,705 PROFESSOR: Log. 594 00:35:21,705 --> 00:35:22,205 Yeah. 595 00:35:24,660 --> 00:35:26,990 Because in the worst case the difference in ranks 596 00:35:26,990 --> 00:35:28,400 could be linear. 597 00:35:28,400 --> 00:35:31,950 So I want to add a log out here to get log n in that worst 598 00:35:31,950 --> 00:35:32,450 case. 599 00:35:35,120 --> 00:35:37,660 Add a big o for safety. 600 00:35:37,660 --> 00:35:39,569 That's how much time we want to achieve. 601 00:35:39,569 --> 00:35:41,360 So this would be the finger search property 602 00:35:41,360 --> 00:35:44,850 that you can solve this problem in this much time. 603 00:35:44,850 --> 00:35:47,820 Again, difference in ranks is at most n. 604 00:35:47,820 --> 00:35:49,670 So this is at most log n. 605 00:35:49,670 --> 00:35:54,090 But if y is the successor of x this will only be constant 606 00:35:54,090 --> 00:35:56,360 and this will be constant. 607 00:35:56,360 --> 00:35:59,050 So this is great if you're doing lots of searches 608 00:35:59,050 --> 00:36:01,909 and you tend to search for things that are nearby, 609 00:36:01,909 --> 00:36:03,950 but sometimes you search for things that are far away. 610 00:36:03,950 --> 00:36:06,420 This gives you a nice bound.
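[Editor's note] Written out, the bound built up on the board reads as follows. The notation T for running time and rank for position in sorted order is an assumption made for this note, not the lecture's own.

```latex
\[
  T\bigl(\mathrm{search}(x \text{ from } y)\bigr)
  = O\bigl(\log \lvert \mathrm{rank}(x) - \mathrm{rank}(y) \rvert\bigr)
\]
```

Two sanity checks follow directly. The rank difference is less than n, so the bound is never worse than O(log n). When x and y are adjacent in sorted order the difference is 1 and the logarithm is 0, so to keep the bound from reading as zero one could write it as O(1 + log |rank(x) - rank(y)|), which is constant in that case.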
611 00:36:12,100 --> 00:36:15,270 On the one hand, we have, this is our goal. 612 00:36:15,270 --> 00:36:16,470 Log difference of ranks. 613 00:36:16,470 --> 00:36:18,200 On the other hand, we have the suggestion 614 00:36:18,200 --> 00:36:20,070 that what we can achieve is something 615 00:36:20,070 --> 00:36:21,900 like the distance in the graph. 616 00:36:25,080 --> 00:36:26,730 But we have a problem with this. 617 00:36:26,730 --> 00:36:28,650 I used to think that data structure solved this problem, 618 00:36:28,650 --> 00:36:29,275 but it doesn't. 619 00:36:34,620 --> 00:36:37,918 Let me just draw-- actually I have a tree right there. 620 00:36:37,918 --> 00:36:39,001 I'm going to use that one. 621 00:36:44,900 --> 00:36:51,530 Suppose x is here and y is here. 622 00:36:51,530 --> 00:36:52,030 OK. 623 00:36:52,030 --> 00:36:55,790 This is a bit of a small tree but if you think about it 624 00:36:55,790 --> 00:37:00,670 long enough, this node is the predecessor of this node. 625 00:37:00,670 --> 00:37:03,140 So their difference in ranks should be 1. 626 00:37:06,290 --> 00:37:09,297 But the distance in the graph here is two. 627 00:37:09,297 --> 00:37:10,130 Not very impressive. 628 00:37:10,130 --> 00:37:13,810 But in general, you have a tree of height log n. 629 00:37:13,810 --> 00:37:18,495 If you look at the root, and the predecessor of the root, 630 00:37:18,495 --> 00:37:20,120 they will have a rank difference of one 631 00:37:20,120 --> 00:37:22,110 by definition of predecessor. 632 00:37:22,110 --> 00:37:25,561 But the graph distance will be log n. 633 00:37:25,561 --> 00:37:28,060 So that's bad news, because if we're only following pointers 634 00:37:28,060 --> 00:37:31,980 there's no way to get from here to there in constant time. 635 00:37:31,980 --> 00:37:35,340 So we're not quite there. 636 00:37:35,340 --> 00:37:43,360 We're going to use another tweak to that data structure, which is 637 00:37:43,360 --> 00:37:44,910 store the data in the leaves.
638 00:37:52,529 --> 00:37:54,820 Tried to find a data structure that didn't require this 639 00:37:54,820 --> 00:37:56,000 and still got finger search. 640 00:37:56,000 --> 00:37:58,000 But as far as I know, there is none. 641 00:37:58,000 --> 00:37:58,980 No such data structure. 642 00:38:01,610 --> 00:38:04,427 If you look at, say, Wikipedia about B-trees, 643 00:38:04,427 --> 00:38:06,510 you'll see there's a ton of variations of B-trees. 644 00:38:06,510 --> 00:38:08,230 B+-trees, B*-trees. 645 00:38:08,230 --> 00:38:09,462 This is one of those. 646 00:38:09,462 --> 00:38:10,170 I think B+-trees. 647 00:38:12,990 --> 00:38:15,840 As you saw, B-trees or 2-3 trees, every node 648 00:38:15,840 --> 00:38:19,050 stored one or two keys. 649 00:38:19,050 --> 00:38:23,080 And each key only existed in one spot. 650 00:38:23,080 --> 00:38:26,640 We're still only going to put each key in one spot, kind of. 651 00:38:26,640 --> 00:38:30,030 But it's only going to be the leaf spots. 652 00:38:30,030 --> 00:38:30,530 OK. 653 00:38:30,530 --> 00:38:32,411 Good news is most nodes are leaves, right? 654 00:38:32,411 --> 00:38:34,660 Constant fraction of the nodes are going to be leaves. 655 00:38:34,660 --> 00:38:38,720 So it doesn't change too much from a space efficiency 656 00:38:38,720 --> 00:38:39,480 standpoint. 657 00:38:39,480 --> 00:38:41,970 If we just put data down here and don't put-- 658 00:38:41,970 --> 00:38:44,050 I'm not going to put any keys up here for now. 659 00:38:47,690 --> 00:38:51,080 So this a little weird. 660 00:38:51,080 --> 00:38:54,036 Let me draw an example of such a tree. 661 00:38:54,036 --> 00:39:06,180 So maybe we have 2, and 5, and 7, and 8, 9, let's say. 662 00:39:09,410 --> 00:39:11,040 Let's put 1 here. 663 00:39:11,040 --> 00:39:14,590 So I'm going to have a node here with three children, a node 664 00:39:14,590 --> 00:39:16,600 here with two children, and here's 665 00:39:16,600 --> 00:39:17,860 a node with two children. 
666 00:39:17,860 --> 00:39:22,120 So I think this mimics this tree, roughly. 667 00:39:22,120 --> 00:39:23,620 I got it exactly right. 668 00:39:23,620 --> 00:39:26,120 So here I've taken this tree structure. 669 00:39:26,120 --> 00:39:27,330 I've redrawn it. 670 00:39:27,330 --> 00:39:30,197 There's now no keys in these nodes. 671 00:39:30,197 --> 00:39:32,030 But everything else is going to be the same. 672 00:39:32,030 --> 00:39:34,400 Every node is going to have 0 children 673 00:39:34,400 --> 00:39:37,999 if it's a leaf, or two, or three children otherwise. 674 00:39:37,999 --> 00:39:39,540 Never have one child because then you 675 00:39:39,540 --> 00:39:40,924 wouldn't get logarithmic depth. 676 00:39:40,924 --> 00:39:42,965 All the leaves are going to be at the same depth. 677 00:39:46,380 --> 00:39:47,980 And that's it. 678 00:39:47,980 --> 00:39:48,480 OK. 679 00:39:48,480 --> 00:39:52,410 That is a 2-3 tree with the data stored in the leaves. 680 00:39:52,410 --> 00:39:54,470 It's a useful trick to know. 681 00:39:54,470 --> 00:39:56,600 Now we're going to do a level linked 2-3 tree. 682 00:39:56,600 --> 00:39:58,620 So in addition to that picture, we're 683 00:39:58,620 --> 00:40:00,910 going to have links like this. 684 00:40:06,455 --> 00:40:06,955 OK. 685 00:40:06,955 --> 00:40:09,540 And I should check that I can still do insert and delete 686 00:40:09,540 --> 00:40:10,570 into these structures. 687 00:40:10,570 --> 00:40:12,960 It's actually not too hard. 688 00:40:12,960 --> 00:40:14,070 But let's think about it. 689 00:40:30,760 --> 00:40:32,730 I think, actually, it might be easier. 690 00:40:32,730 --> 00:40:33,600 Let's see. 691 00:40:36,640 --> 00:40:43,562 So if I want to do an insert-- OK. 692 00:40:43,562 --> 00:40:45,520 I have to first search for where I'm inserting. 693 00:40:45,520 --> 00:40:48,600 I haven't told you how to do search yet. 694 00:40:48,600 --> 00:40:49,150 OK. 
695 00:40:49,150 --> 00:40:51,230 So let's first think about search. 696 00:40:55,780 --> 00:41:01,380 What we're going to do is data structure augmentation. 697 00:41:01,380 --> 00:41:04,880 We have simple tree augmentation. 698 00:41:04,880 --> 00:41:08,080 So I'm going to do it at each node, 699 00:41:08,080 --> 00:41:09,730 and the functions I'm going to store 700 00:41:09,730 --> 00:41:12,060 are the minimum key in the subtree, 701 00:41:12,060 --> 00:41:13,710 and the maximum key in the subtree. 702 00:41:23,700 --> 00:41:26,072 There are many ways to do this, but I think 703 00:41:26,072 --> 00:41:27,280 this is kind of the simplest. 704 00:41:33,450 --> 00:41:36,420 So what that means is at this node, 705 00:41:36,420 --> 00:41:43,240 I'm going to store 1 as the min and 7 as the max. 706 00:41:43,240 --> 00:41:46,400 And at this node it's going to be 1 as the min 707 00:41:46,400 --> 00:41:47,650 and 9 as the max. 708 00:41:47,650 --> 00:41:52,010 And here we have 8 as the min and 9 as the max. 709 00:41:52,010 --> 00:41:54,320 Again min and max of subtrees are easy to store. 710 00:41:54,320 --> 00:41:57,980 If I ever change a node I can update it 711 00:41:57,980 --> 00:41:59,790 based on its children, just by looking 712 00:41:59,790 --> 00:42:03,070 at the min of the leftmost child and the max 713 00:42:03,070 --> 00:42:04,690 of the rightmost child. 714 00:42:04,690 --> 00:42:06,980 If I didn't know 1 and 9, I could just 715 00:42:06,980 --> 00:42:09,300 look at this min and that max and that's 716 00:42:09,300 --> 00:42:11,870 going to be the min and the max of the overall tree. 717 00:42:11,870 --> 00:42:14,126 So in constant time I can update the min 718 00:42:14,126 --> 00:42:16,100 and the max of a node given the min 719 00:42:16,100 --> 00:42:18,680 and the max of its children. 720 00:42:18,680 --> 00:42:19,997 Special case is at the leaves. 721 00:42:19,997 --> 00:42:22,330 Then you have to actually look at keys and compare them.
722 00:42:22,330 --> 00:42:24,510 But leaves only have, at most, two keys. 723 00:42:24,510 --> 00:42:29,120 So pretty easy to compare them in constant time. 724 00:42:29,120 --> 00:42:29,620 OK. 725 00:42:29,620 --> 00:42:31,310 So that's how I do the augmentation. 726 00:42:31,310 --> 00:42:33,030 Now how do I do a search? 727 00:42:33,030 --> 00:42:37,405 Well, if I'm at a node and I'm searching for a key. 728 00:42:37,405 --> 00:42:38,780 Well, let's say I'm at this node. 729 00:42:38,780 --> 00:42:42,290 I'm searching for a key like 8. 730 00:42:42,290 --> 00:42:44,497 What I'm going to do is look at all of the children. 731 00:42:44,497 --> 00:42:45,580 In this case, there's two. 732 00:42:45,580 --> 00:42:47,280 In the worst case there's three. 733 00:42:47,280 --> 00:42:50,810 I look at the min and max and I see where does 8 fall? 734 00:42:50,810 --> 00:42:52,750 Well it falls in this interval. 735 00:42:52,750 --> 00:42:56,400 If I was searching for 7 1/2 I know it's not there. 736 00:42:56,400 --> 00:42:58,120 It's going to be in between here. 737 00:42:58,120 --> 00:43:03,420 If I'm doing a successor then I'll go to the right. 738 00:43:03,420 --> 00:43:05,510 If I'm doing predecessor I'll go to the left. 739 00:43:05,510 --> 00:43:08,860 And then take either the maximum item or the minimum item. 740 00:43:08,860 --> 00:43:10,734 If I'm searching for 8 I see, oh. 741 00:43:10,734 --> 00:43:12,400 8 falls in the interval between 8 and 9, 742 00:43:12,400 --> 00:43:13,970 so I should clearly take the right child 743 00:43:13,970 --> 00:43:15,040 among those two children. 744 00:43:15,040 --> 00:43:16,498 In general, there's three children. 745 00:43:16,498 --> 00:43:17,260 Three intervals. 746 00:43:17,260 --> 00:43:17,930 Constant time. 747 00:43:17,930 --> 00:43:20,830 I can find where my key falls in the interval. 748 00:43:20,830 --> 00:43:21,330 OK. 
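[Editor's sketch] The min/max augmentation and the downward search just described can be sketched in Python. The class and function names are hypothetical, and the example tree is one reading of the board drawing: leaves [1], [2, 5], [7] under one internal node and leaves [8], [9] under another. For a key that is absent, this sketch assumes the convention of descending toward the successor side.

```python
class Node:
    """2-3 tree node with data in the leaves, augmented with min and max."""
    def __init__(self, keys=(), children=()):
        self.keys = list(keys)          # non-empty only at leaves
        self.children = list(children)  # empty at leaves, else 2 or 3 nodes
        self.parent = None
        for child in self.children:
            child.parent = self
        self.update_min_max()

    def update_min_max(self):
        if self.children:
            # Internal node: constant time from the outermost children.
            self.min = self.children[0].min
            self.max = self.children[-1].max
        else:
            # Leaf: compare the one or two keys directly.
            self.min = min(self.keys)
            self.max = max(self.keys)

def search_down(v, x):
    """Return the leaf below v where x is stored, or where it would belong."""
    while v.children:
        # Take the first child whose max is at least x, else the last child.
        for child in v.children:
            if x <= child.max:
                break
        v = child
    return v

a = Node(children=[Node([1]), Node([2, 5]), Node([7])])
b = Node(children=[Node([8]), Node([9])])
root = Node(children=[a, b])
```

Searching for 8 from the root descends into the 8-to-9 subtree and stops at the leaf holding 8. Searching for 7 1/2 finds no interval containing it and, under the convention assumed here, also lands on the leaf holding 8, its successor.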
749 00:43:21,330 --> 00:43:25,200 So search is going to take log n time again, provided 750 00:43:25,200 --> 00:43:26,570 I have these mins and maxes. 751 00:43:29,075 --> 00:43:31,130 If you stare at it long enough, this 752 00:43:31,130 --> 00:43:34,960 is pretty much the same thing as regular search in a 2-3 tree. 753 00:43:34,960 --> 00:43:39,850 But I've put the data just one level down. 754 00:43:39,850 --> 00:43:40,350 OK. 755 00:43:43,180 --> 00:43:44,650 Good. 756 00:43:44,650 --> 00:43:46,430 That was regular search. 757 00:43:46,430 --> 00:43:49,149 I still need to do finger search, but we'll get there. 758 00:43:49,149 --> 00:43:51,190 And now, if I want to do an insert into this data 759 00:43:51,190 --> 00:43:54,330 structure, what happens? 760 00:43:54,330 --> 00:43:57,700 Well, I search for the key. Let's say I'm inserting 6. 761 00:43:57,700 --> 00:43:59,180 So maybe I go here. 762 00:43:59,180 --> 00:44:00,760 I say that because 6 763 00:44:00,760 --> 00:44:02,380 is in this interval. 764 00:44:02,380 --> 00:44:04,550 6 is in neither of these intervals. 765 00:44:04,550 --> 00:44:07,350 But it's closest to the interval 2, 5, or the interval 7. 766 00:44:07,350 --> 00:44:09,910 Let's say I go down to 2, 5. 767 00:44:09,910 --> 00:44:13,300 And well, to insert 6 I'll just add a 6 on there. 768 00:44:13,300 --> 00:44:15,410 Of course, now that node is too big. 769 00:44:15,410 --> 00:44:18,840 So there's still going to be a split case at the leaves where 770 00:44:18,840 --> 00:44:24,460 I have let's say, a,b,c, too many keys. 771 00:44:24,460 --> 00:44:29,140 I'm going to split that into a,b and c. 772 00:44:29,140 --> 00:44:30,850 This is different from before. 773 00:44:30,850 --> 00:44:33,800 It used to be I would promote b to the parent 774 00:44:33,800 --> 00:44:36,030 because the parent needed the key there. 775 00:44:36,030 --> 00:44:37,810 Now parents don't have keys.
776 00:44:37,810 --> 00:44:42,000 So I'm just going to split this thing, roughly, in half. 777 00:44:42,000 --> 00:44:44,510 It works. 778 00:44:44,510 --> 00:44:47,220 It's still the case that whoever was the parent up 779 00:44:47,220 --> 00:44:50,460 here now has an additional child. 780 00:44:50,460 --> 00:44:51,210 One more child. 781 00:44:51,210 --> 00:44:54,020 So maybe that node now has four children 782 00:44:54,020 --> 00:44:56,390 but it's supposed to be two or three. 783 00:44:56,390 --> 00:45:01,630 So if I have a node with four children, what I'm going to do, 784 00:45:01,630 --> 00:45:05,140 I'm supposed to use these fancy arrows. 785 00:45:05,140 --> 00:45:06,820 What do I do in this case? 786 00:45:06,820 --> 00:45:09,170 It's just going to split that into two nodes with two 787 00:45:09,170 --> 00:45:10,860 children. 788 00:45:10,860 --> 00:45:13,410 And again this used to have a parent. 789 00:45:13,410 --> 00:45:16,120 Now that parent has an additional child, 790 00:45:16,120 --> 00:45:17,830 and that may cause another split. 791 00:45:17,830 --> 00:45:19,030 It's just like before. 792 00:45:19,030 --> 00:45:23,160 We just potentially split all the way up to the root. 793 00:45:23,160 --> 00:45:26,610 If we split the root then we get an additional level. 794 00:45:26,610 --> 00:45:32,040 But we could do all this and we can still maintain our level 795 00:45:32,040 --> 00:45:32,970 links, if we want. 796 00:45:37,430 --> 00:45:38,770 But everything will take log n. 797 00:45:38,770 --> 00:45:41,940 I won't draw the delete case, as delete 798 00:45:41,940 --> 00:45:43,660 is slightly more annoying. 799 00:45:43,660 --> 00:45:45,160 But I think, in this case, you never 800 00:45:45,160 --> 00:45:47,290 have to worry about where is the key coming from, 801 00:45:47,290 --> 00:45:49,740 your child or your parent? 802 00:45:49,740 --> 00:45:53,056 You're just merging nodes so it's a little bit simpler.
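[Editor's sketch] The two split rules for insertion (a leaf a,b,c becomes a,b and c; a node with four children becomes two nodes with two children) can be sketched in Python with nested lists. This is a deliberately stripped-down model, not the lecture's data structure: a leaf is a list of keys, an internal node is a list of child lists, and there are no parent pointers, level links, or stored min/max, so max_key walks down instead of reading a field. All function names are hypothetical.

```python
def is_leaf(node):
    # A leaf holds keys; an internal node holds child lists.
    return all(not isinstance(item, list) for item in node)

def max_key(node):
    # Stand-in for the stored max field: follow rightmost children.
    return node[-1] if is_leaf(node) else max_key(node[-1])

def insert(node, key):
    """Insert key below node; return the one or two nodes that replace it."""
    if is_leaf(node):
        keys = sorted(node + [key])
        if len(keys) <= 2:
            return [keys]
        return [keys[:2], keys[2:]]      # a,b,c -> (a,b) and (c)
    # Descend into the first child whose max is >= key, else the last child.
    i = next((j for j, c in enumerate(node) if key <= max_key(c)),
             len(node) - 1)
    children = node[:i] + insert(node[i], key) + node[i + 1:]
    if len(children) <= 3:
        return [children]
    return [children[:2], children[2:]]  # four children -> two and two

def insert_root(root, key):
    parts = insert(root, key)
    # If the root itself split, the two halves become children of a new root.
    return parts[0] if len(parts) == 1 else parts

tree = [[[1], [2, 5], [7]], [[8], [9]]]
tree = insert_root(tree, 3)
```

Inserting 3 overfills the leaf [2, 5], which splits into [2, 3] and [5]; the parent then has four children and splits into two nodes of two, leaving the root with three children.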
803 00:45:53,056 --> 00:45:54,680 But you have to deal with the leaf case 804 00:45:54,680 --> 00:45:56,880 separately from the nonleaf case. 805 00:45:56,880 --> 00:45:57,380 OK. 806 00:45:57,380 --> 00:45:58,860 So all this was to convince you that we 807 00:45:58,860 --> 00:46:00,068 can store data in the leaves. 808 00:46:00,068 --> 00:46:02,310 2-3 trees still work fine. 809 00:46:02,310 --> 00:46:06,650 Now I claim that the graph distance in level link trees 810 00:46:06,650 --> 00:46:10,610 is within a constant factor of the finger search bound. 811 00:46:10,610 --> 00:46:14,630 So I claim I can get the finger search property in 2-3 trees, 812 00:46:14,630 --> 00:46:17,360 with data in the leaves, with level links. 813 00:46:17,360 --> 00:46:19,490 So lots of changes here. 814 00:46:19,490 --> 00:46:24,210 But in the end, we're going to get a finger search bound. 815 00:46:24,210 --> 00:46:25,970 Let's go over here. 816 00:46:45,805 --> 00:46:47,305 So here's a finger search operation. 817 00:46:50,360 --> 00:46:53,580 First thing I want to do is identify a node 818 00:46:53,580 --> 00:46:55,670 that I'm working with. 819 00:46:55,670 --> 00:46:57,690 I want to start from y's node. 820 00:46:57,690 --> 00:47:01,840 So we're supposing that we're told the node, a leaf, that 821 00:47:01,840 --> 00:47:02,590 contains y. 822 00:47:02,590 --> 00:47:06,770 So I'm going to let v be that leaf. 823 00:47:14,670 --> 00:47:15,170 OK. 824 00:47:15,170 --> 00:47:18,357 Because we're supposing we've already found y, 825 00:47:18,357 --> 00:47:19,940 and now all the data is in the leaves. 826 00:47:19,940 --> 00:47:22,934 So give me the leaf that contains y. 827 00:47:22,934 --> 00:47:24,350 So that should take constant time. 828 00:47:24,350 --> 00:47:26,790 That's just part of the input. 829 00:47:26,790 --> 00:47:29,550 Now I'm going to do a combination of going up 830 00:47:29,550 --> 00:47:31,850 and horizontal. 831 00:47:31,850 --> 00:47:33,990 So starting at a leaf. 
832 00:47:33,990 --> 00:47:37,430 And the first thing I'm going to do is check, 833 00:47:37,430 --> 00:47:41,910 does this leaf contain what I want? 834 00:47:41,910 --> 00:47:44,850 Does it contain the key I'm searching for, which is x? 835 00:47:44,850 --> 00:47:46,595 So that's going to be the case. 836 00:47:46,595 --> 00:47:49,310 At every node I store the min and the max. 837 00:47:49,310 --> 00:47:53,770 So if x happens to fall between the min and the max, 838 00:47:53,770 --> 00:47:56,010 then I'm happy. 839 00:47:56,010 --> 00:48:06,320 Then I'm going to do a regular search in v's subtree. 840 00:48:06,320 --> 00:48:08,007 This seems weird in the case of a leaf. 841 00:48:08,007 --> 00:48:09,465 In the case of a leaf, this is just 842 00:48:09,465 --> 00:48:11,580 to check the two keys that are there. 843 00:48:11,580 --> 00:48:12,961 Which one is x. 844 00:48:12,961 --> 00:48:13,460 OK. 845 00:48:13,460 --> 00:48:16,460 But in general I gave you this search algorithm 846 00:48:16,460 --> 00:48:20,830 which was, if I decide which child to take, 847 00:48:20,830 --> 00:48:23,760 according to the ranges, that's a downward search. 848 00:48:23,760 --> 00:48:26,204 So that's what I'm calling regular search here. 849 00:48:26,204 --> 00:48:27,870 Maybe downward would be a little better. 850 00:48:33,710 --> 00:48:37,650 This is the usual log n time thing. 851 00:48:37,650 --> 00:48:40,980 But we're going to claim a bound better than log n. 852 00:48:40,980 --> 00:48:43,150 If this is not the case, then I know 853 00:48:43,150 --> 00:48:46,260 x either falls before v.min or after v.max. 854 00:48:49,260 --> 00:48:58,100 So if x is less than v.min then I'm going to go left. 855 00:48:58,100 --> 00:49:04,710 v equals v. I'll call it level left to be clear. 856 00:49:04,710 --> 00:49:06,397 You might say left is the left child. 857 00:49:06,397 --> 00:49:07,980 There's no left child here, of course. 858 00:49:07,980 --> 00:49:09,490 But level left is clear. 
859 00:49:09,490 --> 00:49:13,100 We take the horizontal left pointer. 860 00:49:13,100 --> 00:49:18,000 And otherwise x is greater than v.max. 861 00:49:18,000 --> 00:49:21,374 And in that case I will go right. 862 00:49:21,374 --> 00:49:22,165 That seems logical. 863 00:49:25,580 --> 00:49:32,650 And in both cases we're going to go up. 864 00:49:32,650 --> 00:49:36,250 x equals x.parent Whoops. 865 00:49:36,250 --> 00:49:39,050 v equals v.parent. 866 00:49:39,050 --> 00:49:40,060 X is not changing here. 867 00:49:40,060 --> 00:49:43,470 X is a key we're searching for. v is the node. 868 00:49:43,470 --> 00:49:46,020 V for vertex. 869 00:49:46,020 --> 00:49:48,990 So we're always going to go up, and then 870 00:49:48,990 --> 00:49:51,329 we're going to go either left or right, 871 00:49:51,329 --> 00:49:53,120 and we're going to keep doing that until we 872 00:49:53,120 --> 00:49:56,690 find a subtree that contains x in terms of key range. 873 00:49:56,690 --> 00:49:58,630 Then we're going to stop this part 874 00:49:58,630 --> 00:50:00,780 and we're just going to do downward search. 875 00:50:00,780 --> 00:50:03,894 I should say return here or something. 876 00:50:03,894 --> 00:50:05,560 I'm going to do a downward search, which 877 00:50:05,560 --> 00:50:07,860 was this regular algorithm. 878 00:50:07,860 --> 00:50:12,140 And then whatever it finds, that's what I return. 879 00:50:12,140 --> 00:50:14,712 I claim the algorithm should be clear. 880 00:50:14,712 --> 00:50:16,670 What's less clear is that it achieves the bound 881 00:50:16,670 --> 00:50:17,840 that we want. 882 00:50:17,840 --> 00:50:20,390 But I claim that this will achieve the finger search 883 00:50:20,390 --> 00:50:23,140 property. 884 00:50:23,140 --> 00:50:25,410 Let me draw a picture of what this thing looks 885 00:50:25,410 --> 00:50:33,175 like kind of generically. 886 00:50:33,175 --> 00:50:35,880 On small examples it's hard to see what's going on. 
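[Editor's sketch] The finger search loop described above can be sketched in Python. The field names level_left and level_right follow the lecture's wording; the Node class, link_levels, and the example tree are assumptions for illustration. The handling of a missing neighbor and of reaching the root is an addition of this sketch, since the lecture does not spell out those cases.

```python
class Node:
    """2-3 tree node, data in leaves, with min/max and level links."""
    def __init__(self, keys=(), children=()):
        self.keys = list(keys)
        self.children = list(children)
        self.parent = None
        self.level_left = None
        self.level_right = None
        for child in self.children:
            child.parent = self
        first = self.children[0].min if self.children else min(self.keys)
        last = self.children[-1].max if self.children else max(self.keys)
        self.min, self.max = first, last

def link_levels(root):
    """Connect horizontally adjacent nodes on every level."""
    level = [root]
    while level:
        for a, b in zip(level, level[1:]):
            a.level_right, b.level_left = b, a
        level = [c for node in level for c in node.children]

def search_down(v, x):
    """Regular (downward) search from v using the stored ranges."""
    while v.children:
        for child in v.children:
            if x <= child.max:
                break
        v = child
    return v

def finger_search(v, x):
    """Search for key x starting from leaf v, the leaf that holds y."""
    while not (v.min <= x <= v.max):
        side = v.level_left if x < v.min else v.level_right
        if side is not None:   # assumption: stay put at the edge of a level
            v = side
        if v.parent is None:   # assumption: at the root, just search down
            break
        v = v.parent
    return search_down(v, x)

leaf_1, leaf_25, leaf_7 = Node([1]), Node([2, 5]), Node([7])
leaf_8, leaf_9 = Node([8]), Node([9])
root = Node(children=[Node(children=[leaf_1, leaf_25, leaf_7]),
                      Node(children=[leaf_8, leaf_9])])
link_levels(root)
```

Starting from the leaf holding 2 and 5 and searching for 7, the loop takes one level-right step and one parent step, then the downward search finds the leaf holding 7 without visiting the root. A search for a far-away key climbs further before descending.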
887 00:50:35,880 --> 00:50:38,925 So I'm going to draw a piece of a large example. 888 00:50:44,160 --> 00:50:47,190 Let's say we start here. 889 00:50:47,190 --> 00:50:49,395 This is where y was. 890 00:50:49,395 --> 00:50:50,570 I'm searching for x. 891 00:50:50,570 --> 00:50:52,910 Let's suppose x is to the right. 892 00:50:52,910 --> 00:50:55,600 'Cause otherwise I go to the other board. 893 00:50:55,600 --> 00:50:57,290 So x is to the right. 894 00:50:57,290 --> 00:51:00,740 I'll discover that the range with just this node, this node 895 00:51:00,740 --> 00:51:02,440 maybe contains one other key. 896 00:51:02,440 --> 00:51:05,410 I'll find that range is too small. 897 00:51:05,410 --> 00:51:08,660 So I'm going to go follow the level right pointer, 898 00:51:08,660 --> 00:51:10,930 and I get to some other node. 899 00:51:10,930 --> 00:51:12,600 Then I'm going to go to the parent. 900 00:51:12,600 --> 00:51:15,910 Maybe the parent was the parent of those two children 901 00:51:15,910 --> 00:51:17,930 so I'm going to draw it like that. 902 00:51:17,930 --> 00:51:21,340 Maybe I find this range is still too low. 903 00:51:21,340 --> 00:51:24,810 I need to go right to get to x, so I'm going to follow a level 904 00:51:24,810 --> 00:51:26,630 pointer to the right. 905 00:51:26,630 --> 00:51:29,340 I find a new subtree. 906 00:51:29,340 --> 00:51:31,490 I'll go to its parent. 907 00:51:31,490 --> 00:51:35,160 Maybe I find that this subtree, still the max is too small. 908 00:51:35,160 --> 00:51:37,360 So I have to go to the right again. 909 00:51:37,360 --> 00:51:38,740 And then I take the parent. 910 00:51:38,740 --> 00:51:41,496 So this was an example of a rightward parent. 911 00:51:41,496 --> 00:51:43,120 Here's an example of a leftward parent. 912 00:51:43,120 --> 00:51:46,860 This is maybe the parent of both of these two children. 
913 00:51:46,860 --> 00:51:49,860 Then maybe this subtree is still too small, 914 00:51:49,860 --> 00:51:52,300 the max is still smaller than x. 915 00:51:52,300 --> 00:51:56,170 So then I go right one more time. 916 00:51:56,170 --> 00:51:57,470 Then I follow the parent. 917 00:51:57,470 --> 00:51:59,950 Always alternating between right and parent 918 00:51:59,950 --> 00:52:06,170 until I find a node whose subtree contains x. 919 00:52:06,170 --> 00:52:08,660 It might have actually, x may be down here, because I 920 00:52:08,660 --> 00:52:11,000 immediately went to the parent without checking 921 00:52:11,000 --> 00:52:13,370 whether I found where x is. 922 00:52:13,370 --> 00:52:15,650 But if I know that x is somewhere in here then 923 00:52:15,650 --> 00:52:18,180 I will do a downward search. 924 00:52:18,180 --> 00:52:21,050 It might go left and then down here, or it might go right, 925 00:52:21,050 --> 00:52:23,590 or there's actually potentially three children. 926 00:52:23,590 --> 00:52:27,030 One of these searches will find the key 927 00:52:27,030 --> 00:52:31,210 x that I'm looking for because I'm 928 00:52:31,210 --> 00:52:34,580 in the case where x is between v.min and v.max, 929 00:52:34,580 --> 00:52:37,090 so I know it's in there, somewhere. 930 00:52:37,090 --> 00:52:40,670 It could be x doesn't exist, but its predecessor or successor 931 00:52:40,670 --> 00:52:42,710 is in there somewhere. 932 00:52:42,710 --> 00:52:45,320 And so one of these three subtrees 933 00:52:45,320 --> 00:52:47,310 will contain the x range. 934 00:52:47,310 --> 00:52:49,720 And then I go follow that path. 935 00:52:49,720 --> 00:52:53,440 And keep going down until I find x or its predecessor 936 00:52:53,440 --> 00:52:54,200 or successor. 937 00:52:54,200 --> 00:52:57,440 Once I find its predecessor I can use a level right pointer 938 00:52:57,440 --> 00:53:01,270 to find its successor, and so on.
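The finger search described above can be sketched in Python roughly as follows. This is a minimal illustration, not code from the lecture: it assumes a static, non-empty tree built bottom-up from sorted distinct keys, with keys stored only in the leaves, and the names `Node`, `build`, `link_level`, and `finger_search` are hypothetical. Each node carries `min`, `max`, `parent`, and level-left/level-right pointers (`left`, `right`). When x is absent, the sketch returns the leaf holding the successor of x (or the largest leaf if x exceeds every key).

```python
class Node:
    """Node of a static level-linked 2-3 tree with keys stored in the leaves."""
    def __init__(self, key=None, children=()):
        self.children = list(children)
        self.parent = self.left = self.right = None  # left/right are level links
        if self.children:
            self.min, self.max = self.children[0].min, self.children[-1].max
            for c in self.children:
                c.parent = self
        else:
            self.key = self.min = self.max = key


def link_level(nodes):
    """Connect neighbouring nodes on one level with left/right pointers."""
    for a, b in zip(nodes, nodes[1:]):
        a.right, b.left = b, a


def build(sorted_keys):
    """Build the tree bottom-up; returns (root, list_of_leaves)."""
    level = [Node(key=k) for k in sorted_keys]
    leaves = level
    link_level(level)
    while len(level) > 1:
        groups, i = [], 0
        while i < len(level):
            # Take 3 children unless that would strand a single node.
            take = 2 if len(level) - i in (2, 4) else 3
            groups.append(Node(children=level[i:i + take]))
            i += take
        link_level(groups)
        level = groups
    return level[0], leaves


def finger_search(leaf, x):
    """Search for key x starting from a known leaf (the finger)."""
    v = leaf
    # Upward phase: step sideways toward x, then up, until v's range holds x.
    while not (v.min <= x <= v.max):
        if x < v.min and v.left is not None:
            v = v.left
        elif x > v.max and v.right is not None:
            v = v.right
        if v.parent is None:      # reached the root; x is outside all keys
            break
        v = v.parent
    # Downward phase: ordinary search using the max of each child.
    while v.children:
        v = next((c for c in v.children if x <= c.max), v.children[-1])
    return v
```

For example, with `root, leaves = build([2, 3, 5, 7, 11, 13])`, the call `finger_search(leaves[0], 11)` walks up only as far as the smallest subtree whose key range contains 11 before searching down.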
939 00:53:01,270 --> 00:53:03,620 So that's kind of the general picture what's going on. 940 00:53:03,620 --> 00:53:06,700 We keep going rightward and we keep going up. 941 00:53:10,580 --> 00:53:15,780 Suppose we do k up steps. 942 00:53:15,780 --> 00:53:19,950 Let's look at this last step here. 943 00:53:19,950 --> 00:53:20,490 Step k. 944 00:53:25,720 --> 00:53:27,080 How high am I in the tree? 945 00:53:27,080 --> 00:53:28,400 I started at the leaf level. 946 00:53:28,400 --> 00:53:31,720 Remember in a 2-3 tree all the leaves have the same level. 947 00:53:31,720 --> 00:53:34,730 And I went up every step. 948 00:53:34,730 --> 00:53:36,270 Sorry. 949 00:53:36,270 --> 00:53:39,540 I don't know what this is, like the 2-step dance 950 00:53:39,540 --> 00:53:44,140 where, let's say every iteration of this loop I 951 00:53:44,140 --> 00:53:47,040 do one left or right step, and then a parent step. 952 00:53:47,040 --> 00:53:52,270 So I should call this iteration k. 953 00:53:52,270 --> 00:53:53,980 I guess there's two k steps, then. 954 00:53:57,804 --> 00:53:59,850 Just to be clear. 955 00:53:59,850 --> 00:54:02,520 So in iteration k, that means I've gone up k times 956 00:54:02,520 --> 00:54:05,100 and I've gone either right or left k times. 957 00:54:05,100 --> 00:54:07,710 You can show if you start going right you keep going right. 958 00:54:07,710 --> 00:54:10,750 If you initially go left you'll keep going left. 959 00:54:10,750 --> 00:54:13,860 Doesn't matter too much. 960 00:54:13,860 --> 00:54:22,240 At iteration k I am at height k, or k minus 1, 961 00:54:22,240 --> 00:54:24,000 or however you want to count. 962 00:54:24,000 --> 00:54:25,680 But let's call it k. 963 00:54:25,680 --> 00:54:31,250 So when I do this right pointer here 964 00:54:31,250 --> 00:54:36,080 I know that, for example, I am skipping 965 00:54:36,080 --> 00:54:42,660 over all of these keys. 
966 00:54:42,660 --> 00:54:44,730 All the keys down-- the keys are in the leaves, 967 00:54:44,730 --> 00:54:48,110 so all these things down here, I'm jumping over them. 968 00:54:48,110 --> 00:54:51,130 How many keys are down there? 969 00:54:51,130 --> 00:54:53,900 Can you tell me, roughly, how many keys 970 00:54:53,900 --> 00:54:56,860 I'm skipping over when I'm moving right at height k? 971 00:54:59,970 --> 00:55:01,472 It's not a unique answer. 972 00:55:01,472 --> 00:55:02,805 But you can give me some bounds. 973 00:55:16,800 --> 00:55:18,940 Say again. 974 00:55:18,940 --> 00:55:20,840 Number of children to the k power. 975 00:55:20,840 --> 00:55:22,010 Yeah. 976 00:55:22,010 --> 00:55:24,060 Except we don't know the number of children. 977 00:55:24,060 --> 00:55:31,510 But it's between 2 and 3. Closer one should be easy, but I fail. 978 00:55:31,510 --> 00:55:33,350 So it's between two and three children. 979 00:55:33,350 --> 00:55:41,140 So there's the number-- if you look at a height k tree, 980 00:55:41,140 --> 00:55:42,880 how many leaves does it have? 981 00:55:42,880 --> 00:55:47,890 It's going to be between 2 to the k and 3 to the k. 982 00:55:47,890 --> 00:55:51,250 Because I have between 2 and 3 children at every node. 983 00:55:51,250 --> 00:55:53,175 And so it's exponential in k. 984 00:55:53,175 --> 00:55:54,050 That's all I'll need. 985 00:55:56,780 --> 00:55:57,280 OK. 986 00:55:57,280 --> 00:56:00,820 When I'm at height k here, I'm skipping over a height 987 00:56:00,820 --> 00:56:03,440 k minus 1 tree or something. 988 00:56:03,440 --> 00:56:06,530 But it's going to be-- 989 00:56:06,530 --> 00:56:13,165 So in iteration k I'm skipping, at least, some constant times 2 990 00:56:13,165 --> 00:56:13,670 to the k. 991 00:56:13,670 --> 00:56:17,270 Maybe to the k minus 1, or to the k minus 2. 992 00:56:17,270 --> 00:56:18,300 I'm being very sloppy. 993 00:56:18,300 --> 00:56:19,130 Doesn't matter.
994 00:56:19,130 --> 00:56:21,870 As long as it's exponential in k, I'm happy. 995 00:56:21,870 --> 00:56:27,190 Because I'm supposing that x and y are somewhat close. 996 00:56:27,190 --> 00:56:30,510 Let's call this rank difference d. 997 00:56:30,510 --> 00:56:33,290 Then I claim the number of iterations 998 00:56:33,290 --> 00:56:37,340 I'll need to do in this loop is, at most, order log d. 999 00:56:37,340 --> 00:56:41,100 Because when I get to the k-th iteration, 1000 00:56:41,100 --> 00:56:43,870 I'm jumping over 2 to the k elements. 1001 00:56:43,870 --> 00:56:47,210 How large does k have to be before 2 to the k 1002 00:56:47,210 --> 00:56:50,230 is larger than d? 1003 00:56:50,230 --> 00:56:52,960 Well, log d. 1004 00:56:52,960 --> 00:57:09,120 Log base 2. 1005 00:57:09,120 --> 00:57:15,950 The number of iterations is order 1006 00:57:15,950 --> 00:57:19,850 log d, where d is the rank difference. 1007 00:57:19,850 --> 00:57:25,390 d is the absolute value of the difference between rank of x and rank of y. 1008 00:57:28,940 --> 00:57:31,940 And I'm being a little sloppy here. 1009 00:57:31,940 --> 00:57:33,840 You probably want to use an induction. 1010 00:57:33,840 --> 00:57:36,140 You need to show that these items 1011 00:57:36,140 --> 00:57:38,140 here that you're skipping over really are strictly 1012 00:57:38,140 --> 00:57:39,500 between x and y. 1013 00:57:39,500 --> 00:57:41,970 But we know that there's only d items between x and y. 1014 00:57:41,970 --> 00:57:44,020 Actually d minus 1, I guess. 1015 00:57:44,020 --> 00:57:49,360 So as soon as we've skipped over all the items between x and y, 1016 00:57:49,360 --> 00:57:52,652 then we'll find a range that contains x, 1017 00:57:52,652 --> 00:57:54,360 and then we'll go do the downward search. 1018 00:57:54,360 --> 00:57:56,740 Now how much does the downward search cost? 1019 00:57:56,740 --> 00:57:58,881 Whatever the height of the tree is.
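Written out, the iteration bound argued above looks roughly like this. It is only a sketch: the constant c > 0 is an assumption standing in for the "sloppy" factors (2 to the k minus 1, k minus 2, and so on) mentioned in the lecture.

```latex
\[
  \#\{\text{keys skipped in iteration } k\} \;\ge\; c \cdot 2^{k},
  \qquad
  c \cdot 2^{k} \ge d \iff k \ge \log_2\!\left(\frac{d}{c}\right),
\]
\[
  \text{so the number of iterations is } O(\log d),
  \qquad d = \lvert \operatorname{rank}(x) - \operatorname{rank}(y) \rvert .
\]
```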
1020 00:57:58,881 --> 00:58:00,130 What's the height of the tree? 1021 00:58:00,130 --> 00:58:01,463 That's the number of iterations. 1022 00:58:01,463 --> 00:58:03,230 So the total cost. 1023 00:58:03,230 --> 00:58:05,110 The downward search will cost the same 1024 00:58:05,110 --> 00:58:07,480 as the rest of the search. 1025 00:58:07,480 --> 00:58:12,302 And so the total cost is going to be order log d. 1026 00:58:12,302 --> 00:58:14,200 Clear? 1027 00:58:14,200 --> 00:58:19,920 Any questions about finger searching with level 1028 00:58:19,920 --> 00:58:25,460 linked data at the leaves, 2-3 trees? 1029 00:58:25,460 --> 00:58:29,150 AUDIENCE: Sir, I'm not sure why [INAUDIBLE] d, why is that? 1030 00:58:29,150 --> 00:58:32,500 PROFESSOR: I'm defining d to be the rank of x minus rank of y. 1031 00:58:32,500 --> 00:58:34,570 My goal is to achieve a log d bound. 1032 00:58:34,570 --> 00:58:40,520 And I'm claiming that because once I've skipped over d items, 1033 00:58:40,520 --> 00:58:41,390 then I'm done. 1034 00:58:41,390 --> 00:58:43,240 Then I've found x. 1035 00:58:43,240 --> 00:58:48,250 And at step k I'm skipping over 2 to the k items. 1036 00:58:48,250 --> 00:58:50,010 So how big is k going to be? 1037 00:58:50,010 --> 00:58:51,480 Log d. 1038 00:58:51,480 --> 00:58:53,520 That's all. 1039 00:58:53,520 --> 00:58:55,400 I used d for a notation here. 1040 00:58:58,281 --> 00:58:58,780 Cool. 1041 00:59:01,420 --> 00:59:02,600 Finger searching. 1042 00:59:02,600 --> 00:59:03,100 It's nice. 1043 00:59:03,100 --> 00:59:05,474 Especially if you're doing many consecutive searches that 1044 00:59:05,474 --> 00:59:09,110 are all relatively close to each other. 1045 00:59:09,110 --> 00:59:09,960 But that was easy. 1046 00:59:09,960 --> 00:59:13,800 Let's do a more difficult augmentation. 1047 00:59:13,800 --> 00:59:18,690 So the last topic for today is range trees. 
1048 00:59:18,690 --> 00:59:20,970 This is probably the coolest example of augmentation, 1049 00:59:20,970 --> 00:59:22,726 at least, that you'll see in this class. 1050 00:59:22,726 --> 00:59:24,350 If you want to see more you should take 1051 00:59:24,350 --> 00:59:32,570 Advanced Data Structures, 6.851. 1052 00:59:32,570 --> 00:59:34,970 And range trees solve a problem called 1053 00:59:34,970 --> 00:59:36,180 orthogonal range searching. 1054 00:59:38,710 --> 00:59:41,910 Not orthogonal search ranging. 1055 00:59:41,910 --> 00:59:46,130 Orthogonal range search. 1056 00:59:51,840 --> 00:59:54,810 So what's the problem? 1057 00:59:54,810 --> 00:59:57,810 I'm going to give you a bunch of points. 1058 00:59:57,810 --> 01:00:01,150 Draw them as fat dots so you can actually see them. 1059 01:00:01,150 --> 01:00:03,190 In some dimension. 1060 01:00:03,190 --> 01:00:08,300 So this is, for example, a 2D point set. 1061 01:00:08,300 --> 01:00:08,800 OK. 1062 01:00:08,800 --> 01:00:11,857 Over here I will draw a 3D point set. 1063 01:00:11,857 --> 01:00:13,440 You can tell the difference, I'm sure. 1064 01:00:18,860 --> 01:00:19,360 There. 1065 01:00:19,360 --> 01:00:22,310 Now it's a 3D point set. 1066 01:00:22,310 --> 01:00:25,221 And this is a static point set. 1067 01:00:25,221 --> 01:00:26,970 You could make this dynamic but let's just 1068 01:00:26,970 --> 01:00:30,470 think about the static case. 1069 01:00:30,470 --> 01:00:34,490 Don't want the 2D points and the 3D points to mix. 1070 01:00:34,490 --> 01:00:37,890 Now, you get to preprocess this into a data structure. 1071 01:00:37,890 --> 01:00:40,097 So this is a static data structure problem. 1072 01:00:40,097 --> 01:00:42,680 And now I'm going to come along with a whole bunch of queries. 1073 01:00:42,680 --> 01:00:45,770 A query will be a box. 1074 01:00:45,770 --> 01:00:46,270 OK. 1075 01:00:46,270 --> 01:00:48,445 In two dimensions, a box is a rectangle. 1076 01:00:51,370 --> 01:00:52,290 Something like this.
1077 01:00:52,290 --> 01:00:53,580 Axis aligned. 1078 01:00:53,580 --> 01:00:57,040 So I give you an x min, x max, a y min, and a y max. 1079 01:00:57,040 --> 01:00:59,490 I want to know what are the points inside. 1080 01:00:59,490 --> 01:01:00,920 Maybe I want you to list them. 1081 01:01:00,920 --> 01:01:01,750 If there's a lot of them it's going 1082 01:01:01,750 --> 01:01:03,125 to take a long time to list them. 1083 01:01:03,125 --> 01:01:05,900 Maybe I just want to know 10 of them as examples. 1084 01:01:05,900 --> 01:01:07,730 Maybe this is a Google search or something. 1085 01:01:07,730 --> 01:01:09,813 I just get the first 10 results in the first page, 1086 01:01:09,813 --> 01:01:13,600 I hit next then want the next 10, that kind of thing. 1087 01:01:13,600 --> 01:01:16,730 Or maybe I want to know how many search results there are. 1088 01:01:16,730 --> 01:01:18,180 Number of points in the rectangle. 1089 01:01:18,180 --> 01:01:19,720 Bunch of different problems. 1090 01:01:19,720 --> 01:01:23,370 In 3D, it's a 3D box. 1091 01:01:23,370 --> 01:01:26,650 Which is a little harder to draw. 1092 01:01:26,650 --> 01:01:28,900 You can't really tell which points are inside the box. 1093 01:01:28,900 --> 01:01:30,900 Let's say these three points are all inside the box. 1094 01:01:30,900 --> 01:01:32,816 I give you an interval in x, an interval in y, 1095 01:01:32,816 --> 01:01:34,880 and an interval in z, and I want to know 1096 01:01:34,880 --> 01:01:36,060 what are the points inside. 1097 01:01:36,060 --> 01:01:37,460 How many are there? 1098 01:01:37,460 --> 01:01:38,220 List them all. 1099 01:01:38,220 --> 01:01:40,941 List 10 of them, whatever. 1100 01:01:40,941 --> 01:01:41,440 OK. 1101 01:01:44,000 --> 01:01:47,050 I want to do this in poly log time, let's say. 
1102 01:01:47,050 --> 01:01:50,830 I'm going to achieve today log squared for the 2D problem 1103 01:01:50,830 --> 01:01:52,920 and log cubed for the 3D problem, 1104 01:01:52,920 --> 01:01:54,830 plus whatever the size output is. 1105 01:02:01,630 --> 01:02:03,260 So let me just write that down. 1106 01:02:14,970 --> 01:02:27,255 So the goal is to preprocess n points in d dimensions. 1107 01:02:30,400 --> 01:02:33,580 So you get to spend a bunch of time preprocessing 1108 01:02:33,580 --> 01:02:44,840 to support a query which is, given a box, axis aligned box, 1109 01:02:44,840 --> 01:02:52,020 find let's say the number of points in the box. 1110 01:02:56,310 --> 01:02:58,655 Find k points in the box. 1111 01:03:03,927 --> 01:03:04,760 I think that's good. 1112 01:03:04,760 --> 01:03:09,300 That includes a special case of find all the points in the box. 1113 01:03:09,300 --> 01:03:14,480 So this, of course, we have to pay a penalty of order k 1114 01:03:14,480 --> 01:03:15,230 for the output. 1115 01:03:17,850 --> 01:03:20,470 No getting around that. 1116 01:03:20,470 --> 01:03:24,894 But I want the rest of the time to be log to the d. 1117 01:03:28,282 --> 01:03:30,830 So we're going to achieve log to the d n 1118 01:03:30,830 --> 01:03:33,000 plus size of the output. 1119 01:03:36,050 --> 01:03:38,550 And you get to control how big you want the output to be. 1120 01:03:38,550 --> 01:03:41,360 So it's a pretty reasonable data structure. 1121 01:03:41,360 --> 01:03:43,690 In a certain sense we will understand what the output 1122 01:03:43,690 --> 01:03:45,551 is in log to the d time. 1123 01:03:45,551 --> 01:03:47,050 If you actually want to list points, 1124 01:03:47,050 --> 01:03:50,980 well, then you have to spend the time to do it. 1125 01:03:50,980 --> 01:03:51,480 All right. 1126 01:03:51,480 --> 01:03:55,110 So 2D and 3D are great, but let's start with 1D. 
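To pin down what the query is asking for, here is a brute-force reference in Python. It is deliberately not the range tree: it scans all n points per query, so it only serves as a specification of the "count" and "find k points" queries stated above. The representation of a box as a list of (lo, hi) pairs, one per dimension, and the function names are assumptions made for this sketch.

```python
from itertools import islice


def in_box(point, box):
    """True if every coordinate lies inside its closed interval."""
    return all(lo <= c <= hi for c, (lo, hi) in zip(point, box))


def count_in_box(points, box):
    """Number of points inside the axis-aligned box (O(n * d) time)."""
    return sum(1 for p in points if in_box(p, box))


def report_k(points, box, k):
    """Up to k points inside the box, in input order."""
    return list(islice((p for p in points if in_box(p, box)), k))
```

A data structure meeting the lecture's goal must return the same answers as these functions, only in polylogarithmic time plus the size of the output.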
1127 01:03:55,110 --> 01:03:57,510 First we should understand 1D completely, 1128 01:03:57,510 --> 01:03:58,710 then we can generalize. 1129 01:04:06,590 --> 01:04:08,290 1D we already know how to do. 1130 01:04:12,700 --> 01:04:15,430 1D I have a line. 1131 01:04:15,430 --> 01:04:16,770 I have some points on the line. 1132 01:04:22,370 --> 01:04:26,210 And I'm given, as a query, some interval. 1133 01:04:29,360 --> 01:04:32,220 And I want to know how many points are in the interval, 1134 01:04:32,220 --> 01:04:36,890 give me the points in the interval, and so on. 1135 01:04:36,890 --> 01:04:38,725 So how do I do this? 1136 01:04:38,725 --> 01:04:39,225 Any ways? 1137 01:04:48,600 --> 01:04:49,720 If d is 1. 1138 01:04:49,720 --> 01:04:52,980 So I want to achieve log d, sorry, log n, 1139 01:04:52,980 --> 01:04:54,288 plus size of output. 1140 01:04:58,294 --> 01:04:58,960 I hear whispers. 1141 01:05:04,988 --> 01:05:05,876 Yeah? 1142 01:05:05,876 --> 01:05:06,950 AUDIENCE: Segment trees? 1143 01:05:06,950 --> 01:05:07,950 PROFESSOR: Segment tree? 1144 01:05:07,950 --> 01:05:09,100 That's fancy. 1145 01:05:09,100 --> 01:05:10,310 We won't cover segment trees. 1146 01:05:10,310 --> 01:05:13,750 Probably segment trees do it. 1147 01:05:13,750 --> 01:05:14,250 Yeah. 1148 01:05:14,250 --> 01:05:17,546 We know lots of ways to do this. 1149 01:05:17,546 --> 01:05:18,045 Yeah? 1150 01:05:18,045 --> 01:05:18,960 AUDIENCE: Sorted array? 1151 01:05:18,960 --> 01:05:21,020 PROFESSOR: Sorted array is probably the simplest. 1152 01:05:21,020 --> 01:05:24,380 If I store the items in a sorted array and I have two values, 1153 01:05:24,380 --> 01:05:28,040 I'll call them x1 and x2, because it's 1154 01:05:28,040 --> 01:05:30,820 the x min and x max. 1155 01:05:30,820 --> 01:05:32,500 Binary search for x1. 1156 01:05:32,500 --> 01:05:34,110 Binary search for x2. 1157 01:05:34,110 --> 01:05:36,710 Find the successor of x1 and the predecessor of x2. 
1158 01:05:36,710 --> 01:05:38,164 I'll find these two guys. 1159 01:05:38,164 --> 01:05:39,830 And then I know all the ones in between. 1160 01:05:39,830 --> 01:05:41,520 That's the match. 1161 01:05:41,520 --> 01:05:44,830 So that'll take log n time to find those points 1162 01:05:44,830 --> 01:05:47,720 and then we're good. 1163 01:05:47,720 --> 01:05:50,680 So we could do a sorted array. 1164 01:05:50,680 --> 01:05:53,950 Of course, sorted array is a little hard to generalize. 1165 01:05:53,950 --> 01:05:57,170 I don't want to do a 2D array, that sounds bad. 1166 01:05:57,170 --> 01:05:59,580 You could, of course, do a binary search tree. 1167 01:05:59,580 --> 01:06:01,580 Like an AVL tree. 1168 01:06:01,580 --> 01:06:02,140 Same thing. 1169 01:06:02,140 --> 01:06:04,102 Because we have log n search, find successor, 1170 01:06:04,102 --> 01:06:06,310 and predecessor, I guess you could use Van Emde Boas, 1171 01:06:06,310 --> 01:06:08,070 but that's hard to generalize to 2D. 1172 01:06:10,940 --> 01:06:13,640 You could use level links. 1173 01:06:13,640 --> 01:06:14,940 Here's a fancy version. 1174 01:06:14,940 --> 01:06:19,310 We could use level linked 2-3 trees with data in the leaves. 1175 01:06:19,310 --> 01:06:24,150 Then once I find x min, I find this point, 1176 01:06:24,150 --> 01:06:26,580 I can go to the successor in constant time 1177 01:06:26,580 --> 01:06:29,680 because that's a finger search with a rank difference of 1. 1178 01:06:29,680 --> 01:06:32,180 And I could just keep calling successor 1179 01:06:32,180 --> 01:06:36,239 and in constant time per item I will find the next item. 1180 01:06:36,239 --> 01:06:38,280 So we could do that easily with the sorted array. 1181 01:06:38,280 --> 01:06:40,870 BST is not so great because successor 1182 01:06:40,870 --> 01:06:44,154 might cost log n each time. 
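The sorted-array solution for the 1D case mentioned above can be sketched with Python's standard `bisect` module. The function names are made up for this illustration, and the interval is assumed to be closed, [x1, x2].

```python
from bisect import bisect_left, bisect_right


def range_count(sorted_keys, x1, x2):
    """Count keys k with x1 <= k <= x2 using two binary searches: O(log n)."""
    lo = bisect_left(sorted_keys, x1)    # index of the successor of x1
    hi = bisect_right(sorted_keys, x2)   # one past the predecessor of x2
    return max(0, hi - lo)


def range_report(sorted_keys, x1, x2, k=None):
    """Return the matching keys (at most k of them if k is given)."""
    lo = bisect_left(sorted_keys, x1)
    hi = bisect_right(sorted_keys, x2)
    if k is not None:
        hi = min(hi, lo + k)
    return sorted_keys[lo:hi]            # O(size of output) to copy out
```

Everything between the two binary-search positions is a match, which is why the total cost is log n plus the size of the output.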
1183 01:06:44,154 --> 01:06:45,570 But if I have the level links then 1184 01:06:45,570 --> 01:06:47,444 basically I'm just walking down the linked list 1185 01:06:47,444 --> 01:06:49,190 at the bottom of the tree. 1186 01:06:49,190 --> 01:06:49,690 OK. 1187 01:06:49,690 --> 01:06:52,920 So actually level linked is even better. 1188 01:06:55,660 --> 01:07:01,660 BST would achieve something like log n plus k log n, where 1189 01:07:01,660 --> 01:07:05,630 k is the size of the output. 1190 01:07:05,630 --> 01:07:08,400 If I want k points in the box I have to pay log n for each. 1191 01:07:08,400 --> 01:07:14,130 Level linked, I'll only pay log n plus k. 1192 01:07:14,130 --> 01:07:17,310 Here I actually only need the levels at the leaves. 1193 01:07:17,310 --> 01:07:18,790 Level links. 1194 01:07:18,790 --> 01:07:19,290 OK. 1195 01:07:19,290 --> 01:07:20,352 All good. 1196 01:07:20,352 --> 01:07:22,310 But I actually want to tell you a different way 1197 01:07:22,310 --> 01:07:24,560 to do it that will generalize better. 1198 01:07:30,552 --> 01:07:32,010 The pictures are going to look just 1199 01:07:32,010 --> 01:07:34,460 like the pictures we've talked about. 1200 01:07:55,040 --> 01:07:57,980 So these would actually work dynamically. 1201 01:07:57,980 --> 01:08:00,720 My goal here is just to achieve a static data structure. 1202 01:08:00,720 --> 01:08:05,320 I'm going to idealize this solution a little bit. 1203 01:08:05,320 --> 01:08:11,030 And just say, suppose I have a perfectly balanced 1204 01:08:11,030 --> 01:08:13,050 binary search tree. 1205 01:08:13,050 --> 01:08:14,971 That's going to be my data structure. 1206 01:08:14,971 --> 01:08:15,470 OK. 1207 01:08:15,470 --> 01:08:18,140 So the data structure is not hard, but what's interesting 1208 01:08:18,140 --> 01:08:21,859 is how I do a range search. 1209 01:08:21,859 --> 01:08:32,569 So if I do range query of the interval, I'll call it [a, b].
1210 01:08:32,569 --> 01:08:37,380 Then what I'm going to do is do a binary search for a, 1211 01:08:37,380 --> 01:08:45,970 do a binary search for b, trim the common prefix 1212 01:08:45,970 --> 01:08:48,420 of those search paths. 1213 01:08:48,420 --> 01:08:52,504 That's basically finding the lowest common ancestor 1214 01:08:52,504 --> 01:08:55,310 of a and b. 1215 01:08:59,771 --> 01:09:01,270 And then I'm going to do some stuff. 1216 01:09:01,270 --> 01:09:02,670 Let me draw the picture. 1217 01:09:02,670 --> 01:09:07,649 So here is, suppose here's the node that contains a. 1218 01:09:07,649 --> 01:09:09,380 Here's the node that contains b. 1219 01:09:09,380 --> 01:09:12,670 They may not be at the same depth, who knows. 1220 01:09:12,670 --> 01:09:15,290 Then I'm going to look at the parents of a. 1221 01:09:15,290 --> 01:09:19,630 I just came down from some path here, and some path down to b. 1222 01:09:19,630 --> 01:09:21,439 I want to find this branching point 1223 01:09:21,439 --> 01:09:23,775 where the paths to a and the paths to b diverge. 1224 01:09:26,490 --> 01:09:28,500 So let's just look at the parent of a. 1225 01:09:28,500 --> 01:09:34,649 It could be a right parent, in which case 1226 01:09:34,649 --> 01:09:35,649 there's a subtree here. 1227 01:09:35,649 --> 01:09:38,910 Could be a left parent in which case, subtree here. 1228 01:09:43,020 --> 01:09:46,050 I'm going to follow my convention again. 1229 01:09:46,050 --> 01:09:48,220 That x-coordinate corresponds roughly to key. 1230 01:09:53,206 --> 01:09:55,120 Left parent here. 1231 01:09:55,120 --> 01:09:56,485 Maybe right parent here. 1232 01:10:08,040 --> 01:10:09,090 Something like that. 1233 01:10:23,220 --> 01:10:23,720 OK. 1234 01:10:23,720 --> 01:10:25,060 Remember it's a perfect tree. 1235 01:10:25,060 --> 01:10:30,005 So, actually, all the leaves will be at the same level. 1236 01:10:35,050 --> 01:10:39,500 And, roughly here, x-coordinate corresponds to key. 
1237 01:10:39,500 --> 01:10:41,786 So here is a. 1238 01:10:41,786 --> 01:10:44,910 And I want to return all the keys that are between a and b. 1239 01:10:44,910 --> 01:10:48,112 So that's everything in this sweep line. 1240 01:10:50,960 --> 01:10:53,880 The parents of the LCA don't matter, because this parent's 1241 01:10:53,880 --> 01:10:56,191 either going to be way over to the right or way over 1242 01:10:56,191 --> 01:10:56,690 to the left. 1243 01:10:56,690 --> 01:10:59,399 In both cases, it's outside the interval a to b. 1244 01:10:59,399 --> 01:11:00,940 So what I've tried to highlight here, 1245 01:11:00,940 --> 01:11:06,700 and I will color it in blue, is the relevant nodes 1246 01:11:06,700 --> 01:11:08,350 for the search between a and b. 1247 01:11:08,350 --> 01:11:11,570 So a is between a and b. 1248 01:11:11,570 --> 01:11:14,800 This subtree is greater than a and less than b. 1249 01:11:14,800 --> 01:11:17,360 This node, and these nodes. 1250 01:11:17,360 --> 01:11:20,520 This node, and these nodes. 1251 01:11:20,520 --> 01:11:23,580 This node and these nodes. 1252 01:11:23,580 --> 01:11:25,490 The common ancestor. 1253 01:11:25,490 --> 01:11:27,350 And then the corresponding thing over here. 1254 01:11:31,350 --> 01:11:33,450 All the nodes in all these blue subtrees, 1255 01:11:33,450 --> 01:11:36,630 plus these individual nodes, fall in the interval 1256 01:11:36,630 --> 01:11:40,440 between a and b, and that's it. 1257 01:11:40,440 --> 01:11:40,940 OK. 1258 01:11:40,940 --> 01:11:42,400 This should look super familiar. 1259 01:11:42,400 --> 01:11:44,397 It's just like when we're computing rank. 1260 01:11:44,397 --> 01:11:46,730 We're trying to figure out how many guys are to our left 1261 01:11:46,730 --> 01:11:47,740 or to our right. 1262 01:11:47,740 --> 01:11:49,960 We're basically doing a rightward rank 1263 01:11:49,960 --> 01:11:52,680 from a and a leftward rank from b. 1264 01:11:52,680 --> 01:11:54,160 And that finds all the nodes.
1265 01:11:54,160 --> 01:11:57,670 And stopping when those two searches converge. 1266 01:11:57,670 --> 01:12:00,040 And then we're finding all the nodes between a and b. 1267 01:12:00,040 --> 01:12:01,165 I'm not going to write down the pseudocode because it's 1268 01:12:01,165 --> 01:12:02,123 the same kind of thing. 1269 01:12:02,123 --> 01:12:05,050 You look at right parents and left parents. 1270 01:12:05,050 --> 01:12:06,880 You just walk up from a. 1271 01:12:06,880 --> 01:12:09,550 Whenever you get a right parent then 1272 01:12:09,550 --> 01:12:12,830 you want that node, and the subtree to its right. 1273 01:12:12,830 --> 01:12:15,070 And so that will highlight these nodes. 1274 01:12:15,070 --> 01:12:17,920 Same thing for b, but you look at left parents. 1275 01:12:17,920 --> 01:12:20,224 And then you stop when those two searches converge. 1276 01:12:20,224 --> 01:12:21,890 So you're going to do them in lock step. 1277 01:12:21,890 --> 01:12:23,530 You do one step for a and b. 1278 01:12:23,530 --> 01:12:24,480 One step for a and b. 1279 01:12:24,480 --> 01:12:26,980 And when they happen to hit the same node, then you're done. 1280 01:12:26,980 --> 01:12:29,990 You add that node to your list. 1281 01:12:29,990 --> 01:12:35,630 And what you end up with is a bunch 1282 01:12:35,630 --> 01:12:39,510 of nodes and rooted subtrees. 1283 01:12:43,820 --> 01:12:48,500 The things I circled in blue is going to be my return value. 1284 01:12:48,500 --> 01:12:52,110 So I'm going to return all of these nodes, explicitly. 1285 01:12:52,110 --> 01:12:54,012 And I'm also going to return these subtrees. 1286 01:12:54,012 --> 01:12:55,720 I'm not going to have to write them down. 1287 01:12:55,720 --> 01:12:58,130 I'm just going to return the root of the subtree, 1288 01:12:58,130 --> 01:12:58,880 and say, hey look. 1289 01:12:58,880 --> 01:13:01,930 Here's an entire subtree that contains 1290 01:13:01,930 --> 01:13:04,520 points that are in the answer. 
1291 01:13:04,520 --> 01:13:06,380 Don't have to list them explicitly, 1292 01:13:06,380 --> 01:13:08,560 I can just give you the tree. 1293 01:13:08,560 --> 01:13:12,750 Then if I want to know how many results are in the answer, 1294 01:13:12,750 --> 01:13:16,730 well, just augment to store subtree size at the beginning. 1295 01:13:16,730 --> 01:13:18,200 And then I can count how many nodes 1296 01:13:18,200 --> 01:13:20,280 are down here, how many nodes are down here, 1297 01:13:20,280 --> 01:13:22,670 add that up for all the triangles, 1298 01:13:22,670 --> 01:13:25,700 and then also add one for each of the blue nodes, 1299 01:13:25,700 --> 01:13:30,500 and then I've counted the size of the answer in how much time? 1300 01:13:30,500 --> 01:13:35,010 How many subtrees and how many nodes am I returning here? 1301 01:13:35,010 --> 01:13:35,510 Log. 1302 01:13:40,490 --> 01:13:44,380 Log n nodes and log n rooted subtrees because at each step, 1303 01:13:44,380 --> 01:13:46,990 I'm going up by one for a, and up by one for b. 1304 01:13:46,990 --> 01:13:48,260 So it's like 2 log n. 1305 01:13:48,260 --> 01:13:48,860 Log n. 1306 01:13:51,880 --> 01:13:55,780 So I would call this an implicit representation of the answer. 1307 01:13:55,780 --> 01:13:57,810 From that implicit representation 1308 01:13:57,810 --> 01:13:59,920 you can do subtree size. 1309 01:13:59,920 --> 01:14:02,240 Augmentation to count the size of the answer. 1310 01:14:02,240 --> 01:14:05,700 You can just start walking through one by one, do an in-order 1311 01:14:05,700 --> 01:14:08,480 traversal of the trees, and you'll get the first k points 1312 01:14:08,480 --> 01:14:10,220 in the answer in order k time. 1313 01:14:10,220 --> 01:14:11,273 Question? 1314 01:14:11,273 --> 01:14:12,564 AUDIENCE: Just a clarification.
1315 01:14:12,564 --> 01:14:13,556 You said when we were walking up, 1316 01:14:13,556 --> 01:14:15,044 you want to get all the ancestors 1317 01:14:15,044 --> 01:14:17,524 in their right subtrees. 1318 01:14:17,524 --> 01:14:20,020 But you don't do that for the left parent, right? 1319 01:14:20,020 --> 01:14:21,020 PROFESSOR: That's right. 1320 01:14:21,020 --> 01:14:23,219 As I'm walking up the tree, if it's a right parent 1321 01:14:23,219 --> 01:14:25,760 then I take the right subtree and include that in the answer. 1322 01:14:25,760 --> 01:14:29,210 If it's a left parent just forget about it. 1323 01:14:29,210 --> 01:14:30,230 Don't do anything. 1324 01:14:30,230 --> 01:14:31,640 Just keep following parents. 1325 01:14:31,640 --> 01:14:34,020 Whenever I do right parent then I also 1326 01:14:34,020 --> 01:14:35,522 add that node and the right subtree. 1327 01:14:35,522 --> 01:14:37,480 If it's a left parent I don't include the node, 1328 01:14:37,480 --> 01:14:39,110 I don't include the left subtree. 1329 01:14:39,110 --> 01:14:40,420 I also don't include the right subtree. 1330 01:14:40,420 --> 01:14:41,711 That would have too much stuff. 1331 01:14:44,072 --> 01:14:45,530 It's easy when you see the picture, 1332 01:14:45,530 --> 01:14:46,950 you would write down the algorithm. 1333 01:14:46,950 --> 01:14:47,450 It's clear. 1334 01:14:47,450 --> 01:14:50,550 It's left versus right parents. 1335 01:14:50,550 --> 01:14:53,321 AUDIENCE: Would you include the left subtree of b? 1336 01:14:53,321 --> 01:14:54,820 PROFESSOR: I would also-- thank you. 1337 01:14:54,820 --> 01:14:58,630 I should color the left subtree of b. 1338 01:14:58,630 --> 01:15:00,480 I didn't apply symmetry perfectly. 1339 01:15:00,480 --> 01:15:03,250 So we have the right subtree of a and the left subtree of b. 1340 01:15:03,250 --> 01:15:04,950 Thanks. 1341 01:15:04,950 --> 01:15:10,110 I would also include b if it's a closed interval. 
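The 1D range query with its implicit answer (O(log n) individual nodes plus O(log n) subtree roots) can be sketched in Python as follows. Two things here are assumptions of this sketch rather than the lecture's formulation: it walks top-down from the branching node instead of walking up from a and b with parent pointers (the reported nodes and subtrees are the same), and it assumes distinct keys and a closed interval [a, b]. The names `Node`, `build`, `range_query`, `range_count`, and `range_keys` are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # Easy tree augmentation: subtree size computed from the children.
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)


def build(sorted_keys):
    """Balanced BST on sorted, distinct keys (static structure)."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    return Node(sorted_keys[mid], build(sorted_keys[:mid]), build(sorted_keys[mid + 1:]))


def range_query(root, a, b):
    """Return (nodes, subtrees) that together hold exactly the keys in [a, b]."""
    nodes, subtrees = [], []
    v = root
    while v and not (a <= v.key <= b):       # find where the two search paths split
        v = v.left if b < v.key else v.right
    if v is None:
        return nodes, subtrees
    nodes.append(v)
    u = v.left                               # search path toward a
    while u:
        if a <= u.key:                       # u is a "right parent" on a's path
            nodes.append(u)
            if u.right:
                subtrees.append(u.right)
            u = u.left
        else:
            u = u.right
    u = v.right                              # search path toward b (mirror image)
    while u:
        if u.key <= b:
            nodes.append(u)
            if u.left:
                subtrees.append(u.left)
            u = u.right
        else:
            u = u.left
    return nodes, subtrees


def range_count(root, a, b):
    """Size of the answer from the implicit representation, in O(log n)."""
    nodes, subtrees = range_query(root, a, b)
    return len(nodes) + sum(t.size for t in subtrees)


def range_keys(root, a, b):
    """Expand the implicit answer into a sorted list (for checking only)."""
    def inorder(t):
        if t:
            yield from inorder(t.left)
            yield t.key
            yield from inorder(t.right)
    nodes, subtrees = range_query(root, a, b)
    return sorted([n.key for n in nodes] + [k for t in subtrees for k in inorder(t)])
```

Note that `range_keys` sorts the combined output for convenience; reporting in order in O(k) time, as described in the lecture, would instead interleave the nodes and subtree traversals along the two paths.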
1342 01:15:10,110 --> 01:15:11,130 Slightly more general. 1343 01:15:11,130 --> 01:15:12,990 If a and b are not in the tree then this 1344 01:15:12,990 --> 01:15:17,110 is really the successor of a and this is the predecessor of b. 1345 01:15:17,110 --> 01:15:19,790 So then a and b don't have to be in there. 1346 01:15:19,790 --> 01:15:22,940 This is still a well defined range search. 1347 01:15:22,940 --> 01:15:23,440 OK. 1348 01:15:23,440 --> 01:15:25,540 Now we really understand 1D. 1349 01:15:25,540 --> 01:15:30,190 I claim we've almost solved all dimensions. 1350 01:15:30,190 --> 01:15:33,340 All we need is a little bit of augmentation. 1351 01:15:33,340 --> 01:15:34,230 So let's do it. 1352 01:15:51,560 --> 01:15:53,220 Let's start with 2D. 1353 01:15:53,220 --> 01:15:59,090 But then 3D, and 4D, and so on will be easy. 1354 01:15:59,090 --> 01:16:00,990 Why do I care about 4D range trees? 1355 01:16:00,990 --> 01:16:03,110 Because maybe I have a database. 1356 01:16:03,110 --> 01:16:05,720 Each of these points is actually just a row 1357 01:16:05,720 --> 01:16:09,880 in the database which has four columns, four values. 1358 01:16:09,880 --> 01:16:12,590 And what I'm trying to do here is find all the people 1359 01:16:12,590 --> 01:16:15,000 in my database that have a salary between this and this, 1360 01:16:15,000 --> 01:16:17,180 and have an age between this and that, 1361 01:16:17,180 --> 01:16:19,370 and have a profession between this and this. 1362 01:16:19,370 --> 01:16:22,130 I don't know what that means. 1363 01:16:22,130 --> 01:16:24,600 Number of degrees between this and this, whatever. 1364 01:16:24,600 --> 01:16:28,190 You have some numerical data representing a person or thing 1365 01:16:28,190 --> 01:16:31,280 in your database, then this is a typical kind of search 1366 01:16:31,280 --> 01:16:32,620 you want to do. 
1367 01:16:32,620 --> 01:16:34,850 And you want to know how many answers you've got 1368 01:16:34,850 --> 01:16:37,290 and then list the first hundred of them, or whatever. 1369 01:16:37,290 --> 01:16:40,150 So this is a practical thing in databases. 1370 01:16:40,150 --> 01:16:43,736 This is what you might call an index in the database. 1371 01:16:43,736 --> 01:16:44,360 So let's start. 1372 01:16:44,360 --> 01:16:46,110 Suppose your data is just two dimensional. 1373 01:16:46,110 --> 01:16:48,910 You have two fields for every item. 1374 01:16:48,910 --> 01:17:02,800 What I'm going to do is store a 1D range tree on all points 1375 01:17:02,800 --> 01:17:05,430 by x. 1376 01:17:05,430 --> 01:17:09,240 So this data structure makes sense if you fix a dimension. 1377 01:17:09,240 --> 01:17:10,990 Say x is all I care about. 1378 01:17:10,990 --> 01:17:13,290 Forget about y. 1379 01:17:13,290 --> 01:17:14,140 So my point set. 1380 01:17:14,140 --> 01:17:15,020 Yeah. 1381 01:17:15,020 --> 01:17:23,380 So what that corresponds to is projecting each of these points 1382 01:17:23,380 --> 01:17:24,475 onto the x-axis. 1383 01:17:31,320 --> 01:17:33,220 And now also projecting my query. 1384 01:17:36,050 --> 01:17:38,750 So my new query is from here to here in x. 1385 01:17:41,420 --> 01:17:43,900 And so this data structure will let 1386 01:17:43,900 --> 01:17:46,370 me find all these points that match in x. 1387 01:17:46,370 --> 01:17:48,180 That's not good because there's actually 1388 01:17:48,180 --> 01:17:50,480 only two points that I want, but I find 1389 01:17:50,480 --> 01:17:53,210 four points in this picture. 1390 01:17:53,210 --> 01:17:55,120 But it's half of the answer. 1391 01:17:55,120 --> 01:17:57,520 It's all the x matches forgetting about y. 1392 01:18:00,540 --> 01:18:03,140 Now here's the fun part. 1393 01:18:03,140 --> 01:18:08,280 So when I do a search here I get log n nodes.
1394 01:18:08,280 --> 01:18:11,210 Nodes are good because they have a single key in them. 1395 01:18:11,210 --> 01:18:13,980 So I'll just check for each of those log n nodes. 1396 01:18:13,980 --> 01:18:15,780 Do they also match in y? 1397 01:18:15,780 --> 01:18:17,570 If they do, add it to the answer. 1398 01:18:17,570 --> 01:18:20,000 If they don't, forget about it. 1399 01:18:20,000 --> 01:18:21,610 OK. 1400 01:18:21,610 --> 01:18:25,370 But the tricky part is I also get log n subtrees representing 1401 01:18:25,370 --> 01:18:26,760 parts of the answer. 1402 01:18:26,760 --> 01:18:31,190 So potentially it could be that your search, this rectangle, 1403 01:18:31,190 --> 01:18:33,100 only has like five points. 1404 01:18:33,100 --> 01:18:36,030 But if you look at this whole vertical slab 1405 01:18:36,030 --> 01:18:38,200 there's a billion points. 1406 01:18:38,200 --> 01:18:39,740 Now, luckily, those billion points 1407 01:18:39,740 --> 01:18:41,050 are represented succinctly. 1408 01:18:41,050 --> 01:18:42,706 There's just log n subtrees saying, 1409 01:18:42,706 --> 01:18:44,080 well there's half a billion here, 1410 01:18:44,080 --> 01:18:46,579 and a quarter billion here, and an eighth of a billion here. 1411 01:18:50,970 --> 01:18:56,100 Now for each of those big chunks of output, 1412 01:18:56,100 --> 01:18:59,050 I want to very quickly find the ones that match in y. 1413 01:18:59,050 --> 01:19:01,220 How would I find the ones matching in y? 1414 01:19:03,910 --> 01:19:05,342 A range tree. 1415 01:19:05,342 --> 01:19:06,780 Yeah. 1416 01:19:06,780 --> 01:19:07,280 OK. 1417 01:19:07,280 --> 01:19:09,000 So here's what we're going to do. 1418 01:19:09,000 --> 01:19:14,690 For each node, call it x. 1419 01:19:14,690 --> 01:19:16,240 x is overloaded. 1420 01:19:16,240 --> 01:19:17,050 It's a coordinate. 1421 01:19:17,050 --> 01:19:17,960 So many things. 1422 01:19:17,960 --> 01:19:22,590 Let's call it v. 
In the-- this thing 1423 01:19:22,590 --> 01:19:24,580 I'm going to call the x-tree. 1424 01:19:24,580 --> 01:19:27,110 So for every node in the x-tree I'm 1425 01:19:27,110 --> 01:19:30,810 going to store another 1D range tree. 1426 01:19:30,810 --> 01:19:42,460 But this time using the y-coordinate on all 1427 01:19:42,460 --> 01:19:50,360 points in v's rooted subtree. 1428 01:19:52,899 --> 01:19:54,815 At this point I really want to draw a diagram. 1429 01:19:58,290 --> 01:20:00,580 So, rough picture. 1430 01:20:13,740 --> 01:20:17,000 Forgive me for not drawing this perfectly. 1431 01:20:17,000 --> 01:20:18,970 This is roughly what the answer looks 1432 01:20:18,970 --> 01:20:22,430 like for the 1D range search. 1433 01:20:22,430 --> 01:20:24,630 This is the x-tree. 1434 01:20:24,630 --> 01:20:28,900 And here I've searched between this value and this value 1435 01:20:28,900 --> 01:20:29,950 in the x-coordinate. 1436 01:20:29,950 --> 01:20:31,240 Basically I have log n nodes. 1437 01:20:31,240 --> 01:20:33,020 I'm going to check those separately. 1438 01:20:33,020 --> 01:20:35,970 Then I also have these log n subtrees. 1439 01:20:35,970 --> 01:20:40,280 For each of those log n subtrees 1440 01:20:40,280 --> 01:20:42,510 I'm going to have a pointer-- this 1441 01:20:42,510 --> 01:20:49,080 is the augmentation-- to another tree of exactly the same size. 1442 01:20:49,080 --> 01:20:51,860 On exactly the same data that's in here. 1443 01:20:51,860 --> 01:20:53,410 It's also over here. 1444 01:20:53,410 --> 01:20:55,900 But it's going to be sorted by y. 1445 01:20:55,900 --> 01:20:59,100 And it's a 1D range tree by y. 1446 01:20:59,100 --> 01:21:00,670 Tons of data duplication here. 1447 01:21:00,670 --> 01:21:03,740 I took all these points and I copied them over here, but then 1448 01:21:03,740 --> 01:21:05,420 built a 1D range tree in y. 1449 01:21:05,420 --> 01:21:06,590 This is all preprocessing.
1450 01:21:06,590 --> 01:21:08,595 So I don't have to pay for this. 1451 01:21:08,595 --> 01:21:09,470 It's polynomial time. 1452 01:21:09,470 --> 01:21:11,430 Don't worry too much. 1453 01:21:11,430 --> 01:21:13,730 And then I'm going to search in here. 1454 01:21:13,730 --> 01:21:15,260 What does the search in there look like? 1455 01:21:15,260 --> 01:21:18,820 I'm going to get, you know, some more trees and a couple 1456 01:21:18,820 --> 01:21:20,540 more nodes. 1457 01:21:20,540 --> 01:21:21,040 OK. 1458 01:21:21,040 --> 01:21:25,710 But now those items, those points, match in x and y 1459 01:21:25,710 --> 01:21:28,410 because this whole subtree matched in x 1460 01:21:28,410 --> 01:21:31,530 and I just did a y search, so I found things that matched in y. 1461 01:21:31,530 --> 01:21:34,940 So I get here another log n trees 1462 01:21:34,940 --> 01:21:37,120 that are actually in my answer. 1463 01:21:37,120 --> 01:21:41,400 And for each of these nodes I have a corresponding other data 1464 01:21:41,400 --> 01:21:45,130 structure where I do a little search 1465 01:21:45,130 --> 01:21:46,400 and I get part of the answer. 1466 01:21:51,720 --> 01:21:53,500 Every one. 1467 01:21:53,500 --> 01:21:54,970 Sounds huge. 1468 01:21:54,970 --> 01:21:58,260 This data structure sounds huge, but it's actually small. 1469 01:21:58,260 --> 01:22:02,560 But one thing that's clear is it takes log squared n time, 1470 01:22:02,560 --> 01:22:05,050 because I have log n triangles over here. 1471 01:22:05,050 --> 01:22:08,520 For each of them I spend log n to find triangles over here. 1472 01:22:08,520 --> 01:22:12,960 The total output is log squared n nodes, each of which 1473 01:22:12,960 --> 01:22:14,590 I have to check manually. 1474 01:22:14,590 --> 01:22:18,602 Plus, so over here, there's log n 1475 01:22:18,602 --> 01:22:19,810 different searches I'm doing. 1476 01:22:19,810 --> 01:22:21,160 Each one has size log n.
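The nested search can be sketched as follows. This is a minimal sketch under stated assumptions, not the structure drawn on the board: each x-tree node stores a sorted list of the y-values in its subtree as a stand-in for the 1D range tree on y, and the y search is done with Python's `bisect` module. The query counts the matching points rather than listing them. `XNode`, `build_x_tree`, `count_in_y`, and `range_count` are hypothetical names introduced for illustration.

```python
from bisect import bisect_left, bisect_right


class XNode:
    """Node of the x-tree, augmented with the y-values of its whole subtree."""

    def __init__(self, pts):
        # pts: non-empty list of (x, y) points sorted by x.
        mid = len(pts) // 2
        self.point = pts[mid]
        self.left = XNode(pts[:mid]) if mid > 0 else None
        self.right = XNode(pts[mid + 1:]) if mid + 1 < len(pts) else None
        # The augmentation: the same points again, but sorted by y.
        self.ys = sorted(p[1] for p in pts)


def build_x_tree(points):
    return XNode(sorted(points)) if points else None


def count_in_y(v, y1, y2):
    """Number of points in v's subtree with y1 <= y <= y2 (binary search)."""
    return bisect_right(v.ys, y2) - bisect_left(v.ys, y1)


def range_count(root, x1, x2, y1, y2):
    """Count points with x1 <= x <= x2 and y1 <= y <= y2."""
    def check(v):  # single nodes are checked by hand
        return 1 if y1 <= v.point[1] <= y2 else 0

    # Split node: first node on the search path with x in [x1, x2].
    v = root
    while v is not None and not (x1 <= v.point[0] <= x2):
        v = v.left if x2 < v.point[0] else v.right
    if v is None:
        return 0
    total = check(v)
    u = v.left
    while u is not None:  # path toward x1
        if x1 <= u.point[0]:
            total += check(u)
            if u.right is not None:
                total += count_in_y(u.right, y1, y2)  # whole subtree matches in x
            u = u.left
        else:
            u = u.right
    u = v.right
    while u is not None:  # path toward x2
        if u.point[0] <= x2:
            total += check(u)
            if u.left is not None:
                total += count_in_y(u.left, y1, y2)
            u = u.right
        else:
            u = u.left
    return total
```

The x search touches O(log n) nodes and O(log n) subtrees, and each subtree costs one O(log n) search in y, which gives the log squared n query time. Construction here re-sorts at every node for brevity; a more careful build would merge the children's sorted lists.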
1477 01:22:21,160 --> 01:22:23,870 So I get log squared n little triangles that 1478 01:22:23,870 --> 01:22:27,230 contain the results that match in x and y. 1479 01:22:27,230 --> 01:22:30,639 How much space in this data structure? 1480 01:22:30,639 --> 01:22:31,930 That's the remaining challenge. 1481 01:22:36,140 --> 01:22:45,600 Actually, it's not that hard, because if you look at a key. 1482 01:22:45,600 --> 01:22:48,642 So look at some key in this x-tree. 1483 01:22:48,642 --> 01:22:50,350 Let's look at a leaf because that's maybe 1484 01:22:50,350 --> 01:22:51,224 the most interesting. 1485 01:22:57,440 --> 01:22:58,500 Here's the x-tree. 1486 01:22:58,500 --> 01:22:59,810 x-tree has linear size. 1487 01:22:59,810 --> 01:23:01,060 Just one tree. 1488 01:23:01,060 --> 01:23:06,570 If I look at some key value, well, it lives in this subtree. 1489 01:23:06,570 --> 01:23:09,060 And so there's going to be a corresponding blue structure 1490 01:23:09,060 --> 01:23:11,150 of that size that contains that key. 1491 01:23:11,150 --> 01:23:12,620 And then there's the parent. 1492 01:23:12,620 --> 01:23:14,340 So there's a structure here. 1493 01:23:14,340 --> 01:23:16,790 That has a corresponding blue triangle. 1494 01:23:16,790 --> 01:23:20,050 And then its parent, that's another triangle. 1495 01:23:20,050 --> 01:23:24,990 That contains-- I'm looking at a key k here. 1496 01:23:24,990 --> 01:23:28,580 All of these triangles contain the key k. 1497 01:23:28,580 --> 01:23:32,800 And so key k will be duplicated all this many times, 1498 01:23:32,800 --> 01:23:37,260 but how many subtrees is k in? 1499 01:23:37,260 --> 01:23:39,430 Log n. 1500 01:23:39,430 --> 01:23:45,160 Each key, fundamental fact about balanced binary search 1501 01:23:45,160 --> 01:23:51,684 trees, each key lives in log n subtrees. 1502 01:23:51,684 --> 01:23:52,850 Namely the subtrees of all of its ancestors. 1503 01:24:00,000 --> 01:24:01,180 Awesome.
1504 01:24:01,180 --> 01:24:05,430 Because that means the total space is n log n. 1505 01:24:05,430 --> 01:24:06,860 There's n keys. 1506 01:24:06,860 --> 01:24:09,145 Each of them is duplicated at most log n times. 1507 01:24:12,060 --> 01:24:15,610 In general, n log to the d minus 1 of n. 1508 01:24:15,610 --> 01:24:19,420 So if you do it in 3D, each of the blue trees, 1509 01:24:19,420 --> 01:24:21,430 every node in it has a corresponding pointer 1510 01:24:21,430 --> 01:24:25,990 to a red tree that's sorted by z. 1511 01:24:25,990 --> 01:24:29,050 And you just keep doing this, sort of, nested searching, 1512 01:24:29,050 --> 01:24:30,610 like super augmentation. 1513 01:24:30,610 --> 01:24:34,520 But you're only losing a log factor each dimension you add.
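The 2D space argument can be checked numerically. The sketch below assumes an x-tree built by always putting the median at the root (a hypothetical choice, made so the tree is perfectly balanced), and adds up the sizes of all subtrees. That sum is the number of key copies held across all the y structures. The function name `total_copies` is introduced only for illustration.

```python
def total_copies(n):
    """Sum of all subtree sizes in a median-split BST on n keys.

    Each node's y structure holds one copy of every key in that node's
    subtree, so this sum is the total number of stored copies.
    """
    if n == 0:
        return 0
    mid = n // 2  # size of the left subtree; the root takes one key
    return n + total_copies(mid) + total_copies(n - mid - 1)


def copies_bound(n):
    """n times the number of levels, i.e. n * (floor(log2 n) + 1)."""
    return n * n.bit_length()
```

For any n of at least 1, `total_copies(n)` stays at or below `copies_bound(n)`, because each key is counted once per ancestor (including itself) and a median-split tree has floor(log2 n) + 1 levels.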