1 00:00:00,090 --> 00:00:02,490 The following content is provided under a Creative 2 00:00:02,490 --> 00:00:04,030 Commons license. 3 00:00:04,030 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,720 continue to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,320 To make a donation or view additional materials 6 00:00:13,320 --> 00:00:17,280 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,280 --> 00:00:18,450 at ocw.mit.edu. 8 00:00:21,215 --> 00:00:22,090 PROFESSOR: All right. 9 00:00:22,090 --> 00:00:24,880 Today, we're going to look at some kind of different data 10 00:00:24,880 --> 00:00:28,000 structures for static trees. 11 00:00:28,000 --> 00:00:30,294 So we have-- at least in the second two problems-- 12 00:00:30,294 --> 00:00:31,210 we have a static tree. 13 00:00:31,210 --> 00:00:34,780 We want to preprocess it to answer lots of queries. 14 00:00:34,780 --> 00:00:37,380 And all the queries we're going to support today 15 00:00:37,380 --> 00:00:40,360 we'll do in constant time per operation, which is pretty 16 00:00:40,360 --> 00:00:42,130 awesome, and linear space. 17 00:00:42,130 --> 00:00:42,797 That's our goal. 18 00:00:42,797 --> 00:00:44,671 It's going to be hard to achieve these goals. 19 00:00:44,671 --> 00:00:46,510 But in the end, we will do it for all three 20 00:00:46,510 --> 00:00:47,260 of these problems. 21 00:00:47,260 --> 00:00:49,480 So let me tell you about these problems. 22 00:00:49,480 --> 00:00:57,160 Range minimum queries, you're given an array of numbers. 23 00:01:07,064 --> 00:01:08,855 And the kind of query you want to support-- 24 00:01:11,590 --> 00:01:14,671 we'll call RMQ of ij-- 25 00:01:17,420 --> 00:01:21,630 is to find the minimum in a range. 26 00:01:21,630 --> 00:01:34,280 So we have Ai up to Aj and we want to compute the minimum 27 00:01:34,280 --> 00:01:35,430 in that range. 
28 00:01:35,430 --> 00:01:38,840 So i and j form the query. 29 00:01:38,840 --> 00:01:40,670 I think it's pretty clear what this means. 30 00:01:40,670 --> 00:01:44,820 I give you an interval that I care about, ij, 31 00:01:44,820 --> 00:01:46,730 and I want to know, in this range, 32 00:01:46,730 --> 00:01:48,230 what's the smallest value. 33 00:01:48,230 --> 00:01:50,750 And a little more subtle-- this will come up later. 34 00:01:50,750 --> 00:01:52,880 I don't just want to know the value that's 35 00:01:52,880 --> 00:01:54,380 there-- like say this is the minimum 36 00:01:54,380 --> 00:01:56,310 among that shaded region. 37 00:01:56,310 --> 00:02:00,820 But I also want to know the index K between i 38 00:02:00,820 --> 00:02:03,689 and j of that element. 39 00:02:03,689 --> 00:02:06,230 Of course, if I know the index, I can also look up the value. 40 00:02:06,230 --> 00:02:10,199 So it's more interesting to know that index. 41 00:02:10,199 --> 00:02:10,699 OK. 42 00:02:10,699 --> 00:02:13,880 This is a non-tree problem, but it will be closely related 43 00:02:13,880 --> 00:02:17,465 to tree problem, namely LCA. 44 00:02:22,240 --> 00:02:28,000 So LCA problem is you want to preprocess a tree. 45 00:02:28,000 --> 00:02:42,950 It's a rooted tree, and the query is LCA of two nodes. 46 00:02:42,950 --> 00:02:47,210 Which I think you know, or I guess I call them x and y. 47 00:02:47,210 --> 00:02:50,540 So it has two nodes x and y in the tree. 48 00:02:50,540 --> 00:02:53,327 I want to find their lowest common ancestor, which 49 00:02:53,327 --> 00:02:54,410 looks something like that. 50 00:02:57,110 --> 00:02:59,210 At some point they have shared ancestors, 51 00:02:59,210 --> 00:03:01,926 and we want to find that lowest one. 52 00:03:01,926 --> 00:03:03,800 And then another problem we're going to solve 53 00:03:03,800 --> 00:03:06,750 is level ancestor, which again, preprocess 54 00:03:06,750 --> 00:03:14,952 a rooted tree and the query is a little different. 
55 00:03:18,050 --> 00:03:22,130 Given a node and an integer k-- 56 00:03:22,130 --> 00:03:25,220 positive integer-- I want to find 57 00:03:25,220 --> 00:03:30,740 the kth ancestor of that node. 58 00:03:30,740 --> 00:03:37,040 Which you might write parent to the k, meaning I have a node x, 59 00:03:37,040 --> 00:03:38,831 the first ancestor is its parent. 60 00:03:41,780 --> 00:03:45,720 Eventually want to get to the kth ancestor. 61 00:03:45,720 --> 00:03:49,520 So I want to jump from x to there. 62 00:03:49,520 --> 00:03:53,840 So it's like teleporting to a target height above me. 63 00:03:53,840 --> 00:03:58,447 Obviously, k cannot be larger than the depth of the node. 64 00:03:58,447 --> 00:04:00,905 So these are the three problems we're going to solve today, 65 00:04:00,905 --> 00:04:06,380 RMQ, LCA, and LA. 66 00:04:06,380 --> 00:04:08,500 Using somewhat similar techniques, 67 00:04:08,500 --> 00:04:10,250 we're going to use a nice technique called 68 00:04:10,250 --> 00:04:12,141 table look-up, which is generally 69 00:04:12,141 --> 00:04:13,640 useful for a lot of data structures. 70 00:04:13,640 --> 00:04:18,060 We are working in the Word RAM throughout. 71 00:04:18,060 --> 00:04:20,959 But that's not as essential as it has been in our past integer 72 00:04:20,959 --> 00:04:24,620 data structures. 73 00:04:24,620 --> 00:04:26,540 Now the fun thing about these problems 74 00:04:26,540 --> 00:04:30,110 is while LCA and LA look quite similar-- 75 00:04:30,110 --> 00:04:33,830 I mean, they even share two letters out of three-- 76 00:04:33,830 --> 00:04:34,880 they're quite different. 77 00:04:34,880 --> 00:04:37,130 As far as I know, you need fairly different techniques 78 00:04:37,130 --> 00:04:39,111 to deal with-- or as far as anyone knows-- 79 00:04:39,111 --> 00:04:40,610 you need pretty different techniques 80 00:04:40,610 --> 00:04:42,410 to deal with both of them. 
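As a baseline for the three queries just defined, here is a minimal sketch of the obvious slow solutions, which is what the constant-time structures have to beat. The representations are assumptions made for illustration: the array is a Python list, and the rooted tree is a hypothetical `parent` dictionary mapping each node to its parent (the root maps to `None`).

```python
def rmq(a, i, j):
    """Index of a minimum of a[i..j] (inclusive), by linear scan."""
    return min(range(i, j + 1), key=lambda k: a[k])


def level_ancestor(parent, x, k):
    """k-th ancestor of x: follow parent pointers k times (k <= depth of x)."""
    for _ in range(k):
        x = parent[x]
    return x


def depth(parent, x):
    """Number of edges from x up to the root."""
    d = 0
    while parent[x] is not None:
        x = parent[x]
        d += 1
    return d


def lca(parent, x, y):
    """Lowest common ancestor: equalize depths, then climb in lockstep."""
    dx, dy = depth(parent, x), depth(parent, y)
    if dx < dy:
        x, y, dx, dy = y, x, dy, dx
    x = level_ancestor(parent, x, dx - dy)
    while x != y:
        x, y = parent[x], parent[y]
    return x


# Hypothetical example tree: r is the root, a and b its children, and so on.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "d"}
array = [8, 7, 2, 8, 6, 9, 4, 5]
```

Each query here costs time proportional to the range length or the tree height, with no preprocessing at all.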
81 00:04:42,410 --> 00:04:45,020 The original paper that solved level ancestors kind of 82 00:04:45,020 --> 00:04:47,360 lamented on this. 83 00:04:47,360 --> 00:04:49,160 RMQ, on the other hand, turns out 84 00:04:49,160 --> 00:04:52,190 to be basically identical to LCA. 85 00:04:52,190 --> 00:04:54,750 So that's the more surprising thing, 86 00:04:54,750 --> 00:04:57,482 and I want to start with that. 87 00:04:57,482 --> 00:04:59,690 Again, our goal is to get constant time, linear space 88 00:04:59,690 --> 00:05:02,390 for all these problems. 89 00:05:02,390 --> 00:05:05,419 Constant time is easy to get with polynomial space. 90 00:05:05,419 --> 00:05:06,960 You could just store all the answers. 91 00:05:06,960 --> 00:05:11,210 There's only n squared different queries for all these problems, 92 00:05:11,210 --> 00:05:13,090 so quadratic space is easy. 93 00:05:13,090 --> 00:05:15,540 Linear space is the hard part. 94 00:05:15,540 --> 00:05:19,130 So let me tell you about a nice reduction 95 00:05:19,130 --> 00:05:20,810 from an array to a tree. 96 00:05:30,830 --> 00:05:32,159 Very simple idea. 97 00:05:32,159 --> 00:05:33,450 It's called the Cartesian tree. 98 00:05:33,450 --> 00:05:37,110 It goes back to Gabow, Bentley, and Tarjan in 1984. 99 00:05:37,110 --> 00:05:40,019 It's an old idea, but it comes up now and then, 100 00:05:40,019 --> 00:05:41,810 and in particular, provides the equivalence 101 00:05:41,810 --> 00:05:45,980 between RMQ and LCA, or one direction of it. 102 00:05:45,980 --> 00:05:48,555 I just take a minimum element-- 103 00:05:51,170 --> 00:05:53,010 let's call it Ai-- 104 00:05:53,010 --> 00:05:55,370 of the array. 105 00:05:55,370 --> 00:05:58,560 Let that be the root of my tree. 106 00:05:58,560 --> 00:06:03,560 And then the left sub-tree of T is just 107 00:06:03,560 --> 00:06:10,640 going to be a Cartesian tree on all 108 00:06:10,640 --> 00:06:12,610 the elements to the left of i.
109 00:06:12,610 --> 00:06:19,220 So A less than i, and then the right sub-tree 110 00:06:19,220 --> 00:06:23,960 is going to be A greater than i. 111 00:06:23,960 --> 00:06:25,190 So let's do a little example. 112 00:06:32,560 --> 00:06:41,976 Suppose we have 8, 7, 2, 8, 6, 9, 4, 5. 113 00:06:45,210 --> 00:06:48,150 So the minimum in this array is 2. 114 00:06:48,150 --> 00:06:50,190 So it gets promoted to the root, which 115 00:06:50,190 --> 00:06:55,350 decomposes the problem into two halves, the left half 116 00:06:55,350 --> 00:06:56,280 and the right half. 117 00:06:56,280 --> 00:07:00,060 So drawing the tree, I put 2-- 118 00:07:00,060 --> 00:07:02,130 maybe over here is actually nicer-- 119 00:07:02,130 --> 00:07:03,870 2 at the root. 120 00:07:03,870 --> 00:07:06,440 On the left side, 7 is the smallest. 121 00:07:06,440 --> 00:07:08,550 And so it's going to get promoted to be the root, 122 00:07:08,550 --> 00:07:12,060 and so the left side will look like this. 123 00:07:12,060 --> 00:07:15,840 On the right side, the minimum is 4, 124 00:07:15,840 --> 00:07:24,460 so 4 is the right root, which decomposes into the left half 125 00:07:24,460 --> 00:07:25,920 there, the right half there. 126 00:07:25,920 --> 00:07:29,400 So the right thing is just 5. 127 00:07:29,400 --> 00:07:34,050 Here the minimum is 6, and so we get a nice binary tree 128 00:07:34,050 --> 00:07:34,770 on the left here. 129 00:07:37,751 --> 00:07:38,250 OK. 130 00:07:38,250 --> 00:07:40,650 This is not a binary search tree. 131 00:07:40,650 --> 00:07:41,600 It's a min heap. 132 00:07:49,330 --> 00:07:53,450 A Cartesian tree is a min heap. 133 00:07:53,450 --> 00:07:56,420 But Cartesian trees have a more interesting property, 134 00:07:56,420 --> 00:07:59,750 which I've kind of alluded to a couple of times already, 135 00:07:59,750 --> 00:08:02,140 which is that LCAs in this tree correspond 136 00:08:02,140 --> 00:08:04,970 to RMQs in this array.
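The recursive definition just given can be sketched directly in code. This is an illustration under assumptions, not the lecture's own code: the function name `cartesian_tree` and the nested-tuple representation `(index, left, right)` are made up here, and ties are broken by taking the leftmost minimum.

```python
def cartesian_tree(a, lo=0, hi=None):
    """Cartesian tree of a[lo:hi] as a nested tuple (index, left, right).

    The root is the index of a minimum element; the left and right
    subtrees are built recursively from the elements on either side.
    Returns None for an empty range.
    """
    if hi is None:
        hi = len(a)
    if lo >= hi:
        return None
    i = min(range(lo, hi), key=lambda k: a[k])  # leftmost minimum
    return (i, cartesian_tree(a, lo, i), cartesian_tree(a, i + 1, hi))


array = [8, 7, 2, 8, 6, 9, 4, 5]
tree = cartesian_tree(array)
# Root is index 2 (value 2); its children are index 1 (value 7)
# and index 6 (value 4), matching the example drawn on the board.
```

Because each call scans its whole range for the minimum, this direct version can take quadratic time on a sorted array.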
137 00:08:04,970 --> 00:08:07,400 So let's do some examples. 138 00:08:07,400 --> 00:08:11,140 Let's say I do LCA of 7 and 8. 139 00:08:11,140 --> 00:08:12,100 That's 2. 140 00:08:12,100 --> 00:08:16,630 Anything from the left and the right sub-tree, the LCA is 2. 141 00:08:16,630 --> 00:08:20,490 And indeed, if I take anything, any interval that spans 2, 142 00:08:20,490 --> 00:08:22,684 then the RMQ is 2. 143 00:08:22,684 --> 00:08:25,100 If I don't span 2, I'm either in the left or in the right. 144 00:08:25,100 --> 00:08:28,790 Let's say I'm on the right, say I do an LCA between 9 and 5. 145 00:08:28,790 --> 00:08:34,449 I get 4 because, yeah, the RMQ between 9 and 5 is 4. 146 00:08:34,449 --> 00:08:35,710 Make sense? 147 00:08:35,710 --> 00:08:41,860 Same problem, really, because it's all about which mins-- 148 00:08:41,860 --> 00:08:44,320 I mean in the sequence of mins-- which mins do you contain? 149 00:08:44,320 --> 00:08:46,090 If you contain the first min, you 150 00:08:46,090 --> 00:08:48,250 contain the highest min you contain. 151 00:08:48,250 --> 00:08:55,330 That is the answer and that's what LCA in this tree 152 00:08:55,330 --> 00:08:56,730 gives you. 153 00:08:56,730 --> 00:09:02,620 So LCA i and j in this tree T equals 154 00:09:02,620 --> 00:09:08,090 RMQ in the original array of the corresponding elements. 155 00:09:08,090 --> 00:09:11,080 So there is a bijection between these items, 156 00:09:11,080 --> 00:09:13,090 and so I and J here represents nodes, 157 00:09:13,090 --> 00:09:18,111 and here corresponding to the corresponding items in A. 158 00:09:18,111 --> 00:09:18,610 OK. 159 00:09:18,610 --> 00:09:21,610 So this says if you wanted to solve RMQ, 160 00:09:21,610 --> 00:09:25,860 you can reduce it to an LCA problem. 161 00:09:25,860 --> 00:09:29,680 Quick note here, which is-- 162 00:09:29,680 --> 00:09:30,370 yeah. 
163 00:09:30,370 --> 00:09:32,744 There's a couple of different versions of Cartesian trees 164 00:09:32,744 --> 00:09:35,880 when you have ties, so here I only had one 2. 165 00:09:35,880 --> 00:09:40,120 If there was another 2, then you could either just 166 00:09:40,120 --> 00:09:42,820 break ties arbitrarily and you get a binary tree, 167 00:09:42,820 --> 00:09:46,480 or you could make them all one node, which is kind of messier, 168 00:09:46,480 --> 00:09:48,520 and then you get a non-binary tree. 169 00:09:48,520 --> 00:09:52,210 I think I'll say we disambiguate arbitrarily. 170 00:09:52,210 --> 00:09:55,020 Just pick any min, and then you get a binary tree. 171 00:09:55,020 --> 00:09:56,910 It won't affect the answer. 172 00:09:56,910 --> 00:09:59,634 But I think the original paper might do it a different way. 173 00:10:02,235 --> 00:10:02,735 OK. 174 00:10:06,250 --> 00:10:06,880 Let's see. 175 00:10:06,880 --> 00:10:10,240 So then let me just mention a fun fact 176 00:10:10,240 --> 00:10:12,490 about this reduction, which is that you can compute it 177 00:10:12,490 --> 00:10:14,230 in linear time. 178 00:10:14,230 --> 00:10:16,804 This is a fun fact we basically saw last class, 179 00:10:16,804 --> 00:10:18,595 although in a completely different setting, 180 00:10:18,595 --> 00:10:21,160 so it's not at all obvious. 181 00:10:21,160 --> 00:10:23,740 But you may recall, we had a method last time 182 00:10:23,740 --> 00:10:27,520 for building a compressed trie in linear time. 183 00:10:27,520 --> 00:10:29,530 Basically, same thing works here, 184 00:10:29,530 --> 00:10:31,496 although it seems quite different. 
185 00:10:31,496 --> 00:10:33,220 The idea is if you want to build this, 186 00:10:33,220 --> 00:10:34,570 if you build a Cartesian tree according 187 00:10:34,570 --> 00:10:36,111 to this recursive algorithm, you will 188 00:10:36,111 --> 00:10:39,320 spend n log n time or actually, maybe even quadratic time, 189 00:10:39,320 --> 00:10:42,190 if you're computing min with a linear scan. 190 00:10:42,190 --> 00:10:44,080 So don't use that recursive algorithm. 191 00:10:44,080 --> 00:10:46,690 Just walk through the array, left to right, one at a time. 192 00:10:46,690 --> 00:10:48,250 So first you insert 8. 193 00:10:48,250 --> 00:10:49,720 Then you insert 7, and you realize 194 00:10:49,720 --> 00:10:53,980 7 would have won, so you put 7 above 8. 195 00:10:53,980 --> 00:10:54,750 Then you insert 2. 196 00:10:54,750 --> 00:10:59,590 You say that's even higher than 7, so I have to put it up here. 197 00:10:59,590 --> 00:11:03,640 Then you insert 8 so that you'll just go down from there, 198 00:11:03,640 --> 00:11:06,250 and you put 8 as a right child of 2. 199 00:11:06,250 --> 00:11:07,120 Then you insert 6. 200 00:11:07,120 --> 00:11:12,790 You say whoops, 6 actually would have gone in between 2 and 8. 201 00:11:12,790 --> 00:11:15,211 And the way you'd see that is-- 202 00:11:15,211 --> 00:11:17,710 I mean, at that moment, your tree looks something like this. 203 00:11:17,710 --> 00:11:21,367 You've got 2, 8, and there's other stuff to the left, 204 00:11:21,367 --> 00:11:22,450 but I don't actually care. 205 00:11:22,450 --> 00:11:23,970 I just care about the right spine. 206 00:11:23,970 --> 00:11:25,690 I say I'm inserting 6. 207 00:11:25,690 --> 00:11:28,390 6 would have been above 8, but not above 2. 208 00:11:28,390 --> 00:11:31,090 Therefore, it fits along this edge, 209 00:11:31,090 --> 00:11:38,110 and so I convert this tree into this pattern, 210 00:11:38,110 --> 00:11:40,150 and it will always look like this.
211 00:11:43,090 --> 00:11:45,240 8 becomes a child of 7-- 212 00:11:45,240 --> 00:11:47,760 sorry, 6. 213 00:11:47,760 --> 00:11:49,380 6. 214 00:11:49,380 --> 00:11:51,006 Thanks. 215 00:11:51,006 --> 00:11:52,350 Not 7. 216 00:11:52,350 --> 00:11:53,280 7 was on the left. 217 00:11:53,280 --> 00:11:58,390 This is the guy I'm inserting next because here. 218 00:11:58,390 --> 00:12:03,450 So I guess it's a left child because it's the first one. 219 00:12:03,450 --> 00:12:05,170 So we insert 6 like this. 220 00:12:05,170 --> 00:12:07,332 So now the new right spine is 2, 6, 221 00:12:07,332 --> 00:12:08,790 and from then on, we will always be 222 00:12:08,790 --> 00:12:10,080 working to the right of that. 223 00:12:10,080 --> 00:12:13,030 We'll never be touching any of this left stuff. 224 00:12:13,030 --> 00:12:13,530 OK. 225 00:12:13,530 --> 00:12:15,390 So how long did it take me to do that? 226 00:12:15,390 --> 00:12:19,200 In general, I have a right spine of the tree, which are all 227 00:12:19,200 --> 00:12:22,860 right edges, and I might have to walk up several steps 228 00:12:22,860 --> 00:12:28,620 before I discover whoops, this is where the next item belongs. 229 00:12:28,620 --> 00:12:32,970 And then I convert it into this new entry, 230 00:12:32,970 --> 00:12:35,709 which has a left child, which is that stuff. 231 00:12:35,709 --> 00:12:37,500 But this stuff becomes irrelevant from then 232 00:12:37,500 --> 00:12:40,120 on, because now, this is the new right spine. 233 00:12:40,120 --> 00:12:42,630 And so if this is a long walk, I charge that 234 00:12:42,630 --> 00:12:45,150 to the decrease in the length of the right spine, 235 00:12:45,150 --> 00:12:47,460 just like that algorithm last time. 236 00:12:47,460 --> 00:12:50,670 Slightly different notion of right spine. 237 00:12:50,670 --> 00:12:53,449 So same amortization, you get linear time, 238 00:12:53,449 --> 00:12:54,990 and you can build the Cartesian tree. 
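The left-to-right construction with the right spine can be sketched with an explicit stack. This is a minimal illustration under assumptions: the function name and the `parent`/`left`/`right` index arrays are hypothetical, and equal values are kept on the spine (so the leftmost of several equal minima ends up highest).

```python
def cartesian_tree_linear(a):
    """Build the Cartesian tree of a in O(n) total time.

    Returns (root, parent, left, right), where nodes are array indices
    and missing links are None. `spine` holds the current right spine,
    root first; every index is pushed once and popped at most once,
    which is the amortization argument from the lecture.
    """
    n = len(a)
    parent = [None] * n
    left = [None] * n
    right = [None] * n
    spine = []
    for i in range(n):
        last = None
        # Walk up the right spine past everything larger than a[i].
        while spine and a[spine[-1]] > a[i]:
            last = spine.pop()
        if last is not None:
            # The popped part of the spine becomes the left subtree of i.
            left[i] = last
            parent[last] = i
        if spine:
            # i hangs off the lowest remaining spine node as its right child.
            right[spine[-1]] = i
            parent[i] = spine[-1]
        spine.append(i)
    root = spine[0] if spine else None
    return root, parent, left, right


array = [8, 7, 2, 8, 6, 9, 4, 5]
root, parent, left, right = cartesian_tree_linear(array)
```

On the example array, inserting 6 pops the second 8 off the spine and makes it the left child of 6, which is exactly the step traced on the board.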
239 00:12:54,990 --> 00:12:57,031 This is actually where that algorithm comes from. 240 00:12:57,031 --> 00:12:58,330 This one was first, I believe. 241 00:13:01,410 --> 00:13:03,610 Questions? 242 00:13:03,610 --> 00:13:05,819 I'm not worrying too much about build time, 243 00:13:05,819 --> 00:13:07,860 how long it takes to build these data structures, 244 00:13:07,860 --> 00:13:09,930 but they can all be built in linear time. 245 00:13:09,930 --> 00:13:12,540 And this is one of the cooler algorithms, 246 00:13:12,540 --> 00:13:15,150 and it's a nice tie into last lecture. 247 00:13:15,150 --> 00:13:17,550 So that's a reduction from RMQ to LCA, 248 00:13:17,550 --> 00:13:21,122 so now all of our problems are about trees, in some sense. 249 00:13:21,122 --> 00:13:22,830 I mean, there's a reason I mentioned RMQ. 250 00:13:22,830 --> 00:13:25,350 Not just that it's a handy problem to have solved, 251 00:13:25,350 --> 00:13:29,250 but we're actually going to use RMQ to solve LCA. 252 00:13:29,250 --> 00:13:32,340 So we're going to go back and forth between the two a lot. 253 00:13:34,870 --> 00:13:37,410 Actually, we'll spend most of our time in the RMQ land. 254 00:13:37,410 --> 00:13:40,830 So let me tell you about the reverse direction, 255 00:13:40,830 --> 00:13:45,330 if you want to reduce LCA to RMQ. 256 00:13:45,330 --> 00:13:48,300 That also works. 257 00:13:48,300 --> 00:13:53,510 And you can kind of see it in this picture. 258 00:13:53,510 --> 00:13:55,790 If I gave you this tree, how would you 259 00:13:55,790 --> 00:13:57,500 reconstruct this array? 260 00:13:57,500 --> 00:13:59,730 Pop quiz. 261 00:13:59,730 --> 00:14:01,400 How do I go from here to here? 262 00:14:07,820 --> 00:14:09,584 In-order traversal, yep. 263 00:14:09,584 --> 00:14:11,750 Just doing an in-order traversal, write those guys-- 264 00:14:11,750 --> 00:14:13,801 I mean, yeah. 265 00:14:13,801 --> 00:14:14,300 Pretty easy. 
266 00:14:14,300 --> 00:14:18,020 Now, not so easy because in the LCA problem, 267 00:14:18,020 --> 00:14:19,730 I don't have numbers in the nodes. 268 00:14:19,730 --> 00:14:21,950 So if I do an in-order walk and I write stuff, 269 00:14:21,950 --> 00:14:24,910 it's like, what should I write for each of the nodes. 270 00:14:24,910 --> 00:14:26,223 Any suggestions? 271 00:14:45,450 --> 00:14:47,174 AUDIENCE: [INAUDIBLE] 272 00:14:47,174 --> 00:14:48,090 PROFESSOR: The height? 273 00:14:48,090 --> 00:14:48,900 Not quite the height. 274 00:14:48,900 --> 00:14:49,440 The depth. 275 00:14:56,000 --> 00:14:58,590 That will work. 276 00:14:58,590 --> 00:15:04,256 So let's do it, just so it's clear. 277 00:15:04,256 --> 00:15:12,246 Look at the same tree. Is that the same tree? 278 00:15:12,246 --> 00:15:12,745 Yep. 279 00:15:12,745 --> 00:15:14,615 So I write the depths. 280 00:15:14,615 --> 00:15:20,405 0, 1, 1, 2, 2, 2, 3, 3. 281 00:15:20,405 --> 00:15:22,530 It's either height or depth, and you try them both. 282 00:15:22,530 --> 00:15:24,790 This is depth. 283 00:15:24,790 --> 00:15:28,625 So I do an in-order walk, and I get 2, 1, 0-- 284 00:15:31,449 --> 00:15:32,490 can you read my writing-- 285 00:15:32,490 --> 00:15:35,474 3, 2, 3, 1, 2. 286 00:15:35,474 --> 00:15:37,890 It's funny doing an in-order traversal on something that's 287 00:15:37,890 --> 00:15:40,590 not a binary search tree, but there it is. 288 00:15:40,590 --> 00:15:43,350 That's the order in which you visit the nodes. 289 00:15:43,350 --> 00:15:48,630 And you stare at it long enough, this sequence 290 00:15:48,630 --> 00:15:52,380 will behave exactly the same as this sequence. 291 00:15:52,380 --> 00:15:55,510 Of course, not in terms of the actual values returned. 292 00:15:55,510 --> 00:15:58,650 But if you do the argument version of RMQ, 293 00:15:58,650 --> 00:16:01,830 you just ask for what's the index that gives me the min.
294 00:16:01,830 --> 00:16:05,660 If you can solve RMQ on this structure, 295 00:16:05,660 --> 00:16:10,050 then that RMQ will give exactly the same answers 296 00:16:10,050 --> 00:16:12,030 as this structure. 297 00:16:12,030 --> 00:16:13,680 Just kind of nifty. 298 00:16:13,680 --> 00:16:17,260 Because here I had numbers, they could be all over the place. 299 00:16:17,260 --> 00:16:19,940 Here I have very clean numbers. 300 00:16:19,940 --> 00:16:24,670 They will go between 0 and the height of the tree. 301 00:16:24,670 --> 00:16:28,090 So in general at most, 0 to n minus 1. 302 00:16:28,090 --> 00:16:31,980 So fun consequence of this is you get a tool 303 00:16:31,980 --> 00:16:38,124 for universe reduction in RMQ. 304 00:16:38,124 --> 00:16:39,790 The tree problems don't have this issue, 305 00:16:39,790 --> 00:16:41,248 because they don't involve numbers. 306 00:16:41,248 --> 00:16:44,710 They involve trees, and that's why this reduction does this. 307 00:16:44,710 --> 00:16:54,190 But you can start from an arbitrary ordered universe 308 00:16:54,190 --> 00:16:59,140 and have an RMQ problem on that, and you can convert it to LCA. 309 00:16:59,140 --> 00:17:10,599 And then you can convert it to a nice clean universe RMQ, 310 00:17:10,599 --> 00:17:12,989 just by doing the Cartesian tree and then doing 311 00:17:12,989 --> 00:17:14,530 the in-order traversal of the depths. 312 00:17:17,430 --> 00:17:21,480 This is kind of nifty because if you look at these algorithms, 313 00:17:21,480 --> 00:17:23,784 they only assume a comparison model. 314 00:17:23,784 --> 00:17:25,200 So these don't have to be numbers. 315 00:17:25,200 --> 00:17:27,359 They just have to be something from a totally ordered universe 316 00:17:27,359 --> 00:17:29,220 that you can compare in constant time.
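The universe reduction described above (array to Cartesian tree to in-order depths) can be sketched as follows. This is an illustration under assumptions: the function names are made up, the tree is built with the stack method using only comparisons, and ties are broken toward the leftmost minimum in both arrays.

```python
def inorder_depths(a):
    """Depth of each element of a in its Cartesian tree, in array order.

    Array order is the in-order traversal of the Cartesian tree, so the
    result is the 'clean' array with values between 0 and n - 1.
    """
    n = len(a)
    left = [None] * n
    right = [None] * n
    spine = []  # right spine of the tree so far, root first
    for i in range(n):
        last = None
        while spine and a[spine[-1]] > a[i]:  # comparisons only
            last = spine.pop()
        left[i] = last
        if spine:
            right[spine[-1]] = i
        spine.append(i)
    depth = [0] * n
    stack = [(spine[0], 0)] if spine else []
    while stack:  # iterative walk from the root, assigning depths
        v, d = stack.pop()
        depth[v] = d
        for child in (left[v], right[v]):
            if child is not None:
                stack.append((child, d + 1))
    return depth


def argmin(a, i, j):
    """Leftmost index of a minimum of a[i..j], by linear scan."""
    return min(range(i, j + 1), key=lambda k: a[k])


array = [8, 7, 2, 8, 6, 9, 4, 5]
depths = inorder_depths(array)
```

For every range, the index of the minimum in `array` and the index of the minimum in `depths` agree, which is the sense in which the two sequences behave the same.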
317 00:17:29,220 --> 00:17:30,780 You do this reduction, and now we 318 00:17:30,780 --> 00:17:33,870 can assume they're integers, nice small integers, and that 319 00:17:33,870 --> 00:17:36,510 will let us solve things in constant time using the Word 320 00:17:36,510 --> 00:17:38,040 RAM. 321 00:17:38,040 --> 00:17:42,250 So you don't need to assume that about the original values. 322 00:17:42,250 --> 00:17:44,110 Cool. 323 00:17:44,110 --> 00:17:46,160 So, time to actually solve something. 324 00:17:46,160 --> 00:17:47,335 We've done reductions. 325 00:17:47,335 --> 00:17:49,750 We now know RMQ and LCA are equivalent. 326 00:17:49,750 --> 00:17:50,740 Let's solve them both. 327 00:18:01,500 --> 00:18:06,410 Kind of like the last of the sorting we saw, 328 00:18:06,410 --> 00:18:08,062 there's going to be a lot of steps. 329 00:18:08,062 --> 00:18:09,270 They're not sequential steps. 330 00:18:09,270 --> 00:18:11,850 These are like different versions of a data structure 331 00:18:11,850 --> 00:18:14,040 for solving RMQ, and they're going 332 00:18:14,040 --> 00:18:17,540 to be getting progressively better and better. 333 00:18:17,540 --> 00:18:26,560 So LCA which applies RMQ. 334 00:18:30,450 --> 00:18:33,660 This is originally solved by Harel and Tarjan in 1984, 335 00:18:33,660 --> 00:18:35,899 but is rather complicated. 336 00:18:35,899 --> 00:18:37,440 And then what I'm going to talk about 337 00:18:37,440 --> 00:18:41,580 is a version from 2000 by Bender and Farach-Colton, 338 00:18:41,580 --> 00:18:44,830 same authors from the cache-oblivious B-trees. 339 00:18:44,830 --> 00:18:48,420 That's a much simpler presentation. 340 00:18:48,420 --> 00:18:54,630 So first step is I want to do this reduction again from LCA 341 00:18:54,630 --> 00:18:57,799 to RMQ, but slightly differently. 342 00:18:57,799 --> 00:19:00,090 And we're going to get a more restricted problem called 343 00:19:00,090 --> 00:19:01,530 plus or minus 1 RMQ. 
344 00:19:05,130 --> 00:19:07,140 What is plus or minus 1 RMQ? 345 00:19:07,140 --> 00:19:13,170 Just means that you get an array where all the adjacent values 346 00:19:13,170 --> 00:19:16,128 differ by plus or minus 1. 347 00:19:19,170 --> 00:19:21,540 And if you look at the numbers here, a lot of them 348 00:19:21,540 --> 00:19:23,540 differ by plus or minus 1. 349 00:19:23,540 --> 00:19:24,690 These all do. 350 00:19:24,690 --> 00:19:26,250 But then there are some big gaps-- 351 00:19:26,250 --> 00:19:29,440 like this has a gap of 3, this has a gap of 2. 352 00:19:29,440 --> 00:19:31,710 This is plus or minus 1. 353 00:19:31,710 --> 00:19:34,320 That's almost right, and if you just 354 00:19:34,320 --> 00:19:38,550 stare at this idea of tree walk enough, 355 00:19:38,550 --> 00:19:40,230 you'll realize a little trick to make 356 00:19:40,230 --> 00:19:44,370 the array a little bit bigger, but give you plus or minus 357 00:19:44,370 --> 00:19:45,360 ones. 358 00:19:45,360 --> 00:19:47,120 If you've done a lot of tree traversal, 359 00:19:47,120 --> 00:19:50,040 this will come quite naturally. 360 00:19:50,040 --> 00:19:53,530 This is a depth first search. 361 00:19:53,530 --> 00:19:55,460 This is the depth first search order 362 00:19:55,460 --> 00:19:58,210 of visiting a tree in order. 363 00:19:58,210 --> 00:20:00,660 This is usually called an Eulerian tour. 364 00:20:00,660 --> 00:20:04,450 A concept we'll come back to in a few lectures. 365 00:20:04,450 --> 00:20:08,070 But Euler tour just means you visit every edge twice, 366 00:20:08,070 --> 00:20:10,662 in this case. 367 00:20:10,662 --> 00:20:12,120 If you look at the node visits, I'm 368 00:20:12,120 --> 00:20:16,590 visiting this node here, here, and here, three times. 369 00:20:16,590 --> 00:20:19,110 But it's amortized constant, because every edge is just 370 00:20:19,110 --> 00:20:21,270 visited twice.
371 00:20:21,270 --> 00:20:23,790 What I'd like to do is follow an Euler tour 372 00:20:23,790 --> 00:20:29,490 and then write down all the nodes that I visit, 373 00:20:29,490 --> 00:20:31,710 but with repetition. 374 00:20:31,710 --> 00:20:41,010 So in that picture I will get 0, 1, 2, 1. 375 00:20:41,010 --> 00:20:48,450 I go 0, 1, 2, back to 1, back to 0, then over to the 1 376 00:20:48,450 --> 00:20:52,740 on the right, then to the 2, then to the 3, 377 00:20:52,740 --> 00:20:55,260 then back up to the 2, then down to the other 3, 378 00:20:55,260 --> 00:20:57,120 then back up to the 2, back up to the 1, 379 00:20:57,120 --> 00:21:00,690 back down to the last node on the right, 380 00:21:00,690 --> 00:21:03,040 and back up and back up. 381 00:21:03,040 --> 00:21:03,540 OK. 382 00:21:03,540 --> 00:21:06,300 This is what we call Euler tour. 383 00:21:06,300 --> 00:21:08,130 So multiple visits-- for example, here's 384 00:21:08,130 --> 00:21:11,670 all the places that the root is visited. 385 00:21:11,670 --> 00:21:16,050 Here's all the places that this node is visited, 386 00:21:16,050 --> 00:21:20,430 then this node is visited 3 times. 387 00:21:20,430 --> 00:21:23,890 It's going to be visited once per incident edge. 388 00:21:23,890 --> 00:21:26,840 I think you get the pattern. 389 00:21:26,840 --> 00:21:28,680 I'm just going to store this. 390 00:21:28,680 --> 00:21:30,911 And what else am I going to do? 391 00:21:30,911 --> 00:21:31,410 Let's see. 392 00:21:31,410 --> 00:21:41,610 Each node in the tree stores, let's say, 393 00:21:41,610 --> 00:21:45,450 the first visit in the array. 394 00:21:45,450 --> 00:21:47,086 Pretty sure this is enough. 395 00:21:47,086 --> 00:21:48,960 You could maybe store the last visit as well. 396 00:21:48,960 --> 00:21:51,790 We can only store a constant number of things. 397 00:21:51,790 --> 00:22:04,500 And I guess each array item stores a pointer 398 00:22:04,500 --> 00:22:06,120 to the corresponding node in the tree. 
399 00:22:12,340 --> 00:22:12,840 OK. 400 00:22:12,840 --> 00:22:15,960 So each instance of the 0 stores a pointer to the root, 401 00:22:15,960 --> 00:22:18,470 and so on. 402 00:22:18,470 --> 00:22:21,360 It's kind of what these horizontal bars are indicating, 403 00:22:21,360 --> 00:22:23,760 but those aren't actually stored. 404 00:22:23,760 --> 00:22:24,330 OK. 405 00:22:24,330 --> 00:22:32,390 So I claim RMQ in here is still the same as LCA over there. 406 00:22:32,390 --> 00:22:35,550 It's maybe a little more subtle, but now 407 00:22:35,550 --> 00:22:39,300 if I want to compute the LCA of two nodes, 408 00:22:39,300 --> 00:22:41,110 I look at their first occurrences. 409 00:22:41,110 --> 00:22:42,690 So let's do-- I don't know-- 410 00:22:42,690 --> 00:22:44,910 2 and 3. 411 00:22:44,910 --> 00:22:46,530 Here, this 2 and this 3. 412 00:22:46,530 --> 00:22:49,515 I didn't label them, but I happen to know where they are. 413 00:22:49,515 --> 00:22:51,360 2 is here, and it's the first 3. 414 00:22:54,150 --> 00:22:56,669 Now here, they happen to only occur once in the tour, 415 00:22:56,669 --> 00:22:57,710 so it's a little clearer. 416 00:22:57,710 --> 00:23:00,590 If I compute the RMQ, I get this 0, this 0, 417 00:23:00,590 --> 00:23:03,090 as opposed to the other 0s, but this 0 points to the root, 418 00:23:03,090 --> 00:23:05,400 so I get the LCA. 419 00:23:05,400 --> 00:23:07,570 Let's do ones that do not have unique occurrences. 420 00:23:07,570 --> 00:23:11,475 So like, this guy and this guy, the first 1 and the first 2. 421 00:23:11,475 --> 00:23:14,811 It'd be this 1 and this 1. 422 00:23:14,811 --> 00:23:16,560 In fact, I think any of the 2s would work. 423 00:23:16,560 --> 00:23:17,481 Doesn't really matter. 424 00:23:17,481 --> 00:23:18,730 Just have to pick one of them. 425 00:23:18,730 --> 00:23:21,150 So I picked the leftmost one for consistency. 426 00:23:21,150 --> 00:23:24,400 Then I take the RMQ, again I get 0.
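The Euler tour reduction described above can be sketched as follows. This is an illustration under assumptions: the tree is given as a hypothetical `children` dictionary, node labels such as `"8a"` and `"8b"` are made up to tell the two 8s of the example apart, the range minimum is found by a plain scan rather than a constant-time structure, and the recursive walk is only suitable for shallow trees.

```python
def euler_tour(children, root):
    """Walk the tree, recording every node visit with repetition.

    Returns (tour, depths, first): the visited nodes, the depth of each
    visit, and for each node the index of its first visit.
    """
    tour, depths, first = [], [], {}

    def visit(v, d):
        first.setdefault(v, len(tour))
        tour.append(v)
        depths.append(d)
        for c in children.get(v, []):
            visit(c, d + 1)
            tour.append(v)   # coming back up to v after each child
            depths.append(d)

    visit(root, 0)
    return tour, depths, first


def lca(tour, depths, first, x, y):
    """LCA via a range minimum over depths between first occurrences."""
    i, j = sorted((first[x], first[y]))
    k = min(range(i, j + 1), key=lambda t: depths[t])  # linear-scan RMQ
    return tour[k]  # each array item points back to its tree node


# The Cartesian tree from the running example, labeled by value.
children = {"2": ["7", "4"], "7": ["8a"], "4": ["6", "5"], "6": ["8b", "9"]}
tour, depths, first = euler_tour(children, "2")
```

Adjacent entries of `depths` always differ by exactly 1, since each step of the tour moves along one edge, and the array has 2n - 1 entries for a tree with n nodes.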
427 00:23:24,400 --> 00:23:25,960 You can test that for all of them. 428 00:23:25,960 --> 00:23:27,610 I think the slightly more subtle case 429 00:23:27,610 --> 00:23:30,140 is when one node is an ancestor of another. 430 00:23:30,140 --> 00:23:35,060 So let's do that, 1 here and 3 there. 431 00:23:35,060 --> 00:23:37,390 I think here you do need to be leftmost or rightmost, 432 00:23:37,390 --> 00:23:38,920 consistently. 433 00:23:38,920 --> 00:23:43,330 So I take the 1 and I take the second 3. 434 00:23:43,330 --> 00:23:43,830 OK. 435 00:23:43,830 --> 00:23:48,820 I take the RMQ of that, I get 1 which is the higher of the two. 436 00:23:48,820 --> 00:23:49,320 OK. 437 00:23:49,320 --> 00:23:50,959 So it seems to work. 438 00:23:50,959 --> 00:23:53,500 Actually, I think it would work no matter which guy you pick. 439 00:23:53,500 --> 00:23:56,390 I just picked the first one. 440 00:23:56,390 --> 00:23:58,872 OK, no big deal. 441 00:23:58,872 --> 00:24:01,330 You're not going to see why this is useful for a little bit 442 00:24:01,330 --> 00:24:05,080 until step 4 or something, but we've slightly 443 00:24:05,080 --> 00:24:08,110 simplified our problem to this plus or minus 1 RMQ. 444 00:24:08,110 --> 00:24:11,300 Otherwise identical to this in-order traversal. 445 00:24:11,300 --> 00:24:13,940 So not a big deal, but we'll need it later. 446 00:24:16,691 --> 00:24:17,190 OK. 447 00:24:34,476 --> 00:24:35,350 That was a reduction. 448 00:24:35,350 --> 00:24:38,140 Next, we're finally going to actually solve something. 449 00:24:38,140 --> 00:24:47,920 I'm going to do constant time, n log n space, RMQ. 450 00:24:47,920 --> 00:24:50,910 This data structure will not require plus or minus 1 RMQ. 451 00:24:50,910 --> 00:24:52,660 It works for any RMQ. 452 00:24:52,660 --> 00:24:54,560 It's actually a very simple idea, 453 00:24:54,560 --> 00:24:55,790 and it's almost what we need. 454 00:24:55,790 --> 00:24:58,040 But we're going to have to get rid of this log factor. 
455 00:24:58,040 --> 00:25:00,050 That will be step 3. 456 00:25:00,050 --> 00:25:01,240 OK, so here's the idea. 457 00:25:01,240 --> 00:25:04,000 You've got an array. 458 00:25:04,000 --> 00:25:06,670 And now someone gives you an arbitrary interval 459 00:25:06,670 --> 00:25:08,725 from here to here. 460 00:25:11,260 --> 00:25:14,290 Ideally, I just store the mins for every possible interval, 461 00:25:14,290 --> 00:25:15,955 but there's n squared intervals. 462 00:25:15,955 --> 00:25:20,830 So instead, what I'm going to do is store the answer 463 00:25:20,830 --> 00:25:24,880 not for all the intervals, but for all intervals of length 464 00:25:24,880 --> 00:25:27,286 of power of 2. 465 00:25:27,286 --> 00:25:29,470 It's a trick you've probably seen before. 466 00:25:32,500 --> 00:25:34,185 This is the easy thing to do. 467 00:25:34,185 --> 00:25:35,560 And then the interesting thing is 468 00:25:35,560 --> 00:25:37,920 how you make it actually get down to linear space. 469 00:25:40,570 --> 00:25:44,260 Length, power of 2. 470 00:25:46,540 --> 00:25:47,040 OK. 471 00:25:47,040 --> 00:25:50,230 There are only log n possible powers of 2. 472 00:25:50,230 --> 00:25:53,200 There's still n different start points for those intervals, 473 00:25:53,200 --> 00:25:55,720 so total number of intervals is n log n. 474 00:25:55,720 --> 00:25:58,857 So this is n log n space, because I'm storing 475 00:25:58,857 --> 00:25:59,940 an index for each of them. 476 00:26:02,520 --> 00:26:04,370 OK. 477 00:26:04,370 --> 00:26:07,520 And then if I have an arbitrary query, the point is-- 478 00:26:07,520 --> 00:26:10,280 let's call it length k-- 479 00:26:10,280 --> 00:26:14,300 then I can cover it by two intervals 480 00:26:14,300 --> 00:26:16,790 of length a power of 2. 481 00:26:16,790 --> 00:26:18,500 They will be the same length. 482 00:26:18,500 --> 00:26:21,980 They will be length 2 to the floor 483 00:26:21,980 --> 00:26:26,176 of log k, the next smaller power of 2 below k. 
484 00:26:26,176 --> 00:26:27,800 Maybe k is a power of 2, in which case, 485 00:26:27,800 --> 00:26:30,800 it's just one interval or two equal intervals. 486 00:26:30,800 --> 00:26:33,500 But in general, you just take the next smaller power of 2. 487 00:26:33,500 --> 00:26:37,700 That will cover more than half of the thing, of the interval. 488 00:26:37,700 --> 00:26:39,479 And so you have one that's left aligned, 489 00:26:39,479 --> 00:26:40,520 one that's right aligned. 490 00:26:40,520 --> 00:26:42,320 Together, those will cover everything. 491 00:26:42,320 --> 00:26:45,680 And because the min operation has this nifty feature 492 00:26:45,680 --> 00:26:48,200 that you can take the min of all these, min of all these, 493 00:26:48,200 --> 00:26:49,490 take the min of the two. 494 00:26:49,490 --> 00:26:51,530 You will get the min overall. 495 00:26:51,530 --> 00:26:54,090 It doesn't hurt to have duplicate entries. 496 00:26:54,090 --> 00:26:58,514 That's kind of an important property of min. 497 00:26:58,514 --> 00:26:59,930 It holds for other operations too, 498 00:26:59,930 --> 00:27:02,840 like max, but not everything. 499 00:27:02,840 --> 00:27:05,019 Then boom, we've solved RMQ. 500 00:27:05,019 --> 00:27:05,810 I think it's clear. 501 00:27:05,810 --> 00:27:08,514 You do two queries, take the min of the two-- 502 00:27:08,514 --> 00:27:10,305 actually, you have to store the arg mins. 503 00:27:10,305 --> 00:27:14,660 So it's a little more work, but constant time. 504 00:27:14,660 --> 00:27:15,420 Cool. 505 00:27:15,420 --> 00:27:16,070 That was easy. 506 00:27:22,070 --> 00:27:23,190 Leave LCA up there. 507 00:27:30,180 --> 00:27:30,790 OK. 508 00:27:30,790 --> 00:27:32,020 So we're almost there, right. 509 00:27:32,020 --> 00:27:34,420 Just a log factor off. 510 00:27:34,420 --> 00:27:38,230 So what technique do we have for shaving log factors? 511 00:27:40,880 --> 00:27:44,607 Indirection, yeah, our good friend indirection.
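The power-of-2 table just described can be sketched in a few lines of Python. This is a minimal sketch, not code from the lecture: the names `build_sparse_table` and `rmq` are made up for illustration, indices are 0-based, and ties are broken toward the leftmost minimum as an arbitrary choice. The table stores arg mins (indices), as the lecture asks, and a query combines one left-aligned and one right-aligned interval of length 2 to the floor of log k.

```python
def build_sparse_table(a):
    """table[p][i] = index of the minimum of a[i : i + 2**p] (leftmost on ties)."""
    n = len(a)
    table = [list(range(n))]          # intervals of length 1
    p = 1
    while (1 << p) <= n:
        prev = table[p - 1]
        half = 1 << (p - 1)
        row = []
        for i in range(n - (1 << p) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        p += 1
    return table


def rmq(a, table, i, j):
    """Index of the minimum of a[i..j], inclusive, assuming 0 <= i <= j < len(a)."""
    k = j - i + 1                      # length of the query interval
    p = k.bit_length() - 1             # floor(log2 k)
    left = table[p][i]                 # left-aligned interval of length 2**p
    right = table[p][j - (1 << p) + 1] # right-aligned interval of length 2**p
    return left if a[left] <= a[right] else right
```

The two intervals may overlap, which is harmless for min. The table has about log n rows of at most n entries each, which is the n log n space bound.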
512 00:27:47,470 --> 00:27:50,140 Indirection comes to our rescue yet again, 513 00:27:50,140 --> 00:27:52,150 but we won't be done. 514 00:27:52,150 --> 00:27:54,280 The idea is, well, we want to remove a log factor. 515 00:27:54,280 --> 00:27:56,849 Before we removed log factors from time, 516 00:27:56,849 --> 00:27:58,390 but there's no real time here, right. 517 00:27:58,390 --> 00:27:59,890 Everything's constant time. 518 00:27:59,890 --> 00:28:02,770 But we can use indirection to shave a log factor in space, 519 00:28:02,770 --> 00:28:03,790 too. 520 00:28:03,790 --> 00:28:07,120 Let's just divide. 521 00:28:07,120 --> 00:28:11,980 So this is again for RMQ. 522 00:28:11,980 --> 00:28:18,010 So I have an array, I'm going to divide the array into groups 523 00:28:18,010 --> 00:28:23,130 of size, I believe 1/2 log n would 524 00:28:23,130 --> 00:28:24,940 be the right magic number. 525 00:28:24,940 --> 00:28:27,580 It's going to be theta log n, but I need a specific constant 526 00:28:27,580 --> 00:28:28,390 for step 4. 527 00:28:30,910 --> 00:28:33,610 So what does that mean? 528 00:28:33,610 --> 00:28:37,820 I have the first 1/2 log n entries in the array. 529 00:28:37,820 --> 00:28:41,530 Then I have the next 1/2 log n entries, 530 00:28:41,530 --> 00:28:47,200 and then I have the last 1/2 log n entries. 531 00:28:47,200 --> 00:28:50,560 OK, that's easy enough. 532 00:28:50,560 --> 00:28:53,210 But now I'd like to tie all these structures together. 533 00:28:53,210 --> 00:28:58,420 A natural way to do that is with a big structure on top of size, 534 00:28:58,420 --> 00:29:03,670 n over log n, I guess with a factor 2 out here. 535 00:29:03,670 --> 00:29:07,210 n over 1/2 log n. 536 00:29:07,210 --> 00:29:08,240 How do I do that? 537 00:29:08,240 --> 00:29:10,840 Well, this is an RMQ problem, so the natural thing to do 538 00:29:10,840 --> 00:29:13,400 is just take the min of everything here.
539 00:29:13,400 --> 00:29:16,480 So the red here is going to denote taking the min, 540 00:29:16,480 --> 00:29:18,880 and take that-- the one item that results by taking 541 00:29:18,880 --> 00:29:22,520 the min in that group, and promoting it to the next level. 542 00:29:22,520 --> 00:29:25,250 This is a static thing we do ahead of time. 543 00:29:25,250 --> 00:29:28,120 Now if I'm given a query, like say, 544 00:29:28,120 --> 00:29:31,900 this interval, what I need to do is first 545 00:29:31,900 --> 00:29:37,300 compute the min in this range within a bottom structure. 546 00:29:37,300 --> 00:29:39,190 Maybe also compute the min within this range, 547 00:29:39,190 --> 00:29:41,590 the last bottom structure, and then these guys 548 00:29:41,590 --> 00:29:42,820 are all taken in entirety. 549 00:29:42,820 --> 00:29:46,330 So I can just take the corresponding interval up here 550 00:29:46,330 --> 00:29:48,370 and that will give me simultaneously the mins 551 00:29:48,370 --> 00:29:49,660 of everything below. 552 00:29:49,660 --> 00:30:01,840 So now a query is going to be the min of two bottoms and one 553 00:30:01,840 --> 00:30:03,550 top. 554 00:30:03,550 --> 00:30:06,550 In other words, I do one top RMQ query for everything 555 00:30:06,550 --> 00:30:09,760 between, strictly between the two ends. 556 00:30:09,760 --> 00:30:12,520 Then I do a bottom query for the one end, a bottom query 557 00:30:12,520 --> 00:30:13,689 for the other end. 558 00:30:13,689 --> 00:30:15,730 Take the min of all those values and really, it's 559 00:30:15,730 --> 00:30:18,380 the arg min, but. 560 00:30:18,380 --> 00:30:18,880 Clear? 561 00:30:18,880 --> 00:30:20,530 So it would be constant time if I 562 00:30:20,530 --> 00:30:22,030 can do bottom in constant time, if I 563 00:30:22,030 --> 00:30:24,040 can do top in constant time. 564 00:30:24,040 --> 00:30:26,890 But the big win is that this top structure only has 565 00:30:26,890 --> 00:30:29,050 to store n over log n items. 
566 00:30:29,050 --> 00:30:32,830 So I can afford an n log n space data structure, 567 00:30:32,830 --> 00:30:34,930 because the logs cancel. 568 00:30:34,930 --> 00:30:38,800 So I'm going to use structure 2 for the top. 569 00:30:38,800 --> 00:30:42,440 That will give me constant time up here, linear space. 570 00:30:42,440 --> 00:30:46,240 So all that's left is to solve the bottoms individually. 571 00:30:46,240 --> 00:30:48,310 Again, similar kind of structure to [INAUDIBLE].. 572 00:30:48,310 --> 00:30:51,160 We have a summary structure and we have the details down below. 573 00:30:51,160 --> 00:30:52,900 But the parameters are way out of whack. 574 00:30:52,900 --> 00:30:54,506 It's no longer root n, root n. 575 00:30:54,506 --> 00:30:56,380 Now these guys are super tiny because we only 576 00:30:56,380 --> 00:30:59,410 needed this to be a little bit smaller than n, 577 00:30:59,410 --> 00:31:03,010 and then this would work out to linear space. 578 00:31:03,010 --> 00:31:03,940 OK. 579 00:31:03,940 --> 00:31:08,176 So step 4 is going to be how do we solve the bottom structures. 580 00:31:20,326 --> 00:31:23,310 So step 4. 581 00:31:23,310 --> 00:31:33,830 This is where we're going to use technique of lookup tables 582 00:31:33,830 --> 00:31:35,550 for bottom groups. 583 00:31:37,965 --> 00:31:39,840 This is going to be slightly weird to phrase, 584 00:31:39,840 --> 00:31:41,756 because on the one hand, I want to be thinking 585 00:31:41,756 --> 00:31:44,490 about an individual group, but my solution is actually 586 00:31:44,490 --> 00:31:46,260 going to solve all groups simultaneously, 587 00:31:46,260 --> 00:31:47,385 and it's kind of important. 588 00:31:47,385 --> 00:31:51,990 But for now, let's just think of one group. 589 00:31:51,990 --> 00:31:58,065 So it has size n prime and n prime is 1/2 log n. 
590 00:31:58,065 --> 00:31:59,440 I need to remember how it relates 591 00:31:59,440 --> 00:32:02,920 to the original value of n so I know how to pay for things. 592 00:32:02,920 --> 00:32:06,630 The idea is there's really not many different problems 593 00:32:06,630 --> 00:32:08,239 of size 1/2 log n. 594 00:32:08,239 --> 00:32:10,530 And here's where we're going to use the fact that we're 595 00:32:10,530 --> 00:32:14,110 in plus or minus 1 land. 596 00:32:14,110 --> 00:32:17,970 We have this giant string of integers. 597 00:32:17,970 --> 00:32:19,720 Well, now we're looking at log n of them 598 00:32:19,720 --> 00:32:25,590 to say OK, this here, this is a sequence 0, 1, 2, 3. 599 00:32:25,590 --> 00:32:28,200 Over here a 0, 1, 2, 1. 600 00:32:28,200 --> 00:32:29,790 There's all these different things. 601 00:32:29,790 --> 00:32:31,815 Then there's other things like 2, 3, 2, 3. 602 00:32:34,350 --> 00:32:37,000 So there's a couple annoying things. 603 00:32:37,000 --> 00:32:40,800 One is it matters what value you start at, a b, 604 00:32:40,800 --> 00:32:43,260 and then it matters what the sequence of plus and minus 1s 605 00:32:43,260 --> 00:32:45,940 are after that. 606 00:32:45,940 --> 00:32:46,440 OK. 607 00:32:46,440 --> 00:32:49,710 I claim it doesn't really matter what value you start at, 608 00:32:49,710 --> 00:33:04,110 because RMQ, this query, is invariant under adding 609 00:33:04,110 --> 00:33:10,720 some value x to all entries, all values, in the array. 610 00:33:10,720 --> 00:33:13,590 Or if I add 100 to every value, then the minimums 611 00:33:13,590 --> 00:33:15,690 stay the same in position. 612 00:33:15,690 --> 00:33:18,220 So again, here I'm thinking of RMQ as an arg min. 613 00:33:18,220 --> 00:33:21,600 So it's giving just the index of where it lives. 614 00:33:21,600 --> 00:33:27,030 So in particular, I'm going to add minus the first value 615 00:33:27,030 --> 00:33:29,370 of the array to all values. 
616 00:33:33,210 --> 00:33:34,790 I should probably call this-- 617 00:33:34,790 --> 00:33:36,720 well, yeah. 618 00:33:36,720 --> 00:33:39,580 Here I'm just thinking about a single group for now. 619 00:33:39,580 --> 00:33:42,570 So in a single group, saying well, it starts at some value. 620 00:33:42,570 --> 00:33:44,760 I'm just going to decrease all these things 621 00:33:44,760 --> 00:33:46,345 by whatever that value is. 622 00:33:46,345 --> 00:33:47,970 Now some of them might become negative, 623 00:33:47,970 --> 00:33:50,730 but at least now we start with a 0. 624 00:33:50,730 --> 00:33:55,180 So what we start with is irrelevant. 625 00:33:55,180 --> 00:33:58,020 What remains, the remaining numbers here are completely 626 00:33:58,020 --> 00:34:01,770 defined by the gaps between or the diffs 627 00:34:01,770 --> 00:34:04,320 between consecutive items, and the diffs 628 00:34:04,320 --> 00:34:06,555 are all plus or minus 1. 629 00:34:06,555 --> 00:34:19,469 So now the number of possible arrays in a group, so 630 00:34:19,469 --> 00:34:22,980 in a single group, is equal to the number 631 00:34:22,980 --> 00:34:32,070 of plus or minus 1 strings of length n prime, which is 632 00:34:32,070 --> 00:34:32,610 1/2 log n. 633 00:34:37,120 --> 00:34:39,600 And the number of plus or minus 1 strings of length 634 00:34:39,600 --> 00:34:42,670 n prime is 2 to the n prime. 635 00:34:42,670 --> 00:34:48,150 So we get 2 to the 1/2 log n, also known as square root of n. 636 00:34:48,150 --> 00:34:49,530 Square root of n is small. 637 00:34:49,530 --> 00:34:51,219 We're aiming for linear space. 638 00:34:51,219 --> 00:34:53,730 This means that for every-- 639 00:34:53,730 --> 00:34:56,730 not only for every group, there is n over log n groups-- 640 00:34:56,730 --> 00:34:59,140 but actually many of the groups have to be the same. 641 00:34:59,140 --> 00:35:02,220 There's n over log n groups, but there's only root n 642 00:35:02,220 --> 00:35:04,450 different types of groups.
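To make the "group type" idea concrete, here is a small Python sketch. It is an illustration under assumptions, not the lecture's code: `block_type` and `argmin_range` are hypothetical helper names, and the encoding (bit t set when step t is +1) is one arbitrary choice among many. It shows that two groups with the same plus or minus 1 pattern but different starting values get the same type and the same arg min for every query.

```python
def block_type(block):
    """Encode a +-1 block as an integer: bit t is 1 iff block[t+1] - block[t] == +1.

    The starting value is ignored, which is the 'subtract the first value' step.
    """
    code = 0
    for t in range(len(block) - 1):
        step = block[t + 1] - block[t]
        if step not in (1, -1):
            raise ValueError("not a +-1 block")
        if step == 1:
            code |= 1 << t
    return code


def argmin_range(block, i, j):
    """Leftmost index of the minimum of block[i..j], inclusive (plain scan)."""
    return min(range(i, j + 1), key=block.__getitem__)


# Same +-1 pattern, different starting values: same type, same answers.
g1 = [0, 1, 2, 1]
g2 = [5, 6, 7, 6]
same_type = block_type(g1) == block_type(g2)
```

A group of n prime entries has only n prime minus 1 differences, so the count of types is at most 2 to the n prime, consistent with the square root of n bound above.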
643 00:35:04,450 --> 00:35:09,090 So on average, like root n over log n occurrences of each. 644 00:35:09,090 --> 00:35:12,494 So we can kind of compress things down and say hey, 645 00:35:12,494 --> 00:35:14,160 I would like to just like store a lookup 646 00:35:14,160 --> 00:35:16,985 table for each one of these, but that would be quadratic space. 647 00:35:16,985 --> 00:35:19,360 But there's really only square root of n different types. 648 00:35:19,360 --> 00:35:22,410 So if I use a layer of indirection, I guess-- 649 00:35:22,410 --> 00:35:24,360 different sort of indirection-- if I just 650 00:35:24,360 --> 00:35:26,010 have, for each of these groups, I just 651 00:35:26,010 --> 00:35:29,040 store a pointer to the type of group, which 652 00:35:29,040 --> 00:35:31,980 is what the plus or minus 1 string is, 653 00:35:31,980 --> 00:35:34,200 and then for that type, I store a lookup table 654 00:35:34,200 --> 00:35:35,730 of all possibilities. 655 00:35:35,730 --> 00:35:36,900 That will be efficient. 656 00:35:36,900 --> 00:35:41,200 Let me show that to you. 657 00:35:41,200 --> 00:35:43,390 This is a very handy idea. 658 00:35:43,390 --> 00:35:47,960 In general, if you have a lot of things of size roughly log n, 659 00:35:47,960 --> 00:35:50,960 lookup tables are a good idea. 660 00:35:54,520 --> 00:35:56,860 And this naturally arises when you're using indirection, 661 00:35:56,860 --> 00:36:00,340 because usually you just need to shave a log or two. 662 00:36:00,340 --> 00:36:03,610 So here we have these different types. 663 00:36:03,610 --> 00:36:13,510 So what we're going to do is store a lookup table that 664 00:36:13,510 --> 00:36:16,810 says for each group type, I'll just 665 00:36:16,810 --> 00:36:24,130 say a lookup table of all answers, 666 00:36:24,130 --> 00:36:26,770 do that for each group type. 667 00:36:31,012 --> 00:36:32,970 Group type, meaning the plus or minus 1 string. 
668 00:36:32,970 --> 00:36:34,630 It's really what is in that group 669 00:36:34,630 --> 00:36:36,831 after you do this shifting. 670 00:36:36,831 --> 00:36:37,330 OK. 671 00:36:37,330 --> 00:36:39,038 Now there's square root of n group types. 672 00:36:41,620 --> 00:36:43,390 What does it take to store the answers? 673 00:36:43,390 --> 00:36:50,680 Well, there is, I guess, 1/2 log n squared different queries, 674 00:36:50,680 --> 00:36:53,157 because n prime is 1/2 log n, and a query 675 00:36:53,157 --> 00:36:54,490 is defined by the two endpoints. 676 00:36:54,490 --> 00:36:56,650 So there's at most this many queries. 677 00:36:56,650 --> 00:37:00,820 Each query, to store the answer, is going to take order log log 678 00:37:00,820 --> 00:37:03,910 n bits-- this is if you're fancy-- 679 00:37:03,910 --> 00:37:07,890 because the answer is an index into that array of size 1/2 log 680 00:37:07,890 --> 00:37:10,840 n, so you need log log n bits to write down that. 681 00:37:10,840 --> 00:37:14,110 So the total size of this lookup table 682 00:37:14,110 --> 00:37:16,720 is the product of these things. 683 00:37:16,720 --> 00:37:19,120 We have to write root n lookup tables. 684 00:37:19,120 --> 00:37:24,790 Each stores log squared n different values, 685 00:37:24,790 --> 00:37:28,310 and the values require log log n bits. 686 00:37:28,310 --> 00:37:31,240 So total number of bits is this thing, 687 00:37:31,240 --> 00:37:34,390 and this thing is little o of n. 688 00:37:34,390 --> 00:37:37,480 So smaller than linear, so it's irrelevant. 689 00:37:37,480 --> 00:37:39,700 Can store for free. 690 00:37:39,700 --> 00:37:42,700 Now if we have a bottom group, the one thing we need to do 691 00:37:42,700 --> 00:37:45,010 is store a pointer from that bottom group 692 00:37:45,010 --> 00:37:50,380 to the corresponding section of the lookup table for that group 693 00:37:50,380 --> 00:37:51,976 type.
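Written out, the space bound for the lookup tables is the product of the three factors just named. The display below is a reconstruction of what "this thing" refers to, assuming logs are base 2 and constants are absorbed into the O:

```latex
\[
\underbrace{\sqrt{n}}_{\text{group types}}
\cdot
\underbrace{\left(\tfrac{1}{2}\log n\right)^{2}}_{\text{queries per type}}
\cdot
\underbrace{O(\log\log n)}_{\text{bits per answer}}
\;=\; O\!\left(\sqrt{n}\,\log^{2} n\,\log\log n\right)
\;=\; o(n)\ \text{bits}.
\]
```

The square root of n factor dominates: any fixed power of log n grows more slowly than the remaining square root of n needed to reach n.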
694 00:37:51,976 --> 00:38:05,140 So each group stores a pointer into lookup table. 695 00:38:08,290 --> 00:38:11,050 I'm of two minds whether I think of this as a single lookup 696 00:38:11,050 --> 00:38:13,960 table that's parameterized first by group type, 697 00:38:13,960 --> 00:38:15,444 and then by the query. 698 00:38:15,444 --> 00:38:17,860 So it's like a two-dimensional table or three-dimensional, 699 00:38:17,860 --> 00:38:18,924 depending how you count. 700 00:38:18,924 --> 00:38:21,340 Or you can think of there being several lookup tables, one 701 00:38:21,340 --> 00:38:22,360 for each group type, and then you're 702 00:38:22,360 --> 00:38:23,810 pointing to a single lookup table. 703 00:38:23,810 --> 00:38:26,050 However you want to think about it, same thing. 704 00:38:26,050 --> 00:38:29,260 Same difference, as they say. 705 00:38:29,260 --> 00:38:31,276 This gives us linear space. 706 00:38:31,276 --> 00:38:32,650 These pointers take linear space. 707 00:38:32,650 --> 00:38:34,733 The top structure takes linear space, a linear number 708 00:38:34,733 --> 00:38:38,650 of words, and constant query time, 709 00:38:38,650 --> 00:38:40,750 because lookup tables are very fast. 710 00:38:40,750 --> 00:38:42,070 Just look into them. 711 00:38:42,070 --> 00:38:43,360 They give you the answer. 712 00:38:43,360 --> 00:38:46,820 So you can do a lookup table here, lookup table here. 713 00:38:46,820 --> 00:38:50,290 And then over here, you do the covering 714 00:38:50,290 --> 00:38:52,990 by two power-of-2 intervals. 715 00:38:52,990 --> 00:38:55,567 Again, we have a lookup table for those intervals, 716 00:38:55,567 --> 00:38:57,400 so it's like we're looking into four tables, 717 00:38:57,400 --> 00:39:00,610 take the min of them all, done. 718 00:39:00,610 --> 00:39:04,180 That is RMQ, and also LCA.
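The two-level query (two bottoms plus one top) can be sketched as follows. This is a simplified illustration with assumptions stated up front: the names `build_two_level`, `bottom`, and `query` are made up, the input array is assumed non-empty, and the bottom query is a plain scan over at most one group rather than the per-type lookup table from step 4. That keeps the sketch short, but it means each bottom query costs time proportional to the group size instead of constant time.

```python
import math


def build_two_level(a):
    """Return (b, table): group size b ~ (1/2) log n and a power-of-2 table over group minima."""
    n = len(a)                                   # assumed n >= 1
    b = max(1, int(math.log2(n)) // 2)           # group size, roughly (1/2) log2 n
    # Top array: for each group, the index (into a) of its minimum.
    tops = [min(range(s, min(s + b, n)), key=a.__getitem__) for s in range(0, n, b)]
    table = [tops]                               # table[p][g] covers groups g .. g + 2**p - 1
    p = 1
    while (1 << p) <= len(tops):
        prev, half = table[-1], 1 << (p - 1)
        table.append([min(prev[g], prev[g + half], key=a.__getitem__)
                      for g in range(len(tops) - (1 << p) + 1)])
        p += 1
    return b, table


def bottom(a, lo, hi):
    """Stand-in for the step-4 lookup table: scan a[lo..hi] inside one group."""
    return min(range(lo, hi + 1), key=a.__getitem__)


def query(a, b, table, i, j):
    """Index of the minimum of a[i..j], inclusive, assuming 0 <= i <= j < len(a)."""
    gi, gj = i // b, j // b
    if gi == gj:                                 # both ends in the same group
        return bottom(a, i, j)
    cands = [bottom(a, i, (gi + 1) * b - 1),     # tail of the first group
             bottom(a, gj * b, j)]               # head of the last group
    lo, hi = gi + 1, gj - 1                      # groups strictly between the two ends
    if lo <= hi:
        p = (hi - lo + 1).bit_length() - 1
        cands += [table[p][lo], table[p][hi - (1 << p) + 1]]
    return min(cands, key=a.__getitem__)
```

With the lookup tables in place of `bottom`, the query reads four table entries and takes the min, matching the "four tables" count in the lecture.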
719 00:39:04,180 --> 00:39:07,690 Actually it was really LCA that we solved, because we solved 720 00:39:07,690 --> 00:39:09,940 plus or minus 1 RMQ, which solved 721 00:39:09,940 --> 00:39:15,910 LCA, but by the Cartesian tree reduction, 722 00:39:15,910 --> 00:39:16,820 that also solves RMQ. 723 00:39:19,570 --> 00:39:22,870 Now we solved 2 out of 3 of our problems. 724 00:39:22,870 --> 00:39:23,620 Any questions? 725 00:39:26,750 --> 00:39:31,490 Level ancestors are going to be harder, little bit harder. 726 00:39:31,490 --> 00:39:33,350 Similar number of steps. 727 00:39:33,350 --> 00:39:35,330 I'd say they're a little more clever. 728 00:39:35,330 --> 00:39:36,830 This I feel is pretty easy. 729 00:39:36,830 --> 00:39:39,560 Very simple style of indirection, very simple style 730 00:39:39,560 --> 00:39:40,979 of enumeration here. 731 00:39:40,979 --> 00:39:43,520 It's going to be a little more sophisticated and a little bit 732 00:39:43,520 --> 00:39:48,110 more representative of the general case for level 733 00:39:48,110 --> 00:39:48,760 ancestors. 734 00:39:52,349 --> 00:39:53,140 Definitely fancier. 735 00:39:56,620 --> 00:40:01,420 Level ancestors is a similar story we solved a while ago, 736 00:40:01,420 --> 00:40:03,170 but it was kind of a complicated solution. 737 00:40:03,170 --> 00:40:05,680 And then Bender and Farach-Colton 738 00:40:05,680 --> 00:40:08,770 found it and said hey, we can simplify this. 739 00:40:08,770 --> 00:40:12,700 And I'm going to give you the simplified version. 740 00:40:12,700 --> 00:40:15,640 So this is level ancestors. 741 00:40:15,640 --> 00:40:17,860 Says originally solved by Berkman and Vishkin 742 00:40:17,860 --> 00:40:21,010 in 1994, OK, not so long ago. 743 00:40:21,010 --> 00:40:25,690 And then the new version is from 2004. 744 00:40:25,690 --> 00:40:26,800 Ready? 745 00:40:26,800 --> 00:40:27,580 Level ancestors. 746 00:40:27,580 --> 00:40:29,710 What was the problem again? 
747 00:40:29,710 --> 00:40:30,580 Here it is. 748 00:40:30,580 --> 00:40:34,150 I gave you a rooted tree, give you a node, 749 00:40:34,150 --> 00:40:39,620 and a level that I want to go up, and then I level up by k, 750 00:40:39,620 --> 00:40:45,470 so I go to the kth ancestor, or parent to the k. 751 00:40:45,470 --> 00:40:47,650 This may seem superficially like LCA, 752 00:40:47,650 --> 00:40:50,470 but it's very different, because as you can see, 753 00:40:50,470 --> 00:40:52,540 RMQ was very specific to LCA. 754 00:40:52,540 --> 00:40:56,200 It's not going to let you solve level ancestors in any sense. 755 00:40:56,200 --> 00:40:57,049 I don't think. 756 00:40:57,049 --> 00:40:59,340 Maybe you could try to do the Cartesian tree reduction, 757 00:40:59,340 --> 00:41:03,520 but solution we'll see is completely different, 758 00:41:03,520 --> 00:41:05,860 although similar in spirit. 759 00:41:05,860 --> 00:41:09,564 So step 1. 760 00:41:09,564 --> 00:41:11,230 This one's going to be a little bit less 761 00:41:11,230 --> 00:41:12,980 obvious that we will succeed. 762 00:41:12,980 --> 00:41:15,370 Here we started with n log n space which 763 00:41:15,370 --> 00:41:17,217 is shaving a log, no big deal. 764 00:41:17,217 --> 00:41:19,300 Here, I'm going to give you a couple of strategies 765 00:41:19,300 --> 00:41:22,690 that aren't even constant time, they're log time or worse. 766 00:41:22,690 --> 00:41:25,951 And yet you combine them and you get constant time. 767 00:41:25,951 --> 00:41:26,770 It's crazy. 768 00:41:31,490 --> 00:41:33,880 Again, each of the pieces is going 769 00:41:33,880 --> 00:41:42,274 to be pretty intuitive, not super surprising, 770 00:41:42,274 --> 00:41:43,690 but it's one of these things where 771 00:41:43,690 --> 00:41:46,570 you take all these ingredients that are all kind of obvious, 772 00:41:46,570 --> 00:41:49,209 you stare at them for a while like, oh, I put them together 773 00:41:49,209 --> 00:41:49,750 and it works. 
774 00:41:49,750 --> 00:41:51,490 It's like magic. 775 00:41:51,490 --> 00:41:54,820 All right, so first goal is going to be n log n space, 776 00:41:54,820 --> 00:41:56,726 log n query. 777 00:41:56,726 --> 00:41:59,350 So here's a way to do it with a technique called jump pointers. 778 00:42:07,510 --> 00:42:11,290 In this case, nodes are going to have log n different pointers, 779 00:42:11,290 --> 00:42:12,790 and they're going to point to the 2 780 00:42:12,790 --> 00:42:16,900 to the ith ancestor for all i. 781 00:42:20,281 --> 00:42:22,870 I guess maximum possible i would be log n. 782 00:42:22,870 --> 00:42:26,230 You can never go up more than n. 783 00:42:26,230 --> 00:42:29,150 So I mean, ideally you'd have a pointer to all your ancestors 784 00:42:29,150 --> 00:42:30,584 in array, boom. 785 00:42:30,584 --> 00:42:32,500 In the quadratic space, you solve your problem 786 00:42:32,500 --> 00:42:33,970 in constant time. 787 00:42:33,970 --> 00:42:35,470 But it's a little more interesting. 788 00:42:35,470 --> 00:42:39,430 Now every node only has pointers to log n different places 789 00:42:39,430 --> 00:42:44,212 so it's looking like this. 790 00:42:44,212 --> 00:42:47,950 This is the ancestor path. 791 00:42:47,950 --> 00:42:50,380 So n log n space, and I claim with this, 792 00:42:50,380 --> 00:42:54,410 you can roughly do a binary search, if you wanted to. 793 00:42:54,410 --> 00:42:56,920 Now we're not actually going to use this query algorithm 794 00:42:56,920 --> 00:42:59,930 for anything, but I'll write it down just 795 00:42:59,930 --> 00:43:02,200 so it feels like we've accomplished something, namely 796 00:43:02,200 --> 00:43:04,030 log n query time. 797 00:43:04,030 --> 00:43:07,390 So what do I do? 798 00:43:07,390 --> 00:43:15,100 I set x to be the 2 to the floor log kth ancestor of x. 799 00:43:17,810 --> 00:43:24,760 OK, remember we're given a node x and a value 800 00:43:24,760 --> 00:43:26,530 k that we want to rise by.
801 00:43:26,530 --> 00:43:29,430 So I take the power of 2 just below k-- 802 00:43:29,430 --> 00:43:31,150 that's 2 the floor log k. 803 00:43:31,150 --> 00:43:33,580 I go up that much, and that's my new x, 804 00:43:33,580 --> 00:43:38,560 and then I set k to be k minus that value. 805 00:43:38,560 --> 00:43:41,300 That's how much I have left to go. 806 00:43:41,300 --> 00:43:41,800 OK. 807 00:43:41,800 --> 00:43:45,430 This thing will be less than k over 2. 808 00:43:45,430 --> 00:43:48,550 Because the next previous power of 2 is at least, 809 00:43:48,550 --> 00:43:50,710 is bigger than half of the thing. 810 00:43:50,710 --> 00:43:52,690 So we got more than halfway there, 811 00:43:52,690 --> 00:43:55,950 and so after log n iterations, we'll actually get there. 812 00:43:55,950 --> 00:43:58,250 That's pretty easy. 813 00:43:58,250 --> 00:44:04,120 That's jump pointers to two logs that we need to get rid of, 814 00:44:04,120 --> 00:44:06,040 and yes, we will use indirection, but not yet. 815 00:44:14,590 --> 00:44:16,950 First, we need some more ingredients. 816 00:44:21,270 --> 00:44:22,920 This next ingredient is kind of funny, 817 00:44:22,920 --> 00:44:25,180 because it will seem useless. 818 00:44:25,180 --> 00:44:30,000 But in fact, it is useful as a step towards ingredient 3. 819 00:44:30,000 --> 00:44:33,450 So the next trick is called long path decomposition. 820 00:44:40,170 --> 00:44:42,960 In general, this class covers a lot of different treaty 821 00:44:42,960 --> 00:44:45,030 compositions. 822 00:44:45,030 --> 00:44:49,160 We did preferred path decomposition for tango trees. 823 00:44:49,160 --> 00:44:50,700 We're going to do long path now. 824 00:44:50,700 --> 00:44:52,800 We'll do another one called heavy path later. 825 00:44:52,800 --> 00:44:54,270 There's a lot of them out there. 
826 00:44:54,270 --> 00:44:56,730 This one won't seem very useful at first, 827 00:44:56,730 --> 00:45:00,420 because while it will achieve linear space, 828 00:45:00,420 --> 00:45:04,650 it will achieve the amazing square root of n query, which 829 00:45:04,650 --> 00:45:06,270 I guess is new. 830 00:45:06,270 --> 00:45:10,860 I mean, we don't know how to do that yet with linear space. 831 00:45:10,860 --> 00:45:12,240 Not so obvious how to get root n. 832 00:45:12,240 --> 00:45:16,680 But anyway, don't worry about the query time. 833 00:45:16,680 --> 00:45:18,960 It's more the concept of long path that's interesting. 834 00:45:18,960 --> 00:45:21,404 It's a step in the right direction. 835 00:45:21,404 --> 00:45:23,820 So here's how we're going to decompose a tree. 836 00:45:23,820 --> 00:45:30,570 First thing we do is find the longest root-to-leaf path 837 00:45:30,570 --> 00:45:36,940 in the tree, because if you look at a tree, 838 00:45:36,940 --> 00:45:39,690 it has some wavy bottom. 839 00:45:39,690 --> 00:45:41,400 Take the deepest node. 840 00:45:41,400 --> 00:45:44,870 Take the path, the unique path from the root to that node. 841 00:45:44,870 --> 00:45:45,630 OK. 842 00:45:45,630 --> 00:45:49,410 When I do that, I could imagine deleting those nodes. 843 00:45:49,410 --> 00:45:51,180 I mean, there's that path, and then 844 00:45:51,180 --> 00:45:52,890 there's everything else, which means 845 00:45:52,890 --> 00:45:56,730 there's all these triangles hanging off of that path, some 846 00:45:56,730 --> 00:46:00,400 on the left, some on the right. 847 00:46:00,400 --> 00:46:04,440 Actually, I haven't talked about this, 848 00:46:04,440 --> 00:46:10,620 but both LCA and level ancestors work not just for binary trees. 849 00:46:10,620 --> 00:46:13,000 They work for arbitrary trees. 850 00:46:13,000 --> 00:46:15,660 And somewhere along here-- 851 00:46:15,660 --> 00:46:17,760 yeah, here.
852 00:46:17,760 --> 00:46:20,250 This reduction of using the Euler tour 853 00:46:20,250 --> 00:46:22,464 works for non-binary trees, too. 854 00:46:22,464 --> 00:46:23,880 That's actually another reason why 855 00:46:23,880 --> 00:46:27,330 this reduction is better than in-order traversal by itself. 856 00:46:27,330 --> 00:46:30,630 In-order traversal works only for binary trees. 857 00:46:30,630 --> 00:46:32,190 This thing works for any tree. 858 00:46:32,190 --> 00:46:33,900 In that case, in an arbitrary tree, 859 00:46:33,900 --> 00:46:36,300 you visit the node many, many times potentially. 860 00:46:36,300 --> 00:46:37,980 OK, but it will still be linear space 861 00:46:37,980 --> 00:46:39,930 and everything will still work. 862 00:46:39,930 --> 00:46:42,210 Here also, I want to handle non-binary trees. 863 00:46:42,210 --> 00:46:44,250 So I'm going to draw things hanging off, 864 00:46:44,250 --> 00:46:46,500 but in fact, there might be several things hanging off 865 00:46:46,500 --> 00:46:49,200 here, each their own little tree. 866 00:46:49,200 --> 00:46:50,910 OK, but the point is-- 867 00:46:50,910 --> 00:46:51,750 where's my red. 868 00:46:54,690 --> 00:46:56,300 Here. 869 00:46:56,300 --> 00:46:59,850 There was this one path in the beginning, the longest path, 870 00:46:59,850 --> 00:47:01,680 and then there's stuff hanging off of it. 871 00:47:01,680 --> 00:47:05,760 So just recurse on all the things hanging off of it. 872 00:47:05,760 --> 00:47:08,610 Recursively decompose those sub-trees. 873 00:47:28,032 --> 00:47:29,062 OK. 874 00:47:29,062 --> 00:47:30,770 Not clear what this is going to give you. 875 00:47:30,770 --> 00:47:32,478 In fact, it's not going to be so awesome, 876 00:47:32,478 --> 00:47:35,140 but it will be a starting point. 877 00:47:35,140 --> 00:47:40,440 Now you can answer a query with this, as follows. 878 00:47:40,440 --> 00:47:43,400 Query-- oh, sorry. 
879 00:47:43,400 --> 00:47:46,340 I should say how we're actually storing these paths. 880 00:47:46,340 --> 00:47:50,260 Here's the cool idea with this path thing. 881 00:47:50,260 --> 00:47:52,130 I have this path. 882 00:47:52,130 --> 00:47:54,770 I'd like to be able to jump around at least-- 883 00:47:54,770 --> 00:47:56,500 suppose your tree was a path. 884 00:47:56,500 --> 00:47:58,160 Suppose your tree were a path. 885 00:47:58,160 --> 00:47:59,930 Then what would you want to do? 886 00:47:59,930 --> 00:48:03,482 Store the nodes in an array ordered by depth, 887 00:48:03,482 --> 00:48:04,940 because then if you're at position i 888 00:48:04,940 --> 00:48:07,820 and you need to go to position i minus k, boom. 889 00:48:07,820 --> 00:48:09,990 That's just a lookup into your array. 890 00:48:09,990 --> 00:48:17,610 So I'm going to store each path as an array, 891 00:48:17,610 --> 00:48:27,770 as an array of nodes or node pointers, I guess, 892 00:48:27,770 --> 00:48:30,080 ordered by depth. 893 00:48:30,080 --> 00:48:35,420 So if it happens, so if my query value x is somewhere on this 894 00:48:35,420 --> 00:48:40,430 path, and if this path encompasses where I need 895 00:48:40,430 --> 00:48:44,240 to go-- so if I need to go k up and I end up here-- 896 00:48:44,240 --> 00:48:45,890 then that's instantaneous. 897 00:48:45,890 --> 00:48:48,270 The trouble would be if I have a query, 898 00:48:48,270 --> 00:48:50,530 let's say, over here. 899 00:48:50,530 --> 00:48:55,190 And so there's going to be a path that guy lives on, 900 00:48:55,190 --> 00:48:57,540 but maybe the kth ancestor is not on that path. 901 00:48:57,540 --> 00:48:59,794 It could be on a higher up path. 902 00:48:59,794 --> 00:49:01,460 It could be on the red path, and I can't 903 00:49:01,460 --> 00:49:03,710 jump there instantaneously. 904 00:49:03,710 --> 00:49:07,170 Nonetheless, there is a decent query algorithm here. 905 00:49:07,170 --> 00:49:07,670 All right.
906 00:49:11,230 --> 00:49:17,900 So here's what we're going to do. 907 00:49:17,900 --> 00:49:29,970 If k is less than or equal to the index i of node 908 00:49:29,970 --> 00:49:32,300 x on its path. 909 00:49:37,580 --> 00:49:40,130 So every node belongs to exactly one path. 910 00:49:40,130 --> 00:49:41,840 This is a path decomposition. 911 00:49:41,840 --> 00:49:44,954 It's a partition of the tree into paths. 912 00:49:44,954 --> 00:49:46,370 Not all the edges are represented, 913 00:49:46,370 --> 00:49:48,980 but all the nodes are there. 914 00:49:48,980 --> 00:49:53,020 All the nodes belong to some path, 915 00:49:53,020 --> 00:49:56,000 and we're going to store, for every node, store 916 00:49:56,000 --> 00:49:59,780 what its index is and where it lives in its array. 917 00:49:59,780 --> 00:50:02,330 So look at that index in the array. 918 00:50:02,330 --> 00:50:05,330 If k is less than or equal to that index, 919 00:50:05,330 --> 00:50:08,810 then we can solve our problem instantly 920 00:50:08,810 --> 00:50:15,590 by looking at the path array at position i minus k. 921 00:50:15,590 --> 00:50:17,930 That's what I said before. 922 00:50:17,930 --> 00:50:20,210 If our kth ancestor is within the path, 923 00:50:20,210 --> 00:50:22,190 then that's where it will be, and that's 924 00:50:22,190 --> 00:50:25,260 going to work as long as that is non-negative. 925 00:50:25,260 --> 00:50:28,380 If I get to negative, that means it's another path. 926 00:50:28,380 --> 00:50:30,590 So that's the good case. 927 00:50:30,590 --> 00:50:34,970 The other case is we're just going to do 928 00:50:34,970 --> 00:50:37,224 some recursion, essentially. 929 00:50:41,100 --> 00:50:43,460 So we're going to go as high as we can with this path. 930 00:50:43,460 --> 00:50:47,420 We're going to look at path array at position 0. 931 00:50:47,420 --> 00:50:48,534 Go to the parent of that. 932 00:50:48,534 --> 00:50:50,450 Let's suppose every node has a parent pointer.
933 00:50:50,450 --> 00:50:59,940 That's easy, regular tree, and then decrease k by 1 plus i. 934 00:50:59,940 --> 00:51:02,570 So the array let us jump up i steps-- 935 00:51:02,570 --> 00:51:06,130 that's this part-- and then the parent 936 00:51:06,130 --> 00:51:07,460 stepped us up one more step. 937 00:51:07,460 --> 00:51:10,910 That's just to get to the next path above us. 938 00:51:10,910 --> 00:51:13,300 OK, so how much did this decrease k by? 939 00:51:13,300 --> 00:51:16,520 I'd like to say a factor of 2 and get log n, but in fact, 940 00:51:16,520 --> 00:51:18,842 no, it's not very good. 941 00:51:18,842 --> 00:51:20,300 It doesn't decrease k by very much. 942 00:51:20,300 --> 00:51:23,030 It does decrease k, guaranteed by at least 1, 943 00:51:23,030 --> 00:51:25,700 so it's definitely linear time. 944 00:51:25,700 --> 00:51:29,165 And there's a bad tree, which is this. 945 00:51:35,100 --> 00:51:36,710 It's like a grid. 946 00:51:36,710 --> 00:51:37,230 Whoa. 947 00:51:37,230 --> 00:51:37,730 Sorry. 948 00:51:41,104 --> 00:51:42,680 OK, here's a tree. 949 00:51:42,680 --> 00:51:44,820 It's a binary tree. 950 00:51:44,820 --> 00:51:47,091 And if you set it up right, this is the longest path. 951 00:51:47,091 --> 00:51:49,340 And then when you decompose, this is the longest path, 952 00:51:49,340 --> 00:51:51,631 and this is the longest path, this is the longest path. 953 00:51:51,631 --> 00:51:53,544 If you query here, you'll walk up to here, 954 00:51:53,544 --> 00:51:55,460 and then walk up to here, and walk up to here, 955 00:51:55,460 --> 00:51:56,293 and walk up to here. 956 00:51:56,293 --> 00:52:00,510 So this is a square root of n lower bound for this algorithm. 957 00:52:00,510 --> 00:52:03,410 So not a good algorithm yet, but the makings 958 00:52:03,410 --> 00:52:04,744 of a good algorithm. 959 00:52:19,570 --> 00:52:25,826 Makings of step 3, which is called ladder decomposition. 
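The long-path decomposition and its query, as described so far, can be sketched roughly as follows. This is only an illustration under assumptions that are not from the lecture: the tree is given as a `parent` list with `-1` marking the root, and the names `build_long_paths` and `level_ancestor` are made up for this sketch.

```python
def build_long_paths(parent):
    """parent[v] is v's parent, -1 for the root (assumed input format).

    Returns (paths, path_of, index): paths[p] lists the nodes of path p
    from shallowest to deepest; node v sits at paths[path_of[v]][index[v]].
    """
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)
    order = [parent.index(-1)]
    for v in order:                      # BFS: parents before children
        order.extend(children[v])
    height = [1] * n                     # nodes on the longest downward path
    for v in reversed(order):
        if parent[v] >= 0:
            height[parent[v]] = max(height[parent[v]], height[v] + 1)
    tallest = [max(c, key=lambda u: height[u]) if c else -1 for c in children]
    paths, path_of, index = [], [0] * n, [0] * n
    for v in order:
        p = parent[v]
        if p >= 0 and tallest[p] == v:   # continue the parent's long path
            path_of[v] = path_of[p]
        else:                            # start a new path hanging off p
            path_of[v] = len(paths)
            paths.append([])
        index[v] = len(paths[path_of[v]])
        paths[path_of[v]].append(v)
    return paths, path_of, index


def level_ancestor(x, k, parent, paths, path_of, index):
    """kth ancestor of x, or None if x has fewer than k ancestors."""
    while True:
        i = index[x]
        path = paths[path_of[x]]
        if k <= i:                       # good case: answer is on this path
            return path[i - k]
        top = path[0]
        if parent[top] < 0:              # ran off the root
            return None
        x = parent[top]                  # hop to the path above
        k -= i + 1
```

Each pass through the loop only guarantees that k drops by at least 1, which is why the query can be slow on trees like the bad example above.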
960 00:52:32,587 --> 00:52:34,670 Ladder decomposition is something I haven't really 961 00:52:34,670 --> 00:52:36,515 seen anywhere else. 962 00:52:36,515 --> 00:52:38,390 I think it comes from the parallel algorithms 963 00:52:38,390 --> 00:52:39,620 world in general. 964 00:52:43,830 --> 00:52:53,760 And now we're going to achieve linear space log n query. 965 00:52:53,760 --> 00:52:55,650 Now this is an improvement. 966 00:52:55,650 --> 00:52:58,070 So we have, at the moment, n log n space, 967 00:52:58,070 --> 00:53:02,105 log n query or n space root n query. 968 00:53:02,105 --> 00:53:05,120 We're basically taking the min of the two. 969 00:53:05,120 --> 00:53:08,990 And so we're getting linear space log n query. 970 00:53:08,990 --> 00:53:10,170 Still not perfect. 971 00:53:10,170 --> 00:53:11,810 We want constant query. 972 00:53:11,810 --> 00:53:15,140 That's when we'll use indirection, I think. 973 00:53:15,140 --> 00:53:20,060 Yeah, basically, a new type of indirection, but OK. 974 00:53:20,060 --> 00:53:22,950 So linear space log n query. 975 00:53:22,950 --> 00:53:28,040 Well, the idea is just to fix long paths, 976 00:53:28,040 --> 00:53:29,690 and it's a crazy idea, OK. 977 00:53:29,690 --> 00:53:31,970 Let me tell you the idea and then it's 978 00:53:31,970 --> 00:53:34,310 like, why would that be useful. 979 00:53:34,310 --> 00:53:37,220 But it's obvious that it doesn't hurt you, OK. 980 00:53:37,220 --> 00:53:40,970 When we have these paths, sometimes they're long. 981 00:53:40,970 --> 00:53:43,310 Sometimes they're not long enough. 982 00:53:43,310 --> 00:53:44,990 Just take each of these paths and extend 983 00:53:44,990 --> 00:53:48,900 them upwards by a factor of 2. 984 00:53:48,900 --> 00:53:51,110 That's the idea. 985 00:53:51,110 --> 00:54:00,430 So take number 2, extend each path upward 2 x. 986 00:54:00,430 --> 00:54:03,760 So that gives us what we call a ladder. 987 00:54:06,270 --> 00:54:07,160 OK, what happens?
988 00:54:07,160 --> 00:54:10,810 Well, paths are going to overlap. 989 00:54:10,810 --> 00:54:13,000 Fine. 990 00:54:13,000 --> 00:54:13,860 Ladders overlap. 991 00:54:13,860 --> 00:54:15,730 The original paths don't overlap. 992 00:54:15,730 --> 00:54:16,510 Ladders overlap. 993 00:54:16,510 --> 00:54:18,580 I don't really care if they overlap. 994 00:54:18,580 --> 00:54:20,650 How much space is there? 995 00:54:20,650 --> 00:54:23,350 It's still linear space, because I'm just doubling everything. 996 00:54:23,350 --> 00:54:27,027 So I've at most doubled space relative to long path 997 00:54:27,027 --> 00:54:27,610 decomposition. 998 00:54:27,610 --> 00:54:30,010 I didn't mention it explicitly, but long path decomposition 999 00:54:30,010 --> 00:54:30,676 is linear space. 1000 00:54:30,676 --> 00:54:34,630 We're just partitioning up the tree into little pieces. 1001 00:54:34,630 --> 00:54:36,070 Doesn't take much. 1002 00:54:36,070 --> 00:54:39,100 We have to store those arrays, but every node 1003 00:54:39,100 --> 00:54:40,990 appears in exactly one cell here. 1004 00:54:40,990 --> 00:54:42,820 Now every node will appear in, on average, 1005 00:54:42,820 --> 00:54:45,100 two cells in some weird way. 1006 00:54:45,100 --> 00:54:46,570 Like what happens over here? 1007 00:54:46,570 --> 00:54:48,850 I have no idea. 1008 00:54:48,850 --> 00:54:50,590 So this guy's length 1. 1009 00:54:50,590 --> 00:54:52,060 It's going to grow to length 2. 1010 00:54:52,060 --> 00:54:55,750 This one's length 2, so now it'll grow to length 4. 1011 00:54:55,750 --> 00:54:56,625 This one's length 3-- 1012 00:54:56,625 --> 00:54:57,958 and it depends on how you count. 1013 00:54:57,958 --> 00:54:59,170 I'm counting nodes here. 1014 00:54:59,170 --> 00:55:03,010 That's going to go here, all the way to the top. 1015 00:55:03,010 --> 00:55:04,840 Interesting. 1016 00:55:04,840 --> 00:55:06,470 All the others will go to the top. 1017 00:55:06,470 --> 00:55:08,945 So if I'm here, I walk here.
1018 00:55:08,945 --> 00:55:10,570 Then I can jump all the way to the top. 1019 00:55:10,570 --> 00:55:13,210 Then I can jump all the way to the root. 1020 00:55:13,210 --> 00:55:17,000 Not totally obvious, but it actually will be log n steps. 1021 00:55:17,000 --> 00:55:18,215 Let's prove that. 1022 00:55:18,215 --> 00:55:19,840 This is again something we don't really 1023 00:55:19,840 --> 00:55:21,490 need to know for the final solution, 1024 00:55:21,490 --> 00:55:25,060 but kind of nice, kind of comforting to know that we've 1025 00:55:25,060 --> 00:55:27,190 gotten down a log n query. 1026 00:55:27,190 --> 00:55:29,715 So it's at most double the space. 1027 00:55:29,715 --> 00:55:30,590 This is still linear. 1028 00:55:34,030 --> 00:55:37,165 Now-- oh, there's one catch. 1029 00:55:40,750 --> 00:55:45,660 Over in this world, we said each-- 1030 00:55:45,660 --> 00:55:47,650 I didn't say it. 1031 00:55:47,650 --> 00:55:49,000 I mentioned it out loud. 1032 00:55:49,000 --> 00:55:52,330 Every node stores what array it lives in. 1033 00:55:52,330 --> 00:55:55,720 Now a node lives in multiple arrays, OK. 1034 00:55:55,720 --> 00:55:58,360 So which one do I store a pointer to? 1035 00:55:58,360 --> 00:56:02,890 Well, there's one obvious one to store a pointer to. 1036 00:56:02,890 --> 00:56:05,380 Whatever node you take lives in one path. 1037 00:56:05,380 --> 00:56:08,230 In that long path decomposition, it still lives in one path. 1038 00:56:08,230 --> 00:56:11,660 Store a pointer into that ladder. 1039 00:56:11,660 --> 00:56:20,560 So node stores a pointer you could say to the ladder that 1040 00:56:20,560 --> 00:56:22,780 contains it in the lower half. 1041 00:56:27,550 --> 00:56:31,150 That corresponds to the one where it was an actual path. 1042 00:56:31,150 --> 00:56:35,260 And only one ladder will contain a node in its lower half. 1043 00:56:35,260 --> 00:56:37,100 The upper half was the extension. 
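A minimal sketch of the ladder construction just described, assuming the long paths are already available as lists ordered from shallowest to deepest node and the tree is a `parent` list with `-1` for the root. The function name `build_ladders` and the returned `offset` list are illustrative choices, not from the lecture.

```python
def build_ladders(paths, parent):
    """Extend every long path upward by (at most) its own length.

    paths[p] lists the nodes of path p from shallowest to deepest.
    Returns (ladders, offset): ladders[p] is the extended array and
    offset[p] is how many ancestors were prepended, so a node stored at
    paths[p][i] sits at ladders[p][offset[p] + i], in the lower half.
    """
    ladders, offset = [], []
    for path in paths:
        ext = []
        v = parent[path[0]]
        # Climb at most len(path) steps, stopping early at the root.
        while v >= 0 and len(ext) < len(path):
            ext.append(v)
            v = parent[v]
        ext.reverse()                    # keep shallowest-first order
        ladders.append(ext + path)
        offset.append(len(ext))
    return ladders, offset
```

Since each ladder is at most twice as long as its path, the total number of array cells is at most 2n.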
1044 00:56:37,100 --> 00:56:40,560 I guess it's like those folding ladders you extend. 1045 00:56:40,560 --> 00:56:42,260 OK. 1046 00:56:42,260 --> 00:56:42,760 Cool. 1047 00:56:42,760 --> 00:56:44,718 So that's what we're going to do and also store 1048 00:56:44,718 --> 00:56:47,210 its index in the array. 1049 00:56:47,210 --> 00:56:50,230 Now we can do exactly this query algorithm again, 1050 00:56:50,230 --> 00:56:52,570 except now instead of path, it says ladder. 1051 00:56:52,570 --> 00:56:55,900 So you look at the index of the node in its ladder. 1052 00:56:55,900 --> 00:56:58,900 If that index is at least k, then 1053 00:56:58,900 --> 00:57:02,200 boom, that ladder array will tell you exactly where to go. 1054 00:57:02,200 --> 00:57:04,480 Otherwise you go to the top of the ladder 1055 00:57:04,480 --> 00:57:06,160 and then you take the parent pointer, 1056 00:57:06,160 --> 00:57:07,510 and you decrease by this. 1057 00:57:07,510 --> 00:57:11,116 But now I claim that decrease will be substantial. 1058 00:57:11,116 --> 00:57:11,615 Why? 1059 00:57:20,470 --> 00:57:22,900 If I have a node of height h-- 1060 00:57:25,570 --> 00:57:28,100 remember, height of a node is the length of the longest 1061 00:57:28,100 --> 00:57:29,620 path from there downward-- 1062 00:57:32,300 --> 00:57:44,301 it will be on a ladder of height at least 2h. 1063 00:57:44,301 --> 00:57:44,800 Why? 1064 00:57:44,800 --> 00:57:47,122 Because if you look at a node of height h-- 1065 00:57:47,122 --> 00:57:48,580 like say, I don't know, this node-- 1066 00:57:51,220 --> 00:57:53,964 the longest path from there is substantial. 1067 00:57:53,964 --> 00:57:56,380 I mean, if it's height h, then the longest path from there 1068 00:57:56,380 --> 00:57:57,850 is length at least h. 1069 00:57:57,850 --> 00:58:00,850 So every node of height h will be on a path of length at least 1070 00:58:00,850 --> 00:58:04,010 h, and from there down.
1071 00:58:04,010 --> 00:58:05,260 And so you look at the ladder. 1072 00:58:05,260 --> 00:58:06,920 Well, that's going to be double that. 1073 00:58:06,920 --> 00:58:09,670 So the ladder will be height at least 2h, 1074 00:58:09,670 --> 00:58:13,030 which means if your query starts at height h, 1075 00:58:13,030 --> 00:58:16,480 after you do one step of this ladder search, 1076 00:58:16,480 --> 00:58:19,750 you will get to height at least 2h, and then 4h, and then 8h. 1077 00:58:19,750 --> 00:58:23,020 You're increasing your height by a power of 2, by a factor of 2 1078 00:58:23,020 --> 00:58:24,140 every time. 1079 00:58:24,140 --> 00:58:29,000 So in log n steps, you will get to wherever you need to go. 1080 00:58:29,000 --> 00:58:31,160 OK. You don't have to worry about overshooting, 1081 00:58:31,160 --> 00:58:33,530 because that's the case when the array tells you 1082 00:58:33,530 --> 00:58:35,800 exactly where to go. 1083 00:58:35,800 --> 00:58:36,300 OK. 1084 00:58:38,930 --> 00:58:42,080 Time for the climax. 1085 00:58:42,080 --> 00:58:43,806 It won't be the end, but it's the climax 1086 00:58:43,806 --> 00:58:44,930 in the middle of the story. 1087 00:58:44,930 --> 00:58:47,900 So we have on the one hand, jump pointers. 1088 00:58:47,900 --> 00:58:48,890 Remember those? 1089 00:58:48,890 --> 00:58:55,170 Jump pointers made small steps initially and got-- 1090 00:58:55,170 --> 00:58:56,840 actually, no. 1091 00:58:56,840 --> 00:58:59,065 This is what it looks like for the data structure. 1092 00:58:59,065 --> 00:59:00,440 But if you look at the algorithm, 1093 00:59:00,440 --> 00:59:02,460 actually it makes a big step in the beginning. 1094 00:59:02,460 --> 00:59:02,960 Right? 1095 00:59:02,960 --> 00:59:04,744 It gets more than halfway there. 1096 00:59:04,744 --> 00:59:06,410 Then it makes smaller and smaller steps, 1097 00:59:06,410 --> 00:59:07,890 exponentially decreasing steps.
1098 00:59:07,890 --> 00:59:12,990 Finally, it arrives at the intended node. 1099 00:59:12,990 --> 00:59:16,004 Ladder decomposition is doing the reverse. 1100 00:59:16,004 --> 00:59:17,420 If you start at low height, you're 1101 00:59:17,420 --> 00:59:19,490 going to make very small steps in the beginning. 1102 00:59:19,490 --> 00:59:20,906 As your height gets bigger, you're 1103 00:59:20,906 --> 00:59:22,790 going to be making bigger and bigger steps. 1104 00:59:22,790 --> 00:59:25,880 And then when you jump over your node, you found it instantly. 1105 00:59:25,880 --> 00:59:29,340 So it's kind of the opposite of jump pointers. 1106 00:59:29,340 --> 00:59:32,630 So what we're going to do is take jump pointers 1107 00:59:32,630 --> 00:59:35,959 and add them to ladder decomposition. 1108 00:59:52,096 --> 00:59:53,580 Huh. 1109 00:59:53,580 --> 00:59:56,170 This is, I guess, version 4. 1110 00:59:56,170 --> 01:00:08,700 Combine jump pointers from one and ladders from three. 1111 01:00:08,700 --> 01:00:09,420 Forget about two. 1112 01:00:09,420 --> 01:00:11,940 Two is just a warm up for three. 1113 01:00:11,940 --> 01:00:15,210 Long paths, defined ladders. 1114 01:00:15,210 --> 01:00:17,850 So we've got one way to do log n query. 1115 01:00:17,850 --> 01:00:20,250 We've got another way to do log n query. 1116 01:00:20,250 --> 01:00:25,610 I combine them, and I get constant query. 1117 01:00:25,610 --> 01:00:27,260 Because log n plus log n equals 1. 1118 01:00:27,260 --> 01:00:27,830 I don't know. 1119 01:00:32,820 --> 01:00:34,760 OK, here's the idea. 1120 01:00:34,760 --> 01:00:38,030 On the one hand, jump pointers make a big step and then 1121 01:00:38,030 --> 01:00:40,030 smaller steps, right. 1122 01:00:40,030 --> 01:00:41,460 Yeah, like that. 1123 01:00:41,460 --> 01:00:46,340 And on the other hand, ladders make small steps. 1124 01:00:46,340 --> 01:00:47,090 It's hard to draw. 
1125 01:00:51,400 --> 01:00:59,380 What I'd like to do is take this step and this step. 1126 01:00:59,380 --> 01:01:02,110 That would be good, because only two of them. 1127 01:01:02,110 --> 01:01:13,180 So query is going to do one jump, plus 1 ladder, 1128 01:01:13,180 --> 01:01:15,410 in that order. 1129 01:01:15,410 --> 01:01:17,652 See, the thing about ladders is it's 1130 01:01:17,652 --> 01:01:20,110 really slow in the beginning, because your height is small. 1131 01:01:20,110 --> 01:01:23,170 I really want to get large height. 1132 01:01:23,170 --> 01:01:24,730 Jump pointers give you large height. 1133 01:01:24,730 --> 01:01:28,720 The very first step, you get half the height you need. 1134 01:01:28,720 --> 01:01:30,600 That's it. 1135 01:01:30,600 --> 01:01:38,770 So when we do a jump, we do one step of the jump algorithm. 1136 01:01:38,770 --> 01:01:39,430 What do we do? 1137 01:01:39,430 --> 01:01:49,040 We reach height at least k over 2 above x. 1138 01:01:49,040 --> 01:01:51,650 All right, we get halfway there. 1139 01:01:51,650 --> 01:01:53,310 So our height-- it's a little-- 1140 01:01:53,310 --> 01:01:56,360 let's say x has height h. 1141 01:01:56,360 --> 01:01:58,960 OK, so then we get to height-- this is saying we 1142 01:01:58,960 --> 01:02:03,370 get to height h plus k over 2. 1143 01:02:03,370 --> 01:02:04,180 OK, that's good. 1144 01:02:04,180 --> 01:02:06,190 This is a big height. 1145 01:02:06,190 --> 01:02:11,410 Halfway there, I mean, halfway of the remainder after h. 1146 01:02:11,410 --> 01:02:15,620 Now ladders double your height in every step. 1147 01:02:15,620 --> 01:02:20,540 So ladder step-- so this is the jump step. 1148 01:02:20,540 --> 01:02:24,940 If you do one ladder step, you will reach height double that. 1149 01:02:24,940 --> 01:02:28,725 So it's at least 2 h plus k, which 1150 01:02:28,725 --> 01:02:29,900 is bigger than what we need. 1151 01:02:29,900 --> 01:02:31,067 We need h plus k. 
1152 01:02:31,067 --> 01:02:32,400 That's where we're trying to go. 1153 01:02:32,400 --> 01:02:34,964 And so we're done. 1154 01:02:34,964 --> 01:02:35,630 Isn't that cool? 1155 01:02:40,960 --> 01:02:44,750 So the annoying part is there's this extra part here. 1156 01:02:44,750 --> 01:02:48,110 This is the h part and we start at some level. 1157 01:02:48,110 --> 01:02:49,080 We don't know where. 1158 01:02:49,080 --> 01:02:49,580 This is x. 1159 01:02:49,580 --> 01:02:51,620 The worst case is maybe when it's very small, 1160 01:02:51,620 --> 01:02:56,260 but whatever it is, we do this step and this is our target up 1161 01:02:56,260 --> 01:02:57,080 here. 1162 01:02:57,080 --> 01:03:00,110 This is height h plus k. 1163 01:03:00,110 --> 01:03:02,210 In one step, we get more than halfway 1164 01:03:02,210 --> 01:03:05,150 there with the jump pointer. 1165 01:03:05,150 --> 01:03:07,530 And then the ladder will carry us the rest of the way. 1166 01:03:07,530 --> 01:03:08,990 Because this is the ladder. 1167 01:03:08,990 --> 01:03:13,310 We basically go horizontally to fall on this ladder, 1168 01:03:13,310 --> 01:03:15,470 and it will cover us beyond where 1169 01:03:15,470 --> 01:03:17,500 we need to go, beyond our wildest imaginations. 1170 01:03:17,500 --> 01:03:19,669 So this is k over 2. 1171 01:03:19,669 --> 01:03:21,210 Because not only will it double this, 1172 01:03:21,210 --> 01:03:22,668 which is what we need to double, it 1173 01:03:22,668 --> 01:03:25,820 will also double whatever is down here, this h part. 1174 01:03:25,820 --> 01:03:28,320 So it gets us way beyond where we need to go. 1175 01:03:28,320 --> 01:03:29,250 I mean, could be h 0. 1176 01:03:29,250 --> 01:03:31,208 Then it gets us to exactly where we need to go. 1177 01:03:33,472 --> 01:03:35,180 But then the ladder tells us where to go. 1178 01:03:35,180 --> 01:03:37,931 So two steps constant time. 1179 01:03:41,300 --> 01:03:45,440 Now one annoying thing is we're not done with space. 
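The "one jump, then one ladder" query can be sketched as below. This is a rough illustration that still uses the n log n space version (jump pointers at every node); the `parent` list format with `-1` for the root and the names `preprocess`, `level_ancestor`, `ladder_of`, and `pos` are assumptions of this sketch.

```python
def preprocess(parent):
    """parent[v] is v's parent, -1 for the root (assumed input format)."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)
    order = [parent.index(-1)]
    for v in order:                          # BFS: parents before children
        order.extend(children[v])

    # Jump pointers: jump[v][j] is the ancestor 2**j levels up, or -1.
    levels = n.bit_length()
    jump = [[-1] * levels for _ in range(n)]
    for v in order:
        jump[v][0] = parent[v]
        for j in range(1, levels):
            mid = jump[v][j - 1]
            jump[v][j] = jump[mid][j - 1] if mid >= 0 else -1

    # Long paths, each extended upward by its own length into a ladder.
    height = [1] * n
    for v in reversed(order):
        if parent[v] >= 0:
            height[parent[v]] = max(height[parent[v]], height[v] + 1)
    tallest = [max(c, key=lambda u: height[u]) if c else -1 for c in children]
    ladder_of, pos = [None] * n, [0] * n
    for top in order:
        if parent[top] >= 0 and tallest[parent[top]] == top:
            continue                         # not the top of a long path
        path, v = [], top
        while v >= 0:                        # follow tallest children down
            path.append(v)
            v = tallest[v]
        ext, v = [], parent[top]
        while v >= 0 and len(ext) < len(path):
            ext.append(v)
            v = parent[v]
        ladder = ext[::-1] + path
        for i, v in enumerate(path):         # lower half only
            ladder_of[v], pos[v] = ladder, len(ext) + i
    return jump, ladder_of, pos


def level_ancestor(x, k, jump, ladder_of, pos):
    """kth ancestor of x in O(1) steps, or None if it does not exist."""
    if k == 0:
        return x
    j = k.bit_length() - 1                   # largest power of two <= k
    y = jump[x][j] if j < len(jump[x]) else -1
    if y < 0:
        return None
    i = pos[y] - (k - (1 << j))              # finish inside y's ladder
    return ladder_of[y][i] if i >= 0 else None
```

The jump covers more than half of k, so the node it lands on has height greater than what remains, and its ladder reaches at least that far up unless it hits the root first.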
1180 01:03:45,440 --> 01:03:47,420 So this is the anticlimax part. 1181 01:03:47,420 --> 01:03:49,320 It's still going to be pretty interesting. 1182 01:03:49,320 --> 01:03:51,456 We've got to shave off a log factor in space, 1183 01:03:51,456 --> 01:03:52,580 but hey, we're experienced. 1184 01:03:52,580 --> 01:03:54,121 We already did that once today. 1185 01:03:54,121 --> 01:03:54,620 Question? 1186 01:03:54,620 --> 01:03:55,120 Yeah. 1187 01:03:55,120 --> 01:03:57,145 Why is it OK to go past your target? 1188 01:04:01,664 --> 01:04:03,830 The question was why is it OK to go past our target? 1189 01:04:03,830 --> 01:04:05,788 Jump pointers aren't allowed to, because they only 1190 01:04:05,788 --> 01:04:06,720 know how to go up. 1191 01:04:06,720 --> 01:04:07,740 They can't overshoot. 1192 01:04:07,740 --> 01:04:10,640 That's why they went less than halfway, or more than halfway, 1193 01:04:10,640 --> 01:04:12,020 but less than the full way. 1194 01:04:12,020 --> 01:04:16,580 Ladder decomposition can go beyond, because as soon as-- 1195 01:04:16,580 --> 01:04:20,450 the point is, as soon as-- here's you, x, and here's 1196 01:04:20,450 --> 01:04:21,770 your kth ancestor. 1197 01:04:21,770 --> 01:04:22,810 This is the answer. 1198 01:04:22,810 --> 01:04:24,560 As soon as you're in a common ladder, then 1199 01:04:24,560 --> 01:04:26,190 the array tells you where to go. 1200 01:04:26,190 --> 01:04:30,099 So even though the top of the ladder overshot, 1201 01:04:30,099 --> 01:04:31,640 there will be a ladder connecting you 1202 01:04:31,640 --> 01:04:32,723 to that top of the ladder. 1203 01:04:32,723 --> 01:04:36,020 So as long as it's somewhere in between, it's free. 1204 01:04:36,020 --> 01:04:39,755 Yeah, so that's why it's OK this goes potentially too high. 1205 01:04:39,755 --> 01:04:41,630 So it's good for ladders, not good for jumps, 1206 01:04:41,630 --> 01:04:46,160 but that's exactly where we have it. Other questions? 1207 01:04:46,160 --> 01:04:46,823 Yeah.
1208 01:04:46,823 --> 01:04:50,204 AUDIENCE: [INAUDIBLE] jump pointers, 1209 01:04:50,204 --> 01:04:52,458 wouldn't you be high up enough in the tree 1210 01:04:52,458 --> 01:04:54,950 so that just the long path would work? 1211 01:04:54,950 --> 01:04:56,450 PROFESSOR: Oh, interesting question. 1212 01:04:56,450 --> 01:04:59,450 So would it be enough to do jump pointers plus long path? 1213 01:04:59,450 --> 01:05:02,030 My guess is no. 1214 01:05:02,030 --> 01:05:04,250 Jump pointers get you up to-- 1215 01:05:04,250 --> 01:05:05,750 so think of the case where h is 0. 1216 01:05:05,750 --> 01:05:08,060 Initially you're at height 0. 1217 01:05:08,060 --> 01:05:09,970 I think that's going to be a problem. 1218 01:05:09,970 --> 01:05:15,470 You jump up to height k over 2 with a jump pointer. 1219 01:05:15,470 --> 01:05:17,270 Now long path decomposition, you know 1220 01:05:17,270 --> 01:05:21,170 that the path will have a length at least k over 2, 1221 01:05:21,170 --> 01:05:22,740 but you need to get up to k. 1222 01:05:22,740 --> 01:05:25,010 And so you may get stuck in this kind of situation 1223 01:05:25,010 --> 01:05:27,540 where maybe you're trying to get to the root 1224 01:05:27,540 --> 01:05:31,220 and you jumped to here, but then you have to walk. 1225 01:05:31,220 --> 01:05:33,140 So I think the long path's not enough. 1226 01:05:33,140 --> 01:05:35,532 You need that factor of 2, which the ladders give you. 1227 01:05:35,532 --> 01:05:37,490 You can see where ladders come from now, right? 1228 01:05:37,490 --> 01:05:40,391 I mean we got up to height k over 2. 1229 01:05:40,391 --> 01:05:41,640 Now we just need to double it. 1230 01:05:41,640 --> 01:05:44,000 Hey, we can afford to double every path, 1231 01:05:44,000 --> 01:05:45,385 but I think we need to. 1232 01:05:45,385 --> 01:05:48,690 Are there questions? 1233 01:05:48,690 --> 01:05:50,300 OK. 1234 01:05:50,300 --> 01:05:56,000 So last thing to do is to shave off this log factor of space. 
1235 01:05:56,000 --> 01:05:58,560 Now, we're going to do that with indirection, of course, 1236 01:05:58,560 --> 01:06:01,040 constant time and n log n space. 1237 01:06:01,040 --> 01:06:04,450 But it's not our usual type of indirection. 1238 01:06:08,750 --> 01:06:10,540 Use this board. 1239 01:06:10,540 --> 01:06:11,260 Indirections. 1240 01:06:11,260 --> 01:06:17,080 So last time we did indirection, it was with an array. 1241 01:06:17,080 --> 01:06:19,330 And actually pretty much every indirection we've done, 1242 01:06:19,330 --> 01:06:21,160 it's been with an array-like thing. 1243 01:06:21,160 --> 01:06:24,040 We could decompose into groups of size log n, 1244 01:06:24,040 --> 01:06:26,010 the top thing was n over log n. 1245 01:06:26,010 --> 01:06:28,180 So it was kind of clean. 1246 01:06:28,180 --> 01:06:32,200 This structure is not so clean, because it's a tree. 1247 01:06:32,200 --> 01:06:34,450 How do you decompose a tree into little things 1248 01:06:34,450 --> 01:06:36,700 at the bottom of size log n and a top thing 1249 01:06:36,700 --> 01:06:38,680 of size n over log n? 1250 01:06:38,680 --> 01:06:43,510 Suppose, for example, your tree is a path. 1251 01:06:46,840 --> 01:06:49,420 Bad news. 1252 01:06:49,420 --> 01:06:55,960 If my tree were a path, well, I could trim off 1253 01:06:55,960 --> 01:06:57,340 a bottom thing of size log n. 1254 01:06:57,340 --> 01:07:01,750 But now the rest is of size n minus log n, not n divided 1255 01:07:01,750 --> 01:07:02,260 by log n. 1256 01:07:02,260 --> 01:07:03,700 That's bad. 1257 01:07:03,700 --> 01:07:06,630 I need to shave a factor of log n, not an additive log n. 1258 01:07:09,591 --> 01:07:11,340 Can you tell me a good thing about a path? 1259 01:07:14,300 --> 01:07:17,150 I mean, obviously, when we can put in an array. 1260 01:07:17,150 --> 01:07:19,070 But can you quantify the goodness, 1261 01:07:19,070 --> 01:07:21,320 or the pathlikedness of a tree? 1262 01:07:24,984 --> 01:07:26,250 I erase this board.
1263 01:07:33,169 --> 01:07:34,210 Kind of a vague question. 1264 01:07:41,730 --> 01:07:43,980 Good thing about a path is that it 1265 01:07:43,980 --> 01:07:46,050 doesn't have very many leaves. 1266 01:07:46,050 --> 01:07:48,220 That's one way to quantify pathedness. 1267 01:07:48,220 --> 01:07:54,290 Small number of leaves, I claim life's not so bad. 1268 01:07:54,290 --> 01:07:59,388 I actually need to do that before we get to indirection. 1269 01:08:16,010 --> 01:08:20,990 Step 5 is let's tune jump pointers a bit. 1270 01:08:24,479 --> 01:08:25,550 I want to make them-- 1271 01:08:28,141 --> 01:08:29,390 so they're the problem, right? 1272 01:08:29,390 --> 01:08:31,330 That's where we get n log n space. 1273 01:08:31,330 --> 01:08:34,370 They're the only source of our n log n space. 1274 01:08:34,370 --> 01:08:39,170 So what I'd like to do is in this situation where 1275 01:08:39,170 --> 01:08:41,000 the number of leaves is small-- 1276 01:08:41,000 --> 01:08:42,740 we'll see what small is in a moment-- 1277 01:08:42,740 --> 01:08:46,809 I would like jump pointers to be linear size. 1278 01:08:49,779 --> 01:08:51,420 OK, here's the idea. 1279 01:08:56,250 --> 01:09:00,224 First idea is let's just store jump pointers from leaves. 1280 01:09:09,290 --> 01:09:10,020 OK. 1281 01:09:10,020 --> 01:09:16,529 So that would imply l log n space, 1282 01:09:16,529 --> 01:09:18,640 I guess, plus linear overall. 1283 01:09:22,948 --> 01:09:25,890 Instead of n log n, now we just pay for the leaves, 1284 01:09:25,890 --> 01:09:27,689 except we kind of messed up our query. 1285 01:09:27,689 --> 01:09:31,024 First thing query did was at the node, follow the jump pointer. 1286 01:09:31,024 --> 01:09:33,870 But it's not so bad. 1287 01:09:33,870 --> 01:09:35,279 Here we are at x. 1288 01:09:35,279 --> 01:09:39,120 There's some leaves down here, and we want 1289 01:09:39,120 --> 01:09:41,670 to jump up from here, from x. 1290 01:09:41,670 --> 01:09:43,470 How do I jump from x? 
1291 01:09:43,470 --> 01:09:45,990 Well, if I could somehow go from x to really, 1292 01:09:45,990 --> 01:09:50,250 any leaf, the ancestors of x that I care about 1293 01:09:50,250 --> 01:09:53,010 are also ancestors of any leaf descendant of x. 1294 01:09:53,010 --> 01:09:57,420 So all I need to do is store for each node 1295 01:09:57,420 --> 01:10:01,930 any leaf descendant, single pointer-- 1296 01:10:01,930 --> 01:10:09,260 this'll be linear-- from every node. 1297 01:10:12,740 --> 01:10:15,510 OK so I start at x. 1298 01:10:15,510 --> 01:10:18,670 I jump down to an arbitrary leaf, say this one. 1299 01:10:18,670 --> 01:10:22,950 And now I have to do a query. 1300 01:10:25,800 --> 01:10:33,540 Jump down, and let's say I jumped down by d. 1301 01:10:33,540 --> 01:10:40,770 Then my k becomes k plus d, right. 1302 01:10:40,770 --> 01:10:42,960 If I went down by d, and I want to go up 1303 01:10:42,960 --> 01:10:46,440 by k from my original point, now I have to go up by k plus d. 1304 01:10:46,440 --> 01:10:50,010 But hey, we know how to go up from any node that 1305 01:10:50,010 --> 01:10:51,010 has jump pointers. 1306 01:10:51,010 --> 01:10:55,800 So now we have a new node, a leaf. 1307 01:10:55,800 --> 01:11:01,020 So it has a jump pointer, has jump pointers, upward. 1308 01:11:01,020 --> 01:11:04,890 So we follow that one jump pointer to get us halfway there 1309 01:11:04,890 --> 01:11:06,660 from our new starting point. 1310 01:11:06,660 --> 01:11:08,340 We follow one ladder thing, and we 1311 01:11:08,340 --> 01:11:13,530 can get to the level ancestor k plus d from the leaf, 1312 01:11:13,530 --> 01:11:16,270 and that's the level ancestor k from x. 1313 01:11:16,270 --> 01:11:19,069 OK, this is like a reduction to the leaf situation. 1314 01:11:19,069 --> 01:11:21,610 We really don't have to support queries from arbitrary nodes. 
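The reduction from an arbitrary node to a leaf can be sketched as follows. This is only an illustration: it assumes nodes are numbered so that `parent[v] < v` (root is node 0 with parent `-1`), and `la_from_leaf` is a hypothetical stand-in for whatever structure answers level-ancestor queries starting at leaves, such as jump pointers plus ladders.

```python
def any_leaf_below(parent):
    """For every node, pick some leaf in its subtree.

    Assumes parent[v] < v for all non-root v (illustrative numbering).
    Returns (leaf, down): leaf[v] is a leaf descendant of v (v itself
    if v is a leaf) and down[v] is how many levels below v it sits.
    """
    n = len(parent)
    has_child = [False] * n
    for p in parent:
        if p >= 0:
            has_child[p] = True
    leaf = [v if not has_child[v] else -1 for v in range(n)]
    down = [0] * n
    for v in range(n - 1, 0, -1):        # children before parents
        p = parent[v]
        if leaf[p] == -1:                # first child seen wins; any leaf works
            leaf[p] = leaf[v]
            down[p] = down[v] + 1
    return leaf, down


def level_ancestor(x, k, leaf, down, la_from_leaf):
    """kth ancestor of x, answered as a query from a leaf below x."""
    # Going down d levels first means we must come back up k + d levels.
    return la_from_leaf(leaf[x], k + down[x])
```

Storing one leaf pointer and one distance per node costs linear space, so only the leaves need the expensive jump-pointer tables.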
1315 01:11:21,610 --> 01:11:24,030 Just go down to a leaf and then solve the problem 1316 01:11:24,030 --> 01:11:25,840 from the leaf. 1317 01:11:25,840 --> 01:11:26,340 OK. 1318 01:11:29,350 --> 01:11:31,840 OK, so now, if the number of leaves is small, 1319 01:11:31,840 --> 01:11:32,880 my space will get small. 1320 01:11:32,880 --> 01:11:34,540 How small does l have to be? 1321 01:11:34,540 --> 01:11:36,310 n divided by log n. 1322 01:11:36,310 --> 01:11:39,180 Interesting. 1323 01:11:39,180 --> 01:11:43,260 If I could get the top structure to not have n over log n nodes, 1324 01:11:43,260 --> 01:11:44,250 that's not possible. 1325 01:11:44,250 --> 01:11:47,070 I can, at best, get to n minus log n nodes. 1326 01:11:47,070 --> 01:11:50,700 But if I could get it down to n over log n leaves, that 1327 01:11:50,700 --> 01:11:52,500 would be enough to make this linear space, 1328 01:11:52,500 --> 01:11:54,930 and indeed, I can. 1329 01:11:54,930 --> 01:11:59,340 This is a technique called tree trimming, or I call it that. 1330 01:11:59,340 --> 01:12:01,540 I don't know if anyone else does. 1331 01:12:01,540 --> 01:12:04,542 But I think I've called it that in enough papers 1332 01:12:04,542 --> 01:12:06,000 that we're allowed to call it that. 1333 01:12:13,420 --> 01:12:15,670 Originally invented by [? Al ?] [? Strip ?] and others 1334 01:12:15,670 --> 01:12:18,620 for a particular data structure. 1335 01:12:18,620 --> 01:12:19,820 There's many versions of it. 1336 01:12:19,820 --> 01:12:21,830 We will see other versions in future lectures, 1337 01:12:21,830 --> 01:12:29,340 but here's the version you need for this problem. 1338 01:12:47,920 --> 01:12:50,470 OK, here's the plan. 1339 01:12:50,470 --> 01:12:59,820 I have a tree and I want to identify 1340 01:12:59,820 --> 01:13:04,890 all the maximally deep nodes that have at least 1/4 log n 1341 01:13:04,890 --> 01:13:06,505 nodes below them.
1342 01:13:06,505 --> 01:13:08,130 This will seem weird, because we really 1343 01:13:08,130 --> 01:13:09,670 care about leaves, and so on. 1344 01:13:09,670 --> 01:13:15,240 So there's stuff hanging off here, whatever. 1345 01:13:15,240 --> 01:13:18,220 I guess I'm thinking of that as one big tree. 1346 01:13:18,220 --> 01:13:21,180 No, actually I'm not. 1347 01:13:21,180 --> 01:13:22,770 I do need to separate these out. 1348 01:13:25,470 --> 01:13:28,230 But one of these nodes could have arbitrarily many children. 1349 01:13:28,230 --> 01:13:29,920 We have no idea. 1350 01:13:29,920 --> 01:13:31,250 It's an arbitrary tree. 1351 01:13:34,740 --> 01:13:38,310 OK, and what I know is that each of these triangles 1352 01:13:38,310 --> 01:13:42,750 has size less than 1/4 log n. 1353 01:13:42,750 --> 01:13:46,020 Because otherwise, this node was not maximally deep. 1354 01:13:46,020 --> 01:13:53,919 So if this had size greater than or equal to 1/4 log n, 1355 01:13:53,919 --> 01:13:56,460 then that would have been the node where I cut, not this one. 1356 01:13:56,460 --> 01:13:58,920 So I'm circling the nodes that I cut below, 1357 01:13:58,920 --> 01:14:00,240 so meaning I cut these edges. 1358 01:14:03,050 --> 01:14:05,830 OK, so these things have size less than 1/4 log n, 1359 01:14:05,830 --> 01:14:11,880 but these nodes have at least 1/4 log n nodes below them. 1360 01:14:11,880 --> 01:14:15,510 So how many of these circle nodes are there? 1361 01:14:15,510 --> 01:14:30,120 Well, at most, 4 n over log n such nodes, right, 1362 01:14:30,120 --> 01:14:33,300 because I can charge this node to at least 1/4 1363 01:14:33,300 --> 01:14:36,750 log n nodes that disappear in the top structure. 1364 01:14:40,010 --> 01:14:43,650 But these things become the leaves, right. 1365 01:14:43,650 --> 01:14:45,930 If I cut all the edges going down from there, 1366 01:14:45,930 --> 01:14:47,880 that makes it a leaf. 1367 01:14:47,880 --> 01:14:50,720 And they're the only leaves.
1368 01:14:50,720 --> 01:14:52,090 Are they the only leaves? 1369 01:14:52,090 --> 01:14:53,730 Yeah. 1370 01:14:53,730 --> 01:14:57,000 If you look at a leaf, then it has size less than 1/4 log n. 1371 01:14:57,000 --> 01:14:59,050 So you will cut above it somewhere. 1372 01:14:59,050 --> 01:15:01,540 So every old leaf will be down here, 1373 01:15:01,540 --> 01:15:05,270 and the only new leaves will be the cut nodes. 1374 01:15:05,270 --> 01:15:05,820 OK. 1375 01:15:05,820 --> 01:15:12,330 So we have order n over log n leaves. 1376 01:15:12,330 --> 01:15:14,057 Yes, good. 1377 01:15:14,057 --> 01:15:14,640 So it's funny. 1378 01:15:14,640 --> 01:15:17,070 We're cutting according to counting nodes, 1379 01:15:17,070 --> 01:15:18,660 descendants, not leaves. 1380 01:15:18,660 --> 01:15:20,870 Won't work if you cut with leaves-- 1381 01:15:20,870 --> 01:15:21,630 cut with nodes. 1382 01:15:21,630 --> 01:15:24,171 But then the thing that we care about is the number of leaves 1383 01:15:24,171 --> 01:15:25,660 went down. 1384 01:15:25,660 --> 01:15:28,530 That will be enough. 1385 01:15:28,530 --> 01:15:29,880 Great. 1386 01:15:29,880 --> 01:15:38,100 So up here, we can afford to use 5, the tuned jump pointer, 1387 01:15:38,100 --> 01:15:41,730 combined with ladder structure. 1388 01:15:41,730 --> 01:15:45,310 Because this only costs l log n. 1389 01:15:45,310 --> 01:15:48,930 l is now n over log n, so the log n's cancel. 1390 01:15:48,930 --> 01:15:51,720 So linear space to store the jump pointers 1391 01:15:51,720 --> 01:15:53,910 from these circled nodes. 1392 01:15:53,910 --> 01:15:56,370 So if our query is anywhere up here, 1393 01:15:56,370 --> 01:15:58,840 then we go to a descendant leaf in the top structure. 1394 01:15:58,840 --> 01:16:00,420 And we can go wherever we need to go. 
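The trimming step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lecture's own code: the tree is given as child lists (`children[v]` is a hypothetical representation), "nodes below" is taken to mean the subtree size counting the node itself, and the function name `trim_tree` is made up for this sketch. A node is circled when its subtree has at least 1/4 log n nodes and none of its children's subtrees do.

```python
import math

def trim_tree(children, root=0):
    """Return (cut, size, threshold), where cut is the set of maximally deep
    nodes whose subtree (counting the node itself) has >= (1/4) log2(n) nodes."""
    n = len(children)
    threshold = 0.25 * math.log2(n)

    # Iterative preorder; walking it in reverse visits children before parents.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    # Subtree sizes, accumulated bottom-up.
    size = [1] * n
    for v in reversed(order):
        for c in children[v]:
            size[v] += size[c]

    # Circle a node if it is big enough but every child subtree is too small.
    cut = {
        v for v in range(n)
        if size[v] >= threshold and all(size[c] < threshold for c in children[v])
    }
    return cut, size, threshold
```

Because no circled node is a descendant of another, their subtrees are disjoint, so there can be at most n / (1/4 log n) = 4n / log n of them, which is the charging argument from the lecture.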
1395 01:16:03,300 --> 01:16:05,506 If our query is in one of the little trees 1396 01:16:05,506 --> 01:16:08,130 at the bottom, which are small, they're only 1/4 log n, 1397 01:16:08,130 --> 01:16:10,770 so we're going to use a lookup table. 1398 01:16:10,770 --> 01:16:12,845 Either the answer is inside the triangle, 1399 01:16:12,845 --> 01:16:15,510 in which case, we really need to query that structure. 1400 01:16:15,510 --> 01:16:18,300 Or it's up here. 1401 01:16:18,300 --> 01:16:21,690 If it's up here, we just need to know, basically, 1402 01:16:21,690 --> 01:16:25,660 if every node down here stores a pointer to the dot above it. 1403 01:16:25,660 --> 01:16:28,070 Then we can first go there and see, is that too high? 1404 01:16:28,070 --> 01:16:30,210 If it's too high, then our answer is in here. 1405 01:16:30,210 --> 01:16:31,680 If it's not too high, then we just 1406 01:16:31,680 --> 01:16:34,330 do the corresponding query in structure 5. 1407 01:16:34,330 --> 01:16:36,630 OK, so the last remaining thing is 1408 01:16:36,630 --> 01:16:40,240 to solve a query that stays entirely within a triangle, so 1409 01:16:40,240 --> 01:16:45,555 a bottom structure, and that's where we use lookup tables. 1410 01:16:56,740 --> 01:17:00,026 Again, things are going to be similar to last time 1411 01:17:00,026 --> 01:17:01,600 except for now, to step 7. 1412 01:17:04,780 --> 01:17:08,410 But it's a little bit messier because instead of arrays, 1413 01:17:08,410 --> 01:17:09,760 we have trees. 1414 01:17:09,760 --> 01:17:13,870 And here it's like we graduate from baby [INAUDIBLE] which is 1415 01:17:13,870 --> 01:17:16,090 how many plus or minus 1 strings there are-- 1416 01:17:16,090 --> 01:17:20,050 power of 2-- to how many trees are there. 1417 01:17:20,050 --> 01:17:23,890 Anyone know how many trees on n nodes there are? 1418 01:17:23,890 --> 01:17:24,700 One word answer. 1419 01:17:28,220 --> 01:17:29,436 No. 1420 01:17:29,436 --> 01:17:30,660 Nice.
1421 01:17:30,660 --> 01:17:32,710 That is a correct one word answer. 1422 01:17:32,710 --> 01:17:34,150 Very good. 1423 01:17:34,150 --> 01:17:38,620 Not the one I had in mind, but anyone else? 1424 01:17:54,630 --> 01:17:55,339 Nope. 1425 01:17:55,339 --> 01:17:56,630 You're thinking n to the n. 1426 01:17:56,630 --> 01:17:58,010 That would be bad. 1427 01:17:58,010 --> 01:18:00,200 We could not afford that, because log n to log n 1428 01:18:00,200 --> 01:18:02,101 is super polynomial. 1429 01:18:02,101 --> 01:18:03,350 Fortunately it's not that big. 1430 01:18:03,350 --> 01:18:03,850 Hmm? 1431 01:18:03,850 --> 01:18:04,592 AUDIENCE: 1432 01:18:04,592 --> 01:18:06,050 PROFESSOR: It's roughly 4 to the n. 1433 01:18:06,050 --> 01:18:07,966 The correct answer-- I mean the exact answer-- 1434 01:18:07,966 --> 01:18:11,637 is called the Catalan number, which didn't tell you much. 1435 01:18:11,637 --> 01:18:13,220 I didn't write it down, but I'm pretty 1436 01:18:13,220 --> 01:18:21,770 sure it is 2 n prime choose n prime times 1 over n prime plus 1 1437 01:18:21,770 --> 01:18:24,380 ish? 1438 01:18:24,380 --> 01:18:25,370 Don't quote me on that. 1439 01:18:25,370 --> 01:18:27,380 It's roughly that. 1440 01:18:27,380 --> 01:18:28,370 Might be exactly that. 1441 01:18:28,370 --> 01:18:30,970 Someone with internet can check. 1442 01:18:30,970 --> 01:18:33,460 But it is at most 4 to the n prime. 1443 01:18:33,460 --> 01:18:35,210 The computer science answer is 4 to the n. 1444 01:18:35,210 --> 01:18:37,460 Indeed. 1445 01:18:37,460 --> 01:18:39,260 It's just some asymptotics here. 1446 01:18:39,260 --> 01:18:40,220 Why is it 4 to the n? 1447 01:18:40,220 --> 01:18:42,440 4 to the n you could also write as 2 to the 2 n 1448 01:18:42,440 --> 01:18:44,754 prime, which is-- 1449 01:18:44,754 --> 01:18:46,670 first, let's check this is good, and then I'll 1450 01:18:46,670 --> 01:18:50,060 explain why this is true in a computer science way.
1451 01:18:50,060 --> 01:18:51,930 So we got 1/4 log n up here. 1452 01:18:51,930 --> 01:18:55,910 So the one 2 cancels with one 2 up here. 1453 01:18:55,910 --> 01:18:57,850 So we have 2 to the 1/2 log n. 1454 01:18:57,850 --> 01:19:00,095 This is our good friend root n. 1455 01:19:00,095 --> 01:19:02,750 Root n is just something that's n to the something, 1456 01:19:02,750 --> 01:19:05,700 but is n to the something less than 1. 1457 01:19:05,700 --> 01:19:07,175 So we can afford some log factors. 1458 01:19:10,370 --> 01:19:13,970 Why are there only 2 to the 2 n prime trees? 1459 01:19:13,970 --> 01:19:17,630 One way to see that is you can encode a tree using 2n bits. 1460 01:19:17,630 --> 01:19:20,360 If I have an n node tree, I can encode it with 2n bits. 1461 01:19:20,360 --> 01:19:21,710 How? 1462 01:19:21,710 --> 01:19:23,720 Do an Euler tour. 1463 01:19:23,720 --> 01:19:26,660 And all you really need to know from an Euler tour 1464 01:19:26,660 --> 01:19:28,970 to reconstruct the tree is at each step, 1465 01:19:28,970 --> 01:19:30,179 did I go down or did I go up? 1466 01:19:30,179 --> 01:19:31,719 Those are the only things you can do. 1467 01:19:31,719 --> 01:19:33,530 If you went down, it's to a new child. 1468 01:19:33,530 --> 01:19:35,850 If you went up, it's to an old node. 1469 01:19:35,850 --> 01:19:38,562 So if I told you a sequence of bits 1470 01:19:38,562 --> 01:19:40,520 for every step in the Euler tour, did I go down 1471 01:19:40,520 --> 01:19:43,950 or did I go up, you can reconstruct the tree. 1472 01:19:43,950 --> 01:19:45,450 Now how many bits do I have to do? 1473 01:19:45,450 --> 01:19:48,080 Well, twice the number of edges in the tree, 1474 01:19:48,080 --> 01:19:49,610 because the length of an Euler tour 1475 01:19:49,610 --> 01:19:51,318 is twice the number of edges in the tree. 1476 01:19:51,318 --> 01:19:53,810 So 2 n bits are enough to encode any tree. 
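The Euler tour encoding just described can be written out as a short Python sketch. The child-list representation and the function names `encode` and `decode` are assumptions made for this illustration, with '1' standing for "went down to a new child" and '0' for "went back up". Strictly speaking the bit string has length 2(n - 1), twice the number of edges, which is within the 2n bits quoted in the lecture.

```python
def encode(children, root=0):
    """Euler-tour bit string of a rooted ordered tree: '1' = down, '0' = up."""
    bits = []
    # Explicit stack of (node, index of next child to visit).
    stack = [(root, 0)]
    while stack:
        v, i = stack.pop()
        if i < len(children[v]):
            stack.append((v, i + 1))        # come back to v for its next child
            bits.append("1")                # step down to a new child
            stack.append((children[v][i], 0))
        elif stack:
            bits.append("0")                # v is finished, step up to its parent
    return "".join(bits)


def decode(bits):
    """Rebuild child lists from the bits; nodes are numbered in first-visit order."""
    children = [[]]
    path = [0]                              # current root-to-node path
    for b in bits:
        if b == "1":
            children.append([])
            new = len(children) - 1
            children[path[-1]].append(new)
            path.append(new)
        else:
            path.pop()
    return children
```

For example, the five-node tree `[[1, 4], [2, 3], [], [], []]` encodes to an 8-bit string, and decoding that string gives back the same child lists, since the nodes here are already numbered in preorder.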
1477 01:19:53,810 --> 01:19:55,790 That's the computer science information 1478 01:19:55,790 --> 01:19:57,560 theoretic way to prove it. 1479 01:19:57,560 --> 01:19:59,190 You could also do it from this formula, 1480 01:19:59,190 --> 01:20:01,440 but then you'd have to know why the formula's correct, 1481 01:20:01,440 --> 01:20:03,760 and that's messier. 1482 01:20:03,760 --> 01:20:06,150 Cool. 1483 01:20:06,150 --> 01:20:07,890 So we're almost done. 1484 01:20:07,890 --> 01:20:12,480 We have root n possible different structures down here. 1485 01:20:12,480 --> 01:20:14,610 We've got n over log n of them or-- 1486 01:20:14,610 --> 01:20:15,132 maybe. 1487 01:20:15,132 --> 01:20:17,340 It's a little harder to know exactly how many of them 1488 01:20:17,340 --> 01:20:19,440 there are, but I don't care. 1489 01:20:19,440 --> 01:20:21,310 There's only root n different types, 1490 01:20:21,310 --> 01:20:24,490 and so I only need to store a lookup table for each type. 1491 01:20:24,490 --> 01:20:32,250 The number of queries is order log squared n again, 1492 01:20:32,250 --> 01:20:36,960 because our structures are of size order log n, 1493 01:20:36,960 --> 01:20:39,470 and the answer to a query is again, 1494 01:20:39,470 --> 01:20:44,620 order log log n bits, because there's only log 1495 01:20:44,620 --> 01:20:47,730 n different nodes to point to. 1496 01:20:47,730 --> 01:20:58,260 And so the total space is order root n times log squared n 1497 01:20:58,260 --> 01:21:01,650 times log log n for the lookup table. 1498 01:21:01,650 --> 01:21:05,800 And then each of these triangles stores a pointer, 1499 01:21:05,800 --> 01:21:08,010 or I guess, every node in here stores a pointer 1500 01:21:08,010 --> 01:21:14,490 to what tree we're in, or what type of tree we have, 1501 01:21:14,490 --> 01:21:17,250 and also what node in that tree we are in.
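Written out as formulas, the counting argument and the table size quoted above look roughly as follows. The symbol n' for the size of a bottom tree follows the lecture; the Catalan formula is the standard one, and which index exactly counts rooted ordered trees on n' nodes does not matter here, because only the 4^{n'} upper bound is used.

```latex
\[
  C_{n'} \;=\; \frac{1}{n'+1}\binom{2n'}{n'} \;\le\; 4^{n'} \;=\; 2^{2n'},
  \qquad
  n' \le \tfrac14 \log n
  \;\Longrightarrow\;
  2^{2n'} \le 2^{\frac12 \log n} = \sqrt{n}.
\]
\[
  \underbrace{\sqrt{n}}_{\text{tree types}}
  \cdot
  \underbrace{O(\log^2 n)}_{\text{queries } (x,\,k)}
  \cdot
  \underbrace{O(\log\log n)}_{\text{bits per answer}}
  \;=\; o(n).
\]
```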
1502 01:21:17,250 --> 01:21:19,340 So every guy in here-- 1503 01:21:19,340 --> 01:21:20,940 because that's not part of the query-- 1504 01:21:20,940 --> 01:21:23,970 has to store a little bit more specific pointer 1505 01:21:23,970 --> 01:21:25,260 into this table. 1506 01:21:25,260 --> 01:21:27,870 It actually tells you what the query part is, 1507 01:21:27,870 --> 01:21:30,390 or the first part of the query, the node x. 1508 01:21:30,390 --> 01:21:33,780 Then the table also is parameterized by k, 1509 01:21:33,780 --> 01:21:37,350 so one of these logs is which node you're querying. 1510 01:21:37,350 --> 01:21:39,300 The other log is now the value k, 1511 01:21:39,300 --> 01:21:41,676 but again, you never go up higher than log n. 1512 01:21:41,676 --> 01:21:43,050 If you went up higher than log n, 1513 01:21:43,050 --> 01:21:44,592 then you'd be in the 5 structure, 1514 01:21:44,592 --> 01:21:46,050 so if you just do a query up there, 1515 01:21:46,050 --> 01:21:48,430 you don't need a query in the bottom. 1516 01:21:48,430 --> 01:21:48,990 OK. 1517 01:21:48,990 --> 01:21:50,760 So there's only that many queries, 1518 01:21:50,760 --> 01:21:55,550 and so space for this lookup table is little o of n again. 1519 01:21:55,550 --> 01:21:58,490 And so we're dominated by space for these pointers 1520 01:21:58,490 --> 01:22:00,530 and for the space up here, which is linear. 1521 01:22:00,530 --> 01:22:04,020 So linear space, constant query. 1522 01:22:04,020 --> 01:22:06,398 Boom. 1523 01:22:06,398 --> 01:22:07,362 Any questions? 1524 01:22:13,150 --> 01:22:15,190 I have an open question, maybe. 1525 01:22:15,190 --> 01:22:17,460 I think it's open. 1526 01:22:17,460 --> 01:22:22,900 So what if you want to do dynamic, 30 seconds of dynamic? 1527 01:22:22,900 --> 01:22:26,590 For LCA, it's known how to do dynamic LCA 1528 01:22:26,590 --> 01:22:28,570 with constant-time operations.
1529 01:22:28,570 --> 01:22:30,890 The operations are add a leaf-- 1530 01:22:30,890 --> 01:22:32,530 we can add another leaf-- 1531 01:22:32,530 --> 01:22:34,330 given an edge. 1532 01:22:34,330 --> 01:22:38,270 Subdivide that edge into that, and also the reverse. 1533 01:22:38,270 --> 01:22:42,850 So I can erase a guy, put the edge back, delete a leaf, 1534 01:22:42,850 --> 01:22:44,230 those sorts of things. 1535 01:22:44,230 --> 01:22:47,500 Those operations can all be done in constant time for LCA. 1536 01:22:47,500 --> 01:22:49,510 What about level ancestor? 1537 01:22:49,510 --> 01:22:50,570 I have no idea. 1538 01:22:50,570 --> 01:22:53,050 Maybe we'll work on it today. 1539 01:22:53,050 --> 01:22:54,444 That's it.