1 00:00:00,060 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high quality educational resources for free. 5 00:00:10,730 --> 00:00:13,340 To make a donation or view additional materials 6 00:00:13,340 --> 00:00:17,236 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,236 --> 00:00:17,861 at ocw.mit.edu. 8 00:00:20,915 --> 00:00:21,790 PROFESSOR: All right. 9 00:00:21,790 --> 00:00:23,780 Welcome back to 6.046. 10 00:00:23,780 --> 00:00:24,611 AUDIENCE: Woohoo. 11 00:00:24,611 --> 00:00:26,860 PROFESSOR: Are you guys ready to learn an awesome data 12 00:00:26,860 --> 00:00:27,440 structure? 13 00:00:27,440 --> 00:00:28,316 AUDIENCE: Woohoo. 14 00:00:28,316 --> 00:00:31,120 PROFESSOR: Yeah, let's do it. 15 00:00:31,120 --> 00:00:34,895 This is a data structure named after a human being, Peter van 16 00:00:34,895 --> 00:00:36,640 Emde Boas. 17 00:00:36,640 --> 00:00:40,390 I was just corresponding with him yesterday. 18 00:00:40,390 --> 00:00:43,380 And in the '70s, he invented this really cool data 19 00:00:43,380 --> 00:00:43,880 structure. 20 00:00:43,880 --> 00:00:46,130 It's super fast. It's amazing. 21 00:00:46,130 --> 00:00:48,300 It's actually pretty simple to implement. 22 00:00:48,300 --> 00:00:51,390 And it's used a lot, in practice, in network routers, 23 00:00:51,390 --> 00:00:53,230 among other things. 24 00:00:53,230 --> 00:00:54,830 And we're going to cover it today. 25 00:00:54,830 --> 00:00:58,730 So let me first tell you what it does. 26 00:00:58,730 --> 00:01:00,290 So it's an old data structure. 27 00:01:00,290 --> 00:01:03,110 But I feel like it's taken us decades to really understand. 28 00:01:03,110 --> 00:01:03,610 Question. 29 00:01:03,610 --> 00:01:05,519 AUDIENCE: Your mic's not on.
30 00:01:05,519 --> 00:01:07,870 PROFESSOR: In what sense? 31 00:01:07,870 --> 00:01:09,190 It's not amplified. 32 00:01:09,190 --> 00:01:11,490 It's just for the cameras. 33 00:01:11,490 --> 00:01:15,410 So it's taken us decades, really, 34 00:01:15,410 --> 00:01:18,270 to understand this data structure, exactly how it works 35 00:01:18,270 --> 00:01:21,530 and why it's useful. 36 00:01:21,530 --> 00:01:27,900 The problem it's solving is what you might call a predecessor 37 00:01:27,900 --> 00:01:28,560 problem. 38 00:01:28,560 --> 00:01:30,760 It's very similar to the sort of problem 39 00:01:30,760 --> 00:01:32,310 that binary search trees solve. 40 00:01:32,310 --> 00:01:36,910 But we're going to do it faster, but in a somewhat different 41 00:01:36,910 --> 00:01:41,320 model, in that the elements we're going to be storing 42 00:01:41,320 --> 00:01:45,030 are not just things that we know how to compare. 43 00:01:45,030 --> 00:01:46,580 That would be the comparison model. 44 00:01:46,580 --> 00:01:48,250 We're storing integers. 45 00:01:48,250 --> 00:01:53,220 And the integers come from a universe, U, of size little u. 46 00:01:53,220 --> 00:01:55,770 And we'll assume that they're non-negative, so from 0 47 00:01:55,770 --> 00:01:56,410 to u minus 1. 48 00:01:56,410 --> 00:01:58,030 Although you could support negative 49 00:01:58,030 --> 00:02:00,730 integers without much more effort. 50 00:02:00,730 --> 00:02:03,860 And the operations we want to support, 51 00:02:03,860 --> 00:02:06,430 we're storing a set of n of those elements. 52 00:02:06,430 --> 00:02:14,185 We want to do insert, delete, and successor. 53 00:02:20,230 --> 00:02:22,397 So these are operations you should be familiar with. 54 00:02:22,397 --> 00:02:23,813 You should know how to solve these 55 00:02:23,813 --> 00:02:26,640 in log n time per operation with a balanced binary search tree, 56 00:02:26,640 --> 00:02:27,361 like AVL trees. 
57 00:02:27,361 --> 00:02:29,610 You want to add something to the set, delete something 58 00:02:29,610 --> 00:02:33,490 from the set, or given a value I want 59 00:02:33,490 --> 00:02:38,730 to know the next largest value that is in the set. 60 00:02:38,730 --> 00:02:43,790 So if you draw that as a one dimensional thing, 61 00:02:43,790 --> 00:02:48,210 you've got some items which are in your set. 62 00:02:48,210 --> 00:02:50,085 And then, you have a query. 63 00:02:52,870 --> 00:02:56,130 So you ask for the successor of this value. 64 00:02:56,130 --> 00:02:59,510 Then you're asking for, what is the next value that's 65 00:02:59,510 --> 00:03:00,010 in the set? 66 00:03:00,010 --> 00:03:02,855 So you want to return this item. 67 00:03:02,855 --> 00:03:04,730 OK, predecessor would be the symmetric thing. 68 00:03:04,730 --> 00:03:06,110 But if you could solve successor, 69 00:03:06,110 --> 00:03:08,280 you could usually solve predecessor. 70 00:03:08,280 --> 00:03:10,030 So we'll focus on these three operations, 71 00:03:10,030 --> 00:03:11,501 although, in the textbook, you'll 72 00:03:11,501 --> 00:03:13,000 see there are lots of operations you 73 00:03:13,000 --> 00:03:15,740 could do with van Emde Boas. 74 00:03:15,740 --> 00:03:17,110 So far so good. 75 00:03:17,110 --> 00:03:19,570 We know how to do this in log n time. 76 00:03:19,570 --> 00:03:27,750 We are going to do it in log log u time. 77 00:03:27,750 --> 00:03:30,230 Woah, amazing. 78 00:03:30,230 --> 00:03:33,720 So an extra log, but we're cheating a little bit, in that 79 00:03:33,720 --> 00:03:35,840 we're replacing n with u. 80 00:03:35,840 --> 00:03:39,230 Now in a lot of applications, u is pretty reasonable, 81 00:03:39,230 --> 00:03:42,335 like 2 to the 32 or 2 to the 64, depending 82 00:03:42,335 --> 00:03:44,620 on what kind of integers you usually work with. 
83 00:03:44,620 --> 00:03:47,830 So log log of that is usually really tiny, and often smaller 84 00:03:47,830 --> 00:03:49,630 than log n. 85 00:03:49,630 --> 00:03:54,740 So in particular, on the theory side, 86 00:03:54,740 --> 00:03:57,280 for example, if u is a polynomial in n, 87 00:03:57,280 --> 00:04:05,960 or even larger than that, you can support n to the polylog n. 88 00:04:05,960 --> 00:04:10,930 Then log log u is the same as log log 89 00:04:10,930 --> 00:04:16,040 n, up to constant factors. 90 00:04:16,040 --> 00:04:18,089 And so this is an exponential improvement 91 00:04:18,089 --> 00:04:20,805 over regular balanced binary search trees. 92 00:04:20,805 --> 00:04:27,120 OK, so super fast, and it's also pretty clean and simple, 93 00:04:27,120 --> 00:04:29,780 though it'll take us a little while to get there. 94 00:04:29,780 --> 00:04:32,480 One application for this, as I mentioned, 95 00:04:32,480 --> 00:04:35,080 is in network routers. 96 00:04:35,080 --> 00:04:38,610 And I believe most network routers use the van Emde Boas 97 00:04:38,610 --> 00:04:40,270 data structure these days, though just 98 00:04:40,270 --> 00:04:45,190 changed in the last decade or so. 99 00:04:45,190 --> 00:04:47,810 Network router, you have to store a routing table, which 100 00:04:47,810 --> 00:04:51,170 looks like, for IP range from this to this, 101 00:04:51,170 --> 00:04:54,030 please send your packets along this port. 102 00:04:54,030 --> 00:04:56,930 For IP range from this to this, send along this port. 103 00:04:56,930 --> 00:05:00,430 So if you mark the beginnings of those ranges 104 00:05:00,430 --> 00:05:06,520 as items in your set, and given an actual IP address, 105 00:05:06,520 --> 00:05:08,110 you want to know what range it's in, 106 00:05:08,110 --> 00:05:10,860 that is a predecessor or a successor problem. 107 00:05:10,860 --> 00:05:13,880 And so van Emde Boas lets you solve that really fast. 
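As a rough illustration of the routing-table idea above: if the start of each IP range is stored in sorted order, finding the range that contains an address is a predecessor query. The sketch below uses Python's `bisect` on a sorted list as a stand-in (log n per lookup, not van Emde Boas); the table entries, the port names, and the `lookup_port` helper are made up for the example.

```python
import bisect
import ipaddress

# Hypothetical routing table: (start of IP range, outgoing port), sorted by start.
TABLE = [
    ("0.0.0.0", "port0"),
    ("10.0.0.0", "port1"),
    ("10.1.0.0", "port2"),
    ("192.168.0.0", "port3"),
]
# Range starts as plain integers in the universe 0 .. 2**32 - 1.
STARTS = [int(ipaddress.IPv4Address(start)) for start, _ in TABLE]


def lookup_port(address: str) -> str:
    """Return the port of the range whose start is the predecessor of address."""
    key = int(ipaddress.IPv4Address(address))
    # bisect_right - 1 is the index of the last start <= key, which is the
    # predecessor (counting the key itself when it is exactly a range start).
    i = bisect.bisect_right(STARTS, key) - 1
    return TABLE[i][1]
```

Because the first range starts at 0, every address has a predecessor, so the index is never negative in this sketch.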
108 00:05:13,880 --> 00:05:19,490 u, for IPv4, is only 2 to the 32. 109 00:05:19,490 --> 00:05:21,520 So that's super fast and practical. 110 00:05:21,520 --> 00:05:23,150 It's going to take like five operations 111 00:05:23,150 --> 00:05:27,740 to do log log 2 to the 32. 112 00:05:27,740 --> 00:05:30,750 So that's it. 113 00:05:30,750 --> 00:05:32,250 And as you may know, network routers 114 00:05:32,250 --> 00:05:33,669 are basically computers. 115 00:05:33,669 --> 00:05:35,960 And so they used to have a lot of specialized hardware. 116 00:05:35,960 --> 00:05:37,940 These days it's pretty general purpose. 117 00:05:37,940 --> 00:05:41,660 And so you want nice data structures, like the one 118 00:05:41,660 --> 00:05:43,060 we'll cover. 119 00:05:43,060 --> 00:05:46,660 OK, so we want to shoot for log log u. 120 00:05:46,660 --> 00:05:50,990 We're going to get there by a series of improvements 121 00:05:50,990 --> 00:05:52,510 on a very simple idea. 122 00:05:52,510 --> 00:05:54,830 This is not the original way that van Emde Boas 123 00:05:54,830 --> 00:05:56,570 got to this concept. 124 00:05:56,570 --> 00:05:58,200 But it's sort of the modern take on it. 125 00:05:58,200 --> 00:06:00,280 It's the one that's in the textbook. 126 00:06:00,280 --> 00:06:04,920 So the first question is, how might we get a log log u bound? 127 00:06:04,920 --> 00:06:06,497 Where might that come from? 128 00:06:06,497 --> 00:06:07,580 That's a question for you. 129 00:06:11,267 --> 00:06:12,225 This is just intuition. 130 00:06:22,244 --> 00:06:22,910 Any suggestions? 131 00:06:32,980 --> 00:06:34,520 We see logs all the time. 132 00:06:34,520 --> 00:06:35,140 So, yeah. 133 00:06:35,140 --> 00:06:37,797 AUDIENCE: You organize the height of a tree into a tree. 134 00:06:37,797 --> 00:06:38,630 PROFESSOR: Ah, good. 135 00:06:38,630 --> 00:06:40,930 You organize the height of the tree into a tree.
136 00:06:40,930 --> 00:06:47,560 So we normally think of a tree, let's say we have u down here. 137 00:06:47,560 --> 00:06:51,340 So the height is log u. 138 00:06:51,340 --> 00:06:55,810 So somehow, we want a binary search 139 00:06:55,810 --> 00:06:57,420 on the levels of this tree. 140 00:06:57,420 --> 00:06:59,750 Right, if we could kind of start in the middle level, 141 00:06:59,750 --> 00:07:03,760 and then decide whether we need to go up or down, 142 00:07:03,760 --> 00:07:05,870 it's totally unclear what that would mean. 143 00:07:05,870 --> 00:07:08,600 But in fact, that's exactly what van Emde Boas will do. 144 00:07:08,600 --> 00:07:12,450 So you can binary search-- I think 145 00:07:12,450 --> 00:07:18,785 we won't see that until the very end-- but on levels of a tree. 146 00:07:22,230 --> 00:07:23,355 So at least some intuition. 147 00:07:26,030 --> 00:07:29,800 Now let's think about this in terms of recurrences. 148 00:07:29,800 --> 00:07:35,070 There's a recurrence for binary search, which is usually 149 00:07:35,070 --> 00:07:42,580 you have k things, t of k is t of k over 2 plus order 1. 150 00:07:42,580 --> 00:07:44,070 You spend constant time to decide 151 00:07:44,070 --> 00:07:46,361 whether you should go left or right in a binary search, 152 00:07:46,361 --> 00:07:47,950 or in this case up and down somehow. 153 00:07:47,950 --> 00:07:50,140 And then you reduce to a problem of half the size. 154 00:07:50,140 --> 00:07:54,670 So this solves to log k. 155 00:07:54,670 --> 00:07:58,360 In our case, k is actually log u. 156 00:07:58,360 --> 00:08:03,780 So we want a recurrence that looks something like t of log u 157 00:08:03,780 --> 00:08:11,055 equals t of (log u)/2 plus order 1. 158 00:08:11,055 --> 00:08:13,680 OK, even if you don't believe in the binary search perspective, 159 00:08:13,680 --> 00:08:18,000 this is clearly a recurrence that solves to log log u. 160 00:08:18,000 --> 00:08:20,080 I'm just substituting k equals log u here.
161 00:08:20,080 --> 00:08:22,550 So that could be on the right track. 162 00:08:22,550 --> 00:08:24,010 Now, that's in terms of log u. 163 00:08:24,010 --> 00:08:26,850 What if I wanted to rewrite this recurrence in terms of u? 164 00:08:26,850 --> 00:08:28,270 What would I get? 165 00:08:28,270 --> 00:08:33,299 If I wanted to have this still solve to log log u, 166 00:08:33,299 --> 00:08:35,480 what should I write here? 167 00:08:50,010 --> 00:08:54,465 If I change the logarithm of a number by a factor of 2, 168 00:08:54,465 --> 00:08:55,215 how does u change? 169 00:08:57,294 --> 00:08:58,210 AUDIENCE: Square root. 170 00:08:58,210 --> 00:08:59,168 PROFESSOR: Square root. 171 00:09:05,060 --> 00:09:08,180 OK, so I've changed what the variable is here. 172 00:09:08,180 --> 00:09:10,150 But this is really the same recurrence. 173 00:09:10,150 --> 00:09:13,020 It will still solve to log log u. 174 00:09:13,020 --> 00:09:15,870 The number of times you have to apply square root to a number 175 00:09:15,870 --> 00:09:17,990 to get down to a constant is log log u. 176 00:09:17,990 --> 00:09:22,260 So this is some more intuition for how van Emde Boas is 177 00:09:22,260 --> 00:09:23,530 going to achieve log log u. 178 00:09:23,530 --> 00:09:28,490 And in fact, this is the primary intuition we'll be using. 179 00:09:28,490 --> 00:09:32,560 So what we would like is to somehow take our problem, 180 00:09:32,560 --> 00:09:35,710 which has size u, and split it into problems 181 00:09:35,710 --> 00:09:38,210 of size square root of u, so that we only 182 00:09:38,210 --> 00:09:40,150 have to recurse on one of them. 183 00:09:40,150 --> 00:09:42,960 And then, we'll get this recurrence. 184 00:09:42,960 --> 00:09:49,610 OK, that's where we're going to go. 185 00:09:49,610 --> 00:09:53,480 But we're going to start with a very simple data structure 186 00:09:53,480 --> 00:09:58,140 for representing a set of n numbers from the universe 0 187 00:09:58,140 --> 00:09:59,210 up to u minus 1.
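To make the square-root recurrence above concrete, here is a small sketch that counts how many times a universe size can be square-rooted before it shrinks to 2. The function name `sqrt_steps` and the choice to stop at 2 (rather than 1) are assumptions made for the illustration; for u a power of two whose exponent is itself a power of two, the count equals log log u.

```python
import math


def sqrt_steps(u: int) -> int:
    """Count repeated integer square roots needed to bring u down to at most 2."""
    steps = 0
    while u > 2:
        u = math.isqrt(u)  # one level of the recurrence T(u) = T(sqrt(u)) + O(1)
        steps += 1
    return steps


# 2**32 -> 2**16 -> 2**8 -> 2**4 -> 2**2 -> 2: five steps, log log 2**32 = 5.
print(sqrt_steps(2**32))  # 5
```

This matches the "about five operations" estimate for 32-bit keys: each level of recursion halves the number of bits.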
188 00:10:02,370 --> 00:10:05,530 And let's say, initially, our goal is for insert and delete 189 00:10:05,530 --> 00:10:08,190 to be constant time. 190 00:10:08,190 --> 00:10:09,690 But let's not worry about successor. 191 00:10:09,690 --> 00:10:11,490 Successor could take linear time. 192 00:10:11,490 --> 00:10:15,650 What would be a good data structure for storing items 193 00:10:15,650 --> 00:10:17,000 in this universe? 194 00:10:17,000 --> 00:10:19,157 I want u to be involved somehow. 195 00:10:19,157 --> 00:10:20,740 I don't just want to, like, store them 196 00:10:20,740 --> 00:10:24,660 in a linked list of items or a sorted array of items. 197 00:10:24,660 --> 00:10:28,430 I would like u to be involved, with insert 198 00:10:28,430 --> 00:10:29,650 and delete in constant time. 199 00:10:35,040 --> 00:10:35,725 Very simple. 200 00:10:45,633 --> 00:10:46,133 Yeah. 201 00:10:46,133 --> 00:10:47,582 AUDIENCE: Simply an array. 202 00:10:47,582 --> 00:10:48,790 PROFESSOR: In an array, yeah. 203 00:10:48,790 --> 00:10:51,514 What's the array indexed by? 204 00:10:51,514 --> 00:10:54,720 AUDIENCE: It would be index n. 205 00:10:54,720 --> 00:10:55,460 PROFESSOR: Sorry? 206 00:10:55,460 --> 00:10:56,870 AUDIENCE: By the index of n. 207 00:10:56,870 --> 00:10:59,964 PROFESSOR: The index of n, close. 208 00:10:59,964 --> 00:11:01,260 AUDIENCE: The value. 209 00:11:01,260 --> 00:11:02,027 PROFESSOR: Sorry? 210 00:11:02,027 --> 00:11:02,860 AUDIENCE: The value. 211 00:11:02,860 --> 00:11:04,490 PROFESSOR: The value, yeah. 212 00:11:04,490 --> 00:11:04,990 Good. 213 00:11:04,990 --> 00:11:12,820 So I want-- this is normally called a bit vector, where 214 00:11:12,820 --> 00:11:26,540 I want an array of size u, and for each cell in the array, 215 00:11:26,540 --> 00:11:27,730 I'm going to write 0 or 1. 216 00:11:27,730 --> 00:11:29,810 0 means absent. 217 00:11:29,810 --> 00:11:31,820 1 means present. 218 00:11:31,820 --> 00:11:32,570 It's in the set.
219 00:11:35,980 --> 00:11:40,060 So let me draw a picture, maybe over here. 220 00:11:55,410 --> 00:12:00,885 Let me take my example and give you a frisbee. 221 00:12:08,390 --> 00:12:10,520 Let me put it in the middle. 222 00:12:39,470 --> 00:12:43,960 So this is an example of a set with-- if I maybe highlight 223 00:12:43,960 --> 00:12:46,060 a little bit-- here's a 1. 224 00:12:48,580 --> 00:12:52,830 Here's a 1, and a 1, and a 1. 225 00:12:52,830 --> 00:12:57,260 So there are 4 elements in the set. 226 00:12:57,260 --> 00:12:58,860 The universe size is 16. 227 00:13:03,820 --> 00:13:08,230 n equals 4, in this particular example. 228 00:13:08,230 --> 00:13:12,930 If I want to insert into this set, I just change a 0 to a 1. 229 00:13:12,930 --> 00:13:15,460 If I want to delete from the set, I change a 1 to a 0. 230 00:13:15,460 --> 00:13:17,170 So those are constant time. 231 00:13:17,170 --> 00:13:17,670 Good. 232 00:13:26,220 --> 00:13:32,570 If I want to do a successor query, not so good. 233 00:13:32,570 --> 00:13:36,890 I might need to spend order u time. 234 00:13:36,890 --> 00:13:39,520 Maybe I asked for the successor of this item, 235 00:13:39,520 --> 00:13:42,280 and the only thing to do is just keep 236 00:13:42,280 --> 00:13:44,840 jumping until I get to a 1. 237 00:13:44,840 --> 00:13:47,510 In the worst case, there are almost u 0's 238 00:13:47,510 --> 00:13:50,900 in a row, or u minus n. 239 00:13:50,900 --> 00:13:51,980 So that's really slow. 240 00:13:51,980 --> 00:13:53,930 But this, in fact, will be our starting point. 241 00:13:53,930 --> 00:13:55,730 It may seem really silly. 242 00:13:55,730 --> 00:13:58,770 But it's actually a good starting point 243 00:13:58,770 --> 00:14:01,570 for van Emde Boas. 244 00:14:01,570 --> 00:14:16,010 So the second idea is, we're going to take our universe 245 00:14:16,010 --> 00:14:19,375 and split it into clusters. 246 00:14:22,260 --> 00:14:24,700 van Emde Boas, the person, likes to call these galaxies.
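The bit vector described above can be sketched in a few lines. This is a minimal illustration, not code from the lecture; the class name `BitVector` and the choice to return `None` when there is no successor are assumptions, and the sample elements used with it are arbitrary rather than the ones drawn on the board.

```python
class BitVector:
    """Set of integers from 0..u-1 stored as an array of u bits."""

    def __init__(self, u: int) -> None:
        self.bits = [0] * u  # 0 means absent, 1 means present

    def insert(self, x: int) -> None:
        self.bits[x] = 1  # constant time

    def delete(self, x: int) -> None:
        self.bits[x] = 0  # constant time

    def successor(self, x: int):
        """Smallest element strictly greater than x, or None; O(u) worst case."""
        for y in range(x + 1, len(self.bits)):
            if self.bits[y]:
                return y
        return None
```

The linear scan in `successor` is exactly the "keep jumping until I get to a 1" step, which is what the rest of the lecture sets out to speed up.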
247 00:14:24,700 --> 00:14:29,290 I think that's a nice name for pieces of the universe. 248 00:14:29,290 --> 00:14:31,487 But the textbook calls them clusters. 249 00:14:31,487 --> 00:14:33,070 Because they used to call it clusters. 250 00:14:33,070 --> 00:14:38,630 So now, it's a question of how big the clusters should be. 251 00:14:38,630 --> 00:14:41,860 But if I gave you this picture, and I 252 00:14:41,860 --> 00:14:44,420 want to think about these galaxies as separate chunks, 253 00:14:44,420 --> 00:14:45,930 and I ask for the successor of this, 254 00:14:45,930 --> 00:14:51,508 how could I possibly speed up the successor search? 255 00:14:51,508 --> 00:14:52,924 Yeah. 256 00:14:52,924 --> 00:14:58,130 AUDIENCE: You could form a tree for each cluster and connect-- 257 00:14:58,130 --> 00:15:01,240 PROFESSOR: You could form a tree here and store what at the-- 258 00:15:01,240 --> 00:15:02,073 [INTERPOSING VOICES] 259 00:15:02,073 --> 00:15:05,530 AUDIENCE: Could store an OR between the two bits. 260 00:15:05,530 --> 00:15:06,250 PROFESSOR: Cool. 261 00:15:06,250 --> 00:15:07,660 I like this. 262 00:15:07,660 --> 00:15:10,516 So I could store the OR of these two bits-- 263 00:15:10,516 --> 00:15:12,970 clean this up a little bit-- OR of these two bits, 264 00:15:12,970 --> 00:15:15,320 OR of these two bits, and so on. 265 00:15:19,096 --> 00:15:23,210 The OR is interesting, because this 0 bit, in particular, 266 00:15:23,210 --> 00:15:26,170 tells me there's nothing in here. 267 00:15:26,170 --> 00:15:28,800 So I should just be able to skip over it. 268 00:15:28,800 --> 00:15:32,330 So you're imagining a kind of binary search-ish thing. 269 00:15:32,330 --> 00:15:33,140 It's a good idea. 270 00:15:37,440 --> 00:15:39,470 So each node here, I'm just writing the OR 271 00:15:39,470 --> 00:15:40,303 of its two children. 272 00:15:44,710 --> 00:15:46,910 And in fact, you could do this all the way up. 273 00:15:46,910 --> 00:15:50,729 You could build an entire binary tree.
274 00:15:50,729 --> 00:15:52,270 But remember, what we're trying to do 275 00:15:52,270 --> 00:15:55,520 is a binary search on the levels of the tree. 276 00:15:55,520 --> 00:16:00,570 And so, in particular, I'm going to focus on this level. 277 00:16:00,570 --> 00:16:02,220 This is the middle level of that tree 278 00:16:02,220 --> 00:16:05,410 if I drew out the whole thing. 279 00:16:05,410 --> 00:16:08,760 And that level is interesting, because it's just summarizing-- 280 00:16:08,760 --> 00:16:11,826 is there anybody in here, is there anybody in this cluster, 281 00:16:11,826 --> 00:16:13,200 is there anybody in this cluster, 282 00:16:13,200 --> 00:16:15,440 is there anybody in this cluster. 283 00:16:15,440 --> 00:16:18,400 So we call this the summary vector. 284 00:16:22,820 --> 00:16:26,660 So we'll come back to your tree perspective at some point. 285 00:16:26,660 --> 00:16:29,834 That is a good big picture of what's going on. 286 00:16:29,834 --> 00:16:31,750 But at this level, I'm just going to say, well 287 00:16:31,750 --> 00:16:32,874 let's store the bit vector. 288 00:16:32,874 --> 00:16:36,570 Let's also store this summary vector. 289 00:16:36,570 --> 00:16:39,950 And now, when I want to find the successor of something, 290 00:16:39,950 --> 00:16:42,610 first I'll look inside the cluster. 291 00:16:42,610 --> 00:16:46,000 If I don't find my answer, I'll go up to the summary vector 292 00:16:46,000 --> 00:16:48,110 and find where is the next cluster that 293 00:16:48,110 --> 00:16:49,880 has something in it. 294 00:16:49,880 --> 00:16:51,740 And then I'll go into that cluster 295 00:16:51,740 --> 00:16:54,450 and look for the first one. 296 00:16:54,450 --> 00:16:59,560 OK, that's a good next step. 297 00:16:59,560 --> 00:17:07,280 So this will split the universe into clusters. 298 00:17:10,280 --> 00:17:14,885 How big should the clusters be to balance out? 299 00:17:14,885 --> 00:17:16,260 There's three searches I'm doing. 
300 00:17:16,260 --> 00:17:17,829 One is within a cluster. 301 00:17:17,829 --> 00:17:19,700 One is in the summary vector. 302 00:17:19,700 --> 00:17:23,422 And one is within another cluster. 303 00:17:23,422 --> 00:17:23,922 Yeah. 304 00:17:23,922 --> 00:17:24,922 AUDIENCE: Square root u. 305 00:17:24,922 --> 00:17:26,000 PROFESSOR: Square root u. 306 00:17:26,000 --> 00:17:26,500 Yeah. 307 00:17:26,500 --> 00:17:27,740 That will balance out. 308 00:17:27,740 --> 00:17:29,142 If they have square root of u size, 309 00:17:29,142 --> 00:17:31,350 then the number of clusters will be square root of u. 310 00:17:31,350 --> 00:17:32,620 So the search in the summary vector 311 00:17:32,620 --> 00:17:34,410 will be the same as the cost down here. 312 00:17:34,410 --> 00:17:35,910 Also we know that we kind of want 313 00:17:35,910 --> 00:17:38,000 to do square root of u recursion somehow. 314 00:17:38,000 --> 00:17:40,030 So this is not yet the recursive version. 315 00:17:40,030 --> 00:17:42,140 But square root of u is exactly right. 316 00:17:42,140 --> 00:17:44,350 And I owe some frisbees, sorry. 317 00:17:44,350 --> 00:17:46,710 Here's one frisbee. 318 00:17:46,710 --> 00:17:50,860 And yeah, cool. 319 00:17:50,860 --> 00:17:54,552 And I think you also get one. 320 00:17:54,552 --> 00:17:56,460 Sorry. 321 00:17:56,460 --> 00:18:01,430 So clusters have size square root 322 00:18:01,430 --> 00:18:04,260 of u, and there are square root of u of them. 323 00:18:04,260 --> 00:18:06,350 And, cool. 324 00:18:06,350 --> 00:18:10,250 So now, when I want to do an insert or a delete, 325 00:18:10,250 --> 00:18:13,340 it's still-- let's not worry about delete. 326 00:18:13,340 --> 00:18:14,580 That's a little tricky. 327 00:18:14,580 --> 00:18:16,170 To do an insert, it's still easy. 328 00:18:16,170 --> 00:18:18,960 If I insert into here, I set it to 1. 329 00:18:18,960 --> 00:18:23,210 And I check, if this is currently 0, I should also set that to 1.
330 00:18:23,210 --> 00:18:24,830 Now deleting would be tricky. 331 00:18:24,830 --> 00:18:27,920 To delete this guy and realize that there's nothing else, eh. 332 00:18:27,920 --> 00:18:30,890 Let's not worry about that until we do a lot more work. 333 00:18:30,890 --> 00:18:33,790 Let's just focus on insert and successor. 334 00:18:33,790 --> 00:18:40,940 So insert, with this strategy, is still constant time. 335 00:18:40,940 --> 00:18:44,730 It's two steps instead of one, but it's good. 336 00:18:44,730 --> 00:18:50,360 Successor does three things. 337 00:18:50,360 --> 00:18:56,230 First, we look, let's say, successor of x. 338 00:18:56,230 --> 00:18:58,705 First thing we do is look in x's cluster. 339 00:19:02,930 --> 00:19:06,860 Then, if we don't find what we're looking for, 340 00:19:06,860 --> 00:19:20,690 then we'll look for the next 1 bit in the summary vector, 341 00:19:20,690 --> 00:19:30,790 and then, we'll look for the first 1 in that cluster. 342 00:19:34,190 --> 00:19:35,190 So there are two cases. 343 00:19:35,190 --> 00:19:38,145 In the lucky case, we find the successor in the cluster 344 00:19:38,145 --> 00:19:39,670 that we started in. 345 00:19:39,670 --> 00:19:41,590 So that only takes root u time. 346 00:19:41,590 --> 00:19:44,100 If we're unlucky, we search in the summary. 347 00:19:44,100 --> 00:19:45,285 That takes root u time. 348 00:19:45,285 --> 00:19:46,660 And then we find the first 1 bit. 349 00:19:46,660 --> 00:19:47,880 That takes root u time. 350 00:19:47,880 --> 00:19:51,930 The whole thing is order square root of u, which is, of course, not very 351 00:19:51,930 --> 00:19:53,930 good, compared to log n. 352 00:19:53,930 --> 00:19:56,140 But it's a lot better than u, which 353 00:19:56,140 --> 00:19:59,070 is what our first method, the bit vector, took. 354 00:19:59,070 --> 00:20:01,230 So we've improved from u to square root of u. 355 00:20:01,230 --> 00:20:03,660 Now of course, the idea is to recurse.
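The three-step successor search above, on a bit vector plus a summary vector, might look like the following. It is a sketch under assumptions: u is a perfect square, delete is left out (as in the lecture at this point), and the class name `TwoLevelBits` is invented for the example.

```python
import math


class TwoLevelBits:
    """Bit vector split into sqrt(u) clusters, plus one summary bit per cluster."""

    def __init__(self, u: int) -> None:
        self.root = math.isqrt(u)
        assert self.root * self.root == u, "sketch assumes u is a perfect square"
        self.bits = [0] * u
        self.summary = [0] * self.root  # summary[i] = 1 iff cluster i is non-empty

    def insert(self, x: int) -> None:
        self.bits[x] = 1                   # step 1: set the bit
        self.summary[x // self.root] = 1   # step 2: mark x's cluster non-empty

    def successor(self, x: int):
        r = self.root
        c = x // r
        # 1. Look in the rest of x's own cluster: at most sqrt(u) cells.
        for y in range(x + 1, (c + 1) * r):
            if self.bits[y]:
                return y
        # 2. Look for the next 1 bit in the summary vector: at most sqrt(u) cells.
        for i in range(c + 1, r):
            if self.summary[i]:
                # 3. Look for the first 1 in that cluster: at most sqrt(u) cells.
                for y in range(i * r, (i + 1) * r):
                    if self.bits[y]:
                        return y
        return None
```

Each of the three scans touches at most square root of u cells, which gives the order square root of u bound for successor.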
356 00:20:03,660 --> 00:20:06,872 Instead of just doing a bit vector at each of these levels, 357 00:20:06,872 --> 00:20:08,580 we're going to recursively represent each 358 00:20:08,580 --> 00:20:11,669 of these clusters in this way. 359 00:20:11,669 --> 00:20:13,960 This is where things get a little magical, in the magic 360 00:20:13,960 --> 00:20:17,730 of divide and conquer. 361 00:20:17,730 --> 00:20:19,710 And then, we'll get t of square root of u 362 00:20:19,710 --> 00:20:23,070 instead of square root of u. 363 00:20:23,070 --> 00:20:26,280 And then we'll get a log log cost. 364 00:20:26,280 --> 00:20:33,210 So before I get there, let me give you 365 00:20:33,210 --> 00:20:40,460 a little bit of terminology and an example 366 00:20:40,460 --> 00:20:42,295 for dealing with clusters. 367 00:20:45,770 --> 00:20:47,980 OK, in general, remember the things 368 00:20:47,980 --> 00:20:50,580 we're searching for are just integers. 369 00:20:50,580 --> 00:20:53,510 And what we're talking about is essentially 370 00:20:53,510 --> 00:20:57,710 dividing an integer, like x, by square root of u. 371 00:20:57,710 --> 00:21:01,470 And so this is, whatever, the quotient. 372 00:21:01,470 --> 00:21:02,660 And this is the remainder. 373 00:21:02,660 --> 00:21:05,830 So I want j to be between 0 and strictly 374 00:21:05,830 --> 00:21:07,290 less than square root of u. 375 00:21:07,290 --> 00:21:10,560 Then this is unique, fundamental theorem of arithmetic, 376 00:21:10,560 --> 00:21:12,350 or something. 377 00:21:12,350 --> 00:21:15,950 And i is the cluster number. 378 00:21:15,950 --> 00:21:19,860 And then j is the position of x within that cluster. 379 00:21:19,860 --> 00:21:28,000 So let's do an example like x equals 9. 380 00:21:28,000 --> 00:21:30,720 So I didn't number them over here. 
381 00:21:30,720 --> 00:21:36,384 This is x equals 0, 1, 2, 3, 4, 5, 6, 7, 8, 382 00:21:36,384 --> 00:21:39,110 9-- here's the guy I'm interested in-- 10, 383 00:21:39,110 --> 00:21:43,860 11, 12, and so on. 384 00:21:43,860 --> 00:21:45,110 So 9 is here. 385 00:21:45,110 --> 00:21:49,380 This is cluster number 0, 1, 2. 386 00:21:49,380 --> 00:21:52,870 So I claim 9 equals 2 times square root of u. 387 00:21:52,870 --> 00:21:53,860 Square root of u here is 4. 388 00:21:53,860 --> 00:21:57,110 I conveniently chose u to be a perfect square. 389 00:21:57,110 --> 00:22:01,810 And it is item 0, 1 within the cluster, so item number 1. 390 00:22:01,810 --> 00:22:05,370 And indeed, 9 equals 2 times 4 plus 1. 391 00:22:05,370 --> 00:22:09,360 So in general, if you're given x, 392 00:22:09,360 --> 00:22:12,770 and I said, ah, look in x's cluster, what that means 393 00:22:12,770 --> 00:22:17,440 is look at x integer divided by square root of u. 394 00:22:17,440 --> 00:22:18,997 That's the cluster number. 395 00:22:18,997 --> 00:22:20,330 And I'll try to search in there. 396 00:22:22,510 --> 00:22:24,690 And I look in the summary vector, 397 00:22:24,690 --> 00:22:27,650 starting from that cluster name, the name 398 00:22:27,650 --> 00:22:31,207 of the cluster for this guy, finding the next cluster. 399 00:22:31,207 --> 00:22:32,790 Then I'll multiply by square root of u 400 00:22:32,790 --> 00:22:36,930 to get here, and then continue on. 401 00:22:36,930 --> 00:22:40,340 In general, because dividing and multiplying-- I 402 00:22:40,340 --> 00:22:43,220 don't want to have to think about it too hard. 403 00:22:43,220 --> 00:22:47,850 I'm going to define some functions to make 404 00:22:47,850 --> 00:22:51,290 this a little easier, more intuitive. 405 00:22:51,290 --> 00:22:53,960 So when I do integer division by square root of u, which 406 00:22:53,960 --> 00:22:55,840 is like taking the floor, I'll call that high 407 00:22:55,840 --> 00:22:58,370 of x, the high part of x.
408 00:22:58,370 --> 00:23:01,320 And low of x is going to be the remainder. 409 00:23:01,320 --> 00:23:03,130 That's the j up here. 410 00:23:07,350 --> 00:23:10,910 And if I have the high and the low part, the i and the j, 411 00:23:10,910 --> 00:23:15,070 I'm going to use index to go back to x. 412 00:23:15,070 --> 00:23:22,370 So index of i, j is going to be i times square root of u plus j. 413 00:23:22,370 --> 00:23:25,530 Now why do I call these high and low? 414 00:23:32,195 --> 00:23:33,070 I'll give you a hint. 415 00:23:42,530 --> 00:23:44,380 Here's the binary representation of x. 416 00:23:56,820 --> 00:24:01,160 In this case, high of x is 2. 417 00:24:01,160 --> 00:24:02,490 And low of x is 1. 418 00:24:06,282 --> 00:24:07,230 Yeah. 419 00:24:07,230 --> 00:24:09,550 AUDIENCE: So the high x corresponds to the first two, 420 00:24:09,550 --> 00:24:11,522 which is the first 2 bits. 421 00:24:11,522 --> 00:24:13,990 And the low x corresponds to [INAUDIBLE]. 422 00:24:13,990 --> 00:24:15,930 PROFESSOR: Right. 423 00:24:15,930 --> 00:24:21,480 High of x corresponds to the high half of the bits. 424 00:24:21,480 --> 00:24:26,230 And low of x corresponds to the bottom half of the bits. 425 00:24:26,230 --> 00:24:29,520 So these are the high order bits and the low order bits. 426 00:24:29,520 --> 00:24:31,170 And if you think about it, remember 427 00:24:31,170 --> 00:24:34,790 when we take square root of u, in the logarithm, it takes log u 428 00:24:34,790 --> 00:24:36,810 and divides it in half. 429 00:24:36,810 --> 00:24:38,880 So it's exactly, in the bit vector, 430 00:24:38,880 --> 00:24:42,900 which is log u bits long, we're dividing in half here, 431 00:24:42,900 --> 00:24:47,260 and looking at the high bits versus the low bits. 432 00:24:47,260 --> 00:24:48,730 OK? 433 00:24:48,730 --> 00:24:51,780 So that's another interpretation of what this is doing.
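The high, low, and index functions defined above can be written directly. In this sketch the universe size u is passed as an explicit argument, which is an assumption of the example rather than something stated in the lecture, and u is taken to be a perfect square.

```python
import math


def high(x: int, u: int) -> int:
    """Cluster number of x: integer division by sqrt(u)."""
    return x // math.isqrt(u)


def low(x: int, u: int) -> int:
    """Position of x within its cluster: remainder modulo sqrt(u)."""
    return x % math.isqrt(u)


def index(i: int, j: int, u: int) -> int:
    """Inverse of (high, low): rebuild x from cluster i and position j."""
    return i * math.isqrt(u) + j


# Worked example from the lecture: u = 16, x = 9 = 0b1001.
x, u = 9, 16
print(high(x, u), low(x, u), index(2, 1, u))  # 2 1 9
```

When square root of u is a power of two, the same values fall out of the binary representation: for u = 16, `9 >> 2` is the top two bits (2) and `9 & 0b11` is the bottom two bits (1).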
434 00:24:51,780 --> 00:24:53,860 And if you don't like doing division, 435 00:24:53,860 --> 00:24:57,090 as many computers don't like to do, all we're actually doing 436 00:24:57,090 --> 00:24:59,530 is masking out these bits, or taking these bits 437 00:24:59,530 --> 00:25:01,060 and shifting them over. 438 00:25:01,060 --> 00:25:03,620 So these are very efficient to actually do. 439 00:25:03,620 --> 00:25:07,950 And maybe you get some intuition for why they're relevant. 440 00:25:07,950 --> 00:25:13,900 So let's recurse, shall we? 441 00:25:21,975 --> 00:25:25,100 I think now we know how this splitting things up works. 442 00:25:42,230 --> 00:25:47,150 So I'm going to call the overall structure v-- 443 00:25:47,150 --> 00:25:51,810 the van Emde Boas structure I'm trying to represent is v. 444 00:25:51,810 --> 00:25:56,390 And v is going to consist of two parts. 445 00:25:56,390 --> 00:26:00,580 One is an array of all of the clusters. 446 00:26:08,870 --> 00:26:11,050 I'm going to abbreviate van Emde Boas as VEB. 447 00:26:13,920 --> 00:26:18,190 And recursively, each of those clusters 448 00:26:18,190 --> 00:26:22,750 is going to be represented by a smaller VEB structure, of size 449 00:26:22,750 --> 00:26:25,776 square root of the given one. 450 00:26:25,776 --> 00:26:33,901 OK, and the cluster index i ranges from 0 to square root of u minus 1. 451 00:26:33,901 --> 00:26:36,500 OK, so there's square root of u of them. 452 00:26:36,500 --> 00:26:38,640 Total size is u. 453 00:26:38,640 --> 00:26:40,850 And then, in addition, we're going 454 00:26:40,850 --> 00:26:43,160 to have a summary structure. 455 00:26:43,160 --> 00:26:48,311 And this is also a VEB of size square root of u. 456 00:26:53,230 --> 00:26:57,860 OK, you should think about inserts and successors. 457 00:26:57,860 --> 00:27:01,810 Those are the two operations I care about for now. 458 00:27:01,810 --> 00:27:02,810 Let's start with insert. 459 00:27:02,810 --> 00:27:03,410 That's easier.
460 00:27:20,360 --> 00:27:26,560 So if I want to insert an item, x, into data structure v, 461 00:27:26,560 --> 00:27:29,610 then first thing I should do is insert 462 00:27:29,610 --> 00:27:31,190 into its corresponding cluster. 463 00:27:31,190 --> 00:27:34,950 So let's just get comfortable with that notation. 464 00:27:34,950 --> 00:27:41,760 We're inserting into the cluster whose number is high of x. 465 00:27:41,760 --> 00:27:44,780 That is where x belongs. 466 00:27:44,780 --> 00:27:47,270 The name of its cluster should be high of x. 467 00:27:47,270 --> 00:27:49,350 And what we're going to be inserting recursively 468 00:27:49,350 --> 00:27:51,150 into there is low of x. 469 00:27:51,150 --> 00:27:54,950 That is the name of x local to that cluster. 470 00:27:54,950 --> 00:27:58,120 x is a global name with respect to v. This cluster only 471 00:27:58,120 --> 00:28:01,590 represents a small range of square root of u items. 472 00:28:01,590 --> 00:28:03,550 So this gets us from the big space of size u 473 00:28:03,550 --> 00:28:05,133 to the small space of size square root 474 00:28:05,133 --> 00:28:06,860 of u within that cluster. 475 00:28:06,860 --> 00:28:10,220 So that's basically what high and low were made for. 476 00:28:10,220 --> 00:28:13,170 But then, we have to also update the summary structure. 477 00:28:13,170 --> 00:28:17,150 So we need, just in case-- Maybe it's already there. 478 00:28:17,150 --> 00:28:19,110 But in the worst case, it isn't. 479 00:28:19,110 --> 00:28:22,850 So we'll just think of that as recursively inserting 480 00:28:22,850 --> 00:28:31,570 into v dot summary the name of the cluster, which 481 00:28:31,570 --> 00:28:33,210 is high of x. 482 00:28:33,210 --> 00:28:36,661 High of x is keeping track of which clusters are non-empty. 483 00:28:36,661 --> 00:28:38,660 We've just inserted something into this cluster. 484 00:28:38,660 --> 00:28:39,740 So it's non-empty. 
485 00:28:39,740 --> 00:28:43,170 We better mark that that cluster, high of x, 486 00:28:43,170 --> 00:28:45,640 is non-empty in the summary structure. 487 00:28:45,640 --> 00:28:46,140 Why? 488 00:28:46,140 --> 00:28:47,820 So we can do successor. 489 00:28:47,820 --> 00:28:50,295 So let's move on to successor. 490 00:29:00,440 --> 00:29:04,236 Actually, I want to mimic the successor written here 491 00:29:04,236 --> 00:29:05,360 on the bottom of the board. 492 00:29:08,250 --> 00:29:10,610 So what we had in the non-recursive version 493 00:29:10,610 --> 00:29:11,780 was three steps. 494 00:29:11,780 --> 00:29:14,120 So we're going to do the same thing here. 495 00:29:14,120 --> 00:29:16,120 We're going to look within x's cluster. 496 00:29:16,120 --> 00:29:19,420 We now know that is the cluster known as high of x. 497 00:29:22,790 --> 00:29:25,880 And either we find, and we're happy, or we don't. 498 00:29:25,880 --> 00:29:29,650 Then we're going to look at v dot summary search for this 499 00:29:29,650 --> 00:29:33,140 the successor of high of x. 500 00:29:33,140 --> 00:29:37,060 Right, finding the next 1 bit, that is successor. 501 00:29:37,060 --> 00:29:42,330 And then, I want to find the first 1 bit in that cluster. 502 00:29:42,330 --> 00:29:43,690 Is that a successor also? 503 00:29:52,251 --> 00:29:52,750 Yeah. 504 00:29:52,750 --> 00:29:56,460 That's just the successor of negative infinity. 505 00:29:56,460 --> 00:30:01,250 Finding the minimum element in a cluster is the successor of -1, 506 00:30:01,250 --> 00:30:02,940 or 0, or not zero. 507 00:30:02,940 --> 00:30:05,880 But -1 would work, or negative infinity, maybe more 508 00:30:05,880 --> 00:30:06,469 intuitively. 509 00:30:06,469 --> 00:30:08,010 That'll find the smallest thing here. 510 00:30:08,010 --> 00:30:10,490 So each of these is a recursive call. 511 00:30:10,490 --> 00:30:15,230 I can think of it as recursively calling successor. 512 00:30:15,230 --> 00:30:16,740 So let's do that. 
513 00:30:24,770 --> 00:30:28,410 I want to find the successor of x in v. First thing 514 00:30:28,410 --> 00:30:32,220 I'm going to do is do the ij breakdown. 515 00:30:32,220 --> 00:30:39,380 I'll let i be high of x and j be-- I could do low of x. 516 00:30:39,380 --> 00:30:44,940 But what I'm going to try for is to search within this cluster, 517 00:30:44,940 --> 00:30:46,310 high of x. 518 00:30:46,310 --> 00:30:53,000 So I'm going to look for the successor of cluster i, 519 00:30:53,000 --> 00:30:59,914 which is cluster high of x, of low of x. 520 00:30:59,914 --> 00:31:03,870 OK, so that's this first step of looking in x's cluster. 521 00:31:03,870 --> 00:31:05,310 This is x's cluster. 522 00:31:05,310 --> 00:31:06,916 This is x's name in the cluster. 523 00:31:06,916 --> 00:31:08,540 I'm going to try to find the successor. 524 00:31:08,540 --> 00:31:10,110 But it might say infinity. 525 00:31:10,110 --> 00:31:12,160 I didn't find anything. 526 00:31:12,160 --> 00:31:15,660 And then I'll be unhappy if j equals infinity. 527 00:31:21,270 --> 00:31:23,140 So that's line one. 528 00:31:32,070 --> 00:31:33,830 Well, then we're in the wrong cluster. 529 00:31:33,830 --> 00:31:35,570 High of x is not the right cluster. 530 00:31:35,570 --> 00:31:37,550 Let's find the correct cluster, which 531 00:31:37,550 --> 00:31:40,370 is going to be the next non-empty cluster. 532 00:31:40,370 --> 00:31:50,360 So I'm going to change i to be the successor in the summary 533 00:31:50,360 --> 00:31:57,025 structure of i. 534 00:31:57,025 --> 00:31:59,480 So i was the name of a cluster. 535 00:31:59,480 --> 00:32:00,650 It may have items in it. 536 00:32:00,650 --> 00:32:02,830 But we want to find the next non-empty thing. 537 00:32:02,830 --> 00:32:06,920 Because we know the successor we're looking for is not here. 538 00:32:09,490 --> 00:32:09,990 OK. 539 00:32:09,990 --> 00:32:13,190 So this is the cluster we now belong in. 
540 00:32:13,190 --> 00:32:15,000 What item in the cluster do we want? 541 00:32:15,000 --> 00:32:17,620 Well, we want to find the minimum item in that cluster. 542 00:32:17,620 --> 00:32:24,280 And we're going to do that by a recursive call, which 543 00:32:24,280 --> 00:32:40,730 is j is the successor within cluster i of minus infinity, 544 00:32:40,730 --> 00:32:42,280 I'll say. 545 00:32:42,280 --> 00:32:43,800 -1 would also work. 546 00:32:43,800 --> 00:32:46,450 So this will find the smallest item in the cluster. 547 00:32:46,450 --> 00:32:50,720 And then, in both cases, we get i and j, 548 00:32:50,720 --> 00:32:53,900 which together in this form describe 549 00:32:53,900 --> 00:32:55,860 the value x that we care about. 550 00:32:55,860 --> 00:33:02,610 So I'm just going to say, return index of ij. 551 00:33:02,610 --> 00:33:08,140 That's how we reconstruct an item name for the structure v. 552 00:33:08,140 --> 00:33:10,260 We knew which substructure it's in. 553 00:33:10,260 --> 00:33:12,740 And we know its name within the substructure, 554 00:33:12,740 --> 00:33:14,790 within the cluster. 555 00:33:14,790 --> 00:33:18,280 Is this algorithm clearly correct? 556 00:33:18,280 --> 00:33:18,905 Good. 557 00:33:18,905 --> 00:33:21,120 It's also really bad. 558 00:33:21,120 --> 00:33:23,350 Well, it's better than everything we've done so far. 559 00:33:23,350 --> 00:33:25,790 The last result we had was square root of u. 560 00:33:25,790 --> 00:33:30,440 This is going to be better than that, but still not log log u. 561 00:33:30,440 --> 00:33:31,370 Why? 562 00:33:31,370 --> 00:33:32,430 Both of these are bad. 563 00:33:38,990 --> 00:33:39,666 Yeah. 564 00:33:39,666 --> 00:33:42,932 AUDIENCE: You make more than one call to [? your insert. ?] 565 00:33:42,932 --> 00:33:43,640 PROFESSOR: Right. 566 00:33:43,640 --> 00:33:46,134 I make more than one recursive call 567 00:33:46,134 --> 00:33:47,550 to whatever the operation is here. 
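The first recursive version described above (clusters plus a summary, with insert making two recursive calls and successor up to three) might be sketched as follows. Several details are assumptions of this sketch rather than something stated in the lecture: the class name NaiveVEB, the restriction to universe sizes of the form 2^(2^k), a two-bit base case at u = 2, returning None where the lecture says infinity, and using x = -1 to stand for negative infinity.

```python
import math

class NaiveVEB:
    """Recursive clusters + summary, before any min/max augmentation."""

    def __init__(self, u: int):
        self.u = u                        # assumed to be 2, 4, 16, 256, ...
        if u == 2:
            self.bits = [False, False]    # base case (assumption of this sketch)
        else:
            self.sqrt_u = math.isqrt(u)
            self.cluster = [NaiveVEB(self.sqrt_u) for _ in range(self.sqrt_u)]
            self.summary = NaiveVEB(self.sqrt_u)

    def insert(self, x: int) -> None:
        if self.u == 2:
            self.bits[x] = True
            return
        i, j = divmod(x, self.sqrt_u)     # i = high(x), j = low(x)
        self.cluster[i].insert(j)         # recursive call 1
        self.summary.insert(i)            # recursive call 2

    def successor(self, x: int):
        """Smallest stored element > x, or None; x = -1 yields the minimum."""
        if self.u == 2:
            for y in range(x + 1, 2):
                if self.bits[y]:
                    return y
            return None
        if x >= 0:
            i, j = divmod(x, self.sqrt_u)
            j = self.cluster[i].successor(j)      # look within x's cluster
        else:
            i, j = -1, None                       # "negative infinity" case
        if j is None:
            i = self.summary.successor(i)         # next non-empty cluster
            if i is None:
                return None
            j = self.cluster[i].successor(-1)     # minimum of that cluster
        return i * self.sqrt_u + j                # index(i, j)

v = NaiveVEB(16)
for x in (2, 3, 9, 14):
    v.insert(x)
print(v.successor(3), v.successor(9), v.successor(14))
```

The three recursive calls in successor are visible directly in the code, which is what the running-time discussion that follows is about.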
568 00:33:47,550 --> 00:33:49,600 Insert calls insert twice. 569 00:33:49,600 --> 00:33:52,915 Here, successor calls successor potentially three times. 570 00:33:55,830 --> 00:33:57,430 This is a good challenge for me. 571 00:33:57,430 --> 00:33:59,411 Let's see. 572 00:33:59,411 --> 00:34:00,480 Eh, not bad. 573 00:34:00,480 --> 00:34:02,564 Off by one. 574 00:34:02,564 --> 00:34:05,250 OK, that's a common problem in computer science, right? 575 00:34:05,250 --> 00:34:07,649 Always off-by-one errors. 576 00:34:07,649 --> 00:34:09,690 OK, so let's think of it in terms of recurrences, 577 00:34:09,690 --> 00:34:10,830 in case that's not clear. 578 00:34:10,830 --> 00:34:16,929 Here we have t of u is 2 times t of square root of u. 579 00:34:16,929 --> 00:34:18,739 Right, to solve a problem of size u, 580 00:34:18,739 --> 00:34:23,280 I solve two problems of size square root of u plus constant. 581 00:34:23,280 --> 00:34:26,050 Because high of x and low of x, I'm assuming, 582 00:34:26,050 --> 00:34:27,402 take constant time to do. 583 00:34:27,402 --> 00:34:28,610 It's just, I have an integer. 584 00:34:28,610 --> 00:34:29,750 I divide it in half. 585 00:34:29,750 --> 00:34:30,429 Those are cheap. 586 00:34:33,949 --> 00:34:35,920 What does this solve to? 587 00:34:35,920 --> 00:34:38,489 It's probably easier to think of it in terms of log u. 588 00:34:38,489 --> 00:34:40,900 Then we could apply the Master method. 589 00:34:40,900 --> 00:34:45,270 Right, this is the same thing as t prime of log u 590 00:34:45,270 --> 00:34:52,055 is 2 times t prime of log u divided by 2 plus order 1. 591 00:34:58,750 --> 00:35:01,270 This is not quite the merge sort recurrence. 592 00:35:01,270 --> 00:35:04,452 But it's not good. 593 00:35:04,452 --> 00:35:05,910 One way to think of it, is we start 594 00:35:05,910 --> 00:35:07,860 with the total weight of log u. 595 00:35:07,860 --> 00:35:11,580 We split into log u over 2, but two copies of it.
596 00:35:11,580 --> 00:35:14,190 So we're not saving anything. 597 00:35:14,190 --> 00:35:16,520 And we didn't reduce the problem strictly. 598 00:35:16,520 --> 00:35:18,940 In terms of the recursion tree, we have, you know, 599 00:35:18,940 --> 00:35:22,400 log u-- well, it's hard to think about because we 600 00:35:22,400 --> 00:35:28,396 have constant total cost. 601 00:35:28,396 --> 00:35:30,520 You could just plug this in with the Master method, 602 00:35:30,520 --> 00:35:32,990 or see that essentially we're conserving mass. 603 00:35:32,990 --> 00:35:34,640 We started with log u mass. 604 00:35:34,640 --> 00:35:36,350 We have two copies of log u over 2. 605 00:35:36,350 --> 00:35:38,290 That's the same total mass. 606 00:35:38,290 --> 00:35:41,270 So how many recursions do we do? 607 00:35:41,270 --> 00:35:44,360 Well we do do log log u recursions. 608 00:35:44,360 --> 00:35:48,080 The total number of leaves in that recursion tree is log u. 609 00:35:48,080 --> 00:35:50,060 Each of them, we pay constant. 610 00:35:50,060 --> 00:35:58,490 So this is log u, not log log u. 611 00:35:58,490 --> 00:36:01,570 To get log log u, we need to change this 2 into a 1. 612 00:36:01,570 --> 00:36:04,220 We can only afford one recursive call. 613 00:36:04,220 --> 00:36:07,570 If we have two recursive calls, we get logarithmic performance. 614 00:36:07,570 --> 00:36:11,240 If we have three recursive calls, it's even worse. 615 00:36:11,240 --> 00:36:13,375 Here, I would definitely use the Master method. 616 00:36:13,375 --> 00:36:16,400 It's less obvious. 617 00:36:16,400 --> 00:36:24,900 In this case, we get log u to the log base 2 of 3 power, 618 00:36:24,900 --> 00:36:30,514 which is log u to the 1.6 or so, so both worse than log n. 619 00:36:30,514 --> 00:36:31,930 This is strictly worse than log n. 620 00:36:31,930 --> 00:36:34,920 This is maybe just a little bit worse than log n, 621 00:36:34,920 --> 00:36:37,156 depending on how u relates to n. 
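Written out, the change of variables used in this argument looks as follows. The symbols a (the number of recursive calls), m, and S are notation introduced here for illustration; S plays the role of t prime in the lecture.

```latex
\begin{aligned}
T(u) &= a\,T(\sqrt{u}) + O(1), \qquad m = \log_2 u, \quad S(m) = T(2^m) \\
S(m) &= a\,S(m/2) + O(1) \\
a = 1 &: \quad S(m) = O(\log m) \;\Rightarrow\; T(u) = O(\log\log u) \\
a = 2 &: \quad S(m) = \Theta(m) \;\Rightarrow\; T(u) = \Theta(\log u) \\
a = 3 &: \quad S(m) = \Theta\!\left(m^{\log_2 3}\right) \;\Rightarrow\; T(u) = \Theta\!\left((\log u)^{\log_2 3}\right) \approx (\log u)^{1.585}
\end{aligned}
```

The second line follows because taking a square root of u halves log u, so each recursive call works on a problem with half as many bits.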
622 00:36:37,156 --> 00:36:38,620 OK, so we're not there yet. 623 00:36:38,620 --> 00:36:39,881 But we're on the right track. 624 00:36:39,881 --> 00:36:41,380 We have the right kind of structure. 625 00:36:41,380 --> 00:36:43,040 We have a problem of size u. 626 00:36:43,040 --> 00:36:46,640 We split it up into square root of u sub problems of size square root of u. 627 00:36:46,640 --> 00:36:48,200 From a data structures perspective, 628 00:36:48,200 --> 00:36:49,850 this is the first time we're using divide and conquer 629 00:36:49,850 --> 00:36:50,780 for data structures. 630 00:36:50,780 --> 00:36:53,490 It's a little different from algorithms. 631 00:36:53,490 --> 00:36:57,507 So that's how the data structure is being laid out. 632 00:36:57,507 --> 00:36:59,840 But now we're worried about the algorithms on those data 633 00:36:59,840 --> 00:37:00,340 structures. 634 00:37:00,340 --> 00:37:02,960 Those, we can only afford t of u equals 1 times t of 635 00:37:02,960 --> 00:37:04,150 square root of u plus order 1. 636 00:37:04,150 --> 00:37:06,169 Then we get log log u. 637 00:37:06,169 --> 00:37:07,710 So, here we have two recursive calls. 638 00:37:07,710 --> 00:37:10,030 Somehow we have to have only one. 639 00:37:10,030 --> 00:37:12,020 Let's start by fixing insert. 640 00:37:16,311 --> 00:37:16,810 Insert? 641 00:37:20,671 --> 00:37:21,170 No. 642 00:37:21,170 --> 00:37:22,890 Let's start by fixing successor. 643 00:37:22,890 --> 00:37:26,231 I think that will be more intuitive. 644 00:37:26,231 --> 00:37:27,230 Let's look at successor. 645 00:37:27,230 --> 00:37:29,000 Because successor is almost there. 646 00:37:29,000 --> 00:37:31,650 A lot of the time, it's just going to make this call, 647 00:37:31,650 --> 00:37:33,040 and we're happy. 648 00:37:33,040 --> 00:37:37,040 The bad case is when we need to make both of these calls. 649 00:37:37,040 --> 00:37:40,590 Then there's three total, very bad. 650 00:37:40,590 --> 00:37:44,420 How could I get rid of this call?
651 00:37:44,420 --> 00:37:46,910 I was being all clever, that the minimum element is 652 00:37:46,910 --> 00:37:48,700 the successor of negative infinity. 653 00:37:48,700 --> 00:37:52,215 But that's actually not the right idea. 654 00:37:52,215 --> 00:37:52,715 Yeah. 655 00:37:52,715 --> 00:37:57,477 [? AUDIENCE: Catching ?] the minimum element in cluster i. 656 00:37:57,477 --> 00:37:59,560 PROFESSOR: Store the minimum element of cluster i. 657 00:37:59,560 --> 00:38:00,059 Yeah. 658 00:38:00,059 --> 00:38:05,690 In general, for every structure v, let's store the minimum. 659 00:38:05,690 --> 00:38:06,450 Why not? 660 00:38:06,450 --> 00:38:08,470 We know how to augment structures. 661 00:38:11,570 --> 00:38:14,330 Here in 006, you took an AVL tree, 662 00:38:14,330 --> 00:38:17,120 and you augmented each node to store the sub-tree size of the node. 663 00:38:17,120 --> 00:38:20,400 In this case, we're doing a similar kind of augmentation. 664 00:38:20,400 --> 00:38:24,130 Just for every structure, keep track of what the minimum is. 665 00:38:24,130 --> 00:38:26,925 So that will be idea number four. 666 00:38:44,297 --> 00:38:45,630 I'm going to add something here. 667 00:38:45,630 --> 00:38:47,730 But for now, let's store the minimums. 668 00:38:47,730 --> 00:38:54,165 So to do an insert into structure v, item x, 669 00:38:54,165 --> 00:38:55,790 first thing we'll do is just say, well, 670 00:38:55,790 --> 00:38:58,260 if x is-- let's see if it's the new minimum. 671 00:38:58,260 --> 00:39:02,522 Maybe x is smaller than v dot min. 672 00:39:02,522 --> 00:39:08,590 If that's the case, let's just set v dot min to x. 673 00:39:08,590 --> 00:39:09,090 OK? 674 00:39:09,090 --> 00:39:12,070 And then, the rest is the same, same insertion 675 00:39:12,070 --> 00:39:17,340 algorithm as over here, these two recursive calls. 676 00:39:17,340 --> 00:39:19,020 I just spent constant additional time. 677 00:39:19,020 --> 00:39:21,650 And now every structure knows its minimum.
678 00:39:21,650 --> 00:39:22,870 Again, ignore delete for now. 679 00:39:22,870 --> 00:39:25,210 That's trickier. 680 00:39:25,210 --> 00:39:28,620 OK, now every structure knows its minimum, 681 00:39:28,620 --> 00:39:33,816 which means we can replace this call with just v dot 682 00:39:33,816 --> 00:39:37,060 cluster i dot min. 683 00:39:37,060 --> 00:39:39,060 One down. 684 00:39:39,060 --> 00:39:50,330 OK, so if we look at successor, of v comma x. 685 00:39:50,330 --> 00:39:53,270 I'm going to replace the last line, or next to last line 686 00:39:53,270 --> 00:40:03,180 with j equals v dot cluster i dot min. 687 00:40:10,070 --> 00:40:13,430 So now, we're down to log u performance. 688 00:40:13,430 --> 00:40:15,610 We only have, at most, two recursive calls. 689 00:40:15,610 --> 00:40:19,540 So that's partial progress. 690 00:40:19,540 --> 00:40:23,730 But we need another idea to get rid of the second one. 691 00:40:23,730 --> 00:40:29,540 And the intuition here is that really, only one of these calls 692 00:40:29,540 --> 00:40:31,150 should matter. 693 00:40:31,150 --> 00:40:35,429 OK, let's draw the big picture. 694 00:40:35,429 --> 00:40:37,220 Here's what the recursive thing looks like. 695 00:40:37,220 --> 00:40:38,219 We've got v dot summary. 696 00:40:41,340 --> 00:40:46,340 Then we've got a cluster 0, cluster 1, 697 00:40:46,340 --> 00:40:50,900 cluster square root of u minus 1. 698 00:40:50,900 --> 00:40:53,930 Each of those is a recursive structure. 699 00:40:53,930 --> 00:40:57,690 And we're also just storing the min over here as a copy. 700 00:41:00,240 --> 00:41:05,820 So when I do a query for, I don't know, 701 00:41:05,820 --> 00:41:11,950 the successor of this guy, there's kind of two cases. 702 00:41:11,950 --> 00:41:16,160 One situation is that I find the successor somewhere 703 00:41:16,160 --> 00:41:17,620 in this interval. 704 00:41:17,620 --> 00:41:18,700 In that case, I'm happy.
705 00:41:18,700 --> 00:41:22,474 Because I just need this one recursive call. 706 00:41:22,474 --> 00:41:23,890 OK, the other case is that I don't 707 00:41:23,890 --> 00:41:25,840 find what I'm looking for here. 708 00:41:25,840 --> 00:41:29,030 Then I have to do a successor up here. 709 00:41:29,030 --> 00:41:30,250 And then I'm done. 710 00:41:30,250 --> 00:41:33,030 Then I can teleport into whatever cluster it is. 711 00:41:33,030 --> 00:41:34,710 And I've stored the min by now. 712 00:41:34,710 --> 00:41:38,040 So that's constant time to jump into the right spot 713 00:41:38,040 --> 00:41:40,880 in the cluster. 714 00:41:40,880 --> 00:41:43,580 So either I find what I'm looking for here, 715 00:41:43,580 --> 00:41:46,295 or I find what I'm looking for here. 716 00:41:46,295 --> 00:41:47,670 What would be really nice is if I 717 00:41:47,670 --> 00:41:50,800 could tell ahead of time which one is going to succeed. 718 00:41:50,800 --> 00:41:53,710 Because then, if I know this is not going to find anything, 719 00:41:53,710 --> 00:41:56,190 I might as well just go immediately up here, 720 00:41:56,190 --> 00:41:58,859 and look at the successor in the summary structure. 721 00:41:58,859 --> 00:42:00,650 If I know I'm going to find something here, 722 00:42:00,650 --> 00:42:02,180 I'll just do the successor here. 723 00:42:02,180 --> 00:42:03,435 And I'm done. 724 00:42:03,435 --> 00:42:04,810 If I could just get away with one 725 00:42:04,810 --> 00:42:08,000 or the other of these calls, not both, I'd be very happy. 726 00:42:08,000 --> 00:42:10,072 How could I tell that? 727 00:42:10,072 --> 00:42:11,064 Yeah. 728 00:42:11,064 --> 00:42:13,050 AUDIENCE: Store the max. 729 00:42:13,050 --> 00:42:15,510 PROFESSOR: Store the max. 730 00:42:15,510 --> 00:42:16,840 Store the min and the max. 731 00:42:16,840 --> 00:42:19,230 Why not? 732 00:42:19,230 --> 00:42:21,830 OK, I just need a similar line here. 
733 00:42:21,830 --> 00:42:26,570 If x is bigger than v dot max, change the max. 734 00:42:31,090 --> 00:42:33,080 So now, I've augmented my data structure 735 00:42:33,080 --> 00:42:35,200 to have the min and max at every level. 736 00:42:35,200 --> 00:42:39,690 And what's going on here is, I won't find an answer 737 00:42:39,690 --> 00:42:43,060 if I am greater than or equal to the maximum 738 00:42:43,060 --> 00:42:44,670 within this cluster. 739 00:42:44,670 --> 00:42:46,580 That's how I tell. 740 00:42:46,580 --> 00:42:49,420 If I'm equal to the max, or if I'm beyond the max, 741 00:42:49,420 --> 00:42:52,860 if all the items are over here, the max will be to my left. 742 00:42:52,860 --> 00:42:54,830 And then I know I will fail within the cluster. 743 00:42:54,830 --> 00:42:58,732 So I might as well just go up to summary and do it there. 744 00:42:58,732 --> 00:43:00,565 On the other hand, if I'm less than the max, 745 00:43:00,565 --> 00:43:02,981 then I'm guaranteed I will find something in this cluster. 746 00:43:02,981 --> 00:43:05,630 And so I can just search in there. 747 00:43:05,630 --> 00:43:07,740 So all I need to do-- I'll probably 748 00:43:07,740 --> 00:43:09,740 have to rewrite this slightly. 749 00:43:12,420 --> 00:43:25,980 If x is-- not x, close. 750 00:43:25,980 --> 00:43:30,880 I'm going to mimic this code a little bit, at least 751 00:43:30,880 --> 00:43:35,590 the first line is going to be i equals high of x. 752 00:43:35,590 --> 00:43:38,310 And now, that's the cluster I'm starting in. 753 00:43:38,310 --> 00:43:41,150 And I want to look at the maximum of that cluster. 754 00:43:58,630 --> 00:44:01,920 So I'm looking at v dot cluster i dot max. 755 00:44:01,920 --> 00:44:04,330 And I want to know, is x before that? 756 00:44:04,330 --> 00:44:07,180 Now within that cluster, x is known as low of x. 757 00:44:07,180 --> 00:44:12,520 So I compare low of x to cluster i's maximum element. 
758 00:44:12,520 --> 00:44:14,340 If we're strictly to the left, then there 759 00:44:14,340 --> 00:44:18,020 is a successor guaranteed within that substructure. 760 00:44:18,020 --> 00:44:20,120 And so, I should do this line. 761 00:44:22,980 --> 00:44:24,310 I wish I could copy paste. 762 00:44:24,310 --> 00:44:30,220 But I will copy by hand. 763 00:44:30,220 --> 00:44:43,692 Successor within v dot cluster i, of low of x. 764 00:44:43,692 --> 00:44:46,020 OK, then I've found the item I'm looking for. 765 00:44:49,140 --> 00:44:56,760 Else, I'm beyond the max, I know this is the wrong cluster. 766 00:44:56,760 --> 00:45:00,450 And so I should immediately do these two lines, well, 767 00:45:00,450 --> 00:45:03,400 except I've made the second line use the min. 768 00:45:03,400 --> 00:45:06,320 So it will only be one recursive call, followed by a min. 769 00:45:09,790 --> 00:45:21,110 OK, so this is going to be i equals the successor within v 770 00:45:21,110 --> 00:45:26,695 dot summary of high of x. 771 00:45:40,460 --> 00:45:49,030 And then j is that line successor 772 00:45:49,030 --> 00:45:51,370 within-- oh, sorry-- the line that I 773 00:45:51,370 --> 00:45:55,540 used to have here, which is going to be v cluster i 774 00:45:55,540 --> 00:45:56,230 dot min. 775 00:46:00,830 --> 00:46:06,670 OK, and then, in both cases, I return index of ij. 776 00:46:12,030 --> 00:46:14,890 OK, so we're doing essentially the same logic as over here. 777 00:46:14,890 --> 00:46:17,155 Although I've replaced the step with the min, 778 00:46:17,155 --> 00:46:18,894 to get rid of that recursive call. 779 00:46:18,894 --> 00:46:21,060 But I'm really only doing one or the other of these, 780 00:46:21,060 --> 00:46:23,380 using max to distinguish. 781 00:46:23,380 --> 00:46:25,970 If I'm left of the max, I do the successor 782 00:46:25,970 --> 00:46:28,290 within cluster high of x. 
783 00:46:28,290 --> 00:46:33,796 If I'm right of the max, then I do the successor 784 00:46:33,796 --> 00:46:35,170 immediately in the summary structure. 785 00:46:35,170 --> 00:46:37,650 Because I know this won't find anything useful. 786 00:46:37,650 --> 00:46:43,000 And then I find the min within that non-empty structure. 787 00:46:43,000 --> 00:46:45,979 And in both cases, i, j is the element I'm looking for. 788 00:46:45,979 --> 00:46:47,395 I put it back together with index. 789 00:46:50,330 --> 00:46:52,840 Clear? 790 00:46:52,840 --> 00:46:57,150 What's the running time of successor now? 791 00:46:57,150 --> 00:46:57,850 Log log u. 792 00:47:02,230 --> 00:47:03,720 Awesome. 793 00:47:03,720 --> 00:47:06,300 We've finished successor. 794 00:47:06,300 --> 00:47:09,390 Sadly, we have not finished insert. 795 00:47:09,390 --> 00:47:11,300 Insert still takes log u time. 796 00:47:11,300 --> 00:47:13,720 But, progress. 797 00:47:13,720 --> 00:47:16,150 Maybe your routing table doesn't change that often, 798 00:47:16,150 --> 00:47:19,730 so you can afford to pay some extra time for insert, 799 00:47:19,730 --> 00:47:21,790 as long as you can route packets really fast, 800 00:47:21,790 --> 00:47:24,310 as long as you can find where something belongs, 801 00:47:24,310 --> 00:47:26,760 the successor in log log u time. 802 00:47:26,760 --> 00:47:31,730 But for kicks, let's do insert in log log u as well. 803 00:47:31,730 --> 00:47:35,070 This is going to be a little harder, 804 00:47:35,070 --> 00:47:39,070 or I would say a more surprising idea. 805 00:47:41,439 --> 00:47:41,980 This may be-- 806 00:47:55,681 --> 00:47:57,960 I don't have a great intuition for this step. 807 00:47:57,960 --> 00:47:58,671 I'm thinking. 808 00:48:01,330 --> 00:48:05,010 But again, most of the time, this should be fine, right? 809 00:48:05,010 --> 00:48:08,720 Most of the time, we insert into cluster high of x, low of x, 810 00:48:08,720 --> 00:48:09,810 and we're done.
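The min/max-augmented successor that was just finished could be sketched like this. At this stage every element is still stored recursively and min and max are only extra bookkeeping, so insert still makes two recursive calls while successor makes one. The class name MinMaxVEB, the restriction to u of the form 2^(2^k), representing the u = 2 base case by min and max alone, and returning None for "no successor" are assumptions of this sketch.

```python
import math

class MinMaxVEB:
    """Clusters + summary, with min and max stored at every level."""

    def __init__(self, u: int):
        self.u = u                 # assumed to be 2, 4, 16, 256, ...
        self.min = None            # None marks an empty structure
        self.max = None
        if u > 2:
            self.sqrt_u = math.isqrt(u)
            self.cluster = [MinMaxVEB(self.sqrt_u) for _ in range(self.sqrt_u)]
            self.summary = MinMaxVEB(self.sqrt_u)

    def insert(self, x: int) -> None:
        if self.min is None or x < self.min:
            self.min = x
        if self.max is None or x > self.max:
            self.max = x
        if self.u > 2:
            i, j = divmod(x, self.sqrt_u)
            self.cluster[i].insert(j)      # still two recursive calls,
            self.summary.insert(i)         # so insert is O(log u) here

    def successor(self, x: int):
        """Smallest stored element > x, or None. One recursive call."""
        if self.u == 2:
            # In a universe {0, 1}, min and max determine the whole set.
            return 1 if (x == 0 and self.max == 1) else None
        i, j = divmod(x, self.sqrt_u)
        c = self.cluster[i]
        if c.max is not None and j < c.max:
            j = c.successor(j)             # answer is inside x's cluster
        else:
            i = self.summary.successor(i)  # next non-empty cluster
            if i is None:
                return None
            j = self.cluster[i].min        # constant time, no recursion
        return i * self.sqrt_u + j         # index(i, j)

v = MinMaxVEB(16)
for x in (2, 3, 9, 14):
    v.insert(x)
print(v.successor(3), v.successor(9), v.successor(14))
```

Comparing low of x against the cluster's max decides which single branch to take, which is what turns the recurrence into one recursive call per level.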
811 00:48:09,810 --> 00:48:14,100 As long as there is something already in that cluster, 812 00:48:14,100 --> 00:48:16,330 we don't need to update the summary structure. 813 00:48:16,330 --> 00:48:19,380 As long as high of x has already been inserted into the summary 814 00:48:19,380 --> 00:48:22,200 structure, we can get away with just this first step. 815 00:48:22,200 --> 00:48:25,110 The tricky part is detecting. 816 00:48:25,110 --> 00:48:26,810 How would we know? 817 00:48:26,810 --> 00:48:30,210 Well, that's not enough just to detect it. 818 00:48:30,210 --> 00:48:33,110 If high of x is not in v dot summary, 819 00:48:33,110 --> 00:48:34,685 we have to do this insert. 820 00:48:34,685 --> 00:48:37,420 We can't get away with it. 821 00:48:37,420 --> 00:48:38,679 But that's kind of rare. 822 00:48:38,679 --> 00:48:40,220 That only happens the very first time 823 00:48:40,220 --> 00:48:41,872 you insert into that cluster. 824 00:48:41,872 --> 00:48:44,080 Every subsequent time, it's going to be really cheap. 825 00:48:44,080 --> 00:48:47,650 We just have to do this. 826 00:48:47,650 --> 00:48:51,590 It's easy enough to keep track of whether a cluster is empty. 827 00:48:51,590 --> 00:48:53,300 For example, we're storing the min. 828 00:48:53,300 --> 00:48:57,910 We can say v dot min is none, special value, whenever 829 00:48:57,910 --> 00:49:00,579 the structure v is empty. 830 00:49:00,579 --> 00:49:03,120 But we still have this problem, that the first time we insert 831 00:49:03,120 --> 00:49:05,035 into a cluster, it's expensive. 832 00:49:05,035 --> 00:49:06,160 Because we have to do this. 833 00:49:06,160 --> 00:49:09,390 And we have to do this. 834 00:49:09,390 --> 00:49:17,110 How could we avoid, in the case where a cluster is empty-- 835 00:49:17,110 --> 00:49:19,981 remember, an overall structure looks like this. 836 00:49:19,981 --> 00:49:22,230 We can tell that it's empty by saying min equals none, 837 00:49:22,230 --> 00:49:24,590 let's say. 
838 00:49:24,590 --> 00:49:25,600 What could I do? 839 00:49:25,600 --> 00:49:27,363 Sorry, there's also a max now. 840 00:49:30,820 --> 00:49:35,140 What could I do to speed up inserting 841 00:49:35,140 --> 00:49:36,282 into an empty cluster? 842 00:49:36,282 --> 00:49:37,990 Because I'm first going to have to insert 843 00:49:37,990 --> 00:49:38,949 into the empty cluster. 844 00:49:38,949 --> 00:49:41,031 Then I'm going to have to insert into the summary. 845 00:49:41,031 --> 00:49:42,260 I can't get away from this. 846 00:49:42,260 --> 00:49:46,354 So I'd like this to become cheap, in the special case when 847 00:49:46,354 --> 00:49:47,270 this cluster is empty. 848 00:49:53,050 --> 00:49:53,550 Yeah. 849 00:49:53,550 --> 00:49:55,070 AUDIENCE: Lazy propagation. 850 00:49:55,070 --> 00:49:57,800 PROFESSOR: Lazy propagation-- you want to elaborate? 851 00:49:57,800 --> 00:49:58,780 AUDIENCE: Yeah. 852 00:49:58,780 --> 00:50:04,660 We mark the place we want to insert in. 853 00:50:04,660 --> 00:50:07,914 And then we will take it down whenever we [? insert ?] there. 854 00:50:07,914 --> 00:50:08,580 PROFESSOR: Good. 855 00:50:08,580 --> 00:50:11,690 So when I insert into an empty structure, 856 00:50:11,690 --> 00:50:15,460 I'm just going to have a little lazy field, or something. 857 00:50:15,460 --> 00:50:18,170 And I'll put the item in there. 858 00:50:18,170 --> 00:50:19,940 And then the next time I insert into it, 859 00:50:19,940 --> 00:50:22,550 maybe I'll carry it down a little bit. 860 00:50:22,550 --> 00:50:24,120 That actually works. 861 00:50:24,120 --> 00:50:27,529 And that was the original van Emde Boas structure, 862 00:50:27,529 --> 00:50:29,070 [? I ?] [? learned ?] [? recently. ?] 863 00:50:29,070 --> 00:50:31,390 So that works. 864 00:50:31,390 --> 00:50:33,900 But it's a little more complicated than the solution 865 00:50:33,900 --> 00:50:35,040 I have in mind.
866 00:50:35,040 --> 00:50:41,940 So I'm going to unify that lazy field with the minimum field. 867 00:50:41,940 --> 00:50:43,870 Say, when I insert into a structure, 868 00:50:43,870 --> 00:50:45,570 if there's nothing here, I'm just 869 00:50:45,570 --> 00:50:49,370 going to put the item there, and not recurse. 870 00:50:49,370 --> 00:50:54,239 I just am not going to store the minimum item recursively. 871 00:50:54,239 --> 00:50:55,030 Definitely frisbee. 872 00:50:57,940 --> 00:51:02,230 So that's the last idea, pretty much. 873 00:51:11,040 --> 00:51:18,335 Idea number five is, don't store the min recursively. 874 00:51:23,880 --> 00:51:26,672 This is effectively equivalent to lazy. 875 00:51:26,672 --> 00:51:28,130 But we're actually just never going 876 00:51:28,130 --> 00:51:30,890 to get around to moving this guy down. 877 00:51:30,890 --> 00:51:32,180 Just leave it there. 878 00:51:32,180 --> 00:51:35,606 First, if the min field is blank, store the item there. 879 00:51:35,606 --> 00:51:36,106 Yeah. 880 00:51:36,106 --> 00:51:38,629 AUDIENCE: What do you mean by moving the guy down? 881 00:51:38,629 --> 00:51:40,670 PROFESSOR: Don't worry about moving the guy down. 882 00:51:40,670 --> 00:51:41,711 We're not going to do it. 883 00:51:41,711 --> 00:51:43,230 AUDIENCE: [INAUDIBLE] 884 00:51:43,230 --> 00:51:45,000 PROFESSOR: But in general, moving down 885 00:51:45,000 --> 00:51:46,980 means, when I want to insert an item, 886 00:51:46,980 --> 00:51:50,570 I have to move it down into its sub cluster. 887 00:51:50,570 --> 00:51:54,020 So I want to insert x into the cluster, 888 00:51:54,020 --> 00:51:56,680 high of x with low of x, that recursive call. 889 00:51:56,680 --> 00:51:58,230 That's moving it down. 890 00:51:58,230 --> 00:51:59,320 I'm not going to do that. 891 00:51:59,320 --> 00:52:02,700 If the structure is empty, I'm going 892 00:52:02,700 --> 00:52:06,910 to set v dot min equal to x, and then stop. 
893 00:52:06,910 --> 00:52:18,725 Let me illustrate with some code, maybe over here. 894 00:52:44,960 --> 00:52:46,170 Here's what I mean. 895 00:52:46,170 --> 00:52:50,740 If v dot min is the special none value-- use 896 00:52:50,740 --> 00:52:54,370 Python notation here-- then I'm just going 897 00:52:54,370 --> 00:52:55,730 to set v dot min equal to x. 898 00:52:55,730 --> 00:52:58,470 I should also set v dot max equal to x. 899 00:52:58,470 --> 00:53:00,570 Because I want to keep track of the maximum. 900 00:53:00,570 --> 00:53:01,540 And then, stop. 901 00:53:01,540 --> 00:53:03,480 Return. 902 00:53:03,480 --> 00:53:04,960 That's all I will do for inserting 903 00:53:04,960 --> 00:53:08,320 into an empty structure, is stick it in the min field. 904 00:53:11,040 --> 00:53:13,120 OK, this may seem like a minor change. 905 00:53:13,120 --> 00:53:16,550 But it's going to make this cheap. 906 00:53:16,550 --> 00:53:20,040 So the rest of the algorithm is going to be pretty similar. 907 00:53:20,040 --> 00:53:23,700 There's a couple annoying special cases, 908 00:53:23,700 --> 00:53:26,070 which is, we have to keep the min up to date. 909 00:53:26,070 --> 00:53:28,925 And we have to keep the max up to date, in general. 910 00:53:31,860 --> 00:53:32,750 This one is easy. 911 00:53:32,750 --> 00:53:35,881 We just set v dot max equal to x. 912 00:53:35,881 --> 00:53:37,880 Because we're not doing anything fancy with max. 913 00:53:37,880 --> 00:53:39,000 Min is a little special. 914 00:53:39,000 --> 00:53:43,150 Because if we're inserting an item smaller 915 00:53:43,150 --> 00:53:47,960 than the current minimum, then really x belongs in the slot. 916 00:53:47,960 --> 00:53:49,670 And then whatever was in here needs 917 00:53:49,670 --> 00:53:51,390 to be recursively inserted. 918 00:53:51,390 --> 00:53:59,044 OK, so I'm going to say swap x with v dot min. 919 00:53:59,044 --> 00:54:00,960 So I'm going to put x into the v dot min slot.
920 00:54:00,960 --> 00:54:03,200 And I'm going to pull out whatever item was in there 921 00:54:03,200 --> 00:54:04,800 and call it x now. 922 00:54:04,800 --> 00:54:06,919 And now my remaining goal is to insert x 923 00:54:06,919 --> 00:54:08,210 into the rest of the structure. 924 00:54:08,210 --> 00:54:12,219 There's only one item that gets this freedom of not 925 00:54:12,219 --> 00:54:13,260 being recursively stored. 926 00:54:13,260 --> 00:54:15,093 And it's always going to be the minimum one. 927 00:54:15,093 --> 00:54:18,524 So this way, the new value x goes there. 928 00:54:18,524 --> 00:54:21,190 Whatever used to be there now has to be recursively inserted. 929 00:54:21,190 --> 00:54:23,170 Because every item except the minimum, 930 00:54:23,170 --> 00:54:25,660 we're going to recursively insert. 931 00:54:25,660 --> 00:54:27,680 So the rest is pretty much the same. 932 00:54:27,680 --> 00:54:33,500 But we're going to, instead of always inserting 933 00:54:33,500 --> 00:54:35,060 into the summary structure, we're 934 00:54:35,060 --> 00:54:37,740 going to see whether it's necessary. 935 00:54:37,740 --> 00:54:39,370 Because we know how to do that. 936 00:54:39,370 --> 00:54:42,720 We just look at cluster high of x. 937 00:54:42,720 --> 00:54:47,720 And we see, is it empty? 938 00:54:47,720 --> 00:54:55,810 Cluster high of x-- and empty means its minimum is none. 939 00:54:59,450 --> 00:55:02,860 So we're going to-- in fact, the next line 940 00:55:02,860 --> 00:55:09,670 after this one is going to be insert v cluster 941 00:55:09,670 --> 00:55:21,270 high of x, comma low of x. 942 00:55:21,270 --> 00:55:23,741 All right, that's this line. 943 00:55:23,741 --> 00:55:24,990 We're always going to do that. 944 00:55:27,680 --> 00:55:30,080 And in the special case, where there was previously 945 00:55:30,080 --> 00:55:32,810 nothing in v cluster high of x, we 946 00:55:32,810 --> 00:55:35,080 need to update the summary structure.
947 00:55:35,080 --> 00:55:38,550 And we do that with this line. 948 00:55:38,550 --> 00:55:54,490 So I'm going to insert into v dot summary high of x. 949 00:55:57,900 --> 00:56:00,737 But I'm only doing that in the case when I need to. 950 00:56:00,737 --> 00:56:03,320 If it was already non-empty, I know this has already happened. 951 00:56:03,320 --> 00:56:06,640 So I don't need to bother with that insertion. 952 00:56:06,640 --> 00:56:08,230 OK, this is a weird algorithm. 953 00:56:08,230 --> 00:56:11,150 Because it doesn't look much better. 954 00:56:11,150 --> 00:56:15,110 In the worst case, we're doing two recursive calls to insert. 955 00:56:15,110 --> 00:56:18,748 But I claim this runs in log log u time. 956 00:56:18,748 --> 00:56:19,248 Why? 957 00:56:25,152 --> 00:56:26,628 Yeah. 958 00:56:26,628 --> 00:56:30,564 AUDIENCE: Because when we update the v dot summary, 959 00:56:30,564 --> 00:56:32,774 we [? just ?] [? have the ?] [? first ?] [? line. ?] 960 00:56:32,774 --> 00:56:33,440 PROFESSOR: Good. 961 00:56:33,440 --> 00:56:34,230 Yeah. 962 00:56:34,230 --> 00:56:36,540 In the case when I have to do this summary insertion, 963 00:56:36,540 --> 00:56:38,190 I know this guy was empty. 964 00:56:38,190 --> 00:56:39,770 Cluster high of x was empty. 965 00:56:39,770 --> 00:56:43,640 So this call is just going to do these two lines. 966 00:56:43,640 --> 00:56:45,680 Because I optimized the case of empty-- 967 00:56:45,680 --> 00:56:48,160 when a structure is empty, I spend constant time, 968 00:56:48,160 --> 00:56:49,960 no recursive calls. 969 00:56:49,960 --> 00:56:52,900 That means in the case when cluster high of x is empty, 970 00:56:52,900 --> 00:56:55,450 and I have to pay to insert into the summary structure, 971 00:56:55,450 --> 00:56:57,630 I know my second call is going to be free, only 972 00:56:57,630 --> 00:56:59,540 take constant time. 
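[Editor's sketch] The insert routine walked through above might look roughly like this in Python. It is a sketch under assumptions, not the code on the board: the class name `VEB`, the field name `root`, the requirement that u be 2, 4, 16, 256, and so on, the base case at u = 2, and the duplicate guard are all choices made here for illustration.

```python
import math

class VEB:
    """Sketch of the structure over {0, ..., u-1}; u is assumed to be 2, 4, 16, 256, ..."""
    def __init__(self, u):
        self.u = u
        self.min = None          # smallest item, kept here and in no cluster
        self.max = None          # largest item, also stored recursively
        if u > 2:
            self.root = math.isqrt(u)                      # cluster size, sqrt(u)
            self.cluster = [VEB(self.root) for _ in range(self.root)]
            self.summary = VEB(self.root)                  # which clusters are non-empty

def insert(v, x):
    if v.min is None:            # empty structure: constant time, no recursion
        v.min = v.max = x
        return
    if x == v.min:               # duplicate guard (an addition, not from the lecture)
        return
    if x < v.min:                # x takes the min slot; the old min gets pushed down
        x, v.min = v.min, x
    if x > v.max:
        v.max = x
    if v.u == 2:                 # base case (assumed): min and max hold everything
        return
    i, j = divmod(x, v.root)     # i = high(x), j = low(x)
    if v.cluster[i].min is None: # cluster empty, so the summary has to learn about it
        insert(v.summary, i)     # real recursion; the call below is then O(1)
    insert(v.cluster[i], j)
```

Note that when the summary insertion happens, the cluster insertion on the last line hits the empty-structure case immediately, which is the reason only one of the two calls does real work.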
973 00:56:59,540 --> 00:57:02,510 So either I do this, in which case this takes constant time, 974 00:57:02,510 --> 00:57:06,090 or I don't do this, in which case I make one recursive call. 975 00:57:06,090 --> 00:57:10,690 In both cases, I really am only making one recursive call. 976 00:57:10,690 --> 00:57:19,560 OK, so this runs in log log u. 977 00:57:19,560 --> 00:57:22,170 Because I get the T of u equals 1 times T of square root 978 00:57:22,170 --> 00:57:24,490 of u plus order 1 recurrence. 979 00:57:24,490 --> 00:57:28,404 All the work I'm doing here is constant time, 980 00:57:28,404 --> 00:57:29,695 other than the recursive calls. 981 00:57:32,901 --> 00:57:33,400 Question? 982 00:57:33,400 --> 00:57:36,872 AUDIENCE: So when we insert the first time, 983 00:57:36,872 --> 00:57:40,022 we don't update v dot summary? 984 00:57:40,022 --> 00:57:42,480 PROFESSOR: When I insert into a completely empty structure, 985 00:57:42,480 --> 00:57:43,800 we don't update summary at all. 986 00:57:43,800 --> 00:57:44,430 That's right. 987 00:57:44,430 --> 00:57:46,870 We just store it in the min, and we're done. 988 00:57:46,870 --> 00:57:47,760 AUDIENCE: Oh. 989 00:57:47,760 --> 00:57:52,401 So then, if you were to [? call ?] the successor, 990 00:57:52,401 --> 00:57:52,900 and you-- 991 00:57:52,900 --> 00:57:53,841 PROFESSOR: Good. 992 00:57:53,841 --> 00:57:54,340 Yeah. 993 00:57:54,340 --> 00:57:57,000 The successor algorithm is currently incorrect. 994 00:57:57,000 --> 00:57:58,146 Thank you. 995 00:57:58,146 --> 00:58:01,935 Here's some frisbees for that question and the last answer. 996 00:58:05,230 --> 00:58:05,730 Yeah. 997 00:58:05,730 --> 00:58:08,110 This code is now slightly wrong. 998 00:58:08,110 --> 00:58:12,480 Because sometimes I'm storing elements in v dot min. 999 00:58:12,480 --> 00:58:15,240 And successor is just completely ignoring them. 1000 00:58:15,240 --> 00:58:18,040 So it's not going to find those items.
1001 00:58:18,040 --> 00:58:19,900 Luckily, it's a very simple fix. 1002 00:58:26,360 --> 00:58:30,180 Out of room, but please insert right in here. 1003 00:58:30,180 --> 00:58:41,150 If x is less than v dot min, return v dot min. 1004 00:58:41,150 --> 00:58:43,440 That's all we need to do. 1005 00:58:43,440 --> 00:58:44,590 The min is special. 1006 00:58:44,590 --> 00:58:46,730 Because we're not storing it recursively. 1007 00:58:46,730 --> 00:58:49,240 And so, we can't rely on all of our recursive structures. 1008 00:58:49,240 --> 00:58:50,630 We can't rely on cluster i. 1009 00:58:50,630 --> 00:58:54,510 We can't rely on summary, on reporting about v dot min. 1010 00:58:54,510 --> 00:58:57,720 v dot min is just a special item sitting there. 1011 00:58:57,720 --> 00:58:59,040 It's represented nowhere else. 1012 00:59:01,560 --> 00:59:02,350 But we can check. 1013 00:59:02,350 --> 00:59:03,480 Because it's the minimum element, 1014 00:59:03,480 --> 00:59:05,000 and we're looking for successors, 1015 00:59:05,000 --> 00:59:06,940 it's really easy to check for whether it's 1016 00:59:06,940 --> 00:59:09,070 the item we're looking for. 1017 00:59:09,070 --> 00:59:10,320 Because it's the smallest one. 1018 00:59:10,320 --> 00:59:13,372 If we're smaller than it, then that's clearly the successor. 1019 00:59:13,372 --> 00:59:15,970 OK, so in that case, we just spent constant time. 1020 00:59:15,970 --> 00:59:18,520 So it actually speeds up some situations for successor. 1021 00:59:18,520 --> 00:59:19,825 We're not exploiting that here. 1022 00:59:19,825 --> 00:59:21,450 It doesn't help much in the worst case. 1023 00:59:21,450 --> 00:59:22,870 But now, it should be correct. 1024 00:59:22,870 --> 00:59:25,322 Hopefully, you're happy. 1025 00:59:25,322 --> 00:59:26,155 Any other questions? 1026 00:59:29,670 --> 00:59:33,470 So at this point, we have what I will call a van Emde Boas.
1027 00:59:33,470 --> 00:59:37,560 This last version-- we can do insert and successor in log 1028 00:59:37,560 --> 00:59:38,220 log u time. 1029 00:59:41,100 --> 00:59:42,510 Yeah, sorry. 1030 00:59:42,510 --> 00:59:45,430 I modified the wrong successor algorithm, didn't I? 1031 00:59:45,430 --> 00:59:46,910 I meant to modify this one. 1032 00:59:46,910 --> 00:59:47,830 This is the fast one. 1033 00:59:47,830 --> 00:59:53,300 So please put that code here. 1034 00:59:53,300 --> 00:59:55,940 That's the log log u version of successor. 1035 00:59:55,940 --> 00:59:58,790 We just added this constant time check. 1036 00:59:58,790 --> 01:00:01,050 And now this runs in log log u time. 1037 01:00:01,050 --> 01:00:03,770 The key idea here was if we store the max, 1038 01:00:03,770 --> 01:00:06,780 then we know which of the two recursive calls we need to do. 1039 01:00:06,780 --> 01:00:08,500 If we store the min, this doesn't end up 1040 01:00:08,500 --> 01:00:10,030 being a recursive call. 1041 01:00:10,030 --> 01:00:11,170 So that's very clean. 1042 01:00:11,170 --> 01:00:13,644 With insert, we needed this trickier idea that the min, 1043 01:00:13,644 --> 01:00:15,560 we're not even going to recursively represent. 1044 01:00:15,560 --> 01:00:17,390 We'll just keep it there. 1045 01:00:17,390 --> 01:00:20,340 That requires this extra little check for successor. 1046 01:00:20,340 --> 01:00:22,580 But it allows us to do insert cheaply 1047 01:00:22,580 --> 01:00:27,530 in all cases-- cheap meaning only one recursive call. 
1048 01:00:27,530 --> 01:00:29,530 Either we need to update the summary structure, 1049 01:00:29,530 --> 01:00:31,310 in which case that thing was empty, 1050 01:00:31,310 --> 01:00:34,720 and so we can think of that cluster-- 1051 01:00:34,720 --> 01:00:36,740 so we have this special case of inserting 1052 01:00:36,740 --> 01:00:39,520 into an empty cluster, which is super cheap, 1053 01:00:39,520 --> 01:00:42,900 or most of the time, you imagine that the cluster was already 1054 01:00:42,900 --> 01:00:43,400 non-empty. 1055 01:00:43,400 --> 01:00:45,608 And so we don't need to update the summary structure. 1056 01:00:45,608 --> 01:00:48,110 And then we just do this recursion. 1057 01:00:48,110 --> 01:00:51,210 So in all cases, everything is cheap. 1058 01:00:51,210 --> 01:00:54,940 Now the one thing I've been avoiding is delete. 1059 01:00:54,940 --> 01:00:55,991 Yeah, question. 1060 01:00:55,991 --> 01:00:58,817 AUDIENCE: [INAUDIBLE] If x is greater than [? v ?] max, 1061 01:00:58,817 --> 01:01:03,060 [? we ?] [? swap ?] [? x ?] [? with ?] [? v ?] [? max? ?] 1062 01:01:03,060 --> 01:01:06,580 PROFESSOR: So if x is greater than v max, 1063 01:01:06,580 --> 01:01:08,730 I'm just going to update v max. 1064 01:01:08,730 --> 01:01:10,240 V max is stored recursively. 1065 01:01:10,240 --> 01:01:12,520 We're not doing anything fancy with v max. 1066 01:01:12,520 --> 01:01:15,650 And we had, at some point, a similar line. 1067 01:01:15,650 --> 01:01:18,616 So this is just updating v max. 1068 01:01:18,616 --> 01:01:20,280 Yeah, nothing special there. 1069 01:01:20,280 --> 01:01:23,100 In your problem set, you'll look at a more symmetric version, 1070 01:01:23,100 --> 01:01:25,620 where you don't recursively store min and max. 1071 01:01:25,620 --> 01:01:26,700 It works about the same. 1072 01:01:26,700 --> 01:01:30,430 But in some ways, the code is actually prettier. 1073 01:01:30,430 --> 01:01:31,804 So you'll get to do that. 
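[Editor's sketch] The corrected successor, with the constant-time check against v dot min placed at the top, could be written along these lines. This is a hedged illustration: `VEB`, `root`, the base case at u = 2, and the `x >= v.max` guard that returns `None` are assumptions made here, and `insert` is repeated only so that the block runs on its own.

```python
import math

class VEB:
    """Sketch over {0, ..., u-1}; u is assumed to be 2, 4, 16, 256, ..."""
    def __init__(self, u):
        self.u, self.min, self.max = u, None, None   # min is stored in no cluster
        if u > 2:
            self.root = math.isqrt(u)
            self.cluster = [VEB(self.root) for _ in range(self.root)]
            self.summary = VEB(self.root)

def insert(v, x):                    # same routine as described for insert
    if v.min is None:
        v.min = v.max = x
        return
    if x == v.min:
        return
    if x < v.min:
        x, v.min = v.min, x
    if x > v.max:
        v.max = x
    if v.u == 2:
        return
    i, j = divmod(x, v.root)
    if v.cluster[i].min is None:
        insert(v.summary, i)
    insert(v.cluster[i], j)

def successor(v, x):
    """Smallest stored item strictly greater than x, or None if there is none."""
    if v.min is None or x >= v.max:  # guard added for this sketch
        return None
    if x < v.min:                    # the fix: min lives nowhere else, so check it here
        return v.min
    if v.u == 2:                     # base case: min == 0 <= x < max == 1
        return v.max
    i, j = divmod(x, v.root)         # i = high(x), j = low(x)
    c = v.cluster[i]
    if c.max is not None and j < c.max:
        j = successor(c, j)          # answer is inside x's own cluster
    else:
        i = successor(v.summary, i)  # otherwise jump to the next non-empty cluster
        j = v.cluster[i].min         # and take its minimum, no recursion needed
    return i * v.root + j
```

Storing the max is what lets the code pick exactly one of the two recursive calls; storing the min is what makes the step after the summary call a field lookup.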
1074 01:01:31,804 --> 01:01:32,470 Other questions? 1075 01:01:35,410 --> 01:01:37,900 All right. 1076 01:01:37,900 --> 01:01:40,880 So, delete. 1077 01:01:40,880 --> 01:01:42,174 We have insert and successor. 1078 01:01:42,174 --> 01:01:44,090 And through all these steps, it would actually 1079 01:01:44,090 --> 01:01:46,280 be very hard to do delete. 1080 01:01:46,280 --> 01:01:50,810 It turns out, at this point, delete is no problem. 1081 01:01:50,810 --> 01:01:54,180 So let me give you some delete code. 1082 01:02:21,650 --> 01:02:22,650 It's a little bit long. 1083 01:02:26,190 --> 01:02:29,401 Maybe I'll start with a high level picture, 1084 01:02:29,401 --> 01:02:30,545 sort of the main cases. 1085 01:02:36,260 --> 01:02:38,930 Deleting the min is a little bit special, as you might imagine. 1086 01:02:38,930 --> 01:02:41,090 That element is different from every other element. 1087 01:02:41,090 --> 01:02:44,660 So if x equals min, we're going to do something else. 1088 01:02:44,660 --> 01:02:46,620 But let me specify that later. 1089 01:02:46,620 --> 01:02:49,500 Let's get to the bulk of the code, which 1090 01:02:49,500 --> 01:03:10,220 is we're going to delete low of x from cluster high of x. 1091 01:03:10,220 --> 01:03:12,600 That's the obvious recursion to do. 1092 01:03:12,600 --> 01:03:17,490 This is essentially the reverse of insert over here. 1093 01:03:17,490 --> 01:03:19,450 The first thing we do is undo this. 1094 01:03:19,450 --> 01:03:21,030 In all cases, insert was doing that. 1095 01:03:21,030 --> 01:03:22,420 So in all cases, delete has to do 1096 01:03:22,420 --> 01:03:26,730 that, other than the special case of the min. 1097 01:03:26,730 --> 01:03:29,090 And then, we need to do the inverse of this. 1098 01:03:29,090 --> 01:03:33,580 So if that was the last item, then we 1099 01:03:33,580 --> 01:03:36,100 need to delete from the summary structure.
1100 01:03:36,100 --> 01:03:40,250 So it's actually pretty symmetric, 1101 01:03:40,250 --> 01:03:43,260 other than the tiny details. 1102 01:03:43,260 --> 01:03:48,300 So after we delete, we can check, is that structure empty. 1103 01:03:48,300 --> 01:03:53,110 Because then, the min would equal none. 1104 01:03:53,110 --> 01:03:55,050 OK. 1105 01:03:55,050 --> 01:03:58,225 If that's the case, we delete from the summary structure. 1106 01:04:13,946 --> 01:04:14,446 OK. 1107 01:04:18,350 --> 01:04:20,120 Cool. 1108 01:04:20,120 --> 01:04:22,870 And there is a bit of a special case 1109 01:04:22,870 --> 01:04:26,985 at the end, which is when we deleted the maximum element. 1110 01:04:31,130 --> 01:04:32,780 OK, so I need to fill these in. 1111 01:04:36,570 --> 01:04:39,980 And it's important that these are filled in right. 1112 01:04:39,980 --> 01:04:41,800 Because in some situations here, we 1113 01:04:41,800 --> 01:04:44,170 are making two recursive calls. 1114 01:04:44,170 --> 01:04:48,000 But again, we'd like it to be, when we do both calls, 1115 01:04:48,000 --> 01:04:50,194 we want one of them to be cheap. 1116 01:04:50,194 --> 01:04:51,610 Now this one's hard to make cheap. 1117 01:04:51,610 --> 01:04:54,150 So when we delete from the summary structure, 1118 01:04:54,150 --> 01:04:57,150 we want this delete to have taken only constant time, 1119 01:04:57,150 --> 01:04:59,220 no recursions. 1120 01:04:59,220 --> 01:05:01,550 And that's going to correspond to this case. 1121 01:05:01,550 --> 01:05:04,700 Because if we made the cluster empty, 1122 01:05:04,700 --> 01:05:06,720 that means we deleted the last item. 1123 01:05:06,720 --> 01:05:07,700 What's the last item? 1124 01:05:07,700 --> 01:05:10,630 Has to be v dot min. 1125 01:05:10,630 --> 01:05:12,280 If you have a size 1 structure, it's 1126 01:05:12,280 --> 01:05:14,310 always because that item is in v dot min, 1127 01:05:14,310 --> 01:05:16,160 everything else is empty.
1128 01:05:16,160 --> 01:05:18,260 So that's the case of deleting v dot min. 1129 01:05:18,260 --> 01:05:22,660 So we want this case to take constant time when it's 1130 01:05:22,660 --> 01:05:26,010 the last item we're deleting. 1131 01:05:26,010 --> 01:05:30,550 So let's fill that in a little. 1132 01:05:38,032 --> 01:05:39,240 Let's see if I can fit it in. 1133 01:06:29,090 --> 01:06:30,850 This is code that turns out to work 1134 01:06:30,850 --> 01:06:33,650 in this if x equals v dot min. 1135 01:06:33,650 --> 01:06:36,200 It's a little bit subtle. 1136 01:06:36,200 --> 01:06:39,940 But the key thing to check here is, we want to know, 1137 01:06:39,940 --> 01:06:41,810 is this the last item. 1138 01:06:41,810 --> 01:06:44,980 And one way to do that is to look at the summary structure, 1139 01:06:44,980 --> 01:06:48,142 and say, do you have any non-empty clusters? 1140 01:06:48,142 --> 01:06:49,850 If you don't have any non-empty clusters, 1141 01:06:49,850 --> 01:06:52,250 that means your min is none. 1142 01:06:52,250 --> 01:06:55,460 And that means, the only thing keeping the structure non-empty 1143 01:06:55,460 --> 01:06:56,650 is the minimum element. 1144 01:06:56,650 --> 01:06:58,120 That's stored in v dot min. 1145 01:06:58,120 --> 01:06:59,860 So in that case, that's the one situation 1146 01:06:59,860 --> 01:07:03,070 when v dot min becomes none. 1147 01:07:03,070 --> 01:07:07,420 We never set v dot min equals none in the other algorithms. 1148 01:07:07,420 --> 01:07:10,590 Because initially everything is none. 1149 01:07:10,590 --> 01:07:13,510 But when we're inserting, we never empty a structure. 1150 01:07:13,510 --> 01:07:14,620 Now we're doing delete. 1151 01:07:14,620 --> 01:07:16,440 This is the one situation when v dot 1152 01:07:16,440 --> 01:07:19,050 min becomes none from scratch. 1153 01:07:19,050 --> 01:07:21,780 In that case, no recursive calls. 1154 01:07:21,780 --> 01:07:24,250 So that means this algorithm is efficient. 
1155 01:07:24,250 --> 01:07:26,500 Because if I had to delete from the summary structure, 1156 01:07:26,500 --> 01:07:29,610 this only had a single item, which is this situation. 1157 01:07:29,610 --> 01:07:31,630 Then I just set v dot min equal to none. 1158 01:07:31,630 --> 01:07:32,700 And I'm done. 1159 01:07:32,700 --> 01:07:35,160 So this will, overall, run in log log u time. 1160 01:07:40,170 --> 01:07:42,000 Now, it could be we're deleting the min, 1161 01:07:42,000 --> 01:07:43,820 but it was not the only item. 1162 01:07:43,820 --> 01:07:46,570 So that's this situation. 1163 01:07:46,570 --> 01:07:49,260 In that situation, we want to find out 1164 01:07:49,260 --> 01:07:50,580 what the min actually is. 1165 01:07:50,580 --> 01:07:51,080 Right? 1166 01:07:51,080 --> 01:07:52,487 We just deleted the min. 1167 01:07:52,487 --> 01:07:54,070 We want to put something in v dot min. 1168 01:07:54,070 --> 01:07:55,040 We can't set it to none. 1169 01:07:55,040 --> 01:07:57,206 Because that indicates the whole structure is empty. 1170 01:07:57,206 --> 01:08:01,190 So we have to recursively rip out the new minimum item. 1171 01:08:01,190 --> 01:08:03,660 Because it should not be recursively stored anymore. 1172 01:08:03,660 --> 01:08:06,200 And then we're going to stick it into v dot min. 1173 01:08:06,200 --> 01:08:10,380 So now, finding minimum items is actually pretty easy. 1174 01:08:10,380 --> 01:08:12,990 We just looked at the first non-empty structure. 1175 01:08:12,990 --> 01:08:15,050 And we looked at the-- I think I'm 1176 01:08:15,050 --> 01:08:19,149 missing-- oh, v dot cluster i min, I guess, 1177 01:08:19,149 --> 01:08:21,710 closed parenthesis. 1178 01:08:21,710 --> 01:08:26,300 That is the minimum item in the first cluster. 1179 01:08:26,300 --> 01:08:29,370 So I want to recursively delete it. 1180 01:08:29,370 --> 01:08:30,770 So I'm setting x to that thing.
1181 01:08:30,770 --> 01:08:32,853 And then I'm going to do all this code, which will 1182 01:08:32,853 --> 01:08:34,970 delete x from that structure. 1183 01:08:34,970 --> 01:08:38,770 And then-- I mean, I'm doing it all right here. 1184 01:08:38,770 --> 01:08:41,689 But then, I'm going to set v dot min to be that value. 1185 01:08:41,689 --> 01:08:43,899 So then v dot min has a new value. 1186 01:08:43,899 --> 01:08:45,840 Because I deleted the old one. 1187 01:08:45,840 --> 01:08:48,250 And it's no longer recursively stored. 1188 01:08:48,250 --> 01:08:51,109 I don't want two copies of x floating around. 1189 01:08:51,109 --> 01:08:56,939 So that's why I do, even in this if case, I do all these steps. 1190 01:08:56,939 --> 01:08:58,540 Cool? 1191 01:08:58,540 --> 01:09:00,409 You can see delete-- is that a question? 1192 01:09:00,409 --> 01:09:02,832 AUDIENCE: [INAUDIBLE] 1193 01:09:02,832 --> 01:09:04,790 PROFESSOR: Oh, why did I set v dot max to none? 1194 01:09:04,790 --> 01:09:06,373 AUDIENCE: Because [? that's the ?] all 1195 01:09:06,373 --> 01:09:09,790 [? these ?] [INAUDIBLE] [? x ?] equals v dot max, 1196 01:09:09,790 --> 01:09:10,790 the last time. 1197 01:09:10,790 --> 01:09:12,200 AUDIENCE: [? Do you ?] [? find v dot max? ?] 1198 01:09:12,200 --> 01:09:12,830 PROFESSOR: Oh, right. 1199 01:09:12,830 --> 01:09:13,729 I'm not done yet. 1200 01:09:13,729 --> 01:09:15,988 I haven't specified what to do here. 1201 01:09:15,988 --> 01:09:19,226 OK, you really want to know? 1202 01:09:19,226 --> 01:09:21,140 OK. 1203 01:09:21,140 --> 01:09:24,920 Let's go somewhere else. 1204 01:09:24,920 --> 01:09:27,990 I have enough room, I think. 1205 01:09:27,990 --> 01:09:29,909 Eh, maybe I can squeeze it in. 1206 01:09:29,909 --> 01:09:33,050 It's going to be super compact. 1207 01:09:33,050 --> 01:09:36,470 So, when x equals v dot max, there are two cases. 1208 01:09:43,672 --> 01:09:44,880 So max is a little different. 
1209 01:09:44,880 --> 01:09:47,700 We just need to keep it up to date. 1210 01:09:47,700 --> 01:09:49,109 So it's not that hard. 1211 01:09:49,109 --> 01:09:51,474 We don't have to do any recursive magic. 1212 01:10:09,600 --> 01:10:12,240 Well, I need another line. 1213 01:10:12,240 --> 01:10:13,510 Sorry. 1214 01:10:13,510 --> 01:10:15,040 Let me go up to the other board. 1215 01:10:54,066 --> 01:10:56,550 OK, I think that's the complete delete code. 1216 01:10:56,550 --> 01:10:57,290 You asked for it. 1217 01:10:57,290 --> 01:10:59,150 You've got it. 1218 01:10:59,150 --> 01:11:03,810 So, at this point, we have just deleted 1219 01:11:03,810 --> 01:11:06,310 the max, which means we need to find, basically, 1220 01:11:06,310 --> 01:11:07,410 the predecessor of x. 1221 01:11:07,410 --> 01:11:09,910 But we can't afford a recursive call. 1222 01:11:09,910 --> 01:11:10,660 I mean, that's OK. 1223 01:11:10,660 --> 01:11:13,990 It's just, we're trying to find the max in what remains. 1224 01:11:13,990 --> 01:11:16,030 Imagine v dot max is just wrong. 1225 01:11:16,030 --> 01:11:17,840 So we've got to set it from scratch. 1226 01:11:17,840 --> 01:11:19,410 It's not that hard to do. 1227 01:11:19,410 --> 01:11:23,850 Basically, we want to take the last non-empty structure. 1228 01:11:23,850 --> 01:11:26,430 That would v dot summary dot max, 1229 01:11:26,430 --> 01:11:30,410 and then find the last item in that cluster. 1230 01:11:30,410 --> 01:11:34,030 OK, so cluster i is the last one for v dot summary. 1231 01:11:34,030 --> 01:11:36,950 And then we look v dot cluster of i dot max. 1232 01:11:36,950 --> 01:11:38,190 And we combine it with i. 1233 01:11:38,190 --> 01:11:42,670 That gives us the name of that item in the last cluster, 1234 01:11:42,670 --> 01:11:44,410 the last non-empty cluster. 1235 01:11:44,410 --> 01:11:46,210 But there's a special case, which 1236 01:11:46,210 --> 01:11:48,640 is maybe this returns none. 
1237 01:11:48,640 --> 01:11:52,340 Maybe there actually is nothing in v dot summary. 1238 01:11:52,340 --> 01:11:55,110 That means we just deleted the last item, I guess. 1239 01:11:55,110 --> 01:11:57,240 Or there's only one left. 1240 01:11:57,240 --> 01:11:59,450 We deleted the next to last item. 1241 01:11:59,450 --> 01:12:02,190 Now there's only one item left, namely v dot min. 1242 01:12:02,190 --> 01:12:04,790 So we set v dot max equal to v dot min. 1243 01:12:04,790 --> 01:12:06,480 So that's a special case. 1244 01:12:06,480 --> 01:12:08,170 But most of the time, you're just doing 1245 01:12:08,170 --> 01:12:11,089 a couple dot max's, and you're done. 1246 01:12:11,089 --> 01:12:12,630 So that's how you maintain the maxes, 1247 01:12:12,630 --> 01:12:14,080 even when you're deleting. 1248 01:12:14,080 --> 01:12:16,807 And unless I made an error, I think all these algorithms 1249 01:12:16,807 --> 01:12:17,390 work together. 1250 01:12:17,390 --> 01:12:19,590 You're going to insert, delete, and successor. 1251 01:12:19,590 --> 01:12:23,000 And symmetrically, you can do predecessor in log log u 1252 01:12:23,000 --> 01:12:25,220 time per operation, super fast. 1253 01:12:28,340 --> 01:12:31,160 Let me tell you a couple other things. 1254 01:12:31,160 --> 01:12:34,110 One is, there's a matching lower bound. 1255 01:12:34,110 --> 01:12:36,230 Log log-- maybe you wonder, can I 1256 01:12:36,230 --> 01:12:41,300 get log log log time, log log log log time, or whatever? 1257 01:12:41,300 --> 01:12:43,180 No. 1258 01:12:43,180 --> 01:12:46,274 In most reasonable choices of parameters-- 1259 01:12:46,274 --> 01:12:48,190 it's a little bit more complicated than this-- 1260 01:12:48,190 --> 01:12:50,940 but for most of the time that you care about, 1261 01:12:50,940 --> 01:12:54,330 log log u is the right answer. 1262 01:12:54,330 --> 01:12:55,770 This was proved in 2007. 1263 01:12:55,770 --> 01:12:59,536 So it took us decades to really understand.
1264 01:12:59,536 --> 01:13:02,665 It's by a former MIT student. 1265 01:13:06,250 --> 01:13:18,299 So I'll give you some range where it holds, 1266 01:13:18,299 --> 01:13:19,590 which will raise another issue. 1267 01:13:19,590 --> 01:13:26,300 But, OK. 1268 01:13:26,300 --> 01:13:28,910 So this range is the range I talked about before. 1269 01:13:28,910 --> 01:13:30,891 This is when log log u equals log log n. 1270 01:13:30,891 --> 01:13:33,390 So that's kind of the case where you care about applying it. 1271 01:13:33,390 --> 01:13:37,090 If log log u is more like log n, it's not so interesting. 1272 01:13:37,090 --> 01:13:39,230 But as long as u is not too big, this 1273 01:13:39,230 --> 01:13:42,100 is a little bit bigger than polynomial in n. 1274 01:13:42,100 --> 01:13:45,486 Then this is the right answer. 1275 01:13:45,486 --> 01:13:47,360 Now technically, you need another assumption, 1276 01:13:47,360 --> 01:13:49,068 which is the space of your data structure 1277 01:13:49,068 --> 01:13:50,782 is not too super linear. 1278 01:13:50,782 --> 01:13:51,990 Now this is a little awkward. 1279 01:13:51,990 --> 01:13:54,545 Because the space of this data structure 1280 01:13:54,545 --> 01:13:59,120 is actually order u, not n. 1281 01:13:59,120 --> 01:14:00,766 So the last issue is space. 1282 01:14:06,140 --> 01:14:07,430 Space is order u. 1283 01:14:07,430 --> 01:14:11,290 Let me go back to this binary tree picture. 1284 01:14:11,290 --> 01:14:13,360 So we had the idea of, well, there's 1285 01:14:13,360 --> 01:14:15,640 all these bits at the bottom. 1286 01:14:15,640 --> 01:14:18,860 We're building a big binary tree above those. 1287 01:14:18,860 --> 01:14:20,890 The leaves are the actual data. 1288 01:14:20,890 --> 01:14:23,410 And then we're summarizing, by for every node, 1289 01:14:23,410 --> 01:14:25,462 we're writing the or of the two nodes below it, 1290 01:14:25,462 --> 01:14:27,670 which is summarizing whether that thing is non-empty.
1291 01:14:32,230 --> 01:14:34,400 What van Emde Boas is doing-- so first of all, 1292 01:14:34,400 --> 01:14:37,900 you see that the total number of nodes in this tree is order u. 1293 01:14:37,900 --> 01:14:39,521 Because there's u leaves. 1294 01:14:39,521 --> 01:14:41,395 The total size of a binary tree with u leaves 1295 01:14:41,395 --> 01:14:44,112 is order u, 2u minus 1, right? 1296 01:14:46,630 --> 01:14:49,300 And you can kind of see what van Emde Boas is doing here. 1297 01:14:49,300 --> 01:14:52,637 First, it's thinking about the middle level. 1298 01:14:52,637 --> 01:14:54,470 Now it's not directly looking at these bits. 1299 01:14:54,470 --> 01:14:57,940 It says, hey look, I know my item, 1300 01:14:57,940 --> 01:15:01,430 the thing I'm doing a successor of, let's say, is three. 1301 01:15:01,430 --> 01:15:03,580 I want to know the successor of this position. 1302 01:15:03,580 --> 01:15:08,160 First, I want to check, should I recurse in this block, 1303 01:15:08,160 --> 01:15:10,880 or should I recurse in the summary 1304 01:15:10,880 --> 01:15:13,010 block-- which I didn't draw. 1305 01:15:13,010 --> 01:15:16,340 But it's the part of the tree that would be up here. 1306 01:15:16,340 --> 01:15:23,280 And that's exactly what we're doing with successor. 1307 01:15:23,280 --> 01:15:25,842 Should we recursively look within cluster i? 1308 01:15:25,842 --> 01:15:27,800 Or should we look within the summary structure? 1309 01:15:27,800 --> 01:15:29,790 We only do one or the other. 1310 01:15:29,790 --> 01:15:32,190 And that's the sense in which we are binary searching 1311 01:15:32,190 --> 01:15:33,750 on the levels of this tree. 
1312 01:15:33,750 --> 01:15:36,602 Either we will spend all of our work recursively looking 1313 01:15:36,602 --> 01:15:38,810 for the successor within the summary structure, which 1314 01:15:38,810 --> 01:15:42,960 is like finding the next 1 bit in this row, the middle row, 1315 01:15:42,960 --> 01:15:46,749 or we will spend all of our time doing successor in here. 1316 01:15:46,749 --> 01:15:47,540 And we can do that. 1317 01:15:47,540 --> 01:15:49,360 Because we have the max augmented. 1318 01:15:49,360 --> 01:15:52,424 OK, but that's the sense in which, kind of, 1319 01:15:52,424 --> 01:15:54,590 you are binary searching in the levels of this tree. 1320 01:15:54,590 --> 01:15:57,870 So that's that early intuition for van Emde Boas 1321 01:15:57,870 --> 01:15:59,800 is kind of what we're doing. 1322 01:15:59,800 --> 01:16:04,970 The trouble is, to store that tree takes order u space. 1323 01:16:04,970 --> 01:16:07,920 We'd really like to spend order n space. 1324 01:16:07,920 --> 01:16:09,680 And I have four minutes. 1325 01:16:09,680 --> 01:16:14,150 So you'll see part of the answer to this. 1326 01:16:17,132 --> 01:16:18,623 My poor microphone. 1327 01:16:23,600 --> 01:16:26,319 Let me give you an idea of how to fix the space bound. 1328 01:16:26,319 --> 01:16:27,485 Let's erase some algorithms. 1329 01:16:41,880 --> 01:16:50,910 The main idea here is only store non-empty clusters, 1330 01:16:50,910 --> 01:16:52,160 pretty simple idea. 1331 01:16:54,930 --> 01:16:58,180 We want to spend space only for the present items, 1332 01:16:58,180 --> 01:16:59,200 not for the absent ones. 1333 01:16:59,200 --> 01:17:01,940 So don't store the absent ones. 1334 01:17:01,940 --> 01:17:04,630 In particular, we're doing all this work around 1335 01:17:04,630 --> 01:17:07,560 when clusters are empty, in which case 1336 01:17:07,560 --> 01:17:10,270 we can see that just by looking at the min item, 1337 01:17:10,270 --> 01:17:11,360 or when they're non-empty. 
1338 01:17:11,360 --> 01:17:13,380 So let's just store the non-empty ones. 1339 01:17:13,380 --> 01:17:17,170 That will get you down to almost order n space, not quite, 1340 01:17:17,170 --> 01:17:19,070 but close. 1341 01:17:19,070 --> 01:17:24,370 To do this, v dot cluster is no longer an array. 1342 01:17:24,370 --> 01:17:29,460 Just make it a hash table, a dictionary in Python. 1343 01:17:29,460 --> 01:17:33,460 So v dot cluster-- we were always 1344 01:17:33,460 --> 01:17:34,750 doing v dot cluster of i. 1345 01:17:34,750 --> 01:17:37,020 Just make that into a dictionary instead of an array. 1346 01:17:37,020 --> 01:17:38,360 And you save most of the space. 1347 01:17:38,360 --> 01:17:40,795 You only have to store the non-empty items. 1348 01:17:47,260 --> 01:17:50,030 And you should know from 006, hash table is constant 1349 01:17:50,030 --> 01:17:51,320 expected. 1350 01:17:51,320 --> 01:17:54,870 We'll prove that formally in lecture eight, I think. 1351 01:17:54,870 --> 01:17:58,700 But for now, take hashing as given. 1352 01:17:58,700 --> 01:18:00,950 Everything we did before is essentially the same cost, 1353 01:18:00,950 --> 01:18:04,420 but in expectation, no longer worst case. 1354 01:18:04,420 --> 01:18:07,140 But now the space goes way down. 1355 01:18:07,140 --> 01:18:12,290 Because if you look at an item, when you insert an item, 1356 01:18:12,290 --> 01:18:15,620 it sort of goes to log log u different places, 1357 01:18:15,620 --> 01:18:16,910 in the worst case. 1358 01:18:16,910 --> 01:18:21,800 But, yeah. 1359 01:18:21,800 --> 01:18:28,740 We end up with n log log u space, which is pretty good, 1360 01:18:28,740 --> 01:18:31,077 almost linear space. 1361 01:18:31,077 --> 01:18:33,160 It's a little tricky to see why you get log log u. 1362 01:18:33,160 --> 01:18:38,330 But I guess if you look at the insert algorithm, 1363 01:18:38,330 --> 01:18:41,710 even though we had two recursive calls in the worst case.
1364 01:18:41,710 --> 01:18:43,430 One of them was free. 1365 01:18:43,430 --> 01:18:45,510 When we do both of them, we insert here. 1366 01:18:45,510 --> 01:18:47,506 This one happens to be free. 1367 01:18:47,506 --> 01:18:48,380 Because it was empty. 1368 01:18:48,380 --> 01:18:49,990 But we still pay for it. 1369 01:18:49,990 --> 01:18:52,310 We set v dot min equal to x. 1370 01:18:52,310 --> 01:18:55,110 And so that structure went from empty to non-empty. 1371 01:18:55,110 --> 01:18:57,880 So this costs 1. 1372 01:18:57,880 --> 01:19:00,785 And then we recursively call insert v dot summary 1373 01:19:00,785 --> 01:19:02,300 on high of x. 1374 01:19:02,300 --> 01:19:05,760 So we might, when we insert one item x, if lots of things 1375 01:19:05,760 --> 01:19:10,350 were empty, actually log log u structures become non-empty, 1376 01:19:10,350 --> 01:19:13,180 and that's why you pay log log u for each item you insert. 1377 01:19:13,180 --> 01:19:14,880 It's kind of annoying. 1378 01:19:14,880 --> 01:19:16,760 There is a fix, which is in my notes. 1379 01:19:16,760 --> 01:19:21,960 You can read it, for reducing this further to order n. 1380 01:19:21,960 --> 01:19:25,500 But, OK, I have 30 seconds to explain it. 1381 01:19:25,500 --> 01:19:28,240 The idea is-- you're not responsible for knowing it. 1382 01:19:28,240 --> 01:19:29,800 This is just in case you're curious. 1383 01:19:32,500 --> 01:19:35,200 The idea is, instead of going all the way down 1384 01:19:35,200 --> 01:19:37,790 in the recursion, at the very bottom, 1385 01:19:37,790 --> 01:19:39,970 you say, well, normally you stop 1386 01:19:39,970 --> 01:19:41,890 the recursion when you have u equals 1387 01:19:41,890 --> 01:19:50,130 1; instead, just stop the recursion when n is very small, 1388 01:19:50,130 --> 01:19:52,810 like log log u. 1389 01:19:52,810 --> 01:19:55,430 When I'm only storing log log u items, 1390 01:19:55,430 --> 01:19:56,514 put them in a linked list.
1391 01:19:56,514 --> 01:19:57,054 I don't care. 1392 01:19:57,054 --> 01:19:59,040 You can do whatever you want on log log u items 1393 01:19:59,040 --> 01:20:00,690 in log log u time. 1394 01:20:00,690 --> 01:20:02,120 It's just a tiny tweak. 1395 01:20:02,120 --> 01:20:05,720 But it turns out, it gets rid of that log log u in the space. 1396 01:20:05,720 --> 01:20:07,317 So it's a little bit messier. 1397 01:20:07,317 --> 01:20:09,650 And I don't know if you'd want to implement it that way. 1398 01:20:09,650 --> 01:20:11,910 But you can reduce to linear space. 1399 01:20:11,910 --> 01:20:13,738 And that's van Emde Boas.
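
To make the space fix concrete, here is a minimal Python sketch of the "only store non-empty clusters" idea described above, with v.cluster as a dictionary instead of an array. It is an illustration under assumptions, not the lecture's own code: the class name `VEB`, the `bits` parameter (universe size u = 2**bits), the lazily created summary, the bits == 1 base case, and the `nodes` counter are all made up for this example. It does not include the linked-list tweak that gets down to linear space.

```python
class VEB:
    """Sketch of a van Emde Boas set over the universe {0, ..., 2**bits - 1}."""

    def __init__(self, bits):
        self.bits = bits
        self.min = None            # the minimum lives here only, never in a cluster
        self.max = None
        if bits > 1:
            self.lo_bits = bits // 2
            self.hi_bits = bits - self.lo_bits
            self.summary = None    # allocated on first use (assumption of this sketch)
            self.cluster = {}      # hash table: high(x) -> non-empty sub-structure

    def high(self, x):
        return x >> self.lo_bits

    def low(self, x):
        return x & ((1 << self.lo_bits) - 1)

    def index(self, i, j):
        return (i << self.lo_bits) | j

    def insert(self, x):
        if self.min is None:       # empty structure: constant-time insert
            self.min = self.max = x
            return
        if x == self.min:          # already present as the minimum
            return
        if x < self.min:           # new minimum; push the old one down instead
            x, self.min = self.min, x
        if x > self.max:
            self.max = x
        if self.bits == 1:         # base case: min and max describe everything
            return
        i, j = self.high(x), self.low(x)
        if i not in self.cluster:  # cluster goes from empty to non-empty
            self.cluster[i] = VEB(self.lo_bits)
            if self.summary is None:
                self.summary = VEB(self.hi_bits)
            self.summary.insert(i)
        self.cluster[i].insert(j)  # constant time if the cluster was just created

    def successor(self, x):
        """Smallest stored element strictly greater than x, or None."""
        if self.min is None:
            return None
        if x < self.min:
            return self.min
        if self.bits == 1:
            return self.max if x < self.max else None
        i, j = self.high(x), self.low(x)
        c = self.cluster.get(i)
        if c is not None and j < c.max:      # answer is inside x's own cluster
            return self.index(i, c.successor(j))
        if self.summary is None:
            return None
        i = self.summary.successor(i)        # otherwise find the next non-empty cluster
        if i is None:
            return None
        return self.index(i, self.cluster[i].min)

    def nodes(self):
        """Number of allocated sub-structures, a rough proxy for space."""
        if self.bits == 1:
            return 1
        total = 1 + sum(c.nodes() for c in self.cluster.values())
        return total + (self.summary.nodes() if self.summary is not None else 0)


v = VEB(16)                        # u = 2**16
for x in [2, 3, 500, 40000, 65535]:
    v.insert(x)
print(v.successor(3))              # 500
print(v.nodes())                   # grows with n, not with u
```

Because absent clusters are never allocated, the number of sub-structures depends on how many items were inserted rather than on u. Each insert can still turn one structure per recursion level from empty to non-empty, which is where the n log log u space bound comes from.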