1 00:00:00,090 --> 00:00:02,490 The following content is provided under a Creative 2 00:00:02,490 --> 00:00:04,030 Commons license. 3 00:00:04,030 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,720 continue to offer high-quality educational resources for free. 5 00:00:10,720 --> 00:00:13,320 To make a donation or view additional materials 6 00:00:13,320 --> 00:00:17,280 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,280 --> 00:00:18,450 at ocw.mit.edu. 8 00:00:21,709 --> 00:00:23,500 ERIK DEMAINE: All right, let's get started. 9 00:00:23,500 --> 00:00:27,960 So today, we start out with geometry, geometric data structures. 10 00:00:27,960 --> 00:00:29,680 There are two lectures on this. 11 00:00:29,680 --> 00:00:31,080 This is lecture one. 12 00:00:31,080 --> 00:00:33,910 And we're going to solve two main problems today. 13 00:00:33,910 --> 00:00:36,900 One is point location, which is finding yourself on a map. 14 00:00:36,900 --> 00:00:38,740 And the other is orthogonal range searching, 15 00:00:38,740 --> 00:00:43,280 which is catching a bunch of dots with a rectangular net. 16 00:00:43,280 --> 00:00:44,700 And they're fun problems. 17 00:00:44,700 --> 00:00:48,420 And they're good illustrations of a couple of techniques. 18 00:00:48,420 --> 00:00:51,521 We're going to cover two general techniques for data structure 19 00:00:51,521 --> 00:00:52,020 building. 20 00:00:52,020 --> 00:00:54,120 One is dynamizing static data structures, 21 00:00:54,120 --> 00:00:57,030 turning static into dynamic using a technique called weight 22 00:00:57,030 --> 00:00:58,582 balance, which is really cool. 23 00:00:58,582 --> 00:01:00,540 And another one is called fractional cascading, 24 00:01:00,540 --> 00:01:02,331 which has probably one of the coolest names 25 00:01:02,331 --> 00:01:04,739 of any algorithmic or data structures technique. 26 00:01:04,739 --> 00:01:07,320 It's actually a very simple idea, but it sounds very scary.
27 00:01:10,710 --> 00:01:12,470 And with point location, we're going 28 00:01:12,470 --> 00:01:15,180 to see some fun connections to persistence and retroactivity, 29 00:01:15,180 --> 00:01:18,635 which were the topics of the last two lectures, you may recall. 30 00:01:18,635 --> 00:01:20,010 And so we'll start out with that. 31 00:01:22,970 --> 00:01:27,600 Planar point location, you can do it in higher dimensions 32 00:01:27,600 --> 00:01:28,500 as well. 33 00:01:28,500 --> 00:01:30,060 In general, geometric data structures 34 00:01:30,060 --> 00:01:32,680 are about going to more than one dimension. 35 00:01:32,680 --> 00:01:35,850 Most data structures are about one-dimensional ordered data. 36 00:01:35,850 --> 00:01:39,430 Now, we have points in the plane. 37 00:01:39,430 --> 00:01:42,630 We might have polygons in the plane. 38 00:01:42,630 --> 00:01:48,810 So this is what we call a planar map: 39 00:01:48,810 --> 00:01:50,550 we've got a bunch of line segments and points 40 00:01:50,550 --> 00:01:51,930 forming a graph structure. 41 00:01:51,930 --> 00:01:53,610 So think of it as a planar graph drawn 42 00:01:53,610 --> 00:01:57,060 in the plane where every edge is a straight line segment. 43 00:01:57,060 --> 00:02:01,050 And none of the edges cross, let's say. 44 00:02:01,050 --> 00:02:04,335 So this is a planar map. 45 00:02:04,335 --> 00:02:08,520 It's also called a planar straight-line graph. 46 00:02:08,520 --> 00:02:12,010 And the static version of this problem-- 47 00:02:12,010 --> 00:02:14,910 so there are two versions, one is static-- 48 00:02:14,910 --> 00:02:18,490 you want to preprocess the map. 49 00:02:18,490 --> 00:02:21,960 So I give you a single map up front. 50 00:02:21,960 --> 00:02:27,120 And then I want to support dynamic queries of the form: 51 00:02:27,120 --> 00:02:33,800 which face contains a point p? 52 00:02:38,010 --> 00:02:39,570 So that point is going to be given 53 00:02:39,570 --> 00:02:42,750 to you as coordinates x and y.
54 00:02:42,750 --> 00:02:47,430 So maybe I mark a point like this one. 55 00:02:47,430 --> 00:02:49,000 I give you those x and y coordinates. 56 00:02:49,000 --> 00:02:51,930 I want to quickly determine that this face is 57 00:02:51,930 --> 00:02:53,355 the one that contains it. 58 00:02:53,355 --> 00:02:54,960 I give you another point over here. 59 00:02:54,960 --> 00:02:57,250 It should quickly determine this face. 60 00:02:57,250 --> 00:02:58,590 This has a lot of applications. 61 00:02:58,590 --> 00:03:02,137 If you're writing a GUI and someone clicks on the screen, 62 00:03:02,137 --> 00:03:04,470 you need to map the coordinates that the mouse gives you 63 00:03:04,470 --> 00:03:08,760 to which GUI element you're clicking on. 64 00:03:08,760 --> 00:03:12,070 If you have a GPS device, it has a map, 65 00:03:12,070 --> 00:03:14,370 so it can preprocess the map all at once. 66 00:03:14,370 --> 00:03:16,560 And now, given two GPS coordinates, latitude 67 00:03:16,560 --> 00:03:19,021 and longitude, it needs to know which city you're in, 68 00:03:19,021 --> 00:03:21,270 which part of the map you're in, so that it knows what 69 00:03:21,270 --> 00:03:24,250 to display, that sort of thing. 70 00:03:24,250 --> 00:03:26,190 These are all planar point location problems. 71 00:03:26,190 --> 00:03:28,150 It comes up in simulation, in lots of things. 72 00:03:28,150 --> 00:03:29,774 It's actually one of the first problems 73 00:03:29,774 --> 00:03:33,930 that got me interested in algorithms, way back 74 00:03:33,930 --> 00:03:36,600 in my oceanography days. 75 00:03:36,600 --> 00:03:39,660 So that's planar point location. 76 00:03:39,660 --> 00:03:40,770 That's the static version. 77 00:03:40,770 --> 00:03:44,010 In the dynamic version-- to make things harder-- 78 00:03:44,010 --> 00:03:45,976 the map is dynamic. 79 00:03:45,976 --> 00:03:47,100 So in the static version, the map is static. 80 00:03:47,100 --> 00:03:49,110 The queries are still coming online.
81 00:03:49,110 --> 00:03:53,940 In the dynamic version, you can insert and delete edges in your map. 82 00:03:57,390 --> 00:03:59,730 And let's say if you get a vertex down to degree 0, 83 00:03:59,730 --> 00:04:02,180 you can delete the vertex as well, or add new degree-0 84 00:04:02,180 --> 00:04:03,390 vertices. 85 00:04:03,390 --> 00:04:05,220 As long as you don't have crossings 86 00:04:05,220 --> 00:04:07,980 introduced by inserting edges, you can change things. 87 00:04:07,980 --> 00:04:11,040 So that's obviously harder. 88 00:04:11,040 --> 00:04:14,150 And we can solve this problem using persistence and using 89 00:04:14,150 --> 00:04:19,079 retroactivity in a pretty simple way, using a technique which you 90 00:04:19,079 --> 00:04:23,040 may have seen before, a pretty classic technique 91 00:04:23,040 --> 00:04:24,186 in computational geometry. 92 00:04:24,186 --> 00:04:25,560 So this is a technique that comes 93 00:04:25,560 --> 00:04:27,120 from the algorithms world. 94 00:04:27,120 --> 00:04:31,860 And we're going to apply it to the data structures world. 95 00:04:31,860 --> 00:04:37,360 So, the sweep line technique: it's a very simple idea. 96 00:04:37,360 --> 00:04:42,900 So you have some line segments in the plane, 97 00:04:42,900 --> 00:04:45,000 something like this. 98 00:04:45,000 --> 00:04:48,750 And I'm going to take a vertical line. 99 00:04:48,750 --> 00:04:52,900 So the algorithmic problem is, I want to know: 100 00:04:52,900 --> 00:04:53,910 are there any crossings? 101 00:04:53,910 --> 00:04:55,201 Do any of these segments cross? 102 00:04:55,201 --> 00:04:57,240 This is where the sweep line technique comes from, 103 00:04:57,240 --> 00:04:58,460 I believe. 104 00:04:58,460 --> 00:05:02,650 So the idea is we want to linearize or one-dimensionalify 105 00:05:02,650 --> 00:05:03,160 the problem. 106 00:05:03,160 --> 00:05:06,210 So just take a slice of the problem with a vertical line.
107 00:05:06,210 --> 00:05:09,370 And imagine sweeping that line from left to right. 108 00:05:09,370 --> 00:05:11,610 So you imagine it moving continuously. 109 00:05:11,610 --> 00:05:14,730 Of course, in reality, it moves discretely. 110 00:05:20,390 --> 00:05:23,510 Let me disambiguate this a little bit. 111 00:05:33,291 --> 00:05:33,790 OK. 112 00:05:33,790 --> 00:05:35,320 There are discrete moments in time 113 00:05:35,320 --> 00:05:38,425 when what is hit by the sweep line changes. 114 00:05:41,370 --> 00:05:42,930 Let me maybe label these segments. 115 00:05:42,930 --> 00:05:48,790 We've got a, b, c, and d. 116 00:05:48,790 --> 00:05:50,620 So initially, we hit nothing. 117 00:05:50,620 --> 00:05:54,064 Then we hit a, then we hit b. 118 00:05:54,064 --> 00:05:54,730 Why do we hit b? 119 00:05:54,730 --> 00:05:56,977 Because we saw the left endpoint of b. 120 00:05:56,977 --> 00:05:59,560 Then we see the right endpoint of a, which means we no longer-- 121 00:05:59,560 --> 00:06:04,610 sorry, at this point, we see both a and b in that order. 122 00:06:04,610 --> 00:06:07,230 Then we lose a, so we're down to b. 123 00:06:07,230 --> 00:06:08,410 Then we see c. 124 00:06:08,410 --> 00:06:11,050 c is above b. 125 00:06:11,050 --> 00:06:15,760 Then we see d. d is above c and b. 126 00:06:15,760 --> 00:06:18,180 Then c and d cross. 127 00:06:18,180 --> 00:06:20,920 So c and d change positions. 128 00:06:20,920 --> 00:06:22,900 And then we have b. 129 00:06:22,900 --> 00:06:28,630 Then we lose b, then we lose c. 130 00:06:28,630 --> 00:06:30,970 Then we lose d. 131 00:06:30,970 --> 00:06:34,930 This is a classic algorithm for detecting these intersections. 132 00:06:34,930 --> 00:06:37,090 I don't want to get into the details of how you do this. 133 00:06:37,090 --> 00:06:41,110 But you're looking for when things change order 134 00:06:41,110 --> 00:06:42,682 in these cross-sections.
135 00:06:42,682 --> 00:06:44,140 The way you do that is you maintain 136 00:06:44,140 --> 00:06:46,670 the cross-section in a binary search tree, 137 00:06:46,670 --> 00:06:48,040 so you maintain the order. 138 00:06:48,040 --> 00:06:50,080 If you hit a left endpoint, you insert into the binary search 139 00:06:50,080 --> 00:06:50,260 tree. 140 00:06:50,260 --> 00:06:52,300 If you see a right endpoint, you delete from the binary search 141 00:06:52,300 --> 00:06:53,110 tree. 142 00:06:53,110 --> 00:06:54,670 And you do some stuff to check for crossings. 143 00:06:54,670 --> 00:06:56,336 In this problem, there are no crossings. 144 00:06:56,336 --> 00:06:59,290 So we don't need to worry about that. 145 00:06:59,290 --> 00:07:01,280 But we're taking this technique 146 00:07:01,280 --> 00:07:03,470 and saying, OK, there's a data structure here, 147 00:07:03,470 --> 00:07:08,800 which is the binary search tree maintaining the cross-section. 148 00:07:08,800 --> 00:07:09,330 OK. 149 00:07:09,330 --> 00:07:26,120 So, typically, the cross-section data structure 150 00:07:26,120 --> 00:07:29,830 is a regular balanced binary search tree. 151 00:07:34,150 --> 00:07:39,260 Our idea is, what if we add persistence 152 00:07:39,260 --> 00:07:40,400 to that binary search tree? 153 00:07:40,400 --> 00:07:42,441 So instead of using a regular binary search tree, 154 00:07:42,441 --> 00:07:45,440 we use a partially persistent balanced binary search 155 00:07:45,440 --> 00:07:47,139 tree, which we know how to do. 156 00:07:47,139 --> 00:07:48,680 This is a bounded in-degree structure. 157 00:07:48,680 --> 00:07:53,070 We can make it partially persistent with constant overhead. 158 00:07:53,070 --> 00:08:07,430 So if we add partial persistence, 159 00:08:07,430 --> 00:08:09,590 what does that let us do? 160 00:08:09,590 --> 00:08:11,360 Well, let's just look at a moment 161 00:08:11,360 --> 00:08:14,340 in the past; partial persistence is about querying the past.
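To make the sweep concrete, here is a minimal Python sketch, assuming integer coordinates, no vertical segments, and no crossings (the point-location setting). A plain Python list kept in bottom-to-top order stands in for the balanced binary search tree, so each insertion costs O(n) rather than O(log n). The helper names `key_at` and `sweep`, the slope tie-breaker for shared left endpoints, and the rule that deletions come before insertions at equal x are illustrative choices, not from the lecture. The recorded history is exactly the sequence of versions that partial persistence would let us query.

```python
from fractions import Fraction


def key_at(seg, x):
    """Sort key of a non-vertical segment ((x1, y1), (x2, y2)) at abscissa x:
    its height there, with the slope breaking ties at shared left endpoints."""
    (x1, y1), (x2, y2) = seg
    slope = Fraction(y2 - y1, x2 - x1)
    return (y1 + slope * (x - x1), slope)


def sweep(segments):
    """segments: dict name -> ((x1, y1), (x2, y2)) with x1 < x2, no crossings.
    Returns (x, cross-section bottom to top) after every event."""
    events = []
    for name, ((x1, _), (x2, _)) in segments.items():
        events.append((x1, 1, name))  # left endpoint: insert
        events.append((x2, 0, name))  # right endpoint: delete (sorts first at equal x)
    events.sort()

    section = []  # stand-in for the balanced BST holding the cross-section
    history = []
    for x, kind, name in events:
        if kind == 0:
            section.remove(name)
        else:
            # Binary search for the insertion point, comparing heights at x.
            k = key_at(segments[name], x)
            lo, hi = 0, len(section)
            while lo < hi:
                mid = (lo + hi) // 2
                if key_at(segments[section[mid]], x) < k:
                    lo = mid + 1
                else:
                    hi = mid
            section.insert(lo, name)
        history.append((x, list(section)))
    return history


segments = {
    "a": ((0, 0), (4, 0)),
    "b": ((2, 2), (8, 2)),
    "c": ((5, 5), (9, 6)),
}
for x, section in sweep(segments):
    print(x, section)
```

On this small example the cross-section goes from `['a']` to `['a', 'b']`, `['b']`, `['b', 'c']`, `['c']`, and finally empty.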
162 00:08:14,340 --> 00:08:17,210 So there's a sequence of insertions and deletions 163 00:08:17,210 --> 00:08:19,217 that occur from the sweep line. 164 00:08:19,217 --> 00:08:21,050 But now, if we can query in the past, that's 165 00:08:21,050 --> 00:08:24,860 like going to a desired x-coordinate and saying, 166 00:08:24,860 --> 00:08:28,620 what does my data structure look like at this moment? 167 00:08:28,620 --> 00:08:29,740 OK. 168 00:08:29,740 --> 00:08:32,049 Now, the data structure, let's maybe look at this one, 169 00:08:32,049 --> 00:08:35,870 because it's got three elements, very exciting. 170 00:08:35,870 --> 00:08:39,970 So you've got d, then c, then b. 171 00:08:39,970 --> 00:08:41,530 So you've got a little data structure 172 00:08:41,530 --> 00:08:44,870 that looks something like this. 173 00:08:44,870 --> 00:08:46,780 It understands the order of the cross-section 174 00:08:46,780 --> 00:08:48,670 of those segments. 175 00:08:48,670 --> 00:08:52,690 And so, for example, if I was given a query point 176 00:08:52,690 --> 00:08:57,370 like this one, I could figure out 177 00:08:57,370 --> 00:09:00,010 what is the segment above me, what is the segment below me. 178 00:09:00,010 --> 00:09:03,190 That is a successor query and a predecessor query 179 00:09:03,190 --> 00:09:04,840 in that binary search tree. 180 00:09:11,500 --> 00:09:16,750 This notation maybe-- a query at time t of, let's say, 181 00:09:16,750 --> 00:09:35,370 successor of y is what we call an upward ray shooting query 182 00:09:35,370 --> 00:09:38,620 from coordinates t,y. 183 00:09:38,620 --> 00:09:42,330 So t, the time, is acting as x-coordinate. 184 00:09:42,330 --> 00:09:45,640 Time is left to right here. 185 00:09:45,640 --> 00:09:49,080 And so what's happening is we're imagining, from this point, 186 00:09:49,080 --> 00:09:51,960 shooting a ray upward and asking what is 187 00:09:51,960 --> 00:09:53,670 the segment that I hit first. 
188 00:09:53,670 --> 00:09:56,130 That's an upward ray shooting query. 189 00:09:56,130 --> 00:10:01,710 And this is from a problem called vertical ray shooting, 190 00:10:01,710 --> 00:10:05,055 which is more or less equivalent to planar point location. 191 00:10:10,710 --> 00:10:16,560 So in vertical ray shooting, again, you're given a planar map. 192 00:10:16,560 --> 00:10:19,540 And the queries are like this. 193 00:10:19,540 --> 00:10:23,430 What is the first segment that you hit with an upward ray? 194 00:10:35,750 --> 00:10:39,110 So I give you a point (x, y). 195 00:10:39,110 --> 00:10:41,060 And I ask, if I go up from there, 196 00:10:41,060 --> 00:10:43,720 what's the next edge that I hit? 197 00:10:43,720 --> 00:10:45,797 That's the vertical ray shooting problem. 198 00:10:45,797 --> 00:10:47,255 And we just solved the vertical ray 199 00:10:47,255 --> 00:10:49,340 shooting problem for the static case. 200 00:10:49,340 --> 00:10:52,430 If you're given a static map, you run this algorithm once, 201 00:10:52,430 --> 00:10:55,049 assuming there are no crossings. 202 00:10:55,049 --> 00:10:56,840 Then to answer a vertical ray shooting query, 203 00:10:56,840 --> 00:10:59,390 we just go back in time to time t, the query's x-coordinate, 204 00:10:59,390 --> 00:11:02,540 do the successor query, which takes log n time, 205 00:11:02,540 --> 00:11:04,680 and then we get the answer to this. 206 00:11:04,680 --> 00:11:13,220 So we can do this in log n time per query, static. 207 00:11:13,220 --> 00:11:15,881 This is all two-dimensional. 208 00:11:15,881 --> 00:11:17,840 I should probably say that. 209 00:11:17,840 --> 00:11:19,040 You can generalize. 210 00:11:22,560 --> 00:11:23,100 Questions? 211 00:11:23,100 --> 00:11:24,360 This is actually really easy. 212 00:11:24,360 --> 00:11:26,640 This is the stuff we get for free out of persistence 213 00:11:26,640 --> 00:11:28,920 and, in a moment, retroactivity.
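A sketch of static vertical ray shooting by "going back in time" might look like this, under the same assumptions as a plain sweep (integer coordinates, no vertical segments, no crossings). Instead of a partially persistent BST, this hypothetical version stores a full copy of the cross-section after every event. That costs O(n^2) space, but it answers the same queries. A query finds the latest version at or before time t = qx by binary search, then does a successor search by height at qx. The names `build` and `shoot_up` are made up for illustration.

```python
from bisect import bisect_right
from fractions import Fraction


def key_at(seg, x):
    """(height at x, slope) of a non-vertical segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg
    slope = Fraction(y2 - y1, x2 - x1)
    return (y1 + slope * (x - x1), slope)


def build(segments):
    """Run the sweep once and keep every version of the cross-section.
    Full copies (O(n^2) space) stand in for a partially persistent BST."""
    events = sorted([(p[0], 1, n) for n, (p, _) in segments.items()] +   # inserts
                    [(q[0], 0, n) for n, (_, q) in segments.items()])    # deletes sort first at equal x
    section, xs, versions = [], [], []
    for x, kind, name in events:
        if kind == 0:
            section.remove(name)
        else:
            k = key_at(segments[name], x)
            pos = sum(1 for s in section if key_at(segments[s], x) < k)  # linear scan
            section.insert(pos, name)
        xs.append(x)
        versions.append(list(section))       # the version "at time x"
    return xs, versions


def shoot_up(segments, xs, versions, qx, qy):
    """Name of the first segment strictly above (qx, qy), or None."""
    i = bisect_right(xs, qx) - 1             # go back in time: last version at or before qx
    if i < 0:
        return None
    section = versions[i]
    lo, hi = 0, len(section)                 # successor of qy, comparing heights at qx
    while lo < hi:
        mid = (lo + hi) // 2
        if key_at(segments[section[mid]], qx)[0] <= qy:
            lo = mid + 1
        else:
            hi = mid
    return section[lo] if lo < len(section) else None


segments = {"a": ((0, 0), (4, 0)), "b": ((2, 2), (8, 2)), "c": ((5, 5), (9, 6))}
xs, versions = build(segments)
print(shoot_up(segments, xs, versions, 3, 1))    # b
print(shoot_up(segments, xs, versions, 6, 3))    # c
print(shoot_up(segments, xs, versions, 6, 10))   # None
```

Queries whose x-coordinate coincides exactly with an event follow the deletion-before-insertion convention chosen here. A real implementation would need to settle those boundary cases for its own map.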
214 00:11:28,920 --> 00:11:31,230 I believe this is one of the reasons persistence 215 00:11:31,230 --> 00:11:32,729 was invented in the first place. 216 00:11:32,729 --> 00:11:35,020 There were a bunch of early persistent data structures. 217 00:11:35,020 --> 00:11:36,686 Then there was the general Driscoll et al. paper, 218 00:11:36,686 --> 00:11:38,286 which I talked about. 219 00:11:38,286 --> 00:11:42,300 But I think geometry was one of the main motivations, 220 00:11:42,300 --> 00:11:44,400 because it lets you add a dimension. 221 00:11:46,960 --> 00:11:50,800 As long as that dimension is time-like, then 222 00:11:50,800 --> 00:11:53,410 you get the dimension sort of for free. 223 00:11:53,410 --> 00:11:54,640 So that's nice. 224 00:11:54,640 --> 00:11:55,140 OK. 225 00:11:55,140 --> 00:11:56,540 What about retroactivity? 226 00:11:59,730 --> 00:12:02,790 Again, we're going to use partial retroactivity. 227 00:12:02,790 --> 00:12:05,230 And I can tell you with certainty, because I was there, 228 00:12:05,230 --> 00:12:06,960 this is why retroactivity was invented. 229 00:12:09,930 --> 00:12:13,260 So with retroactivity, that would mean 230 00:12:13,260 --> 00:12:17,250 that we get to dynamically add and delete 231 00:12:17,250 --> 00:12:19,560 insertions and deletions. 232 00:12:19,560 --> 00:12:22,350 So that's like adding and deleting segments 233 00:12:22,350 --> 00:12:23,346 from the structure. 234 00:12:23,346 --> 00:12:24,720 Again, we have a linear timeline. 235 00:12:24,720 --> 00:12:26,594 We always want to maintain a linear timeline, 236 00:12:26,594 --> 00:12:28,260 because that is reality. 237 00:12:28,260 --> 00:12:30,240 That corresponds to the x-coordinate. 238 00:12:30,240 --> 00:12:33,270 And now I want to be able to add a segment like this, which 239 00:12:33,270 --> 00:12:35,580 means there was an insertion at this time and a deletion 240 00:12:35,580 --> 00:12:37,380 at this time. 241 00:12:37,380 --> 00:12:38,680 Now, this doesn't quite work.
242 00:12:38,680 --> 00:12:41,350 Because this point in cross-section, 243 00:12:41,350 --> 00:12:43,300 it's actually moving over time. 244 00:12:43,300 --> 00:12:47,460 Binary search trees, that's OK, because things are simple. 245 00:12:47,460 --> 00:12:49,890 But at the moment, all we know how to do 246 00:12:49,890 --> 00:12:54,960 is actually horizontal segments, which are inserted and deleted 247 00:12:54,960 --> 00:12:58,420 at the same y-coordinate. 248 00:12:58,420 --> 00:13:05,316 So then we can do insert at time t1, 249 00:13:05,316 --> 00:13:10,650 an insertion of some y-coordinate, 250 00:13:10,650 --> 00:13:15,380 and then an insert at some later time, 251 00:13:15,380 --> 00:13:17,374 the deletion of that y-coordinate. 252 00:13:20,650 --> 00:13:25,320 So this is a partially retroactive successor problem. 253 00:13:25,320 --> 00:13:32,502 This is equal to dynamic vertical ray shooting. 254 00:13:36,930 --> 00:13:39,760 I guess this insertion corresponds 255 00:13:39,760 --> 00:13:41,040 to the insertion of a thing. 256 00:13:41,040 --> 00:13:44,220 If you instead do delete here, then you're 257 00:13:44,220 --> 00:13:47,040 deleting one of the segments. 258 00:13:47,040 --> 00:13:49,905 This is among horizontal segments. 259 00:13:55,720 --> 00:13:58,680 So if your map is made of horizontal and vertical 260 00:13:58,680 --> 00:13:59,890 segments-- 261 00:13:59,890 --> 00:14:01,590 so it's an orthogonal map-- 262 00:14:01,590 --> 00:14:04,410 then you can solve the dynamic problem using a partially 263 00:14:04,410 --> 00:14:06,120 retroactive successor. 264 00:14:06,120 --> 00:14:09,260 Again, we want to do successor just like before, 265 00:14:09,260 --> 00:14:10,920 querying the past. 266 00:14:10,920 --> 00:14:12,935 But now, our updates are different. 267 00:14:12,935 --> 00:14:14,310 Now, we have retroactive updates. 
268 00:14:14,310 --> 00:14:17,410 That lets us dynamically change the past, 269 00:14:17,410 --> 00:14:19,920 which is like inserting and deleting edges 270 00:14:19,920 --> 00:14:21,072 through that algorithm. 271 00:14:21,072 --> 00:14:22,530 But at the moment, we only know how 272 00:14:22,530 --> 00:14:24,520 to do this for horizontal segments. 273 00:14:24,520 --> 00:14:26,550 So this gives us, if you remember, 274 00:14:26,550 --> 00:14:30,260 the retroactive successor result. 275 00:14:30,260 --> 00:14:32,380 We haven't seen how that works. 276 00:14:32,380 --> 00:14:33,360 It's complicated. 277 00:14:33,360 --> 00:14:38,970 But it achieves log n retroactive insert, delete, and successor. 278 00:14:38,970 --> 00:14:41,670 And so we get a log n solution, which is optimal, 279 00:14:41,670 --> 00:14:43,800 for dynamic vertical ray shooting 280 00:14:43,800 --> 00:14:47,010 among horizontal segments. 281 00:14:47,010 --> 00:14:49,620 There are a bunch of open problems. 282 00:14:49,620 --> 00:14:51,960 What about general maps? 283 00:14:56,220 --> 00:15:10,210 So for dynamic vertical ray shooting in general maps, 284 00:15:10,210 --> 00:15:13,860 if you want log n query, the best results 285 00:15:13,860 --> 00:15:15,960 are log to the 1 plus epsilon insert and log 286 00:15:15,960 --> 00:15:17,890 to the 2 plus epsilon delete. 287 00:15:17,890 --> 00:15:19,140 There are some other trade-offs. 288 00:15:19,140 --> 00:15:22,000 You can get log n times log log n query and reduce. 289 00:15:22,000 --> 00:15:25,110 Still, we don't know how to delete faster 290 00:15:25,110 --> 00:15:28,260 than log squared in any of the general solutions. 291 00:15:28,260 --> 00:15:30,840 So you can do log squared for everything. 292 00:15:30,840 --> 00:15:34,187 But the hope would be that you could do log for everything even when 293 00:15:34,187 --> 00:15:35,520 the segments are not horizontal. 294 00:15:35,520 --> 00:15:41,980 But here, retroactivity doesn't seem to buy you anything.
295 00:15:41,980 --> 00:15:43,090 It'd be nice if you could. 296 00:15:43,090 --> 00:15:47,170 Another fun open problem is, what 297 00:15:47,170 --> 00:15:58,010 about non-vertical rays, general rays? 298 00:16:01,240 --> 00:16:05,150 So I give you a point and a vector, 299 00:16:05,150 --> 00:16:07,126 and I want to know what I hit in that direction. 300 00:16:07,126 --> 00:16:08,000 This is a lot harder. 301 00:16:08,000 --> 00:16:09,790 You can't use any of these tricks. 302 00:16:09,790 --> 00:16:13,120 And in fact, it's believed you cannot get polylog performance 303 00:16:13,120 --> 00:16:15,410 unless you have a ton of space. 304 00:16:15,410 --> 00:16:18,482 So the best known result is-- 305 00:16:18,482 --> 00:16:19,690 I'll just throw this up here. 306 00:16:19,690 --> 00:16:27,730 You can get n over square root of s, times polylog n, query time 307 00:16:27,730 --> 00:16:29,550 if you use, basically, s space. 308 00:16:36,320 --> 00:16:39,080 So you need quite a bit of space, 309 00:16:39,080 --> 00:16:42,260 because if you use n to the 1 plus epsilon space, 310 00:16:42,260 --> 00:16:45,090 you can get roughly root n query time. 311 00:16:45,090 --> 00:16:50,200 If you use n to 5 space, then you 312 00:16:50,200 --> 00:16:53,655 get somewhat better query time, but still not great. 313 00:16:53,655 --> 00:16:55,960 You can maybe get down to n to the epsilon 314 00:16:55,960 --> 00:17:00,520 if you have very large polynomial space. 315 00:17:00,520 --> 00:17:02,890 But this is conjectured to be roughly optimal, 316 00:17:02,890 --> 00:17:04,730 I assume, other than the polylog factors. 317 00:17:04,730 --> 00:17:06,887 The belief is you cannot beat this for general rays. 318 00:17:06,887 --> 00:17:08,470 This is kind of annoying, because this 319 00:17:08,470 --> 00:17:09,650 is a problem we care about. 320 00:17:09,650 --> 00:17:12,565 Especially in 3D, this is ray tracing.
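Plugging the space bounds mentioned above into the stated trade-off gives the following. This is pure arithmetic on that formula: it ignores polylog factors and whatever range of s the bound is actually valid for.

```latex
\begin{aligned}
Q(s) &\approx \frac{n}{\sqrt{s}} \cdot \operatorname{polylog} n \\
s = n^{1+\varepsilon} &\implies Q \approx n^{1-\frac{1+\varepsilon}{2}} = n^{\frac{1-\varepsilon}{2}} \approx \sqrt{n} \\
Q \approx n^{\delta} &\iff \sqrt{s} = n^{1-\delta} \iff s = n^{2-2\delta}
\end{aligned}
```

So by this formula alone, a query time of n to a small power delta needs space close to quadratic.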
321 00:17:12,565 --> 00:17:13,992 You shoot a ray; what does it hit? 322 00:17:13,992 --> 00:17:14,575 You bounce it. 323 00:17:14,575 --> 00:17:15,700 You shoot another ray. 324 00:17:15,700 --> 00:17:18,160 I always want to know what objects I am hitting. 325 00:17:18,160 --> 00:17:21,041 And for special cases, you can do better. 326 00:17:21,041 --> 00:17:22,540 But in general, it seems quite hard. 327 00:17:22,540 --> 00:17:24,748 And that's even in two dimensions. 328 00:17:24,748 --> 00:17:30,500 But there are a bunch of papers on 3D and so on. 329 00:17:30,500 --> 00:17:33,260 I just wanted to give you those connections to persistence 330 00:17:33,260 --> 00:17:34,640 and retroactivity. 331 00:17:34,640 --> 00:17:37,070 And that's point location. 332 00:17:37,070 --> 00:17:42,740 And now, I want to go on to orthogonal range searching. 333 00:17:42,740 --> 00:17:46,630 We can do some new data structures, new to us. 334 00:17:51,480 --> 00:17:53,490 So first, what is the problem? 335 00:18:10,930 --> 00:18:14,470 So it's sort of the reverse kind of problem here. 336 00:18:14,470 --> 00:18:21,710 You're given a bunch of points; before, the query was a point. 337 00:18:21,710 --> 00:18:25,060 And the query, in this case, is going 338 00:18:25,060 --> 00:18:29,560 to be, in two dimensions, a rectangle, a window 339 00:18:29,560 --> 00:18:30,290 if you will. 340 00:18:30,290 --> 00:18:35,320 And you want to know what points are in the rectangle. 341 00:18:35,320 --> 00:18:49,950 So given n points in d dimensions, the query in general 342 00:18:49,950 --> 00:18:52,410 is going to be a box. 343 00:18:52,410 --> 00:18:55,830 So in 2D, it's an interval cross an interval. 344 00:18:55,830 --> 00:18:59,090 In 3D, it's the cross product of three intervals. 345 00:19:01,801 --> 00:19:02,300 OK. 346 00:19:02,300 --> 00:19:05,714 So in the static version, you get to preprocess the points.
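As a reference point, here is the query itself written as a brute-force O(nd) scan in Python. The box is given as one closed interval per coordinate, so it is the cross product of intervals. The function names are illustrative, and the data structures that follow aim to beat this scan.

```python
def in_box(point, box):
    """box: one (lo, hi) closed interval per coordinate, i.e. their cross product."""
    return all(lo <= c <= hi for c, (lo, hi) in zip(point, box))


def box_query(points, box):
    """O(n d) brute force: every point that lies inside the box."""
    return [p for p in points if in_box(p, box)]


points = [(1, 2), (3, 4), (5, 1), (2, 7)]
print(box_query(points, [(0, 4), (1, 5)]))   # [(1, 2), (3, 4)]
```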
347 00:19:05,714 --> 00:19:07,130 In the dynamic version, the points 348 00:19:07,130 --> 00:19:09,380 are being added and deleted. 349 00:19:09,380 --> 00:19:11,600 And in all cases, we have dynamic queries, 350 00:19:11,600 --> 00:19:13,500 which ask what points are in the box. 351 00:19:13,500 --> 00:19:15,250 Now, there are different versions of this query. 352 00:19:15,250 --> 00:19:17,208 There is an existence query, which asks, are there 353 00:19:17,208 --> 00:19:18,600 any points in the box? 354 00:19:18,600 --> 00:19:19,970 That's sort of the easiest. 355 00:19:19,970 --> 00:19:22,800 Next level up is, how many points are in the box? 356 00:19:22,800 --> 00:19:24,770 You can use that to solve existence. 357 00:19:24,770 --> 00:19:27,740 Next level up is, give me all the points in the box, 358 00:19:27,740 --> 00:19:32,100 or give me 10 points in the box, or give me a point in the box. 359 00:19:32,100 --> 00:19:34,100 All of these problems are more or less the same. 360 00:19:34,100 --> 00:19:35,510 They do differ in some cases. 361 00:19:35,510 --> 00:19:36,968 But with the things we'll see today, you 362 00:19:36,968 --> 00:19:38,850 can solve them all about as efficiently. 363 00:19:38,850 --> 00:19:41,070 But, of course, if you want to list all the points in the box, 364 00:19:41,070 --> 00:19:42,028 it could be everything. 365 00:19:42,028 --> 00:19:44,190 And so that could take linear time. 366 00:19:44,190 --> 00:19:47,000 So in general, our goal is to get a running time of something 367 00:19:47,000 --> 00:19:51,740 like log n plus k, where k is the size of the output. 368 00:19:54,630 --> 00:19:56,835 So if you're asking how many points are in there, 369 00:19:56,835 --> 00:19:58,640 the size of the output is a single number. 370 00:19:58,640 --> 00:20:01,280 So k is 1, and you should get log n time. 371 00:20:01,280 --> 00:20:04,520 If you want to list 100 points in there, k is 100. 372 00:20:04,520 --> 00:20:07,280 And so you have to pay that to list them.
373 00:20:07,280 --> 00:20:09,155 If you want to know all of them, well, then k 374 00:20:09,155 --> 00:20:11,270 is the number of points that are in there. 375 00:20:11,270 --> 00:20:17,990 And we'll be able to achieve these kinds of bounds 376 00:20:17,990 --> 00:20:20,750 pretty much all the time, definitely in two dimensions. 377 00:20:20,750 --> 00:20:25,246 In d dimensions, it's going to get harder. 378 00:20:25,246 --> 00:20:25,745 OK. 379 00:20:28,760 --> 00:20:31,670 So I want to start out with one dimension just 380 00:20:31,670 --> 00:20:34,700 to make sure we're on the same page. 381 00:20:34,700 --> 00:20:37,280 And in general, we're going to start with a solution called 382 00:20:37,280 --> 00:20:42,230 range trees, which were simultaneously invented 383 00:20:42,230 --> 00:20:45,800 by a lot of people in the late '70s, Bentley being one 384 00:20:45,800 --> 00:20:47,720 of the main guys. 385 00:20:47,720 --> 00:20:52,220 And in general, we're going to aim here for a log to the d of n 386 00:20:52,220 --> 00:20:53,630 plus k query time. 387 00:20:57,370 --> 00:20:59,950 So I like this, but now we have a dependence on dimension. 388 00:20:59,950 --> 00:21:01,330 And for 2D, this is not great. 389 00:21:01,330 --> 00:21:03,340 It's log squared. 390 00:21:03,340 --> 00:21:04,820 And we're going to do better. 391 00:21:04,820 --> 00:21:05,320 OK. 392 00:21:05,320 --> 00:21:08,200 But let's start with d equals 1. 393 00:21:08,200 --> 00:21:10,260 How do you do this? 394 00:21:10,260 --> 00:21:13,090 How do I achieve log n plus k query? 395 00:21:17,360 --> 00:21:18,230 Sort the points. 396 00:21:18,230 --> 00:21:18,730 Yeah. 397 00:21:18,730 --> 00:21:21,630 I could sort the points, then do binary search. 398 00:21:21,630 --> 00:21:26,270 So the query now is just an interval. 399 00:21:26,270 --> 00:21:28,790 That's the one-dimensional version of a box.
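The sorted-array approach for d = 1 might look like this in Python, using the standard `bisect` module. Two binary searches locate the ends of [a, b], and the index difference gives the count. Existence is simply a count greater than 0, and slicing reports up to k points in O(log n + k) time. The class name is a hypothetical choice for illustration.

```python
from bisect import bisect_left, bisect_right


class SortedArrayRange1D:
    """Static 1D range searching over a sorted array (illustrative name)."""

    def __init__(self, xs):
        self.xs = sorted(xs)

    def _bounds(self, a, b):
        # First index with value >= a, and first index with value > b.
        return bisect_left(self.xs, a), bisect_right(self.xs, b)

    def count(self, a, b):
        lo, hi = self._bounds(a, b)
        return max(0, hi - lo)

    def exists(self, a, b):
        return self.count(a, b) > 0

    def report(self, a, b, k=None):
        """Points in [a, b] in sorted order, at most k of them if k is given."""
        lo, hi = self._bounds(a, b)
        if k is not None:
            hi = min(hi, lo + k)
        return self.xs[lo:hi]


s = SortedArrayRange1D([7, 1, 5, 3, 9])
print(s.count(2, 7), s.report(2, 7), s.report(2, 7, k=2), s.exists(10, 20))
# 3 [3, 5, 7] [3, 5] False
```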
400 00:21:28,790 --> 00:21:32,300 So if I search for a, search for b in a sorted list, 401 00:21:32,300 --> 00:21:34,040 then all the points in between I can 402 00:21:34,040 --> 00:21:35,729 count the different indices-- 403 00:21:35,729 --> 00:21:37,520 or subtract the two indices into the array. 404 00:21:37,520 --> 00:21:39,320 That will give me how many points there are 405 00:21:39,320 --> 00:21:42,920 in the box, all these things. 406 00:21:42,920 --> 00:21:45,270 Arrays aren't going to generalize super nicely. 407 00:21:45,270 --> 00:21:48,660 Although, we'll come back to arrays later. 408 00:21:48,660 --> 00:21:52,390 For now, I'd like to think of a binary search tree, 409 00:21:52,390 --> 00:21:54,359 balanced binary search tree. 410 00:21:54,359 --> 00:21:56,150 And I'm going to make it a little different 411 00:21:56,150 --> 00:21:58,820 from the usual kind of binary search tree. 412 00:21:58,820 --> 00:22:00,875 I want the data to be in the leaves. 413 00:22:04,190 --> 00:22:07,430 So I want the leaves to be my points. 414 00:22:07,430 --> 00:22:11,150 And this will be convenient for higher dimensions. 415 00:22:11,150 --> 00:22:12,950 It doesn't really matter for one dimension, 416 00:22:12,950 --> 00:22:15,230 but it's kind of nice to think about. 417 00:22:15,230 --> 00:22:17,480 So you've got a binary search tree. 418 00:22:17,480 --> 00:22:21,320 And then here is the data sorted by the only coordinate that 419 00:22:21,320 --> 00:22:24,000 exists, the x-coordinate. 420 00:22:24,000 --> 00:22:26,240 And so, of course, I can search for a, 421 00:22:26,240 --> 00:22:29,870 here's a maybe, search for b. 422 00:22:29,870 --> 00:22:35,446 And the stuff in between here, that is my result. 423 00:22:35,446 --> 00:22:40,540 And in a little more detail, as you search for a and b, 424 00:22:40,540 --> 00:22:43,425 at some point, they will diverge. 425 00:22:43,425 --> 00:22:44,925 One will go left, one will go right. 
426 00:22:50,730 --> 00:22:52,680 At some point, you reach a. 427 00:22:52,680 --> 00:22:54,704 Maybe a isn't actually in the structure. 428 00:22:54,704 --> 00:22:57,120 You're searching for everything between a and b inclusive, 429 00:22:57,120 --> 00:22:58,810 but a may not be there. 430 00:22:58,810 --> 00:23:00,990 So in general, we're going to find the predecessor 431 00:23:00,990 --> 00:23:02,325 and successor of a. 432 00:23:02,325 --> 00:23:05,280 In this case, I'm interested in the predecessor. 433 00:23:05,280 --> 00:23:11,010 And similarly over here, eventually-- 434 00:23:11,010 --> 00:23:13,080 this is all, of course, logarithmic time-- 435 00:23:13,080 --> 00:23:16,719 I find the successor of b. 436 00:23:16,719 --> 00:23:18,510 Those are the two things I'm interested in. 437 00:23:18,510 --> 00:23:20,760 And now, all the leaves in between here, 438 00:23:20,760 --> 00:23:22,007 that's the result. Question? 439 00:23:22,007 --> 00:23:24,215 AUDIENCE: So if you have the data just on the leaves, 440 00:23:24,215 --> 00:23:26,270 what do you have at the intermediate nodes? 441 00:23:26,270 --> 00:23:27,270 ERIK DEMAINE: Ah, right. 442 00:23:27,270 --> 00:23:29,144 So in the intermediate nodes, I need to know, 443 00:23:29,144 --> 00:23:32,520 let's say, if every subtree knows the min and max, then 444 00:23:32,520 --> 00:23:36,547 at a node, I can decide: should I go left, or should I go right? 445 00:23:36,547 --> 00:23:38,880 I think every node can store the max of the left subtree 446 00:23:38,880 --> 00:23:40,569 if you just want one key per node. 447 00:23:40,569 --> 00:23:41,610 But, yeah, good question. 448 00:23:41,610 --> 00:23:43,330 Sorry, I forgot to mention that. 449 00:23:43,330 --> 00:23:47,270 You store a representative sort of in the middle 450 00:23:47,270 --> 00:23:50,460 that lets you decide whether to go left or right. 451 00:23:50,460 --> 00:23:51,730 So you can still do searches. 452 00:23:51,730 --> 00:23:53,070 We can find these two nodes.
453 00:23:53,070 --> 00:23:57,105 And now, the answer is basically all of this stuff. 454 00:24:01,660 --> 00:24:05,860 I did not leave myself enough space. 455 00:24:05,860 --> 00:24:07,210 That's the left child. 456 00:24:12,840 --> 00:24:13,570 OK. 457 00:24:13,570 --> 00:24:16,900 So wherever this left branch went 458 00:24:16,900 --> 00:24:19,970 left, the right branch is in the answer. 459 00:24:19,970 --> 00:24:21,970 Whenever this right branch went right, 460 00:24:21,970 --> 00:24:23,760 the left branch is in the answer. 461 00:24:23,760 --> 00:24:26,290 But from here, there's no subtree that we care about. 462 00:24:26,290 --> 00:24:29,080 Because this is all greater than what we care about. 463 00:24:29,080 --> 00:24:29,580 OK. 464 00:24:29,580 --> 00:24:30,955 But the good news is there's only 465 00:24:30,955 --> 00:24:33,100 log n of these subtrees, maybe two log n. 466 00:24:33,100 --> 00:24:35,330 Because there's the left side, the right side. 467 00:24:35,330 --> 00:24:35,830 OK. 468 00:24:35,830 --> 00:24:39,790 So the answer is implicitly represented. 469 00:24:39,790 --> 00:24:42,010 We don't have to explicitly touch all these items. 470 00:24:42,010 --> 00:24:44,720 We just know that they live in the subtrees, 471 00:24:44,720 --> 00:24:49,370 in those order log n subtrees. 472 00:24:49,370 --> 00:24:52,780 So in particular, if every node stores the size of its subtree, 473 00:24:52,780 --> 00:24:54,430 then we can add up these log n numbers. 474 00:24:54,430 --> 00:24:56,130 And we get the size of the answer. 475 00:24:56,130 --> 00:24:59,380 If we want the first k items, we can visit the first k items 476 00:24:59,380 --> 00:25:03,490 here in order k time. 477 00:25:03,490 --> 00:25:07,150 So in log n time, we get a nice representation of the answers, 478 00:25:07,150 --> 00:25:08,650 log n subtrees. 479 00:25:08,650 --> 00:25:11,860 Of course, we also had a nice answer when we had an array.
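Here is a minimal Python sketch of the static 1D range tree just described, assuming every subtree stores its min and max (the first option in the answer to the audience question) rather than a single routing key. Instead of walking the two search paths explicitly, it recurses and stops at any subtree that lies entirely inside or entirely outside [a, b]; this touches the same O(log n) nodes and returns the same canonical subtrees. The names `RNode`, `build`, `canonical`, and `leaves` are hypothetical.

```python
class RNode:
    """Node of a static 1D range tree: data lives only at the leaves."""

    def __init__(self, left=None, right=None, key=None):
        self.left, self.right = left, right
        if left is None:                 # leaf: one data key
            self.min = self.max = key
            self.size = 1
        else:                            # internal: summarize both children
            self.min, self.max = left.min, right.max
            self.size = left.size + right.size


def build(keys):
    """Balanced tree over a non-empty sorted list (slicing keeps it short, not linear)."""
    if len(keys) == 1:
        return RNode(key=keys[0])
    mid = len(keys) // 2
    return RNode(build(keys[:mid]), build(keys[mid:]))


def canonical(v, a, b, out):
    """Append to `out` the O(log n) maximal subtrees whose leaves all lie in [a, b]."""
    if v.max < a or v.min > b:           # subtree entirely outside the range
        return
    if a <= v.min and v.max <= b:        # subtree entirely inside: take it whole
        out.append(v)
        return
    canonical(v.left, a, b, out)         # partial overlap (only possible at internal nodes)
    canonical(v.right, a, b, out)


def leaves(v):
    """In-order leaf keys of a subtree, in O(size) time."""
    if v.left is None:
        yield v.min
    else:
        yield from leaves(v.left)
        yield from leaves(v.right)


root = build([1, 3, 5, 7, 9, 11, 13, 15])
parts = []
canonical(root, 4, 12, parts)            # a = 4 and b = 12 are not stored keys
print(sum(v.size for v in parts))        # count from subtree sizes alone
print([k for v in parts for k in leaves(v)])
```

The count uses only stored sizes, so it costs O(log n); listing k answers costs O(k) on top of that.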
480 00:25:11,860 --> 00:25:14,930 But this one will be easier to generalize. 481 00:25:14,930 --> 00:25:16,690 And that's range trees. 482 00:25:21,540 --> 00:25:23,429 So that was a 1D range tree. 483 00:25:23,429 --> 00:25:25,470 The only difference is we put data at the leaves. 484 00:25:28,796 --> 00:25:34,980 2D range tree has a simple idea. 485 00:25:34,980 --> 00:25:37,800 We have the data in these subtrees. 486 00:25:37,800 --> 00:25:39,172 These are the matches. 487 00:25:39,172 --> 00:25:40,630 Let's say we have an x-coordinate 488 00:25:40,630 --> 00:25:41,421 and a y-coordinate. 489 00:25:41,421 --> 00:25:43,500 We have an x range and a y range. 490 00:25:43,500 --> 00:25:44,940 Let's do this for x. 491 00:25:44,940 --> 00:25:48,340 Now, we have a representation of all the matches in x. 492 00:25:48,340 --> 00:25:50,490 So we want this rectangle. 493 00:25:50,490 --> 00:25:54,360 But we can get this entire slab in log n time, 494 00:25:54,360 --> 00:25:56,760 and we have log n subtrees that we now 495 00:25:56,760 --> 00:25:58,080 have to filter in terms of y. 496 00:25:58,080 --> 00:26:00,610 There's all these points out here that we don't care about. 497 00:26:00,610 --> 00:26:05,920 We want to get rid of those and just focus in on these points. 498 00:26:05,920 --> 00:26:07,980 So we're going to do the same thing on y, 499 00:26:07,980 --> 00:26:10,530 but we want to do that for this subtree. 500 00:26:10,530 --> 00:26:12,270 And we want to do it for this subtree, 501 00:26:12,270 --> 00:26:15,330 and for this subtree, so simple idea. 502 00:26:15,330 --> 00:26:20,470 For each subtree, let's call it an x subtree. 503 00:26:20,470 --> 00:26:22,810 So we have one tree which represents all the x data. 504 00:26:22,810 --> 00:26:24,690 It looks just like this.
505 00:26:24,690 --> 00:26:27,930 And then for each subtree of that x tree, 506 00:26:27,930 --> 00:26:39,750 we store, let's say, a pointer to a y tree, 507 00:26:39,750 --> 00:26:43,230 which is also a 1D range tree. 508 00:26:43,230 --> 00:26:52,720 So this guy has a pointer to a similarly sized triangle. 509 00:26:52,720 --> 00:26:54,370 Except, this one is on y. 510 00:26:54,370 --> 00:26:55,600 This one's sorted by x. 511 00:26:55,600 --> 00:26:58,620 This one's sorted by y, same points. 512 00:26:58,620 --> 00:27:04,210 This subtree also has one, same data as over here, 513 00:27:04,210 --> 00:27:07,000 but now sorted on y instead of x. 514 00:27:07,000 --> 00:27:12,220 For example, there is a smaller tree inside this one. 515 00:27:12,220 --> 00:27:15,885 That one also has a pointer to a smaller y tree. 516 00:27:15,885 --> 00:27:17,260 Except, now, these are disjoint, 517 00:27:17,260 --> 00:27:19,180 because these are completely-- 518 00:27:19,180 --> 00:27:19,720 yeah. 519 00:27:19,720 --> 00:27:21,670 This is a subset of that one. 520 00:27:21,670 --> 00:27:24,400 But we're going to store a y tree for this one and a y tree 521 00:27:24,400 --> 00:27:24,980 for this one. 522 00:27:24,980 --> 00:27:26,063 So we're blowing up space. 523 00:27:29,320 --> 00:27:40,690 Every element, every point lives in log n y trees. 524 00:27:43,186 --> 00:27:44,810 Because if you look at a point, there's 525 00:27:44,810 --> 00:27:47,090 the tiny y tree that contains it, then bigger, bigger, bigger, 526 00:27:47,090 --> 00:27:49,050 bigger, until the entire tree also contains it. 527 00:27:49,050 --> 00:27:51,730 Each of those has a corresponding y tree. 528 00:27:51,730 --> 00:27:55,360 So the overall space will be n log n. 529 00:27:58,550 --> 00:28:00,810 We're repeating points here. 530 00:28:00,810 --> 00:28:03,350 But the good news is now I can do search really efficiently, 531 00:28:03,350 --> 00:28:05,330 well, log squared efficiently.
532 00:28:05,330 --> 00:28:08,360 I spend log time to find these x trees that represent 533 00:28:08,360 --> 00:28:10,670 the slabs that I care about. 534 00:28:10,670 --> 00:28:13,970 So it's more like this picture. 535 00:28:13,970 --> 00:28:16,460 So there's a bunch of disjoint slabs, which together 536 00:28:16,460 --> 00:28:19,040 contain my points in x. 537 00:28:19,040 --> 00:28:21,200 And now I want to filter each of them by y. 538 00:28:21,200 --> 00:28:23,720 So for each of them, I jump over to y space and do 539 00:28:23,720 --> 00:28:29,490 a range query in y space just like what we were doing here. 540 00:28:29,490 --> 00:28:33,470 So search for a, search for b, but in y-coordinate. 541 00:28:33,470 --> 00:28:36,860 And then I get log n subtrees in here, log n subtrees in here, 542 00:28:36,860 --> 00:28:38,360 log n subtrees in here. 543 00:28:38,360 --> 00:28:49,066 So the query gives me log squared y subtrees. 544 00:28:49,066 --> 00:28:51,690 It takes me log squared n time to find them. 545 00:28:51,690 --> 00:28:54,050 If I have subtree sizes, I can compute the number of matches 546 00:28:54,050 --> 00:28:55,740 in log squared n time. 547 00:28:55,740 --> 00:28:58,460 If I want k items, I can grab k items out 548 00:28:58,460 --> 00:29:00,730 of them in order k time. 549 00:29:00,730 --> 00:29:01,980 OK. 550 00:29:01,980 --> 00:29:03,000 Pretty easy. 551 00:29:03,000 --> 00:29:06,300 Of course, D dimensions is just the same trick. 552 00:29:06,300 --> 00:29:07,770 You have an x tree. 553 00:29:07,770 --> 00:29:09,390 Every subtree links to a y tree. 554 00:29:09,390 --> 00:29:11,320 Every y subtree links to a z tree. 555 00:29:11,320 --> 00:29:15,030 Every z subtree links to a w tree, and so on. 556 00:29:15,030 --> 00:29:21,360 For D dimensions, you're going to get log to the D query 557 00:29:21,360 --> 00:29:24,540 as I claimed before. 558 00:29:24,540 --> 00:29:25,680 How much space?
559 00:29:25,680 --> 00:29:28,050 Well, every dimension you add adds another log factor 560 00:29:28,050 --> 00:29:29,220 of space. 561 00:29:29,220 --> 00:29:35,520 So it's going to be n log to the d minus 1 n space. 562 00:29:35,520 --> 00:29:38,340 And if you want to do this statically, 563 00:29:38,340 --> 00:29:45,540 you can also build the data structure in n log 564 00:29:45,540 --> 00:29:49,620 to the d minus 1 n time, except for d equals 1 where 565 00:29:49,620 --> 00:29:52,020 you need n log n time to sort. 566 00:29:52,020 --> 00:29:53,940 But as long as d is bigger than 1, 567 00:29:53,940 --> 00:29:56,900 this is the right bound for higher dimensions. 568 00:29:56,900 --> 00:29:59,970 It takes a little bit of effort to actually build 569 00:29:59,970 --> 00:30:02,650 the structure in that much time, but it can be done. 570 00:30:06,010 --> 00:30:06,510 OK. 571 00:30:06,510 --> 00:30:08,093 That's the very simple data structure. 572 00:30:08,093 --> 00:30:11,325 Any questions about that before we make it cooler? 573 00:30:11,325 --> 00:30:13,200 You may have seen this data structure before. 574 00:30:13,200 --> 00:30:14,700 It's kind of classic. 575 00:30:14,700 --> 00:30:16,890 But you can do much better, well, 576 00:30:16,890 --> 00:30:18,506 at least a log factor better. 577 00:30:18,506 --> 00:30:19,340 AUDIENCE: Question. 578 00:30:19,340 --> 00:30:20,131 ERIK DEMAINE: Yeah. 579 00:30:20,131 --> 00:30:23,586 AUDIENCE: So when you're storing one pointer for each subtree, 580 00:30:23,586 --> 00:30:26,960 you essentially have a pointer for each root, 581 00:30:26,960 --> 00:30:27,930 like for each node? 582 00:30:27,930 --> 00:30:28,290 ERIK DEMAINE: Yeah. 583 00:30:28,290 --> 00:30:28,789 Right. 584 00:30:28,789 --> 00:30:29,520 Every node. 585 00:30:29,520 --> 00:30:32,790 So I know these are the nodes where the stuff below them 586 00:30:32,790 --> 00:30:34,410 represents my answer in x.
587 00:30:34,410 --> 00:30:38,281 And so I teleport over to the y universe from the x universe. 588 00:30:38,281 --> 00:30:40,364 AUDIENCE: So, basically, it has all the same nodes 589 00:30:40,364 --> 00:30:41,749 of that subtree, but [INAUDIBLE] 590 00:30:41,749 --> 00:30:42,540 ERIK DEMAINE: Yeah. 591 00:30:42,540 --> 00:30:44,340 All the points that are in here also 592 00:30:44,340 --> 00:30:46,460 live in here, except these ones are sorted by x. 593 00:30:46,460 --> 00:30:48,090 These ones are sorted by y. 594 00:30:48,090 --> 00:30:53,210 If I kept following pointers, I get to z and w and so on. 595 00:30:53,210 --> 00:30:53,960 Other questions? 596 00:30:53,960 --> 00:30:54,762 Yeah. 597 00:30:54,762 --> 00:30:56,720 AUDIENCE: So if we were doing the dynamic case, 598 00:30:56,720 --> 00:30:59,020 how would we implement rotations in the [INAUDIBLE]? 599 00:30:59,020 --> 00:30:59,728 ERIK DEMAINE: OK. 600 00:31:02,850 --> 00:31:05,171 Dynamic is annoying. 601 00:31:05,171 --> 00:31:05,670 Yeah. 602 00:31:05,670 --> 00:31:08,100 Rotations are annoying. 603 00:31:08,100 --> 00:31:10,710 I think we'll come back to that. 604 00:31:10,710 --> 00:31:11,804 We can solve that. 605 00:31:11,804 --> 00:31:13,470 I thought it was easy, but you're right. 606 00:31:13,470 --> 00:31:15,630 Rotations are kind of annoying. 607 00:31:15,630 --> 00:31:18,840 And we can solve that using this dynamization trick. 608 00:31:18,840 --> 00:31:23,802 So we don't have to worry about it till we get there. 609 00:31:23,802 --> 00:31:26,010 It's going to get even harder to make things dynamic. 610 00:31:26,010 --> 00:31:30,880 And so then we really need to pull out the black box. 611 00:31:30,880 --> 00:31:32,120 Well, it's not a black box. 612 00:31:32,120 --> 00:31:33,540 We're going to see how it works. 613 00:31:33,540 --> 00:31:34,790 But it's a general transformation 614 00:31:34,790 --> 00:31:35,873 that makes things dynamic. 615 00:31:40,465 --> 00:31:40,965 OK. 
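Extending that idea to a static 2D range tree: every node of the x tree also keeps its subtree's points sorted by y. As a simplification (an assumption, not what the lecture literally draws), the per-node y structure below is a plain sorted list searched with `bisect` rather than a second range tree. For counting, this gives the same O(log n) cost per canonical x subtree, so a query still takes O(log^2 n), and each point is stored in O(log n) lists, for O(n log n) space. The names `XNode` and `count_2d` are hypothetical.

```python
from bisect import bisect_left, bisect_right


class XNode:
    """x tree node; `ys` is the y-sorted copy of this subtree's points."""

    def __init__(self, pts):             # pts: non-empty list of (x, y), sorted by x
        self.xmin, self.xmax = pts[0][0], pts[-1][0]
        self.ys = sorted(y for _, y in pts)
        if len(pts) == 1:
            self.left = self.right = None
        else:
            mid = len(pts) // 2
            self.left, self.right = XNode(pts[:mid]), XNode(pts[mid:])


def count_2d(v, x1, x2, y1, y2):
    """Number of points with x1 <= x <= x2 and y1 <= y <= y2 (assumes x1 <= x2, y1 <= y2)."""
    if v.xmax < x1 or v.xmin > x2:       # the x slab misses this subtree
        return 0
    if x1 <= v.xmin and v.xmax <= x2:    # canonical x subtree: now filter by y
        return bisect_right(v.ys, y2) - bisect_left(v.ys, y1)
    return count_2d(v.left, x1, x2, y1, y2) + count_2d(v.right, x1, x2, y1, y2)


points = [(1, 5), (2, 1), (3, 7), (4, 3), (5, 9), (6, 2), (7, 8), (8, 4)]
root = XNode(sorted(points))
print(count_2d(root, 2, 6, 2, 8))        # matches: (3, 7), (4, 3), (6, 2)
```

Building each `ys` with `sorted` makes construction O(n log^2 n); the n log n build mentioned above needs a more careful merge.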
616 00:31:43,830 --> 00:31:46,980 Before we get to dynamic, let's stick with static. 617 00:31:46,980 --> 00:31:49,095 And let's improve things by a log factor. 618 00:31:54,530 --> 00:31:57,070 This is an idea called layered range trees. 619 00:32:02,240 --> 00:32:07,520 It's also sometimes called fractional cascading, 620 00:32:07,520 --> 00:32:10,190 which is the technique we're going to get to later. 621 00:32:10,190 --> 00:32:12,740 I would say it involves one half of fractional cascading. 622 00:32:12,740 --> 00:32:14,960 Fractional cascading has two ideas. 623 00:32:14,960 --> 00:32:17,800 And the one that it's named after is not this idea. 624 00:32:17,800 --> 00:32:23,420 So idea one is basically to reuse searches. 625 00:32:23,420 --> 00:32:25,490 The idea is we're searching in this subtree 626 00:32:25,490 --> 00:32:27,920 or, I guess, this subtree with respect to y. 627 00:32:27,920 --> 00:32:31,430 We're also searching for the same interval of y 628 00:32:31,430 --> 00:32:32,150 in this subtree. 629 00:32:32,150 --> 00:32:33,830 Completely different elements, but 630 00:32:33,830 --> 00:32:36,660 if there was some way we could reuse the searches for y 631 00:32:36,660 --> 00:32:40,160 in all of these log n subtrees, we could save a log factor. 632 00:32:40,160 --> 00:32:41,550 And it turns out we can. 633 00:32:41,550 --> 00:32:43,520 And this is one idea in fractional cascading, 634 00:32:43,520 --> 00:32:45,200 but there will be another one later. 635 00:32:48,110 --> 00:32:50,470 OK. 636 00:32:50,470 --> 00:32:53,570 So, fun stuff. 637 00:32:53,570 --> 00:32:56,170 This is where I want to change my notes. 638 00:32:56,170 --> 00:33:03,190 So we're searching in x with a regular 1D range tree. 639 00:33:03,190 --> 00:33:07,490 I also want to have a regular 1D range tree-- 640 00:33:07,490 --> 00:33:08,650 range tree? 641 00:33:08,650 --> 00:33:09,189 Sure. 642 00:33:09,189 --> 00:33:10,730 Actually, it doesn't matter too much.
643 00:33:10,730 --> 00:33:13,570 I want to have an array of all the items sorted 644 00:33:13,570 --> 00:33:16,610 by y-coordinate. 645 00:33:16,610 --> 00:33:18,740 And we're going to simplify things here. 646 00:33:18,740 --> 00:33:20,420 Instead of pointing to a tree, I'm 647 00:33:20,420 --> 00:33:22,700 going to point to an array sorted by y. 648 00:33:22,700 --> 00:33:24,700 This is totally static. 649 00:33:24,700 --> 00:33:28,640 And this is where dynamic gets harder, 650 00:33:28,640 --> 00:33:32,000 not that we know how to do it over there yet. 651 00:33:32,000 --> 00:33:36,500 So for each x subtree, we're going 652 00:33:36,500 --> 00:33:39,740 to have a pointer to the same elements sorted by y. 653 00:33:45,360 --> 00:33:47,430 So all the leaves that are down here 654 00:33:47,430 --> 00:33:53,352 are, basically, also there, but by y coordinate instead of x. 655 00:33:53,352 --> 00:33:55,060 Obviously, we can still do the same thing 656 00:33:55,060 --> 00:33:57,510 we could do before, spend log n time to search 657 00:33:57,510 --> 00:34:01,710 in each of these log n arrays corresponding to these log n 658 00:34:01,710 --> 00:34:02,920 subtrees. 659 00:34:02,920 --> 00:34:05,710 And in log squared n, we'll have our answers. 660 00:34:05,710 --> 00:34:07,950 But we can do better now. 661 00:34:07,950 --> 00:34:11,550 I only want to do one binary search in y. 662 00:34:11,550 --> 00:34:13,460 And that will be at the root. 663 00:34:13,460 --> 00:34:15,210 So the root, there's an array representing 664 00:34:15,210 --> 00:34:17,159 everything sorted by y. 665 00:34:17,159 --> 00:34:19,724 I search for the lower y-coordinate. 666 00:34:19,724 --> 00:34:23,671 I search for the upper y-coordinate, some things. 667 00:34:23,671 --> 00:34:25,170 It's hard to draw this, because it's 668 00:34:25,170 --> 00:34:26,919 in the dimension orthogonal to this one. 669 00:34:26,919 --> 00:34:31,239 I guess I should really draw the arrays like this.
670 00:34:31,239 --> 00:34:32,760 So this guy has an array. 671 00:34:32,760 --> 00:34:34,770 We find the upper and lower bounds 672 00:34:34,770 --> 00:34:37,500 for the y-coordinate in the global space. 673 00:34:37,500 --> 00:34:39,791 This takes log n time to do two searches. 674 00:34:39,791 --> 00:34:40,290 Question. 675 00:34:40,290 --> 00:34:41,757 AUDIENCE: Those are the upper and lower bounds 676 00:34:41,757 --> 00:34:44,049 from the predecessor [INAUDIBLE] successor [INAUDIBLE]? 677 00:34:44,049 --> 00:34:45,131 ERIK DEMAINE: Yeah, right. 678 00:34:45,131 --> 00:34:47,909 So we're doing a predecessor and successor search, let's say, 679 00:34:47,909 --> 00:34:48,870 in this array. 680 00:34:48,870 --> 00:34:52,170 Binary search we find-- 681 00:34:52,170 --> 00:34:54,960 I didn't give them names, but in the notes 682 00:34:54,960 --> 00:35:00,540 they're a1 through b1 in x. 683 00:35:00,540 --> 00:35:04,560 And they're a2 through b2 in y. 684 00:35:04,560 --> 00:35:06,740 So that's my query, this rectangle. 685 00:35:06,740 --> 00:35:11,380 I'm doing the search for a2 and for b2 in the top array. 686 00:35:11,380 --> 00:35:14,460 Now, what I'd like to do is keep that information 687 00:35:14,460 --> 00:35:15,750 as I walk down the tree. 688 00:35:18,760 --> 00:35:21,390 So that in the end, when I get to these nodes, 689 00:35:21,390 --> 00:35:26,520 I know where I am in those arrays in y. 690 00:35:26,520 --> 00:35:28,440 So let's think of that just step by step. 691 00:35:47,460 --> 00:35:53,900 So imagine in the x tree, I'm at some node. 692 00:35:53,900 --> 00:35:57,080 And then I follow, let's say, a right pointer 693 00:35:57,080 --> 00:35:59,580 to the right child. 694 00:35:59,580 --> 00:36:00,080 OK. 695 00:36:00,080 --> 00:36:02,240 Now, in y space-- 696 00:36:02,240 --> 00:36:06,500 maybe I should switch to red for y space.
697 00:36:06,500 --> 00:36:13,010 This guy has a really big array representing all of the nodes 698 00:36:13,010 --> 00:36:16,000 down here, but sorted by y-coordinate. 699 00:36:16,000 --> 00:36:18,410 This guy has a corresponding array 700 00:36:18,410 --> 00:36:20,780 with some subset of the nodes. 701 00:36:20,780 --> 00:36:22,160 Which subset? 702 00:36:22,160 --> 00:36:24,660 The ones that are to the right of this x-coordinate. 703 00:36:24,660 --> 00:36:25,790 So there's no relation. 704 00:36:25,790 --> 00:36:28,587 I mean, some of the guys that are here-- 705 00:36:28,587 --> 00:36:29,420 let me circle them-- 706 00:36:32,180 --> 00:36:34,400 some of these guys exist over here. 707 00:36:34,400 --> 00:36:37,740 They'll be in the same relative order. 708 00:36:37,740 --> 00:36:41,220 So here's those four guys, then one, and two. 709 00:36:41,220 --> 00:36:43,899 So some of these guys will be preserved over here. 710 00:36:43,899 --> 00:36:46,190 Some of them won't, because their x-coordinate is smaller. 711 00:36:46,190 --> 00:36:47,780 It's an arbitrary subset. 712 00:36:47,780 --> 00:36:50,660 These guys will also live here. 713 00:36:50,660 --> 00:36:51,410 OK. 714 00:36:51,410 --> 00:36:56,180 The idea is store pointers from every element over here to, 715 00:36:56,180 --> 00:36:59,090 let's say, the successor over here. 716 00:36:59,090 --> 00:37:02,680 So store these red arrows. 717 00:37:02,680 --> 00:37:06,800 Let's say, these guys all point to this node. 718 00:37:06,800 --> 00:37:09,260 These guys point to that node. 719 00:37:09,260 --> 00:37:12,320 I guess these guys just point to some adjacent node, 720 00:37:12,320 --> 00:37:15,550 either the predecessor or the successor. 721 00:37:15,550 --> 00:37:20,292 So the result is if I know where a2 and b2 live in this array, 722 00:37:20,292 --> 00:37:22,250 I can figure out where they live in this array. 723 00:37:22,250 --> 00:37:23,920 I just follow the pointer. 724 00:37:23,920 --> 00:37:25,776 Easy.
725 00:37:25,776 --> 00:37:28,240 Done. 726 00:37:28,240 --> 00:37:28,750 OK. 727 00:37:28,750 --> 00:37:30,350 Let's think about what this means. 728 00:37:30,350 --> 00:37:39,580 So I'm going to store pointers from the y 729 00:37:39,580 --> 00:37:45,110 array of some x node. 730 00:37:49,560 --> 00:37:56,340 Let's call that node v in the x tree 731 00:37:56,340 --> 00:38:04,940 to the corresponding places, corresponding points, 732 00:38:04,940 --> 00:38:15,410 let's say, in the y arrays of left child of v 733 00:38:15,410 --> 00:38:16,760 and the right child of v. 734 00:38:16,760 --> 00:38:19,822 So, actually, every array item is 735 00:38:19,822 --> 00:38:21,530 going to have two pointers, one if you're 736 00:38:21,530 --> 00:38:23,290 going right in the x tree, one if you're 737 00:38:23,290 --> 00:38:26,310 going left in the x tree. 738 00:38:26,310 --> 00:38:28,700 But we can afford a constant number of pointers per node. 739 00:38:28,700 --> 00:38:32,059 This only increases space by a constant factor. 740 00:38:32,059 --> 00:38:34,100 And now, it tells me exactly what I need to know. 741 00:38:34,100 --> 00:38:35,510 I start at the root. 742 00:38:35,510 --> 00:38:36,590 I do a binary search. 743 00:38:36,590 --> 00:38:37,820 That's the slow part. 744 00:38:37,820 --> 00:38:39,740 I spend log n time, find those two slots. 745 00:38:39,740 --> 00:38:42,260 Every time I go down, I follow the pointer. 746 00:38:42,260 --> 00:38:47,120 I know exactly where a2 and b2 live in the next array. 747 00:38:47,120 --> 00:38:50,070 In constant time, as I walk down, I can figure this out. 748 00:38:50,070 --> 00:38:52,430 I can remember the information on both sides here. 749 00:38:52,430 --> 00:38:55,940 And every time I go to one of these subtrees, 750 00:38:55,940 --> 00:38:58,970 I know exactly where I live-- 751 00:38:58,970 --> 00:39:02,940 it's no longer a tree-- now, in that array.
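A minimal Python sketch of the layered (cross-linked) version: each x node keeps a y-sorted array, and for every array position it stores the index of that element's successor in each child's array, as with the red arrows above. Only the root is binary-searched; every step down reuses a stored index in O(1), so counting drops to O(log n). A single pointer array serves both ends of the query because a position in the parent array maps to a child position by counting child values smaller than the parent element at that position. Assumptions: the names `cascade`, `LNode`, `to_left`, `to_right`, and `count_layered` are hypothetical, and `sorted()` stands in for the linear-time merge a careful static build would use.

```python
from bisect import bisect_left, bisect_right


def cascade(parent_ys, child_ys):
    """ptr[i] = number of child values < parent_ys[i] (the successor's index in the
    child array); ptr[len(parent_ys)] = len(child_ys). Linear two-pointer scan."""
    ptr, k = [], 0
    for y in parent_ys:
        while k < len(child_ys) and child_ys[k] < y:
            k += 1
        ptr.append(k)
    ptr.append(len(child_ys))
    return ptr


class LNode:
    """x tree node with a y-sorted array and cross-links into both children."""

    def __init__(self, pts):             # pts: non-empty list of (x, y), sorted by x
        self.xmin, self.xmax = pts[0][0], pts[-1][0]
        if len(pts) == 1:
            self.left = self.right = None
            self.ys = [pts[0][1]]
        else:
            mid = len(pts) // 2
            self.left, self.right = LNode(pts[:mid]), LNode(pts[mid:])
            self.ys = sorted(self.left.ys + self.right.ys)
            self.to_left = cascade(self.ys, self.left.ys)
            self.to_right = cascade(self.ys, self.right.ys)


def count_layered(root, x1, x2, y1, y2):
    """Count points in [x1, x2] x [y1, y2] using only two binary searches (at the root)."""
    lo = bisect_left(root.ys, y1)        # first position with y >= y1
    hi = bisect_right(root.ys, y2)       # first position with y > y2

    def walk(v, lo, hi):
        if v.xmax < x1 or v.xmin > x2:
            return 0
        if x1 <= v.xmin and v.xmax <= x2:
            return hi - lo               # the matching y values are v.ys[lo:hi]
        # follow the cross-links instead of searching again
        return (walk(v.left, v.to_left[lo], v.to_left[hi])
                + walk(v.right, v.to_right[lo], v.to_right[hi]))

    return walk(root, lo, hi)


points = [(1, 5), (2, 1), (3, 7), (4, 3), (5, 9), (6, 2), (7, 8), (8, 4), (9, 3)]
root = LNode(sorted(points))
print(count_layered(root, 2, 6, 2, 8))   # matches: (3, 7), (4, 3), (6, 2)
```

Leaves never need cross-links: a single point is always either fully inside or fully outside the x range, so `walk` never recurses past them.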
752 00:39:02,940 --> 00:39:06,710 So I can identify the regions in these arrays 753 00:39:06,710 --> 00:39:11,540 that correspond to these matching subrectangles 754 00:39:11,540 --> 00:39:12,560 with no extra time. 755 00:39:12,560 --> 00:39:14,606 So I save that last log factor. 756 00:39:14,606 --> 00:39:16,230 If you generalize this to D dimensions, 757 00:39:16,230 --> 00:39:17,730 it only works in the last dimension. 758 00:39:17,730 --> 00:39:19,730 You can use this trick in the last dimension 759 00:39:19,730 --> 00:39:25,880 and improve from log to the d query to log to the d minus 1. 760 00:39:25,880 --> 00:39:28,459 In the higher dimensions, we just use regular range trees. 761 00:39:28,459 --> 00:39:30,500 And when we get down to the two dimensional case, 762 00:39:30,500 --> 00:39:31,754 it's a recursion. 763 00:39:31,754 --> 00:39:33,920 Before we were stopping at the one dimensional case. 764 00:39:33,920 --> 00:39:36,080 We use a regular binary search tree. 765 00:39:36,080 --> 00:39:38,390 Now, we stop at the two dimensional case, 766 00:39:38,390 --> 00:39:39,590 and we use this fancy thing. 767 00:39:43,370 --> 00:39:44,780 I call this cross-linking. 768 00:39:44,780 --> 00:39:46,970 A lot of people call it fractional cascading. 769 00:39:46,970 --> 00:39:49,520 Both are valid names. 770 00:39:49,520 --> 00:39:52,100 It's a cool idea, but simple once you 771 00:39:52,100 --> 00:39:54,570 can see both dimensions at once, which I know is 772 00:39:54,570 --> 00:39:55,820 hard to see in two dimensions. 773 00:39:55,820 --> 00:39:59,060 But it can be done. 774 00:39:59,060 --> 00:40:00,630 All right. 775 00:40:00,630 --> 00:40:01,250 Questions? 776 00:40:04,270 --> 00:40:06,700 I guess the obvious question is dynamic. 777 00:40:06,700 --> 00:40:10,040 Now, we're going to go to dynamic. 778 00:40:10,040 --> 00:40:11,860 This is a very static thing to be doing.
779 00:40:11,860 --> 00:40:14,132 How in the world would we maintain this 780 00:40:14,132 --> 00:40:15,340 if the point set is changing? 781 00:40:15,340 --> 00:40:17,590 All these pointers are going to move around. 782 00:40:17,590 --> 00:40:20,350 Life seems so hard. 783 00:40:20,350 --> 00:40:22,330 But it's not. 784 00:40:22,330 --> 00:40:25,060 In fact, updates are a lot easier than you might think. 785 00:40:45,850 --> 00:40:49,540 Some of you may believe this in your heart. 786 00:40:49,540 --> 00:40:51,190 Some of you may not. 787 00:40:51,190 --> 00:40:55,750 But if you've ever seen an amortization argument that 788 00:40:55,750 --> 00:40:57,684 says, basically, when you modify a tree, 789 00:40:57,684 --> 00:40:59,350 only a constant number of things happen. 790 00:40:59,350 --> 00:41:01,620 And they usually happen near the leaves. 791 00:41:01,620 --> 00:41:03,900 I'm thinking of a binary search tree. 792 00:41:03,900 --> 00:41:05,700 The easiest way to see this is in a B-tree 793 00:41:05,700 --> 00:41:07,870 if you know B-trees. 794 00:41:07,870 --> 00:41:09,567 Usually, if you do insertion, you're 795 00:41:09,567 --> 00:41:11,650 going to do maybe one or two splits at the bottom, 796 00:41:11,650 --> 00:41:12,579 and that's it. 797 00:41:12,579 --> 00:41:14,620 Constant fraction at a time, that's all there is. 798 00:41:14,620 --> 00:41:16,900 So it should only take constant time to do an update. 799 00:41:16,900 --> 00:41:21,935 This structure is easy to update at the leaves. 800 00:41:21,935 --> 00:41:24,310 If you look at one of these structures, a constant number 801 00:41:24,310 --> 00:41:26,500 of items, there's a constant size array. 802 00:41:26,500 --> 00:41:29,840 You could update everything in constant time. 803 00:41:29,840 --> 00:41:32,745 If we're only updating near the leaves, then life is good.
804 00:41:32,745 --> 00:41:34,120 Occasionally, though, we're going 805 00:41:34,120 --> 00:41:36,609 to have to update these giant structures. 806 00:41:36,609 --> 00:41:38,650 And then we're going to have to spend giant time. 807 00:41:38,650 --> 00:41:41,320 That's OK. 808 00:41:41,320 --> 00:41:43,720 The only thing we need out of this data structure 809 00:41:43,720 --> 00:41:46,750 is that it takes the same amount of space and pre-processing 810 00:41:46,750 --> 00:41:52,950 time, n log to the d minus 1 n space, and time to build 811 00:41:52,950 --> 00:41:56,800 the static data structure. 812 00:41:56,800 --> 00:42:01,540 If we have this, it turns out we can make it dynamic for free. 813 00:42:01,540 --> 00:42:03,900 This is the magic of weight balance trees. 814 00:42:15,610 --> 00:42:20,140 In general, there are many kinds of weight balance trees. 815 00:42:20,140 --> 00:42:22,570 We're going to look at one called BB alpha 816 00:42:22,570 --> 00:42:28,450 trees, which are the oldest and sort of the simplest. 817 00:42:28,450 --> 00:42:29,210 Well, you'll see. 818 00:42:29,210 --> 00:42:31,240 It's pretty easy to do. 819 00:42:31,240 --> 00:42:33,280 You've already seen height balance trees. 820 00:42:33,280 --> 00:42:36,190 AVL trees, for example, you keep the left and the right subtree. 821 00:42:36,190 --> 00:42:38,481 You want their height to be within an additive constant 822 00:42:38,481 --> 00:42:40,630 of each other, 1. 823 00:42:40,630 --> 00:42:43,780 Red black trees are multiplicative factor 2. 824 00:42:43,780 --> 00:42:45,250 Left and right subtree, the heights 825 00:42:45,250 --> 00:42:46,990 will be roughly the same. 826 00:42:46,990 --> 00:42:49,450 Weight balance trees, weight is the number 827 00:42:49,450 --> 00:42:51,005 of nodes in a subtree.
828 00:42:51,005 --> 00:42:52,630 Weight balance trees, they want to keep 829 00:42:52,630 --> 00:42:55,330 the size of the left subtree and the size of the right subtree 830 00:42:55,330 --> 00:42:57,560 to be roughly the same. 831 00:42:57,560 --> 00:43:02,710 So here's the definition of BB alpha trees. 832 00:43:02,710 --> 00:43:11,900 For each node v, size of the left subtree of v 833 00:43:11,900 --> 00:43:16,300 is at least alpha times the size of v. 834 00:43:16,300 --> 00:43:21,540 And size of the right subtree of v 835 00:43:21,540 --> 00:43:27,860 is at least alpha times the size of v. Now, size, 836 00:43:27,860 --> 00:43:28,810 I didn't define size. 837 00:43:28,810 --> 00:43:30,640 It could be the total number of nodes in the subtree. 838 00:43:30,640 --> 00:43:32,664 It could be the number of leaves in the subtree. 839 00:43:32,664 --> 00:43:33,580 Doesn't really matter. 840 00:43:36,160 --> 00:43:38,140 What else? 841 00:43:38,140 --> 00:43:39,080 What's alpha? 842 00:43:39,080 --> 00:43:40,900 If alpha is a half, you're in trouble. 843 00:43:40,900 --> 00:43:43,630 Because then it has to be perfectly balanced. 844 00:43:43,630 --> 00:43:46,570 But just make alpha small, like 1/10 or something. 845 00:43:46,570 --> 00:43:49,340 Any constant less than a half will do. 846 00:43:52,500 --> 00:43:53,110 Right. 847 00:43:53,110 --> 00:43:54,568 The nice thing about weight balance 848 00:43:54,568 --> 00:43:55,990 is they imply height balance. 849 00:43:55,990 --> 00:43:59,290 If you have this property that neither your left 850 00:43:59,290 --> 00:44:04,300 nor your right subtree is too small, then as you go down, 851 00:44:04,300 --> 00:44:06,940 every time you take a left or a right child, 852 00:44:06,940 --> 00:44:10,840 you throw away an alpha fraction of your nodes. 853 00:44:10,840 --> 00:44:12,480 So initially, you have all the nodes. 854 00:44:12,480 --> 00:44:14,604 Every time you go down, you lose an alpha fraction.
855 00:44:14,604 --> 00:44:16,240 How many times can that happen? 856 00:44:16,240 --> 00:44:21,280 Log base 1 over 1 minus alpha, basically, since at most a 1 minus alpha fraction survives each step. 857 00:44:21,280 --> 00:44:29,350 The height is log base 1 over 1 minus alpha of n. 858 00:44:29,350 --> 00:44:32,350 So this is really a stronger property than height balance. 859 00:44:32,350 --> 00:44:34,871 It implies that your heights are good. 860 00:44:34,871 --> 00:44:37,120 So it implies the height of the left and right subtree 861 00:44:37,120 --> 00:44:39,340 are not too far from each other. 862 00:44:39,340 --> 00:44:41,290 But it's a lot stronger. 863 00:44:41,290 --> 00:44:46,960 It lets you do updates lickety fast, basically. 864 00:44:46,960 --> 00:44:48,130 So how do we do an update? 865 00:44:57,710 --> 00:45:01,250 The idea is, normally, you insert a leaf, 866 00:45:01,250 --> 00:45:03,520 do a regular BST insert or delete. 867 00:45:03,520 --> 00:45:06,350 You add a leaf at the bottom or delete a leaf. 868 00:45:06,350 --> 00:45:11,240 And so you have to update like that node and maybe its parent. 869 00:45:11,240 --> 00:45:13,650 As long as you have weight balance, 870 00:45:13,650 --> 00:45:15,650 you're just making little constant sized changes 871 00:45:15,650 --> 00:45:16,430 at the bottom. 872 00:45:16,430 --> 00:45:18,330 Everything's good. 873 00:45:18,330 --> 00:45:18,830 OK. 874 00:45:18,830 --> 00:45:21,413 The trouble is when one of these constraints becomes violated. 875 00:45:21,413 --> 00:45:23,880 Then you want to do a rotation or something. 876 00:45:23,880 --> 00:45:24,380 OK. 877 00:45:24,380 --> 00:45:34,650 So when a node is not weight balanced, 878 00:45:34,650 --> 00:45:37,220 it's a pretty loose algorithm. 879 00:45:37,220 --> 00:45:39,380 But it's easy to find nodes. 880 00:45:39,380 --> 00:45:41,630 You just store all the weights, all the subtree sizes, 881 00:45:41,630 --> 00:45:43,610 which we were doing already.
882 00:45:43,610 --> 00:45:47,150 You can detect when nodes are no longer weight balanced. 883 00:45:47,150 --> 00:45:49,044 And then we just want to weight balance it. 884 00:45:49,044 --> 00:45:50,210 How do we weight balance it? 885 00:45:50,210 --> 00:45:54,090 We rebuild the entire subtree from scratch. 886 00:45:54,090 --> 00:45:56,387 This is sort of the only thing we know how to do. 887 00:45:56,387 --> 00:45:57,720 We have a static data structure. 888 00:45:57,720 --> 00:46:00,230 This is a general transformation, dynamization 889 00:46:00,230 --> 00:46:03,740 when you have augmentation. 890 00:46:03,740 --> 00:46:05,060 We have this data structure. 891 00:46:05,060 --> 00:46:06,560 It's got all these augmented things. 892 00:46:06,560 --> 00:46:07,760 It's complicated. 893 00:46:07,760 --> 00:46:09,560 But at least it's sort of downward looking. 894 00:46:09,560 --> 00:46:12,660 I mean, you only need to store pointers from here down, 895 00:46:12,660 --> 00:46:13,840 not up. 896 00:46:13,840 --> 00:46:15,340 I mean, your parent points into you. 897 00:46:15,340 --> 00:46:17,499 But you have a nice local thing. 898 00:46:17,499 --> 00:46:19,040 So if this guy's not weight balanced, 899 00:46:19,040 --> 00:46:23,530 if this left subtree is way heavier than the right subtree 900 00:46:23,530 --> 00:46:26,950 by this alpha factor, one over alpha factor, 901 00:46:26,950 --> 00:46:30,020 then just redo everything in here. 902 00:46:30,020 --> 00:46:31,580 Find the median. 903 00:46:31,580 --> 00:46:33,522 Make a perfect binary search tree. 904 00:46:33,522 --> 00:46:35,480 Then the weights between the left and the right 905 00:46:35,480 --> 00:46:36,740 will be perfectly balanced. 906 00:46:36,740 --> 00:46:40,880 We'll have achieved the one half, one half split of weight. 907 00:46:40,880 --> 00:46:43,610 How long before it gets unbalanced again? 908 00:46:43,610 --> 00:46:45,704 A long time. 
909 00:46:45,704 --> 00:46:47,870 If I start with a one half, one half split, and then 910 00:46:47,870 --> 00:46:51,860 I have to get to an alpha, 1 minus alpha split, 911 00:46:51,860 --> 00:46:56,365 a lot of nodes had to move from one side to the other. 912 00:46:56,365 --> 00:46:57,770 The alpha gets messy. 913 00:46:57,770 --> 00:47:01,760 So let me just say when this happens, 914 00:47:01,760 --> 00:47:03,920 rebuild the entire subtree. 915 00:47:08,940 --> 00:47:12,676 I guess it's like 1/2 minus alpha had to move. 916 00:47:12,676 --> 00:47:14,550 1/2 minus alpha times the size of the subtree 917 00:47:14,550 --> 00:47:17,970 had to be inserted or deleted, had to happen, or maybe half 918 00:47:17,970 --> 00:47:19,877 of that, some constant fraction. 919 00:47:19,877 --> 00:47:20,710 I don't really care. 920 00:47:20,710 --> 00:47:22,650 Alpha's a constant. 921 00:47:22,650 --> 00:47:28,790 I'm going to charge to the theta k 922 00:47:28,790 --> 00:47:38,560 updates that unbalance things. 923 00:47:47,150 --> 00:47:49,650 k here is the size of the subtree. 924 00:47:54,260 --> 00:47:57,440 So when I see a node is unbalanced, just fix it. 925 00:47:57,440 --> 00:47:59,270 Make it perfect. 926 00:47:59,270 --> 00:48:01,810 And if I started out perfect, the subtree started out 927 00:48:01,810 --> 00:48:04,780 perfect, I know there were theta k updates that I can charge to. 928 00:48:04,780 --> 00:48:09,200 The only catch is I'm actually double charging quite a bit, 929 00:48:09,200 --> 00:48:09,700 actually. 930 00:48:09,700 --> 00:48:14,860 If you look at a tree, if I do an insert here, 931 00:48:14,860 --> 00:48:17,650 it makes this subtree potentially slightly 932 00:48:17,650 --> 00:48:18,190 unbalanced. 933 00:48:18,190 --> 00:48:19,430 It makes this subtree slightly unbalanced. 934 00:48:19,430 --> 00:48:21,180 It makes this subtree slightly unbalanced. 935 00:48:21,180 --> 00:48:24,580 There are log n subtrees that contain that item.
936 00:48:24,580 --> 00:48:27,095 Each of them may be getting worse. 937 00:48:27,095 --> 00:48:29,470 So if I say, well, yeah, there are these theta k updates, 938 00:48:29,470 --> 00:48:31,636 but actually there are log n different subtrees that 939 00:48:31,636 --> 00:48:33,590 will charge to the same update. 940 00:48:33,590 --> 00:48:36,490 So I lose a log n factor in this amortization. 941 00:48:36,490 --> 00:48:38,110 But it's not so bad. 942 00:48:38,110 --> 00:48:40,420 I get log n amortized update. 943 00:48:46,680 --> 00:48:50,510 This is if a rebuild costs linear time. 944 00:48:57,616 --> 00:48:58,490 This is pretty nifty. 945 00:48:58,490 --> 00:49:00,930 I don't have to do rotations per se. 946 00:49:00,930 --> 00:49:03,980 I just take all the nodes in the subtree, write them down. 947 00:49:03,980 --> 00:49:05,150 I do an in-order traversal. 948 00:49:05,150 --> 00:49:06,800 I have them sorted, take the median, 949 00:49:06,800 --> 00:49:09,740 build a nice perfect binary search tree on those items. 950 00:49:09,740 --> 00:49:12,570 I can easily do that in linear time. 951 00:49:12,570 --> 00:49:14,930 And so this is like the brain-dead way 952 00:49:14,930 --> 00:49:19,760 to make this weight-balanced tree dynamic. 953 00:49:19,760 --> 00:49:21,680 The original BB alpha trees use rotations. 954 00:49:21,680 --> 00:49:22,959 But you don't have to. 955 00:49:22,959 --> 00:49:25,250 You can do this very simple thing and still get a log n 956 00:49:25,250 --> 00:49:27,650 amortized update. 957 00:49:27,650 --> 00:49:30,260 And the good news is, if you have augmentation as well-- 958 00:49:30,260 --> 00:49:31,940 because with this subtree, there's 959 00:49:31,940 --> 00:49:35,990 tons of extra stuff, all these arrays and pointers and stuff-- 960 00:49:35,990 --> 00:49:37,850 it's easy to build from scratch. 961 00:49:37,850 --> 00:49:39,440 But it's hard to maintain dynamically. 962 00:49:39,440 --> 00:49:41,600 The point is, now, we don't have to.
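The rebuild-instead-of-rotate idea can be sketched in Python as below. This is a hypothetical illustration, not the lecture's code: it assumes the BB[alpha] convention that a node's weight is its subtree size plus one, an assumed constant ALPHA = 0.25, and it shows only insertion. It performs an ordinary BST insert, bumps subtree sizes on the way down, and then rebuilds the highest ancestor that lost weight balance via an in-order traversal followed by a median-split build, which is linear in the subtree size.

```python
from dataclasses import dataclass
from typing import List, Optional

ALPHA = 0.25  # assumed balance constant; must satisfy 0 < ALPHA < 1/2


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 1  # subtree size, stored as an augmentation


def weight(node: Optional[Node]) -> int:
    # Assumed convention: weight = subtree size + 1.
    return (node.size if node else 0) + 1


def balanced(node: Node) -> bool:
    w = weight(node)
    return weight(node.left) >= ALPHA * w and weight(node.right) >= ALPHA * w


def flatten(node: Optional[Node], out: List[int]) -> List[int]:
    # In-order traversal: writes the subtree's keys down in sorted order.
    if node is not None:
        flatten(node.left, out)
        out.append(node.key)
        flatten(node.right, out)
    return out


def build_perfect(keys: List[int], lo: int, hi: int) -> Optional[Node]:
    # The median becomes the root; recurse on both halves. O(k) for k keys.
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(keys[mid], build_perfect(keys, lo, mid),
                build_perfect(keys, mid + 1, hi), hi - lo)


def rebuild(node: Node) -> Node:
    keys = flatten(node, [])
    return build_perfect(keys, 0, len(keys))


def insert(root: Optional[Node], key: int) -> Node:
    if root is None:
        return Node(key)
    path, node = [], root
    while node is not None:  # ordinary BST descent, bumping sizes
        node.size += 1
        path.append(node)
        node = node.left if key < node.key else node.right
    if key < path[-1].key:
        path[-1].left = Node(key)
    else:
        path[-1].right = Node(key)
    # Rebuild the highest ancestor that lost weight balance, if any.
    for i, x in enumerate(path):
        if not balanced(x):
            new = rebuild(x)
            if i == 0:
                return new
            parent = path[i - 1]
            if parent.left is x:
                parent.left = new
            else:
                parent.right = new
            break
    return root


def check(node: Optional[Node]) -> int:
    # Verifies sizes and weight balance everywhere; returns the height.
    if node is None:
        return 0
    hl, hr = check(node.left), check(node.right)
    assert node.size == weight(node.left) + weight(node.right) - 1
    assert balanced(node)
    return 1 + max(hl, hr)
```

In an augmented structure such as a layered range tree, `rebuild` would also recreate the subtree's arrays and cross-links; that part is omitted here.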
963 00:49:41,600 --> 00:49:43,250 If ever we need to change a node, 964 00:49:43,250 --> 00:49:44,870 we just rebuild the entire subtree. 965 00:49:44,870 --> 00:49:49,620 And we can afford it at the loss of a logarithmic overhead. 966 00:49:49,620 --> 00:49:53,510 So we had n log to the d minus 1 n time to build the structure. 967 00:49:53,510 --> 00:49:55,010 So for a structure of size k, it's 968 00:49:55,010 --> 00:50:00,154 going to be k times log to the d minus 1 of k. 969 00:50:00,154 --> 00:50:01,820 We're going to lose an extra log factor. 970 00:50:01,820 --> 00:50:04,220 So this d minus 1 is going to turn into a d 971 00:50:04,220 --> 00:50:04,835 for updates. 972 00:50:24,590 --> 00:50:31,010 So that was the generic structure. 973 00:50:31,010 --> 00:50:38,720 And now, if we apply this to layered range trees, 974 00:50:38,720 --> 00:50:46,250 we get log to the d n amortized update. 975 00:50:52,160 --> 00:50:56,165 Because we had k times log to the d 976 00:50:56,165 --> 00:51:01,220 minus 1 of k preprocessing to rebuild a node. 977 00:51:04,460 --> 00:51:07,380 And just to recall, we still have log 978 00:51:07,380 --> 00:51:11,900 to the d minus 1 of n query. 979 00:51:11,900 --> 00:51:14,780 So this was regular range trees. 980 00:51:14,780 --> 00:51:19,490 And we've made them dynamic, the same time as range trees. 981 00:51:19,490 --> 00:51:22,250 And still, the query is a log factor faster. 982 00:51:22,250 --> 00:51:26,060 So for 2D, we get log n query and log squared n update, for insertion 983 00:51:26,060 --> 00:51:28,700 and deletion of points. 984 00:51:28,700 --> 00:51:29,590 Questions about that? 985 00:51:33,180 --> 00:51:34,650 Cool. 986 00:51:34,650 --> 00:51:38,990 Well, that is range searching, orthogonal range searching. 987 00:51:42,750 --> 00:51:44,980 Let's see. 988 00:51:44,980 --> 00:51:47,860 There are more results, which I don't 989 00:51:47,860 --> 00:51:50,530 want to cover in detail here.
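The charging argument above fits in one line. Assuming a size-k subtree is rebuilt in O(k log^(d-1) k) time, that cost is spread over the Theta(k) updates that unbalanced it, and each update is charged by at most O(log n) ancestor subtrees (using k <= n):

```latex
\[
\underbrace{O(\log n)}_{\text{ancestors charging each update}}
\cdot
\frac{O\!\left(k \log^{d-1} k\right)}{\Theta(k)}
\;=\; O(\log n)\cdot O\!\left(\log^{d-1} n\right)
\;=\; O\!\left(\log^{d} n\right)\ \text{amortized per update}
\]
```

With d = 1 (a plain tree rebuilt in linear time) this gives the O(log n) bound for the rebuild-based BB[alpha] tree; with d = 2 it gives O(log^2 n) updates alongside O(log n) queries.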
990 00:51:50,530 --> 00:51:53,164 But you should at least know about them. 991 00:51:53,164 --> 00:51:55,330 And then we're going to turn to fractional cascading 992 00:51:55,330 --> 00:51:56,638 a little more generally. 993 00:52:05,110 --> 00:52:10,855 So where is this result? 994 00:52:10,855 --> 00:52:11,480 Somewhere here. 995 00:52:21,160 --> 00:52:28,110 So for static orthogonal range searching, 996 00:52:28,110 --> 00:52:30,090 range searching is a big area. 997 00:52:30,090 --> 00:52:31,797 We're looking at the orthogonal case. 998 00:52:31,797 --> 00:52:33,630 There's other versions where you're querying 999 00:52:33,630 --> 00:52:37,140 with a triangle or a simplex. 1000 00:52:37,140 --> 00:52:40,020 You can query with two-sided box, which 1001 00:52:40,020 --> 00:52:41,430 goes out to infinity here. 1002 00:52:41,430 --> 00:52:44,050 All sorts of things are out there. 1003 00:52:44,050 --> 00:52:46,312 But let me stick to rectangles. 1004 00:52:46,312 --> 00:52:50,760 Because that's what we've seen and we can relate to. 1005 00:52:50,760 --> 00:52:54,400 You can achieve these same bounds-- 1006 00:52:54,400 --> 00:52:55,500 sorry, no update. 1007 00:52:55,500 --> 00:52:57,420 You can achieve the log to the d minus 1 1008 00:52:57,420 --> 00:53:01,690 n query using less space. 1009 00:53:01,690 --> 00:53:06,630 So I can get log to the d minus 1 n 1010 00:53:06,630 --> 00:53:16,170 query and n log to the d minus 1 n space-- 1011 00:53:16,170 --> 00:53:18,270 that's what we were getting before-- 1012 00:53:18,270 --> 00:53:21,240 divided by log log n. 1013 00:53:21,240 --> 00:53:23,280 Slight improvement. 1014 00:53:23,280 --> 00:53:26,370 And in a certain model, this is basically optimal, 1015 00:53:26,370 --> 00:53:27,960 which is kind of even crazier. 1016 00:53:27,960 --> 00:53:31,970 This is an old result by Chazelle. 1017 00:53:31,970 --> 00:53:34,560 That's in '86. 1018 00:53:34,560 --> 00:53:35,670 OK. 
1019 00:53:35,670 --> 00:53:41,325 This is 2D-- sorry, not 2D, just in general. 1020 00:53:45,600 --> 00:53:48,840 Turns out this query time is not optimal. 1021 00:53:48,840 --> 00:53:51,630 If you allow the space to go up a little bit, 1022 00:53:51,630 --> 00:53:55,188 you can get another log improvement. 1023 00:53:55,188 --> 00:53:59,160 So I can get log to the d minus 2 1024 00:53:59,160 --> 00:54:04,675 n query if I'm willing to pay-- 1025 00:54:04,675 --> 00:54:05,715 I didn't say, this is space-- 1026 00:54:09,210 --> 00:54:15,510 n log to the d n space. 1027 00:54:15,510 --> 00:54:19,260 So if I give up another log factor in space, 1028 00:54:19,260 --> 00:54:21,060 I can get another log factor in query. 1029 00:54:21,060 --> 00:54:23,040 I don't think you can keep doing that. 1030 00:54:23,040 --> 00:54:25,520 But for one more step, you can. 1031 00:54:25,520 --> 00:54:28,770 I believe this is conjectured optimal for query. 1032 00:54:28,770 --> 00:54:32,040 I don't know if it's proved. 1033 00:54:32,040 --> 00:54:35,190 And this was originally done by Chazelle and Guibas 1034 00:54:35,190 --> 00:54:38,190 using fractional cascading. 1035 00:54:38,190 --> 00:54:39,570 And we'll see. 1036 00:54:39,570 --> 00:54:43,430 If there's time next class, I'll show you how this works. 1037 00:54:43,430 --> 00:54:45,180 But for now, I want to tell you 1038 00:54:45,180 --> 00:54:48,540 how fractional cascading works in full generality. 1039 00:54:48,540 --> 00:54:50,250 This is part of fractional cascading, 1040 00:54:50,250 --> 00:54:53,970 this idea of cross-linking from a bigger structure to a smaller 1041 00:54:53,970 --> 00:54:56,640 one, so that you don't have to keep re-searching. 1042 00:54:56,640 --> 00:54:59,340 You just reuse where you were. 1043 00:54:59,340 --> 00:55:00,480 But there's another idea. 1044 00:55:00,480 --> 00:55:01,688 I want to show you that idea. 1045 00:55:04,200 --> 00:55:06,180 So, fractional cascading.
1046 00:55:22,052 --> 00:55:24,540 AUDIENCE: Would that work for d equals 2? 1047 00:55:24,540 --> 00:55:27,840 ERIK DEMAINE: For d equals 2, no, it does not work. 1048 00:55:27,840 --> 00:55:31,590 So I should say this is for 2D and higher. 1049 00:55:31,590 --> 00:55:33,620 d has to be bigger than 1. 1050 00:55:33,620 --> 00:55:35,920 Because you can never beat log n. 1051 00:55:35,920 --> 00:55:39,470 So for 2D and higher, we could use the trick that we just did. 1052 00:55:39,470 --> 00:55:43,450 For 3D and higher, you can improve by another log, 1053 00:55:43,450 --> 00:55:44,984 thanks. 1054 00:55:44,984 --> 00:55:45,650 Other questions? 1055 00:55:45,650 --> 00:55:47,540 AUDIENCE: But you said you can never beat log n. 1056 00:55:47,540 --> 00:55:49,123 ERIK DEMAINE: We can never beat log n. 1057 00:55:49,123 --> 00:55:52,880 In this model, which is basically the comparison model, 1058 00:55:52,880 --> 00:55:54,580 we're comparing coordinates. 1059 00:55:54,580 --> 00:55:56,690 In that model and many other models, 1060 00:55:56,690 --> 00:55:58,762 you can't beat log n query. 1061 00:55:58,762 --> 00:56:01,220 Because in particular, you have to solve the search problem 1062 00:56:01,220 --> 00:56:02,570 in 1D. 1063 00:56:02,570 --> 00:56:04,310 So we're always hampered by that. 1064 00:56:04,310 --> 00:56:08,170 But the question is, how does it grow with d? 1065 00:56:08,170 --> 00:56:10,600 And the claim is we can get log n all the way up to three 1066 00:56:10,600 --> 00:56:11,230 dimensions. 1067 00:56:11,230 --> 00:56:13,900 Only at four dimensions do we have to pay log squared. 1068 00:56:13,900 --> 00:56:17,870 It's pretty amazing, I think. 1069 00:56:17,870 --> 00:56:18,570 OK. 1070 00:56:18,570 --> 00:56:32,191 Fractional cascading-- super cool name, kind of scary name. 1071 00:56:32,191 --> 00:56:34,690 I was always scared when I heard about fractional cascading. 1072 00:56:34,690 --> 00:56:35,940 But it turns out, it's very simple.
1073 00:56:35,940 --> 00:56:37,330 The goal today is to not be scared. 1074 00:56:40,110 --> 00:56:43,560 Let's start with a warm-up problem. 1075 00:56:43,560 --> 00:56:46,190 And then I'll tell you its full generality. 1076 00:56:46,190 --> 00:56:48,370 But the simple version of the problem 1077 00:56:48,370 --> 00:56:50,290 is not geometry, per se. 1078 00:56:50,290 --> 00:56:52,900 It's kind of 1 and 1/2 dimensions, if you will. 1079 00:56:52,900 --> 00:57:02,620 Suppose I have k lists and each has size n. 1080 00:57:02,620 --> 00:57:04,630 They're sorted lists, think of them. 1081 00:57:04,630 --> 00:57:09,460 So we have n items that come from an ordered universe. 1082 00:57:09,460 --> 00:57:10,840 Here's list one. 1083 00:57:10,840 --> 00:57:12,760 Here's list two. 1084 00:57:12,760 --> 00:57:14,260 Here's list three. 1085 00:57:14,260 --> 00:57:16,330 There are k of them. 1086 00:57:16,330 --> 00:57:19,180 Each of them has n items. 1087 00:57:19,180 --> 00:57:21,805 I would like to search the query. 1088 00:57:24,712 --> 00:57:26,920 We'll just do static here. 1089 00:57:26,920 --> 00:57:29,880 Original fractional cascading was just static. 1090 00:57:29,880 --> 00:57:32,154 And these results are just static. 1091 00:57:32,154 --> 00:57:34,320 You can make it dynamic, but there is some overhead. 1092 00:57:34,320 --> 00:57:35,920 And I don't want to get into that. 1093 00:57:35,920 --> 00:57:40,450 It's even messier, or it is messy. 1094 00:57:40,450 --> 00:57:43,900 Fractional cascading by itself is a very simple idea. 1095 00:57:43,900 --> 00:57:50,720 The query is: search for x in all lists. 1096 00:57:53,870 --> 00:57:54,370 OK. 1097 00:57:54,370 --> 00:57:57,639 So I want to know what is the predecessor and successor of x 1098 00:57:57,639 --> 00:57:58,180 in this list. 1099 00:57:58,180 --> 00:57:59,560 I want to know what is the predecessor 1100 00:57:59,560 --> 00:58:00,690 and successor in this list.
1101 00:58:00,690 --> 00:58:03,010 I want to know what's the predecessor and successor 1102 00:58:03,010 --> 00:58:04,764 in this list, all of them. 1103 00:58:04,764 --> 00:58:05,680 It's more information. 1104 00:58:05,680 --> 00:58:08,470 If I just merged the lists and searched for x, 1105 00:58:08,470 --> 00:58:11,224 I would find where x fits globally. 1106 00:58:11,224 --> 00:58:13,390 But I want to know how it fits relative to this list 1107 00:58:13,390 --> 00:58:16,480 and relative to this list and relative to this list. 1108 00:58:16,480 --> 00:58:18,430 How do I do it? 1109 00:58:18,430 --> 00:58:20,330 I could just do k binary searches. 1110 00:58:20,330 --> 00:58:25,270 So this is an easy problem to solve. 1111 00:58:25,270 --> 00:58:29,500 You get k times log n. 1112 00:58:29,500 --> 00:58:32,050 But, now, fractional cascading comes in. 1113 00:58:32,050 --> 00:58:40,760 And we can get the optimal bound, which is k plus log n. 1114 00:58:40,760 --> 00:58:43,600 I need k to write down the answers. 1115 00:58:43,600 --> 00:58:45,640 I need log n to do the search in one list. 1116 00:58:45,640 --> 00:58:47,500 It turns out I can search on all k lists, 1117 00:58:47,500 --> 00:58:52,570 simultaneously get all k answers in k plus log n time. 1118 00:58:52,570 --> 00:58:57,400 It's kind of cool and, actually, quite easy to do. 1119 00:58:57,400 --> 00:58:58,645 We want to use this concept. 1120 00:59:01,390 --> 00:59:04,870 If I could search for my item, for x, in here, 1121 00:59:04,870 --> 00:59:07,380 and then basically follow a pointer to where I want 1122 00:59:07,380 --> 00:59:09,910 to go in here, I'd be done. 1123 00:59:09,910 --> 00:59:12,500 Sadly, that can't be done. 1124 00:59:12,500 --> 00:59:13,300 Why? 1125 00:59:13,300 --> 00:59:16,810 Because who knows what elements are in here? 1126 00:59:16,810 --> 00:59:21,480 All of these elements could fit right in this slot. 
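As a baseline, the k-independent-binary-searches solution might look like this in Python, using the standard `bisect` module. It assumes "successor" means the smallest item >= x and "predecessor" the largest item < x; `search_all_naive` is a hypothetical name. Each search costs O(log n), so the total is O(k log n).

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def search_all_naive(lists: List[List[int]], x: int) -> List[Tuple[Optional[int], Optional[int]]]:
    answers = []
    for lst in lists:  # one independent O(log n) binary search per list
        i = bisect_left(lst, x)  # first position holding an item >= x
        pred = lst[i - 1] if i > 0 else None
        succ = lst[i] if i < len(lst) else None
        answers.append((pred, succ))
    return answers
```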
1127 00:59:21,480 --> 00:59:24,400 And so how do I know where to go in this giant list? 1128 00:59:24,400 --> 00:59:27,680 If these all fit in here and, recursively, 1129 00:59:27,680 --> 00:59:30,280 these all fit in here, then by searching up here, 1130 00:59:30,280 --> 00:59:32,420 I learn nothing about where x fits in here. 1131 00:59:32,420 --> 00:59:33,820 I have to do another search. 1132 00:59:33,820 --> 00:59:35,986 And then I learn nothing about where x fits in here. 1133 00:59:35,986 --> 00:59:37,720 So it doesn't work straight up. 1134 00:59:37,720 --> 00:59:41,830 But if we combine this idea with fractional cascading, 1135 00:59:41,830 --> 00:59:43,786 then we can do it. 1136 00:59:43,786 --> 00:59:46,170 So I can erase this now. 1137 00:59:58,040 --> 00:59:59,590 So what do we do? 1138 01:00:02,720 --> 01:00:04,430 The idea is very simple. 1139 01:00:04,430 --> 01:00:10,410 So I'm going to call these lists L1, L2, L3 up to Lk. 1140 01:00:13,780 --> 01:00:35,300 I want to add every other item in Lk to Lk minus 1 1141 01:00:35,300 --> 01:00:37,340 and produce a new list Lk minus 1 prime. 1142 01:00:39,880 --> 01:00:45,150 So I take every second item here, 1143 01:00:45,150 --> 01:00:46,640 just insert them into this list. 1144 01:00:49,320 --> 01:00:52,460 [INAUDIBLE] it's a constant fraction bigger. 1145 01:00:52,460 --> 01:00:54,080 And then repeat. 1146 01:00:54,080 --> 01:00:56,090 This is the fractional part. 1147 01:00:56,090 --> 01:00:57,849 Here, the fraction is one half. 1148 01:00:57,849 --> 01:00:59,390 You can make it whatever fraction you 1149 01:00:59,390 --> 01:01:02,540 like less than one.
1150 01:01:02,540 --> 01:01:06,560 In general, I'm going to add every other item-- 1151 01:01:09,720 --> 01:01:13,360 this is in sorted order in Lk, of course-- 1152 01:01:13,360 --> 01:01:16,610 that's in Li prime-- 1153 01:01:16,610 --> 01:01:18,410 the prime is the important part here-- 1154 01:01:18,410 --> 01:01:24,780 to Li minus 1 to form Li minus 1 prime. 1155 01:01:24,780 --> 01:01:27,520 So I've got this new larger version of L2. 1156 01:01:27,520 --> 01:01:29,030 I take half the items from here. 1157 01:01:29,030 --> 01:01:31,850 Some of them may be items that were in L3. 1158 01:01:31,850 --> 01:01:34,280 Some of them are items that were originally in L2. 1159 01:01:34,280 --> 01:01:36,030 But all of them get promoted. 1160 01:01:36,030 --> 01:01:40,550 Or half of them get promoted to L1. 1161 01:01:40,550 --> 01:01:43,880 So I keep promoting from the bottom up. 1162 01:01:43,880 --> 01:01:46,340 How big do my lists get? 1163 01:01:46,340 --> 01:01:50,550 What is the size of Li prime? 1164 01:01:50,550 --> 01:01:54,290 Well, it started with Li. 1165 01:01:54,290 --> 01:01:57,680 And then I added half of the items 1166 01:01:57,680 --> 01:02:02,200 that were in the next level down, Li plus 1 prime. 1167 01:02:02,200 --> 01:02:02,700 OK. 1168 01:02:02,700 --> 01:02:05,800 So this is n. 1169 01:02:05,800 --> 01:02:07,970 And so this is going to be half of n 1170 01:02:07,970 --> 01:02:10,546 plus half of another n plus half of-- 1171 01:02:10,546 --> 01:02:12,920 I mean, it's going to be n plus a half n plus a quarter n 1172 01:02:12,920 --> 01:02:14,010 plus an eighth n. 1173 01:02:14,010 --> 01:02:15,670 It's a geometric series. 1174 01:02:15,670 --> 01:02:18,215 This is just constant-factor growth. 1175 01:02:18,215 --> 01:02:20,510 I'm assuming all the lists have the same size here 1176 01:02:20,510 --> 01:02:21,140 for simplicity. 1177 01:02:21,140 --> 01:02:24,380 You can generalize.
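The bottom-up promotion step can be sketched as follows. It assumes plain Python lists of comparable values, k >= 1, and that the promoted items are those at positions 0, 2, 4, ... of each augmented list (the lecture only says "every other item", so the parity is an assumption). `cascade` is a hypothetical name. Since each level receives at most half of the level below, sizes follow n + n/2 + n/4 + ... < 2n when all lists have size n.

```python
from typing import List


def cascade(lists: List[List[int]]) -> List[List[int]]:
    """Return the augmented lists L'_1..L'_k (0-indexed here)."""
    k = len(lists)
    aug: List[List[int]] = [[] for _ in range(k)]
    aug[k - 1] = sorted(lists[k - 1])          # L'_k = L_k
    for i in range(k - 2, -1, -1):             # work from the bottom up
        promoted = aug[i + 1][::2]             # every other item of L'_{i+1}
        aug[i] = sorted(lists[i] + promoted)   # L'_i = L_i plus promoted items
    return aug
```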
1178 01:02:24,380 --> 01:02:27,920 So I didn't really make the lists any bigger, per se. 1179 01:02:27,920 --> 01:02:30,230 But I fixed this problem. 1180 01:02:30,230 --> 01:02:36,330 If all of the elements in L2 fit right here in L1, 1181 01:02:36,330 --> 01:02:37,760 it's no longer a problem. 1182 01:02:37,760 --> 01:02:41,090 Because, now, half of the items from L2 now live in L1. 1183 01:02:41,090 --> 01:02:43,880 So when I search among L1, I'm not quite 1184 01:02:43,880 --> 01:02:45,500 doing a global search. 1185 01:02:45,500 --> 01:02:47,590 But I'm finding where I fit in L1. 1186 01:02:47,590 --> 01:02:50,990 I didn't contaminate it too much from L2. 1187 01:02:50,990 --> 01:02:56,390 And then now, it's useful to have pointers from L1 to L2. 1188 01:02:56,390 --> 01:02:57,804 Let me draw a picture maybe. 1189 01:03:12,130 --> 01:03:21,700 So here's L1, L2, L3. 1190 01:03:21,700 --> 01:03:29,380 So half of the items here have been inserted into here. 1191 01:03:29,380 --> 01:03:31,870 Now, we don't really know-- 1192 01:03:31,870 --> 01:03:36,480 maybe many of them went near the same location. 1193 01:03:36,480 --> 01:03:37,707 But they went there. 1194 01:03:37,707 --> 01:03:39,790 And I'm going to have pointers in both directions. 1195 01:03:39,790 --> 01:03:42,776 Let's say I need them down. 1196 01:03:42,776 --> 01:03:44,650 So that if I search in here, I can figure out 1197 01:03:44,650 --> 01:03:45,910 where I am down here. 1198 01:03:49,020 --> 01:03:51,670 Then half of these guys-- 1199 01:03:51,670 --> 01:03:55,120 maybe I'll use another color-- 1200 01:03:55,120 --> 01:03:57,340 get promoted to the next level up. 1201 01:03:57,340 --> 01:04:00,070 So maybe this one gets promoted. 1202 01:04:00,070 --> 01:04:02,670 Maybe this one gets promoted. 1203 01:04:02,670 --> 01:04:07,740 I guess half would be that one, that one, that one, that one, 1204 01:04:07,740 --> 01:04:08,710 that one. 
1205 01:04:08,710 --> 01:04:10,960 These guys get promoted to the next level. 1206 01:04:26,450 --> 01:04:27,600 OK. 1207 01:04:27,600 --> 01:04:29,290 I claim this is enough information. 1208 01:04:29,290 --> 01:04:32,020 This is fractional cascading in its full generality. 1209 01:04:32,020 --> 01:04:33,490 We have the cross-linking that we 1210 01:04:33,490 --> 01:04:36,400 had in the layered range trees. 1211 01:04:36,400 --> 01:04:38,987 But, now, we also have the fractional cascading part, 1212 01:04:38,987 --> 01:04:40,820 which is you take a fraction, you cascade it 1213 01:04:40,820 --> 01:04:41,980 into the next layer. 1214 01:04:41,980 --> 01:04:45,330 The cascading refers to those guys continue to get promoted. 1215 01:04:45,330 --> 01:04:49,330 Half of them get promoted up recursively. 1216 01:04:49,330 --> 01:04:51,950 That's where the name comes from. 1217 01:04:51,950 --> 01:04:53,417 So now, how do we do a search? 1218 01:04:53,417 --> 01:04:54,750 We're going to start at the top. 1219 01:04:54,750 --> 01:04:57,490 And we're going to do a regular binary search at the top. 1220 01:04:57,490 --> 01:04:59,650 Because we can afford log n once. 1221 01:04:59,650 --> 01:05:02,230 So we do the binary search at the top. 1222 01:05:02,230 --> 01:05:07,600 So maybe we find that our item fits here in this search. 1223 01:05:07,600 --> 01:05:10,420 So that tells us, oh, well, this is 1224 01:05:10,420 --> 01:05:13,960 where item x fits in this list. 1225 01:05:13,960 --> 01:05:14,566 Great. 1226 01:05:14,566 --> 01:05:15,940 Now, I need to know where it fits 1227 01:05:15,940 --> 01:05:18,970 in the next list in constant time. 1228 01:05:18,970 --> 01:05:20,980 Well, I need some more pointers for this. 
1229 01:05:20,980 --> 01:05:22,990 So for each item in here, I'm going 1230 01:05:22,990 --> 01:05:25,880 to store a pointer to the previous and next, let's say, 1231 01:05:25,880 --> 01:05:28,570 red node, the previous and next node 1232 01:05:28,570 --> 01:05:31,490 that was promoted from the list below. 1233 01:05:31,490 --> 01:05:31,990 OK. 1234 01:05:31,990 --> 01:05:35,290 So, now, I basically know where it fits here. 1235 01:05:35,290 --> 01:05:38,350 Not quite, because this is only half the items. 1236 01:05:38,350 --> 01:05:42,040 So I know that it fits between this guy and this guy 1237 01:05:42,040 --> 01:05:47,260 in list 2 prime, technically. 1238 01:05:47,260 --> 01:05:51,370 So the only thing I don't know is, is it here or here? 1239 01:05:51,370 --> 01:05:53,590 So I compare with this one item. 1240 01:05:53,590 --> 01:05:56,560 And in general, if it's not a half, 1241 01:05:56,560 --> 01:05:58,190 if the fraction is some other constant, 1242 01:05:58,190 --> 01:06:01,120 I spend constant time to look at a constant number of items, 1243 01:06:01,120 --> 01:06:02,930 figure out where it fits among those items. 1244 01:06:02,930 --> 01:06:05,980 Now, I know where it fits in L2 prime. 1245 01:06:05,980 --> 01:06:09,490 Then I, again, follow pointers to the next items. 1246 01:06:09,490 --> 01:06:11,510 In this case, they're the white items. 1247 01:06:11,510 --> 01:06:15,150 So let's say it fits here, basically. 1248 01:06:15,150 --> 01:06:18,530 I have a pointer to the previous and next white item 1249 01:06:18,530 --> 01:06:19,900 from that item. 1250 01:06:19,900 --> 01:06:21,070 Follow those pointers down. 1251 01:06:21,070 --> 01:06:24,630 And now, I know it's either, basically, here, here, here. 
1252 01:06:24,630 --> 01:06:28,300 It's somewhere in that little range of either equaling this 1253 01:06:28,300 --> 01:06:30,420 or being between these two items or on this item 1254 01:06:30,420 --> 01:06:33,280 or between those two items or on this item. 1255 01:06:33,280 --> 01:06:35,410 And, again, a constant number of things to look at. 1256 01:06:35,410 --> 01:06:37,240 I figure out where I belong. 1257 01:06:37,240 --> 01:06:43,450 In the primed list, which is not quite the original list, 1258 01:06:43,450 --> 01:06:46,095 maybe I determine that x falls here. 1259 01:06:46,095 --> 01:06:47,470 And what I really want to know is 1260 01:06:47,470 --> 01:06:51,170 it's between this item and that item of the original list. 1261 01:06:51,170 --> 01:06:53,920 I don't care so much about the promoted lists. 1262 01:06:53,920 --> 01:06:56,670 So I need more pointers, which tell me 1263 01:06:56,670 --> 01:07:03,370 if it happens that I fall here, basically, every promoted item 1264 01:07:03,370 --> 01:07:05,530 has a pointer to the previous and next unpromoted 1265 01:07:05,530 --> 01:07:07,800 item from the original list. 1266 01:07:07,800 --> 01:07:09,040 This is static. 1267 01:07:09,040 --> 01:07:10,650 I can have all these pointers. 1268 01:07:10,650 --> 01:07:12,490 Let's write them down. 1269 01:07:12,490 --> 01:07:23,620 So every promoted item in Li prime-- 1270 01:07:23,620 --> 01:07:28,060 that means it came from a promotion from below-- 1271 01:07:28,060 --> 01:07:42,290 has a pointer to the previous and next non-promoted item. 1272 01:07:42,290 --> 01:07:45,350 So that's an item in Li. 1273 01:07:45,350 --> 01:07:45,850 OK. 1274 01:07:45,850 --> 01:07:47,560 That's two pointers. 1275 01:07:47,560 --> 01:07:50,080 And that's what we just used. 1276 01:07:50,080 --> 01:07:52,960 So I found where I was among the entire L1 1277 01:07:52,960 --> 01:07:56,590 prime, which was almost like a global search, not quite.
1278 01:07:56,590 --> 01:07:59,500 And then I follow these pointers to figure out where 1279 01:07:59,500 --> 01:08:00,790 it was in the original L1. 1280 01:08:05,137 --> 01:08:06,970 Well, so if I found that I was in the middle 1281 01:08:06,970 --> 01:08:10,180 of this big white region, I need to find the next red region. 1282 01:08:10,180 --> 01:08:11,810 So it's basically the reverse. 1283 01:08:11,810 --> 01:08:19,354 Every non-promoted item, every item in Li, has a pointer. 1284 01:08:23,229 --> 01:08:27,344 So this is basically Li prime minus Li, if you will. 1285 01:08:27,344 --> 01:08:28,760 And then these guys need a pointer 1286 01:08:28,760 --> 01:08:35,540 to the next and previous in that, so previous 1287 01:08:35,540 --> 01:08:45,720 and the next item in Li prime minus Li. 1288 01:08:45,720 --> 01:08:47,608 So these are the promoted items. 1289 01:08:47,608 --> 01:08:48,899 These are the unpromoted items. 1290 01:08:48,899 --> 01:08:52,020 So it's actually just two pointers per item. 1291 01:08:52,020 --> 01:08:54,560 If you're promoted, you store the previous and next unpromoted. 1292 01:08:54,560 --> 01:08:56,010 Unpromoted, if you're unpromoted, 1293 01:08:56,010 --> 01:08:57,739 you store the previous and next promoted. 1294 01:08:57,739 --> 01:08:58,739 It's nice and symmetric. 1295 01:08:58,739 --> 01:09:00,550 It's pretty clean, a lot of pointers, 1296 01:09:00,550 --> 01:09:04,359 hard to draw, but quite simple in the end. 1297 01:09:04,359 --> 01:09:05,350 There are two main ideas. 1298 01:09:05,350 --> 01:09:07,560 One is to promote recursively up, 1299 01:09:07,560 --> 01:09:11,705 just a constant fraction, so the lists don't get much bigger. 1300 01:09:11,705 --> 01:09:13,080 Because it's a constant fraction, 1301 01:09:13,080 --> 01:09:15,779 the gaps when you walk down are constant size.
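Putting the pieces together, here is a hedged Python sketch of the static structure and its O(k + log n) query. Assumptions: lists are 0-indexed; the items at positions 0, 2, 4, ... of L'_{i+1} are promoted; and instead of storing previous/next pointers inside each item, it keeps equivalent per-position arrays (`prev_prom`, `next_prom`, `native_before`), which is possible because the structure is static. The class and field names are hypothetical. For each list the query returns the rank of x (its `bisect_left` position), so the successor sits at that index and the predecessor just before it.

```python
from bisect import bisect_left
from typing import List


class FractionalCascading:
    """Static fractional cascading over k >= 1 sorted lists (fraction 1/2)."""

    def __init__(self, lists: List[List[int]]):
        k = len(lists)
        self.vals = [None] * k           # values of L'_i
        self.down = [None] * k           # promoted item -> position in L'_{i+1}
        self.native_before = [None] * k  # p -> number of L_i items in L'_i[:p]
        self.prev_prom = [None] * k      # p -> last promoted position < p, or -1
        self.next_prom = [None] * k      # p -> first promoted position >= p, or len
        for i in range(k - 1, -1, -1):
            # (value, tag, position below): tag 0 = native, 1 = promoted
            items = [(v, 0, -1) for v in lists[i]]
            if i + 1 < k:
                below = self.vals[i + 1]
                items += [(below[j], 1, j) for j in range(0, len(below), 2)]
            items.sort(key=lambda t: t[0])  # stable, keeps promoted order
            n_i = len(items)
            self.vals[i] = [t[0] for t in items]
            self.down[i] = [t[2] for t in items]
            nb, pp = [0] * (n_i + 1), [-1] * (n_i + 1)
            for p, (_, tag, _) in enumerate(items):
                nb[p + 1] = nb[p] + (tag == 0)
                pp[p + 1] = p if tag == 1 else pp[p]
            npn = [n_i] * (n_i + 1)
            for p in range(n_i - 1, -1, -1):
                npn[p] = p if items[p][1] == 1 else npn[p + 1]
            self.native_before[i], self.prev_prom[i], self.next_prom[i] = nb, pp, npn

    def query(self, x) -> List[int]:
        """Return bisect_left(L_i, x) for every list i."""
        ranks = []
        p = bisect_left(self.vals[0], x)  # the only O(log n) search
        for i in range(len(self.vals)):
            ranks.append(self.native_before[i][p])
            if i + 1 == len(self.vals):
                break
            # Bracket x in L'_{i+1} using the promoted neighbours of p.
            a, b = self.prev_prom[i][p], self.next_prom[i][p]
            lo = self.down[i][a] + 1 if a >= 0 else 0
            hi = self.down[i][b] if b < len(self.vals[i]) else len(self.vals[i + 1])
            p = lo
            while p < hi and self.vals[i + 1][p] < x:  # O(1) steps
                p += 1
        return ranks
```

Only the first level uses a binary search. At every later level the two promoted neighbours of x sit at most two positions apart in the list below, so relocalizing takes at most one comparison.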
1302 01:09:15,779 --> 01:09:17,819 And so you basically get free relocalization 1303 01:09:17,819 --> 01:09:20,370 within each list with the help of some pointers 1304 01:09:20,370 --> 01:09:26,220 to walk down and jump left and right between the two colors. 1305 01:09:26,220 --> 01:09:27,439 OK. 1306 01:09:27,439 --> 01:09:29,350 That's basic fractional cascading, 1307 01:09:29,350 --> 01:09:34,460 in that we solved this problem: searched within k lists 1308 01:09:34,460 --> 01:09:38,810 each of size n in k plus log n time, which is kind of amazing, 1309 01:09:38,810 --> 01:09:40,700 I think, pretty cool. 1310 01:09:44,510 --> 01:09:47,590 But there's a more general form of this 1311 01:09:47,590 --> 01:09:49,529 that uses the exact same ideas. 1312 01:09:49,529 --> 01:09:54,149 But I just want to tell you how they generalized it. 1313 01:09:58,084 --> 01:09:59,250 This is Chazelle and Guibas. 1314 01:10:07,958 --> 01:10:22,920 So in general, fractional cascading, 1315 01:10:22,920 --> 01:10:25,065 if you look at what cascading is happening here-- 1316 01:10:27,660 --> 01:10:31,800 here being here-- we essentially have a path of cascades. 1317 01:10:31,800 --> 01:10:34,780 We start at the bottom and push into the predecessor 1318 01:10:34,780 --> 01:10:35,280 in the path. 1319 01:10:35,280 --> 01:10:37,530 We push into the predecessor in the path. 1320 01:10:37,530 --> 01:10:39,960 In the general case, we do it on a graph instead 1321 01:10:39,960 --> 01:10:42,580 of a path, an arbitrary graph. 1322 01:10:42,580 --> 01:10:46,510 So we have a graph. 1323 01:10:46,510 --> 01:10:48,870 The input, in some sense, you can think 1324 01:10:48,870 --> 01:10:52,080 of this as a transformation. 1325 01:10:52,080 --> 01:10:54,510 But it's for a specific kind of data structure. 1326 01:10:54,510 --> 01:10:59,640 The data structure is represented by a graph.
1327 01:10:59,640 --> 01:11:11,370 And each vertex of the graph has a set of elements 1328 01:11:11,370 --> 01:11:12,540 or a list of elements. 1329 01:11:12,540 --> 01:11:13,680 That's what we had here. 1330 01:11:13,680 --> 01:11:17,415 We had a path here. 1331 01:11:17,415 --> 01:11:20,040 And each node in the path had a corresponding list of elements. 1332 01:11:20,040 --> 01:11:22,982 And we wanted to search among those lists. 1333 01:11:22,982 --> 01:11:24,690 Before I tell you exactly what search is, 1334 01:11:24,690 --> 01:11:26,790 let me tell you about the rest of the graph. 1335 01:11:34,500 --> 01:11:38,710 And this is, sorry, in an ordered universe. 1336 01:11:38,710 --> 01:11:40,167 So it's one-dimensional. 1337 01:11:43,750 --> 01:11:50,245 Each edge is labeled with a range from that ordered universe, [a, 1338 01:11:50,245 --> 01:11:50,940 b]. 1339 01:11:50,940 --> 01:11:52,440 Every edge has some range. 1340 01:11:55,454 --> 01:11:57,120 You can think of it as a directed graph. 1341 01:11:57,120 --> 01:11:58,080 It's probably cleaner. 1342 01:11:58,080 --> 01:12:02,580 So when I follow this edge, there's a range here. 1343 01:12:02,580 --> 01:12:05,130 And, basically, I'm only allowed to follow that edge 1344 01:12:05,130 --> 01:12:08,730 if the range contains the thing I'm searching for. 1345 01:12:08,730 --> 01:12:13,202 So here, I was searching for some item x in all the lists. 1346 01:12:13,202 --> 01:12:14,160 There are no ranges here. 1347 01:12:14,160 --> 01:12:16,674 But in general, you get to specify a range. 1348 01:12:16,674 --> 01:12:18,090 Why do we want to specify a range? 1349 01:12:18,090 --> 01:12:21,633 We need a sort of bounded-degree constraint. 1350 01:12:28,890 --> 01:12:33,348 We want to have bounded in-degree. 1351 01:12:33,348 --> 01:12:37,116 So here we had in-degree 1 for every node. 1352 01:12:37,116 --> 01:12:38,490 In general, we don't want to have 1353 01:12:38,490 --> 01:12:39,760 too many nodes pointing in.
1354 01:12:39,760 --> 01:12:42,492 Because we want to take half the items in here, 1355 01:12:42,492 --> 01:12:43,950 or a constant fraction of the items 1356 01:12:43,950 --> 01:12:46,872 here, and promote them into all the nodes that point to it, 1357 01:12:46,872 --> 01:12:48,330 so that when we follow the pointer, 1358 01:12:48,330 --> 01:12:50,400 we get to know where we belong here. 1359 01:12:50,400 --> 01:12:52,530 That's the general concept. 1360 01:12:52,530 --> 01:12:54,570 So, ideally, we have bounded in-degree. 1361 01:12:54,570 --> 01:12:56,130 If we do, we're done. 1362 01:12:56,130 --> 01:12:58,200 We can have a slightly weaker condition, 1363 01:12:58,200 --> 01:13:02,010 which is called locally bounded in-degree, where 1364 01:13:02,010 --> 01:13:12,750 we count the number of incoming edges for a node whose labels are ranges-- 1365 01:13:12,750 --> 01:13:15,960 the labels have to have a common intersection, x. 1366 01:13:15,960 --> 01:13:19,260 So we're searching for some item x. 1367 01:13:19,260 --> 01:13:22,230 And the number of possible ways we can enter this node given item 1368 01:13:22,230 --> 01:13:22,920 x-- 1369 01:13:22,920 --> 01:13:25,980 so this x has to fall in all those ranges-- 1370 01:13:25,980 --> 01:13:32,700 that should be bounded, so, at most, some constant c. 1371 01:13:32,700 --> 01:13:34,470 If it's always at most c for all nodes 1372 01:13:34,470 --> 01:13:38,040 and for all x's, then this is locally bounded in-degree. 1373 01:13:38,040 --> 01:13:42,720 And these range labels help you achieve this property. 1374 01:13:42,720 --> 01:13:44,332 If you can constrain that you're only 1375 01:13:44,332 --> 01:13:46,290 going to follow this edge in certain situations 1376 01:13:46,290 --> 01:13:47,706 and there aren't too many ways you 1377 01:13:47,706 --> 01:13:50,340 could have gotten to a node, then you have this property. 1378 01:13:50,340 --> 01:13:57,210 AUDIENCE: [INAUDIBLE] bound to x?
1379 01:13:57,210 --> 01:14:02,030 ERIK DEMAINE: Containing x is a backwards containment. 1380 01:14:02,030 --> 01:14:03,340 Let me put it this way. 1381 01:14:03,340 --> 01:14:04,620 You have a node. 1382 01:14:04,620 --> 01:14:07,740 You have all these edges coming into it. 1383 01:14:07,740 --> 01:14:11,190 I want x to be a valid choice for each of these edges. 1384 01:14:11,190 --> 01:14:16,980 Meaning, each of their ranges is some interval on the line. 1385 01:14:16,980 --> 01:14:18,900 All those intervals should contain x. 1386 01:14:22,740 --> 01:14:25,710 It's basically, if you laid out all the intervals incoming 1387 01:14:25,710 --> 01:14:28,290 into this node, what is the maximum depth 1388 01:14:28,290 --> 01:14:29,310 of those intervals? 1389 01:14:29,310 --> 01:14:32,040 What's the largest number of those intervals that share a common point? 1390 01:14:32,040 --> 01:14:34,200 That is your local in-degree. 1391 01:14:34,200 --> 01:14:36,720 And as long as that's a constant, we're happy. 1392 01:14:39,516 --> 01:14:42,320 All right. 1393 01:14:42,320 --> 01:14:44,590 So now, let me specify what a search means. 1394 01:14:47,120 --> 01:14:49,495 This is the problem that fractional cascading can solve. 1395 01:14:52,150 --> 01:15:05,310 The goal is to find x in some k vertex sets. 1396 01:15:05,310 --> 01:15:08,800 So k vertices, each of them has a set. 1397 01:15:08,800 --> 01:15:10,860 I want to find x in k of them. 1398 01:15:10,860 --> 01:15:12,670 Not all of them, k of them. 1399 01:15:12,670 --> 01:15:14,650 That's the general problem. 1400 01:15:14,650 --> 01:15:16,945 I have a constraint on how those sets are found. 1401 01:15:23,170 --> 01:15:28,775 They're found by navigating this graph starting from any vertex. 1402 01:15:35,010 --> 01:15:42,000 And we navigate by following edges whose labels contain x. 1403 01:15:50,520 --> 01:15:53,070 So we start at some vertex in the graph.
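The local in-degree described above is the maximum depth of the intervals labelling a node's incoming edges. A minimal sketch, assuming edges are given as hypothetical `(u, v, a, b)` tuples with closed ranges [a, b], computes it with a sweep over interval endpoints; the function names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Interval = Tuple[float, float]  # closed range [a, b] labelling an edge


def max_depth(intervals: List[Interval]) -> int:
    """Largest number of closed intervals that share one common point."""
    events = []
    for a, b in intervals:
        events.append((a, 0))  # start; 0 sorts before 1, so touching intervals overlap
        events.append((b, 1))  # end
    events.sort()
    depth = best = 0
    for _, kind in events:
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return best


def local_in_degree(edges: List[Tuple[Hashable, Hashable, float, float]]) -> int:
    """Max over nodes v and query values x of the number of edges into v whose range contains x."""
    incoming: Dict[Hashable, List[Interval]] = defaultdict(list)
    for _u, v, a, b in edges:
        incoming[v].append((a, b))
    return max((max_depth(iv) for iv in incoming.values()), default=0)
```

For instance, a node with incoming ranges [0, 5], [6, 9], and [4, 7] has ordinary in-degree 3 but local in-degree 2, since no single x lies in all three ranges.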
1404 01:15:53,070 --> 01:15:56,160 We can follow some edges that contain 1405 01:15:56,160 --> 01:15:59,690 x. x is a valid choice here that's inside the interval. 1406 01:15:59,690 --> 01:16:02,730 Then from here, maybe we follow some more where 1407 01:16:02,730 --> 01:16:06,990 x is a valid choice, and so on. 1408 01:16:06,990 --> 01:16:09,485 It could look like anything. 1409 01:16:09,485 --> 01:16:11,610 It doesn't have to be depth-first or breadth-first. 1410 01:16:11,610 --> 01:16:15,030 You just follow some tree from some node 1411 01:16:15,030 --> 01:16:19,530 where all of the edges are valid for x. 1412 01:16:19,530 --> 01:16:22,140 At some point, you decide that you've seen enough. 1413 01:16:22,140 --> 01:16:27,750 And now, the goal is to find in this set, where is x? 1414 01:16:27,750 --> 01:16:28,910 In this set, where is x? 1415 01:16:28,910 --> 01:16:31,490 In this set, where is x? 1416 01:16:31,490 --> 01:16:34,090 In each of these lists, what are the predecessor and successor 1417 01:16:34,090 --> 01:16:34,900 of x? 1418 01:16:34,900 --> 01:16:35,400 Question. 1419 01:16:35,400 --> 01:16:37,274 AUDIENCE: So there's generally some root node 1420 01:16:37,274 --> 01:16:39,030 from which all queries start? 1421 01:16:39,030 --> 01:16:42,420 ERIK DEMAINE: I believe you do not need a single root node. 1422 01:16:42,420 --> 01:16:44,924 Each search could start from a different point. 1423 01:16:44,924 --> 01:16:45,465 AUDIENCE: OK. 1424 01:16:45,465 --> 01:16:46,298 So it's [INAUDIBLE]. 1425 01:16:46,298 --> 01:16:48,090 ERIK DEMAINE: But you're told where. 1426 01:16:48,090 --> 01:16:50,580 So imagine this is like an interaction between two 1427 01:16:50,580 --> 01:16:51,100 parties. 1428 01:16:51,100 --> 01:16:55,350 So the input basically says, look, I'm searching for x. 1429 01:16:55,350 --> 01:16:57,051 And I'm going to start at this node.
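To pin down what a query asks for, here is a naive baseline, which is not fractional cascading: it starts at a given vertex, follows only edges whose range contains x, and does one binary search per visited vertex, so it pays O(log n) per vertex instead of the constant time per step described in the lecture. For concreteness it visits every reachable vertex in breadth-first order, which is just one possible navigation tree; the function name `naive_search` and the adjacency encoding `u -> [(v, a, b)]` are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import deque
from collections.abc import Hashable
from typing import Dict, List, Optional, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float, float]]]  # u -> [(v, a, b)]


def naive_search(sets: Dict[Hashable, List[float]], graph: Graph,
                 start: Hashable, x: float) -> Dict[Hashable, Tuple[Optional[float], Optional[float]]]:
    """Predecessor (largest <= x) and successor (smallest >= x) of x in every
    vertex set reachable from `start` along edges whose range [a, b] contains x.
    One binary search per visited vertex: O(k log n) for k visited vertices."""
    answers = {}
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        s = sets[u]  # assumed already sorted
        i = bisect_right(s, x)
        pred = s[i - 1] if i > 0 else None
        j = bisect_left(s, x)
        succ = s[j] if j < len(s) else None
        answers[u] = (pred, succ)
        for v, a, b in graph.get(u, []):
            if a <= x <= b and v not in seen:  # only edges valid for x
                seen.add(v)
                queue.append(v)
    return answers
```

Fractional cascading answers the same per-vertex questions but replaces each per-vertex binary search after the first with constant work along the followed edge.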
1430 01:16:57,051 --> 01:16:59,050 And then the fractional cascading data structure 1431 01:16:59,050 --> 01:17:01,560 says, OK, here's where x is in that node. 1432 01:17:01,560 --> 01:17:03,057 It tells you immediately. 1433 01:17:03,057 --> 01:17:04,220 Why not? 1434 01:17:04,220 --> 01:17:07,050 Then it says, OK, I'd like to follow this edge 1435 01:17:07,050 --> 01:17:08,454 and go to this node. 1436 01:17:08,454 --> 01:17:09,870 And fractional cascading says, OK, 1437 01:17:09,870 --> 01:17:12,670 here's where x is in this node in constant time. 1438 01:17:12,670 --> 01:17:13,170 OK. 1439 01:17:13,170 --> 01:17:14,720 Now these two guys are active. 1440 01:17:14,720 --> 01:17:19,030 And now, the adversary, the input, whatever, can decide, 1441 01:17:19,030 --> 01:17:22,410 OK, I'm going to follow this edge, or this edge, any order. 1442 01:17:22,410 --> 01:17:24,120 It can build this tree in any order. 1443 01:17:24,120 --> 01:17:27,510 And every time it says here's the edge I want to follow, 1444 01:17:27,510 --> 01:17:30,240 the fractional cascading data structure in constant time 1445 01:17:30,240 --> 01:17:32,190 tells you here's where x is among all 1446 01:17:32,190 --> 01:17:34,130 the items in that node. 1447 01:17:34,130 --> 01:17:35,730 How does it do that? 1448 01:17:35,730 --> 01:17:37,200 With fractional cascading. 1449 01:17:37,200 --> 01:17:39,450 You just take half the items. 1450 01:17:39,450 --> 01:17:41,040 Half doesn't work anymore. 1451 01:17:41,040 --> 01:17:44,130 Now, it depends on that bounded in-degree. 1452 01:17:44,130 --> 01:17:46,530 But you take some function of that degree 1453 01:17:46,530 --> 01:17:50,790 c, take some constant fraction of the items, 1454 01:17:50,790 --> 01:17:54,030 promote them to all the things, keep going. 1455 01:17:54,030 --> 01:17:57,269 It's a little trickier, because now you have cycles.
1456 01:17:57,269 --> 01:17:59,310 So you could actually promote back into yourself, 1457 01:17:59,310 --> 01:18:01,350 eventually, by chain reactions. 1458 01:18:01,350 --> 01:18:03,040 But if you set the constant low enough, 1459 01:18:03,040 --> 01:18:05,700 it's like radioactive decay. 1460 01:18:05,700 --> 01:18:09,281 Eventually, it all goes away, right? 1461 01:18:09,281 --> 01:18:09,780 I wish. 1462 01:18:12,390 --> 01:18:14,610 So it's much better than radioactive decay. 1463 01:18:14,610 --> 01:18:15,750 Radioactive is logarithmic. 1464 01:18:15,750 --> 01:18:17,110 This is exponential. 1465 01:18:17,110 --> 01:18:18,510 So it's decreasing very quickly. 1466 01:18:18,510 --> 01:18:21,030 After log n steps, all your items are gone. 1467 01:18:21,030 --> 01:18:23,620 So, yeah, maybe you go in a short loop for a while. 1468 01:18:23,620 --> 01:18:26,732 But after log n steps, it's all gone. 1469 01:18:26,732 --> 01:18:28,690 So you're, at most, increasing by a log factor. 1470 01:18:28,690 --> 01:18:30,450 In fact, you just increase by a constant factor, 1471 01:18:30,450 --> 01:18:32,075 because the number of items that remain 1472 01:18:32,075 --> 01:18:35,144 gets so tiny very quickly. 1473 01:18:35,144 --> 01:18:36,810 So I'm not going to go into the details, 1474 01:18:36,810 --> 01:18:40,320 but you just take this list idea and apply it to your graph. 1475 01:18:40,320 --> 01:18:41,700 It works. 1476 01:18:41,700 --> 01:18:43,230 It gets messier. 1477 01:18:43,230 --> 01:18:45,240 But in this very general scenario, 1478 01:18:45,240 --> 01:18:48,870 you can support these searches in k 1479 01:18:48,870 --> 01:18:53,310 plus log n time, where n, let's say, is the maximum size 1480 01:18:53,310 --> 01:18:55,650 of any vertex set. 1481 01:19:01,200 --> 01:19:03,780 So it just directly generalizes.
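A back-of-the-envelope version of the decay argument above, under the simplifying assumption of globally bounded in-degree c (the locally bounded case handled by Chazelle and Guibas needs more care) and ignoring rounding: let L(u) be the original list at vertex u and A(u) its augmented list, and suppose every vertex w promotes an α fraction of A(w) into each vertex u with an edge u → w. Since each w is promoted into at most deg⁻(w) ≤ c vertices, summing over all vertices gives a constant-factor space bound whenever αc < 1, even with cycles.

```latex
|A(u)| = |L(u)| + \alpha \sum_{w \,:\, u \to w} |A(w)|

\sum_{u} |A(u)| = \sum_{u} |L(u)| + \alpha \sum_{w} \deg^{-}(w)\,|A(w)|
  \le \sum_{u} |L(u)| + \alpha c \sum_{w} |A(w)|

\Longrightarrow \quad \sum_{u} |A(u)| \le \frac{1}{1 - \alpha c} \sum_{u} |L(u)| \qquad (\alpha c < 1)
```

For example, choosing α = 1/(2c) keeps the total size of all augmented lists at most twice the input size, which is the "constant factor" blow-up mentioned in the lecture.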
1482 01:19:03,780 --> 01:19:06,420 And this is the thing that you can 1483 01:19:06,420 --> 01:19:11,520 use to get this log factor improvement 1484 01:19:11,520 --> 01:19:12,630 and many other things. 1485 01:19:12,630 --> 01:19:15,642 Actually, this was such a big thing at the time. 1486 01:19:15,642 --> 01:19:17,850 There were two papers on fractional cascading, part 1 1487 01:19:17,850 --> 01:19:18,870 and part 2. 1488 01:19:18,870 --> 01:19:21,690 Part 1 is what is solving this. 1489 01:19:21,690 --> 01:19:23,312 And part 2 is applications. 1490 01:19:23,312 --> 01:19:25,020 They solved a ton of problems that no one 1491 01:19:25,020 --> 01:19:28,380 knew how to solve using this general fractional cascading 1492 01:19:28,380 --> 01:19:29,280 technique. 1493 01:19:29,280 --> 01:19:31,430 That's it for today.