1
00:00:00,090 --> 00:00:02,490
The following content is
provided under a Creative

2
00:00:02,490 --> 00:00:04,030
Commons license.

3
00:00:04,030 --> 00:00:06,360
Your support will help
MIT OpenCourseWare

4
00:00:06,360 --> 00:00:10,720
continue to offer high quality
educational resources for free.

5
00:00:10,720 --> 00:00:13,320
To make a donation or
view additional materials

6
00:00:13,320 --> 00:00:17,280
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,280 --> 00:00:18,450
at ocw.mit.edu.

8
00:00:20,140 --> 00:00:21,640
ERIK DEMAINE: All
right, today we're

9
00:00:21,640 --> 00:00:25,420
going to do some crossover
between two kinds of data

10
00:00:25,420 --> 00:00:27,610
structures, memory
hierarchy data structures

11
00:00:27,610 --> 00:00:29,380
and geometric data structures.

12
00:00:29,380 --> 00:00:31,420
And this will be
the final lecture

13
00:00:31,420 --> 00:00:35,890
in the memory hierarchy series,
so the end of cache oblivious.

14
00:00:35,890 --> 00:00:38,860
So we're going to look at
two-dimensional geometric data

15
00:00:38,860 --> 00:00:43,120
structure problems,
both offline and online.

16
00:00:43,120 --> 00:00:45,800
So our good friend,
orthogonal 2D range searching,

17
00:00:45,800 --> 00:00:49,690
which we spent a lot of
time on a few lectures ago,

18
00:00:49,690 --> 00:00:52,930
we will come back to, and try
to get our bounds good, even

19
00:00:52,930 --> 00:00:56,790
cache obliviously.

20
00:00:56,790 --> 00:00:58,900
So instead of log N,
we want log base B of N

21
00:00:58,900 --> 00:01:00,269
to make things interesting.

22
00:01:00,269 --> 00:01:01,810
And the batch version
is where you're

23
00:01:01,810 --> 00:01:03,250
given a whole bunch
of rectangles,

24
00:01:03,250 --> 00:01:05,410
and a whole bunch
of points up front,

25
00:01:05,410 --> 00:01:08,020
and you want to find
all the points that

26
00:01:08,020 --> 00:01:09,890
live in all the rectangles.

27
00:01:09,890 --> 00:01:12,130
So that's an easier
version of the problem.

28
00:01:12,130 --> 00:01:13,630
We'll start with
that and then we'll

29
00:01:13,630 --> 00:01:16,690
go to the usual
online version, where

30
00:01:16,690 --> 00:01:20,410
you have queries coming one at
a time, rectangles coming one

31
00:01:20,410 --> 00:01:20,910
at a time.

32
00:01:20,910 --> 00:01:25,430
The points are pre-processed,
it will be static.

33
00:01:25,430 --> 00:01:26,920
And to do the
batched, we're going

34
00:01:26,920 --> 00:01:30,717
to introduce a new technique
called distribution sweep,

35
00:01:30,717 --> 00:01:32,800
which is a combination of
the sweep line technique

36
00:01:32,800 --> 00:01:35,440
we saw back when we
used persistence

37
00:01:35,440 --> 00:01:38,920
to make a sweep line thing
into a data structure thing.

38
00:01:38,920 --> 00:01:41,440
But we're just going to use
the algorithmic version of that

39
00:01:41,440 --> 00:01:44,479
plus a cache oblivious
sorting algorithm.

40
00:01:44,479 --> 00:01:46,270
So we'll finally do
cache oblivious sorting

41
00:01:46,270 --> 00:01:50,260
in optimal N/B log
base M/B of N/B

42
00:01:50,260 --> 00:01:54,719
using a particular algorithm
called lazy funnel sort, which

43
00:01:54,719 --> 00:01:57,010
you can actually also use to
make another kind of cache

44
00:01:57,010 --> 00:02:00,040
oblivious priority queue,
but we won't get into that.

45
00:02:00,040 --> 00:02:02,680
And so by combining those two
things, we'll get a divide

46
00:02:02,680 --> 00:02:05,440
and conquer technique for
geometric problems that

47
00:02:05,440 --> 00:02:07,702
lets us solve the batched
thing, and then we'll

48
00:02:07,702 --> 00:02:09,160
use completely
different techniques

49
00:02:09,160 --> 00:02:10,960
for the online thing.

50
00:02:10,960 --> 00:02:15,310
So for starters, let's finally
do cache oblivious optimal

51
00:02:15,310 --> 00:02:17,020
sorting.

52
00:02:17,020 --> 00:02:22,000
I'm not going to analyze this
algorithm because it's just

53
00:02:22,000 --> 00:02:24,730
an algorithm, not
a data structure,

54
00:02:24,730 --> 00:02:26,710
and also because the
analysis is pretty

55
00:02:26,710 --> 00:02:33,340
close to the analysis
for priority queues

56
00:02:33,340 --> 00:02:34,495
we did last class.

57
00:02:38,410 --> 00:02:43,810
So funnel sort is
basically a merge sort.

58
00:02:43,810 --> 00:02:46,720
I mentioned last time
that in external memory,

59
00:02:46,720 --> 00:02:50,890
the right way to do,
or a right way to do

60
00:02:50,890 --> 00:02:55,462
optimal external memory sorting
is an M over B-way merge sort.

61
00:02:55,462 --> 00:02:58,270
Cache obliviously, you
don't know what M and B are,

62
00:02:58,270 --> 00:03:00,460
so it's hard to do
an M over B-way merge.

63
00:03:00,460 --> 00:03:03,820
So instead, you basically
do an N-way merge.

64
00:03:03,820 --> 00:03:05,530
Not quite N-way, I
can't afford that,

65
00:03:05,530 --> 00:03:10,090
but it's going to be N to
the 1/3 way merge sort.

66
00:03:10,090 --> 00:03:13,280
And the big question then
becomes, how do you do a merge?

67
00:03:13,280 --> 00:03:15,104
And the answer is with a funnel.

68
00:03:15,104 --> 00:03:17,020
And so the heart of the
algorithm is a funnel.

69
00:03:20,990 --> 00:03:25,270
So if you have K sorted
lists that are big,

70
00:03:25,270 --> 00:03:34,150
total size K cubed, then you can
merge them in, basically,

71
00:03:34,150 --> 00:03:35,400
the optimal bound.

72
00:03:46,650 --> 00:03:53,490
So K-funnel: K sorted
lists, total size K cubed.

73
00:03:53,490 --> 00:03:54,960
Number of memory
transfers to merge

74
00:03:54,960 --> 00:03:58,670
them is K cubed over B
times log base M/B of K

75
00:03:58,670 --> 00:04:02,680
cubed over B.
There's a plus K term

76
00:04:02,680 --> 00:04:05,880
and when you plug this into
an actual sorting algorithm,

77
00:04:05,880 --> 00:04:09,900
you need to think about that,
but that's not a big deal.

78
00:04:09,900 --> 00:04:11,502
Usually this term will dominate.
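
Written out, the merging bound just stated for a K-funnel looks like this. The big-O is implied rather than spoken, so treat the exact form as a transcription of the board and not as a derivation.

```latex
\[
  O\!\left( \frac{K^{3}}{B} \, \log_{M/B} \frac{K^{3}}{B} \;+\; K \right)
  \quad \text{memory transfers to merge } K \text{ sorted lists of total size } \Theta(K^{3}).
\]
```

The first term is the one that usually dominates; the additive K is the term to keep an eye on when the funnel is plugged into a sorting algorithm.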

79
00:04:14,100 --> 00:04:17,894
OK, so let me show
you how a funnel works.

80
00:04:17,894 --> 00:04:20,019
We're just going to go
through the algorithmic part

81
00:04:20,019 --> 00:04:22,650
and I won't analyze the
number of memory transfers.

82
00:04:32,330 --> 00:04:34,170
Maybe I'll draw this here.

83
00:04:39,397 --> 00:04:40,980
So we're going to
have the inputs down

84
00:04:40,980 --> 00:04:45,329
at the bottom of this funnel.

85
00:04:45,329 --> 00:04:46,870
They're going to have
some data in them.

86
00:04:52,580 --> 00:04:57,710
Those k inputs down
here, total size,

87
00:04:57,710 --> 00:05:00,385
all these is theta K cubed.

88
00:05:05,970 --> 00:05:13,320
And then at the top here,
we have our output buffer.

89
00:05:13,320 --> 00:05:17,340
This is where we're
going to put the results

90
00:05:17,340 --> 00:05:19,940
and this will have size K cubed.

91
00:05:22,990 --> 00:05:25,140
Maybe we've already
done some work

92
00:05:25,140 --> 00:05:27,470
and we've filled some of it.

93
00:05:27,470 --> 00:05:30,300
OK, the question is what
do you put in this triangle

94
00:05:30,300 --> 00:05:32,010
to do the merge?

95
00:05:32,010 --> 00:05:35,940
And the obvious thing
is recursive triangles.

96
00:05:35,940 --> 00:05:37,770
Recursion is like
the one technique

97
00:05:37,770 --> 00:05:41,370
we know in cache
oblivious data structures.

98
00:05:41,370 --> 00:05:48,660
So we're going to take
square root of K-funnels

99
00:05:48,660 --> 00:05:51,780
and just join them together
in the obvious way.

100
00:05:51,780 --> 00:06:11,370
So just like [INAUDIBLE]
layout, except--

101
00:06:11,370 --> 00:06:14,740
I didn't quite leave
enough room here--

102
00:06:14,740 --> 00:06:19,780
in between the
levels are buffers.

103
00:06:19,780 --> 00:06:25,530
There's a buffer here
too, between the nodes

104
00:06:25,530 --> 00:06:27,910
of this funnel and the
nodes of this funnel.

105
00:06:33,130 --> 00:06:39,862
OK, these buffers may have some
stuff in them at any moment.

106
00:06:39,862 --> 00:06:42,320
OK, and the big question is
how do you set the buffer size?

107
00:06:42,320 --> 00:06:44,270
This is the key step.

108
00:06:44,270 --> 00:06:50,450
And the claim is each buffer,
we set to a size of K to the 3/2

109
00:06:50,450 --> 00:06:54,230
and the number of buffers
is about square root of K

110
00:06:54,230 --> 00:06:58,130
because there's one per
leaf of this funnel.

111
00:06:58,130 --> 00:07:00,760
And a K-funnel has K inputs,
so a root K funnel is going

112
00:07:00,760 --> 00:07:03,380
to have root K inputs here.

113
00:07:03,380 --> 00:07:07,640
And so the total size
of all the buffers

114
00:07:07,640 --> 00:07:15,199
is K squared, which
is not too big.
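
The arithmetic behind that total, with one buffer per input of the top root-K funnel, can be written as:

```latex
\[
  \underbrace{\sqrt{K}}_{\text{number of buffers}}
  \cdot
  \underbrace{K^{3/2}}_{\text{size of each buffer}}
  \;=\; K^{2}
\]
```

So the middle buffers together take K squared space, which is small next to the K cubed output buffer.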

115
00:07:15,199 --> 00:07:16,990
I'm not going to go
through the recurrence,

116
00:07:16,990 --> 00:07:19,600
but if you add up the
total size of this thing,

117
00:07:19,600 --> 00:07:25,210
it is linear size in
the output, K cubed.

118
00:07:25,210 --> 00:07:28,000
I think also if you don't count
the output buffer, it's linear

119
00:07:28,000 --> 00:07:29,740
in K squared.

120
00:07:29,740 --> 00:07:32,860
If I recall correctly.

121
00:07:32,860 --> 00:07:35,725
We're not too concerned with
that here, just overall.

122
00:07:38,410 --> 00:07:44,920
Once we have
K-funnels, funnel sort

123
00:07:44,920 --> 00:08:00,170
is just going to be N to the
1/3 way merge sort with an N

124
00:08:00,170 --> 00:08:07,180
to the 1/3 funnel as the merger.

125
00:08:12,980 --> 00:08:16,790
We can only go up to N to the 1/3
because of this cubic thing.

126
00:08:16,790 --> 00:08:19,250
We can only merge--

127
00:08:19,250 --> 00:08:23,870
if we want the sorting bound
N/B log base M/B of N/B

128
00:08:23,870 --> 00:08:26,690
we can only afford K
being up to N to the 1/3.

129
00:08:26,690 --> 00:08:29,480
So that's the biggest we can do.
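
Here is a minimal sketch of that outer recursion, under loud assumptions: `heapq.merge` from the Python standard library stands in for the N to the 1/3 funnel, so this shows only the shape of the N to the 1/3-way merge sort and none of the cache-oblivious behavior. The function name `funnel_sort_sketch` and the base-case cutoff of 8 are made up for illustration.

```python
import heapq


def funnel_sort_sketch(a):
    """N^(1/3)-way merge sort; heapq.merge is a stand-in for the funnel."""
    n = len(a)
    if n <= 8:                          # arbitrary small base case (assumption)
        return sorted(a)
    k = max(2, round(n ** (1 / 3)))     # about N^(1/3) segments
    size = -(-n // k)                   # ceil(n / k): about N^(2/3) items each
    # Recursively sort each segment, then merge all of them in one pass.
    runs = [funnel_sort_sketch(a[i:i + size]) for i in range(0, n, size)]
    return list(heapq.merge(*runs))     # a real funnel sort uses a K-funnel here


print(funnel_sort_sketch([9, 2, 7, 4, 1, 8, 3, 6, 5, 0, 11, 10]))
```

Each recursive call works on about N to the 2/3 elements, so about N to the 1/3 sorted runs come back to be merged, which is exactly the number of inputs the funnel can afford.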

130
00:08:29,480 --> 00:08:33,620
So it's a recursive algorithm
where each of the merging steps

131
00:08:33,620 --> 00:08:36,470
is this recursive
data structure.

132
00:08:36,470 --> 00:08:38,539
Now, this is really
just about layout.

133
00:08:38,539 --> 00:08:42,210
I haven't told you what the
actual algorithm is yet,

134
00:08:42,210 --> 00:08:43,419
but it's a recursive layout.

135
00:08:43,419 --> 00:08:46,130
You store the entire
upper triangle,

136
00:08:46,130 --> 00:08:48,755
then each of the triangles,
somewhere you put the buffers.

137
00:08:48,755 --> 00:08:50,150
It doesn't really
matter where the buffers

138
00:08:50,150 --> 00:08:51,860
are. As long as each
triangle is stored

139
00:08:51,860 --> 00:08:57,230
as a consecutive array
of memory, we'll be OK.

140
00:08:57,230 --> 00:09:01,430
And now let me tell you
about the actual algorithm

141
00:09:01,430 --> 00:09:02,430
to do this.

142
00:09:02,430 --> 00:09:06,107
It's a very simple
lazy algorithm.

143
00:09:10,671 --> 00:09:12,170
So there's a whole
bunch of buffers.

144
00:09:12,170 --> 00:09:15,920
If you want to do this merge,
really what you'd like to do

145
00:09:15,920 --> 00:09:18,680
is fill this output buffer.

146
00:09:18,680 --> 00:09:22,310
So you call this subroutine
called fill on the output

147
00:09:22,310 --> 00:09:26,120
buffer and say, I would like
to fill this entire buffer

148
00:09:26,120 --> 00:09:27,940
with elements.

149
00:09:27,940 --> 00:09:30,890
Precondition, if you're going to
do a fill, right now the buffer

150
00:09:30,890 --> 00:09:33,800
is empty, and then at
the end of the fill

151
00:09:33,800 --> 00:09:37,080
you'd like this to
be completely full.

152
00:09:37,080 --> 00:09:38,640
And how do you do it?

153
00:09:38,640 --> 00:09:40,430
Well, if you look
at any buffer--

154
00:09:43,340 --> 00:09:46,010
partially filled, whatever--
and you look right below it,

155
00:09:46,010 --> 00:09:49,040
there's a node in this tree.

156
00:09:49,040 --> 00:09:50,570
You recurse all the way down.

157
00:09:50,570 --> 00:09:55,370
In the end, this is just a
binary tree with buffers in it.

158
00:09:55,370 --> 00:09:58,130
So it's going to be there's a
buffer, then there's a node,

159
00:09:58,130 --> 00:10:00,932
then there's two children,
each of which is a buffer,

160
00:10:00,932 --> 00:10:02,390
and then there's
a node below that.

161
00:10:06,920 --> 00:10:09,890
OK, so how do I fill this thing?

162
00:10:09,890 --> 00:10:15,090
I just read the first
item, the beginning,

163
00:10:15,090 --> 00:10:17,810
the smallest item for each
of these, compare them.

164
00:10:17,810 --> 00:10:20,600
Whichever is smaller,
I stick it here.

165
00:10:20,600 --> 00:10:23,690
It's just a regular binary
merge which is kind of cool.

166
00:10:23,690 --> 00:10:24,982
You've got two arrays.

167
00:10:24,982 --> 00:10:25,940
You want to merge them.

168
00:10:25,940 --> 00:10:29,214
Stick the results here.

169
00:10:29,214 --> 00:10:30,255
So that's how we do fill.

170
00:10:38,250 --> 00:10:55,580
Binary merge of the two children
buffers until we're full.

171
00:10:55,580 --> 00:10:57,295
But there's one thing
that can happen,

172
00:10:57,295 --> 00:10:59,420
which is that one of the
child buffers might empty.

173
00:11:08,320 --> 00:11:10,390
What do we do then?

174
00:11:10,390 --> 00:11:11,320
Recursively fill it.

175
00:11:20,050 --> 00:11:21,400
That's the algorithm.

176
00:11:21,400 --> 00:11:22,540
Very simple.

177
00:11:22,540 --> 00:11:24,940
The obvious lazy thing to do.

178
00:11:24,940 --> 00:11:26,050
Do a binary merge.

179
00:11:26,050 --> 00:11:29,410
This is going to be nice
because it's like two scans,

180
00:11:29,410 --> 00:11:33,550
until one of these guys empties,
and then you pause this merge,

181
00:11:33,550 --> 00:11:36,700
and then say OK, I'm going to
fill this entire buffer, which

182
00:11:36,700 --> 00:11:41,260
will recursively do stuff
until it's completely full

183
00:11:41,260 --> 00:11:44,980
or I run out of input elements,
whichever comes first,

184
00:11:44,980 --> 00:11:47,390
and then resume this merge.
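
The lazy fill just described can be sketched as follows. This is an illustration under stated assumptions, not the layout from the lecture: the names `Node`, `fill`, `build`, and `funnel_merge` are hypothetical, every internal buffer gets the same small capacity (a real funnel sizes them recursively, K to the 3/2 at the middle level), and Python deques say nothing about how the structure sits in memory.

```python
from collections import deque


class Node:
    """A merger node together with the buffer sitting directly above it."""

    def __init__(self, capacity, left=None, right=None, items=None):
        self.capacity = capacity
        self.left, self.right = left, right
        self.buf = deque(items or [])     # a leaf starts out holding a sorted input list
        self.exhausted = left is None     # a leaf can never be refilled


def fill(node):
    """Binary-merge the two child buffers into node.buf until it is full
    or the subtree below has run out of input elements."""
    if node.left is None:
        return
    while len(node.buf) < node.capacity:
        # Lazy step: a child buffer is refilled only once it is found empty.
        for child in (node.left, node.right):
            if not child.buf and not child.exhausted:
                fill(child)
        l, r = node.left.buf, node.right.buf
        if not l and not r:
            node.exhausted = True         # nothing left anywhere below this node
            return
        if not r or (l and l[0] <= r[0]):
            node.buf.append(l.popleft())
        else:
            node.buf.append(r.popleft())


def build(lists, capacity):
    """Balanced binary tree of mergers over the sorted input lists."""
    if len(lists) == 1:
        return Node(len(lists[0]), items=lists[0])
    mid = len(lists) // 2
    return Node(capacity, build(lists[:mid], capacity), build(lists[mid:], capacity))


def funnel_merge(lists, capacity=4):
    if not lists:
        return []
    root = build(lists, capacity)
    root.capacity = sum(len(x) for x in lists)   # output buffer holds everything
    fill(root)
    return list(root.buf)


print(funnel_merge([[1, 4, 7], [2, 5, 8], [0, 3, 6, 9], [10]]))
```

The merge at a node pauses whenever a child buffer runs dry, the child is filled completely (or until its inputs run out), and then the merge resumes, which is the whole algorithm.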

185
00:11:47,390 --> 00:11:47,890
Question?

186
00:11:47,890 --> 00:11:50,680
AUDIENCE: Aren't there more
than two child buffers?

187
00:11:50,680 --> 00:11:54,970
ERIK DEMAINE: Should only
be two children buffers.

188
00:11:54,970 --> 00:11:58,540
The question is, are
there more than two?

189
00:11:58,540 --> 00:12:03,220
This recursion of the root
k and root k child triangles

190
00:12:03,220 --> 00:12:05,510
of size root k is
exactly the recursion

191
00:12:05,510 --> 00:12:06,730
we did on a binary tree.

192
00:12:06,730 --> 00:12:09,006
I didn't say, but underlying
this is a binary tree.

193
00:12:09,006 --> 00:12:11,380
The only difference between
this and a [INAUDIBLE] layout

194
00:12:11,380 --> 00:12:13,990
is we're adding these buffers.

195
00:12:13,990 --> 00:12:15,936
I intended to draw
this as binary.

196
00:12:15,936 --> 00:12:18,310
It's a little hard to tell
because I didn't draw the base

197
00:12:18,310 --> 00:12:22,240
case, but it is indeed a
binary tree in the end.

198
00:12:24,980 --> 00:12:27,647
OK, other questions?

199
00:12:27,647 --> 00:12:29,230
So that's the algorithm
and as I said,

200
00:12:29,230 --> 00:12:33,370
I'm not going to analyze it, but
it's the same kind of analysis.

201
00:12:33,370 --> 00:12:36,310
You look at the threshold
where things fit in cache

202
00:12:36,310 --> 00:12:40,105
or don't and argue accordingly.

203
00:12:43,080 --> 00:12:45,880
It's pretty hand-wavy.

204
00:12:45,880 --> 00:12:47,800
What I want to get
to is how we use

205
00:12:47,800 --> 00:12:50,950
this to solve more interesting
problems than sorting.

206
00:12:50,950 --> 00:12:53,870
Sorting is a little bit boring.

207
00:12:53,870 --> 00:12:58,369
So let's go to batched
orthogonal range searching.

208
00:13:20,890 --> 00:13:25,120
And in general, this technique
called distribution sweep.

209
00:13:29,235 --> 00:13:34,570
The idea with distribution
sweep is that not only can we

210
00:13:34,570 --> 00:13:38,130
use this cool funnel
sort algorithm to sort,

211
00:13:38,130 --> 00:13:40,030
but we can think of
it as doing a divide

212
00:13:40,030 --> 00:13:42,550
and conquer on the key value.

213
00:14:13,010 --> 00:14:14,900
And in this case, we
have two coordinates.

214
00:14:14,900 --> 00:14:16,608
We're going to use
the divide and conquer

215
00:14:16,608 --> 00:14:19,360
on one of the coordinates.

216
00:14:19,360 --> 00:14:23,700
And where we have
some flexibility

217
00:14:23,700 --> 00:14:25,730
is in this binary merge step.

218
00:14:25,730 --> 00:14:27,730
We're doing this binary
merge, and normally it's

219
00:14:27,730 --> 00:14:30,313
just you take the min, you spit
it out here, you take the min,

220
00:14:30,313 --> 00:14:33,100
you spit it out here.

221
00:14:33,100 --> 00:14:35,231
That's the min of one
particular coordinate.

222
00:14:35,231 --> 00:14:37,480
Now you've got to deal with
some auxiliary information

223
00:14:37,480 --> 00:14:38,646
about the other coordinates.

224
00:14:38,646 --> 00:14:42,200
So in general, you're
merging two sorted things.

225
00:14:42,200 --> 00:14:43,990
If there's other
geometric information,

226
00:14:43,990 --> 00:14:46,060
you can try to preserve
it during the merge.

227
00:14:46,060 --> 00:14:49,630
As long as you can do that, this
is the conquer part, or the

228
00:14:49,630 --> 00:14:51,940
combine step of
divide and conquer.

229
00:14:51,940 --> 00:14:53,490
You can do a lot.

230
00:14:53,490 --> 00:14:57,610
It's a powerful
technique, it turns out.

231
00:14:57,610 --> 00:15:01,740
It's by Brodal and Fagerberg.

232
00:15:01,740 --> 00:15:04,750
It's in their early
days of cache oblivious.

233
00:15:04,750 --> 00:15:06,190
It was the first
geometric paper.

234
00:15:09,430 --> 00:15:22,630
Fine, so replace or say
augment the binary merge, which

235
00:15:22,630 --> 00:15:25,120
is, in the end, the only
part of the algorithm

236
00:15:25,120 --> 00:15:27,230
other than the recursion.

237
00:15:27,230 --> 00:15:34,600
So it's the only thing
you need to do to maintain

238
00:15:34,600 --> 00:15:36,970
auxiliary information.

239
00:15:36,970 --> 00:15:40,325
That's the generic idea
of distribution sweep.

240
00:15:40,325 --> 00:15:41,950
And distribution
sweep has been applied

241
00:15:41,950 --> 00:15:43,620
to solve lots of
different problems.

242
00:15:43,620 --> 00:15:47,620
Batched orthogonal range
queries is one of them.

243
00:15:47,620 --> 00:15:50,200
Generally, you've got a
bunch of orthogonal segments,

244
00:15:50,200 --> 00:15:53,330
rectangles, points, and you want
to compute how they intersect.

245
00:15:53,330 --> 00:15:57,160
Those sorts of problems
can be solved here.

246
00:15:57,160 --> 00:15:59,320
Also weird things like I
give you a bunch of points

247
00:15:59,320 --> 00:16:01,028
and I want to know
for every point what's

248
00:16:01,028 --> 00:16:03,340
its nearest neighbor.

249
00:16:03,340 --> 00:16:05,920
In the Euclidean sense,
that can be solved.

250
00:16:05,920 --> 00:16:08,020
But I like orthogonal
range searching

251
00:16:08,020 --> 00:16:10,360
because it's the closest to
our data structure problem

252
00:16:10,360 --> 00:16:12,850
and that's a problem we've seen.

253
00:16:12,850 --> 00:16:16,120
So the actual batched
orthogonal range searching

254
00:16:16,120 --> 00:16:27,470
is you're given N points,
and N rectangles,

255
00:16:27,470 --> 00:16:32,025
and you want to know which
points are in which rectangles.

256
00:16:32,025 --> 00:16:33,150
That's the general problem.

257
00:16:33,150 --> 00:16:37,410
So normally, we're
given the points first,

258
00:16:37,410 --> 00:16:39,550
and then we're given the
rectangles one at a time.

259
00:16:39,550 --> 00:16:41,000
That's what we've
solved in the past.

260
00:16:41,000 --> 00:16:42,333
That's what we will solve later.

261
00:16:42,333 --> 00:16:43,446
That's the online version.

262
00:16:43,446 --> 00:16:44,820
The batched version
is I give you

263
00:16:44,820 --> 00:16:48,270
a whole bunch of queries
I want to answer simultaneously,

264
00:16:48,270 --> 00:16:55,170
and we're going to achieve
the sorting bound N/B log base

265
00:16:55,170 --> 00:17:02,790
M/B of N/B plus the size
of the output over B.

266
00:17:02,790 --> 00:17:04,950
And this is
generally the optimal

267
00:17:04,950 --> 00:17:06,940
bound you could hope for.

268
00:17:06,940 --> 00:17:08,400
It's not obvious
you need the log,

269
00:17:08,400 --> 00:17:11,069
but I think for most
problems in external memory

270
00:17:11,069 --> 00:17:12,569
you need this log.

271
00:17:12,569 --> 00:17:14,490
It's hard to beat
the sorting bound,

272
00:17:14,490 --> 00:17:16,200
and then once you pay
the sorting bound,

273
00:17:16,200 --> 00:17:20,560
this is the optimal linear time
to just write down the output.

274
00:17:20,560 --> 00:17:23,069
Now, this problem can be solved.

275
00:17:23,069 --> 00:17:28,650
Give me all the point
rectangle pairs that result.

276
00:17:28,650 --> 00:17:32,505
I'm not going to
solve it here exactly.

277
00:17:32,505 --> 00:17:34,960
We're going to solve a
slightly different version,

278
00:17:34,960 --> 00:17:37,020
or in general--

279
00:17:37,020 --> 00:17:39,200
whatever.

280
00:17:39,200 --> 00:17:41,700
Let me tell you about another
version of this problem, which

281
00:17:41,700 --> 00:17:43,270
is a little bit easier.

282
00:17:43,270 --> 00:17:46,290
Then I'll sketch how
you solve that problem.

283
00:17:51,040 --> 00:17:53,070
So remember, we've talked
about range reporting

284
00:17:53,070 --> 00:17:58,260
and also range
counting, which is you

285
00:17:58,260 --> 00:18:00,910
just want to know the
number of answers.

286
00:18:00,910 --> 00:18:03,000
Here's something in between.

287
00:18:03,000 --> 00:18:06,300
You want to know
for every point,

288
00:18:06,300 --> 00:18:09,751
how many rectangles contain it?

289
00:18:09,751 --> 00:18:11,250
And particularly,
this will tell you

290
00:18:11,250 --> 00:18:12,791
for each point, does
it appear in any

291
00:18:12,791 --> 00:18:15,604
of the rectangles in the set?
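
To pin down what is being computed, here is a quadratic-time reference version of the per-point count. It is only a specification to compare against, not the cache-oblivious algorithm. The name `count_containing_brute`, the `(x1, y1, x2, y2)` rectangle format, and the choice to count boundary points as inside are all assumptions made for this sketch.

```python
def count_containing_brute(points, rects):
    """For each point (x, y), count rectangles (x1, y1, x2, y2) that contain it.

    Assumes x1 <= x2 and y1 <= y2, and treats rectangles as closed
    (points on the boundary count as contained).
    """
    return [
        sum(x1 <= x <= x2 and y1 <= y <= y2 for (x1, y1, x2, y2) in rects)
        for (x, y) in points
    ]


points = [(1, 1), (5, 5), (3, 2)]
rects = [(0, 0, 4, 4), (2, 1, 6, 6)]
print(count_containing_brute(points, rects))   # prints [1, 1, 2]
```

A nonzero count answers the "does it appear in any rectangle" question for that point.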

292
00:18:15,604 --> 00:18:17,520
It will tell you how
many and this is actually

293
00:18:17,520 --> 00:18:19,230
necessary as a first
step because one

294
00:18:19,230 --> 00:18:22,760
of the hard parts in solving
these kinds of problems

295
00:18:22,760 --> 00:18:26,370
or reporting problems, is
that the output could be big.

296
00:18:26,370 --> 00:18:29,360
We know that's always an issue,
but with cache oblivious,

297
00:18:29,360 --> 00:18:35,370
it's a big issue, literally,
because space is important.

298
00:18:35,370 --> 00:18:38,430
You can't afford to
put space anywhere.

299
00:18:41,610 --> 00:18:44,130
If these buffers have to
get much bigger in order

300
00:18:44,130 --> 00:18:47,010
to store those
answers, then life

301
00:18:47,010 --> 00:18:50,040
is kind of tough because
then this data structure

302
00:18:50,040 --> 00:18:52,682
gets too big, and then my
analysis goes out the window

303
00:18:52,682 --> 00:18:54,390
because things that
used to fit in cache,

304
00:18:54,390 --> 00:18:56,580
no longer fit in cache.

305
00:18:56,580 --> 00:18:58,330
The analysis I didn't show you.

306
00:18:58,330 --> 00:19:02,040
So it's an issue.

307
00:19:02,040 --> 00:19:05,340
So the first step
of this algorithm

308
00:19:05,340 --> 00:19:07,620
is to first figure out
how big those buffers have

309
00:19:07,620 --> 00:19:11,060
to be so that we don't have
to allocate them too large.

310
00:19:11,060 --> 00:19:12,810
And to do that, we
need to basically count

311
00:19:12,810 --> 00:19:18,000
how many answers there are,
and this is what we'll do.

312
00:19:18,000 --> 00:19:20,610
To compute these values,
the answers aren't very big.

313
00:19:20,610 --> 00:19:23,220
These answers are just
single numbers per point,

314
00:19:23,220 --> 00:19:26,590
so it's no big deal.

315
00:19:26,590 --> 00:19:30,300
OK, so here's what we do.

316
00:19:30,300 --> 00:19:38,430
Sort the points and the
corners of the rectangles

317
00:19:38,430 --> 00:19:44,010
by x-coordinate using
lazy funnel sort.

318
00:19:44,010 --> 00:19:45,150
Nothing fancy here.

319
00:19:45,150 --> 00:19:48,570
No augmentation,
regular old sort.

320
00:19:48,570 --> 00:19:51,810
Then-- this will
be useful later--

321
00:19:51,810 --> 00:19:54,050
then we're going to
divide and conquer

322
00:19:54,050 --> 00:20:02,320
on y via a distribution sweep.

323
00:20:07,240 --> 00:20:17,320
And here, our binary merger
is going to be an upward sweep

324
00:20:17,320 --> 00:20:18,418
line algorithm.

325
00:20:29,880 --> 00:20:32,266
So let's talk about that
sweep line algorithm.

326
00:20:37,190 --> 00:20:41,310
We presorted our points by x.

327
00:20:41,310 --> 00:20:45,360
If you think about the merging
step, what this means--

328
00:20:50,020 --> 00:20:51,770
it's confusing.

329
00:20:51,770 --> 00:20:57,500
We're trying to sort by y,
we were in a certain sense,

330
00:20:57,500 --> 00:21:01,190
but we're always going to be
sorted by x because we did that

331
00:21:01,190 --> 00:21:02,730
up front.

332
00:21:02,730 --> 00:21:06,810
So the picture is going
to be something like this.

333
00:21:06,810 --> 00:21:08,090
We're in a slab.

334
00:21:08,090 --> 00:21:10,730
There's going to
be the left slab.

335
00:21:10,730 --> 00:21:13,880
So here's the binary merger.

336
00:21:17,440 --> 00:21:20,000
Here's the L points
and the R points.

337
00:21:20,000 --> 00:21:23,230
The L points are going to be
in a particular x interval.

338
00:21:23,230 --> 00:21:28,070
The R points are going to
be in an adjacent x interval

339
00:21:28,070 --> 00:21:31,910
corresponding to
this tree picture.

340
00:21:31,910 --> 00:21:39,680
And then we have these points,
which overlap in y,

341
00:21:39,680 --> 00:21:44,120
because the whole point is
we're trying to merge by y.

342
00:21:44,120 --> 00:21:49,360
OK, we also have
some rectangles,

343
00:21:49,360 --> 00:21:53,083
and their corners are
what we have represented.

344
00:21:58,544 --> 00:22:00,210
I probably should
have used colors here.

345
00:22:09,390 --> 00:22:11,025
Something like this.

346
00:22:17,790 --> 00:22:20,190
So we're given,
essentially-- we have

347
00:22:20,190 --> 00:22:23,197
whatever we want on the
points and corners in here.

348
00:22:23,197 --> 00:22:24,780
We have whatever we
want in the points

349
00:22:24,780 --> 00:22:26,700
and corners in this slab.

350
00:22:26,700 --> 00:22:30,280
Let me add a little
bit of color.

351
00:22:30,280 --> 00:22:30,960
These lines.

352
00:22:35,580 --> 00:22:40,950
And now we want to merge these
two things and merging here

353
00:22:40,950 --> 00:22:44,800
is all about counting how many
rectangles contain each point.

354
00:22:44,800 --> 00:22:48,240
Now, we already know how
many points over here

355
00:22:48,240 --> 00:22:51,310
are contained in rectangles
that are over here.

356
00:22:51,310 --> 00:22:53,000
So we've presumably
already found

357
00:22:53,000 --> 00:22:55,930
that this point lies
in this rectangle.

358
00:22:55,930 --> 00:22:57,270
We've already found--

359
00:22:57,270 --> 00:22:58,632
I guess there's no points here.

360
00:22:58,632 --> 00:23:00,090
We've already found
that this point

361
00:23:00,090 --> 00:23:02,200
is contained in this rectangle.

362
00:23:02,200 --> 00:23:06,150
OK, because these corners were
in this slab, and so let's say

363
00:23:06,150 --> 00:23:08,680
every corner knows
the entire rectangle.

364
00:23:08,680 --> 00:23:11,130
So when you were processing
R, you saw these corners,

365
00:23:11,130 --> 00:23:12,060
you saw this point.

366
00:23:12,060 --> 00:23:14,040
Somehow you figured that out.

367
00:23:14,040 --> 00:23:19,830
What we're missing are things
like this rectangle, where

368
00:23:19,830 --> 00:23:22,420
none of the corners
are inside R.

369
00:23:22,420 --> 00:23:24,450
So R knew nothing
about this rectangle,

370
00:23:24,450 --> 00:23:26,580
and yet it has points
that are contained in it.

371
00:23:26,580 --> 00:23:30,210
Similarly, there are these
rectangles that completely

372
00:23:30,210 --> 00:23:35,700
span L, and so therefore none
of the corners are inside L.

373
00:23:35,700 --> 00:23:38,730
But we need to know that
these points are in there.

374
00:23:38,730 --> 00:23:42,420
Those are the only things that
will be missing at this level.

375
00:23:44,782 --> 00:23:46,740
There might be other
rectangles that completely

376
00:23:46,740 --> 00:23:50,060
span L and R. Those will be
discovered at higher levels,

377
00:23:50,060 --> 00:23:52,090
now here.

378
00:23:52,090 --> 00:23:54,600
It's a little bit awkward to
check if this will actually

379
00:23:54,600 --> 00:23:58,200
find everything, but it will.

380
00:23:58,200 --> 00:24:03,725
So to figure this out, when
we're merging L and R--

381
00:24:03,725 --> 00:24:05,370
see, L knows about
this rectangle

382
00:24:05,370 --> 00:24:07,050
because it sees these points.

383
00:24:07,050 --> 00:24:09,840
We want to keep track
as we sweep upwards.

384
00:24:09,840 --> 00:24:12,390
We want to realize that these
points are in a big rectangle

385
00:24:12,390 --> 00:24:14,670
here, whereas they
weren't discovered in L,

386
00:24:14,670 --> 00:24:17,520
and they weren't
discovered in R.

387
00:24:17,520 --> 00:24:25,140
To do that, we maintain a number
as-- we have a horizontal line,

388
00:24:25,140 --> 00:24:28,020
we're sweeping up.

389
00:24:28,020 --> 00:24:34,170
We want to maintain the
number of active rectangles.

390
00:24:34,170 --> 00:24:39,120
Active means that it's currently
being sliced by the sweep line.

391
00:24:42,450 --> 00:24:53,130
That have left corners
in L and completely

392
00:24:53,130 --> 00:24:57,870
span R. So that's these guys.

393
00:24:57,870 --> 00:24:59,550
So that's easy to do.

394
00:24:59,550 --> 00:25:01,350
We're merging these points.

395
00:25:01,350 --> 00:25:04,470
So that each of them
has been sorted by y.

396
00:25:04,470 --> 00:25:06,870
Now we're doing
a merge, so we're

397
00:25:06,870 --> 00:25:09,540
considering all the
corners, and all the points,

398
00:25:09,540 --> 00:25:13,740
and increasing the y-coordinate
as we do that binary merge.

399
00:25:13,740 --> 00:25:17,610
So whenever we visit a left
corner of a rectangle--

400
00:25:17,610 --> 00:25:20,954
a lower left corner-- we
say oh, does this rectangle

401
00:25:20,954 --> 00:25:21,870
go all the way across?

402
00:25:21,870 --> 00:25:23,160
This one does not.

403
00:25:23,160 --> 00:25:26,820
By the time we get to here, this
one goes all the way across R,

404
00:25:26,820 --> 00:25:28,620
and so we increment CL.

405
00:25:28,620 --> 00:25:32,670
And when we get to the upper
left corner, we decrement CL.

406
00:25:32,670 --> 00:25:34,690
Say oh, that rectangle's over.

407
00:25:34,690 --> 00:25:38,070
So it's very easy
to do in constant time,

408
00:25:38,070 --> 00:25:40,590
but it's only going
to be 1/B memory

409
00:25:40,590 --> 00:25:42,570
transfers per one of
these because it's

410
00:25:42,570 --> 00:25:45,870
a nice, cheap merge.

411
00:25:45,870 --> 00:25:49,770
And then symmetrically,
we do CR.

412
00:25:49,770 --> 00:25:52,020
It's the number of
active rectangles

413
00:25:52,020 --> 00:25:54,750
with the right
corners in R that span

414
00:25:54,750 --> 00:26:05,440
L. So that's this guy, CR,
I guess, this guy is CL.

415
00:26:05,440 --> 00:26:09,030
In general, there might be a
lot of them, so you count them.

416
00:26:09,030 --> 00:26:22,271
And then the only
thing we need to do

417
00:26:22,271 --> 00:26:24,500
is whenever we
encounter a point as

418
00:26:24,500 --> 00:26:26,630
opposed to a corner,
because we're storing them

419
00:26:26,630 --> 00:26:30,440
all together, we add--

420
00:26:30,440 --> 00:26:31,470
I got this right--

421
00:26:31,470 --> 00:26:36,950
CR to its counter.

422
00:26:36,950 --> 00:26:40,890
We want to know how many
rectangles contain that point.

423
00:26:40,890 --> 00:26:43,400
And so for example,
when we see this point,

424
00:26:43,400 --> 00:26:46,790
and CR is currently one, then
we know that this point appeared

425
00:26:46,790 --> 00:26:49,160
in some rectangle
that spanned L.

426
00:26:49,160 --> 00:26:51,050
So we increment
this point's counter.

427
00:26:51,050 --> 00:26:53,810
Similarly, when we see these
points, CL is positive,

428
00:26:53,810 --> 00:26:57,500
so we increment these guys'
counters by whatever CL is.

429
00:26:57,500 --> 00:27:05,329
So this is a symmetric
version in R when we add CL.
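
[Editor's sketch] A minimal sketch of one such merge step, under assumptions that are not from the lecture: the event encoding, the names `merge_slabs`, `BOTTOM`, `POINT`, `TOP`, and the convention that a corner event carries the far x-coordinate of its rectangle are all hypothetical. It only adds the counts contributed at this one merge; the full algorithm repeats this at every level of the recursion.

```python
import heapq

# Event kinds, ordered so that at equal y a rectangle opens before a
# point is seen and closes after it (closed rectangles). Assumed encoding.
BOTTOM, POINT, TOP = 0, 1, 2

def merge_slabs(left, right, left_lo_x, right_hi_x, counts):
    """Merge two y-sorted event lists for adjacent slabs L and R.

    Each event is (y, kind, data). For POINT events, data is the point's
    index into counts. For corner events, data is the far x-coordinate of
    the rectangle: its right edge for a left corner, its left edge for a
    right corner (hypothetical convention for this sketch).
    """
    cl = 0  # active rectangles with left corners in L that span R
    cr = 0  # active rectangles with right corners in R that span L
    tagged = heapq.merge(
        ((e, "L") for e in left),
        ((e, "R") for e in right),
        key=lambda t: (t[0][0], t[0][1]),
    )
    merged = []
    for (y, kind, data), side in tagged:
        if kind == POINT:
            # Points in L are covered by rectangles counted in CR,
            # points in R by rectangles counted in CL.
            counts[data] += cr if side == "L" else cl
        else:
            delta = 1 if kind == BOTTOM else -1
            if side == "L" and data >= right_hi_x:
                cl += delta  # corner in L, rectangle reaches past R
            elif side == "R" and data <= left_lo_x:
                cr += delta  # corner in R, rectangle reaches past L
        merged.append((y, kind, data))
    return merged

# L covers x in [0, 4], R covers x in [5, 9].
left = [(0, POINT, 3), (1, BOTTOM, 12), (3, POINT, 2), (5, TOP, 12)]
right = [(2, BOTTOM, -3), (3, POINT, 0), (4, TOP, -3), (6, POINT, 1)]
counts = [0, 0, 0, 0]
merged = merge_slabs(left, right, 0, 9, counts)
print(counts)
```

Each event is touched once in a single sequential pass, which is what makes the per-event cost of the merge small.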

430
00:27:05,329 --> 00:27:07,370
Probably should have called
them the other names,

431
00:27:07,370 --> 00:27:10,350
but anyway, CL,
CR, doesn't matter.

432
00:27:10,350 --> 00:27:12,950
CLRS.

433
00:27:12,950 --> 00:27:13,796
Question?

434
00:27:13,796 --> 00:27:15,740
AUDIENCE: The bottom
is the x-axis, right?

435
00:27:15,740 --> 00:27:17,365
ERIK DEMAINE: This
is the x-axis, yeah.

436
00:27:17,365 --> 00:27:20,689
AUDIENCE: So are we dividing
and conquering on x?

437
00:27:20,689 --> 00:27:23,230
ERIK DEMAINE: It does look like
we're dividing and conquering

438
00:27:23,230 --> 00:27:25,020
on x, I think you're right.

439
00:27:25,020 --> 00:27:26,356
Sorry.

440
00:27:26,356 --> 00:27:28,682
For some reason I
thought it was y.

441
00:27:28,682 --> 00:27:30,610
You're right.

442
00:27:30,610 --> 00:27:31,750
So it's a funny thing.

443
00:27:31,750 --> 00:27:35,770
We're pre-sorting by x,
which is what's getting us--

444
00:27:35,770 --> 00:27:36,530
thank you.

445
00:27:36,530 --> 00:27:37,960
That's much clearer now.

446
00:27:37,960 --> 00:27:40,482
In my mind I was like
there's something weird here.

447
00:27:40,482 --> 00:27:42,190
We're presorting on
x and then we're just

448
00:27:42,190 --> 00:27:44,310
sticking these guys down here.

449
00:27:44,310 --> 00:27:48,760
So evenly dividing
them into lists.

450
00:27:48,760 --> 00:27:53,270
Or, I guess actually, we're
doing our funnel sort,

451
00:27:53,270 --> 00:27:54,389
the merge sort.

452
00:27:54,389 --> 00:27:55,930
Things have already
been sorted by x,

453
00:27:55,930 --> 00:27:59,740
but now we're merge
sorting again,

454
00:27:59,740 --> 00:28:04,190
and this time when we merge, we
carry along this information.

455
00:28:04,190 --> 00:28:07,330
So they're both in terms of
x, which is kind of funny.

456
00:28:07,330 --> 00:28:10,242
Is there another question?

457
00:28:10,242 --> 00:28:12,953
AUDIENCE: Sorry, is it
important that we do

458
00:28:12,953 --> 00:28:14,679
the upward sweep [INAUDIBLE]?

459
00:28:18,640 --> 00:28:21,750
ERIK DEMAINE: The upward sweep.

460
00:28:21,750 --> 00:28:24,145
Yeah, we have to do the
points in order by y.

461
00:28:24,145 --> 00:28:27,066
AUDIENCE: So do we
want to just sort

462
00:28:27,066 --> 00:28:29,900
by y, and then [INAUDIBLE].

463
00:28:29,900 --> 00:28:31,988
ERIK DEMAINE: Ah,
so confused now.

464
00:28:31,988 --> 00:28:35,707
AUDIENCE: Because in the
notes, it said x and then y.

465
00:28:35,707 --> 00:28:37,790
ERIK DEMAINE: Yeah, I know
in the notes it says y.

466
00:28:37,790 --> 00:28:40,010
It used to say x.

467
00:28:40,010 --> 00:28:44,480
I believe, we're dividing
and conquering on x,

468
00:28:44,480 --> 00:28:49,160
but we're sorting by y,
and that's the confusion.

469
00:28:49,160 --> 00:28:52,850
I'll double check
this, but in order

470
00:28:52,850 --> 00:28:57,290
for this sweep to work-- so
it's like you first sort by x.

471
00:28:57,290 --> 00:28:58,386
You

472
00:28:58,386 --> 00:29:00,260
We are in some sense
doing divide and conquer

473
00:29:00,260 --> 00:29:03,290
by x because we
did this sort by x.

474
00:29:03,290 --> 00:29:07,980
But the merge sort is on y.

475
00:29:07,980 --> 00:29:08,930
It makes more sense.

476
00:29:08,930 --> 00:29:11,420
If you're already
in x order, sorting

477
00:29:11,420 --> 00:29:12,670
isn't going to learn you much.

478
00:29:12,670 --> 00:29:15,620
It isn't going to
teach you much.

479
00:29:15,620 --> 00:29:17,029
So first you sort by x.

480
00:29:17,029 --> 00:29:18,320
Things are nicely ordered by x.

481
00:29:18,320 --> 00:29:21,800
So we get these nice horizontal
slabs in the decomposition,

482
00:29:21,800 --> 00:29:23,450
but now when we merge--

483
00:29:23,450 --> 00:29:24,950
Now we're going to sort by y.

484
00:29:24,950 --> 00:29:26,908
So we're going to reorder
the points and that's

485
00:29:26,908 --> 00:29:28,680
what lets us do the sweep.

486
00:29:28,680 --> 00:29:31,100
And we are, in the end, merging
all these points together

487
00:29:31,100 --> 00:29:32,690
in y order.

488
00:29:32,690 --> 00:29:34,910
And as we do it, then
we get the information

489
00:29:34,910 --> 00:29:37,065
we want about
rectangles and points.

490
00:29:37,065 --> 00:29:41,270
OK, this is why I wanted
this to be both x and y.

491
00:29:41,270 --> 00:29:43,610
But really, the divide and
conquer is happening on x,

492
00:29:43,610 --> 00:29:48,170
but we are doing
a merge sort on y.

493
00:29:48,170 --> 00:29:49,950
Finally clear.

494
00:29:49,950 --> 00:29:51,500
Thanks for helping me.

495
00:29:51,500 --> 00:29:53,972
This is a new lecture,
as you may have guessed,

496
00:29:53,972 --> 00:29:57,622
so still working out some kinks.

497
00:29:57,622 --> 00:29:59,330
I really wanted to
introduce this lecture

498
00:29:59,330 --> 00:30:03,100
because the next thing
we're going to cover,

499
00:30:03,100 --> 00:30:05,630
which is a way to do orthogonal
2D range searching cache

500
00:30:05,630 --> 00:30:07,990
obliviously, is super cool.

501
00:30:07,990 --> 00:30:13,540
It's like one of the
craziest things there is.

502
00:30:13,540 --> 00:30:15,530
At least in the cache
oblivious world.

503
00:30:15,530 --> 00:30:17,650
Any other questions before--

504
00:30:17,650 --> 00:30:21,740
Oh, I should say a little
bit more about this.

505
00:30:21,740 --> 00:30:26,510
We've now solved this first
step, which is figuring out

506
00:30:26,510 --> 00:30:28,520
the output size.

507
00:30:28,520 --> 00:30:32,321
Counting for each point how
many rectangles contain it,

508
00:30:32,321 --> 00:30:34,070
which is an interesting
problem by itself.

509
00:30:34,070 --> 00:30:36,470
That's the range
counting problem.

510
00:30:36,470 --> 00:30:39,620
You can also use it to
figure out, at this level,

511
00:30:39,620 --> 00:30:43,760
at this merging step, how many
things will be output here?

512
00:30:43,760 --> 00:30:45,140
How many new outputs are there?

513
00:30:45,140 --> 00:30:46,920
How many points in
rectangles are there?

514
00:30:46,920 --> 00:30:49,003
It's essentially just the
sum of all those things.

515
00:30:52,650 --> 00:30:55,250
So you can count the
number of outputs per merge

516
00:30:55,250 --> 00:31:00,530
and so then there's a natural
strategy, which is you

517
00:31:00,530 --> 00:31:05,720
build a new funnel structure
where these buffers

518
00:31:05,720 --> 00:31:07,900
have the right size.

519
00:31:07,900 --> 00:31:10,220
You've pre-computed what
all sizes need to be.

520
00:31:10,220 --> 00:31:12,620
At every merge you
know how many things

521
00:31:12,620 --> 00:31:14,540
are going to get spit out here.

522
00:31:14,540 --> 00:31:17,780
So you could allocate that
much space and that will

523
00:31:17,780 --> 00:31:21,370
be a kind of decent merge sort.

524
00:31:21,370 --> 00:31:22,870
Because I haven't
done the analysis,

525
00:31:22,870 --> 00:31:25,370
it's hard to get into
detail about this.

526
00:31:25,370 --> 00:31:29,390
But it will not be
optimal, unfortunately.

527
00:31:29,390 --> 00:31:31,250
To actually make
it work, you end up

528
00:31:31,250 --> 00:31:33,620
having to take this
tree, carving it

529
00:31:33,620 --> 00:31:36,530
into subtrees of linear size.

530
00:31:36,530 --> 00:31:38,540
So normally, the whole
thing is linear size.

531
00:31:38,540 --> 00:31:40,019
Everything's fine.

532
00:31:40,019 --> 00:31:41,810
And where the analysis
breaks, essentially,

533
00:31:41,810 --> 00:31:45,470
is if you have a giant buffer
because one of the outputs--

534
00:31:45,470 --> 00:31:47,760
potentially, the output
size here is quadratic.

535
00:31:47,760 --> 00:31:51,050
And so the overall thing
might be super linear.

536
00:31:51,050 --> 00:31:53,930
And so when you have a super
linear buffer or a bunch

537
00:31:53,930 --> 00:31:56,004
of very large buffers
that sum to linear size,

538
00:31:56,004 --> 00:31:57,920
you essentially need to
carve that tree, which

539
00:31:57,920 --> 00:32:01,511
you do by recursive
carving of the tree.

540
00:32:01,511 --> 00:32:03,260
So that each of the
trees has linear size.

541
00:32:03,260 --> 00:32:05,759
Then you apply the analysis to
each of the trees separately.

542
00:32:05,759 --> 00:32:08,120
You store them
consecutively, separately.

543
00:32:08,120 --> 00:32:10,040
Each of them has good
optimal running time

544
00:32:10,040 --> 00:32:11,480
and then the combination does.

545
00:32:11,480 --> 00:32:13,640
That's the hand-wavy
version of how

546
00:32:13,640 --> 00:32:17,560
to do actual range reporting
with end points and end

547
00:32:17,560 --> 00:32:18,280
rectangles.

548
00:32:18,280 --> 00:32:20,460
If you're interested in the
details, read the paper.

549
00:32:20,460 --> 00:32:24,050
It's just a little bit
messy and especially when

550
00:32:24,050 --> 00:32:26,780
you don't know the analysis.

551
00:32:26,780 --> 00:32:30,220
I want to move on to
online orthogonal 2D

552
00:32:30,220 --> 00:32:38,270
range searching because it's
the hardest and coolest of them

553
00:32:38,270 --> 00:32:38,770
all.

554
00:32:38,770 --> 00:32:41,126
Unless there are more questions.

555
00:32:41,126 --> 00:32:42,090
All right.

556
00:32:44,827 --> 00:32:46,910
AUDIENCE: So you do the
range counting [INAUDIBLE]

557
00:32:46,910 --> 00:32:52,700
in detail, and [INAUDIBLE]
to the [INAUDIBLE].

558
00:32:52,700 --> 00:32:53,780
ERIK DEMAINE: Exactly.

559
00:32:53,780 --> 00:32:56,090
At this point, if you
believe in funnel sort,

560
00:32:56,090 --> 00:33:00,260
you should believe that
range counting is easy to do,

561
00:33:00,260 --> 00:33:04,562
and I've just hand waved
the range reporting part.

562
00:33:04,562 --> 00:33:05,285
Are you scribing?

563
00:33:05,285 --> 00:33:06,890
Is that why you ask?

564
00:33:11,080 --> 00:33:14,460
That's where we stand.

565
00:33:14,460 --> 00:33:17,450
The next thing we're going to
do is regular range reporting,

566
00:33:17,450 --> 00:33:19,710
regular online stuff.

567
00:33:19,710 --> 00:33:24,565
So this is orthogonal
2D range search.

568
00:33:30,110 --> 00:33:31,970
And we spent a
couple of lectures

569
00:33:31,970 --> 00:33:34,520
on 2D and 3D range search.

570
00:33:34,520 --> 00:33:38,450
All this crazy stuff with
fractional cascading,

571
00:33:38,450 --> 00:33:41,720
and so on, and the
layered range trees.

572
00:33:41,720 --> 00:33:44,450
We're going to use some of those
techniques that we built there,

573
00:33:44,450 --> 00:33:47,060
and in particular,
you may recall

574
00:33:47,060 --> 00:33:51,890
there was this idea that if
we have a bunch of points,

575
00:33:51,890 --> 00:33:55,010
regular 2D range searching
is I give you a rectangle,

576
00:33:55,010 --> 00:33:57,240
give me all the points
in the rectangle.

577
00:33:57,240 --> 00:33:57,740
Fine.

578
00:33:57,740 --> 00:34:06,710
Our goal is to achieve log base
B of N plus output size over B.

579
00:34:06,710 --> 00:34:08,960
That's the new optimal bound.

580
00:34:08,960 --> 00:34:11,330
This is how long it takes to
do a regular search in one

581
00:34:11,330 --> 00:34:13,550
dimension.

582
00:34:13,550 --> 00:34:16,659
So if you have output
size whatever--

583
00:34:16,659 --> 00:34:18,877
and we'll probably be
able to do range counting,

584
00:34:18,877 --> 00:34:20,210
but I won't worry about it here.

585
00:34:20,210 --> 00:34:21,835
We'll just think
about range reporting.

586
00:34:21,835 --> 00:34:23,449
If there's this
many points, we'll

587
00:34:23,449 --> 00:34:26,330
output them all in
that much over B.

588
00:34:26,330 --> 00:34:29,929
This is what we call a
regular range search,

589
00:34:29,929 --> 00:34:32,719
but I'm going to distinguish it
and call it a four sided range

590
00:34:32,719 --> 00:34:37,489
search because a
rectangle has four sides.

591
00:34:37,489 --> 00:34:39,670
But you could think
of the other versions

592
00:34:39,670 --> 00:34:43,760
and we actually did this when
we were doing the 3-D problem.

593
00:34:43,760 --> 00:34:47,770
So if these are two
rays and an edge,

594
00:34:47,770 --> 00:34:51,530
this you might call a
three sided rectangle,

595
00:34:51,530 --> 00:34:54,159
and you can go all the
way down to two sides.

596
00:34:54,159 --> 00:34:58,100
Hard to go down to one side.

597
00:34:58,100 --> 00:35:03,150
Here's a two sided rectangle,
it just has two rays.

598
00:35:03,150 --> 00:35:07,530
OK, as you might expect,
this is easier than that.

599
00:35:07,530 --> 00:35:11,760
And if I recall, in 3-D we ended
up doing this thing in linear

600
00:35:11,760 --> 00:35:14,550
space with this fancy--

601
00:35:14,550 --> 00:35:16,470
first you do a search
on the left coordinate

602
00:35:16,470 --> 00:35:17,970
and then you just walk.

603
00:35:17,970 --> 00:35:20,220
We'd subdivided with
fractional cascading

604
00:35:20,220 --> 00:35:23,040
so that every face
had constant size,

605
00:35:23,040 --> 00:35:24,540
and so you could
just walk, and each

606
00:35:24,540 --> 00:35:26,400
step you'd report a new point.

607
00:35:26,400 --> 00:35:28,666
If you may recall for this
kind of two sided thing.

608
00:35:28,666 --> 00:35:30,040
First, you would
search for this,

609
00:35:30,040 --> 00:35:32,700
and then you would basically
just follow this line until you

610
00:35:32,700 --> 00:35:37,110
found this point, this corner.

611
00:35:37,110 --> 00:35:42,750
This we could achieve in a
linear space, logarithmic time.

612
00:35:42,750 --> 00:35:46,200
This one we needed
N log N space.

613
00:35:46,200 --> 00:35:51,660
Actually, the best known is
N log N divided by log log N.

614
00:35:51,660 --> 00:35:54,150
But we could do N log
N using range trees.

615
00:35:54,150 --> 00:35:58,260
And we got down to
log N time using--

616
00:35:58,260 --> 00:36:03,390
log N query time and N log N
space using layered range trees.

617
00:36:03,390 --> 00:36:05,910
That was the internal
memory regular algorithms.

618
00:36:05,910 --> 00:36:09,044
AUDIENCE: Aren't you
missing an M/B though?

619
00:36:09,044 --> 00:36:10,460
ERIK DEMAINE: Am
I missing an M/B?

620
00:36:10,460 --> 00:36:14,920
No, this is log base B of N,
not log base M/B of N. Yeah,

621
00:36:14,920 --> 00:36:15,870
it's good to ask.

622
00:36:15,870 --> 00:36:21,634
When we're sorting this kind
of thing, we get log base M/B,

623
00:36:21,634 --> 00:36:23,550
but when you're searching,
the best you can do

624
00:36:23,550 --> 00:36:25,050
is log base B. We
actually proved

625
00:36:25,050 --> 00:36:27,780
a lower bound about this in the
first memory hierarchy lecture.

626
00:36:31,740 --> 00:36:33,900
Because this is online,
you read it in a block.

627
00:36:33,900 --> 00:36:36,115
You can only learn where
you fit among B items.

628
00:36:36,115 --> 00:36:37,740
And so the best you
can hope to achieve

629
00:36:37,740 --> 00:36:40,680
is log base B of N for
search in one dimension.

630
00:36:40,680 --> 00:36:43,032
So this is a lower
bound for search.

631
00:36:43,032 --> 00:36:44,490
When you're doing
batch operations,

632
00:36:44,490 --> 00:36:47,100
then you can hope to
achieve this stuff, which

633
00:36:47,100 --> 00:36:48,350
is a lot faster.

634
00:36:48,350 --> 00:36:52,770
Then it's like 1/B times
log base M/B of N/B.

635
00:36:52,770 --> 00:36:54,300
OK, so in a certain
sense, this is

636
00:36:54,300 --> 00:36:55,810
slower than the
batched operations,

637
00:36:55,810 --> 00:36:56,684
but it's more online.

638
00:36:56,684 --> 00:36:57,872
So it's a trade-off.
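
[Editor's sketch] A small arithmetic illustration of that trade-off. The function names and the sample values of N, M and B are assumptions chosen for illustration, and constant factors are dropped, so treat the numbers as orders of magnitude only.

```python
import math

def search_cost(n, b):
    """Memory transfers for one online search: log base B of N."""
    return math.log(n, b)

def batched_cost_per_item(n, m, b):
    """Amortized transfers per item in a batched (sorting-style) pass:
    (1/B) * log base M/B of N/B."""
    return math.log(n / b, m / b) / b

# Hypothetical machine: N = 2^30 items, M = 2^20 cache, B = 2^10 block.
n, m, b = 2**30, 2**20, 2**10
print(search_cost(n, b))               # about 3 transfers per query
print(batched_cost_per_item(n, m, b))  # about 0.002 transfers per item
```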

639
00:37:06,080 --> 00:37:09,790
So for all these problems we
can achieve log base B of N

640
00:37:09,790 --> 00:37:12,040
plus out over B.
The issue is with space.

641
00:37:17,980 --> 00:37:25,240
Maybe I'll do sort of regular
RAM algorithms versus cache

642
00:37:25,240 --> 00:37:25,870
oblivious.

643
00:37:30,040 --> 00:37:34,330
So we've got two sided,
three sided, four sided.

644
00:37:34,330 --> 00:37:44,440
And for two sided, I believe
these are the right answers.

645
00:37:44,440 --> 00:37:50,260
Log N over log log N. But we
haven't actually seen this one.

646
00:37:55,900 --> 00:38:00,310
And cache oblivious,
here's what we can do.

647
00:38:00,310 --> 00:38:04,750
This is with optimal query
times and this is all static.

648
00:38:12,270 --> 00:38:15,450
OK, and if there's time,
I'll cover all of these.

649
00:38:15,450 --> 00:38:17,130
So they're not perfect.

650
00:38:17,130 --> 00:38:21,960
These two were off by a
log factor, but not bad.

651
00:38:21,960 --> 00:38:24,780
Pretty good orthogonal
2D range queries.

652
00:38:24,780 --> 00:38:27,930
And really, the coolest
one is this one.

653
00:38:27,930 --> 00:38:31,696
This one blows my mind
every time I see it.

654
00:38:31,696 --> 00:38:32,320
So let's do it.

655
00:38:36,600 --> 00:38:38,700
We'll start with two
sided and then we

656
00:38:38,700 --> 00:38:40,080
have existing
techniques once you

657
00:38:40,080 --> 00:38:43,800
have two sided to
add on more sides,

658
00:38:43,800 --> 00:38:46,770
you may recall from the 3D
range searching lecture.

659
00:38:46,770 --> 00:38:48,840
So we're going to
use those techniques

660
00:38:48,840 --> 00:38:55,110
and refine them a little bit
to get that log log factor.

661
00:38:55,110 --> 00:39:00,780
But you may recall way back
when, at lecture six or so,

662
00:39:00,780 --> 00:39:02,070
that we had a technique.

663
00:39:02,070 --> 00:39:03,720
Once it was two
sided, every time

664
00:39:03,720 --> 00:39:07,650
we added a log factor in space,
we could add another side.

665
00:39:07,650 --> 00:39:09,900
The hard part was getting
up the number of dimensions.

666
00:39:09,900 --> 00:39:14,820
Then the easy part was turning
half infinite intervals

667
00:39:14,820 --> 00:39:16,990
into regular intervals.

668
00:39:16,990 --> 00:39:21,300
So once we have this, it's easy
to add a log, add another log.

669
00:39:21,300 --> 00:39:25,896
With a bit of sophistication,
we can save a log log factor.

670
00:39:25,896 --> 00:39:28,930
OK, but let's do two sided.

671
00:39:28,930 --> 00:39:31,410
This will be the
bulk of the lecture.

672
00:39:36,630 --> 00:39:44,420
This is a paper by Arge
and Zeh in 2006.

673
00:39:44,420 --> 00:39:46,170
All right, so we want to do--

674
00:39:46,170 --> 00:39:49,050
I'm going to assume that
they are this kind of quarter

675
00:39:49,050 --> 00:39:49,770
plane query.

676
00:39:49,770 --> 00:39:53,760
So less than or
equal to x, less than

677
00:39:53,760 --> 00:39:58,210
or equal to some y-coordinate.

678
00:39:58,210 --> 00:40:03,390
We want to know all the
points in that quarter plane.

679
00:40:03,390 --> 00:40:07,800
So here's what
we're going to do.

680
00:40:07,800 --> 00:40:09,960
It's all static.

681
00:40:09,960 --> 00:40:12,540
We're going to have a
Van Emde Boas layout.

682
00:40:12,540 --> 00:40:21,100
So a binary tree on
the y-coordinate.

683
00:40:21,100 --> 00:40:23,760
So this just stores all
the points sorted by y.

684
00:40:26,280 --> 00:40:32,750
So if you want to do this query,
you search for that value of y,

685
00:40:32,750 --> 00:40:39,520
then each of these positions
in between two keys in here

686
00:40:39,520 --> 00:40:42,490
has a pointer to an array.

687
00:40:45,370 --> 00:40:49,570
The array is not sorted by x
or y, it's a very weird thing.

688
00:40:52,450 --> 00:40:55,330
And then here's the
algorithm you follow.

689
00:40:55,330 --> 00:40:59,920
You follow this pointer, you
go here, you walk to the right

690
00:40:59,920 --> 00:41:07,285
until you find a point whose
x-coordinate is too big.

691
00:41:07,285 --> 00:41:09,080
It's bigger than x.

692
00:41:09,080 --> 00:41:12,870
I should probably
call this x2, y2.

693
00:41:12,870 --> 00:41:19,390
So first you search for a y2
here, in this thing keyed by y.

694
00:41:19,390 --> 00:41:20,430
Follow the pointer.

695
00:41:20,430 --> 00:41:24,005
You look at all the points that
have x-coordinate less than

696
00:41:24,005 --> 00:41:25,270
or equal to x2.

697
00:41:25,270 --> 00:41:26,600
Those are the ones you want.

698
00:41:26,600 --> 00:41:29,290
Once you find a point whose
x-coordinate is bigger than x2,

699
00:41:29,290 --> 00:41:33,410
you stop, and then you
report these points.

700
00:41:33,410 --> 00:41:35,830
It's not quite so simple
because some of these points

701
00:41:35,830 --> 00:41:37,630
might be duplicates.

702
00:41:37,630 --> 00:41:39,400
You have to remove duplicates.

703
00:41:39,400 --> 00:41:40,420
That is your answer.

704
00:41:46,170 --> 00:41:49,410
To me, this is an insane idea.

705
00:41:49,410 --> 00:41:51,930
I would never
imagine this to work.

706
00:41:51,930 --> 00:41:56,730
But the claim is you can make
this array have linear size.

707
00:41:56,730 --> 00:41:59,520
That's the hard part.

708
00:41:59,520 --> 00:42:03,210
Make this, the amount of stuff
that you have to traverse here,

709
00:42:03,210 --> 00:42:08,970
be linear in out in the number
of points that are actually

710
00:42:08,970 --> 00:42:12,337
in this range.

711
00:42:12,337 --> 00:42:14,670
You are going to do a little
bit more work because there

712
00:42:14,670 --> 00:42:18,780
are duplicates in here, but only
a constant factor of more work.

713
00:42:18,780 --> 00:42:21,360
And yet somehow, you've taken
this two dimensional problem

714
00:42:21,360 --> 00:42:23,540
and squashed it onto a line.

715
00:42:23,540 --> 00:42:25,290
You did one search at
the beginning, which

716
00:42:25,290 --> 00:42:28,650
costs you log base B of N,
then you do this linear scan,

717
00:42:28,650 --> 00:42:33,300
and you get the right
answer, magically.

718
00:42:33,300 --> 00:42:36,120
I don't know how they thought
this would be possible,

719
00:42:36,120 --> 00:42:38,850
but magically, it turns
out it is possible.

720
00:42:38,850 --> 00:42:42,750
It was kind of a breakthrough
in cache oblivious

721
00:42:42,750 --> 00:42:43,890
range searching.

722
00:42:43,890 --> 00:42:47,580
It was known how to do this for
external memory a lot easier.

723
00:42:50,680 --> 00:42:54,950
For example, you can
do it with persistence,

724
00:42:54,950 --> 00:43:00,050
but this is a much cooler way
to do two sided range queries.

725
00:43:00,050 --> 00:43:02,920
All right, so I've explained
the query algorithm.

726
00:43:07,126 --> 00:43:08,500
The big thing I
haven't explained

727
00:43:08,500 --> 00:43:09,640
is how to build this array.

728
00:43:15,192 --> 00:43:16,650
Maybe I'll write
down the things we

729
00:43:16,650 --> 00:43:19,500
need to prove as well
before we get there,

730
00:43:19,500 --> 00:43:21,000
so you can think
about them as we're

731
00:43:21,000 --> 00:43:23,460
writing down the algorithm.

732
00:43:23,460 --> 00:43:26,460
First claim is that this
algorithm, which just decides

733
00:43:26,460 --> 00:43:29,250
to stop whenever it gets an
x-coordinate that is too big,

734
00:43:29,250 --> 00:43:31,230
actually finds the right answer.

735
00:43:31,230 --> 00:43:37,110
It finds all points in the
range that we care about.

736
00:43:40,710 --> 00:43:45,210
The second thing is that the
number of scanned points,

737
00:43:45,210 --> 00:43:56,830
the length of that step here,
is order the size of the output.

738
00:43:56,830 --> 00:43:59,500
The number of actual
output points.

739
00:43:59,500 --> 00:44:02,830
We don't waste time
doing the scan.

740
00:44:02,830 --> 00:44:09,490
And the other thing is that the
array has size order N. That's

741
00:44:09,490 --> 00:44:13,889
the biggest surprise to me.

742
00:44:13,889 --> 00:44:15,430
So those are the
three things we need

743
00:44:15,430 --> 00:44:19,690
to prove about the algorithm,
which I will now tell you.

744
00:44:36,270 --> 00:44:39,040
OK, before I can define
how this array works,

745
00:44:39,040 --> 00:44:40,825
I need to define a
concept called density.

746
00:44:49,450 --> 00:44:59,760
If we look at a query, there's
two things that could happen.

747
00:44:59,760 --> 00:45:05,240
The good thing for
us would be if--

748
00:45:05,240 --> 00:45:05,960
get this right.

749
00:45:10,070 --> 00:45:20,570
The number of points in
less than or equal to x, star

750
00:45:20,570 --> 00:45:27,780
is at most, alpha times the
number of points in the answer.

751
00:45:34,530 --> 00:45:37,680
OK, star means no
restriction on y.

752
00:45:37,680 --> 00:45:39,674
Minus infinity to infinity.

753
00:45:45,110 --> 00:45:48,560
This would be good for
us because it says--

754
00:45:48,560 --> 00:45:52,085
ultimately what we're trying
to do here is do a scan in x.

755
00:45:57,050 --> 00:46:00,050
It's the right thing to do here.

756
00:46:00,050 --> 00:46:02,460
Then for this
particular y-coordinate,

757
00:46:02,460 --> 00:46:05,240
we could just basically start
at the beginning of the array,

758
00:46:05,240 --> 00:46:08,300
start scanning, and just
report all the points

759
00:46:08,300 --> 00:46:10,410
that are actually in our range.

760
00:46:10,410 --> 00:46:14,300
Sorry, I need to also
potentially throw away

761
00:46:14,300 --> 00:46:18,560
points that are not low enough.

762
00:46:18,560 --> 00:46:22,300
So the answer is
contained in here.

763
00:46:22,300 --> 00:46:24,350
I should say to throw
away duplicates,

764
00:46:24,350 --> 00:46:26,689
you have to throw away points
that are not in the range

765
00:46:26,689 --> 00:46:28,730
lesser or equal to x,
comma lesser or equal to y.

766
00:46:28,730 --> 00:46:30,500
Still, we claim the
number of scan points

767
00:46:30,500 --> 00:46:33,950
is proportional to
the output size.

768
00:46:33,950 --> 00:46:35,420
That's what we need.

769
00:46:35,420 --> 00:46:40,120
So if this held for every
query, we'd be happy.

770
00:46:40,120 --> 00:46:42,710
Just start at the
beginning, scan,

771
00:46:42,710 --> 00:46:44,600
and as long as this
alpha is some constant--

772
00:46:44,600 --> 00:46:48,380
it's going to be a
constant bigger than 1,

773
00:46:48,380 --> 00:46:52,450
then the number of points in
the answer is proportional--

774
00:46:52,450 --> 00:46:54,200
sorry, the number of
points we had to scan

775
00:46:54,200 --> 00:46:57,175
through is proportional to the
number of points in the answer,

776
00:46:57,175 --> 00:46:58,799
and so we're done.

777
00:46:58,799 --> 00:46:59,840
So this is the easy case.

778
00:46:59,840 --> 00:47:03,830
We need to distinguish
it, otherwise we

779
00:47:03,830 --> 00:47:09,677
call this range query
sparse, and those

780
00:47:09,677 --> 00:47:10,760
are the interesting cases.

781
00:47:13,700 --> 00:47:16,250
So nothing deep
here, but we're going

782
00:47:16,250 --> 00:47:17,360
to use this concept a lot.
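
[Editor's sketch] The density condition as spoken above, written out as a formula. The symbol P for the point set currently being scanned is an assumption of this note; x, y and alpha are the lecture's.

```latex
% Query (<= x, <= y) is dense in P when
\[
  \bigl|\, P \cap \bigl( (-\infty, x] \times (-\infty, \infty) \bigr) \bigr|
  \;\le\;
  \alpha \cdot
  \bigl|\, P \cap \bigl( (-\infty, x] \times (-\infty, y] \bigr) \bigr| ,
  \qquad \alpha > 1 ,
\]
% and sparse otherwise.
```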

783
00:47:32,650 --> 00:47:35,260
OK, so we're going
to actually try

784
00:47:35,260 --> 00:47:37,840
to solve this problem twice.

785
00:47:37,840 --> 00:47:42,010
The first try isn't going
to be quite successful,

786
00:47:42,010 --> 00:47:45,160
but it gets a lot
of the right ideas.

787
00:47:45,160 --> 00:47:55,470
So I'm going to let S0 be
all the points sorted by x.

788
00:47:55,470 --> 00:47:57,940
It's going to be sorted by x.

789
00:47:57,940 --> 00:48:00,340
I put things down here.

790
00:48:00,340 --> 00:48:02,830
And just to give you an
idea of where we're going,

791
00:48:02,830 --> 00:48:06,490
the array we're
imagining here is first

792
00:48:06,490 --> 00:48:08,680
we write down all
the points, then

793
00:48:08,680 --> 00:48:11,380
we'll write down some subset
of the points, S1, then

794
00:48:11,380 --> 00:48:16,030
some subset of that subset,
and so on until we get down

795
00:48:16,030 --> 00:48:18,590
to a constant size structure.

796
00:48:18,590 --> 00:48:20,630
OK, first we write
down all the points.

797
00:48:20,630 --> 00:48:21,130
Why?

798
00:48:21,130 --> 00:48:23,590
Because for dense queries,
that's what we want.

799
00:48:23,590 --> 00:48:25,610
We want all the points
just sitting there.

800
00:48:25,610 --> 00:48:28,660
So then you can just read
through all the points

801
00:48:28,660 --> 00:48:30,680
and dense queries will be happy.

802
00:48:30,680 --> 00:48:35,380
So if we detect a y-coordinate
where the query's going to be

803
00:48:35,380 --> 00:48:36,477
dense--

804
00:48:36,477 --> 00:48:37,810
I don't know how we detect that.

805
00:48:37,810 --> 00:48:39,340
Let's not worry
about it right now--

806
00:48:39,340 --> 00:48:41,620
then you could just
look through S0.

807
00:48:41,620 --> 00:48:42,654
That's fine.

808
00:48:42,654 --> 00:48:44,320
But some queries are
going to be sparse,

809
00:48:44,320 --> 00:48:47,420
and for that we're going
to use S1, S2, and so on.

810
00:48:47,420 --> 00:48:50,920
The intuition is the following.

811
00:48:50,920 --> 00:48:53,620
If in your query,
the y-coordinate

812
00:48:53,620 --> 00:48:57,130
is very large,
like say infinity,

813
00:48:57,130 --> 00:48:59,060
then your query is
guaranteed to be dense.

814
00:48:59,060 --> 00:49:01,930
It doesn't matter what x is.

815
00:49:01,930 --> 00:49:04,600
And in general, if
y is near the top,

816
00:49:04,600 --> 00:49:07,692
like it's at the top most
point, or maybe the next to top

817
00:49:07,692 --> 00:49:09,650
most point, or maybe a
little bit farther down,

818
00:49:09,650 --> 00:49:12,700
it depends on the point
set, then a lot of queries

819
00:49:12,700 --> 00:49:14,380
are going to be dense.

820
00:49:14,380 --> 00:49:16,540
So that's good news.

821
00:49:16,540 --> 00:49:21,620
Let's consider the first time
when there's a sparse query.

822
00:49:21,620 --> 00:49:32,980
So we're going to let yi be the
largest y-coordinate where some

823
00:49:32,980 --> 00:49:37,420
query, some x-coordinate--

824
00:49:37,420 --> 00:49:38,626
that y-coordinate.

825
00:49:38,626 --> 00:49:41,125
This is going to be less than
or equal to x, comma less than

826
00:49:41,125 --> 00:49:42,340
or equal to yi--

827
00:49:44,920 --> 00:49:53,326
is sparse in Si minus 1.

828
00:49:53,326 --> 00:49:56,530
OK, so initially we
have S0, all points.

829
00:49:56,530 --> 00:49:59,079
y1 is the largest y
co-ordinate where there's--

830
00:49:59,079 --> 00:50:00,620
so we work our way
down until there's

831
00:50:00,620 --> 00:50:03,556
some sparse query in S0.

832
00:50:03,556 --> 00:50:06,065
That's y1.

833
00:50:06,065 --> 00:50:11,400
So then we just
filter, based on that.

834
00:50:11,400 --> 00:50:15,280
So throw away all
the points above yi.

835
00:50:15,280 --> 00:50:17,860
So we're going to
say take Si minus 1,

836
00:50:17,860 --> 00:50:22,610
intersect it with the
range query, star less than

837
00:50:22,610 --> 00:50:24,950
or equal to yi.

838
00:50:24,950 --> 00:50:28,175
OK, so the picture is
we have some point set.

839
00:50:35,930 --> 00:50:40,370
Up here, every possible
query along this line

840
00:50:40,370 --> 00:50:42,110
is going to be dense
because everything

841
00:50:42,110 --> 00:50:45,260
to the left of the x-coordinate
will be in the output.

842
00:50:45,260 --> 00:50:48,600
At some point, we're going
to decide this is too scary.

843
00:50:48,600 --> 00:50:50,510
There's a query
here, maybe this one,

844
00:50:50,510 --> 00:50:53,630
or maybe it's this
query that's sparse.

845
00:50:53,630 --> 00:50:56,960
And so we say OK, throw
away these points.

846
00:50:56,960 --> 00:50:59,330
Redo the data structure
from here down,

847
00:50:59,330 --> 00:51:03,710
ignoring all these
points, repeat,

848
00:51:03,710 --> 00:51:06,960
and write down these things.

849
00:51:06,960 --> 00:51:09,620
So the idea is that if you
look at a particular query,

850
00:51:09,620 --> 00:51:14,950
it will be dense in
one of these Si's.

851
00:51:14,950 --> 00:51:18,289
And you can tell that just
according to your y-coordinate.

852
00:51:18,289 --> 00:51:20,830
Because you said oh, well, if
you're up here in y-coordinate,

853
00:51:20,830 --> 00:51:23,060
you're guaranteed safe.

854
00:51:23,060 --> 00:51:27,520
So just do that
search and you're OK.

855
00:51:27,520 --> 00:51:32,320
In general, we
continue this process

856
00:51:32,320 --> 00:51:37,150
until we get to some Si
that has constant size.

857
00:51:37,150 --> 00:51:38,950
At that point, we're
done, and then we

858
00:51:38,950 --> 00:51:42,250
can afford to look
through all the points.

859
00:51:42,250 --> 00:51:44,800
Unfortunately, this is
not a very good strategy,

860
00:51:44,800 --> 00:51:50,300
but it's the first cut, and
it's close to what works.

861
00:51:50,300 --> 00:51:52,540
Here's a problem with it.

862
00:51:52,540 --> 00:51:54,775
Suppose you have this point set.

863
00:51:58,960 --> 00:52:02,770
OK, what happens is you start at
the top, everything looks fine.

864
00:52:02,770 --> 00:52:06,510
At some point you decide
there's a query here, namely

865
00:52:06,510 --> 00:52:09,420
this one, which has
an empty answer,

866
00:52:09,420 --> 00:52:13,020
and yet there are points to
the left of this x-coordinate.

867
00:52:13,020 --> 00:52:16,440
So that's bad because
it's very hard

868
00:52:16,440 --> 00:52:18,540
to get within a
constant factor of zero.

869
00:52:18,540 --> 00:52:22,290
So pretty much immediately
you've got to draw a line here

870
00:52:22,290 --> 00:52:30,280
and say OK, S0 is all
points, S1 is these points,

871
00:52:30,280 --> 00:52:33,240
S2 is going to be these points.

872
00:52:33,240 --> 00:52:36,270
In general, there's
suffixes of the points,

873
00:52:36,270 --> 00:52:39,240
and so the total space
will be quadratic.

874
00:52:39,240 --> 00:52:41,990
So the first two
properties will be correct

875
00:52:41,990 --> 00:52:45,780
because you're just looking
in S0, or S1, or whatever.

876
00:52:45,780 --> 00:52:48,030
Everything looks
fine, but your array

877
00:52:48,030 --> 00:52:50,400
does not have linear size.

878
00:52:50,400 --> 00:52:52,650
So no good.

879
00:52:52,650 --> 00:52:54,090
First try, failed.
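
The quadratic blow-up can be seen in a small brute-force sketch. The names `is_sparse` and `first_attempt` are hypothetical. Restricting candidate queries to the points' own coordinates and stopping at a single point are simplifying assumptions, not part of the lecture.

```python
def is_sparse(points, x, y, alpha):
    """True if the query (<=x, <=y) is sparse in `points`."""
    scanned = sum(1 for px, py in points if px <= x)
    output = sum(1 for px, py in points if px <= x and py <= y)
    return scanned > alpha * output


def first_attempt(points, alpha=2.0):
    """Return the list S_0, S_1, ... built by the first-attempt rule.

    Assumption: queries are only tried at coordinates of actual points.
    """
    s = sorted(points)          # S_0: all points, sorted by x
    levels = [s]
    while len(s) > 1:           # stop at constant (here: one) size
        xs = [p[0] for p in s]
        ys = sorted({p[1] for p in s}, reverse=True)
        # y_i: largest y for which some query (<=x, <=y) is sparse
        y_i = next(
            (y for y in ys if any(is_sparse(s, x, y, alpha) for x in xs)),
            None,
        )
        if y_i is None:         # everything is dense, nothing to split
            break
        s = [p for p in s if p[1] <= y_i]   # S_i = S_{i-1} below y_i
        levels.append(s)
    return levels


# Descending staircase: each level only drops the topmost point.
stairs = [(i, 7 - i) for i in range(1, 7)]
sizes = [len(level) for level in first_attempt(stairs)]
print(sizes, sum(sizes))
```

On this staircase every S_i is a suffix of the previous one, so the stored sizes add up to n(n+1)/2.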

880
00:52:59,101 --> 00:53:00,100
Second time's the charm.

881
00:53:11,840 --> 00:53:15,840
You need a little
more sophistication

882
00:53:15,840 --> 00:53:20,712
in how we do this partitioning,
how we build our array,

883
00:53:20,712 --> 00:53:21,420
and we'll get it.

884
00:53:31,810 --> 00:53:33,450
I didn't read this before.

885
00:53:33,450 --> 00:53:40,135
This one line that says
maximize common suffix.

886
00:53:40,135 --> 00:53:42,179
I have no idea what
this means, but maybe it

887
00:53:42,179 --> 00:53:43,470
will mean something by the end.

888
00:53:43,470 --> 00:53:45,120
Let's see.

889
00:53:45,120 --> 00:53:49,530
OK, this is the part I read.

890
00:53:49,530 --> 00:53:53,790
So xi is going to be--

891
00:53:53,790 --> 00:53:56,701
so we had a yi. That's going
to be the same as before.

892
00:53:56,701 --> 00:53:58,200
This is why I did
the first attempt.

893
00:53:58,200 --> 00:54:01,070
This definition
remains the same.

894
00:54:01,070 --> 00:54:06,930
So largest y where we have some
sparse query in Si minus 1.

895
00:54:06,930 --> 00:54:11,014
I want to look at what
that x-coordinate is.

896
00:54:11,014 --> 00:54:12,805
It's just that here it
says there's some x.

897
00:54:15,420 --> 00:54:18,620
What is that x?

898
00:54:18,620 --> 00:54:22,260
Let's just look at the maximum
possible x that it could be.

899
00:54:22,260 --> 00:54:25,150
This will turn out
to be really useful.

900
00:54:25,150 --> 00:54:30,700
The maximum x-coordinate where
less than or equal to xi,

901
00:54:30,700 --> 00:54:33,645
comma less than or
equal to yi is sparse--

902
00:54:37,020 --> 00:54:39,974
in Si minus 1.

903
00:54:39,974 --> 00:54:41,640
OK, we know there's
something we can put

904
00:54:41,640 --> 00:54:43,530
in here that makes yi sparse.

905
00:54:43,530 --> 00:54:47,190
So look at the largest
possible such x.

906
00:54:47,190 --> 00:54:50,100
So that means any query--

907
00:54:50,100 --> 00:54:52,750
so we have this new point.

908
00:54:52,750 --> 00:54:55,060
It's not an actual
point in our problem,

909
00:54:55,060 --> 00:54:59,280
but it's a query, xi, yi.

910
00:54:59,280 --> 00:55:01,740
And it's dense, oh
sorry, it's sparse.

911
00:55:01,740 --> 00:55:03,430
It's bad.

912
00:55:03,430 --> 00:55:11,700
We know that any query
up here is dense.

913
00:55:11,700 --> 00:55:14,280
That was the definition of yi.

914
00:55:14,280 --> 00:55:20,130
And now we also know that
any query over here, I guess,

915
00:55:20,130 --> 00:55:21,990
that's saying a lot.

916
00:55:21,990 --> 00:55:23,969
But these queries
are also dense.

917
00:55:23,969 --> 00:55:26,010
Because again, if you're
far enough to the right,

918
00:55:26,010 --> 00:55:28,020
that's going to be
basically everything.

919
00:55:28,020 --> 00:55:30,570
So let's get rid
of that as well.

920
00:55:30,570 --> 00:55:33,100
And this is a problem,
queries over here

921
00:55:33,100 --> 00:55:34,640
are also potentially a problem.

922
00:55:34,640 --> 00:55:36,600
We don't know.

923
00:55:36,600 --> 00:55:40,830
It doesn't seem like much,
but it will be enough.

924
00:55:40,830 --> 00:55:43,360
We're going to
redefine Si as well.

925
00:55:43,360 --> 00:55:45,210
So here's the fun part.

926
00:55:45,210 --> 00:55:52,134
If we have some
Si minus 1, we're

927
00:55:52,134 --> 00:55:53,550
going to define a
new thing, which

928
00:55:53,550 --> 00:56:03,672
is Pi minus 1, which is this.

929
00:56:03,672 --> 00:56:13,960
This is a funny thing, but it
is this part of the point set.

930
00:56:13,960 --> 00:56:17,990
This is Pi minus 1.

931
00:56:17,990 --> 00:56:20,500
So the points we care
about are kind of here,

932
00:56:20,500 --> 00:56:22,990
but let's just take
everything to the left

933
00:56:22,990 --> 00:56:24,200
of this x-coordinate.

934
00:56:24,200 --> 00:56:24,700
Why not?

935
00:56:24,700 --> 00:56:26,530
It's a thing.

936
00:56:26,530 --> 00:56:28,880
That is Pi minus 1.

937
00:56:28,880 --> 00:56:32,590
So Si minus 1 is
everything in this picture.

938
00:56:32,590 --> 00:56:35,020
First, let's restrict
to x, then the next step

939
00:56:35,020 --> 00:56:37,600
is we're going to restrict to y.

940
00:56:37,600 --> 00:56:39,940
But it's in a funny way.

941
00:56:39,940 --> 00:56:44,200
This is the Si, the next S set.

942
00:56:44,200 --> 00:56:47,110
Take the previous set
and we intersect it

943
00:56:47,110 --> 00:56:49,000
with a funny thing.

944
00:56:52,984 --> 00:56:54,400
It's harder to
write algebraically

945
00:56:54,400 --> 00:56:55,720
than it is to draw the picture.

946
00:57:01,160 --> 00:57:05,800
So it's intersected with a
union, which is basically--

947
00:57:05,800 --> 00:57:08,020
dare I draw it on
the same picture?

948
00:57:08,020 --> 00:57:08,790
Where's my red?

949
00:57:15,960 --> 00:57:18,620
It's going to be less
than or equal to y.

950
00:57:28,640 --> 00:57:30,398
This thing is going to be Si.

951
00:57:39,250 --> 00:57:42,212
We'll see why,
eventually, this works.

952
00:57:42,212 --> 00:57:44,420
I still don't know what
maximize common suffix means,

953
00:57:44,420 --> 00:57:47,450
but we'll get there.

954
00:57:47,450 --> 00:57:50,770
So we're looking at the
points below the line.

955
00:57:50,770 --> 00:57:52,180
That's what we did before.

956
00:57:52,180 --> 00:57:54,910
We used to say Si is just the
intersection with less than

957
00:57:54,910 --> 00:57:56,380
or equal to yi.

958
00:57:56,380 --> 00:57:59,560
But things are just
a little bit messier

959
00:57:59,560 --> 00:58:03,270
because of this restriction.

960
00:58:03,270 --> 00:58:06,400
Do I really not have a P here?

961
00:58:06,400 --> 00:58:08,762
OK, here's the difference.

962
00:58:08,762 --> 00:58:10,720
The reason we have to go
through this business.

963
00:58:10,720 --> 00:58:14,740
The array that we're going
to store is not the Si's.

964
00:58:14,740 --> 00:58:17,061
Si's are still too
big, potentially.

965
00:58:17,061 --> 00:58:18,685
What we're going to
store are the Pi's.

966
00:58:27,664 --> 00:58:29,557
Pi minus 1.

967
00:58:29,557 --> 00:58:31,265
And then in the end,
we're going to store Si.

968
00:58:31,265 --> 00:58:34,250
Si, again, has constant size.

969
00:58:34,250 --> 00:58:35,870
The final Si has constant size.

970
00:58:35,870 --> 00:58:37,953
I probably should have
used a different letter, Sk

971
00:58:37,953 --> 00:58:39,490
or whatever.

972
00:58:39,490 --> 00:58:41,274
We keep doing this
until we get down

973
00:58:41,274 --> 00:58:43,190
to something constant
sized, then we store it.

974
00:58:43,190 --> 00:58:46,460
That's the easy case.

975
00:58:46,460 --> 00:58:50,060
Until then, we just store
the Pi's, because really, we

976
00:58:50,060 --> 00:58:55,800
know that all the queries up
here and over here are OK.

977
00:58:55,800 --> 00:58:57,200
They're nice and dense.

978
00:58:57,200 --> 00:59:02,210
We sort of only care about the
points to the left of the line.

979
00:59:02,210 --> 00:59:07,310
OK, but essentially, the
Si has to pick up the slack

980
00:59:07,310 --> 00:59:11,390
and we have to include
these points in the next Si.

981
00:59:11,390 --> 00:59:12,950
Whereas, before, we did not.

982
00:59:12,950 --> 00:59:14,947
Before we just took
things below the line.

983
00:59:14,947 --> 00:59:17,030
Now we have to take things
that are below the line

984
00:59:17,030 --> 00:59:20,180
or to the right of
the vertical line.
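
In symbols, the second-attempt definitions described here would read roughly as below. This is reconstructed from the spoken description. Using a strict "> x_i" for "to the right of the vertical line" is an assumption.

```latex
% y_i is defined as in the first attempt; x_i, P_{i-1}, and S_i are new.
\begin{align*}
x_i &= \max\{\, x : (\le x, \le y_i) \text{ is sparse in } S_{i-1} \,\} \\
P_{i-1} &= S_{i-1} \cap (\le x_i, *) \\
S_i &= S_{i-1} \cap \bigl( (*, \le y_i) \cup (> x_i, *) \bigr)
\end{align*}
```

The array stores P_0, P_1, ... followed by the final, constant-size S set.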

985
00:59:23,390 --> 00:59:26,231
This is essentially
necessary for correctness.

986
00:59:31,640 --> 00:59:34,480
So we kind of win
some, we lose some.

987
00:59:34,480 --> 00:59:39,730
But it turns out all is well.
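
A brute-force sketch of the second construction, under the same simplifying assumptions as before, is given below. Candidate queries are taken only at point coordinates. The names `is_sparse` and `build_blocks` and the `base` cutoff are hypothetical. It only builds the blocks and does not implement the query.

```python
def is_sparse(points, x, y, alpha):
    """True if the query (<=x, <=y) is sparse in `points`."""
    scanned = sum(1 for px, py in points if px <= x)
    output = sum(1 for px, py in points if px <= x and py <= y)
    return scanned > alpha * output


def build_blocks(points, alpha=2.0, base=1):
    """Return [(y_i, x_i, P_{i-1}), ..., (None, None, final S)]."""
    s = sorted(points)                      # S_0, sorted by x
    blocks = []
    while len(s) > base:
        xs = sorted({p[0] for p in s})
        ys = sorted({p[1] for p in s}, reverse=True)
        # y_i: largest y with some sparse query (same rule as first try)
        y_i = next(
            (y for y in ys if any(is_sparse(s, x, y, alpha) for x in xs)),
            None,
        )
        if y_i is None:                     # no sparse query left
            break
        # x_i: largest x for which (<=x, <=y_i) is sparse
        x_i = max(x for x in xs if is_sparse(s, x, y_i, alpha))
        # P_{i-1}: everything in S_{i-1} to the left of x_i
        blocks.append((y_i, x_i, [p for p in s if p[0] <= x_i]))
        # S_i: below the horizontal line OR right of the vertical line
        s = [p for p in s if p[1] <= y_i or p[0] > x_i]
    blocks.append((None, None, s))          # final small S is stored whole
    return blocks


stairs = [(i, 7 - i) for i in range(1, 7)]
stored = sum(len(block) for _, _, block in build_blocks(stairs))
print(stored)
```

On the staircase that broke the first attempt, each stored block is a single point, so the total stays linear.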

988
00:59:39,730 --> 00:59:46,840
So I know this is weird, but
let's jump to the analysis.

989
00:59:46,840 --> 00:59:51,370
These claims, in particular,
that the array has linear size.

990
00:59:51,370 --> 00:59:54,394
Let's think about that and it
will become clear why the heck

991
00:59:54,394 --> 00:59:55,435
we've made these choices.

992
00:59:57,836 --> 00:59:59,210
Unless you have
a question first.

993
00:59:59,210 --> 01:00:00,194
AUDIENCE: Is there
any relationship

994
01:00:00,194 --> 01:00:02,660
between the Si here and
the Si on the first try?

995
01:00:02,660 --> 01:00:04,480
ERIK DEMAINE: No,
this definition of Si

996
01:00:04,480 --> 01:00:06,340
is no longer in effect.

997
01:00:06,340 --> 01:00:12,250
S0 is correct, and all the
Si's are still sorted by x.

998
01:00:12,250 --> 01:00:13,900
We're no longer doing this.

999
01:00:13,900 --> 01:00:18,010
Instead of this rule,
we're doing this rule.

1000
01:00:18,010 --> 01:00:20,950
This part is the same, but we
have this extra union, which

1001
01:00:20,950 --> 01:00:23,320
contradicts the previous rule.

1002
01:00:23,320 --> 01:00:25,480
So the yi definition
is the same.

1003
01:00:25,480 --> 01:00:26,710
Sorry, it's a little weird.

1004
01:00:26,710 --> 01:00:30,520
xi is new, Pi is
new, and Si is new.

1005
01:00:36,900 --> 01:00:39,940
At this point, it's this
algebraic weird thing.

1006
01:00:39,940 --> 01:00:43,080
Here's the cool thing.

1007
01:00:43,080 --> 01:00:50,930
For the space
bound, the claim is

1008
01:00:50,930 --> 01:00:57,710
Pi minus 1 intersect Si
is less than or equal to 1

1009
01:00:57,710 --> 01:01:04,370
over alpha times Pi minus 1.

1010
01:01:04,370 --> 01:01:06,590
This is hard to even
interpret what it means,

1011
01:01:06,590 --> 01:01:09,450
but it's good news.

1012
01:01:09,450 --> 01:01:12,920
So remember, alpha is
a number bigger than 1.

1013
01:01:12,920 --> 01:01:15,059
It's what we use in the
definition of density,

1014
01:01:15,059 --> 01:01:17,600
and you could set this parameter
to whatever you want, say 2.

1015
01:01:20,420 --> 01:01:23,360
So then we're going to get that
this thing, whatever it is,

1016
01:01:23,360 --> 01:01:25,730
is at most half the size
of the previous one.

1017
01:01:28,640 --> 01:01:30,680
I claim this is good news.

1018
01:01:30,680 --> 01:01:36,080
I claim it means that these Pi's
essentially are geometrically

1019
01:01:36,080 --> 01:01:40,550
decreasing in size,
which is how we get--

1020
01:01:40,550 --> 01:01:45,050
that's not quite right, but this
will give us a charging scheme,

1021
01:01:45,050 --> 01:01:47,530
which will prove that the
whole thing has linear size.

1022
01:01:47,530 --> 01:01:50,830
First, why is this true?

1023
01:01:50,830 --> 01:01:55,010
It could really only be true
for sparsity from the alpha.

1024
01:01:55,010 --> 01:01:57,550
Right, so we said
oh, density is good.

1025
01:01:57,550 --> 01:01:59,840
If we have dense,
there's nothing to do.

1026
01:01:59,840 --> 01:02:03,650
Just put the points in
x order, we're done.

1027
01:02:03,650 --> 01:02:04,670
Sparse is bad.

1028
01:02:04,670 --> 01:02:07,470
But actually, sparse
tells us something.

1029
01:02:07,470 --> 01:02:09,680
It tells us there
are a lot of points

1030
01:02:09,680 --> 01:02:11,550
that are not in the answer.

1031
01:02:11,550 --> 01:02:14,250
So we're looking at
this query, xi yi.

1032
01:02:14,250 --> 01:02:17,750
And we'd like to just say oh,
start at negative infinity,

1033
01:02:17,750 --> 01:02:21,350
and just take all the
points up to here.

1034
01:02:21,350 --> 01:02:24,320
If we're dense, that is
within a constant factor

1035
01:02:24,320 --> 01:02:27,320
of the number of points that are
actually in the answer, which

1036
01:02:27,320 --> 01:02:29,400
is down here.

1037
01:02:29,400 --> 01:02:31,815
If we're sparse, that means
there are a lot of points

1038
01:02:31,815 --> 01:02:34,730
up here.

1039
01:02:34,730 --> 01:02:39,710
Most of the points have to be
up here in order to be sparse.

1040
01:02:39,710 --> 01:02:41,480
And that's actually
what this is saying

1041
01:02:41,480 --> 01:02:43,870
if you expand the definitions.

1042
01:02:43,870 --> 01:02:47,820
So Pi minus 1, that was
all the stuff to the left.

1043
01:02:47,820 --> 01:02:48,980
So that's this thing.

1044
01:02:48,980 --> 01:02:51,410
This is what we would get
if we just did a linear scan

1045
01:02:51,410 --> 01:02:53,810
from left to right.

1046
01:02:53,810 --> 01:02:56,870
Versus we're
considering the points

1047
01:02:56,870 --> 01:03:00,980
in Pi minus 1, which
just restricts to x,

1048
01:03:00,980 --> 01:03:04,090
and then we're looking at Si.

1049
01:03:04,090 --> 01:03:06,890
Si does this business.

1050
01:03:06,890 --> 01:03:09,320
But if we restrict
to the Si points that

1051
01:03:09,320 --> 01:03:11,660
are to the left of the line--

1052
01:03:11,660 --> 01:03:13,460
so we're looking
at, basically, this

1053
01:03:13,460 --> 01:03:17,150
left portion, which was this
white rectangle, intersected

1054
01:03:17,150 --> 01:03:20,750
with this funny red rectangle,
which was kind of awkward--

1055
01:03:20,750 --> 01:03:22,310
the intersection is just this.

1056
01:03:22,310 --> 01:03:26,120
That's the answer for
this query, xi yi.

1057
01:03:26,120 --> 01:03:38,540
OK, so this is the size
of the answer for xi yi.

1058
01:03:38,540 --> 01:03:47,270
And this was the
number of points

1059
01:03:47,270 --> 01:03:50,910
in less than or
equal to xi star.

1060
01:03:53,660 --> 01:03:56,330
We wanted to just do a
linear scan like this.

1061
01:03:56,330 --> 01:03:59,300
But this is the correct
answer and because we

1062
01:03:59,300 --> 01:04:01,190
know that this point
is sparse-- that was

1063
01:04:01,190 --> 01:04:04,280
the definition of xi and yi, it
was the maximum sparse point.

1064
01:04:04,280 --> 01:04:05,930
So it's a sparse
point, therefore

1065
01:04:05,930 --> 01:04:08,584
we know that this does not hold.

1066
01:04:08,584 --> 01:04:11,000
So the number of points less
than or equal to x comma star

1067
01:04:11,000 --> 01:04:13,730
is greater than alpha
times the number

1068
01:04:13,730 --> 01:04:16,190
of points in the correct range.

1069
01:04:16,190 --> 01:04:18,700
And if I got it right,
that should be this.

1070
01:04:18,700 --> 01:04:21,240
You could put alpha over
here without the one over

1071
01:04:21,240 --> 01:04:23,840
and I guess this is
strictly greater.

1072
01:04:23,840 --> 01:04:26,000
No big deal.

1073
01:04:26,000 --> 01:04:28,820
So that's the
definition of sparsity.

1074
01:04:28,820 --> 01:04:30,690
So this is the
cool thing we know.

1075
01:04:30,690 --> 01:04:33,930
Now, we're going to use--

1076
01:04:33,930 --> 01:04:36,581
this is now a
number less than 1.

1077
01:04:36,581 --> 01:04:37,080
Question?

1078
01:04:37,080 --> 01:04:39,520
AUDIENCE: So for Pi
minus 1, we add them

1079
01:04:39,520 --> 01:04:41,472
as the number of points
less than xi star.

1080
01:04:41,472 --> 01:04:42,450
But for example--

1081
01:04:42,450 --> 01:04:43,860
ERIK DEMAINE: Yes, that's
the definition here.

1082
01:04:43,860 --> 01:04:45,276
AUDIENCE: [INAUDIBLE]
like Pi, you

1083
01:04:45,276 --> 01:04:48,980
don't have that block in
the top left corner, right?

1084
01:04:48,980 --> 01:04:50,250
ERIK DEMAINE: Right.

1085
01:04:50,250 --> 01:04:52,080
After we restrict
to Si, yeah, we've

1086
01:04:52,080 --> 01:04:53,490
thrown away all of these points.

1087
01:04:53,490 --> 01:04:53,990
AUDIENCE: Right.

1088
01:04:53,990 --> 01:04:55,990
So if you take the next
Pi, it's not necessarily

1089
01:04:55,990 --> 01:04:58,031
going to be the points
less than or equal to xi--

1090
01:04:58,031 --> 01:04:59,280
ERIK DEMAINE: It's true.

1091
01:04:59,280 --> 01:05:01,650
When I say points, I
don't mean all points.

1092
01:05:01,650 --> 01:05:04,480
I mean points in Si minus 1.

1093
01:05:04,480 --> 01:05:05,880
I'm dropping that
because it gets

1094
01:05:05,880 --> 01:05:08,610
awkward to keep talking about.

1095
01:05:08,610 --> 01:05:10,530
So that's a correctness
issue, essentially.

1096
01:05:10,530 --> 01:05:12,738
You have to argue that we
can throw away these points

1097
01:05:12,738 --> 01:05:14,070
and it's safe.

1098
01:05:14,070 --> 01:05:18,940
Once we do, then you could
just ignore their existence.

1099
01:05:18,940 --> 01:05:21,942
You can ignore their
existence because you already

1100
01:05:21,942 --> 01:05:23,400
solved all the
dense queries, which

1101
01:05:23,400 --> 01:05:26,700
are over here, or over here,
which involve those points.

1102
01:05:26,700 --> 01:05:28,290
And so we now know
that we're only

1103
01:05:28,290 --> 01:05:31,950
going to be doing
queries from here down.

1104
01:05:31,950 --> 01:05:34,140
Otherwise, you look at P0.

1105
01:05:34,140 --> 01:05:35,250
So forget about those.

1106
01:05:35,250 --> 01:05:36,970
Forget about those points.

1107
01:05:36,970 --> 01:05:39,790
Now you're going to be searching
in one of these structures.

1108
01:05:39,790 --> 01:05:42,040
So you can forget about
all the points over here.

1109
01:05:42,040 --> 01:05:44,385
So that's that argument.

1110
01:05:44,385 --> 01:05:46,132
Once you've restricted
to Si minus 1

1111
01:05:46,132 --> 01:05:48,090
and you don't have to
look at any other points,

1112
01:05:48,090 --> 01:05:49,810
among those points,
this is going

1113
01:05:49,810 --> 01:05:52,800
to be all the points
less than or equal to xi.

1114
01:05:52,800 --> 01:05:54,600
But that's how we were
defining sparse.

1115
01:05:54,600 --> 01:05:56,920
We said sparse in Si minus 1.

1116
01:05:56,920 --> 01:05:59,520
So it's among those
points we have sparsity.

1117
01:05:59,520 --> 01:06:03,800
So this is the definition
of what we have.

1118
01:06:03,800 --> 01:06:05,305
OK, the claim is
it's a good thing.

1119
01:06:05,305 --> 01:06:06,430
Here's the charging scheme.

1120
01:06:09,790 --> 01:06:11,010
So this is by sparsity.

1121
01:06:17,910 --> 01:06:29,070
So I'm going to charge storing
Pi minus 1 to Pi minus 1

1122
01:06:29,070 --> 01:06:32,290
minus Si.

1123
01:06:32,290 --> 01:06:35,760
This algebra, I have to
interpret every single time,

1124
01:06:35,760 --> 01:06:37,410
but that's fine.

1125
01:06:37,410 --> 01:06:38,700
Let's look at the picture.

1126
01:06:38,700 --> 01:06:43,692
OK, Pi minus 1 remember, was
this white rectangle over here.

1127
01:06:43,692 --> 01:06:45,150
Everything to the
left of the line.

1128
01:06:47,690 --> 01:06:49,950
We have to store Pi.

1129
01:06:49,950 --> 01:06:52,950
We want that the sum of the
sizes of the Pi's is good.

1130
01:06:52,950 --> 01:06:54,510
And so here's my
charging scheme.

1131
01:06:54,510 --> 01:06:56,590
We have to store Pi minus 1.

1132
01:06:56,590 --> 01:06:59,200
I'm going to charge
it to these points.

1133
01:06:59,200 --> 01:07:00,985
What are those points?

1134
01:07:00,985 --> 01:07:03,360
Those are the points that are
inside the white rectangle,

1135
01:07:03,360 --> 01:07:08,220
but outside the red L-shape.

1136
01:07:08,220 --> 01:07:09,900
So that's these points.

1137
01:07:09,900 --> 01:07:14,730
This is Pi minus 1 minus Si.

1138
01:07:14,730 --> 01:07:16,830
Those are the points
that I'm throwing away.

1139
01:07:16,830 --> 01:07:17,730
That's good.

1140
01:07:17,730 --> 01:07:21,810
So if I charge them now, I will
never charge them in the future

1141
01:07:21,810 --> 01:07:23,670
because I just threw them away.

1142
01:07:23,670 --> 01:07:25,020
They are not in the next Si.

1143
01:07:28,800 --> 01:07:32,490
Each point overall in the point
set only gets charged once.

1144
01:07:41,080 --> 01:07:43,960
OK, how much does
it get charged?

1145
01:07:43,960 --> 01:07:47,380
How do these things relate
to each other in size?

1146
01:07:47,380 --> 01:07:49,660
That's where we use this thing.

1147
01:07:49,660 --> 01:07:53,170
It gets confusing to think about
intersection versus difference,

1148
01:07:53,170 --> 01:07:56,150
but the point is if we look
at the Pi minus ones that are

1149
01:07:56,150 --> 01:08:00,070
in Si, that's a small fraction.

1150
01:08:00,070 --> 01:08:01,750
Think of alpha as 100.

1151
01:08:01,750 --> 01:08:06,880
So then the Pi minus 1-- so this
part down here that's in Si,

1152
01:08:06,880 --> 01:08:11,030
this is only 1/100 of the
whole white rectangle.

1153
01:08:11,030 --> 01:08:17,800
So that means this part
is 99/100 of the Pi.

1154
01:08:17,800 --> 01:08:20,950
So if we charged the storing
of the entire rectangle

1155
01:08:20,950 --> 01:08:24,279
to these guys, we're only
losing a very small factor

1156
01:08:24,279 --> 01:08:26,840
like 100/99 or something.

1157
01:08:26,840 --> 01:08:30,460
It isn't actually exactly
100/99, I believe.

1158
01:08:30,460 --> 01:08:34,689
I worked it out and
the factor of charging,

1159
01:08:34,689 --> 01:08:37,120
assuming I did it
correctly, is 1 over 1

1160
01:08:37,120 --> 01:08:41,020
minus 1 over alpha,
which works out

1161
01:08:41,020 --> 01:08:43,420
to alpha over alpha minus 1.

1162
01:08:43,420 --> 01:08:46,279
It doesn't really matter, but
the point is it's constant.

1163
01:08:46,279 --> 01:08:48,622
I think that's easy to believe.

1164
01:08:48,622 --> 01:08:51,080
Maybe it's actually easiest to
think about when alpha is 2.

1165
01:08:54,939 --> 01:08:56,920
At most, half the
points are here.

1166
01:08:56,920 --> 01:08:58,710
At least, half the
points are here.

1167
01:08:58,710 --> 01:09:01,210
And so we're charging
storing the entire point set

1168
01:09:01,210 --> 01:09:03,850
to these points, which will
never get charged again.

1169
01:09:03,850 --> 01:09:06,220
So we're only charging
with a factor of two.

1170
01:09:06,220 --> 01:09:09,000
That's all we need,
a constant factor.

1171
01:09:09,000 --> 01:09:12,580
OK, therefore, this
thing has linear size.
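
The charging argument can be written as a short chain of inequalities. Here n is the total number of points, and the array is taken to hold the P blocks plus a constant-size final S, as described above.

```latex
\begin{align*}
|P_{i-1} \cap S_i| &\le \tfrac{1}{\alpha}\,|P_{i-1}|
  && \text{(sparsity of } (\le x_i, \le y_i) \text{ in } S_{i-1}) \\
|P_{i-1} \setminus S_i| &\ge \Bigl(1 - \tfrac{1}{\alpha}\Bigr)\,|P_{i-1}| \\
|P_{i-1}| &\le \frac{\alpha}{\alpha - 1}\,|P_{i-1} \setminus S_i| \\
\sum_i |P_{i-1}| &\le \frac{\alpha}{\alpha - 1} \sum_i |P_{i-1} \setminus S_i|
  \le \frac{\alpha}{\alpha - 1}\, n = O(n)
\end{align*}
```

The last step uses the fact that the sets P_{i-1} \setminus S_i are disjoint: a point thrown away at step i is not in any later S set. For \alpha = 2 the factor is 2.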

1172
01:09:12,580 --> 01:09:13,870
That's the cool thing.

1173
01:09:13,870 --> 01:09:15,250
We get more though.

1174
01:09:15,250 --> 01:09:16,840
We also get the
query bound we want.

1175
01:09:20,394 --> 01:09:21,810
Let's think about
the query bound.

1176
01:09:31,590 --> 01:09:32,818
This is fun.

1177
01:09:32,818 --> 01:09:34,109
Think about where the query is.

1178
01:09:34,109 --> 01:09:36,420
It used to be over here.

1179
01:09:36,420 --> 01:09:41,220
We do a search in S0,
or we do a search in S1,

1180
01:09:41,220 --> 01:09:43,950
or we do a search in S2.

1181
01:09:43,950 --> 01:09:47,819
We'd never look at multiple Si's
because there'd be no point.

1182
01:09:47,819 --> 01:09:50,420
Either S0 was dense, and
we're fine, just do it.

1183
01:09:50,420 --> 01:09:53,702
Or you have to jump to
S1, skip some guys up top,

1184
01:09:53,702 --> 01:09:54,660
do the search in there.

1185
01:09:54,660 --> 01:09:55,600
Fine.

1186
01:09:55,600 --> 01:09:58,080
We no longer have that luxury
over here because we're using

1187
01:09:58,080 --> 01:09:59,970
Pi's instead of Si's.

1188
01:09:59,970 --> 01:10:02,730
So it actually may be
the search starts in P1,

1189
01:10:02,730 --> 01:10:06,630
but then has to go through
P2, and has to go through P3.

1190
01:10:06,630 --> 01:10:09,720
But it's OK because
the farther we

1191
01:10:09,720 --> 01:10:14,040
go right, we have this sparsity
condition that tells us

1192
01:10:14,040 --> 01:10:17,689
basically the points
we're looking at are--

1193
01:10:17,689 --> 01:10:19,230
the number of points
we're looking at

1194
01:10:19,230 --> 01:10:20,563
are getting smaller and smaller.

1195
01:10:23,560 --> 01:10:26,070
So I'll wave my hands
a little bit here,

1196
01:10:26,070 --> 01:10:30,390
but the claim is it's
a geometric series.

1197
01:10:33,130 --> 01:10:37,860
This needs a formal proof, but
we won't go through it here.

1198
01:10:37,860 --> 01:10:43,680
Decreasing-- so this
is the query bound.

1199
01:10:43,680 --> 01:10:47,670
The number of scanned
points is order output size.

1200
01:10:47,670 --> 01:10:50,514
So you have to check that no
matter where you start in Pi--

1201
01:10:50,514 --> 01:10:51,930
that's the little
bit tricky part.

1202
01:10:51,930 --> 01:10:53,760
We're not looking at all of Pi.

1203
01:10:53,760 --> 01:10:56,760
We're looking at
some of Pi and then

1204
01:10:56,760 --> 01:10:59,222
we're going to the
right from there.

1205
01:10:59,222 --> 01:11:00,180
Actually, is that true?

1206
01:11:00,180 --> 01:11:01,596
Maybe we always
look at all of Pi.

1207
01:11:04,010 --> 01:11:05,010
Let me think about this.

1208
01:11:08,410 --> 01:11:10,170
I think we do, actually.

1209
01:11:10,170 --> 01:11:12,477
Sorry.

1210
01:11:12,477 --> 01:11:13,560
That's what we did before.

1211
01:11:18,360 --> 01:11:20,789
We basically figure out
where we are in y-coordinate.

1212
01:11:20,789 --> 01:11:22,080
That was the overall structure.

1213
01:11:22,080 --> 01:11:25,290
We had a Van Emde
Boas search tree on y.

1214
01:11:25,290 --> 01:11:29,400
So all we know at this point is
the y-coordinate of our search.

1215
01:11:29,400 --> 01:11:31,830
And so we use that to
determine which of the Pi's we

1216
01:11:31,830 --> 01:11:37,120
go to, based on where the
yi becomes no longer dense.

1217
01:11:39,279 --> 01:11:41,820
And then we're going to have to
search through that entire Pi

1218
01:11:41,820 --> 01:11:47,100
and potentially more
of them because this

1219
01:11:47,100 --> 01:11:48,390
is no longer an Si.

1220
01:11:48,390 --> 01:11:51,150
It's just doing the
things to the left.

1221
01:11:51,150 --> 01:11:56,250
And so if we're lucky,
the Pi we're looking at,

1222
01:11:56,250 --> 01:12:01,647
or the query we're doing, is
not to the right of this point.

1223
01:12:01,647 --> 01:12:02,730
OK, maybe it's right here.

1224
01:12:02,730 --> 01:12:03,563
That would be great.

1225
01:12:03,563 --> 01:12:06,809
Then all our answers are done.

1226
01:12:06,809 --> 01:12:08,850
If our query is here, that
would have been dense,

1227
01:12:08,850 --> 01:12:11,400
so we would have done
it at an earlier stage.

1228
01:12:11,400 --> 01:12:13,860
Our query might be
down here though.

1229
01:12:13,860 --> 01:12:19,182
When the query's down here, we
need to report on these points.

1230
01:12:19,182 --> 01:12:20,640
Then we're going
to have to do more

1231
01:12:20,640 --> 01:12:22,860
and that's going
to be Pi plus 1.

1232
01:12:22,860 --> 01:12:25,950
So we'll do more and
more Pi's until we

1233
01:12:25,950 --> 01:12:30,930
get to our actual query here.

1234
01:12:30,930 --> 01:12:33,270
But in any case, the claim
is that this is geometrically

1235
01:12:33,270 --> 01:12:35,110
decreasing by the
same charging scheme.

1236
01:12:38,150 --> 01:12:40,220
OK, that's two out
of the three claims.

1237
01:12:40,220 --> 01:12:46,530
There's one more, which
is closely related.

1238
01:12:46,530 --> 01:12:48,520
It's still about
the query problem.

1239
01:12:48,520 --> 01:12:51,490
What we haven't shown is that
we actually find all the points.

1240
01:12:51,490 --> 01:12:53,760
This is what you might
call correctness.

1241
01:12:57,640 --> 01:13:01,360
To prove this, what
we need to say--

1242
01:13:01,360 --> 01:13:03,890
what we claim is that
after you do the P1's--

1243
01:13:07,280 --> 01:13:08,430
and now you do the P2's.

1244
01:13:12,910 --> 01:13:13,900
Well, I'll tell you.

1245
01:13:13,900 --> 01:13:18,790
The claim is that you visited
some x-coordinates here.

1246
01:13:18,790 --> 01:13:21,380
The Pi's were all the things
up to some x-coordinate.

1247
01:13:21,380 --> 01:13:23,530
Claim that the very
next point in here,

1248
01:13:23,530 --> 01:13:27,231
in P2, has a smaller
x-coordinate than what you just

1249
01:13:27,231 --> 01:13:27,730
did.

1250
01:13:30,600 --> 01:13:34,110
I think that should be clear
because presumably there

1251
01:13:34,110 --> 01:13:38,610
are some points in here, and
so the very next Pi, it's

1252
01:13:38,610 --> 01:13:40,050
restricted within
this red thing,

1253
01:13:40,050 --> 01:13:41,700
but it's going to be up
to some x-coordinate.

1254
01:13:41,700 --> 01:13:43,116
So you're basically
starting over.

1255
01:13:43,116 --> 01:13:47,310
Every time you go to the Pi's,
you're starting over in x.

1256
01:13:47,310 --> 01:13:50,010
Go back to minus infinity in x.

1257
01:13:50,010 --> 01:13:53,130
So the idea is the picture
will look something like this.

1258
01:13:53,130 --> 01:13:55,266
You start at minus infinity,
you read some points.

1259
01:13:55,266 --> 01:13:56,890
At some point, you
run out of the Pi's.

1260
01:13:56,890 --> 01:13:59,597
Then you start over again,
you read some smaller set

1261
01:13:59,597 --> 01:14:00,180
of the points.

1262
01:14:00,180 --> 01:14:01,471
Maybe you get a little farther.

1263
01:14:01,471 --> 01:14:04,140
You start over again,
read a little farther.

1264
01:14:04,140 --> 01:14:07,050
At some point, you're going
to reach your threshold x.

1265
01:14:07,050 --> 01:14:08,990
That's when you stop.

1266
01:14:08,990 --> 01:14:10,726
So that's correctness.

1267
01:14:13,510 --> 01:14:15,260
I feel like I need
another sentence there.

1268
01:14:19,310 --> 01:14:23,180
Once your Pi encompasses
your x range,

1269
01:14:23,180 --> 01:14:24,620
that's going to
have your answer.

1270
01:14:24,620 --> 01:14:25,590
Then you're done.

1271
01:14:25,590 --> 01:14:27,260
So that's this moment.

1272
01:14:27,260 --> 01:14:32,690
And so the only worry is that
an early Pi, basically, or maybe

1273
01:14:32,690 --> 01:14:35,450
the next Pi does
this, and then we

1274
01:14:35,450 --> 01:14:36,757
do this or something like this.

1275
01:14:36,757 --> 01:14:38,840
That never happens basically
because you're always

1276
01:14:38,840 --> 01:14:39,770
resetting x range.

1277
01:14:39,770 --> 01:14:42,320
And so your x will always
start over to something

1278
01:14:42,320 --> 01:14:44,060
less than what you had.

1279
01:14:44,060 --> 01:14:46,760
And so the
termination condition,

1280
01:14:46,760 --> 01:14:51,090
which I probably didn't
write down here, but which is

1281
01:14:51,090 --> 01:14:54,140
stop when your x-coordinate
is bigger than what you want.

1282
01:14:54,140 --> 01:14:56,210
Never terminates early.

1283
01:14:56,210 --> 01:14:59,480
Therefore we get all the
points we care about.

1284
01:14:59,480 --> 01:15:01,250
OK, a little bit
hand-wavy, but that

1285
01:15:01,250 --> 01:15:04,520
is why this structure works.

1286
01:15:04,520 --> 01:15:10,740
It's a very weird setup, but
linear sized, and you just

1287
01:15:10,740 --> 01:15:12,510
jump into the right
point in the array,

1288
01:15:12,510 --> 01:15:15,300
start reading, throw
away the points that

1289
01:15:15,300 --> 01:15:17,790
aren't in your range because
they just happen to be there.

1290
01:15:17,790 --> 01:15:20,270
Those would be these
points up here.

1291
01:15:20,270 --> 01:15:24,560
Throw away duplicates.

1292
01:15:24,560 --> 01:15:28,010
Just output the points in
your range and it gives you,

1293
01:15:28,010 --> 01:15:31,790
magically, all the points
in here by a linear scan.

1294
01:15:31,790 --> 01:15:35,310
I still find this so
weird, but it's true.

1295
01:15:38,050 --> 01:15:40,740
Truth is stranger
than fiction, I guess.

1296
01:15:40,740 --> 01:15:41,620
There are fun facts.

1297
01:15:41,620 --> 01:15:45,220
You can actually compute this
thing in the sorting bound.

1298
01:15:45,220 --> 01:15:48,700
So pre-processing is just sort.

1299
01:15:48,700 --> 01:15:52,125
I won't prove that here.

1300
01:15:52,125 --> 01:15:53,440
So this was two sided.

1301
01:15:53,440 --> 01:15:56,080
Let me briefly tell
you how to solve

1302
01:15:56,080 --> 01:15:58,300
three sided and four sided.

1303
01:15:58,300 --> 01:16:02,830
We basically already did
this one, which was--

1304
01:16:02,830 --> 01:16:05,110
I'll remind you
what it looks like.

1305
01:16:10,550 --> 01:16:15,160
So you have a binary
tree, and in each node

1306
01:16:15,160 --> 01:16:18,400
you store two
augmented structures.

1307
01:16:18,400 --> 01:16:20,910
One which can do range
queries like this,

1308
01:16:20,910 --> 01:16:23,707
and one which can do inverted
range queries like this.

1309
01:16:23,707 --> 01:16:24,790
This should look familiar.

1310
01:16:27,900 --> 01:16:31,930
And so you do a search on--

1311
01:16:31,930 --> 01:16:34,420
let's say we want
to do this thing.

1312
01:16:34,420 --> 01:16:39,490
So we have x1, x2, y2.

1313
01:16:39,490 --> 01:16:41,740
You search for x1,
you search for x2.

1314
01:16:41,740 --> 01:16:50,140
You find the LCA and then in
this subtree, you do a search.

1315
01:16:50,140 --> 01:16:52,900
In this subtree, you already
know that you're less than x2,

1316
01:16:52,900 --> 01:17:02,650
and so you do the x1,
y2 search in this node.

1317
01:17:02,650 --> 01:17:09,460
And then in the right subtree,
you do the x2, y2 search.

1318
01:17:09,460 --> 01:17:15,170
You take the union of those two
results and that is this query.

1319
01:17:15,170 --> 01:17:18,370
That's how we did it before.

1320
01:17:18,370 --> 01:17:19,560
No difficulty here.
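
[Editor's sketch, not from the lecture.] The three-sided reduction just described can be illustrated roughly as follows, assuming distinct x-coordinates. The names `Node`, `build`, `query_x_ge`, `query_x_le`, and `three_sided` are hypothetical, and the two-sided queries are brute-force stand-ins for the two-sided structure from earlier in the lecture, so this shows only the decomposition at the LCA, not the I/O bounds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Node:
    points: List[Point]              # every point stored below this node
    split: float = 0.0               # left subtree: x <= split, right subtree: x > split
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points: List[Point]) -> Node:
    """Balanced binary tree keyed on x (assumes distinct x-coordinates)."""
    pts = sorted(points)
    if len(pts) <= 1:
        return Node(points=pts)
    mid = len(pts) // 2
    return Node(points=pts, split=pts[mid - 1][0],
                left=build(pts[:mid]), right=build(pts[mid:]))

def query_x_ge(points: List[Point], x1: float, y2: float) -> List[Point]:
    # Stand-in for a two-sided structure answering x >= x1, y <= y2.
    return [p for p in points if p[0] >= x1 and p[1] <= y2]

def query_x_le(points: List[Point], x2: float, y2: float) -> List[Point]:
    # Stand-in for the mirrored two-sided structure: x <= x2, y <= y2.
    return [p for p in points if p[0] <= x2 and p[1] <= y2]

def three_sided(root: Node, x1: float, x2: float, y2: float) -> List[Point]:
    """Report points with x1 <= x <= x2 and y <= y2."""
    node = root
    while node.left is not None:
        if x2 <= node.split:         # whole x-range lies in the left subtree
            node = node.left
        elif x1 > node.split:        # whole x-range lies in the right subtree
            node = node.right
        else:
            # node is the LCA: the left subtree already satisfies x < x2 and
            # the right subtree already satisfies x > x1, so one two-sided
            # query per side suffices.
            return (query_x_ge(node.left.points, x1, y2) +
                    query_x_le(node.right.points, x2, y2))
    # Reached a leaf without splitting: check its (at most one) point directly.
    return [p for p in node.points if x1 <= p[0] <= x2 and p[1] <= y2]
```

For example, `three_sided(build(pts), 2, 5, 3)` takes the union of one query on each child of the LCA and never visits deeper nodes.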

1321
01:17:19,560 --> 01:17:22,240
And the point is,
you can build this,

1322
01:17:22,240 --> 01:17:24,040
put it in a van
Emde Boas layout.

1323
01:17:24,040 --> 01:17:25,960
You do this search,
you do this search,

1324
01:17:25,960 --> 01:17:28,300
you find the LCA in
log base B of N--

1325
01:17:28,300 --> 01:17:30,760
to check that everything
works, cache obliviously.

1326
01:17:30,760 --> 01:17:33,700
Then these structures are just
structures which we already

1327
01:17:33,700 --> 01:17:37,120
built, and so yes,
we lose a log factor

1328
01:17:37,120 --> 01:17:42,190
because every point appears
in log data structures,

1329
01:17:42,190 --> 01:17:44,419
but that's it.

1330
01:17:44,419 --> 01:17:45,710
Everything else works the same.

1331
01:17:45,710 --> 01:17:48,430
So we get N log N
space, log base B of N

1332
01:17:48,430 --> 01:17:51,144
plus output over B query.

1333
01:17:51,144 --> 01:17:53,560
Because now we just have to
do two queries instead of one.

1334
01:17:53,560 --> 01:17:56,908
We don't lose a log factor.

1335
01:17:56,908 --> 01:18:01,810
That's the trick we did
before. OK, that was easy.

1336
01:18:05,510 --> 01:18:08,071
One more.

1337
01:18:08,071 --> 01:18:09,070
So that was three sided.

1338
01:18:13,580 --> 01:18:15,450
Next is four sided.

1339
01:18:20,850 --> 01:18:23,520
Four sided, of course, we could
do exactly the same thing.

1340
01:18:23,520 --> 01:18:25,035
Lose another log
factor in space.

1341
01:18:29,040 --> 01:18:33,300
Maintain log base B of N plus
output over B query time.

1342
01:18:33,300 --> 01:18:36,120
But I want to do
slightly better and this

1343
01:18:36,120 --> 01:18:39,150
is a trick we could have done
in internal memory as well.

1344
01:18:39,150 --> 01:18:42,270
But I have two minutes
to show it to you.

1345
01:18:42,270 --> 01:18:44,360
So here's a bonus.

1346
01:18:46,980 --> 01:18:49,230
Didn't have to do this in
external memory context,

1347
01:18:49,230 --> 01:18:49,800
but we can.

1348
01:18:52,500 --> 01:18:54,690
Four sided.

1349
01:18:54,690 --> 01:18:58,200
So we're going to do
the same thing, but not

1350
01:18:58,200 --> 01:19:00,150
on a binary tree.

1351
01:19:00,150 --> 01:19:05,120
Take this binary tree, this
is sorted by x, I suppose.

1352
01:19:05,120 --> 01:19:06,910
This is key on x.

1353
01:19:09,540 --> 01:19:16,200
Instead of making it binary,
make it root-log-N-ary.

1354
01:19:16,200 --> 01:19:18,300
So imagine taking
the binary tree,

1355
01:19:18,300 --> 01:19:22,130
taking little chunks, which
have size square root log

1356
01:19:22,130 --> 01:19:27,630
N. It's capital N. And
imagine contracting

1357
01:19:27,630 --> 01:19:29,220
those chunks into single nodes.

1358
01:19:29,220 --> 01:19:32,280
So we have a single node
which has square root

1359
01:19:32,280 --> 01:19:38,290
log N children. [INAUDIBLE]
has square root log N children.

1360
01:19:38,290 --> 01:19:41,470
This is all static.

1361
01:19:41,470 --> 01:19:42,190
And so on.

1362
01:19:42,190 --> 01:19:45,060
Otherwise, the same.

1363
01:19:45,060 --> 01:19:47,890
The augmentation is going to
be a little bit different.

1364
01:19:47,890 --> 01:19:50,490
If we look at a
node, we're going

1365
01:19:50,490 --> 01:19:52,950
to store the same
things we had before,

1366
01:19:52,950 --> 01:19:57,270
which was this kind of query,
and this kind of query.

1367
01:19:57,270 --> 01:20:00,120
We're going to store
a little bit more.

1368
01:20:00,120 --> 01:20:04,230
Namely, for any
interval of children,

1369
01:20:04,230 --> 01:20:08,010
like here you have some start
child and some end child.

1370
01:20:08,010 --> 01:20:12,060
I want to store for all the
points that are down there.

1371
01:20:12,060 --> 01:20:16,290
For this thing, store a
regular binary search tree

1372
01:20:16,290 --> 01:20:20,650
on y for those points.

1373
01:20:20,650 --> 01:20:21,510
Why?

1374
01:20:21,510 --> 01:20:24,090
Because if we do a search--

1375
01:20:24,090 --> 01:20:32,610
OK, same deal-- we
find the LCA of x1, x2?

1376
01:20:32,610 --> 01:20:33,920
I don't know.

1377
01:20:33,920 --> 01:20:35,760
Let's say it's on x.

1378
01:20:35,760 --> 01:20:39,940
We'll have to do it
again on y, whatever.

1379
01:20:39,940 --> 01:20:41,820
So here's the LCA.

1380
01:20:41,820 --> 01:20:43,730
Let's say there's
a lot of children.

1381
01:20:43,730 --> 01:20:52,410
OK, maybe here is
x1 and here is x2.

1382
01:20:52,410 --> 01:20:56,490
So in this subtree, we do this--

1383
01:20:56,490 --> 01:21:00,810
sorry, we do this
range query because we

1384
01:21:00,810 --> 01:21:03,330
want to go from x1 to infinity.

1385
01:21:03,330 --> 01:21:07,440
Over in this subtree, we
want to do this range query

1386
01:21:07,440 --> 01:21:11,430
because we want to go from
negative infinity to x2.

1387
01:21:11,430 --> 01:21:13,620
But then there's all
this stuff in the middle.

1388
01:21:13,620 --> 01:21:17,310
I don't want to have to do a
query for every single tree.

1389
01:21:17,310 --> 01:21:19,410
Instead, I have this
augmentation that

1390
01:21:19,410 --> 01:21:22,260
says for this interval,
here are all the points

1391
01:21:22,260 --> 01:21:24,870
sorted by x-coordinate.

1392
01:21:24,870 --> 01:21:27,030
I guess we're doing it this way.

1393
01:21:30,090 --> 01:21:33,780
Fine, so then it
is a range query.

1394
01:21:33,780 --> 01:21:37,170
I want to know what
are all the points.

1395
01:21:37,170 --> 01:21:39,390
Whoa, this is confusing.

1396
01:21:39,390 --> 01:21:41,190
I feel like I've
missed something here.

1397
01:21:41,190 --> 01:21:41,780
No, this is on y.

1398
01:21:41,780 --> 01:21:42,280
Sorry.

1399
01:21:45,360 --> 01:21:47,460
These points I've
got sorted by y.

1400
01:21:47,460 --> 01:21:50,250
So I should draw
it the other way.

1401
01:21:50,250 --> 01:21:54,240
These points we already know
are in-between x1 and x2 in x.

1402
01:21:54,240 --> 01:21:56,320
We've already solved
the x problem here.

1403
01:21:56,320 --> 01:22:03,510
So now I just need to restrict
to the y range from y1 to y2.

1404
01:22:03,510 --> 01:22:06,330
In these trees, these
already match in x.

1405
01:22:06,330 --> 01:22:08,610
I just need to make
sure they match in y.

1406
01:22:08,610 --> 01:22:10,910
So I do a regular 1D range tree.

1407
01:22:10,910 --> 01:22:12,570
I search for y1,
I search for y2,

1408
01:22:12,570 --> 01:22:14,490
take all the points in between.

1409
01:22:14,490 --> 01:22:18,510
This is cheap if I just have a
regular old binary search tree.

1410
01:22:18,510 --> 01:22:22,380
Now, this thing has linear size.

1411
01:22:22,380 --> 01:22:29,340
This thing has-- sorry,
I think I actually need--

1412
01:22:29,340 --> 01:22:31,440
I should have a three
sided range query.

1413
01:22:31,440 --> 01:22:33,780
Thanks.

1414
01:22:33,780 --> 01:22:37,080
These should be three
sided because here I

1415
01:22:37,080 --> 01:22:40,044
know that I've got the
right side covered already

1416
01:22:40,044 --> 01:22:42,210
in this tree, I've got the
left side covered already

1417
01:22:42,210 --> 01:22:44,970
in this tree, but I still need
the remaining three sides.

1418
01:22:44,970 --> 01:22:46,980
In here, I only
need these two sides

1419
01:22:46,980 --> 01:22:50,050
because I've already
got x1 and x2 covered.
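
[Editor's sketch, not from the lecture.] The four-sided decomposition at one wide node might look roughly like this. `WideNode`, its `by_y` table, and the two brute-force `three_sided_*` helpers are hypothetical stand-ins: the real structure would use the three-sided structure above for the two end children. The child indices `a` and `b` (the children containing x1 and x2) are assumed to have been found already by the search on x.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def three_sided_x_ge(points: List[Point], x1: float, y1: float, y2: float) -> List[Point]:
    # Stand-in for a three-sided structure: x >= x1 and y1 <= y <= y2.
    return [p for p in points if p[0] >= x1 and y1 <= p[1] <= y2]

def three_sided_x_le(points: List[Point], x2: float, y1: float, y2: float) -> List[Point]:
    # Stand-in for the mirrored three-sided structure: x <= x2 and y1 <= y <= y2.
    return [p for p in points if p[0] <= x2 and y1 <= p[1] <= y2]

class WideNode:
    def __init__(self, children: List[List[Point]]):
        # children[i] holds the points below child i; children are ordered by x.
        self.children = children
        # For every interval of children (i..j), keep the points sorted by y.
        # A sorted list with binary search plays the role of the BST on y.
        self.by_y: Dict[Tuple[int, int], Tuple[List[float], List[Point]]] = {}
        for i in range(len(children)):
            merged: List[Point] = []
            for j in range(i, len(children)):
                merged = sorted(merged + children[j], key=lambda p: (p[1], p[0]))
                self.by_y[(i, j)] = ([p[1] for p in merged], merged)

    def query(self, a: int, b: int, x1: float, x2: float,
              y1: float, y2: float) -> List[Point]:
        """Report points in [x1, x2] x [y1, y2]; requires a < b."""
        out = three_sided_x_ge(self.children[a], x1, y1, y2)   # left end child
        out += three_sided_x_le(self.children[b], x2, y1, y2)  # right end child
        if a + 1 <= b - 1:
            # Middle children already match in x, so only y must be checked:
            # a 1D range search on the precomputed y-sorted list.
            ys, pts = self.by_y[(a + 1, b - 1)]
            out += pts[bisect_left(ys, y1):bisect_right(ys, y2)]
        return out
```

The point of the design is that the middle children cost a single 1D search no matter how many of them there are.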

1420
01:22:50,050 --> 01:22:51,471
OK, so this is cheap.

1421
01:22:51,471 --> 01:22:53,220
I only need a linear
space data structure.

1422
01:22:53,220 --> 01:22:54,570
This thing is not so cheap.

1423
01:22:54,570 --> 01:22:56,940
I'm using the previous
data structure.

1424
01:22:56,940 --> 01:22:58,950
This thing, which
has N log N size,

1425
01:22:58,950 --> 01:23:01,330
these are three
sided range queries.

1426
01:23:01,330 --> 01:23:03,000
Sorry for drawing it wrong.

1427
01:23:05,550 --> 01:23:08,760
So I need two three
sided structures.

1428
01:23:08,760 --> 01:23:10,290
Then I need actually
a whole bunch

1429
01:23:10,290 --> 01:23:13,440
of these structures because
this was for every interval.

1430
01:23:13,440 --> 01:23:15,660
But conveniently, there are
only log N intervals

1431
01:23:15,660 --> 01:23:17,500
because there's
root log N children.

1432
01:23:17,500 --> 01:23:21,410
So root log N squared is
log N. So each one is size N,

1433
01:23:21,410 --> 01:23:24,460
but then we need log N of them.

1434
01:23:24,460 --> 01:23:27,260
And so that's why these
things balance out.

1435
01:23:27,260 --> 01:23:28,820
See?

1436
01:23:28,820 --> 01:23:34,790
So normally, this would be N log
squared N because every point

1437
01:23:34,790 --> 01:23:37,040
would appear in log N trees.

1438
01:23:37,040 --> 01:23:40,220
But now the height
of my tree is merely

1439
01:23:40,220 --> 01:23:44,550
log N over log log
N with a factor

1440
01:23:44,550 --> 01:23:49,170
2 out here because I
have a square root here.

1441
01:23:49,170 --> 01:23:51,950
OK, so the tree has height
log N over log log N.

1442
01:23:51,950 --> 01:23:54,770
So each point only appears
in log N over log log N

1443
01:23:54,770 --> 01:23:55,760
structures.

1444
01:23:55,760 --> 01:23:58,580
Each of them needs a
structure of size N log N.

1445
01:23:58,580 --> 01:24:03,470
So we end up with N log
squared N over log log N space.
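
[Editor's worked arithmetic, not from the lecture.] The space count above can be written out as follows, ignoring constant factors except the 2 mentioned in the lecture. Each level of the tree costs N log N twice over: once for the three-sided structures, and once for the linear-size y-sorted trees, of which there are at most log N per node.

```latex
\begin{align*}
\text{height} &= \log_{\sqrt{\log N}} N
  = \frac{\log N}{\tfrac{1}{2}\log\log N}
  = \frac{2\log N}{\log\log N},\\
\#\text{intervals per node} &\le \left(\sqrt{\log N}\right)^{2} = \log N,\\
\text{space per level} &= \underbrace{O(N\log N)}_{\text{three-sided}}
  + \underbrace{O(N)\cdot\log N}_{\text{interval trees on } y}
  = O(N\log N),\\
\text{total space} &= O(N\log N)\cdot\frac{2\log N}{\log\log N}
  = O\!\left(\frac{N\log^{2} N}{\log\log N}\right).
\end{align*}
```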

1446
01:24:03,470 --> 01:24:05,150
Kind of crazy, but
this is how you

1447
01:24:05,150 --> 01:24:07,850
get that last little bit of
log log N space improvement

1448
01:24:07,850 --> 01:24:10,400
by contracting nodes,
doing a simpler data

1449
01:24:10,400 --> 01:24:14,780
structure for these
middle children,

1450
01:24:14,780 --> 01:24:16,290
and just focusing on--

1451
01:24:16,290 --> 01:24:19,280
The left child and the right
child you have to do one three

1452
01:24:19,280 --> 01:24:20,930
sided call, but
then the middle is

1453
01:24:20,930 --> 01:24:22,140
a very simple two sided call.

1454
01:24:22,140 --> 01:24:26,960
It's just a 1D structure
and so it's really cheap.

1455
01:24:26,960 --> 01:24:28,690
That's it.