1
00:00:00,060 --> 00:00:02,500
The following content is
provided under a Creative

2
00:00:02,500 --> 00:00:04,019
Commons license.

3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare

4
00:00:06,360 --> 00:00:10,730
continue to offer high quality
educational resources for free.

5
00:00:10,730 --> 00:00:13,340
To make a donation or
view additional materials

6
00:00:13,340 --> 00:00:17,236
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,236 --> 00:00:17,861
at ocw.mit.edu.

8
00:00:20,915 --> 00:00:21,790
PROFESSOR: All right.

9
00:00:21,790 --> 00:00:23,780
Welcome back to 6.046.

10
00:00:23,780 --> 00:00:24,611
AUDIENCE: Woohoo.

11
00:00:24,611 --> 00:00:26,860
PROFESSOR: Are you guys ready
to learn an awesome data

12
00:00:26,860 --> 00:00:27,440
structure?

13
00:00:27,440 --> 00:00:28,316
AUDIENCE: Woohoo.

14
00:00:28,316 --> 00:00:31,120
PROFESSOR: Yeah, let's do it.

15
00:00:31,120 --> 00:00:34,895
This is a data structure named
after a human being, Peter van

16
00:00:34,895 --> 00:00:36,640
Emde Boas.

17
00:00:36,640 --> 00:00:40,390
I was just corresponding
with him yesterday.

18
00:00:40,390 --> 00:00:43,380
And he, in the '70s, he
invented this really cool data

19
00:00:43,380 --> 00:00:43,880
structure.

20
00:00:43,880 --> 00:00:46,130
It's super fast. It's amazing.

21
00:00:46,130 --> 00:00:48,300
It's actually pretty
simple to implement.

22
00:00:48,300 --> 00:00:51,390
And it's used a lot, in
practice, in network routers,

23
00:00:51,390 --> 00:00:53,230
among other things.

24
00:00:53,230 --> 00:00:54,830
And we're going
to cover it today.

25
00:00:54,830 --> 00:00:58,730
So let me first tell
you what it does.

26
00:00:58,730 --> 00:01:00,290
So it's an old data structure.

27
00:01:00,290 --> 00:01:03,110
But I feel like it's taken us
decades to really understand.

28
00:01:03,110 --> 00:01:03,610
Question.

29
00:01:03,610 --> 00:01:05,519
AUDIENCE: Your mic's not on.

30
00:01:05,519 --> 00:01:07,870
PROFESSOR: In what sense?

31
00:01:07,870 --> 00:01:09,190
It's not amplified.

32
00:01:09,190 --> 00:01:11,490
It's just for the cameras.

33
00:01:11,490 --> 00:01:15,410
So it's taken us
decades, really,

34
00:01:15,410 --> 00:01:18,270
to understand this data
structure, exactly how it works

35
00:01:18,270 --> 00:01:21,530
and why it's useful.

36
00:01:21,530 --> 00:01:27,900
The problem it's solving is what
you might call a predecessor

37
00:01:27,900 --> 00:01:28,560
problem.

38
00:01:28,560 --> 00:01:30,760
It's very similar to
the sort of problem

39
00:01:30,760 --> 00:01:32,310
that binary search trees solve.

40
00:01:32,310 --> 00:01:36,910
But we're going to do it faster,
but in a somewhat different

41
00:01:36,910 --> 00:01:41,320
model, in that the elements
we're going to be storing

42
00:01:41,320 --> 00:01:45,030
are not just things that
we know how to compare.

43
00:01:45,030 --> 00:01:46,580
That would be the
comparison model.

44
00:01:46,580 --> 00:01:48,250
We're storing integers.

45
00:01:48,250 --> 00:01:53,220
And the integers come from a
universe, U, of size little u.

46
00:01:53,220 --> 00:01:55,770
And we'll assume that they're
non-negative, so from 0

47
00:01:55,770 --> 00:01:56,410
to u minus 1.

48
00:01:56,410 --> 00:01:58,030
Although you could
support negative

49
00:01:58,030 --> 00:02:00,730
integers without
much more effort.

50
00:02:00,730 --> 00:02:03,860
And the operations
we want to support,

51
00:02:03,860 --> 00:02:06,430
we're storing a set of
n of those elements.

52
00:02:06,430 --> 00:02:14,185
We want to do insert,
delete, and successor.

53
00:02:20,230 --> 00:02:22,397
So these are operations you
should be familiar with.

54
00:02:22,397 --> 00:02:23,813
You should know
how to solve these

55
00:02:23,813 --> 00:02:26,640
in log n time per operation with
a balanced binary search tree,

56
00:02:26,640 --> 00:02:27,361
like AVL trees.

57
00:02:27,361 --> 00:02:29,610
You want to add something
to the set, delete something

58
00:02:29,610 --> 00:02:33,490
from the set, or
given a value I want

59
00:02:33,490 --> 00:02:38,730
to know the next largest
value that is in the set.

60
00:02:38,730 --> 00:02:43,790
So if you draw that as
a one dimensional thing,

61
00:02:43,790 --> 00:02:48,210
you've got some items
which are in your set.

62
00:02:48,210 --> 00:02:50,085
And then, you have a query.

63
00:02:52,870 --> 00:02:56,130
So you ask for the
successor of this value.

64
00:02:56,130 --> 00:02:59,510
Then you're asking for, what
is the next value that's

65
00:02:59,510 --> 00:03:00,010
in the set?

66
00:03:00,010 --> 00:03:02,855
So you want to return this item.

67
00:03:02,855 --> 00:03:04,730
OK, predecessor would
be the symmetric thing.

68
00:03:04,730 --> 00:03:06,110
But if you could
solve successor,

69
00:03:06,110 --> 00:03:08,280
you could usually
solve predecessor.

70
00:03:08,280 --> 00:03:10,030
So we'll focus on
these three operations,

71
00:03:10,030 --> 00:03:11,501
although, in the
textbook, you'll

72
00:03:11,501 --> 00:03:13,000
see there are lots
of operations you

73
00:03:13,000 --> 00:03:15,740
could do with van Emde Boas.

74
00:03:15,740 --> 00:03:17,110
So far so good.

75
00:03:17,110 --> 00:03:19,570
We know how to do
this in log n time.

76
00:03:19,570 --> 00:03:27,750
We are going to do
it in log log u time.

77
00:03:27,750 --> 00:03:30,230
Woah, amazing.

78
00:03:30,230 --> 00:03:33,720
So an extra log, but we're
cheating a little bit, in that

79
00:03:33,720 --> 00:03:35,840
we're replacing n with u.

80
00:03:35,840 --> 00:03:39,230
Now in a lot of applications,
u is pretty reasonable,

81
00:03:39,230 --> 00:03:42,335
like 2 to the 32 or 2
to the 64, depending

82
00:03:42,335 --> 00:03:44,620
on what kind of integers
you usually work with.

83
00:03:44,620 --> 00:03:47,830
So log log of that is usually
really tiny, and often smaller

84
00:03:47,830 --> 00:03:49,630
than log n.

85
00:03:49,630 --> 00:03:54,740
So in particular,
on the theory side,

86
00:03:54,740 --> 00:03:57,280
for example, if u is
a polynomial in n,

87
00:03:57,280 --> 00:04:05,960
or even larger than that, you
can support n to the polylog n.

88
00:04:05,960 --> 00:04:10,930
Then log log u is
the same as log log

89
00:04:10,930 --> 00:04:16,040
n, up to constant factors.

90
00:04:16,040 --> 00:04:18,089
And so this is an
exponential improvement

91
00:04:18,089 --> 00:04:20,805
over regular balanced
binary search trees.

92
00:04:20,805 --> 00:04:27,120
OK, so super fast, and it's
also pretty clean and simple,

93
00:04:27,120 --> 00:04:29,780
though it'll take us a
little while to get there.

94
00:04:29,780 --> 00:04:32,480
One application for
this, as I mentioned,

95
00:04:32,480 --> 00:04:35,080
is in network routers.

96
00:04:35,080 --> 00:04:38,610
And I believe most network
routers use the van Emde Boas

97
00:04:38,610 --> 00:04:40,270
data structure these
days, though that just

98
00:04:40,270 --> 00:04:45,190
changed in the
last decade or so.

99
00:04:45,190 --> 00:04:47,810
Network router, you have to
store a routing table, which

100
00:04:47,810 --> 00:04:51,170
looks like, for IP
range from this to this,

101
00:04:51,170 --> 00:04:54,030
please send your
packets along this port.

102
00:04:54,030 --> 00:04:56,930
For IP range from this to
this, send along this port.

103
00:04:56,930 --> 00:05:00,430
So if you mark the
beginnings of those ranges

104
00:05:00,430 --> 00:05:06,520
as items in your set, and
given an actual IP address,

105
00:05:06,520 --> 00:05:08,110
you want to know
what range it's in,

106
00:05:08,110 --> 00:05:10,860
that is a predecessor
or a successor problem.

107
00:05:10,860 --> 00:05:13,880
And so van Emde Boas lets
you solve that really fast.
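
[Editor's sketch] The routing lookup described here can be written as a predecessor query over the starting addresses of the ranges. This is only an illustration: it uses a sorted list with Python's `bisect` module rather than a van Emde Boas structure, and the table contents (`range_starts`, `ports`) and the name `lookup_port` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical routing table: each range is identified by its starting address.
range_starts = [0, 100, 500, 2000]            # sorted starts of the ranges
ports = ["port A", "port B", "port C", "port D"]

def lookup_port(address):
    """Return the port of the range that contains `address`."""
    # bisect_right gives the position of the first start strictly greater
    # than `address`; the entry just before it is the predecessor.
    i = bisect_right(range_starts, address) - 1
    if i < 0:
        raise KeyError("address precedes every range")
    return ports[i]
```

Binary search costs about log n comparisons per lookup; the structure in this lecture aims to answer the same query in about log log u steps.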

108
00:05:13,880 --> 00:05:19,490
u, for IPv4, is only
2 to the 32.

109
00:05:19,490 --> 00:05:21,520
So that's super
fast and practical.

110
00:05:21,520 --> 00:05:23,150
It's going to take
like five operations

111
00:05:23,150 --> 00:05:27,740
to do log log 2 to the 32.
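
[Editor's sketch] A quick check of the arithmetic, assuming base-2 logarithms; the helper name `loglog` is made up for this note.

```python
import math

def loglog(u):
    """log2(log2(u)): roughly the number of levels the recursion goes through."""
    return math.log2(math.log2(u))

levels_ipv4 = loglog(2**32)   # 32-bit universe, as for IPv4 addresses
levels_64 = loglog(2**64)     # 64-bit universe
```

For a 32-bit universe this gives 5, matching the "five operations" figure up to constant factors; for 64 bits it gives 6.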

112
00:05:27,740 --> 00:05:30,750
So that's it.

113
00:05:30,750 --> 00:05:32,250
And as you may know,
network routers

114
00:05:32,250 --> 00:05:33,669
are basically computers.

115
00:05:33,669 --> 00:05:35,960
And so they used to have a
lot of specialized hardware.

116
00:05:35,960 --> 00:05:37,940
These days it's pretty
general purpose.

117
00:05:37,940 --> 00:05:41,660
And so you want nice data
structures, like the one

118
00:05:41,660 --> 00:05:43,060
we'll cover.

119
00:05:43,060 --> 00:05:46,660
OK, so we want to
shoot for log log u.

120
00:05:46,660 --> 00:05:50,990
We're going to get there
by a series of improvements

121
00:05:50,990 --> 00:05:52,510
on a very simple idea.

122
00:05:52,510 --> 00:05:54,830
This is not the original
way that van Emde Boas

123
00:05:54,830 --> 00:05:56,570
got to this concept.

124
00:05:56,570 --> 00:05:58,200
But it's sort of the
modern take on it.

125
00:05:58,200 --> 00:06:00,280
It's one that's in the textbook.

126
00:06:00,280 --> 00:06:04,920
So the first question is, how
might we get a log log u bound?

127
00:06:04,920 --> 00:06:06,497
Where might that come from?

128
00:06:06,497 --> 00:06:07,580
That's a question for you.

129
00:06:11,267 --> 00:06:12,225
This is just intuition.

130
00:06:22,244 --> 00:06:22,910
Any suggestions?

131
00:06:32,980 --> 00:06:34,520
We see logs all the time.

132
00:06:34,520 --> 00:06:35,140
So, yeah.

133
00:06:35,140 --> 00:06:37,797
AUDIENCE: You organize the
height of a tree into a tree.

134
00:06:37,797 --> 00:06:38,630
PROFESSOR: Ah, good.

135
00:06:38,630 --> 00:06:40,930
You organize the height
of the tree into a tree.

136
00:06:40,930 --> 00:06:47,560
So we normally think of a tree,
let's say we have u down here.

137
00:06:47,560 --> 00:06:51,340
So the height is log u.

138
00:06:51,340 --> 00:06:55,810
So somehow, we want
a binary search

139
00:06:55,810 --> 00:06:57,420
on the levels of this tree.

140
00:06:57,420 --> 00:06:59,750
Right, if we could kind of
start in the middle level,

141
00:06:59,750 --> 00:07:03,760
and then decide whether
we need to go up or down,

142
00:07:03,760 --> 00:07:05,870
I'm totally unclear
what that would mean.

143
00:07:05,870 --> 00:07:08,600
But in fact, that's exactly
what van Emde Boas will do.

144
00:07:08,600 --> 00:07:12,450
So you can binary
search-- I think

145
00:07:12,450 --> 00:07:18,785
we won't see that until the very
end-- but on levels of a tree.

146
00:07:22,230 --> 00:07:23,355
So at least some intuition.

147
00:07:26,030 --> 00:07:29,800
Now let's think about this
in terms of recurrences.

148
00:07:29,800 --> 00:07:35,070
There's a recurrence for
binary search, which is usually

149
00:07:35,070 --> 00:07:42,580
you have k things, t of k is
t of k over 2 plus order 1.

150
00:07:42,580 --> 00:07:44,070
You spend constant
time to decide

151
00:07:44,070 --> 00:07:46,361
whether you should go left
or right in a binary search,

152
00:07:46,361 --> 00:07:47,950
or in this case up
and down somehow.

153
00:07:47,950 --> 00:07:50,140
And then you reduce to a
problem of half the size.

154
00:07:50,140 --> 00:07:54,670
So this solves to log k.

155
00:07:54,670 --> 00:07:58,360
In our case, k is
actually log u.

156
00:07:58,360 --> 00:08:03,780
So we want a recurrence that
looks something like t of log u

157
00:08:03,780 --> 00:08:11,055
equals t of log
u/2 plus order 1.

158
00:08:11,055 --> 00:08:13,680
OK, even if you don't believe in
the binary search perspective,

159
00:08:13,680 --> 00:08:18,000
this is clearly a recurrence
that solves to log log u.

160
00:08:18,000 --> 00:08:20,080
I'm just substituting
k equals log u here.

161
00:08:20,080 --> 00:08:22,550
So that could be
on the right track.

162
00:08:22,550 --> 00:08:24,010
Now, that's in terms of log u.

163
00:08:24,010 --> 00:08:26,850
What if I wanted to rewrite
this recurrence in terms of u?

164
00:08:26,850 --> 00:08:28,270
What would I get?

165
00:08:28,270 --> 00:08:33,299
If I wanted to have this
still solve to log log u,

166
00:08:33,299 --> 00:08:35,480
what should I write here?

167
00:08:50,010 --> 00:08:54,465
If I change the logarithm of
a number by a factor of 2,

168
00:08:54,465 --> 00:08:55,215
how does u change?

169
00:08:57,294 --> 00:08:58,210
AUDIENCE: Square root.

170
00:08:58,210 --> 00:08:59,168
PROFESSOR: Square root.

171
00:09:05,060 --> 00:09:08,180
OK, So I've changed what
the variable is here.

172
00:09:08,180 --> 00:09:10,150
But this is really
the same recurrence.

173
00:09:10,150 --> 00:09:13,020
It will still
solve to log log u.

174
00:09:13,020 --> 00:09:15,870
The number of times you have to
apply square root to a number

175
00:09:15,870 --> 00:09:17,990
to get to 1 is log log u.
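
[Editor's sketch] The change of variable can be written out as follows. The symbols T' and i are introduced here for the derivation and are not on the board in the transcript.

```latex
% Binary search on k items:
T'(k) = T'(k/2) + O(1) \quad\Longrightarrow\quad T'(k) = O(\log k).

% Substitute k = \log u and write T(u) = T'(\log u).
% Since \log \sqrt{u} = \tfrac{1}{2} \log u, halving k is the same as taking the square root of u:
T(u) = T(\sqrt{u}) + O(1).

% After i levels the universe has size u^{1/2^{i}}, which is at most 2 once
% 2^{i} \ge \log u, that is, once i \ge \log \log u. Hence
T(u) = O(\log \log u).
```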

176
00:09:17,990 --> 00:09:22,260
So this is some more intuition
for how van Emde Boas is

177
00:09:22,260 --> 00:09:23,530
going to achieve log log u.

178
00:09:23,530 --> 00:09:28,490
And in fact, this is the primary
intuition we'll be using.

179
00:09:28,490 --> 00:09:32,560
So what we would like is
to somehow take our problem,

180
00:09:32,560 --> 00:09:35,710
which has size u, and
split it into problems

181
00:09:35,710 --> 00:09:38,210
of size square root
of u, so that we only

182
00:09:38,210 --> 00:09:40,150
have to recurse on one of them.

183
00:09:40,150 --> 00:09:42,960
And then, we'll get
this recurrence.

184
00:09:42,960 --> 00:09:49,610
OK, that's where
we're going to go.

185
00:09:49,610 --> 00:09:53,480
But we're going to start with
a very simple data structure

186
00:09:53,480 --> 00:09:58,140
for representing a set of n
numbers from the universe 0

187
00:09:58,140 --> 00:09:59,210
up to u minus 1.

188
00:10:02,370 --> 00:10:05,530
And let's say, initially, our
goal is for insert and delete

189
00:10:05,530 --> 00:10:08,190
to be constant time.

190
00:10:08,190 --> 00:10:09,690
But let's not worry
about successor.

191
00:10:09,690 --> 00:10:11,490
Successor could
take linear time.

192
00:10:11,490 --> 00:10:15,650
What would be a good data
structure for storing items

193
00:10:15,650 --> 00:10:17,000
in this universe?

194
00:10:17,000 --> 00:10:19,157
I want u to be involved somehow.

195
00:10:19,157 --> 00:10:20,740
I don't just want
to, like, store them

196
00:10:20,740 --> 00:10:24,660
in a linked list of items
or a sorted array of items.

197
00:10:24,660 --> 00:10:28,430
I would like u to
be involved, insert

198
00:10:28,430 --> 00:10:29,650
and delete constant time.

199
00:10:35,040 --> 00:10:35,725
Very simple.

200
00:10:45,633 --> 00:10:46,133
Yeah.

201
00:10:46,133 --> 00:10:47,582
AUDIENCE: Simply an array.

202
00:10:47,582 --> 00:10:48,790
PROFESSOR: In an array, yeah.

203
00:10:48,790 --> 00:10:51,514
What's the array indexed by?

204
00:10:51,514 --> 00:10:54,720
AUDIENCE: It would be index n.

205
00:10:54,720 --> 00:10:55,460
PROFESSOR: Sorry?

206
00:10:55,460 --> 00:10:56,870
AUDIENCE: By the index of n.

207
00:10:56,870 --> 00:10:59,964
PROFESSOR: The
index of n, close.

208
00:10:59,964 --> 00:11:01,260
AUDIENCE: The value.

209
00:11:01,260 --> 00:11:02,027
PROFESSOR: Sorry?

210
00:11:02,027 --> 00:11:02,860
AUDIENCE: The value.

211
00:11:02,860 --> 00:11:04,490
PROFESSOR: The value, yeah.

212
00:11:04,490 --> 00:11:04,990
Good.

213
00:11:04,990 --> 00:11:12,820
So I want-- this is normally
called a bit vector, where

214
00:11:12,820 --> 00:11:26,540
I want array of size u, and
for each cell in the array,

215
00:11:26,540 --> 00:11:27,730
I'm going to write 0 or 1.

216
00:11:27,730 --> 00:11:29,810
0 means absent.

217
00:11:29,810 --> 00:11:31,820
1 means present.

218
00:11:31,820 --> 00:11:32,570
It's in the set.

219
00:11:35,980 --> 00:11:40,060
So let me draw a
picture, maybe over here.

220
00:11:55,410 --> 00:12:00,885
Let me take my example
and give you a frisbee.

221
00:12:08,390 --> 00:12:10,520
Let me put it in the middle.

222
00:12:39,470 --> 00:12:43,960
So this is an example of a set
with-- if I maybe highlight

223
00:12:43,960 --> 00:12:46,060
a little bit-- here's 1.

224
00:12:48,580 --> 00:12:52,830
Here's a 1, and
a one, and a one.

225
00:12:52,830 --> 00:12:57,260
So there are 4
elements in the set.

226
00:12:57,260 --> 00:12:58,860
The universe size is 16.

227
00:13:03,820 --> 00:13:08,230
n equals 4, in this
particular example.

228
00:13:08,230 --> 00:13:12,930
If I want to insert into this
set, I just change 0 to a 1.

229
00:13:12,930 --> 00:13:15,460
If I want to delete from the
set, I change a 1 to a 0.

230
00:13:15,460 --> 00:13:17,170
So those are constant time.

231
00:13:17,170 --> 00:13:17,670
Good.

232
00:13:26,220 --> 00:13:32,570
If I want to do a successor
query, not so good.

233
00:13:32,570 --> 00:13:36,890
I might need to
spend order u time.

234
00:13:36,890 --> 00:13:39,520
Maybe I asked for the
successor of this item,

235
00:13:39,520 --> 00:13:42,280
and the only thing
to do is just keep

236
00:13:42,280 --> 00:13:44,840
jumping until I get to a 1.

237
00:13:44,840 --> 00:13:47,510
And in the worst case,
there are almost u 0's

238
00:13:47,510 --> 00:13:50,900
in a row, or u minus n.

239
00:13:50,900 --> 00:13:51,980
So that's really slow.
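
[Editor's sketch] A minimal bit vector along the lines just described. The class name `BitVector` is invented here, and this sketch assumes that successor means the smallest stored value strictly greater than x, returning `None` when there is none.

```python
class BitVector:
    """Set of integers from 0 to u - 1, stored as an array of u bits."""

    def __init__(self, u):
        self.u = u
        self.bits = [0] * u          # 0 means absent, 1 means present

    def insert(self, x):
        self.bits[x] = 1             # constant time

    def delete(self, x):
        self.bits[x] = 0             # constant time

    def successor(self, x):
        # Keep jumping right until we hit a 1: order u time in the worst case.
        for y in range(x + 1, self.u):
            if self.bits[y]:
                return y
        return None
```

With u equal to 16 and a few elements inserted, `successor` may scan almost the whole array, which is the slow case the lecture is pointing at.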

240
00:13:51,980 --> 00:13:53,930
But this, in fact, will
be our starting point.

241
00:13:53,930 --> 00:13:55,730
It may seem really silly.

242
00:13:55,730 --> 00:13:58,770
But it's actually a
good starting point

243
00:13:58,770 --> 00:14:01,570
for van Emde Boas.

244
00:14:01,570 --> 00:14:16,010
So the second idea is, we're
going to take our universe

245
00:14:16,010 --> 00:14:19,375
and split it into clusters.

246
00:14:22,260 --> 00:14:24,700
van Emde Boas, the person,
likes to call these galaxies.

247
00:14:24,700 --> 00:14:29,290
I think that's a nice name
for pieces of the universe.

248
00:14:29,290 --> 00:14:31,487
But the textbook calls them clusters.

249
00:14:31,487 --> 00:14:33,070
Because they used
to call it clusters.

250
00:14:33,070 --> 00:14:38,630
So now, it's a question of how
big the cluster should be.

251
00:14:38,630 --> 00:14:41,860
But if I gave you
this picture, and I

252
00:14:41,860 --> 00:14:44,420
want to think about these
galaxies as separate chunks,

253
00:14:44,420 --> 00:14:45,930
and I ask for the
successor of this,

254
00:14:45,930 --> 00:14:51,508
how could I possibly speed
up the successor search?

255
00:14:51,508 --> 00:14:52,924
Yeah.

256
00:14:52,924 --> 00:14:58,130
AUDIENCE: You could form a tree
for each cluster and connect--

257
00:14:58,130 --> 00:15:01,240
PROFESSOR: You could form a tree
here and store what at the--

258
00:15:01,240 --> 00:15:02,073
[INTERPOSING VOICES]

259
00:15:02,073 --> 00:15:05,530
AUDIENCE: Could store an
or between the two bits.

260
00:15:05,530 --> 00:15:06,250
PROFESSOR: Cool.

261
00:15:06,250 --> 00:15:07,660
I like this.

262
00:15:07,660 --> 00:15:10,516
So I could store the
or of these two bits--

263
00:15:10,516 --> 00:15:12,970
clean this up a little
bit-- or of these two bits,

264
00:15:12,970 --> 00:15:15,320
or of these two bits, and so on.

265
00:15:19,096 --> 00:15:23,210
The or is interesting, because
this 0 bit, in particular,

266
00:15:23,210 --> 00:15:26,170
tells me there's
nothing in here.

267
00:15:26,170 --> 00:15:28,800
So I should just be
able to skip over it.

268
00:15:28,800 --> 00:15:32,330
So you're imagining a kind
of binary search-ish thing.

269
00:15:32,330 --> 00:15:33,140
It's a good idea.

270
00:15:37,440 --> 00:15:39,470
So each node here, I'm
just writing the or

271
00:15:39,470 --> 00:15:40,303
of its two children.

272
00:15:44,710 --> 00:15:46,910
And in fact, you could
do this all the way up.

273
00:15:46,910 --> 00:15:50,729
You could build an
entire binary tree.

274
00:15:50,729 --> 00:15:52,270
But remember, what
we're trying to do

275
00:15:52,270 --> 00:15:55,520
is a binary search on
the levels of the tree.

276
00:15:55,520 --> 00:16:00,570
And so, in particular, I'm
going to focus on this level.

277
00:16:00,570 --> 00:16:02,220
This is the middle
level of that tree

278
00:16:02,220 --> 00:16:05,410
if I drew out the whole thing.

279
00:16:05,410 --> 00:16:08,760
And that level is interesting,
because it's just summarizing--

280
00:16:08,760 --> 00:16:11,826
is there anybody in here, is
there anybody in this cluster,

281
00:16:11,826 --> 00:16:13,200
is there anybody
in this cluster,

282
00:16:13,200 --> 00:16:15,440
is there anybody
in this cluster.

283
00:16:15,440 --> 00:16:18,400
So we call this
the summary vector.

284
00:16:22,820 --> 00:16:26,660
So we'll come back to your
tree perspective at some point.

285
00:16:26,660 --> 00:16:29,834
That is a good big picture
of what's going on.

286
00:16:29,834 --> 00:16:31,750
But at this level, I'm
just going to say, well

287
00:16:31,750 --> 00:16:32,874
let's store the bit vector.

288
00:16:32,874 --> 00:16:36,570
Let's also store
this summary vector.

289
00:16:36,570 --> 00:16:39,950
And now, when I want to find
the successor of something,

290
00:16:39,950 --> 00:16:42,610
first I'll look
inside the cluster.

291
00:16:42,610 --> 00:16:46,000
If I don't find my answer, I'll
go up to the summary vector

292
00:16:46,000 --> 00:16:48,110
and find where is
the next cluster that

293
00:16:48,110 --> 00:16:49,880
has something in it.

294
00:16:49,880 --> 00:16:51,740
And then I'll go
into that cluster

295
00:16:51,740 --> 00:16:54,450
and look for the first one.

296
00:16:54,450 --> 00:16:59,560
OK, that's a good next step.

297
00:16:59,560 --> 00:17:07,280
So this will split the
universe into clusters.

298
00:17:10,280 --> 00:17:14,885
How big should the
clusters be to balance out?

299
00:17:14,885 --> 00:17:16,260
There's three
searches I'm doing.

300
00:17:16,260 --> 00:17:17,829
One is within a cluster.

301
00:17:17,829 --> 00:17:19,700
One is in the summary vector.

302
00:17:19,700 --> 00:17:23,422
And one is within
another cluster.

303
00:17:23,422 --> 00:17:23,922
Yeah.

304
00:17:23,922 --> 00:17:24,922
AUDIENCE: Square root u.

305
00:17:24,922 --> 00:17:26,000
PROFESSOR: Square root u.

306
00:17:26,000 --> 00:17:26,500
Yeah.

307
00:17:26,500 --> 00:17:27,740
That will balance out.

308
00:17:27,740 --> 00:17:29,142
If there's square
root of u size,

309
00:17:29,142 --> 00:17:31,350
then the number of clusters
will be square root of u.

310
00:17:31,350 --> 00:17:32,620
So the search in
the summary vector

311
00:17:32,620 --> 00:17:34,410
will be the same as
the cost down here.

312
00:17:34,410 --> 00:17:35,910
Also we know that
we kind of want

313
00:17:35,910 --> 00:17:38,000
to do square root of
u recursion somehow.

314
00:17:38,000 --> 00:17:40,030
So this is not yet
the recursive version.

315
00:17:40,030 --> 00:17:42,140
But square root of
u is exactly right.

316
00:17:42,140 --> 00:17:44,350
And I owe some frisbees, sorry.

317
00:17:44,350 --> 00:17:46,710
Here's one frisbee.

318
00:17:46,710 --> 00:17:50,860
And yeah, cool.

319
00:17:50,860 --> 00:17:54,552
And I think I also owe you one.

320
00:17:54,552 --> 00:17:56,460
Sorry.

321
00:17:56,460 --> 00:18:01,430
So clusters have
size square root

322
00:18:01,430 --> 00:18:04,260
of u, and there are square
root of u of them.

323
00:18:04,260 --> 00:18:06,350
And, cool.

324
00:18:06,350 --> 00:18:10,250
So now, when I want to
do an insert or a delete,

325
00:18:10,250 --> 00:18:13,340
it's still-- let's not
worry about delete.

326
00:18:13,340 --> 00:18:14,580
That's a little tricky.

327
00:18:14,580 --> 00:18:16,170
To do an insert,
it's still easy.

328
00:18:16,170 --> 00:18:18,960
If I insert into
here, I set it to 1.

329
00:18:18,960 --> 00:18:23,210
And I check, if this is already
0, I should also set that to 1.

330
00:18:23,210 --> 00:18:24,830
Now deleting would be tricky.

331
00:18:24,830 --> 00:18:27,920
To delete this guy and realize
that there's nothing else, eh.

332
00:18:27,920 --> 00:18:30,890
Let's not worry about that
until we do a lot more work.

333
00:18:30,890 --> 00:18:33,790
Let's just focus on
insert and successor.

334
00:18:33,790 --> 00:18:40,940
So insert, with this strategy,
is still constant time.

335
00:18:40,940 --> 00:18:44,730
It's two steps instead
of one, but it's good.

336
00:18:44,730 --> 00:18:50,360
Successor does three things.

337
00:18:50,360 --> 00:18:56,230
First, we look, let's
say, successor of x.

338
00:18:56,230 --> 00:18:58,705
First thing we do is
look in x's cluster.

339
00:19:02,930 --> 00:19:06,860
Then, if we don't find
what we're looking for,

340
00:19:06,860 --> 00:19:20,690
then we'll look for the next
1 bit in the summary vector,

341
00:19:20,690 --> 00:19:30,790
and then, we'll look for
the first 1 in that cluster.

342
00:19:34,190 --> 00:19:35,190
So there are two cases.

343
00:19:35,190 --> 00:19:38,145
In the lucky case, we find
the successor in the cluster

344
00:19:38,145 --> 00:19:39,670
that we started in.

345
00:19:39,670 --> 00:19:41,590
So that only takes root u time.

346
00:19:41,590 --> 00:19:44,100
If we're unlucky, we
search in the summary.

347
00:19:44,100 --> 00:19:45,285
That takes root u time.

348
00:19:45,285 --> 00:19:46,660
And then we find
the first 1 bit.

349
00:19:46,660 --> 00:19:47,880
That takes root u time.

350
00:19:47,880 --> 00:19:51,930
Whole thing is square root of
u, which is, of course, not very

351
00:19:51,930 --> 00:19:53,930
good, compared to log n.

352
00:19:53,930 --> 00:19:56,140
But it's a lot
better than u, which

353
00:19:56,140 --> 00:19:59,070
is our first method,
the bit vector.

354
00:19:59,070 --> 00:20:01,230
So we've improved from
u to square root of u.

355
00:20:01,230 --> 00:20:03,660
Now of course, the
idea is to recurse.

356
00:20:03,660 --> 00:20:06,872
Instead of just doing a bit
vector at each of these levels,

357
00:20:06,872 --> 00:20:08,580
we're going to
recursively represent each

358
00:20:08,580 --> 00:20:11,669
of these clusters in this way.

359
00:20:11,669 --> 00:20:13,960
This is where things get a
little magical, in the magic

360
00:20:13,960 --> 00:20:17,730
of divide and conquer.

361
00:20:17,730 --> 00:20:19,710
And then, we'll get
t of square root of u

362
00:20:19,710 --> 00:20:23,070
instead of square root of u.

363
00:20:23,070 --> 00:20:26,280
And then we'll get
a log log cost.

364
00:20:26,280 --> 00:20:33,210
So before I get
there, let me give you

365
00:20:33,210 --> 00:20:40,460
a little bit of
terminology and an example

366
00:20:40,460 --> 00:20:42,295
for dealing with clusters.

367
00:20:45,770 --> 00:20:47,980
OK, in general,
remember the things

368
00:20:47,980 --> 00:20:50,580
we're searching for
are just integers.

369
00:20:50,580 --> 00:20:53,510
And what we're talking
about is essentially

370
00:20:53,510 --> 00:20:57,710
dividing an integer, like
x, by square root of u.

371
00:20:57,710 --> 00:21:01,470
And so this is,
whatever, the quotient.

372
00:21:01,470 --> 00:21:02,660
And this is the remainder.

373
00:21:02,660 --> 00:21:05,830
So I want j to be
between 0 and strictly

374
00:21:05,830 --> 00:21:07,290
less than square root of u.

375
00:21:07,290 --> 00:21:10,560
Then this is unique, by the
division theorem,

376
00:21:10,560 --> 00:21:12,350
or something.

377
00:21:12,350 --> 00:21:15,950
And i is the cluster number.

378
00:21:15,950 --> 00:21:19,860
And then j is the position
of x within that cluster.

379
00:21:19,860 --> 00:21:28,000
So let's do an example
like x equals 9.

380
00:21:28,000 --> 00:21:30,720
So I didn't number
them over here.

381
00:21:30,720 --> 00:21:36,384
This is x equals 0, 1,
2, 3, 4, 5, 6, 7, 8,

382
00:21:36,384 --> 00:21:39,110
9-- here's the guy I'm
interested in-- 10,

383
00:21:39,110 --> 00:21:43,860
11, 12, and so on.

384
00:21:43,860 --> 00:21:45,110
So 9 is here.

385
00:21:45,110 --> 00:21:49,380
This is cluster number 0, 1, 2.

386
00:21:49,380 --> 00:21:52,870
So I claim 9 equals 2
times square root of u.

387
00:21:52,870 --> 00:21:53,860
Here is 4.

388
00:21:53,860 --> 00:21:57,110
I conveniently chose u
to be a perfect square.

389
00:21:57,110 --> 00:22:01,810
And it is item number 1
within the cluster.

390
00:22:01,810 --> 00:22:05,370
And indeed, 9 equals
2 times 4 plus 1.

391
00:22:05,370 --> 00:22:09,360
So in general, if
you're given x,

392
00:22:09,360 --> 00:22:12,770
and I said, ah, look in x's
cluster, what that means

393
00:22:12,770 --> 00:22:17,440
is look at x integer
divided by square root of u.

394
00:22:17,440 --> 00:22:18,997
That's the cluster number.

395
00:22:18,997 --> 00:22:20,330
And I'll try to search in there.

396
00:22:22,510 --> 00:22:24,690
And I look in the
summary vector,

397
00:22:24,690 --> 00:22:27,650
starting from that
cluster name, the name

398
00:22:27,650 --> 00:22:31,207
of the cluster for this guy,
finding the next cluster.

399
00:22:31,207 --> 00:22:32,790
Then I'll multiply
by square root of u

400
00:22:32,790 --> 00:22:36,930
to get here, and
then continue on.

401
00:22:36,930 --> 00:22:40,340
In general, because of
dividing and multiplying-- I
dividing to multiplying-- I

402
00:22:40,340 --> 00:22:43,220
don't want to have to
think about it too hard.

403
00:22:43,220 --> 00:22:47,850
I'm going to say, define
some functions to make

404
00:22:47,850 --> 00:22:51,290
this a little easier,
more intuitive.

405
00:22:51,290 --> 00:22:53,960
So when I do integer division
by square root of u, which

406
00:22:53,960 --> 00:22:55,840
is like taking the floor,
I'll call that high

407
00:22:55,840 --> 00:22:58,370
of x, the high part of x.

408
00:22:58,370 --> 00:23:01,320
And low of x is going
to be the remainder.

409
00:23:01,320 --> 00:23:03,130
That's the j up here.

410
00:23:07,350 --> 00:23:10,910
And if I have the high and
the low part, the i and the j,

411
00:23:10,910 --> 00:23:15,070
I'm going to use
index to go back to x.

412
00:23:15,070 --> 00:23:22,370
So index of i, j is going to be i
times square root of u plus j.
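
[Editor's sketch] The three helper functions written with integer division. Passing u as an explicit parameter is a choice made for this sketch, and u is assumed to be a perfect square.

```python
import math

def high(x, u):
    """Cluster number of x: the quotient i in x = i * sqrt(u) + j."""
    return x // math.isqrt(u)

def low(x, u):
    """Position of x within its cluster: the remainder j."""
    return x % math.isqrt(u)

def index(i, j, u):
    """Rebuild x from its cluster number i and position j."""
    return i * math.isqrt(u) + j
```

For the running example with u equal to 16, `high(9, 16)` is 2 and `low(9, 16)` is 1, and `index(2, 1, 16)` gives back 9.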

413
00:23:22,370 --> 00:23:25,530
Now why do I call
these high and low?

414
00:23:32,195 --> 00:23:33,070
I'll give you a hint.

415
00:23:42,530 --> 00:23:44,380
Here's the binary
representation of x.

416
00:23:56,820 --> 00:24:01,160
In this case, high of x is 2.

417
00:24:01,160 --> 00:24:02,490
And low of x is 1.

418
00:24:06,282 --> 00:24:07,230
Yeah.

419
00:24:07,230 --> 00:24:09,550
AUDIENCE: So the high x
corresponds to the first two,

420
00:24:09,550 --> 00:24:11,522
which is the first 2 bits.

421
00:24:11,522 --> 00:24:13,990
And the low x corresponds
to [INAUDIBLE].

422
00:24:13,990 --> 00:24:15,930
PROFESSOR: Right.

423
00:24:15,930 --> 00:24:21,480
High of x corresponds to
the high half of the bits.

424
00:24:21,480 --> 00:24:26,230
And low of x corresponds to
the bottom half of the bits.

425
00:24:26,230 --> 00:24:29,520
So these are the high order
bits and the low order bits.

426
00:24:29,520 --> 00:24:31,170
And if you think
about it, remember

427
00:24:31,170 --> 00:24:34,790
when we take square root of u
in logarithm, it takes log u

428
00:24:34,790 --> 00:24:36,810
and divides it in half.

429
00:24:36,810 --> 00:24:38,880
So it's exactly,
in the bit vector,

430
00:24:38,880 --> 00:24:42,900
which is log u bits long,
we're dividing in half here,

431
00:24:42,900 --> 00:24:47,260
and looking at the high
bits versus the low bits.

432
00:24:47,260 --> 00:24:48,730
OK?

433
00:24:48,730 --> 00:24:51,780
So that's another interpretation
of what this is doing.

434
00:24:51,780 --> 00:24:53,860
And if you don't
like doing division,

435
00:24:53,860 --> 00:24:57,090
as many computers don't like
to do, all we're actually doing

436
00:24:57,090 --> 00:24:59,530
is masking out these
bits, or taking these bits

437
00:24:59,530 --> 00:25:01,060
and shifting them over.

438
00:25:01,060 --> 00:25:03,620
So these are very
efficient to actually do.

439
00:25:03,620 --> 00:25:07,950
And maybe get some intuition
for why they're relevant.
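
[Editor's sketch] The same helpers written with shifts and masks instead of division. The parameter `half` (half the number of bits in an element, so 2 when u is 16) and the function names are assumptions of this sketch.

```python
def high_bits(x, half):
    """Shift away the low half of the bits, leaving the high half."""
    return x >> half

def low_bits(x, half):
    """Mask out everything except the low half of the bits."""
    return x & ((1 << half) - 1)

def index_bits(i, j, half):
    """Put i back in the high half and j in the low half."""
    return (i << half) | j
```

With x equal to 9, which is 1001 in binary, the high half is 10 (that is, 2) and the low half is 01 (that is, 1).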

440
00:25:07,950 --> 00:25:13,900
So let's recurse, shall we?

441
00:25:21,975 --> 00:25:25,100
I think now we know how this
splitting things up works.

442
00:25:42,230 --> 00:25:47,150
So I'm going to call
the overall structure v,

443
00:25:47,150 --> 00:25:51,810
so the van Emde Boas structure
I'm trying to represent is v.

444
00:25:51,810 --> 00:25:56,390
And v is going to
consist of two parts.

445
00:25:56,390 --> 00:26:00,580
One is an array of
all of the clusters.

446
00:26:08,870 --> 00:26:11,050
I'm going to abbreviate
van Emde Boas as VEB.

447
00:26:13,920 --> 00:26:18,190
And recursively, each
of those clusters

448
00:26:18,190 --> 00:26:22,750
is going to be represented by a
smaller VEB structure, of size

449
00:26:22,750 --> 00:26:25,776
square root of the given one.

450
00:26:25,776 --> 00:26:33,901
OK, and i ranges from 0 to
square root of u minus 1.

451
00:26:33,901 --> 00:26:36,500
OK, so there's square
root of u of them.

452
00:26:36,500 --> 00:26:38,640
Total size is u.

453
00:26:38,640 --> 00:26:40,850
And then, in
addition, we're going

454
00:26:40,850 --> 00:26:43,160
to have a summary structure.

455
00:26:43,160 --> 00:26:48,311
And this is also a size
square root of u VEB.
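
[Editor's sketch] The two-part layout just described might look like this in code. It is only a sketch under stated assumptions: the universe is u = 2**w with w a power of two (so square roots come out exact), and the base case of a two-element bit array is an assumption, since the lecture has not specified one.

```python
class VEB:
    """Recursive layout only: sqrt(u) clusters plus one summary, no operations yet."""

    def __init__(self, w):
        self.w = w                        # universe is {0, ..., 2**w - 1}
        if w == 1:
            self.bits = [False, False]    # assumed base case for u = 2
            return
        half = w // 2                     # log of sqrt(u)
        # cluster[i] for i = 0 .. sqrt(u) - 1, each a VEB of size sqrt(u)
        self.cluster = [VEB(half) for _ in range(1 << half)]
        # summary is also a VEB of size sqrt(u); it records non-empty clusters
        self.summary = VEB(half)


v = VEB(4)                                # u = 16
print(len(v.cluster), v.cluster[0].w, v.summary.w)
```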

456
00:26:53,230 --> 00:26:57,860
OK, you should think about
inserts and successors.

457
00:26:57,860 --> 00:27:01,810
Those are the two operations
I care about for now.

458
00:27:01,810 --> 00:27:02,810
Let's start with insert.

459
00:27:02,810 --> 00:27:03,410
That's easier.

460
00:27:20,360 --> 00:27:26,560
So if I want to insert an
item, x, into data structure v,

461
00:27:26,560 --> 00:27:29,610
then first thing I
should do is insert

462
00:27:29,610 --> 00:27:31,190
into its corresponding cluster.

463
00:27:31,190 --> 00:27:34,950
So let's just get comfortable
with that notation.

464
00:27:34,950 --> 00:27:41,760
We're inserting into the cluster
whose number is high of x.

465
00:27:41,760 --> 00:27:44,780
That is where x belongs.

466
00:27:44,780 --> 00:27:47,270
The name of its cluster
should be high of x.

467
00:27:47,270 --> 00:27:49,350
And what we're going to
be inserting recursively

468
00:27:49,350 --> 00:27:51,150
into there is low of x.

469
00:27:51,150 --> 00:27:54,950
That is the name of x
local to that cluster.

470
00:27:54,950 --> 00:27:58,120
x is a global name with
respect to v. This cluster only

471
00:27:58,120 --> 00:28:01,590
represents a small range
of square root of u items.

472
00:28:01,590 --> 00:28:03,550
So this gets us from
the big space of size u

473
00:28:03,550 --> 00:28:05,133
to the small space
of size square root

474
00:28:05,133 --> 00:28:06,860
of u within that cluster.

475
00:28:06,860 --> 00:28:10,220
So that's basically what
high and low were made for.

476
00:28:10,220 --> 00:28:13,170
But then, we have to also
update the summary structure.

477
00:28:13,170 --> 00:28:17,150
So we need, just in case--
Maybe it's already there.

478
00:28:17,150 --> 00:28:19,110
But in the worst case, it isn't.

479
00:28:19,110 --> 00:28:22,850
So we'll just think of that
as recursively inserting

480
00:28:22,850 --> 00:28:31,570
into v dot summary the
name of the cluster, which

481
00:28:31,570 --> 00:28:33,210
is high of x.

482
00:28:33,210 --> 00:28:36,661
The summary is keeping track of
which clusters are non-empty.

483
00:28:36,661 --> 00:28:38,660
We've just inserted
something into this cluster.

484
00:28:38,660 --> 00:28:39,740
So it's non-empty.

485
00:28:39,740 --> 00:28:43,170
We better mark that
that cluster, high of x,

486
00:28:43,170 --> 00:28:45,640
is non-empty in the
summary structure.

487
00:28:45,640 --> 00:28:46,140
Why?

488
00:28:46,140 --> 00:28:47,820
So we can do successor.
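
[Editor's sketch] The two-call insert described above can be sketched as follows. The class layout, the bit-array base case, and the helper contains are assumptions added for illustration; only the two recursive calls come from the lecture.

```python
class VEB:
    def __init__(self, w):
        self.w = w                        # universe size is 2**w, w a power of two
        if w == 1:
            self.bits = [False, False]    # assumed base case
        else:
            half = w // 2
            self.cluster = [VEB(half) for _ in range(1 << half)]
            self.summary = VEB(half)


def high(x, w):
    return x >> (w // 2)


def low(x, w):
    return x & ((1 << (w // 2)) - 1)


def insert(v, x):
    if v.w == 1:
        v.bits[x] = True
        return
    insert(v.cluster[high(x, v.w)], low(x, v.w))   # x's local name in its cluster
    insert(v.summary, high(x, v.w))                # mark that cluster as non-empty


def contains(v, x):
    """Hypothetical helper, used here only to inspect the result."""
    if v.w == 1:
        return v.bits[x]
    return contains(v.cluster[high(x, v.w)], low(x, v.w))


v = VEB(4)
insert(v, 9)
print(contains(v, 9), contains(v.summary, 2))
```

Every insert makes both recursive calls, whether or not the cluster was already marked in the summary.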

489
00:28:47,820 --> 00:28:50,295
So let's move on to successor.

490
00:29:00,440 --> 00:29:04,236
Actually, I want to mimic
the successor written here

491
00:29:04,236 --> 00:29:05,360
on the bottom of the board.

492
00:29:08,250 --> 00:29:10,610
So what we had in the
non-recursive version

493
00:29:10,610 --> 00:29:11,780
was three steps.

494
00:29:11,780 --> 00:29:14,120
So we're going to do
the same thing here.

495
00:29:14,120 --> 00:29:16,120
We're going to look
within x's cluster.

496
00:29:16,120 --> 00:29:19,420
We now know that is the
cluster known as high of x.

497
00:29:22,790 --> 00:29:25,880
And either we find it, and
we're happy, or we don't.

498
00:29:25,880 --> 00:29:29,650
Then we're going to look in
v dot summary for

499
00:29:29,650 --> 00:29:33,140
the successor of high of x.

500
00:29:33,140 --> 00:29:37,060
Right, finding the next
1 bit, that is successor.

501
00:29:37,060 --> 00:29:42,330
And then, I want to find the
first 1 bit in that cluster.

502
00:29:42,330 --> 00:29:43,690
Is that a successor also?

503
00:29:52,251 --> 00:29:52,750
Yeah.

504
00:29:52,750 --> 00:29:56,460
That's just the successor
of negative infinity.

505
00:29:56,460 --> 00:30:01,250
Finding the minimum element in a
cluster is the successor of -1,

506
00:30:01,250 --> 00:30:02,940
or 0, or not zero.

507
00:30:02,940 --> 00:30:05,880
But -1 would work, or
negative infinity, maybe more

508
00:30:05,880 --> 00:30:06,469
intuitively.

509
00:30:06,469 --> 00:30:08,010
That'll find the
smallest thing here.

510
00:30:08,010 --> 00:30:10,490
So each of these is
a recursive call.

511
00:30:10,490 --> 00:30:15,230
I can think of it as
recursively calling successor.

512
00:30:15,230 --> 00:30:16,740
So let's do that.

513
00:30:24,770 --> 00:30:28,410
I want to find the successor
of x in v. First thing

514
00:30:28,410 --> 00:30:32,220
I'm going to do is
do the ij breakdown.

515
00:30:32,220 --> 00:30:39,380
I'll let i be high of x and
j be-- I could do low of x.

516
00:30:39,380 --> 00:30:44,940
But what I'm going to try for is
to search within this cluster,

517
00:30:44,940 --> 00:30:46,310
high of x.

518
00:30:46,310 --> 00:30:53,000
So I'm going to look for
the successor of cluster i,

519
00:30:53,000 --> 00:30:59,914
which is cluster high
of x, of low of x.

520
00:30:59,914 --> 00:31:03,870
OK, so that's this first step
of looking in x's cluster.

521
00:31:03,870 --> 00:31:05,310
This is x's cluster.

522
00:31:05,310 --> 00:31:06,916
This is x's name in the cluster.

523
00:31:06,916 --> 00:31:08,540
I'm going to try to
find the successor.

524
00:31:08,540 --> 00:31:10,110
But it might say infinity.

525
00:31:10,110 --> 00:31:12,160
I didn't find anything.

526
00:31:12,160 --> 00:31:15,660
And then I'll be unhappy
if j equals infinity.

527
00:31:21,270 --> 00:31:23,140
So that's line one.

528
00:31:32,070 --> 00:31:33,830
Well, then we're in
the wrong cluster.

529
00:31:33,830 --> 00:31:35,570
High of x is not
the right cluster.

530
00:31:35,570 --> 00:31:37,550
Let's find the
correct cluster, which

531
00:31:37,550 --> 00:31:40,370
is going to be the
next non-empty cluster.

532
00:31:40,370 --> 00:31:50,360
So I'm going to change i to be
the successor in the summary

533
00:31:50,360 --> 00:31:57,025
structure of i.

534
00:31:57,025 --> 00:31:59,480
So i was the name of a cluster.

535
00:31:59,480 --> 00:32:00,650
It may have items in it.

536
00:32:00,650 --> 00:32:02,830
But we want to find the
next non-empty thing.

537
00:32:02,830 --> 00:32:06,920
Because we know the successor
we're looking for is not here.

538
00:32:09,490 --> 00:32:09,990
OK.

539
00:32:09,990 --> 00:32:13,190
So this is the cluster
we now belong in.

540
00:32:13,190 --> 00:32:15,000
What item in the
cluster do we want?

541
00:32:15,000 --> 00:32:17,620
Well, we want to find the
minimum item in that cluster.

542
00:32:17,620 --> 00:32:24,280
And we're going to do that
by a recursive call, which

543
00:32:24,280 --> 00:32:40,730
is j is the successor within
cluster i of minus infinity,

544
00:32:40,730 --> 00:32:42,280
I'll say.

545
00:32:42,280 --> 00:32:43,800
-1 would also work.

546
00:32:43,800 --> 00:32:46,450
So this will find the
smallest item in the cluster.

547
00:32:46,450 --> 00:32:50,720
And then, in both
cases, we get i and j,

548
00:32:50,720 --> 00:32:53,900
which together in
this form describe

549
00:32:53,900 --> 00:32:55,860
the value x that we care about.

550
00:32:55,860 --> 00:33:02,610
So I'm just going to
say, return index of ij.

551
00:33:02,610 --> 00:33:08,140
That's how we reconstruct an
item name for the structure v.

552
00:33:08,140 --> 00:33:10,260
We know which
substructure it's in.

553
00:33:10,260 --> 00:33:12,740
And we know its name
within the substructure,

554
00:33:12,740 --> 00:33:14,790
within the cluster.
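
[Editor's sketch] The three-call successor can be sketched as below. Assumptions: None stands in for "infinity" (nothing found), x = -1 asks for the minimum as the lecture suggests, and the class layout and bit-array base case are illustrative. The explicit x < 0 branch is an implementation detail added here, because -1 does not belong to any cluster.

```python
class VEB:
    def __init__(self, w):
        self.w = w
        if w == 1:
            self.bits = [False, False]    # assumed base case, u = 2
        else:
            half = w // 2
            self.cluster = [VEB(half) for _ in range(1 << half)]
            self.summary = VEB(half)


def high(x, w):
    return x >> (w // 2)


def low(x, w):
    return x & ((1 << (w // 2)) - 1)


def index(i, j, w):
    return (i << (w // 2)) | j


def insert(v, x):
    if v.w == 1:
        v.bits[x] = True
        return
    insert(v.cluster[high(x, v.w)], low(x, v.w))
    insert(v.summary, high(x, v.w))


def successor(v, x):
    """Smallest stored element strictly greater than x, or None."""
    if v.w == 1:
        for y in range(x + 1, 2):
            if v.bits[y]:
                return y
        return None
    if x < 0:
        i, j = -1, None                   # no cluster holds -1; go to the summary
    else:
        i = high(x, v.w)
        j = successor(v.cluster[i], low(x, v.w))   # call 1: look in x's cluster
    if j is None:
        i = successor(v.summary, i)                # call 2: next non-empty cluster
        if i is None:
            return None
        j = successor(v.cluster[i], -1)            # call 3: minimum of that cluster
    return index(i, j, v.w)


v = VEB(4)
for item in (2, 9, 14):
    insert(v, item)
print(successor(v, 2), successor(v, 10), successor(v, 14))
```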

555
00:33:14,790 --> 00:33:18,280
Is this algorithm
clearly correct?

556
00:33:18,280 --> 00:33:18,905
Good.

557
00:33:18,905 --> 00:33:21,120
It's also really bad.

558
00:33:21,120 --> 00:33:23,350
Well, it's better than
everything we've done so far.

559
00:33:23,350 --> 00:33:25,790
The last result we had
was square root of u.

560
00:33:25,790 --> 00:33:30,440
This is going to be better than
that, but still not log log u.

561
00:33:30,440 --> 00:33:31,370
Why?

562
00:33:31,370 --> 00:33:32,430
Both of these are bad.

563
00:33:38,990 --> 00:33:39,666
Yeah.

564
00:33:39,666 --> 00:33:42,932
AUDIENCE: You make more than
one call to [? your insert. ?]

565
00:33:42,932 --> 00:33:43,640
PROFESSOR: Right.

566
00:33:43,640 --> 00:33:46,134
I make more than
one recursive call

567
00:33:46,134 --> 00:33:47,550
to whatever the
operation is here.

568
00:33:47,550 --> 00:33:49,600
Insert calls insert twice.

569
00:33:49,600 --> 00:33:52,915
Here, successor calls successor
potentially three times.

570
00:33:55,830 --> 00:33:57,430
This is a good challenge for me.

571
00:33:57,430 --> 00:33:59,411
Let's see.

572
00:33:59,411 --> 00:34:00,480
Eh, not bad.

573
00:34:00,480 --> 00:34:02,564
Off by one.

574
00:34:02,564 --> 00:34:05,250
OK, that's a common problem
in computer science, right?

575
00:34:05,250 --> 00:34:07,649
Always off by one errors.

576
00:34:07,649 --> 00:34:09,690
OK, so let's think of it
in terms of recurrences,

577
00:34:09,690 --> 00:34:10,830
in case that's not clear.

578
00:34:10,830 --> 00:34:16,929
Here we have t of u is 2
times t of square root of u.

579
00:34:16,929 --> 00:34:18,739
Right, to solve a
problem of size u,

580
00:34:18,739 --> 00:34:23,280
I solve two problems of size
square root of u plus constant.

581
00:34:23,280 --> 00:34:26,050
Because high of x and
low of x, I'm assuming,

582
00:34:26,050 --> 00:34:27,402
take constant time to do.

583
00:34:27,402 --> 00:34:28,610
It's just, I have an integer.

584
00:34:28,610 --> 00:34:29,750
I divide it in half.

585
00:34:29,750 --> 00:34:30,429
Those are cheap.

586
00:34:33,949 --> 00:34:35,920
What does this solve to?

587
00:34:35,920 --> 00:34:38,489
It's probably easier to think
of it in terms of log u.

588
00:34:38,489 --> 00:34:40,900
Then we could apply
the master method.

589
00:34:40,900 --> 00:34:45,270
Right, this is the same
thing as t prime of log u

590
00:34:45,270 --> 00:34:52,055
is 2 times t prime of log u
divided by 2 plus order 1.

591
00:34:58,750 --> 00:35:01,270
This is not quite the
merge sort recurrence.

592
00:35:01,270 --> 00:35:04,452
But it's not good.
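
[Editor's sketch] The change of variables can be written out as below. The symbol w for log u is an assumption introduced here for readability; the last line is the Master method result for two subproblems of half the size with constant extra work.

```latex
\begin{align*}
T(u)  &= 2\,T(\sqrt{u}) + O(1), \\
\text{let } w &= \log u, \quad T'(w) = T(2^{w}), \\
T'(w) &= 2\,T'(w/2) + O(1) \\
      &= \Theta(w) = \Theta(\log u).
\end{align*}
```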

593
00:35:04,452 --> 00:35:05,910
One way to think
of it, is we start

594
00:35:05,910 --> 00:35:07,860
with the total weight of log u.

595
00:35:07,860 --> 00:35:11,580
We split into log u over
2, but two copies of it.

596
00:35:11,580 --> 00:35:14,190
So we're not saving anything.

597
00:35:14,190 --> 00:35:16,520
And we didn't reduce
the problem strictly.

598
00:35:16,520 --> 00:35:18,940
In terms of the recursion
tree, we have, you know,

599
00:35:18,940 --> 00:35:22,400
log u-- well, it's hard
to think about because we

600
00:35:22,400 --> 00:35:28,396
have constant total cost.

601
00:35:28,396 --> 00:35:30,520
You could just plug this
in with the Master method,

602
00:35:30,520 --> 00:35:32,990
or see that essentially
we're conserving mass.

603
00:35:32,990 --> 00:35:34,640
We started with log u mass.

604
00:35:34,640 --> 00:35:36,350
We have two copies
of log u over 2.

605
00:35:36,350 --> 00:35:38,290
That's the same total mass.

606
00:35:38,290 --> 00:35:41,270
So how many recursions do we do?

607
00:35:41,270 --> 00:35:44,360
Well we do do log
log u recursions.

608
00:35:44,360 --> 00:35:48,080
The total number of leaves in
that recursion tree is log u.

609
00:35:48,080 --> 00:35:50,060
Each of them, we pay constant.

610
00:35:50,060 --> 00:35:58,490
So this is log u, not log log u.

611
00:35:58,490 --> 00:36:01,570
To get log log u, we need
to change this 2 into a 1.

612
00:36:01,570 --> 00:36:04,220
We can only afford
one recursive call.

613
00:36:04,220 --> 00:36:07,570
If we have two recursive calls,
we get logarithmic performance.

614
00:36:07,570 --> 00:36:11,240
If we have three recursive
calls, it's even worse.

615
00:36:11,240 --> 00:36:13,375
Here, I would definitely
use the Master method.

616
00:36:13,375 --> 00:36:16,400
It's less obvious.

617
00:36:16,400 --> 00:36:24,900
In this case, we get log u
to the log base 2 of 3 power,

618
00:36:24,900 --> 00:36:30,514
which is log u to the 1.6 or
so, so both worse than log n.

619
00:36:30,514 --> 00:36:31,930
This is strictly
worse than log n.

620
00:36:31,930 --> 00:36:34,920
This is maybe just a little
bit worse than log n,

621
00:36:34,920 --> 00:36:37,156
depending on how u relates to n.
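
[Editor's sketch] The three recurrences can be compared numerically by counting calls. This is a rough sketch: w stands for log u and is assumed to be a power of two, each call is charged one unit, and the function name calls is hypothetical.

```python
def calls(w, a):
    """Number of calls made by T'(w) = a * T'(w / 2) + 1, with T'(1) = 1."""
    if w == 1:
        return 1
    return 1 + a * calls(w // 2, a)


for a in (1, 2, 3):
    # a = 1: grows like log w = log log u
    # a = 2: grows like w = log u
    # a = 3: grows like w ** log2(3), roughly (log u) ** 1.58
    print(a, [calls(w, a) for w in (2, 4, 8, 16, 32)])
```

With one recursive call the count grows by one each time w doubles; with two it roughly doubles, and with three it roughly triples.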

622
00:36:37,156 --> 00:36:38,620
OK, so we're not there yet.

623
00:36:38,620 --> 00:36:39,881
But we're on the right track.

624
00:36:39,881 --> 00:36:41,380
We have the right
kind of structure.

625
00:36:41,380 --> 00:36:43,040
We have a problem of size u.

626
00:36:43,040 --> 00:36:46,640
We split it up into square root
of u sub problems of size square root of u.

627
00:36:46,640 --> 00:36:48,200
From a data structures
perspective,

628
00:36:48,200 --> 00:36:49,850
this is the first time we're
using divide and conquer

629
00:36:49,850 --> 00:36:50,780
for data structures.

630
00:36:50,780 --> 00:36:53,490
It's a little different
from algorithms.

631
00:36:53,490 --> 00:36:57,507
So that's how the data
structure is being laid out.

632
00:36:57,507 --> 00:36:59,840
But now we're worried about
the algorithms on those data

633
00:36:59,840 --> 00:37:00,340
structures.

634
00:37:00,340 --> 00:37:02,960
Those, we can only afford t
of u equals 1 times t of

635
00:37:02,960 --> 00:37:04,150
square root of u plus order 1.

636
00:37:04,150 --> 00:37:06,169
Then we get log log u.

637
00:37:06,169 --> 00:37:07,710
So, here we have
two recursive calls.

638
00:37:07,710 --> 00:37:10,030
Somehow we have
to have only one.

639
00:37:10,030 --> 00:37:12,020
Let's start by fixing insert.

640
00:37:16,311 --> 00:37:16,810
Insert?

641
00:37:20,671 --> 00:37:21,170
No.

642
00:37:21,170 --> 00:37:22,890
Let's start by fixing successor.

643
00:37:22,890 --> 00:37:26,231
I think that will
be more intuitive.

644
00:37:26,231 --> 00:37:27,230
Let's look at successor.

645
00:37:27,230 --> 00:37:29,000
Because successor
is almost there.

646
00:37:29,000 --> 00:37:31,650
A lot of the time, it's just
going to make this call,

647
00:37:31,650 --> 00:37:33,040
and we're happy.

648
00:37:33,040 --> 00:37:37,040
The bad case is when we need
to make both of these calls.

649
00:37:37,040 --> 00:37:40,590
Then there's three
total, very bad.

650
00:37:40,590 --> 00:37:44,420
How could I get
rid of this call?

651
00:37:44,420 --> 00:37:46,910
I was being all clever,
that the minimum element is

652
00:37:46,910 --> 00:37:48,700
the successor of
negative infinity.

653
00:37:48,700 --> 00:37:52,215
But that's actually
not the right idea.

654
00:37:52,215 --> 00:37:52,715
Yeah.

655
00:37:52,715 --> 00:37:57,477
AUDIENCE: Caching the
minimum element in cluster i.

656
00:37:57,477 --> 00:37:59,560
PROFESSOR: Store the minimum
element of cluster i.

657
00:37:59,560 --> 00:38:00,059
Yeah.

658
00:38:00,059 --> 00:38:05,690
In general, for every structure
v, let's store the minimum.

659
00:38:05,690 --> 00:38:06,450
Why not?

660
00:38:06,450 --> 00:38:08,470
We know how to
augment structures.

661
00:38:11,570 --> 00:38:14,330
Here in 006, you
took an AVL tree,

662
00:38:14,330 --> 00:38:17,120
and you augmented each node to store
its subtree size.

663
00:38:17,120 --> 00:38:20,400
In this case, we're doing a
similar kind of augmentation.

664
00:38:20,400 --> 00:38:24,130
Just for every structure, keep
track of what the minimum is.

665
00:38:24,130 --> 00:38:26,925
So that will be
idea number four.

666
00:38:44,297 --> 00:38:45,630
I'm going to add something here.

667
00:38:45,630 --> 00:38:47,730
But for now, let's
store the minimums.

668
00:38:47,730 --> 00:38:54,165
So to do an insert into
structure v of item x,

669
00:38:54,165 --> 00:38:55,790
first thing we'll do
is just say, well,

670
00:38:55,790 --> 00:38:58,260
if x is-- let's see if
it's the new minimum.

671
00:38:58,260 --> 00:39:02,522
Maybe x is smaller
than v dot min.

672
00:39:02,522 --> 00:39:08,590
If that's the case, let's
just set v dot min to x.

673
00:39:08,590 --> 00:39:09,090
OK?

674
00:39:09,090 --> 00:39:12,070
And then, the rest is
the same, same insertion

675
00:39:12,070 --> 00:39:17,340
algorithm as over here,
these two recursive calls.

676
00:39:17,340 --> 00:39:19,020
I just spent constant
additional time.

677
00:39:19,020 --> 00:39:21,650
And now every structure
knows its minimum.

678
00:39:21,650 --> 00:39:22,870
Again, ignore delete for now.

679
00:39:22,870 --> 00:39:25,210
That's trickier.

680
00:39:25,210 --> 00:39:28,620
OK, now every structure
knows its minimum,

681
00:39:28,620 --> 00:39:33,816
which means we can replace
this call with just v dot

682
00:39:33,816 --> 00:39:37,060
cluster i dot min.

683
00:39:37,060 --> 00:39:39,060
One down.

684
00:39:39,060 --> 00:39:50,330
OK, so if we look at
successor, of v comma x.

685
00:39:50,330 --> 00:39:53,270
I'm going to replace the last
line, or next to last line

686
00:39:53,270 --> 00:40:03,180
with j equals v
cluster i dot min.

687
00:40:10,070 --> 00:40:13,430
So now, we're down
to log u performance.

688
00:40:13,430 --> 00:40:15,610
We only have, at most,
two recursive calls.
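
[Editor's sketch] With the min stored at every level, the version with at most two recursive calls might look like this. The class layout, the bit-array base case, and None for "not found" are assumptions; at this stage the min is only a copy, so the element is still inserted recursively.

```python
class VEB:
    def __init__(self, w):
        self.w = w
        self.min = None                   # None marks an empty structure
        if w == 1:
            self.bits = [False, False]    # assumed base case, u = 2
        else:
            half = w // 2
            self.cluster = [VEB(half) for _ in range(1 << half)]
            self.summary = VEB(half)


def high(x, w):
    return x >> (w // 2)


def low(x, w):
    return x & ((1 << (w // 2)) - 1)


def index(i, j, w):
    return (i << (w // 2)) | j


def insert(v, x):
    if v.min is None or x < v.min:
        v.min = x                         # constant extra work per level
    if v.w == 1:
        v.bits[x] = True
        return
    insert(v.cluster[high(x, v.w)], low(x, v.w))
    insert(v.summary, high(x, v.w))


def successor(v, x):
    if v.w == 1:
        return 1 if x == 0 and v.bits[1] else None
    i = high(x, v.w)
    j = successor(v.cluster[i], low(x, v.w))   # call 1
    if j is None:
        i = successor(v.summary, i)            # call 2
        if i is None:
            return None
        j = v.cluster[i].min                   # lookup replaces the third call
    return index(i, j, v.w)


v = VEB(4)
for item in (2, 9, 14):
    insert(v, item)
print(successor(v, 2), successor(v, 10), successor(v, 14))
```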

689
00:40:15,610 --> 00:40:19,540
So that's partial progress.

690
00:40:19,540 --> 00:40:23,730
But we need another idea to
get rid of the second one.

691
00:40:23,730 --> 00:40:29,540
And the intuition here is that
really, only one of these calls

692
00:40:29,540 --> 00:40:31,150
should matter.

693
00:40:31,150 --> 00:40:35,429
OK, let's draw the big picture.

694
00:40:35,429 --> 00:40:37,220
Here's what the recursive
thing looks like.

695
00:40:37,220 --> 00:40:38,219
We've got v dot summary.

696
00:40:41,340 --> 00:40:46,340
Then we've got a
cluster 0, cluster 1,

697
00:40:46,340 --> 00:40:50,900
cluster square
root of u minus 1.

698
00:40:50,900 --> 00:40:53,930
Each of those is a
recursive structure.

699
00:40:53,930 --> 00:40:57,690
And we're also just storing
the min over here as a copy.

700
00:41:00,240 --> 00:41:05,820
So when I do a query
for, I don't know,

701
00:41:05,820 --> 00:41:11,950
the successor of this guy,
there's kind of two cases.

702
00:41:11,950 --> 00:41:16,160
One situation is that I
find the successor somewhere

703
00:41:16,160 --> 00:41:17,620
in this interval.

704
00:41:17,620 --> 00:41:18,700
In that case, I'm happy.

705
00:41:18,700 --> 00:41:22,474
Because I just need
this one recursive call.

706
00:41:22,474 --> 00:41:23,890
OK, the other case
is that I don't

707
00:41:23,890 --> 00:41:25,840
find what I'm looking for here.

708
00:41:25,840 --> 00:41:29,030
Then I have to do a
successor up here.

709
00:41:29,030 --> 00:41:30,250
And then I'm done.

710
00:41:30,250 --> 00:41:33,030
Then I can teleport into
whatever cluster it is.

711
00:41:33,030 --> 00:41:34,710
And I've stored the min by now.

712
00:41:34,710 --> 00:41:38,040
So that's constant time to
jump into the right spot

713
00:41:38,040 --> 00:41:40,880
in the cluster.

714
00:41:40,880 --> 00:41:43,580
So either I find what
I'm looking for here,

715
00:41:43,580 --> 00:41:46,295
or I find what I'm
looking for here.

716
00:41:46,295 --> 00:41:47,670
What would be
really nice is if I

717
00:41:47,670 --> 00:41:50,800
could tell ahead of time
which one is going to succeed.

718
00:41:50,800 --> 00:41:53,710
Because then, if I know this
is not going to find anything,

719
00:41:53,710 --> 00:41:56,190
I might as well just
go immediately up here,

720
00:41:56,190 --> 00:41:58,859
and look at the successor
in the summary structure.

721
00:41:58,859 --> 00:42:00,650
If I know I'm going to
find something here,

722
00:42:00,650 --> 00:42:02,180
I'll just do the successor here.

723
00:42:02,180 --> 00:42:03,435
And I'm done.

724
00:42:03,435 --> 00:42:04,810
If I could just
get away with one

725
00:42:04,810 --> 00:42:08,000
or the other of these calls,
not both, I'd be very happy.

726
00:42:08,000 --> 00:42:10,072
How could I tell that?

727
00:42:10,072 --> 00:42:11,064
Yeah.

728
00:42:11,064 --> 00:42:13,050
AUDIENCE: Store the max.

729
00:42:13,050 --> 00:42:15,510
PROFESSOR: Store the max.

730
00:42:15,510 --> 00:42:16,840
Store the min and the max.

731
00:42:16,840 --> 00:42:19,230
Why not?

732
00:42:19,230 --> 00:42:21,830
OK, I just need a
similar line here.

733
00:42:21,830 --> 00:42:26,570
If x is bigger than v
dot max, change the max.

734
00:42:31,090 --> 00:42:33,080
So now, I've augmented
my data structure

735
00:42:33,080 --> 00:42:35,200
to have the min and
max at every level.

736
00:42:35,200 --> 00:42:39,690
And what's going on here
is, I won't find an answer

737
00:42:39,690 --> 00:42:43,060
if I am greater than
or equal to the maximum

738
00:42:43,060 --> 00:42:44,670
within this cluster.

739
00:42:44,670 --> 00:42:46,580
That's how I tell.

740
00:42:46,580 --> 00:42:49,420
If I'm equal to the max,
or if I'm beyond the max,

741
00:42:49,420 --> 00:42:52,860
if all the items are over here,
the max will be to my left.

742
00:42:52,860 --> 00:42:54,830
And then I know I will
fail within the cluster.

743
00:42:54,830 --> 00:42:58,732
So I might as well just go up
to summary and do it there.

744
00:42:58,732 --> 00:43:00,565
On the other hand, if
I'm less than the max,

745
00:43:00,565 --> 00:43:02,981
then I'm guaranteed I will
find something in this cluster.

746
00:43:02,981 --> 00:43:05,630
And so I can just
search in there.

747
00:43:05,630 --> 00:43:07,740
So all I need to
do-- I'll probably

748
00:43:07,740 --> 00:43:09,740
have to rewrite this slightly.

749
00:43:12,420 --> 00:43:25,980
If x is-- not x, close.

750
00:43:25,980 --> 00:43:30,880
I'm going to mimic this
code a little bit, at least

751
00:43:30,880 --> 00:43:35,590
the first line is going
to be i equals high of x.

752
00:43:35,590 --> 00:43:38,310
And now, that's the
cluster I'm starting in.

753
00:43:38,310 --> 00:43:41,150
And I want to look at the
maximum of that cluster.

754
00:43:58,630 --> 00:44:01,920
So I'm looking at v
dot cluster i dot max.

755
00:44:01,920 --> 00:44:04,330
And I want to know,
is x before that?

756
00:44:04,330 --> 00:44:07,180
Now within that cluster,
x is known as low of x.

757
00:44:07,180 --> 00:44:12,520
So I compare low of x to
cluster i's maximum element.

758
00:44:12,520 --> 00:44:14,340
If we're strictly to
the left, then there

759
00:44:14,340 --> 00:44:18,020
is a successor guaranteed
within that substructure.

760
00:44:18,020 --> 00:44:20,120
And so, I should do this line.

761
00:44:22,980 --> 00:44:24,310
I wish I could copy paste.

762
00:44:24,310 --> 00:44:30,220
But I will copy by hand.

763
00:44:30,220 --> 00:44:43,692
Successor within v dot
cluster i, of low of x.

764
00:44:43,692 --> 00:44:46,020
OK, then I've found the
item I'm looking for.

765
00:44:49,140 --> 00:44:56,760
Else, I'm beyond the max, I
know this is the wrong cluster.

766
00:44:56,760 --> 00:45:00,450
And so I should immediately
do these two lines, well,

767
00:45:00,450 --> 00:45:03,400
except I've made the
second line use the min.

768
00:45:03,400 --> 00:45:06,320
So it will only be one recursive
call, followed by a min.

769
00:45:09,790 --> 00:45:21,110
OK, so this is going to be i
equals the successor within v

770
00:45:21,110 --> 00:45:26,695
dot summary of high of x.

771
00:45:40,460 --> 00:45:49,030
And then j is that
line successor

772
00:45:49,030 --> 00:45:51,370
within-- oh, sorry--
the line that I

773
00:45:51,370 --> 00:45:55,540
used to have here, which
is going to be v cluster i

774
00:45:55,540 --> 00:45:56,230
dot min.

775
00:46:00,830 --> 00:46:06,670
OK, and then, in both
cases, I return index of ij.

776
00:46:12,030 --> 00:46:14,890
OK, so we're doing essentially
the same logic as over here.

777
00:46:14,890 --> 00:46:17,155
Although I've replaced
the step with the min,

778
00:46:17,155 --> 00:46:18,894
to get rid of that
recursive call.

779
00:46:18,894 --> 00:46:21,060
But I'm really only doing
one or the other of these,

780
00:46:21,060 --> 00:46:23,380
using max to distinguish.

781
00:46:23,380 --> 00:46:25,970
If I'm left of the
max, I do the successor

782
00:46:25,970 --> 00:46:28,290
within cluster high of x.

783
00:46:28,290 --> 00:46:33,796
If I'm right of the max,
then I do the successor

784
00:46:33,796 --> 00:46:35,170
immediately in
summary structure.

785
00:46:35,170 --> 00:46:37,650
Because I know this won't
find anything useful.

786
00:46:37,650 --> 00:46:43,000
And then I find the min within
that non-empty structure.

787
00:46:43,000 --> 00:46:45,979
And in both cases, ij is
the element I'm looking for.

788
00:46:45,979 --> 00:46:47,395
I put it back
together with index.
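
[Editor's sketch] The single-call successor, using max to choose the branch, could be sketched as follows. As before, the class layout, the bit-array base case, and None for "not found" are assumptions; insert here still makes two recursive calls, as the lecture notes next.

```python
class VEB:
    def __init__(self, w):
        self.w = w
        self.min = None
        self.max = None
        if w == 1:
            self.bits = [False, False]    # assumed base case, u = 2
        else:
            half = w // 2
            self.cluster = [VEB(half) for _ in range(1 << half)]
            self.summary = VEB(half)


def high(x, w):
    return x >> (w // 2)


def low(x, w):
    return x & ((1 << (w // 2)) - 1)


def index(i, j, w):
    return (i << (w // 2)) | j


def insert(v, x):
    if v.min is None or x < v.min:
        v.min = x
    if v.max is None or x > v.max:
        v.max = x
    if v.w == 1:
        v.bits[x] = True
        return
    insert(v.cluster[high(x, v.w)], low(x, v.w))
    insert(v.summary, high(x, v.w))


def successor(v, x):
    if v.w == 1:
        return 1 if x == 0 and v.bits[1] else None
    i = high(x, v.w)
    c = v.cluster[i]
    if c.max is not None and low(x, v.w) < c.max:
        j = successor(c, low(x, v.w))     # guaranteed to find something in c
    else:
        i = successor(v.summary, i)       # x's cluster cannot hold the answer
        if i is None:
            return None
        j = v.cluster[i].min              # constant-time jump to the minimum
    return index(i, j, v.w)


v = VEB(4)
for item in (2, 9, 14):
    insert(v, item)
print(successor(v, 2), successor(v, 10), successor(v, 14))
```

Exactly one of the two branches runs at each level, so the recurrence is T(u) = T(sqrt(u)) + O(1).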

789
00:46:50,330 --> 00:46:52,840
Clear?

790
00:46:52,840 --> 00:46:57,150
What's the running
time of successor now?

791
00:46:57,150 --> 00:46:57,850
Log log u.

792
00:47:02,230 --> 00:47:03,720
Awesome.

793
00:47:03,720 --> 00:47:06,300
We've finished successor.

794
00:47:06,300 --> 00:47:09,390
Sadly, we have not
finished insert.

795
00:47:09,390 --> 00:47:11,300
Insert still takes log u time.

796
00:47:11,300 --> 00:47:13,720
But, b progress.

797
00:47:13,720 --> 00:47:16,150
Maybe your routing table
doesn't change that often,

798
00:47:16,150 --> 00:47:19,730
so you can afford to pay
some extra time for insert,

799
00:47:19,730 --> 00:47:21,790
as long as you can route
packets really fast,

800
00:47:21,790 --> 00:47:24,310
as long as you can find
where something belongs,

801
00:47:24,310 --> 00:47:26,760
the successor in log log u time.

802
00:47:26,760 --> 00:47:31,730
But for kicks, let's do
insert in log log u as well.

803
00:47:31,730 --> 00:47:35,070
This is going to
be a little harder,

804
00:47:35,070 --> 00:47:39,070
or I would say a
more surprising idea.

805
00:47:41,439 --> 00:47:41,980
This may be--

806
00:47:55,681 --> 00:47:57,960
I don't have a great
intuition for this step.

807
00:47:57,960 --> 00:47:58,671
I'm thinking.

808
00:48:01,330 --> 00:48:05,010
But again, most of the time,
this should be fine, right?

809
00:48:05,010 --> 00:48:08,720
Most of the time, we insert into
cluster high of x, low of x,

810
00:48:08,720 --> 00:48:09,810
and we're done.

811
00:48:09,810 --> 00:48:14,100
As long as there is something
already in that cluster,

812
00:48:14,100 --> 00:48:16,330
we don't need to update
the summary structure.

813
00:48:16,330 --> 00:48:19,380
As long as high of x has already
been inserted into the summary

814
00:48:19,380 --> 00:48:22,200
structure, we can get away
with just this first step.

815
00:48:22,200 --> 00:48:25,110
The tricky part is detecting.

816
00:48:25,110 --> 00:48:26,810
How would we know?

817
00:48:26,810 --> 00:48:30,210
Well, that's not enough
just to detect it.

818
00:48:30,210 --> 00:48:33,110
If high of x is not
in v dot summary,

819
00:48:33,110 --> 00:48:34,685
we have to do this insert.

820
00:48:34,685 --> 00:48:37,420
We can't get away with it.

821
00:48:37,420 --> 00:48:38,679
But that's kind of rare.

822
00:48:38,679 --> 00:48:40,220
That only happens
the very first time

823
00:48:40,220 --> 00:48:41,872
you insert into that cluster.

824
00:48:41,872 --> 00:48:44,080
Every subsequent time, it's
going to be really cheap.

825
00:48:44,080 --> 00:48:47,650
We just have to do this.

826
00:48:47,650 --> 00:48:51,590
It's easy enough to keep track
of whether a cluster is empty.

827
00:48:51,590 --> 00:48:53,300
For example, we're
storing the min.

828
00:48:53,300 --> 00:48:57,910
We can say v dot min is
none, special value, whenever

829
00:48:57,910 --> 00:49:00,579
the structure v is empty.

830
00:49:00,579 --> 00:49:03,120
But we still have this problem,
that the first time we insert

831
00:49:03,120 --> 00:49:05,035
into a cluster, it's expensive.

832
00:49:05,035 --> 00:49:06,160
Because we have to do this.

833
00:49:06,160 --> 00:49:09,390
And we have to do this.

834
00:49:09,390 --> 00:49:17,110
How could we avoid, in the
case where a cluster is empty--

835
00:49:17,110 --> 00:49:19,981
remember, an overall
structure looks like this.

836
00:49:19,981 --> 00:49:22,230
We can tell that it's empty
by saying min equals none,

837
00:49:22,230 --> 00:49:24,590
let's say.

838
00:49:24,590 --> 00:49:25,600
What could I do?

839
00:49:25,600 --> 00:49:27,363
Sorry, there's also a max now.

840
00:49:30,820 --> 00:49:35,140
What could I do to
speed up inserting

841
00:49:35,140 --> 00:49:36,282
into an empty cluster?

842
00:49:36,282 --> 00:49:37,990
Because I'm first
going to have to insert

843
00:49:37,990 --> 00:49:38,949
into the empty cluster.

844
00:49:38,949 --> 00:49:41,031
Then I'm going to have to
insert into the summary.

845
00:49:41,031 --> 00:49:42,260
I can't get away from this.

846
00:49:42,260 --> 00:49:46,354
So I'd like this to become
cheap, in the special case when

847
00:49:46,354 --> 00:49:47,270
this cluster is empty.

848
00:49:53,050 --> 00:49:53,550
Yeah.

849
00:49:53,550 --> 00:49:55,070
AUDIENCE: Lazy propagation.

850
00:49:55,070 --> 00:49:57,800
PROFESSOR: Lazy propagation--
you want to elaborate?

851
00:49:57,800 --> 00:49:58,780
AUDIENCE: Yeah.

852
00:49:58,780 --> 00:50:04,660
We mark the place we
want to insert in.

853
00:50:04,660 --> 00:50:07,914
And then we will take it down
whenever we [? insert ?] there.

854
00:50:07,914 --> 00:50:08,580
PROFESSOR: Good.

855
00:50:08,580 --> 00:50:11,690
So when I insert into
an empty structure,

856
00:50:11,690 --> 00:50:15,460
I'm just going to have a little
lazy field, or something.

857
00:50:15,460 --> 00:50:18,170
And I'll put the item in there.

858
00:50:18,170 --> 00:50:19,940
And then the next
time I insert into it,

859
00:50:19,940 --> 00:50:22,550
maybe I'll carry it
down a little bit.

860
00:50:22,550 --> 00:50:24,120
That actually works.

861
00:50:24,120 --> 00:50:27,529
And that was the original
van Emde Boas structure,

862
00:50:27,529 --> 00:50:29,070
[? I ?] [? learned ?]
[? recently. ?]

863
00:50:29,070 --> 00:50:31,390
So that works.

864
00:50:31,390 --> 00:50:33,900
But it's a little more
complicated than the solution

865
00:50:33,900 --> 00:50:35,040
I have in mind.

866
00:50:35,040 --> 00:50:41,940
So I'm going to unify that lazy
field with the minimum field.

867
00:50:41,940 --> 00:50:43,870
Say, when I insert
into a structure,

868
00:50:43,870 --> 00:50:45,570
if there's nothing
here, I'm just

869
00:50:45,570 --> 00:50:49,370
going to put the item
there, and not recurse.

870
00:50:49,370 --> 00:50:54,239
I just am not going to store
the minimum item recursively.

871
00:50:54,239 --> 00:50:55,030
Definitely frisbee.

872
00:50:57,940 --> 00:51:02,230
So that's the last
idea, pretty much.

873
00:51:11,040 --> 00:51:18,335
Idea number five is, don't
store the min recursively.

874
00:51:23,880 --> 00:51:26,672
This is effectively
equivalent to lazy.

875
00:51:26,672 --> 00:51:28,130
But we're actually
just never going

876
00:51:28,130 --> 00:51:30,890
to get around to
moving this guy down.

877
00:51:30,890 --> 00:51:32,180
Just leave it there.

878
00:51:32,180 --> 00:51:35,606
First, if the min field is
blank, store the item there.

879
00:51:35,606 --> 00:51:36,106
Yeah.

880
00:51:36,106 --> 00:51:38,629
AUDIENCE: What do you mean
by moving the guy down?

881
00:51:38,629 --> 00:51:40,670
PROFESSOR: Don't worry
about moving the guy down.

882
00:51:40,670 --> 00:51:41,711
We're not going to do it.

883
00:51:41,711 --> 00:51:43,230
AUDIENCE: [INAUDIBLE]

884
00:51:43,230 --> 00:51:45,000
PROFESSOR: But in
general, moving down

885
00:51:45,000 --> 00:51:46,980
means, when I want
to insert an item,

886
00:51:46,980 --> 00:51:50,570
I have to move it down
into its sub cluster.

887
00:51:50,570 --> 00:51:54,020
So I want to insert
x into the cluster,

888
00:51:54,020 --> 00:51:56,680
high of x with low of
x, that recursive call.

889
00:51:56,680 --> 00:51:58,230
That's moving it down.

890
00:51:58,230 --> 00:51:59,320
I'm not going to do that.

891
00:51:59,320 --> 00:52:02,700
If the structure
is empty, I'm going

892
00:52:02,700 --> 00:52:06,910
to set v dot min equal
to x, and then stop.

893
00:52:06,910 --> 00:52:18,725
Let me illustrate with
some code, maybe over here.

894
00:52:44,960 --> 00:52:46,170
Here's what I mean.

895
00:52:46,170 --> 00:52:50,740
If v dot min is the special
None value-- use

896
00:52:50,740 --> 00:52:54,370
Python notation here--
then I'm just going

897
00:52:54,370 --> 00:52:55,730
to set v dot min equal to x.

898
00:52:55,730 --> 00:52:58,470
I should also set v
dot max equal to x.

899
00:52:58,470 --> 00:53:00,570
Because I want to keep
track of the maximum.

900
00:53:00,570 --> 00:53:01,540
And then, stop.

901
00:53:01,540 --> 00:53:03,480
Return.

902
00:53:03,480 --> 00:53:04,960
That's all I will
do for inserting

903
00:53:04,960 --> 00:53:08,320
into an empty structure, is
stick it in the min and max fields.

904
00:53:11,040 --> 00:53:13,120
OK, this may seem
like a minor change.

905
00:53:13,120 --> 00:53:16,550
But it's going to
make this cheap.

906
00:53:16,550 --> 00:53:20,040
So the rest of the algorithm
is going to be pretty similar.

907
00:53:20,040 --> 00:53:23,700
There's a couple
annoying special cases,

908
00:53:23,700 --> 00:53:26,070
which is, we have to
keep the min up to date.

909
00:53:26,070 --> 00:53:28,925
And we have to keep the
max up to date, in general.

910
00:53:31,860 --> 00:53:32,750
This one is easy.

911
00:53:32,750 --> 00:53:35,881
We just set v dot
max equal to x.

912
00:53:35,881 --> 00:53:37,880
Because we're not doing
anything fancy with max.

913
00:53:37,880 --> 00:53:39,000
Min is a little special.

914
00:53:39,000 --> 00:53:43,150
Because if we're
inserting an item smaller

915
00:53:43,150 --> 00:53:47,960
than the current minimum, then
really x belongs in the slot.

916
00:53:47,960 --> 00:53:49,670
And then whatever
was in here needs

917
00:53:49,670 --> 00:53:51,390
to be recursively inserted.

918
00:53:51,390 --> 00:53:59,044
OK, so I'm going to say
swap x with v dot min.

919
00:53:59,044 --> 00:54:00,960
So I'm going to put x
into the v dot min slot.

920
00:54:00,960 --> 00:54:03,200
And I'm going to pull out
whatever item was in there

921
00:54:03,200 --> 00:54:04,800
and call it x now.

922
00:54:04,800 --> 00:54:06,919
And now my remaining
goal is to insert x

923
00:54:06,919 --> 00:54:08,210
into the rest of the structure.

924
00:54:08,210 --> 00:54:12,219
There's only one item that
gets this freedom of not

925
00:54:12,219 --> 00:54:13,260
being recursively stored.

926
00:54:13,260 --> 00:54:15,093
And it's always going
to be the minimum one.

927
00:54:15,093 --> 00:54:18,524
So this way, the new
value x goes there.

928
00:54:18,524 --> 00:54:21,190
Whatever used to be there now
has to be recursively inserted.

929
00:54:21,190 --> 00:54:23,170
Because every item
except the minimum,

930
00:54:23,170 --> 00:54:25,660
we're going to
recursively insert.

931
00:54:25,660 --> 00:54:27,680
So the rest is
pretty much the same.

932
00:54:27,680 --> 00:54:33,500
But we're going to,
instead of always inserting

933
00:54:33,500 --> 00:54:35,060
into the summary
structure, we're

934
00:54:35,060 --> 00:54:37,740
going to see whether
it's necessary.

935
00:54:37,740 --> 00:54:39,370
Because we know how to do that.

936
00:54:39,370 --> 00:54:42,720
We just look at a
cluster high of x.

937
00:54:42,720 --> 00:54:47,720
And we see, is it empty?

938
00:54:47,720 --> 00:54:55,810
Cluster high of x-- and empty
means its minimum is none.

939
00:54:59,450 --> 00:55:02,860
So we're going to--
in fact, the next line

940
00:55:02,860 --> 00:55:09,670
after this one is going
to be insert v cluster

941
00:55:09,670 --> 00:55:21,270
high of x, comma low of x.

942
00:55:21,270 --> 00:55:23,741
All right, that's this line.

943
00:55:23,741 --> 00:55:24,990
We're always going to do that.

944
00:55:27,680 --> 00:55:30,080
And in the special case,
where there was previously

945
00:55:30,080 --> 00:55:32,810
nothing in v cluster
high of x, we

946
00:55:32,810 --> 00:55:35,080
need to update the
summary structure.

947
00:55:35,080 --> 00:55:38,550
And we do that with this line.

948
00:55:38,550 --> 00:55:54,490
So I'm going to insert into
v dot summary high of x.

949
00:55:57,900 --> 00:56:00,737
But I'm only doing that in
the case when I need to.

950
00:56:00,737 --> 00:56:03,320
If it was already non-empty, I
know this has already happened.

951
00:56:03,320 --> 00:56:06,640
So I don't need to bother
with that insertion.
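
The insert procedure described above can be sketched in Python roughly as follows. This is a minimal sketch, not the code from the board: the class name `VEB`, the `bits` parameter (universe size u = 2**bits), the eager allocation of every cluster, the base case for a two-element universe, and the duplicate check are all assumptions made to keep the example runnable.

```python
class VEB:
    """Sketch of a van Emde Boas node over the universe {0, ..., 2**bits - 1}."""

    def __init__(self, bits):
        self.bits = bits
        self.min = None  # None marks an empty structure; the min is stored only here
        self.max = None
        if bits > 1:
            self.lo_bits = bits // 2           # width of low(x)
            hi_bits = bits - self.lo_bits      # width of high(x)
            self.summary = VEB(hi_bits)
            self.cluster = [VEB(self.lo_bits) for _ in range(1 << hi_bits)]

    def high(self, x):
        return x >> self.lo_bits

    def low(self, x):
        return x & ((1 << self.lo_bits) - 1)

    def insert(self, x):
        if self.min is None:                   # empty: constant time, no recursion
            self.min = self.max = x
            return
        if x == self.min:                      # assumption: duplicates are ignored
            return
        if x < self.min:                       # x takes the min slot; old min moves down
            self.min, x = x, self.min
        if x > self.max:
            self.max = x
        if self.bits == 1:                     # universe {0, 1}: min and max say it all
            return
        c = self.cluster[self.high(x)]
        if c.min is None:                      # cluster was empty: record it in summary,
            self.summary.insert(self.high(x))  # and the insert below is constant time
        c.insert(self.low(x))
```

With `bits=4`, inserting 5, 3, 9 leaves 3 in the min slot only, while 5 and 9 land in clusters 1 and 2, so the summary holds 1 and 2.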

952
00:56:06,640 --> 00:56:08,230
OK, this is a weird algorithm.

953
00:56:08,230 --> 00:56:11,150
Because it doesn't
look much better.

954
00:56:11,150 --> 00:56:15,110
In the worst case, we're doing
two recursive calls to insert.

955
00:56:15,110 --> 00:56:18,748
But I claim this runs
in log log u time.

956
00:56:18,748 --> 00:56:19,248
Why?

957
00:56:25,152 --> 00:56:26,628
Yeah.

958
00:56:26,628 --> 00:56:30,564
AUDIENCE: Because when we
update the v dot summary,

959
00:56:30,564 --> 00:56:32,774
we [? just ?] [? have the ?]
[? first ?] [? line. ?]

960
00:56:32,774 --> 00:56:33,440
PROFESSOR: Good.

961
00:56:33,440 --> 00:56:34,230
Yeah.

962
00:56:34,230 --> 00:56:36,540
In the case when I have to
do this summary insertion,

963
00:56:36,540 --> 00:56:38,190
I know this guy was empty.

964
00:56:38,190 --> 00:56:39,770
Cluster high of x was empty.

965
00:56:39,770 --> 00:56:43,640
So this call is just going
to do these two lines.

966
00:56:43,640 --> 00:56:45,680
Because I optimized
the case of empty--

967
00:56:45,680 --> 00:56:48,160
when a structure is empty,
I spend constant time,

968
00:56:48,160 --> 00:56:49,960
no recursive calls.

969
00:56:49,960 --> 00:56:52,900
That means in the case when
cluster high of x is empty,

970
00:56:52,900 --> 00:56:55,450
and I have to pay to insert
into the summary structure,

971
00:56:55,450 --> 00:56:57,630
I know my second call is
going to be free, only

972
00:56:57,630 --> 00:56:59,540
take constant time.

973
00:56:59,540 --> 00:57:02,510
So either I do this, in which
case this takes constant time,

974
00:57:02,510 --> 00:57:06,090
or I don't do this, in which
case I make one recursive call.

975
00:57:06,090 --> 00:57:10,690
In both cases, I really am
only making one recursive call.

976
00:57:10,690 --> 00:57:19,560
OK, so this runs in log log u.

977
00:57:19,560 --> 00:57:22,170
Because I get the T of u
equals 1 times T of square root

978
00:57:22,170 --> 00:57:24,490
of u, plus order 1, recurrence.

979
00:57:24,490 --> 00:57:28,404
All the work I'm doing
here is constant time,

980
00:57:28,404 --> 00:57:29,695
other than the recursive calls.
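
One way to see why this recurrence gives log log u is a change of variables. The following is a sketch; the symbols w and S are introduced here for illustration and are not from the lecture.

```latex
T(u) = T(\sqrt{u}) + O(1), \qquad w = \log_2 u, \quad S(w) = T(2^w)
\;\Longrightarrow\; S(w) = S(w/2) + O(1) = O(\log w)
\;\Longrightarrow\; T(u) = O(\log \log u).
```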

981
00:57:32,901 --> 00:57:33,400
Question?

982
00:57:33,400 --> 00:57:36,872
AUDIENCE: So when we
insert the first time,

983
00:57:36,872 --> 00:57:40,022
we don't update v dot summary?

984
00:57:40,022 --> 00:57:42,480
PROFESSOR: When I insert into
a completely empty structure,

985
00:57:42,480 --> 00:57:43,800
we don't update summary at all.

986
00:57:43,800 --> 00:57:44,430
That's right.

987
00:57:44,430 --> 00:57:46,870
We just store it in the
min, and we're done.

988
00:57:46,870 --> 00:57:47,760
AUDIENCE: Oh.

989
00:57:47,760 --> 00:57:52,401
So then, if you were to
[? call ?] the successor,

990
00:57:52,401 --> 00:57:52,900
and you--

991
00:57:52,900 --> 00:57:53,841
PROFESSOR: Good.

992
00:57:53,841 --> 00:57:54,340
Yeah.

993
00:57:54,340 --> 00:57:57,000
The successor algorithm
is currently incorrect.

994
00:57:57,000 --> 00:57:58,146
Thank you.

995
00:57:58,146 --> 00:58:01,935
Here's some frisbees for that
question and the last answer.

996
00:58:05,230 --> 00:58:05,730
Yeah.

997
00:58:05,730 --> 00:58:08,110
This code is now slightly wrong.

998
00:58:08,110 --> 00:58:12,480
Because sometimes I'm storing
elements in v dot min.

999
00:58:12,480 --> 00:58:15,240
And successor is just
completely ignoring them.

1000
00:58:15,240 --> 00:58:18,040
So it's not going
to find those items.

1001
00:58:18,040 --> 00:58:19,900
Luckily, it's a very simple fix.

1002
00:58:26,360 --> 00:58:30,180
Out of room, but please
insert right in here.

1003
00:58:30,180 --> 00:58:41,150
If x is less than v dot
min, return v dot min.

1004
00:58:41,150 --> 00:58:43,440
That's all we need to do.

1005
00:58:43,440 --> 00:58:44,590
The min is special.

1006
00:58:44,590 --> 00:58:46,730
Because we're not
storing it recursively.

1007
00:58:46,730 --> 00:58:49,240
And so, we can't rely on all
of our recursive structures.

1008
00:58:49,240 --> 00:58:50,630
We can't rely on cluster i.

1009
00:58:50,630 --> 00:58:54,510
We can't rely on summary, on
reporting about v dot min.

1010
00:58:54,510 --> 00:58:57,720
v dot min is just a
special item sitting there.

1011
00:58:57,720 --> 00:58:59,040
It's represented nowhere else.

1012
00:59:01,560 --> 00:59:02,350
But we can check.

1013
00:59:02,350 --> 00:59:03,480
Because it's the
minimum element,

1014
00:59:03,480 --> 00:59:05,000
and we're looking
for successors,

1015
00:59:05,000 --> 00:59:06,940
it's really easy to
check for whether it's

1016
00:59:06,940 --> 00:59:09,070
the item we're looking for.

1017
00:59:09,070 --> 00:59:10,320
Because it's the smallest one.

1018
00:59:10,320 --> 00:59:13,372
If we're smaller than it, then
that's clearly the successor.

1019
00:59:13,372 --> 00:59:15,970
OK, so in that case, we
just spent constant time.

1020
00:59:15,970 --> 00:59:18,520
So it actually speeds up some
situations for successor.

1021
00:59:18,520 --> 00:59:19,825
We're not exploiting that here.

1022
00:59:19,825 --> 00:59:21,450
It doesn't help much
in the worst case.

1023
00:59:21,450 --> 00:59:22,870
But now, it should be correct.

1024
00:59:22,870 --> 00:59:25,322
Hopefully, you're happy.

1025
00:59:25,322 --> 00:59:26,155
Any other questions?

1026
00:59:29,670 --> 00:59:33,470
So at this point, we have what
I will call a van Emde Boas.

1027
00:59:33,470 --> 00:59:37,560
This last version-- we can do
insert and successor in log

1028
00:59:37,560 --> 00:59:38,220
log u time.

1029
00:59:41,100 --> 00:59:42,510
Yeah, sorry.

1030
00:59:42,510 --> 00:59:45,430
I modified the wrong
successor algorithm, didn't I?

1031
00:59:45,430 --> 00:59:46,910
I meant to modify this one.

1032
00:59:46,910 --> 00:59:47,830
This is the fast one.

1033
00:59:47,830 --> 00:59:53,300
So please put that code here.

1034
00:59:53,300 --> 00:59:55,940
That's the log log u
version of successor.

1035
00:59:55,940 --> 00:59:58,790
We just added this
constant time check.

1036
00:59:58,790 --> 01:00:01,050
And now this runs
in log log u time.
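
A sketch of successor with the constant-time min check added, again in Python. As before, the class name `VEB`, the `bits` parameter, the `index` helper, the eager cluster allocation, and the base case are assumptions made for a self-contained example. The insert method is the one described earlier in the lecture.

```python
class VEB:
    """Sketch of a van Emde Boas node over the universe {0, ..., 2**bits - 1}."""

    def __init__(self, bits):
        self.bits, self.min, self.max = bits, None, None
        if bits > 1:
            self.lo_bits = bits // 2
            hi_bits = bits - self.lo_bits
            self.summary = VEB(hi_bits)
            self.cluster = [VEB(self.lo_bits) for _ in range(1 << hi_bits)]

    def high(self, x):
        return x >> self.lo_bits

    def low(self, x):
        return x & ((1 << self.lo_bits) - 1)

    def index(self, i, j):
        return (i << self.lo_bits) | j

    def insert(self, x):
        if self.min is None:
            self.min = self.max = x
            return
        if x == self.min:                      # assumption: duplicates are ignored
            return
        if x < self.min:
            self.min, x = x, self.min
        if x > self.max:
            self.max = x
        if self.bits == 1:
            return
        c = self.cluster[self.high(x)]
        if c.min is None:
            self.summary.insert(self.high(x))
        c.insert(self.low(x))

    def successor(self, x):
        """Smallest stored item strictly greater than x, or None."""
        if self.min is None:
            return None
        if x < self.min:                       # the min lives nowhere else: check it here
            return self.min
        if self.bits == 1:                     # universe {0, 1}
            return self.max if x < self.max else None
        i = self.high(x)
        c = self.cluster[i]
        if c.max is not None and self.low(x) < c.max:
            j = c.successor(self.low(x))       # answer is inside x's own cluster
        else:
            i = self.summary.successor(i)      # otherwise find the next non-empty cluster
            if i is None:
                return None
            j = self.cluster[i].min            # constant time, not a recursive call
        return self.index(i, j)
```

Only one of the two branches recurses, which is what keeps the running time at log log u.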

1037
01:00:01,050 --> 01:00:03,770
The key idea here was
if we store the max,

1038
01:00:03,770 --> 01:00:06,780
then we know which of the two
recursive calls we need to do.

1039
01:00:06,780 --> 01:00:08,500
If we store the min,
this doesn't end up

1040
01:00:08,500 --> 01:00:10,030
being a recursive call.

1041
01:00:10,030 --> 01:00:11,170
So that's very clean.

1042
01:00:11,170 --> 01:00:13,644
With insert, we needed this
trickier idea that the min,

1043
01:00:13,644 --> 01:00:15,560
we're not even going to
recursively represent.

1044
01:00:15,560 --> 01:00:17,390
We'll just keep it there.

1045
01:00:17,390 --> 01:00:20,340
That requires this extra
little check for successor.

1046
01:00:20,340 --> 01:00:22,580
But it allows us to
do insert cheaply

1047
01:00:22,580 --> 01:00:27,530
in all cases-- cheap meaning
only one recursive call.

1048
01:00:27,530 --> 01:00:29,530
Either we need to update
the summary structure,

1049
01:00:29,530 --> 01:00:31,310
in which case that
thing was empty,

1050
01:00:31,310 --> 01:00:34,720
and so we can think
of that cluster--

1051
01:00:34,720 --> 01:00:36,740
so we have this special
case of inserting

1052
01:00:36,740 --> 01:00:39,520
into an empty cluster,
which is super cheap,

1053
01:00:39,520 --> 01:00:42,900
or most of the time, you imagine
that the cluster was already

1054
01:00:42,900 --> 01:00:43,400
non-empty.

1055
01:00:43,400 --> 01:00:45,608
And so we don't need to
update the summary structure.

1056
01:00:45,608 --> 01:00:48,110
And then we just
do this recursion.

1057
01:00:48,110 --> 01:00:51,210
So in all cases,
everything is cheap.

1058
01:00:51,210 --> 01:00:54,940
Now the one thing I've
been avoiding is delete.

1059
01:00:54,940 --> 01:00:55,991
Yeah, question.

1060
01:00:55,991 --> 01:00:58,817
AUDIENCE: [INAUDIBLE] If x
is greater than [? v ?] max,

1061
01:00:58,817 --> 01:01:03,060
[? we ?] [? swap ?] [? x ?]
[? with ?] [? v ?] [? max? ?]

1062
01:01:03,060 --> 01:01:06,580
PROFESSOR: So if x is
greater than v max,

1063
01:01:06,580 --> 01:01:08,730
I'm just going to update v max.

1064
01:01:08,730 --> 01:01:10,240
V max is stored recursively.

1065
01:01:10,240 --> 01:01:12,520
We're not doing anything
fancy with v max.

1066
01:01:12,520 --> 01:01:15,650
And we had, at some
point, a similar line.

1067
01:01:15,650 --> 01:01:18,616
So this is just updating v max.

1068
01:01:18,616 --> 01:01:20,280
Yeah, nothing special there.

1069
01:01:20,280 --> 01:01:23,100
In your problem set, you'll look
at a more symmetric version,

1070
01:01:23,100 --> 01:01:25,620
where you don't recursively
store min and max.

1071
01:01:25,620 --> 01:01:26,700
It works about the same.

1072
01:01:26,700 --> 01:01:30,430
But in some ways, the
code is actually prettier.

1073
01:01:30,430 --> 01:01:31,804
So you'll get to do that.

1074
01:01:31,804 --> 01:01:32,470
Other questions?

1075
01:01:35,410 --> 01:01:37,900
All right.

1076
01:01:37,900 --> 01:01:40,880
So, delete.

1077
01:01:40,880 --> 01:01:42,174
We have insert and successor.

1078
01:01:42,174 --> 01:01:44,090
And through all these
steps, it would actually

1079
01:01:44,090 --> 01:01:46,280
be very hard to do delete.

1080
01:01:46,280 --> 01:01:50,810
It turns out, at this
point, delete is no problem.

1081
01:01:50,810 --> 01:01:54,180
So let me give you
some delete code.

1082
01:02:21,650 --> 01:02:22,650
It's a little bit long.

1083
01:02:26,190 --> 01:02:29,401
Maybe I'll start with
a high level picture,

1084
01:02:29,401 --> 01:02:30,545
sort of the main cases.

1085
01:02:36,260 --> 01:02:38,930
Deleting the min is a little bit
special, as you might imagine.

1086
01:02:38,930 --> 01:02:41,090
That element is different
from every other element.

1087
01:02:41,090 --> 01:02:44,660
So if x equals min, we're
going to do something else.

1088
01:02:44,660 --> 01:02:46,620
But let me specify that later.

1089
01:02:46,620 --> 01:02:49,500
Let's get to the bulk
of the code, which

1090
01:02:49,500 --> 01:03:10,220
is we're going to delete low
of x from cluster high of x.

1091
01:03:10,220 --> 01:03:12,600
That's the obvious
recursion to do.

1092
01:03:12,600 --> 01:03:17,490
This is essentially the
reverse of insert over here.

1093
01:03:17,490 --> 01:03:19,450
The first thing we
do is undo this.

1094
01:03:19,450 --> 01:03:21,030
In all cases, insert
was doing that.

1095
01:03:21,030 --> 01:03:22,420
So in all cases,
delete has to do

1096
01:03:22,420 --> 01:03:26,730
that, other than the
special case of the min.

1097
01:03:26,730 --> 01:03:29,090
And then, we need to
do the inverse of this.

1098
01:03:29,090 --> 01:03:33,580
So if that was the
last item, then we

1099
01:03:33,580 --> 01:03:36,100
need to delete from
the summary structure.

1100
01:03:36,100 --> 01:03:40,250
So it's actually
pretty symmetric,

1101
01:03:40,250 --> 01:03:43,260
other than the tiny details.

1102
01:03:43,260 --> 01:03:48,300
So after we delete, we can
check, is that structure empty.

1103
01:03:48,300 --> 01:03:53,110
Because then, the
min would equal none.

1104
01:03:53,110 --> 01:03:55,050
OK.

1105
01:03:55,050 --> 01:03:58,225
If that's the case, we delete
from the summary structure.

1106
01:04:13,946 --> 01:04:14,446
OK.

1107
01:04:18,350 --> 01:04:20,120
Cool.

1108
01:04:20,120 --> 01:04:22,870
And there is a bit
of a special case

1109
01:04:22,870 --> 01:04:26,985
at the end, which is when we
deleted the maximum element.

1110
01:04:31,130 --> 01:04:32,780
OK, so I need to fill these in.

1111
01:04:36,570 --> 01:04:39,980
And it's important that
these are filled in right.

1112
01:04:39,980 --> 01:04:41,800
Because in some
situations here, we

1113
01:04:41,800 --> 01:04:44,170
are making two recursive calls.

1114
01:04:44,170 --> 01:04:48,000
But again, we'd like it to
be, when we do both calls,

1115
01:04:48,000 --> 01:04:50,194
we want one of them to be cheap.

1116
01:04:50,194 --> 01:04:51,610
Now this one's
hard to make cheap.

1117
01:04:51,610 --> 01:04:54,150
So when we delete from
the summary structure,

1118
01:04:54,150 --> 01:04:57,150
we want this delete to
have taken only constant time,

1119
01:04:57,150 --> 01:04:59,220
no recursions.

1120
01:04:59,220 --> 01:05:01,550
And that's going to
correspond to this case.

1121
01:05:01,550 --> 01:05:04,700
Because if we made
the cluster empty,

1122
01:05:04,700 --> 01:05:06,720
that means we deleted
the last item.

1123
01:05:06,720 --> 01:05:07,700
What's the last item?

1124
01:05:07,700 --> 01:05:10,630
Has to be v dot min.

1125
01:05:10,630 --> 01:05:12,280
If you have a size
1 structure, it's

1126
01:05:12,280 --> 01:05:14,310
always because that
item is in v dot min,

1127
01:05:14,310 --> 01:05:16,160
everything else is empty.

1128
01:05:16,160 --> 01:05:18,260
So that's the case of
deleting v dot min.

1129
01:05:18,260 --> 01:05:22,660
So we want this case to
take constant time when it's

1130
01:05:22,660 --> 01:05:26,010
the last item we're deleting.

1131
01:05:26,010 --> 01:05:30,550
So let's fill that in a little.

1132
01:05:38,032 --> 01:05:39,240
Let's see if I can fit it in.

1133
01:06:29,090 --> 01:06:30,850
This is code that
turns out to work

1134
01:06:30,850 --> 01:06:33,650
in this if x equals v dot min.

1135
01:06:33,650 --> 01:06:36,200
It's a little bit subtle.

1136
01:06:36,200 --> 01:06:39,940
But the key thing to check
here is, we want to know,

1137
01:06:39,940 --> 01:06:41,810
is this the last item.

1138
01:06:41,810 --> 01:06:44,980
And one way to do that is to
look at the summary structure,

1139
01:06:44,980 --> 01:06:48,142
and say, do you have
any non-empty clusters?

1140
01:06:48,142 --> 01:06:49,850
If you don't have any
non-empty clusters,

1141
01:06:49,850 --> 01:06:52,250
that means your min is none.

1142
01:06:52,250 --> 01:06:55,460
And that means, the only thing
keeping the structure non-empty

1143
01:06:55,460 --> 01:06:56,650
is the minimum element.

1144
01:06:56,650 --> 01:06:58,120
That's stored in v dot min.

1145
01:06:58,120 --> 01:06:59,860
So in that case, that's
the one situation

1146
01:06:59,860 --> 01:07:03,070
when v dot min becomes none.

1147
01:07:03,070 --> 01:07:07,420
We never set v dot min equals
none in the other algorithms.

1148
01:07:07,420 --> 01:07:10,590
Because initially
everything is none.

1149
01:07:10,590 --> 01:07:13,510
But when we're inserting,
we never empty a structure.

1150
01:07:13,510 --> 01:07:14,620
Now we're doing delete.

1151
01:07:14,620 --> 01:07:16,440
This is the one
situation when v dot

1152
01:07:16,440 --> 01:07:19,050
min becomes none from scratch.

1153
01:07:19,050 --> 01:07:21,780
In that case, no
recursive calls.

1154
01:07:21,780 --> 01:07:24,250
So that means this
algorithm is efficient.

1155
01:07:24,250 --> 01:07:26,500
Because if I had to delete
from the summary structure,

1156
01:07:26,500 --> 01:07:29,610
this only had a single item,
which is this situation.

1157
01:07:29,610 --> 01:07:31,630
Then I just set v dot
min equal to none.

1158
01:07:31,630 --> 01:07:32,700
And I'm done.

1159
01:07:32,700 --> 01:07:35,160
So this will, overall,
run in log log u time.

1160
01:07:40,170 --> 01:07:42,000
Now, it could be we're
deleting the min,

1161
01:07:42,000 --> 01:07:43,820
but it was not the only item.

1162
01:07:43,820 --> 01:07:46,570
So that's this situation.

1163
01:07:46,570 --> 01:07:49,260
In that situation,
we want to find out

1164
01:07:49,260 --> 01:07:50,580
what the min actually is.

1165
01:07:50,580 --> 01:07:51,080
Right?

1166
01:07:51,080 --> 01:07:52,487
We just deleted the min.

1167
01:07:52,487 --> 01:07:54,070
We want to put
something in v dot min.

1168
01:07:54,070 --> 01:07:55,040
We can't set it to none.

1169
01:07:55,040 --> 01:07:57,206
Because that indicates the
whole structure is empty.

1170
01:07:57,206 --> 01:08:01,190
So we have to recursively rip
out the new minimum item.

1171
01:08:01,190 --> 01:08:03,660
Because it should not be
recursively stored anymore.

1172
01:08:03,660 --> 01:08:06,200
And then we're going to
stick it into v dot min.

1173
01:08:06,200 --> 01:08:10,380
So now, finding minimum items
is actually pretty easy.

1174
01:08:10,380 --> 01:08:12,990
We just look at the
first non-empty structure.

1175
01:08:12,990 --> 01:08:15,050
And we look at
the-- I think I'm

1176
01:08:15,050 --> 01:08:19,149
missing-- oh, v dot
cluster i min, I guess,

1177
01:08:19,149 --> 01:08:21,710
closed parenthesis.

1178
01:08:21,710 --> 01:08:26,300
That is the minimum item
in the first cluster.

1179
01:08:26,300 --> 01:08:29,370
So I want to
recursively delete it.

1180
01:08:29,370 --> 01:08:30,770
So I'm setting x to that thing.

1181
01:08:30,770 --> 01:08:32,853
And then I'm going to do
all this code, which will

1182
01:08:32,853 --> 01:08:34,970
delete x from that structure.

1183
01:08:34,970 --> 01:08:38,770
And then-- I mean, I'm
doing it all right here.

1184
01:08:38,770 --> 01:08:41,689
But then, I'm going to set
v dot min to be that value.

1185
01:08:41,689 --> 01:08:43,899
So then v dot min
has a new value.

1186
01:08:43,899 --> 01:08:45,840
Because I deleted the old one.

1187
01:08:45,840 --> 01:08:48,250
And it's no longer
recursively stored.

1188
01:08:48,250 --> 01:08:51,109
I don't want two copies
of x floating around.

1189
01:08:51,109 --> 01:08:56,939
So that's why I do, even in this
if case, I do all these steps.

1190
01:08:56,939 --> 01:08:58,540
Cool?

1191
01:08:58,540 --> 01:09:00,409
You can see delete--
is that a question?

1192
01:09:00,409 --> 01:09:02,832
AUDIENCE: [INAUDIBLE]

1193
01:09:02,832 --> 01:09:04,790
PROFESSOR: Oh, why did
I set v dot max to none?

1194
01:09:04,790 --> 01:09:06,373
AUDIENCE: Because
[? that's the ?] all

1195
01:09:06,373 --> 01:09:09,790
[? these ?] [INAUDIBLE]
[? x ?] equals v dot max,

1196
01:09:09,790 --> 01:09:10,790
the last time.

1197
01:09:10,790 --> 01:09:12,200
AUDIENCE: [? Do you ?]
[? find v dot max? ?]

1198
01:09:12,200 --> 01:09:12,830
PROFESSOR: Oh, right.

1199
01:09:12,830 --> 01:09:13,729
I'm not done yet.

1200
01:09:13,729 --> 01:09:15,988
I haven't specified
what to do here.

1201
01:09:15,988 --> 01:09:19,226
OK, you really want to know?

1202
01:09:19,226 --> 01:09:21,140
OK.

1203
01:09:21,140 --> 01:09:24,920
Let's go somewhere else.

1204
01:09:24,920 --> 01:09:27,990
I have enough room, I think.

1205
01:09:27,990 --> 01:09:29,909
Eh, maybe I can squeeze it in.

1206
01:09:29,909 --> 01:09:33,050
It's going to be super compact.

1207
01:09:33,050 --> 01:09:36,470
So, when x equals v dot
max, there are two cases.

1208
01:09:43,672 --> 01:09:44,880
So max is a little different.

1209
01:09:44,880 --> 01:09:47,700
We just need to
keep it up to date.

1210
01:09:47,700 --> 01:09:49,109
So it's not that hard.

1211
01:09:49,109 --> 01:09:51,474
We don't have to do
any recursive magic.

1212
01:10:09,600 --> 01:10:12,240
Well, I need another line.

1213
01:10:12,240 --> 01:10:13,510
Sorry.

1214
01:10:13,510 --> 01:10:15,040
Let me go up to the other board.

1215
01:10:54,066 --> 01:10:56,550
OK, I think that's the
complete delete code.

1216
01:10:56,550 --> 01:10:57,290
You asked for it.

1217
01:10:57,290 --> 01:10:59,150
You've got it.

1218
01:10:59,150 --> 01:11:03,810
So, at this point,
we have just deleted

1219
01:11:03,810 --> 01:11:06,310
the max, which means we
need to find, basically,

1220
01:11:06,310 --> 01:11:07,410
the predecessor of x.

1221
01:11:07,410 --> 01:11:09,910
But we can't afford
a recursive call.

1222
01:11:09,910 --> 01:11:10,660
I mean, that's OK.

1223
01:11:10,660 --> 01:11:13,990
It's just, we're trying to
find the max in what remains.

1224
01:11:13,990 --> 01:11:16,030
Imagine v dot max is just wrong.

1225
01:11:16,030 --> 01:11:17,840
So we've got to set
it from scratch.

1226
01:11:17,840 --> 01:11:19,410
It's not that hard to do.

1227
01:11:19,410 --> 01:11:23,850
Basically, we want to take
the last non-empty structure.

1228
01:11:23,850 --> 01:11:26,430
That would be v dot
summary dot max,

1229
01:11:26,430 --> 01:11:30,410
and then find the last
item in that cluster.

1230
01:11:30,410 --> 01:11:34,030
OK, so cluster i is the
last one for v dot summary.

1231
01:11:34,030 --> 01:11:36,950
And then we look at v dot
cluster of i dot max.

1232
01:11:36,950 --> 01:11:38,190
And we combine it with i.

1233
01:11:38,190 --> 01:11:42,670
That gives us the name of
that item in the last cluster,

1234
01:11:42,670 --> 01:11:44,410
the last non-empty cluster.

1235
01:11:44,410 --> 01:11:46,210
But there's a
special case, which

1236
01:11:46,210 --> 01:11:48,640
is maybe this returns none.

1237
01:11:48,640 --> 01:11:52,340
Maybe there actually is
nothing in v dot summary.

1238
01:11:52,340 --> 01:11:55,110
That means we just deleted
the last item, I guess.

1239
01:11:55,110 --> 01:11:57,240
Or there's only one left.

1240
01:11:57,240 --> 01:11:59,450
We deleted the
next to last item.

1241
01:11:59,450 --> 01:12:02,190
Now there's only one item
left, namely v dot min.

1242
01:12:02,190 --> 01:12:04,790
So we set v dot max
equal to v dot min.

1243
01:12:04,790 --> 01:12:06,480
So that's a special case.

1244
01:12:06,480 --> 01:12:08,170
But most of the time,
you're just doing

1245
01:12:08,170 --> 01:12:11,089
a couple dot max's,
and you're done.

1246
01:12:11,089 --> 01:12:12,630
So that's how you
maintain the maxes,

1247
01:12:12,630 --> 01:12:14,080
even when you're deleting.
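
Putting the pieces of delete together, a Python sketch might look like this. The class name `VEB`, the `bits` parameter, the `index` and `contains` helpers, the eager cluster allocation, and the two-element base case are assumptions for illustration. The sketch also assumes that delete is only called on an item that is currently stored.

```python
class VEB:
    """Sketch of a van Emde Boas node over the universe {0, ..., 2**bits - 1}."""

    def __init__(self, bits):
        self.bits, self.min, self.max = bits, None, None
        if bits > 1:
            self.lo_bits = bits // 2
            hi_bits = bits - self.lo_bits
            self.summary = VEB(hi_bits)
            self.cluster = [VEB(self.lo_bits) for _ in range(1 << hi_bits)]

    def high(self, x):
        return x >> self.lo_bits

    def low(self, x):
        return x & ((1 << self.lo_bits) - 1)

    def index(self, i, j):
        return (i << self.lo_bits) | j

    def contains(self, x):                     # helper for checking, not from the lecture
        if self.min is None:
            return False
        if x == self.min or x == self.max:
            return True
        if self.bits == 1:
            return False
        return self.cluster[self.high(x)].contains(self.low(x))

    def insert(self, x):
        if self.min is None:
            self.min = self.max = x
            return
        if x == self.min:                      # assumption: duplicates are ignored
            return
        if x < self.min:
            self.min, x = x, self.min
        if x > self.max:
            self.max = x
        if self.bits == 1:
            return
        c = self.cluster[self.high(x)]
        if c.min is None:
            self.summary.insert(self.high(x))
        c.insert(self.low(x))

    def delete(self, x):
        """Remove x; this sketch assumes x is currently stored."""
        if self.bits == 1:
            # Universe {0, 1}: either x was the only item, or the other one remains.
            self.min = self.max = None if self.min == self.max else 1 - x
            return
        if x == self.min:
            i = self.summary.min
            if i is None:                      # x was the last item: constant time
                self.min = self.max = None
                return
            # Pull the smallest recursively stored item up into the min slot.
            x = self.min = self.index(i, self.cluster[i].min)
        c = self.cluster[self.high(x)]
        c.delete(self.low(x))
        if c.min is None:                      # the delete above hit the cheap case
            self.summary.delete(self.high(x))
        if x == self.max:                      # recompute max from what remains
            i = self.summary.max
            if i is None:
                self.max = self.min            # only the min is left
            else:
                self.max = self.index(i, self.cluster[i].max)
```

When the cluster becomes empty, the recursive cluster delete was the constant-time last-item case, so the summary delete is the only real recursion.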

1248
01:12:14,080 --> 01:12:16,807
And unless I made an error,
I think all these algorithms

1249
01:12:16,807 --> 01:12:17,390
work together.

1250
01:12:17,390 --> 01:12:19,590
You can do insert,
delete, and successor.

1251
01:12:19,590 --> 01:12:23,000
And symmetrically, you can
do predecessor in log log u

1252
01:12:23,000 --> 01:12:25,220
time per operation, super fast.

1253
01:12:28,340 --> 01:12:31,160
Let me tell you a
couple other things.

1254
01:12:31,160 --> 01:12:34,110
One is, there's a
matching lower bound.

1255
01:12:34,110 --> 01:12:36,230
Log log-- maybe
you wonder, can I

1256
01:12:36,230 --> 01:12:41,300
get log log log time, log log
log log time, or whatever?

1257
01:12:41,300 --> 01:12:43,180
No.

1258
01:12:43,180 --> 01:12:46,274
In most reasonable
choices of parameters--

1259
01:12:46,274 --> 01:12:48,190
it's a little bit more
complicated than this--

1260
01:12:48,190 --> 01:12:50,940
but for most of the time
that you care about,

1261
01:12:50,940 --> 01:12:54,330
log log u is the right answer.

1262
01:12:54,330 --> 01:12:55,770
This was proved in 2007.

1263
01:12:55,770 --> 01:12:59,536
So it took us decades
to really understand.

1264
01:12:59,536 --> 01:13:02,665
It's by a former MIT student.

1265
01:13:06,250 --> 01:13:18,299
So I'll give you some
range where it holds,

1266
01:13:18,299 --> 01:13:19,590
which will raise another issue.

1267
01:13:19,590 --> 01:13:26,300
But, OK.

1268
01:13:26,300 --> 01:13:28,910
So this range is the range
I talked about before.

1269
01:13:28,910 --> 01:13:30,891
This is when log log
u equals log log n.

1270
01:13:30,891 --> 01:13:33,390
So that's kind of the case where
you care about applying it.

1271
01:13:33,390 --> 01:13:37,090
If log log u is more like log
n, it's not so interesting.

1272
01:13:37,090 --> 01:13:39,230
But as long as u is
not too big, this

1273
01:13:39,230 --> 01:13:42,100
is a little bit bigger
than polynomial in n.

1274
01:13:42,100 --> 01:13:45,486
Then this is the right answer.

1275
01:13:45,486 --> 01:13:47,360
Now technically, you
need another assumption,

1276
01:13:47,360 --> 01:13:49,068
which is the space of
your data structure

1277
01:13:49,068 --> 01:13:50,782
is not too superlinear.

1278
01:13:50,782 --> 01:13:51,990
Now this is a little awkward.

1279
01:13:51,990 --> 01:13:54,545
Because the space of
this data structure

1280
01:13:54,545 --> 01:13:59,120
is actually order u, not n.

1281
01:13:59,120 --> 01:14:00,766
So the last issue is space.

1282
01:14:06,140 --> 01:14:07,430
Space is order u.

1283
01:14:07,430 --> 01:14:11,290
Let me go back to this
binary tree picture.

1284
01:14:11,290 --> 01:14:13,360
So we had the idea
of, well, there's

1285
01:14:13,360 --> 01:14:15,640
all these bits at the bottom.

1286
01:14:15,640 --> 01:14:18,860
We're building a big
binary tree above those.

1287
01:14:18,860 --> 01:14:20,890
The leaves are the actual data.

1288
01:14:20,890 --> 01:14:23,410
And then we're summarizing,
by for every node,

1289
01:14:23,410 --> 01:14:25,462
we're writing the or of
the two nodes below it,

1290
01:14:25,462 --> 01:14:27,670
which is summarizing whether
that thing is non-empty.

1291
01:14:32,230 --> 01:14:34,400
What van Emde Boas is
doing-- so first of all,

1292
01:14:34,400 --> 01:14:37,900
you see that the total number of
nodes in this tree is order u.

1293
01:14:37,900 --> 01:14:39,521
Because there's u leaves.

1294
01:14:39,521 --> 01:14:41,395
The total size of a
binary tree with u leaves

1295
01:14:41,395 --> 01:14:44,112
is order u, 2u minus 1, right?

1296
01:14:46,630 --> 01:14:49,300
And you can kind of see what
van Emde Boas is doing here.

1297
01:14:49,300 --> 01:14:52,637
First, it's thinking
about the middle level.

1298
01:14:52,637 --> 01:14:54,470
Now it's not directly
looking at these bits.

1299
01:14:54,470 --> 01:14:57,940
It says, hey look,
I know my item,

1300
01:14:57,940 --> 01:15:01,430
the thing I'm doing a successor
of, let's say, is three.

1301
01:15:01,430 --> 01:15:03,580
I want to know the
successor of this position.

1302
01:15:03,580 --> 01:15:08,160
First, I want to check, should
I recurse in this block,

1303
01:15:08,160 --> 01:15:10,880
or should I recurse
in the summary

1304
01:15:10,880 --> 01:15:13,010
block-- which I didn't draw.

1305
01:15:13,010 --> 01:15:16,340
But it's the part of the
tree that would be up here.

1306
01:15:16,340 --> 01:15:23,280
And that's exactly what
we're doing with successor.

1307
01:15:23,280 --> 01:15:25,842
Should we recursively
look within cluster i?

1308
01:15:25,842 --> 01:15:27,800
Or should we look within
the summary structure?

1309
01:15:27,800 --> 01:15:29,790
We only do one or the other.

1310
01:15:29,790 --> 01:15:32,190
And that's the sense in
which we are binary searching

1311
01:15:32,190 --> 01:15:33,750
on the levels of this tree.

1312
01:15:33,750 --> 01:15:36,602
Either we will spend all of
our work recursively looking

1313
01:15:36,602 --> 01:15:38,810
for the successor within
the summary structure, which

1314
01:15:38,810 --> 01:15:42,960
is like finding the next 1 bit
in this row, the middle row,

1315
01:15:42,960 --> 01:15:46,749
or we will spend all of our
time doing successor in here.

1316
01:15:46,749 --> 01:15:47,540
And we can do that.

1317
01:15:47,540 --> 01:15:49,360
Because we have
the max augmented.

1318
01:15:49,360 --> 01:15:52,424
OK, but that's the
sense in which, kind of,

1319
01:15:52,424 --> 01:15:54,590
you are binary searching
in the levels of this tree.

1320
01:15:54,590 --> 01:15:57,870
So that early
intuition for van Emde Boas

1321
01:15:57,870 --> 01:15:59,800
is kind of what we're doing.

1322
01:15:59,800 --> 01:16:04,970
The trouble is, to store that
tree takes order u space.

1323
01:16:04,970 --> 01:16:07,920
We'd really like to
spend order n space.

1324
01:16:07,920 --> 01:16:09,680
And I have four minutes.

1325
01:16:09,680 --> 01:16:14,150
So you'll see part of
the answer to this.

1326
01:16:17,132 --> 01:16:18,623
My poor microphone.

1327
01:16:23,600 --> 01:16:26,319
Let me give you an idea of
how to fix the space bound.

1328
01:16:26,319 --> 01:16:27,485
Let's erase some algorithms.

1329
01:16:41,880 --> 01:16:50,910
The main idea here is to only
store non-empty clusters,

1330
01:16:50,910 --> 01:16:52,160
pretty simple idea.

1331
01:16:54,930 --> 01:16:58,180
We want to spend space
only for the present items,

1332
01:16:58,180 --> 01:16:59,200
not for the absent ones.

1333
01:16:59,200 --> 01:17:01,940
So don't store the absent ones.

1334
01:17:01,940 --> 01:17:04,630
In particular, we're
doing all this work around

1335
01:17:04,630 --> 01:17:07,560
when clusters are
empty, in which case

1336
01:17:07,560 --> 01:17:10,270
we can see that just by
looking at the min item,

1337
01:17:10,270 --> 01:17:11,360
or when they're non-empty.

1338
01:17:11,360 --> 01:17:13,380
So let's just store
the non-empty ones.

1339
01:17:13,380 --> 01:17:17,170
That will get you down to
almost order n space, not quite,

1340
01:17:17,170 --> 01:17:19,070
but close.

1341
01:17:19,070 --> 01:17:24,370
To do this, v dot cluster
is no longer an array.

1342
01:17:24,370 --> 01:17:29,460
Just make it a hash table,
a dictionary in Python.

1343
01:17:29,460 --> 01:17:33,460
So v dot cluster--
we were always

1344
01:17:33,460 --> 01:17:34,750
doing v dot cluster of i.

1345
01:17:34,750 --> 01:17:37,020
Just make that into a dictionary
instead of an array.

1346
01:17:37,020 --> 01:17:38,360
And you save most of the space.

1347
01:17:38,360 --> 01:17:40,795
You only have to store
the non-empty clusters.
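[Editor's sketch, not from the lecture: a minimal van Emde Boas structure whose cluster is a Python dict, so only non-empty clusters take space. It assumes keys are w-bit integers (u = 2**w), keeps the min out of the recursive structures as in the lecture, and creates the summary lazily. The class name VEB and the field names are illustrative.]
```python
class VEB:
    def __init__(self, w):
        self.w = w                      # keys are w-bit integers, so u = 2**w
        self.min = None                 # the min is not stored recursively
        self.max = None
        if w > 1:
            self.lo_bits = w // 2
            self.hi_bits = w - self.lo_bits
            self.cluster = {}           # hash table: non-empty clusters only
            self.summary = None         # created when the first cluster appears
    def high(self, x):
        return x >> self.lo_bits
    def low(self, x):
        return x & ((1 << self.lo_bits) - 1)
    def insert(self, x):
        if self.min is None:            # empty structure: constant work
            self.min = self.max = x
            return
        if x == self.min:               # already present
            return
        if x < self.min:                # new min; push the old min down instead
            x, self.min = self.min, x
        if x > self.max:
            self.max = x
        if self.w <= 1:                 # base case: min and max describe everything
            return
        i = self.high(x)
        if i not in self.cluster:       # cluster goes from empty to non-empty
            self.cluster[i] = VEB(self.lo_bits)
            if self.summary is None:
                self.summary = VEB(self.hi_bits)
            self.summary.insert(i)
        self.cluster[i].insert(self.low(x))
    def successor(self, x):
        if self.min is None or x >= self.max:
            return None
        if x < self.min:
            return self.min
        if self.w <= 1:                 # here min <= x < max
            return self.max
        i = self.high(x)
        c = self.cluster.get(i)
        if c is not None and self.low(x) < c.max:
            j = c.successor(self.low(x))      # recurse within cluster i only
        else:
            i = self.summary.successor(i)     # recurse within the summary only
            j = self.cluster[i].min
        return (i << self.lo_bits) | j
v = VEB(4)                              # u = 16
for key in [1, 9, 10, 15]:
    v.insert(key)
```
[Note that successor makes one recursive call, either into cluster i or into the summary, never both.]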

1348
01:17:47,260 --> 01:17:50,030
And you should know from
006 that hash table operations take constant

1349
01:17:50,030 --> 01:17:51,320
expected time.

1350
01:17:51,320 --> 01:17:54,870
We'll prove that formally
in lecture eight, I think.

1351
01:17:54,870 --> 01:17:58,700
But for now, take
hashing as given.

1352
01:17:58,700 --> 01:18:00,950
Everything we did before is
essentially the same cost,

1353
01:18:00,950 --> 01:18:04,420
but in expectation,
no longer worst case.

1354
01:18:04,420 --> 01:18:07,140
But now the space goes way down.

1355
01:18:07,140 --> 01:18:12,290
Because if you look at an
item, when you insert an item,

1356
01:18:12,290 --> 01:18:15,620
it sort of goes to log
log u different places,

1357
01:18:15,620 --> 01:18:16,910
in the worst case.

1358
01:18:16,910 --> 01:18:21,800
But, yeah.

1359
01:18:21,800 --> 01:18:28,740
We end up with n log log u
space, which is pretty good,

1360
01:18:28,740 --> 01:18:31,077
almost linear space.

1361
01:18:31,077 --> 01:18:33,160
It's a little tricky to
see why you get log log u.

1362
01:18:33,160 --> 01:18:38,330
But I guess if you look
at the insert algorithm,

1363
01:18:38,330 --> 01:18:41,710
even though we had two recursive
calls in the worst case,

1364
01:18:41,710 --> 01:18:43,430
one of them was free.

1365
01:18:43,430 --> 01:18:45,510
When we do both of
them, we insert here.

1366
01:18:45,510 --> 01:18:47,506
This one happens to be free.

1367
01:18:47,506 --> 01:18:48,380
Because it was empty.

1368
01:18:48,380 --> 01:18:49,990
But we still pay for it in space.

1369
01:18:49,990 --> 01:18:52,310
We set v dot min equal to x.

1370
01:18:52,310 --> 01:18:55,110
And so that structure went
from empty to non-empty.

1371
01:18:55,110 --> 01:18:57,880
So this costs 1.

1372
01:18:57,880 --> 01:19:00,785
And then we recursively
call insert v dot summary

1373
01:19:00,785 --> 01:19:02,300
on high of x.

1374
01:19:02,300 --> 01:19:05,760
So we might, when we insert
one item x, if lots of things

1375
01:19:05,760 --> 01:19:10,350
were empty, actually log log
u structures become non-empty,

1376
01:19:10,350 --> 01:19:13,180
and that's why you pay log log
u for each item you insert.
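[Editor's sketch, not from the lecture: a rough way to see where log log u comes from. Each level of recursion halves the number of bits in the key, so a w-bit universe (u = 2**w) has about log w = log log u levels, and one insert can turn one structure non-empty per level. The function name recursion_levels is made up for illustration.]
```python
def recursion_levels(w):
    """Count how many times w bits can be halved (rounding up) before reaching 1 bit."""
    levels = 0
    while w > 1:
        w = (w + 1) // 2    # the larger half, as for the summary structure
        levels += 1
    return levels
levels_for_64_bit_keys = recursion_levels(64)   # log2(64) halvings
```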

1377
01:19:13,180 --> 01:19:14,880
It's kind of annoying.

1378
01:19:14,880 --> 01:19:16,760
There is a fix,
which is in my notes.

1379
01:19:16,760 --> 01:19:21,960
You can read it, for reducing
this further to order n.

1380
01:19:21,960 --> 01:19:25,500
But, OK, I have 30
seconds to explain it.

1381
01:19:25,500 --> 01:19:28,240
The idea is-- you're not
responsible for knowing it.

1382
01:19:28,240 --> 01:19:29,800
This is just in
case you're curious.

1383
01:19:32,500 --> 01:19:35,200
The idea is, instead of
going all the way down

1384
01:19:35,200 --> 01:19:37,790
in the recursion,
at the very bottom,

1385
01:19:37,790 --> 01:19:39,970
you say, well,
normally you'd stop

1386
01:19:39,970 --> 01:19:41,890
the recursion when
you have u equals

1387
01:19:41,890 --> 01:19:50,130
1, just stop the recursion
when n is very small,

1388
01:19:50,130 --> 01:19:52,810
like log log u.

1389
01:19:52,810 --> 01:19:55,430
When I'm only storing
log log u items,

1390
01:19:55,430 --> 01:19:56,514
put them in a linked list.

1391
01:19:56,514 --> 01:19:57,054
I don't care.

1392
01:19:57,054 --> 01:19:59,040
You can do whatever you
want on log log u items

1393
01:19:59,040 --> 01:20:00,690
in log log u time.

1394
01:20:00,690 --> 01:20:02,120
It's just a tiny tweak.

1395
01:20:02,120 --> 01:20:05,720
But it turns out, it gets rid
of that log log u in the space.

1396
01:20:05,720 --> 01:20:07,317
So it's a little bit messier.

1397
01:20:07,317 --> 01:20:09,650
And I don't know if you'd
want to implement it that way.

1398
01:20:09,650 --> 01:20:11,910
But you can reduce
to linear space.

1399
01:20:11,910 --> 01:20:13,738
And that's van Emde Boas.