1
00:00:00,090 --> 00:00:02,490
The following content is
provided under a Creative

2
00:00:02,490 --> 00:00:04,030
Commons license.

3
00:00:04,030 --> 00:00:06,360
Your support will help
MIT OpenCourseWare

4
00:00:06,360 --> 00:00:10,720
continue to offer high quality,
educational resources for free.

5
00:00:10,720 --> 00:00:13,320
To make a donation, or
view additional materials

6
00:00:13,320 --> 00:00:17,280
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,280 --> 00:00:18,450
at ocw.mit.edu.

8
00:00:21,480 --> 00:00:24,870
ERIK DEMAINE: All right,
welcome to my last lecture

9
00:00:24,870 --> 00:00:26,880
for the semester.

10
00:00:26,880 --> 00:00:30,090
We finish our coverage
of dynamic graphs,

11
00:00:30,090 --> 00:00:32,439
and also our coverage
of lower bounds.

12
00:00:32,439 --> 00:00:35,940
We saw one big lower
bound in this class

13
00:00:35,940 --> 00:00:37,502
in the cell probe model.

14
00:00:37,502 --> 00:00:39,210
You may recall the cell
probe model, you just

15
00:00:39,210 --> 00:00:44,500
count how many cells
of memory you touch.

16
00:00:44,500 --> 00:00:47,680
You want to prove a
lower bound on that.

17
00:00:47,680 --> 00:00:50,180
And today we're going to prove
a cell probe lower bound

18
00:00:50,180 --> 00:00:54,570
on dynamic connectivity,
which is a problem we've

19
00:00:54,570 --> 00:00:57,030
solved a few different times.

20
00:00:57,030 --> 00:00:58,920
Our lower bound
will apply even when

21
00:00:58,920 --> 00:01:03,920
each of the connected components
of your graph is just a path.

22
00:01:03,920 --> 00:01:06,600
And so in particular, it
implies matching lower bounds

23
00:01:06,600 --> 00:01:07,665
for dynamic trees.

24
00:01:14,760 --> 00:01:18,120
So here is the theorem
we'll be proving today.

25
00:01:18,120 --> 00:01:27,270
You want to insert
and delete edges,

26
00:01:27,270 --> 00:01:35,970
and do connectivity queries
between pairs of vertices, vw.

27
00:01:35,970 --> 00:01:39,720
I want to know is there
a path from v to w,

28
00:01:39,720 --> 00:01:41,490
just like we've
been considering.

29
00:01:44,370 --> 00:01:52,320
These require Omega(log n)
time per operation.

30
00:01:55,590 --> 00:01:57,990
That is, the max of the
update and query times has

31
00:01:57,990 --> 00:02:01,860
to be at least log n
time per operation,

32
00:02:01,860 --> 00:02:16,170
even if the connected
components are paths, and even

33
00:02:16,170 --> 00:02:19,958
amortized, and even randomized.

34
00:02:19,958 --> 00:02:23,700
Although I'm not going to
prove all of these versions;

35
00:02:23,700 --> 00:02:26,130
I won't prove the
amortized version.

36
00:02:26,130 --> 00:02:29,280
I'm going to prove a
worst-case log n lower bound;

37
00:02:29,280 --> 00:02:33,930
it's just a little more work to
prove an amortized lower bound.

38
00:02:33,930 --> 00:02:36,460
But same principles.

39
00:02:36,460 --> 00:02:41,500
And so that's going to be
today, is proving this theorem.

40
00:02:41,500 --> 00:02:44,640
It's not a short
proof, but it combines

41
00:02:44,640 --> 00:02:47,580
a bunch of relatively
simple ideas,

42
00:02:47,580 --> 00:02:51,060
and ends up being pretty
clean overall, piece-by-piece,

43
00:02:51,060 --> 00:02:54,810
but there's just a bunch of
pieces, as we will get to.

44
00:02:54,810 --> 00:02:57,600
Key concept is an
idea introduced

45
00:02:57,600 --> 00:03:01,140
in this paper, which is
to build a balanced binary

46
00:03:01,140 --> 00:03:05,190
tree over time, over
your access sequence.

47
00:03:05,190 --> 00:03:09,600
And argue about different
subtrees within that tree.

48
00:03:09,600 --> 00:03:14,520
This is a paper that maybe
came out of this class,

49
00:03:14,520 --> 00:03:19,500
in some sense, it was by
Mihai Patrascu and myself.

50
00:03:19,500 --> 00:03:21,510
Back when Mihai
was an undergrad,

51
00:03:21,510 --> 00:03:23,966
I think he'd just
taken this class.

52
00:03:23,966 --> 00:03:25,590
But at that point
the class didn't even

53
00:03:25,590 --> 00:03:26,923
cover dynamic connectivity, so--

54
00:03:30,040 --> 00:03:32,535
and time here is cell probes.

55
00:03:36,300 --> 00:03:39,050
So this is a very
strong model, it

56
00:03:39,050 --> 00:03:41,570
implies a lower bound on the RAM,
and implies a lower bound

57
00:03:41,570 --> 00:03:42,840
on pointer machine.

58
00:03:42,840 --> 00:03:46,380
We know matching
upper bounds for trees

59
00:03:46,380 --> 00:03:48,990
on a pointer machine
link/cut trees

60
00:03:48,990 --> 00:03:50,560
and Euler tour trees.

61
00:03:50,560 --> 00:03:53,790
It's kind of fun that this lower
bound even applies to paths,

62
00:03:53,790 --> 00:03:55,740
because most of the
work in link/cut trees

63
00:03:55,740 --> 00:03:57,914
is about decomposing
your tree into paths.

64
00:03:57,914 --> 00:04:00,330
And so what this is saying is
even if that's done for you,

65
00:04:00,330 --> 00:04:03,720
and you just need to be able
to take paths, and concatenate

66
00:04:03,720 --> 00:04:10,380
them together by adding edges,
then maintaining the find root

67
00:04:10,380 --> 00:04:13,080
property so that you can
do connectivity queries,

68
00:04:13,080 --> 00:04:15,550
even that requires log n time.

69
00:04:15,550 --> 00:04:19,350
So converting a tree into
a path is basically free,

70
00:04:19,350 --> 00:04:23,010
the hard part is
maintaining the paths.

71
00:04:23,010 --> 00:04:25,830
So let's prove a theorem.

72
00:04:28,590 --> 00:04:30,030
The lower bound,
we get to choose

73
00:04:30,030 --> 00:04:32,550
what access sequence
we think is bad.

74
00:04:32,550 --> 00:04:37,170
And so we're going to come
up with a particular style

75
00:04:37,170 --> 00:04:42,703
of graph, which looks
like the following.

76
00:04:54,050 --> 00:04:55,190
Graph is going to be--

77
00:04:55,190 --> 00:04:57,530
the vertices are going to
be a root n by root n grid.

78
00:05:01,700 --> 00:05:03,950
And we're going to--

79
00:05:03,950 --> 00:05:06,300
these guys are in,
what did I call them?

80
00:05:06,300 --> 00:05:07,280
Groups?

81
00:05:07,280 --> 00:05:08,340
Columns.

82
00:05:08,340 --> 00:05:10,160
Columns is a good name.

83
00:05:10,160 --> 00:05:15,620
These are columns of
the matrix of vertices.

84
00:05:15,620 --> 00:05:19,760
And what I'd like to have is
between consecutive columns,

85
00:05:19,760 --> 00:05:21,140
I want to have a
perfect matching

86
00:05:21,140 --> 00:05:23,232
between these vertices.

87
00:05:23,232 --> 00:05:28,580
So could be, I don't know, this
edge, this edge, this edge,

88
00:05:28,580 --> 00:05:29,554
and that edge.

89
00:05:29,554 --> 00:05:31,970
And I also want a perfect matching
between these two columns,

90
00:05:31,970 --> 00:05:39,950
so maybe this one, this
one, this one, this one.

91
00:05:39,950 --> 00:05:46,340
And you can have some
boring things too.

92
00:05:46,340 --> 00:05:47,180
Something like that.

93
00:05:47,180 --> 00:05:53,815
So between every pair of consecutive
columns there is a perfect matching,

94
00:05:53,815 --> 00:05:54,815
meaning perfect pairing.

95
00:05:58,910 --> 00:06:01,370
OK, this of course results
in a collection of paths,

96
00:06:01,370 --> 00:06:02,660
square root of n paths.

97
00:06:02,660 --> 00:06:04,970
You can start at any
vertex on the left,

98
00:06:04,970 --> 00:06:07,420
and you'll have a unique
way to go to the right.

99
00:06:07,420 --> 00:06:13,430
And so that's path 1, this
is path 2, this is path 3,

100
00:06:13,430 --> 00:06:16,070
and this is path 4.

101
00:06:16,070 --> 00:06:20,720
And so if I-- an
interesting query.

102
00:06:20,720 --> 00:06:23,450
Well, an interesting query is
something like I want to know,

103
00:06:23,450 --> 00:06:26,877
is this vertex
connected to this one?

104
00:06:26,877 --> 00:06:28,460
And it's not so easy
to figure it out,

105
00:06:28,460 --> 00:06:31,490
because you have to sort
of walk through this path

106
00:06:31,490 --> 00:06:32,349
to figure that out.

107
00:06:32,349 --> 00:06:34,640
We're going to think of each
of these perfect matchings

108
00:06:34,640 --> 00:06:41,060
as defining a permutation on the
vertices, on the column really.

109
00:06:41,060 --> 00:06:43,940
So you start with the
identity permutation,

110
00:06:43,940 --> 00:06:46,480
and then some things get
swapped around, that's pi 1.

111
00:06:46,480 --> 00:06:48,750
Something gets swapped
around again, that's pi 2.

112
00:06:48,750 --> 00:06:51,170
Some things get swapped
around here, that's pi 3.

113
00:06:51,170 --> 00:06:57,500
And then this position would
be pi 3 of pi 2 of pi 1

114
00:06:57,500 --> 00:07:01,940
of vertex 4.

115
00:07:01,940 --> 00:07:05,960
We call this vertex 4.

116
00:07:05,960 --> 00:07:09,200
Or row, row 4.

117
00:07:09,200 --> 00:07:11,990
So in some sense, we have
to compose permutations.

118
00:07:11,990 --> 00:07:20,132
I'll call this pi 1, circle
pi 2, circle pi 3 of 4.
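To make the "compose the matchings" view concrete, here is a minimal sketch, assuming rows are 0-indexed (the lecture numbers them from 1) and that each matching is stored as a list `perm` where `perm[r]` is the row in the next column matched to row `r`. The names `follow_path` and `compose` are illustrative, not from the lecture.

```python
# Sketch: rows are 0-indexed here (the lecture numbers them from 1).
# perms[j][r] is the row in column j + 1 matched to row r in column j.

def follow_path(perms, row):
    """Walk one path left to right: returns pi_k(...pi_2(pi_1(row)))."""
    for perm in perms:
        row = perm[row]
    return row


def compose(perms):
    """Return the composed permutation as a list, applying perms[0] first."""
    size = len(perms[0])
    return [follow_path(perms, r) for r in range(size)]


if __name__ == "__main__":
    perms = [[1, 0, 2, 3], [0, 2, 1, 3], [3, 1, 2, 0]]
    print(compose(perms))  # prints [2, 3, 1, 0]
```

Each entry of the composed permutation is the far endpoint of one path, so knowing the composition is the same as knowing which left vertex connects to which right vertex.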

119
00:07:20,132 --> 00:07:21,590
And we're going to
show, basically,

120
00:07:21,590 --> 00:07:24,470
composing permutations is
tough when you can change

121
00:07:24,470 --> 00:07:26,630
those permutations dynamically.

122
00:07:26,630 --> 00:07:28,340
So what we're going
to do is a series

123
00:07:28,340 --> 00:07:37,130
of block operations,
which change or query

124
00:07:37,130 --> 00:07:39,660
entire permutations.

125
00:07:39,660 --> 00:07:45,470
So here, an update is going to
be a whole bunch of insertions

126
00:07:45,470 --> 00:07:48,230
and deletions of edges.

127
00:07:48,230 --> 00:07:53,966
Basically, what we want to
do is set pi i equal to pi.

128
00:07:53,966 --> 00:07:57,380
So that's what update
of i comma pi does.

129
00:07:57,380 --> 00:08:00,110
It changes an entire
perfect matching

130
00:08:00,110 --> 00:08:02,914
to be a specified permutation.

131
00:08:02,914 --> 00:08:03,830
So how do you do that?

132
00:08:03,830 --> 00:08:05,300
Well, you delete
all the edges that

133
00:08:05,300 --> 00:08:06,841
are in the existing
permutation, then

134
00:08:06,841 --> 00:08:08,940
you insert all the new edges.

135
00:08:08,940 --> 00:08:15,200
So this can be done in square
root of n edge deletions

136
00:08:15,200 --> 00:08:15,980
and insertions.
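A minimal sketch of how one bulk update(i, pi) decomposes into ordinary edge operations, assuming a hypothetical edge-set representation where vertex `(c, r)` is row `r` of column `c` and matching `i` joins column `i` to column `i + 1`. The helper names are illustrative only.

```python
# Hypothetical representation: vertex (c, r) is row r of column c, and
# matching i joins column i to column i + 1 (columns and rows 0-indexed).

def build_edges(perms):
    """Edge set of the grid graph described by the list of matchings."""
    return {((i, r), (i + 1, s)) for i, p in enumerate(perms) for r, s in enumerate(p)}


def block_update(edges, perms, i, new_perm):
    """update(i, new_perm): delete matching i edge by edge, then insert the
    new matching edge by edge. Returns the number of primitive edge operations."""
    ops = 0
    for r, s in enumerate(perms[i]):
        edges.remove(((i, r), (i + 1, s)))  # one edge deletion
        ops += 1
    for r, s in enumerate(new_perm):
        edges.add(((i, r), (i + 1, s)))  # one edge insertion
        ops += 1
    perms[i] = list(new_perm)
    return ops
```

With k rows per column (k = root n), one block update costs exactly 2k primitive operations: k deletions followed by k insertions.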

137
00:08:23,790 --> 00:08:26,460
So it's a bulk update of
square root of n operations.

138
00:08:26,460 --> 00:08:28,920
And so this could only
make our problem easier,

139
00:08:28,920 --> 00:08:30,980
because we're given
square root of n updates

140
00:08:30,980 --> 00:08:33,360
that we all need to do at once.

141
00:08:33,360 --> 00:08:35,850
So you could amortize
over them, you

142
00:08:35,850 --> 00:08:38,520
could do lots of
different things,

143
00:08:38,520 --> 00:08:42,059
but we'll see that it won't help.

144
00:08:42,059 --> 00:08:45,300
And then we have a
query, and the query

145
00:08:45,300 --> 00:08:49,609
is going to be a
little bit weird,

146
00:08:49,609 --> 00:08:51,900
and it's also going to make
the proof a little bit more

147
00:08:51,900 --> 00:08:53,400
awkward.

148
00:08:53,400 --> 00:09:03,570
But what it asks is if I look at
the composition of pi j, for j from 1

149
00:09:03,570 --> 00:09:05,630
up to i.

150
00:09:05,630 --> 00:09:07,622
This is 1.

151
00:09:07,622 --> 00:09:12,660
So I want to know is that
composition equal to pi?

152
00:09:12,660 --> 00:09:13,680
Yes or no?

153
00:09:13,680 --> 00:09:18,720
This is what I'll call verify
sum, sum meaning composition.

154
00:09:18,720 --> 00:09:21,210
But the sum terminology comes
from a different problem,

155
00:09:21,210 --> 00:09:22,890
which we won't
talk about directly

156
00:09:22,890 --> 00:09:25,140
here, called partial sums.

157
00:09:25,140 --> 00:09:27,240
Partial sums is
basically this problem,

158
00:09:27,240 --> 00:09:29,580
you can change
numbers in an array,

159
00:09:29,580 --> 00:09:33,440
and you can compute the
prefix sum from 1 up to i.

160
00:09:33,440 --> 00:09:36,780
Here we're not computing it.

161
00:09:36,780 --> 00:09:37,950
Why are we not computing it?

162
00:09:37,950 --> 00:09:41,370
Because actually figuring out
what pi 3 of pi 2 of pi 1

163
00:09:41,370 --> 00:09:45,330
is of something is
tricky in this setting.

164
00:09:45,330 --> 00:09:48,120
The operations we're given
are, given two vertices,

165
00:09:48,120 --> 00:09:51,510
are they connected by a path?

166
00:09:51,510 --> 00:09:56,989
So to figure out the other end
of this path, that requires--

167
00:09:56,989 --> 00:09:59,280
I mean, it's hard to figure
out where the other end is.

168
00:09:59,280 --> 00:10:02,010
If I told you is it this one?

169
00:10:02,010 --> 00:10:05,280
Then I can answer that question
with just a connectivity query.

170
00:10:05,280 --> 00:10:09,490
So verify sum can be done
with order square root

171
00:10:09,490 --> 00:10:12,715
of n connectivity queries.

172
00:10:16,530 --> 00:10:22,770
Whereas computing the sum could
not be, as far as we know.

173
00:10:22,770 --> 00:10:25,980
If I tell you what that
composition is supposed to be,

174
00:10:25,980 --> 00:10:29,280
I can check does 4 go to 1?

175
00:10:29,280 --> 00:10:30,610
Yes or no?

176
00:10:30,610 --> 00:10:32,670
Does 3 go to 3?

177
00:10:32,670 --> 00:10:33,370
Yes or no?

178
00:10:33,370 --> 00:10:35,520
Does 2 go to 4?

179
00:10:35,520 --> 00:10:36,110
Yes or no?

180
00:10:36,110 --> 00:10:37,780
Does 1 go to 2?

181
00:10:37,780 --> 00:10:38,490
Yes or no?

182
00:10:38,490 --> 00:10:41,910
So with 4 queries, I can
check whether the permutation

183
00:10:41,910 --> 00:10:44,790
is what it is.

184
00:10:44,790 --> 00:10:47,060
If any of those fail,
then I return no.
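The reduction from verify sum to connectivity queries can be sketched as follows. This assumes the same hypothetical edge-set representation as a grid of `(column, row)` vertices, and uses a naive BFS `connected` function purely as a stand-in for a real dynamic connectivity structure; it is not the lecture's data structure.

```python
from collections import defaultdict, deque


def connected(edges, u, v):
    """Naive stand-in for a dynamic connectivity query: BFS over an undirected edge set."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def verify_sum(edges, i, candidate):
    """Check whether matchings 0..i compose to `candidate`, using one
    connectivity query per row; all() stops at the first "no"."""
    return all(connected(edges, (0, r), (i + 1, candidate[r]))
               for r in range(len(candidate)))
```

Because every component is a path spanning all columns, row `r` of column 0 is connected to exactly one vertex of column `i + 1`, so k queries (k = root n) suffice to confirm or refute a proposed composition.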

185
00:10:49,890 --> 00:10:53,400
So the way this proceeds is
first we proved a lower bound

186
00:10:53,400 --> 00:10:56,880
on partial sums,
which is computing

187
00:10:56,880 --> 00:11:01,260
this value when you're not
told what the answer is.

188
00:11:01,260 --> 00:11:03,801
And then we extended that,
and we'll do such a proof here

189
00:11:03,801 --> 00:11:04,300
today.

190
00:11:04,300 --> 00:11:06,466
First, we're going to prove
a lower bound on the sum

191
00:11:06,466 --> 00:11:09,030
operation, which is
computing this value that's

192
00:11:09,030 --> 00:11:14,640
on our outline over
here, sum lower bound.

193
00:11:14,640 --> 00:11:17,580
And then we'll extend that and
make the argument a little bit

194
00:11:17,580 --> 00:11:20,940
more complicated, then we'll
get an actual connectivity lower

195
00:11:20,940 --> 00:11:23,740
bound, a lower
bound on verify sum.

196
00:11:23,740 --> 00:11:26,010
OK, but obviously
if we can prove

197
00:11:26,010 --> 00:11:29,490
that these operations
take a long time to do,

198
00:11:29,490 --> 00:11:33,150
we can prove that these original
operations take a long time

199
00:11:33,150 --> 00:11:34,410
to do.

200
00:11:34,410 --> 00:11:46,700
So what we claim is that
square root of n updates,

201
00:11:46,700 --> 00:11:58,260
these block updates, plus square root
of n verify sum queries,

202
00:11:58,260 --> 00:12:12,220
require Omega(root n times
root n log n) cell probes.

203
00:12:19,190 --> 00:12:22,360
I guess this is the
amortized claim.

204
00:12:22,360 --> 00:12:25,070
So if I want to do root n
updates and root n queries,

205
00:12:25,070 --> 00:12:27,590
and I take root n times
root n times log n--

206
00:12:27,590 --> 00:12:29,750
funny way of writing n log n--

207
00:12:29,750 --> 00:12:32,436
cell probes, then
if I divide through,

208
00:12:32,436 --> 00:12:33,810
I want the amortized
lower bound,

209
00:12:33,810 --> 00:12:36,770
I lose one of these
root ns, because I'm

210
00:12:36,770 --> 00:12:38,030
doing root n bulk operations of each kind.

211
00:12:38,030 --> 00:12:40,330
I lose another root n,
because each of these updates

212
00:12:40,330 --> 00:12:41,705
corresponds to
root n operations.

213
00:12:41,705 --> 00:12:43,370
Each of the verify
sums corresponds

214
00:12:43,370 --> 00:12:44,660
to root n operations.

215
00:12:44,660 --> 00:12:48,080
So overall per operation,
per original operation

216
00:12:48,080 --> 00:12:50,660
of edge deletion, edge insertion,
or connectivity query,

217
00:12:50,660 --> 00:12:52,440
I'm paying log n per operation.

218
00:12:52,440 --> 00:12:57,474
So if I can prove this claim,
then I get this theorem.
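The amortization arithmetic above, written out as a worked equation. This is a sketch of the counting only: each block update is root n deletions plus root n insertions, and each verify sum uses at most root n connectivity queries.

```latex
\underbrace{\sqrt{n}\cdot 2\sqrt{n}}_{\text{edge updates}}
\;+\;
\underbrace{\sqrt{n}\cdot \sqrt{n}}_{\text{connectivity queries}}
\;\le\; 3n
\qquad\Longrightarrow\qquad
\frac{\Omega(\sqrt{n}\cdot\sqrt{n}\cdot\log n)}{3n} \;=\; \Omega(\log n)
\text{ cell probes per operation.}
```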

219
00:12:57,474 --> 00:12:59,020
All clear?

220
00:12:59,020 --> 00:13:02,039
So now we've reduced the problem
to these bulk operations,

221
00:13:02,039 --> 00:13:04,080
we'll just be thinking
about the bulk operations,

222
00:13:04,080 --> 00:13:05,160
update and verify sum.

223
00:13:05,160 --> 00:13:07,320
We won't think about
edge deletions and insertions

224
00:13:07,320 --> 00:13:09,041
and connectivity
queries anymore.

225
00:13:14,794 --> 00:13:15,294
OK.

226
00:13:18,180 --> 00:13:20,600
So this is just sort
of the general set up

227
00:13:20,600 --> 00:13:22,650
of what the graphs are
going to look like.

228
00:13:22,650 --> 00:13:25,460
And now I'm going to tell
you what sequence of updates

229
00:13:25,460 --> 00:13:28,840
and verify sums we're actually
going to do that are bad.

230
00:13:28,840 --> 00:13:31,520
This is the bad access sequence.

231
00:13:39,930 --> 00:13:42,642
And this is actually something
we've seen before in lecture 6,

232
00:13:42,642 --> 00:13:44,225
I think, the binary
search tree stuff.

233
00:13:49,300 --> 00:13:51,575
We're going to look at
the bit reversal sequence.

234
00:14:00,620 --> 00:14:03,830
So you may recall a
bit reversal sequence.

235
00:14:03,830 --> 00:14:14,580
You take binary numbers in
order, reverse the bits.

236
00:14:14,580 --> 00:14:28,370
So this becomes 000, 100, 010,
110, 001, 101, 011, and 111.

237
00:14:28,370 --> 00:14:33,620
So those are the
reversed strings.

238
00:14:33,620 --> 00:14:35,970
And then you reinterpret
those as regular numbers.

239
00:14:35,970 --> 00:14:43,244
So this is 0, 4,
2, 6, and then it

240
00:14:43,244 --> 00:14:45,160
should be the same thing,
but the odd version.

241
00:14:45,160 --> 00:14:49,100
So I have 1, 5, 3, 7.
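A short sketch of the bit reversal sequence just described, assuming the length m is a power of 2 (as the lecture assumes for root n). The function names are illustrative.

```python
def reverse_bits(x, bits):
    """Reverse the lowest `bits` bits of x (e.g. 001 -> 100)."""
    y = 0
    for _ in range(bits):
        y = (y << 1) | (x & 1)  # shift the lowest bit of x onto y
        x >>= 1
    return y


def bit_reversal_sequence(m):
    """Return 0..m-1 in bit-reversal order; m must be a power of 2."""
    if m < 1 or m & (m - 1):
        raise ValueError("m must be a power of 2")
    bits = m.bit_length() - 1
    return [reverse_bits(i, bits) for i in range(m)]


if __name__ == "__main__":
    print(bit_reversal_sequence(8))  # prints [0, 4, 2, 6, 1, 5, 3, 7]
```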

242
00:14:51,926 --> 00:14:56,270
OK, I claimed, I think
probably didn't prove,

243
00:14:56,270 --> 00:15:00,760
that this bit reversal sequence
has a high Wilber lower bound.

244
00:15:00,760 --> 00:15:04,890
And so any binary search tree
accessing items in this order

245
00:15:04,890 --> 00:15:06,470
requires log n per operation.

246
00:15:06,470 --> 00:15:08,130
And we want log n
per operation here,

247
00:15:08,130 --> 00:15:10,430
so it seems like a
good choice, why not?

248
00:15:10,430 --> 00:15:13,580
So we're going to follow
this access sequence.

249
00:15:13,580 --> 00:15:15,170
And sorry, I've
changed notation here,

250
00:15:15,170 --> 00:15:18,320
we're going to number
the permutations from 0

251
00:15:18,320 --> 00:15:21,710
to root n minus 1 now.

252
00:15:21,710 --> 00:15:24,270
And assume root n
is a power of 2,

253
00:15:24,270 --> 00:15:27,140
so the bit reversal
sequence is well defined.

254
00:15:27,140 --> 00:15:34,940
And then we are going to do
two things for each such i.

255
00:15:34,940 --> 00:15:37,538
We're going to do a
verify sum operation.

256
00:15:49,010 --> 00:15:51,642
Actually maybe it is
starting at 1, I don't know.

257
00:15:51,642 --> 00:15:53,330
It doesn't matter.

258
00:15:53,330 --> 00:16:06,250
And then we'll do update(i, pi random).

259
00:16:06,250 --> 00:16:09,010
OK, so let's see.

260
00:16:09,010 --> 00:16:11,860
This pi random is just a
uniform random permutation,

261
00:16:11,860 --> 00:16:15,400
it's computed fresh every time.

262
00:16:15,400 --> 00:16:20,380
So we're just re-randomizing
pi i in this operation.

263
00:16:20,380 --> 00:16:22,300
Before we do that,
we're checking

264
00:16:22,300 --> 00:16:25,600
that the sum, the composition
of all the permutations up

265
00:16:25,600 --> 00:16:28,750
to position i, is what it is.

266
00:16:28,750 --> 00:16:31,300
So this is the
actual value here,

267
00:16:31,300 --> 00:16:35,390
and we're verifying that
that is indeed the sum.

268
00:16:35,390 --> 00:16:39,100
So this will always return yes.

269
00:16:39,100 --> 00:16:41,750
But the data structure
has to be correct.

270
00:16:41,750 --> 00:16:45,050
So it needs to really verify
that that is the case.

271
00:16:45,050 --> 00:16:48,100
There's the threat that maybe
we gave the wrong answer here,

272
00:16:48,100 --> 00:16:50,170
and it needs to double
check that that is indeed

273
00:16:50,170 --> 00:16:51,190
the right answer.

274
00:16:51,190 --> 00:16:54,140
It may seem a little weird,
but we'll see why it works.

275
00:16:54,140 --> 00:16:56,410
So this is the bad
access sequence.

276
00:16:56,410 --> 00:17:03,540
Just do a query, do an update
in this weird order in i.
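The bad access sequence can be sketched as a driver that runs against any structure exposing the two bulk operations. Here `NaivePrefixComposition` is a hypothetical, deliberately slow stand-in used only to show the interface; the adversary keeps its own copy of the permutations so it always passes the true prefix composition, and a correct structure must answer "yes" every time.

```python
import random


def reverse_bits(x, bits):
    """Reverse the lowest `bits` bits of x."""
    y = 0
    for _ in range(bits):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


class NaivePrefixComposition:
    """Toy stand-in for a structure supporting the two bulk operations.
    It stores every permutation and recomputes the prefix composition on each
    query, so it is slow; it only illustrates the interface."""

    def __init__(self, k):
        self.perms = [list(range(k)) for _ in range(k)]

    def update(self, i, perm):
        self.perms[i] = list(perm)

    def verify_sum(self, i, candidate):
        row_map = list(range(len(candidate)))
        for p in self.perms[: i + 1]:  # apply pi_0, then pi_1, ..., then pi_i
            row_map = [p[r] for r in row_map]
        return row_map == list(candidate)


def run_bad_sequence(ds, k, seed=0):
    """For each i in bit-reversal order: verify_sum(i, true prefix), then
    update(i, fresh uniform random permutation). k must be a power of 2."""
    rng = random.Random(seed)
    truth = [list(range(k)) for _ in range(k)]  # the adversary's own copy
    bits = k.bit_length() - 1
    answers = []
    for t in range(k):
        i = reverse_bits(t, bits)
        expected = list(range(k))
        for p in truth[: i + 1]:
            expected = [p[r] for r in expected]
        answers.append((i, ds.verify_sum(i, expected)))  # "yes" if ds is correct
        pi_random = list(range(k))
        rng.shuffle(pi_random)  # uniform random permutation
        truth[i] = pi_random
        ds.update(i, pi_random)
    return answers
```

Here k plays the role of root n, and permutations are numbered from 0 as in the lecture's revised notation.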

277
00:17:03,540 --> 00:17:11,230
OK, and big idea is to
build a nice balanced binary

278
00:17:11,230 --> 00:17:14,109
tree over time.

279
00:17:18,079 --> 00:17:26,514
So we have on the ground
here 0, 4, 2, 6, 1, 5, 3, 7.

280
00:17:26,514 --> 00:17:31,000
And when I write 5, I
mean verify sum of 5,

281
00:17:31,000 --> 00:17:33,820
and update permutation 5.

282
00:17:33,820 --> 00:17:36,010
And then we can build
a binary tree on that.

283
00:17:41,900 --> 00:17:43,660
And for each node
in this tree, we

284
00:17:43,660 --> 00:17:45,790
have the notion
of a left subtree,

285
00:17:45,790 --> 00:17:48,010
and we have the notion
of a right subtree.

286
00:17:48,010 --> 00:17:49,990
And cool thing about
bit reversal sequence

287
00:17:49,990 --> 00:17:52,960
is this nice self-similarity.

288
00:17:52,960 --> 00:17:55,330
If you look at the left
subtree of any node

289
00:17:55,330 --> 00:17:58,630
and the right subtree of that
node, those items interleave.

290
00:17:58,630 --> 00:18:00,910
If you look at the sorted
order, it's 1 on the left,

291
00:18:00,910 --> 00:18:03,190
3 on the right, 5 on the
left, 7 on the right.

292
00:18:03,190 --> 00:18:04,687
They always
perfectly interleave,

293
00:18:04,687 --> 00:18:06,520
because this thing is
designed to interleave

294
00:18:06,520 --> 00:18:09,310
at every possible level.
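The interleaving property can be checked mechanically. This sketch, assuming a power-of-2 length, builds the balanced tree over time implicitly by splitting the sequence in half at each node, and checks that in sorted order the values alternate left, right, left, right. The function `interleaves` is an illustrative name.

```python
def reverse_bits(x, bits):
    """Reverse the lowest `bits` bits of x."""
    y = 0
    for _ in range(bits):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def bit_reversal_sequence(m):
    """0..m-1 in bit-reversal order (m a power of 2)."""
    bits = m.bit_length() - 1
    return [reverse_bits(i, bits) for i in range(m)]


def interleaves(seq):
    """True if, at every internal node of the balanced tree over `seq`
    (len(seq) a power of 2), the left and right halves interleave in
    sorted order starting from the left."""
    if len(seq) < 2:
        return True
    half = len(seq) // 2
    left, right = seq[:half], seq[half:]
    side = {x: "L" for x in left}
    side.update({x: "R" for x in right})
    labels = [side[x] for x in sorted(seq)]
    alternates = all(labels[j] == ("L" if j % 2 == 0 else "R")
                     for j in range(len(labels)))
    return alternates and interleaves(left) and interleaves(right)
```

The sorted-order check works because the values under one node share their low-order bits and the two halves differ in the next bit, so each left value is immediately followed by the matching right value.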

295
00:18:09,310 --> 00:18:12,190
So that's the fact
we're going to use.

296
00:18:12,190 --> 00:18:18,670
We're going to analyze each node
separately, and talk about what

297
00:18:18,670 --> 00:18:22,690
information has to be
carried from the left subtree

298
00:18:22,690 --> 00:18:24,340
to the right subtree.

299
00:18:24,340 --> 00:18:27,100
In particular, we're
interested in the updates being

300
00:18:27,100 --> 00:18:29,410
done on the left subtree,
because here we change pi 1,

301
00:18:29,410 --> 00:18:31,150
we change pi 5.

302
00:18:31,150 --> 00:18:33,300
And the queries are being
done on the right subtree,

303
00:18:33,300 --> 00:18:36,610
because here we
query 3, we query 7.

304
00:18:36,610 --> 00:18:39,310
When we query 3, that
queries everything,

305
00:18:39,310 --> 00:18:40,690
all the permutations, up to 3.

306
00:18:40,690 --> 00:18:43,380
It's a composition of
all permutations up to 3.

307
00:18:43,380 --> 00:18:45,287
So in particular it involves 1.

308
00:18:45,287 --> 00:18:47,620
So the claim is going to be
that the permutation that we

309
00:18:47,620 --> 00:18:51,220
set in 1 has to be carried
over to this query.

310
00:18:51,220 --> 00:18:54,670
And similarly,
changing permutation 5

311
00:18:54,670 --> 00:18:55,990
will affect the query for 7.

312
00:18:55,990 --> 00:19:00,550
Also, the update for 1,
will affect the query for 7.

313
00:19:00,550 --> 00:19:02,950
So we need to formalize
that a little bit.

314
00:19:10,630 --> 00:19:12,230
So here is the claim.

315
00:19:18,020 --> 00:19:28,760
For every node in
the tree, say it

316
00:19:28,760 --> 00:19:35,400
has l leaves in its subtree--

317
00:19:42,742 --> 00:19:47,430
This should be a comma and
this should be a colon.

318
00:19:47,430 --> 00:19:51,060
Here's what we say.

319
00:19:51,060 --> 00:19:54,960
During the right subtree
of v, so right subtree

320
00:19:54,960 --> 00:19:57,850
corresponds to an
interval of time.

321
00:19:57,850 --> 00:20:00,750
So we're talking about
those operations done

322
00:20:00,750 --> 00:20:06,270
during the right subtree
of v. Claim is we must

323
00:20:06,270 --> 00:20:11,190
do Omega(l root n) cell probes--

324
00:20:15,898 --> 00:20:19,275
sorry, expected cell probes.

325
00:20:21,950 --> 00:20:24,490
We are using some
randomness here, right?

326
00:20:24,490 --> 00:20:27,350
We said we're going to
update each permutation

327
00:20:27,350 --> 00:20:28,730
to a random value,
so we can only

328
00:20:28,730 --> 00:20:31,577
make claims about the
expected performance.

329
00:20:34,710 --> 00:20:36,549
Fine.

330
00:20:36,549 --> 00:20:38,090
But that's actually
a stronger thing,

331
00:20:38,090 --> 00:20:41,150
it implies a lower bound, even
for randomized algorithms.

332
00:20:41,150 --> 00:20:42,960
So if you can randomize
your input set.

333
00:20:45,890 --> 00:20:49,640
And then not just
any cell probes,

334
00:20:49,640 --> 00:20:55,490
but they're cell probes
that read cells last written

335
00:20:55,490 --> 00:20:56,555
during the left subtree.

336
00:21:08,440 --> 00:21:12,690
So this is what I was saying
at a high level before.

337
00:21:12,690 --> 00:21:16,350
We're looking at
reads over here,

338
00:21:16,350 --> 00:21:19,440
to cells that are
written over here.

339
00:21:19,440 --> 00:21:22,140
Because we claim the
updates over here

340
00:21:22,140 --> 00:21:24,750
have to store some
information that is--

341
00:21:24,750 --> 00:21:29,134
whatever the updates that happen
over here influence the queries

342
00:21:29,134 --> 00:21:30,050
that happen over here.

343
00:21:30,050 --> 00:21:31,841
So these queries have
to read the data that

344
00:21:31,841 --> 00:21:33,630
was written over here.

345
00:21:33,630 --> 00:21:36,990
And specifically, we're
claiming at least l root n

346
00:21:36,990 --> 00:21:40,140
cell probes have to
be read over here,

347
00:21:40,140 --> 00:21:42,420
from cells that were written--

348
00:21:42,420 --> 00:21:45,870
that were basically just
written in the left subtree.

349
00:21:45,870 --> 00:21:50,490
If we could prove this, then
we get our other claim--

350
00:21:50,490 --> 00:21:54,450
this one over here, that root
n updates, and root n verify

351
00:21:54,450 --> 00:21:57,900
sums require
this much time.

352
00:21:57,900 --> 00:22:01,290
The difference
is-- well, here we

353
00:22:01,290 --> 00:22:03,900
have an l, for an l leaf tree.

354
00:22:03,900 --> 00:22:07,710
And so what I'd like to
do is sum this lower bound

355
00:22:07,710 --> 00:22:09,390
over every node in the tree.

356
00:22:09,390 --> 00:22:12,625
I need to check that
that is valid to do.

357
00:22:12,625 --> 00:22:13,740
So let's do that.

358
00:22:25,750 --> 00:22:28,530
OK, for every node
v, we are claiming

359
00:22:28,530 --> 00:22:30,900
there's a certain number of
reads that happen over here,

360
00:22:30,900 --> 00:22:33,030
that correspond to
writes over here.

361
00:22:33,030 --> 00:22:35,490
But let's say you
look at the parent

362
00:22:35,490 --> 00:22:37,184
of v, which is over here.

363
00:22:37,184 --> 00:22:38,850
This thing is also
in the right subtree,

364
00:22:38,850 --> 00:22:40,680
and we're claiming there's
some number of reads

365
00:22:40,680 --> 00:22:43,180
on the right subtree, that read
things that are written over

366
00:22:43,180 --> 00:22:43,710
on the left.

367
00:22:43,710 --> 00:22:45,960
The worry would be that
the reads we counted here,

368
00:22:45,960 --> 00:22:47,640
we also count at
the next level up.

369
00:22:47,640 --> 00:22:49,723
We don't want to double
count in our lower bounds.

370
00:22:49,723 --> 00:22:53,796
For us to be able to sum them up,
we can't be double counting.

371
00:22:53,796 --> 00:22:55,170
But the claim is
we're not double

372
00:22:55,170 --> 00:22:57,870
counting, because if you
look at any particular--

373
00:22:57,870 --> 00:23:02,990
any read-- so here's time, and
suppose you do a read here.

374
00:23:02,990 --> 00:23:05,534
You're reading a cell that was
written sometime in the past,

375
00:23:05,534 --> 00:23:07,950
if it was never written, it's
a not very interesting read,

376
00:23:07,950 --> 00:23:09,780
it communicates no information.

377
00:23:09,780 --> 00:23:13,410
So there's some write
in the past that changed

378
00:23:13,410 --> 00:23:15,420
the cell that's just read.

379
00:23:15,420 --> 00:23:18,390
And we are going
to count this read

380
00:23:18,390 --> 00:23:23,827
at a particular node, namely
the lca of those two times.

381
00:23:23,827 --> 00:23:25,410
So if you look at
the lca of the times

382
00:23:25,410 --> 00:23:27,890
of the reads and the writes,
that is the single node

383
00:23:27,890 --> 00:23:29,755
that we'll think
about that read that

384
00:23:29,755 --> 00:23:31,380
happened in the right
subtree, that was

385
00:23:31,380 --> 00:23:33,400
written in the left subtree.

386
00:23:33,400 --> 00:23:45,465
So no double counting, because
we only count at the lca.
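The charging rule can be sketched with a little bit arithmetic, assuming time steps are numbered 0..m-1 as the leaves (left to right) of a perfect binary tree over time. A node is named `(height, index)`; the function name `charge_node` is illustrative. Reads and writes in the same time step are not charged to any internal node.

```python
def charge_node(write_time, read_time):
    """Return the unique internal node (height, index) charged for a read at
    read_time of a cell last written at write_time. Node (h, j) covers time
    steps j * 2**h .. (j + 1) * 2**h - 1. Returns None if both fall in the
    same time step (same leaf)."""
    if write_time > read_time:
        raise ValueError("a read can only see an earlier write")
    if write_time == read_time:
        return None
    h = (write_time ^ read_time).bit_length()  # height of the LCA
    return (h, read_time >> h)
```

Since the write is earlier, the highest bit where the two times differ is 0 for the write and 1 for the read, so the write lies in the LCA's left subtree and the read in its right subtree; every read is therefore charged exactly once.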

387
00:23:48,496 --> 00:23:50,370
The other thing that we
need to be able to do

388
00:23:50,370 --> 00:23:52,270
is, because this is an
expected lower bound,

389
00:23:52,270 --> 00:23:53,760
we need linearity
of expectation.

390
00:23:53,760 --> 00:23:57,300
But expectation is indeed
linear, so we're all set.

391
00:24:01,755 --> 00:24:03,860
OK, so all that's left
is a little bit of common

392
00:24:03,860 --> 00:24:07,260
[INAUDIBLE] if we take l
root n, where l is the size

393
00:24:07,260 --> 00:24:10,290
of the subtree
below a given node,

394
00:24:10,290 --> 00:24:13,350
we sum that up over all nodes,
and it's a balanced binary

395
00:24:13,350 --> 00:24:15,260
search tree--

396
00:24:15,260 --> 00:24:18,870
or a balanced binary tree, I
should say, not a search tree.

397
00:24:18,870 --> 00:24:20,170
What do we get?

398
00:24:20,170 --> 00:24:24,370
Well, every leaf appears
in Theta(log n) subtrees.

399
00:24:24,370 --> 00:24:30,740
So we get the total size of
the tree times log n for this,

400
00:24:30,740 --> 00:24:32,490
and we get another
root n over here.

401
00:24:32,490 --> 00:24:35,550
The total size of
the tree is root n.

402
00:24:35,550 --> 00:24:40,980
So we get this root
n log n, that's

403
00:24:40,980 --> 00:24:42,570
when you sum up the l part.

404
00:24:42,570 --> 00:24:46,170
Then everything gets
multiplied by root n,

405
00:24:46,170 --> 00:24:47,820
and that becomes
our lower bound,

406
00:24:47,820 --> 00:24:51,790
and that's exactly
what we need over here.
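The summation can be checked numerically. This sketch assumes a perfect binary tree with m leaves (m = root n, a power of 2) and sums l, the number of leaves below each internal node; each level contributes m, and there are log2 m levels.

```python
def total_lower_bound_terms(m):
    """Sum of l (leaves below the node) over every internal node of a
    perfect binary tree with m leaves, m a power of 2."""
    total = 0
    size, count = m, 1  # subtree size and number of nodes at the current level
    while size >= 2:
        total += count * size  # each level contributes exactly m
        count *= 2
        size //= 2
    return total
```

For m = root n the total is root n log root n = (1/2) root n log n; multiplying by the extra root n factor in each node's bound gives Omega(n log n).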

407
00:24:51,790 --> 00:24:55,256
So now this claim is done.

408
00:24:55,256 --> 00:24:57,000
Maybe I should do a check mark.

409
00:24:57,000 --> 00:24:58,660
Provided we can
prove this claim.

410
00:24:58,660 --> 00:25:01,360
So now our goal is
to prove this thing.

411
00:25:01,360 --> 00:25:03,520
And now we're in a
more local world,

412
00:25:03,520 --> 00:25:06,210
looking at a single node,
counting reads over here,

413
00:25:06,210 --> 00:25:08,007
the corresponding
writes over there.

414
00:25:08,007 --> 00:25:09,840
And then you just add
up those lower bounds,

415
00:25:09,840 --> 00:25:10,980
you get what you want.

416
00:25:10,980 --> 00:25:12,480
So this is where
the log comes from,

417
00:25:12,480 --> 00:25:14,370
because it's a balanced tree.

418
00:25:14,370 --> 00:25:17,160
And there's log n levels in
a balanced tree, that's where

419
00:25:17,160 --> 00:25:18,870
we're getting our lower bound.

420
00:25:18,870 --> 00:25:21,450
The root n's are just keeping
track of the size of the things

421
00:25:21,450 --> 00:25:22,641
we're manipulating.

422
00:25:26,852 --> 00:25:30,170
All right.

423
00:25:30,170 --> 00:25:34,730
So it remains to
prove this claim.

424
00:25:38,170 --> 00:25:39,790
Prove that claim,
we get that claim,

425
00:25:39,790 --> 00:25:40,998
and then we get this theorem.

426
00:26:05,630 --> 00:26:08,585
So proof of claim.

427
00:26:16,130 --> 00:26:18,860
We're going to do an
information theoretic

428
00:26:18,860 --> 00:26:21,950
argument, so let me set it up.

429
00:26:21,950 --> 00:26:24,050
Again, it's
making this claim

430
00:26:24,050 --> 00:26:26,180
I said before, that
the permutations that

431
00:26:26,180 --> 00:26:27,830
get written over
here somehow have

432
00:26:27,830 --> 00:26:29,750
to be communicated to
the queries over here,

433
00:26:29,750 --> 00:26:33,380
because they matter.

434
00:26:33,380 --> 00:26:36,670
Because the permutations
that get set over here

435
00:26:36,670 --> 00:26:39,290
change the answers to
all the queries over here,

436
00:26:39,290 --> 00:26:42,860
because of the interleaving
between left and right.

437
00:26:42,860 --> 00:26:45,710
So how are we going
to formalize that?

438
00:26:45,710 --> 00:26:57,950
Well, left subtree
does l/2 updates

439
00:26:57,950 --> 00:27:06,614
with l/2 random permutations,
uniform random permutations,

440
00:27:06,614 --> 00:27:08,030
because every node
does an update.

441
00:27:10,910 --> 00:27:13,760
And so the information
theoretic idea

442
00:27:13,760 --> 00:27:25,790
is that if we were to somehow
encode those permutations,

443
00:27:25,790 --> 00:27:34,720
that encoding must
use omega l log l--

444
00:27:34,720 --> 00:27:35,220
l?

445
00:27:35,220 --> 00:27:36,630
No, I'm sorry.

446
00:27:36,630 --> 00:27:39,020
It's not right.

447
00:27:39,020 --> 00:27:45,905
Off by some root n factors
here, l root n log n.

448
00:27:45,905 --> 00:27:48,380
OK, each permutation
must take root n

449
00:27:48,380 --> 00:27:51,470
log root n bits to encode.

450
00:27:51,470 --> 00:27:53,270
If you have a
random permutation,

451
00:27:53,270 --> 00:27:56,030
that's the expected number of bits, and it holds
with very high probability.

452
00:27:56,030 --> 00:27:57,530
Almost every
permutation requires

453
00:27:57,530 --> 00:27:59,484
root n log root n bits.

454
00:27:59,484 --> 00:28:01,400
I'm not going to worry
about constant factors,

455
00:28:01,400 --> 00:28:04,197
put an omega here, so the
log root n turns into a log n.

456
00:28:04,197 --> 00:28:05,780
And then we've got
l over two of them,

457
00:28:05,780 --> 00:28:09,691
so again, ignoring constant
factors, that's l root n log n

458
00:28:09,691 --> 00:28:10,190
bits.

459
00:28:13,440 --> 00:28:17,510
And this is just
information-theoretic fact,

460
00:28:17,510 --> 00:28:19,490
our common [INAUDIBLE]
theory fact.
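
The counting fact behind the Omega(l root n log n) bound can be made concrete. This is a sketch, not from the lecture. It uses Python's math.lgamma to compute log2(k!) for k = sqrt(n), compares that with the k log2 k estimate, and multiplies by the l/2 permutations updated in the left subtree. The helper names are hypothetical.

```python
import math

def log2_factorial(k: int) -> float:
    """log2(k!), the bits needed to single out one of k! permutations."""
    return math.lgamma(k + 1) / math.log(2)

def encoding_lower_bound_bits(n: int, l: int) -> float:
    """Bits to encode l/2 independent uniform permutations of sqrt(n) elements."""
    k = math.isqrt(n)
    return (l // 2) * log2_factorial(k)

# Only 2**b - 1 bit strings are shorter than b bits (b an integer), so a code can
# give such short codewords to at most a 2**b / k! fraction of permutations.
# When b <= log2(k!) - c, that fraction is below 2**-c: almost every permutation
# needs about log2(k!) bits.
n, l = 2**20, 64
k = math.isqrt(n)                # 1024 elements per permutation
per_perm = log2_factorial(k)     # exact information content of one permutation
estimate = k * math.log2(k)      # the sqrt(n) log sqrt(n) estimate
print(round(per_perm), round(estimate), round(per_perm / estimate, 2))
print(round(encoding_lower_bound_bits(n, l)))
```

The exact value and the estimate differ only by a constant factor, which is why the lecture can drop constants and write l root n log n.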

461
00:28:19,490 --> 00:28:25,460
And once we know that,
the idea is let's

462
00:28:25,460 --> 00:28:28,010
find an encoding that's
better than this,

463
00:28:28,010 --> 00:28:29,680
and get a contradiction.

464
00:28:29,680 --> 00:28:31,430
Of course we shouldn't
get a contradiction

465
00:28:31,430 --> 00:28:33,530
unless this claim is false.

466
00:28:33,530 --> 00:28:36,110
So either this claim is
true and we're happy,

467
00:28:36,110 --> 00:28:38,240
but if somehow there were
not enough cell

468
00:28:38,240 --> 00:28:40,940
reads on the right of cells
that were written

469
00:28:40,940 --> 00:28:43,550
on the left, then
we will, from that,

470
00:28:43,550 --> 00:28:48,590
get a smaller encoding of
the update permutations

471
00:28:48,590 --> 00:28:50,330
that happen on the left.

472
00:28:50,330 --> 00:28:51,830
If we could somehow
do that, then we

473
00:28:51,830 --> 00:28:55,130
can get a contradiction, and
therefore conclude the claim

474
00:28:55,130 --> 00:28:57,770
is in fact true.

475
00:28:57,770 --> 00:29:15,800
So, if the claim fails, we'll
find a smaller encoding, which

476
00:29:15,800 --> 00:29:17,442
will give us a contradiction.

477
00:29:24,050 --> 00:29:32,260
All right, so let's set up
this problem a little bit more.

478
00:29:32,260 --> 00:29:34,900
I'm going to--
because we're really

479
00:29:34,900 --> 00:29:39,460
just interested in this subtree
v: stuff on the left, stuff

480
00:29:39,460 --> 00:29:42,420
on the right, but this of course
lives in a much bigger tree,

481
00:29:42,420 --> 00:29:44,320
there's stuff that
happens over here.

482
00:29:44,320 --> 00:29:46,870
This I will call the past.

483
00:29:46,870 --> 00:29:51,160
I'm just going to assume we
know everything about the past.

484
00:29:51,160 --> 00:29:55,090
Everything to the
left of the subtree,

485
00:29:55,090 --> 00:29:56,530
we can assume that we know.

486
00:29:56,530 --> 00:29:59,420
When I say we know,
what do we know?

487
00:29:59,420 --> 00:30:02,020
We know all the updates, we know
all the queries that happen,

488
00:30:02,020 --> 00:30:04,220
and we know, at this
moment in particular,

489
00:30:04,220 --> 00:30:06,790
what is the state of
the data structure.

490
00:30:06,790 --> 00:30:09,760
Because this claim has
nothing to do with this stuff,

491
00:30:09,760 --> 00:30:12,850
it's all about reads here that
correspond to writes here.

492
00:30:12,850 --> 00:30:15,610
So we can just assume we know
everything up to this point.

493
00:30:15,610 --> 00:30:18,946
In our encoding,
this is a key point.

494
00:30:18,946 --> 00:30:21,160
One way to say this in
a probabilistic sense

495
00:30:21,160 --> 00:30:23,680
is we're conditioning on
what happened over here

496
00:30:23,680 --> 00:30:25,820
on the left, what
updates happened.

497
00:30:25,820 --> 00:30:28,450
And if we can prove that
whatever we need to happen here

498
00:30:28,450 --> 00:30:31,150
holds no matter what
the condition is,

499
00:30:31,150 --> 00:30:33,190
then it will hold overall.

500
00:30:33,190 --> 00:30:35,800
So that's probabilistic
justification

501
00:30:35,800 --> 00:30:40,420
for why we can assume
we know the past. OK.

502
00:30:40,420 --> 00:30:48,312
So then our goal
is to encode, this

503
00:30:48,312 --> 00:30:50,020
is a little bit
different from this goal.

504
00:31:07,050 --> 00:31:10,700
What we really want to do is
encode the update permutations

505
00:31:10,700 --> 00:31:12,825
on the left.

506
00:31:12,825 --> 00:31:14,450
That's a little
awkward to think about,

507
00:31:14,450 --> 00:31:17,030
because this is a claim
about how many probes

508
00:31:17,030 --> 00:31:18,800
happen on the right.

509
00:31:18,800 --> 00:31:20,690
So instead, what
we're going to do

510
00:31:20,690 --> 00:31:24,200
is encode the query
permutations on the right.

511
00:31:24,200 --> 00:31:27,230
So there are updates over here,
that's what we want to encode,

512
00:31:27,230 --> 00:31:29,590
but we're instead going to
encode the queries over here.

513
00:31:29,590 --> 00:31:32,480
I claim if you know what
the results of the queries

514
00:31:32,480 --> 00:31:35,480
were over here,
then you know what

515
00:31:35,480 --> 00:31:38,470
the updates were over there.

516
00:31:38,470 --> 00:31:41,560
Basically because of this
interleaving property.

517
00:31:41,560 --> 00:31:43,945
So I can write that down
a little more formally.

518
00:31:54,500 --> 00:31:57,730
So if we look at
time here, over,

519
00:31:57,730 --> 00:31:59,020
let's say this is v's subtree.

520
00:32:03,510 --> 00:32:07,500
Then what we have are
a sequence of updates

521
00:32:07,500 --> 00:32:09,490
and a sequence of queries.

522
00:32:13,354 --> 00:32:18,955
These are queries,
and these are updates.

523
00:32:21,460 --> 00:32:24,790
This is what the
sequence looks like--

524
00:32:24,790 --> 00:32:28,770
sorry, this is v's subtree,
these are the pi i's, I should say.

525
00:32:31,830 --> 00:32:34,700
I mean, these operations
all happened over time,

526
00:32:34,700 --> 00:32:38,436
but now I'm sorting by i.

527
00:32:38,436 --> 00:32:41,239
A little confusing.

528
00:32:41,239 --> 00:32:43,030
There are two orders
to think about, right?

529
00:32:43,030 --> 00:32:46,820
There's the sequence
over time, we're

530
00:32:46,820 --> 00:32:48,470
now looking at
such a left subtree

531
00:32:48,470 --> 00:32:51,230
where we do, say, 1, 5, and 3, 7.

532
00:32:51,230 --> 00:32:53,690
What that means-- so you're
imagining here, this is 1,

533
00:32:53,690 --> 00:32:55,580
this is 5, this is 3, this is 7.

534
00:32:55,580 --> 00:32:58,240
Here we're sorted by the
value written down there,

535
00:32:58,240 --> 00:33:00,770
we're sorting by the i,
the pi i that they're

536
00:33:00,770 --> 00:33:03,080
changing or querying.

537
00:33:03,080 --> 00:33:13,130
And so all the queries
are in the right subtree of v.

538
00:33:13,130 --> 00:33:23,090
And all the updates are
in the left subtree of v.

539
00:33:23,090 --> 00:33:24,539
This is the
interleaving property

540
00:33:24,539 --> 00:33:25,580
that I mentioned earlier.
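
The interleaving property can be checked mechanically. The sketch below assumes that the operation at time t touches index bitreverse(t). That is an assumption of this illustration, although it is consistent with the 1, 5, 3, 7 example above. For every node of the time tree, the code sorts the indices of the left half (updates) and the right half (queries) and checks that they alternate. All function names are hypothetical.

```python
def bit_reverse(t: int, bits: int) -> int:
    """Reverse the low `bits` bits of t (e.g. 0b001 -> 0b100 for bits=3)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (t & 1)
        t >>= 1
    return r

def interleaves(left: list[int], right: list[int]) -> bool:
    """True if, sorted by index, the two sets alternate left, right, left, ..."""
    tagged = sorted([(i, "L") for i in left] + [(i, "R") for i in right])
    return all(tag == ("L" if pos % 2 == 0 else "R")
               for pos, (_, tag) in enumerate(tagged))

def check_all_nodes(bits: int) -> bool:
    """Check the interleaving property at every internal node of the time tree."""
    order = [bit_reverse(t, bits) for t in range(1 << bits)]  # index used at time t
    size = 1 << bits
    while size > 1:
        half = size // 2
        for start in range(0, 1 << bits, size):
            left = order[start:start + half]           # updates happen here
            right = order[start + half:start + size]   # queries happen here
            if not interleaves(left, right):
                return False
        size //= 2
    return True

print([bit_reverse(t, 3) for t in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
print(check_all_nodes(3), check_all_nodes(6))
```

In this ordering, the node covering times 4 to 7 touches indices 1, 5, 3, 7. Its left half {1, 5} and right half {3, 7} sort to 1, 3, 5, 7, which alternates update, query, update, query.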

541
00:33:28,140 --> 00:33:32,180
So I claim that if I encode
the results of the queries,

542
00:33:32,180 --> 00:33:35,990
namely I encode
these permutations,

543
00:33:35,990 --> 00:33:37,220
these are like summary--

544
00:33:37,220 --> 00:33:37,970
partial sums.

545
00:33:37,970 --> 00:33:41,480
These are prefix sums
of the permutation list.

546
00:33:41,480 --> 00:33:43,970
Then I can figure out
what the updates were.

547
00:33:43,970 --> 00:33:44,900
Why?

548
00:33:44,900 --> 00:33:48,470
Because if I figure out
what this query, what

549
00:33:48,470 --> 00:33:50,000
its permutation
is, that's the sum

550
00:33:50,000 --> 00:33:52,110
of all of these permutations.

551
00:33:52,110 --> 00:33:55,250
Now only one of them
changed in the left subtree,

552
00:33:55,250 --> 00:33:58,190
the rest all are in the past.

553
00:33:58,190 --> 00:34:01,580
They were all set before
this time over here,

554
00:34:01,580 --> 00:34:04,640
and I know everything about
the past, I'm assuming.

555
00:34:04,640 --> 00:34:07,730
So most of these I already
know, the one thing I don't know

556
00:34:07,730 --> 00:34:13,010
is this one, but I claim
if I know this sum,

557
00:34:13,010 --> 00:34:15,770
and I know all the others,
then I can figure out

558
00:34:15,770 --> 00:34:17,239
what this one is, right?

559
00:34:17,239 --> 00:34:20,469
It's slightly awkward to
do, if I give you this,

560
00:34:20,469 --> 00:34:29,960
I give you the sum of pi j from
j equals 0 to i, or something.

561
00:34:29,960 --> 00:34:33,320
I've got to--

562
00:34:33,320 --> 00:34:37,719
I want to strip away all
these, strip away all these.

563
00:34:37,719 --> 00:34:44,659
So I'm going to multiply by
sum of pi j inverses over here,

564
00:34:44,659 --> 00:34:48,350
and multiply by sum pi j--

565
00:34:48,350 --> 00:34:50,760
when I say multiply,
I mean compose.

566
00:34:50,760 --> 00:34:53,060
Sum pi j inverses
here, maybe let's not

567
00:34:53,060 --> 00:34:54,870
worry about the
exact indices here.

568
00:34:54,870 --> 00:34:58,220
But the point is, this
is all in the past,

569
00:34:58,220 --> 00:35:01,700
and this is all in the past,
so I know all these pi js,

570
00:35:01,700 --> 00:35:03,750
I know their inverses.

571
00:35:03,750 --> 00:35:05,750
So if I have this
total sum, and I right

572
00:35:05,750 --> 00:35:07,340
multiply with these
inverses, left

573
00:35:07,340 --> 00:35:10,400
multiply with these inverses,
I get the one that I want.

574
00:35:10,400 --> 00:35:19,190
This gives me some particular pi
k, if I set the indices right.

575
00:35:22,260 --> 00:35:22,760
OK?

576
00:35:22,760 --> 00:35:26,330
So if I know this query, I
figure out what this update is.

577
00:35:26,330 --> 00:35:29,480
Now once I know what this update
is, and I know this query, then

578
00:35:29,480 --> 00:35:33,440
in this sum, I know everything
except this one thing.

579
00:35:33,440 --> 00:35:35,740
And so by using
the same trick, I

580
00:35:35,740 --> 00:35:37,280
can figure out what
this update is.

581
00:35:37,280 --> 00:35:39,230
So now I know the
first two updates,

582
00:35:39,230 --> 00:35:41,000
if I then know the
answer to this query,

583
00:35:41,000 --> 00:35:42,200
I can figure out
what this update is.

584
00:35:42,200 --> 00:35:43,190
If I know the answer
to this query,

585
00:35:43,190 --> 00:35:44,231
I can figure this update.

586
00:35:44,231 --> 00:35:46,060
Because they're
perfectly interleaved,

587
00:35:46,060 --> 00:35:49,290
I only need to reconstruct
one update at a time.

588
00:35:49,290 --> 00:35:50,360
So if I'm given--

589
00:35:50,360 --> 00:35:54,260
if I've somehow encoded
all of the queries results,

590
00:35:54,260 --> 00:35:58,430
all of these prefix sums,
and I'm given the past,

591
00:35:58,430 --> 00:36:01,490
then I can reconstruct
what all the updates were.
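
The decoding step can be sketched as a toy model: strip the known prefix and suffix by composing with inverses, then move on to the next update. The assumptions, none of which come from the lecture, are these. Permutations are tuples where p[x] is the image of x. "Sum" means the composition pi_0 o pi_1 o ... o pi_i. The right subtree only queries. Update and query indices interleave, starting with an update. All names are hypothetical.

```python
import random
from functools import reduce

Perm = tuple[int, ...]  # p[x] is the image of x

def compose(a: Perm, b: Perm) -> Perm:
    """(a o b)(x) = a[b[x]]: apply b first, then a."""
    return tuple(a[x] for x in b)

def inverse(a: Perm) -> Perm:
    inv = [0] * len(a)
    for x, y in enumerate(a):
        inv[y] = x
    return tuple(inv)

def prefix(perms: list[Perm], i: int) -> Perm:
    """The 'sum' pi_0 o pi_1 o ... o pi_i returned by a query at index i."""
    return reduce(compose, perms[: i + 1])

def recover_one(known: list[Perm], k: int, q: int, answer: Perm) -> Perm:
    """Solve answer = A o pi_k o B for pi_k, with A and B built from known perms."""
    identity = tuple(range(len(answer)))
    a = reduce(compose, known[:k], identity)             # pi_0 o ... o pi_{k-1}
    b = reduce(compose, known[k + 1 : q + 1], identity)  # pi_{k+1} o ... o pi_q
    return compose(compose(inverse(a), answer), inverse(b))

def decode_updates(past, update_idx, query_idx, answers):
    """Recover the new pi_k for each update index, one update at a time.

    Assumes u0 < q0 < u1 < q1 < ... so each query sees exactly one unknown.
    """
    current = list(past)  # start from the known past values
    for k, q in zip(sorted(update_idx), sorted(query_idx)):
        current[k] = recover_one(current, k, q, answers[q])
    return {k: current[k] for k in update_idx}

rng = random.Random(1)
size, m = 5, 8

def rand_perm() -> Perm:
    p = list(range(size))
    rng.shuffle(p)
    return tuple(p)

past = [rand_perm() for _ in range(m)]               # known before v's subtree
update_idx, query_idx = [0, 2, 4, 6], [1, 3, 5, 7]   # interleaved indices
actual = list(past)
for k in update_idx:                                 # left subtree: fresh updates
    actual[k] = rand_perm()
answers = {q: prefix(actual, q) for q in query_idx}  # right subtree: query results
decoded = decode_updates(past, update_idx, query_idx, answers)
print(decoded == {k: actual[k] for k in update_idx})  # True
```

Because the indices interleave, each query answer involves exactly one permutation that has not yet been recovered. The left and right inverse multiplications then isolate that permutation.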

592
00:36:01,490 --> 00:36:05,570
So that's basically saying
these two are the same issue.

593
00:36:05,570 --> 00:36:08,420
If I can encode the verify
sums in the right subtree,

594
00:36:08,420 --> 00:36:11,180
using less than l
root n log n bits,

595
00:36:11,180 --> 00:36:13,330
then I'll get a contradiction,
because it implies

596
00:36:13,330 --> 00:36:15,170
that from that
same encoding, you

597
00:36:15,170 --> 00:36:17,630
can also decode the
update permutations

598
00:36:17,630 --> 00:36:20,459
in the left subtree.

599
00:36:20,459 --> 00:36:21,250
So that's our goal.

600
00:36:24,761 --> 00:36:25,260
OK.

601
00:36:28,180 --> 00:36:33,592
So we'd like to prove
this for verify sum.

602
00:36:33,592 --> 00:36:35,050
But the first thing
I'm going to do

603
00:36:35,050 --> 00:36:39,640
is consider an easier
problem, which is sum.

604
00:36:39,640 --> 00:36:43,740
So suppose, basically, this
was not an input to the query.

605
00:36:43,740 --> 00:36:47,820
Suppose the query was,
what is the sum of i?

606
00:36:47,820 --> 00:36:48,750
Like this.

607
00:36:48,750 --> 00:36:52,360
I just want-- this is
the partial sum problem.

608
00:36:52,360 --> 00:36:54,360
I'm given an index
i, I want to know

609
00:36:54,360 --> 00:36:58,080
what is the permutation
from pi 0 up to pi i.

610
00:36:58,080 --> 00:36:59,720
Now that is not--

611
00:36:59,720 --> 00:37:01,720
that doesn't correspond
to dynamic connectivity,

612
00:37:01,720 --> 00:37:03,010
it's a new problem.

613
00:37:03,010 --> 00:37:05,051
We'll first prove a lower
bound for that problem,

614
00:37:05,051 --> 00:37:07,545
and then we'll put the
verify word back in.

615
00:37:07,545 --> 00:37:14,660
OK, so that's-- we're now
here at sum lower bound.

616
00:37:14,660 --> 00:37:15,620
Where should I go?

617
00:37:18,280 --> 00:37:21,269
Different-- so this is a lower
bound on the operation sum,

618
00:37:21,269 --> 00:37:23,560
as opposed to here, where
we're adding up lower bounds.

619
00:37:23,560 --> 00:37:27,490
Sorry for the
conflation of terms.

620
00:37:27,490 --> 00:37:30,170
Let's go here.

621
00:37:49,320 --> 00:37:50,730
So I'll call this a warm up.

622
00:37:58,690 --> 00:38:02,150
Suppose a query
is sum of i, which

623
00:38:02,150 --> 00:38:08,290
is supposed to give you this
prefix sum of pi j again,

624
00:38:08,290 --> 00:38:09,980
sum means composition.

625
00:38:13,060 --> 00:38:19,560
So this is going to be
relatively easy to prove,

626
00:38:19,560 --> 00:38:22,230
but it's not the problem
we actually want to solve,

627
00:38:22,230 --> 00:38:25,660
we'll use it to then
solve the real problem.

628
00:38:25,660 --> 00:38:28,390
And this is the order in which
we actually solve things.

629
00:38:28,390 --> 00:38:31,200
First, we prove a lower
bound for partial sums.

630
00:38:31,200 --> 00:38:34,500
OK, so let me give
you some notation,

631
00:38:34,500 --> 00:38:38,910
so we can really
get at this claim.

632
00:38:38,910 --> 00:38:42,630
Reading on the right,
writing on the left.

633
00:38:42,630 --> 00:38:44,580
So let r be all
the cells that are

634
00:38:44,580 --> 00:38:51,270
read during the right subtree,
which is an interval of time.

635
00:38:54,150 --> 00:38:59,130
And let w be the cells
written in the left subtree.

636
00:39:12,440 --> 00:39:14,440
OK, so what we're
talking about over here

637
00:39:14,440 --> 00:39:17,645
is that r intersects
w, those are cells that

638
00:39:17,645 --> 00:39:19,270
are read during the
right subtree, that

639
00:39:19,270 --> 00:39:22,420
were at some point written
during the left subtree,

640
00:39:22,420 --> 00:39:23,506
should be large.

641
00:39:23,506 --> 00:39:24,880
So we want to
prove a lower bound

642
00:39:24,880 --> 00:39:26,777
on the size of r intersect w.

643
00:39:30,760 --> 00:39:34,090
So if the lower
bound doesn't hold,

644
00:39:34,090 --> 00:39:37,880
that means that r intersect
w is relatively small.

645
00:39:37,880 --> 00:39:40,550
So imagine a situation where
r intersect w is very small,

646
00:39:40,550 --> 00:39:42,010
there's not very
much information

647
00:39:42,010 --> 00:39:44,380
passed from the left subtree
to the right subtree.

648
00:39:44,380 --> 00:39:46,630
If r intersect w is
small, then presumably I

649
00:39:46,630 --> 00:39:49,732
can afford to write it
down, I can encode it.

650
00:39:49,732 --> 00:39:51,940
So that's what we're going
to do, and we'll compute--

651
00:39:51,940 --> 00:39:55,350
we'll figure out that this is
indeed something we can afford.

652
00:39:55,350 --> 00:40:00,050
I'm going to encode r
intersect w explicitly.

653
00:40:00,050 --> 00:40:05,470
Meaning-- and this is a
set of cells in memory.

654
00:40:05,470 --> 00:40:07,480
So for every cell, I'm
going to write down

655
00:40:07,480 --> 00:40:12,310
what its address is, and what
the contents of the cell are.

656
00:40:12,310 --> 00:40:18,850
So write down the
addresses and the contents

657
00:40:18,850 --> 00:40:20,720
for every such cell.

658
00:40:20,720 --> 00:40:23,830
So how many bits does that take?

659
00:40:23,830 --> 00:40:30,115
I'm going to say that it's r
intersect w times log n bits.

660
00:40:33,250 --> 00:40:35,830
Here's where I need to
mention an assumption.

661
00:40:35,830 --> 00:40:39,920
I'm assuming that the address
space is order log n bits long,

662
00:40:39,920 --> 00:40:42,280
that's like saying that the
space of your data structure

663
00:40:42,280 --> 00:40:44,050
is order--

664
00:40:44,050 --> 00:40:46,160
is polynomial in n.

665
00:40:46,160 --> 00:40:49,000
And if you want any hope of
having a reasonable update

666
00:40:49,000 --> 00:40:51,594
time, you need to have
polynomial space at most.

667
00:40:51,594 --> 00:40:54,010
So assuming polynomial space,
each of those addresses only

668
00:40:54,010 --> 00:40:56,140
takes order log n
bits to write down.

669
00:40:56,140 --> 00:41:00,417
The contents, let's say,
also take order log n bits

670
00:41:00,417 --> 00:41:01,000
to write down.

671
00:41:04,020 --> 00:41:08,880
OK, so fine.

672
00:41:08,880 --> 00:41:12,412
That's-- I mean, yeah.

673
00:41:12,412 --> 00:41:14,370
We don't really need to
make those assumptions,

674
00:41:14,370 --> 00:41:20,140
I don't think, but we will for
here to keep things simple.

675
00:41:20,140 --> 00:41:22,740
So if r intersect w is
small, meaning smaller

676
00:41:22,740 --> 00:41:28,470
than this thing, then this will
be small, smaller than l root

677
00:41:28,470 --> 00:41:29,238
n log n.

678
00:41:32,516 --> 00:41:33,450
OK.

679
00:41:33,450 --> 00:41:38,040
So on the other hand, we know
that every encoding should

680
00:41:38,040 --> 00:41:41,360
take l root n log n bits.

681
00:41:41,360 --> 00:41:44,190
And so this will
be a contradiction,

682
00:41:44,190 --> 00:41:47,670
although we haven't quite
encoded what we need yet,

683
00:41:47,670 --> 00:41:50,250
or we haven't proved
that, but we're getting

684
00:41:50,250 --> 00:41:51,390
to be at the right point.

685
00:41:51,390 --> 00:41:55,890
These log ns are going
to cancel in a moment.

686
00:41:55,890 --> 00:41:59,010
So what we need to do is,
I claim this is actually

687
00:41:59,010 --> 00:42:00,790
enough to encode what we need.

688
00:42:00,790 --> 00:42:10,100
And so all that's left is a
decoding algorithm for the sum

689
00:42:10,100 --> 00:42:12,970
queries in the right subtree.

690
00:42:20,710 --> 00:42:22,960
So how are we going to do that?

691
00:42:22,960 --> 00:42:24,750
So this is my encoding,
these are the bits

692
00:42:24,750 --> 00:42:26,340
that I have written down.

693
00:42:26,340 --> 00:42:29,580
So now what I
know, as a decoder,

694
00:42:29,580 --> 00:42:32,590
is I know everything
about the past.

695
00:42:32,590 --> 00:42:34,390
I don't know what
these updates are,

696
00:42:34,390 --> 00:42:36,600
that's my whole goal, to
figure out what they are.

697
00:42:36,600 --> 00:42:38,730
I don't know what the
results of the queries

698
00:42:38,730 --> 00:42:41,050
are, but magically, I
know that r intersect w.

699
00:42:41,050 --> 00:42:42,030
Well, not magically.

700
00:42:42,030 --> 00:42:45,670
I wrote it down, kept
track on a piece of paper.

701
00:42:45,670 --> 00:42:47,580
So that's what I know.

702
00:42:47,580 --> 00:42:51,030
And so the idea
is, well, somebody

703
00:42:51,030 --> 00:42:53,550
gave us a data structure,
tells you how to do an update,

704
00:42:53,550 --> 00:42:55,090
tells you how to do a query.

705
00:42:55,090 --> 00:42:58,557
Let's run the query
algorithms over here.

706
00:42:58,557 --> 00:43:00,390
Run that query, run
that query, or whatever.

707
00:43:03,270 --> 00:43:05,310
It's a little hard to
run them, because we

708
00:43:05,310 --> 00:43:08,130
don't know what happened
in this intermediate part.

709
00:43:08,130 --> 00:43:12,430
But I claim r intersect w tells
us everything we need to know.

710
00:43:12,430 --> 00:43:23,220
So the decoding algorithm is
just simulate sum queries,

711
00:43:23,220 --> 00:43:24,942
simulate that algorithm.

712
00:43:35,780 --> 00:43:38,380
And let's go up here.

713
00:43:54,160 --> 00:43:55,680
How do we simulate
that algorithm?

714
00:43:55,680 --> 00:43:58,870
Well, the algorithm
makes a series of cell

715
00:43:58,870 --> 00:44:03,580
reads, and maybe writes, but
really we care about the reads.

716
00:44:03,580 --> 00:44:05,565
Writes are pretty
easy to simulate.

717
00:44:21,590 --> 00:44:23,470
There are three cases for reads.

718
00:44:23,470 --> 00:44:25,870
It could be that the thing
you're trying to read

719
00:44:25,870 --> 00:44:27,340
was written in
the right subtree,

720
00:44:27,340 --> 00:44:29,650
it could be that it was
written in the left subtree,

721
00:44:29,650 --> 00:44:31,730
or it could be it was
written in the past,

722
00:44:31,730 --> 00:44:36,177
before we got to v subtree.

723
00:44:36,177 --> 00:44:38,260
Now we don't necessarily
know which case we're in,

724
00:44:38,260 --> 00:44:40,890
but I claim we'll be
able to figure it out.

725
00:44:40,890 --> 00:44:45,789
Because any cells that are
written in the right subtree,

726
00:44:45,789 --> 00:44:47,830
we've just been running
the simulation algorithm,

727
00:44:47,830 --> 00:44:50,140
so every time we do
a write, we can just

728
00:44:50,140 --> 00:44:51,460
store it off to the side.

729
00:44:51,460 --> 00:44:54,250
So when we're doing
simulations, we

730
00:44:54,250 --> 00:44:57,100
don't need that the
simulation takes low space.

731
00:44:57,100 --> 00:44:59,750
We just need that the input--
these decoding algorithms

732
00:44:59,750 --> 00:45:01,291
don't have to be
low space, we just

733
00:45:01,291 --> 00:45:02,830
need that the
encoding was small.

734
00:45:02,830 --> 00:45:05,010
We've already made
the encoding small.

735
00:45:05,010 --> 00:45:06,400
And so the decoding
algorithm can

736
00:45:06,400 --> 00:45:08,290
spend lots of time
and space, we just

737
00:45:08,290 --> 00:45:10,540
need to show that decoding
algorithm can recover

738
00:45:10,540 --> 00:45:11,790
what it's supposed to recover.

739
00:45:11,790 --> 00:45:13,430
It's like a
compression algorithm,

740
00:45:13,430 --> 00:45:15,160
where you need to show there's some
way to decompress, which

741
00:45:15,160 --> 00:45:17,160
could take an arbitrary
amount of time and space.

742
00:45:17,160 --> 00:45:20,020
So when we're simulating
the right subtree,

743
00:45:20,020 --> 00:45:23,725
we simulate not only the sum
queries, but also the updates.

744
00:45:27,370 --> 00:45:30,100
So whatever gets written
during that simulation,

745
00:45:30,100 --> 00:45:33,610
we just store it, and so
it's easy to reread it.

746
00:45:33,610 --> 00:45:36,250
If it was written
in the left subtree,

747
00:45:36,250 --> 00:45:38,245
well, that is r intersect w.

748
00:45:40,990 --> 00:45:42,910
And we've written
down r intersect w.

749
00:45:42,910 --> 00:45:44,470
So we can detect
that this happened,

750
00:45:44,470 --> 00:45:47,000
because we look
at r intersect w,

751
00:45:47,000 --> 00:45:48,970
we see, oh that
word was in there,

752
00:45:48,970 --> 00:45:51,220
that address was
in there, and so

753
00:45:51,220 --> 00:45:55,120
we read the contents
from the encoding.

754
00:45:55,120 --> 00:46:00,420
If it was in the
past, it's also easy.

755
00:46:00,420 --> 00:46:03,100
We already know it.

756
00:46:03,100 --> 00:46:07,690
OK, so basically what we do--

757
00:46:07,690 --> 00:46:09,880
what the simulation algorithm
is doing is it says,

758
00:46:09,880 --> 00:46:12,460
OK, let's assume
that main memory was

759
00:46:12,460 --> 00:46:14,477
whatever it was at this point.

760
00:46:14,477 --> 00:46:17,060
That data structure, I mean we
know everything about the past,

761
00:46:17,060 --> 00:46:18,768
so we know what the
data structure looked

762
00:46:18,768 --> 00:46:20,710
like at this moment, store that.

763
00:46:20,710 --> 00:46:23,530
Update all of the cells
that are in r intersect

764
00:46:23,530 --> 00:46:26,050
w given by our encoding.

765
00:46:26,050 --> 00:46:28,520
And then just run the algorithm.

766
00:46:28,520 --> 00:46:31,480
So we're sort of jumping
into this moment in time

767
00:46:31,480 --> 00:46:34,360
with a slightly
weird data structure.

768
00:46:34,360 --> 00:46:37,244
It's not the correct
data structure.

769
00:46:37,244 --> 00:46:39,160
It's not what the data
structure will actually

770
00:46:39,160 --> 00:46:41,470
look like at this point,
but it's close enough.

771
00:46:41,470 --> 00:46:44,530
Because anything
that's read here,

772
00:46:44,530 --> 00:46:46,300
either was written
here, in which case

773
00:46:46,300 --> 00:46:48,960
it's correct, or was
written here, in which case

774
00:46:48,960 --> 00:46:53,250
it's correct because
r intersect w had it.

775
00:46:53,250 --> 00:46:56,290
Or it wasn't written
here, in which case--

776
00:46:56,290 --> 00:46:59,060
maybe it's always correct.

777
00:46:59,060 --> 00:46:59,560
No, no.

778
00:46:59,560 --> 00:47:01,120
See there could be
some writes that

779
00:47:01,120 --> 00:47:04,397
happened here, where there's no
corresponding read over here.

780
00:47:04,397 --> 00:47:06,730
So the data structure may
have been changed in ways here

781
00:47:06,730 --> 00:47:10,430
that don't matter for this
execution of the right subtree.

782
00:47:10,430 --> 00:47:13,300
So any writes that happened
here to some cell probe,

783
00:47:13,300 --> 00:47:15,496
to some cell, where that
cell is not read over here,

784
00:47:15,496 --> 00:47:16,870
we don't care
about, because they

785
00:47:16,870 --> 00:47:18,820
don't affect the simulation.

786
00:47:18,820 --> 00:47:20,830
So we have a good
enough data structure

787
00:47:20,830 --> 00:47:23,440
here, it may not be
completely accurate,

788
00:47:23,440 --> 00:47:26,170
but it's accurate enough
to run these queries.
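
The decoder's three-way case split for reads can be sketched as a small memory model. The class and the toy probe sequence below are hypothetical. The "past" snapshot stands for the known state at the start of v's subtree. Each encoded r intersect w entry is assumed to hold the cell's contents as of the end of the left subtree. Writes made during the simulation are recorded and take priority.

```python
class DecoderMemory:
    """Hypothetical memory model used while the decoder replays the right subtree."""

    def __init__(self, past: dict[int, int], r_cap_w: dict[int, int]):
        self.past = past                      # known state when v's subtree starts
        self.encoded = r_cap_w                # address -> contents at end of left subtree
        self.own_writes: dict[int, int] = {}  # writes made during this simulation

    def read(self, addr: int) -> int:
        if addr in self.own_writes:  # case 1: written in the right subtree
            return self.own_writes[addr]
        if addr in self.encoded:     # case 2: written on the left, read on the right
            return self.encoded[addr]
        # case 3: last written in the past (left-subtree writes missing from the
        # encoding are never read on the right, so their stale value never matters)
        return self.past[addr]

    def write(self, addr: int, value: int) -> None:
        self.own_writes[addr] = value

def right_subtree_program(read, write) -> list[int]:
    """A made-up probe sequence standing in for the real query/update algorithm."""
    out = [read(1)]
    write(2, 99)
    out.append(read(2))
    out.append(read(0))
    return out

past = {0: 10, 1: 11, 2: 12, 3: 13}
left_writes = {1: 21, 3: 23}  # cell 3 is written on the left but never read on the right

# Ground truth: apply the left subtree's writes, run the right subtree, log its reads.
true_mem = {**past, **left_writes}
reads: set[int] = set()

def true_read(addr: int) -> int:
    reads.add(addr)
    return true_mem[addr]

truth = right_subtree_program(true_read, true_mem.__setitem__)

# Encoding: only the cells both written on the left and read on the right.
r_cap_w = {a: left_writes[a] for a in left_writes.keys() & reads}

# Decoder: past snapshot plus encoding is enough to replay the right subtree.
sim = DecoderMemory(past, r_cap_w)
result = right_subtree_program(sim.read, sim.write)
print(truth, result, result == truth)  # [21, 99, 10] [21, 99, 10] True
```

Cell 3 keeps its stale past value in the decoder's view. The replay is still exact because the right subtree never reads that cell.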

789
00:47:26,170 --> 00:47:28,760
Once we run the queries,
the queries output the sums.

790
00:47:28,760 --> 00:47:31,450
That's what we're assuming in
this warm up, we run the query,

791
00:47:31,450 --> 00:47:33,160
we get the sum.

792
00:47:33,160 --> 00:47:35,302
Once I have that sum,
as I argued before,

793
00:47:35,302 --> 00:47:37,510
once you know what the
results of these queries were,

794
00:47:37,510 --> 00:47:41,680
I can figure out what the
arguments to the updates were,

795
00:47:41,680 --> 00:47:44,810
by doing that inverse
multiplication stuff.

796
00:47:44,810 --> 00:47:47,090
So that's actually it.

797
00:47:47,090 --> 00:47:51,650
What this implies is that this
is a correct encoding, which

798
00:47:51,650 --> 00:47:56,360
means that this order,
r intersect w times log

799
00:47:56,360 --> 00:48:02,690
n bits that we use to encode,
must be at least this big.

800
00:48:02,690 --> 00:48:04,580
Because we know any
encoding is going

801
00:48:04,580 --> 00:48:10,005
to require at least that
many bits, l root n log n.

802
00:48:12,680 --> 00:48:16,280
And so the log ns
cancel, and we're

803
00:48:16,280 --> 00:48:20,060
left with r intersect
w is at least l root n.

804
00:48:20,060 --> 00:48:24,102
And this is exactly the quantity
we cared about for this claim.

805
00:48:24,102 --> 00:48:29,450
So same thing, r intersect
w is at least l root n.
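
The warm-up's counting argument can be written roughly as one chain of inequalities. It assumes, as stated above, that each address and each cell's contents fit in O(log n) bits, and it uses log2(k!) = Theta(k log k) with k = sqrt(n):

```latex
\underbrace{|r \cap w| \cdot O(\log n)}_{\text{addresses and contents}}
\;\ge\; \frac{l}{2}\,\log_2\!\big((\sqrt{n})!\big)
\;=\; \Omega\!\big(l \sqrt{n} \log \sqrt{n}\big)
\;=\; \Omega\!\big(l \sqrt{n} \log n\big)
\quad\Longrightarrow\quad
|r \cap w| = \Omega\!\big(l \sqrt{n}\big).
```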

806
00:48:29,450 --> 00:48:31,090
OK, so warm up done.

807
00:48:31,090 --> 00:48:33,960
Any questions about the warm up?

808
00:48:33,960 --> 00:48:35,870
So in this weird
problem, which does not

809
00:48:35,870 --> 00:48:39,040
correspond to
dynamic connectivity,

810
00:48:39,040 --> 00:48:43,190
because it's this other problem,
prefix sums computation.

811
00:48:43,190 --> 00:48:46,190
We get the intended lower bound,
you need log n per operation.

812
00:48:46,190 --> 00:48:50,710
Or you need root n log
n per block operation.

813
00:48:50,710 --> 00:48:52,670
OK, but this is not
what we really want,

814
00:48:52,670 --> 00:48:55,490
we really want a lower
bound on verify sum.

815
00:48:55,490 --> 00:48:59,060
Where you're given as an
argument the permutation that

816
00:48:59,060 --> 00:49:00,980
we're talking about over here.

817
00:49:00,980 --> 00:49:06,170
So this goal is not the
right goal for verify sum,

818
00:49:06,170 --> 00:49:07,040
in some sense.

819
00:49:07,040 --> 00:49:08,675
Well, sort of the right goal.

820
00:49:08,675 --> 00:49:10,550
It's a little awkward
though, because they're

821
00:49:10,550 --> 00:49:12,980
given as inputs to the queries.

822
00:49:12,980 --> 00:49:17,240
So what is there to encode?

823
00:49:17,240 --> 00:49:21,230
Well, we can still set
it up in a useful way.

824
00:49:21,230 --> 00:49:23,650
Same goal, slightly restated.

825
00:49:42,230 --> 00:49:46,200
So this is the last step
toward the verify-sum lower bound.

826
00:50:06,830 --> 00:50:08,232
So here's the set up.

827
00:51:05,010 --> 00:51:07,190
OK, so slightly
different set up here.

828
00:51:07,190 --> 00:51:10,670
Here I assumed that
we just knew the past.

829
00:51:10,670 --> 00:51:13,250
I also basically assumed
these two things,

830
00:51:13,250 --> 00:51:16,814
that we didn't know what
the update permutations were

831
00:51:16,814 --> 00:51:18,230
in the left subtree,
and we didn't

832
00:51:18,230 --> 00:51:20,090
know what the answers
to the queries

833
00:51:20,090 --> 00:51:21,609
were in the right subtree.

834
00:51:21,609 --> 00:51:23,150
Now I'm going to
assume we don't even

835
00:51:23,150 --> 00:51:25,100
know what we're passing
into the queries,

836
00:51:25,100 --> 00:51:27,860
because that is the information
we're trying to figure out.

837
00:51:27,860 --> 00:51:29,630
These two things are
basically the same,

838
00:51:29,630 --> 00:51:31,381
if you knew all the
update permutations,

839
00:51:31,381 --> 00:51:33,380
you could figure out all
the query permutations.

840
00:51:33,380 --> 00:51:35,060
If you knew all the
query permutations,

841
00:51:35,060 --> 00:51:37,101
you could figure out all
the update permutations.

842
00:51:37,101 --> 00:51:39,590
That's what we argued
over here, it's

843
00:51:39,590 --> 00:51:43,520
enough to figure out
query permutations,

844
00:51:43,520 --> 00:51:46,070
then we could figure out
the update permutations.

845
00:51:46,070 --> 00:51:49,410
It's just a little more
awkward, because now there

846
00:51:49,410 --> 00:51:50,900
are arguments to queries.

847
00:51:50,900 --> 00:51:53,336
And so if we did this
simulation, right?

848
00:51:53,336 --> 00:51:54,710
We'd simulate--
we don't know how

849
00:51:54,710 --> 00:51:57,251
to simulate the query algorithm,
because it's supposed to be,

850
00:51:57,251 --> 00:51:59,300
given the argument,
which is what

851
00:51:59,300 --> 00:52:01,294
we're trying to figure out.

852
00:52:01,294 --> 00:52:06,460
So we can't simulate
the query algorithm.

853
00:52:06,460 --> 00:52:08,440
It's kind of annoying,
but otherwise

854
00:52:08,440 --> 00:52:10,166
the set up is roughly the same.

855
00:52:10,166 --> 00:52:11,790
The one thing we know
is that the query

856
00:52:11,790 --> 00:52:15,130
is supposed to return
yes, because if you

857
00:52:15,130 --> 00:52:17,620
look at this bad
access sequence,

858
00:52:17,620 --> 00:52:20,980
it is designed to
always return yes.

859
00:52:20,980 --> 00:52:24,855
So that is a thing we know,
but we don't know the arguments

860
00:52:24,855 --> 00:52:26,980
to the updates on the left,
we don't know arguments

861
00:52:26,980 --> 00:52:28,240
to the queries on the right.

862
00:52:28,240 --> 00:52:29,781
We'll assume we know
everything else,

863
00:52:29,781 --> 00:52:31,630
basically, up to this time.

864
00:52:31,630 --> 00:52:33,940
Again, this is a
probabilistic statement,

865
00:52:33,940 --> 00:52:38,410
that conditioned on the past,
conditioned on the queries

866
00:52:38,410 --> 00:52:40,210
on the left, which
probably don't matter,

867
00:52:40,210 --> 00:52:43,480
conditioned on the updates on
the right, which do matter,

868
00:52:43,480 --> 00:52:46,900
but they're sort of irrelevant
to this r intersect w issue.

869
00:52:46,900 --> 00:52:48,400
Conditioned on all
those things will

870
00:52:48,400 --> 00:52:51,400
prove that the expected number
of operations you need to--

871
00:52:51,400 --> 00:52:55,510
or expected encoding
size, for this problem,

872
00:52:55,510 --> 00:52:58,750
is at least what it is,
l root n log n bits.

873
00:52:58,750 --> 00:53:01,720
And from that lower
bound, you can then

874
00:53:01,720 --> 00:53:06,640
take the sum over all possible
setups, over all conditions.

875
00:53:06,640 --> 00:53:09,580
And that implies a lower
bound on the overall setting

876
00:53:09,580 --> 00:53:12,088
without these assumptions.

877
00:53:12,088 --> 00:53:13,520
OK?

878
00:53:13,520 --> 00:53:16,630
So all I'm saying
is in this set up,

879
00:53:16,630 --> 00:53:19,236
it still takes a lot of bits
to encode these updates,

880
00:53:19,236 --> 00:53:20,860
because we don't have
the queries which

881
00:53:20,860 --> 00:53:22,587
would tell us the answers.

882
00:53:22,587 --> 00:53:24,670
So we get a lower bound
on encoding these updates,

883
00:53:24,670 --> 00:53:26,290
or a lower bound on
encoding these queries,

884
00:53:26,290 --> 00:53:27,944
because we assume
we don't know them.

885
00:53:27,944 --> 00:53:29,860
The rest of the-- all
the remaining operations

886
00:53:29,860 --> 00:53:34,320
don't tell us enough about this.

887
00:53:34,320 --> 00:53:35,720
OK.

888
00:53:35,720 --> 00:53:37,790
So how the heck are
we going to do--

889
00:53:37,790 --> 00:53:40,730
prove a lower bound in
this setting, when we can't

890
00:53:40,730 --> 00:53:43,624
simulate the query algorithm?

891
00:53:43,624 --> 00:53:45,290
There's one cool idea
to make this work.

892
00:53:50,640 --> 00:53:55,050
You may recall our last
cell probe lower bound

893
00:53:55,050 --> 00:53:56,340
for the predecessor problem.

894
00:54:00,570 --> 00:54:03,150
We used this idea of
round elimination.

895
00:54:03,150 --> 00:54:06,190
The idea with round
elimination was--

896
00:54:06,190 --> 00:54:09,470
Alice was sending a message,
Bob was sending a response.

897
00:54:09,470 --> 00:54:11,300
But that first m--
we set things up,

898
00:54:11,300 --> 00:54:14,240
we set up the problem so the
first message sent by Alice

899
00:54:14,240 --> 00:54:18,350
had, on average, less than
1 bit of information to Bob,

900
00:54:18,350 --> 00:54:20,120
or very little
information to Bob.

901
00:54:20,120 --> 00:54:23,120
And so what Bob could
do is basically guess

902
00:54:23,120 --> 00:54:24,800
what that message was.

903
00:54:24,800 --> 00:54:27,470
And that would be accurate
with some probability.

904
00:54:27,470 --> 00:54:28,910
Now here, we're
not quite allowed

905
00:54:28,910 --> 00:54:30,050
to do that, we're
not allowed to change

906
00:54:30,050 --> 00:54:31,550
the accuracy of our
results, because

907
00:54:31,550 --> 00:54:33,860
of our particular setting.

908
00:54:33,860 --> 00:54:38,600
So we can't afford to just
guess by flipping coins

909
00:54:38,600 --> 00:54:41,510
what we were supposed to know.

910
00:54:41,510 --> 00:54:43,340
What we're supposed
to know here is--

911
00:54:43,340 --> 00:54:45,470
we're trying to simulate
a query operation,

912
00:54:45,470 --> 00:54:46,970
and so we need to
know the argument,

913
00:54:46,970 --> 00:54:48,760
that whole permutation
to the queries.

914
00:54:48,760 --> 00:54:50,950
It's hard to run it
without that permutation.

915
00:54:50,950 --> 00:54:52,880
So instead of guessing
by flipping coins,

916
00:54:52,880 --> 00:54:54,920
we're going to guess in
the dynamic programming

917
00:54:54,920 --> 00:54:58,310
sense, which is we're going
to try all the possibilities.

918
00:54:58,310 --> 00:55:02,990
Run the simulation over
all possible queries,

919
00:55:02,990 --> 00:55:05,231
all possible second
arguments to the query.

920
00:55:05,231 --> 00:55:06,980
We don't know what the
permutation is, so

921
00:55:06,980 --> 00:55:08,480
just try them all.

922
00:55:08,480 --> 00:55:14,180
Cool thing is, only one
argument here should return yes.

923
00:55:14,180 --> 00:55:16,710
That's the one
we're looking for.

924
00:55:16,710 --> 00:55:18,140
So if you try them
all, find which

925
00:55:18,140 --> 00:55:21,844
one says yes, we'll be done.

926
00:55:21,844 --> 00:55:25,670
So this is called
the decoding idea.

927
00:55:31,460 --> 00:55:45,240
Simulate verify sum of
i comma pi, for all pi.

928
00:55:45,240 --> 00:55:48,794
And take the one that returns
yes, that is our permutation.
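
A minimal Python sketch of this brute-force decoding step, assuming a hypothetical `simulate_verify_sum(i, pi)` that stands in for the decoder's simulation of the query algorithm and returns True, False, or None (for a simulation that aborted). Permutations are tuples over `range(k)`; the function names and this representation are illustrative assumptions, not taken from the lecture.

```python
from itertools import permutations
from typing import Callable, Optional, Tuple

Perm = Tuple[int, ...]

# Hypothetical simulator signature: returns True ("yes"), False ("no"),
# or None if the simulated query aborted early.
Simulator = Callable[[int, Perm], Optional[bool]]


def decode_query_argument(i: int, k: int, simulate_verify_sum: Simulator) -> Perm:
    """Run the simulated verify-sum(i, pi) for every permutation pi of range(k)
    and return the unique pi whose simulation answers yes."""
    yes_answers = [
        pi for pi in permutations(range(k))
        if simulate_verify_sum(i, pi) is True
    ]
    if len(yes_answers) != 1:
        raise ValueError(f"expected exactly one yes, got {len(yes_answers)}")
    return yes_answers[0]


if __name__ == "__main__":
    hidden = (2, 0, 1)  # toy stand-in for the true argument of the query
    print(decode_query_argument(3, 3, lambda i, pi: pi == hidden))  # prints (2, 0, 1)
```

Trying all k! candidates is acceptable in an encoding argument: the decoder's running time is irrelevant, only the number of bits in the encoding counts.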

929
00:55:48,794 --> 00:55:51,210
And so if we figure out what
those query permutations are,

930
00:55:51,210 --> 00:55:53,376
then we figure out what the
update permutations are,

931
00:55:53,376 --> 00:55:57,360
and we get our lower
bounds just like before.

932
00:55:57,360 --> 00:55:59,940
OK.

933
00:55:59,940 --> 00:56:02,940
This is easier said than
done, unfortunately.

934
00:56:02,940 --> 00:56:08,446
We'd like to run the
simulation just like here, so

935
00:56:08,446 --> 00:56:09,570
simulate the query algorithm.

936
00:56:09,570 --> 00:56:11,669
We said, OK, it's still
the case, that if you're

937
00:56:11,669 --> 00:56:13,710
reading a cell that's
either in the left subtree,

938
00:56:13,710 --> 00:56:16,800
in the right subtree,
or in the past.

939
00:56:16,800 --> 00:56:19,680
And we said this was
easy, this was known.

940
00:56:19,680 --> 00:56:25,260
And the hard part is this
case, because if we're

941
00:56:25,260 --> 00:56:28,320
running this query,
and it reads something

942
00:56:28,320 --> 00:56:30,090
that was written in
the left subtree,

943
00:56:30,090 --> 00:56:33,600
it may not be in r intersect w.

944
00:56:33,600 --> 00:56:35,894
Why is that?

945
00:56:35,894 --> 00:56:36,810
Little puzzle for you.

946
00:56:39,810 --> 00:56:42,670
So we're running one of
these queries for some pi.

947
00:56:42,670 --> 00:56:45,430
And I claim that when we read
something in the left subtree,

948
00:56:45,430 --> 00:56:49,422
we don't know if it's in r
intersect w, it might not be.

949
00:56:59,754 --> 00:57:01,360
Let's see if we're
on the same page.

950
00:57:01,360 --> 00:57:07,640
So r is the set of cells
read during the right subtree

951
00:57:07,640 --> 00:57:11,740
when executing these operations.

952
00:57:11,740 --> 00:57:12,240
OK?

953
00:57:12,240 --> 00:57:16,180
But what we're doing now is
simulating some executions

954
00:57:16,180 --> 00:57:19,210
that didn't necessarily happen.

955
00:57:19,210 --> 00:57:21,400
We're doing a verify
sum of i comma pi,

956
00:57:21,400 --> 00:57:23,260
but in the bad
access sequence, we

957
00:57:23,260 --> 00:57:24,910
did verify sum of
i comma something

958
00:57:24,910 --> 00:57:28,150
specific, not any pi,
but the correct pi.

959
00:57:28,150 --> 00:57:31,420
So we only ran the
yes verify sums,

960
00:57:31,420 --> 00:57:33,910
and that's what r is
defined with respect to.

961
00:57:33,910 --> 00:57:35,410
r is the set of
things that get read

962
00:57:35,410 --> 00:57:38,050
during these operations,
where the verify sum is always

963
00:57:38,050 --> 00:57:38,590
outputs yes.

964
00:57:38,590 --> 00:57:42,070
If you now run a verify
sum where the answer is no,

965
00:57:42,070 --> 00:57:46,390
it may read stuff that the other
verify sum didn't read maybe.

966
00:57:46,390 --> 00:57:48,820
Shouldn't matter,
but it's awkward,

967
00:57:48,820 --> 00:57:51,890
because now it's not just r
intersect w we need to encode.

968
00:57:51,890 --> 00:57:55,630
We need to encode
some more stuff.

969
00:57:55,630 --> 00:57:57,760
It's basically a
new r prime that

970
00:57:57,760 --> 00:58:01,180
may happen during these
reads, and we just

971
00:58:01,180 --> 00:58:02,680
can't afford to
encode that r prime,

972
00:58:02,680 --> 00:58:04,388
because it's not the
thing we care about.

973
00:58:04,388 --> 00:58:06,880
We care about what happens in
the actual access sequence,

974
00:58:06,880 --> 00:58:10,130
not in this
arbitrary simulation.

975
00:58:10,130 --> 00:58:15,361
So this is the annoying thing.

976
00:58:15,361 --> 00:58:15,860
Trouble.

977
00:58:21,700 --> 00:58:28,780
If you look at an incorrect
query, meaning the wrong pi,

978
00:58:28,780 --> 00:58:33,930
this is like a no
query, the output's no.

979
00:58:33,930 --> 00:58:39,040
Reads some different
set of cells, r

980
00:58:39,040 --> 00:58:42,770
prime, which isn't
the same thing as r.

981
00:58:42,770 --> 00:58:49,300
And so if-- we have
some good news, which

982
00:58:49,300 --> 00:58:52,150
is if we can somehow
detect that this happened,

983
00:58:52,150 --> 00:58:58,270
that we read something that
is in r prime, but not r,

984
00:58:58,270 --> 00:59:00,880
then the answer must be no.

985
00:59:05,950 --> 00:59:11,019
So that's our saving
hope, is that either we're

986
00:59:11,019 --> 00:59:13,060
reading something in r
intersect w, in which case

987
00:59:13,060 --> 00:59:15,460
it's been written down,
we know how to do it.

988
00:59:15,460 --> 00:59:20,230
Or it's not written there, and
if it's not written there,

989
00:59:20,230 --> 00:59:23,950
then it should be, hopefully,
in r prime minus r.

990
00:59:23,950 --> 00:59:27,320
So the answer should be no.

991
00:59:27,320 --> 00:59:29,620
Maybe.

992
00:59:29,620 --> 00:59:31,330
Slight problem,
though, because we

993
00:59:31,330 --> 00:59:34,780
used r intersect w to
detect what case we were in.

994
00:59:34,780 --> 00:59:36,610
If we were in r
intersect w, then

995
00:59:36,610 --> 00:59:40,160
we knew we should read
from those encoded cells.

996
00:59:40,160 --> 00:59:43,810
If we weren't, we were either
in the past or in the right

997
00:59:43,810 --> 00:59:45,850
subtree, these things
were easy to detect,

998
00:59:45,850 --> 00:59:48,850
because they got written
during the simulation.

999
00:59:48,850 --> 00:59:50,580
But we need to
distinguish between--

1000
00:59:50,580 --> 00:59:52,780
did we read something that
was in the left subtree,

1001
00:59:52,780 --> 00:59:56,710
or did we read something
that was known?

1002
00:59:56,710 --> 00:59:59,440
This is a little tricky, because
this gets at exactly the issue.

1003
00:59:59,440 --> 01:00:01,390
Left subtree might
write some stuff that

1004
01:00:01,390 --> 01:00:04,330
didn't get read by verify sum.

1005
01:00:04,330 --> 01:00:07,790
So now you go to read
it, you need to know,

1006
01:00:07,790 --> 01:00:12,610
am I reading something that
was not in r intersect w?

1007
01:00:12,610 --> 01:00:18,130
And therefore-- Yeah.

1008
01:00:18,130 --> 01:00:21,270
Basically the issue
is, is it in w?

1009
01:00:21,270 --> 01:00:23,480
If it's in w, but
not in r intersect w,

1010
01:00:23,480 --> 01:00:26,320
then I know the answer
is no, and I should stop.

1011
01:00:26,320 --> 01:00:30,450
If it's not in w though, that
means it was in the known past,

1012
01:00:30,450 --> 01:00:32,500
and then I should continue.

1013
01:00:32,500 --> 01:00:35,320
How do I know if I
should stop or continue?

1014
01:00:35,320 --> 01:00:41,030
So this is the tricky part.

1015
01:00:41,030 --> 01:00:49,730
We can't tell whether--
here's some weird notation.

1016
01:00:49,730 --> 01:00:58,240
We want to know whether the cell is
in w minus r, or in the past minus r

1017
01:00:58,240 --> 01:01:00,400
intersect w.

1018
01:01:00,400 --> 01:01:02,994
OK, we can tell whether
it's in r intersect w,

1019
01:01:02,994 --> 01:01:03,910
if it is, we're happy.

1020
01:01:03,910 --> 01:01:06,100
If it's not in r
intersect w, it could

1021
01:01:06,100 --> 01:01:08,490
be that's because it was
just in some past thing we

1022
01:01:08,490 --> 01:01:12,340
were reading, that didn't
get read otherwise.

1023
01:01:12,340 --> 01:01:14,590
Or it could be we're
reading something

1024
01:01:14,590 --> 01:01:16,720
that was written in
the left subtree,

1025
01:01:16,720 --> 01:01:18,770
but not read in
the right subtree.

1026
01:01:18,770 --> 01:01:21,850
So in this case,
we want to abort.

1027
01:01:21,850 --> 01:01:25,735
And in this case, it's known,
and so we just continue.

1028
01:01:30,322 --> 01:01:32,280
So that's what the
simulation would like to do,

1029
01:01:32,280 --> 01:01:35,292
if we could distinguish
between these two cases.

1030
01:01:35,292 --> 01:01:37,500
But right now, we can't
distinguish between these two

1031
01:01:37,500 --> 01:01:40,600
cases, because we don't
have enough information.

1032
01:01:40,600 --> 01:01:44,240
So we're going to make our
encoding a little bit bigger.

1033
01:01:44,240 --> 01:01:45,240
What we're going to do--

1034
01:01:48,888 --> 01:02:06,030
this is here-- is to
encode a separator

1035
01:02:06,030 --> 01:02:15,736
for r minus w and w minus r.

1036
01:02:15,736 --> 01:02:21,130
So let's-- over here.

1037
01:02:36,600 --> 01:02:38,280
What does this mean?

1038
01:02:38,280 --> 01:02:45,030
The separator, I'm going to call it
S. So I want this picture,

1039
01:02:45,030 --> 01:02:52,420
r minus w sits inside
S. And w minus r

1040
01:02:52,420 --> 01:02:59,327
sits outside S. This is
my universe of cells.

1041
01:02:59,327 --> 01:03:01,660
These are the things that are
read in the right subtree,

1042
01:03:01,660 --> 01:03:03,742
but not written in
the left subtree.

1043
01:03:03,742 --> 01:03:05,200
Those are the things
I care about--

1044
01:03:07,870 --> 01:03:10,390
well, not quite this,
the other ones.

1045
01:03:10,390 --> 01:03:13,210
So things that are read in
the right subtree and that

1046
01:03:13,210 --> 01:03:16,840
are not written in the left,
this is the past essentially,

1047
01:03:16,840 --> 01:03:18,820
that's useful over there.

1048
01:03:18,820 --> 01:03:22,190
Over here, I have
w minus r, these

1049
01:03:22,190 --> 01:03:24,190
are things that are written
in the left subtree,

1050
01:03:24,190 --> 01:03:25,690
but not read in
the right subtree.

1051
01:03:25,690 --> 01:03:27,356
These are the things
that I worry about,

1052
01:03:27,356 --> 01:03:29,830
because those ones I need to
detect that that was changed,

1053
01:03:29,830 --> 01:03:35,510
and say whoops, you must
have an answer of no.

1054
01:03:35,510 --> 01:03:36,010
OK?

1055
01:03:36,010 --> 01:03:38,557
So I can't afford to
store these sets exactly,

1056
01:03:38,557 --> 01:03:40,390
so I'm going to approximate
them, by saying,

1057
01:03:40,390 --> 01:03:43,840
well, let's store the
separator out here.

1058
01:03:43,840 --> 01:03:49,000
And if you're in S, then you're
definitely not in w minus r.

1059
01:03:49,000 --> 01:03:53,470
If you're definitely not in
w minus r, then you can run--

1060
01:03:53,470 --> 01:03:56,290
you can treat it
as if it was known.

1061
01:03:56,290 --> 01:04:00,080
OK, so if you're in
S, this would be--

1062
01:04:00,080 --> 01:04:03,240
why don't I write it here.

1063
01:04:03,240 --> 01:04:05,200
For the decoding
algorithm, if you

1064
01:04:05,200 --> 01:04:14,730
want to read a cell
that is written,

1065
01:04:14,730 --> 01:04:24,160
or last written in the
right subtree, in the past,

1066
01:04:24,160 --> 01:04:27,870
these are the two easy cases.

1067
01:04:27,870 --> 01:04:30,350
Sorry-- I don't want to
write what's in the past,

1068
01:04:30,350 --> 01:04:33,080
because the whole point is to
figure out what's in the past.

1069
01:04:33,080 --> 01:04:35,360
The other easy case is
if it's in r intersect w,

1070
01:04:35,360 --> 01:04:37,332
then it's written down for us.

1071
01:04:37,332 --> 01:04:38,365
So this is encoded.

1072
01:04:40,930 --> 01:04:45,280
This is easy, because during the
simulation we did those writes,

1073
01:04:45,280 --> 01:04:46,870
and so we know what they were.

1074
01:04:46,870 --> 01:04:50,020
r intersect w, we've written
down, so it's easy to know.

1075
01:04:50,020 --> 01:04:52,600
Then the other cases
are either you're in S,

1076
01:04:52,600 --> 01:05:00,790
or you're not in S. OK.

1077
01:05:00,790 --> 01:05:05,800
I claim if you're in S,
you must be in the past,

1078
01:05:05,800 --> 01:05:09,340
that cell must have been
written in the past,

1079
01:05:09,340 --> 01:05:13,120
and so you know
what the value was.

1080
01:05:13,120 --> 01:05:15,550
And so you can continue
running the simulation,

1081
01:05:15,550 --> 01:05:17,474
just like in this situation.

1082
01:05:19,830 --> 01:05:22,330
The other situation is you're
not in S, then you don't know,

1083
01:05:22,330 --> 01:05:24,820
it could have been written
or might not have been.

1084
01:05:24,820 --> 01:05:30,180
But what you know is that
you're definitely not in r.

1085
01:05:30,180 --> 01:05:32,100
Because if you're
not in r minus w,

1086
01:05:32,100 --> 01:05:36,050
and you're not in r intersect
w, then you're not in r.

1087
01:05:36,050 --> 01:05:41,060
If you're not in r, then
we're in this situation.

1088
01:05:41,060 --> 01:05:42,810
If you read something
not in r, that means

1089
01:05:42,810 --> 01:05:45,150
you're running the wrong query.

1090
01:05:45,150 --> 01:05:47,850
Because the correct
query does r--

1091
01:05:47,850 --> 01:05:49,590
only reads from r.

1092
01:05:49,590 --> 01:05:56,970
So if you're not in S,
you must not be in r.

1093
01:05:56,970 --> 01:06:01,560
And so in this case, you
know you can abort and try

1094
01:06:01,560 --> 01:06:02,660
the next pi.
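
The decoder's rule for each cell it reads, as laid out above, can be sketched in Python. The dictionaries and the `in_separator` callback are hypothetical stand-ins for what the decoder knows: writes made during its own simulation of the right subtree, the encoded contents of r intersect w, the encoded separator S, and the known past. Treating never-written cells as 0 is an extra assumption of this sketch.

```python
from typing import Callable, Dict


class Abort(Exception):
    """The simulated query read a cell outside r, so this candidate pi is wrong."""


def read_cell(
    addr: int,
    written_in_simulation: Dict[int, int],  # cells written while simulating the right subtree
    encoded_r_cap_w: Dict[int, int],        # r ∩ w: addresses and contents from the encoding
    in_separator: Callable[[int], bool],    # membership test for the encoded separator S
    past_memory: Dict[int, int],            # known memory contents before the left subtree
) -> int:
    # Easy case: last written during the simulation itself.
    if addr in written_in_simulation:
        return written_in_simulation[addr]
    # Easy case: in r ∩ w, so the encoding stores its contents.
    if addr in encoded_r_cap_w:
        return encoded_r_cap_w[addr]
    # In S: w - r lies outside S and r ∩ w was handled above, so the cell was
    # not written in the left subtree; its value is the known past value.
    if in_separator(addr):
        return past_memory.get(addr, 0)  # never-written cells read as 0 (sketch assumption)
    # Not in S: not in r - w, and not in r ∩ w, hence not in r at all.
    # The correct query reads only cells in r, so abort and try the next pi.
    raise Abort(addr)
```

A full decoder would route every cell probe of the simulated verify-sum(i, pi) through this function and treat `Abort` the same as a "no" answer.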

1095
01:06:02,660 --> 01:06:04,890
So we're going to do
this for all pi, run

1096
01:06:04,890 --> 01:06:09,194
the simulation according to
this way of reading cells.

1097
01:06:09,194 --> 01:06:11,610
At the end, the queries are
either going to say yes or no,

1098
01:06:11,610 --> 01:06:16,950
or it may abort early.

1099
01:06:16,950 --> 01:06:18,840
So if it says no
or it aborts early,

1100
01:06:18,840 --> 01:06:21,360
then we know that
was not the right pi.

1101
01:06:21,360 --> 01:06:24,960
Only one of them can say yes,
that tells us what the pi is,

1102
01:06:24,960 --> 01:06:27,810
that tells us what
the queries were.

1103
01:06:27,810 --> 01:06:30,240
Once we know what the queries
were in the right subtree,

1104
01:06:30,240 --> 01:06:33,012
we can use the same
multiplying by inverses trick,

1105
01:06:33,012 --> 01:06:35,220
figure out what the updates
were in the left subtree.
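
The "multiplying by inverses" step can be illustrated with a small Python sketch. It assumes that a verify-sum at position i checks the prefix composition P_i = P_{i-1} ∘ pi_i (with P_0 the identity), and stores a permutation as a tuple where p[x] is the image of x; the composition order and the helper names are assumptions for illustration. Given consecutive prefixes, each individual permutation is recovered as pi_i = P_{i-1}^{-1} ∘ P_i.

```python
from typing import List, Sequence, Tuple

Perm = Tuple[int, ...]  # p[x] is the image of x


def compose(p: Perm, q: Perm) -> Perm:
    """Return p ∘ q: apply q first, then p."""
    return tuple(p[q[x]] for x in range(len(q)))


def inverse(p: Perm) -> Perm:
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def prefix_compositions(pis: Sequence[Perm]) -> List[Perm]:
    """P_i = P_{i-1} ∘ pi_i with P_0 = identity (assumed convention)."""
    acc: Perm = tuple(range(len(pis[0])))
    out: List[Perm] = []
    for pi in pis:
        acc = compose(acc, pi)
        out.append(acc)
    return out


def recover_updates(prefixes: Sequence[Perm]) -> List[Perm]:
    """Invert prefix_compositions: pi_i = P_{i-1}^{-1} ∘ P_i."""
    prev: Perm = tuple(range(len(prefixes[0])))
    out: List[Perm] = []
    for p in prefixes:
        out.append(compose(inverse(prev), p))
        prev = p
    return out


if __name__ == "__main__":
    pis = [(1, 2, 0), (0, 2, 1), (2, 1, 0)]
    print(recover_updates(prefix_compositions(pis)) == pis)  # prints True
```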

1106
01:06:35,220 --> 01:06:41,250
But those permutations
require l root n log n bits.

1107
01:06:41,250 --> 01:06:43,770
Which used to be on this
board, it's been erased now.

1108
01:06:43,770 --> 01:06:46,470
That's what we use
for this argument.

1109
01:06:46,470 --> 01:06:48,270
And so what we get
is overall, encoding

1110
01:06:48,270 --> 01:06:52,822
must use l root n log n bits.

1111
01:06:52,822 --> 01:06:55,740
OK, but our encoding's
a little bit bigger now.

1112
01:06:55,740 --> 01:06:59,190
The big issue is how do
we store the separator?

1113
01:06:59,190 --> 01:07:01,860
We need to store this
separator with very few bits,

1114
01:07:01,860 --> 01:07:06,160
otherwise we haven't
really proved anything.

1115
01:07:06,160 --> 01:07:07,800
We want encoding to be small.

1116
01:07:11,480 --> 01:07:19,320
So we get that the
encoding must use

1117
01:07:19,320 --> 01:07:28,170
omega l root n log n bits
in expectation, because this

1118
01:07:28,170 --> 01:07:30,630
is a valid decoding
algorithm, it will figure out

1119
01:07:30,630 --> 01:07:32,422
what the permutations were.

1120
01:07:32,422 --> 01:07:34,130
And they require at
least this many bits,

1121
01:07:34,130 --> 01:07:38,160
so encoding must use this
many bits in expectation.

1122
01:07:38,160 --> 01:07:40,740
Now the question is how many
bits does the encoding use?

1123
01:07:40,740 --> 01:07:42,360
Then we'll get either
a contradiction

1124
01:07:42,360 --> 01:07:44,550
or we'll prove the claim.

1125
01:07:47,142 --> 01:07:48,960
So let's go over here.

1126
01:08:23,830 --> 01:08:27,330
So here's a fun fact
about separators.

1127
01:08:27,330 --> 01:08:30,779
I'm not going to prove
it fully, but I'm

1128
01:08:30,779 --> 01:08:33,792
going to rely on
some hashing ability.

1129
01:08:36,630 --> 01:08:38,790
So given some universe
U, in this case

1130
01:08:38,790 --> 01:08:42,270
it's going to be the cells
in our data structure.

1131
01:08:42,270 --> 01:08:44,715
But speaking a little bit more
generally of the universe U,

1132
01:08:44,715 --> 01:08:47,800
I have some number m,
which is our set size.

1133
01:08:47,800 --> 01:08:53,580
And what we're interested in
is in defining our separator

1134
01:08:53,580 --> 01:08:54,960
family.

1135
01:08:54,960 --> 01:08:58,930
Kind of like a family of hash
functions, closely related,

1136
01:08:58,930 --> 01:08:59,790
in fact.

1137
01:08:59,790 --> 01:09:07,320
Call it S. And it's going
to work for size m sets.

1138
01:09:13,510 --> 01:09:20,229
And so S is a separator family
if, for any two sets, A and B,

1139
01:09:20,229 --> 01:09:32,259
in the universe, of size
at most m, and disjoint.

1140
01:09:32,259 --> 01:09:35,529
So A intersect B
is the empty set.

1141
01:09:35,529 --> 01:09:38,380
So of course what we're thinking
about here is r minus w,

1142
01:09:38,380 --> 01:09:39,279
and w minus r.

1143
01:09:39,279 --> 01:09:41,380
These are two subsets
of the universe.

1144
01:09:41,380 --> 01:09:44,510
Hopefully they're not too big,
because if this one is huge,

1145
01:09:44,510 --> 01:09:46,240
that means you read
a huge amount of data

1146
01:09:46,240 --> 01:09:47,080
in the right subtree.

1147
01:09:47,080 --> 01:09:49,621
If this one is huge, it meant
you wrote a huge amount of data

1148
01:09:49,621 --> 01:09:50,960
in the left subtree.

1149
01:09:50,960 --> 01:09:53,770
And then we get lower
bounds in an easier way.

1150
01:09:53,770 --> 01:09:56,550
Or they're not so big, let's
say they're size at most m.

1151
01:09:56,550 --> 01:09:59,470
They're disjoint for sure,
by definition, r minus w's

1152
01:09:59,470 --> 01:10:01,110
disjoint from w minus r.

1153
01:10:01,110 --> 01:10:02,983
It removes the intersection.

1154
01:10:06,160 --> 01:10:08,760
So that's our set up.

1155
01:10:08,760 --> 01:10:10,390
Then, what we
want, is that there

1156
01:10:10,390 --> 01:10:15,580
is some set C in the
separator family,

1157
01:10:15,580 --> 01:10:29,590
such that A is contained in
C, and B is outside of C.

1158
01:10:29,590 --> 01:10:33,580
So B is in the universe minus C.
So this is exactly our picture

1159
01:10:33,580 --> 01:10:38,890
from before, we have
A. C contains A,

1160
01:10:38,890 --> 01:10:41,230
and we have B over on the right.

1161
01:10:41,230 --> 01:10:45,840
And this is the whole universe
U, and so B is outside of C,

1162
01:10:45,840 --> 01:10:51,832
A is entirely inside C. OK.

1163
01:10:51,832 --> 01:10:54,040
This is what we want to
exist, because if a separator

1164
01:10:54,040 --> 01:10:58,090
family exists, then we know
whatever our r minus w, and w

1165
01:10:58,090 --> 01:11:00,630
minus r sets were, as long
as they're not too big,

1166
01:11:00,630 --> 01:11:02,320
they're definitely
disjoint, we can

1167
01:11:02,320 --> 01:11:05,170
find one of these separators
that encodes what we need

1168
01:11:05,170 --> 01:11:10,930
to encode, which is the set C.
Which is called S over there.

1169
01:11:10,930 --> 01:11:11,740
Cool.

1170
01:11:11,740 --> 01:11:12,640
How do we encode it?

1171
01:11:12,640 --> 01:11:14,240
Well, if the number--

1172
01:11:14,240 --> 01:11:20,260
if the size of the separator
family is something,

1173
01:11:20,260 --> 01:11:24,640
then we need log of that bits
to write down the separator set.

1174
01:11:24,640 --> 01:11:28,490
So as long as this is
small, we're happy.
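
The separator-family definition and its encoding cost, written out in LaTeX. Symbols follow the lecture (universe U, size bound m, family S, chosen set C); in the application, A is r minus w and B is w minus r.

```latex
\mathcal{S} \subseteq 2^{U} \text{ is a separator family for size-}m\text{ sets iff }
\forall A, B \subseteq U \text{ with } |A|, |B| \le m,\; A \cap B = \emptyset :\;
\exists C \in \mathcal{S} \text{ with } A \subseteq C \text{ and } B \subseteq U \setminus C.

\text{Writing down which } C \in \mathcal{S} \text{ was chosen costs } \lceil \log_2 |\mathcal{S}| \rceil \text{ bits.}
```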

1175
01:11:28,490 --> 01:11:50,150
So let me tell you what there is
to know about separators.

1176
01:11:50,150 --> 01:12:02,450
There exists a separator family
S, with size of S at most 2

1177
01:12:02,450 --> 01:12:09,497
to the order m
plus log log U. Now

1178
01:12:09,497 --> 01:12:11,330
this is getting into
an area that we haven't

1179
01:12:11,330 --> 01:12:13,740
spent a lot of time on, but--

1180
01:12:13,740 --> 01:12:15,170
so I'm going to
give you a sketch

1181
01:12:15,170 --> 01:12:16,211
of a proof of this claim.

1182
01:12:19,520 --> 01:12:22,910
Relying on perfect
hash functions.

1183
01:12:22,910 --> 01:12:27,170
So the idea is the following,
we want to know, basically,

1184
01:12:27,170 --> 01:12:30,070
which elements are in A,
which elements are in B.

1185
01:12:30,070 --> 01:12:32,270
But it's kind of annoying
to do that, we can't store

1186
01:12:32,270 --> 01:12:34,290
that for all universe elements.

1187
01:12:34,290 --> 01:12:37,235
So if we could just find a
nice perfect hash function that

1188
01:12:37,235 --> 01:12:39,110
maps the elements of A
and the elements of

1189
01:12:39,110 --> 01:12:40,926
B to different slots
in some hash table,

1190
01:12:40,926 --> 01:12:43,550
then for every slot in the hash
table we could say, is it in A,

1191
01:12:43,550 --> 01:12:45,050
or is it in B?

1192
01:12:45,050 --> 01:12:49,580
Now if you are not in A union
B and you hash somewhere,

1193
01:12:49,580 --> 01:12:52,260
you'll get some bit, who
knows what that stores.

1194
01:12:52,260 --> 01:12:53,420
I don't care.

1195
01:12:53,420 --> 01:12:55,670
For the things
outside of A union B,

1196
01:12:55,670 --> 01:12:58,580
they could be in C or not
in C, I don't really care.

1197
01:12:58,580 --> 01:13:01,460
And so all I care about
is if A and B have

1198
01:13:01,460 --> 01:13:04,170
no collisions
between each other,

1199
01:13:04,170 --> 01:13:06,650
I don't want any A thing
to hash to a B thing.

1200
01:13:06,650 --> 01:13:09,840
Then I can store a bit in
every cell in the hash table,

1201
01:13:09,840 --> 01:13:13,749
and that will tell me, in
particular, A versus B.

1202
01:13:13,749 --> 01:13:16,040
And then the rest of the
items are somehow categorized,

1203
01:13:16,040 --> 01:13:18,350
but I don't care how
they're categorized.

1204
01:13:18,350 --> 01:13:21,710
So we're going to use
this fact that there

1205
01:13:21,710 --> 01:13:28,340
is a set of perfect hash
functions of the same size.

1206
01:13:34,600 --> 01:13:45,680
Sorry, that should be H. This
is what's really true, size of H

1207
01:13:45,680 --> 01:13:51,094
is 2 to the order m
plus log log U. OK,

1208
01:13:51,094 --> 01:13:52,760
I'm not going to prove
this, but this is

1209
01:13:52,760 --> 01:13:54,250
about succinct hash functions.

1210
01:13:54,250 --> 01:13:56,450
It may be hard to find
such a hash family,

1211
01:13:56,450 --> 01:13:58,655
but the claim is
that they exist.

1212
01:13:58,655 --> 01:14:01,030
Or it's hard to find the hash
function of the family that

1213
01:14:01,030 --> 01:14:03,200
has no collisions,
but the guarantee

1214
01:14:03,200 --> 01:14:07,580
is, as long as you have,
in total, 2m items,

1215
01:14:07,580 --> 01:14:10,850
out of your universe
of size U, you

1216
01:14:10,850 --> 01:14:13,160
can get a collision-free
hash function,

1217
01:14:13,160 --> 01:14:17,450
from a family of size
2 to the order m plus log log U.

1218
01:14:17,450 --> 01:14:18,080
OK.

1219
01:14:18,080 --> 01:14:20,630
So this is going to--

1220
01:14:20,630 --> 01:14:21,130
Yeah.

1221
01:14:23,840 --> 01:14:29,900
Maps, say A union B, to
an order m sized table.

1222
01:14:34,040 --> 01:14:36,920
And here, there
are no collisions.

1223
01:14:41,570 --> 01:14:45,680
So then what we also
store is an A or B

1224
01:14:45,680 --> 01:14:51,130
bit for each table entry.

1225
01:15:02,690 --> 01:15:03,970
So that's our encoding.
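
A Python sketch of the hash-plus-one-bit-per-slot separator. Note the swapped-in technique: the lecture relies on a succinct perfect hash family into O(m) slots with size 2 to the order m plus log log U, which is not implemented here. This sketch instead uses a plain Carter–Wegman universal hash ((a*x + b) mod P) mod t into about 2|A||B| slots, which makes finding a good hash easy but costs O(m^2) table bits plus O(log U) bits for (a, b). It illustrates only the separating mechanism, not the size bound. The prime P and all names are assumptions of the sketch.

```python
import random
from typing import List, Set

# Mersenne prime 2**61 - 1; universe elements are assumed to lie in [0, P).
P = (1 << 61) - 1


class HashSeparator:
    """C = {x : bits[h(x)] == 1} for one hash h(x) = ((a*x + b) mod P) mod t."""

    def __init__(self, a: int, b: int, t: int, bits: List[int]) -> None:
        self.a, self.b, self.t, self.bits = a, b, t, bits

    def slot(self, x: int) -> int:
        return ((self.a * x + self.b) % P) % self.t

    def __contains__(self, x: int) -> bool:
        # Elements outside A ∪ B land in arbitrary slots; their side does not matter.
        return self.bits[self.slot(x)] == 1


def build_separator(A: Set[int], B: Set[int], seed: int = 0,
                    max_tries: int = 64) -> HashSeparator:
    """Find C with A ⊆ C and B ∩ C = ∅ by searching for a hash with no A-B collisions."""
    if A & B:
        raise ValueError("A and B must be disjoint")
    # With t > 2|A||B|, a universal hash has < 1/2 expected A-B collisions,
    # so each random try succeeds with probability > 1/2.
    t = 2 * len(A) * len(B) + 1
    rng = random.Random(seed)
    for _ in range(max_tries):
        cand = HashSeparator(rng.randrange(1, P), rng.randrange(P), t, [0] * t)
        a_slots = {cand.slot(x) for x in A}
        if all(cand.slot(y) not in a_slots for y in B):  # only A-vs-B collisions matter
            for s in a_slots:
                cand.bits[s] = 1  # the one stored bit per slot: 1 means "A side"
            return cand
    raise RuntimeError("no A/B-collision-free hash found; raise max_tries")


if __name__ == "__main__":
    A, B = {3, 17, 42, 1000}, {5, 6, 99, 12345}
    C = build_separator(A, B, seed=1)
    print(all(x in C for x in A), any(y in C for y in B))  # prints True False
```

As in the lecture, only collisions between an A element and a B element matter; collisions within A or within B are harmless because those elements want the same bit.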

1226
01:15:03,970 --> 01:15:07,810
We store a perfect
hash function, that's

1227
01:15:07,810 --> 01:15:17,830
going to cost log H bits
for this part, and log of H

1228
01:15:17,830 --> 01:15:20,980
is just order m plus log
log U. And then

1229
01:15:20,980 --> 01:15:25,000
we're going to store this A or
B bit for every table entry.

1230
01:15:25,000 --> 01:15:27,070
Number of table
entries is order m,

1231
01:15:27,070 --> 01:15:33,430
so this is going to take
2 to the order m bits.

1232
01:15:33,430 --> 01:15:37,029
Or sorry, not-- sorry, in terms
of bits, it's order m bits,

1233
01:15:37,029 --> 01:15:37,570
I should say.

1234
01:15:40,870 --> 01:15:45,100
In terms of functions,
it's 2 to the order m

1235
01:15:45,100 --> 01:15:48,520
possible choices
for this bit vector.

1236
01:15:48,520 --> 01:15:52,360
And so the easy way is to
just sum up these bits,

1237
01:15:52,360 --> 01:15:54,910
you use log of H bits
plus order m bits.

1238
01:15:54,910 --> 01:15:58,000
This already had an order m
term, and so you get this.

1239
01:15:58,000 --> 01:16:03,880
The log of S is order
m plus log log U.
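
The bit count from the proof sketch, summed explicitly: each separator is determined by one hash function from H plus one A-or-B bit per slot of an O(m)-slot table, so the family has at most |H| times 2 to the O(m) members.

```latex
\log_2 |\mathcal{S}|
  \;\le\; \log_2 |\mathcal{H}| + O(m)
  \;=\; O(m + \log\log U) + O(m)
  \;=\; O(m + \log\log U).
```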

1240
01:16:03,880 --> 01:16:06,610
So that's the end of
the proof sketch of the claim.

1241
01:16:06,610 --> 01:16:09,010
If you believe perfect hash
functions can be written down

1242
01:16:09,010 --> 01:16:12,100
in a small way, then we're done.

1243
01:16:12,100 --> 01:16:16,060
Now, armed with separators,
now let's apply this separator

1244
01:16:16,060 --> 01:16:21,280
theorem claim to this setting.

1245
01:16:21,280 --> 01:16:23,530
So now we can compute
the size of our encoding,

1246
01:16:23,530 --> 01:16:26,290
our encoding involved
writing down r intersect w.

1247
01:16:26,290 --> 01:16:30,310
That takes r intersect w
times log n, just like before.

1248
01:16:30,310 --> 01:16:35,050
It also involves writing
down the separator.

1249
01:16:35,050 --> 01:16:40,960
Separator takes order
m bits, m is r plus w.

1250
01:16:40,960 --> 01:16:43,657
Things that are--
it's order r plus w.

1251
01:16:43,657 --> 01:16:44,740
These are all the things--

1252
01:16:44,740 --> 01:16:46,960
I'm trying to write
down r minus w and w

1253
01:16:46,960 --> 01:16:50,090
minus r, so if you add
up those sizes, it's basically r

1254
01:16:50,090 --> 01:16:51,130
plus w.

1255
01:16:51,130 --> 01:16:57,010
Plus log log U. U
is some small thing,

1256
01:16:57,010 --> 01:17:00,066
size of memory, number
of cells in memory.

1257
01:17:00,066 --> 01:17:01,690
We're assuming that's
polynomial, so you

1258
01:17:01,690 --> 01:17:08,270
take log log of a polynomial,
that's like log log n.

1259
01:17:08,270 --> 01:17:10,790
So let's finish this off.

1260
01:17:16,250 --> 01:17:25,790
So before this was our
equation, r intersect w times

1261
01:17:25,790 --> 01:17:27,920
log n, that was the
size of our encoding.

1262
01:17:27,920 --> 01:17:29,050
We still have that term.

1263
01:17:34,110 --> 01:17:40,560
Sorry, r intersect w,
size of that, times log n.

1264
01:17:40,560 --> 01:17:41,860
So we still do that.

1265
01:17:46,600 --> 01:17:50,950
Now we also pay,
for this separator,

1266
01:17:50,950 --> 01:17:57,130
we're going to pay r plus
w, that's the m part.

1267
01:17:57,130 --> 01:18:00,790
Plus log log n.

1268
01:18:00,790 --> 01:18:04,180
This is the number of
bits in our encoding.

1269
01:18:04,180 --> 01:18:07,750
And I claim, or what
we've proved over here,

1270
01:18:07,750 --> 01:18:11,560
is that any encoding must
use l root n log n bits.

1271
01:18:14,340 --> 01:18:17,040
So this thing must be
at least this thing.
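Written out, the inequality being claimed looks roughly like the sketch below. It uses our own notation and suppresses constants. R is the set of cells read in v's right subtree, and W is the set of cells written in its left subtree. The right-hand side is the l root n log n bit lower bound on any encoding, proved earlier in the lecture.

```latex
\underbrace{|R \cap W| \cdot \lg n}_{\text{cell addresses}}
\;+\; \underbrace{O(|R| + |W|)}_{\text{separator}}
\;+\; O(\lg\lg n)
\;\ge\; \Omega\!\left(l\sqrt{n}\,\lg n\right)
```

Since the lg lg n term cannot dominate, one of the first two terms must be Omega(l root n log n). That gives either |R intersect W| = Omega(l root n) (case one) or |R| + |W| = Omega(l root n log n) (case two).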

1272
01:18:17,040 --> 01:18:19,190
So we have a little
bit more work to do.

1273
01:18:19,190 --> 01:18:20,870
There are now two cases.

1274
01:18:20,870 --> 01:18:22,600
It depends-- there's basically--

1275
01:18:22,600 --> 01:18:25,760
And log log n is
unlikely to dominate.

1276
01:18:25,760 --> 01:18:29,150
We're doing a block
operation on root n things,

1277
01:18:29,150 --> 01:18:32,480
we probably need to use at
least log log n steps.

1278
01:18:32,480 --> 01:18:34,100
So it's not really relevant.

1279
01:18:34,100 --> 01:18:38,930
What will dominate is either
this term, as it used to,

1280
01:18:38,930 --> 01:18:40,500
or this term.

1281
01:18:40,500 --> 01:18:44,450
These are two different cases,
call them case one, case two.

1282
01:18:44,450 --> 01:18:50,930
In case two, r plus w is
at least l root n log n.

1283
01:18:50,930 --> 01:18:52,810
That's the lower bound we want.

1284
01:18:52,810 --> 01:19:02,610
If we can-- in case two, r
plus w is omega l root n log n.

1285
01:19:02,610 --> 01:19:05,030
What that means is in this
subtree, the amount of reading

1286
01:19:05,030 --> 01:19:06,920
we did in the right
subtree, plus the amount

1287
01:19:06,920 --> 01:19:08,780
of writing we did
in the left subtree,

1288
01:19:08,780 --> 01:19:11,120
is at least l root n log n.

1289
01:19:11,120 --> 01:19:13,430
That's our goal over here.

1290
01:19:13,430 --> 01:19:15,300
We want to prove--

1291
01:19:15,300 --> 01:19:18,650
sorry, it's a previous
claim that's by now erased.

1292
01:19:18,650 --> 01:19:20,060
It's the easier
claim. We just want

1293
01:19:20,060 --> 01:19:23,900
to show that the total amount
of time spent in v's subtree

1294
01:19:23,900 --> 01:19:25,550
is at least log n per operation.

1295
01:19:25,550 --> 01:19:27,410
We're doing l root
n things here.

1296
01:19:27,410 --> 01:19:29,880
So this is a ton of
reading and writing.

1297
01:19:29,880 --> 01:19:32,120
So in that case, we're
happy, because we get

1298
01:19:32,120 --> 01:19:33,980
an actual lower bound on time.

1299
01:19:33,980 --> 01:19:37,520
Otherwise, we don't-- I
mean, these are actual reads

1300
01:19:37,520 --> 01:19:40,920
and writes, or total
number of reads and writes.

1301
01:19:40,920 --> 01:19:43,620
Here we're getting--
in the other case,

1302
01:19:43,620 --> 01:19:51,860
we get that r intersect w times log
n is at least l root n log

1303
01:19:51,860 --> 01:19:53,960
n, just like before.

1304
01:19:53,960 --> 01:19:57,600
So again, the log ns cancel.

1305
01:19:57,600 --> 01:19:59,750
So here we lose
the log n factor,

1306
01:19:59,750 --> 01:20:02,600
but it's OK,
because this is only

1307
01:20:02,600 --> 01:20:04,350
talking about r intersect w.

1308
01:20:04,350 --> 01:20:06,536
Here we use the LCA
charging to say, well,

1309
01:20:06,536 --> 01:20:07,910
if you look at a
particular read,

1310
01:20:07,910 --> 01:20:09,830
it only gets
charged at the LCA.

1311
01:20:09,830 --> 01:20:12,140
So then we can afford
to sum up large amounts.

1312
01:20:12,140 --> 01:20:13,370
So it's a little bit weird.

1313
01:20:13,370 --> 01:20:15,579
In this situation, we add
up all the lower bounds.

1314
01:20:15,579 --> 01:20:17,120
Each of them doesn't
give us a log n,

1315
01:20:17,120 --> 01:20:19,620
but in aggregate, we get a log
n, because every leaf appears

1316
01:20:19,620 --> 01:20:21,200
in log n levels.
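The sketch below illustrates the LCA charging under simplifying assumptions of our own. Time steps 0, 1, ..., T-1 are the leaves of a perfect binary tree, and each operation reads and writes some set of cells. Every read of a previously written cell is charged to the lowest common ancestor of its time step and the time step of the most recent earlier write to that cell. The function names `lca_node` and `charge_reads` are hypothetical.

What it shows is that each read lands at exactly one node. Summing per-node counts over all levels therefore never counts a read twice. At that node, the earlier write always sits in the left subtree and the read in the right subtree.

```python
from collections import Counter


def lca_node(t1: int, t2: int) -> tuple[int, int]:
    """LCA of two distinct leaves in a perfect binary tree over time steps.

    A node is named (height, index); it covers the time range
    [index * 2**height, (index + 1) * 2**height).
    """
    height = (t1 ^ t2).bit_length()  # first height where the leaves meet
    return (height, t1 >> height)


def charge_reads(ops):
    """ops[t] = (reads, writes): sets of cell ids probed at time step t.

    Each read of a cell with an earlier write is charged to the LCA of the
    read's time step and the most recent earlier write to that cell.
    Returns a Counter mapping tree node -> number of reads charged there.
    """
    last_write = {}
    charges = Counter()
    for t, (reads, writes) in enumerate(ops):
        for cell in reads:  # reads first, so the last write is strictly earlier
            if cell in last_write:
                charges[lca_node(last_write[cell], t)] += 1
        for cell in writes:
            last_write[cell] = t
    return charges


# Tiny example: cells "a" and "b" over T = 4 time steps.
ops = [
    (set(), {"a"}),       # t=0: write a
    ({"a"}, {"b"}),       # t=1: read a, write b
    ({"b"}, set()),       # t=2: read b
    ({"a", "b"}, set()),  # t=3: read a and b
]
charges = charge_reads(ops)
print(dict(charges))  # {(1, 0): 1, (2, 0): 3}
```

All four reads are charged once in total. Three of them meet their last write only at the root, (2, 0).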

1317
01:20:21,200 --> 01:20:23,820
In this case, we don't need to
aggregate, because we just say,

1318
01:20:23,820 --> 01:20:25,910
well, the number of
operations in the subtree

1319
01:20:25,910 --> 01:20:28,880
is at least log n per operation.

1320
01:20:28,880 --> 01:20:31,040
The time spent, the
cell probes done,

1321
01:20:31,040 --> 01:20:32,789
is at least log n per operation.

1322
01:20:32,789 --> 01:20:35,330
So in that case, we don't need
to sum the lower bounds, so

1323
01:20:35,330 --> 01:20:36,380
we're done.

1324
01:20:36,380 --> 01:20:38,690
So in either case, we're happy.

1325
01:20:38,690 --> 01:20:40,940
It's a little weird, because you
could have a mix of cases:

1326
01:20:40,940 --> 01:20:44,300
one vertex v could
be in case two,

1327
01:20:44,300 --> 01:20:46,940
then you just ignore
all the things below it.

1328
01:20:46,940 --> 01:20:49,197
The rest of the tree
might be in case one,

1329
01:20:49,197 --> 01:20:50,780
but you can mix and
match one and two,

1330
01:20:50,780 --> 01:20:55,400
as long as you don't use a
one below a two, you're OK,

1331
01:20:55,400 --> 01:20:56,840
you won't double count.

1332
01:20:56,840 --> 01:20:58,910
And so in either
case, we're happy,

1333
01:20:58,910 --> 01:21:02,870
we get a log n lower bound,
either on time per operation,

1334
01:21:02,870 --> 01:21:06,860
or on this kind of
time per operation.

1335
01:21:06,860 --> 01:21:09,850
Add up all those lower bounds,
you get log n per operation,

1336
01:21:09,850 --> 01:21:12,950
or get root n log n per
block operation, which

1337
01:21:12,950 --> 01:21:18,850
implies log n per insert delete
edge, or connectivity query.
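The summation in this last step is sketched below, with constants and the lowest levels of the tree ignored. Several assumptions are ours and are made explicit here:

- There are T block operations at the leaves of the time tree, with T polynomial in n.
- Each block operation consists of root n elementary updates or queries.
- Each case-one node v contributes Omega(l_v root n) charged reads, where l_v is the number of leaves below v.
- A case-two node stands in for its whole subtree.

```latex
\sum_{\text{depth } d} \; \sum_{v \text{ at depth } d} \Omega\!\left(l_v \sqrt{n}\right)
\;=\; \sum_{\text{depth } d} \Omega\!\left(T\sqrt{n}\right)
\;=\; \Omega\!\left(T\sqrt{n}\,\lg T\right)
\;=\; \Omega\!\left(T\sqrt{n}\,\lg n\right)
```

The inner sum is Omega(T root n) because the nodes at one depth partition the T leaves. Dividing by T gives Omega(root n log n) probes per block operation. Dividing again by the root n elementary operations in a block gives Omega(log n) per edge insertion, deletion, or connectivity query.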

1338
01:21:18,850 --> 01:21:23,750
And that proves it right
there, more or less on time.

1339
01:21:23,750 --> 01:21:25,970
You can use the same
technique to do a trade off

1340
01:21:25,970 --> 01:21:27,730
between updates and queries.

1341
01:21:27,730 --> 01:21:30,540
This is just log n
for the worse of the two.

1342
01:21:30,540 --> 01:21:32,750
I mentioned what the
bound was last time.

1343
01:21:32,750 --> 01:21:34,640
Same trick works, you
just do more updates

1344
01:21:34,640 --> 01:21:36,469
than queries, or more
queries than updates.

1345
01:21:36,469 --> 01:21:38,260
So we get that link/cut
trees are optimal, Euler

1346
01:21:38,260 --> 01:21:39,627
tour trees are optimal.

1347
01:21:39,627 --> 01:21:41,585
And we've got lots of
other points on the trade

1348
01:21:41,585 --> 01:21:43,730
off curve, as you may
recall from last class.

1349
01:21:43,730 --> 01:21:47,900
Like our log squared update is
optimal for a log over log log

1350
01:21:47,900 --> 01:21:50,120
query.

1351
01:21:50,120 --> 01:21:53,950
And that's the end of
dynamic graphs, the end

1352
01:21:53,950 --> 01:21:56,517
of advanced data structures.

1353
01:21:56,517 --> 01:21:58,100
I hope you had a fun
time. We got to see

1354
01:21:58,100 --> 01:21:59,520
lots of different topics.

1355
01:21:59,520 --> 01:22:01,830
And I hope you'll enjoy
watching the videos,

1356
01:22:01,830 --> 01:22:04,580
and let me know if you have
any comments, send an email

1357
01:22:04,580 --> 01:22:06,890
or whatever.

1358
01:22:06,890 --> 01:22:08,090
Yay.

1359
01:22:08,090 --> 01:22:09,940
[APPLAUSE]