1 00:00:00,090 --> 00:00:02,490 The following content is provided under a Creative 2 00:00:02,490 --> 00:00:04,030 Commons license. 3 00:00:04,030 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,720 continue to offer high quality, educational resources for free. 5 00:00:10,720 --> 00:00:13,320 To make a donation, or view additional materials 6 00:00:13,320 --> 00:00:17,280 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,280 --> 00:00:18,450 at ocw.mit.edu. 8 00:00:21,480 --> 00:00:24,870 ERIK DEMAINE: All right, welcome to my last lecture 9 00:00:24,870 --> 00:00:26,880 for the semester. 10 00:00:26,880 --> 00:00:30,090 We finish our coverage of dynamic graphs, 11 00:00:30,090 --> 00:00:32,439 and also our coverage of lower bounds. 12 00:00:32,439 --> 00:00:35,940 We saw one big lower bound in this class 13 00:00:35,940 --> 00:00:37,502 in the cell probe model. 14 00:00:37,502 --> 00:00:39,210 You may recall, in the cell probe model, you just 15 00:00:39,210 --> 00:00:44,500 count how many cells of memory you touch. 16 00:00:44,500 --> 00:00:47,680 You want to prove a lower bound on that. 17 00:00:47,680 --> 00:00:50,180 And today we're going to prove a cell probe lower bound 18 00:00:50,180 --> 00:00:54,570 on dynamic connectivity, which is a problem we've 19 00:00:54,570 --> 00:00:57,030 solved a few different times. 20 00:00:57,030 --> 00:00:58,920 Our lower bound will apply even when 21 00:00:58,920 --> 00:01:03,920 each of the connected components of your graph is just a path. 22 00:01:03,920 --> 00:01:06,600 And so in particular, it implies matching lower bounds 23 00:01:06,600 --> 00:01:07,665 for dynamic trees. 24 00:01:14,760 --> 00:01:18,120 So here is the theorem we'll be proving today. 25 00:01:18,120 --> 00:01:27,270 You want to insert and delete edges, 26 00:01:27,270 --> 00:01:35,970 and do connectivity queries between pairs of vertices, v, w. 
27 00:01:35,970 --> 00:01:39,720 I want to know is there a path from v to w, 28 00:01:39,720 --> 00:01:41,490 just like we've been considering. 29 00:01:44,370 --> 00:01:52,320 These require Omega(log n) time per operation. 30 00:01:55,590 --> 00:01:57,990 That is, the max of the update and query times has 31 00:01:57,990 --> 00:02:01,860 to be at least log n time per operation, 32 00:02:01,860 --> 00:02:16,170 even if the connected components are paths, and even 33 00:02:16,170 --> 00:02:19,958 amortized, and even randomized. 34 00:02:19,958 --> 00:02:23,700 Although I'm not going to prove all of these versions, 35 00:02:23,700 --> 00:02:26,130 I won't prove the amortized version. 36 00:02:26,130 --> 00:02:29,280 I'm going to prove a worst case log n lower bound, 37 00:02:29,280 --> 00:02:33,930 it's just a little more work to prove an amortized lower bound. 38 00:02:33,930 --> 00:02:36,460 But same principles. 39 00:02:36,460 --> 00:02:41,500 And so that's going to be today, is proving this theorem. 40 00:02:41,500 --> 00:02:44,640 It's not a short proof, but it combines 41 00:02:44,640 --> 00:02:47,580 a bunch of relatively simple ideas, 42 00:02:47,580 --> 00:02:51,060 and ends up being pretty clean overall, piece-by-piece, 43 00:02:51,060 --> 00:02:54,810 but there's just a bunch of pieces, as we will get to. 44 00:02:54,810 --> 00:02:57,600 Key concept is an idea introduced 45 00:02:57,600 --> 00:03:01,140 in this paper, which is to build a balanced binary 46 00:03:01,140 --> 00:03:05,190 tree over time, over your access sequence. 47 00:03:05,190 --> 00:03:09,600 And argue about different subtrees within that tree. 48 00:03:09,600 --> 00:03:14,520 This is a paper that maybe came out of this class, 49 00:03:14,520 --> 00:03:19,500 in some sense, it was by Mihai Patrascu and myself. 50 00:03:19,500 --> 00:03:21,510 Back when Mihai was an undergrad, 51 00:03:21,510 --> 00:03:23,966 I think he'd just taken this class. 
52 00:03:23,966 --> 00:03:25,590 But at that point the class didn't even 53 00:03:25,590 --> 00:03:26,923 cover dynamic connectivity, so-- 54 00:03:30,040 --> 00:03:32,535 and time here is cell probes. 55 00:03:36,300 --> 00:03:39,050 So this is a very strong model, it 56 00:03:39,050 --> 00:03:41,570 implies a lower bound on the RAM, and implies a lower bound 57 00:03:41,570 --> 00:03:42,840 on the pointer machine. 58 00:03:42,840 --> 00:03:46,380 We know matching upper bounds for trees 59 00:03:46,380 --> 00:03:48,990 on a pointer machine: link/cut trees 60 00:03:48,990 --> 00:03:50,560 and Euler tour trees. 61 00:03:50,560 --> 00:03:53,790 It's kind of fun that this lower bound even applies to paths, 62 00:03:53,790 --> 00:03:55,740 because most of the work in link/cut trees 63 00:03:55,740 --> 00:03:57,914 is about decomposing your tree into paths. 64 00:03:57,914 --> 00:04:00,330 And so what this is saying is even if that's done for you, 65 00:04:00,330 --> 00:04:03,720 and you just need to be able to take paths, and concatenate 66 00:04:03,720 --> 00:04:10,380 them together by adding edges, then maintaining the find root 67 00:04:10,380 --> 00:04:13,080 property so that you can do connectivity queries, 68 00:04:13,080 --> 00:04:15,550 even that requires log n time. 69 00:04:15,550 --> 00:04:19,350 So converting a tree into a path is basically free, 70 00:04:19,350 --> 00:04:23,010 the hard part is maintaining the paths. 71 00:04:23,010 --> 00:04:25,830 So let's prove a theorem. 72 00:04:28,590 --> 00:04:30,030 The lower bound, we get to choose 73 00:04:30,030 --> 00:04:32,550 what access sequence we think is bad. 74 00:04:32,550 --> 00:04:37,170 And so we're going to come up with a particular style 75 00:04:37,170 --> 00:04:42,703 of graph, which looks like the following. 76 00:04:54,050 --> 00:04:55,190 Graph is going to be-- 77 00:04:55,190 --> 00:04:57,530 the vertices are going to be a root n by root n grid. 
78 00:05:01,700 --> 00:05:03,950 And we're going to-- 79 00:05:03,950 --> 00:05:06,300 these guys are in, what did I call them? 80 00:05:06,300 --> 00:05:07,280 Groups? 81 00:05:07,280 --> 00:05:08,340 Columns. 82 00:05:08,340 --> 00:05:10,160 Columns is a good name. 83 00:05:10,160 --> 00:05:15,620 These are columns of the matrix, or vertices. 84 00:05:15,620 --> 00:05:19,760 And what I'd like to have is between consecutive columns, 85 00:05:19,760 --> 00:05:21,140 I want to have a perfect matching 86 00:05:21,140 --> 00:05:23,232 between these vertices. 87 00:05:23,232 --> 00:05:28,580 So could be, I don't know, this edge, this edge, this edge, 88 00:05:28,580 --> 00:05:29,554 and that edge. 89 00:05:29,554 --> 00:05:31,970 And I also want a perfect match between these two columns, 90 00:05:31,970 --> 00:05:39,950 so maybe this one, this one, this one, this one. 91 00:05:39,950 --> 00:05:46,340 And you can have some boring things too. 92 00:05:46,340 --> 00:05:47,180 Something like that. 93 00:05:47,180 --> 00:05:53,815 So between every pair of columns is a perfect matching, 94 00:05:53,815 --> 00:05:54,815 meaning perfect pairing. 95 00:05:58,910 --> 00:06:01,370 OK, this of course results in a collection of paths, 96 00:06:01,370 --> 00:06:02,660 square root of n paths. 97 00:06:02,660 --> 00:06:04,970 You can start at any vertex on the left, 98 00:06:04,970 --> 00:06:07,420 and you'll have a unique way to go to the right. 99 00:06:07,420 --> 00:06:13,430 And so that's path 1, this is path 2, this is path 3, 100 00:06:13,430 --> 00:06:16,070 and this is path 4. 101 00:06:16,070 --> 00:06:20,720 And so if I-- an interesting query. 102 00:06:20,720 --> 00:06:23,450 Well, an interesting query is something like I want to know, 103 00:06:23,450 --> 00:06:26,877 is this vertex connected to this one? 
104 00:06:26,877 --> 00:06:28,460 And it's not so easy to figure it out, 105 00:06:28,460 --> 00:06:31,490 because you have to sort of walk through this path 106 00:06:31,490 --> 00:06:32,349 to figure that out. 107 00:06:32,349 --> 00:06:34,640 We're going to think of each of these perfect matchings 108 00:06:34,640 --> 00:06:41,060 as defining a permutation on the vertices, on the column really. 109 00:06:41,060 --> 00:06:43,940 So you start with the identity permutation, 110 00:06:43,940 --> 00:06:46,480 and then some things get swapped around, that's pi 1. 111 00:06:46,480 --> 00:06:48,750 Something gets swapped around again, that's pi 2. 112 00:06:48,750 --> 00:06:51,170 Somethings get swapped around here, that's pi 3. 113 00:06:51,170 --> 00:06:57,500 And then this position would be pi 3 of pi 2 of pi 1 114 00:06:57,500 --> 00:07:01,940 of vertex 4. 115 00:07:01,940 --> 00:07:05,960 We call this vertex 4. 116 00:07:05,960 --> 00:07:09,200 Or row, row 4. 117 00:07:09,200 --> 00:07:11,990 So in some sense, we have to compose permutations. 118 00:07:11,990 --> 00:07:20,132 I'll call this pi 1, circle pi 2, circle pi 3 of 4. 119 00:07:20,132 --> 00:07:21,590 And we're going to show, basically, 120 00:07:21,590 --> 00:07:24,470 composing permutations is tough when you can change 121 00:07:24,470 --> 00:07:26,630 those permutations dynamically. 122 00:07:26,630 --> 00:07:28,340 So what we're going to do is a series 123 00:07:28,340 --> 00:07:37,130 of block operations, which change or query 124 00:07:37,130 --> 00:07:39,660 entire permutations. 125 00:07:39,660 --> 00:07:45,470 So here, an update is going to be a whole bunch of insertions 126 00:07:45,470 --> 00:07:48,230 and deletions of edges. 127 00:07:48,230 --> 00:07:53,966 Basically, what we want to do is set pi i equal to pi. 128 00:07:53,966 --> 00:07:57,380 So that's what update of i comma pi does. 
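To make the permutation view concrete, here is a minimal Python sketch. It assumes rows are numbered from 0 and that each matching is stored as a list mapping a row in one column to a row in the next column. The names `follow_path` and `compose`, and the three example permutations, are made up for illustration and are not from the lecture.

```python
def follow_path(perms, row):
    """Trace one path left to right: apply pi_1, then pi_2, and so on."""
    for pi in perms:
        row = pi[row]
    return row


def compose(perms):
    """Return the composed permutation as a list: start row -> end row."""
    m = len(perms[0])
    return [follow_path(perms, r) for r in range(m)]


# Hypothetical 4-row example (0-indexed rows).
pi1 = [1, 0, 2, 3]
pi2 = [0, 2, 1, 3]
pi3 = [3, 1, 2, 0]

end_row = follow_path([pi1, pi2, pi3], 3)
composed = compose([pi1, pi2, pi3])
```

Tracing one row costs one lookup per column, which is the "walk through this path" cost mentioned above.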
129 00:07:57,380 --> 00:08:00,110 It changes an entire perfect matching 130 00:08:00,110 --> 00:08:02,914 to be a specified permutation. 131 00:08:02,914 --> 00:08:03,830 So how do you do that? 132 00:08:03,830 --> 00:08:05,300 Well, you delete all the edges that 133 00:08:05,300 --> 00:08:06,841 are in the existing permutation, then 134 00:08:06,841 --> 00:08:08,940 you insert all the new edges. 135 00:08:08,940 --> 00:08:15,200 So this can be done in square root of n edge deletions 136 00:08:15,200 --> 00:08:15,980 and insertions. 137 00:08:23,790 --> 00:08:26,460 So it's a bulk update of square root of n operations. 138 00:08:26,460 --> 00:08:28,920 And so this could only make our problem easier, 139 00:08:28,920 --> 00:08:30,980 because we're given square root of n updates 140 00:08:30,980 --> 00:08:33,360 that we all need to do at once. 141 00:08:33,360 --> 00:08:35,850 So you could amortize over them, you 142 00:08:35,850 --> 00:08:38,520 could do lots of different things, 143 00:08:38,520 --> 00:08:42,059 but we're sure that won't help. 144 00:08:42,059 --> 00:08:45,300 And then we have a query, and the query 145 00:08:45,300 --> 00:08:49,609 is going to be a little bit weird, 146 00:08:49,609 --> 00:08:51,900 and it's also going to make the proof a little bit more 147 00:08:51,900 --> 00:08:53,400 awkward. 148 00:08:53,400 --> 00:09:03,570 But what it asks is if I look at the composition of pi j, from 1 149 00:09:03,570 --> 00:09:05,630 up to i. 150 00:09:05,630 --> 00:09:07,622 This is 1. 151 00:09:07,622 --> 00:09:12,660 So I want to know is that composition equal to pi? 152 00:09:12,660 --> 00:09:13,680 Yes or no? 153 00:09:13,680 --> 00:09:18,720 This is what I'll call verify sum, sum meaning composition. 154 00:09:18,720 --> 00:09:21,210 But the sum terminology comes from a different problem, 155 00:09:21,210 --> 00:09:22,890 which we won't talk about directly 156 00:09:22,890 --> 00:09:25,140 here, called partial sums. 
157 00:09:25,140 --> 00:09:27,240 Partial sums is basically this problem, 158 00:09:27,240 --> 00:09:29,580 you can change numbers in an array, 159 00:09:29,580 --> 00:09:33,440 and you can compute the prefix sum from 1 up to i. 160 00:09:33,440 --> 00:09:36,780 Here we're not computing it. 161 00:09:36,780 --> 00:09:37,950 Why are we not computing it? 162 00:09:37,950 --> 00:09:41,370 Because actually figuring out what pi 3, or pi 2 of pi 1 163 00:09:41,370 --> 00:09:45,330 is of something is tricky in this setting. 164 00:09:45,330 --> 00:09:48,120 The operations we're given are, given two vertices, 165 00:09:48,120 --> 00:09:51,510 are they connected by a path? 166 00:09:51,510 --> 00:09:56,989 So to figure out the other end of this path, that requires-- 167 00:09:56,989 --> 00:09:59,280 I mean, it's hard to figure out where the other end is. 168 00:09:59,280 --> 00:10:02,010 If I told you is it this one? 169 00:10:02,010 --> 00:10:05,280 Then I can answer that question with just a connectivity query. 170 00:10:05,280 --> 00:10:09,490 So verify sum can be done with order square root 171 00:10:09,490 --> 00:10:12,715 of n connectivity queries. 172 00:10:16,530 --> 00:10:22,770 Whereas computing the sum could not be, as far as we know. 173 00:10:22,770 --> 00:10:25,980 If I tell you what that composition is supposed to be, 174 00:10:25,980 --> 00:10:29,280 I can check does 4 go to 1? 175 00:10:29,280 --> 00:10:30,610 Yes or no? 176 00:10:30,610 --> 00:10:32,670 Does 3 go to 3? 177 00:10:32,670 --> 00:10:33,370 Yes or no? 178 00:10:33,370 --> 00:10:35,520 Does 2 go to 4? 179 00:10:35,520 --> 00:10:36,110 Yes or no? 180 00:10:36,110 --> 00:10:37,780 Does 1 go to 2? 181 00:10:37,780 --> 00:10:38,490 Yes or no? 182 00:10:38,490 --> 00:10:41,910 So with 4 queries, I can check whether the permutation 183 00:10:41,910 --> 00:10:44,790 is what it is. 184 00:10:44,790 --> 00:10:47,060 If any of those fail, then I return no. 
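The two block operations can be sketched on top of any dynamic connectivity structure. The Python below uses a deliberately naive structure (adjacency sets plus breadth-first search) as a stand-in, and the class names `NaiveConnectivity` and `PermutationGrid` are assumptions for illustration only. It shows that an update is root n deletions plus root n insertions, and that verify sum is at most root n connectivity queries.

```python
from collections import defaultdict, deque


class NaiveConnectivity:
    """Stand-in dynamic connectivity structure; counts operations performed."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.ops = 0

    def insert(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)
        self.ops += 1

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        self.ops += 1

    def connected(self, u, v):
        self.ops += 1
        seen, queue = {u}, deque([u])
        while queue:
            x = queue.popleft()
            if x == v:
                return True
            for y in self.adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return False


class PermutationGrid:
    """m rows, k matchings; matching i joins column i to column i + 1."""

    def __init__(self, m, k):
        self.m = m
        self.dc = NaiveConnectivity()
        self.perms = [list(range(m)) for _ in range(k)]
        for i in range(k):
            for r in range(m):
                self.dc.insert((i, r), (i + 1, r))

    def update(self, i, pi):
        # Delete the m old edges of matching i, then insert the m new ones.
        for r in range(self.m):
            self.dc.delete((i, r), (i + 1, self.perms[i][r]))
        for r in range(self.m):
            self.dc.insert((i, r), (i + 1, pi[r]))
        self.perms[i] = list(pi)

    def verify_sum(self, i, pi):
        # One connectivity query per row; stops at the first failure.
        return all(
            self.dc.connected((0, r), (i + 1, pi[r])) for r in range(self.m)
        )
```

Because every path crosses each column exactly once, a row passes its query only when the claimed end row is the true one, so verify sum answers yes exactly when the claimed permutation matches the composition.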
185 00:10:49,890 --> 00:10:53,400 So the way this proceeds is first we proved a lower bound 186 00:10:53,400 --> 00:10:56,880 on partial sums, which is computing 187 00:10:56,880 --> 00:11:01,260 this value when you're not told what the answer is. 188 00:11:01,260 --> 00:11:03,801 And then we extended that, and we'll do such a proof here 189 00:11:03,801 --> 00:11:04,300 today. 190 00:11:04,300 --> 00:11:06,466 First, we're going to prove a lower bound on the sum 191 00:11:06,466 --> 00:11:09,030 operation, which is computing this value that's 192 00:11:09,030 --> 00:11:14,640 on our outline over here, sum lower bound. 193 00:11:14,640 --> 00:11:17,580 And then we'll extend that and make the argument a little bit 194 00:11:17,580 --> 00:11:20,940 more complicated, then we'll get an actual connectivity lower 195 00:11:20,940 --> 00:11:23,740 bound, a lower bound on verify sum. 196 00:11:23,740 --> 00:11:26,010 OK, but obviously if we can prove 197 00:11:26,010 --> 00:11:29,490 that these operations take a long time to do, 198 00:11:29,490 --> 00:11:33,150 we can prove that these original operations take a long time 199 00:11:33,150 --> 00:11:34,410 to do. 200 00:11:34,410 --> 00:11:46,700 So what we claim is that square root of n updates, 201 00:11:46,700 --> 00:11:58,260 these block updates, plus square root of n verify sum queries, 202 00:11:58,260 --> 00:12:12,220 require Omega(root n times root n log n) cell probes. 203 00:12:19,190 --> 00:12:22,360 I guess this is the amortized claim. 
204 00:12:22,360 --> 00:12:25,070 So if I want to do root n updates and root n queries, 205 00:12:25,070 --> 00:12:27,590 and I take root n times root n times log n-- 206 00:12:27,590 --> 00:12:29,750 funny way of writing n log n-- 207 00:12:29,750 --> 00:12:32,436 cell probes, then if I divide through, 208 00:12:32,436 --> 00:12:33,810 I want the amortized lower bound, 209 00:12:33,810 --> 00:12:36,770 I lose one of these root ns, because I'm 210 00:12:36,770 --> 00:12:38,030 doing different operations. 211 00:12:38,030 --> 00:12:40,330 I lose another root n, because each of these updates 212 00:12:40,330 --> 00:12:41,705 corresponds to root n operations. 213 00:12:41,705 --> 00:12:43,370 Each of the verify sums corresponds 214 00:12:43,370 --> 00:12:44,660 to root n operations. 215 00:12:44,660 --> 00:12:48,080 So overall per operation, per original operation 216 00:12:48,080 --> 00:12:50,660 of edge deletion insertion or connectivity query, 217 00:12:50,660 --> 00:12:52,440 I'm paying log n per operation. 218 00:12:52,440 --> 00:12:57,474 So if I can prove this claim, then I get this theorem. 219 00:12:57,474 --> 00:12:59,020 All clear? 220 00:12:59,020 --> 00:13:02,039 So now we've reduced the problem to these bulk operations, 221 00:13:02,039 --> 00:13:04,080 we'll just be thinking about the bulk operations, 222 00:13:04,080 --> 00:13:05,160 update verify sum. 223 00:13:05,160 --> 00:13:07,320 We won't think about edge deletion insertions 224 00:13:07,320 --> 00:13:09,041 and connectivity queries anymore. 225 00:13:14,794 --> 00:13:15,294 OK. 226 00:13:18,180 --> 00:13:20,600 So this is just sort of the general set up 227 00:13:20,600 --> 00:13:22,650 of what the graphs are going to look like. 228 00:13:22,650 --> 00:13:25,460 And now I'm going to tell you what sequence of updates 229 00:13:25,460 --> 00:13:28,840 and verify sums we're actually going to do that are bad. 230 00:13:28,840 --> 00:13:31,520 This is the bad access sequence. 
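The division described above can be written out as a short calculation. This is only a restatement of the spoken argument, with constants hidden inside the asymptotic notation: the numerator is the claimed bound for the block operations, and the denominator counts the original edge and query operations they expand into.

```latex
\[
\underbrace{\sqrt{n}\cdot O(\sqrt{n})}_{\text{edge deletions and insertions}}
\;+\;
\underbrace{\sqrt{n}\cdot O(\sqrt{n})}_{\text{connectivity queries}}
\;=\; O(n) \text{ original operations},
\]
\[
\frac{\Omega\!\left(\sqrt{n}\cdot\sqrt{n}\,\log n\right) \text{ cell probes}}{O(n) \text{ operations}}
\;=\; \Omega(\log n) \text{ cell probes per operation}.
\]
```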
231 00:13:39,930 --> 00:13:42,642 And this is actually something we've seen before in lecture 6, 232 00:13:42,642 --> 00:13:44,225 I think, the binary search tree stuff. 233 00:13:49,300 --> 00:13:51,575 We're going to look at the bit reversal sequence. 234 00:14:00,620 --> 00:14:03,830 So you may recall a bit reversal sequence. 235 00:14:03,830 --> 00:14:14,580 You take binary numbers in order, reverse the bits. 236 00:14:14,580 --> 00:14:28,370 So this becomes 000, 100, 010, 110, 001, 101, 011, and 111. 237 00:14:28,370 --> 00:14:33,620 So those are the reversed strings. 238 00:14:33,620 --> 00:14:35,970 And then you reinterpret those as regular numbers. 239 00:14:35,970 --> 00:14:43,244 So this is 0, 4, 2, 6, and then it 240 00:14:43,244 --> 00:14:45,160 should be the same thing, but the odd version. 241 00:14:45,160 --> 00:14:49,100 So I have 1, 5, 3, 7. 242 00:14:51,926 --> 00:14:56,270 OK, I claimed, I think probably didn't prove, 243 00:14:56,270 --> 00:15:00,760 that this bit reversal sequence has a high Wilber lower bound. 244 00:15:00,760 --> 00:15:04,890 And so any binary search tree accessing items in this order 245 00:15:04,890 --> 00:15:06,470 requires log n per operation. 246 00:15:06,470 --> 00:15:08,130 And we want log n per operation here, 247 00:15:08,130 --> 00:15:10,430 so it seems like a good choice, why not? 248 00:15:10,430 --> 00:15:13,580 So we're going to follow this access sequence. 249 00:15:13,580 --> 00:15:15,170 And sorry, I've changed notation here, 250 00:15:15,170 --> 00:15:18,320 we're going to number the permutations from 0 251 00:15:18,320 --> 00:15:21,710 to root n minus 1 now. 252 00:15:21,710 --> 00:15:24,270 And assume root n is a power of 2, 253 00:15:24,270 --> 00:15:27,140 so the bit reversal sequence is well defined. 254 00:15:27,140 --> 00:15:34,940 And then we are going to do two things for each such i. 255 00:15:34,940 --> 00:15:37,538 We're going to do a verify sum operation. 
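The bit reversal sequence is easy to generate directly. The following Python is a small sketch, and the function name `bit_reversal_sequence` is chosen here for illustration: write each index in binary with a fixed width, reverse the string, and read it back as a number.

```python
def bit_reversal_sequence(bits):
    """Indices 0 .. 2**bits - 1, each with its bits-wide binary form reversed."""
    return [
        int(format(i, f"0{bits}b")[::-1], 2)
        for i in range(1 << bits)
    ]


order = bit_reversal_sequence(3)  # the 3-bit case worked through above
```

For 3 bits this reproduces the order 0, 4, 2, 6, 1, 5, 3, 7 from the board.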
256 00:15:49,010 --> 00:15:51,642 Actually maybe it is starting at 1, I don't know. 257 00:15:51,642 --> 00:15:53,330 It doesn't matter. 258 00:15:53,330 --> 00:16:06,250 And then we'll do an update 259 00:16:06,250 --> 00:16:09,010 OK, so let's see. 260 00:16:09,010 --> 00:16:11,860 This pi random is just a uniform random permutation, 261 00:16:11,860 --> 00:16:15,400 it's computed fresh every time. 262 00:16:15,400 --> 00:16:20,380 So we're just re-randomizing pi i in this operation. 263 00:16:20,380 --> 00:16:22,300 Before we do that, we're checking 264 00:16:22,300 --> 00:16:25,600 that the sum, the composition of all the permutations up 265 00:16:25,600 --> 00:16:28,750 to position i, is what it is. 266 00:16:28,750 --> 00:16:31,300 So this is the actual value here, 267 00:16:31,300 --> 00:16:35,390 and we're verifying that that is indeed the sum. 268 00:16:35,390 --> 00:16:39,100 So this will always return yes. 269 00:16:39,100 --> 00:16:41,750 But data structure has to be correct. 270 00:16:41,750 --> 00:16:45,050 So it needs to really verify that that is the case. 271 00:16:45,050 --> 00:16:48,100 There's the threat that maybe we gave the wrong answer here, 272 00:16:48,100 --> 00:16:50,170 and it needs to double check that that is indeed 273 00:16:50,170 --> 00:16:51,190 the right answer. 274 00:16:51,190 --> 00:16:54,140 It may seem a little weird, but we'll see why it works. 275 00:16:54,140 --> 00:16:56,410 So this is the bad access sequence. 276 00:16:56,410 --> 00:17:03,540 Just do a query, do an update in this weird order in i. 277 00:17:03,540 --> 00:17:11,230 OK, and big idea is to build a nice balanced binary 278 00:17:11,230 --> 00:17:14,109 tree over time. 279 00:17:18,079 --> 00:17:26,514 So we have on the ground here 0, 4, 2, 6, 1, 5, 3, 7. 280 00:17:26,514 --> 00:17:31,000 And when I write 5, I mean verify sum of 5, 281 00:17:31,000 --> 00:17:33,820 and update permutation 5. 
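Putting the pieces together, the bad access sequence can be sketched as a generator of operations. This Python is an illustration under stated assumptions: permutations are indexed from 0, all of them start as the identity (the lecture does not specify the initial state), `m` stands for root n and is a power of 2, and the names `bad_access_sequence` and `compose_prefix` are made up here.

```python
import random


def bit_reverse(i, bits):
    """Reverse the bits-wide binary representation of i."""
    return int(format(i, f"0{bits}b")[::-1], 2)


def compose_prefix(perms, i):
    """Composition of perms[0..i] as a list: start row -> end row."""
    result = list(range(len(perms[0])))
    for pi in perms[: i + 1]:
        result = [pi[r] for r in result]
    return result


def bad_access_sequence(m, seed=0):
    """For each i in bit reversal order: verify the true prefix sum,
    then replace pi_i with a fresh uniform random permutation."""
    rng = random.Random(seed)
    bits = m.bit_length() - 1  # assumes m is a power of 2
    perms = [list(range(m)) for _ in range(m)]
    ops = []
    for t in range(m):
        i = bit_reverse(t, bits)
        ops.append(("verify_sum", i, compose_prefix(perms, i)))
        pi = list(range(m))
        rng.shuffle(pi)
        perms[i] = pi
        ops.append(("update", i, pi))
    return ops
```

Every verify sum in this sequence is given the true composition, so a correct data structure answers yes each time, yet it still has to do enough work to rule out a wrong argument.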
282 00:17:33,820 --> 00:17:36,010 And then we can build a binary tree on that. 283 00:17:41,900 --> 00:17:43,660 And for each node in this tree, we 284 00:17:43,660 --> 00:17:45,790 have the notion of a left subtree, 285 00:17:45,790 --> 00:17:48,010 and we have the notion of a right subtree. 286 00:17:48,010 --> 00:17:49,990 And cool thing about bit reversal sequence 287 00:17:49,990 --> 00:17:52,960 is this nice self-similarity. 288 00:17:52,960 --> 00:17:55,330 If you look at the left subtree of any node 289 00:17:55,330 --> 00:17:58,630 and the right subtree of any of node, those items interleave. 290 00:17:58,630 --> 00:18:00,910 If you look at the sorted order, it's 1 on the left, 291 00:18:00,910 --> 00:18:03,190 3 on the right, 5 on the left, 7 on the right. 292 00:18:03,190 --> 00:18:04,687 They always perfectly interleave, 293 00:18:04,687 --> 00:18:06,520 because this thing is designed to interleave 294 00:18:06,520 --> 00:18:09,310 at every possible level. 295 00:18:09,310 --> 00:18:12,190 So that's the fact we're going to use. 296 00:18:12,190 --> 00:18:18,670 We're going to analyze each node separately, and talk about what 297 00:18:18,670 --> 00:18:22,690 information has to be carried from the left subtree 298 00:18:22,690 --> 00:18:24,340 to the right subtree. 299 00:18:24,340 --> 00:18:27,100 In particular, we're interested in the updates being 300 00:18:27,100 --> 00:18:29,410 done on the left subtree, because here we change pi 1, 301 00:18:29,410 --> 00:18:31,150 we change pi 5. 302 00:18:31,150 --> 00:18:33,300 And the query's being done on the right subtree, 303 00:18:33,300 --> 00:18:36,610 because here we query 3, we query 7. 304 00:18:36,610 --> 00:18:39,310 When we query 3, that queries everything, 305 00:18:39,310 --> 00:18:40,690 all the permutations, up to 3. 306 00:18:40,690 --> 00:18:43,380 It's a composition of all permutations up to 3. 307 00:18:43,380 --> 00:18:45,287 So in particular it involves 1. 
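The interleaving property can be checked mechanically. The sketch below, with illustrative names `interleaves` and `every_node_interleaves`, treats each tree node as a contiguous block of the bit reversal sequence, splits it into left and right halves, and checks that the two halves alternate in sorted order.

```python
def bit_reversal_sequence(bits):
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(1 << bits)]


def interleaves(left, right):
    """True if, in sorted order, items alternate between left and right."""
    merged = sorted([(x, "L") for x in left] + [(x, "R") for x in right])
    sides = [side for _, side in merged]
    return all(a != b for a, b in zip(sides, sides[1:]))


def every_node_interleaves(bits):
    """Check the property at every internal node of the tree over time."""
    seq = bit_reversal_sequence(bits)
    size = len(seq)
    while size >= 2:
        for start in range(0, len(seq), size):
            half = size // 2
            left = seq[start:start + half]
            right = seq[start + half:start + size]
            if not interleaves(left, right):
                return False
        size //= 2
    return True
```

This is why every query in a right subtree depends on some update made in the left subtree: between any two consecutive right indices there is a left index.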
308 00:18:45,287 --> 00:18:47,620 So the claim is going to be that the permutation that we 309 00:18:47,620 --> 00:18:51,220 set in 1 has to be carried over to this query. 310 00:18:51,220 --> 00:18:54,670 And similarly, a changing permutation 5 311 00:18:54,670 --> 00:18:55,990 will affect the query for 7. 312 00:18:55,990 --> 00:19:00,550 Also query, the update for 1, will affect the query for 7. 313 00:19:00,550 --> 00:19:02,950 So we need to formalize that little bit. 314 00:19:10,630 --> 00:19:12,230 So here is the claim. 315 00:19:18,020 --> 00:19:28,760 For every node in the tree, say it 316 00:19:28,760 --> 00:19:35,400 has l leaves in its subtree-- 317 00:19:42,742 --> 00:19:47,430 This should be a comma and this should be a colon. 318 00:19:47,430 --> 00:19:51,060 Here's what we say. 319 00:19:51,060 --> 00:19:54,960 During the right subtree of v, so right subtree 320 00:19:54,960 --> 00:19:57,850 corresponds to an interval of time. 321 00:19:57,850 --> 00:20:00,750 So we're talking about those operations done 322 00:20:00,750 --> 00:20:06,270 during the right subtree of v. Claim is we must 323 00:20:06,270 --> 00:20:11,190 do omega l root n cell probes-- 324 00:20:15,898 --> 00:20:19,275 sorry, expected cell probes. 325 00:20:21,950 --> 00:20:24,490 We are using some randomness here, right? 326 00:20:24,490 --> 00:20:27,350 We said we're going to update each permutation 327 00:20:27,350 --> 00:20:28,730 to a random value, so we can only 328 00:20:28,730 --> 00:20:31,577 make claims about the expected performance. 329 00:20:34,710 --> 00:20:36,549 Fine. 330 00:20:36,549 --> 00:20:38,090 But that's actually a stronger thing, 331 00:20:38,090 --> 00:20:41,150 it implies a lower bound, even for randomized algorithms. 332 00:20:41,150 --> 00:20:42,960 So if you can randomize your input set. 
333 00:20:45,890 --> 00:20:49,640 And then not just any cell probes, 334 00:20:49,640 --> 00:20:55,490 but they're cell probes that read cells last written 335 00:20:55,490 --> 00:20:56,555 during the left subtree. 336 00:21:08,440 --> 00:21:12,690 So this is what I was saying at a high level before. 337 00:21:12,690 --> 00:21:16,350 We're looking at reads over here, 338 00:21:16,350 --> 00:21:19,440 to cells that are written over here. 339 00:21:19,440 --> 00:21:22,140 Because we claim the updates over here 340 00:21:22,140 --> 00:21:24,750 have to store some information that is-- 341 00:21:24,750 --> 00:21:29,134 whatever the updates that happen over here influence the queries 342 00:21:29,134 --> 00:21:30,050 that happen over here. 343 00:21:30,050 --> 00:21:31,841 So these queries have to read the data that 344 00:21:31,841 --> 00:21:33,630 was written over here. 345 00:21:33,630 --> 00:21:36,990 And specifically, we're claiming at least l root n 346 00:21:36,990 --> 00:21:40,140 cell probes have to be done over here, 347 00:21:40,140 --> 00:21:42,420 to cells that were written-- 348 00:21:42,420 --> 00:21:45,870 that were basically just written in the left subtree. 349 00:21:45,870 --> 00:21:50,490 If we could prove this, then we get our other claim-- 350 00:21:50,490 --> 00:21:54,450 this one over here, that root n updates and root n verify 351 00:21:54,450 --> 00:21:57,900 sums require this much time. 352 00:21:57,900 --> 00:22:01,290 The difference is-- well, here we 353 00:22:01,290 --> 00:22:03,900 have an l, for an l leaf tree. 354 00:22:03,900 --> 00:22:07,710 And so what I'd like to do is sum this lower bound 355 00:22:07,710 --> 00:22:09,390 over every node in the tree. 356 00:22:09,390 --> 00:22:12,625 I need to check that that is valid to do. 357 00:22:12,625 --> 00:22:13,740 So let's do that. 
358 00:22:25,750 --> 00:22:28,530 OK, for every node v, we are claiming 359 00:22:28,530 --> 00:22:30,900 there's a certain number of reads that happen over here, 360 00:22:30,900 --> 00:22:33,030 that correspond to writes over here. 361 00:22:33,030 --> 00:22:35,490 But let's say you look at the parent 362 00:22:35,490 --> 00:22:37,184 of v, which is over here. 363 00:22:37,184 --> 00:22:38,850 This thing is also in the right subtree, 364 00:22:38,850 --> 00:22:40,680 and we're claiming there's some number of reads 365 00:22:40,680 --> 00:22:43,180 on the right subtree, that read things that are written over 366 00:22:43,180 --> 00:22:43,710 on the left. 367 00:22:43,710 --> 00:22:45,960 The worry would be that the reads we counted here, 368 00:22:45,960 --> 00:22:47,640 we also count at the next level up. 369 00:22:47,640 --> 00:22:49,723 We don't want to double count in our lower bounds. 370 00:22:49,723 --> 00:22:53,796 If we're able to sum them up, we can't be double counting. 371 00:22:53,796 --> 00:22:55,170 But the claim is we're not double 372 00:22:55,170 --> 00:22:57,870 counting, because if you look at any particular-- 373 00:22:57,870 --> 00:23:02,990 any read-- so here's time, and suppose you do a read here. 374 00:23:02,990 --> 00:23:05,534 You're reading a cell that was written sometime in the past, 375 00:23:05,534 --> 00:23:07,950 if it was never written, it's a not very interesting read, 376 00:23:07,950 --> 00:23:09,780 it communicates no information. 377 00:23:09,780 --> 00:23:13,410 So there's some write in the past that changed 378 00:23:13,410 --> 00:23:15,420 the cell that's just read. 379 00:23:15,420 --> 00:23:18,390 And we are going to count this read 380 00:23:18,390 --> 00:23:23,827 at a particular node, namely the lca of those two times. 
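The charging rule can be made concrete with a small sketch. It assumes times are leaf positions 0, 1, 2, ... in a perfect binary tree over the access sequence, and it names a node by a (height, index) pair; that naming scheme and the function name `lca_of_times` are conventions chosen here, not from the lecture.

```python
def lca_of_times(write_time, read_time):
    """Node charged for a read at read_time of a cell last written at
    write_time, with write_time < read_time.

    Leaves are time steps; a node at height h with index j covers
    times j * 2**h through (j + 1) * 2**h - 1.
    """
    if not write_time < read_time:
        raise ValueError("the write must happen before the read")
    # The highest bit where the two times differ fixes the LCA's height.
    height = (write_time ^ read_time).bit_length()
    return height, write_time >> height


node = lca_of_times(3, 4)  # write at time 3, read at time 4
```

Each (write, read) pair maps to exactly one node, and because the write is earlier it falls in that node's left subtree while the read falls in the right subtree, so no read is counted at two nodes.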
381 00:23:23,827 --> 00:23:25,410 So if you look at the lca of the times 382 00:23:25,410 --> 00:23:27,890 of the reads and the writes, that is the single node 383 00:23:27,890 --> 00:23:29,755 that will think about that read that 384 00:23:29,755 --> 00:23:31,380 happened in the right subtree, that was 385 00:23:31,380 --> 00:23:33,400 written in the left subtree. 386 00:23:33,400 --> 00:23:45,465 So no double counting, because we only count at the lca. 387 00:23:48,496 --> 00:23:50,370 The other thing that we need to be able to do 388 00:23:50,370 --> 00:23:52,270 is, because this is an expected lower bound, 389 00:23:52,270 --> 00:23:53,760 we need linearity of expectation. 390 00:23:53,760 --> 00:23:57,300 But expectation is indeed linear, so we're all set. 391 00:24:01,755 --> 00:24:03,860 OK, so all that's left is a little bit of common 392 00:24:03,860 --> 00:24:07,260 [INAUDIBLE] if we take l root n, where l is the size 393 00:24:07,260 --> 00:24:10,290 of the subtree below a given node, 394 00:24:10,290 --> 00:24:13,350 we sum that up over all nodes, and it's a balanced binary 395 00:24:13,350 --> 00:24:15,260 search tree-- 396 00:24:15,260 --> 00:24:18,870 or a balanced binary tree, I should say, not a search tree. 397 00:24:18,870 --> 00:24:20,170 What do we get? 398 00:24:20,170 --> 00:24:24,370 Well, every leaf appears in log n subtrees. 399 00:24:24,370 --> 00:24:30,740 So we get the total size of the tree times log n for this, 400 00:24:30,740 --> 00:24:32,490 and we get another root n over here. 401 00:24:32,490 --> 00:24:35,550 The total size of the tree is root n. 402 00:24:35,550 --> 00:24:40,980 So we get this root n log n, that's 403 00:24:40,980 --> 00:24:42,570 when you sum up the l part. 404 00:24:42,570 --> 00:24:46,170 Then everything gets multiplied by root n, 405 00:24:46,170 --> 00:24:47,820 and that becomes our lower bound, 406 00:24:47,820 --> 00:24:51,790 and that's exactly what we need over here. 
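The sum over nodes can be checked numerically. This sketch assumes a perfect binary tree with m leaves, where m plays the role of root n and is a power of 2, and it sums the leaf count l over internal nodes only, since a leaf has no left and right subtree. The function name is illustrative.

```python
def sum_of_subtree_sizes(m):
    """Sum of l over all internal nodes of a perfect binary tree with
    m leaves (m a power of 2), where l is the leaf count below a node."""
    total = 0
    size = m
    while size >= 2:
        total += (m // size) * size  # m // size nodes, each with size leaves
        size //= 2
    return total


total = sum_of_subtree_sizes(8)
```

Each level contributes exactly m, and there are log2(m) levels, so the sum is m log2(m). With m equal to root n, multiplying by the extra root n factor gives root n times root n times log of root n, which is n log n up to a constant factor.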
407 00:24:51,790 --> 00:24:55,256 So now this claim is done. 408 00:24:55,256 --> 00:24:57,000 Maybe I should do a check mark. 409 00:24:57,000 --> 00:24:58,660 Provided we can prove this claim. 410 00:24:58,660 --> 00:25:01,360 So now our goal is to prove this thing. 411 00:25:01,360 --> 00:25:03,520 And now we're in a more local world, 412 00:25:03,520 --> 00:25:06,210 looking at a single node, counting reads over here, 413 00:25:06,210 --> 00:25:08,007 the corresponding writes over there. 414 00:25:08,007 --> 00:25:09,840 And then you just add up those lower bounds, 415 00:25:09,840 --> 00:25:10,980 you get what you want. 416 00:25:10,980 --> 00:25:12,480 So this is where the log comes from, 417 00:25:12,480 --> 00:25:14,370 because it's a balanced tree. 418 00:25:14,370 --> 00:25:17,160 And there's log n levels in a balanced tree, that's where 419 00:25:17,160 --> 00:25:18,870 we're getting our lower bound. 420 00:25:18,870 --> 00:25:21,450 The root n's are just keeping track of the size of the things 421 00:25:21,450 --> 00:25:22,641 we're manipulating. 422 00:25:26,852 --> 00:25:30,170 All right. 423 00:25:30,170 --> 00:25:34,730 So it remains to prove this claim. 424 00:25:38,170 --> 00:25:39,790 Prove that claim, we get that claim, 425 00:25:39,790 --> 00:25:40,998 and then we get this theorem. 426 00:26:05,630 --> 00:26:08,585 So proof of claim. 427 00:26:16,130 --> 00:26:18,860 We're going to do an information theoretic 428 00:26:18,860 --> 00:26:21,950 argument, so let me set it up. 429 00:26:21,950 --> 00:26:24,050 It's again, it's making this claim 430 00:26:24,050 --> 00:26:26,180 I said before, that the permutations that 431 00:26:26,180 --> 00:26:27,830 get written over here somehow have 432 00:26:27,830 --> 00:26:29,750 to be communicated to the queries over here, 433 00:26:29,750 --> 00:26:33,380 because they matter. 
434 00:26:33,380 --> 00:26:36,670 Because the permutations that get set over here 435 00:26:36,670 --> 00:26:39,290 change the answers to all the queries over here, 436 00:26:39,290 --> 00:26:42,860 because of the interleaving between left and right. 437 00:26:42,860 --> 00:26:45,710 So how are we going to formalize that? 438 00:26:45,710 --> 00:26:57,950 Well, left subtree does l/2 updates 439 00:26:57,950 --> 00:27:06,614 with l/2 random permutations, uniform random permutations, 440 00:27:06,614 --> 00:27:08,030 because every node does an update. 441 00:27:10,910 --> 00:27:13,760 And so the information theoretic idea 442 00:27:13,760 --> 00:27:25,790 is that if we were to somehow encode those permutations, 443 00:27:25,790 --> 00:27:34,720 that encoding must use omega l log l-- 444 00:27:34,720 --> 00:27:35,220 l? 445 00:27:35,220 --> 00:27:36,630 No, I'm sorry. 446 00:27:36,630 --> 00:27:39,020 It's not right. 447 00:27:39,020 --> 00:27:45,905 Off by some root n factors here, l root n log n. 448 00:27:45,905 --> 00:27:48,380 OK, each permutation must take root n 449 00:27:48,380 --> 00:27:51,470 log root n bits to encode. 450 00:27:51,470 --> 00:27:53,270 If you have a random permutation, 451 00:27:53,270 --> 00:27:56,030 expected number of bits, or with very high probability. 452 00:27:56,030 --> 00:27:57,530 Almost every permutation requires 453 00:27:57,530 --> 00:27:59,484 root n log root n bits. 454 00:27:59,484 --> 00:28:01,400 I'm not going to worry about constant factors, 455 00:28:01,400 --> 00:28:04,197 put an omega here, so the root n turns into an n. 456 00:28:04,197 --> 00:28:05,780 And then we've got l over two of them, 457 00:28:05,780 --> 00:28:09,691 so again, ignoring constant factors, that's l root n log n 458 00:28:09,691 --> 00:28:10,190 bits. 459 00:28:13,440 --> 00:28:17,510 And this is just an information theoretic fact, 460 00:28:17,510 --> 00:28:19,490 our common [INAUDIBLE] theory fact.
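To make the counting above concrete, here is a small sketch, under stated assumptions, of the information theoretic lower bound: a uniformly random permutation of k elements needs about log2(k!) bits on average, and the left subtree holds l/2 independent permutations of root n elements each. The function names and the sample values of n and l are hypothetical, chosen only for illustration.

```python
import math


def bits_for_permutation(k: int) -> float:
    """log2(k!): the minimum average number of bits needed to encode
    a uniformly random permutation of k elements."""
    return math.lgamma(k + 1) / math.log(2)


def bits_for_left_subtree(n: int, l: int) -> float:
    """Bits for l/2 independent uniform permutations of sqrt(n) elements."""
    k = math.isqrt(n)
    return (l // 2) * bits_for_permutation(k)


n, l = 2 ** 20, 16  # illustrative values only
exact = bits_for_left_subtree(n, l)
target = l * math.isqrt(n) * math.log2(n)  # the l * sqrt(n) * log n quantity
# The ratio stays bounded below by a constant, which is all the omega needs.
print(exact, target, exact / target)
```

The ratio is a constant well below 1, which is consistent with the transcript's choice to ignore constant factors and write the bound as omega of l root n log n.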
461 00:28:19,490 --> 00:28:25,460 And once we know that, the idea is let's 462 00:28:25,460 --> 00:28:28,010 find an encoding that's better than this, 463 00:28:28,010 --> 00:28:29,680 and get a contradiction. 464 00:28:29,680 --> 00:28:31,430 Of course we shouldn't get a contradiction 465 00:28:31,430 --> 00:28:33,530 unless this claim is false. 466 00:28:33,530 --> 00:28:36,110 So either this claim is true and we're happy, 467 00:28:36,110 --> 00:28:38,240 or if somehow there were not enough cell 468 00:28:38,240 --> 00:28:40,940 reads on the right that read things that were written 469 00:28:40,940 --> 00:28:43,550 on the left, then we will, from that, 470 00:28:43,550 --> 00:28:48,590 get a smaller encoding of the update permutations 471 00:28:48,590 --> 00:28:50,330 that happen on the left. 472 00:28:50,330 --> 00:28:51,830 If we could somehow do that, then we 473 00:28:51,830 --> 00:28:55,130 can get a contradiction, and therefore conclude the claim 474 00:28:55,130 --> 00:28:57,770 is in fact true. 475 00:28:57,770 --> 00:29:15,800 So, if the claim fails, we'll find a smaller encoding, which 476 00:29:15,800 --> 00:29:17,442 will give us a contradiction. 477 00:29:24,050 --> 00:29:32,260 All right, so let's set up this problem a little bit more. 478 00:29:32,260 --> 00:29:34,900 I'm going to-- because we're really 479 00:29:34,900 --> 00:29:39,460 just interested in this subtree v, stuff on the left, stuff 480 00:29:39,460 --> 00:29:42,420 on the right, but this of course lives in a much bigger tree, 481 00:29:42,420 --> 00:29:44,320 there's stuff that happens over here. 482 00:29:44,320 --> 00:29:46,870 This I will call the past. 483 00:29:46,870 --> 00:29:51,160 I'm just going to assume we know everything about the past. 484 00:29:51,160 --> 00:29:55,090 Everything to the left of the subtree, 485 00:29:55,090 --> 00:29:56,530 we can assume that we know. 486 00:29:56,530 --> 00:29:59,420 When I say we know, what do we know?
487 00:29:59,420 --> 00:30:02,020 We know all the updates, we know all the queries that happen, 488 00:30:02,020 --> 00:30:04,220 and we know, at this moment in particular, 489 00:30:04,220 --> 00:30:06,790 what is the state of the data structure. 490 00:30:06,790 --> 00:30:09,760 Because this claim has nothing to do with this stuff, 491 00:30:09,760 --> 00:30:12,850 it's all about reads here that correspond to writes here. 492 00:30:12,850 --> 00:30:15,610 So we can just assume we know everything up to this point. 493 00:30:15,610 --> 00:30:18,946 In our encoding, this is a key point. 494 00:30:18,946 --> 00:30:21,160 One way to say this in a probabilistic sense 495 00:30:21,160 --> 00:30:23,680 is we're conditioning on what happened over here 496 00:30:23,680 --> 00:30:25,820 on the left, what updates happened. 497 00:30:25,820 --> 00:30:28,450 And if we can prove that whatever we need to happen here 498 00:30:28,450 --> 00:30:31,150 holds no matter what the condition is, 499 00:30:31,150 --> 00:30:33,190 then it will hold overall. 500 00:30:33,190 --> 00:30:35,800 So that's the probabilistic justification 501 00:30:35,800 --> 00:30:40,420 for why we can assume we know the past. OK. 502 00:30:40,420 --> 00:30:48,312 So then our goal is to encode, this 503 00:30:48,312 --> 00:30:50,020 is a little bit different from this goal. 504 00:31:07,050 --> 00:31:10,700 What we really want to do is encode the update permutations 505 00:31:10,700 --> 00:31:12,825 on the left. 506 00:31:12,825 --> 00:31:14,450 That's a little awkward to think about, 507 00:31:14,450 --> 00:31:17,030 because this is a claim about how many probes 508 00:31:17,030 --> 00:31:18,800 happen on the right. 509 00:31:18,800 --> 00:31:20,690 So instead, what we're going to do 510 00:31:20,690 --> 00:31:24,200 is encode the query permutations on the right.
511 00:31:24,200 --> 00:31:27,230 So there are updates over here, that's what we want to encode, 512 00:31:27,230 --> 00:31:29,590 but we're instead going to encode the queries over here. 513 00:31:29,590 --> 00:31:32,480 I claim if you know what the results of the queries 514 00:31:32,480 --> 00:31:35,480 were over here, then you know what 515 00:31:35,480 --> 00:31:38,470 the updates were over there. 516 00:31:38,470 --> 00:31:41,560 Basically because of this interleaving property. 517 00:31:41,560 --> 00:31:43,945 So I can write that down a little more formally. 518 00:31:54,500 --> 00:31:57,730 So if we look at time here, over, 519 00:31:57,730 --> 00:31:59,020 let's say this is v's subtree. 520 00:32:03,510 --> 00:32:07,500 Then what we have are a sequence of updates 521 00:32:07,500 --> 00:32:09,490 and a sequence of queries. 522 00:32:13,354 --> 00:32:18,955 These are queries, and these are updates. 523 00:32:21,460 --> 00:32:24,790 This is what the sequence looks like-- 524 00:32:24,790 --> 00:32:28,770 sorry, this is v's subtree, these are the pi i's, I should say. 525 00:32:31,830 --> 00:32:34,700 I mean, these operations all happen over time, 526 00:32:34,700 --> 00:32:38,436 but now I'm sorting by i. 527 00:32:38,436 --> 00:32:41,239 A little confusing. 528 00:32:41,239 --> 00:32:43,030 There are two orders to think about, right? 529 00:32:43,030 --> 00:32:46,820 There's the sequence over time, we're 530 00:32:46,820 --> 00:32:48,470 now looking at such a left subtree 531 00:32:48,470 --> 00:32:51,230 where we do say 1, 5, and 3, 7. 532 00:32:51,230 --> 00:32:53,690 What that means-- so you're imagining here, this is 1, 533 00:32:53,690 --> 00:32:55,580 this is 5, this is 3, this is 7. 534 00:32:55,580 --> 00:32:58,240 Here we're sorted by the value written down there, 535 00:32:58,240 --> 00:33:00,770 we're sorting by the i, the pi i that they're 536 00:33:00,770 --> 00:33:03,080 changing or querying.
537 00:33:03,080 --> 00:33:13,130 And so all the read things are in the right subtree of v. 538 00:33:13,130 --> 00:33:23,090 And all the updates are in the left subtree of v. 539 00:33:23,090 --> 00:33:24,539 This is the interleaving property 540 00:33:24,539 --> 00:33:25,580 that I mentioned earlier. 541 00:33:28,140 --> 00:33:32,180 So I claim that if I encode the results of the queries, 542 00:33:32,180 --> 00:33:35,990 namely I encode these permutations, 543 00:33:35,990 --> 00:33:37,220 these are like summary-- 544 00:33:37,220 --> 00:33:37,970 partial sums. 545 00:33:37,970 --> 00:33:41,480 These are prefix sums of the permutation list. 546 00:33:41,480 --> 00:33:43,970 Then I can figure out what the updates were. 547 00:33:43,970 --> 00:33:44,900 Why? 548 00:33:44,900 --> 00:33:48,470 Because if I figure out what this query, what 549 00:33:48,470 --> 00:33:50,000 its permutation is, that's the sum 550 00:33:50,000 --> 00:33:52,110 of all of these permutations. 551 00:33:52,110 --> 00:33:55,250 Now only one of them changed in the left subtree, 552 00:33:55,250 --> 00:33:58,190 the rest all are in the past. 553 00:33:58,190 --> 00:34:01,580 They were all set before this time over here, 554 00:34:01,580 --> 00:34:04,640 and I know everything about the past, I'm assuming. 555 00:34:04,640 --> 00:34:07,730 So most of these I already know, the one thing I don't know 556 00:34:07,730 --> 00:34:13,010 is this one, but I claim if I know this sum, 557 00:34:13,010 --> 00:34:15,770 and I know all the others, then I can figure out 558 00:34:15,770 --> 00:34:17,239 what this one is, right? 559 00:34:17,239 --> 00:34:20,469 It's slightly awkward to do, if I give you this, 560 00:34:20,469 --> 00:34:29,960 I give you the sum of pi j from j equals 0 to i, or something. 561 00:34:29,960 --> 00:34:33,320 I've got to-- 562 00:34:33,320 --> 00:34:37,719 I want to strip away all these, strip away all these.
563 00:34:37,719 --> 00:34:44,659 So I'm going to multiply by sum of pi j inverses over here, 564 00:34:44,659 --> 00:34:48,350 and multiply by sum pi j-- 565 00:34:48,350 --> 00:34:50,760 when I say multiply, I mean compose. 566 00:34:50,760 --> 00:34:53,060 Sum pi j inverses here, maybe let's not 567 00:34:53,060 --> 00:34:54,870 worry about the exact indices here. 568 00:34:54,870 --> 00:34:58,220 But the point is, this is all in the past, 569 00:34:58,220 --> 00:35:01,700 and this is all in the past, so I know all these pi js, 570 00:35:01,700 --> 00:35:03,750 I know their inverses. 571 00:35:03,750 --> 00:35:05,750 So if I have this total sum, and I right 572 00:35:05,750 --> 00:35:07,340 multiply with these inverses, left 573 00:35:07,340 --> 00:35:10,400 multiply with these inverses, I get the one that I want. 574 00:35:10,400 --> 00:35:19,190 This gives me some particular pi k, if I set the indices right. 575 00:35:22,260 --> 00:35:22,760 OK? 576 00:35:22,760 --> 00:35:26,330 So if I know this query, I figure out what this update is. 577 00:35:26,330 --> 00:35:29,480 Now once I know what this update is, and I know this query, then 578 00:35:29,480 --> 00:35:33,440 in this sum, I know everything except this one thing. 579 00:35:33,440 --> 00:35:35,740 And so by using the same trick, I 580 00:35:35,740 --> 00:35:37,280 can figure out what this update is. 581 00:35:37,280 --> 00:35:39,230 So now I know the first two updates, 582 00:35:39,230 --> 00:35:41,000 if I then know the answer to this query, 583 00:35:41,000 --> 00:35:42,200 I can figure out what this update is. 584 00:35:42,200 --> 00:35:43,190 If I know the answer to this query, 585 00:35:43,190 --> 00:35:44,231 I can figure this update. 586 00:35:44,231 --> 00:35:46,060 Because they're perfectly interleaved, 587 00:35:46,060 --> 00:35:49,290 I only need to reconstruct one update at a time.
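The "strip away the known factors" step above can be sketched in code. This is an illustrative reconstruction under assumptions, not the lecture's own notation: permutations are tuples, "sum" means composition, and the index lists, helper names, and sizes are all hypothetical. Updated and queried indices are assumed to interleave as u0 <= q0 < u1 <= q1 < ..., so each query answer contains exactly one factor not yet known.

```python
import random


def compose(p, q):
    """Return the permutation 'apply q first, then p' (tuples of ints)."""
    return tuple(p[q[x]] for x in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def prefix(perms, i):
    """Composition perms[0] o perms[1] o ... o perms[i]."""
    result = tuple(range(len(perms[0])))
    for p in perms[: i + 1]:
        result = compose(result, p)
    return result


def recover_updates(past, update_idx, query_idx, query_answers):
    """Rebuild the updated entries from the prefix compositions, one at a time."""
    known = list(past)  # entries at not-yet-recovered update positions are stale
    identity = tuple(range(len(past[0])))
    for u, q, answer in zip(update_idx, query_idx, query_answers):
        before = prefix(known, u - 1) if u > 0 else identity
        after = identity
        for p in known[u + 1 : q + 1]:
            after = compose(after, p)
        # answer = before o pi_u o after, so strip the known factors on both sides.
        known[u] = compose(compose(inverse(before), answer), inverse(after))
    return [known[u] for u in update_idx]


rng = random.Random(0)
size, m = 5, 8


def rand_perm():
    p = list(range(size))
    rng.shuffle(p)
    return tuple(p)


past = [rand_perm() for _ in range(m)]
update_idx, query_idx = [0, 2, 4, 6], [1, 3, 5, 7]
new_values = [rand_perm() for _ in update_idx]
current = list(past)
for u, p in zip(update_idx, new_values):
    current[u] = p  # the left subtree's updates
answers = [prefix(current, q) for q in query_idx]  # the right subtree's sums
recovered = recover_updates(past, update_idx, query_idx, answers)
print(recovered == new_values)
```

The decoder never looks at `new_values`; it only uses the past and the query answers, which is the point of the argument.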
588 00:35:49,290 --> 00:35:50,360 So if I'm given-- 589 00:35:50,360 --> 00:35:54,260 if I've somehow encoded all of the query results, 590 00:35:54,260 --> 00:35:58,430 all of these prefix sums, and I'm given the past, 591 00:35:58,430 --> 00:36:01,490 then I can reconstruct what all the updates were. 592 00:36:01,490 --> 00:36:05,570 So that's basically saying these two are the same issue. 593 00:36:05,570 --> 00:36:08,420 If I can encode the verify sums in the right subtree, 594 00:36:08,420 --> 00:36:11,180 using less than l root n log n bits, 595 00:36:11,180 --> 00:36:13,330 then I'll get a contradiction, because it implies 596 00:36:13,330 --> 00:36:15,170 that from that same encoding, you 597 00:36:15,170 --> 00:36:17,630 can also decode the update permutations 598 00:36:17,630 --> 00:36:20,459 in the left subtree. 599 00:36:20,459 --> 00:36:21,250 So that's our goal. 600 00:36:24,761 --> 00:36:25,260 OK. 601 00:36:28,180 --> 00:36:33,592 So we'd like to prove this for verify sum. 602 00:36:33,592 --> 00:36:35,050 But the first thing I'm going to do 603 00:36:35,050 --> 00:36:39,640 is consider an easier problem, which is sum. 604 00:36:39,640 --> 00:36:43,740 So suppose, basically, this was not an input to the query. 605 00:36:43,740 --> 00:36:47,820 Suppose the query was, what is the sum of i? 606 00:36:47,820 --> 00:36:48,750 Like this. 607 00:36:48,750 --> 00:36:52,360 I just want-- this is the partial sum problem. 608 00:36:52,360 --> 00:36:54,360 I'm given an index i, I want to know 609 00:36:54,360 --> 00:36:58,080 what is the permutation from pi 0 up to pi i. 610 00:36:58,080 --> 00:36:59,720 Now that is not-- 611 00:36:59,720 --> 00:37:01,720 that doesn't correspond to dynamic connectivity, 612 00:37:01,720 --> 00:37:03,010 it's a new problem. 613 00:37:03,010 --> 00:37:05,051 We'll first prove a lower bound for that problem, 614 00:37:05,051 --> 00:37:07,545 and then we'll put the verify word back in.
615 00:37:07,545 --> 00:37:14,660 OK, so that's-- we're now here at sum lower bound. 616 00:37:14,660 --> 00:37:15,620 Where should I go? 617 00:37:18,280 --> 00:37:21,269 Different-- so this is a lower bound on the operation sum, 618 00:37:21,269 --> 00:37:23,560 as opposed to here, where we're adding up lower bounds. 619 00:37:23,560 --> 00:37:27,490 Sorry for the conflation of terms. 620 00:37:27,490 --> 00:37:30,170 Let's go here. 621 00:37:49,320 --> 00:37:50,730 So I'll call this a warm up. 622 00:37:58,690 --> 00:38:02,150 Suppose a query is sum of i, which 623 00:38:02,150 --> 00:38:08,290 is supposed to give you this prefix sum of pi j again, 624 00:38:08,290 --> 00:38:09,980 sum means composition. 625 00:38:13,060 --> 00:38:19,560 So this is going to be relatively easy to prove, 626 00:38:19,560 --> 00:38:22,230 but it's not the problem we actually want to solve, 627 00:38:22,230 --> 00:38:25,660 we'll use it to then solve the real problem. 628 00:38:25,660 --> 00:38:28,390 And this is the order in which we actually solve things. 629 00:38:28,390 --> 00:38:31,200 First, we prove a lower bound of partial sums. 630 00:38:31,200 --> 00:38:34,500 OK, so let me give you some notation, 631 00:38:34,500 --> 00:38:38,910 so we can really get at this claim. 632 00:38:38,910 --> 00:38:42,630 Reading on the right, writing on the left. 633 00:38:42,630 --> 00:38:44,580 So let r be all the cells that are 634 00:38:44,580 --> 00:38:51,270 read during the right subtree, which is an interval of time. 635 00:38:54,150 --> 00:38:59,130 And let w be the cells written in the left subtree. 636 00:39:12,440 --> 00:39:14,440 OK, so what we're talking about over here 637 00:39:14,440 --> 00:39:17,645 is that r intersects w, those are cells that 638 00:39:17,645 --> 00:39:19,270 are read during the right subtree, that 639 00:39:19,270 --> 00:39:22,420 were at some point written during the left subtree, 640 00:39:22,420 --> 00:39:23,506 should be large. 
641 00:39:23,506 --> 00:39:24,880 So we want to prove a lower bound 642 00:39:24,880 --> 00:39:26,777 on the size of r intersect w. 643 00:39:30,760 --> 00:39:34,090 So if the lower bound doesn't hold, 644 00:39:34,090 --> 00:39:37,880 that means that r intersect w is relatively small. 645 00:39:37,880 --> 00:39:40,550 So imagine a situation where r intersect w is very small, 646 00:39:40,550 --> 00:39:42,010 there's not very much information 647 00:39:42,010 --> 00:39:44,380 passed from the left subtree to the right subtree. 648 00:39:44,380 --> 00:39:46,630 If r intersect w is small, then presumably I 649 00:39:46,630 --> 00:39:49,732 can afford to write it down, I can encode it. 650 00:39:49,732 --> 00:39:51,940 So that's what we're going to do, and we'll compute-- 651 00:39:51,940 --> 00:39:55,350 we'll figure out that this is indeed something we can afford. 652 00:39:55,350 --> 00:40:00,050 I'm going to encode r intersect w explicitly. 653 00:40:00,050 --> 00:40:05,470 Meaning-- and this is a set of cells in memory. 654 00:40:05,470 --> 00:40:07,480 So for every cell, I'm going to write down 655 00:40:07,480 --> 00:40:12,310 what its address is, and what the contents of the cell are. 656 00:40:12,310 --> 00:40:18,850 So write down the addresses and the contents 657 00:40:18,850 --> 00:40:20,720 for every such cell. 658 00:40:20,720 --> 00:40:23,830 So how many bits does that take? 659 00:40:23,830 --> 00:40:30,115 I'm going to say that it's r intersect w times log n bits. 660 00:40:33,250 --> 00:40:35,830 Here's where I need to mention an assumption. 661 00:40:35,830 --> 00:40:39,920 I'm assuming that the address space is order log n bits long, 662 00:40:39,920 --> 00:40:42,280 that's like saying that the space of your data structure 663 00:40:42,280 --> 00:40:44,050 is order-- 664 00:40:44,050 --> 00:40:46,160 is polynomial in n.
665 00:40:46,160 --> 00:40:49,000 And if you want any hope of having a reasonable update 666 00:40:49,000 --> 00:40:51,594 time, you need to have polynomial space at most. 667 00:40:51,594 --> 00:40:54,010 So assuming polynomial space, each of those addresses only 668 00:40:54,010 --> 00:40:56,140 takes order log n bits to write down. 669 00:40:56,140 --> 00:41:00,417 The contents, let's say, also take order log n bits 670 00:41:00,417 --> 00:41:01,000 to write down. 671 00:41:04,020 --> 00:41:08,880 OK, so fine. 672 00:41:08,880 --> 00:41:12,412 That's-- I mean, yeah. 673 00:41:12,412 --> 00:41:14,370 We don't really need to make those assumptions, 674 00:41:14,370 --> 00:41:20,140 I don't think, but we will for here to keep things simple. 675 00:41:20,140 --> 00:41:22,740 So if r intersect w is small, meaning smaller 676 00:41:22,740 --> 00:41:28,470 than this thing, then this will be small, smaller than l root n 677 00:41:28,470 --> 00:41:29,238 log n. 678 00:41:32,516 --> 00:41:33,450 OK. 679 00:41:33,450 --> 00:41:38,040 So on the other hand, we know that every encoding should 680 00:41:38,040 --> 00:41:41,360 take l root n log n bits. 681 00:41:41,360 --> 00:41:44,190 And so this will be a contradiction, 682 00:41:44,190 --> 00:41:47,670 although we haven't quite encoded what we need yet, 683 00:41:47,670 --> 00:41:50,250 or we haven't proved that, but we're getting 684 00:41:50,250 --> 00:41:51,390 to be at the right point. 685 00:41:51,390 --> 00:41:55,890 These log ns are going to cancel in a moment. 686 00:41:55,890 --> 00:41:59,010 So what we need to do is, I claim this is actually 687 00:41:59,010 --> 00:42:00,790 enough to encode what we need. 688 00:42:00,790 --> 00:42:10,100 And so all that's left is a decoding algorithm for the sum 689 00:42:10,100 --> 00:42:12,970 queries in the right subtree. 690 00:42:20,710 --> 00:42:22,960 So how are we going to do that?
691 00:42:22,960 --> 00:42:24,750 So this is my encoding, these are the bits 692 00:42:24,750 --> 00:42:26,340 that I have written down. 693 00:42:26,340 --> 00:42:29,580 So now what I know, as a decoder, 694 00:42:29,580 --> 00:42:32,590 is I know everything about the past. 695 00:42:32,590 --> 00:42:34,390 I don't know what these updates are, 696 00:42:34,390 --> 00:42:36,600 that's my whole goal, to figure out what they are. 697 00:42:36,600 --> 00:42:38,730 I don't know what the results of the queries 698 00:42:38,730 --> 00:42:41,050 are, but magically, I know that r intersect w. 699 00:42:41,050 --> 00:42:42,030 Well, not magically. 700 00:42:42,030 --> 00:42:45,670 I wrote it down, kept track on a piece of paper. 701 00:42:45,670 --> 00:42:47,580 So that's what I know. 702 00:42:47,580 --> 00:42:51,030 And so the idea is, well, somebody 703 00:42:51,030 --> 00:42:53,550 gave us a data structure, tells you how to do an update, 704 00:42:53,550 --> 00:42:55,090 tells you how to do a query. 705 00:42:55,090 --> 00:42:58,557 Let's run the query algorithms over here. 706 00:42:58,557 --> 00:43:00,390 Run that query, run that query, or whatever. 707 00:43:03,270 --> 00:43:05,310 It's a little hard to run them, because we 708 00:43:05,310 --> 00:43:08,130 don't know what happened in this intermediate part. 709 00:43:08,130 --> 00:43:12,430 But I claim r intersect w tells us everything we need to know. 710 00:43:12,430 --> 00:43:23,220 So the decoding algorithm is just simulate sum queries, 711 00:43:23,220 --> 00:43:24,942 simulate that algorithm. 712 00:43:35,780 --> 00:43:38,380 And let's go up here. 713 00:43:54,160 --> 00:43:55,680 How do we simulate that algorithm? 714 00:43:55,680 --> 00:43:58,870 Well, the algorithm makes a series of cell 715 00:43:58,870 --> 00:44:03,580 reads, and maybe writes, but really we care about the reads. 716 00:44:03,580 --> 00:44:05,565 Writes are pretty easy to simulate. 
717 00:44:21,590 --> 00:44:23,470 There are three cases for reads. 718 00:44:23,470 --> 00:44:25,870 It could be that the thing you're trying to read 719 00:44:25,870 --> 00:44:27,340 was written in the right subtree, 720 00:44:27,340 --> 00:44:29,650 it could be that it was written in the left subtree, 721 00:44:29,650 --> 00:44:31,730 or it could be it was written in the past, 722 00:44:31,730 --> 00:44:36,177 before we got to v's subtree. 723 00:44:36,177 --> 00:44:38,260 Now we don't necessarily know which case we're in, 724 00:44:38,260 --> 00:44:40,890 but I claim we'll be able to figure it out. 725 00:44:40,890 --> 00:44:45,789 Because any cells that are written in the right subtree, 726 00:44:45,789 --> 00:44:47,830 we've just been running the simulation algorithm, 727 00:44:47,830 --> 00:44:50,140 so every time we do a write, we just 728 00:44:50,140 --> 00:44:51,460 can store it off to the side. 729 00:44:51,460 --> 00:44:54,250 So when we're doing simulations, we 730 00:44:54,250 --> 00:44:57,100 don't need that the simulation takes low space. 731 00:44:57,100 --> 00:44:59,750 We just need that the input-- this decoding algorithm 732 00:44:59,750 --> 00:45:01,291 doesn't have to be low space, we just 733 00:45:01,291 --> 00:45:02,830 need that the encoding was small. 734 00:45:02,830 --> 00:45:05,010 We've already made the encoding small. 735 00:45:05,010 --> 00:45:06,400 And so the decoding algorithm can 736 00:45:06,400 --> 00:45:08,290 spend lots of time and space, we just 737 00:45:08,290 --> 00:45:10,540 need to show that decoding algorithm can recover 738 00:45:10,540 --> 00:45:11,790 what it's supposed to recover. 739 00:45:11,790 --> 00:45:13,430 It's like a compression algorithm, 740 00:45:13,430 --> 00:45:15,160 to show there's some way to decompress, 741 00:45:15,160 --> 00:45:17,160 could take an arbitrary amount of time and space.
742 00:45:17,160 --> 00:45:20,020 So when we're simulating the right subtree, 743 00:45:20,020 --> 00:45:23,725 and we simulate not only the sum queries, but also the updates. 744 00:45:27,370 --> 00:45:30,100 So whatever gets written during that simulation, 745 00:45:30,100 --> 00:45:33,610 we just store it, and so it's easy to reread it. 746 00:45:33,610 --> 00:45:36,250 If it was written in the left subtree, 747 00:45:36,250 --> 00:45:38,245 well, that is r intersect w. 748 00:45:40,990 --> 00:45:42,910 And we've written down r intersect w. 749 00:45:42,910 --> 00:45:44,470 So we can detect that this happened, 750 00:45:44,470 --> 00:45:47,000 because we look at r intersect w, 751 00:45:47,000 --> 00:45:48,970 we see, oh that word was in there, 752 00:45:48,970 --> 00:45:51,220 that address was in there, and so 753 00:45:51,220 --> 00:45:55,120 we read the contents from the encoding. 754 00:45:55,120 --> 00:46:00,420 If it was in the past, it's also easy. 755 00:46:00,420 --> 00:46:03,100 We already know it. 756 00:46:03,100 --> 00:46:07,690 OK, so basically what we do-- 757 00:46:07,690 --> 00:46:09,880 what the simulation algorithm is doing is it says, 758 00:46:09,880 --> 00:46:12,460 OK, let's assume that main memory was 759 00:46:12,460 --> 00:46:14,477 whatever it was at this point. 760 00:46:14,477 --> 00:46:17,060 That data structure, I mean we know everything about the past, 761 00:46:17,060 --> 00:46:18,768 so we know what the data structure looked 762 00:46:18,768 --> 00:46:20,710 like at this moment, store that. 763 00:46:20,710 --> 00:46:23,530 Update all of the cells that are in r intersect 764 00:46:23,530 --> 00:46:26,050 w given by our encoding. 765 00:46:26,050 --> 00:46:28,520 And then just run the algorithm. 766 00:46:28,520 --> 00:46:31,480 So we're sort of jumping into this moment in time 767 00:46:31,480 --> 00:46:34,360 with a slightly weird data structure. 768 00:46:34,360 --> 00:46:37,244 It's not the correct data structure. 
769 00:46:37,244 --> 00:46:39,160 It's not what the data structure will actually 770 00:46:39,160 --> 00:46:41,470 look like at this point, but it's close enough. 771 00:46:41,470 --> 00:46:44,530 Because anything that's read here, 772 00:46:44,530 --> 00:46:46,300 either was written here, in which case 773 00:46:46,300 --> 00:46:48,960 it's correct, or was written here, in which case 774 00:46:48,960 --> 00:46:53,250 it's correct because r intersect w had it. 775 00:46:53,250 --> 00:46:56,290 Or it was written here, in which case-- 776 00:46:56,290 --> 00:46:59,060 maybe it's always correct. 777 00:46:59,060 --> 00:46:59,560 No, no. 778 00:46:59,560 --> 00:47:01,120 See there could be some writes that 779 00:47:01,120 --> 00:47:04,397 happened here, where there's no corresponding read over here. 780 00:47:04,397 --> 00:47:06,730 So the data structure may have been changed in ways here 781 00:47:06,730 --> 00:47:10,430 that don't matter for this execution of the right subtree. 782 00:47:10,430 --> 00:47:13,300 So any writes that happened here to some cell probe, 783 00:47:13,300 --> 00:47:15,496 to some cell, where that cell is not read over here, 784 00:47:15,496 --> 00:47:16,870 we don't care about, because they 785 00:47:16,870 --> 00:47:18,820 don't affect the simulation. 786 00:47:18,820 --> 00:47:20,830 So we have a good enough data structure 787 00:47:20,830 --> 00:47:23,440 here, it may not be completely accurate, 788 00:47:23,440 --> 00:47:26,170 but it's accurate enough to run these queries. 789 00:47:26,170 --> 00:47:28,760 Once we run the queries, the queries output the sums. 790 00:47:28,760 --> 00:47:31,450 That's what we're assuming in this warm up, we run the query, 791 00:47:31,450 --> 00:47:33,160 we get the sum.
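The three read cases used by the decoder can be sketched as a small memory wrapper. This is a toy illustration under assumptions, not the lecture's construction in full: cells are dictionary entries, the class and variable names are hypothetical, and the "query algorithm" is reduced to a list of addresses it happens to read.

```python
class DecoderMemory:
    """Memory view used by the decoder while replaying the right subtree."""

    def __init__(self, past_snapshot, encoded_r_cap_w):
        self.past = past_snapshot          # cell -> contents just before v's subtree
        self.from_left = encoded_r_cap_w   # cell -> contents, only for cells in R intersect W
        self.from_right = {}               # writes made during the replay itself

    def write(self, address, value):
        # Writes in the right subtree are just remembered off to the side.
        self.from_right[address] = value

    def read(self, address):
        if address in self.from_right:     # case 1: written in the right subtree
            return self.from_right[address]
        if address in self.from_left:      # case 2: written in the left subtree
            return self.from_left[address]
        return self.past[address]          # case 3: last written in the past


# Toy scenario: the left subtree overwrites cells 1 and 2 (this is W),
# while the right subtree only ever reads cells 0, 1 and 3 (this is R).
past = {0: "a", 1: "b", 2: "c", 3: "d"}
left_writes = {1: "B", 2: "C"}
cells_read_on_right = [0, 1, 3]
true_memory = {**past, **left_writes}

# The encoding stores only the cells in R intersect W, here just cell 1.
encoding = {a: v for a, v in left_writes.items() if a in cells_read_on_right}
decoder = DecoderMemory(past, encoding)

# Every read the right subtree actually performs sees the true contents.
print(all(decoder.read(a) == true_memory[a] for a in cells_read_on_right))
# Cell 2 is stale in the decoder's view, but it is never read on the right.
print(decoder.read(2), true_memory[2])
```

The decoder's memory is wrong at cell 2, yet the replay is unaffected, which mirrors the remark that the data structure is "not completely accurate, but accurate enough to run these queries."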
792 00:47:33,160 --> 00:47:35,302 Once I have that sum, as I argued before, 793 00:47:35,302 --> 00:47:37,510 once you know what the results of these queries were, 794 00:47:37,510 --> 00:47:41,680 I can figure out what the arguments to the updates were, 795 00:47:41,680 --> 00:47:44,810 by doing that inverse multiplication stuff. 796 00:47:44,810 --> 00:47:47,090 So that's actually it. 797 00:47:47,090 --> 00:47:51,650 What this implies is that this is a correct encoding, which 798 00:47:51,650 --> 00:47:56,360 means that this order, r intersect w times log 799 00:47:56,360 --> 00:48:02,690 n bits that we use to encode, must be at least this big. 800 00:48:02,690 --> 00:48:04,580 Because we know any encoding is going 801 00:48:04,580 --> 00:48:10,005 to require at least that many bits, l root n log n. 802 00:48:12,680 --> 00:48:16,280 And so the log ns cancel, and we're 803 00:48:16,280 --> 00:48:20,060 left with r intersect w is at least l root n. 804 00:48:20,060 --> 00:48:24,102 And this is exactly the quantity we cared about for this claim. 805 00:48:24,102 --> 00:48:29,450 So same thing, r intersect w is at least l root n. 806 00:48:29,450 --> 00:48:31,090 OK, so warm up done. 807 00:48:31,090 --> 00:48:33,960 Any questions about the warm up? 808 00:48:33,960 --> 00:48:35,870 So in this weird problem, which does not 809 00:48:35,870 --> 00:48:39,040 correspond to dynamic connectivity, 810 00:48:39,040 --> 00:48:43,190 because it's this other problem, prefix sums computation. 811 00:48:43,190 --> 00:48:46,190 We get the intended lower bound, you need log n per operation. 812 00:48:46,190 --> 00:48:50,710 Or you need root n log n per block operation. 813 00:48:50,710 --> 00:48:52,670 OK, but this is not what we really want, 814 00:48:52,670 --> 00:48:55,490 we really want a lower bound on verify sum. 815 00:48:55,490 --> 00:48:59,060 Where you're given as an argument the permutation that 816 00:48:59,060 --> 00:49:00,980 we're talking about over here. 
817 00:49:00,980 --> 00:49:06,170 So this goal is not the right goal for verify sum, 818 00:49:06,170 --> 00:49:07,040 in some sense. 819 00:49:07,040 --> 00:49:08,675 Well, sort of the right goal. 820 00:49:08,675 --> 00:49:10,550 It's a little awkward though, because they're 821 00:49:10,550 --> 00:49:12,980 given as inputs to the queries. 822 00:49:12,980 --> 00:49:17,240 So what is there to encode? 823 00:49:17,240 --> 00:49:21,230 Well, we can still set it up in a useful way. 824 00:49:21,230 --> 00:49:23,650 Same goal, slightly restated. 825 00:49:42,230 --> 00:49:46,200 So this is the last step to verify sum lower bound. 826 00:50:06,830 --> 00:50:08,232 So here's the set up. 827 00:51:05,010 --> 00:51:07,190 OK, so slightly different set up here. 828 00:51:07,190 --> 00:51:10,670 Here I assumed that we just knew the past. 829 00:51:10,670 --> 00:51:13,250 I also basically assumed these two things, 830 00:51:13,250 --> 00:51:16,814 that we didn't know what the update permutations were 831 00:51:16,814 --> 00:51:18,230 in the left subtree, and we didn't 832 00:51:18,230 --> 00:51:20,090 know what the answers to the queries 833 00:51:20,090 --> 00:51:21,609 were in the right subtree. 834 00:51:21,609 --> 00:51:23,150 Now I'm going to assume we don't even 835 00:51:23,150 --> 00:51:25,100 know what we're passing into the queries, 836 00:51:25,100 --> 00:51:27,860 because that is the information we're trying to figure out. 837 00:51:27,860 --> 00:51:29,630 These two things are basically the same, 838 00:51:29,630 --> 00:51:31,381 if you knew all the update permutations, 839 00:51:31,381 --> 00:51:33,380 you could figure out all the query permutations. 840 00:51:33,380 --> 00:51:35,060 If you knew all the query permutations, 841 00:51:35,060 --> 00:51:37,101 you could figure out all the update permutations. 
842 00:51:37,101 --> 00:51:39,590 That's what we argued over here, it's 843 00:51:39,590 --> 00:51:43,520 enough to figure out query permutations, 844 00:51:43,520 --> 00:51:46,070 then we could figure out the update permutations. 845 00:51:46,070 --> 00:51:49,410 It's just a little more awkward, because now there 846 00:51:49,410 --> 00:51:50,900 are arguments to queries. 847 00:51:50,900 --> 00:51:53,336 And so if we did this simulation, right? 848 00:51:53,336 --> 00:51:54,710 We'd simulate-- we don't know how 849 00:51:54,710 --> 00:51:57,251 to simulate the query algorithm, because it's supposed to be 850 00:51:57,251 --> 00:51:59,300 given the argument, which is what 851 00:51:59,300 --> 00:52:01,294 we're trying to figure out. 852 00:52:01,294 --> 00:52:06,460 So we can't simulate the query algorithm. 853 00:52:06,460 --> 00:52:08,440 It's kind of annoying, but otherwise 854 00:52:08,440 --> 00:52:10,166 the set up is roughly the same. 855 00:52:10,166 --> 00:52:11,790 The one thing we know is that the query 856 00:52:11,790 --> 00:52:15,130 is supposed to return yes, because if you 857 00:52:15,130 --> 00:52:17,620 look at this bad access sequence, 858 00:52:17,620 --> 00:52:20,980 it is designed to always return yes. 859 00:52:20,980 --> 00:52:24,855 So that is a thing we know, but we don't know the arguments 860 00:52:24,855 --> 00:52:26,980 to the updates on the left, we don't know arguments 861 00:52:26,980 --> 00:52:28,240 to the queries on the right. 862 00:52:28,240 --> 00:52:29,781 We'll assume we know everything else, 863 00:52:29,781 --> 00:52:31,630 basically, up to this time.
864 00:52:31,630 --> 00:52:33,940 Again, this is a probabilistic statement, 865 00:52:33,940 --> 00:52:38,410 that conditioned on the past, conditioned on the queries 866 00:52:38,410 --> 00:52:40,210 on the left, which probably don't matter, 867 00:52:40,210 --> 00:52:43,480 conditioned on the updates on the right, which do matter, 868 00:52:43,480 --> 00:52:46,900 but they're sort of irrelevant to this r intersect w issue. 869 00:52:46,900 --> 00:52:48,400 Conditioned on all those things will 870 00:52:48,400 --> 00:52:51,400 prove that the expected number of operations you need to-- 871 00:52:51,400 --> 00:52:55,510 or expected encoding size, for this problem, 872 00:52:55,510 --> 00:52:58,750 is at least what it is, l root n log n bits. 873 00:52:58,750 --> 00:53:01,720 And from that lower bound, you can then 874 00:53:01,720 --> 00:53:06,640 take the sum over all possible setups, over all conditions. 875 00:53:06,640 --> 00:53:09,580 And that implies a lower bound on the overall setting 876 00:53:09,580 --> 00:53:12,088 without these assumptions. 877 00:53:12,088 --> 00:53:13,520 OK? 878 00:53:13,520 --> 00:53:16,630 So all I'm saying is in this set up, 879 00:53:16,630 --> 00:53:19,236 it still takes a lot of bits to encode these updates, 880 00:53:19,236 --> 00:53:20,860 because we don't have the queries which 881 00:53:20,860 --> 00:53:22,587 would tell us the answers. 882 00:53:22,587 --> 00:53:24,670 So we get a lower bound on encoding these updates, 883 00:53:24,670 --> 00:53:26,290 or a lower bound on encoding these queries, 884 00:53:26,290 --> 00:53:27,944 because we assume we don't know them. 885 00:53:27,944 --> 00:53:29,860 The rest of the-- all the remaining operations 886 00:53:29,860 --> 00:53:34,320 don't tell us enough about this. 887 00:53:34,320 --> 00:53:35,720 OK. 
888 00:53:35,720 --> 00:53:37,790 So how the heck are we going to do-- 889 00:53:37,790 --> 00:53:40,730 prove a lower bound in this setting, when we can't 890 00:53:40,730 --> 00:53:43,624 simulate the query algorithm? 891 00:53:43,624 --> 00:53:45,290 There's one cool idea to make this work. 892 00:53:50,640 --> 00:53:55,050 You may recall our last cell probe lower bound 893 00:53:55,050 --> 00:53:56,340 for the predecessor problem. 894 00:54:00,570 --> 00:54:03,150 Use this idea of round elimination. 895 00:54:03,150 --> 00:54:06,190 The idea with round elimination was-- 896 00:54:06,190 --> 00:54:09,470 Alice is sending a message, Bob was sending a response. 897 00:54:09,470 --> 00:54:11,300 But that first m-- we set things up, 898 00:54:11,300 --> 00:54:14,240 we set up the problem so the first message sent by Alice 899 00:54:14,240 --> 00:54:18,350 had, on average, less than 1-bit of information to Bob, 900 00:54:18,350 --> 00:54:20,120 or very little information to Bob. 901 00:54:20,120 --> 00:54:23,120 And so what Bob could do is basically guess 902 00:54:23,120 --> 00:54:24,800 what that message was. 903 00:54:24,800 --> 00:54:27,470 And that would be accurate with some probability. 904 00:54:27,470 --> 00:54:28,910 Now here, we're not quite allowed 905 00:54:28,910 --> 00:54:30,050 to do that, we're not allowed to change 906 00:54:30,050 --> 00:54:31,550 the accuracy of our results, because 907 00:54:31,550 --> 00:54:33,860 of our particular setting. 908 00:54:33,860 --> 00:54:38,600 So we can't afford to just guess by flipping coins 909 00:54:38,600 --> 00:54:41,510 what we were supposed to know. 910 00:54:41,510 --> 00:54:43,340 What we're supposed to know here is-- 911 00:54:43,340 --> 00:54:45,470 we're trying to simulate a query operation, 912 00:54:45,470 --> 00:54:46,970 and so we need to know the argument, 913 00:54:46,970 --> 00:54:48,760 that whole permutation to the queries. 
914 00:54:48,760 --> 00:54:50,950 It's hard to run it without that permutation. 915 00:54:50,950 --> 00:54:52,880 So instead of guessing by flipping coins, 916 00:54:52,880 --> 00:54:54,920 we're going to guess in the dynamic programming 917 00:54:54,920 --> 00:54:58,310 sense, which is we're going to try all the possibilities. 918 00:54:58,310 --> 00:55:02,990 Run the simulation over all possible queries, 919 00:55:02,990 --> 00:55:05,231 all possible second arguments to the query. 920 00:55:05,231 --> 00:55:06,980 We don't know what the permutation is, so 921 00:55:06,980 --> 00:55:08,480 just try them all. 922 00:55:08,480 --> 00:55:14,180 Cool thing is, only one argument here should return yes. 923 00:55:14,180 --> 00:55:16,710 That's the one we're looking for. 924 00:55:16,710 --> 00:55:18,140 So if you try them all, find which 925 00:55:18,140 --> 00:55:21,844 one says yes, we'll be done. 926 00:55:21,844 --> 00:55:25,670 So this is called the decoding idea. 927 00:55:31,460 --> 00:55:45,240 Simulate verify sum of i comma pi, for all pi. 928 00:55:45,240 --> 00:55:48,794 And take the one that returns yes, that is our permutation. 929 00:55:48,794 --> 00:55:51,210 And so if we figure out what those query permutations are, 930 00:55:51,210 --> 00:55:53,376 then we figure out what the update permutations are, 931 00:55:53,376 --> 00:55:57,360 and we get our lower bounds just like before. 932 00:55:57,360 --> 00:55:59,940 OK. 933 00:55:59,940 --> 00:56:02,940 This is easier said than done, unfortunately. 934 00:56:02,940 --> 00:56:08,446 We'd like to run the simulation just like here, so 935 00:56:08,446 --> 00:56:09,570 simulate the query algorithm. 936 00:56:09,570 --> 00:56:11,669 We said, OK, it's still the case that you're 937 00:56:11,669 --> 00:56:13,710 reading a cell that's either in the left subtree, 938 00:56:13,710 --> 00:56:16,800 in the right subtree, or in the past. 939 00:56:16,800 --> 00:56:19,680 And we said this was easy, this was known.
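The decoding idea just described, try every candidate pi and keep the one that answers yes, can be sketched in a few lines. This is only a toy illustration and not the construction from the proof: the names compose, make_verify_sum and decode_query_argument are hypothetical, the query is treated as a black-box oracle rather than a simulated cell-probe algorithm, and the permutations are tiny so that brute force is feasible.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'apply p first, then q' on indices 0..k-1."""
    return tuple(q[p[x]] for x in range(len(p)))

def make_verify_sum(perms):
    """Toy oracle: verify_sum(i, pi) is True iff perms[0..i] compose to pi."""
    def verify_sum(i, pi):
        acc = tuple(range(len(perms[0])))   # start from the identity
        for p in perms[: i + 1]:
            acc = compose(acc, p)
        return acc == tuple(pi)
    return verify_sum

def decode_query_argument(verify_sum, i, size):
    """Try every candidate pi; exactly one of them should answer yes."""
    hits = [pi for pi in permutations(range(size)) if verify_sum(i, pi)]
    assert len(hits) == 1
    return hits[0]

# Hidden update permutations (made-up example data).
hidden = [(1, 2, 0), (0, 2, 1), (2, 1, 0)]
oracle = make_verify_sum(hidden)
recovered = decode_query_argument(oracle, 1, 3)
```

Here the decoder never sees hidden directly; it recovers the correct query argument only from the yes/no answers, which is the role the exhaustive simulation plays in the argument.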
940 00:56:19,680 --> 00:56:25,260 And the hard part is this case, because if we're 941 00:56:25,260 --> 00:56:28,320 running this query, and it reads something 942 00:56:28,320 --> 00:56:30,090 that was written in the left subtree, 943 00:56:30,090 --> 00:56:33,600 it may not be in r intersect w. 944 00:56:33,600 --> 00:56:35,894 Why is that? 945 00:56:35,894 --> 00:56:36,810 Little puzzle for you. 946 00:56:39,810 --> 00:56:42,670 So we're running one of these queries for some pi. 947 00:56:42,670 --> 00:56:45,430 And I claim that when we read something in the left subtree, 948 00:56:45,430 --> 00:56:49,422 we don't know if it's in r intersect w, it might not be. 949 00:56:59,754 --> 00:57:01,360 Let's see if we're on the same page. 950 00:57:01,360 --> 00:57:07,640 So r is the set of cells read during the right subtree 951 00:57:07,640 --> 00:57:11,740 when executing these operations. 952 00:57:11,740 --> 00:57:12,240 OK? 953 00:57:12,240 --> 00:57:16,180 But what we're doing now is simulating some executions 954 00:57:16,180 --> 00:57:19,210 that didn't necessarily happen. 955 00:57:19,210 --> 00:57:21,400 We're doing a verify sum of i comma pi, 956 00:57:21,400 --> 00:57:23,260 but in the bad access sequence, we 957 00:57:23,260 --> 00:57:24,910 did verify sum of i comma something 958 00:57:24,910 --> 00:57:28,150 specific, not any pi, but the correct pi. 959 00:57:28,150 --> 00:57:31,420 So we only ran the yes verify sums, 960 00:57:31,420 --> 00:57:33,910 and that's what r is defined with respect to. 961 00:57:33,910 --> 00:57:35,410 r is the set of things that get read 962 00:57:35,410 --> 00:57:38,050 during these operations, where the verify sum always 963 00:57:38,050 --> 00:57:38,590 outputs yes. 964 00:57:38,590 --> 00:57:42,070 If you now run a verify sum where the answer is no, 965 00:57:42,070 --> 00:57:46,390 it may read stuff that the other verify sum didn't read.
966 00:57:46,390 --> 00:57:48,820 Shouldn't matter, but it's awkward, 967 00:57:48,820 --> 00:57:51,890 because now it's not just r intersect w we need to encode. 968 00:57:51,890 --> 00:57:55,630 We need to encode some more stuff. 969 00:57:55,630 --> 00:57:57,760 It's basically a new r prime that 970 00:57:57,760 --> 00:58:01,180 may happen during these reads, and we just 971 00:58:01,180 --> 00:58:02,680 can't afford to encode that r prime, 972 00:58:02,680 --> 00:58:04,388 because it's not the thing we care about. 973 00:58:04,388 --> 00:58:06,880 We care about what happens in the actual access sequence, 974 00:58:06,880 --> 00:58:10,130 not in this arbitrary simulation. 975 00:58:10,130 --> 00:58:15,361 So this is the annoying thing. 976 00:58:15,361 --> 00:58:15,860 Trouble. 977 00:58:21,700 --> 00:58:28,780 If you look at an incorrect query, meaning the wrong pi, 978 00:58:28,780 --> 00:58:33,930 this is like a no query, the output's no. 979 00:58:33,930 --> 00:58:39,040 Reads some different set of cells, r 980 00:58:39,040 --> 00:58:42,770 prime, which isn't the same thing as r. 981 00:58:42,770 --> 00:58:49,300 And so if-- we have some good news, which 982 00:58:49,300 --> 00:58:52,150 is if we can somehow detect that this happened, 983 00:58:52,150 --> 00:58:58,270 that we read something that is in r prime, but not r, 984 00:58:58,270 --> 00:59:00,880 then the answer must be no. 985 00:59:05,950 --> 00:59:11,019 So that's our saving hope, is that either we're 986 00:59:11,019 --> 00:59:13,060 reading something at r intersect w, in which case 987 00:59:13,060 --> 00:59:15,460 it's been written down, we know how to do it. 988 00:59:15,460 --> 00:59:20,230 What's not written there, and if it's not written there, 989 00:59:20,230 --> 00:59:23,950 then it should be, hopefully, in r prime minus r. 990 00:59:23,950 --> 00:59:27,320 So the answer should be no. 991 00:59:27,320 --> 00:59:29,620 Maybe. 
992 00:59:29,620 --> 00:59:31,330 Slight problem, though, because we 993 00:59:31,330 --> 00:59:34,780 used r intersect w to detect what case we were in. 994 00:59:34,780 --> 00:59:36,610 If we were in r intersect w, then 995 00:59:36,610 --> 00:59:40,160 we knew we should read from those encoded cells. 996 00:59:40,160 --> 00:59:43,810 If we weren't, we were either in the past or in the right 997 00:59:43,810 --> 00:59:45,850 subtree, these things were easy to detect, 998 00:59:45,850 --> 00:59:48,850 because they got written during the simulation. 999 00:59:48,850 --> 00:59:50,580 But we need to distinguish between-- 1000 00:59:50,580 --> 00:59:52,780 did we read something that was in the left subtree, 1001 00:59:52,780 --> 00:59:56,710 or did we read something that was known? 1002 00:59:56,710 --> 00:59:59,440 This is a little tricky, because this gets at exactly the issue. 1003 00:59:59,440 --> 01:00:01,390 Left subtree might write some stuff that 1004 01:00:01,390 --> 01:00:04,330 didn't get read by verify sum. 1005 01:00:04,330 --> 01:00:07,790 So now you go to read it, you need to know, 1006 01:00:07,790 --> 01:00:12,610 am I reading something that was not in r intersect w? 1007 01:00:12,610 --> 01:00:18,130 And therefore-- Yeah. 1008 01:00:18,130 --> 01:00:21,270 Basically the issue is, is it in w? 1009 01:00:21,270 --> 01:00:23,480 If it's in w, but not in r intersect w, 1010 01:00:23,480 --> 01:00:26,320 then I know the answer is no, and I should stop. 1011 01:00:26,320 --> 01:00:30,450 If it's not in w though, that means it was in the known past, 1012 01:00:30,450 --> 01:00:32,500 and then I should continue. 1013 01:00:32,500 --> 01:00:35,320 How do I know if I should stop or continue? 1014 01:00:35,320 --> 01:00:41,030 So this is the tricky part. 1015 01:00:41,030 --> 01:00:49,730 We can't tell whether there's the weird notation. 
1016 01:00:49,730 --> 01:00:58,240 We want to know whether r is in w minus r or past minus r 1017 01:00:58,240 --> 01:01:00,400 intersect w. 1018 01:01:00,400 --> 01:01:02,994 OK, we can tell whether it's in r intersect w, 1019 01:01:02,994 --> 01:01:03,910 if it is, we're happy. 1020 01:01:03,910 --> 01:01:06,100 If it's not in r intersect w, it could 1021 01:01:06,100 --> 01:01:08,490 be that's because it was just in some past thing we 1022 01:01:08,490 --> 01:01:12,340 were reading, that didn't get read otherwise. 1023 01:01:12,340 --> 01:01:14,590 Or it could be we're reading something 1024 01:01:14,590 --> 01:01:16,720 that was written in the left subtree, 1025 01:01:16,720 --> 01:01:18,770 but not read in the right subtree. 1026 01:01:18,770 --> 01:01:21,850 So in this case, we want to abort. 1027 01:01:21,850 --> 01:01:25,735 And in this case, it's known, and so we just continue. 1028 01:01:30,322 --> 01:01:32,280 So that's what the simulation would like to do, 1029 01:01:32,280 --> 01:01:35,292 if we could distinguish between these two cases. 1030 01:01:35,292 --> 01:01:37,500 But right now, we can't distinguish between these two 1031 01:01:37,500 --> 01:01:40,600 cases, because we don't have enough information. 1032 01:01:40,600 --> 01:01:44,240 So we're going to make our encoding a little bit bigger. 1033 01:01:44,240 --> 01:01:45,240 What we're going to do-- 1034 01:01:48,888 --> 01:02:06,030 this is here-- is encode a separator 1035 01:02:06,030 --> 01:02:15,736 for r minus w and w minus r. 1036 01:02:15,736 --> 01:02:21,130 So let's-- over here. 1037 01:02:36,600 --> 01:02:38,280 What does this mean? 1038 01:02:38,280 --> 01:02:45,030 Separators going to call, called S. So I want this picture, 1039 01:02:45,030 --> 01:02:52,420 r minus w sits inside S. And w minus r 1040 01:02:52,420 --> 01:02:59,327 sits outside S. This is my universe of cells. 
1041 01:02:59,327 --> 01:03:01,660 These are the things that are read in the right subtree, 1042 01:03:01,660 --> 01:03:03,742 but not written in the left subtree. 1043 01:03:03,742 --> 01:03:05,200 Those are the things I care about-- 1044 01:03:07,870 --> 01:03:10,390 well, not quite this, the other ones. 1045 01:03:10,390 --> 01:03:13,210 So things that are read in the right subtree and that 1046 01:03:13,210 --> 01:03:16,840 are not written in the left, this is the past essentially, 1047 01:03:16,840 --> 01:03:18,820 that's useful over there. 1048 01:03:18,820 --> 01:03:22,190 Over here, I have w minus r, these 1049 01:03:22,190 --> 01:03:24,190 are things that are written in the left subtree, 1050 01:03:24,190 --> 01:03:25,690 but not read in the right subtree. 1051 01:03:25,690 --> 01:03:27,356 These are the things that I worry about, 1052 01:03:27,356 --> 01:03:29,830 because those ones I need to detect that that was changed, 1053 01:03:29,830 --> 01:03:35,510 and say whoops, you must have an answer of no. 1054 01:03:35,510 --> 01:03:36,010 OK? 1055 01:03:36,010 --> 01:03:38,557 So I can't afford to store these sets exactly, 1056 01:03:38,557 --> 01:03:40,390 so I'm going to approximate them, by saying, 1057 01:03:40,390 --> 01:03:43,840 well, let's store the separator out here. 1058 01:03:43,840 --> 01:03:49,000 And if you're in S, then you're definitely not in w minus r. 1059 01:03:49,000 --> 01:03:53,470 If you're definitely not in w minus r, then you can run-- 1060 01:03:53,470 --> 01:03:56,290 you can treat it as if it was known. 1061 01:03:56,290 --> 01:04:00,080 OK, so if you're in S, this would be-- 1062 01:04:00,080 --> 01:04:03,240 why don't I write it here. 1063 01:04:03,240 --> 01:04:05,200 For the decoding algorithm, if you 1064 01:04:05,200 --> 01:04:14,730 want to read a cell that is written, 1065 01:04:14,730 --> 01:04:24,160 or last written in the right subtree, in the past, 1066 01:04:24,160 --> 01:04:27,870 these are the two easy cases.
1067 01:04:27,870 --> 01:04:30,350 Sorry-- I don't want to write what's in the past, 1068 01:04:30,350 --> 01:04:33,080 because the whole point is to figure out what's in the past. 1069 01:04:33,080 --> 01:04:35,360 The other easy case is if it's in r intersect w, 1070 01:04:35,360 --> 01:04:37,332 then it's written down for us. 1071 01:04:37,332 --> 01:04:38,365 So this is encoded. 1072 01:04:40,930 --> 01:04:45,280 This is easy, because during the simulation we did those writes, 1073 01:04:45,280 --> 01:04:46,870 and so we know what they were. 1074 01:04:46,870 --> 01:04:50,020 r intersect w, we've written down, so it's easy to know. 1075 01:04:50,020 --> 01:04:52,600 Then the other cases are either you're in S, 1076 01:04:52,600 --> 01:05:00,790 or you're not in S. OK. 1077 01:05:00,790 --> 01:05:05,800 I claim if you're in S, you must be in the past, 1078 01:05:05,800 --> 01:05:09,340 that cell must have been written in the past, 1079 01:05:09,340 --> 01:05:13,120 and so you know what the value was. 1080 01:05:13,120 --> 01:05:15,550 And so you can continue running the simulation, 1081 01:05:15,550 --> 01:05:17,474 just like in this situation. 1082 01:05:19,830 --> 01:05:22,330 The other situation is you're not in S, then you don't know, 1083 01:05:22,330 --> 01:05:24,820 it could have been written or might not have been. 1084 01:05:24,820 --> 01:05:30,180 But what you know is that you're definitely not in r. 1085 01:05:30,180 --> 01:05:32,100 Because if you're not in r minus w, 1086 01:05:32,100 --> 01:05:36,050 and you're not in r intersect w, then you're not in r. 1087 01:05:36,050 --> 01:05:41,060 If you're not in r, then we're in this situation. 1088 01:05:41,060 --> 01:05:42,810 If you read something not in r, that means 1089 01:05:42,810 --> 01:05:45,150 you're running the wrong query. 1090 01:05:45,150 --> 01:05:47,850 Because the correct query does r-- 1091 01:05:47,850 --> 01:05:49,590 only reads from r.
1092 01:05:49,590 --> 01:05:56,970 So if you're not in S, you must not be in r. 1093 01:05:56,970 --> 01:06:01,560 And so in this case, you know you can abort and try 1094 01:06:01,560 --> 01:06:02,660 the next pi. 1095 01:06:02,660 --> 01:06:04,890 So we're going to do this for all pi, run 1096 01:06:04,890 --> 01:06:09,194 the simulation according to this way of reading cells. 1097 01:06:09,194 --> 01:06:11,610 At the end, the queries are either going to say yes or no, 1098 01:06:11,610 --> 01:06:16,950 or it may abort early. 1099 01:06:16,950 --> 01:06:18,840 So if it says no or it aborts early, 1100 01:06:18,840 --> 01:06:21,360 then we know that was not the right pi. 1101 01:06:21,360 --> 01:06:24,960 Only one of them can say yes, that tells us what the pi is, 1102 01:06:24,960 --> 01:06:27,810 that tells us what the queries were. 1103 01:06:27,810 --> 01:06:30,240 Once we know what the queries were in the right subtree, 1104 01:06:30,240 --> 01:06:33,012 we can use the same multiplying by inverses trick, 1105 01:06:33,012 --> 01:06:35,220 figure out what the updates were in the left subtree. 1106 01:06:35,220 --> 01:06:41,250 But those permutations require l root n log n bits. 1107 01:06:41,250 --> 01:06:43,770 Which used to be on this board, it's been erased now. 1108 01:06:43,770 --> 01:06:46,470 That's what we use for this argument. 1109 01:06:46,470 --> 01:06:48,270 And so what we get is overall, encoding 1110 01:06:48,270 --> 01:06:52,822 must use l root n log n bits. 1111 01:06:52,822 --> 01:06:55,740 OK, but our encoding's a little bit bigger now. 1112 01:06:55,740 --> 01:06:59,190 The big issue is how do we store the separator? 1113 01:06:59,190 --> 01:07:01,860 We need to store this separator with very few bits, 1114 01:07:01,860 --> 01:07:06,160 otherwise we haven't really proved anything. 1115 01:07:06,160 --> 01:07:07,800 We want encoding to be small.
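The rule for reading a cell during the simulated query, as laid out above, can be sketched as one small function. This is a hedged illustration with hypothetical names (simulated_read, WrongQuery, right_writes, encoded_rw, separator, past_memory), using plain dictionaries and a set in place of the actual encoding: cells written during the simulation are known, cells in r intersect w are looked up in the encoding, cells in S are treated as unchanged since the past, and anything else aborts the current candidate pi.

```python
class WrongQuery(Exception):
    """Raised when a simulated query touches a cell outside r, so pi is wrong."""

def simulated_read(cell, right_writes, encoded_rw, separator, past_memory):
    """Return the value of `cell` as the decoder would reconstruct it."""
    if cell in right_writes:      # last written during the simulated right subtree
        return right_writes[cell]
    if cell in encoded_rw:        # in r intersect w: value stored in the encoding
        return encoded_rw[cell]
    if cell in separator:         # in S: not in w minus r, so left subtree never wrote it
        return past_memory[cell]
    raise WrongQuery(cell)        # not in S: not in r, so this cannot be the right query

# Made-up example: r = {1, 2, 3}, w = {3, 4}, so r - w = {1, 2} and w - r = {4}.
past_memory = {1: "a", 2: "b", 3: "old3", 4: "old4", 7: "z"}
encoded_rw = {3: "new3"}          # contents of r intersect w after the left subtree
separator = {1, 2, 7}             # contains r - w, avoids w - r; cell 7 is a don't-care
right_writes = {}

values = [simulated_read(c, right_writes, encoded_rw, separator, past_memory)
          for c in (1, 2, 3, 7)]
try:
    simulated_read(4, right_writes, encoded_rw, separator, past_memory)
    aborted = False
except WrongQuery:
    aborted = True
```

Note that cell 7 lies in S without being in r; reading it is harmless because it was not written in the left subtree, so its past value is still correct.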
1116 01:07:11,480 --> 01:07:19,320 So we get that the encoding must use 1117 01:07:19,320 --> 01:07:28,170 omega l root n log n bits in expectation, because this 1118 01:07:28,170 --> 01:07:30,630 is a valid decoding algorithm, it will figure out 1119 01:07:30,630 --> 01:07:32,422 what the permutations were. 1120 01:07:32,422 --> 01:07:34,130 And they require at least this many bits, 1121 01:07:34,130 --> 01:07:38,160 so encoding must use this many bits in expectation. 1122 01:07:38,160 --> 01:07:40,740 Now the question is how many bits does the encoding use? 1123 01:07:40,740 --> 01:07:42,360 Then we'll get either a contradiction 1124 01:07:42,360 --> 01:07:44,550 or we'll prove the claim. 1125 01:07:47,142 --> 01:07:48,960 So let's go over here. 1126 01:08:23,830 --> 01:08:27,330 So here's a fun fact about separators. 1127 01:08:27,330 --> 01:08:30,779 I'm not going to prove it fully, but I'm 1128 01:08:30,779 --> 01:08:33,792 going to rely on some hashing ability. 1129 01:08:36,630 --> 01:08:38,790 So given some universe U, in this case 1130 01:08:38,790 --> 01:08:42,270 it's going to be the cells in our data structure. 1131 01:08:42,270 --> 01:08:44,715 But speak a little bit more generally of the universe U, 1132 01:08:44,715 --> 01:08:47,800 I have some number m, which is our set size. 1133 01:08:47,800 --> 01:08:53,580 And what we're interested in is in defining our separator 1134 01:08:53,580 --> 01:08:54,960 family. 1135 01:08:54,960 --> 01:08:58,930 Kind of like a family of hash functions, closely related, 1136 01:08:58,930 --> 01:08:59,790 in fact. 1137 01:08:59,790 --> 01:09:07,320 Call it S. And it's going to work for size m sets. 1138 01:09:13,510 --> 01:09:20,229 And so S is a separator family if, for any two sets, A and B, 1139 01:09:20,229 --> 01:09:32,259 in the universe of size, at most, m, and disjoint. 1140 01:09:32,259 --> 01:09:35,529 So A intersect B is the empty set. 
1141 01:09:35,529 --> 01:09:38,380 So of course what we're thinking about here is r minus w, 1142 01:09:38,380 --> 01:09:39,279 and w minus r. 1143 01:09:39,279 --> 01:09:41,380 These are two subsets of the universe. 1144 01:09:41,380 --> 01:09:44,510 Hopefully they're not too big, because if this one is huge, 1145 01:09:44,510 --> 01:09:46,240 that means you read a huge amount of data 1146 01:09:46,240 --> 01:09:47,080 in the right subtree. 1147 01:09:47,080 --> 01:09:49,621 If this one is huge, it meant you wrote a huge amount of data 1148 01:09:49,621 --> 01:09:50,960 in the left subtree. 1149 01:09:50,960 --> 01:09:53,770 And then we get lower bounds in an easier way. 1150 01:09:53,770 --> 01:09:56,550 Or they're not so big, let's say they're size at most m. 1151 01:09:56,550 --> 01:09:59,470 They're disjoint for sure, by definition, r minus w's 1152 01:09:59,470 --> 01:10:01,110 disjoint from w minus r. 1153 01:10:01,110 --> 01:10:02,983 It removes the intersection. 1154 01:10:06,160 --> 01:10:08,760 So that's our set up. 1155 01:10:08,760 --> 01:10:10,390 Then, what we want, is that there 1156 01:10:10,390 --> 01:10:15,580 is some set C in the separator family, 1157 01:10:15,580 --> 01:10:29,590 such that A is contained in C, and B is outside of C. 1158 01:10:29,590 --> 01:10:33,580 So B is in the universe minus C. So this is exactly our picture 1159 01:10:33,580 --> 01:10:38,890 from before, we have A. C contains A, 1160 01:10:38,890 --> 01:10:41,230 and we have B over on the right. 1161 01:10:41,230 --> 01:10:45,840 And this is the whole universe U, and so B is outside of C, 1162 01:10:45,840 --> 01:10:51,832 A is entirely inside C. OK.
1163 01:10:51,832 --> 01:10:54,040 This is what we want to exist, because if a separator 1164 01:10:54,040 --> 01:10:58,090 family exists, then we know whatever our r minus w, and w 1165 01:10:58,090 --> 01:11:00,630 minus r sets were, as long as they're not too big, 1166 01:11:00,630 --> 01:11:02,320 they're definitely disjoint, we can 1167 01:11:02,320 --> 01:11:05,170 find one of these separators that encodes what we need 1168 01:11:05,170 --> 01:11:10,930 to encode, which is the set C. Which is called S over there. 1169 01:11:10,930 --> 01:11:11,740 Cool. 1170 01:11:11,740 --> 01:11:12,640 How do we encode it? 1171 01:11:12,640 --> 01:11:14,240 Well, if the number-- 1172 01:11:14,240 --> 01:11:20,260 if the size of the separator family is something, 1173 01:11:20,260 --> 01:11:24,640 then we need log of that bits to write down the separator set. 1174 01:11:24,640 --> 01:11:28,490 So as long as this is small, we're happy. 1175 01:11:28,490 --> 01:11:50,150 So let me tell you what's to know about separators. 1176 01:11:50,150 --> 01:12:02,450 There exists a separator family S, with size of S at most 2 1177 01:12:02,450 --> 01:12:09,497 to the order m plus log log U. Now 1178 01:12:09,497 --> 01:12:11,330 this is getting into an area that we haven't 1179 01:12:11,330 --> 01:12:13,740 spent a lot of time on, but-- 1180 01:12:13,740 --> 01:12:15,170 so I'm going to give you a sketch 1181 01:12:15,170 --> 01:12:16,211 of a proof of this claim. 1182 01:12:19,520 --> 01:12:22,910 Relying on perfect hash functions. 1183 01:12:22,910 --> 01:12:27,170 So the idea is the following, we want to know, basically, 1184 01:12:27,170 --> 01:12:30,070 which elements are in A, which elements are in B. 1185 01:12:30,070 --> 01:12:32,270 But it's kind of annoying to do that, you can't store 1186 01:12:32,270 --> 01:12:34,290 that for all universe elements.
1187 01:12:34,290 --> 01:12:37,235 So if we could just find a nice perfect hash function that 1188 01:12:37,235 --> 01:12:39,110 maps the elements of A and the elements of 1189 01:12:39,110 --> 01:12:40,926 B to different slots in some hash table, 1190 01:12:40,926 --> 01:12:43,550 then for every slot in the hash table we could say, is it in A, 1191 01:12:43,550 --> 01:12:45,050 or is it in B? 1192 01:12:45,050 --> 01:12:49,580 Now if you are not in A union B and you hash somewhere, 1193 01:12:49,580 --> 01:12:52,260 you'll get some bit, who knows what that stores. 1194 01:12:52,260 --> 01:12:53,420 I don't care. 1195 01:12:53,420 --> 01:12:55,670 For the things outside of A union B, 1196 01:12:55,670 --> 01:12:58,580 they could be in C or not in C, I don't really care. 1197 01:12:58,580 --> 01:13:01,460 And so all I care about is if A and B have 1198 01:13:01,460 --> 01:13:04,170 no collisions between each other, 1199 01:13:04,170 --> 01:13:06,650 I don't want any A thing to hash to a B thing. 1200 01:13:06,650 --> 01:13:09,840 Then I can store a bit in every cell in the hash table, 1201 01:13:09,840 --> 01:13:13,749 and that will tell me, in particular, A versus B. 1202 01:13:13,749 --> 01:13:16,040 And then the rest of the items are somehow categorized, 1203 01:13:16,040 --> 01:13:18,350 but I don't care how they're categorized. 1204 01:13:18,350 --> 01:13:21,710 So we're going to use this fact that there 1205 01:13:21,710 --> 01:13:28,340 is a set of perfect hash functions of the same size. 1206 01:13:34,600 --> 01:13:45,680 Sorry, that should be H. This is what's really true, size of H 1207 01:13:45,680 --> 01:13:51,094 is 2 to the order m plus log log U. OK, 1208 01:13:51,094 --> 01:13:52,760 I'm not going to prove this, but this is 1209 01:13:52,760 --> 01:13:54,250 about succinct hash functions. 1210 01:13:54,250 --> 01:13:56,450 It may be hard to find such a hash family, 1211 01:13:56,450 --> 01:13:58,655 but the claim is that they exist.
1212 01:13:58,655 --> 01:14:01,030 Or it's hard to find the hash function of the family that 1213 01:14:01,030 --> 01:14:03,200 has no collisions, but the guarantee 1214 01:14:03,200 --> 01:14:07,580 is, as long as you have, in total, 2m items, 1215 01:14:07,580 --> 01:14:10,850 out of your universe of size U, you 1216 01:14:10,850 --> 01:14:13,160 can get a collision-free hash function, 1217 01:14:13,160 --> 01:14:17,450 2 to the order m plus log log U. 1218 01:14:17,450 --> 01:14:18,080 OK. 1219 01:14:18,080 --> 01:14:20,630 So this is going to-- 1220 01:14:20,630 --> 01:14:21,130 Yeah. 1221 01:14:23,840 --> 01:14:29,900 Maps, say A union B, to an order m sized table. 1222 01:14:34,040 --> 01:14:36,920 And here, there are no collisions. 1223 01:14:41,570 --> 01:14:45,680 So then what we also store is an A or B 1224 01:14:45,680 --> 01:14:51,130 bit for each table entry. 1225 01:15:02,690 --> 01:15:03,970 So that's our encoding. 1226 01:15:03,970 --> 01:15:07,810 We store a perfect hash function, that's 1227 01:15:07,810 --> 01:15:17,830 going to cost log H bits for this part, and log of H 1228 01:15:17,830 --> 01:15:20,980 is just m plus log log U. And then 1229 01:15:20,980 --> 01:15:25,000 we're going to store this A or B bit for every table entry. 1230 01:15:25,000 --> 01:15:27,070 Number of table entries is order m, 1231 01:15:27,070 --> 01:15:33,430 so this is going to take 2 to the order m bits. 1232 01:15:33,430 --> 01:15:37,029 Or sorry, not-- sorry, in terms of bits, it's order m bits, 1233 01:15:37,029 --> 01:15:37,570 I should say. 1234 01:15:40,870 --> 01:15:45,100 In terms of functions, it's 2 to the order m 1235 01:15:45,100 --> 01:15:48,520 possible choices for this bit vector. 1236 01:15:48,520 --> 01:15:52,360 And so the easy way is to just sum up these bits, 1237 01:15:52,360 --> 01:15:54,910 you use log of H bits plus order m bits. 1238 01:15:54,910 --> 01:15:58,000 This already had an order m term, and so you get this.
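The separator encoding sketched above, a hash function plus one bit per table slot, can be illustrated with a toy search. Everything here is an assumption for illustration: the family h_a(x) = ((a * x) mod prime) mod table_size is a simple stand-in and not the succinct perfect hash family the claim relies on, the names find_separator and in_separator are hypothetical, and the search only demands that no element of A shares a slot with an element of B, which is all the separator needs.

```python
def find_separator(A, B, table_size, prime=101):
    """Return (a, bits) describing a set C with A inside C and B outside C."""
    for a in range(1, prime):
        h = lambda x, a=a: ((a * x) % prime) % table_size
        slots_a = {h(x) for x in A}
        slots_b = {h(x) for x in B}
        if slots_a.isdisjoint(slots_b):          # no A element collides with a B element
            bits = [1 if s in slots_a else 0 for s in range(table_size)]
            return a, bits
    raise ValueError("no hash function in this toy family separates A from B")

def in_separator(x, a, bits, table_size, prime=101):
    """Membership test for C: look up the bit stored in x's hash slot."""
    return bits[((a * x) % prime) % table_size] == 1

# Made-up example sets standing in for r minus w (A) and w minus r (B).
A = {3, 17, 42}
B = {8, 28}
a, bits = find_separator(A, B, table_size=8)
```

The description of C is just the index of the chosen hash function plus the bit vector, which mirrors the log of H bits plus order m bits accounting; elements outside A union B land wherever their slot's bit says, and the argument does not care which side that is.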
1239 01:15:58,000 --> 01:16:03,880 The log of S is order m plus log log U. 1240 01:16:03,880 --> 01:16:06,610 So that's the end of proof sketch of the claim. 1241 01:16:06,610 --> 01:16:09,010 If you believe perfect hash functions can be written down 1242 01:16:09,010 --> 01:16:12,100 in a small way, then we're done. 1243 01:16:12,100 --> 01:16:16,060 Enough with separators, now let's apply this separator 1244 01:16:16,060 --> 01:16:21,280 claim to this setting. 1245 01:16:21,280 --> 01:16:23,530 So now we can compute the size of our encoding, 1246 01:16:23,530 --> 01:16:26,290 our encoding involved writing down r intersect w. 1247 01:16:26,290 --> 01:16:30,310 That takes r intersect w times log n, just like before. 1248 01:16:30,310 --> 01:16:35,050 It also involves writing down the separator. 1249 01:16:35,050 --> 01:16:40,960 Separator takes order m bits, m is r plus w. 1250 01:16:40,960 --> 01:16:43,657 Things that are-- it's order r plus w. 1251 01:16:43,657 --> 01:16:44,740 These are all the things-- 1252 01:16:44,740 --> 01:16:46,960 I'm trying to write down r minus w and w 1253 01:16:46,960 --> 01:16:50,090 minus r, so that you add up those sizes, basically r 1254 01:16:50,090 --> 01:16:51,130 plus w. 1255 01:16:51,130 --> 01:16:57,010 Plus log log U. U is some small thing, 1256 01:16:57,010 --> 01:17:00,066 size of memory, number of cells in memory. 1257 01:17:00,066 --> 01:17:01,690 We're assuming that's polynomial, so you 1258 01:17:01,690 --> 01:17:08,270 take log log of a polynomial, that's like log log n. 1259 01:17:08,270 --> 01:17:10,790 So let's finish this off. 1260 01:17:16,250 --> 01:17:25,790 So before this was our equation, r intersect w times 1261 01:17:25,790 --> 01:17:27,920 log n, that was the size of our encoding. 1262 01:17:27,920 --> 01:17:29,050 We still have that term. 1263 01:17:34,110 --> 01:17:40,560 Sorry, r intersect w, size of that, times log n. 1264 01:17:40,560 --> 01:17:41,860 So we still do that.
1265 01:17:46,600 --> 01:17:50,950 Now we also pay, for this separator, 1266 01:17:50,950 --> 01:17:57,130 we're going to pay r plus w, that's the m part. 1267 01:17:57,130 --> 01:18:00,790 Plus log log n. 1268 01:18:00,790 --> 01:18:04,180 This is the number of bits in our encoding. 1269 01:18:04,180 --> 01:18:07,750 And I claim, or what we've proved over here, 1270 01:18:07,750 --> 01:18:11,560 is that any encoding must use l root n log n bits. 1271 01:18:14,340 --> 01:18:17,040 So this thing must be at least this thing. 1272 01:18:17,040 --> 01:18:19,190 So we have a little bit more work to prove. 1273 01:18:19,190 --> 01:18:20,870 There are now two cases. 1274 01:18:20,870 --> 01:18:22,600 It depends-- there's basically-- 1275 01:18:22,600 --> 01:18:25,760 and log log n is unlikely to dominate. 1276 01:18:25,760 --> 01:18:29,150 We're doing a block operation on root n things, 1277 01:18:29,150 --> 01:18:32,480 probably need to use at least log log n steps. 1278 01:18:32,480 --> 01:18:34,100 So it's not really relevant. 1279 01:18:34,100 --> 01:18:38,930 What will dominate is either this term, as it used to, 1280 01:18:38,930 --> 01:18:40,500 or this term. 1281 01:18:40,500 --> 01:18:44,450 These are two different cases, call them case one, case two. 1282 01:18:44,450 --> 01:18:50,930 In case two, r plus w is at least l root n log n. 1283 01:18:50,930 --> 01:18:52,810 That's the lower bound we want. 1284 01:18:52,810 --> 01:19:02,610 If we can-- in case two, r plus w is omega l root n log n. 1285 01:19:02,610 --> 01:19:05,030 What that means is in this subtree, the amount of reading 1286 01:19:05,030 --> 01:19:06,920 we did in the right subtree, plus the amount 1287 01:19:06,920 --> 01:19:08,780 of writing we did in the left subtree, 1288 01:19:08,780 --> 01:19:11,120 is at least l root n log n. 1289 01:19:11,120 --> 01:19:13,430 That's our goal over here. 
1290 01:19:13,430 --> 01:19:15,300 We want to prove-- 1291 01:19:15,300 --> 01:19:18,650 sorry, it's a previous claim, that's by now erased. 1292 01:19:18,650 --> 01:19:20,060 Is the easier claim, we just want 1293 01:19:20,060 --> 01:19:23,900 to show that the total amount of time spent in v's subtree 1294 01:19:23,900 --> 01:19:25,550 is at least log n per operation. 1295 01:19:25,550 --> 01:19:27,410 We're doing l root n things here. 1296 01:19:27,410 --> 01:19:29,880 So this is a ton of reading and writing. 1297 01:19:29,880 --> 01:19:32,120 So in that case, we're happy, because we get 1298 01:19:32,120 --> 01:19:33,980 an actual lower bound on time. 1299 01:19:33,980 --> 01:19:37,520 Otherwise, we don't-- I mean, these are actual reads 1300 01:19:37,520 --> 01:19:40,920 and writes, or total number of reads and writes. 1301 01:19:40,920 --> 01:19:43,620 Here we're getting-- in the other case, 1302 01:19:43,620 --> 01:19:51,860 we get r intersect w log n is at least l root n log 1303 01:19:51,860 --> 01:19:53,960 n, just like before. 1304 01:19:53,960 --> 01:19:57,600 So again, the log ns cancel. 1305 01:19:57,600 --> 01:19:59,750 So here we lose the log n factor, 1306 01:19:59,750 --> 01:20:02,600 but it's OK, because this is only 1307 01:20:02,600 --> 01:20:04,350 talking about r intersect w. 1308 01:20:04,350 --> 01:20:06,536 This we use the LCA charging, to say, well, 1309 01:20:06,536 --> 01:20:07,910 if you look at a particular read, 1310 01:20:07,910 --> 01:20:09,830 it only gets charged by the LCA. 1311 01:20:09,830 --> 01:20:12,140 So then we can afford to sum up large amounts. 1312 01:20:12,140 --> 01:20:13,370 So it's a little bit weird. 1313 01:20:13,370 --> 01:20:15,579 In this situation, we add up all the lower bounds. 1314 01:20:15,579 --> 01:20:17,120 Each of them doesn't give us a log n, 1315 01:20:17,120 --> 01:20:19,620 but in aggregate, we get a log n, because every leaf appears 1316 01:20:19,620 --> 01:20:21,200 in log n levels.
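The aggregation step for the case where r intersect w dominates can be sketched like this. The notation is an assumption made for illustration: T is the total number of cell probes, B is the number of block operations (leaves of the tree over time), \ell_v is the number of leaves under node v, and R_v, W_v are the read and write sets at v. The sketch also assumes the tree has on the order of log n levels, which is what "every leaf appears in log n levels" suggests.

```latex
% LCA charging: each read is charged only at the LCA of the read and
% the preceding write to that cell, so the per-node counts can be
% summed without double counting.
\[
  T \;\ge\; \sum_{v} |R_v \cap W_v|
    \;\ge\; \sum_{v} \Omega\bigl(\ell_v \sqrt{n}\bigr)
\]

% Every leaf lies under one node per level, so the leaf counts
% sum to B on each of the Theta(log n) levels.
\[
  \sum_{v} \ell_v = \Theta(B \log n)
  \quad\Longrightarrow\quad
  T = \Omega\bigl(B \sqrt{n}\,\log n\bigr)
\]

% Each block operation consists of sqrt(n) individual operations.
\[
  \frac{T}{B} = \Omega\bigl(\sqrt{n}\,\log n\bigr) \text{ per block operation}
  \quad\Longrightarrow\quad
  \frac{T}{B\sqrt{n}} = \Omega(\log n) \text{ per operation}
\]
```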
1317 01:20:21,200 --> 01:20:23,820 In this case, we don't need to aggregate, because we just say, 1318 01:20:23,820 --> 01:20:25,910 well, the number of operations in the subtree 1319 01:20:25,910 --> 01:20:28,880 is at least log n per operation. 1320 01:20:28,880 --> 01:20:31,040 This time spent, cell probes done, 1321 01:20:31,040 --> 01:20:32,789 is at least log n per operation. 1322 01:20:32,789 --> 01:20:35,330 So in that case, we don't need to sum the lower bounds, which 1323 01:20:35,330 --> 01:20:36,380 is done. 1324 01:20:36,380 --> 01:20:38,690 So in either case, we're happy. 1325 01:20:38,690 --> 01:20:40,940 Little weird, because you could have a mix of cases, 1326 01:20:40,940 --> 01:20:44,300 one vertex v could be in case two, 1327 01:20:44,300 --> 01:20:46,940 then you just ignore all the things below it. 1328 01:20:46,940 --> 01:20:49,197 The rest of the tree might be in case one, 1329 01:20:49,197 --> 01:20:50,780 but you can mix and match one and two, 1330 01:20:50,780 --> 01:20:55,400 as long as you don't use a one below a two, you're OK, 1331 01:20:55,400 --> 01:20:56,840 you won't double count. 1332 01:20:56,840 --> 01:20:58,910 And so in either case, we're happy, 1333 01:20:58,910 --> 01:21:02,870 we get a log n lower bound, either on time per operation, 1334 01:21:02,870 --> 01:21:06,860 or on this kind of time per operation. 1335 01:21:06,860 --> 01:21:09,850 Add up all those lower bounds, you get log n per operation, 1336 01:21:09,850 --> 01:21:12,950 or get root n log n per block operation, which 1337 01:21:12,950 --> 01:21:18,850 implies log n per insert delete edge, or connectivity query. 1338 01:21:18,850 --> 01:21:23,750 And that proves right there, more or less on time. 1339 01:21:23,750 --> 01:21:25,970 You can use the same technique to do a trade off 1340 01:21:25,970 --> 01:21:27,730 between updates and queries. 1341 01:21:27,730 --> 01:21:30,540 This is just log n, worst case of the two.
1342 01:21:30,540 --> 01:21:32,750 I mentioned what the bound was last time. 1343 01:21:32,750 --> 01:21:34,640 Same trick works, you just do more updates 1344 01:21:34,640 --> 01:21:36,469 than queries, or more queries than updates. 1345 01:21:36,469 --> 01:21:38,260 So we get link/cut trees are optimal, Euler 1346 01:21:38,260 --> 01:21:39,627 tour trees are optimal. 1347 01:21:39,627 --> 01:21:41,585 And we've got lots of other points on the trade 1348 01:21:41,585 --> 01:21:43,730 off curve, as you may recall last class. 1349 01:21:43,730 --> 01:21:47,900 Like our log squared update is optimal for a log over log log 1350 01:21:47,900 --> 01:21:50,120 query. 1351 01:21:50,120 --> 01:21:53,950 And that's the end of dynamic graphs, the end 1352 01:21:53,950 --> 01:21:56,517 of advanced data structures. 1353 01:21:56,517 --> 01:21:58,100 Hope you had a fun time, we got to see 1354 01:21:58,100 --> 01:21:59,520 lots of different topics. 1355 01:21:59,520 --> 01:22:01,830 And I hope you'll enjoy watching on the videos, 1356 01:22:01,830 --> 01:22:04,580 and let me know if you have any comments, send an email 1357 01:22:04,580 --> 01:22:06,890 or whatever. 1358 01:22:06,890 --> 01:22:08,090 Yay. 1359 01:22:08,090 --> 01:22:09,940 [APPLAUSE]