So, we're going to talk today about binary search trees, something called randomly built binary search trees. And I'll abbreviate binary search trees as BSTs throughout the lecture. You've all seen binary search trees in one place or another, in particular in recitation on Friday. So we're going to build on the basic ideas presented there, and talk about how to randomize them and make them good.

So, you know that there are good binary search trees, which are relatively balanced, something like this. The height is log n. We call that balanced, and that's good. Anything of order log n will be fine; searching will then cost order log n. And there are bad binary search trees, which have really large height, possibly as big as n. So, this is good, and this is bad. We'd like to build binary search trees in such a way that they are good all the time, or at least most of the time. There are lots of ways to do this, and in the next couple of weeks we will see four of them, if you count the problem set, I believe. Today, we are going to use randomization to make them balanced most of the time, in a certain sense. And then, in your problem set, you will make that hold in a broader sense.

One way to motivate this topic, and I'm not going to define randomly built binary search trees for a little bit, is through sorting, our good friend. There's a natural way to sort n numbers using binary search trees. So, if I give you an array A, how would you sort that array using binary search tree operations as a black box? Build the binary search tree, and then traverse it in order. Exactly. So, let's say we have some initial tree, which is empty, and then for each element of the array, we insert it into the tree. That's what you meant by building the search tree. So, we insert A[i] into the tree; this is the standard binary search tree insertion. And then, we do an in-order traversal, which in the book is called an in-order tree walk.
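For concreteness, here is a minimal sketch of BST sort in Python. This is my own illustration, not code from the lecture or the book, and it assumes distinct keys, as in the example coming up.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def tree_insert(root, key):
    # search for where the key would be, then hang a new leaf there
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = tree_insert(root.left, key)
    else:
        root.right = tree_insert(root.right, key)
    return root

def inorder(root, out):
    # left subtree, then the root, then the right subtree: sorted order
    if root is not None:
        inorder(root.left, out)
        out.append(root.key)
        inorder(root.right, out)

def bst_sort(a):
    root = None
    for x in a:             # insert A[i] for each i
        root = tree_insert(root, x)
    out = []
    inorder(root, out)      # in-order tree walk
    return out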
OK, you should know what these algorithms are, but just as a very quick reminder: tree-insert basically searches for that element A[i] until it finds the place where it would have been if it were in the tree already, and then adds a new leaf there to insert that value. The in-order tree walk recursively walks the left subtree, then prints out the root, and then recursively walks the right subtree. And, by the binary search tree property, that will print the elements out in sorted order.

So, let's do a quick example, because this turns out to be related to another sorting algorithm we've seen already. While the example is probably pretty trivial, the connection is pretty surprising. At least, it was to me the first time I taught this class. So, my array is three, one, eight, two, six, seven, five. And I'm going to visit these elements in order from left to right and just build a tree. The first element I see is three, so I insert three into an empty tree. That requires no comparisons. Then I insert one. Is one bigger or smaller than three? It's smaller, so I put it over here. Then I insert eight. That's bigger than three, so it gets a new leaf over here. Then I insert two. That sits between one and three, so it falls off the right child of one, and I add two there. Six is bigger than three and less than eight, so it goes here. Seven is bigger than three, less than eight, and bigger than six, so it goes here. And five fits in between three and six, so it goes here. So that's the binary search tree that we get. Then I run an in-order traversal, which will print one, two, three, five, six, seven, eight. OK, I can run it quickly in my head because I've got a big stack, though I've got to be a little bit careful. Of course, you should check that they come out in sorted order: one, two, three, five, six, seven, eight. And, if you don't have a big stack, you can go and buy one. That's always useful. Memory costs are going up a bit these days, or going down. They should be going down, but politics, price-fixing, or whatever.
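Running the bst_sort sketch above on this example array reproduces the in-order output of the lecture's trace:

print(bst_sort([3, 1, 8, 2, 6, 7, 5]))   # -> [1, 2, 3, 5, 6, 7, 8]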
So, the question is, what's the running time of this algorithm? This is one of those answers where it depends. The parts that are easy to analyze are initialization and the in-order tree walk. How long does the walk take? n, good. So, it's order n for the walk, and order one for the initialization. The question is, how long does it take me to do n tree-inserts? Anyone want to guess any kind of answer to that question, other than "it depends"? I've already stolen the thunder there. Yeah? Big Omega of n log n, that's good. It's at least n log n. Why?

Right, so you gave two reasons. The first one is the decision-tree lower bound. That doesn't actually prove this; you have to be a little bit careful. This is a claim that it's omega n log n all the time. It's certainly omega n log n in the worst case: every comparison-based sorting algorithm is omega n log n in the worst case. But it's also omega n log n every single time, because of the second reason you gave, which is that the best thing that could happen is a perfectly balanced tree. So, this is the figure that I have drawn the most on a blackboard in my life, the perfect tree on 15 nodes, I guess. So, if we're lucky, we have this. And if you add up all the depths of the nodes, which gives you the search tree cost, in particular these n over two nodes at the bottom each have depth about log n. Therefore, you're going to have to pay at least n log n for those. And if you're less balanced, it's going to be even worse. That takes some proving, but it's true. So, it's actually omega n log n all the time. OK, there are some settings, like when you know the elements are almost already in order, where you can sort with a linear number of comparisons. But here, you can't. Any other guesses at an answer to this question? Yeah? Big O of n^2? Good, why? Right. We are doing n insertions, and each node has depth at most n. So, the number of comparisons we make per element we insert is at most n. So that's at most n^2.
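To make the balanced-case counting explicit, a back-of-the-envelope version of the argument just given:

\[
\sum_{x}\operatorname{depth}(x)\;\ge\;\frac{n}{2}\,(\lg n - 1)\;=\;\Omega(n\lg n),
\]

since even in a perfectly balanced tree about $n/2$ of the nodes live at depth about $\lg n$; and in the other direction, no node ever has depth more than $n-1$, so the total is $O(n^2)$.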
Any other answers? Is it possible for this algorithm to take n^2 time? Are there instances where it takes theta n^2? If the input is already sorted, that would be pretty bad. So, if it's already sorted, or if it's reverse sorted, you are in bad shape, because then you get a tree like this. This is the sorted case. And you compute: the total cost, the time in general, is going to be the sum, over each node x in the tree, of the depth of that node. And in this case it's one plus two plus three plus four and so on, an arithmetic series. There are n terms, so this is theta n squared; it's about n^2 over two. So, that's bad news. The worst-case running time of this algorithm is n^2.

Does that sound familiar at all, an algorithm whose worst-case running time is n^2, in particular in the already-sorted case? But if we're lucky, in the lucky case, as we said, it's a balanced tree. Wouldn't that be great? Anything with order log n height would give us a sorting algorithm that runs in n log n. So, in the lucky case we are n log n, but in the unlucky case we are n^2, and the unlucky case is sorted input. Does it remind you of any algorithm we've seen before? Quicksort. It turns out the running time of this algorithm is the same as the running time of quicksort in a very strong sense. It turns out the comparisons that this algorithm makes are exactly the same comparisons that quicksort makes. It makes them in a different order, but it's really the same algorithm in disguise. That's the surprise here. So, in particular, we've already analyzed quicksort; we should get something for free out of that analysis. So, the relation is: BST sort and quicksort make the same comparisons, but in a different order.
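Pinning down the arithmetic of the sorted case above: the tree is a path, so the node depths are $0, 1, 2, \ldots, n-1$, and

\[
\sum_{x}\operatorname{depth}(x)\;=\;\sum_{i=0}^{n-1} i\;=\;\frac{n(n-1)}{2}\;=\;\Theta(n^2),
\]

which is the "about $n^2/2$" figure just mentioned.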
So, let me walk through the same example we did before: three, one, eight, two, six, seven, five. There is the array. We are going to run a particular version of quicksort; I have to be a little bit careful here. It's sort of the obvious version of quicksort. Remember, our standard, boring quicksort takes the first element as the partition element. So, I'll take three here, and I split into the elements less than three, which are one and two, and the elements bigger than three, which are eight, six, seven, five. And, in this version of quicksort, I don't change the order of the elements: eight, six, seven, five. Let's say the order is preserved, because only then will this equivalence hold. So, this is sort of a stable partition algorithm. It's easy enough to do; it's a particular version of quicksort, and soon we're going to randomize it, and after we randomize, this difference doesn't matter.

OK, then on the left recursion we split on the partition element. There are the things less than one, which is nothing, and the things bigger than one, which is two. And then that's our partition element; we are done. Over here, we partition on eight. Everything is less than eight, so we get six, seven, five, and nothing on the right. Then we partition at six: the things less than six, namely five, and the things bigger than six, namely seven. And those become partition elements in a trivial way.

Now, this tree that we get on the partition elements looks an awful lot like this tree. It should be exactly the same tree. And you can walk through: what comparisons does quicksort make? Well, first it compares everything to three, except three itself. Now, if you look over here, what happens when we are inserting elements? Each time we insert an element, the first thing we do is compare it with three. If it's less than, we go to the left branch; if it's greater than, we go to the right branch. So, we are making all these comparisons with three in both cases. Then, if we have an element less than three, it's either one or two. If it's one, we're done; no comparisons happen there. But we compare two to one, and indeed, when we insert two over there, after comparing it to three, we compare it to one, and then we figure out where it goes. The same thing happens in quicksort. For elements greater than three, we compare everyone to eight: here because we are partitioning with respect to eight, and there because eight is the next node after three. As soon as eight is inserted, we compare everything with eight to see that it is in fact less than eight, and so on. So, all of the same comparisons, just in a different order. We've turned the picture 90 degrees. Kind of cool.
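Here is a sketch of that order-preserving variant in Python. It is my own rendering of the version described above, not the in-place partition from the book, and it assumes distinct keys:

def quicksort_stable(a):
    # partition around the first element, keeping the relative order of the
    # remaining elements, then recurse on each side
    if len(a) <= 1:
        return list(a)
    pivot = a[0]
    less = [x for x in a[1:] if x < pivot]
    greater = [x for x in a[1:] if x > pivot]
    return quicksort_stable(less) + [pivot] + quicksort_stable(greater)

print(quicksort_stable([3, 1, 8, 2, 6, 7, 5]))   # -> [1, 2, 3, 5, 6, 7, 8]

The pivots chosen at each level are exactly the nodes of the binary search tree built earlier, and the comparisons against each pivot are the same comparisons that tree-insert makes against that node.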
So, this has various consequences for the analysis. In particular, the worst-case running time is theta n^2, which is not so exciting. What we really care about is the randomized version, because that's what performs well. So, randomized BST sort is just like randomized quicksort. The first thing you do is randomly permute the array, uniformly, picking all permutations with equal probability. And then we call BST sort. This is basically what randomized quicksort can be formulated as. And then randomized BST sort is going to make exactly the same comparisons as randomized quicksort. Here, we are picking the root essentially at random, and in quicksort, you are picking the partition elements at random. It's the same thing.

OK, so the running time of this algorithm equals the running time of randomized quicksort, because we are making the same comparisons; the number of comparisons is equal. And this is true as random variables: the random variable for the running time of this algorithm is equal to the random variable for the running time of that algorithm. In particular, the expectations are the same. And we know that the expected running time of randomized quicksort on n elements is? Oh boy. n log n. Good, I was a little worried there. So, in particular, the expected running time of BST sort is n log n.
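As a small sketch, reusing the bst_sort function from the earlier illustration (again my own code, not the lecture's):

import random

def randomized_bst_sort(a):
    b = list(a)
    random.shuffle(b)       # uniformly random permutation of the input
    return bst_sort(b)      # then plain BST sort on the permuted array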
Obviously, this is not too exciting from a sorting point of view. The sorting was just to see this connection. What we actually care about, and the reason I've introduced BST sort, is what the tree looks like. What we really want is that search tree. The search tree can do more than sort; in-order traversals are a pretty boring thing to do with a search tree. You can search in a search tree. OK, that's still not so exciting: you could sort the elements, put them in an array, and do binary search. But the point of binary search trees, instead of binary searching arrays, is that you can update them dynamically. We won't be updating them dynamically in this lecture; we will on Wednesday and on your problem set. For now, it's just sort of a warm-up. Let's say the elements aren't changing. We are building one tree from the beginning; we have all n elements ahead of time, and we are going to build the tree randomly. We randomly permute the array, then we throw all the elements into a binary search tree. That's what BST sort does. Then it calls an in-order traversal, but I don't really care about the in-order traversal, because we've just analyzed it. It would be a short lecture if I were done.

What we want is this randomly built BST, which is what we get out of this algorithm. So, this is the tree resulting from randomized BST sort, resulting from randomly permuting the array and just inserting those elements using the simple tree-insert algorithm. The question is, what does that tree look like? And in particular, is there anything we can conclude from this fact, that the expected running time of BST sort is n log n? I've mentioned cursorily what the running time of BST sort is, several times. It was a sum. The time of BST sort on n elements is the sum, over all nodes x, of the depth of that node. Depth starts at zero and works its way down: for the root element you don't make any comparisons, and beyond that, you make however many comparisons the depth is. So, we know that this thing, in expectation, is n log n. What does that tell us about the tree? The sum is over all nodes x in the tree. Does it tell us anything about the height of the tree, for example? Yeah? Right: intuitively, it says that the height of the tree is theta log n, and not n. But, in fact, it doesn't show that.
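In symbols, the fact we are starting from is

\[
T_{\text{BST-sort}}(n) \;=\; \sum_{x \in \text{tree}} \operatorname{depth}(x),
\qquad
\mathrm{E}\Bigl[\sum_{x} \operatorname{depth}(x)\Bigr] \;=\; \Theta(n\lg n),
\]

where the sum runs over all n nodes of the randomly built tree.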
If that felt like just intuition, that's because it is, and the intuition isn't quite right. Indeed it's not. Let me tell you what it does say. If we take expectations of both sides, on this side we get n log n; the expected value of that sum is theta n log n. Over here, we get the expected total depth, which is not so exciting, so let's look at the expected average depth instead. If I look at one over n times the sum, over all n nodes x in the tree, of the depth of x, that's the average depth over all the nodes. And what I get is theta of n log n over n, because I divided both sides by n, and I'm using linearity of expectation here; that's theta log n. So, what this fact about the expected running time tells me is that the expected average depth in the tree is order log n, which is not quite the same as the height of the tree being log n. Remember, the height of the tree is the maximum depth of any node; here, we are just bounding the average depth.

Let's look at an example of a tree. I'll draw my favorite picture. Here we have a nice balanced tree, let's say on half of the nodes or a little more, and then I have one really long path hanging off one particular leaf. It doesn't matter which one. And I'm going to say that this path has length, total height here, square root of n, which is a lot bigger than log n. The balanced part has height roughly log n; it's going to be log of n minus root n, or so, roughly. So, most of the nodes have logarithmic depth. If you compute the average depth in this particular tree: at most n of the nodes have depth about log n, so they contribute at most n log n in total; and there are at most root n nodes down here on the path, each of depth at most root n, so they contribute at most root n times root n, in fact about half that, but not a big deal. That's n. So the total depth is order n log n, and for the average depth I have to divide everything by n; n log n would be rather large for an average depth. So, the average depth here is order log n, but the height of the tree is square root of n.
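Making that arithmetic explicit, with the approximate counts used in the picture:

\[
\frac{1}{n}\sum_{x}\operatorname{depth}(x)
\;\le\; \frac{1}{n}\Bigl(n\cdot O(\lg n) \;+\; \sqrt{n}\cdot\sqrt{n}\Bigr)
\;=\; O(\lg n),
\]

even though the height of this tree is $\Theta(\sqrt{n})$, which is much bigger than $\lg n$.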
So, this is not enough. Just knowing that the average depth is log n doesn't mean that the height is log n. But the claim, the theorem for today, is that the expected height of a randomly built BST is indeed order log n. This is what we'd like to know, because it tells us that if we just build a binary search tree randomly, then we can search in it in log n time. For sorting, it's not as big a deal; there we just care about the expected running time of creating the thing. Here, once we prove this theorem, we know that we can search quickly in expectation, and in fact most of the time. So, the rest of today's lecture will be proving this theorem. It's quite tricky, as you might imagine. It's another big probability analysis, along the lines of quicksort and everything.

So, I'm going to start with an outline of the proof, unless there are any questions about the theorem. It should be pretty clear what we want to prove. This is even weirder than most of the analyses we've seen. It's going to use a fancy trick, which is exponentiating a random variable. And to do that we need a tool called Jensen's inequality. We are going to prove that tool. Usually, we don't prove probability tools, but this one we are going to prove; it's not too hard, and it's also basic analysis. So, the lemma says that if we have what's called a convex function f, and you should all know what that means, but I'll define it soon in case you have forgotten, and we have a random variable X, then f of the expectation of X is at most the expectation of f of that random variable. Think about it enough, and draw a convex function, and it's fairly intuitive, I guess. But we will prove it.

What that allows us to do is the following. Instead of analyzing the random variable that tells us the height of a tree, so X_n I'll call the random variable of the height of a randomly constructed BST on n nodes, instead of analyzing this desired random variable X_n directly,
we can analyze any convex function of X_n. And we're going to analyze the exponential: I'm going to define Y_n to be two to the power X_n. The big question here is, why bother doing this? The answer is: because it works, and it wouldn't work if we analyzed X_n directly. We will see some intuition for that later on, but it's not very intuitive; this is one analysis where you need this extra trick. So, we're going to bound the expectation of Y_n, and from that, using Jensen's inequality, we're going to get a bound on the expectation of X_n, a pretty tight bound, actually, because if we can bound the exponential up to constant factors, we can bound X_n even better, since we take logs to get back to X_n. So, we will even figure out what the constant is. What we will prove, and this is the heart of the proof, is that the expected value of Y_n is order n^3. Here, we won't really know what the constant is; we don't need to.

And then, we put these pieces together. So, let's do that. What we really care about is the expectation of X_n, which is the height of our tree. What we find out is this fact: leaving some horizontal space here, we get that the expectation of two to the X_n, which is the expectation of Y_n, is order n^3. And Jensen's inequality tells us that if we take this function, two to the x, and plug it in, then on the left-hand side we get two to the E of X_n, so two to the E of X_n is at most E of two to the X_n. That's where we use Jensen's inequality, because what we care about is E of X_n. So now we have a bound: two to the E of X_n is at most order n^3. If we take the log of both sides, we get that E of X_n is at most the log of order n^3. I will write it in this funny way, log of order n^3, which will actually tell us the constant: this is three log n plus order one. So, we will prove that the expected height of a randomly constructed binary search tree on n nodes is at most roughly three log n.
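Written as one chain, the argument just outlined is

\[
2^{\mathrm{E}[X_n]} \;\le\; \mathrm{E}\bigl[2^{X_n}\bigr] \;=\; \mathrm{E}[Y_n] \;=\; O(n^3)
\quad\Longrightarrow\quad
\mathrm{E}[X_n] \;\le\; \lg\bigl(O(n^3)\bigr) \;=\; 3\lg n + O(1),
\]

where the first inequality is Jensen's inequality applied to the convex function $f(x) = 2^x$.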
OK, I will say more about that later. So, you've now seen the end of the proof; that's the foreshadowing. And this is the top-down approach: you sort of see what the steps are, and now we just have to do the steps. Step one takes a bit of work, but it's easy because it's pretty basic stuff. Step two is just a definition, and we are done. Step three is probably the hardest part. Step four we've already done. So, let's start with step one.

The first thing I need to do is define a convex function, because we are going to manipulate the definition a fair amount. This is a notion from real analysis. Analysis is a fancy word for calculus, if you haven't taken a proper analysis class; you should have seen convexity in some calculus class. A convex function is one that looks like this. One way to formalize that notion is to consider any two points on this curve. I'm only interested in functions from reals to reals, so it looks like this: this is f of something, and this is the something. If I take two points on this curve and I draw a line segment connecting them, that line segment is always above the curve. That's the meaning of convexity. There's a geometric notion of convexity too, which is basically the same; but for functions, this line segment should stay above the curve. The whole line does not stay above the curve: if I extended it farther, it would go beneath the curve, of course. But that segment should. So, I'm going to formalize that a little bit. I'll call this x, and this is f of x; and I'll call this y, and this is f of y. The claim is that I take any number between x and y, and I look up, and I say: here's the point on the curve, and here's the point on the line segment. The y-value of the point on the segment should be greater than or equal to the y-value of the point on the curve. To figure out what that point is, we need some, I would call it geometry. I'm sure it's an analysis concept, too. But I'm a geometer, so I get to call it geometry.
If you have two points, p and q, and you want to parameterize the line segment between them, so I want to parameterize some points here, the way to do it is to take a linear combination. If you've taken some linear algebra, a linear combination looks something like this. And, in fact, we're going to take something called an affine combination, where alpha plus beta equals one. It turns out that if you take all such points, some number alpha times the point p, plus some number beta times the point q, where alpha plus beta equals one, you get the entire line through p and q, which is nifty. But we don't want the entire line. If you also constrain alpha and beta to be nonnegative, you get just this line segment. So, this forces alpha and beta to be between zero and one, because they are nonnegative and they sum to one.

So, what we are going to do here is take alpha times x plus beta times y; that's going to be our point in between, with these constraints: alpha plus beta equals one, and alpha and beta are greater than or equal to zero. Then this point on the curve is f of that, f of alpha x plus beta y. And this point on the segment is the linear interpolation between f of x and f of y, with the same coefficients: alpha times f of x plus beta times f of y. That's the intuition. If you didn't follow it, it's not too big a deal, because all we care about is the symbolic version for proving things. But that's where this comes from. So, here's the definition: a function f is convex if, for all x and y, and for all alpha and beta greater than or equal to zero whose sum is one, we have f of alpha x plus beta y is less than or equal to alpha f of x plus beta f of y. That's just saying that this y-coordinate here is less than or equal to that y-coordinate. But that's the symbolism behind the picture.
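A quick numeric instance (my own example, not from the lecture), using the function we will actually apply this to, $f(x) = 2^x$: take $x = 0$, $y = 4$, and $\alpha = \beta = \tfrac12$. Then

\[
f\bigl(\tfrac12\cdot 0 + \tfrac12\cdot 4\bigr) \;=\; 2^{2} \;=\; 4
\;\le\;
\tfrac12\, 2^{0} + \tfrac12\, 2^{4} \;=\; 8.5,
\]

so the chord at the midpoint does lie above the curve, as the picture says.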
OK, so now we want to prove Jensen's inequality. Well, we're not quite there yet: we are going to prove a simple lemma, from which it will be easy to derive Jensen's inequality. So, that's the theorem we are proving, and here's a lemma about convex functions. You may have seen it before; it will be crucial to Jensen's inequality. This is a statement about affine combinations of n things instead of two things; it says that convexity generalizes to combinations of n points. So, suppose we have n real numbers x_1 up to x_n, and we have n coefficients alpha_1 up to alpha_n, all nonnegative, whose sum is one: the sum of alpha_k, for k equals one to n, is one. Those are the assumptions. The conclusion is the same inequality as in the definition, but summing over all k: f of the sum, for k equals one to n, of alpha_k times x_k is at most the sum, for k equals one to n, of alpha_k times f of x_k. The definition of convexity is exactly that statement for n equals two, where alpha_1 and alpha_2 are alpha and beta. This is just the statement for general n.

And you can interpret this in a funnier way, which I won't get into. Oh, sure, why not? I'm a geometer. This is saying: you take several points on the curve, you take the polygon that they define, so these are straight-line segments, and you take its interior. If you take an affine combination like this, you get a point inside that polygon, or possibly on its boundary. The claim is that all those points are above the curve. Again, it's intuitively true if you draw a nice, canonical convex curve, but in fact it's true algebraically, too, which is always a good thing.

Any suggestions on how we might prove this lemma? It's pretty easy. What technique might we use to prove it? One word: induction. Always a good answer, yeah. Induction should shout out at you here, because we already know that this is true, by the definition of convexity, for n equals two. So the base case is clear. In fact, there's an even simpler base case, which is n equals one. If n equals one, then you have one coefficient that sums to one, so alpha_1 is one, and nothing is going on here.
The statement is just saying that f of one times x_1 is at most one times f of x_1, which is not terribly exciting because it holds with equality. OK, so we don't even need the n equals two base case. So, the interesting part, although still not terribly interesting, is the induction step. This is good practice in induction. What we care about is f of this affine combination, the sum of alpha_k times x_k over all k. Now, what I would like to do is apply induction. What I know about inductively is f of a sum like this that goes only up to n minus one instead of all the way up to n; any smaller sum I can deal with by induction. So, I'm going to try to split off the nth term. This is fairly natural if you've played with affine combinations before, but it's just some algebra. I want to separate out the alpha_n times x_n term, and I'd also like what remains to be an affine combination. This is the trick. If I just removed the last term, the alpha_k's from one up to n minus one wouldn't sum to one anymore; they'd sum to something smaller. So I can't just take out this term; I have to do a little trickery: I write the sum as alpha_n times x_n, plus one minus alpha_n times the sum, for k equals one to n minus one, of alpha_k over one minus alpha_n, times x_k. You should see why this is true: the one minus alpha_n factors cancel, and then I'm just getting the sum of alpha_k times x_k for k equals one to n minus one, plus the alpha_n times x_n term. So I haven't done anything here; these are equal. But now I have this nifty feature that, on the one hand, these two numbers, alpha_n and one minus alpha_n, sum to one; and on the other hand, if I did it right, the coefficients alpha_k over one minus alpha_n sum to one as k goes from one up to n minus one. Why do they sum to one? Well, those alpha_k's summed to one minus alpha_n, and I'm dividing everything by one minus alpha_n, so they sum to one. So now I have two affine combinations, and I just apply the two things that I know. I know this outer affine combination will work because, well, why?
Why can I say that this is at most alpha_n times f of x_n, plus one minus alpha_n times f of this crazy sum? Shout it out. There are two possible answers; one is correct and one is incorrect. So, which will it be? And note this should be "less than or equal to"; that's important. It's on the board; it can't be too difficult. I'm treating this whole inner sum as just one big x-value. So I have x_n, and I have some crazy x, and I want: f of the affine combination of those two x-values is at most the affine combination of the f's of those x-values. This is? "The inductive hypothesis where n equals two." Unfortunately, we didn't prove the n equals two case as a special base case, so we can't use induction here the way I've stated the base case. If you did the n equals two base case, you could do that; here, we can't. So, the other answer is: by convexity, good. That's right here: f is convex, and we know this is true for any two x-values, provided the two coefficients sum to one. So, we know this step is true.

Now is when we apply induction. We are going to manipulate the right term by induction. See, before, we didn't necessarily know that n was bigger than two; but we do know that n is bigger than n minus one. That much I can be sure of. So, this is at most alpha_n times f of x_n, plus one minus alpha_n times the sum, for k equals one to n minus one, of alpha_k over one minus alpha_n, times f of x_k, if I got that right. This is by induction, the induction hypothesis, because these coefficients alpha_k over one minus alpha_n sum to one. Now, the one minus alpha_n factors cancel, and we get just what we want: the sum, for k equals one to n, of alpha_k times f of x_k. So, f of the sum is at most the sum of the f's. That proves the lemma. A bit tedious, but each step is pretty straightforward. Do you agree?
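Written out in one line (with $\alpha_n < 1$; if $\alpha_n = 1$ the claim is just the n equals one case), the induction step is

\[
f\Bigl(\sum_{k=1}^{n}\alpha_k x_k\Bigr)
= f\Bigl(\alpha_n x_n + (1-\alpha_n)\sum_{k=1}^{n-1}\tfrac{\alpha_k}{1-\alpha_n}\,x_k\Bigr)
\le \alpha_n f(x_n) + (1-\alpha_n)\,f\Bigl(\sum_{k=1}^{n-1}\tfrac{\alpha_k}{1-\alpha_n}\,x_k\Bigr)
\le \sum_{k=1}^{n}\alpha_k f(x_k),
\]

where the first inequality is convexity and the second is the induction hypothesis applied to the rescaled coefficients $\alpha_k/(1-\alpha_n)$, which are nonnegative and sum to one.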
Now, it turns out to be relatively straightforward to prove Jensen's inequality. That's the magic. And then we get to do the expectation analysis, where we use our good friends, indicator random variables. But for now, we just want to prove this statement: if we have a convex function f, then f of the expectation of X is at most the expectation of f of X. This is a random variable, right? If you want to sample from this random variable, you sample from X, and then you apply f to it; that's the meaning of the notation f of X, because X is a random variable. We get to use the fact that f is convex. It turns out this is not hard if you remember the definition of expectation. Oh, and I want to make one more assumption here, which is that X is integral: it's an integer random variable, meaning it takes integer values. That's all we care about, because we're looking at running times. The statement is true for continuous random variables, too, but I'd like to do the discrete case, because then I get to write down what E of X is.

So, what is the definition of E of X, given that X only takes on integer values? This is easy, but you have to remember it; it's a good drill. I don't really know much about X except that it takes on integer values. Any suggestions on how I should expand the expectation of X? How many people know this by heart? OK, it's not too easy, then. Well, expectation has something to do with probability, right? So, I should be looking at something like the probability that X equals some value little x. That seems like a good thing to do. What else goes here? A sum, yeah; and x could be anywhere between minus infinity and infinity, so we sum over all of those. There's still something missing here. What does this sum of probabilities come out to, for any random variable X that takes on integer values? One, good. So, I need to multiply in something here, namely the value x itself: the expectation is the sum, over all integers x, of x times the probability that X equals x. Now, f of a sum of things, where the coefficients sum to one, looks an awful lot like the lemma we just proved. We proved it in the finite case; it turns out it holds just as well if you sum over all integers, so I'm just going to assume that. So, I have these probabilities, and these "alpha" values sum to one.
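In symbols, the definition just assembled is

\[
\mathrm{E}[X] \;=\; \sum_{x=-\infty}^{\infty} x\cdot\Pr\{X = x\},
\qquad\text{with}\qquad
\sum_{x=-\infty}^{\infty} \Pr\{X = x\} \;=\; 1,
\]

so the probabilities play exactly the role of the $\alpha_k$ in the lemma.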
Therefore, I can use the lemma's inequality: f of the expectation is at most, let me get this right, the sum, for x from minus infinity to infinity, of the "alphas", which are the probabilities that capital X equals little x, times f of the value, f of little x. So, there it is; I've used the lemma. Maybe now I'll erase the lemma. OK, I cheated by using the countable version of the lemma while only proving the finite case; it's all I can do in lecture. So, this step is by the lemma. Now, what I'd like to prove, and I'll leave some blank space here, is that this summation is at most E of f of X. Actually, it's equal to E of f of X. And it really looks kind of equal, right? You've got a sum of some probabilities times f of x; it almost looks like the definition of E of f of X, but it isn't. You've got to be a little bit careful, because E of f of X should talk about the probability that f of X equals a particular value.

We can relate these as follows; it's not too hard. You group the terms by the value that f takes on. I'd better call that value something other than x, so let's call it y: look at each value y in the range of f, and then look at all the values of x that map to that value y, that is, all the x's where f of x equals y. If I add up the probabilities that X equals x over those x's, then, because these are disjoint events, that sum is just another way of writing the probability that f of X equals y; this is capital X, and this is little y. And then, if I multiply that by y and sum over all y, I'm getting exactly the expectation of f of X. So, think about it: these two steps hold. It may be a bit bizarre here because these sums are potentially infinite, but it's true. OK, this proves Jensen's inequality.
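Putting the whole argument on one line:

\[
f\bigl(\mathrm{E}[X]\bigr)
= f\Bigl(\sum_{x} x\Pr\{X=x\}\Bigr)
\le \sum_{x} \Pr\{X=x\}\, f(x)
= \sum_{y} y \sum_{x:\,f(x)=y} \Pr\{X=x\}
= \sum_{y} y\,\Pr\{f(X)=y\}
= \mathrm{E}\bigl[f(X)\bigr],
\]

where x ranges over the integers, y ranges over the values in the range of f, and the single inequality is the convexity lemma.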
734 00:54:02,000 --> 00:54:09,000 And then, if I multiply that by y, I'm getting the expectation 735 00:54:09,000 --> 00:54:12,000 of f of X. So, think about this, 736 00:54:12,000 --> 00:54:18,000 these two inequalities hold. This may be a bit bizarre here 737 00:54:18,000 --> 00:54:22,000 because these sums are potentially infinite. 738 00:54:22,000 --> 00:54:26,000 But, it's true. OK, this proves Jensen's 739 00:54:26,000 --> 00:54:30,000 inequality. So, it wasn't very hard, 740 00:54:30,000 --> 00:54:35,000 just a couple of boards, once we had this powerful 741 00:54:35,000 --> 00:54:41,000 convexity lemma. So, we just used convexity. 742 00:54:41,000 --> 00:54:43,000 We used the definition of E of X. 743 00:54:43,000 --> 00:54:47,000 We used convexity. That lets us put the f's 744 00:54:47,000 --> 00:54:50,000 inside. Then we do this regrouping of 745 00:54:50,000 --> 00:54:54,000 terms, and we figure out, oh, that's just E of f of X. 746 00:54:54,000 --> 00:54:58,000 So, the only inequality here is coming from convexity. 747 00:54:58,000 --> 00:55:01,000 All right, now come the algorithms. 748 00:55:01,000 --> 00:55:05,000 So, this was just some basic probability stuff, 749 00:55:05,000 --> 00:55:10,000 which is good to practice. OK, we could see that in the quiz, 750 00:55:10,000 --> 00:55:13,000 which is not surprising. This is the case for me, 751 00:55:13,000 --> 00:55:15,000 too. You have a lot of intuition 752 00:55:15,000 --> 00:55:17,000 with algorithms. Whenever it's algorithmic, 753 00:55:17,000 --> 00:55:21,000 it makes a lot of sense because you're sort of grounded in some 754 00:55:21,000 --> 00:55:24,000 things that you know, because you are computer scientists, 755 00:55:24,000 --> 00:55:27,000 or something of that ilk. For the purposes of this class, 756 00:55:27,000 --> 00:55:32,000 you are computer scientists. But, with sort of the basic 757 00:55:32,000 --> 00:55:36,000 probability, unless you happen to be a mathematician, 758 00:55:36,000 --> 00:55:40,000 it's less intuitive, and therefore harder to get 759 00:55:40,000 --> 00:55:42,000 fast. And, in quiz one, 760 00:55:42,000 --> 00:55:45,000 speed is pretty important. On the final, 761 00:55:45,000 --> 00:55:50,000 speed will also be important. On the take-home, it certainly doesn't 762 00:55:50,000 --> 00:55:53,000 hurt. So, the take-home is more 763 00:55:53,000 --> 00:55:56,000 interesting because it requires being clever. 764 00:55:56,000 --> 00:56:01,000 You have to actually be creative. 765 00:56:01,000 --> 00:56:03,000 And, that really tests algorithmic design. 766 00:56:03,000 --> 00:56:06,000 So far, we've mainly tested analysis, and just, 767 00:56:06,000 --> 00:56:09,000 can you work through probability? 768 00:56:09,000 --> 00:56:12,000 Can you remember what the 769 00:56:12,000 --> 00:56:15,000 running time of randomized quicksort is, 770 00:56:15,000 --> 00:56:17,000 and so on? Quiz two will actually test 771 00:56:17,000 --> 00:56:20,000 creativity because you have more time. 772 00:56:20,000 --> 00:56:22,000 It's hard to be creative in two hours. 773 00:56:22,000 --> 00:56:26,000 OK, so we want to analyze the expected height of a randomly 774 00:56:26,000 --> 00:56:32,000 constructed binary search tree. So, I've defined this before, 775 00:56:32,000 --> 00:56:38,000 but let me repeat it because it was a while ago, almost at the 776 00:56:38,000 --> 00:56:42,000 beginning of lecture.
I'm going to take the random 777 00:56:42,000 --> 00:56:48,000 variable of the height of a randomly built binary search 778 00:56:48,000 --> 00:56:51,000 tree on n nodes. So, that was: randomly permute 779 00:56:51,000 --> 00:56:55,000 the n values. Take a random permutation, 780 00:56:55,000 --> 00:57:02,000 insert them one by one from left to right with tree insert. 781 00:57:02,000 --> 00:57:05,000 What is the height of the tree that you get? 782 00:57:05,000 --> 00:57:08,000 What is the maximum depth of any node? 783 00:57:08,000 --> 00:57:11,000 I'm not going to look so much at X_n. 784 00:57:11,000 --> 00:57:14,000 I'm going to look at the exponential of X_n, two to the X_n. 785 00:57:14,000 --> 00:57:17,000 And, still we have no intuition why. 786 00:57:17,000 --> 00:57:20,000 But, two to the X is a convex function. 787 00:57:20,000 --> 00:57:23,000 OK, it looks like that. It's very sharp. 788 00:57:23,000 --> 00:57:27,000 That's the best I can do for drawing, two to the X. 789 00:57:27,000 --> 00:57:31,000 You saw how I drew my histogram. 790 00:57:31,000 --> 00:57:34,000 So, we want to somehow write this random variable as 791 00:57:34,000 --> 00:57:36,000 something, OK, in some algebra. 792 00:57:36,000 --> 00:57:39,000 The main thing here is to split into cases. 793 00:57:39,000 --> 00:57:42,000 That's how we usually go because there's lots of 794 00:57:42,000 --> 00:57:45,000 different scenarios on what happens. 795 00:57:45,000 --> 00:57:48,000 So, I mean, how do we construct a tree from the beginning? 796 00:57:48,000 --> 00:57:51,000 First thing we do is we take the first node. 797 00:57:51,000 --> 00:57:54,000 We throw it in, make it the root. 798 00:57:54,000 --> 00:57:58,000 OK, so whatever the first value happens to be in the array, 799 00:57:58,000 --> 00:58:02,000 and we don't really know where that falls in sorted order, 800 00:58:02,000 --> 00:58:06,000 we put it at the root. And, it stays the root. 801 00:58:06,000 --> 00:58:08,000 We never change the root from then on. 802 00:58:08,000 --> 00:58:12,000 Now, of all the remaining elements, some of them are less 803 00:58:12,000 --> 00:58:14,000 than this value, and they go over here. 804 00:58:14,000 --> 00:58:17,000 So, let's call this r at the root. 805 00:58:17,000 --> 00:58:19,000 And, some of them are greater than r. 806 00:58:19,000 --> 00:58:22,000 So, they go over here. Maybe there's more over here. 807 00:58:22,000 --> 00:58:25,000 Maybe there's more over here. Who knows? 808 00:58:25,000 --> 00:58:28,000 Arbitrary partition, in fact, uniformly random 809 00:58:28,000 --> 00:58:31,000 partition, which should sound familiar: whether there are k 810 00:58:31,000 --> 00:58:34,000 elements over here, and n minus k minus one 811 00:58:34,000 --> 00:58:36,000 elements over here, for any value of k, 812 00:58:36,000 --> 00:58:42,000 that's equally likely because this is chosen uniformly. 813 00:58:42,000 --> 00:58:44,000 The root is chosen uniformly. It's the first element in a 814 00:58:44,000 --> 00:58:47,000 random permutation. So, what I'm going to do is 815 00:58:47,000 --> 00:58:49,000 parameterize by that. How many elements are over 816 00:58:49,000 --> 00:58:51,000 here, and how many elements are over here? 817 00:58:51,000 --> 00:58:54,000 Because this thing is, again, a randomly built binary 818 00:58:54,000 --> 00:58:57,000 search tree on however many nodes are in there, because after 819 00:58:57,000 --> 00:59:00,000 I pick r, it's determined who is to the left and who is to the 820 00:59:00,000 --> 00:59:03,000 right.
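As a side illustration, not from the lecture, here is a minimal Python sketch of the random variable X_n just defined: shuffle the values 1 through n, insert them one by one with the standard tree insert, and report the maximum depth reached. The function name and the driver at the bottom are mine, purely for illustration.

```python
import math
import random

def random_bst_height(n):
    """Height (maximum depth of any node, root at depth 0) of a BST built by
    inserting a uniformly random permutation of 1..n with standard tree insert."""
    keys = list(range(1, n + 1))
    random.shuffle(keys)              # uniform random permutation of the n values
    left, right = {}, {}              # child pointers, keyed by parent value
    root, height = keys[0], 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:                   # iterative tree insert, tracking the depth
            depth += 1
            child = left if key < node else right
            if node in child:
                node = child[node]    # keep searching in the appropriate subtree
            else:
                child[node] = key     # found the empty spot: hang a new leaf here
                break
        height = max(height, depth)
    return height

if __name__ == "__main__":
    n = 10_000
    samples = [random_bst_height(n) for _ in range(20)]
    # Compare the observed average height with the roughly 3 lg n bound derived later.
    print(sum(samples) / len(samples), 3 * math.log2(n))
```

For moderate n, the average over a few trials typically comes out close to the roughly 3 lg n bound that the rest of the lecture establishes.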
And so, I can just partition. 821 00:59:03,000 --> 00:59:07,000 It's like running quicksort. I partition the elements left 822 00:59:07,000 --> 00:59:11,000 of r, the elements right of r, and I'm sort of recursively 823 00:59:11,000 --> 00:59:15,000 constructing a randomly built binary search tree on those two 824 00:59:15,000 --> 00:59:18,000 sub-permutations, because sub-permutations of uniform 825 00:59:18,000 --> 00:59:22,000 permutations are uniform. OK, so these are essentially 826 00:59:22,000 --> 00:59:25,000 recursive problems. And, we know how to analyze 827 00:59:25,000 --> 00:59:28,000 recursive problems. All we need to know is that 828 00:59:28,000 --> 00:59:31,000 there are k minus one elements over here, and n minus k 829 00:59:31,000 --> 00:59:38,000 elements over here. And, that would mean that r has 830 00:59:38,000 --> 00:59:45,000 rank k, remember, rank in the sense of the index 831 00:59:45,000 --> 00:59:52,000 in sorted order. So, where should I go? 832 01:00:08,000 --> 01:00:11,034 So, if the root, r, has rank 833 01:00:11,034 --> 01:00:17,318 k, so this is a statement conditioned on this event, 834 01:00:17,318 --> 01:00:23,278 which is a random event, then what we have is X_n equals 835 01:00:23,278 --> 01:00:29,888 one plus the max of X_(k minus one), X_(n minus k), because the 836 01:00:29,888 --> 01:00:35,848 height of this tree is the max of the heights of the two 837 01:00:35,848 --> 01:00:43,000 subtrees plus one, because we have one more level up top. 838 01:00:43,000 --> 01:00:46,728 OK, so that's the natural thing to do. 839 01:00:46,728 --> 01:00:51,263 What we are trying to analyze, though, is Y_n. 840 01:00:51,263 --> 01:00:55,193 So, for Y_n, we have to take two to this 841 01:00:55,193 --> 01:00:58,720 power. So, it's two times the max of 842 01:00:58,720 --> 01:01:03,961 two to the X_(k minus one), which is Y_(k minus one), 843 01:01:03,961 --> 01:01:09,000 and two to this, which is Y_(n minus k). 844 01:01:09,000 --> 01:01:12,536 And, now you start to see, maybe, why we are interested in 845 01:01:12,536 --> 01:01:16,260 Y's instead of X's, in the sense that it's what we know how to 846 01:01:16,260 --> 01:01:18,059 do. When we solve a recursion, 847 01:01:18,059 --> 01:01:20,541 when we solve, like, the expected running 848 01:01:20,541 --> 01:01:22,713 time, we haven't taken expectations 849 01:01:22,713 --> 01:01:24,823 yet, here. But, when we compute the 850 01:01:24,823 --> 01:01:28,050 expected running time of quicksort, we have something 851 01:01:28,050 --> 01:01:30,656 like two times, I mean, we have a couple of 852 01:01:30,656 --> 01:01:35,000 recursive subproblems, which are being added together. 853 01:01:35,000 --> 01:01:37,015 OK, here, we have a factor of two. 854 01:01:37,015 --> 01:01:39,276 Here, we have a max. But, intuitively, 855 01:01:39,276 --> 01:01:43,002 we know how to multiply random variables by a constant, because 856 01:01:43,002 --> 01:01:45,079 that's like having two recursive 857 01:01:45,079 --> 01:01:48,500 subproblems, whose size is equal to the max of these two, 858 01:01:48,500 --> 01:01:50,576 which we don't happen to know here. 859 01:01:50,576 --> 01:01:52,653 But, there it is, whereas the one plus, 860 01:01:52,653 --> 01:01:54,791 we don't know how to handle so well. 861 01:01:54,791 --> 01:01:57,357 And, indeed, our techniques are really good 862 01:01:57,357 --> 01:02:00,289 at solving recurrences, except up to the constant 863 01:02:00,289 --> 01:02:03,355 factors.
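For reference, the two conditional recurrences just derived, given that the root r has rank k, are:

$$
X_n \;=\; 1 + \max\bigl(X_{k-1},\, X_{n-k}\bigr),
\qquad
Y_n \;=\; 2^{X_n} \;=\; 2\max\bigl(Y_{k-1},\, Y_{n-k}\bigr).
$$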
And, this one plus really 864 01:02:03,355 --> 01:02:05,685 doesn't affect the constant factor too much, 865 01:02:05,685 --> 01:02:07,745 it would seem. OK, but it's a big deal. 866 01:02:07,745 --> 01:02:09,859 After exponentiation, it's a factor of two. 867 01:02:09,859 --> 01:02:13,112 So here, it's really hard to see what this one plus is doing. 868 01:02:13,112 --> 01:02:14,900 And, our analysis, if we tried it, 869 01:02:14,900 --> 01:02:18,099 and it's a good idea to try it at home and see what happens, 870 01:02:18,099 --> 01:02:20,700 if you tried to do what I'm about to do with X_n, 871 01:02:20,700 --> 01:02:24,007 the one plus will sort of get lost, and you won't get a bound. 872 01:02:24,007 --> 01:02:26,771 You just can't prove anything. With a factor of two, 873 01:02:26,771 --> 01:02:29,319 we're in good shape. We sort of know how to deal 874 01:02:29,319 --> 01:02:33,980 with that. We'll say more when we've 875 01:02:33,980 --> 01:02:41,015 actually done the proof about why we use Y_n instead of X_n. 876 01:02:41,015 --> 01:02:44,353 But for now, we're using Y_n. 877 01:02:44,353 --> 01:02:49,480 So, this is sort of a recursion, except it's 878 01:02:49,480 --> 01:02:56,038 conditioned on this event. So, how do I turn this into a 879 01:02:56,038 --> 01:02:59,973 statement that holds all the time? 880 01:02:59,973 --> 01:03:04,896 Sorry? Divide by the probability of 881 01:03:04,896 --> 01:03:07,275 the event? More or less. 882 01:03:07,275 --> 01:03:11,000 Indeed, these events are independent. 883 01:03:11,000 --> 01:03:15,551 Or, they're all equally likely, I should say. 884 01:03:15,551 --> 01:03:21,241 They're not independent. In fact, one determines all the 885 01:03:21,241 --> 01:03:24,241 others. So, how do I generally 886 01:03:24,241 --> 01:03:30,137 represent an event in algebra? Indicator random variables: 887 01:03:30,137 --> 01:03:34,995 good. Remember your friends, 888 01:03:34,995 --> 01:03:42,076 indicator random variables. All of these analyses use 889 01:03:42,076 --> 01:03:49,565 indicator random variables. So, they will just represent 890 01:03:49,565 --> 01:03:54,195 this event, and we'll call it Z_nk. 891 01:03:54,195 --> 01:03:59,778 It's going to be one if the root has rank 892 01:03:59,778 --> 01:04:05,415 k, and zero otherwise. So, in particular, 893 01:04:05,415 --> 01:04:09,110 these things are all equally 894 01:04:09,110 --> 01:04:13,828 likely, for a particular value of n, across all the values 895 01:04:13,828 --> 01:04:16,186 of k. The probability that this 896 01:04:16,186 --> 01:04:20,746 equals one, which is also the expectation of that indicator 897 01:04:20,746 --> 01:04:23,734 random variable, as you should know, 898 01:04:23,734 --> 01:04:26,486 since it only takes values one or zero, 899 01:04:26,486 --> 01:04:29,788 and the zero doesn't matter in the expectation, 900 01:04:29,788 --> 01:04:34,034 is going to be, hopefully, one over n, if I got it 901 01:04:34,034 --> 01:04:36,000 right. 902 01:04:36,000 --> 01:04:43,013 So, there are n possibilities for what the rank of the root could 903 01:04:43,013 --> 01:04:46,922 be. Each of them is equally likely 904 01:04:46,922 --> 01:04:51,176 because we have a uniform permutation. 905 01:04:51,176 --> 01:04:57,040 So, now, I can rewrite this conditional statement as a 906 01:04:57,040 --> 01:05:04,168 summation where the Z_nk's will let me choose what case I'm in.
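For reference, the indicator just defined and its expectation:

$$
Z_{nk} =
\begin{cases}
1 & \text{if the root has rank } k,\\
0 & \text{otherwise,}
\end{cases}
\qquad
\mathrm{E}[Z_{nk}] \;=\; \Pr\{Z_{nk} = 1\} \;=\; \frac{1}{n},
\quad k = 1, \dots, n .
$$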
907 01:05:04,168 --> 01:05:10,836 So, we have Y_n is the sum, k equals one to n, of Z_nk times 908 01:05:10,836 --> 01:05:16,010 two times the max of Y_(k minus one), 909 01:05:16,010 --> 01:05:20,478 Y_(n minus k). So, now we have our good 910 01:05:20,478 --> 01:05:23,126 friend, the recurrence. We need to solve it. 911 01:05:23,126 --> 01:05:26,329 OK, we can't really solve it because this is a random 912 01:05:26,329 --> 01:05:29,963 variable, and it's talking about recursive random variables. 913 01:05:29,963 --> 01:05:32,858 So, we first take the expectation of both sides. 914 01:05:32,858 --> 01:05:36,000 That's the only thing we can really bound. 915 01:05:36,000 --> 01:05:40,074 Y_n could be n^2 in an unlucky case, sorry, not n^2, 916 01:05:40,074 --> 01:05:43,190 it could be two to the, 917 01:05:43,190 --> 01:05:47,903 boy, two to the n if you are unlucky, because X_n could be as 918 01:05:47,903 --> 01:05:50,460 big as n, the height of the tree. 919 01:05:50,460 --> 01:05:54,694 And, Y_n is two to that. So, it could be two to the n. 920 01:05:54,694 --> 01:05:58,688 What we want to prove is that it's polynomial in n. 921 01:05:58,688 --> 01:06:02,203 If it's n to some constant, and we take logs, 922 01:06:02,203 --> 01:06:07,341 it'll be order log n. OK, so we'll take the 923 01:06:07,341 --> 01:06:14,254 expectation, and hopefully that will guarantee that this holds. 924 01:06:14,254 --> 01:06:20,163 OK, so we have the expectation of this summation of random 925 01:06:20,163 --> 01:06:24,846 variables times recursive random variables. 926 01:06:24,846 --> 01:06:30,198 So, what is the first, whoops, I forgot a bracket. 927 01:06:30,198 --> 01:06:37,000 What is the first thing that we do in this analysis? 928 01:06:37,000 --> 01:06:41,300 This should be, yeah, linearity of expectation. 929 01:06:41,300 --> 01:06:45,900 That one's easy to remember. OK, we have a sum. 930 01:06:45,900 --> 01:06:49,000 So, let's put the E inside. 931 01:07:04,000 --> 01:07:08,842 OK, now we have the expectation of our product. 932 01:07:08,842 --> 01:07:12,210 What should we use? Independence. 933 01:07:12,210 --> 01:07:15,684 Hopefully, things are independent. 934 01:07:15,684 --> 01:07:21,052 And then, we could write this. Then, it would be the 935 01:07:21,052 --> 01:07:26,842 product of the expectations. And, heck, let's put the two 936 01:07:26,842 --> 01:07:34,000 outside, because there's no sense in keeping it in here. 937 01:07:34,000 --> 01:07:37,956 Why are these starting to look like X's? 938 01:07:37,956 --> 01:07:42,351 I can't even read them. Sorry about that. 939 01:07:42,351 --> 01:07:46,417 These should all be Y's. OK, they're Y's, 940 01:07:46,417 --> 01:07:48,615 random variables. So. 941 01:07:48,615 --> 01:07:54,769 Why are these independent? So, here we are looking at the 942 01:07:54,769 --> 01:08:00,703 choice of what the root is, what rank the root has in a 943 01:08:00,703 --> 01:08:05,608 problem of size n. In here, we're looking at what 944 01:08:05,608 --> 01:08:08,020 the root is, I mean, there are various choices of 945 01:08:08,020 --> 01:08:11,290 what the search tree looks like in the stuff left of the root, 946 01:08:11,290 --> 01:08:13,112 and in the stuff right of the root. 947 01:08:13,112 --> 01:08:16,220 Those are independent choices because everything is uniform 948 01:08:16,220 --> 01:08:18,096 here. So, the choice of this guy was 949 01:08:18,096 --> 01:08:20,081 uniform.
And then, that determines who 950 01:08:20,081 --> 01:08:22,011 goes into the left and the right. 951 01:08:22,011 --> 01:08:24,798 Those are completely independent recursive choices: 952 01:08:24,798 --> 01:08:26,621 who's the root in the left subtree, 953 01:08:26,621 --> 01:08:29,086 who's the root in the left of the left subtree, 954 01:08:29,086 --> 01:08:31,176 and so on. So, this is a little trickier 955 01:08:31,176 --> 01:08:36,385 than usual. Before, it was random choices 956 01:08:36,385 --> 01:08:41,871 in the algorithm. Now, it's in some construction 957 01:08:41,871 --> 01:08:47,474 where we choose the random numbers ahead of time. 958 01:08:47,474 --> 01:08:52,961 It's a bit funny, but this is still independent. 959 01:08:52,961 --> 01:08:58,214 So, we get this just like we did in quicksort, 960 01:08:58,214 --> 01:08:59,731 and so on. OK. 961 01:08:59,731 --> 01:09:05,374 Now, we continue. And, now it's time to be a bit 962 01:09:05,374 --> 01:09:08,143 sloppy. Well, one of these things we 963 01:09:08,143 --> 01:09:09,568 know. OK, E of Z_nk, 964 01:09:09,568 --> 01:09:12,812 that, we wrote over here. It's one over n. 965 01:09:12,812 --> 01:09:15,899 So, that's cool. So, we get a two over n 966 01:09:15,899 --> 01:09:20,488 outside, and we get this sum of the expectation of a max of 967 01:09:20,488 --> 01:09:23,812 these two things. Normally, we would write, 968 01:09:23,812 --> 01:09:27,136 well, I think sometimes you write T of max, 969 01:09:27,136 --> 01:09:30,143 or Y of the max of the two things here. 970 01:09:30,143 --> 01:09:36,000 You've got to write it as the max of these two variables. 971 01:09:36,000 --> 01:09:41,547 And, the trick, I mean, it's not too much of a 972 01:09:41,547 --> 01:09:46,849 trick, is that the max is, at most, the sum. 973 01:09:46,849 --> 01:09:53,506 So, we have nonnegative things. So, we have two over n, 974 01:09:53,506 --> 01:10:00,657 sum k equals one to n of the expectation of the sum instead 975 01:10:00,657 --> 01:10:03,943 of the max. OK, this is, 976 01:10:03,943 --> 01:10:07,014 in some sense, the key step where we are 977 01:10:07,014 --> 01:10:11,344 losing something in our bound. So far, we've been exact. 978 01:10:11,344 --> 01:10:15,437 Now, we're being pretty sloppy. It's true the max is, 979 01:10:15,437 --> 01:10:19,137 at most, the sum. But, it's a pretty loose upper 980 01:10:19,137 --> 01:10:22,758 bound as things go. We'll keep that in mind for 981 01:10:22,758 --> 01:10:25,434 later. What else can we do with the 982 01:10:25,434 --> 01:10:27,166 summation? This should, 983 01:10:27,166 --> 01:10:33,470 again, look familiar. Now that we have a sum of a sum 984 01:10:33,470 --> 01:10:38,283 of two things, I'd like it to be a 985 01:10:38,283 --> 01:10:40,858 sum of one thing. Sorry? 986 01:10:40,858 --> 01:10:45,559 You can use linearity of expectation, good. 987 01:10:45,559 --> 01:10:49,813 So, that's the first thing I should do. 988 01:10:49,813 --> 01:10:55,410 So, linearity of expectation lets me separate that. 989 01:10:55,410 --> 01:11:02,079 Now I have a sum of 2n things. Right, I could break that into 990 01:11:02,079 --> 01:11:05,405 the sum of these guys, and the sum of these guys. 991 01:11:05,405 --> 01:11:08,247 Do you know anything about those two sums? 992 01:11:08,247 --> 01:11:11,019 Do we know anything about those two sums? 993 01:11:11,019 --> 01:11:14,068 They're the same. In fact, every term here is 994 01:11:14,068 --> 01:11:17,326 appearing exactly twice.
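Chaining together the steps just taken, linearity of expectation, independence of Z_nk from the subtree variables, E[Z_nk] = 1/n, and max at most sum, gives:

$$
\mathrm{E}[Y_n]
\;=\; \sum_{k=1}^{n} \mathrm{E}[Z_{nk}]\cdot 2\,\mathrm{E}\bigl[\max(Y_{k-1},\, Y_{n-k})\bigr]
\;\le\; \frac{2}{n}\sum_{k=1}^{n} \mathrm{E}[Y_{k-1}]
\;+\; \frac{2}{n}\sum_{k=1}^{n} \mathrm{E}[Y_{n-k}] .
$$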
One says a k minus one. 995 01:11:17,326 --> 01:11:20,722 One says an n minus k, and that even works if n is 996 01:11:20,722 --> 01:11:22,455 odd, I think. So, in fact, 997 01:11:22,455 --> 01:11:26,267 we can just take one of the sums and multiply it by two. 998 01:11:26,267 --> 01:11:30,356 So, this is four over n times the sum, and I'll rewrite it a 999 01:11:30,356 --> 01:11:35,000 little bit, from zero to n minus one, of E of Y_k. 1000 01:11:35,000 --> 01:11:40,425 Just check that the number of times each Y_k appears, for k from zero up to 1001 01:11:40,425 --> 01:11:45,237 n minus one, is exactly two. So, now I have a recurrence. 1002 01:11:45,237 --> 01:11:48,649 I have E of Y_n is, at most, this thing. 1003 01:11:48,649 --> 01:11:51,800 Let's just write that down for our memory. 1004 01:11:51,800 --> 01:11:53,550 So, how's that? Cool. 1005 01:11:53,550 --> 01:11:57,050 Now, I just have to solve the recurrence. 1006 01:11:57,050 --> 01:12:03,000 How should I solve an ugly, hairy recurrence like this? 1007 01:12:03,000 --> 01:12:05,125 Substitution: yea! 1008 01:12:05,125 --> 01:12:10,750 Not the master method. OK, it's a pretty nasty 1009 01:12:10,750 --> 01:12:15,875 recurrence. So, I'm going to make a guess, 1010 01:12:15,875 --> 01:12:22,125 and I've already told you the guess, that it's n^3. 1011 01:12:22,125 --> 01:12:29,375 I think n^3 is pretty much exactly the point where this proof becomes 1012 01:12:29,375 --> 01:12:34,239 obtainable. So, substitution method, 1013 01:12:34,239 --> 01:12:38,720 the substitution method is just a proof by induction. 1014 01:12:38,720 --> 01:12:44,506 And, there are two things every proof by induction should have, 1015 01:12:44,506 --> 01:12:49,826 well, almost every proof by induction, unless you're being 1016 01:12:49,826 --> 01:12:52,906 fancy. It should have a base case, 1017 01:12:52,906 --> 01:12:57,013 and the base case here is n equals order one. 1018 01:12:57,013 --> 01:13:00,093 I didn't write it, but, of course, 1019 01:13:00,093 --> 01:13:05,318 if you have a constant size tree, it has constant height. 1020 01:13:05,318 --> 01:13:10,640 So, this thing will be true as long as we set c 1021 01:13:10,640 --> 01:13:15,684 sufficiently large. OK, so, don't forget that. 1022 01:13:15,684 --> 01:13:18,080 A lot of people forgot it on the quiz. 1023 01:13:18,080 --> 01:13:20,089 We even mentioned the base case. 1024 01:13:20,089 --> 01:13:22,939 Usually, we don't even mention the base case. 1025 01:13:22,939 --> 01:13:25,854 And, you should assume that there's one there. 1026 01:13:25,854 --> 01:13:30,000 And, you have to say this in any proof by substitution. 1027 01:13:30,000 --> 01:13:33,107 OK, now, we have the induction step. 1028 01:13:33,107 --> 01:13:37,279 So, I claim that E of Y_n is, at most, c times n^3, 1029 01:13:37,279 --> 01:13:40,563 assuming that it's true for smaller n. 1030 01:13:40,563 --> 01:13:44,647 You should write the induction hypothesis here, 1031 01:13:44,647 --> 01:13:49,618 but I'm going to skip it because I'm running out of time. 1032 01:13:49,618 --> 01:13:53,613 Now, we have this recurrence, that E of Y_n is, 1033 01:13:53,613 --> 01:13:56,809 at most, this thing. So, E of Y_n is, 1034 01:13:56,809 --> 01:14:01,159 at most, four over n, sum k equals zero to n minus 1035 01:14:01,159 --> 01:14:07,223 one of E of Y_k. Now, notice that k is always 1036 01:14:07,223 --> 01:14:12,059 smaller than n. So, we can apply induction.
1037 01:14:12,059 --> 01:14:15,858 So, this is, at most, four over n, 1038 01:14:15,858 --> 01:14:21,269 sum k equals zero to n minus one of c times k^3. 1039 01:14:21,269 --> 01:14:24,838 That's the induction hypothesis. 1040 01:14:24,838 --> 01:14:28,753 Cool. Now, I need an upper bound on 1041 01:14:28,753 --> 01:14:35,430 this sum. If you have a good memory, then you know a closed 1042 01:14:35,430 --> 01:14:40,801 form for this sum. But, I don't have such a good 1043 01:14:40,801 --> 01:14:43,970 memory as I used to. I never memorized this sum when 1044 01:14:43,970 --> 01:14:47,884 I was a kid, and I only remember the things I memorized when 1045 01:14:47,884 --> 01:14:51,612 I was less than 12 years old. I still remember all the digits 1046 01:14:51,612 --> 01:14:54,532 of pi, whatever. But, anything I try to memorize 1047 01:14:54,532 --> 01:14:57,079 now just doesn't quite stick the same way. 1048 01:14:57,079 --> 01:15:00,000 So, I don't happen to know this sum. 1049 01:15:00,000 --> 01:15:03,169 What's a good way to approximate this sum? 1050 01:15:03,169 --> 01:15:05,256 Integral: good. So, in fact, 1051 01:15:05,256 --> 01:15:07,653 I'm going to take the c outside. 1052 01:15:07,653 --> 01:15:10,900 So, this is 4c over n. The sum is, at most, 1053 01:15:10,900 --> 01:15:13,992 the integral, if you get the range right; 1054 01:15:13,992 --> 01:15:18,089 you have to go one larger. Instead of n minus one, 1055 01:15:18,089 --> 01:15:21,104 you go up to n. This is in the textbook. 1056 01:15:21,104 --> 01:15:24,274 It's intuitive, too, as long as you have a 1057 01:15:24,274 --> 01:15:26,516 monotone function. That's key. 1058 01:15:26,516 --> 01:15:31,000 So, you have something that's like this. 1059 01:15:31,000 --> 01:15:34,075 And, you know, the sum is taking each of these 1060 01:15:34,075 --> 01:15:36,671 and weighting them with a value of one. 1061 01:15:36,671 --> 01:15:40,157 The integral is computing the area under this curve. 1062 01:15:40,157 --> 01:15:42,684 So, in particular, if you look at this 1063 01:15:42,684 --> 01:15:45,624 approximation by the integral, 1064 01:15:45,624 --> 01:15:49,382 this thing would certainly be the sum if you go 1065 01:15:49,382 --> 01:15:52,252 one larger at the end, and that's, at most, 1066 01:15:52,252 --> 01:15:55,054 the integral. So, that's proof by picture. 1067 01:15:55,054 --> 01:15:57,309 But, you can see this in the book. 1068 01:15:57,309 --> 01:16:01,000 You should know it from 042, I guess. 1069 01:16:01,000 --> 01:16:04,448 Now, integrals, hopefully, you can solve. 1070 01:16:04,448 --> 01:16:07,206 The integral of x^3 is x^4 over four. 1071 01:16:07,206 --> 01:16:11,172 I got it right. And then, we're evaluating that at 1072 01:16:11,172 --> 01:16:12,637 n and at zero. 1073 01:16:12,637 --> 01:16:17,293 Subtracting the zero doesn't matter because zero to the 1074 01:16:17,293 --> 01:16:21,517 fourth power is zero. So, it's just n^4 over four. 1075 01:16:21,517 --> 01:16:25,051 So, this is 4c over n times n^4 over four. 1076 01:16:25,051 --> 01:16:28,931 And, conveniently, this four cancels with this 1077 01:16:28,931 --> 01:16:31,689 four. The exponent four turns into a three 1078 01:16:31,689 --> 01:16:36,000 because of the division by n, and we get n^3.
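Chained together, the induction step just carried out reads:

$$
\mathrm{E}[Y_n]
\;\le\; \frac{4}{n}\sum_{k=0}^{n-1} \mathrm{E}[Y_k]
\;\le\; \frac{4}{n}\sum_{k=0}^{n-1} c\,k^{3}
\;\le\; \frac{4c}{n}\int_{0}^{n} x^{3}\,dx
\;=\; \frac{4c}{n}\cdot\frac{n^{4}}{4}
\;=\; c\,n^{3} .
$$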
1081 01:16:41,089 --> 01:16:44,404 OK, so this proof is just barely sneaking by: 1082 01:16:44,404 --> 01:16:48,028 no residual term. We've been sloppy all over the 1083 01:16:48,028 --> 01:16:50,727 place, and yet we were really lucky. 1084 01:16:50,727 --> 01:16:54,120 And, we were just sloppy in the right places. 1085 01:16:54,120 --> 01:16:56,510 So, this is a very tricky proof. 1086 01:16:56,510 --> 01:17:01,214 If you just tried to do it by hand, it's pretty easy to be too 1087 01:17:01,214 --> 01:17:04,452 sloppy, and not get quite the right answer. 1088 01:17:04,452 --> 01:17:09,869 But, this just barely works. So, let me say a couple of 1089 01:17:09,869 --> 01:17:12,890 things about it in my remaining one minute. 1090 01:17:12,890 --> 01:17:15,407 So, we can do the conclusion, again. 1091 01:17:15,407 --> 01:17:18,428 I won't write it because I don't have time, 1092 01:17:18,428 --> 01:17:21,664 but here it is. We just proved a bound on Y_n, 1093 01:17:21,664 --> 01:17:25,907 which was two to the power X_n. What we cared about was X_n. 1094 01:17:25,907 --> 01:17:29,000 So, we used Jensen's inequality. 1095 01:17:29,000 --> 01:17:32,350 We get that two to the E of X_n is, at most, E of two to the 1096 01:17:32,350 --> 01:17:34,083 X_n. This is what we know about, 1097 01:17:34,083 --> 01:17:36,740 because that's Y_n. So, we know E of Y_n is now 1098 01:17:36,740 --> 01:17:39,108 order n^3. OK, we had to set this constant 1099 01:17:39,108 --> 01:17:41,187 sufficiently large for the base case. 1100 01:17:41,187 --> 01:17:44,306 We didn't really figure out what the constant was here. 1101 01:17:44,306 --> 01:17:47,599 It didn't matter, because now we're taking the logs of both 1102 01:17:47,599 --> 01:17:49,043 sides. We get E of X_n is, 1103 01:17:49,043 --> 01:17:51,584 at most, log of order n^3. This constant is a 1104 01:17:51,584 --> 01:17:54,241 multiplicative constant. So, you take the logs. 1105 01:17:54,241 --> 01:17:57,072 It becomes additive. This constant is an exponent. 1106 01:17:57,072 --> 01:18:01,000 So, when you take logs, it becomes a multiplier. 1107 01:18:01,000 --> 01:18:07,361 Three log n plus order one. This is a pretty damn tight 1108 01:18:07,361 --> 01:18:13,486 bound on the height of a randomly built binary search 1109 01:18:13,486 --> 01:18:18,081 tree, the expected height, I should say. 1110 01:18:18,081 --> 01:18:23,617 In fact, the expected value of X_n is equal to, 1111 01:18:23,617 --> 01:18:28,447 well, roughly, I'll just say it's roughly, 1112 01:18:28,447 --> 01:18:34,925 I don't want to be too precise here, 2.9882 times log n. 1113 01:18:34,925 --> 01:18:40,934 This is a result by a friend of mine, Luc Devroye, 1114 01:18:40,934 --> 01:18:46,000 if I spell it right, in 1986. 1115 01:18:46,000 --> 01:18:49,572 He's a professor at McGill University in Montreal. 1116 01:18:49,572 --> 01:18:52,270 So, we're pretty close, three to 2.98. 1117 01:18:52,270 --> 01:18:56,572 And, I won't prove this here. The hard part here is actually 1118 01:18:56,572 --> 01:19:00,000 the lower bound, but it's only off by that much. 1119 01:19:00,000 --> 01:19:04,273 I should say a little bit more about why we use Y_n instead of 1120 01:19:04,273 --> 01:19:06,166 X_n. And, it's all about the 1121 01:19:06,166 --> 01:19:08,268 sloppiness. And, in particular, 1122 01:19:08,268 --> 01:19:12,193 this step, where we said that the max of these two random 1123 01:19:12,193 --> 01:19:14,295 variables is, at most, the sum.
1124 01:19:14,295 --> 01:19:18,359 And, while that's true for X just as well as it is true for 1125 01:19:18,359 --> 01:19:21,653 Y, it's more true for Y. OK, this is a bit weird 1126 01:19:21,653 --> 01:19:24,876 because, remember, what we're analyzing here is 1127 01:19:24,876 --> 01:19:28,800 all possible values of k. This has to work no matter what 1128 01:19:28,800 --> 01:19:32,234 k is, in some sense. I mean, we're bounding all of 1129 01:19:32,234 --> 01:19:37,000 those cases simultaneously, the sum of them all. 1130 01:19:37,000 --> 01:19:41,576 So, here we're looking at k minus one versus n minus k. 1131 01:19:41,576 --> 01:19:44,881 And, in fact, here, there's a polynomial 1132 01:19:44,881 --> 01:19:48,186 version. But, so, if you take two values, 1133 01:19:48,186 --> 01:19:51,576 a and b, and you say, well, the max of a, b is, 1134 01:19:51,576 --> 01:19:55,728 at most, a plus b. And, on the other hand you say, 1135 01:19:55,728 --> 01:19:59,541 well, the max of two to the a and two to the b is, 1136 01:19:59,541 --> 01:20:02,847 at most, two to the a plus two to the b. 1137 01:20:02,847 --> 01:20:07,000 Doesn't this feel better than that? 1138 01:20:07,000 --> 01:20:09,820 Well, they are, of course, the same. 1139 01:20:09,820 --> 01:20:13,367 But, if you look at a minus b, as that grows, 1140 01:20:13,367 --> 01:20:17,719 this becomes a tighter bound faster than this becomes a 1141 01:20:17,719 --> 01:20:22,716 tighter bound, because here we're looking at the absolute difference 1142 01:20:22,716 --> 01:20:26,504 between a and b. So, that's why this is pretty 1143 01:20:26,504 --> 01:20:31,259 good and this is pretty bad. We're still really bad if a and 1144 01:20:31,259 --> 01:20:35,812 b are almost the same. But, we're trying to solve this 1145 01:20:35,812 --> 01:20:38,677 for all partitions into k minus one and n minus k. 1146 01:20:38,677 --> 01:20:42,127 So, it's OK if we get a few of the cases wrong in the middle, 1147 01:20:42,127 --> 01:20:45,284 where it evenly partitions. But, as soon as we get some 1148 01:20:45,284 --> 01:20:49,026 skew, this will be very close to this, whereas this will still be 1149 01:20:49,026 --> 01:20:52,066 pretty far from this. You have to get pretty close to 1150 01:20:52,066 --> 01:20:54,580 the edge before you're not losing much here, 1151 01:20:54,580 --> 01:20:57,504 whereas pretty quickly you're not losing much here. 1152 01:20:57,504 --> 01:21:00,368 That's the intuition. Try it, and see what happens 1153 01:21:00,368 --> 01:21:03,000 with X_n, and it won't work. See you Wednesday.
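A concrete instance of the max-versus-sum comparison above, with numbers of my own choosing rather than from the lecture: take a = 10 and b = 5. Then

$$
\max(a,b) = 10 \quad\text{vs.}\quad a + b = 15 \;(\text{50\% too big}),
\qquad
\max(2^{a},2^{b}) = 1024 \quad\text{vs.}\quad 2^{a}+2^{b} = 1056 \;(\text{about 3\% too big}).
$$

The more skewed the pair, the closer the sum of the exponentials hugs the max, which is exactly the slack the Y_n analysis relies on.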