1 00:00:00,060 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,730 continue to offer high quality educational resources for free. 5 00:00:10,730 --> 00:00:13,340 To make a donation or view additional materials 6 00:00:13,340 --> 00:00:17,236 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,236 --> 00:00:17,861 at ocw.mit.edu. 8 00:00:21,500 --> 00:00:23,770 LING REN: So today, we're going to analyze 9 00:00:23,770 --> 00:00:29,000 two random algorithms we've seen in the last lecture, 10 00:00:29,000 --> 00:00:31,720 randomized median and randomized quick sort. 11 00:00:31,720 --> 00:00:35,920 But before I go into that, I'd like 12 00:00:35,920 --> 00:00:38,990 to make a correction in problem set two. 13 00:00:38,990 --> 00:00:41,490 I think you all got feedback from that. 14 00:00:41,490 --> 00:00:43,770 So it's the second problem, where you're 15 00:00:43,770 --> 00:00:45,175 asked to combine B-trees. 16 00:00:51,810 --> 00:00:56,620 You have two B-trees, T1 and T2 each with some children. 17 00:00:56,620 --> 00:00:59,050 And we're giving you another element, k. 18 00:00:59,050 --> 00:01:01,050 And we ask you to combine them. 19 00:01:01,050 --> 00:01:08,110 So if the two trees are the same height, how do you do that? 20 00:01:08,110 --> 00:01:10,130 Does anyone want to share his or her answer? 21 00:01:13,130 --> 00:01:13,630 Go ahead. 22 00:01:13,630 --> 00:01:17,630 AUDIENCE: You put k in the middle of T1 and T2. 23 00:01:17,630 --> 00:01:19,770 And then you can split. 24 00:01:19,770 --> 00:01:20,770 LING REN: Yeah, exactly. 25 00:01:20,770 --> 00:01:22,780 So you put k here. 26 00:01:22,780 --> 00:01:25,190 And if this is too full, you split. 27 00:01:25,190 --> 00:01:28,250 So why do I have to do this? 28 00:01:28,250 --> 00:01:35,000 Can I simply just make k my new root and do this? 29 00:01:38,340 --> 00:01:38,840 Go ahead. 30 00:01:38,840 --> 00:01:40,440 AUDIENCE: Because T1 and T2 don't 31 00:01:40,440 --> 00:01:42,581 have to be valid like internal [? modular ?] nodes. 32 00:01:42,581 --> 00:01:43,580 LING REN: Yeah, exactly. 33 00:01:43,580 --> 00:01:46,040 So for B-tree, the requirement on the root 34 00:01:46,040 --> 00:01:48,157 is slightly different from the requirements 35 00:01:48,157 --> 00:01:49,240 for the rest of the nodes. 36 00:01:49,240 --> 00:01:51,360 To this may have too few children. 37 00:01:51,360 --> 00:01:55,050 It is not a valid node. 38 00:01:55,050 --> 00:01:56,560 It's not a valid internal node. 39 00:01:56,560 --> 00:02:01,770 But our solution actually made a mistake in the second part. 40 00:02:01,770 --> 00:02:06,480 In the second part, we're saying so T1 and T2, 41 00:02:06,480 --> 00:02:09,600 their heights are different by 1. 42 00:02:09,600 --> 00:02:14,170 Our solution says put k here and make a pointer. 43 00:02:14,170 --> 00:02:16,580 Exactly the same problem. 44 00:02:16,580 --> 00:02:18,350 Does everyone see that? 45 00:02:18,350 --> 00:02:21,250 So this may not be a valid internal node. 46 00:02:21,250 --> 00:02:24,240 So what's the correct solution? 47 00:02:24,240 --> 00:02:29,950 You put k here and combine it with the last child of T1. 48 00:02:29,950 --> 00:02:32,540 And then you may have to split and split again. 49 00:02:32,540 --> 00:02:33,389 OK. 50 00:02:33,389 --> 00:02:34,430 Everyone happy with that? 51 00:02:39,210 --> 00:02:46,200 Now, today, we're going to look into randomization specifically 52 00:02:46,200 --> 00:02:47,905 we have seen two algorithms in class. 53 00:02:50,560 --> 00:02:55,650 I'll just call them quick find and quick sort. 54 00:03:03,090 --> 00:03:06,310 So quick find is a slightly generalized version 55 00:03:06,310 --> 00:03:08,570 of medium finding. 56 00:03:08,570 --> 00:03:12,060 In the very first lecture and recitation, 57 00:03:12,060 --> 00:03:15,990 we have seen a non-randomized version of quick sort. 58 00:03:15,990 --> 00:03:19,090 So we divide them into groups of five. 59 00:03:26,150 --> 00:03:28,540 And we find a medium of each group. 60 00:03:28,540 --> 00:03:31,650 And then find the median of the median. 61 00:03:31,650 --> 00:03:34,860 Depending on whether you are smaller or larger, 62 00:03:34,860 --> 00:03:38,070 we drew a funny subproblem like this. 63 00:03:38,070 --> 00:03:39,470 Anyone remember that? 64 00:03:39,470 --> 00:03:41,870 And we analyzed this runtime, where 65 00:03:41,870 --> 00:03:44,890 our recursion was one fifth plus 7 over 10 66 00:03:44,890 --> 00:03:46,080 or something like that. 67 00:03:46,080 --> 00:03:48,840 And we show it's the worst case O of n. 68 00:03:48,840 --> 00:03:51,200 So that's a smart algorithm, I'll give you that. 69 00:03:51,200 --> 00:03:54,010 But it's just complicated. 70 00:03:54,010 --> 00:03:59,340 You have to divide them into groups 71 00:03:59,340 --> 00:04:03,160 and do, well, several recursive course. 72 00:04:03,160 --> 00:04:06,010 And also, let me digress a little bit, 73 00:04:06,010 --> 00:04:10,210 there's a very interesting point regarding this worst case 74 00:04:10,210 --> 00:04:11,920 O of n algorithm. 75 00:04:11,920 --> 00:04:15,839 Has anyone wondered why we use groups of five? 76 00:04:15,839 --> 00:04:16,839 Why not groups of three? 77 00:04:23,330 --> 00:04:26,390 Algorithms should work in the same way, right? 78 00:04:26,390 --> 00:04:30,070 If we take out this first row and this last row, 79 00:04:30,070 --> 00:04:32,046 we can still find the median, which is just 80 00:04:32,046 --> 00:04:33,670 a second element in each group and find 81 00:04:33,670 --> 00:04:34,800 the median of the median. 82 00:04:34,800 --> 00:04:39,390 We still have a subproblem that looks like this. 83 00:04:39,390 --> 00:04:40,500 Exercise. 84 00:04:40,500 --> 00:04:43,300 And it turns out if we use groups of three, 85 00:04:43,300 --> 00:04:46,280 when we solve the recursion, it doesn't solve to O of n. 86 00:04:46,280 --> 00:04:48,700 It solves to something else. 87 00:04:48,700 --> 00:04:49,350 OK. 88 00:04:49,350 --> 00:04:51,770 Now, end of digression. 89 00:04:51,770 --> 00:04:56,180 Let's get back to the randomized version. 90 00:04:56,180 --> 00:04:58,110 So how does the randomized version work? 91 00:04:58,110 --> 00:04:59,440 It's that much simpler. 92 00:04:59,440 --> 00:05:02,130 We have an array. 93 00:05:02,130 --> 00:05:11,070 Let me call it Find in array A, which is of size n. 94 00:05:11,070 --> 00:05:15,650 And we want to find i-th largest or smallest element in it. 95 00:05:19,940 --> 00:05:22,560 So what we're going to do is that we'll 96 00:05:22,560 --> 00:05:28,530 pick a random element, x, in this array. 97 00:05:28,530 --> 00:05:37,270 And then we'll put all the smaller elements on this side 98 00:05:37,270 --> 00:05:39,050 and all the larger elements on that side. 99 00:05:44,730 --> 00:05:48,160 Now, because I'm picking the random one, 100 00:05:48,160 --> 00:05:50,440 so this x can be anywhere. 101 00:05:50,440 --> 00:05:58,396 If it is the k-th smallest element from the left, then 102 00:05:58,396 --> 00:05:59,145 what do I do next? 103 00:06:05,240 --> 00:06:08,210 My goal is to find out the i-th smallest element 104 00:06:08,210 --> 00:06:18,466 in this array, A. 105 00:06:18,466 --> 00:06:23,330 AUDIENCE: If A is longer than i then [INAUDIBLE]. 106 00:06:23,330 --> 00:06:24,484 LING REN: OK. 107 00:06:24,484 --> 00:06:34,220 If i is less than k, then my element is on this side, right? 108 00:06:34,220 --> 00:06:36,971 So I should find-- OK, let me define 109 00:06:36,971 --> 00:06:39,220 this to be the left array, this to be the right array. 110 00:06:39,220 --> 00:06:43,580 I should find in the left array, what is its size? 111 00:06:43,580 --> 00:06:47,190 It's k minus 1. 112 00:06:47,190 --> 00:06:47,730 Make sense? 113 00:06:47,730 --> 00:06:48,730 This is k minus 1. 114 00:06:48,730 --> 00:06:53,376 And that is n minus k plus one element in the middle. 115 00:06:53,376 --> 00:07:03,628 And so what's the last argument in that function called? 116 00:07:03,628 --> 00:07:05,480 It's i. 117 00:07:05,480 --> 00:07:10,250 So on the other hand, if i is greater than k, 118 00:07:10,250 --> 00:07:13,100 then I should go to my right half. 119 00:07:13,100 --> 00:07:17,670 Its problem size is n minus k. 120 00:07:17,670 --> 00:07:20,740 So what's the last argument? 121 00:07:20,740 --> 00:07:22,210 AUDIENCE: i minus k. 122 00:07:28,640 --> 00:07:30,630 LING REN: Agree? 123 00:07:30,630 --> 00:07:31,770 i minus k. 124 00:07:34,990 --> 00:07:35,590 OK. 125 00:07:35,590 --> 00:07:39,462 So, of course, if i is equal than k, then we just return x. 126 00:07:43,020 --> 00:07:49,710 Now, obviously, this algorithm's runtime depends on our luck, 127 00:07:49,710 --> 00:07:51,940 depends on this choice of k. 128 00:07:51,940 --> 00:07:56,120 If k is roughly in the middle, then we reduce the problem size 129 00:07:56,120 --> 00:07:58,640 by roughly half. 130 00:07:58,640 --> 00:08:01,970 However, if k is 0 or close to n, 131 00:08:01,970 --> 00:08:05,470 then we only reduce the problem size by a little bit. 132 00:08:05,470 --> 00:08:10,610 So it's impossible to give a definite runtime. 133 00:08:10,610 --> 00:08:14,340 So what we opt to do in randomized algorithm 134 00:08:14,340 --> 00:08:17,820 is that we analyze expected runtime. 135 00:08:23,737 --> 00:08:24,570 What does that mean? 136 00:08:27,840 --> 00:08:30,680 So we can write the recursion of this. 137 00:08:30,680 --> 00:08:39,320 It's T of n equals-- I have two subproblems. 138 00:08:39,320 --> 00:08:42,409 One of them is T of k minus 1. 139 00:08:42,409 --> 00:08:47,286 The other is T of n minus k. 140 00:08:47,286 --> 00:08:49,160 So which one should I put into the recursion? 141 00:08:55,653 --> 00:08:56,635 AUDIENCE: [INAUDIBLE]. 142 00:08:56,635 --> 00:08:57,260 LING REN: Yeah. 143 00:08:57,260 --> 00:08:58,750 I don't know, right? 144 00:08:58,750 --> 00:09:01,940 I don't know whether my element is on a left or on the right. 145 00:09:01,940 --> 00:09:04,570 So I'll be conservative and take a max of these two. 146 00:09:09,770 --> 00:09:11,329 Let me write it down a little bit. 147 00:09:25,320 --> 00:09:28,740 And I have some amount of work to do 148 00:09:28,740 --> 00:09:30,000 before I go to my subproblem. 149 00:09:33,180 --> 00:09:38,720 What's the complexity of that work? 150 00:09:38,720 --> 00:09:39,220 Go ahead. 151 00:09:39,220 --> 00:09:40,220 AUDIENCE: [INAUDIBLE]. 152 00:09:40,220 --> 00:09:43,140 LING REN: It's all O of n. 153 00:09:43,140 --> 00:09:44,240 O of theta n. 154 00:09:44,240 --> 00:09:46,064 Why is that? 155 00:09:46,064 --> 00:09:49,497 AUDIENCE: Because you have to create the array. 156 00:09:49,497 --> 00:09:50,080 LING REN: Yes. 157 00:09:50,080 --> 00:09:52,130 Because you have to scan the array once to 158 00:09:52,130 --> 00:09:54,720 put the smaller elements on one side and the larger elements 159 00:09:54,720 --> 00:09:55,940 on the other side. 160 00:09:55,940 --> 00:09:59,950 Now, this recurrence is impossible to solve, 161 00:09:59,950 --> 00:10:01,620 because I don't know what k is. 162 00:10:01,620 --> 00:10:04,735 So instead, we'll just calculate its expectation. 163 00:10:19,100 --> 00:10:24,960 So the expectation of T of n is taking average 164 00:10:24,960 --> 00:10:29,410 over all the randomness, which means the choice of k. 165 00:10:29,410 --> 00:10:39,290 So there is a probability that my k is equal to j. 166 00:10:42,070 --> 00:10:45,380 If my k is equal to j, then I should 167 00:10:45,380 --> 00:10:48,760 take the maximum of-- sorry. 168 00:11:07,480 --> 00:11:10,690 If my k is equal to j, then I should 169 00:11:10,690 --> 00:11:14,120 take the expectation of the maximum between those two. 170 00:11:18,100 --> 00:11:21,850 And according to the definition of expectation, 171 00:11:21,850 --> 00:11:27,660 I should do a sum from j equals 0-- no, not zero. 172 00:11:27,660 --> 00:11:35,796 I'm starting with 1 all the way to n, right? 173 00:11:35,796 --> 00:11:36,670 Any questions so far? 174 00:11:43,940 --> 00:11:53,470 Now, obviously, depending on my choice of j, 175 00:11:53,470 --> 00:11:54,960 sometimes this one is larger. 176 00:11:54,960 --> 00:11:58,250 Sometimes that one is larger. 177 00:11:58,250 --> 00:12:00,580 I'll just write it a little verbosely. 178 00:12:03,430 --> 00:12:10,790 So if my j is 1, then I should take the right one. 179 00:12:15,060 --> 00:12:19,040 Plus, if j is 2, then I should take m minus 2, 180 00:12:19,040 --> 00:12:20,250 so on and so forth, right? 181 00:12:20,250 --> 00:12:22,800 I think everyone gets this, right? 182 00:12:22,800 --> 00:12:24,850 So I'll just directly jump to the next step. 183 00:12:43,660 --> 00:12:46,970 So when j is smaller than half of n, 184 00:12:46,970 --> 00:12:48,640 I will take the right one. 185 00:12:48,640 --> 00:12:53,720 If j is larger than half of n, I will take left one. 186 00:12:53,720 --> 00:12:57,910 And they happen to be symmetric. 187 00:12:57,910 --> 00:13:06,430 And what I have here is-- everyone gets that? 188 00:13:10,270 --> 00:13:14,020 Now, maybe I'm missing the minus 1 here. 189 00:13:14,020 --> 00:13:18,560 OK, I shouldn't sum to there. 190 00:13:18,560 --> 00:13:19,060 Sorry. 191 00:13:25,310 --> 00:13:28,500 I have n minus 1, n, minus 2, n minus 3, all the way 192 00:13:28,500 --> 00:13:29,982 to half of n. 193 00:13:29,982 --> 00:13:32,430 But from there, I know longer go down. 194 00:13:32,430 --> 00:13:36,030 I go backwards, half of n plus 1 plus 2 plus 3, 195 00:13:36,030 --> 00:13:40,925 all the way back to n minus 1. 196 00:13:40,925 --> 00:13:42,015 Any questions so far? 197 00:13:46,750 --> 00:13:51,076 Oh, we forgot our last term, which is a theta n. 198 00:13:56,532 --> 00:13:57,960 Now, how do we solve this? 199 00:14:03,130 --> 00:14:04,810 Any thoughts? 200 00:14:09,090 --> 00:14:14,940 This is a recurrence on the expectation of T. 201 00:14:14,940 --> 00:14:19,140 So for this type of general recurrence, 202 00:14:19,140 --> 00:14:21,480 we don't have a very good way. 203 00:14:21,480 --> 00:14:25,300 Instead, what we'll do is just take a random guess, 204 00:14:25,300 --> 00:14:27,920 and then see if it is correct. 205 00:14:30,840 --> 00:14:32,730 So I don't really need to guess in this case, 206 00:14:32,730 --> 00:14:34,370 because I know it's O of n. 207 00:14:34,370 --> 00:14:40,650 So let's just assume our expectation of Tn 208 00:14:40,650 --> 00:14:44,492 is theta of n. 209 00:14:44,492 --> 00:14:45,575 What does that mean again? 210 00:14:52,670 --> 00:14:59,240 It means I can find some constant, such 211 00:14:59,240 --> 00:15:07,770 that this expectation is bounded by a constant times n. 212 00:15:07,770 --> 00:15:10,590 So far so good? 213 00:15:10,590 --> 00:15:13,180 Now, we can use induction, assume 214 00:15:13,180 --> 00:15:17,280 that this holds for everything up to n minus 1. 215 00:15:17,280 --> 00:15:22,764 And we're going to show this also holds for n. 216 00:15:22,764 --> 00:15:23,722 Then we're done, right? 217 00:15:26,320 --> 00:15:29,160 Now, we'll just plug that in. 218 00:15:29,160 --> 00:15:38,840 The expectation of T n will be less or equal than this sum 219 00:15:38,840 --> 00:15:41,655 from half of n to n. 220 00:15:49,520 --> 00:15:50,020 Right? 221 00:15:52,710 --> 00:15:54,680 Yeah, I just plugged that in. 222 00:15:54,680 --> 00:15:57,880 Of course, plus our theta n term. 223 00:15:57,880 --> 00:16:02,650 Now, what's the sum of this guy? 224 00:16:05,750 --> 00:16:08,470 Any guesses? 225 00:16:08,470 --> 00:16:10,400 n square? n cube? 226 00:16:10,400 --> 00:16:12,510 or n? 227 00:16:12,510 --> 00:16:13,010 OK. 228 00:16:13,010 --> 00:16:14,090 It's probably a messy. 229 00:16:16,650 --> 00:16:18,685 More cleanly, I can pull this B out. 230 00:16:21,480 --> 00:16:31,850 It's just the sigma of j if I change my sum, 231 00:16:31,850 --> 00:16:35,500 decrease the range of sum by 1. 232 00:16:35,500 --> 00:16:37,900 What is the sigma some of j? 233 00:16:43,110 --> 00:16:45,908 What order first? 234 00:16:45,908 --> 00:16:47,280 AUDIENCE: n square. 235 00:16:47,280 --> 00:16:48,224 LING REN: n square. 236 00:16:48,224 --> 00:16:50,040 OK. 237 00:16:50,040 --> 00:16:52,200 Yeah, definitely n square. 238 00:16:52,200 --> 00:16:57,500 But we need to be a little bit more precise than that. 239 00:16:57,500 --> 00:16:59,740 So what's the coefficient before the n square? 240 00:17:08,970 --> 00:17:13,079 So I claim this coefficient is 3 over 8. 241 00:17:16,670 --> 00:17:18,263 Can anyone see that? 242 00:17:18,263 --> 00:17:21,331 AUDIENCE: Why did you assume that the expected value is 243 00:17:21,331 --> 00:17:22,520 theta n. 244 00:17:22,520 --> 00:17:24,319 LING REN: Oh, that's just a guess. 245 00:17:24,319 --> 00:17:26,849 If it's wrong, we'll have to assume something else, which 246 00:17:26,849 --> 00:17:28,098 we'll see in the next example. 247 00:17:30,700 --> 00:17:31,450 But good question. 248 00:17:34,240 --> 00:17:34,850 OK. 249 00:17:34,850 --> 00:17:35,350 Yeah. 250 00:17:35,350 --> 00:17:36,750 Let me ask the question again. 251 00:17:36,750 --> 00:17:41,940 I claim this sum is roughly 3 over 8 n square. 252 00:17:44,880 --> 00:17:45,880 Can anyone see that? 253 00:17:51,261 --> 00:17:51,760 Any ideas? 254 00:17:56,690 --> 00:17:59,400 So I don't know how to calculate this term. 255 00:17:59,400 --> 00:18:05,670 But I do know how to calculate sigma from 1 to n, right? 256 00:18:05,670 --> 00:18:08,250 This is easy. 257 00:18:08,250 --> 00:18:08,750 What's that? 258 00:18:12,890 --> 00:18:17,170 It's half of n, n minus 1. 259 00:18:17,170 --> 00:18:23,910 So it's roughly half of n squared. 260 00:18:23,910 --> 00:18:27,070 Now, this term is the sum of this 261 00:18:27,070 --> 00:18:29,050 minus the sum to half of n. 262 00:18:40,180 --> 00:18:43,390 So it's roughly half of n squared minus one 263 00:18:43,390 --> 00:18:48,200 half of one half n squared. 264 00:18:48,200 --> 00:18:50,280 Makes sense? 265 00:18:50,280 --> 00:19:00,075 So this is roughly 3 over 8 n squared plus an order n 266 00:19:00,075 --> 00:19:02,125 term or less, or constant. 267 00:19:05,420 --> 00:19:07,006 Any questions? 268 00:19:07,006 --> 00:19:07,950 Does that makes sense? 269 00:19:10,530 --> 00:19:14,280 Then it's very easy if we just plug this in. 270 00:19:21,480 --> 00:19:22,620 Sorry. 271 00:19:22,620 --> 00:19:25,000 There is a mistake. 272 00:19:25,000 --> 00:19:26,670 I just realized. 273 00:19:26,670 --> 00:19:30,645 Can anyone point that out? 274 00:19:36,790 --> 00:19:39,515 So how many terms do I have in total? 275 00:19:42,700 --> 00:19:46,720 One from n, I have [? a ?] term. 276 00:19:46,720 --> 00:19:48,970 So each term should appear twice. 277 00:19:52,028 --> 00:19:53,840 Correct? 278 00:19:53,840 --> 00:19:55,780 So I should have a two here. 279 00:19:59,690 --> 00:20:03,810 And so I somehow just throw away this probability. 280 00:20:03,810 --> 00:20:09,797 But this probability is 1 over n. 281 00:20:09,797 --> 00:20:11,380 Because I'm choosing a random element, 282 00:20:11,380 --> 00:20:15,020 there is 1 over n probability that is equal to 1, 283 00:20:15,020 --> 00:20:16,030 equal to 2, 3, 4. 284 00:20:19,130 --> 00:20:21,460 Every of this is 1 over n. 285 00:20:27,330 --> 00:20:27,830 Correct? 286 00:20:34,210 --> 00:20:39,270 So I should have 2 over n here, 2 over here. 287 00:20:42,060 --> 00:20:52,080 And if we plug this in, it's 3 over 8 n cubed plus a theta n. 288 00:20:52,080 --> 00:20:59,260 Our goal is to show this is less than B times n, which 289 00:20:59,260 --> 00:21:01,910 is clearly true. 290 00:21:01,910 --> 00:21:09,030 Because this is 3/4 of n, 3/4 of B times n, plus another term. 291 00:21:09,030 --> 00:21:12,120 We can say this is another constant D times n. 292 00:21:12,120 --> 00:21:17,307 And if we choose B accordingly, this can hold. 293 00:21:17,307 --> 00:21:17,890 Any questions? 294 00:21:35,680 --> 00:21:39,930 You look confused or too easy. 295 00:21:39,930 --> 00:21:40,430 OK. 296 00:21:40,430 --> 00:21:42,080 Our guess is the latter. 297 00:21:44,990 --> 00:21:46,085 Oh, it is not? 298 00:21:54,852 --> 00:21:55,860 OK? 299 00:21:55,860 --> 00:22:00,040 So then we have solved this expected runtime of quick find. 300 00:22:02,550 --> 00:22:03,850 Now, let's look at quick sort. 301 00:22:06,410 --> 00:22:09,080 Quick sort is very similar. 302 00:22:09,080 --> 00:22:12,294 The only difference is that once I 303 00:22:12,294 --> 00:22:13,960 put all the smaller elements on one side 304 00:22:13,960 --> 00:22:17,270 and the larger elements on the other side, 305 00:22:17,270 --> 00:22:19,980 instead of going into one of them, I have to sort of both. 306 00:22:23,730 --> 00:22:29,360 So the only change is that instead of taking the max here, 307 00:22:29,360 --> 00:22:31,560 I need to add them. 308 00:22:35,780 --> 00:22:36,280 Correct? 309 00:22:39,010 --> 00:22:42,300 So now the same thing here. 310 00:22:42,300 --> 00:22:45,770 Instead of taking the max, I should add them up. 311 00:22:50,500 --> 00:22:56,050 Now when it propagates here, so every term appears 312 00:22:56,050 --> 00:23:05,489 twice all the way from j equals 1 to j equals n. 313 00:23:05,489 --> 00:23:06,405 Is everyone following? 314 00:23:17,101 --> 00:23:18,372 AUDIENCE: Can you repeat that? 315 00:23:18,372 --> 00:23:19,080 LING REN: Hm-hmm? 316 00:23:19,080 --> 00:23:20,360 AUDIENCE: Can you repeat that? 317 00:23:20,360 --> 00:23:20,901 LING REN: OK. 318 00:23:20,901 --> 00:23:21,850 Sure. 319 00:23:21,850 --> 00:23:25,880 Originally, we have a max here. 320 00:23:25,880 --> 00:23:28,370 So first, did everyone get this part? 321 00:23:28,370 --> 00:23:31,400 We have a plus instead of a max here. 322 00:23:31,400 --> 00:23:34,830 We have to solve both the problems. 323 00:23:34,830 --> 00:23:39,650 Now, if it's a max, then what I have is n minus 1, n minus 2, 324 00:23:39,650 --> 00:23:41,730 all the way to half of n, and then half of n, 325 00:23:41,730 --> 00:23:46,120 half of n plus 1, all the way back to n minus 1, right? 326 00:23:46,120 --> 00:23:51,060 And if I have a sum, then what I have is for j equals 1's case, 327 00:23:51,060 --> 00:23:56,970 I have T of 0 and T of n and minus 1. 328 00:23:56,970 --> 00:23:58,130 This is j equals 1. 329 00:23:58,130 --> 00:24:05,520 If j equals 2, I have T of 1 and T of n minus 2. 330 00:24:05,520 --> 00:24:09,540 As j increases, this one goes from 0 to n. 331 00:24:09,540 --> 00:24:13,240 And this one goes from n minus 1 to 0. 332 00:24:13,240 --> 00:24:15,044 I'm going to sum all of them up. 333 00:24:23,450 --> 00:24:26,064 Does that answer your question? 334 00:24:26,064 --> 00:24:27,530 OK. 335 00:24:27,530 --> 00:24:31,600 So instead of from half of n to n, we're summing from 1 to n. 336 00:24:35,610 --> 00:24:40,870 Now, we also have another good question here. 337 00:24:40,870 --> 00:24:43,870 Why do I guess it's theta n? 338 00:24:43,870 --> 00:24:45,180 Well, it's just a random guess. 339 00:24:45,180 --> 00:24:46,300 It could be wrong. 340 00:24:46,300 --> 00:24:49,340 For example, in this case, it's just incorrect. 341 00:24:49,340 --> 00:24:49,840 Why? 342 00:24:49,840 --> 00:24:57,160 Because every sum, the range becomes a 1 to n. 343 00:25:02,110 --> 00:25:06,120 Now what I have is no longer 3 over 8 over n. 344 00:25:06,120 --> 00:25:08,730 What do I have again? 345 00:25:08,730 --> 00:25:10,520 What do I have now if it's 1 over n? 346 00:25:14,146 --> 00:25:15,245 AUDIENCE: One half. 347 00:25:15,245 --> 00:25:15,870 LING REN: Yeah. 348 00:25:15,870 --> 00:25:18,036 It's one half of n. 349 00:25:18,036 --> 00:25:21,490 But if I change one half here, what I get 350 00:25:21,490 --> 00:25:25,530 is B times n plus D times n. 351 00:25:25,530 --> 00:25:26,930 I want to prove that it's smaller 352 00:25:26,930 --> 00:25:32,030 than B times n, which is clearly impossible no matter 353 00:25:32,030 --> 00:25:44,710 how you choose B. 354 00:25:44,710 --> 00:25:46,960 Did everyone get that? 355 00:25:46,960 --> 00:25:50,220 If we the same assumption, I make the hypothesis 356 00:25:50,220 --> 00:25:52,100 and we plug them in, we can no longer 357 00:25:52,100 --> 00:25:54,350 prove the induction step. 358 00:25:59,130 --> 00:25:59,630 OK. 359 00:25:59,630 --> 00:26:01,319 So what we do? 360 00:26:01,319 --> 00:26:02,235 We make another guess. 361 00:26:05,730 --> 00:26:07,315 So let me rewrite our recursion. 362 00:26:21,328 --> 00:26:22,715 So what's the next guess? 363 00:26:32,150 --> 00:26:32,650 Any guesses? 364 00:26:41,035 --> 00:26:42,410 How about we just guess n square? 365 00:26:48,020 --> 00:26:52,320 Anyone unhappy with that guess? 366 00:26:52,320 --> 00:26:53,540 So we can do the same thing. 367 00:26:53,540 --> 00:26:55,260 We can plug it in. 368 00:26:55,260 --> 00:26:59,030 And that will be a-- sorry, I missed another term. 369 00:26:59,030 --> 00:27:01,570 This is 1 over 2 here. 370 00:27:04,660 --> 00:27:06,600 If we make that guess, then what we 371 00:27:06,600 --> 00:27:14,450 have is the sum from 1 to n minus 1, 372 00:27:14,450 --> 00:27:18,004 the sum of j square plus a theta n. 373 00:27:21,880 --> 00:27:34,260 And the sum of j squared is roughly n cubed divided by 3. 374 00:27:39,550 --> 00:27:41,251 Is that obvious to everyone? 375 00:27:41,251 --> 00:27:41,750 Maybe not. 376 00:27:44,780 --> 00:27:46,100 OK. 377 00:27:46,100 --> 00:27:49,590 Can anyone explain this to us, why 378 00:27:49,590 --> 00:27:55,710 can I claim the sum of square term is n cubed over 3? 379 00:28:05,039 --> 00:28:07,494 Go ahead. 380 00:28:07,494 --> 00:28:11,904 AUDIENCE: It's n times n minus 1 times n minus 2 over 381 00:28:11,904 --> 00:28:12,404 [INAUDIBLE]. 382 00:28:14,945 --> 00:28:16,320 LING REN: I think you're correct. 383 00:28:16,320 --> 00:28:19,550 There is a formula, which is-- yeah, I don't remember exactly, 384 00:28:19,550 --> 00:28:25,150 but it's roughly this. 385 00:28:25,150 --> 00:28:27,310 If you know this, then you definitely see that. 386 00:28:27,310 --> 00:28:32,010 If you don't, we can turn this sum into an integral. 387 00:28:35,390 --> 00:28:41,830 And that is n cubed over 3. 388 00:28:41,830 --> 00:28:42,383 Make sense? 389 00:28:45,210 --> 00:28:49,790 If we plug that in, what do we have? 390 00:28:49,790 --> 00:28:56,370 2 over n 3n cubed plus theta n. 391 00:28:56,370 --> 00:28:58,290 Of course there's a B here. 392 00:28:58,290 --> 00:29:01,625 And we want to show this is less than B times n squared. 393 00:29:07,940 --> 00:29:09,566 Does it hold? 394 00:29:09,566 --> 00:29:10,065 Is it true? 395 00:29:17,690 --> 00:29:19,270 It's clearly true, right? 396 00:29:19,270 --> 00:29:22,597 So this-- no. 397 00:29:22,597 --> 00:29:23,180 This is cubed. 398 00:29:26,440 --> 00:29:26,940 Yeah. 399 00:29:26,940 --> 00:29:28,314 Sorry, I am making many mistakes. 400 00:29:28,314 --> 00:29:32,820 But that's actually good to catch your attention. 401 00:29:32,820 --> 00:29:35,280 But it actually worries me that you didn't point this out. 402 00:29:39,460 --> 00:29:42,890 This is 2/3 n squared times B. It's 403 00:29:42,890 --> 00:29:46,010 clearly less than B n square. 404 00:29:46,010 --> 00:29:46,510 OK. 405 00:29:46,510 --> 00:29:50,321 So we claimed the algorithm is n squared. 406 00:29:50,321 --> 00:29:50,820 Correct? 407 00:29:57,190 --> 00:29:58,170 Go ahead. 408 00:29:58,170 --> 00:29:58,660 AUDIENCE: Does that mean that you 409 00:29:58,660 --> 00:30:00,517 claim that it's less than n squared, being 410 00:30:00,517 --> 00:30:02,301 that it's n squared definitely? 411 00:30:02,301 --> 00:30:03,050 LING REN: Exactly. 412 00:30:03,050 --> 00:30:07,380 So what I've proved here is that the algorithm is definitely 413 00:30:07,380 --> 00:30:11,680 O of n square, but maybe less. 414 00:30:11,680 --> 00:30:14,590 And you can see we still have a lot of room here. 415 00:30:14,590 --> 00:30:17,300 This inequality is not very tight. 416 00:30:20,620 --> 00:30:26,050 So in fact, it's a very good question, 417 00:30:26,050 --> 00:30:27,570 how do we make that guess. 418 00:30:27,570 --> 00:30:31,440 So you already know the answer is n log n, right? 419 00:30:31,440 --> 00:30:32,460 So it's not interesting. 420 00:30:32,460 --> 00:30:36,100 But if you don't, then how do we go about and do things? 421 00:30:36,100 --> 00:30:38,870 We have to make these guesses. 422 00:30:38,870 --> 00:30:42,780 So how about we know this n doesn't hold and 2 is too much. 423 00:30:42,780 --> 00:30:46,220 Next guess is n raised to 1 plus epsilon. 424 00:30:49,930 --> 00:30:52,506 Then what will we have? 425 00:30:52,506 --> 00:30:57,420 If we carry out the same integral argument, 426 00:30:57,420 --> 00:31:05,630 we have 2 plus epsilon, n raised to 2 plus epsilon 427 00:31:05,630 --> 00:31:06,927 over 2 plus epsilon. 428 00:31:06,927 --> 00:31:08,270 Correct? 429 00:31:08,270 --> 00:31:20,600 And if we plug that in, we get this. 430 00:31:23,450 --> 00:31:28,060 And we want to show it's less than n raised 431 00:31:28,060 --> 00:31:30,880 to 1 plus epsilon. 432 00:31:30,880 --> 00:31:31,670 Does this hold? 433 00:31:37,200 --> 00:31:39,290 This term is less than 1. 434 00:31:39,290 --> 00:31:41,800 And that's n raised to 1 plus epsilon. 435 00:31:41,800 --> 00:31:44,030 So this is true. 436 00:31:44,030 --> 00:31:49,320 So we can easily prove it's, indeed, n raised to 1 437 00:31:49,320 --> 00:31:51,080 plus epsilon for any epsilon. 438 00:31:53,695 --> 00:31:54,195 Questions? 439 00:31:57,440 --> 00:31:58,750 But is it tight? 440 00:31:58,750 --> 00:32:01,370 We still don't know. 441 00:32:01,370 --> 00:32:02,730 So then what do we do? 442 00:32:02,730 --> 00:32:04,110 We just make another guess. 443 00:32:07,260 --> 00:32:13,328 And let's guess T of n is [INAUDIBLE] n log n. 444 00:32:18,750 --> 00:32:21,370 Definitely, you may run into two cases. 445 00:32:21,370 --> 00:32:24,110 You can either prove it or not. 446 00:32:24,110 --> 00:32:26,390 If this doesn't hold, you just go to log n square. 447 00:32:26,390 --> 00:32:28,550 And gradually, you will find the answer. 448 00:32:28,550 --> 00:32:30,150 Well, if you don't know the answer, 449 00:32:30,150 --> 00:32:32,480 probably this is how you do things. 450 00:32:32,480 --> 00:32:41,110 Now, if we guess this n log n, then I 451 00:32:41,110 --> 00:32:47,350 have a little harder equation here. 452 00:32:52,330 --> 00:32:53,850 Because it's now j log j. 453 00:33:04,634 --> 00:33:05,550 How do I compute that? 454 00:33:18,834 --> 00:33:19,340 Yeah. 455 00:33:19,340 --> 00:33:24,510 It's not the sum of natural number or squares of numbers, 456 00:33:24,510 --> 00:33:29,359 so you cannot use a formula like this. 457 00:33:29,359 --> 00:33:31,150 But we can still use the integral argument. 458 00:33:34,290 --> 00:33:37,420 I'm not going to do that, because that's 459 00:33:37,420 --> 00:33:42,880 what you should have learned in calculus or other math class. 460 00:33:42,880 --> 00:33:49,760 But it happens that this integral of j log j 461 00:33:49,760 --> 00:34:04,760 is roughly half of n squared log n minus some constant times log 462 00:34:04,760 --> 00:34:05,260 n. 463 00:34:05,260 --> 00:34:06,830 I think you can change this constant. 464 00:34:06,830 --> 00:34:11,389 But it's roughly smaller than that. 465 00:34:11,389 --> 00:34:45,520 If you plug that in, you get n over 2 one half times B 466 00:34:45,520 --> 00:34:47,080 plus theta n. 467 00:34:47,080 --> 00:34:52,020 And we want it to be smaller than B times n log 468 00:34:52,020 --> 00:34:56,850 n, which will be true. 469 00:35:05,480 --> 00:35:10,060 This is exactly that, but we are minus some term. 470 00:35:10,060 --> 00:35:12,162 And the term we are extracting is 471 00:35:12,162 --> 00:35:13,370 larger than the theta n term. 472 00:35:15,920 --> 00:35:19,430 So we can prove this algorithm is n log n. 473 00:35:23,460 --> 00:35:25,840 But you can ask the same question, 474 00:35:25,840 --> 00:35:27,260 how do I know it's n log n? 475 00:35:27,260 --> 00:35:29,530 Or if I don't know it's n log n, maybe I 476 00:35:29,530 --> 00:35:31,320 should go about and try log log n. 477 00:35:36,290 --> 00:35:37,740 So you're welcome to try. 478 00:35:37,740 --> 00:35:40,120 And it's actually a very good thought. 479 00:35:40,120 --> 00:35:43,099 Because I think is very uninteresting 480 00:35:43,099 --> 00:35:44,390 if you already know the answer. 481 00:35:44,390 --> 00:35:46,736 If you don't know, you have to try that. 482 00:35:46,736 --> 00:35:49,120 But when do you stop? 483 00:35:49,120 --> 00:35:51,490 At a reasonable point, you can also 484 00:35:51,490 --> 00:35:54,910 prove the other way that it's runtime is 485 00:35:54,910 --> 00:35:56,930 larger than something. 486 00:35:56,930 --> 00:36:00,580 Here, if we prove this big O of n log n, 487 00:36:00,580 --> 00:36:03,974 if you can show the other way that it's omega n log n, 488 00:36:03,974 --> 00:36:06,950 then you know you have arrived at a final answer. 489 00:36:16,870 --> 00:36:18,720 That's the math part. 490 00:36:18,720 --> 00:36:20,065 Any questions about that? 491 00:36:23,530 --> 00:36:24,030 Yeah. 492 00:36:24,030 --> 00:36:26,113 Any questions about everything I have said so far? 493 00:36:35,430 --> 00:36:39,750 If not, so lastly, I just have a few comments, 494 00:36:39,750 --> 00:36:40,860 several terminology. 495 00:36:43,740 --> 00:36:49,920 Now, this recitation we focused on expected runtime. 496 00:36:49,920 --> 00:36:52,300 You have already seen amortized runtime. 497 00:36:58,100 --> 00:37:00,070 Or you may have heard of average runtime. 498 00:37:06,900 --> 00:37:10,930 So to be honest, expected and amortized 499 00:37:10,930 --> 00:37:14,470 are just too fancier ways of saying average. 500 00:37:14,470 --> 00:37:20,882 But in algorithm analysis, we do mean slightly different things 501 00:37:20,882 --> 00:37:21,590 with these terms. 502 00:37:25,160 --> 00:37:29,310 So the difference is that we are averaging 503 00:37:29,310 --> 00:37:31,540 over different things. 504 00:37:31,540 --> 00:37:33,870 So if we say average runtime, we usually 505 00:37:33,870 --> 00:37:37,370 mean taking the average over input. 506 00:37:37,370 --> 00:37:39,710 We can imagine a quick sort of quick 507 00:37:39,710 --> 00:37:42,760 find algorithm that doesn't use randomness, where you'll always 508 00:37:42,760 --> 00:37:45,380 select your first element as your favorite. 509 00:37:45,380 --> 00:37:47,220 That's a reasonable algorithm. 510 00:37:47,220 --> 00:37:49,200 If your input is random, then you 511 00:37:49,200 --> 00:37:53,110 can carry out the same argument and show that its complexity is 512 00:37:53,110 --> 00:37:55,320 over n or n log n. 513 00:37:55,320 --> 00:37:58,490 But if your input is pre-sorted or reverse sorted in some 514 00:37:58,490 --> 00:38:01,230 special cases, you cannot do that. 515 00:38:01,230 --> 00:38:04,510 So average runtime is usually a very weak argument, 516 00:38:04,510 --> 00:38:08,400 because you have to make assumptions about your input. 517 00:38:08,400 --> 00:38:11,790 And expected runtime is definitely better, 518 00:38:11,790 --> 00:38:15,060 because we are taking average over the randomness we 519 00:38:15,060 --> 00:38:16,630 introduced. 520 00:38:16,630 --> 00:38:20,770 They are independent of the input. 521 00:38:20,770 --> 00:38:25,090 So we're not making any assumptions on the input. 522 00:38:25,090 --> 00:38:27,300 So of course, this comes at a price. 523 00:38:27,300 --> 00:38:30,330 This randomness doesn't come for free. 524 00:38:30,330 --> 00:38:35,360 So in fact, it's very hard to generate high quantity 525 00:38:35,360 --> 00:38:37,030 random numbers. 526 00:38:37,030 --> 00:38:38,460 Maybe at the end of the class, you 527 00:38:38,460 --> 00:38:42,030 will see that in crypto, a lot of works 528 00:38:42,030 --> 00:38:45,630 are just devoted to generate high quantity random numbers. 529 00:38:45,630 --> 00:38:47,870 If you can do that efficiently, you can actually 530 00:38:47,870 --> 00:38:49,280 solve a lot of problems. 531 00:38:49,280 --> 00:38:53,322 So amortized runtime are slightly, again, different 532 00:38:53,322 --> 00:38:55,280 from these two, because they are taking average 533 00:38:55,280 --> 00:38:57,740 over a number of operations. 534 00:38:57,740 --> 00:39:00,250 You're doing many, many operations in a row. 535 00:39:00,250 --> 00:39:01,500 And some of them takes longer. 536 00:39:01,500 --> 00:39:03,540 Some of them take shorter. 537 00:39:03,540 --> 00:39:05,080 And you take average on those. 538 00:39:09,445 --> 00:39:10,420 OK. 539 00:39:10,420 --> 00:39:11,587 That's this recitation. 540 00:39:11,587 --> 00:39:12,170 Any questions? 541 00:39:20,540 --> 00:39:22,090 OK.