1 00:00:00,100 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,019 Commons license. 3 00:00:04,019 --> 00:00:06,360 Your support will help MIT OpenCourseWare 4 00:00:06,360 --> 00:00:10,740 continue to offer high quality educational resources for free. 5 00:00:10,740 --> 00:00:13,330 To make a donation, or view additional materials 6 00:00:13,330 --> 00:00:17,207 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,207 --> 00:00:17,832 at ocw.mit.edu. 8 00:00:21,150 --> 00:00:23,740 LING REN: Everyone, today we're going to look 9 00:00:23,740 --> 00:00:25,260 at dynamic programming again. 10 00:00:29,000 --> 00:00:31,700 So I think I have mentioned several times, 11 00:00:31,700 --> 00:00:34,590 so you should all know it by heart now, 12 00:00:34,590 --> 00:00:37,420 the dynamic programming, its main idea 13 00:00:37,420 --> 00:00:46,460 is divide the problem into subproblems and reuse 14 00:00:46,460 --> 00:00:49,680 the results of the problems you already solved. 15 00:00:49,680 --> 00:00:50,600 Right? 16 00:00:50,600 --> 00:00:54,010 And, of course, in 6.046 we always care about the runtime. 17 00:00:57,440 --> 00:01:03,720 So those are the two big themes for dynamic programming. 18 00:01:06,430 --> 00:01:09,070 Now, let's start with a warm-up example. 19 00:01:09,070 --> 00:01:12,120 It's extremely simple. 20 00:01:12,120 --> 00:01:20,730 Let's say we have a grid, and there's a robot from, say, 21 00:01:20,730 --> 00:01:26,980 coordinate 1,1 and it wants to go to coordinate m,n. 22 00:01:26,980 --> 00:01:30,980 So at every step, it can only either take a step up, 23 00:01:30,980 --> 00:01:33,160 or take a step to the right. 24 00:01:33,160 --> 00:01:37,640 So how many distinct paths are there for the robot to take? 25 00:01:46,330 --> 00:01:49,290 Is the question clear? 26 00:01:49,290 --> 00:01:51,870 So we have a robot at coordinate 1,1. 27 00:01:51,870 --> 00:01:54,420 It wants to go to coordinate m,n.
28 00:01:54,420 --> 00:01:57,410 And every step, it can either take a step up, 29 00:01:57,410 --> 00:01:59,420 or take a step to the right. 30 00:01:59,420 --> 00:02:02,340 How many distinct path are there that 31 00:02:02,340 --> 00:02:04,153 can take the robot to its destination? 32 00:02:10,180 --> 00:02:11,365 Any ideas how to solve that? 33 00:02:19,623 --> 00:02:20,123 Go ahead. 34 00:02:20,123 --> 00:02:23,616 AUDIENCE: So, we define subproblems as the number 35 00:02:23,616 --> 00:02:30,640 of distinct paths from some point x,y to m,n. 36 00:02:30,640 --> 00:02:33,090 Then the number of distinct paths from some point 37 00:02:33,090 --> 00:02:37,990 is the number of paths if you go up if you're allowed to go up, 38 00:02:37,990 --> 00:02:40,930 plus the number of paths if you go right 39 00:02:40,930 --> 00:02:42,400 if you're allowed to go right. 40 00:02:42,400 --> 00:02:44,880 So if you were on the edge, [INAUDIBLE]. 41 00:02:44,880 --> 00:02:47,520 LING REN: Yup, yup. 42 00:02:47,520 --> 00:02:48,520 Does everyone got that? 43 00:02:48,520 --> 00:02:49,540 So, it's very simple. 44 00:02:49,540 --> 00:02:53,590 So, I know I have only one way to get to these points. 45 00:02:53,590 --> 00:02:55,230 I need to go all the way right. 46 00:02:55,230 --> 00:02:57,340 And only one way to get to these points. 47 00:02:57,340 --> 00:02:59,560 I need to go all the way up. 48 00:02:59,560 --> 00:03:01,720 So for all the intermediate nodes, 49 00:03:01,720 --> 00:03:07,220 my number of choices are-- is this board moving? 50 00:03:07,220 --> 00:03:12,490 Are just the number of distinct paths I can come from my left, 51 00:03:12,490 --> 00:03:16,060 plus the number of distinct path I can come from bottom. 52 00:03:16,060 --> 00:03:17,200 And then I can go in. 53 00:03:17,200 --> 00:03:20,500 For every node, I'll just take a sum between the two numbers 54 00:03:20,500 --> 00:03:23,200 on my left and on my bottom. 55 00:03:23,200 --> 00:03:25,900 And go from there. 
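[Editor's sketch] The grid recurrence just described (one path along the bottom row and the left column, and every other cell is the sum of its left and bottom neighbors) can be written as a short Python function. The name count_paths and the table layout are assumptions made for illustration; they are not from the lecture.

```python
def count_paths(m, n):
    """Count up/right paths from (1, 1) to (m, n) on a grid."""
    # paths[x][y] = number of distinct paths from (1, 1) to (x, y).
    # Row 0 and column 0 are unused padding so indices match the lecture.
    paths = [[0] * (n + 1) for _ in range(m + 1)]
    for x in range(1, m + 1):
        for y in range(1, n + 1):
            if x == 1 or y == 1:
                # Only one way to reach a cell on the bottom row or left column.
                paths[x][y] = 1
            else:
                # Arrive either from the left neighbor or from the bottom neighbor.
                paths[x][y] = paths[x - 1][y] + paths[x][y - 1]
    return paths[m][n]


if __name__ == "__main__":
    print(count_paths(3, 3))
```

Each cell is filled once with a constant amount of work, so the table takes time proportional to m times n.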
56 00:03:25,900 --> 00:03:26,495 OK. 57 00:03:26,495 --> 00:03:27,478 Is that clear? 58 00:03:31,960 --> 00:03:34,320 So this example is very simple, but it 59 00:03:34,320 --> 00:03:38,125 does illustrate the point of dynamic programming very well. 60 00:03:41,130 --> 00:03:45,210 You solve subproblems, and ask how many distinct paths can I 61 00:03:45,210 --> 00:03:49,930 come here, and you reuse the results of, for example, 62 00:03:49,930 --> 00:03:54,200 this subproblem because you are using it to compute 63 00:03:54,200 --> 00:03:58,400 this number and that number. 64 00:03:58,400 --> 00:04:02,440 If you don't do that, if you don't memoize and reuse 65 00:04:02,440 --> 00:04:06,055 the results, then your runtime will be worse. 66 00:04:06,055 --> 00:04:07,305 So what's the runtime of that? 67 00:04:16,510 --> 00:04:17,601 Speak up. 68 00:04:17,601 --> 00:04:19,810 AUDIENCE: [INAUDIBLE] 69 00:04:19,810 --> 00:04:24,100 LING REN: It's just m times n. 70 00:04:24,100 --> 00:04:24,600 Why? 71 00:04:24,600 --> 00:04:28,320 Because I have this many unique subproblems. 72 00:04:28,320 --> 00:04:33,580 One at each point, and I'm just taking the sum of two numbers 73 00:04:33,580 --> 00:04:37,470 at each subproblem, so it takes me constant time 74 00:04:37,470 --> 00:04:41,490 to merge the results from my subproblems to get my problem. 75 00:04:41,490 --> 00:04:45,220 So to analyze runtime, usually we 76 00:04:45,220 --> 00:04:51,010 ask the question how many unique problems do I have. 77 00:04:51,010 --> 00:04:52,760 And what's the amount of merge work 78 00:04:52,760 --> 00:04:55,000 I have to do at every step? 79 00:05:04,542 --> 00:05:05,500 That's the toy example. 80 00:05:09,685 --> 00:05:12,850 Now let's look at some more complicated examples. 81 00:05:15,660 --> 00:05:17,545 Our first one is called make change. 82 00:05:21,150 --> 00:05:24,320 As its name suggests, we have a bunch of coins. 83 00:05:24,320 --> 00:05:31,710 s1, s2, all the way to, say, sm.
84 00:05:31,710 --> 00:05:35,600 So each coin has some values, like 1 cent, 5 cent, 10 cent. 85 00:05:35,600 --> 00:05:39,500 We're going to make change for a total of n cents, 86 00:05:39,500 --> 00:05:43,550 and ask what's the minimum number of coins 87 00:05:43,550 --> 00:05:47,715 do I need to make change of n cents. 88 00:05:53,440 --> 00:05:56,540 So to guarantee that we can always make this change, 89 00:05:56,540 --> 00:05:58,990 we'll set s1 to be 1. 90 00:05:58,990 --> 00:06:01,820 Otherwise, there's a chance that the problem is unsolvable. 91 00:06:08,700 --> 00:06:09,860 Any ideas? 92 00:06:09,860 --> 00:06:11,418 Is the problem clear? 93 00:06:11,418 --> 00:06:12,834 STUDENT: How do you find s1 again? 94 00:06:12,834 --> 00:06:15,800 Or si? 95 00:06:15,800 --> 00:06:17,970 LING REN: What, these numbers? 96 00:06:17,970 --> 00:06:19,690 They are inputs. 97 00:06:19,690 --> 00:06:20,870 They are also inputs. 98 00:06:20,870 --> 00:06:22,840 It could be 1 cent, 5 cent, 10 cent. 99 00:06:22,840 --> 00:06:25,670 Or 3 cent, 7 cent. 100 00:06:25,670 --> 00:06:27,170 Though the smallest one is always 1. 101 00:06:30,420 --> 00:06:30,920 OK. 102 00:06:30,920 --> 00:06:32,461 I need to find a combination of them. 103 00:06:32,461 --> 00:06:35,330 For each of them, I have an infinite number of them. 104 00:06:35,330 --> 00:06:39,230 So I can find two of these, three of that, five of that, 105 00:06:39,230 --> 00:06:42,960 such that their sum is n. 106 00:06:42,960 --> 00:06:43,918 Is the problem clear? 107 00:06:48,365 --> 00:06:48,865 OK. 108 00:06:48,865 --> 00:06:50,031 Any ideas how to solve that? 109 00:06:55,995 --> 00:06:59,510 So let's just use a naive or very straightforward 110 00:06:59,510 --> 00:07:00,010 algorithms. 111 00:07:12,440 --> 00:07:13,190 Go ahead. 112 00:07:13,190 --> 00:07:18,640 AUDIENCE: You pick one, and then you do mc of n minus that. 113 00:07:18,640 --> 00:07:20,000 LING REN: OK, great. 
114 00:07:20,000 --> 00:07:23,460 Yeah, let's just do exhaustive search. 115 00:07:23,460 --> 00:07:27,220 Let's pick si. 116 00:07:27,220 --> 00:07:29,770 If I pick this coin, then my subproblem 117 00:07:29,770 --> 00:07:35,220 becomes n minus the coin value. 118 00:07:35,220 --> 00:07:37,950 And of course, I use the one coin. 119 00:07:37,950 --> 00:07:39,230 That's si. 120 00:07:39,230 --> 00:07:50,180 So then I think the min of this for all the i's, and that's 121 00:07:50,180 --> 00:07:50,940 the solution. 122 00:07:54,300 --> 00:07:55,060 So far so good? 123 00:08:02,640 --> 00:08:03,140 OK. 124 00:08:03,140 --> 00:08:05,430 So what's the runtime of this algorithm? 125 00:08:15,968 --> 00:08:19,640 If it's not immediately obvious, then we 126 00:08:19,640 --> 00:08:23,620 ask how many unique subproblems are there. 127 00:08:23,620 --> 00:08:28,470 And how much work do I have to do to go from my subproblems 128 00:08:28,470 --> 00:08:30,140 to my original problem? 129 00:08:33,460 --> 00:08:35,105 So how many subproblem are there? 130 00:08:49,670 --> 00:08:51,960 So to be clear, for this one, we have 131 00:08:51,960 --> 00:08:54,730 to call this recursive call again. 132 00:08:54,730 --> 00:08:57,110 n minus si, probably minus sj. 133 00:09:05,300 --> 00:09:07,860 And if you cannot compute how many subproblems are there, 134 00:09:07,860 --> 00:09:08,890 let's just give a bound. 135 00:09:23,360 --> 00:09:23,980 Any ideas? 136 00:09:31,320 --> 00:09:32,826 John, right? 137 00:09:32,826 --> 00:09:35,022 AUDIENCE: I'm not sure there would 138 00:09:35,022 --> 00:09:39,658 be more than n subproblems, because the smallest 139 00:09:39,658 --> 00:09:44,538 amount we can subtract from the original is 1. 140 00:09:44,538 --> 00:09:46,978 And if we keep subtracting 1 repeatedly, 141 00:09:46,978 --> 00:09:48,930 we get n subproblems, and that will cover 142 00:09:48,930 --> 00:09:50,870 everything-- that subproblem. 
143 00:09:50,870 --> 00:09:51,870 LING REN: Yeah, correct. 144 00:09:51,870 --> 00:09:55,260 So this may not be a very tight bound, 145 00:09:55,260 --> 00:09:58,570 but we know we cannot have more than this number 146 00:09:58,570 --> 00:09:59,830 of subproblems. 147 00:09:59,830 --> 00:10:04,300 Actually, I don't need to even put the order there. 148 00:10:04,300 --> 00:10:07,410 I know we can have no more than n subproblems. 149 00:10:07,410 --> 00:10:10,650 They're just make change of n, n minus 1, n minus 2, all the way 150 00:10:10,650 --> 00:10:14,010 to make change 1. 151 00:10:14,010 --> 00:10:15,960 And actually, this bound is pretty 152 00:10:15,960 --> 00:10:19,590 tight, because we set our smallest coin to be 1, 153 00:10:19,590 --> 00:10:23,580 so we will make a recursive call to make change n minus 1, 154 00:10:23,580 --> 00:10:25,320 right? 155 00:10:25,320 --> 00:10:29,470 If I pick the 1 coin, the 1 cent coin first. 156 00:10:29,470 --> 00:10:32,710 And then from there, I will pick a 1 cent coin again. 157 00:10:32,710 --> 00:10:35,850 That gives me a subproblem with n minus 2. 158 00:10:35,850 --> 00:10:38,580 So indeed, I will encounter all the n subproblems. 159 00:10:42,120 --> 00:10:45,810 OK, so having realized that, how much work 160 00:10:45,810 --> 00:10:48,315 do I have to do to go from here to there? 161 00:10:56,316 --> 00:10:58,120 AUDIENCE: [INAUDIBLE] 162 00:10:58,120 --> 00:10:58,870 LING REN: Correct. 163 00:10:58,870 --> 00:11:02,430 Because I'm taking the min of how many terms? 164 00:11:02,430 --> 00:11:05,264 m terms. 165 00:11:05,264 --> 00:11:06,180 So that's our runtime. 166 00:11:18,390 --> 00:11:19,480 Any questions so far? 167 00:11:22,110 --> 00:11:25,090 If not, let me take a digression. 168 00:11:25,090 --> 00:11:28,710 So, make change, this problem. 169 00:11:28,710 --> 00:11:31,335 If you think about it, it's very similar to knapsack. 170 00:11:35,140 --> 00:11:38,300 Has anyone not heard of this problem?
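[Editor's sketch] The make-change recurrence discussed above, MC(n) = min over i of 1 + MC(n - si), with n subproblems and a min over m terms each, could look like the following bottom-up Python function. The name make_change and the table best are assumptions for illustration; the lecture states only the recurrence. The sketch assumes, as the lecture does, that the coin list contains a coin of value 1.

```python
def make_change(coins, n):
    """Minimum number of coins summing to n, with unlimited copies of each coin."""
    INF = float("inf")
    # best[a] = fewest coins needed to make amount a; zero coins make amount 0.
    best = [0] + [INF] * n
    for amount in range(1, n + 1):
        # Try every coin as the one picked first: m terms inside the min.
        for s in coins:
            if s <= amount and best[amount - s] + 1 < best[amount]:
                best[amount] = best[amount - s] + 1
    return best[n]


if __name__ == "__main__":
    print(make_change([1, 5, 10], 27))
```

The two nested loops make the n subproblems and the m terms per subproblem visible, which is where the runtime of order m times n comes from.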
171 00:11:38,300 --> 00:11:40,530 Knapsack means you have a bunch of items. 172 00:11:40,530 --> 00:11:42,430 You want to pack these into a bag, 173 00:11:42,430 --> 00:11:45,880 and the bag has a certain size. 174 00:11:45,880 --> 00:11:47,520 So each item has a certain value, 175 00:11:47,520 --> 00:11:53,050 and you want to pack the items that have the largest combined 176 00:11:53,050 --> 00:11:56,160 value into your bag. 177 00:11:56,160 --> 00:12:01,415 So, why are they similar? 178 00:12:04,110 --> 00:12:07,070 So in some sense, n is our size. 179 00:12:07,070 --> 00:12:11,590 We want to pick a bunch of coins to make the size n. 180 00:12:11,590 --> 00:12:14,720 And each coin here actually has a negative value, 181 00:12:14,720 --> 00:12:17,350 because we want to pick the min of it. 182 00:12:17,350 --> 00:12:20,344 If you do that, then this problem is exactly knapsack. 183 00:12:20,344 --> 00:12:21,510 And knapsack is NP-complete. 184 00:12:26,080 --> 00:12:32,130 That means we don't know a polynomial solution to it yet. 185 00:12:32,130 --> 00:12:33,630 However, we just found one. 186 00:12:36,900 --> 00:12:39,840 Our input is, m stuff and n. 187 00:12:39,840 --> 00:12:42,555 Our solution is polynomial to m, and polynomial to n. 188 00:12:45,866 --> 00:12:51,040 If this is true, then I have found the polynomial solution 189 00:12:51,040 --> 00:12:53,120 to one NP problem. 190 00:12:53,120 --> 00:12:54,320 So P equals NP. 191 00:12:54,320 --> 00:12:58,292 So we should all be getting a Turing award for that. 192 00:12:58,292 --> 00:12:59,500 So clearly something's wrong. 193 00:13:02,760 --> 00:13:07,000 But there's no problem with this solution. 194 00:13:07,000 --> 00:13:09,330 This covers all the cases. 195 00:13:09,330 --> 00:13:11,790 And our analysis is definitely correct. 196 00:13:18,840 --> 00:13:22,885 So does anyone get what I'm asking? 197 00:13:22,885 --> 00:13:24,620 So what's the contradiction here?
198 00:13:29,540 --> 00:13:32,490 I will probably discuss this later, 199 00:13:32,490 --> 00:13:37,390 in later lectures when we get to complexity or reduction. 200 00:13:37,390 --> 00:13:39,500 But to give a short answer, the problem 201 00:13:39,500 --> 00:13:46,200 is that when we say the input is n, its size is not n. 202 00:13:46,200 --> 00:13:54,680 So I only need log n bits to represent this input. 203 00:13:54,680 --> 00:13:57,090 Make sense? 204 00:13:57,090 --> 00:14:01,040 Therefore, for log n length input, my runtime is n. 205 00:14:01,040 --> 00:14:03,567 That means my runtime is exponential. 206 00:14:03,567 --> 00:14:04,400 It's not polynomial. 207 00:14:06,940 --> 00:14:07,824 OK. 208 00:14:07,824 --> 00:14:09,365 Now that's the end of the digression. 209 00:14:24,170 --> 00:14:25,910 Now let's look at another example. 210 00:14:28,960 --> 00:14:31,650 This one is called rectangular blocks. 211 00:14:41,350 --> 00:14:45,130 So in this problem, we have a bunch of blocks. 212 00:14:45,130 --> 00:14:48,530 Say 1, 2, all the way to n. 213 00:14:48,530 --> 00:14:54,310 And each of them has a length, width, and height. 214 00:14:54,310 --> 00:14:55,890 So it's a three-dimensional block. 215 00:14:58,580 --> 00:15:02,790 So I want to put blocks, stack them on top of each other 216 00:15:02,790 --> 00:15:05,070 to get the maximum height. 217 00:15:05,070 --> 00:15:10,230 But in order for j to be put on top of i, 218 00:15:10,230 --> 00:15:18,530 I require the length of j to be smaller than the length of i, 219 00:15:18,530 --> 00:15:23,800 and the width of j is also smaller than the width of i. 220 00:15:23,800 --> 00:15:29,190 So visually I just meant this is a block. 221 00:15:29,190 --> 00:15:31,250 I can put another block on there. 222 00:15:31,250 --> 00:15:34,176 They are smaller in width and length.
223 00:15:34,176 --> 00:15:39,930 But I cannot put this guy on top of it because one of its 224 00:15:39,930 --> 00:15:44,190 dimension is larger than the underlying block. 225 00:15:44,190 --> 00:15:48,910 And to make things simple, that's not allowed, rotating. 226 00:15:48,910 --> 00:15:50,360 So OK, I can rotate. 227 00:15:50,360 --> 00:15:51,280 It still doesn't fit. 228 00:15:51,280 --> 00:15:53,730 But you see the complication. 229 00:15:53,730 --> 00:15:58,610 So you allow rotate, then there's more possibility. 230 00:15:58,610 --> 00:16:02,560 Length and width are so one of them is north-south, 231 00:16:02,560 --> 00:16:05,785 the other is east-west, and you cannot change that. 232 00:16:08,770 --> 00:16:09,380 OK. 233 00:16:09,380 --> 00:16:10,580 Is the problem clear? 234 00:16:10,580 --> 00:16:13,380 You want to stack one on top of each other 235 00:16:13,380 --> 00:16:14,970 to get the maximum height. 236 00:16:25,100 --> 00:16:28,140 Any ideas? 237 00:16:28,140 --> 00:16:30,510 Again, let's start from simple algorithm. 238 00:16:30,510 --> 00:16:32,950 Say, let's just try everything out. 239 00:16:47,670 --> 00:16:48,430 OK, go ahead. 240 00:16:48,430 --> 00:16:51,005 AUDIENCE: If you try everything else, you have n factorial. 241 00:16:51,005 --> 00:16:51,713 LING REN: Pardon? 242 00:16:51,713 --> 00:16:53,797 AUDIENCE: It would be O of n factorial? 243 00:16:53,797 --> 00:16:55,130 LING REN: You're going too fast. 244 00:16:55,130 --> 00:16:57,360 Let's write the algorithm first. 245 00:16:57,360 --> 00:17:04,220 So I want to solve my rectangle block problem, say from 1 to n. 246 00:17:04,220 --> 00:17:05,220 What are my subproblems? 247 00:17:09,967 --> 00:17:11,669 AUDIENCE: Choose one block. 248 00:17:11,669 --> 00:17:12,210 LING REN: OK. 249 00:17:12,210 --> 00:17:13,168 Let's choose one block. 250 00:17:13,168 --> 00:17:17,036 AUDIENCE: And then you run RB of everything except that block. 
251 00:17:19,970 --> 00:17:22,599 LING REN: So I get its height, and then I have a subproblem. 252 00:17:25,410 --> 00:17:26,630 What is the subproblem? 253 00:17:26,630 --> 00:17:27,760 And then I'll take a max. 254 00:17:35,330 --> 00:17:41,920 So the difficulty here is this subproblem. 255 00:17:41,920 --> 00:17:44,410 So Andrew, right? 256 00:17:44,410 --> 00:17:51,320 So Andrew said it's just everything except i. 257 00:17:51,320 --> 00:17:53,410 Is that the case? 258 00:17:53,410 --> 00:17:53,910 Go ahead. 259 00:17:53,910 --> 00:17:55,410 AUDIENCE: It's everything except i, 260 00:17:55,410 --> 00:17:59,305 and anything wider or longer than i. 261 00:17:59,305 --> 00:18:00,630 LING REN: Do you get that? 262 00:18:00,630 --> 00:18:03,260 Not only do we have to exclude i, 263 00:18:03,260 --> 00:18:07,770 we also have to exclude everything longer or wider 264 00:18:07,770 --> 00:18:08,880 than i. 265 00:18:08,880 --> 00:18:10,545 So that's actually a messy problem. 266 00:18:13,580 --> 00:18:25,480 So let me define this subproblem to be a compatible set of l i, w i. 267 00:18:25,480 --> 00:18:34,090 And let me define that to be the set of blocks where 268 00:18:34,090 --> 00:18:38,140 the length is smaller than the required length, 269 00:18:38,140 --> 00:18:45,420 and their width is also smaller than the required width. 270 00:18:45,420 --> 00:18:48,280 So this should remind you of the weighted interval scheduling 271 00:18:48,280 --> 00:18:53,650 problem, where we define a compatible set once we 272 00:18:53,650 --> 00:18:54,950 have chosen some block. 273 00:18:58,823 --> 00:18:59,323 Question? 274 00:18:59,323 --> 00:19:00,989 AUDIENCE: What are we trying to do here? 275 00:19:00,989 --> 00:19:04,180 Are we trying to minimize h? 276 00:19:04,180 --> 00:19:05,340 LING REN: Maximize h. 277 00:19:05,340 --> 00:19:06,940 We want to get as high as possible.
278 00:19:12,500 --> 00:19:13,990 I choose a block, I get its height, 279 00:19:13,990 --> 00:19:16,930 and then I find out the compatible remaining blocks, 280 00:19:16,930 --> 00:19:18,750 and I want to stack them on top of it. 281 00:19:25,450 --> 00:19:29,330 Everyone agrees this solution is correct? 282 00:19:29,330 --> 00:19:31,820 OK, then let's analyze its runtime. 283 00:19:40,640 --> 00:19:42,360 So how do we analyze runtime? 284 00:19:52,020 --> 00:19:55,374 So what's the first question I always ask? 285 00:19:55,374 --> 00:19:57,596 AUDIENCE: How many subproblems? 286 00:19:57,596 --> 00:19:58,220 LING REN: Yeah. 287 00:19:58,220 --> 00:20:00,470 I'm not sure who said that, but how many 288 00:20:00,470 --> 00:20:01,600 subproblems do we have? 289 00:20:19,841 --> 00:20:22,027 AUDIENCE: At most n? 290 00:20:22,027 --> 00:20:22,860 LING REN: At most n. 291 00:20:25,510 --> 00:20:27,460 Can you explain why is that the case? 292 00:20:27,460 --> 00:20:29,150 Or it's just a guess? 293 00:20:29,150 --> 00:20:37,340 AUDIENCE: Because if n is compatible-- nothing 294 00:20:37,340 --> 00:20:39,555 in the compatible-- n will not be 295 00:20:39,555 --> 00:20:41,047 in the compatible set of anything 296 00:20:41,047 --> 00:20:44,029 that is in the compatible set of n. 297 00:20:44,029 --> 00:20:45,780 LING REN: OK, that's very tricky. 298 00:20:45,780 --> 00:20:46,650 I didn't get that. 299 00:20:46,650 --> 00:20:48,580 Can you say that again? 300 00:20:48,580 --> 00:20:51,130 AUDIENCE: Because for example, if you 301 00:20:51,130 --> 00:20:52,630 start with n, then everything that's 302 00:20:52,630 --> 00:20:56,140 in the compatible set of n. 303 00:20:56,140 --> 00:20:59,020 n won't be in the compatible set of that. 304 00:21:01,624 --> 00:21:02,165 LING REN: OK. 305 00:21:02,165 --> 00:21:04,290 I think I got what you said. 306 00:21:04,290 --> 00:21:08,330 So, if we think there are only n subproblems, what are they?
307 00:21:08,330 --> 00:21:16,976 They have to be compatible sets l1, w1, then l2, w2. 308 00:21:16,976 --> 00:21:19,000 These are the n unique subproblems 309 00:21:19,000 --> 00:21:21,670 you are thinking about. 310 00:21:21,670 --> 00:21:24,540 Is there any chance that I will get a compatible set 311 00:21:24,540 --> 00:21:26,770 like something like l3 but w5? 312 00:21:29,580 --> 00:21:33,320 If I ever have this subproblem then, well, 313 00:21:33,320 --> 00:21:35,845 my number of subproblems are kind of exploding. 314 00:21:43,450 --> 00:21:49,210 Yeah, I see many of you are saying no. 315 00:21:49,210 --> 00:21:50,210 Why not? 316 00:21:50,210 --> 00:21:54,180 Because if we have a subproblem, say, compatible set of l i 317 00:21:54,180 --> 00:22:05,160 and w i, and if we go from here, and choose the next block, say 318 00:22:05,160 --> 00:22:18,370 t, it's guaranteed that t is shorter and narrower. 319 00:22:18,370 --> 00:22:22,300 That means our new subproblem, or new compatible set 320 00:22:22,300 --> 00:22:30,040 becomes-- our new subproblem needs to be 321 00:22:30,040 --> 00:22:33,750 compatible with t instead of i. 322 00:22:33,750 --> 00:22:39,580 So, the only subproblems I can get are these ones. 323 00:22:39,580 --> 00:22:41,330 I cannot have one of these. 324 00:22:45,330 --> 00:22:48,446 The number of subproblems are n. 325 00:22:48,446 --> 00:22:56,145 And how much work do I have to do at each level? 326 00:23:00,780 --> 00:23:03,200 AUDIENCE: n. 327 00:23:03,200 --> 00:23:05,540 LING REN: n, because I'm just taking the max, 328 00:23:05,540 --> 00:23:09,860 and there are n potential choices inside my max. 329 00:23:13,710 --> 00:23:14,760 So runtime n squared. 330 00:23:24,160 --> 00:23:32,810 OK, we're not fully done, because there is an extra step 331 00:23:32,810 --> 00:23:35,350 when we're trying to do this. 332 00:23:35,350 --> 00:23:39,280 We have to figure out what each of these are. 
333 00:23:42,260 --> 00:23:45,060 Because once I go into this subproblem, 334 00:23:45,060 --> 00:23:51,120 I need to take a max on all the blocks that's in this set. 335 00:23:51,120 --> 00:23:53,420 I have to know what blocks are in that set. 336 00:23:57,780 --> 00:24:00,530 Is that hard? 337 00:24:00,530 --> 00:24:02,542 So how would you do that? 338 00:24:02,542 --> 00:24:06,999 AUDIENCE: You just check for all of them, and that's O of n. 339 00:24:06,999 --> 00:24:07,540 LING REN: OK. 340 00:24:07,540 --> 00:24:09,950 So, I check all of them. 341 00:24:09,950 --> 00:24:10,830 That's O of n. 342 00:24:15,240 --> 00:24:17,520 I'm pretty sure you just meant scanning, 343 00:24:17,520 --> 00:24:20,970 scan the entire thing, and pick out the compatible ones. 344 00:24:20,970 --> 00:24:23,140 But that's for this subproblem. 345 00:24:23,140 --> 00:24:27,130 We have to do it for every one. 346 00:24:27,130 --> 00:24:29,230 Or there may be a better way. 347 00:24:29,230 --> 00:24:31,240 So I think the previous TA is telling me there's 348 00:24:31,240 --> 00:24:33,020 a better way to do that. 349 00:24:33,020 --> 00:24:36,730 So in order to find the entire compatible stuff, 350 00:24:36,730 --> 00:24:39,624 he claims he can do it in n log n, but I haven't checked that, 351 00:24:39,624 --> 00:24:40,290 so I'm not sure. 352 00:24:40,290 --> 00:24:45,230 This is a folklore legend here. 353 00:24:45,230 --> 00:24:46,880 Yeah, we'll double check that offline. 354 00:24:46,880 --> 00:24:51,080 But assuming if I don't have this, then 355 00:24:51,080 --> 00:24:55,960 figure out all these subproblems will also take n squared. 356 00:24:55,960 --> 00:24:58,910 Then my total runtime is n squared plus n squared, 357 00:24:58,910 --> 00:25:00,040 and still n squared. 358 00:25:06,400 --> 00:25:06,900 Question? 359 00:25:06,900 --> 00:25:08,784 AUDIENCE: Is the n log n solution 360 00:25:08,784 --> 00:25:11,139 giving us sorting this by [INAUDIBLE]? 
361 00:25:14,014 --> 00:25:15,930 LING REN: Yeah, I think it should be something 362 00:25:15,930 --> 00:25:19,260 along those lines, but yeah, I haven't figured out whether you 363 00:25:19,260 --> 00:25:22,070 sort by length or by width. 364 00:25:22,070 --> 00:25:25,890 You can only sort by one of them. 365 00:25:25,890 --> 00:25:29,150 So after sorting, say let's sort by length. 366 00:25:29,150 --> 00:25:34,570 Then after sorting, I may get something like this. 367 00:25:34,570 --> 00:25:37,600 And if I'm asking what's the compatible set of width 368 00:25:37,600 --> 00:25:40,885 this guy, I still have to kick all of them out. 369 00:25:48,942 --> 00:25:52,130 Yeah, so it's not entirely clear to me how to do it, 370 00:25:52,130 --> 00:25:57,240 but I think you can potentially consider having another, 371 00:25:57,240 --> 00:26:00,550 say, binary search tree that's sorted by width, 372 00:26:00,550 --> 00:26:03,910 and you can go in and just delete everything larger 373 00:26:03,910 --> 00:26:06,280 than a certain width. 374 00:26:06,280 --> 00:26:08,640 So that's the, yeah. 375 00:26:08,640 --> 00:26:09,642 OK, go ahead. 376 00:26:09,642 --> 00:26:13,815 AUDIENCE: Can you convert into a directed graph, where each 377 00:26:13,815 --> 00:26:17,615 pair of shapes that's compatible, you do an edge. 378 00:26:17,615 --> 00:26:20,950 And then path find. 379 00:26:20,950 --> 00:26:22,100 LING REN: OK. 380 00:26:22,100 --> 00:26:24,510 OK. 381 00:26:24,510 --> 00:26:29,626 But constructing that graph already takes O n squared, 382 00:26:29,626 --> 00:26:30,126 correct? 383 00:26:34,856 --> 00:26:36,470 Yeah, OK, let's move on. 384 00:26:36,470 --> 00:26:38,260 I don't have time to figure this out. 385 00:26:41,390 --> 00:26:49,160 So, this problem is remotely similar to interval scheduling, 386 00:26:49,160 --> 00:26:52,180 weighted interval scheduling, in a sense 387 00:26:52,180 --> 00:26:54,630 that it has some compatible set. 
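[Editor's sketch] The n squared algorithm described above (choose a bottom block, then recurse on the blocks strictly shorter and narrower than it) can be written as a memoized recursion in Python. The names max_stack_height and tallest_with_bottom and the (length, width, height) tuple layout are assumptions for illustration. Each compatible set is found by scanning all blocks, as suggested in the discussion, so nothing here attempts the unverified n log n idea.

```python
from functools import lru_cache


def max_stack_height(blocks):
    """blocks: sequence of (length, width, height) tuples; rotation is not allowed."""
    blocks = tuple(blocks)

    @lru_cache(maxsize=None)
    def tallest_with_bottom(i):
        # Height of the tallest stack whose bottom block is blocks[i].
        length, width, height = blocks[i]
        best_above = 0
        # Scan every block to find the compatible set of block i: O(n) work.
        for j, (lj, wj, _) in enumerate(blocks):
            if lj < length and wj < width:
                best_above = max(best_above, tallest_with_bottom(j))
        return height + best_above

    # Try every block as the bottom of the whole stack.
    return max((tallest_with_bottom(i) for i in range(len(blocks))), default=0)


if __name__ == "__main__":
    print(max_stack_height([(4, 4, 1), (3, 3, 2), (2, 5, 10), (1, 1, 3)]))
```

There is one memoized subproblem per block and a linear scan inside each, which matches the n subproblems times n work per subproblem counted in the discussion. The recursion terminates because every block placed on top is strictly smaller in both dimensions.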
388 00:26:54,630 --> 00:26:58,430 And in the very first lecture and recitation, 389 00:26:58,430 --> 00:27:02,400 we have two algorithm for weighted interval scheduling, 390 00:27:02,400 --> 00:27:04,617 and one of them is better than the other. 391 00:27:04,617 --> 00:27:06,450 And this one looks like the naive algorithm. 392 00:27:13,480 --> 00:27:18,380 So, does anyone remember what the better algorithm 393 00:27:18,380 --> 00:27:19,925 is for weighted interval scheduling? 394 00:27:46,710 --> 00:27:55,530 But instead of checking every one as my potential lowest one, 395 00:27:55,530 --> 00:27:57,230 it really doesn't make sense to do that. 396 00:27:57,230 --> 00:27:59,160 Because for the very small ones, I 397 00:27:59,160 --> 00:28:03,420 shouldn't put them as my bottom one. 398 00:28:03,420 --> 00:28:09,440 I should try the larger ones first as the very bottom one. 399 00:28:09,440 --> 00:28:09,940 Go ahead. 400 00:28:09,940 --> 00:28:11,908 Oh, you're not-- 401 00:28:11,908 --> 00:28:15,380 AUDIENCE: You could create a sorted list of length n 402 00:28:15,380 --> 00:28:16,868 with the width. 403 00:28:16,868 --> 00:28:19,844 So you know that items that are later in the list, 404 00:28:19,844 --> 00:28:24,330 they're not going to be in the first level of the tower. 405 00:28:24,330 --> 00:28:26,030 LING REN: Yeah, correct. 406 00:28:26,030 --> 00:28:29,230 So, just in the same line of thought 407 00:28:29,230 --> 00:28:33,290 as weighted interval scheduling, let's first sort them. 408 00:28:33,290 --> 00:28:35,405 But then, it's a little tricky because do I 409 00:28:35,405 --> 00:28:37,700 sort by length or width? 410 00:28:37,700 --> 00:28:43,120 So I'm not sure yet, so let's just sort by length 411 00:28:43,120 --> 00:28:43,835 and then width. 412 00:28:46,790 --> 00:28:48,790 So this means if they have the same length, 413 00:28:48,790 --> 00:28:50,180 then I'll sort them by width. 414 00:28:50,180 --> 00:28:52,810 So I can create a sorted list. 
415 00:28:52,810 --> 00:28:54,920 Let me just assume that it's in-place sort, 416 00:28:54,920 --> 00:28:58,570 and now I have the sorted list. 417 00:28:58,570 --> 00:29:04,320 So once I have that, the potential solutions 418 00:29:04,320 --> 00:29:06,980 I should consider is that whether or not 419 00:29:06,980 --> 00:29:11,680 I put my first block as the bottom one. 420 00:29:11,680 --> 00:29:15,470 It doesn't make sense for me to put a later one down. 421 00:29:15,470 --> 00:29:25,175 So my original problem becomes taking the max, 422 00:29:25,175 --> 00:29:29,540 and whether or not I choose block one. 423 00:29:29,540 --> 00:29:35,470 If I do, then I get its weight-- height, sorry. 424 00:29:35,470 --> 00:29:42,290 And my subproblem is the ones compatible with it. 425 00:29:48,300 --> 00:29:51,770 If I do not choose it, then my sub problem 426 00:29:51,770 --> 00:29:57,280 is like what Andrew first said, from 2 all the way to n. 427 00:30:00,710 --> 00:30:04,260 So why is this correct? 428 00:30:04,260 --> 00:30:08,250 So I claim this covers all the cases. 429 00:30:08,250 --> 00:30:13,800 Either h1 is chosen as the first bottom one, or it's not. 430 00:30:13,800 --> 00:30:15,050 It's not chosen at all. 431 00:30:15,050 --> 00:30:17,990 It's impossible for h1 to be somewhere in the middle, 432 00:30:17,990 --> 00:30:21,710 because it has the longest, largest length. 433 00:30:21,710 --> 00:30:22,210 OK. 434 00:30:25,400 --> 00:30:26,970 So how many subproblems do I have? 435 00:30:38,826 --> 00:30:40,320 Go ahead. 436 00:30:40,320 --> 00:30:43,330 Still n. 437 00:30:43,330 --> 00:30:49,780 So there are all of these compatible set of l1 w1, l2 w2. 438 00:30:49,780 --> 00:30:52,140 But it looks like I do have some new subproblems. 439 00:30:57,940 --> 00:31:02,060 These do not exist before. 440 00:31:02,060 --> 00:31:05,860 However, there are only n of them. 441 00:31:05,860 --> 00:31:10,640 They're just a suffix of the entire set. 
442 00:31:10,640 --> 00:31:13,240 So I still have O of n subproblems. 443 00:31:13,240 --> 00:31:18,050 And at each step, I'm doing constant amount of work. 444 00:31:18,050 --> 00:31:19,510 There are just two items. 445 00:31:19,510 --> 00:31:22,925 So we found an order n solution. 446 00:31:26,200 --> 00:31:27,460 Are we done? 447 00:31:27,460 --> 00:31:30,661 Is it really order n? 448 00:31:30,661 --> 00:31:31,160 OK, no. 449 00:31:31,160 --> 00:31:32,870 AUDIENCE: You still have to find the c. 450 00:31:32,870 --> 00:31:33,495 LING REN: Yeah. 451 00:31:33,495 --> 00:31:35,980 I still have to find all these c's. 452 00:31:35,980 --> 00:31:39,060 And first, I actually have a sort step. 453 00:31:39,060 --> 00:31:41,050 That sort step is n log n. 454 00:31:43,710 --> 00:31:46,860 Yeah, then again, well, if we do it naively, 455 00:31:46,860 --> 00:31:50,990 then it's again n squared, because I 456 00:31:50,990 --> 00:31:54,720 have to find this compatible set, each of them. 457 00:31:54,720 --> 00:31:56,940 But if there's an n log n solution 458 00:31:56,940 --> 00:31:59,840 to find these compatible sets, then my final runtime is n 459 00:31:59,840 --> 00:32:00,340 log n. 460 00:32:05,491 --> 00:32:05,990 Make sense? 461 00:32:17,615 --> 00:32:18,490 Any questions so far? 462 00:32:29,050 --> 00:32:30,400 OK. 463 00:32:30,400 --> 00:32:34,490 So now we actually have a choice. 464 00:32:34,490 --> 00:32:39,560 So we can either go through another DP example, 465 00:32:39,560 --> 00:32:40,835 I do have another one. 466 00:32:40,835 --> 00:32:44,150 But Nancy, one of the lecturers suggested, 467 00:32:44,150 --> 00:32:47,560 that it seems that many people have some trouble understanding 468 00:32:47,560 --> 00:32:50,850 yesterday's lecture on universal hashing and perfect hashing. 469 00:32:50,850 --> 00:32:53,876 So we can also consider going through that. 
470 00:32:53,876 --> 00:32:55,250 Well, of course, the third option 471 00:32:55,250 --> 00:32:56,291 is to just call it a day. 472 00:32:59,300 --> 00:33:01,620 So, let me just take a poll. 473 00:33:01,620 --> 00:33:05,050 How many people prefer we go over the hash stuff? 474 00:33:08,780 --> 00:33:10,925 How many people prefer another DP example? 475 00:33:13,890 --> 00:33:14,390 OK. 476 00:33:14,390 --> 00:33:15,400 Sorry guys. 477 00:33:15,400 --> 00:33:17,980 How many people just want to leave? 478 00:33:17,980 --> 00:33:18,521 It's fine. 479 00:33:18,521 --> 00:33:19,020 OK. 480 00:33:19,020 --> 00:33:19,520 Great. 481 00:33:19,520 --> 00:33:20,024 That's it. 482 00:33:23,440 --> 00:33:23,940 OK. 483 00:33:23,940 --> 00:33:26,270 So, so much for DP. 484 00:33:26,270 --> 00:33:27,470 We do have another example. 485 00:33:27,470 --> 00:33:28,940 We will release it in recitation notes. 486 00:33:28,940 --> 00:33:30,510 For those of you who are interested, 487 00:33:30,510 --> 00:33:32,140 you can take a look. 488 00:33:32,140 --> 00:33:36,180 So, well, I'm sure you all know that we 489 00:33:36,180 --> 00:33:39,120 haven't gone into DP in the main lectures yet. 490 00:33:39,120 --> 00:33:42,510 So this is really just a warm-up to prepare 491 00:33:42,510 --> 00:33:45,690 you to go to the more advanced DP concepts. 492 00:33:45,690 --> 00:33:50,270 And also, DP will be covered in quiz 1. 493 00:33:50,270 --> 00:33:54,880 But the difficulty will be strictly easier 494 00:33:54,880 --> 00:33:57,432 than the examples we covered here. 495 00:34:01,360 --> 00:34:01,860 OK? 496 00:34:28,460 --> 00:34:32,510 Now let's review universal and perfect hashing. 497 00:34:32,510 --> 00:34:35,960 So it's not like I have a better way to teach it. 498 00:34:35,960 --> 00:34:38,010 Our advantage here is that we have fewer people, 499 00:34:38,010 --> 00:34:41,460 so you can ask any questions you have. 500 00:34:41,460 --> 00:34:43,719 So let me start with the motivating example. 
501 00:34:43,719 --> 00:34:44,929 So why do we care about hash? 502 00:34:47,520 --> 00:34:54,860 It's because we want to create a hash table of, say, m. 503 00:34:54,860 --> 00:34:55,660 It has m bins. 504 00:35:00,170 --> 00:35:06,160 And we will receive input, say, k0, k1, all the way to k n 505 00:35:06,160 --> 00:35:07,340 minus 1. 506 00:35:07,340 --> 00:35:08,610 n keys. 507 00:35:08,610 --> 00:35:12,110 And we'll apply a hash function to each of them 508 00:35:12,110 --> 00:35:14,580 to map them to one of the bins. 509 00:35:14,580 --> 00:35:23,140 The hope is that if n is theta m, or in the other way, 510 00:35:23,140 --> 00:35:26,040 m is theta n, then each bin should contain 511 00:35:26,040 --> 00:35:27,200 a constant number of keys. 512 00:35:30,310 --> 00:35:33,250 So to complete the picture, all the keys are 513 00:35:33,250 --> 00:35:40,060 drawn from a universe that has size u. 514 00:35:40,060 --> 00:35:42,420 And this u is usually pretty large. 515 00:35:42,420 --> 00:35:46,010 Let's say it's larger than m squared. 516 00:35:46,010 --> 00:35:48,750 It's larger than the square of my hash table size. 517 00:35:52,600 --> 00:36:05,090 But let me first start with a negative result. 518 00:36:05,090 --> 00:36:18,580 So if my hash function is deterministic, 519 00:36:18,580 --> 00:36:23,630 then there always exists a series of inputs 520 00:36:23,630 --> 00:36:25,350 that all map to the same bin. 521 00:36:31,630 --> 00:36:32,880 We call that worst case. 522 00:36:38,280 --> 00:36:39,720 We don't like the worst case. 523 00:36:39,720 --> 00:36:40,330 Why? 524 00:36:40,330 --> 00:36:42,538 Because in that case, the hash is not doing anything. 525 00:36:42,538 --> 00:36:45,730 We still have all of the items in the same list. 526 00:36:56,420 --> 00:36:58,220 Why is that lemma true? 
527 00:36:58,220 --> 00:37:03,050 Because by a very simple pigeonhole argument, 528 00:37:03,050 --> 00:37:08,730 so imagine I insert all of the keys in the universe 529 00:37:08,730 --> 00:37:09,544 into my hash table. 530 00:37:09,544 --> 00:37:10,960 I would never do that in practice. 531 00:37:10,960 --> 00:37:12,590 It's just a thought experiment. 532 00:37:12,590 --> 00:37:14,760 So by a simple pigeonhole argument, 533 00:37:14,760 --> 00:37:17,670 if u is greater than m squared, then 534 00:37:17,670 --> 00:37:23,020 at least some bin will contain more than m elements. 535 00:37:23,020 --> 00:37:26,910 Well, if it just so happens that my inputs are these m keys, 536 00:37:26,910 --> 00:37:30,180 then my hash will hash all of them to the same bin. 537 00:37:30,180 --> 00:37:32,980 Make sense? 538 00:37:32,980 --> 00:37:35,490 So this is the problem we're trying to solve. 539 00:37:35,490 --> 00:37:37,490 We don't want this worst case. 540 00:37:37,490 --> 00:37:40,010 And it does say that if h is deterministic, 541 00:37:40,010 --> 00:37:41,580 we cannot avoid that. 542 00:37:41,580 --> 00:37:43,190 There always exists a worst case. 543 00:37:43,190 --> 00:37:44,390 So what's the solution? 544 00:37:48,050 --> 00:37:52,696 Then the solution is to randomize h. 545 00:37:55,960 --> 00:38:01,420 However, I can't really randomize h. 546 00:38:01,420 --> 00:38:07,970 If h takes some key, if my hash function 547 00:38:07,970 --> 00:38:10,740 maps a key into a certain bin, well, 548 00:38:10,740 --> 00:38:12,410 the next time I call this hash function, 549 00:38:12,410 --> 00:38:13,990 it better give the same bin. 550 00:38:13,990 --> 00:38:17,900 Otherwise I cannot find that item. 551 00:38:17,900 --> 00:38:21,850 So h needs to be deterministic. 552 00:38:27,710 --> 00:38:34,400 So now our only choice is to pick a random h. 553 00:38:38,070 --> 00:38:38,840 Make sense? 
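[Editor's sketch] The negative result above is easy to see on a concrete hash. This is a minimal illustration that assumes a hypothetical fixed hash `key % m`, which is not a function from the lecture. Anyone who knows that function can pick m distinct keys that all land in one bin.

```python
def fixed_hash(key, m):
    # Hypothetical deterministic hash chosen in advance: key mod m.
    return key % m


def adversarial_keys(m, count, target_bin=0):
    # Every key congruent to target_bin modulo m goes to the same bin.
    return [target_bin + i * m for i in range(count)]


m = 8
keys = adversarial_keys(m, count=m)
bins = {fixed_hash(k, m) for k in keys}
print(len(keys), "distinct keys occupy", len(bins), "bin(s)")
```

Any other fixed function has such a key set as well, by the pigeonhole argument. For this particular hash the bad keys are just especially easy to write down.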
554 00:38:38,840 --> 00:38:40,570 Every hash function is deterministic, 555 00:38:40,570 --> 00:38:46,150 but we will pick a random one from a family 556 00:38:46,150 --> 00:38:47,000 of hash functions. 557 00:38:51,190 --> 00:38:53,320 So in some sense, this is cheating. 558 00:38:53,320 --> 00:38:53,950 Why? 559 00:38:53,950 --> 00:38:58,210 Because all I'm saying is I will not choose a hash function 560 00:38:58,210 --> 00:38:59,030 beforehand. 561 00:38:59,030 --> 00:39:05,070 I will wait for the user to insert inputs. 562 00:39:05,070 --> 00:39:08,221 If I have too many collisions, I'll choose another one. 563 00:39:08,221 --> 00:39:09,470 If I have too many collisions, 564 00:39:09,470 --> 00:39:10,470 I'll choose another one. 565 00:39:13,080 --> 00:39:13,600 OK. 566 00:39:13,600 --> 00:39:15,891 I think I forgot to mention one thing that's important. 567 00:39:15,891 --> 00:39:17,550 So you may ask why do I care? 568 00:39:17,550 --> 00:39:19,670 Why do I care about that worst case? 569 00:39:19,670 --> 00:39:23,190 What's the chance of it happening in practice? 570 00:39:23,190 --> 00:39:27,750 It's very low, but in algorithms, we really 571 00:39:27,750 --> 00:39:30,641 don't like making assumptions on inputs. 572 00:39:30,641 --> 00:39:31,140 Why? 573 00:39:31,140 --> 00:39:33,300 Because if you imagine you're running, say, 574 00:39:33,300 --> 00:39:37,910 a website, a web server, and your code has some hash table in it. 575 00:39:37,910 --> 00:39:41,370 So if your competitor, or someone who hates you, 576 00:39:41,370 --> 00:39:43,250 wants to put you out of business, 577 00:39:43,250 --> 00:39:45,330 and if he knows your hash function, 578 00:39:45,330 --> 00:39:48,090 he can create a worst case input. 579 00:39:48,090 --> 00:39:51,100 That will make your website infinitely slow. 580 00:39:51,100 --> 00:39:54,340 So what we are saying here is I don't tell him 581 00:39:54,340 --> 00:39:56,280 what hash function I'll use. 
582 00:39:56,280 --> 00:39:57,780 I'll say I choose one. 583 00:39:57,780 --> 00:40:01,340 If he figures out the wrong input, the worst case input, 584 00:40:01,340 --> 00:40:06,311 I'm going to change my hash function and use another one. 585 00:40:06,311 --> 00:40:06,810 Make sense? 586 00:40:23,740 --> 00:40:33,050 Now the definition of universal hash function 587 00:40:33,050 --> 00:40:38,920 is that if I pick a random h from my universal hash function 588 00:40:38,920 --> 00:40:48,610 family, the probability that any key i maps to the same bin 589 00:40:48,610 --> 00:40:54,380 as any other key j should be less than or equal to 1 590 00:40:54,380 --> 00:40:56,440 over m, where m is my hash table size. 591 00:40:56,440 --> 00:40:57,940 This is really the best you can get. 592 00:40:57,940 --> 00:41:01,202 If the hash function is really evenly distributing things, 593 00:41:01,202 --> 00:41:02,410 you should get this property. 594 00:41:07,970 --> 00:41:13,700 So we have seen one universal hash function in the class. 595 00:41:13,700 --> 00:41:16,030 I'll just go over the other example, 596 00:41:16,030 --> 00:41:27,470 which is ak plus b modulo p, and then modulo m. 597 00:41:27,470 --> 00:41:33,760 So p is a prime number that is greater than the universe size. 598 00:41:36,760 --> 00:41:38,760 We'll see why this is a universal hash function. 599 00:41:41,730 --> 00:41:44,490 So to do that, we just need to analyze the collision 600 00:41:44,490 --> 00:41:45,690 probability. 601 00:41:45,690 --> 00:41:49,550 So if I have two keys, k1 and k2, that 602 00:41:49,550 --> 00:41:59,780 map to the same bin, that means they must have this property. 603 00:41:59,780 --> 00:42:02,650 After taking the mod p, their difference 604 00:42:02,650 --> 00:42:05,600 should be a multiple of m. 605 00:42:05,600 --> 00:42:09,540 Because if this is true after taking the modulo m, 606 00:42:09,540 --> 00:42:12,410 they will map to the same bin. 607 00:42:12,410 --> 00:42:14,620 Make sense? 
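[Editor's sketch] The family ((ak + b) mod p) mod m can be checked by brute force on a toy instance. This is a minimal sketch under assumptions not stated in the lecture: keys are the integers 0 to 15, p = 17, m = 5, a ranges over 1 to p - 1, and b over 0 to p - 1. The helper names are hypothetical. For every pair of distinct keys it counts the (a, b) choices that collide and compares the fraction with 1/m.

```python
import random
from itertools import combinations


def make_hash(a, b, p, m):
    # One member of the family: h(k) = ((a*k + b) mod p) mod m.
    return lambda k: ((a * k + b) % p) % m


def random_hash(p, m, rng=random):
    # Picking a random h means picking random parameters a and b.
    return make_hash(rng.randrange(1, p), rng.randrange(0, p), p, m)


def collision_probability(k1, k2, p, m):
    # Exact fraction of (a, b) pairs for which k1 and k2 share a bin.
    bad = 0
    for a in range(1, p):
        for b in range(p):
            h = make_hash(a, b, p, m)
            if h(k1) == h(k2):
                bad += 1
    return bad / ((p - 1) * p)


p, m = 17, 5          # assumed toy parameters: p is prime and exceeds every key
universe = range(16)  # assumed toy universe
worst = max(collision_probability(x, y, p, m)
            for x, y in combinations(universe, 2))
print("worst pairwise collision probability within 1/m:", worst <= 1 / m)
```

Each individual `h` is still deterministic, so lookups stay consistent. Only the choice of `a` and `b` is random.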
608 00:42:14,620 --> 00:42:22,510 Now I can quickly write it as a times the difference of the keys 609 00:42:22,510 --> 00:42:27,920 equals a multiple of m, mod p. 610 00:42:27,920 --> 00:42:33,620 Now, k1 and k2 are not equal, so their difference is nonzero. 611 00:42:33,620 --> 00:42:36,820 And in this group, based on some number theory, 612 00:42:36,820 --> 00:42:39,050 we have an inverse element for it. 613 00:42:39,050 --> 00:42:43,670 So, if this happens, we'll call it a bad a. 614 00:42:43,670 --> 00:42:45,220 How many bad a's do I have? 615 00:42:47,785 --> 00:42:52,510 One a will make this equation hold with i equals 1. 616 00:42:52,510 --> 00:42:56,090 Another a makes the equation hold with i equals 2. 617 00:42:56,090 --> 00:42:59,120 But how many such a's do I have? 618 00:42:59,120 --> 00:43:04,950 At most, because this equation can hold with m, 2m, 3m, 619 00:43:04,950 --> 00:43:11,950 all the way to the floor of p over m, times m. 620 00:43:11,950 --> 00:43:14,080 This is the total number of possible ways 621 00:43:14,080 --> 00:43:16,270 this equation can hold. 622 00:43:16,270 --> 00:43:18,790 So how many bad a's do I have? 623 00:43:18,790 --> 00:43:25,920 I have p over m, over the total number 624 00:43:25,920 --> 00:43:27,610 of a's, which is p minus 1. 625 00:43:30,272 --> 00:43:31,730 Oh, yeah, I forgot to mention that. 626 00:43:31,730 --> 00:43:39,510 So a is from 1 to p minus 1. 627 00:43:44,240 --> 00:43:44,900 OK. 628 00:43:44,900 --> 00:43:49,800 So I can always choose my p to be not a multiple of m. 629 00:43:49,800 --> 00:44:00,600 If I do that, this floor-- so, then p and p minus 1 630 00:44:00,600 --> 00:44:03,070 do not cross the boundary of modulo m. 631 00:44:03,070 --> 00:44:11,400 Then this is true, and this is less than 1 over m. 632 00:44:11,400 --> 00:44:14,827 So this is a universal hash function family. 633 00:44:20,440 --> 00:44:21,840 So what's the randomness here? 634 00:44:21,840 --> 00:44:23,530 The randomness is a. 
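[Editor's sketch] The last arithmetic step can be written out as follows. This only restates the count given in the lecture (at most floor(p/m) bad values of a out of p - 1 choices) and is not a full proof. The subscripted name h_a is notation assumed here for the hash function with parameter a.

```latex
\[
\Pr_{a}\bigl[h_a(k_1) = h_a(k_2)\bigr]
  \;\le\; \frac{\lfloor p/m \rfloor}{p-1}
  \;\le\; \frac{(p-1)/m}{p-1}
  \;=\; \frac{1}{m}.
\]
% The middle step uses that m does not divide p, so
% \lfloor p/m \rfloor = \lfloor (p-1)/m \rfloor \le (p-1)/m.
```

The usual textbook proof for this family randomizes over both a and b and reaches the same 1/m bound.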
635 00:44:23,530 --> 00:44:27,440 I'll pick an a to get one of my hash functions, and if it doesn't work, 636 00:44:27,440 --> 00:44:28,205 I pick another a. 637 00:44:31,456 --> 00:44:32,854 AUDIENCE: What is b? 638 00:44:32,854 --> 00:44:33,790 What is b? 639 00:44:33,790 --> 00:44:37,050 LING REN: p is a prime number I choose-- 640 00:44:37,050 --> 00:44:38,070 AUDIENCE: [INAUDIBLE] 641 00:44:38,070 --> 00:44:38,570 LING REN: b? 642 00:44:38,570 --> 00:44:39,290 AUDIENCE: Yeah. 643 00:44:39,290 --> 00:44:41,424 LING REN: Oh, b. 644 00:44:41,424 --> 00:44:42,840 I think it's also a random number. 645 00:44:45,550 --> 00:44:47,810 Yeah, so, actually it's not needed, but I think 646 00:44:47,810 --> 00:44:49,560 there's some deep reason that they keep it 647 00:44:49,560 --> 00:44:51,280 in the hash function. 648 00:44:51,280 --> 00:44:52,000 I'm not sure why. 649 00:45:04,380 --> 00:45:09,510 Now once we have that, once we have universal hash, 650 00:45:09,510 --> 00:45:13,990 people also want perfect hashing, 651 00:45:13,990 --> 00:45:18,550 which means I want absolutely zero collisions. 652 00:45:21,730 --> 00:45:22,820 So how do I do that? 653 00:45:22,820 --> 00:45:28,540 Let me first give a method 1. 654 00:45:32,570 --> 00:45:36,230 I'll just use any universal hash function, 655 00:45:36,230 --> 00:45:38,590 but I choose my m to be n squared. 656 00:45:43,940 --> 00:45:46,020 I claim this is a perfect hash function 657 00:45:46,020 --> 00:45:48,090 with certain probability. 658 00:45:48,090 --> 00:45:48,660 Why? 659 00:45:48,660 --> 00:45:52,100 Because I want to calculate the probability of no collision. 660 00:46:03,230 --> 00:46:06,170 Yeah, 1 minus probability I do have a collision. 661 00:46:09,740 --> 00:46:12,180 And I can use a union bound. 662 00:46:12,180 --> 00:46:20,215 That's the probability that any pair has a collision. 663 00:46:25,570 --> 00:46:29,290 Any pair with h(x) equals h(y). 664 00:46:33,540 --> 00:46:34,604 How many pairs do I have? 
665 00:46:37,852 --> 00:46:39,244 AUDIENCE: N choose 2. 666 00:46:39,244 --> 00:46:43,560 LING REN: Yeah. n choose 2, which is this number. 667 00:46:48,060 --> 00:46:50,040 So if it's a universal hash function, 668 00:46:50,040 --> 00:46:53,040 then any collision, any two colliding, 669 00:46:53,040 --> 00:46:55,200 the probability is 1 over m. 670 00:46:58,560 --> 00:47:00,610 So I choose my m to be n squared, 671 00:47:00,610 --> 00:47:02,330 so this one is larger than 1/2. 672 00:47:05,300 --> 00:47:08,970 So what I'm saying, to get a perfect hash function, 673 00:47:08,970 --> 00:47:10,640 I'll just use the simplest way. 674 00:47:10,640 --> 00:47:13,180 I select the universal hash function with m 675 00:47:13,180 --> 00:47:14,950 equals n squared. 676 00:47:14,950 --> 00:47:18,130 I have a probability more than 1/2 to succeed. 677 00:47:18,130 --> 00:47:20,820 Or if I don't succeed, I'll choose another one 678 00:47:20,820 --> 00:47:24,350 until I succeed. 679 00:47:24,350 --> 00:47:26,540 So this is a randomized algorithm, 680 00:47:26,540 --> 00:47:30,030 and we can make it a Monte Carlo algorithm or Las Vegas 681 00:47:30,030 --> 00:47:30,910 algorithm. 682 00:47:30,910 --> 00:47:38,830 So I can either say if I choose alpha log n times, 683 00:47:38,830 --> 00:47:41,580 then what's the chance that none of my choices 684 00:47:41,580 --> 00:47:42,970 satisfies perfect hashing? 685 00:47:45,820 --> 00:47:48,980 My failure probability is less than this. 686 00:47:56,210 --> 00:48:01,000 Each time I have a half success rate, 687 00:48:01,000 --> 00:48:03,600 and I try this many times, what's 688 00:48:03,600 --> 00:48:05,107 the chance of all of them failing? 689 00:48:08,310 --> 00:48:09,820 This is 1 over n raised to alpha. 690 00:48:15,601 --> 00:48:19,260 Of course, I can also say, I'll keep trying until I succeed. 
691 00:48:19,260 --> 00:48:22,360 Then I have a 100 percent success rate, 692 00:48:22,360 --> 00:48:26,960 but my runtime could potentially go unbounded. 693 00:48:26,960 --> 00:48:29,180 Make sense? 694 00:48:29,180 --> 00:48:29,890 OK. 695 00:48:29,890 --> 00:48:32,700 This sounds like a perfect solution. 696 00:48:32,700 --> 00:48:38,000 The only problem is that the space complexity of this method 697 00:48:38,000 --> 00:48:43,480 is n squared, because I choose my hash table 698 00:48:43,480 --> 00:48:45,474 size m to be n squared. 699 00:48:45,474 --> 00:48:46,890 So this is the only thing we don't 700 00:48:46,890 --> 00:48:50,140 want in this simple method. 701 00:48:59,370 --> 00:49:05,970 Our final goal is to have a perfect hash function that 702 00:49:05,970 --> 00:49:15,910 has space O of n, and also runtime some polynomial in n, 703 00:49:15,910 --> 00:49:19,230 and failure probability arbitrarily small. 704 00:49:19,230 --> 00:49:23,283 And the idea there is this two-level hashing. 705 00:49:31,180 --> 00:49:37,470 So, I choose h1 first to hash my keys into bins. 706 00:49:37,470 --> 00:49:41,170 And for each bin, say I get l1 elements here, l2 elements 707 00:49:41,170 --> 00:49:44,280 here, so on and so forth. 708 00:49:44,280 --> 00:49:46,825 I'll choose each of the bins to be 709 00:49:46,825 --> 00:49:50,630 a second level perfect hashing. 710 00:49:50,630 --> 00:49:55,290 So we can use the method one to choose this small one. 711 00:49:55,290 --> 00:49:59,580 If I choose m1, which is the hash table size of this guy, 712 00:49:59,580 --> 00:50:04,560 to be l1 squared, then I know after alpha log n trials, 713 00:50:04,560 --> 00:50:07,580 this one should be a perfect hashing. 714 00:50:07,580 --> 00:50:10,090 After another alpha log n trials, I 715 00:50:10,090 --> 00:50:12,220 should resolve all the conflicts in l2 716 00:50:12,220 --> 00:50:15,206 to make it a perfect hashing. 717 00:50:15,206 --> 00:50:16,194 Make sense? 
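[Editor's sketch] Method 1, which the second level reuses, can be sketched as a retry loop. This is a minimal Las Vegas version under assumptions not fixed by the lecture: keys are distinct non-negative integers below an assumed prime p = 101, and the hash is drawn from the ((ak + b) mod p) mod m family with random a and b. The name `perfect_hash_quadratic` is hypothetical.

```python
import random


def random_hash(p, m, rng):
    # One random member of the family ((a*k + b) mod p) mod m.
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m


def perfect_hash_quadratic(keys, p, rng):
    """Retry random hashes into m = n^2 bins until no two keys collide."""
    m = max(1, len(keys) ** 2)
    tries = 0
    while True:
        tries += 1
        h = random_hash(p, m, rng)
        # Perfect means the n keys occupy n different bins.
        if len({h(k) for k in keys}) == len(keys):
            return h, m, tries


rng = random.Random(0)
keys = [3, 14, 15, 92, 65, 35, 89, 79]
h, m, tries = perfect_hash_quadratic(keys, p=101, rng=rng)
print("table size:", m, "attempts:", tries)
```

Since each attempt succeeds with probability greater than 1/2 by the union bound, the expected number of attempts is below 2. Capping the loop at alpha log n attempts would give the Monte Carlo variant instead.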
718 00:50:19,160 --> 00:50:27,410 So after n log n trials, I will resolve all the conflicts 719 00:50:27,410 --> 00:50:28,780 in my second level hashing. 720 00:50:28,780 --> 00:50:29,280 Question? 721 00:50:29,280 --> 00:50:30,200 AUDIENCE: It was mentioned in the lecture 722 00:50:30,200 --> 00:50:33,040 that this only works if there are no inserts or deletes, 723 00:50:33,040 --> 00:50:34,040 or something like that? 724 00:50:37,970 --> 00:50:39,720 LING REN: Let me think about that offline. 725 00:50:39,720 --> 00:50:40,719 I'm not sure about that. 726 00:50:43,461 --> 00:50:43,960 OK. 727 00:50:43,960 --> 00:50:47,150 So the only remaining problem is we 728 00:50:47,150 --> 00:50:50,270 need to figure out whether we achieve this space O of n. 729 00:50:50,270 --> 00:50:53,210 What is the space complexity of this algorithm? 730 00:50:53,210 --> 00:51:04,310 It's n plus the sum of l i squared, because each table size is the square 731 00:51:04,310 --> 00:51:06,330 of the elements in it. 732 00:51:06,330 --> 00:51:10,070 And finally, we have that Markov inequality 733 00:51:10,070 --> 00:51:12,040 or I think something like that, to prove 734 00:51:12,040 --> 00:51:21,240 this is the case with-- so my space is O of n, 735 00:51:21,240 --> 00:51:25,650 also with probability greater than 1/2. 736 00:51:25,650 --> 00:51:27,150 I can keep going. 737 00:51:27,150 --> 00:51:32,960 I'll try alpha log n times on my first level hash function, 738 00:51:32,960 --> 00:51:35,960 until my space is O of n. 739 00:51:35,960 --> 00:51:38,380 Once I get to that point, I'll try 740 00:51:38,380 --> 00:51:40,440 choosing universal hash functions for my smaller 741 00:51:40,440 --> 00:51:43,625 tables, until I succeed. 742 00:51:43,625 --> 00:51:44,125 OK? 743 00:51:51,010 --> 00:51:54,536 That's it for hashing and DP.
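[Editor's sketch] The two-level scheme can be put together as follows. This is a minimal sketch under stated assumptions: keys are distinct non-negative integers below an assumed prime p, the hash family is ((ak + b) mod p) mod m with random a and b, and the first level is retried until the sum of squared bin sizes is at most 4n. The constant 4 is an assumption taken from Markov's inequality (the expected sum is below 2n when m = n), not a number given in the lecture. The names `build_two_level` and `contains` are hypothetical.

```python
import random


def random_hash(p, m, rng):
    # One random member of the family ((a*k + b) mod p) mod m.
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return lambda k: ((a * k + b) % p) % m


def build_two_level(keys, p, rng):
    n = max(1, len(keys))
    # First level: retry h1 until the second-level space is linear in n.
    while True:
        h1 = random_hash(p, n, rng)
        bins = [[] for _ in range(n)]
        for k in keys:
            bins[h1(k)].append(k)
        if sum(len(b) ** 2 for b in bins) <= 4 * n:
            break
    # Second level: Method 1 inside each bin, with table size l_i squared.
    tables = []
    for b in bins:
        size = len(b) ** 2
        while True:
            h2 = random_hash(p, size, rng) if size else None
            slots = [None] * size
            collision = False
            for k in b:
                i = h2(k)
                if slots[i] is not None:
                    collision = True
                    break
                slots[i] = k
            if not collision:
                tables.append((h2, slots))
                break
    return h1, tables


def contains(structure, key):
    # Two hash evaluations and one comparison, with no chains to walk.
    h1, tables = structure
    h2, slots = tables[h1(key)]
    return bool(slots) and slots[h2(key)] == key


rng = random.Random(1)
keys = rng.sample(range(1000), 50)
structure = build_two_level(keys, p=1009, rng=rng)
print(all(contains(structure, k) for k in keys))
```

The structure is built once for a fixed key set. This sketch does not support inserting or deleting keys after construction.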