1 00:00:00,050 --> 00:00:01,770 The following content is provided 2 00:00:01,770 --> 00:00:04,010 under a Creative Commons license. 3 00:00:04,010 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 4 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,340 To make a donation or view additional materials 6 00:00:13,340 --> 00:00:17,207 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,207 --> 00:00:17,832 at ocw.mit.edu. 8 00:00:21,155 --> 00:00:22,030 PROFESSOR: All right. 9 00:00:22,030 --> 00:00:23,220 So I brought a few problems. 10 00:00:23,220 --> 00:00:26,950 They're obviously not the quiz problems, though some of them 11 00:00:26,950 --> 00:00:29,280 are supposed to be similar. 12 00:00:29,280 --> 00:00:30,880 What I have here might not be what 13 00:00:30,880 --> 00:00:33,260 you have on the quiz because we might drop quiz problems 14 00:00:33,260 --> 00:00:35,280 or because some of them are just meant to make 15 00:00:35,280 --> 00:00:39,310 you think and not to give away the solutions to the quiz. 16 00:00:39,310 --> 00:00:41,490 Now, before we get started on this, 17 00:00:41,490 --> 00:00:43,814 do you guys have any burning questions or any concepts 18 00:00:43,814 --> 00:00:44,730 that you want covered? 19 00:00:44,730 --> 00:00:46,800 Based on that, I'll select which problems we do. 20 00:00:46,800 --> 00:00:47,727 Yes? 21 00:00:47,727 --> 00:00:50,288 AUDIENCE: This actually relates not too much to the Pset. 22 00:00:53,040 --> 00:00:55,725 If you're looking at the time complexity 23 00:00:55,725 --> 00:01:02,360 to maybe transfer something from one table to another, 24 00:01:02,360 --> 00:01:04,164 it takes a lot more time, I would 25 00:01:04,164 --> 00:01:06,980 assume, to move the actual item to the new table 26 00:01:06,980 --> 00:01:09,250 than it does just to look at your point and be like, 27 00:01:09,250 --> 00:01:10,680 oh, there's nothing there. 28 00:01:10,680 --> 00:01:12,187 So if you were just going to look 29 00:01:12,187 --> 00:01:14,330 through an empty table of size m, 30 00:01:14,330 --> 00:01:17,325 the time to look through that empty table, I'm assuming, 31 00:01:17,325 --> 00:01:21,210 is much less than the time to actually move an item. 32 00:01:21,210 --> 00:01:28,650 PROFESSOR: So you're saying we have m things here. 33 00:01:28,650 --> 00:01:29,350 AUDIENCE: Yes. 34 00:01:29,350 --> 00:01:33,050 PROFESSOR: Some might be nil, and some 35 00:01:33,050 --> 00:01:36,260 might have stuff in them, and you're 36 00:01:36,260 --> 00:01:41,166 going to resize that to presumably 2 times m, 37 00:01:41,166 --> 00:01:42,540 and the way you do that is you're 38 00:01:42,540 --> 00:01:45,730 going to move the elements, presumably by rehashing them, 39 00:01:45,730 --> 00:01:46,411 right? 40 00:01:46,411 --> 00:01:46,994 AUDIENCE: Yes. 41 00:01:50,170 --> 00:01:53,560 PROFESSOR: So these elements, at least when we use Python, 42 00:01:53,560 --> 00:01:57,060 we don't really store big elements anywhere. 43 00:01:57,060 --> 00:01:59,000 If you have a big object, we always 44 00:01:59,000 --> 00:02:01,270 work with references to that object. 45 00:02:01,270 --> 00:02:03,280 So you remember the address where 46 00:02:03,280 --> 00:02:06,510 that object lies in memory, and since the memory is 47 00:02:06,510 --> 00:02:10,110 finite and small, addresses are all from 0 to a small number, 48 00:02:10,110 --> 00:02:12,650 so they're constant. 49 00:02:12,650 --> 00:02:15,990 So what you have here is not a big object. 50 00:02:15,990 --> 00:02:17,800 It's the address of the big object, 51 00:02:17,800 --> 00:02:21,470 so moving is always constant time. 52 00:02:21,470 --> 00:02:23,750 AUDIENCE: What I'm saying is let's 53 00:02:23,750 --> 00:02:26,230 say that the table is completely full 54 00:02:26,230 --> 00:02:28,280 versus completely empty table. 55 00:02:28,280 --> 00:02:32,040 It would take more time to move everything out 56 00:02:32,040 --> 00:02:34,510 of the full table than it does just 57 00:02:34,510 --> 00:02:36,515 to look the empty table, right? 58 00:02:36,515 --> 00:02:37,390 PROFESSOR: Let's see. 59 00:02:37,390 --> 00:02:42,300 So writing something here is order 1 time, right? 60 00:02:42,300 --> 00:02:44,350 So moving is order 1 time. 61 00:02:44,350 --> 00:02:47,130 Moving one element is order 1 time. 62 00:02:47,130 --> 00:02:50,242 What's accessing an element in a table in a list? 63 00:02:50,242 --> 00:02:51,200 You have a Python list. 64 00:02:51,200 --> 00:02:53,282 What's the cost of doing an index access? 65 00:02:53,282 --> 00:02:54,740 AUDIENCE: It's also order 1, right? 66 00:02:54,740 --> 00:02:55,790 PROFESSOR: OK. 67 00:02:55,790 --> 00:02:59,890 So order 1, index. 68 00:02:59,890 --> 00:03:02,727 Order 1, move. 69 00:03:02,727 --> 00:03:04,060 Suppose you have an empty table. 70 00:03:04,060 --> 00:03:05,680 How many indices do you do? 71 00:03:05,680 --> 00:03:07,915 How many times the index? 72 00:03:07,915 --> 00:03:09,540 AUDIENCE: You look at each one, so it's 73 00:03:09,540 --> 00:03:12,425 order 1 times the length of the table. 74 00:03:12,425 --> 00:03:14,890 PROFESSOR: m. 75 00:03:14,890 --> 00:03:17,620 So if the table is empty, you have order m indices 76 00:03:17,620 --> 00:03:19,740 and 0 moves. 77 00:03:19,740 --> 00:03:22,970 Total running time, order m. 78 00:03:22,970 --> 00:03:26,070 If you have a full table, how many times 79 00:03:26,070 --> 00:03:27,412 do you index in the table? 80 00:03:27,412 --> 00:03:28,437 AUDIENCE: Still order m. 81 00:03:28,437 --> 00:03:29,020 PROFESSOR: OK. 82 00:03:29,020 --> 00:03:31,330 How many times do you move stuff? 83 00:03:31,330 --> 00:03:34,365 AUDIENCE: Order m. 84 00:03:34,365 --> 00:03:35,615 PROFESSOR: Total running time? 85 00:03:35,615 --> 00:03:38,835 AUDIENCE: It's order 2m, which is order m. 86 00:03:38,835 --> 00:03:40,960 PROFESSOR: So it doesn't matter whether the table's 87 00:03:40,960 --> 00:03:42,198 full or empty. 88 00:03:42,198 --> 00:03:42,739 AUDIENCE: OK. 89 00:03:42,739 --> 00:03:44,160 Just wanted to confirm that. 90 00:03:44,160 --> 00:03:45,784 PROFESSOR: And this is how you do that. 91 00:03:47,940 --> 00:03:48,460 Cool. 92 00:03:48,460 --> 00:03:49,022 Thanks. 93 00:03:49,022 --> 00:03:49,855 Any other questions? 94 00:04:00,410 --> 00:04:04,480 Then we will go over problems in the order in which I like them, 95 00:04:04,480 --> 00:04:06,110 which is easiest to hardest so that I 96 00:04:06,110 --> 00:04:07,690 don't have to explain the hard ones. 97 00:04:14,710 --> 00:04:16,570 Warm up problem one. 98 00:04:24,800 --> 00:04:27,720 So you have this recursion and you have to solve it, 99 00:04:27,720 --> 00:04:40,350 and you get a hint that n to the power of 1 over log n 100 00:04:40,350 --> 00:04:45,540 is 2, which is theta 1. 101 00:04:45,540 --> 00:04:47,130 So based on the hint, you can see 102 00:04:47,130 --> 00:04:50,826 that it's going to involve some math. 103 00:04:50,826 --> 00:04:53,700 It's going to get a bit ugly. 104 00:04:53,700 --> 00:04:55,260 So how do we solve recursions? 105 00:04:55,260 --> 00:04:56,110 Two methods. 106 00:04:56,110 --> 00:04:57,514 What are they? 107 00:04:57,514 --> 00:04:59,927 AUDIENCE: Expand. 108 00:04:59,927 --> 00:05:00,510 PROFESSOR: OK. 109 00:05:00,510 --> 00:05:02,951 Substitution formally, but basically, we 110 00:05:02,951 --> 00:05:04,450 expand this guy over and over again. 111 00:05:04,450 --> 00:05:06,084 And? 112 00:05:06,084 --> 00:05:07,340 AUDIENCE: Trees. 113 00:05:07,340 --> 00:05:09,970 PROFESSOR: Recursion trees. 114 00:05:09,970 --> 00:05:11,610 Which one do guys want to do first? 115 00:05:15,080 --> 00:05:17,780 If you only have one t here, anything 116 00:05:17,780 --> 00:05:22,340 works because you can keep expanding it and that works, 117 00:05:22,340 --> 00:05:24,317 so we can do either method. 118 00:05:24,317 --> 00:05:25,900 Which one do you guys want to go over? 119 00:05:28,580 --> 00:05:29,740 Trees. 120 00:05:29,740 --> 00:05:31,100 OK. 121 00:05:31,100 --> 00:05:36,020 So we start with the first node. 122 00:05:36,020 --> 00:05:39,292 The size of the problem is n. 123 00:05:39,292 --> 00:05:41,732 What's the cost inside here? 124 00:05:45,148 --> 00:05:46,130 AUDIENCE: 1. 125 00:05:46,130 --> 00:05:47,550 PROFESSOR: OK. 126 00:05:47,550 --> 00:05:50,620 So this creates one sub-problem. 127 00:05:50,620 --> 00:05:51,483 What's the size? 128 00:05:54,198 --> 00:05:56,170 AUDIENCE: n to the 1/2. 129 00:05:56,170 --> 00:05:58,130 PROFESSOR: OK. 130 00:05:58,130 --> 00:06:00,530 Square root of n equals n to the 1/2. 131 00:06:00,530 --> 00:06:01,774 You solved it already. 132 00:06:01,774 --> 00:06:02,440 What's the cost? 133 00:06:05,279 --> 00:06:05,779 AUDIENCE: 1. 134 00:06:08,641 --> 00:06:10,660 PROFESSOR: Do people remember this? 135 00:06:10,660 --> 00:06:14,980 Is anyone confused about what's going on here? 136 00:06:14,980 --> 00:06:15,760 OK. 137 00:06:15,760 --> 00:06:19,710 So two terms, something involving t and something 138 00:06:19,710 --> 00:06:21,250 not involving t. 139 00:06:21,250 --> 00:06:24,110 The thing involving t is what we want to get rid of. 140 00:06:24,110 --> 00:06:29,870 When we do our recursion tree, whatever is in here 141 00:06:29,870 --> 00:06:34,030 goes inside here, and this tells me 142 00:06:34,030 --> 00:06:37,920 how this number relates to this number. 143 00:06:37,920 --> 00:06:41,250 So when I go from one level to the next level, 144 00:06:41,250 --> 00:06:42,550 this is the transformation. 145 00:06:42,550 --> 00:06:48,406 n and becomes square root of n, so the transformation here 146 00:06:48,406 --> 00:06:50,030 is the same as the transformation here. 147 00:06:57,880 --> 00:07:00,133 What's next? 148 00:07:00,133 --> 00:07:02,152 AUDIENCE: n to the 1/4. 149 00:07:02,152 --> 00:07:02,735 PROFESSOR: OK. 150 00:07:05,500 --> 00:07:07,828 Cost? 151 00:07:07,828 --> 00:07:09,320 AUDIENCE: 1. 152 00:07:09,320 --> 00:07:10,000 PROFESSOR: OK. 153 00:07:10,000 --> 00:07:12,350 Do we need to do one more or do people see the pattern? 154 00:07:16,090 --> 00:07:17,340 Silence means one more. 155 00:07:17,340 --> 00:07:21,240 If you guys don't speak, we're going to go slow. 156 00:07:21,240 --> 00:07:21,790 What's here? 157 00:07:25,213 --> 00:07:29,125 AUDIENCE: n to the 1/8. 158 00:07:29,125 --> 00:07:31,559 PROFESSOR: What's here? 159 00:07:31,559 --> 00:07:32,059 AUDIENCE: 1. 160 00:07:35,200 --> 00:07:37,170 PROFESSOR: Let's hope everyone saw the pattern, 161 00:07:37,170 --> 00:07:40,590 and suppose we've done this for l levels, 162 00:07:40,590 --> 00:07:43,330 so we're at the bottom. 163 00:07:43,330 --> 00:07:44,950 What should the cost be at the bottom? 164 00:07:47,436 --> 00:07:48,352 AUDIENCE: [INAUDIBLE]. 165 00:07:51,412 --> 00:07:52,120 PROFESSOR: Sorry. 166 00:07:52,120 --> 00:07:53,230 We don't start with the cost. 167 00:07:53,230 --> 00:07:55,438 What should the size of the problem be at the bottom? 168 00:08:00,978 --> 00:08:03,448 AUDIENCE: n to the 1 over 2 to the i. 169 00:08:11,015 --> 00:08:12,390 PROFESSOR: Let's say that this is 170 00:08:12,390 --> 00:08:15,080 level h, where h is the height of the tree. 171 00:08:17,871 --> 00:08:20,204 AUDIENCE: Don't you want to do something like n to the 1 172 00:08:20,204 --> 00:08:20,936 over n? 173 00:08:25,120 --> 00:08:26,420 PROFESSOR: Yeah. 174 00:08:26,420 --> 00:08:30,876 OK, so we want something that looks like that? 175 00:08:30,876 --> 00:08:33,300 AUDIENCE: If the recursion tree is height i, 176 00:08:33,300 --> 00:08:35,299 it is n to the 1 over 2 to the i, but 2 to the i 177 00:08:35,299 --> 00:08:37,267 should equal n, or approximately n. 178 00:08:40,835 --> 00:08:41,459 PROFESSOR: Why? 179 00:08:46,960 --> 00:08:48,520 I like that, but why? 180 00:08:48,520 --> 00:08:54,550 AUDIENCE: Because you need to go down 181 00:08:54,550 --> 00:08:57,690 until you're only looking at one element, 182 00:08:57,690 --> 00:09:02,067 and that would be one nth of the problem. 183 00:09:02,067 --> 00:09:02,650 PROFESSOR: OK. 184 00:09:02,650 --> 00:09:05,968 So we want this guy to look like what? 185 00:09:15,450 --> 00:09:17,990 In fact, it doesn't exactly have to look like 1, but what's 186 00:09:17,990 --> 00:09:22,417 the advantage if we manage to get this guy to look like 1? 187 00:09:22,417 --> 00:09:23,250 We have a recursion. 188 00:09:23,250 --> 00:09:25,940 We don't have a base case here, right? 189 00:09:25,940 --> 00:09:34,780 A reasonable base case is T of 1 is theta 1. 190 00:09:34,780 --> 00:09:37,800 Whatever function that is, if you evaluate it at 1, 191 00:09:37,800 --> 00:09:41,640 you're going to get a constant, so you can say that. 192 00:09:41,640 --> 00:09:45,376 Now, at the same time I can say that for any constant, 193 00:09:45,376 --> 00:09:50,660 c, T of c is theta 1. 194 00:09:53,180 --> 00:09:59,410 So if I take this constant here, which happens to be 2, 195 00:09:59,410 --> 00:10:01,100 but that's not to worry about that. 196 00:10:01,100 --> 00:10:04,320 If I take this guy here, I can put it in here. 197 00:10:08,350 --> 00:10:12,560 And I know that this guy here equals this guy here. 198 00:10:12,560 --> 00:10:17,210 So if I can make this guy here look like this guy here, 199 00:10:17,210 --> 00:10:20,500 then I'm done. 200 00:10:20,500 --> 00:10:21,015 Make sense? 201 00:10:23,780 --> 00:10:25,740 If it makes sense, everyone should nod so 202 00:10:25,740 --> 00:10:30,720 that I know and I can go forward, or smile or something. 203 00:10:30,720 --> 00:10:32,250 So this should look like 1. 204 00:10:32,250 --> 00:10:35,240 This is order 1 if not 1. 205 00:10:35,240 --> 00:10:39,250 Let's make it order 1, because it's 2 in this case. 206 00:10:39,250 --> 00:10:42,495 What's the cost here? 207 00:10:42,495 --> 00:10:43,410 AUDIENCE: 1. 208 00:10:43,410 --> 00:10:44,230 PROFESSOR: 1. 209 00:10:44,230 --> 00:10:47,130 Everything inside the bubbles is order of already, 210 00:10:47,130 --> 00:10:48,913 so I don't need to write an order of. 211 00:10:52,380 --> 00:10:55,120 What do we do next? 212 00:10:55,120 --> 00:10:58,030 AUDIENCE: Solve the [INAUDIBLE] equation. 213 00:10:58,030 --> 00:10:59,485 2 to the [INAUDIBLE]. 214 00:11:04,330 --> 00:11:06,825 PROFESSOR: You're skipping one step. 215 00:11:06,825 --> 00:11:09,200 That's exactly what you do when you have the substitution 216 00:11:09,200 --> 00:11:09,580 method. 217 00:11:09,580 --> 00:11:10,710 You're going to get to something and you 218 00:11:10,710 --> 00:11:11,834 need to solve the equation. 219 00:11:11,834 --> 00:11:14,160 But for the tree, there are two steps. 220 00:11:14,160 --> 00:11:15,990 So we need to add up all the costs here 221 00:11:15,990 --> 00:11:18,220 and that's the total cost here. 222 00:11:18,220 --> 00:11:24,681 In order to do that, first, we sum up over each level. 223 00:11:24,681 --> 00:11:26,180 And in this case, it's really simple 224 00:11:26,180 --> 00:11:27,846 because there's only one node per level, 225 00:11:27,846 --> 00:11:29,690 but if you have multiple nodes per level, 226 00:11:29,690 --> 00:11:31,600 you have to sum up for each level, 227 00:11:31,600 --> 00:11:33,570 and then you have to do a big sum. 228 00:11:33,570 --> 00:11:37,441 What's the sum for this level? 229 00:11:37,441 --> 00:11:37,940 1. 230 00:11:40,920 --> 00:11:41,610 Come on, guys. 231 00:11:41,610 --> 00:11:43,834 You're scaring me. 232 00:11:43,834 --> 00:11:45,325 AUDIENCE: 1. 233 00:11:45,325 --> 00:11:46,319 AUDIENCE: 1. 234 00:11:46,319 --> 00:11:47,320 AUDIENCE: It's all 1. 235 00:11:47,320 --> 00:11:48,564 PROFESSOR: Excellent. 236 00:11:48,564 --> 00:11:49,980 So the only thing that I'm missing 237 00:11:49,980 --> 00:11:53,070 is to know how many levels I have, because the sum is going 238 00:11:53,070 --> 00:11:58,050 to be order h, whatever h is. 239 00:11:58,050 --> 00:11:59,480 How do we do that? 240 00:11:59,480 --> 00:12:02,990 n to the 1 over 2 to the power of h 241 00:12:02,990 --> 00:12:07,030 has to equal this guy, right? 242 00:12:07,030 --> 00:12:08,852 AUDIENCE: Why would it equal that guy? 243 00:12:08,852 --> 00:12:11,122 We know it's less than that guy, but we 244 00:12:11,122 --> 00:12:13,150 don't know it's equal to that guy. 245 00:12:13,150 --> 00:12:15,600 PROFESSOR: We have to make it equal because we can only 246 00:12:15,600 --> 00:12:18,300 stop when we get to the base case. 247 00:12:18,300 --> 00:12:20,210 So we have to expand the recursion tree 248 00:12:20,210 --> 00:12:22,570 until we get to a base case, and then we stop, 249 00:12:22,570 --> 00:12:25,270 and this is our base case because this 250 00:12:25,270 --> 00:12:27,670 is what the problem says should be our base case. 251 00:12:27,670 --> 00:12:29,774 AUDIENCE: Right, but n to the 1 over 2 to the h 252 00:12:29,774 --> 00:12:39,172 is not equal to n to the 1 over log n. 253 00:12:39,172 --> 00:12:41,380 PROFESSOR: Well, we can set h to be whatever we want. 254 00:12:41,380 --> 00:12:44,540 h is the height of the tree, so we don't know what it is. 255 00:12:44,540 --> 00:12:46,166 We have to find out what it is. 256 00:12:46,166 --> 00:12:47,540 AUDIENCE: So let's say 2 to the h 257 00:12:47,540 --> 00:12:52,992 is equal to log n if you want to make it look like that. 258 00:12:52,992 --> 00:12:54,700 PROFESSOR: Let me write down the equation 259 00:12:54,700 --> 00:12:57,030 to make sure you're right. 260 00:12:57,030 --> 00:13:00,010 You're probably right because you're thinking faster than me, 261 00:13:00,010 --> 00:13:04,136 but let me not embarrass myself and do this the right way. 262 00:13:09,970 --> 00:13:14,890 So you said 2 to the h is log n, right? 263 00:13:14,890 --> 00:13:15,640 Looks about right. 264 00:13:15,640 --> 00:13:19,215 So what's h? 265 00:13:19,215 --> 00:13:21,500 AUDIENCE: Log base 2. 266 00:13:21,500 --> 00:13:23,770 PROFESSOR: All right. 267 00:13:23,770 --> 00:13:25,410 Log log n. 268 00:13:28,110 --> 00:13:30,600 So T of n is order h. 269 00:13:30,600 --> 00:13:33,950 We got this from here. 270 00:13:33,950 --> 00:13:43,510 T of n is order h is order of log log n. 271 00:13:43,510 --> 00:13:44,820 Math people drowning, right? 272 00:13:48,090 --> 00:13:49,205 Any questions about this? 273 00:13:52,260 --> 00:13:53,493 Yes? 274 00:13:53,493 --> 00:13:55,710 AUDIENCE: The first line on the right-- 275 00:13:55,710 --> 00:13:56,565 PROFESSOR: This? 276 00:13:56,565 --> 00:13:57,190 AUDIENCE: Yeah. 277 00:13:57,190 --> 00:13:58,148 Is that your base case? 278 00:13:58,148 --> 00:13:59,214 What is that? 279 00:13:59,214 --> 00:14:01,380 PROFESSOR: We got a hint with the problem that said, 280 00:14:01,380 --> 00:14:06,660 n to the power of 1 over log n is 2, which is order 1. 281 00:14:09,640 --> 00:14:12,507 So for the base case, we always want them to look like this. 282 00:14:12,507 --> 00:14:14,840 If we don't get a base case, we write our own base case, 283 00:14:14,840 --> 00:14:16,700 which is if you plug in a constant, 284 00:14:16,700 --> 00:14:18,950 you're going to get a constant. 285 00:14:18,950 --> 00:14:21,580 And since we're told that this guy is a constant, 286 00:14:21,580 --> 00:14:24,405 that's a pretty good hint that we want to get to it. 287 00:14:29,994 --> 00:14:31,410 Let's see how we're doing on time. 288 00:14:31,410 --> 00:14:34,080 Good. 289 00:14:34,080 --> 00:14:37,740 Ready to move on to the next problem? 290 00:14:37,740 --> 00:14:38,832 Let's do a fun one. 291 00:14:38,832 --> 00:14:41,040 Some people might remember it from elementary school, 292 00:14:41,040 --> 00:14:44,450 but this time, we're going to look at it with our 6.006 eyes. 293 00:14:44,450 --> 00:14:50,850 So suppose you have m coins, gold coins. 294 00:14:50,850 --> 00:14:51,840 One of them is face. 295 00:14:51,840 --> 00:14:54,340 The fake one is super light because it's not real gold. 296 00:14:54,340 --> 00:14:57,380 It's something that looks like gold. 297 00:14:57,380 --> 00:15:03,210 And we have a scale, and the scale is super accurate. 298 00:15:03,210 --> 00:15:05,950 It can weigh any coins on either side 299 00:15:05,950 --> 00:15:08,640 and tell us which side is heavier. 300 00:15:08,640 --> 00:15:10,830 Perfect accuracy, no need to worry about errors. 301 00:15:13,690 --> 00:15:16,550 I want to find out which coin is the bad coin. 302 00:15:16,550 --> 00:15:19,410 What is the minimum number of experiments I have to do? 303 00:15:22,440 --> 00:15:24,880 So there is a strategy, and we can worry about that later, 304 00:15:24,880 --> 00:15:28,790 but using 6.006, what is the minimum number of experiments 305 00:15:28,790 --> 00:15:30,284 I have to do? 306 00:15:30,284 --> 00:15:31,280 AUDIENCE: Log N times. 307 00:15:35,770 --> 00:15:36,710 PROFESSOR: Not quite. 308 00:15:36,710 --> 00:15:38,280 So this is what you think is, and you 309 00:15:38,280 --> 00:15:41,620 can do log n with binary search, right? 310 00:15:41,620 --> 00:15:45,270 The problem with binary search is if I put half of my coins 311 00:15:45,270 --> 00:15:47,550 on the left, half of my coins on the right, 312 00:15:47,550 --> 00:15:49,960 one side is going to be heavier, right? 313 00:15:49,960 --> 00:15:53,430 So the answers are going to be this or this, 314 00:15:53,430 --> 00:15:55,610 but I never get this. 315 00:15:55,610 --> 00:15:58,750 I only get one bit of information 316 00:15:58,750 --> 00:16:00,760 instead of getting one trit. 317 00:16:00,760 --> 00:16:03,250 A trit is a base three digit. 318 00:16:03,250 --> 00:16:07,952 How many bits of information in a trit? 319 00:16:07,952 --> 00:16:10,362 AUDIENCE: One and a half. 320 00:16:10,362 --> 00:16:12,290 PROFESSOR: Roughly. 321 00:16:12,290 --> 00:16:13,712 AUDIENCE: Log 3. 322 00:16:13,712 --> 00:16:14,420 PROFESSOR: Log 3. 323 00:16:14,420 --> 00:16:16,211 And we know that it's base 2 because that's 324 00:16:16,211 --> 00:16:17,720 what we use in CS. 325 00:16:17,720 --> 00:16:21,790 So we're discarding a fractional bit of information 326 00:16:21,790 --> 00:16:25,046 if we're not allowing for this to happen. 327 00:16:31,550 --> 00:16:33,254 Anyone want to try something else? 328 00:16:33,254 --> 00:16:34,670 We have to prove this, by the way. 329 00:16:34,670 --> 00:16:36,753 We have to prove the minimum that we come up with. 330 00:16:40,176 --> 00:16:42,064 AUDIENCE: You could just do it coin by coin, 331 00:16:42,064 --> 00:16:43,490 but that would take forever. 332 00:16:43,490 --> 00:16:45,010 PROFESSOR: That's N. That's worse. 333 00:16:45,010 --> 00:16:50,620 AUDIENCE: How about log base 4 N or something like that? 334 00:16:50,620 --> 00:16:53,020 AUDIENCE: Can you explain to me why we can't just 335 00:16:53,020 --> 00:16:54,190 do binary search? 336 00:16:54,190 --> 00:16:55,140 PROFESSOR: We can. 337 00:16:55,140 --> 00:16:58,030 It's definitely going to give us the correct answer, 338 00:16:58,030 --> 00:17:00,190 but it's not the minimum number of weighings 339 00:17:00,190 --> 00:17:03,570 because we're discarding a possible answer. 340 00:17:03,570 --> 00:17:05,319 So if you do binary search, you will never 341 00:17:05,319 --> 00:17:07,470 get that the two sides are equal. 342 00:17:07,470 --> 00:17:08,700 AUDIENCE: Log base 3. 343 00:17:08,700 --> 00:17:10,560 PROFESSOR: Log base 3 would be better 344 00:17:10,560 --> 00:17:12,829 because we have three choices all the time. 345 00:17:12,829 --> 00:17:13,980 Let's prove that. 346 00:17:13,980 --> 00:17:17,190 So the right answer happens to be log base 3 of N. Let's see 347 00:17:17,190 --> 00:17:19,880 how we would get it aside from guessing. 348 00:17:19,880 --> 00:17:22,850 AUDIENCE: So you divide it into thirds 349 00:17:22,850 --> 00:17:26,810 and compare one third and one third, and if they're equal, 350 00:17:26,810 --> 00:17:29,285 then the light one is in the other third. 351 00:17:29,285 --> 00:17:31,265 And if they're not, [INAUDIBLE] light one. 352 00:17:31,265 --> 00:17:34,937 Then you just keep dividing by 3. 353 00:17:34,937 --> 00:17:35,520 PROFESSOR: OK. 354 00:17:35,520 --> 00:17:36,760 So that's the strategy. 355 00:17:36,760 --> 00:17:38,250 What if I don't know the strategy? 356 00:17:38,250 --> 00:17:40,166 How do I do this without knowing the strategy? 357 00:17:42,668 --> 00:17:46,014 AUDIENCE: What if the number of coins isn't divisible by 3? 358 00:17:49,360 --> 00:17:52,186 PROFESSOR: Math people. 359 00:17:52,186 --> 00:17:56,520 AUDIENCE: Yeah, but then how do you-- OK, never mind. 360 00:17:56,520 --> 00:17:59,134 AUDIENCE: Just take the two extra coins and toss them out. 361 00:18:02,220 --> 00:18:03,810 PROFESSOR: If it's not divisible by 3, 362 00:18:03,810 --> 00:18:07,540 you add fake coins that are good. 363 00:18:07,540 --> 00:18:11,050 I mean, you use good coins. 364 00:18:11,050 --> 00:18:13,500 But we're not worried about the strategy. 365 00:18:13,500 --> 00:18:15,240 I want us to think of a lower bound. 366 00:18:15,240 --> 00:18:17,470 This is a lower bound for an algorithm, right? 367 00:18:17,470 --> 00:18:23,280 You cannot do better than log 3 N experiments. 368 00:18:23,280 --> 00:18:27,247 Does the word "lower bound" ring any bells? 369 00:18:27,247 --> 00:18:29,580 Is there any lecture where we talked about lower bounds? 370 00:18:37,070 --> 00:18:39,830 So if you sort and you're using a comparison model, 371 00:18:39,830 --> 00:18:40,970 what's the best you can do? 372 00:18:44,260 --> 00:18:46,140 AUDIENCE: N log N. 373 00:18:46,140 --> 00:18:48,700 PROFESSOR: N log N. Good. 374 00:18:48,700 --> 00:18:58,010 So sorting using CMP, the comparison model, is N log N. 375 00:18:58,010 --> 00:18:59,110 How did we prove that? 376 00:19:06,360 --> 00:19:07,300 One word. 377 00:19:07,300 --> 00:19:08,080 Well, two words. 378 00:19:12,160 --> 00:19:13,280 Decision trees. 379 00:19:13,280 --> 00:19:16,140 Does anyone remember what decision trees are? 380 00:19:16,140 --> 00:19:17,086 One person. 381 00:19:17,086 --> 00:19:19,107 AUDIENCE: It's just a comparison thing, right? 382 00:19:19,107 --> 00:19:20,940 You're like, is it greater, is it less than, 383 00:19:20,940 --> 00:19:22,315 or is there some sort of question 384 00:19:22,315 --> 00:19:24,384 you're asking about each key. 385 00:19:24,384 --> 00:19:25,050 PROFESSOR: Cool. 386 00:19:25,050 --> 00:19:27,590 Let's go over that a little bit. 387 00:19:27,590 --> 00:19:31,790 No matter what your algorithm is, 388 00:19:31,790 --> 00:19:34,000 it's going to weigh some coins and it's 389 00:19:34,000 --> 00:19:36,310 going to get an answer from the scale. 390 00:19:36,310 --> 00:19:39,350 And then based on that, it's going to weigh some other coins 391 00:19:39,350 --> 00:19:42,120 and get some answer from the scale. 392 00:19:42,120 --> 00:19:44,510 And it will do some experiments and then 393 00:19:44,510 --> 00:19:46,540 it will give you an answer. 394 00:19:46,540 --> 00:19:50,300 So if you draw a decision tree, it would look like this. 395 00:19:50,300 --> 00:19:52,670 First, we start with 0 information. 396 00:19:52,670 --> 00:19:54,590 We weigh some coins. 397 00:19:54,590 --> 00:19:59,420 Based on that, we have three possible answers-- 398 00:19:59,420 --> 00:20:03,910 smaller, equal, greater. 399 00:20:03,910 --> 00:20:07,260 Now, if we're here, we're going to do another experiment. 400 00:20:07,260 --> 00:20:10,910 Three possible answers. 401 00:20:10,910 --> 00:20:13,350 If we're here, another experiment, three possible 402 00:20:13,350 --> 00:20:13,850 answers. 403 00:20:13,850 --> 00:20:16,770 If we're here, another experiment, three possible 404 00:20:16,770 --> 00:20:19,420 answers. 405 00:20:19,420 --> 00:20:21,830 Say we do a third experiment. 406 00:20:21,830 --> 00:20:24,641 One, two, three, one, two, three, one, two, three, one, 407 00:20:24,641 --> 00:20:27,863 two, three, one, two, three, one, two, three, one, two, 408 00:20:27,863 --> 00:20:30,760 three, one, two, three, one, two, three. 409 00:20:30,760 --> 00:20:33,160 And then suppose we stop. 410 00:20:33,160 --> 00:20:35,620 If we stop, we have to give an answer. 411 00:20:35,620 --> 00:20:37,530 So this is an answer, this is an answer, 412 00:20:37,530 --> 00:20:44,880 this is an answer, answer, answer, answer, answer, answer. 413 00:20:44,880 --> 00:20:47,100 So how many answers do I have at the bottom 414 00:20:47,100 --> 00:20:50,684 if I have three levels? 415 00:20:50,684 --> 00:20:52,600 Here I have three experiments, so three levels 416 00:20:52,600 --> 00:20:53,474 in the decision tree. 417 00:20:53,474 --> 00:20:56,160 How many answers? 418 00:20:56,160 --> 00:20:58,080 AUDIENCE: [INAUDIBLE]. 419 00:20:58,080 --> 00:21:00,260 PROFESSOR: 3 to the third because I start three 420 00:21:00,260 --> 00:21:03,580 at the first level, nine at the second, 27 at the third. 421 00:21:03,580 --> 00:21:05,050 Each time, I multiply by 3. 422 00:21:08,490 --> 00:21:15,030 So if I do three weighings, I can give at most 27 answers. 423 00:21:15,030 --> 00:21:18,790 If I have more than 27 coins, I can't possibly 424 00:21:18,790 --> 00:21:24,410 decide which one is bad because say if I have 30 coins, 425 00:21:24,410 --> 00:21:27,400 then I need to be able to give out 30 answers. 426 00:21:27,400 --> 00:21:29,810 My algorithm has to have a place where 427 00:21:29,810 --> 00:21:33,020 it says the bad coin is coin one, coin two, coin three, all 428 00:21:33,020 --> 00:21:34,460 the way to coin 30. 429 00:21:34,460 --> 00:21:36,960 Here I only have 27 possible answers, 430 00:21:36,960 --> 00:21:39,297 so this isn't going to cut it for 30 coins. 431 00:21:39,297 --> 00:21:41,880 I need to do one more comparison so that I have a deeper tree. 432 00:21:44,560 --> 00:21:49,230 So suppose I have h comparisons instead. 433 00:21:49,230 --> 00:21:51,500 How many leaves? 434 00:21:51,500 --> 00:21:53,464 How many possible answers? 435 00:21:53,464 --> 00:21:55,900 AUDIENCE: h to the third. 436 00:21:55,900 --> 00:21:56,663 PROFESSOR: Almost. 437 00:21:56,663 --> 00:21:57,538 AUDIENCE: 3 to the h. 438 00:21:57,538 --> 00:22:00,380 PROFESSOR: 3 to the h. 439 00:22:00,380 --> 00:22:07,370 So 3 multiplied by 3 multiplied by 3 multiplied by 3 h times, 440 00:22:07,370 --> 00:22:09,210 so 3 to the h. 441 00:22:09,210 --> 00:22:12,210 It's no longer equal to 27. 442 00:22:12,210 --> 00:22:14,425 3 to the h is the number of possible answers. 443 00:22:14,425 --> 00:22:17,700 This has to be bigger or equal to N. Otherwise, 444 00:22:17,700 --> 00:22:18,885 the algorithm is incorrect. 445 00:22:21,830 --> 00:22:23,210 So what can we say about h? 446 00:22:28,261 --> 00:22:29,177 AUDIENCE: [INAUDIBLE]. 447 00:22:43,370 --> 00:22:45,740 PROFESSOR: We did all this without even thinking 448 00:22:45,740 --> 00:22:47,940 of what an algorithm would look like. 449 00:22:47,940 --> 00:22:49,330 This works for any algorithm. 450 00:22:49,330 --> 00:22:52,820 No matter how smart you are, no matter how much math you know, 451 00:22:52,820 --> 00:22:55,940 your algorithm is going to be bound by this. 452 00:22:55,940 --> 00:22:59,071 So the fact that the answer looks like this 453 00:22:59,071 --> 00:23:01,320 gives you some intuition for how to solve the problem. 454 00:23:01,320 --> 00:23:03,445 If you want to solve the problem now and figure out 455 00:23:03,445 --> 00:23:06,600 the strategy, you know that you have a 3 here. 456 00:23:06,600 --> 00:23:08,430 So if you divide into 2 every time, 457 00:23:08,430 --> 00:23:11,820 you're not going to get to the right limit. 458 00:23:11,820 --> 00:23:14,660 So first you do this, you get a lower bound, 459 00:23:14,660 --> 00:23:18,190 and then you use your intuition to figure out 460 00:23:18,190 --> 00:23:20,110 what the lower bound means. 461 00:23:20,110 --> 00:23:24,170 In this case, it would mean the strategy that we heard earlier. 462 00:23:24,170 --> 00:23:25,750 You have to divide into 3 every time 463 00:23:25,750 --> 00:23:28,810 and then figure out what you do based on the comparison. 464 00:23:28,810 --> 00:23:32,790 So your answer works perfectly once we have this. 465 00:23:32,790 --> 00:23:34,389 And also, once we have this, you know 466 00:23:34,389 --> 00:23:36,430 that your answer is correct because it's optimal. 467 00:23:36,430 --> 00:23:37,680 You can't do better than that. 468 00:23:40,007 --> 00:23:41,340 Any questions on decision trees? 469 00:23:44,590 --> 00:23:47,420 So lower bounds are a boring topic in general. 470 00:23:47,420 --> 00:23:48,770 They tell you what you can't do. 471 00:23:48,770 --> 00:23:51,600 They don't tell you anything useful about what you can do. 472 00:23:51,600 --> 00:23:53,900 In some cases, being able to reason about a lower bound 473 00:23:53,900 --> 00:23:55,274 gives you a hint of the solution. 474 00:24:01,730 --> 00:24:02,230 New problem. 475 00:24:23,340 --> 00:24:26,380 Suppose we have a 2D map. 476 00:24:26,380 --> 00:24:29,080 There's a hill, and you take a satellite picture of it 477 00:24:29,080 --> 00:24:33,930 at night, and you get a picture with bright pixels and not 478 00:24:33,930 --> 00:24:35,070 bright pixels. 479 00:24:35,070 --> 00:24:39,540 There are numbers showing how bright your pixels are. 480 00:24:39,540 --> 00:24:46,530 1, 2, 1, 2, 3, 0, 0. 481 00:24:46,530 --> 00:24:51,645 I'm going to draw out an example so we can use our intuition. 482 00:24:51,645 --> 00:24:55,338 0, 0, 1. 483 00:25:13,090 --> 00:25:14,680 So suppose this is our map. 484 00:25:14,680 --> 00:25:21,660 It's W times H-- W of what these are, 485 00:25:21,660 --> 00:25:25,860 I think they're columns, and H of the other ones. 486 00:25:25,860 --> 00:25:28,080 And you want to find a certain picture inside it. 487 00:25:28,080 --> 00:25:31,210 You want to see how many times does a certain pattern show up. 488 00:25:31,210 --> 00:25:35,470 Say the pattern is small w times small h, 489 00:25:35,470 --> 00:25:37,740 and it looks like this. 490 00:25:40,980 --> 00:25:43,374 But this will be the input to your problem, 491 00:25:43,374 --> 00:25:44,790 so the pattern might be different. 492 00:25:44,790 --> 00:25:46,660 You can't hard code this in. 493 00:25:46,660 --> 00:25:48,290 And this is useful. 494 00:25:48,290 --> 00:25:50,250 This problem is called a bunker hill problem. 495 00:25:50,250 --> 00:25:53,429 This is a hill, and this is a bunker. 496 00:25:53,429 --> 00:25:54,720 You take a picture of the hill. 497 00:25:54,720 --> 00:25:55,960 You want to know where the bunkers are 498 00:25:55,960 --> 00:25:58,670 so you can bomb them at night so then you can attack the place. 499 00:25:58,670 --> 00:25:59,660 AUDIENCE: That's awful. 500 00:25:59,660 --> 00:26:00,645 PROFESSOR: Thank you. 501 00:26:00,645 --> 00:26:05,670 I'll take that as a compliment So a nice way of solving this? 502 00:26:09,963 --> 00:26:13,090 AUDIENCE: You could just go through each row, 503 00:26:13,090 --> 00:26:15,950 and then look for a match for the first row, and then-- 504 00:26:15,950 --> 00:26:17,212 PROFESSOR: Yep. 505 00:26:17,212 --> 00:26:18,559 Is this a match? 506 00:26:18,559 --> 00:26:19,642 That's what you're saying. 507 00:26:19,642 --> 00:26:20,128 AUDIENCE: Yeah. 508 00:26:20,128 --> 00:26:21,025 We can see that's a match. 509 00:26:21,025 --> 00:26:21,320 PROFESSOR: Is this a match? 510 00:26:21,320 --> 00:26:22,360 Is this a match? 511 00:26:22,360 --> 00:26:24,920 By the way, this is a match. 512 00:26:24,920 --> 00:26:29,960 This is not a match, this is not a match, this is not a match, 513 00:26:29,960 --> 00:26:30,970 this is not a match. 514 00:26:30,970 --> 00:26:32,200 Now we go down here. 515 00:26:32,200 --> 00:26:34,560 This is not a match, this is not a match, 516 00:26:34,560 --> 00:26:37,242 this is not a match, so on and so forth. 517 00:26:37,242 --> 00:26:39,075 AUDIENCE: That wasn't what I was suggesting, 518 00:26:39,075 --> 00:26:41,422 but that's a good idea. 519 00:26:41,422 --> 00:26:43,630 PROFESSOR: Maybe you're suggesting something smarter, 520 00:26:43,630 --> 00:26:45,296 and I don't want to let you do something 521 00:26:45,296 --> 00:26:47,932 smarter so that we look at the brute force approach first. 522 00:26:47,932 --> 00:26:49,390 AUDIENCE: I mean, I was just saying 523 00:26:49,390 --> 00:26:51,200 take the first row of your bunker, 524 00:26:51,200 --> 00:26:53,150 and then compare it to other rows, 525 00:26:53,150 --> 00:26:56,787 and once you hit that, then check and see if the rest of-- 526 00:26:56,787 --> 00:26:58,370 PROFESSOR: Yeah, that's a bit smarter, 527 00:26:58,370 --> 00:27:00,300 so that's harder to reason about. 528 00:27:00,300 --> 00:27:04,623 Let's take this one and figure out the running time of it. 529 00:27:04,623 --> 00:27:07,654 AUDIENCE: Does that mean even if you know it's not a match, 530 00:27:07,654 --> 00:27:10,114 you keep checking all nine of them? 531 00:27:10,114 --> 00:27:10,780 PROFESSOR: Yeah. 532 00:27:10,780 --> 00:27:12,500 Say at the worst case, you only find out 533 00:27:12,500 --> 00:27:14,232 it's not a match all the way at the end. 534 00:27:16,895 --> 00:27:19,436 AUDIENCE: Are you trying to look for all matches or just one? 535 00:27:19,436 --> 00:27:22,560 PROFESSOR: All matches. 536 00:27:22,560 --> 00:27:25,155 AUDIENCE: You're limited to n squared time almost no matter 537 00:27:25,155 --> 00:27:26,755 what, right? 538 00:27:26,755 --> 00:27:29,270 If you have a small bunker in a large field, 539 00:27:29,270 --> 00:27:32,210 you have to hit the small bunker every time. 540 00:27:32,210 --> 00:27:34,720 PROFESSOR: Are you going to solve the problem for me? 541 00:27:34,720 --> 00:27:37,337 AUDIENCE: Are we trying to find the [INAUDIBLE]? 542 00:27:37,337 --> 00:27:37,920 PROFESSOR: No. 543 00:27:37,920 --> 00:27:39,697 We're trying to find out the running 544 00:27:39,697 --> 00:27:42,420 time for the dumb algorithm first. 545 00:27:42,420 --> 00:27:44,150 Humor me and let's solve this first, 546 00:27:44,150 --> 00:27:46,870 and then let's get to the efficient algorithm, OK? 547 00:27:46,870 --> 00:27:50,580 AUDIENCE: Big W minus small w plus 1 times big H 548 00:27:50,580 --> 00:27:52,080 minus small h plus 1. 549 00:27:55,080 --> 00:27:57,080 AUDIENCE: Where'd you get plus 1 from? 550 00:27:59,582 --> 00:28:01,064 AUDIENCE: So it's WH? 551 00:28:01,064 --> 00:28:02,101 AUDIENCE: Yeah. 552 00:28:02,101 --> 00:28:04,100 PROFESSOR: Well, there's something missing here. 553 00:28:04,100 --> 00:28:08,250 This is how many positions I have that I have to look at. 554 00:28:08,250 --> 00:28:11,014 How much time does it take to compare the small images? 555 00:28:11,014 --> 00:28:11,930 AUDIENCE: [INAUDIBLE]. 556 00:28:15,860 --> 00:28:18,800 PROFESSOR: This is smaller than wh, which is the input size, 557 00:28:18,800 --> 00:28:21,120 so it's scary if you have an algorithm that runs faster 558 00:28:21,120 --> 00:28:24,010 than the input size because it means 559 00:28:24,010 --> 00:28:25,926 you're not looking at all the input. 560 00:28:34,641 --> 00:28:36,640 So this is definitely bigger than the input size 561 00:28:36,640 --> 00:28:38,530 once we add the w times h here. 562 00:28:38,530 --> 00:28:39,555 Don't forget this guy. 563 00:28:42,590 --> 00:28:48,700 This is the naive algorithm, and if we discard the small order 564 00:28:48,700 --> 00:28:51,012 factors, we get that this is order of WHwh. 565 00:28:53,950 --> 00:28:56,420 How can you do better? 566 00:28:56,420 --> 00:28:59,240 You have the answer, right? 567 00:28:59,240 --> 00:29:01,850 Let's let everyone else think for a minute, 568 00:29:01,850 --> 00:29:04,300 and then you can give me the answer if you want, 569 00:29:04,300 --> 00:29:06,620 or someone else can give me the answer. 570 00:29:08,955 --> 00:29:11,079 I guess you should because you thought of it first. 571 00:29:17,070 --> 00:29:17,780 Any ideas? 572 00:29:23,620 --> 00:29:28,010 So you're thinking about the input size, right? 573 00:29:28,010 --> 00:29:32,160 Someone was thinking about the input size. 574 00:29:32,160 --> 00:29:34,680 So the input size is W times H, right? 575 00:29:34,680 --> 00:29:36,519 So if I have an algorithm that's W times H, 576 00:29:36,519 --> 00:29:38,810 that's optimal because it has to look at all the input. 577 00:29:42,016 --> 00:29:43,890 Well, we're going to have an algorithm that's 578 00:29:43,890 --> 00:29:47,990 W times H, so with that out of the way, 579 00:29:47,990 --> 00:29:51,000 does that inspire anyone as to what the solution is? 580 00:29:58,485 --> 00:30:00,510 AUDIENCE: Do that thing that I was saying, 581 00:30:00,510 --> 00:30:03,704 just take the first row, but then you still have a W term. 582 00:30:03,704 --> 00:30:04,370 PROFESSOR: Yeah. 583 00:30:04,370 --> 00:30:07,470 So let's make it better. 584 00:30:07,470 --> 00:30:09,250 It is the correct intuition. 585 00:30:09,250 --> 00:30:11,920 Now try to use a trick we learned in lecture to make that 586 00:30:11,920 --> 00:30:12,420 faster. 587 00:30:17,233 --> 00:30:18,816 AUDIENCE: Just use the top left corner 588 00:30:18,816 --> 00:30:21,527 instead of the whole row. 589 00:30:21,527 --> 00:30:22,110 PROFESSOR: OK. 590 00:30:22,110 --> 00:30:24,110 So we could use the top left corner, 591 00:30:24,110 --> 00:30:26,590 and if the top left corner doesn't match, 592 00:30:26,590 --> 00:30:30,630 then we don't have to check for matches. 593 00:30:30,630 --> 00:30:33,300 So this works for reasonably random data. 594 00:30:33,300 --> 00:30:35,630 As long as we don't have a lot of false positives, 595 00:30:35,630 --> 00:30:38,040 we're going to run fast. 596 00:30:38,040 --> 00:30:43,450 Now, the top corner of this one, if the map 597 00:30:43,450 --> 00:30:47,820 has a lot of 1's and then some 2's sprinkled all over it, 598 00:30:47,820 --> 00:30:51,370 most of the time, we'll have to go through the whole image 599 00:30:51,370 --> 00:30:54,190 so we're going to have a lot of false positives. 600 00:30:54,190 --> 00:30:59,180 How do we make our false positive rate go down? 601 00:30:59,180 --> 00:31:01,680 AUDIENCE: Looks kind of like a rolling hash problem. 602 00:31:01,680 --> 00:31:04,580 PROFESSOR: Looks like a rolling hash problem, exactly. 603 00:31:04,580 --> 00:31:07,147 Let's see if we can use rolling hashes. 604 00:31:07,147 --> 00:31:09,230 AUDIENCE: But then you still have that lowercase w 605 00:31:09,230 --> 00:31:11,022 term, though. 606 00:31:11,022 --> 00:31:12,480 PROFESSOR: How do we get rid of it? 607 00:31:15,204 --> 00:31:16,182 AUDIENCE: [INAUDIBLE]. 608 00:31:16,182 --> 00:31:16,890 PROFESSOR: Sorry? 609 00:31:21,633 --> 00:31:22,758 AUDIENCE: Wouldn't it be w? 610 00:31:22,758 --> 00:31:25,936 I mean the running time if we were just going through one row 611 00:31:25,936 --> 00:31:29,115 would be big W minus little w, times-- 612 00:31:31,534 --> 00:31:33,200 PROFESSOR: So where's your rolling hash? 613 00:31:37,768 --> 00:31:40,143 AUDIENCE: I guess you can use the entire thing as a hash, 614 00:31:40,143 --> 00:31:40,643 too. 615 00:31:40,643 --> 00:31:42,970 That would kind of work. 616 00:31:42,970 --> 00:31:45,330 PROFESSOR: So we want a hash for the whole thing. 617 00:31:45,330 --> 00:31:50,928 Instead of using this as the hash, we want a smarter hash. 618 00:31:50,928 --> 00:31:53,880 AUDIENCE: It's the entire thing, and then 619 00:31:53,880 --> 00:31:58,308 as you move to the right, you can add those and subtract, 620 00:31:58,308 --> 00:32:01,487 and compare that with the hash. 621 00:32:01,487 --> 00:32:02,070 PROFESSOR: OK. 622 00:32:02,070 --> 00:32:05,510 So we'd have a rolling hash that has everything in here, 623 00:32:05,510 --> 00:32:08,640 and then as I move to the right, I add these guys 624 00:32:08,640 --> 00:32:09,710 and I remove these guys. 625 00:32:12,730 --> 00:32:19,400 This is big W times big H, roughly, times small h 626 00:32:19,400 --> 00:32:21,470 because every time I move to the right, 627 00:32:21,470 --> 00:32:25,090 I have to do order h work. 628 00:32:25,090 --> 00:32:32,260 So I'm down from this thing to order of WHh, 629 00:32:32,260 --> 00:32:33,590 So it's better. 630 00:32:33,590 --> 00:32:34,590 It's one step forward. 631 00:32:34,590 --> 00:32:35,990 Now, let's make this even faster. 632 00:32:47,420 --> 00:32:53,810 What if I could do this in order 1 instead of order h? 633 00:32:57,790 --> 00:33:00,114 How would I do this in order 1? 634 00:33:00,114 --> 00:33:03,450 AUDIENCE: You'd have to compress all the rows, 635 00:33:03,450 --> 00:33:05,690 and then take the hash of each column. 636 00:33:05,690 --> 00:33:08,662 PROFESSOR: How would we compress them? 637 00:33:08,662 --> 00:33:12,050 AUDIENCE: Take the hash of the column. 638 00:33:12,050 --> 00:33:13,830 PROFESSOR: You want to compress the rows? 639 00:33:13,830 --> 00:33:14,413 AUDIENCE: Yes. 640 00:33:14,413 --> 00:33:16,705 You divide it-- 641 00:33:16,705 --> 00:33:18,330 PROFESSOR: Let's not compress the rows. 642 00:33:23,970 --> 00:33:26,050 AUDIENCE: You could take just your bunker, 643 00:33:26,050 --> 00:33:29,980 and then figure out the hashes of the three columns, 644 00:33:29,980 --> 00:33:33,290 and just run through like that. 645 00:33:33,290 --> 00:33:36,146 You'd still have to access each of those items. 646 00:33:36,146 --> 00:33:39,800 I don't really see how it's faster. 647 00:33:39,800 --> 00:33:41,380 I guess it's less, though. 648 00:33:41,380 --> 00:33:42,340 It's less. 649 00:33:42,340 --> 00:33:43,300 Maybe it's only 1. 650 00:33:47,630 --> 00:33:53,970 AUDIENCE: So do you want to hash each little column [INAUDIBLE]? 651 00:34:01,200 --> 00:34:03,950 PROFESSOR: So we're going to hash all these guys, 652 00:34:03,950 --> 00:34:06,340 and then we're going to have hashes for them, 653 00:34:06,340 --> 00:34:10,590 and we're going to do the regular Rabin-Karp 654 00:34:10,590 --> 00:34:13,420 for the hashes. 655 00:34:13,420 --> 00:34:16,478 Now, what happens when I go down? 656 00:34:16,478 --> 00:34:18,840 PROFESSOR: You have to recompute everything. 657 00:34:18,840 --> 00:34:20,024 PROFESSOR: Let's do better than recompute everything. 658 00:34:20,024 --> 00:34:21,241 AUDIENCE: Do you want to [INAUDIBLE] downward 659 00:34:21,241 --> 00:34:21,540 on each column? 660 00:34:21,540 --> 00:34:22,165 PROFESSOR: Yep. 661 00:34:22,165 --> 00:34:23,489 Rolling hash. 662 00:34:23,489 --> 00:34:28,199 I want to make this faster, so I have big W hashes. 663 00:34:28,199 --> 00:34:30,610 They're all little h inside. 664 00:34:30,610 --> 00:34:32,360 Here, I have to compute them brute force. 665 00:34:32,360 --> 00:34:33,485 I can't do anything better. 666 00:34:33,485 --> 00:34:36,250 But when I go from here to here, there's 667 00:34:36,250 --> 00:34:39,394 only one element going out and one element going in. 668 00:34:42,070 --> 00:34:43,310 Same for all these guys. 669 00:34:43,310 --> 00:34:46,080 Let's not make the picture uglier than it needs to be. 670 00:34:46,080 --> 00:34:48,960 So I have big W rolling hashes. 671 00:34:48,960 --> 00:34:51,340 They're vertical rolling hashes. 672 00:34:51,340 --> 00:34:59,310 And then the rolling hashes hash columns, so my the sliding 673 00:34:59,310 --> 00:35:03,230 window that I have is little w rolling hashes. 674 00:35:03,230 --> 00:35:10,320 Each rolling hash is little h in size, so it's a hash of hashes. 675 00:35:10,320 --> 00:35:12,000 It's nested hashes. 676 00:35:12,000 --> 00:35:14,410 And then when I go down, I only have 677 00:35:14,410 --> 00:35:17,950 to roll down each of the rolling hashes by 1, 678 00:35:17,950 --> 00:35:19,300 so that's constant time. 679 00:35:19,300 --> 00:35:24,630 So to go from here to here, to the slide the window one down, 680 00:35:24,630 --> 00:35:27,330 I have to roll this hash down, roll this one, 681 00:35:27,330 --> 00:35:29,640 this one, this one, this one, this one, and all of them 682 00:35:29,640 --> 00:35:31,640 roll down in constant time. 683 00:35:31,640 --> 00:35:36,640 So when I'm adding a column to the hash, say I'm here 684 00:35:36,640 --> 00:35:40,380 and I want to go here, I roll down this hash 685 00:35:40,380 --> 00:35:42,560 and I have the answer. 686 00:35:42,560 --> 00:35:43,210 It's order 1. 687 00:35:43,210 --> 00:35:45,360 I'm adding it in order 1. 688 00:35:47,726 --> 00:35:48,600 Does this make sense? 689 00:35:55,087 --> 00:35:58,572 AUDIENCE: It's tricky. 690 00:35:58,572 --> 00:36:00,196 PROFESSOR: But it's not too bad, right? 691 00:36:00,196 --> 00:36:02,770 AUDIENCE: You just need to do the vertical roll first, right? 692 00:36:02,770 --> 00:36:03,395 PROFESSOR: Yep. 693 00:36:05,880 --> 00:36:07,890 To have the simplest possible code, 694 00:36:07,890 --> 00:36:12,520 you start with big W rolling hashes, do 1D Rabin-Karp, 695 00:36:12,520 --> 00:36:17,267 you roll everything down, 1D Rabin-Karp, 696 00:36:17,267 --> 00:36:18,100 and keep doing that. 697 00:36:20,810 --> 00:36:23,372 OK What's the running time for this? 698 00:36:23,372 --> 00:36:24,410 AUDIENCE: WH. 699 00:36:24,410 --> 00:36:25,012 PROFESSOR: WH. 700 00:36:25,012 --> 00:36:27,595 AUDIENCE: Does this one have a space complexity about W, then, 701 00:36:27,595 --> 00:36:28,780 because [INAUDIBLE]? 702 00:36:28,780 --> 00:36:29,840 PROFESSOR: Yeah. 703 00:36:29,840 --> 00:36:32,500 So my memory requirement went up to 4W. 704 00:36:39,850 --> 00:36:41,670 Is everyone happy with this? 705 00:36:41,670 --> 00:36:46,340 It's one of the few cases where an approach for solving a 1D 706 00:36:46,340 --> 00:36:48,670 problem generalizes to 2D. 707 00:36:48,670 --> 00:36:51,630 In most problems, you have to rethink the whole situation. 708 00:37:02,980 --> 00:37:04,497 Let's do a hard problem. 709 00:37:04,497 --> 00:37:05,580 Enough with the easy ones. 710 00:37:10,580 --> 00:37:19,240 Two lists, roughly size N, and they're both sorted. 711 00:37:19,240 --> 00:37:21,090 Let me fill them out with random numbers. 712 00:37:25,010 --> 00:37:50,710 5, 13, 22, 43, 56, 62, 81, 86, 87, 2, 3, 7, 9, 15, 19, 713 00:37:50,710 --> 00:37:53,965 24, 28, 32. 714 00:37:56,780 --> 00:37:58,700 So I have these lists. 715 00:37:58,700 --> 00:38:01,810 Let's be generous and say that all the numbers are different. 716 00:38:01,810 --> 00:38:03,110 They're sorted. 717 00:38:03,110 --> 00:38:09,110 I want to find the nth number, so the number with rank n, 718 00:38:09,110 --> 00:38:19,274 out of both lists as fast as possible. 719 00:38:19,274 --> 00:38:22,680 AUDIENCE: Wouldn't it just be index? 720 00:38:22,680 --> 00:38:26,470 PROFESSOR: So the thing is if, for example, n is 1, 721 00:38:26,470 --> 00:38:27,990 then it's this. 722 00:38:27,990 --> 00:38:29,140 This is the second number. 723 00:38:29,140 --> 00:38:31,884 This is the third. 724 00:38:31,884 --> 00:38:33,800 So if you take the lists and you combine them, 725 00:38:33,800 --> 00:38:36,720 then I want the result out of that. 726 00:38:36,720 --> 00:38:39,120 AUDIENCE: Let's do merge sort and the combined 727 00:38:39,120 --> 00:38:41,950 [INAUDIBLE] index, right? 728 00:38:41,950 --> 00:38:44,980 PROFESSOR: Full merge sort, N log N? 729 00:38:44,980 --> 00:38:45,480 No. 730 00:38:45,480 --> 00:38:46,870 AUDIENCE: It's already sorted, though. 731 00:38:46,870 --> 00:38:47,160 PROFESSOR: OK. 732 00:38:47,160 --> 00:38:47,610 So what do we do? 733 00:38:47,610 --> 00:38:48,260 AUDIENCE: We just do merge. 734 00:38:48,260 --> 00:38:49,560 That's just order N. 735 00:38:49,560 --> 00:38:51,182 PROFESSOR: So merge. 736 00:38:51,182 --> 00:38:53,050 AUDIENCE: Merge [INAUDIBLE]. 737 00:38:53,050 --> 00:38:54,760 PROFESSOR: OK. 738 00:38:54,760 --> 00:38:58,230 So merge and then index is the first approach, 739 00:38:58,230 --> 00:39:01,840 which is order N. Then you said run the merge algorithm, 740 00:39:01,840 --> 00:39:07,052 but stop when you get to the little nth element, 741 00:39:07,052 --> 00:39:09,010 so that's a little bit better and we don't have 742 00:39:09,010 --> 00:39:12,490 to produce an array so the space complexity is down to order 1, 743 00:39:12,490 --> 00:39:14,600 right? 744 00:39:14,600 --> 00:39:15,615 Now let's do-- 745 00:39:15,615 --> 00:39:17,170 AUDIENCE: Logarighmic times n. 746 00:39:17,170 --> 00:39:18,575 PROFESSOR: Yeah, exactly. 747 00:39:18,575 --> 00:39:19,200 This is linear. 748 00:39:19,200 --> 00:39:20,810 We have to get to logarithms. 749 00:39:20,810 --> 00:39:23,228 How do we get to logarithms? 750 00:39:23,228 --> 00:39:24,860 AUDIENCE: Do a modified binary search. 751 00:39:32,070 --> 00:39:38,190 AUDIENCE: What if you first looked at-- actually, 752 00:39:38,190 --> 00:39:39,830 I don't know what I'm going to say. 753 00:39:42,132 --> 00:39:43,090 PROFESSOR: Anyone else? 754 00:39:46,037 --> 00:39:47,120 So modified binary search. 755 00:39:47,120 --> 00:39:49,380 Do you know the full answer, or do you 756 00:39:49,380 --> 00:39:51,350 want to start looking at the solution? 757 00:39:51,350 --> 00:39:53,665 AUDIENCE: I have an idea [INAUDIBLE]. 758 00:39:53,665 --> 00:39:55,290 PROFESSOR: Let's see how it would work. 759 00:39:58,530 --> 00:40:11,530 AUDIENCE: So if we take the n over second element 760 00:40:11,530 --> 00:40:19,628 on each row, the one that is lower, 761 00:40:19,628 --> 00:40:22,600 that's at least the n over second element, 762 00:40:22,600 --> 00:40:23,725 and the one that's higher-- 763 00:40:35,230 --> 00:40:38,415 PROFESSOR: So let's say this is our N1 and N2, 764 00:40:38,415 --> 00:40:41,330 and they're both order N initially. 765 00:40:41,330 --> 00:40:45,010 So this is N2 over 2 if this is smaller. 766 00:40:48,328 --> 00:40:49,786 AUDIENCE: The lower one is at least 767 00:40:49,786 --> 00:40:51,790 the N over second element since everything 768 00:40:51,790 --> 00:41:03,844 before it is less than N. Does that make sense? 769 00:41:03,844 --> 00:41:04,510 PROFESSOR: Yeah. 770 00:41:04,510 --> 00:41:07,140 So this is N over 2 are greater, right? 771 00:41:07,140 --> 00:41:07,840 This guy. 772 00:41:16,030 --> 00:41:28,859 AUDIENCE: We also know that the element above it 773 00:41:28,859 --> 00:41:35,316 is at most the nth element because it's greater than-- 774 00:41:35,316 --> 00:41:37,620 PROFESSOR: So it's at most N1 plus N2 775 00:41:37,620 --> 00:41:39,370 because that's how many you have in total, 776 00:41:39,370 --> 00:41:44,150 and the one on the top, you know that these ones are 777 00:41:44,150 --> 00:41:45,520 bigger than h, right? 778 00:41:45,520 --> 00:41:47,850 But you don't know anything about these ones, 779 00:41:47,850 --> 00:41:52,360 so it's minus N1 over 2. 780 00:41:52,360 --> 00:41:56,700 So it's at most N1 over 2 plus N2, the top element. 781 00:41:59,590 --> 00:42:06,860 AUDIENCE: So then we can take the element three quarters 782 00:42:06,860 --> 00:42:09,992 of the way through N2 and one quarter of the way 783 00:42:09,992 --> 00:42:13,934 through N1 to do more [INAUDIBLE]. 784 00:42:13,934 --> 00:42:14,850 AUDIENCE: [INAUDIBLE]? 785 00:42:17,722 --> 00:42:18,305 AUDIENCE: Yes. 786 00:42:22,962 --> 00:42:24,920 PROFESSOR: Let's see what happens in each case. 787 00:42:24,920 --> 00:42:34,270 So if little n is here, then you divide. 788 00:42:34,270 --> 00:42:41,490 So if n is smaller than this, then you chop them up here 789 00:42:41,490 --> 00:42:43,860 and you've divided the problem into half. 790 00:42:43,860 --> 00:42:45,190 You're good. 791 00:42:45,190 --> 00:42:49,680 If it's bigger than this other number here, 792 00:42:49,680 --> 00:42:52,610 you've chopped the problem up and you're here. 793 00:42:52,610 --> 00:42:53,500 You're good. 794 00:42:53,500 --> 00:42:56,100 Now, the hard case seems to be when it's in between. 795 00:42:56,100 --> 00:42:57,690 So what do we do then? 796 00:42:57,690 --> 00:42:59,854 AUDIENCE: Aren't those two numbers the same? 797 00:42:59,854 --> 00:43:01,770 AUDIENCE: If it's between, take the upper half 798 00:43:01,770 --> 00:43:07,220 of the bottom one and the lower half of the upper one, right? 799 00:43:07,220 --> 00:43:09,695 If it's between 15 and 43, then you take everything 800 00:43:09,695 --> 00:43:15,375 in the upper half of N2 and you take the lower half of N1. 801 00:43:15,375 --> 00:43:16,750 AUDIENCE: Yeah, you should always 802 00:43:16,750 --> 00:43:18,662 be taking the upper half of one and the lower 803 00:43:18,662 --> 00:43:21,240 half of the other in this case. 804 00:43:21,240 --> 00:43:21,990 PROFESSOR: Really? 805 00:43:26,270 --> 00:43:35,230 AUDIENCE: [INAUDIBLE] N2, 15 is at least 806 00:43:35,230 --> 00:43:37,330 the nth over 2 element. 807 00:43:37,330 --> 00:43:39,664 I think we're using three n's at the same time. 808 00:43:39,664 --> 00:43:42,110 AUDIENCE: Are you using the little n or the big N? 809 00:43:42,110 --> 00:43:42,776 AUDIENCE: Sorry. 810 00:43:45,100 --> 00:43:47,224 [INAUDIBLE] is element and the lists are called N 811 00:43:47,224 --> 00:43:48,140 and this is confusing. 812 00:43:48,140 --> 00:43:50,376 Can we rename the nth element to something 813 00:43:50,376 --> 00:43:53,130 like m or some other useful number? 814 00:43:53,130 --> 00:43:54,200 The kth element, OK. 815 00:43:54,200 --> 00:43:57,680 So the 15 at least the kth over 2 element, 816 00:43:57,680 --> 00:44:00,227 so it can't be anything on the left half of the-- 817 00:44:00,227 --> 00:44:01,060 PROFESSOR: k over 2? 818 00:44:01,060 --> 00:44:02,350 Why k over 2? 819 00:44:02,350 --> 00:44:03,570 This list is size N. 820 00:44:03,570 --> 00:44:05,220 AUDIENCE: Sorry. 821 00:44:05,220 --> 00:44:07,470 I didn't pick the elements at N over 2. 822 00:44:07,470 --> 00:44:11,369 I picked the elements at k over 2. 823 00:44:11,369 --> 00:44:12,660 PROFESSOR: Why would I do that? 824 00:44:18,790 --> 00:44:24,030 If N1 is greater than k, then I chop off the end of the list, 825 00:44:24,030 --> 00:44:24,966 right? 826 00:44:24,966 --> 00:44:29,325 If N2 is bigger than k, then I chop off the end of the list 827 00:44:29,325 --> 00:44:29,825 completely. 828 00:44:33,375 --> 00:44:37,100 If this list is sorted and I want the third element, 829 00:44:37,100 --> 00:44:39,200 I know that these are not the answer. 830 00:44:39,200 --> 00:44:42,140 No matter what's down here, these are not the answer. 831 00:44:42,140 --> 00:44:49,350 So I know for sure that k is going to be bigger than N1, N2. 832 00:44:49,350 --> 00:44:55,640 So instead of going there, let's go at k over 2. 833 00:44:55,640 --> 00:45:01,290 And here, let's go for k over 2. 834 00:45:01,290 --> 00:45:03,255 Now this one looks a bit nastier. 835 00:45:03,255 --> 00:45:07,720 N1 plus N2 stays what it was before. 836 00:45:14,970 --> 00:45:18,560 AUDIENCE: On the list where you got the element that was lower, 837 00:45:18,560 --> 00:45:24,690 you know that everything to the left of it is less than k 838 00:45:24,690 --> 00:45:25,240 over 2. 839 00:45:31,470 --> 00:45:34,330 The element number is lower than k over 2, 840 00:45:34,330 --> 00:45:36,650 so we're not using anything to the left of the 15. 841 00:45:36,650 --> 00:45:38,540 You can kill that section for us. 842 00:45:38,540 --> 00:45:42,007 PROFESSOR: OK, so we can kill it, but then what's the rank? 843 00:45:42,007 --> 00:45:43,590 When I recurse, how am I going to know 844 00:45:43,590 --> 00:45:45,520 the rank that I'm looking for? 845 00:45:48,817 --> 00:45:50,701 AUDIENCE: k minus what you killed. 846 00:45:55,700 --> 00:45:57,520 AUDIENCE: You can save the branches. 847 00:45:57,520 --> 00:45:59,560 PROFESSOR: So you want to kill this guy, right? 848 00:45:59,560 --> 00:46:01,970 So you want to kill these numbers. 849 00:46:01,970 --> 00:46:04,330 But here I have k over 2 numbers, 850 00:46:04,330 --> 00:46:06,130 and here I have k over 2 numbers. 851 00:46:06,130 --> 00:46:09,401 How do I know that it's not somewhere here? 852 00:46:09,401 --> 00:46:13,562 AUDIENCE: How do you know that what's not somewhere there? 853 00:46:13,562 --> 00:46:16,220 AUDIENCE: You compare 15 and 43, right? 854 00:46:16,220 --> 00:46:19,660 And then you see that 43 is bigger, 855 00:46:19,660 --> 00:46:26,300 and you see 15 is smaller, so then you would go to, I guess, 856 00:46:26,300 --> 00:46:36,195 k over 4 index in N1, and 3k over 4 in N2. 857 00:46:36,195 --> 00:46:37,695 AUDIENCE: When you recurse, you said 858 00:46:37,695 --> 00:46:41,430 that you've killed k over 2 elements. 859 00:46:41,430 --> 00:46:46,947 AUDIENCE: But you can also kill everything to the right of 43. 860 00:46:46,947 --> 00:46:47,530 AUDIENCE: Yes. 861 00:46:47,530 --> 00:46:50,015 You can kill everything to the right of 43 862 00:46:50,015 --> 00:46:51,754 since it can't be any of those elements, 863 00:46:51,754 --> 00:46:53,670 and you can kill everything to the left of 15. 864 00:46:53,670 --> 00:46:57,236 And then you repeat the algorithm again with the lists 865 00:46:57,236 --> 00:47:00,210 you didn't kill, except you also put in a term of we've 866 00:47:00,210 --> 00:47:03,950 already covered k over 2 elements. 867 00:47:03,950 --> 00:47:07,570 PROFESSOR: So we want the element with the rank k over 4 868 00:47:07,570 --> 00:47:08,715 over these lists. 869 00:47:13,646 --> 00:47:16,330 So I know for sure that what I have is either k over 2 870 00:47:16,330 --> 00:47:21,240 or less than k over 2, right? 871 00:47:21,240 --> 00:47:23,710 So this is less than k over 2, and then I'm 872 00:47:23,710 --> 00:47:25,532 looking for a rank of k over 4. 873 00:47:28,184 --> 00:47:30,250 That seems to work. 874 00:47:30,250 --> 00:47:33,540 How does the running time look? 875 00:47:33,540 --> 00:47:38,030 AUDIENCE: It should be O of log k. 876 00:47:38,030 --> 00:47:40,530 AUDIENCE: I think it's log [INAUDIBLE]. 877 00:47:45,030 --> 00:47:52,609 PROFESSOR: So log k, log N1 plus N2. 878 00:47:57,440 --> 00:47:58,620 Are these different? 879 00:47:58,620 --> 00:48:00,830 AUDIENCE: Yes. 880 00:48:00,830 --> 00:48:06,406 AUDIENCE: If k is 1, the algorithm 881 00:48:06,406 --> 00:48:10,084 should only recurse once, even if N is 20 million. 882 00:48:16,220 --> 00:48:18,364 PROFESSOR: OK. 883 00:48:18,364 --> 00:48:21,902 AUDIENCE: But if k is 20 million and the list lengths are 884 00:48:21,902 --> 00:48:24,784 two million long, it'll take approximately those lengths 885 00:48:24,784 --> 00:48:26,740 to run. 886 00:48:26,740 --> 00:48:28,690 PROFESSOR: OK. 887 00:48:28,690 --> 00:48:36,420 So what gets reduced, aside from the list size, k gets reduced. 888 00:48:36,420 --> 00:48:39,870 k seems to define the input size for the next iteration 889 00:48:39,870 --> 00:48:42,700 because I'll have at least k over 2 elements in one 890 00:48:42,700 --> 00:48:45,640 of these buckets. 891 00:48:45,640 --> 00:48:48,250 So it sounds like it should be log k. 892 00:48:56,120 --> 00:49:00,390 k is bigger than N1, N2, but it should hopefully be smaller 893 00:49:00,390 --> 00:49:03,740 than the sum because otherwise, why am I doing the problem? 894 00:49:03,740 --> 00:49:08,360 So this is definitely order of N1 plus N2. 895 00:49:08,360 --> 00:49:09,390 So this is a bound. 896 00:49:09,390 --> 00:49:11,820 This is a slightly tighter bound. 897 00:49:11,820 --> 00:49:13,665 We have a different solution to the problem. 898 00:49:16,945 --> 00:49:18,820 All the possible solutions are hard to argue. 899 00:49:18,820 --> 00:49:21,540 They all come down to something like this. 900 00:49:21,540 --> 00:49:26,070 The one that we have requires you to use-- you 901 00:49:26,070 --> 00:49:29,873 have two indices, and you know that the sum of the indices 902 00:49:29,873 --> 00:49:33,580 is k, and you do binary search on the top 903 00:49:33,580 --> 00:49:36,090 and adjust the index on the bottom 904 00:49:36,090 --> 00:49:40,040 to keep the constraint that the sum of the two indices in N. 905 00:49:40,040 --> 00:49:44,620 And you can look at that in the notes that we're going to post. 906 00:49:44,620 --> 00:49:45,780 Are you guys tired? 907 00:49:45,780 --> 00:49:49,509 Do you want to look at one more thing, or are we done? 908 00:49:49,509 --> 00:49:50,425 AUDIENCE: [INAUDIBLE]. 909 00:49:57,370 --> 00:50:01,510 PROFESSOR: Let's look at something reasonably easy. 910 00:50:01,510 --> 00:50:03,350 You guys can read this on your own. 911 00:50:03,350 --> 00:50:04,910 I'm not going to bore you with that. 912 00:50:08,650 --> 00:50:12,030 So suppose we have some functions, 913 00:50:12,030 --> 00:50:13,990 and we want to order them according 914 00:50:13,990 --> 00:50:16,070 to their asymptotic growth rate. 915 00:50:16,070 --> 00:50:19,480 Do people remember how to do this from Pset one? 916 00:50:19,480 --> 00:50:22,400 So the idea is that you take each function, you simplify it, 917 00:50:22,400 --> 00:50:23,970 and then you sort them. 918 00:50:23,970 --> 00:50:28,234 So let's have a couple of simple ones and then some hard ones, 919 00:50:28,234 --> 00:50:29,900 and we're going to stop in five minutes. 920 00:50:39,110 --> 00:50:39,650 What's this? 921 00:50:42,752 --> 00:50:44,153 AUDIENCE: n to the fourth. 922 00:50:46,652 --> 00:50:47,235 PROFESSOR: OK. 923 00:50:49,810 --> 00:50:51,090 Let's see. 924 00:50:51,090 --> 00:50:51,850 n choose 3. 925 00:50:51,850 --> 00:50:54,444 What is this? 926 00:50:54,444 --> 00:50:55,539 AUDIENCE: [INAUDIBLE]. 927 00:50:55,539 --> 00:50:56,580 PROFESSOR: OK, very good. 928 00:50:56,580 --> 00:50:57,796 Why? 929 00:50:57,796 --> 00:51:01,177 AUDIENCE: Something about choose is n times n minus 1 times 930 00:51:01,177 --> 00:51:03,592 n minus 2. 931 00:51:03,592 --> 00:51:07,456 AUDIENCE: It's n factorial over n minus 2 factorial. 932 00:51:11,220 --> 00:51:13,325 PROFESSOR: This comes out to be roughly n cubed. 933 00:51:16,265 --> 00:51:16,765 Cool. 934 00:51:19,570 --> 00:51:22,808 How about n plus log to the fourth of n? 935 00:51:25,616 --> 00:51:27,020 AUDIENCE: N. 936 00:51:27,020 --> 00:51:28,960 PROFESSOR: Yep. 937 00:51:28,960 --> 00:51:32,290 So even if I have a polynomial in a logarithm, 938 00:51:32,290 --> 00:51:35,780 it's still dominated by pure n. 939 00:51:35,780 --> 00:51:39,000 Now, suppose we want to order these guys together 940 00:51:39,000 --> 00:51:43,730 with-- which one doesn't look boring at all? 941 00:51:43,730 --> 00:51:46,940 n to the log n and 2 to the n. 942 00:51:49,790 --> 00:51:50,739 Let's sort them. 943 00:51:50,739 --> 00:51:51,780 Which one's the smallest? 944 00:51:51,780 --> 00:51:53,394 Which one's the biggest? 945 00:51:59,250 --> 00:52:02,504 AUDIENCE: 2 to the n is bigger. 946 00:52:02,504 --> 00:52:04,420 PROFESSOR: Let's start with the smallest ones, 947 00:52:04,420 --> 00:52:07,340 because I think that will be easy. 948 00:52:07,340 --> 00:52:10,480 So which one's the absolute smallest out of all these guys? 949 00:52:10,480 --> 00:52:11,340 AUDIENCE: n. 950 00:52:11,340 --> 00:52:12,590 PROFESSOR: OK. 951 00:52:12,590 --> 00:52:14,120 Then? 952 00:52:14,120 --> 00:52:15,960 AUDIENCE: n to the third. 953 00:52:15,960 --> 00:52:17,276 PROFESSOR: Cubed and fourth. 954 00:52:17,276 --> 00:52:18,650 So we have to compare these guys. 955 00:52:18,650 --> 00:52:19,608 How do we compare them? 956 00:52:23,010 --> 00:52:26,900 n to the power of log n and 2 to the n something. 957 00:52:29,984 --> 00:52:30,900 AUDIENCE: [INAUDIBLE]. 958 00:52:34,819 --> 00:52:35,860 PROFESSOR: Take the logs. 959 00:52:35,860 --> 00:52:38,550 So when we have something confusing with exponentials, 960 00:52:38,550 --> 00:52:40,500 take the logs and see what we get. 961 00:52:40,500 --> 00:52:42,640 Logs are monotonic, so if you take the logs, 962 00:52:42,640 --> 00:52:45,450 you'll have the same relationship afterwards. 963 00:52:45,450 --> 00:52:53,090 So log of this is log of n to the power of log n, 964 00:52:53,090 --> 00:53:05,020 so it's log n times log n, so it's log 2 n. 965 00:53:05,020 --> 00:53:08,940 Log of 2 to the n is n. 966 00:53:08,940 --> 00:53:11,404 Which one's bigger? 967 00:53:11,404 --> 00:53:12,205 AUDIENCE: n. 968 00:53:12,205 --> 00:53:13,080 PROFESSOR: All right. 969 00:53:21,040 --> 00:53:22,470 And we're not going to solve this, 970 00:53:22,470 --> 00:53:25,150 but how would you go about solving this guy? 971 00:53:28,840 --> 00:53:30,060 What do you do to it? 972 00:53:30,060 --> 00:53:30,860 AUDIENCE: Sterling. 973 00:53:30,860 --> 00:53:33,170 PROFESSOR: Sterling, yep. 974 00:53:33,170 --> 00:53:35,270 You do Sterling, you go through the numbers, 975 00:53:35,270 --> 00:53:37,210 and you figure out the answer. 976 00:53:37,210 --> 00:53:39,135 And then if you have to do logarithms, 977 00:53:39,135 --> 00:53:41,390 you use logarithms to figure out where 978 00:53:41,390 --> 00:53:43,510 it belongs among these guys. 979 00:53:43,510 --> 00:53:44,800 AUDIENCE: What's Sterling? 980 00:53:44,800 --> 00:53:45,540 AUDIENCE: Sterling's formula. 981 00:53:45,540 --> 00:53:47,506 It's that gross thing that was on the board 982 00:53:47,506 --> 00:53:48,464 before we came in here. 983 00:53:48,464 --> 00:53:54,390 PROFESSOR: So Sterling says that n factorial is ugly. 984 00:54:03,030 --> 00:54:09,440 2 pi n here times n over e to the power of n. 985 00:54:13,670 --> 00:54:14,827 So what's this binomial? 986 00:54:14,827 --> 00:54:15,910 What's the formula for it? 987 00:54:21,670 --> 00:54:23,210 OK, formula for n choose k. 988 00:54:23,210 --> 00:54:24,512 Anyone? 989 00:54:24,512 --> 00:54:28,363 AUDIENCE: It's n times 1 over 2 factorial times n 990 00:54:28,363 --> 00:54:29,611 minus k factorial. 991 00:54:32,750 --> 00:54:35,410 PROFESSOR: In this case, it's n factorial 992 00:54:35,410 --> 00:54:40,576 over n over 2 factorial raised to the power of 2, right? 993 00:54:40,576 --> 00:54:42,950 And then we chug through the math and get to some answer. 994 00:54:46,390 --> 00:54:47,956 All right?