The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: It's a bit better; my heart's back in place. I'm glad that not everyone's gone.

AUDIENCE: Well, you mean passed out and died from the test?

PROFESSOR: Yeah.

AUDIENCE: OK, no, there were a few students left.

PROFESSOR: That's good, that's good. I think once we release the results, everyone will calm down, and we'll realize that the mistake was on our part.

AUDIENCE: The mistake?

PROFESSOR: I mean the results are lower than what we thought. And it's because we haven't covered something that I will cover today. So we think, well, we talked about it yesterday, and we think we haven't done enough algorithm design with you guys. So today I have problems, and we're going to come up with solutions.

AUDIENCE: Are you not going to tell us what [INAUDIBLE] number is?

PROFESSOR: It's in the lecture notes.
It's actually quite boring. I promise it's really boring.

AUDIENCE: OK.

PROFESSOR: And if you want, we can do that next time, when we'll actually talk about numerics. The thing is, numerics are straightforward. Once you learn the algorithms, you're not going to come up with a new one. I'm pretty sure about that. You're not going to come up with a revolutionary way of adding two numbers.

AUDIENCE: I don't know. You never know.

PROFESSOR: Well, you can tell me why. Think about it, and then you can tell me why you are not going to.

AUDIENCE: OK, got it.

PROFESSOR: OK, so let's start with a problem. We know what sorted arrays look like, right? 1, 3, 5, 6, 7, 9, 12. This is a sorted array. If we're given a sorted array, we know how to find a number in it, right? What's the running time for that?

AUDIENCE: For any given number?

PROFESSOR: Yeah.

AUDIENCE: [INAUDIBLE]

PROFESSOR: Well, what's the best way that we know?

AUDIENCE: Logarithmic.

PROFESSOR: Log N, with binary search, right?
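As a reference point, here is a minimal sketch of the O(log N) lookup in a sorted array. It leans on Python's standard `bisect` module rather than a hand-written loop; the function name `binary_search` and the convention of returning -1 for a missing value are illustrative assumptions, not part of the lecture.

```python
from bisect import bisect_left


def binary_search(a, x):
    """Return an index of x in the sorted list a, or -1 if x is absent."""
    i = bisect_left(a, x)  # leftmost position where x could be inserted, O(log N)
    if i < len(a) and a[i] == x:
        return i
    return -1


print(binary_search([1, 3, 5, 6, 7, 9, 12], 9))  # 5
```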
Well, instead of this, we're given a shifted array. Shifted means that, say, you're shifting the array by K elements. We're taking these guys. So the array is shifted to the left: these K elements end up on the right, and these N minus K elements end up on the left. So: 7, 9, 12, 1, 3, 5, 6. There are still N elements; they are shifted by some number K. And I want to find one number, e, in the array. I don't know K, by the way. We just know that it's a shifted array. This is 12.

So the first thing you do is figure out how much time you have for this, right? If you're on a test, you roughly know how much time you have for a problem. If you're in an interview, you have to figure out how much time the interviewer is willing to give you for the problem. Spend the first, I don't know, third of the time thinking, maybe. Come up with the best solution that you can. And then stop there and start talking. So the first thing you do: you want to make sure that when you run out of time, you have something to say. The most awful thing you can say is, dude, I'm going home now.
Or leave the answer blank. If you leave the answer blank, we're not going to give you points, right? So not good. So what's the worst answer you could give us?

OK, worst is a bad term. What's the brute force solution to this? The solution where we don't care about the running time, but we want the correct answer.

AUDIENCE: Look everywhere.

PROFESSOR: Yep. So do a linear search. Pretend we don't know anything about this array; lose the information that it's a shifted array. Linear search, running time order N. OK, so this is something. At least now, when your time runs out, you have something. You're not going to leave empty-handed.

Let's start thinking now. Next step.

AUDIENCE: [INAUDIBLE] can we shift it again?

PROFESSOR: So the fact that it's shifted means that, well, originally it was sorted, right? But now, instead of being completely sorted, all the elements are shifted to the left. So there's a rotation going on here: these elements fell out of the array, and then they were put back in from the other side.
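A sketch of the shifted input and the brute-force answer, assuming 0-based Python lists. `shift_left` builds the input the way it is described above (the first K elements move to the right end), so with K = 4 it turns 1, 3, 5, 6, 7, 9, 12 into 7, 9, 12, 1, 3, 5, 6; note that under this convention the smallest element lands at index N - K (index 3 in the example). `linear_search` ignores the structure entirely. Both names are hypothetical.

```python
def shift_left(a, k):
    """Rotate a left by k positions: the first k elements end up on the right."""
    k %= len(a)  # assumes a is non-empty
    return a[k:] + a[:k]


def linear_search(a, x):
    """Brute force: check every element, O(N). Returns an index of x or -1."""
    for i, value in enumerate(a):
        if value == x:
            return i
    return -1


shifted = shift_left([1, 3, 5, 6, 7, 9, 12], 4)
print(shifted)                     # [7, 9, 12, 1, 3, 5, 6]
print(linear_search(shifted, 12))  # 2
```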
AUDIENCE: Why do we do that?

PROFESSOR: That's what the input looks like. We don't do it. It was done to us.

AUDIENCE: Oh, OK. So now what do we want?

PROFESSOR: We want to find an element--

AUDIENCE: Oh, I see.

PROFESSOR: --despite the fact that the array looks like this.

AUDIENCE: Do we know that the list that we have is shifted already?

PROFESSOR: Yes. We're promised that this is a shifted array, so it will look like this. But we don't know what K is. If we knew what K is, could we do something fast? What would we do?

AUDIENCE: Yeah, you'd just re-shift [INAUDIBLE].

PROFESSOR: OK, so if you re-shifted, what's the running time?

AUDIENCE: Order K [INAUDIBLE].

PROFESSOR: OK, so if we actually shift and then do a binary search, it's order K plus log N. So for big K's, that's not better.

AUDIENCE: Why is it not order N plus log [INAUDIBLE]?

PROFESSOR: You can say that since we don't have any promise on K, it's N. It's order K if you can shift things out of both ends.
With Python this would be order N: just popping one element off the front of a list is order N in Python. So order K assumes a smart array. Otherwise, if it's Python, good point, it's straight-up order N.

So now, another good point. You have this solution, and you have the brute force solution. They have the same running time. You run out of time. Which one are you going to code up? Which one are you going to show?

AUDIENCE: The simpler one.

PROFESSOR: The simpler one, excellent. The reason is, if you're on a test, it's probably "give the pseudocode, then analyze it." If you're in an interview, the interviewer will ask you: OK, what's the running time? Code it up on the board in C, Java, whatever they know. So you want to code the simplest solution, because that reduces the chance that you'll have bugs, and that gives you the most points. This solution shows more insight, but it doesn't have a better running time, so stick to the simple solution. However, if you have this, then you have some insight into the problem. So you can keep going and hope you can come up with a better answer.
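The re-shift idea can be sketched in Python as follows. It assumes we already know `m`, the 0-based index where the smallest element sits (3 in the 7, 9, 12, 1, 3, 5, 6 example); `m` and the function name are illustrative. The slice `a[m:] + a[:m]` copies the whole list, which is why this is order N in Python even though the binary search itself is only log N.

```python
from bisect import bisect_left


def search_by_unshifting(a, m, x):
    """Undo the rotation, then binary search. O(N) overall because of the copy."""
    n = len(a)
    b = a[m:] + a[:m]      # sorted again; building it copies all N elements
    i = bisect_left(b, x)  # O(log N) on the restored sorted list
    if i < n and b[i] == x:
        return (i + m) % n  # translate back to an index in the shifted list a
    return -1


print(search_by_unshifting([7, 9, 12, 1, 3, 5, 6], 3, 12))  # 2
```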
So if we knew K, one thing we could do is reduce the array to an unshifted array. What's another thing we can do? I claim that if you know K, you can come up with a reasonably easy log N method.

AUDIENCE: If you're doing binary search, you just pretend the array is all together, but you know K. So let's say you're looking for 6. Then you'd say, oh well, I'm going to split the array in half, but you're actually going to start at K and then split it in half. So it's like you pretend that--

PROFESSOR: So what you want to say is, you have a pretend array in your mind, right?

AUDIENCE: Yes. It's [INAUDIBLE] by K.

PROFESSOR: And you want to access the middle element to see if what you're looking for is bigger or smaller. Instead of looking at the middle element here, you look at the middle plus K, right?

AUDIENCE: Yes. Oh, there you go. Plus K.

PROFESSOR: This is one way of doing it; good running time. The problem is, it's hard. You'll have to rewrite binary search and hope it works.
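The "pretend array" version can be sketched as a binary search over the logical positions 0..N-1 of the unshifted array, reading logical position i from index (i + m) mod N of the real array. Here `m` is assumed to be the 0-based index of the smallest element (what the lecture calls K at this point), and the modulo is the wrap-around that "middle plus K" needs once it runs past the end of the array. Names are illustrative.

```python
def search_with_offset(a, m, x):
    """Binary search a rotated sorted list without copying it. O(log N)."""
    n = len(a)
    lo, hi = 0, n - 1  # bounds on the logical (unshifted) positions
    while lo <= hi:
        mid = (lo + hi) // 2
        j = (mid + m) % n  # where logical position mid lives in the real list
        if a[j] == x:
            return j
        if a[j] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


print(search_with_offset([7, 9, 12, 1, 3, 5, 6], 3, 9))  # 1
```

The index translation on the `j = ...` line is exactly where an off-by-one tends to creep in, which is the risk the professor points out with rewriting binary search.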
What I would do, given that I've had a bit of time to think about it, is this: this part is sorted, and this part is sorted. So two binary searches are also going to take log N time. Two binary searches, two lines of pseudocode. The running time analysis is pretty simple. Correctness is also pretty simple. And this also gives me some insight into the rest of the problem, I claim.

OK, so if we have K, we can do log N. What if we don't have K? What do we do? Yes?

AUDIENCE: Figure out what K is.

PROFESSOR: All right, let's try to find K. We know how to do it if we have K, so let's try to find K. If I want to arrive at a solution that's log N, how much time can I spend on finding K? OK, so let's find K in log N time.

AUDIENCE: Binary search for the minimum?

PROFESSOR: Binary search. So I like binary search, because binary search is an algorithm that runs on an array, and it runs in log N time. So if I'm able to make it work, I know everything's going to be right in terms of time. So what do you run a binary search for?

AUDIENCE: The smallest number possible?
I guess that's kind of going through all of them, though. It doesn't really help.

PROFESSOR: So if you have the min... Sorry, you can speak in one second.

AUDIENCE: Oh, we have the min!

PROFESSOR: No, you can't just have the min. But I think it's good insight. If you knew where the min is, you'd know this is K, right?

AUDIENCE: Yes.

PROFESSOR: So this is the minimum; that's K. OK, what were you going to say?

AUDIENCE: Oh, for just binary search, it would not [INAUDIBLE] minimum. I was thinking that if we start at 1, we will look to our right and left. And the point where [INAUDIBLE] are ending is where we have something larger to our right and something smaller to our left.

PROFESSOR: OK, so there's a discontinuity here; that's what you're saying, right? This is sorted, but then at this point it breaks.

AUDIENCE: Yes. We are kind of finding that point where something to the right is greater than [INAUDIBLE] and something to the left is also greater than [INAUDIBLE].

PROFESSOR: OK, so let's see if we can do that.
For binary search, you have to go somewhere. In our case, we're trying to get K, right? And we know that it's somewhere between 1 and N. What binary search does is make a guess. It says, hey, I think it's in the middle of the array, so it will probably guess N over 2. It makes a guess, and you have to tell it: was the guess too small, or was the guess too large? Because this is what allows you to recurse on either the left interval or the right interval. The problem with a discontinuity is, if I guess here, and if I guess here, I still don't see the discontinuity. So it's good insight, but it's not enough. I need a little bit more. Yes? Two, three hands. Oh wow, you guys don't have it?

AUDIENCE: So I think we can arbitrarily take the halfway point and subtract the first element from it. Then, if it's a negative number, the discontinuity will be in this half. If it's [INAUDIBLE], it will be in the other half. And then you recurse on that.

PROFESSOR: OK. So let's draw this up. In a sorted array, the numbers look like this.
In a shifted array, we splice it here, and this piece goes to the right. So it's like this, and then like this. This picture shows me the insight that I had before: this part is sorted, and this part is sorted. The missing part, which I just heard now, is that since the whole array was originally sorted, this piece is smaller than this piece. So I can draw a horizontal line somewhere such that neither this part nor this part crosses it. This whole piece is taller than this one.

By the way, K was where the discontinuity was, right? You said discontinuity. This is K; it's somewhere here. So this is better. If I make my guess and I land somewhere here, I know my guess is too big, because it's below the line. If I make my guess and it's somewhere here, I know my guess is too small, because the number that I see here is above the line. Who sets the line? The first element here.

So this is how you look at it graphically. If you don't want to look at it graphically: this was a sorted array.
302 00:13:08,820 --> 00:13:12,960 If this is the Kth element, then everything 303 00:13:12,960 --> 00:13:14,950 here is smaller than it. 304 00:13:14,950 --> 00:13:18,470 So all these guys are smaller than the first element. 305 00:13:24,180 --> 00:13:26,780 OK so honestly who understands the solution? 306 00:13:29,360 --> 00:13:31,432 3, 4, OK. 307 00:13:31,432 --> 00:13:34,230 Oh, OK pretty good. 308 00:13:34,230 --> 00:13:36,380 Do we want to code this up, or do we 309 00:13:36,380 --> 00:13:39,019 want to look at another problem? 310 00:13:39,019 --> 00:13:40,685 OK who wants to look at another problem? 311 00:13:43,900 --> 00:13:45,560 Clear majority, all right. 312 00:13:45,560 --> 00:13:46,980 Usually I have to do both choices, 313 00:13:46,980 --> 00:13:49,780 because not enough people are paying attention to get this. 314 00:13:49,780 --> 00:13:51,560 So I am happy. 315 00:13:51,560 --> 00:13:53,515 AUDIENCE: [INAUDIBLE]. 316 00:13:53,515 --> 00:13:54,140 PROFESSOR: Yes. 317 00:13:59,870 --> 00:14:02,120 All right so before I start another problem, one thing 318 00:14:02,120 --> 00:14:03,130 I want to say. 319 00:14:03,130 --> 00:14:05,186 Not only do I have a solution for this problem, 320 00:14:05,186 --> 00:14:06,560 but I have a process that allowed 321 00:14:06,560 --> 00:14:11,040 me to go from nothing to a few partial solutions. 322 00:14:11,040 --> 00:14:13,230 And while I was doing that, I was getting insight 323 00:14:13,230 --> 00:14:16,300 and I was making sure that if I run out of time before I have 324 00:14:16,300 --> 00:14:20,670 the final solution, I don't walk out of the room empty handed. 325 00:14:20,670 --> 00:14:23,000 So I don't just want to show you the final solution, 326 00:14:23,000 --> 00:14:24,630 I want to show you the process. 327 00:14:24,630 --> 00:14:26,838 You can look at the notes and see the final solution. 328 00:14:26,838 --> 00:14:30,860 That's not everything I want you to get out of this. 
OK, problem 2 has a heap. This is a minimum heap, so it looks like this. So this is a min heap with N elements. And I want to extract the Kth smallest element in the heap. So if K equals 3, this is the third smallest element, right? K equals 4, it's this guy. 5 and 6; 1 and 2 are here.

OK, the good running time that we want (because this is a hard problem, we give you the running time) is K log K. However, before we do that, I want to hear some brute force solutions.

AUDIENCE: All of them.

PROFESSOR: And? OK, you need to sort them first. This heap is actually an array, right? 2, 5, 7, 6, 8, oh, it's 6, 9, 8, sorry. So you're saying sort the array, then K--

AUDIENCE: [INAUDIBLE].

PROFESSOR: OK, what's the running time for this?

AUDIENCE: N log N.

PROFESSOR: All right, we have a solution. We're not going to leave empty-handed. OK, let's try to do a bit better. What's another way of doing it that'll give me a better running time?

AUDIENCE: You could pop K elements off of the--

PROFESSOR: All right, so this is a min heap, right?
So it has extract-min, and extract-min runs in order log N. So if I call it K times, I'm going to get the K smallest elements. By the way, heap sort says pop N times, and you'll have all the elements in sorted order. So we're doing a heap sort, except we stop when we lose interest after K elements. So we're down from N log N to K log N.

I would also be interested in hearing a solution that's worse, one that would look like N log K, but that shows me more insight. By the way, this is good. You're already at K log N, and the correct answer is K log K. Small difference, right? It's a logarithmic factor; at least it's not an N factor. If you code this up, chances are we're not going to be able to distinguish between the two. So you'll never see this on a PSet. So you're almost there. And this is just applying straight-up knowledge that we had before. Let's look at this solution, if anyone sees it, before we attempt K log K.

AUDIENCE: In another case, why wouldn't we just pop off the first K elements? Why would that be K log N?
Because it's actually an array, so I would think that we'd just take K time.

PROFESSOR: This is a heap. If you don't maintain the heap invariant after you do the first pop, you're not going to be able to do the second one. OK, cool.

So let me give you a hint. If this is an array, how do I find the minimum? 2, 5, 7... did I forget something? No. Let's pretend this array doesn't start with 2, because it's boring if it starts with 2. How do I find the minimum? I keep one variable that says the best I've seen so far, right? Let's see, N... oh, this is still boring. Let's start here. So we start with best seen equals 7. Then when we go to 6, we ask: is 6 better than best seen? If so, replace best seen with 6. If not, keep going. Then I get to 9. Is 9 better than best seen? Nope, keep going. Is 8 better than best seen? Nope, keep going. So I compare every element with best seen, and whenever the element is better, I do a replacement.
And then at the end, best seen will have the smallest element. So this algorithm works for K equals 1, which isn't very useful. So can we generalize this somehow? We have a running time here. That might give you a hint about how we want to generalize it, and I want to generalize it for all values of K.

AUDIENCE: If you go to the nearest power of 2 less than K--

PROFESSOR: OK.

AUDIENCE: --that element, and iterate forward with your best seen. Does that make sense? If you want the Kth, if you want the tenth smallest element, then it has to be after the 8th row, because it's the next level in the tree. Does that make sense? That doesn't make sense.

PROFESSOR: It makes sense, but I don't think it's right. So you're thinking that the tenth smallest element has to be somewhere below, right?

AUDIENCE: Yeah.

PROFESSOR: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and then pretend that there are numbers here: 11, 12, 13, 14, 15. Do you see what I'm saying? So this is a heap.
So I can keep filling it with bigger elements, and 10 is here.

However, you can do something else to limit the size of the heap. It will give us a different running time, but you can do something. You can think about it. How can you chop off some of the heap? For example, if I have a heap that's ten levels deep and I'm looking for the fourth element, what can I do? You can think about that. And let's try to get to this. So I'll accept an answer for either how we limit the heap in that case, or how we generalize this algorithm. Yes?

AUDIENCE: You want to remember the smallest K elements; you'd make a max heap [INAUDIBLE] K [INAUDIBLE] [INAUDIBLE].

PROFESSOR: OK, so I'll break down your solution into parts. You want to have a bag of the smallest K elements, right? So instead of the best seen, you want to have the K best seen. Once you have a bag, you go through your elements, and if you have something that's better than what you have in the bag, you put it in the bag.
465 00:21:52,710 --> 00:21:55,100 Suppose I have, suppose K equals 3, 466 00:21:55,100 --> 00:21:59,310 and I have 2, 5, and 7 in the bag. 467 00:21:59,310 --> 00:22:00,820 And I see 6. 468 00:22:00,820 --> 00:22:03,120 Who do I want to compare it with? 469 00:22:03,120 --> 00:22:05,700 The biggest thing in the bag, right? 470 00:22:05,700 --> 00:22:09,790 So if I want the K smallest elements, 471 00:22:09,790 --> 00:22:11,830 if this guy is smaller than the biggest thing in the bag, 472 00:22:11,830 --> 00:22:14,830 these aren't the K smallest elements anymore. 473 00:22:14,830 --> 00:22:17,500 So I want to take the maximum in the bag, 474 00:22:17,500 --> 00:22:20,160 compare it with what I'm seeing right now, 475 00:22:20,160 --> 00:22:23,530 and if what I'm seeing is smaller I want to replace it. 476 00:22:29,704 --> 00:22:30,620 AUDIENCE: [INAUDIBLE]. 477 00:22:30,620 --> 00:22:32,700 PROFESSOR: Because I keep doing this maximum, 478 00:22:32,700 --> 00:22:34,400 I keep asking this maximum question, 479 00:22:34,400 --> 00:22:35,490 this has to be a max heap. 480 00:22:35,490 --> 00:22:36,882 That's why he said max heap. 481 00:22:41,610 --> 00:22:43,900 So you did all these steps at once 482 00:22:43,900 --> 00:22:45,450 and then gave me the final answer. 483 00:22:45,450 --> 00:22:47,230 But this is how you do it step by step. 484 00:22:47,230 --> 00:22:52,810 So it looks like finding the minimum element, 485 00:22:52,810 --> 00:22:54,450 except you have a bag. 486 00:22:54,450 --> 00:22:56,590 And that bag has to be a maximum heap. 487 00:22:56,590 --> 00:22:58,380 And the original heap is a minimum heap. 488 00:22:58,380 --> 00:23:00,338 So the fact that you have to use a maximum heap 489 00:23:00,338 --> 00:23:01,960 is a bit nontrivial. 490 00:23:01,960 --> 00:23:03,020 Good answer. 491 00:23:03,020 --> 00:23:06,100 All right so we have K log N, and we 492 00:23:06,100 --> 00:23:10,270 have N log K, so choose what you want to have in your log.
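A minimal Python sketch of the "bag" approach, assuming the input is a list of numbers and 1 ≤ K ≤ N. The function name `kth_smallest_bag` is hypothetical. Python's `heapq` module only implements a min heap, so this sketch simulates the max heap by storing negated values, which works for numbers but not for arbitrary comparable objects.

```python
import heapq


def kth_smallest_bag(values, k):
    """Return the kth smallest number in `values` (k counts from 1).

    Keeps a "bag" of the k smallest numbers seen so far in a max heap.
    heapq is a min heap, so the bag stores negated numbers: -bag[0] is
    the largest number currently in the bag. Runs in O(N log K) time.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    bag = []
    for x in values:
        if len(bag) < k:
            # Bag not full yet: everything goes in.
            heapq.heappush(bag, -x)
        elif x < -bag[0]:
            # x beats the biggest thing in the bag, so it replaces it.
            heapq.heapreplace(bag, -x)
    # The biggest of the k smallest numbers is the kth smallest.
    return -bag[0]


# The example from the recitation: K = 3, the bag holds 2, 5, 7, then we see 6.
print(kth_smallest_bag([2, 5, 7, 6], 3))  # 6
```

This scan never relies on the input being heap-ordered, which is why it costs N log K instead of the K log N of repeated extract-mins.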
493 00:23:10,270 --> 00:23:11,410 We have a solution for you. 494 00:23:14,530 --> 00:23:15,680 How about this. 495 00:23:15,680 --> 00:23:16,770 How are we doing here? 496 00:23:20,480 --> 00:23:22,690 So suppose I'm looking for the fourth smallest element, 497 00:23:22,690 --> 00:23:25,120 and my heap has 10 levels. 498 00:23:25,120 --> 00:23:28,050 How can I chop it-- how can I reduce 499 00:23:28,050 --> 00:23:29,650 the number of things I'm looking at? 500 00:23:37,330 --> 00:23:40,230 AUDIENCE: You can reduce it down to 4 levels. 501 00:23:40,230 --> 00:23:43,550 PROFESSOR: I can reduce it down to 4 levels, exactly. 502 00:23:43,550 --> 00:23:48,640 So this heap has log N levels. 503 00:23:48,640 --> 00:23:52,490 And if my K is smaller than log N, I can reduce the heap down 504 00:23:52,490 --> 00:23:55,700 to K levels and discard everything below. 505 00:23:59,150 --> 00:24:03,310 And the reason for that is we have a min heap, right? 506 00:24:03,310 --> 00:24:07,690 So if we go down from, on any path from the root to a leaf, 507 00:24:07,690 --> 00:24:11,150 the values have to increase, right? 508 00:24:11,150 --> 00:24:12,520 Otherwise it's not a min heap. 509 00:24:12,520 --> 00:24:15,290 Otherwise there's an invariant violation somewhere there. 510 00:24:15,290 --> 00:24:21,650 So as I go down on any path my numbers are going up. 511 00:24:21,650 --> 00:24:26,615 So these are all the paths of length 4. 512 00:24:26,615 --> 00:24:29,370 All of them have to go through here. 513 00:24:29,370 --> 00:24:32,320 All the paths of length 4 will stop here. 514 00:24:32,320 --> 00:24:34,450 So I know that everything here has 515 00:24:34,450 --> 00:24:37,210 to be bigger than the first 4 elements. 516 00:24:37,210 --> 00:24:41,330 So if I reduce this to K and I discard everything else, 517 00:24:41,330 --> 00:24:44,630 what's the running time?
518 00:24:44,630 --> 00:24:47,150 So if I use my extract-min algorithm from before, 519 00:24:47,150 --> 00:24:49,389 what was the running time? 520 00:24:49,389 --> 00:24:51,361 AUDIENCE: It was [INAUDIBLE]. 521 00:24:56,790 --> 00:24:58,300 PROFESSOR: So it's not the-- 522 00:24:58,300 --> 00:25:01,085 AUDIENCE: Oh sorry it's the-- 523 00:25:01,085 --> 00:25:02,960 PROFESSOR: So what's one operation in a heap? 524 00:25:02,960 --> 00:25:06,420 If I know the height of a heap, what does an operation cost? 525 00:25:06,420 --> 00:25:08,604 How much time does it take to do one operation 526 00:25:08,604 --> 00:25:10,270 as a function of the height of the heap? 527 00:25:13,940 --> 00:25:17,010 So if my heap has h levels, in this case 528 00:25:17,010 --> 00:25:20,840 h happens to be log N, it's order h. 529 00:25:20,840 --> 00:25:24,090 So if I reduce it-- I'm not reducing it from N to K. 530 00:25:24,090 --> 00:25:24,950 I wish I could. 531 00:25:24,950 --> 00:25:26,800 I'm reducing it from log N to K. 532 00:25:26,800 --> 00:25:30,670 So for really tiny K's, this becomes order K. 533 00:25:30,670 --> 00:25:38,410 And my total running time is K squared. 534 00:25:38,410 --> 00:25:42,230 So I'm going to do K operations, K extract mins. 535 00:25:42,230 --> 00:25:45,730 OK now the reason I wanted to entertain this is I 536 00:25:45,730 --> 00:25:48,970 claim it's going to be useful to help us find the answer. 537 00:25:48,970 --> 00:25:52,630 So everything that we have here gives us some insight 538 00:25:52,630 --> 00:25:56,350 into what the correct answer is. 539 00:25:56,350 --> 00:25:57,900 Well, what our correct answer is. 540 00:25:57,900 --> 00:26:01,040 There might be others. 541 00:26:01,040 --> 00:26:04,350 So let's think for a bit, and see if we can do better. 542 00:26:13,220 --> 00:26:14,678 Am I covering something? 543 00:26:14,678 --> 00:26:15,674 I hope not.
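A minimal sketch of the truncation idea, assuming the heap is a 0-indexed Python list in `heapq` order (children of index j at 2j+1 and 2j+2, rather than the 1-indexed 2j and 2j+1 used for the pseudocode in this recitation). The name `kth_smallest_truncated` is hypothetical. Levels 0 through K-1 occupy the first 2^K - 1 slots, and any prefix of an array-backed heap is itself a valid heap, so K-1 extract-mins on that prefix leave the Kth smallest at the root.

```python
import heapq


def kth_smallest_truncated(heap, k):
    """Return the kth smallest element of a min heap stored as a list.

    An element below level k - 1 has at least k ancestors, none larger
    than it, so the k smallest all sit in the top k levels, i.e. in
    heap[:2**k - 1]. Each extract-min on that prefix costs O(k).
    """
    if not 1 <= k <= len(heap):
        raise ValueError("k must satisfy 1 <= k <= len(heap)")
    # A prefix of an array-backed heap is still a heap. Slicing copies it,
    # which costs O(min(N, 2**k)); a heap that tracks its own size could
    # shrink in place instead, at the price of destroying the original.
    top = heap[: 2 ** k - 1]
    for _ in range(k - 1):
        heapq.heappop(top)  # remove the 1st, 2nd, ..., (k-1)th smallest
    return top[0]
```

The O(K²) from the discussion counts only the K extract-mins on a heap of height K, and it only beats K log N when K is smaller than log N, which is exactly the case where truncating helps.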
544 00:26:27,321 --> 00:26:29,404 So by the way, when you have problems on your own, 545 00:26:29,404 --> 00:26:31,470 say you are looking at CLRS or at old exams, 546 00:26:31,470 --> 00:26:35,270 you want to give yourselves half an hour, an hour to think. 547 00:26:35,270 --> 00:26:37,027 And just this process alone is going 548 00:26:37,027 --> 00:26:38,360 to help you do better on a test. 549 00:26:38,360 --> 00:26:40,290 Because while you're thinking you're going through everything 550 00:26:40,290 --> 00:26:40,850 you know. 551 00:26:40,850 --> 00:26:43,050 And you're rearranging stuff in your brain 552 00:26:43,050 --> 00:26:45,336 in a way that will be easier to access it later. 553 00:26:45,336 --> 00:26:47,710 So now you're going to think, what do I know about heaps? 554 00:26:47,710 --> 00:26:50,640 What do I know that takes log N time? 555 00:26:50,640 --> 00:26:52,380 What do I know that takes N log N time? 556 00:26:52,380 --> 00:26:55,340 And your brain will be better at answering 557 00:26:55,340 --> 00:26:57,129 these kinds of questions later. 558 00:26:57,129 --> 00:26:58,920 Now we're not going to give you 30 minutes, 559 00:26:58,920 --> 00:27:01,860 because that would make us run out of time. 560 00:27:05,129 --> 00:27:08,047 AUDIENCE: You want to reduce it down to K elements. 561 00:27:08,047 --> 00:27:09,630 PROFESSOR: I want to only have to look 562 00:27:09,630 --> 00:27:11,650 at K elements, that's good. 563 00:27:11,650 --> 00:27:14,160 AUDIENCE: Otherwise you can't plug K into the search. 564 00:27:14,160 --> 00:27:16,640 PROFESSOR: Yep, OK so that's good. 565 00:27:16,640 --> 00:27:19,630 AUDIENCE: Which is interesting, because it's K log K, 566 00:27:19,630 --> 00:27:21,130 and that kind of suggests that you'd 567 00:27:21,130 --> 00:27:23,420 have K elements in the tree. 568 00:27:23,420 --> 00:27:27,860 But then you're searching for each one in the tree. 
569 00:27:27,860 --> 00:27:29,440 PROFESSOR: So maybe I'm not going 570 00:27:29,440 --> 00:27:32,430 to be able to cut this heap into K elements, right? 571 00:27:32,430 --> 00:27:34,886 I'll have to do a bit more. 572 00:27:40,298 --> 00:27:42,758 AUDIENCE: Can you cut this heap into K elements 573 00:27:42,758 --> 00:27:48,220 and use that heap to do our [INAUDIBLE]? 574 00:27:48,220 --> 00:27:50,080 PROFESSOR: Let's see how we'd cut this heap. 575 00:27:50,080 --> 00:27:52,090 First off, let's see what this heap 576 00:27:52,090 --> 00:27:53,940 would look like if it's cut. 577 00:27:53,940 --> 00:27:58,080 How do we find the first K elements here? 578 00:27:58,080 --> 00:28:00,518 How do we find the first element? 579 00:28:00,518 --> 00:28:03,362 AUDIENCE: [INAUDIBLE]. 580 00:28:03,362 --> 00:28:06,690 PROFESSOR: It's the root. 581 00:28:06,690 --> 00:28:08,973 Second element. 582 00:28:08,973 --> 00:28:11,740 AUDIENCE: [INAUDIBLE]. 583 00:28:11,740 --> 00:28:13,090 PROFESSOR: What do I look at? 584 00:28:13,090 --> 00:28:16,000 If I want to select the second element in a heap, 585 00:28:16,000 --> 00:28:18,220 how many elements do I have to look at? 586 00:28:18,220 --> 00:28:22,940 Two, 5 and 7, because everything below will be bigger, right? 587 00:28:22,940 --> 00:28:26,180 OK I look at them, I compare them, 588 00:28:26,180 --> 00:28:29,230 I know 5 is the smallest one. 589 00:28:29,230 --> 00:28:31,990 Now suppose I want to find the third element. 590 00:28:31,990 --> 00:28:32,920 Who do I look at? 591 00:28:39,120 --> 00:28:40,640 7 or the things under 5. 592 00:28:40,640 --> 00:28:42,630 So 7 is still in the race for sure. 593 00:28:42,630 --> 00:28:47,420 And then I have to look at the children of 5. 594 00:28:54,110 --> 00:28:57,670 Right now we're looking at 3. 595 00:28:57,670 --> 00:29:02,770 Suppose this has some really large kids. 596 00:29:02,770 --> 00:29:04,050 As in numbers.
597 00:29:04,050 --> 00:29:07,810 And I find that this is the third element. 598 00:29:07,810 --> 00:29:09,850 Who do I look at for the fourth element? 599 00:29:13,450 --> 00:29:15,840 AUDIENCE: [INAUDIBLE]. 600 00:29:15,840 --> 00:29:20,660 PROFESSOR: OK so this isn't in the race 601 00:29:20,660 --> 00:29:22,360 anymore, because it's the third. 602 00:29:22,360 --> 00:29:25,010 The fourth has to be either of these two guys. 603 00:29:25,010 --> 00:29:26,390 Or the kids here, right? 604 00:29:29,690 --> 00:29:31,840 And it happens to be 7. 605 00:29:31,840 --> 00:29:34,460 So I take it out. 606 00:29:34,460 --> 00:29:40,540 If I want to look at the-- if I want to find the next element, 607 00:29:40,540 --> 00:29:41,660 who's in the race? 608 00:29:41,660 --> 00:29:43,968 This guy gets out of the race. 609 00:29:43,968 --> 00:29:44,966 AUDIENCE: 7's kids. 610 00:29:48,958 --> 00:29:49,960 [INAUDIBLE] 611 00:29:49,960 --> 00:29:52,610 PROFESSOR: OK so we have something. 612 00:29:52,610 --> 00:29:54,640 We're not really cutting up the heap, 613 00:29:54,640 --> 00:29:57,230 but we are sort of computing where the blade would 614 00:29:57,230 --> 00:30:02,110 go if we wanted to cut it up into K elements and N minus K 615 00:30:02,110 --> 00:30:03,950 elements. 616 00:30:03,950 --> 00:30:06,990 Does this make some sense? 617 00:30:06,990 --> 00:30:08,371 Nods, no nods. 618 00:30:08,371 --> 00:30:09,912 AUDIENCE: I mean I guess you're never 619 00:30:09,912 --> 00:30:14,530 going to be going down farther than K. 620 00:30:14,530 --> 00:30:16,610 PROFESSOR: So let's just understand the concept. 621 00:30:16,610 --> 00:30:18,318 And then we're going to do one more pass, 622 00:30:18,318 --> 00:30:21,070 write pseudocode, and understand the running time. 623 00:30:21,070 --> 00:30:23,122 Because this is still confusing, right? 624 00:30:23,122 --> 00:30:24,580 We'll need one more pass, otherwise 625 00:30:24,580 --> 00:30:27,300 we can't write the pseudocode for it.
626 00:30:27,300 --> 00:30:31,595 So does the concept make sense? 627 00:30:31,595 --> 00:30:33,455 AUDIENCE: Is that K log K? 628 00:30:33,455 --> 00:30:34,080 PROFESSOR: Yes. 629 00:30:37,560 --> 00:30:40,960 So the idea here is that I have a horizon that says, 630 00:30:40,960 --> 00:30:44,320 what are the next elements that I'm willing to consider? 631 00:30:44,320 --> 00:30:46,730 And first the horizon starts with just the root, 632 00:30:46,730 --> 00:30:48,920 because I know that's the minimum element. 633 00:30:48,920 --> 00:30:50,890 And when I take an element out of the horizon 634 00:30:50,890 --> 00:30:52,970 I put in its children. 635 00:30:52,970 --> 00:30:56,160 That's what I did all the time. 636 00:30:56,160 --> 00:30:59,160 So given a horizon, how do I know which element is the next one 637 00:30:59,160 --> 00:31:01,536 to extract out of the horizon? 638 00:31:01,536 --> 00:31:03,000 AUDIENCE: [INAUDIBLE]. 639 00:31:03,000 --> 00:31:04,720 PROFESSOR: The min, OK. 640 00:31:04,720 --> 00:31:06,620 So I want a data structure for the horizon 641 00:31:06,620 --> 00:31:09,490 that can extract mins quickly. 642 00:31:09,490 --> 00:31:13,030 OK what am I going to use for the horizon? 643 00:31:13,030 --> 00:31:15,490 A min heap, excellent. 644 00:31:15,490 --> 00:31:17,405 So let's try to go for pseudocode. 645 00:31:26,030 --> 00:31:30,694 Suppose we have H as our original heap. 646 00:31:30,694 --> 00:31:31,527 So H is a min heap. 647 00:31:34,150 --> 00:31:36,300 We will make Z be our horizon. 648 00:31:36,300 --> 00:31:37,279 I can't use H again. 649 00:31:37,279 --> 00:31:38,820 It would be nice if I could, but I'll 650 00:31:38,820 --> 00:31:41,610 use Z because Z's also a letter in horizon. 651 00:31:41,610 --> 00:31:44,910 So Z's a min heap. 652 00:31:44,910 --> 00:31:48,700 And then first I will insert into Z. 653 00:31:48,700 --> 00:31:51,620 I'll insert the heap's root, right? 654 00:31:51,620 --> 00:31:55,910 So Z dot insert H of 1.
655 00:31:55,910 --> 00:31:58,220 Remember that heaps are actually arrays. 656 00:31:58,220 --> 00:31:59,220 I hinted at this earlier. 657 00:31:59,220 --> 00:32:04,740 So these nodes are elements in an array. 658 00:32:04,740 --> 00:32:09,550 So this is the first element, second, third, fourth, fifth, 659 00:32:09,550 --> 00:32:11,970 sixth. 660 00:32:11,970 --> 00:32:15,130 So we're using array-backed heaps, and H of 1 661 00:32:15,130 --> 00:32:16,670 is going to be the root. 662 00:32:16,670 --> 00:32:21,120 Then I'm going to compute the first K elements like this, 663 00:32:21,120 --> 00:32:29,380 for K in range-- sorry, for i in range K, 664 00:32:29,380 --> 00:32:33,480 so i is going to go from 1 to K. 665 00:32:33,480 --> 00:32:34,280 What do I want to do? 666 00:32:38,590 --> 00:32:41,420 Take, compute the ith element. 667 00:32:41,420 --> 00:32:43,508 How do I do that? 668 00:32:43,508 --> 00:32:46,190 AUDIENCE: Extract min. 669 00:32:46,190 --> 00:32:48,346 PROFESSOR: i equals Z dot extract min. 670 00:32:57,010 --> 00:32:59,335 And then I want to insert the children into the horizon. 671 00:33:02,170 --> 00:33:04,290 Right? 672 00:33:04,290 --> 00:33:05,075 How do I do that? 673 00:33:08,664 --> 00:33:11,030 AUDIENCE: 2i and 2i plus 1. 674 00:33:11,030 --> 00:33:16,110 PROFESSOR: OK so this is if I know the index, right? 675 00:33:19,450 --> 00:33:21,880 When I'm putting things in the heap 676 00:33:21,880 --> 00:33:23,520 the keys are going to be the values, 677 00:33:23,520 --> 00:33:25,936 so that I can take out the minimum. 678 00:33:25,936 --> 00:33:30,460 AUDIENCE: [INAUDIBLE] heap first and then inserted H Y. 679 00:33:30,460 --> 00:33:33,370 PROFESSOR: Yeah, OK. 680 00:33:33,370 --> 00:33:36,800 This is empty. 681 00:33:36,800 --> 00:33:38,000 And this is the input. 682 00:33:45,560 --> 00:33:49,050 OK so I need to use the numbers as the keys.
683 00:33:49,050 --> 00:33:51,440 So when I extract something out of the heap, 684 00:33:51,440 --> 00:33:53,870 so when I extract the first element it's going to say 2, 685 00:33:53,870 --> 00:33:56,770 it's not going to say 1. 686 00:33:56,770 --> 00:33:58,137 If I want to-- 687 00:33:58,137 --> 00:34:02,790 AUDIENCE: Why wouldn't Z dot extract [INAUDIBLE] because-- 688 00:34:02,790 --> 00:34:07,600 PROFESSOR: So this will give me the next key in the horizon. 689 00:34:07,600 --> 00:34:11,550 AUDIENCE: But-- oh I see, you're starting out 690 00:34:11,550 --> 00:34:13,150 with just the first one. 691 00:34:13,150 --> 00:34:14,439 PROFESSOR: Yeah. 692 00:34:14,439 --> 00:34:15,409 AUDIENCE: Oh and then you want to add in the next. 693 00:34:15,409 --> 00:34:17,283 PROFESSOR: So at the end of this whole thing, 694 00:34:17,283 --> 00:34:25,639 if I'm extracting them right, I can return this variable here. 695 00:34:25,639 --> 00:34:29,190 Because after K iterations this is going to be the Kth element. 696 00:34:29,190 --> 00:34:31,250 So I return it and I'm done. 697 00:34:31,250 --> 00:34:37,520 The problem is I want this guy's index too, right? 698 00:34:37,520 --> 00:34:40,739 So I can't just store the key in the heap. 699 00:34:40,739 --> 00:34:43,020 I have to augment the heap to let me store values. 700 00:34:43,020 --> 00:34:45,340 And I have to store the index. 701 00:34:45,340 --> 00:34:49,090 So for this guy we would have Z insert H of 1, 702 00:34:49,090 --> 00:34:51,530 and then its index, 1. 703 00:34:51,530 --> 00:34:56,300 Then when I get out the ith element 704 00:34:56,300 --> 00:34:58,040 I'll also get out its index. 705 00:34:58,040 --> 00:35:00,144 A variable name for that? 706 00:35:00,144 --> 00:35:01,130 AUDIENCE: j. 707 00:35:01,130 --> 00:35:02,930 PROFESSOR: j. 708 00:35:02,930 --> 00:35:07,770 OK why would you name your variables like this? 709 00:35:07,770 --> 00:35:10,640 In the previous section I had a similar suggestion, i, i.
710 00:35:10,640 --> 00:35:14,680 So why would you name your variables like this? 711 00:35:14,680 --> 00:35:16,530 AUDIENCE: [INAUDIBLE]. 712 00:35:16,530 --> 00:35:17,530 PROFESSOR: Job security. 713 00:35:20,750 --> 00:35:21,930 All right. 714 00:35:21,930 --> 00:35:23,540 So it's OK here. 715 00:35:23,540 --> 00:35:26,580 Try not to do that when doing an exam or an interview, 716 00:35:26,580 --> 00:35:28,184 because it reflects poorly on you. 717 00:35:28,184 --> 00:35:30,350 For an interview and for an exam you'll get us upset 718 00:35:30,350 --> 00:35:32,800 and we might be less lenient. 719 00:35:32,800 --> 00:35:34,940 Or at least explain what you're doing. 720 00:35:34,940 --> 00:35:38,390 So extract min is going to give us the key. 721 00:35:38,390 --> 00:35:40,830 And it's going to give us its index in the heap. 722 00:35:40,830 --> 00:35:41,970 What do we do afterwards? 723 00:35:46,052 --> 00:35:48,870 AUDIENCE: We add to [INAUDIBLE] H of-- 724 00:35:48,870 --> 00:35:53,430 PROFESSOR: All right so when we take out 2-- 725 00:35:53,430 --> 00:35:55,430 so we start out with a horizon of 2. 726 00:35:55,430 --> 00:35:58,610 So 2's the only thing that's in the horizon 727 00:35:58,610 --> 00:35:59,110 at first. 728 00:35:59,110 --> 00:36:03,540 Then we take it out and its two children get in the horizon. 729 00:36:03,540 --> 00:36:06,300 Then we take out one of the children 730 00:36:06,300 --> 00:36:09,564 and put its children in the horizon. 731 00:36:09,564 --> 00:36:10,980 So when we take out a node we want 732 00:36:10,980 --> 00:36:13,400 to put its children in the horizon. 733 00:36:13,400 --> 00:36:15,790 So we're going to say Z dot-- 734 00:36:15,790 --> 00:36:17,640 AUDIENCE: Insert. 735 00:36:17,640 --> 00:36:19,740 PROFESSOR: Insert. 736 00:36:19,740 --> 00:36:22,800 What do I insert? 737 00:36:22,800 --> 00:36:26,055 AUDIENCE: H of i times 2. 738 00:36:26,055 --> 00:36:27,406 j times 2.
739 00:36:27,406 --> 00:36:28,030 PROFESSOR: See? 740 00:36:28,030 --> 00:36:31,330 It's working already. 741 00:36:31,330 --> 00:36:33,010 The job security thing is working. 742 00:36:33,010 --> 00:36:35,206 And? 743 00:36:35,206 --> 00:36:36,630 AUDIENCE: 2j plus 1. 744 00:36:40,940 --> 00:36:42,825 You have to do two lines. 745 00:36:42,825 --> 00:36:43,700 PROFESSOR: OK, sweet. 746 00:36:57,990 --> 00:36:59,920 OK. 747 00:36:59,920 --> 00:37:01,214 Does this work? 748 00:37:01,214 --> 00:37:03,380 I mean does this do what we wanted it to do earlier? 749 00:37:08,260 --> 00:37:11,188 AUDIENCE: Wait, we're extracting oh-- 750 00:37:16,304 --> 00:37:17,595 PROFESSOR: All right first nod. 751 00:37:21,710 --> 00:37:24,146 AUDIENCE: I mean if K is small enough. 752 00:37:24,146 --> 00:37:29,940 Eventually you'll ask for something that is out of range. 753 00:37:29,940 --> 00:37:32,700 PROFESSOR: Oh so you're thinking that eventually these 754 00:37:32,700 --> 00:37:33,617 will run out of range. 755 00:37:33,617 --> 00:37:35,616 AUDIENCE: If you have your really lopsided array 756 00:37:35,616 --> 00:37:37,950 eventually you'll ask for something that's [INAUDIBLE]. 757 00:37:37,950 --> 00:37:42,150 PROFESSOR: OK what would we want to do in that case? 758 00:37:42,150 --> 00:37:46,764 AUDIENCE: Just want to check to make sure that the [INAUDIBLE]. 759 00:37:46,764 --> 00:37:48,680 PROFESSOR: Yeah, but otherwise move on, right? 760 00:37:48,680 --> 00:37:52,600 If an element doesn't have kids, we don't add anything to the horizon. 761 00:37:52,600 --> 00:37:56,540 So we need some bounds checks, exception checking, 762 00:37:56,540 --> 00:37:58,390 things like that in here. 763 00:37:58,390 --> 00:37:59,880 And I won't add that because that 764 00:37:59,880 --> 00:38:01,990 will make it look long and ugly. 765 00:38:01,990 --> 00:38:05,350 So this is the idea. 766 00:38:05,350 --> 00:38:06,850 OK what's the running time for this?
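The horizon pseudocode, with the bounds checks that were left out, might look like the following Python sketch. It assumes a 0-indexed list in `heapq` order, so the children of index j are 2j+1 and 2j+2 instead of 2j and 2j+1; the name `kth_smallest_horizon` is hypothetical. Each horizon entry is a `(value, index)` tuple, so `heapq` orders entries by value and the index rides along, which is the "store the index too" augmentation from the discussion.

```python
import heapq


def kth_smallest_horizon(h, k):
    """Return the kth smallest element of the min heap h in O(K log K).

    z is the horizon: a min heap of (value, index) pairs holding the
    candidates for the smallest element not yet extracted.
    """
    if not 1 <= k <= len(h):
        raise ValueError("k must satisfy 1 <= k <= len(h)")
    z = [(h[0], 0)]  # the horizon starts with just the root
    for _ in range(k):
        value, j = heapq.heappop(z)  # next smallest element overall
        for child in (2 * j + 1, 2 * j + 2):
            if child < len(h):  # bounds check: leaves have no children
                heapq.heappush(z, (h[child], child))
    return value  # after k extractions this is the kth smallest
```

Only the horizon `z` changes; the input heap `h` is read but never modified, so it can be queried again for a different K.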
767 00:38:09,774 --> 00:38:10,690 AUDIENCE: [INAUDIBLE]. 768 00:38:14,530 --> 00:38:15,520 PROFESSOR: Cool. 769 00:38:15,520 --> 00:38:22,020 So creating heaps, initializing, all order 1, insertion, 770 00:38:22,020 --> 00:38:23,400 this heap is almost empty now. 771 00:38:23,400 --> 00:38:25,030 So this is order 1. 772 00:38:25,030 --> 00:38:28,520 Then these happen K times. 773 00:38:28,520 --> 00:38:32,070 And these are all operations on the heap Z. 774 00:38:32,070 --> 00:38:37,090 And the heap Z has some number of elements. 775 00:38:37,090 --> 00:38:40,570 And it's not always going to have one element, 776 00:38:40,570 --> 00:38:42,010 because every time I'm extracting 777 00:38:42,010 --> 00:38:43,450 one element I'm adding two. 778 00:38:43,450 --> 00:38:49,461 So how many elements is it going to have at most? 779 00:38:49,461 --> 00:38:49,960 AUDIENCE: K. 780 00:38:49,960 --> 00:38:51,961 PROFESSOR: OK why is that? 781 00:38:51,961 --> 00:38:54,850 AUDIENCE: Because each time you add it it's one element. 782 00:38:54,850 --> 00:38:56,430 PROFESSOR: So I extract one for sure. 783 00:38:56,430 --> 00:38:58,760 And then I add at most two elements. 784 00:38:58,760 --> 00:39:03,070 So the heap size grows by at most 1 in every iteration. 785 00:39:03,070 --> 00:39:07,550 So the heap Z will have at most K elements. 786 00:39:11,030 --> 00:39:13,270 So now I know the running time for all these operations. 787 00:39:13,270 --> 00:39:15,350 What is it? 788 00:39:15,350 --> 00:39:18,680 Log K. Cool. 789 00:39:18,680 --> 00:39:24,440 So it's K times log K. And the reason that it works, 790 00:39:24,440 --> 00:39:26,270 it's a bit harder to see. 791 00:39:26,270 --> 00:39:29,440 You have to convince yourself maybe using this bigger tree, 792 00:39:29,440 --> 00:39:31,800 that whenever you're expanding the horizon 793 00:39:31,800 --> 00:39:34,250 you're expanding it the right way.
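The size argument can be written out as a short bound, assuming (as in the pseudocode) that each of the K iterations does one extract-min and at most two inserts. Let $|Z_i|$ be the horizon size after iteration $i$:

$$
|Z_0| = 1, \qquad |Z_i| \le |Z_{i-1}| - 1 + 2 = |Z_{i-1}| + 1 \;\Longrightarrow\; |Z_i| \le i + 1
$$

Every operation in iteration $i$ therefore acts on a heap of at most $i + 1 \le K + 1$ elements, so

$$
T(K) = \sum_{i=1}^{K} O\big(\log(i + 1)\big) = O\big(\log((K+1)!)\big) = O(K \log K).
$$

The count is off from "at most K" by one only because of the children inserted in the final iteration, which never get extracted; asymptotically it makes no difference.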
794 00:39:34,250 --> 00:39:37,600 So the idea is again that whatever 795 00:39:37,600 --> 00:39:42,760 path you take down you're going to see ascending numbers. 796 00:39:42,760 --> 00:39:44,360 So when you're increasing the horizon 797 00:39:44,360 --> 00:39:47,220 you're always pushing it down in such a way 798 00:39:47,220 --> 00:39:52,100 that your invariant is that all the numbers in the horizon 799 00:39:52,100 --> 00:39:55,140 are smaller than their children. 800 00:39:55,140 --> 00:39:57,260 And so on and so forth. 801 00:39:57,260 --> 00:39:59,040 So the horizon is always guaranteed 802 00:39:59,040 --> 00:40:02,330 to have the smallest number that you haven't extracted yet. 803 00:40:02,330 --> 00:40:04,372 And that's really the only thing you need. 804 00:40:07,230 --> 00:40:10,430 OK does this make some sense? 805 00:40:10,430 --> 00:40:13,471 AUDIENCE: It never would have occurred to me on an exam. 806 00:40:13,471 --> 00:40:14,470 PROFESSOR: Yeah exactly. 807 00:40:14,470 --> 00:40:17,610 This would not occur on an exam unless you think a lot, 808 00:40:17,610 --> 00:40:19,640 you're super inspired, all that. 809 00:40:19,640 --> 00:40:22,570 If it doesn't occur to you what do you do? 810 00:40:22,570 --> 00:40:24,970 AUDIENCE: Go with the N log K solution. 811 00:40:24,970 --> 00:40:26,410 PROFESSOR: OK, very good. 812 00:40:26,410 --> 00:40:27,138 Wait. 813 00:40:27,138 --> 00:40:28,890 AUDIENCE: K log N. 814 00:40:28,890 --> 00:40:32,450 PROFESSOR: OK, K log N or N log K, which one? 815 00:40:32,450 --> 00:40:33,740 AUDIENCE: K log N. 816 00:40:33,740 --> 00:40:34,370 PROFESSOR: Why? 817 00:40:34,370 --> 00:40:36,840 Two reasons. 818 00:40:36,840 --> 00:40:40,630 K log N is-- so two reasons, faster and simpler. 819 00:40:44,510 --> 00:40:47,110 So you write this down. 820 00:40:47,110 --> 00:40:53,510 And you get half the score or 3/4 of the score and you're done.
821 00:40:53,510 --> 00:40:55,240 It's better than writing nothing 822 00:40:55,240 --> 00:40:57,630 and getting a 0, right? 823 00:40:57,630 --> 00:41:01,560 I mean 3/4 of the score for two lines of pseudocode 824 00:41:01,560 --> 00:41:03,854 is reasonable, right? 825 00:41:03,854 --> 00:41:04,645 Two or three lines. 826 00:41:04,645 --> 00:41:06,231 This is three lines probably. 827 00:41:10,160 --> 00:41:13,510 Also on most exams we're humans, right? 828 00:41:13,510 --> 00:41:16,394 We might mess them up, we might make them too long. 829 00:41:16,394 --> 00:41:17,810 If we make them too long, you want 830 00:41:17,810 --> 00:41:20,610 to get the most points. 831 00:41:20,610 --> 00:41:23,050 You'll have time to figure out one or two 832 00:41:23,050 --> 00:41:24,170 problems at that level. 833 00:41:24,170 --> 00:41:26,397 But if we give you too many, for the rest of them 834 00:41:26,397 --> 00:41:27,980 you want to have something simple that 835 00:41:27,980 --> 00:41:30,954 gives you some of the points. 836 00:41:30,954 --> 00:41:31,870 Same for an interview. 837 00:41:31,870 --> 00:41:34,060 For most interviews most people don't really 838 00:41:34,060 --> 00:41:37,670 have a clue how many problems you can solve, 839 00:41:37,670 --> 00:41:39,420 how many problems are reasonable. 840 00:41:39,420 --> 00:41:41,350 So you want, for every problem you 841 00:41:41,350 --> 00:41:44,890 want to show some solution reasonably fast. 842 00:41:44,890 --> 00:41:46,589 And then see if they're happy. 843 00:41:46,589 --> 00:41:48,630 And if they're happy move on to the next problem. 844 00:41:48,630 --> 00:41:51,990 And if they're not happy only then spend more time. 845 00:41:51,990 --> 00:41:54,960 So this is as important as that. 846 00:41:54,960 --> 00:41:58,100 If you look at the recitation notes we'll have some problems 847 00:41:58,100 --> 00:42:00,076 and we'll have some solutions.
848 00:42:00,076 --> 00:42:01,950 What are you going to do, memorize the solutions? 849 00:42:01,950 --> 00:42:03,540 Yay, you know how to solve more problems. 850 00:42:03,540 --> 00:42:05,498 There are probably a million problems in total. 851 00:42:05,498 --> 00:42:07,904 That doesn't get you very far. 852 00:42:07,904 --> 00:42:09,820 So what you want is to understand this process 853 00:42:09,820 --> 00:42:10,694 that we went through. 854 00:42:10,694 --> 00:42:14,782 So every time we tried something we got from some point 855 00:42:14,782 --> 00:42:16,490 to some point with a better running time. 856 00:42:16,490 --> 00:42:17,850 Well, except for here. 857 00:42:17,850 --> 00:42:19,890 There we got more insight on the problem instead. 858 00:42:19,890 --> 00:42:23,030 So this is the important part. 859 00:42:23,030 --> 00:42:27,080 And I'm going to show you one more problem, really quickly. 860 00:42:27,080 --> 00:42:29,200 We're probably not going to be able to solve it, 861 00:42:29,200 --> 00:42:30,660 because it's hard. 862 00:42:30,660 --> 00:42:32,660 But we are going to talk about it 863 00:42:32,660 --> 00:42:36,300 and see if we can get some insight. 864 00:42:36,300 --> 00:42:38,730 Let's see, what do I want to erase? 865 00:42:38,730 --> 00:42:39,968 This. 866 00:42:39,968 --> 00:42:41,462 I like that. 867 00:43:02,910 --> 00:43:07,380 All right, so we have an array of random numbers, 7, 2, 5-- 868 00:43:07,380 --> 00:43:15,020 this time there's no order in it-- 8, 9, 4. 869 00:43:15,020 --> 00:43:18,340 And we tell you that the array has 2 to the K numbers, 870 00:43:18,340 --> 00:43:20,280 to make the problem easier. 871 00:43:20,280 --> 00:43:22,534 1, 2, 3, 4, 5, 6, 7. 872 00:43:26,390 --> 00:43:27,840 6. 873 00:43:27,840 --> 00:43:30,030 So you have this array. 874 00:43:30,030 --> 00:43:33,580 And we want to answer queries of this shape.
875 00:43:33,580 --> 00:43:37,080 Say this array is E, and it has N elements, 876 00:43:37,080 --> 00:43:40,750 and you know that N is some 2 to the K. 877 00:43:40,750 --> 00:43:45,240 Minimum of all the elements from i to j. 878 00:43:47,870 --> 00:43:49,552 So you have two phases, just like we 879 00:43:49,552 --> 00:43:50,760 had on a problem on the exam. 880 00:43:50,760 --> 00:43:53,093 You have a pre-processing stage where you get the array, 881 00:43:53,093 --> 00:43:56,342 you do some computation, you save some information. 882 00:43:56,342 --> 00:43:57,800 And then you have a querying phase, 883 00:43:57,800 --> 00:44:02,280 where you have to answer these as fast as possible. 884 00:44:02,280 --> 00:44:04,310 I see most people have unhappy faces. 885 00:44:04,310 --> 00:44:06,740 Bad memories, huh? 886 00:44:06,740 --> 00:44:09,780 OK let's not worry about that problem. 887 00:44:09,780 --> 00:44:11,820 Let's look at this one. 888 00:44:11,820 --> 00:44:14,210 So assuming you have as much time as you 889 00:44:14,210 --> 00:44:16,900 want to do the pre-processing, what's 890 00:44:16,900 --> 00:44:20,260 the fastest way you could answer these? 891 00:44:20,260 --> 00:44:22,065 Yes? 892 00:44:22,065 --> 00:44:23,460 AUDIENCE: If you had as much time 893 00:44:23,460 --> 00:44:27,055 for pre-processing [INAUDIBLE] memorize it. 894 00:44:27,055 --> 00:44:27,930 PROFESSOR: All right. 895 00:44:27,930 --> 00:44:32,510 So if we compute the answers to all possible queries, right? 896 00:44:32,510 --> 00:44:34,770 How would I store that? 897 00:44:34,770 --> 00:44:37,930 So I want to do this in order 1. 898 00:44:37,930 --> 00:44:40,266 So how would I store these answers? 899 00:44:40,266 --> 00:44:43,220 AUDIENCE: Just sort your array. 900 00:44:43,220 --> 00:44:44,885 PROFESSOR: OK so I sort my array.
901 00:44:44,885 --> 00:44:47,830 AUDIENCE: Then you want the minimum from i to j, 902 00:44:47,830 --> 00:44:50,770 so look at the ith element and that's your [INAUDIBLE]. 903 00:44:59,610 --> 00:45:01,650 PROFESSOR: OK so figure it out? 904 00:45:04,690 --> 00:45:06,921 Well I mean if I can sort it I can also say hey, 905 00:45:06,921 --> 00:45:08,420 why don't we use this array instead? 906 00:45:12,016 --> 00:45:13,390 And then I'll answer the queries. 907 00:45:18,820 --> 00:45:21,600 You can go off on a tangent trying to sort the elements 908 00:45:21,600 --> 00:45:22,550 and keep their keys. 909 00:45:22,550 --> 00:45:25,230 The important thing is if you think about it for a while 910 00:45:25,230 --> 00:45:27,570 and you see that things stop making sense, back out. 911 00:45:27,570 --> 00:45:29,627 Look somewhere else. 912 00:45:29,627 --> 00:45:31,710 We spent some time trying to find a solution based 913 00:45:31,710 --> 00:45:34,410 on sorting in my last section. 914 00:45:34,410 --> 00:45:36,030 It's not going to work. 915 00:45:36,030 --> 00:45:38,252 So-- 916 00:45:38,252 --> 00:45:42,570 AUDIENCE: Can't you just take the [INAUDIBLE] from i to j? 917 00:45:45,032 --> 00:45:46,740 PROFESSOR: OK let's get to that in a bit. 918 00:45:46,740 --> 00:45:47,865 So let's keep that in mind. 919 00:45:47,865 --> 00:45:50,960 Because that's another point on the trade-off curve. 920 00:45:50,960 --> 00:45:54,970 So if I want to serve my queries in order 1, 921 00:45:54,970 --> 00:46:01,780 then the way I do that is I will have a hash of all the pairs 922 00:46:01,780 --> 00:46:04,100 that look like i, j. 923 00:46:04,100 --> 00:46:08,790 So all the possible intervals. 924 00:46:08,790 --> 00:46:12,830 And I'll store the answer here. 925 00:46:12,830 --> 00:46:18,200 The minimum of the elements from i to j. 926 00:46:18,200 --> 00:46:20,950 And I can do a hash lookup in order 1 and get the answer 927 00:46:20,950 --> 00:46:23,490 and return the answer.
928 00:46:23,490 --> 00:46:25,080 How many elements do I have here? 929 00:46:28,760 --> 00:46:31,739 So how much storage do I have to use for this? 930 00:46:31,739 --> 00:46:32,780 AUDIENCE: O of N squared. 931 00:46:32,780 --> 00:46:38,560 PROFESSOR: OK N values for this, N values for this, so roughly 932 00:46:38,560 --> 00:46:40,540 N squared. 933 00:46:40,540 --> 00:46:43,620 What's the time for computing this? 934 00:46:43,620 --> 00:46:44,930 Brute force, let's not think. 935 00:46:44,930 --> 00:46:48,314 What's the time for computing this? 936 00:46:48,314 --> 00:46:49,280 AUDIENCE: N cubed. 937 00:46:49,280 --> 00:46:50,179 PROFESSOR: N cubed. 938 00:46:50,179 --> 00:46:50,845 You're thinking. 939 00:46:54,290 --> 00:46:56,370 So I have N squared elements here. 940 00:46:56,370 --> 00:46:59,060 For every element I have to compute 941 00:46:59,060 --> 00:47:04,330 the minimum of potentially order N elements, right? 942 00:47:04,330 --> 00:47:05,350 So this is N cubed. 943 00:47:05,350 --> 00:47:08,110 I could reduce it to N squared by noticing that if I have 944 00:47:08,110 --> 00:47:10,390 the minimum of these elements, and I 945 00:47:10,390 --> 00:47:13,050 want to compute the minimum of these elements, 946 00:47:13,050 --> 00:47:14,980 really all I have to do is compute, compare 947 00:47:14,980 --> 00:47:18,000 this minimum with this element. 948 00:47:18,000 --> 00:47:20,730 So every time I start with an interval of size 1 949 00:47:20,730 --> 00:47:22,550 and then I expand it by 1. 950 00:47:22,550 --> 00:47:24,350 So I have my two for loops here. 951 00:47:24,350 --> 00:47:27,310 And I keep growing my minimum. 952 00:47:27,310 --> 00:47:33,025 So I could get down to order of N squared time. 953 00:47:37,210 --> 00:47:41,750 So I have one solution that has order 954 00:47:41,750 --> 00:47:44,280 of N squared time and space, and then answers 955 00:47:44,280 --> 00:47:46,480 the queries in order 1.
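A minimal Python sketch of the "precompute everything" extreme, assuming 0-indexed inclusive intervals and a Python dict standing in for the hash table; the function names are hypothetical, and E follows the lecture's name for the array. The inner loop grows each interval by one element, so every table entry costs a single comparison and the build is O(N^2) time and space rather than the brute-force O(N^3).

```python
from typing import Dict, List, Tuple


def precompute_all_minima(E: List[int]) -> Dict[Tuple[int, int], int]:
    """Map every interval (i, j), i <= j, to min(E[i..j]) in O(N^2) time and space."""
    table: Dict[Tuple[int, int], int] = {}
    n = len(E)
    for i in range(n):
        running = E[i]  # minimum of the interval [i, i]
        for j in range(i, n):
            # Extending [i, j-1] to [i, j] only needs one comparison with E[j].
            running = min(running, E[j])
            table[(i, j)] = running
    return table


def lookup_min(table: Dict[Tuple[int, int], int], i: int, j: int) -> int:
    """Answer a query with a single dictionary lookup (expected O(1))."""
    return table[(i, j)]
```

With N = 8 the table holds 8 * 9 / 2 = 36 intervals, which is the "roughly N squared" storage mentioned above.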
956 00:47:46,480 --> 00:47:49,790 You had a solution you said where, what you do 957 00:47:49,790 --> 00:47:53,890 is, when you get a query you compute this, right? 958 00:47:53,890 --> 00:47:55,890 You were suggesting sorting the array. 959 00:47:55,890 --> 00:47:58,715 That would be N log N. I would suggest not sorting it. 960 00:47:58,715 --> 00:48:00,840 Do the slicing, you look through all the elements, 961 00:48:00,840 --> 00:48:03,596 and you find the minimum. 962 00:48:03,596 --> 00:48:06,844 AUDIENCE: I was saying that if the original E spans i to j 963 00:48:06,844 --> 00:48:08,660 and started at the-- 964 00:48:08,660 --> 00:48:10,410 PROFESSOR: So when you get a query the i's 965 00:48:10,410 --> 00:48:14,300 and j's change for every query. 966 00:48:14,300 --> 00:48:16,710 Otherwise we could compute the answer. 967 00:48:16,710 --> 00:48:21,300 So we have one answer where we take order 968 00:48:21,300 --> 00:48:23,510 N time to answer a query. 969 00:48:23,510 --> 00:48:25,768 And what do we do for pre-processing? 970 00:48:32,400 --> 00:48:32,900 Nothing. 971 00:48:32,900 --> 00:48:34,940 Order 1. 972 00:48:34,940 --> 00:48:39,350 So these are two ends of a trade-off, right? 973 00:48:39,350 --> 00:48:43,257 One possible extreme is that you pre-compute all your answers. 974 00:48:43,257 --> 00:48:45,590 The other possible extreme is that you don't do anything 975 00:48:45,590 --> 00:48:47,590 and you brute force every answer. 976 00:48:47,590 --> 00:48:49,704 And now we want to find a point somewhere 977 00:48:49,704 --> 00:48:51,120 on this line between the extremes. 978 00:48:55,320 --> 00:48:57,800 So the answer that we're going to show in the solutions 979 00:48:57,800 --> 00:49:05,980 uses order N log N space. 980 00:49:05,980 --> 00:49:09,240 And it answers the query by using 981 00:49:09,240 --> 00:49:17,580 order 1 elements in this order N log N data structure. 982 00:49:17,580 --> 00:49:20,550 So I have order N log N partial minima.
983 00:49:20,550 --> 00:49:24,540 And I will only use two of them. 984 00:49:24,540 --> 00:49:28,300 So the total running time isn't actually order 1. 985 00:49:28,300 --> 00:49:30,898 But we only use order 1 elements. 986 00:49:35,350 --> 00:49:39,600 Let's start thinking very quickly. 987 00:49:39,600 --> 00:49:42,070 Let's think for about a minute, and then we'll 988 00:49:42,070 --> 00:49:44,460 go through the solution. 989 00:49:44,460 --> 00:49:45,864 And there are multiple solutions. 990 00:49:45,864 --> 00:49:47,780 All of them are interesting in different ways. 991 00:49:47,780 --> 00:49:50,200 And there are other solutions that 992 00:49:50,200 --> 00:49:55,002 are equally fun and applicable with not the same running time. 993 00:50:00,820 --> 00:50:02,155 Let me make some space here. 994 00:50:26,647 --> 00:50:28,980 So like I said, thinking is a useful process on its own. 995 00:50:28,980 --> 00:50:31,510 So you're getting better just by doing this. 996 00:50:35,010 --> 00:50:38,760 AUDIENCE: [INAUDIBLE] using more than one space total? 997 00:50:38,760 --> 00:50:40,520 PROFESSOR: We're using N log N space. 998 00:50:40,520 --> 00:50:43,106 AUDIENCE: Oh and it takes constant time-- 999 00:50:43,106 --> 00:50:44,980 PROFESSOR: It will only look at two elements. 1000 00:50:44,980 --> 00:50:46,313 It's actually not constant time. 1001 00:50:46,313 --> 00:50:48,730 We're not going to worry too much about time. 1002 00:50:48,730 --> 00:50:51,722 It turns out to be log. 1003 00:50:51,722 --> 00:50:53,700 AUDIENCE: OK, what was the order 1 then? 1004 00:50:53,700 --> 00:50:56,440 PROFESSOR: You only access order 1 elements. 1005 00:50:56,440 --> 00:50:58,854 Order 1 partial minima. 1006 00:50:58,854 --> 00:50:59,562 AUDIENCE: Oh, OK. 1007 00:51:04,442 --> 00:51:07,370 Does it have to do with 2 to the K? 1008 00:51:07,370 --> 00:51:08,346 PROFESSOR: Maybe. 1009 00:51:08,346 --> 00:51:12,738 AUDIENCE: I don't know what to do with that.
1010 00:51:12,738 --> 00:51:14,770 There's probably some sort of tree involved. 1011 00:51:21,120 --> 00:51:23,536 PROFESSOR: So you're going to want to split things, right? 1012 00:51:23,536 --> 00:51:25,130 Into halves. 1013 00:51:25,130 --> 00:51:28,020 And you're going to want to be able to do this all the time. 1014 00:51:28,020 --> 00:51:30,800 And we say 2 to the K so we don't have to worry about, 1015 00:51:30,800 --> 00:51:33,384 oh my God what happens if the halves aren't equal? 1016 00:51:33,384 --> 00:51:35,800 You can usually solve this when you implement the problem. 1017 00:51:35,800 --> 00:51:38,000 But it's useful to not worry about 1018 00:51:38,000 --> 00:51:41,760 that when you come up with your first algorithm. 1019 00:51:41,760 --> 00:51:43,656 If you're going to start dividing in halves. 1020 00:51:53,586 --> 00:51:54,502 AUDIENCE: [INAUDIBLE]. 1021 00:51:59,460 --> 00:52:00,415 PROFESSOR: Um. 1022 00:52:00,415 --> 00:52:01,331 AUDIENCE: [INAUDIBLE]. 1023 00:52:04,740 --> 00:52:08,090 PROFESSOR: So that leads to another useful solution. 1024 00:52:08,090 --> 00:52:12,060 That leads to a solution that takes-- that has N log N 1025 00:52:12,060 --> 00:52:14,340 storage and it will run in N log N time 1026 00:52:14,340 --> 00:52:17,210 with N log N element accesses. 1027 00:52:17,210 --> 00:52:18,790 So what you're thinking of is you're 1028 00:52:18,790 --> 00:52:21,710 going to have your array of elements, right? 1029 00:52:21,710 --> 00:52:25,510 And say you want to find the minimum from here to here. 1030 00:52:25,510 --> 00:52:28,380 You're going to have your array split in half. 1031 00:52:28,380 --> 00:52:31,620 So you're going to find the minimum of this, 1032 00:52:31,620 --> 00:52:33,377 and the minimum of this. 1033 00:52:33,377 --> 00:52:34,960 But to do that you'll have to recurse. 1034 00:52:34,960 --> 00:52:37,075 So this is also, say, split in half.
1035 00:52:37,075 --> 00:52:40,550 So you'll have to find-- so it turns out 1036 00:52:40,550 --> 00:52:42,340 that if you do this, in the end you'll 1037 00:52:42,340 --> 00:52:46,300 have log N minima that you have to look at. 1038 00:52:46,300 --> 00:52:49,600 But this is more, this is a cooler and more useful thing, 1039 00:52:49,600 --> 00:52:51,570 so I'll try to put it on a PSet or something 1040 00:52:51,570 --> 00:52:53,820 to make you think about it. 1041 00:52:53,820 --> 00:52:55,526 So this is-- don't tell people yet. 1042 00:52:55,526 --> 00:52:57,150 You might have a solution to a problem. 1043 00:53:05,880 --> 00:53:07,350 AUDIENCE: [INAUDIBLE]. 1044 00:53:07,350 --> 00:53:09,100 PROFESSOR: OK. 1045 00:53:09,100 --> 00:53:13,100 So what we thought of, or the way we thought of doing it, 1046 00:53:13,100 --> 00:53:19,870 is 6, 7, 2, 5, 3, 8, 9, 4. 1047 00:53:19,870 --> 00:53:21,630 So we compute these partial minima. 1048 00:53:21,630 --> 00:53:23,930 We split the array into two. 1049 00:53:23,930 --> 00:53:28,313 And these are the minima that we compute. 1050 00:53:32,480 --> 00:53:35,994 Sorry, this is like this, this is like this. 1051 00:53:38,940 --> 00:53:42,022 So everything, so all the left half then these guys, 1052 00:53:42,022 --> 00:53:44,510 then these guys, then this guy. 1053 00:53:44,510 --> 00:53:48,510 Everything here, then these guys, then these guys, 1054 00:53:48,510 --> 00:53:50,220 then this guy. 1055 00:53:50,220 --> 00:53:55,100 So if your i and j are on different sides of the middle, 1056 00:53:55,100 --> 00:53:58,110 then you do two lookups, you're done. 1057 00:53:58,110 --> 00:54:00,810 If they're in the same half, then you 1058 00:54:00,810 --> 00:54:02,759 have a problem that's half the size. 1059 00:54:02,759 --> 00:54:04,800 So you're going to have to take this array that's 1060 00:54:04,800 --> 00:54:07,380 half the size, 2, 5. 1061 00:54:07,380 --> 00:54:09,410 Split it into halves and do the same thing. 
1062 00:54:12,290 --> 00:54:15,690 And then we're going to have to do the same to this other one. 1063 00:54:15,690 --> 00:54:19,450 3, 8, 9, 4, split it into halves and do the same thing. 1064 00:54:23,010 --> 00:54:25,110 So in the end you'll end up at a point 1065 00:54:25,110 --> 00:54:29,410 where your interval edges are on different sides of the middle. 1066 00:54:29,410 --> 00:54:33,420 And you look at two elements and you're done. 1067 00:54:33,420 --> 00:54:35,860 Let's see how much space this takes. 1068 00:54:35,860 --> 00:54:38,770 Can someone tell me a recursion for how much 1069 00:54:38,770 --> 00:54:45,220 space, for how many minima I would need to keep? 1070 00:54:45,220 --> 00:54:47,220 So space for N elements is? 1071 00:54:51,070 --> 00:54:54,990 AUDIENCE: The first level you have 8. 1072 00:54:54,990 --> 00:54:59,390 So go down by a factor of 2. 1073 00:54:59,390 --> 00:55:01,780 PROFESSOR: So what's the first level? 1074 00:55:01,780 --> 00:55:03,010 AUDIENCE: Of 8 N. 1075 00:55:03,010 --> 00:55:05,200 PROFESSOR: So order N plus? 1076 00:55:08,310 --> 00:55:09,550 AUDIENCE: N over 2? 1077 00:55:09,550 --> 00:55:10,682 T of N over 2? 1078 00:55:10,682 --> 00:55:12,140 PROFESSOR: OK, S because it's space. 1079 00:55:12,140 --> 00:55:13,560 N over 2. 1080 00:55:13,560 --> 00:55:16,045 OK. 1081 00:55:16,045 --> 00:55:16,545 And? 1082 00:55:18,750 --> 00:55:19,666 AUDIENCE: [INAUDIBLE]. 1083 00:55:22,940 --> 00:55:24,660 PROFESSOR: You're missing something. 1084 00:55:24,660 --> 00:55:25,930 Look at this picture. 1085 00:55:25,930 --> 00:55:27,970 So this is the whole thing. 1086 00:55:27,970 --> 00:55:29,590 Then I have a half. 1087 00:55:29,590 --> 00:55:33,120 And then what else do I have? 1088 00:55:33,120 --> 00:55:33,910 AUDIENCE: 2. 1089 00:55:33,910 --> 00:55:35,008 PROFESSOR: The other half. 1090 00:55:35,008 --> 00:55:36,229 AUDIENCE: Oh, 2.
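The two space recurrences discussed on the board can be written out as follows, assuming N is a power of two (as stated above) and S(1) = 0, since a one-element block stores no partial minima. S(N) counts stored partial minima; the first line keeps only one half, the second keeps both halves and is the same recurrence as merge sort's running time.

$$
\begin{aligned}
S(N) &= N + S(N/2) = N + \tfrac{N}{2} + \tfrac{N}{4} + \cdots < 2N = O(N) \\
S(N) &= N + 2\,S(N/2) = \underbrace{N + N + \cdots + N}_{\log_2 N \text{ levels}} = N \log_2 N
\end{aligned}
$$

For the 8-element example on the board, the second recurrence gives S(8) = 8 + 2(4 + 2(2 + 0)) = 24 = 8 log_2 8.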
1091 00:55:36,229 --> 00:55:38,270 PROFESSOR: OK so the difference between these two 1092 00:55:38,270 --> 00:55:39,853 is that one of them gives you order N, 1093 00:55:39,853 --> 00:55:43,170 the other one gives you N log N. So I gave you the answer, 1094 00:55:43,170 --> 00:55:45,050 so I can't ask you for the answer now. 1095 00:55:45,050 --> 00:55:46,870 But where did we see this before? 1096 00:55:46,870 --> 00:55:48,130 Pretend these are T's. 1097 00:55:48,130 --> 00:55:49,310 AUDIENCE: [INAUDIBLE]. 1098 00:55:49,310 --> 00:55:50,620 PROFESSOR: Sorry? 1099 00:55:50,620 --> 00:55:53,450 So these are T's, this is the recursion for merge sort. 1100 00:55:53,450 --> 00:55:56,049 So once you put it up you don't draw the recursion tree 1101 00:55:56,049 --> 00:55:56,590 and solve it. 1102 00:55:56,590 --> 00:55:59,270 You say this is what we saw in merge sort. Therefore, 1103 00:55:59,270 --> 00:56:05,720 the solution is N log N. So this is 1104 00:56:05,720 --> 00:56:07,380 how you show you have N log N space, 1105 00:56:07,380 --> 00:56:09,290 and it's pretty clear that you're only 1106 00:56:09,290 --> 00:56:10,975 going to access two elements. 1107 00:56:10,975 --> 00:56:12,875 AUDIENCE: I don't understand how [INAUDIBLE]. 1108 00:56:16,680 --> 00:56:18,070 PROFESSOR: How it works? 1109 00:56:18,070 --> 00:56:23,340 So you have your i and you have your j. 1110 00:56:23,340 --> 00:56:25,250 Let's make that one i here. 1111 00:56:25,250 --> 00:56:27,385 If you want to find the minimum, if i and j 1112 00:56:27,385 --> 00:56:33,550 are on different sides of the middle, you have this and this. 1113 00:56:33,550 --> 00:56:36,060 And these two partial minima cover your entire interval. 1114 00:56:39,830 --> 00:56:44,630 Now if they're on the same side of the middle then 1115 00:56:44,630 --> 00:56:47,121 you recurse to a smaller problem.
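A minimal Python sketch of the halving structure described above, assuming len(E) is a power of two and 0-indexed inclusive queries; the class and method names are hypothetical. For every block it stores the suffix minima of the left half (from each index up to the middle) and the prefix minima of the right half (from the middle to each index). A query walks down the halves until i and j fall on different sides of the middle, then combines exactly two stored minima, so it reads O(1) partial minima but spends O(log N) time descending, which matches the "it turns out being log" remark.

```python
from typing import Dict, List, Tuple


class HalvingRangeMin:
    """Range-minimum structure built by repeatedly splitting E in half.

    For each block [lo, hi) of size >= 2 with midpoint mid it stores:
      left[k]  = min(E[lo + k .. mid - 1])   (suffix minima of the left half)
      right[k] = min(E[mid .. mid + k])      (prefix minima of the right half)
    Assumes len(E) is a power of two, as in the lecture.
    """

    def __init__(self, E: List[int]) -> None:
        n = len(E)
        assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
        self.E = E
        self.tables: Dict[Tuple[int, int], Tuple[List[int], List[int]]] = {}
        self._build(0, n)

    def _build(self, lo: int, hi: int) -> None:
        if hi - lo < 2:
            return  # a single element needs no partial minima
        mid = (lo + hi) // 2
        left = [0] * (mid - lo)
        running = self.E[mid - 1]
        for k in range(mid - 1, lo - 1, -1):  # right-to-left over the left half
            running = min(running, self.E[k])
            left[k - lo] = running
        right = [0] * (hi - mid)
        running = self.E[mid]
        for k in range(mid, hi):  # left-to-right over the right half
            running = min(running, self.E[k])
            right[k - mid] = running
        self.tables[(lo, hi)] = (left, right)
        self._build(lo, mid)
        self._build(mid, hi)

    def stored_minima(self) -> int:
        """Total number of partial minima kept (N log2 N for N a power of two)."""
        return sum(len(l) + len(r) for l, r in self.tables.values())

    def query(self, i: int, j: int) -> int:
        """Minimum of E[i..j] inclusive."""
        assert 0 <= i <= j < len(self.E)
        lo, hi = 0, len(self.E)
        while hi - lo >= 2:
            mid = (lo + hi) // 2
            if i < mid <= j:
                # i and j straddle the middle: two stored minima cover [i, j].
                left, right = self.tables[(lo, hi)]
                return min(left[i - lo], right[j - mid])
            if j < mid:
                hi = mid  # both ends in the left half
            else:
                lo = mid  # both ends in the right half
        return self.E[i]  # block of size 1, so i == j
```

On the board's example, HalvingRangeMin([6, 7, 2, 5, 3, 8, 9, 4]) stores 24 partial minima, and query(1, 2) descends once into [6, 7, 2, 5] and returns min(7, 2) = 2. The descent loop is where the O(log N) time goes; with power-of-two blocks the splitting level can also be read off the highest set bit of i XOR j, which is one way to make the query itself O(1).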
1116 00:56:47,121 --> 00:56:49,537 AUDIENCE: Well you don't have to recurse there because you already 1117 00:56:49,537 --> 00:56:52,140 have the minimum of that section. 1118 00:56:52,140 --> 00:56:53,875 PROFESSOR: Yeah. 1119 00:56:53,875 --> 00:56:57,280 AUDIENCE: It wouldn't work if you had 6 and 2, right? 1120 00:56:57,280 --> 00:56:59,220 Or that. 1121 00:56:59,220 --> 00:57:00,636 PROFESSOR: Yeah. 1122 00:57:00,636 --> 00:57:03,410 AUDIENCE: Well why not just take 7 and 2 then? 1123 00:57:03,410 --> 00:57:05,657 Why do you have to break up the entire interval? 1124 00:57:05,657 --> 00:57:07,490 PROFESSOR: Assume there are more things there. 1125 00:57:10,490 --> 00:57:11,440 AUDIENCE: Oh I see. 1126 00:57:11,440 --> 00:57:16,660 PROFESSOR: So if you have this, now it's no longer true, right? 1127 00:57:16,660 --> 00:57:19,830 So wherever they are here, you do that. 1128 00:57:19,830 --> 00:57:22,670 And remember your pseudocode has to be as simple as possible 1129 00:57:22,670 --> 00:57:24,730 to reduce the probability of bugs. 1130 00:57:24,730 --> 00:57:26,870 So you want to do the simplest possible thing, 1131 00:57:26,870 --> 00:57:29,680 not have special cases. 1132 00:57:29,680 --> 00:57:31,670 OK. 1133 00:57:31,670 --> 00:57:33,450 By the way there's a study that shows 1134 00:57:33,450 --> 00:57:37,080 that for good or bad programmers, if you have 1135 00:57:37,080 --> 00:57:40,724 1,000 lines of code, there's a constant probability of a bug. 1136 00:57:40,724 --> 00:57:42,390 And the constants are different for good 1137 00:57:42,390 --> 00:57:45,830 versus bad programmers, but it's still a constant. 1138 00:57:45,830 --> 00:57:49,030 So how many mistakes you make is directly 1139 00:57:49,030 --> 00:57:50,957 proportional to how much you write. 1140 00:57:50,957 --> 00:57:52,498 This is why we like simple solutions. 1141 00:57:55,850 --> 00:57:59,350 OK, any questions on this? 1142 00:57:59,350 --> 00:58:00,990 So we have four problems.
1143 00:58:00,990 --> 00:58:02,390 We didn't cover one. 1144 00:58:02,390 --> 00:58:04,830 Look at the other one, look at the solution. 1145 00:58:04,830 --> 00:58:07,120 Ideally look at the problem, think for at least half 1146 00:58:07,120 --> 00:58:09,320 an hour, then look at the solution. 1147 00:58:09,320 --> 00:58:11,792 What I want you to take away is not just oh, here 1148 00:58:11,792 --> 00:58:14,000 are three problems, let's memorize how we solve them. 1149 00:58:14,000 --> 00:58:15,780 But the whole process thing, and how 1150 00:58:15,780 --> 00:58:19,100 we played with data structures and how we used all the hints 1151 00:58:19,100 --> 00:58:23,170 that we possibly could to build more insights into the problem. 1152 00:58:23,170 --> 00:58:25,020 OK, cool.