The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: All right. Good morning, everyone. Let's get started. So we're going to start 6.046 in earnest today, with our first module on divide and conquer. You've all seen divide and conquer algorithms before; merge sort is a classic divide and conquer algorithm. I'm going to spend just a couple of minutes talking about the paradigm and give you a slightly more general setting than merge sort. Then we'll get into two really cool divide and conquer problems, in the sense that these are problems for which divide and conquer works very well: namely, convex hull and median finding.

Before I get started on the material, let me remind you that you should be signing up for a recitation section on Stellar. Please do that even if you don't plan on attending sections, because we need it in order to assign your problem sets to be graded, OK? That's our way of partitioning problem sets as well. The other thing is that problem set one is going out today, and it's a one-week problem set. All problem sets are going to be a week in duration.
Please read these problem sets the day that they come out. Spend 5 or 10 minutes reading them. Some things are going to look like magic: how could I possibly prove this? If you think about it for a bit, it'll become obvious. We promise you that. But get started early. Don't get started at 7:00 PM when we have an 11:59 PM deadline on Thursday, all right? Those four or five hours may not be enough to go from magical to obvious, OK?

So let's get started with the paradigm associated with divide and conquer. It's just a beautiful notion: you break the problem up into smaller parts and somehow compose the solutions to the smaller parts. Of course, the details are what matter when we take a particular problem instance.

Let's say we're given a problem of size n. We're going to divide it into a subproblems (I'll put "a" in quotes so you know it's a symbol), each of size n/b. Here a is an integer, and a is greater than or equal to 1. It could be two. It could be three. It could be four. This is the generalization I alluded to. And b does not have to be two, or even an integer, but it has to be strictly greater than one. Otherwise, there's no notion of divide and conquer: you're not breaking things up into smaller problems. So b should be strictly greater than one.
So that's the general setting. Then you solve each subproblem recursively. The idea here is that once the subproblems become really small, constant size, they're relatively easy to solve. You can just do exhaustive search. If you have 10 elements and you're doing effectively a cubic search, well, 10 cubed is 1,000. That's a constant. You're in great shape as long as the constants are small enough.

So you recurse until these problems get small. And then, typically (this is not true for all divide and conquer approaches, but it is for most of them, and certainly for the ones we're going to cover today), the smarts are going to be in the combination step: when you combine the solutions of these subproblems into the overall solution. And so that's the story.

In terms of efficiency, what typically happens is that you can write a recurrence associated with the divide and conquer algorithm. You say T(n), the running time for a problem of size n, is going to be a times T(n/b) (this is the recurrence) plus the work you need to do for the merge, or combine, operation:

T(n) = a T(n/b) + [work for merge]

So you get a recurrence. And you're not quite done yet in terms of the analysis, because once you have the recurrence, you do have to solve it.
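When the combine cost is a polynomial Θ(n^c), a recurrence of this form can be solved mechanically with the master theorem: compare c with the critical exponent log_b a. The sketch below is a minimal illustration under that assumption only (f(n) must be exactly Θ(n^c); other combine costs need the general statement). The helper names `master_theorem` and `_theta` are hypothetical. For a polynomial combine cost, the regularity condition of the third case holds automatically.

```python
import math

def _theta(exponent, with_log=False):
    """Format Theta(n^e) or Theta(n^e log n), simplifying e = 0 and e = 1."""
    if math.isclose(exponent, 0.0, abs_tol=1e-12):
        base = ""
    elif math.isclose(exponent, 1.0):
        base = "n"
    else:
        base = f"n^{exponent:g}"
    terms = [t for t in (base, "log n" if with_log else "") if t]
    return "Theta(" + (" ".join(terms) or "1") + ")"

def master_theorem(a, b, c):
    """Solve T(n) = a*T(n/b) + Theta(n^c), assuming a >= 1, b > 1, c >= 0."""
    if a < 1 or b <= 1:
        raise ValueError("need a >= 1 and b > 1")
    crit = math.log(a, b)  # critical exponent log_b(a)
    if math.isclose(c, crit, abs_tol=1e-12):
        return _theta(c, with_log=True)  # every level of the recursion costs the same
    if c < crit:
        return _theta(crit)  # the leaves dominate
    return _theta(c)  # the top-level combine step dominates

print(master_theorem(2, 2, 1))  # merge sort: two halves plus a linear merge
```

For merge sort (a = 2, b = 2, linear merge) this yields Theta(n log n); a combine step costing Theta(n^2) with the same split would instead give Theta(n^2).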
It's usually not that hard, and it's certainly not going to be particularly difficult for the divide and conquer examples we're going to look at, at least today. But we also have a theorem called the master theorem, which lets you fairly mechanically plug in the a's and the b's and whatever you have for the merge cost (maybe it's theta n, maybe it's theta n squared) and get the solution to the recurrence. I'm actually not going to do that today, but you'll hear once again about the master theorem tomorrow in section. It's a fairly straightforward template that simply gives you the solution to the recurrence for most of the divide and conquer examples we're going to look at in 6.046, with one exception that we'll look at in median finding today, OK?

So you've seen most of these things before. That's a little bit of setup. So let's dive right into convex hull, which is my favorite problem when it comes to using divide and conquer.

For convex hull, I've got a little prop here, which will save me from writing on the board and will hopefully be more understandable. The idea is that in this case we have a two-dimensional problem, with a bunch of points in a two-dimensional plane. You can certainly do convex hull in three dimensions, or in many dimensions. And convexity is a fundamental notion in optimization.
Maybe we'll get to that in 6.046 in advanced topics, maybe not. But in the context of today's lecture, what we're interested in is essentially finding an envelope, or hull, associated with a collection of points on a two-dimensional plane. As you can guess, this hull is going to be something that encloses all of these points, OK?

So what I have here, if I make this string taut enough... this is not working so well, but I think you get the picture. All right, so that's not a convex hull. It's not a convex hull because I have a bunch of points outside of the hull. All right, so let me just... that is a convex hull. And now if I start stretching like that, or like this, or like that, it's still a convex hull, OK? So that's the game.

We have to find an algorithm (and we'll look at a couple of different ones) that finds all of the segments associated with this convex hull, OK? So this is a segment that's part of the convex hull. That's a segment that's part of the convex hull. If, in fact, I had something like this, stretched out, then because I have those two points outside the convex hull, this may still be a segment that's part of the convex hull, but this one is not, right? So the game here is to find these segments.
So if you're going to be working with segments or tangents (we'll use the two words synonymously), we have to discover all of the tangents, or segments, associated with the entirety of the convex hull. And the only input we have is the set of points: their (x_i, y_i) coordinates.

There's a variety of algorithms you can use to do this. The one I wish I had time to explain, but will just mention, is what's called the gift wrapping algorithm. You might not have done this, but I guarantee you've probably taken a misshapen gift and tried to wrap it in gift wrapping paper. When you're doing that, if you're doing it right, you're essentially trying to find the convex hull of this three-dimensional structure. You're trying to tighten it up. You're trying to find the minimum amount of gift wrapping paper. I'm not sure if you've ever thought about minimizing gift wrapping paper, but you should have. And that's the convex hull of this three-dimensional shape. But we'll stick to two dimensions, because we have to draw things on the board. So let me just spec this out a bit.

I've been given n points in a plane, and that set of points is S = {(x_i, y_i) : i = 1, 2, ..., n}.
We're going to make a few assumptions here just to make things easy, because we don't want to deal with degenerate segments or other special cases. We're going to assume that no two points have the same x-coordinate. This is just a matter of convenience. No two have the same y-coordinate. And finally, no three are on a line, because we want to be able to look at pairs of points and find these segments, and it just gets kind of inconvenient: you have to handle special cases if three of them are on a line.

The convex hull itself is the smallest polygon containing all points in S, and we're going to call it CH(S), the convex hull of S.

STUDENT: Smallest convex polygon.

PROFESSOR: The smallest convex polygon. Thank you. So, just as an example on the board, when you have something like this, your convex hull is going to be that. This point is inside of it. These two points are inside of it. And all the other ones form the hull. So we might have p, q, r, s, t, u on the hull, while v and x are inside the hull. They're not part of the specification of CH(S), which I haven't quite told you how we're going to specify. The way we're going to specify it is simply by representing it as a sequence of the points on the boundary of the hull, in clockwise order.
You can think of this as a doubly linked list, in terms of the data structure you'd use if you coded this up. So in this case, it would be p to q to r to s, and so on. You could just as well start with t; it's a doubly linked list, so you could conceivably start with anything. But that's the representation of the convex hull. We're going to use clockwise just because we want to be clear about the order in which we're enumerating these points. It's going to become important when we do the divide and conquer algorithm.

So let's say we didn't care about divide and conquer, just for the heck of it, and I gave you a bunch of points over here. Forget efficiency for just a couple of minutes. Can you think of a simple algorithm that would generate the segments of the convex hull? For example, I do not want to generate this segment vx. If I think of a segment as being defined by two points, then I don't want to generate the segment vx, because clearly it is not part of the convex hull. Whereas the segments pq, qr, rs, et cetera, are all part of the convex hull, right? So what is the obvious brute force algorithm, forgetting efficiency, that given this set of points will generate the segments of the convex hull one by one?

Anybody? Did you have your hand up? No? Go ahead. Yep.
STUDENT: Draw the line and check how many other lines intersect with it.

PROFESSOR: Draw the line and check how many lines it intersects with.

STUDENT: Yeah.

PROFESSOR: Is there... I think you've got it. You draw the line. That's good, right?

[LAUGHTER]

PROFESSOR: Well, but you want to do a little more. Yeah, go ahead.

STUDENT: For every pair of points, make a half-plane and see whether it contains all of the other points. [INAUDIBLE]

PROFESSOR: Ah, so that's good. That's good. That's good. All right, so the first person who breaks the ice here always gets a Frisbee. Sorry, man. At least I only hit the lecturer. No liability considerations here. OK, now I'm getting scared. Right, so there's a certain amount of "when I throw this, am I going to choke or not," right? But the stakes get higher when one of you guys in the back answers a question. So you're exactly right. You draw a line, and then you just look at it: you look at the half-plane. If all the points are to one side, it is a segment of the convex hull. If they're not, it's not a segment. Beautiful. All right, are we done? Can we go and enjoy the good weather outside?
No, we've got a ways to go here. So this is not a segment, whereas... let me draw that. I should draw these in a dotted way. This is not a segment. This is not a segment. This is a segment. And I violated my rule about three points not being on a straight line, so I'll move this over here. And then that's a segment, and so on and so forth, OK? Right?

STUDENT: It's no longer a side with the ones below it.

PROFESSOR: I'm sorry?

STUDENT: It would have to go directly to the bottom one from the left one.

PROFESSOR: Oh, you're right. That's a good point. That's an excellent point. So what happened here was, when I moved that out... exactly right. Thank you. This is good. When I moved this point out here, my convex hull changed. The problem specification changed on me. It was my fault. What would happen, of course, is that as I move this, that would become the segment that is part of the convex hull, OK?

So sorry to confuse people. But the algorithm we have here works perfectly well if I leave the points the same. So let me leave the points the same and just quickly recap. I'm going to take a pair of points, and I'm going to draw the segment between them (in a dotted fashion first).
And I'm going to look at that line and say: this breaks up the plane into two half-planes. Are all the other points on one side? If the answer is yes, I'm going to go ahead and, boom, say that it is a segment of my convex hull. If the answer is no, as in this case, I'm going to drop that segment, OK?

So now let's talk about complexity. Let's say there are n points here. How many segments do I have? I have theta n squared segments. And what is the complexity of the test associated with deciding, once I've drawn a segment, whether it's a tangent that is part of the convex hull or not? What is the complexity?

STUDENT: O(n).

PROFESSOR: O(n), exactly right. So we have theta n squared segments times an O(n) test, and overall we get theta n cubed complexity, OK? So it makes sense to do divide and conquer if you can do better than this, because this is a really simple algorithm. The good news is that we will be able to do better than that. Now that we have a particular algorithm (I'm not quite ready to show you the better one yet), we can think about how to improve things. And of course, we're going to use divide and conquer. So let's go ahead and do that.
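The brute-force algorithm just described can be sketched as follows, as a minimal illustration rather than the lecture's code. The side test uses the standard cross-product orientation check: its sign tells which side of the directed line p to q a point r falls on. The function names are illustrative, and the sketch relies on the lecture's assumption that no three points are collinear (a zero cross product is not treated specially).

```python
from itertools import combinations

def cross(o, a, b):
    """Orientation of b relative to the directed line o -> a: > 0 left, < 0 right."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def brute_force_hull_segments(points):
    """All segments (p, q) with every other point on one side of line pq.

    Theta(n^2) pairs times a Theta(n) side test per pair = Theta(n^3) overall.
    Assumes no three points are collinear, as in the lecture.
    """
    segments = []
    for p, q in combinations(points, 2):
        left = [cross(p, q, r) > 0 for r in points if r != p and r != q]
        if all(left) or not any(left):  # all on the left, or all on the right
            segments.append((p, q))
    return segments

# Four corners of a square plus two interior points.
pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 1), (1, 2)]
print(brute_force_hull_segments(pts))
```

On this input only the four sides of the square survive the test; every segment touching an interior point, and both diagonals, has points on both sides.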
And so, generally, in divide and conquer, as I mentioned before, the division is pretty straightforward in most cases, and that's the case here as well. All the fun is going to be in the merge step. So what we're going to do, as you can imagine, is take these points and break them up. The way we're going to break them up is by dividing them into halves: we just draw a line, and we say everything to the left of the line is one subproblem and everything to the right of the line is another subproblem. Then we go off and find the convex hull for each of the subproblems. If you have two points, you're done, obviously; it's trivial. And at some point you can say, I'm just going to use brute force. We have the theta n cubed algorithm, so if n is small, I can just apply it. The base case doesn't even have to be n equals 1 or n equals 2, although that's a perfectly fine thing to do. You could certainly stop at n equals 10, as I mentioned before, and run the brute force algorithm. So you know that you can get down to small enough subproblems for which you can find the convex hull efficiently. Then you've got these two convex hulls, which are clearly in two different half-planes, because that's the way you defined them. And now you've got to merge them.
And that's where all the fun is, OK? So let's just write this out again.

Sort the points by x-coordinate. We're going to do this once and for all. We don't have to keep sorting, because we're just going to be partitioning based on x-coordinates, and we can keep splitting on x-coordinates to generate these halves, right? So we do the sort once and for all. Then, for the input set S, we divide it into the left half A and the right half B by x-coordinate. We compute CH(A) and CH(B) recursively. And then we combine.

The only difference from what we had before is the specification of the division. The earlier version looked pretty generic; this is similar to the paradigm I wrote before, but I've specified exactly how I'm going to break the problem up. So let's start on the merge operation. We're going to spend most of our time specifying that. Again, there are many ways you could do the merge, and we want the most efficient one. That's obviously going to determine the complexity. So, the big question: how to merge.

If I look at the merge step, I've created my two subproblems corresponding to these two half-planes. Let's say I've generated, at this point, a convex hull for each of these subproblems.
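The divide step just written out (sort by x once, split into a left half A and a right half B, recurse, fall back to brute force below a small cutoff) can be sketched like this. The merge here is deliberately a naive stand-in, not the efficient merge the lecture goes on to develop: it relies on the fact that the hull of A ∪ B is the hull of the two sub-hulls' vertices and simply reruns the brute-force test on them. The names `hull_dc`, `brute_hull`, `merge`, and the cutoff `base` are hypothetical.

```python
import math
from itertools import combinations

def cross(o, a, b):
    """> 0 if b lies to the left of the directed line o -> a, < 0 if to the right."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def brute_hull(points):
    """Hull vertices in clockwise order, found with the Theta(n^3) pair test."""
    if len(points) <= 2:
        return list(points)
    verts = set()
    for p, q in combinations(points, 2):
        sides = [cross(p, q, r) > 0 for r in points if r not in (p, q)]
        if all(sides) or not any(sides):  # every other point on one side
            verts.update((p, q))
    cx = sum(x for x, _ in verts) / len(verts)
    cy = sum(y for _, y in verts) / len(verts)
    # Decreasing polar angle about an interior point = clockwise (y axis up).
    return sorted(verts, key=lambda v: -math.atan2(v[1] - cy, v[0] - cx))

def merge(left_hull, right_hull):
    """Naive stand-in merge: CH(A u B) is the hull of the two sub-hulls' vertices."""
    return brute_hull(left_hull + right_hull)

def hull_dc(points, base=3):
    """Divide and conquer on x-sorted points; requires base >= 1."""
    pts = sorted(points)  # sort by x once; distinct x-coordinates assumed

    def rec(lo, hi):
        if hi - lo <= base:
            return brute_hull(pts[lo:hi])
        mid = (lo + hi) // 2
        return merge(rec(lo, mid), rec(mid, hi))  # CH(A), CH(B), then combine

    return rec(0, len(pts))

print(hull_dc([(0, 0), (4, 0), (4, 4), (0, 4), (2, 1), (1, 2)]))
```

Because this merge is itself brute force, the sketch gains nothing asymptotically; it only shows where the recursion, the base case, and the combine step sit.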
So what I have here is a1, a2, and so on. I'm going to go clockwise to specify the convex hull. The other thing I'm going to do is fix the starting point of each subproblem. For the left subproblem, the starting point is the point with the highest x value, OK? That's a1 in this case; x increases to the right. For the right half of the problem, it's the point with the lowest x value. And I'm going to go clockwise in both cases. So when you see an ordering associated with the subscripts for these points, start with a1 or b1 and then go clockwise. That's how we number them; it's just notation, nothing profound here.

So I've got these two convex hulls, these sub-hulls, if you will, and what I need to do now is merge them together. You can look at this and it's kind of obvious what the overall convex hull is, right? But the key thing is, I'm going to have to look at pairs of points, one from each sub-hull, and try to generate the new tangents that are not part of the sub-hulls but are part of the overall hull, right? So in this case, you can imagine an algorithm that kind of does what the brute force algorithm does, except that it's looking at a point from here and a point from here.
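The numbering convention just described (a1 is the largest-x vertex of the left hull, b1 the smallest-x vertex of the right hull, both listed clockwise) amounts to rotating each clockwise vertex list. A minimal sketch, with hypothetical helper names and 0-based indices, so index 0 plays the role of a1 or b1:

```python
def label_left(hull_cw):
    """Rotate a clockwise hull of A so index 0 is a1, its largest-x vertex."""
    k = max(range(len(hull_cw)), key=lambda i: hull_cw[i][0])
    return hull_cw[k:] + hull_cw[:k]

def label_right(hull_cw):
    """Rotate a clockwise hull of B so index 0 is b1, its smallest-x vertex."""
    k = min(range(len(hull_cw)), key=lambda i: hull_cw[i][0])
    return hull_cw[k:] + hull_cw[:k]

A = label_left([(0, 1), (1, 4), (3, 2), (2, 0)])   # clockwise input, any start
B = label_right([(6, 4), (8, 3), (7, 0), (5, 1)])  # clockwise input, any start
print(A)  # [(3, 2), (2, 0), (0, 1), (1, 4)]
print(B)  # [(5, 1), (6, 4), (8, 3), (7, 0)]
```

Rotation keeps the clockwise order intact, so walking a doubly linked list from a1 or b1 visits the vertices in exactly this numbering.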
So you could imagine that I'm going to do a pairwise generation of segments and then check whether these segments are actually tangents that are part of the overall convex hull or not. So I'd look at this one. Is that going to be part of the overall hull? No. And precisely why not? Someone tell me why this segment a1 b1 is not part of the overall hull. Yeah, go ahead.

STUDENT: If we were to draw a line through the whole thing, there would be points on both sides.

PROFESSOR: Exactly right. That's exactly right. So that's not part of it. Now, if I look at this one: same reason, it's not part of it. This is a fairly obvious example; I'm going to do something slightly less obvious in case you get your hopes up that we have a trivial algorithm, OK? This one is looking good, right? That's supposed to be a straight line, by the way. So a4 b2 is looking good, right? Because all the points are on one side. So a4 b2 is our upper tangent.

So here is how we're going to define the upper tangent. For each of these pairs, I'm going to define a y_ij. What is y_ij? It's the y-coordinate at which the segment I'm looking at, the segment from a_i to b_j, crosses the dividing line. So what I have here is y_42 out here.
And for the upper tangent, y_ij is going to be the maximum, right? Because that ensures there are no points higher than that, right? So if I go all the way up and find the pair with the maximum y_ij, that is going to be my upper tangent, because only for that pair will I have no points above it, OK? So the upper tangent maximizes y_ij. I'm not going to write this down, but it makes sense that the lower tangent is going to have the lowest y_ij. Are we all good here? Yeah, question.

STUDENT: I'm just wondering, I couldn't hear what she said about why we ruled out a1 b1.

PROFESSOR: OK, good. The reason we ruled out a1 b1 is this: if I draw a1 b1 like this and extrapolate it (again, this is supposed to be a straight line), then you clearly see that there are points on either side of the a1 b1 segment when you look at the overall problem, correct? For a1 b1, b2 is on this side and b3 is on this side if I extend this line all the way to infinity in both directions. And that violates the requirement for the segment to be part of the overall hull, OK? Does that make sense? Good. So, everybody with me?

So clearly, there's a trivial merge algorithm here, and it is to look not at every pair of points, but at every ab pair, right? Every a_i b_j pair.
And so what is the complexity of doing that? If I have n total points, the complexity would be theta n squared, right? Because maybe I'd have half of the points here and half here. Ignoring constants, you could say it's n squared divided by 4, but that's theta n squared. So there's an obvious merge algorithm that is theta n squared, looking at all pairs of points. And by all pairs of points, I mean an a and a b, because I want to pick one point to the left of the dividing line and one to the right of it. Either way, it's theta n squared, OK? So now you look at that and you go, huh, can I do better?

What if I just went for the highest a point and the highest b point, and said, that's it, I'm done, in constant time? Wouldn't that be wonderful? Yeah, wonderful, but incorrect, OK? So what is an example? This is something I spent a little bit of time concocting last night. So I'm like you guys too: I do my problem set the night before. Well, don't do as I do; do as I say. But I've done this before, so that's the difference. This particular example is new, though. What I'm going to show you is why there isn't a trivial algorithm (I've got to get these angles right), why you can't just pick the highest points and keep going, right?
297 00:32:00,759 --> 00:32:06,470 And then that would be constant time. So that's my a over here. And let's assume that I have 298 00:32:06,470 --> 00:32:11,109 my dividing line like that. And then what I'm going to do here-- and I hope I get this 299 00:32:11,109 --> 00:32:17,960 right-- is I'm going to have something like this, like that. 300 00:32:17,960 --> 00:32:30,409 And then I'm going to have b1 here clockwise-- so b2, b3, and b4. So as you can see here, 301 00:32:30,409 --> 00:32:49,389 if I look at a4-- a little adjustment necessary. OK, so if I look at that, a4 to b1 versus-- 302 00:32:49,389 --> 00:32:50,710 I mean, just eyeball it. 303 00:32:50,710 --> 00:32:58,669 a3 to b1-- right, is a4 to b1 going to be the upper tangent? No, right? So now a3 is 304 00:32:58,669 --> 00:33:01,859 lower than a4. You guys see that? 305 00:33:01,859 --> 00:33:08,320 And b1 is lower than b2, right? So it's clear that if I just took a4 to b2, it will 306 00:33:08,320 --> 00:33:12,489 not be an upper tangent. Does everybody see that? 307 00:33:12,489 --> 00:33:19,590 Yep, all right, good. So we can't have a constant time algorithm. We have theta n squared in 308 00:33:19,590 --> 00:33:24,289 the bag. So is there something-- maybe theta n? 309 00:33:24,289 --> 00:33:34,429 How would we do this merge and find the upper tangent by being a little smarter about searching 310 00:33:34,429 --> 00:33:43,570 for pairs of points that give us this maximum yij? I mean, the goal here is simple. At some 311 00:33:43,570 --> 00:33:47,220 level, if you looked at the brute force, I would generate each of these things. 312 00:33:47,220 --> 00:33:53,340 I would find the yij intercepts associated with these lines. And I'd just pick the maximum. 313 00:33:53,340 --> 00:33:56,070 And the constant time algorithm doesn't work. 314 00:33:56,070 --> 00:34:01,730 The theta n squared algorithm definitely works. But we don't like it.
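A minimal Python sketch of the brute-force merge just described, assuming each sub-hull is a list of (x, y) tuples with integer coordinates and that the dividing line is the vertical line x = x_mid. The names `intercept` and `brute_force_upper_tangent` are illustrative, not from the lecture. For every (ai, bj) pair it computes yij, the height at which the line through ai and bj crosses the dividing line, in constant time, and keeps the largest, so the whole merge is Θ(p·q) = Θ(n²).

```python
from fractions import Fraction
from itertools import product


def intercept(a, b, x_mid):
    """y_ij: height where the line through a (left of x_mid) and b (right of x_mid) crosses x = x_mid."""
    (ax, ay), (bx, by) = a, b
    return ay + Fraction(by - ay) * (x_mid - ax) / (bx - ax)


def brute_force_upper_tangent(left, right, x_mid):
    """Theta(p * q) merge step: try every (a_i, b_j) pair and keep the largest intercept."""
    best = None
    for i, j in product(range(len(left)), range(len(right))):
        y = intercept(left[i], right[j], x_mid)
        if best is None or y > best[0]:
            best = (y, i, j)
    return best[1], best[2]


if __name__ == "__main__":
    A = [(4, 2), (2, 0), (0, 3), (3, 5)]   # left sub-hull (hypothetical example)
    B = [(6, 1), (7, 6), (10, 3), (8, 0)]  # right sub-hull (hypothetical example)
    print(brute_force_upper_tangent(A, B, 5))  # (3, 1): the segment (3, 5)-(7, 6)
```

`Fraction` keeps the intercept comparisons exact for integer inputs, so ties are not disturbed by floating-point rounding.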
So there has to be something 315 00:34:01,730 --> 00:34:05,990 in between. So, any ideas? Yeah, back there. 316 00:34:05,990 --> 00:34:14,000 STUDENT: So... I had a question. [INAUDIBLE] 317 00:34:14,000 --> 00:34:19,139 PROFESSOR: No, you're just finding-- no, you're maximizing the yij. So once you have this 318 00:34:19,139 --> 00:34:25,929 segment-- so the question was, isn't the obvious merge algorithm theta n cubed, right? And 319 00:34:25,940 --> 00:34:31,668 my answer is no, because the extra theta n factor came from the fact that you had to 320 00:34:31,668 --> 00:34:36,739 check every point, every endpoint, to see on which side of the line it was. Whereas 321 00:34:36,739 --> 00:34:41,070 here, what I'm doing is I've got this one line here that is basically y equals 0, if 322 00:34:41,070 --> 00:34:47,909 you like, or y equals some-- I'm sorry, x equals 0 or x equals some value. 323 00:34:47,909 --> 00:34:55,270 And I just need to, once I have the equation for the line associated with a4 b1 or a4 b2, 324 00:34:55,270 --> 00:35:00,500 I just have to find the intercept of it, which is constant time, right? And then once I find 325 00:35:00,500 --> 00:35:06,750 the intercept of it, I just maximize that intercept to get my yij. So I'm good, OK? 326 00:35:06,750 --> 00:35:15,230 So it's only theta n squared, right? Good question. So this next algorithm is actually quite-- very, 327 00:35:15,230 --> 00:35:17,620 very, very clever. 328 00:35:17,620 --> 00:35:22,370 This particular algorithm is called the two-finger algorithm. And I do have multiple fingers, 329 00:35:22,370 --> 00:35:27,790 but it's going to work a lot better if I borrow Erik's finger. And we're going to demonstrate 330 00:35:27,790 --> 00:35:36,300 to you the two-finger algorithm for merging these two convex hulls. And then we'll talk 331 00:35:36,300 --> 00:35:39,160 about the complexity of it.
332 00:35:39,160 --> 00:35:44,470 And my innovation again last night was to turn this from a two-finger algorithm into something more. Not 333 00:35:44,470 --> 00:35:48,660 only did I have the bright idea of using Erik-- I decided it was going to become the two-finger 334 00:35:48,660 --> 00:35:52,720 and string algorithm. So this is wild. 335 00:35:52,720 --> 00:36:04,420 This is my contribution to 6.046 lore-- come on. So the way the two-finger algorithm works-- 336 00:36:04,420 --> 00:36:10,220 this pseudocode should be incomprehensible. If you just look at it and you go, what, right? 337 00:36:10,220 --> 00:36:15,100 But this demo is going to clear everything up. Right, so here's what you do. So now we're 338 00:36:15,100 --> 00:36:22,150 going to do a demo of the merge algorithm that is a cleverer merge algorithm than the 339 00:36:22,150 --> 00:36:29,070 one that uses order n squared time. And it's correct. It's going to get you the correct 340 00:36:29,070 --> 00:36:37,290 upper tangent. What we are starting with here is Erik's left finger on A1, which 341 00:36:37,290 --> 00:36:45,400 is defined to be the point that's closest to the vertical line that you see here, the 342 00:36:45,400 --> 00:36:50,580 one that has the highest x-coordinate. And my finger is on B1, which is the point that 343 00:36:50,580 --> 00:36:58,760 has the smallest x-coordinate on the right-hand sub-hull. And what we do is we compute, 344 00:36:58,760 --> 00:37:06,460 for the segment A1 B1, Yij, in this case Y11, which is the intercept on 345 00:37:06,460 --> 00:37:13,730 the vertical line that you see here that Erik just marked with a red dot. And you can look 346 00:37:13,730 --> 00:37:19,960 at the pseudocode over on the board to my right, if I face the board. And what happens now is 347 00:37:19,960 --> 00:37:26,730 I'm going to move clockwise, and I'm going to go from B1 to B4. And what happened here? 348 00:37:26,730 --> 00:37:34,010 Did the Yij increase or decrease?
Well, as you can see, it decreased. And so I'm going 349 00:37:34,010 --> 00:37:40,930 to go back to B1. And we're not quite done with this step here. Erik's going to go 350 00:37:40,930 --> 00:37:47,360 counterclockwise over to A4. And we're going to check again, yeah, keep the string taut, 351 00:37:47,360 --> 00:37:53,440 check again whether Yij increased or decreased, and as is clear from here, Yij increased. So 352 00:37:53,440 --> 00:38:00,700 now we move to this point. And as of this moment we think that A4 B1 has the highest 353 00:38:00,700 --> 00:38:05,220 Yij. But we have a while loop. We're going to have to continue with this while loop, 354 00:38:05,220 --> 00:38:13,370 and now what happens is, I'm going to go from B1 clockwise again to B4. And when this 355 00:38:13,370 --> 00:38:19,570 happens, did Yij increase or decrease? Well, it decreased. So I'm going to go back to B1 356 00:38:19,570 --> 00:38:32,430 and Erik now is going to go counterclockwise to A3. And as you can see, Y31 increased a 357 00:38:32,430 --> 00:38:39,350 little bit, so we're going to now stop this iteration of the algorithm and we're at A3 358 00:38:39,350 --> 00:38:46,860 B1, which we think at this point is our upper tangent, but let's check that. Start over 359 00:38:46,860 --> 00:38:54,750 again on my side, B1 to B4: what happened? Well, Yij decreased. So I'm going to go back 360 00:38:54,750 --> 00:38:59,350 to B1. And then Erik's going to try. He's going counterclockwise; he's going to go A3 361 00:38:59,350 --> 00:39:08,760 to A2 and, well, big decrease in Yij. Now Erik goes back to A3. At this point we've 362 00:39:08,760 --> 00:39:17,040 tried both moves, my clockwise move and Erik's counterclockwise move. My move from B1 to 363 00:39:17,040 --> 00:39:24,890 B4 and Erik's move from A3 to A2. So we've converged, we're out of the while loop, and A3 364 00:39:24,890 --> 00:39:34,010 B1 for this example is our upper tangent. All right. You can have your finger back, Erik.
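The board pseudocode itself is not captured in this transcript, so here is a hedged Python sketch of the two-finger loop as the demo describes it. It assumes both sub-hulls are lists of integer (x, y) points in clockwise order, every point of A lies left of every point of B, and no three points are collinear; `y_at` and `two_finger_upper_tangent` are illustrative names. The left finger starts at the rightmost point of A and the right finger at the leftmost point of B. The right finger steps clockwise whenever that raises yij; otherwise the left finger steps counterclockwise if that raises yij; the loop stops when neither move helps.

```python
from fractions import Fraction


def y_at(a, b, x_mid):
    """y_ij: height where the line through a and b crosses the vertical line x = x_mid."""
    (ax, ay), (bx, by) = a, b
    return ay + Fraction(by - ay) * (x_mid - ax) / (bx - ax)


def two_finger_upper_tangent(A, B):
    """Return indices (i, j) such that A[i]-B[j] is the upper tangent.

    Assumes A and B are convex hulls in clockwise order, every x in A is smaller
    than every x in B, integer coordinates, and no three collinear points.
    """
    p, q = len(A), len(B)
    i = max(range(p), key=lambda k: A[k][0])   # a_1: rightmost point of A
    j = min(range(q), key=lambda k: B[k][0])   # b_1: leftmost point of B
    x_mid = Fraction(A[i][0] + B[j][0], 2)     # dividing line strictly between A and B

    def y(s, t):
        return y_at(A[s], B[t], x_mid)

    while y(i, (j + 1) % q) > y(i, j) or y((i - 1) % p, j) > y(i, j):
        if y(i, (j + 1) % q) > y(i, j):
            j = (j + 1) % q                    # right finger moves clockwise
        else:
            i = (i - 1) % p                    # left finger moves counterclockwise
    return i, j


if __name__ == "__main__":
    A = [(4, 2), (2, 0), (0, 3), (3, 5)]       # hypothetical left sub-hull, clockwise
    B = [(6, 1), (7, 6), (10, 3), (8, 0)]      # hypothetical right sub-hull, clockwise
    print(two_finger_upper_tangent(A, B))      # (3, 1): the segment (3, 5)-(7, 6)
```

Every iteration strictly increases yij and advances one of the two indices, and neither finger moves past its tangent point, so the loop runs at most about p + q times; that is the theta n bound argued in the lecture.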
365 00:39:34,010 --> 00:39:42,810 So the reason this works is because we have a convex hull here and a convex hull here. 366 00:39:42,810 --> 00:39:51,100 We are starting with the points that are closest to each other in terms of A1 being the closest 367 00:39:51,100 --> 00:39:58,240 to this vertical line, B1 being the closest to this vertical line, and we are moving upward 368 00:39:58,240 --> 00:40:04,570 in both directions because I went clockwise and Erik went counterclockwise. And that's 369 00:40:04,570 --> 00:40:09,000 the intuition of why this algorithm works. We're not going to do a formal proof of this 370 00:40:09,000 --> 00:40:17,490 algorithm, but the monotonicity property corresponding to the convexity of this subhull and the convexity 371 00:40:17,490 --> 00:40:24,100 of that subhull essentially can give you a formal proof of correctness of this algorithm, 372 00:40:24,100 --> 00:40:31,010 but as I said we won't cover that in 6.046. So all that remains now is to look at our 373 00:40:31,010 --> 00:40:37,850 pseudocode, which matches the execution that you just saw, and talk about the complexity 374 00:40:37,850 --> 00:40:39,310 of the pseudocode. 375 00:40:39,310 --> 00:40:46,300 So what is the complexity of this algorithm? It's order n, right? So what is happening 376 00:40:46,300 --> 00:40:51,970 here, if you look at this while loop, is that while I have two counters, I'm essentially 377 00:40:51,970 --> 00:40:56,490 looking at two operations per loop. 378 00:40:56,490 --> 00:41:03,860 And one of those counters is guaranteed to increment each time through the loop. And so since 379 00:41:03,860 --> 00:41:11,750 I have in this case p points, in one case p plus q equals n-- so let's say I had p points 380 00:41:11,750 --> 00:41:19,500 here and I have q points here. And I've got p plus q equals n.
381 00:41:19,500 --> 00:41:29,250 And I get a theta n merge simply because I'm going to be running through and incrementing-- 382 00:41:29,250 --> 00:41:35,430 as long as I'm in the loop, I'm going to be incrementing either the i or the j. And the 383 00:41:35,430 --> 00:41:41,830 maximum they can go to are p and q before I bounce out of the loop or before they rotate 384 00:41:41,830 --> 00:41:42,850 around. 385 00:41:42,850 --> 00:41:50,780 And so that's why this is theta n. And so you put it all together in terms of what the 386 00:41:50,780 --> 00:41:57,190 merge corresponds to in terms of complexity and put that together with the overall divide 387 00:41:57,190 --> 00:42:06,270 and conquer. We have a case where this is looking like a recurrence that you've seen 388 00:42:06,270 --> 00:42:08,460 many a time: T of n. 389 00:42:08,460 --> 00:42:17,890 I've broken it up into two sub problems. So I have 2. And I could certainly choose this 390 00:42:17,890 --> 00:42:27,750 l over here, my line l, such that I have a good partition between the two sets 391 00:42:27,750 --> 00:42:28,400 of points. 392 00:42:28,400 --> 00:42:33,210 Now, if I choose l to be all the way on the right-hand side, then I have this large sub 393 00:42:33,210 --> 00:42:38,360 problem-- makes no sense whatsoever. So what I can do-- there's nothing that's stopping 394 00:42:38,360 --> 00:42:46,610 me, when I've sorted these points by the x-coordinates, from doing the division such that there's exactly 395 00:42:46,610 --> 00:42:52,140 the same number, assuming an even number of points n, exactly the same number on the left-hand 396 00:42:52,140 --> 00:42:57,340 side and the right-hand side. And in any case, I can get the two sides to within one of each other 397 00:42:57,340 --> 00:42:59,000 very easily. 398 00:42:59,000 --> 00:43:04,510 So that's where the n over 2 comes from, OK?
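A minimal sketch of the divide step, assuming the points are (x, y) tuples with distinct integer x-coordinates; `split_by_x` is a hypothetical helper name. After sorting by x, cutting the list at index n // 2 gives two halves whose sizes differ by at most one, and any vertical line strictly between the last left point and the first right point can serve as the line l.

```python
from fractions import Fraction


def split_by_x(points):
    """Divide step: sort by x-coordinate and cut the list in half.

    Assumes at least two points with distinct integer x-coordinates.
    Returns (left, right, x_line): the halves differ in size by at most one,
    and the vertical line x = x_line separates them.
    """
    pts = sorted(points)                       # tuples compare by x first
    mid = len(pts) // 2
    left, right = pts[:mid], pts[mid:]
    x_line = Fraction(left[-1][0] + right[0][0], 2)
    return left, right, x_line


if __name__ == "__main__":
    print(split_by_x([(5, 1), (1, 2), (3, 0), (2, 2), (4, 4)]))
```

In a full implementation one would sort once at the top level and recurse on slices of the sorted list, so the Θ(n log n) sort is paid only once and each level of the recursion does Θ(n) work.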
In the next problem that we'll look at, the 399 00:43:04,510 --> 00:43:09,540 median finding problem, we'll find that trying to get the sub problems to be of roughly equal 400 00:43:09,540 --> 00:43:14,620 size is actually a little difficult, OK? But I want to point out that in this particular 401 00:43:14,620 --> 00:43:20,970 case, it's easy to get sub problems that are half the size because you've done the sorting. 402 00:43:20,970 --> 00:43:26,970 And then you just choose the line, the vertical line such that you've got a bunch of points 403 00:43:26,970 --> 00:43:33,320 that are on either side. And then in terms of the merge operation, we have 2 T of n over 404 00:43:33,320 --> 00:43:41,310 2 plus theta n. People recognize this recurrence? It's the old merge sort recurrence. 405 00:43:41,310 --> 00:43:45,920 So we did all of this in-- well, it's not merge sort. Clearly the algorithm is not merge 406 00:43:45,920 --> 00:43:48,340 sort. We got the same recurrence. 407 00:43:48,340 --> 00:43:56,960 And so this is theta n log n-- so a lot better than theta n cubed. And there's no convex hull 408 00:43:56,960 --> 00:44:02,850 algorithm that's better than this in the general case. Even the gift wrapping algorithm 409 00:44:02,850 --> 00:44:07,720 that I mentioned to you, with the right data structures, it gets down to that in terms 410 00:44:07,720 --> 00:44:11,010 of theta n log n, but no better. 411 00:44:11,010 --> 00:44:17,890 OK, so good. That's pretty much what I had here. Again, like I said, happy to answer 412 00:44:17,890 --> 00:44:25,750 questions about the correctness of this loop algorithm for merge later. Any other questions 413 00:44:25,750 --> 00:44:27,570 associated with this? 414 00:44:27,570 --> 00:44:28,560 STUDENT: Question. 415 00:44:28,560 --> 00:44:29,760 PROFESSOR: Yeah, back there. 416 00:44:29,770 --> 00:44:33,940 STUDENT: If the input is presorted by x coordinates, can you do better than [INAUDIBLE]?
417 00:44:33,940 --> 00:44:41,100 PROFESSOR: No, you can't, because-- I mean, the n log n for the presorting, I mean, there's 418 00:44:41,100 --> 00:44:48,120 another theta n log n for the sorting at the top level. And we didn't actually use that, 419 00:44:48,120 --> 00:44:54,010 right? So the question was, can we do better if the input was presorted? 420 00:44:54,010 --> 00:45:00,920 And I actually did not even use the complexity of the sort. We just matched it in this case. 421 00:45:00,920 --> 00:45:05,220 So theta n log n-- and then you can imagine maybe that you could do a theta n sort if 422 00:45:05,220 --> 00:45:09,860 these points were small enough and you rounded them up and you could use a bucket sort or 423 00:45:09,860 --> 00:45:12,130 a counting sort and lower that. 424 00:45:12,130 --> 00:45:17,720 So this theta n log n is kind of fundamental to the divide and conquer algorithm. The only 425 00:45:17,720 --> 00:45:23,530 way you can improve that is by making a merge process that's even faster. And we obviously 426 00:45:23,530 --> 00:45:29,470 tried to cook up a theta one merge process. But that didn't work out, OK? 427 00:45:29,470 --> 00:45:33,720 STUDENT: But are there algorithms that [INAUDIBLE]? 428 00:45:33,720 --> 00:45:38,920 PROFESSOR: If you assume certain things about the input, you're absolutely right. 429 00:45:38,930 --> 00:45:45,540 So one thing you'll discover in algorithms in 6.046 as well is that we're never satisfied. 430 00:45:45,540 --> 00:45:49,630 OK, so I just said, oh, you can't do better than theta n log n. 431 00:45:49,630 --> 00:45:54,620 But that's in the general case. And I think I mentioned that. You're on the right track. 432 00:45:54,620 --> 00:46:00,440 If the input is presorted, you can take that away-- no, it doesn't help in that particular 433 00:46:00,440 --> 00:46:09,110 instance if you have general settings.
But if you-- the two dimensional case-- if the 434 00:46:09,110 --> 00:46:17,140 hull, all the segments have a certain characteristic-- not quite planar, but something that's a little 435 00:46:17,140 --> 00:46:21,430 more stringent than that-- you could imagine that you can do improvements. I don't know 436 00:46:21,430 --> 00:46:27,750 of any compelling special-case input for convex hull for which you can do better than theta 437 00:46:27,750 --> 00:46:28,540 n log n. 438 00:46:28,540 --> 00:46:34,520 But that's a fine exercise for you, which is in what cases, given some structure on 439 00:46:34,520 --> 00:46:38,690 the points, can I do better than theta n log n? So that's something that keeps coming up 440 00:46:38,690 --> 00:46:45,890 in the algorithm literature, if you can use that, OK? Yeah, back there-- question. 441 00:46:45,890 --> 00:46:47,710 STUDENT: Where's your [INAUDIBLE] step? 442 00:46:47,710 --> 00:46:53,000 You also have to figure out which lines to remove from each of your two... 443 00:46:53,000 --> 00:46:58,560 PROFESSOR: Ah, good point. And you're absolutely right. And I just realized that 444 00:46:58,560 --> 00:47:00,060 I skipped that step, right? 445 00:47:00,060 --> 00:47:05,360 Thank you so much. So the question was, how do I remove the lines? And it's actually fairly 446 00:47:05,360 --> 00:47:06,090 straightforward. 447 00:47:06,090 --> 00:47:13,720 Let's keep this up here. And we don't need this incomprehensible pseudocode, right? 448 00:47:13,720 --> 00:47:16,800 So let's erase that. 449 00:47:16,800 --> 00:47:24,060 And thank you for asking that question. So it's a simple little cut and paste approach 450 00:47:24,060 --> 00:47:39,440 where let's say that I find the upper tangent ai bj. And I find the lower tangent. 451 00:47:39,440 --> 00:47:51,430 Let's call it ak bm. And in this particular instance, what do I have? I have a1, a2, a3, 452 00:47:51,430 --> 00:48:00,850 a4 as being one of my sub hulls.
And then I have b1, b2, b3, b4 as the other one. 453 00:48:00,850 --> 00:48:10,830 Now, what did we determine to be the upper tangent? Was it a3 b1? Right, a3 b1? 454 00:48:10,830 --> 00:48:27,370 So a3 b1 was my upper tangent. And I guess it was a1-- a1 b4? A1 b4 was my lower tangent. 455 00:48:27,370 --> 00:48:34,580 So the big question is, now that I've found these two, how do I generate the correct representation 456 00:48:34,580 --> 00:48:40,990 of the overall convex hull? And so it turns out that you have to do this-- and then the 457 00:48:40,990 --> 00:48:46,660 complexity of this is important as well. And you need to do what's called a cut and paste 458 00:48:46,660 --> 00:48:50,350 that's associated with this where we're going to just look at this and that. 459 00:48:50,350 --> 00:48:54,740 So if we're going to have these two things, then we've got to generate a list of points. 460 00:48:54,740 --> 00:49:00,200 Now, clearly a4 is not going to be part of that, right? A4 is not going to be part of 461 00:49:00,200 --> 00:49:01,490 the overall hull. 462 00:49:01,490 --> 00:49:11,880 What is it that we want? We want something like a1, a2, a3, b1, b2, b3, b4, right? But 463 00:49:11,880 --> 00:49:16,210 there's a point that we have to discard here. Agree? 464 00:49:16,210 --> 00:49:22,780 And so the way we do this is very mechanical. That's the good news here. I mean, you don't 465 00:49:22,780 --> 00:49:24,230 have to look at it pictorially. 466 00:49:24,230 --> 00:49:30,840 I just made that up looking at-- eyeballing it. Clearly, a computer doesn't have eyeballs, 467 00:49:30,840 --> 00:49:37,430 right? And so what we're going to do is we're going to say the first link-- in general, 468 00:49:37,430 --> 00:49:40,960 the first link is ai to bj. 469 00:49:40,960 --> 00:49:49,940 Because that's my upper tangent, OK? And in this case, it's going to be a3 b1, OK?
And 470 00:49:49,940 --> 00:50:08,820 then I'm going to go down the b list until I see bm, which is on the lower tangent. 471 00:50:08,820 --> 00:50:12,640 You're on the b list. So you're looking for the lower tangent point. And then you're going 472 00:50:12,640 --> 00:50:20,240 to keep going until you see bm. You link it to ak, OK? 473 00:50:20,240 --> 00:50:34,310 You link it to ak and continue until you return to ai. And then you have your circular 474 00:50:34,310 --> 00:50:42,910 list, OK? So what you see here is you have a3 here. So I'm going to go ahead and write 475 00:50:42,910 --> 00:50:47,980 out the execution of what I just wrote here. 476 00:50:47,980 --> 00:50:54,500 So I have a3. And I'm going to jump over to b1. So I'm going to write down b1. Then 477 00:50:54,500 --> 00:50:58,340 I'm going to go along the b's until I get to b4. 478 00:50:58,340 --> 00:51:05,440 In this case, I'm going to include all of the b's. So I got b1, b2, b3, b4. And then 479 00:51:05,440 --> 00:51:13,810 I'm going to jump from b4 to a1 because that's part of my lower tangent. 480 00:51:13,810 --> 00:51:25,250 And I got a1 here, a2. And then I'm back to a3, which is great. Because then I'm done, 481 00:51:25,250 --> 00:51:26,290 OK? 482 00:51:26,290 --> 00:51:32,050 And so exactly what I said happened, thank goodness, which is we dropped a4 but we kept 483 00:51:32,050 --> 00:51:38,300 all the other points. Does that answer your question? Good. 484 00:51:38,300 --> 00:51:44,520 What is the complexity of cut and paste? It's order n. I'm just walking through these lists. 485 00:51:44,520 --> 00:51:50,730 So there's no hidden complexity here, OK? Good, good-- thank you. You definitely deserve 486 00:51:50,730 --> 00:51:51,440 a Frisbee. 487 00:51:51,440 --> 00:51:59,400 In fact, you deserve two, right? Where are you? I-- oh, could you stand up? 488 00:51:59,400 --> 00:52:09,570 Yeah, right-- two colors. All right. Oh, so he-- well, you can give it to him if you like.
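A hedged Python sketch of the cut-and-paste recipe from the board, assuming both sub-hulls are lists in clockwise order and that the tangents are already known as index pairs: (A[i], B[j]) for the upper tangent and (A[k], B[m]) for the lower tangent. `cut_and_paste` is an illustrative name. It links ai to bj, walks the b list until bm, jumps to ak, and walks the a list until it returns to ai; each point is visited at most once, so it is O(n).

```python
def cut_and_paste(A, B, i, j, k, m):
    """Merge two clockwise hulls given upper tangent (A[i], B[j]) and lower tangent (A[k], B[m]).

    Returns the merged hull in clockwise order, starting at A[i].
    """
    hull = [A[i]]
    t = j
    while True:                    # first link a_i -> b_j, then walk B clockwise to b_m
        hull.append(B[t])
        if t == m:
            break
        t = (t + 1) % len(B)
    t = k
    while t != i:                  # link b_m -> a_k, then walk A clockwise back to a_i
        hull.append(A[t])
        t = (t + 1) % len(A)
    return hull


if __name__ == "__main__":
    A = ["a1", "a2", "a3", "a4"]
    B = ["b1", "b2", "b3", "b4"]
    # Board example: upper tangent a3-b1, lower tangent a1-b4 (0-based indices here).
    print(cut_and_paste(A, B, 2, 0, 0, 3))
    # ['a3', 'b1', 'b2', 'b3', 'b4', 'a1', 'a2']
```

The output matches the list written on the board: a4 is dropped and every other point is kept.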
489 00:52:09,570 --> 00:52:12,380 So good, thank you. 490 00:52:12,380 --> 00:52:20,760 So are we done? Are we done with convex hull? OK, good. So let's go on and do median finding. 491 00:52:20,760 --> 00:52:25,960 Very different-- very different set of issues here. 492 00:52:25,960 --> 00:52:37,960 Still on divide and conquer, but a very different set of issues. The specification here is, 493 00:52:37,960 --> 00:52:44,210 of course, straightforward. You can think of it as I just want a better algorithm than 494 00:52:44,210 --> 00:52:51,480 sorting and looking for the median at the particular position-- the n over 2 position, 495 00:52:51,480 --> 00:53:01,680 for example. Let's say n is odd. And it's floor of n over 2. You can find that median. 496 00:53:01,680 --> 00:53:10,300 Right, so it's pretty easy if you can do sorting. But we're never satisfied with using a standard 497 00:53:10,300 --> 00:53:14,970 algorithm if we think that we can do better than that. So the whole game here is going 498 00:53:14,970 --> 00:53:18,970 to be I'm going to find the median. 499 00:53:18,970 --> 00:53:32,910 And I want to do it in better than theta n log n time. OK, so that's what median finding 500 00:53:32,910 --> 00:53:37,070 is all about. You're going to use divide and conquer for this. 501 00:53:37,070 --> 00:53:53,880 And so in general, we're going to define, given a set of n numbers, define rank of x 502 00:53:53,880 --> 00:54:06,510 as the number of elements in the set that are greater than-- I'm sorry, less than or equal to x. 503 00:54:06,510 --> 00:54:09,270 I mean, you could have defined it differently. We're going to go with less than or equal 504 00:54:09,270 --> 00:54:10,750 to. 505 00:54:10,750 --> 00:54:18,570 So in general, the rank, of course, is something that could be used very easily to find the 506 00:54:18,570 --> 00:54:28,930 median.
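A small Python sketch of the rank definition and of the Θ(n log n) sort-and-index baseline that the divide and conquer algorithm is meant to beat, assuming distinct elements and 1-based ranks; `rank` and `select_by_sorting` are illustrative names.

```python
def rank(S, x):
    """rank(x): how many elements of S are less than or equal to x."""
    return sum(1 for s in S if s <= x)


def select_by_sorting(S, i):
    """Baseline: the element of rank i (1-based), found by sorting in Theta(n log n) time."""
    return sorted(S)[i - 1]


if __name__ == "__main__":
    S = [7, 2, 9, 4, 1]
    print(rank(S, 4))                # 3, since 1, 2 and 4 are <= 4
    print(select_by_sorting(S, 3))   # 4
```

With distinct elements the two functions are inverses: the element returned for rank i has rank exactly i.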
So if you want to find the element of rank n plus 1 divided by 2 floor, that's 507 00:54:28,930 --> 00:54:38,650 what we call the lower median. And n plus 1 divided by 2 ceiling is the upper median. 508 00:54:38,650 --> 00:54:43,730 And they're the same if n is odd. 509 00:54:43,730 --> 00:54:48,210 But that's what we want. So you can think of it as it's not median finding, but finding 510 00:54:48,210 --> 00:54:55,200 elements with a certain rank. And we want to do this in linear time, OK? 511 00:54:55,200 --> 00:55:05,400 So we're going to apply divide and conquer here. And as always, the template can be instantiated. 512 00:55:05,400 --> 00:55:11,780 And the devil is in the details of either division or merge. 513 00:55:11,780 --> 00:55:19,460 And we had most of our fun with convex hull on the merge operation. It turns out most 514 00:55:19,460 --> 00:55:32,770 of the fun here with respect to median finding is in the divide, OK? So what I want is the 515 00:55:32,770 --> 00:55:38,780 definition of a select routine that takes a set of numbers s. 516 00:55:38,780 --> 00:55:49,340 And this is the rank. So I want a rank i. And that i might be n over 2-- well, floor 517 00:55:49,340 --> 00:55:52,560 of n plus 1 over 2, whatever. 518 00:55:52,560 --> 00:55:56,500 And so what does the divide and conquer look like? Well, the first thing you need to do 519 00:55:56,500 --> 00:56:04,920 is divide. And as of now, we're just going to say you're going to pick some element x 520 00:56:04,920 --> 00:56:06,240 belonging to s. 521 00:56:06,240 --> 00:56:10,230 And this choice is going to be crucial. But at this point, I'm not ready to specify this 522 00:56:10,230 --> 00:56:15,640 choice yet, OK? So we're going to have to do this cleverly. And then what we're going 523 00:56:15,640 --> 00:56:30,220 to do is we're going to compute k, which is the rank of x, and generate two sub arrays. 524 00:56:30,220 --> 00:56:35,680 Say I want to find the fifth smallest element.
I want to find the median element. 525 00:56:35,680 --> 00:56:40,910 I want to find the 10th smallest element. So I have to keep track of what happens in the 526 00:56:40,910 --> 00:56:46,990 sub problems. Because the sub problems are going to determine, depending on how many 527 00:56:46,990 --> 00:56:52,700 elements are inside those sub problems, which I can only determine after I've solved those 528 00:56:52,700 --> 00:56:56,740 sub problems. I'm going to have to collect that information and put it together in the 529 00:56:56,740 --> 00:56:59,080 merge operation. 530 00:56:59,080 --> 00:57:05,700 So if I want to find the 10th smallest element and I've broken it up relatively arbitrarily, 531 00:57:05,700 --> 00:57:10,240 it's quite possible that the 10th smallest element is going to be discovered in the left 532 00:57:10,240 --> 00:57:15,150 one or the right one. And I have to show that it's the 10th smallest. And it might be that 533 00:57:15,150 --> 00:57:24,090 there's four elements in the left and five on the right that are-- let's see. 534 00:57:24,090 --> 00:57:29,010 If I defined the rank as less than or equal to x, there's four on the left and five on 535 00:57:29,010 --> 00:57:34,680 the right that are smaller. And that's why this is the 10th smallest element. And that's 536 00:57:34,680 --> 00:57:44,330 essentially what we have to look at. So b and c are going to correspond to the two sub arrays, 537 00:57:44,330 --> 00:57:49,600 and you can clearly eliminate one of them. 538 00:57:49,600 --> 00:57:55,350 You can count the number of elements in b, count the number of elements in c. And you 539 00:57:55,350 --> 00:58:03,810 can eliminate one of them in this recursion as you're discovering this element with the 540 00:58:03,810 --> 00:58:09,320 correct rank-- in this case, i. So let me write the rest of this out and make sure we're 541 00:58:09,320 --> 00:58:11,710 all on the same page.
542 00:58:11,710 --> 00:58:23,570 What I have here pictorially is I've generated b here and c. So this is all of b and that's 543 00:58:23,570 --> 00:58:30,970 all of c. I have k minus 1 elements here in b. 544 00:58:30,970 --> 00:58:44,170 And let's say I have n minus k elements in c. And I'm going to do-- essentially take-- 545 00:58:44,170 --> 00:58:49,080 once I've selected a particular element, I'm going to look at all of the elements that 546 00:58:49,080 --> 00:58:52,510 are less than it and put them into the array b. I'm going to look at all the elements that 547 00:58:52,510 --> 00:58:53,560 are greater than it. 548 00:58:53,560 --> 00:58:58,830 Let's assume all elements are unique. I'm going to put all of them into c. And I'm going 549 00:58:58,830 --> 00:59:06,030 to recur on b and c. Those two are my sub problems. 550 00:59:06,030 --> 00:59:17,120 But what I have to do is once I recur and I discover the ranks of the sub problems, 551 00:59:17,120 --> 00:59:23,300 I have to put them together. So what I have here is if k equals i-- so I computed the 552 00:59:23,300 --> 00:59:32,430 rank and I realized that if k equals-- equals i, I should say-- if k equals i, then I'm 553 00:59:32,430 --> 00:59:35,360 going to just return x. I'm done at this point. 554 00:59:35,360 --> 00:59:42,610 I got lucky. I picked an element x and it magically ended up having the correct rank, 555 00:59:42,610 --> 00:59:53,030 OK? Not always going to happen. And so in the other case, if k is greater than i, then I'm going 556 00:59:53,030 --> 01:00:01,540 to return select(b, i). 557 01:00:01,540 --> 01:00:07,670 So what I've done here is if k is greater than i, then I'm going to say, oh, so now 558 01:00:07,670 --> 01:00:11,540 I'm going to have to find the element in b. I know that it's going to be in b because 559 01:00:11,540 --> 01:00:15,740 k is greater than i. And I've got to find the exact position depending on what i is 560 01:00:15,740 --> 01:00:23,530 over here.
But it's going to be somewhere between 1 and k minus 1. 561 01:00:23,530 --> 01:00:34,350 And then the last case is if k is less than i, then this is a little more tricky. I'm 562 01:00:34,350 --> 01:00:47,080 going to return select(c, i minus k), OK? So what happens here is that my k is-- the rank for 563 01:00:47,080 --> 01:00:50,520 the x that I looked at over here is less than i. 564 01:00:50,520 --> 01:00:57,110 So I know that I'm going to find this element that I'm looking for in c. But if I just look 565 01:00:57,110 --> 01:01:05,050 at c, I don't want to look at c and look for an element of rank i within c, right? That 566 01:01:05,050 --> 01:01:09,880 doesn't make sense because I'm looking for an element of rank i in the overall array 567 01:01:09,880 --> 01:01:11,300 that was given to me. 568 01:01:11,300 --> 01:01:18,730 So I have to subtract out k elements, x itself and all of the k minus 1 elements 569 01:01:18,730 --> 01:01:25,450 that are in b, to go figure out exactly what position or rank I'm looking for in the sub 570 01:01:25,450 --> 01:01:31,830 array corresponding to c, OK? So, do people buy that? So that's just a small, little thing 571 01:01:31,830 --> 01:01:34,660 that you have to keep in mind as you do this. 572 01:01:34,660 --> 01:01:41,480 So that's pretty straightforward, looking pretty good. And you say, well, am I done 573 01:01:41,480 --> 01:01:49,750 here? And as you can imagine, the answer is no, because we haven't specified this value. 574 01:01:49,750 --> 01:01:59,070 Now, can someone tell me, at least from an efficiency standpoint, what might happen, 575 01:01:59,070 --> 01:02:04,790 what we're looking for here? As you can imagine, we want to improve on theta n log n. And so 576 01:02:04,790 --> 01:02:10,070 you could say, well, I'm happy with theta n. That theta n complexity algorithm is better 577 01:02:10,070 --> 01:02:13,770 than a theta n log n complexity algorithm, which is kind of in the bag.
578 01:02:13,770 --> 01:02:18,340 Because we know how to sort and we know how to index. So we want a theta n algorithm. 579 01:02:18,340 --> 01:02:28,880 Now, if you take this and if I just picked, let's say, the biggest element-- I kept picking 580 01:02:28,880 --> 01:02:36,560 x to be n or n minus 1 or just picked a constant value. I picked x to be in the middle. 581 01:02:36,560 --> 01:02:42,130 I picked the index. I can always pick an element based on its index. I can always go for the 582 01:02:42,130 --> 01:02:43,680 middle one. 583 01:02:43,680 --> 01:02:51,340 So what is the worst case complexity of this algorithm? If I don't specify or I give you 584 01:02:51,340 --> 01:02:56,170 this arbitrary selection corresponding to x belonging to s, what is the worst case complexity 585 01:02:56,170 --> 01:02:59,300 of this algorithm? Yeah, go ahead. 586 01:02:59,300 --> 01:03:00,160 STUDENT: N squared. 587 01:03:00,160 --> 01:03:01,400 PROFESSOR: N squared-- why is that? 588 01:03:01,400 --> 01:03:04,360 STUDENT: Because if you [INAUDIBLE] take like the least element. 589 01:03:04,360 --> 01:03:05,100 PROFESSOR: Yep. 590 01:03:05,100 --> 01:03:08,820 STUDENT: How do you compare like N o against the other analysis? 591 01:03:08,820 --> 01:03:12,560 PROFESSOR: Exactly right. That's exactly right. So what happens is that you're doing a bunch 592 01:03:12,560 --> 01:03:15,430 of work here with this theta n work. 593 01:03:15,430 --> 01:03:21,650 Right here, this is theta n work, OK? So given that you're doing theta n work here, you have 594 01:03:21,650 --> 01:03:27,930 to be really careful as to how you pick the x element. So what might happen is that you 595 01:03:27,930 --> 01:03:30,160 end up picking the x over here. 596 01:03:30,160 --> 01:03:34,740 And given the particular rank you're looking for, you have to now-- you're left with a 597 01:03:34,740 --> 01:03:40,160 large array that has n minus 1 elements in the worst case. You started with n. 
You did 598 01:03:40,160 --> 01:03:45,460 not go to n over 2 and n over 2, which is what divide and conquer is all about-- or even 599 01:03:45,460 --> 01:03:47,180 n over b, OK? 600 01:03:47,180 --> 01:03:52,200 You went to n minus 1. And then you go to n minus 2. And you go to n minus 3, because 601 01:03:52,200 --> 01:03:56,140 you're constantly picking-- this is worst case analysis. You're constantly picking these 602 01:03:56,140 --> 01:03:59,990 sub arrays to be extremely unbalanced. 603 01:03:59,990 --> 01:04:05,440 So when the sub arrays are extremely unbalanced, you end up doing theta n work in each 604 01:04:05,440 --> 01:04:10,980 level of the recursion. And those theta n's, because you're going down all the way from 605 01:04:10,980 --> 01:04:18,600 n to 1, add up to theta n squared, OK? So thanks for that 606 01:04:18,600 --> 01:04:22,010 analysis. 607 01:04:22,010 --> 01:04:32,170 And so this is theta n squared if you have a bad selection. Now, we won't talk about 608 01:04:32,170 --> 01:04:38,520 randomized algorithms, but the problem with randomized algorithms is that the analysis 609 01:04:38,520 --> 01:04:45,140 is with respect to a probability distribution, and what it gives you is expected time. 610 01:04:45,140 --> 01:04:51,890 What we want here is a deterministic algorithm that is guaranteed to run in worst case theta 611 01:04:51,890 --> 01:05:00,740 n. So we want a deterministic way of picking x belonging to s such that all of this works 612 01:05:00,740 --> 01:05:05,320 out, and when we get our recurrence and we solve it, somehow magically we're getting 613 01:05:05,320 --> 01:05:12,860 fairly balanced partitions-- fairly balanced sub problems, in the sense that it's not n 614 01:05:12,860 --> 01:05:17,540 minus 1 and 1. It could be something like n over 10 and 9n over 10. 615 01:05:17,540 --> 01:05:22,320 But as long as you guarantee that, you're shrinking things down geometrically.
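To see the theta n squared behavior concretely, here is a small instrumented variant of the same recursion. It is a sketch: `select_counting` and the "elements scanned" measure are illustrative choices, not from the lecture. With the pivot rule "always take the largest element" and a search for rank 1, every call removes only x, so the scans add up to n plus n minus 1, and so on down to 1, which is n(n + 1)/2.

```python
from typing import Callable, List, Tuple


def select_counting(s: List[int], i: int,
                    pick: Callable[[List[int]], int]) -> Tuple[int, int]:
    """Return (element of rank i in s, total elements scanned while partitioning).

    Assumes distinct elements and 1 <= i <= len(s).
    """
    x = pick(s)
    b = [y for y in s if y < x]
    c = [y for y in s if y > x]
    k = len(b) + 1
    work = len(s)                          # one linear pass over s per call
    if k == i:
        return x, work
    if k > i:
        answer, sub = select_counting(b, i, pick)
    else:
        answer, sub = select_counting(c, i - k, pick)
    return answer, work + sub


if __name__ == "__main__":
    for n in (100, 200, 400):
        _, scanned = select_counting(list(range(n)), 1, max)
        print(n, scanned)                  # n(n + 1) / 2: quadratic growth
```

This prints 5050, 20100, and 80200 for n = 100, 200, and 400: doubling n roughly quadruples the work. The counter ignores the extra linear pass that `max` itself makes, which only adds to the total.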
With geometrically shrinking sub problems, the 616 01:05:22,320 --> 01:05:28,320 asymptotics are going to work out. But the determinism is what we need. 617 01:05:28,320 --> 01:05:42,190 And so we're going to pick x cleverly. We don't want the rank of x to be extreme. 618 01:05:42,190 --> 01:05:49,000 So this is not the only way you could do it, but this is really very clever. 619 01:05:49,000 --> 01:05:57,880 There's a deterministic way. And you're going to see some arbitrary constants here. And 620 01:05:57,880 --> 01:06:03,910 we'll talk about them once I've described it. But what we're going to do is we're going 621 01:06:03,910 --> 01:06:08,270 to arrange s into columns of size 5, right? 622 01:06:08,270 --> 01:06:12,130 We're going to take this single array, and we're going to make it a two dimensional array 623 01:06:12,130 --> 01:06:20,040 where the number of rows is five and the number of columns that you have is n over 5-- the 624 01:06:20,040 --> 01:06:35,570 ceiling of n over 5, in this case. And then we're going to sort each column, big elements on top. 625 01:06:35,570 --> 01:06:38,180 And we're going to do this in linear time. 626 01:06:38,180 --> 01:06:44,460 And you might say, how did that happen? Well, there are only five elements in each column, so it's linear overall. 627 01:06:44,460 --> 01:06:47,360 You could do whatever you wanted within a column. You could use an n raised to four algorithm. 628 01:06:47,360 --> 01:06:55,230 But it's five raised to four, and that's a constant. Don't you love theory? So then we're going 629 01:06:55,230 --> 01:06:59,630 to find what we're going to call the median of medians. 630 01:06:59,630 --> 01:07:04,380 So I'm going to explain this. This works for arbitrary rank, but it's a little easier to 631 01:07:04,380 --> 01:07:09,790 focus on the median to explain the particular example. Because as you can see, 632 01:07:09,790 --> 01:07:18,090 there's an intricacy here associated with the break up. 633 01:07:18,090 --> 01:07:23,890 And so here we go. I'm going to draw out a picture.
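The column step can be sketched as follows; this is a minimal illustration, and the helper names are made up here. Each group of at most 5 elements is sorted, which is constant work per group, so the whole pass is linear. The sketch sorts each column in ascending order rather than "big elements on top", which only flips the picture, and for a partial last column it takes the lower middle element. The final `sorted` call on the medians is only a stand-in: in the actual algorithm the median of medians is found by calling select recursively.

```python
from typing import List


def column_medians(s: List[int]) -> List[int]:
    """Split s into ceil(n/5) columns of at most 5, sort each, return each column's median."""
    medians = []
    for j in range(0, len(s), 5):
        column = sorted(s[j:j + 5])                     # O(1): at most 5 elements
        medians.append(column[(len(column) - 1) // 2])  # lower middle for short columns
    return medians


def median_of_medians_by_sorting(s: List[int]) -> int:
    """Lower median of the column medians; sorting stands in for the recursive call."""
    medians = sorted(column_medians(s))
    return medians[(len(medians) - 1) // 2]
```

For `list(range(12))`, the columns are 0 to 4, 5 to 9, and the partial column 10, 11, so `column_medians` returns `[2, 7, 10]` and the median of medians is 7.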
We're going to try and argue 634 01:07:23,890 --> 01:07:32,560 that this deterministic strategy that I'll specify gives you fairly balanced partitions 635 01:07:32,560 --> 01:07:35,730 in all cases, OK? 636 01:07:35,730 --> 01:07:47,270 So what we see here-- pictorially, you see columns of length five. Each of these 637 01:07:47,270 --> 01:08:00,640 dots corresponds to a number. This one dimensional array got turned into a two dimensional array, right? 638 01:08:00,640 --> 01:08:08,010 So I got four full columns. And it's certainly possible, given n, that my fifth column is 639 01:08:08,010 --> 01:08:16,299 not full, right? So that's why I have that partial column up here. So what 640 01:08:16,299 --> 01:08:19,420 I have here is, I'm going to lay them out this way. 641 01:08:19,420 --> 01:08:31,889 And I'm going to look at the middle elements of each of these 642 01:08:31,889 --> 01:08:41,549 n over five columns. That's exactly what I'm going to look at. Now, if I look at what I 643 01:08:41,549 --> 01:08:48,179 want, what I want over here is this x. 644 01:08:48,179 --> 01:08:59,109 I'm going to find the median of medians. So that's x. Now, it isn't necessarily true that 645 01:08:59,109 --> 01:09:03,269 these columns are laid out in order-- I'm just putting that up here imagining that that's x. 646 01:09:03,269 --> 01:09:12,568 That's not guaranteed to be x, because the columns themselves aren't ordered relative to each other-- well, within each column, the elements 647 01:09:12,568 --> 01:09:18,849 are sorted. And what I'm going to have to guarantee, of course, is that when I go find 648 01:09:18,849 --> 01:09:25,749 this median of medians, it ends up being something that gives me balanced partitions. 649 01:09:25,749 --> 01:09:32,749 So let me say a little bit more before I explain what's going on. 650 01:09:32,749 --> 01:09:38,960 Each of these columns is sorted. And s is arranged into columns of size 5, like I just 651 01:09:38,960 --> 01:09:51,259 said here.
These are the medians, OK? Once I've determined the medians, and 652 01:09:51,259 --> 01:09:57,449 once I've determined this x, which I've discovered is the median of those medians, then this 653 01:09:57,449 --> 01:10:00,710 is right there in the middle. There's going to be a bunch of columns to the left of it, 654 01:10:00,710 --> 01:10:04,239 a bunch of elements to the left of it, and a bunch of elements to the right of it. 655 01:10:04,239 --> 01:10:08,909 And in this case, I have five columns. I could have had more. It happens to be the third 656 01:10:08,909 --> 01:10:09,880 one. 657 01:10:09,880 --> 01:10:15,909 So the idea is that once I find this median of medians, which corresponds to this x number, 658 01:10:15,909 --> 01:10:23,999 I can say something about all of the columns. These correspond to columns that have their 659 01:10:23,999 --> 01:10:29,070 median element greater than x. These correspond to columns that have their median element 660 01:10:29,070 --> 01:10:39,159 less than x, OK? So what I have here in this picture is that these elements here are going 661 01:10:39,159 --> 01:10:42,199 to be greater than x. 662 01:10:42,199 --> 01:10:56,440 And these elements here are going to be less than x. So let me be clear. What's happened here 663 01:10:56,440 --> 01:11:07,659 is that we've not only sorted all of the columns such that you have large elements up here-- 664 01:11:07,659 --> 01:11:12,360 each of these five columns has been sorted that way. On top of that, I've discovered 665 01:11:12,360 --> 01:11:19,929 the particular column that corresponds to the median of medians. And this is my x over 666 01:11:19,929 --> 01:11:20,989 here. 667 01:11:20,989 --> 01:11:25,199 And it may be the case that these columns aren't sorted relative to each other. This one may be larger than 668 01:11:25,199 --> 01:11:29,030 that or vice versa-- same thing over there. I have no idea.
669 01:11:29,030 --> 01:11:36,119 But it's guaranteed, once I find this median, that I do know all of the columns that 670 01:11:36,119 --> 01:11:44,550 have elements in this position that are less than x. And I know the columns that in this 671 01:11:44,550 --> 01:11:48,450 position have elements that are greater than x, OK? Yep. 672 01:11:48,450 --> 01:11:56,200 STUDENT: Shouldn't the two elements below x also be counted [INAUDIBLE] less than x? 673 01:11:56,200 --> 01:12:04,579 PROFESSOR: You're exactly right. I probably would have been able to get the same asymptotic 674 01:12:04,579 --> 01:12:09,440 complexity if I dropped those, because it's a constant number of elements. But you're absolutely 675 01:12:09,440 --> 01:12:10,199 right. 676 01:12:10,199 --> 01:12:15,429 So the point-- the question was-- I just redrew it. These two are clearly less than 677 01:12:15,429 --> 01:12:21,610 x as well, because their column is sorted. And that's essentially what I have here. 678 01:12:21,610 --> 01:12:26,679 Now, my goal here-- and you can kind of see from here where we're headed. What I've 679 01:12:26,679 --> 01:12:31,780 done here, by this process of sorting each column and finding the median of medians, is 680 01:12:31,780 --> 01:12:37,760 that I found this median of medians such that there's a bunch of columns on the left, and 681 01:12:37,760 --> 01:12:41,530 roughly half of the elements in those columns are less than x. 682 01:12:41,530 --> 01:12:47,739 And there are a bunch of columns on the right, and roughly half of the elements in those columns 683 01:12:47,739 --> 01:12:54,030 are greater than x. So what I now have to do is a little bit of math to show 684 01:12:54,030 --> 01:12:58,079 you exactly what the recurrence is. And let me do that over here. 685 01:12:58,079 --> 01:13:03,550 So that's the last thing that we have to do. I probably won't solve the recurrence, but 686 01:13:03,550 --> 01:13:10,469 that can wait until tomorrow.
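Putting the pieces together, here is one way to write the deterministic select in Python. It is a sketch under the lecture's assumptions (distinct elements, 1-indexed rank i) and uses the base-case cutoff of 140 that the lecture adopts for its recurrence. The name `select_mom` and the use of Python's built-in `sorted` for the base case and for each 5-element column are choices made here, not details given in the lecture.

```python
from typing import List


def select_mom(s: List[int], i: int) -> int:
    """Return the element of rank i (1-indexed) in s in worst-case linear time.

    Assumes the elements of s are distinct and 1 <= i <= len(s).
    """
    if len(s) <= 140:                      # base case: constant size, just sort
        return sorted(s)[i - 1]

    # Arrange s into columns of 5, sort each column, keep each column's median.
    medians = []
    for j in range(0, len(s), 5):
        column = sorted(s[j:j + 5])
        medians.append(column[(len(column) - 1) // 2])

    # Median of medians, found recursively on ceil(n/5) elements.
    x = select_mom(medians, (len(medians) + 1) // 2)

    # Partition around x and recurse on exactly one side.
    b = [y for y in s if y < x]
    c = [y for y in s if y > x]
    k = len(b) + 1                         # rank of x in s
    if i == k:
        return x
    if i < k:
        return select_mom(b, i)
    return select_mom(c, i - k)
```

The structure mirrors the earlier three-case select; the only change is that x is now the median of the column medians rather than an arbitrary element, so sorted input such as `list(range(10000))` no longer triggers the quadratic behavior.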
The recurrence will be something that's not particularly 687 01:13:10,469 --> 01:13:23,670 difficult to solve. So I want to now make a more quantitative argument, in terms of n, 688 01:13:23,670 --> 01:13:33,300 as to how many elements are guaranteed to be greater than x. 689 01:13:33,300 --> 01:13:38,519 And essentially what I'm saying, writing out what I have in that picture 690 01:13:38,519 --> 01:14:02,030 there, is that half of the n over 5 groups contribute at least three elements greater than x, except 691 01:14:02,030 --> 01:14:10,559 for one group with possibly fewer than five elements, which is the one that I have all 692 01:14:10,559 --> 01:14:27,550 the way to the right, and the one group that contains x. So for all the other columns in that half, I'm going to 693 01:14:27,550 --> 01:14:35,619 get three elements that are greater than x. And so if you write that out, this says there 694 01:14:35,619 --> 01:14:47,199 are at least 3 times the quantity ceiling of n over 10, because I have half of all of those groups, minus 2. 695 01:14:47,199 --> 01:14:53,239 And I'm not counting perfectly accurately here, but I have an at least, so this should 696 01:14:53,239 --> 01:15:02,219 all be fine. So at least 3 times the quantity ceiling of n over 10 minus 2 elements are strictly greater than 697 01:15:02,219 --> 01:15:06,179 x. And that comes from that picture. 698 01:15:06,179 --> 01:15:14,639 I'm going to be able to say the same thing for less than x as well. There, too, I can't count the 699 01:15:14,639 --> 01:15:20,960 column containing x. Depending on how things go, maybe I could have played around and subtracted 1 instead 700 01:15:20,960 --> 01:15:23,440 of 2 in the latter case. 701 01:15:23,440 --> 01:15:28,829 But I'm just being conservative here. It is clear that I'm going to have a bunch of columns 702 01:15:28,829 --> 01:15:34,789 that are full columns, that are going to be contributing three elements that are greater 703 01:15:34,789 --> 01:15:39,249 than x. And in this case, I have, well, two of them here for the less than x.
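The counting argument can be spot-checked numerically. The sketch below is illustrative only and no substitute for the proof: it builds arrays of distinct values, computes the lower median of the column medians, and checks that at least 3 times (ceiling of n over 10, minus 2) elements lie strictly on each side of x. Sorting the medians stands in for the recursive call, which does not change which element x is; the helper names `median_of_medians` and `bound_holds` are hypothetical.

```python
import random
from typing import List


def median_of_medians(s: List[int]) -> int:
    """Lower median of the medians of columns of 5 (sorting replaces the recursive call)."""
    medians = []
    for j in range(0, len(s), 5):
        column = sorted(s[j:j + 5])
        medians.append(column[(len(column) - 1) // 2])
    medians.sort()
    return medians[(len(medians) - 1) // 2]


def bound_holds(s: List[int]) -> bool:
    """Check that at least 3 * (ceil(n/10) - 2) elements are > x, and as many are < x."""
    n = len(s)
    x = median_of_medians(s)
    greater = sum(1 for y in s if y > x)
    less = sum(1 for y in s if y < x)
    bound = 3 * (-(-n // 10) - 2)          # -(-n // 10) is ceil(n / 10) for integers
    return greater >= bound and less >= bound


if __name__ == "__main__":
    rng = random.Random(2015)
    assert all(bound_holds(rng.sample(range(10 * n), n)) for n in range(1, 1001))
    print("bound held for every n from 1 to 1000")
```

Since half of the ceiling of n over 5 columns, rounded up, is the ceiling of n over 10, the check matches the count on the board; dropping the ceiling and multiplying out gives the 3n over 10 minus 6 used in the recurrence.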
704 01:15:39,249 --> 01:15:44,550 In this picture, I got one such column for the greater than x. So that's all that I'm seeing over here with respect 705 01:15:44,550 --> 01:15:49,800 to the balance of the partitions. And it turns out that's enough. 706 01:15:49,800 --> 01:15:57,949 It turns out all I have to do with this observation is to go off and solve the recurrence. And we're 707 01:15:57,949 --> 01:16:04,160 going to get an efficient algorithm. Yep. 708 01:16:04,160 --> 01:16:08,059 STUDENT: Should it not be like greater than or equal to, because there's... [INAUDIBLE] 709 01:16:08,059 --> 01:16:11,240 PROFESSOR: No, there's nothing that's equal. 710 01:16:11,240 --> 01:16:12,580 STUDENT: So you are saying, that's all you need? 711 01:16:12,580 --> 01:16:16,840 PROFESSOR: Yeah. Yeah, I assume that-- so, convenience, yeah. There's always a little 712 01:16:16,849 --> 01:16:19,030 bit of convenience thrown in here. 713 01:16:19,030 --> 01:16:27,989 We will assume that the array has unique elements. So there's nothing else that's equal to x, OK? Good. 714 01:16:27,989 --> 01:16:38,190 So the recurrence, once you do that, is T of n equals-- we're going to just say it's 715 01:16:38,190 --> 01:16:48,510 order one for n less than or equal to 140. Where did that come from? Well, 140 is 716 01:16:48,510 --> 01:16:53,630 just a large number. It came from the fact that you're going to see 10 minus 3, 717 01:16:53,630 --> 01:16:57,119 which is 7. And then you want to multiply that by 2. 718 01:16:57,119 --> 01:17:01,619 So it's some reasonably large number, and we're going to go off and assume that's 719 01:17:01,619 --> 01:17:08,400 a constant. So you could sort those 140 numbers and find the median or whatever rank. It's 720 01:17:08,400 --> 01:17:10,739 all constant time once you get down to the base case.
721 01:17:10,739 --> 01:17:14,659 So you just want it to be large enough that you could break it up and have something 722 01:17:14,659 --> 01:17:19,679 interesting going on with respect to the number of columns. So don't worry much about that 723 01:17:19,679 --> 01:17:24,230 number. The key thing here is the recurrence, all right? 724 01:17:24,230 --> 01:17:31,980 And this is what we'll spend the rest of our time on. And I'll just write this out 725 01:17:31,980 --> 01:17:47,179 and explain where these numbers came from. So that's our recurrence for n less than or 726 01:17:47,179 --> 01:17:48,650 equal to 140. 727 01:17:48,650 --> 01:17:54,590 Otherwise, T of n is at most T of the ceiling of n over 5, plus T of the quantity 7n over 10 plus 6, plus theta n. So what is going on here? What are all of these components 728 01:17:54,590 --> 01:17:58,110 corresponding to this recurrence? 729 01:17:58,110 --> 01:18:05,300 Really quickly, this first term simply says I'm finding the median of medians. I'm 730 01:18:05,300 --> 01:18:11,170 finding some element that has a certain rank. So this median of medians computation is going to be running 731 01:18:11,170 --> 01:18:17,579 on the n over 5 column medians. There are n over 5 columns here. 732 01:18:17,579 --> 01:18:24,150 And I'm going to be calling this algorithm recursively, the median finding algorithm, 733 01:18:24,150 --> 01:18:35,260 to do that-- finding the median of medians. For this term over here, I'm going to be 734 01:18:35,260 --> 01:18:43,039 discarding at least a certain number of elements regardless of what I do, because I have these two statements here. 735 01:18:43,039 --> 01:18:47,760 I take the overall n, and I'm going to discard some of it. 736 01:18:47,760 --> 01:18:51,849 In my paradigm over here, I'm either going to go with b or I'm going to go with 737 01:18:51,849 --> 01:18:57,889 c, depending on what I'm looking for.
And given that b and c are not completely unbalanced, 738 01:18:57,889 --> 01:19:06,349 I'm going to be discarding at least 3n over 10 minus 6 elements, which simply corresponds to 739 01:19:06,349 --> 01:19:12,150 ignoring the ceiling here and multiplying the 3 out. So that's 3n over 10 minus 6. 740 01:19:12,150 --> 01:19:18,999 So then I have at most 7n over 10 plus 6. That's the maximum size partition that I'm going to recur 741 01:19:18,999 --> 01:19:22,579 on. And it's only going to be exactly one of them, as you can see from that. 742 01:19:22,579 --> 01:19:26,570 It's an if-else. It's not recurring on both of them. It's recurring on one of them. So 743 01:19:26,570 --> 01:19:32,099 that's where the 7n over 10 plus 6 comes from. And then you ask where this theta n comes 744 01:19:32,099 --> 01:19:32,749 from. 745 01:19:32,749 --> 01:19:38,780 Well, the theta n comes from the fact that I do have to do some sorting. It's constant 746 01:19:38,780 --> 01:19:44,079 time sorting for every column, OK? Because it's only five elements. 747 01:19:44,079 --> 01:19:49,099 So I'm going to do constant time sorting, but there are order n columns, because 748 01:19:49,099 --> 01:19:50,909 there are n over 5 columns. 749 01:19:50,909 --> 01:20:00,679 So this is the sorting of all of the columns, plus the linear work to compute the rank of x and split s into b and c, all right? So that's it. And I'll just leave 750 01:20:00,679 --> 01:20:08,659 you with this: you cannot apply the master theorem to solve this particular recurrence. But 751 01:20:08,659 --> 01:20:11,699 you can make an observation-- and you'll see this in section. 752 01:20:11,699 --> 01:20:19,409 You make the observation that n over 5 plus 7n over 10 is actually less than n. You 753 01:20:19,409 --> 01:20:23,610 get 0.2n here and 0.7n there, which is 0.9n. That's less than n. 754 01:20:23,610 --> 01:20:28,070 So this thing runs in linear time. And you'll see that in section tomorrow. So this whole 755 01:20:28,070 --> 01:20:33,250 thing is theta n time. See you next time.
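To make the closing observation a bit more concrete, the recurrence can be tabulated bottom-up. This sketch is an illustration, not a proof. It assumes T(n) = 1 for n at most 140 and charges exactly n for the theta n term; both are choices made here. It then checks that T(n) stays at most 20n. The constant 20 comes from the usual substitution argument: if T(m) is at most 20m for every smaller m, then T(n) is at most 20(n/5 + 1) + 20(7n/10 + 6) + n = 19n + 140, which is at most 20n whenever n is at least 140.

```python
from typing import List


def tabulate_recurrence(limit: int) -> List[int]:
    """T(n) = 1 for n <= 140, else T(ceil(n/5)) + T(floor(7n/10) + 6) + n, for n <= limit."""
    t = [0] * (limit + 1)
    for n in range(1, limit + 1):
        if n <= 140:
            t[n] = 1                                # constant-size base case
        else:
            mom_term = t[-(-n // 5)]                # T(ceil(n / 5)): median of medians
            partition_term = t[7 * n // 10 + 6]     # T(7n/10 + 6): the one side we keep
            t[n] = mom_term + partition_term + n    # plus the linear work
    return t


if __name__ == "__main__":
    t = tabulate_recurrence(200_000)
    worst = max(t[n] / n for n in range(1, len(t)))
    print(f"max T(n)/n for n <= 200000: {worst:.2f}")  # bounded, at most 20
```

Both indices are smaller than n once n exceeds 140, so a single forward pass fills the table. The ratio stays bounded because the two recursive sizes sum to about 0.9n, the observation from the lecture.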