1
00:00:00,089 --> 00:00:04,820
The following content is provided under a
Creative Commons license. Your support will

2
00:00:04,820 --> 00:00:10,680
help MIT OpenCourseWare continue to offer
high quality educational resources for free.

3
00:00:10,680 --> 00:00:15,520
To make a donation or view additional materials
from hundreds of MIT courses,

4
00:00:15,520 --> 00:00:21,320
visit MIT OpenCourseWare at ocw.mit.edu.

5
00:00:21,320 --> 00:00:29,800
PROFESSOR: All right. Good morning, everyone.
Let's get started. So we're going to start

6
00:00:29,810 --> 00:00:38,000
6.046 in earnest today. We're going to start
with our first module on divide and conquer.

7
00:00:38,000 --> 00:00:44,149
You've all seen divide and conquer algorithms
before. Merge sort is a classic divide and

8
00:00:44,149 --> 00:00:50,469
conquer algorithm. I'm going to spend just
a couple minutes talking about the paradigm,

9
00:00:50,469 --> 00:00:55,680
give you a slightly more general setting than
merge sort. And then we'll get into two really

10
00:00:55,680 --> 00:01:04,059
cool divide and conquer problems in the sense
that these are problems for which divide and

11
00:01:04,059 --> 00:01:10,149
conquer works very well-- namely, convex hull
and median finding.

12
00:01:10,149 --> 00:01:16,340
So before I get started on the material, let
me remind you that you should be signing up

13
00:01:16,340 --> 00:01:22,779
for a recitation section on Stellar. And please
do that even if you don't plan on attending

14
00:01:22,779 --> 00:01:30,139
sections. Because we need that so we can assign
your problem sets to be graded, OK?

15
00:01:30,139 --> 00:01:36,310
So that's our way of partitioning problem
sets as well. And then the other thing is

16
00:01:36,310 --> 00:01:42,259
problem set one is going to go out today.
And it's a one-week problem set.

17
00:01:42,259 --> 00:01:49,479
All problem sets are going to be a week in
duration. Please read these problem sets the

18
00:01:49,479 --> 00:01:54,158
day that they come out. Spend 5, 10 minutes
reading them.

19
00:01:54,158 --> 00:01:59,700
Some things are going to look like they're
magic, that they're-- how could I possibly

20
00:01:59,700 --> 00:02:05,959
prove this? If you think about it for a bit,
it'll become obvious. We promise you that.

21
00:02:05,959 --> 00:02:12,459
But get started early. Don't get started at
7:00 PM when we have an 11:59 PM deadline on

22
00:02:12,459 --> 00:02:19,110
Thursday, all right? That four hours or five
hours of time may not be enough to go from

23
00:02:19,110 --> 00:02:23,150
magical to obvious, OK?

24
00:02:23,150 --> 00:02:31,930
So let's get started with the paradigm associated
with divide and conquer. It's just a beautiful

25
00:02:31,930 --> 00:02:41,159
notion that you can break up the problem into
smaller parts and somehow compose the solutions

26
00:02:41,159 --> 00:02:47,689
to the smaller parts. And of course, the details
are going to be what's important when we take

27
00:02:47,689 --> 00:02:50,650
a particular problem instance.

28
00:02:50,650 --> 00:02:59,900
But let's say we're given a problem of size
n. We're going to divide it into a sub problems--

29
00:02:59,900 --> 00:03:13,459
I'll put that in quotes so you know it's a
symbol-- a sub problems of size n over b.

30
00:03:13,459 --> 00:03:17,319
And here, a is an integer.

31
00:03:17,319 --> 00:03:22,609
And a is going to be greater than or equal
to 1. It could be two. It could be three.

32
00:03:22,609 --> 00:03:23,569
It could be four.

33
00:03:23,569 --> 00:03:32,549
This is the generalization I alluded to. And
b does not have to be two or even an integer.

34
00:03:32,549 --> 00:03:35,400
But it has to be strictly greater than one.

35
00:03:35,400 --> 00:03:40,519
Otherwise, there's no notion of divide and
conquer. You're not breaking things up into

36
00:03:40,519 --> 00:03:48,549
smaller problems. So b should be strictly
greater than one. So that's the general setting.

37
00:03:48,549 --> 00:03:58,379
And then you'll solve each sub problem recursively.
And the idea here is that once the sub problems

38
00:03:58,379 --> 00:04:06,249
become really small, they become constant
size, it's relatively easy to solve them.

39
00:04:06,249 --> 00:04:08,829
You can just do exhaustive search.

40
00:04:08,829 --> 00:04:16,738
If you have 10 elements and you're doing effectively
a cubic search, well, 10 cubed is 1,000. That's

41
00:04:16,738 --> 00:04:22,449
a constant. You're in great shape as long
as the constants are small enough.

42
00:04:22,449 --> 00:04:28,849
And so you're going to recurse until these
problems get small. And then typically-- this

43
00:04:28,849 --> 00:04:35,590
is not true for all divide and conquer approaches.
But for most of them, and certainly the ones

44
00:04:35,590 --> 00:04:42,370
we're going to cover today, the smarts is
going to be in the combination step-- when

45
00:04:42,370 --> 00:05:01,960
you combine these problems, the solutions
of these sub problems, into the overall solution.

46
00:05:01,960 --> 00:05:04,310
And so that's the story.

47
00:05:04,310 --> 00:05:12,530
Typically, what happens in terms of efficiency
is that you can write a recurrence that's

48
00:05:12,530 --> 00:05:22,849
associated with this divide and conquer algorithm.
And you say T of n, which is the running time,

49
00:05:22,849 --> 00:05:32,879
for a problem of size n is going to be a times
T of n over b-- and this is a recurrence-- plus

50
00:05:32,879 --> 00:05:41,729
the work that you need to do for the merge
operation or the combine. This is the same

51
00:05:41,729 --> 00:05:45,639
as merge.
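Written out, the recurrence just described looks roughly like this. The name f(n) for the merge (combine) work is a label introduced here for illustration; the lecture only calls it the work for the merge.

```latex
T(n) = a \, T\!\left(\frac{n}{b}\right) + f(n), \qquad a \ge 1, \quad b > 1
```

For merge sort, for instance, a = 2, b = 2, and f(n) = \Theta(n), which solves to \Theta(n \log n).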

52
00:05:45,639 --> 00:05:53,129
And so you get a recurrence. And you're not
quite done yet in terms of the analysis. Because

53
00:05:53,129 --> 00:05:57,159
once you have the recurrence, you do have
to solve the recurrence. And it's usually

54
00:05:57,159 --> 00:06:01,430
not that hard and certainly it's not going
to be particularly difficult for the divide

55
00:06:01,430 --> 00:06:06,360
and conquer examples that we're going to look at,
at least today.

56
00:06:06,360 --> 00:06:13,270
But we also have this theorem that's called
the master theorem that is essentially something

57
00:06:13,270 --> 00:06:21,659
where you can fairly mechanically plug in
the a's and the b's and whatever you have

58
00:06:21,659 --> 00:06:27,199
there-- maybe it's theta n, maybe it's theta
n squared-- and get the solution to the recurrence.

59
00:06:27,199 --> 00:06:34,419
I'm actually not going to do that today. But
you'll hear once again about the master theorem

60
00:06:34,419 --> 00:06:42,379
tomorrow in section. And it's a fairly straightforward
template that you can use for most of the

61
00:06:42,379 --> 00:06:48,280
divide and conquer examples we're going to
look at in 046 with one exception that we'll

62
00:06:48,280 --> 00:06:52,949
look at in median finding today that will
simply give you the solution to the recurrence,

63
00:06:52,949 --> 00:06:53,919
OK?

64
00:06:53,919 --> 00:06:58,520
So you've seen most of these things before.
That's a little bit of setup. And so let's

65
00:06:58,520 --> 00:07:11,080
dive right in into convex hull, which is my
favorite problem when it comes to using divide

66
00:07:11,080 --> 00:07:12,620
and conquer.

67
00:07:12,620 --> 00:07:19,509
So convex hull, I got a little prop here which
will save me from writing on the board and

68
00:07:19,509 --> 00:07:28,330
hopefully be more understandable. But the
idea here is that in this case, we have a

69
00:07:28,330 --> 00:07:35,830
two dimensional problem with a bunch of points
in a two dimensional plane. You can certainly

70
00:07:35,830 --> 00:07:43,460
do convex hull for three dimensions, many
dimensions. And convexity is something that

71
00:07:43,460 --> 00:07:46,280
is a fundamental notion in optimization.

72
00:07:46,280 --> 00:07:52,400
And maybe we'll get to that in 6.046 in advanced
topics, maybe not. But in the context of today's

73
00:07:52,400 --> 00:07:58,729
lecture, what we're interested in doing is
essentially finding an envelope or a hull

74
00:07:58,729 --> 00:08:05,490
associated with a collection of points on
a two dimensional plane. And this hull obviously

75
00:08:05,490 --> 00:08:14,849
is going to be something, as you can guess,
that encloses all of these points, OK?

76
00:08:14,849 --> 00:08:23,250
So what I have here, if I make this string
taut enough-- this is not working so well,

77
00:08:23,250 --> 00:08:35,159
but I think you get the picture. All right,
so that's not a convex hull. This is not a

78
00:08:35,159 --> 00:08:40,419
convex hull for the reason that I have a bunch
of points outside of the hull.

79
00:08:40,419 --> 00:08:53,120
All right, so let me just-- that is a convex
hull. And now if I start stretching like that

80
00:08:53,120 --> 00:08:59,940
or like this or like that, that's still a
convex hull, OK? So that's the game.

81
00:08:59,940 --> 00:09:08,400
We have to find an algorithm. And we look
at a couple of different ones that will find

82
00:09:08,400 --> 00:09:14,390
all of these segments that are associated
with this convex hull, OK? So this is a segment

83
00:09:14,390 --> 00:09:15,790
that's part of the convex hull.

84
00:09:15,790 --> 00:09:23,570
That's a segment that's part of the convex
hull. If, in fact, I had something like this--

85
00:09:23,570 --> 00:09:28,710
and this was stretched out-- because I have
those two points outside the convex hull,

86
00:09:28,710 --> 00:09:35,970
this may still be a segment that's part of
the convex hull but this one is not,

87
00:09:35,970 --> 00:09:41,480
right? So that's-- the game here is to find
these segments. So we're going to be working

88
00:09:41,480 --> 00:09:51,340
with segments or tangents-- they're going
to be used synonymously-- all of the tangents

89
00:09:51,340 --> 00:09:56,080
or segments associated with the entirety of
the convex hull and we have to discover them.

90
00:09:56,080 --> 00:10:01,780
And the only input that we have is the set of
points-- the xi, yi coordinates.

91
00:10:01,780 --> 00:10:08,200
And there's just a variety of algorithms that
you can use to do this. The one that I wish

92
00:10:08,200 --> 00:10:14,330
I had time to explain but I'll just mention
is what's called a gift wrapping algorithm.

93
00:10:14,330 --> 00:10:22,070
You might not have done this, but I guarantee
you-- I'd say-- you probably have taken a misshapen

94
00:10:22,070 --> 00:10:26,200
gift, right, and tried to wrap it in gift
wrapping paper.

95
00:10:26,200 --> 00:10:30,250
And when you're doing that, you're essentially--
if you're doing it right you're essentially

96
00:10:30,250 --> 00:10:35,090
trying to find the convex hull of this three
dimensional structure. You're trying to tighten

97
00:10:35,090 --> 00:10:39,750
it up. You're trying to find the minimum amount
of gift wrapping paper.

98
00:10:39,750 --> 00:10:43,730
I'm not sure if you've ever thought about
minimizing gift wrapping paper, but you should

99
00:10:43,730 --> 00:10:50,030
have. And that's the convex hull of this three
dimensional shape. But we'll stick to two

100
00:10:50,030 --> 00:10:56,710
dimensions because we'll have to draw things
on the board. So let me just spec this out

101
00:10:56,710 --> 00:10:57,990
a bit.

102
00:10:57,990 --> 00:11:12,910
I've been given n points in a plane. And
that set of points is S, (xi, yi) such that

103
00:11:12,910 --> 00:11:21,200
i equals 1, 2 to n. And we're just going to
assume here, just to make things easy because

104
00:11:21,200 --> 00:11:28,610
we don't want to have segments that are null
or segments that are a little bit different

105
00:11:28,610 --> 00:11:37,270
because they're discontinuous. But we're going
to assume that no two have the same x-coordinate.

106
00:11:37,270 --> 00:11:56,020
This is just a matter of convenience. And
no two have the same y-coordinate.
And then finally, no three in a line.

107
00:11:56,020 --> 00:12:01,840
Because we want to be able to look at pairs
of points and find these segments. And it

108
00:12:01,840 --> 00:12:08,070
just gets kind of inconvenient. You have to
do special cases if three of them are on a

109
00:12:08,070 --> 00:12:29,700
line. And so the convex hull itself is the
smallest polygon containing all points in

110
00:12:29,700 --> 00:12:36,810
S. And we're going to call that CH of S--
convex hull of S.

111
00:12:36,810 --> 00:12:39,440
STUDENT: Smallest convex polygon.

112
00:12:39,440 --> 00:12:55,520
PROFESSOR: The smallest convex polygon--
thank you. And so just as an example on the

113
00:12:55,520 --> 00:13:02,660
board, when you have something like this,
you're going to have your convex hull being

114
00:13:02,660 --> 00:13:07,690
that. This one is inside of it.

115
00:13:07,690 --> 00:13:13,490
These two points are inside of it. And all
the other ones form the hull. And so we might

116
00:13:13,490 --> 00:13:25,750
have p, q, r, s, t, u. And v and x are inside
of the hull. They're not part of the specification

117
00:13:25,750 --> 00:13:30,000
of CH of S, which I haven't quite told you
how we're going to specify that.

118
00:13:30,000 --> 00:13:38,290
But the way you're going to specify that is
simply by representing it as a sequence of

119
00:13:38,290 --> 00:13:54,390
points that are on the boundary of the hull
in clockwise order. And you can think of this

120
00:13:54,390 --> 00:14:00,160
as being a doubly linked list in terms of
the data structure that you'd use if you coded

121
00:14:00,160 --> 00:14:11,080
this up. So in this case, it would be p to
q to r to s.

122
00:14:11,080 --> 00:14:18,800
You're going to start with t in this case.
It's a doubly linked list. So you could conceivably

123
00:14:18,800 --> 00:14:26,120
start with anything. But that's the representation
of the convex hull.

124
00:14:26,120 --> 00:14:33,890
And we're going to use clockwise just because
we want to be clear as to what order we're

125
00:14:33,890 --> 00:14:38,180
enumerating these points. It's going to become
important when we do the divide and conquer

126
00:14:38,180 --> 00:14:46,170
algorithm. So let's say that we didn't care
about divide and conquer just for the heck

127
00:14:46,170 --> 00:14:57,610
of it and I gave you a bunch of points over
here.

128
00:14:57,610 --> 00:15:07,510
Can you think of a simple-- forget efficiency
for just a couple of minutes. Can you think

129
00:15:07,510 --> 00:15:18,720
of a simple algorithm that would generate
the segments of the convex hull? For example,

130
00:15:18,720 --> 00:15:21,940
I do not want to generate this segment vx.

131
00:15:21,940 --> 00:15:27,890
If I think of a segment as being something
that is defined by two points, then I don't

132
00:15:27,890 --> 00:15:32,480
want to generate the segment vx because clearly
the segment is not part of the convex hull.

133
00:15:32,480 --> 00:15:38,820
But whereas the segment pq, qr, rs, et cetera,
they're all part of the convex hull, right?

134
00:15:38,820 --> 00:15:46,360
So what is the obvious brute force algorithm,
forgetting efficiency, that given this set

135
00:15:46,360 --> 00:15:53,670
of points will generate one by one the segments
of the convex hull?

136
00:15:53,670 --> 00:16:01,550
Anybody? Did you have your head up? No? Go
ahead. Yep.

137
00:16:01,550 --> 00:16:06,380
STUDENT: Draw the line and check how many other lines intersect with it.

138
00:16:06,380 --> 00:16:09,860
PROFESSOR: Draw the line and check how many
lines it intersects with.

139
00:16:09,860 --> 00:16:11,120
STUDENT: Yeah.

140
00:16:11,120 --> 00:16:15,180
PROFESSOR: Is there-- I think you got-- you
draw the line. That's good, right?

141
00:16:15,180 --> 00:16:18,460
STUDENT: [LAUGHS]
AUDIENCE: [LAUGHING]

142
00:16:18,470 --> 00:16:23,670
PROFESSOR: Well-- but you want to do a little
more. Yeah, go ahead.

143
00:16:23,670 --> 00:16:28,110
STUDENT: For every pair of points you see, make a half-plane and see where they complete

144
00:16:28,110 --> 00:16:31,260
all of their other points. [INAUDIBLE]

145
00:16:31,260 --> 00:16:33,320
PROFESSOR: Ah, so that's good. That's good.
That's good.

146
00:16:33,330 --> 00:16:41,420
All right, so the first person who breaks
the ice here always gets a Frisbee. Sorry

147
00:16:41,420 --> 00:16:47,180
man. At least I only hit the lecturer-- no
liability considerations here. OK, now I'm

148
00:16:47,180 --> 00:16:49,240
getting scared.

149
00:16:49,240 --> 00:16:55,250
Right, so I think there's a certain amount
of when I throw this, am I going to choke

150
00:16:55,250 --> 00:17:01,450
or not, right? But it's going to get higher
when one of you guys in the back answers a

151
00:17:01,450 --> 00:17:03,860
question. So you're exactly right.

152
00:17:03,860 --> 00:17:10,128
And you draw a line. And then you just look
at it. And you look at the half plane.

153
00:17:10,128 --> 00:17:17,869
And if all the points are to one side, it
is a segment of the convex hull. If they're

154
00:17:17,869 --> 00:17:23,029
not, it's not a segment-- beautiful. All right,
are we done? Can we go and enjoy the good

155
00:17:23,029 --> 00:17:24,159
weather outside?

156
00:17:24,159 --> 00:17:31,110
No, we've got ways to go here. So this is
not the segment whereas one-- let me draw

157
00:17:31,110 --> 00:17:36,039
that. I should draw these in a dotted way.

158
00:17:36,039 --> 00:17:41,480
This is not a segment. This is not a segment.
This is a segment.

159
00:17:41,480 --> 00:17:45,850
And I violated my rule of these three not
being in a straight line. So I'll move this

160
00:17:45,850 --> 00:17:51,640
over here. And then that's a segment and so
on and so forth, OK? Right?

161
00:17:51,640 --> 00:17:53,900
STUDENT: It's no longer a side with the ones below it.

162
00:17:53,900 --> 00:17:55,700
PROFESSOR: I'm sorry?

163
00:17:55,700 --> 00:17:58,580
STUDENT: It would have to go directly to the bottom one from the left one.

164
00:17:58,580 --> 00:18:02,120
PROFESSOR: Oh, you're right. That's a good
point. That's an excellent point.

165
00:18:02,139 --> 00:18:08,340
So what happened here was when I moved that
out-- exactly right. Thank you. This is good.

166
00:18:08,340 --> 00:18:16,700
So when I moved this out here, what happened
was-- and I drew this-- well, this one here,

167
00:18:16,700 --> 00:18:23,230
my convex hull, changed. The problem specification
changed on me. It was my fault. But then what

168
00:18:23,230 --> 00:18:28,080
would happen, of course, is as I move this,
that would become the segment that was part

169
00:18:28,080 --> 00:18:30,389
of the convex hull, OK?

170
00:18:30,389 --> 00:18:36,240
So sorry to confuse people. But what we have
here in terms of an algorithm, if I leave

171
00:18:36,240 --> 00:18:43,679
the points the same, works perfectly well.
So let me just leave the points the same and

172
00:18:43,679 --> 00:18:48,029
just quickly recap, which is, I'm going to
take a pair of points.

173
00:18:48,029 --> 00:18:54,200
And I'm going to draw-- and let me just draw
this in a dotted fashion first. And I'm going

174
00:18:54,200 --> 00:18:58,240
to say that's the segment. And I'm going to
take a look at that line and say this breaks

175
00:18:58,240 --> 00:19:04,779
up the plane into two half planes. Are all
the points on one side?

176
00:19:04,779 --> 00:19:09,799
And if the answer is yes, I'm going to go
ahead and, boom, say that is a segment of

177
00:19:09,799 --> 00:19:16,110
my convex hull. If the answer is no, like
in this case, I'm going to drop that segment,

178
00:19:16,110 --> 00:19:19,950
OK? So now let's talk about complexity.

179
00:19:19,950 --> 00:19:29,990
Let's say that there are n points here. And
how many segments do I have? I have O of n squared--

180
00:19:30,000 --> 00:19:31,860
theta n squared segments.

181
00:19:31,860 --> 00:19:38,399
And what is the complexity of the test? What
is the complexity of the test that's associated

182
00:19:38,399 --> 00:19:43,149
with, once I've drawn the segments, deciding
whether the segment is going to be a tangent

183
00:19:43,149 --> 00:19:45,490
which is part of the convex hull or not? What
is the complexity?

184
00:19:45,490 --> 00:19:46,360
STUDENT: O n.

185
00:19:46,360 --> 00:19:59,680
PROFESSOR: O of n-- exactly right. So O of n test
complexity-- and so we've got overall theta n cubed

186
00:19:59,680 --> 00:20:05,059
complexity, OK? So it makes sense to do divide
and conquer if you can do better than this.
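The brute-force test described above can be sketched in Python as follows. This is only an illustration under the lecture's assumptions (distinct points, no three on a line); the function names and the use of a cross product to decide which side of the line a point falls on are choices made here, not something stated in the lecture.

```python
from itertools import combinations


def cross(o, a, b):
    """Cross product of vectors o->a and o->b.

    Positive and negative values correspond to the two half planes
    defined by the line through o and a.
    """
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def brute_force_hull_segments(points):
    """Return every pair (p, q) whose line leaves all other points on one side."""
    segments = []
    for p, q in combinations(points, 2):  # theta(n^2) candidate segments
        # O(n) test: which side of line p-q is each remaining point on?
        sides = [cross(p, q, r) for r in points if r != p and r != q]
        if all(s > 0 for s in sides) or all(s < 0 for s in sides):
            segments.append((p, q))
    return segments
```

Each of the theta n squared pairs gets an O of n side test, which matches the theta n cubed total from the discussion above.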

187
00:20:05,059 --> 00:20:10,230
Because this is a really simple algorithm.
The good news is we will be able to do better

188
00:20:10,230 --> 00:20:18,990
than that. And now that we have a particular
algorithm-- I'm not quite ready to show you

189
00:20:18,990 --> 00:20:19,700
that yet.

190
00:20:19,700 --> 00:20:26,590
Now that we have a particular algorithm, we
can think about how we can improve things.

191
00:20:26,590 --> 00:20:34,210
And of course we're going to use divide and
conquer. So let's go ahead and do that. And

192
00:20:34,210 --> 00:20:40,909
so generally, the divide and conquer, as I
mentioned before, in most cases, the division

193
00:20:40,909 --> 00:20:44,240
is pretty straightforward.

194
00:20:44,240 --> 00:20:50,629
And that's the case here as well. All the
fun is going to be in the merge step. Right,

195
00:20:50,629 --> 00:20:54,460
so what we're going to do, as you can imagine,
is we're going to take these points.

196
00:20:54,460 --> 00:20:59,690
And we're going to break them up. And the
way we're going to break them up is by dividing

197
00:20:59,690 --> 00:21:03,409
them into half planes. We're going to just
draw a line.

198
00:21:03,409 --> 00:21:07,980
And we're going to say everything to the left
of the line is one sub problem, everything

199
00:21:07,980 --> 00:21:14,440
to the right of the line is another sub problem,
go off and find the convex hull for each of

200
00:21:14,440 --> 00:21:20,509
the sub problems. If you have two points,
you're done, obviously. It's trivial.

201
00:21:20,509 --> 00:21:24,899
And at some point, you can say I'm just going
to deal with brute force. If we can go down

202
00:21:24,899 --> 00:21:30,789
to order n cubed, if n is small, I can just
apply that algorithm. So it doesn't even have

203
00:21:30,789 --> 00:21:36,059
to be the base case of n equals 1 or n equals
2. That's a perfectly fine thing to do.

204
00:21:36,059 --> 00:21:39,749
But you could certainly go with n equals 10,
as I mentioned before, and run this brute

205
00:21:39,749 --> 00:21:44,360
force algorithm. And so at that point, you
know that you can get down to small enough

206
00:21:44,360 --> 00:21:50,779
size sub problems for which you can find the
convex hull efficiently. And then you've got

207
00:21:50,779 --> 00:21:57,539
these two convex hulls which are clearly on
two different half planes because that's the

208
00:21:57,539 --> 00:21:59,019
way you defined them.

209
00:21:59,019 --> 00:22:04,820
And now you've got to merge them. And that's
where all the fun is, OK? So let's just write

210
00:22:04,820 --> 00:22:06,460
this out again.

211
00:22:06,460 --> 00:22:17,850
You're going to sort the points by x-coordinates.
And we're going to do this once and for all.

212
00:22:17,850 --> 00:22:22,299
We don't have to keep sorting here because
we're just going to be partitioning based

213
00:22:22,299 --> 00:22:22,929
on x-coordinates.

214
00:22:22,929 --> 00:22:27,509
And we can keep splitting based on x-coordinates
because we want to generate these half planes,

215
00:22:27,509 --> 00:22:41,639
right? So we can do that once and for
all-- and for the input set S, we're going

216
00:22:41,640 --> 00:23:00,200
to divide into the left half A and right half
B by the x-coordinates. And then we're going

217
00:23:00,200 --> 00:23:08,700
to compute CH of A and CH of B recursively.
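The division just described might be sketched like this in Python. The names `convex_hull_dc`, `merge`, `base_case`, and `cutoff` are hypothetical, introduced only for illustration; the merge is deliberately left as a callback because it is the part that still has to be worked out.

```python
def convex_hull_dc(points, merge, base_case, cutoff=3):
    """Skeleton of the divide step: sort by x once, then recurse on halves.

    merge(hull_a, hull_b) and base_case(small_list) are supplied by the
    caller; this sketch only fixes how the points are split.
    """
    pts = sorted(points)  # sort by x-coordinate once and for all

    def solve(lo, hi):
        # Hull of the x-sorted slice pts[lo:hi].
        if hi - lo <= cutoff:
            return base_case(pts[lo:hi])  # small enough for brute force
        mid = (lo + hi) // 2
        left = solve(lo, mid)    # CH of A, the left half
        right = solve(mid, hi)   # CH of B, the right half
        return merge(left, right)

    return solve(0, len(pts))
```

Because the points are sorted once up front, every recursive split is just an index computation, and every point handed to the left call has a smaller x-coordinate than every point handed to the right call.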

218
00:23:08,700 --> 00:23:14,399
And then we're going to combine. So the only
difference here from what we had before is

219
00:23:14,399 --> 00:23:18,909
the specification of the division. It looked
pretty generic.

220
00:23:18,909 --> 00:23:23,769
It's similar to the paradigm that I wrote
before. But I've specified exactly how I'm

221
00:23:23,769 --> 00:23:34,419
going to break this up. So let's start with
the merge operation. We're going to spend

222
00:23:34,419 --> 00:23:36,929
most of our time specing that.

223
00:23:36,929 --> 00:23:42,909
And again, there's many ways you could do
the merge. And we want the most efficient

224
00:23:42,909 --> 00:23:56,139
way. That's obviously going to determine complexity.
So, big question-- how to merge.

225
00:23:56,139 --> 00:24:03,169
So what I have here, if I look at the merge
step, is I've created my two sub problems

226
00:24:03,169 --> 00:24:11,240
corresponding to these two half planes. And
what I have here is-- let's say I've generated,

227
00:24:11,240 --> 00:24:18,820
at this point, a convex hull associated with
each of these sub problems. So what I have

228
00:24:18,820 --> 00:24:23,129
here is a1, a2.

229
00:24:23,129 --> 00:24:30,249
I'm going to go clockwise to specify the convex
hull. And the other thing that I'm going to

230
00:24:30,249 --> 00:24:39,350
do is in the sub problem case, my starting
point is going to be for the left sub problem,

231
00:24:39,350 --> 00:24:47,159
the coordinate that has the highest x value,
OK? So that's a1 in this case-- the highest

232
00:24:47,159 --> 00:24:50,470
x value going over. x is increasing to the
right.

233
00:24:50,470 --> 00:24:58,999
And for the right half of the problem, it's
going to be the coordinate that has the lowest

234
00:24:58,999 --> 00:25:07,009
x value. And I'm going to go clockwise in
both of these cases. So when you see an ordering

235
00:25:07,009 --> 00:25:14,970
associated with the subscripts for these points,
start with a1 or b1 and then go clockwise.

236
00:25:14,970 --> 00:25:20,159
And that's how we number this-- so just notational,
nothing profound here.

237
00:25:20,159 --> 00:25:26,519
So I got these two convex hulls-- these sub
hulls, if you will. And what I need to do

238
00:25:26,519 --> 00:25:32,850
now is merge them together. And you can obviously
look at this and it's kind of obvious what

239
00:25:32,850 --> 00:25:37,210
the overall convex hull is, right?

240
00:25:37,210 --> 00:25:46,779
But the key thing is, I'm going to have to
look at each of the pairs of points that are

241
00:25:46,779 --> 00:25:56,490
associated with this and that and try to generate
the tangents, the new tangents, that are not

242
00:25:56,490 --> 00:26:04,169
part of the sub hulls, but they're part of
the overall hull, right? So in this case,

243
00:26:04,169 --> 00:26:11,600
you can imagine an algorithm that is going
to kind of do what this brute force algorithm

244
00:26:11,600 --> 00:26:20,879
does except that it's looking at a point from
here and a point from here.

245
00:26:20,879 --> 00:26:28,529
So you could imagine that I'm going to do
a pairwise generation of segments. And then

246
00:26:28,529 --> 00:26:32,240
I'm going to check to see whether these segments
are actually tangents that are part of the

247
00:26:32,240 --> 00:26:38,240
overall convex hull or not. So what I would
do here is I'd look at this.

248
00:26:38,240 --> 00:26:45,820
And is that going to be part of the overall
hull? No, and precisely why not? Someone tell

249
00:26:45,820 --> 00:26:53,620
me why this segment a1 b1 is not part of the
overall hull? Yeah, go ahead.

250
00:26:53,620 --> 00:26:56,940
STUDENT: If we were to draw a line through the whole thing there would be one on both sides.

251
00:26:56,940 --> 00:27:03,700
PROFESSOR: Exactly right-- that's exactly
right. So here you go. So that's not part

252
00:27:03,700 --> 00:27:10,249
of it. Now, if I look at this-- well, same
reason that's not part of it.

253
00:27:10,249 --> 00:27:14,499
In this case-- and this is a fairly obvious
example. I'm going to do something that's

254
00:27:14,499 --> 00:27:19,509
slightly less obvious in case you get your
hopes up that we have this trivial algorithm,

255
00:27:19,509 --> 00:27:27,820
OK? This is looking good, right? That's supposed
to be a straight line, by the way.

256
00:27:27,820 --> 00:27:33,220
So a4 b2-- I mean, that's looking good, right?
Because all the points are on one side. So

257
00:27:33,220 --> 00:27:41,669
a4 b2 is our upper tangent. Right, so our
upper tangent is something that we're going

258
00:27:41,669 --> 00:27:49,249
to define as-- if I look at each of these
things, I'm going to say they have a yij.

259
00:27:49,249 --> 00:27:59,669
OK, what is yij? yij is the y-coordinate
of the segment that I'm looking at, the ij

260
00:27:59,669 --> 00:28:00,129
segment.

261
00:28:00,129 --> 00:28:09,970
So this yij is for ai and bj. So what I have
here is y42 out here. And this is-- for the

262
00:28:09,970 --> 00:28:16,399
upper tangent, yij is going to be maximum,
right? Because that's essentially something

263
00:28:16,399 --> 00:28:20,850
which would ensure me that there are no points
higher than that, right?

264
00:28:20,850 --> 00:28:26,759
So if I go up all the way and I find this
that has the maximum yij, that is going to

265
00:28:26,759 --> 00:28:32,580
be my upper tangent. Because only for that
will I have no points above that, OK? So

266
00:28:32,580 --> 00:28:34,409
yij is upper tangent.

267
00:28:34,409 --> 00:28:41,799
This is going to be maximum. And I'm not going
to write this down, but it makes sense that

268
00:28:41,799 --> 00:28:50,479
the lower tangent is going to have the lowest
yij. Are we all good here? Yeah, question.

269
00:28:50,480 --> 00:28:55,660
STUDENT: So I am just wondering, I couldn't hear
what she said why we moved out a1 b1.

270
00:28:55,660 --> 00:29:02,379
PROFESSOR: OK, so good. Let me-- the reason
we moved out a1 b1 is because if I just drew

271
00:29:02,379 --> 00:29:09,450
a1 b1 like this-- and I'm extrapolating this.
This is again supposed to be a straight line.

272
00:29:09,450 --> 00:29:14,039
Then you clearly see that there are points
on either side of the a1 b1 segment when you

273
00:29:14,039 --> 00:29:20,860
look at the overall problem, correct? You
see that on a1 b1, b2 is on this side, b3

274
00:29:20,860 --> 00:29:25,450
is on this side if I just extend this line
all the way to infinity in both directions.

275
00:29:25,450 --> 00:29:32,059
And that violates the requirement that the
segment be part of the overall hull, OK?

276
00:29:32,059 --> 00:29:36,639
That make sense? Good. So, everybody with
me?

277
00:29:36,639 --> 00:29:45,580
So clearly, there's a trivial merge algorithm
here. And the trivial merge algorithm is to

278
00:29:45,580 --> 00:29:55,039
look at not every pair of points-- every ab
pair, right? Every aibj pair.

279
00:29:55,039 --> 00:30:04,259
And so what is the complexity of doing that?
If I have n total points, the complexity would

280
00:30:04,259 --> 00:30:10,769
be-- would be n squared, right? Because maybe
I'd have half here and half here, ignore constants.

281
00:30:10,769 --> 00:30:16,110
And you could say, well, it's going to be
n squared divided by 4, but that's theta n

282
00:30:16,110 --> 00:30:30,190
squared. So there's an obvious merge algorithm
that is theta n squared looking at all pairs

283
00:30:30,190 --> 00:30:38,759
of points. And when I mean all pairs of points,
I mean like an a and a b.

284
00:30:38,759 --> 00:30:44,720
Because I want to pick a pair when I go left
of that dividing line and then right of the

285
00:30:44,720 --> 00:30:49,259
dividing line. But either way, it's theta
n squared, OK? So now you look at that and

286
00:30:49,259 --> 00:30:53,879
you go, huh. Can I do better?
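The all-pairs merge can be sketched as below. One assumption is made loudly here: yij is taken to be the height at which the segment from ai to bj crosses the vertical dividing line, which is an interpretation of the board work that the transcript does not spell out. The function names and the `x_div` parameter are hypothetical.

```python
def y_at(p, q, x_div):
    """y-coordinate where the line through p and q crosses x = x_div.

    Assumes p and q have different x-coordinates (p left of the divider,
    q right of it).
    """
    (x1, y1), (x2, y2) = p, q
    return y1 + (y2 - y1) * (x_div - x1) / (x2 - x1)


def upper_tangent_all_pairs(hull_a, hull_b, x_div):
    """Try every (a_i, b_j) pair and keep the one with the largest y_ij."""
    pairs = ((a, b) for a in hull_a for b in hull_b)
    return max(pairs, key=lambda ab: y_at(ab[0], ab[1], x_div))


def lower_tangent_all_pairs(hull_a, hull_b, x_div):
    """Same search, keeping the pair with the smallest y_ij."""
    pairs = ((a, b) for a in hull_a for b in hull_b)
    return min(pairs, key=lambda ab: y_at(ab[0], ab[1], x_div))
```

With about n over 2 points on each side this looks at roughly n squared over 4 pairs, each in constant time, so it is theta n squared as stated above.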

287
00:30:53,879 --> 00:31:02,179
What if I just went for the highest a point
and the highest b point and I just go, that's

288
00:31:02,179 --> 00:31:08,600
it? I'm done-- constant time. Wouldn't that
be wonderful? Yeah, wonderful, but incorrect,

289
00:31:08,600 --> 00:31:09,409
OK?

290
00:31:09,409 --> 00:31:14,559
Right, so what is an example? And so
this is something that I spent a little bit

291
00:31:14,559 --> 00:31:21,659
of time last night concocting. So I'm like
you guys too. I do my problem set the night

292
00:31:21,659 --> 00:31:22,950
before.

293
00:31:22,950 --> 00:31:35,539
Well, don't do as I do. Do as I say. But I've
done this before. So that's the difference.

294
00:31:35,539 --> 00:31:42,359
But this particular example is new. So what
I have here is I'm going to show you why there's

295
00:31:42,359 --> 00:31:54,809
not a trivial algorithm, OK, that-- I got
to get these angles right-- that you can't

296
00:31:54,809 --> 00:32:00,759
just pick the highest points and keep going,
right?

297
00:32:00,759 --> 00:32:06,470
And then that would be constant time. So that's
my a over here. And let's assume that I have

298
00:32:06,470 --> 00:32:11,109
my dividing line like that. And then what
I'm going to do here-- and I hope I get this

299
00:32:11,109 --> 00:32:17,960
right-- is I'm going to have something like
this, like that.

300
00:32:17,960 --> 00:32:30,409
And then I'm going to have b1 here clockwise--
so b2, b3, and b4. So as you can see here,

301
00:32:30,409 --> 00:32:49,389
if I look at a4-- a little adjustment necessary.
OK, so if I look at that, a4 to b1 versus--

302
00:32:49,389 --> 00:32:50,710
I mean, just eyeball it.

303
00:32:50,710 --> 00:32:58,669
A3 to b1-- right, is a4 to b1 going to be
the upper tangent? No, right? So now a3 is

304
00:32:58,669 --> 00:33:01,859
lower than a4. You guys see that?

305
00:33:01,859 --> 00:33:08,320
And b1 is lower than b2, right? So it's clear
that if I just took a4 to b2 that it will

306
00:33:08,320 --> 00:33:12,489
not be an upper tangent. Everybody see that?

307
00:33:12,489 --> 00:33:19,590
Yep, all right, good. So we can't have a constant
time algorithm. We have theta n squared in

308
00:33:19,590 --> 00:33:24,289
the bag. So is there something-- maybe
theta n?

309
00:33:24,289 --> 00:33:34,429
How would we do this merge and find the upper
tangent by being a little smarter about searching

310
00:33:34,429 --> 00:33:43,570
for pairs of points that give us this maximum
yij? I mean, the goal here is simple. At some

311
00:33:43,570 --> 00:33:47,220
level, if you looked at the brute force, I
would generate each of these things.

312
00:33:47,220 --> 00:33:53,340
I would find the y-intercepts associated
with this line. And I just pick the maximum.

313
00:33:53,340 --> 00:33:56,070
And the constant time algorithm doesn't work.
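The brute-force merge described here can be sketched in Python. This is a minimal sketch under assumptions that are not on the board: the names `intercept` and `brute_force_upper_tangent` are hypothetical, each sub-hull is a list of (x, y) tuples, and the dividing line is x = x_line with every a point strictly left of it and every b point strictly right of it. It tries every aibj pair and keeps the one with the largest intercept yij, which the lecture takes to be the upper tangent.

```python
from itertools import product

def intercept(a, b, x_line):
    """y-coordinate where the line through a and b crosses x = x_line."""
    (ax, ay), (bx, by) = a, b
    return ay + (by - ay) * (x_line - ax) / (bx - ax)

def brute_force_upper_tangent(A, B, x_line):
    """Return indices (i, j) of the pair (A[i], B[j]) with the largest intercept.

    Looks at all len(A) * len(B) pairs, so it is quadratic in the worst case.
    """
    pairs = product(range(len(A)), range(len(B)))
    return max(pairs, key=lambda ij: intercept(A[ij[0]], B[ij[1]], x_line))
```

Each intercept takes constant time, so the cost is the number of pairs, which is theta n squared when the points are split roughly in half.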

314
00:33:56,070 --> 00:34:01,730
The theta n squared algorithm definitely works.
But we don't like it. So there has to be something

315
00:34:01,730 --> 00:34:05,990
in between. So, any ideas? Yeah, back there.

316
00:34:05,990 --> 00:34:14,000
STUDENT: So... I had a question. [INAUDIBLE]

317
00:34:14,000 --> 00:34:19,139
PROFESSOR: No, you're just finding-- no, you're
maximizing the yij. So for once you have this

318
00:34:19,139 --> 00:34:25,929
segment-- so the question was, isn't the obvious
merge algorithm theta n cubed, right? And

319
00:34:25,940 --> 00:34:31,668
my answer is no, because the theta n extra
factor came from the fact that you had to

320
00:34:31,668 --> 00:34:36,739
check every point, every endpoint, to see
on which side of the plane it was. Whereas

321
00:34:36,739 --> 00:34:41,070
here, what I'm doing is I've got this one
line here that is basically y equals 0, if

322
00:34:41,070 --> 00:34:47,909
you like, or y equals some-- I'm sorry, x
equals 0 or x equals some value.

323
00:34:47,909 --> 00:34:55,270
And I just need to, once I have the equation
for the line associated with a4 b1 or a4 b2,

324
00:34:55,270 --> 00:35:00,500
I just have to find the intercept of it, which
is constant time, right? And then once I find

325
00:35:00,500 --> 00:35:06,750
the intercept of it, I just maximize that
intercept to get my yij. So I'm good, OK?

326
00:35:06,750 --> 00:35:15,230
So it's only theta n squared, right? Good
question. So this is actually quite-- very,

327
00:35:15,230 --> 00:35:17,620
very, very clever.

328
00:35:17,620 --> 00:35:22,370
This particular algorithm is called the two
finger algorithm. And I do have multiple fingers,

329
00:35:22,370 --> 00:35:27,790
but it's going to work a lot better if I borrow
Erik's finger. And we're going to demonstrate

330
00:35:27,790 --> 00:35:36,300
to you the two finger algorithm for merging
these two convex hulls. And then we'll talk

331
00:35:36,300 --> 00:35:39,160
about the complexity of it.

332
00:35:39,160 --> 00:35:44,470
And my innovation again last night was to
turn this from a two-finger algorithm. Not

333
00:35:44,470 --> 00:35:48,660
only did I have the bright idea of using Erik--
I decided it was going to become the two finger

334
00:35:48,660 --> 00:35:52,720
and string algorithm. So this is wild.

335
00:35:52,720 --> 00:36:04,420
This is my contribution to 046 lore-- come
on. So the way the two finger algorithm works--

336
00:36:04,420 --> 00:36:10,220
this pseudo code should be incomprehensible.
If you just look at it and you go, what, right?

337
00:36:10,220 --> 00:36:15,100
But this demo is going to clear everything
up. Right so here's what you do. So now we're

338
00:36:15,100 --> 00:36:22,150
going to do a demo of the merge algorithm
that is a cleverer merge algorithm than the

339
00:36:22,150 --> 00:36:29,070
one that uses order n square time. And it's
correct. It's going to get you the correct

340
00:36:29,070 --> 00:36:37,290
upper tangent and what we are starting at
here is with Erik’s left finger on A1, which

341
00:36:37,290 --> 00:36:45,400
is defined to be the point that's closest
to the vertical line that you see here, the

342
00:36:45,400 --> 00:36:50,580
one that has the highest x-coordinate. And
my finger is on B1, which is the point that

343
00:36:50,580 --> 00:36:58,760
has the smallest X-coordinate on the right
hand side sub-hull. And what we do is we compute,

344
00:36:58,760 --> 00:37:06,460
for the segment A1 B1, we compute Yij,
in this case Y11, which is the intercept on

345
00:37:06,460 --> 00:37:13,730
the vertical line that you see here that Erik
just marked with a red dot. And you can look

346
00:37:13,730 --> 00:37:19,960
at the pseudocode over on, to my right if
I face the board. And what happens now is

347
00:37:19,960 --> 00:37:26,730
I'm going to move clockwise, and I'm going
to go from B1 to B4. And what happened here?

348
00:37:26,730 --> 00:37:34,010
Did the Yij increase or decrease? Well, as
you can see it decreased. And so I'm going

349
00:37:34,010 --> 00:37:40,930
to go back to B1. And we're not quite done
with this step here. Erik’s going to go

350
00:37:40,930 --> 00:37:47,360
counterclockwise over to A4. And we're going
to check again, yeah, keep the string taut,

351
00:37:47,360 --> 00:37:53,440
check again whether Yij increased or decreased
and as is clear from here Yij increased. So

352
00:37:53,440 --> 00:38:00,700
now we move to this point. And as of this
moment we think that A4 B1 has the highest

353
00:38:00,700 --> 00:38:05,220
Yij. But we have a while loop. We’re going
to have to continue with this while loop,

354
00:38:05,220 --> 00:38:13,370
and now what happens is, I’m going to go
from B1 clockwise again to B4. And when this

355
00:38:13,370 --> 00:38:19,570
happens, did Yij increase or decrease? Well
it decreased. So I'm going to go back to B1

356
00:38:19,570 --> 00:38:32,430
and Erik now is going to go counterclockwise
to A3. And as you can see Y31 increased a

357
00:38:32,430 --> 00:38:39,350
little bit, so we're going to now stop this
iteration of the algorithm and we're at A3

358
00:38:39,350 --> 00:38:46,860
B1, which we think at this point is our upper
tangent, but let's check that. Start over

359
00:38:46,860 --> 00:38:54,750
again on my side B1 to B4, what happened?
Well Yij decreased. So I'm going to go back

360
00:38:54,750 --> 00:38:59,350
to B1. And then Erik’s going to try. He’s
going counterclockwise, he's going to go A3

361
00:38:59,350 --> 00:39:08,760
to A2 and, well, big decrease in Yij. Now
Erik goes back to A3. At this point we've

362
00:39:08,760 --> 00:39:17,040
tried both moves, my clockwise move and Erik’s
counterclockwise move. My move from B1 to

363
00:39:17,040 --> 00:39:24,890
B4 and Erik’s move from A3 to A2. So we've
converged, we're out of the while loop, A3

364
00:39:24,890 --> 00:39:34,010
B1 for this example is our upper tangent.
All right. You can have your finger back Erik.

365
00:39:34,010 --> 00:39:42,810
So the reason this works is because we have
a convex hull here and a convex hull here.

366
00:39:42,810 --> 00:39:51,100
We are starting with the points that are closest
to each other in terms of A1 being the closest

367
00:39:51,100 --> 00:39:58,240
to this vertical line, B1 being the closest
to this vertical line, and we are moving upward

368
00:39:58,240 --> 00:40:04,570
in both directions because I went clockwise
and Erik went counterclockwise. And that's

369
00:40:04,570 --> 00:40:09,000
the intuition of why this algorithm works.
We're not going to do a formal proof of this

370
00:40:09,000 --> 00:40:17,490
algorithm, but the monotonicity property corresponding
to the convexity of this subhull and the convexity

371
00:40:17,490 --> 00:40:24,100
of the subhull essentially can give you a
formal proof of correctness of this algorithm,

372
00:40:24,100 --> 00:40:31,010
but as I said we won't cover that in 046.
So all that remains now is to look at our

373
00:40:31,010 --> 00:40:37,850
pseudocode which matches the execution that
you just saw and talk about the complexity

374
00:40:37,850 --> 00:40:39,310
of the pseudocode.
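The demo above can be sketched as a short Python function. This is a sketch under stated assumptions, not the pseudocode from the board: the names `intercept` and `two_finger_upper_tangent` are hypothetical, each sub-hull is a list of (x, y) tuples in clockwise order, the dividing line is x = x_line, and there are no degenerate cases such as repeated points. The right finger moves clockwise (index plus one), the left finger moves counterclockwise (index minus one), and a move is kept only if the intercept goes up. As in the lecture, correctness is taken on the convexity intuition rather than proved.

```python
def intercept(a, b, x_line):
    """y-coordinate where the line through a and b crosses x = x_line."""
    (ax, ay), (bx, by) = a, b
    return ay + (by - ay) * (x_line - ax) / (bx - ax)

def two_finger_upper_tangent(A, B, x_line):
    """Hill-climb on the intercept; A and B are clockwise convex sub-hulls."""
    p, q = len(A), len(B)
    i = max(range(p), key=lambda t: A[t][0])  # left finger: rightmost point of A
    j = min(range(q), key=lambda t: B[t][0])  # right finger: leftmost point of B

    def y(s, t):
        return intercept(A[s % p], B[t % q], x_line)

    # Keep going while either finger can raise the intercept.
    while y(i, j + 1) > y(i, j) or y(i - 1, j) > y(i, j):
        if y(i, j + 1) > y(i, j):
            j = (j + 1) % q   # right finger moves clockwise
        else:
            i = (i - 1) % p   # left finger moves counterclockwise
    return i, j
```

On a small example with A = [(-1, 0), (-2, -1), (-3, 0), (-2, 2)] and B = [(1, 0), (2, 3), (3, 0), (2, -1)] split by x = 0, the fingers end on (-2, 2) and (2, 3), and every other point lies below the line through them.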

375
00:40:39,310 --> 00:40:46,300
So what is the complexity of this algorithm?
It's order n, right? So what's happening

376
00:40:46,300 --> 00:40:51,970
here, if you look at this while loop, is that
while I have two counters, I'm essentially

377
00:40:51,970 --> 00:40:56,490
looking at two operations per loop.

378
00:40:56,490 --> 00:41:03,860
And either one of those counters is guaranteed
to increment through the loop. And so since

379
00:41:03,860 --> 00:41:11,750
I have in this case p points, in one case
p plus q equals n-- so let's say I had p points

380
00:41:11,750 --> 00:41:19,500
here and I have q points here. And got p plus
q equals n.

381
00:41:19,500 --> 00:41:29,250
And I got a theta n merge simply because I'm
going to be running through and incrementing--

382
00:41:29,250 --> 00:41:35,430
as long as I'm in the loop, I'm going to be
incrementing either the i or the j. And the

383
00:41:35,430 --> 00:41:41,830
maximum they can go to are p and q before
I bounce out of the loop or before they rotate

384
00:41:41,830 --> 00:41:42,850
around.

385
00:41:42,850 --> 00:41:50,780
And so that's why this is theta n. And so
you put it all together in terms of what the

386
00:41:50,780 --> 00:41:57,190
merge corresponds to in terms of complexity
and put that together with the overall divide

387
00:41:57,190 --> 00:42:06,270
and conquer. We have a case where this is
looking like a recurrence that you've seen

388
00:42:06,270 --> 00:42:08,460
many a time t of n.

389
00:42:08,460 --> 00:42:17,890
I've broken it up into two sub problems. So
I have 2. And I could certainly choose this

390
00:42:17,890 --> 00:42:27,750
l over here that's my line l to be such that
I have a good partition between the two sets

391
00:42:27,750 --> 00:42:28,400
of points.

392
00:42:28,400 --> 00:42:33,210
Now, if I choose l to be all the way on the
right hand side, then I have this large sub

393
00:42:33,210 --> 00:42:38,360
problem-- makes no sense whatsoever. So what
I can do-- there's nothing that's stopping

394
00:42:38,360 --> 00:42:46,610
me when I've sorted these points by the x-coordinates
to do the division such that there's exactly

395
00:42:46,610 --> 00:42:52,140
the same number, assuming an even number of
points n, exactly the same number on the left

396
00:42:52,140 --> 00:42:57,340
hand side or the right hand side. But I can
get that right roughly certainly within one

397
00:42:57,340 --> 00:42:59,000
very easily.

398
00:42:59,000 --> 00:43:04,510
So that's where the n over 2 comes from, OK?
In the next problem that we'll look at, the

399
00:43:04,510 --> 00:43:09,540
median finding problem, we'll find that trying
to get the sub problems to be of roughly equal

400
00:43:09,540 --> 00:43:14,620
size is actually a little difficult, OK? But
I want to point out that in this particular

401
00:43:14,620 --> 00:43:20,970
case, it's easy to get sub problems that are
half the size because you've done the sorting.

402
00:43:20,970 --> 00:43:26,970
And then you just choose the line, the vertical
line such that you've got a bunch of points

403
00:43:26,970 --> 00:43:33,320
that are on either side. And then in terms
of the merge operation, we have 2t n over

404
00:43:33,320 --> 00:43:41,310
2 plus theta n. People recognize this recurrence?
It's the old merge sort recurrence.

405
00:43:41,310 --> 00:43:45,920
So we did all of this in-- well, it's not
merge sort. Clearly the algorithm is not merge

406
00:43:45,920 --> 00:43:48,340
sort. We got the same recurrence.

407
00:43:48,340 --> 00:43:56,960
And so this is theta n log n-- so a lot better
than theta n cubed. And there's no convex hull

408
00:43:56,960 --> 00:44:02,850
algorithm that's in the general case better
than this. Even the gift wrapping algorithm

409
00:44:02,850 --> 00:44:07,720
that I mentioned to you, with the right data
structures, it gets down to that in terms

410
00:44:07,720 --> 00:44:11,010
of theta n log n, but no better.
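For reference, the recurrence can be unrolled as a worked equation. This is a sketch that assumes n is a power of 2 and writes the theta n merge cost as cn for some constant c, which is notation introduced here and not from the board:

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 2^{k}\,T\!\left(n/2^{k}\right) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= \Theta(n \log n).
\end{aligned}
```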

411
00:44:11,010 --> 00:44:17,890
OK, so good. That's pretty much what I had
here. Again, like I said, happy to answer

412
00:44:17,890 --> 00:44:25,750
questions about the correctness of this loop
algorithm for merge later. Any other questions

413
00:44:25,750 --> 00:44:27,570
associated with this?

414
00:44:27,570 --> 00:44:28,560
STUDENT: Question.

415
00:44:28,560 --> 00:44:29,760
PROFESSOR: Yeah, back there.

416
00:44:29,770 --> 00:44:33,940
STUDENT: If the input is pre-sorted by x coordinates, can you do better than [INAUDIBLE]?

417
00:44:33,940 --> 00:44:41,100
PROFESSOR: No, you can't, because-- I mean,
the n log n for the pre-sorting, I mean, there's

418
00:44:41,100 --> 00:44:48,120
another theta n log n for the sorting at the
top level. And we didn't actually use that,

419
00:44:48,120 --> 00:44:54,010
right? So the question was, can we do better
if the input was pre sorted?

420
00:44:54,010 --> 00:45:00,920
And I actually did not even use the complexity
of the sort. We just matched it in this case.

421
00:45:00,920 --> 00:45:05,220
So theta n log n-- and then you can imagine
maybe that you could do a theta n sort if

422
00:45:05,220 --> 00:45:09,860
these points were small enough and you rounded
them up and you could use a bucket sort or

423
00:45:09,860 --> 00:45:12,130
a counting sort and lower that.

424
00:45:12,130 --> 00:45:17,720
So this theta n log n is kind of fundamental
to the divide and conquer algorithm. The only

425
00:45:17,720 --> 00:45:23,530
way you can improve that is by making a merge
process that's even faster. And we obviously

426
00:45:23,530 --> 00:45:29,470
tried to cook up a theta one merge process.
But that didn't work out, OK?

427
00:45:29,470 --> 00:45:33,720
STUDENT: But are there algorithms that [INAUDIBLE] ?

428
00:45:33,720 --> 00:45:38,920
PROFESSOR: First-- if you assume certain things
about the input, you're absolutely right.

429
00:45:38,930 --> 00:45:45,540
So one thing you'll discover in algorithms
in 6046 as well is that we're never satisfied.

430
00:45:45,540 --> 00:45:49,630
OK, so I just said, oh, you can't do better
than theta n log n.

431
00:45:49,630 --> 00:45:54,620
But that's in the general case. And I think
I mentioned that. You're on the right track.

432
00:45:54,620 --> 00:46:00,440
If the input is pre sorted, you can take that
away-- no, it doesn't help in that particular

433
00:46:00,440 --> 00:46:09,110
instance if you have general settings. But
if you-- the two dimensional case-- if the

434
00:46:09,110 --> 00:46:17,140
hull, all the segments have a certain characteristic--
not quite planar, but something that's a little

435
00:46:17,140 --> 00:46:21,430
more stringent than that-- you could imagine
that you can do improvements. I don't know

436
00:46:21,430 --> 00:46:27,750
of any compelling special case input for convex
hull for which you can do better than theta

437
00:46:27,750 --> 00:46:28,540
n log n.

438
00:46:28,540 --> 00:46:34,520
But that's a fine exercise for you, which
is in what cases, given some structure on

439
00:46:34,520 --> 00:46:38,690
the points, can I do better than theta n log
n? So that's something that keeps coming up

440
00:46:38,690 --> 00:46:45,890
in the algorithm literature, if you can use
that, OK? Yeah, back there-- question.

441
00:46:45,890 --> 00:46:47,710
STUDENT: Where's your [INAUDIBLE] step?

442
00:46:47,710 --> 00:46:53,000
You also have to figure out which lines to remove from each of your two...

443
00:46:53,000 --> 00:46:58,560
PROFESSOR: Ah, good point. And you're exactly,
absolutely right. And I just realized that

444
00:46:58,560 --> 00:47:00,060
I skipped that step, right?

445
00:47:00,060 --> 00:47:05,360
Thank you so much. So the question was, how
do I remove the lines? And it's actually fairly

446
00:47:05,360 --> 00:47:06,090
straightforward.

447
00:47:06,090 --> 00:47:13,720
Let's keep this up here. And we don't need
this incomprehensible pseudo code, right?

448
00:47:13,720 --> 00:47:16,800
So let's erase that.

449
00:47:16,800 --> 00:47:24,060
And thank you for asking that question. So
it's a little simple cut and paste approach

450
00:47:24,060 --> 00:47:39,440
where let's say that I find the upper tangent
ai bj. And I find the lower tangent.

451
00:47:39,440 --> 00:47:51,430
Let's call it ak bm. And in this particular
instance, what do I have? I have a1, a2, a3,

452
00:47:51,430 --> 00:48:00,850
a4 as being one of my sub hulls. And then
I have b1, b2, b3, b4 as the other one.

453
00:48:00,850 --> 00:48:10,830
Now, what did we determine to be the upper
tangent? Was it a3 b1? Right, a3 b1?

454
00:48:10,830 --> 00:48:27,370
So a3 b1 was my upper tangent. And I guess
it was a1-- a1 b4? A1 b4 was my lower tangent.

455
00:48:27,370 --> 00:48:34,580
So the big question is, now that I've found
these two, how do I generate the correct representation

456
00:48:34,580 --> 00:48:40,990
of the overall convex hull? And so it turns
out that you have to do this-- and then the

457
00:48:40,990 --> 00:48:46,660
complexity of this is important as well. And
you need to do what's called a cut and paste

458
00:48:46,660 --> 00:48:50,350
that's associated with this where we're going
to just look at this and that.

459
00:48:50,350 --> 00:48:54,740
So if we're going to have these two things,
then we've got to generate a list of points.

460
00:48:54,740 --> 00:49:00,200
Now, clearly a4 is not going to be part of
that, right? A4 is not going to be part of

461
00:49:00,200 --> 00:49:01,490
the overall hull.

462
00:49:01,490 --> 00:49:11,880
What is it that we want? We want something
like a1, a2, a3, b1, b2, b3, b4, right? But

463
00:49:11,880 --> 00:49:16,210
there's a point that we have to discard here.
Agree?

464
00:49:16,210 --> 00:49:22,780
And so the way we do this is very mechanical.
That's the good news here. I mean, you don't

465
00:49:22,780 --> 00:49:24,230
have to look at it pictorially.

466
00:49:24,230 --> 00:49:30,840
I just made that up looking at-- eyeballing
it. Clearly, a computer doesn't have eyeballs,

467
00:49:30,840 --> 00:49:37,430
right? And so what we're going to do is we're
going to say the first link-- in general,

468
00:49:37,430 --> 00:49:40,960
the first link is ai to bj.

469
00:49:40,960 --> 00:49:49,940
Because that's my upper tangent, OK? And in
this case, it's going to be a3 b1, OK? And

470
00:49:49,940 --> 00:50:08,820
then I'm going to go down the b list until
you see bm, which is the lower tangent.

471
00:50:08,820 --> 00:50:12,640
You're on the b list. So you're looking for
the lower tangent point. And then you're going

472
00:50:12,640 --> 00:50:20,240
to jump until you see bm. You link it to ak,
OK?

473
00:50:20,240 --> 00:50:34,310
You link it to ak and continue until you return
to ai. And then you have your circular

474
00:50:34,310 --> 00:50:42,910
list, OK? So what you see here is you have
a3 here. So I'm going to go ahead and write

475
00:50:42,910 --> 00:50:47,980
out the execution of what I just wrote here.

476
00:50:47,980 --> 00:50:54,500
So I have a3. And I'm going to go jump over
to b1. So I'm going to write down b1. Then

477
00:50:54,500 --> 00:50:58,340
I'm going to go along the b's until I get to
b4.

478
00:50:58,340 --> 00:51:05,440
In this case, I'm going to include all of
the b's. So I got b1, b2, b3, b4. And then

479
00:51:05,440 --> 00:51:13,810
I'm going to jump from b4 to a1 because that's
part of my lower tangent.

480
00:51:13,810 --> 00:51:25,250
And I got a1 here, a2. And then I'm back to
a3, which is great. Because then I'm done,

481
00:51:25,250 --> 00:51:26,290
OK?

482
00:51:26,290 --> 00:51:32,050
And so exactly what I said happened, thank
goodness, which is we dropped a4 but we kept

483
00:51:32,050 --> 00:51:38,300
all the other points. Does that answer your
question? Good.

484
00:51:38,300 --> 00:51:44,520
What is the complexity of cut and paste? It's
order n. I'm just walking through these lists.
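The cut and paste step can be sketched in Python as a single walk over the two lists. This is a sketch under assumptions: the name `cut_and_paste` is hypothetical, both sub-hulls are clockwise lists, and the tangents are passed as index pairs, (i, j) for the upper tangent ai bj and (k, m) for the lower tangent ak bm.

```python
def cut_and_paste(A, B, upper, lower):
    """Stitch two clockwise sub-hulls into one clockwise hull.

    upper = (i, j) indexes the upper tangent A[i], B[j];
    lower = (k, m) indexes the lower tangent A[k], B[m].
    """
    i, j = upper
    k, m = lower
    hull = [A[i]]                 # first link: ai to bj
    t = j
    while True:                   # go down the b list until we see bm
        hull.append(B[t])
        if t == m:
            break
        t = (t + 1) % len(B)
    t = k                         # jump from bm to ak
    while t != i:                 # continue until we return to ai
        hull.append(A[t])
        t = (t + 1) % len(A)
    return hull
```

With the lecture's example, upper tangent a3 b1 and lower tangent a1 b4, calling `cut_and_paste(["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4"], (2, 0), (0, 3))` walks a3, b1, b2, b3, b4, a1, a2 and drops a4. Each point is visited at most once, which is where the order n comes from.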

485
00:51:44,520 --> 00:51:50,730
So there's no hidden complexity here, OK?
Good, good-- thank you. You definitely deserve

486
00:51:50,730 --> 00:51:51,440
a Frisbee.

487
00:51:51,440 --> 00:51:59,400
In fact, you deserve two, right? Where are
you? I-- oh, could you stand up?

488
00:51:59,400 --> 00:52:09,570
Yeah, right-- two colors. All right. Oh, so
he-- well, you can give it to him if you like.

489
00:52:09,570 --> 00:52:12,380
So good, thank you.

490
00:52:12,380 --> 00:52:20,760
So are we done? Are we done with convex hull?
OK, good. So let's go on and do median finding.

491
00:52:20,760 --> 00:52:25,960
Very different-- very different set of issues
here.

492
00:52:25,960 --> 00:52:37,960
Still on divide and conquer, but a very different
set of issues. The specification here is,

493
00:52:37,960 --> 00:52:44,210
of course, straightforward. You can think
of it as I just want a better algorithm than

494
00:52:44,210 --> 00:52:51,480
sorting and looking for the median at the
particular position-- n over 2 position,

495
00:52:51,480 --> 00:53:01,680
for example. Let's say n is odd. And it's
floor of n over 2. You can find that median.

496
00:53:01,680 --> 00:53:10,300
Right, so it's pretty easy if you can do sorting.
But we're never satisfied with using a standard

497
00:53:10,300 --> 00:53:14,970
algorithm if we think that we can do better
than that. So the whole game here is going

498
00:53:14,970 --> 00:53:18,970
to be I'm going to find the median.

499
00:53:18,970 --> 00:53:32,910
And I want to do it in better than theta n
log n time. OK, so that's what median finding

500
00:53:32,910 --> 00:53:37,070
is all about. You're going to use divide and
conquer for this.

501
00:53:37,070 --> 00:53:53,880
And so in general, we're going to define,
given a set of n numbers, define rank of x

502
00:53:53,880 --> 00:54:06,510
as the number of elements in the set that are greater
than-- I'm sorry, less than or equal to x.

503
00:54:06,510 --> 00:54:09,270
I mean, you could have defined it differently.
We're going to go with less than or equal

504
00:54:09,270 --> 00:54:10,750
to.

505
00:54:10,750 --> 00:54:18,570
So in general, the rank, of course, is something
that could be used very easily to find the

506
00:54:18,570 --> 00:54:28,930
median. So if you want to find the element
of rank n plus 1 divided by 2 floor, that's

507
00:54:28,930 --> 00:54:38,650
what we call the lower median. And n plus
1 divided by 2 ceiling is the upper median.

508
00:54:38,650 --> 00:54:43,730
And they may be the same if n is odd.
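The definitions above can be written down directly in Python. This is a small sketch; the function names are hypothetical, and ranks are 1-indexed as in the lecture.

```python
def rank(S, x):
    """Number of elements of S that are less than or equal to x."""
    return sum(1 for y in S if y <= x)

def lower_median_rank(n):
    """Floor of (n + 1) / 2."""
    return (n + 1) // 2

def upper_median_rank(n):
    """Ceiling of (n + 1) / 2, using integer arithmetic only."""
    return -(-(n + 1) // 2)
```

For n = 5 both ranks are 3; for n = 6 the lower median has rank 3 and the upper median has rank 4.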

509
00:54:43,730 --> 00:54:48,210
But that's what we want. So you can think
of it as it's not median finding, but finding

510
00:54:48,210 --> 00:54:55,200
elements with a certain rank. And we want
to do this in linear time, OK?

511
00:54:55,200 --> 00:55:05,400
So we're going to apply divide and conquer
here. And as always, the template can be instantiated.

512
00:55:05,400 --> 00:55:11,780
And the devil is in the details of either
division or merge.

513
00:55:11,780 --> 00:55:19,460
And we had most of our fun with convex hull
on the merge operation. It turns out most

514
00:55:19,460 --> 00:55:32,770
of the fun here with respect to median finding
is in the divide, OK? So what I want is the

515
00:55:32,770 --> 00:55:38,780
definition of a select routine that takes
a set of numbers s.

516
00:55:38,780 --> 00:55:49,340
And this is the rank. So I want a rank i.
And that i might be n over 2-- well, floor

517
00:55:49,340 --> 00:55:52,560
of n plus 1 over 2, whatever.

518
00:55:52,560 --> 00:55:56,500
And so what does the divide and conquer look
like? Well, the first thing you need to do

519
00:55:56,500 --> 00:56:04,920
is divide. And as of now, we're just going
to say you're going to pick some element x

520
00:56:04,920 --> 00:56:06,240
belonging to s.

521
00:56:06,240 --> 00:56:10,230
And this choice is going to be crucial. But
at this point, I'm not ready to specify this

522
00:56:10,230 --> 00:56:15,640
choice yet, OK? So we're going to have to
do this cleverly. And then what we're going

523
00:56:15,640 --> 00:56:30,220
to do is we're going to compute k, which
is the rank of x, and generate two sub arrays

524
00:56:30,220 --> 00:56:35,680
such that I want to find the fifth highest
element. I want to find the median element.

525
00:56:35,680 --> 00:56:40,910
I want to find the 10th highest element. So
I have to keep track of what happens in the

526
00:56:40,910 --> 00:56:46,990
sub problems. Because the sub problems are
going to determine, depending on how many

527
00:56:46,990 --> 00:56:52,700
elements are inside those sub problems, which
I can only determine after I've solved those

528
00:56:52,700 --> 00:56:56,740
sub problems. I'm going to have to collect
that information and put it together in the

529
00:56:56,740 --> 00:56:59,080
merge operation.

530
00:56:59,080 --> 00:57:05,700
So if I want to find the 10th highest element
and I've broken it up relatively arbitrarily,

531
00:57:05,700 --> 00:57:10,240
it's quite possible that the 10th highest
element is going to be discovered in the left

532
00:57:10,240 --> 00:57:15,150
one or the right one. And I have to show that
it's the 10th highest. And it might be that

533
00:57:15,150 --> 00:57:24,090
there's four elements in the left and five
on the right that are-- let's see.

534
00:57:24,090 --> 00:57:29,010
If I defined the rank as less than or equal
to x, there's four on the left and five on

535
00:57:29,010 --> 00:57:34,680
the right that are smaller. And that's why
this is the 10th highest element. And that's

536
00:57:34,680 --> 00:57:44,330
essentially what we have to look at. So b
and c are going to correspond to the sub arrays

537
00:57:44,330 --> 00:57:49,600
that you can clearly eliminate one of them.

538
00:57:49,600 --> 00:57:55,350
You can count the number of elements in b,
count the number of elements in c. And you

539
00:57:55,350 --> 00:58:03,810
can eliminate one of them in this recursion
as you're discovering this element with the

540
00:58:03,810 --> 00:58:09,320
correct rank-- in this case, i. So let me
write the rest of this out and make sure we're

541
00:58:09,320 --> 00:58:11,710
all on the same page.

542
00:58:11,710 --> 00:58:23,570
What I have here pictorially is I've generated
b here and c. So this is all of b and that's

543
00:58:23,570 --> 00:58:30,970
all of c. I have k minus 1 elements here in
b.

544
00:58:30,970 --> 00:58:44,170
And let's say I have n minus k elements in
c. And I'm going to do-- essentially take--

545
00:58:44,170 --> 00:58:49,080
once I've selected a particular element, I'm
going to look at all of the elements that

546
00:58:49,080 --> 00:58:52,510
are less than it and put it into the array
b. I'm going to look at all the elements that

547
00:58:52,510 --> 00:58:53,560
are greater than it.

548
00:58:53,560 --> 00:58:58,830
Let's assume all elements are unique. I'm
going to put all of them into c. And I'm going

549
00:58:58,830 --> 00:59:06,030
to recur on b and c. Those two are my sub
problems.

550
00:59:06,030 --> 00:59:17,120
But what I have to do is once I recur and
I discover the ranks of the sub problems,

551
00:59:17,120 --> 00:59:23,300
I have to put them together. So what I have
here is if k equals i-- so I computed the

552
00:59:23,300 --> 00:59:32,430
rank and I realized that if k equals-- equals
i, I should say-- if k equals i, then I'm

553
00:59:32,430 --> 00:59:35,360
going to just return x. I'm done at this point.

554
00:59:35,360 --> 00:59:42,610
I got lucky. I picked an element x and it
magically ended up having the correct rank,

555
00:59:42,610 --> 00:59:53,030
OK? Not always going to happen. And so in
other case, if k is greater than i, then going

556
00:59:53,030 --> 01:00:01,540
to return select b, i.

557
01:00:01,540 --> 01:00:07,670
So what I've done here is if k is greater
than i, then I'm going to say, oh, so now

558
01:00:07,670 --> 01:00:11,540
I'm going to have to find the element in b.
I know that it's going to be in b because

559
01:00:11,540 --> 01:00:15,740
k is greater than i. And I've got to find
the exact position depending on what i is

560
01:00:15,740 --> 01:00:23,530
over here. But it's going to be somewhere
between 1 and k minus 1.

561
01:00:23,530 --> 01:00:34,350
And then the last case is if k is less than
i, then this is a little more tricky. I'm

562
01:00:34,350 --> 01:00:47,080
going to recur on c with i minus k, OK? So what
happens here is that my k is-- the rank for

563
01:00:47,080 --> 01:00:50,520
the x that I looked at over here is less than
i.

564
01:00:50,520 --> 01:00:57,110
So I know that I'm going to find this element
that I'm looking for in c. But if I just look

565
01:00:57,110 --> 01:01:05,050
at c, I don't want to look at c and look for
an element of rank i within c, right? That

566
01:01:05,050 --> 01:01:09,880
doesn't make sense because I'm looking for
an element of rank i in the overall array

567
01:01:09,880 --> 01:01:11,300
that was given to me.

568
01:01:11,300 --> 01:01:18,730
So I have to subtract out the k elements that
correspond to x and all of the k minus 1 elements

569
01:01:18,730 --> 01:01:25,450
that are in b to go figure out exactly what
position or rank I'm looking for in the sub

570
01:01:25,450 --> 01:01:31,830
array corresponding to c, OK? So, people buy that? So that's just a small, little thing
that. So that's just a small, little thing

571
01:01:31,830 --> 01:01:34,660
that you have to keep in mind as you do this.
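The three cases can be sketched as a recursive Python function. This is a sketch under assumptions: the function name `select` follows the lecture, but the pivot choice here (the first element) is only a placeholder, since the lecture has not yet said how to pick x. It also assumes all elements are unique and that i is a valid 1-indexed rank.

```python
def select(S, i):
    """Return the element of rank i (1-indexed) in a list S of distinct numbers."""
    x = S[0]                       # placeholder pivot; the clever choice comes later
    B = [y for y in S if y < x]    # elements less than x
    C = [y for y in S if y > x]    # elements greater than x
    k = len(B) + 1                 # rank of x within S
    if k == i:
        return x                   # got lucky: x has the rank we want
    if k > i:
        return select(B, i)        # answer is in B, same rank
    return select(C, i - k)        # answer is in C; subtract x and the k - 1 elements of B
```

For example, `select([7, 2, 9, 4, 1], 3)` recurses into B = [2, 4, 1] with rank 3, then into C = [4] with rank 1, and returns 4, the median. Building B and C is the theta n work done at each level.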

572
01:01:34,660 --> 01:01:41,480
So that's pretty straightforward, looking
pretty good. And you say, well, am I done

573
01:01:41,480 --> 01:01:49,750
here? And as you can imagine, the answer is
no, because we haven't specified this value.

574
01:01:49,750 --> 01:01:59,070
Now, can someone tell me, at least from an
efficiency standpoint, what might happen,

575
01:01:59,070 --> 01:02:04,790
what we're looking for here? As you can imagine,
we want to improve on theta n log n. And so

576
01:02:04,790 --> 01:02:10,070
you could say, well, I'm happy with theta
n. That theta n complexity algorithm is better

577
01:02:10,070 --> 01:02:13,770
than a theta n log n complexity algorithm,
which is kind of in the bag.

578
01:02:13,770 --> 01:02:18,340
Because we know how to sort and we know how
to index. So we want a theta n algorithm.

579
01:02:18,340 --> 01:02:28,880
Now, if you take this and if I just picked,
let's say, the biggest element-- I kept picking

580
01:02:28,880 --> 01:02:36,560
x to be n or n minus 1 or just picked a constant
value. I picked x to be in the middle.

581
01:02:36,560 --> 01:02:42,130
I picked the index. I can always pick an element
based on its index. I can always go for the

582
01:02:42,130 --> 01:02:43,680
middle one.

583
01:02:43,680 --> 01:02:51,340
So what is the worst case complexity of this
algorithm? If I don't specify or I give you

584
01:02:51,340 --> 01:02:56,170
this arbitrary selection corresponding to
x belonging to s, what is the worst case complexity

585
01:02:56,170 --> 01:02:59,300
of this algorithm? Yeah, go ahead.

586
01:02:59,300 --> 01:03:00,160
STUDENT: N squared.

587
01:03:00,160 --> 01:03:01,400
PROFESSOR: N squared-- why is that?

588
01:03:01,400 --> 01:03:04,360
STUDENT: Because if you [INAUDIBLE] take like the least element.

589
01:03:04,360 --> 01:03:05,100
PROFESSOR: Yep.

590
01:03:05,100 --> 01:03:08,820
STUDENT: How do you compare like N o against the other analysis?

591
01:03:08,820 --> 01:03:12,560
PROFESSOR: Exactly right. That's exactly right.
So what happens is that you're doing a bunch

592
01:03:12,560 --> 01:03:15,430
of work here with this theta n work.

593
01:03:15,430 --> 01:03:21,650
Right here, this is theta n work, OK? So given
that you're doing theta n work here, you have

594
01:03:21,650 --> 01:03:27,930
to be really careful as to how you pick the
x element. So what might happen is that you

595
01:03:27,930 --> 01:03:30,160
end up picking the x over here.

596
01:03:30,160 --> 01:03:34,740
And given the particular rank you're looking
for, you have to now-- you're left with a

597
01:03:34,740 --> 01:03:40,160
large array that has n minus 1 elements in
the worst case. You started with n. You did

598
01:03:40,160 --> 01:03:45,460
not go to n over 2 and n over 2, which is
what divide and conquer is all about-- even

599
01:03:45,460 --> 01:03:47,180
n over b, OK?

600
01:03:47,180 --> 01:03:52,200
You went to n minus 1. And then you go to
n minus 2. And you go to n minus 3 because

601
01:03:52,200 --> 01:03:56,140
you're constantly picking-- this is worst
case analysis. You're constantly picking these

602
01:03:56,140 --> 01:03:59,990
sub arrays to be extremely unbalanced.

603
01:03:59,990 --> 01:04:05,440
So when the sub arrays are extremely unbalanced,
you end up doing theta n work in each

604
01:04:05,440 --> 01:04:10,980
level of the recursion. And those theta n's,
because you're going down all the way from

605
01:04:10,980 --> 01:04:18,600
n to one, are going to be theta n squared when
you keep doing that, OK? So thanks for that

606
01:04:18,600 --> 01:04:22,010
analysis.
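
Written out as a recurrence, the worst case just described looks like the following. This is a sketch of the standard arithmetic, under the assumption that every pick of x leaves a subarray of size n minus 1:

```latex
\begin{align*}
T(n) &= T(n-1) + \Theta(n) \\
     &= \Theta(n) + \Theta(n-1) + \cdots + \Theta(1) \\
     &= \Theta\!\left(\sum_{j=1}^{n} j\right)
      = \Theta\!\left(\frac{n(n+1)}{2}\right)
      = \Theta(n^2).
\end{align*}
```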

607
01:04:22,010 --> 01:04:32,170
And so this is theta n squared if you have
a bad selection. So we won't talk about

608
01:04:32,170 --> 01:04:38,520
randomized algorithms, but the problem with
randomized algorithms is that the analysis

609
01:04:38,520 --> 01:04:45,140
will be given a probability distribution.
And it'll be expected time.

610
01:04:45,140 --> 01:04:51,890
What we want here is a deterministic algorithm
that is guaranteed to run in worst case theta

611
01:04:51,890 --> 01:05:00,740
n. So we want a deterministic way of picking
x belonging to s such that all of this works

612
01:05:00,740 --> 01:05:05,320
out and when we get our recurrence and we
solve it, somehow magically we're getting

613
01:05:05,320 --> 01:05:12,860
fairly balanced partitions-- fairly balanced
sub problems in the sense that it's not n

614
01:05:12,860 --> 01:05:17,540
minus 1 and 1. It's something like-- it could
even be n over 10 and 9n over 10.

615
01:05:17,540 --> 01:05:22,320
But as long as you guarantee that, you're
shaking things down geometrically. And the

616
01:05:22,320 --> 01:05:28,320
asymptotics are going to work out. But the
determinism is what we need.

617
01:05:28,320 --> 01:05:42,190
And so we're going to pick x cleverly. And
we don't want the rank of x to be extreme.

618
01:05:42,190 --> 01:05:49,000
So this is not the only way you could do it,
but this is really very clever.

619
01:05:49,000 --> 01:05:57,880
There's a deterministic way. And you're going
to see some arbitrary constants here. And

620
01:05:57,880 --> 01:06:03,910
we'll talk about them once I've described
it. But what we're going to do is we're going

621
01:06:03,910 --> 01:06:08,270
to arrange s into columns of size 5, right?

622
01:06:08,270 --> 01:06:12,130
We're going to take this single array. And
we're going to make it a two dimensional array

623
01:06:12,130 --> 01:06:20,040
where the number of rows is five and the number
of columns that you have is n over 5-- the

624
01:06:20,040 --> 01:06:35,570
ceiling in this case. And then we're going
to sort each column, big elements on top.

625
01:06:35,570 --> 01:06:38,180
And we're going to do this in linear time.

626
01:06:38,180 --> 01:06:44,460
And you might say, how did that happen? Well,
there's only five elements. So it's linear.

627
01:06:44,460 --> 01:06:47,360
You could do whatever you wanted. You could
do n raised to four.

628
01:06:47,360 --> 01:06:55,230
But it's five raised to four and it's constants.
Don't you love theory? So then we're going

629
01:06:55,230 --> 01:06:59,630
to find what we're going to call the median
of medians.
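
A short Python sketch of the first two steps (columns of size 5, each sorted, then the middle element of each). The helper name column_medians is hypothetical. Each column is sorted ascending here rather than big elements on top, which does not change which element is in the middle. For a short last column the lower middle element is used, which is an assumption of this sketch:

```python
def column_medians(s):
    """Split s into columns of at most 5 elements and return each column's median."""
    # ceil(n / 5) columns; only the last one can have fewer than 5 elements
    columns = [sorted(s[j:j + 5]) for j in range(0, len(s), 5)]
    # sorting 5 elements is constant work, so all columns together cost Theta(n)
    return [col[(len(col) - 1) // 2] for col in columns]
```

The median of medians is then the median of the list this returns, found by calling the selection algorithm recursively on it.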

630
01:06:59,630 --> 01:07:04,380
So I'm going to explain this. This works for
arbitrary rank, but it's a little easier to

631
01:07:04,380 --> 01:07:09,790
focus in on the median to just explain the
particular example. Because as you can see,

632
01:07:09,790 --> 01:07:18,090
there's an intricacy here associated with
the break up.

633
01:07:18,090 --> 01:07:23,890
And so here we go. I'm going to draw out a
picture. And we're going to try and argue

634
01:07:23,890 --> 01:07:32,560
that this deterministic strategy that I'll
specify gives you fairly balanced partitions

635
01:07:32,560 --> 01:07:35,730
in all cases, OK?

636
01:07:35,730 --> 01:07:47,270
So what we see here is we see-- pictorially,
you see columns of length five. Each of these

637
01:07:47,270 --> 01:08:00,640
dots corresponds to a number. This one dimensional
array got turned into a two dimensional array.

638
01:08:00,640 --> 01:08:08,010
So I got four full columns. And it's certainly
possible, given n, that my fifth column is

639
01:08:08,010 --> 01:08:16,299
not full, right? So that's certainly possible.
So that's why I have that up here. So what

640
01:08:16,299 --> 01:08:19,420
I've done here is I'm going to lay them out this
way.

641
01:08:19,420 --> 01:08:31,889
And I'm going to look at that. I'm going to
look at the middle elements of each of these

642
01:08:31,889 --> 01:08:41,549
n over five columns. That's exactly what I'm
going to look at. Now, if I look at what I

643
01:08:41,549 --> 01:08:48,179
want, what I want over here is this x. If
I want to find--

644
01:08:48,179 --> 01:08:59,109
I'm going to find the median of medians. So
this is x. Now, it is true, first, that

645
01:08:59,109 --> 01:09:03,269
these columns-- I'm just putting that up here
imagining that that's x.

646
01:09:03,269 --> 01:09:12,568
That's not guaranteed to be x because the
columns themselves aren't-- well, these columns

647
01:09:12,568 --> 01:09:18,849
are sorted. And what I'm going to have to
guarantee, of course, is that when I go find

648
01:09:18,849 --> 01:09:25,749
this median of medians is that it ends up
being something that gives me balanced partitions.

649
01:09:25,749 --> 01:09:32,749
So maybe say a little bit more before I explain
what's going on.

650
01:09:32,749 --> 01:09:38,960
Each of these columns is sorted. And s is
arranged into columns of size 5 like I just

651
01:09:38,960 --> 01:09:51,259
said here. These are the medians, OK? If I
look at determining the medians and I say

652
01:09:51,259 --> 01:09:57,449
that once I've determined this x, which I've
discovered that it's the median, then this

653
01:09:57,449 --> 01:10:00,710
is right there in the middle. There's going
to be a bunch of columns to the left of it,

654
01:10:00,710 --> 01:10:04,239
a bunch of elements to the left of it, and
a bunch of elements to the right of it.

655
01:10:04,239 --> 01:10:08,909
And in this case, I have five columns. I could
have had more. It happens to be the third

656
01:10:08,909 --> 01:10:09,880
one.

657
01:10:09,880 --> 01:10:15,909
So the idea is that once I find this median
of medians, which corresponds to this x number,

658
01:10:15,909 --> 01:10:23,999
I can say that all of the columns-- these
all correspond to columns that have their

659
01:10:23,999 --> 01:10:29,070
median element greater than x. These correspond
to columns that have their median element

660
01:10:29,070 --> 01:10:39,159
less than x, OK? So what I have here in this
picture is that these elements here are going

661
01:10:39,159 --> 01:10:42,199
to be greater than x.

662
01:10:42,199 --> 01:10:56,440
And these elements here are going to be less
than x. So let me be clear. What's happened here

663
01:10:56,440 --> 01:11:07,659
is we've not only sorted all of the columns
such that you have large elements up here.

664
01:11:07,659 --> 01:11:12,360
Each of these five columns have been sorted
that way. On top of that, I've discovered

665
01:11:12,360 --> 01:11:19,929
the particular column that corresponds to
the median of medians. And this is my x over

666
01:11:19,929 --> 01:11:20,989
here.

667
01:11:20,989 --> 01:11:25,199
And it may be the case that these columns
aren't sorted. This one may be larger than

668
01:11:25,199 --> 01:11:29,030
that or vice versa-- same thing over there.
I have no idea.

669
01:11:29,030 --> 01:11:36,119
But it's guaranteed that once I find this
median that I do know all of the columns that

670
01:11:36,119 --> 01:11:44,550
have elements in this position that are less
than this x. And I know columns that in this

671
01:11:44,550 --> 01:11:48,450
position have elements that are greater than
x, OK? Yep.

672
01:11:48,450 --> 01:11:56,200
STUDENT: Shouldn't the two elements below x also be computed [INAUDIBLE] less than x.

673
01:11:56,200 --> 01:12:04,579
PROFESSOR: You're exactly right. I would have
probably been able to get the same asymptotic

674
01:12:04,579 --> 01:12:09,440
complexity if I dropped those because I had
a constant number. But you're absolutely exactly

675
01:12:09,440 --> 01:12:10,199
right.

676
01:12:10,199 --> 01:12:15,429
So the point that-- the question was-- I just
redrew it. These two are clearly less than

677
01:12:15,429 --> 01:12:21,610
x as well because they're part of the sorting.
And that's essentially what I have here.

678
01:12:21,610 --> 01:12:26,679
Now, my goal here-- and you can kind of see
from here as to where we're headed. What I've

679
01:12:26,679 --> 01:12:31,780
done here by this process of sorting each
column and finding the median of medians is

680
01:12:31,780 --> 01:12:37,760
that I found this median of medians such that
there's a bunch of columns on the left. And

681
01:12:37,760 --> 01:12:41,530
roughly half of those elements in those columns
are less than x.

682
01:12:41,530 --> 01:12:47,739
And there are a bunch of columns on the right.
And roughly half of those columns have elements

683
01:12:47,739 --> 01:12:54,030
that are greater than x. So what I now have
to do is to do a little bit of math to show

684
01:12:54,030 --> 01:12:58,079
you exactly what the recurrence is. And let
me do that over here.

685
01:12:58,079 --> 01:13:03,550
So that's the last thing that we have to do.
I probably won't solve the recurrence, but

686
01:13:03,550 --> 01:13:10,469
that can wait until tomorrow. The recurrence
will be something that's not particularly

687
01:13:10,469 --> 01:13:23,670
difficult to solve. So I want to now make
a more quantitative argument, with the variable

688
01:13:23,670 --> 01:13:33,300
being n, as to how many elements are guaranteed
to be greater than x.

689
01:13:33,300 --> 01:13:38,519
And essentially what I'm saying, which is
I'm writing out what I have on that picture

690
01:13:38,519 --> 01:14:02,030
there, half of the n over 5 groups contribute
at least three elements greater than x except

691
01:14:02,030 --> 01:14:10,559
for one group with possibly less than five
elements, which is the one that I have all

692
01:14:10,559 --> 01:14:27,550
the way to the right, and one group that contains x.
So for all the other columns, I'm going to

693
01:14:27,550 --> 01:14:35,619
get three elements that are greater than x.
And so if you write that out, this says there

694
01:14:35,619 --> 01:14:47,199
are at least three n over 10, because I have
half of all of those groups, minus 2.

695
01:14:47,199 --> 01:14:53,239
And I'm not counting perfectly accurately
here, but I have an at least. So this should

696
01:14:53,239 --> 01:15:02,219
all be fine. 3n over 10-- 3 times n over 10
minus 2 elements are strictly greater than

697
01:15:02,219 --> 01:15:06,179
x. And that comes from that picture.
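
The counting claim can be spot-checked numerically. The Python sketch below assumes distinct elements, reads the bound as 3 times (ceiling of n over 10, minus 2), and takes the lower median of the column medians as x. It uses the built-in sorted to find that median purely for checking, where the algorithm itself would recurse. All function names are hypothetical:

```python
import math
import random

def count_around_median_of_medians(s):
    """Return (number of elements < x, number of elements > x, x)."""
    columns = [sorted(s[j:j + 5]) for j in range(0, len(s), 5)]
    medians = sorted(col[(len(col) - 1) // 2] for col in columns)
    x = medians[(len(medians) - 1) // 2]      # lower median of the medians
    less = sum(1 for y in s if y < x)
    greater = sum(1 for y in s if y > x)
    return less, greater, x

def guaranteed(n):
    """The conservative lower bound 3 * (ceil(n / 10) - 2) from the counting argument."""
    return 3 * (math.ceil(n / 10) - 2)

def check(max_n=300, seed=0):
    """Try one random array of distinct integers for each size below max_n."""
    rng = random.Random(seed)
    for n in range(1, max_n):
        s = rng.sample(range(10 * n), n)
        less, greater, _ = count_around_median_of_medians(s)
        if less < guaranteed(n) or greater < guaranteed(n):
            return False
    return True
```

A random check like this is evidence, not a proof; the proof is the column-by-column count from the picture.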

698
01:15:06,179 --> 01:15:14,639
I'm going to be able to say the same thing
for less than x as well. I can't count the

699
01:15:14,639 --> 01:15:20,960
one. Depending on how things go, maybe I could
have played around and subtracted 1 instead

700
01:15:20,960 --> 01:15:23,440
of a 2 in the latter case.

701
01:15:23,440 --> 01:15:28,829
But I'm just being conservative here. It is
clear that I'm going to have a bunch of columns

702
01:15:28,829 --> 01:15:34,789
that are full columns, that are going to be
contributing three elements that are greater

703
01:15:34,789 --> 01:15:39,249
than x. And in this case, I have, well, two
of them here for the less than x.

704
01:15:39,249 --> 01:15:44,550
And I got one for the greater than x. So that's
all that I'm seeing over here with respect

705
01:15:44,550 --> 01:15:49,800
to the balance of the partitions. And it turns
out that's enough.

706
01:15:49,800 --> 01:15:57,949
It turns out all I have to do with this observation
is to go off and run the recurrence. And we're

707
01:15:57,949 --> 01:16:04,160
going to get an efficient algorithm. Yep.

708
01:16:04,160 --> 01:16:08,059
STUDENT: Should it not be like greater than or equal to, because there's... [INAUDIBLE]

709
01:16:08,059 --> 01:16:11,240
PROFESSOR: No, there's nothing that's equal.

710
01:16:11,240 --> 01:16:12,580
STUDENT: So you are saying, that's all you need.

711
01:16:12,580 --> 01:16:16,840
PROFESSOR: Yeah. Yeah, I assume that-- so,
convenience, yeah. There's always a little

712
01:16:16,849 --> 01:16:19,030
bit of convenience thrown in here.

713
01:16:19,030 --> 01:16:27,989
We will assume that the array has unique elements.
So there's nothing else that's equal to x, OK? Good.

714
01:16:27,989 --> 01:16:38,190
So the recurrence, once you do that, is t
of n equals-- we're going to just say it's

715
01:16:38,190 --> 01:16:48,510
order one for n less than or equal to 140.
Where did that come from? Well, like 140.

716
01:16:48,510 --> 01:16:53,630
It's just a large number. It came from the
fact that you're going to see 10 minus 3,

717
01:16:53,630 --> 01:16:57,119
which is 7. And then you want to multiply
that by 2.

718
01:16:57,119 --> 01:17:01,619
So some reasonably large number-- we're going
to go off and we're going to assume that's

719
01:17:01,619 --> 01:17:08,400
a constant. So you could sort those 140 numbers
and find the median or whatever rank. It's

720
01:17:08,400 --> 01:17:10,739
all constant time once you get down to the
base case.

721
01:17:10,739 --> 01:17:14,659
So you just want it to be large enough such
that you could break it up and you have something

722
01:17:14,659 --> 01:17:19,679
interesting going on with respect to the number
of columns. So don't worry much about that

723
01:17:19,679 --> 01:17:24,230
number. The key thing here is the recurrence,
all right?
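
Putting the pieces together, here is one way the whole procedure might look in Python. It is a sketch under stated assumptions: distinct elements, 1-indexed ranks, the lower median when the number of columns is even, and a base case that simply sorts. The name deterministic_select and the base parameter (defaulting to the 140 mentioned above) are illustrative:

```python
def deterministic_select(s, i, base=140):
    """Return the element of rank i (1-indexed) in a list s of distinct values."""
    if len(s) <= base:
        return sorted(s)[i - 1]               # constant-size base case
    # Columns of size 5, each sorted; take the middle element of each.
    columns = [sorted(s[j:j + 5]) for j in range(0, len(s), 5)]
    medians = [col[(len(col) - 1) // 2] for col in columns]
    # Recursive call on about n/5 medians picks x, the median of medians.
    x = deterministic_select(medians, (len(medians) + 1) // 2, base)
    b = [y for y in s if y < x]               # elements less than x
    c = [y for y in s if y > x]               # elements greater than x
    k = len(b) + 1                            # rank of x in s
    if k == i:
        return x
    if k > i:
        return deterministic_select(b, i, base)
    return deterministic_select(c, i - k, base)
```

Only one of b or c is ever recursed on, in addition to the call on the medians. Passing a small base such as 5 makes the recursive path run even on short inputs.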

724
01:17:24,230 --> 01:17:31,980
And this is what we'll spend the rest of
our time on. And I'll just write this out

725
01:17:31,980 --> 01:17:47,179
and explain where these numbers came from.
So that's our recurrence for n less than or

726
01:17:47,179 --> 01:17:48,650
equal to 140.

727
01:17:48,650 --> 01:17:54,590
And else, you're going to do this. So what
is going on here? What are all of these components

728
01:17:54,590 --> 01:17:58,110
corresponding to this recurrence?

729
01:17:58,110 --> 01:18:05,300
Really quickly, this is simply something that
says I'm finding the median of medians. I'm

730
01:18:05,300 --> 01:18:11,170
finding some element that has a certain rank.
So this median of medians is going to be running

731
01:18:11,170 --> 01:18:17,579
on n over 5 columns. So I've got this-- there
are n over 5 columns here.

732
01:18:17,579 --> 01:18:24,150
And I'm going to be calling this algorithm
recursively, the median finding algorithm,

733
01:18:24,150 --> 01:18:35,260
to do that-- finding the median of medians.
This thing over here is-- I'm going to be

734
01:18:35,260 --> 01:18:43,039
discarding at least regardless of what I do.
Because I have these two statements here,

735
01:18:43,039 --> 01:18:47,760
I take the overall n. And I'm going to discard.

736
01:18:47,760 --> 01:18:51,849
In my paradigm over here, I'm either going
to go with b or I'm going to go with

737
01:18:51,849 --> 01:18:57,889
c depending on what I'm looking for. And given
that b and c are not completely unbalanced,

738
01:18:57,889 --> 01:19:06,349
I'm going to be discarding 3n over 10 minus
6 elements, which simply corresponds to me

739
01:19:06,349 --> 01:19:12,150
ignoring the ceiling here and multiplying
the 3 out. So that's 3n over 10 minus 6.

740
01:19:12,150 --> 01:19:18,999
So then I have 7n over 10 plus 6. That's the
maximum size partition that I'm going to recur

741
01:19:18,999 --> 01:19:22,579
on. It's only going to be exactly one of them,
as you can see from that.

742
01:19:22,579 --> 01:19:26,570
It's either-or. It's not recurring on both
of them. It's recurring on one of them. So

743
01:19:26,570 --> 01:19:32,099
that's where the 7n over 10 plus 6 comes from.
And then you ask where does this theta n come

744
01:19:32,099 --> 01:19:32,749
from.

745
01:19:32,749 --> 01:19:38,780
Well, the theta n comes from the fact that
I do have to do some sorting. It's constant

746
01:19:38,780 --> 01:19:44,079
time sorting for every column, OK? Because
it's only five elements.

747
01:19:44,079 --> 01:19:49,099
So I'm going to do constant time sorting.
But there's order n columns. Because it's--

748
01:19:49,099 --> 01:19:50,909
then it's n over 5 columns.

749
01:19:50,909 --> 01:20:00,679
So this is the sorting of all of the columns,
all right? So that's it. And I'll just leave

750
01:20:00,679 --> 01:20:08,659
you with-- you cannot apply the master theorem
for solving this particular recurrence. But

751
01:20:08,659 --> 01:20:11,699
if you make the observation-- and you'll see
this in section.

752
01:20:11,699 --> 01:20:19,409
You make the observation that n over 5 plus
7n over 10 is actually less than n. So you

753
01:20:19,409 --> 01:20:23,610
get 0.2n here and 0.7n there. That's actually
less than n.

754
01:20:23,610 --> 01:20:28,070
This thing runs in linear time. And you'll
see that in section tomorrow. So this whole

755
01:20:28,070 --> 01:20:33,250
thing is theta n time. See you next time.
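
For reference, the recurrence and one standard way to solve it by substitution are sketched below. The constant a standing in for the theta n term and the guess T(n) at most cn are assumptions of this sketch, and the solution presented in section may be organized differently:

```latex
\begin{align*}
T(n) &\le T(\lceil n/5 \rceil) + T(7n/10 + 6) + an
      && \text{for } n > 140 \\
     &\le c\,\lceil n/5 \rceil + c\,(7n/10 + 6) + an
      && \text{(guess } T(m) \le cm \text{ for } m < n\text{)} \\
     &\le cn/5 + c + 7cn/10 + 6c + an \\
     &= cn - \bigl(cn/10 - 7c - an\bigr) \\
     &\le cn
      && \text{whenever } c\,(n/10 - 7) \ge an .
\end{align*}
% The last condition reads c >= 10 a n / (n - 70).
% For n >= 140, n / (n - 70) <= 2, so any c >= 20a is enough.
```

The step from 9cn/10 to cn is where n over 5 plus 7n over 10 being less than n matters, and the threshold 140 is 2 times 70, which is what keeps the constant c bounded.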