Today we're going to talk about sorting, which may not come as such a big surprise. We talked about sorting for a while, but we're going to talk about it at a somewhat higher level and question some of the assumptions that we've been making so far. And we're going to ask the question: how fast can we sort? A pretty natural question. You may think you know the answer. Perhaps you do. Any suggestions on what the answer to this question might be? There are several possible answers. Many of them are partially correct. Let's hear any kinds of answers you'd like and start waking up this fresh morning. Sorry? Theta n log n. That's a good answer. That's often correct. Any other suggestions? N squared. That's correct if all you're allowed to do is swap adjacent elements. Good. That was close. I will see if I can make every answer correct. Usually n squared is not the right answer, but in some models it is. Yeah? Theta n is also sometimes the right answer. The real answer is "it depends". That's the point of today's lecture. It depends on what we call the computational model: what you're allowed to do.
And, in particular here, with sorting, what we care about is the order of the elements: how are you allowed to manipulate the elements, what are you allowed to do with them to find out their order. The model is what you can do with the elements.

Now, we've seen several sorting algorithms. Do you want to shout some out? I think we've seen four, but maybe you know even more algorithms. Quicksort. Keep going. Heapsort. Merge sort. You can remember all the way back to Lecture 1. Any others? Insertion sort. All right. You're on top of it today. I don't know exactly why, but these two are single words and these two are two words. That's the style.

What is the running time of quicksort? This is a bit tricky. n lg n in the average case. Or, if we randomize quicksort, randomized quicksort runs in n lg n expected time for any input sequence. Let's say n lg n randomized. That's theta. And the worst case with plain old quicksort, where you just pick the first element as the partition element, that's n^2. Heapsort, what's the running time there? n lg n always. Merge sort, I hope you can remember that as well: n lg n. And insertion sort?
n^2. All of these algorithms run no faster than n lg n, so we might ask: can we do better than n lg n? And that is a question, in some sense, we will answer both yes and no to today.

But all of these algorithms have something in common in terms of the model of what you're allowed to do with the elements. Any guesses on what that model might be? Yeah? You compare pairs of elements, exactly. That is indeed the model used by all four of these algorithms. And in that model n lg n is the best you can do. We have so far just looked at what are called comparison sorting algorithms, or "comparison sorts". And this is a model for the sorting problem of what you're allowed to do. Here all you can do is use comparisons, meaning less than, greater than, less than or equal to, greater than or equal to, equals, to determine the relative order of elements.

This is a restriction on algorithms. It is, in some sense, stating what kinds of elements we're dealing with. They are elements that we can somehow compare. They have a total order: some are less, some are bigger. But it also restricts the algorithm.
You could say, well, I'm sorting integers, but still I'm only allowed to do comparisons with them. I'm not allowed to multiply the integers or do other weird things. That's the comparison sorting model.

And this lecture, in some sense, follows the standard mathematical progression where you have a theorem, then you have a proof, then you have a counterexample. It's always a good way to have a math lecture. We're going to prove the theorem that no comparison sorting algorithm runs in better than n lg n comparisons. State the theorem, prove that, and then we'll give a counterexample in the sense that if you go outside the comparison sorting model you can do better: you can get linear time in some cases, better than n lg n. So, that is what we're doing today.

But first we're going to stick to this comparison model and try to understand why we need n lg n comparisons if that's all we're allowed to do. And for that we're going to look at something called decision trees, which in some sense is another model of what you're allowed to do in an algorithm, but it's more general than the comparison model.
And let's try an example to get some intuition. Suppose we want to sort three elements. This is not very challenging, but we'll get to draw the decision tree that corresponds to sorting three elements. Here is one solution, I claim. This is, in a certain sense, an algorithm, but it's drawn as a tree instead of pseudocode.

What this tree means is that at each node you're making a comparison. This says compare a_1 versus a_2. If a_1 is smaller than a_2 you go this way, if it is bigger than a_2 you go this way, and then you proceed. When you get down to a leaf, that is the answer. Remember, the sorting problem is you're trying to find a permutation of the inputs that puts them in sorted order.

Let's try it with some sequence of numbers, say 9, 4 and 6. We want to sort 9, 4 and 6, so first we compare the first element with the second element. 9 is bigger than 4, so we go down this way. Then we compare the first element with the third element, that's 9 versus 6. 9 is bigger than 6, so we go this way. And then we compare the second element with the third element; 4 is less than 6, so we go this way. And the claim is that this is the correct permutation of the elements.
You take a_2, which is 4, then you take a_3, which is 6, and then you take a_1, which is 9, so indeed that works out. And if I wrote this down right, this is a sorting algorithm in the decision tree model.

In general, let me just say the rules of this game. In general, we have n elements we want to sort. And I only drew the n = 3 case because these trees get very big very quickly. Each internal node, so every non-leaf node, has a label of the form i : j, where i and j are between 1 and n. And this means that we compare a_i with a_j. And we have two subtrees from every such node. We have the left subtree, which tells you what the algorithm does, what subsequent comparisons it makes, if it comes out less than. And we have to be a little bit careful because it could also come out equal. What we will do is the left subtree corresponds to less than or equal to, and the right subtree corresponds to strictly greater than. That is a little bit more precise than what we were doing here. Here all the elements were distinct, so no problem. But, in general, we care about the equality case too, to be general. So, that was the internal nodes.
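The n = 3 tree from the board can also be written as nested comparisons. Here is a sketch in Python (the function name and exact tree shape are my own choices, but it follows the rules above: left branch on less-than-or-equal, right branch on strictly greater, and each leaf returns the permutation as 1-based indices):

```python
def sort3(a):
    """Decision tree for sorting three elements, as nested comparisons.
    Returns the permutation (pi(1), pi(2), pi(3)) such that
    a[pi(1)-1] <= a[pi(2)-1] <= a[pi(3)-1]."""
    a1, a2, a3 = a
    if a1 <= a2:              # node 1:2
        if a2 <= a3:          # node 2:3
            return (1, 2, 3)
        elif a1 <= a3:        # node 1:3
            return (1, 3, 2)
        else:
            return (3, 1, 2)
    else:
        if a1 <= a3:          # node 1:3
            return (2, 1, 3)
        elif a2 <= a3:        # node 2:3
            return (2, 3, 1)
        else:
            return (3, 2, 1)
```

Running it on the lecture's example, `sort3([9, 4, 6])` takes the right branch at 1:2, the right branch at 1:3, and the left branch at 2:3, landing at the leaf (2, 3, 1): take a_2, then a_3, then a_1.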
And then each leaf node gives you a permutation. So, in order to be the answer to that sorting problem, that permutation had better have the property that it orders the elements. This is from the first lecture, when we defined the sorting problem: some permutation pi on n things such that a_pi(1) <= a_pi(2) <= ... <= a_pi(n).

So, that is the definition of a decision tree. Any binary tree with these kinds of labels satisfies all these properties. That is, in some sense, a sorting algorithm. It's a sorting algorithm in the decision tree model. Now, as you might expect, this is really not too different from the comparison model. If I give you a comparison sorting algorithm (we have these four: quicksort, heapsort, merge sort and insertion sort), all of them can be translated into the decision tree model. It's sort of a graphical representation of what the algorithm does. It's not a terribly useful one for writing down an algorithm. Any guesses why? Why do we not draw these pictures as a definition of quicksort or a definition of merge sort? It depends on the size of the input, that's a good point.
This tree is specific to the value of n, so it is, in some sense, not as generic. Now, we could try to write down a construction, for an arbitrary value of n, of one of these decision trees, and that would give us sort of a real algorithm that works for any input size. But even then this is not a terribly convenient representation for writing down an algorithm.

Well, let's write down a transformation that converts a comparison sorting algorithm to a decision tree, and then maybe you will see why. This is not a useless model, obviously, or I wouldn't be telling you about it. It will be very powerful for proving that we cannot do better than n lg n, but for writing down an algorithm, if you were going to implement something, this tree is not so useful. Even if you had a decision tree computer, whatever that is. But let's prove this theorem that decision trees, in some sense, model comparison sorting algorithms, which we call just comparison sorts.

This is a transformation. And we're going to build one tree for each value of n. The decision trees depend on n. The algorithm, hopefully, well, it depends on n, but it works for all values of n.
And we're just going to think of the algorithm as splitting into two forks, the left subtree and the right subtree, whenever it makes a comparison. If we take a comparison sort like merge sort, it does lots of stuff. It does index arithmetic, it does recursion, whatever. But at some point it makes a comparison, and then we say, OK, there are two halves of the algorithm. There is what the algorithm would do if the comparison came out less than or equal to, and what the algorithm would do if the comparison came out greater than. So, you can build a tree in this way.

In some sense, what this tree is doing is listing all possible executions of this algorithm, considering what would happen for all possible values of those comparisons. We will call these all possible instruction traces. If you write down all the instructions that are executed by this algorithm, for all possible input arrays a_1 to a_n, and see all the comparisons, how they could come out and what the algorithm does, in the end you will get a tree. Now, how big will that tree be, roughly? As a function of n. Yeah? Right.
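One way to see the instruction-trace idea concretely is to wrap the elements so that every comparison gets recorded, then run an ordinary comparison sort on them. This is only an illustrative sketch, not something from the lecture: the wrapper class, the particular sort (insertion sort), and the names are my own. Each input steers the sort down one root-to-leaf path of the decision tree, and the recorded trace is exactly that path:

```python
def insertion_sort(a):
    # Standard insertion sort, phrased so that every element-order
    # test goes through the <= operator.
    for i in range(1, len(a)):
        j = i
        while j > 0 and not (a[j - 1] <= a[j]):
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1

class Traced:
    """Wraps a value and records every <= comparison into `trace`."""
    def __init__(self, idx, val, trace):
        self.idx, self.val, self.trace = idx, val, trace
    def __le__(self, other):
        outcome = self.val <= other.val
        self.trace.append((self.idx, other.idx, outcome))
        return outcome

def comparison_trace(a):
    """Run insertion sort on a copy of `a` and return the root-to-leaf
    path it takes in the decision tree: a list of (i, j, a_i <= a_j),
    with i and j the 1-based positions of the elements in the input."""
    trace = []
    elems = [Traced(i + 1, v, trace) for i, v in enumerate(a)]
    insertion_sort(elems)
    return trace
```

For example, an already-sorted 3-element input takes a short path of two comparisons, while a reverse-sorted input takes a path of three; collecting the traces over all possible inputs would enumerate the whole tree.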
If it's got to be able to sort every possible list of length n, at the leaves I have to have all the permutations of those elements. That is a lot. There are a lot of permutations on n elements: there are n factorial of them. n factorial is exponential; it's really big. So, this tree is huge. It's going to be exponential in the input size n. That is why we don't write algorithms down normally as a decision tree, even though in some cases maybe we could. It's not a very compact representation. These algorithms, you write them down in pseudocode, they have constant length. It's a very succinct representation of the algorithm. Here the length depends on n, and it depends exponentially on n, which is not useful if you wanted to implement the algorithm, because writing down the algorithm would take a long time. But, nonetheless, we can use this as a tool to analyze these comparison sorting algorithms. We have all of these. Any algorithm can be transformed in this way into a decision tree. And now we have this observation that the number of leaves in this decision tree has to be really big. Let me talk about leaves in a second.
Before we get to leaves, let's talk about the depth of the tree. This decision tree represents all possible executions of the algorithm. If I look at a particular execution, which corresponds to some root-to-leaf path in the tree, the running time, or the number of comparisons made by that execution, is just the length of the path. And, therefore, the worst-case running time, over all possible inputs of length n, is going to be... n - 1? Could be. Depends on the decision tree. But, as a function of the decision tree? The longest path, right, which is called the height of the tree.

So, this is what we want to measure. We want to claim that the height of the tree has to be at least n lg n, with an omega in front. That is what we'll prove. And the only thing we're going to use is that the number of leaves in that tree has to be big: it has to be n factorial.

This is a lower bound on decision tree sorting. And the lower bound says that if you have any decision tree that sorts n elements, then its height has to be at least n lg n, up to constant factors. So, that is the theorem. Now we're going to prove the theorem.
And we're going to use that the number of leaves in that tree must be at least n factorial, because there are n factorial permutations of the inputs. All of them could happen. And so, for this algorithm to be correct, it has to detect every one of those permutations in some way. Now, it may do it very quickly. It had better only need n lg n comparisons, because we know that's possible. The depth of the tree may not be too big, but it has to have a huge number of leaves down there. It has to branch enough to get n factorial leaves, because it has to give the right answer on all possible inputs. This is, in some sense, counting the number of possible inputs that we have to distinguish. This is the number of leaves.

What we care about is the height of the tree. Let's call the height of the tree h. Now, if I have a tree of height h, how many leaves could it have? What's the maximum number of leaves it could have? 2^h, exactly. Because this is a binary tree (comparison trees always have a branching factor of 2), the number of leaves has to be at most 2^h if I have a height-h tree. Now, this gives me a relation.
The number of leaves has to be greater than or equal to n factorial, and the number of leaves has to be less than or equal to 2^h. Therefore, n factorial is less than or equal to 2^h, if I got that right. Now, again, we care about h in terms of n factorial, so we solve this by taking logs. And I am also going to flip sides. Now h is at least log base 2, because there is a 2 over here, of n factorial: h >= lg(n!).

There is a property that I'm using here in order to derive this inequality from that inequality. This is a technical aside, but it's important that you realize there is a technical issue here. The general principle I'm applying is: I have some inequality, I do the same thing to both sides, and hopefully that inequality should still be true. But, in order for that to be the case, I need a property about the operation that I'm performing. It has to be a monotonic transformation. Here what I'm using is that log is a monotonically increasing function. That is important. If I multiplied both sides by -1, which is a decreasing function, the inequality would have to get flipped.
For the fact that the inequality is not flipping here, I need to know that log is monotonically increasing. And it is, so we're fine; we just need to be careful here. Now we need some approximation of n factorial in order to figure out what its log is. Does anyone know a good approximation for n factorial? Not necessarily the equation, but the name. Stirling's formula. Good. You all remember Stirling. And I just need the highest-order term, which I believe is that: n factorial is at least (n/e)^n. So, that's all we need here.

Now I can use properties of logs to bring the n outside. This is n lg (n/e). And then lg (n/e) I can simplify: that is just lg n - lg e. So, this is n(lg n - lg e). lg e is a constant, so it's really tiny compared to this lg n, which is growing with n. This is Omega(n lg n). All we care about is the leading term. It is actually Theta(n lg n), but because we have a greater-than-or-equal-to, all we care about is the omega. A theta here wouldn't give us anything stronger. Of course, not all algorithms have n lg n running time or make n lg n comparisons.
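The chain of inequalities is easy to sanity-check numerically. Here is a small sketch (the function name is mine) comparing the exact lower bound lg(n!) with the Stirling-style estimate n lg(n/e) used above:

```python
import math

def decision_tree_height_lower_bound(n):
    # Any decision tree that sorts n elements has at least n! leaves,
    # and a binary tree of height h has at most 2^h leaves,
    # so h >= lg(n!).
    return math.log2(math.factorial(n))

for n in (10, 100, 1000):
    exact = decision_tree_height_lower_bound(n)       # lg(n!)
    stirling = n * math.log2(n / math.e)              # n(lg n - lg e)
    assert stirling <= exact                          # (n/e)^n <= n!
```

For n = 3 this gives lg(6), about 2.58, so any correct tree on three elements needs height at least 3, matching the tree drawn earlier.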
Some of them do, some of them are worse, but this proves that all of them require a height of at least n lg n. There you see the proof, once you observe the fact about the number of leaves, and if you remember Stirling's formula. So, you should know this proof. You can show that all sorts of problems require n lg n time with this kind of technique, provided you're in some kind of a decision tree model. That's important. We really need that our algorithm can be phrased as a decision tree. And, in particular, we know from this transformation that all comparison sorts can be represented as decision trees. But there are some sorting algorithms which cannot be represented as a decision tree. And we will turn to that momentarily.

But before we get there: I phrased this theorem as a lower bound on decision tree sorting, but, of course, we also get a lower bound on comparison sorting. And, in particular, it tells us that merge sort and heapsort are asymptotically optimal. Their dependence on n, in terms of asymptotic notation, so ignoring constant factors, is optimal: these algorithms are optimal in terms of growth with n, but this is only in the comparison model.
So, among comparison sorting algorithms, which these are, they are asymptotically optimal. They use the minimum number of comparisons, up to constant factors. In fact, their whole running time is dominated by the number of comparisons. It's all Theta(n lg n). So, this is good news.

And I should probably mention a little bit about what happens with randomized algorithms. What I've described here really only applies, in some sense, to deterministic algorithms. Does anyone see what would change with randomized algorithms, or where I've assumed that I had a deterministic comparison sort? This is a bit subtle. And I only noticed it reading the notes this morning: oh, wait. I will give you a hint. It's over here, the right-hand side of the world.

If I have a deterministic algorithm, what the algorithm does is completely determined at each step. As long as I know all the comparisons that it made up to some point, it is determined what that algorithm will do. But, if I have a randomized algorithm, it also depends on the outcomes of some coin flips. Any suggestions of what breaks over here? There is more than one tree, exactly.
So, we had this assumption that we only have one tree for each n. In fact, what we get is a probability distribution over trees. For each value of n, if you take all the possible executions of that algorithm, all the instruction traces, well, now, in addition to branching on comparisons, we also branch on whether a coin flip came out heads or tails, or, however we're generating random numbers, it came out with some value between 1 and n. So, we get a probability distribution over trees.

This lower bound still applies, though. Because, no matter what tree we get, I don't really care: I get at least one tree for each n. And this proof applies to every tree. So, no matter what tree you get, if it is a correct tree, it has to have height Omega(n lg n). This lower bound applies even for randomized algorithms. You cannot get better than n lg n, because no matter what tree it comes up with, no matter how those coin flips come out, this argument still applies. Every tree that comes out has to be correct, so this is really "at least one tree". And that will now work.
We also get the fact that 408 00:30:47,000 --> 00:30:52,000 randomized quicksort is asymptotically optimal in 409 00:30:52,000 --> 00:30:54,000 expectation. 410 00:31:05,000 --> 00:31:09,000 But, in order to say that randomized quicksort is 411 00:31:09,000 --> 00:31:13,000 asymptotically optimal, we need to know that all 412 00:31:13,000 --> 00:31:19,000 randomized algorithms require Omega(n lg n) comparisons. 413 00:31:19,000 --> 00:31:22,000 Now we know that so all is well. 414 00:31:22,000 --> 00:31:27,000 That is the comparison model. Any questions before we go on? 415 00:31:27,000 --> 00:31:31,000 Good. The next topic is to burst 416 00:31:31,000 --> 00:31:37,000 outside of the comparison model and try to sort in linear time. 417 00:31:43,000 --> 00:31:45,000 It is pretty clear that, as long as you don't have some 418 00:31:45,000 --> 00:31:48,000 kind of a parallel algorithm or something really fancy, 419 00:31:48,000 --> 00:31:51,000 you cannot sort any better than linear time because you've at 420 00:31:51,000 --> 00:31:54,000 least got to look at the data. No matter what you're doing 421 00:31:54,000 --> 00:31:56,000 with the data, you've got to look at it, 422 00:31:56,000 --> 00:31:59,000 otherwise you're not sorting it correctly. 423 00:31:59,000 --> 00:32:01,000 So, linear time is the best we could hope for. 424 00:32:01,000 --> 00:32:05,000 N lg n is pretty close. How could we sort in linear 425 00:32:05,000 --> 00:32:07,000 time? Well, we're going to need some 426 00:32:07,000 --> 00:32:10,000 more powerful assumption. And this is the counter 427 00:32:10,000 --> 00:32:12,000 example. We're going to have to move 428 00:32:12,000 --> 00:32:16,000 outside the comparison model and do something else with our 429 00:32:16,000 --> 00:32:18,000 elements. And what we're going to do is 430 00:32:18,000 --> 00:32:21,000 assume that they're integers in a particular range, 431 00:32:21,000 --> 00:32:24,000 and we will use that to sort in linear time. 
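As a quick numeric sanity check on that Omega(n lg n) barrier (my illustration, not part of the lecture): any correct comparison decision tree for n elements must have height at least lg(n!), and lg(n!) tracks n lg n closely.

```python
import math

# lg(n!) is the minimum height of any correct comparison decision tree;
# compare it against n lg n for a few sizes.
for n in (10, 100, 1000):
    height = math.log2(math.factorial(n))
    print(n, round(height), round(n * math.log2(n)))
```

The gap between the two columns is only the constant factor hidden in the Theta.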
432 00:32:24,000 --> 00:32:27,000 We're going to see two algorithms for sorting faster 433 00:32:27,000 --> 00:32:32,000 than n lg n. The first one is pretty simple, 434 00:32:32,000 --> 00:32:35,000 and we will use it in the second algorithm. 435 00:32:35,000 --> 00:32:40,000 It's called counting sort. The input to counting sort is 436 00:32:40,000 --> 00:32:44,000 an array, as usual, but we're going to assume 437 00:32:44,000 --> 00:32:49,000 something about what those array elements look like. Each A[i] is an integer from 438 00:32:49,000 --> 00:32:52,000 the range of 1 to k. This is a pretty strong 439 00:32:52,000 --> 00:32:55,000 assumption. And the running time is 440 00:32:55,000 --> 00:33:01,000 actually going to depend on k. If k is small it is going to be 441 00:33:01,000 --> 00:33:06,000 a good algorithm. If k is big it's going to be a 442 00:33:06,000 --> 00:33:10,000 really bad algorithm, worse than n lg n. 443 00:33:10,000 --> 00:33:15,000 Our goal is to output some sorted version of this array. 444 00:33:15,000 --> 00:33:20,000 Let's call this the sorting of A. It's going to be easier to 445 00:33:20,000 --> 00:33:25,000 write down the output directly instead of writing down 446 00:33:25,000 --> 00:33:32,000 a permutation for this algorithm. And then we have some auxiliary 447 00:33:32,000 --> 00:33:36,000 storage. I'm about to write down the 448 00:33:36,000 --> 00:33:41,000 pseudocode, which is why I'm declaring all my variables here. 449 00:33:41,000 --> 00:33:45,000 And the auxiliary storage will have length k, 450 00:33:45,000 --> 00:33:48,000 which is the range on my input values. 451 00:33:48,000 --> 00:33:52,000 Let's see the algorithm. 452 00:34:07,000 --> 00:34:09,000 This is counting sort. 453 00:34:17,000 --> 00:34:20,000 And it takes a little while to write down but it's pretty 454 00:34:20,000 --> 00:34:22,000 straightforward. 455 00:34:28,000 --> 00:34:32,000 First we do some initialization. 456 00:34:32,000 --> 00:34:36,000 Then we do some counting.
457 00:35:04,000 --> 00:35:06,000 Then we do some summing. 458 00:35:50,000 --> 00:35:54,000 And then we actually write the output. 459 00:36:28,000 --> 00:36:30,000 Is that algorithm perfectly clear to everyone? 460 00:36:30,000 --> 00:36:33,000 No one. Good. This should illustrate how obscure pseudocode can be. 461 00:36:33,000 --> 00:36:36,000 And when you're solving your problem sets, 462 00:36:36,000 --> 00:36:39,000 you should keep in mind that it's really hard to understand 463 00:36:39,000 --> 00:36:41,000 an algorithm just given pseudocode like this. 464 00:36:41,000 --> 00:36:45,000 You need some kind of English description of what's going on 465 00:36:45,000 --> 00:36:48,000 because, while you could work through and figure out what this 466 00:36:48,000 --> 00:36:51,000 means, it could take half an hour to an hour. 467 00:36:51,000 --> 00:36:53,000 And that's not a good way of expressing yourself. 468 00:36:53,000 --> 00:36:57,000 And so what I will give you now is the English description, 469 00:36:57,000 --> 00:37:01,000 but we will refer back to this to understand. 470 00:37:01,000 --> 00:37:05,000 This is sort of our bible of what the algorithm is supposed 471 00:37:05,000 --> 00:37:07,000 to do. Let me go over it briefly. 472 00:37:07,000 --> 00:37:11,000 The first step is just some initialization. 473 00:37:11,000 --> 00:37:15,000 The C[i]'s are going to count some things, count occurrences 474 00:37:15,000 --> 00:37:18,000 of values. And so first we set them to 475 00:37:18,000 --> 00:37:20,000 zero. Then, for every value we see 476 00:37:20,000 --> 00:37:25,000 A[j], we're going to increment the counter for that value A[j]. 477 00:37:25,000 --> 00:37:30,000 Then the C[i]'s will give me the number of elements equal to a 478 00:37:30,000 --> 00:37:35,000 particular value i.
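Since the blackboard pseudocode is invisible in the transcript, here is my reconstruction of the four steps just narrated (initialize, count, sum, write the output), as a Python sketch assuming the keys are integers in the range 1 to k:

```python
def counting_sort(A, k):
    """Sort a list A of integers drawn from 1..k in Theta(n + k) time."""
    C = [0] * (k + 1)           # step 1: initialization (index 0 unused)
    for x in A:                 # step 2: count occurrences of each value
        C[x] += 1
    for i in range(2, k + 1):   # step 3: prefix sums, so C[i] = #elements <= i
        C[i] += C[i - 1]
    B = [None] * len(A)         # step 4: distribute into the output array,
    for x in reversed(A):       # scanning right to left for stability
        C[x] -= 1               # C[x] becomes a 0-based slot for this copy of x
        B[C[x]] = x
    return B
```

On the example coming up, counting_sort([4, 1, 3, 4, 3], 4) returns [1, 3, 3, 4, 4].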
Then I'm going to take prefix 479 00:37:35,000 --> 00:37:39,000 sums, which will make it so that C[i] gives me the number of 480 00:37:39,000 --> 00:37:42,000 keys, the number of elements less than or equal to i 481 00:37:42,000 --> 00:37:45,000 instead of equals. And then, finally, 482 00:37:45,000 --> 00:37:49,000 it turns out that's enough to put all the elements in the 483 00:37:49,000 --> 00:37:52,000 right place. This I will call distribution. 484 00:37:52,000 --> 00:37:56,000 This is the distribution step. And it's probably the least 485 00:37:56,000 --> 00:38:01,000 obvious of all the steps. And let's do an example to make 486 00:38:01,000 --> 00:38:04,000 it more obvious what's going on. 487 00:38:12,000 --> 00:38:30,000 Let's take an array A = [4, 1, 3, 4, 3]. 488 00:38:30,000 --> 00:38:36,000 And then I want some array C. And let me add some indices 489 00:38:36,000 --> 00:38:43,000 here so we can see what the algorithm is really doing. 490 00:38:43,000 --> 00:38:50,000 Here it turns out that all of my numbers are in the range 1 to 491 00:38:50,000 --> 00:38:54,000 4, so k = 4. My array C has four values. 492 00:38:54,000 --> 00:39:00,000 Initially, I set them all to zero. 493 00:39:00,000 --> 00:39:03,000 That's easy. And now I want to count through 494 00:39:03,000 --> 00:39:07,000 everything. And let me not cheat here. 495 00:39:07,000 --> 00:39:10,000 I'm in the second step, so to speak. 496 00:39:10,000 --> 00:39:13,000 And I look for each element in order. 497 00:39:13,000 --> 00:39:17,000 I look at the C[i] value. The first element is 4, 498 00:39:17,000 --> 00:39:20,000 so I look at C[4]. That is 0. 499 00:39:20,000 --> 00:39:24,000 I increment it to 1. Then I look at element 1. 500 00:39:24,000 --> 00:39:28,000 That's 0. I increment it to 1. 501 00:39:28,000 --> 00:39:30,000 Then I look at 3 and that's here. 502 00:39:30,000 --> 00:39:33,000 It is also 0. I increment it to 1. 503 00:39:33,000 --> 00:39:37,000 Not so exciting so far.
Now I see 4, 504 00:39:37,000 --> 00:39:40,000 which I've seen before, how exciting. 505 00:39:40,000 --> 00:39:44,000 I had value 1 in here, I increment it to 2. 506 00:39:44,000 --> 00:39:48,000 Then I see value 3, which also had a value of 1. 507 00:39:48,000 --> 00:39:51,000 I increment that to 2. The result is [1, 508 00:39:51,000 --> 00:39:55,000 0, 2, 2]. That's what my array C looks 509 00:39:55,000 --> 00:40:00,000 like at this point in the algorithm. 510 00:40:00,000 --> 00:40:04,000 Now I do a relatively simple transformation of taking prefix 511 00:40:04,000 --> 00:40:05,000 sums. I want to know, 512 00:40:05,000 --> 00:40:09,000 instead of these individual values, the sum of this prefix, 513 00:40:09,000 --> 00:40:13,000 the sum of this prefix, the sum of this prefix and the 514 00:40:13,000 --> 00:40:17,000 sum of this prefix. I will call that C prime just 515 00:40:17,000 --> 00:40:21,000 so we don't get too lost in all these different versions of C. 516 00:40:21,000 --> 00:40:23,000 This is just 1. And 1 plus 0 is 1. 517 00:40:23,000 --> 00:40:25,000 1 plus 2 is 3. 3 plus 2 is 5. 518 00:40:25,000 --> 00:40:30,000 So, these are sort of the running totals. 519 00:40:30,000 --> 00:40:33,000 There are five elements total, there are three elements less 520 00:40:33,000 --> 00:40:37,000 than or equal to 3, there is one element less than 521 00:40:37,000 --> 00:40:38,000 or equal to 2, and so on. 522 00:40:38,000 --> 00:40:40,000 Now, the fun part, the distribution. 523 00:40:40,000 --> 00:40:43,000 And this is where we get our array B. 524 00:40:43,000 --> 00:40:46,000 B better have the same size, every element better appear 525 00:40:46,000 --> 00:40:50,000 here somewhere and they should come out in sorted order. 526 00:40:50,000 --> 00:40:54,000 Let's just run the algorithm. j is going to start at the end 527 00:40:54,000 --> 00:40:58,000 of the array and work its way down to 1, the beginning of the 528 00:40:58,000 --> 00:41:02,000 array. 
And what we do is we pick up 529 00:41:02,000 --> 00:41:05,000 the last element of A, A[n]. 530 00:41:05,000 --> 00:41:11,000 We look at the counter. We look at the C vector for 531 00:41:11,000 --> 00:41:14,000 that value. Here the value is 3, 532 00:41:14,000 --> 00:41:19,000 and this is the third column, so that has number 3. 533 00:41:19,000 --> 00:41:24,000 And the claim is that's where it belongs in B. 534 00:41:24,000 --> 00:41:29,000 You take this number 3, you put it in index 3 of the 535 00:41:29,000 --> 00:41:34,000 array B. And then you decrement the 536 00:41:34,000 --> 00:41:37,000 counter. I'm going to replace 3 here 537 00:41:37,000 --> 00:41:40,000 with 2. And the idea is these numbers 538 00:41:40,000 --> 00:41:44,000 tell you where those values should go. 539 00:41:44,000 --> 00:41:48,000 Anything of value 1 should go at position 1. 540 00:41:48,000 --> 00:41:53,000 Anything with value 3 should go at position 3 or less. 541 00:41:53,000 --> 00:41:59,000 This is going to be the last place that a 3 should go. 542 00:41:59,000 --> 00:42:02,000 And then anything with value 4 should go at position 5 or less, 543 00:42:02,000 --> 00:42:06,000 definitely should go at the end of the array because 4 is the 544 00:42:06,000 --> 00:42:09,000 largest value. And this counter will work out 545 00:42:09,000 --> 00:42:13,000 perfectly because these counts have left enough space in each 546 00:42:13,000 --> 00:42:15,000 section of the array. Effectively, 547 00:42:15,000 --> 00:42:18,000 this part is reserved for ones, there are no twos, 548 00:42:18,000 --> 00:42:21,000 this part is reserved for threes, and this part is 549 00:42:21,000 --> 00:42:24,000 reserved for fours. You can check if that's really 550 00:42:24,000 --> 00:42:27,000 what this array means. Let's finish running the 551 00:42:27,000 --> 00:42:31,000 algorithm. That was the last element. 552 00:42:31,000 --> 00:42:34,000 I won't cross it off, but we've sort of done that. 
553 00:42:34,000 --> 00:42:36,000 Now I look at the next to last element. 554 00:42:36,000 --> 00:42:38,000 That's a 4. Fours go in position 5. 555 00:42:38,000 --> 00:42:42,000 So, I put my 4 here in position 5 and I decrement that counter. 556 00:42:42,000 --> 00:42:45,000 Next I look at another 3. Threes now go in position 2, 557 00:42:45,000 --> 00:42:48,000 so that goes there. And then I decrement that 558 00:42:48,000 --> 00:42:50,000 counter. I won't actually use that 559 00:42:50,000 --> 00:42:53,000 counter anymore, but let's decrement it because 560 00:42:53,000 --> 00:42:57,000 that's what the algorithm says. I look at the previous element. 561 00:42:57,000 --> 00:43:00,000 That's a 1. Ones go in position 1, 562 00:43:00,000 --> 00:43:04,000 so I put it here and decrement that counter. 563 00:43:04,000 --> 00:43:09,000 And finally I have another 4. And fours go in position 4 now, 564 00:43:09,000 --> 00:43:13,000 position 4 is here, and I decrement that counter. 565 00:43:13,000 --> 00:43:18,000 So, that's counting sort. And you'll notice that all the 566 00:43:18,000 --> 00:43:23,000 elements appear and they appear in order, so that's the 567 00:43:23,000 --> 00:43:26,000 algorithm. Now, what's the running time of 568 00:43:26,000 --> 00:43:31,000 counting sort? kn is an upper bound. 569 00:43:31,000 --> 00:43:35,000 It's a little bit better than that. 570 00:43:35,000 --> 00:43:43,000 Actually, quite a bit better. This requires some summing. 571 00:43:43,000 --> 00:43:49,000 Let's go back to the top of the algorithm. 572 00:43:49,000 --> 00:43:53,000 How much time does this step take? 573 00:43:53,000 --> 00:43:57,000 k. How much time does this step 574 00:43:57,000 --> 00:44:00,000 take? n. 575 00:44:00,000 --> 00:44:05,000 How much time does this step take? 576 00:44:05,000 --> 00:44:10,000 k. 
Each of these operations in the 577 00:44:10,000 --> 00:44:17,000 for loops is taking constant time, so the cost is just how many 578 00:44:17,000 --> 00:44:22,000 iterations of that for loop there are. 579 00:44:22,000 --> 00:44:29,000 And, finally, this step takes n. 580 00:44:29,000 --> 00:44:35,000 So, the total running time of counting sort is k + n. 581 00:44:35,000 --> 00:44:43,000 And this is a great algorithm if k is relatively small, 582 00:44:43,000 --> 00:44:49,000 like at most n. If k is big like n^2 or 2^n or 583 00:44:49,000 --> 00:44:54,000 whatever, this is not such a good algorithm, 584 00:44:54,000 --> 00:45:01,000 but if k = O(n) this is great. And we get our linear time 585 00:45:01,000 --> 00:45:04,000 sorting algorithm. Not only do we need the 586 00:45:04,000 --> 00:45:08,000 assumption that our numbers are integers, but we need that the 587 00:45:08,000 --> 00:45:12,000 range of the integers is pretty small for this algorithm to 588 00:45:12,000 --> 00:45:14,000 work. If all the numbers are between 589 00:45:14,000 --> 00:45:17,000 1 and order n then we get a linear time algorithm. 590 00:45:17,000 --> 00:45:20,000 But as soon as they're up to n lg n we're toast. 591 00:45:20,000 --> 00:45:24,000 We're back to n lg n sorting. It's not so great. 592 00:45:24,000 --> 00:45:27,000 So, you could write a combination algorithm that says, 593 00:45:27,000 --> 00:45:31,000 well, if k is bigger than n lg n, then I will just use merge 594 00:45:31,000 --> 00:45:35,000 sort. And if it's less than n lg n 595 00:45:35,000 --> 00:45:38,000 I'll use counting sort. And that would work, 596 00:45:38,000 --> 00:45:42,000 but we can do better than that. How's the time? 597 00:45:42,000 --> 00:45:46,000 It is worth noting that we've beaten our bound, 598 00:45:46,000 --> 00:45:51,000 but only assuming that we're outside the comparison model.
599 00:45:51,000 --> 00:45:55,000 We haven't really contradicted the original theorem, 600 00:45:55,000 --> 00:46:00,000 we're just changing the model. And it's always good to 601 00:46:00,000 --> 00:46:04,000 question what you're allowed to do in any problem scenario. 602 00:46:04,000 --> 00:46:07,000 In, say, some practical scenarios, this would be great 603 00:46:07,000 --> 00:46:10,000 if the numbers you're dealing with are, say, 604 00:46:10,000 --> 00:46:12,000 a byte long. Then k is only 2^8, 605 00:46:12,000 --> 00:46:15,000 which is 256. You need this auxiliary array 606 00:46:15,000 --> 00:46:17,000 of size 256, and this is really fast. 607 00:46:17,000 --> 00:46:21,000 256 + n, no matter how big n is it's linear in n. 608 00:46:21,000 --> 00:46:24,000 If you know your numbers are small, it's great. 609 00:46:24,000 --> 00:46:27,000 But if your numbers are bigger, say you still know 610 00:46:27,000 --> 00:46:30,000 they're integers but they fit in like 32 bit words, 611 00:46:30,000 --> 00:46:35,000 then life is not so easy. Because k is then 2^32, 612 00:46:35,000 --> 00:46:39,000 which is 4.2 billion or so, which is pretty big. 613 00:46:39,000 --> 00:46:43,000 And you would need this auxiliary array of 4.2 billion 614 00:46:43,000 --> 00:46:46,000 words, which is probably like 16 gigabytes. 615 00:46:46,000 --> 00:46:51,000 So, you just need to initialize that array before you can even 616 00:46:51,000 --> 00:46:54,000 get started. Unless n is much, 617 00:46:54,000 --> 00:46:58,000 much more than 4 billion and you have 16 gigabytes of storage 618 00:46:58,000 --> 00:47:02,000 just to throw away (and I don't even have any 619 00:47:02,000 --> 00:47:06,000 machines with 16 gigabytes of RAM), this is not such a great 620 00:47:06,000 --> 00:47:10,000 algorithm. Just to get a feel, 621 00:47:10,000 --> 00:47:13,000 it's good if the numbers are really small.
622 00:47:13,000 --> 00:47:18,000 What we're going to do next is come up with a fancier algorithm 623 00:47:18,000 --> 00:47:22,000 that uses this as a subroutine on small numbers and combines 624 00:47:22,000 --> 00:47:25,000 this algorithm to handle larger numbers. 625 00:47:25,000 --> 00:47:29,000 That algorithm is called radix sort. 626 00:47:29,000 --> 00:47:34,000 But we need one important property of counting sort before 627 00:47:34,000 --> 00:47:36,000 we can go there. 628 00:47:42,000 --> 00:47:45,000 And that important property is stability. 629 00:47:50,000 --> 00:47:58,000 A stable sorting algorithm preserves the order of equal 630 00:47:58,000 --> 00:48:05,000 elements, let's say the relative order. 631 00:48:19,000 --> 00:48:21,000 This is a bit subtle because usually we think of elements 632 00:48:21,000 --> 00:48:24,000 just as numbers. And, yeah, we had a couple 633 00:48:24,000 --> 00:48:25,000 threes and we had a couple fours. 634 00:48:25,000 --> 00:48:28,000 It turns out, if you look at the order of 635 00:48:28,000 --> 00:48:31,000 those threes and the order of those fours, we kept them in 636 00:48:31,000 --> 00:48:33,000 order. Because we took the last three 637 00:48:33,000 --> 00:48:36,000 and we put it here. Then we took the next to the 638 00:48:36,000 --> 00:48:39,000 last three and we put it to the left of that, as we're 639 00:48:39,000 --> 00:48:42,000 decrementing our counter and moving from the end of the array 640 00:48:42,000 --> 00:48:45,000 to the beginning of the array. No matter how we do that, 641 00:48:45,000 --> 00:48:49,000 the orders of those threes are preserved, the orders of the 642 00:48:49,000 --> 00:48:51,000 fours are preserved. This may seem like a relatively 643 00:48:51,000 --> 00:48:54,000 simple thing, but if you look at the other 644 00:48:54,000 --> 00:48:57,000 four sorting algorithms we've seen, not all of them are 645 00:48:57,000 --> 00:49:00,000 stable. So, this is an exercise.
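To make the definition concrete, here is a tiny illustration of mine (using Python's built-in sort, which happens to be stable, rather than any algorithm from the lecture): records with equal keys keep their original relative order.

```python
# (key, label) pairs; the labels let us see where equal keys end up.
records = [(3, 'a'), (1, 'b'), (3, 'c'), (1, 'd')]
result = sorted(records, key=lambda r: r[0])
# A stable sort keeps 'b' before 'd' and 'a' before 'c':
print(result)  # [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]
```

An unstable sort would be free to emit (1, 'd') before (1, 'b'), which is exactly what radix sort cannot tolerate.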
646 00:49:06,000 --> 00:49:11,000 The exercise is to figure out which of the other sorting algorithms 647 00:49:11,000 --> 00:49:15,000 we've seen are stable and which are not. 648 00:49:21,000 --> 00:49:25,000 I encourage you to work that out because this is the sort of 649 00:49:25,000 --> 00:49:29,000 thing that we ask on quizzes. But for now all we need is that 650 00:49:29,000 --> 00:49:33,000 counting sort is stable. And I won't prove this, 651 00:49:33,000 --> 00:49:37,000 but it should be pretty obvious from the algorithm. 652 00:49:37,000 --> 00:49:41,000 Now we get to talk about radix sort. 653 00:49:55,000 --> 00:50:01,000 Radix sort is going to work for a much larger range of numbers 654 00:50:01,000 --> 00:50:04,000 in linear time. It still has to make an 655 00:50:04,000 --> 00:50:09,000 assumption about how big those numbers are, but it will be a 656 00:50:09,000 --> 00:50:13,000 much more lax assumption. Now, to increase suspense even 657 00:50:13,000 --> 00:50:18,000 further, I am going to tell you some history about radix sort. 658 00:50:18,000 --> 00:50:22,000 This is one of the oldest sorting algorithms. 659 00:50:22,000 --> 00:50:26,000 It's probably the oldest implemented sorting algorithm. 660 00:50:26,000 --> 00:50:32,000 It was implemented around 1890. This is Herman Hollerith. 661 00:50:32,000 --> 00:50:35,000 Let's say around 1890. Has anyone heard of Hollerith 662 00:50:35,000 --> 00:50:37,000 before? A couple people. 663 00:50:37,000 --> 00:50:41,000 Not too many. He is sort of an important guy. 664 00:50:41,000 --> 00:50:43,000 He was a lecturer at MIT at some point. 665 00:50:43,000 --> 00:50:47,000 He developed an early version of punch cards. 666 00:50:47,000 --> 00:50:51,000 Punch card technology. This is before my time so I 667 00:50:51,000 --> 00:50:54,000 even have to look at my notes to remember. 668 00:50:54,000 --> 00:50:57,000 Oh, yeah, they're called punch cards. 669 00:50:57,000 --> 00:51:02,000 You may have seen them.
If not they're in the 670 00:51:02,000 --> 00:51:06,000 PowerPoint lecture notes. There's this big grid. 671 00:51:06,000 --> 00:51:11,000 These days, if you've used a modern punch card recently, 672 00:51:11,000 --> 00:51:16,000 they are 80 characters wide and, I don't know, 673 00:51:16,000 --> 00:51:21,000 I think it's something like 16, I don't remember exactly. 674 00:51:21,000 --> 00:51:25,000 And then you punch little holes here. 675 00:51:25,000 --> 00:51:30,000 You have this magic machine. It's like a typewriter. 676 00:51:30,000 --> 00:51:34,000 You press a letter and that corresponds to some character. 677 00:51:34,000 --> 00:51:38,000 Maybe it will punch out a hole here, punch out a hole here. 678 00:51:38,000 --> 00:51:42,000 You can see the website if you want to know exactly how this 679 00:51:42,000 --> 00:51:46,000 works for historical reasons. You don't see these too often 680 00:51:46,000 --> 00:51:49,000 anymore, but this is in particular the reason why most 681 00:51:49,000 --> 00:51:53,000 terminals are 80 characters wide because that was how things 682 00:51:53,000 --> 00:51:55,000 were. Hollerith actually didn't 683 00:51:55,000 --> 00:51:59,000 develop these punch cards exactly, although eventually he 684 00:51:59,000 --> 00:52:01,000 did. In the beginning, 685 00:52:01,000 --> 00:52:04,000 in 1890, the big deal was the US Census. 686 00:52:04,000 --> 00:52:07,000 If you watched the news, I guess like a year or two ago, 687 00:52:07,000 --> 00:52:10,000 the US Census was a big deal because it's really expensive to 688 00:52:10,000 --> 00:52:12,000 collect all this data from everyone. 689 00:52:12,000 --> 00:52:15,000 And the Constitution says you've got to collect data about 690 00:52:15,000 --> 00:52:18,000 everyone every ten years. And it was getting hard. 691 00:52:18,000 --> 00:52:20,000 In particular, in 1880, they did the census. 692 00:52:20,000 --> 00:52:24,000 And it took them almost ten years to complete the census. 
693 00:52:24,000 --> 00:52:27,000 The population kept going up, and ten years to do a ten-year 694 00:52:27,000 --> 00:52:30,000 census, that's going to start getting expensive when they 695 00:52:30,000 --> 00:52:34,000 overlap with each other. So, for 1890 they wanted to do 696 00:52:34,000 --> 00:52:37,000 something fancier. And Hollerith said, 697 00:52:37,000 --> 00:52:40,000 OK, I'm going to build a machine to take in the 698 00:52:40,000 --> 00:52:42,000 data. It was a modified punch card 699 00:52:42,000 --> 00:52:46,000 where you would mark out particular squares depending on 700 00:52:46,000 --> 00:52:50,000 your status, whether you were single or married or whatever. 701 00:52:50,000 --> 00:52:53,000 All the things they wanted to know on the census they would 702 00:52:53,000 --> 00:52:57,000 encode in binary onto this card. And then he built a machine 703 00:52:57,000 --> 00:53:02,000 that would sort these cards so you could do counting. 704 00:53:02,000 --> 00:53:05,000 And, in some sense, these are numbers. 705 00:53:05,000 --> 00:53:10,000 And the numbers aren't too big, but they're big enough that 706 00:53:10,000 --> 00:53:15,000 counting sort wouldn't work. I mean if there were a hundred 707 00:53:15,000 --> 00:53:18,000 binary digits here, 2^100 is pretty overwhelming, 708 00:53:18,000 --> 00:53:24,000 so we cannot use counting sort. The first idea was the wrong 709 00:53:24,000 --> 00:53:27,000 idea. I'm going to think of these as 710 00:53:27,000 --> 00:53:30,000 numbers. Let's say each of these columns 711 00:53:30,000 --> 00:53:34,000 is one number. And so there's sort of the most 712 00:53:34,000 --> 00:53:38,000 significant number out here and there is the least significant 713 00:53:38,000 --> 00:53:40,000 number out here. The first idea was you sort by 714 00:53:40,000 --> 00:53:43,000 the most significant digit first.
715 00:53:50,000 --> 00:53:53,000 That's not such a great algorithm, because if you sort 716 00:53:53,000 --> 00:53:58,000 by the most significant digit you get a bunch of buckets each 717 00:53:58,000 --> 00:54:01,000 with a pile of cards. And this was a physical device. 718 00:54:01,000 --> 00:54:04,000 It wasn't exactly an electronically controlled 719 00:54:04,000 --> 00:54:06,000 computer. It was a human that would push 720 00:54:06,000 --> 00:54:09,000 down some kind of reader. It would see which holes in the 721 00:54:09,000 --> 00:54:12,000 first column are punched. And then it would open a 722 00:54:12,000 --> 00:54:15,000 physical bin in which the person would sort of swipe it and it 723 00:54:15,000 --> 00:54:17,000 would just fall into the right bin. 724 00:54:17,000 --> 00:54:20,000 It was semi-automated. I mean the computer was the 725 00:54:20,000 --> 00:54:22,000 human plus the machine, but never mind. 726 00:54:22,000 --> 00:54:25,000 This was the procedure. You sorted it into bins. 727 00:54:25,000 --> 00:54:28,000 Then you had to go through and sort each bin by the second 728 00:54:28,000 --> 00:54:32,000 digit. And pretty soon the number of 729 00:54:32,000 --> 00:54:36,000 bins gets pretty big. And if you don't have too many 730 00:54:36,000 --> 00:54:40,000 digits this is OK, but it's not the right thing to 731 00:54:40,000 --> 00:54:41,000 do. The right idea, 732 00:54:41,000 --> 00:54:45,000 which is what Hollerith came up with after that, 733 00:54:45,000 --> 00:54:50,000 was to sort by the least significant digit first. 734 00:55:00,000 --> 00:55:03,000 And you should also do that using a stable sorting 735 00:55:03,000 --> 00:55:05,000 algorithm. Now, Hollerith probably didn't 736 00:55:05,000 --> 00:55:08,000 call it a stable sorting algorithm at the time, 737 00:55:08,000 --> 00:55:11,000 but we will. And this won Hollerith lots of 738 00:55:11,000 --> 00:55:14,000 money and good things.
He founded the Tabulating 739 00:55:14,000 --> 00:55:17,000 Machine Company in 1896, and that merged with several 740 00:55:17,000 --> 00:55:21,000 other companies in 1911 to form something you may have heard of 741 00:55:21,000 --> 00:55:24,000 called IBM in 1924. That may be the context in 742 00:55:24,000 --> 00:55:28,000 which you've heard of Hollerith, or if you've done punch cards 743 00:55:28,000 --> 00:55:32,000 before. The whole idea is that we're 744 00:55:32,000 --> 00:55:37,000 doing a digit by digit sort. I should have mentioned that at 745 00:55:37,000 --> 00:55:40,000 the beginning. And we're going to do it from 746 00:55:40,000 --> 00:55:43,000 least significant to most significant. 747 00:55:43,000 --> 00:55:48,000 It turns out that works. And to see that let's do an 748 00:55:48,000 --> 00:55:50,000 example. I think I'm going to need a 749 00:55:50,000 --> 00:55:55,000 whole two boards ideally. First we'll see an example. 750 00:55:55,000 --> 00:55:59,000 Then we'll prove the theorem. The proof is actually pretty 751 00:55:59,000 --> 00:56:03,000 darn easy. But, nonetheless, 752 00:56:03,000 --> 00:56:07,000 it's rather counterintuitive that this works if you haven't seen 753 00:56:07,000 --> 00:56:10,000 it before. Certainly, the first time I saw 754 00:56:10,000 --> 00:56:14,000 it, it was quite a surprise. The nice thing also about this 755 00:56:14,000 --> 00:56:19,000 algorithm is there are no bins. It's all one big bin at all 756 00:56:19,000 --> 00:56:21,000 times. Let's take some numbers. 757 00:56:23,000 --> 00:56:28,000 I'm spacing out the digits so we can see them a little bit 758 00:56:28,000 --> 00:56:30,000 better. 759 00:56:30,000 --> 00:56:33,000 329, 457, 657, 839, 436, 720 and 355. 760 00:56:33,000 --> 00:56:38,000 I'm assuming here we're using decimal numbers. 761 00:56:38,000 --> 00:56:43,000 Why not? Hopefully these are not yet 762 00:56:43,000 --> 00:56:47,000 sorted. We'd like to sort them.
763 00:56:47,000 --> 00:56:54,000 The first thing we do is take the least significant digit, 764 00:56:54,000 --> 00:57:00,000 sort by the least significant digit. 765 00:57:00,000 --> 00:57:04,000 And whenever we have equal elements like these two nines, 766 00:57:04,000 --> 00:57:07,000 we preserve their relative order. 767 00:57:07,000 --> 00:57:11,000 So, 329 is going to remain above 839. 768 00:57:11,000 --> 00:57:16,000 It doesn't matter here because we're doing the first sort, 769 00:57:16,000 --> 00:57:20,000 but in general we're always using a stable sorting 770 00:57:20,000 --> 00:57:23,000 algorithm. When we sort by this column, 771 00:57:23,000 --> 00:57:27,000 first we get the zero, so that's 720, 772 00:57:27,000 --> 00:57:30,000 then we get the 5, that's 355, 773 00:57:30,000 --> 00:57:31,000 then the 6, that's 436. 774 00:57:31,000 --> 00:57:36,000 Stop me if I make a mistake. Then we get the 7s, 775 00:57:36,000 --> 00:57:42,000 and we preserve the order. Here it happens to be the right 776 00:57:42,000 --> 00:57:47,000 order, but it may not be at this point. 777 00:57:47,000 --> 00:57:51,000 We haven't even looked at the other digits. 778 00:57:51,000 --> 00:57:54,000 Then we get 9s, there are two 9s, 779 00:57:54,000 --> 00:57:57,000 329 and 839. All right so far? 780 00:57:57,000 --> 00:58:03,000 Good. Now we sort by the middle 781 00:58:03,000 --> 00:58:07,000 digit, the next least significant. 782 00:58:07,000 --> 00:58:12,000 And we start out with what looks like the 2s. 783 00:58:12,000 --> 00:58:17,000 There is a 2 up here and a 2 down here. 784 00:58:17,000 --> 00:58:23,000 Of course, we write the first 2 first, 720, then 329. 785 00:58:23,000 --> 00:58:30,000 Then we have the 3s, so we have 436 and 839. 786 00:58:30,000 --> 00:58:33,000 Then we have a bunch of 5s it looks like. 787 00:58:33,000 --> 00:58:36,000 Have I missed anyone so far? No. 788 00:58:36,000 --> 00:58:38,000 Good. We have three 5s, 789 00:58:38,000 --> 00:58:42,000 355, 457 and 657.
I like to check that I haven't 790 00:58:42,000 --> 00:58:45,000 lost any elements. We have seven here, 791 00:58:45,000 --> 00:58:48,000 seven here and seven elements here. 792 00:58:48,000 --> 00:58:51,000 Good. Finally, we sort by the last 793 00:58:51,000 --> 00:58:53,000 digit. One thing to notice, 794 00:58:53,000 --> 00:59:00,000 by the way, is before we sorted by the last digit -- 795 00:59:00,000 --> 00:59:05,000 Currently these numbers don't resemble sorted order at all. 796 00:59:05,000 --> 00:59:10,000 But if you look at everything beyond the digit we haven't yet 797 00:59:10,000 --> 00:59:15,000 sorted, so these two digits, that's nice and sorted, 798 00:59:15,000 --> 00:59:17,000 20, 29, 36, 39, 55, 57, 57. 799 00:59:17,000 --> 00:59:20,000 Pretty cool. Let's finish it off. 800 00:59:20,000 --> 00:59:23,000 We stably sort by the first digit. 801 00:59:23,000 --> 00:59:29,000 And the smallest number we get is a 3, so we get 329 and 355, then 802 00:59:36,000 --> 00:59:45,000 436 and 457, then we get a 6, 803 00:59:45,000 --> 00:59:55,000 657, then a 7, and then we have an 8. 804 00:59:55,000 --> 01:00:01,631 And check. I still have seven elements. 805 01:00:01,631 --> 01:00:03,203 Good. I haven't lost anyone. 806 01:00:03,203 --> 01:00:05,533 And, indeed, they're now in sorted order. 807 01:00:05,533 --> 01:00:08,097 And you can start to see why this is working. 808 01:00:08,097 --> 01:00:11,417 When I have equal elements here, I have already sorted the 809 01:00:11,417 --> 01:00:13,398 suffix. Let's write down a proof of 810 01:00:13,398 --> 01:00:15,029 that. What is nice about this 811 01:00:15,029 --> 01:00:17,650 algorithm is we're not partitioning into bins. 812 01:00:17,650 --> 01:00:20,970 We always keep the huge batch of elements in one big pile, 813 01:00:20,970 --> 01:00:23,650 but we're just going through it multiple times. 814 01:00:23,650 --> 01:00:27,087 In general, we sort of need to go through it multiple times.
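The three passes just traced can be replayed mechanically (a sketch of mine using Python's stable built-in sort keyed on one digit at a time, standing in for the stable sort on the board):

```python
nums = [329, 457, 657, 839, 436, 720, 355]
for t in range(3):  # digit 0 is the least significant
    nums = sorted(nums, key=lambda x: x // 10 ** t % 10)
    print("after pass", t + 1, nums)
```

The intermediate lists match the three columns from the example, and the final pass leaves the numbers fully sorted.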
815 01:00:27,087 --> 01:00:32,006 Hopefully not too many times. But let's first argue 816 01:00:32,006 --> 01:00:36,019 correctness. To analyze the running time is 817 01:00:36,019 --> 01:00:41,751 a little bit tricky here because it depends on how you partition 818 01:00:41,751 --> 01:00:44,808 into digits. Correctness is easy. 819 01:00:44,808 --> 01:00:50,159 We just induct on the digit position that we're currently 820 01:00:50,159 --> 01:00:55,891 sorting, so let's call that t. And we can assume by induction 821 01:00:55,891 --> 01:01:02,656 that it's sorted beyond digit t. This is our induction 822 01:01:02,656 --> 01:01:07,841 hypothesis. We assume that we're sorted on 823 01:01:07,841 --> 01:01:14,924 the low-order t - 1 digits. And then the next thing we do 824 01:01:14,924 --> 01:01:21,501 is sort on the t-th digit. We just need to check that 825 01:01:21,501 --> 01:01:26,561 things work. And we restore the induction 826 01:01:26,561 --> 01:01:32,000 hypothesis for t instead of t - 1. 827 01:01:32,000 --> 01:01:36,009 When we sort on the t-th digit there are two cases. 828 01:01:36,009 --> 01:01:40,981 If we look at any two elements, we want to know whether they're 829 01:01:40,981 --> 01:01:45,150 put in the right order. If two elements are the same, 830 01:01:45,150 --> 01:01:49,000 let's say they have the same t-th digit -- 831 01:01:58,000 --> 01:02:02,000 This is the tricky case. If they have the same t-th 832 01:02:02,000 --> 01:02:05,519 digit then their order should not be changed. 833 01:02:05,519 --> 01:02:09,360 So, by stability, we know that they remain in the 834 01:02:09,360 --> 01:02:14,400 same order because stability is supposed to preserve things that 835 01:02:14,400 --> 01:02:17,519 have the same key that we're sorting on. 
836 01:02:17,519 --> 01:02:21,920 And then, by the induction hypothesis, we know that that 837 01:02:21,920 --> 01:02:26,239 keeps them in sorted order because induction hypothesis 838 01:02:26,239 --> 01:02:30,000 says that they used to be sorted. 839 01:02:30,000 --> 01:02:35,369 Adding on this value in the front that's the same in both 840 01:02:35,369 --> 01:02:39,684 doesn't change anything so they remain sorted. 841 01:02:39,684 --> 01:02:44,000 And if they have differing t-th digits -- 842 01:02:54,000 --> 01:03:00,000 -- then this sorting step will put them in the right order. 843 01:03:00,000 --> 01:03:03,189 Because that's what sorting does. 844 01:03:03,189 --> 01:03:08,870 This is the most significant digit, so you've got to order 845 01:03:08,870 --> 01:03:12,558 them by the t-th digit if they differ. 846 01:03:12,558 --> 01:03:17,840 The rest are irrelevant. So, proof here of correctness 847 01:03:17,840 --> 01:03:22,026 is very simple once you know the algorithm. 848 01:03:22,026 --> 01:03:25,514 Any questions before we go on? Good. 849 01:03:25,514 --> 01:03:30,000 We're going to use counting sort. 850 01:03:30,000 --> 01:03:30,344 We could use any sorting algorithm we want for individual 851 01:03:30,344 --> 01:03:30,713 digits, but the only algorithm that we know that runs in less 852 01:03:30,713 --> 01:03:30,916 than n lg n time is counting sort. 853 01:03:30,916 --> 01:03:31,267 So, we better use that one to sort of bootstrap and get an 854 01:03:31,267 --> 01:03:31,501 even faster and more general algorithm. 855 01:03:31,501 --> 01:03:31,883 I just erased the running time. Counting sort runs in order k + 856 01:03:31,883 --> 01:03:36,003 n time. We need to remember that. 857 01:03:36,003 --> 01:03:44,329 And the range of the numbers is 1 to k or 0 to k - 1. 
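Counting sort itself, keyed on a single digit, can be sketched as follows. This is a minimal version in the spirit of the lecture; the function and parameter names are my own.

```python
def counting_sort_by_key(a, key, k):
    """Stable counting sort in O(n + k) time: key(x) must lie in range(k)."""
    count = [0] * k
    for x in a:                      # histogram of key values
        count[key(x)] += 1
    for v in range(1, k):            # prefix sums: count[v] = # of keys <= v
        count[v] += count[v - 1]
    out = [None] * len(a)
    for x in reversed(a):            # backward pass keeps equal keys stable
        count[key(x)] -= 1
        out[count[key(x)]] = x
    return out

# sort by last decimal digit: elements with equal digits keep their input order
print(counting_sort_by_key([21, 13, 11, 23], key=lambda x: x % 10, k=10))
# [21, 11, 13, 23]
```

The backward walk over the input is what makes it stable: among equal keys, the last one in the input lands in the last reserved slot.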
858 01:03:44,329 --> 01:03:53,616 When we sort by a particular digit, we shouldn't use an n lg n 859 01:03:53,616 --> 01:04:02,743 algorithm because then this thing will take n lg n for one 860 01:04:02,743 --> 01:04:09,788 round and it's going to have multiple rounds. 861 01:04:09,788 --> 01:04:15,552 That's going to be worse than n lg n. 862 01:04:15,552 --> 01:04:25,000 We're going to use counting sort for each round. 863 01:04:32,000 --> 01:04:34,931 We use counting sort for each digit. 864 01:04:34,931 --> 01:04:40,125 And we know the running time of counting sort here is order k + 865 01:04:40,125 --> 01:04:42,973 n. But I don't want to assume that 866 01:04:42,973 --> 01:04:46,324 my integers are split into digits for me. 867 01:04:46,324 --> 01:04:50,261 That's sort of giving away too much flexibility. 868 01:04:50,261 --> 01:04:55,287 Because if I have some number written in whatever form it is, 869 01:04:55,287 --> 01:05:00,062 probably written in binary, I can cluster together some of 870 01:05:00,062 --> 01:05:04,000 those bits and call that a digit. 871 01:05:04,000 --> 01:05:07,415 Let's think of our numbers as binary. 872 01:05:07,415 --> 01:05:12,442 Suppose we have n integers. And they're in some range. 873 01:05:12,442 --> 01:05:16,901 And we want to know how big a range they can be. 874 01:05:16,901 --> 01:05:21,264 Let's say, a sort of practical way of thinking, 875 01:05:21,264 --> 01:05:26,577 you know, we're in a binary world, each integer is b bits 876 01:05:26,577 --> 01:05:29,774 long. So, in other words, 877 01:05:29,774 --> 01:05:35,283 the range is from 0 to 2^b - 1. I will assume that my numbers 878 01:05:35,283 --> 01:05:39,765 are non-negative. It doesn't make much difference 879 01:05:39,765 --> 01:05:42,006 if they're negative, too. 
880 01:05:42,006 --> 01:05:47,515 I want to know how big a b I can handle, but I don't want to 881 01:05:47,515 --> 01:05:52,650 split into bits as my digits because then I would have b 882 01:05:52,650 --> 01:05:59,000 digits and I would have to do b rounds of this algorithm. 883 01:05:59,000 --> 01:06:02,839 The number of rounds of this algorithm is the number of 884 01:06:02,839 --> 01:06:05,754 digits that I have. And each one costs me, 885 01:06:05,754 --> 01:06:08,598 let's hope, linear time. And, indeed, 886 01:06:08,598 --> 01:06:10,589 if I use a single bit, k = 2. 887 01:06:10,589 --> 01:06:14,428 And so this is order n. But then the running time would 888 01:06:14,428 --> 01:06:17,557 be order n per round. And there are b digits, 889 01:06:17,557 --> 01:06:21,183 if I consider them to be bits, order n times b time. 890 01:06:21,183 --> 01:06:24,240 And even if b is something small like log n, 891 01:06:24,240 --> 01:06:27,866 if I have log n bits, then these are numbers between 892 01:06:27,866 --> 01:06:32,549 0 and n - 1. I already know how to sort 893 01:06:32,549 --> 01:06:36,666 numbers between 0 and n - 1 in linear time. 894 01:06:36,666 --> 01:06:41,372 Here I'm spending n lg n time, so that's no good. 895 01:06:41,372 --> 01:06:47,549 Instead, what we're going to do is take a bunch of bits and call 896 01:06:47,549 --> 01:06:51,470 that a digit, the most bits we can handle 897 01:06:51,470 --> 01:06:56,078 with counting sort. The notation will be: I split 898 01:06:56,078 --> 01:07:01,846 each integer into b/r digits, each r bits long. 899 01:07:01,846 --> 01:07:06,630 In other words, I think of my number as being 900 01:07:06,630 --> 01:07:11,086 in base 2^r. And I happen to be writing it 901 01:07:11,086 --> 01:07:15,869 down in binary, but I cluster together r bits 902 01:07:15,869 --> 01:07:20,108 and I get a bunch of digits in base 2^r. 903 01:07:20,108 --> 01:07:26,195 And then there are b/r digits. 
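Extracting one of these r-bit digits is cheap, just a shift and a mask. A quick sketch (the helper name is mine):

```python
def digit(x, i, r):
    """The i-th base-2^r digit of x: shift past the i*r low bits,
    then mask off r bits."""
    return (x >> (i * r)) & ((1 << r) - 1)

# 0b1011_0110 with r = 4: low digit 0b0110 = 6, high digit 0b1011 = 11
print(digit(0b10110110, 0, 4), digit(0b10110110, 1, 4))  # 6 11
```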
This b/r is the number of 904 01:07:26,195 --> 01:07:30,000 rounds. And this base -- 905 01:07:30,000 --> 01:07:34,104 This is the maximum value I have in one of these digits. 906 01:07:34,104 --> 01:07:37,537 It's between 0 and 2^r - 1. This is, in some sense, 907 01:07:37,537 --> 01:07:40,000 k for a run of counting sort. 908 01:07:49,000 --> 01:07:54,673 What is the running time? Well, I have b/r rounds. 909 01:07:54,673 --> 01:08:00,000 It's b/r times the running time for a round. 910 01:08:00,000 --> 01:08:05,830 Which I have n numbers and my value of k is 2^r. 911 01:08:05,830 --> 01:08:10,917 This is the running time of counting sort, 912 01:08:10,917 --> 01:08:18,236 n + k, this is the number of rounds, so this is (b/r)(n + 2^r). 913 01:08:18,236 --> 01:08:23,198 And I am free to choose r however I want. 914 01:08:23,198 --> 01:08:30,145 What I would like to do is minimize this running time over my 915 01:08:30,145 --> 01:08:35,703 choices of r. Any suggestions on how I might 916 01:08:35,703 --> 01:08:40,303 find the minimum running time over all choices of r? 917 01:08:40,303 --> 01:08:44,000 Techniques, not necessarily solutions. 918 01:08:53,000 --> 01:08:55,488 We're not used to this because it's asymptotic, 919 01:08:55,488 --> 01:08:58,288 but forget the big O here. How do I minimize a function 920 01:08:58,288 --> 01:09:01,336 with respect to one variable? Take the derivative, 921 01:09:01,336 --> 01:09:03,541 yeah. I can take the derivative of 922 01:09:03,541 --> 01:09:06,080 this function by r, differentiate by r, 923 01:09:06,080 --> 01:09:10,022 set the derivative equal to 0, and that should be a critical 924 01:09:10,022 --> 01:09:13,496 point in this function. It turns out this function is 925 01:09:13,496 --> 01:09:16,368 unimodal in r and you will find the minimum. 926 01:09:16,368 --> 01:09:19,510 We could do that. I'm not going to do it because 927 01:09:19,510 --> 01:09:23,385 it takes a little bit more work. You should try it at home. 
928 01:09:23,385 --> 01:09:27,059 It will give you the exact minimum, which is good if you 929 01:09:27,059 --> 01:09:32,283 know what this constant is. Differentiate with respect to r 930 01:09:32,283 --> 01:09:35,305 and set to 0. I am going to do it a little 931 01:09:35,305 --> 01:09:39,063 bit more intuitively, in other words less precisely, 932 01:09:39,063 --> 01:09:41,788 but I will still get the right answer. 933 01:09:41,788 --> 01:09:46,210 And definitely I will get an upper bound because I can choose 934 01:09:46,210 --> 01:09:50,115 r to be whatever I want. It turns out this will be the 935 01:09:50,115 --> 01:09:53,210 right answer. Let's just think about growth 936 01:09:53,210 --> 01:09:56,526 in terms of r. There are essentially two terms 937 01:09:56,526 --> 01:10:00,024 here. I have (b/r)n and I have 938 01:10:00,024 --> 01:10:03,315 (b/r)2^r. Now, (b/r)n would like r to be 939 01:10:03,315 --> 01:10:07,364 as big as possible. The bigger r is, the number of 940 01:10:07,364 --> 01:10:10,992 rounds goes down. This number in front of n, 941 01:10:10,992 --> 01:10:16,138 this coefficient in front of n goes down, so I would like r to 942 01:10:16,138 --> 01:10:18,669 be big. So, (b/r)n wants r big. 943 01:10:18,669 --> 01:10:23,478 However, r cannot be too big. This is saying I want digits 944 01:10:23,478 --> 01:10:28,540 that have a lot of bits in them. It cannot be too big because 945 01:10:28,540 --> 01:10:34,465 there's the 2^r term out here. If this happens to be bigger 946 01:10:34,465 --> 01:10:39,220 than n then this will dominate in terms of growth of r. 947 01:10:39,220 --> 01:10:43,182 This is going to be b times 2^r over r. 948 01:10:43,182 --> 01:10:46,264 2^r is much, much bigger than r, 949 01:10:46,264 --> 01:10:50,490 so it's going to grow much faster is what I mean. 950 01:10:50,490 --> 01:10:55,949 And so I really don't want r to be too big for this other term. 
951 01:10:55,949 --> 01:11:00,000 So, that is (b/r)2^r wants r small. 952 01:11:00,000 --> 01:11:06,684 Provided that this term is bigger or equal to this term 953 01:11:06,684 --> 01:11:11,758 then I can set r pretty big for that term. 954 01:11:11,758 --> 01:11:16,710 What I want is the n to dominate the 2^r. 955 01:11:16,710 --> 01:11:23,641 Provided I have that then I can set r as large as I want. 956 01:11:23,641 --> 01:11:30,697 Let's say I want to choose r to be maximum subject to this 957 01:11:30,697 --> 01:11:38,000 condition that n is greater than or equal to 2^r. 958 01:11:38,000 --> 01:11:42,291 This is an upper bound on 2^r, and an upper bound on r. 959 01:11:42,291 --> 01:11:44,899 In other words, I want r = lg n. 960 01:11:44,899 --> 01:11:49,948 This turns out to be the right answer up to constant factors. 961 01:11:49,948 --> 01:11:53,566 There we go. And definitely choosing r to be 962 01:11:53,566 --> 01:11:58,951 lg n will give me an upper bound on the best running time I could 963 01:11:58,951 --> 01:12:04,000 get because I can choose it to be whatever I want. 964 01:12:04,000 --> 01:12:10,564 If you differentiate you will indeed get the same answer. 965 01:12:10,564 --> 01:12:15,956 This was not quite a formal argument but close, 966 01:12:15,956 --> 01:12:21,699 because the big O is all about what grows fastest. 967 01:12:21,699 --> 01:12:26,036 If we plug in r = lg n we get bn/lg n. 968 01:12:26,036 --> 01:12:31,780 The n and the 2^r are equal, that's a factor of 2, 969 01:12:31,780 --> 01:12:38,704 2 times n, not a big deal. It comes out in the O. 970 01:12:38,704 --> 01:12:44,788 We have bn/lg n, where lg n is r. We have to think about what 971 01:12:44,788 --> 01:12:49,859 this means and translate it in terms of range. 972 01:12:49,859 --> 01:12:56,957 b was the number of bits in our number, which corresponds to the 973 01:12:56,957 --> 01:13:03,417 range of the number. 
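A quick numeric check of this choice, with an assumed word size and input size (n = 2^16 keys of b = 64 bits; both values are hypothetical), shows that r = lg n lands within a small constant factor of the true minimum of (b/r)(n + 2^r):

```python
import math

def cost(b, r, n):
    # b/r rounds of counting sort, each taking about n + 2^r steps
    return math.ceil(b / r) * (n + 2**r)

n, b = 1 << 16, 64
opt = min(cost(b, r, n) for r in range(1, b + 1))   # brute-force minimum
lg = int(math.log2(n))
print(cost(b, lg, n) / opt < 2)                     # True: within a factor of 2
```

The ceiling on b/r makes the exact minimizer jump around a little, which is why the claim is only "up to constant factors."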
I've got 20 minutes under so 974 01:13:03,417 --> 01:13:08,543 far in lecture so I can go 20 minutes over, 975 01:13:08,543 --> 01:13:11,228 right? No, I'm kidding. 976 01:13:11,228 --> 01:13:15,988 Almost done. Let's say that our numbers, 977 01:13:15,988 --> 01:13:21,724 our integers, are in the range, we have 0 to 2^b, 978 01:13:21,724 --> 01:13:26,606 I'm going to say that it's range 0 to n^d. 979 01:13:26,606 --> 01:13:33,449 This should be a -1 here. If I have numbers that are 980 01:13:33,449 --> 01:13:38,632 between 0 and n^d - 1 where d is a constant or d is some 981 01:13:38,632 --> 01:13:42,306 parameter, so this is a polynomial in n, 982 01:13:42,306 --> 01:13:45,604 then you work out this running time. 983 01:13:45,604 --> 01:13:49,844 It is order dn. This is the way to think about 984 01:13:49,844 --> 01:13:54,179 it because now we can compare to counting sort. 985 01:13:54,179 --> 01:13:59,644 Counting sort could handle 0 up to some constant times n in 986 01:13:59,644 --> 01:14:04,501 linear time. Now I can handle 0 up to n to 987 01:14:04,501 --> 01:14:07,434 some constant power in linear time. 988 01:14:07,434 --> 01:14:12,178 If d is order 1 then we get a linear time sorting 989 01:14:12,178 --> 01:14:15,543 algorithm. And that is cool as long as d 990 01:14:15,543 --> 01:14:19,511 is at most lg n. As long as your numbers are at 991 01:14:19,511 --> 01:14:24,255 most n^(lg n) then we have something that beats our n lg n 992 01:14:24,255 --> 01:14:29,000 sorting algorithms. And this is pretty nice. 993 01:14:29,000 --> 01:14:33,099 Whenever you know that your numbers are order lg n bits 994 01:14:33,099 --> 01:14:36,048 long we are happy, and you get some smooth 995 01:14:36,048 --> 01:14:37,990 tradeoff there. 
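This claim can be checked with a small sketch that sorts n numbers below n^d using d stable passes in base n; as before, Python's stable `sorted()` stands in for counting sort with k = n, and the names are mine.

```python
import random

def sort_polynomial_range(a, d):
    """Sort integers in [0, len(a)**d) with d stable base-n passes.
    Each pass would be O(n) with counting sort (k = n), so O(d*n) total."""
    n = len(a)
    for t in range(d):
        a = sorted(a, key=lambda x: (x // n**t) % n)
    return a

n, d = 100, 3
data = [random.randrange(n**d) for _ in range(n)]    # numbers up to n^3 - 1
print(sort_polynomial_range(data, d) == sorted(data))  # True
```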
For example, 996 01:14:37,990 --> 01:14:42,018 if we have our 32 bit numbers and we split into let's say 997 01:14:42,018 --> 01:14:46,262 eight bit chunks then we'll only have to do four rounds, each 998 01:14:46,262 --> 01:14:49,570 linear time, and we need just 256 entries of working space. 999 01:14:49,570 --> 01:14:52,735 We were doing four rounds for 32 bit numbers. 1000 01:14:52,735 --> 01:14:56,835 If you use an n lg n algorithm, you're going to be doing lg n 1001 01:14:56,835 --> 01:15:00,941 rounds through your numbers. n is like 2000, 1002 01:15:00,941 --> 01:15:03,515 and that's at least 11 rounds for example. 1003 01:15:03,515 --> 01:15:07,281 You would think this algorithm is going to be much faster for 1004 01:15:07,281 --> 01:15:09,038 small numbers. Unfortunately, 1005 01:15:09,038 --> 01:15:11,612 counting sort is not very good on a cache. 1006 01:15:11,612 --> 01:15:14,311 In practice, radix sort is not that fast an 1007 01:15:14,311 --> 01:15:17,199 algorithm unless your numbers are really small. 1008 01:15:17,199 --> 01:15:19,584 Something like quicksort can do better. 1009 01:15:19,584 --> 01:15:22,660 It's sort of a shame, but theoretically this is very 1010 01:15:22,660 --> 01:15:25,045 beautiful. And there are contexts where 1011 01:15:25,045 --> 01:15:29,000 this is really the right way to sort things. 1012 01:15:29,000 --> 01:15:34,352 I will mention finally that if you have arbitrary integers that 1013 01:15:34,352 --> 01:15:39,100 are one word length long. Here we're assuming that there 1014 01:15:39,100 --> 01:15:44,280 are b bits in a word and there is some indirect dependence on b 1015 01:15:44,280 --> 01:15:46,093 here. 
But, in general, 1016 01:15:46,093 --> 01:15:51,100 if you have a bunch of integers and they're one word length 1017 01:15:51,100 --> 01:15:55,589 long, and you can manipulate a word in constant time, 1018 01:15:55,589 --> 01:16:00,597 then the best algorithm we know for sorting runs in n times 1019 01:16:00,597 --> 01:16:05,000 square root of lg lg n time expected. 1020 01:16:05,000 --> 01:16:08,719 It is a randomized algorithm. We're not going to cover that 1021 01:16:08,719 --> 01:16:11,798 algorithm in this class. It's rather complicated. 1022 01:16:11,798 --> 01:16:15,068 I didn't even cover it in Advanced Algorithms when I 1023 01:16:15,068 --> 01:16:17,570 taught it. If you want something easier, 1024 01:16:17,570 --> 01:16:21,289 you can get n times square root of lg lg n time worst-case. 1025 01:16:21,289 --> 01:16:23,406 And that paper is almost readable. 1026 01:16:23,406 --> 01:16:26,035 I have taught that in Advanced Algorithms. 1027 01:16:26,035 --> 01:16:28,729 If you're interested in this kind of stuff, 1028 01:16:28,729 --> 01:16:32,000 take Advanced Algorithms next fall. 1029 01:16:32,000 --> 01:16:34,552 It's one of the follow-ons to this class. 1030 01:16:34,552 --> 01:16:38,317 These are much more complicated algorithms, but it gives you 1031 01:16:38,317 --> 01:16:40,870 some sense. You can even break out of the 1032 01:16:40,870 --> 01:16:43,742 dependence on b, as long as you know that b is 1033 01:16:43,742 --> 01:16:46,486 at most a word. And I will stop there unless 1034 01:16:46,486 --> 01:16:49,000 there are any questions. Then see you Wednesday.