1
00:00:00,100 --> 00:00:02,500
The following content is
provided under a Creative

2
00:00:02,500 --> 00:00:04,019
Commons license.

3
00:00:04,019 --> 00:00:06,360
Your support will help
MIT OpenCourseWare

4
00:00:06,360 --> 00:00:10,740
continue to offer high quality
educational resources for free.

5
00:00:10,740 --> 00:00:13,330
To make a donation, or
view additional materials

6
00:00:13,330 --> 00:00:17,207
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,207 --> 00:00:17,832
at ocw.mit.edu.

8
00:00:21,150 --> 00:00:23,740
LING REN: Everyone,
today we're going to look

9
00:00:23,740 --> 00:00:25,260
at dynamic programming again.

10
00:00:29,000 --> 00:00:31,700
So I think I have
mentioned several times,

11
00:00:31,700 --> 00:00:34,590
so you should all
know it by heart now,

12
00:00:34,590 --> 00:00:37,420
the main idea of
dynamic programming

13
00:00:37,420 --> 00:00:46,460
is to divide the problem
into subproblems and reuse

14
00:00:46,460 --> 00:00:49,680
the results of the subproblems
you already solved.

15
00:00:49,680 --> 00:00:50,600
Right?

16
00:00:50,600 --> 00:00:54,010
And, of course, in 6.046 we
always care about the runtime.

17
00:00:57,440 --> 00:01:03,720
So those are the two big
themes for dynamic programming.

18
00:01:06,430 --> 00:01:09,070
Now, let's start with
a warm-up example.

19
00:01:09,070 --> 00:01:12,120
It's extremely simple.

20
00:01:12,120 --> 00:01:20,730
Let's say we have a grid, and
there's a robot from, say,

21
00:01:20,730 --> 00:01:26,980
coordinate 1,1 and it wants
to go to coordinate m,n.

22
00:01:26,980 --> 00:01:30,980
So at every step, it can
either take a step up,

23
00:01:30,980 --> 00:01:33,160
or take a step to the right.

24
00:01:33,160 --> 00:01:37,640
So how many distinct paths are
there for the robot to take?

25
00:01:46,330 --> 00:01:49,290
Is the question clear?

26
00:01:49,290 --> 00:01:51,870
So we have a robot
at coordinate 1,1.

27
00:01:51,870 --> 00:01:54,420
It wants to go to
coordinate m,n.

28
00:01:54,420 --> 00:01:57,410
And every step, it can
either take a step up,

29
00:01:57,410 --> 00:01:59,420
or take a step to the right.

30
00:01:59,420 --> 00:02:02,340
How many distinct
paths are there that

31
00:02:02,340 --> 00:02:04,153
can take the robot
to its destination?

32
00:02:10,180 --> 00:02:11,365
Any ideas how to solve that?

33
00:02:19,623 --> 00:02:20,123
Go ahead.

34
00:02:20,123 --> 00:02:23,616
AUDIENCE: So, we define
subproblems as the number

35
00:02:23,616 --> 00:02:30,640
of distinct paths from
some point x,y to m,n.

36
00:02:30,640 --> 00:02:33,090
Then the number of distinct
paths from some point

37
00:02:33,090 --> 00:02:37,990
is the number of paths if you go
up if you're allowed to go up,

38
00:02:37,990 --> 00:02:40,930
plus the number of
paths if you go right

39
00:02:40,930 --> 00:02:42,400
if you're allowed to go right.

40
00:02:42,400 --> 00:02:44,880
So if you were on the
edge, [INAUDIBLE].

41
00:02:44,880 --> 00:02:47,520
LING REN: Yup, yup.

42
00:02:47,520 --> 00:02:48,520
Does everyone get that?

43
00:02:48,520 --> 00:02:49,540
So, it's very simple.

44
00:02:49,540 --> 00:02:53,590
So, I know I have only one
way to get to these points.

45
00:02:53,590 --> 00:02:55,230
I need to go all the way right.

46
00:02:55,230 --> 00:02:57,340
And only one way to
get to these points.

47
00:02:57,340 --> 00:02:59,560
I need to go all the way up.

48
00:02:59,560 --> 00:03:01,720
So for all the
intermediate nodes,

49
00:03:01,720 --> 00:03:07,220
my number of choices are--
is this board moving?

50
00:03:07,220 --> 00:03:12,490
Are just the number of distinct
paths coming from my left,

51
00:03:12,490 --> 00:03:16,060
plus the number of distinct
paths coming from below.

52
00:03:16,060 --> 00:03:17,200
And then I can go in.

53
00:03:17,200 --> 00:03:20,500
For every node, I'll just take
the sum of the two numbers

54
00:03:20,500 --> 00:03:23,200
to my left and below me.

55
00:03:23,200 --> 00:03:25,900
And go from there.

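A minimal bottom-up sketch of the recurrence just described. The function name `count_paths` and the 1-indexed table are illustrative assumptions, not from the lecture: every point on the bottom row or left column gets 1, and every other point sums its left and lower neighbors.

```python
def count_paths(m: int, n: int) -> int:
    """Count up/right paths from (1, 1) to (m, n) (hypothetical helper)."""
    # paths[x][y] = number of distinct paths from (1, 1) to (x, y); index 0 is unused.
    paths = [[0] * (n + 1) for _ in range(m + 1)]
    for x in range(1, m + 1):
        for y in range(1, n + 1):
            if x == 1 or y == 1:
                # Bottom row or left column: only one straight-line path.
                paths[x][y] = 1
            else:
                # Arrive either from the left (x - 1, y) or from below (x, y - 1).
                paths[x][y] = paths[x - 1][y] + paths[x][y - 1]
    return paths[m][n]
```

For example, `count_paths(3, 3)` fills a 3-by-3 table and returns 6. Each cell is filled once with a constant-time sum, which is the m times n runtime discussed next.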
56
00:03:25,900 --> 00:03:26,495
OK.

57
00:03:26,495 --> 00:03:27,478
Is that clear?

58
00:03:31,960 --> 00:03:34,320
So this example is
very simple, but it

59
00:03:34,320 --> 00:03:38,125
does illustrate the point of
dynamic programming very well.

60
00:03:41,130 --> 00:03:45,210
You solve subproblems, and ask
how many distinct paths lead

61
00:03:45,210 --> 00:03:49,930
here, and you reuse
the results of, for example,

62
00:03:49,930 --> 00:03:54,200
this subproblem because
you are using it to compute

63
00:03:54,200 --> 00:03:58,400
this number and that number.

64
00:03:58,400 --> 00:04:02,440
If you don't do that, if
you don't memoize and reuse

65
00:04:02,440 --> 00:04:06,055
the results, then your
runtime will be worse.

66
00:04:06,055 --> 00:04:07,305
So what's the runtime of that?

67
00:04:16,510 --> 00:04:17,601
Speak up.

68
00:04:17,601 --> 00:04:19,810
AUDIENCE: [INAUDIBLE]

69
00:04:19,810 --> 00:04:24,100
LING REN: It's just m times n.

70
00:04:24,100 --> 00:04:24,600
Why?

71
00:04:24,600 --> 00:04:28,320
Because I have this many
unique subproblems.

72
00:04:28,320 --> 00:04:33,580
One at each point, and I'm just
taking the sum of two numbers

73
00:04:33,580 --> 00:04:37,470
at each subproblem, so
it takes me constant time

74
00:04:37,470 --> 00:04:41,490
to merge the results from my
subproblems to get my problem.

75
00:04:41,490 --> 00:04:45,220
So to analyze
runtime, usually we

76
00:04:45,220 --> 00:04:51,010
ask the question how many
unique problems do I have.

77
00:04:51,010 --> 00:04:52,760
And what's the
amount of merge work

78
00:04:52,760 --> 00:04:55,000
I have to do at every step?

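The same recipe written out for the grid example as a worked equation (the symbol $T$ for the total runtime is our notation, not the lecture's):

```latex
T(m, n) = \underbrace{m \cdot n}_{\text{unique subproblems}}
          \times \underbrace{O(1)}_{\text{merge work per subproblem}}
        = O(mn)
```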
79
00:05:04,542 --> 00:05:05,500
That's the toy example.

80
00:05:09,685 --> 00:05:12,850
Now let's look at some
more complicated examples.

81
00:05:15,660 --> 00:05:17,545
Our first one is
called make change.

82
00:05:21,150 --> 00:05:24,320
As its name suggests, we
have a bunch of coins.

83
00:05:24,320 --> 00:05:31,710
s1, s2, all the way to, say, sm.

84
00:05:31,710 --> 00:05:35,600
So each coin has some value,
like 1 cent, 5 cents, 10 cents.

85
00:05:35,600 --> 00:05:39,500
We're going to make change
for a total of n cents,

86
00:05:39,500 --> 00:05:43,550
and ask what's the
minimum number of coins

87
00:05:43,550 --> 00:05:47,715
I need to make
change for n cents.

88
00:05:53,440 --> 00:05:56,540
So to guarantee that we can
always make this change,

89
00:05:56,540 --> 00:05:58,990
we'll set s1 to be 1.

90
00:05:58,990 --> 00:06:01,820
Otherwise, there's a chance
that the problem is unsolvable.

91
00:06:08,700 --> 00:06:09,860
Any ideas?

92
00:06:09,860 --> 00:06:11,418
Is the problem clear?

93
00:06:11,418 --> 00:06:12,834
AUDIENCE: How do
you find s1 again?

94
00:06:12,834 --> 00:06:15,800
Or si?

95
00:06:15,800 --> 00:06:17,970
LING REN: What, these numbers?

96
00:06:17,970 --> 00:06:19,690
They are inputs.

97
00:06:19,690 --> 00:06:20,870
They are also inputs.

98
00:06:20,870 --> 00:06:22,840
It could be 1 cent,
5 cents, 10 cents.

99
00:06:22,840 --> 00:06:25,670
Or 3 cents, 7 cents.

100
00:06:25,670 --> 00:06:27,170
Though the smallest
one is always 1.

101
00:06:30,420 --> 00:06:30,920
OK.

102
00:06:30,920 --> 00:06:32,461
I need to find a
combination of them.

103
00:06:32,461 --> 00:06:35,330
For each of them, I have
an unlimited supply.

104
00:06:35,330 --> 00:06:39,230
So I can find two of these,
three of that, five of that,

105
00:06:39,230 --> 00:06:42,960
such that their sum is n.

106
00:06:42,960 --> 00:06:43,918
Is the problem clear?

107
00:06:48,365 --> 00:06:48,865
OK.

108
00:06:48,865 --> 00:06:50,031
Any ideas how to solve that?

109
00:06:55,995 --> 00:06:59,510
So let's just use a naive
or very straightforward

110
00:06:59,510 --> 00:07:00,010
algorithm.

111
00:07:12,440 --> 00:07:13,190
Go ahead.

112
00:07:13,190 --> 00:07:18,640
AUDIENCE: You pick one, and
then you do mc of n minus that.

113
00:07:18,640 --> 00:07:20,000
LING REN: OK, great.

114
00:07:20,000 --> 00:07:23,460
Yeah, let's just do
exhaustive search.

115
00:07:23,460 --> 00:07:27,220
Let's pick si.

116
00:07:27,220 --> 00:07:29,770
If I pick this coin,
then my subproblem

117
00:07:29,770 --> 00:07:35,220
becomes n minus the coin value.

118
00:07:35,220 --> 00:07:37,950
And of course, I
use the one coin.

119
00:07:37,950 --> 00:07:39,230
That's si.

120
00:07:39,230 --> 00:07:50,180
So then I take the min of this
over all the i's, and that's

121
00:07:50,180 --> 00:07:50,940
the solution.

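A minimal top-down sketch of this exhaustive search with memoization, assuming, as stated above, that a 1-cent coin is always among the inputs (s1 = 1). The names `make_change` and `mc` are hypothetical; `mc(k)` is the minimum number of coins for k cents, computed as 1 plus the best subproblem k minus s_i over every coin s_i that fits.

```python
from functools import lru_cache
from typing import Sequence


def make_change(coins: Sequence[int], n: int) -> int:
    """Minimum number of coins summing to n; assumes 1 is among the coins."""
    values = tuple(coins)

    @lru_cache(maxsize=None)  # reuse each subproblem's answer instead of recomputing it
    def mc(k: int) -> int:
        if k == 0:
            return 0  # zero cents needs zero coins
        # Pick coin s_i (one coin used), then solve the subproblem k - s_i.
        return min(1 + mc(k - s) for s in values if s <= k)

    return mc(n)
```

For example, `make_change([1, 3, 4], 6)` returns 2 (3 + 3), where always taking the largest coin first would use 3 coins (4 + 1 + 1). Because the recursion can go n levels deep, this top-down form only suits small n under Python's default recursion limit.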
122
00:07:54,300 --> 00:07:55,060
So far so good?

123
00:08:02,640 --> 00:08:03,140
OK.

124
00:08:03,140 --> 00:08:05,430
So what's the runtime
of this algorithm?

125
00:08:15,968 --> 00:08:19,640
If it's not immediately
obvious, then we

126
00:08:19,640 --> 00:08:23,620
ask how many unique
subproblems are there.

127
00:08:23,620 --> 00:08:28,470
And how much work do I have to
do to go from my subproblems

128
00:08:28,470 --> 00:08:30,140
to my original problem?

129
00:08:33,460 --> 00:08:35,105
So how many
subproblems are there?

130
00:08:49,670 --> 00:08:51,960
So to be clear, for
this one, we have

131
00:08:51,960 --> 00:08:54,730
to make this
recursive call again:

132
00:08:54,730 --> 00:08:57,110
n minus si, then perhaps minus sj.

133
00:09:05,300 --> 00:09:07,860
And if you cannot compute how
many subproblems there are,

134
00:09:07,860 --> 00:09:08,890
let's just give a bound.

135
00:09:23,360 --> 00:09:23,980
Any ideas?

136
00:09:31,320 --> 00:09:32,826
John, right?

137
00:09:32,826 --> 00:09:35,022
AUDIENCE: I'm not
sure there would

138
00:09:35,022 --> 00:09:39,658
be more than n subproblems,
because the smallest

139
00:09:39,658 --> 00:09:44,538
amount we can subtract
from the original is 1.

140
00:09:44,538 --> 00:09:46,978
And if we keep
subtracting 1 repeatedly,

141
00:09:46,978 --> 00:09:48,930
we get n subproblems,
and that will cover

142
00:09:48,930 --> 00:09:50,870
everything-- that subproblem.

143
00:09:50,870 --> 00:09:51,870
LING REN: Yeah, correct.

144
00:09:51,870 --> 00:09:55,260
So this may not be
a very tight bound,

145
00:09:55,260 --> 00:09:58,570
but we know we cannot
have more than this number

146
00:09:58,570 --> 00:09:59,830
of subproblems.

147
00:09:59,830 --> 00:10:04,300
Actually, I don't need to
even put the order there.

148
00:10:04,300 --> 00:10:07,410
I know we can have no
more than n subproblems.

149
00:10:07,410 --> 00:10:10,650
They're just make change of n, n
minus 1, n minus 2, all the way

150
00:10:10,650 --> 00:10:14,010
to make change 1.

151
00:10:14,010 --> 00:10:15,960
And actually, this
bound is pretty

152
00:10:15,960 --> 00:10:19,590
tight, because we set
our smallest coin to be 1,

153
00:10:19,590 --> 00:10:23,580
so we will make a recursive
call to make change n minus 1,

154
00:10:23,580 --> 00:10:25,320
right?

155
00:10:25,320 --> 00:10:29,470
If I pick the 1 coin,
the 1 cent coin first.

156
00:10:29,470 --> 00:10:32,710
And then from there, I will
pick a 1 cent coin again.

157
00:10:32,710 --> 00:10:35,850
That gives me a
subproblem with n minus 2.

158
00:10:35,850 --> 00:10:38,580
So indeed, I will encounter
all the n subproblems.

159
00:10:42,120 --> 00:10:45,810
OK, so having realized
that, how much work

160
00:10:45,810 --> 00:10:48,315
do I have to do to go
from here to there?

161
00:10:56,316 --> 00:10:58,120
AUDIENCE: [INAUDIBLE]

162
00:10:58,120 --> 00:10:58,870
LING REN: Correct.

163
00:10:58,870 --> 00:11:02,430
Because I'm taking the
min of how many terms?

164
00:11:02,430 --> 00:11:05,264
m terms.

165
00:11:05,264 --> 00:11:06,180
So that's our runtime: order n times m.

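Filling the same table bottom-up makes the n-times-m count visible in the loop structure: the outer loop visits each of the n subproblems once, and the inner loop is the min over m coin values. This iterative variant, the name `make_change_table`, and the sentinel n + 1 (more coins than any real answer needs) are our own choices, again assuming a 1-cent coin exists.

```python
from typing import List, Sequence


def make_change_table(coins: Sequence[int], n: int) -> List[int]:
    """Return best[k] = min coins for k cents, for k = 0..n; assumes 1 is among the coins."""
    best = [0] + [n + 1] * n  # n + 1 acts as "not reached yet"
    for k in range(1, n + 1):      # n subproblems...
        for s in coins:            # ...each a min over m coin values
            if s <= k:
                best[k] = min(best[k], best[k - s] + 1)
    return best
```

For example, `make_change_table([1, 3, 4], 6)` returns `[0, 1, 2, 1, 1, 2, 2]`, whose last entry is the answer for 6 cents. Unlike the recursive version, it has no recursion-depth concern.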
166
00:11:18,390 --> 00:11:19,480
Any questions so far?

167
00:11:22,110 --> 00:11:25,090
If not, let me
take a digression.

168
00:11:25,090 --> 00:11:28,710
So, make change, this problem.

169
00:11:28,710 --> 00:11:31,335
If you think about it, it's
very similar to knapsack.

170
00:11:35,140 --> 00:11:38,300
Has anyone not heard
of this problem?

171
00:11:38,300 --> 00:11:40,530
Knapsack means you
have a bunch of items.

172
00:11:40,530 --> 00:11:42,430
You want to pack
these into a bag,

173
00:11:42,430 --> 00:11:45,880
and the bag has a certain size.

174
00:11:45,880 --> 00:11:47,520
So each item has
a certain value,

175
00:11:47,520 --> 00:11:53,050
and you want to pack the items
that have the largest combined

176
00:11:53,050 --> 00:11:56,160
value into your bag.

177
00:11:56,160 --> 00:12:01,415
So, why are they similar?

178
00:12:04,110 --> 00:12:07,070
So in some sense, n is our size.

179
00:12:07,070 --> 00:12:11,590
We want to pick a bunch of
coins to make the size n.

180
00:12:11,590 --> 00:12:14,720
And each coin here actually
has a negative value,

181
00:12:14,720 --> 00:12:17,350
because we want to
minimize the number of coins.

182
00:12:17,350 --> 00:12:20,344
If you do that, then this
problem is exactly knapsack.

183
00:12:20,344 --> 00:12:21,510
And knapsack is NP-complete.

184
00:12:26,080 --> 00:12:32,130
That means we don't know a
polynomial-time solution to it yet.

185
00:12:32,130 --> 00:12:33,630
However, we just found one.

186
00:12:36,900 --> 00:12:39,840
Our input is m coin values and n.

187
00:12:39,840 --> 00:12:42,555
Our solution is polynomial
in m, and polynomial in n.

188
00:12:45,866 --> 00:12:51,040
If this is true, then I have
found a polynomial solution

189
00:12:51,040 --> 00:12:53,120
to an NP-complete problem.

190
00:12:53,120 --> 00:12:54,320
So P equals NP.

191
00:12:54,320 --> 00:12:58,292
So we should all be getting
a Turing Award for that.

192
00:12:58,292 --> 00:12:59,500
So clearly something's wrong.

193
00:13:02,760 --> 00:13:07,000
But there's no problem
with this solution.

194
00:13:07,000 --> 00:13:09,330
This covers all the cases.

195
00:13:09,330 --> 00:13:11,790
And our analysis is
definitely correct.

196
00:13:18,840 --> 00:13:22,885
So does anyone get
what I'm asking?

197
00:13:22,885 --> 00:13:24,620
So what's the
contradiction here?

198
00:13:29,540 --> 00:13:32,490
I will probably
discuss this later,

199
00:13:32,490 --> 00:13:37,390
in later lectures when we get
to complexity or reduction.

200
00:13:37,390 --> 00:13:39,500
But to give a short
answer, the problem

201
00:13:39,500 --> 00:13:46,200
is that when we say the input
is n, its size is not n.

202
00:13:46,200 --> 00:13:54,680
So I only need log n bits
to represent this input.

203
00:13:54,680 --> 00:13:57,090
Make sense?

204
00:13:57,090 --> 00:14:01,040
Therefore, for an input of length
log n, my runtime is n.

205
00:14:01,040 --> 00:14:03,567
That means my runtime
is exponential.

206
00:14:03,567 --> 00:14:04,400
It's not polynomial.

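A worked version of this argument, where $b$ (our notation, not the lecture's) is the number of bits needed to write $n$ in binary:

```latex
b = \lceil \log_2 (n + 1) \rceil
\;\Longrightarrow\; 2^{b-1} \le n < 2^{b}
\;\Longrightarrow\; T(m, n) = \Theta(mn) = \Theta\!\left(m \cdot 2^{b}\right)
```

Each extra bit of $n$ can double the work, so the runtime is polynomial in the numeric value $n$ but exponential in its bit length; algorithms like this are usually called pseudo-polynomial.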
207
00:14:06,940 --> 00:14:07,824
OK.

208
00:14:07,824 --> 00:14:09,365
Now that's the end
of the digression.

209
00:14:24,170 --> 00:14:25,910
Now let's look at
another example.

210
00:14:28,960 --> 00:14:31,650
This one is called
rectangular blocks.

211
00:14:41,350 --> 00:14:45,130
So in this problem, we
have a bunch of blocks.

212
00:14:45,130 --> 00:14:48,530
Say 1, 2, all the way to n.

213
00:14:48,530 --> 00:14:54,310
And each of them has a
length, width, and height.

214
00:14:54,310 --> 00:14:55,890
So it's a
three-dimensional block.

215
00:14:58,580 --> 00:15:02,790
So I want to put blocks, stack
them on top of each other

216
00:15:02,790 --> 00:15:05,070
to get the maximum height.

217
00:15:05,070 --> 00:15:10,230
But in order for j to
be put on top of i,

218
00:15:10,230 --> 00:15:18,530
I require the length of j to be
smaller than the length of i,

219
00:15:18,530 --> 00:15:23,800
and the width of j to also be
smaller than the width of i.

220
00:15:23,800 --> 00:15:29,190
So visually, say
this is a block.

221
00:15:29,190 --> 00:15:31,250
I can put another
block on there.

222
00:15:31,250 --> 00:15:34,176
They are smaller in
width and length.

223
00:15:34,176 --> 00:15:39,930
But I cannot put this guy on
top of it because one of its

224
00:15:39,930 --> 00:15:44,190
dimensions is larger than
the underlying block.

225
00:15:44,190 --> 00:15:48,910
And to make things simple,
rotating is not allowed.

226
00:15:48,910 --> 00:15:50,360
So OK, I can rotate.

227
00:15:50,360 --> 00:15:51,280
It still doesn't fit.

228
00:15:51,280 --> 00:15:53,730
But you see the complication.

229
00:15:53,730 --> 00:15:58,610
So if you allow rotation, then
there are more possibilities.

230
00:15:58,610 --> 00:16:02,560
Length and width are fixed, so
one of them is north-south,

231
00:16:02,560 --> 00:16:05,785
the other is east-west,
and you cannot change that.

232
00:16:08,770 --> 00:16:09,380
OK.

233
00:16:09,380 --> 00:16:10,580
Is the problem clear?

234
00:16:10,580 --> 00:16:13,380
You want to stack one
on top of each other

235
00:16:13,380 --> 00:16:14,970
to get the maximum height.

236
00:16:25,100 --> 00:16:28,140
Any ideas?

237
00:16:28,140 --> 00:16:30,510
Again, let's start
from a simple algorithm.

238
00:16:30,510 --> 00:16:32,950
Say, let's just
try everything out.

239
00:16:47,670 --> 00:16:48,430
OK, go ahead.

240
00:16:48,430 --> 00:16:51,005
AUDIENCE: If you try everything,
you have n factorial.

241
00:16:51,005 --> 00:16:51,713
LING REN: Pardon?

242
00:16:51,713 --> 00:16:53,797
AUDIENCE: It would
be O of n factorial?

243
00:16:53,797 --> 00:16:55,130
LING REN: You're going too fast.

244
00:16:55,130 --> 00:16:57,360
Let's write the algorithm first.

245
00:16:57,360 --> 00:17:04,220
So I want to solve my rectangular
blocks problem, say from 1 to n.

246
00:17:04,220 --> 00:17:05,220
What are my subproblems?

247
00:17:09,967 --> 00:17:11,669
AUDIENCE: Choose one block.

248
00:17:11,669 --> 00:17:12,210
LING REN: OK.

249
00:17:12,210 --> 00:17:13,168
Let's choose one block.

250
00:17:13,168 --> 00:17:17,036
AUDIENCE: And then you run RB
of everything except that block.

251
00:17:19,970 --> 00:17:22,599
LING REN: So I get its height,
and then I have a subproblem.

252
00:17:25,410 --> 00:17:26,630
What is the subproblem?

253
00:17:26,630 --> 00:17:27,760
And then I'll take a max.

254
00:17:35,330 --> 00:17:41,920
So the difficulty here
is this subproblem.

255
00:17:41,920 --> 00:17:44,410
So Andrew, right?

256
00:17:44,410 --> 00:17:51,320
So Andrew said it's just
everything except i.

257
00:17:51,320 --> 00:17:53,410
Is that the case?

258
00:17:53,410 --> 00:17:53,910
Go ahead.

259
00:17:53,910 --> 00:17:55,410
AUDIENCE: It's
everything except i,

260
00:17:55,410 --> 00:17:59,305
and anything
wider or longer than i.

261
00:17:59,305 --> 00:18:00,630
LING REN: Do you get that?

262
00:18:00,630 --> 00:18:03,260
Not only do we
have to exclude i,

263
00:18:03,260 --> 00:18:07,770
we also have to exclude
everything longer or wider

264
00:18:07,770 --> 00:18:08,880
than i.

265
00:18:08,880 --> 00:18:10,545
So that's actually
a messy problem.

266
00:18:13,580 --> 00:18:25,480
So let me define this subproblem
to be a compatible set of l i, w i.

267
00:18:25,480 --> 00:18:34,090
And let me define that to
be the set of blocks where

268
00:18:34,090 --> 00:18:38,140
the length is smaller
than the required length,

269
00:18:38,140 --> 00:18:45,420
and whose width is also smaller
than the required width.

270
00:18:45,420 --> 00:18:48,280
So this should remind you of
the weighted interval scheduling

271
00:18:48,280 --> 00:18:53,650
problem, where we define
a compatible set once we

272
00:18:53,650 --> 00:18:54,950
have chosen some block.

273
00:18:58,823 --> 00:18:59,323
Question?

274
00:18:59,323 --> 00:19:00,989
AUDIENCE: What are
we trying to do here?

275
00:19:00,989 --> 00:19:04,180
Are we trying to minimize h?

276
00:19:04,180 --> 00:19:05,340
LING REN: Maximize h.

277
00:19:05,340 --> 00:19:06,940
We want to get as
high as possible.

278
00:19:12,500 --> 00:19:13,990
I choose a block,
I get its height,

279
00:19:13,990 --> 00:19:16,930
and then I find out the
compatible remaining blocks,

280
00:19:16,930 --> 00:19:18,750
and I want to stack
them on top of it.

281
00:19:25,450 --> 00:19:29,330
Everyone agrees this
solution is correct?

282
00:19:29,330 --> 00:19:31,820
OK, then let's
analyze its runtime.

283
00:19:40,640 --> 00:19:42,360
So how do we analyze runtime?

284
00:19:52,020 --> 00:19:55,374
So what's the first
question I always ask?

285
00:19:55,374 --> 00:19:57,596
AUDIENCE: How many subproblems?

286
00:19:57,596 --> 00:19:58,220
LING REN: Yeah.

287
00:19:58,220 --> 00:20:00,470
I'm not sure who said
that, but how many

288
00:20:00,470 --> 00:20:01,600
subproblems do we have?

289
00:20:19,841 --> 00:20:22,027
AUDIENCE: At most n?

290
00:20:22,027 --> 00:20:22,860
LING REN: At most n.

291
00:20:25,510 --> 00:20:27,460
Can you explain why
is that the case?

292
00:20:27,460 --> 00:20:29,150
Or it's just a guess?

293
00:20:29,150 --> 00:20:37,340
AUDIENCE: Because if n
is compatible-- nothing

294
00:20:37,340 --> 00:20:39,555
in the compatible--
n will not be

295
00:20:39,555 --> 00:20:41,047
in the compatible
set of anything

296
00:20:41,047 --> 00:20:44,029
that is in the
compatible set of n.

297
00:20:44,029 --> 00:20:45,780
LING REN: OK,
that's very tricky.

298
00:20:45,780 --> 00:20:46,650
I didn't get that.

299
00:20:46,650 --> 00:20:48,580
Can you say that again?

300
00:20:48,580 --> 00:20:51,130
AUDIENCE: Because
for example, if you

301
00:20:51,130 --> 00:20:52,630
start with n, then
everything that's

302
00:20:52,630 --> 00:20:56,140
in the compatible set of n.

303
00:20:56,140 --> 00:20:59,020
n won't be in the
compatible set of that.

304
00:21:01,624 --> 00:21:02,165
LING REN: OK.

305
00:21:02,165 --> 00:21:04,290
I think I got what you said.

306
00:21:04,290 --> 00:21:08,330
So, if we think there are only
n subproblems, what are they?

307
00:21:08,330 --> 00:21:16,976
They have to be compatible
sets l1, w1, then l2, w2.

308
00:21:16,976 --> 00:21:19,000
These are the n
unique subproblems

309
00:21:19,000 --> 00:21:21,670
you are thinking about.

310
00:21:21,670 --> 00:21:24,540
Is there any chance that I
will get a compatible set

311
00:21:24,540 --> 00:21:26,770
something like l3 but w5?

312
00:21:29,580 --> 00:21:33,320
If I ever have this
subproblem then, well,

313
00:21:33,320 --> 00:21:35,845
my number of subproblems
is kind of exploding.

314
00:21:43,450 --> 00:21:49,210
Yeah, I see many of
you are saying no.

315
00:21:49,210 --> 00:21:50,210
Why not?

316
00:21:50,210 --> 00:21:54,180
Because if we have a subproblem,
say, compatible set of l i

317
00:21:54,180 --> 00:22:05,160
and w i, and if we go from here,
and choose the next block, say

318
00:22:05,160 --> 00:22:18,370
t, it's guaranteed that t
is shorter and narrower.

319
00:22:18,370 --> 00:22:22,300
That means our new subproblem,
or new compatible set

320
00:22:22,300 --> 00:22:30,040
becomes-- our new
subproblem needs to be

321
00:22:30,040 --> 00:22:33,750
compatible with t instead of i.

322
00:22:33,750 --> 00:22:39,580
So, the only subproblems
I can get are these ones.

323
00:22:39,580 --> 00:22:41,330
I cannot have one of these.

324
00:22:45,330 --> 00:22:48,446
The number of subproblems is n.

325
00:22:48,446 --> 00:22:56,145
And how much work do I
have to do at each level?

326
00:23:00,780 --> 00:23:03,200
AUDIENCE: n.

327
00:23:03,200 --> 00:23:05,540
LING REN: n, because
I'm just taking the max,

328
00:23:05,540 --> 00:23:09,860
and there are n potential
choices inside my max.

329
00:23:13,710 --> 00:23:14,760
So runtime n squared.

330
00:23:24,160 --> 00:23:32,810
OK, we're not fully done,
because there is an extra step

331
00:23:32,810 --> 00:23:35,350
when we're trying to do this.

332
00:23:35,350 --> 00:23:39,280
We have to figure out
what each of these are.

333
00:23:42,260 --> 00:23:45,060
Because once I go
into this subproblem,

334
00:23:45,060 --> 00:23:51,120
I need to take a max over all
the blocks that are in this set.

335
00:23:51,120 --> 00:23:53,420
I have to know what
blocks are in that set.

336
00:23:57,780 --> 00:24:00,530
Is that hard?

337
00:24:00,530 --> 00:24:02,542
So how would you do that?

338
00:24:02,542 --> 00:24:06,999
AUDIENCE: You just check for
all of them, and that's O of n.

339
00:24:06,999 --> 00:24:07,540
LING REN: OK.

340
00:24:07,540 --> 00:24:09,950
So, I check all of them.

341
00:24:09,950 --> 00:24:10,830
That's O of n.

342
00:24:15,240 --> 00:24:17,520
I'm pretty sure you
just meant scanning,

343
00:24:17,520 --> 00:24:20,970
scan the entire thing, and
pick out the compatible ones.

344
00:24:20,970 --> 00:24:23,140
But that's for this subproblem.

345
00:24:23,140 --> 00:24:27,130
We have to do it for every one.

346
00:24:27,130 --> 00:24:29,230
Or there may be a better way.

347
00:24:29,230 --> 00:24:31,240
So I think the previous
TA told me there's

348
00:24:31,240 --> 00:24:33,020
a better way to do that.

349
00:24:33,020 --> 00:24:36,730
So in order to find all
the compatible sets,

350
00:24:36,730 --> 00:24:39,624
he claims he can do it in n log
n, but I haven't checked that,

351
00:24:39,624 --> 00:24:40,290
so I'm not sure.

352
00:24:40,290 --> 00:24:45,230
This is a folklore legend here.

353
00:24:45,230 --> 00:24:46,880
Yeah, we'll double
check that offline.

354
00:24:46,880 --> 00:24:51,080
But assuming I
don't have this, then

355
00:24:51,080 --> 00:24:55,960
figuring out all these subproblems
will also take n squared.

356
00:24:55,960 --> 00:24:58,910
Then my total runtime is
n squared plus n squared,

357
00:24:58,910 --> 00:25:00,040
and still n squared.

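A sketch of the unsorted approach just analyzed, assuming blocks are (length, width, height) tuples with fixed orientation and that stacking requires strictly smaller length and width. The names `tallest_stack_naive`, `compatible`, and `rb_on` are hypothetical. It builds every compatible set by scanning (the n-squared preprocessing step) and memoizes `rb_on(i)`, the tallest tower with block i at the bottom, which is h_i plus RB of the compatible set of (l_i, w_i).

```python
from functools import lru_cache
from typing import List, Tuple

Block = Tuple[int, int, int]  # (length, width, height); rotation not allowed


def tallest_stack_naive(blocks: List[Block]) -> int:
    n = len(blocks)
    # compatible[i]: blocks strictly shorter and strictly narrower than block i.
    # Building these lists by scanning takes O(n^2) time overall.
    compatible = [
        [j for j in range(n)
         if blocks[j][0] < blocks[i][0] and blocks[j][1] < blocks[i][1]]
        for i in range(n)
    ]

    @lru_cache(maxsize=None)
    def rb_on(i: int) -> int:
        # Tallest tower with block i at the bottom: h_i + RB(compatible set of i).
        # n distinct subproblems, each a max over up to n choices: O(n^2).
        return blocks[i][2] + max((rb_on(j) for j in compatible[i]), default=0)

    # Try every block as the bottom one and keep the best.
    return max((rb_on(i) for i in range(n)), default=0)
```

For example, with blocks `(3, 5, 2)`, `(4, 4, 3)`, `(2, 3, 4)`, the best tower puts `(2, 3, 4)` on `(4, 4, 3)` for a height of 7; `(3, 5, 2)` cannot sit on `(4, 4, 3)` because it is wider.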
358
00:25:06,400 --> 00:25:06,900
Question?

359
00:25:06,900 --> 00:25:08,784
AUDIENCE: Is the
n log n solution

360
00:25:08,784 --> 00:25:11,139
giving us sorting
this by [INAUDIBLE]?

361
00:25:14,014 --> 00:25:15,930
LING REN: Yeah, I think
it should be something

362
00:25:15,930 --> 00:25:19,260
along those lines, but yeah, I
haven't figured out whether you

363
00:25:19,260 --> 00:25:22,070
sort by length or by width.

364
00:25:22,070 --> 00:25:25,890
You can only sort
by one of them.

365
00:25:25,890 --> 00:25:29,150
So after sorting, say
let's sort by length.

366
00:25:29,150 --> 00:25:34,570
Then after sorting, I may
get something like this.

367
00:25:34,570 --> 00:25:37,600
And if I'm asking what's
the compatible set of width

368
00:25:37,600 --> 00:25:40,885
this guy, I still have
to kick all of them out.

369
00:25:48,942 --> 00:25:52,130
Yeah, so it's not entirely
clear to me how to do it,

370
00:25:52,130 --> 00:25:57,240
but I think you can potentially
consider having another,

371
00:25:57,240 --> 00:26:00,550
say, binary search tree
that's sorted by width,

372
00:26:00,550 --> 00:26:03,910
and you can go in and just
delete everything larger

373
00:26:03,910 --> 00:26:06,280
than a certain width.

374
00:26:06,280 --> 00:26:08,640
So that's the, yeah.

375
00:26:08,640 --> 00:26:09,642
OK, go ahead.

376
00:26:09,642 --> 00:26:13,815
AUDIENCE: Can you convert into
a directed graph, where each

377
00:26:13,815 --> 00:26:17,615
pair of shapes that's
compatible, you add an edge.

378
00:26:17,615 --> 00:26:20,950
And then find a path.

379
00:26:20,950 --> 00:26:22,100
LING REN: OK.

380
00:26:22,100 --> 00:26:24,510
OK.

381
00:26:24,510 --> 00:26:29,626
But constructing that graph
already takes O of n squared,

382
00:26:29,626 --> 00:26:30,126
correct?

383
00:26:34,856 --> 00:26:36,470
Yeah, OK, let's move on.

384
00:26:36,470 --> 00:26:38,260
I don't have time
to figure this out.

385
00:26:41,390 --> 00:26:49,160
So, this problem is remotely
similar to interval scheduling,

386
00:26:49,160 --> 00:26:52,180
weighted interval
scheduling, in a sense

387
00:26:52,180 --> 00:26:54,630
that it has some compatible set.

388
00:26:54,630 --> 00:26:58,430
And in the very first
lecture and recitation,

389
00:26:58,430 --> 00:27:02,400
we had two algorithms for
weighted interval scheduling,

390
00:27:02,400 --> 00:27:04,617
and one of them is
better than the other.

391
00:27:04,617 --> 00:27:06,450
And this one looks like
the naive algorithm.

392
00:27:13,480 --> 00:27:18,380
So, does anyone remember
what the better algorithm

393
00:27:18,380 --> 00:27:19,925
is for weighted
interval scheduling?

394
00:27:46,710 --> 00:27:55,530
But instead of checking every
one as my potential lowest one,

395
00:27:55,530 --> 00:27:57,230
it really doesn't
make sense to do that.

396
00:27:57,230 --> 00:27:59,160
Because for the
very small ones, I

397
00:27:59,160 --> 00:28:03,420
shouldn't put them
as my bottom one.

398
00:28:03,420 --> 00:28:09,440
I should try the larger ones
first as the very bottom one.

399
00:28:09,440 --> 00:28:09,940
Go ahead.

400
00:28:09,940 --> 00:28:11,908
Oh, you're not--

401
00:28:11,908 --> 00:28:15,380
AUDIENCE: You could create
a sorted list of length n

402
00:28:15,380 --> 00:28:16,868
with the width.

403
00:28:16,868 --> 00:28:19,844
So you know that items
that are later in the list,

404
00:28:19,844 --> 00:28:24,330
they're not going to be in
the first level of the tower.

405
00:28:24,330 --> 00:28:26,030
LING REN: Yeah, correct.

406
00:28:26,030 --> 00:28:29,230
So, just in the
same line of thought

407
00:28:29,230 --> 00:28:33,290
as weighted interval scheduling,
let's first sort them.

408
00:28:33,290 --> 00:28:35,405
But then, it's a little
tricky because do I

409
00:28:35,405 --> 00:28:37,700
sort by length or width?

410
00:28:37,700 --> 00:28:43,120
So I'm not sure yet, so
let's just sort by length

411
00:28:43,120 --> 00:28:43,835
and then width.

412
00:28:46,790 --> 00:28:48,790
So this means if they
have the same length,

413
00:28:48,790 --> 00:28:50,180
then I'll sort them by width.

414
00:28:50,180 --> 00:28:52,810
So I can create a sorted list.

415
00:28:52,810 --> 00:28:54,920
Let me just assume that
it's in-place sort,

416
00:28:54,920 --> 00:28:58,570
and now I have the sorted list.

417
00:28:58,570 --> 00:29:04,320
So once I have that,
the potential solutions

418
00:29:04,320 --> 00:29:06,980
I should consider are
whether or not

419
00:29:06,980 --> 00:29:11,680
I put my first block
as the bottom one.

420
00:29:11,680 --> 00:29:15,470
It doesn't make sense for
me to put a later one down.

421
00:29:15,470 --> 00:29:25,175
So my original problem
becomes taking the max,

422
00:29:25,175 --> 00:29:29,540
over whether or not
I choose block one.

423
00:29:29,540 --> 00:29:35,470
If I do, then I get its
weight-- height, sorry.

424
00:29:35,470 --> 00:29:42,290
And my subproblem is the
ones compatible with it.

425
00:29:48,300 --> 00:29:51,770
If I do not choose it,
then my subproblem

426
00:29:51,770 --> 00:29:57,280
is like what Andrew first
said, from 2 all the way to n.

427
00:30:00,710 --> 00:30:04,260
So why is this correct?

428
00:30:04,260 --> 00:30:08,250
So I claim this
covers all the cases.

429
00:30:08,250 --> 00:30:13,800
Either block 1 is chosen as the
bottom one, or it's not.

430
00:30:13,800 --> 00:30:15,050
It's not chosen at all.

431
00:30:15,050 --> 00:30:17,990
It's impossible for block 1 to
be somewhere in the middle,

432
00:30:17,990 --> 00:30:21,710
because it has the
largest length.

433
00:30:21,710 --> 00:30:22,210
OK.

434
00:30:25,400 --> 00:30:26,970
So how many
subproblems do I have?

435
00:30:38,826 --> 00:30:40,320
Go ahead.

436
00:30:40,320 --> 00:30:43,330
Still n.

437
00:30:43,330 --> 00:30:49,780
So there are all of these
compatible sets of l1 w1, l2 w2.

438
00:30:49,780 --> 00:30:52,140
But it looks like I do
have some new subproblems.

439
00:30:57,940 --> 00:31:02,060
These do not exist before.

440
00:31:02,060 --> 00:31:05,860
However, there are
only n of them.

441
00:31:05,860 --> 00:31:10,640
They're just suffixes
of the sorted list.

442
00:31:10,640 --> 00:31:13,240
So I still have O
of n subproblems.

443
00:31:13,240 --> 00:31:18,050
And at each step, I'm doing
a constant amount of work.

444
00:31:18,050 --> 00:31:19,510
There are just two terms in the max.

445
00:31:19,510 --> 00:31:22,925
So we found an order n solution.

446
00:31:26,200 --> 00:31:27,460
Are we done?

447
00:31:27,460 --> 00:31:30,661
Is it really order n?

448
00:31:30,661 --> 00:31:31,160
OK, no.

449
00:31:31,160 --> 00:31:32,870
AUDIENCE: You still
have to find the c.

450
00:31:32,870 --> 00:31:33,495
LING REN: Yeah.

451
00:31:33,495 --> 00:31:35,980
I still have to
find all these c's.

452
00:31:35,980 --> 00:31:39,060
And first, I actually
have a sort step.

453
00:31:39,060 --> 00:31:41,050
That sort step is n log n.

454
00:31:43,710 --> 00:31:46,860
Yeah, then again, well,
if we do it naively,

455
00:31:46,860 --> 00:31:50,990
then it's again n
squared, because I

456
00:31:50,990 --> 00:31:54,720
have to find the compatible
set for each of them.

457
00:31:54,720 --> 00:31:56,940
But if there's an
n log n solution

458
00:31:56,940 --> 00:31:59,840
to find these compatible sets,
then my final runtime is n

459
00:31:59,840 --> 00:32:00,340
log n.

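A minimal sketch of the block-stacking DP in Python, under stated assumptions: blocks are (length, width, height) triples that cannot be rotated, and a block may sit on another only if it is strictly smaller in both length and width. Instead of memoizing on sets, it indexes each subproblem by its bottom block, a common equivalent reformulation that is not necessarily the one used in recitation, and it finds compatible blocks with the naive scan, so it runs in O(n^2) rather than O(n log n). The name `max_tower_height` is hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int, int]  # (length, width, height); the field order is our assumption


def max_tower_height(blocks: List[Block]) -> int:
    """Height of the tallest tower in which each block is strictly shorter
    and strictly narrower than the block directly beneath it."""
    # Sort by decreasing length: anything that can sit on block i comes after i.
    order = sorted(blocks, key=lambda blk: blk[0], reverse=True)
    n = len(order)
    best = [0] * n  # best[i] = tallest tower whose bottom block is order[i]
    for i in range(n - 1, -1, -1):
        li, wi, hi = order[i]
        above = 0
        for j in range(i + 1, n):  # naive scan of block i's compatible set: O(n) each
            lj, wj, _ = order[j]
            if lj < li and wj < wi:
                above = max(above, best[j])
        best[i] = hi + above
    return max(best, default=0)


print(max_tower_height([(4, 6, 7), (1, 2, 3), (4, 5, 6), (10, 12, 32)]))  # 42
```

In the example, the best tower stacks the blocks of height 32, 7 and 3.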
460
00:32:05,491 --> 00:32:05,990
Make sense?

461
00:32:17,615 --> 00:32:18,490
Any questions so far?

462
00:32:29,050 --> 00:32:30,400
OK.

463
00:32:30,400 --> 00:32:34,490
So now we actually
have a choice.

464
00:32:34,490 --> 00:32:39,560
So we can either go
through another DP example,

465
00:32:39,560 --> 00:32:40,835
I do have another one.

466
00:32:40,835 --> 00:32:44,150
But Nancy, one of the
lecturers, suggested

467
00:32:44,150 --> 00:32:47,560
that it seems that many people
have some trouble understanding

468
00:32:47,560 --> 00:32:50,850
yesterday's lecture on universal
hashing and perfect hashing.

469
00:32:50,850 --> 00:32:53,876
So we can also consider
going through that.

470
00:32:53,876 --> 00:32:55,250
Well, of course,
the third option

471
00:32:55,250 --> 00:32:56,291
is to just call it a day.

472
00:32:59,300 --> 00:33:01,620
So, let me just take a poll.

473
00:33:01,620 --> 00:33:05,050
How many people prefer that we
go over the hash stuff?

474
00:33:08,780 --> 00:33:10,925
How many people prefer
another DP example?

475
00:33:13,890 --> 00:33:14,390
OK.

476
00:33:14,390 --> 00:33:15,400
Sorry guys.

477
00:33:15,400 --> 00:33:17,980
How many people
just want to leave?

478
00:33:17,980 --> 00:33:18,521
It's fine.

479
00:33:18,521 --> 00:33:19,020
OK.

480
00:33:19,020 --> 00:33:19,520
Great.

481
00:33:19,520 --> 00:33:20,024
That's it.

482
00:33:23,440 --> 00:33:23,940
OK.

483
00:33:23,940 --> 00:33:26,270
So, so much for DP.

484
00:33:26,270 --> 00:33:27,470
We do have another example.

485
00:33:27,470 --> 00:33:28,940
We will release it
in recitation notes.

486
00:33:28,940 --> 00:33:30,510
For those of you
who are interested,

487
00:33:30,510 --> 00:33:32,140
you can take a look.

488
00:33:32,140 --> 00:33:36,180
So, well, I'm sure you
all know that we

489
00:33:36,180 --> 00:33:39,120
haven't gone into DP in
the main lectures yet.

490
00:33:39,120 --> 00:33:42,510
So this is really just
a warm up to prepare

491
00:33:42,510 --> 00:33:45,690
you to go to the more
advanced DP concepts.

492
00:33:45,690 --> 00:33:50,270
And also, DP will be
covered in quiz 1.

493
00:33:50,270 --> 00:33:54,880
But the problems
will be strictly easier

494
00:33:54,880 --> 00:33:57,432
than the examples
we covered here.

495
00:34:01,360 --> 00:34:01,860
OK?

496
00:34:28,460 --> 00:34:32,510
Now let's review universal
and perfect hashing.

497
00:34:32,510 --> 00:34:35,960
So it's not like I have
a better way to teach it.

498
00:34:35,960 --> 00:34:38,010
Our advantage here is
that we have fewer people,

499
00:34:38,010 --> 00:34:41,460
so you can ask
any questions you have.

500
00:34:41,460 --> 00:34:43,719
So let me start with
the motivating example.

501
00:34:43,719 --> 00:34:44,929
So why do we care about hashing?

502
00:34:47,520 --> 00:34:54,860
It's because we want to create
a hash table of, say, size m.

503
00:34:54,860 --> 00:34:55,660
It has m bins.

504
00:35:00,170 --> 00:35:06,160
And we will receive input,
say, k0, k1, all the way to k n

505
00:35:06,160 --> 00:35:07,340
minus 1.

506
00:35:07,340 --> 00:35:08,610
n keys.

507
00:35:08,610 --> 00:35:12,110
And we'll apply a hash
function to each of them

508
00:35:12,110 --> 00:35:14,580
to map them to one of the bins.

509
00:35:14,580 --> 00:35:23,140
The hope is that if n is
theta m, or in the other way,

510
00:35:23,140 --> 00:35:26,040
m is theta n, then
each bin should contain

511
00:35:26,040 --> 00:35:27,200
a constant number of keys.

512
00:35:30,310 --> 00:35:33,250
So to complete the
picture, all the keys are

513
00:35:33,250 --> 00:35:40,060
drawn from a universe
that has size u.

514
00:35:40,060 --> 00:35:42,420
And this u is
usually pretty large.

515
00:35:42,420 --> 00:35:46,010
Let's say it's larger
than m squared.

516
00:35:46,010 --> 00:35:48,750
It's larger than the square
of my hash table size.

517
00:35:52,600 --> 00:36:05,090
But let me first start
with a negative result.

518
00:36:05,090 --> 00:36:18,580
So if my hash function
is deterministic,

519
00:36:18,580 --> 00:36:23,630
then there always
exists a set of inputs

520
00:36:23,630 --> 00:36:25,350
that all map to the same thing.

521
00:36:31,630 --> 00:36:32,880
We call that worst case.

522
00:36:38,280 --> 00:36:39,720
We don't like the worst case.

523
00:36:39,720 --> 00:36:40,330
Why?

524
00:36:40,330 --> 00:36:42,538
Because in that case, the
hash is not doing anything.

525
00:36:42,538 --> 00:36:45,730
We still have all of the
items in the same list.

526
00:36:56,420 --> 00:36:58,220
Why is that lemma true?

527
00:36:58,220 --> 00:37:03,050
Because by a very simple
pigeonhole argument,

528
00:37:03,050 --> 00:37:08,730
so imagine I insert all of
the keys in the universe

529
00:37:08,730 --> 00:37:09,544
into my hash table.

530
00:37:09,544 --> 00:37:10,960
I would never do
that in practice.

531
00:37:10,960 --> 00:37:12,590
It's just a thought experiment.

532
00:37:12,590 --> 00:37:14,760
So by a simple
pigeonhole argument,

533
00:37:14,760 --> 00:37:17,670
if u is greater
than m squared, then

534
00:37:17,670 --> 00:37:23,020
at least some bin will
contain more than m elements.

535
00:37:23,020 --> 00:37:26,910
Well, if it just so happens
that my inputs are these m keys,

536
00:37:26,910 --> 00:37:30,180
then my hash will hash all
of them to the same bin.

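The thought experiment is easy to run for a small universe. The sketch below (the name `adversarial_keys` is ours) plays the adversary against any fixed, deterministic hash function: it hashes every key in {0, ..., u - 1} and returns the keys that landed in the fullest bin. When u > m^2, the pigeonhole argument guarantees that bin holds more than m keys.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def adversarial_keys(h: Callable[[int], int], u: int) -> List[int]:
    """Return every key in {0, ..., u-1} that h sends to its most crowded bin."""
    bins: Dict[int, List[int]] = defaultdict(list)
    for k in range(u):  # insert the whole universe (thought experiment only)
        bins[h(k)].append(k)
    return max(bins.values(), key=len)


m = 8


def h(k: int) -> int:
    return (3 * k + 1) % m  # any deterministic hash into m bins


keys = adversarial_keys(h, m * m + 1)
print(len(keys) > m, len({h(k) for k in keys}))  # True 1
```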
537
00:37:30,180 --> 00:37:32,980
Make sense?

538
00:37:32,980 --> 00:37:35,490
So this is the problem
we're trying to solve.

539
00:37:35,490 --> 00:37:37,490
We don't want this worst case.

540
00:37:37,490 --> 00:37:40,010
And it does say that
if h is deterministic,

541
00:37:40,010 --> 00:37:41,580
we cannot avoid that.

542
00:37:41,580 --> 00:37:43,190
There always exists a worst case.

543
00:37:43,190 --> 00:37:44,390
So what's the solution?

544
00:37:48,050 --> 00:37:52,696
Then the solution
is to randomize h.

545
00:37:55,960 --> 00:38:01,420
However, I can't
really randomize h.

546
00:38:01,420 --> 00:38:07,970
If my hash function

547
00:38:07,970 --> 00:38:10,740
maps a key into a
certain bin, well,

548
00:38:10,740 --> 00:38:12,410
the next time I call
this hash function,

549
00:38:12,410 --> 00:38:13,990
it better give the same bin.

550
00:38:13,990 --> 00:38:17,900
Otherwise I cannot
find that item.

551
00:38:17,900 --> 00:38:21,850
So h needs to be deterministic.

552
00:38:27,710 --> 00:38:34,400
So now our only choice
is to pick a random h.

553
00:38:38,070 --> 00:38:38,840
Make sense?

554
00:38:38,840 --> 00:38:40,570
Every hash function
is deterministic,

555
00:38:40,570 --> 00:38:46,150
but we will pick a
random one from a family

556
00:38:46,150 --> 00:38:47,000
of hash functions.

557
00:38:51,190 --> 00:38:53,320
So in some sense,
this is cheating.

558
00:38:53,320 --> 00:38:53,950
Why?

559
00:38:53,950 --> 00:38:58,210
Because all I'm saying is I
will not choose a hash function

560
00:38:58,210 --> 00:38:59,030
beforehand.

561
00:38:59,030 --> 00:39:05,070
I will wait for the
user to insert inputs.

562
00:39:05,070 --> 00:39:08,221
If I have too many collisions,
I'll choose another one.

563
00:39:08,221 --> 00:39:09,470
If I have too many collisions,

564
00:39:09,470 --> 00:39:10,470
I'll choose another one.

565
00:39:13,080 --> 00:39:13,600
OK.

566
00:39:13,600 --> 00:39:15,891
I think I forgot to mention
one thing that's important.

567
00:39:15,891 --> 00:39:17,550
So you may ask why do I care?

568
00:39:17,550 --> 00:39:19,670
Why do I care about
that worst case?

569
00:39:19,670 --> 00:39:23,190
What's the chance of it
happening in practice?

570
00:39:23,190 --> 00:39:27,750
It's very low, but in
algorithms, we really

571
00:39:27,750 --> 00:39:30,641
don't like making
assumptions on inputs.

572
00:39:30,641 --> 00:39:31,140
Why?

573
00:39:31,140 --> 00:39:33,300
Because if you imagine
you're running, say,

574
00:39:33,300 --> 00:39:37,910
a website, a web server, and your
code has some hash table in it.

575
00:39:37,910 --> 00:39:41,370
So if your competitor,
or someone who hates you,

576
00:39:41,370 --> 00:39:43,250
wants to put you
out of business,

577
00:39:43,250 --> 00:39:45,330
and if he knows
your hash function,

578
00:39:45,330 --> 00:39:48,090
he can create a
worst case input.

579
00:39:48,090 --> 00:39:51,100
That will make your
website infinitely slow.

580
00:39:51,100 --> 00:39:54,340
So what we are saying
here is I don't tell him

581
00:39:54,340 --> 00:39:56,280
what hash function I'll use.

582
00:39:56,280 --> 00:39:57,780
I'll say I choose one.

583
00:39:57,780 --> 00:40:01,340
If he figures out the wrong
input, the worst case input,

584
00:40:01,340 --> 00:40:06,311
I'm going to change my hash
function and use another one.

585
00:40:06,311 --> 00:40:06,810
Make sense?

586
00:40:23,740 --> 00:40:33,050
Now the definition of a
universal hash function family

587
00:40:33,050 --> 00:40:38,920
is that if I pick a random h
from my universal hash function

588
00:40:38,920 --> 00:40:48,610
family, the probability that
any key i maps to the same bin

589
00:40:48,610 --> 00:40:54,380
as any other key j should be
less than or equal to 1

590
00:40:54,380 --> 00:40:56,440
over m, where m is
the size of my hash table.

591
00:40:56,440 --> 00:40:57,940
This is really the
best you can get.

592
00:40:57,940 --> 00:41:01,202
If the hash function is really
evenly distributing things,

593
00:41:01,202 --> 00:41:02,410
you should get this property.

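In symbols, with \mathcal{H} denoting the family and h drawn uniformly from it, the property just described is:

```latex
\Pr_{h \in \mathcal{H}}\bigl[h(k_i) = h(k_j)\bigr] \le \frac{1}{m}
\qquad \text{for every pair of keys } k_i \ne k_j
```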
594
00:41:07,970 --> 00:41:13,700
So we have seen one universal
hash function in the class.

595
00:41:13,700 --> 00:41:16,030
I'll just go over
the other example,

596
00:41:16,030 --> 00:41:27,470
which is ak plus b modulo
p, and then modulo m.

597
00:41:27,470 --> 00:41:33,760
So p is a prime number that is
greater than the universe size.

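A sketch of drawing one function from this family in Python. The trial-division prime search and the helper names `next_prime_above` and `random_universal_hash` are our own illustrative choices; keys are assumed to be integers in {0, ..., u - 1}, a is drawn from 1 to p - 1, and b from 0 to p - 1.

```python
import math
import random
from typing import Callable


def next_prime_above(u: int) -> int:
    """Smallest prime p > u, found by trial division (fine for illustration)."""
    p = u + 1
    while p < 2 or any(p % d == 0 for d in range(2, math.isqrt(p) + 1)):
        p += 1
    return p


def random_universal_hash(u: int, m: int, rng: random.Random) -> Callable[[int], int]:
    """Draw h(k) = ((a*k + b) mod p) mod m for keys k in {0, ..., u-1}."""
    p = next_prime_above(u)
    a = rng.randrange(1, p)  # a in {1, ..., p-1}
    b = rng.randrange(0, p)  # b in {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m
```

Each call returns a fixed, deterministic function; the randomness is only in which one you get, so a fresh draw can replace a function that turns out to cause too many collisions.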
598
00:41:36,760 --> 00:41:38,760
We'll see why this is a
universal hash function.

599
00:41:41,730 --> 00:41:44,490
So to do that, we just need
to analyze the collision

600
00:41:44,490 --> 00:41:45,690
probability.

601
00:41:45,690 --> 00:41:49,550
So if I have two keys,
k1 and k2, that

602
00:41:49,550 --> 00:41:59,780
map to the same bin, that means
they must have this property.

603
00:41:59,780 --> 00:42:02,650
After taking the mod
p, their difference

604
00:42:02,650 --> 00:42:05,600
should be a multiple of m.

605
00:42:05,600 --> 00:42:09,540
Because if this is true,
then after taking the modulo m,

606
00:42:09,540 --> 00:42:12,410
they will map to the same bin.

607
00:42:12,410 --> 00:42:14,620
Make sense?

608
00:42:14,620 --> 00:42:22,510
Now I can quickly write it as a
times the difference of the keys

609
00:42:22,510 --> 00:42:27,920
equals a multiple of m, mod p.

610
00:42:27,920 --> 00:42:33,620
Now, k1 and k2 are not
equal, so their difference is nonzero.

611
00:42:33,620 --> 00:42:36,820
And in this group, based
on some number theory,

612
00:42:36,820 --> 00:42:39,050
that difference has an
inverse element mod p.

613
00:42:39,050 --> 00:42:43,670
So, if this happens,
we'll call it a bad a.

614
00:42:43,670 --> 00:42:45,220
How many bad a's do I have?

615
00:42:47,785 --> 00:42:52,510
One value of a will make this
equation hold with i equals 1.

616
00:42:52,510 --> 00:42:56,090
Another a makes the equation
hold with i equals 2.

617
00:42:56,090 --> 00:42:59,120
But how many such a's do I have?

618
00:42:59,120 --> 00:43:04,950
At most floor of p over m, because
this equation can hold with m, 2m, 3m,

619
00:43:04,950 --> 00:43:11,950
all the way to floor of p
over m, times m.

620
00:43:11,950 --> 00:43:14,080
This is the total
number of possible ways

621
00:43:14,080 --> 00:43:16,270
this equation can hold.

622
00:43:16,270 --> 00:43:18,790
So what fraction of the a's are bad?

623
00:43:18,790 --> 00:43:25,920
I have floor of p over m,
over the total number

624
00:43:25,920 --> 00:43:27,610
of a's, which is p minus 1.

625
00:43:30,272 --> 00:43:31,730
Oh, yeah, I forgot
to mention that.

626
00:43:31,730 --> 00:43:39,510
So a is from 1 to p minus one.

627
00:43:44,240 --> 00:43:44,900
OK.

628
00:43:44,900 --> 00:43:49,800
So I can always choose my p
to not be a multiple of m.

629
00:43:49,800 --> 00:44:00,600
If I do that, then
no multiple of m lies

630
00:44:00,600 --> 00:44:03,070
between p minus 1 and p, so the
floor is the same for both.

631
00:44:03,070 --> 00:44:11,400
Then this is true, and
this is at most 1 over m.

632
00:44:11,400 --> 00:44:14,827
So this is a universal
hash function family.

633
00:44:20,440 --> 00:44:21,840
So what's the randomness here?

634
00:44:21,840 --> 00:44:23,530
The randomness is a.

635
00:44:23,530 --> 00:44:27,440
I'll pick an a to get one of my
hash functions, and if it doesn't work,

636
00:44:27,440 --> 00:44:28,205
I pick another a.

637
00:44:31,456 --> 00:44:32,854
AUDIENCE: What is b?

638
00:44:32,854 --> 00:44:33,790
What is b?

639
00:44:33,790 --> 00:44:37,050
LING REN: p is a prime
number I choose--

640
00:44:37,050 --> 00:44:38,070
AUDIENCE: [INAUDIBLE]

641
00:44:38,070 --> 00:44:38,570
LING REN: b?

642
00:44:38,570 --> 00:44:39,290
AUDIENCE: Yeah.

643
00:44:39,290 --> 00:44:41,424
LING REN: Oh, b.

644
00:44:41,424 --> 00:44:42,840
I think it's also
a random number.

645
00:44:45,550 --> 00:44:47,810
Yeah, so, actually it's
not needed, but I think

646
00:44:47,810 --> 00:44:49,560
there's some deep
reason that they keep it

647
00:44:49,560 --> 00:44:51,280
in the hash function.

648
00:44:51,280 --> 00:44:52,000
I'm not sure why.

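One way to check the 1/m bound without trusting the algebra is to count collisions exhaustively for a small prime. The sketch below draws both a and b, as in the standard textbook proof of universality for this family (for example in CLRS), which picks b at random and uses it in a bijection argument: for fixed k1 != k2, the map from (a, b) to the pair of values mod p is one-to-one onto pairs of distinct residues. The choice p = 31, m = 4 is arbitrary.

```python
def collisions(p: int, m: int, k1: int, k2: int) -> int:
    """Number of (a, b) with ((a*k1 + b) % p) % m == ((a*k2 + b) % p) % m."""
    return sum(
        1
        for a in range(1, p)  # a in {1, ..., p-1}
        for b in range(p)     # b in {0, ..., p-1}
        if ((a * k1 + b) % p) % m == ((a * k2 + b) % p) % m
    )


p, m = 31, 4                  # p prime; keys are drawn from {0, ..., p-1}
total = (p - 1) * p           # number of functions in the family
worst = max(collisions(p, m, k1, k2) for k1 in range(p) for k2 in range(k1 + 1, p))
print(worst, total, worst * m <= total)  # 210 930 True
```

For this family the count comes out the same for every pair of distinct keys: 210 of the 930 functions, just under a quarter of the family.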
649
00:45:04,380 --> 00:45:09,510
Now once we have that, once
we have universal hash,

650
00:45:09,510 --> 00:45:13,990
people also want
perfect hashing,

651
00:45:13,990 --> 00:45:18,550
which means I want
absolutely zero collisions.

652
00:45:21,730 --> 00:45:22,820
So how do I do that?

653
00:45:22,820 --> 00:45:28,540
Let me first give method 1.

654
00:45:32,570 --> 00:45:36,230
I'll just use any
universal hash function,

655
00:45:36,230 --> 00:45:38,590
but I choose my m
to be n squared.

656
00:45:43,940 --> 00:45:46,020
I claim this is a
perfect hash function

657
00:45:46,020 --> 00:45:48,090
with some constant probability.

658
00:45:48,090 --> 00:45:48,660
Why?

659
00:45:48,660 --> 00:45:52,100
Because I want to calculate
the probability of no collision.

660
00:46:03,230 --> 00:46:06,170
That's 1 minus the probability
that I do have a collision.

661
00:46:09,740 --> 00:46:12,180
And I can use a union bound.

662
00:46:12,180 --> 00:46:20,215
It bounds the probability that
any pair has a collision

663
00:46:25,570 --> 00:46:29,290
by a sum over pairs of the
probability h of x equals h of y.

664
00:46:33,540 --> 00:46:34,604
How many pairs do I have?

665
00:46:37,852 --> 00:46:39,244
AUDIENCE: N choose 2.

666
00:46:39,244 --> 00:46:43,560
LING REN: Yeah. n choose
2, which is this number.

667
00:46:48,060 --> 00:46:50,040
So if it's a universal
hash function,

668
00:46:50,040 --> 00:46:53,040
then for any two keys,
the chance they collide

669
00:46:53,040 --> 00:46:55,200
is at most 1 over m.

670
00:46:58,560 --> 00:47:00,610
So I choose my m
to be n squared,

671
00:47:00,610 --> 00:47:02,330
so the probability of no
collision is larger than 1/2.

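The calculation, with n keys hashed by a universal family into m = n^2 slots:

```latex
\Pr[\text{no collision}]
  = 1 - \Pr\bigl[\exists\, x \ne y : h(x) = h(y)\bigr]
  \ge 1 - \binom{n}{2}\frac{1}{m}
  = 1 - \frac{n(n-1)}{2n^2}
  > \frac{1}{2}
```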
672
00:47:05,300 --> 00:47:08,970
So what I'm saying, to get
a perfect hash function,

673
00:47:08,970 --> 00:47:10,640
I'll just use the simplest way.

674
00:47:10,640 --> 00:47:13,180
I select a universal
hash function with m

675
00:47:13,180 --> 00:47:14,950
equals n squared.

676
00:47:14,950 --> 00:47:18,130
I have a probability
more than 1/2 to succeed.

677
00:47:18,130 --> 00:47:20,820
If I don't succeed,
I'll choose another one

678
00:47:20,820 --> 00:47:24,350
until I succeed.

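A minimal Las Vegas sketch of method 1 in Python. It assumes distinct non-negative integer keys below the Mersenne prime 2^31 - 1, which plays the role of p; the function names are hypothetical. Each draw succeeds with probability greater than 1/2, so the expected number of attempts is less than 2.

```python
import random
from typing import Callable, List, Tuple

P = 2_147_483_647  # 2^31 - 1, a prime larger than every key we hash


def draw_hash(m: int, rng: random.Random) -> Callable[[int], int]:
    """Pick h(k) = ((a*k + b) mod P) mod m at random from the universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m


def perfect_hash_quadratic(keys: List[int], rng: random.Random) -> Tuple[Callable[[int], int], int]:
    """Method 1: use m = n^2 slots and redraw h until no two keys collide."""
    n = len(keys)
    m = max(1, n * n)
    attempts = 0
    while True:  # Las Vegas: the result is always collision-free; only the run time is random
        attempts += 1
        h = draw_hash(m, rng)
        if len({h(k) for k in keys}) == n:  # n distinct slots means no collision
            return h, attempts
```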
679
00:47:24,350 --> 00:47:26,540
So this is a
randomized algorithm,

680
00:47:26,540 --> 00:47:30,030
and we can make it a Monte
Carlo algorithm or Las Vegas

681
00:47:30,030 --> 00:47:30,910
algorithm.

682
00:47:30,910 --> 00:47:38,830
So I can either say if I
try alpha log n times,

683
00:47:38,830 --> 00:47:41,580
then what's the chance
that none of my choices

684
00:47:41,580 --> 00:47:42,970
satisfies perfect hashing?

685
00:47:45,820 --> 00:47:48,980
My failure probability
is less than this.

686
00:47:56,210 --> 00:48:01,000
Each time, I have
at least a half success rate,

687
00:48:01,000 --> 00:48:03,600
and I try this
many times, what's

688
00:48:03,600 --> 00:48:05,107
the chance of all
of them failing?

689
00:48:08,310 --> 00:48:09,820
This is 1 over n
raised to alpha.

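Assuming the log is base 2 and each independent draw succeeds with probability at least 1/2, the failure probability works out as:

```latex
\Pr[\text{all } \alpha \log_2 n \text{ draws fail}]
  \le \Bigl(\frac{1}{2}\Bigr)^{\alpha \log_2 n}
  = n^{-\alpha}
```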
690
00:48:15,601 --> 00:48:19,260
Of course, I can also say, I'll
keep trying until I succeed.

691
00:48:19,260 --> 00:48:22,360
Then I have a 100
percent success rate,

692
00:48:22,360 --> 00:48:26,960
but my runtime could
potentially go unbounded.

693
00:48:26,960 --> 00:48:29,180
Make sense?

694
00:48:29,180 --> 00:48:29,890
OK.

695
00:48:29,890 --> 00:48:32,700
This sounds like a
perfect solution.

696
00:48:32,700 --> 00:48:38,000
The only problem is that the
space complexity of this method

697
00:48:38,000 --> 00:48:43,480
is n squared, because I
choose my hash table size m

698
00:48:43,480 --> 00:48:45,474
to be n squared.

699
00:48:45,474 --> 00:48:46,890
So this is the
only thing we don't

700
00:48:46,890 --> 00:48:50,140
want in this simple method.

701
00:48:59,370 --> 00:49:05,970
Our final goal is to have
a perfect hash function that

702
00:49:05,970 --> 00:49:15,910
has space O of n, and also
runtime some polynomial in n,

703
00:49:15,910 --> 00:49:19,230
and failure probability
arbitrarily small.

704
00:49:19,230 --> 00:49:23,283
And the idea there is
this two-level hashing.

705
00:49:31,180 --> 00:49:37,470
So, I choose h1 first to
hash my keys into bins.

706
00:49:37,470 --> 00:49:41,170
And for each bin, say I get
l1 elements here, l2 elements

707
00:49:41,170 --> 00:49:44,280
here, so on and so forth.

708
00:49:44,280 --> 00:49:46,825
I'll choose each
of the bins to be

709
00:49:46,825 --> 00:49:50,630
a second-level perfect hash table.

710
00:49:50,630 --> 00:49:55,290
So we can use method 1
for each of these small tables.

711
00:49:55,290 --> 00:49:59,580
If I choose m1, which is the
hash table size of this guy,

712
00:49:59,580 --> 00:50:04,560
to be l1 squared, then I
know after alpha log n trials,

713
00:50:04,560 --> 00:50:07,580
this one should be
a perfect hashing.

714
00:50:07,580 --> 00:50:10,090
After another alpha
log n trials, I

715
00:50:10,090 --> 00:50:12,220
should resolve all
the conflicts in l2

716
00:50:12,220 --> 00:50:15,206
to make it a perfect hashing.

717
00:50:15,206 --> 00:50:16,194
Make sense?

718
00:50:19,160 --> 00:50:27,410
So after n log n trials, I
will resolve all the conflicts

719
00:50:27,410 --> 00:50:28,780
in my second level hashing.

720
00:50:28,780 --> 00:50:29,280
Question?

721
00:50:29,280 --> 00:50:30,200
AUDIENCE: It was
mentioned in the lecture

722
00:50:30,200 --> 00:50:33,040
that this only works if there
are no inserts or deletes,

723
00:50:33,040 --> 00:50:34,040
or something like that?

724
00:50:37,970 --> 00:50:39,720
LING REN: Let me think
about that offline.

725
00:50:39,720 --> 00:50:40,719
I'm not sure about that.

726
00:50:43,461 --> 00:50:43,960
OK.

727
00:50:43,960 --> 00:50:47,150
So the only remaining
problem is we

728
00:50:47,150 --> 00:50:50,270
need to figure out whether
we achieve this space O of n.

729
00:50:50,270 --> 00:50:53,210
What is the space
complexity of this algorithm?

730
00:50:53,210 --> 00:51:04,310
It's n plus the sum of l i squared, because
each table size is the square

731
00:51:04,310 --> 00:51:06,330
of the elements in it.

732
00:51:06,330 --> 00:51:10,070
And finally, we have
Markov's inequality,

733
00:51:10,070 --> 00:51:12,040
which we can use
to prove

734
00:51:12,040 --> 00:51:21,240
that this is the case:
my space is O of n,

735
00:51:21,240 --> 00:51:25,650
also with probability
greater than 1/2.

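One way to fill in this step, assuming the first-level table has m = n bins and h1 is drawn from a universal family: the sum of squared bin sizes counts each key once plus each colliding pair twice, so its expectation is below 2n, and Markov's inequality finishes the argument.

```latex
\begin{aligned}
E\Bigl[\textstyle\sum_i l_i^2\Bigr] &= n + 2\,E\bigl[\#\{\text{colliding pairs}\}\bigr]
  \le n + 2\binom{n}{2}\frac{1}{n} = 2n - 1 < 2n,\\
\Pr\Bigl[\textstyle\sum_i l_i^2 \ge 4n\Bigr] &\le \frac{E\bigl[\sum_i l_i^2\bigr]}{4n} < \frac{1}{2}.
\end{aligned}
```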
736
00:51:25,650 --> 00:51:27,150
I can keep going.

737
00:51:27,150 --> 00:51:32,960
I'll try alpha log n times on
my first level hash function,

738
00:51:32,960 --> 00:51:35,960
until my space is O of n.

739
00:51:35,960 --> 00:51:38,380
Once I get to that
point, I'll try

740
00:51:38,380 --> 00:51:40,440
choosing universal hash
functions for my smaller

741
00:51:40,440 --> 00:51:43,625
tables, until I succeed.

742
00:51:43,625 --> 00:51:44,125
OK?

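Putting the pieces together, here is a sketch of the two-level scheme in Python. It is our own illustrative code, not the lecture's: it assumes a static set of distinct non-negative integer keys below the prime 2^31 - 1, uses m = n first-level bins, and redraws h1 until the sum of squared bin sizes is at most 4n (the constant used in the CLRS version of this analysis), then applies method 1 inside each bin. Rather than capping retries at alpha log n, it simply retries until success, the Las Vegas variant. The class name `TwoLevelTable` is hypothetical.

```python
import random
from typing import Callable, List, Optional

P = 2_147_483_647  # 2^31 - 1, a prime larger than every key we store


def draw_hash(m: int, rng: random.Random) -> Callable[[int], int]:
    """Random member of the universal family ((a*k + b) mod P) mod m."""
    a, b = rng.randrange(1, P), rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m


class TwoLevelTable:
    """Static two-level perfect hash table for a fixed set of distinct keys."""

    def __init__(self, keys: List[int], rng: random.Random) -> None:
        n = max(1, len(keys))
        # Level 1: n bins; redraw h1 until the second-level space sum(l_i^2) is at most 4n.
        while True:
            self.h1 = draw_hash(n, rng)
            bins: List[List[int]] = [[] for _ in range(n)]
            for k in keys:
                bins[self.h1(k)].append(k)
            if sum(len(bucket) ** 2 for bucket in bins) <= 4 * n:
                break
        # Level 2: a bin with l_i keys gets l_i^2 slots and a collision-free hash (method 1).
        self.h2: List[Callable[[int], int]] = []
        self.slots: List[List[Optional[int]]] = []
        for bucket in bins:
            size = len(bucket) ** 2
            while True:
                h = draw_hash(max(1, size), rng)
                table: List[Optional[int]] = [None] * size
                for k in bucket:
                    if table[h(k)] is not None:  # collision: try another h
                        break
                    table[h(k)] = k
                else:  # every key found an empty slot
                    break
            self.h2.append(h)
            self.slots.append(table)

    def __contains__(self, key: int) -> bool:
        i = self.h1(key)
        table = self.slots[i]
        return bool(table) and table[self.h2[i](key)] == key
```

As written, the table is built once for a fixed key set and has no insert or delete operation; its space is n first-level entries plus at most 4n second-level slots.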
743
00:51:51,010 --> 00:51:54,536
That's it for hashing and DP.