1
00:00:00,040 --> 00:00:02,480
The following content is
provided under a Creative

2
00:00:02,480 --> 00:00:04,010
Commons license.

3
00:00:04,010 --> 00:00:06,340
Your support will help
MIT OpenCourseWare

4
00:00:06,340 --> 00:00:10,690
continue to offer high quality
educational resources for free.

5
00:00:10,690 --> 00:00:13,320
To make a donation or
view additional materials

6
00:00:13,320 --> 00:00:17,035
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:17,035 --> 00:00:17,660
at ocw.mit.edu.

8
00:00:21,264 --> 00:00:22,430
SRINIVAS DEVADAS: All right.

9
00:00:22,430 --> 00:00:23,672
Good morning, everyone.

10
00:00:23,672 --> 00:00:26,060
And let's get started.

11
00:00:26,060 --> 00:00:30,130
Today's lecture is about a
randomized data structure

12
00:00:30,130 --> 00:00:32,200
called the skip list.

13
00:00:32,200 --> 00:00:37,390
And it's a data structure
that, obviously because it's

14
00:00:37,390 --> 00:00:41,360
randomized, we'd have to do
a probabilistic analysis for.

15
00:00:41,360 --> 00:00:44,740
And we're going to sort of raise
the stakes here a little bit

16
00:00:44,740 --> 00:00:48,010
with respect to
our expectation--

17
00:00:48,010 --> 00:00:52,640
pun intended-- of this
data structure-- in the sense

18
00:00:52,640 --> 00:00:57,410
that we're not going to be happy
with just doing an expected

19
00:00:57,410 --> 00:01:01,290
value analysis or to get
what the expectation is

20
00:01:01,290 --> 00:01:05,129
of the search complexity
in a skip list.

21
00:01:05,129 --> 00:01:10,030
We're going to introduce this
notion of "with high probability,"

22
00:01:10,030 --> 00:01:13,630
which is a stronger notion than
just giving you the expected

23
00:01:13,630 --> 00:01:19,020
value or the expectation for
the complexity of a search

24
00:01:19,020 --> 00:01:20,090
algorithm.

25
00:01:20,090 --> 00:01:24,690
And we're going to prove
that under this notion,

26
00:01:24,690 --> 00:01:28,500
that search has a
particular complexity

27
00:01:28,500 --> 00:01:30,520
with high probability.

28
00:01:30,520 --> 00:01:34,480
So we'll get to the with high
probability part a little bit

29
00:01:34,480 --> 00:01:36,330
later in the lecture,
but we're just

30
00:01:36,330 --> 00:01:41,470
going to start off doing some
cool data structure design,

31
00:01:41,470 --> 00:01:45,210
I guess, [INAUDIBLE]
pointing to the skip list.

32
00:01:45,210 --> 00:01:48,810
The skip list is a relatively
young data structure

33
00:01:48,810 --> 00:01:54,290
invented by a guy called
Bill Pugh in 1989,

34
00:01:54,290 --> 00:01:56,597
so not much older than you guys.

35
00:01:59,470 --> 00:02:04,240
It's relatively easy to
implement as you'll see.

36
00:02:04,240 --> 00:02:07,480
I won't really claim
that, but hopefully you'll

37
00:02:07,480 --> 00:02:11,700
be convinced by the time we're
done describing the structure.

38
00:02:11,700 --> 00:02:18,004
Especially in comparison
to balanced trees.

39
00:02:20,670 --> 00:02:24,860
And we can do a comparison after
we do our analysis of the data

40
00:02:24,860 --> 00:02:30,290
structure as to what the
complexity comparisons are

41
00:02:30,290 --> 00:02:33,920
for search and insert
when you take a skip list

42
00:02:33,920 --> 00:02:37,310
and compare it to an
AVL tree, for example,

43
00:02:37,310 --> 00:02:41,820
or a red black tree, et cetera.

44
00:02:41,820 --> 00:02:44,240
In general, when we
have a data structure,

45
00:02:44,240 --> 00:02:46,840
we want it to be dynamic.

46
00:02:46,840 --> 00:02:53,530
The skip list maintains
a dynamic set.

47
00:02:53,530 --> 00:02:55,970
What that means
is not only do you

48
00:02:55,970 --> 00:02:58,810
want to search on
it-- obviously it's

49
00:02:58,810 --> 00:03:03,170
uninteresting to have a static
data structure and do a search.

50
00:03:03,170 --> 00:03:05,750
You want to be able to
change it, want to be

51
00:03:05,750 --> 00:03:08,390
able to insert values into it.

52
00:03:08,390 --> 00:03:10,600
There's a complexity of
insert to worry about.

53
00:03:10,600 --> 00:03:13,170
You want to be able
to delete values.

54
00:03:13,170 --> 00:03:15,110
And the richness of
the data structure

55
00:03:15,110 --> 00:03:17,800
comes from the operations
and the augmentations

56
00:03:17,800 --> 00:03:22,440
you can do on it, and the skip
lists are no exception to that.

57
00:03:22,440 --> 00:03:25,870
So if you want to maintain
a dynamic set of n elements,

58
00:03:25,870 --> 00:03:29,320
and you obviously know a ton
of data structures to do this,

59
00:03:29,320 --> 00:03:32,630
each of which has
different characteristics.

60
00:03:32,630 --> 00:03:37,050
And this is, if you
ignore hash tables,

61
00:03:37,050 --> 00:03:44,280
this is your first real
randomized data structure,

62
00:03:44,280 --> 00:03:48,740
if you've just taken
6.006 and this class.

63
00:03:48,740 --> 00:03:51,875
You might have seen randomized
structures in other classes.

64
00:03:55,050 --> 00:04:00,320
So we're going to try and
do this in order log n time.

65
00:04:00,320 --> 00:04:04,190
As you know with
balanced binary trees,

66
00:04:04,190 --> 00:04:07,470
you can do things in order
log n time, a ton of things,

67
00:04:07,470 --> 00:04:12,070
pretty much everything
that is interesting.

68
00:04:12,070 --> 00:04:16,010
And this, given that
it's randomized,

69
00:04:16,010 --> 00:04:19,360
it's a relatively
easy analysis to show

70
00:04:19,360 --> 00:04:25,210
that the expectation or the
expected value of a search

71
00:04:25,210 --> 00:04:28,750
would be order log
n expected time.

72
00:04:28,750 --> 00:04:31,550
But we're going to, as I
said, raise the stakes,

73
00:04:31,550 --> 00:04:36,020
and we're going to
spend a ton of time

74
00:04:36,020 --> 00:04:40,610
the second half of this showing
the with high probability

75
00:04:40,610 --> 00:04:43,850
result. And that's
a stronger result

76
00:04:43,850 --> 00:04:47,790
than just saying that search
takes expected order log n

77
00:04:47,790 --> 00:04:49,120
time.

78
00:04:49,120 --> 00:04:51,720
All right, so
that's the context.

79
00:04:51,720 --> 00:04:55,720
You can think of a
skip list as beginning

80
00:04:55,720 --> 00:05:00,800
with a simple linked list.

81
00:05:00,800 --> 00:05:08,640
So if we have one linked list
and that linked list-- let's

82
00:05:08,640 --> 00:05:16,610
first think of this
as being unsorted.

83
00:05:16,610 --> 00:05:19,660
So suppose I have a linked
list which is unsorted

84
00:05:19,660 --> 00:05:23,650
and I want to search for a
particular value in this linked

85
00:05:23,650 --> 00:05:24,380
list.

86
00:05:24,380 --> 00:05:29,320
And we can assume that this
is a doubly-linked list,

87
00:05:29,320 --> 00:05:32,640
so the arrows go both ways.

88
00:05:32,640 --> 00:05:38,660
You have a pointer, let's say,
just to the first element.

89
00:05:38,660 --> 00:05:41,920
So if you have a
list that's unsorted

90
00:05:41,920 --> 00:05:45,030
and you want to
search for an element,

91
00:05:45,030 --> 00:05:47,270
you would want to do
a membership query.

92
00:05:47,270 --> 00:05:50,906
If there's n elements,
the complexity is?

93
00:05:50,906 --> 00:05:51,860
AUDIENCE: Order n.

94
00:05:51,860 --> 00:05:53,140
SRINIVAS DEVADAS: Order n.

95
00:05:53,140 --> 00:06:00,370
So for a linked list, the search
takes order n time.

96
00:06:00,370 --> 00:06:05,630
Now let's go ahead and say
that we are sorting this list,

97
00:06:05,630 --> 00:06:07,930
so it's a sorted linked list.

98
00:06:07,930 --> 00:06:17,190
So your values here,
14, 23, 34, 42, 50, 59.

99
00:06:17,190 --> 00:06:20,850
They're sorted in
ascending order.

100
00:06:20,850 --> 00:06:25,040
You still only have a pointer
to the front of the list

101
00:06:25,040 --> 00:06:27,710
and it's a
doubly-linked list, what

102
00:06:27,710 --> 00:06:34,090
is the complexity of search
in the sorted linked list?

103
00:06:34,090 --> 00:06:35,570
AUDIENCE: Log n.

104
00:06:35,570 --> 00:06:36,810
SRINIVAS DEVADAS: Log n.

105
00:06:36,810 --> 00:06:39,200
Oh, I wanted to hear that.

106
00:06:39,200 --> 00:06:40,724
Because it is?

107
00:06:40,724 --> 00:06:41,474
AUDIENCE: Order n.

108
00:06:41,474 --> 00:06:42,520
SRINIVAS DEVADAS: It's order n.

109
00:06:42,520 --> 00:06:44,400
log n is-- yeah, that
was a trick question.

110
00:06:49,310 --> 00:06:51,910
Because I liked that answer,
the person who said log n

111
00:06:51,910 --> 00:06:52,550
gets a Frisbee.

112
00:06:55,790 --> 00:06:56,800
This person won't admit.

113
00:06:56,800 --> 00:06:58,270
[LAUGHTER]

114
00:06:58,270 --> 00:06:59,210
Oh, it was you.

115
00:06:59,210 --> 00:07:00,190
OK, all right.

116
00:07:00,190 --> 00:07:02,200
There you go.

117
00:07:02,200 --> 00:07:03,940
All right.

118
00:07:03,940 --> 00:07:12,960
So log n would imply that
you have random access.

119
00:07:12,960 --> 00:07:16,090
If you have an array that's
sorted and you can go

120
00:07:16,090 --> 00:07:19,660
to A[i], and you can go
to A[i/2],

121
00:07:19,660 --> 00:07:23,150
or A[2i], and you can
go directly to that element,

122
00:07:23,150 --> 00:07:27,650
then you can do binary search
and you can get a log n,

123
00:07:27,650 --> 00:07:29,010
order log n.

124
00:07:29,010 --> 00:07:33,210
But here, the sorting
actually doesn't help you

125
00:07:33,210 --> 00:07:38,200
with respect to the
search simply because you

126
00:07:38,200 --> 00:07:40,890
have to start from
the beginning,

127
00:07:40,890 --> 00:07:43,430
from the front of the list,
and you've got to keep walking.

128
00:07:43,430 --> 00:07:47,390
The only place that it helps you
is that if you know it's sorted

129
00:07:47,390 --> 00:07:50,850
and you're looking
for 37, you can

130
00:07:50,850 --> 00:07:54,840
stop after you see 42, right?

131
00:07:54,840 --> 00:07:57,030
That's pretty much the only
place that it helps you.

132
00:07:57,030 --> 00:07:58,740
But it's still
order n because that

133
00:07:58,740 --> 00:08:00,650
could happen-- on
average it's going

134
00:08:00,650 --> 00:08:04,410
to happen halfway through the
list for a given membership

135
00:08:04,410 --> 00:08:05,230
query.

136
00:08:05,230 --> 00:08:10,350
So it's still order n
for a sorted linked list.
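A minimal sketch of that sorted-list walk, assuming a hypothetical Node class and a search_sorted helper that are not from the lecture. It uses a singly linked list because search only ever walks right; the early stop on an overshoot is the only benefit sorting gives here, and the step count is still order n.
```python
class Node:
    """One cell of a sorted linked list (hypothetical helper type)."""
    def __init__(self, key, next=None):
        self.key = key
        self.next = next
#
def from_sorted(keys):
    """Build a linked list from keys that are already in ascending order."""
    head = None
    for key in reversed(keys):
        head = Node(key, head)
    return head
#
def search_sorted(head, target):
    """Return (found, steps), stopping early once a key exceeds target."""
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if node.key == target:
            return True, steps
        if node.key > target:
            # Sorted order means target cannot appear further right.
            return False, steps
        node = node.next
    return False, steps
```
With the six values on the board, looking for 37 stops at 42 after four cells, while looking for 59 still walks all six.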

137
00:08:10,350 --> 00:08:15,110
But now let's say that we
had two sorted linked lists.

138
00:08:20,480 --> 00:08:24,710
And how are these two
linked lists structured?

139
00:08:24,710 --> 00:08:27,080
Well, they're structured
in a certain way,

140
00:08:27,080 --> 00:08:32,270
and let me draw our canonical
example for skip list

141
00:08:32,270 --> 00:08:34,090
that I'm going to
keep coming back to.

142
00:08:34,090 --> 00:08:39,974
So I won't erase this, but I'll
draw one out-- 1, 2, 3, 4, 5,

143
00:08:39,974 --> 00:08:42,605
6, 7, 8, 9-- 9 elements.

144
00:08:56,060 --> 00:09:01,450
So that's my first
list which is sorted,

145
00:09:01,450 --> 00:09:18,110
and so I have 14, 23, 34,
42, 50, 59, 66, 72, and 79.

146
00:09:18,110 --> 00:09:20,590
What I'm going to have
now is another list sort

147
00:09:20,590 --> 00:09:22,830
of on top of this.

148
00:09:22,830 --> 00:09:31,660
I can move from top
to bottom, et cetera.

149
00:09:31,660 --> 00:09:36,230
But I'm not going
to have elements

150
00:09:36,230 --> 00:09:38,733
on top of each bottom element.

151
00:09:42,090 --> 00:09:44,650
By convention, I'm
going to have elements

152
00:09:44,650 --> 00:09:47,490
on top of the first
element, regardless

153
00:09:47,490 --> 00:09:48,870
of how many lists I have.

154
00:09:48,870 --> 00:09:50,840
We only have two at this point.

155
00:09:50,840 --> 00:09:54,100
And so I see a 14, which
is exactly the same element

156
00:09:54,100 --> 00:09:57,280
duplicated up on the top list.

157
00:09:57,280 --> 00:09:59,970
And that list is
also sorted, but I

158
00:09:59,970 --> 00:10:04,340
won't have all of the
elements in the top list.

159
00:10:04,340 --> 00:10:06,860
I'm just picking a couple here.

160
00:10:06,860 --> 00:10:11,770
So I've got 34, 42--
they're adjacent here--

161
00:10:11,770 --> 00:10:16,070
and then I go all
the way up to 72,

162
00:10:16,070 --> 00:10:18,740
and I duplicate it, et cetera.

163
00:10:22,100 --> 00:10:24,227
Now, this looks kind of random.

164
00:10:24,227 --> 00:10:25,560
Anybody recognize these numbers?

165
00:10:31,460 --> 00:10:36,180
No one from the great
City of New York?

166
00:10:36,180 --> 00:10:36,760
No?

167
00:10:36,760 --> 00:10:37,290
Yup, yup.

168
00:10:37,290 --> 00:10:38,540
AUDIENCE: On the subway stops?

169
00:10:38,540 --> 00:10:41,520
SRINIVAS DEVADAS: Yeah, subway
stops on the Seventh Avenue

170
00:10:41,520 --> 00:10:42,465
Express Line.

171
00:10:49,640 --> 00:10:53,910
So this is exactly the notion
of a skip list, the fact

172
00:10:53,910 --> 00:10:57,152
that you have--
could you stand up?

173
00:10:59,992 --> 00:11:02,970
Great.

174
00:11:02,970 --> 00:11:04,520
All right.

175
00:11:04,520 --> 00:11:09,210
So the notion here
is that you don't

176
00:11:09,210 --> 00:11:14,370
have to make a lot of stops if
you know you have to go far.

177
00:11:14,370 --> 00:11:20,340
So if you want to go from
14th Street to 72nd Street,

178
00:11:20,340 --> 00:11:22,960
you just take the express line.

179
00:11:22,960 --> 00:11:27,426
But if you want to go to 66th
Street, what would you do?

180
00:11:27,426 --> 00:11:30,830
AUDIENCE: Go to 72nd
and then go back.

181
00:11:30,830 --> 00:11:33,710
SRINIVAS DEVADAS:
Well, that's one way.

182
00:11:33,710 --> 00:11:34,790
That's one way.

183
00:11:34,790 --> 00:11:37,156
That's not the way I wanted.

184
00:11:37,156 --> 00:11:38,780
The way we're going
to do this is we're

185
00:11:38,780 --> 00:11:40,340
not going to overshoot.

186
00:11:40,340 --> 00:11:43,360
So we want to minimize
distance, let's say.

187
00:11:43,360 --> 00:11:46,240
So our secondary
thing is going to be

188
00:11:46,240 --> 00:11:49,730
minimizing distance traveled.

189
00:11:49,730 --> 00:11:53,610
And so you're going to pop
up to the express line, go

190
00:11:53,610 --> 00:11:58,400
all the way to 42nd
Street, and you're

191
00:11:58,400 --> 00:12:01,100
going to say if I go to the
next stop on the Express Line,

192
00:12:01,100 --> 00:12:02,610
I'm going too far.

193
00:12:02,610 --> 00:12:05,630
And so you're going to pop
down to the local line.

194
00:12:05,630 --> 00:12:09,240
So you can think of this as
being linked list L0 and linked

195
00:12:09,240 --> 00:12:10,482
list L1.

196
00:12:10,482 --> 00:12:12,190
You're going to pop
down, and then you're

197
00:12:12,190 --> 00:12:15,150
going to go to 66th Street.

198
00:12:15,150 --> 00:12:25,520
So search 66 will be
going from 14 to 42

199
00:12:25,520 --> 00:12:34,730
on L1, and then from 42,
let's just say that's walking.

200
00:12:34,730 --> 00:12:37,690
42 to 42, L1 to L0.

201
00:12:43,510 --> 00:12:47,310
And then 42 to 66 on L0.

202
00:12:50,360 --> 00:12:53,200
So that's the basic
notion of a skip list.

203
00:12:53,200 --> 00:12:57,660
So you can see that it's
really pretty simple.

204
00:12:57,660 --> 00:13:02,900
What we're going to do
now is do two things.

205
00:13:02,900 --> 00:13:08,090
I want to think about this
double-sorted list as a data

206
00:13:08,090 --> 00:13:12,590
structure in its own right
before I dive into skip lists

207
00:13:12,590 --> 00:13:13,920
in general.

208
00:13:13,920 --> 00:13:21,990
And I want to analyze at some
level, the best case situation

209
00:13:21,990 --> 00:13:24,870
for worst case complexity.

210
00:13:24,870 --> 00:13:30,970
And by that I mean I want to
structure the express stops

211
00:13:30,970 --> 00:13:33,490
in the best manner possible.

212
00:13:33,490 --> 00:13:35,660
These stops are very
structured for passengers

213
00:13:35,660 --> 00:13:38,980
because they figured fancy
stops on 42nd Street, whatever--

214
00:13:38,980 --> 00:13:40,190
fancy stores.

215
00:13:40,190 --> 00:13:42,980
Everybody wants to go there
and so on and so forth.

216
00:13:42,980 --> 00:13:46,890
So you have 34 pretty close
to 42 because they're both

217
00:13:46,890 --> 00:13:49,720
popular destinations.

218
00:13:49,720 --> 00:13:53,250
But let's say that
things were, I

219
00:13:53,250 --> 00:13:57,920
guess more egalitarian and
randomized, if you will.

220
00:13:57,920 --> 00:14:00,310
And what I want to do
is I want to structure

221
00:14:00,310 --> 00:14:07,030
this double-sorted list so
I get the best worst case

222
00:14:07,030 --> 00:14:10,330
complexity for search.

223
00:14:10,330 --> 00:14:12,750
And so let's do that.

224
00:14:12,750 --> 00:14:20,019
And before I do that, let me
write out the search algorithm,

225
00:14:20,019 --> 00:14:21,310
which is going to be important.

226
00:14:21,310 --> 00:14:24,960
I want you to assimilate
this, keep it in your head

227
00:14:24,960 --> 00:14:27,000
because we're going
to analyze search

228
00:14:27,000 --> 00:14:29,250
pretty much for the rest
of the morning here.

229
00:14:29,250 --> 00:14:31,930
And so I'll write this down.

230
00:14:31,930 --> 00:14:37,280
You've got a sense of what it
is based on what I just did here

231
00:14:37,280 --> 00:14:43,930
with this example of 66,
but worth writing down.

232
00:14:43,930 --> 00:14:47,060
We're going to walk right
in the top linked list,

233
00:14:47,060 --> 00:14:50,278
so this is simply
for two linked lists,

234
00:14:50,278 --> 00:14:53,390
and we'll generalize
at some point.

235
00:14:53,390 --> 00:15:00,260
So we want to walk right
in the top linked list, L1,

236
00:15:00,260 --> 00:15:06,956
until going right
would go too far.

237
00:15:11,060 --> 00:15:15,210
Now, there was this answer with
72, which I kind of dismissed.

238
00:15:15,210 --> 00:15:19,260
But there's no reason
why you can't overshoot

239
00:15:19,260 --> 00:15:21,184
one stop and go backwards.

240
00:15:21,184 --> 00:15:23,100
It would just be a
different search algorithm.

241
00:15:23,100 --> 00:15:25,380
It's not something we're
going to analyze here.

242
00:15:25,380 --> 00:15:29,770
It turns out in analyzing that
with high probability would

243
00:15:29,770 --> 00:15:32,840
be even more painful
than the painful analysis

244
00:15:32,840 --> 00:15:34,290
we're going to do.

245
00:15:34,290 --> 00:15:35,355
So we won't go there.

246
00:15:39,690 --> 00:15:45,880
And then we walk down
to the bottom list.

247
00:15:52,350 --> 00:15:54,940
And the bottom
list we'll call L0.

248
00:15:57,930 --> 00:16:11,960
And walk right in L0 until
the element is found or not.

249
00:16:11,960 --> 00:16:13,630
And you know it's not
there if you've overshot.

250
00:16:16,220 --> 00:16:21,130
So if you're looking here for
67, when you get to 72

251
00:16:21,130 --> 00:16:24,540
here-- you've seen
66 and you get to 72

252
00:16:24,540 --> 00:16:27,450
and you're looking
for 67, search fails.

253
00:16:27,450 --> 00:16:28,730
It stops and fails.

254
00:16:28,730 --> 00:16:30,880
Doesn't succeed in this case.

255
00:16:30,880 --> 00:16:34,870
So that's what we
got for search.

256
00:16:34,870 --> 00:16:38,300
And that's our two
linked list argument.
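The three steps just written down can be sketched as follows. This is one possible rendering, not the lecture's own code: the Node class, build_two_lists, and search are hypothetical names, and the sketch assumes the first key appears in both lists, as in the convention above.
```python
class Node:
    """A cell with a right pointer and a down pointer (hypothetical type)."""
    def __init__(self, key):
        self.key = key
        self.right = None
        self.down = None
#
def build_two_lists(all_keys, express_keys):
    """Build L0 from all_keys and L1 from express_keys; return the head of L1."""
    bottom = {}
    prev = None
    for key in all_keys:
        node = Node(key)
        bottom[key] = node
        if prev is not None:
            prev.right = node
        prev = node
    head = None
    prev = None
    for key in express_keys:
        node = Node(key)
        node.down = bottom[key]  # each express stop sits on top of its local stop
        if prev is None:
            head = node
        else:
            prev.right = node
        prev = node
    return head
#
def search(head, target):
    """Return (found, path), where path lists the (level, key) cells visited."""
    node = head
    path = [("L1", node.key)]
    # Walk right in L1 until going right would go too far.
    while node.right is not None and node.right.key <= target:
        node = node.right
        path.append(("L1", node.key))
    # Walk down to L0.
    node = node.down
    path.append(("L0", node.key))
    # Walk right in L0 until the element is found or the next key overshoots.
    while node.right is not None and node.right.key <= target:
        node = node.right
        path.append(("L0", node.key))
    return node.key == target, path
```
On the subway example, searching for 66 rides L1 from 14 through 34 to 42, drops to L0 at 42, and then walks 50, 59, 66. Searching for 67 takes the same path and fails because the next stop, 72, would overshoot.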

257
00:16:38,300 --> 00:16:45,040
Now, our analysis essentially
says what I have is

258
00:16:45,040 --> 00:16:53,090
I'm walking right at
the bottom list here,

259
00:16:53,090 --> 00:16:59,530
and my top list is L1,
so I start with L1.

260
00:17:02,845 --> 00:17:04,780
And my search cost
is going to be

261
00:17:04,780 --> 00:17:12,420
approximately the length of L1.

262
00:17:12,420 --> 00:17:17,470
The worst case analysis,
I could go all the way

263
00:17:17,470 --> 00:17:22,030
on the top list-- it's possible.

264
00:17:22,030 --> 00:17:30,680
But for a given value, I'm
going to be looking at only

265
00:17:30,680 --> 00:17:34,730
a portion of the bottom list.

266
00:17:34,730 --> 00:17:37,700
I'm not going to go all the
way on the bottom list ever.

267
00:17:37,700 --> 00:17:40,250
I'm only going to be
looking at a portion of it.

268
00:17:40,250 --> 00:17:44,220
So it's going to be
L0 divided by L1,

269
00:17:44,220 --> 00:17:51,180
if I have interspersed my
express stops in a uniform way.

270
00:17:51,180 --> 00:17:54,600
So there's no reason--
if I have 100 elements

271
00:17:54,600 --> 00:17:59,750
in the bottom list,
and if I had five,

272
00:17:59,750 --> 00:18:02,160
just for argument sake,
five in the top list,

273
00:18:02,160 --> 00:18:08,060
then I'd put them at, let's
say, the 0 position, 20, 40, 60,

274
00:18:08,060 --> 00:18:09,260
et cetera.

275
00:18:09,260 --> 00:18:11,540
So I want to have
roughly equal spacings.

276
00:18:11,540 --> 00:18:15,505
But we need to make that
a little more concrete,

277
00:18:15,505 --> 00:18:17,600
and a little more precise.

278
00:18:17,600 --> 00:18:19,210
And what I'm saying
here simply is

279
00:18:19,210 --> 00:18:23,300
that this is the cost of
traversal in the top list,

280
00:18:23,300 --> 00:18:26,022
and this is the cost of
traversal in the bottom list,

281
00:18:26,022 --> 00:18:28,480
because I'm not going to go
all the way in the bottom list.

282
00:18:28,480 --> 00:18:31,650
I'm only going to go a
portion on the bottom list.

283
00:18:31,650 --> 00:18:33,152
Everybody gets that?

284
00:18:33,152 --> 00:18:34,400
Yup?

285
00:18:34,400 --> 00:18:35,830
All right, good.

286
00:18:35,830 --> 00:18:38,860
So if I want to minimize
this cost, which

287
00:18:38,860 --> 00:18:44,250
is going to tell
me how to scatter

288
00:18:44,250 --> 00:18:50,230
these elements in the top list,
how to choose my express stops,

289
00:18:50,230 --> 00:18:54,560
if you will-- I want to
scatter these in a uniform way,

290
00:18:54,560 --> 00:19:01,908
then this is minimized
when terms are equal.

291
00:19:04,920 --> 00:19:07,510
You could go off and
differentiate and do that.

292
00:19:07,510 --> 00:19:09,670
It's fairly standard.

293
00:19:09,670 --> 00:19:16,950
And what you end up getting
is you want to get L1 squared

294
00:19:16,950 --> 00:19:20,510
equals L0 equals n.

295
00:19:20,510 --> 00:19:23,460
So all of the elements are
down at the bottom list,

296
00:19:23,460 --> 00:19:27,730
and so the cardinality
of the bottom list is n.

297
00:19:27,730 --> 00:19:34,980
And roughly speaking, you're
going to end up optimizing,

298
00:19:34,980 --> 00:19:39,950
if you have this satisfied,
which means that L1 is

299
00:19:39,950 --> 00:19:42,010
going to be square root of n.

300
00:19:42,010 --> 00:19:43,310
OK?

301
00:19:43,310 --> 00:19:46,870
So what you've done here is
you've said a bunch of things,

302
00:19:46,870 --> 00:19:48,030
actually.

303
00:19:48,030 --> 00:19:50,760
You've decided how
many elements are

304
00:19:50,760 --> 00:19:53,580
going to be in your top list.

305
00:19:53,580 --> 00:19:55,450
If there's n elements
in the bottom list,

306
00:19:55,450 --> 00:19:59,580
you want to have the square root
of n elements in the top list.

307
00:19:59,580 --> 00:20:02,100
And not only that,
in order to make sure

308
00:20:02,100 --> 00:20:07,910
that this works properly, and
that you don't get a worst case

309
00:20:07,910 --> 00:20:10,930
cost that is not
optimal, you do have

310
00:20:10,930 --> 00:20:15,340
to intersperse the
square root of n elements

311
00:20:15,340 --> 00:20:19,630
at regular intervals in
relation to the bottom list

312
00:20:19,630 --> 00:20:21,020
on the top list.

313
00:20:21,020 --> 00:20:24,820
OK, so pictorially
what this means is it's

314
00:20:24,820 --> 00:20:26,730
not what you have here.

315
00:20:26,730 --> 00:20:29,780
What you really
want is something

316
00:20:29,780 --> 00:20:40,920
that, let's say, looks like
this where this part here

317
00:20:40,920 --> 00:20:45,750
is square root of n elements
up until that point,

318
00:20:45,750 --> 00:20:50,190
and then let's say we
go from here to here

319
00:20:50,190 --> 00:20:52,930
or square root of n
elements, and maybe I'll

320
00:20:52,930 --> 00:20:57,610
have a 66 here because
that's exactly where I

321
00:20:57,610 --> 00:20:59,180
want my square root of n.

322
00:20:59,180 --> 00:21:02,130
Basically, three
elements in between.

323
00:21:02,130 --> 00:21:06,870
So I got 66 here, et cetera.

324
00:21:06,870 --> 00:21:09,310
I mean I chose n to be
a particular value here,

325
00:21:09,310 --> 00:21:12,230
but you get the picture.

326
00:21:12,230 --> 00:21:15,210
So the search now, as you can
see if you just add those up

327
00:21:15,210 --> 00:21:17,240
you get square root
of n here, and you

328
00:21:17,240 --> 00:21:19,270
got n divided by
square root of n here.

329
00:21:19,270 --> 00:21:21,060
So that's square
root of n as well.

330
00:21:21,060 --> 00:21:29,530
So the search cost is
order square root of n.
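The "differentiate and do that" step can be written out as a short derivation. This is a sketch under the lecture's approximation that the cost is the length of L1 plus the length of one L0 segment, and it treats the size of L1 as a continuous variable, which is an assumption made only for the calculus.
```latex
\begin{align*}
\text{cost}(|L_1|) &\approx |L_1| + \frac{|L_0|}{|L_1|} = |L_1| + \frac{n}{|L_1|} \\
\frac{d\,\text{cost}}{d|L_1|} &= 1 - \frac{n}{|L_1|^2} = 0 \implies |L_1|^2 = n \implies |L_1| = \sqrt{n} \\
\text{cost}(\sqrt{n}) &\approx \sqrt{n} + \frac{n}{\sqrt{n}} = 2\sqrt{n} = O(\sqrt{n})
\end{align*}
```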

331
00:21:29,530 --> 00:21:30,766
And so that's it.

332
00:21:30,766 --> 00:21:36,660
That's the first
generalization, and really

333
00:21:36,660 --> 00:21:39,900
the most important
one, that comes

334
00:21:39,900 --> 00:21:42,580
from going from a
single sorted list

335
00:21:42,580 --> 00:21:47,370
to an approximation
of a skip list.

336
00:21:47,370 --> 00:21:51,420
So what do you do if you
want to make things better?

337
00:21:51,420 --> 00:21:52,970
Do we want to make
things better?

338
00:21:52,970 --> 00:21:54,609
Are we happy with
square root of n?

339
00:21:54,609 --> 00:21:55,150
AUDIENCE: No.

340
00:21:55,150 --> 00:21:55,500
SRINIVAS DEVADAS: No.

341
00:21:55,500 --> 00:21:56,966
Well, what's our target?

342
00:21:56,966 --> 00:21:57,760
AUDIENCE: Log n.

343
00:21:57,760 --> 00:21:59,370
SRINIVAS DEVADAS:
Log n, obviously.

344
00:21:59,370 --> 00:22:01,328
Well, I guess you can
argue that our target may

345
00:22:01,328 --> 00:22:03,880
be order 1 at some point,
but for today's lecture

346
00:22:03,880 --> 00:22:06,960
it is order log n
with high probability.

347
00:22:06,960 --> 00:22:08,150
We'll leave it at that.

348
00:22:08,150 --> 00:22:14,940
And so what do you do if
you want to go this way

349
00:22:14,940 --> 00:22:16,370
and generalize?

350
00:22:16,370 --> 00:22:18,525
You simply add more lists.

351
00:22:18,525 --> 00:22:20,650
I mean it seems to be pretty
much the only thing we

352
00:22:20,650 --> 00:22:22,050
could do here.

353
00:22:22,050 --> 00:22:25,900
So let's go ahead
and add a third list.

354
00:22:25,900 --> 00:22:32,710
So if you have two
sorted lists, that

355
00:22:32,710 --> 00:22:34,560
implies I have 2
square root of n.

356
00:22:34,560 --> 00:22:37,240
If I want to be explicit
about the constant in terms

357
00:22:37,240 --> 00:22:40,050
of the search cost,
assuming things

358
00:22:40,050 --> 00:22:42,900
are interspersed exactly right.

359
00:22:42,900 --> 00:22:45,560
Keep that in mind because
that is going to go away

360
00:22:45,560 --> 00:22:47,000
when we go and randomize.

361
00:22:47,000 --> 00:22:50,220
We're going to be flipping
coins and things like that.

362
00:22:50,220 --> 00:22:53,880
But so far, things
are very structured.

363
00:22:53,880 --> 00:22:59,690
What do you think-- we won't
do this analysis-- the cost is

364
00:22:59,690 --> 00:23:04,410
going to be if I
intersperse optimally, what

365
00:23:04,410 --> 00:23:08,830
is the cost going
to be for a search

366
00:23:08,830 --> 00:23:12,575
when I have three sorted lists?

367
00:23:12,575 --> 00:23:13,624
AUDIENCE: Cube root.

368
00:23:13,624 --> 00:23:14,790
SRINIVAS DEVADAS: Cube root.

369
00:23:14,790 --> 00:23:15,540
Great guess.

370
00:23:15,540 --> 00:23:16,930
Who said cube root?

371
00:23:16,930 --> 00:23:17,960
AUDIENCE: [INAUDIBLE].

372
00:23:17,960 --> 00:23:19,834
SRINIVAS DEVADAS: You
already have a Frisbee.

373
00:23:19,834 --> 00:23:20,780
Give it to a friend.

374
00:23:20,780 --> 00:23:21,905
I need to get rid of these.

375
00:23:24,810 --> 00:23:29,730
So it's going to be cube
root, and the constant

376
00:23:29,730 --> 00:23:31,116
in front of that is going to be?

377
00:23:31,116 --> 00:23:31,647
AUDIENCE: 3.

378
00:23:31,647 --> 00:23:32,480
SRINIVAS DEVADAS: 3.

379
00:23:32,480 --> 00:23:36,380
So you have-- right?

380
00:23:36,380 --> 00:23:37,910
So let's just keep going.

381
00:23:37,910 --> 00:23:39,860
You have k sorted lists.

382
00:23:39,860 --> 00:23:43,550
You're going to have k
times the k-th root of n.

383
00:23:47,700 --> 00:23:49,310
That's what you got.

384
00:23:49,310 --> 00:23:51,070
And I'm not going to
bother drawing this,

385
00:23:51,070 --> 00:23:53,160
but essentially
what happens is you

386
00:23:53,160 --> 00:23:57,110
are making the same number
of moves which corresponds

387
00:23:57,110 --> 00:24:01,180
to the root of n, the
corresponding root of n,

388
00:24:01,180 --> 00:24:04,420
at every level.

389
00:24:04,420 --> 00:24:12,600
And the last thing we have to do
to get a sense for what happens

390
00:24:12,600 --> 00:24:17,630
here is we have log n sorted
lists, so the number of levels

391
00:24:17,630 --> 00:24:20,770
here is log n.

392
00:24:20,770 --> 00:24:24,840
So this is starting to look kind
of familiar because it borrows

393
00:24:24,840 --> 00:24:26,530
from other data structures.

394
00:24:26,530 --> 00:24:31,390
And what this is I'm just going
to substitute log n for k,

395
00:24:31,390 --> 00:24:36,240
and I got this kind
of scary looking--

396
00:24:36,240 --> 00:24:38,037
I was scared the
first time I saw this.

397
00:24:41,306 --> 00:24:42,150
Oh, this is n.

398
00:24:45,050 --> 00:24:49,494
It's the log n-th root of n, OK?

399
00:24:49,494 --> 00:24:50,910
And so it's kind
of scary looking.

400
00:24:50,910 --> 00:24:53,840
But what is the log n-th
root of n-- and we can assume

401
00:24:53,840 --> 00:24:56,102
that n is a power of two?

402
00:24:56,102 --> 00:24:56,602
AUDIENCE: 2.

403
00:24:56,602 --> 00:24:58,396
SRINIVAS DEVADAS: 2, exactly.

404
00:24:58,396 --> 00:25:00,020
It's not that scary
looking, and that's

405
00:25:00,020 --> 00:25:01,630
because I'm not a mathematician.

406
00:25:01,630 --> 00:25:03,710
That's why I was scared.

407
00:25:03,710 --> 00:25:06,930
So 2 log n.

408
00:25:06,930 --> 00:25:07,850
All right.

409
00:25:07,850 --> 00:25:10,140
So that's it.

410
00:25:10,140 --> 00:25:14,170
So you get a sense of how
this works now, right?

411
00:25:14,170 --> 00:25:16,730
We haven't talked about
randomized structures yet,

412
00:25:16,730 --> 00:25:18,860
but I've given you
the template that's

413
00:25:18,860 --> 00:25:22,990
associated with the skip list,
which essentially says what I'm

414
00:25:22,990 --> 00:25:29,990
going to have are-- if it was
static data items and n was

415
00:25:29,990 --> 00:25:32,500
a power of two, then
essentially what

416
00:25:32,500 --> 00:25:37,230
I'm saying is I'm going to
have a bunch of items, n items,

417
00:25:37,230 --> 00:25:38,620
at the bottom.

418
00:25:38,620 --> 00:25:41,690
I'm going to have n over
2 items at the list that's

419
00:25:41,690 --> 00:25:44,280
just immediately above.

420
00:25:44,280 --> 00:25:46,620
And each of them are
going to be alternating.

421
00:25:46,620 --> 00:25:48,570
You're going to have
an item in between.

422
00:25:48,570 --> 00:25:52,210
And then on the top I'm
going to see n over 4 items,

423
00:25:52,210 --> 00:25:53,940
and so on and so forth.

424
00:25:53,940 --> 00:25:56,270
What does that look like?

425
00:25:56,270 --> 00:25:57,710
Kind of looks like
a tree, right?

426
00:25:57,710 --> 00:25:59,210
I mean it doesn't
have the structure

427
00:25:59,210 --> 00:26:02,072
of a tree in the sense
of the edges of a tree.

428
00:26:02,072 --> 00:26:03,530
It's quite different
because you're

429
00:26:03,530 --> 00:26:06,300
connecting things differently.

430
00:26:06,300 --> 00:26:07,970
You have all the
leaves connected down

431
00:26:07,970 --> 00:26:10,170
at the bottom of
this so-called tree

432
00:26:10,170 --> 00:26:13,890
with this doubly linked
list, but it has the triangle

433
00:26:13,890 --> 00:26:15,140
structure of a tree.

434
00:26:15,140 --> 00:26:17,490
And that's where the
log n comes from.
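
The static template described above (n items at the bottom, n over 2 above that, n over 4 above that) can be sketched as follows. This is only an illustration, assuming a fixed sorted set of keys; the function name `ideal_levels` and the sample keys are made up here and are not from the lecture.

```python
def ideal_levels(keys):
    """Build the idealized static levels: each level keeps every other
    element of the level below it, until a single element remains."""
    levels = [sorted(keys)]          # bottom list holds all n items
    while len(levels[-1]) > 1:
        levels.append(levels[-1][::2])  # alternate items get promoted
    return levels

if __name__ == "__main__":
    levels = ideal_levels([1, 2, 3, 4, 5, 6, 7, 8])
    for depth, level in enumerate(levels):
        print(f"L{depth}: {level}")
```

With n equal to 8 the level sizes are 8, 4, 2, 1, so there are log n plus 1 lists, which is the triangle shape the log n comes from.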

435
00:26:17,490 --> 00:26:20,290
So this would
all be wonderful

436
00:26:20,290 --> 00:26:22,340
if this were a static set.

437
00:26:22,340 --> 00:26:25,910
And n doesn't have to be a
power of 2-- you could pad it,

438
00:26:25,910 --> 00:26:27,180
and so on and so forth.

439
00:26:27,180 --> 00:26:29,400
But the big thing here
is that we haven't quite

440
00:26:29,400 --> 00:26:32,900
accomplished what
we set out to do,

441
00:26:32,900 --> 00:26:37,220
even though we seem to have
this log n cost for search.

442
00:26:37,220 --> 00:26:41,840
But it's all based on a static
set which doesn't change.

443
00:26:41,840 --> 00:26:45,010
And the problem, of course, is
that you could have deletions.

444
00:26:45,010 --> 00:26:47,480
You want to take away 42.

445
00:26:47,480 --> 00:26:50,130
For some reason you
can't go to 42nd Avenue,

446
00:26:50,130 --> 00:26:53,090
or I guess art-- you
can't go to [INAUDIBLE]

447
00:26:53,090 --> 00:26:55,300
would be a better example.

448
00:26:55,300 --> 00:26:58,300
So stuff breaks, right?

449
00:26:58,300 --> 00:27:01,530
And so you take stuff out
and you insert things in.

450
00:27:01,530 --> 00:27:06,650
Suppose I wanted to insert
60, 61, 62, 63, and 64

451
00:27:06,650 --> 00:27:08,180
into that list that I have?

452
00:27:08,180 --> 00:27:10,239
What would happen?

453
00:27:10,239 --> 00:27:11,530
Yeah, you're shaking your head.

454
00:27:11,530 --> 00:27:17,490
I mean that log n would go
away, so it would be a problem.

455
00:27:17,490 --> 00:27:20,050
But what we have
to do now is move

456
00:27:20,050 --> 00:27:22,320
to the probabilistic domain.

457
00:27:22,320 --> 00:27:23,990
We have to think
about what happens

458
00:27:23,990 --> 00:27:25,190
when we insert elements.

459
00:27:25,190 --> 00:27:27,000
We need an algorithm for insert.

460
00:27:27,000 --> 00:27:30,510
So then we can start with the
null list and build it up.

461
00:27:30,510 --> 00:27:32,990
And when you start
with a null list

462
00:27:32,990 --> 00:27:35,410
and you have a randomized
algorithm for insert,

463
00:27:35,410 --> 00:27:37,490
it ain't going to
look that pretty.

464
00:27:37,490 --> 00:27:39,990
It's going to look random.

465
00:27:39,990 --> 00:27:42,460
But you have to have a
certain amount of structure

466
00:27:42,460 --> 00:27:45,190
so you can still get
your order log n.

467
00:27:45,190 --> 00:27:48,630
So you have to do the
insertion appropriately.

468
00:27:48,630 --> 00:27:50,325
So that's what we
have to do next.

469
00:27:50,325 --> 00:27:51,950
But any questions
about that complexity

470
00:27:51,950 --> 00:27:52,832
that I have up there?

471
00:27:55,460 --> 00:27:56,780
All right, good.

472
00:28:03,280 --> 00:28:08,460
I want a canonical
example of a list here,

473
00:28:08,460 --> 00:28:10,400
and I kind of ran out
of room over there,

474
00:28:10,400 --> 00:28:23,220
so bear with me as I draw you
a more sophisticated skip list

475
00:28:23,220 --> 00:28:25,900
that has a few more levels.

476
00:28:25,900 --> 00:28:30,090
And the reason for this is
it's only interesting when

477
00:28:30,090 --> 00:28:33,100
you have three or more levels.

478
00:28:33,100 --> 00:28:35,330
The search algorithm
is kind of the same.

479
00:28:35,330 --> 00:28:39,070
You go up top and when
you overshoot you pop down

480
00:28:39,070 --> 00:28:43,670
one level, and then you do
the same thing over and over.

481
00:28:43,670 --> 00:28:48,060
But we are going
to have to bound

482
00:28:48,060 --> 00:28:52,740
the number of levels in the skip
list in a probabilistic way.

483
00:28:52,740 --> 00:28:57,410
We have to actually discover
the expected number of levels

484
00:28:57,410 --> 00:29:01,540
because we're going to be doing
inserts in a randomized way.

485
00:29:01,540 --> 00:29:04,380
And so it's worthwhile
having a picture that's

486
00:29:04,380 --> 00:29:07,750
a little more interesting than
the picture of the two linked

487
00:29:07,750 --> 00:29:09,800
lists that I had up there.

488
00:29:09,800 --> 00:29:14,685
So I'm going to leave this on
for the rest of the lecture.

489
00:29:23,000 --> 00:29:25,000
So that's our bottom,
and that hasn't changed

490
00:29:25,000 --> 00:29:27,420
from our previous examples.

491
00:29:27,420 --> 00:29:34,450
I'm not going to bother drawing
the horizontal connections.

492
00:29:34,450 --> 00:29:39,080
When you see things adjacent
horizontally at the same level,

493
00:29:39,080 --> 00:29:43,896
assume that they're all
connected-- all of them.

494
00:29:46,950 --> 00:29:52,440
And so I have four levels here.

495
00:29:52,440 --> 00:29:56,310
And you can think of this
as being the entire list

496
00:29:56,310 --> 00:29:58,310
or part of it.

497
00:29:58,310 --> 00:30:02,685
Just to delineate
things nicely, we'll

498
00:30:02,685 --> 00:30:07,940
assume that 79, which
is the last element,

499
00:30:07,940 --> 00:30:10,150
is all the way up
at the top as well.

500
00:30:10,150 --> 00:30:15,300
Sort of the terminus,
termini, corresponding

501
00:30:15,300 --> 00:30:21,980
to our analogy of subways.

502
00:30:21,980 --> 00:30:24,400
And so that's our
top-most level.

503
00:30:24,400 --> 00:30:35,470
And then I might have
50 here at this level,

504
00:30:35,470 --> 00:30:36,720
or so that looks like.

505
00:30:36,720 --> 00:30:39,320
I will have 50, so
the invariant here,

506
00:30:39,320 --> 00:30:41,950
and that's another reason
I want to draw this out,

507
00:30:41,950 --> 00:30:48,680
is that if you have a station
at a higher level, then

508
00:30:48,680 --> 00:30:54,000
you will have-- it's got
to be sitting on something.

509
00:30:54,000 --> 00:30:57,180
So if you've got a 79 at
level four, or level three

510
00:30:57,180 --> 00:31:06,200
here if this is L0, then you
will see 79 at L2, L1, and L0.

511
00:31:06,200 --> 00:31:09,900
And if you see 50 here,
it's not in L3 so that's OK,

512
00:31:09,900 --> 00:31:13,420
but it's in L2, so it's
got to be at L1 as well.

513
00:31:13,420 --> 00:31:15,660
Of course you know that
everything is down at L0,

514
00:31:15,660 --> 00:31:18,800
so this is interesting from a
standpoint of the relationship

515
00:31:18,800 --> 00:31:25,220
between Li and Li plus 1 where
i is greater than or equal to 1.

516
00:31:25,220 --> 00:31:33,440
So the implication is that if
you see it at Li plus 1,

517
00:31:33,440 --> 00:31:37,340
it's going to be at
Li and Li minus 1

518
00:31:37,340 --> 00:31:39,940
if that happens to
exist, et cetera.

519
00:31:39,940 --> 00:31:42,980
And so one last thing
here just to finish it up.

520
00:31:42,980 --> 00:31:48,490
I got 34 here, which is
an additional thing which

521
00:31:48,490 --> 00:31:49,160
ends there.

522
00:31:49,160 --> 00:31:54,060
So the highest level is
this second level or L1.

523
00:31:54,060 --> 00:31:56,760
This is 66.

524
00:31:56,760 --> 00:31:58,850
And then that's it.

525
00:31:58,850 --> 00:32:01,910
So that's our skip list.

526
00:32:01,910 --> 00:32:05,910
So if you wanted to search
for 72, you would start here,

527
00:32:05,910 --> 00:32:09,460
and then you'd go to 79,
or you'd look and say,

528
00:32:09,460 --> 00:32:13,560
oh, 79 is too far, so I'm
going to pop down a level.

529
00:32:13,560 --> 00:32:15,320
And then you'd say 50, oh, good.

530
00:32:15,320 --> 00:32:16,690
I can get to 50.

531
00:32:16,690 --> 00:32:19,520
79 is too far, so I'm
going to pop down a level.

532
00:32:19,520 --> 00:32:23,820
And then you go to 66--
79 is too far-- and at 66,

533
00:32:23,820 --> 00:32:29,220
you pop down a level and
then you go 66 to 72.

534
00:32:29,220 --> 00:32:31,120
So same as what we had before.
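
The search walk just described can be sketched in code. The board drawing is not part of the transcript, so the level contents in `LEVELS` are a hypothetical reconstruction from the values mentioned (14, 23, 34, 42, 50, 66, 72, 79), and `Node`, `build`, and `search` are illustrative names, not the lecture's.

```python
class Node:
    """One occurrence of a key in one level's linked list."""
    def __init__(self, key):
        self.key = key
        self.right = None  # next node on the same level
        self.down = None   # same key, one level below

def build(levels):
    """Link the lists together; levels[0] is the bottom list.
    Returns the leftmost node of the top list."""
    below, head = {}, None
    for keys in levels:
        nodes, prev = {}, None
        for key in keys:
            node = Node(key)
            node.down = below.get(key)  # sits on the same key below
            if prev is not None:
                prev.right = node
            nodes[key] = prev = node
        head, below = nodes[keys[0]], nodes
    return head

def search(head, target):
    """Move right while the next key does not overshoot, else pop down.
    Returns (found, keys visited by rightward moves, starting key first)."""
    node, path = head, [head.key]
    while True:
        while node.right is not None and node.right.key <= target:
            node = node.right
            path.append(node.key)
        if node.down is None:
            return node.key == target, path
        node = node.down

# Assumed layout, bottom list first (hypothetical reconstruction).
LEVELS = [
    [14, 23, 34, 42, 50, 66, 72, 79],  # L0
    [14, 34, 50, 66, 79],              # L1
    [14, 50, 79],                      # L2
    [14, 79],                          # L3
]

if __name__ == "__main__":
    print(search(build(LEVELS), 72))
```

Under that assumed layout, the search for 72 visits 14, 50, 66, 72, which matches the walk in the lecture: 79 is too far at the top, 50 is reachable one level down, then 66, then 72 at the bottom.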

535
00:32:31,120 --> 00:32:34,310
Hopefully it's not
too complicated.

536
00:32:34,310 --> 00:32:36,990
So that's our skip list.

537
00:32:36,990 --> 00:32:40,710
It's still looking
pretty structured,

538
00:32:40,710 --> 00:32:43,000
looking pretty regular.

539
00:32:43,000 --> 00:32:44,640
But if I start
taking that and start

540
00:32:44,640 --> 00:32:46,590
inserting things
and deleting things,

541
00:32:46,590 --> 00:32:48,680
it could become quite irregular.

542
00:32:48,680 --> 00:32:50,540
I could take away
23, for example.

543
00:32:50,540 --> 00:32:52,123
And there's nothing
that's stopping me

544
00:32:52,123 --> 00:32:54,446
from taking away 34 or 79.

545
00:32:54,446 --> 00:32:56,070
You've got to delete
an element, you've

546
00:32:56,070 --> 00:32:57,574
got to delete an element.

547
00:32:57,574 --> 00:32:59,240
I mean the fact that
it's in four levels

548
00:32:59,240 --> 00:33:01,530
shouldn't make a difference.

549
00:33:01,530 --> 00:33:03,680
And so that's something
to keep in mind.

550
00:33:03,680 --> 00:33:07,110
So this could get pretty messy.

551
00:33:07,110 --> 00:33:08,830
So let's talk about
insert, and I've

552
00:33:08,830 --> 00:33:12,190
spent a bunch of time skirting
around the issue of what

553
00:33:12,190 --> 00:33:15,910
exactly happens when
you insert an element.

554
00:33:15,910 --> 00:33:18,260
Turns out delete is pretty easy.

555
00:33:18,260 --> 00:33:20,330
Insert is more interesting.

556
00:33:20,330 --> 00:33:21,090
Let's do insert.

557
00:33:43,540 --> 00:33:45,855
To insert an element
x into a skip list,

558
00:33:45,855 --> 00:33:47,230
the first thing
we're going to do

559
00:33:47,230 --> 00:33:59,530
is search to figure out where
x fits into the bottom list.

560
00:33:59,530 --> 00:34:04,270
So you do a search just like
you would if you were just

561
00:34:04,270 --> 00:34:06,990
doing a search.

562
00:34:06,990 --> 00:34:09,750
You always insert into
the appropriate position.

563
00:34:09,750 --> 00:34:12,040
So if there's a
single sorted list,

564
00:34:12,040 --> 00:34:13,248
that would pretty much be it.

565
00:34:16,400 --> 00:34:18,810
And so that part is easy.

566
00:34:18,810 --> 00:34:24,530
If you want to insert 67, you
do all of the search operations

567
00:34:24,530 --> 00:34:26,270
that I just went
over, and then you

568
00:34:26,270 --> 00:34:30,639
insert 67 between 66 and 72.

569
00:34:30,639 --> 00:34:33,949
So do your pointer
manipulations, what have you,

570
00:34:33,949 --> 00:34:35,250
and you're good.

571
00:34:35,250 --> 00:34:38,250
But you're not done yet, because
you want this to be a skip list

572
00:34:38,250 --> 00:34:41,730
and you want this to
have expected search

573
00:34:41,730 --> 00:34:47,199
over any random query as
the list grows and shrinks

574
00:34:47,199 --> 00:34:51,060
of order log n, expectation,
and also with high probability.

575
00:34:51,060 --> 00:34:54,560
So what you're going to have to
do is when you start inserting,

576
00:34:54,560 --> 00:34:56,820
you're going to have
to decide if you're

577
00:34:56,820 --> 00:35:01,740
going to what is called
promote these elements or not.

578
00:35:01,740 --> 00:35:05,240
And the notion of a
promotion is that you

579
00:35:05,240 --> 00:35:09,500
are going up and duplicating
this inserted element

580
00:35:09,500 --> 00:35:11,820
some number of levels up.

581
00:35:11,820 --> 00:35:16,520
So if you just look
at how this works,

582
00:35:16,520 --> 00:35:18,490
it's really pretty
straightforward.

583
00:35:18,490 --> 00:35:22,070
What is going to happen is
simply that let's say I have 67

584
00:35:22,070 --> 00:35:25,000
and I'm going to insert
it between 66 and 72.

585
00:35:25,000 --> 00:35:26,230
That much is a given.

586
00:35:26,230 --> 00:35:28,020
That is deterministic.

587
00:35:28,020 --> 00:35:33,550
Then I'm going to flip a
coin or spin a Frisbee.

588
00:35:33,550 --> 00:35:36,180
I like this better.

589
00:35:36,180 --> 00:35:38,420
I'm not sure if this
is biased or not.

590
00:35:38,420 --> 00:35:40,230
It's probably seriously biased.

591
00:35:40,230 --> 00:35:42,880
[LAUGHTER]

592
00:35:42,880 --> 00:35:47,650
Would it ever go the
other way is the question.

593
00:35:47,650 --> 00:35:48,400
Would it ever?

594
00:35:48,400 --> 00:35:49,560
No.

595
00:35:49,560 --> 00:35:50,120
All right.

596
00:35:50,120 --> 00:35:51,860
So we've got a problem here.

597
00:35:51,860 --> 00:35:53,920
I think we might have to
do something like that.

598
00:35:53,920 --> 00:35:57,030
[LAUGHTER]

599
00:35:57,030 --> 00:35:58,280
I'm procrastinating.

600
00:35:58,280 --> 00:36:00,280
I don't want to teach the
rest of this material.

601
00:36:00,280 --> 00:36:05,630
[LAUGHTER]

602
00:36:05,630 --> 00:36:06,630
All right.

603
00:36:06,630 --> 00:36:08,840
Let's go, let's go.

604
00:36:08,840 --> 00:36:23,640
So I'd like to insert
into some of the lists,

605
00:36:23,640 --> 00:36:25,420
and the big question
is which ones?

606
00:36:30,060 --> 00:36:32,370
It's going to be really cool.

607
00:36:32,370 --> 00:36:36,560
I'm just going to flip
coins, fair coins,

608
00:36:36,560 --> 00:36:42,855
and decide how much to
promote these elements.

609
00:36:51,940 --> 00:36:57,790
So flip fair coin.

610
00:36:57,790 --> 00:37:13,000
If heads, promote x to the
next level up, and repeat.

611
00:37:21,100 --> 00:37:26,400
Else, if you ever get
a tails, you stop.

612
00:37:26,400 --> 00:37:29,030
And this next level up
may be newly created.
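
The coin-flipping rule above can be sketched like this. To keep it short, each level is a plain sorted Python list rather than a linked list, and the minus-infinity sentinel is not modeled; `insert` and its `flip` parameter are hypothetical names chosen for illustration.

```python
import bisect
import random

def insert(levels, x, flip=lambda: random.random() < 0.5):
    """Insert x into the bottom list, then promote it one level per heads.
    levels[0] is the bottom list; flip() returns True for heads.
    Returns the number of promotions."""
    bisect.insort(levels[0], x)      # deterministic part: bottom list
    promotions = 0
    while flip():                    # heads: promote and repeat
        promotions += 1
        if promotions == len(levels):
            levels.append([])        # next level up is newly created
        bisect.insort(levels[promotions], x)
    return promotions                # first tails: stop

if __name__ == "__main__":
    levels = [[14, 66, 72, 79], [14, 79]]
    print(insert(levels, 67), levels)
```

Passing a scripted `flip` (for example one that returns heads twice and then tails) makes the behavior easy to check by hand: 67 ends up in the bottom list, in the list above it, and alone in a newly created top list.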

613
00:37:35,370 --> 00:37:41,530
So what might happen with the
67 is that you stick it in here,

614
00:37:41,530 --> 00:37:44,520
and it might happen that
the first time you flip you

615
00:37:44,520 --> 00:37:47,520
get a tails, in which
case, 67 is going

616
00:37:47,520 --> 00:37:49,270
to just be at the bottom list.

617
00:37:49,270 --> 00:37:51,760
But if you get one heads,
then you're not only

618
00:37:51,760 --> 00:37:54,020
going to put 67 in
here, you're going

619
00:37:54,020 --> 00:37:55,880
to put 67 up here as well.

620
00:37:55,880 --> 00:37:59,160
And you're going to flip again.

621
00:37:59,160 --> 00:38:04,890
And if you get a heads again,
you're going to put 67 up here.

622
00:38:04,890 --> 00:38:10,040
And if you get a heads again,
you're going to put 67 up here.

623
00:38:10,040 --> 00:38:11,650
And if you get a
heads again, you're

624
00:38:11,650 --> 00:38:13,790
going to create a
new list up there,

625
00:38:13,790 --> 00:38:17,265
and at this point when
you create the new list,

626
00:38:17,265 --> 00:38:20,420
it's only going
to be 67 up there.

627
00:38:20,420 --> 00:38:24,330
And that's going to be
the front of your list,

628
00:38:24,330 --> 00:38:27,210
because that's the one element
that you're duplicating.

629
00:38:27,210 --> 00:38:30,660
So you're going to keep
going until you get a tails.

630
00:38:30,660 --> 00:38:34,430
Now, that's why this
coin had better be fair.

631
00:38:34,430 --> 00:38:36,310
So you're going to
keep going and you're

632
00:38:36,310 --> 00:38:37,910
going to keep adding.

633
00:38:37,910 --> 00:38:40,950
Every time you insert
there's a potential

634
00:38:40,950 --> 00:38:44,700
for increasing the number
of levels in this list.

635
00:38:44,700 --> 00:38:47,810
Now, the number
of levels is going

636
00:38:47,810 --> 00:38:52,270
to be bounded, with
high probability

637
00:38:52,270 --> 00:38:55,220
or in regular
expectation, but I want

638
00:38:55,220 --> 00:38:57,820
to make it clear that
every time you insert,

639
00:38:57,820 --> 00:38:59,690
if you get a chain
of heads, you're

640
00:38:59,690 --> 00:39:02,130
going to be adding levels.

641
00:39:02,130 --> 00:39:06,550
And so the first time you
get a tails, you just stop.

642
00:39:06,550 --> 00:39:08,090
You just stop.

643
00:39:08,090 --> 00:39:11,730
So you can see that this can
get pretty messy pretty quick.

644
00:39:11,730 --> 00:39:14,180
And especially if you were
starting from ground zero

645
00:39:14,180 --> 00:39:16,610
and adding 14, 23--
all of those things,

646
00:39:16,610 --> 00:39:18,120
the bottom is going
to look exactly

647
00:39:18,120 --> 00:39:20,600
like it looks now because
you're going to put it in there.

648
00:39:20,600 --> 00:39:21,910
It's deterministic.

649
00:39:21,910 --> 00:39:24,740
But the very next level after
that could look pretty messy.

650
00:39:24,740 --> 00:39:26,560
You could have all of
them chunked up here,

651
00:39:26,560 --> 00:39:29,000
and a big gap, et
cetera, et cetera.

652
00:39:29,000 --> 00:39:33,600
So it's all about
randomized search cost.

653
00:39:33,600 --> 00:39:37,920
The worst case cost here
is going to be order n.

654
00:39:37,920 --> 00:39:40,030
Worst case cost is
going to be order n,

655
00:39:40,030 --> 00:39:42,310
because you have no idea
where these things are

656
00:39:42,310 --> 00:39:43,330
going to end up.

657
00:39:43,330 --> 00:39:47,020
But the randomized cost
is what's cool about this.

658
00:39:47,020 --> 00:39:50,940
Any questions about
insert or anything I said?

659
00:39:50,940 --> 00:39:51,887
Yeah, go ahead.

660
00:39:51,887 --> 00:39:53,512
AUDIENCE: Is worst
case really order n?

661
00:39:53,512 --> 00:39:55,640
What if you had a really
long, like a lot of lists

662
00:39:55,640 --> 00:39:58,200
on top of each other, and
you start at the top of that

663
00:39:58,200 --> 00:40:01,530
and you had to walk all
the way [INAUDIBLE]?

664
00:40:01,530 --> 00:40:05,820
SRINIVAS DEVADAS: Well, you go
n down and n this way, right?

665
00:40:05,820 --> 00:40:08,589
You would be checking
so it would be order n.

666
00:40:08,589 --> 00:40:10,130
AUDIENCE: So it's
[? bounded ?] by n?

667
00:40:10,130 --> 00:40:11,980
SRINIVAS DEVADAS:
Yeah, the worst case.

668
00:40:11,980 --> 00:40:13,790
AUDIENCE: Worst
case is infinity.

669
00:40:13,790 --> 00:40:14,840
SRINIVAS DEVADAS:
Worst case is infinity.

670
00:40:14,840 --> 00:40:15,890
Oh, in that sense, yeah.

671
00:40:15,890 --> 00:40:17,020
OK.

672
00:40:17,020 --> 00:40:19,769
Well, n elements, Eric is right.

673
00:40:19,769 --> 00:40:21,310
So what is happening
here is that you

674
00:40:21,310 --> 00:40:23,700
have a small probability
that you will

675
00:40:23,700 --> 00:40:27,840
keep flipping heads forever.

676
00:40:27,840 --> 00:40:32,140
So at some level, if you
somehow take that away and use

677
00:40:32,140 --> 00:40:34,750
Frisbees instead
or you truncate it.

678
00:40:34,750 --> 00:40:37,780
Let's say at some point you
ended up saying that you only

679
00:40:37,780 --> 00:40:39,185
have n levels total.

680
00:40:42,310 --> 00:40:47,240
So it's not a-- I
should have gone there.

681
00:40:47,240 --> 00:40:49,370
The question has to
be posed a little more

682
00:40:49,370 --> 00:40:52,560
precisely for the
answer to be order n.

683
00:40:52,560 --> 00:40:55,420
You have to have some more
limitations to avoid the case

684
00:40:55,420 --> 00:40:59,960
that Eric just mentioned, which
is in the randomized situation

685
00:40:59,960 --> 00:41:03,070
you will have the
possibility of getting

686
00:41:03,070 --> 00:41:04,807
an infinite number of heads.

687
00:41:04,807 --> 00:41:05,890
Yeah, question back there.

688
00:41:05,890 --> 00:41:06,806
AUDIENCE: [INAUDIBLE].

689
00:41:10,110 --> 00:41:13,060
SRINIVAS DEVADAS: Yes, you
can certainly do capping

690
00:41:13,060 --> 00:41:17,100
and you can do a
bunch of other things.

691
00:41:17,100 --> 00:41:20,290
It ends up becoming
something which is not

692
00:41:20,290 --> 00:41:22,320
as clean as what you have here.

693
00:41:22,320 --> 00:41:25,090
The analysis is messy.

694
00:41:25,090 --> 00:41:28,792
And it's sort of in between
a randomized data structure,

695
00:41:28,792 --> 00:41:30,250
a purely randomized
data structure,

696
00:41:30,250 --> 00:41:31,840
and a deterministic one.

697
00:41:34,510 --> 00:41:37,130
I think the important
thing to bring out here

698
00:41:37,130 --> 00:41:43,571
is the worst case is much
worse than order log n, OK?

699
00:41:43,571 --> 00:41:44,070
Cool.

700
00:41:44,070 --> 00:41:44,569
Good.

701
00:41:44,569 --> 00:41:46,560
Thanks for those questions.

702
00:41:46,560 --> 00:41:52,720
And so what we have here now is
an insert algorithm that could

703
00:41:52,720 --> 00:41:56,890
make things look pretty messy.

704
00:41:56,890 --> 00:42:00,310
I'm going to leave the insert
up here, and that, of course,

705
00:42:00,310 --> 00:42:02,380
is part of that.

706
00:42:02,380 --> 00:42:04,890
Now, for the rest
of the lecture we're

707
00:42:04,890 --> 00:42:08,810
going to talk about why
skip lists are good.

708
00:42:08,810 --> 00:42:12,110
And we're going to justify
this randomized data structure

709
00:42:12,110 --> 00:42:16,790
and show lots of nice
results with respect

710
00:42:16,790 --> 00:42:20,680
to the expectation on the
number of levels, expectation

711
00:42:20,680 --> 00:42:22,390
on the number of
moves in a search,

712
00:42:22,390 --> 00:42:26,410
regardless of what items
you're inserting and deleting.

713
00:42:26,410 --> 00:42:27,700
One last thing.

714
00:42:27,700 --> 00:42:31,900
To delete an item,
you just delete it.

715
00:42:31,900 --> 00:42:40,258
You find it, search, and
delete at all levels.

716
00:42:43,000 --> 00:42:45,150
So you can't leave it
in any of the levels.

717
00:42:45,150 --> 00:42:47,800
So you find it, and you have
to have the pointers set up

718
00:42:47,800 --> 00:42:51,980
properly-- move the
previous pointer over

719
00:42:51,980 --> 00:42:54,550
to the next one, et
cetera, et cetera.

720
00:42:54,550 --> 00:42:56,520
We won't get into
that here, but you

721
00:42:56,520 --> 00:43:01,150
have to do the delete
at every level.
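
A minimal sketch of delete at every level, again assuming each level is a plain sorted list with `levels[0]` at the bottom. The name `delete` is illustrative, and dropping emptied top levels is a choice made here, not something the lecture specifies.

```python
import bisect

def delete(levels, x):
    """Remove x from every level that contains it.
    Returns True if x was present in the skip list."""
    found = False
    for keys in levels:
        i = bisect.bisect_left(keys, x)   # position of x if present
        if i < len(keys) and keys[i] == x:
            del keys[i]                   # unlink x on this level
            found = True
    # Assumption: discard top levels that became empty.
    while len(levels) > 1 and not levels[-1]:
        levels.pop()
    return found

if __name__ == "__main__":
    levels = [[14, 23, 34, 50], [14, 34, 50], [34]]
    print(delete(levels, 34), levels)
```

In a linked-list version the same loop would instead move the previous pointer over to the next node on each level.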

722
00:43:01,150 --> 00:43:01,880
Yeah, question.

723
00:43:01,880 --> 00:43:04,380
AUDIENCE: So what happens
if you inserted 10

724
00:43:04,380 --> 00:43:06,380
and you flipped a tail?

725
00:43:06,380 --> 00:43:08,720
So that's like your
first element is not

726
00:43:08,720 --> 00:43:12,772
going to go up all the way,
and then how do you do search?

727
00:43:12,772 --> 00:43:14,230
SRINIVAS DEVADAS:
So typically what

728
00:43:14,230 --> 00:43:18,520
happens is you need to
have a minus infinity here.

729
00:43:18,520 --> 00:43:19,559
And that's a good point.

730
00:43:19,559 --> 00:43:20,350
It's a corner case.

731
00:43:20,350 --> 00:43:21,933
You have to have a
minus infinity that

732
00:43:21,933 --> 00:43:23,990
goes up all the way.

733
00:43:23,990 --> 00:43:25,240
Good question.

734
00:43:25,240 --> 00:43:28,790
So the question was what happens
if I had something less than 14

735
00:43:28,790 --> 00:43:29,780
and I inserted it?

736
00:43:29,780 --> 00:43:31,760
Well, that doesn't
happen because nothing

737
00:43:31,760 --> 00:43:35,040
is less than minus infinity,
and that goes up all the way.

738
00:43:35,040 --> 00:43:37,740
But thanks for bringing it up.

739
00:43:37,740 --> 00:43:43,790
And so we're going to do
a little warm-up Lemma.

740
00:43:43,790 --> 00:43:45,220
I don't know if
you've ever heard

741
00:43:45,220 --> 00:43:51,520
these two terms in juxtaposition
like this-- warm up and Lemma.

742
00:43:51,520 --> 00:43:54,330
But here you go, your
first warm-up Lemma.

743
00:43:54,330 --> 00:43:57,330
I guess you'd never
have a warm-up theorem.

744
00:43:57,330 --> 00:44:00,300
It's a warm-up Lemma for
this theorem, which is

745
00:44:00,300 --> 00:44:04,060
going to take a while to prove.

746
00:44:04,060 --> 00:44:09,470
This comes down to trying to
get a sense of how many levels

747
00:44:09,470 --> 00:44:12,540
you're going to have from
a probabilistic standpoint.

748
00:44:12,540 --> 00:44:22,290
The number of levels in
an n element skip list

749
00:44:22,290 --> 00:44:24,980
is order log n.

750
00:44:24,980 --> 00:44:29,740
And I'm going to now define
the term with high probability.

751
00:44:29,740 --> 00:44:32,500
So what does this mean exactly?

752
00:44:32,500 --> 00:44:35,330
Well, what this
means is order log n

753
00:44:35,330 --> 00:44:39,040
is something like c
log n plus a constant.

754
00:44:39,040 --> 00:44:43,460
Let's ignore the constant
and let's stick with c log n.

755
00:44:43,460 --> 00:44:48,640
And with high probability
is a probability

756
00:44:48,640 --> 00:44:56,790
that is really a
function of n and alpha.

757
00:44:56,790 --> 00:45:02,710
And you have this inverse
polynomial relationship

758
00:45:02,710 --> 00:45:06,530
in the sense that
obviously as n grows here,

759
00:45:06,530 --> 00:45:13,300
and alpha-- we'll assume that
alpha is greater than 1--

760
00:45:13,300 --> 00:45:19,480
you are going to get a
decrease in this quantity.

761
00:45:19,480 --> 00:45:23,341
So this is going to get closer
and closer to 1 as n grows.

762
00:45:23,341 --> 00:45:25,590
So that's the difference
between with high probability

763
00:45:25,590 --> 00:45:27,990
and just sort of giving you
an expectation number where

764
00:45:27,990 --> 00:45:29,800
you have no such guarantees.

765
00:45:29,800 --> 00:45:33,360
What is interesting about
this is that as n grows,

766
00:45:33,360 --> 00:45:36,940
you're going to get a higher
and higher probability.

767
00:45:36,940 --> 00:45:41,742
And this constant c is going
to be related to alpha.

768
00:45:41,742 --> 00:45:43,950
That's the other thing that's
interesting about this.

769
00:45:43,950 --> 00:45:46,680
So it's like saying-- and
you can kind of say this

770
00:45:46,680 --> 00:45:51,810
for using Chernoff bounds that
we'll get to in a few minutes,

771
00:45:51,810 --> 00:45:54,890
even for expectation as well.

772
00:45:54,890 --> 00:46:00,980
But what this says is that
if, for example, c doubled,

773
00:46:00,980 --> 00:46:06,620
then you are saying that
your number of levels

774
00:46:06,620 --> 00:46:08,770
is order 4 log n.

775
00:46:08,770 --> 00:46:11,250
I mean I understand that that
doesn't make too much sense,

776
00:46:11,250 --> 00:46:14,620
but it's less than or equal
to 4 log n plus a constant.

777
00:46:14,620 --> 00:46:18,850
And that 4 is going to get
reflected in the alpha here.

778
00:46:21,720 --> 00:46:25,380
When the 4 goes from 4 to
8, the alpha increases.

779
00:46:25,380 --> 00:46:30,600
So the more room that you have
with respect to this constant,

780
00:46:30,600 --> 00:46:32,350
the higher the probability.

781
00:46:32,350 --> 00:46:34,760
It becomes an
overwhelming probability

782
00:46:34,760 --> 00:46:38,190
that you're going to be
within those number of levels.

783
00:46:38,190 --> 00:46:41,240
So maybe there's
an 80% probability

784
00:46:41,240 --> 00:46:44,370
that you're within 2 log n.

785
00:46:44,370 --> 00:46:47,460
But there's a
99.99999% probability

786
00:46:47,460 --> 00:46:50,430
that you're within 4 log
n, and so on and so forth.

787
00:46:50,430 --> 00:46:53,000
So that's the kind of thing
that the with-high-probability

788
00:46:53,000 --> 00:46:56,630
analysis tells you explicitly.

789
00:46:56,630 --> 00:47:00,210
And so you can do that,
you can do this analysis

790
00:47:00,210 --> 00:47:03,980
fairly straightforwardly.

791
00:47:03,980 --> 00:47:08,390
And let me do that
on a different board.

792
00:47:08,390 --> 00:47:10,410
Let me go ahead and
do that over here.

793
00:47:10,410 --> 00:47:12,328
Actually, I don't
really need this.

794
00:47:12,328 --> 00:47:14,110
So let's do that over here.

795
00:47:18,830 --> 00:47:22,510
And so this is our first with
high probability analysis.

796
00:47:22,510 --> 00:47:26,210
And I want to prove
that warm-up Lemma.

797
00:47:26,210 --> 00:47:28,550
So usually what you
do here is you look

798
00:47:28,550 --> 00:47:30,500
at the failure probability.

799
00:47:30,500 --> 00:47:32,940
So with high
probability is typically

800
00:47:32,940 --> 00:47:35,980
something that
looks like 1 minus 1

801
00:47:35,980 --> 00:47:38,100
divided by n raised to alpha.

802
00:47:38,100 --> 00:47:42,060
And this part here is
the failure probability.

803
00:47:42,060 --> 00:47:43,750
And that's typically
what you analyze

804
00:47:43,750 --> 00:47:46,430
and what we're
going to do today.

805
00:47:46,430 --> 00:47:49,040
So the failure probability
is that it's not less

806
00:47:49,040 --> 00:47:52,560
than or equal to c log n levels, the
complement of what we just

807
00:47:52,560 --> 00:47:57,250
looked at, which is the
probability that it's strictly

808
00:47:57,250 --> 00:47:58,836
greater than c log n levels.

809
00:48:01,610 --> 00:48:14,710
And that's the probability
that some element gets promoted

810
00:48:14,710 --> 00:48:16,265
greater than c log n times.

811
00:48:19,080 --> 00:48:24,120
So why would you have
more than c log n levels?

812
00:48:24,120 --> 00:48:27,030
It's essentially because
you inserted something

813
00:48:27,030 --> 00:48:30,930
and that element got promoted
strictly greater than c

814
00:48:30,930 --> 00:48:35,160
log n times, which
obviously implies that you

815
00:48:35,160 --> 00:48:37,320
had the sequence
of heads, and we'll

816
00:48:37,320 --> 00:48:39,110
get to that in just a second.

817
00:48:39,110 --> 00:48:43,350
But before we go to that
step of figuring out

818
00:48:43,350 --> 00:48:47,130
exactly what's going on here
as to why this got promoted

819
00:48:47,130 --> 00:48:48,880
and what the probability
of each promotion

820
00:48:48,880 --> 00:48:56,760
is, what I have here is I
have a sequence of inserts

821
00:48:56,760 --> 00:48:58,790
potentially that
I have to analyze.

822
00:48:58,790 --> 00:49:04,020
And in general, when I
have an n element list,

823
00:49:04,020 --> 00:49:06,440
I'm going to assume that
each of these elements

824
00:49:06,440 --> 00:49:09,540
got inserted into the
list at some point.

825
00:49:09,540 --> 00:49:11,760
So I've had n inserts.

826
00:49:11,760 --> 00:49:16,050
And we just look at the case
where you have n inserts,

827
00:49:16,050 --> 00:49:18,690
you could have deletes, and so
you could have more inserts,

828
00:49:18,690 --> 00:49:20,980
but it won't really
change anything.

829
00:49:20,980 --> 00:49:26,480
You have n inserts corresponding
to each of these elements,

830
00:49:26,480 --> 00:49:31,310
and one of those n elements got
promoted in this failure case

831
00:49:31,310 --> 00:49:34,380
greater than c log n times.

832
00:49:34,380 --> 00:49:36,320
That's essentially
what's happened here.

833
00:49:36,320 --> 00:49:41,120
And so you don't know which
one, but you can typically

834
00:49:41,120 --> 00:49:42,930
do this in a with-high-probability
analysis

835
00:49:42,930 --> 00:49:45,410
because the probabilities
are so small

836
00:49:45,410 --> 00:49:50,545
and they're inverse
polynomials, polynomials like n

837
00:49:50,545 --> 00:49:51,620
raised to alpha.

838
00:49:51,620 --> 00:49:53,330
You can use what's
called the union bound

839
00:49:53,330 --> 00:49:58,180
that I'm sure you've used before
in some context or the other.

840
00:49:58,180 --> 00:50:00,500
And you essentially
say that this

841
00:50:00,500 --> 00:50:03,000
is less than or equal to
n times the probability

842
00:50:03,000 --> 00:50:06,810
that a particular element x--

843
00:50:06,810 --> 00:50:10,740
So you just pick an element,
arbitrary element x,

844
00:50:10,740 --> 00:50:12,720
but you pick one.

845
00:50:12,720 --> 00:50:18,830
--gets promoted greater
than c log n times.

846
00:50:18,830 --> 00:50:21,440
So you have a small probability.

847
00:50:21,440 --> 00:50:24,760
You have no idea whether these
events are independent or not.

848
00:50:24,760 --> 00:50:27,750
The union bound
doesn't care about it.

849
00:50:27,750 --> 00:50:31,800
It's like saying you've got
a 0.001 probability that any

850
00:50:31,800 --> 00:50:35,380
of these elements could get
promoted greater than c log n

851
00:50:35,380 --> 00:50:38,980
times, and there's
10 of those elements.

852
00:50:38,980 --> 00:50:41,560
You don't know whether they're
independent events or not,

853
00:50:41,560 --> 00:50:43,060
but you can certainly
use the union

854
00:50:43,060 --> 00:50:46,620
bound that says the overall
failure probability is going

855
00:50:46,620 --> 00:50:50,990
to be less than or equal to
n equals 10, in my example,

856
00:50:50,990 --> 00:50:53,460
times that 0.001.

857
00:50:53,460 --> 00:50:55,680
That's basically it.

858
00:50:55,680 --> 00:50:58,460
Now you can go off
and say, what does it

859
00:50:58,460 --> 00:51:00,700
mean for an element
to get promoted?

860
00:51:00,700 --> 00:51:05,040
What actually has to happen
for an element to get promoted?

861
00:51:05,040 --> 00:51:09,970
And you have n times 1
over 2, because you're

862
00:51:09,970 --> 00:51:13,250
flipping a fair
coin, and you are

863
00:51:13,250 --> 00:51:18,950
getting a c log n heads here.

864
00:51:18,950 --> 00:51:22,230
You flip and you
get one promotion.

865
00:51:25,705 --> 00:51:27,800
There's two levels
associated with a promotion,

866
00:51:27,800 --> 00:51:31,330
the level you came from
and the level you went to.

867
00:51:31,330 --> 00:51:33,980
And so a promotion
is a move, so you're

868
00:51:33,980 --> 00:51:37,530
going to have one more level.

869
00:51:37,530 --> 00:51:40,740
If you count levels, then you
have the number of promotions,

870
00:51:40,740 --> 00:51:41,670
right?

871
00:51:41,670 --> 00:51:46,510
That just simply
corresponds to taking this 1/2

872
00:51:46,510 --> 00:51:50,340
and raising it to c log n,
because that's essentially

873
00:51:50,340 --> 00:51:55,330
the number of
promotions you have.

874
00:51:55,330 --> 00:52:02,950
And you got n times 1/2 raised to
c log n, and what does that turn into?

875
00:52:02,950 --> 00:52:07,990
What is n times 1/2 raised to c log n?

876
00:52:07,990 --> 00:52:12,490
1 over 2 raised to
log n would give you?

877
00:52:12,490 --> 00:52:14,790
2 raised to log n?

878
00:52:14,790 --> 00:52:15,590
Is n, right?

879
00:52:15,590 --> 00:52:20,510
So you got n divided
by n raised to c, which

880
00:52:20,510 --> 00:52:23,920
is 1 divided by n
raised to c minus 1,

881
00:52:23,920 --> 00:52:27,500
which is 1 divided by n
raised to alpha where alpha

882
00:52:27,500 --> 00:52:30,341
is c minus 1.
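
Written out, the calculation just described looks like this. It is a sketch that assumes log is base 2 and that x ranges over the n inserted elements; being promoted more than c log n times is a sub-case of being promoted at least c log n times, so the same bound covers it.

```latex
\begin{align*}
\Pr[\text{some element is promoted at least } c\log n \text{ times}]
  &\le \sum_{x} \Pr[x \text{ is promoted at least } c\log n \text{ times}]
     && \text{(union bound)} \\
  &= n \cdot \left(\tfrac{1}{2}\right)^{c\log n} \\
  &= \frac{n}{n^{c}} \;=\; \frac{1}{n^{c-1}} \;=\; \frac{1}{n^{\alpha}},
     \qquad \alpha = c - 1 .
\end{align*}
```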

883
00:52:30,341 --> 00:52:30,882
So that's it.

884
00:52:30,882 --> 00:52:33,970
That's our first with
high probability analysis.

885
00:52:33,970 --> 00:52:35,550
Not too hard.

886
00:52:35,550 --> 00:52:39,140
What I've done is done
exactly what I just told you

887
00:52:39,140 --> 00:52:42,770
that the notion of with
high probability is.

888
00:52:42,770 --> 00:52:48,050
You have a failure
probability that is an

889
00:52:48,050 --> 00:52:54,380
inverse polynomial, and the
degree of the polynomial alpha

890
00:52:54,380 --> 00:52:55,621
is related to c.

891
00:52:55,621 --> 00:52:57,120
And so that's what
I have out there,

892
00:52:57,120 --> 00:53:00,080
but c equals-- what did it have?

893
00:53:00,080 --> 00:53:04,510
Alpha equals c minus 1
or c equals alpha plus 1.

894
00:53:04,510 --> 00:53:07,090
So what I've done here
is done an analysis

895
00:53:07,090 --> 00:53:10,480
that tells you with high
probability how many levels

896
00:53:10,480 --> 00:53:14,610
I'm going to have given
my insert algorithm.
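
One way to get a feel for this bound is to simulate the coin flips. The following is a minimal sketch, not the lecture's code: the function names, the seed, and the values of n, c, and the trial count are all assumptions chosen for illustration. It models only the promotion count of each insert, not the linked lists themselves.

```python
import math
import random

def promotions_for_insert(rng):
    """Flip a fair coin until tails; each heads is one promotion."""
    count = 0
    while rng.random() < 0.5:
        count += 1
    return count

def max_promotions(n, rng):
    """Largest promotion count over n independent inserts."""
    return max(promotions_for_insert(rng) for _ in range(n))

def level_failure_bound(n, c):
    """Union bound n * (1/2)^(c log2 n), which equals 1 / n^(c-1)."""
    return n * 0.5 ** (c * math.log2(n))

if __name__ == "__main__":
    rng = random.Random(6046)      # arbitrary seed, for repeatability only
    n, c, trials = 1024, 2, 200    # illustrative values
    limit = c * math.log2(n)
    failures = sum(max_promotions(n, rng) > limit for _ in range(trials))
    print(f"bound on failure probability: {level_failure_bound(n, c):.6f}")
    print(f"observed failures: {failures} of {trials} trials")
```

The observed failure count in a small experiment like this is noisy, so it only illustrates the bound; it does not verify it.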

897
00:53:14,610 --> 00:53:19,290
So this is the first part
of what we'd like to show.

898
00:53:19,290 --> 00:53:22,050
This just tells us
how big this skip list

899
00:53:22,050 --> 00:53:24,420
is going to grow vertically.

900
00:53:24,420 --> 00:53:29,140
It doesn't tell us anything
about the structure of the list

901
00:53:29,140 --> 00:53:35,110
internally as to whether
the randomization is going

902
00:53:35,110 --> 00:53:37,970
to cause that pretty
structure that you see up

903
00:53:37,970 --> 00:53:42,710
here to be completely messed up
to the point where we don't get

904
00:53:42,710 --> 00:53:46,280
order log n search complexity,
because we are spending way too

905
00:53:46,280 --> 00:53:49,200
much time let's say on the
bottom list or the list

906
00:53:49,200 --> 00:53:51,590
just above the bottom
list, et cetera.

907
00:53:51,590 --> 00:53:57,580
So we need to get a sense of
how the structure corresponding

908
00:53:57,580 --> 00:54:00,130
to the skip list, whether it's
going to look somewhat uniform

909
00:54:00,130 --> 00:54:00,630
or not.

910
00:54:00,630 --> 00:54:02,810
We have to characterize
that, and the only way

911
00:54:02,810 --> 00:54:04,320
we're going to
characterize that is

912
00:54:04,320 --> 00:54:08,400
by analyzing search and
counting the number of moves

913
00:54:08,400 --> 00:54:09,970
that a search makes.

914
00:54:09,970 --> 00:54:11,510
And the reason it's
more complicated

915
00:54:11,510 --> 00:54:15,660
than what you see up there
is that in a search, as you

916
00:54:15,660 --> 00:54:19,110
can see, you're going to be
moving at different levels.

917
00:54:19,110 --> 00:54:21,500
You're going to be
moving at the top level.

918
00:54:21,500 --> 00:54:24,410
Maybe after a relatively
small number of moves,

919
00:54:24,410 --> 00:54:28,130
you're going to pop down one,
move a few moves at that level,

920
00:54:28,130 --> 00:54:30,160
pop down, et cetera, et cetera.

921
00:54:30,160 --> 00:54:32,680
So there's a lot of things
going on in search which

922
00:54:32,680 --> 00:54:35,860
happen at different
levels, and the total cost

923
00:54:35,860 --> 00:54:38,920
is going to have to
be all of the moves.

924
00:54:38,920 --> 00:54:42,200
So we're going to think
about all of the moves--

925
00:54:42,200 --> 00:54:45,760
up moves, down moves,
and add them all up.

926
00:54:45,760 --> 00:54:49,090
They all have to be order
log n with high probability.

927
00:54:49,090 --> 00:54:52,310
There's no getting around that
because each of them costs you.

928
00:54:52,310 --> 00:54:59,400
So that's the thing that we'll
spend the next 20 minutes on.

929
00:54:59,400 --> 00:55:04,640
And the theorem that we
like to prove for search

930
00:55:04,640 --> 00:55:08,140
is that-- this is
what I just said--

931
00:55:08,140 --> 00:55:26,920
any search in an n element skip
list costs order log n w.h.p.

932
00:55:26,920 --> 00:55:30,780
So it doesn't matter how
this skip list looks.

933
00:55:30,780 --> 00:55:32,800
There's n elements,
they got inserted

934
00:55:32,800 --> 00:55:34,510
using the insert
algorithm-- that's

935
00:55:34,510 --> 00:55:37,450
important to know if you're
going to have to use that.

936
00:55:37,450 --> 00:55:41,300
And when I do a search for an
element, it may be in there,

937
00:55:41,300 --> 00:55:42,854
it may not be in there.

938
00:55:42,854 --> 00:55:43,770
Doesn't really matter.

939
00:55:43,770 --> 00:55:46,940
We'll assume a
successful search.

940
00:55:46,940 --> 00:55:51,200
That is going to cost me order
log n with high probability.

941
00:55:51,200 --> 00:55:55,130
And the cool idea here in
terms of analyzing the search

942
00:55:55,130 --> 00:55:58,930
in order to figure out
how we're going to add up

943
00:55:58,930 --> 00:56:01,280
all of these moves is
we're going to analyze

944
00:56:01,280 --> 00:56:04,470
the search backwards.

945
00:56:04,470 --> 00:56:05,460
So that's a cool idea.

946
00:56:09,350 --> 00:56:12,780
So what does that mean exactly?

947
00:56:12,780 --> 00:56:15,890
Well, what that
means is that we're

948
00:56:15,890 --> 00:56:18,280
going to think
about this b search,

949
00:56:18,280 --> 00:56:24,470
which think of it as the
backward search, starts--

950
00:56:24,470 --> 00:56:28,540
it actually ends, so that's what
I'm writing in brackets here,

951
00:56:28,540 --> 00:56:31,000
at the node in the bottom list.

952
00:56:31,000 --> 00:56:35,300
So we're assuming a successful
search, as I mentioned before.

953
00:56:35,300 --> 00:56:40,520
Otherwise, the point would
just be in between two members.

954
00:56:40,520 --> 00:56:44,180
You know that it's not in there
because you're looking for 67

955
00:56:44,180 --> 00:56:48,820
and you see 66 to your
left and 72 to your right.

956
00:56:48,820 --> 00:56:51,940
So either way it
works, but keep in mind

957
00:56:51,940 --> 00:56:55,340
that it's a successful
search because it just makes

958
00:56:55,340 --> 00:56:58,310
things a little bit easier.

959
00:56:58,310 --> 00:57:06,760
Now, at each node that we
visit, what we're going to do

960
00:57:06,760 --> 00:57:16,920
is we're going to say that
if the node was not promoted

961
00:57:16,920 --> 00:57:20,480
higher, then what
actually happened here

962
00:57:20,480 --> 00:57:24,030
was that when you inserted
that particular element,

963
00:57:24,030 --> 00:57:26,330
you got a tails.

964
00:57:26,330 --> 00:57:29,020
Because otherwise you
would have gotten a heads,

965
00:57:29,020 --> 00:57:31,820
that element would have
been promoted higher.

966
00:57:31,820 --> 00:57:38,250
Then you go-- and
that really means

967
00:57:38,250 --> 00:57:44,200
that you came from the left-hand
side, so you make a left move.

968
00:57:44,200 --> 00:57:47,820
Now, search of course makes
down moves and right moves,

969
00:57:47,820 --> 00:57:50,680
but this is a backward search
so it's going to make left moves

970
00:57:50,680 --> 00:57:53,750
and up moves.

971
00:57:53,750 --> 00:57:55,390
What else do I have here?

972
00:57:55,390 --> 00:58:06,400
Running out of room, so let
me-- let's continue with that.

973
00:58:18,100 --> 00:58:19,320
All right.

974
00:58:19,320 --> 00:58:29,050
And now the case is if
the node was promoted

975
00:58:29,050 --> 00:58:34,510
higher, that means
we got heads here

976
00:58:34,510 --> 00:58:36,790
in that particular insertion.

977
00:58:36,790 --> 00:58:43,990
Then we go, and that means
that during the search

978
00:58:43,990 --> 00:58:49,280
we came from upstairs.

979
00:58:49,280 --> 00:58:52,830
And then lastly, we
stop, which means

980
00:58:52,830 --> 00:59:06,070
we start when we reach the top
level or minus infinity if we

981
00:59:06,070 --> 00:59:08,630
go all the way back.

982
00:59:08,630 --> 00:59:10,100
So that's it.

983
00:59:10,100 --> 00:59:13,360
A lot of writing here, but
this should make things clear.

984
00:59:13,360 --> 00:59:18,020
So let's say that
we're searching for 66.

985
00:59:18,020 --> 00:59:20,430
I want to trace through what
the backwards path would

986
00:59:20,430 --> 00:59:24,890
look like, and keep that
code in mind as I do this.

987
00:59:24,890 --> 00:59:27,717
So I'm searching for
66, and obviously, we

988
00:59:27,717 --> 00:59:28,550
know how to find it.

989
00:59:28,550 --> 00:59:29,470
We've done that.

990
00:59:29,470 --> 00:59:32,650
But let's go backwards
as to what exactly

991
00:59:32,650 --> 00:59:36,380
happened when we look for 66.

992
00:59:36,380 --> 00:59:42,230
When we look for 66, right at
this point when you see 66,

993
00:59:42,230 --> 00:59:43,790
where would you have come from?

994
00:59:43,790 --> 00:59:44,665
AUDIENCE: [INAUDIBLE]

995
00:59:44,665 --> 00:59:46,880
SRINIVAS DEVADAS: You'd
have come from the top.

996
00:59:46,880 --> 00:59:50,600
And so if you go look
at what happens here,

997
00:59:50,600 --> 00:59:54,800
the node when it got inserted
was promoted one level.

998
00:59:54,800 --> 00:59:59,591
So that means that you would go
up top in the backward search

999
00:59:59,591 --> 01:00:00,090
first.

1000
01:00:00,090 --> 01:00:03,140
Your first move would
be going up like that.

1001
01:00:03,140 --> 01:00:07,390
Now, if there's a 66 up there,
you would go up one more.

1002
01:00:07,390 --> 01:00:09,340
But there's not, so you go left.

1003
01:00:12,270 --> 01:00:13,720
You go to 50.

1004
01:00:13,720 --> 01:00:17,310
And when you have a 50 up here,
would you stay on this level?

1005
01:00:17,310 --> 01:00:18,291
AUDIENCE: No.

1006
01:00:18,291 --> 01:00:19,166
SRINIVAS DEVADAS: No.

1007
01:00:19,166 --> 01:00:22,690
You'd go up to 50
because the first chance

1008
01:00:22,690 --> 01:00:26,020
you get you want to get
up to the higher levels.

1009
01:00:26,020 --> 01:00:28,215
And again, this 50 was
promoted so you go up there,

1010
01:00:28,215 --> 01:00:33,230
and you go to 14, and pretty
much that's the end of that.

1011
01:00:33,230 --> 01:00:38,860
So this would look like you go
like that, you have an up move,

1012
01:00:38,860 --> 01:00:42,860
then you have a left move--
different colors here

1013
01:00:42,860 --> 01:00:47,570
would be good-- then
you have an up move,

1014
01:00:47,570 --> 01:00:52,270
and a left, and then an up.

1015
01:00:52,270 --> 01:00:55,140
So that's our backward search.
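
The rules for the backward search can be sketched in a few lines. This is an illustration under stated assumptions: the skip list is represented only by how many times each key was promoted, and the promotion counts below are hypothetical, chosen so that the path from 66 matches the moves traced above (the board itself is not part of this transcript). The function name and the minus-infinity sentinel handling are also assumptions.

```python
import math

# Hypothetical promotion counts (not taken from the board): chosen only so that
# the backward path from 66 is up, left, up, left, up.
PROMOTIONS = {14: 3, 23: 0, 34: 1, 42: 1, 50: 2, 59: 0, 66: 1, 72: 0, 79: 1}

def backward_search(promotions, target):
    """Return the backward moves starting from `target` in the bottom list.

    Assumes a successful search, i.e. `target` is a key in `promotions`.
    """
    top = max(promotions.values())
    # A minus-infinity sentinel sits at the left end of every level.
    height = {-math.inf: top, **promotions}
    keys = sorted(height)
    i = keys.index(target)
    level = 0
    moves = []
    # Stop on reaching the top level or the minus-infinity sentinel.
    while level < top and keys[i] != -math.inf:
        if height[keys[i]] > level:
            # Promoted higher (heads): the search arrived from above.
            level += 1
            moves.append("up")
        else:
            # Not promoted (tails): the search arrived from the left, so step
            # to the nearest left neighbor that exists on this level.
            i -= 1
            while height[keys[i]] < level:
                i -= 1
            moves.append("left")
    return moves

if __name__ == "__main__":
    print(backward_search(PROMOTIONS, 66))
```

Reversing the returned list and reading "up" as "down" and "left" as "right" gives the moves of the ordinary forward search.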

1016
01:00:55,140 --> 01:00:58,940
And it's not that
complicated, hopefully.

1017
01:00:58,940 --> 01:01:01,980
If you're looking for
66 or 59, you do that.

1018
01:01:01,980 --> 01:01:05,710
So it's much more natural,
and you just need to flip it.

1019
01:01:05,710 --> 01:01:07,520
Why am I doing all this?

1020
01:01:07,520 --> 01:01:09,720
Well, the reason
I'm doing all this

1021
01:01:09,720 --> 01:01:15,770
is that I have to do some
bounding of the moves,

1022
01:01:15,770 --> 01:01:21,350
and I know that the moves that
correspond to the up moves

1023
01:01:21,350 --> 01:01:25,280
are probabilistic in the sense
that the reason I'm making them

1024
01:01:25,280 --> 01:01:29,170
is because I flipped
heads at some point.

1025
01:01:29,170 --> 01:01:32,560
So all of this is going
to turn into counting

1026
01:01:32,560 --> 01:01:36,350
how many coin flips
come out heads

1027
01:01:36,350 --> 01:01:39,672
in a long stream of coin flips.

1028
01:01:39,672 --> 01:01:41,130
So that's what this
backward search

1029
01:01:41,130 --> 01:01:42,630
is going to allow us to do.

1030
01:01:42,630 --> 01:01:47,270
And that crucial thing is
what we'll look at next.

1031
01:01:47,270 --> 01:01:50,730
So the analysis itself
is a bit painful,

1032
01:01:50,730 --> 01:01:52,139
but there's a bunch of algebra.

1033
01:01:52,139 --> 01:01:53,680
But what I want to
do is to make sure

1034
01:01:53,680 --> 01:01:57,950
that you get the high
level picture, number one,

1035
01:01:57,950 --> 01:02:08,500
and the insights as to why
the expected value or the with

1036
01:02:08,500 --> 01:02:10,810
high probability value is
going to be order log n.

1037
01:02:10,810 --> 01:02:13,164
But the key is the strategy.

1038
01:02:13,164 --> 01:02:14,580
So we're going to
go off and we're

1039
01:02:14,580 --> 01:02:15,746
going to prove this theorem.

1040
01:02:22,080 --> 01:02:38,330
Our backward search makes
up moves and left moves.

1041
01:02:38,330 --> 01:02:38,900
We know that.

1042
01:02:42,480 --> 01:02:48,310
Each with probability 1/2.

1043
01:02:48,310 --> 01:02:52,910
And the reason for
that is when you go up

1044
01:02:52,910 --> 01:02:55,050
is because you got
a heads, and if you

1045
01:02:55,050 --> 01:02:58,880
didn't get a heads and you got a
tails, that meant you go left.

1046
01:02:58,880 --> 01:03:01,960
Because of the previous
element, every time you're

1047
01:03:01,960 --> 01:03:06,230
passing these elements
that are inserted,

1048
01:03:06,230 --> 01:03:09,660
and they were inserted
by flipping coins.

1049
01:03:09,660 --> 01:03:13,436
So that's key point number one.

1050
01:03:13,436 --> 01:03:15,310
All of that, if you look
at what happens here

1051
01:03:15,310 --> 01:03:17,650
when I drew this out,
you got heads here

1052
01:03:17,650 --> 01:03:19,630
and you got tails there.

1053
01:03:19,630 --> 01:03:21,460
So each of those
things for a fair coin

1054
01:03:21,460 --> 01:03:23,370
is happening with
probability 1/2.

1055
01:03:23,370 --> 01:03:26,120
And it's all about
coin flips here.

1056
01:03:26,120 --> 01:03:38,700
Now, the number
of moves going up

1057
01:03:38,700 --> 01:03:44,750
is less than the number of
levels-- the number of levels

1058
01:03:44,750 --> 01:03:46,100
is one more than that.

1059
01:03:46,100 --> 01:03:52,230
And we've shown that that's
c log n with high probability

1060
01:03:52,230 --> 01:03:53,480
by the warm-up Lemma.

1061
01:03:53,480 --> 01:03:55,370
That's what this just did.

1062
01:03:55,370 --> 01:03:59,540
The number of up moves-- I mean
you can't go off the list here.

1063
01:03:59,540 --> 01:04:01,720
This list is now you're
not inserting anymore,

1064
01:04:01,720 --> 01:04:02,840
you're doing a search.

1065
01:04:02,840 --> 01:04:04,750
So it's not like you're
going to be adding

1066
01:04:04,750 --> 01:04:06,460
levels or anything like that.

1067
01:04:06,460 --> 01:04:09,070
So the number of up moves
we've taken care of.

1068
01:04:09,070 --> 01:04:11,970
So this last thing here which
I'm going to write out here

1069
01:04:11,970 --> 01:04:15,600
is the key observation,
which is going to make

1070
01:04:15,600 --> 01:04:17,880
the whole analysis possible.

1071
01:04:17,880 --> 01:04:23,400
And so this last thing it
says that the total number

1072
01:04:23,400 --> 01:04:27,260
of moves-- so now the total
number of moves has to include,

1073
01:04:27,260 --> 01:04:28,820
obviously, the up
moves and the left

1074
01:04:28,820 --> 01:04:30,470
moves, and there's
no other kind.

1075
01:04:33,146 --> 01:04:38,770
The total number
of moves is going

1076
01:04:38,770 --> 01:04:51,258
to correspond to
the number of moves

1077
01:04:51,258 --> 01:05:04,317
till you get c log n up moves.

1078
01:05:07,570 --> 01:05:09,720
So what does that mean?

1079
01:05:09,720 --> 01:05:11,530
There's some sequence
of heads and tails

1080
01:05:11,530 --> 01:05:15,270
that I'm getting, each of
them with probability 1/2.

1081
01:05:15,270 --> 01:05:19,090
Every time that I got a
heads, I moved up a level.

1082
01:05:19,090 --> 01:05:23,140
The fact of the matter is that
I can't get more than c log n

1083
01:05:23,140 --> 01:05:27,000
heads because I'm going
to run out of levels.

1084
01:05:27,000 --> 01:05:28,980
That's it.

1085
01:05:28,980 --> 01:05:33,530
I'm going to run out of room
vertically if I keep popping up

1086
01:05:33,530 --> 01:05:35,700
and keep doing up moves.

1087
01:05:35,700 --> 01:05:39,484
So at that point I'm
forced to go left.

1088
01:05:39,484 --> 01:05:40,900
Maybe I'm going
left in the middle

1089
01:05:40,900 --> 01:05:44,220
there when I still
had a chance to go up.

1090
01:05:44,220 --> 01:05:47,390
That corresponds to getting a
tails as opposed to a heads.

1091
01:05:47,390 --> 01:05:50,910
But I can limit the
total number of moves

1092
01:05:50,910 --> 01:05:53,850
from a probabilistic
standpoint by saying

1093
01:05:53,850 --> 01:05:57,370
during that sequence
of coin flips I only

1094
01:05:57,370 --> 01:05:59,500
have a certain number
of heads that I

1095
01:05:59,500 --> 01:06:01,080
could have possibly gotten.

1096
01:06:01,080 --> 01:06:04,910
Because if I got more heads
than that, I would be up top.

1097
01:06:04,910 --> 01:06:10,120
I'd be out of the skip
list, and that doesn't work.

1098
01:06:10,120 --> 01:06:13,210
So the total number of
moves is the number of moves

1099
01:06:13,210 --> 01:06:18,720
till you get c log n up moves,
which essentially corresponds

1100
01:06:18,720 --> 01:06:24,210
to-- now, forget about
skip lists for a second.

1101
01:06:24,210 --> 01:06:28,590
Our claim is the
total number of moves

1102
01:06:28,590 --> 01:06:33,950
is the number of coin flips,
so these are the same,

1103
01:06:33,950 --> 01:06:37,090
because every move
corresponds to a coin flip.

1104
01:06:37,090 --> 01:06:41,720
Until-- it's a fair
coin, probability 1/2--

1105
01:06:41,720 --> 01:06:49,620
until c log n heads
have been obtained.

1106
01:06:49,620 --> 01:06:52,800
So the number of
coin flips until c

1107
01:06:52,800 --> 01:06:56,880
log n heads is the
total number of moves.

1108
01:06:56,880 --> 01:06:57,920
This equals that.
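
The quantity in this claim, the number of fair coin flips needed to collect a given number of heads, is easy to simulate. The sketch below is only an illustration: the function name, the seed, and the choice of 20 heads are assumptions, and the 20 stands in for c log n.

```python
import random

def flips_until_heads(target_heads, rng):
    """Count fair coin flips until `target_heads` heads have come up."""
    flips = 0
    heads = 0
    while heads < target_heads:
        flips += 1
        if rng.random() < 0.5:   # heads: an up move in the backward search
            heads += 1
        # otherwise tails: a left move in the backward search
    return flips

if __name__ == "__main__":
    rng = random.Random(7)       # arbitrary seed, for repeatability only
    k = 20                       # stands in for c log n
    samples = [flips_until_heads(k, rng) for _ in range(2000)]
    print("average flips:", sum(samples) / len(samples))
    print("largest seen: ", max(samples))
```

The average should land near 2k, which is the expectation argument; the with high probability claim is about how rarely the count strays far above that.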

1109
01:07:00,450 --> 01:07:06,600
And what we now want to show, if
you believe that, and hopefully

1110
01:07:06,600 --> 01:07:08,480
you do because the
argument is simply

1111
01:07:08,480 --> 01:07:15,740
that you run out of levels,
that this is order log n w.h.p.

1112
01:07:15,740 --> 01:07:17,450
That's why it's a claim.

1113
01:07:17,450 --> 01:07:21,550
So the observation is
that the number of coin

1114
01:07:21,550 --> 01:07:24,330
flips, as you flip a
fair coin, until you

1115
01:07:24,330 --> 01:07:28,700
get c log n heads will give
you the number of moves

1116
01:07:28,700 --> 01:07:33,270
in your search, total number
of moves in your search.

1117
01:07:33,270 --> 01:07:35,880
It includes the up moves
as well as the left moves.

1118
01:07:35,880 --> 01:07:41,220
And now what we have
to show is that that

1119
01:07:41,220 --> 01:07:44,150
is going to be order log
n with high probability.

1120
01:07:44,150 --> 01:07:45,240
OK?

1121
01:07:45,240 --> 01:07:48,650
And then once you do that
you've done two things.

1122
01:07:48,650 --> 01:07:55,830
You've bounded the number
of levels in the skip list

1123
01:07:55,830 --> 01:07:58,910
to be order log n
with high probability.

1124
01:07:58,910 --> 01:08:01,470
And you've said the number
of moves in the search

1125
01:08:01,470 --> 01:08:06,110
is order log n with high
probability assuming

1126
01:08:06,110 --> 01:08:11,240
that the number of levels
is c log n, obviously.

1127
01:08:11,240 --> 01:08:15,650
So it's not that the bottom
one subsumes the top one.

1128
01:08:15,650 --> 01:08:18,560
It's the last thing to
keep in mind as we get all

1129
01:08:18,560 --> 01:08:22,520
of these items out of the way.

1130
01:08:22,520 --> 01:08:26,439
This assumes that there are
less than or equal to c log n

1131
01:08:26,439 --> 01:08:27,155
levels.

1132
01:08:27,155 --> 01:08:29,279
That's the only reason why
I could make an argument

1133
01:08:29,279 --> 01:08:31,149
that I've run out of levels.

1134
01:08:31,149 --> 01:08:35,036
So if I have this event A
here-- if I call this event A,

1135
01:08:35,036 --> 01:08:39,510
and I have this event
B, what I really want

1136
01:08:39,510 --> 01:08:43,390
is-- I've shown you that event
A happens with high probability.

1137
01:08:43,390 --> 01:08:45,149
That's the warm-up Lemma.

1138
01:08:45,149 --> 01:08:48,649
I need to show you that event B
happens with high probability.

1139
01:08:48,649 --> 01:08:51,680
And then I have to show you
that event A and event B

1140
01:08:51,680 --> 01:08:56,490
happen with high probability,
because I need both.

1141
01:08:56,490 --> 01:08:57,149
Any questions?

1142
01:08:57,149 --> 01:08:59,460
We're stopping a minute here.

1143
01:08:59,460 --> 01:09:01,870
The rest of the analysis,
a bunch of algebra,

1144
01:09:01,870 --> 01:09:03,910
we'll get through it, you
can look at the notes.

1145
01:09:03,910 --> 01:09:05,920
This is the key point.

1146
01:09:05,920 --> 01:09:08,762
If you got this, you got it.

1147
01:09:08,762 --> 01:09:09,262
Yeah.

1148
01:09:09,262 --> 01:09:11,553
AUDIENCE: Can you just say
that because the probability

1149
01:09:11,553 --> 01:09:15,869
of drawing an up move
instead of a left move

1150
01:09:15,869 --> 01:09:21,265
is 1/2, that the expected
number of left moves

1151
01:09:21,265 --> 01:09:25,227
should be equal to the number
of up moves, [INAUDIBLE]

1152
01:09:25,227 --> 01:09:26,649
bound the up moves?

1153
01:09:26,649 --> 01:09:28,229
SRINIVAS DEVADAS:
So the argument

1154
01:09:28,229 --> 01:09:32,410
is that since you
have 1/2, can you

1155
01:09:32,410 --> 01:09:37,470
simply say that the expected
number of left moves

1156
01:09:37,470 --> 01:09:40,490
is going to be the same as
the number of up moves?

1157
01:09:40,490 --> 01:09:42,790
You can make arguments
about expectation.

1158
01:09:42,790 --> 01:09:46,200
You can say that at any level,
the number of left moves

1159
01:09:46,200 --> 01:09:50,090
that you're going to have is
going to be two in expectation.

1160
01:09:50,090 --> 01:09:54,290
It's not going to give you your
with high probability proof.

1161
01:09:54,290 --> 01:09:57,410
It's not going to relate
that to the 1 divided

1162
01:09:57,410 --> 01:09:58,630
by n raised to alpha.

1163
01:09:58,630 --> 01:10:02,430
But I will tell you that if you
just wanted to show expectation

1164
01:10:02,430 --> 01:10:04,990
for search is order
log n, you won't

1165
01:10:04,990 --> 01:10:08,400
have to jump through
all of these hoops.

1166
01:10:08,400 --> 01:10:11,270
At some level you'll be
making the assumptions

1167
01:10:11,270 --> 01:10:13,927
that I've made explicit
here through my observations

1168
01:10:13,927 --> 01:10:15,135
when you do that expectation.

1169
01:10:15,135 --> 01:10:19,540
So if you really want to write
a precise proof of expected

1170
01:10:19,540 --> 01:10:22,320
value for search
complexity, you would

1171
01:10:22,320 --> 01:10:25,880
have to do a lot of the
things that I'm doing here.

1172
01:10:25,880 --> 01:10:27,380
I'm not saying you
waved your hands.

1173
01:10:27,380 --> 01:10:30,120
You did not.

1174
01:10:30,120 --> 01:10:34,220
But it needed more
than what you just said.

1175
01:10:34,220 --> 01:10:35,820
OK?

1176
01:10:35,820 --> 01:10:40,580
So this is pretty much
what the analysis is.

1177
01:10:40,580 --> 01:10:43,800
With high probability analysis
we bounded the vertical,

1178
01:10:43,800 --> 01:10:45,920
we bounded the number of moves.

1179
01:10:45,920 --> 01:10:48,710
Assuming the
vertical was bounded,

1180
01:10:48,710 --> 01:10:51,350
we got the result for
the number of moves.

1181
01:10:51,350 --> 01:10:53,720
So both of those happen
with high probability.

1182
01:10:53,720 --> 01:10:56,570
You got your result,
which is the theorem

1183
01:10:56,570 --> 01:11:00,950
that we have somewhere.

1184
01:11:00,950 --> 01:11:02,770
Woah, did I erase the theorem?

1185
01:11:02,770 --> 01:11:04,150
AUDIENCE: [INAUDIBLE].

1186
01:11:04,150 --> 01:11:05,525
SRINIVAS DEVADAS:
It's somewhere.

1187
01:11:06,901 --> 01:11:07,400
All right.

1188
01:11:07,400 --> 01:11:08,270
Good.

1189
01:11:08,270 --> 01:11:10,780
So let's do what
we can with respect

1190
01:11:10,780 --> 01:11:14,980
to showing this theorem.

1191
01:11:14,980 --> 01:11:17,810
There's a couple ways
that you could prove this.

1192
01:11:17,810 --> 01:11:26,910
There's a way that you
could use a Chernoff bound.

1193
01:11:26,910 --> 01:11:29,840
And this is kind
of a cool result

1194
01:11:29,840 --> 01:11:32,700
that I think is worth knowing.

1195
01:11:32,700 --> 01:11:34,430
I don't know if
you've seen this,

1196
01:11:34,430 --> 01:11:38,430
but this is a seminal
theorem by Chernoff

1197
01:11:38,430 --> 01:11:55,220
that says if you have a
random variable representing

1198
01:11:55,220 --> 01:12:00,700
the total number of
tails, let's say--

1199
01:12:00,700 --> 01:12:08,310
it could be heads as
well-- in a series of m--

1200
01:12:08,310 --> 01:12:22,110
not n, m-- independent coin
flips where each flip has

1201
01:12:22,110 --> 01:12:30,200
a probability p of
coming up heads,

1202
01:12:30,200 --> 01:12:38,750
then for all r greater
than 0, we have

1203
01:12:38,750 --> 01:12:45,040
this beautiful result that
says the probability that Y,

1204
01:12:45,040 --> 01:12:53,320
which is a random variable--
a particular instance

1205
01:12:53,320 --> 01:12:58,980
when you evaluate
it-- that it is larger

1206
01:12:58,980 --> 01:13:03,210
than the expectation
by r is bounded.

1207
01:13:03,210 --> 01:13:07,560
So just a beautiful
result that says here's

1208
01:13:07,560 --> 01:13:12,520
a random variable that
corresponds to flipping a coin.

1209
01:13:12,520 --> 01:13:15,700
I'm going to flip
this a bunch of times,

1210
01:13:15,700 --> 01:13:17,510
and I know what
the expectation is.

1211
01:13:17,510 --> 01:13:21,790
If it's a fair coin
of 1/2, then I'm

1212
01:13:21,790 --> 01:13:24,400
going to get m over 2--
expected number of heads

1213
01:13:24,400 --> 01:13:25,760
is going to be m over 2.

1214
01:13:25,760 --> 01:13:28,040
Expected number of tails
is going to be m over 2.

1215
01:13:28,040 --> 01:13:30,190
If it's p, then obviously
it's a little bit

1216
01:13:30,190 --> 01:13:32,600
different-- p times m.

1217
01:13:32,600 --> 01:13:37,850
But what I have here is if you
tell me what the probability is

1218
01:13:37,850 --> 01:13:40,500
that I'm 10 away
from the expectation

1219
01:13:40,500 --> 01:13:44,670
and that would imply that r is
10, then that is bounded by e

1220
01:13:44,670 --> 01:13:48,240
raised to minus 2 times
10 square divided by m.

1221
01:13:48,240 --> 01:13:50,369
So that's Chernoff's bound.
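
In symbols, the bound as stated here reads as follows, with Y the number of tails in m independent flips and r any positive number. The notation is a reconstruction from the spoken description, not a copy of the board.

```latex
\Pr\bigl[\, Y \ge \mathbb{E}[Y] + r \,\bigr] \;\le\; e^{-2r^{2}/m}
\qquad \text{for all } r > 0 .
```

With r = 10 the right-hand side is e^{-200/m}, which is the "2 times 10 square divided by m" in the example.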

1222
01:13:50,369 --> 01:13:52,910
And you can see how this relates
to our with high probability

1223
01:13:52,910 --> 01:13:53,860
analysis.

1224
01:13:53,860 --> 01:13:55,290
Because our with
high probability

1225
01:13:55,290 --> 01:13:57,110
analysis is exactly this.

1226
01:13:57,110 --> 01:14:00,830
This is the hammer that you can
use to do with high probability

1227
01:14:00,830 --> 01:14:01,690
analysis.

1228
01:14:01,690 --> 01:14:04,730
Because this tells you as you
get further and further away

1229
01:14:04,730 --> 01:14:07,460
from the average or you get
further and further away

1230
01:14:07,460 --> 01:14:10,090
from the expectation, what
the probability is that you're

1231
01:14:10,090 --> 01:14:11,960
going to be so far away.

1232
01:14:11,960 --> 01:14:19,260
What is the probability that in
100 coin flips that are fair,

1233
01:14:19,260 --> 01:14:22,390
you get 50 heads?

1234
01:14:22,390 --> 01:14:25,440
It's a reasonably large number
because the expected value

1235
01:14:25,440 --> 01:14:28,590
corresponds to 50.

1236
01:14:28,590 --> 01:14:30,470
So r is 0.

1237
01:14:30,470 --> 01:14:32,755
So that just says
this is a-- well,

1238
01:14:32,755 --> 01:14:35,130
it doesn't tell you much
because this says it's less than

1239
01:14:35,130 --> 01:14:36,750
or equal to 1.

1240
01:14:36,750 --> 01:14:38,400
That's all it says.

1241
01:14:38,400 --> 01:14:43,360
But if you had 75, what is
the probability that you

1242
01:14:43,360 --> 01:14:48,370
get 75 heads when you
flip a coin 100 times?

1243
01:14:48,370 --> 01:14:53,390
Then E of Y for a fair coin
would be 50, r would be 25,

1244
01:14:53,390 --> 01:14:56,220
and you'd go off and you
could do the math for that.
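
Doing that math takes one line. The sketch below evaluates the bound e^(-2r^2/m) for the two cases just mentioned; the function name is an assumption, and the result is an upper bound on the probability of being at least r above the expectation, not the exact probability.

```python
import math

def chernoff_tail_bound(m, r):
    """Upper bound e^(-2 r^2 / m) on Pr[Y >= E[Y] + r] for m coin flips."""
    return math.exp(-2.0 * r * r / m)

if __name__ == "__main__":
    m = 100
    for r in (0, 25):
        print(f"m={m}, r={r}: bound = {chernoff_tail_bound(m, r):.3g}")
```

For r = 0 the bound is 1, which says nothing, and for r = 25 the exponent is -12.5, so the bound is already a few parts in a million.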

1245
01:14:56,220 --> 01:14:59,670
So it's a beautiful
relationship that tells you

1246
01:14:59,670 --> 01:15:05,050
how the probabilities change as
your random variable value is

1247
01:15:05,050 --> 01:15:07,880
further and further away
from the expectation.

1248
01:15:07,880 --> 01:15:09,900
And you can imagine
that this is going

1249
01:15:09,900 --> 01:15:19,110
to be very useful in showing our
with high probability result.

1250
01:15:19,110 --> 01:15:22,760
And I think what I
have time for is just

1251
01:15:22,760 --> 01:15:27,810
to give you a sense of how
this result works out-- I'm not

1252
01:15:27,810 --> 01:15:28,960
going to do the algebra.

1253
01:15:28,960 --> 01:15:32,610
I don't think it's worth it to
write all of this on the board

1254
01:15:32,610 --> 01:15:35,340
when you can read
it in the notes.

1255
01:15:35,340 --> 01:15:37,260
But the bottom
line is we're going

1256
01:15:37,260 --> 01:15:47,730
to show this little Lemma
that says for any c,

1257
01:15:47,730 --> 01:15:53,330
invoking this Chernoff
bound, there's a constant d,

1258
01:15:53,330 --> 01:16:05,406
such that with high
probability, the number of heads

1259
01:16:05,406 --> 01:16:09,510
in flipping d log n.

1260
01:16:09,510 --> 01:16:11,240
So I have a new constant here.

1261
01:16:11,240 --> 01:16:15,830
d log n fair coins,
or a single fair coin,

1262
01:16:15,830 --> 01:16:20,040
d log n times,
assuming independence,

1263
01:16:20,040 --> 01:16:23,380
is at least c log n.

1264
01:16:23,380 --> 01:16:24,780
So what does this say?

1265
01:16:24,780 --> 01:16:26,390
A lot of words.

1266
01:16:26,390 --> 01:16:32,320
It just says, hey, you
want an order log n

1267
01:16:32,320 --> 01:16:34,270
bound here eventually.

1268
01:16:34,270 --> 01:16:36,590
The beauty of order log n
is that there's a constant

1269
01:16:36,590 --> 01:16:38,760
in there that you control.

1270
01:16:38,760 --> 01:16:41,420
That constant is d.

1271
01:16:41,420 --> 01:16:46,530
So you tell me
that c log n is 50.

1272
01:16:46,530 --> 01:16:49,590
So c log n is 50.

1273
01:16:49,590 --> 01:16:52,570
Then what I'm going to do is
I'm going to say something like,

1274
01:16:52,570 --> 01:17:00,760
well, if I flip a
coin 1,000 times, then

1275
01:17:00,760 --> 01:17:02,970
I'm going to have an
overwhelming probability

1276
01:17:02,970 --> 01:17:06,070
that I'm going to get 50 heads.
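
To put a number on "overwhelming", here is a small sketch that computes the exact probability of falling short of 50 heads in 1,000 independent fair flips. The helper name `prob_fewer_heads` is made up for this illustration.

```python
import math


def prob_fewer_heads(m: int, k: int) -> float:
    """Exact Pr[fewer than k heads in m independent fair flips]."""
    favorable = sum(math.comb(m, i) for i in range(k))
    return favorable / 2**m


# Failure event from the example: fewer than 50 heads in 1,000 flips.
failure = prob_fewer_heads(1000, 50)
print(f"Pr[fewer than 50 heads in 1000 flips] = {failure:.3e}")
```

The sum uses exact integer arithmetic and converts to a float only in the final division, so the tiny result is not lost to rounding along the way.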

1277
01:17:06,070 --> 01:17:06,860
And that's it.

1278
01:17:06,860 --> 01:17:10,040
That's what the lemma says.

1279
01:17:10,040 --> 01:17:12,900
It says, tell me what c log n is.

1280
01:17:12,900 --> 01:17:14,430
Give me that value.

1281
01:17:14,430 --> 01:17:18,970
And I will find you a d, such
that by invoking Chernoff,

1282
01:17:18,970 --> 01:17:22,250
I'm going to show you an
overwhelming probability that

1283
01:17:22,250 --> 01:17:25,924
for that d you're going to
get at least c log n heads.

1284
01:17:25,924 --> 01:17:26,840
So everybody buy that?

1285
01:17:26,840 --> 01:17:30,117
Make sense from what
you see up there?

1286
01:17:30,117 --> 01:17:31,520
Yup?

1287
01:17:31,520 --> 01:17:33,920
So this essentially
can be shown--

1288
01:17:33,920 --> 01:17:35,640
it turns out that
what you have to do

1289
01:17:35,640 --> 01:17:38,110
is-- and you don't
have to choose 8,

1290
01:17:38,110 --> 01:17:41,620
but you can choose d equals 8c.

1291
01:17:41,620 --> 01:17:44,030
Just choose d
equals 8c and you'll

1292
01:17:44,030 --> 01:17:48,100
see the algebra in the
notes corresponding to what

1293
01:17:48,100 --> 01:17:49,560
each of these values is.

1294
01:17:49,560 --> 01:17:54,570
So E of Y, just to tell
you, would be m over 2.

1295
01:17:54,570 --> 01:17:58,940
You're flipping m coins, fair
coin with probability 1/2.

1296
01:17:58,940 --> 01:18:01,075
So you got m over 2.

1297
01:18:01,075 --> 01:18:02,450
And then the last
thing that I'll

1298
01:18:02,450 --> 01:18:08,980
tell you is that what you want
in terms of invoking that,

1299
01:18:08,980 --> 01:18:12,660
you want r-- remember we were
talking about tails here-- so r

1300
01:18:12,660 --> 01:18:18,990
is going to be d
log n minus c log n.

1301
01:18:18,990 --> 01:18:23,780
So you just invoke Chernoff
with E of Y equals m over 2.

1302
01:18:23,780 --> 01:18:27,370
And what you're saying here
is you want c log n heads.

1303
01:18:27,370 --> 01:18:34,195
You want to make sure you
get c log n heads, which

1304
01:18:34,195 --> 01:18:35,820
means that the number
of tails is going

1305
01:18:35,820 --> 01:18:38,390
to be d log n minus c log n.

1306
01:18:38,390 --> 01:18:41,610
And typically we analyze
failure probability,

1307
01:18:41,610 --> 01:18:45,180
so this is
going to be a tiny number.

1308
01:18:45,180 --> 01:18:51,350
So the failure is when you
get fewer than c log n heads.

1309
01:18:51,350 --> 01:18:54,140
So the failure is when you
get fewer than c log n heads.

1310
01:18:54,140 --> 01:18:59,585
And so that means that you're
getting more than d log

1311
01:18:59,585 --> 01:19:04,330
n minus c log n tails as
you're flipping this coin.

1312
01:19:04,330 --> 01:19:07,910
Fewer than c log n heads means
you're getting at least d log

1313
01:19:07,910 --> 01:19:10,030
n minus c log n tails.

1314
01:19:10,030 --> 01:19:12,130
So that's why this
is your r here.

1315
01:19:12,130 --> 01:19:14,360
And then when your
r gets that large,

1316
01:19:14,360 --> 01:19:16,400
and you can play around
with the d and the c

1317
01:19:16,400 --> 01:19:19,580
and choose d equals
8c, you realize

1318
01:19:19,580 --> 01:19:22,640
that this is going to be
a minuscule probability.
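
The algebra left to the notes can be sketched as follows. This assumes the Chernoff form Pr[Y >= E[Y] + r] <= e^{-2r^2/m}, with Y the number of tails in m = d lg n fair flips and logs taken base 2. Under that reading, d log n minus c log n is the tail-count threshold, and the deviation measured from E[Y] = (d/2) lg n is (d/2 - c) lg n.

```latex
\begin{align*}
m &= d \lg n = 8c \lg n, \qquad E[Y] = \tfrac{m}{2} = 4c \lg n,\\
\Pr[\text{fewer than } c \lg n \text{ heads}]
  &\le \Pr[Y \ge (d - c)\lg n]
   = \Pr\!\left[Y \ge E[Y] + \left(\tfrac{d}{2} - c\right)\lg n\right],\\
r &= \left(\tfrac{d}{2} - c\right)\lg n = 3c \lg n,\\
\Pr[Y \ge E[Y] + r] &\le e^{-2r^2/m}
   = e^{-2(3c \lg n)^2/(8c \lg n)}
   = e^{-\frac{9}{4} c \lg n}
   \le 2^{-c \lg n} = \frac{1}{n^{c}}.
\end{align*}
```

So with d = 8c the failure probability is at most 1/n^c, which is the polynomial form that "with high probability" asks for.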

1319
01:19:22,640 --> 01:19:27,790
And you can turn that
around to a polynomial--

1320
01:19:27,790 --> 01:19:29,180
again, a little bit of algebra.

1321
01:19:29,180 --> 01:19:32,990
But you can show
this result on here

1322
01:19:32,990 --> 01:19:34,820
that says that
the number of coin

1323
01:19:34,820 --> 01:19:37,745
flips until c log n
heads is order log

1324
01:19:37,745 --> 01:19:39,940
n with high probability
by appropriately

1325
01:19:39,940 --> 01:19:45,700
choosing the constant d to
be some number times c.

1326
01:19:45,700 --> 01:19:47,250
So I'll let you do that algebra.
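
For anyone who would rather check that algebra numerically first, the sketch below evaluates the assumed Chernoff form exp(-2 r^2 / m) with d = 8c for a few values of n and c, and prints it next to 1/n^c. The function `failure_bound` and the sample values are illustrative choices, not from the lecture.

```python
import math


def failure_bound(n: int, c: float, d: float) -> float:
    """Assumed Chernoff form exp(-2 r^2 / m) with m = d lg n flips and
    r = (d/2 - c) lg n tails beyond the mean."""
    lg_n = math.log2(n)
    m = d * lg_n
    r = (d / 2 - c) * lg_n
    return math.exp(-2.0 * r * r / m)


for n in (16, 1024, 10**6):
    for c in (1, 2, 3):
        bound = failure_bound(n, c, d=8 * c)
        target = n ** (-c)  # the 1/n^c failure probability we are aiming for
        print(f"n={n:>7} c={c}  bound={bound:.3e}  1/n^c={target:.3e}")
```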

1327
01:19:47,250 --> 01:19:51,100
But there's one last
thing-- we're not quite done.

1328
01:19:51,100 --> 01:19:53,880
So you thought we were done,
but we're not quite done.

1329
01:19:53,880 --> 01:19:58,220
And why is it that
we're not quite done?

1330
01:19:58,220 --> 01:20:02,370
Real quick question
worth five Frisbees.

1331
01:20:02,370 --> 01:20:04,620
Why is it that we're
not quite done?

1332
01:20:04,620 --> 01:20:05,840
What did I say?

1333
01:20:05,840 --> 01:20:08,733
I have done event A
and event B, right?

1334
01:20:08,733 --> 01:20:10,100
AUDIENCE: [INAUDIBLE].

1335
01:20:10,100 --> 01:20:13,800
SRINIVAS DEVADAS: I haven't
done the last thing which

1336
01:20:13,800 --> 01:20:22,070
is to show that probability
of event A-- this

1337
01:20:22,070 --> 01:20:25,250
is with high
probability happens--

1338
01:20:25,250 --> 01:20:26,910
and I need to show
that probability

1339
01:20:26,910 --> 01:20:33,385
of event A and event
B happens-- or this

1340
01:20:33,385 --> 01:20:34,967
is with high probability.

1341
01:20:34,967 --> 01:20:36,550
Or I should just say
event A and event

1342
01:20:36,550 --> 01:20:42,510
B happen with high probability.
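
One standard way to combine the two events is the union bound, sketched here as an assumption rather than as the lecture's exact argument. The exponents alpha and beta are placeholders for whatever failure bounds were shown for A and B separately.

```latex
\begin{align*}
\Pr[\overline{A \cap B}] &= \Pr[\overline{A} \cup \overline{B}]
  \le \Pr[\overline{A}] + \Pr[\overline{B}]
  \le \frac{1}{n^{\alpha}} + \frac{1}{n^{\beta}}
  \le \frac{2}{n^{\min(\alpha,\beta)}},\\
\Pr[A \cap B] &\ge 1 - O\!\left(\frac{1}{n^{\min(\alpha,\beta)}}\right).
\end{align*}
```

This step does not need A and B to be independent, which is why it stays straightforward.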

1343
01:20:42,510 --> 01:20:43,470
And you can see that.

1344
01:20:43,470 --> 01:20:45,480
It turns out it's
pretty straightforward,

1345
01:20:45,480 --> 01:20:47,440
but you got the gist of it.

1346
01:20:47,440 --> 01:20:49,020
Thanks for being so patient.

1347
01:20:49,020 --> 01:20:51,870
And there you go, guys.

1348
01:20:51,870 --> 01:20:53,420
Whoa.