Let's go ahead and get started. OK, so today we have one topic to finish up very briefly from last time. If you remember, when we finished off last time, we were talking about the example of a multithreaded Web server.

This example, which I'm going to use throughout the lecture today, consists of a Web server with three main modules or components: a networking module; a Web server module, which is in charge of generating, for example, HTML pages; and a disk module, which is in charge of reading data off a disk. So this thing is going to be communicating with the disk, which I've drawn as a cylinder here.

What happens is that client requests come in to this Web server. They come in to the network module. The network module forwards those requests on to the Web server. The Web server is in charge of generating, say, the HTML page that corresponds to the request, and in order to do that, it may need to read some data off of the disk. So it forwards the request on to the disk module, which goes and actually gets the page from the disk, and at some point later the disk returns the page to the Web server. The Web server returns the page to the network module, and then the network module sends the answer back over the network to the user.

So this is a very simple example of a Web server. It should be familiar to you, since you have just spent a while studying the Flash Web server; you can see that this is a simplified description of what a Web server does.

Now, if you think about how you actually go about designing a Web server like this, of course it's not the case that there is only one request moving between these modules at any one point in time. In fact, there may be multiple client requests that come in to the network module.
And the network module may want to have multiple outstanding pages that it's asking the Web server to generate. And the Web server itself might be requesting multiple items from the disk. In turn, that means that at any point in time there could be multiple results streaming back in from the disk, going into the Web server, which is chewing on those results and producing pages for the network module. And so it's possible for queues to build up between these modules, both on the send side and the receive side. I'm going to draw a queue as a box with vertical arrows through it. So there is some buffering happening between the incoming requests and the outgoing requests on these modules.

This buffering is a good thing, and we're going to talk more about it throughout the lecture today, because it allows us to decouple the operations of these different modules. So, for example, the disk module can be reading a page from disk while the HTML server is simultaneously generating an HTML page that it wants to return to the client.

But in this architecture, you can see that the Web server can only produce a result when the disk pages that it needs are actually available. So the Web server is dependent on some result from the disk module being available.

So let's look at just this part of the Web server. I'm going to call these the HTML thread and the disk thread: the two threads on the right side of this diagram that I've drawn here. If you were to look at the code running in these threads, and we saw this last time, it might look something like this. The HTML thread just sits in a loop, continually trying to dequeue information from the queue that is shared between it and the disk thread.
And the disk thread is in a loop where it continually reads blocks off the disk and then enqueues them onto this queue.

So this design at first seems like it might be fine. But if you start thinking about what's really going on here, there could be a problem. Suppose, for example, that the queue is of finite length; it only has a certain number of elements in it. Now, when we keep calling enqueue over and over again, it's possible that if the HTML thread isn't consuming pages off the queue fast enough, the queue could fill up and overflow. So that's a condition we would want to explicitly check for in the code, and we can do that by adding a set of conditions like this.

What you see here is that I have augmented the code with two additional variables, used and free, where used indicates the number of blocks in the queue that are currently in use, and free indicates the number of blocks in the queue that are currently free. So the disk thread only wants to enqueue something onto the queue when there are some free blocks: it has a while loop that just spins forever while there are no free blocks. And similarly, the HTML thread just waits forever while there are no used blocks. Then, when the disk thread enqueues a block onto the queue, it decrements the free count, because it has reduced the number of free slots in the queue, and it increments the used count, because now there is one additional thing available in the queue.

So this is a simple way in which we've now made these threads wait for each other. They are coordinating with each other by use of these two shared variables, used and free. So these two threads share these variables. So that's fine.
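To make this concrete, here is a minimal C sketch of the busy-waiting version just described. The names (page_t, QSIZE, nfree for the lecture's "free") are mine, not from the lecture, and, like the version on the board, it still has problems: the spinning wastes CPU, and the unsynchronized updates to the shared counters are exactly the kind of race we will come back to at the end of the lecture.

```c
#define QSIZE 16

typedef struct { char body[4096]; } page_t;   /* hypothetical page type  */

static page_t buf[QSIZE];
static int first = 0, last = 0;               /* circular-buffer indices */
static volatile int used = 0;                 /* blocks in the queue     */
static volatile int nfree = QSIZE;            /* free slots in the queue */

/* Disk thread: spin until a slot is free, then enqueue a block. */
void enqueue_page(page_t p) {
    while (nfree == 0)
        ;                                     /* busy-wait, burning CPU  */
    buf[last] = p;
    last = (last + 1) % QSIZE;
    nfree--;                                  /* one fewer free slot     */
    used++;                                   /* one more block in use   */
}

/* HTML thread: spin until a block is available, then dequeue it. */
page_t dequeue_page(void) {
    while (used == 0)
        ;                                     /* busy-wait, burning CPU  */
    page_t p = buf[first];
    first = (first + 1) % QSIZE;
    used--;
    nfree++;
    return p;
}
```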
But if you think about this from a scheduling point of view, there is still a little bit of a problem with this approach. In particular, when one of these threads enters one of these while loops, it's just going to sit there checking the condition over and over again. Whenever the thread scheduler schedules that thread, it's going to repeatedly check this condition, and that's not so desirable. Suppose, for example, that the HTML thread enters this loop and starts looping because there's no data available. What we would really like to have happen is for the disk thread to get a chance to run, so that it can produce some data and the HTML thread can then go ahead and operate. But with this while loop there, we can't quite do that; we just waste the CPU during the time we are in the while loop.

So instead, what we are going to do is introduce a set of what we call sequence coordination operators. In order to do this, we're going to add a new kind of data type that we call an eventcount. An eventcount you can just think of as an integer that indicates the number of times that something has occurred; it's just some sort of running counter variable. And we're going to introduce two new routines, called wait and notify.

Wait takes two arguments: one of these eventcount variables, and a value. What wait says is: check the value of this eventcount, and see whether it is less than or equal to value. If the eventcount is less than or equal to value, then the thread waits. And what it means for it to wait is that it tells the thread scheduler that it no longer wants to be scheduled until somebody later calls this notify routine on this same eventcount variable. So wait says: wait if this condition is true. And notify says: wake up everybody who's waiting on this variable.

We can use these routines in the following way in this code, and it's really very straightforward. We simply change our iteration through the while loops into wait statements. The HTML thread waits until the value of used becomes greater than zero, and the disk thread waits until the value of free becomes greater than zero. Then the only other thing we have to add is a call to notify. Notify indicates to any other thread that is waiting on a particular variable that it can run. So the HTML thread will notify free, which tells the disk thread that it can now begin running if it had been waiting on the variable free.
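As a sketch of how this might look in real code: POSIX threads don't provide eventcounts directly, but a mutex plus one condition variable per counter gives the same wait/notify behavior described above. This mapping is mine, not the lecture's; the while loops around pthread_cond_wait play the role of wait(used, 0) and wait(free, 0), and the broadcasts play the role of notify.

```c
#include <pthread.h>

#define QSIZE 16
typedef struct { char body[4096]; } page_t;   /* hypothetical page type  */

static page_t buf[QSIZE];
static int first = 0, last = 0;               /* circular-buffer indices */
static int used = 0, nfree = QSIZE;           /* the two shared counters */

static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t used_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t free_cond = PTHREAD_COND_INITIALIZER;

/* Disk thread: wait(free, 0), enqueue, free--, used++, notify(used). */
void enqueue_page(page_t p) {
    pthread_mutex_lock(&mu);
    while (nfree <= 0)                        /* wait(free, 0): sleep,   */
        pthread_cond_wait(&free_cond, &mu);   /* don't spin              */
    buf[last] = p;
    last = (last + 1) % QSIZE;
    nfree--;
    used++;
    pthread_cond_broadcast(&used_cond);       /* notify(used)            */
    pthread_mutex_unlock(&mu);
}

/* HTML thread: wait(used, 0), dequeue, used--, free++, notify(free). */
page_t dequeue_page(void) {
    pthread_mutex_lock(&mu);
    while (used <= 0)                         /* wait(used, 0)           */
        pthread_cond_wait(&used_cond, &mu);
    page_t p = buf[first];
    first = (first + 1) % QSIZE;
    used--;
    nfree++;
    pthread_cond_broadcast(&free_cond);       /* notify(free)            */
    pthread_mutex_unlock(&mu);
    return p;
}
```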
This emulates the behavior of the while loop that we had before, except that rather than the thread sitting in an infinite while loop, the thread scheduler simply doesn't schedule the HTML thread or the disk thread while it's waiting in one of these wait statements.

OK, so what we're going to talk about for the rest of the lecture today is related to this, and I think you will see why as we get through the talk. The topic for today is performance.

So far in this class we've looked at various ways of structuring complex programs: how to break them up into several modules, the client/server paradigm, how threads work, how a thread scheduler works, all of these big topics about how you design a system. But we haven't said anything about how you take a system design and, in an ordered, systematic way, think about making that system run efficiently. So that's what we're going to try and get at today. We're going to look at a set of techniques that we can use to make a computer system more efficient. There are really three techniques that we're going to look at today.
The first one is a technique called concurrency. Concurrency is really about allowing the system to perform multiple operations simultaneously. So, for example, in our sample Web server, we may be able to read pages from the disk at the same time that, for example, the CPU generates some Web pages to output to the client. That's what concurrency is about.

We are also going to look at a technique called caching, which you should all have seen before. Caching is really just about saving off some previous work, some computation we've already done or a disk page we've already read in, so that we can reuse it again at a later time.

And then finally, we are going to look at something called scheduling. Scheduling is about the fact that when we have multiple requests to process, we might be able to order those requests in a certain way, or group them together in a certain way, so as to make the system more efficient. It's really about choosing the order in which we do things in order to make the system run more efficiently.

Throughout the course of this, I'm going to use the example of the Web server that we've been talking about to motivate each of these performance techniques.

In order to get to the point where we can understand how these performance techniques work, we need to talk a little bit about what we mean by performance. How do we measure the performance of a system, and how do we understand where the bottlenecks in performance might be? So the first thing we need to do is define a set of performance metrics: a set of terms and definitions that we can use to talk about what the performance of the system is. The first metric we might be interested in is the capacity of the system.
Capacity is simply some measure of the amount of resource in a system. That sounds kind of abstract, but what we mean by a resource is some sort of thing that requests compete for: a disk, or a CPU, or a network. So, for example, the capacity of a disk might be its size in gigabytes, and the capacity of a processor might be the number of instructions it can execute per second.

OK, so once we have capacity, we can start talking about how much of the system we are actually using. So we talk about utilization: utilization is simply the percentage of capacity we're using. We might, for example, have used up 80% of the disk blocks on our computer.

Now, there are two metrics that are very commonly used in computer systems to talk about what the performance of a system is. The first metric is latency. Latency is simply the time for a request to complete (I'll write REQ for request). And we can also talk about what at first will seem like the inverse of this, which is throughput. Throughput is simply the number of requests per second that we can process.

When you first see these definitions, it's tempting to think that throughput is simply the inverse of latency. If it takes 10 ms for a request to complete, well, then I must be able to complete 100 requests per second, right? And that's true in the very simple case where I have a single module that can process one request at a time, a single computational resource that can only do one thing at a time. If this thing has some infinite set of inputs queued up, and it takes 10 ms to process each input, we'll see 100 results per second coming out. So if something takes 10 ms to do, you can do 100 of them per second.
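In symbols, for this single serial resource (a relation we're about to break with pipelining):

\[
\text{throughput} \;=\; \frac{1}{\text{latency}} \;=\; \frac{1}{10\ \text{ms}} \;=\; 100\ \text{requests per second}.
\]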
So we could say the throughput of this system is 100 per second, and the latency is 10 ms. What we're going to see throughout this talk is that, in fact, a strict relationship between latency and throughput doesn't hold. You have probably already seen the notion of pipelining in 6.004, and you understand that pipelining is a way in which we can improve the throughput of a system without necessarily changing the latency. We'll talk about that more carefully as this talk goes on.

OK, so given these metrics, now suppose I have some system, and I have some set of goals for that system: I want the system to be able to process a certain number of requests per second, or I want the latency of the system to be under some amount. So you are given this computer system, and you sit down to measure it. What do you expect to find?

In the design of computer systems, it turns out that there are some well-known performance pitfalls, or so-called performance bottlenecks. The goal of doing performance analysis of a system is to look at the system and figure out where the bottlenecks are. Typically, in the design of a big computer system, what we're worried about is which of the individual modules within the system is most responsible for slowing down my computer, and then, once we've identified that module, figuring out how to make that slow module run faster. That's really what finding performance bottlenecks is about.

And there's a classic bottleneck that occurs in computer systems that you all need to know about: the so-called I/O bottleneck. What the I/O bottleneck says is really fairly straightforward. If you think about a computer system, it has a hierarchy of memory devices in it, or storage devices.
These storage devices start with the CPU. The CPU has some set of registers on it, a small number of them, say, for example, 32. And you can access those registers very, very fast, say once per instruction, once per cycle on the computer. So, for example, if your CPU is one gigahertz, you may be able to access one of these registers in 1 nanosecond. Typically, at the top of this pyramid, we have a small amount of storage that is fast.

As we go down this pyramid, adding new layers and looking at the storage hierarchy, we're going to see that things get bigger and slower. Just below the CPU, we may have some processor cache; this might be, for example, 512 kB, and it might take 20 ns to access a single block of this memory. Then we have the RAM, the main memory of the machine, which on a modern machine might be 1 GB, and might take 100 ns to access. Below that, you take a big step up in size and a big step down in performance: you typically have a disk. A disk might be as big as 100 GB, but its performance is very slow; it's a mechanical thing that has to spin, and it only spins so fast. So a typical access time for a block of the disk might be as high as 10 ms, or even higher. And then sometimes people will place the network in this hierarchy at a level below that. If something isn't available on the local disk, for example on our Web server, we might actually have to go out onto the network and fetch it. And if this network is the Internet, the Internet has a huge amount of data; who knows how much it is, certainly on the order of terabytes. And it could take a long time to get a page off the Internet: it might take 100 ms to reach some remote site.
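To summarize the numbers just given, the hierarchy looks roughly like this (sizes and times are the rough figures from the board):

    Level       Typical size        Access time
    registers   ~32 words           ~1 ns
    cache       512 kB              ~20 ns
    RAM         1 GB                ~100 ns
    disk        100 GB              ~10 ms
    network     terabytes or more   ~100 ms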
All right, so the point about this I/O bottleneck is that the disparity in performance between these different levels of the hierarchy is going to be a very common source of performance problems in our computers. In particular, if you look at the access times, up here it's 1 ns and down here it's 100 ms. That's a factor of ten to the eighth: a 100-million-times difference in performance between the fastest and the slowest thing here. So if the CPU has to wait for something to come over the network, you're waiting a very long time in terms of the time the CPU takes to, say, read a single word of memory.

So when we look at the performance of a computer system, we're going to see that often this I/O bottleneck is the problem with the system. Look, for example, at our Web server with its three stages: one stage goes to disk, one is the HTML stage, which maybe can do its computation entirely in memory, and one is the network stage. We might be talking about 10 ms of latency for the disk stage. We might be talking about just 1 ms for the HTML stage, because all it has to do is some computation in memory. And we might be talking about 100 ms for the network stage, because it has to send some data out to some remote site.

So if, in order to process a single request, you have to go through each of these steps in sequence, then the time to process a single request is going to be the sum of these three things: 111 ms. And if you look at the system and ask, what's the performance bottleneck here? The bottleneck, clearly, is the network stage, because it takes the longest to run. So if we want to answer the question of where we should optimize the system, one place we might think to optimize is within this network stage.
And we'll see later an example of a simple optimization, based on this notion of concurrency, that we can apply to improve the performance of the network stage.

So, as I just said, the notion of concurrency is going to be how we get at eliminating these I/O bottlenecks. The idea is that we want to overlap the use of some other resource during the time that we are waiting for one of these slow I/O devices to complete. And we are going to look at two types of concurrency: concurrency between modules, and concurrency within a module. We may have modules that are composed of multiple threads; for example, our networking module may be composed of multiple threads, each of which can be accessing the network. That's an example of concurrency within a module. And we're going to look at the case of between-module concurrency where, for example, the HTML module can be generating an HTML page while the disk module is reading a request for another client at the same time. The idea behind concurrency is really going to be that, by using it, we can hide the latency of one of these slow I/O stages.

OK, so the first kind of concurrency we're going to talk about is concurrency between modules, and the primary technique we use for this is pipelining. The idea with pipelining is as follows. Suppose we have our Web server again, and this time let's draw it as I drew it at first, with queues between each of the modules. So we have our Web server with its three stages. And suppose we have some infinite queue of requests queued up at the disk thread, and the disk thread is processing them and sending them through. We want to look at how many pages come out the other end per second, and what the latency of each page is.
So suppose these requests are numbered R1 through Rn. What's going to happen is that the first request, R1, starts being processed by the disk thread. Now, in a pipelined system, what we want is to have each one of these modules working on a different request at each point in time. And because the disk is an independent resource from the CPU, which is an independent resource from the network, this is going to be OK: these three modules aren't actually going to contend with each other too much.

So what happens is this guy starts processing R1. Then, after 10 ms, he passes R1 up to the HTML thread and starts working on R2. And 1 ms after that, the HTML thread finishes R1 and sends it on to the network thread. Then, 9 ms after that, R2 comes up to the HTML thread, and the disk thread can start processing R3. OK, so does everybody see where those numbers are coming from?

OK. [LAUGHTER] Good.

So now, if we look at time, starting with this equal to time zero, in terms of the requests that come in and out of this last network thread, we can get a sense of how fast this thing is processing. The first request, R1, enters the network stage after 11 ms: it takes 10 ms to get through the disk thread and 1 ms to get through the HTML thread. The network thread starts processing R1 at this time; I'm going to write plus R1 to suggest that we start processing it here. The next time this module can do anything is 100 ms after it started processing R1. So at time 111 ms, it outputs R1; it's done processing it. And by that time, R2 and R3, some set of requests, have already queued up in this queue waiting for it.
So it can immediately begin processing R2 at that point. Then, clearly, after 211 ms it's going to output R2 and begin processing R3 (so there should be a plus there and a plus there), and similarly, at 311 ms we move on to the next one.

So if you look now at the system, we've done something pretty interesting. The time for a single request to travel through this whole pipeline is still 111 ms. But if you look at the inter-arrival time between each of these successive outputs, it is only 100 ms. So we are only waiting as long as it takes the network stage to process one request in order to produce successive answers. By pipelining the system in this way, and having the Web server thread and the disk thread do their processing on later requests while the network thread is processing its current request, we can increase the throughput of the system. In this case, we get an output every 100 ms, so the throughput is now one result every 100 ms, or ten results per second. Even though the latency is still 111 ms, the throughput is no longer one over the latency, because we have separated them by pipelining.

OK, so that was good; that was nice; we improved the performance of the system a little bit. But we didn't really improve it very much, right? We increased the throughput of this thing a little bit, but we haven't really addressed what we identified earlier as the bottleneck: the fact that the network stage takes 100 ms to process each request. In general, when we have a pipelined system like this, the throughput of the system is bottlenecked by the slowest stage. Any time you have a pipeline, the throughput of the system is going to be the throughput of the slowest stage. So in this case, the throughput is 10 results per second, and that's the throughput of the whole system.
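Written out, with \(\ell_i\) the per-request latency of stage \(i\) (my notation, not the board's):

\[
\text{latency} \;=\; \sum_i \ell_i \;=\; (10 + 1 + 100)\ \text{ms} \;=\; 111\ \text{ms},
\qquad
\text{throughput} \;=\; \min_i \frac{1}{\ell_i} \;=\; \frac{1}{100\ \text{ms}} \;=\; 10\ \text{requests per second}.
\]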
So if we want to improve the throughput any more than this, what we're going to have to do is somehow improve the performance of this network module. And the way that we're going to do that is also by exploiting concurrency: this is the within-a-module concurrency.

If you think about how a Web server works, or how a network works, typically when we are sending these replies to a client, we are not using up all of the available bandwidth of the network. You may be able to send 100 MB per second out over your local network, or, if you're connected to a machine here, you may be able to send 10 MB a second across the country to some other university. The issue is that it takes a relatively long time for a request to propagate, especially when it is propagating out over the Internet; the latency can be quite high. But you may not be using all the bandwidth when you are, say, sending an HTML page. In particular, it is the case that multiple applications, multiple threads, can be simultaneously sending data out over the network. If that doesn't make sense to you right now, we're going to spend the whole of the next four lectures talking about network performance, and it should make sense then. So just take my word for it that one of the properties of the network is that its latency may be relatively high, but in this case we are not actually going to be using all the bandwidth that's available to us.

So that suggests that there is an idle resource: we have some network bandwidth that we could be using that we are not using. We'd like to take advantage of that in the design of our system. And we can do this in a relatively simple way, which is simply to say: within our networking module, rather than having only one thread sending out replies at a time, let's have multiple threads.
Say we have 10 threads: thread one, thread two, up to thread ten. We're going to allow them all to be using the network at once. And they are all going to be talking to the same queue, which is connected to the same HTML module, which is connected to the same disk module, with a queue between those as well.

OK, so now let's think about the performance of this, and see what happens when we start running requests through this pipeline, and how frequently we get requests coming out of the other end. We draw our timeline again. R1 comes in to the disk thread; after 10 ms it moves to the HTML thread, and after 11 ms it arrives at the network module, where one thread starts processing request one. Now consider the second request, R2: when R1 gets sent on to the next thread, R2 still has 9 ms of processing left at the disk thread, and then it spends 1 ms in the HTML thread. So 10 ms after R1 arrived at the network module, R2 arrives there. On our timeline, at 11 ms we have R1, and 10 ms later we have R2.

OK, so now you can see that suddenly this system has multiple requests processing at the same time. 10 ms after that, R3 starts being processed, and so on. So after some passage of time, R10 goes in, at 101 ms. We get R10, and now all ten network threads are busy; we've pushed all of these through. Now suppose we start processing R11. R11 flows through the pipeline, and at time 111 it is ready to be processed. But notice that at time 111, we are finished processing R1. So at this time, we can add R11 to the network module, and we can output R1.
OK, so now, every 10 ms after this, another result arrives, and we can output the next one. And this is just going to continue. So you see what we've managed to do: after this startup time of 111 ms, the system produces a result every 10 ms. So we are going to get 100 per second; that is the throughput of this system now.

OK, so that was kind of neat. How did we do that? What have we done here? Well, effectively what we've done is to make it so that this network module can process 10 times as many requests as it could before; this module itself now has 10 times the throughput it had before. And we said before that the throughput of the system is the throughput of the slowest stage. So what we've managed to do is increase the throughput of the slowest stage, and now the system is running 10 times as fast. Notice that the disk thread takes 10 ms per request, and the ten network threads together also finish one request every 10 ms, so the throughput of each of those stages is 100 per second. We now have two stages that have been equalized in their throughput, and if we wanted to further increase the performance of the system, we would have to increase the performance of both of these stages, not just one of them.

OK, so that was a nice result, right? We've shown that we can use this notion of concurrency to increase the performance of a system. But we've introduced a little bit of a problem. In particular, the problem we've introduced is as follows. Remember, we said we had this set of threads, one through ten, that are all sharing this queue data structure that is connected up to our HTML thread. The problem is that what we've done is to introduce what's called a race condition on this queue.
And I'll show you what I mean by that.

If we look at our code snippet up here, for example for what's happening in our HTML thread, we see that what it does is call dequeue. The problem we can have is that multiple of these modules may be executing simultaneously, and they may both call dequeue at the same time. Depending on how dequeue is implemented, we can get some weird results. So let me give you a very simple possible implementation of dequeue.

Given this queue here, let's say there are two variables that keep track of its current state. There is a variable called first, which points to the first used element in the queue, and a variable called last, which points to the last used element. So the elements that are in use in the queue at any one time are between first and last. What happens is that when we dequeue, we move first over by one, freeing up that cell. And when we enqueue, we move last down by one. When last reaches the end, we wrap it around. This is a fairly standard implementation of a queue; it's called a circular buffer. And if first is equal to last, then we know that the queue is full; that's the condition we can check. We're not going to go into too many details about how this thing is actually implemented, but let's look at a very simple example of how dequeue might work.

Remember, we have these two shared variables, first and last, that are shared between all the threads accessing this queue. What dequeue might do is read a page from this queue, the next HTML page to output, into a local variable called page.
755 00:37:53,730 --> 00:37:56,550 Let's call this queue buf, B-U-F, 756 00:37:56,550 --> 00:37:59,920 and we'll use array notation for accessing it. 757 00:37:59,920 --> 00:38:04,680 So it's going to read buf sub first, OK, 758 00:38:04,680 --> 00:38:06,990 and then it's going to increment first. 759 00:38:09,730 --> 00:38:14,650 First gets first plus one, and then it's going to return page. 760 00:38:17,790 --> 00:38:21,400 OK, that seems like a straightforward implementation 761 00:38:21,400 --> 00:38:23,010 of dequeue. 762 00:38:23,010 --> 00:38:24,890 And so we have one thread that's doing this. 763 00:38:24,890 --> 00:38:26,870 Now, suppose we have another thread that's 764 00:38:26,870 --> 00:38:30,520 doing exactly the same thing at the same time. 765 00:38:30,520 --> 00:38:33,520 So it runs exactly the same code. 766 00:38:33,520 --> 00:38:35,830 And remember that these two threads are sharing 767 00:38:35,830 --> 00:38:38,125 the variables buf and first. 768 00:38:46,650 --> 00:38:49,300 OK, so if you think about these two 769 00:38:49,300 --> 00:38:52,040 threads running at the same time, 770 00:38:52,040 --> 00:38:57,040 there is an interesting problem that can arise. 771 00:38:57,040 --> 00:38:59,814 So one thing that might happen when we are running these two 772 00:38:59,814 --> 00:39:02,230 threads at the same time is that the thread scheduler might 773 00:39:02,230 --> 00:39:04,110 first start running thread one. 774 00:39:04,110 --> 00:39:06,446 And it might run the first instruction of thread one. 775 00:39:06,446 --> 00:39:08,320 And then it might run the second instruction. 776 00:39:08,320 --> 00:39:10,580 And then it might run the return. 777 00:39:10,580 --> 00:39:12,450 And then it might come over here, 778 00:39:12,450 --> 00:39:14,840 and it might start running T2. 779 00:39:14,840 --> 00:39:20,720 So, it might then stop running T1 and start running T2, 780 00:39:20,720 --> 00:39:22,850 and execute its three instructions. 781 00:39:22,850 --> 00:39:27,247 So if the thread scheduler does this, there's nothing wrong. 782 00:39:27,247 --> 00:39:28,330 It's not a problem, right? 783 00:39:28,330 --> 00:39:31,680 Each of these threads 784 00:39:31,680 --> 00:39:34,050 read its value from the queue and incremented first. 785 00:39:34,050 --> 00:39:35,882 T1 read one thing from the queue, 786 00:39:35,882 --> 00:39:37,840 and then T2 read the next thing from the queue. 787 00:39:37,840 --> 00:39:40,770 So clearly some of the time this is going to work fine. 788 00:39:40,770 --> 00:39:43,820 So let's make a list of possible outcomes. 789 00:39:43,820 --> 00:39:45,360 Sometimes we'll be OK. 790 00:39:45,360 --> 00:39:48,150 The first possible outcome was OK. 791 00:39:48,150 --> 00:39:51,590 But let's look at a different situation. 792 00:39:51,590 --> 00:39:57,530 Suppose what happens is that the first thing the thread 793 00:39:57,530 --> 00:39:59,240 scheduler does is schedule T1. 794 00:39:59,240 --> 00:40:02,010 And T1 executes this first instruction, 795 00:40:02,010 --> 00:40:04,060 and then just after that the thread scheduler 796 00:40:04,060 --> 00:40:08,540 decides to pre-empt T1, and allow T2 to start running. 797 00:40:08,540 --> 00:40:11,880 So, in particular, it allows T2 to execute 798 00:40:11,880 --> 00:40:14,530 its dequeue code to the end, 799 00:40:14,530 --> 00:40:17,690 and then it comes over here and it runs T1.
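For reference, here is a minimal C sketch of the unsynchronized dequeue just described, with a comment marking the preemption point discussed next. The names buf and first come from the lecture; page_t, QUEUE_SIZE, and the explicit wraparound are illustrative assumptions, and the empty-queue check is omitted, so this is a sketch rather than a complete implementation.

    #define QUEUE_SIZE 100                  /* capacity; an illustrative assumption */

    typedef char *page_t;                   /* an HTML page, for illustration */

    page_t buf[QUEUE_SIZE];                 /* the shared circular buffer */
    int first;                              /* index of the first used element */
    int last;                               /* index of the last used element */

    /* Unsynchronized dequeue: in our scenario, T1 and T2 both run this. */
    page_t dequeue(void) {
        page_t page = buf[first];           /* T1 runs this line and is then
                                               preempted; T2 runs all of
                                               dequeue() and reads the SAME
                                               element, since first has not
                                               yet been incremented */
        first = (first + 1) % QUEUE_SIZE;   /* wrap around at the end */
        return page;                        /* the same page is returned twice,
                                               and the element after it is
                                               skipped */
    }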
800 00:40:17,690 --> 00:40:21,150 OK, so what's the problem now? 801 00:40:29,310 --> 00:40:29,810 Yeah? 802 00:40:33,550 --> 00:40:38,680 Right, OK, so they've both read in the same page variable. 803 00:40:38,680 --> 00:40:42,700 So now both of these threads have dequeued the same page. 804 00:40:42,700 --> 00:40:48,110 So the value of first, when T1 read it, was pointing here. 805 00:40:48,110 --> 00:40:49,330 And then we switched. 806 00:40:49,330 --> 00:40:51,119 And it was still pointing here, right? 807 00:40:51,119 --> 00:40:53,410 And so both of these threads have read the same page. 808 00:40:53,410 --> 00:40:56,879 And now they are both at some point going to increment first. 809 00:40:56,879 --> 00:40:58,420 So you're going to increment it once. 810 00:40:58,420 --> 00:40:59,510 Then you're going to increment it again. 811 00:40:59,510 --> 00:41:02,419 So this second element here in the queue has been skipped. 812 00:41:02,419 --> 00:41:03,460 OK, so this is a problem. 813 00:41:03,460 --> 00:41:04,668 We don't want this to happen. 814 00:41:04,668 --> 00:41:08,340 Because the system is not outputting all the pages 815 00:41:08,340 --> 00:41:10,010 that it was supposed to output. 816 00:41:10,010 --> 00:41:12,280 So what can we do to fix this? 817 00:41:20,956 --> 00:41:23,330 So the way that we fix this is by introducing something 818 00:41:23,330 --> 00:41:24,545 we call isolation primitives. 819 00:41:32,050 --> 00:41:33,590 And the basic idea is that we want 820 00:41:33,590 --> 00:41:36,820 to introduce an operation that will make it 821 00:41:36,820 --> 00:41:40,850 so that any time the page variable gets 822 00:41:40,850 --> 00:41:46,870 read out of the queue, we also at the same time 823 00:41:46,870 --> 00:41:50,450 increment first, without any other thread's accesses 824 00:41:50,450 --> 00:41:52,700 to this queue being interleaved 825 00:41:52,700 --> 00:41:55,400 with our dequeues from the queue. 826 00:41:55,400 --> 00:41:58,080 So in technical terms, what we say is 827 00:41:58,080 --> 00:42:02,570 we want these two things, the reading of page 828 00:42:02,570 --> 00:42:06,160 and the incrementing of first, to be so-called atomic. 829 00:42:06,160 --> 00:42:08,940 OK, and the way that we're going to make these things atomic 830 00:42:08,940 --> 00:42:14,520 is by isolating these two threads 831 00:42:14,520 --> 00:42:16,640 from each other 832 00:42:16,640 --> 00:42:18,931 when they are executing the enqueue and dequeue operations. 833 00:42:18,931 --> 00:42:21,180 So, these two terms we're going to come back to 834 00:42:21,180 --> 00:42:25,342 towards the end of the class. 835 00:42:25,342 --> 00:42:26,800 But all you need to understand here 836 00:42:26,800 --> 00:42:28,482 is that there is this race condition, 837 00:42:28,482 --> 00:42:29,940 and we want some way to prevent it. 838 00:42:29,940 --> 00:42:31,564 And the way that we're going to prevent 839 00:42:31,564 --> 00:42:35,790 it is by using these isolation routines, 840 00:42:35,790 --> 00:42:37,320 which in this case 841 00:42:37,320 --> 00:42:40,472 are going to be 842 00:42:40,472 --> 00:42:41,680 called locks. 843 00:42:41,680 --> 00:42:46,020 So the idea is that a lock is simply a variable, which 844 00:42:46,020 --> 00:42:48,100 can be in one of two states. 845 00:42:48,100 --> 00:42:52,200 It can either be set or unset.
846 00:42:52,200 --> 00:42:55,990 And we have two operations that we can apply on a lock. 847 00:42:55,990 --> 00:42:58,830 We can acquire it, and we can release it. 848 00:43:02,030 --> 00:43:06,220 OK, and acquire and release have the following behavior. 849 00:43:06,220 --> 00:43:10,610 What acquire says is check the state of the lock, 850 00:43:10,610 --> 00:43:16,500 and if the lock is unset, then change the state to set. 851 00:43:16,500 --> 00:43:19,480 But if the lock is set, then wait 852 00:43:19,480 --> 00:43:23,990 until the lock becomes unset, and then set it. 853 00:43:23,990 --> 00:43:26,030 What release says is simply: change 854 00:43:26,030 --> 00:43:29,910 the state of the lock 855 00:43:29,910 --> 00:43:31,420 from set to unset. 856 00:43:31,420 --> 00:43:35,560 So let's see how we can use these two routines in our code. 857 00:43:35,560 --> 00:43:38,740 So let's go back to our example of enqueue and dequeue. 858 00:43:38,740 --> 00:43:40,110 Let's introduce a lock variable. 859 00:43:40,110 --> 00:43:42,580 We'll call it TL for thread lock. 860 00:43:42,580 --> 00:43:47,130 And what we're going to do, around these two 861 00:43:47,130 --> 00:43:53,660 operations that access the queue, 862 00:43:53,660 --> 00:43:56,227 that is, reading the page and modifying first, 863 00:43:56,227 --> 00:43:58,185 is simply put in an acquire and a release. 864 00:44:21,620 --> 00:44:24,270 OK so we have ACQ on this thread lock, 865 00:44:24,270 --> 00:44:27,220 and we have release on this thread lock. 866 00:44:27,220 --> 00:44:30,770 OK, so this seems fine. 867 00:44:30,770 --> 00:44:32,020 It looks like we've solved the problem. 868 00:44:32,020 --> 00:44:36,300 But it's positing the existence of this acquire 869 00:44:36,300 --> 00:44:38,809 procedure that just does the right thing. 870 00:44:38,809 --> 00:44:40,350 If you think about this for a minute, 871 00:44:40,350 --> 00:44:42,620 it seems like we can have the same race condition 872 00:44:42,620 --> 00:44:45,332 problem inside the acquire 873 00:44:45,332 --> 00:44:46,040 function as well. 874 00:44:46,040 --> 00:44:48,470 What if two threads both try to acquire the lock 875 00:44:48,470 --> 00:44:49,500 at the same time? 876 00:44:49,500 --> 00:44:51,315 How are we going to avoid this problem? 877 00:44:51,315 --> 00:44:52,690 And there are a couple of ways that 878 00:44:52,690 --> 00:44:54,770 are well understood for avoiding 879 00:44:54,770 --> 00:44:57,010 this problem in practice, and they're 880 00:44:57,010 --> 00:44:58,230 talked about in the book. 881 00:44:58,230 --> 00:45:00,242 I'm just going to introduce the simplest of them 882 00:45:00,242 --> 00:45:02,700 now, which is that we're going to add a special instruction 883 00:45:02,700 --> 00:45:06,910 to the microprocessor that allows us to implement 884 00:45:06,910 --> 00:45:07,880 acquire efficiently. 885 00:45:07,880 --> 00:45:10,710 It turns out that most modern microprocessors 886 00:45:10,710 --> 00:45:12,336 have an equivalent instruction. 887 00:45:12,336 --> 00:45:13,960 So we're going to call this instruction 888 00:45:13,960 --> 00:45:18,690 RSL, for read-and-set-lock. 889 00:45:18,690 --> 00:45:24,415 OK, so the idea with RSL is as follows. 890 00:45:27,700 --> 00:45:32,520 Basically, the implementation 891 00:45:32,520 --> 00:45:39,165 of acquire is going to be like this.
892 00:45:39,165 --> 00:45:40,790 Remember what we want: 893 00:45:40,790 --> 00:45:42,750 if we don't have the lock, 894 00:45:42,750 --> 00:45:43,890 we want to loop 895 00:45:43,890 --> 00:45:45,480 until we 896 00:45:45,480 --> 00:45:46,997 get it. 897 00:45:46,997 --> 00:45:49,080 So the implementation of acquire may look as follows. 898 00:45:49,080 --> 00:45:50,770 We'll have a local variable called held, 899 00:45:50,770 --> 00:45:54,180 initially set to false, and in a while loop, 900 00:45:54,180 --> 00:45:57,710 while we don't hold the lock, 901 00:45:57,710 --> 00:45:59,560 we're going to use this RSL instruction. 902 00:46:07,190 --> 00:46:10,170 So, what this says is held equals RSL of TL, OK? 903 00:46:10,170 --> 00:46:12,270 So, what the RSL instruction does 904 00:46:12,270 --> 00:46:16,560 is it looks at the state of the lock, and if the lock is unset, 905 00:46:16,560 --> 00:46:18,350 then it sets it 906 00:46:18,350 --> 00:46:22,650 and it returns true. 907 00:46:22,650 --> 00:46:25,520 And if the lock is already set, then it returns false. 908 00:46:25,520 --> 00:46:28,690 So it has the property that it can both read 909 00:46:28,690 --> 00:46:31,940 and set the lock within a single instruction, right? 910 00:46:31,940 --> 00:46:36,980 And we're going to use this read-and-set-lock primitive 911 00:46:36,980 --> 00:46:39,200 as a basic building block to build up 912 00:46:39,200 --> 00:46:42,900 this more complicated acquire function, which 913 00:46:42,900 --> 00:46:45,470 we can then use to build up these locks. 914 00:46:45,470 --> 00:46:51,490 OK, so anytime you're designing a multithreaded system 915 00:46:51,490 --> 00:46:53,990 in this way, or a system with lots of concurrency, 916 00:46:53,990 --> 00:46:55,810 you should be worrying about whether you 917 00:46:55,810 --> 00:46:57,440 have race conditions. 918 00:46:57,440 --> 00:46:59,120 And if you have race conditions, you 919 00:46:59,120 --> 00:47:02,060 need to think about how to use locks in order to prevent 920 00:47:02,060 --> 00:47:03,750 those race conditions. 921 00:47:03,750 --> 00:47:09,350 Alright, so there are a couple of other topics related 922 00:47:09,350 --> 00:47:11,435 to performance that appear in the text. 923 00:47:14,630 --> 00:47:16,340 And one of those topics is caching. 924 00:47:16,340 --> 00:47:19,810 And I just want to spend one very brief minute on caching. 925 00:47:19,810 --> 00:47:22,770 So you guys have already seen caching, presumably, 926 00:47:22,770 --> 00:47:26,700 in the context of 6.004 with processor caches. 927 00:47:26,700 --> 00:47:29,130 So you 928 00:47:29,130 --> 00:47:30,900 might want to sit down and think through, 929 00:47:30,900 --> 00:47:33,180 as an example, how you would use a cache 930 00:47:33,180 --> 00:47:36,770 to improve the performance of our Web server. 931 00:47:36,770 --> 00:47:38,950 So one thing that you might do in order 932 00:47:38,950 --> 00:47:44,960 to improve the performance of the Web server 933 00:47:44,960 --> 00:47:47,520 is to put a cache in the disk thread 934 00:47:47,520 --> 00:47:53,230 that you use instead of going to disk, in order to reduce 935 00:47:53,230 --> 00:47:55,550 the latency of a disk access.
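Returning to locks for a moment: here is a minimal C sketch pulling together acquire, release, and the now-safe dequeue, building on the earlier dequeue sketch. It assumes an rsl() primitive with exactly the semantics just described (atomically read the lock, set it, and report whether it was previously unset); on a real processor this corresponds to a test-and-set-style instruction, and a real release would also need the machine's memory-ordering guarantees, details the lecture leaves to the book.

    typedef volatile int lock_t;    /* 0 = unset, 1 = set */

    /* Assumed primitive: atomically read and set the lock in one
       instruction. Returns true if the lock was unset (we got it),
       false if it was already set. */
    extern int rsl(lock_t *l);

    void acquire(lock_t *l) {
        int held = 0;               /* we don't hold the lock yet */
        while (!held)
            held = rsl(l);          /* spin until rsl returns true */
    }

    void release(lock_t *l) {
        *l = 0;                     /* change state from set to unset */
    }

    lock_t TL;                      /* the lecture's "thread lock" */

    /* dequeue with the race eliminated: the read of buf[first] and
       the increment of first now happen atomically with respect to
       any other thread that follows the same locking discipline. */
    page_t dequeue(void) {
        acquire(&TL);
        page_t page = buf[first];
        first = (first + 1) % QUEUE_SIZE;
        release(&TL);
        return page;
    }

Note that the scheduler can still preempt a thread anywhere inside dequeue; the lock just guarantees that any other thread's dequeue will spin in acquire until release runs, so the read and the increment behave as one atomic step.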
936 00:47:55,550 --> 00:47:57,570 And at the beginning of class next time, 937 00:47:57,570 --> 00:47:59,300 I'll take you through a very simple example 938 00:47:59,300 --> 00:48:01,425 of how we can actually use a cache in the disk thread 939 00:48:01,425 --> 00:48:02,327 to do that. 940 00:48:02,327 --> 00:48:04,910 But you guys should think about this a little bit on your own. 941 00:48:04,910 --> 00:48:07,750 So barring that little digression 942 00:48:07,750 --> 00:48:09,490 that we'll have next time, this takes us 943 00:48:09,490 --> 00:48:12,770 to the end of our discussion of modularity, 944 00:48:12,770 --> 00:48:14,319 abstraction, and performance. 945 00:48:14,319 --> 00:48:15,860 And what we're going to start talking 946 00:48:15,860 --> 00:48:18,470 about next time is networking, and how networks work. 947 00:48:18,470 --> 00:48:19,970 But I want you guys to make sure you 948 00:48:19,970 --> 00:48:22,000 keep in mind all these topics that we've 949 00:48:22,000 --> 00:48:22,930 talked about, because these are going 950 00:48:22,930 --> 00:48:24,320 to be the fundamental tools 951 00:48:24,320 --> 00:48:25,900 that we are going to use throughout the class 952 00:48:25,900 --> 00:48:27,360 in the design of computer systems. 953 00:48:27,360 --> 00:48:28,970 Just because we've finished this module, 954 00:48:28,970 --> 00:48:30,930 that doesn't mean that it's OK to stop 955 00:48:30,930 --> 00:48:31,770 thinking about this stuff. 956 00:48:31,770 --> 00:48:34,019 You need to keep all of this in mind at the same time. 957 00:48:34,019 --> 00:48:36,480 So we'll see you all on Wednesday.