Let's go ahead and get started. OK, so today we have one topic to finish up very briefly from last time. If you remember, when we finished off last time, we were talking about the example of a multithreaded Web server.

This example, which I'm going to use throughout the lecture today, consists of a Web server with three main modules or components: a networking module; a Web server module, which is in charge of generating, for example, HTML pages; and a disk module, which is in charge of reading data off a disk. So this thing is going to be communicating with the disk, which I've drawn as a cylinder here.

What happens is that client requests come in to this Web server. They come in to the network module. The network module forwards those requests on to the Web server. The Web server is in charge of generating, say, the HTML page that corresponds to the request, and in order to do that, it may need to read some data off of the disk. So it forwards the request on to the disk module, which goes and actually gets the page from the disk, and at some point later the disk returns the page to the Web server. The Web server returns the page to the network module, and then the network module sends the answer back over the network to the user.

So this is a very simple example of a Web server. It should be familiar to you, since you have just spent a while studying the Flash Web server; you can see that this is a simplified description of what a Web server does.

Now, if you think about how you actually go about designing a Web server like this, of course it's not the case that there is only one request moving between these modules at any one point in time. In fact, there may be multiple client requests that come in to the network module.
And the network module may want to have multiple outstanding pages that it's asking the Web server to generate. And the Web server itself might be requesting multiple items from the disk. In turn, that means that at any point in time there could be multiple results streaming back in from the disk, going into the Web server, which is chewing on those results and producing pages for the network module. And so it's possible for queues to build up between these modules, both on the send side and the receive side. I'm going to draw a queue as a box with vertical arrows through it. So there is some buffering happening between the incoming requests and the outgoing requests on these modules.

This buffering is a good thing, and we're going to talk more about it throughout the lecture today, because it allows us to decouple the operations of these different modules. So, for example, the disk module can be reading a page from disk while the HTML server is simultaneously generating an HTML page that it wants to return to the client.

But in this architecture, you can see that the Web server can only produce a result when the disk pages that it needs are actually available. So the Web server is dependent on some result from the disk module being available.

So let's look at just this part of the Web server. I'm going to call these the HTML thread and the disk thread: the two threads on the right side of this diagram that I've drawn here. If you were to look at the code running in these threads, and we saw this last time, it might look something like this. The HTML thread just sits in a loop, continually trying to dequeue information from the queue that is shared between it and the disk thread.
And the disk thread is in a loop where it continually reads blocks off the disk and then enqueues them onto this queue.

So this design at first seems like it might be fine. But if you start thinking about what's really going on here, there could be a problem. Suppose, for example, that the queue is of finite length; it only has a certain number of elements in it. Now, when we keep calling enqueue over and over again, it's possible that if the HTML thread isn't consuming pages off the queue fast enough, the queue could fill up and overflow. So that's a condition we would want to explicitly check for in the code, and we can do that by adding a set of conditions like this.

What you see here is that I have augmented the code with two additional variables, used and free, where used indicates the number of blocks in the queue that are currently in use, and free indicates the number of blocks in the queue that are currently free. So the disk thread only wants to enqueue something onto the queue when there are some free blocks: it has a while loop that just spins forever while there are no free blocks. And similarly, the HTML thread just waits forever while there are no used blocks. Then, when the disk thread enqueues a block onto the queue, it decrements the free count, because it has reduced the number of free slots in the queue, and it increments the used count, because now there is one additional thing available in the queue.

So this is a simple way in which we've now made these threads wait for each other. They are coordinating with each other by use of these two shared variables, used and free. So these two threads share these variables. So that's fine.
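To make this concrete, here is a minimal C sketch of the busy-waiting version just described. The names (page_t, QSIZE, nfree for the lecture's "free") are mine, not from the lecture, and, like the version on the board, it still has problems: the spinning wastes CPU, and the unsynchronized updates to the shared counters are exactly the kind of race we will come back to at the end of the lecture.

```c
#define QSIZE 16

typedef struct { char body[4096]; } page_t;   /* hypothetical page type  */

static page_t buf[QSIZE];
static int first = 0, last = 0;               /* circular-buffer indices */
static volatile int used = 0;                 /* blocks in the queue     */
static volatile int nfree = QSIZE;            /* free slots in the queue */

/* Disk thread: spin until a slot is free, then enqueue a block. */
void enqueue_page(page_t p) {
    while (nfree == 0)
        ;                                     /* busy-wait, burning CPU  */
    buf[last] = p;
    last = (last + 1) % QSIZE;
    nfree--;                                  /* one fewer free slot     */
    used++;                                   /* one more block in use   */
}

/* HTML thread: spin until a block is available, then dequeue it. */
page_t dequeue_page(void) {
    while (used == 0)
        ;                                     /* busy-wait, burning CPU  */
    page_t p = buf[first];
    first = (first + 1) % QSIZE;
    used--;
    nfree++;
    return p;
}
```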
But if you think about this from a scheduling point of view, there is still a little bit of a problem with this approach. In particular, when one of these threads enters one of these while loops, it's just going to sit there checking the condition over and over again. Whenever the thread scheduler schedules that thread, it's going to repeatedly check this condition, and that's not so desirable. Suppose, for example, that the HTML thread enters this loop and starts looping because there's no data available. What we would really like to have happen is for the disk thread to get a chance to run, so that it can produce some data and the HTML thread can then go ahead and operate. But with this while loop there, we can't quite do that; we just waste the CPU during the time we are in the while loop.

So instead, what we are going to do is introduce a set of what we call sequence coordination operators. In order to do this, we're going to add a new kind of data type that we call an eventcount. An eventcount you can just think of as an integer that indicates the number of times that something has occurred; it's just some sort of running counter variable. And we're going to introduce two new routines, called wait and notify.

Wait takes two arguments: one of these eventcount variables, and a value. What wait says is: check the value of this eventcount, and see whether it is less than or equal to value. If the eventcount is less than or equal to value, then the thread waits. And what it means for it to wait is that it tells the thread scheduler that it no longer wants to be scheduled until somebody later calls this notify routine on this same eventcount variable. So wait says: wait if this condition is true. And notify says: wake up everybody who's waiting on this variable.

We can use these routines in the following way in this code, and it's really very straightforward. We simply change our iteration through the while loops into wait statements. The HTML thread waits until the value of used becomes greater than zero, and the disk thread waits until the value of free becomes greater than zero. Then the only other thing we have to add is a call to notify. Notify indicates to any other thread that is waiting on a particular variable that it can run. So the HTML thread will notify free, which tells the disk thread that it can now begin running if it had been waiting on the variable free.
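As a sketch of how this might look in real code: POSIX threads don't provide eventcounts directly, but a mutex plus one condition variable per counter gives the same wait/notify behavior described above. This mapping is mine, not the lecture's; the while loops around pthread_cond_wait play the role of wait(used, 0) and wait(free, 0), and the broadcasts play the role of notify.

```c
#include <pthread.h>

#define QSIZE 16
typedef struct { char body[4096]; } page_t;   /* hypothetical page type  */

static page_t buf[QSIZE];
static int first = 0, last = 0;               /* circular-buffer indices */
static int used = 0, nfree = QSIZE;           /* the two shared counters */

static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t used_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t free_cond = PTHREAD_COND_INITIALIZER;

/* Disk thread: wait(free, 0), enqueue, free--, used++, notify(used). */
void enqueue_page(page_t p) {
    pthread_mutex_lock(&mu);
    while (nfree <= 0)                        /* wait(free, 0): sleep,   */
        pthread_cond_wait(&free_cond, &mu);   /* don't spin              */
    buf[last] = p;
    last = (last + 1) % QSIZE;
    nfree--;
    used++;
    pthread_cond_broadcast(&used_cond);       /* notify(used)            */
    pthread_mutex_unlock(&mu);
}

/* HTML thread: wait(used, 0), dequeue, used--, free++, notify(free). */
page_t dequeue_page(void) {
    pthread_mutex_lock(&mu);
    while (used <= 0)                         /* wait(used, 0)           */
        pthread_cond_wait(&used_cond, &mu);
    page_t p = buf[first];
    first = (first + 1) % QSIZE;
    used--;
    nfree++;
    pthread_cond_broadcast(&free_cond);       /* notify(free)            */
    pthread_mutex_unlock(&mu);
    return p;
}
```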
This emulates the behavior of the while loop that we had before, except that rather than the thread sitting in an infinite while loop, the thread scheduler simply doesn't schedule the HTML thread or the disk thread while it's waiting in one of these wait statements.

OK, so what we're going to talk about for the rest of the lecture today is related to this, and I think you will see why as we get through the talk. The topic for today is performance.

So far in this class we've looked at various ways of structuring complex programs: how to break them up into several modules, the client/server paradigm, how threads work, how a thread scheduler works, all of these big topics about how you design a system. But we haven't said anything about how you take a system design and, in an ordered, systematic way, think about making that system run efficiently. So that's what we're going to try and get at today. We're going to look at a set of techniques that we can use to make a computer system more efficient. There are really three techniques that we're going to look at today.
The first one is a technique called concurrency. Concurrency is really about allowing the system to perform multiple operations simultaneously. So, for example, in our sample Web server, we may be able to read pages from the disk at the same time that, for example, the CPU generates some Web pages to output to the client. That's what concurrency is about.

We are also going to look at a technique called caching, which you should all have seen before. Caching is really just about saving off some previous work, some computation we've already done or a disk page we've already read in, so that we can reuse it again at a later time.

And then finally, we are going to look at something called scheduling. Scheduling is about the fact that when we have multiple requests to process, we might be able to order those requests in a certain way, or group them together in a certain way, so as to make the system more efficient. It's really about choosing the order in which we do things in order to make the system run more efficiently.

Throughout the course of this, I'm going to use the example of the Web server that we've been talking about to motivate each of these performance techniques.

In order to get to the point where we can understand how these performance techniques work, we need to talk a little bit about what we mean by performance. How do we measure the performance of a system, and how do we understand where the bottlenecks in performance might be? So the first thing we need to do is define a set of performance metrics: a set of terms and definitions that we can use to talk about what the performance of the system is. The first metric we might be interested in is the capacity of the system.
Capacity is simply some measure of the amount of resource in a system. That sounds kind of abstract, but what we mean by a resource is some sort of thing that requests compete for: a disk, or a CPU, or a network. So, for example, the capacity of a disk might be its size in gigabytes, and the capacity of a processor might be the number of instructions it can execute per second.

OK, so once we have capacity, we can start talking about how much of the system we are actually using. So we talk about utilization: utilization is simply the percentage of capacity we're using. We might, for example, have used up 80% of the disk blocks on our computer.

Now, there are two metrics that are very commonly used in computer systems to talk about what the performance of a system is. The first metric is latency. Latency is simply the time for a request to complete (I'll write REQ for request). And we can also talk about what at first will seem like the inverse of this, which is throughput. Throughput is simply the number of requests per second that we can process.

When you first see these definitions, it's tempting to think that throughput is simply the inverse of latency. If it takes 10 ms for a request to complete, well, then I must be able to complete 100 requests per second, right? And that's true in the very simple case where I have a single module that can process one request at a time, a single computational resource that can only do one thing at a time. If this thing has some infinite set of inputs queued up, and it takes 10 ms to process each input, we'll see 100 results per second coming out. So if something takes 10 ms to do, you can do 100 of them per second.
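In symbols, for this single serial resource (a relation we're about to break with pipelining):

\[
\text{throughput} \;=\; \frac{1}{\text{latency}} \;=\; \frac{1}{10\ \text{ms}} \;=\; 100\ \text{requests per second}.
\]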
So we could say the throughput of this system is 100 per second, and the latency is 10 ms. What we're going to see throughout this talk is that, in fact, a strict relationship between latency and throughput doesn't hold. You have probably already seen the notion of pipelining in 6.004, and you understand that pipelining is a way in which we can improve the throughput of a system without necessarily changing the latency. We'll talk about that more carefully as this talk goes on.

OK, so given these metrics, now suppose I have some system, and I have some set of goals for that system: I want the system to be able to process a certain number of requests per second, or I want the latency of the system to be under some amount. So you are given this computer system, and you sit down to measure it. What do you expect to find?

In the design of computer systems, it turns out that there are some well-known performance pitfalls, or so-called performance bottlenecks. The goal of doing performance analysis of a system is to look at the system and figure out where the bottlenecks are. Typically, in the design of a big computer system, what we're worried about is which of the individual modules within the system is most responsible for slowing down my computer, and then, once we've identified that module, figuring out how to make that slow module run faster. That's really what finding performance bottlenecks is about.

And there's a classic bottleneck that occurs in computer systems that you all need to know about: the so-called I/O bottleneck. What the I/O bottleneck says is really fairly straightforward. If you think about a computer system, it has a hierarchy of memory devices in it, or storage devices.
These storage devices start with the CPU. The CPU has some set of registers on it, a small number of them, say, for example, 32. And you can access those registers very, very fast, say once per instruction, once per cycle on the computer. So, for example, if your CPU is one gigahertz, you may be able to access one of these registers in 1 nanosecond. Typically, at the top of this pyramid, we have a small amount of storage that is fast.

As we go down this pyramid, adding new layers and looking at the storage hierarchy, we're going to see that things get bigger and slower. Just below the CPU, we may have some processor cache; this might be, for example, 512 kB, and it might take 20 ns to access a single block of this memory. Then we have the RAM, the main memory of the machine, which on a modern machine might be 1 GB, and might take 100 ns to access. Below that, you take a big step up in size and a big step down in performance: you typically have a disk. A disk might be as big as 100 GB, but its performance is very slow; it's a mechanical thing that has to spin, and it only spins so fast. So a typical access time for a block of the disk might be as high as 10 ms, or even higher. And then sometimes people will place the network in this hierarchy at a level below that. If something isn't available on the local disk, for example on our Web server, we might actually have to go out onto the network and fetch it. And if this network is the Internet, the Internet has a huge amount of data; who knows how much it is, certainly on the order of terabytes. And it could take a long time to get a page off the Internet: it might take 100 ms to reach some remote site.
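To summarize the numbers just given, the hierarchy looks roughly like this (sizes and times are the rough figures from the board):

    Level       Typical size        Access time
    registers   ~32 words           ~1 ns
    cache       512 kB              ~20 ns
    RAM         1 GB                ~100 ns
    disk        100 GB              ~10 ms
    network     terabytes or more   ~100 ms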
All right, so the point about this I/O bottleneck is that the disparity in performance between these different levels of the hierarchy is going to be a very common source of performance problems in our computers. In particular, if you look at the access times, up here it's 1 ns and down here it's 100 ms. That's a factor of ten to the eighth: a 100-million-times difference in performance between the fastest and the slowest thing here. So if the CPU has to wait for something to come over the network, you're waiting a very long time in terms of the time the CPU takes to, say, read a single word of memory.

So when we look at the performance of a computer system, we're going to see that often this I/O bottleneck is the problem with the system. Look, for example, at our Web server with its three stages: one stage goes to disk, one is the HTML stage, which maybe can do its computation entirely in memory, and one is the network stage. We might be talking about 10 ms of latency for the disk stage. We might be talking about just 1 ms for the HTML stage, because all it has to do is some computation in memory. And we might be talking about 100 ms for the network stage, because it has to send some data out to some remote site.

So if, in order to process a single request, you have to go through each of these steps in sequence, then the time to process a single request is going to be the sum of these three things: 111 ms. And if you look at the system and ask, what's the performance bottleneck here? The bottleneck, clearly, is the network stage, because it takes the longest to run. So if we want to answer the question of where we should optimize the system, one place we might think to optimize is within this network stage.
And we'll see later an example of a simple optimization, based on this notion of concurrency, that we can apply to improve the performance of the network stage.

So, as I just said, the notion of concurrency is going to be how we get at eliminating these I/O bottlenecks. The idea is that we want to overlap the use of some other resource during the time that we are waiting for one of these slow I/O devices to complete. And we are going to look at two types of concurrency: concurrency between modules, and concurrency within a module. We may have modules that are composed of multiple threads; for example, our networking module may be composed of multiple threads, each of which can be accessing the network. That's an example of concurrency within a module. And we're going to look at the case of between-module concurrency where, for example, the HTML module can be generating an HTML page while the disk module is reading a request for another client at the same time. The idea behind concurrency is really going to be that, by using it, we can hide the latency of one of these slow I/O stages.

OK, so the first kind of concurrency we're going to talk about is concurrency between modules, and the primary technique we use for this is pipelining. The idea with pipelining is as follows. Suppose we have our Web server again, and this time let's draw it as I drew it at first, with queues between each of the modules. So we have our Web server with its three stages. And suppose we have some infinite queue of requests queued up at the disk thread, and the disk thread is processing them and sending them through. We want to look at how many pages come out the other end per second, and what the latency of each page is.
So suppose these requests are numbered R1 through Rn. What's going to happen is that the first request, R1, starts being processed by the disk thread. Now, in a pipelined system, what we want is to have each one of these modules working on a different request at each point in time. And because the disk is an independent resource from the CPU, which is an independent resource from the network, this is going to be OK: these three modules aren't actually going to contend with each other too much.

So what happens is this guy starts processing R1. Then, after 10 ms, he passes R1 up to the HTML thread and starts working on R2. And 1 ms after that, the HTML thread finishes R1 and sends it on to the network thread. Then, 9 ms after that, R2 comes up to the HTML thread, and the disk thread can start processing R3. OK, so does everybody see where those numbers are coming from?

OK. [LAUGHTER] Good.

So now, if we look at time, starting with this equal to time zero, in terms of the requests that come in and out of this last network thread, we can get a sense of how fast this thing is processing. The first request, R1, enters the network stage after 11 ms: it takes 10 ms to get through the disk thread and 1 ms to get through the HTML thread. The network thread starts processing R1 at this time; I'm going to write plus R1 to suggest that we start processing it here. The next time this module can do anything is 100 ms after it started processing R1. So at time 111 ms, it outputs R1; it's done processing it. And by that time, R2 and R3, some set of requests, have already queued up in this queue waiting for it.
So it can immediately begin processing R2 at that point. Then, clearly, after 211 ms it's going to output R2 and begin processing R3 (so there should be a plus there and a plus there), and similarly, at 311 ms we move on to the next one.

So if you look now at the system, we've done something pretty interesting. The time for a single request to travel through this whole pipeline is still 111 ms. But if you look at the inter-arrival time between each of these successive outputs, it is only 100 ms. So we are only waiting as long as it takes the network stage to process one request in order to produce successive answers. By pipelining the system in this way, and having the Web server thread and the disk thread do their processing on later requests while the network thread is processing its current request, we can increase the throughput of the system. In this case, we get an output every 100 ms, so the throughput is now one result every 100 ms, or ten results per second. Even though the latency is still 111 ms, the throughput is no longer one over the latency, because we have separated them by pipelining.

OK, so that was good; that was nice; we improved the performance of the system a little bit. But we didn't really improve it very much, right? We increased the throughput of this thing a little bit, but we haven't really addressed what we identified earlier as the bottleneck: the fact that the network stage takes 100 ms to process each request. In general, when we have a pipelined system like this, the throughput of the system is bottlenecked by the slowest stage. Any time you have a pipeline, the throughput of the system is going to be the throughput of the slowest stage. So in this case, the throughput is 10 results per second, and that's the throughput of the whole system.
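Written out, with \(\ell_i\) the per-request latency of stage \(i\) (my notation, not the board's):

\[
\text{latency} \;=\; \sum_i \ell_i \;=\; (10 + 1 + 100)\ \text{ms} \;=\; 111\ \text{ms},
\qquad
\text{throughput} \;=\; \min_i \frac{1}{\ell_i} \;=\; \frac{1}{100\ \text{ms}} \;=\; 10\ \text{requests per second}.
\]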
So if we want to improve the throughput any more than this, what we're going to have to do is somehow improve the performance of this network module. And the way that we're going to do that is also by exploiting concurrency: this is the within-a-module concurrency.

If you think about how a Web server works, or how a network works, typically when we are sending these replies to a client, we are not using up all of the available bandwidth of the network. You may be able to send 100 MB per second out over your local network, or, if you're connected to a machine here, you may be able to send 10 MB a second across the country to some other university. The issue is that it takes a relatively long time for a request to propagate, especially when it is propagating out over the Internet; the latency can be quite high. But you may not be using all the bandwidth when you are, say, sending an HTML page. In particular, it is the case that multiple applications, multiple threads, can be simultaneously sending data out over the network. If that doesn't make sense to you right now, we're going to spend the whole of the next four lectures talking about network performance, and it should make sense then. So just take my word for it that one of the properties of the network is that its latency may be relatively high, but in this case we are not actually going to be using all the bandwidth that's available to us.

So that suggests that there is an idle resource: we have some network bandwidth that we could be using that we are not using. We'd like to take advantage of that in the design of our system. And we can do this in a relatively simple way, which is simply to say: within our networking module, rather than having only one thread sending out replies at a time, let's have multiple threads.
Say we have 10 threads: thread one, thread two, up to thread ten. We're going to allow them all to be using the network at once. And they are all going to be talking to the same queue, which is connected to the same HTML module, which is connected to the same disk module, with a queue between those as well.

OK, so now let's think about the performance of this, and see what happens when we start running requests through this pipeline, and how frequently we get requests coming out of the other end. We draw our timeline again. R1 comes in to the disk thread; after 10 ms it moves to the HTML thread, and after 11 ms it arrives at the network module, where one thread starts processing request one. Now consider the second request, R2: when R1 gets sent on to the next thread, R2 still has 9 ms of processing left at the disk thread, and then it spends 1 ms in the HTML thread. So 10 ms after R1 arrived at the network module, R2 arrives there. On our timeline, at 11 ms we have R1, and 10 ms later we have R2.

OK, so now you can see that suddenly this system has multiple requests processing at the same time. 10 ms after that, R3 starts being processed, and so on. So after some passage of time, R10 goes in, at 101 ms. We get R10, and now all ten network threads are busy; we've pushed all of these through. Now suppose we start processing R11. R11 flows through the pipeline, and at time 111 it is ready to be processed. But notice that at time 111, we are finished processing R1. So at this time, we can add R11 to the network module, and we can output R1.
OK, so now, every 10 ms after this, another result arrives, and we can output the next one. And this is just going to continue. So you see what we've managed to do: after this startup time of 111 ms, the system produces a result every 10 ms. So we are going to get 100 per second; that is the throughput of this system now.

OK, so that was kind of neat. How did we do that? What have we done here? Well, effectively what we've done is to make it so that this network module can process 10 times as many requests as it could before; this module itself now has 10 times the throughput it had before. And we said before that the throughput of the system is the throughput of the slowest stage. So what we've managed to do is increase the throughput of the slowest stage, and now the system is running 10 times as fast. Notice that the disk thread takes 10 ms per request, and the ten network threads together also finish one request every 10 ms, so the throughput of each of those stages is 100 per second. We now have two stages that have been equalized in their throughput, and if we wanted to further increase the performance of the system, we would have to increase the performance of both of these stages, not just one of them.

OK, so that was a nice result, right? We've shown that we can use this notion of concurrency to increase the performance of a system. But we've introduced a little bit of a problem. In particular, the problem we've introduced is as follows. Remember, we said we had this set of threads, one through ten, that are all sharing this queue data structure that is connected up to our HTML thread. The problem is that what we've done is to introduce what's called a race condition on this queue.
And I'll show you what I mean by that.

If we look at our code snippet up here, for example for what's happening in our HTML thread, we see that what it does is call dequeue. The problem we can have is that multiple of these modules may be executing simultaneously, and they may both call dequeue at the same time. Depending on how dequeue is implemented, we can get some weird results. So let me give you a very simple possible implementation of dequeue.

Given this queue here, let's say there are two variables that keep track of its current state. There is a variable called first, which points to the first used element in the queue, and a variable called last, which points to the last used element. So the elements that are in use in the queue at any one time are between first and last. What happens is that when we dequeue, we move first over by one, freeing up that cell. And when we enqueue, we move last down by one. When last reaches the end, we wrap it around. This is a fairly standard implementation of a queue; it's called a circular buffer. And if first is equal to last, then we know that the queue is full; that's the condition we can check. We're not going to go into too many details about how this thing is actually implemented, but let's look at a very simple example of how dequeue might work.

Remember, we have these two shared variables, first and last, that are shared between all the threads accessing this queue. What dequeue might do is read a page from this queue, the next HTML page to output, into a local variable called page.
755 00:37:53,730 --> 00:37:56,550 Let's call this queue buf, B-U-F, 756 00:37:56,550 --> 00:37:59,920 and we'll use array notation for accessing it. 757 00:37:59,920 --> 00:38:04,680 So it's going to read buf sub first, OK, 758 00:38:04,680 --> 00:38:06,990 and then it's going to increment first. 759 00:38:09,730 --> 00:38:14,650 First gets first plus one, and then it's going to return page. 760 00:38:17,790 --> 00:38:21,400 OK, that seems like a straightforward implementation 761 00:38:21,400 --> 00:38:23,010 of dequeue. 762 00:38:23,010 --> 00:38:24,890 And so we have one thread that's doing this. 763 00:38:24,890 --> 00:38:26,870 Now, suppose we have another thread that's 764 00:38:26,870 --> 00:38:30,520 doing exactly the same thing at the same time. 765 00:38:30,520 --> 00:38:33,520 So it runs exactly the same code. 766 00:38:33,520 --> 00:38:35,830 And remember that these two threads are sharing 767 00:38:35,830 --> 00:38:38,125 the variables buf and first. 768 00:38:46,650 --> 00:38:49,300 OK, so if you think about these two 769 00:38:49,300 --> 00:38:52,040 threads running at the same time, 770 00:38:52,040 --> 00:38:57,040 there is an interesting problem that can arise. 771 00:38:57,040 --> 00:38:59,814 So one thing that might happen when we are running these two 772 00:38:59,814 --> 00:39:02,230 threads at the same time is that the thread scheduler might 773 00:39:02,230 --> 00:39:04,110 first start running thread one. 774 00:39:04,110 --> 00:39:06,446 And it might run the first instruction of thread one. 775 00:39:06,446 --> 00:39:08,320 And then it might run the second instruction. 776 00:39:08,320 --> 00:39:10,580 And then it might run the return. 777 00:39:10,580 --> 00:39:12,450 And then it might come over here, 778 00:39:12,450 --> 00:39:14,840 and it might start running T2. 779 00:39:14,840 --> 00:39:20,720 So, it might then stop running T1 and start running T2, 780 00:39:20,720 --> 00:39:22,850 and execute its three instructions. 781 00:39:22,850 --> 00:39:27,247 So if the thread scheduler does this, there's nothing wrong. 782 00:39:27,247 --> 00:39:28,330 It's not a problem, right? 783 00:39:28,330 --> 00:39:31,680 Each of these threads 784 00:39:31,680 --> 00:39:34,050 read its value from the queue and incremented first. 785 00:39:34,050 --> 00:39:35,882 T1 read one thing from the queue, 786 00:39:35,882 --> 00:39:37,840 and then T2 read the next thing from the queue. 787 00:39:37,840 --> 00:39:40,770 So clearly some of the time this is going to work fine. 788 00:39:40,770 --> 00:39:43,820 So let's make a list of possible outcomes. 789 00:39:43,820 --> 00:39:45,360 Sometimes we'll be OK. 790 00:39:45,360 --> 00:39:48,150 The first possible outcome was OK. 791 00:39:48,150 --> 00:39:51,590 But let's look at a different situation. 792 00:39:51,590 --> 00:39:57,530 Suppose what happens is that the first thing the thread 793 00:39:57,530 --> 00:39:59,240 scheduler does is schedule T1. 794 00:39:59,240 --> 00:40:02,010 And T1 executes this first instruction, 795 00:40:02,010 --> 00:40:04,060 and then just after that the thread scheduler 796 00:40:04,060 --> 00:40:08,540 decides to pre-empt T1, and allow T2 to start running. 797 00:40:08,540 --> 00:40:11,880 So, in particular, it allows T2 to execute 798 00:40:11,880 --> 00:40:14,530 its dequeue code to the end, 799 00:40:14,530 --> 00:40:17,690 and then it comes over here and it runs T1.
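For reference, here is a minimal C sketch of the unsynchronized dequeue just described, with a comment marking the preemption point discussed next. The names buf and first come from the lecture; page_t, QUEUE_SIZE, and the explicit wraparound are illustrative assumptions, and the empty-queue check is omitted, so this is a sketch rather than a complete implementation.

    #define QUEUE_SIZE 100                  /* capacity; an illustrative assumption */

    typedef char *page_t;                   /* an HTML page, for illustration */

    page_t buf[QUEUE_SIZE];                 /* the shared circular buffer */
    int first;                              /* index of the first used element */
    int last;                               /* index of the last used element */

    /* Unsynchronized dequeue: in our scenario, T1 and T2 both run this. */
    page_t dequeue(void) {
        page_t page = buf[first];           /* T1 runs this line and is then
                                               preempted; T2 runs all of
                                               dequeue() and reads the SAME
                                               element, since first has not
                                               yet been incremented */
        first = (first + 1) % QUEUE_SIZE;   /* wrap around at the end */
        return page;                        /* the same page is returned twice,
                                               and the element after it is
                                               skipped */
    }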
800 00:40:17,690 --> 00:40:21,150 OK, so what's the problem now? 801 00:40:29,310 --> 00:40:29,810 Yeah? 802 00:40:33,550 --> 00:40:38,680 Right, OK, so they've both read in the same page variable. 803 00:40:38,680 --> 00:40:42,700 So now both of these threads have dequeued the same page. 804 00:40:42,700 --> 00:40:48,110 So the value of first, when T1 read it, was pointing here. 805 00:40:48,110 --> 00:40:49,330 And then we switched. 806 00:40:49,330 --> 00:40:51,119 And it was still pointing here, right? 807 00:40:51,119 --> 00:40:53,410 And so both of these threads have read the same page. 808 00:40:53,410 --> 00:40:56,879 And now they are both at some point going to increment first. 809 00:40:56,879 --> 00:40:58,420 So you're going to increment it once. 810 00:40:58,420 --> 00:40:59,510 Then you're going to increment it again. 811 00:40:59,510 --> 00:41:02,419 So this second element here in the queue has been skipped. 812 00:41:02,419 --> 00:41:03,460 OK, so this is a problem. 813 00:41:03,460 --> 00:41:04,668 We don't want this to happen. 814 00:41:04,668 --> 00:41:08,340 Because the system is not outputting all the pages 815 00:41:08,340 --> 00:41:10,010 that it was supposed to output. 816 00:41:10,010 --> 00:41:12,280 So what can we do to fix this? 817 00:41:20,956 --> 00:41:23,330 So the way that we fix this is by introducing something 818 00:41:23,330 --> 00:41:24,545 we call isolation primitives. 819 00:41:32,050 --> 00:41:33,590 And the basic idea is that we want 820 00:41:33,590 --> 00:41:36,820 to introduce an operation that will make it 821 00:41:36,820 --> 00:41:40,850 so that any time the page variable gets 822 00:41:40,850 --> 00:41:46,870 read out of the queue, we also at the same time 823 00:41:46,870 --> 00:41:50,450 increment first, without any other thread's accesses 824 00:41:50,450 --> 00:41:52,700 to this queue being interleaved 825 00:41:52,700 --> 00:41:55,400 with our dequeues from the queue. 826 00:41:55,400 --> 00:41:58,080 So in technical terms, what we say is 827 00:41:58,080 --> 00:42:02,570 we want these two things, the reading of page 828 00:42:02,570 --> 00:42:06,160 and the incrementing of first, to be so-called atomic. 829 00:42:06,160 --> 00:42:08,940 OK, and the way that we're going to make these things atomic 830 00:42:08,940 --> 00:42:14,520 is by isolating these two threads 831 00:42:14,520 --> 00:42:16,640 from each other 832 00:42:16,640 --> 00:42:18,931 when they are executing the enqueue and dequeue operations. 833 00:42:18,931 --> 00:42:21,180 So, these two terms we're going to come back to 834 00:42:21,180 --> 00:42:25,342 towards the end of the class. 835 00:42:25,342 --> 00:42:26,800 But all you need to understand here 836 00:42:26,800 --> 00:42:28,482 is that there is this race condition, 837 00:42:28,482 --> 00:42:29,940 and we want some way to prevent it. 838 00:42:29,940 --> 00:42:31,564 And the way that we're going to prevent 839 00:42:31,564 --> 00:42:35,790 it is by using these isolation routines, 840 00:42:35,790 --> 00:42:37,320 which in this case 841 00:42:37,320 --> 00:42:40,472 are going to be 842 00:42:40,472 --> 00:42:41,680 called locks. 843 00:42:41,680 --> 00:42:46,020 So the idea is that a lock is simply a variable, which 844 00:42:46,020 --> 00:42:48,100 can be in one of two states. 845 00:42:48,100 --> 00:42:52,200 It can either be set or unset.
846 00:42:52,200 --> 00:42:55,990 And we have two operations that we can apply on a lock. 847 00:42:55,990 --> 00:42:58,830 We can acquire it, and we can release it. 848 00:43:02,030 --> 00:43:06,220 OK, and acquire and release have the following behavior. 849 00:43:06,220 --> 00:43:10,610 What acquire says is check the state of the lock, 850 00:43:10,610 --> 00:43:16,500 and if the lock is unset, then change the state to set. 851 00:43:16,500 --> 00:43:19,480 But if the lock is set, then wait 852 00:43:19,480 --> 00:43:23,990 until the lock becomes unset, and then set it. 853 00:43:23,990 --> 00:43:26,030 What release says is simply: change 854 00:43:26,030 --> 00:43:29,910 the state of the lock 855 00:43:29,910 --> 00:43:31,420 from set to unset. 856 00:43:31,420 --> 00:43:35,560 So let's see how we can use these two routines in our code. 857 00:43:35,560 --> 00:43:38,740 So let's go back to our example of enqueue and dequeue. 858 00:43:38,740 --> 00:43:40,110 Let's introduce a lock variable. 859 00:43:40,110 --> 00:43:42,580 We'll call it TL for thread lock. 860 00:43:42,580 --> 00:43:47,130 And what we're going to do, around these two 861 00:43:47,130 --> 00:43:53,660 operations that access the queue, 862 00:43:53,660 --> 00:43:56,227 that is, reading the page and modifying first, 863 00:43:56,227 --> 00:43:58,185 is simply put in an acquire and a release. 864 00:44:21,620 --> 00:44:24,270 OK so we have ACQ on this thread lock, 865 00:44:24,270 --> 00:44:27,220 and we have release on this thread lock. 866 00:44:27,220 --> 00:44:30,770 OK, so this seems fine. 867 00:44:30,770 --> 00:44:32,020 It looks like we've solved the problem. 868 00:44:32,020 --> 00:44:36,300 But it's positing the existence of this acquire 869 00:44:36,300 --> 00:44:38,809 procedure that just does the right thing. 870 00:44:38,809 --> 00:44:40,350 If you think about this for a minute, 871 00:44:40,350 --> 00:44:42,620 it seems like we can have the same race condition 872 00:44:42,620 --> 00:44:45,332 problem inside the acquire 873 00:44:45,332 --> 00:44:46,040 function as well. 874 00:44:46,040 --> 00:44:48,470 What if two threads both try to acquire the lock 875 00:44:48,470 --> 00:44:49,500 at the same time? 876 00:44:49,500 --> 00:44:51,315 How are we going to avoid this problem? 877 00:44:51,315 --> 00:44:52,690 And there are a couple of ways that 878 00:44:52,690 --> 00:44:54,770 are well understood for avoiding 879 00:44:54,770 --> 00:44:57,010 this problem in practice, and they're 880 00:44:57,010 --> 00:44:58,230 talked about in the book. 881 00:44:58,230 --> 00:45:00,242 I'm just going to introduce the simplest of them 882 00:45:00,242 --> 00:45:02,700 now, which is that we're going to add a special instruction 883 00:45:02,700 --> 00:45:06,910 to the microprocessor that allows us to implement 884 00:45:06,910 --> 00:45:07,880 acquire efficiently. 885 00:45:07,880 --> 00:45:10,710 It turns out that most modern microprocessors 886 00:45:10,710 --> 00:45:12,336 have an equivalent instruction. 887 00:45:12,336 --> 00:45:13,960 So we're going to call this instruction 888 00:45:13,960 --> 00:45:18,690 RSL, for read-and-set-lock. 889 00:45:18,690 --> 00:45:24,415 OK, so the idea with RSL is as follows. 890 00:45:27,700 --> 00:45:32,520 Basically, the implementation 891 00:45:32,520 --> 00:45:39,165 of acquire is going to be like this.
892 00:45:39,165 --> 00:45:40,790 Remember what we want: 893 00:45:40,790 --> 00:45:42,750 if we don't have the lock, 894 00:45:42,750 --> 00:45:43,890 we want to loop 895 00:45:43,890 --> 00:45:45,480 until we 896 00:45:45,480 --> 00:45:46,997 get it. 897 00:45:46,997 --> 00:45:49,080 So the implementation of acquire may look as follows. 898 00:45:49,080 --> 00:45:50,770 We'll have a local variable called held, 899 00:45:50,770 --> 00:45:54,180 initially set to false, and in a while loop, 900 00:45:54,180 --> 00:45:57,710 while we don't hold the lock, 901 00:45:57,710 --> 00:45:59,560 we're going to use this RSL instruction. 902 00:46:07,190 --> 00:46:10,170 So, what this says is held equals RSL of TL, OK? 903 00:46:10,170 --> 00:46:12,270 So, what the RSL instruction does 904 00:46:12,270 --> 00:46:16,560 is it looks at the state of the lock, and if the lock is unset, 905 00:46:16,560 --> 00:46:18,350 then it sets it 906 00:46:18,350 --> 00:46:22,650 and it returns true. 907 00:46:22,650 --> 00:46:25,520 And if the lock is already set, then it returns false. 908 00:46:25,520 --> 00:46:28,690 So it has the property that it can both read 909 00:46:28,690 --> 00:46:31,940 and set the lock within a single instruction, right? 910 00:46:31,940 --> 00:46:36,980 And we're going to use this read-and-set-lock primitive 911 00:46:36,980 --> 00:46:39,200 as a basic building block to build up 912 00:46:39,200 --> 00:46:42,900 this more complicated acquire function, which 913 00:46:42,900 --> 00:46:45,470 we can then use to build up these locks. 914 00:46:45,470 --> 00:46:51,490 OK, so anytime you're designing a multithreaded system 915 00:46:51,490 --> 00:46:53,990 in this way, or a system with lots of concurrency, 916 00:46:53,990 --> 00:46:55,810 you should be worrying about whether you 917 00:46:55,810 --> 00:46:57,440 have race conditions. 918 00:46:57,440 --> 00:46:59,120 And if you have race conditions, you 919 00:46:59,120 --> 00:47:02,060 need to think about how to use locks in order to prevent 920 00:47:02,060 --> 00:47:03,750 those race conditions. 921 00:47:03,750 --> 00:47:09,350 Alright, so there are a couple of other topics related 922 00:47:09,350 --> 00:47:11,435 to performance that appear in the text. 923 00:47:14,630 --> 00:47:16,340 And one of those topics is caching. 924 00:47:16,340 --> 00:47:19,810 And I just want to spend one very brief minute on caching. 925 00:47:19,810 --> 00:47:22,770 So you guys have already seen caching, presumably, 926 00:47:22,770 --> 00:47:26,700 in the context of 6.004 with processor caches. 927 00:47:26,700 --> 00:47:29,130 So you 928 00:47:29,130 --> 00:47:30,900 might want to sit down and think through, 929 00:47:30,900 --> 00:47:33,180 as an example, how you would use a cache 930 00:47:33,180 --> 00:47:36,770 to improve the performance of our Web server. 931 00:47:36,770 --> 00:47:38,950 So one thing that you might do in order 932 00:47:38,950 --> 00:47:44,960 to improve the performance of the Web server 933 00:47:44,960 --> 00:47:47,520 is to put a cache in the disk thread 934 00:47:47,520 --> 00:47:53,230 that you use instead of going to disk, in order to reduce 935 00:47:53,230 --> 00:47:55,550 the latency of a disk access.
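Returning to locks for a moment: here is a minimal C sketch pulling together acquire, release, and the now-safe dequeue, building on the earlier dequeue sketch. It assumes an rsl() primitive with exactly the semantics just described (atomically read the lock, set it, and report whether it was previously unset); on a real processor this corresponds to a test-and-set-style instruction, and a real release would also need the machine's memory-ordering guarantees, details the lecture leaves to the book.

    typedef volatile int lock_t;    /* 0 = unset, 1 = set */

    /* Assumed primitive: atomically read and set the lock in one
       instruction. Returns true if the lock was unset (we got it),
       false if it was already set. */
    extern int rsl(lock_t *l);

    void acquire(lock_t *l) {
        int held = 0;               /* we don't hold the lock yet */
        while (!held)
            held = rsl(l);          /* spin until rsl returns true */
    }

    void release(lock_t *l) {
        *l = 0;                     /* change state from set to unset */
    }

    lock_t TL;                      /* the lecture's "thread lock" */

    /* dequeue with the race eliminated: the read of buf[first] and
       the increment of first now happen atomically with respect to
       any other thread that follows the same locking discipline. */
    page_t dequeue(void) {
        acquire(&TL);
        page_t page = buf[first];
        first = (first + 1) % QUEUE_SIZE;
        release(&TL);
        return page;
    }

Note that the scheduler can still preempt a thread anywhere inside dequeue; the lock just guarantees that any other thread's dequeue will spin in acquire until release runs, so the read and the increment behave as one atomic step.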
936 00:47:55,550 --> 00:47:57,570 And at the beginning of class next time, 937 00:47:57,570 --> 00:47:59,300 I'll take you through a very simple example 938 00:47:59,300 --> 00:48:01,425 of how we can actually use a cache in the disk thread 939 00:48:01,425 --> 00:48:02,327 to do that. 940 00:48:02,327 --> 00:48:04,910 But you guys should think about this a little bit on your own. 941 00:48:04,910 --> 00:48:07,750 So barring that little digression 942 00:48:07,750 --> 00:48:09,490 that we'll have next time, this takes us 943 00:48:09,490 --> 00:48:12,770 to the end of our discussion of modularity, 944 00:48:12,770 --> 00:48:14,319 abstraction, and performance. 945 00:48:14,319 --> 00:48:15,860 And what we're going to start talking 946 00:48:15,860 --> 00:48:18,470 about next time is networking, and how networks work. 947 00:48:18,470 --> 00:48:19,970 But I want you guys to make sure you 948 00:48:19,970 --> 00:48:22,000 keep in mind all these topics that we've 949 00:48:22,000 --> 00:48:22,930 talked about, because these are going 950 00:48:22,930 --> 00:48:24,320 to be the fundamental tools 951 00:48:24,320 --> 00:48:25,900 that we are going to use throughout the class 952 00:48:25,900 --> 00:48:27,360 in the design of computer systems. 953 00:48:27,360 --> 00:48:28,970 Just because we've finished this module, 954 00:48:28,970 --> 00:48:30,930 that doesn't mean that it's OK to stop 955 00:48:30,930 --> 00:48:31,770 thinking about this stuff. 956 00:48:31,770 --> 00:48:34,019 You need to keep all of this in mind at the same time. 957 00:48:34,019 --> 00:48:36,480 So we'll see you all on Wednesday.