The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

Let's get going here. So this is a lecture that's actually appropriate for Halloween, because it's a scary topic: non-deterministic programming.

So we've been looking mostly at deterministic programs. A program is deterministic on a given input if every memory location is updated with the same sequence of values in every execution. If you look at the memory of the machine, you can view that as, essentially, the state of the machine. And if you're always updating every memory location with exactly the same sequence of values, then the program is deterministic. Now, it may be that two memory locations are updated in a different order. You may have one location that's updated first in one execution and another that's updated second, and then in a different execution they may be updated in the opposite order. That's OK, generally.
The issue is whether or not every memory location sees the same order. If, in every execution, each location sees the same sequence of updates, then it's a deterministic program.

So what's the advantage of having a deterministic program? Yeah?

AUDIENCE: It always runs the same way [INAUDIBLE].

PROFESSOR: It always runs the same way. So what? What's that good for?

AUDIENCE: So you can find bugs easier.

PROFESSOR: Yeah, debugging. It's really easy to find bugs if every time you run it, it does the same thing. It's much harder to find bugs if, when you run it, it might do something different. So that leads to our first major rule of thumb about determinism, which is that you should always write deterministic programs. Don't write non-deterministic programs. The only problem is, boy, is that poor quality there. So basically, it says: always write deterministic programs, unless you can't. Sometimes the only way to get performance is to do something non-deterministic.
So this lecture is basically about some of the ways of doing non-deterministic programming. It's appropriate that we say this is not for those who are faint of heart. We are treading into dangerous territory here. The basic rule is, as I say, any time you can, make your program deterministic.

We're going to talk about the number one way that people introduce non-determinism into programs, which is via mutual exclusion and mutexes, which are a type of lock. Then we'll look at some of the anomalies that you get. Besides things just being non-deterministic, you can also sometimes get some very, very weird behavior in the execution.

So we'll start out with mutual exclusion. Let's take a look at an example. Suppose I'm implementing a hash table as a set of bins, and I'm resolving collisions with chaining. Each slot of my hash table has a chain of all the values that hash to that slot. Suppose I have a value x with key 81, and I want to insert x into the table. I first compute a hash of x. Let's say it hashes to this particular list here. Then what I do is I say, OK, let me insert x into the table.
I make the next pointer of x point to whatever is at the head of the list, and then I make the table slot point to x. That effectively inserts x into the hash table. Fairly straightforward piece of code. I would expect that most of you could write that even on an exam and get it right.

But what happens when we say, oh, let's have some concurrency? Let's have the ability to access a hash table in different parallel branches of a parallel program. So here we have a concurrent hash table with two values, and I'm going to have two different threads inserting x and y. One of them is going to do this one, and one of them is going to do this one. Let's just see how this can screw up.

First, we hash x, and it hashes to this particular slot. Then, just as before, we make its next pointer point to the head of the list. Then y gets in the picture, and it decides, oh, I'm going to hash. And oh, it hashes to exactly the same slot. Then y makes its next pointer point to the head of the list.
And then it sets the head of the list to point to y. So now y is in the list. Whoops, now x puts itself in the list, effectively taking y out of the list. So rather than x and y both being in the list, we have a concurrency bug.

This is clearly a race. It's a determinacy race, because we have two parallel instructions accessing the same location, at least one of which (in this case, both of them) performs a store to that location. How things are going to work out depends upon which one of these guys goes first. Notice, as with most race bugs, that if this code all executed before this code, we're OK. Or if this code all executed before this code, we're OK. The bug occurs when they happen to execute at essentially the same time and their instructions interleave.

So this is a race bug. One of the classic ways of fixing this kind of race bug is to insist on some kind of mutual exclusion. A critical section is a piece of code that accesses shared data and that must not be executed by two threads at the same time.
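The serial insert and the bad interleaving walked through above can be sketched in C as follows. This is a minimal illustration, not the lecture's own code. The names `node_t`, `hash_key`, and `NSLOTS` are hypothetical, and the hash is assumed to be `key % NSLOTS`. Rather than spawning real threads, `insert_racy_interleaving` replays the harmful schedule one statement at a time, so the lost update is reproducible.

```c
#include <stddef.h>

#define NSLOTS 8  /* hypothetical table size */

typedef struct node {
    unsigned key;
    struct node *next;
} node_t;

/* Hypothetical hash function: key modulo the table size. */
static size_t hash_key(unsigned key) { return key % NSLOTS; }

/* Serial insert: push x onto the front of its slot's chain. */
static void insert_serial(node_t *table[], node_t *x) {
    size_t h = hash_key(x->key);
    x->next = table[h];  /* x points at the current head of the chain */
    table[h] = x;        /* the slot now points at x */
}

/* Replays the schedule in which two inserters interleave on one slot.
   No threads are created: the statement order below IS the interleaving. */
static void insert_racy_interleaving(node_t *table[], node_t *x, node_t *y) {
    size_t hx = hash_key(x->key);
    x->next = table[hx];   /* x reads the old head */
    size_t hy = hash_key(y->key);
    y->next = table[hy];   /* y reads the same old head (when hx == hy) */
    table[hy] = y;         /* y is now in the list */
    table[hx] = x;         /* x overwrites the head, so y is lost */
}
```

With `NSLOTS` = 8, keys 81 and 9 both land in slot 1. Inserting them serially leaves both nodes in the chain. Replaying the racy schedule leaves only x in the chain.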
So it shouldn't be executed by two threads at the same time. That's mutual exclusion, and that's what a critical section is. There's a mechanism that operating systems typically provide, as do runtime systems, though you can also build your own. It's called "mutexes," or "mutex locks," or sometimes just "locks." A mutex is an object that has lock and unlock member functions. Any attempt by a thread to lock an already locked mutex causes that thread to block. And "block" is, by the way, a hugely overused word in computer science. In this case, by "block," they mean "wait": the thread waits until the mutex is unlocked. So whenever something is locked and somebody else comes and tries to grab the lock, the mutex mechanism only allows one thread to hold it. The other one waits until the lock is freed, and then it can go access the data.

So we can build a concurrent hash table by modifying each slot in the table to have both a mutex, L, and a pointer called "head" to the slot contents. The idea is that we'll hash the value to a slot.
But before we access the elements of the slot, we're going to grab the lock on the slot. So every slot in the table has a lock here. Now, I could have a lock on the whole table. What's the problem with that? Sure.

AUDIENCE: [INAUDIBLE] basically can't do anything. You can't read. You couldn't be reading from the table.

PROFESSOR: Yeah, so if you have a lock on the whole table...

AUDIENCE: You would defeat the purpose [INAUDIBLE]

PROFESSOR: You defeat the purpose of trying to have a concurrent hash table, right? Because only one thread can actually access the hash table at a time. So in this case, we'll lock each slot of the hash table. There are actually mechanisms where you can lock each element of the hash table, or a constant number of elements. The goal is that if you have a big enough table and are running on relatively few processors, the odds that two threads conflict are very low.
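The per-slot locking scheme just described might look like the C sketch below. Everything here is hypothetical and assumed for illustration: the names (`spin_mutex_t`, `slot_t`, `table_t`, `table_insert`, `NSLOTS`) and the choice of mutex. The mutex is a small spin lock built on C11 `atomic_flag`, standing in for the OS or runtime mutex the lecture mentions. A real program would more likely use `pthread_mutex_t` or the runtime's own lock.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NSLOTS 8  /* hypothetical table size */

/* Hypothetical spin mutex built on C11 atomic_flag. */
typedef struct { atomic_flag held; } spin_mutex_t;

static void mutex_lock(spin_mutex_t *m) {
    /* test-and-set returns the previous value: keep trying ("wait")
       while another thread already holds the lock */
    while (atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire)) {
        /* spin */
    }
}

static void mutex_unlock(spin_mutex_t *m) {
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}

typedef struct node { unsigned key; struct node *next; } node_t;

/* Each slot has its own mutex L plus the head of its chain. */
typedef struct { spin_mutex_t L; node_t *head; } slot_t;
typedef struct { slot_t slots[NSLOTS]; } table_t;

static void table_init(table_t *t) {
    for (size_t i = 0; i < NSLOTS; i++) {
        atomic_flag_clear(&t->slots[i].L.held);  /* start unlocked */
        t->slots[i].head = NULL;
    }
}

static size_t hash_key(unsigned key) { return key % NSLOTS; }

/* Only the target slot is locked, so inserts into different slots
   can proceed in parallel. */
static void table_insert(table_t *t, node_t *x) {
    slot_t *s = &t->slots[hash_key(x->key)];
    mutex_lock(&s->L);
    x->next = s->head;   /* critical section: these two statements can */
    s->head = x;         /* no longer interleave with another insert    */
    mutex_unlock(&s->L);
}
```

Locking per slot rather than per table means two inserters contend only when they hash to the same slot. The order of elements within a slot still depends on which thread acquires `L` first.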
So what we do is grab a lock on the slot, play the same game of inserting ourselves at the head, and then unlock the slot. That ensures only one of the two threads in the previous example can execute this code at a time. So it guarantees that the two regions of code will either execute in this order or in this order, and you'll never get the instructions interleaved.

Now, this is introducing non-determinism. Why is this going to be non-deterministic? Yes?

AUDIENCE: [INAUDIBLE] lock first, it'll be [INAUDIBLE].

PROFESSOR: Yeah, depending upon which one gets the lock first, the linked list in there will have the elements in a different order. So a program that depends on the order of that list is going to behave differently from run to run.

Let's recall the definition of a determinacy race. It occurs when two logically parallel instructions access the same memory location, and at least one of the instructions performs a write. So we do have a determinacy race when we introduce locks.
With locks, we essentially have an intentional determinacy race. A program execution with no determinacy races means the program is deterministic on that input. If there are no determinacy races, different locations may be updated in a different relative order in a parallel execution. Even so, every memory location will be updated with exactly the same sequence of values: the order of update operations on any given location will always be the same. That's actually a theorem, which we're not going to prove. But if you think about it, it's fairly straightforward. If you never have two guys in parallel that could possibly affect the same location, then the behavior is always going to be the same. Things are going to get written in the same order. So the program in that case always behaves the same on that given input, no matter how it's scheduled and executed. We'll always have essentially the same behavior, even though it may get scheduled one way or another.
One of the nice things about our race-detection tool, Cilkscreen, concerns ostensibly deterministic programs. These are programs with no mutexes, ones that basically just read and write locations. If such a program has determinacy races on a given input, Cilkscreen is guaranteed to find one. So it's nice that we get a guarantee out of Cilkscreen.

This is all beautiful and elegant; everything works out great if there are no determinacy races. But when we do something like a concurrent hash table, we're intentionally putting in a determinacy race. So that raises a natural question. Why would I want to have a concurrent hash table? Why not make it so that my program is deterministic? Why might a concurrent hash table be an advantageous thing to have in a program that you wanted to go fast? Some ideas? Where might you want to use it? Yeah?

AUDIENCE: Speed?

PROFESSOR: Yeah, speed. But I mean, what's an application? What's a use case, as the entrepreneurs would ask you?
Where is it that you would really want to use a concurrent hash table to give you speed? Yeah?

AUDIENCE: If you started using it [INAUDIBLE] along with your system [INAUDIBLE] values.

PROFESSOR: Yeah, it could be that there's some sort of global table that you want a lot of people to be able to access at one time. If you locked it down and only had one thread accessing it at a time, you'd reduce how much concurrency you could have. That's a good one. Yeah?

AUDIENCE: Perhaps most of the time, people are just reading. So if you had something concurrent, your reading should be fine. So in that case, a lot more reading high performance [INAUDIBLE]

PROFESSOR: Yeah, so in fact, there's a type of lock called a reader-writer lock, which allows either one writer or many concurrent readers to operate. So that's another type of concurrency control. Another common place you'd use a concurrent hash table is when you're memoizing.
Meaning I do a computation, and I want to remember the results so that if I see it again, I can look it up rather than having to compute it again from scratch. So you might keep all those values in a hash table. Now, if I've got a parallel program that wants to do memoizing, I'm going to have concurrent accesses to that hash table. And there are a bunch of other cases.

So we have determinacy races, and we have a great guarantee that if there is a race, we're guaranteed to find it.

Now, there's another type of race, and in fact, if you read the literature, you'll hear more about this type of race than about determinacy races. A data race occurs when two logically parallel instructions that hold no locks in common access the same location, and at least one of the instructions performs a write. For example, suppose one of them holds a lock L and the other holds L prime. If they then access the same location, that's a data race, because they don't hold locks in common.
But if both threads hold the same lock L and they access the same location, that's not a data race. It is a determinacy race, because it's going to matter which order they go in. But it's not a data race, because the lock, in some sense, is protecting access. Cilkscreen, in fact, understands locks and will not report a determinacy race unless it is also a data race.

However, codes that use locks are non-deterministic by intention, so they weaken Cilkscreen's guarantee. Suppose Cilkscreen finds a determinacy race that is not a data race, because the accesses are protected by a common lock. It ignores that race, but then it follows only one of the two paths that might arise from it. In other words, it doesn't follow both paths. If you think about it, you have a race between two critical sections. When one of them wins, you can imagine that's one possible outcome of the computation. When the other wins, it's another path. What Cilkscreen does is pick one path. In fact, it picks the path that would occur in the serial execution.
So there's a whole path there that you're not exploring, and Cilkscreen's guarantee is weaker there. However, if the critical sections commute, then you still get a guarantee out of Cilkscreen. Commuting means they do exactly the same thing no matter what the order. For example, if they're both incrementing a value, the result is the same whichever one goes first.

Cilkscreen can still be very helpful for finding bugs. Typically, when your computation makes things occur in one order, there's some execution or input where you can make them occur in the other order. So you can actually cover more races than you might imagine at first blush. But it is a danger. What we're talking about today is dangerous programming, non-deterministic programming. When you start using mutexes, some of the guarantees get much dicier. Any questions about that?

Now, if you have no data races in your code, that doesn't mean that you have no bugs. For example, here's a way somebody might fix that insertion code.
We hash the key and grab the lock. We set x's next pointer to whatever is at the head of the list, and then we unlock. Then we lock it again, set the head of the list to x, and unlock again. Notice that technically there is no data race here, even with two concurrent threads executing this code, because I'm holding lock L for every access. Nevertheless, I can get the same interleaving of code that causes the bug. So just because you don't have a data race doesn't mean that you don't have a bug in your code. As I say, this is dangerous programming.

However, if you have mutexes and no data races, it usually means that you went through and thought about the code. And if you were thinking about this code, you would say, gee, really I'm trying to make these two instructions be the critical section. Why would I unlock and lock again?
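To see why "no data race" is not the same as "no bug," here is a C sketch of the split-lock insert just described, again with hypothetical names. The slot lock is a C11 `atomic_flag` spin lock, assumed here in place of whatever mutex the slides use. Every access to `head` happens while the lock is held. Even so, replaying the interleaving (both first halves, then both second halves) loses y exactly as in the unlocked version.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct node { unsigned key; struct node *next; } node_t;

typedef struct {
    atomic_flag L;   /* the slot's lock */
    node_t *head;
} slot_t;

static void slot_lock(slot_t *s)   { while (atomic_flag_test_and_set(&s->L)) { /* spin */ } }
static void slot_unlock(slot_t *s) { atomic_flag_clear(&s->L); }

/* The insert split into two separately locked halves: each access is
   protected, but the pair is no longer a single critical section. */
static void insert_first_half(slot_t *s, node_t *x)  { slot_lock(s); x->next = s->head; slot_unlock(s); }
static void insert_second_half(slot_t *s, node_t *x) { slot_lock(s); s->head = x;       slot_unlock(s); }

/* Replays the harmful schedule of two inserters without real threads. */
static void replay_interleaving(slot_t *s, node_t *x, node_t *y) {
    insert_first_half(s, x);    /* x reads the old head             */
    insert_first_half(s, y);    /* y reads the same old head        */
    insert_second_half(s, y);   /* y becomes the head               */
    insert_second_half(s, x);   /* x overwrites the head: y is lost */
}
```

Holding the lock across both statements, as in a single lock/unlock pair, removes this schedule entirely.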
So most of the time, as a practical matter, if you don't have data races, it probably means you did the right thing. You identified the critical sections that needed to be locked, and you didn't unlock things in the middle of them. So no data races usually means it's unlikely you have bugs. But no guarantees. As I say, non-deterministic programming is dangerous programming. Any questions about that? Anybody scared off yet? Yeah?

AUDIENCE: So what you can do is the opposite. You don't have any bugs, but you made the critical section too [INAUDIBLE]

PROFESSOR: Yes, so certainly from a performance point of view, that's one of the problems with locking. We'll talk about this a little bit later. If you have a large section that you decide to lock, other threads can't do work on that section. So they're spinning, wasting cycles. So generally, you want to lock as little as possible. The other problem is that there's overhead associated with these locks.
And overhead associated with the locks is problematic as well, because now you may be slowing down. Suppose this is in an inner loop. Even with just a single lock and unlock, without these two spurious ones here, we may be more than doubling the overhead. In fact, locking instructions tend to be much more expensive than register operations. They usually cost something on the order of going to L2 cache, as a minimum. So it's not even L1 cache; it's like going out to L2 cache.

Now, it turns out the converse doesn't hold either. We said that if there are no data races, you have no guarantee that there are no bugs. And if there are data races, your program still may be correct. Here's an example of code where you might want to allow a benign data race. Here we have, let's say, an array A that has these elements in it, and we want to find the set of digits in the array. These are all going to be values between 0 and 9, and I want to know which of the digits 0 to 9 are present.
428 00:25:06,230 --> 00:25:09,110 Which ones are not present? 429 00:25:09,110 --> 00:25:10,710 So I can write a little code for that. 430 00:25:10,710 --> 00:25:14,430 Let me initialize an array called "digits" to have 431 00:25:14,430 --> 00:25:16,900 all-zero entries. 432 00:25:16,900 --> 00:25:24,880 And now let me go through all the elements of A and set 433 00:25:24,880 --> 00:25:28,790 digits of whatever the digit is to be 1. 434 00:25:28,790 --> 00:25:32,920 So set it to 1 if that digit is present. 435 00:25:32,920 --> 00:25:35,000 And I can do that in parallel, even. 436 00:25:37,880 --> 00:25:42,500 So what can happen here is I can have, if I've done this in 437 00:25:42,500 --> 00:25:48,580 parallel, this particular update of digits of 6 will be 438 00:25:48,580 --> 00:25:52,170 set to 1 when this one is being set to 1. 439 00:25:52,170 --> 00:25:54,060 Is that a problem? 440 00:25:54,060 --> 00:25:54,850 In some sense, no. 441 00:25:54,850 --> 00:25:56,090 They're both being set to 1. 442 00:25:56,090 --> 00:25:58,190 Who cares? 443 00:25:58,190 --> 00:26:01,370 But there is a race there. 444 00:26:01,370 --> 00:26:04,130 There is a race, but it's a benign race. 445 00:26:04,130 --> 00:26:07,370 Well, it may or may not be benign. 446 00:26:07,370 --> 00:26:12,030 So there's a gotcha on this one. 447 00:26:12,030 --> 00:26:15,320 So this code only works correctly if the hardware 448 00:26:15,320 --> 00:26:16,850 writes the array elements atomically. 449 00:26:19,410 --> 00:26:22,610 So, for example, it's not a problem on the x86-64 450 00:26:22,610 --> 00:26:23,740 architecture we're using. 451 00:26:23,740 --> 00:26:29,380 But on some architectures, you cannot write a byte value. 452 00:26:29,380 --> 00:26:32,902 You cannot write a byte value as an atomic operation.
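To show why a byte store that is really a word-sized read-modify-write makes the digits race unsafe, here is a minimal single-threaded C sketch that replays one bad interleaving by hand. It assumes a hypothetical machine with 32-bit words and no byte-store instruction. `set_byte` and `interleaved_byte_writes` are illustrative names, not part of the lecture's code. Two writers each set a different byte of the same word (think of `digits[6]` and `digits[7]` sharing a word), but both load the word before either one stores it back.

```c
#include <stdint.h>

/* Hypothetical emulated byte write: mask out byte idx of word, insert value. */
static uint32_t set_byte(uint32_t word, unsigned idx, uint8_t value) {
    uint32_t shift = 8u * idx;
    word &= ~(UINT32_C(0xFF) << shift);   /* mask out the old byte */
    word |= (uint32_t)value << shift;     /* insert the new byte */
    return word;
}

/* Replays one interleaving of two writers on the same memory word.
   Writer 1 sets byte 0 and writer 2 sets byte 1, but both load the word
   before either stores it. Returns the final contents of the word. */
uint32_t interleaved_byte_writes(uint32_t mem) {
    uint32_t w1 = mem;            /* writer 1 loads the word */
    uint32_t w2 = mem;            /* writer 2 loads the same (old) word */
    w1 = set_byte(w1, 0, 1);      /* writer 1 modifies its private copy */
    w2 = set_byte(w2, 1, 1);      /* writer 2 modifies its private copy */
    mem = w1;                     /* writer 1 stores: byte 0 is now 1 */
    mem = w2;                     /* writer 2 stores its stale copy: byte 0 reverts to 0 */
    return mem;
}
```

Starting from an all-zero word, the function returns 0x100. Writer 2 stores its stale copy of byte 0, so writer 1's update is lost, even though the two writers touched different bytes.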
453 00:26:32,902 --> 00:26:37,040 Such an architecture implements a write to a byte by reading a word, 454 00:26:37,040 --> 00:26:40,840 masking out things, changing the field, masking again, and 455 00:26:40,840 --> 00:26:42,720 then writing it back out. 456 00:26:42,720 --> 00:26:44,650 So you can have a race on a byte value. 457 00:26:44,650 --> 00:26:47,620 In particular, even if I were going to do this with bits, I 458 00:26:47,620 --> 00:26:52,360 could have a race on bits, although C doesn't let me 459 00:26:52,360 --> 00:26:54,870 access bits directly. 460 00:26:54,870 --> 00:26:59,670 The smallest unit I can access is a byte. 461 00:26:59,670 --> 00:27:02,970 So you have to worry about the level of atomicity 462 00:27:02,970 --> 00:27:04,520 provided by your architecture. 463 00:27:04,520 --> 00:27:09,470 On the x86 architecture, the grain size of atomic update is a byte: 464 00:27:09,470 --> 00:27:12,040 you can do a single-byte write, and it will do the 465 00:27:12,040 --> 00:27:14,370 right, proper thing-- 466 00:27:14,370 --> 00:27:17,020 do the right thing on the write. 467 00:27:17,020 --> 00:27:19,780 So we have both things. 468 00:27:19,780 --> 00:27:20,740 No bugs. 469 00:27:20,740 --> 00:27:23,370 No data races doesn't mean no bugs. 470 00:27:23,370 --> 00:27:27,090 Presence of data races doesn't mean you have bugs. 471 00:27:27,090 --> 00:27:31,000 But generally, they're fairly well overlapped. 472 00:27:31,000 --> 00:27:38,740 Now, why would I not want to put in a lock and unlock here 473 00:27:38,740 --> 00:27:41,030 just to get rid of the race? 474 00:27:41,030 --> 00:27:43,230 If I run Cilkscreen on this, it's going to complain. 475 00:27:43,230 --> 00:27:46,820 It's going to say, you've got a race here. 476 00:27:46,820 --> 00:27:49,870 Why would I not want to put a lock on here, for example? 477 00:27:53,758 --> 00:27:55,216 AUDIENCE: Because then we don't 478 00:27:55,216 --> 00:27:57,160 have parallelism anymore?
479 00:27:57,160 --> 00:28:00,870 PROFESSOR: No, well, I'd have parallelism maybe up to 10, 480 00:28:00,870 --> 00:28:03,100 for example, right? 481 00:28:03,100 --> 00:28:04,880 Because I have 10 different things that could be 482 00:28:04,880 --> 00:28:06,510 going on at a time. 483 00:28:06,510 --> 00:28:09,290 But that's one reason. 484 00:28:09,290 --> 00:28:10,890 That is one reason. 485 00:28:10,890 --> 00:28:12,310 What's another reason why I might not want to 486 00:28:12,310 --> 00:28:14,563 put locks in here? 487 00:28:14,563 --> 00:28:15,813 AUDIENCE: [INAUDIBLE] 488 00:28:18,802 --> 00:28:21,400 PROFESSOR: It could be that all the numbers-- 489 00:28:21,400 --> 00:28:24,760 that's a case where it doesn't get me much speedup. 490 00:28:24,760 --> 00:28:26,680 But what's another reason I might not want to lock here? 491 00:28:26,680 --> 00:28:27,930 AUDIENCE: [INAUDIBLE] 492 00:28:31,927 --> 00:28:34,990 PROFESSOR: I think you're on the right track. 493 00:28:34,990 --> 00:28:35,620 Overhead. 494 00:28:35,620 --> 00:28:36,370 Yeah. 495 00:28:36,370 --> 00:28:37,040 Overhead. 496 00:28:37,040 --> 00:28:39,730 This is my inner loop. 497 00:28:39,730 --> 00:28:41,970 So if I'm locking and unlocking, all the code inside the lock is doing 498 00:28:41,970 --> 00:28:44,310 is just a memory [? dereference ?] 499 00:28:44,310 --> 00:28:46,670 and an assignment. 500 00:28:46,670 --> 00:28:49,890 And that may be fairly cheap, whereas if I grab a lock and 501 00:28:49,890 --> 00:28:53,590 then release the lock, those operations may be much, much 502 00:28:53,590 --> 00:28:55,810 more expensive. 503 00:28:55,810 --> 00:29:00,110 So I may be slowing down the execution of this loop by more 504 00:29:00,110 --> 00:29:05,860 than I'm going to gain out of the parallelism of this. 505 00:29:05,860 --> 00:29:09,810 So I may say, I may reason, hey, there is a good reason 506 00:29:09,810 --> 00:29:14,530 to have a data race there.
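Here is a minimal C11 sketch of the digits computation. It is a serial stand-in: the second loop is where the lecture's parallel `cilk_for` would go. One assumption does not come from the lecture. In ISO C11 a plain data race is undefined behavior even when both writers store the same value, so this sketch declares the slots `atomic_uchar` and uses relaxed stores. Relaxed stores make the benign race well-defined, and on x86-64 they still compile to an ordinary byte store with no lock. The function name `find_digits` is hypothetical, and the elements of `A` are assumed to lie in 0..9.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: afterward digits[d] == 1 iff digit d (0..9) occurs in A. */
void find_digits(const int *A, size_t n, atomic_uchar digits[10]) {
    /* Clear every slot before any writer runs. */
    for (int d = 0; d < 10; d++)
        atomic_store_explicit(&digits[d], 0, memory_order_relaxed);

    /* In Cilk this loop would be a cilk_for. Iterations that see the same
       digit race to store 1 into the same slot. The race is benign because
       every racing store writes the same value and each store is atomic. */
    for (size_t i = 0; i < n; i++)
        atomic_store_explicit(&digits[A[i]], 1, memory_order_relaxed);
}
```

The loop body is a single byte store. Wrapping it in a lock/unlock pair would remove the race, but as argued above, that could easily cost more than the store itself.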
507 00:29:19,280 --> 00:29:21,780 So I may want to have a data race, and I may want 508 00:29:21,780 --> 00:29:23,540 to say that's OK. 509 00:29:23,540 --> 00:29:25,690 And if that happens, however, you're now going to get 510 00:29:25,690 --> 00:29:27,060 warnings out of Cilkscreen. 511 00:29:27,060 --> 00:29:30,370 And I generally recommend that you have no warnings on 512 00:29:30,370 --> 00:29:33,610 Cilkscreen when you run your code. 513 00:29:33,610 --> 00:29:38,280 So the Cilk environment provides a mechanism called 514 00:29:38,280 --> 00:29:45,310 "fake locks." So a fake lock allows you to communicate to 515 00:29:45,310 --> 00:29:47,290 Cilkscreen that a race is intentional. 516 00:29:47,290 --> 00:29:51,460 So what you then do is you put a fake lock in around this 517 00:29:51,460 --> 00:29:53,260 access here. 518 00:29:53,260 --> 00:29:56,760 And what happens is when Cilkscreen runs, it says, oh, 519 00:29:56,760 --> 00:30:01,760 you grabbed this lock, so I shouldn't report a race. 520 00:30:01,760 --> 00:30:08,610 But during execution, no lock is actually grabbed, because 521 00:30:08,610 --> 00:30:10,000 it's a fake one. 522 00:30:10,000 --> 00:30:15,470 So it doesn't slow you down at all at runtime, but Cilkscreen 523 00:30:15,470 --> 00:30:18,970 still thinks that a lock is being acquired. 524 00:30:18,970 --> 00:30:21,160 Questions about that? 525 00:30:21,160 --> 00:30:25,050 So if you want to have an intentional race, this is a 526 00:30:25,050 --> 00:30:26,200 way you can quiet Cilkscreen. 527 00:30:26,200 --> 00:30:28,360 Of course, it's dangerous, right? 528 00:30:28,360 --> 00:30:31,610 It's yet another example of what's dangerous here. 529 00:30:31,610 --> 00:30:33,150 Because what happens if you did it wrong? 530 00:30:33,150 --> 00:30:35,350 What happens if there really is a bug there? 531 00:30:35,350 --> 00:30:38,080 You're now telling it to ignore that bug.
532 00:30:38,080 --> 00:30:41,960 So one way that you can make your code-- 533 00:30:41,960 --> 00:30:44,720 if you put in fake locks everywhere, you could make it 534 00:30:44,720 --> 00:30:48,010 so, oh, Cilkscreen runs just great, and have your code full 535 00:30:48,010 --> 00:30:49,260 of race bugs. 536 00:30:51,400 --> 00:30:56,020 So if you use fake locks, you should document very carefully 537 00:30:56,020 --> 00:30:59,340 that you're doing so and why that's going to be a safe 538 00:30:59,340 --> 00:31:02,160 thing to do. 539 00:31:02,160 --> 00:31:03,410 Any questions about that? 540 00:31:07,676 --> 00:31:10,860 By the way, one of the nice things about some of the 541 00:31:10,860 --> 00:31:15,240 concurrency platforms like Cilk is that they provide a 542 00:31:15,240 --> 00:31:17,830 layer of abstraction where generally, you don't have to 543 00:31:17,830 --> 00:31:19,180 do very much locking. 544 00:31:19,180 --> 00:31:22,670 If you program with Pthreads, for example, you're locking 545 00:31:22,670 --> 00:31:24,660 all the time. 546 00:31:24,660 --> 00:31:26,990 So you're writing non-deterministic programs all 547 00:31:26,990 --> 00:31:29,500 the time, and you're debugging non-deterministic 548 00:31:29,500 --> 00:31:31,340 programs all the time. 549 00:31:31,340 --> 00:31:34,230 Whereas Cilk provides a layer of programming where you can 550 00:31:34,230 --> 00:31:38,040 do most of your programming in a deterministic fashion. 551 00:31:38,040 --> 00:31:42,420 And occasionally, you may want to have some non-determinism 552 00:31:42,420 --> 00:31:43,020 here or there. 553 00:31:43,020 --> 00:31:49,580 But hopefully you can manage that if you do it judiciously. 554 00:31:49,580 --> 00:31:53,670 Any questions about mutexes and uses 555 00:31:53,670 --> 00:31:56,140 for them and so forth? 556 00:31:56,140 --> 00:31:57,880 Good. 557 00:31:57,880 --> 00:31:59,920 So let's talk about how they get implemented. 
558 00:31:59,920 --> 00:32:02,900 Because as with all these things, we want to understand 559 00:32:02,900 --> 00:32:05,990 not just what the abstraction is but how it is that you 560 00:32:05,990 --> 00:32:10,280 actually implement these things so that you can reason 561 00:32:10,280 --> 00:32:14,510 about them more cogently. 562 00:32:14,510 --> 00:32:19,900 So there are typically three major properties of mutexes 563 00:32:19,900 --> 00:32:20,810 when you look at them. 564 00:32:20,810 --> 00:32:23,580 And when you see documentation for mutexes, you should 565 00:32:23,580 --> 00:32:26,740 understand what the differences among these are. 566 00:32:26,740 --> 00:32:29,080 The first is whether it's a yielding mutex 567 00:32:29,080 --> 00:32:31,850 or a spinning mutex. 568 00:32:31,850 --> 00:32:36,190 So a yielding mutex, rather than spinning, returns control to 569 00:32:36,190 --> 00:32:37,700 the operating system. 570 00:32:37,700 --> 00:32:40,170 And why might you want to do that? 571 00:32:40,170 --> 00:32:43,700 Whereas a spinning one just consumes processor cycles. 572 00:32:43,700 --> 00:32:44,950 Why would you want to do that? 573 00:32:47,926 --> 00:32:48,412 Yeah. 574 00:32:48,412 --> 00:32:50,360 AUDIENCE: [INAUDIBLE] allow other threads. 575 00:32:50,360 --> 00:32:53,090 PROFESSOR: Yeah, it can allow other threads or other jobs 576 00:32:53,090 --> 00:32:56,180 that could be running to use the processor 577 00:32:56,180 --> 00:32:57,010 while you're waiting. 578 00:32:57,010 --> 00:32:58,330 What's the downside of that? 579 00:33:00,830 --> 00:33:02,290 To speak to the-- either one. 580 00:33:02,290 --> 00:33:03,787 Go ahead. 581 00:33:03,787 --> 00:33:07,945 AUDIENCE: It might be possible that whatever you're trying to 582 00:33:07,945 --> 00:33:11,771 do is essential, and you really want to get that done 583 00:33:11,771 --> 00:33:13,268 [UNINTELLIGIBLE] everything else executes.
584 00:33:13,268 --> 00:33:14,250 So you really want [INAUDIBLE] 585 00:33:14,250 --> 00:33:19,760 PROFESSOR: Yeah, context switching a thread out is a 586 00:33:19,760 --> 00:33:22,100 heavyweight operation. 587 00:33:22,100 --> 00:33:25,010 And if you end up context switching out, it may 588 00:33:25,010 --> 00:33:29,290 be that you only had to wait for half a dozen cycles and you'd 589 00:33:29,290 --> 00:33:30,480 have the lock. 590 00:33:30,480 --> 00:33:33,700 But instead, now you're going and you're doing a context 591 00:33:33,700 --> 00:33:36,250 switch and may not get access to the machine for another 592 00:33:36,250 --> 00:33:38,750 hundredth of a second or something. 593 00:33:38,750 --> 00:33:43,640 So it may be on the order of 10 to the 6th-- a million 594 00:33:43,640 --> 00:33:45,660 instructions before you get access again, 595 00:33:45,660 --> 00:33:49,500 rather than just a few. 596 00:33:49,500 --> 00:33:54,520 The second property of mutexes is whether they're 597 00:33:54,520 --> 00:33:56,850 reentrant or not. 598 00:33:56,850 --> 00:34:02,550 So a reentrant mutex allows a thread that's holding a lock 599 00:34:02,550 --> 00:34:05,340 to acquire it again. 600 00:34:05,340 --> 00:34:08,219 So I may hold the lock, and then I may try to acquire the 601 00:34:08,219 --> 00:34:10,570 lock again. 602 00:34:10,570 --> 00:34:19,139 Java is full of reentrant locks, reentrant mutexes. 603 00:34:19,139 --> 00:34:22,530 So why is this a positive or negative? 604 00:34:22,530 --> 00:34:26,130 What are the pros and cons of this one? 605 00:34:26,130 --> 00:34:29,699 Why might reentrancy be a good thing to want? 606 00:34:34,690 --> 00:34:38,190 Why would I bother doing-- 607 00:34:38,190 --> 00:34:42,043 why would I grab a lock that I already have? 608 00:34:42,043 --> 00:34:45,375 AUDIENCE: It'd be too easy to do a check [INAUDIBLE]. 609 00:34:45,375 --> 00:34:46,739 PROFESSOR: It lets you do what?
610 00:34:46,739 --> 00:34:49,491 AUDIENCE: It lets you not have to worry about locking when 611 00:34:49,491 --> 00:34:49,804 you already have a lock. 612 00:34:49,804 --> 00:34:51,610 PROFESSOR: It lets you not worry about it. 613 00:34:51,610 --> 00:34:52,000 That's right. 614 00:34:52,000 --> 00:34:52,944 But why is that valuable? 615 00:34:52,944 --> 00:34:55,159 AUDIENCE: It saves you one line in an If statement to 616 00:34:55,159 --> 00:34:58,069 check if you have a lock or not. 617 00:34:58,069 --> 00:34:59,640 PROFESSOR: That could be. 618 00:34:59,640 --> 00:35:02,320 Basically, the If statement is embedded in there. 619 00:35:02,320 --> 00:35:03,130 But why would I care? 620 00:35:03,130 --> 00:35:05,320 Why would I want to be acquiring something that I 621 00:35:05,320 --> 00:35:06,570 already have? 622 00:35:09,410 --> 00:35:12,600 In what programming situation might that arise? 623 00:35:12,600 --> 00:35:15,510 This seems kind of weird, right? 624 00:35:15,510 --> 00:35:16,970 Could be recursion. 625 00:35:16,970 --> 00:35:17,890 Yeah. 626 00:35:17,890 --> 00:35:20,880 So usually, what it comes from is when you have objects, and 627 00:35:20,880 --> 00:35:23,480 you have several methods on the object. 628 00:35:23,480 --> 00:35:27,070 And what you'd like to do is, if somebody's calling the 629 00:35:27,070 --> 00:35:33,440 method from the outside, you would like to be able to 630 00:35:33,440 --> 00:35:35,680 execute that particular-- 631 00:35:35,680 --> 00:35:41,650 I guess in C++ they don't call them "methods." They call them 632 00:35:41,650 --> 00:35:43,940 "member functions." "Member functions," they call them. 633 00:35:43,940 --> 00:35:46,430 In Java, they call them "methods," and in C++, they 634 00:35:46,430 --> 00:35:49,020 call them "member functions." Doesn't matter. 635 00:35:49,020 --> 00:35:50,700 It's the same thing.
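The reentrant case can be sketched in C11 as follows. This is an illustrative spinning mutex, not Java's or any library's implementation. It records the owning thread and a recursion count, so a member function that already holds the object's lock can call another member function that also locks it. To stay portable, threads identify themselves with a caller-supplied nonzero integer `tid`. That is an assumption; a real implementation would use the platform's thread ID. The type and function names are hypothetical. The `if` at the top of `rmutex_lock` is the extra conditional that reentrancy costs.

```c
#include <stdatomic.h>

/* Hypothetical reentrant spinning mutex. */
typedef struct {
    atomic_int owner;   /* 0 = free, otherwise the tid of the holder */
    int count;          /* times the owner has acquired it; touched only by the owner */
} rmutex;

void rmutex_init(rmutex *m) {
    atomic_init(&m->owner, 0);
    m->count = 0;
}

void rmutex_lock(rmutex *m, int tid) {   /* tid must be nonzero */
    /* Reentrancy check: only this thread ever stores its own tid here. */
    if (atomic_load_explicit(&m->owner, memory_order_relaxed) == tid) {
        m->count++;
        return;
    }
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&m->owner, &expected, tid,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;   /* CAS failed and overwrote expected: reset, keep spinning */
    m->count = 1;
}

void rmutex_unlock(rmutex *m) {
    if (--m->count == 0)   /* only the outermost unlock frees the mutex */
        atomic_store_explicit(&m->owner, 0, memory_order_release);
}
```

A locked member function can now call another locked member function of the same object. The inner call bumps `count` instead of deadlocking on a lock its own thread already holds.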
636 00:35:50,700 --> 00:35:52,840 So when you access one of these, normally, from the 637 00:35:52,840 --> 00:35:55,940 outside, you want to make sure you grab the lock associated 638 00:35:55,940 --> 00:35:57,380 with the object. 639 00:35:57,380 --> 00:36:00,760 However, it may be that what you're doing inside the object 640 00:36:00,760 --> 00:36:01,880 is you want to be able-- 641 00:36:01,880 --> 00:36:05,720 one of the operations may be a more complex operation that 642 00:36:05,720 --> 00:36:09,840 wants to call another of the object's own operations. 643 00:36:09,840 --> 00:36:11,650 So rather than implementing it twice-- 644 00:36:11,650 --> 00:36:16,560 once in the locked form, once without getting the lock-- 645 00:36:16,560 --> 00:36:19,970 you just implement it once, and you use reentrant locks. 646 00:36:19,970 --> 00:36:23,690 And that way, you don't have to worry about, in coding 647 00:36:23,690 --> 00:36:26,840 those things, whether or not you've already got it. 648 00:36:26,840 --> 00:36:29,570 So that's probably the most common place that I know of where 649 00:36:29,570 --> 00:36:31,040 people want reentrant locks. 650 00:36:31,040 --> 00:36:35,420 Naturally, to acquire a reentrant lock, you have to do 651 00:36:35,420 --> 00:36:37,880 some kind of If statement, which is a conditional. 652 00:36:37,880 --> 00:36:41,015 And as you know, if it's an unpredictable branch, that's 653 00:36:41,015 --> 00:36:43,010 going to be very expensive. 654 00:36:43,010 --> 00:36:50,090 So generally, there is a cost to making it reentrant. 655 00:36:50,090 --> 00:36:55,640 The third property is whether the lock is fair or unfair. 656 00:36:55,640 --> 00:36:59,770 So a fair mutex puts blocked threads essentially into a 657 00:36:59,770 --> 00:37:00,970 FIFO queue. 658 00:37:00,970 --> 00:37:04,250 And the unlock operation unblocks the thread that has 659 00:37:04,250 --> 00:37:06,940 been waiting the longest.
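A ticket lock is one standard way to get the first-come, first-served behavior of a fair mutex without an OS-managed queue. It is not described in the lecture; it is swapped in here as an illustration and sketched below in C11. Each arriving thread takes the next number with an atomic fetch-and-add, then spins until `now_serving` reaches that number, so threads acquire the lock in arrival order. It spins, so this is a fair but non-yielding, non-reentrant mutex. The type and function names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical fair (FIFO) spinning mutex: a ticket lock. */
typedef struct {
    atomic_uint next_ticket;   /* next number to hand out */
    atomic_uint now_serving;   /* number currently allowed into the critical section */
} ticket_lock;

void ticket_lock_init(ticket_lock *l) {
    atomic_init(&l->next_ticket, 0u);
    atomic_init(&l->now_serving, 0u);
}

void ticket_lock_acquire(ticket_lock *l) {
    /* Take a ticket: arrival order becomes service order. */
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1u, memory_order_relaxed);
    /* Wait for our turn; unsigned wraparound keeps the comparison correct. */
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire) != my)
        ;
}

void ticket_lock_release(ticket_lock *l) {
    /* Only the holder writes now_serving, so a plain load plus store suffices. */
    unsigned next = atomic_load_explicit(&l->now_serving, memory_order_relaxed) + 1u;
    atomic_store_explicit(&l->now_serving, next, memory_order_release);
}
```

An unfair lock lets whichever waiter happens to win the next atomic operation go next. The ticket lock trades that freedom for a guarantee that no waiter is overtaken.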
660 00:37:06,940 --> 00:37:13,450 So it makes it so that if you try to acquire a lock, you 661 00:37:13,450 --> 00:37:15,930 don't have some other thread coming in and trying to access 662 00:37:15,930 --> 00:37:19,000 that lock and getting ahead of you. 663 00:37:19,000 --> 00:37:21,740 It puts you in a queue. 664 00:37:21,740 --> 00:37:24,900 So an unfair mutex lets any blocked thread go next. 665 00:37:27,530 --> 00:37:31,030 So the cheapest thing to implement is a spinning, 666 00:37:31,030 --> 00:37:35,180 non-reentrant, unfair lock-- 667 00:37:35,180 --> 00:37:36,110 mutex. 668 00:37:36,110 --> 00:37:37,890 Those are the cheapest ones to implement. 669 00:37:37,890 --> 00:37:40,490 Very lightweight, very easy to use. 670 00:37:40,490 --> 00:37:42,270 The heavyweight ones are a yielding, 671 00:37:42,270 --> 00:37:44,900 reentrant, fair lock. 672 00:37:44,900 --> 00:37:48,130 And of course, you can have combinations, because all of 673 00:37:48,130 --> 00:37:52,100 these have, as you can see, different properties in terms 674 00:37:52,100 --> 00:37:55,840 of convenience of use and so forth, as well 675 00:37:55,840 --> 00:37:57,650 as different overheads. 676 00:37:57,650 --> 00:38:00,950 So there's some cases where the overhead isn't a big deal 677 00:38:00,950 --> 00:38:05,660 because it's not in the inner loop of a program or a heavily 678 00:38:05,660 --> 00:38:06,910 executed statement. 679 00:38:09,220 --> 00:38:12,400 So let's take a look at one of the simplest locks, which is a 680 00:38:12,400 --> 00:38:14,710 simple spinning mutex. 681 00:38:14,710 --> 00:38:20,420 This is the x86 code for how to acquire a lock. 682 00:38:23,260 --> 00:38:24,060 So let's run through this. 683 00:38:24,060 --> 00:38:26,000 So we start out at the top. 
684 00:38:26,000 --> 00:38:28,490 And I check to see if the mutex is 0. 685 00:38:28,490 --> 00:38:32,280 Basically, it's going to be 0 if it's free and 1 if it has 686 00:38:32,280 --> 00:38:34,570 been acquired. 687 00:38:34,570 --> 00:38:36,230 So we compare it. 688 00:38:36,230 --> 00:38:41,010 If it's free, then I jump to try to get the mutex. 689 00:38:41,010 --> 00:38:44,270 Otherwise, I execute this PAUSE instruction, and this 690 00:38:44,270 --> 00:38:46,140 turns out to be a-- 691 00:38:46,140 --> 00:38:46,880 it's humorous. 692 00:38:46,880 --> 00:38:50,460 It's an x86 hack to un-confuse the pipeline. 693 00:38:50,460 --> 00:38:53,890 So it turns out that in this case, if you don't have a 694 00:38:53,890 --> 00:38:55,100 pause here-- 695 00:38:55,100 --> 00:38:58,320 which is a no-op and does nothing-- 696 00:38:58,320 --> 00:39:04,970 x86 mispredicts something or whatever, and it's more time 697 00:39:04,970 --> 00:39:08,040 consuming than if it has the pause there. 698 00:39:08,040 --> 00:39:12,290 The manual explains very little about this hardware bug 699 00:39:12,290 --> 00:39:15,440 except to say, put in the pause. 700 00:39:15,440 --> 00:39:17,940 So if you didn't get it, then you jump to spin mutex, and 701 00:39:17,940 --> 00:39:20,710 try again, check to see if it's free. 702 00:39:20,710 --> 00:39:23,890 Now, notice that we're going to spin until it's free, and 703 00:39:23,890 --> 00:39:25,860 then we're going to try to get it. 704 00:39:25,860 --> 00:39:27,170 Why not just try to get it first? 705 00:39:34,900 --> 00:39:37,110 Well, think about that while we go through how to get it, 706 00:39:37,110 --> 00:39:39,310 and then I'll ask it again. 707 00:39:39,310 --> 00:39:42,310 Think about why it is that you might not want to try to get it first. 708 00:39:42,310 --> 00:39:47,180 So if I want to get the mutex, I first get a 709 00:39:47,180 --> 00:39:50,490 value of 1 in my register.
710 00:39:50,490 --> 00:39:53,950 And then I compute this exchange operation, which 711 00:39:53,950 --> 00:40:01,210 exchanges the value of the mutex with the value of the-- 712 00:40:01,210 --> 00:40:03,110 with the one that I have. 713 00:40:03,110 --> 00:40:06,240 So it exchanges the memory location with the register. 714 00:40:06,240 --> 00:40:08,070 Now, this is an expensive operation-- 715 00:40:08,070 --> 00:40:09,060 exchange-- 716 00:40:09,060 --> 00:40:11,970 because it's an atomic exchange, and it typically has 717 00:40:11,970 --> 00:40:15,930 to go at least out to L2 to do this. 718 00:40:15,930 --> 00:40:17,810 So it's an expensive operation, because it's a 719 00:40:17,810 --> 00:40:20,170 read-modify-write operation. 720 00:40:20,170 --> 00:40:25,280 I'm swapping my register value with a value 721 00:40:25,280 --> 00:40:26,530 that's in the mutex. 722 00:40:30,490 --> 00:40:35,100 So it turns out that if it's 0, then it means I got it. 723 00:40:39,730 --> 00:40:43,610 So I compare it with 0, and if it's equal to 0, I go onto the 724 00:40:43,610 --> 00:40:44,720 critical section. 725 00:40:44,720 --> 00:40:47,140 When I'm done with the critical section, I release 726 00:40:47,140 --> 00:40:51,050 the mutex by basically storing 0 in there, because I'm the 727 00:40:51,050 --> 00:40:54,620 only one who accesses the mutex at this point. 728 00:40:54,620 --> 00:40:57,410 If I didn't get it, if the value is 1, notice that 729 00:40:57,410 --> 00:41:02,510 because I'm swapping a 1 in, even though the 1 got swapped 730 00:41:02,510 --> 00:41:04,350 in, well, there was a 1 there before. 731 00:41:04,350 --> 00:41:08,650 So it basically did not affect the value of the mutex. 732 00:41:08,650 --> 00:41:10,810 But I discover, oh, I don't have it. 733 00:41:10,810 --> 00:41:14,050 Then we go all the way back up there to spin mutex. 734 00:41:14,050 --> 00:41:15,090 So here's the question. 735 00:41:15,090 --> 00:41:16,790 Why do I need all this preamble code? 
736 00:41:16,790 --> 00:41:21,820 Why not just go straight to Get_Mutex, make the spin mutex 737 00:41:21,820 --> 00:41:25,110 here be a jump to Get_Mutex? 738 00:41:25,110 --> 00:41:25,450 Yeah? 739 00:41:25,450 --> 00:41:27,945 AUDIENCE: Maybe it's because the exchange is expensive. 740 00:41:27,945 --> 00:41:29,050 PROFESSOR: Excuse me? 741 00:41:29,050 --> 00:41:29,810 AUDIENCE: The exchange is-- 742 00:41:29,810 --> 00:41:31,510 PROFESSOR: Yeah, because the exchange is expensive. 743 00:41:31,510 --> 00:41:32,520 Exactly. 744 00:41:32,520 --> 00:41:36,190 So this code here, I can compare. 745 00:41:36,190 --> 00:41:38,900 And as long as nobody's touching anything, this 746 00:41:38,900 --> 00:41:46,950 becomes just L1 memory accesses. 747 00:41:46,950 --> 00:41:51,320 Whereas here, it's going to be at least L2 to do 748 00:41:51,320 --> 00:41:52,790 the exchange operation. 749 00:41:52,790 --> 00:41:55,690 So rather than doing that-- 750 00:41:55,690 --> 00:42:00,290 moreover, this one actually changes the value. 751 00:42:00,290 --> 00:42:02,500 So what happens when I change the value of the mutex? 752 00:42:02,500 --> 00:42:05,970 Even though I change it to the same value, what happens in 753 00:42:05,970 --> 00:42:08,590 order to do that exchange? 754 00:42:08,590 --> 00:42:13,320 Remember from several lectures ago. 755 00:42:13,320 --> 00:42:15,890 What's going to happen when I make an exchange there? 756 00:42:15,890 --> 00:42:17,350 What does the hardware have to do? 757 00:42:23,811 --> 00:42:27,730 What's the hardware going to do on any store to a shared 758 00:42:27,730 --> 00:42:32,380 memory location, to a memory location in shared memory that 759 00:42:32,380 --> 00:42:34,480 is actually shared? 760 00:42:34,480 --> 00:42:35,260 Yeah? 761 00:42:35,260 --> 00:42:36,126 AUDIENCE: [INAUDIBLE] 762 00:42:36,126 --> 00:42:37,890 PROFESSOR: Yeah, it's got to invalidate 763 00:42:37,890 --> 00:42:41,180 all the other copies. 
764 00:42:41,180 --> 00:42:43,900 So if everybody's spinning here-- imagine that you have 765 00:42:43,900 --> 00:42:48,310 five guys spinning, doing exchanges-- 766 00:42:48,310 --> 00:42:53,310 they're all creating all this traffic of invalidations, 767 00:42:53,310 --> 00:42:56,870 what's called an "invalidation storm." So they create an 768 00:42:56,870 --> 00:42:59,950 invalidation storm as they all are invalidating each other so 769 00:42:59,950 --> 00:43:02,240 that they can get access to it so that they can change the 770 00:43:02,240 --> 00:43:05,240 value themselves. 771 00:43:05,240 --> 00:43:07,605 But up here, all I'm doing is looking at the value. 772 00:43:10,672 --> 00:43:15,345 All I'm doing is looking at the value to see if it's free. 773 00:43:15,345 --> 00:43:26,675 And it's not until the guy actually frees the value that 774 00:43:26,675 --> 00:43:27,430 it actually-- 775 00:43:27,430 --> 00:43:30,100 actually, this is interesting. 776 00:43:30,100 --> 00:43:34,470 I think I wrote this with Intel syntax, rather than 777 00:43:34,470 --> 00:43:37,670 AT&T, didn't I? 778 00:43:37,670 --> 00:43:46,020 The MOV mutex, 0 moves 0 into the mutex, 779 00:43:46,020 --> 00:43:47,630 which is Intel syntax. 780 00:43:47,630 --> 00:43:51,620 I probably should have converted this to AT&T, 781 00:43:51,620 --> 00:43:54,350 because that's what we're generally using in the class. 782 00:43:54,350 --> 00:43:58,700 I'll fix that before I put the slides up. 783 00:43:58,700 --> 00:44:00,640 Basically, I pulled this out of the Intel manual. 784 00:44:03,820 --> 00:44:06,770 So any questions about this code? 785 00:44:06,770 --> 00:44:08,490 Does everybody see how it works? 786 00:44:08,490 --> 00:44:11,610 It relies on this atomic exchange operation. 787 00:44:11,610 --> 00:44:15,310 And I'm going to end up sitting here spinning until 788 00:44:15,310 --> 00:44:17,330 maybe I can get access to it.
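Here is a minimal C11 rendering of the same spinning mutex. It is a sketch under assumptions, not the Intel manual's code. `atomic_exchange_explicit` plays the role of the XCHG instruction. `_mm_pause` emits PAUSE on x86, and the macro falls back to doing nothing on other targets. The names `spin_mutex`, `spin_lock`, and `spin_unlock` are hypothetical. The inner read-only loop corresponds to Spin_Mutex. Because it only reads the mutex, waiting threads hit in their own caches instead of causing an invalidation storm, and the expensive exchange is attempted only once the mutex looks free.

```c
#include <stdatomic.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()    /* the PAUSE instruction from the slide */
#else
#define cpu_relax() ((void)0)      /* no-op on non-x86 targets */
#endif

typedef atomic_int spin_mutex;     /* 0 = free, 1 = held */

void spin_lock(spin_mutex *m) {
    for (;;) {
        /* Spin_Mutex: read-only polling; stays in our cache while the mutex is held. */
        while (atomic_load_explicit(m, memory_order_relaxed) != 0)
            cpu_relax();
        /* Get_Mutex: atomic exchange, the expensive read-modify-write. */
        if (atomic_exchange_explicit(m, 1, memory_order_acquire) == 0)
            return;                /* old value was 0: we now hold the mutex */
        /* Old value was 1: someone beat us to it, so go back to spinning. */
    }
}

void spin_unlock(spin_mutex *m) {
    /* Only the holder writes 0, so a release store suffices (the slide's MOV). */
    atomic_store_explicit(m, 0, memory_order_release);
}
```

Jumping straight to the exchange would also be correct, but every waiter would then write the line on each attempt, which is exactly the invalidation traffic described above.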
789 00:44:17,330 --> 00:44:19,750 When I have a chance to get access to it, I try to get it. 790 00:44:19,750 --> 00:44:21,570 If I don't get it, I go back to spinning. 791 00:44:26,690 --> 00:44:29,600 How do I convert this to a yielding mutex? 792 00:44:35,449 --> 00:44:37,894 AUDIENCE: Instead of having that 793 00:44:37,894 --> 00:44:42,295 spinning mutex, you should-- 794 00:44:42,295 --> 00:44:43,762 you shouldn't have that. 795 00:44:43,762 --> 00:44:46,207 You should just have something that allows you to just 796 00:44:46,207 --> 00:44:46,696 [INAUDIBLE]. 797 00:44:46,696 --> 00:44:48,910 PROFESSOR: Yeah, so actually, the way you do it is you 798 00:44:48,910 --> 00:44:50,460 replace the PAUSE instruction. 799 00:44:50,460 --> 00:44:51,610 Exactly what you're saying. 800 00:44:51,610 --> 00:44:53,650 You've got the right place in the code. 801 00:44:53,650 --> 00:44:54,865 We basically call a yield. 802 00:44:54,865 --> 00:44:56,250 And you can use, for example, pthread_yield. 803 00:44:59,330 --> 00:45:00,770 What it tells the operating system is, 804 00:45:00,770 --> 00:45:02,295 give up on this quantum. 805 00:45:02,295 --> 00:45:04,590 You can schedule me out. 806 00:45:04,590 --> 00:45:05,710 Somebody else can be scheduled. 807 00:45:05,710 --> 00:45:08,830 Now, if nobody else is there to be scheduled, often you'll 808 00:45:08,830 --> 00:45:12,860 just get control back, and you'll jump again and give the 809 00:45:12,860 --> 00:45:14,160 operating system another time. 810 00:45:17,070 --> 00:45:23,300 Now, one of the things I've seen in computer benchmarks 811 00:45:23,300 --> 00:45:30,470 that use locking is that they all use spin locks. 812 00:45:30,470 --> 00:45:37,880 They never use the yielding, because if you yield, then 813 00:45:37,880 --> 00:45:40,380 when the lock comes free, you're not going to be ready 814 00:45:40,380 --> 00:45:41,020 to come back in. 815 00:45:41,020 --> 00:45:44,470 You may be switched out. 
816 00:45:44,470 --> 00:45:48,900 So a common thing that all these companies do when 817 00:45:48,900 --> 00:45:52,320 they're vying for who's got the fastest on this benchmark 818 00:45:52,320 --> 00:45:55,790 or fastest on that benchmark is they go through and they 819 00:45:55,790 --> 00:46:01,250 convert all their yielding mutexes into spinning mutexes, 820 00:46:01,250 --> 00:46:05,240 then take their measurements, when in fact, as a practical 821 00:46:05,240 --> 00:46:08,100 matter, they can't actually ship code that way. 822 00:46:08,100 --> 00:46:11,970 So you'll see this kind of game played where people try 823 00:46:11,970 --> 00:46:15,330 to get the best performance they can in some kind of 824 00:46:15,330 --> 00:46:16,490 laboratory setting. 825 00:46:16,490 --> 00:46:18,680 It's not the same as when you're actually 826 00:46:18,680 --> 00:46:22,650 doing a real thing. 827 00:46:22,650 --> 00:46:24,490 So you have a choice here. 828 00:46:29,430 --> 00:46:30,850 There's kind of a tension here. 829 00:46:35,880 --> 00:46:39,240 You'd like to claim the mutex soon after it's released. 830 00:46:39,240 --> 00:46:40,885 And you're not going to get that if you yield. 831 00:46:43,600 --> 00:46:47,080 At the same time, you want to behave nicely 832 00:46:47,080 --> 00:46:50,530 and waste few cycles. 833 00:46:50,530 --> 00:46:55,040 So what's the strategy for being able to accomplish both 834 00:46:55,040 --> 00:46:57,260 of these goals? 835 00:46:57,260 --> 00:47:01,090 So one of these goals is the spinning mutex does a great 836 00:47:01,090 --> 00:47:03,870 job of claiming the mutex soon after it's released. 837 00:47:03,870 --> 00:47:09,630 The yielding mutex behaves nicely and wastes few cycles. 838 00:47:09,630 --> 00:47:10,830 Is there the best of both worlds? 839 00:47:10,830 --> 00:47:12,460 There's certainly the worst of both worlds, right? 840 00:47:15,500 --> 00:47:16,800 What's the best of both worlds? 
841 00:47:19,710 --> 00:47:24,910 How might we accomplish both of these goals with a small 842 00:47:24,910 --> 00:47:27,650 modification to the locking code? 843 00:47:36,990 --> 00:47:38,100 So it turns out you can get within a 844 00:47:38,100 --> 00:47:39,375 factor of two of optimal. 845 00:47:47,130 --> 00:47:52,445 How might you do that while wasting few cycles? 846 00:47:55,790 --> 00:47:57,920 So here's the idea. 847 00:47:57,920 --> 00:48:07,030 Spin for a little while, and then, if after a little while 848 00:48:07,030 --> 00:48:14,480 you didn't manage to access the mutex, then yield. 849 00:48:14,480 --> 00:48:16,910 That way, if the mutex was right there available to be 850 00:48:16,910 --> 00:48:20,100 accessed, you could access it, but you don't spin for an 851 00:48:20,100 --> 00:48:23,390 indefinite amount of time. 852 00:48:23,390 --> 00:48:27,490 So the question is, how long do you spin? 853 00:48:27,490 --> 00:48:30,020 So we're going to spin for a little while and then yield. 854 00:48:30,020 --> 00:48:30,360 Yeah? 855 00:48:30,360 --> 00:48:32,760 AUDIENCE: [INAUDIBLE]. 856 00:48:32,760 --> 00:48:34,950 PROFESSOR: Yeah, exactly. 857 00:48:34,950 --> 00:48:38,700 So what you do is you spin for basically as long as a context 858 00:48:38,700 --> 00:48:39,950 switch takes. 859 00:48:41,980 --> 00:48:44,760 So if you spin for as long as it takes to do a context 860 00:48:44,760 --> 00:48:49,090 switch and then do a context switch, if the mutex became 861 00:48:49,090 --> 00:48:51,530 available right after you yielded, you're only going to wait 862 00:48:51,530 --> 00:48:52,780 double what you would have waited. 863 00:48:55,260 --> 00:48:59,010 And if in the meantime during that first part where you're 864 00:48:59,010 --> 00:49:01,970 spinning it becomes available, you're not waiting any 865 00:49:01,970 --> 00:49:03,570 longer than you actually have to.
866 00:49:03,570 --> 00:49:05,710 So in both cases, you're waiting at 867 00:49:05,710 --> 00:49:06,950 most a factor of two. 868 00:49:06,950 --> 00:49:09,580 In one case, you're waiting exactly the right amount. 869 00:49:09,580 --> 00:49:12,510 In the other, you can wait up to a factor of two. 870 00:49:12,510 --> 00:49:18,220 So this is a classic amortized kind of argument, that you can 871 00:49:18,220 --> 00:49:20,980 amortize the cost of the spinning 872 00:49:20,980 --> 00:49:22,680 to the context switch. 873 00:49:22,680 --> 00:49:25,860 So spin until you spend as much time as it would cost for 874 00:49:25,860 --> 00:49:26,940 a context switch. 875 00:49:26,940 --> 00:49:30,130 Then do the context switch. 876 00:49:30,130 --> 00:49:31,380 Yet another voodoo parameter. 877 00:49:37,440 --> 00:49:39,700 Yeah, so if the mutex is released while spinning, 878 00:49:39,700 --> 00:49:41,720 that's optimal. 879 00:49:41,720 --> 00:49:44,280 If the mutex is released after the yield, you're 880 00:49:44,280 --> 00:49:45,690 within twice optimal. 881 00:49:48,270 --> 00:49:52,820 Turns out that 2 is not the optimal value. 882 00:49:52,820 --> 00:49:57,130 There's a randomized algorithm that makes it e over e minus 1 883 00:49:57,130 --> 00:50:02,030 competitive, where e is the base of the natural logarithm. 884 00:50:05,620 --> 00:50:12,990 So 2.7 divided by 1.7, which is what? 885 00:50:12,990 --> 00:50:14,850 Who's got a calculator? 886 00:50:14,850 --> 00:50:17,150 2.7 divided by 1.7 is-- 887 00:50:17,150 --> 00:50:18,760 I should have calculated this out. 888 00:50:18,760 --> 00:50:19,930 AUDIENCE: [INAUDIBLE] 889 00:50:19,930 --> 00:50:21,470 PROFESSOR: It's about 1.6. 890 00:50:21,470 --> 00:50:21,910 Good. 891 00:50:21,910 --> 00:50:23,310 So it's better than 2. 892 00:50:26,980 --> 00:50:30,021 People analyze these things, right? 893 00:50:30,021 --> 00:50:34,250 So any questions about implementation of locks?
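The factor-of-two claim can be checked with a small cost model. The sketch below assumes, as a simplification, that waiting costs one unit per unit of time, that a context switch costs `switch_cost`, and that an all-knowing (offline) waiter would either spin exactly until the release or yield immediately, whichever is cheaper. The function names are hypothetical, and release times are assumed to be positive.

```python
import math


def spin_then_yield_cost(release_time: float, switch_cost: float) -> float:
    """Cost paid by a waiter that spins up to `switch_cost`, then yields."""
    if release_time <= switch_cost:
        return release_time            # released while spinning: same as optimal
    return switch_cost + switch_cost   # spun the whole budget, then paid for the switch


def offline_optimal_cost(release_time: float, switch_cost: float) -> float:
    """A waiter who knows the release time spins until then or yields at once."""
    return min(release_time, switch_cost)


def worst_case_ratio(switch_cost: float, release_times) -> float:
    """Largest ratio of spin-then-yield cost to offline-optimal cost."""
    return max(spin_then_yield_cost(t, switch_cost) / offline_optimal_cost(t, switch_cost)
               for t in release_times)


# Competitive ratio of the randomized strategy mentioned in the lecture.
RANDOMIZED_RATIO = math.e / (math.e - 1)   # about 1.582
```

With `switch_cost = 100` and release times 1 through 1000, the worst ratio is exactly 2, reached as soon as the mutex is released just after the spin budget runs out.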
894 00:50:34,250 --> 00:50:36,030 There are many other ways of implementing locks. 895 00:50:36,030 --> 00:50:39,210 There are other instructions that people use. 896 00:50:39,210 --> 00:50:42,780 For instance, compare-and-swap is another 897 00:50:42,780 --> 00:50:43,830 operation that's used. 898 00:50:43,830 --> 00:50:47,100 Some machines have an operation called 899 00:50:47,100 --> 00:50:51,290 load-linked/store-conditional, which is not on the x86 900 00:50:51,290 --> 00:50:53,590 architecture, but it is on other architectures. 901 00:50:53,590 --> 00:50:55,720 You'll see a lot of other ways of doing some kind of 902 00:50:55,720 --> 00:50:59,130 atomic operation to implement a lock. 903 00:50:59,130 --> 00:51:03,110 Uniformly, they're expensive compared to register 904 00:51:03,110 --> 00:51:06,160 operations, or even compared to L1 accesses, 905 00:51:06,160 --> 00:51:08,760 typically. 906 00:51:08,760 --> 00:51:10,010 Any questions? 907 00:51:12,860 --> 00:51:15,810 So now that we've decided that we're going to use mutexes, 908 00:51:15,810 --> 00:51:19,330 and we understand we're writing non-deterministic code 909 00:51:19,330 --> 00:51:22,590 and so forth, well, it turns out there are a host of other 910 00:51:22,590 --> 00:51:24,520 system anomalies that occur. 911 00:51:24,520 --> 00:51:29,120 So locks are like, they're this really evil mechanism 912 00:51:29,120 --> 00:51:31,360 that works really well. 913 00:51:31,360 --> 00:51:33,420 It feels so good that nobody wants to stop 914 00:51:33,420 --> 00:51:36,510 using it, even though-- 915 00:51:36,510 --> 00:51:37,880 but nobody has better ideas. 916 00:51:37,880 --> 00:51:42,440 One of the most interesting ideas in recent memory is the 917 00:51:42,440 --> 00:51:45,940 idea of using what's called "transactional memory," which 918 00:51:45,940 --> 00:51:48,800 is basically where memory operates like a database 919 00:51:48,800 --> 00:51:50,100 transaction.
920 00:51:50,100 --> 00:51:52,730 And it's allowed to abort, in which case you roll it back 921 00:51:52,730 --> 00:51:55,520 and retry it. 922 00:51:55,520 --> 00:51:59,280 Yet, transactional memory has a host of issues with it, and 923 00:51:59,280 --> 00:52:00,530 still people use locks. 924 00:52:06,170 --> 00:52:08,610 So let's talk about some of the bad things that happen 925 00:52:08,610 --> 00:52:11,070 when you start doing locks. 926 00:52:11,070 --> 00:52:15,430 I'm going to talk about three of them: deadlock, convoying, 927 00:52:15,430 --> 00:52:18,230 and contention. 928 00:52:18,230 --> 00:52:22,760 So deadlock is probably the most important one, because it 929 00:52:22,760 --> 00:52:24,665 has to do with correctness. 930 00:52:24,665 --> 00:52:27,030 So you can have code-- in fact, I've seen people with 931 00:52:27,030 --> 00:52:31,360 very fast code that has deadlock potential in it. 932 00:52:31,360 --> 00:52:34,830 It's like, if you deadlock, then your average running time 933 00:52:34,830 --> 00:52:37,860 is infinite if there's a possibility 934 00:52:37,860 --> 00:52:38,810 of a deadlock, right? 935 00:52:38,810 --> 00:52:41,440 Because you're averaging infinity with everything else 936 00:52:41,440 --> 00:52:44,410 that you might run. 937 00:52:44,410 --> 00:52:48,260 So it's not good to have deadlock in your code, 938 00:52:48,260 --> 00:52:50,400 regardless. 939 00:52:50,400 --> 00:52:52,970 It's kind of like your code seg faulting. 940 00:52:52,970 --> 00:52:55,110 No decent code should seg fault. 941 00:52:55,110 --> 00:52:58,290 It should always catch its own errors and terminate 942 00:52:58,290 --> 00:52:59,350 gracefully. 943 00:52:59,350 --> 00:53:02,600 It shouldn't just seg fault in some circumstance. 944 00:53:02,600 --> 00:53:04,295 Similarly, your code should not deadlock. 945 00:53:07,190 --> 00:53:12,540 So here's sort of a classical instance of deadlock.
946 00:53:12,540 --> 00:53:16,020 And deadlock typically occurs when you hold more than one 947 00:53:16,020 --> 00:53:18,670 lock at a time. 948 00:53:18,670 --> 00:53:27,250 So here, this guy is going to grab a lock A, going to grab a 949 00:53:27,250 --> 00:53:31,150 lock B, then unlock B, unlock A, and in between 950 00:53:31,150 --> 00:53:32,050 do a critical section. 951 00:53:32,050 --> 00:53:34,650 Why might I grab two locks? 952 00:53:34,650 --> 00:53:37,340 What's the circumstance where I might have code that looked 953 00:53:37,340 --> 00:53:40,480 very similar to this? 954 00:53:40,480 --> 00:53:41,200 Use case. 955 00:53:41,200 --> 00:53:42,172 AUDIENCE: Two objects? 956 00:53:42,172 --> 00:53:43,156 PROFESSOR: Sorry? 957 00:53:43,156 --> 00:53:44,250 AUDIENCE: You need two objects. 958 00:53:44,250 --> 00:53:45,430 PROFESSOR: You need two objects. 959 00:53:45,430 --> 00:53:46,680 When might that occur? 960 00:53:49,442 --> 00:53:51,827 AUDIENCE: Account transactions. 961 00:53:51,827 --> 00:53:53,780 PROFESSOR: Yeah, account transactions. 962 00:53:53,780 --> 00:53:56,790 That's the classic one. 963 00:53:56,790 --> 00:53:59,060 You want to move something from this bank account to that 964 00:53:59,060 --> 00:54:00,055 bank account. 965 00:54:00,055 --> 00:54:02,400 And you want to make sure that as you're updating it, nothing 966 00:54:02,400 --> 00:54:04,110 else is occurring. 967 00:54:04,110 --> 00:54:06,110 Another place this comes up all the time is when you do 968 00:54:06,110 --> 00:54:08,420 graph algorithms. 969 00:54:08,420 --> 00:54:12,140 You always want to grab the edge and have the two vertices 970 00:54:12,140 --> 00:54:15,040 on each end of the edge not move while you do something 971 00:54:15,040 --> 00:54:16,970 across the edge. 972 00:54:16,970 --> 00:54:19,870 So lots of cases there.
973 00:54:19,870 --> 00:54:22,170 It turns out the order in which you unlock things 974 00:54:22,170 --> 00:54:26,240 doesn't matter, because you can always unlock something. 975 00:54:26,240 --> 00:54:28,350 You never hold up for unlocking. 976 00:54:28,350 --> 00:54:30,180 The problem with deadlock is generally 977 00:54:30,180 --> 00:54:31,660 how you acquire locks. 978 00:54:31,660 --> 00:54:35,010 So in this example, Thread 2 grabs Lock B, then grabs Lock 979 00:54:35,010 --> 00:54:39,290 A. So it might be, for example, that you have some 980 00:54:39,290 --> 00:54:42,320 random process that's at the node of a graph. 981 00:54:42,320 --> 00:54:44,440 And now it's going to grab a lock on the 982 00:54:44,440 --> 00:54:45,480 other end of an edge. 983 00:54:45,480 --> 00:54:48,790 But you might have the guy at the other end grabbing that 984 00:54:48,790 --> 00:54:52,850 vertex and then grabbing the one on your end. 985 00:54:52,850 --> 00:54:54,730 And that's basically the situation. 986 00:54:54,730 --> 00:54:59,990 So what happens is Thread 1 acquires a lock here. 987 00:54:59,990 --> 00:55:01,890 Thread 2 acquires that lock. 988 00:55:01,890 --> 00:55:04,910 And now which one can go? 989 00:55:04,910 --> 00:55:05,550 Neither of them. 990 00:55:05,550 --> 00:55:08,110 You've got a deadlock. 991 00:55:08,110 --> 00:55:11,060 Ultimate loss of performance. 992 00:55:11,060 --> 00:55:13,280 So it's really a correctness issue. 993 00:55:13,280 --> 00:55:16,210 But you can view it, if you really say, oh, correctness, 994 00:55:16,210 --> 00:55:17,350 that's for sissies. 995 00:55:17,350 --> 00:55:19,340 We do performance. 996 00:55:19,340 --> 00:55:23,630 Well, it's still a performance issue, because it's the 997 00:55:23,630 --> 00:55:24,970 ultimate loss of performance. 998 00:55:24,970 --> 00:55:27,740 In fact, that's probably true of any correctness issue. 999 00:55:27,740 --> 00:55:28,470 No, that's not true. 
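The interleaving on the slide can be reproduced deterministically in Python. In this sketch (hypothetical names, with `threading` standing in for the course's threads), two barriers force Thread 1 to hold A and Thread 2 to hold B before either asks for its second lock, and a timeout stands in for "waits forever" so the demo terminates.

```python
import threading
from typing import Dict


def opposite_order_demo(timeout: float = 0.2) -> Dict[str, bool]:
    """Report whether each thread ever got its second lock."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    got_second: Dict[str, bool] = {}

    def thread_body(name: str, first: threading.Lock, second: threading.Lock) -> None:
        with first:
            both_hold_first.wait()          # now Thread 1 holds A and Thread 2 holds B
            ok = second.acquire(timeout=timeout)
            got_second[name] = ok
            if ok:
                second.release()
            both_tried.wait()               # keep holding `first` until both have tried

    t1 = threading.Thread(target=thread_body, args=("thread1", lock_a, lock_b))
    t2 = threading.Thread(target=thread_body, args=("thread2", lock_b, lock_a))
    for t in (t1, t2):
        t.start()
    for t in (t1, t2):
        t.join()
    return got_second
```

Both entries come back `False`: neither thread can proceed, which with a plain blocking `acquire()` would be a permanent hang.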
1000 00:55:28,470 --> 00:55:30,700 Sometimes you just get the wrong number. 1001 00:55:30,700 --> 00:55:32,700 Here is a correctness issue where 1002 00:55:32,700 --> 00:55:33,950 your code stops operating. 1003 00:55:37,710 --> 00:55:41,410 So there are three conditions that are usually pointed to 1004 00:55:41,410 --> 00:55:43,320 that you need for deadlock. 1005 00:55:43,320 --> 00:55:45,220 The first is mutual exclusion. 1006 00:55:45,220 --> 00:55:49,340 Each thread claims exclusive control over the resources 1007 00:55:49,340 --> 00:55:55,540 that it holds, in this case, the resources being the locks. 1008 00:55:55,540 --> 00:55:58,140 So there's got to be some resource that you're grabbing, 1009 00:55:58,140 --> 00:56:00,840 and that you're the only one who gets to have it. 1010 00:56:00,840 --> 00:56:03,280 So in this case, it would be the locks. 1011 00:56:03,280 --> 00:56:06,310 The second is non-preemption. 1012 00:56:06,310 --> 00:56:09,390 You don't let go of your resources until you complete 1013 00:56:09,390 --> 00:56:12,370 your use of them. 1014 00:56:12,370 --> 00:56:18,840 So that means nobody can make you let go of a lock. 1015 00:56:18,840 --> 00:56:22,060 If you're actually able to preempt-- 1016 00:56:22,060 --> 00:56:26,020 so this piece of code over there has grabbed locks, and 1017 00:56:26,020 --> 00:56:29,210 now I can come in and take them away, then you may not 1018 00:56:29,210 --> 00:56:29,975 have a deadlock potential. 1019 00:56:29,975 --> 00:56:31,560 You may have other issues, but you won't 1020 00:56:31,560 --> 00:56:34,090 have a deadlock potential. 1021 00:56:34,090 --> 00:56:36,520 And the third one is circular waiting. 1022 00:56:36,520 --> 00:56:39,430 You have a cycle of threads in which each thread is blocked 1023 00:56:39,430 --> 00:56:44,790 waiting for resources held by the next thread in the cycle.
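Circular waiting is the condition a deadlock detector looks for. A minimal sketch, assuming each blocked thread waits on exactly one lock and we know which thread holds it, so the "waits-for" relation is a map from thread to thread. With ordinary locks, which are exclusive and cannot be preempted, a cycle in that map is a deadlock. The function name and input format are hypothetical.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return a cycle of threads, each blocked on a lock held by the next, or None."""
    for start in waits_for:
        path: List[str] = []
        index_in_path: Dict[str, int] = {}
        thread = start
        # Follow "is blocked waiting for" edges until we leave the graph or revisit a thread.
        while thread in waits_for and thread not in index_in_path:
            index_in_path[thread] = len(path)
            path.append(thread)
            thread = waits_for[thread]
        if thread in index_in_path:
            return path[index_in_path[thread]:]
    return None
```

For the two-lock example, `{"T1": "T2", "T2": "T1"}` yields the cycle `["T1", "T2"]`, while a chain that ends at a running thread yields `None`.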
1024 00:56:44,790 --> 00:56:48,650 So let me illustrate this with a very famous story that some 1025 00:56:48,650 --> 00:56:52,900 of you may have seen, because it is so famous. 1026 00:56:52,900 --> 00:56:55,980 It's the dining philosophers problem. 1027 00:56:55,980 --> 00:56:58,890 It's an illustrative story of deadlock that was originally 1028 00:56:58,890 --> 00:57:03,120 told by Tony Hoare, based on an examination question by 1029 00:57:03,120 --> 00:57:05,160 Edsger Dijkstra. 1030 00:57:05,160 --> 00:57:07,260 And the story has been embellished over the years by 1031 00:57:07,260 --> 00:57:09,060 many retellers. 1032 00:57:09,060 --> 00:57:10,610 It's one of these things that if you're a computer 1033 00:57:10,610 --> 00:57:13,120 scientist, you should know this story just because 1034 00:57:13,120 --> 00:57:16,130 everybody knows this story. 1035 00:57:16,130 --> 00:57:19,080 So here's how the story goes, at least my version of it. 1036 00:57:19,080 --> 00:57:21,810 I get to retell it now. 1037 00:57:21,810 --> 00:57:25,270 So each of n philosophers needs the two chopsticks on 1038 00:57:25,270 --> 00:57:29,500 either side of his or her plate to eat the 1039 00:57:29,500 --> 00:57:31,960 noodles on the plate. 1040 00:57:31,960 --> 00:57:35,940 So they're not worried about germs here, by the way. 1041 00:57:35,940 --> 00:57:37,690 So you have five philosophers in this case 1042 00:57:37,690 --> 00:57:40,250 sitting around the table. 1043 00:57:40,250 --> 00:57:42,470 There are five chopsticks between them. 1044 00:57:42,470 --> 00:57:46,200 In order to eat, they need to grab the two chopsticks on 1045 00:57:46,200 --> 00:57:47,350 either side. 1046 00:57:47,350 --> 00:57:48,200 Then they can eat. 1047 00:57:48,200 --> 00:57:50,470 Then they put them down. 1048 00:57:50,470 --> 00:57:56,310 So here's what philosopher i does.
1049 00:57:56,310 --> 00:58:02,220 So in an infinite loop, the philosopher does thinking, 1050 00:58:02,220 --> 00:58:04,650 because that's what philosophers do. 1051 00:58:04,650 --> 00:58:14,660 Then it grabs the lock of chopstick i and grabs the lock 1052 00:58:14,660 --> 00:58:16,480 of chopstick i plus 1. 1053 00:58:16,480 --> 00:58:17,020 That's the 1. 1054 00:58:17,020 --> 00:58:19,850 So if we index them, say, to the left of the plate, this is 1055 00:58:19,850 --> 00:58:22,180 grabbing the chopstick to the left of your plate. 1056 00:58:22,180 --> 00:58:24,930 This is grabbing the chopstick to the right of your plate. 1057 00:58:24,930 --> 00:58:26,240 Then you can eat. 1058 00:58:26,240 --> 00:58:27,550 Then you release your two chopsticks. 1059 00:58:30,810 --> 00:58:32,650 So here, that's the code. 1060 00:58:32,650 --> 00:58:33,900 And then you go back to thinking. 1061 00:58:36,900 --> 00:58:38,435 I guess they have no other bodily functions. 1062 00:58:41,990 --> 00:58:45,660 So the problem is, one day they all pick up their left 1063 00:58:45,660 --> 00:58:46,970 chopsticks simultaneously. 1064 00:58:50,200 --> 00:58:52,080 Now they go to look for their right chopstick. 1065 00:58:52,080 --> 00:58:53,800 It's not there. 1066 00:58:53,800 --> 00:58:55,050 So what happens? 1067 00:58:57,150 --> 00:59:04,520 They starve because their code doesn't let them release-- 1068 00:59:04,520 --> 00:59:07,780 there's no preemption, so they can't release the chopstick 1069 00:59:07,780 --> 00:59:09,030 they've already got. 1070 00:59:11,510 --> 00:59:13,090 And we have a circular waiting. 1071 00:59:13,090 --> 00:59:14,130 They have mutual exclusion. 1072 00:59:14,130 --> 00:59:16,570 Only one of them can have a chopstick at a time. 1073 00:59:16,570 --> 00:59:19,350 And we have a circular waiting thing, because everyone is 1074 00:59:19,350 --> 00:59:24,320 waiting for the philosopher on the right. 1075 00:59:24,320 --> 00:59:25,510 Is that clear to everybody? 
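Here is a sketch of philosopher i's loop body with the bad day forced, in Python with one `threading.Lock` per chopstick and left = i, right = (i + 1) mod n as in the lecture. The barriers are an assumption added for the demo: they make every philosopher pick up the left chopstick before anyone reaches for the right, and a timeout replaces waiting forever. The function name is hypothetical.

```python
import threading
from typing import List


def everyone_grabs_left_first(n: int = 5, timeout: float = 0.2) -> List[bool]:
    """Return, per philosopher, whether the right chopstick could be acquired."""
    chopsticks = [threading.Lock() for _ in range(n)]
    all_hold_left = threading.Barrier(n)
    all_tried_right = threading.Barrier(n)
    got_right = [False] * n

    def philosopher(i: int) -> None:
        left, right = chopsticks[i], chopsticks[(i + 1) % n]
        with left:
            all_hold_left.wait()        # everyone now holds exactly one chopstick
            got_right[i] = right.acquire(timeout=timeout)
            if got_right[i]:
                right.release()
            all_tried_right.wait()      # no preemption: keep the left one until all have tried

    threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_right
```

Every entry comes back `False`: each right chopstick is some neighbor's left chopstick, so with blocking locks all n philosophers would starve.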
1076 00:59:25,510 --> 00:59:27,470 That's the dining philosophers problem. 1077 00:59:27,470 --> 00:59:28,745 How do you fix this problem? 1078 00:59:31,420 --> 00:59:34,225 What are solutions to fixing this problem? 1079 00:59:39,390 --> 00:59:41,655 The problem being that you'd like them to eat indefinitely. 1080 00:59:41,655 --> 00:59:45,401 AUDIENCE: You can index the chopstick and say that 1081 00:59:45,401 --> 00:59:46,850 [INAUDIBLE]. 1082 00:59:46,850 --> 00:59:50,000 PROFESSOR: Yeah, you can pick the smaller index first. 1083 00:59:50,000 --> 00:59:52,330 So in general, that means everybody would grab the one 1084 00:59:52,330 --> 00:59:54,630 on their left, then the one on their right, except for the 1085 00:59:54,630 --> 01:00:00,660 guy who's going between 0 and n minus 1. 1086 01:00:00,660 --> 01:00:04,850 They would do n minus 1 and then 0. 1087 01:00:04,850 --> 01:00:08,260 They would do n minus 1 first, and then 0. 1088 01:00:08,260 --> 01:00:08,780 Sorry. 1089 01:00:08,780 --> 01:00:11,160 They would do 0 first, and then n minus 1. 1090 01:00:11,160 --> 01:00:12,160 [INAUDIBLE] 1091 01:00:12,160 --> 01:00:14,360 Let me say that more precisely. 1092 01:00:14,360 --> 01:00:18,270 So this is a classic way to prevent deadlock. 1093 01:00:18,270 --> 01:00:22,510 Suppose that we can linearly order the mutexes in some 1094 01:00:22,510 --> 01:00:26,430 order so that whenever a thread holds a mutex Li 1095 01:00:26,430 --> 01:00:32,360 and attempts to lock another mutex Lj, we have it that Li 1096 01:00:32,360 --> 01:00:35,090 goes before Lj in the ordering. 1097 01:00:35,090 --> 01:00:36,500 Then no deadlock can occur.
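The lock-ordering rule applies directly to the bank-transfer example from earlier. A minimal sketch, assuming each account carries a unique integer `account_id` that defines the global order (the `Account` class and `transfer` function are hypothetical): no matter which direction the money moves, the lower-numbered account's lock is taken first.

```python
import threading


class Account:
    def __init__(self, account_id: int, balance: int) -> None:
        self.account_id = account_id    # position in the global lock order
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move `amount` from src to dst, always locking in increasing account_id order."""
    if src is dst:
        return                          # nothing to move; a plain Lock is not reentrant
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount
```

Two threads calling `transfer(a, b, 1)` and `transfer(b, a, 1)` concurrently both lock the account with the smaller id first, so the opposite-order pattern from the slide cannot arise.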
1098 01:00:39,890 --> 01:00:44,920 So always grab the resource so if they can all order the 1099 01:00:44,920 --> 01:00:45,530 resources-- 1100 01:00:45,530 --> 01:00:50,320 so they're always grabbing them in some subsequence of 1101 01:00:50,320 --> 01:00:52,980 this order, so they're always grabbing one that's larger and 1102 01:00:52,980 --> 01:00:56,860 larger and larger, and you're never going back and grabbing 1103 01:00:56,860 --> 01:00:58,960 one smaller, then you have no deadlock. 1104 01:00:58,960 --> 01:01:00,730 Here's why. 1105 01:01:00,730 --> 01:01:03,010 Suppose you have a cycle of waiting. 1106 01:01:03,010 --> 01:01:05,160 A deadlock has occurred. 1107 01:01:05,160 --> 01:01:07,530 Let's look at the thread in the cycle that holds the 1108 01:01:07,530 --> 01:01:10,710 largest mutex, called Lmax, in the ordering. 1109 01:01:10,710 --> 01:01:13,090 So whatever is in the ordering. 1110 01:01:13,090 --> 01:01:16,380 And suppose that it's waiting on a mutex L held by the next 1111 01:01:16,380 --> 01:01:17,350 thread in the cycle. 1112 01:01:17,350 --> 01:01:18,640 That's the condition. 1113 01:01:18,640 --> 01:01:23,270 Well, then it must be that Lmax falls before L, because 1114 01:01:23,270 --> 01:01:26,930 we're gathering them always in an increasing order. 1115 01:01:26,930 --> 01:01:32,210 But that contradicts the fact that Lmax is the largest. 1116 01:01:32,210 --> 01:01:33,810 So a deadlock cannot occur. 1117 01:01:36,930 --> 01:01:38,180 Questions? 1118 01:01:45,430 --> 01:01:46,690 Is this clear? 1119 01:01:46,690 --> 01:01:48,910 Who's seen this before? 1120 01:01:48,910 --> 01:01:49,780 A few people. 1121 01:01:49,780 --> 01:01:51,030 OK. 1122 01:01:53,030 --> 01:01:53,820 Is this clear? 1123 01:01:53,820 --> 01:01:56,900 So if you grab them in increasing order, then there's 1124 01:01:56,900 --> 01:02:00,110 always some guy that has the largest one, and nobody is 1125 01:02:00,110 --> 01:02:01,490 holding one larger.
1126 01:02:01,490 --> 01:02:05,770 So he can always grab the next one. 1127 01:02:05,770 --> 01:02:14,110 So in this case of the dining philosophers, what we can do 1128 01:02:14,110 --> 01:02:22,840 is grab the minimum of i and i plus 1 mod n and then the 1129 01:02:22,840 --> 01:02:25,120 maximum of i and i plus 1 mod n. 1130 01:02:25,120 --> 01:02:28,890 That gives us the same two chopsticks. 1131 01:02:28,890 --> 01:02:32,090 And in fact, for most of the philosophers, it's exactly the 1132 01:02:32,090 --> 01:02:32,770 same order. 1133 01:02:32,770 --> 01:02:36,830 But for one guy, it's a different order. 1134 01:02:36,830 --> 01:02:42,030 It ends up being the guy who would normally have done n 1135 01:02:42,030 --> 01:02:43,060 minus 1 and 0. 1136 01:02:43,060 --> 01:02:44,840 Instead, he does 0, n minus 1. 1137 01:02:44,840 --> 01:02:47,350 So in some sense, it's like having a left-handed 1138 01:02:47,350 --> 01:02:49,300 person at the table. 1139 01:02:49,300 --> 01:02:52,320 You grab your left, then your right, except for one guy does 1140 01:02:52,320 --> 01:02:54,030 right and then left. 1141 01:02:54,030 --> 01:02:56,290 And that fixes it, OK? 1142 01:02:56,290 --> 01:02:57,540 That fixes it. 1143 01:03:01,060 --> 01:03:01,310 Good. 1144 01:03:01,310 --> 01:03:03,865 So that's basically the dining philosophers problem. 1145 01:03:03,865 --> 01:03:04,880 That's one way of fixing it. 1146 01:03:04,880 --> 01:03:07,150 There are actually other ways of doing it. 1147 01:03:07,150 --> 01:03:09,410 One of the problems with this particular solution is you 1148 01:03:09,410 --> 01:03:11,030 still can have a long chain of waiting. 
1149 01:03:13,710 --> 01:03:16,100 So there are other schemes that you can use where, for 1150 01:03:16,100 --> 01:03:20,700 example, if every other one grabs left and then right and 1151 01:03:20,700 --> 01:03:23,100 then right and then left and then left and then right and 1152 01:03:23,100 --> 01:03:28,950 then right and left and so forth, you can end up making 1153 01:03:28,950 --> 01:03:31,510 it so that nobody has to wait to go all the 1154 01:03:31,510 --> 01:03:32,510 way around the circle. 1155 01:03:32,510 --> 01:03:33,278 Yeah? 1156 01:03:33,278 --> 01:03:34,528 AUDIENCE: [INAUDIBLE] 1157 01:03:37,400 --> 01:03:40,580 PROFESSOR: Well, that would be a preemption type of thing, 1158 01:03:40,580 --> 01:03:42,730 where I grab one, and if I didn't get it in time, I 1159 01:03:42,730 --> 01:03:45,320 release it and then try again. 1160 01:03:45,320 --> 01:03:48,120 When you have something like that, there's an issue. 1161 01:03:48,120 --> 01:03:51,570 It's, how do you set the timeout amount? 1162 01:03:51,570 --> 01:03:53,980 And the second issue that you get into when you do timeouts 1163 01:03:53,980 --> 01:03:57,990 is, how do you know you don't then repeat exactly the same 1164 01:03:57,990 --> 01:04:00,810 thing and convert a deadlock situation 1165 01:04:00,810 --> 01:04:03,050 into a livelock situation? 1166 01:04:03,050 --> 01:04:05,030 So a livelock situation is where they're not making 1167 01:04:05,030 --> 01:04:07,480 progress, but they're all busily working, thinking 1168 01:04:07,480 --> 01:04:09,120 they're making progress. 1169 01:04:09,120 --> 01:04:10,370 So you time out. 1170 01:04:10,370 --> 01:04:12,320 Let's try again. 1171 01:04:12,320 --> 01:04:14,530 What makes you think that the guys that are deadlocking 1172 01:04:14,530 --> 01:04:16,490 aren't going to do exactly the same thing? 1173 01:04:16,490 --> 01:04:18,062 AUDIENCE: [INAUDIBLE] 1174 01:04:18,062 --> 01:04:19,680 PROFESSOR: Yeah, exactly.
1175 01:04:19,680 --> 01:04:22,660 And in fact, that's actually a workable scheme. 1176 01:04:22,660 --> 01:04:23,850 And there are schemes that do it. 1177 01:04:23,850 --> 01:04:27,240 Now, that's much more complicated. 1178 01:04:27,240 --> 01:04:30,470 It sometimes has more overhead, especially because things 1179 01:04:30,470 --> 01:04:31,440 become available, 1180 01:04:31,440 --> 01:04:35,360 and it's like, no, you're busy waiting some random amount of 1181 01:04:35,360 --> 01:04:38,020 time before you try again. 1182 01:04:38,020 --> 01:04:40,590 So this is, by the way, the protocol that is used on the 1183 01:04:40,590 --> 01:04:45,630 Ethernet for doing contention resolution. 1184 01:04:45,630 --> 01:04:49,100 It's what's called "exponential backoff." And 1185 01:04:49,100 --> 01:04:55,190 various backoff schemes are used in order to allow 1186 01:04:55,190 --> 01:04:58,810 multiple things to acquire mutually-exclusive access to 1187 01:04:58,810 --> 01:05:02,960 something without having to have a definite ordering. 1188 01:05:02,960 --> 01:05:05,820 So there are solutions, but they definitely get more 1189 01:05:05,820 --> 01:05:07,230 heavyweight. 1190 01:05:07,230 --> 01:05:09,080 It's not lightweight. 1191 01:05:09,080 --> 01:05:11,890 Whereas if you can prevent deadlock, that's really good, 1192 01:05:11,890 --> 01:05:16,900 because you just simply do the natural thing. 1193 01:05:16,900 --> 01:05:19,500 And that tends to be pretty quick. 1194 01:05:19,500 --> 01:05:24,950 But yeah, all I'm doing is sort of covering the 1195 01:05:24,950 --> 01:05:26,240 introduction to all these things. 1196 01:05:26,240 --> 01:05:30,150 There are books written on this type of subject. 1197 01:05:30,150 --> 01:05:33,620 Any other questions about dining philosophers and 1198 01:05:33,620 --> 01:05:36,630 deadlock and so forth? 1199 01:05:36,630 --> 01:05:38,005 Now let me tell you how to deadlock Cilk++.
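The timeout-and-retry idea with randomized exponential backoff can be sketched as follows. This is one possible shape, not the Ethernet protocol itself: a thread takes its first lock, tries the second without blocking, and if that fails it gives the first lock back (preempting itself) and sleeps for a random time drawn from a window that doubles after each failure. The function name and the `base_delay` and `max_delay` values are hypothetical tuning choices.

```python
import random
import threading
import time


def acquire_both_with_backoff(first: threading.Lock, second: threading.Lock,
                              base_delay: float = 1e-5, max_delay: float = 1e-2) -> None:
    """Acquire both locks without ever blocking while holding one; caller releases both."""
    window = base_delay
    while True:
        first.acquire()                         # we hold nothing while blocking here
        if second.acquire(blocking=False):
            return
        first.release()                         # give back what we hold instead of waiting
        time.sleep(random.uniform(0, window))   # random wait so rivals don't retry in lockstep
        window = min(2 * window, max_delay)     # exponential backoff
```

Randomization makes livelock unlikely rather than impossible, and a thread may still be sleeping after the lock it wants has been freed, which is the extra overhead the lecture points out.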
1200 01:05:42,680 --> 01:05:45,880 So here's some code that will deadlock 1201 01:05:45,880 --> 01:05:48,240 Cilk++, or at least has the potential to. 1202 01:05:48,240 --> 01:05:50,060 You might run it a bunch of times, and it looks fine. 1203 01:05:53,070 --> 01:05:56,330 Here's what we've done: the main routine spawns foo. 1204 01:05:56,330 --> 01:05:57,210 Here's foo down here. 1205 01:05:57,210 --> 01:06:01,020 All foo does is grab a lock and then unlock it. 1206 01:06:01,020 --> 01:06:02,500 Empty critical section. 1207 01:06:02,500 --> 01:06:03,520 It could do something in there. 1208 01:06:03,520 --> 01:06:05,950 It doesn't matter. 1209 01:06:05,950 --> 01:06:09,460 Then the main grabs a lock, does a cilk_sync and then 1210 01:06:09,460 --> 01:06:12,270 unlocks it. 1211 01:06:12,270 --> 01:06:15,420 So what can go wrong here? 1212 01:06:15,420 --> 01:06:19,455 Notice, by the way, this is only one lock, L. 1213 01:06:19,455 --> 01:06:22,730 There aren't two locks. 1214 01:06:22,730 --> 01:06:26,960 So you can deadlock Cilk by just introducing one lock. 1215 01:06:26,960 --> 01:06:28,890 So here's sort of what's going on. 1216 01:06:28,890 --> 01:06:33,140 Let's let this be the main thread and this be foo. 1217 01:06:33,140 --> 01:06:35,780 And this will represent a lock acquire, and 1218 01:06:35,780 --> 01:06:37,390 this is a lock release. 1219 01:06:37,390 --> 01:06:39,910 So what happens is we perform the lock 1220 01:06:39,910 --> 01:06:43,970 acquire here in the parent. 1221 01:06:43,970 --> 01:06:48,090 First, we spawned here, then we acquire the lock here. 1222 01:06:48,090 --> 01:06:52,300 And now foo tries to get access to the lock, and it 1223 01:06:52,300 --> 01:06:55,480 can't because why? 1224 01:06:55,480 --> 01:06:58,570 The main routine has the lock. 1225 01:06:58,570 --> 01:07:01,000 Now what happens? 1226 01:07:01,000 --> 01:07:03,750 The main routine proceeds to the sync, and what does it do 1227 01:07:03,750 --> 01:07:05,000 at the sync?
1228 01:07:06,870 --> 01:07:09,960 It waits for all children to be done. 1229 01:07:12,520 --> 01:07:15,420 And notice now we've created a cycle of waiting, even though 1230 01:07:15,420 --> 01:07:17,630 we used only one lock. 1231 01:07:17,630 --> 01:07:20,580 Main waits, but foo is never going to complete, because 1232 01:07:20,580 --> 01:07:24,170 it's waiting for the main thread to release it, the main 1233 01:07:24,170 --> 01:07:26,722 strand here to release it, the main function here. 1234 01:07:26,722 --> 01:07:28,850 Is that clear? 1235 01:07:28,850 --> 01:07:33,830 So you can deadlock Cilk too by doing non-deterministic 1236 01:07:33,830 --> 01:07:34,690 programming. 1237 01:07:34,690 --> 01:07:40,660 So here's the methodology that will help you not do that. 1238 01:07:40,660 --> 01:07:42,060 So what's bad here? 1239 01:07:42,060 --> 01:07:46,430 What's bad is holding the lock across the sync. 1240 01:07:46,430 --> 01:07:47,540 That's bad. 1241 01:07:47,540 --> 01:07:49,940 So don't do that. 1242 01:07:49,940 --> 01:07:52,880 Doctor, my head hurts. 1243 01:07:52,880 --> 01:07:54,130 Well, stop hitting it. 1244 01:07:59,120 --> 01:08:03,160 So don't hold mutexes across Cilk syncs. 1245 01:08:03,160 --> 01:08:06,290 Hold mutexes only within strands, only within 1246 01:08:06,290 --> 01:08:08,802 serially-executing pieces of code. 1247 01:08:08,802 --> 01:08:13,780 Now, it turns out that you can hold it across syncs and so 1248 01:08:13,780 --> 01:08:15,880 forth, but you have to be careful. 1249 01:08:15,880 --> 01:08:19,390 And I'm not going to get into the details of 1250 01:08:19,390 --> 01:08:20,170 how you can do that. 1251 01:08:20,170 --> 01:08:23,770 If you want to figure that out on your own, that's fine. 1252 01:08:23,770 --> 01:08:25,720 And then you're welcome to try to do that 1253 01:08:25,720 --> 01:08:28,319 without deadlocking something.
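The same cycle shows up with ordinary threads, where `start()` plays the role of `cilk_spawn` and `join()` the role of `cilk_sync`. This is only an analogy (Python threads are not Cilk strands), and the function name is hypothetical. An `Event` forces the bad interleaving from the slide, and a timeout stands in for blocking forever, so the sketch reports whether `foo` ever got the lock.

```python
import threading


def spawn_lock_sync(hold_lock_across_sync: bool, timeout: float = 0.2) -> bool:
    """Return True if the spawned child `foo` managed to acquire the lock L."""
    L = threading.Lock()
    main_has_lock = threading.Event()
    result = {}

    def foo() -> None:
        main_has_lock.wait()              # force the interleaving: parent locks L first
        ok = L.acquire(timeout=timeout)   # a plain blocking acquire would wait forever
        if ok:
            L.release()
        result["foo_got_lock"] = ok

    child = threading.Thread(target=foo)
    child.start()                         # analogous to cilk_spawn foo()
    L.acquire()
    main_has_lock.set()
    if hold_lock_across_sync:
        child.join()                      # "sync" while holding L: foo can't finish
        L.release()
    else:
        L.release()                       # release inside the strand ...
        child.join()                      # ... and only then sync
    return result["foo_got_lock"]
```

With `hold_lock_across_sync=True` the child times out (a real deadlock without the timeout); releasing before the join lets it finish.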
1254 01:08:28,319 --> 01:08:32,380 Turns out, basically, if you grab the lock before you do 1255 01:08:32,380 --> 01:08:35,259 any spawns, and then release it after the Cilk 1256 01:08:35,259 --> 01:08:36,509 sync, you're OK. 1257 01:08:40,020 --> 01:08:41,382 You're generally, in that case, OK. 1258 01:08:47,710 --> 01:08:50,920 So as always, try to avoid using mutexes, but that's not 1259 01:08:50,920 --> 01:08:52,350 always possible. 1260 01:08:52,350 --> 01:08:54,260 In other words, try to do deterministic programming. 1261 01:08:54,260 --> 01:08:56,790 That helps too. 1262 01:08:56,790 --> 01:09:01,720 And on your homework, you had an example of where it is that 1263 01:09:01,720 --> 01:09:08,189 deterministic programming can actually do a pretty good job. 1264 01:09:08,189 --> 01:09:11,350 The next anomaly I want to talk about is convoying. 1265 01:09:11,350 --> 01:09:13,620 Once again, another thing that can happen. 1266 01:09:13,620 --> 01:09:17,710 This one is actually quite an embarrassment, because the 1267 01:09:17,710 --> 01:09:22,760 original MIT Cilk system that we built had this bug in it. 1268 01:09:22,760 --> 01:09:24,970 So we had this bug. 1269 01:09:24,970 --> 01:09:26,590 So let me show you what it is. 1270 01:09:26,590 --> 01:09:28,600 So here's the idea. 1271 01:09:28,600 --> 01:09:31,520 We're using random work-stealing where each thief 1272 01:09:31,520 --> 01:09:33,529 grabs a mutex on its victim's deque. 1273 01:09:33,529 --> 01:09:38,330 So in order to steal from a victim, it grabs a mutex on 1274 01:09:38,330 --> 01:09:39,420 the victim. 1275 01:09:39,420 --> 01:09:42,600 And now, once it's got the mutex, it now is in a position 1276 01:09:42,600 --> 01:09:46,960 to migrate the work that's on that victim to 1277 01:09:46,960 --> 01:09:48,200 actually steal the work. 1278 01:09:48,200 --> 01:09:49,640 And you want to do that atomically.
1279 01:09:49,640 --> 01:09:51,800 You don't want two guys getting in there trying to 1280 01:09:51,800 --> 01:09:54,220 steal from each other. 1281 01:09:54,220 --> 01:09:58,150 So if the victim's deque is empty, the thief releases the 1282 01:09:58,150 --> 01:09:59,770 mutex and tries again at random. 1283 01:09:59,770 --> 01:10:01,180 That makes sense. 1284 01:10:01,180 --> 01:10:04,070 If there's nothing there to be stolen, then just release the 1285 01:10:04,070 --> 01:10:06,200 mutex and move on. 1286 01:10:06,200 --> 01:10:08,960 If the victim's deque contains work, the thief then steals 1287 01:10:08,960 --> 01:10:10,750 the topmost frame and then releases the mutex. 1288 01:10:13,400 --> 01:10:14,890 Where's the performance bug here? 1289 01:10:19,760 --> 01:10:21,708 AUDIENCE: [INAUDIBLE] 1290 01:10:21,708 --> 01:10:24,143 trying to steal from each other. 1291 01:10:24,143 --> 01:10:27,065 Like A steals from B, B steals from C, C steals from D, and 1292 01:10:27,065 --> 01:10:29,510 they all have locks on each other, and then-- 1293 01:10:29,510 --> 01:10:31,810 PROFESSOR: No, because in that case, they'll each grab the 1294 01:10:31,810 --> 01:10:34,440 deque from each other, discover it's empty, and release it. 1295 01:10:38,680 --> 01:10:39,720 OK, let me show the bug. 1296 01:10:39,720 --> 01:10:41,520 It is very subtle. 1297 01:10:41,520 --> 01:10:45,810 As I say, we didn't realize we had this bug until we noticed 1298 01:10:45,810 --> 01:10:47,650 some codes on which we weren't getting the 1299 01:10:47,650 --> 01:10:48,860 speedups we were expecting. 1300 01:10:48,860 --> 01:10:51,830 Let me show you where this bug comes from. 1301 01:10:51,830 --> 01:10:53,130 Here's the problem. 1302 01:10:53,130 --> 01:10:56,850 At the startup, most thieves will quickly converge on the 1303 01:10:56,850 --> 01:10:58,820 worker P0 containing the initial 1304 01:10:58,820 --> 01:11:02,720 strand, creating a convoy.
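The steal protocol just described might look like this in a simplified Python sketch. The `Worker` class, the choice of the deque's left end as its "top", and the function names are assumptions for illustration. Note that `try_steal` blocks on the victim's mutex; that blocking acquire is where thieves can pile up behind one another.

```python
import random
import threading
from collections import deque
from typing import Any, Iterable, List, Optional


class Worker:
    def __init__(self, frames: Iterable[Any] = ()) -> None:
        self.frames = deque(frames)   # left end = top (oldest frame), the end thieves take from
        self.mutex = threading.Lock()


def try_steal(victim: Worker) -> Optional[Any]:
    """Lock the victim, take its topmost frame if there is one, then unlock."""
    with victim.mutex:                # blocking acquire: a thief waits here if another is inside
        if not victim.frames:
            return None               # empty deque: release the mutex and let the caller retry
        return victim.frames.popleft()


def steal_until_success(me: Worker, workers: List[Worker]) -> Any:
    """Pick victims uniformly at random until a steal succeeds (frames assumed non-None)."""
    while True:
        victim = random.choice(workers)
        if victim is me:
            continue
        frame = try_steal(victim)
        if frame is not None:
            return frame
```

`steal_until_success` keeps trying indefinitely when no worker has work, as an idle thief in a running system would.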
1305 01:11:02,720 --> 01:11:05,390 So let me show you how that happens. 1306 01:11:05,390 --> 01:11:09,450 So here we have the startup of our Cilk system where one guy 1307 01:11:09,450 --> 01:11:12,700 has work, and all these are workers that 1308 01:11:12,700 --> 01:11:15,410 have no work to do. 1309 01:11:15,410 --> 01:11:16,490 So what happens? 1310 01:11:16,490 --> 01:11:19,200 They all try to steal at random. 1311 01:11:19,200 --> 01:11:23,250 In this case, we have this guy tries to steal from this 1312 01:11:23,250 --> 01:11:25,330 fellow, this guy tries to steal from 1313 01:11:25,330 --> 01:11:26,930 this fellow, et cetera. 1314 01:11:26,930 --> 01:11:33,260 So of these, this guy, this guy, and that guy all are 1315 01:11:33,260 --> 01:11:35,850 going to discover there's nothing there to be stolen, 1316 01:11:35,850 --> 01:11:38,270 and they're going to repeat the process. 1317 01:11:38,270 --> 01:11:41,010 This guy and this guy, there's going to be some arbitration. 1318 01:11:41,010 --> 01:11:43,530 And one of them is going to get the lock. 1319 01:11:43,530 --> 01:11:46,540 Let's assume it's this one here. 1320 01:11:46,540 --> 01:11:48,900 So what happens is, this guy gets the lock. 1321 01:11:48,900 --> 01:11:50,150 What does this guy do? 1322 01:11:52,440 --> 01:11:53,690 He's going to wait. 1323 01:11:55,970 --> 01:11:57,500 Because he's trying to acquire the lock. 1324 01:11:57,500 --> 01:12:00,610 He can't acquire the lock, so he waits. 1325 01:12:00,610 --> 01:12:02,350 So then what happens? 1326 01:12:02,350 --> 01:12:07,470 This guy now wants to steal the work from this fellow. 1327 01:12:07,470 --> 01:12:10,980 So he steals a little bit of work. 1328 01:12:10,980 --> 01:12:13,510 Then these guys now, what do they do? 1329 01:12:13,510 --> 01:12:16,550 They try again. 
1330 01:12:16,550 --> 01:12:19,450 So this guy tries to steal from there, this guy tries to 1331 01:12:19,450 --> 01:12:22,100 steal from there, this one happens to try to steal there. 1332 01:12:22,100 --> 01:12:25,510 This one sees there's work there to be done, so 1333 01:12:25,510 --> 01:12:27,320 what does it do? 1334 01:12:27,320 --> 01:12:29,200 It waits. 1335 01:12:29,200 --> 01:12:32,590 But these guys then try again. 1336 01:12:32,590 --> 01:12:36,675 Maybe a little bit more stuff is moved. 1337 01:12:36,675 --> 01:12:38,490 They try again. 1338 01:12:38,490 --> 01:12:40,550 A little bit more stuff. 1339 01:12:40,550 --> 01:12:42,020 They try again. 1340 01:12:42,020 --> 01:12:45,870 But every time one tries and gets stuck on P0 while we're 1341 01:12:45,870 --> 01:12:51,540 doing that whole transfer, they all end up getting 1342 01:12:51,540 --> 01:12:54,980 stuck waiting for this guy to finish. 1343 01:12:54,980 --> 01:12:58,300 And now, we've got work over here, but how many guys are 1344 01:12:58,300 --> 01:13:00,580 going to be trying to steal from this guy? 1345 01:13:00,580 --> 01:13:02,800 None. 1346 01:13:02,800 --> 01:13:05,140 They're all going to be trying to steal from this one, 1347 01:13:05,140 --> 01:13:07,870 because they all have done a lock acquisition, and they're 1348 01:13:07,870 --> 01:13:10,520 sitting there waiting. 1349 01:13:10,520 --> 01:13:14,300 So this is called convoying, where they all pile up on one 1350 01:13:14,300 --> 01:13:16,090 thing, and now they have to resolve that convoy. 1351 01:13:16,090 --> 01:13:18,530 So this was a bug in startup. 1352 01:13:18,530 --> 01:13:22,540 Why wasn't Cilk starting up fast? 1353 01:13:22,540 --> 01:13:26,680 Initially, we just thought, oh, there's system kinds of 1354 01:13:26,680 --> 01:13:28,870 things going on there.
1355 01:13:28,870 --> 01:13:30,770 So the work now gets distributed very slowly, 1356 01:13:30,770 --> 01:13:33,340 because each one is going to serially try to get this work, 1357 01:13:33,340 --> 01:13:34,650 and they're not going to try to get the 1358 01:13:34,650 --> 01:13:36,460 work from each other. 1359 01:13:36,460 --> 01:13:41,040 What you want is that on the second phase, half the guys 1360 01:13:41,040 --> 01:13:43,100 might start hitting this one. 1361 01:13:43,100 --> 01:13:47,170 So you get some kind of exponential distribution of 1362 01:13:47,170 --> 01:13:49,590 the work in kind of a tree fashion. 1363 01:13:49,590 --> 01:13:52,080 And that's what theory says would happen. 1364 01:13:52,080 --> 01:13:55,420 But the theory is usually done without worrying about what 1365 01:13:55,420 --> 01:13:58,810 happens in the implementation of the lock. 1366 01:13:58,810 --> 01:14:01,547 What's the fix for this? 1367 01:14:01,547 --> 01:14:03,044 Yeah? 1368 01:14:03,044 --> 01:14:06,038 AUDIENCE: Can you just basically shove-- when you're 1369 01:14:06,038 --> 01:14:09,032 transferring, you should also say, I have work, so that 1370 01:14:09,032 --> 01:14:13,024 people [INAUDIBLE] waiting for that guy to [INAUDIBLE]. 1371 01:14:13,024 --> 01:14:15,790 PROFESSOR: You could do that, but in the meantime, it could 1372 01:14:15,790 --> 01:14:19,540 be that the attempt to steal goes so much faster than the 1373 01:14:19,540 --> 01:14:22,000 actual getting of the work, you're still going to get half 1374 01:14:22,000 --> 01:14:24,980 the guys locked up on this one. 1375 01:14:24,980 --> 01:14:27,160 And the other half might be locked up on this one. 1376 01:14:30,890 --> 01:14:32,640 Good idea. 1377 01:14:32,640 --> 01:14:35,584 What other things can we do? 
1378 01:14:35,584 --> 01:14:37,428 AUDIENCE: Can you check how many people 1379 01:14:37,428 --> 01:14:38,350 are waiting on the-- 1380 01:14:38,350 --> 01:14:41,110 PROFESSOR: Yeah, so the idea is we don't want 1381 01:14:41,110 --> 01:14:43,215 to use a lock operation. 1382 01:14:46,350 --> 01:14:48,280 So here's the idea. 1383 01:14:48,280 --> 01:14:51,370 We use a non-blocking function that's usually called 1384 01:14:51,370 --> 01:14:55,660 "try_lock," rather than "lock." try_lock attempts to 1385 01:14:55,660 --> 01:14:57,360 acquire the mutex. 1386 01:14:57,360 --> 01:14:59,830 If it succeeds, great. 1387 01:14:59,830 --> 01:15:00,770 It's got it. 1388 01:15:00,770 --> 01:15:03,800 If it fails, it doesn't spin. 1389 01:15:03,800 --> 01:15:05,435 It simply returns and says, I failed. 1390 01:15:08,290 --> 01:15:10,350 It doesn't spin or yield or anything. 1391 01:15:10,350 --> 01:15:12,550 It just says, oh, I failed, and tells 1392 01:15:12,550 --> 01:15:15,750 that back to the user. 1393 01:15:15,750 --> 01:15:18,710 But it doesn't attempt to block. 1394 01:15:18,710 --> 01:15:23,610 So with try_lock now, what can these other processors do? 1395 01:15:23,610 --> 01:15:24,940 They do a try_lock-- 1396 01:15:24,940 --> 01:15:25,440 yeah? 1397 01:15:25,440 --> 01:15:27,680 AUDIENCE: [INAUDIBLE] 1398 01:15:27,680 --> 01:15:30,190 PROFESSOR: Exactly. 1399 01:15:30,190 --> 01:15:34,700 Instead of waiting there on the guy that they fail on, 1400 01:15:34,700 --> 01:15:37,390 they pick another random one to steal from. 1401 01:15:39,950 --> 01:15:42,030 So they'll just continually try to get it. 1402 01:15:42,030 --> 01:15:44,150 If they get it, then they can do their operation. 1403 01:15:44,150 --> 01:15:50,110 If they don't get it, they just look elsewhere for work. 1404 01:15:50,110 --> 01:15:51,020 So that's what it does.
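The try_lock fix can be sketched the same way, again with a hypothetical `Worker` type. `std::mutex::try_lock` is the standard C++ non-blocking acquire; the `max_attempts` cap and the use of `std::mt19937` to choose victims are assumptions added so the sketch terminates and can be tested, not details of the Cilk runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <random>
#include <vector>

// Hypothetical worker: a mutex-protected deque of stealable frames
// (ints stand in for frames; front() is treated as the top).
struct Worker {
    std::mutex m;
    std::deque<int> frames;
};

// One non-blocking steal attempt. If another thief already holds the
// victim's mutex, try_lock() fails and we return at once instead of
// queueing up behind it.
std::optional<int> try_steal(Worker& victim) {
    if (!victim.m.try_lock()) {
        return std::nullopt;  // busy: look elsewhere
    }
    std::lock_guard<std::mutex> guard(victim.m, std::adopt_lock);
    if (victim.frames.empty()) {
        return std::nullopt;  // nothing to steal
    }
    int frame = victim.frames.front();
    victim.frames.pop_front();
    return frame;
}

// Thief loop: on any failure, pick another random victim rather than
// blocking. Assumes workers.size() >= 2; max_attempts is a hypothetical
// cap so that this sketch terminates.
std::optional<int> steal_randomly(std::vector<Worker>& workers, std::size_t self,
                                  std::mt19937& rng, int max_attempts) {
    std::uniform_int_distribution<std::size_t> pick(0, workers.size() - 1);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        std::size_t victim = pick(rng);
        if (victim == self) continue;  // never steal from ourselves
        if (auto frame = try_steal(workers[victim])) {
            return frame;
        }
    }
    return std::nullopt;
}
```

A real scheduler would keep looping until it finds work or the computation ends; the cap exists only to make the sketch finite.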
1405 01:15:51,020 --> 01:15:52,330 It just tries to steal again at 1406 01:15:52,330 --> 01:15:53,580 random, rather than blocking. 1407 01:15:57,820 --> 01:16:00,845 And that gets rid of this convoying problem. 1408 01:16:04,090 --> 01:16:09,250 As I say, dangerous programming, because we didn't 1409 01:16:09,250 --> 01:16:10,800 even know we had a problem. 1410 01:16:10,800 --> 01:16:12,930 Just our code was slower than it could have been. 1411 01:16:16,740 --> 01:16:18,130 Questions about convoying? 1412 01:16:24,210 --> 01:16:27,230 So try_lock is actually a very convenient thing to use. 1413 01:16:27,230 --> 01:16:31,150 So in many cases, you may find that, hey, rather than waiting 1414 01:16:31,150 --> 01:16:33,860 on something with nothing to do, let me go see if there's 1415 01:16:33,860 --> 01:16:37,450 something else I can do in the meantime. 1416 01:16:40,320 --> 01:16:41,570 Contention. 1417 01:16:43,760 --> 01:16:51,390 So here's an example of a code where I want to add up some 1418 01:16:51,390 --> 01:16:57,060 function of the elements of some array. 1419 01:16:57,060 --> 01:17:04,470 So here I've got a value of n, which is a million. 1420 01:17:04,470 --> 01:17:17,410 And I have a type X. So we have a compute function, which 1421 01:17:17,410 --> 01:17:20,680 takes a pointer to a-- 1422 01:17:20,680 --> 01:17:22,640 did I do this right? 1423 01:17:22,640 --> 01:17:31,150 To value V. So anyway, my C++ is not as good as my C, and 1424 01:17:31,150 --> 01:17:35,220 for those who don't know, my C isn't very good. 1425 01:17:35,220 --> 01:17:41,025 So anyway, we have an array of type X of n elements. 1426 01:17:43,730 --> 01:17:47,390 And what I do is I set result to be 0, and then I have a 1427 01:17:47,390 --> 01:17:52,090 loop here which basically goes and adds into result the 1428 01:17:52,090 --> 01:17:55,220 result of computing on each element of the array. 1429 01:17:55,220 --> 01:17:58,810 And then it outputs the result. 
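For concreteness, the serial loop might look like the following. The transcript does not show the lecture's actual type `X` or its `compute` function, so the placeholders below (an `int` element type and a doubling `compute`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholders for the lecture's element type and
// per-element function.
using X = int;
long long compute(const X& v) { return 2LL * v; }

// Serial version: apply compute() to every element and add the results.
long long serial_sum(const std::vector<X>& A) {
    long long result = 0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        result += compute(A[i]);
    }
    return result;
}
```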
1430 01:17:58,810 --> 01:18:00,940 Does everybody understand what's going on in the code? 1431 01:18:00,940 --> 01:18:03,230 It's basically compute on every element in the array, 1432 01:18:03,230 --> 01:18:06,270 take the result, add all those results together. 1433 01:18:06,270 --> 01:18:08,680 We want to parallelize this. 1434 01:18:08,680 --> 01:18:12,090 So let's parallelize that. 1435 01:18:12,090 --> 01:18:14,335 What looks like the best opportunity for parallelizing? 1436 01:18:18,970 --> 01:18:21,490 Yeah, we go after the for and make it be a cilk_for. 1437 01:18:21,490 --> 01:18:24,310 Let's add all those guys up. 1438 01:18:24,310 --> 01:18:28,490 And what's the problem with that? 1439 01:18:28,490 --> 01:18:29,580 We get a race. 1440 01:18:29,580 --> 01:18:30,590 What's the race on? 1441 01:18:30,590 --> 01:18:31,510 AUDIENCE: Result. 1442 01:18:31,510 --> 01:18:33,360 PROFESSOR: Result. 1443 01:18:33,360 --> 01:18:36,200 They're all updating result in parallel. 1444 01:18:36,200 --> 01:18:39,910 Oh, I know how to resolve a race. 1445 01:18:39,910 --> 01:18:43,800 Let's just put a lock around it. 1446 01:18:43,800 --> 01:18:45,050 So here we have the race. 1447 01:18:48,460 --> 01:18:50,630 First, let's analyze this. 1448 01:18:50,630 --> 01:18:53,940 So the work here is order n. 1449 01:18:53,940 --> 01:18:55,320 What is the span? 1450 01:18:55,320 --> 01:18:56,240 AUDIENCE: Log n. 1451 01:18:56,240 --> 01:19:00,000 PROFESSOR: Yeah, the span is log n for the control of the 1452 01:19:00,000 --> 01:19:04,080 stuff here, because this is all constant time. 1453 01:19:04,080 --> 01:19:11,760 So the running time here is order n over P plus log n. 1454 01:19:11,760 --> 01:19:13,560 If you remember the greedy scheduling, it's going to be 1455 01:19:13,560 --> 01:19:16,100 something like this, because this is the work 1456 01:19:16,100 --> 01:19:20,080 over P plus the span. 
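Written out in the lecture's notation, with work $T_1$ and span $T_\infty$, the greedy-scheduling estimate for the (racy) `cilk_for` version is:

```latex
T_P \le \frac{T_1}{P} + T_\infty, \qquad T_1 = \Theta(n), \quad T_\infty = \Theta(\lg n)
\;\Longrightarrow\; T_P = O\!\left(\frac{n}{P} + \lg n\right),
\qquad \frac{T_1}{T_\infty} = \Theta\!\left(\frac{n}{\lg n}\right).
```

So near-linear speedup is expected whenever $n/P$ dominates $\lg n$.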
1457 01:19:20,080 --> 01:19:25,610 So we expect that if n over P is big compared to log n, 1458 01:19:25,610 --> 01:19:28,060 we're going to do pretty well, because we have parallelism 1459 01:19:28,060 --> 01:19:29,310 n over log n. 1460 01:19:31,370 --> 01:19:33,330 So let's fix this bug. 1461 01:19:33,330 --> 01:19:38,030 So this is fast code, but it's incorrect code. 1462 01:19:38,030 --> 01:19:43,040 So let's fix it by getting rid of this race. 1463 01:19:43,040 --> 01:19:47,410 So what we'll do is we'll put a lock before the update. 1464 01:19:47,410 --> 01:19:52,045 We'll introduce a mutex L, and we'll lock L before we add to 1465 01:19:52,045 --> 01:19:55,400 the result, and then we'll unlock it. 1466 01:19:55,400 --> 01:20:00,120 So first of all, this is a bad way to do it, because what I 1467 01:20:00,120 --> 01:20:05,380 really should do is first compute the result for my array element 1468 01:20:05,380 --> 01:20:10,210 and then lock, add it to the result, and then unlock so 1469 01:20:10,210 --> 01:20:12,160 that we lessen the time that I'm holding the 1470 01:20:12,160 --> 01:20:14,200 lock in each iteration. 1471 01:20:14,200 --> 01:20:16,270 Nevertheless, this is still a lousy piece of code. 1472 01:20:16,270 --> 01:20:17,311 Why's that? 1473 01:20:17,311 --> 01:20:19,235 AUDIENCE: It's still serialized. 1474 01:20:19,235 --> 01:20:20,820 PROFESSOR: Yeah, it's serialized. 1475 01:20:20,820 --> 01:20:27,110 Every update to result here has to go on serially. 1476 01:20:27,110 --> 01:20:28,130 There are n accesses. 1477 01:20:28,130 --> 01:20:30,760 They're all going to go one at a time. 1478 01:20:30,760 --> 01:20:34,450 So my running time, instead of being n over P plus log n, is going 1479 01:20:34,450 --> 01:20:37,800 to be something like order n. 1480 01:20:43,490 --> 01:20:49,390 Believe me, I have seen many people write code where they 1481 01:20:49,390 --> 01:20:53,220 essentially do exactly this.
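A sketch of the mutex-protected version, using `std::thread` and `std::mutex` as stand-ins for `cilk_for` and the lecture's mutex L. That substitution is an assumption, since the lecture's code uses Cilk, and `X` and `compute` are the same hypothetical placeholders as before. The sketch already computes outside the critical section, yet every one of the n updates still funnels through L, so the updates are serialized.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical placeholders for the lecture's element type and function.
using X = int;
long long compute(const X& v) { return 2LL * v; }

// Correct but contended: each iteration acquires the single mutex L
// to add into the shared `result`. num_threads must be at least 1.
long long locked_sum(const std::vector<X>& A, unsigned num_threads) {
    long long result = 0;
    std::mutex L;
    std::vector<std::thread> threads;
    const std::size_t n = A.size();
    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t lo = n * t / num_threads;        // this thread's block
        std::size_t hi = n * (t + 1) / num_threads;
        threads.emplace_back([&, lo, hi] {
            for (std::size_t i = lo; i < hi; ++i) {
                long long temp = compute(A[i]);        // compute outside the lock
                std::lock_guard<std::mutex> guard(L);  // short critical section
                result += temp;                        // still one at a time
            }
        });
    }
    for (auto& th : threads) th.join();
    return result;
}
```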
1482 01:20:53,220 --> 01:20:56,340 They take something, they make it parallel, they have a race 1483 01:20:56,340 --> 01:20:57,760 bug, they fix it with a mutex. 1484 01:21:01,200 --> 01:21:05,240 Bad idea, because then we end up with 1485 01:21:05,240 --> 01:21:07,170 contention on this mutex. 1486 01:21:07,170 --> 01:21:08,990 What's the right way to parallelize this? 1487 01:21:17,900 --> 01:21:18,395 Yeah? 1488 01:21:18,395 --> 01:21:24,830 AUDIENCE: Maybe you could have each [INAUDIBLE] 1489 01:21:24,830 --> 01:21:29,780 have result as an array and have each [INAUDIBLE] 1490 01:21:29,780 --> 01:21:30,770 one place [INAUDIBLE]. 1491 01:21:30,770 --> 01:21:32,750 And then at the end, sum all the-- 1492 01:21:32,750 --> 01:21:34,970 PROFESSOR: But won't that be n elements to sum up? 1493 01:21:34,970 --> 01:21:37,455 AUDIENCE: [INAUDIBLE] 1494 01:21:37,455 --> 01:21:39,940 AUDIENCE: So basically, have, say, 1495 01:21:39,940 --> 01:21:41,440 eight results, instead of having-- 1496 01:21:41,440 --> 01:21:42,690 PROFESSOR: For each thread. 1497 01:21:45,180 --> 01:21:46,400 Good. 1498 01:21:46,400 --> 01:21:50,310 So that each one could keep it local to its own thread. 1499 01:21:50,310 --> 01:21:53,180 Now, of course, that involves me knowing how many processors 1500 01:21:53,180 --> 01:21:55,770 I'm running on. 1501 01:21:55,770 --> 01:22:01,490 So now, if that number changes or whatever-- 1502 01:22:01,490 --> 01:22:03,530 there's a way of doing it completely processor 1503 01:22:03,530 --> 01:22:05,570 obliviously. 1504 01:22:05,570 --> 01:22:06,430 AUDIENCE: Divide and conquer. 1505 01:22:06,430 --> 01:22:08,970 PROFESSOR: Yeah, do divide and conquer. 1506 01:22:08,970 --> 01:22:12,680 Add up recursively the first half of the elements, add up 1507 01:22:12,680 --> 01:22:14,080 the second half of the elements, 1508 01:22:14,080 --> 01:22:15,820 and add them together.
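The divide-and-conquer fix can be sketched with `std::async` standing in for `cilk_spawn`. This is an assumption; real Cilk code would spawn one half and then `cilk_sync`. The `grain` cutoff is a hypothetical parameter that keeps tasks from getting too small, and `X` and `compute` remain placeholders. The tasks share only the read-only array, so there is no race and no lock.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical placeholders for the lecture's element type and function.
using X = int;
long long compute(const X& v) { return 2LL * v; }

// Sum compute(A[i]) for i in [lo, hi): recursively sum the first half
// (in a separate task) and the second half, then add the two partial
// sums. Ranges of at most `grain` elements are summed serially.
long long dc_sum(const std::vector<X>& A, std::size_t lo, std::size_t hi,
                 std::size_t grain = 4096) {
    if (hi - lo <= grain || hi - lo < 2) {  // base case (also guards grain == 0)
        long long s = 0;
        for (std::size_t i = lo; i < hi; ++i) s += compute(A[i]);
        return s;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    auto left = std::async(std::launch::async, dc_sum, std::cref(A), lo, mid, grain);
    long long right = dc_sum(A, mid, hi, grain);  // this thread takes the second half
    return left.get() + right;                    // wait for the first half, then combine
}
```

It is called as `dc_sum(A, 0, A.size())`. The recursion never asks how many processors exist, which is what makes it processor-oblivious.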
1509 01:22:15,820 --> 01:22:18,240 Next time, we're going to see yet another mechanism for 1510 01:22:18,240 --> 01:22:22,060 doing that, which gets the kind of performance that 1511 01:22:22,060 --> 01:22:26,340 you're mentioning but without having to rewrite the for loop 1512 01:22:26,340 --> 01:22:27,780 as divide and conquer. 1513 01:22:27,780 --> 01:22:29,030 We'll see that next time. 1514 01:22:31,390 --> 01:22:33,640 So in this case, we have lock contention that takes away our 1515 01:22:33,640 --> 01:22:34,570 parallelism. 1516 01:22:34,570 --> 01:22:40,960 Unfortunately, very little is known about lock contention. 1517 01:22:40,960 --> 01:22:45,200 For the greedy scheduler, you can show that it achieves T1 over 1518 01:22:45,200 --> 01:22:51,740 P plus T infinity plus B, where B is the bondage, that is, 1519 01:22:51,740 --> 01:22:56,180 the total time of all critical sections. 1520 01:22:56,180 --> 01:22:58,440 That's a lousy bound, because it says, even if they're 1521 01:22:58,440 --> 01:23:02,310 locked by different locks, you still add up the total time of 1522 01:23:02,310 --> 01:23:04,100 all the critical sections. 1523 01:23:04,100 --> 01:23:07,550 And generally, although you can improve this in special 1524 01:23:07,550 --> 01:23:12,120 cases, contention in general is 1525 01:23:12,120 --> 01:23:14,220 not understood very well. 1526 01:23:14,220 --> 01:23:18,740 And this upper bound is weak, but little is known about lock 1527 01:23:18,740 --> 01:23:20,100 contention. 1528 01:23:20,100 --> 01:23:22,890 Very little is known about lock contention. 1529 01:23:22,890 --> 01:23:28,050 So to conclude, always write deterministic 1530 01:23:28,050 --> 01:23:31,170 programs, unless you can't. 1531 01:23:33,790 --> 01:23:37,410 Always write deterministic programs, unless you can't. 1532 01:23:37,410 --> 01:23:38,660 Great.
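The contention bound quoted in the lecture, written out, with $B$ the bondage (the total time of all critical sections):

```latex
T_P \le \frac{T_1}{P} + T_\infty + B
```

For the mutex-protected loop, each of the $n$ iterations holds L for constant time, so $B = \Theta(n)$. The bound then gives $T_P = O(n/P + \lg n + n) = O(n)$, matching the serialized running time described for that code.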