OK, good morning. So today we're going to continue our exploration of multithreaded algorithms. Last time we talked about some aspects of scheduling, and a little bit about the linguistics we use to describe a multithreaded computation. And today we're going to actually deal with some algorithms.

So, we're going to start out with something really simple. Actually, what's fun about this is that everything I'm going to teach you today I could have taught you in week two, OK, because basically it's just taking the divide-and-conquer hammer and smashing problem after problem with it. OK, and actually, next week's lectures on caching are also very similar. So everybody should bone up on their master theorem and substitution methods for recurrences, and so forth, because that's what we're going to be doing. And of course, all this stuff will be on the final.

So let's start with matrix multiplication. And we'll do n by n. So, our problem is to compute C = A * B. And the way we'll do that is using divide and conquer, as we saw before, although we're not going to use Strassen's method. OK, we'll just use the ordinary thing, and I'll leave Strassen's as an exercise.
So, the idea is we're going to look at multiplication of an n-by-n matrix in terms of n/2-by-n/2 matrices. So I partition C into four blocks, and likewise with A and B. OK, and we multiply those out, and that gives us the following. Let me make sure I get all my indices right.

OK, so it gives us the sum of two n-by-n matrices. So for example, if I multiply the first row by the first column, I'm putting the first term, A_11 times B_11, in this matrix, and in the second one, A_12 times B_21 gets placed here. So when I sum them, and so forth for the other entries, I'm going to get my result.

So, we can write that out as, let's see, I'm not sure this is all going to fit on one board, but we'll see what we can do. OK, so we can write that out as a multithreaded program. Here we're going to assume that n is an exact power of two, for simplicity. And since we're going to have two matrices that we have to add, we're going to put one of them in our output, C, that'll be the first one, and we're going to use a temporary matrix, T, which is also n by n.
OK, and the code looks something like this: if n = 1, then C[1,1] gets A[1,1] times B[1,1]. Otherwise, what we do is partition the matrices into blocks. So, how long does it take me to partition a matrix into blocks if I'm clever with my programming? Yeah? No time, or actually it does take a little bit of time. Yeah, order one, basically, OK, because all it is is index calculations. You have to change what the index is. You have to pass in, in addition to A, B, and C, for example, a range, which has essentially constant overhead. But it's basically order-one time.

Basically order-one time, OK, to partition the matrices, because all we're doing is index calculations. And all we have to do as we go through is make sure we keep track of the indices, OK? Any questions about that? People follow? OK, that's sort of standard programming.

So then, what I do is I spawn multiplication of, whoops, the sub-matrices, and spawn, and continue: C_21 gets A_21 times B_11, and, let's see, 2-2, yeah, it's 2-1.
OK, and continuing onto the next page. Let me just make sure I get the indentation right. This is my level of indentation, and I'm continuing right along. And now what I do is put the results in T, and then...

OK, so I've spawned off all these multiplications. So when I spawn something, I can go on to the next statement and execute it even as the spawned computation is executing. OK, that's our notion of multithreaded programming. I spawn off these eight things. What do I do next? What's the next step in this code? Sync. Yeah. OK, I've got to wait for them to be done before I can use their results. So I put a sync in, which says wait for all those things I spawned off to be done, and then what?

Yeah, then you have to add T and C. So let's do that with a subroutine call. OK, and then we're done; we do a return at the end. So let's write the code for Add, because Add we'd also like to do in parallel if we can. And what we're doing here is C gets C + T, OK? So, we're going to add T into C.
So, we have some code here to do our base case and partitioning, because we're going to do it divide and conquer as before. And this one's actually a lot easier. We just spawn Add(C_11, T_11, n/2), Add(C_12, T_12, n/2), Add(C_21, T_21, n/2), Add(C_22, T_22, n/2), and then sync, and return the result. OK, so all we're doing here is dividing it into four pieces and spawning them off. That's it. We wait until they're all done, then we return with the result. OK, any questions about how this code works?

So, remember that here we're going to have a scheduler underneath which is scheduling this onto our processors, and we're going to have to worry about how well that scheduler is doing. And from last time, we learned that there are two important measures that can be used essentially to predict the performance on any number of processors. And what are those two measures? Yeah, T_1 and T_infinity, so that we have some names: T_1 is the work, good, and T_infinity is the critical path length, good. So, you have the work and the critical path length.
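To make the structure concrete, here is a minimal Python sketch of the Mult and Add routines just described. Python is my choice here, not the lecture's; the board code is Cilk-style pseudocode. Each spawn is written as an ordinary recursive call (the serial elision, which the lecture notes computes exactly the same result), with comments marking where the spawns and syncs would go. Submatrices are passed as index offsets, which is the constant-time partitioning trick just mentioned.

```python
# Sketch of the board's Mult/Add pseudocode. Matrices are lists of
# lists; n must be an exact power of 2, as the lecture assumes.

def add(C, T, n, ci=0, cj=0, ti=0, tj=0):
    """Divide-and-conquer Add: C block += T block, both n-by-n."""
    if n == 1:
        C[ci][cj] += T[ti][tj]
        return
    h = n // 2
    # "spawn" Add on the four quadrants (partitioning is index math only)
    add(C, T, h, ci,     cj,     ti,     tj)
    add(C, T, h, ci,     cj + h, ti,     tj + h)
    add(C, T, h, ci + h, cj,     ti + h, tj)
    add(C, T, h, ci + h, cj + h, ti + h, tj + h)
    # "sync"

def mult(C, A, B, n, ci=0, cj=0, ai=0, aj=0, bi=0, bj=0):
    """Divide-and-conquer Mult: C block = A block * B block."""
    if n == 1:
        C[ci][cj] = A[ai][aj] * B[bi][bj]
        return
    h = n // 2
    T = [[0] * n for _ in range(n)]   # temporary for the second product
    # "spawn" the eight half-size multiplications: four into C, four into T
    mult(C, A, B, h, ci,     cj,     ai,     aj,     bi,     bj)       # C11 = A11*B11
    mult(C, A, B, h, ci,     cj + h, ai,     aj,     bi,     bj + h)   # C12 = A11*B12
    mult(C, A, B, h, ci + h, cj,     ai + h, aj,     bi,     bj)       # C21 = A21*B11
    mult(C, A, B, h, ci + h, cj + h, ai + h, aj,     bi,     bj + h)   # C22 = A21*B12
    mult(T, A, B, h, 0,      0,      ai,     aj + h, bi + h, bj)       # T11 = A12*B21
    mult(T, A, B, h, 0,      h,      ai,     aj + h, bi + h, bj + h)   # T12 = A12*B22
    mult(T, A, B, h, h,      0,      ai + h, aj + h, bi + h, bj)       # T21 = A22*B21
    mult(T, A, B, h, h,      h,      ai + h, aj + h, bi + h, bj + h)   # T22 = A22*B22
    # "sync", then add the temporary into the output
    add(C, T, n, ci, cj, 0, 0)
```

In a real Cilk-style run, the eight recursive multiplications proceed in parallel, and the sync before the Add is what makes it safe to consume T.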
If we know the work and the critical path length, we can do things like say what the parallelism of our program is, and from that, understand how many processors it makes sense to run this program on. OK, so let's do that analysis.

So, let's let M_P(n) be the P-processor execution time for our Mult code, and A_P(n) be the same thing for our matrix addition code. So, the first thing we're going to analyze is work. And what do we hope our answer for the work is? When we analyze the work, what do we hope it's going to be? Well, we hope it's going to be small; I'll grant you that. What could we benchmark it against? Yeah, against code we'd write that didn't use any parallelism. We'd like our parallel code, when run on one processor, to be just as fast as our serial code, the normal code that we would write to do this problem. That's generally the way we'd like these things to operate, OK? So, what is that for matrix multiplication done the naive way? Yeah, it's n^3. Of course, Strassen's algorithm, or one of the other faster algorithms, beats n^3. But for this problem, we're just going to focus on n^3.
I'm going to let you do Strassen as an exercise. So, let's analyze the work. OK, since we have a subroutine for Add that we're using in the multiply code, we start by analyzing the Add. So, we have A_1(n) is, well, can somebody give me a recurrence here? What's the recurrence for understanding the running time of this code? OK, this is basically week two. This is lecture one, actually; this is like lecture two or, at worst, lecture three.

Well, A_1(n) = 4 A_1(n/2) plus order one, right. OK, that's right. So, we have four problems of size n/2 that we're solving. OK, so to see this, you don't even have to know that we're doing this in parallel, because the work is basically what would happen if it executed on a serial machine. So, four problems of size n/2, plus order one, is the total work. Any questions about how I got that recurrence? Is that pretty straightforward? If not, let me know. OK, and so, what's the solution to this recurrence? Yeah, order n^2. How do we know that? Yeah, the master method: n to the log base two of four is n^2. Compare that with order one; this dramatically dominates.
So this is the answer, n to the log base two of four, which is n^2. OK, everybody remember that? OK, so I want people to bone up, because recurrences, and divide and conquer and stuff, are going to be on the final, OK, even though we haven't seen them in many moons. OK, so that's good. That's the same as the serial algorithm. If I had to add two n-by-n matrices serially, how long does it take me? n^2 time. OK, the input is size n^2, so you're not going to beat the size of the input if you have to look at every piece of the input.

OK, let's now do the work of the matrix multiplication. So once again, we want to get a recurrence here. So, what's our recurrence here? Yeah? Not quite. Eight, right, good. OK: M_1(n) = 8 M_1(n/2) plus theta(n^2) for the addition, and then there's an extra theta(1) that we can absorb into the theta(n^2). Isn't asymptotics great? OK, it's just great. And so, what's the solution to that one? Theta(n^3). Why is that? Man, we're exercising old muscles, aren't we? And they're just creaking; I can hear them. Why is that?
Yeah, the master method, because what are we comparing? Yeah, n to the log base two of eight, which is n^3, versus n^2; this one dominates, so it's order n^3. OK, so this is the same as serial. This was the same as serial; this was the same as serial. That's good. OK, we know we have a program that, on one processor, will execute the same as the serial code on which it's based. Namely, we could have done this: if I had just gotten rid of all the spawns and syncs, that would have been a perfectly good piece of pseudocode describing the serial algorithm. And its running time ends up being exactly the same; not too surprising, OK?

OK, so now we do the new stuff: critical path length. So here we have A_infinity(n). Ooh, OK, so we're going to add up the critical path of this code here. Hmm, how do I figure out the critical path on a piece of code like that? So, it's going to expand into one of those DAGs. What's the DAG going to look like? How do I reason about it? So, it's actually easier not to think about the DAG, but to simply think about what's going on in the code. Yeah?
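As a quick sanity check on the two work recurrences (this is my own numerical illustration, not something from the lecture), you can iterate them directly with all hidden constants set to 1 and watch the ratios to n^2 and n^3 settle toward constants, which is exactly what Theta(n^2) and Theta(n^3) mean:

```python
# Work recurrences from the analysis, constants set to 1:
#   A1(n) = 4*A1(n/2) + 1      -> Theta(n^2)   (Add)
#   M1(n) = 8*M1(n/2) + A1(n)  -> Theta(n^3)   (Mult)

def A1(n):
    return 1 if n == 1 else 4 * A1(n // 2) + 1

def M1(n):
    return 1 if n == 1 else 8 * M1(n // 2) + A1(n)

for n in [64, 128, 256]:
    # both ratios approach constants (4/3 and 16/7 for these constants)
    print(n, A1(n) / n**2, M1(n) / n**3)
```

The particular limiting constants depend on the constants chosen in the recurrence; only the fact that the ratios converge matters for the asymptotic claim.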
Yeah, so basically, since all four spawns are spawning off the same thing, and they're operating in parallel, I can just look at one. Or in general, if I spawn off several things, I look at whichever one is going to have the maximum critical path among the things I've spawned off. So, when we do work, we're usually doing a plus when I have multiple subroutines. When we do critical path, I'm doing a max: it's going to be the max over the critical paths of the subroutines that I call. OK, and here they're all equal. So what's the recurrence that I get? What's the recurrence I'm going to get out of this one?

Yeah: A_infinity(n) = A_infinity(n/2) plus a constant, OK, because this is the worst of any of those four, and they're all the same. Each of them is looking at the critical path of a problem that's half the size. OK, people with me? OK, so what's the solution to this? Yeah, that's theta(log n). That's, once again, the master theorem, case two, because here n to the log base two of one is n to the zero.
So on this side we have one, and here we're comparing it with one. They're the same, so therefore we tack on an extra log n. OK, so we tack on one log n: case two of the master method. Pretty good. OK, so that's pretty good, because the critical path is pretty short, log n, compared to the work, n^2.

So let's do, then, this one, which is a little bit more interesting, but not much harder. How about this one? What's the recurrence going to be, the critical path of the multiplication? So once again, it's going to be the maximum over everything we spawned off in parallel, which is, by symmetry, the same as any one of them. So what do I get here? M_infinity(n) = M_infinity(n/2) plus theta(log n). Where did the theta(log n) come from? Yeah, from the addition; that's the critical path of the addition. Now, why is it added to the maximum over all the spawns? You said that when you spawn things off, yeah, you sync first. And sync says you wait for all of those to be done. So you're only taking the maximum over things spawned in parallel, and across syncs you're adding.
So, you add across syncs, and across things that you've spawned off in parallel you take the max. OK, but if you have a sync, it says that's the end; you've got to wait for everything there to end. The addition isn't going on in parallel with the spawns; it's going on after them. So whatever the critical path is there, even if I had an infinite number of processors, I'd still have to wait at that sync point, and therefore I have to add the critical path of the addition onto the critical path of the spawned multiplications. Is that clear to everybody?

OK, so we get this recurrence. And that has solution what? Yeah, theta(log squared n). OK, once again, by the master method, case two: here n to the log base two of one is a constant, versus log n, and those don't differ by a polynomial amount; they differ by a log factor. What we do in that circumstance is tack on an extra log factor. OK, so as I say, it's a good idea to review the master method. OK, that's great.

So now, let's take a look at the parallelism that we get. We'll just do it right here for the multiplication. OK, so what is the parallelism for the multiplication?
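The two critical-path recurrences can be checked numerically the same way as the work recurrences (again, my own illustration with the hidden constants set to 1):

```python
import math

# Critical-path recurrences:
#   Ainf(n) = Ainf(n/2) + 1        -> Theta(lg n)     (Add)
#   Minf(n) = Minf(n/2) + Ainf(n)  -> Theta(lg^2 n)   (Mult)

def Ainf(n):
    return 1 if n == 1 else Ainf(n // 2) + 1

def Minf(n):
    return 1 if n == 1 else Minf(n // 2) + Ainf(n)

for n in [2**10, 2**15, 2**20]:
    k = math.log2(n)
    # Ainf/lg n tends to 1; Minf/lg^2 n tends to 1/2 for these constants
    print(n, Ainf(n) / k, Minf(n) / k**2)
```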
What's the formula for parallelism? So, P-bar is the notation we use for this. What's the parallelism going to be? What's the ratio I take? Yeah, it's M_1(n) divided by M_infinity(n). OK, and that's equal to n^3 over, sorry, log squared n. So it's n^3 over log squared n; this is the parallelism. That says you could run on up to this many processors and expect to be getting linear speedup. If I ran with more processors than the parallelism, I wouldn't expect linear speedup anymore, OK, because then I expect to run in time proportional to the critical path length, and throwing more processors at the problem is not going to help me very much, OK?

So let's just look at this, to get a sense of what's going on here. Let's imagine that the constants are irrelevant, and we have, say, thousand-by-thousand matrices. OK, so in that case, our parallelism is 1,000^3 divided by (log 1,000)^2. What's the log of 1,000? Ten, approximately, right? Log base two of 1,000 is about ten, so the denominator is about 10^2. So, you have about 10^7 parallelism. How big is 10^7? Ten million processors.
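The back-of-the-envelope number works out as stated; here is a one-liner (my own, just for concreteness) that computes it without rounding log 1,000 to ten:

```python
import math

n = 1000
parallelism = n**3 / math.log2(n)**2   # ~ 10^9 / 10^2
print(f"{parallelism:.3g}")            # roughly 1e7, i.e. ten million
```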
OK, so who knows of a machine with ten million processors? What's the largest number of processors anybody knows about? Yeah, not quite; the IBM Blue Gene has a humongous number of processors, exceeding 10,000. Yeah, those were one-bit processors. OK, so ten million is actually a pretty big number, and so our parallelism is much bigger than any typical, actual number of processors. So, we would expect to be able to run this and get very good performance, because we're never going to be limited in this algorithm by a lack of parallelism.

However, there are some tricks we can do. One of the things in this code is that we actually have some overhead that's not apparent, because I haven't run this code with you, although I could, which is that we have this temporary matrix, T. And if you look at the execution stack, we're always allocating T and getting rid of it, etc. And one of the things, when you actually look at the performance of real code, which, now that you have your algorithmic background, you're ready to go and do with some insight, is that of course you're interested in getting more than just asymptotic behavior.
You're interested in getting real performance behavior on real machines. So, you do care about constants and things of that nature. OK, and one of those things is having a large temporary variable; that turns out to be a lot of overhead. And, in fact, it's often the case when you're looking at real code that if you optimize for space, you also optimize for time. If you can run your code with smaller space, you can often run it in smaller time; it tends to be a constant-factor advantage. But those constants can add up, and can make the difference between somebody else's code being faster and your code being faster, once you have your basic algorithm.

So, the idea in this case is that we're going to get rid of that overhead by trading parallelism, because we've got oodles of parallelism here, for space efficiency. OK, and the idea is we're going to get rid of T. OK, so let's throw this up. So, who can suggest how I might get rid of T here, get rid of this temporary matrix? Yeah? Yeah? So, what if you just added everything directly into C? The issue you get there, if both subcomputations are adding into C, is interference between the two subcomputations.
Now, there are ways of making that work out, but then you have to worry about things we're not going to talk about, such as mutual exclusion, to make sure that as you're updating it, somebody else isn't updating it at the same time, and you don't have race conditions. But you can actually do it in this context with no race conditions. Yeah, exactly. Exactly, OK, exactly. So, the idea is: spawn off four of them, they all update their part of C, and then spawn off the other four that add their values in.

So, that is a piece of code we'll call MultAdd. And it's actually going to compute C gets C + A * B. OK, so it's actually going to add the product in. So, initially you'd have to zero out C, but we can do that with code very similar to the addition code, with order n^2 work and order log n critical path. So that's not going to be a big part of what we have to deal with.

OK, so here's the code. We basically, once again, do the base case and partition, which I'm not going to write out the code for. We spawn MultAdd(C_11, A_11, B_11, n/2), and we do a few more of those, down to the fourth one. And then we put in a sync.
And then we do the other four, and then sync when we're done with that. OK, does everybody understand that code? See that it basically does the same calculation. We actually don't need to call Add anymore, because we're doing the addition as part of the multiply. But we do have to initialize the matrix in this case. OK, so there is another phase. So, people understand the semantics of this code? So let's analyze it.

OK, so what's the work, MA_1(n)? It's basically the same thing, right? It's order n^3, because the serial code is almost the same as the serial code up there, OK, not quite, but you get essentially the same recurrence, except you don't even have the add; you just get the same recurrence but with order one in place of the n^2 term. So it still has the order-n^3 solution. OK, so that, I think, is not too hard.

OK, so the critical path length. Let's write it out: MA_infinity(n). What's my recurrence for this code? Yeah: 2 MA_infinity(n/2) plus order one. Plus order one, yeah.
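Here is the same kind of Python sketch for the MultAdd routine just described (again my rendering, with the spawns shown as ordinary recursive calls): it accumulates A times B into C in place, with no temporary matrix, in two rounds of four quarter-size calls separated by a sync.

```python
# Sketch of in-place MultAdd: C += A * B, no temporary matrix.
# The caller must zero out C first, as the lecture notes.
# n must be a power of 2; submatrices are passed as index offsets.

def mult_add(C, A, B, n, ci=0, cj=0, ai=0, aj=0, bi=0, bj=0):
    if n == 1:
        C[ci][cj] += A[ai][aj] * B[bi][bj]
        return
    h = n // 2
    # "spawn": first four products, each into its own quadrant of C
    mult_add(C, A, B, h, ci,     cj,     ai,     aj,     bi,     bj)       # C11 += A11*B11
    mult_add(C, A, B, h, ci,     cj + h, ai,     aj,     bi,     bj + h)   # C12 += A11*B12
    mult_add(C, A, B, h, ci + h, cj,     ai + h, aj,     bi,     bj)       # C21 += A21*B11
    mult_add(C, A, B, h, ci + h, cj + h, ai + h, aj,     bi,     bj + h)   # C22 += A21*B12
    # "sync": then the other four, again on disjoint quadrants
    mult_add(C, A, B, h, ci,     cj,     ai,     aj + h, bi + h, bj)       # C11 += A12*B21
    mult_add(C, A, B, h, ci,     cj + h, ai,     aj + h, bi + h, bj + h)   # C12 += A12*B22
    mult_add(C, A, B, h, ci + h, cj,     ai + h, aj + h, bi + h, bj)       # C21 += A22*B21
    mult_add(C, A, B, h, ci + h, cj + h, ai + h, aj + h, bi + h, bj + h)   # C22 += A22*B22
```

Within each round, the four calls write to disjoint quadrants of C, so there are no race conditions; the sync between the rounds is what prevents the second round from interfering with the first, which is exactly the interference problem the temporary matrix T avoided before.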
OK, so the point is that we're 411 00:37:13,000 --> 00:37:17,000 going to have, for the critical path, 412 00:37:17,000 --> 00:37:23,000 we're going to spawn these four off, and so I take the maximum 413 00:37:23,000 --> 00:37:29,000 of whatever those are, which since they're symmetric 414 00:37:29,000 --> 00:37:36,000 is any one of them, OK, and then I have to wait. 415 00:37:36,000 --> 00:37:39,000 And then I do it again. So, that sync, 416 00:37:39,000 --> 00:37:42,000 once again, translates into, in the analysis, 417 00:37:42,000 --> 00:37:46,000 it translates into a plus of the critical path, 418 00:37:46,000 --> 00:37:50,000 which are the things I spawn off in parallel, 419 00:37:50,000 --> 00:37:53,000 I do the max. OK, so people see that? 420 00:37:53,000 --> 00:37:58,000 So, I get this recurrence, 2MA of n over 2 plus order one, 421 00:37:58,000 --> 00:38:02,000 and what's the solution to that? 422 00:38:02,000 --> 00:38:10,000 OK, that's order n, OK, because n to the log base 423 00:38:10,000 --> 00:38:18,000 two of two is n, and that's bigger than one so 424 00:38:18,000 --> 00:38:25,000 we get order n. OK, so the parallelism, 425 00:38:25,000 --> 00:38:36,000 we have p bar is equal to MA one of n over MA infinity of n 426 00:38:36,000 --> 00:38:47,000 is equal to, in this case, n^3 over n, or order n^2. 427 00:38:47,000 --> 00:38:51,000 OK, so for 1,000 by 1,000 matrices, for example, 428 00:38:51,000 --> 00:38:56,000 by the way, 1,000 by 1,000 is considered a small matrix, 429 00:38:56,000 --> 00:39:02,000 these days, because that's only one million entries. 430 00:39:02,000 --> 00:39:07,000 You can put that on your laptop no sweat. 431 00:39:07,000 --> 00:39:14,000 OK, so, but for 1,000 by 1,000 matrices, our parallelism is 432 00:39:14,000 --> 00:39:18,000 about 10^6. OK, so once again, 433 00:39:18,000 --> 00:39:27,000 ample parallelism for anything we would run it on today.
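Collecting the mult-add analysis just stated on the board in one place (writing MA_1 for the work, MA infinity for the critical path, and p bar for the parallelism, as in the lecture):

```latex
% Work: eight half-size subproblems; critical path: two serial
% phases, each of four spawns taken in parallel (a max).
\begin{align*}
MA_1(n)      &= 8\,MA_1(n/2) + \Theta(1) = \Theta(n^3)\\
MA_\infty(n) &= 2\,MA_\infty(n/2) + \Theta(1) = \Theta(n)\\
\bar{P}      &= \frac{MA_1(n)}{MA_\infty(n)} = \Theta(n^2)
\end{align*}
```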
434 00:39:27,000 --> 00:39:28,000 And as it turns out, it's faster in practice -- 435 00:39:38,000 --> 00:39:43,000 -- because we have less space. OK, so here's a game where, 436 00:39:43,000 --> 00:39:49,000 so, often the game you'll see in theory papers if you look at 437 00:39:49,000 --> 00:39:53,000 research papers, people are often striving to 438 00:39:53,000 --> 00:39:59,000 get the most parallelism, and that's a good game to play, 439 00:39:59,000 --> 00:40:04,000 OK, but it's not necessarily the only game. 440 00:40:04,000 --> 00:40:06,000 Particularly, if you have a lot of 441 00:40:06,000 --> 00:40:09,000 parallelism, one of the things that's very easy to do is to 442 00:40:09,000 --> 00:40:14,000 retreat on the parallelism and gain other aspects that you may 443 00:40:14,000 --> 00:40:16,000 want in your code. OK, and so this is a good 444 00:40:16,000 --> 00:40:19,000 example of that. In fact, and this is an 445 00:40:19,000 --> 00:40:22,000 exercise, you can actually achieve work n^3, 446 00:40:22,000 --> 00:40:25,000 order n^3 work, and a critical path of log n, 447 00:40:25,000 --> 00:40:29,000 so even better than either of these two algorithms in terms of 448 00:40:29,000 --> 00:40:33,000 parallelism. OK, so that gives you n^3 over 449 00:40:33,000 --> 00:40:36,000 log n parallelism. So, that's an exercise. 450 00:40:36,000 --> 00:40:40,000 And then, the other exercise that I mentioned that's good 451 00:40:40,000 --> 00:40:44,000 to do is parallel Strassen, OK, doing the same thing with 452 00:40:44,000 --> 00:40:48,000 Strassen, and analyze it: what's the work, critical 453 00:40:48,000 --> 00:40:51,000 path, and parallelism of the Strassen code? 454 00:40:51,000 --> 00:40:54,000 OK, any questions about matrix multiplication? 455 00:40:54,000 --> 00:40:56,000 Yeah?
456 00:41:03,000 --> 00:41:07,000 Yeah, so that would take, that would add a log n to the 457 00:41:07,000 --> 00:41:10,000 critical path, which is nothing compared to 458 00:41:10,000 --> 00:41:12,000 the n. Excuse me? 459 00:41:12,000 --> 00:41:16,000 Well, you got to make sure C is zero to begin with. 460 00:41:16,000 --> 00:41:20,000 OK, so you have to set all the entries to zero, 461 00:41:20,000 --> 00:41:25,000 and so that will take you n^2 work, which is nothing compared 462 00:41:25,000 --> 00:41:30,000 to the n^3 work you're doing here, and it will cost you log n 463 00:41:30,000 --> 00:41:34,000 additional to the critical path, which is nothing compared to 464 00:41:34,000 --> 00:41:39,000 the order n that you're spending. 465 00:41:39,000 --> 00:41:45,000 Any other questions about matrix multiplication? 466 00:41:45,000 --> 00:41:51,000 OK, as they say, this all goes back to week two, 467 00:41:51,000 --> 00:41:55,000 or something, in the class. 468 00:41:55,000 --> 00:42:01,000 Did you have a comment? Yes, you can. 469 00:42:01,000 --> 00:42:05,000 OK, yes you can. It's actually kind of 470 00:42:05,000 --> 00:42:11,000 interesting to look at that. Actually, we'll talk later. 471 00:42:11,000 --> 00:42:17,000 We'll write a research paper after the class is over, 472 00:42:17,000 --> 00:42:24,000 OK, because there's actually some interesting open questions 473 00:42:24,000 --> 00:42:28,000 there. OK, let's move on to something 474 00:42:28,000 --> 00:42:34,000 that you thought you'd gotten rid of weeks ago, 475 00:42:34,000 --> 00:42:40,000 and that would be the topic of sorting. 476 00:42:40,000 --> 00:42:44,000 Back to sorting. OK, so we want to parallel sort 477 00:42:44,000 --> 00:42:47,000 now, OK? Hugely important problem. 
478 00:42:47,000 --> 00:42:52,000 So, let's take a look at, so if I think about algorithms 479 00:42:52,000 --> 00:42:57,000 for sorting that sound easy to parallelize, which ones sound 480 00:42:57,000 --> 00:43:01,000 kind of easy to parallelize? Quick sort, yeah, 481 00:43:01,000 --> 00:43:05,000 that's a good one. Yeah, quick sort is a pretty 482 00:43:05,000 --> 00:43:08,000 good one to parallelize and analyze. 483 00:43:08,000 --> 00:43:10,000 But remember, quick sort has a little bit 484 00:43:10,000 --> 00:43:13,000 more complicated analysis than some other sorts. 485 00:43:13,000 --> 00:43:17,000 What's another one that looks like it should be pretty easy to 486 00:43:17,000 --> 00:43:19,000 parallelize? Merge sort. 487 00:43:19,000 --> 00:43:21,000 When did we teach merge sort? Day one. 488 00:43:21,000 --> 00:43:25,000 OK, so do merge sort because it's just a little bit easier to 489 00:43:25,000 --> 00:43:27,000 analyze. OK, we could do the same thing 490 00:43:27,000 --> 00:43:32,000 for quick sort. Here's merge sort, 491 00:43:32,000 --> 00:43:38,000 OK, and it's going to sort A of p to r. 492 00:43:38,000 --> 00:43:47,000 So, if p is less than r, then we get the middle element, 493 00:43:47,000 --> 00:43:55,000 and then we'll spawn off since we have to, as you recall, 494 00:43:55,000 --> 00:44:03,000 when you merge sort you first recursively sort the two 495 00:44:03,000 --> 00:44:11,000 sub-arrays. There's no reason not to do 496 00:44:11,000 --> 00:44:18,000 those in parallel. Let's just do them in parallel. 497 00:44:18,000 --> 00:44:23,000 Let's spawn off, merge sort of A, 498 00:44:23,000 --> 00:44:30,000 p, q, and spawn off, then, merge sort of A, 499 00:44:30,000 --> 00:44:38,000 q plus one, r. And then, we wait for them to 500 00:44:38,000 --> 00:44:42,000 be done. Don't forget your syncs. 501 00:44:42,000 --> 00:44:48,000 Sync or swim. OK, and then what do we do when 502 00:44:48,000 --> 00:44:54,000 we are done with this? OK, we merge.
503 00:44:54,000 --> 00:45:03,000 OK, so we merge of A, p, q, r, which is merge A of p 504 00:45:03,000 --> 00:45:10,000 up to q with A of q plus one up to r. 505 00:45:10,000 --> 00:45:16,000 And, once we've merged, we're done. 506 00:45:16,000 --> 00:45:27,000 OK, so this is the same code as we saw before in day one except 507 00:45:27,000 --> 00:45:37,000 we've got a couple of spawns and a sync. 508 00:45:37,000 --> 00:45:52,000 So let's analyze this. So, the work is called T_1 of 509 00:45:52,000 --> 00:46:03,000 n, what's the recurrence for this? 510 00:46:03,000 --> 00:46:07,000 This really is going back to day one, right? 511 00:46:07,000 --> 00:46:11,000 We actually did this on day one. 512 00:46:11,000 --> 00:46:17,000 OK, so what's the recurrence? 2T_1 of n over 2 plus order n; 513 00:46:17,000 --> 00:46:21,000 merge is an order n time operation, OK? 514 00:46:21,000 --> 00:46:25,000 And so, that gives us a solution of n log n, 515 00:46:25,000 --> 00:46:32,000 OK, even if you didn't solve the recurrence, you should know the 516 00:46:32,000 --> 00:46:37,000 answer, OK, which is the same as the serial code, 517 00:46:37,000 --> 00:46:45,000 not surprisingly. That's what we want. 518 00:46:45,000 --> 00:46:57,000 OK, critical path length, T infinity of n is equal to, 519 00:46:57,000 --> 00:47:08,000 OK, T infinity of n over 2 plus order n again. 520 00:47:08,000 --> 00:47:15,000 And that's equal to order n, OK? 521 00:47:15,000 --> 00:47:29,000 So, the parallelism is then p bar equals T_1 of n over T 522 00:47:29,000 --> 00:47:40,000 infinity of n is equal to theta of log n. 523 00:47:40,000 --> 00:47:45,000 Is that a lot of parallelism? No, we have a technical name 524 00:47:45,000 --> 00:47:47,000 for that. We call it puny. 525 00:47:47,000 --> 00:47:50,000 OK, that's puny parallelism. Log n?
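A serial Python sketch of the merge sort just written on the board; the comments mark the two recursive calls the lecture spawns in parallel before the sync. The representation (a Python list with inclusive indices p, q, r) is mine; merge here is the ordinary order-n serial merge from day one.

```python
def merge(A, p, q, r):
    """Serially merge sorted A[p..q] with sorted A[q+1..r], in place."""
    left, right = A[p:q + 1], A[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        # Take from left while it has the smaller (or equal) element,
        # or when right is exhausted.
        if j >= len(right) or (i < len(left) and left[i] <= right[j]):
            A[k] = left[i]
            i += 1
        else:
            A[k] = right[j]
            j += 1

def merge_sort(A, p, r):
    """Sort A[p..r] in place."""
    if p < r:
        q = (p + r) // 2
        merge_sort(A, p, q)       # spawn
        merge_sort(A, q + 1, r)   # spawn
        # sync: wait for both halves before merging
        merge(A, p, q, r)
```

Because merge is serial and takes order n, the critical path satisfies T infinity of n equals T infinity of n over 2 plus order n, which is where the puny log n parallelism comes from.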
526 00:47:50,000 --> 00:47:55,000 Now, so this is actually probably a decent algorithm for 527 00:47:55,000 --> 00:48:00,000 some of the small scale processors, especially the 528 00:48:00,000 --> 00:48:04,000 multicore processors that are coming on the market, 529 00:48:04,000 --> 00:48:09,000 and some of the smaller SMPs, symmetric multiprocessors, 530 00:48:09,000 --> 00:48:15,000 that are available. You know, they have four or 531 00:48:15,000 --> 00:48:19,000 eight processors or something. It might be OK. 532 00:48:19,000 --> 00:48:22,000 There's not a lot of parallelism. 533 00:48:22,000 --> 00:48:25,000 For a million elements, log n is about 20. 534 00:48:25,000 --> 00:48:29,000 OK, and then there's constant overheads, 535 00:48:29,000 --> 00:48:34,000 etc. This is not very much 536 00:48:34,000 --> 00:48:39,000 parallelism at all. Question? 537 00:48:39,000 --> 00:48:46,000 Yeah, so how can we do better? I mean, it's like, 538 00:48:46,000 --> 00:48:53,000 man, that merge, right, it takes order n. 539 00:48:53,000 --> 00:48:59,000 If I want to do better, what should I do? 540 00:48:59,000 --> 00:49:03,000 Yeah? Sort in-place, 541 00:49:03,000 --> 00:49:08,000 but for example if you do quick sort and partition, 542 00:49:08,000 --> 00:49:12,000 you still have a linear time partition. 543 00:49:12,000 --> 00:49:17,000 So you're going to be very much in the same situation. 544 00:49:17,000 --> 00:49:21,000 But what can I do here? Parallel merge. 545 00:49:21,000 --> 00:49:24,000 OK, let's make merge go in parallel. 546 00:49:24,000 --> 00:49:28,000 That's where all the critical path is. 547 00:49:28,000 --> 00:49:33,000 Let's figure out a way of building a merge program that 548 00:49:33,000 --> 00:49:41,000 has a very short critical path. You have to parallelize the 549 00:49:41,000 --> 00:49:43,000 merge. This is great.
550 00:49:43,000 --> 00:49:50,000 It's so nice to see at the end of a course like this that 551 00:49:50,000 --> 00:49:57,000 people have the intuition that, oh, you can look at it and sort 552 00:49:57,000 --> 00:50:03,000 of see, where should you put in your work? 553 00:50:03,000 --> 00:50:05,000 OK, the one thing about algorithms is it doesn't stop 554 00:50:05,000 --> 00:50:09,000 you from having to engineer a program when you code it. 555 00:50:09,000 --> 00:50:12,000 There's a lot more to coding a program well than just having 556 00:50:12,000 --> 00:50:15,000 the algorithm as we talked about, also, in day one. 557 00:50:15,000 --> 00:50:18,000 There's things like making it modular, and making it 558 00:50:18,000 --> 00:50:20,000 maintainable, and a whole bunch of things 559 00:50:20,000 --> 00:50:22,000 like that. But one of the things that 560 00:50:22,000 --> 00:50:25,000 algorithms does is it tells you, where should you focus your 561 00:50:25,000 --> 00:50:28,000 work? OK, there's no point in, 562 00:50:28,000 --> 00:50:30,000 for example, sort of saying, 563 00:50:30,000 --> 00:50:34,000 OK, let me spawn off four of these things of size n over 4 in 564 00:50:34,000 --> 00:50:37,000 hopes of getting, I mean, it's like, 565 00:50:37,000 --> 00:50:39,000 that's not where you put the work. 566 00:50:39,000 --> 00:50:42,000 You put the work in merge because that's the one that's 567 00:50:42,000 --> 00:50:44,000 the bottleneck, OK? 568 00:50:44,000 --> 00:50:47,000 And, that's the nice thing about algorithms is it very 569 00:50:47,000 --> 00:50:51,000 quickly lets you hone in on where you should put your 570 00:50:51,000 --> 00:50:54,000 effort, OK, when you're doing algorithmic design in 571 00:50:54,000 --> 00:50:58,000 engineering practice. So you must parallelize the 572 00:50:58,000 --> 00:51:00,000 merge. 573 00:51:09,000 --> 00:51:12,000 The merge we're taking, so here's the basic idea we're 574 00:51:12,000 --> 00:51:14,000 going to use. 
So, in general, 575 00:51:14,000 --> 00:51:17,000 when we merge, when we do our recursive merge, 576 00:51:17,000 --> 00:51:21,000 we're going to have two arrays. Let's call them A and B. 577 00:51:21,000 --> 00:51:25,000 I called them A there. I probably shouldn't have used 578 00:51:25,000 --> 00:51:27,000 A. I probably should have called 579 00:51:27,000 --> 00:51:30,000 them something else, but that's what my notes have, 580 00:51:30,000 --> 00:51:36,000 so we're going to stick to it. Let me get a little bit more 581 00:51:36,000 --> 00:51:39,000 space here and see what's going on. 582 00:51:39,000 --> 00:51:43,000 We have two arrays. I'll call them A and B, 583 00:51:43,000 --> 00:51:46,000 OK? And, what we're going to do, 584 00:51:46,000 --> 00:51:49,000 these are going to be already sorted. 585 00:51:49,000 --> 00:51:54,000 And our job is going to be to merge them together. 586 00:51:54,000 --> 00:52:00,000 So, what I'll do is I'll take the middle element of A. 587 00:52:00,000 --> 00:52:05,000 So this, let's say, goes from one to l, 588 00:52:05,000 --> 00:52:11,000 and this goes from one to m. OK, I'll take the middle 589 00:52:11,000 --> 00:52:20,000 element, the element at l over 2, say, and what I'll do is use 590 00:52:20,000 --> 00:52:27,000 binary search to figure out, where does it go in the array 591 00:52:27,000 --> 00:52:31,000 B? Where does this element go? 592 00:52:31,000 --> 00:52:36,000 It goes to some point here where we have j here and j plus 593 00:52:36,000 --> 00:52:38,000 one here. So, we know, 594 00:52:38,000 --> 00:52:43,000 since this is sorted, that all these things are less 595 00:52:43,000 --> 00:52:48,000 than or equal to A of l over 2, and all these things are 596 00:52:48,000 --> 00:52:52,000 greater than or equal to A of l over 2. 597 00:52:52,000 --> 00:52:56,000 And similarly, since that element falls here, 598 00:52:56,000 --> 00:53:02,000 all these are less than or equal to A of l over 2.
599 00:53:02,000 --> 00:53:09,000 And all these are going to be greater than or equal to 600 00:53:09,000 --> 00:53:13,000 A of l over 2. OK, and so now what I can do is 601 00:53:13,000 --> 00:53:20,000 once I figured out where this goes, I can recursively merge 602 00:53:20,000 --> 00:53:27,000 this array with this one, and this one with this one, 603 00:53:27,000 --> 00:53:33,000 and then if I can just concatenate them altogether, 604 00:53:33,000 --> 00:53:41,000 I've got my merged array. OK, so let's write that code. 605 00:53:41,000 --> 00:53:47,000 Everybody get the gist of what's going on there, 606 00:53:47,000 --> 00:53:52,000 how we're going to parallelize the merge? 607 00:53:52,000 --> 00:53:58,000 Of course, you can see, it's going to get a little 608 00:53:58,000 --> 00:54:02,000 messy because j could be anywhere. 609 00:54:02,000 --> 00:54:06,000 So here's my code, parallel merge of, 610 00:54:06,000 --> 00:54:13,000 and we're going to put it in C of one to n, so I'm going to 611 00:54:13,000 --> 00:54:21,000 have n elements. So, this is doing merge A and B 612 00:54:21,000 --> 00:54:27,000 into C, and n is equal to l plus m. 613 00:54:27,000 --> 00:54:36,000 OK, so we're going to take two arrays and merge them into the 614 00:54:36,000 --> 00:54:42,000 third array, OK? So, without loss of generality, 615 00:54:42,000 --> 00:54:45,000 I'm going to say, let's see, without loss of 616 00:54:45,000 --> 00:54:49,000 generality, I'm going to say l is bigger than m as I show here 617 00:54:49,000 --> 00:54:52,000 because if it's not, what do I do? 618 00:54:52,000 --> 00:54:55,000 Just do it the other way around, right? 619 00:54:55,000 --> 00:54:57,000 So, I figure out which one was bigger. 620 00:54:57,000 --> 00:55:01,000 So that only cost me order one to test that, 621 00:55:01,000 --> 00:55:04,000 or whatever.
And then, I basically do a base 622 00:55:04,000 --> 00:55:07,000 case, you know, if the two arrays are empty or 623 00:55:07,000 --> 00:55:10,000 whatever, what you do in practice, of course, 624 00:55:10,000 --> 00:55:14,000 is if they're small enough, you just do a serial merge, 625 00:55:14,000 --> 00:55:17,000 OK, if they're small enough, and I don't really expect to 626 00:55:17,000 --> 00:55:20,000 get much parallelism. There isn't much work there. 627 00:55:20,000 --> 00:55:23,000 You might as well just do serial merge, 628 00:55:23,000 --> 00:55:25,000 and be a little bit more efficient, OK? 629 00:55:25,000 --> 00:55:32,000 So, do the base case. So then, what I do is I find 630 00:55:32,000 --> 00:55:41,000 the j such that B of j is less than or equal to A of l over 2, 631 00:55:41,000 --> 00:55:50,000 less than or equal to B of j plus one, using binary search. 632 00:55:50,000 --> 00:55:59,000 When did we cover binary search? Oh yeah, that was week one, 633 00:55:59,000 --> 00:56:07,000 right? That was first recitation or 634 00:56:07,000 --> 00:56:13,000 something. Yeah, it's amazing. 635 00:56:13,000 --> 00:56:20,000 OK, and now, what we do is we spawn off p 636 00:56:20,000 --> 00:56:29,000 merge of A of one to l over 2, B of one to j, 637 00:56:29,000 --> 00:56:40,000 and stick it into C of one to l over 2 plus j. 638 00:56:40,000 --> 00:56:53,000 OK, and similarly now, we can spawn off a merge of A 639 00:56:53,000 --> 00:57:07,000 of l over 2 plus one up to l, B of j plus one up to m, 640 00:57:07,000 --> 00:57:19,000 into C of l over 2 plus j plus one up to n. 641 00:57:19,000 --> 00:57:24,000 And then, I sync. 642 00:57:32,000 --> 00:57:35,000 So, code is pretty straightforward, 643 00:57:35,000 --> 00:57:42,000 doing exactly what I said we were going to do over here; 644 00:57:42,000 --> 00:57:47,000 analysis, a little messier, a little messier.
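The parallel merge just described can be sketched serially in Python. This is a sketch under my own conventions rather than the lecture's exact pseudocode: it copies subarrays instead of passing index ranges, uses the standard-library bisect for the binary search, and falls back to a trivial serial merge below a small cutoff, standing in for the base case the lecture glosses over. The calls marked "spawn" are the two merges that would run in parallel.

```python
import bisect

def p_merge(A, B, C):
    """Merge sorted lists A and B into C, where len(C) == len(A) + len(B)."""
    if len(A) + len(B) <= 2:        # base case: serial merge of tiny inputs
        C[:] = sorted(A + B)
        return
    if len(A) < len(B):             # without loss of generality, A is longer
        A, B = B, A
    half = (len(A) + 1) // 2        # split A as A[:half], A[half:]
    x = A[half - 1]                 # middle element of the longer array
    j = bisect.bisect_left(B, x)    # binary search: B[:j] < x <= B[j:]
    left = [None] * (half + j)
    right = [None] * (len(A) - half + len(B) - j)
    p_merge(A[:half], B[:j], left)      # spawn
    p_merge(A[half:], B[j:], right)     # spawn
    # sync
    C[:half + j] = left
    C[half + j:] = right
```

Everything in A[:half] and B[:j] is at most x, and everything in A[half:] and B[j:] is at least x, so concatenating the two recursively merged halves gives the sorted result.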
645 00:57:47,000 --> 00:57:53,000 So, let's just try to understand this before we do the 646 00:57:53,000 --> 00:57:57,000 analysis. Why is it that I want to pick 647 00:57:57,000 --> 00:58:05,000 the middle of the big array rather than the small array? 648 00:58:05,000 --> 00:58:13,000 What was my rationale there? 649 00:58:13,000 --> 00:58:27,000 That's actually a key part, going to be a key part of the 650 00:58:27,000 --> 00:58:31,000 analysis. Yeah? 651 00:58:31,000 --> 00:58:36,000 OK. Yeah, imagine that B, 652 00:58:36,000 --> 00:58:40,000 for example, had only one element in it, 653 00:58:40,000 --> 00:58:46,000 OK, or just a few elements, then finding it in A might mean 654 00:58:46,000 --> 00:58:50,000 finding it right near the beginning of A. 655 00:58:50,000 --> 00:58:56,000 And now, I'd be left with subproblems that were very big, 656 00:58:56,000 --> 00:59:00,000 whereas here, as you're pointing out, 657 00:59:00,000 --> 00:59:04,000 if I start here, if my total number of elements 658 00:59:04,000 --> 00:59:11,000 is n, what's the smallest that one of these recursions could 659 00:59:11,000 --> 00:59:15,000 be? n over 4 is the smallest it 660 00:59:15,000 --> 00:59:18,000 could be, OK, because I would have at least a 661 00:59:18,000 --> 00:59:23,000 quarter of the total number of elements to the left here or to 662 00:59:23,000 --> 00:59:26,000 the right here. If I do it the other way 663 00:59:26,000 --> 00:59:30,000 around, my recursion, I might get a recursion that 664 00:59:30,000 --> 00:59:33,000 was nearly as big as n, and that's sort of, 665 00:59:33,000 --> 00:59:37,000 once again, sort of like the difference when we were 666 00:59:37,000 --> 00:59:41,000 analyzing quick sort with whether we got a good 667 00:59:41,000 --> 00:59:46,000 partitioning element or not.
If the partitioning element is 668 00:59:46,000 --> 00:59:49,000 somewhere in the middle, we're really good, 669 00:59:49,000 --> 00:59:52,000 but if it's always at one end, it's no better than insertion 670 00:59:52,000 --> 00:59:54,000 sort. You want to cut off at least a 671 00:59:54,000 --> 00:59:57,000 constant fraction in your divide and conquer in order 672 00:59:57,000 --> 01:00:02,326 to get the logarithmic behavior. OK, so we'll see that in the 673 01:00:02,326 --> 01:00:05,566 analysis. But the key thing here is that 674 01:00:05,566 --> 01:00:10,302 when we do the recursion, we're going to have 675 01:00:10,302 --> 01:00:15,204 at least n over 4 elements in whatever the smaller thing is. 676 01:00:15,204 --> 01:00:19,192 OK, but let's start. It turns out the work is the 677 01:00:19,192 --> 01:00:23,181 hard part of this. Let's start with critical path 678 01:00:23,181 --> 01:00:25,175 length. OK, look at that, 679 01:00:25,175 --> 01:00:32,045 critical path length. OK, so parallel merge, 680 01:00:32,045 --> 01:00:42,712 so PM infinity of n is going to be, at most, so if the smaller 681 01:00:42,712 --> 01:00:53,379 piece has at least a quarter, what's the larger piece going 682 01:00:53,379 --> 01:01:03,588 to be of these two things here? So, we have two problems 683 01:01:03,588 --> 01:01:10,166 we're spawning off. Now, we really have to do max 684 01:01:10,166 --> 01:01:19,136 because they're not symmetric. Which one's going to be worse? 685 01:01:19,136 --> 01:01:24,966 One could have, at most, three quarters, 686 01:01:24,966 --> 01:01:30,647 OK, of n. Whoops, 3n, of 3n over 4 plus, 687 01:01:30,647 --> 01:01:39,767 OK, so the worst of those two is going to be three quarters of 688 01:01:39,767 --> 01:01:45,000 the elements plus, what? 689 01:01:45,000 --> 01:01:52,000 Plus log n. What's the log n? 690 01:01:52,000 --> 01:02:02,250 The binary search.
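In symbols, the critical-path recurrence just assembled on the board (the 3n over 4 is the worst-case larger subproblem, and the log n term is the binary search):

```latex
PM_\infty(n) \;\le\; PM_\infty\!\left(\frac{3n}{4}\right) + \Theta(\lg n)
```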
OK, and that gives me a 691 01:02:02,250 --> 01:02:15,000 solution of, this ends up being n to the, what? 692 01:02:15,000 --> 01:02:16,845 n to the zero, right. 693 01:02:16,845 --> 01:02:20,996 OK, it's n to the log base four thirds of one. 694 01:02:20,996 --> 01:02:25,147 OK, and the log, base anything, of one is zero. 695 01:02:25,147 --> 01:02:29,760 So, it's n to the zero. So that's just one compared 696 01:02:29,760 --> 01:02:33,265 with log n, so we tack on this log squared n. 697 01:02:33,265 --> 01:02:37,324 So, we have a critical path of log squared n. 698 01:02:37,324 --> 01:02:44,090 That's good news. Now, let's hope that we didn't 699 01:02:44,090 --> 01:02:49,545 blow up the work by a substantial amount. 700 01:02:49,545 --> 01:02:55,545 OK, so the work is PM_1 of n is equal to, OK, 701 01:02:55,545 --> 01:03:01,000 so we don't know what the split is. 702 01:03:01,000 --> 01:03:07,529 So let's call it alpha. OK, so PM one of alpha n on one side, 703 01:03:07,529 --> 01:03:15,235 and then the work on the other side will be PM one of one 704 01:03:15,235 --> 01:03:21,503 minus alpha n plus, and then still order of log n 705 01:03:21,503 --> 01:03:26,858 for the binary search, where, as we've said, 706 01:03:26,858 --> 01:03:36,000 alpha is going to fall between one quarter and three quarters. 707 01:03:46,000 --> 01:03:51,090 OK, how do we solve a recurrence like this? 708 01:03:51,090 --> 01:03:57,515 What's the technical name for this kind of recurrence? 709 01:03:57,515 --> 01:04:01,151 Hairy. It's a hairy recurrence. 710 01:04:01,151 --> 01:04:06,000 How do we solve hairy recurrences? 711 01:04:06,000 --> 01:04:09,318 Substitution. OK, good. 712 01:04:09,318 --> 01:04:15,502 Substitution. OK, so we're going to say PM 713 01:04:15,502 --> 01:04:24,402 one of k is less than or equal to, OK, I want to make a good 714 01:04:24,402 --> 01:04:31,340 guess here, OK, because I've fooled around with 715 01:04:31,340 --> 01:04:34,493 it.
I want it to be linear, 716 01:04:34,493 --> 01:04:37,870 so it's going to have a linear term, a times k minus, 717 01:04:37,870 --> 01:04:39,948 and then I'm going to do b log k. 718 01:04:39,948 --> 01:04:43,454 So, this is the trick of subtracting a low order term. 719 01:04:43,454 --> 01:04:47,220 Remember that in substitution, in order to make the induction stronger? 720 01:04:47,220 --> 01:04:51,311 If I just did ak it's not going to work because here I would get 721 01:04:51,311 --> 01:04:55,077 an, and then when I did this substitution I'm going to get a 722 01:04:55,077 --> 01:04:58,974 alpha n, and then a one minus alpha n, and those two together 723 01:04:58,974 --> 01:05:03,000 are already going to add up to everything here. 724 01:05:03,000 --> 01:05:08,411 So, there's no way I'm going to get it down when I add this term 725 01:05:08,411 --> 01:05:10,558 in. So, I need to subtract 726 01:05:10,558 --> 01:05:15,196 something from both of these so as to absorb this term, 727 01:05:15,196 --> 01:05:17,773 OK? So, I'm skipping over those 728 01:05:17,773 --> 01:05:22,411 steps, OK, because we did those steps in lecture two or 729 01:05:22,411 --> 01:05:25,588 something. OK, so that's the thing I'm 730 01:05:25,588 --> 01:05:31,000 going to guess, where a and b are greater than zero. 731 01:05:31,000 --> 01:05:34,000 OK, so let's do the substitution. 732 01:05:46,000 --> 01:05:52,000 OK, so we have PM one of n is less than or equal to, 733 01:05:52,000 --> 01:05:57,764 OK, we substitute this inductive hypothesis in for 734 01:05:57,764 --> 01:06:02,234 these two guys.
So, we get a alpha n minus b 735 01:06:02,234 --> 01:06:07,023 log of alpha n plus a of one minus alpha n minus b log of one 736 01:06:07,023 --> 01:06:10,535 minus alpha, maybe another parenthesis there, 737 01:06:10,535 --> 01:06:14,206 one minus alpha n. 740 01:06:23,704 --> 01:06:27,215 I didn't even leave myself enough space here. 741 01:06:27,215 --> 01:06:31,924 Plus, let me just move this over so I don't end up using too 742 01:06:31,924 --> 01:06:39,704 much space. So, b log of one minus alpha n 743 01:06:39,704 --> 01:06:45,598 plus theta of log n. How's that? 744 01:06:45,598 --> 01:06:52,443 Are we OK on that? OK, so that's just 745 01:06:52,443 --> 01:07:01,000 substitution. Let's do a little algebra. 746 01:07:01,000 --> 01:07:07,977 That's equal to a times alpha n plus a times one minus alpha 747 01:07:07,977 --> 01:07:10,095 n. That's just an, 748 01:07:10,095 --> 01:07:15,578 OK, minus, well, the b isn't quite so simple. 749 01:07:15,578 --> 01:07:22,057 OK, so I have a b term. Now I've got a whole bunch of 750 01:07:22,057 --> 01:07:26,543 stuff there. I've got log of alpha n. 751 01:07:26,543 --> 01:07:31,900 I have, then, this log of one minus alpha n, 752 01:07:31,900 --> 01:07:40,000 OK, I'll start with the n, and then plus theta log n. 753 01:07:40,000 --> 01:07:45,947 Did I do that right? Does that look OK? 754 01:07:45,947 --> 01:07:53,773 OK, so look at that. OK, so now let's just multiply 755 01:07:53,773 --> 01:08:01,943 some of this stuff out. So, I have an minus b times, 756 01:08:01,943 --> 01:08:08,845 well, log of alpha n is just log alpha plus log n. 757 01:08:08,845 --> 01:08:16,450 And then I have plus log of one minus alpha plus log n, 758 01:08:16,450 --> 01:08:22,929 OK, plus theta log n.
That's just more algebra, 759 01:08:22,929 --> 01:08:32,035 OK, using our rules for logs. Now let me express this as my 760 01:08:32,035 --> 01:08:39,131 desired solution minus a residual, 761 01:08:39,131 --> 01:08:43,446 an minus b log n, OK, minus, OK, 762 01:08:43,446 --> 01:08:49,707 and so that was one of these b log n's, right, 763 01:08:49,707 --> 01:08:54,718 is here. And the other one's going to 764 01:08:54,718 --> 01:09:00,841 end up in here. I have b times log n plus log 765 01:09:00,841 --> 01:09:11,000 of alpha times one minus alpha minus, oops, I've got too many. 766 01:09:11,000 --> 01:09:17,716 Do I have the right number of closes? 767 01:09:17,716 --> 01:09:24,246 Close that, close that, that's good, 768 01:09:24,246 --> 01:09:29,470 minus theta log n. Two there. 769 01:09:29,470 --> 01:09:38,895 Boy, my writing is degrading. OK, did I do that right? 770 01:09:38,895 --> 01:09:42,636 Do I have the parentheses right? 771 01:09:42,636 --> 01:09:45,774 That matches, that matches, 772 01:09:45,774 --> 01:09:47,948 that matches, good. 773 01:09:47,948 --> 01:09:51,931 And then b goes to that, OK, good. 774 01:09:51,931 --> 01:09:59,051 OK, and I claim that is less than or equal to an minus b log 775 01:09:59,051 --> 01:10:09,860 n if we choose b large enough. OK, this mess dominates this 776 01:10:09,860 --> 01:10:22,837 because this is basically a log n here, and this is essentially 777 01:10:22,837 --> 01:10:29,952 a constant. OK, so if I increase b, 778 01:10:29,952 --> 01:10:40,000 OK, times log n, I can overcome that log n, 779 01:10:40,000 --> 01:10:47,587 whatever the constant is, hidden by the asymptotic 780 01:10:47,587 --> 01:10:53,934 notation, OK, such that b times log n plus 781 01:10:53,934 --> 01:11:04,000 log of alpha times one minus alpha dominates the theta log n.
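The substitution argument just done on the board, written out line by line (with the split parameter satisfying one quarter at most alpha at most three quarters):

```latex
\begin{align*}
PM_1(n) &\le a\alpha n - b\lg(\alpha n)
           + a(1-\alpha)n - b\lg\bigl((1-\alpha)n\bigr) + \Theta(\lg n)\\
        &= an - b\bigl(\lg\alpha + \lg n + \lg(1-\alpha) + \lg n\bigr)
           + \Theta(\lg n)\\
        &= an - b\lg n
           - \Bigl(b\bigl(\lg n + \lg\bigl(\alpha(1-\alpha)\bigr)\bigr)
           - \Theta(\lg n)\Bigr)\\
        &\le an - b\lg n,
\end{align*}
```

where the last step holds once b is chosen large enough that the residual in parentheses is nonnegative.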
782 01:11:04,000 --> 01:11:13,961 OK, and I can also choose my base condition to be big enough 783 01:11:13,961 --> 01:11:22,740 to handle the initial conditions, whatever they might 784 01:11:22,740 --> 01:11:27,467 be. OK, so we'll choose a big 785 01:11:27,467 --> 01:11:30,000 enough -- 786 01:11:48,000 --> 01:11:55,172 -- to satisfy the base of the induction. 787 01:11:55,172 --> 01:12:04,000 OK, so thus PM_1 of n is equal to theta n, OK? 788 01:12:04,000 --> 01:12:07,384 So I actually showed O, and it turns out, 789 01:12:07,384 --> 01:12:12,207 the lower bound that it is omega n is more straightforward 790 01:12:12,207 --> 01:12:17,030 because the recurrence is easier because I can do the same 791 01:12:17,030 --> 01:12:20,584 substitution. I just don't have to subtract 792 01:12:20,584 --> 01:12:24,561 off low order terms. OK, so it's actually theta, 793 01:12:24,561 --> 01:12:27,776 not just O. OK, so that gives us a log, 794 01:12:27,776 --> 01:12:30,907 what did we say the critical path was? 795 01:12:30,907 --> 01:12:35,138 The critical path is log squared n for the parallel 796 01:12:35,138 --> 01:12:40,787 merge. So, let's do the analysis of 797 01:12:40,787 --> 01:12:45,927 merge sort using this. So, the work is, 798 01:12:45,927 --> 01:12:52,285 as we know already, T_1 of n is theta of n log n 799 01:12:52,285 --> 01:12:59,048 because our merge work that we just analyzed was order n, 800 01:12:59,048 --> 01:13:05,000 same as for the serial algorithm, OK? 801 01:13:05,000 --> 01:13:10,481 The critical path length, now, is T infinity of n is 802 01:13:10,481 --> 01:13:14,457 equal to, OK, so as in normal merge sort, 803 01:13:14,457 --> 01:13:20,261 we have a problem of half the size, T infinity of n over 2 plus, 804 01:13:20,261 --> 01:13:26,387 now, my critical path for merging is not order n as it was 805 01:13:26,387 --> 01:13:37,428 before. Instead, it's just over there. 806 01:13:37,428 --> 01:13:45,600 Log squared n, there we go.
807 01:13:45,600 --> 01:14:01,000 OK, and so that gives us theta of log cubed n. 808 01:14:01,000 --> 01:14:10,312 So, our parallelism is then theta of n over log cubed n. 809 01:14:10,312 --> 01:14:17,423 And, in fact, the best that's been done is, 810 01:14:17,423 --> 01:14:23,179 sorry, log squared n, you're right. 811 01:14:23,179 --> 01:14:33,000 Log squared n because it's n log n over log cubed n. 812 01:14:33,000 --> 01:14:37,247 It's n over log squared n, OK? 813 01:14:37,247 --> 01:14:43,106 And the best, so now I wonder if I have a 814 01:14:43,106 --> 01:14:48,085 typo here. I have that the best is, 815 01:14:48,085 --> 01:14:54,676 p bar is theta of n over log n. Is that right? 816 01:14:54,676 --> 01:15:02,000 I think so. Yeah, that's the best to date. 817 01:15:02,000 --> 01:15:05,576 That's the best to date. By Cole, I believe, 818 01:15:05,576 --> 01:15:09,153 is who did this. OK, so you can actually get a 819 01:15:09,153 --> 01:15:13,446 fairly good, but it turns out sorting is a really tough 820 01:15:13,446 --> 01:15:18,215 problem to parallelize to get really good constants where you 821 01:15:18,215 --> 01:15:22,030 want to make it so it's running exactly the same. 822 01:15:22,030 --> 01:15:26,243 Matrix multiplication, you can make it run in parallel 823 01:15:26,243 --> 01:15:30,058 and get straight, hard, linear speed up with 824 01:15:30,058 --> 01:15:35,356 the number of processors. There is plenty of parallelism, 825 01:15:35,356 --> 01:15:39,994 and running on more processors, every processor carries a full 826 01:15:39,994 --> 01:15:41,514 weight.
With sorting, 827 01:15:41,514 --> 01:15:43,947 typically you lose, I don't know, 828 01:15:43,947 --> 01:15:47,596 20% in my experience, OK, in terms of other stuff 829 01:15:47,596 --> 01:15:51,777 going on because you have to work really hard to get the 830 01:15:51,777 --> 01:15:55,883 constants of this merge algorithm down to the constants 831 01:15:55,883 --> 01:15:59,000 of that normal merge, right? 832 01:15:59,000 --> 01:16:02,337 I mean that's a pretty good algorithm, right, 833 01:16:02,337 --> 01:16:05,143 the one that just goes, BUZZING SOUND, 834 01:16:05,143 --> 01:16:08,934 and just takes two lists and merges them like that. 835 01:16:08,934 --> 01:16:13,410 So, it's an interesting issue. And a lot of people work very 836 01:16:13,410 --> 01:16:16,975 hard on sorting, because it's a hugely important 837 01:16:16,975 --> 01:16:21,601 problem, and how it is that you can actually get the constants 838 01:16:21,601 --> 01:16:25,924 down while still guaranteeing that it will scale up with a 839 01:16:25,924 --> 01:16:29,716 number of processors. OK, that's our little sojourn 840 01:16:29,716 --> 01:16:33,281 into parallel land, and next week we're going to 841 01:16:33,281 --> 01:16:37,073 talk about caching, which is another very important 842 01:16:37,073 --> 01:16:40,000 area of algorithms, and of programming in general.