The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: So we have Arvind today, who is going to talk about how to actually do parallelism much closer to the hardware and to all the work we have been doing.

ARVIND: So first, just a clarification. This talk is not really about any techniques. These are some very preliminary ideas; I have some new ideas on parallel programming.

The second thing is, 5 or 6 years ago I was going around the country giving a talk on why hardware design can't be left to hardware designers, because I think the way they express their ideas is very, very low level from a software point of view. So I have worked pretty hard on injecting some fairly sophisticated ideas from software into hardware design. But it often happens that when you deal with powerful people, you get affected by it. So I have sort of come full circle. Now I'm going to show you how hardware design can influence software design, and that's really the idea behind this talk. So it's borrowing some ideas from hardware design.

The Cell processor is a very good example of a system on a chip, and this is what everybody believes the future will look like: that you'll have fairly symmetric things in some sense. That is needed by hardware, because hardware gets out of hand if you have wires running all over the place. So people would like these things to be fairly regular in some sense, but that doesn't mean that all these tiles have to be the same. So for example, you may have very application-specific blocks on your chip. For example, in your cell phone you have many, many blocks which are specialized, especially for all the radio communication. Because if you did that stuff in software you may not be able to meet the performance, and even if you can meet the performance your battery will drain instantaneously.
I mean, it just takes too much power to do that kind of computation in software. So for various reasons we need application-specific processing units, and I'm sure the Cell processor has quite a few of these. By the way, these application-specific units are often called ASICs. And you can think of them as a separate chip, but in due course of time they become blocks of a bigger thing.

You may have general purpose processors. Already, systems have more than one such thing. The Cell processor has how many? Like 8.

PROFESSOR: Depends. Some have 8, but they don't [OBSCURED] So the current PlayStations already list supposedly 7, but we've had only 6. So, the [OBSCURED].

PROFESSOR: And high-end cell phones have 2 general purpose processors and 2 DSPs on them, so you certainly have multiple processors. You will have memory banks, et cetera. And you will have structured on-chip interconnection networks.

ARVIND: So I think this is not controversial, that this is what the trajectory will look like in the future. That doesn't mean that all the chips will look the same. You may have different mixes of things depending upon the application, and one of the interesting things is, can you put together one of these things fast enough? Because it's very, very expensive to design and implement these things. So really, one issue that comes up all the time is, can we rapidly produce high quality chips and the surrounding systems and software for such things?

So this talk, as I said, is about ideas, and what I'm going to do is first tell you how I used to think about parallel programming. And that way is all about threads. You know, where are my threads? And I worked on this problem for a very, very long time, like 20 years, and this is not necessarily wrong. I'm sure in this course itself you have seen quite a bit about this, and I think tomorrow's lecture is on Cilk, so you will see even more about threads. And lately, I've been thinking about it differently.
And really the main point I would like to get across to you is that now I think of a parallel programming module as a resource. What do you mean, as a resource? I'll try to make this idea more concrete. And the whole issue here is, I think this viewpoint is valuable, but it's not proven yet. You know, so it may not be right. So just a warning. I've been working on this with students, in particular. Now, those of you who have heard of transactional programming -- for lots of things I will say, you'll say, oh, you mean a transaction? Yes. I mean, there are very, very strong connections to transactional stuff, but that's not the language I'm going to be using here.

These slides are color-coded. So when I am talking about old-world thinking it's yellow, and when it's new thinking then it's white.

So, well, I think you probably have already heard this: that the only reason for parallel programming used to be performance. This made programming extremely difficult. You had to know a lot about the machine. The code you were writing was not portable. Endless performance tuning -- all this still goes on. I mean, very few of these things have changed. Parallel libraries were not composable. And this is one of the main things I worry about a lot, and I think Allen was alluding to it too. Somebody has code, he just wants to run the damn thing. You know, even a factor of 2 speedup, that would be fantastic. You know, often it runs slower. It's very, very difficult on these machines to deal with heap structures and memory hierarchies.

And then there is always the synchronization cost issue. In some sense parallel programming doesn't make sense if there is no synchronization going on; that's the embarrassingly parallel computation where you just start off and things never talk to each other.
In even moderately interesting parallel programs there's always some communication between various tasks, and when that happens synchronization cost becomes an issue. And it is such a major issue right now that you have to think about it up front. Oh, if I synchronize too much, then I know for a fact that my program's going to run like a dog. So you try to avoid synchronization and make things coarser and so on.

Really, the whole issue used to be how to exploit hundreds of threads from software, because new hardware will support many, many threads. But at the same time, it's important to understand that in this world there's always virtualization. The number of threads in your machine is never going to match exactly what you have in your application. So in any programming model you have, virtualization is a very important aspect of it: I have n threads, and n may be dynamic, may be varying with time in my application, but in hardware I have some fixed number of threads, which is in general significantly smaller than what you have in your application.

So the thing I was most interested in, in all those days, was what I call implicit parallelism. And from the questions I was hearing about the analysis you're doing of your programs, all those issues come up there. You know, given some code, can I figure out what can be done in parallel and just run it like that? So: extract parallelism from programs written in sequential languages. There are some real champions of this, and I have also spent lots and lots of time on this research problem. So this is one of those areas which certainly has not suffered for lack of research. There's tons and tons of research in this. You know, you can go to IBM and there will be 50 Ph.D.'s working on the Fortran compiler, trying to extract parallelism. But the success is limited. In fact, the main takeaway from this research is that people have learned now how they should write their programs so that the compiler has a chance of extracting parallelism.
Because now people look at your code and they say, well, this is bad code. Why? Because the compiler can't analyze it. You become part of the equation in this.

Now, the methodology I was pushing for a long time was functional languages: program in functional languages, which do not obscure parallelism in the first place. I'm not going to talk about functional languages. It's immaterial, you know, whether this is a good idea or a bad idea; the fact of the matter is that in real life people's reaction is, functional languages -- are you kidding me? So it doesn't get off the ground. It's one of those things where I can come and preach to you that it's good for your soul, and people leave me alone: I don't do movies with subtitles.

AUDIENCE: Sorry to temporarily derail you--

ARVIND: Sorry?

AUDIENCE: Sorry to temporarily derail you. So what kind of parallelism is easier to extract from a functional language? I can clearly see some task parallelism; can you also extract data parallelism or pipeline parallelism or what not?

ARVIND: It turns out that's actually not quite the right question to ask for a functional language, because a functional language doesn't obscure the parallelism that was there in the first place. So you already have a partial order on operations, as opposed to a sequential order. You see, when you write in Fortran even a simple thing like f(g(x), h(y)), it may be obvious to a functional programmer that these things can be done in parallel. Perhaps the execution of these can overlap with the evaluation of f. But that's not the semantics of Fortran or any other sequential language. The semantics is that you're supposed to execute the program left to right, top to bottom. So you will go and execute g first, and you'll go inside g and execute it, and then you will go and execute h, and then you will go and execute f. Why is that important?
Because there are side effects, so it's conceivable that you do something in this one which affects this one, and so on. So you're given a sequential order, and then the compiler does deep analysis, the kind you're doing right now in terms of dependence and anti-dependence and so on, which says it's OK to do g and h in parallel. Well, if it was a functional program, then you know by the semantics of the language that g and h can be done in parallel. That doesn't mean it's a good idea to do it in parallel. So when we get to the second point, of mapping this computation onto a given substrate of 2 processors or 10 processors, then the problems become similar. But the problem of detecting parallelism is very different, because you don't talk of detecting parallelism in functional languages; it's already there. You don't obscure it. Does that make sense?

AUDIENCE: Yeah, maybe.

ARVIND: And it goes without saying that if your algorithm has low parallelism, then functional languages -- or any other language -- there's no magic; you can't suck blood out of a stone. You have to know your algorithm. You have to know what operations can be done in parallel. So the language is only an aid, in the sense that if it was parallel, I could express it more easily as a parallel program. And if it's a bad language, you obscure it and then you try to undo the damage.

Now, as I said, there has been a lot of success in this in terms of dense matrix kind of stuff, but the more irregular the computation, the harder it gets to do these things statically. So when you can't do it, then people, of course, designed explicitly parallel programming models, and some of the most successful ones are data parallel, and multithreading, which you'll hear about tomorrow in Cilk. And then there are very low level models where you are doing actual message passing between various modules that are sitting over there, and I can expose to you threads and synchronization that are very low level: fork and join, et cetera.
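To make the f(g(x), h(y)) exchange above concrete in the multithreaded style just mentioned, here is a minimal sketch in Python. The bodies of g and h are hypothetical and assumed to be free of side effects; under that assumption the overlapped evaluation is guaranteed to give the same answer as the sequential one, which is exactly what a compiler for a sequential language has to prove with dependence analysis.

```python
# A sketch only: g and h are made-up, side-effect-free functions.
from concurrent.futures import ThreadPoolExecutor

def g(x):
    return x * x          # touches no shared state

def h(y):
    return y + 1          # touches no shared state

def f(a, b):
    return a - b

def sequential(x, y):
    # Sequential semantics: g runs to completion, then h, then f.
    return f(g(x), h(y))

def overlapped(x, y):
    # With no side effects, g and h may run in either order or in
    # parallel; only the partial order on operations matters.
    with ThreadPoolExecutor(max_workers=2) as pool:
        fg = pool.submit(g, x)
        fh = pool.submit(h, y)
        return f(fg.result(), fh.result())

assert sequential(3, 4) == overlapped(3, 4)
```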
So of course, high-level models are preferable, because you're likely to make fewer errors in them. The question is, can we compile them? And in some cases what turns out is that this is a good model, but it is not general enough. So if you have a data parallel algorithm, fine -- I mean, you should use these things. But not all applications fit the data parallel model; multithreading in some sense is more general.

This is what I mean by a multithreaded model: you may have a main which spawns off things. So not only can these things be done in parallel, but the execution of these can overlap with the execution of f. So parents and children may be executing simultaneously. So if you look at this as an activation tree, this invokes a loop which has many, many iterations in it. When we talk of sequential computing, you're always sitting on one branch of this. So that's the stack: this stack frame, this stack frame -- stack frames. When you talk of parallel computing, then part of this activation tree is concurrently active. So all problems get harder. Storage management gets harder. And what gets especially harder is that there is this global heap. You always have a heap in interesting programs: global data structures. Now the problem is, if this one and this one are active at the same time, or this one and this one are active at the same time, and they are both reading and writing into the data structure, then naturally there's a race condition in this. Even if only 1 guy's writing and 5 people are reading it, it's a question of the guys not reading it too soon. So the moment you do any kind of parallel programming, synchronization issues arise immediately. You know, you always have to worry about, how do I indicate that it's OK to read this? Well, you can follow some control ideas. You say, this guy was writing; he has finished execution, so it must be OK to read.
Or you can go to very fine-grained synchronization: you can have some bit here which says, oh, this data has been updated, now it's OK to unlock it and read it, and so on. So all these ideas have been explored, and they're still being explored in this context. The main takeaway is that instead of a stack you will have a tree which is active at the same time. Many, many things are going on. And the second thing is that there is a competition -- there is a race in reading the heap data structures, and therefore you have to worry that you don't read it too soon or write it too soon, because if you write it too soon you may clobber it before somebody else gets a chance to read it. So these issues are quite serious.

Now it's really possible -- I mean, I claim, you know, I have languages which can express computation very efficiently this way. Efficiently is not the right word: very easily you can express computation in this way. But at the end of the day the goal is to take that stuff and run it somewhere on some parallel machine. Whether it is the Cell processor, or in the old days it would have been parallel machines of various ilks. And that has proven to be quite difficult, efficient mapping of these things, because it turns out that you say, oh, you didn't tell me before that you have only 2 processors -- then I would have expressed the computation differently. Or, you didn't tell me that you have 1000 processors. So yes, there is this kind of virtualization, but it's not virtualization enough. The quantity matters a lot. You know, what is the target? And that affects how you would have expressed the computation.

So I used to go around saying all the time that parallel programming is so important that really sequential programming will be taught as a special case of parallel programming, because it will be all parallel. If you have only one processor, then we can teach you some special tricks for how to run it efficiently on one processor. So this, as you can see, is largely an unrealized dream.
AUDIENCE: That's an old idea, but not one to be dismissed.

ARVIND: Not one to be dismissed. Absolutely. So the question is, has the situation changed? And the situation has certainly changed. So multicores have arrived. This is a big deal. I mean, Microsoft wants to exploit parallelism -- there can be no bigger indication than this. And there is an explosion of cell phones. And if you don't understand what that means economically, you know, there are 100 million PCs that will be sold this year and 950 million cell phones that'll be sold this year. This is the same kind of transition that happened in the early 80s, when we went from mainframes and minicomputers to micros. What happens there determines what happens everywhere else. So it's quite possible that what happens on these small devices, hand-held devices, will determine the architecture of everything else, just because of the numbers. People are willing to invest a lot more money into this. There's the explosion of game boxes.

AUDIENCE: [OBSCURED]

ARVIND: I'm sorry?

AUDIENCE: The explosion of laptops is also happening.

ARVIND: Yeah, but still that number is 100 million, while cell phones are 950 million, at least in 2000.

AUDIENCE: These are cell processors.

ARVIND: Right. Absolutely.

So look, if I'm talking to you as a teacher -- I mean, my message to my colleagues in the department is that it's no longer good enough to say, well, we'll teach you parallelism when you're a sophomore. I mean, it is the thing that people want to deal with. You say, what do you mean, telling me how to program one processor? I have 10 of them in my pocket. So I think we have to deal with this. You know, how do we introduce parallelism? What method should we have for teaching parallel programming so that it is the default programming model in your head? So it's all about parallelism now. It's no longer an advanced topic.
It has to be the first topic.

Just another word of caution on this. So let's look at cell phones. Mine has some bugs in it, so sometimes it misses a call when I'm surfing the web. It doesn't ring. So this is the deep question now: whose fault is it? So for example, to what extent should the phone call software be aware of the web surfing software, or vice versa? So now you have to understand how these things are done. Of course, how the phone calls are made -- that software has been evolving, right? You can look at it over the last 15 years and you'll be able to trace a continuum. You know how phone calls are made, and that's not trivial software on your cell phone. Well, when it comes to the web surfing software, I can guarantee you it was not written from scratch. People went to your PC and said, what do you do on the web? They took all that software and said, let's port it onto the cell phone. Now, the guy who wrote the web software never thought that you would be talking on the telephone while you're surfing the web, or at least not on the same device. And vice versa: the guy who wrote the phone software never thought you would be surfing the web while you're talking on the phone. I mean, even in 1995 or 1998, if somebody had said you'll be surfing the web at the same time, it would have sounded pretty bizarre.

Is it merely a scheduling issue? So this is the answer saying, ah, they didn't process the interrupt properly. I'm sorry, this is not just a scheduling issue. I don't even know the semantics of this. So for example, if you're surfing the web and the phone rings, do you really want to stop surfing the web and pick up the phone? I mean, what's so special about the phone? There is nothing being indicated from the language point of view. It may be an application requirement that you want to stop it or not stop it, and so on.

Is it a performance issue? That, oh, it's because you have only 2 processors in this.
If you had 4, this problem would go away. Or if you could just run it faster it would go away. I don't think any of these answers is right. I mean, the bottom line in this is that sequential modules are often used in concurrent environments with unforeseen consequences. So your mindset has to be that I'm going to put together a parallel program with lots of existing things. Because you can't keep giving the answer, oh, let's just rewrite it again from scratch -- because the amount of software is so enormous. What we should worry about now is how we should design these modules. How should we write pieces of software or design hardware so that we can put together an ensemble of them, a collection of such modules, so that they do something useful? And it's all about parallelism -- how to do all this in a parallel setting.

OK, so new goals as opposed to old goals. So I don't want to think in terms of decomposition. I don't want to think in terms of, here is my clever algorithm, how do I decompose it so that this runs here and this runs here and so on. Instead I want to think in terms of synthesis: that you've already given me lots of modules which are very useful, which perhaps are written by domain experts -- they know something about that stuff -- and can I put together a bigger application very quickly from them? And you know, another favorite example of mine in this regard is FFTW and the linear algebra packages. Both have been optimized to the hilt, right? And if somebody gives you an application which calls FFT and uses enough linear algebra, I guarantee you that program will not be easy to write, in spite of the existence of both these packages, which are extremely well optimized. So there is something we're not paying attention to: how do we bring parallel modules together in such a way that the functionality, the performance, et cetera, are predictable in some sense?

AUDIENCE: I'm sorry, what's your point? I think I agree with your point, I just want to make sure I understand your point:
That FFTW and linear algebra are not enough, and so there will be--

ARVIND: No, no. I think it's very well done, FFTW. Linear algebra is very well done. But we still haven't provided any good way of thinking in terms of two parallel things. You know, what happens when they interact? So if my program is calling FFTW often and then it's calling linear algebra often--

AUDIENCE: You weren't here when I mentioned that, but I was actually bringing up the point of linear algebra and FFTW working together. Just working.

ARVIND: Exactly. Absolutely. I'm saying, in that sense, you see, this is a different way of thinking, because the old way of thinking is, well, give me your application, now let's decompose it. The point is, I don't have experts who can write FFTW as well as Matteo wrote it, or as well as whoever writes the linear algebra packages. I want to use those things. I want to synthesize large parallel programs quickly. Yes?

AUDIENCE: How does the Apple iPhone work with--

ARVIND: I'm sorry, how does?

AUDIENCE: The iPhone.

AUDIENCE: Apple iPhone.

ARVIND: Apple iPhone?

AUDIENCE: Yes, they have the operating system on there--

ARVIND: I think from our point of view that's the same thing. It's just that the user interface is guaranteed to be far superior, because on my phone, when I see the icons -- I'm a techie and I still can't understand half of them. So I like Apple's idea. You know, phone, pictures. [OBSCURED]

PROFESSOR: The way Apple might deal with that is to not let anybody program it. It's going to have a very restrictive development program internally, so they can basically say, I'm going to take this piece from this person and put it together. They will try to do everything in one unified way. So to some extent it might work, but then you suddenly realize Apple can't hire enough programmers to keep up with all the needs. So it may even have to start having other people running things.
ARVIND: Internally, their software is--

AUDIENCE: So is there something parallel going on in there?

ARVIND: Guaranteed, there will be parallel stuff going on there. Guaranteed, because all the communication stuff has to go on in parallel with the computation stuff. So I mean, phones have to do lots of things in parallel, if for no other reason than power. It takes less power when you do things in parallel as opposed to when you do them sequentially, because then you have to run much faster.

OK, so: a method of designing and connecting modules such that functionality and performance are predictable. And it must facilitate natural description of concurrent systems. A method of refining individual modules into hardware or software for systems on a chip. So refinement is a highly technical word. What that means is that you have written a program and you then rewrite it, but nobody can tell from the outside that you rewrote it. All right, so you refine it, and you can refine it into hardware -- I mean, you may implement it in hardware. This is as opposed to transformation. Transformation is automatic: we do something, and it's guaranteed to be correct. With refinement, you may have to work a little bit more to show that it is correct in some sense -- that it's a correct refinement of whatever you were doing.

So basically, just getting the application going, which, as Allen is pointing out, is the major task, and this is true regardless of what people say. When you get to a complex system, if people start talking about performance, you're 90% there. That means the damn thing works. So even if people say, oh, performance will be lousy -- most of the energy is just consumed in getting it to work. So never forget that. So the refinement idea becomes very important, because you want to take as many things as possible which are available, make it work, and then individually refine: modular refinement of things.
AUDIENCE: Maybe just to also underscore, just to emphasize what you're saying, Arvind: our salesmen will go and say explicitly that they can sell a factor of 2, just like you were saying. Just a factor of 2 on 100 processors or whatever -- they can sell that if it works, if it gives the right answer.

AUDIENCE: It's the ease.

ARVIND: The ease, and the confidence that it's really doing what you expect it to do.

AUDIENCE: So it's really performance second, ease of use first. We've got it backwards in academia.

ARVIND: But everybody has set their sights on multicores.

So why do multicore problems become hard? And this is where hardware and software start diverging a little bit. So what happens in hardware design? Everything is in parallel. It's sitting right there: this block does this, this block does that. So there's no confusion in your mind about what's going on in parallel. If you have to multiplex something in hardware -- that first we'll do this, then we'll do that -- that's also expressed explicitly in the design. Software is not like that. You know, software has n threads, but I have only 10 in my hardware. So there is another layer of software, or operating system, or whatever you want to call it -- a runtime system -- whose sole job is to virtualize, or to do time multiplexing of, the underlying resources, and that makes the problem harder. You know, that makes the problem harder: whether you're doing it in a way that preserves some performance goals, whether you're doing it so that you won't cause deadlocks -- there are many, many issues that come up in this. And basically, when I said this is an ideas kind of talk: I don't know how to do this yet. I still haven't worked enough on this problem of how to take this kind of methodology, where I can write these parallel modules and compose them, and map it onto multicores.

OK, so now let me tell you something technical about this stuff. So this is a hardware-inspired methodology for synthesizing parallel programs.
Now I'm going to use some fancy words. So this is rule-based specification, or what I call guarded atomic actions. I will explain in a second what that means. It will let you think about parallel systems in a very different way. So it's like saying, look, if this register is 2 and this register is 3, you can add them and put them here: 5. And you can do this any time you want. So this is like a rule, an invariant. You're saying, I don't care what's happening in the machine; this is always safe to do. Now what I'm going to ask you to do is take a huge intellectual jump from this: that if you give me enough such rules, you would have completely described the hardware, what it does. Now, why am I thinking like this? Because if these invariants are stated properly, it's always possible to understand the behavior of the system as if one thing at a time is happening. When 2 plus 3 is becoming 5, if you want, you can have this mental picture: the rest of the world is frozen. Nothing is changing there; only that change is taking place. And you can think in terms of these small, small changes, and all legal behaviors will be explainable as some sequence of these changes. It's possible to describe all kinds of hardware using things like that. Composition of modules with guarded interfaces.

So this is the language called Bluespec, and it has very strong echoes of a language that was designed in the 80s by two guys, Chandy and Misra, a language called Unity. But they never used it for hardware design. Their goals were different, and they didn't really have a concept of a module in that language.

So let me show you 3 examples here: greatest common divisor, just to get off the ground, and then I will show you 2 very different problems, the airline reservation query problem and the video codec H.264. We won't have time to get into the ordered list example, but if somebody wants to discuss that I'd be happy to show you.
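Here is a minimal sketch in Python (not Bluespec) of the idea just described, assuming a made-up state of three registers a, b, and c: a rule is a guard plus an atomic action, and execution repeatedly picks any one enabled rule and fires it while the rest of the state is conceptually frozen.

```python
import random

# Made-up state for the 2 + 3 -> 5 example: three "registers".
state = {"a": 2, "b": 3, "c": 0}

# A rule is a (guard, action) pair. The action may fire only when the
# guard holds, and a firing is all or nothing: every update it makes
# happens, or none of them do.
add_rule = (
    lambda s: s["a"] == 2 and s["b"] == 3,       # guard
    lambda s: s.update(c=s["a"] + s["b"]),       # atomic action
)

rules = [add_rule]        # a real design would have many such rules

def step(s):
    # One rule at a time: nondeterministically pick any enabled rule.
    enabled = [rule for rule in rules if rule[0](s)]
    if enabled:
        _, action = random.choice(enabled)
        action(s)

step(state)
assert state["c"] == 5
```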
676 00:33:35,560 --> 00:33:41,510 OK, so now let's look at how do I think of my design of my 677 00:33:41,510 --> 00:33:43,640 program in Bluespec? 678 00:33:43,640 --> 00:33:45,300 I always think in terms of a bunch of modules. 679 00:33:47,890 --> 00:33:48,880 And what's a module. 680 00:33:48,880 --> 00:33:52,880 A module is going to have some state elements in it. 681 00:33:52,880 --> 00:33:57,020 If you're thinking software just think each module guards 682 00:33:57,020 --> 00:34:00,580 some variables, which nobody else can touch. 683 00:34:00,580 --> 00:34:02,340 You're the only one. 684 00:34:02,340 --> 00:34:06,560 So you have a variable xyz, he has some abcd, et cetera. 685 00:34:06,560 --> 00:34:08,750 So each module has some. 686 00:34:08,750 --> 00:34:10,650 That's what I'm showing in red. 687 00:34:10,650 --> 00:34:13,680 If you were thinking hardware, think of thse things as 688 00:34:13,680 --> 00:34:15,820 stateful elements like registers or 689 00:34:15,820 --> 00:34:18,170 flip flops of memories. 690 00:34:18,170 --> 00:34:22,960 But the main point is if it is here then it's not here. 691 00:34:22,960 --> 00:34:26,040 You own something, this module owns something, this module 692 00:34:26,040 --> 00:34:26,300 owns something. 693 00:34:26,300 --> 00:34:28,030 One. 694 00:34:28,030 --> 00:34:33,010 Second thing is every module has internal rules for 695 00:34:33,010 --> 00:34:36,260 manipulating the state and the rules are always going to be 696 00:34:36,260 --> 00:34:40,050 of the form that if some condition is true on this 697 00:34:40,050 --> 00:34:44,130 you're allowed to make the following changes to that. 698 00:34:44,130 --> 00:34:45,800 So that's what the rule looks like. 699 00:34:45,800 --> 00:34:49,600 Some condition, if it holds that by action I 700 00:34:49,600 --> 00:34:51,350 mean, change the state. 701 00:34:51,350 --> 00:34:53,310 Change the state of those variables. 702 00:34:53,310 --> 00:34:54,690 And it's all or nothing. 703 00:34:54,690 --> 00:34:57,180 So if you have 2 variables and if you want to change both of 704 00:34:57,180 --> 00:35:01,030 them in a rule then the execution of a rule means 705 00:35:01,030 --> 00:35:04,340 either both of them will change or neither will change. 706 00:35:04,340 --> 00:35:06,320 There's nothing in between you can see. 707 00:35:06,320 --> 00:35:08,440 That's disallowed by this model. 708 00:35:13,250 --> 00:35:14,780 I mean, the reason I have modules is 709 00:35:14,780 --> 00:35:16,240 they're not totally disjoined. 710 00:35:16,240 --> 00:35:18,940 You can actually access the state. 711 00:35:18,940 --> 00:35:23,350 Both read it and write it, but you can't just arbitrarily 712 00:35:23,350 --> 00:35:24,540 reach it and do something. 713 00:35:24,540 --> 00:35:27,680 You have to go through some interface methods the moment 714 00:35:27,680 --> 00:35:29,520 you talk to some other modules. 715 00:35:29,520 --> 00:35:32,610 So this is the standard information hiding principle. 716 00:35:32,610 --> 00:35:36,150 You know, abstract data types, whatever you want to call it. 717 00:35:36,150 --> 00:35:38,360 You know, so there's an interface through which you 718 00:35:38,360 --> 00:35:41,570 will enter and if you're just reading the values I think we 719 00:35:41,570 --> 00:35:45,550 call them read methods or value methods and if you're 720 00:35:45,550 --> 00:35:48,380 going to effect the state of one of the modules we'll call 721 00:35:48,380 --> 00:35:50,380 those action methods. 
So now, a very strange execution model here -- very, very different from the sequential execution model. So repeatedly you do the following: you pick a rule to execute. Any rule, I don't care. You select a rule to execute, compute the state updates that should be made, and make the state update. And then you go and do it again; repeatedly you do this. So this is a highly nondeterministic thing, this selection of a rule. There may be a gazillion rules in your system. Facilities are provided to control the selection if you want, but that's a detail that we won't get into. So does everybody get the model here? You have a lot of state elements and you have a lot of rules. Many rules may be applicable; pick one of them to execute. Execute it, and that will change the state. Ask the same question again: which rules are enabled? Now pick any rule amongst them, and keep doing it. Now, the semantics says one rule at a time. In any implementation we're going to do many, many rules in parallel, but you don't have to worry about that. That will all be done automatically, so that we can do many rules in parallel.

OK, so now let's look at GCD. So this is an ordinary GCD program. You can write it in any language you want: if y is zero, then x; otherwise, if x is greater than y, GCD of y, x; otherwise you subtract. So no problem, and I'm sure you know how it executes. So if I were to take GCD of 6, 15, you will see there is a recursive call that'll go on -- because 6 is less than 15, so it will be this one. You get this, and you get this, et cetera. Ultimately you get 3 as the answer. Everybody is with me? You all know GCD? You can all write it in your favorite language. Now the question is, what does it mean to execute this program in a concurrent setting? We were not thinking parallel or sequential, right? This is GCD.
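A plain sketch of the GCD just described, in Python rather than whatever language is on the slide (the exact syntax there is not captured in the transcript), along with the 6, 15 trace:

```python
def gcd(x, y):
    if y == 0:
        return x
    elif x > y:
        return gcd(y, x)        # swap the two arguments
    else:
        return gcd(x, y - x)    # subtract the smaller from the larger

# The trace described above:
# gcd(6, 15) -> gcd(6, 9) -> gcd(6, 3) -> gcd(3, 6)
#            -> gcd(3, 3) -> gcd(3, 0) -> 3
assert gcd(6, 15) == 3
```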
769 00:38:14,430 --> 00:38:16,550 Someone who's teaching this class says run it in parallel. 770 00:38:16,550 --> 00:38:19,450 You say, what do you mean, run it in parallel? 771 00:38:19,450 --> 00:38:23,250 Oh, you mean that maybe I should try to do several GCD 772 00:38:23,250 --> 00:38:26,160 calls, recursive calls in parallel? 773 00:38:26,160 --> 00:38:29,070 That also doesn't make too much sense in this. 774 00:38:29,070 --> 00:38:30,400 Perhaps, this is what someone meant. 775 00:38:30,400 --> 00:38:33,100 You know that if he has some program where there are 2 776 00:38:33,100 --> 00:38:37,960 calls to GCD, he would like you to do those 2 in parallel. 777 00:38:43,160 --> 00:38:44,860 I mean, what does it mean to do GCD in parallel? 778 00:38:49,140 --> 00:38:53,420 Now let me contrast this with how you would think about this 779 00:38:53,420 --> 00:38:55,990 problem as a hardware person. 780 00:38:55,990 --> 00:38:58,940 So your job is to build a GCD machine. 781 00:38:58,940 --> 00:38:59,820 You're going to make millions of 782 00:38:59,820 --> 00:39:03,030 dollars, GCD is very popular. 783 00:39:03,030 --> 00:39:05,160 So you build this GCD machine. 784 00:39:05,160 --> 00:39:08,030 You know, it has 2 inputs, it has an output. 785 00:39:08,030 --> 00:39:10,580 And this will be my module I'm going to sell this 786 00:39:10,580 --> 00:39:14,280 intellectual property to everyone. 787 00:39:14,280 --> 00:39:20,330 OK, so now let me talk of parallel invocations of this. 788 00:39:20,330 --> 00:39:23,790 You see in hardware there can be no confusion, either you 789 00:39:23,790 --> 00:39:28,300 have 2 GCD boxes or you have 1 GCD box. 790 00:39:28,300 --> 00:39:30,430 I mean, you can have as many boxes as you want, but there 791 00:39:30,430 --> 00:39:32,170 is no confusion about that. 792 00:39:32,170 --> 00:39:36,330 You know how many GCDs you have. So in some sense, if you 793 00:39:36,330 --> 00:39:40,520 have many of them, you're talking of independent calls. 794 00:39:40,520 --> 00:39:42,400 Who knows what about recursive calls? 795 00:39:42,400 --> 00:39:44,310 I mean, that's internal story. 796 00:39:44,310 --> 00:39:46,220 You know, that's a different level of questions. 797 00:39:46,220 --> 00:39:50,110 So this question will automatically get asked. 798 00:39:50,110 --> 00:39:52,720 In hardware you will ask the question, does the answer come 799 00:39:52,720 --> 00:39:55,820 out immediately? 800 00:39:55,820 --> 00:39:58,110 Does it come out in particular time? 801 00:39:58,110 --> 00:39:59,750 Why's this question important? 802 00:39:59,750 --> 00:40:01,790 Because if it's going to take some time then while it's 803 00:40:01,790 --> 00:40:03,800 computing the question I'm going to ask is can I give 804 00:40:03,800 --> 00:40:05,050 another set of inputs? 805 00:40:07,540 --> 00:40:09,330 Are you with me? 806 00:40:09,330 --> 00:40:11,250 I mean, that's a legitimate question to ask. 807 00:40:11,250 --> 00:40:15,160 If it's going to take half an hour to compute the GCD can I 808 00:40:15,160 --> 00:40:19,310 give it another input while it's thinking? 809 00:40:19,310 --> 00:40:22,160 Can the machine be shared by 2 different users? 810 00:40:25,020 --> 00:40:27,700 Can it be pipelined? 811 00:40:27,700 --> 00:40:30,300 So you agree that all these questions are meaningful in 812 00:40:30,300 --> 00:40:31,620 hardware setting? 813 00:40:31,620 --> 00:40:34,870 And I claim all these questions are also meaningful 814 00:40:34,870 --> 00:40:36,290 in software setting. 
815 00:40:36,290 --> 00:40:37,540 We just don't think like that. 816 00:40:41,110 --> 00:40:44,390 This is exactly the problem we are having right now when we 817 00:40:44,390 --> 00:40:46,260 say that we have optimized something 818 00:40:46,260 --> 00:40:49,710 and we call it again. 819 00:40:49,710 --> 00:40:53,380 We think of it as starting afresh, maybe not. 820 00:40:53,380 --> 00:40:54,000 Maybe not. 821 00:40:54,000 --> 00:40:55,700 You know, maybe we should think of 822 00:40:55,700 --> 00:40:57,210 it in terms of resources. 823 00:41:00,370 --> 00:41:04,560 And the point I want to make is I want to think of the GCD 824 00:41:04,560 --> 00:41:07,950 module as a resource. 825 00:41:07,950 --> 00:41:11,360 Even in software I want to think of it as a resource. 826 00:41:11,360 --> 00:41:14,140 So that there is no ambiguity in your mind whether you have 827 00:41:14,140 --> 00:41:15,670 1 GCD or you have 2 GCDs. 828 00:41:18,750 --> 00:41:21,870 If you're going to multiplex it we'll write it differently. 829 00:41:21,870 --> 00:41:24,090 If you're going to have 2 independent ones, which can go 830 00:41:24,090 --> 00:41:28,050 on in parallel, we'll write it differently. 831 00:41:28,050 --> 00:41:31,650 OK, so that's the idea I want to borrow from the hardware 832 00:41:31,650 --> 00:41:33,820 side of this. 833 00:41:33,820 --> 00:41:35,930 So how would you do-- yes? 834 00:41:35,930 --> 00:41:40,120 AUDIENCE: But how would you decide how many GCDs to 835 00:41:40,120 --> 00:41:40,800 instantiate? 836 00:41:40,800 --> 00:41:43,910 Would that be, in this model, left up to the programmer? 837 00:41:43,910 --> 00:41:45,340 ARVIND: Well, you see that's the big difference between 838 00:41:45,340 --> 00:41:46,350 hardware and software. 839 00:41:46,350 --> 00:41:50,070 That in hardware you have no design until you have taken 840 00:41:50,070 --> 00:41:51,920 that decision. 841 00:41:51,920 --> 00:41:53,060 AUDIENCE: Right. 842 00:41:53,060 --> 00:41:57,440 I mean, hardware designs, once it's built, it's built. 843 00:41:57,440 --> 00:42:00,910 Software designs, you want the same module of software to 844 00:42:00,910 --> 00:42:03,360 seamlessly run on different kinds of hardware. 845 00:42:03,360 --> 00:42:06,230 ARVIND: That doesn't mean that I can't recompile it or can't 846 00:42:06,230 --> 00:42:08,300 resynthesize it or something. 847 00:42:08,300 --> 00:42:10,700 So it may be the same source description in software. 848 00:42:10,700 --> 00:42:12,510 AUDIENCE: Right, if your source description specifies 849 00:42:12,510 --> 00:42:17,160 2 GCD modules and you have 8 cores, you're not necessarily 850 00:42:17,160 --> 00:42:18,590 going to take very good advantage of 851 00:42:18,590 --> 00:42:19,930 this, so how do you-- 852 00:42:19,930 --> 00:42:20,320 ARVIND: No, no. 853 00:42:20,320 --> 00:42:23,900 The way I would like to think about that is that it'll be 854 00:42:23,900 --> 00:42:26,120 highly parameterized code and you will plug in some 855 00:42:26,120 --> 00:42:29,310 information like that and resynthesize 856 00:42:29,310 --> 00:42:30,650 the parallel program. 857 00:42:30,650 --> 00:42:33,450 AUDIENCE: So the software or some sort of macro meta 858 00:42:33,450 --> 00:42:38,870 software has to determine the extent to which the problem 859 00:42:38,870 --> 00:42:41,490 should be distributed. 860 00:42:41,490 --> 00:42:44,950 ARVIND: Well, you can think like that, but really this 861 00:42:44,950 --> 00:42:46,790 integration will be much tighter than you think.
862 00:42:46,790 --> 00:42:51,930 So for example, let me just give you some more insight 863 00:42:51,930 --> 00:42:53,510 from the hardware side. 864 00:42:53,510 --> 00:42:56,500 So on the hardware side it's standard practice that I'll 865 00:42:56,500 --> 00:42:59,700 have some RTL code, I'll have some code. 866 00:42:59,700 --> 00:43:01,400 And I plug in the library. 867 00:43:01,400 --> 00:43:02,820 What kind of gates have I got? 868 00:43:02,820 --> 00:43:05,830 Have I got 2 input gates, 4 input gates, whatever. 869 00:43:05,830 --> 00:43:07,750 There will be a huge library. 870 00:43:07,750 --> 00:43:12,670 The same code can be compiled using different libraries. 871 00:43:12,670 --> 00:43:16,310 So you choose something at synthesis time. 872 00:43:16,310 --> 00:43:19,140 Another thing that'll happen in hardware is what is called 873 00:43:19,140 --> 00:43:20,800 generic statements. 874 00:43:20,800 --> 00:43:25,940 So you may have a loop which assigns a 875 00:43:25,940 --> 00:43:28,830 register file with n registers. 876 00:43:28,830 --> 00:43:31,100 Which conceptually makes sense, it will make sense in 877 00:43:31,100 --> 00:43:33,460 simulation, but when it comes to synthesizing it you 878 00:43:33,460 --> 00:43:35,600 have to specify n. 879 00:43:35,600 --> 00:43:39,240 So I'm going towards that kind of methodology. 880 00:43:39,240 --> 00:43:41,840 That you will have highly parameterized code, but many 881 00:43:41,840 --> 00:43:45,150 parameters you have to specify before you will say, now my 882 00:43:45,150 --> 00:43:48,350 program is ready to run on this parallel machine. 883 00:43:48,350 --> 00:43:51,170 And this is totally unexplored. 884 00:43:51,170 --> 00:43:53,720 As I said, this is at the idea level at the moment. 885 00:43:53,720 --> 00:43:56,870 AUDIENCE: So this adds a problem that is not in 886 00:43:56,870 --> 00:43:58,450 traditional hardware synthesis, namely 887 00:43:58,450 --> 00:44:00,380 picking all the n's. 888 00:44:00,380 --> 00:44:03,110 Ideally you wouldn't want to have the programmer 889 00:44:03,110 --> 00:44:04,610 come up with creative algorithms because they're 890 00:44:04,610 --> 00:44:05,770 going to have lots and lots of-- 891 00:44:05,770 --> 00:44:07,010 ARVIND: Absolutely. 892 00:44:07,010 --> 00:44:08,770 We may have lots of defaults. 893 00:44:08,770 --> 00:44:11,890 We may have another level of smart programs, which look at this 894 00:44:11,890 --> 00:44:14,590 and they instantly go and set many parameters. 895 00:44:14,590 --> 00:44:17,300 You know, so there will be a practical side, 896 00:44:17,300 --> 00:44:18,480 many issues like that. 897 00:44:18,480 --> 00:44:21,690 But the main point I'm trying to make is that I really want to 898 00:44:21,690 --> 00:44:23,460 synthesize. 899 00:44:23,460 --> 00:44:27,470 I want to instantiate a program, synthesize a program 900 00:44:27,470 --> 00:44:32,710 for a given configuration of hardware. 901 00:44:32,710 --> 00:44:36,180 Let's look at the GCD in Bluespec. 902 00:44:36,180 --> 00:44:40,700 If I'm going to find the GCD of 2 numbers I need 2 903 00:44:40,700 --> 00:44:43,430 registers, x and y; so that's what this is. 904 00:44:43,430 --> 00:44:45,950 Make me a register. 905 00:44:45,950 --> 00:44:47,530 If you want, make me 2 variables. 906 00:44:47,530 --> 00:44:49,620 You know, make me variables x and y, 907 00:44:49,620 --> 00:44:52,330 initial values are zero. 908 00:44:52,330 --> 00:44:56,160 So this is what is called the state of the module.
909 00:44:56,160 --> 00:44:57,930 It knows about x and y, nobody else 910 00:44:57,930 --> 00:45:00,490 knows about x and y. 911 00:45:00,490 --> 00:45:01,770 What are the dynamics of it? 912 00:45:01,770 --> 00:45:05,090 How do I compute this? 913 00:45:05,090 --> 00:45:08,960 So the internal behavior is being described and this is exactly 914 00:45:08,960 --> 00:45:12,240 what you saw earlier in saying that there is a swap rule. 915 00:45:12,240 --> 00:45:15,520 If x is greater than y and y is not equal to zero then you 916 00:45:15,520 --> 00:45:18,556 swap x and y, so this is a parallel composition. x gets 917 00:45:18,556 --> 00:45:19,806 y, y gets x. 918 00:45:23,480 --> 00:45:27,480 In this system all the reads take place instantaneously and 919 00:45:27,480 --> 00:45:30,930 then you do all the writes at the end of it. 920 00:45:30,930 --> 00:45:34,720 Don't read this sequentially, you read x and y in parallel 921 00:45:34,720 --> 00:45:38,210 and then you go and update x and y. 922 00:45:38,210 --> 00:45:39,320 Does this rule make sense? 923 00:45:39,320 --> 00:45:41,110 I mean, if you know anything about GCD-- 924 00:45:41,110 --> 00:45:44,120 if x is greater than y then you're going to swap it, and 925 00:45:44,120 --> 00:45:47,830 subtraction, if x is less than or equal to y then y 926 00:45:47,830 --> 00:45:50,190 gets y minus x. 927 00:45:50,190 --> 00:45:53,210 So this is very interesting. 928 00:45:53,210 --> 00:45:57,180 Now you have 2 registers x and y and I've given 2 rules and 929 00:45:57,180 --> 00:46:00,650 I'm saying you can apply these rules anytime you want. 930 00:46:00,650 --> 00:46:03,580 Just repeatedly keep doing these rules. 931 00:46:03,580 --> 00:46:05,450 Now what does the outside world know about this? 932 00:46:08,400 --> 00:46:10,630 What do you want to advertise about GCD 933 00:46:10,630 --> 00:46:11,870 to the outside world? 934 00:46:11,870 --> 00:46:13,670 You don't want to give away your secret. 935 00:46:13,670 --> 00:46:16,280 This is your intellectual property of how you're 936 00:46:16,280 --> 00:46:18,530 computing the GCD. 937 00:46:18,530 --> 00:46:21,090 You just want to be able to use GCD. 938 00:46:21,090 --> 00:46:25,210 So the outside world can say, oh, fine, start it. 939 00:46:25,210 --> 00:46:29,750 With a and b. x gets a, y gets b. 940 00:46:29,750 --> 00:46:33,540 It's going to give you 2 parameters, but we may be busy 941 00:46:33,540 --> 00:46:37,730 computing so we have a guard here which says, if y is 942 00:46:37,730 --> 00:46:39,240 zero-- this is internal, the outside 943 00:46:39,240 --> 00:46:41,360 world doesn't see this. 944 00:46:41,360 --> 00:46:45,890 If y is zero then, only then, can you start it, otherwise 945 00:46:45,890 --> 00:46:46,650 you can't start it. 946 00:46:46,650 --> 00:46:50,290 Otherwise it means it's busy computing. 947 00:46:50,290 --> 00:46:52,930 OK, similarly when is the result available? 948 00:46:52,930 --> 00:46:54,970 When y is zero then you have the result and 949 00:46:54,970 --> 00:46:56,640 you will return x. 950 00:46:56,640 --> 00:46:58,720 AUDIENCE: [OBSCURED] 951 00:46:58,720 --> 00:46:59,280 ARVIND: That's right. 952 00:46:59,280 --> 00:47:02,260 So that'll be more sophisticated. 953 00:47:02,260 --> 00:47:05,000 And that's exactly the kind of decision I do want the 954 00:47:05,000 --> 00:47:08,700 designers to take because you may want to put a FIFO here. 955 00:47:08,700 --> 00:47:11,150 You may just keep spitting out results.
956 00:47:11,150 --> 00:47:12,810 You can make it as sophisticated as you want. 957 00:47:12,810 --> 00:47:14,250 Tag it if you want. 958 00:47:14,250 --> 00:47:16,790 No predetermined thing. 959 00:47:22,540 --> 00:47:24,400 So the first question I ask you, what happened to those 960 00:47:24,400 --> 00:47:25,650 recursive calls? 961 00:47:30,850 --> 00:47:34,080 You know, if you go back to the previous slide there was a 962 00:47:34,080 --> 00:47:37,940 recursive call to GCD, which you were all comfortable with. 963 00:47:37,940 --> 00:47:42,154 There is no GCD call here. 964 00:47:42,154 --> 00:47:44,280 AUDIENCE: [OBSCURED] 965 00:47:44,280 --> 00:47:46,020 ARVIND: Right, so there's a notion of a 966 00:47:46,020 --> 00:47:48,590 cycle event or something. 967 00:47:48,590 --> 00:47:51,040 Look at it now, update it, look at it again. 968 00:47:51,040 --> 00:47:54,040 You know, so there's like an infinite loop here, which is 969 00:47:54,040 --> 00:47:55,710 always going. 970 00:47:55,710 --> 00:47:57,280 Which is exactly how hardware works. 971 00:48:00,660 --> 00:48:02,570 It doesn't have to be synchronous, but if you want 972 00:48:02,570 --> 00:48:04,960 to have a simple model in your head just 973 00:48:04,960 --> 00:48:06,580 think, there's the clock. 974 00:48:06,580 --> 00:48:09,730 You know, look at this state, pick one rule to execute, 975 00:48:09,730 --> 00:48:13,370 update it, then it's the next clock cycle. 976 00:48:13,370 --> 00:48:16,220 Keep doing this repeatedly. 977 00:48:16,220 --> 00:48:19,510 There are details here, but I can take such a description and 978 00:48:19,510 --> 00:48:23,810 actually produce that machine for you, which does the GCD. 979 00:48:23,810 --> 00:48:26,880 What I'm much more interested in is, this is how I want to 980 00:48:26,880 --> 00:48:29,760 think about my GCD now. 981 00:48:29,760 --> 00:48:32,450 So I've hidden everything inside it. 982 00:48:32,450 --> 00:48:37,190 And it has 2 methods: start and result, and there is a very 983 00:48:37,190 --> 00:48:41,450 strong notion of when a method is ready-- 984 00:48:41,450 --> 00:48:43,320 because it may be busy computing, it doesn't want to 985 00:48:43,320 --> 00:48:45,420 listen to you. 986 00:48:45,420 --> 00:48:48,930 So there is this notion of something being ready and if 987 00:48:48,930 --> 00:48:52,260 it's an action method then you're only allowed to enable 988 00:48:52,260 --> 00:48:55,560 it when it's ready. 989 00:48:55,560 --> 00:48:57,320 So this is a high-level protocol that the 990 00:48:57,320 --> 00:48:58,420 compiler will enforce. 991 00:48:58,420 --> 00:49:03,950 It'll never ever set this, it'll never execute this 992 00:49:03,950 --> 00:49:06,700 unless it is ready to be executed. 993 00:49:06,700 --> 00:49:08,690 And of course, when you're going to enable it then you 994 00:49:08,690 --> 00:49:13,010 have to give me 2 arguments that go with it. 995 00:49:13,010 --> 00:49:14,290 And what does this really mean? 996 00:49:16,990 --> 00:49:19,930 The result is valid. 997 00:49:19,930 --> 00:49:21,900 So if we were to do types et cetera, this would easily be a 998 00:49:21,900 --> 00:49:24,500 Maybe type or something. 999 00:49:24,500 --> 00:49:28,490 A tagged union type where validity is being encoded here. 1000 00:49:32,230 --> 00:49:35,080 So really to the world you are just 1001 00:49:35,080 --> 00:49:37,430 advertising this interface. 1002 00:49:37,430 --> 00:49:40,520 There is a GCD interface and it has 2 methods.
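For readers following along, here is a minimal sketch in Bluespec SystemVerilog of the GCD module being described; the register, rule, and method names follow the talk, but this is a reconstruction, not the exact code on the slides.

    interface GCD;
       method Action start(Int#(32) a, Int#(32) b);
       method Int#(32) result();
    endinterface

    module mkGCD (GCD);
       // The state this module owns: nobody outside can touch x and y.
       Reg#(Int#(32)) x <- mkReg(0);
       Reg#(Int#(32)) y <- mkReg(0);

       // Internal rules: all reads happen "at once", then the writes,
       // so the two assignments in swap really do exchange x and y.
       rule swap (x > y && y != 0);
          x <= y;
          y <= x;
       endrule

       rule subtract (x <= y && y != 0);
          y <= y - x;
       endrule

       // Interface methods, each with a guard: y == 0 means "not busy".
       method Action start(Int#(32) a, Int#(32) b) if (y == 0);
          x <= a;
          y <= b;
       endmethod

       method Int#(32) result() if (y == 0);
          return x;
       endmethod
    endmodule

Note how the same condition, y == 0, serves both as "the result is available" and as "you may start me again", which is exactly the guard being described here.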
1003 00:49:40,520 --> 00:49:44,100 An action method, which is called start, and a result 1004 00:49:44,100 --> 00:49:46,650 method, which is just a value kind of a thing. 1005 00:49:46,650 --> 00:49:48,470 And in order to do this you have to give me an 1006 00:49:48,470 --> 00:49:50,720 int a and an int b in this. 1007 00:49:50,720 --> 00:49:52,650 An endinterface. 1008 00:49:52,650 --> 00:49:55,670 Now this is borrowing a lot from modern programming 1009 00:49:55,670 --> 00:50:02,020 languages so for example, I can easily make it 1010 00:50:02,020 --> 00:50:03,570 polymorphic. 1011 00:50:03,570 --> 00:50:05,870 It doesn't have to be ints here. 1012 00:50:05,870 --> 00:50:07,270 You know, what is the meaning of int? 1013 00:50:11,950 --> 00:50:15,390 You can specify the type in this. 1014 00:50:15,390 --> 00:50:20,490 So it'll be a's of type t, b's of type t, et cetera. 1015 00:50:20,490 --> 00:50:23,620 What are the examples of type t here? 1016 00:50:23,620 --> 00:50:26,990 It could be a 32 bit integer, it could be a 16 bit integer, it 1017 00:50:26,990 --> 00:50:30,170 could be a 17 bit integer, whatever you want and whatever 1018 00:50:30,170 --> 00:50:31,410 makes sense. 1019 00:50:31,410 --> 00:50:34,210 So this is the kind of thing that you always take care of 1020 00:50:34,210 --> 00:50:36,250 when you synthesize things. 1021 00:50:36,250 --> 00:50:39,990 It won't synthesize until you specify the type, but the 1022 00:50:39,990 --> 00:50:42,240 description remains the same. 1023 00:50:42,240 --> 00:50:45,430 And you will instantiate it for a given type. 1024 00:50:45,430 --> 00:50:47,470 This I'm showing you for one reason. 1025 00:50:47,470 --> 00:50:50,880 If you are coming from the hardware side, really these 1026 00:50:50,880 --> 00:50:54,400 ideas of types and instantiations are very sophisticated. 1027 00:50:54,400 --> 00:50:57,240 I mean, this is getting as advanced as 1028 00:50:57,240 --> 00:51:00,990 you can have in software. 1029 00:51:00,990 --> 00:51:03,490 The other very interesting thing is, and this is the 1030 00:51:03,490 --> 00:51:06,880 abstract data type kind of thinking, you can go and 1031 00:51:06,880 --> 00:51:11,280 completely change the implementation of this. 1032 00:51:11,280 --> 00:51:15,240 If you have a clever algorithm for doing GCD, fine, go 1033 00:51:15,240 --> 00:51:15,800 ahead and do it. 1034 00:51:15,800 --> 00:51:19,280 You know, it doesn't affect the users of this. 1035 00:51:19,280 --> 00:51:22,410 So this is a very important property for composition. 1036 00:51:22,410 --> 00:51:25,770 All I insist on is that every method has this ready thing 1037 00:51:25,770 --> 00:51:29,690 coming out so that I don't make mistakes in wiring it. 1038 00:51:29,690 --> 00:51:32,240 I will invoke this method only when it's ready. 1039 00:51:32,240 --> 00:51:36,020 So in some sense if you forget about the ready thing, I'm 1040 00:51:36,020 --> 00:51:39,480 just borrowing ideas from object oriented languages. 1041 00:51:39,480 --> 00:51:42,600 You know, you have a class, you have methods on it, and 1042 00:51:42,600 --> 00:51:44,940 all I'm saying is oh, you should manipulate this state 1043 00:51:44,940 --> 00:51:48,280 in a very systematic manner using these methods. 1044 00:51:48,280 --> 00:51:51,790 But then I'm injecting some hardware idea here, which 1045 00:51:51,790 --> 00:51:54,280 doesn't exist in software today. 1046 00:51:54,280 --> 00:51:58,030 That is, oh, a method may not be ready.
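Two hedged follow-ups to the sketch above. The first shows what the polymorphic interface he mentions might look like (the proviso list a real implementation would need is my assumption); the second is a hypothetical client showing the "ready" protocol being enforced: the guards on start and result simply become implicit conditions of the client's rules, so the compiler never lets them fire while the module is busy.

    // Polymorphic variant: the payload type t is fixed only when you
    // instantiate (synthesize) the module.  An implementation would need
    // provisos such as Bits#(t, tSz), Eq#(t), Ord#(t), Arith#(t).
    interface PolyGCD#(type t);
       method Action start(t a, t b);
       method t result();
    endinterface

    // Hypothetical client of the mkGCD sketched earlier.
    module mkTestGCD (Empty);
       GCD gcd <- mkGCD;
       Reg#(Bool) sent <- mkReg(False);

       rule feed (!sent);         // also waits for gcd.start to be ready
          gcd.start(6, 15);
          sent <= True;
       endrule

       rule report (sent);        // also waits for gcd.result to be ready
          $display("gcd(6,15) = %d", gcd.result);
          $finish(0);
       endrule
    endmodule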
1047 00:51:58,030 --> 00:52:01,490 And even that can be captured very abstractly and done 1048 00:52:01,490 --> 00:52:03,270 properly in this. 1049 00:52:03,270 --> 00:52:04,950 Yep? 1050 00:52:04,950 --> 00:52:07,366 AUDIENCE: So then in the case of making models I was 1051 00:52:07,366 --> 00:52:08,610 thinking of spy nature. 1052 00:52:08,610 --> 00:52:12,620 So if you want to basically synthesize the hardware you 1053 00:52:12,620 --> 00:52:15,880 may also get better efficiency by having a sort of synchronized 1054 00:52:15,880 --> 00:52:18,958 protocol by taking the output [OBSCURED] 1055 00:52:18,958 --> 00:52:22,150 number of cycles. 1056 00:52:22,150 --> 00:52:23,770 ARVIND: I think, yes. 1057 00:52:23,770 --> 00:52:26,670 That'll be a low-level detail, that if it is synchronized 1058 00:52:26,670 --> 00:52:31,030 then you will actually try to manipulate it like that. 1059 00:52:31,030 --> 00:52:33,000 AUDIENCE: [OBSCURED] 1060 00:52:33,000 --> 00:52:35,160 ARVIND: OK, right. 1061 00:52:35,160 --> 00:52:38,475 So let me just quickly say what the language is. 1062 00:52:38,475 --> 00:52:40,935 So you have modules, then you have state variables, and you 1063 00:52:40,935 --> 00:52:43,690 have rules and you have action methods and read methods. 1064 00:52:43,690 --> 00:52:45,720 What I wanted to show you is what is 1065 00:52:45,720 --> 00:52:48,290 the language of actions. 1066 00:52:48,290 --> 00:52:52,630 So when I write an action here or an action here, what is it? 1067 00:52:52,630 --> 00:52:56,500 So the simplest action is assignment to a variable, 1068 00:52:56,500 --> 00:52:59,390 assignment to a register. 1069 00:52:59,390 --> 00:53:03,150 This is a conditional action, so if the predicate is true then 1070 00:53:03,150 --> 00:53:08,060 do this action, otherwise no action, no change in state. 1071 00:53:08,060 --> 00:53:09,780 This is a parallel composition. 1072 00:53:09,780 --> 00:53:13,520 Do a1 and a2 in parallel. 1073 00:53:13,520 --> 00:53:16,410 Sequential composition, the effects of this are visible to 1074 00:53:16,410 --> 00:53:18,070 this, et cetera. 1075 00:53:18,070 --> 00:53:21,820 This is a call to a method of one of the modules. 1076 00:53:21,820 --> 00:53:23,210 And then there is a guard. 1077 00:53:27,550 --> 00:53:31,870 And you have an expression language, where there's 1078 00:53:31,870 --> 00:53:33,140 nothing special here. 1079 00:53:33,140 --> 00:53:37,100 So this is just simply you can read a variable, 1080 00:53:37,100 --> 00:53:38,570 you can have constants. 1081 00:53:38,570 --> 00:53:41,550 These are just names and there's nothing 1082 00:53:41,550 --> 00:53:42,820 special going on here. 1083 00:53:42,820 --> 00:53:44,060 Let me explain guards to you. 1084 00:53:44,060 --> 00:53:49,150 So people find guards versus if's confusing. 1085 00:53:49,150 --> 00:53:51,760 And this is one way to understand it. 1086 00:53:51,760 --> 00:53:53,590 So guards affect the surroundings. 1087 00:53:53,590 --> 00:54:00,500 If I wrote here a1 when p1, in parallel with a2, think of a 1088 00:54:00,500 --> 00:54:03,250 guard as something to do with resources. 1089 00:54:03,250 --> 00:54:10,170 I really don't want you to do a1 unless p1 is true. 1090 00:54:10,170 --> 00:54:14,080 p1 may be the fact that the module is busy. 1091 00:54:14,080 --> 00:54:15,880 Some predicate here. 1092 00:54:15,880 --> 00:54:20,330 But I want the effect of this whole thing to be atomic.
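Since this distinction trips people up, here is a small hypothetical Bluespec sketch of it (my own FIFO example, not one from the talk): the implicit "when" conditions on the FIFO methods are guards, and they spread to the whole rule, whereas an if inside a rule only skips the one action it wraps.

    import FIFO::*;

    module mkGuardVsIf (Empty);
       FIFO#(Int#(32)) inQ   <- mkFIFO;
       FIFO#(Int#(32)) outQ  <- mkFIFO;
       Reg#(Int#(32))  x     <- mkReg(10);
       Reg#(Int#(32))  moved <- mkReg(0);
       Reg#(Int#(32))  steps <- mkReg(0);

       // Guards: first/deq/enq carry implicit conditions (data in inQ,
       // space in outQ).  They lift to the whole rule, so if either
       // fails the entire atomic action waits -- including the update
       // of 'moved'.  (Nothing feeds inQ in this toy; the point is
       // only the conditions.)
       rule move;
          outQ.enq(inQ.first);
          inQ.deq;
          moved <= moved + 1;
       endrule

       // Conditional action: the rule can still fire when x == 0; only
       // the decrement is skipped, and 'steps' is still updated.
       rule step;
          if (x > 0) x <= x - 1;
          steps <= steps + 1;
       endrule
    endmodule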
1093 00:54:20,330 --> 00:54:23,100 So if any part of it can't be done then the whole thing 1094 00:54:23,100 --> 00:54:24,780 can't be done. 1095 00:54:24,780 --> 00:54:29,370 And therefore the effect of the guard is as if you wrote a1 in 1096 00:54:29,370 --> 00:54:31,240 parallel with a2, when p1. 1097 00:54:31,240 --> 00:54:34,420 In other words, a guard on anything becomes a guard on 1098 00:54:34,420 --> 00:54:36,250 everything in that parallel composition. 1099 00:54:38,920 --> 00:54:40,390 Do you understand what I just said? 1100 00:54:42,980 --> 00:54:46,350 This is very important for composition of guards. 1101 00:54:46,350 --> 00:54:47,850 Because I want the atomicity of the 1102 00:54:47,850 --> 00:54:48,950 whole thing to be preserved. 1103 00:54:48,950 --> 00:54:51,470 So if anything can't be done that means the whole thing 1104 00:54:51,470 --> 00:54:54,370 can't be done. 1105 00:54:54,370 --> 00:54:56,480 On the other hand, a conditional action is 1106 00:54:56,480 --> 00:54:57,440 just a conditional action. 1107 00:54:57,440 --> 00:55:02,000 So if I have if p1 then a1 in parallel with a2, well that's 1108 00:55:02,000 --> 00:55:06,080 like saying if p1 was true then I want to do a1 and a2 in 1109 00:55:06,080 --> 00:55:10,570 parallel, otherwise I just want to do a2. 1110 00:55:10,570 --> 00:55:14,230 So there is a very big difference between conditional 1111 00:55:14,230 --> 00:55:15,560 actions and guards. 1112 00:55:15,560 --> 00:55:17,530 Guards are something about resources. 1113 00:55:17,530 --> 00:55:20,730 I need it for proper composition of atomic actions. 1114 00:55:24,300 --> 00:55:25,550 AUDIENCE: I have a question. 1115 00:55:27,650 --> 00:55:28,990 So are they entirely equivalent? 1116 00:55:28,990 --> 00:55:33,490 Because this is maybe a low-level question, but in the 1117 00:55:33,490 --> 00:55:36,780 first case-- so the first case on the left. 1118 00:55:36,780 --> 00:55:39,002 Can't you start doing a2 and then roll it back 1119 00:55:39,002 --> 00:55:40,240 or something [OBSCURED] 1120 00:55:40,240 --> 00:55:43,276 ARVIND: Oh, I think there may be many tricks you can play in 1121 00:55:43,276 --> 00:55:43,540 implementation. 1122 00:55:43,540 --> 00:55:44,890 You can be optimistic, you can-- 1123 00:55:44,890 --> 00:55:45,700 [INTERPOSING VOICES] 1124 00:55:45,700 --> 00:55:45,850 ARVIND: No. 1125 00:55:45,850 --> 00:55:45,950 No. 1126 00:55:45,950 --> 00:55:47,110 No. 1127 00:55:47,110 --> 00:55:50,740 Semantically, this is defining the semantics. 1128 00:55:50,740 --> 00:55:52,110 I mean, this is the algebra. 1129 00:55:52,110 --> 00:55:55,300 This has to be true because I'm saying this is 1130 00:55:55,300 --> 00:55:56,310 what a guard is. 1131 00:55:56,310 --> 00:55:57,344 AUDIENCE: Right, so they're semantically equal but not 1132 00:55:57,344 --> 00:56:02,860 [OBSCURED]. 1133 00:56:02,860 --> 00:56:04,520 ARVIND: Well I mean, that's probably not the 1134 00:56:04,520 --> 00:56:05,210 right way to say it. 1135 00:56:05,210 --> 00:56:07,230 I mean, the semantics are being defined. 1136 00:56:07,230 --> 00:56:08,520 You have choices in implementation. 1137 00:56:11,050 --> 00:56:13,760 I mean I'm not telling you how to implement this. 1138 00:56:13,760 --> 00:56:15,030 Let me go on with this. 1139 00:56:15,030 --> 00:56:20,520 So here is a problem that was posed to me by Jayadev Misra. 1140 00:56:20,520 --> 00:56:23,790 This is a quasi-realistic problem. 1141 00:56:23,790 --> 00:56:25,980 Ask for quotes from 2 airlines.
1142 00:56:25,980 --> 00:56:30,700 If one quote is below $300, buy immediately. 1143 00:56:30,700 --> 00:56:34,520 $300 is sort of the cheapest ticket you get these days. 1144 00:56:34,520 --> 00:56:38,240 Buy the lower quote if over $300. 1145 00:56:38,240 --> 00:56:41,360 But at some-- say your patience runs out, say I can't 1146 00:56:41,360 --> 00:56:42,900 wait anymore. 1147 00:56:42,900 --> 00:56:48,370 So after one minute buy from whosoever has quoted, otherwise 1148 00:56:48,370 --> 00:56:51,130 flag an error. 1149 00:56:51,130 --> 00:56:52,410 Is this a realistic scenario? 1150 00:56:56,170 --> 00:56:58,850 You can express it using threads. 1151 00:56:58,850 --> 00:57:03,480 And you can write a scheduler to do this, et cetera. 1152 00:57:05,990 --> 00:57:09,780 Those things are not as succinct as you would like. 1153 00:57:09,780 --> 00:57:13,130 I mean, you'd be surprised, in this simple a problem, how 1154 00:57:13,130 --> 00:57:15,550 complicated and how many questions will arise if you 1155 00:57:15,550 --> 00:57:18,000 express this as a threaded computation. 1156 00:57:18,000 --> 00:57:21,890 Now let me show you what you will do-- 1157 00:57:26,200 --> 00:57:29,110 in Bluespec. 1158 00:57:29,110 --> 00:57:32,720 OK, so we are going to make a module, which does what? 1159 00:57:32,720 --> 00:57:35,590 Make get quotes. 1160 00:57:35,590 --> 00:57:38,600 This module's job is to get quotes. 1161 00:57:38,600 --> 00:57:43,000 So what kind of state elements do you expect it to have? 1162 00:57:43,000 --> 00:57:46,030 Well, airline a may be quoting some value, airline b may be 1163 00:57:46,030 --> 00:57:47,400 quoting some value. 1164 00:57:47,400 --> 00:57:50,290 Whether we are done or not, there's a timer, which we're 1165 00:57:50,290 --> 00:57:53,080 going to be bumping, et cetera. 1166 00:57:53,080 --> 00:57:58,120 And there'll be rules, you know, get a quote from a, which will 1167 00:57:58,120 --> 00:58:01,060 be when not done do something. 1168 00:58:01,060 --> 00:58:04,360 Executes when a responds. 1169 00:58:04,360 --> 00:58:06,530 Get b, timeout, et cetera. 1170 00:58:06,530 --> 00:58:10,090 There are three rules of this sort. 1171 00:58:10,090 --> 00:58:13,700 Let me just show you one method. 1172 00:58:13,700 --> 00:58:17,600 So method, this is how you will start it-- 1173 00:58:17,600 --> 00:58:20,230 you have this module get quotes and you're saying book 1174 00:58:20,230 --> 00:58:23,660 me a ticket and this is your request r. 1175 00:58:23,660 --> 00:58:25,430 Now obviously this can only be done when the 1176 00:58:25,430 --> 00:58:27,250 module is not busy. 1177 00:58:27,250 --> 00:58:30,300 If the module is busy then you can't do it. 1178 00:58:30,300 --> 00:58:32,870 So what will you do when you get such a request? 1179 00:58:32,870 --> 00:58:35,660 Tell me if this makes sense. 1180 00:58:35,660 --> 00:58:36,910 So what is this saying? 1181 00:58:39,720 --> 00:58:46,190 Try to get a request from a, in parallel 1182 00:58:46,190 --> 00:58:48,780 try to get a request from b. 1183 00:58:51,630 --> 00:58:53,380 Now you are busy. 1184 00:58:53,380 --> 00:58:58,620 The module is busy so done equals false here. 1185 00:58:58,620 --> 00:59:00,350 And I'm initializing this. 1186 00:59:00,350 --> 00:59:04,120 I'm just saying that the a quote right now is infinity and the b 1187 00:59:04,120 --> 00:59:09,190 quote is infinity and the timer is zero. 1188 00:59:09,190 --> 00:59:11,900 Fair? 1189 00:59:11,900 --> 00:59:14,350 And when will I get the ticket?
1190 00:59:14,350 --> 00:59:17,780 Well, when done then you'll return the ticket from this. 1191 00:59:17,780 --> 00:59:20,950 Now let's look at an example of a rule. 1192 00:59:23,870 --> 00:59:31,020 So you may have a rule like this, pick cheapest. You see 1193 00:59:31,020 --> 00:59:33,590 the interesting thing in this is you'll be able to read this 1194 00:59:33,590 --> 00:59:36,980 independently of what else is going on in this system. 1195 00:59:36,980 --> 00:59:39,760 And let's see, what does this rule say? 1196 00:59:39,760 --> 00:59:41,830 Is it succinct? 1197 00:59:41,830 --> 00:59:48,200 The rule says when you are not done and if the quote from a 1198 00:59:48,200 --> 00:59:54,840 is not infinite, that means a has quoted. 1199 00:59:54,840 --> 01:00:04,760 And b has also quoted, then what should happen? 1200 01:00:04,760 --> 01:00:09,340 If a is less than b then buy the ticket from a. 1201 01:00:09,340 --> 01:00:14,270 Otherwise you'll buy the ticket from b and you're done. 1202 01:00:19,370 --> 01:00:22,330 I'm not going to write all these rules, but I think you 1203 01:00:22,330 --> 01:00:25,080 should be able to-- 1204 01:00:25,080 --> 01:00:29,770 I'm certain you can write these rules. 1205 01:00:29,770 --> 01:00:34,180 What's happening here is just separation of concerns. 1206 01:00:34,180 --> 01:00:36,830 It's all concurrent stuff. 1207 01:00:36,830 --> 01:00:38,261 Yes? 1208 01:00:38,261 --> 01:00:43,370 AUDIENCE: [OBSCURED] 1209 01:00:43,370 --> 01:00:44,840 ARVIND: Parallel? 1210 01:00:44,840 --> 01:00:46,530 AUDIENCE: Or like, for lack of better words, 1211 01:00:46,530 --> 01:00:50,980 the type in the name. 1212 01:00:50,980 --> 01:00:51,720 ARVIND: Sorry, say it again. 1213 01:00:51,720 --> 01:00:56,417 AUDIENCE: So, the ampersand, how does that 1214 01:00:56,417 --> 01:00:56,850 interact with the-- 1215 01:00:56,850 --> 01:00:58,386 PROFESSOR: Can you hold that question first because we will 1216 01:00:58,386 --> 01:00:59,636 run out of time. 1217 01:01:01,990 --> 01:01:10,634 AUDIENCE: So the ampersand after not done, that is, what 1218 01:01:10,634 --> 01:01:14,020 sort of ordering is that? 1219 01:01:14,020 --> 01:01:16,760 ARVIND: It's just, I mean, associative, commutative. 1220 01:01:16,760 --> 01:01:18,360 Kind of thing. 1221 01:01:18,360 --> 01:01:19,350 It's all happening in parallel. 1222 01:01:19,350 --> 01:01:22,890 AUDIENCE: [OBSCURED]. 1223 01:01:22,890 --> 01:01:26,040 ARVIND: This thing, right? 1224 01:01:26,040 --> 01:01:29,800 I mean, think of it like this, these are variables and you're 1225 01:01:29,800 --> 01:01:32,310 checking their values right now. 1226 01:01:32,310 --> 01:01:35,310 So it's just a combinational kind of a query 1227 01:01:35,310 --> 01:01:36,400 that you have here. 1228 01:01:36,400 --> 01:01:38,297 AUDIENCE: So I guess you're checking there's 1229 01:01:38,297 --> 01:01:39,410 new activity happening. 1230 01:01:39,410 --> 01:01:40,300 ARVIND: That's right. 1231 01:01:40,300 --> 01:01:44,481 AUDIENCE: Doesn't matter which order you check. 1232 01:01:44,481 --> 01:01:45,984 AUDIENCE: So I guess what I'm trying to get at is that the 1233 01:01:45,984 --> 01:01:47,988 whole block at the top is executed in parallel with the 1234 01:01:47,988 --> 01:01:51,780 block at the bottom? 1235 01:01:51,780 --> 01:01:52,340 ARVIND: No. 1236 01:01:52,340 --> 01:01:55,050 Those things are not in parallel.
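To make the airline example concrete, here is a hypothetical Bluespec sketch of the get-quotes module as described; the Airline interface, the simple numeric types, and the use of maxBound as "infinity" are my assumptions rather than the code on the slide, and the remaining rules are only indicated in a comment.

    typedef UInt#(32) Price;
    typedef UInt#(32) Request;
    typedef UInt#(32) Ticket;

    Price infinity = maxBound;   // stands in for "no quote yet"

    interface Airline;
       method Action request(Request r);     // ask the airline to quote
       method Price  response();             // ready only once it has quoted
       method ActionValue#(Ticket) buy();    // buy at the quoted price
    endinterface

    interface GetQuotes;
       method Action book(Request r);
       method Ticket getTicket();
    endinterface

    module mkGetQuotes#(Airline a, Airline b) (GetQuotes);
       Reg#(Price)     aQuote <- mkReg(infinity);
       Reg#(Price)     bQuote <- mkReg(infinity);
       Reg#(Bool)      done   <- mkReg(True);
       Reg#(UInt#(32)) timer  <- mkReg(0);
       Reg#(Ticket)    ticket <- mkRegU;

       rule getA (!done);        // also waits until a has responded
          aQuote <= a.response;
       endrule

       rule getB (!done);        // also waits until b has responded
          bQuote <= b.response;
       endrule

       rule pickCheapest (!done && aQuote != infinity && bQuote != infinity);
          if (aQuote < bQuote) begin
             let t <- a.buy;  ticket <= t;
          end else begin
             let t <- b.buy;  ticket <= t;
          end
          done <= True;
       endrule

       // Further rules (buy immediately below $300, bump the timer,
       // time out after a minute, flag an error) follow the same pattern.

       method Action book(Request r) if (done);
          a.request(r);          // try to get a quote from a...
          b.request(r);          // ...and, in parallel, from b
          aQuote <= infinity;
          bQuote <= infinity;
          timer  <= 0;
          done   <= False;       // now the module is busy
       endmethod

       method Ticket getTicket() if (done);
          return ticket;
       endmethod
    endmodule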
1237 01:01:55,050 --> 01:01:56,300 AUDIENCE: [UNINTELLIGIBLE PHRASE] 1238 01:02:09,660 --> 01:02:12,010 ARVIND: Now I just want to show you this because it's 1239 01:02:12,010 --> 01:02:15,510 completely different from airline reservations. 1240 01:02:15,510 --> 01:02:19,490 You know, H.264, this is a codec that is being used 1241 01:02:19,490 --> 01:02:23,740 everywhere and will be used even more. 1242 01:02:23,740 --> 01:02:27,420 And this is a typical data flow diagram you will find for 1243 01:02:27,420 --> 01:02:28,350 this kind of thing. 1244 01:02:28,350 --> 01:02:31,940 You don't have to understand anything about this except 1245 01:02:31,940 --> 01:02:36,250 that conceptually you just have FIFOs for 1246 01:02:36,250 --> 01:02:41,120 communicating between them and in all such things the goal is 1247 01:02:41,120 --> 01:02:43,190 that it has to be able to sustain a certain 1248 01:02:43,190 --> 01:02:46,590 rate at which data is to be processed and it's usually 1249 01:02:46,590 --> 01:02:50,140 forgiving in terms of latencies. 1250 01:02:50,140 --> 01:02:52,500 So if latency increases a little bit here and there it's 1251 01:02:52,500 --> 01:02:56,910 usually OK-- when things go in and come out. 1252 01:02:56,910 --> 01:03:00,320 A dataflow-like network. 1253 01:03:00,320 --> 01:03:02,540 And another reason why this example is fascinating is 1254 01:03:02,540 --> 01:03:06,350 because it's done regularly both in hardware and software 1255 01:03:06,350 --> 01:03:08,250 and any mixture of the two. 1256 01:03:08,250 --> 01:03:11,040 So for example, when it runs on your PC it's being done 1257 01:03:11,040 --> 01:03:17,420 100% in software and I think on iPods there is special 1258 01:03:17,420 --> 01:03:20,870 purpose hardware for doing it and therefore the battery can 1259 01:03:20,870 --> 01:03:23,530 last much longer. 1260 01:03:23,530 --> 01:03:27,010 And on cell phones it's kind of in between in this and it's 1261 01:03:27,010 --> 01:03:30,420 a question of what frame rate you want to process. 1262 01:03:30,420 --> 01:03:33,050 If you're processing a very high frame rate then it'll have to 1263 01:03:33,050 --> 01:03:36,800 be done in hardware. 1264 01:03:36,800 --> 01:03:41,770 So when we got into this thing we said, OK, we wanted to 1265 01:03:41,770 --> 01:03:44,030 build hardware for doing this. 1266 01:03:44,030 --> 01:03:48,200 And the reference codes that are available, it's 80,000 lines 1267 01:03:48,200 --> 01:03:50,010 and it sort of gives you a heart attack. 1268 01:03:50,010 --> 01:03:52,520 You know, you read this and you say, whoa! 1269 01:03:52,520 --> 01:03:54,880 And it doesn't meet the performance. 1270 01:03:54,880 --> 01:03:57,230 I mean, that reference code, if it ran on your laptop you 1271 01:03:57,230 --> 01:04:01,180 won't be able to watch a movie with this. 1272 01:04:01,180 --> 01:04:05,210 And there is the Linux version of it, which gives you even a 1273 01:04:05,210 --> 01:04:09,930 bigger heart attack because it's 200,000 lines and many 1274 01:04:09,930 --> 01:04:12,430 different codecs are mixed in this. 1275 01:04:15,620 --> 01:04:20,180 The biggest problem here is that none of these codes reflect 1276 01:04:20,180 --> 01:04:23,660 that picture I showed you in the previous diagram. 1277 01:04:23,660 --> 01:04:28,060 Because the way people approach it in software is 1278 01:04:28,060 --> 01:04:32,770 they take some input from here and they push it 1279 01:04:32,770 --> 01:04:34,020 as far as they can.
1280 01:04:37,410 --> 01:04:37,990 And that's it. 1281 01:04:37,990 --> 01:04:40,540 Then they will take the next input and push it. 1282 01:04:40,540 --> 01:04:44,500 So different boxes keep modifying it as they see fit 1283 01:04:44,500 --> 01:04:48,735 and as a reader you can't tell, oh, this is this part 1284 01:04:48,735 --> 01:04:49,700 and this is this part. 1285 01:04:49,700 --> 01:04:50,940 I mean, there are no FIFOs. 1286 01:04:50,940 --> 01:04:51,710 Why? 1287 01:04:51,710 --> 01:04:54,870 Because FIFOs are expensive in software. 1288 01:04:54,870 --> 01:04:56,380 You know, it will involve copying. 1289 01:04:56,380 --> 01:04:58,920 You know, you'll write it here then you'll read it again, 1290 01:04:58,920 --> 01:05:01,710 which is a bad idea in software. 1291 01:05:01,710 --> 01:05:05,830 So in some sense, the software way of thinking has already entered. 1292 01:05:09,690 --> 01:05:14,050 In this style of coding there is no model of concurrency. 1293 01:05:14,050 --> 01:05:16,720 So it's not clear to me when you give me that code what 1294 01:05:16,720 --> 01:05:19,720 all I can do in parallel in order to preserve the 1295 01:05:19,720 --> 01:05:21,020 semantics of this. 1296 01:05:21,020 --> 01:05:23,790 I mean, the problem is much, much harder here. 1297 01:05:23,790 --> 01:05:26,730 So you will just do some high-level analysis of the 1298 01:05:26,730 --> 01:05:30,100 code, but you have lost a lot of information that was all 1299 01:05:30,100 --> 01:05:32,380 present in the source-- 1300 01:05:32,380 --> 01:05:34,150 could've been present in the source. 1301 01:05:34,150 --> 01:05:35,990 And my claim is it can be done differently. 1302 01:05:35,990 --> 01:05:41,620 So this has been done by Chun-Chieh and here I'm 1303 01:05:41,620 --> 01:05:44,870 showing you a month-old slide, but he has done a lot more 1304 01:05:44,870 --> 01:05:49,970 work since then and the total code in Bluespec is 9000 lines 1305 01:05:49,970 --> 01:05:54,650 for this, contrasting it with the 80,000 versus 200,000 kind of 1306 01:05:54,650 --> 01:05:59,090 stuff, and this is telling you the amount of code in various 1307 01:05:59,090 --> 01:06:02,080 blocks in this. 1308 01:06:02,080 --> 01:06:04,940 And since it's done in Bluespec you can take it and 1309 01:06:04,940 --> 01:06:08,290 you can go and implement it all in hardware, so you can 1310 01:06:08,290 --> 01:06:10,980 synthesize it. 1311 01:06:10,980 --> 01:06:15,380 It actually does 45 frames per second now, actually even 90 frames 1312 01:06:15,380 --> 01:06:17,510 per second today. 1313 01:06:17,510 --> 01:06:20,410 You know, so this is a month old. 1314 01:06:20,410 --> 01:06:23,180 This is the main point. 1315 01:06:23,180 --> 01:06:26,440 Once you have done this now you can refine it. 1316 01:06:26,440 --> 01:06:28,120 You have a different preference. 1317 01:06:28,120 --> 01:06:30,830 You say, oh, I want to do this differently. 1318 01:06:30,830 --> 01:06:34,640 So you can go and rewrite any module again. 1319 01:06:34,640 --> 01:06:38,110 It will be very, very easy to do this in the system. 1320 01:06:38,110 --> 01:06:41,840 It is very difficult to do this in the reference code, so 1321 01:06:41,840 --> 01:06:43,940 that's one point to be made. 1322 01:06:43,940 --> 01:06:46,220 The second thing is you don't have to do 1323 01:06:46,220 --> 01:06:47,180 everything in hardware.
1324 01:06:47,180 --> 01:06:49,520 If you wanted you could have implemented any part of it in 1325 01:06:49,520 --> 01:06:52,510 software, so you can take the same kind of a description and 1326 01:06:52,510 --> 01:06:55,680 generate software from it as well. 1327 01:06:55,680 --> 01:06:57,530 So each module can be refined separately. 1328 01:06:57,530 --> 01:07:00,610 Behaviors of modules are composable. 1329 01:07:00,610 --> 01:07:05,190 You know, you can take these things and it's predictable 1330 01:07:05,190 --> 01:07:07,400 what will happen when you compose these things in terms 1331 01:07:07,400 --> 01:07:11,480 of performance, and in terms of resources also, it's very 1332 01:07:11,480 --> 01:07:12,820 clear what's happening. 1333 01:07:12,820 --> 01:07:14,950 In hardware the resources will be area and time 1334 01:07:14,950 --> 01:07:17,100 and so on in this. 1335 01:07:17,100 --> 01:07:18,730 So, takeaway. 1336 01:07:18,730 --> 01:07:21,400 Parallel programming should be based on well-defined modules 1337 01:07:21,400 --> 01:07:24,580 and parallel composition of such modules. 1338 01:07:24,580 --> 01:07:28,140 Modules must embody a notion of resources and consequently, 1339 01:07:28,140 --> 01:07:31,020 sharing and time-multiplexed use. 1340 01:07:31,020 --> 01:07:33,510 This is a controversial point. 1341 01:07:33,510 --> 01:07:35,820 And guarded atomic actions and modules with guarded 1342 01:07:35,820 --> 01:07:39,800 interfaces provide a solid foundation for doing so. 1343 01:07:39,800 --> 01:07:41,050 PROFESSOR: Thank you. 1344 01:07:44,996 --> 01:07:47,151 Now we ran a little bit late, so if you have questions you 1345 01:07:47,151 --> 01:07:50,050 can ask about programs.