1 00:00:00,000 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,880 Commons license. 3 00:00:03,880 --> 00:00:06,870 Your support will help MIT OpenCourseWare continue to 4 00:00:06,870 --> 00:00:10,590 offer high quality educational resources for free. 5 00:00:10,590 --> 00:00:14,115 To make a donation or view additional materials from 6 00:00:14,115 --> 00:00:16,360 hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu. 7 00:00:21,330 --> 00:00:23,380 PROFESSOR: So let's get started with the second 8 00:00:23,380 --> 00:00:25,640 lecture for today. 9 00:00:25,640 --> 00:00:30,210 So I guess one thing multicores did is really 10 00:00:30,210 --> 00:00:33,910 shatter this nice view of writing your programs and having the 11 00:00:33,910 --> 00:00:37,040 hardware take care of giving you performance. 12 00:00:37,040 --> 00:00:41,690 So hardware just kind of completely gave that up. 13 00:00:41,690 --> 00:00:46,010 But so what you're doing in this class, is you're trying 14 00:00:46,010 --> 00:00:46,590 to do it by yourself. 15 00:00:46,590 --> 00:00:51,560 Give all the responsibility back to the programmer. 16 00:00:51,560 --> 00:00:54,720 And you realize as you go, it's a much harder job. 17 00:00:54,720 --> 00:00:57,520 I mean, this is not simple programming. 18 00:00:57,520 --> 00:01:01,860 So you don't have MIT class students at 19 00:01:01,860 --> 00:01:05,625 every company to do this, so we need to have some kind of 20 00:01:05,625 --> 00:01:05,670 middle ground. 21 00:01:05,670 --> 00:01:10,240 And so some of the stuff we have been doing is trying to 22 00:01:10,240 --> 00:01:11,650 figure out whether there is any middle ground. 23 00:01:11,650 --> 00:01:15,910 Can you actually take some of that load away from the user 24 00:01:15,910 --> 00:01:18,640 into things like languages and compilers? 25 00:01:18,640 --> 00:01:22,340 So we will talk about some of those. 
26 00:01:22,340 --> 00:01:25,140 So right now we are kind of switching from directly doing 27 00:01:25,140 --> 00:01:29,920 what's necessary to do the Cell project into going for 28 00:01:29,920 --> 00:01:35,107 breadth. So there's this lecture, and then we will sit back and do a 29 00:01:35,107 --> 00:01:35,367 little bit of debugging and performance work and that will 30 00:01:35,367 --> 00:01:36,570 be directly helpful. 31 00:01:36,570 --> 00:01:39,736 And then next week we'll have lots of guest lectures to kind 32 00:01:39,736 --> 00:01:41,770 of give you breadth in there. 33 00:01:41,770 --> 00:01:46,090 So you'll understand not just Cell programming but parallel 34 00:01:46,090 --> 00:01:49,640 programming and parallel processing, what the world is 35 00:01:49,640 --> 00:01:50,320 like beyond that. 36 00:01:50,320 --> 00:01:56,300 So today we're going to have Bill talk about streams. 37 00:01:56,300 --> 00:01:57,210 BILL THIES: OK very good. 38 00:01:57,210 --> 00:01:58,470 So my name is Bill Thies. 39 00:01:58,470 --> 00:02:00,720 I'm a graduate student working with Saman and Roderick, and 40 00:02:00,720 --> 00:02:01,450 others here. 41 00:02:01,450 --> 00:02:03,710 And I'll talk about the StreamIt language. 42 00:02:03,710 --> 00:02:06,230 So why do we need a new programming language? 43 00:02:06,230 --> 00:02:08,490 Well we think that languages haven't kept up with the 44 00:02:08,490 --> 00:02:09,520 architectures. 45 00:02:09,520 --> 00:02:12,860 So one way to look at this is that if you look back at 46 00:02:12,860 --> 00:02:17,110 previous languages, look at C with the von Neumann machine. 47 00:02:17,110 --> 00:02:19,870 Now I grew up in rural Pennsylvania not too far from 48 00:02:19,870 --> 00:02:20,630 Amish Country. 49 00:02:20,630 --> 00:02:23,640 And so to me these go together just like a horse and buggy. 50 00:02:23,640 --> 00:02:26,110 OK they're perfectly made for each other. 51 00:02:26,110 --> 00:02:27,360 They basically go at the same rate. 
52 00:02:27,360 --> 00:02:28,720 Everything is fine. 53 00:02:28,720 --> 00:02:31,310 But the problem is, in comes the modern architecture. 54 00:02:31,310 --> 00:02:32,360 OK this is an F-16. 55 00:02:32,360 --> 00:02:34,610 You have a lot more that you can do with it than with 56 00:02:34,610 --> 00:02:35,700 the horse and buggy. 57 00:02:35,700 --> 00:02:38,590 So how do you program these new architectures? 58 00:02:38,590 --> 00:02:41,840 Well architecture makers these days are basically faced with 59 00:02:41,840 --> 00:02:43,100 a really hard choice. 60 00:02:43,100 --> 00:02:46,060 On the one hand, you could get a really cool architecture and 61 00:02:46,060 --> 00:02:48,950 develop an ad hoc programming technique where you're really 62 00:02:48,950 --> 00:02:51,130 just leaving it to the programmer to do something 63 00:02:51,130 --> 00:02:52,900 complicated to get performance. 64 00:02:52,900 --> 00:02:54,570 And unfortunately I think that's the 65 00:02:54,570 --> 00:02:55,100 route that they took. 66 00:02:55,100 --> 00:02:57,490 I mean fortunately for the industry, but unfortunately 67 00:02:57,490 --> 00:03:00,010 for you, I think that's the route they took with Cell 68 00:03:00,010 --> 00:03:01,700 which means all of you are going to become basically 69 00:03:01,700 --> 00:03:02,800 fighter pilots. 70 00:03:02,800 --> 00:03:04,160 You have to learn how to fly the plane. 71 00:03:04,160 --> 00:03:05,100 You have to become an expert. 72 00:03:05,100 --> 00:03:07,140 You're going to become the best people at programming 73 00:03:07,140 --> 00:03:08,350 these architectures. 74 00:03:08,350 --> 00:03:11,810 And unfortunately the only other option is to really bend 75 00:03:11,810 --> 00:03:15,010 over backwards to support the previous era of languages 76 00:03:15,010 --> 00:03:16,670 like C and C++. 77 00:03:16,670 --> 00:03:19,500 And you can see what's coming here, it's just hard to get 78 00:03:19,500 --> 00:03:20,850 off the runway. 
79 00:03:20,850 --> 00:03:25,400 So you don't want this situation. 80 00:03:25,400 --> 00:03:27,640 Out of consideration for whoever is in the buggy, 81 00:03:27,640 --> 00:03:29,790 hopefully you'll never take off. 82 00:03:29,790 --> 00:03:33,190 So looking from a more academic perspective, why do 83 00:03:33,190 --> 00:03:35,060 we need a new language right now? 84 00:03:35,060 --> 00:03:37,880 So if you look back over the past 30 years, I know you've 85 00:03:37,880 --> 00:03:39,330 seen this graph before. 86 00:03:39,330 --> 00:03:41,760 We were dealing with just one core in the machine 87 00:03:41,760 --> 00:03:42,950 for all this time. 88 00:03:42,950 --> 00:03:45,110 And now we have this plethora of multicores 89 00:03:45,110 --> 00:03:46,610 coming across the board. 90 00:03:46,610 --> 00:03:49,120 So how did we program these old machines? 91 00:03:49,120 --> 00:03:52,670 Well we had languages like C and FORTRAN that really have a 92 00:03:52,670 --> 00:03:55,230 lot of nice properties across these architectures. 93 00:03:55,230 --> 00:03:58,540 So it was portable, high-performance, composable-- 94 00:03:58,540 --> 00:04:00,870 you could have really good software development-- 95 00:04:00,870 --> 00:04:03,180 malleable, maintainable, all the nice things you'd like to 96 00:04:03,180 --> 00:04:05,250 see from a software engineering perspective. 97 00:04:05,250 --> 00:04:08,450 And really if you wrote a program back in 1970, you 98 00:04:08,450 --> 00:04:12,170 could keep it in C and have it continue to leverage all the 99 00:04:12,170 --> 00:04:14,620 new properties of machines over the past 30 years. 100 00:04:14,620 --> 00:04:16,760 So it just ran fine out of the box. 101 00:04:16,760 --> 00:04:19,620 And looking forward, that's not going to be true. 102 00:04:19,620 --> 00:04:21,660 So for example, we could say that C was the 103 00:04:21,660 --> 00:04:23,060 common machine language. 
104 00:04:23,060 --> 00:04:25,760 That's what, we'd say, for the past 30 years was common 105 00:04:25,760 --> 00:04:27,520 across all the machines. 106 00:04:27,520 --> 00:04:29,120 But now looking forward, that's not 107 00:04:29,120 --> 00:04:30,210 going to be true anymore. 108 00:04:30,210 --> 00:04:32,800 Because you have to program every core separately. 109 00:04:32,800 --> 00:04:35,270 So what's the common machine language for multicores? 110 00:04:35,270 --> 00:04:37,240 We really think you need something where you can write 111 00:04:37,240 --> 00:04:41,370 a program once today, and have it scale for the next 30 years 112 00:04:41,370 --> 00:04:43,560 without having to modify the program. 113 00:04:43,560 --> 00:04:45,930 So what kind of language do you need to get that kind of 114 00:04:45,930 --> 00:04:47,620 performance? 115 00:04:47,620 --> 00:04:49,520 Well let's look a little deeper into this notion of a 116 00:04:49,520 --> 00:04:51,020 common machine language. 117 00:04:51,020 --> 00:04:53,820 So why did it work so well for the past 30 years? 118 00:04:53,820 --> 00:04:57,670 Well on uniprocessors, things like C and FORTRAN really 119 00:04:57,670 --> 00:04:59,800 encapsulated the common properties. 120 00:04:59,800 --> 00:05:02,820 So things like a single flow of control in the machine, a 121 00:05:02,820 --> 00:05:06,260 single memory image, are both properties of the language. 122 00:05:06,260 --> 00:05:08,260 But they also hid certain properties from the 123 00:05:08,260 --> 00:05:09,070 programmer. 124 00:05:09,070 --> 00:05:11,550 So they hid the things that were different between one 125 00:05:11,550 --> 00:05:12,620 machine and another. 126 00:05:12,620 --> 00:05:16,320 So for example, the register file, the ISA, the functional 127 00:05:16,320 --> 00:05:17,540 units and so on. 128 00:05:17,540 --> 00:05:20,200 These things could change from one architecture to another. 
129 00:05:20,200 --> 00:05:22,330 And you didn't have to change your program because those 130 00:05:22,330 --> 00:05:24,820 aspects weren't in the programming language. 131 00:05:24,820 --> 00:05:27,140 So that's why these languages were succeeding. 132 00:05:27,140 --> 00:05:30,070 And what do we need to succeed in the multicore era from a 133 00:05:30,070 --> 00:05:31,580 language perspective? 134 00:05:31,580 --> 00:05:34,370 Well you need to encapsulate the common properties again. 135 00:05:34,370 --> 00:05:37,830 And this time it's multiple flows of control that you have 136 00:05:37,830 --> 00:05:40,860 for all the different cores, and multiple local memories. 137 00:05:40,860 --> 00:05:43,710 There's no more monolithic memory anymore, that everyone 138 00:05:43,710 --> 00:05:46,370 can read and write to. 139 00:05:46,370 --> 00:05:48,530 Also you need to hide some of the 140 00:05:48,530 --> 00:05:50,190 differences between the machines. 141 00:05:50,190 --> 00:05:51,990 So some cores have different capabilities. 142 00:05:51,990 --> 00:05:54,570 On Cell there's a heterogeneous system between 143 00:05:54,570 --> 00:05:56,560 the SPEs and the PPE. 144 00:05:56,560 --> 00:05:59,090 Different communication models on different architectures, 145 00:05:59,090 --> 00:06:01,010 different synchronization models. 146 00:06:01,010 --> 00:06:02,970 So whatever common machine language we come up with, 147 00:06:02,970 --> 00:06:04,600 we'll have to keep these things hidden from the 148 00:06:04,600 --> 00:06:06,130 programmer. 149 00:06:06,130 --> 00:06:07,960 Now a lot of different researchers are taking 150 00:06:07,960 --> 00:06:10,390 different tacks for how you want to invent the next common 151 00:06:10,390 --> 00:06:11,380 machine language. 152 00:06:11,380 --> 00:06:13,920 And the thrust that we're really excited about is this 153 00:06:13,920 --> 00:06:15,510 notion of streaming. 154 00:06:15,510 --> 00:06:17,460 So what is a stream program? 
155 00:06:17,460 --> 00:06:20,130 Well if you look at a lot of the high-performance systems 156 00:06:20,130 --> 00:06:22,780 today-- including PowerPoint which is running this awesome 157 00:06:22,780 --> 00:06:24,730 animation-- 158 00:06:24,730 --> 00:06:26,880 you can basically see that they're based around some 159 00:06:26,880 --> 00:06:27,930 stream of data. 160 00:06:27,930 --> 00:06:33,290 So audio, video, like HDTV, video editing, graphics stuff. 161 00:06:33,290 --> 00:06:35,570 I think actually, a lot of the projects in this class that I 162 00:06:35,570 --> 00:06:38,160 looked at, would fit into the streaming mold. 163 00:06:38,160 --> 00:06:41,790 Things like the software radio, ray tracing, I 164 00:06:41,790 --> 00:06:42,750 probably don't remember them all. 165 00:06:42,750 --> 00:06:44,460 But when I looked at them, they all looked like they had 166 00:06:44,460 --> 00:06:46,670 a streaming component somewhere in there. 167 00:06:46,670 --> 00:06:49,560 So what's special about a stream program compared to 168 00:06:49,560 --> 00:06:51,440 just a normal program? 169 00:06:51,440 --> 00:06:53,830 Well they have a lot of attractive properties. 170 00:06:53,830 --> 00:06:56,380 If you look at their structure, you can usually see 171 00:06:56,380 --> 00:07:00,220 that the computation pattern remains relatively constant 172 00:07:00,220 --> 00:07:01,850 across the lifetime of the program. 173 00:07:01,850 --> 00:07:04,600 So they have some well-defined units that are communicating 174 00:07:04,600 --> 00:07:05,810 with each other. 175 00:07:05,810 --> 00:07:08,790 And they continue that pattern of communication throughout. 176 00:07:08,790 --> 00:07:11,130 And this really exposes a lot of opportunities for the 177 00:07:11,130 --> 00:07:13,670 compiler to do some optimizations that it couldn't 178 00:07:13,670 --> 00:07:16,640 do on just an arbitrary general purpose program. 
179 00:07:16,640 --> 00:07:19,020 And as you saw before, basically all the types of 180 00:07:19,020 --> 00:07:22,380 parallelism are really exposed in a stream program. 181 00:07:22,380 --> 00:07:25,000 There's the pipeline parallelism between different 182 00:07:25,000 --> 00:07:26,650 producers and consumers. 183 00:07:26,650 --> 00:07:29,340 There's the task parallelism basically going 184 00:07:29,340 --> 00:07:30,420 from left to right. 185 00:07:30,420 --> 00:07:33,220 And also data parallelism which means that a single one 186 00:07:33,220 --> 00:07:36,900 of these stages can sometimes be split to apply to multiple 187 00:07:36,900 --> 00:07:39,960 elements in the data stream. 188 00:07:39,960 --> 00:07:42,380 So when you're thinking about stream programming, there's a 189 00:07:42,380 --> 00:07:43,840 lot of different ways you can actually 190 00:07:43,840 --> 00:07:45,500 represent the program. 191 00:07:45,500 --> 00:07:47,640 So whenever you have a programming model, you have to 192 00:07:47,640 --> 00:07:49,530 answer these kinds of questions. 193 00:07:49,530 --> 00:07:52,150 For example do the senders and the receivers block when they 194 00:07:52,150 --> 00:07:53,700 try to communicate? 195 00:07:53,700 --> 00:07:55,470 How much buffering is allowed? 196 00:07:55,470 --> 00:07:57,300 Is the computation deterministic? 197 00:07:57,300 --> 00:07:59,440 What kind of model do you have in there? 198 00:07:59,440 --> 00:08:00,580 Can you avoid deadlock? 199 00:08:00,580 --> 00:08:03,880 Questions like these, and we could spend a whole lecture 200 00:08:03,880 --> 00:08:05,790 answering these questions, putting them in different 201 00:08:05,790 --> 00:08:06,980 categories. 202 00:08:06,980 --> 00:08:09,170 But what I want to do, just to give you a feel, is 203 00:08:09,170 --> 00:08:12,010 touch on kind of three of the major models that you might 204 00:08:12,010 --> 00:08:14,410 see come up in different kinds of programming models. 
205 00:08:14,410 --> 00:08:17,550 And I'll just touch on Kahn process networks, synchronous 206 00:08:17,550 --> 00:08:22,800 dataflow, and communicating sequential processes, or CSP. 207 00:08:22,800 --> 00:08:24,930 So just one slide on these models. 208 00:08:24,930 --> 00:08:27,300 So let's compare them a little bit. 209 00:08:27,300 --> 00:08:29,650 First there's the Kahn process networks. 210 00:08:29,650 --> 00:08:31,910 So this is kind of the simplest model. 211 00:08:31,910 --> 00:08:32,940 It's very intuitive. 212 00:08:32,940 --> 00:08:34,610 You just have different processes that are 213 00:08:34,610 --> 00:08:36,460 communicating over FIFOs. 214 00:08:36,460 --> 00:08:40,700 And the FIFO size is conceptually unbounded. 215 00:08:40,700 --> 00:08:44,840 So to a first approximation, it's kind of like a Unix pipe. 216 00:08:44,840 --> 00:08:47,630 These processes can just read from the input, and they can 217 00:08:47,630 --> 00:08:50,310 push onto their outputs without blocking. 218 00:08:50,310 --> 00:08:53,170 But if they try to read from an input they do block until 219 00:08:53,170 --> 00:08:54,850 an input is available. 220 00:08:54,850 --> 00:08:58,090 And the interesting thing is that the communication pattern 221 00:08:58,090 --> 00:09:00,520 can actually be dependent on the data. 222 00:09:00,520 --> 00:09:04,410 So for example I could pop an index off of one channel, and 223 00:09:04,410 --> 00:09:07,300 then use that index to determine which other channel 224 00:09:07,300 --> 00:09:09,830 I'll read from on the next time step. 225 00:09:09,830 --> 00:09:12,560 But at the same time it is deterministic. 226 00:09:12,560 --> 00:09:16,450 So for a given series of input values on the stream, I'll 227 00:09:16,450 --> 00:09:18,540 always have the same communication pattern that I'm 228 00:09:18,540 --> 00:09:20,630 reading from the other inputs. 229 00:09:20,630 --> 00:09:24,580 So it's a deterministic model, and that's a nice property. 
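[Editor's sketch] The Kahn-style behavior described here (writes that never block, reads that block, a read target chosen by data, and a deterministic result) can be imitated in a few lines of Python. This is only an illustration under stated assumptions: the names `source`, `select`, and `run_network` are hypothetical, threads stand in for processes, and `queue.Queue()` with no size limit stands in for a conceptually unbounded FIFO.

```python
import queue
import threading


def source(values, out):
    """Write every value to an unbounded FIFO; put() never blocks here."""
    for v in values:
        out.put(v)


def select(control, inputs, out, count):
    """Pop an index from the control channel, then read from that input.

    Both get() calls block until data is available, so the output does not
    depend on how the threads happen to be scheduled.
    """
    for _ in range(count):
        idx = control.get()
        out.put(inputs[idx].get())


def run_network(indices, a_values, b_values):
    control, a, b, out = (queue.Queue() for _ in range(4))
    threads = [
        threading.Thread(target=source, args=(indices, control)),
        threading.Thread(target=source, args=(a_values, a)),
        threading.Thread(target=source, args=(b_values, b)),
        threading.Thread(target=select, args=(control, [a, b], out, len(indices))),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [out.get() for _ in indices]


if __name__ == "__main__":
    print(run_network([0, 1, 1, 0], [1, 2], [10, 20]))
```

Running it repeatedly should give the same list each time, which is the determinism property: the index stream alone decides which channel is read next.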
230 00:09:24,580 --> 00:09:26,320 Let's see, what else to say here? 231 00:09:26,320 --> 00:09:29,670 There's actually a few recent ventures that are using Kahn 232 00:09:29,670 --> 00:09:30,710 process networks. 233 00:09:30,710 --> 00:09:33,190 So there's commercial interest. For example Ambric 234 00:09:33,190 --> 00:09:36,920 is a startup that I think will be based on a Kahn process 235 00:09:36,920 --> 00:09:40,750 network for the programming model. 236 00:09:40,750 --> 00:09:43,410 Looking at another model called synchronous dataflow, 237 00:09:43,410 --> 00:09:45,930 this is actually what we use in the StreamIt system. 238 00:09:45,930 --> 00:09:47,760 And compared to Kahn process networks, 239 00:09:47,760 --> 00:09:48,830 it's kind of a subset. 240 00:09:48,830 --> 00:09:50,600 It's a little bit more restrictive. 241 00:09:50,600 --> 00:09:53,040 So if you look at the space of all possible program 242 00:09:53,040 --> 00:09:55,770 behaviors, Kahn process networks are a pretty big 243 00:09:55,770 --> 00:09:57,110 piece of the space. 244 00:09:57,110 --> 00:09:59,580 And then synchronous dataflow is kind of a subset of that 245 00:09:59,580 --> 00:10:02,490 space where you know more about the communication 246 00:10:02,490 --> 00:10:04,390 pattern at compile time. 247 00:10:04,390 --> 00:10:07,070 So for example, in synchronous dataflow, the programmer 248 00:10:07,070 --> 00:10:10,810 actually declares how many items each process will consume from 249 00:10:10,810 --> 00:10:14,090 each of its input channels on a given execution step. 250 00:10:14,090 --> 00:10:16,770 So there's no more data dependence regarding the 251 00:10:16,770 --> 00:10:18,000 communication pattern. 252 00:10:18,000 --> 00:10:21,290 It'll always input some items from some of the channels and 253 00:10:21,290 --> 00:10:23,990 produce some number of items to other channels. 
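[Editor's sketch] Because the rates are declared up front, the number of times each stage has to fire in one steady-state cycle can be worked out before anything runs. The Python below shows that arithmetic for a simple linear pipeline. It is a sketch, not the StreamIt compiler's algorithm: the function name `repetitions` and the `(pop, push)` tuple layout are assumptions made for illustration.

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9+


def repetitions(rates):
    """Smallest firing counts that balance a linear pipeline.

    rates is a list of (pop, push) pairs, one per stage, in pipeline order.
    For each channel the balance condition is
        reps[producer] * push == reps[consumer] * pop.
    """
    reps = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        # Every pop rate after the first stage must be positive.
        reps.append(reps[-1] * push / pop)
    # Scale the rational solution up to the smallest all-integer solution.
    scale = lcm(*(r.denominator for r in reps))
    return [int(r * scale) for r in reps]


if __name__ == "__main__":
    # Stage A pushes 2 per firing, B pops 3 and pushes 1, C pops 2.
    print(repetitions([(0, 2), (3, 1), (2, 0)]))
```

In the example, A must fire three times and B twice for every firing of C, so each channel ends the cycle with as many items as it started with.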
254 00:10:23,990 --> 00:10:26,300 And this is a really nice property because it lets the 255 00:10:26,300 --> 00:10:28,510 compiler do the scheduling for you. 256 00:10:28,510 --> 00:10:31,020 So the compiler can see who's communicating to whom and 257 00:10:31,020 --> 00:10:32,630 in exactly what pattern. 258 00:10:32,630 --> 00:10:35,720 And it can statically interleave the filters to 259 00:10:35,720 --> 00:10:39,400 guarantee that everyone has enough data to complete their 260 00:10:39,400 --> 00:10:40,750 computation. 261 00:10:40,750 --> 00:10:42,490 So there's a lot of interesting optimizations you 262 00:10:42,490 --> 00:10:42,950 can do here. 263 00:10:42,950 --> 00:10:45,820 That's why it's very attractive for StreamIt. 264 00:10:45,820 --> 00:10:47,820 And you can statically guarantee freedom from 265 00:10:47,820 --> 00:10:51,560 deadlock, which is a nice property to have. 266 00:10:51,560 --> 00:10:53,850 The last one I want to touch on is communicating sequential 267 00:10:53,850 --> 00:10:56,440 processes or CSP. 268 00:10:56,440 --> 00:10:59,080 And in the space of program behaviors, it's kind of an 269 00:10:59,080 --> 00:11:02,790 overlapping set with Kahn process networks, and adds 270 00:11:02,790 --> 00:11:05,170 a few new semantic behaviors. 271 00:11:05,170 --> 00:11:08,030 So the buffering model is basically rendezvous 272 00:11:08,030 --> 00:11:09,300 communication now. 273 00:11:09,300 --> 00:11:11,950 So there's no buffering in the system. 274 00:11:11,950 --> 00:11:15,390 Basically anytime you send a value to another process, you 275 00:11:15,390 --> 00:11:18,380 have to block and wait until that process will actually 276 00:11:18,380 --> 00:11:20,060 receive that value from you. 277 00:11:20,060 --> 00:11:24,570 So everyone is rendezvousing at every communication step. 278 00:11:24,570 --> 00:11:27,200 In addition to that, they have some sophisticated 279 00:11:27,200 --> 00:11:28,730 synchronization primitives. 
280 00:11:28,730 --> 00:11:32,760 So you can for example, describe alternative behaviors 281 00:11:32,760 --> 00:11:35,850 that you have. You can either do one thing or another which 282 00:11:35,850 --> 00:11:39,510 will introduce nondeterminism in the model. 283 00:11:39,510 --> 00:11:42,150 Which could be a good or a bad thing depending on the program 284 00:11:42,150 --> 00:11:43,550 you're trying to express. 285 00:11:43,550 --> 00:11:47,090 And pretty much the most well-known encapsulation of 286 00:11:47,090 --> 00:11:50,550 CSP is this occam programming language invented 287 00:11:50,550 --> 00:11:51,550 quite a while ago. 288 00:11:51,550 --> 00:11:54,420 And some people are still using that today. 289 00:11:54,420 --> 00:11:56,030 Any questions on the models of computation? 290 00:11:59,760 --> 00:12:00,720 OK. 291 00:12:00,720 --> 00:12:03,570 So now let me get into what StreamIt is. 292 00:12:03,570 --> 00:12:06,190 So StreamIt is a great language. 293 00:12:06,190 --> 00:12:07,380 It's a high-level 294 00:12:07,380 --> 00:12:08,930 architecture-independent language. 295 00:12:08,930 --> 00:12:11,660 Oh, question in the back. 296 00:12:11,660 --> 00:12:15,024 AUDIENCE: With the CSP I'm trying to understand exactly 297 00:12:15,024 --> 00:12:19,660 what that means or how it's different. 298 00:12:19,660 --> 00:12:22,150 Is basically what it's saying that you have a bunch of 299 00:12:22,150 --> 00:12:24,940 processes and they can send messages to each other? 300 00:12:24,940 --> 00:12:29,390 BILL THIES: So all these models have that property. 301 00:12:29,390 --> 00:12:31,230 AUDIENCE: They all fit into that. 302 00:12:31,230 --> 00:12:35,220 But it seems like from your explanation of CSP, that that 303 00:12:35,220 --> 00:12:41,620 was just sort of the essence of CSP. Is it more specific? 304 00:12:41,620 --> 00:12:43,820 BILL THIES: So CSP is usually associated with rendezvous 305 00:12:43,820 --> 00:12:45,360 communication. 
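[Editor's sketch] Rendezvous communication, as described for CSP, means a send does not complete until the matching receive has taken the value. The Python below is one way to imitate that with the standard `queue` module. It is an illustration only: the class `RendezvousChannel`, its `delivered` counter, and `run_demo` are hypothetical names, the sketch assumes a single sender and a single receiver, and it does not model occam-style guards or alternatives.

```python
import queue
import threading


class RendezvousChannel:
    """Unbuffered channel for one sender and one receiver (a sketch)."""

    def __init__(self):
        self._q = queue.Queue(maxsize=1)
        self.delivered = 0  # number of values the receiver has taken

    def send(self, value):
        self._q.put(value)
        self._q.join()  # block until the receiver acknowledges the value

    def recv(self):
        value = self._q.get()  # block until the sender offers a value
        self.delivered += 1
        self._q.task_done()  # release the sender waiting in join()
        return value


def run_demo(values):
    ch = RendezvousChannel()
    received, in_step = [], []

    def producer():
        for i, v in enumerate(values):
            ch.send(v)
            # When send() returns, exactly i + 1 values have been received.
            in_step.append(ch.delivered == i + 1)

    def consumer():
        for _ in values:
            received.append(ch.recv())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, in_step


if __name__ == "__main__":
    print(run_demo([10, 20, 30]))
```

The `in_step` list records that the producer can never run ahead of the consumer, which is the contrast with the unbounded FIFOs of a Kahn process network.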
306 00:12:45,360 --> 00:12:47,270 That's the set of programs that fit inside 307 00:12:47,270 --> 00:12:48,780 Kahn process networks. 308 00:12:48,780 --> 00:12:51,860 It's any communicating model where you basically have no 309 00:12:51,860 --> 00:12:54,310 buffering between the processes. 310 00:12:54,310 --> 00:12:57,520 Now the piece that sits outside is usually lumped with 311 00:12:57,520 --> 00:13:01,000 CSP, or especially with occam. They have a set of primitives 312 00:13:01,000 --> 00:13:04,500 that are richer in terms of synchronization. 313 00:13:04,500 --> 00:13:07,810 So for example, you can have guards on your communication. 314 00:13:07,810 --> 00:13:11,040 Don't execute this consumption from this channel until I see 315 00:13:11,040 --> 00:13:12,550 a certain value. 316 00:13:12,550 --> 00:13:15,580 So there's some richer semantics there. 317 00:13:15,580 --> 00:13:17,720 And so those are the things that are usually outside. 318 00:13:17,720 --> 00:13:19,710 They're outside the other models. 319 00:13:19,710 --> 00:13:23,940 Does that make sense? 320 00:13:23,940 --> 00:13:25,190 Other questions? 321 00:13:28,530 --> 00:13:32,210 OK so StreamIt. 322 00:13:32,210 --> 00:13:34,480 OK so StreamIt is architecture-independent. 323 00:13:34,480 --> 00:13:39,190 It's basically a really nice syntactic model for 324 00:13:39,190 --> 00:13:42,480 interfacing with these lower level models of computation 325 00:13:42,480 --> 00:13:43,530 for streaming. 326 00:13:43,530 --> 00:13:46,460 And really we have two goals in the StreamIt project. 327 00:13:46,460 --> 00:13:49,470 And the first is from the programmer's side. 328 00:13:49,470 --> 00:13:52,000 So we want to improve the programmer's life when you're 329 00:13:52,000 --> 00:13:53,360 writing a parallel program. 
330 00:13:53,360 --> 00:13:55,840 We want to make it easier for you to write a parallel 331 00:13:55,840 --> 00:13:59,060 program than you would have to do in C or a language like 332 00:13:59,060 --> 00:14:01,510 Java, or any other language that you know. 333 00:14:01,510 --> 00:14:03,960 And at the same time, we want scalable and portable 334 00:14:03,960 --> 00:14:06,100 performance across the multicores. 335 00:14:06,100 --> 00:14:08,820 So an interesting thing you'll find these days is that it's 336 00:14:08,820 --> 00:14:13,570 often very hard to tempt the programmer to switch to your 337 00:14:13,570 --> 00:14:16,240 favorite language based solely on performance. 338 00:14:16,240 --> 00:14:18,770 Or at least this has been the story in the past. It may 339 00:14:18,770 --> 00:14:20,210 change, looking forward. 340 00:14:20,210 --> 00:14:22,640 Because it's a lot harder to get performance these days. 341 00:14:22,640 --> 00:14:24,730 But usually you have to offer them some other carrot to get 342 00:14:24,730 --> 00:14:25,720 them on board. 343 00:14:25,720 --> 00:14:28,660 And you know the carrot here is that it's really nice to 344 00:14:28,660 --> 00:14:29,630 program in. 345 00:14:29,630 --> 00:14:30,880 It's fun to program in. 346 00:14:30,880 --> 00:14:32,200 It's beautiful. 347 00:14:32,200 --> 00:14:34,820 It's a lot easier to program in StreamIt than it would be 348 00:14:34,820 --> 00:14:37,230 in something like C or Java for a certain class of 349 00:14:37,230 --> 00:14:40,380 programs. So that's how we get them on board, and then we 350 00:14:40,380 --> 00:14:43,010 also provide the performance. 351 00:14:43,010 --> 00:14:45,420 We're mostly based on the synchronous dataflow model. 352 00:14:45,420 --> 00:14:49,380 In that when there are static communication patterns, we 353 00:14:49,380 --> 00:14:51,560 leverage that from the compiler side. 
354 00:14:51,560 --> 00:14:53,850 So I'll also tell you about some dynamic extensions that 355 00:14:53,850 --> 00:14:58,410 we have, that is a much richer model of communication. 356 00:14:58,410 --> 00:15:00,690 So what have we been doing in the StreamIt project? 357 00:15:00,690 --> 00:15:03,860 We have kind of a dual thrust within our group building on 358 00:15:03,860 --> 00:15:04,800 this language. 359 00:15:04,800 --> 00:15:07,470 So the first thrust is from the programmability side, 360 00:15:07,470 --> 00:15:10,140 looking at applications and programmability. What can we 361 00:15:10,140 --> 00:15:12,030 fit into the streaming model? 362 00:15:12,030 --> 00:15:14,260 And we're also really pushing the optimizations. 363 00:15:14,260 --> 00:15:17,660 So what can you do from both a domain specific optimization 364 00:15:17,660 --> 00:15:21,930 standpoint, as kind of emulating a DSP engineer or a 365 00:15:21,930 --> 00:15:24,770 signal processing expert in the design flow. 366 00:15:24,770 --> 00:15:26,860 And also architecture specific optimizations. 367 00:15:26,860 --> 00:15:29,520 So we've been compiling for a lot of parallel machines. 368 00:15:29,520 --> 00:15:31,860 And we were hoping we could have a full system for you 369 00:15:31,860 --> 00:15:34,735 guys, this IAP, so you could write it in StreamIt and 370 00:15:34,735 --> 00:15:35,520 then hit the button. 371 00:15:35,520 --> 00:15:37,900 And it would work the whole way down to Cell. 372 00:15:37,900 --> 00:15:39,570 Unfortunately we're not quite there yet. 373 00:15:39,570 --> 00:15:42,510 But we do have a pretty robust compiler infrastructure. 374 00:15:42,510 --> 00:15:45,010 And you can download this off the web and play with it if 375 00:15:45,010 --> 00:15:46,190 you want to. 376 00:15:46,190 --> 00:15:49,270 One of our backends that we've released so far actually does 377 00:15:49,270 --> 00:15:52,260 go to a cluster of workstations. 
378 00:15:52,260 --> 00:15:56,550 So it's kind of an MPI-like version of C. It uses Pthreads 379 00:15:56,550 --> 00:15:58,120 for the parallelism model. 380 00:15:58,120 --> 00:16:00,630 And I mean, depending on what kind of a hacker you are, you 381 00:16:00,630 --> 00:16:02,950 actually might be able to lower that down onto Cell. 382 00:16:02,950 --> 00:16:07,840 So some of the stuff you might be able to use if you have 383 00:16:07,840 --> 00:16:09,050 some initiative in there. 384 00:16:09,050 --> 00:16:10,400 And of course we'd be willing to work with 385 00:16:10,400 --> 00:16:12,260 you on this as well. 386 00:16:12,260 --> 00:16:14,460 So we have lots of optimizations in the tool flow. 387 00:16:14,460 --> 00:16:17,280 And actually Saman will spend another lecture focusing on 388 00:16:17,280 --> 00:16:19,240 the StreamIt compiler, and how we get 389 00:16:19,240 --> 00:16:22,010 performance out of the model. 390 00:16:22,010 --> 00:16:25,330 OK, so let's just jump right in and do the analog of Hello 391 00:16:25,330 --> 00:16:26,080 World in StreamIt. 392 00:16:26,080 --> 00:16:28,470 I'm going to kind of walk you through the language and show 393 00:16:28,470 --> 00:16:30,950 you the interesting pieces from an intellectual 394 00:16:30,950 --> 00:16:31,380 standpoint. 395 00:16:31,380 --> 00:16:33,720 What's interesting about a streaming model that 396 00:16:33,720 --> 00:16:35,280 you can take away. 397 00:16:35,280 --> 00:16:38,100 So instead of Hello World, we have a counter. 398 00:16:38,100 --> 00:16:40,320 Since we're dealing with stream programs here, you're 399 00:16:40,320 --> 00:16:42,200 not usually doing text processing. 400 00:16:42,200 --> 00:16:43,820 So how do you write a counter? 401 00:16:43,820 --> 00:16:46,040 Well there are two pieces to the program. 402 00:16:46,040 --> 00:16:48,430 The first is kind of the interconnect between the 403 00:16:48,430 --> 00:16:49,270 different components. 
404 00:16:49,270 --> 00:16:50,950 That's what we have up here. 405 00:16:50,950 --> 00:16:54,210 We're saying the program is a pipeline with two stages, it 406 00:16:54,210 --> 00:16:56,690 has a source, and it has a printer. 407 00:16:56,690 --> 00:16:58,120 And then we can write the source and 408 00:16:58,120 --> 00:16:59,720 the printer as filters. 409 00:16:59,720 --> 00:17:03,190 We call those basic building blocks filters in StreamIt. 410 00:17:03,190 --> 00:17:05,430 So the source will just have a variable x that it 411 00:17:05,430 --> 00:17:07,130 initializes to zero. 412 00:17:07,130 --> 00:17:09,450 And then we have a work function which is 413 00:17:09,450 --> 00:17:12,570 automatically called by our runtime system every time 414 00:17:12,570 --> 00:17:13,820 through the steady state. 415 00:17:13,820 --> 00:17:16,400 So this work function will push one item on to the output 416 00:17:16,400 --> 00:17:20,130 channel, and it'll increment the value afterward. 417 00:17:20,130 --> 00:17:22,870 Whereas the intPrinter at the bottom here, 418 00:17:22,870 --> 00:17:24,720 will input one value. 419 00:17:24,720 --> 00:17:27,080 And its work function here just pops that value off the 420 00:17:27,080 --> 00:17:29,940 input tape, and prints it to the output. 421 00:17:29,940 --> 00:17:31,180 Now how do we run this thing? 422 00:17:31,180 --> 00:17:33,490 Well there's no main function here like you 423 00:17:33,490 --> 00:17:34,730 see in Hello World. 424 00:17:34,730 --> 00:17:36,190 Is there a comment? 425 00:17:36,190 --> 00:17:39,207 Oh, sorry. 426 00:17:39,207 --> 00:17:42,180 AUDIENCE: The two meanings of push and two meanings of pop. 427 00:17:42,180 --> 00:17:42,670 BILL THIES: Two meanings? 428 00:17:42,670 --> 00:17:44,120 AUDIENCE: Push 1, 2, 3. 429 00:17:44,120 --> 00:17:44,970 BILL THIES: Yeah, yeah. 
430 00:17:44,970 --> 00:17:48,490 So the first push here is just declaring that this work 431 00:17:48,490 --> 00:17:51,300 function will push one item to the output tape. 432 00:17:51,300 --> 00:17:53,450 So this is the synchronous dataflow aspect. 433 00:17:53,450 --> 00:17:56,510 We're associating an output rate with this work function. 434 00:17:56,510 --> 00:17:58,480 So that's a declaration here. 435 00:17:58,480 --> 00:18:02,190 Whereas this push is just actually executing the push 436 00:18:02,190 --> 00:18:03,760 onto the output. 437 00:18:03,760 --> 00:18:06,000 So how do we run this thing? 438 00:18:06,000 --> 00:18:08,770 Well we compile it with the StreamIt compiler, strc, 439 00:18:08,770 --> 00:18:09,980 into a binary. 440 00:18:09,980 --> 00:18:12,170 And then when we run, we run for a given number of 441 00:18:12,170 --> 00:18:13,390 iterations. 442 00:18:13,390 --> 00:18:15,100 So you don't just call it once. 443 00:18:15,100 --> 00:18:18,060 Our model is that this is a continuous stream of data 444 00:18:18,060 --> 00:18:19,420 going through the program. 445 00:18:19,420 --> 00:18:22,110 And so when you run it, you run it for some number of 446 00:18:22,110 --> 00:18:25,570 iterations, or basically input or output items, is what 447 00:18:25,570 --> 00:18:26,590 you're running it for. 448 00:18:26,590 --> 00:18:28,470 So if you run this for four iterations, it would 449 00:18:28,470 --> 00:18:31,060 print 0 through 3. 450 00:18:31,060 --> 00:18:33,540 So we can leverage this steady flow of data. 451 00:18:33,540 --> 00:18:35,050 Yeah Amir? 452 00:18:35,050 --> 00:18:38,500 AUDIENCE: 1, 2, 3, 4, pushing X plus plus . 453 00:18:38,500 --> 00:18:40,320 BILL THIES: I think the plus, plus is 454 00:18:40,320 --> 00:18:41,140 executed after the push. 455 00:18:41,140 --> 00:18:43,010 AUDIENCE: Push plus plus? 456 00:18:43,010 --> 00:18:46,570 BILL THIES: So it starts at zero.
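The exchange about push and the postfix increment can be checked with a small simulation. This is only a Python sketch of the two-stage counter pipeline described in the lecture, not StreamIt code; the class names `IntSource` and `IntPrinter`, the `work` methods, and the `deque` standing in for the tape between the filters are all assumptions made for illustration.

```python
from collections import deque


class IntSource:
    """Hypothetical stand-in for the source filter: pushes 1 item per work call."""

    def __init__(self):
        self.x = 0  # state initialized to zero, as in the lecture

    def work(self, out_tape):
        out_tape.append(self.x)  # push the current value...
        self.x += 1              # ...then increment (postfix semantics)


class IntPrinter:
    """Hypothetical stand-in for the printer filter: pops 1 item per work call."""

    def __init__(self):
        self.printed = []

    def work(self, in_tape):
        value = in_tape.popleft()  # pop one item off the input tape
        print(value)
        self.printed.append(value)


def run_counter(iterations):
    """Run the source -> printer pipeline for a given number of iterations."""
    tape = deque()
    source, printer = IntSource(), IntPrinter()
    for _ in range(iterations):
        source.work(tape)
        printer.work(tape)
    return printer.printed
```

Under these assumptions, four iterations print 0, 1, 2, 3, because the value is pushed before it is incremented.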
457 00:18:46,570 --> 00:18:50,670 So I think a postfix expression executes after the 458 00:18:50,670 --> 00:18:51,520 actual expression. 459 00:18:51,520 --> 00:18:53,780 Yeah. 460 00:18:53,780 --> 00:18:56,220 Yeah. 461 00:18:56,220 --> 00:18:57,470 Other questions? 462 00:18:59,680 --> 00:19:02,200 OK so let's step up a level and look at 463 00:19:02,200 --> 00:19:03,890 what we have in StreamIt. 464 00:19:03,890 --> 00:19:05,940 So the first question is, how do you represent this 465 00:19:05,940 --> 00:19:08,350 connectivity between different building blocks? 466 00:19:08,350 --> 00:19:10,200 How do you represent streams? 467 00:19:10,200 --> 00:19:13,160 And if you look at traditional programming models, kind of 468 00:19:13,160 --> 00:19:15,140 the conventional wisdom is that a stream 469 00:19:15,140 --> 00:19:16,490 program is a graph. 470 00:19:16,490 --> 00:19:17,560 You have different nodes that are 471 00:19:17,560 --> 00:19:19,060 communicating to each other. 472 00:19:19,060 --> 00:19:21,350 And graphs are actually kind of hard to analyze. 473 00:19:21,350 --> 00:19:22,420 They're hard to represent. 474 00:19:22,420 --> 00:19:24,090 They're a little bit confusing. 475 00:19:24,090 --> 00:19:26,900 So the approach we decided to take in StreamIt is one of a 476 00:19:26,900 --> 00:19:29,170 structured computation graph. 477 00:19:29,170 --> 00:19:32,145 So instead of having arbitrary interconnections between the 478 00:19:32,145 --> 00:19:35,280 stages, we have a hierarchical description in 479 00:19:35,280 --> 00:19:38,150 which every individual stage has a single input and a 480 00:19:38,150 --> 00:19:39,340 single output. 481 00:19:39,340 --> 00:19:41,760 And you can compose these together into 482 00:19:41,760 --> 00:19:42,990 higher level stages. 483 00:19:42,990 --> 00:19:45,600 Of course there's some stages that do split and join with 484 00:19:45,600 --> 00:19:46,770 multiple inputs. 485 00:19:46,770 --> 00:19:48,520 We'll get to that.
486 00:19:48,520 --> 00:19:52,400 So the analog here is kind of analogous to structured 487 00:19:52,400 --> 00:19:54,160 control flow, in your favorite 488 00:19:54,160 --> 00:19:55,780 imperative programming language. 489 00:19:55,780 --> 00:19:59,060 Of course there was a day when everyone used goto statements 490 00:19:59,060 --> 00:20:01,060 instead of having structured control flow. 491 00:20:01,060 --> 00:20:03,400 We've got a fan of goto statements in the audience? 492 00:20:03,400 --> 00:20:06,550 OK, I'll get to you later. 493 00:20:06,550 --> 00:20:09,160 But the problem was, it's really hard to understand the 494 00:20:09,160 --> 00:20:12,110 program that's jumping all over the place because there's 495 00:20:12,110 --> 00:20:14,810 no local reasoning you can have. You know you're jumping 496 00:20:14,810 --> 00:20:17,150 to this location, you're coming back a different way. 497 00:20:17,150 --> 00:20:19,950 It's hard to reason about program components. 498 00:20:19,950 --> 00:20:21,960 So when people went to structured control flow, 499 00:20:21,960 --> 00:20:26,410 there's just if else, for loop statements. 500 00:20:26,410 --> 00:20:28,740 Those are the two basic constructs. 501 00:20:28,740 --> 00:20:32,250 You can basically express all kinds of computation in those 502 00:20:32,250 --> 00:20:33,540 simple primitives. 503 00:20:33,540 --> 00:20:35,340 And things got a lot simpler. 504 00:20:35,340 --> 00:20:37,500 And you know people objected at one point even. 505 00:20:37,500 --> 00:20:39,630 You know what about a finite-state machine? 506 00:20:39,630 --> 00:20:41,990 Don't you need goto statements for a finite-state machine, 507 00:20:41,990 --> 00:20:43,980 going from one state to another. 508 00:20:43,980 --> 00:20:46,850 And now everyone writes an FSM with a really simple idiom. 509 00:20:46,850 --> 00:20:50,090 You usually have a while loop around a case statement. 510 00:20:50,090 --> 00:20:51,830 Right, you have a dispatch loop.
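The dispatch-loop idiom mentioned here, a while loop around a case statement, might look like the following in Python. The turnstile machine, its state names, and its event names are invented for illustration and do not come from the lecture; an if/elif chain stands in for the case statement.

```python
def run_turnstile(events):
    """Finite-state machine written as a dispatch loop, with no gotos.

    States and events are hypothetical: a turnstile that unlocks on
    "coin" and locks again on "push".
    """
    state = "locked"
    i = 0
    while i < len(events):           # the while loop...
        event = events[i]
        if state == "locked":        # ...around a case on the current state
            if event == "coin":
                state = "unlocked"
        elif state == "unlocked":
            if event == "push":
                state = "locked"
        i += 1
    return state
```

Every transition is visible inside one loop body, which is what makes the pattern recognizable as a state machine rather than a set of jumps.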
511 00:20:51,830 --> 00:20:53,780 So and now whenever you see that pattern you can 512 00:20:53,780 --> 00:20:55,490 recognize, oh there's a finite-state machine. 513 00:20:55,490 --> 00:20:57,200 It's not just a set of gotos. 514 00:20:57,200 --> 00:20:58,840 It's a finite-state machine. 515 00:20:58,840 --> 00:21:00,370 So we think there are similar idioms in 516 00:21:00,370 --> 00:21:01,140 the streaming domain. 517 00:21:01,140 --> 00:21:02,820 And that's kind of the direction we're pushing from a 518 00:21:02,820 --> 00:21:04,600 design standpoint. 519 00:21:04,600 --> 00:21:06,510 So what are our structures that we have? 520 00:21:06,510 --> 00:21:09,310 Well here are our structured streams. At the 521 00:21:09,310 --> 00:21:10,310 base we have a filter. 522 00:21:10,310 --> 00:21:13,100 That's just the programmable unit like I showed you. 523 00:21:13,100 --> 00:21:16,430 We have a pipeline, which just connects one stream to another 524 00:21:16,430 --> 00:21:17,400 in a sequence. 525 00:21:17,400 --> 00:21:19,620 So this gives you pipeline parallelism. 526 00:21:19,620 --> 00:21:22,270 There's a splitjoin where you have explicit parallelism in 527 00:21:22,270 --> 00:21:23,020 the stream. 528 00:21:23,020 --> 00:21:25,470 So I'll talk about what these splitters and joiners can do. 529 00:21:25,470 --> 00:21:28,730 It's basically a predefined pattern of scattering data to 530 00:21:28,730 --> 00:21:31,530 some child streams, and then gathering that data back into 531 00:21:31,530 --> 00:21:32,770 a single stream. 532 00:21:32,770 --> 00:21:35,500 So the whole construct still remains single input and 533 00:21:35,500 --> 00:21:37,020 single output. 534 00:21:37,020 --> 00:21:40,720 Likewise a feedback loop is just a simple way to put a 535 00:21:40,720 --> 00:21:42,340 loop in your stream. 536 00:21:42,340 --> 00:21:44,160 And of course these are hierarchical.
537 00:21:44,160 --> 00:21:47,160 So all of these green boxes can be any of the three 538 00:21:47,160 --> 00:21:47,910 constructs. 539 00:21:47,910 --> 00:21:50,270 So that's how you can have these hierarchical graphs. 540 00:21:50,270 --> 00:21:52,740 And again, since everything is single-input single-output, 541 00:21:52,740 --> 00:21:53,920 you can really mix and match. 542 00:21:53,920 --> 00:21:56,040 You know choose your favorite components, and they'll always 543 00:21:56,040 --> 00:21:56,760 fit together. 544 00:21:56,760 --> 00:22:00,390 You don't need to stitch multiple connections. 545 00:22:00,390 --> 00:22:03,010 So let's dive inside one of these filters now. 546 00:22:03,010 --> 00:22:05,620 And I gave you a feel for how they look before, but here's a 547 00:22:05,620 --> 00:22:07,080 little more detail. 548 00:22:07,080 --> 00:22:09,130 So how do we program the filter? 549 00:22:09,130 --> 00:22:11,600 Well a filter just transforms one stream 550 00:22:11,600 --> 00:22:12,950 into another stream. 551 00:22:12,950 --> 00:22:15,260 And here it's transforming a stream of floating-point 552 00:22:15,260 --> 00:22:17,780 numbers into another floating-point number stream. 553 00:22:17,780 --> 00:22:20,800 It can take some parameters at the top. 554 00:22:20,800 --> 00:22:23,310 These are actually fixed at compile time in our model, 555 00:22:23,310 --> 00:22:25,860 which allows the compiler to really specialize the filter's 556 00:22:25,860 --> 00:22:28,720 code depending on the context in which it's being used. 557 00:22:28,720 --> 00:22:32,210 So for example here, we're inputting N and a frequency. 558 00:22:32,210 --> 00:22:34,740 And then we have two stages of execution. 559 00:22:34,740 --> 00:22:38,220 At initialization time-- this runs once at the beginning of 560 00:22:38,220 --> 00:22:40,940 the whole program-- we can calculate some weights for 561 00:22:40,940 --> 00:22:42,720 example, from the frequency.
562 00:22:42,720 --> 00:22:45,550 And we can store those weights as a local variable. 563 00:22:45,550 --> 00:22:47,860 So you can think of this kind of like a Java class. 564 00:22:47,860 --> 00:22:49,260 You can have some member variables. 565 00:22:49,260 --> 00:22:52,350 You can retain state from one execution to the next. 566 00:22:52,350 --> 00:22:53,790 The work function is the closest 567 00:22:53,790 --> 00:22:55,350 thing to the main function. 568 00:22:55,350 --> 00:22:57,860 This is called repeatedly in the steady state. 569 00:22:57,860 --> 00:23:00,660 And here are the IO rates of the work function. 570 00:23:00,660 --> 00:23:03,660 This filter actually peeks at some data items. That means 571 00:23:03,660 --> 00:23:07,010 that it inspects more items on the input channel than it 572 00:23:07,010 --> 00:23:09,490 actually consumes on every iteration. 573 00:23:09,490 --> 00:23:12,920 So we'll look at N input items, and we'll push one new 574 00:23:12,920 --> 00:23:17,040 item onto the output and pop one item from the input tape 575 00:23:17,040 --> 00:23:19,200 every time we execute. 576 00:23:19,200 --> 00:23:21,800 So here we have a sliding window computation. 577 00:23:21,800 --> 00:23:24,330 It means the next time through, we'll just slide this 578 00:23:24,330 --> 00:23:27,180 window up by one and inspect the next N items 579 00:23:27,180 --> 00:23:28,830 on the input tape. 580 00:23:28,830 --> 00:23:31,170 And inside the work function you can have pretty much 581 00:23:31,170 --> 00:23:32,590 general purpose code. 582 00:23:32,590 --> 00:23:34,520 Right now we just disallow pointers and a few other 583 00:23:34,520 --> 00:23:35,980 things to keep things simple. 584 00:23:35,980 --> 00:23:38,050 But the idea is this is general purpose imperative 585 00:23:38,050 --> 00:23:40,380 code inside the work function.
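The peek N, pop 1, push 1 behavior described here can be sketched in Python as a sliding-window weighted sum. This is an illustration under assumptions, not the filter from the slide: the function name `sliding_window_filter` is made up, the weights are passed in directly rather than computed from a frequency, and a `deque` models the input tape.

```python
from collections import deque


def sliding_window_filter(inputs, weights):
    """Model a filter with rates peek N, pop 1, push 1.

    On each firing it inspects the first N items on the tape without
    consuming them, pushes one weighted sum, then pops a single item
    so the window slides forward by one.
    """
    n = len(weights)
    tape = deque(inputs)
    output = []
    while len(tape) >= n:                                     # fire only when N items can be peeked
        total = sum(tape[i] * weights[i] for i in range(n))   # peek(i) for i in 0..N-1
        output.append(total)                                  # push 1
        tape.popleft()                                        # pop 1
    return output
```

For example, with weights `[1, 1]` the inputs `[1, 2, 3, 4]` produce `[3, 5, 7]`: three firings, each looking at two items but consuming only one.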
586 00:23:40,380 --> 00:23:44,160 Now what's nice about this representation of a filter is 587 00:23:44,160 --> 00:23:46,020 for one thing is this peek function. 588 00:23:46,020 --> 00:23:48,750 So what we really have is a nice representation of the 589 00:23:48,750 --> 00:23:51,100 data pattern, that you're reading the data 590 00:23:51,100 --> 00:23:52,470 on the input channel. 591 00:23:52,470 --> 00:23:55,290 And if you look at this for example in a language like C, 592 00:23:55,290 --> 00:23:56,730 it's a lot messier. 593 00:23:56,730 --> 00:23:58,990 So usually when you're doing buffer management, you have to 594 00:23:58,990 --> 00:24:00,610 do some modulo operations. 595 00:24:00,610 --> 00:24:03,290 You have to keep a circular buffer of your live data. 596 00:24:03,290 --> 00:24:06,540 And increase you know, a head or tail pointer and mod around 597 00:24:06,540 --> 00:24:08,910 the side with some kind of modulo operation. 598 00:24:08,910 --> 00:24:11,580 And for a compiler, this is a real nightmare. 599 00:24:11,580 --> 00:24:13,340 Because modulo operations are kind of the 600 00:24:13,340 --> 00:24:14,680 worst thing to analyze. 601 00:24:14,680 --> 00:24:16,760 You can't see what it's actually trying to read. 602 00:24:16,760 --> 00:24:19,060 And if you want to map this buffer to a network or to a 603 00:24:19,060 --> 00:24:22,460 combined communication with some other actor or filter in 604 00:24:22,460 --> 00:24:24,690 the graph, it's pretty much impossible. 605 00:24:24,690 --> 00:24:27,540 So here we're exposing that all to the compiler. 606 00:24:27,540 --> 00:24:29,610 And you'll see how that can make a difference. 607 00:24:29,610 --> 00:24:31,450 And also it's just a lot easier to program. 608 00:24:31,450 --> 00:24:33,040 I mean, I don't like looking at the code. 609 00:24:33,040 --> 00:24:36,260 So I'm going to go to the next slide. 610 00:24:36,260 --> 00:24:38,930 OK, so how do we piece things together?
611 00:24:38,930 --> 00:24:41,560 Let's just build some higher level components. 612 00:24:41,560 --> 00:24:44,380 So here's a pipeline of two components. 613 00:24:44,380 --> 00:24:45,870 And we already saw a pipeline. 614 00:24:45,870 --> 00:24:48,920 You can just add one component after another. 615 00:24:48,920 --> 00:24:53,680 And add just basically has the effect of making a queue, and 616 00:24:53,680 --> 00:24:56,570 just queueing up all of the components that you added, and 617 00:24:56,570 --> 00:24:58,530 connecting them one after another. 618 00:24:58,530 --> 00:25:01,560 So here we have a BandPassFilter by connecting a 619 00:25:01,560 --> 00:25:04,310 LowPassFilter and feeding its output into a HighPassFilter. 620 00:25:04,310 --> 00:25:07,650 You end up with a BandPassFilter. 621 00:25:07,650 --> 00:25:08,850 OK what about a splitjoin? 622 00:25:08,850 --> 00:25:11,290 How do we make those? 623 00:25:11,290 --> 00:25:14,730 So a splitjoin has an add statement as well. 624 00:25:14,730 --> 00:25:17,260 And here we're adding components in a loop. 625 00:25:17,260 --> 00:25:20,710 So what this means is now when we say add, we're actually 626 00:25:20,710 --> 00:25:22,700 adding from left to right. 627 00:25:22,700 --> 00:25:24,920 So instead of going top to bottom, we're adding from left 628 00:25:24,920 --> 00:25:26,610 to right across the splitjoin. 629 00:25:26,610 --> 00:25:28,440 And we can actually do that in a loop. 630 00:25:28,440 --> 00:25:31,880 So here we're inputting a parameter N. And depending on 631 00:25:31,880 --> 00:25:34,110 that value, we'll add N 632 00:25:34,110 --> 00:25:36,420 BandPassFilters to this splitjoin. 633 00:25:36,420 --> 00:25:37,590 So it's kind of cool, right? 634 00:25:37,590 --> 00:25:40,000 Because you can input a parameter, and that parameter 635 00:25:40,000 --> 00:25:43,180 actually affects the structure of your graph.
636 00:25:43,180 --> 00:25:47,450 So this graph is unrolled at compile time by our compiler, 637 00:25:47,450 --> 00:25:50,110 constructing a big sequence of computations. 638 00:25:50,110 --> 00:25:52,440 And it can resolve the structure and communication 639 00:25:52,440 --> 00:25:55,050 pattern in that graph, and then map it to the underlying 640 00:25:55,050 --> 00:25:56,300 substrate when we compile. 641 00:25:58,820 --> 00:26:00,680 Also to notice here are the splitter and the joiner. 642 00:26:00,680 --> 00:26:03,420 So we have a predefined set of splitters and joiners. 643 00:26:03,420 --> 00:26:05,760 I'll go into more detail later. 644 00:26:05,760 --> 00:26:07,750 But here we're just duplicating the data to every 645 00:26:07,750 --> 00:26:09,970 one of these parallel components, and then doing a 646 00:26:09,970 --> 00:26:12,970 round-robin join pattern where we bring them back together 647 00:26:12,970 --> 00:26:15,030 into a single output stream. 648 00:26:15,030 --> 00:26:17,150 And if you want to do an equalizer, you basically need 649 00:26:17,150 --> 00:26:18,990 an adder at the bottom to add the 650 00:26:18,990 --> 00:26:21,310 different components together. 651 00:26:21,310 --> 00:26:23,290 And another thing you can notice here is that we have 652 00:26:23,290 --> 00:26:24,960 some inlining going on. 653 00:26:24,960 --> 00:26:28,530 So we actually embedded this splitjoin inside a higher 654 00:26:28,530 --> 00:26:29,980 level pipeline. 655 00:26:29,980 --> 00:26:32,260 So what this does is it prevents you from having to 656 00:26:32,260 --> 00:26:34,340 name every component of your stream. 657 00:26:34,340 --> 00:26:37,870 You can have a single stream definition with lots of nested 658 00:26:37,870 --> 00:26:39,110 components.
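The duplicate splitter, N parallel children, round-robin joiner, and adder described in this part of the talk can be imitated in Python. This is a deliberately simplified sketch: each child is modeled as a stateless one-in, one-out function rather than a real band-pass filter, and the names `make_children`, `duplicate_splitjoin`, and `adder` are hypothetical.

```python
def make_children(n):
    """Build n child stages in a loop, so the parameter n shapes the graph.

    Child k simply scales its input by k + 1; real band-pass filters
    would go here instead.
    """
    return [lambda value, k=k: value * (k + 1) for k in range(n)]


def duplicate_splitjoin(inputs, children):
    """Duplicate each input to every child, then join round-robin with weight 1."""
    output = []
    for item in inputs:
        for child in children:          # children were added left to right
            output.append(child(item))  # joiner takes one item from each child in turn
    return output


def adder(stream, n):
    """Sum each group of n consecutive items, as the equalizer's final stage would."""
    return [sum(stream[i:i + n]) for i in range(0, len(stream), n)]
```

With three children, the inputs `[1, 2]` become `[1, 2, 3, 2, 4, 6]` after the join, and the adder collapses each group of three into `[6, 12]`.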
659 00:26:39,110 --> 00:26:42,770 And the natural extension is, you can basically scale-up to 660 00:26:42,770 --> 00:26:45,230 basically the natural size of a procedure, just like you 661 00:26:45,230 --> 00:26:47,160 would in an imperative language. 662 00:26:47,160 --> 00:26:50,190 And here is for example, an FM radio where we have a few 663 00:26:50,190 --> 00:26:51,650 inline components. 664 00:26:51,650 --> 00:26:54,530 And the interesting thing here is that there's a pretty good 665 00:26:54,530 --> 00:26:57,530 correspondence between the lines of the text and the 666 00:26:57,530 --> 00:26:59,480 actual structure of the graph. 667 00:26:59,480 --> 00:27:01,110 And that's something that's hard to find in 668 00:27:01,110 --> 00:27:02,100 an imperative language. 669 00:27:02,100 --> 00:27:05,120 I mean if you want to for example, stitch nodes together 670 00:27:05,120 --> 00:27:08,210 with edges, it's often very hard to visualize the 671 00:27:08,210 --> 00:27:11,140 resulting structure of the graph that you have. But here 672 00:27:11,140 --> 00:27:13,650 if you just go through the program, you can see that the 673 00:27:13,650 --> 00:27:16,940 AtoD component goes right over to the AtoD, the demodulator 674 00:27:16,940 --> 00:27:18,650 to the demodulator, and so on. 675 00:27:18,650 --> 00:27:21,640 And even for the parallel components, you can kind of 676 00:27:21,640 --> 00:27:23,720 piece them together. 677 00:27:23,720 --> 00:27:27,680 So that's kind of how we think of building programs. Any 678 00:27:27,680 --> 00:27:28,930 questions so far? 679 00:27:31,070 --> 00:27:33,390 OK so this is kind of how you go about 680 00:27:33,390 --> 00:27:34,920 programming in StreamIt. 681 00:27:34,920 --> 00:27:37,370 But programming is kind of a chug n' plug activity right? 682 00:27:37,370 --> 00:27:39,140 Nobody wants to be a code monkey.
683 00:27:39,140 --> 00:27:41,340 The reason we're all here is to see what's beautiful about 684 00:27:41,340 --> 00:27:42,590 this programming model. 685 00:27:42,590 --> 00:27:42,910 Right? 686 00:27:42,910 --> 00:27:45,410 Don Knuth said this, a famous computer scientist from 687 00:27:45,410 --> 00:27:48,320 Stanford during his Turing Award Lecture you know, 688 00:27:48,320 --> 00:27:50,020 "Some programs are elegant, some are 689 00:27:50,020 --> 00:27:52,460 exquisite, some are sparkling. 690 00:27:52,460 --> 00:27:54,520 My claim is that it is possible to write grand 691 00:27:54,520 --> 00:27:56,910 programs, noble programs, truly 692 00:27:56,910 --> 00:27:58,640 magnificent ones!" Right. 693 00:27:58,640 --> 00:28:00,210 We want the best programs possible. 694 00:28:00,210 --> 00:28:01,990 It's not just about making it work. 695 00:28:01,990 --> 00:28:04,850 We want really beautiful programs. So what's beautiful 696 00:28:04,850 --> 00:28:06,370 about the streaming domain? 697 00:28:06,370 --> 00:28:08,820 What can you go away with and say wow, that was a really 698 00:28:08,820 --> 00:28:11,540 beautiful expression of the computation. 699 00:28:11,540 --> 00:28:14,290 Well for me I think one of the interesting things here is the 700 00:28:14,290 --> 00:28:15,680 splitjoin construct. 701 00:28:15,680 --> 00:28:18,040 Splitjoins can really be beautiful. 702 00:28:18,040 --> 00:28:22,000 You know some mornings I just wake up and I'm like, oh I'm 703 00:28:22,000 --> 00:28:24,250 so glad I live in a world with splitjoins. 704 00:28:24,250 --> 00:28:25,510 You know? 705 00:28:25,510 --> 00:28:27,820 And now splitjoins will be part of your world. 706 00:28:27,820 --> 00:28:29,010 You can say this tomorrow. 707 00:28:29,010 --> 00:28:30,380 This is just wonderful. 708 00:28:30,380 --> 00:28:32,840 So OK, what do we have in the splitjoin constructs? 709 00:28:32,840 --> 00:28:34,900 You can duplicate data.
710 00:28:34,900 --> 00:28:37,650 You can do a round-robin communication pattern from one 711 00:28:37,650 --> 00:28:40,130 to another, or round-robin join. 712 00:28:40,130 --> 00:28:42,300 Now the duplicate is pretty simple. 713 00:28:42,300 --> 00:28:44,090 You just take the input stream and duplicate 714 00:28:44,090 --> 00:28:45,440 it to all the children. 715 00:28:45,440 --> 00:28:47,240 No problem. 716 00:28:47,240 --> 00:28:48,980 What do you do for a round-robin? 717 00:28:48,980 --> 00:28:53,260 Well you pass N items from the input to a given child. 718 00:28:53,260 --> 00:28:56,950 So for example, if N is 1, we'll just distribute one at a 719 00:28:56,950 --> 00:29:01,170 time to the child streams. And you get a pattern like this, a 720 00:29:01,170 --> 00:29:03,890 round-robin just going across. 721 00:29:03,890 --> 00:29:06,290 And you can do the same thing on the joiner side. 722 00:29:06,290 --> 00:29:09,310 Let's say you're joining with a factor of one. 723 00:29:09,310 --> 00:29:12,490 You're just reading from the children and putting them into 724 00:29:12,490 --> 00:29:14,530 a single stream. 725 00:29:14,530 --> 00:29:18,190 OK so that's pretty colorful, but nothing too fancy yet. 726 00:29:18,190 --> 00:29:20,460 Let's consider a different round-robin factor. 727 00:29:20,460 --> 00:29:24,420 So a round-robin of 2 means that we peel off 2 items from 728 00:29:24,420 --> 00:29:27,760 the input, and pass those items to the first output. 729 00:29:27,760 --> 00:29:29,590 OK there actually is going to be a quiz on this. 730 00:29:29,590 --> 00:29:32,630 So ask questions if this doesn't make sense. 731 00:29:32,630 --> 00:29:37,290 OK pass the next 2 items, and 2 items round-robin like that. 732 00:29:37,290 --> 00:29:39,010 And you can actually have nonuniform 733 00:29:39,010 --> 00:29:40,210 weights if you want to. 734 00:29:40,210 --> 00:29:41,740 So on the right let's say we're doing 735 00:29:41,740 --> 00:29:43,270 round-robin 1, 2, 3.
736 00:29:43,270 --> 00:29:46,620 That means we pass 1 item from the first stream, 2 items from 737 00:29:46,620 --> 00:29:49,670 the next stream, and then 3 items from the third stream. 738 00:29:49,670 --> 00:29:50,460 OK, pretty simple. 739 00:29:50,460 --> 00:29:53,160 We're just doing 1, 2, and 3, and so on. 740 00:29:53,160 --> 00:29:54,260 Does that make sense? 741 00:29:54,260 --> 00:29:59,050 I'm going to build on this so any questions? 742 00:29:59,050 --> 00:30:01,780 OK this was colorful but this isn't totally beautiful yet. 743 00:30:01,780 --> 00:30:03,670 So what's beautiful about this? 744 00:30:03,670 --> 00:30:06,440 Well let's see how you might write a matrix transpose. 745 00:30:06,440 --> 00:30:08,720 OK, something you guys have probably all written at one 746 00:30:08,720 --> 00:30:11,120 point in your life is transposing a matrix. 747 00:30:11,120 --> 00:30:14,430 Let's say this matrix has M rows, and it has N columns 748 00:30:14,430 --> 00:30:15,600 going across. 749 00:30:15,600 --> 00:30:17,830 And we're starting with a representation in which the 750 00:30:17,830 --> 00:30:20,660 stream is basically, I think this is row major order. 751 00:30:20,660 --> 00:30:23,430 The first thing that you're doing is going across the rows 752 00:30:23,430 --> 00:30:25,340 before you're going to the next row. 753 00:30:25,340 --> 00:30:27,680 I'm sorry you're going across the columns, and then down to 754 00:30:27,680 --> 00:30:28,770 the next row. 755 00:30:28,770 --> 00:30:30,710 So it's zigzagging like this. 756 00:30:30,710 --> 00:30:33,020 And you want to pass through a transpose, so you 757 00:30:33,020 --> 00:30:34,420 zigzag the other way. 758 00:30:34,420 --> 00:30:36,870 You do the first column, up, and then the next 759 00:30:36,870 --> 00:30:38,340 column, and so on. 760 00:30:38,340 --> 00:30:41,110 And this comes up a lot in a stream program. 761 00:30:41,110 --> 00:30:44,080 So it turns out you can represent this as a splitjoin. 
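The weighted round-robin splitter and joiner described a moment ago can be modeled in a few lines of Python. This is a sketch over finite lists rather than continuous streams; the function names `roundrobin_split` and `roundrobin_join` are assumptions, and the weights are assumed to be positive integers.

```python
from collections import deque


def roundrobin_split(items, weights):
    """Deal items to the children in turn, weights[k] items at a time to child k."""
    children = [[] for _ in weights]
    position = 0
    while position < len(items):
        for child, weight in zip(children, weights):
            child.extend(items[position:position + weight])
            position += weight
    return children


def roundrobin_join(children, weights):
    """Take weights[k] items from child k in turn and merge them into one stream."""
    queues = [deque(child) for child in children]
    output = []
    # Keep going only while every child can supply its full share.
    while all(len(queue) >= weight for queue, weight in zip(queues, weights)):
        for queue, weight in zip(queues, weights):
            for _ in range(weight):
                output.append(queue.popleft())
    return output
```

Splitting the items 0 through 11 with weights 1, 2, 3 gives the children `[0, 6]`, `[1, 2, 7, 8]`, and `[3, 4, 5, 9, 10, 11]`; joining them again with the same weights restores the original order.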
762 00:30:44,080 --> 00:30:45,570 Oh that's not good. 763 00:30:48,110 --> 00:30:52,420 Just in my moment of glory as well. 764 00:30:52,420 --> 00:30:53,430 OK, yeah slides. 765 00:30:53,430 --> 00:30:54,680 You can download slides. 766 00:30:57,540 --> 00:30:58,670 Actually I can do this on the board. 767 00:30:58,670 --> 00:30:59,980 This is the thinking part anyway. 768 00:30:59,980 --> 00:31:01,910 Yeah, could you just bring up GMail? 769 00:31:01,910 --> 00:31:03,820 I have a backup on GMail. 770 00:31:03,820 --> 00:31:05,880 Can we focus on the board? 771 00:31:05,880 --> 00:31:09,990 So this is going to be a little exercise anyway. 772 00:31:09,990 --> 00:31:14,380 OK, here's what we had. 773 00:31:14,380 --> 00:31:20,230 We had M, M rows, N columns. 774 00:31:20,230 --> 00:31:21,780 We started with an interleaving like this. 775 00:31:27,640 --> 00:31:31,060 Right, and we want to go into a splitjoin. 776 00:31:31,060 --> 00:31:33,030 And this will be a round-robin construct. 777 00:31:33,030 --> 00:31:35,090 And what I want you to do is fill in the round-robin 778 00:31:35,090 --> 00:31:41,020 weight, and also the number of the streams. And you can have 779 00:31:41,020 --> 00:31:42,270 a round-robin at the bottom. 780 00:31:44,690 --> 00:31:48,240 And when it comes out, you want the opposite 781 00:31:48,240 --> 00:31:49,490 interleaving. 782 00:31:56,130 --> 00:32:00,990 This is M and N. 783 00:32:00,990 --> 00:32:06,060 OK so there are 3 unknowns here, what you're doing the 784 00:32:06,060 --> 00:32:09,410 round-robin by-- can you guys see that over there-- 785 00:32:09,410 --> 00:32:12,290 how many parallel streams there are, and what you're 786 00:32:12,290 --> 00:32:13,630 joining the round-robin by. 787 00:32:17,520 --> 00:32:19,170 OK so I'm going to give you a minute to think about this. 788 00:32:19,170 --> 00:32:19,960 Try to think about this. 789 00:32:19,960 --> 00:32:22,170 See if you can figure out what these constants are.
790 00:32:22,170 --> 00:32:24,260 You just basically want to read from this data in a row 791 00:32:24,260 --> 00:32:28,770 major pattern, M rows and N columns, and end up with 792 00:32:28,770 --> 00:32:31,450 something that's column major. 793 00:32:31,450 --> 00:32:35,130 What are the values for the constant? 794 00:32:35,130 --> 00:32:35,570 Does it make sense? 795 00:32:35,570 --> 00:32:36,920 Ask a question if it doesn't make sense. 796 00:32:36,920 --> 00:32:38,870 Yeah? 797 00:32:38,870 --> 00:32:40,980 AUDIENCE: So we assume that values are going to, based on 798 00:32:40,980 --> 00:32:45,070 the line that you drew, across it, tests like a stream line? 799 00:32:45,070 --> 00:32:45,960 BILL THIES: Right, right, right. 800 00:32:45,960 --> 00:32:47,860 So those values are coming down the stream. 801 00:32:47,860 --> 00:32:49,870 You have a 1-dimensional stream. 802 00:32:49,870 --> 00:32:50,870 It's interleaved like this. 803 00:32:50,870 --> 00:32:52,850 So you'll be reading them like this. 804 00:32:52,850 --> 00:32:55,620 And then you output a 1-dimensional stream that is 805 00:32:55,620 --> 00:32:57,170 threading the columns. 806 00:32:57,170 --> 00:32:57,530 Does that make sense? 807 00:32:57,530 --> 00:33:00,950 Somebody ask another question. 808 00:33:00,950 --> 00:33:01,840 Yeah 809 00:33:01,840 --> 00:33:08,470 AUDIENCE: So the actual matrix transpose codes, it's my 810 00:33:08,470 --> 00:33:10,690 understanding that nobody actually does it sequentially 811 00:33:10,690 --> 00:33:13,300 like that because of locality issues. 812 00:33:13,300 --> 00:33:15,690 Instead it's broken up into blocks. 813 00:33:15,690 --> 00:33:18,900 BILL THIES: So there are ways to optimize this. 814 00:33:18,900 --> 00:33:21,150 AUDIENCE: And after you've sort of serialized it, can you 815 00:33:21,150 --> 00:33:25,000 then capture.. 816 00:33:25,000 --> 00:33:26,510 PROFESSOR: Guys, [UNINTELLIGIBLE PHRASE] 817 00:33:26,510 --> 00:33:28,260 has a blocking segment.
818 00:33:28,260 --> 00:33:30,540 So you can hierarchically do that. 819 00:33:30,540 --> 00:33:34,203 So, normally what happens is you do the blocks and inside 820 00:33:34,203 --> 00:33:35,580 the blocks, you can do it again. 821 00:33:35,580 --> 00:33:36,830 You can do it at two levels, basically. 822 00:33:45,950 --> 00:33:46,390 BILL THIES: Any hypotheses? 823 00:33:46,390 --> 00:33:46,690 Anyone? 824 00:33:46,690 --> 00:33:55,430 AUDIENCE: Is it N for first one, M for the second one? 825 00:33:55,430 --> 00:33:57,170 BILL THIES: OK, what do we have 826 00:33:57,170 --> 00:34:00,110 AUDIENCE: N for the first one, and M for the second? 827 00:34:00,110 --> 00:34:02,490 BILL THIES: N,M and 828 00:34:02,490 --> 00:34:05,105 AUDIENCE: Same for the M? 829 00:34:05,105 --> 00:34:07,990 BILL THIES: Same, M? 830 00:34:07,990 --> 00:34:09,460 AUDIENCE: Yeah. 831 00:34:09,460 --> 00:34:17,560 BILL THIES: OK OK, this is a hypothesis. 832 00:34:17,560 --> 00:34:19,940 Other hypotheses? 833 00:34:19,940 --> 00:34:21,920 AUDIENCE: N, M, 1. 834 00:34:26,360 --> 00:34:27,610 BILL THIES: OK, anyone else? 835 00:34:30,510 --> 00:34:33,000 Any amendments? 836 00:34:33,000 --> 00:34:34,270 AUDIENCE: How about 1, 1, 1? 837 00:34:34,270 --> 00:34:35,520 BILL THIES: 1, 1, 1? 838 00:34:38,750 --> 00:34:39,890 OK lottery is closing. 839 00:34:39,890 --> 00:34:40,630 Yep? 840 00:34:40,630 --> 00:34:41,945 AUDIENCE: 1, M, M 841 00:34:41,945 --> 00:34:44,030 BILL THIES: 1 M, M? 842 00:34:44,030 --> 00:34:53,260 1, M, M, OK and last call? 843 00:34:53,260 --> 00:34:57,890 OK I believe two of the ones submitted are correct. 844 00:34:57,890 --> 00:35:01,190 Is this and, yeah? 845 00:35:01,190 --> 00:35:05,480 OK I think this, N, M, 1, works and 1, N, M, works. 846 00:35:05,480 --> 00:35:07,830 So let me explain the 1, N, M, this is how I like 847 00:35:07,830 --> 00:35:10,000 to think about it.
848 00:35:10,000 --> 00:35:11,510 One way to think about this is we just want to 849 00:35:11,510 --> 00:35:13,370 move the whole matrix. 850 00:35:13,370 --> 00:35:15,090 You doing OK, yeah? 851 00:35:15,090 --> 00:35:17,220 OK we just want to move the whole 852 00:35:17,220 --> 00:35:19,600 matrix into this splitjoin. 853 00:35:19,600 --> 00:35:23,290 So the way we can do that is have N columns of the 854 00:35:23,290 --> 00:35:25,660 splitjoin since we have N columns of the matrix. 855 00:35:25,660 --> 00:35:28,540 And what we'll do is we'll just do a round-robin one at a 856 00:35:28,540 --> 00:35:31,400 time, from the columns of the matrix into the 857 00:35:31,400 --> 00:35:33,260 columns of the splitjoin. 858 00:35:33,260 --> 00:35:35,530 So we'll take the first element, send it to the left, 859 00:35:35,530 --> 00:35:38,070 next element, next column, and so on. 860 00:35:38,070 --> 00:35:40,380 So we get the whole matrix here. Now I want to read it 861 00:35:40,380 --> 00:35:41,540 out column-wise. 862 00:35:41,540 --> 00:35:44,650 So we'll do a joiner of M. We'll read M from the left 863 00:35:44,650 --> 00:35:48,260 stream that'll read all M items from the columns, send 864 00:35:48,260 --> 00:35:50,300 it out, and then M items in the next column, and 865 00:35:50,300 --> 00:35:50,730 then send it out. 866 00:35:50,730 --> 00:35:52,490 Does that make sense? 867 00:35:55,960 --> 00:35:58,370 How many people understood that? 868 00:35:58,370 --> 00:35:58,760 All right, 869 00:35:58,760 --> 00:36:02,920 So if you think about it, you can also do it in N, M, 1. 870 00:36:02,920 --> 00:36:07,310 That's basically, yeah, it's very similar. 871 00:36:14,890 --> 00:36:17,420 OK there we were. 872 00:36:17,420 --> 00:36:18,990 And yes. 873 00:36:18,990 --> 00:36:21,570 We can do 1, N, M. We basically read the matrix 874 00:36:21,570 --> 00:36:25,420 down, and then pull it down into column.
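The two answers accepted in the exercise, split by 1 into N streams and join by M, or split by N into M streams and join by 1, can be checked with a short Python sketch. It assumes the stream holds whole M-by-N matrices in row-major order and that every child is an identity filter; the function names are hypothetical.

```python
def transpose_1_n_m(stream, m, n):
    """Split round-robin by 1 across n identity children, join round-robin by m."""
    columns = [[] for _ in range(n)]
    for index, value in enumerate(stream):
        columns[index % n].append(value)        # one item at a time, left to right
    output = []
    for matrix in range(len(stream) // (m * n)):
        for column in columns:                  # joiner: m items from each child in turn
            output.extend(column[matrix * m:(matrix + 1) * m])
    return output


def transpose_n_m_1(stream, m, n):
    """Split round-robin by n across m identity children, join round-robin by 1."""
    rows = [[] for _ in range(m)]
    for index, value in enumerate(stream):
        rows[(index // n) % m].append(value)    # n items at a time to each child
    output = []
    for position in range(len(stream) // m):
        for row in rows:                        # joiner: one item from each child in turn
            output.append(row[position])
    return output
```

For a 2-by-3 matrix streamed as `[0, 1, 2, 3, 4, 5]`, both versions produce `[0, 3, 1, 4, 2, 5]`, which is the same matrix in column-major order.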
875 00:36:25,420 --> 00:36:29,260 And it's very easy to write this as a transpose. 876 00:36:29,260 --> 00:36:31,660 So we just have a transpose filter in 877 00:36:31,660 --> 00:36:32,920 which we're doing nothing. 878 00:36:32,920 --> 00:36:36,630 No computation in the actual rows or the actual contents of 879 00:36:36,630 --> 00:36:37,790 the splitjoin. 880 00:36:37,790 --> 00:36:41,140 And we just split the data by 1, have N different identity 881 00:36:41,140 --> 00:36:44,390 filters, and then join it back together by M. Any questions 882 00:36:44,390 --> 00:36:45,700 about this? 883 00:36:45,700 --> 00:36:49,080 An interesting way to write a transpose. 884 00:36:49,080 --> 00:36:52,260 OK so there's one more opportunity to shine here. 885 00:36:52,260 --> 00:36:54,660 And that's a little more interesting permutation called 886 00:36:54,660 --> 00:36:55,590 a bit-reversed ordering. 887 00:36:55,590 --> 00:36:59,120 And so this comes up in an FFT and other algorithms. 888 00:36:59,120 --> 00:37:01,460 The permutation here is that we're taking the data 889 00:37:01,460 --> 00:37:03,030 at the index n. 890 00:37:03,030 --> 00:37:05,690 And let's say n has binary digits b sub 0, b sub 891 00:37:05,690 --> 00:37:07,640 1, up to b sub k. 892 00:37:07,640 --> 00:37:11,390 And we want to rearrange that data, so this data goes to a 893 00:37:11,390 --> 00:37:15,630 different index with the reversed bits of its index. 894 00:37:15,630 --> 00:37:19,320 So if it was at index n before, it ends up at b sub k 895 00:37:19,320 --> 00:37:21,250 down to b sub 1, b sub 0. 896 00:37:21,250 --> 00:37:22,630 So for example, let's just look at 897 00:37:22,630 --> 00:37:24,330 3-digit binary numbers. 898 00:37:24,330 --> 00:37:28,850 If we have 0, 0, 0, this is 1 input item. 899 00:37:28,850 --> 00:37:32,720 We're reversing those digits, you still get 0, 0, 0. 900 00:37:32,720 --> 00:37:35,590 Item at index 1 will be 0, 0, 1.
901 00:37:35,590 --> 00:37:38,330 We want to reorder that to index 4. 902 00:37:38,330 --> 00:37:44,270 OK, 1, 0, 0. 0, 1, 0 stays the same. 0, 1, 1 goes to 1, 903 00:37:44,270 --> 00:37:46,000 1, 0, shifts over. 904 00:37:46,000 --> 00:37:49,040 And from there on, it's symmetric. 905 00:37:49,040 --> 00:37:49,410 AUDIENCE: [UNINTELLIGIBLE PHRASE] 906 00:37:49,410 --> 00:37:51,540 BILL THIES: Sorry? 907 00:37:51,540 --> 00:37:51,710 AUDIENCE: [UNINTELLIGIBLE PHRASE]. 908 00:37:51,710 --> 00:37:52,190 BILL THIES: OK. 909 00:37:52,190 --> 00:37:54,480 So here I'm writing the indices. 910 00:37:54,480 --> 00:37:56,790 So I'm not writing the data. 911 00:37:56,790 --> 00:38:00,940 So index 0, 1, 2, 3 up through index 8, or rather index 7. 912 00:38:00,940 --> 00:38:04,340 OK and on the bottom you have indices 0 through 7. 913 00:38:04,340 --> 00:38:07,780 So the data will actually be moved, reordered like that. 914 00:38:07,780 --> 00:38:08,360 Does that make sense? 915 00:38:08,360 --> 00:38:11,320 It's a reordering of data. 916 00:38:11,320 --> 00:38:12,660 Does this transformation make sense? 917 00:38:12,660 --> 00:38:13,980 Other questions? 918 00:38:13,980 --> 00:38:17,940 OK, it turns out you can write this as a splitjoin. 919 00:38:17,940 --> 00:38:21,380 And you just need 3 different weights for the round-robins. 920 00:38:21,380 --> 00:38:26,010 OK round-robin with 1 weight on the top here, and two 921 00:38:26,010 --> 00:38:28,480 different round-robin weights on the bottom. 922 00:38:28,480 --> 00:38:32,460 And here I'm assuming you have 3 binary digits in your index. 923 00:38:32,460 --> 00:38:35,290 So you're reordering in groups of 8. 924 00:38:35,290 --> 00:38:37,510 OK, so let me give you a second to think about this. 925 00:38:37,510 --> 00:38:40,670 What are the values for these weights? 926 00:38:40,670 --> 00:38:41,990 I'll give you a hint in just a second, 927 00:38:41,990 --> 00:38:44,890 or ask another question. 928 00:38:44,890 --> 00:38:46,096 Yes?
929 00:38:46,096 --> 00:38:47,346 AUDIENCE: [UNINTELLIGIBLE PHRASE]. 930 00:38:51,170 --> 00:38:52,860 BILL THIES: So what we're doing here, is we're exposing 931 00:38:52,860 --> 00:38:54,240 the communication pattern. 932 00:38:54,240 --> 00:38:54,970 That's the thing. 933 00:38:54,970 --> 00:38:57,800 If you write this in an imperative way, you end up 934 00:38:57,800 --> 00:39:01,440 basically conflating the data 935 00:39:01,440 --> 00:39:03,930 dependencies with the reordering pattern. 936 00:39:03,930 --> 00:39:06,150 So what I'm trying to convey here, is how you can use the 937 00:39:06,150 --> 00:39:09,790 streaming model to show how you're sending data around. 938 00:39:09,790 --> 00:39:11,410 Because when you're on an architecture like Cell, 939 00:39:11,410 --> 00:39:13,610 everything is about the data motion. 940 00:39:13,610 --> 00:39:15,460 You're taking data from one place and you're trying to 941 00:39:15,460 --> 00:39:19,080 efficiently get it to the producers or the consumers. 942 00:39:19,080 --> 00:39:21,400 And you really need to-- the compiler needs to understand 943 00:39:21,400 --> 00:39:22,530 the data motion. 944 00:39:22,530 --> 00:39:24,700 And also, it's just another way of writing it, which I 945 00:39:24,700 --> 00:39:27,100 think is actually easier to understand once you see it. 946 00:39:27,100 --> 00:39:29,990 It's a way to think about the actual reordering from a 947 00:39:29,990 --> 00:39:33,060 theoretical standpoint. 948 00:39:33,060 --> 00:39:34,310 Any wagers on this? 949 00:39:39,550 --> 00:39:41,790 So what we want to do, if you think about the bit-reversed 950 00:39:41,790 --> 00:39:45,630 ordering, what we want to do is distribute the data by the 951 00:39:45,630 --> 00:39:48,300 low-order bits. 952 00:39:48,300 --> 00:39:52,080 And then gather the data by the high-order bits. 953 00:39:52,080 --> 00:39:54,450 So you want a fine-grained parity when you're shuffling.
954 00:39:54,450 --> 00:39:55,870 You can also do it the other way around. 955 00:39:55,870 --> 00:39:57,340 It's totally symmetrical. 956 00:39:57,340 --> 00:39:59,230 But one way to think about it is, you want a fine-grained 957 00:39:59,230 --> 00:40:02,090 parity when you're distributing data, and then a 958 00:40:02,090 --> 00:40:03,540 coarse-grained when you're coming together. 959 00:40:06,800 --> 00:40:08,050 Anyone see it? 960 00:40:14,120 --> 00:40:15,740 Give you ten more seconds. 961 00:40:21,530 --> 00:40:22,680 Come on? 962 00:40:22,680 --> 00:40:25,860 Yeah it is a little bit tricky. 963 00:40:25,860 --> 00:40:29,030 OK well let me explain how it works. 964 00:40:29,030 --> 00:40:31,370 So 1, 2, and 4. 965 00:40:31,370 --> 00:40:35,900 OK so what these round-robin 1 splitters do, is these are 966 00:40:35,900 --> 00:40:38,510 basically the fine-grained parities. 967 00:40:38,510 --> 00:40:41,740 So OK, the first round-robin, that will send all the even 968 00:40:41,740 --> 00:40:45,450 values to the left, and all the odd values to the right. 969 00:40:45,450 --> 00:40:47,060 Right, that's the lowest order bit. 970 00:40:47,060 --> 00:40:48,370 Because it's doing every other one, 971 00:40:48,370 --> 00:40:50,540 shuffling it left and right. 972 00:40:50,540 --> 00:40:53,630 So this round-robin is seeing only the even values. 973 00:40:53,630 --> 00:40:55,170 Now it's going to split them up based on who's 974 00:40:55,170 --> 00:40:56,370 divisible by 4. 975 00:40:56,370 --> 00:40:58,410 Now we'll go to the left or go to the right. 976 00:40:58,410 --> 00:41:01,610 This is basically shuffling in the order of the bits, from 977 00:41:01,610 --> 00:41:03,420 low-order bits to high-order bits. 978 00:41:03,420 --> 00:41:07,250 So these will be ordered in terms of their low-order bits. 979 00:41:07,250 --> 00:41:10,050 And now we just want to read them out from left to right.
980 00:41:10,050 --> 00:41:13,010 Just take the order that we made with those round-robins, 981 00:41:13,010 --> 00:41:15,360 and read them out from left to right. 982 00:41:15,360 --> 00:41:17,480 And since they have 8 values, you can do that just by 983 00:41:17,480 --> 00:41:18,410 chunking them up. 984 00:41:18,410 --> 00:41:22,440 We'll read 2, 2, and then we'll read these two, 2 and 2, 985 00:41:22,440 --> 00:41:23,320 and now put all 8. 986 00:41:23,320 --> 00:41:25,010 Does that make sense? 987 00:41:28,790 --> 00:41:30,100 OK, so yes, it's a bit clever. 988 00:41:30,100 --> 00:41:31,910 So I think it's a nice way of thinking about what a 989 00:41:31,910 --> 00:41:33,540 bit-reversed ordering means. 990 00:41:33,540 --> 00:41:35,870 And you can write this in a very general way. 991 00:41:35,870 --> 00:41:37,600 You just have a recursive bit-reversed 992 00:41:37,600 --> 00:41:39,930 filter for N values. 993 00:41:39,930 --> 00:41:41,950 And base case, you only have 2. 994 00:41:41,950 --> 00:41:44,200 So there's no reordering to do when 995 00:41:44,200 --> 00:41:45,530 you think about it. 996 00:41:45,530 --> 00:41:47,550 So you're not doing any computation. 997 00:41:47,550 --> 00:41:50,790 Otherwise you have a round-robin split in half, and 998 00:41:50,790 --> 00:41:51,990 then have a coarse-grained joiner. 999 00:41:51,990 --> 00:41:56,060 So, you get a structure like this. 1000 00:41:56,060 --> 00:41:58,700 As you're building up, just distributing and then 1001 00:41:58,700 --> 00:42:02,330 bringing back together in a coarse-grained way. 1002 00:42:02,330 --> 00:42:03,580 OK. 1003 00:42:05,290 --> 00:42:07,720 Let's see how do I don't do this? 1004 00:42:07,720 --> 00:42:09,990 OK so one thing to notice, there's one more 1005 00:42:09,990 --> 00:42:10,730 example of a splitjoin. 1006 00:42:10,730 --> 00:42:11,170 Question? 1007 00:42:11,170 --> 00:42:13,140 AUDIENCE: [UNINTELLIGIBLE].
1008 00:42:13,140 --> 00:42:16,990 BILL THIES: OK so in general, at the base of this hierarchy, 1009 00:42:16,990 --> 00:42:20,260 we could've added some other filter to do some computation. 1010 00:42:20,260 --> 00:42:22,980 Identity just means we're doing no computation. 1011 00:42:22,980 --> 00:42:25,230 It's a predefined filter that just does nothing. 1012 00:42:25,230 --> 00:42:26,460 PROFESSOR: On complex data. 1013 00:42:26,460 --> 00:42:27,610 BILL THIES: On complex data. 1014 00:42:27,610 --> 00:42:29,780 Sorry, this is a templated filter. 1015 00:42:29,780 --> 00:42:32,100 So we're reordering complex values. 1016 00:42:32,100 --> 00:42:35,490 And we're passing the input to the output. 1017 00:42:35,490 --> 00:42:35,740 Amir? 1018 00:42:35,740 --> 00:42:40,050 AUDIENCE: [UNINTELLIGIBLE] 1019 00:42:40,050 --> 00:42:41,760 BILL THIES: In general the language does not have support 1020 00:42:41,760 --> 00:42:42,480 for templates. 1021 00:42:42,480 --> 00:42:44,570 We only do it for these base classes. 1022 00:42:44,570 --> 00:42:46,570 That's more of an implementation detail. 1023 00:42:46,570 --> 00:42:46,860 Yeah. 1024 00:42:46,860 --> 00:42:48,010 AUDIENCE: [UNINTELLIGIBLE]. 1025 00:42:48,010 --> 00:42:49,420 BILL THIES: Right now there isn't, but 1026 00:42:49,420 --> 00:42:50,840 nothing fundamental there. 1027 00:42:50,840 --> 00:42:51,060 Yeah? 1028 00:42:51,060 --> 00:42:53,410 Other questions? 1029 00:42:53,410 --> 00:42:54,000 Yeah? 1030 00:42:54,000 --> 00:42:56,864 AUDIENCE: How did you know that there are two filters 1031 00:42:56,864 --> 00:42:59,510 after that? 1032 00:42:59,510 --> 00:43:01,830 BILL THIES: Two filters, sorry, here? 1033 00:43:01,830 --> 00:43:06,030 AUDIENCE: [UNINTELLIGIBLE]. 1034 00:43:06,030 --> 00:43:06,570 BILL THIES: OK. 1035 00:43:06,570 --> 00:43:07,160 OK. 1036 00:43:07,160 --> 00:43:10,180 So we have two add statements between the 1037 00:43:10,180 --> 00:43:11,740 split and the join.
1038 00:43:11,740 --> 00:43:13,650 So that branches to two parallel streams. 1039 00:43:13,650 --> 00:43:16,240 Is that your question? 1040 00:43:20,610 --> 00:43:20,673 AUDIENCE: Yeah. 1041 00:43:20,673 --> 00:43:20,990 How do you determine that there's not, like, three? 1042 00:43:20,990 --> 00:43:24,590 BILL THIES: So the compiler will analyze 1043 00:43:24,590 --> 00:43:25,610 this at compile time. 1044 00:43:25,610 --> 00:43:28,360 And it'll know these values of N and propagate them down at 1045 00:43:28,360 --> 00:43:29,950 compile time. 1046 00:43:29,950 --> 00:43:32,660 So it'll basically symbolically evaluate this 1047 00:43:32,660 --> 00:43:35,010 code, see there are two branches, and you can unroll 1048 00:43:35,010 --> 00:43:36,180 this communication pattern. 1049 00:43:36,180 --> 00:43:41,800 AUDIENCE: I think another way of thinking about it is, each 1050 00:43:41,800 --> 00:43:43,262 add statement essentially adds another 1051 00:43:43,262 --> 00:43:44,730 branch in your splitjoin. 1052 00:43:44,730 --> 00:43:49,240 PROFESSOR: That is another box. 1053 00:43:49,240 --> 00:43:51,880 BILL THIES: It's about to get clear actually. 1054 00:43:51,880 --> 00:43:53,130 Other questions? 1055 00:43:56,090 --> 00:43:56,580 AUDIENCE: [UNINTELLIGIBLE] 1056 00:43:56,580 --> 00:43:58,690 BILL THIES: That's one way to think about it, yeah. 1057 00:43:58,690 --> 00:44:00,590 Wait, say again, counting? 1058 00:44:00,590 --> 00:44:03,060 AUDIENCE: A radix sort. 1059 00:44:03,060 --> 00:44:05,630 BILL THIES: A radix sort, right. 1060 00:44:05,630 --> 00:44:07,890 OK, well is it sorting? 1061 00:44:07,890 --> 00:44:09,460 It's not really sorting. 1062 00:44:09,460 --> 00:44:09,620 AUDIENCE: Well it's not really sorting. 1063 00:44:09,620 --> 00:44:10,730 BILL THIES: It's a permutation. 1064 00:44:10,730 --> 00:44:13,730 AUDIENCE: Could you do a radix sort? 1065 00:44:13,730 --> 00:44:14,940 BILL THIES: You could do a radix sort.
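The recursive bit-reversed ordering discussed above can be sketched in Python rather than StreamIt. The function names are hypothetical; list slicing stands in for the round-robin(1) two-way splitter, list concatenation stands in for the coarse-grained round-robin(N/2) joiner, and the input length is assumed to be a power of two.

```python
# Sketch only: mirrors the recursive splitjoin structure, not StreamIt syntax.

def bit_reverse(items):
    """Reorder a power-of-two-length list into bit-reversed index order."""
    n = len(items)
    if n <= 2:
        return list(items)            # base case: identity, nothing to reorder
    evens = items[0::2]               # round-robin(1) split: low-order bit 0
    odds = items[1::2]                # round-robin(1) split: low-order bit 1
    # Coarse-grained join: all n/2 items from the left branch, then the right.
    return bit_reverse(evens) + bit_reverse(odds)

def reverse_bits(index, width):
    """Reverse the `width`-bit binary representation of `index`."""
    return int(format(index, f"0{width}b")[::-1], 2)

indices = list(range(8))
reordered = bit_reverse(indices)
print(reordered)
# Cross-check against the direct definition: the item from index i
# lands at the position given by reversing the 3 bits of i.
print(all(reordered[reverse_bits(i, 3)] == i for i in indices))
```

For 8 items the recursion unrolls into the three weights from the example: round-robin 1 on every splitter, then joiners of weight 2 and weight 4.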
1066 00:44:14,940 --> 00:44:18,050 So actually, what I want to show next is how you can morph 1067 00:44:18,050 --> 00:44:21,890 this program into a merge sort by changing only a few lines. 1068 00:44:21,890 --> 00:44:22,640 OK look carefully. 1069 00:44:22,640 --> 00:44:24,710 Don't blink. 1070 00:44:24,710 --> 00:44:26,500 OK there's merge sort. 1071 00:44:26,500 --> 00:44:27,710 So very similar pattern. 1072 00:44:27,710 --> 00:44:29,440 This is one of those idioms. It's a recursive idiom with 1073 00:44:29,440 --> 00:44:30,860 splitjoins. 1074 00:44:30,860 --> 00:44:33,130 But now in the base case, we have a sort. 1075 00:44:33,130 --> 00:44:35,840 So we would basically branch down. 1076 00:44:35,840 --> 00:44:37,760 What we ended up with was a sort in the base case. 1077 00:44:37,760 --> 00:44:39,800 We're just sorting a few values. 1078 00:44:39,800 --> 00:44:42,080 And we call merge sort twice again, and 1079 00:44:42,080 --> 00:44:43,410 then we do a merge. 1080 00:44:43,410 --> 00:44:46,150 So instead of identity at the base case here, we now have a 1081 00:44:46,150 --> 00:44:47,680 basic sorting routine. 1082 00:44:47,680 --> 00:44:50,260 And we merge those results from both sides. 1083 00:44:50,260 --> 00:44:52,850 And the only thing I changed in terms of the communication 1084 00:44:52,850 --> 00:44:56,100 rate, is to be more efficient we just distributed data in 1085 00:44:56,100 --> 00:44:59,120 chunks instead of doing a fine-grained splitting. 1086 00:44:59,120 --> 00:45:01,550 Actually you can do it however you want in a merge sort. 1087 00:45:01,550 --> 00:45:02,360 But this is chunked up. 1088 00:45:02,360 --> 00:45:03,960 And let's just zoom in here. 1089 00:45:03,960 --> 00:45:06,420 This is how a merge sort looks in StreamIt. 1090 00:45:06,420 --> 00:45:10,480 So we split the data two ways, both directions, come 1091 00:45:10,480 --> 00:45:14,210 together, do a sort on both sides, and then you merge.
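The same recursive shape, with a sort at the base case and a merge after the join, can be sketched in Python. This is an illustration under assumptions, not the StreamIt program from the slides: the name `merge_sort` and the `base` threshold are hypothetical, the chunked split is modeled by slicing the input in half, and the merge step uses the standard-library `heapq.merge`.

```python
# Sketch only: the recursion mirrors split -> two recursive branches -> merge.
import heapq

def merge_sort(items, base=2):
    """Sort by chunked two-way splitting, recursive sorting, and merging."""
    n = len(items)
    if n <= base:
        return sorted(items)          # base case: basic sort of a few values
    half = n // 2
    left = items[:half]               # chunked split: first half to the left
    right = items[half:]              # chunked split: second half to the right
    # Both branches are sorted independently, then merged after the join.
    return list(heapq.merge(merge_sort(left, base), merge_sort(right, base)))

print(merge_sort([5, 2, 7, 1, 8, 3, 6, 4]))
```

Compared with the bit-reversal sketch, only the base case, the split granularity, and the step after the join change, which matches the point about morphing one program into the other by changing a few lines.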
1092 00:45:14,210 --> 00:45:16,840 And so, you know, you can interleave pipelines 1093 00:45:16,840 --> 00:45:18,030 and splitjoins like this. 1094 00:45:18,030 --> 00:45:19,720 So you have these hierarchical structures that are coming 1095 00:45:19,720 --> 00:45:21,350 back together. 1096 00:45:21,350 --> 00:45:24,420 Does this make sense? 1097 00:45:24,420 --> 00:45:25,890 OK. 1098 00:45:25,890 --> 00:45:29,830 I'm going to hold off on messaging actually. 1099 00:45:29,830 --> 00:45:32,830 Let me see, what do I want to cover? 1100 00:45:32,830 --> 00:45:34,330 OK let me actually skip to the end. 1101 00:45:37,330 --> 00:45:38,580 Oh, I can show you this. 1102 00:45:40,870 --> 00:45:42,380 Yeah, I'm going to cut short a little bit. 1103 00:45:42,380 --> 00:45:44,860 So here's how other programs look written in StreamIt. 1104 00:45:44,860 --> 00:45:47,110 OK, you can have a Bitonic sort. 1105 00:45:47,110 --> 00:45:48,840 OK so you see a lot of these regular structures. 1106 00:45:48,840 --> 00:45:50,340 And the compiler can unroll this and then 1107 00:45:50,340 --> 00:45:52,320 match it to the substrate. 1108 00:45:52,320 --> 00:45:53,380 This is how an FFT looks. 1109 00:45:53,380 --> 00:45:56,740 It's quite an elegant implementation of an FFT. 1110 00:45:56,740 --> 00:45:59,430 It'd be good to go into in more detail. 1111 00:45:59,430 --> 00:46:01,560 You can do things like block matrix multiply. 1112 00:46:01,560 --> 00:46:04,420 You don't always have to have column or row-wise ordering. 1113 00:46:04,420 --> 00:46:07,420 It's natural to split things up like this. 1114 00:46:07,420 --> 00:46:09,400 We have a lot of DSP algorithms, the filter bank, 1115 00:46:09,400 --> 00:46:12,790 FM radio with equalizer, radar array front end. 1116 00:46:12,790 --> 00:46:14,040 Here's an MP3 decoder. 1117 00:46:16,190 --> 00:46:18,160 And let's see, I'm going to skip this section and just 1118 00:46:18,160 --> 00:46:19,410 give you a taste for the end here.
1119 00:46:23,880 --> 00:46:27,190 I'm skipping a hundred slides. 1120 00:46:27,190 --> 00:46:29,840 Yeah. 1121 00:46:29,840 --> 00:46:31,100 OK so to give you a feel. 1122 00:46:31,100 --> 00:46:32,990 Our biggest program written in StreamIt so far is the 1123 00:46:32,990 --> 00:46:35,680 complete MPEG-2 encoder and decoder. 1124 00:46:35,680 --> 00:46:36,940 So here is the MPEG-2 decoder. 1125 00:46:36,940 --> 00:46:38,720 And I think you've seen block diagrams of this 1126 00:46:38,720 --> 00:46:40,420 already in the class. 1127 00:46:40,420 --> 00:46:41,800 And so it's a pretty natural expression. 1128 00:46:41,800 --> 00:46:43,590 You can really get a feel for the high-level structure of 1129 00:46:43,590 --> 00:46:45,250 the algorithm mapping down. 1130 00:46:45,250 --> 00:46:47,430 And for example, here on the top we're doing the spatial 1131 00:46:47,430 --> 00:46:49,980 decoding looking inside each frame. 1132 00:46:49,980 --> 00:46:52,170 Whereas at the bottom we're doing the temporal decoding 1133 00:46:52,170 --> 00:46:55,580 between two frames, the motion compensation. 1134 00:46:55,580 --> 00:46:58,970 And one thing that I didn't have a chance to mention, is 1135 00:46:58,970 --> 00:47:01,540 that we have a concept of teleport messaging. 1136 00:47:01,540 --> 00:47:05,070 What this means is, I showed you how the steady state flow 1137 00:47:05,070 --> 00:47:08,270 of data goes between these actors in the stream. 1138 00:47:08,270 --> 00:47:10,820 But sometimes you want to control the stream as well. 1139 00:47:10,820 --> 00:47:13,280 For example, this is a variable length 1140 00:47:13,280 --> 00:47:14,050 decoder at the top. 1141 00:47:14,050 --> 00:47:16,010 It's parsing the input data. 1142 00:47:16,010 --> 00:47:18,390 It might want to change how the processing is happening 1143 00:47:18,390 --> 00:47:19,260 downstream.
1144 00:47:19,260 --> 00:47:22,180 For example, say that you have-- you know in this case 1145 00:47:22,180 --> 00:47:24,430 you have different picture types coming in. 1146 00:47:24,430 --> 00:47:26,320 And you want to tell other components to change their 1147 00:47:26,320 --> 00:47:29,370 processing based on a non-local effect. 1148 00:47:29,370 --> 00:47:32,340 And that's hard to do if you just want static data rates. 1149 00:47:32,340 --> 00:47:35,180 But what we have is this notion of limited 1150 00:47:35,180 --> 00:47:37,660 dynamism, where you're basically poking into somebody 1151 00:47:37,660 --> 00:47:38,730 else's stream. 1152 00:47:38,730 --> 00:47:40,600 And we let you do that very precisely. 1153 00:47:40,600 --> 00:47:42,440 I don't have time to go into the details, but you can 1154 00:47:42,440 --> 00:47:45,500 basically synchronize the arrival of these messages with 1155 00:47:45,500 --> 00:47:48,220 the data that's also flowing through the stream. And so in 1156 00:47:48,220 --> 00:47:50,340 this case, we're sending through the picture type. 1157 00:47:50,340 --> 00:47:52,320 And it really simplifies the program code. 1158 00:47:52,320 --> 00:47:54,430 I didn't have time for details, but why don't we put 1159 00:47:54,430 --> 00:47:57,530 in the slides anyway, if you're interested. 1160 00:47:57,530 --> 00:48:00,350 And if you do a similar communication pattern in C, 1161 00:48:00,350 --> 00:48:01,760 it's a little bit of a nightmare. 1162 00:48:01,760 --> 00:48:05,020 You have all these different, basically memory spaces, 1163 00:48:05,020 --> 00:48:06,320 different files. 1164 00:48:06,320 --> 00:48:08,570 And the control information is basically going left 1165 00:48:08,570 --> 00:48:09,980 and right all over. 1166 00:48:09,980 --> 00:48:12,650 So this really helps both the compiler and the programmer as 1167 00:48:12,650 --> 00:48:15,080 well in StreamIt. 1168 00:48:15,080 --> 00:48:16,220 So it's all implemented.
1169 00:48:16,220 --> 00:48:18,720 It's about 2,000 lines of code in StreamIt. 1170 00:48:18,720 --> 00:48:22,100 Which is about 2/3 of the size of the C code, taking into 1171 00:48:22,100 --> 00:48:24,020 account similar functionality there. 1172 00:48:24,020 --> 00:48:25,120 And it's a pretty big program. 1173 00:48:25,120 --> 00:48:27,780 You can write 48 static streams. And then we expand 1174 00:48:27,780 --> 00:48:30,400 that to more than 600 instantiated filters. 1175 00:48:30,400 --> 00:48:31,970 So this gives you a lot of flexibility when you're trying 1176 00:48:31,970 --> 00:48:32,750 to get parallelism. 1177 00:48:32,750 --> 00:48:33,260 Question? 1178 00:48:33,260 --> 00:48:35,920 AUDIENCE: When a compiler downloads all bytes? 1179 00:48:35,920 --> 00:48:39,780 BILL THIES: Oh the object code, you mean. 1180 00:48:39,780 --> 00:48:41,920 OK, so right now in our current implementation, we duplicate a 1181 00:48:41,920 --> 00:48:42,990 lot of code. 1182 00:48:42,990 --> 00:48:45,600 So it ends up being bigger than it needs to be. 1183 00:48:45,600 --> 00:48:46,820 There's no reason for us to do that. 1184 00:48:46,820 --> 00:48:48,630 That's kind of a-- we have a research compiler 1185 00:48:48,630 --> 00:48:49,640 that makes that easy. 1186 00:48:49,640 --> 00:48:52,240 AUDIENCE: So object-wise, it's not data. 1187 00:48:52,240 --> 00:48:54,350 BILL THIES: Object-wise we still need to do that 1188 00:48:54,350 --> 00:48:55,120 comparison. 1189 00:48:55,120 --> 00:48:56,830 Yeah that's a good question. 1190 00:48:56,830 --> 00:48:58,660 Yeah. 1191 00:48:58,660 --> 00:49:00,940 Other questions? 1192 00:49:00,940 --> 00:49:03,070 OK so let me cut to the end. 1193 00:49:03,070 --> 00:49:05,360 OK, so we have the StreamIt language. 1194 00:49:05,360 --> 00:49:05,940 And we think it really 1195 00:49:05,940 --> 00:49:07,570 preserves the program structure.
1196 00:49:07,570 --> 00:49:09,620 It's a new way of thinking about how you orchestrate the 1197 00:49:09,620 --> 00:49:12,100 data reordering with the splitjoins, showing you who is 1198 00:49:12,100 --> 00:49:14,470 communicating to who, and how you can stitch together 1199 00:49:14,470 --> 00:49:16,980 different pieces in your program development. 1200 00:49:16,980 --> 00:49:20,400 And again, really our goal is to get this scalable multicore 1201 00:49:20,400 --> 00:49:21,170 performance. 1202 00:49:21,170 --> 00:49:22,840 But you can't get people on board just on 1203 00:49:22,840 --> 00:49:23,840 a performance stat. 1204 00:49:23,840 --> 00:49:25,540 You need to show them a new programming model that 1205 00:49:25,540 --> 00:49:27,250 actually makes their lives easier. 1206 00:49:27,250 --> 00:49:28,870 So that's what we're working on. 1207 00:49:28,870 --> 00:49:30,060 And thanks for listening. 1208 00:49:30,060 --> 00:49:35,520 [APPLAUSE] 1209 00:49:35,520 --> 00:49:37,450 BILL THIES: Any last questions? 1210 00:49:37,450 --> 00:49:37,940 Yes? 1211 00:49:37,940 --> 00:49:42,770 AUDIENCE: So in the MPEG decoder, you have a lot 1212 00:49:42,770 --> 00:49:47,110 of computations size that were not sequential streams. Like 1213 00:49:47,110 --> 00:49:52,300 for example, the output of the discrete cosine transform is 1214 00:49:52,300 --> 00:49:57,750 not a stream of pixels, you are going to have coefficients and 1215 00:49:57,750 --> 00:49:58,080 things like that. 1216 00:49:58,080 --> 00:49:59,500 Which are a logical sort of chunk. 1217 00:49:59,500 --> 00:50:04,780 BILL THIES: Yes so depending on the granularity of the 1218 00:50:04,780 --> 00:50:07,730 computation, you don't need to pass individual values over 1219 00:50:07,730 --> 00:50:08,610 the stream. 1220 00:50:08,610 --> 00:50:10,540 For example, you can have a stream that inputs the whole 1221 00:50:10,540 --> 00:50:12,980 array at a time.
1222 00:50:12,980 --> 00:50:16,120 And so we basically advocate that if you have something 1223 00:50:16,120 --> 00:50:18,520 that's coarse-grained, you should be passing an array or 1224 00:50:18,520 --> 00:50:21,660 a macroblock, in the case of MPEG, or a set of coefficients 1225 00:50:21,660 --> 00:50:22,650 in a structure. 1226 00:50:22,650 --> 00:50:25,060 So when you have coarse-grained parallelism, you write your 1227 00:50:25,060 --> 00:50:26,630 program in a coarse-grained way. 1228 00:50:26,630 --> 00:50:28,600 The fine-grained things I showed for the bit 1229 00:50:28,600 --> 00:50:30,560 interleaving and so on, is more for the fine-grained 1230 00:50:30,560 --> 00:50:33,260 programs. AUDIENCE: Can you do both? 1231 00:50:33,260 --> 00:50:38,305 In the sense that can you stream over an array, so it's a 1232 00:50:38,305 --> 00:50:39,780 stream of streams, so to speak. 1233 00:50:39,780 --> 00:50:41,270 BILL THIES: So there's an interesting multidimensional 1234 00:50:41,270 --> 00:50:42,760 problem there. 1235 00:50:42,760 --> 00:50:45,700 Right now we've taken a 1-dimensional approach. 1236 00:50:45,700 --> 00:50:48,360 So far it's basically the programmer has to set an 1237 00:50:48,360 --> 00:50:51,500 iteration order, and end up with a 1-dimensional stream 1238 00:50:51,500 --> 00:50:53,200 coming into and out of every filter. 1239 00:50:53,200 --> 00:50:54,780 We're working on extending that. 1240 00:50:54,780 --> 00:50:57,020 Yeah, but when you have basically streams of 1241 00:50:57,020 --> 00:51:00,210 2-dimensional data, you'd like the freedom to either iterate 1242 00:51:00,210 --> 00:51:02,340 basically in time or in space, depending 1243 00:51:02,340 --> 00:51:03,350 on what you're doing. 1244 00:51:03,350 --> 00:51:05,180 And so I think that's more of a research problem. 1245 00:51:05,180 --> 00:51:08,240 So far we've just been doing a 1-dimensional representation. 1246 00:51:08,240 --> 00:51:11,590 Yeah good point.
1247 00:51:11,590 --> 00:51:13,650 Other questions? 1248 00:51:13,650 --> 00:51:14,370 Yeah? 1249 00:51:14,370 --> 00:51:18,290 AUDIENCE: Why did you decide on this synchronous dataflow 1250 00:51:18,290 --> 00:51:19,620 model as opposed to something more general? 1251 00:51:19,620 --> 00:51:22,180 BILL THIES: So our philosophy has been that you want to 1252 00:51:22,180 --> 00:51:27,180 start with the most kind of basic block of a stream 1253 00:51:27,180 --> 00:51:29,850 program, and optimize it really well. 1254 00:51:29,850 --> 00:51:31,340 And then you can stitch those together into 1255 00:51:31,340 --> 00:51:32,510 higher level blocks. 1256 00:51:32,510 --> 00:51:34,770 So we think of synchronous dataflow as being kind of the 1257 00:51:34,770 --> 00:51:36,750 basic block of streaming. 1258 00:51:36,750 --> 00:51:38,790 You know what's coming in, you know what's coming out. 1259 00:51:38,790 --> 00:51:41,390 And even if you have a more general model, there'll be 1260 00:51:41,390 --> 00:51:44,750 pieces that fit under the synchronous dataflow model. 1261 00:51:44,750 --> 00:51:46,490 And so we saw a lot of optimization 1262 00:51:46,490 --> 00:51:47,720 opportunities in there. 1263 00:51:47,720 --> 00:51:50,100 And really knowing those IO rates can let you do a lot of 1264 00:51:50,100 --> 00:51:52,180 things that you can't do in a general model. 1265 00:51:52,180 --> 00:51:55,010 So I wanted to get the simple case right first. And actually 1266 00:51:55,010 --> 00:51:58,010 kind of our focus now is on expanding, and how do you look 1267 00:51:58,010 --> 00:51:59,960 at the heterogeneous system, and how do you optimize a more 1268 00:51:59,960 --> 00:52:01,980 dynamic system. 1269 00:52:01,980 --> 00:52:03,230 Yep. 1270 00:52:05,270 --> 00:52:06,520 Other questions? 1271 00:52:10,520 --> 00:52:12,420 OK. 1272 00:52:12,420 --> 00:52:13,180 Yes. 1273 00:52:13,180 --> 00:52:15,140 You can check out our web page.
1274 00:52:15,140 --> 00:52:16,550 Yeah if you Google for StreamIt, I'm sure 1275 00:52:16,550 --> 00:52:17,770 you can find it. 1276 00:52:17,770 --> 00:52:19,930 Yeah, we have a public release. 1277 00:52:19,930 --> 00:52:22,380 Yes, send us any problems. Actually it's a good 1278 00:52:22,380 --> 00:52:25,050 test. We want to make sure it works for everyone. 1279 00:52:25,050 --> 00:52:26,830 But I mean, we've had, you know, hundreds of downloads. 1280 00:52:26,830 --> 00:52:28,930 There are a lot of people using StreamIt. 1281 00:52:28,930 --> 00:52:31,440 It shouldn't break if you download it. 1282 00:52:31,440 --> 00:52:32,690 Yeah. 1283 00:52:34,360 --> 00:52:35,500 OK good. 1284 00:52:35,500 --> 00:52:36,750 Thanks.