1 00:00:00,050 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:04,010 Commons license. 3 00:00:04,010 --> 00:00:06,350 Your support will help MIT OpenCourseWare 4 00:00:06,350 --> 00:00:10,720 continue to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,330 To make a donation or view additional materials 6 00:00:13,330 --> 00:00:17,205 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,205 --> 00:00:17,830 at ocw.mit.edu. 8 00:00:21,480 --> 00:00:24,388 BRIAN SWEATT: So we're the real time ray tracing group. 9 00:00:24,388 --> 00:00:29,020 And as we like to call ourselves, Blue Steel. 10 00:00:29,020 --> 00:00:33,440 So one thing you might be asking yourselves and wondering 11 00:00:33,440 --> 00:00:35,980 from us is, where did we get the name Blue Steel? 12 00:00:35,980 --> 00:00:38,670 What does it have to do with this thing we call ray tracing? 13 00:00:38,670 --> 00:00:44,350 And this morning when I was sitting up and finishing up 14 00:00:44,350 --> 00:00:47,390 the project, I was asking myself the same thing. 15 00:00:47,390 --> 00:00:48,390 We're doing ray tracing. 16 00:00:48,390 --> 00:00:49,740 Where did we come up with this name? 17 00:00:49,740 --> 00:00:51,906 So I'm going to give you a little background on what 18 00:00:51,906 --> 00:00:52,580 Blue Steel is. 19 00:00:57,890 --> 00:01:01,610 The name came from the first day we got access to our PS3, 20 00:01:01,610 --> 00:01:03,360 we went to the student center like a bunch 21 00:01:03,360 --> 00:01:04,920 of excited kids on Christmas. 22 00:01:04,920 --> 00:01:06,060 And what did we do? 23 00:01:06,060 --> 00:01:10,010 We created a mailing list for our group. 24 00:01:10,010 --> 00:01:11,260 Didn't even mess with the PS3. 25 00:01:11,260 --> 00:01:13,890 Just created the mailing list and sat there for about an hour 26 00:01:13,890 --> 00:01:15,520 and a half, thinking of a name. 
27 00:01:15,520 --> 00:01:19,140 And somehow, we decided on Zoolander's famous face, 28 00:01:19,140 --> 00:01:20,350 Blue Steel. 29 00:01:20,350 --> 00:01:24,740 And it didn't have particular significance to us before, 30 00:01:24,740 --> 00:01:28,730 but as you'll see, it has a lot of historical value. 31 00:01:32,080 --> 00:01:37,060 Blue Steel is also an alloy, a ferrous alloy. 32 00:01:39,680 --> 00:01:44,000 As you could see, we have blue tempered spring steel 33 00:01:44,000 --> 00:01:47,210 that many companies, industries, use. 34 00:01:47,210 --> 00:01:54,440 And Blue Steel is also a warhead made in the Cold War. 35 00:01:54,440 --> 00:01:56,770 But these have nothing to do with our project. 36 00:01:56,770 --> 00:02:00,400 To us, and for the purpose of this project, 37 00:02:00,400 --> 00:02:04,000 Blue Steel is a distributed real time ray tracer. 38 00:02:04,000 --> 00:02:07,020 And Natalia is going to give you a little background on what 39 00:02:07,020 --> 00:02:09,461 ray tracing really is. 40 00:02:09,461 --> 00:02:10,669 NATALIA CHERNENKO: Thank you. 41 00:02:17,660 --> 00:02:24,210 So ray tracing, there are two terms that you would generally 42 00:02:24,210 --> 00:02:27,250 hear, ray tracing and ray casting. 43 00:02:27,250 --> 00:02:32,800 Basically, the idea is the camera is like the eye. 44 00:02:32,800 --> 00:02:35,710 And from the camera, rays are cast 45 00:02:35,710 --> 00:02:38,170 for each pixel in a picture. 46 00:02:38,170 --> 00:02:43,590 And these rays, they intersect with all the virtual objects 47 00:02:43,590 --> 00:02:44,700 in the scene. 48 00:02:44,700 --> 00:02:50,820 When an object is hit, the color of it is computed. 49 00:02:53,610 --> 00:02:56,400 The closest object that is intersected 50 00:02:56,400 --> 00:03:00,270 is the object that shows up on the screen. 51 00:03:02,451 --> 00:03:04,284 BRIAN SWEATT: We should credit these slides. 52 00:03:04,284 --> 00:03:08,760 They're Fredo Durand's from 6.837.
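The ray casting idea Natalia describes above (one ray per pixel, tested against every object, with the closest hit deciding the color) can be sketched roughly as follows. This is a minimal illustration in Python, not the group's code: the sphere-only scene, the pinhole camera at the origin looking down the negative z axis, and the names hit_sphere, cast, render and BACKGROUND are all assumptions made for the example.

```python
import math

BACKGROUND = "black"  # hypothetical color returned when a ray hits nothing


def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive distance along the ray to the sphere, or None.

    Only the near root is used, so a ray starting inside a sphere is not handled.
    """
    oc = [o - c for o, c in zip(origin, center)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the ray misses the sphere entirely
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t > 1e-6 else None


def cast(origin, direction, spheres):
    """Test the ray against every sphere and return the color of the closest hit."""
    closest_t, color = math.inf, BACKGROUND
    for center, radius, sphere_color in spheres:
        t = hit_sphere(origin, direction, center, radius)
        if t is not None and t < closest_t:
            closest_t, color = t, sphere_color
    return color


def render(width, height, spheres):
    """Cast one ray per pixel through an image plane at z = -1."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            u = (x + 0.5) / width * 2.0 - 1.0   # pixel center mapped to [-1, 1]
            v = 1.0 - (y + 0.5) / height * 2.0
            row.append(cast((0.0, 0.0, 0.0), (u, v, -1.0), spheres))
        image.append(row)
    return image
```

The cost structure mentioned later in the talk is visible here: render performs width times height calls to cast, and each call loops over every object in the scene.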
53 00:03:08,760 --> 00:03:12,180 NATALIA CHERNENKO: And ray tracing is recursive. 54 00:03:12,180 --> 00:03:15,140 It accounts for shadows, positions 55 00:03:15,140 --> 00:03:19,200 of other objects in the scene. 56 00:03:19,200 --> 00:03:22,390 There is reflection and refraction. 57 00:03:22,390 --> 00:03:26,030 For example, if this ray were to hit that object 58 00:03:26,030 --> 00:03:28,360 and then reflect it, then another ray 59 00:03:28,360 --> 00:03:32,160 is cast from that object in the reflecting direction. 60 00:03:32,160 --> 00:03:35,420 And this can go on for an indefinite number 61 00:03:35,420 --> 00:03:37,500 of reflections. 62 00:03:37,500 --> 00:03:40,590 These are normally specified in the program. 63 00:03:44,550 --> 00:03:47,250 Here is an example of the recursion. 64 00:03:47,250 --> 00:03:52,080 So the three planes are reflective. 65 00:03:52,080 --> 00:03:55,080 They're like mirrors, and there are three spheres 66 00:03:55,080 --> 00:03:56,580 running between them. 67 00:03:56,580 --> 00:03:59,580 With no recursion, you can see there is no reflection at all. 68 00:03:59,580 --> 00:04:04,580 With one recursion, you can see the rays are reflecting just 69 00:04:04,580 --> 00:04:05,580 off the side mirrors. 70 00:04:05,580 --> 00:04:08,170 At the end with two recursions, there is also 71 00:04:08,170 --> 00:04:09,570 reflection at the bottom. 72 00:04:09,570 --> 00:04:12,220 So it is a reflection of one of the reflections. 73 00:04:17,180 --> 00:04:20,550 And this is very computationally intensive 74 00:04:20,550 --> 00:04:25,090 because for every pixel, you're casting a ray. 75 00:04:25,090 --> 00:04:27,300 And for every object in the scene, 76 00:04:27,300 --> 00:04:30,010 you're intersecting that ray with the object 77 00:04:30,010 --> 00:04:31,480 and seeing if it's hit. 78 00:04:31,480 --> 00:04:35,560 And then if it is hit and it is reflective or refractive-- 79 00:04:35,560 --> 00:04:40,300 so transparent-- then another ray is cast. 
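The recursion Natalia describes above, where a reflective hit casts another ray up to a depth specified in the program, might look roughly like this. It is a sketch under stated assumptions rather than the group's implementation: brightness is a single number instead of a color, only reflection is modeled (no refraction or shadows), ray directions are assumed to be unit length, and the dictionary keys center, radius, emit and mirror are hypothetical names.

```python
import math


def reflect(direction, normal):
    """Mirror a direction about a unit-length surface normal."""
    k = 2.0 * sum(d * n for d, n in zip(direction, normal))
    return tuple(d - k * n for d, n in zip(direction, normal))


def nearest_hit(origin, direction, spheres):
    """Return (distance, sphere) for the closest sphere along a unit-length ray, or None."""
    best = None
    for sphere in spheres:
        oc = [o - c for o, c in zip(origin, sphere["center"])]
        b = sum(d * o for d, o in zip(direction, oc))
        disc = b * b - (sum(o * o for o in oc) - sphere["radius"] ** 2)
        if disc < 0.0:
            continue
        t = -b - math.sqrt(disc)
        if t > 1e-4 and (best is None or t < best[0]):
            best = (t, sphere)
    return best


def trace(origin, direction, spheres, depth):
    """Return the brightness seen along a ray, following at most `depth` reflections."""
    hit = nearest_hit(origin, direction, spheres)
    if hit is None:
        return 0.0  # background brightness
    t, sphere = hit
    brightness = sphere["emit"]
    if depth > 0 and sphere["mirror"] > 0.0:
        point = tuple(o + t * d for o, d in zip(origin, direction))
        normal = tuple((p - c) / sphere["radius"]
                       for p, c in zip(point, sphere["center"]))
        bounced = trace(point, reflect(direction, normal), spheres, depth - 1)
        brightness += sphere["mirror"] * bounced
    return brightness
```

With depth set to 0 a mirror shows only its own brightness, which corresponds to the "no recursion, no reflection" image on the slide; each extra level of depth lets one more generation of reflections contribute.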
80 00:04:40,300 --> 00:04:46,440 This scene here is a resolution 1024 by 1024 81 00:04:46,440 --> 00:04:49,490 and recursion depth is 100. 82 00:04:49,490 --> 00:04:54,780 And the rendering time in our program was 2.6 seconds. 83 00:04:54,780 --> 00:05:01,910 The rendering time for 20 depth recursion was 600 milliseconds, 84 00:05:01,910 --> 00:05:06,492 and for 10 was 300 milliseconds. 85 00:05:06,492 --> 00:05:10,920 PROFESSOR: [INAUDIBLE] SPU [INAUDIBLE]. 86 00:05:10,920 --> 00:05:13,872 What would be the [INAUDIBLE]. 87 00:05:13,872 --> 00:05:17,550 R. J. RYAN: Many of us have taken 6.837, and in that class, 88 00:05:17,550 --> 00:05:20,300 we're all about ray tracers. 89 00:05:20,300 --> 00:05:24,098 And a typical scene like the one that is on the PowerPoint right 90 00:05:24,098 --> 00:05:28,550 now could take up to a minute or more. 91 00:05:28,550 --> 00:05:32,360 Often, if you were to do 100 bounces, 92 00:05:32,360 --> 00:05:37,398 I think mine would probably take over 10 minutes. 93 00:05:37,398 --> 00:05:40,877 PROFESSOR: [INAUDIBLE] you just to [INAUDIBLE] SPU or 94 00:05:40,877 --> 00:05:42,236 [? one SPU. ?] 95 00:05:42,236 --> 00:05:44,110 NATALIA CHERNENKO: We did try different SPUs. 96 00:05:44,110 --> 00:05:46,840 This picture, we didn't try with different SPUs, 97 00:05:46,840 --> 00:05:50,610 but we have tried other scenes with various numbers. 98 00:05:50,610 --> 00:05:54,290 And we did get exact linear speedup. 99 00:05:54,290 --> 00:05:57,620 So if there were twice as many SPUs, 100 00:05:57,620 --> 00:05:59,900 it would be twice as fast. 101 00:06:05,165 --> 00:06:05,790 R. J. RYAN: OK. 102 00:06:05,790 --> 00:06:08,685 So I'm going to talk to you about optimizations 103 00:06:08,685 --> 00:06:10,080 that we did. 104 00:06:10,080 --> 00:06:12,580 Natalia's told you about ray tracing, and that's only half 105 00:06:12,580 --> 00:06:15,030 of our project. 106 00:06:15,030 --> 00:06:19,012 The part that we have to focus on is optimizing.
107 00:06:19,012 --> 00:06:22,565 And meeting that goal of being real time. 108 00:06:22,565 --> 00:06:27,745 And for us, real time, you have to ask, what is real time? 109 00:06:27,745 --> 00:06:32,790 If we go off of like, the American video standard NTSC, 110 00:06:32,790 --> 00:06:37,430 they define it to be around 27, 30 frames a second. 111 00:06:37,430 --> 00:06:39,366 So that was pretty much the goal that we 112 00:06:39,366 --> 00:06:40,890 set when we set out to do this. 113 00:06:40,890 --> 00:06:44,790 And so we took the code and we think, OK, we can split this. 114 00:06:44,790 --> 00:06:47,290 This task is easily parallelizable. 115 00:06:47,290 --> 00:06:50,720 But we're going to have to do a lot more than that if we 116 00:06:50,720 --> 00:06:52,010 want to get better rates. 117 00:06:52,010 --> 00:06:55,430 So we took advantage of the hardware that we were on. 118 00:06:55,430 --> 00:07:01,570 And one thing that on the ray tracing end, 119 00:07:01,570 --> 00:07:03,680 we had to go crazy with, was SIMD. 120 00:07:04,670 --> 00:07:08,300 Pretty much, the entire ray tracer never 121 00:07:08,300 --> 00:07:11,010 deals with a scalar value. 122 00:07:11,010 --> 00:07:12,660 We use SIMD for everything. 123 00:07:12,660 --> 00:07:19,460 And in the core of it, we do branch avoidance. 124 00:07:19,460 --> 00:07:21,880 So we really try not to take branches, 125 00:07:21,880 --> 00:07:25,010 because branch misses are very costly on the Cell. 126 00:07:25,010 --> 00:07:29,490 So I believe there's something like three 127 00:07:29,490 --> 00:07:34,020 if statements spread across 4,000 lines of code 128 00:07:34,020 --> 00:07:35,520 for the ray tracer. 129 00:07:35,520 --> 00:07:38,590 We really took SIMD to heart in this. 130 00:07:42,110 --> 00:07:47,480 Using SIMD, you can process arrays of structures. 131 00:07:47,480 --> 00:07:49,797 So we took-- 132 00:07:49,797 --> 00:07:51,640 NATALIA CHERNENKO: [INAUDIBLE] 133 00:07:52,585 --> 00:07:56,510 R. J.
RYAN: We would trace ray packets of four rays at a time, 134 00:07:56,510 --> 00:07:58,950 and from those ray packets, we would generate hit packets. 135 00:07:58,950 --> 00:08:01,120 And then similarly, as you go down the line, 136 00:08:01,120 --> 00:08:05,630 we would produce an RGB packet, which is four color values tied 137 00:08:05,630 --> 00:08:06,540 into one. 138 00:08:06,540 --> 00:08:11,015 So at every step, we would use SIMD to our advantage. 139 00:08:11,015 --> 00:08:13,140 Now Brian's going to talk to you about pipelining. 140 00:08:17,570 --> 00:08:20,085 BRIAN SWEATT: So of particular importance on the Cell 141 00:08:20,085 --> 00:08:23,347 is not only what instructions you execute, but the order 142 00:08:23,347 --> 00:08:25,164 that you execute them in, because there is 143 00:08:25,164 --> 00:08:27,080 no reorder buffer on the Cell. 144 00:08:27,080 --> 00:08:28,680 There is no out of order execution. 145 00:08:28,680 --> 00:08:31,400 And despite the fact that compiler technology 146 00:08:31,400 --> 00:08:35,909 is very good, hand optimization and pipelining instructions 147 00:08:35,909 --> 00:08:37,770 is still always better. 148 00:08:37,770 --> 00:08:42,520 For instance, with our triangle intersection routine, 149 00:08:42,520 --> 00:08:45,360 we gained on the order of hundreds 150 00:08:45,360 --> 00:08:49,259 of milliseconds of render time. 151 00:08:49,259 --> 00:08:51,300 We shaved hundreds of milliseconds of render time 152 00:08:51,300 --> 00:08:54,010 off of our render time for some scenes you will see later, 153 00:08:54,010 --> 00:08:57,190 just by reordering our calculations 154 00:08:57,190 --> 00:08:58,450 in the intersection code. 155 00:08:58,450 --> 00:09:00,750 So it's very important, especially on the SPEs. 156 00:09:00,750 --> 00:09:02,720 And one other optimization we made 157 00:09:02,720 --> 00:09:06,430 was the fact that we coded basically everything 158 00:09:06,430 --> 00:09:07,750 on the SPEs.
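The four-wide ray packets, hit packets and branch avoidance that R. J. Ryan describes can be illustrated with a small sketch. Python has no SIMD registers, so plain four-element lists stand in for the vector lanes here; the layout (one list per coordinate, often called structure of arrays), the arithmetic select helper, and the names LANES, MISS, intersect_sphere and closest are all assumptions for illustration, not the team's SPE code.

```python
import math

LANES = 4      # one packet carries four rays
MISS = 1e30    # hypothetical "no hit" distance; finite so the blend below stays well defined


def select(mask, a, b):
    """Per-lane blend: take a where mask is 1 and b where mask is 0, with no per-lane branch."""
    return [m * x + (1 - m) * y for m, x, y in zip(mask, a, b)]


def intersect_sphere(packet, center, radius):
    """Intersect four unit-direction rays with one sphere and return a hit packet.

    `packet` maps "ox", "oy", "oz", "dx", "dy", "dz" to four-element lists,
    so every arithmetic step below operates on all four lanes at once.
    """
    cx, cy, cz = center
    ocx = [o - cx for o in packet["ox"]]
    ocy = [o - cy for o in packet["oy"]]
    ocz = [o - cz for o in packet["oz"]]
    b = [dx * x + dy * y + dz * z
         for dx, dy, dz, x, y, z in zip(packet["dx"], packet["dy"], packet["dz"],
                                        ocx, ocy, ocz)]
    c = [x * x + y * y + z * z - radius * radius for x, y, z in zip(ocx, ocy, ocz)]
    disc = [bb * bb - cc for bb, cc in zip(b, c)]
    # Clamp instead of branching so the square root is always defined.
    root = [math.sqrt(max(d, 0.0)) for d in disc]
    t = [-bb - r for bb, r in zip(b, root)]
    # The mask is 1 only in lanes with a real intersection in front of the ray.
    mask = [int(d >= 0.0) * int(tt > 0.0) for d, tt in zip(disc, t)]
    return select(mask, t, [MISS] * LANES)


def closest(hits_a, hits_b):
    """Merge two hit packets, keeping the nearer distance in each lane."""
    mask = [int(a < b) for a, b in zip(hits_a, hits_b)]
    return select(mask, hits_a, hits_b)
```

Every lane runs the same instruction sequence whether it hits or misses, and the mask decides afterwards which result is kept. That is the general pattern behind replacing if statements with selects on hardware where mispredicted branches are expensive.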
159 00:09:07,750 --> 00:09:11,480 Anything running on the SPEs is coded in C. Not in C++. 160 00:09:11,480 --> 00:09:13,730 And we're not using inheritance when we do use C++ 161 00:09:13,730 --> 00:09:18,930 because virtual functions have more overhead in respect 162 00:09:18,930 --> 00:09:22,290 to memory, the vtable, and also looking up the function 163 00:09:22,290 --> 00:09:23,880 pointers from the vtable. 164 00:09:23,880 --> 00:09:26,354 So these are just a couple of the optimizations we made. 165 00:09:26,354 --> 00:09:27,520 We'll talk about more later. 166 00:09:27,520 --> 00:09:28,120 AUDIENCE: Just a quick question. 167 00:09:28,120 --> 00:09:29,494 So for the pipeline, you actually 168 00:09:29,494 --> 00:09:31,630 reordered instruction statements? 169 00:09:31,630 --> 00:09:33,710 The compiler wasn't very helpful in that respect? 170 00:09:33,710 --> 00:09:34,501 BRIAN SWEATT: Yeah. 171 00:09:34,501 --> 00:09:40,480 So on that scene, 1024 by 1024 with the spheres, 172 00:09:40,480 --> 00:09:43,460 we were getting, Natalia mentioned 173 00:09:43,460 --> 00:09:47,800 with a depth of recursion 10, we were getting 300 milliseconds. 174 00:09:47,800 --> 00:09:48,970 That's with optimized. 175 00:09:48,970 --> 00:09:52,250 Without the optimizations, we were getting closer to 500 176 00:09:52,250 --> 00:09:55,850 with a depth of recursion of 10. 177 00:09:55,850 --> 00:09:59,930 And we gained that extra 200 milliseconds just by reordering 178 00:09:59,930 --> 00:10:03,980 instructions in the intersection with optimization level O3 179 00:10:03,980 --> 00:10:05,570 in XLC. 180 00:10:05,570 --> 00:10:08,930 So hand optimization, like Mike Acton said in class, 181 00:10:08,930 --> 00:10:09,960 is still very important. 182 00:10:13,281 --> 00:10:15,030 And I'm going to hand you off to Mikey now 183 00:10:15,030 --> 00:10:16,340 to discuss the regular grid.
184 00:10:21,059 --> 00:10:22,850 MICHAEL D'AMBROSIO: So one of the problems, 185 00:10:22,850 --> 00:10:25,340 I guess, with ray tracing is that intersections 186 00:10:25,340 --> 00:10:26,953 are very expensive. 187 00:10:26,953 --> 00:10:31,056 And when you have scenes with n rays and m primitives, 188 00:10:31,056 --> 00:10:34,200 well, you have to generate n times m intersections. 189 00:10:34,200 --> 00:10:37,700 So one common acceleration structure is a grid. 190 00:10:37,700 --> 00:10:40,480 And the concept is very simple, as the picture depicts. 191 00:10:40,480 --> 00:10:42,380 You just take your screen, split it up 192 00:10:42,380 --> 00:10:44,840 into a number of evenly sized voxels, 193 00:10:44,840 --> 00:10:46,930 place your primitives in them, and then you 194 00:10:46,930 --> 00:10:48,560 march your ray through. 195 00:10:48,560 --> 00:10:52,550 And you only intersect the ray with primitives 196 00:10:52,550 --> 00:10:54,660 that are around the area. 197 00:10:54,660 --> 00:10:56,230 And the good thing about a grid is 198 00:10:56,230 --> 00:10:57,690 that it's very easy to construct, 199 00:10:57,690 --> 00:11:00,130 as opposed to, say, a k-d tree or an octree. 200 00:11:00,130 --> 00:11:02,050 Those take on the order of n log n, 201 00:11:02,050 --> 00:11:04,350 where n is number of primitives. 202 00:11:04,350 --> 00:11:07,275 Regular grid is order n, where n is the number of primitives. 203 00:11:07,275 --> 00:11:08,650 And the reason why we did this is 204 00:11:08,650 --> 00:11:12,170 because with an interactive scene where things are moving 205 00:11:12,170 --> 00:11:15,090 around, you need to rebuild the grid every frame 206 00:11:15,090 --> 00:11:16,930 or whenever something moves. 207 00:11:16,930 --> 00:11:20,410 So we favored this choice of acceleration structure 208 00:11:20,410 --> 00:11:23,325 because it was quick to build and easy to traverse.
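The regular grid Michael describes above can be sketched as a single pass that drops each primitive into the cells its bounding box overlaps. This is a rough illustration under stated assumptions, not the project's data structure: the primitives are spheres given as (center, radius) pairs, cells are cubes of side cell_size stored sparsely in a dictionary, and the names cell_of, build_grid and candidates are hypothetical. Marching a ray cell by cell, for example with a 3D DDA, is left out to keep the sketch short.

```python
from collections import defaultdict


def cell_of(point, grid_min, cell_size):
    """Map a 3D point to the integer coordinates of the grid cell containing it."""
    return tuple(int((p - m) // cell_size) for p, m in zip(point, grid_min))


def build_grid(spheres, grid_min, cell_size):
    """Bin primitives into cells in one pass over the list.

    The work is linear in the number of primitives as long as each primitive
    overlaps a bounded number of cells, which is why rebuilding every frame is cheap.
    """
    grid = defaultdict(list)
    for index, (center, radius) in enumerate(spheres):
        lo = cell_of([c - radius for c in center], grid_min, cell_size)
        hi = cell_of([c + radius for c in center], grid_min, cell_size)
        for ix in range(lo[0], hi[0] + 1):
            for iy in range(lo[1], hi[1] + 1):
                for iz in range(lo[2], hi[2] + 1):
                    grid[(ix, iy, iz)].append(index)
    return grid


def candidates(grid, point, grid_min, cell_size):
    """Return the indices of primitives stored in the cell containing `point`."""
    return grid.get(cell_of(point, grid_min, cell_size), [])
```

A ray that visits a cell only needs to be intersected with the primitives returned by candidates for that cell, rather than with every primitive in the scene.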
209 00:11:26,080 --> 00:11:31,063 And now Fish is going to tell you about the fun 210 00:11:31,063 --> 00:11:32,037 part of the project. 211 00:11:37,394 --> 00:11:41,290 SCOTT FISHER: I was in charge of building the actual physics 212 00:11:41,290 --> 00:11:42,270 engine. 213 00:11:42,270 --> 00:11:47,770 Our original plan was to do the well-known 6.170 Gizmoball 214 00:11:47,770 --> 00:11:51,860 in 3D and ray traced. 215 00:11:51,860 --> 00:11:57,180 And I guess if we want to show that? 216 00:11:57,180 --> 00:12:00,790 I have it running on this laptop. 217 00:12:00,790 --> 00:12:02,700 We didn't quite get it ray traced, 218 00:12:02,700 --> 00:12:06,110 but I'm just going to show you a little bit of what 219 00:12:06,110 --> 00:12:07,300 I did as a physics engine. 220 00:12:21,880 --> 00:12:22,380 Maybe not. 221 00:12:32,535 --> 00:12:34,118 R. J. RYAN: We'll get the demo working 222 00:12:34,118 --> 00:12:35,886 and then I think we can use it later. 223 00:12:35,886 --> 00:12:37,469 SCOTT FISHER: Anyway, it was basically 224 00:12:37,469 --> 00:12:40,120 a 2 and 1/2 D version of the example that 225 00:12:40,120 --> 00:12:42,170 was just up on the screen. 226 00:12:42,170 --> 00:12:46,440 Flippers work, absorbers work, and due to complications, 227 00:12:46,440 --> 00:12:48,470 we never did get it fully ray traced. 228 00:12:48,470 --> 00:12:54,120 And I guess we'll go ahead and give you the real ray trace 229 00:12:54,120 --> 00:12:55,860 demo, since we do have a couple of those. 230 00:13:04,613 --> 00:13:08,025 PROFESSOR: So their actual demo will be on a little LCD screen 231 00:13:08,025 --> 00:13:08,525 here. 232 00:13:08,525 --> 00:13:11,948 Hopefully everybody can see the ray trace. 233 00:13:11,948 --> 00:13:15,126 Unfortunately, PS3 needs a specific kind 234 00:13:15,126 --> 00:13:18,716 of digital connection to display things in the right resolution 235 00:13:18,716 --> 00:13:20,680 when you're writing through Frame Buffer.
236 00:13:20,680 --> 00:13:22,660 So we can't project it onto the big screen. 237 00:13:22,660 --> 00:13:25,135 PROFESSOR: [INAUDIBLE] 238 00:13:25,135 --> 00:13:26,819 PROFESSOR: Can I do it from here? 239 00:13:26,819 --> 00:13:27,610 PROFESSOR: Perfect. 240 00:13:32,624 --> 00:13:34,540 BRIAN SWEATT: Now you got to tell [INAUDIBLE]. 241 00:13:37,505 --> 00:13:38,005 Oh, no. 242 00:13:38,005 --> 00:13:39,985 Wrong way. 243 00:13:39,985 --> 00:13:40,975 R. J. RYAN: Really? 244 00:13:40,975 --> 00:13:43,770 BRIAN SWEATT: By the way, he's still adjusting it. 245 00:13:43,770 --> 00:13:44,720 R. J. RYAN: Whoops. 246 00:13:44,720 --> 00:13:45,553 BRIAN SWEATT: Uh oh. 247 00:13:49,000 --> 00:13:50,715 Oh, no. 248 00:13:50,715 --> 00:13:51,340 R. J. RYAN: OK. 249 00:13:51,340 --> 00:13:54,480 We've prepared a few demos for you. 250 00:13:54,480 --> 00:13:57,710 And these are really just going to show off 251 00:13:57,710 --> 00:14:00,650 the ray tracing part of it and the rates 252 00:14:00,650 --> 00:14:04,430 that we got after we applied our speed ups. 253 00:14:08,160 --> 00:14:09,960 Thank you. 254 00:14:09,960 --> 00:14:12,850 Here we see both the camera is moving 255 00:14:12,850 --> 00:14:17,450 and we implemented full scene functional textures, 256 00:14:17,450 --> 00:14:18,900 so you see the checkerboard. 257 00:14:18,900 --> 00:14:21,890 It's generated every frame. 258 00:14:21,890 --> 00:14:23,500 We used no texture maps. 259 00:14:23,500 --> 00:14:26,600 This is completely generated functionally. 260 00:14:26,600 --> 00:14:30,110 Bump mapping, as you can see in the large orange sphere, 261 00:14:30,110 --> 00:14:33,470 is also done on the fly, as well as blending 262 00:14:33,470 --> 00:14:37,700 between two textures using Perlin noise is done on the fly 263 00:14:37,700 --> 00:14:38,980 also. 264 00:14:38,980 --> 00:14:40,970 Showcased in the center sphere right here, 265 00:14:40,970 --> 00:14:43,720 you can see both reflection and refraction. 
266 00:14:43,720 --> 00:14:45,130 The sphere is transparent. 267 00:14:45,130 --> 00:14:46,910 It acts as a glass sphere. 268 00:14:46,910 --> 00:14:53,490 So you can see the reflection of the orange in it, 269 00:14:53,490 --> 00:14:55,390 but also the warping of the checkerboard 270 00:14:55,390 --> 00:14:57,390 as it goes through. 271 00:14:57,390 --> 00:14:59,250 This is one of the advantages of-- well, 272 00:14:59,250 --> 00:15:02,180 spheres are one of the advantages of ray tracing, 273 00:15:02,180 --> 00:15:08,690 because you can use geometrical equations to define 274 00:15:08,690 --> 00:15:12,080 your geometry versus in using rasterization methods, 275 00:15:12,080 --> 00:15:13,990 you have to actually define the geometry 276 00:15:13,990 --> 00:15:17,246 in some concrete numerical way. 277 00:15:17,246 --> 00:15:19,370 Going to go to the next demo. 278 00:15:19,370 --> 00:15:22,000 This should showcase some of the lights that we implemented. 279 00:15:22,000 --> 00:15:24,900 AUDIENCE: So how many frames per second is this? 280 00:15:24,900 --> 00:15:27,400 R. J. RYAN: This should be 15 frames a second. 281 00:15:27,400 --> 00:15:29,860 And let me reiterate. 282 00:15:29,860 --> 00:15:33,290 The refraction step requires at least two bounces, 283 00:15:33,290 --> 00:15:35,840 so two steps of recursion. 284 00:15:35,840 --> 00:15:42,540 If we were to turn that off, we would be achieving, 285 00:15:42,540 --> 00:15:46,165 I would estimate, 27 frames a second, 28. 286 00:15:46,165 --> 00:15:47,790 But then it doesn't look as pretty, so, 287 00:15:47,790 --> 00:15:49,220 you know, what's the point. 288 00:15:49,220 --> 00:15:52,040 You can see-- this will be showcased later-- 289 00:15:52,040 --> 00:15:54,167 but you can see the spotlight. 290 00:15:54,167 --> 00:15:55,000 It's kind of moving. 291 00:15:55,000 --> 00:15:57,639 It's waving back and forth, illuminating 292 00:15:57,639 --> 00:15:58,430 parts of the scene. 
293 00:15:58,430 --> 00:16:01,490 There's also a point light at some distance around here 294 00:16:01,490 --> 00:16:02,970 that's illuminating everything. 295 00:16:02,970 --> 00:16:06,199 But it's basically the same set of primitives. 296 00:16:06,199 --> 00:16:07,740 If you are closer, you could probably 297 00:16:07,740 --> 00:16:11,690 notice that the checkerboard is fully reflective. 298 00:16:11,690 --> 00:16:15,292 This far wall, you can view the entire scene in it. 299 00:16:15,292 --> 00:16:16,605 Going to go to the next demo. 300 00:16:19,950 --> 00:16:22,910 This is with the lights off. 301 00:16:22,910 --> 00:16:25,600 So you can get a better sense of what the spotlight is actually 302 00:16:25,600 --> 00:16:26,100 doing. 303 00:16:28,910 --> 00:16:30,770 If you notice on this orange sphere, 304 00:16:30,770 --> 00:16:35,190 we implemented two different forms of BRDFs. 305 00:16:35,190 --> 00:16:36,920 One was generic Phong shading. 306 00:16:36,920 --> 00:16:38,990 And then the other was Cook-Torrance, 307 00:16:38,990 --> 00:16:42,165 which helps particularly with metals 308 00:16:42,165 --> 00:16:45,170 and other materials like that. 309 00:16:45,170 --> 00:16:50,750 It helps have very bright specular highlight. 310 00:16:50,750 --> 00:16:52,740 And you'll notice as the camera's panning 311 00:16:52,740 --> 00:17:01,360 back and forth, the speed ups that we get from-- we 312 00:17:01,360 --> 00:17:04,560 used little tricks to glean speed here and there. 313 00:17:04,560 --> 00:17:09,150 Like when it's so dark or it's very far away, 314 00:17:09,150 --> 00:17:10,704 we stop recursing. 315 00:17:10,704 --> 00:17:12,156 BRIAN SWEATT: We're over, so-- 316 00:17:12,156 --> 00:17:12,640 R. J. RYAN: We are? 317 00:17:12,640 --> 00:17:13,124 BRIAN SWEATT: Yeah. 318 00:17:13,124 --> 00:17:14,099 So if there's any-- 319 00:17:14,099 --> 00:17:15,280 R. J. RYAN: Let's see. 320 00:17:15,280 --> 00:17:17,650 SCOTT FISHER: Do the SPUs from one to six.
321 00:17:17,650 --> 00:17:21,050 R. J. RYAN: I can demonstrate-- we actually enabled it 322 00:17:21,050 --> 00:17:24,347 so that you can turn it on either one through six SPEs 323 00:17:24,347 --> 00:17:25,555 using command line arguments. 324 00:17:28,130 --> 00:17:29,810 Let's go back to the first demo. 325 00:17:29,810 --> 00:17:36,330 If I say just use one SPE, we can watch it slowly trod along. 326 00:17:36,330 --> 00:17:38,060 And if I were SSH-ed in, it would 327 00:17:38,060 --> 00:17:42,056 be giving me timing information so I could tell you exactly. 328 00:17:42,056 --> 00:17:44,180 It's really unfortunate that it's not, because it's 329 00:17:44,180 --> 00:17:45,790 very linear in its speedup. 330 00:17:45,790 --> 00:17:49,690 Right now it says 500 milliseconds. 331 00:17:49,690 --> 00:17:54,770 But if I turn it on two, it drops literally in half to 250. 332 00:17:54,770 --> 00:17:57,305 You can get a sense of that, that it's 333 00:17:57,305 --> 00:18:02,760 moving like tick, tick, tick, tick versus tick, tick, tick. 334 00:18:02,760 --> 00:18:08,420 And then three, it's a similar speedup. 335 00:18:08,420 --> 00:18:11,420 And then I'll jump to five. 336 00:18:11,420 --> 00:18:13,200 It's getting closer. 337 00:18:13,200 --> 00:18:22,130 And then when we turn on the sixth, this should be about 12, 338 00:18:22,130 --> 00:18:23,560 15 frames a second. 339 00:18:25,977 --> 00:18:28,060 AUDIENCE: So you estimate if you had a dozen SPUs, 340 00:18:28,060 --> 00:18:29,024 you could go-- 341 00:18:29,024 --> 00:18:31,190 R. J. RYAN: Actually, it's funny you mentioned that. 342 00:18:31,190 --> 00:18:34,165 I think it was the Julia ray tracer or the IRT. 343 00:18:34,165 --> 00:18:35,290 BRIAN SWEATT: Both of them. 344 00:18:35,290 --> 00:18:37,485 R. J. RYAN: Both of them use two Cell blades, 345 00:18:37,485 --> 00:18:39,125 which is a full eight?
346 00:18:39,125 --> 00:18:41,000 MICHAEL D'AMBROSIO: It's a full eight times-- 347 00:18:41,000 --> 00:18:42,416 R. J. RYAN: Eight cores times two. 348 00:18:42,416 --> 00:18:44,760 So they use 16 SPUs. 349 00:18:44,760 --> 00:18:47,517 The PlayStation 3 do have certain factors. 350 00:18:47,517 --> 00:18:48,600 We're only limited to six. 351 00:18:53,340 --> 00:18:55,390 AUDIENCE: The way you're dividing up the work, 352 00:18:55,390 --> 00:19:01,140 each SPU taking every sixth raster line, that sounds like-- 353 00:19:01,140 --> 00:19:05,640 could you improve on the performance by allocating 354 00:19:05,640 --> 00:19:06,980 the rays-- 355 00:19:06,980 --> 00:19:09,420 R. J. RYAN: The issue there is one of granularity 356 00:19:09,420 --> 00:19:11,500 of our distribution. 357 00:19:11,500 --> 00:19:15,630 And also communication time on the bus. 358 00:19:15,630 --> 00:19:18,535 We've designed it such that we can fine tune it in any way 359 00:19:18,535 --> 00:19:19,248 we want. 360 00:19:19,248 --> 00:19:20,706 AUDIENCE: But it seems to me if you 361 00:19:20,706 --> 00:19:26,490 had a localized bundle of rays on a single SPU, 362 00:19:26,490 --> 00:19:28,680 you might be able to do some optimizations, 363 00:19:28,680 --> 00:19:30,810 like treat them as fat rays. 364 00:19:30,810 --> 00:19:33,177 Nearby rays are likely to intersect the same triangles. 365 00:19:33,177 --> 00:19:35,260 MICHAEL D'AMBROSIO: Choosing the frustum variety. 366 00:19:35,260 --> 00:19:37,090 R. J. RYAN: The entire material system 367 00:19:37,090 --> 00:19:38,390 works very much like that. 368 00:19:38,390 --> 00:19:40,790 It actually saves on computations 369 00:19:40,790 --> 00:19:45,780 based on which rays in a packet of four 370 00:19:45,780 --> 00:19:48,210 have collided with the same element.
371 00:19:48,210 --> 00:19:51,130 And if it does, then it takes full advantage of SIMD 372 00:19:51,130 --> 00:19:54,933 to quickly chug through those rays that are the same 373 00:19:54,933 --> 00:19:57,217 and then never call that particular shader again, 374 00:19:57,217 --> 00:19:59,050 because it's already done the work for them. 375 00:19:59,050 --> 00:20:01,260 MICHAEL D'AMBROSIO: Another issue, I think, is-- 376 00:20:01,260 --> 00:20:03,320 PROFESSOR: Sorry, we should try to wrap so we 377 00:20:03,320 --> 00:20:04,580 can get through the others. 378 00:20:04,580 --> 00:20:07,290 Any more very short questions? 379 00:20:07,290 --> 00:20:10,760 Did you guys have any last concluding statements? 380 00:20:10,760 --> 00:20:13,560 NATALIA CHERNENKO: Maybe we go through this like really fast? 381 00:20:13,560 --> 00:20:15,452 [INTERPOSING VOICES] 382 00:20:15,452 --> 00:20:17,160 BRIAN SWEATT: Point is we had two pieces. 383 00:20:17,160 --> 00:20:18,010 PROFESSOR: Thank the speaker. 384 00:20:18,010 --> 00:20:18,926 R. J. RYAN: Thank you. 385 00:20:18,926 --> 00:20:22,560 [APPLAUSE]