1 00:00:00,050 --> 00:00:01,770 The following content is provided 2 00:00:01,770 --> 00:00:04,010 under a Creative Commons license. 3 00:00:04,010 --> 00:00:06,860 Your support will help MIT OpenCourseWare continue 4 00:00:06,860 --> 00:00:10,720 to offer high quality educational resources for free. 5 00:00:10,720 --> 00:00:13,320 To make a donation or view additional materials 6 00:00:13,320 --> 00:00:17,207 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:17,207 --> 00:00:17,832 at ocw.mit.edu. 8 00:00:20,552 --> 00:00:22,010 PROFESSOR SRINI DEVADAS: Erik and I 9 00:00:22,010 --> 00:00:24,900 have been tag teaming this lecture in this class 10 00:00:24,900 --> 00:00:28,250 so we're going to split this lecture. 11 00:00:28,250 --> 00:00:33,110 So I get to do the first 2 minutes. 12 00:00:33,110 --> 00:00:33,950 No. 13 00:00:33,950 --> 00:00:38,520 I get to do the first 20 minutes, or so, 14 00:00:38,520 --> 00:00:40,670 talking about some of my research 15 00:00:40,670 --> 00:00:42,690 in parallel architecture. 16 00:00:42,690 --> 00:00:45,040 And Erik's going to talk about a bunch of things 17 00:00:45,040 --> 00:00:48,770 that he's been up to over the years in Algorithm Design 18 00:00:48,770 --> 00:00:50,470 and Analysis. 19 00:00:50,470 --> 00:00:51,990 So let's get started. 20 00:00:56,060 --> 00:00:58,640 When was the first PC built, anybody? 21 00:01:01,400 --> 00:01:01,900 Yeah. 22 00:01:01,900 --> 00:01:03,180 AUDIENCE: In the 1950s. 23 00:01:03,180 --> 00:01:04,346 PROFESSOR SRINI DEVADAS: No. 24 00:01:04,346 --> 00:01:08,600 The first personal computer was 1981-- not the first computer. 25 00:01:08,600 --> 00:01:14,920 So all of you know about Intel, and Microsoft, and IBM, 26 00:01:14,920 --> 00:01:15,500 and so on. 27 00:01:18,060 --> 00:01:23,430 Intel's gift to humankind is the x86 architecture. 28 00:01:23,430 --> 00:01:26,830 Though, some people would argue that point. 29 00:01:26,830 --> 00:01:32,110 And the x86 architecture was invented in 1981, 30 00:01:32,110 --> 00:01:38,290 and was part of the first PC-- that provided the horsepower 31 00:01:38,290 --> 00:01:41,270 for the first PC-- the IBM PC. 32 00:01:41,270 --> 00:01:43,015 And it ran at 5 megahertz. 33 00:01:48,930 --> 00:01:53,210 And x86 has been around-- you still can buy x86 computers. 34 00:01:53,210 --> 00:02:01,570 The 80486, in 1989, ran at 25 megahertz. 35 00:02:01,570 --> 00:02:03,770 So you can see a trend here. 36 00:02:03,770 --> 00:02:07,060 And the 80486, as it turns out, ended up 37 00:02:07,060 --> 00:02:10,900 being called the I486 because there was a court ruling that 38 00:02:10,900 --> 00:02:15,300 said that you couldn't trademark numbers. 39 00:02:15,300 --> 00:02:17,430 And so Intel, at that point, decided 40 00:02:17,430 --> 00:02:19,190 to start naming their processors. 41 00:02:21,870 --> 00:02:27,170 So the Pentium, which is one of the more famous Intel 42 00:02:27,170 --> 00:02:31,720 processors, was built and came out in 1993. 43 00:02:31,720 --> 00:02:35,240 And the clock speed went up to 66 megahertz, 44 00:02:35,240 --> 00:02:37,960 back in the early '90s. 45 00:02:37,960 --> 00:02:42,010 And since this is just such a cool name, 46 00:02:42,010 --> 00:02:45,170 Intel continued to call its processors Pentium. 47 00:02:45,170 --> 00:02:52,855 And the Pentium 4, in 2000, had this incredibly deep pipeline 48 00:02:52,855 --> 00:02:54,650 where you broke up the computation 49 00:02:54,650 --> 00:02:55,650 into a bunch of stages. 
50 00:02:55,650 --> 00:02:57,760 In fact, it had a 30 stage pipeline. 51 00:02:57,760 --> 00:03:02,580 And so the clock speed went up all the way to 1.5 gigahertz. 52 00:03:02,580 --> 00:03:05,690 The Pentium was famous for many things, 53 00:03:05,690 --> 00:03:10,010 including a couple of bugs in the floating point 54 00:03:10,010 --> 00:03:15,540 pipeline where division, in particular corner cases, 55 00:03:15,540 --> 00:03:17,190 wasn't done correctly. 56 00:03:17,190 --> 00:03:24,170 And there was also this bug called the F00F bug, which 57 00:03:24,170 --> 00:03:28,960 allowed a malicious program to crash the entire system, 58 00:03:28,960 --> 00:03:32,600 regardless of whether it had administrative privileges 59 00:03:32,600 --> 00:03:34,280 or not. 60 00:03:34,280 --> 00:03:37,116 But the Pentium was obviously very successful. 61 00:03:37,116 --> 00:03:40,020 A lot of machines sold. 62 00:03:40,020 --> 00:03:44,680 And it felt like it was only going to be a matter of time 63 00:03:44,680 --> 00:03:47,180 before we got to 10s of gigahertz, 64 00:03:47,180 --> 00:03:48,380 the way things were going. 65 00:03:48,380 --> 00:03:51,270 As you can see, this is a pretty steep growth 66 00:03:51,270 --> 00:03:54,070 from 5 megahertz to 25 to 1.5 gigahertz 67 00:03:54,070 --> 00:03:57,400 in the space of about 20 years. 68 00:03:57,400 --> 00:04:03,760 As it turns out, after the Pentium D, which came out 69 00:04:03,760 --> 00:04:09,770 in 2005, where the clock speed peaked at about 3.2 gigahertz, 70 00:04:09,770 --> 00:04:12,530 clock frequency stopped increasing. 71 00:04:12,530 --> 00:04:17,880 And what you see now are things that 72 00:04:17,880 --> 00:04:21,070 correspond to multiple processors on a chip. 73 00:04:21,070 --> 00:04:26,110 So for example, the Quad Core Xeon came out in 2008. 74 00:04:26,110 --> 00:04:27,670 You can still buy it. 75 00:04:27,670 --> 00:04:31,170 Only runs at 3 gigahertz, which is basically 76 00:04:31,170 --> 00:04:34,240 about the same as the Pentium D ran. 77 00:04:34,240 --> 00:04:37,100 Each of these has a range of frequencies. 78 00:04:37,100 --> 00:04:41,440 And beyond about 2005, the clock speed 79 00:04:41,440 --> 00:04:43,540 of processors that you can buy is kind of 80 00:04:43,540 --> 00:04:46,870 saturated at about 3 gigahertz. 81 00:04:46,870 --> 00:04:50,060 And the way you're getting performance 82 00:04:50,060 --> 00:04:54,290 is by putting multiple processors on the chip. 83 00:04:54,290 --> 00:04:59,570 And people use the term cores synonymously with processors. 84 00:04:59,570 --> 00:05:02,170 So a quad core means that they're, in effect, 85 00:05:02,170 --> 00:05:07,400 four x86 processors on the same silicon integrated circuit. 86 00:05:07,400 --> 00:05:10,100 And they're interconnected together. 87 00:05:10,100 --> 00:05:11,920 And they talk to memory. 88 00:05:11,920 --> 00:05:14,720 And you have, essentially, a parallel processor 89 00:05:14,720 --> 00:05:16,980 on a single chip. 90 00:05:16,980 --> 00:05:20,280 And the single user, potentially running many programs, 91 00:05:20,280 --> 00:05:22,350 is using this system. 92 00:05:22,350 --> 00:05:25,530 And you have dual core processors on your laptops. 93 00:05:25,530 --> 00:05:29,140 And so the scale, now, is-- the metric now, 94 00:05:29,140 --> 00:05:32,830 I should say-- is how many cores do you have on a chip. 95 00:05:32,830 --> 00:05:34,330 And people are predicting that we're 96 00:05:34,330 --> 00:05:38,560 going to have 1,000 cores by 2020, on a chip. 
97 00:05:38,560 --> 00:05:43,450 So this brings us to the problem of how do we use parallelism. 98 00:05:43,450 --> 00:05:46,000 So there's a lot of work in parallel algorithms. 99 00:05:46,000 --> 00:05:48,770 And there's also work in building hardware, 100 00:05:48,770 --> 00:05:51,720 such that algorithms can sort of automatically 101 00:05:51,720 --> 00:05:54,210 be parallelized while they're running in hardware, 102 00:05:54,210 --> 00:05:57,380 so they can run faster, and so on and so forth. 103 00:05:57,380 --> 00:06:00,240 So some of my research is in parallel architecture. 104 00:06:00,240 --> 00:06:02,090 Some of it is in parallel algorithms. 105 00:06:02,090 --> 00:06:05,980 I want to give you a sense of what the problems are 106 00:06:05,980 --> 00:06:08,670 in building parallel architectures. 107 00:06:08,670 --> 00:06:13,370 And in particular, I'll start with a canonical system that 108 00:06:13,370 --> 00:06:16,215 corresponds to, let's say, this quad core system. 109 00:06:16,215 --> 00:06:23,730 And so you have 4 processors on this single integrated circuit. 110 00:06:23,730 --> 00:06:25,950 So that signifies that. 111 00:06:25,950 --> 00:06:32,040 And typically, you have a lot of fast, static random-access 112 00:06:32,040 --> 00:06:35,395 memory, SRAM, on the same chip. 113 00:06:35,395 --> 00:06:38,960 So typically, megabytes of the memory 114 00:06:38,960 --> 00:06:45,756 on the chip and gigabytes of memory in DRAM, 115 00:06:45,756 --> 00:06:47,630 which are separate modules that are connected 116 00:06:47,630 --> 00:06:49,950 via high speed bus, off the chip. 117 00:06:49,950 --> 00:06:53,810 So there are usually many DRAM modules. 118 00:06:53,810 --> 00:06:58,390 They're called DIMMS-- if you might have heard the term. 119 00:06:58,390 --> 00:07:01,670 So the connection between the processors and the SRAM 120 00:07:01,670 --> 00:07:03,910 is typically very fast. 121 00:07:03,910 --> 00:07:05,910 It's on-chip. 122 00:07:05,910 --> 00:07:08,540 Things being clocked at gigahertz. 123 00:07:08,540 --> 00:07:10,170 And when you go off-chip, you're down 124 00:07:10,170 --> 00:07:11,350 to a few hundred megahertz. 125 00:07:11,350 --> 00:07:15,060 So typically, an order of magnitude less speed. 126 00:07:15,060 --> 00:07:17,040 But you're accessing much more memory. 127 00:07:17,040 --> 00:07:18,632 So this is really gigabytes and this 128 00:07:18,632 --> 00:07:19,840 is at the level of megabytes. 129 00:07:22,515 --> 00:07:26,750 If you see this picture, here-- if you 130 00:07:26,750 --> 00:07:28,720 think about the number of processors increasing 131 00:07:28,720 --> 00:07:32,750 from four to eight to 16, all the way to, 132 00:07:32,750 --> 00:07:36,180 say, to hundreds of processors, you 133 00:07:36,180 --> 00:07:38,850 can see that there's going to be a bottleneck associated 134 00:07:38,850 --> 00:07:41,550 with accessing the memory. 135 00:07:41,550 --> 00:07:44,340 The big problem is you can't possibly 136 00:07:44,340 --> 00:07:48,210 build memory that serves hundreds 137 00:07:48,210 --> 00:07:49,940 of requests in parallel. 138 00:07:49,940 --> 00:07:53,780 If you try and make a large SRAM, which is megabytes long, 139 00:07:53,780 --> 00:07:55,920 the number of ports in the SRAM-- 140 00:07:55,920 --> 00:08:01,760 read ports-- is roughly of the order of four. 141 00:08:01,760 --> 00:08:04,070 And after that it's kind of hard to build. 
142 00:08:04,070 --> 00:08:12,220 So this architecture isn't going to be sustainable beyond 4, 8, 143 00:08:12,220 --> 00:08:13,990 maybe 16 cores. 144 00:08:13,990 --> 00:08:17,120 So typically, what people build is-- 145 00:08:17,120 --> 00:08:19,790 or people are trying to build in academia-- 146 00:08:19,790 --> 00:08:23,980 is something that corresponds to a distributed architecture 147 00:08:23,980 --> 00:08:32,320 on the chip, where you have processors and memory in tiles. 148 00:08:32,320 --> 00:08:39,059 So you have, essentially, something 149 00:08:39,059 --> 00:08:43,490 like this, where you can imagine having literally 100 150 00:08:43,490 --> 00:08:50,657 processors on a chip that correspond to an implementation 151 00:08:50,657 --> 00:08:52,990 where you build tiles, where you have a processor that's 152 00:08:52,990 --> 00:08:56,950 doing the computation, and you have memory-- sometimes 153 00:08:56,950 --> 00:08:58,400 called cache memory. 154 00:08:58,400 --> 00:09:02,690 But there's multiple levels of caches, typically, that 155 00:09:02,690 --> 00:09:04,830 are attached to each of these processors. 156 00:09:04,830 --> 00:09:11,400 And the space between the processor tiles 157 00:09:11,400 --> 00:09:17,590 is reserved for interconnect or for wires 158 00:09:17,590 --> 00:09:19,690 that connect these processors up. 159 00:09:19,690 --> 00:09:23,270 And so there's research that goes on in routing algorithms. 160 00:09:23,270 --> 00:09:25,790 How you figure out if these processors want 161 00:09:25,790 --> 00:09:29,030 to talk to each other; what the best way of routing 162 00:09:29,030 --> 00:09:31,810 the messages are; you want to find the shortest path. 163 00:09:31,810 --> 00:09:33,520 In this case, the weight corresponds 164 00:09:33,520 --> 00:09:36,800 to the congestion that's associated 165 00:09:36,800 --> 00:09:39,570 with each of these channels that you have. 166 00:09:39,570 --> 00:09:41,990 And people actually use algorithms 167 00:09:41,990 --> 00:09:44,880 like weighted shortest paths, in hardware, 168 00:09:44,880 --> 00:09:47,690 to determine what the best way of getting from here to there 169 00:09:47,690 --> 00:09:48,190 is. 170 00:09:48,190 --> 00:09:49,420 It may not be this way. 171 00:09:49,420 --> 00:09:51,800 It may be going around the chip simply 172 00:09:51,800 --> 00:09:55,195 because that path-- the latter one is less congested. 173 00:09:57,960 --> 00:10:00,620 The other issue that comes up has 174 00:10:00,620 --> 00:10:05,410 to do with how long it takes to go across the chip 175 00:10:05,410 --> 00:10:06,560 and come back. 176 00:10:06,560 --> 00:10:09,840 So if this processor wants to access its local memory-- 177 00:10:09,840 --> 00:10:14,710 that's typically pretty simple or fast. 178 00:10:14,710 --> 00:10:18,226 But if it wants to access remote memory-- 179 00:10:18,226 --> 00:10:19,600 and it's quite possible that it's 180 00:10:19,600 --> 00:10:22,290 sharing some data with a different thread running 181 00:10:22,290 --> 00:10:23,845 on a different processor. 182 00:10:23,845 --> 00:10:25,470 So typically, there's a program running 183 00:10:25,470 --> 00:10:29,210 on this processor, sometimes called a thread, 184 00:10:29,210 --> 00:10:34,480 and this program may share data with a different program, which 185 00:10:34,480 --> 00:10:36,310 is running on this processor. 186 00:10:36,310 --> 00:10:38,940 Or it may just require a lot more space. 
187 00:10:38,940 --> 00:10:43,080 And what this program has to do is make a request 188 00:10:43,080 --> 00:10:47,820 all the way to this processor and this particular cache 189 00:10:47,820 --> 00:10:48,860 in this processor. 190 00:10:48,860 --> 00:10:52,080 And then it gets the data back. 191 00:10:52,080 --> 00:10:57,990 So what you see here is a round trip access 192 00:10:57,990 --> 00:11:01,620 that goes across the chip. 193 00:11:01,620 --> 00:11:05,510 And this distance, if it's large, 194 00:11:05,510 --> 00:11:07,550 could take 10s of cycles. 195 00:11:07,550 --> 00:11:09,030 So typically, it's a single cycle 196 00:11:09,030 --> 00:11:11,990 to access local memory-- the fastest local memory, 197 00:11:11,990 --> 00:11:13,380 called the L1 cache. 198 00:11:13,380 --> 00:11:15,680 But it could take 10s of cycles to go send a message 199 00:11:15,680 --> 00:11:18,970 across the chip and 10s of cycles to get the data back. 200 00:11:18,970 --> 00:11:23,480 So the bottleneck, really, in parallel processing 201 00:11:23,480 --> 00:11:25,570 from a standpoint of communication 202 00:11:25,570 --> 00:11:31,029 is this routing of messages and getting the messages back. 203 00:11:31,029 --> 00:11:33,070 One of the things that my research group is doing 204 00:11:33,070 --> 00:11:37,460 is looking at the notion of migrating 205 00:11:37,460 --> 00:11:40,200 computation as opposed to data. 206 00:11:40,200 --> 00:11:45,620 We call it execution migration, where 207 00:11:45,620 --> 00:11:52,960 you could say-- suppose I have a processor running 208 00:11:52,960 --> 00:11:55,460 a particular program, out here. 209 00:11:55,460 --> 00:12:00,620 And if this program wanted to access a remote memory, 210 00:12:00,620 --> 00:12:02,220 then, rather than doing what I just 211 00:12:02,220 --> 00:12:03,650 showed you there-- send a message, 212 00:12:03,650 --> 00:12:05,680 get the data back-- you could imagine 213 00:12:05,680 --> 00:12:10,420 that you could migrate the program itself. 214 00:12:10,420 --> 00:12:12,070 And in particular, you think of it 215 00:12:12,070 --> 00:12:16,830 as migrating the context of the program 216 00:12:16,830 --> 00:12:20,580 from this processor to this one. 217 00:12:20,580 --> 00:12:21,800 And so what is the context? 218 00:12:21,800 --> 00:12:27,540 For those of you who have taken 6.004 probably 219 00:12:27,540 --> 00:12:28,360 know what this is. 220 00:12:28,360 --> 00:12:33,630 But it's simply where you are in terms 221 00:12:33,630 --> 00:12:34,825 of executing your program. 222 00:12:34,825 --> 00:12:36,200 And that's typically given to you 223 00:12:36,200 --> 00:12:41,340 by our program counter, and your current state of your register 224 00:12:41,340 --> 00:12:46,740 file, and a few other things, including 225 00:12:46,740 --> 00:12:49,180 cache memory and so on and so forth. 226 00:12:49,180 --> 00:12:52,230 So the advantage with execution migration 227 00:12:52,230 --> 00:12:56,160 is that it's a one way trip, as opposed to a round trip. 228 00:13:00,510 --> 00:13:05,620 You don't have to send a message and get the data back, 229 00:13:05,620 --> 00:13:08,707 which would be two messages, if you will-- 230 00:13:08,707 --> 00:13:10,540 one in the case of the address and the other 231 00:13:10,540 --> 00:13:13,780 for the data-- but you migrate your execution. 232 00:13:13,780 --> 00:13:15,660 Since you have computation out here, 233 00:13:15,660 --> 00:13:20,570 you can run on this remote processor. 
234 00:13:20,570 --> 00:13:23,260 So that's one of the advantages of execution migration 235 00:13:23,260 --> 00:13:27,860 One of the downsides of it is that this 236 00:13:27,860 --> 00:13:31,200 can be multiple kilobytes-- or kilobits. 237 00:13:31,200 --> 00:13:35,750 And it could be significantly more in terms of size, 238 00:13:35,750 --> 00:13:39,460 or in terms of bits, than the data that you want to access. 239 00:13:39,460 --> 00:13:41,180 So there's a trade-off here. 240 00:13:41,180 --> 00:13:43,580 And then, when any time you have a trade-off, 241 00:13:43,580 --> 00:13:45,540 you can think of an algorithm to try and find 242 00:13:45,540 --> 00:13:47,100 the optimal trade-off. 243 00:13:47,100 --> 00:13:54,650 So this is the context for the particular optimization problem 244 00:13:54,650 --> 00:13:57,280 that we need to solve, here, that corresponds 245 00:13:57,280 --> 00:14:03,230 to really deciding when you want to do data migration 246 00:14:03,230 --> 00:14:06,960 and when you want to do execution migration. 247 00:14:06,960 --> 00:14:08,870 There's a choice. 248 00:14:08,870 --> 00:14:13,510 At the top level, it's a round trip to get the data. 249 00:14:13,510 --> 00:14:18,670 So you're really traveling longer-- twice as long. 250 00:14:18,670 --> 00:14:20,790 The distance is twice as much. 251 00:14:20,790 --> 00:14:23,590 But it's possible that the amount 252 00:14:23,590 --> 00:14:26,580 of state that you'd have to move, 253 00:14:26,580 --> 00:14:29,790 in terms of taking your context of your thread 254 00:14:29,790 --> 00:14:32,710 and moving across the chip, could be large enough 255 00:14:32,710 --> 00:14:38,020 that it offsets the advantage of the shorter distance. 256 00:14:38,020 --> 00:14:42,820 So we set this up as an optimization problem. 257 00:14:42,820 --> 00:14:46,110 So now we're in the realm of-- we moved from 6.004 to 6.006, 258 00:14:46,110 --> 00:14:50,210 here, in the last couple of seconds. 259 00:14:50,210 --> 00:14:57,990 So assume we know or can predict the access 260 00:14:57,990 --> 00:15:04,720 pattern of a program. 261 00:15:04,720 --> 00:15:06,320 And you can do this-- people build 262 00:15:06,320 --> 00:15:09,150 these things in hardware-- prefetch engines, 263 00:15:09,150 --> 00:15:11,100 branch predictors, and so on. 264 00:15:11,100 --> 00:15:13,241 They're in the x86 machines. 265 00:15:13,241 --> 00:15:15,740 And you can tell-- especially if you're going through a loop 266 00:15:15,740 --> 00:15:20,057 over and over-- you can make this prediction. 267 00:15:20,057 --> 00:15:21,640 So you have some amount of look ahead. 268 00:15:21,640 --> 00:15:25,000 And you know that m1 through mn are 269 00:15:25,000 --> 00:15:29,250 the memory accesses that this program is going to make. 270 00:15:29,250 --> 00:15:30,725 And these other memory addresses. 271 00:15:34,930 --> 00:15:42,670 And I'm going to think about p of m1, p of m2, p of mn, 272 00:15:42,670 --> 00:15:52,400 as the processor caches for each mi. 273 00:15:52,400 --> 00:15:57,330 So what might be the case, in a simple example, 274 00:15:57,330 --> 00:16:02,731 is you want to access memory in processor one. 275 00:16:02,731 --> 00:16:04,106 You're sitting there and you want 276 00:16:04,106 --> 00:16:06,270 to access memory in processor one. 277 00:16:06,270 --> 00:16:08,370 And then, the next one, you want to access memory 278 00:16:08,370 --> 00:16:10,090 in processor two. 279 00:16:10,090 --> 00:16:13,379 And so on and so forth. 
280 00:16:13,379 --> 00:16:14,920 So you might see something like that. 281 00:16:14,920 --> 00:16:17,410 So the sequence of memory addressees-- 282 00:16:17,410 --> 00:16:20,207 if you're sitting on processor one-- this first one is local. 283 00:16:20,207 --> 00:16:22,540 And then, after that, you want to access processor two's 284 00:16:22,540 --> 00:16:24,790 memory because you're sharing data with it. 285 00:16:24,790 --> 00:16:27,312 Then you're back home, again, to processor one. 286 00:16:27,312 --> 00:16:28,270 And so on and so forth. 287 00:16:31,370 --> 00:16:34,610 So that's one example of a set up. 288 00:16:34,610 --> 00:16:39,350 And we can think of about the cost of migration 289 00:16:39,350 --> 00:16:41,874 as-- if you want to go from s to d-- 290 00:16:41,874 --> 00:16:45,400 as being a function of the distance, 291 00:16:45,400 --> 00:16:49,590 s comma d, plus some constant, which 292 00:16:49,590 --> 00:16:53,987 is proportional to the context size. 293 00:16:53,987 --> 00:16:55,820 And that context size, we're going to assume 294 00:16:55,820 --> 00:16:59,070 is fixed for a particular architecture. 295 00:16:59,070 --> 00:17:00,820 It may change for different architectures, 296 00:17:00,820 --> 00:17:04,511 but if it's a few kilobits, then there's 297 00:17:04,511 --> 00:17:06,010 going to be some overhead associated 298 00:17:06,010 --> 00:17:08,160 with putting the context onto the network. 299 00:17:08,160 --> 00:17:12,000 And it's a sizable overhead that needs to be taken into account. 300 00:17:12,000 --> 00:17:13,940 That's the cost of migration. 301 00:17:13,940 --> 00:17:18,210 The cost of an access, s comma d, 302 00:17:18,210 --> 00:17:23,050 is twice the distance between s and d. 303 00:17:23,050 --> 00:17:25,800 And it's typically just a word that you 304 00:17:25,800 --> 00:17:29,470 want to access-- 32 bits, 64 bits-- 305 00:17:29,470 --> 00:17:33,470 and so there's no additional overhead associated with a data 306 00:17:33,470 --> 00:17:34,750 access. 307 00:17:34,750 --> 00:17:35,890 So there you go. 308 00:17:35,890 --> 00:17:39,800 You have the formulation of the problem. 309 00:17:39,800 --> 00:17:43,750 You have the trade-off written, where the cost of migration 310 00:17:43,750 --> 00:17:46,500 has just the distance. 311 00:17:46,500 --> 00:17:48,120 But it has a constant factor. 312 00:17:48,120 --> 00:17:52,780 And you've got twice the distance, here, for the access. 313 00:17:52,780 --> 00:17:56,320 Now if s equals d, and I want to write this down, 314 00:17:56,320 --> 00:17:58,140 you have a local access. 315 00:17:58,140 --> 00:18:01,000 And the cost is assumed to be zero. 316 00:18:01,000 --> 00:18:02,560 You could change that. 317 00:18:02,560 --> 00:18:05,760 We are in the realm of the theory and symbols. 318 00:18:05,760 --> 00:18:08,280 So you can do whatever you want. 319 00:18:08,280 --> 00:18:12,770 But given those equations, our problem 320 00:18:12,770 --> 00:18:26,780 is decide when to migrate to minimize total memory access 321 00:18:26,780 --> 00:18:27,280 cost. 322 00:18:33,940 --> 00:18:35,630 So in our example there, I suppose 323 00:18:35,630 --> 00:18:41,850 we had p1, p2, p2, et cetera. 324 00:18:41,850 --> 00:18:43,100 And let's say you start at p1. 325 00:18:46,850 --> 00:18:49,535 This first one would be a local access. 326 00:18:49,535 --> 00:18:50,910 And then, you may decide that you 327 00:18:50,910 --> 00:18:52,780 want to migrate to p2, over here. 
328 00:18:56,120 --> 00:18:58,900 In this case, you get this as a local access, as well. 329 00:18:58,900 --> 00:19:00,520 So is this one. 330 00:19:00,520 --> 00:19:03,933 Right here, you might want to migrate from p2 back to p1. 331 00:19:06,910 --> 00:19:08,450 So this becomes a local access. 332 00:19:08,450 --> 00:19:10,020 That's a local access. 333 00:19:10,020 --> 00:19:11,830 They're all, essentially, free. 334 00:19:11,830 --> 00:19:13,850 And then, if you just stay at p1, 335 00:19:13,850 --> 00:19:19,140 over here, you may end up doing remote accesses to p3 and p2, 336 00:19:19,140 --> 00:19:21,500 respectively. 337 00:19:21,500 --> 00:19:24,670 And so you have a cost of migration-- the cost 338 00:19:24,670 --> 00:19:27,130 of migration and the cost of two remote accesses. 339 00:19:29,810 --> 00:19:31,271 So that's the set up. 340 00:19:31,271 --> 00:19:32,895 How are we going to solve this problem? 341 00:19:37,694 --> 00:19:38,610 Are we going to use Dijkstra? 342 00:19:38,610 --> 00:19:39,825 Are we going to use Bellman-Ford? 343 00:19:39,825 --> 00:19:41,700 Are we going to use balanced search trees? 344 00:19:41,700 --> 00:19:44,159 Are we going to use hash functions? 345 00:19:44,159 --> 00:19:45,200 What are we going to use? 346 00:19:45,200 --> 00:19:46,064 AUDIENCE: Dynamic Programming. 347 00:19:46,064 --> 00:19:47,939 PROFESSOR SRINI DEVADAS: Dynamic Programming. 348 00:19:47,939 --> 00:19:48,714 All together. 349 00:19:48,714 --> 00:19:50,410 EVERYONE: Dynamic Programming. 350 00:19:50,410 --> 00:19:52,160 PROFESSOR SRINI DEVADAS: Dynamic programming, all right. 351 00:19:52,160 --> 00:19:53,520 We're going to use dynamic programming 352 00:19:53,520 --> 00:19:54,436 to solve this problem. 353 00:19:57,181 --> 00:19:57,680 Good. 354 00:19:57,680 --> 00:20:00,887 So Erik taught you something. 355 00:20:00,887 --> 00:20:02,220 AUDIENCE: Where are the erasers? 356 00:20:02,220 --> 00:20:02,460 PROFESSOR SRINI DEVADAS: Yeah. 357 00:20:02,460 --> 00:20:03,000 Where are the erasers? 358 00:20:03,000 --> 00:20:04,460 I think they fluttered down here. 359 00:20:04,460 --> 00:20:05,880 All right. 360 00:20:05,880 --> 00:20:08,545 Let me bail out and use this while you find the erasers. 361 00:20:11,060 --> 00:20:16,150 So a program is at p1, which is the processor, initially. 362 00:20:16,150 --> 00:20:18,480 I'm just going to set up this DP. 363 00:20:18,480 --> 00:20:29,809 Let's assume that the number of processors equals Q. Now, 364 00:20:29,809 --> 00:20:30,850 what are the subproblems? 365 00:20:35,100 --> 00:20:37,920 You could do this many different ways. 366 00:20:37,920 --> 00:20:41,100 Let's go ahead and use prefixes. 367 00:20:41,100 --> 00:20:53,710 And so DP(k, pi) is the cost of the optimal solution 368 00:20:53,710 --> 00:21:07,840 for the prefix m1 through mk of memory accesses, 369 00:21:07,840 --> 00:21:18,111 when the program starts at p1 and ends at pi. 370 00:21:18,111 --> 00:21:19,870 So that's my subproblem. 371 00:21:19,870 --> 00:21:22,970 I want to know, as I build this up, 372 00:21:22,970 --> 00:21:24,630 what is the optimal way that I'm going 373 00:21:24,630 --> 00:21:26,670 to choose between migrations and accesses 374 00:21:26,670 --> 00:21:34,690 for the first k memory accesses, assuming a starting point at p1 375 00:21:34,690 --> 00:21:37,422 and ending at some pi. 376 00:21:37,422 --> 00:21:39,130 And I need to build up these subproblems. 377 00:21:39,130 --> 00:21:40,200 And I want to grow them.
378 00:21:43,470 --> 00:21:48,460 Let's go ahead and set this up. 379 00:21:48,460 --> 00:21:50,822 What I want to do now is figure out DP(k plus 1, pj). 380 00:21:56,050 --> 00:22:01,400 And assuming I have all of the DP(k, pi)'s computed-- 381 00:22:01,400 --> 00:22:04,670 and how many subproblems do I have? 382 00:22:04,670 --> 00:22:07,820 How many subproblems do I have? 383 00:22:07,820 --> 00:22:10,430 Total? 384 00:22:10,430 --> 00:22:14,880 Look at this and tell me what the ranges of the possibilities 385 00:22:14,880 --> 00:22:15,380 are. 386 00:22:15,380 --> 00:22:17,800 So how many subproblems would I have? 387 00:22:17,800 --> 00:22:18,300 Someone? 388 00:22:22,880 --> 00:22:27,950 N times Q. So you have N times Q subproblems. 389 00:22:31,910 --> 00:22:37,260 So you've set this up for up until k and for all 390 00:22:37,260 --> 00:22:38,740 of the pi's. 391 00:22:38,740 --> 00:22:43,230 Now, what you have to do is essentially say, well, 392 00:22:43,230 --> 00:22:56,030 DP of k plus 1, pj is going to be DP of k, pj plus cost of access 393 00:22:56,030 --> 00:23:07,656 of pj, p of mk plus 1, if pj is not equal to p of mk plus 1. 394 00:23:07,656 --> 00:23:09,030 So there's going to be two cases. 395 00:23:09,030 --> 00:23:13,250 I'll just write this out and I'll explain it. 396 00:23:13,250 --> 00:23:16,620 But the first case corresponds to if the new memory 397 00:23:16,620 --> 00:23:21,590 access is not in the processor cache corresponding to pj, 398 00:23:21,590 --> 00:23:26,640 then what you could do is use the optimum value, 399 00:23:26,640 --> 00:23:30,910 where you ended at pj, and simply do a remote access that 400 00:23:30,910 --> 00:23:35,020 corresponds to accessing mk plus 1. 401 00:23:35,020 --> 00:23:36,560 So that's one case. 402 00:23:36,560 --> 00:23:48,410 The other case is to use the minimum solution-- optimum solution 403 00:23:48,410 --> 00:23:56,110 corresponding to ending at pi and do a migration. 404 00:23:56,110 --> 00:24:01,460 You have cost of migration from pi to pj. 405 00:24:01,460 --> 00:24:11,330 And you do this if you want to go do p of mk 406 00:24:11,330 --> 00:24:14,810 plus 1-- the processor corresponding to p 407 00:24:14,810 --> 00:24:16,900 of mk plus 1. 408 00:24:16,900 --> 00:24:21,120 So that's the set up for this dynamic program. 409 00:24:21,120 --> 00:24:24,590 What you've done is created a subproblem, its optimum, 410 00:24:24,590 --> 00:24:27,280 and then you look at the two cases. 411 00:24:27,280 --> 00:24:30,990 You want to go migrate and do a local access-- that's 412 00:24:30,990 --> 00:24:32,870 this case over here. 413 00:24:32,870 --> 00:24:35,780 Migrate to the processor and do a local access there. 414 00:24:35,780 --> 00:24:36,910 That will be this case. 415 00:24:36,910 --> 00:24:40,185 And in this case, you stay where you are and do a remote access. 416 00:24:43,480 --> 00:24:52,360 In the case of migration, you could end up choosing different 417 00:24:52,360 --> 00:24:55,660 initial starting points corresponding to the pi's. 418 00:24:55,660 --> 00:24:58,880 And you have to run through all of those. 419 00:24:58,880 --> 00:25:05,170 So what's the cost of a subproblem, or the running time 420 00:25:05,170 --> 00:25:10,630 of computing one of these things-- it's order? 421 00:25:10,630 --> 00:25:19,050 Q. And so the total cost is NQ squared. 422 00:25:22,810 --> 00:25:27,270 It's a little review of DP. 423 00:25:27,270 --> 00:25:31,080 I'm going to stop here and let Erik take over.
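A minimal Python sketch of the dynamic program just described, for readers following along. The distance function, the context-size constant, and the example access sequence below are illustrative assumptions, not the actual hardware parameters or implementation.

```python
# Sketch of the data-access vs. execution-migration DP (assumed parameters).

CONTEXT_OVERHEAD = 8  # assumed constant, proportional to the thread context size


def dist(s, d):
    # Placeholder distance between processors s and d (e.g., hops on the mesh).
    return abs(s - d)


def cost_migration(s, d):
    # One-way trip carrying the context; free if we are already there.
    return 0 if s == d else dist(s, d) + CONTEXT_OVERHEAD


def cost_access(s, d):
    # Round trip for a single word: request there, data back; free if local.
    return 0 if s == d else 2 * dist(s, d)


def min_total_cost(accesses, start, Q):
    """accesses[k] = p(m_{k+1}), the processor whose cache holds access m_{k+1}.
    dp[p] = DP(k, p): optimal cost of the first k accesses, ending at processor p."""
    INF = float("inf")
    dp = [INF] * Q
    dp[start] = 0  # DP(0, start) = 0: the program begins at its home processor.
    for m in accesses:
        new = [INF] * Q
        for pj in range(Q):
            # Case 1: stay at pj and access m remotely (or locally if pj == m).
            best = dp[pj] + cost_access(pj, m)
            # Case 2: migrate from some pi to pj = p(m) and do a local access.
            if pj == m:
                best = min(best, min(dp[pi] + cost_migration(pi, pj)
                                     for pi in range(Q)))
            new[pj] = best
        dp = new  # N*Q subproblems, O(Q) each in the worst case: O(N*Q^2) total.
    return min(dp)


# Example resembling the board example (0-indexed processors: p1=0, p2=1, p3=2):
print(min_total_cost([0, 1, 1, 0, 0, 2, 1], start=0, Q=3))
```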
424 00:25:31,080 --> 00:25:36,530 Just, in closing, while this makes some assumptions, 425 00:25:36,530 --> 00:25:39,280 It's actually fairly close to what 426 00:25:39,280 --> 00:25:40,790 we're building in hardware. 427 00:25:40,790 --> 00:25:42,570 This type of analysis is something 428 00:25:42,570 --> 00:25:44,190 that we have to do in hardware. 429 00:25:44,190 --> 00:25:48,200 My research group is building a 128 processor machine, 430 00:25:48,200 --> 00:25:50,470 that we call the Execution Migration Machine. 431 00:25:50,470 --> 00:25:52,960 And it does exactly what I've described to you, 432 00:25:52,960 --> 00:25:54,690 decide whether to do a remote access 433 00:25:54,690 --> 00:25:59,650 or to do a migration based on this kind of analysis. 434 00:25:59,650 --> 00:26:02,557 So hand it over to Erik. 435 00:26:02,557 --> 00:26:04,390 PROFESSOR ERIK DEMAINE: I have a microphone. 436 00:26:04,390 --> 00:26:04,570 PROFESSOR SRINI DEVADAS: All right. 437 00:26:04,570 --> 00:26:05,070 Good. 438 00:26:08,804 --> 00:26:10,720 PROFESSOR ERIK DEMAINE: So I have a few things 439 00:26:10,720 --> 00:26:12,590 to tell you a little bit about. 440 00:26:12,590 --> 00:26:14,510 Srini talked about one topic in detail. 441 00:26:14,510 --> 00:26:17,040 I'm going to talk about many topics in less detail, 442 00:26:17,040 --> 00:26:19,660 as I said "shallowly." 443 00:26:19,660 --> 00:26:22,370 And these are my main areas of research. 444 00:26:22,370 --> 00:26:26,770 I do geometry, in particular, folding, and data structures, 445 00:26:26,770 --> 00:26:29,470 graphs, and recreational algorithms. 446 00:26:29,470 --> 00:26:32,910 That's the really fun stuff. 447 00:26:32,910 --> 00:26:35,040 A lot of these have corresponding courses 448 00:26:35,040 --> 00:26:36,980 if you're interested in more about this stuff. 449 00:26:36,980 --> 00:26:39,220 Computational geometry, in general, 450 00:26:39,220 --> 00:26:43,180 is-- I'm not going to remember all numbers. 451 00:26:43,180 --> 00:26:44,390 840? 452 00:26:44,390 --> 00:26:46,890 50? 453 00:26:46,890 --> 00:26:49,660 50. 454 00:26:49,660 --> 00:26:51,160 6.850. 455 00:26:51,160 --> 00:26:53,450 That's a class I don't teach. 456 00:26:53,450 --> 00:26:57,560 Folding is 6.849. 457 00:26:57,560 --> 00:27:01,610 Data Structures is 6.851. 458 00:27:01,610 --> 00:27:06,070 And Graphs was being taught this semester, 459 00:27:06,070 --> 00:27:07,350 in parallel with this class. 460 00:27:07,350 --> 00:27:08,770 6.889. 461 00:27:08,770 --> 00:27:12,260 And recreational algorithms isn't fully covered 462 00:27:12,260 --> 00:27:15,321 but you could check out SP.268, which 463 00:27:15,321 --> 00:27:17,010 was offered last semester. 464 00:27:17,010 --> 00:27:19,030 And especially for those watching at home 465 00:27:19,030 --> 00:27:22,155 on MIT OpenCourseWare-- this class, 466 00:27:22,155 --> 00:27:25,230 all the video lectures are online for free. 467 00:27:25,230 --> 00:27:26,940 6.851, we'll do that next semester. 468 00:27:26,940 --> 00:27:30,050 And 6.889 are all online, right now. 469 00:27:30,050 --> 00:27:34,190 And there's some lecture notes for SP.268 on OpenCourseWare. 470 00:27:34,190 --> 00:27:35,800 There's a lot of material, here. 471 00:27:35,800 --> 00:27:38,930 And in particular, the obvious next class for you to be taking 472 00:27:38,930 --> 00:27:40,820 is 6.046. 473 00:27:40,820 --> 00:27:42,654 But why should you be taking 6.046? 
474 00:27:42,654 --> 00:27:44,820 Because then you can take all these exciting classes 475 00:27:44,820 --> 00:27:46,770 and many others about algorithms. 476 00:27:46,770 --> 00:27:48,670 There's a complete list of follow-on classes 477 00:27:48,670 --> 00:27:51,930 in the lecture notes, which are online. 478 00:27:51,930 --> 00:27:55,020 And there's a ton of-- there's so much research in algorithms. 479 00:27:55,020 --> 00:27:56,510 It's a really exciting area. 480 00:27:56,510 --> 00:27:58,990 This is just the beginning-- just a taste. 481 00:27:58,990 --> 00:28:02,750 And I want to show you various exciting places it can go. 482 00:28:08,670 --> 00:28:10,750 Let's do some algorithms. 483 00:28:30,680 --> 00:28:33,790 So the first topic I'll tell you a little bit 484 00:28:33,790 --> 00:28:38,140 about-- maybe the most fun-- is geometric folding algorithms. 485 00:28:41,920 --> 00:28:45,480 That's the title of the textbook and the class 6.849. 486 00:28:45,480 --> 00:28:49,107 And in general-- well, there's a lot 487 00:28:49,107 --> 00:28:50,940 of different kinds of folding, in the world, 488 00:28:50,940 --> 00:28:54,290 but maybe the most accessible and fun is origami. 489 00:28:54,290 --> 00:28:57,870 So you have, on the one hand, a piece of paper. 490 00:28:57,870 --> 00:29:01,010 And you'd like to turn it into some crazy, three dimensional 491 00:29:01,010 --> 00:29:03,950 shape, which I'm not going to try to draw here. 492 00:29:03,950 --> 00:29:05,900 You want to fold a giraffe or you 493 00:29:05,900 --> 00:29:08,230 want to make some geometric sculpture. 494 00:29:08,230 --> 00:29:09,380 How do you do this? 495 00:29:09,380 --> 00:29:13,170 So, usually, you put some creases into the piece of paper 496 00:29:13,170 --> 00:29:14,680 in some reasonable way. 497 00:29:17,330 --> 00:29:19,000 And one of the questions is what are 498 00:29:19,000 --> 00:29:21,790 the rules for putting creases into a piece of paper? 499 00:29:21,790 --> 00:29:23,090 When is that possible? 500 00:29:23,090 --> 00:29:26,690 And then you'd like to fold it into that shape. 501 00:29:26,690 --> 00:29:28,580 So there are really two big problems here. 502 00:29:28,580 --> 00:29:32,530 One is what I guess you could call foldability. 503 00:29:35,550 --> 00:29:38,100 And this is what you do if you practice origami 504 00:29:38,100 --> 00:29:39,700 in the typical way. 505 00:29:39,700 --> 00:29:42,180 You get origami diagrams, and they say, "fold this." 506 00:29:42,180 --> 00:29:43,530 And you're like, oh, gosh. 507 00:29:43,530 --> 00:29:45,696 Takes you hours to figure out how to fold something. 508 00:29:45,696 --> 00:29:48,302 Especially, if they just gave you a crease pattern. 509 00:29:48,302 --> 00:29:50,760 Can you even tell whether it folds into anything, first of all? 510 00:29:50,760 --> 00:29:53,180 And then, if so, how do I do it? 511 00:29:53,180 --> 00:30:05,440 That problem-- folding a crease pattern and understanding 512 00:30:05,440 --> 00:30:10,130 what crease patterns are valid-- unfortunately, is NP-complete. 513 00:30:10,130 --> 00:30:13,260 So there's no good way to really understand that. 514 00:30:13,260 --> 00:30:16,214 So origami is hard. 515 00:30:16,214 --> 00:30:18,130 In some sense, the more interesting direction, 516 00:30:18,130 --> 00:30:19,780 though, is the reverse direction, 517 00:30:19,780 --> 00:30:22,360 which I would call origami design. 518 00:30:22,360 --> 00:30:26,300 I have an intended 3D shape I want to design.
519 00:30:26,300 --> 00:30:30,410 How can I come up with-- how can I, as an algorithm, convert 520 00:30:30,410 --> 00:30:33,320 that 3D shape into a crease pattern that does fold, 521 00:30:33,320 --> 00:30:36,260 that's guaranteed to fold into that 3D shape. 522 00:30:36,260 --> 00:30:38,790 And that's actually solvable. 523 00:30:38,790 --> 00:30:39,790 So design is easier. 524 00:30:42,379 --> 00:30:44,170 And there's all sorts of different versions 525 00:30:44,170 --> 00:30:46,300 of the design problem. 526 00:30:46,300 --> 00:30:48,540 Some of them, you could solve in polynomial time. 527 00:30:48,540 --> 00:30:49,570 Some of them, you can't. 528 00:30:49,570 --> 00:30:51,280 If you really want optimal design, 529 00:30:51,280 --> 00:30:53,040 that can be NP-complete again. 530 00:30:53,040 --> 00:30:59,390 But in particular, there's a way to fold any 3D shape you want. 531 00:30:59,390 --> 00:31:01,660 So there's an algorithm-- the coolest one, right now, 532 00:31:01,660 --> 00:31:03,010 is called Origamizer. 533 00:31:03,010 --> 00:31:06,420 It's free software online, by Tomohiro Tachi. 534 00:31:06,420 --> 00:31:09,940 And you give it a 3D model of a polyhedron. 535 00:31:09,940 --> 00:31:13,070 And it outputs a giant crease pattern 536 00:31:13,070 --> 00:31:16,090 on a square piece of paper that folds into that 3D polyhedron. 537 00:31:16,090 --> 00:31:18,710 And it's reasonably practical. 538 00:31:18,710 --> 00:31:21,210 And he's folded tons of models in that way. 539 00:31:23,810 --> 00:31:27,022 Let's see. 540 00:31:27,022 --> 00:31:28,980 I'll show you some other things. 541 00:31:28,980 --> 00:31:32,940 Here's a simple example of a geometric origami model. 542 00:31:32,940 --> 00:31:37,420 So this is folded from a square paper with concentric squares 543 00:31:37,420 --> 00:31:38,610 as creases. 544 00:31:38,610 --> 00:31:40,357 Alternating mountain and valley. 545 00:31:40,357 --> 00:31:42,190 So you see mountain valley, mountain valley. 546 00:31:42,190 --> 00:31:43,470 Also fold the diagonals. 547 00:31:43,470 --> 00:31:45,090 It's very easy to make. 548 00:31:45,090 --> 00:31:46,860 And what's funny-- what's cool about it 549 00:31:46,860 --> 00:31:48,850 is that when you put all those creases in, 550 00:31:48,850 --> 00:31:52,410 it pops into this 3D shape, which for many years people 551 00:31:52,410 --> 00:31:54,220 conjectured was a hyperbolic parabola. 552 00:31:54,220 --> 00:31:56,420 This design is one of the earliest geometric origami 553 00:31:56,420 --> 00:31:56,920 designs. 554 00:31:56,920 --> 00:32:01,920 It goes back to late '20s in the Bauhaus School of Design. 555 00:32:01,920 --> 00:32:03,240 And it's very cool. 556 00:32:03,240 --> 00:32:04,890 People fold them a lot. 557 00:32:04,890 --> 00:32:09,600 I've personally folded thousands of them 558 00:32:09,600 --> 00:32:10,740 for sculpture and things. 559 00:32:10,740 --> 00:32:13,400 We also do a lot of algorithmic sculpture, which 560 00:32:13,400 --> 00:32:15,590 I won't talk about in detail here. 561 00:32:15,590 --> 00:32:20,770 But we discovered, two years ago, that this does not exist. 562 00:32:20,770 --> 00:32:23,320 It is impossible to fold a square piece of paper 563 00:32:23,320 --> 00:32:25,520 with this crease pattern. 564 00:32:25,520 --> 00:32:27,160 That was a bit of a surprise. 565 00:32:27,160 --> 00:32:29,650 And it's kind of fun to make things that don't exist. 566 00:32:29,650 --> 00:32:30,733 AUDIENCE: So what is that? 
567 00:32:30,733 --> 00:32:33,490 PROFESSOR ERIK DEMAINE: So what is this? 568 00:32:33,490 --> 00:32:39,090 Well, somehow, physical world is differing from the real world. 569 00:32:39,090 --> 00:32:42,550 Now, some ways it might be differing 570 00:32:42,550 --> 00:32:45,750 are that these creases might not be 571 00:32:45,750 --> 00:32:47,140 creases in the technical sense. 572 00:32:47,140 --> 00:32:49,390 A crease is a place that should be non-differentiable. 573 00:32:49,390 --> 00:32:51,120 So maybe they're kind of rounding it out. 574 00:32:51,120 --> 00:32:52,910 And then, who knows what's happening. 575 00:32:52,910 --> 00:32:54,300 Then, kind of all bets are off. 576 00:32:54,300 --> 00:32:56,300 Another possibility of what I think is happening 577 00:32:56,300 --> 00:32:58,859 is that their are extra creases, in here, that you don't see. 578 00:32:58,859 --> 00:32:59,650 They're very small. 579 00:32:59,650 --> 00:33:04,420 If you look, especially the raw edge, here, and that profile. 580 00:33:04,420 --> 00:33:05,580 It's a little bit wavy. 581 00:33:05,580 --> 00:33:07,700 And it's conceivable there's some points here 582 00:33:07,700 --> 00:33:09,680 that look non-differentiable to me. 583 00:33:09,680 --> 00:33:12,120 And I always thought I wasn't folding it well enough. 584 00:33:12,120 --> 00:33:15,180 But in fact, something like that has to happen. 585 00:33:15,180 --> 00:33:16,970 And my conjecture is, if you look 586 00:33:16,970 --> 00:33:19,340 at this under a microscope, which we haven't done yet, 587 00:33:19,340 --> 00:33:21,580 there are little creases that are so shallow they're 588 00:33:21,580 --> 00:33:23,900 hard to see, but are there. 589 00:33:23,900 --> 00:33:26,662 And the theorem says some creases have to be there. 590 00:33:26,662 --> 00:33:28,620 It is possible to fold this with extra creases, 591 00:33:28,620 --> 00:33:31,740 but not with these. 592 00:33:31,740 --> 00:33:34,599 So get rid of that. 593 00:33:34,599 --> 00:33:36,390 On the other hand, if you do the same thing 594 00:33:36,390 --> 00:33:38,770 with concentric circular creases-- this a little harder 595 00:33:38,770 --> 00:33:39,400 to unfold. 596 00:33:39,400 --> 00:33:43,150 It really wants to be in this kind of Pringles shape. 597 00:33:43,150 --> 00:33:45,690 This also is from about Bauhaus. 598 00:33:45,690 --> 00:33:47,690 It's a little harder to fold concentric circles. 599 00:33:47,690 --> 00:33:50,700 But this, we think, does exist. 600 00:33:50,700 --> 00:33:52,820 Can't prove it yet. 601 00:33:52,820 --> 00:33:56,930 So we've done a lot of sculpture based on these guys. 602 00:33:56,930 --> 00:34:00,080 What else do I want to say? 603 00:34:00,080 --> 00:34:01,310 Another demo. 604 00:34:01,310 --> 00:34:02,480 So here's a fun problem. 605 00:34:02,480 --> 00:34:03,460 This is a magic trick. 606 00:34:03,460 --> 00:34:06,980 Goes back to Houdini and others. 607 00:34:06,980 --> 00:34:13,010 So imagine I take a rectangle of paper and then I fold it flat 608 00:34:13,010 --> 00:34:16,150 and take my scissors-- not strict origami, here-- 609 00:34:16,150 --> 00:34:18,120 and I make one complete straight cut. 610 00:34:21,940 --> 00:34:24,310 In this case, I get two pieces. 611 00:34:24,310 --> 00:34:25,560 And I unfold the pieces. 612 00:34:25,560 --> 00:34:28,630 And the question is what shapes can I get out of those pieces? 613 00:34:28,630 --> 00:34:31,409 In this case, I get a swan. 614 00:34:31,409 --> 00:34:35,199 You're not impressed so I'll another one. 
615 00:34:35,199 --> 00:34:36,881 Make one straight cut. 616 00:34:36,881 --> 00:34:38,380 These are on my web page if you want 617 00:34:38,380 --> 00:34:39,546 to impress all your friends. 618 00:34:42,281 --> 00:34:44,739 You could take the class if you want to know how it's done. 619 00:34:47,642 --> 00:34:49,100 This example has a lot of symmetry. 620 00:34:49,100 --> 00:34:52,008 You get a little angelfish. 621 00:34:52,008 --> 00:34:53,420 I only have one more example. 622 00:34:53,420 --> 00:34:55,030 I hope you'll be impressed. 623 00:34:55,030 --> 00:34:58,600 This is very hard to fold. 624 00:34:58,600 --> 00:35:02,390 It was an MIT spotlight picture, at some point. 625 00:35:02,390 --> 00:35:03,570 And it's even harder to cut. 626 00:35:06,350 --> 00:35:07,170 Straight cut. 627 00:35:14,290 --> 00:35:19,876 This should be the MIT logo. 628 00:35:19,876 --> 00:35:25,200 [APPLAUSE] 629 00:35:25,200 --> 00:35:27,470 So the theorem is there's an algorithm, given 630 00:35:27,470 --> 00:35:29,417 any set of polygons in the plane, 631 00:35:29,417 --> 00:35:31,000 you could fold, make one straight cut, 632 00:35:31,000 --> 00:35:32,530 and get exactly those polygons. 633 00:35:32,530 --> 00:35:33,600 There's some limits, in practice, 634 00:35:33,600 --> 00:35:34,724 because of paper thickness. 635 00:35:34,724 --> 00:35:38,240 But in theory, you can do everything. 636 00:35:38,240 --> 00:35:39,250 All right. 637 00:35:39,250 --> 00:35:39,820 Fun stuff. 638 00:35:44,580 --> 00:35:47,020 I don't think I have time to talk about self-assembly. 639 00:35:47,020 --> 00:35:49,311 Let me talk a little bit about data structures because, 640 00:35:49,311 --> 00:35:52,650 conveniently, Srini drew this diagram for me. 641 00:35:52,650 --> 00:35:56,140 And I have the exact same diagram-- the left one, though. 642 00:35:56,140 --> 00:35:56,890 I'm old fashioned. 643 00:35:59,560 --> 00:36:03,790 So the models of computation we've used, in this class, 644 00:36:03,790 --> 00:36:04,770 are pretty simple. 645 00:36:04,770 --> 00:36:06,600 We have, in particular, the Word RAM. 646 00:36:06,600 --> 00:36:07,550 You can read a word. 647 00:36:07,550 --> 00:36:08,887 You can add two words. 648 00:36:08,887 --> 00:36:10,970 Do whatever you want with a constant number words. 649 00:36:10,970 --> 00:36:12,810 Send them out to main memory. 650 00:36:12,810 --> 00:36:15,170 Everything's the same amount of time. 651 00:36:15,170 --> 00:36:18,080 It's all constant, anyway, so who cares? 652 00:36:18,080 --> 00:36:21,140 Except there's this issue in real computers, 653 00:36:21,140 --> 00:36:23,140 and it gets even worse with parallel, but let's 654 00:36:23,140 --> 00:36:29,790 stick to sequential old fashioned computers. 655 00:36:29,790 --> 00:36:33,950 So you have this slow bottleneck between main memory and cache. 656 00:36:33,950 --> 00:36:35,030 Cache is really fast. 657 00:36:35,030 --> 00:36:37,340 Think of this as a really fat pipe. 658 00:36:37,340 --> 00:36:40,047 And this is a very thin pipe. 659 00:36:40,047 --> 00:36:40,630 What do we do? 660 00:36:40,630 --> 00:36:42,610 We'd like to always work with things in cache, 661 00:36:42,610 --> 00:36:44,850 but that's kind of difficult. 662 00:36:44,850 --> 00:36:46,390 At some point, you run out of space. 663 00:36:46,390 --> 00:36:47,723 You've got to go to main memory. 664 00:36:47,723 --> 00:36:51,200 And maybe to disc, other levels of the memory hierarchy. 
665 00:36:51,200 --> 00:36:54,750 So what systems do is, when you fetch something from memory, 666 00:36:54,750 --> 00:36:59,260 you don't just get one word, you get an entire cache line. 667 00:36:59,260 --> 00:37:01,570 And cache lines are getting bigger and bigger. 668 00:37:01,570 --> 00:37:09,140 But memory transfers happen in blocks, 669 00:37:09,140 --> 00:37:10,840 when you're going to a big memory. 670 00:37:19,600 --> 00:37:22,460 So let's say B is the size of a block. 671 00:37:22,460 --> 00:37:24,600 There is another model of computation 672 00:37:24,600 --> 00:37:26,770 that's more sophisticated than the Word RAM that 673 00:37:26,770 --> 00:37:31,490 says how should my running time depend on B. How many memory 674 00:37:31,490 --> 00:37:35,466 transfers do I need to do, as a function of B and n? 675 00:37:35,466 --> 00:37:40,047 And so for example, if you want to do search-- normally, 676 00:37:40,047 --> 00:37:41,380 we think of doing binary search. 677 00:37:41,380 --> 00:37:44,720 That takes log(n) accesses if everything is uniform. 678 00:37:44,720 --> 00:37:46,340 But with asymmetry, and if you're 679 00:37:46,340 --> 00:37:49,340 reading in entire blocks, if you do it right, 680 00:37:49,340 --> 00:37:56,190 you can do it in log base B of n, instead of log base 2. 681 00:37:56,190 --> 00:37:58,490 This is counting memory transfers, not computation. 682 00:37:58,490 --> 00:38:01,160 Computation here is free. 683 00:38:01,160 --> 00:38:03,720 It's a little weird, but you get used to it. 684 00:38:03,720 --> 00:38:05,770 Sorting. 685 00:38:05,770 --> 00:38:06,800 They're classic. 686 00:38:06,800 --> 00:38:10,230 Just to give you an idea of how this gets a little complicated. 687 00:38:10,230 --> 00:38:15,710 You get n divided by B times log base C of n divided by B. C 688 00:38:15,710 --> 00:38:20,610 is the number of blocks that fit in here. 689 00:38:20,610 --> 00:38:24,970 So there's C different blocks that fit in your cache. 690 00:38:24,970 --> 00:38:26,780 That's the optimal way to sort. 691 00:38:26,780 --> 00:38:29,687 Just upper and lower bounds in the comparison model. 692 00:38:29,687 --> 00:38:30,770 Just to give you a flavor. 693 00:38:30,770 --> 00:38:33,390 And there's a whole study of algorithms to do this. 694 00:38:33,390 --> 00:38:35,850 What's really cool is you can achieve these bounds 695 00:38:35,850 --> 00:38:37,930 even if you don't know what B is. 696 00:38:37,930 --> 00:38:39,360 And if you don't know what C is. 697 00:38:39,360 --> 00:38:42,140 There's one algorithm, that whatever the architecture is 698 00:38:42,140 --> 00:38:44,607 underlying it, we'll still achieve the same bounds. 699 00:38:44,607 --> 00:38:46,440 Those are called cache-oblivious algorithms, 700 00:38:46,440 --> 00:38:48,580 and they were invented, here, at MIT. 701 00:38:53,065 --> 00:38:58,100 I think I want to-- this is too much fun to pass up. 702 00:38:58,100 --> 00:39:02,740 On the Word RAM, there's this problem, 703 00:39:02,740 --> 00:39:04,570 which we've dealt with several times. 704 00:39:04,570 --> 00:39:08,920 What if you want to maintain a dynamic set of elements-- 705 00:39:08,920 --> 00:39:09,820 integers. 706 00:39:09,820 --> 00:39:13,870 I want to do insert, delete, predecessor, successor. 707 00:39:13,870 --> 00:39:16,810 This is what binary search trees do. 708 00:39:16,810 --> 00:39:19,090 But you can do better. 709 00:39:19,090 --> 00:39:26,420 If we have integers-- n integers-- in the range 710 00:39:26,420 --> 00:39:28,760 0 to u minus 1. 
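To give a rough sense of the scale of these savings, here is a tiny back-of-the-envelope calculation; the values of n, B, and C below are made-up, illustrative numbers, not parameters of any particular machine.

```python
import math

# Illustrative, assumed sizes just to compare the bounds above.
n = 10**9   # items
B = 1024    # words per block (one memory transfer)
C = 4096    # blocks that fit in cache

# Search: log_2(n) memory transfers for plain binary search,
# versus log_B(n) if every fetched block is used fully (B-tree style).
print(math.log2(n))       # ~30 transfers
print(math.log(n, B))     # ~3 transfers

# Sorting: the optimal (n/B) * log_C(n/B) memory transfers,
# far fewer than the ~n*log_2(n) word operations a flat RAM analysis counts.
print((n / B) * math.log(n / B, C))   # ~1.6 million transfers
print(n * math.log2(n))               # ~3e10 operations
```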
711 00:39:28,760 --> 00:39:31,830 So u is the size of the universe. 712 00:39:31,830 --> 00:39:37,840 Then, we already know how to do log(n). 713 00:39:37,840 --> 00:39:40,530 But you can do two bounds. 714 00:39:40,530 --> 00:39:41,435 One is log(log(u)). 715 00:39:44,807 --> 00:39:46,640 This is a data structure called van Emde Boas trees. 716 00:39:50,700 --> 00:39:52,780 And it's in CLRS, if you're interested. 717 00:39:52,780 --> 00:40:01,025 You can also do log(n) divided by log(log(u)). 718 00:40:01,025 --> 00:40:02,900 This is a data structure called fusion trees. 719 00:40:02,900 --> 00:40:04,233 It's an advanced data structure. 720 00:40:04,233 --> 00:40:08,030 6.851, if you're interested. 721 00:40:08,030 --> 00:40:09,850 And you can take the min of those two. 722 00:40:09,850 --> 00:40:12,050 That's, essentially, the best possible, 723 00:40:12,050 --> 00:40:13,800 the matching lower bound-- that's 724 00:40:13,800 --> 00:40:14,633 all you can achieve. 725 00:40:17,190 --> 00:40:20,528 And so just to state it in terms that you know, 726 00:40:20,528 --> 00:40:23,390 which is normal bounds in terms of n. 727 00:40:23,390 --> 00:40:25,560 You take the min of those two things, 728 00:40:25,560 --> 00:40:31,955 it is always at most the square root of log(n) divided 729 00:40:31,955 --> 00:40:32,580 by log(log(n)). 730 00:40:37,680 --> 00:40:39,520 Compare that with log(n). 731 00:40:39,520 --> 00:40:41,546 It's way better. 732 00:40:41,546 --> 00:40:42,670 A whole square root better. 733 00:40:42,670 --> 00:40:44,044 And a little tiny savings better. 734 00:40:44,044 --> 00:40:45,220 And this is optimal, 735 00:40:45,220 --> 00:40:46,200 as a function of n. 736 00:40:46,200 --> 00:40:48,870 That's the best you can do for the predecessor problem. 737 00:40:48,870 --> 00:40:50,000 So pretty crazy stuff. 738 00:40:50,000 --> 00:40:51,530 It's a very complicated structure. 739 00:40:51,530 --> 00:40:53,960 It's probably completely impractical. 740 00:40:53,960 --> 00:40:55,240 But, hey. 741 00:40:55,240 --> 00:40:57,770 They're, theoretically, pretty cool. 742 00:40:57,770 --> 00:41:00,110 I'll tell you a little bit about graph algorithms. 743 00:41:21,070 --> 00:41:23,580 We've seen a lot of graph algorithms in this class. 744 00:41:23,580 --> 00:41:27,980 One way to make them new and fun again is to suppose your graph 745 00:41:27,980 --> 00:41:29,252 is planar or almost planar. 746 00:41:29,252 --> 00:41:30,960 Meaning you can draw it in two dimensions 747 00:41:30,960 --> 00:41:34,280 without any crossings, as you might get from a graph that's 748 00:41:34,280 --> 00:41:38,330 drawn on the earth, like a road network 749 00:41:38,330 --> 00:41:41,100 or something with no or few overpasses. 750 00:41:41,100 --> 00:41:42,600 Then you can do things a lot better. 751 00:41:42,600 --> 00:41:44,820 For example, you can do the equivalent 752 00:41:44,820 --> 00:41:46,360 of Dijkstra's algorithm. 753 00:41:46,360 --> 00:41:49,860 So non-negative weight shortest path, in linear time. 754 00:41:52,700 --> 00:41:56,627 That's not so impressive because Dijkstra is number of edges. 755 00:41:56,627 --> 00:41:57,960 Here, I mean number of vertices. 756 00:41:57,960 --> 00:42:00,530 It doesn't really matter with planar graphs. 757 00:42:00,530 --> 00:42:02,540 And we had E log(V). 758 00:42:02,540 --> 00:42:04,780 You can write E, here, if you prefer. 759 00:42:04,780 --> 00:42:06,060 It's only a log savings.
760 00:42:06,060 --> 00:42:08,655 More impressive is that you can handle negative weights-- 761 00:42:08,655 --> 00:42:14,840 the equivalent of Bellman-Ford-- in almost linear time. 762 00:42:18,700 --> 00:42:20,740 So some log factors. 763 00:42:20,740 --> 00:42:22,480 Log squared n divided by log(log(n)). 764 00:42:22,480 --> 00:42:23,980 It's the best bound known to date. 765 00:42:23,980 --> 00:42:25,360 That was a result from last year. 766 00:42:25,360 --> 00:42:27,560 So it's still a work in progress. 767 00:42:27,560 --> 00:42:29,610 And if you're interested in this kind of stuff, 768 00:42:29,610 --> 00:42:33,330 you should check out the videos for the class we just taught, 769 00:42:33,330 --> 00:42:35,610 6.889. 770 00:42:35,610 --> 00:42:36,720 And recreational algorithms. 771 00:42:36,720 --> 00:42:38,470 I've actually already told you about a lot 772 00:42:38,470 --> 00:42:42,280 of these-- like algorithms for solving a Rubik's cube in n 773 00:42:42,280 --> 00:42:43,640 squared divided by log(n) steps. 774 00:42:43,640 --> 00:42:45,410 That was a paper this year. 775 00:42:45,410 --> 00:42:46,700 Tetris is NP-complete. 776 00:42:46,700 --> 00:42:49,440 A whole bunch of NP-completeness, and EXPTIME- 777 00:42:49,440 --> 00:42:51,130 completeness, and so on. 778 00:42:51,130 --> 00:42:52,860 Results for games. 779 00:42:52,860 --> 00:42:55,800 Other fun stuff, like balloon twisting-- algorithms 780 00:42:55,800 --> 00:42:59,060 for designing how to balloon twist a given polyhedron, 781 00:42:59,060 --> 00:43:01,480 optimally, using the fewest balloons. 782 00:43:01,480 --> 00:43:02,780 Algorithmic magic tricks. 783 00:43:02,780 --> 00:43:04,190 There's tons of stuff out there. 784 00:43:04,190 --> 00:43:05,067 It's really fun. 785 00:43:05,067 --> 00:43:07,150 I should teach a class about some of those things, 786 00:43:07,150 --> 00:43:07,900 but I haven't yet. 787 00:43:10,510 --> 00:43:13,670 The last thing we wanted to do is together. 788 00:43:13,670 --> 00:43:16,132 And it has to do with these-- 789 00:43:16,132 --> 00:43:18,090 PROFESSOR SRINI DEVADAS: Getting rid of these-- 790 00:43:18,090 --> 00:43:18,430 PROFESSOR ERIK DEMAINE: These cushions. 791 00:43:18,430 --> 00:43:20,400 Getting rid of these damn cushions. 792 00:43:20,400 --> 00:43:22,675 We have so many of these cushions. 793 00:43:22,675 --> 00:43:24,100 Just gotta get rid of them. 794 00:43:27,350 --> 00:43:28,550 That's two freebies. 795 00:43:28,550 --> 00:43:30,050 PROFESSOR SRINI DEVADAS: Now, you're 796 00:43:30,050 --> 00:43:31,949 going to have to pay for these cushions. 797 00:43:31,949 --> 00:43:33,490 PROFESSOR ERIK DEMAINE: He's kidding. 798 00:43:33,490 --> 00:43:33,860 He's kidding. 799 00:43:33,860 --> 00:43:35,111 Actually we're having trouble. 800 00:43:35,111 --> 00:43:36,651 We're having trouble giving them away 801 00:43:36,651 --> 00:43:38,400 because-- I don't know-- some people seem 802 00:43:38,400 --> 00:43:39,980 to not like them very much. 803 00:43:39,980 --> 00:43:40,800 And neither do we. 804 00:43:40,800 --> 00:43:46,700 So we wanted to give you some motivation for why you really 805 00:43:46,700 --> 00:43:49,250 need some of these cushions. 806 00:43:49,250 --> 00:43:51,439 So we actually prepared a top 10 list. 807 00:43:51,439 --> 00:43:53,230 PROFESSOR SRINI DEVADAS: This is the top 10 808 00:43:53,230 --> 00:43:57,710 uses of 6.006 cushions. 809 00:43:57,710 --> 00:43:59,130 We're going to alternate here. 810 00:43:59,130 --> 00:44:00,410 Number 10.
811 00:44:00,410 --> 00:44:02,240 PROFESSOR ERIK DEMAINE: You can sit on it 812 00:44:02,240 --> 00:44:06,159 and get guaranteed inspiration in constant time. 813 00:44:06,159 --> 00:44:07,700 PROFESSOR SRINI DEVADAS: Don't forget 814 00:44:07,700 --> 00:44:09,234 to bring one for the final exam. 815 00:44:09,234 --> 00:44:11,150 PROFESSOR ERIK DEMAINE: Highly recommend it. 816 00:44:11,150 --> 00:44:12,842 Number nine. 817 00:44:12,842 --> 00:44:15,050 PROFESSOR SRINI DEVADAS: You can use it as a Frisbee. 818 00:44:15,050 --> 00:44:18,330 You've seen that before, except you cut it into a circle. 819 00:44:18,330 --> 00:44:19,639 You cut it into a circle. 820 00:44:19,639 --> 00:44:20,680 And it works really well. 821 00:44:23,237 --> 00:44:25,820 PROFESSOR ERIK DEMAINE: We had fun with a band saw, last night. 822 00:44:25,820 --> 00:44:27,495 PROFESSOR SRINI DEVADAS: Number eight. 823 00:44:27,495 --> 00:44:29,120 PROFESSOR ERIK DEMAINE: You can sell it 824 00:44:29,120 --> 00:44:32,272 as a limited edition collectible on eBay. 825 00:44:32,272 --> 00:44:33,730 PROFESSOR SRINI DEVADAS: It's never 826 00:44:33,730 --> 00:44:36,790 ever going to be made, again. 827 00:44:36,790 --> 00:44:39,610 You can make money off this in 5 years-- 10 years. 828 00:44:39,610 --> 00:44:40,370 PROFESSOR ERIK DEMAINE: At least $5. 829 00:44:40,370 --> 00:44:40,911 I don't know. 830 00:44:43,590 --> 00:44:44,310 Number seven. 831 00:44:44,310 --> 00:44:46,010 PROFESSOR SRINI DEVADAS: Number seven. 832 00:44:46,010 --> 00:44:49,740 If you had two of these, you could stick them like this, 833 00:44:49,740 --> 00:44:54,922 and remove the branding, and use it as a regular cushion. 834 00:44:54,922 --> 00:44:56,380 PROFESSOR ERIK DEMAINE: Now, no one 835 00:44:56,380 --> 00:44:58,281 will ever know you took this class. 836 00:44:58,281 --> 00:44:59,030 You just need two. 837 00:45:01,550 --> 00:45:03,050 PROFESSOR SRINI DEVADAS: Number six. 838 00:45:03,050 --> 00:45:04,508 PROFESSOR ERIK DEMAINE: Number six. 839 00:45:04,508 --> 00:45:06,537 It's a holiday conversation starter. 840 00:45:06,537 --> 00:45:08,620 PROFESSOR SRINI DEVADAS: And conversation stopper. 841 00:45:13,210 --> 00:45:14,710 PROFESSOR ERIK DEMAINE: Number five. 842 00:45:14,710 --> 00:45:15,550 PROFESSOR SRINI DEVADAS: Asymptotically 843 00:45:15,550 --> 00:45:17,190 optimal-- we had to use that term-- 844 00:45:17,190 --> 00:45:18,799 acoustic paneling. 845 00:45:18,799 --> 00:45:21,340 PROFESSOR ERIK DEMAINE: That was a suggestion from a student. 846 00:45:21,340 --> 00:45:23,270 You just need a lot of them. 847 00:45:23,270 --> 00:45:26,460 This would be great for piano or guitar fingering practice. 848 00:45:26,460 --> 00:45:29,189 You know, you're doing your DP. 849 00:45:29,189 --> 00:45:30,730 PROFESSOR SRINI DEVADAS: Number four. 850 00:45:30,730 --> 00:45:31,310 PROFESSOR ERIK DEMAINE: Number four. 851 00:45:31,310 --> 00:45:33,960 You can use it as target practice for your next LARP 852 00:45:33,960 --> 00:45:34,460 session. 853 00:45:38,695 --> 00:45:39,195 Whoa. 854 00:45:39,195 --> 00:45:39,695 Misfire. 855 00:45:42,300 --> 00:45:43,120 I'm missing. 856 00:45:43,120 --> 00:45:45,120 PROFESSOR SRINI DEVADAS: You haven't hit me yet. 857 00:45:46,791 --> 00:45:47,290 All right. 858 00:45:47,290 --> 00:45:49,461 Finally, you got one. 859 00:45:49,461 --> 00:45:51,369 [APPLAUSE] 860 00:45:55,279 --> 00:45:56,820 PROFESSOR ERIK DEMAINE: Number three. 861 00:45:56,820 --> 00:45:57,400 PROFESSOR SRINI DEVADAS: All right.
862 00:45:57,400 --> 00:45:59,440 10 years from now, it might be all 863 00:45:59,440 --> 00:46:01,592 you remember about double 0 6. 864 00:46:03,769 --> 00:46:05,310 PROFESSOR ERIK DEMAINE: In truth, you 865 00:46:05,310 --> 00:46:07,352 might also remember this top 10 list. 866 00:46:07,352 --> 00:46:08,810 PROFESSOR SRINI DEVADAS: All right. 867 00:46:08,810 --> 00:46:09,530 Number two. 868 00:46:09,530 --> 00:46:10,988 PROFESSOR ERIK DEMAINE: Number two. 869 00:46:10,988 --> 00:46:13,070 You can use it as your final exam cheat sheet. 870 00:46:13,070 --> 00:46:14,824 This is a new rule. 871 00:46:14,824 --> 00:46:18,080 Instead of 8 and 1/2 by 11, you could 872 00:46:18,080 --> 00:46:21,430 bring in the appropriate number of cushions. 873 00:46:21,430 --> 00:46:26,010 And the number one-- number one use for a double 0 6 cushion. 874 00:46:26,010 --> 00:46:27,650 PROFESSOR SRINI DEVADAS: Three words. 875 00:46:27,650 --> 00:46:29,580 OK Cupid profile picture. 876 00:46:35,220 --> 00:46:37,100 Don't use this cheat sheet. 877 00:46:37,100 --> 00:46:39,449 But come to the final exam and good luck. 878 00:46:39,449 --> 00:46:40,740 PROFESSOR ERIK DEMAINE: Thanks. 879 00:46:40,740 --> 00:46:43,790 [APPLAUSE]