1 00:00:00,120 --> 00:00:02,500 The following content is provided under a Creative 2 00:00:02,500 --> 00:00:03,910 Commons license. 3 00:00:03,910 --> 00:00:06,950 Your support will help MIT OpenCourseWare continue to 4 00:00:06,950 --> 00:00:10,600 offer high quality educational resources for free. 5 00:00:10,600 --> 00:00:13,500 To make a donation or view additional materials from 6 00:00:13,500 --> 00:00:17,430 hundreds of MIT courses visit MIT OpenCourseWare at 7 00:00:17,430 --> 00:00:18,680 ocw.mit.edu. 8 00:00:20,740 --> 00:00:22,650 PROFESSOR: So today we're going to talk about bit 9 00:00:22,650 --> 00:00:32,420 hacking, which is a topic that has a long, long history in 10 00:00:32,420 --> 00:00:33,480 computer science. 11 00:00:33,480 --> 00:00:40,985 We'll only cover a few of the techniques. 12 00:00:43,550 --> 00:00:46,420 So let's just get going. 13 00:00:46,420 --> 00:00:51,920 So I want to swap two integers. 14 00:00:51,920 --> 00:00:54,530 So I think most of you would know how to write a program to 15 00:00:54,530 --> 00:00:55,520 swap two integers. 16 00:00:55,520 --> 00:00:57,540 And it would look something like this. 17 00:00:57,540 --> 00:00:58,960 And mostly this is pseudocode. 18 00:00:58,960 --> 00:01:03,250 I'm not going to be doing declarations of types and 19 00:01:03,250 --> 00:01:06,120 writing full code, in order to make sure things get on slides 20 00:01:06,120 --> 00:01:07,230 and so forth. 21 00:01:07,230 --> 00:01:08,130 So what do you do? 22 00:01:08,130 --> 00:01:12,890 You assign a temporary to the value of x. 23 00:01:12,890 --> 00:01:16,270 You then let x take the value of y. 24 00:01:16,270 --> 00:01:20,480 And then you let y take the value of the temporary. 25 00:01:20,480 --> 00:01:23,110 What could be simpler? 26 00:01:23,110 --> 00:01:25,125 Well how about doing it without a temporary? 27 00:01:27,770 --> 00:01:34,205 So how do you swap two numbers without a temporary? 28 00:01:37,290 --> 00:01:38,540 So here's one way. 
29 00:01:49,980 --> 00:01:51,640 So what's going on there? 30 00:01:51,640 --> 00:01:58,790 So the caret is an XOR, exclusive or, OK? 31 00:01:58,790 --> 00:02:01,040 So here's what's going on. 32 00:02:01,040 --> 00:02:03,300 So let's do an example first. 33 00:02:03,300 --> 00:02:05,530 So I have x and y. 34 00:02:05,530 --> 00:02:11,990 I then let x be the XOR of x and y. 35 00:02:11,990 --> 00:02:17,300 So, as you see, that first bit is the XOR of one and zero. 36 00:02:17,300 --> 00:02:20,660 The second bit is the XOR of zero and zero, which is zero. 37 00:02:20,660 --> 00:02:23,220 The third bit is the XOR of one and one. 38 00:02:23,220 --> 00:02:24,680 That's zero. 39 00:02:24,680 --> 00:02:26,080 And so forth throughout the bits. 40 00:02:29,630 --> 00:02:37,970 So then I let y be the XOR of x and y. 41 00:02:37,970 --> 00:02:43,400 And then, finally, I let x be the XOR of x and y again. 42 00:02:43,400 --> 00:02:45,800 And now, if you notice, that number is the 43 00:02:45,800 --> 00:02:47,370 same as that number. 44 00:02:47,370 --> 00:02:49,255 And that number is the same as that number. 45 00:02:52,580 --> 00:02:54,950 Magic. 46 00:02:54,950 --> 00:02:58,060 We're going to see a lot of magic today actually. 47 00:02:58,060 --> 00:02:58,720 OK? 48 00:02:58,720 --> 00:03:01,940 We're going to see a lot of magic today, no temporary. 49 00:03:01,940 --> 00:03:03,730 Why does this work? 50 00:03:03,730 --> 00:03:06,830 So the reason this works is a great property of XOR, which is that 51 00:03:06,830 --> 00:03:10,560 it's its own inverse. 52 00:03:10,560 --> 00:03:18,060 So if you take x exclusive or y, and you exclusive or that 53 00:03:18,060 --> 00:03:21,020 with y, you get x. 54 00:03:21,020 --> 00:03:25,930 If you were to exclusive or that with x you would get y. 
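The slide with the three assignments is not captured in the transcript. As a minimal sketch of the steps just walked through, they might be written in C as follows; the function name xor_swap and the choice of uint32_t are assumptions for illustration, not taken from the lecture.

```c
#include <stdint.h>

/* Swap *x and *y without a temporary, using three XORs.
 * Assumption: x and y point to distinct objects. If they alias,
 * the first XOR zeroes the shared value and the swap is lost. */
static void xor_swap(uint32_t *x, uint32_t *y)
{
    *x = *x ^ *y;   /* x now holds x0 ^ y0 */
    *y = *x ^ *y;   /* (x0 ^ y0) ^ y0 == x0, so y holds the old x */
    *x = *x ^ *y;   /* (x0 ^ y0) ^ x0 == y0, so x holds the old y */
}
```

Each statement reads the result of the one before it, which is the serial dependency chain that matters for performance.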
55 00:03:25,930 --> 00:03:32,310 So that first step is basically putting in here the 56 00:03:32,310 --> 00:03:37,630 XOR of x and y so that when you end up on the next step 57 00:03:37,630 --> 00:03:44,780 computing the XOR of y and this, you get x. 58 00:03:44,780 --> 00:03:46,080 So now you've got x here. 59 00:03:46,080 --> 00:03:52,520 And you've now got the original x XORed with y here. 60 00:03:52,520 --> 00:03:57,310 So to get back the value of y you just XOR out the x. 61 00:03:57,310 --> 00:04:00,060 So it swaps them. 62 00:04:00,060 --> 00:04:01,310 Whose brain hurts? 63 00:04:06,730 --> 00:04:10,080 You can study these later on. 64 00:04:10,080 --> 00:04:14,290 But a pretty neat trick, pretty neat trick. 65 00:04:14,290 --> 00:04:16,690 Does it perform well? 66 00:04:16,690 --> 00:04:18,940 Turns out not really. 67 00:04:18,940 --> 00:04:21,440 And the other way is actually a better way of doing it, 68 00:04:21,440 --> 00:04:24,710 generally, with most compilers and architectures. 69 00:04:24,710 --> 00:04:27,820 And the reason is because the other way of doing it you can 70 00:04:27,820 --> 00:04:32,300 actually, essentially, pull two things out of memory, one 71 00:04:32,300 --> 00:04:33,540 into a temporary one-- 72 00:04:33,540 --> 00:04:35,400 and then stick them back like this. 73 00:04:35,400 --> 00:04:37,780 That's what the compiler ends up doing. 74 00:04:37,780 --> 00:04:41,610 Whereas this one, it has to wait for each step. 75 00:04:41,610 --> 00:04:43,870 And so you don't get to exploit instruction-level 76 00:04:43,870 --> 00:04:45,120 parallelism. 77 00:04:45,120 --> 00:04:47,850 Remember from last time instruction-level parallelism 78 00:04:47,850 --> 00:04:50,970 is the fact that a processor can issue more than one 79 00:04:50,970 --> 00:04:53,730 instruction at a given step. 
80 00:04:53,730 --> 00:04:57,620 And here, this sequence of operations, each step has to 81 00:04:57,620 --> 00:05:00,090 wait until the previous one is computed 82 00:05:00,090 --> 00:05:01,490 before it can execute. 83 00:05:01,490 --> 00:05:04,550 So you get no instruction parallelism in this. 84 00:05:04,550 --> 00:05:06,760 So it's not particularly high performing. 85 00:05:06,760 --> 00:05:08,750 But there are other places where we'll use 86 00:05:08,750 --> 00:05:10,190 this kind of property. 87 00:05:10,190 --> 00:05:13,720 But it's a neat bit hack, swap two things 88 00:05:13,720 --> 00:05:14,970 without using a temporary. 89 00:05:17,760 --> 00:05:19,440 Now here's a real bit hack. 90 00:05:19,440 --> 00:05:21,410 And a real useful one. 91 00:05:21,410 --> 00:05:25,730 Finding the minimum of two integers, x and y. 92 00:05:25,730 --> 00:05:31,440 Gee whiz, let me just call a subroutine or something. 93 00:05:31,440 --> 00:05:36,390 So you might be tempted to write something like this; if 94 00:05:36,390 --> 00:05:39,870 x is less than y then the result is x. 95 00:05:39,870 --> 00:05:43,250 Otherwise, the result is y. 96 00:05:43,250 --> 00:05:44,600 Seems pretty straightforward. 97 00:05:44,600 --> 00:05:49,740 Or if you know a little bit more C, you can write it with 98 00:05:49,740 --> 00:05:55,610 this sort of cryptic if x is less than y, then x else y. 99 00:05:55,610 --> 00:05:58,830 So those are two equivalent C ways of doing things. 100 00:06:02,700 --> 00:06:03,950 So what's wrong with that? 101 00:06:06,550 --> 00:06:09,840 Well nothing if you don't mind slow code. 102 00:06:09,840 --> 00:06:12,160 In fact, for something like this the compiler actually 103 00:06:12,160 --> 00:06:14,450 will optimize it to deal with it. 104 00:06:14,450 --> 00:06:16,980 But let me just point out a couple things. 105 00:06:16,980 --> 00:06:24,300 First of all, the processor has within it a branch 106 00:06:24,300 --> 00:06:27,020 prediction unit. 
107 00:06:27,020 --> 00:06:30,830 Whenever it comes to a branch it guesses which way the 108 00:06:30,830 --> 00:06:33,980 branch is going to go, and proceeds to speculatively 109 00:06:33,980 --> 00:06:36,520 execute along that path. 110 00:06:36,520 --> 00:06:41,290 If it turns out to be wrong it says, whoa, hold your horses, 111 00:06:41,290 --> 00:06:43,570 got to go that way. 112 00:06:43,570 --> 00:06:45,920 To do that, it empties the processor pipeline. 113 00:06:45,920 --> 00:06:48,190 And that takes, on the machines we're 114 00:06:48,190 --> 00:06:49,855 using, around 16 cycles. 115 00:06:52,880 --> 00:06:56,520 So you don't want to have branches that are 116 00:06:56,520 --> 00:06:58,620 mis-predicted. 117 00:06:58,620 --> 00:07:01,310 And, in particular, what you want to look for in 118 00:07:01,310 --> 00:07:04,326 conditional instructions, is whether they're predictable. 119 00:07:07,000 --> 00:07:11,130 So something that's almost all the time branching the same 120 00:07:11,130 --> 00:07:14,720 way, that's a predictable branch. 121 00:07:14,720 --> 00:07:19,100 The hardware is very smart about figuring out how to 122 00:07:19,100 --> 00:07:21,800 predict that, and will make the right prediction. 123 00:07:21,800 --> 00:07:25,370 And you won't pay any performance penalty. 124 00:07:25,370 --> 00:07:27,840 But if you have something where you don't know which way 125 00:07:27,840 --> 00:07:34,100 it goes, so in a code like this, the architecture isn't 126 00:07:34,100 --> 00:07:35,400 going to know. 127 00:07:35,400 --> 00:07:39,480 If you just throw at it various pairs of x and y it's 128 00:07:39,480 --> 00:07:42,560 a 50/50 guess as to whether it guesses right. 129 00:07:42,560 --> 00:07:46,280 So half the time it's going to predict the wrong thing. 130 00:07:50,170 --> 00:07:53,280 So the compiler might be smart enough. 131 00:07:53,280 --> 00:07:54,100 But maybe not. 132 00:07:54,100 --> 00:07:55,770 But you can be sure. 
133 00:07:55,770 --> 00:07:59,150 And here's a way of being sure. 134 00:07:59,150 --> 00:08:00,400 You write this code. 135 00:08:04,560 --> 00:08:05,965 What is going on there? 136 00:08:10,350 --> 00:08:12,320 So here we go. 137 00:08:12,320 --> 00:08:18,060 We're taking x less than y, and taking a minus sign. 138 00:08:18,060 --> 00:08:20,060 Yikes, what's that do? 139 00:08:20,060 --> 00:08:23,000 Well C represents the Booleans true and false with 140 00:08:23,000 --> 00:08:25,680 the integers one and zero, respectively. 141 00:08:25,680 --> 00:08:30,970 So if you execute the operator x less than y, as opposed to 142 00:08:30,970 --> 00:08:35,090 doing a conditional based on x being less than y, it returns 143 00:08:35,090 --> 00:08:38,440 either a zero or a one, depending upon whether it was 144 00:08:38,440 --> 00:08:39,820 successful. 145 00:08:39,820 --> 00:08:45,920 So when you take the negation of that, this is 146 00:08:45,920 --> 00:08:46,940 either zero or one. 147 00:08:46,940 --> 00:08:50,950 It's either going to be zero or minus 1. 148 00:08:50,950 --> 00:08:56,110 And what is minus 1 in two's complement arithmetic? 149 00:08:56,110 --> 00:08:59,720 It's a word filled all with ones. 150 00:08:59,720 --> 00:09:05,460 So you either get a word all filled with zeros or you get a 151 00:09:05,460 --> 00:09:06,710 word all filled with ones. 152 00:09:06,710 --> 00:09:11,410 So if x is less than y you get a word all filled with ones. 153 00:09:11,410 --> 00:09:12,570 So then what are you doing? 154 00:09:12,570 --> 00:09:16,440 You're doing x XOR y, and you're 155 00:09:16,440 --> 00:09:18,280 and-ing it with all ones. 156 00:09:18,280 --> 00:09:22,780 Well that's a no-op to and it with all ones. 157 00:09:22,780 --> 00:09:25,630 To mask something-- if I do an and of anything with one, I 158 00:09:25,630 --> 00:09:27,740 get whatever the thing is. 159 00:09:27,740 --> 00:09:33,490 So this expression ends up evaluating to just x XOR y. 
160 00:09:33,490 --> 00:09:35,260 Great, then what? 161 00:09:35,260 --> 00:09:40,330 I take y here and I XOR it, I get back x because of that 162 00:09:40,330 --> 00:09:44,420 inverse property of XOR. 163 00:09:44,420 --> 00:09:47,880 So if x is less than y, then r gets the value of x. 164 00:09:54,640 --> 00:09:57,670 If x is greater or equal to y then this expression 165 00:09:57,670 --> 00:09:59,740 evaluates to zero. 166 00:09:59,740 --> 00:10:04,220 And a word of zeroes and-ed with x XOR y gives you a word 167 00:10:04,220 --> 00:10:07,755 of zeroes because zero is an annihilator for and. 168 00:10:10,580 --> 00:10:12,930 Wherever you and with zero, it doesn't matter what it is. 169 00:10:12,930 --> 00:10:14,320 You get zero. 170 00:10:14,320 --> 00:10:30,660 So therefore this just becomes r equals y because 171 00:10:30,660 --> 00:10:37,010 y XOR zero is y. 172 00:10:37,010 --> 00:10:40,940 So pretty clever, is this really better? 173 00:10:40,940 --> 00:10:43,150 Seems like an awful lot of operations. 174 00:10:43,150 --> 00:10:46,830 Well the answer is yes it is, because all of this goes on 175 00:10:46,830 --> 00:10:52,190 within the processing unit rather than with anything 176 00:10:52,190 --> 00:10:54,610 having to do with memory. 177 00:10:54,610 --> 00:10:57,740 It gets the values for x and y to begin with and then it's 178 00:10:57,740 --> 00:11:00,410 all instructions within the processing unit. 179 00:11:00,410 --> 00:11:02,060 Those typically take one cycle. 180 00:11:02,060 --> 00:11:08,310 And if there's any parallelism in it the parallelism will be 181 00:11:08,310 --> 00:11:12,280 able to execute even more than one operation per cycle. 182 00:11:12,280 --> 00:11:16,622 In fact, the machines we're using are six-issue. 183 00:11:16,622 --> 00:11:21,400 They can run six operations simultaneously, 184 00:11:21,400 --> 00:11:24,420 each taking a cycle. 
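The slide's expression does not appear in the transcript. Based on the spoken description (negate the comparison, and it with x XOR y, then XOR with y), a hedged C sketch of the branch-free minimum could look like this; the name min_branchless and the int32_t type are assumptions for illustration.

```c
#include <stdint.h>

/* Branch-free minimum of two signed integers.
 * (x < y) evaluates to 1 or 0 in C; negating it gives -1 (all ones
 * in two's complement) or 0 (all zeros), which serves as a mask. */
static int32_t min_branchless(int32_t x, int32_t y)
{
    int32_t mask = -(int32_t)(x < y);
    /* mask all ones:  y ^ (x ^ y) == x
     * mask all zeros: y ^ 0       == y */
    return y ^ ((x ^ y) & mask);
}
```

Whether this beats the plain conditional depends on the compiler and target, since many compilers already emit a conditional move for the simple form.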
185 00:11:24,420 --> 00:11:27,930 So the difference between that and going out to memory is 186 00:11:27,930 --> 00:11:31,010 really quite considerable. 187 00:11:31,010 --> 00:11:34,130 So everybody follow that? 188 00:11:34,130 --> 00:11:35,910 Pretty cute trick, how to make it go fast. 189 00:11:35,910 --> 00:11:37,060 Yes, question? 190 00:11:37,060 --> 00:11:39,510 AUDIENCE: Doesn't the expression that's tested, 191 00:11:39,510 --> 00:11:43,920 doesn't it have to be [? weighed for ?] the inner 192 00:11:43,920 --> 00:11:45,390 expression before you take-- 193 00:11:45,390 --> 00:11:48,660 PROFESSOR: There's no-- this is a comparison. 194 00:11:48,660 --> 00:11:52,200 This is operated like a-- 195 00:11:52,200 --> 00:11:55,190 so there's no compare instruction there. 196 00:11:55,190 --> 00:12:01,350 It's a CPU operation. 197 00:12:01,350 --> 00:12:04,100 It's an arithmetic and logical operation of the CPU that it 198 00:12:04,100 --> 00:12:07,690 can do in one cycle, is to compare. 199 00:12:07,690 --> 00:12:10,330 The normal thing that you're trying to do if you have an if 200 00:12:10,330 --> 00:12:13,680 is you're trying to change the program counter. 201 00:12:13,680 --> 00:12:16,100 And that's what's costly. 202 00:12:16,100 --> 00:12:22,380 Not the actual doing the test of the branch, test of whether 203 00:12:22,380 --> 00:12:24,220 x is less than y. 204 00:12:24,220 --> 00:12:25,024 OK? 205 00:12:25,024 --> 00:12:27,617 AUDIENCE: Would you have to wait for that to finish before 206 00:12:27,617 --> 00:12:28,976 you can do the negation? 207 00:12:28,976 --> 00:12:30,460 PROFESSOR: Yes you do. 208 00:12:30,460 --> 00:12:35,190 So that's one cycle, two cycles, we can add it up here, 209 00:12:35,190 --> 00:12:36,020 three cycles. 210 00:12:36,020 --> 00:12:37,450 This can be going on in parallel. 211 00:12:37,450 --> 00:12:41,990 So it's really only two cycles total. 
212 00:12:41,990 --> 00:12:47,960 Three cycles, four cycles, so in four cycles you can get the 213 00:12:47,960 --> 00:12:51,790 minimum done. 214 00:12:51,790 --> 00:12:55,980 The L1 cache in the architecture we're using costs 215 00:12:55,980 --> 00:12:59,280 you four cycles to fetch something if you get a cache 216 00:12:59,280 --> 00:13:02,350 hit in the L1 cache. 217 00:13:02,350 --> 00:13:04,200 That's the cheapest memory operation you 218 00:13:04,200 --> 00:13:05,560 can do, is four cycles. 219 00:13:05,560 --> 00:13:07,690 This computed the whole minimum in four cycles. 220 00:13:15,520 --> 00:13:18,570 Here's another one, modular addition. 221 00:13:18,570 --> 00:13:23,580 So sometimes you know something that the compiler 222 00:13:23,580 --> 00:13:24,970 doesn't know. 223 00:13:24,970 --> 00:13:28,710 Like suppose that you know that x is between zero and 224 00:13:28,710 --> 00:13:34,330 some value n, and y is between zero and some value n, and you 225 00:13:34,330 --> 00:13:36,900 want to compute their sum. 226 00:13:36,900 --> 00:13:41,820 So what is that the sum is going to be less than what? 227 00:13:41,820 --> 00:13:43,070 2n. 228 00:13:45,000 --> 00:13:50,180 So normally a modular operation is very expensive 229 00:13:50,180 --> 00:13:52,380 because it involves a divide. 230 00:13:52,380 --> 00:13:56,240 Now multiply is normally more expensive than an ordinary ALU 231 00:13:56,240 --> 00:13:58,930 operation that's just a bitwise operation, like 232 00:13:58,930 --> 00:14:03,480 addition, or XORing bitwise XORs, or comparison, 233 00:14:03,480 --> 00:14:04,120 or what have you. 234 00:14:04,120 --> 00:14:06,870 Those are very cheap one cycle operations. 235 00:14:06,870 --> 00:14:10,700 Multiply is usually a many cycle operation. 
236 00:14:10,700 --> 00:14:14,540 Divide is often implemented by doing repeated multiplies 237 00:14:14,540 --> 00:14:18,060 using any of a variety of techniques, including Newton 238 00:14:18,060 --> 00:14:18,850 techniques. 239 00:14:18,850 --> 00:14:23,240 Sometimes there is a divider, or a divide step. 240 00:14:23,240 --> 00:14:26,190 But divide is, generally, in any case more expensive, even 241 00:14:26,190 --> 00:14:30,510 though it's doing operations all within the processor. 242 00:14:30,510 --> 00:14:37,910 So if you actually compute mod using your percent thing. 243 00:14:37,910 --> 00:14:41,750 This actually can be quite expensive unless you're 244 00:14:41,750 --> 00:14:44,110 dividing by a power of two. 245 00:14:44,110 --> 00:14:46,860 If you mod a power of two that's easy because the 246 00:14:46,860 --> 00:14:49,550 processor, if it knows it's a power of two, if the compiler 247 00:14:49,550 --> 00:14:54,210 knows it's a power of two, it'll just do a masking 248 00:14:54,210 --> 00:14:59,080 operation on the low order bits to give you whatever the 249 00:14:59,080 --> 00:15:02,040 remainder is, mod 2 to the n. 250 00:15:02,040 --> 00:15:05,340 But if you're not in that situation, n may not be a 251 00:15:05,340 --> 00:15:08,640 power of two, you still want to do something mod n, there 252 00:15:08,640 --> 00:15:11,980 still are some tricks you can play but the compiler won't- 253 00:15:11,980 --> 00:15:14,710 this is one the compiler generally won't play for you 254 00:15:14,710 --> 00:15:17,410 because the compiler won't know that these are 255 00:15:17,410 --> 00:15:20,420 preconditions in your code. 256 00:15:20,420 --> 00:15:22,590 By the way, one of the most common things is just doing x 257 00:15:22,590 --> 00:15:24,950 plus 1 mod n. 258 00:15:24,950 --> 00:15:29,970 Very, very common thing to be doing, x plus 1 mod n, where 259 00:15:29,970 --> 00:15:32,930 you're wrapping around in some index space. 
260 00:15:36,910 --> 00:15:38,150 Here's another way you could do it. 261 00:15:38,150 --> 00:15:39,660 So divide is expensive. 262 00:15:39,660 --> 00:15:43,410 Here you could just say z equals x plus y. 263 00:15:43,410 --> 00:15:47,610 And then if z is less than n, give z otherwise z minus n. 264 00:15:50,620 --> 00:15:52,210 The problem with this is that it's got an 265 00:15:52,210 --> 00:15:54,210 unpredictable branch. 266 00:15:54,210 --> 00:15:56,510 To execute this code, I could have written it out with an if 267 00:15:56,510 --> 00:16:00,960 statement, it's got to change the program counter to execute 268 00:16:00,960 --> 00:16:03,205 either this or this. 269 00:16:05,960 --> 00:16:09,680 So not very fast because you have an unpredictable branch. 270 00:16:09,680 --> 00:16:11,920 And we already talked about that has to empty the pipeline 271 00:16:11,920 --> 00:16:15,530 if it's wrong in the guess. 272 00:16:15,530 --> 00:16:19,520 So here's a way of doing it which doesn't 273 00:16:19,520 --> 00:16:24,060 have an explicit branch. 274 00:16:24,060 --> 00:16:27,010 So we compute x plus y. 275 00:16:27,010 --> 00:16:30,360 And now what we do is we look at whether z is greater than 276 00:16:30,360 --> 00:16:31,610 or equal to n. 277 00:16:34,180 --> 00:16:37,900 And if it is, we're basically going to take the negation. 278 00:16:37,900 --> 00:16:41,040 So that if it's greater or equal to n, the negation here 279 00:16:41,040 --> 00:16:44,590 is all ones, once again, is minus 1. 280 00:16:44,590 --> 00:16:50,260 And so this becomes n and-ed with minus one. 281 00:16:50,260 --> 00:16:51,780 That gives me n. 282 00:16:51,780 --> 00:16:54,150 And so then I'll take z minus n. 283 00:16:54,150 --> 00:16:56,250 That's what I want. 284 00:16:56,250 --> 00:17:02,730 However, if this is z is less than n, then this will 285 00:17:02,730 --> 00:17:03,850 evaluate to zero. 286 00:17:03,850 --> 00:17:05,740 Minus zero is zero. 
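The branch-free modular addition being described is not reproduced in the transcript. A minimal C sketch follows, assuming unsigned 32-bit operands and a hypothetical function name add_mod; it relies on the stated preconditions that x and y are both in the range 0 to n minus 1.

```c
#include <stdint.h>

/* Compute (x + y) mod n without a divide or a branch.
 * Preconditions (assumed, not checked): x < n, y < n, and
 * x + y does not overflow 32 bits, so the sum is below 2n. */
static uint32_t add_mod(uint32_t x, uint32_t y, uint32_t n)
{
    uint32_t z = x + y;
    /* (z >= n) is 1 or 0; subtracting it from 0 gives all ones or 0 */
    uint32_t mask = (uint32_t)0 - (uint32_t)(z >= n);
    /* mask all ones: subtract n. mask zero: subtract nothing. */
    return z - (n & mask);
}
```

The common wrap-around index update then becomes add_mod(i, 1, n).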
287 00:17:05,740 --> 00:17:08,319 And n and zero is zero. 288 00:17:08,319 --> 00:17:11,599 And so I'll end up just getting the x plus y. 289 00:17:11,599 --> 00:17:14,650 So it's basically the same trick with a couple of 290 00:17:14,650 --> 00:17:22,829 twiddles on the minimum that we saw on the previous foil. 291 00:17:27,220 --> 00:17:28,470 Who's having fun? 292 00:17:31,322 --> 00:17:33,390 Good. 293 00:17:33,390 --> 00:17:34,800 As I said, we're going to see lots of tricks, 294 00:17:34,800 --> 00:17:36,280 magic tricks even. 295 00:17:41,140 --> 00:17:43,210 Round up to a power of two. 296 00:17:43,210 --> 00:17:45,540 This is a common thing that you want to do here. 297 00:17:45,540 --> 00:17:48,630 This, for example, goes on in memory allocators, which we'll 298 00:17:48,630 --> 00:17:50,710 talk about later in the course. 299 00:17:50,710 --> 00:17:53,670 So in a memory allocator, somebody asks for a hunk of 300 00:17:53,670 --> 00:18:00,020 storage of size 19, most memory allocators want to give 301 00:18:00,020 --> 00:18:04,280 out chunks that are powers of two for reasons we will 302 00:18:04,280 --> 00:18:05,940 discover later. 303 00:18:05,940 --> 00:18:12,100 So you want to round up to the next higher power of two. 304 00:18:12,100 --> 00:18:14,240 So how do you do that? 305 00:18:14,240 --> 00:18:17,570 So here's an example. 306 00:18:17,570 --> 00:18:25,610 So what I do is I decrement n, and then I update n or-ing it 307 00:18:25,610 --> 00:18:29,920 with the left shift of n, and then or-ing that with- sorry, 308 00:18:29,920 --> 00:18:31,870 the right shift of n by one. 309 00:18:31,870 --> 00:18:35,420 Then the right shift of n by two, et cetera, et cetera. 310 00:18:35,420 --> 00:18:36,670 So here's an example. 311 00:18:39,810 --> 00:18:42,850 So here's my original number. 312 00:18:42,850 --> 00:18:45,120 And what I want is to round it up to the next power of two. 
313 00:18:45,120 --> 00:18:46,630 This is what I'm going to end up with in the end. 314 00:18:46,630 --> 00:18:48,420 See I've got the next higher power of two? 315 00:18:48,420 --> 00:18:51,430 Just one bit is on if I've rounded up to the next higher 316 00:18:51,430 --> 00:18:53,430 power of two. 317 00:18:53,430 --> 00:18:54,610 So what do I do? 318 00:18:54,610 --> 00:19:03,790 I basically decrement and then I take this word and I shift 319 00:19:03,790 --> 00:19:07,230 it by one to the right and or it in. 320 00:19:13,570 --> 00:19:17,340 And then I shift it by two and or it in. 321 00:19:17,340 --> 00:19:21,160 And then I shift it by four and or it in. 322 00:19:21,160 --> 00:19:22,940 And then, in fact, I shift it by eight and or it in. 323 00:19:22,940 --> 00:19:23,870 And I didn't do that. 324 00:19:23,870 --> 00:19:26,590 And since this isn't a 64-bit word I skipped the last two 325 00:19:26,590 --> 00:19:28,860 instructions. 326 00:19:28,860 --> 00:19:31,320 So what's going on when I'm shifting and or-ing it in, by 327 00:19:31,320 --> 00:19:34,090 one, by two, by four. 328 00:19:34,090 --> 00:19:35,260 What's happening? 329 00:19:35,260 --> 00:19:35,680 Yeah? 330 00:19:35,680 --> 00:19:41,344 AUDIENCE: [INAUDIBLE] this way there's no number of ones 331 00:19:41,344 --> 00:19:42,165 starting from the one. 332 00:19:42,165 --> 00:19:44,670 PROFESSOR: Yeah basically, from the most significant bit, 333 00:19:44,670 --> 00:19:46,640 if you look at what's happening with the most 334 00:19:46,640 --> 00:19:50,650 significant bit, you're shifting it by one. 335 00:19:50,650 --> 00:19:52,320 Then you're shifting it by two. 336 00:19:52,320 --> 00:19:55,630 You're flooding the low order bits with ones. 337 00:19:55,630 --> 00:19:58,700 And because it's an or, as soon as something gets set to 338 00:19:58,700 --> 00:20:00,350 one, it stays a one. 
339 00:20:00,350 --> 00:20:02,280 So it doesn't matter what's actually happening in the low 340 00:20:02,280 --> 00:20:02,730 order bits. 341 00:20:02,730 --> 00:20:05,070 The only bit that we care about is this bit. 342 00:20:05,070 --> 00:20:09,355 And it basically floods all of the other bits with one. 343 00:20:12,200 --> 00:20:15,440 And then, once I've flooded them all with one I increment. 344 00:20:15,440 --> 00:20:21,230 And that gives me a carry out to this position and gives me 345 00:20:21,230 --> 00:20:22,160 the next higher power. 346 00:20:22,160 --> 00:20:27,760 So why did I decrement here, and then increment there? 347 00:20:27,760 --> 00:20:29,890 What's the decrement for? 348 00:20:29,890 --> 00:20:30,180 Yeah? 349 00:20:30,180 --> 00:20:34,950 AUDIENCE: So that you can flood with one because you 350 00:20:34,950 --> 00:20:36,858 want to get yourself back. 351 00:20:36,858 --> 00:20:38,766 So you want to add the one-- 352 00:20:38,766 --> 00:20:41,703 PROFESSOR: But why did I decrement first? 353 00:20:41,703 --> 00:20:45,685 AUDIENCE: If you were not to decrement, it would just flood 354 00:20:45,685 --> 00:20:47,340 everything with ones. 355 00:20:47,340 --> 00:20:51,270 PROFESSOR: Well here, if I didn't decrement right here, I 356 00:20:51,270 --> 00:20:52,520 would have gotten the same result. 357 00:20:55,022 --> 00:20:58,810 If you already have a power of two. 358 00:20:58,810 --> 00:21:01,560 If I already have a power of two and I flood the low order 359 00:21:01,560 --> 00:21:05,790 bits, then I increment, I'll get the next 360 00:21:05,790 --> 00:21:08,150 higher power of two. 361 00:21:08,150 --> 00:21:12,490 So by subtracting one I make sure that I'm handling that 362 00:21:12,490 --> 00:21:16,050 base case when n is a power of two. 363 00:21:16,050 --> 00:21:17,016 Yeah? 
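The round-up procedure discussed above (decrement, flood the low-order bits with shifts of 1, 2, 4 and so on, then increment) can be sketched in C for a 64-bit word. The function name round_up_pow2 is an assumption for illustration, and the behaviour at the edges is noted in the comments rather than taken from the lecture.

```c
#include <stdint.h>

/* Round n up to the next power of two (n itself if already one).
 * Edge cases in this sketch: n == 0 wraps around and returns 0,
 * and n above 2^63 overflows and also returns 0. */
static uint64_t round_up_pow2(uint64_t n)
{
    --n;              /* so that an exact power of two maps to itself */
    n |= n >> 1;      /* copy the top set bit one position down */
    n |= n >> 2;      /* now the top 4 positions below it are set */
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;     /* every bit below the top set bit is now 1 */
    return n + 1;     /* the carry ripples up to the next power of two */
}
```

For a 32-bit word the final shift by 32 would be dropped.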
364 00:21:17,016 --> 00:21:22,252 AUDIENCE: Does the [INAUDIBLE] operate [INAUDIBLE] 365 00:21:22,252 --> 00:21:24,450 PROFESSOR: It actually, the compiler is not going to care 366 00:21:24,450 --> 00:21:26,550 in this case. 367 00:21:26,550 --> 00:21:29,030 But it does make sure that it doesn't bother to try to 368 00:21:29,030 --> 00:21:31,510 return and keep around the old value. 369 00:21:31,510 --> 00:21:35,130 But it's smart enough to not worry about that. 370 00:21:38,250 --> 00:21:42,770 Yeah, I mean, some people like post-fix decrementing, and 371 00:21:42,770 --> 00:21:47,620 some people like pre-fix decrementing and it doesn't 372 00:21:47,620 --> 00:21:49,930 matter in most cases. 373 00:21:49,930 --> 00:21:52,610 But sometimes doing it after-- 374 00:21:52,610 --> 00:21:55,530 there are situations where doing it post-decrementing 375 00:21:55,530 --> 00:21:56,780 costs you a cycle. 376 00:21:59,420 --> 00:22:01,310 So everybody got the idea here? 377 00:22:01,310 --> 00:22:04,030 So basically round up to the next power of two. 378 00:22:08,380 --> 00:22:10,710 How about computing a mask of the least 379 00:22:10,710 --> 00:22:13,020 significant one in a word? 380 00:22:13,020 --> 00:22:16,720 So I want a mask which is the power of two, the word that's 381 00:22:16,720 --> 00:22:22,370 all zeros except for one in the least significant one bit. 382 00:22:22,370 --> 00:22:23,620 Any ideas how to do that? 383 00:22:36,280 --> 00:22:37,680 This is a classic trick. 384 00:22:37,680 --> 00:22:41,290 Everybody should know this trick. 385 00:22:41,290 --> 00:22:44,345 You take x and you and it with its two's complement. 386 00:22:48,380 --> 00:22:50,470 Take x and and it with its two's complement. 387 00:22:50,470 --> 00:22:51,710 Why does that work? 388 00:22:51,710 --> 00:22:54,590 So here's x, some value here. 389 00:22:54,590 --> 00:22:59,870 The two's complement is the one's complement plus one. 390 00:22:59,870 --> 00:23:00,590 Right? 
391 00:23:00,590 --> 00:23:01,240 If you remember. 392 00:23:01,240 --> 00:23:04,180 So one's complement just means I complement every bit. 393 00:23:04,180 --> 00:23:05,570 The two's complement is I complement 394 00:23:05,570 --> 00:23:07,060 every bit and add one. 395 00:23:07,060 --> 00:23:12,140 So when I complement every bit and add one, basically I go 396 00:23:12,140 --> 00:23:14,910 all the way to the least significant bit and then I get 397 00:23:14,910 --> 00:23:16,030 zeros after it. 398 00:23:16,030 --> 00:23:18,980 So right up to there I get-- 399 00:23:18,980 --> 00:23:22,575 it's one's complement and then it's basically- would have 400 00:23:22,575 --> 00:23:24,940 been 0, 1, 1, 1, 1, 1, plus 1. 401 00:23:24,940 --> 00:23:28,330 The carry pulls you back up to there. 402 00:23:28,330 --> 00:23:31,730 And so then when you and them together, oh look at that. 403 00:23:31,730 --> 00:23:33,460 There's our least significant bit sitting there. 404 00:23:40,370 --> 00:23:42,430 Pretty good one? 405 00:23:42,430 --> 00:23:46,740 So how do you find an index of the bit? 406 00:23:52,240 --> 00:24:00,975 So by an index I mean this is bit 0, 1, 2, 3, 4. 407 00:24:00,975 --> 00:24:07,280 Well it turns out these days many machines have a special 408 00:24:07,280 --> 00:24:09,210 instruction to do that. 409 00:24:09,210 --> 00:24:11,790 And so if you look around and you find the right library 410 00:24:11,790 --> 00:24:15,270 that calls that instruction, you can use that instruction 411 00:24:15,270 --> 00:24:16,840 pretty cheaply. 412 00:24:16,840 --> 00:24:20,090 But there's still a lot of machines, especially things 413 00:24:20,090 --> 00:24:22,820 like mobile machines, et cetera, where they have a 414 00:24:22,820 --> 00:24:24,840 depleted instruction set. 415 00:24:24,840 --> 00:24:29,460 Where they have no instruction to convert from a power of two 416 00:24:29,460 --> 00:24:32,040 to essentially its log base two. 
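The least-significant-one mask described a little earlier (and x with its two's complement) is short enough to sketch directly. This C version writes the two's complement as the one's complement plus one, matching the spoken explanation; the name lowest_set_bit and the unsigned 32-bit type are assumptions for illustration.

```c
#include <stdint.h>

/* Return a word with only the least significant 1 bit of x set.
 * ~x + 1 is the two's complement of x: the bits above the lowest 1
 * are inverted, the lowest 1 stays 1, and the zeros below stay 0,
 * so the and keeps only that one bit. Returns 0 when x == 0. */
static uint32_t lowest_set_bit(uint32_t x)
{
    return x & (~x + 1u);
}
```

The result is always a power of two (or zero), which is why the next question is how to turn that mask into a bit index.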
417 00:24:32,040 --> 00:24:34,760 So LG is the notation for log base two. 418 00:24:37,730 --> 00:24:39,390 So how do you go about doing that? 419 00:24:42,690 --> 00:24:52,200 So what we're going to do is do some magic to motivate the 420 00:24:52,200 --> 00:24:54,740 solution to this. 421 00:24:54,740 --> 00:24:58,980 So one way to do it is to use the ESP instruction. 422 00:24:58,980 --> 00:25:00,680 Are people familiar with the ESP instruction? 423 00:25:03,920 --> 00:25:06,580 What's that? 424 00:25:06,580 --> 00:25:07,300 The stack [? order? ?] 425 00:25:07,300 --> 00:25:12,160 No, no, that's BSP on some things. 426 00:25:12,160 --> 00:25:18,560 Or yeah, right, so in the extended instruction set-- 427 00:25:18,560 --> 00:25:20,280 yeah, OK. 428 00:25:20,280 --> 00:25:24,535 Yeah, so the ESP instruction; Extra Sensory Perception. 429 00:25:28,040 --> 00:25:43,970 And we have today the tremendous magician Tautology 430 00:25:43,970 --> 00:25:50,680 who is going to demonstrate the theory behind finding the 431 00:25:50,680 --> 00:25:53,470 index of the bit. 432 00:25:53,470 --> 00:25:56,200 So please give a warm hand for Tautology. 433 00:26:20,590 --> 00:26:21,380 TAUTOLOGY: How about now? 434 00:26:21,380 --> 00:26:22,630 Can everyone hear me? 435 00:26:26,110 --> 00:26:29,870 So, as my good friend Professor Leiserson has 436 00:26:29,870 --> 00:26:32,610 mentioned, I am the amazing Tautology. 437 00:26:32,610 --> 00:26:37,265 And today I am going to show you an amazing card trick 438 00:26:37,265 --> 00:26:39,550 which will baffle your minds for approximately five minutes 439 00:26:39,550 --> 00:26:40,920 until he shows you the next slide. 440 00:26:43,480 --> 00:26:47,452 All right, but to do this I will need five volunteers from 441 00:26:47,452 --> 00:26:48,714 the audience. 442 00:26:48,714 --> 00:26:52,072 PROFESSOR: Who can follow instructions. 
443 00:26:52,072 --> 00:26:56,040 [LAUGHTER] 444 00:26:56,040 --> 00:27:00,840 So who would like to volunteer to participate in real magic? 445 00:27:00,840 --> 00:27:08,830 Here we go, one, two, three, four, only four? 446 00:27:11,630 --> 00:27:12,410 We need one more. 447 00:27:12,410 --> 00:27:15,310 So come on up and line up along the front here. 448 00:27:15,310 --> 00:27:16,825 We need one more, one more volunteer. 449 00:27:20,760 --> 00:27:23,360 One more volunteer, you get extra points. 450 00:27:23,360 --> 00:27:28,580 Remember participation is part of your grade. 451 00:27:28,580 --> 00:27:29,830 OK I'm going to cold call. 452 00:27:33,710 --> 00:27:34,960 Here we go. 453 00:27:40,066 --> 00:27:42,970 TAUTOLOGY: All right, excellent. 454 00:27:42,970 --> 00:27:45,630 Please cut this deck. 455 00:27:45,630 --> 00:27:48,886 PROFESSOR: First you gotta show it's a random deck. 456 00:27:48,886 --> 00:27:50,920 TAUTOLOGY: All right, I will first show you that's it's a 457 00:27:50,920 --> 00:27:52,170 random deck. 458 00:28:00,540 --> 00:28:04,300 PROFESSOR: It's only 32 cards. 459 00:28:04,300 --> 00:28:06,480 It's only 32 cards. 460 00:28:06,480 --> 00:28:08,560 So he pulled out some of the other cards. 461 00:28:08,560 --> 00:28:12,370 But they're in a pretty random order there. 462 00:28:12,370 --> 00:28:14,850 So we want to give everybody a chance to shuffle the deck 463 00:28:14,850 --> 00:28:16,355 here by doing a cut. 464 00:28:44,240 --> 00:28:48,180 Look at it, but don't show it to Tautology. 465 00:28:55,860 --> 00:28:59,940 Why don't you go around the back so that the 466 00:28:59,940 --> 00:29:01,190 class can see the cards? 467 00:29:07,260 --> 00:29:11,150 So hide your cards while he runs around behind. 468 00:29:11,150 --> 00:29:14,810 And then turn them around so that the class can see what 469 00:29:14,810 --> 00:29:15,670 the cards are. 470 00:29:15,670 --> 00:29:20,250 TAUTOLOGY: All right, are you guys ready to turn around? 
471 00:29:20,250 --> 00:29:21,205 Cool, excellent. 472 00:29:21,205 --> 00:29:23,840 PROFESSOR: I'm not going to look at them either. 473 00:29:23,840 --> 00:29:27,010 There are no dupes. 474 00:29:27,010 --> 00:29:29,145 This is all done by ESP. 475 00:29:29,145 --> 00:29:32,410 TAUTOLOGY: All right, now- 476 00:29:32,410 --> 00:29:33,460 PROFESSOR: Why don't you come over here where 477 00:29:33,460 --> 00:29:34,330 you can see the class? 478 00:29:34,330 --> 00:29:35,620 TAUTOLOGY: Hello everybody. 479 00:29:35,620 --> 00:29:39,420 I'm going to tell you what is on those five cards, which I 480 00:29:39,420 --> 00:29:41,200 have not seen. 481 00:29:41,200 --> 00:29:44,160 Behold I have not seen them. 482 00:29:44,160 --> 00:29:48,230 OK, are you guys ready? 483 00:29:48,230 --> 00:29:49,780 PROFESSOR: Now you've got to think about it. 484 00:29:49,780 --> 00:29:51,590 You've got to think hard about what your card is. 485 00:29:51,590 --> 00:29:53,180 If you're not sure you can check. 486 00:29:53,180 --> 00:29:55,790 But you've got to think real hard about what your card is. 487 00:29:55,790 --> 00:29:57,040 TAUTOLOGY: Okay. 488 00:30:01,864 --> 00:30:03,460 PROFESSOR: Are you guys thinking hard? 489 00:30:09,290 --> 00:30:12,720 I think you need some technological assistance. 490 00:30:12,720 --> 00:30:14,595 TAUTOLOGY: I could use some technological assistance. 491 00:30:19,640 --> 00:30:20,925 PROFESSOR: There we go. 492 00:30:26,610 --> 00:30:30,415 This is our brain amplifier. 493 00:30:30,415 --> 00:30:34,420 It amplifies the brain waves coming from you. 494 00:30:34,420 --> 00:30:36,205 OK, give it a go. 495 00:30:36,205 --> 00:30:38,190 TAUTOLOGY: Hang on, is this thing on? 496 00:30:38,190 --> 00:30:38,810 There we go. 497 00:30:38,810 --> 00:30:40,110 Now it's on. 498 00:30:40,110 --> 00:30:41,900 OK, here I go. 499 00:30:48,830 --> 00:30:51,760 All right guys I'm having kind of an off day. 
500 00:30:51,760 --> 00:30:56,300 So I'm going to need a little bit of help. 501 00:30:56,300 --> 00:30:58,240 Can you guys just raise your hand if you're 502 00:30:58,240 --> 00:30:59,590 holding a red card? 503 00:31:07,312 --> 00:31:08,560 So that's a red? 504 00:31:08,560 --> 00:31:10,620 PROFESSOR: Red, red, black, red. 505 00:31:13,160 --> 00:31:14,410 They did it right, right? 506 00:31:19,355 --> 00:31:22,280 TAUTOLOGY: Now let me give this a shot. 507 00:31:22,280 --> 00:31:25,942 So red, black, red, red, red. 508 00:31:25,942 --> 00:31:31,320 PROFESSOR: No, red, red - yes OK you do it your way. 509 00:31:36,850 --> 00:31:40,550 TAUTOLOGY: Now, now I might be wrong. 510 00:31:40,550 --> 00:31:48,470 But I think- am I seeing a diamond with a seven on it? 511 00:31:48,470 --> 00:31:50,880 Is that what I'm seeing? 512 00:31:50,880 --> 00:31:53,530 PROFESSOR: Oh! 513 00:31:53,530 --> 00:31:55,682 How impressive is that? 514 00:31:55,682 --> 00:31:58,764 TAUTOLOGY: OK, one down, four to go. 515 00:32:03,640 --> 00:32:05,280 What else am I seeing? 516 00:32:05,280 --> 00:32:07,250 I believe I am seeing. 517 00:32:07,250 --> 00:32:10,890 You're going to have to think about this card pretty hard. 518 00:32:10,890 --> 00:32:12,400 I'm having an off day. 519 00:32:12,400 --> 00:32:18,825 So I think I'm seeing a spade, with a six. 520 00:32:26,620 --> 00:32:28,810 Thank you for your honesty. 521 00:32:28,810 --> 00:32:29,920 Thank you. 522 00:32:29,920 --> 00:32:32,720 It does mean a lot to me. 523 00:32:32,720 --> 00:32:44,040 All right, so I'm seeing a- now what am I seeing now? 524 00:32:44,040 --> 00:32:47,750 What could this be? 525 00:32:47,750 --> 00:32:49,860 It's some kind of a heart. 526 00:32:52,420 --> 00:32:58,330 I think, just maybe, it has a value of five. 527 00:33:03,045 --> 00:33:03,550 PROFESSOR: Oh! 528 00:33:03,550 --> 00:33:06,060 Three in a row! 529 00:33:06,060 --> 00:33:07,700 I don't think he's doing this at random. 
530 00:33:07,700 --> 00:33:09,585 There must be ESP at work here. 531 00:33:09,585 --> 00:33:10,470 TAUTOLOGY: Clearly. 532 00:33:10,470 --> 00:33:12,610 It's all the hat to be honest. 533 00:33:12,610 --> 00:33:16,460 It's all the technology, just the latest technology. 534 00:33:16,460 --> 00:33:19,065 All right, so- 535 00:33:19,065 --> 00:33:20,760 PROFESSOR: It gets harder as we go. 536 00:33:20,760 --> 00:33:24,440 TAUTOLOGY: It does get harder as we go. 537 00:33:24,440 --> 00:33:27,700 But, what is this? 538 00:33:27,700 --> 00:33:34,240 do I- I think- no it can't be- can it be? 539 00:33:34,240 --> 00:33:37,202 The three of hearts? 540 00:33:37,202 --> 00:33:38,895 It's the three of hearts! 541 00:33:38,895 --> 00:33:42,800 My old nemesis the three of hearts. 542 00:33:42,800 --> 00:33:45,102 OK just one more card. 543 00:33:45,102 --> 00:33:46,830 [LAUGHTER] 544 00:33:46,830 --> 00:33:49,084 Did you just swap cards? 545 00:33:49,084 --> 00:33:51,705 I think I just watched you swap you cards. 546 00:33:51,705 --> 00:33:53,910 Well I'm just going for in the set- 547 00:33:53,910 --> 00:33:55,310 PROFESSOR: And they did it without a temporary notice. 548 00:34:00,160 --> 00:34:01,840 They must have xor-ed them together or something. 549 00:34:06,740 --> 00:34:09,480 TAUTOLOGY: So the last card, which may or may not be in the 550 00:34:09,480 --> 00:34:11,850 last person's hands. 551 00:34:11,850 --> 00:34:13,179 Let me see if I can see it. 552 00:34:15,984 --> 00:34:17,521 What could it be? 553 00:34:20,467 --> 00:34:21,717 What could it be? 554 00:34:25,780 --> 00:34:29,540 No, no! 555 00:34:29,540 --> 00:34:33,760 No, anything, not the- not that! 556 00:34:33,760 --> 00:34:35,243 Not the 6 of diamonds! 557 00:34:38,495 --> 00:34:43,580 PROFESSOR: All right, five out of five! 558 00:34:43,580 --> 00:34:47,480 Thank you, thank you. 559 00:34:47,480 --> 00:34:48,830 OK guys go back to your seats. 
560 00:34:52,739 --> 00:34:54,525 So that was a pretty easy trick, right? 561 00:34:54,525 --> 00:34:55,780 TAUTOLOGY: Yeah, pretty easy. 562 00:34:55,780 --> 00:34:58,045 PROFESSOR: So how does it work? 563 00:35:01,980 --> 00:35:03,814 What's the basic idea behind it? 564 00:35:10,590 --> 00:35:11,940 What's the basic idea behind it? 565 00:35:11,940 --> 00:35:17,780 So the key thing is how many bits of information did he get? 566 00:35:17,780 --> 00:35:19,580 Five bits of information. 567 00:35:19,580 --> 00:35:23,980 And there were how many cards in the deck? 568 00:35:23,980 --> 00:35:25,950 Thirty-two. 569 00:35:25,950 --> 00:35:34,360 So the pattern of the five bits, of which cards were red, 570 00:35:34,360 --> 00:35:39,580 let him know where in the cyclic sequence of cards he 571 00:35:39,580 --> 00:35:40,810 was, right? 572 00:35:40,810 --> 00:35:42,180 Because the cards really weren't random. 573 00:35:42,180 --> 00:35:43,430 They just looked random. 574 00:35:46,100 --> 00:35:48,840 And so five bits is enough. 575 00:35:48,840 --> 00:35:53,540 But that means that every sequence of five cards in that 576 00:35:53,540 --> 00:35:57,200 deck, as you rotate it around, has to have a 577 00:35:57,200 --> 00:35:58,450 different bit pattern. 578 00:36:01,790 --> 00:36:04,566 So does anybody know a name for that property? 579 00:36:08,120 --> 00:36:10,040 A circular sequence that has that property? 580 00:36:14,150 --> 00:36:20,370 The property is that if you- well, let's see. 581 00:36:20,370 --> 00:36:21,620 Maybe I have it up here. 582 00:36:25,560 --> 00:36:32,360 So here's our magic code, which is going to 583 00:36:32,360 --> 00:36:34,940 compute the log of x. 584 00:36:34,940 --> 00:36:38,720 And it's using what's called a De Bruijn sequence. 585 00:36:38,720 --> 00:36:43,590 So let's come back to the magic trick in a minute. 586 00:36:43,590 --> 00:36:47,080 And let's look to see how we compute this.
587 00:36:47,080 --> 00:36:49,450 And then we'll understand how both work. 588 00:36:52,320 --> 00:36:58,130 There's a magic number here called De Bruijn. 589 00:36:58,130 --> 00:36:59,650 De Bruijn was a Dutch mathematician. 590 00:37:02,540 --> 00:37:06,360 And then there's this funny conversion table. 591 00:37:06,360 --> 00:37:11,310 And to find the log of x, where x is a power of two, I 592 00:37:11,310 --> 00:37:15,350 multiply x by this magic number here. 593 00:37:15,350 --> 00:37:20,390 Right shifted 58 places. 594 00:37:20,390 --> 00:37:22,900 This is keeping how many bits after the multiply here? 595 00:37:26,940 --> 00:37:31,430 Six bits because it's a 64-bit word. 596 00:37:31,430 --> 00:37:33,105 And then looking it up in this table. 597 00:37:38,310 --> 00:37:40,580 So let's take a look at what's going on there. 598 00:37:40,580 --> 00:37:47,890 So a De Bruijn sequence, s, of length 2 to the k, is a cyclic 599 00:37:47,890 --> 00:37:51,940 zero-one sequence, such that each of the 2 to the k zero-one 600 00:37:51,940 --> 00:37:57,230 strings of length k occurs exactly once as a 601 00:37:57,230 --> 00:38:00,010 substring of s. 602 00:38:00,010 --> 00:38:02,040 That's a mouthful. 603 00:38:02,040 --> 00:38:04,760 Let's do an example, smaller. 604 00:38:04,760 --> 00:38:07,150 So for example k equals 3. 605 00:38:07,150 --> 00:38:13,370 So here's a sequence, 0, 0, 0, 1, 1, 1, 0, 1, base 2. 606 00:38:13,370 --> 00:38:18,730 If I look at the first three bits it's 0, 0, 0. 607 00:38:18,730 --> 00:38:22,520 The second three bits is 0, 0, 1. 608 00:38:22,520 --> 00:38:25,340 And notice that as I go through here every sequence, 609 00:38:25,340 --> 00:38:28,400 and these are wrapping around the end, taking the last bit 610 00:38:28,400 --> 00:38:31,780 and then the first two at the beginning. 611 00:38:31,780 --> 00:38:35,200 Every one of these gives you an index of every different 612 00:38:35,200 --> 00:38:39,430 bit pattern of length 3.
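The defining property can be checked mechanically. The following C sketch tests whether a cyclic sequence of 2 to the k bits contains every k-bit pattern exactly once. The function name is_de_bruijn, the packing of the sequence into the low bits of a 32-bit word (first bit of the sequence in the highest of those positions), and the limit of k at most 5 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checker: s holds a cyclic 0-1 sequence of length n = 2^k in
   its low n bits, written most significant bit first. Returns true when
   every k-bit pattern occurs exactly once as a cyclic window of s. */
bool is_de_bruijn(uint32_t s, unsigned k) {
    assert(k >= 1 && k <= 5);           /* n = 2^k must fit in 32 bits */
    unsigned n = 1u << k;
    uint32_t seen = 0;                  /* bit p is set once pattern p appears */
    for (unsigned i = 0; i < n; i++) {
        unsigned pattern = 0;
        for (unsigned j = 0; j < k; j++) {
            unsigned pos = (i + j) % n; /* wrap around the end of the sequence */
            unsigned bit = (s >> (n - 1 - pos)) & 1u;
            pattern = (pattern << 1) | bit;
        }
        if (seen & (1u << pattern)) {
            return false;               /* some window repeats */
        }
        seen |= 1u << pattern;
    }
    return true;                        /* n distinct windows cover all 2^k patterns */
}
```

With the lecture's example, is_de_bruijn(0x1D, 3) is true, since 0x1D is 00011101 in binary. Changing the last bit to get 00011100 breaks the property, because the window 000 then appears twice.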
613 00:38:39,430 --> 00:38:46,000 So these came up because people played with those 614 00:38:46,000 --> 00:38:51,010 keypads where you have to enter a combination, right? 615 00:38:51,010 --> 00:38:52,960 And if you have a keypad and you want to enter a 616 00:38:52,960 --> 00:38:56,920 combination- let's say the keypad has only two numbers on 617 00:38:56,920 --> 00:38:59,080 it, 0 and 1. 618 00:38:59,080 --> 00:39:04,710 And you have to hit the right sequence of k numbers. 619 00:39:04,710 --> 00:39:08,630 So the naive way of doing it would be to say well let's try 620 00:39:08,630 --> 00:39:11,600 0, 0, 0, 0, 0, 0, 0. 621 00:39:11,600 --> 00:39:18,850 Then let's try 0, 0, 0, 0, 0, 0, 1, then 0, 0, 0, 0, 0, 1, 0. 622 00:39:18,850 --> 00:39:22,710 So what that'll do is you'll have to go through 2 to the k 623 00:39:22,710 --> 00:39:28,970 numbers, each of which is k bits long, for k times 2 to 624 00:39:28,970 --> 00:39:32,320 the k punches in order to be sure that you've hit every 625 00:39:32,320 --> 00:39:36,690 number to open the lock. 626 00:39:36,690 --> 00:39:39,630 The De Bruijn sequence takes it from k times 2 to the k 627 00:39:39,630 --> 00:39:40,730 down to 2 to the k. 628 00:39:40,730 --> 00:39:45,190 Still exponential in k, but there's no k at the front because 629 00:39:45,190 --> 00:39:50,080 it's making it so that each sequence that you have nests 630 00:39:50,080 --> 00:39:53,260 into the previous one. 631 00:39:53,260 --> 00:39:54,810 And that's basically what's going on here. 632 00:39:54,810 --> 00:39:59,950 Every sequence of length 3 exists in 633 00:39:59,950 --> 00:40:01,400 the De Bruijn sequence. 634 00:40:01,400 --> 00:40:08,880 And so this one here is a De Bruijn sequence of length 64 635 00:40:08,880 --> 00:40:12,040 as it turns out. 636 00:40:12,040 --> 00:40:15,790 And what we had in the magic trick was a De Bruijn sequence 637 00:40:15,790 --> 00:40:18,200 of length 32.
638 00:40:18,200 --> 00:40:20,630 So that when you cut the cards and you looked at the first 639 00:40:20,630 --> 00:40:27,115 five cards that was a unique pattern of reds and blacks. 640 00:40:27,115 --> 00:40:29,890 It told you where you were in the rotation of the sequence. 641 00:40:32,430 --> 00:40:35,280 And then there's a little bit of cleverness to how it is 642 00:40:35,280 --> 00:40:37,020 that you translate that into cards. 643 00:40:37,020 --> 00:40:39,860 Because remembering 32 cards, and what their sequence was, 644 00:40:39,860 --> 00:40:43,410 and so forth, that's pretty hard. 645 00:40:43,410 --> 00:40:44,700 But it turns out you can just do an 646 00:40:44,700 --> 00:40:46,970 encoding of the five bits. 647 00:40:46,970 --> 00:40:50,000 Two of the bits encode the suit. 648 00:40:50,000 --> 00:40:55,700 So the high order bit encodes the suit, the two bits encode 649 00:40:55,700 --> 00:40:59,320 the suit, and then the last three bits tell what the 650 00:40:59,320 --> 00:41:03,770 number is, 1 through 8. 651 00:41:03,770 --> 00:41:06,950 So that's how that worked. 652 00:41:06,950 --> 00:41:09,760 It wasn't really magic after all. 653 00:41:09,760 --> 00:41:11,010 Who's surprised? 654 00:41:13,090 --> 00:41:17,040 So how can we use this in this particular code? 655 00:41:17,040 --> 00:41:19,280 So for this, basically the convert 656 00:41:19,280 --> 00:41:21,040 table does the following. 657 00:41:21,040 --> 00:41:24,700 It says, well, if you've got zero, the offset, the shift of 658 00:41:24,700 --> 00:41:27,180 this amount here, is zero. 659 00:41:27,180 --> 00:41:30,990 And if you've got one, then the shift is one. 660 00:41:30,990 --> 00:41:35,090 And if you have a six- where's six in here? 661 00:41:40,760 --> 00:41:44,930 Sorry if I have a two here, then that's going 662 00:41:44,930 --> 00:41:47,520 to be that I'm six. 663 00:41:47,520 --> 00:41:51,160 So this table is inverting this number. 
664 00:41:51,160 --> 00:41:54,930 Do people see the relationship there? 665 00:41:54,930 --> 00:41:58,640 So if I know what the pattern is I can do a look up and tell 666 00:41:58,640 --> 00:42:02,390 how much did I shift by? 667 00:42:02,390 --> 00:42:05,140 If I am shifting by a given amount there, 668 00:42:05,140 --> 00:42:07,190 or circularly shifting. 669 00:42:07,190 --> 00:42:10,520 So here's the way that code works. 670 00:42:10,520 --> 00:42:12,900 Let's say we've got a number like 2 to the fourth that I'm 671 00:42:12,900 --> 00:42:15,310 trying to figure out what the exponent is. 672 00:42:15,310 --> 00:42:17,600 It's always a power of 2. 673 00:42:17,600 --> 00:42:19,380 So I'm looking at 2 to the fourth and I 674 00:42:19,380 --> 00:42:21,020 want to extract 4. 675 00:42:21,020 --> 00:42:23,800 But all I have is the mask that's 16, which 676 00:42:23,800 --> 00:42:25,460 has the one bit on. 677 00:42:25,460 --> 00:42:30,110 What I do is I multiply this number, the De Bruijn sequence 678 00:42:30,110 --> 00:42:33,130 number, by 16. 679 00:42:33,130 --> 00:42:35,840 Well what happens when you multiply by a power of 2? 680 00:42:40,390 --> 00:42:41,920 It shifts it by 4 bits. 681 00:42:44,500 --> 00:42:47,670 So it's shifted the bits by 4 bits. 682 00:42:47,670 --> 00:42:51,320 And now if I right shift it by 8 minus 3, I 683 00:42:51,320 --> 00:42:52,860 capture the top 3 bits. 684 00:42:55,840 --> 00:43:00,940 In this case, 1, 1, 0, which is 6. 685 00:43:00,940 --> 00:43:01,955 Then I convert 6. 686 00:43:01,955 --> 00:43:03,740 It says I had a shift of 4. 687 00:43:22,770 --> 00:43:25,235 And just with 64 bits it's a longer De Bruijn sequence. 688 00:43:28,420 --> 00:43:30,540 So its performance is limited by the fact that you have to 689 00:43:30,540 --> 00:43:32,590 do a multiply and a table look up.
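The 8-bit worked example above can be written out as a small C sketch. It uses the k equals 3 sequence 00011101 from the lecture, and a convert table derived by hand from that sequence: left shifts of 0 through 7 produce the top-three-bit patterns 0, 1, 3, 7, 6, 5, 2, 4, and the table inverts that mapping. The names DEBRUIJN8 and lg_pow2_8 are assumptions, and this is not the 64-bit code shown on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* The k = 3 De Bruijn sequence from the lecture: 00011101 in binary. */
static const uint8_t DEBRUIJN8 = 0x1D;

/* convert[p] is the left shift that brings the 3-bit pattern p to the top
   of the 8-bit sequence, i.e. the inverse of shift -> pattern. */
static const uint8_t convert[8] = {0, 1, 6, 2, 7, 5, 4, 3};

/* Hypothetical helper: log base 2 of x, where x is a power of two below 256. */
unsigned lg_pow2_8(uint8_t x) {
    assert(x != 0 && (x & (x - 1)) == 0);    /* exactly one bit must be set */
    /* Multiplying by a power of two is a left shift; the cast back to
       8 bits discards whatever overflows off the top. */
    uint8_t shifted = (uint8_t)(DEBRUIJN8 * x);
    /* Right shift by 8 - 3 = 5 to keep only the top three bits. */
    return convert[shifted >> 5];
}
```

For x equal to 16 the product is 11010000 in binary, the top three bits are 110, which is 6, and convert[6] is 4, matching the walk-through. The 64-bit version described in the lecture has the same shape, with a 64-bit sequence, a 64-entry table, and a right shift of 58.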
690 00:43:32,590 --> 00:43:35,800 But it's generally fairly competitive for many machines 691 00:43:35,800 --> 00:43:40,540 that do not actually have a log base 2 of a power of 2. 692 00:43:40,540 --> 00:43:43,480 These days machine instructions are getting- 693 00:43:43,480 --> 00:43:45,540 there are instructions that will do that in a single 694 00:43:45,540 --> 00:43:47,850 instruction for you. 695 00:43:47,850 --> 00:43:52,630 But if you don't happen to have one on your architecture 696 00:43:52,630 --> 00:43:55,600 and need to do this fast this is a reasonably 697 00:43:55,600 --> 00:43:57,090 fast way to do it. 698 00:43:57,090 --> 00:43:59,950 Even with a table look up and the thing. 699 00:43:59,950 --> 00:44:01,890 The other way of doing it, of course, would be to shift by 700 00:44:01,890 --> 00:44:06,120 one, shift by one, shift by one, until you get the one. 701 00:44:06,120 --> 00:44:07,710 And there's some other techniques as well 702 00:44:07,710 --> 00:44:08,350 that you can use. 703 00:44:08,350 --> 00:44:11,990 You can do divide and conquer in a binary way, where you do 704 00:44:11,990 --> 00:44:15,640 binary search for where the bit is, by shifting and so 705 00:44:15,640 --> 00:44:17,250 forth, and hone in. 706 00:44:17,250 --> 00:44:20,060 But the problem with those techniques, the binary search 707 00:44:20,060 --> 00:44:21,890 in particular, is what? 708 00:44:21,890 --> 00:44:26,340 If I try to binary search to find a bit what's 709 00:44:26,340 --> 00:44:27,640 that going to be? 710 00:44:27,640 --> 00:44:28,320 Yeah, branching. 711 00:44:28,320 --> 00:44:31,490 You're going to have unpredictable branches. 712 00:44:31,490 --> 00:44:34,880 And each of those will cost you 16 cycles. 713 00:44:34,880 --> 00:44:40,920 And so with a 64-bit word you've got 16 cycles times six 714 00:44:40,920 --> 00:44:45,430 bits that you're trying to decode, times however many 715 00:44:45,430 --> 00:44:47,140 instructions it actually takes you. 
716 00:44:47,140 --> 00:44:48,405 It adds up to a lot of cycles. 717 00:44:51,100 --> 00:44:52,930 But that can sometimes be an effective way 718 00:44:52,930 --> 00:44:54,210 of doing it as well. 719 00:44:54,210 --> 00:44:56,880 And there are other ways. 720 00:44:56,880 --> 00:44:59,790 You can look byte by byte. 721 00:44:59,790 --> 00:45:01,070 There are a variety of other techniques. 722 00:45:01,070 --> 00:45:02,320 Anyway, but this is a cute one. 723 00:45:05,840 --> 00:45:09,140 Here's another one, population count. 724 00:45:09,140 --> 00:45:11,420 Count up the number of one bits in a word. 725 00:45:16,710 --> 00:45:19,500 So here's one way of doing it. 726 00:45:19,500 --> 00:45:22,080 I start out r at zero. 727 00:45:22,080 --> 00:45:25,820 And I keep incrementing r. 728 00:45:25,820 --> 00:45:30,640 And what I do is I quit when x is zero. 729 00:45:30,640 --> 00:45:35,370 And what I do is I do that trick of x, ANDing it 730 00:45:35,370 --> 00:45:37,060 with x minus 1. 731 00:45:37,060 --> 00:45:38,310 Which does what? 732 00:45:40,550 --> 00:45:45,640 Eliminates the low order bit that's one, the low order one. 733 00:45:45,640 --> 00:45:48,350 So basically I go through and I just kick out one of the 734 00:45:48,350 --> 00:45:51,150 ones, kick out another one of the ones, kick out another one 735 00:45:51,150 --> 00:45:54,330 of the ones, until I'm done. 736 00:45:54,330 --> 00:45:55,460 This has a branch in it. 737 00:45:55,460 --> 00:45:58,370 But in some sense it's a predictable branch because 738 00:45:58,370 --> 00:46:00,330 almost all the time you're going through the loop. 739 00:46:03,300 --> 00:46:09,940 However, it has a downside, which is that suppose you're 740 00:46:09,940 --> 00:46:11,545 given minus 1. 741 00:46:15,210 --> 00:46:21,260 Then you have to do 64 iterations of this loop before 742 00:46:21,260 --> 00:46:25,040 you can get your final answer. 743 00:46:25,040 --> 00:46:29,410 And so that's a lot of iterations to do.
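The loop just described can be sketched in C like this. The function name popcount_clear_lowest and the use of uint64_t are assumptions for illustration; the loop body follows the description above, clearing the lowest one bit on each pass.

```c
#include <stdint.h>

/* Hypothetical helper: counts the one bits in x by repeatedly clearing
   the lowest one bit. Runs one iteration per one bit in x. */
unsigned popcount_clear_lowest(uint64_t x) {
    unsigned r = 0;
    for (; x != 0; ++r) {
        x &= x - 1;   /* x - 1 flips the lowest one bit and the zeros below it */
    }
    return r;
}
```

The iteration count equals the answer, so an all-ones word (minus 1) takes 64 trips around the loop while a word with two bits set takes only two.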
744 00:46:29,410 --> 00:46:31,490 So here's what's going on in your loop. 745 00:46:31,490 --> 00:46:32,660 Here's x. 746 00:46:32,660 --> 00:46:34,970 Here's x minus 1. 747 00:46:34,970 --> 00:46:37,180 And now if I AND them it's very similar to the other 748 00:46:37,180 --> 00:46:38,680 trick that I taught you. 749 00:46:38,680 --> 00:46:41,700 You AND them, and notice you have the same number you started 750 00:46:41,700 --> 00:46:44,630 with except it's missing the one low order bit. 751 00:46:53,420 --> 00:46:56,220 So this is fast if the population count is small. 752 00:46:56,220 --> 00:47:00,820 If you know there's only a couple of bits on in the word, 753 00:47:00,820 --> 00:47:03,780 then this can be a pretty effective technique. 754 00:47:03,780 --> 00:47:05,860 But in the worst case it's going to be 755 00:47:05,860 --> 00:47:07,590 proportional to the number of bits in the word. 756 00:47:07,590 --> 00:47:10,760 Because you're only getting rid of one bit at a time. 757 00:47:10,760 --> 00:47:14,350 But it's better, in some sense, than looking one bit at 758 00:47:14,350 --> 00:47:18,165 a time because you have the off chance that the number of 759 00:47:18,165 --> 00:47:20,800 one bits will be sparse. 760 00:47:20,800 --> 00:47:22,740 Whereas if you just looked at the low order bit, then the 761 00:47:22,740 --> 00:47:27,820 next bit, then the next bit, that would definitely take you 762 00:47:27,820 --> 00:47:31,100 worst case every single time. 763 00:47:31,100 --> 00:47:33,080 Here's another way to do it. 764 00:47:33,080 --> 00:47:35,310 It's a table look up. 765 00:47:35,310 --> 00:47:39,180 So you have to pay, but if you're doing this a lot maybe 766 00:47:39,180 --> 00:47:42,800 all of this is in L1 so the table look up only costs you 767 00:47:42,800 --> 00:47:46,950 four cycles if it's in L1 cache. 768 00:47:46,950 --> 00:47:48,780 So what is this sequence?
769 00:47:48,780 --> 00:47:54,410 This tells for any given byte, so there's 256 values, how 770 00:47:54,410 --> 00:47:55,610 many ones are in the byte. 771 00:47:55,610 --> 00:47:58,980 So zero has zero one bits. 772 00:47:58,980 --> 00:48:01,890 One has one one bit. 773 00:48:01,890 --> 00:48:04,950 Two has one one bit, three has two one bits. 774 00:48:04,950 --> 00:48:08,860 Four has one one bit, five has two, six has two, seven has 775 00:48:08,860 --> 00:48:11,090 three, eight has one, et cetera. 776 00:48:11,090 --> 00:48:12,060 So that's this table. 777 00:48:12,060 --> 00:48:14,610 I didn't fill out the rest of the table. 778 00:48:14,610 --> 00:48:17,380 And now what you're doing in this loop is you're basically 779 00:48:17,380 --> 00:48:21,650 taking a look at the value of x. 780 00:48:21,650 --> 00:48:25,480 You're right shifting it and then you're indexing, masking 781 00:48:25,480 --> 00:48:29,710 with the low order byte. 782 00:48:29,710 --> 00:48:31,680 So you mask the low order byte. 783 00:48:31,680 --> 00:48:36,080 You add that to the count by doing a look up, which 784 00:48:36,080 --> 00:48:41,470 hopefully only takes you four cycles if the table is in L1. 785 00:48:41,470 --> 00:48:45,040 And then just run around this loop until you've got no more 786 00:48:45,040 --> 00:48:47,850 things in your word. 787 00:48:47,850 --> 00:48:50,180 So how many are in byte one, how many are in byte two, how 788 00:48:50,180 --> 00:48:51,430 many are in byte three, and so forth. 789 00:48:56,910 --> 00:48:59,230 For things that use table look up you have to be careful 790 00:48:59,230 --> 00:49:03,590 because if you have a great big table why not look up two 791 00:49:03,590 --> 00:49:06,070 bytes at a time? 792 00:49:06,070 --> 00:49:12,070 Well two bytes is 65,000 entries. 793 00:49:12,070 --> 00:49:18,700 So why not look up four bytes at a time? 794 00:49:18,700 --> 00:49:20,920 Four bytes is four billion entries.
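A byte-at-a-time table look up along these lines can be sketched in C. The slide writes out the start of the table by hand; the sketch below instead generates all 256 entries at compile time with helper macros, which is a substitution made here for brevity and not the lecture's code. The names B2, B4, B6, count, and popcount_table are assumptions.

```c
#include <stdint.h>

/* Compile-time generation of the 256-entry table. B2(n) lists the one-bit
   counts of the four 2-bit values, offset by n; B4 and B6 extend that to
   4-bit and 6-bit values by prepending two more bits at each level. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)

/* count[b] is the number of one bits in the byte value b:
   0, 1, 1, 2, 1, 2, 2, 3, 1, ... */
static const uint8_t count[256] = { B6(0), B6(1), B6(1), B6(2) };

/* Hypothetical helper: sums the per-byte counts, lowest byte first. */
unsigned popcount_table(uint64_t x) {
    unsigned r = 0;
    for (; x != 0; x >>= 8) {
        r += count[x & 0xFF];   /* mask off the low order byte and look it up */
    }
    return r;
}
```

The loop stops as soon as the remaining bytes are all zero, so small values take fewer than eight look ups. A 256-byte table fits comfortably in L1, which is the point made above about not moving to two-byte or four-byte tables.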
795 00:49:20,920 --> 00:49:23,060 At that point you're going out to memory and starting to 796 00:49:23,060 --> 00:49:25,260 consume a lot of space. 797 00:49:25,260 --> 00:49:28,850 So here's some common numbers. 798 00:49:28,850 --> 00:49:30,830 These are sort of approximate. 799 00:49:30,830 --> 00:49:33,900 But generally if you're doing operations on registers, it's one 800 00:49:33,900 --> 00:49:39,510 cycle, plus you can issue six per core, per cycle. 801 00:49:39,510 --> 00:49:42,870 So L1 cache is going to cost you around four cycles, L2 802 00:49:42,870 --> 00:49:46,190 cache about 10, L3 about 50, and DRAM 803 00:49:46,190 --> 00:49:49,280 about 150 to 200 cycles. 804 00:49:49,280 --> 00:49:51,900 When you access these you get, in fact, generally for all 805 00:49:51,900 --> 00:49:55,950 these, you tend to get a 64 byte cache line. 806 00:49:55,950 --> 00:49:56,980 So you're getting more than one. 807 00:49:56,980 --> 00:49:59,770 But if what you're doing is random access in a table it 808 00:49:59,770 --> 00:50:01,890 doesn't help that all those other bytes are coming in. 809 00:50:07,250 --> 00:50:10,950 Population count three using parallel divide and conquer. 810 00:50:10,950 --> 00:50:12,200 Here's the clever one. 811 00:50:14,770 --> 00:50:16,020 Here's the code. 812 00:50:19,680 --> 00:50:22,290 It's all register operations basically. 813 00:50:26,320 --> 00:50:30,600 So it's creating some masks. 814 00:50:30,600 --> 00:50:32,950 So let's just take a look at what does this first 815 00:50:32,950 --> 00:50:34,200 instruction do? 816 00:50:36,790 --> 00:50:38,420 It's taking minus one. 817 00:50:38,420 --> 00:50:42,900 It's shifting it left 32 bits. 818 00:50:42,900 --> 00:50:45,400 So that gives all ones in the higher order half of the word, 819 00:50:45,400 --> 00:50:47,210 and all zeros in the lower half. 820 00:50:47,210 --> 00:50:50,960 And then it's xor-ing it with minus one. 821 00:50:50,960 --> 00:50:52,670 That is, with all ones.
822 00:50:52,670 --> 00:50:55,030 So that gives you a mask of ones in the low 823 00:50:55,030 --> 00:50:57,280 order half of the word. 824 00:50:57,280 --> 00:50:58,280 Yeah question? 825 00:50:58,280 --> 00:51:03,363 AUDIENCE: Why don't you just do negative one right shifted. 826 00:51:03,363 --> 00:51:04,520 Isn't there a type of- 827 00:51:04,520 --> 00:51:05,320 PROFESSOR: Yeah you can do that. 828 00:51:05,320 --> 00:51:06,620 I was trying to be consistent here. 829 00:51:06,620 --> 00:51:08,950 And, in fact, for these first two operations there are 830 00:51:08,950 --> 00:51:12,020 actually more clever ways of doing this that take fewer 831 00:51:12,020 --> 00:51:12,980 operations. 832 00:51:12,980 --> 00:51:15,395 AUDIENCE: Isn't there a right shift operator that 833 00:51:15,395 --> 00:51:16,844 [? pulls some ?] zeros in the top level? 834 00:51:16,844 --> 00:51:20,960 PROFESSOR: Yeah, so there's logical versus arithmetic 835 00:51:20,960 --> 00:51:22,120 right shift. 836 00:51:22,120 --> 00:51:22,490 Yeah. 837 00:51:22,490 --> 00:51:25,790 AUDIENCE: [INAUDIBLE] 838 00:51:25,790 --> 00:51:26,890 PROFESSOR: That's right. 839 00:51:26,890 --> 00:51:28,300 But then I wouldn't have the pattern that 840 00:51:28,300 --> 00:51:30,820 I'm setting up here. 841 00:51:30,820 --> 00:51:35,340 So yes, in fact, if you want to play with it yourself you 842 00:51:35,340 --> 00:51:37,240 can optimize these first two statements. 843 00:51:37,240 --> 00:51:41,450 They don't need to be as complicated as this one. 844 00:51:41,450 --> 00:51:43,660 But basically what you're doing in every step is you're 845 00:51:43,660 --> 00:51:46,540 shifting it over, half the word, xor-ing it. 846 00:51:46,540 --> 00:51:50,630 And then the second one is you get a block of 16 bits of 847 00:51:50,630 --> 00:51:54,450 zeros, 16 ones, 16 zeros, 16 ones.
848 00:51:54,450 --> 00:51:59,400 The next one you get a block of eight zeros, eight ones, 849 00:51:59,400 --> 00:52:00,370 eight zeros, eight ones. 850 00:52:00,370 --> 00:52:03,050 And so basically you're generating masks for that. 851 00:52:03,050 --> 00:52:05,850 So by the time you get down to the last one you're having 852 00:52:05,850 --> 00:52:09,030 every other bit is zero. 853 00:52:09,030 --> 00:52:11,380 You're alternating zeros and ones. 854 00:52:11,380 --> 00:52:12,960 And then, basically- well let me not go 855 00:52:12,960 --> 00:52:13,590 through the code here. 856 00:52:13,590 --> 00:52:16,090 Let me show with an example. 857 00:52:16,090 --> 00:52:18,940 The main thing to observe is that it takes log n time where 858 00:52:18,940 --> 00:52:21,400 n is the word length to do this. 859 00:52:24,070 --> 00:52:30,610 So here's population count on 32 bits, same kind of thing. 860 00:52:30,610 --> 00:52:31,860 So here's the idea. 861 00:52:34,260 --> 00:52:39,160 We extract every other bit for the two words. 862 00:52:39,160 --> 00:52:41,460 So you saw how I extracted that right? 863 00:52:41,460 --> 00:52:42,970 So we extract. 864 00:52:42,970 --> 00:52:48,660 So I can do that with a mask and a shift. 865 00:52:48,660 --> 00:52:50,330 And then I add them together. 866 00:52:54,340 --> 00:52:56,520 So just so we can see what's being added here. 867 00:52:59,040 --> 00:53:01,580 And when I add them together the largest value I'm going to 868 00:53:01,580 --> 00:53:04,820 have in any one of these things is what? 869 00:53:04,820 --> 00:53:05,940 AUDIENCE: Two. 870 00:53:05,940 --> 00:53:08,630 PROFESSOR: And [? that, ?] fortunately, fits in two bits. 871 00:53:11,640 --> 00:53:14,840 So we can get off the ground. 872 00:53:14,840 --> 00:53:18,220 So now every two bits has the sum of the two 873 00:53:18,220 --> 00:53:19,110 bits that were there.
874 00:53:19,110 --> 00:53:22,480 The bits that I'm not showing are all zeros because it's 875 00:53:22,480 --> 00:53:23,720 done on 64-bit words. 876 00:53:23,720 --> 00:53:27,390 The bits I'm not showing are all zeros. 877 00:53:27,390 --> 00:53:28,640 So now what do we do? 878 00:53:32,780 --> 00:53:37,820 We mask and shift and take off every two pairs of bits and 879 00:53:37,820 --> 00:53:39,070 then add them together. 880 00:53:43,770 --> 00:53:47,360 So now this guy is saying there are four bits that were 881 00:53:47,360 --> 00:53:52,040 originally in the word that we began with. 882 00:53:52,040 --> 00:53:54,620 This one says there are two bits in that range, one bit, 883 00:53:54,620 --> 00:54:00,100 one bit, two bits, three bits, two bits, two bits. 884 00:54:00,100 --> 00:54:01,350 So we do it again. 885 00:54:04,780 --> 00:54:06,030 Add it together. 886 00:54:08,495 --> 00:54:09,745 And we just keep going. 887 00:54:14,640 --> 00:54:16,510 And then finally we add them all together. 888 00:54:20,460 --> 00:54:25,860 It says there are 17 ones in the word, which there were. 889 00:54:25,860 --> 00:54:28,570 I should have probably left the word up there or something 890 00:54:28,570 --> 00:54:30,740 so we could verify that. 891 00:54:30,740 --> 00:54:32,080 But yeah there are 17 ones. 892 00:54:32,080 --> 00:54:36,720 So everybody see it's parallel divide and conquer 893 00:54:36,720 --> 00:54:40,400 because you're adding many words, many sub-pieces 894 00:54:40,400 --> 00:54:41,090 of the word [UNINTELLIGIBLE]. 895 00:54:41,090 --> 00:54:44,270 And the key thing is to make it so that no carries are 896 00:54:44,270 --> 00:54:46,510 propagating out of their range. 897 00:54:46,510 --> 00:54:48,240 But the numbers are just getting smaller and smaller. 898 00:54:48,240 --> 00:54:51,430 When you're done you're only going to have six bits here 899 00:54:51,430 --> 00:54:52,480 that are significant anyway. 900 00:54:52,480 --> 00:54:53,730 All of these will be zeros.
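The parallel divide and conquer count can be sketched in C as below. The masks are built the way the lecture describes, by shifting and xor-ing, and each step adds neighboring fields of 1, 2, 4, 8, 16, and then 32 bits. The function name popcount_parallel and the mask names m0 through m5 are assumptions, and the exact statements on the slide may differ from this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: counts the one bits in a 64-bit word in six
   mask-shift-add steps, with no branches and no table. */
unsigned popcount_parallel(uint64_t x) {
    const uint64_t ones = ~UINT64_C(0);        /* "minus one": all ones  */
    const uint64_t m5 = (ones << 32) ^ ones;   /* 0x00000000FFFFFFFF     */
    const uint64_t m4 = m5 ^ (m5 << 16);       /* 0x0000FFFF0000FFFF     */
    const uint64_t m3 = m4 ^ (m4 << 8);        /* 0x00FF00FF00FF00FF     */
    const uint64_t m2 = m3 ^ (m3 << 4);        /* 0x0F0F0F0F0F0F0F0F     */
    const uint64_t m1 = m2 ^ (m2 << 2);        /* 0x3333333333333333     */
    const uint64_t m0 = m1 ^ (m1 << 1);        /* 0x5555555555555555     */

    /* Each 2-bit field becomes the sum of its two bits (at most 2). */
    x = ((x >> 1) & m0) + (x & m0);
    /* Each 4-bit field becomes the sum of two 2-bit fields (at most 4). */
    x = ((x >> 2) & m1) + (x & m1);
    /* From here on a field has room for the sum before masking, so no
       carry can leave its range and one mask after the add suffices. */
    x = ((x >> 4) + x) & m2;    /* per byte, at most 8          */
    x = ((x >> 8) + x) & m3;    /* per 16 bits, at most 16      */
    x = ((x >> 16) + x) & m4;   /* per 32 bits, at most 32      */
    x = ((x >> 32) + x) & m5;   /* whole word, at most 64       */
    return (unsigned)x;
}
```

The work is six steps for a 64-bit word, which is the log n behavior mentioned above, and it is the same for every input.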
901 00:54:58,000 --> 00:54:59,250 Is that cool? 902 00:55:03,310 --> 00:55:04,560 So there's a 17, yeah. 903 00:55:11,720 --> 00:55:14,960 Here's a problem for which bit representations 904 00:55:14,960 --> 00:55:16,370 are a lot of fun. 905 00:55:16,370 --> 00:55:18,760 Last year we gave this as a problem to students. 906 00:55:18,760 --> 00:55:21,600 This year we're giving you a different problem so that lets 907 00:55:21,600 --> 00:55:24,280 me lecture on it. 908 00:55:24,280 --> 00:55:28,250 So many people are probably familiar with this problem. 909 00:55:28,250 --> 00:55:30,460 It's an old chestnut. 910 00:55:30,460 --> 00:55:33,020 But basically the n queens problem is to place n queens 911 00:55:33,020 --> 00:55:37,650 on an n by n chess board so that no queen attacks another. 912 00:55:37,650 --> 00:55:43,110 So there are no two queens in any row, column, or diagonal. 913 00:55:43,110 --> 00:55:46,000 So queens kind of move like this. 914 00:55:46,000 --> 00:55:46,925 It's got to be clear. 915 00:55:46,925 --> 00:55:49,950 If I did that around any one of these guys they wouldn't 916 00:55:49,950 --> 00:55:50,850 hit anybody else. 917 00:55:50,850 --> 00:55:53,720 In fact, this arrangement here is, I think, one of the few 918 00:55:53,720 --> 00:55:54,430 symmetric ones. 919 00:55:54,430 --> 00:55:56,430 Maybe it's the only symmetric one. 920 00:55:56,430 --> 00:55:57,490 It's radially symmetric. 921 00:55:57,490 --> 00:55:58,795 Most of them are more scattered. 922 00:56:01,370 --> 00:56:03,790 So the question is how do you find such a thing, or count 923 00:56:03,790 --> 00:56:05,090 the number of solutions, is another 924 00:56:05,090 --> 00:56:06,340 popular one, et cetera. 925 00:56:10,820 --> 00:56:14,150 A popular strategy for this is called backtracking search. 926 00:56:14,150 --> 00:56:16,070 And we're going to have in your homework a different 927 00:56:16,070 --> 00:56:19,350 backtracking search.
928 00:56:19,350 --> 00:56:22,490 And the idea is you just simply try to place the queens 929 00:56:22,490 --> 00:56:24,830 row by row. 930 00:56:24,830 --> 00:56:29,750 So, for example, we start out with the first row, row zero, 931 00:56:29,750 --> 00:56:31,610 and we place a queen. 932 00:56:31,610 --> 00:56:33,980 Then we go to the next row. 933 00:56:33,980 --> 00:56:37,640 And we try to see if a queen works on that square, nope, 934 00:56:37,640 --> 00:56:40,230 nope, yes it works there. 935 00:56:40,230 --> 00:56:43,410 Now we go on to the next row. 936 00:56:43,410 --> 00:56:45,500 So this is making progress. 937 00:56:45,500 --> 00:56:46,360 We're placing queens. 938 00:56:46,360 --> 00:56:47,660 We're going to more rows. 939 00:56:47,660 --> 00:56:50,420 We're going to get to the end of the rows, right? 940 00:56:50,420 --> 00:57:00,030 So we keep- yep, we found it. 941 00:57:00,030 --> 00:57:01,200 And we keep going. 942 00:57:01,200 --> 00:57:04,330 This is easy. 943 00:57:04,330 --> 00:57:05,750 Found it right after two there. 944 00:57:05,750 --> 00:57:07,000 That's pretty good. 945 00:57:16,824 --> 00:57:19,170 Found it there. 946 00:57:19,170 --> 00:57:20,495 Look, we're making great progress. 947 00:57:23,440 --> 00:57:26,730 Doesn't go there, doesn't go there, doesn't go there, 948 00:57:26,730 --> 00:57:30,740 doesn't go there, oops doesn't go anywhere. 949 00:57:30,740 --> 00:57:32,210 So what do we do? 950 00:57:32,210 --> 00:57:35,060 We backtrack. 951 00:57:35,060 --> 00:57:38,340 We say, gee, if it didn't fit in any of those but it had 952 00:57:38,340 --> 00:57:41,260 nothing to do with the placement there, it's the 953 00:57:41,260 --> 00:57:44,490 fault of the guy who came before me. 954 00:57:44,490 --> 00:57:47,580 So that position is not a valid position, at least with 955 00:57:47,580 --> 00:57:48,600 that prefix. 956 00:57:48,600 --> 00:57:49,900 So we continue with him.
957 00:57:56,150 --> 00:57:57,690 Aha, we found another place for him. 958 00:58:00,280 --> 00:58:02,140 So then we try this one. 959 00:58:08,910 --> 00:58:16,060 Oops this doesn't look good, aha, got to backtrack. 960 00:58:16,060 --> 00:58:18,055 Whoops that's the last one in the row, got 961 00:58:18,055 --> 00:58:21,490 to backtrack again. 962 00:58:21,490 --> 00:58:24,335 So that means that guy can't go there. 963 00:58:28,950 --> 00:58:32,300 Found a place, now we get to go forward, hooray. 964 00:58:32,300 --> 00:58:33,580 And you keep going on. 965 00:58:33,580 --> 00:58:35,010 So you backtrack, et cetera, until you finally 966 00:58:35,010 --> 00:58:37,490 find a place for them. 967 00:58:37,490 --> 00:58:42,260 So the backtracking search is pretty interesting. 968 00:58:42,260 --> 00:58:47,040 But the question is how do you represent it so that it can go 969 00:58:47,040 --> 00:58:48,290 really fast. 970 00:58:50,710 --> 00:58:52,950 So here are some ideas. 971 00:58:52,950 --> 00:58:56,020 The first idea you might come up with is to use an array of 972 00:58:56,020 --> 00:59:00,300 n squared bytes, where you put a value in the byte if there's 973 00:59:00,300 --> 00:59:02,360 a queen there. 974 00:59:02,360 --> 00:59:05,430 You should, at this point, figure out that, gee all I 975 00:59:05,430 --> 00:59:08,350 have to know is whether a queen is there or not. 976 00:59:08,350 --> 00:59:11,300 So why should I keep a byte? 977 00:59:11,300 --> 00:59:12,870 Why not just keep a bit? 978 00:59:12,870 --> 00:59:15,560 That'll be smaller and have a smaller representation. 979 00:59:15,560 --> 00:59:17,330 So let's keep n squared bits. 980 00:59:21,340 --> 00:59:23,960 Well let's see. 981 00:59:23,960 --> 00:59:28,320 If I'm only putting a queen in one place in every row I never 982 00:59:28,320 --> 00:59:34,510 have to have more than one bit set in any row. 983 00:59:34,510 --> 00:59:42,420 So why not just say the column number in the row that I'm in?
984 00:59:42,420 --> 00:59:50,500 So rather than using an array of n bits for every row let me 985 00:59:50,500 --> 00:59:59,010 just use an index of a byte to say which column that one queen 986 00:59:59,010 --> 01:00:01,550 is in because there can't be any other queens in that row 987 01:00:01,550 --> 01:00:02,800 in a legal configuration. 988 01:00:05,330 --> 01:00:06,630 So that's actually more clever. 989 01:00:06,630 --> 01:00:08,490 And that's the way most people code it. 990 01:00:08,490 --> 01:00:12,000 But we're going to look at a solution that was originally 991 01:00:12,000 --> 01:00:18,390 due to Edsger Dijkstra of using three bit vectors to 992 01:00:18,390 --> 01:00:20,825 represent the board. 993 01:00:20,825 --> 01:00:28,810 And the idea is we want to make things go- we're going to 994 01:00:28,810 --> 01:00:32,550 use three bit vectors that are relatively small. 995 01:00:32,550 --> 01:00:34,190 So it turns out that n queens is an 996 01:00:34,190 --> 01:00:36,780 exponential search problem. 997 01:00:36,780 --> 01:00:43,920 And so you really can't run n queens on a 128 by 128 board. 998 01:00:43,920 --> 01:00:47,180 You can run that if you're interested in one solution. 999 01:00:47,180 --> 01:00:48,720 You can't count up how many solutions. 1000 01:00:48,720 --> 01:00:51,450 And if you go to Wikipedia and look at n queens they will 1001 01:00:51,450 --> 01:00:56,050 tell you what all the latest records are for who has 1002 01:00:56,050 --> 01:00:59,740 computed how many solutions there are on an n by n board 1003 01:00:59,740 --> 01:01:03,570 for n up to some rather small number. 1004 01:01:06,440 --> 01:01:10,945 So it's a way to get your name on the web, which, as you 1005 01:01:10,945 --> 01:01:12,260 know, is very difficult to do. 1006 01:01:15,500 --> 01:01:18,240 So let's see this three bit vector trick.
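For comparison with the bit-vector approach, the "one column index per row" representation with row-by-row backtracking might look like the sketch below. It is an illustration under assumptions rather than the lecture's code: the names is_safe, place_rows, and count_queens_array are made up here, it uses int instead of a byte for each index, and it caps n at 32 only to keep the array on the stack.

```c
#include <stdlib.h>

/* queen_col[r] holds the column of the queen placed in row r.
 * All names in this sketch are hypothetical. */
static int is_safe(const int *queen_col, int r, int c) {
    for (int prev = 0; prev < r; prev++) {
        int pc = queen_col[prev];
        /* Same column, or same diagonal (equal row and column distance). */
        if (pc == c || abs(pc - c) == r - prev) {
            return 0;
        }
    }
    return 1;
}

/* Try every column in row r; recurse on success, and fall out of the
 * loop (backtrack) when no column works. Returns the solution count. */
static long place_rows(int *queen_col, int r, int n) {
    if (r == n) {
        return 1; /* every row has a queen */
    }
    long count = 0;
    for (int c = 0; c < n; c++) {
        if (is_safe(queen_col, r, c)) {
            queen_col[r] = c;
            count += place_rows(queen_col, r + 1, n);
        }
    }
    return count;
}

long count_queens_array(int n) {
    int queen_col[32];
    if (n < 1 || n > 32) {
        return 0; /* outside the range this sketch supports */
    }
    return place_rows(queen_col, 0, n);
}
```

The safety check here scans all earlier rows, so it costs time proportional to the row number; that scan is what the bit-vector representation replaces with a few register operations.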
1007 01:01:18,240 --> 01:01:21,870 So the idea of the bit vector trick is that, for any partial 1008 01:01:21,870 --> 01:01:25,310 placement, rather than representing where the queens 1009 01:01:25,310 --> 01:01:27,380 are on the board, what I really care about is which 1010 01:01:27,380 --> 01:01:28,840 columns have been knocked out. 1011 01:01:33,310 --> 01:01:37,320 So, therefore, what I'll do is I'll store a one if there's a 1012 01:01:37,320 --> 01:01:41,150 queen in that column and a zero if I don't have a queen 1013 01:01:41,150 --> 01:01:42,400 in that column. 1014 01:01:44,630 --> 01:01:47,410 So the point is I can keep the whole representation of the 1015 01:01:47,410 --> 01:01:52,310 column mask in a word that just tells me whether I have 1016 01:01:52,310 --> 01:01:54,970 ones or zeros in a column. 1017 01:01:54,970 --> 01:01:59,680 Now how do I know whether it's safe to place a queen in a 1018 01:01:59,680 --> 01:02:00,930 given column? 1019 01:02:07,380 --> 01:02:08,630 What's that? 1020 01:02:11,852 --> 01:02:12,605 AUDIENCE: [INAUDIBLE] 1021 01:02:12,605 --> 01:02:15,340 PROFESSOR: How do I know, if I try to place a queen say, 1022 01:02:15,340 --> 01:02:18,530 here, how do I know whether that's OK or not? 1023 01:02:24,420 --> 01:02:27,280 Suppose that my program wants to try to put a 1024 01:02:27,280 --> 01:02:29,390 queen in this space. 1025 01:02:29,390 --> 01:02:32,070 It can't because there's something here. 1026 01:02:32,070 --> 01:02:37,490 It's just some operations on words, which we will see here. 1027 01:02:37,490 --> 01:02:40,780 So placing a queen in column c is not safe if 1028 01:02:40,780 --> 01:02:43,650 this down word here.
1029 01:02:43,650 --> 01:02:48,185 When I AND it with one shifted left by the column number 1030 01:02:48,185 --> 01:02:54,390 so that I have the position of the queen in the column if 1031 01:02:54,390 --> 01:02:56,360 that's non-zero because it means somebody 1032 01:02:56,360 --> 01:02:58,650 else is in that column. 1033 01:02:58,650 --> 01:03:01,830 If it's in a new column when I do the AND I'll get all zeros. 1034 01:03:04,600 --> 01:03:06,810 Everybody follow that? 1035 01:03:06,810 --> 01:03:10,420 So testing columns, whether it's safe to put it in a given 1036 01:03:10,420 --> 01:03:15,410 column, from the column attack we can do that really, really 1037 01:03:15,410 --> 01:03:18,330 efficiently, right? 1038 01:03:18,330 --> 01:03:19,580 These are all going to be in registers. 1039 01:03:22,070 --> 01:03:28,850 No memory operations, no table lookups, no L1 caches, all 1040 01:03:28,850 --> 01:03:30,100 right in registers. 1041 01:03:32,890 --> 01:03:37,005 Well what do we do about the diagonals? 1042 01:03:37,005 --> 01:03:40,810 So for the diagonals we can also use a bit vector 1043 01:03:40,810 --> 01:03:43,860 representation for each diagonal. 1044 01:03:43,860 --> 01:03:47,830 Where I look to see if there's a number along this diagonal, 1045 01:03:47,830 --> 01:03:50,220 and if there is a queen then it's a one. 1046 01:03:50,220 --> 01:03:54,070 And if there isn't a queen on the diagonal it's zero. 1047 01:03:54,070 --> 01:03:57,320 There are more diagonals than there are columns, right? 1048 01:03:57,320 --> 01:04:00,110 So I have a longer bit vector representation. 1049 01:04:00,110 --> 01:04:02,550 I have to represent that. 1050 01:04:02,550 --> 01:04:05,880 But I can still do that in one computer word for things of 1051 01:04:05,880 --> 01:04:08,340 different size, or even for things that are pretty good 1052 01:04:08,340 --> 01:04:14,140 size two computer words would be, certainly, ample.
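The column test described a moment ago is small enough to write out. A minimal sketch in C, assuming the column bit vector is called down as in the lecture and fits in a 32-bit word; the helper names column_attacked and occupy_column are invented for this example.

```c
#include <stdint.h>

/* Nonzero if some queen already sits in column c.
 * `down` has bit c set when column c is occupied.
 * Both helper names are hypothetical. */
int column_attacked(uint32_t down, int c) {
    return (down & (UINT32_C(1) << c)) != 0;
}

/* Record a queen in column c by setting its bit. */
uint32_t occupy_column(uint32_t down, int c) {
    return down | (UINT32_C(1) << c);
}
```

Both helpers compile down to a shift plus one AND or OR, which is why the whole test can stay in registers.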
1053 01:04:14,140 --> 01:04:18,600 So now how do I tell whether or not a queen placed on a 1054 01:04:18,600 --> 01:04:26,250 given square can legally be placed there? 1055 01:04:29,620 --> 01:04:33,570 So it turns out it's not safe to place it if when I take my 1056 01:04:33,570 --> 01:04:39,400 right array, and I take n minus r plus c, and left shift a one by 1057 01:04:39,400 --> 01:04:43,810 that amount, and AND the two, that's non-zero. 1058 01:04:43,810 --> 01:04:47,180 So here I'm indexing rows and columns from the upper left 1059 01:04:47,180 --> 01:04:49,720 hand corner. 1060 01:04:49,720 --> 01:04:52,620 And so, basically, what you're trying to say is, for a given 1061 01:04:52,620 --> 01:04:57,970 square, notice that if I increase the row, that's this 1062 01:04:57,970 --> 01:05:06,660 way, I get to the same- or if I'm increasing the row, I'm 1063 01:05:06,660 --> 01:05:10,110 decreasing the diagonal that I'm on. 1064 01:05:10,110 --> 01:05:13,460 But if I'm increasing the column I'm increasing the 1065 01:05:13,460 --> 01:05:16,730 diagonal I'm on. 1066 01:05:16,730 --> 01:05:19,340 So it's basically a difference here. 1067 01:05:19,340 --> 01:05:24,230 And then you just normalize it and with essentially very, 1068 01:05:24,230 --> 01:05:26,650 very few operations I can tell whether there's a 1069 01:05:26,650 --> 01:05:28,410 conflict on a diagonal. 1070 01:05:28,410 --> 01:05:31,190 Of course, for both this and the other one, if I need to 1071 01:05:31,190 --> 01:05:34,170 set it now that's pretty simple also. 1072 01:05:34,170 --> 01:05:37,320 I just take this and OR it with right. 1073 01:05:37,320 --> 01:05:41,630 If my test is good, I just take this, OR it with right, 1074 01:05:41,630 --> 01:05:43,040 and now that's my new right. 1075 01:05:46,960 --> 01:05:49,530 And left is similar. 1076 01:05:49,530 --> 01:05:52,860 So, once again, we have these guys going this way, placing a 1077 01:05:52,860 --> 01:05:54,560 queen in row r.
1078 01:05:54,560 --> 01:05:58,980 And column c is not safe if- and now I just look at row 1079 01:05:58,980 --> 01:06:02,070 plus column because these diagonals increase with both 1080 01:06:02,070 --> 01:06:04,280 row and column. 1081 01:06:04,280 --> 01:06:07,970 You're increasing the diagonal for both row and column. 1082 01:06:07,970 --> 01:06:09,130 And so I'm not going to go through all the 1083 01:06:09,130 --> 01:06:10,060 details of the code. 1084 01:06:10,060 --> 01:06:15,000 But you can see that with this representation, literally, the 1085 01:06:15,000 --> 01:06:18,060 inner loop of your program, which is testing whether 1086 01:06:18,060 --> 01:06:21,505 queens fit on boards and then setting them if they do, you 1087 01:06:21,505 --> 01:06:25,040 can do with just three words and a few operations. 1088 01:06:25,040 --> 01:06:25,856 Question? 1089 01:06:25,856 --> 01:06:31,688 AUDIENCE: Can you mask the three bits together and check 1090 01:06:31,688 --> 01:06:35,090 if one position is taken up? 1091 01:06:35,090 --> 01:06:40,193 You'd have to do some creative stuff with lining up the row 1092 01:06:40,193 --> 01:06:43,760 and column vectors with the diagonals, but- 1093 01:06:43,760 --> 01:06:46,020 PROFESSOR: So typically here you're looking at 1094 01:06:46,020 --> 01:06:48,410 both row and column. 1095 01:06:48,410 --> 01:06:51,470 So you're adding them, whereas on the previous one you were 1096 01:06:51,470 --> 01:06:53,480 subtracting. 1097 01:06:53,480 --> 01:06:56,000 And then the first one you didn't even care about what 1098 01:06:56,000 --> 01:06:57,120 the column was. 1099 01:06:57,120 --> 01:07:00,940 So I'm not sure I would know how to combine those. 1100 01:07:00,940 --> 01:07:02,995 It's conceivable you could do it. 1101 01:07:02,995 --> 01:07:04,450 AUDIENCE: [INAUDIBLE] 1102 01:07:04,450 --> 01:07:08,330 column vector, you could add the two together and find 1103 01:07:08,330 --> 01:07:11,163 specific positions that are free. 
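Putting the three tests together, a backtracking counter built on the down, left, and right bit vectors could be sketched like this. It follows the index formulas given in the lecture (column c, r + c, and n - r + c) with rows and columns numbered from zero, but the function names place and count_queens_bitvec, the use of 64-bit words, and the cap of n at 32 are assumptions made for this example.

```c
#include <stdint.h>

/* down:  bit c set         => column c holds a queen
 * left:  bit (r + c) set    => that diagonal holds a queen
 * right: bit (n - r + c) set => that diagonal holds a queen
 * Rows and columns are numbered from 0. Names are hypothetical. */
static long place(int n, int r, uint64_t down, uint64_t left, uint64_t right) {
    if (r == n) {
        return 1; /* all n rows filled */
    }
    long count = 0;
    for (int c = 0; c < n; c++) {
        uint64_t col_bit   = UINT64_C(1) << c;
        uint64_t left_bit  = UINT64_C(1) << (r + c);
        uint64_t right_bit = UINT64_C(1) << (n - r + c);
        /* Three ANDs decide whether the square (r, c) is attacked. */
        if ((down & col_bit) || (left & left_bit) || (right & right_bit)) {
            continue;
        }
        /* Three ORs record the new queen; passing the updated words by
         * value means backtracking needs no explicit undo step. */
        count += place(n, r + 1, down | col_bit, left | left_bit, right | right_bit);
    }
    return count;
}

long count_queens_bitvec(int n) {
    if (n < 1 || n > 32) {
        return 0; /* diagonal indices go up to 2n - 1, which must stay below 64 */
    }
    return place(n, 0, 0, 0, 0);
}
```

Counting all solutions this way is only practical for small boards, since the search is exponential in n.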
1104 01:07:11,163 --> 01:07:13,590 PROFESSOR: Yeah so you could also do a generation that says 1105 01:07:13,590 --> 01:07:16,290 here are the ones that are free and then use things like 1106 01:07:16,290 --> 01:07:19,470 the least significant bit trick to pull out what are the 1107 01:07:19,470 --> 01:07:20,990 positions I should bother to check. 1108 01:07:24,060 --> 01:07:25,670 So here I can test that it's safe. 1109 01:07:25,670 --> 01:07:29,720 But I could also generate using similar tricks, which is 1110 01:07:29,720 --> 01:07:33,270 what you're saying, generate all of the bit positions on a 1111 01:07:33,270 --> 01:07:38,290 given row where it would be safe to put a queen. 1112 01:07:38,290 --> 01:07:41,960 Yep, yep, good. 1113 01:07:41,960 --> 01:07:47,130 So fast programs use this technique. 1114 01:07:47,130 --> 01:07:50,560 So you see that there's a lot of cleverness in these kinds 1115 01:07:50,560 --> 01:07:52,735 of techniques. 1116 01:07:55,600 --> 01:08:01,490 So there are a whole bunch of other bit hacking techniques. 1117 01:08:01,490 --> 01:08:06,760 One really good resource for it is this webpage. 1118 01:08:06,760 --> 01:08:11,610 And of course I'll put this up on the- called Bit Twiddling 1119 01:08:11,610 --> 01:08:14,070 Hacks, where he's compiled- there's a lot of people that 1120 01:08:14,070 --> 01:08:15,390 have worked on different bit twiddling hacks. 1121 01:08:15,390 --> 01:08:18,439 And he's done a very good job of compiling what he thinks 1122 01:08:18,439 --> 01:08:21,740 are the best code sequences for a whole bunch of things, 1123 01:08:21,740 --> 01:08:25,065 including things like reversing the bits in a word. 1124 01:08:25,065 --> 01:08:27,620 If you think about it that could be kind of tricky. 1125 01:08:27,620 --> 01:08:29,210 Actually turns out to be relevant to 1126 01:08:29,210 --> 01:08:32,880 your homework as well.
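The idea of generating the free positions on a row and peeling them off with the least significant bit trick can be sketched as follows. This is a commonly used variant rather than anything shown in the lecture: instead of indexing diagonals by r + c and n - r + c, it keeps the diagonal masks relative to the current row and shifts them by one position per row. The names place_free and count_queens_lsb are assumptions, and n is capped at 31 so that the board mask fits in a 32-bit word.

```c
#include <stdint.h>

/* all:   low n bits set, one per column of the board
 * down:  columns already occupied
 * left:  squares in the current row attacked along one diagonal direction
 * right: squares in the current row attacked along the other direction
 * Names are hypothetical. */
static long place_free(uint32_t all, uint32_t down, uint32_t left, uint32_t right) {
    if (down == all) {
        return 1; /* every column is occupied, so n queens are placed */
    }
    long count = 0;
    /* One expression yields every safe square in this row. */
    uint32_t free_bits = all & ~(down | left | right);
    while (free_bits != 0) {
        /* Isolate the least significant set bit: x & (-x), written here
         * as x & (~x + 1) to stay in unsigned arithmetic. */
        uint32_t bit = free_bits & (~free_bits + 1u);
        free_bits ^= bit; /* remove it from the candidates */
        /* Moving down one row shifts each diagonal by one column. */
        count += place_free(all, down | bit, (left | bit) << 1, (right | bit) >> 1);
    }
    return count;
}

long count_queens_lsb(int n) {
    if (n < 1 || n > 31) {
        return 0; /* keeps (1 << n) - 1 well defined for a 32-bit word */
    }
    uint32_t all = (UINT32_C(1) << n) - 1u;
    return place_free(all, 0, 0, 0);
}
```

Compared with testing each column in turn, the loop body here runs only for squares that are actually safe, so attacked squares cost nothing beyond the single mask computation.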
1127 01:08:32,880 --> 01:08:37,770 So on your homework, so lab one will be- it's posted I 1128 01:08:37,770 --> 01:08:40,740 gather, right? 1129 01:08:40,740 --> 01:08:43,250 It'll be posted shortly. 1130 01:08:43,250 --> 01:08:47,115 We have you trying to take advantage of some of these bit 1131 01:08:47,115 --> 01:08:54,220 tricks in a couple of warm up exercises and then a 1132 01:08:54,220 --> 01:08:56,819 backtracking search algorithm. 1133 01:08:56,819 --> 01:09:00,680 And so I think you'll find it's a lot of fun. 1134 01:09:00,680 --> 01:09:04,380 You'll learn a lot about all the kinds of tricks you can do 1135 01:09:04,380 --> 01:09:08,380 to make stuff go fast by using register operations locally, 1136 01:09:08,380 --> 01:09:10,775 and using good representations for your storage. 1137 01:09:13,430 --> 01:09:14,880 Yes, also, announcement. 1138 01:09:14,880 --> 01:09:24,819 Tonight, at 7 o'clock, in 32-144, there is a primer on C. 1139 01:09:24,819 --> 01:09:27,470 So if you want to brush up on your C or if you want to learn 1140 01:09:27,470 --> 01:09:32,210 C, this is a good time to go. 1141 01:09:32,210 --> 01:09:41,750 There's a lot of good nuggets of wisdom coming out for C. 1142 01:09:41,750 --> 01:09:42,750 OK, thanks very much. 1143 01:09:42,750 --> 01:09:44,440 See you Thursday.