1 00:00:01,810 --> 00:00:04,240 The following content is provided under a Creative 2 00:00:04,240 --> 00:00:05,650 Commons license. 3 00:00:05,650 --> 00:00:08,680 Your support will help MIT OpenCourseWare continue to 4 00:00:08,680 --> 00:00:12,340 offer high quality educational resources for free. 5 00:00:12,340 --> 00:00:15,230 To make a donation or view additional materials from 6 00:00:15,230 --> 00:00:19,160 hundreds of MIT courses, visit MIT OpenCourseWare at 7 00:00:19,160 --> 00:00:20,410 ocw.mit.edu. 8 00:00:26,000 --> 00:00:27,220 CHARLES LEISERSON: So today we're going to take a little 9 00:00:27,220 --> 00:00:33,120 bit closer look at what's happening under the covers 10 00:00:33,120 --> 00:00:37,240 when you compile a C program. 11 00:00:37,240 --> 00:00:42,080 But before we get into that, we did a little interesting 12 00:00:42,080 --> 00:00:51,160 correlation on your scores for the first problem, for the 13 00:00:51,160 --> 00:00:52,760 every bit one. 14 00:00:52,760 --> 00:00:55,810 So this is basically plotting. 15 00:00:55,810 --> 00:01:00,350 It's a scatter plot of how you did in your test coverage 16 00:01:00,350 --> 00:01:07,410 score versus how you did in your correctness score and 17 00:01:07,410 --> 00:01:11,050 performance, correctness and performance together. 18 00:01:11,050 --> 00:01:15,270 And what's interesting is that if you did better in your test 19 00:01:15,270 --> 00:01:19,940 coverage, you did better in your performance and 20 00:01:19,940 --> 00:01:22,161 correctness. 21 00:01:22,161 --> 00:01:24,640 OK, that's a pretty good correlation, right? 22 00:01:24,640 --> 00:01:26,465 There's some outliers here. 23 00:01:26,465 --> 00:01:27,925 But that's a pretty good correlation. 24 00:01:36,170 --> 00:01:36,890 Yeah, John? 25 00:01:36,890 --> 00:01:37,643 Yeah? 26 00:01:37,643 --> 00:01:38,106 JOHN: The-- 27 00:01:38,106 --> 00:01:39,340 CHARLES LEISERSON: Do we have a handheld? 28 00:01:39,340 --> 00:01:40,360 Here we go. 
29 00:01:40,360 --> 00:01:41,610 Just a second. 30 00:01:47,660 --> 00:01:49,610 The only thing is we have to figure out how to turn it on. 31 00:01:49,610 --> 00:01:50,860 There we go. 32 00:01:52,885 --> 00:01:55,640 JOHN: Yeah, so just to clarify, the people who 33 00:01:55,640 --> 00:01:58,515 seemingly got no test coverage but really good performance 34 00:01:58,515 --> 00:02:03,420 scores, what actually happened was that we tested for things 35 00:02:03,420 --> 00:02:07,630 that we didn't expect students to cover, like feeding 36 00:02:07,630 --> 00:02:12,510 invalid values to bitarray_set and bitarray_get or testing 37 00:02:12,510 --> 00:02:14,830 for private functions to their implementations. 38 00:02:14,830 --> 00:02:19,190 So in reality they had better test suites than the coverage 39 00:02:19,190 --> 00:02:20,440 score would indicate. 40 00:02:26,970 --> 00:02:28,140 CHARLES LEISERSON: So what are the lessons that 41 00:02:28,140 --> 00:02:31,430 one draws from this? 42 00:02:31,430 --> 00:02:35,270 So professional engineers know what the lessons are. 43 00:02:35,270 --> 00:02:39,320 So the lessons are that it is actually better, if you have a 44 00:02:39,320 --> 00:02:44,910 coding problem to do, to write tests first. 45 00:02:44,910 --> 00:02:48,850 Before you code you write your tests. 46 00:02:48,850 --> 00:02:52,360 And that actually speeds the development of 47 00:02:52,360 --> 00:02:53,880 fast correct code. 48 00:02:53,880 --> 00:02:55,080 It's actually faster. 49 00:02:55,080 --> 00:02:58,340 You get to the end result much faster. 
50 00:02:58,340 --> 00:03:02,820 Because whenever you make an error in your program, you 51 00:03:02,820 --> 00:03:07,360 instantly know that you may have a problem rather than 52 00:03:07,360 --> 00:03:10,380 thinking that you're doing something OK and then 53 00:03:10,380 --> 00:03:15,120 discovering that, oh, in fact your code is, in fact, 54 00:03:15,120 --> 00:03:18,120 incorrect, and you're working away optimizing something 55 00:03:18,120 --> 00:03:25,030 that's not working. So before coding, it's highly 56 00:03:25,030 --> 00:03:26,620 recommended that you write tests. 57 00:03:26,620 --> 00:03:31,740 Also if you find a bug, when you find a bug, the first 58 00:03:31,740 --> 00:03:35,840 thing you should do is write a test for that bug if it wasn't 59 00:03:35,840 --> 00:03:38,310 already covered. 60 00:03:38,310 --> 00:03:40,490 Then you fix the bug. 61 00:03:40,490 --> 00:03:43,860 And then you make sure that your test now, that your new 62 00:03:43,860 --> 00:03:46,830 implementation, passes that particular one. 63 00:03:46,830 --> 00:03:48,850 Professional engineers know this. 64 00:03:48,850 --> 00:03:52,310 Professional software developers know this. 65 00:03:52,310 --> 00:03:54,360 It comes hard. 66 00:03:54,360 --> 00:03:59,760 And if you want a job at any top flight software firm, 67 00:03:59,760 --> 00:04:02,520 they're going to expect that you know that you write tests 68 00:04:02,520 --> 00:04:04,990 first before you do coding. 69 00:04:08,100 --> 00:04:12,030 The second lesson isn't quite so obvious from this. 70 00:04:12,030 --> 00:04:14,780 But it's the second lesson that I think some people 71 00:04:14,780 --> 00:04:20,120 experienced in the class which was the idea of putting you in 72 00:04:20,120 --> 00:04:22,800 groups, in particular in pairs, was not so that you 73 00:04:22,800 --> 00:04:26,930 could do divide and conquer on the code. 74 00:04:26,930 --> 00:04:29,550 It was to do pair programming. 
75 00:04:29,550 --> 00:04:33,380 And what we found was that a bunch of groups 76 00:04:33,380 --> 00:04:35,540 divided up the work. 77 00:04:35,540 --> 00:04:38,150 And they said, OK, it'll go faster if you do this one and 78 00:04:38,150 --> 00:04:39,800 I do that one. 79 00:04:39,800 --> 00:04:42,810 Once again that's probably a mistake. 80 00:04:42,810 --> 00:04:47,010 If you can sit together and take turns at the keyboard 81 00:04:47,010 --> 00:04:49,890 making the changes, it may seem like it's going slower to 82 00:04:49,890 --> 00:04:53,580 begin with, but it's amazing how many errors you catch and 83 00:04:53,580 --> 00:04:57,890 how quickly you find your errors because you're just 84 00:04:57,890 --> 00:04:59,560 talking with each other. 85 00:04:59,560 --> 00:05:02,990 And it's like, oh, duh. 86 00:05:02,990 --> 00:05:06,860 So good programmers know this. 87 00:05:06,860 --> 00:05:10,020 That it really helps to have more than one person 88 00:05:10,020 --> 00:05:12,560 understand what's going on in the code. 89 00:05:12,560 --> 00:05:15,580 So the people who had difficulty with their partners 90 00:05:15,580 --> 00:05:19,660 one way or another often did not, it was partly because 91 00:05:19,660 --> 00:05:22,560 they just divided up the work, you're responsible for that, 92 00:05:22,560 --> 00:05:25,290 oh, we got a bad grade on that, that's your fault. 93 00:05:25,290 --> 00:05:29,540 No, both partners own that grade 100%. 94 00:05:29,540 --> 00:05:35,220 And the best way to ensure is to work together. 95 00:05:35,220 --> 00:05:39,060 Now, this sometimes flies in the face of people who believe 96 00:05:39,060 --> 00:05:42,440 that they are clever or more experienced than somebody, 97 00:05:42,440 --> 00:05:47,720 than their partner, oh, I can do this much better on my own. 
98 00:05:47,720 --> 00:05:51,370 Usually, that's true for little projects, but as the 99 00:05:51,370 --> 00:05:59,390 projects get bigger that becomes a much harder 100 00:05:59,390 --> 00:06:01,350 situation to deal with. 101 00:06:01,350 --> 00:06:04,980 It becomes the case that you really want two brains looking 102 00:06:04,980 --> 00:06:09,100 at the same thing, four eyes as opposed to two 103 00:06:09,100 --> 00:06:12,310 eyes looking at things. 104 00:06:12,310 --> 00:06:16,060 But I think, in particular, before coding, write tests. 105 00:06:16,060 --> 00:06:18,320 And we are right now working on improving the 106 00:06:18,320 --> 00:06:19,620 infrastructure. 107 00:06:19,620 --> 00:06:24,940 One of the things that they have in most companies is, at 108 00:06:24,940 --> 00:06:29,510 the very minimum, they have what's called a nightly build. 109 00:06:29,510 --> 00:06:32,710 Nightly build says they take all the software, they build 110 00:06:32,710 --> 00:06:35,800 it, and then they run regression tests against it 111 00:06:35,800 --> 00:06:37,990 all night while everybody's home sleeping. 112 00:06:37,990 --> 00:06:40,480 Come in the next morning, here's the things that broke. 113 00:06:40,480 --> 00:06:44,120 And if you broke the build, you got some 114 00:06:44,120 --> 00:06:46,760 work to do that morning. 115 00:06:46,760 --> 00:06:50,260 And it's generally not a good idea to break the build. 116 00:06:50,260 --> 00:06:54,050 What has been demonstrated, in fact, is that continuous 117 00:06:54,050 --> 00:06:56,520 build is even better. 118 00:06:56,520 --> 00:06:58,940 This is where, whenever you make a change to the program, 119 00:06:58,940 --> 00:07:02,250 you run the full suite of tests on it. 120 00:07:02,250 --> 00:07:04,160 And we're going to look into it. 121 00:07:04,160 --> 00:07:05,830 We have to see what our resources are. 122 00:07:05,830 --> 00:07:09,660 As you know, our TAs are a limited resource. 
123 00:07:09,660 --> 00:07:11,820 But we're going to look into seeing whether we can provide 124 00:07:11,820 --> 00:07:14,030 more of that kind of infrastructure on some of the 125 00:07:14,030 --> 00:07:15,940 later projects for you folks. 126 00:07:15,940 --> 00:07:18,190 So you can sort of see the matrix that we 127 00:07:18,190 --> 00:07:19,370 eventually got to you. 128 00:07:19,370 --> 00:07:21,920 You can see that develop in real time. 129 00:07:21,920 --> 00:07:24,650 How am I doing against other people's tests? 130 00:07:24,650 --> 00:07:26,740 How are they doing against my tests, et cetera? 131 00:07:29,600 --> 00:07:31,410 So we'll see whether we can do that. 132 00:07:31,410 --> 00:07:34,810 But what's funny is you think that it'd be faster to just 133 00:07:34,810 --> 00:07:35,950 code and do it. 134 00:07:35,950 --> 00:07:39,120 Computer science is full of wonderful paradoxes. 135 00:07:39,120 --> 00:07:44,850 And one of them is that doing things like writing the extra 136 00:07:44,850 --> 00:07:51,120 code to test is actually faster than not writing it, 137 00:07:51,120 --> 00:07:52,650 surprisingly. 138 00:07:52,650 --> 00:07:55,460 It really gets you to the end result a lot faster. 139 00:07:55,460 --> 00:07:56,710 Any questions about that? 140 00:07:59,200 --> 00:08:00,450 Any comments about that? 141 00:08:03,410 --> 00:08:08,330 Let's talk about our today. 142 00:08:08,330 --> 00:08:11,000 So today we're going to talk mostly about single threaded 143 00:08:11,000 --> 00:08:12,990 performance. 144 00:08:12,990 --> 00:08:15,020 This is one instruction stream that you're 145 00:08:15,020 --> 00:08:16,970 trying to make go fast. 146 00:08:16,970 --> 00:08:23,620 But if you look at today's computing milieu, how all of 147 00:08:23,620 --> 00:08:26,380 the computers are used, what do you have? 148 00:08:26,380 --> 00:08:29,650 You've got networks of multi-core clusters. 149 00:08:29,650 --> 00:08:32,270 It's parallelism everywhere. 
150 00:08:32,270 --> 00:08:35,950 You've got shared memory among processors within a chip. 151 00:08:35,950 --> 00:08:39,409 You've got message passing among machines in a cluster. 152 00:08:39,409 --> 00:08:42,210 You've got network protocols among clusters so that you can 153 00:08:42,210 --> 00:08:47,400 do wide area things. 154 00:08:47,400 --> 00:08:50,620 Yet we're saying, no, let's take a look at what happens on 155 00:08:50,620 --> 00:08:54,320 one core on one machine. 156 00:08:54,320 --> 00:08:59,920 So why is that important to focus first on what 157 00:08:59,920 --> 00:09:02,590 one core can do? 158 00:09:02,590 --> 00:09:05,530 Why study single threaded performance at all? 159 00:09:05,530 --> 00:09:06,920 Let's just go do the parallel stuff. 160 00:09:06,920 --> 00:09:08,170 That's more fun anyway. 161 00:09:10,820 --> 00:09:13,630 Well, there are a couple of reasons that I can think of. 162 00:09:13,630 --> 00:09:22,010 The first one is at the end of the day, even if you've got 163 00:09:22,010 --> 00:09:27,230 something running widely in parallel, the code is running 164 00:09:27,230 --> 00:09:29,560 in each core in a single threaded manner. 165 00:09:29,560 --> 00:09:32,050 You just have a bunch of them. 166 00:09:32,050 --> 00:09:36,600 And so if you've given up a factor of two or a factor of 167 00:09:36,600 --> 00:09:40,690 four in performance or even more, as you're aware, you can 168 00:09:40,690 --> 00:09:44,240 sometimes make it orders of magnitude, but in performance 169 00:09:44,240 --> 00:09:47,010 what you're saying is that you're going to end up using 170 00:09:47,010 --> 00:09:52,060 much more resources to do your particular job in parallel. 171 00:09:52,060 --> 00:09:53,690 And resources is money. 
172 00:09:53,690 --> 00:10:03,360 So if I can do the job with a cluster of 16 processors and 173 00:10:03,360 --> 00:10:06,000 somebody else can do it in a cluster with only four 174 00:10:06,000 --> 00:10:12,360 processors, hey, they just spent a quarter the amount on 175 00:10:12,360 --> 00:10:17,080 not just the capital investment in that hardware 176 00:10:17,080 --> 00:10:21,720 but also the operating costs of what it cost to actually 177 00:10:21,720 --> 00:10:27,040 cool and provide electricity to and maintain and so forth. 178 00:10:27,040 --> 00:10:29,050 All that gets much cheaper. 179 00:10:29,050 --> 00:10:31,100 So if you get good single thread performance, it 180 00:10:31,100 --> 00:10:31,730 translates. 181 00:10:31,730 --> 00:10:36,300 That's kind of the direct reason. 182 00:10:36,300 --> 00:10:40,350 The indirect reason is, for studying it, is that many of 183 00:10:40,350 --> 00:10:42,160 the lessons will generalize. 184 00:10:42,160 --> 00:10:44,310 So things that we'll see for single core, there is an 185 00:10:44,310 --> 00:10:47,890 analogy when you start looking at parallel 186 00:10:47,890 --> 00:10:50,640 and distributed systems. 187 00:10:50,640 --> 00:10:53,320 So that's a little less concrete. 188 00:10:53,320 --> 00:10:57,360 But as you'll see as you gain experience, you'll see that 189 00:10:57,360 --> 00:11:00,980 there's a lot of lessons that generalize to how do you think 190 00:11:00,980 --> 00:11:05,715 about performance no matter what the context. 191 00:11:09,660 --> 00:11:12,860 So what about a single threaded machine? 192 00:11:12,860 --> 00:11:14,130 What's it like? 193 00:11:14,130 --> 00:11:16,370 So some of this is going to be a little bit review, but we're 194 00:11:16,370 --> 00:11:17,590 going to just sort of go deeper. 195 00:11:17,590 --> 00:11:20,330 We've sort of been taking layers off the onion. 
196 00:11:20,330 --> 00:11:22,210 And today we're going to take a few more 197 00:11:22,210 --> 00:11:23,960 layers off the onion. 198 00:11:23,960 --> 00:11:26,680 So you have inside a processor core. 199 00:11:26,680 --> 00:11:28,340 You've got registers. 200 00:11:28,340 --> 00:11:30,930 You've got the functional units to do your ALU 201 00:11:30,930 --> 00:11:34,880 operations, floating point units, vector units these 202 00:11:34,880 --> 00:11:39,760 days, and all the stuff to do instruction, execution, and 203 00:11:39,760 --> 00:11:42,300 coordination, scheduling, out of order 204 00:11:42,300 --> 00:11:45,100 execution, and so forth. 205 00:11:45,100 --> 00:11:48,510 In addition then, you have a memory hierarchy. 206 00:11:48,510 --> 00:11:50,780 Within the core, you typically have registers 207 00:11:50,780 --> 00:11:52,520 and L1 and L2 caches. 208 00:11:52,520 --> 00:11:57,910 And then outside the core often is the L3 cache. 209 00:11:57,910 --> 00:12:00,480 DRAM memory, you may have a solid-state drive 210 00:12:00,480 --> 00:12:02,780 these days and disk. 211 00:12:02,780 --> 00:12:05,240 And so in that context, you're trying to make 212 00:12:05,240 --> 00:12:06,490 your code run fast. 213 00:12:09,540 --> 00:12:15,100 So when you compile the piece of code, so here I have a 214 00:12:15,100 --> 00:12:15,660 piece of code. 215 00:12:15,660 --> 00:12:19,370 I'm always amused when I put up a Fibonacci as the example. 216 00:12:19,370 --> 00:12:23,210 Because this is a really terrible way to compute 217 00:12:23,210 --> 00:12:25,580 Fibonacci numbers. 218 00:12:25,580 --> 00:12:28,530 So this is an exponential time algorithm for computing 219 00:12:28,530 --> 00:12:30,120 Fibonacci numbers. 220 00:12:30,120 --> 00:12:35,100 And you may be aware you can do this in linear time just by 221 00:12:35,100 --> 00:12:36,910 adding up from the bottom. 
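The contrast drawn here, between the exponential-time recursive Fibonacci and the linear-time version that adds up from the bottom, can be sketched in C as follows. This is a minimal illustration, not the code from the lecture slide, which is not reproduced in the transcript; the names fib_recursive and fib_iterative are assumptions made for this sketch.

```c
#include <stdint.h>

/* Exponential time: each call spawns two more calls, so the same
 * subproblems are recomputed over and over. */
uint64_t fib_recursive(int n) {
  if (n < 2) return (uint64_t)n;
  return fib_recursive(n - 1) + fib_recursive(n - 2);
}

/* Linear time: add up from the bottom, keeping only the last two values.
 * Results are exact only while they fit in 64 bits. */
uint64_t fib_iterative(int n) {
  uint64_t prev = 0;  /* fib(0) */
  uint64_t curr = 1;  /* fib(1) */
  for (int i = 0; i < n; i++) {
    uint64_t next = prev + curr;
    prev = curr;
    curr = next;
  }
  return prev;
}
```

Both functions return the same values for small n; the recursive one simply takes far longer as n grows, which is what makes it a convenient target for constant-factor tuning.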
222 00:12:36,910 --> 00:12:39,070 In fact, if you take the algorithms course, you learn 223 00:12:39,070 --> 00:12:44,220 that you can actually do this in logarithmic time by matrix, 224 00:12:44,220 --> 00:12:46,770 recursive squaring of matrices. 225 00:12:46,770 --> 00:12:49,510 So it's sort of interesting to put up something where we say 226 00:12:49,510 --> 00:12:52,100 we're going to optimize this. 227 00:12:52,100 --> 00:12:54,570 And, of course, we'll get a constant factor improvement on 228 00:12:54,570 --> 00:12:55,800 something like this. 229 00:12:55,800 --> 00:13:01,610 But, in fact, really this is a terrible program to write for 230 00:13:01,610 --> 00:13:02,380 optimization. 231 00:13:02,380 --> 00:13:04,720 But it's good didactically. 232 00:13:04,720 --> 00:13:07,460 And Fibonacci numbers are fun anyway. 233 00:13:07,460 --> 00:13:13,120 So typically what happens is when you run GCC on your .C 234 00:13:13,120 --> 00:13:19,980 file and produce a binary, what happens is it produces 235 00:13:19,980 --> 00:13:23,010 the machine code, which is basically a string of bytes, 236 00:13:23,010 --> 00:13:24,810 zeros and ones. 237 00:13:24,810 --> 00:13:28,570 And that goes, when you run the program, that goes through 238 00:13:28,570 --> 00:13:30,650 the hardware interpreter. 239 00:13:30,650 --> 00:13:33,430 So the hardware of the machine is doing an interpretation of 240 00:13:33,430 --> 00:13:37,150 these very simple instructions and produces an execution. 241 00:13:37,150 --> 00:13:40,980 But, in fact, there's actually four stages that go on inside 242 00:13:40,980 --> 00:13:44,710 of GCC if you type a command like this. 243 00:13:44,710 --> 00:13:47,320 The first thing is what's called preprocessing. 
244 00:13:47,320 --> 00:13:51,360 And what that does is it does any macro expansion and so 245 00:13:51,360 --> 00:13:55,280 forth, things that are just basically on the level of 246 00:13:55,280 --> 00:13:57,800 textual substitutions before you get into 247 00:13:57,800 --> 00:13:59,210 the guts of the compiler. 248 00:13:59,210 --> 00:14:02,310 Then you actually do the compiler. 249 00:14:02,310 --> 00:14:07,930 And that produces a version of machine code called assembly 250 00:14:07,930 --> 00:14:11,140 language, which we'll see in just a minute. 251 00:14:11,140 --> 00:14:16,940 And from that version of assembly language it then goes 252 00:14:16,940 --> 00:14:21,390 into a process called linking and loading, which actually 253 00:14:21,390 --> 00:14:24,280 causes it to produce the binary that 254 00:14:24,280 --> 00:14:27,510 you can then execute. 255 00:14:27,510 --> 00:14:30,350 So all four stages are included here. 256 00:14:30,350 --> 00:14:36,070 And there are switches to GCC that let you do only one or 257 00:14:36,070 --> 00:14:37,430 all of these things. 258 00:14:37,430 --> 00:14:40,360 You can, for example, run the preprocessor GCC. 259 00:14:40,360 --> 00:14:43,050 You can tell it to run the preprocessor alone and see 260 00:14:43,050 --> 00:14:46,180 what all your macros expanded to. 261 00:14:46,180 --> 00:14:46,410 Yes? 262 00:14:46,410 --> 00:14:47,265 Question? 263 00:14:47,265 --> 00:14:48,477 AUDIENCE: What's the difference between compiling 264 00:14:48,477 --> 00:14:50,180 and assembling? 265 00:14:50,180 --> 00:14:53,000 CHARLES LEISERSON: So compiling reduces it to 266 00:14:53,000 --> 00:14:55,090 essentially assembly language. 267 00:14:55,090 --> 00:15:00,900 And then assembling is taking that assembly language and 268 00:15:00,900 --> 00:15:04,713 producing the machine binary. 
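The four stages described above (preprocess, compile, assemble, link) can each be stopped at separately with standard GCC switches. The sketch below uses a hypothetical file name, stages.c, and a hypothetical macro and function chosen only to make the preprocessing step visible; the commands in the comment are how one would typically invoke each stage.

```c
/* stages.c: a hypothetical file used only for this illustration.
 *
 *   gcc -E stages.c -o stages.i   preprocess only (macros expanded)
 *   gcc -S stages.c -o stages.s   stop after compiling (assembly language)
 *   gcc -c stages.c -o stages.o   stop after assembling (object file)
 *
 * Linking stages.o into an executable additionally needs some object
 * file that defines main; this file deliberately does not.
 */
#define SQUARE(x) ((x) * (x))

int sum_of_squares(int a, int b) {
  /* After preprocessing, the next line reads
   * return ((a) * (a)) + ((b) * (b)); */
  return SQUARE(a) + SQUARE(b);
}
```

Looking at stages.i and then stages.s side by side is one way to see where textual substitution ends and the compiler proper begins.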
269 00:15:04,713 --> 00:15:06,162 AUDIENCE: I was going to say, there's a one-to-one 270 00:15:06,162 --> 00:15:09,060 correspondence between machine code and assembly, but there's 271 00:15:09,060 --> 00:15:10,750 not a one-to-one correspondence between C code 272 00:15:10,750 --> 00:15:11,958 and assembly. 273 00:15:11,958 --> 00:15:12,450 CHARLES LEISERSON: Yeah. 274 00:15:12,450 --> 00:15:14,850 So there's actually not quite a one to one, 275 00:15:14,850 --> 00:15:17,020 but it's very close. 276 00:15:17,020 --> 00:15:19,060 It's very close. 277 00:15:19,060 --> 00:15:21,720 So you can think of it as one to one between assembly 278 00:15:21,720 --> 00:15:22,180 and machine code. 279 00:15:22,180 --> 00:15:26,390 But assembly is, in some sense, a more human readable 280 00:15:26,390 --> 00:15:29,890 and understandable version of machine code. 281 00:15:29,890 --> 00:15:31,760 In fact, that's what we're going to talk about. 282 00:15:31,760 --> 00:15:34,210 So let's go directly to assembly code. 283 00:15:34,210 --> 00:15:38,530 To do that I can use the minus S switch. 284 00:15:38,530 --> 00:15:39,970 Now it turns out it's also helpful to 285 00:15:39,970 --> 00:15:41,680 use the minus G switch. 286 00:15:41,680 --> 00:15:46,840 Minus G says give me all the debugger symbol tables. 287 00:15:46,840 --> 00:15:48,880 And what that makes it is so that you can actually read the 288 00:15:48,880 --> 00:15:50,360 assembly language. 289 00:15:50,360 --> 00:15:52,980 If you don't have that information, then you don't 290 00:15:52,980 --> 00:15:56,560 know what the programmer wrote as the symbols. 291 00:15:56,560 --> 00:16:03,050 Instead you will get computer generated 292 00:16:03,050 --> 00:16:04,620 symbol names for things. 293 00:16:04,620 --> 00:16:07,180 And you don't have any meaning to those. 294 00:16:07,180 --> 00:16:08,900 So it's really a good idea to use minus 295 00:16:08,900 --> 00:16:10,800 G and minus S together. 
296 00:16:10,800 --> 00:16:13,700 And this basically provides a convenient symbolic 297 00:16:13,700 --> 00:16:15,340 representation of the machine language. 298 00:16:15,340 --> 00:16:17,730 And this is sort of the type of thing that you'll get, 299 00:16:17,730 --> 00:16:20,080 something coming out that looks like this. 300 00:16:20,080 --> 00:16:21,800 It's basically in ASCII. 301 00:16:21,800 --> 00:16:26,140 It's in text, characters, rather than being in the 302 00:16:26,140 --> 00:16:28,710 binary executable. 303 00:16:28,710 --> 00:16:34,770 And if you want, you can find out all the vagaries of it. 304 00:16:34,770 --> 00:16:40,690 This is one site that has some reasonable documentation on 305 00:16:40,690 --> 00:16:42,560 the GNU assembler. 306 00:16:42,560 --> 00:16:45,950 It's actually not as good on the instructions, but it's 307 00:16:45,950 --> 00:16:49,330 really good on all the directives, which we'll talk 308 00:16:49,330 --> 00:16:53,730 about in a minute like .global and .type and all that stuff. 309 00:16:53,730 --> 00:16:54,980 It's very good on that stuff. 310 00:16:58,520 --> 00:17:00,260 There's another thing that you can do. 311 00:17:00,260 --> 00:17:04,220 And once again, it's also helpful if you've produced a 312 00:17:04,220 --> 00:17:06,569 binary that has the symbol table. 313 00:17:06,569 --> 00:17:11,760 And that is to do a dump of the object code. 314 00:17:11,760 --> 00:17:16,030 And when you do a dump of the object code, what it does is 315 00:17:16,030 --> 00:17:19,079 you basically give it an executable and it goes 316 00:17:19,079 --> 00:17:24,020 backwards the other way, take this executable and undo one 317 00:17:24,020 --> 00:17:27,800 step, disassemble it. 318 00:17:27,800 --> 00:17:31,150 And what's good about object dump is that it gives you, 319 00:17:31,150 --> 00:17:33,710 first of all, these are all the byte codes of the 320 00:17:33,710 --> 00:17:34,990 instructions. 
321 00:17:34,990 --> 00:17:38,750 Also the minus S switch says interleave the source 322 00:17:38,750 --> 00:17:42,060 code, so you can see, here's the source code interleaved. 323 00:17:42,060 --> 00:17:44,070 So you can see which regions of code 324 00:17:44,070 --> 00:17:47,280 depend on which things. 325 00:17:47,280 --> 00:17:50,600 And so it basically tells you where in memory it's being 326 00:17:50,600 --> 00:17:53,970 loaded, it's been loaded, what the instructions are. 327 00:17:53,970 --> 00:17:57,560 And then it gives you the assembly interpretation of 328 00:17:57,560 --> 00:17:59,230 that machine binary. 329 00:17:59,230 --> 00:18:02,600 And this is where you can see it's almost one to one what's 330 00:18:02,600 --> 00:18:03,180 going on here. 331 00:18:03,180 --> 00:18:05,295 Here we have a push of an operand. 332 00:18:05,295 --> 00:18:07,620 And notice that is just a one-byte code. 333 00:18:07,620 --> 00:18:13,330 Whereas here we've got an opcode and two arguments. 334 00:18:13,330 --> 00:18:16,690 And it has three bytes as it turns out. 335 00:18:16,690 --> 00:18:18,640 So you can see there's sort of a correspondence. 336 00:18:18,640 --> 00:18:20,208 Yeah, question? 337 00:18:20,208 --> 00:18:21,196 AUDIENCE: How does [? logic then take the ?] 338 00:18:21,196 --> 00:18:24,160 machine language code and go to-- 339 00:18:24,160 --> 00:18:27,124 how does it know the function names and stuff? 340 00:18:27,124 --> 00:18:28,820 CHARLES LEISERSON: It knows the function names because 341 00:18:28,820 --> 00:18:33,110 when you compile it with -g, it produces, in addition to 342 00:18:33,110 --> 00:18:36,250 producing the binary, it produces a separate segment 343 00:18:36,250 --> 00:18:39,870 that's not loaded in that has all that information that 344 00:18:39,870 --> 00:18:43,030 says, oh, at this location is where this symbol is. 
345 00:18:43,030 --> 00:18:46,840 And it produces all that as stuff that's never loaded in 346 00:18:46,840 --> 00:18:49,700 at run time but which is there in order to 347 00:18:49,700 --> 00:18:51,290 aid debuggers mainly. 348 00:18:51,290 --> 00:18:52,610 Question? 349 00:18:52,610 --> 00:18:56,370 AUDIENCE: If you compile something not using the -g flag and then 350 00:18:56,370 --> 00:18:58,715 you do an object dump, how would that work? 351 00:18:58,715 --> 00:19:01,430 CHARLES LEISERSON: Then what happens is, first of all, you 352 00:19:01,430 --> 00:19:06,280 would not be able to get this stuff interleaved. 353 00:19:06,280 --> 00:19:09,720 And then things like here where it says fib, well, fib 354 00:19:09,720 --> 00:19:14,160 may be an external name so you might know it anyway. 355 00:19:14,160 --> 00:19:16,970 But if it were an internal name, you would not be able to 356 00:19:16,970 --> 00:19:18,800 see what it was. 357 00:19:18,800 --> 00:19:21,360 Yeah, if you're going to respond let's 358 00:19:21,360 --> 00:19:22,660 get you on mic here. 359 00:19:22,660 --> 00:19:24,920 Why don't you just hold this? 360 00:19:24,920 --> 00:19:27,270 JOHN: Yeah, so you'll generally get the function 361 00:19:27,270 --> 00:19:30,370 names so you know roughly a huge blob of assembly 362 00:19:30,370 --> 00:19:32,100 corresponds to a function. 363 00:19:32,100 --> 00:19:34,740 But you won't be able to get any information about what 364 00:19:34,740 --> 00:19:39,520 variables are in which registers or what position the 365 00:19:39,520 --> 00:19:41,630 sixth line of assembly corresponds to in terms of 366 00:19:41,630 --> 00:19:42,880 your source code. 
367 00:19:45,580 --> 00:19:47,170 CHARLES LEISERSON: Then the other thing that you can do is 368 00:19:47,170 --> 00:19:50,690 you can actually take the assembler, the assembly code, 369 00:19:50,690 --> 00:19:53,140 if you produce just the assembly code, and if you tell 370 00:19:53,140 --> 00:19:57,540 GCC to take a .s file, which is the assembly code, you can 371 00:19:57,540 --> 00:20:00,146 produce the machine code from it. 372 00:20:00,146 --> 00:20:05,910 And so one thing that you can do is you can produce a .s 373 00:20:05,910 --> 00:20:11,820 file and then edit it in Emacs or VI or whatever your 374 00:20:11,820 --> 00:20:16,180 favorite text editor is and then assemble it with GCC. 375 00:20:16,180 --> 00:20:18,430 So you can actually modify what the 376 00:20:18,430 --> 00:20:21,480 machine code is directly. 377 00:20:21,480 --> 00:20:23,220 And that's what we're going to spend a little bit of time 378 00:20:23,220 --> 00:20:24,580 doing today. 379 00:20:24,580 --> 00:20:28,220 Let's go in and see what the compiler generates and then 380 00:20:28,220 --> 00:20:29,470 let's twiddle it a bit. 381 00:20:34,630 --> 00:20:37,760 So here's what we're going to expect that you do, that 382 00:20:37,760 --> 00:20:39,010 you're able to do. 383 00:20:41,770 --> 00:20:45,320 We expect in this class that you're going to be able to 384 00:20:45,320 --> 00:20:50,370 understand how a compiler implements the C linguistic 385 00:20:50,370 --> 00:20:52,575 constructs using x86 instructions. 386 00:20:55,320 --> 00:21:00,060 We're going to expect that you can read x86 assembly language 387 00:21:00,060 --> 00:21:01,520 with the aid of a manual. 388 00:21:01,520 --> 00:21:03,830 We don't expect that you know all the instructions, but the 389 00:21:03,830 --> 00:21:08,350 basic ones we expect that you know what those are. 
390 00:21:08,350 --> 00:21:10,370 We expect that you're going to be able to make simple 391 00:21:10,370 --> 00:21:14,840 modifications to the assembly language generated by a 392 00:21:14,840 --> 00:21:19,200 compiler, and that you would know, if push came to shove, 393 00:21:19,200 --> 00:21:21,610 how to write your own machine code on your own. 394 00:21:21,610 --> 00:21:23,700 That's not something we're going to expect that you do, 395 00:21:23,700 --> 00:21:26,920 but you would know how to get started to do that if at some 396 00:21:26,920 --> 00:21:28,310 point you said, oh, I really have to 397 00:21:28,310 --> 00:21:29,560 write this in assembler. 398 00:21:32,410 --> 00:21:35,150 So this is, as I say, really we're going to take off some 399 00:21:35,150 --> 00:21:40,036 layers of the onion today, try to get down what's going on. 400 00:21:40,036 --> 00:21:44,420 It turns out this is actually kind of fun. 401 00:21:44,420 --> 00:21:49,100 Now, the part that's not fun at some level is the x86 64 402 00:21:49,100 --> 00:21:50,350 machine model. 403 00:21:53,670 --> 00:21:55,680 The x86 is what's called a complex 404 00:21:55,680 --> 00:21:59,530 instruction set computer. 405 00:21:59,530 --> 00:22:06,590 And these, long ago, were demonstrated to be inferior to 406 00:22:06,590 --> 00:22:10,940 so-called reduced instruction set computers. 407 00:22:10,940 --> 00:22:14,050 But that hasn't mattered in the marketplace. 408 00:22:14,050 --> 00:22:16,290 What's mattered in the marketplace is who could build 409 00:22:16,290 --> 00:22:18,240 better and faster chips. 410 00:22:18,240 --> 00:22:22,940 And also the amount of people who started using the x86 411 00:22:22,940 --> 00:22:28,240 instruction set has produced a huge legacy and inertia. 412 00:22:28,240 --> 00:22:30,860 It's sort of like some people might argue that Esperanto is 413 00:22:30,860 --> 00:22:32,140 a better language for everybody 414 00:22:32,140 --> 00:22:34,480 to learn than English. 
415 00:22:34,480 --> 00:22:38,350 But how come English with all its complexities and so forth, 416 00:22:38,350 --> 00:22:42,500 and I'm sure for some of you who have learned English as a 417 00:22:42,500 --> 00:22:45,370 second language, it's like it's a crazy language. 418 00:22:45,370 --> 00:22:46,640 Why do you learn English? 419 00:22:46,640 --> 00:22:50,040 Well, it's because that's what everybody's learning. 420 00:22:50,040 --> 00:22:52,560 That's where the legacy is. 421 00:22:52,560 --> 00:22:57,130 And so x86 is very much like the English of 422 00:22:57,130 --> 00:22:58,920 machines these days. 423 00:22:58,920 --> 00:23:05,030 So in this model there's basically a flat 64-bit 424 00:23:05,030 --> 00:23:06,280 address space. 425 00:23:09,510 --> 00:23:14,710 There are 16 64-bit general purpose registers, and then 426 00:23:14,710 --> 00:23:18,800 there are some segment registers, a register full of 427 00:23:18,800 --> 00:23:27,910 flags, an instruction pointer register, rest in peace. 428 00:23:27,910 --> 00:23:33,170 There are eight 80-bit floating point data registers, some 429 00:23:33,170 --> 00:23:37,850 control status registers, an opcode register, a floating 430 00:23:37,850 --> 00:23:40,540 point instruction pointer register, and a floating point 431 00:23:40,540 --> 00:23:47,000 data pointer register, some MMX registers for the 432 00:23:47,000 --> 00:23:56,770 multimedia extensions, and 128-bit XMM registers for the 433 00:23:56,770 --> 00:24:01,500 SSE instructions, which are the ability to have an opcode 434 00:24:01,500 --> 00:24:05,040 run over several pieces of data at once, short vectors, 435 00:24:05,040 --> 00:24:09,440 vector instructions, and a 32-bit register that frankly I 436 00:24:09,440 --> 00:24:13,410 don't have a clue as to what it does. 437 00:24:13,410 --> 00:24:16,010 So, fortunately, we don't have to know all these. 
438 00:24:16,010 --> 00:24:17,930 You can look at the architecture manual if any of 439 00:24:17,930 --> 00:24:20,250 these become important. 440 00:24:20,250 --> 00:24:24,170 So our goal is not to memorize the x86 instruction set. 441 00:24:24,170 --> 00:24:30,390 That would be a punishment probably worse than death. 442 00:24:30,390 --> 00:24:33,175 The only thing worse would be learning all of C++. 443 00:24:36,960 --> 00:24:40,800 So here's the general registers. 444 00:24:40,800 --> 00:24:44,260 So they're basically 64-bit registers. 445 00:24:44,260 --> 00:24:46,090 And here's the mnemonics that they have. 446 00:24:46,090 --> 00:24:48,370 So you can see it's all very mnemonic, right? 447 00:24:48,370 --> 00:24:50,110 We got some of them that are numbered. 448 00:24:50,110 --> 00:24:51,560 How come they're all just not numbered? 449 00:24:51,560 --> 00:24:53,670 I mean come on, right? 450 00:24:53,670 --> 00:24:54,150 I know why. 451 00:24:54,150 --> 00:24:54,880 I know why. 452 00:24:54,880 --> 00:24:56,130 Don't tell me. 453 00:24:58,660 --> 00:25:01,740 So what you get to do is look at and remember that there are 454 00:25:01,740 --> 00:25:06,240 all these fun registers. 455 00:25:06,240 --> 00:25:11,520 And what they did is the x86 64 architecture grew out of 456 00:25:11,520 --> 00:25:14,730 the x86 which was 32-bit. 457 00:25:14,730 --> 00:25:18,340 Well, in fact, originally it was 16-bit. 458 00:25:18,340 --> 00:25:21,930 And it's been extended twice to have more bits in the 459 00:25:21,930 --> 00:25:23,580 instruction word so that now it's a 460 00:25:23,580 --> 00:25:25,730 64-bit instruction word.
461 00:25:25,730 --> 00:25:28,620 And what they did in order to make it so that they could run 462 00:25:28,620 --> 00:25:32,750 legacy code more easily, which might have been written with a 463 00:25:32,750 --> 00:25:37,080 smaller word size, is they've overlapped them so that the EAX 464 00:25:37,080 --> 00:25:39,370 register, for example, is the low order 465 00:25:39,370 --> 00:25:43,740 32-bits of the RAX register. 466 00:25:43,740 --> 00:25:47,320 So what you do is you'll see that R is the prefix that 467 00:25:47,320 --> 00:25:52,182 says, hey, that's a 64-bit register. 468 00:25:52,182 --> 00:25:56,380 E is the prefix that says that it is-- 469 00:25:56,380 --> 00:25:57,920 Whoops, I made a mistake there. 470 00:25:57,920 --> 00:25:59,250 Those should all be Es. 471 00:26:03,530 --> 00:26:05,200 Oh, no, sorry, no, these are correct. 472 00:26:05,200 --> 00:26:08,310 These are R and then with D because these are the 473 00:26:08,310 --> 00:26:09,760 extended ones, yes. 474 00:26:09,760 --> 00:26:17,410 So these are D. So R and D, that means also that it's 16. 475 00:26:17,410 --> 00:26:20,280 So you can see just how easy this is to remember without a 476 00:26:20,280 --> 00:26:21,670 cheat sheet, right? 477 00:26:21,670 --> 00:26:23,540 And then you go down to 15, et cetera. 478 00:26:23,540 --> 00:26:25,980 And so you can go all the way down to byte naming, the low 479 00:26:25,980 --> 00:26:28,510 order byte of the registers. 480 00:26:28,510 --> 00:26:31,450 In addition, it turns out that that's not all. 481 00:26:31,450 --> 00:26:36,150 But the high order byte of the 16-bit registers are also 482 00:26:36,150 --> 00:26:40,365 available as independently named registers. 483 00:26:45,830 --> 00:26:49,590 When you're using this in a C program, there's a convention 484 00:26:49,590 --> 00:26:50,600 that C has. 485 00:26:50,600 --> 00:26:54,570 And it's actually different on Windows from on Linux.
486 00:26:54,570 --> 00:26:57,450 Because there's no reason they should make those things 487 00:26:57,450 --> 00:26:58,010 compatible. 488 00:26:58,010 --> 00:27:01,040 That would be too easy. 489 00:27:01,040 --> 00:27:02,460 So instead they have different ones. 490 00:27:02,460 --> 00:27:07,480 But the ones on Linux, this is essentially the structure. 491 00:27:07,480 --> 00:27:10,810 What happens when you call a subroutine is generally you're 492 00:27:10,810 --> 00:27:15,250 passing the arguments to the subroutine in registers. 493 00:27:15,250 --> 00:27:17,490 And in fact the first six arguments are 494 00:27:17,490 --> 00:27:19,360 passed in these registers. 495 00:27:19,360 --> 00:27:22,790 RDI, you'll get very familiar with RDI. 496 00:27:22,790 --> 00:27:26,660 Because that's where the first argument is always passed. 497 00:27:26,660 --> 00:27:30,530 And almost all your functions will have a first argument, 498 00:27:30,530 --> 00:27:33,880 except for the ones that have no arguments, and then the 499 00:27:33,880 --> 00:27:36,080 second arguments, the third, and so forth, and 500 00:27:36,080 --> 00:27:37,890 then fifth and sixth. 501 00:27:37,890 --> 00:27:41,550 If you get more than six, then it turns out, then you start 502 00:27:41,550 --> 00:27:44,120 passing arguments through memory. 503 00:27:44,120 --> 00:27:47,120 But otherwise the convention is that the arguments are 504 00:27:47,120 --> 00:27:49,650 passed through registers. 505 00:27:49,650 --> 00:27:52,220 There are a couple of other important registers. 506 00:27:52,220 --> 00:27:59,320 One here is the return value always comes back in RAX. 507 00:27:59,320 --> 00:28:03,840 So when a function returns, boom, that's where, RAX is 508 00:28:03,840 --> 00:28:07,210 where the value of the return is. 
509 00:28:07,210 --> 00:28:11,440 There is a base pointer and a stack pointer which give you 510 00:28:11,440 --> 00:28:15,900 the stack frame so that when you do a push and want to push 511 00:28:15,900 --> 00:28:18,880 local variables those are telling you the limits of your 512 00:28:18,880 --> 00:28:19,520 local variables. 513 00:28:19,520 --> 00:28:21,780 And we'll talk more about that a little bit. 514 00:28:21,780 --> 00:28:23,390 And then there are a variety of other ones. 515 00:28:23,390 --> 00:28:28,880 Some are callee saved and some are caller saved. 516 00:28:28,880 --> 00:28:30,880 And you can refer to this chart. 517 00:28:30,880 --> 00:28:36,000 And there are others similar to it in the various manuals. 518 00:28:36,000 --> 00:28:39,730 Now, it gets pretty confusing, if this isn't confusing enough 519 00:28:39,730 --> 00:28:40,980 for the naming. 520 00:28:42,960 --> 00:28:45,540 Let's go on to how you name data types. 521 00:28:45,540 --> 00:28:47,530 And I think some of you have already experienced this a 522 00:28:47,530 --> 00:28:50,960 little bit, the beauties of the data types. 523 00:28:50,960 --> 00:28:56,410 So in C, they have all these different data types such as 524 00:28:56,410 --> 00:28:58,580 I'm listing here. 525 00:28:58,580 --> 00:29:02,100 And if you want to generate a constant of that size, so 526 00:29:02,100 --> 00:29:06,570 sometimes the compiler will coerce a value from one type 527 00:29:06,570 --> 00:29:07,330 to another. 528 00:29:07,330 --> 00:29:09,220 But sometimes it won't. 529 00:29:09,220 --> 00:29:11,520 And so if you want to have a constant, and I've just given 530 00:29:11,520 --> 00:29:14,410 a couple things here, for example, if you want it to be 531 00:29:14,410 --> 00:29:18,710 just an int, you can just write the number. 532 00:29:18,710 --> 00:29:20,350 But if you want it to be unsigned, you have to 533 00:29:20,350 --> 00:29:22,000 put a U after it.
534 00:29:22,000 --> 00:29:23,620 Or if you want it to be a long, you have to 535 00:29:23,620 --> 00:29:26,210 put an L after it. 536 00:29:26,210 --> 00:29:29,390 And for many things it'll get coerced automatically to the 537 00:29:29,390 --> 00:29:32,320 right type because if you do an operator with another 538 00:29:32,320 --> 00:29:35,100 argument it will be coerced to that type. 539 00:29:35,100 --> 00:29:37,490 But some of you got burned on some of the shift things to 540 00:29:37,490 --> 00:29:44,570 begin with because it wasn't clear what exactly the sizes were. 541 00:29:44,570 --> 00:29:50,230 Well, you can be explicit in C and name them using this 542 00:29:50,230 --> 00:29:52,690 particular convention. 543 00:29:52,690 --> 00:29:57,100 This tells you how many bytes are being allocated for that 544 00:29:57,100 --> 00:30:03,110 type in the x86 64 size. 545 00:30:03,110 --> 00:30:04,550 So it's [? veted ?] here for four. 546 00:30:04,550 --> 00:30:06,150 Now, long double is a funny one. 547 00:30:06,150 --> 00:30:08,910 It actually allocates 16 bytes, but only 548 00:30:08,910 --> 00:30:11,740 10 of them are used. 549 00:30:11,740 --> 00:30:15,500 So basically there are six bytes that get unused by that. 550 00:30:15,500 --> 00:30:19,960 And I think that's for future expansion so that they can 551 00:30:19,960 --> 00:30:21,700 have even wider extension. 552 00:30:21,700 --> 00:30:23,480 This is generally used, of course, for floating 553 00:30:23,480 --> 00:30:25,010 point and so forth. 554 00:30:25,010 --> 00:30:28,480 Now, in the assembly language, each of the 555 00:30:28,480 --> 00:30:34,125 operators has a suffix. 556 00:30:34,125 --> 00:30:37,060 And sometimes, if it's a two operand instruction, it may, 557 00:30:37,060 --> 00:30:39,820 where it's taking things of different sizes, it may have 558 00:30:39,820 --> 00:30:40,780 two suffixes.
559 00:30:40,780 --> 00:30:44,110 But it has a suffix which is a single character that tells 560 00:30:44,110 --> 00:30:52,050 you what the size is that you're working with. 561 00:30:52,050 --> 00:30:55,310 So, for example, B is for byte. 562 00:30:55,310 --> 00:30:57,980 W is for word because originally the 563 00:30:57,980 --> 00:31:01,421 words were 16 bits. 564 00:31:01,421 --> 00:31:05,580 L is for long except that it's not a long 565 00:31:05,580 --> 00:31:07,040 so don't get confused. 566 00:31:07,040 --> 00:31:08,680 L is not long. 567 00:31:08,680 --> 00:31:15,180 Long is a quad word, or Q, eight bytes. 568 00:31:15,180 --> 00:31:19,230 And then a float is an S. A double is a D. And a long 569 00:31:19,230 --> 00:31:22,100 double is a T. 570 00:31:22,100 --> 00:31:24,920 So these you will get familiar with. 571 00:31:24,920 --> 00:31:26,160 And they're not so hard. 572 00:31:26,160 --> 00:31:28,290 But that doesn't mean you know them right off the bat. 573 00:31:28,290 --> 00:31:29,650 And it helps to have a cheat sheet. 574 00:31:32,860 --> 00:31:36,550 As I say, the main one not to get confused about is the Ls. 575 00:31:36,550 --> 00:31:42,630 L means something different in x86 than it means in C. 576 00:31:42,630 --> 00:31:49,250 So, for example, here we have a move of, and because it's a 577 00:31:49,250 --> 00:31:54,540 Q, I know that it is an eight byte or a 64-bit operator. 578 00:31:54,540 --> 00:31:57,615 And you can tell that also because it's using RBP and 579 00:31:57,615 --> 00:32:02,630 RAX, both of which are 64-bit registers. 580 00:32:02,630 --> 00:32:05,330 In fact, in assembly, you can actually write it without the 581 00:32:05,330 --> 00:32:13,420 Q, because the assembler can infer when the Q isn't there 582 00:32:13,420 --> 00:32:18,060 that, oh, this is a 64-bit register, that's a 64-bit 583 00:32:18,060 --> 00:32:21,700 register, I bet he means move 64-bits. 584 00:32:21,700 --> 00:32:24,020 So it actually fills that in sometimes.
585 00:32:24,020 --> 00:32:27,220 But sometimes you need to be explicit. 586 00:32:27,220 --> 00:32:27,580 Question? 587 00:32:27,580 --> 00:32:30,418 AUDIENCE: What happens when you actually put 64-bit 588 00:32:30,418 --> 00:32:33,640 registers but you only, and you just put move [? b ?] 589 00:32:33,640 --> 00:32:34,020 or something? 590 00:32:34,020 --> 00:32:34,375 Would it complain? 591 00:32:34,375 --> 00:32:36,490 Would it [UNINTELLIGIBLE]? 592 00:32:36,490 --> 00:32:37,980 CHARLES LEISERSON: Yeah, it would complain. 593 00:32:37,980 --> 00:32:38,910 Yeah, it would complain. 594 00:32:38,910 --> 00:32:46,750 It'll say it's an improperly formed instruction, so, yeah. 595 00:32:46,750 --> 00:32:48,110 And the other thing you can do, of course, 596 00:32:48,110 --> 00:32:50,430 is just try it out. 597 00:32:50,430 --> 00:32:51,920 What happens if? 598 00:32:51,920 --> 00:32:53,200 That's the great thing about computers. 599 00:32:53,200 --> 00:32:56,140 It's easy to do what happens if. 600 00:32:56,140 --> 00:33:00,680 Now, the instruction format is typically an opcode followed 601 00:33:00,680 --> 00:33:03,840 by an operand list. 602 00:33:03,840 --> 00:33:06,760 So the opcode is a short mnemonic identifying the type 603 00:33:06,760 --> 00:33:12,230 of instruction that includes typically the single character 604 00:33:12,230 --> 00:33:15,390 suffix indicating the data type. 605 00:33:15,390 --> 00:33:18,570 However, for some instructions it turns out you can have two 606 00:33:18,570 --> 00:33:20,610 suffixes if the two-- 607 00:33:20,610 --> 00:33:25,670 Most instructions operate on data types of the same size. 608 00:33:25,670 --> 00:33:31,300 But some of them operate on two different sizes in which 609 00:33:31,300 --> 00:33:33,130 case you'll have two suffixes. 610 00:33:33,130 --> 00:33:35,480 If the suffix is missing, it can generally be inferred, as 611 00:33:35,480 --> 00:33:36,710 I mentioned.
612 00:33:36,710 --> 00:33:41,580 Then the operand list is from zero to two, and very rarely 613 00:33:41,580 --> 00:33:45,840 three, operands separated by commas. 614 00:33:45,840 --> 00:33:50,090 Now, in the architecture manual, in fact, they say if 615 00:33:50,090 --> 00:33:54,790 you look at it, they'll show you a fourth operand. 616 00:33:54,790 --> 00:33:56,260 And I said, four operands? 617 00:33:56,260 --> 00:33:58,270 This documentation says there's only three. 618 00:33:58,270 --> 00:34:00,460 This one says there's four. 619 00:34:00,460 --> 00:34:01,520 I went through the whole 620 00:34:01,520 --> 00:34:03,170 architecture manual last night. 621 00:34:06,660 --> 00:34:10,210 Every time it says four operands, it says N/A, not 622 00:34:10,210 --> 00:34:11,500 applicable. 623 00:34:11,500 --> 00:34:14,540 So I think it's just there reserved or something. 624 00:34:14,540 --> 00:34:20,070 But anyway there is no fourth operand as far as I can tell. 625 00:34:20,070 --> 00:34:25,690 Now, one of the operands is the destination. 626 00:34:25,690 --> 00:34:28,489 And here's where we start to get into some differences. 627 00:34:28,489 --> 00:34:32,139 There's actually two standard formats for assembly language 628 00:34:32,139 --> 00:34:36,630 that are generally called Intel and AT&T. So AT&T was 629 00:34:36,630 --> 00:34:41,469 the original Unix system. 630 00:34:41,469 --> 00:34:45,969 And Intel is what Intel uses for their assembler. 631 00:34:45,969 --> 00:34:51,389 They do the destination operand in the opposite order. 632 00:34:51,389 --> 00:34:55,000 So AT&T, it puts the destination last. 633 00:34:55,000 --> 00:34:57,910 In Intel it puts the destination first. 634 00:34:57,910 --> 00:35:00,720 So when you're reading documentation, you can read 635 00:35:00,720 --> 00:35:01,595 the Intel documentation.
636 00:35:01,595 --> 00:35:04,020 You just have to remember to flip it around if you're 637 00:35:04,020 --> 00:35:07,260 actually writing it as we will be using the AT&T format. 638 00:35:07,260 --> 00:35:09,860 Almost everybody uses AT&T as far as I 639 00:35:09,860 --> 00:35:13,370 can tell except Intel. 640 00:35:13,370 --> 00:35:16,740 So Intel's assembler does it the other way around. 641 00:35:16,740 --> 00:35:20,030 And actually now GCC will actually, you can give it a 642 00:35:20,030 --> 00:35:28,190 directive to say I'm now switching to writing it in 643 00:35:28,190 --> 00:35:29,480 Intel assembler. 644 00:35:29,480 --> 00:35:31,450 So you can actually go back and forth between the two if 645 00:35:31,450 --> 00:35:34,080 you happen to borrow some assembly language code from 646 00:35:34,080 --> 00:35:35,510 somebody else. 647 00:35:35,510 --> 00:35:37,620 So one of them is the destination. 648 00:35:37,620 --> 00:35:42,680 The other operands are read-only, so const in the C++ 649 00:35:42,680 --> 00:35:44,230 terminology. 650 00:35:44,230 --> 00:35:45,540 They're read-only. 651 00:35:45,540 --> 00:35:49,140 So it's always the case that only one of them is going to 652 00:35:49,140 --> 00:35:50,160 be modified. 653 00:35:50,160 --> 00:35:54,770 And that's the one that's the destination of the operation. 654 00:35:54,770 --> 00:35:56,730 In addition in assembler, there are what are called 655 00:35:56,730 --> 00:35:57,570 directives. 656 00:35:57,570 --> 00:36:01,840 Besides the instructions, there are directives. 657 00:36:01,840 --> 00:36:03,740 So first of all there are things like labels. 658 00:36:03,740 --> 00:36:07,780 You can take any instruction and put an identifier and a 659 00:36:07,780 --> 00:36:12,200 colon, and that becomes then a way of naming that 660 00:36:12,200 --> 00:36:14,040 place in your code.
661 00:36:14,040 --> 00:36:16,320 So, for example, jump instructions want to know to 662 00:36:16,320 --> 00:36:18,480 where they're jumping. 663 00:36:18,480 --> 00:36:23,050 And rather than having to know upfront what the address is, 664 00:36:23,050 --> 00:36:25,740 the assembler will calculate what that address is and 665 00:36:25,740 --> 00:36:29,420 everywhere you put x, it'll put in the right value. 666 00:36:29,420 --> 00:36:33,130 And you get to name it symbolically rather than as an 667 00:36:33,130 --> 00:36:35,700 absolute machine location. 668 00:36:35,700 --> 00:36:37,180 There are storage directives. 669 00:36:37,180 --> 00:36:41,560 So, for example, .space 20 says allocate 20 bytes at 670 00:36:41,560 --> 00:36:43,270 location x. 671 00:36:43,270 --> 00:36:49,890 .long says store the constant 172 at y. 672 00:36:49,890 --> 00:36:53,620 It's being stored at y because I said y is here. 673 00:36:53,620 --> 00:36:57,100 And asciz gives you a string that's zero terminated. 674 00:36:57,100 --> 00:37:00,720 So the standard for strings is zero terminated. 675 00:37:00,720 --> 00:37:03,390 You can also, there's one that says give me a 676 00:37:03,390 --> 00:37:05,630 nonterminated string. 677 00:37:05,630 --> 00:37:08,740 So you can have fun with that if you like that. 678 00:37:08,740 --> 00:37:12,150 The align directive says make sure that as you're going 679 00:37:12,150 --> 00:37:14,040 through, so what's happening is the assembler is going 680 00:37:14,040 --> 00:37:16,640 through there, is it's laying these things out in memory 681 00:37:16,640 --> 00:37:19,300 typically sequentially, the way you wrote it down in the 682 00:37:19,300 --> 00:37:23,460 program, in the assembly language program. 683 00:37:23,460 --> 00:37:26,460 If you say align eight, it says advance whatever the 684 00:37:26,460 --> 00:37:29,480 pointer is of where the next thing is going to be put to be 685 00:37:29,480 --> 00:37:31,910 a multiple of eight. 
686 00:37:31,910 --> 00:37:34,600 And that way you don't run the risk of where you declare a 687 00:37:34,600 --> 00:37:39,550 character and then you say, OK, and now I want a long or 688 00:37:39,550 --> 00:37:43,930 something, and it's not aligned in a way that the 689 00:37:43,930 --> 00:37:48,700 eight bytes correspond to a multiple of eight the way you 690 00:37:48,700 --> 00:37:50,300 need to in order for the instructions to 691 00:37:50,300 --> 00:37:52,890 properly work on them. 692 00:37:52,890 --> 00:37:56,350 So generally, although, we have byte pointers, most 693 00:37:56,350 --> 00:37:58,445 instructions only work on aligned values. 694 00:38:01,760 --> 00:38:05,050 And for some of them that work on unaligned values, they're 695 00:38:05,050 --> 00:38:09,760 generally slower than the ones that work on aligned values. 696 00:38:09,760 --> 00:38:11,260 There are also segment directives. 697 00:38:11,260 --> 00:38:16,510 So in memory when you run your program, the executing program 698 00:38:16,510 --> 00:38:20,810 starts with the program text down at the bottom of memory. 699 00:38:20,810 --> 00:38:25,830 And then it has fixed data that's not going to change, 700 00:38:25,830 --> 00:38:27,410 static allocation of data. 701 00:38:27,410 --> 00:38:30,870 And then it's got heap, which is dynamically allocated data. 702 00:38:30,870 --> 00:38:32,010 And then it's got stack. 703 00:38:32,010 --> 00:38:36,230 The stack grows downward, and the heap grows upward. 704 00:38:36,230 --> 00:38:39,530 By saying something like text, it says make sure that the 705 00:38:39,530 --> 00:38:42,010 next stuff I'm putting goes into the text segment. 706 00:38:42,010 --> 00:38:45,430 So that's generally where you put your code. 707 00:38:45,430 --> 00:38:47,590 Saying it's in data says make sure it goes in here. 708 00:38:47,590 --> 00:38:51,500 So you may want to have a table, for example. 
709 00:38:51,500 --> 00:38:54,230 So, for example, from pentominos you might have a 710 00:38:54,230 --> 00:38:55,470 table there. 711 00:38:55,470 --> 00:38:56,610 There's going to be a fixed table. 712 00:38:56,610 --> 00:38:57,740 You're never going to change it during the 713 00:38:57,740 --> 00:38:59,050 running of the program. 714 00:38:59,050 --> 00:39:00,300 Put it in the data segment. 715 00:39:03,790 --> 00:39:06,060 And then there's also things like scope and linkage 716 00:39:06,060 --> 00:39:06,550 directives. 717 00:39:06,550 --> 00:39:09,640 So saying .globl, and you can either spell it incorrectly, 718 00:39:09,640 --> 00:39:14,310 as I have here, or with the a, it's the same thing for the 719 00:39:14,310 --> 00:39:16,640 GNU assembler anyway. 720 00:39:16,640 --> 00:39:21,130 It says make the symbol fib externally visible. 721 00:39:21,130 --> 00:39:23,760 And that makes sure that it goes into the symbol table so 722 00:39:23,760 --> 00:39:26,910 that debuggers and things can look at it. 723 00:39:26,910 --> 00:39:29,980 And there's a lot more of these in the assembler manual, 724 00:39:29,980 --> 00:39:33,140 that link that I showed you to before, tells you what all the 725 00:39:33,140 --> 00:39:33,940 directives mean. 726 00:39:33,940 --> 00:39:35,880 So that when you're looking at code, which mostly you'll be 727 00:39:35,880 --> 00:39:38,680 reading it, making a few changes to it, you can know 728 00:39:38,680 --> 00:39:39,930 what things mean. 729 00:39:42,520 --> 00:39:47,020 So the opcode examples, here's some examples. 730 00:39:47,020 --> 00:39:49,500 There are things like mov, push, and pop. 731 00:39:49,500 --> 00:39:53,970 So, for example, here movslq, this is an interesting one 732 00:39:53,970 --> 00:39:55,150 because it's moving. 733 00:39:55,150 --> 00:40:01,000 The s says extend the sign because I'm moving from a long, 734 00:40:01,000 --> 00:40:09,930 from a 32-bit word to a 64-bit word, from 4 bytes to 8 bytes.
735 00:40:09,930 --> 00:40:14,490 So that's why this one takes two suffixes moving from, 736 00:40:14,490 --> 00:40:17,470 you'll notice, a 32-bit register to a 64-bit register. 737 00:40:20,160 --> 00:40:21,570 So you have to be careful. 738 00:40:21,570 --> 00:40:24,840 This is something I got caught up in the other day. 739 00:40:24,840 --> 00:40:27,960 The results of 32-bit operations are implicitly 740 00:40:27,960 --> 00:40:30,620 extended to 64-bit values. 741 00:40:30,620 --> 00:40:34,330 So if you store something into EAX, for example, it 742 00:40:34,330 --> 00:40:40,220 automatically zeroes out the high order 32-bits of RAX. 743 00:40:40,220 --> 00:40:42,130 Because that's the one that it's embedded in. 744 00:40:42,130 --> 00:40:46,340 However, that's not true for the eight and 16-bit 745 00:40:46,340 --> 00:40:47,580 operations. 746 00:40:47,580 --> 00:40:52,910 If you store into an 8-bit field, an 8-bit part of the 747 00:40:52,910 --> 00:40:55,750 register, it does not zero out the high 748 00:40:55,750 --> 00:40:58,120 order bits of the remainder. 749 00:40:58,120 --> 00:41:02,340 So you just have to be careful when you're doing that. 750 00:41:02,340 --> 00:41:06,370 Most of these are things, by the way, that is more cryptic 751 00:41:06,370 --> 00:41:07,250 when you're looking at stuff. 752 00:41:07,250 --> 00:41:10,850 It's like, oh, how come it's, gee, I thought I had, I'm 753 00:41:10,850 --> 00:41:13,770 returning a double word, but it looks here like it's 754 00:41:13,770 --> 00:41:16,440 returning a 32-bit word, how come I thought I 755 00:41:16,440 --> 00:41:18,270 was returning 64-- 756 00:41:18,270 --> 00:41:20,060 Well, the answer is because it knows the high 757 00:41:20,060 --> 00:41:21,310 order bits are zero. 758 00:41:21,310 --> 00:41:23,343 So it's using the shorter instructions. 759 00:41:26,230 --> 00:41:33,780 And yet it still is having the impact on the 64-bit register. 
760 00:41:33,780 --> 00:41:37,000 There are all the arithmetic and logical operations. 761 00:41:37,000 --> 00:41:40,810 So subtracting, once again, the destination is second. 762 00:41:40,810 --> 00:41:43,170 So typically these are two operand things. 763 00:41:43,170 --> 00:41:45,820 So you always have the destination occurs both 764 00:41:45,820 --> 00:41:49,130 on the left hand side and the right hand side, 765 00:41:49,130 --> 00:41:53,027 things like shifts and rotates, control transfer, so 766 00:41:53,027 --> 00:41:56,190 call which does a subroutine jump, return from a 767 00:41:56,190 --> 00:42:00,650 subroutine, a jump instruction that just says make the next 768 00:42:00,650 --> 00:42:03,320 instruction the thing that you're pointing to, and very 769 00:42:03,320 --> 00:42:05,380 important, the jump conditionals where the 770 00:42:05,380 --> 00:42:08,460 condition is a whole bunch of keys that are things like 771 00:42:08,460 --> 00:42:12,420 greater than, less than, and so forth, and different ones 772 00:42:12,420 --> 00:42:16,400 for signed and unsigned and so forth. 773 00:42:16,400 --> 00:42:20,210 So typically the condition is computed by using a compare 774 00:42:20,210 --> 00:42:20,800 instruction. 775 00:42:20,800 --> 00:42:24,570 I probably should have put CMP on here as well, but I didn't. 776 00:42:24,570 --> 00:42:27,540 But the CMP instruction is usually what you use to 777 00:42:27,540 --> 00:42:30,900 compare two things and then you separately jump on what 778 00:42:30,900 --> 00:42:32,960 the condition is. 779 00:42:32,960 --> 00:42:37,330 There's a pretty nice website that has 780 00:42:37,330 --> 00:42:39,840 most of these opcodes. 781 00:42:39,840 --> 00:42:44,590 However, they only deal with the old x86 782 00:42:44,590 --> 00:42:46,970 without the 64-bit extension. 783 00:42:46,970 --> 00:42:49,390 And they use the Intel syntax. 784 00:42:49,390 --> 00:42:50,570 But it's really convenient.
785 00:42:50,570 --> 00:42:55,140 Because they've done a nice job of making a quick jump 786 00:42:55,140 --> 00:42:57,220 table where you can just go, look up the 787 00:42:57,220 --> 00:42:59,200 opcode, and pop it up. 788 00:42:59,200 --> 00:43:02,560 Otherwise you can just look at them in the manual. 789 00:43:02,560 --> 00:43:04,420 Anyway, that's kind of a convenient place. 790 00:43:04,420 --> 00:43:07,180 And, as I say, just beware because it's 32-bit only, and 791 00:43:07,180 --> 00:43:08,350 it's Intel syntax. 792 00:43:08,350 --> 00:43:10,190 Most of the instructions got extended. 793 00:43:10,190 --> 00:43:15,240 I mean it's like, OK, if you do it for eight and 16 and 32, 794 00:43:15,240 --> 00:43:18,720 the operation is not going to change that much to go to 64. 795 00:43:18,720 --> 00:43:20,780 A few of them do, however. 796 00:43:20,780 --> 00:43:27,520 Now, the operands, Intel supports, the x86, which is 797 00:43:27,520 --> 00:43:30,820 Intel and AMD, typically, support all kinds of 798 00:43:30,820 --> 00:43:33,870 addressing modes. 799 00:43:33,870 --> 00:43:36,550 The rule is that only one operand, 800 00:43:36,550 --> 00:43:41,060 however, can address memory. 801 00:43:41,060 --> 00:43:43,180 So you have to pick which is the operand that's going to 802 00:43:43,180 --> 00:43:45,600 address memory if you have multiple operands. 803 00:43:45,600 --> 00:43:47,270 You can't have both operands. 804 00:43:47,270 --> 00:43:50,860 So you can't add two things in memory. 805 00:43:50,860 --> 00:43:52,800 You always have to take something for memory and bring 806 00:43:52,800 --> 00:43:56,880 it into a register and then store it back out. 807 00:43:56,880 --> 00:44:00,610 So the simplest one is two register instructions. 808 00:44:00,610 --> 00:44:04,080 Here I've basically marked the-- 809 00:44:04,080 --> 00:44:05,030 What have I marked here? 810 00:44:05,030 --> 00:44:07,515 I guess I marked-- 811 00:44:07,515 --> 00:44:07,820 I don't know. 
812 00:44:07,820 --> 00:44:09,090 Down here I was marking memory. 813 00:44:09,090 --> 00:44:10,820 I'm not sure what I was marking up here. 814 00:44:10,820 --> 00:44:12,070 Because they're both registers. 815 00:44:14,270 --> 00:44:18,710 But in any case, this is just adding RBX into RAX. 816 00:44:18,710 --> 00:44:21,850 And so it takes the contents of RBX, adds it into 817 00:44:21,850 --> 00:44:23,730 the contents of RAX. 818 00:44:23,730 --> 00:44:26,360 There's something that's called direct. 819 00:44:26,360 --> 00:44:31,780 So this is, it says, where you move, x is some constant 820 00:44:31,780 --> 00:44:37,130 value, and you move it, the contents of it, into RDI. 821 00:44:37,130 --> 00:44:39,270 So if x, for example, is a location that you've stored a 822 00:44:39,270 --> 00:44:42,570 value in, you can say move whatever is the value at that 823 00:44:42,570 --> 00:44:46,450 location into RDI. 824 00:44:46,450 --> 00:44:51,240 Immediate says, which usually is preceded by a dollar sign, 825 00:44:51,240 --> 00:44:54,990 says move the address of it, move that as a constant. 826 00:44:54,990 --> 00:44:57,970 So x has a value, move that value. 827 00:44:57,970 --> 00:45:02,290 So if you say $3, then you'll move the 828 00:45:02,290 --> 00:45:03,740 constant three into RDI. 829 00:45:03,740 --> 00:45:07,690 If you said mov 3, you're going to move the contents of 830 00:45:07,690 --> 00:45:11,030 location three in memory. 831 00:45:11,030 --> 00:45:16,250 So that's the difference between direct and immediate. 832 00:45:16,250 --> 00:45:19,720 So the dollar sign says you're taking that 833 00:45:19,720 --> 00:45:21,350 as a literal constant. 834 00:45:21,350 --> 00:45:24,610 And the direct says you're actually going to memory and 835 00:45:24,610 --> 00:45:26,470 fetching it. 836 00:45:26,470 --> 00:45:28,020 Then things start getting interesting.
837 00:45:28,020 --> 00:45:33,215 Register indirect says, in this case, the thing that 838 00:45:33,215 --> 00:45:34,900 you're going to access is the thing 839 00:45:34,900 --> 00:45:38,050 pointed to by that register. 840 00:45:38,050 --> 00:45:42,610 So don't move, in this case, RBX into RAX, move it to the 841 00:45:42,610 --> 00:45:47,920 memory location that RAX is pointing to. 842 00:45:47,920 --> 00:45:51,160 Then you can do register indexed which says, well, it's 843 00:45:51,160 --> 00:45:57,300 pointing to it, but I want it displaced 172 bytes off of 844 00:45:57,300 --> 00:46:01,930 that location, of whatever this is pointing to. 845 00:46:01,930 --> 00:46:04,470 So, for example, if you have a pointer to a record, you can 846 00:46:04,470 --> 00:46:07,050 then have just a single pointer and address all the 847 00:46:07,050 --> 00:46:10,320 fields just by doing register indexed to the different 848 00:46:10,320 --> 00:46:11,940 fields using that same register. 849 00:46:15,430 --> 00:46:17,650 Then there is, it actually-- 850 00:46:17,650 --> 00:46:21,220 I skipped, actually, a few in here that are subsets of this. 851 00:46:21,220 --> 00:46:24,500 This is, I think, the most complicated one that I know. 852 00:46:24,500 --> 00:46:27,520 It's base index scale displacement where base and 853 00:46:27,520 --> 00:46:31,100 index are registers, the scale is two, four, eight, and if 854 00:46:31,100 --> 00:46:33,050 it's not there, it implies one. 855 00:46:33,050 --> 00:46:37,710 The displacement is eight, 16, or a 32-bit value. 856 00:46:37,710 --> 00:46:45,260 And it says take RD-- 857 00:46:45,260 --> 00:46:48,570 oh, I had put the math on here, and then I guess I lost 858 00:46:48,570 --> 00:46:54,640 it-- it says take RDX, multiply it by eight, add RDI, 859 00:46:54,640 --> 00:46:57,141 and add 172. 860 00:46:57,141 --> 00:46:58,391 [WHISTLE]. 861 00:47:00,290 --> 00:47:02,700 So, anyway, you can look in the manual.
862 00:47:02,700 --> 00:47:05,860 So you'll see some of these instructions being generated. 863 00:47:05,860 --> 00:47:08,030 Generally, you're not going to generate these instructions. 864 00:47:08,030 --> 00:47:10,780 So when you see them generated, you can see it. 865 00:47:10,780 --> 00:47:14,220 And then, and this is actually new, it's not in the x86. 866 00:47:14,220 --> 00:47:18,270 It has this instruction pointer where you can actually 867 00:47:18,270 --> 00:47:22,350 access where the current program counter is pointing, 868 00:47:22,350 --> 00:47:25,490 where it is in the code, and store that value, in this case 869 00:47:25,490 --> 00:47:28,660 indexed by six, into RAX. 870 00:47:28,660 --> 00:47:30,760 So you can do it relative where this has 871 00:47:30,760 --> 00:47:33,200 to be a 32-bit constant. 872 00:47:33,200 --> 00:47:37,210 And what's good about that is it allows you then to write 873 00:47:37,210 --> 00:47:39,930 code where you can do things like jump to something that's 874 00:47:39,930 --> 00:47:42,690 relative to the program counter. 875 00:47:42,690 --> 00:47:46,020 And that lets you put the code anywhere in memory, and it 876 00:47:46,020 --> 00:47:48,010 still has the same behavior. 877 00:47:48,010 --> 00:47:51,190 Because you're going relative to where that code is rather 878 00:47:51,190 --> 00:47:54,080 than to an absolute location. 879 00:47:54,080 --> 00:47:56,315 So it allows the code to be relocatable. 880 00:48:00,580 --> 00:48:02,540 So here's-- 881 00:48:02,540 --> 00:48:03,740 Yeah, questions, yeah sure. 882 00:48:03,740 --> 00:48:07,044 AUDIENCE: Why was it the index registers, when you have the 883 00:48:07,044 --> 00:48:09,876 numbering for the register, what does that mean again? 884 00:48:09,876 --> 00:48:11,680 CHARLES LEISERSON: The number before the register? 885 00:48:11,680 --> 00:48:13,099 AUDIENCE: In the instruction [UNINTELLIGIBLE], 886 00:48:13,099 --> 00:48:15,120 that's 60 for RID. 
887 00:48:15,120 --> 00:48:18,610 CHARLES LEISERSON: OK, or whatever, whenever it's here, 888 00:48:18,610 --> 00:48:22,440 it's basically saying, add that value to 889 00:48:22,440 --> 00:48:25,900 the contents of RAX. 890 00:48:25,900 --> 00:48:29,490 And so the same thing here, add six to the contents of the 891 00:48:29,490 --> 00:48:30,050 instruction pointer. 892 00:48:30,050 --> 00:48:35,510 So this is six bytes ahead of me in the instruction stream. 893 00:48:35,510 --> 00:48:36,360 OK? 894 00:48:36,360 --> 00:48:39,230 So you can actually say, well, what's that instruction ahead 895 00:48:39,230 --> 00:48:41,380 of me in the instruction stream? 896 00:48:41,380 --> 00:48:42,630 OK? 897 00:48:45,350 --> 00:48:50,490 So here's some examples of essentially the same code and 898 00:48:50,490 --> 00:48:53,470 how it gets compiled. 899 00:48:53,470 --> 00:48:57,420 So here we're going to have a foo1, foo2, foo3. 900 00:48:57,420 --> 00:49:00,690 And in this case we declare x, y, and z 901 00:49:00,690 --> 00:49:02,910 to be unsigned integers. 902 00:49:02,910 --> 00:49:04,450 We set them to some values. 903 00:49:04,450 --> 00:49:10,900 And we just simply say return x plus y or z, 904 00:49:10,900 --> 00:49:12,880 bitwise OR with z. 905 00:49:12,880 --> 00:49:16,450 If you look at what the code is that's generated, it says 906 00:49:16,450 --> 00:49:19,612 move the constant 45 into EAX. 907 00:49:22,740 --> 00:49:23,800 Why does it do that? 908 00:49:23,800 --> 00:49:24,990 Well, let's just see. 909 00:49:24,990 --> 00:49:28,930 Well, the compiler figures out that it knows what 34, seven, 910 00:49:28,930 --> 00:49:29,890 and 45 are. 911 00:49:29,890 --> 00:49:31,560 It computes x plus y. 912 00:49:31,560 --> 00:49:32,975 That's 41. 913 00:49:32,975 --> 00:49:37,930 If you take 41 bitwise OR with 45, it turns out it's masking 914 00:49:37,930 --> 00:49:40,040 the same bits, that's 45.
915 00:49:40,040 --> 00:49:43,600 So the compiler actually can figure this out that all it 916 00:49:43,600 --> 00:49:48,040 has to do is return 45 in a 64-bit register. 917 00:49:48,040 --> 00:49:51,620 Ah, but here it's returning it in a 32-bit register. 918 00:49:51,620 --> 00:49:52,770 What happened? 919 00:49:52,770 --> 00:49:53,990 It's not obeying the type. 920 00:49:53,990 --> 00:49:56,200 The type is supposed to be 64-bits, but 921 00:49:56,200 --> 00:49:58,460 that's a 32-bit register. 922 00:49:58,460 --> 00:50:01,400 Oh, yeah, that's this thing where it automatically zeroes 923 00:50:01,400 --> 00:50:03,540 out the high order bits. 924 00:50:03,540 --> 00:50:07,200 And it uses this instruction, because this is a shorter 925 00:50:07,200 --> 00:50:10,030 instruction than if it did the RAX. 926 00:50:10,030 --> 00:50:12,550 It could do the same thing with RAX, but it would be more 927 00:50:12,550 --> 00:50:13,640 bytes of instruction. 928 00:50:13,640 --> 00:50:17,020 So they saved a couple bytes of instruction by doing that. 929 00:50:17,020 --> 00:50:20,240 So people follow what happened there? 930 00:50:20,240 --> 00:50:21,290 Let's take a look at the next one. 931 00:50:21,290 --> 00:50:23,770 Here it's the same code just let's pass 932 00:50:23,770 --> 00:50:26,890 those things as arguments. 933 00:50:26,890 --> 00:50:29,280 Well, if you remember the calling convention, parameter 934 00:50:29,280 --> 00:50:32,640 one is in RDI, parameter two is in RSI, and 935 00:50:32,640 --> 00:50:35,580 parameter three is in RDX. 936 00:50:35,580 --> 00:50:37,170 So I don't expect you to remember that off 937 00:50:37,170 --> 00:50:37,920 the top of your head. 938 00:50:37,920 --> 00:50:41,020 But we have the cheat sheet, and you can figure that out. 939 00:50:41,020 --> 00:50:43,520 So here's what it does is, oh my goodness, what is that 940 00:50:43,520 --> 00:50:44,530 instruction?
941 00:50:44,530 --> 00:50:51,070 This is actually a computation of effective address. 942 00:50:51,070 --> 00:50:54,100 So the effective address is basically saying, and it's 943 00:50:54,100 --> 00:50:56,610 using one of these funny indexing modes. 944 00:50:56,610 --> 00:51:00,650 So what this is actually doing is it's actually adding these 945 00:51:00,650 --> 00:51:05,690 two numbers together, the values stored in those 946 00:51:05,690 --> 00:51:08,940 locations together, and storing it into RAX. 947 00:51:08,940 --> 00:51:16,990 And then it's then OR-ing RDX, what's in RDX, with RAX and 948 00:51:16,990 --> 00:51:17,800 then returning. 949 00:51:17,800 --> 00:51:22,280 Remember that RAX is where the result is always going to be. 950 00:51:22,280 --> 00:51:24,470 So the result is always returned in RAX. 951 00:51:24,470 --> 00:51:27,360 So you can see you have to do a little bit more complicated 952 00:51:27,360 --> 00:51:30,240 addressing in order to pull them out as parameters than if 953 00:51:30,240 --> 00:51:32,690 it could actually figure out what the numbers are. 954 00:51:32,690 --> 00:51:35,890 Last example here is I declared these things before I 955 00:51:35,890 --> 00:51:38,760 ever got in, they're globals. 956 00:51:38,760 --> 00:51:40,540 And so I declared them before I ever got in. 957 00:51:40,540 --> 00:51:42,530 So that means since they're globals, they have a fixed 958 00:51:42,530 --> 00:51:44,200 place in memory. 959 00:51:44,200 --> 00:51:49,180 And so the code that's generated is moving, it turns 960 00:51:49,180 --> 00:51:53,790 out it allocates them right nearby the instructions here. 961 00:51:53,790 --> 00:51:56,710 And so what it does is it has actually a relative offset for 962 00:51:56,710 --> 00:52:03,490 x, relative to the instruction pointer, put that in RAX, add 963 00:52:03,490 --> 00:52:08,420 the offset of y into it, and then OR it with the offset of 964 00:52:08,420 --> 00:52:10,990 z, and then return.
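[Editor's sketch] The three variants discussed above might look roughly like this in C. The source code from the slides is not reproduced in the transcript, so this is an assumption: the values 34, 7, and 45 are chosen to match the arithmetic stated in the lecture (x plus y is 41, and 41 OR 45 is 45), the type is taken to be a 64-bit unsigned integer, and the global names gx, gy, gz are hypothetical.

```c
#include <stdint.h>

/* Variant 1: locals with known values. A compiler can fold the whole
 * expression into a single constant at compile time. */
uint64_t foo1(void) {
    uint64_t x = 34, y = 7, z = 45;
    return (x + y) | z;
}

/* Variant 2: the same expression on parameters. The values are unknown
 * at compile time, so an add and an OR must actually be executed. */
uint64_t foo2(uint64_t x, uint64_t y, uint64_t z) {
    return (x + y) | z;
}

/* Variant 3: the same expression on globals, which live at fixed
 * places in memory and must be loaded before use. */
uint64_t gx = 34, gy = 7, gz = 45;

uint64_t foo3(void) {
    return (gx + gy) | gz;
}
```

All three return the same value for these inputs; what differs is where the operands live, which is what changes the generated code.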
965 00:52:10,990 --> 00:52:13,960 And so there the constants are actually stored right nearby 966 00:52:13,960 --> 00:52:19,200 in the code so that they can use this relative offset. 967 00:52:19,200 --> 00:52:23,470 And the compiler figures out, or the assembler figures out, 968 00:52:23,470 --> 00:52:26,640 exactly what the offset is that it actually needs to 969 00:52:26,640 --> 00:52:30,410 substitute for y so that it can be a relative offset from 970 00:52:30,410 --> 00:52:32,120 the current instruction pointer. 971 00:52:32,120 --> 00:52:33,970 Notice that, for example, that's going to change 972 00:52:33,970 --> 00:52:35,980 depending upon the value of y here. 973 00:52:35,980 --> 00:52:39,700 It's going to change compared to if I accessed y down here. 974 00:52:39,700 --> 00:52:40,960 It would be a different instruction 975 00:52:40,960 --> 00:52:43,070 pointer at this point. 976 00:52:43,070 --> 00:52:45,020 So it actually just goes and computes what the 977 00:52:45,020 --> 00:52:50,710 difference is so it knows what the distance is. 978 00:52:50,710 --> 00:52:53,730 It can compute that at compile time, and then at execution 979 00:52:53,730 --> 00:52:57,440 time it just uses whatever constant goes in there. 980 00:52:57,440 --> 00:52:59,540 So the important thing here is just to notice that the code 981 00:52:59,540 --> 00:53:01,990 depends upon where x, y, and z are allocated. 982 00:53:06,010 --> 00:53:11,420 So the first thing to actually look at good code is to 983 00:53:11,420 --> 00:53:13,400 understand the calling convention 984 00:53:13,400 --> 00:53:15,915 that's used by the compiler. 985 00:53:19,950 --> 00:53:21,550 And here are the basics of it. 986 00:53:21,550 --> 00:53:24,910 So the register RSP points to the function 987 00:53:24,910 --> 00:53:26,620 call stack in memory.
988 00:53:26,620 --> 00:53:29,750 And the call stack grows downward in memory, like in 989 00:53:29,750 --> 00:53:33,080 that little map I showed you before, so that as you push 990 00:53:33,080 --> 00:53:36,040 things onto the stack they're getting lower numbered not 991 00:53:36,040 --> 00:53:37,290 higher numbered. 992 00:53:40,260 --> 00:53:43,120 The call instruction pushes the current instruction 993 00:53:43,120 --> 00:53:47,780 pointer onto the stack, jumps to the call target operand, 994 00:53:47,780 --> 00:53:51,550 which is basically the address of the thing you're calling. 995 00:53:51,550 --> 00:53:55,020 So when you do a call, it saves your return 996 00:53:55,020 --> 00:53:57,580 address on the stack. 997 00:53:57,580 --> 00:54:00,520 The return instruction pops the return address off the 998 00:54:00,520 --> 00:54:02,410 stack and returns to the caller. 999 00:54:02,410 --> 00:54:05,520 It basically says, oh, I know where the return address is. 1000 00:54:05,520 --> 00:54:09,790 I slam that into the current instruction pointer, and that 1001 00:54:09,790 --> 00:54:13,140 becomes the next instruction that's executed. 1002 00:54:13,140 --> 00:54:16,080 Now, there are some software conventions that are used 1003 00:54:16,080 --> 00:54:17,210 that are helpful to know. 1004 00:54:17,210 --> 00:54:20,030 Besides those instruction registers, some of the 1005 00:54:20,030 --> 00:54:23,300 registers are expected to be saved by the caller, some are 1006 00:54:23,300 --> 00:54:26,930 expected to be saved by the callee. 1007 00:54:26,930 --> 00:54:31,460 You're free to violate this in your own little piece of code 1008 00:54:31,460 --> 00:54:34,690 as long as if you're calling something else, 1009 00:54:34,690 --> 00:54:37,260 you're obeying it. 1010 00:54:37,260 --> 00:54:40,090 So you don't have to obey this convention in the code you 1011 00:54:40,090 --> 00:54:44,760 write unless you want to interoperate with other stuff.
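[Editor's sketch] The call and return mechanics described above can be modeled with a toy machine in C. This is a simplified illustration, not how the hardware is implemented: the struct machine type, its fields, and the function names are all hypothetical, the stack is an array of words rather than bytes, and there is no overflow checking.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Toy machine state. All names here are hypothetical. */
struct machine {
    uint64_t stack[STACK_WORDS];
    uint64_t rsp;  /* index of the top-of-stack word */
    uint64_t rip;  /* "address" of the next instruction */
};

void machine_init(struct machine *m, uint64_t entry) {
    m->rsp = STACK_WORDS;  /* empty stack: just past the highest word */
    m->rip = entry;
}

/* call: push the return address, then jump to the target. */
void do_call(struct machine *m, uint64_t return_address, uint64_t target) {
    m->rsp -= 1;  /* the stack grows toward lower addresses */
    m->stack[m->rsp] = return_address;
    m->rip = target;
}

/* ret: pop the return address into the instruction pointer. */
void do_ret(struct machine *m) {
    m->rip = m->stack[m->rsp];
    m->rsp += 1;
}
```

Nested calls push one return address each, and each ret pops the most recent one, which is why the innermost call returns first.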
1012 00:54:44,760 --> 00:54:48,320 So if you, for example, have a leaf procedure, you can decide 1013 00:54:48,320 --> 00:54:50,880 for that leaf procedure, oh, I'm going to make something 1014 00:54:50,880 --> 00:54:54,240 callee saved that was caller saved or whatever as long as 1015 00:54:54,240 --> 00:54:56,840 by the time you return you've cleaned everything up for the 1016 00:54:56,840 --> 00:54:58,450 rest of the world. 1017 00:54:58,450 --> 00:55:00,800 So these are conventions. 1018 00:55:00,800 --> 00:55:03,575 But for the most part, you're not going to violate these, 1019 00:55:03,575 --> 00:55:06,350 and the code that the compiler generates doesn't violate 1020 00:55:06,350 --> 00:55:09,460 these because it expects everything to interoperate. 1021 00:55:09,460 --> 00:55:11,980 So here's how the subroutine linkage works. 1022 00:55:11,980 --> 00:55:15,100 We're going to do an example here where function A calls 1023 00:55:15,100 --> 00:55:18,200 function B which will call function C. And right now, 1024 00:55:18,200 --> 00:55:21,450 we're at the point we're executing B. And so on the 1025 00:55:21,450 --> 00:55:29,600 stack are the arguments that were passed from A to B that 1026 00:55:29,600 --> 00:55:32,020 did not fit within the registers. 1027 00:55:32,020 --> 00:55:33,050 So normally most of the 1028 00:55:33,050 --> 00:55:34,510 arguments are within registers. 1029 00:55:34,510 --> 00:55:38,750 But if you exceed the six registers then, because you 1030 00:55:38,750 --> 00:55:40,430 have a long argument list, then it gets 1031 00:55:40,430 --> 00:55:41,420 passed on the stack. 1032 00:55:41,420 --> 00:55:43,020 And here's where it gets passed. 1033 00:55:43,020 --> 00:55:45,890 The next thing is B's return address. 1034 00:55:45,890 --> 00:55:48,200 This is the thing that got smashed in there 1035 00:55:48,200 --> 00:55:49,890 when you did the call. 1036 00:55:49,890 --> 00:55:51,620 It got pushed onto the stack. 
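[Editor's sketch] A function with more than six integer arguments shows where the stack-passed arguments come from. The function sum8 below is a hypothetical example, not from the lecture. Under the x86-64 System V calling convention used on Linux, the first six integer arguments travel in RDI, RSI, RDX, RCX, R8, and R9, so here g and h would be the ones placed in the linkage block on the stack.

```c
#include <stdint.h>

/* Eight integer arguments: a through f fit in the six argument
 * registers; g and h do not, so the caller passes them on the stack. */
uint64_t sum8(uint64_t a, uint64_t b, uint64_t c, uint64_t d,
              uint64_t e, uint64_t f, uint64_t g, uint64_t h) {
    return a + b + c + d + e + f + g + h;
}
```

Compiling a caller of such a function to assembly and looking for the stores just before the call instruction is one way to see the stack-passed arguments being written.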
1037 00:55:51,620 --> 00:55:54,290 And then there's what's called a base pointer for A. And this 1038 00:55:54,290 --> 00:55:57,790 is the way that A ends up accessing its local variables. 1039 00:55:57,790 --> 00:56:01,110 And then there's a separate region here where it's going 1040 00:56:01,110 --> 00:56:06,920 to put arguments from B to B's callees if they exceed the six 1041 00:56:06,920 --> 00:56:13,440 registers, if any of the things that B is calling 1042 00:56:13,440 --> 00:56:15,130 require more than the six registers. 1043 00:56:15,130 --> 00:56:16,850 So let's just take a look. 1044 00:56:16,850 --> 00:56:22,320 So function B can access its nonregister values by indexing 1045 00:56:22,320 --> 00:56:25,600 off of RBP. 1046 00:56:25,600 --> 00:56:28,480 So these we say, these are in a linkage block. 1047 00:56:28,480 --> 00:56:30,250 And the reason is because it's actually part of 1048 00:56:30,250 --> 00:56:31,850 A's frame as well. 1049 00:56:31,850 --> 00:56:35,380 It's a shared part of the frame where A stores it into 1050 00:56:35,380 --> 00:56:38,010 memory and then B's going to fetch it out of memory. 1051 00:56:38,010 --> 00:56:39,870 And that's the linkage block. 1052 00:56:39,870 --> 00:56:41,900 So this is positive in memory. 1053 00:56:41,900 --> 00:56:47,500 So if I use a positive offset, I then go up to getting the 1054 00:56:47,500 --> 00:56:55,820 arguments from A. Then it can access its local variables 1055 00:56:55,820 --> 00:56:58,510 from the base pointer with a negative offset because we're 1056 00:56:58,510 --> 00:56:59,760 growing down in memory. 1057 00:57:02,790 --> 00:57:08,640 Now, if it wants to call C, what it does is it places the 1058 00:57:08,640 --> 00:57:15,680 nonregister arguments into the reserved linkage block here, 1059 00:57:15,680 --> 00:57:18,730 which are arguments from B to B's callees. 1060 00:57:18,730 --> 00:57:21,890 And that once again acts just as if they're local variables.
1061 00:57:21,890 --> 00:57:28,530 It's positive index off of RBP, sorry, negative offset 1062 00:57:28,530 --> 00:57:29,580 off of RBP. 1063 00:57:29,580 --> 00:57:33,860 So it pushes those things into the argument, into that 1064 00:57:33,860 --> 00:57:38,080 region, if it needs to use that region. 1065 00:57:38,080 --> 00:57:42,130 Then we actually, once it's done that, we have the call. 1066 00:57:42,130 --> 00:57:46,920 So B calls C which saves the return address for B on the 1067 00:57:46,920 --> 00:57:52,440 stack, so it saves it on the stack, and then transfers 1068 00:57:52,440 --> 00:58:00,120 control to C. So now it starts executing C's code. 1069 00:58:00,120 --> 00:58:01,780 And what does C do? 1070 00:58:01,780 --> 00:58:05,120 So C is going to have to advance these pointers to 1071 00:58:05,120 --> 00:58:08,720 refer to its region rather than B's. 1072 00:58:08,720 --> 00:58:12,770 It does it by saving B's base pointer on the stack. 1073 00:58:12,770 --> 00:58:15,695 So it saves this pointer here so that it can restore it when 1074 00:58:15,695 --> 00:58:16,950 it returns. 1075 00:58:16,950 --> 00:58:21,010 It advances, it sets its new base pointer to be where the 1076 00:58:21,010 --> 00:58:25,850 stack pointer is now and then advances the stack pointer to 1077 00:58:25,850 --> 00:58:29,520 allocate space for C's local variables and linkage blocks. 1078 00:58:29,520 --> 00:58:30,770 Watch, here we go. 1079 00:58:36,690 --> 00:58:38,900 So that ends up being C's frame. 1080 00:58:38,900 --> 00:58:43,360 So notice that B's frame and C's frame are overlapping in 1081 00:58:43,360 --> 00:58:45,157 the linkage block between them. 1082 00:58:49,220 --> 00:58:52,800 Now, if a function never performs stack allocations 1083 00:58:52,800 --> 00:58:57,090 except during function calls, there's a great compile time 1084 00:58:57,090 --> 00:59:00,810 optimization that the compiler will often do. 
1085 00:59:00,810 --> 00:59:03,350 And what it will do is realize that 1086 00:59:03,350 --> 00:59:06,620 this distance is constant. 1087 00:59:06,620 --> 00:59:10,450 So, therefore, it doesn't need RBP. 1088 00:59:10,450 --> 00:59:15,190 It can just do the math and index everything off of RSP as 1089 00:59:15,190 --> 00:59:19,440 long as RSP is always the same, for example for C, when 1090 00:59:19,440 --> 00:59:21,840 C is executing. 1091 00:59:21,840 --> 00:59:24,400 There's certain C commands like alloca 1092 00:59:24,400 --> 00:59:26,310 which change the stack pointer. 1093 00:59:26,310 --> 00:59:31,040 If you use those, the compiler can't do that optimization. 1094 00:59:31,040 --> 00:59:36,200 But if the storage on the stack never changes for a 1095 00:59:36,200 --> 00:59:39,490 given frame, then it's free to make this optimization. 1096 00:59:39,490 --> 00:59:42,360 So you'll see code where RBP has been optimized away. 1097 00:59:45,590 --> 00:59:48,340 How about some questions before we go 1098 00:59:48,340 --> 00:59:50,030 on and do an example. 1099 00:59:50,030 --> 00:59:51,443 Yeah? 1100 00:59:51,443 --> 00:59:52,856 AUDIENCE: [INAUDIBLE] 1101 00:59:52,856 --> 00:59:56,624 should there be A's return address, where you try to 1102 00:59:56,624 --> 00:59:59,094 [INAUDIBLE]? 1103 00:59:59,094 --> 01:00:00,390 CHARLES LEISERSON: Up? 1104 01:00:00,390 --> 01:00:02,360 Oh, this should be, sorry, this should 1105 01:00:02,360 --> 01:00:03,310 be A's return address. 1106 01:00:03,310 --> 01:00:04,420 Yes, you're right. 1107 01:00:04,420 --> 01:00:05,670 OK, good, typo. 1108 01:00:09,090 --> 01:00:10,956 Is somebody catching my typos to-- 1109 01:00:14,080 --> 01:00:16,640 OK, yep, a good one, that should be A's return address. 1110 01:00:16,640 --> 01:00:18,410 Sorry about that. 1111 01:00:18,410 --> 01:00:20,560 This is B's return address. 1112 01:00:20,560 --> 01:00:21,460 Any other questions? 1113 01:00:21,460 --> 01:00:22,070 That's good.
1114 01:00:22,070 --> 01:00:23,720 That means you understand something. 1115 01:00:23,720 --> 01:00:24,970 Hooray. 1116 01:00:29,980 --> 01:00:31,750 So let's do an example. 1117 01:00:31,750 --> 01:00:34,450 So here's my fib example. 1118 01:00:34,450 --> 01:00:38,150 And I compiled this with -O0. 1119 01:00:38,150 --> 01:00:41,620 Because when I compiled it with -O3, I 1120 01:00:41,620 --> 01:00:44,650 couldn't understand what was going on. 1121 01:00:44,650 --> 01:00:48,000 So I compiled this with -O0, which gives me really 1122 01:00:48,000 --> 01:00:48,910 unoptimized code. 1123 01:00:48,910 --> 01:00:52,150 And that lets me be the compiler optimizer. 1124 01:00:52,150 --> 01:00:54,920 So here's the code that it generates. 1125 01:00:54,920 --> 01:00:56,720 So we can take a look at a few things here. 1126 01:00:56,720 --> 01:00:59,370 First of all is declaring fib to be a global. 1127 01:00:59,370 --> 01:01:00,880 And it's got some other things here. 1128 01:01:00,880 --> 01:01:03,640 I actually took out some of the directives that were in 1129 01:01:03,640 --> 01:01:05,590 here that were irrelevant for our purposes. 1130 01:01:05,590 --> 01:01:07,960 If you actually compile it, there's a lot more directives 1131 01:01:07,960 --> 01:01:10,680 that are stuck in there and a lot more labels and things 1132 01:01:10,680 --> 01:01:16,700 that you don't need to understand it. 1133 01:01:16,700 --> 01:01:18,340 There are two labels here. 1134 01:01:18,340 --> 01:01:24,850 And so you can see here basically what's going on is 1135 01:01:24,850 --> 01:01:29,090 we're first of all doing the advancing of the base pointer 1136 01:01:29,090 --> 01:01:31,530 and advancing the stack pointer here. 1137 01:01:31,530 --> 01:01:34,380 That's doing that operation that I showed you, those of 1138 01:01:34,380 --> 01:01:37,680 moving the base and stack pointer up.
1139 01:01:37,680 --> 01:01:45,170 And then at the end here this is equivalent to doing, a 1140 01:01:45,170 --> 01:01:47,750 leave instruction is equivalent to undoing that. 1141 01:01:47,750 --> 01:01:52,110 So Intel lets you do one leave instruction rather than making 1142 01:01:52,110 --> 01:01:55,580 you put these instructions in every time. 1143 01:01:55,580 --> 01:01:59,230 It's exactly the same thing. 1144 01:01:59,230 --> 01:02:01,590 But in any case, let's just sort of see 1145 01:02:01,590 --> 01:02:02,940 what's going on here. 1146 01:02:02,940 --> 01:02:07,370 So we're pushing some storage. 1147 01:02:07,370 --> 01:02:10,080 This is saving a register here. 1148 01:02:10,080 --> 01:02:14,750 We're then advancing the stack pointer to store 24 bytes of 1149 01:02:14,750 --> 01:02:16,580 temporary storage. 1150 01:02:16,580 --> 01:02:19,950 And then we start to do some computations here. 1151 01:02:19,950 --> 01:02:22,580 This looks like we're comparing one with something 1152 01:02:22,580 --> 01:02:24,880 and then doing a ja. 1153 01:02:24,880 --> 01:02:27,510 So this is a jump above. 1154 01:02:27,510 --> 01:02:29,370 This is the unsigned version. 1155 01:02:29,370 --> 01:02:32,380 What you're looking to see is, here we say if n is less than 1156 01:02:32,380 --> 01:02:36,450 two, in fact, what it's doing is saying if n is greater than 1157 01:02:36,450 --> 01:02:38,770 one go to L4. 1158 01:02:42,280 --> 01:02:43,640 So it's actually doing the other one. 1159 01:02:43,640 --> 01:02:47,020 So you can see then L4 is, what happens is the one that 1160 01:02:47,020 --> 01:02:50,200 has the two calls to fib, recursive calls, so that's 1161 01:02:50,200 --> 01:02:52,850 this part of the code, and it's doing that if it's 1162 01:02:52,850 --> 01:02:53,470 greater than one. 1163 01:02:53,470 --> 01:02:58,310 And otherwise it's going to execute these instructions, 1164 01:02:58,310 --> 01:03:01,820 which are basically returning n.
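[Editor's sketch] For reference, the fib routine being compiled is presumably something like the first function below. The exact source is not in the transcript, so the signature is an assumption: an unsigned 64-bit argument is inferred from the unsigned jump (ja) and the 64-bit registers mentioned in the lecture. The second function, whose name fib_as_compiled is hypothetical, restructures the same logic to mirror the control flow described above, where the test is inverted to "if n is greater than one, go to L4".

```c
#include <stdint.h>

/* Straightforward recursive Fibonacci, as it would be written in C. */
uint64_t fib(uint64_t n) {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

/* The same function shaped like the unoptimized assembly: the compiler
 * tests the opposite condition and jumps over the base case. */
uint64_t fib_as_compiled(uint64_t n) {
    uint64_t result;
    if (n > 1) goto L4;  /* compare with 1, jump if above (unsigned) */
    result = n;          /* base case: return n */
    goto L5;
L4:
    result = fib_as_compiled(n - 1) + fib_as_compiled(n - 2);
L5:
    return result;       /* both paths converge here */
}
```

For unsigned n, "n < 2" and "not (n > 1)" are the same condition, which is why the compiler is free to emit either form.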
1165 01:03:01,820 --> 01:03:04,020 And so it basically does some computations. 1166 01:03:04,020 --> 01:03:07,510 And then both of them converge here where it moves the 1167 01:03:07,510 --> 01:03:11,490 results and then pops it off and so forth. 1168 01:03:11,490 --> 01:03:13,810 So that's sort of the outline of what's going on there. 1169 01:03:13,810 --> 01:03:16,920 So let's dive in here a little bit and sort of see what's 1170 01:03:16,920 --> 01:03:21,820 going on, see if we can read this a little bit more closely 1171 01:03:21,820 --> 01:03:24,180 and whether we can optimize it. 1172 01:03:24,180 --> 01:03:26,500 So the first thing that I noticed in looking at this is 1173 01:03:26,500 --> 01:03:30,330 look at all this memory addressing that we're doing. 1174 01:03:30,330 --> 01:03:37,070 What do you suppose this thing is, -16(%rbp)? 1175 01:03:37,070 --> 01:03:38,080 So this is the base pointer. 1176 01:03:38,080 --> 01:03:42,340 So this is a local variable because it's a negative offset 1177 01:03:42,340 --> 01:03:43,790 off of the base pointer. 1178 01:03:43,790 --> 01:03:45,250 What do you think it's doing there? 1179 01:03:48,720 --> 01:03:49,970 What's stored in here? 1180 01:03:54,880 --> 01:03:58,960 Yeah, this is where n is being stored. 1181 01:03:58,960 --> 01:03:59,790 Because what are we doing? 1182 01:03:59,790 --> 01:04:04,240 We're trying to compare n with one here even though it says 1183 01:04:04,240 --> 01:04:05,470 two up there. 1184 01:04:05,470 --> 01:04:08,970 We're comparing it with one here. 1185 01:04:08,970 --> 01:04:11,460 And so I look at that, and I say, look, I'm comparing it 1186 01:04:11,460 --> 01:04:19,710 with one, then I'm jumping to L4, then I jump to L4 or not. 1187 01:04:19,710 --> 01:04:21,310 And then let's say I don't. 1188 01:04:21,310 --> 01:04:25,320 Well, then the first thing I do is I move n into RAX. 1189 01:04:25,320 --> 01:04:28,530 But wait a minute, I just compared it with that.
1190 01:04:28,530 --> 01:04:29,230 So I'm accessing n again. 1191 01:04:29,230 --> 01:04:31,100 I'm accessing it a third time. 1192 01:04:31,100 --> 01:04:32,930 How about if I try to store that stuff 1193 01:04:32,930 --> 01:04:34,180 in a register instead? 1194 01:04:36,520 --> 01:04:39,830 So what I did is I picked the RDI register, because that one 1195 01:04:39,830 --> 01:04:44,630 happens to be available, and I said do they, if you look 1196 01:04:44,630 --> 01:04:45,320 here, what did we do? 1197 01:04:45,320 --> 01:04:51,350 We stored RDI, which is the first argument, into memory. 1198 01:04:51,350 --> 01:04:53,270 And then we compared with it in memory. 1199 01:04:53,270 --> 01:04:54,640 Why don't we compare with it in RDI? 1200 01:04:57,230 --> 01:04:57,520 Right? 1201 01:04:57,520 --> 01:05:02,655 Duh, stupid compiler, well, because I had -O0. 1202 01:05:05,610 --> 01:05:07,590 OK, so I can do that improvement. 1203 01:05:07,590 --> 01:05:11,140 So what I did was I edited it to put RDI there and RDI here 1204 01:05:11,140 --> 01:05:12,410 and RDI here. 1205 01:05:12,410 --> 01:05:15,990 And I went up and I said, what about RDI here? 1206 01:05:15,990 --> 01:05:17,410 Why didn't I replace that one? 1207 01:05:27,130 --> 01:05:28,650 No, there's no loop going on here. 1208 01:05:28,650 --> 01:05:29,694 It's recursion. 1209 01:05:29,694 --> 01:05:30,944 AUDIENCE: [INAUDIBLE PHRASE] 1210 01:05:33,870 --> 01:05:35,690 CHARLES LEISERSON: Yeah, the problem is that when I call 1211 01:05:35,690 --> 01:05:41,020 fib, RDI gets garbaged on me. 1212 01:05:41,020 --> 01:05:45,590 Because RDI is going to be the first argument to-- 1213 01:05:45,590 --> 01:05:48,330 See it's being garbaged here? 1214 01:05:48,330 --> 01:05:51,360 It's garbage as far as my use of it for n. 1215 01:05:51,360 --> 01:05:55,790 It's being used to pass n minus 1 as the argument to the 1216 01:05:55,790 --> 01:05:57,040 recursive call.
1217 01:06:02,670 --> 01:06:05,500 So I can't replace this one after fib because RDI no 1218 01:06:05,500 --> 01:06:06,420 longer has it. 1219 01:06:06,420 --> 01:06:07,810 Because I had to leave it. 1220 01:06:07,810 --> 01:06:12,350 But even so I went from 5.45 seconds for the original code 1221 01:06:12,350 --> 01:06:14,930 to 4.09 seconds when I compile that just 1222 01:06:14,930 --> 01:06:16,300 with that little change. 1223 01:06:16,300 --> 01:06:19,360 I felt pretty good. 1224 01:06:19,360 --> 01:06:20,200 I felt pretty good. 1225 01:06:20,200 --> 01:06:23,090 So then I wanted more. 1226 01:06:23,090 --> 01:06:23,710 That was fun. 1227 01:06:23,710 --> 01:06:24,960 I wanted more. 1228 01:06:27,570 --> 01:06:30,660 So what was the next thing I noticed? 1229 01:06:30,660 --> 01:06:31,720 I noticed that-- 1230 01:06:31,720 --> 01:06:34,260 And by the way almost all the things, that 1231 01:06:34,260 --> 01:06:35,620 stuff I did last night. 1232 01:06:35,620 --> 01:06:37,870 This is what I did an hour before class so we'll see 1233 01:06:37,870 --> 01:06:39,960 whether it-- 1234 01:06:39,960 --> 01:06:44,260 So then I noticed that, look, we're moving this stuff here. 1235 01:06:44,260 --> 01:06:46,420 We keep using minus 24. 1236 01:06:46,420 --> 01:06:48,870 And once again, memory operations are expensive 1237 01:06:48,870 --> 01:06:50,290 compared to register operations. 1238 01:06:50,290 --> 01:06:51,340 Let me try to get rid of them. 1239 01:06:51,340 --> 01:06:52,850 What do you suppose is in here? 1240 01:06:59,160 --> 01:07:08,990 So look, we move RAX into the local variable minus 24. 1241 01:07:08,990 --> 01:07:11,310 And then we jump to L5. 1242 01:07:11,310 --> 01:07:13,370 And we move minus 24 into RAX. 1243 01:07:16,950 --> 01:07:19,490 That seems kind of unnecessary. 1244 01:07:19,490 --> 01:07:22,470 Here we move RBX into minus 24. 1245 01:07:22,470 --> 01:07:26,490 Then we move minus 24 into RAX.
1246 01:07:26,490 --> 01:07:29,350 What is this value first of all that I'm storing there? 1247 01:07:35,840 --> 01:07:39,050 What's going to be in RAX at the very end? 1248 01:07:42,730 --> 01:07:46,620 RAX is the return value. 1249 01:07:46,620 --> 01:07:50,010 So I'm trying to save, here in this case, this is the branch 1250 01:07:50,010 --> 01:07:52,010 where I just want to return n. 1251 01:07:52,010 --> 01:07:55,185 I just want to put RAX to have it return, be 1252 01:07:55,185 --> 01:07:57,720 in RAX when I return. 1253 01:07:57,720 --> 01:07:58,910 So I've got the value here. 1254 01:07:58,910 --> 01:07:59,440 It's just n. 1255 01:07:59,440 --> 01:08:00,395 It was in RDI. 1256 01:08:00,395 --> 01:08:01,750 It's now in RAX. 1257 01:08:01,750 --> 01:08:04,240 But that's clearly unnecessary. 1258 01:08:04,240 --> 01:08:07,940 Why go put it into memory and then take it back out again? 1259 01:08:07,940 --> 01:08:11,140 And here just put it in RAX directly. 1260 01:08:11,140 --> 01:08:12,760 So that's what I did. 1261 01:08:12,760 --> 01:08:17,180 I basically, instead of moving it here, I changed this 1262 01:08:17,180 --> 01:08:20,180 instruction that said add it and put in RBX, I said, no, 1263 01:08:20,180 --> 01:08:21,810 don't put it in RBX. 1264 01:08:21,810 --> 01:08:27,200 Let's just add RBX into RAX, and then it's right there. 1265 01:08:27,200 --> 01:08:30,800 And this one, get rid of those so that it's now moved into 1266 01:08:30,800 --> 01:08:33,840 RAX and it's in RAX. 1267 01:08:33,840 --> 01:08:35,000 So I did that. 1268 01:08:35,000 --> 01:08:39,319 I dropped to 3.9 seconds. 1269 01:08:39,319 --> 01:08:40,630 That felt pretty good, too. 1270 01:08:40,630 --> 01:08:45,229 In addition, I got rid of this extra variable. 1271 01:08:45,229 --> 01:08:49,600 So now I could actually reduce my storage requirements. 
1272 01:08:49,600 --> 01:08:53,000 However, when I measured it with this being 24 and not 1273 01:08:53,000 --> 01:08:55,970 being 24, it was the same speed. 1274 01:08:55,970 --> 01:08:58,870 So it's like, eh, but I didn't want to 1275 01:08:58,870 --> 01:09:02,260 waste the storage anyway. 1276 01:09:02,260 --> 01:09:05,200 So then I looked a little bit further. 1277 01:09:05,200 --> 01:09:10,660 And I noticed that I want to get rid of 1278 01:09:10,660 --> 01:09:13,750 this access to n here. 1279 01:09:13,750 --> 01:09:17,450 So basically I'm subtracting it, and I'm storing n. 1280 01:09:17,450 --> 01:09:18,569 How can I get rid of it? 1281 01:09:18,569 --> 01:09:21,020 And this took me a little while to figure out. 1282 01:09:21,020 --> 01:09:23,090 What I realized is, look, we're storing 1283 01:09:23,090 --> 01:09:27,399 stuff away in RBX. 1284 01:09:27,399 --> 01:09:31,770 We have RBX as an available register because I saved the 1285 01:09:31,770 --> 01:09:34,680 value of RBX with this push instruction there. 1286 01:09:34,680 --> 01:09:36,359 So RBX is an available register. 1287 01:09:36,359 --> 01:09:46,120 We're using it to keep the return value of the 1288 01:09:46,120 --> 01:09:47,370 first call to fib. 1289 01:09:49,510 --> 01:09:52,540 So I'm going to use it for the first call to fib so 1290 01:09:52,540 --> 01:09:55,000 that when I make the second call to fib I can then add the 1291 01:09:55,000 --> 01:09:56,250 two things together. 1292 01:09:59,360 --> 01:10:04,030 Well, how about if before the first call to fib, why don't 1293 01:10:04,030 --> 01:10:08,150 I use it to store the value of n and then use it to store the 1294 01:10:08,150 --> 01:10:15,810 value of the return value of fib of n minus 1? 1295 01:10:15,810 --> 01:10:17,590 So I did that. 1296 01:10:17,590 --> 01:10:22,640 And that took a little bit of moving things 1297 01:10:22,640 --> 01:10:23,550 around a little bit.
1298 01:10:23,550 --> 01:10:26,380 But I managed to get rid of it by using RBX for two different 1299 01:10:26,380 --> 01:10:31,240 purposes, one to store the temporary, the value of n, and 1300 01:10:31,240 --> 01:10:39,010 the other to store the return value when I need it. 1301 01:10:39,010 --> 01:10:40,300 And when I did that, I got it all the way 1302 01:10:40,300 --> 01:10:44,640 down to 3.61 seconds. 1303 01:10:44,640 --> 01:10:47,380 I actually ran it with -O3, and it 1304 01:10:47,380 --> 01:10:49,650 took about two seconds. 1305 01:10:49,650 --> 01:10:51,720 So I think I can keep my day job. 1306 01:10:54,280 --> 01:10:57,070 But it's kind of fun to go in and sort of see what are the 1307 01:10:57,070 --> 01:10:58,500 things that can be done. 1308 01:10:58,500 --> 01:11:00,520 And you can get a very good sense of what's going on. 1309 01:11:00,520 --> 01:11:03,000 The more important thing is when you look at compilers 1310 01:11:03,000 --> 01:11:05,280 generating your code, as we saw in the last lecture on 1311 01:11:05,280 --> 01:11:10,480 profiling, you can see, oh, it did something silly here. 1312 01:11:10,480 --> 01:11:12,000 So you can actually go and say, oh, it's 1313 01:11:12,000 --> 01:11:13,130 doing something silly. 1314 01:11:13,130 --> 01:11:14,990 We can do a better job than that. 1315 01:11:14,990 --> 01:11:17,700 Or, oh, I didn't realize I'd declared this an int when in 1316 01:11:17,700 --> 01:11:23,100 fact, if I declared it an unsigned 64-bit int, it actually 1317 01:11:23,100 --> 01:11:25,260 would produce faster, better code. 1318 01:11:25,260 --> 01:11:25,480 Yeah? 1319 01:11:25,480 --> 01:11:26,868 Question? 1320 01:11:26,868 --> 01:11:27,362 AUDIENCE: Sorry. 1321 01:11:27,362 --> 01:11:32,302 So when you said that when you run it with -O3, 1322 01:11:32,302 --> 01:11:34,772 what you're just saying is even though you optimized the 1323 01:11:34,772 --> 01:11:35,266 [INAUDIBLE]?
1324 01:11:35,266 --> 01:11:38,420 CHARLES LEISERSON: As the compiler, that's why I said I 1325 01:11:38,420 --> 01:11:39,670 can keep my day job. 1326 01:11:43,060 --> 01:11:46,650 So simple optimization strategies, if you're playing 1327 01:11:46,650 --> 01:11:49,090 with things: you try to keep values in registers to 1328 01:11:49,090 --> 01:11:51,320 eliminate excess memory traffic. 1329 01:11:51,320 --> 01:11:55,550 You can optimize naive function call linkage. 1330 01:11:55,550 --> 01:11:58,790 And the most important thing probably is constant folding. 1331 01:11:58,790 --> 01:12:03,690 Look to see where you've got constants that 1332 01:12:03,690 --> 01:12:05,060 can be combined together. 1333 01:12:05,060 --> 01:12:07,510 There are other optimizations that compilers do like common 1334 01:12:07,510 --> 01:12:10,300 subexpression elimination and so forth. 1335 01:12:10,300 --> 01:12:12,110 But these are sort of the ones, if you're doing it by 1336 01:12:12,110 --> 01:12:14,870 hand, these are sort of the things to focus on, particularly 1337 01:12:14,870 --> 01:12:20,690 number one, just get rid of excess memory traffic. 1338 01:12:20,690 --> 01:12:23,120 Let me say, by the way, in doing this I also went down a 1339 01:12:23,120 --> 01:12:25,670 bunch of dead ends, things where I said, oh, this should 1340 01:12:25,670 --> 01:12:28,840 definitely save, and then it was slower. 1341 01:12:28,840 --> 01:12:31,700 And then we look at it, and it turns out, oh, my branch 1342 01:12:31,700 --> 01:12:34,410 misprediction rate is going way up and so forth. 1343 01:12:34,410 --> 01:12:35,540 That's why you have a profiler. 1344 01:12:35,540 --> 01:12:36,830 Because you don't want to do this blind. 1345 01:12:39,780 --> 01:12:44,860 Now, how does the compiler compile some common high level 1346 01:12:44,860 --> 01:12:46,110 structures?
1347 01:12:48,130 --> 01:12:52,590 So if you have a conditional, for example, if p, do the 1348 01:12:52,590 --> 01:12:56,360 ctrue clause, else do the cfalse clause, what it does 1349 01:12:56,360 --> 01:12:59,910 basically is it generates instructions to evaluate p. 1350 01:12:59,910 --> 01:13:02,570 And then it does a jump with the condition to see if p is 1351 01:13:02,570 --> 01:13:06,650 false to the else clause and executes those instructions. 1352 01:13:06,650 --> 01:13:11,730 And then otherwise it passes through, does the true clause, 1353 01:13:11,730 --> 01:13:12,930 and then jumps to the end. 1354 01:13:12,930 --> 01:13:16,800 And you'll see that pattern in the code when you look at it. 1355 01:13:16,800 --> 01:13:18,550 So that's a very common pattern for doing 1356 01:13:18,550 --> 01:13:21,370 conditionals. 1357 01:13:21,370 --> 01:13:24,660 Compiling while loops is kind of interesting because most 1358 01:13:24,660 --> 01:13:27,520 while loops start out with a jump. 1359 01:13:27,520 --> 01:13:29,440 So here are the instructions for the body 1360 01:13:29,440 --> 01:13:30,810 of the while loop. 1361 01:13:30,810 --> 01:13:31,850 And here's the test. 1362 01:13:31,850 --> 01:13:35,280 And what they usually do is they jump to the test, they 1363 01:13:35,280 --> 01:13:37,650 evaluate the condition, and if it's true, 1364 01:13:37,650 --> 01:13:38,460 they jump to the loop. 1365 01:13:38,460 --> 01:13:40,470 Otherwise, they fall through. 1366 01:13:40,470 --> 01:13:43,740 And then they go back, do the loop sometimes for the first 1367 01:13:43,740 --> 01:13:44,460 time, et cetera. 1368 01:13:44,460 --> 01:13:48,700 So that's kind of the pattern for a while loop. 1369 01:13:48,700 --> 01:13:51,900 For a for loop, they basically just convert it 1370 01:13:51,900 --> 01:13:53,610 into a while loop. 1371 01:13:53,610 --> 01:13:56,530 You basically take the initialization code. 1372 01:13:56,530 --> 01:13:58,030 You execute that.
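The conditional and while-loop patterns just described can be sketched in C itself by writing the jumps out as gotos. This is only an illustration of the shape of the generated code, not actual compiler output; the function names abs_lowered and sum_to_lowered are made up for the example.

```c
#include <assert.h>
#include <limits.h>

/* if (x < 0) result = -x; else result = x;
 * written with the jump pattern: jump to the else clause when p is false,
 * otherwise fall through the true clause and jump to the end. */
int abs_lowered(int x)
{
    int result;
    assert(x != INT_MIN);            /* -INT_MIN would overflow */
    if (!(x < 0)) goto else_clause;  /* conditional jump taken when p is false */
    result = -x;                     /* true clause, reached by falling through */
    goto end;                        /* skip over the else clause */
else_clause:
    result = x;                      /* false clause */
end:
    return result;
}

/* while (i <= n) { sum += i; i++; }
 * written with the initial jump to the test at the bottom of the loop. */
int sum_to_lowered(int n)
{
    int sum = 0;
    int i = 1;
    goto test;                       /* the loop starts out with a jump */
body:
    sum += i;
    i++;
test:
    if (i <= n) goto body;           /* condition true: jump back to the body */
    return sum;                      /* condition false: fall through */
}
```

Putting the test at the bottom means each iteration executes one conditional jump rather than a conditional jump plus an unconditional one.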
1373 01:13:58,030 --> 01:14:00,780 Then while the condition is true, you do the code followed 1374 01:14:00,780 --> 01:14:03,150 by whatever the next code is. 1375 01:14:03,150 --> 01:14:08,510 And so it ends up converting for loops into while loops. 1376 01:14:08,510 --> 01:14:13,230 Now, how do we go about implementing data types? 1377 01:14:13,230 --> 01:14:18,670 Arrays are just blocks of memory. 1378 01:14:18,670 --> 01:14:23,440 So you can have basically three different types of array 1379 01:14:23,440 --> 01:14:26,050 depending upon where it gets allocated, either allocated in 1380 01:14:26,050 --> 01:14:28,830 the data segment, allocated on the heap, or 1381 01:14:28,830 --> 01:14:30,080 allocated on the stack. 1382 01:14:32,830 --> 01:14:35,090 Sometimes even the static arrays these days can be 1383 01:14:35,090 --> 01:14:37,600 allocated in the code segment, if you're not 1384 01:14:37,600 --> 01:14:40,010 going to change them. 1385 01:14:40,010 --> 01:14:47,600 So one thing is to understand that arrays and pointers are 1386 01:14:47,600 --> 01:14:50,580 almost the same thing. 1387 01:14:50,580 --> 01:14:53,820 If you have an array a, that's a pointer to a place in memory 1388 01:14:53,820 --> 01:14:56,360 where the array begins. 1389 01:14:56,360 --> 01:15:00,260 And a[0] is the same as the value you get when you 1390 01:15:00,260 --> 01:15:04,210 dereference the pointer a. 1391 01:15:04,210 --> 01:15:08,300 A pointer, if you think about it, is actually just an index 1392 01:15:08,300 --> 01:15:10,970 into the array of all memory. 1393 01:15:10,970 --> 01:15:13,920 And the hardware allows you to index into the 1394 01:15:13,920 --> 01:15:14,890 array of all memory. 1395 01:15:14,890 --> 01:15:18,530 Well, it can also allow you to index into any subregion of 1396 01:15:18,530 --> 01:15:19,570 that memory. 1397 01:15:19,570 --> 01:15:21,740 And that's why arrays and pointers are basically the 1398 01:15:21,740 --> 01:15:22,990 same thing.
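As a small sketch of the point about arrays and pointers, the C below reads array elements through explicit pointer arithmetic and through a pointer into the middle of an array. The helper names read_via_pointer and read_from_subregion are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Read element i as "base address plus i elements, then dereference". */
int read_via_pointer(const int *base, size_t len, size_t i)
{
    assert(i < len);                 /* stay inside the block of memory */
    return *(base + i);              /* same address computation as base[i] */
}

/* A pointer into the middle of an array indexes a subregion of it. */
int read_from_subregion(const int *base, size_t len, size_t offset, size_t i)
{
    assert(offset + i < len);        /* stay inside the block of memory */
    const int *sub = base + offset;  /* equivalently &base[offset] */
    return sub[i];                   /* same element as base[offset + i] */
}
```

Note that the pointer arithmetic counts in elements, not bytes: base + i advances by i * sizeof(int) bytes.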
1399 01:15:25,200 --> 01:15:26,400 Here's a little quiz. 1400 01:15:26,400 --> 01:15:28,900 What is 8[a]? 1401 01:15:32,820 --> 01:15:34,300 AUDIENCE: The a elements? 1402 01:15:34,300 --> 01:15:38,230 CHARLES LEISERSON: Yeah, it's basically a[8]. 1403 01:15:38,230 --> 01:15:41,720 It's basically a[8] because the addressing that's 1404 01:15:41,720 --> 01:15:43,540 going on is essentially the same. 1405 01:15:43,540 --> 01:15:45,600 Even though we prefer to write it-- 1406 01:15:45,600 --> 01:15:48,830 If you start writing code like this, I guarantee that at some 1407 01:15:48,830 --> 01:15:52,355 companies they'll get very angry at you even though you 1408 01:15:52,355 --> 01:15:55,910 say, it's the same thing. 1409 01:15:55,910 --> 01:15:59,890 But what's going on is you're actually taking the base, the 1410 01:15:59,890 --> 01:16:02,570 address of a, you're adding eight to it, and then 1411 01:16:02,570 --> 01:16:05,130 dereferencing that value. 1412 01:16:05,130 --> 01:16:08,680 And indeed, even in C, they actually do all the coercions, 1413 01:16:08,680 --> 01:16:11,340 even if eight is a different type, it actually does the 1414 01:16:11,340 --> 01:16:13,260 coercions properly so that they are 1415 01:16:13,260 --> 01:16:14,620 actually the same thing. 1416 01:16:14,620 --> 01:16:17,860 Because it does them after it's converted it into a 1417 01:16:17,860 --> 01:16:22,090 dereference of a plus 8, that is, *(a + 8). 1418 01:16:22,090 --> 01:16:26,290 So it's kind of interesting that it works right even 1419 01:16:26,290 --> 01:16:29,420 though when I looked at that I said, well, what if it's bytes 1420 01:16:29,420 --> 01:16:32,560 versus words and so forth? 1421 01:16:32,560 --> 01:16:32,750 Yeah? 1422 01:16:32,750 --> 01:16:34,004 Question? 1423 01:16:34,004 --> 01:16:37,270 AUDIENCE: Will there be a performance difference between 1424 01:16:37,270 --> 01:16:41,450 putting data in those three different kinds of arrays?
1425 01:16:41,450 --> 01:16:43,140 CHARLES LEISERSON: Yes, there can be. 1426 01:16:43,140 --> 01:16:46,730 In particular, for a static array, it knows exactly where the 1427 01:16:46,730 --> 01:16:48,770 base pointer is as a constant. 1428 01:16:48,770 --> 01:16:51,470 Whereas for the others it has to actually figure out where it 1429 01:16:51,470 --> 01:16:53,600 is. In the heap, you need a pointer to it 1430 01:16:53,600 --> 01:16:54,220 to dereference it. 1431 01:16:54,220 --> 01:16:57,716 It can't put it right into the instruction stream itself. 1432 01:16:57,716 --> 01:16:58,966 AUDIENCE: [INAUDIBLE] 1433 01:17:01,524 --> 01:17:03,966 those would be just constant [INAUDIBLE], right? 1434 01:17:03,966 --> 01:17:08,746 So I can either put that in the static array or-- 1435 01:17:08,746 --> 01:17:10,940 CHARLES LEISERSON: Yes, generally it's faster to have 1436 01:17:10,940 --> 01:17:14,430 it in a static array if you can. 1437 01:17:14,430 --> 01:17:16,740 I want to finish up here so that-- 1438 01:17:16,740 --> 01:17:18,250 We have structs. 1439 01:17:18,250 --> 01:17:21,470 Structs are just blocks of memory also. 1440 01:17:21,470 --> 01:17:23,190 So you can have a bunch of things here. 1441 01:17:23,190 --> 01:17:27,850 This is a bad way to declare a struct because the fields are 1442 01:17:27,850 --> 01:17:29,660 stored next to each other generally in the 1443 01:17:29,660 --> 01:17:32,240 order you give them. 1444 01:17:32,240 --> 01:17:35,470 So here it says it's x and then i and double. 1445 01:17:35,470 --> 01:17:37,740 What happens here is you have to be careful 1446 01:17:37,740 --> 01:17:39,930 about alignment issues. 1447 01:17:39,930 --> 01:17:41,400 So if you do [? char, ?] 1448 01:17:41,400 --> 01:17:44,330 it's then got to pad it out to get to the next 1449 01:17:44,330 --> 01:17:45,740 alignment for an int. 1450 01:17:45,740 --> 01:17:47,460 And it's got to pad that out to get to the next 1451 01:17:47,460 --> 01:17:48,970 alignment for a double.
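The padding being described can be observed with sizeof. The sketch below is not the struct from the slide; the two layouts short_first and long_first, and the helper padding_bytes, are hypothetical and exist only to compare the same three fields declared in two different orders. Exact sizes depend on the platform's ABI, so the code reports padding rather than assuming specific numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Short field first: padding is typically needed before the double. */
struct short_first {
    char   c;
    double d;
    int    i;
};

/* Same fields, longest first: each later field is already aligned. */
struct long_first {
    double d;
    int    i;
    char   c;
};

/* Bytes in a struct of this size that hold no field data. */
size_t padding_bytes(size_t struct_size)
{
    size_t field_bytes = sizeof(char) + sizeof(double) + sizeof(int);
    assert(struct_size >= field_bytes);  /* a struct is never smaller than its fields */
    return struct_size - field_bytes;
}
```

Calling padding_bytes(sizeof(struct short_first)) and padding_bytes(sizeof(struct long_first)) shows how much space each ordering wastes; on a typical x86-64 ABI the first struct is larger, but that is a property of the ABI rather than of the C language.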
1452 01:17:48,970 --> 01:17:51,380 Whereas if you do it in the opposite order, it starts 1453 01:17:51,380 --> 01:17:52,580 packing them. 1454 01:17:52,580 --> 01:17:55,220 So generally it's best to declare longer fields before 1455 01:17:55,220 --> 01:17:57,915 shorter fields because then you know when you're finished 1456 01:17:57,915 --> 01:17:59,410 with the longer fields, you're already 1457 01:17:59,410 --> 01:18:00,880 aligned for shorter fields. 1458 01:18:03,660 --> 01:18:06,430 Like arrays, there are static, dynamic, and local structs. 1459 01:18:06,430 --> 01:18:07,340 So that's all they are. 1460 01:18:07,340 --> 01:18:08,965 And so you'll see in the indexing-- 1461 01:18:11,590 --> 01:18:13,600 There's also stuff that-- 1462 01:18:13,600 --> 01:18:18,760 and actually this is important for one of the binary puzzles 1463 01:18:18,760 --> 01:18:21,960 we gave to figure out what it does-- 1464 01:18:21,960 --> 01:18:23,950 there are what are called SIMD instructions. 1465 01:18:23,950 --> 01:18:27,220 These are single instruction multiple data instructions 1466 01:18:27,220 --> 01:18:29,740 where a single instruction operates on 1467 01:18:29,740 --> 01:18:32,980 multiple pieces of data. 1468 01:18:32,980 --> 01:18:34,550 They operate on smaller vectors. 1469 01:18:34,550 --> 01:18:40,570 And there are 16 128-bit XMM registers, which you can view 1470 01:18:40,570 --> 01:18:44,160 as two 64-bit values or four 32-bit values. 1471 01:18:44,160 --> 01:18:45,860 And you can do an operation on it. 1472 01:18:45,860 --> 01:18:48,520 So this is used for multimedia, for streaming 1473 01:18:48,520 --> 01:18:51,180 applications, and so forth where you're trying to shove a 1474 01:18:51,180 --> 01:18:54,330 lot of data through and you're doing the same repeated stuff 1475 01:18:54,330 --> 01:18:56,270 on the things at a time. 1476 01:18:56,270 --> 01:18:59,160 So there are instructions that operate on multiple values.
1477 01:18:59,160 --> 01:19:04,750 For example, here we're moving four 32-bit ints into this 1478 01:19:04,750 --> 01:19:07,370 particular XMM register. 1479 01:19:07,370 --> 01:19:10,650 And similarly here's another one, we're adding it. 1480 01:19:10,650 --> 01:19:12,750 And you can look at the manual for these. 1481 01:19:12,750 --> 01:19:18,740 So you may come across these because they're using up parts 1482 01:19:18,740 --> 01:19:19,450 of the machine. 1483 01:19:19,450 --> 01:19:22,990 Mostly those are the kinds of things we can say look it up 1484 01:19:22,990 --> 01:19:24,790 in the manual, because nobody's going to remember all 1485 01:19:24,790 --> 01:19:25,730 those instructions. 1486 01:19:25,730 --> 01:19:32,130 Of course, if you get a job with one of the graphics 1487 01:19:32,130 --> 01:19:35,760 companies, then you may become very familiar with these kinds 1488 01:19:35,760 --> 01:19:38,080 of instructions. 1489 01:19:38,080 --> 01:19:41,800 There are a lot more C and C++ constructs that we don't have 1490 01:19:41,800 --> 01:19:42,750 to go into. 1491 01:19:42,750 --> 01:19:46,450 You can have arrays of structs versus structs of arrays. 1492 01:19:46,450 --> 01:19:48,125 And there can be a difference in performance. 1493 01:19:50,860 --> 01:19:56,710 If you have an array of structs, then if 1494 01:19:56,710 --> 01:19:59,530 you're accessing one struct, you can access the other 1495 01:19:59,530 --> 01:20:01,880 structs very easily. 1496 01:20:01,880 --> 01:20:06,590 But if you're using things like the SSE instructions, 1497 01:20:06,590 --> 01:20:09,970 then it may be better to have structs of arrays. 1498 01:20:09,970 --> 01:20:12,140 Because then when you're accessing an array, you can 1499 01:20:12,140 --> 01:20:17,160 stream with what's called a stride of one, a regular stride of 1500 01:20:17,160 --> 01:20:20,170 just one memory location after the next, to do the 1501 01:20:20,170 --> 01:20:22,310 processing.
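To make the two layouts concrete, here is a minimal C sketch of an array of structs and a struct of arrays, each with a loop that sums one field. The type names point and points_soa, the constant N_POINTS, and the two sum functions are all made up for this example; the sketch shows only the access pattern and makes no claim about measured speed.

```c
#include <assert.h>
#include <stddef.h>

#define N_POINTS 4                /* hypothetical, small for illustration */

/* Array of structs: the fields of one point sit next to each other. */
struct point {
    float x, y, z;
};

/* Struct of arrays: all the x values sit next to each other. */
struct points_soa {
    float x[N_POINTS];
    float y[N_POINTS];
    float z[N_POINTS];
};

/* Each step skips over y and z: the stride is sizeof(struct point). */
float sum_x_aos(const struct point *pts, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += pts[i].x;
    return sum;
}

/* Each step moves to the very next float: a stride of one element. */
float sum_x_soa(const struct points_soa *pts, size_t n)
{
    assert(n <= N_POINTS);        /* the arrays have a fixed capacity */
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += pts->x[i];
    return sum;
}
```

Both functions compute the same result; what differs is how far apart in memory consecutive loads are, which is what matters for vector loads and for the access pattern the hardware sees.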
1502 01:20:22,310 --> 01:20:25,090 So the hardware doesn't work as well if you skip by 1503 01:20:25,090 --> 01:20:29,140 seventeens to gather things compared to if you just get one 1504 01:20:29,140 --> 01:20:29,980 thing after the next. 1505 01:20:29,980 --> 01:20:32,590 Because there's prefetching logic that tries to fetch 1506 01:20:32,590 --> 01:20:34,620 things from memory faster. 1507 01:20:34,620 --> 01:20:36,570 There are things like function pointers. 1508 01:20:36,570 --> 01:20:40,380 So you can store a function pointer into something and 1509 01:20:40,380 --> 01:20:44,720 then call that function indirectly. 1510 01:20:44,720 --> 01:20:47,310 There are things like bit fields in arrays. 1511 01:20:47,310 --> 01:20:49,460 There are objects, virtual function tables. 1512 01:20:49,460 --> 01:20:52,080 We'll get into some of these when we do C++. 1513 01:20:52,080 --> 01:20:54,170 And there's a variety of stuff having to do with memory 1514 01:20:54,170 --> 01:20:56,250 management that we'll talk about. 1515 01:20:56,250 --> 01:20:59,890 But this is mainly to get you folks sort of at the level 1516 01:20:59,890 --> 01:21:01,810 where you can sort of understand and feel 1517 01:21:01,810 --> 01:21:05,170 comfortable with dealing with the assembler. 1518 01:21:05,170 --> 01:21:07,140 And you'll see that those resources 1519 01:21:07,140 --> 01:21:08,440 are pretty good resources. 1520 01:21:08,440 --> 01:21:12,440 The basics are relatively simple, but it's hard to do it 1521 01:21:12,440 --> 01:21:18,110 without a manual or some online reference material. 1522 01:21:18,110 --> 01:21:19,360 Any questions? 1523 01:21:22,500 --> 01:21:26,770 What are the first two lessons I taught you today? 1524 01:21:26,770 --> 01:21:28,826 Number one is-- 1525 01:21:28,826 --> 01:21:30,260 AUDIENCE: [INAUDIBLE] 1526 01:21:30,260 --> 01:21:32,880 CHARLES LEISERSON: --write tests before you write code. 1527 01:21:32,880 --> 01:21:34,990 And what's the second lesson?
1528 01:21:34,990 --> 01:21:35,750 AUDIENCE: Pair programming. 1529 01:21:35,750 --> 01:21:37,360 CHARLES LEISERSON: Pair programming, not divide and 1530 01:21:37,360 --> 01:21:41,060 conquer, I teach algorithms where divide and conquer is a 1531 01:21:41,060 --> 01:21:43,040 fabulous technique. 1532 01:21:43,040 --> 01:21:45,650 With programming, pair programming is going to have 1533 01:21:45,650 --> 01:21:49,710 you generally get where you want to get faster than if 1534 01:21:49,710 --> 01:21:51,180 you're programming alone. 1535 01:21:54,915 --> 01:21:56,750 OK, thank you.