The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: In this class, this semester, the other co-lecturer is going to be James Mickens, who is a visiting professor from Microsoft Research. He'll lecture on some other topics, like web security, later on. But we'll decide later exactly how the lectures get split up. We also have four TAs this year: Stephen, Webb, [INAUDIBLE], and James. Hopefully you'll meet them in office hours over the year if you need help.

So the plan for this class is to understand how to build secure systems: why computer systems are sometimes insecure, what goes wrong, and how we can make them better. There's not really a great textbook on this topic, so instead, each lecture other than this one is going to be focused around some research, typically a paper, that we'll assign on the website and that you should read ahead of time. There are some questions about the paper that you should answer in the submission system, and you should submit your own question by 10:00 PM the day before lecture. Then, when you come to lecture, we'll actually discuss the paper and figure out: What is the system? What problem does it solve? When does it work? When does it not work? Are these ideas any good in other cases? Et cetera. And hopefully, through these case studies, we'll get some appreciation of how to actually build systems that are secure. We have a preliminary schedule up on the website.
If there are other topics you're particularly interested in, or particular papers you're excited about, just send us email and we'll see if we can fit them in. We're pretty flexible, so if there's anything you'd like to hear more about, just let us know. In a similar vein, if you ever have a question, or if there's some mistake, just interrupt and ask us what's going on in lecture, anytime. Security is, in many ways, all about the details and getting everything right, and I will inevitably make mistakes. So if something doesn't seem right, there's a good chance it's not, and you should just interrupt and ask, and we'll figure out what's going on and what the right way to do things is.

In terms of class organization, the other large part of the class, in addition to lectures, is going to be a series of lab assignments. The first one is already posted on the website. These labs will take you through a range of security problems, and how to prevent them, in a simple web server. In lab one, which is out right now, you'll take a web server that we give you, find ways to exploit buffer overflow vulnerabilities in it, and take control of it just by sending it carefully crafted requests and packets. In other labs, you'll look at ways to defend the web server, to find bugs in the code, to write worms that run in the user's browser, and at other kinds of interesting security problems.

One thing that surprises many students is that every lab uses a different language. Lab one is all about C and assembly. Lab two involves a lot of Python coding. Lab three will be something else. Lab five will be JavaScript. And so on. This is somewhat inevitable, and I apologize ahead of time that you're going to have to learn all these languages if you haven't seen them already.
In some ways it's useful, because the real world is like this: all the systems are complicated and composed of different parts. And in the long run it'll be useful for you, for your moral character or something like that, to learn this stuff. But it will take some preparation, especially if you haven't seen these languages before, so it might be helpful to start early. In particular, lab one is going to rely on a lot of subtle details of C and assembly code that we don't really teach in as much detail in other classes here. So it's probably a good idea to start early. We'll try to get the TAs to hold office hours next week with some sort of tutorial session to help you get started: understanding what a binary program looks like, how to disassemble it, how to figure out what's on the stack, and so on.

All right. One other thing: we're actually videotaping lectures this year, so you might be able to watch them online. We'll post them as soon as we get them from the video people. And the last bit of administrivia: for questions online, we're using Piazza, which I'm sure you've used in other classes.

All right. So before we dive into security, I need to tell you one thing. MIT has rules for accessing MIT's network, and especially when you're doing security research or playing with security problems, you should be aware that not everything you can technically do is legal. There are many things you will learn in this class that are technically possible; we'll understand how systems can be broken or compromised. That doesn't mean you should go out and do this everywhere. There's a link in the lecture notes we'll post that has some rules that are good guidelines. But in general, if you're in doubt, ask one of the lecturers or a TA what you should do. Hopefully it's not too puzzling what's going on.

All right.
So, any questions about all this administrivia before we dive in? Feel free to ask questions. OK.

So what is security? We'll start with some basic stuff today and look at some general examples of why security is hard and what it means to try to build a secure system. Because there's not really a paper, this lecture won't have deep intellectual content, maybe, but it'll give you some background and context for how to think about secure systems.

Security, in general, is all about achieving some goal when there is an adversary present. So think of it as: there's some bad guy out there who wants to make sure you don't succeed. They want to steal your files. They want to delete your entire hard drive's contents. They want to make sure nothing works and your phone doesn't connect, all these things, right? And a secure system is one that can actually do something regardless of what the bad guy is trying to do to you. So it's kind of cool that we can potentially build systems that are resilient to a whole range of bad guys, adversaries, attackers, whatever you want to call them, and still build computer systems that allow us to get our work done.

The general way to think about security is to break it up into three parts. One part is, roughly, the policy that you want your system to enforce. This is roughly the goal that you want to achieve: for example, maybe only I should be able to read the grades file for 6.858. Or maybe the TAs as well, and all the co-lecturers, et cetera. But there is some statement about what I want my system to be able to do. And if you think about what kinds of policies you might write, typical ones have to do with either confidentiality of data, so the grades file is only accessible to the 6.858 course staff. Another example of a security policy has to do with integrity.
For example, only the course staff can modify the grades file, or only the course staff can upload the final grades to the registrar's office. That would be great. Then you can also think about things like availability: for example, a website should stay available even if the bad guys try to take it down and mount some sort of DoS, denial of service, attack on it.

So this is all well and good. These are the policies that we might actually care about for a system. But because it's security, there's a bad guy involved, so we need to understand what we think the bad guy is going to do. This is typically what we call a threat model, and it's basically just a set of assumptions about the bad guy, or adversary. It's important to have some assumptions about the bad guy because, if the bad guy is omnipresent and everywhere at once and can do anything they want, it's going to be hard to achieve any semblance of security. So, for example, you probably want to assume the bad guy doesn't already know your password, and that they don't have physical access to your phone and your keys and your laptop. Otherwise, it's going to be hard to make progress in this game. It turns out this is actually quite tricky to get right, but one general rule is that it's much better to err on the side of caution and be conservative in picking your threat model, because the bad guy might always surprise you in terms of what they can do in practice.

And finally, in order to achieve our goal under that set of assumptions, we're going to look at some mechanism. This is, basically, the software or hardware or whatever part of the system design, implementation, et cetera, that's going to try to make sure our policy is followed as long as the bad guy stays within the threat model. So the end result is that, as long as our threat model was correct, hopefully we'll satisfy our policy.
It also has to be the case that the mechanism doesn't screw up. Make sense? That's the fairly high-level story for how to think about this kind of stuff.

So why is this so hard? It seems like a simple plan: you write down these three things, and you're off and running. But in practice, as I'm sure you've seen out in the world, computer systems are almost always compromised in some way or another, and break-ins are pretty commonplace. The big reason security tends to be a difficult problem is that what we have here -- and this will be familiar to those of you who took 6.033 -- is a negative goal, meaning that we have to make sure our security policy is followed regardless of what the attacker can do.

Just by contrast, if you want to build a file system and you want to make sure that my TAs can access the grades file, that's pretty easy. I just ask them: hey, can you guys test and see? Can you access the grades file? And if they all can access it, done -- the system works. But if I want to say that no one other than the TAs can access the grades file, that's a much harder problem to solve, because now I have to figure out everything that all these non-TA people in the world could do to try to get at my grades file. They could try to just open it and read it; maybe my file system will disallow that. But they might try all kinds of other attacks, like guessing the TAs' passwords, or stealing the TAs' laptops, or breaking into the room, or who knows, right? This is all stuff that we have to really put into our threat model. Probably, for this class, I'm not concerned enough about the grades file to worry about these guys' laptops being stolen from their dorm rooms. Although maybe I should be; I don't know. It's hard to tell, right? And as a result, this security game is often not so clear cut as to what the right set of assumptions to make is.
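Just to make the policy/mechanism split concrete, here is a minimal, purely illustrative sketch of the mechanism side of that grades-file policy. The staff list and file name are made-up placeholders, not anything 6.858 actually runs.

```python
# Minimal sketch: enforce "only course staff may read the grades file."
# The identities and the file path are hypothetical.
COURSE_STAFF = {"lecturer1", "lecturer2", "ta1", "ta2", "ta3", "ta4"}

def read_grades(requesting_user):
    if requesting_user not in COURSE_STAFF:        # the policy check
        raise PermissionError(f"{requesting_user} is not course staff")
    with open("grades.txt") as f:                  # hypothetical grades file
        return f.read()
```

The positive half is easy to test: have each TA call read_grades() and confirm it works. The negative goal is everything the sketch doesn't capture, because the check only matters if an attacker can't get at grades.txt some other way, and deciding which of those other ways to worry about is exactly the threat model's job.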
And it's often only after the fact that you realize, well, I should have thought of that.

All right. As a result, this is very much an iterative process. The thing you end up realizing at every iteration is: here's the weakest link in my system. Maybe I got the threat model wrong. Maybe my mechanism had some bugs in it, because it's software, and these are going to be large systems, so they'll have lots of bugs. So you fix them up, you change your threat model a bit, and you iterate and try to design a new system and, hopefully, make things better.

So one possible interpretation of this class -- well, one danger -- is that you come away thinking: man, everything is just broken, nothing works, we should just give up and stop using computers. That's one possible interpretation, but it's probably not quite the right one. The reason you might end up thinking this way is that, throughout this class, we're going to look at all these different systems and push them to the edge. We're going to see: OK, what if we do this -- is it going to break? What if we do that -- is it going to break then? And inevitably, every system is going to have some sort of breaking point, and we'll figure out: oh, hey, we can break into this system if we push this way, and that system doesn't work under this set of assumptions. It's inevitable that every system will have a breaking point, but that doesn't mean that every system is worthless. It just means you have to know when to use each system design. And it's useful to do this pushing exercise to find the weaknesses, so that you know when certain ideas work and when certain ideas are not applicable. In reality, this is a somewhat fuzzy boundary, right?
The more secure you make your system, the less likely you are to have some embarrassing story on the front page of The New York Times saying your startup leaked a million people's Social Security numbers, and the less money you'll pay to recover from that disaster.

And one genuinely positive note on security is that, in many ways, security enables cool things that you couldn't do before, because security mechanisms that allow us to protect against certain classes of attacks are pretty powerful. As one example, the browser used to be fairly boring in terms of what you could do with it: you could just view web pages, maybe run some JavaScript code. But now there are all these cool mechanisms, which we'll learn about in a couple of weeks, that allow you to run arbitrary x86 native code in the web browser and make sure it doesn't do anything funny to your machine. There's a technique, or system, called Native Client from Google that actually allows us to do this securely. Before, in order to run some native game on your machine, you'd have to download and install it, click on lots of dialog boxes, and say: yes, I allow this. But now you can just run it in a browser -- no clicking required, it just runs. And the reason it's so easy and powerful is that our security mechanism can sandbox this program and not have to assume anything about the user choosing the right game, rather than some malicious game or some other malicious program, to run on their computer. So in many ways, good security mechanisms are going to enable constructing cool new systems that weren't possible to construct before.

All right. Make sense? Any questions about this story? All right.

So in the rest of the lecture, I want to go through a bunch of different examples of how security goes wrong. So far, we've seen how you can think about it.
But inevitably, it's useful to see examples of what not to do, so that you can have a better mindset when you're approaching security problems. And in this breakdown of a security system, pretty much every one of these three things goes wrong in practice: people get the policy wrong, people get the threat model wrong, and people get the mechanism wrong.

Let's start with policies, and examples of how you can screw up a system's policy. Maybe the cleanest, or simplest, example of this is account recovery questions. Typically, when you sign in to a website, you provide a password. But what happens if you lose your password? Some sites will send you an email with a link to reset your password, which is easy enough if you have another email address. But what if this site is your email provider? At least several years ago, Yahoo hosted email -- webmail -- for anyone on the internet. And when you forgot your Yahoo password, they couldn't really send you email, because you couldn't get to it. So instead, they had you register a couple of questions with them that, hopefully, only you know the answers to. If you forget your password, you can click on a link, say "here are the answers to my questions," and get your password back.

What turns out to be the case -- and what some people failed to realize -- is that this changes your policy. Before, the policy of the system was that the people who can log in are the people who know the password. Once you introduce these recovery questions, the policy becomes: you can log in if you know either the password or the answers to those security questions. So it strictly weakens the security of your system. And many people have actually taken advantage of this. One well-known example: a couple of years ago, Sarah Palin had an email account at Yahoo, and her recovery questions were things like: Where did you go to school? What was your friend's name? What's your birthday?
Et cetera. These were all things written on her Wikipedia page. As a result, someone could quite easily -- and someone did, actually -- get into her Yahoo email account just by looking up on Wikipedia what her high school was and what her birthday was. So you really have to think carefully about the implications of the different security policies you're setting up here.

Perhaps a more intricate and, maybe, more interesting example is what happens when you have multiple systems that start interacting with one another. There's a nice story about a guy called Mat Honan; maybe you read it a year or two ago. He's an editor at wired.com, and he had a bit of a problem: someone basically got into his Gmail account and did lots of bad things. But how did they do it? It's kind of interesting, because all the parties in this story seem to be doing reasonable things, but we'll see how they add up to something unfortunate.

So we have Gmail. Gmail lets you reset your password if you forget it, as does pretty much every other system. The way you do a reset at Gmail is you send them a reset request. And what they do -- they weren't going to use recovery questions, at least not for this guy -- is send a recovery link to a backup email address, some other email address that you have. And, helpfully, they actually show you that email address. So for this guy's account, someone went and asked Gmail to reset the password, and Gmail said: sure, we just sent the recovery link to this email, foo@me.com, which was some Apple email service. OK, but the bad guy doesn't have access to me.com either -- and they want that password reset link to get access to Gmail.
Well, the way things worked in Apple's case was that this me.com site allowed you to reset your password if you knew the billing address and the last four digits of the account holder's credit card number. So it's still not clear how you're going to get this guy's -- well, his home address you could maybe look up somewhere; he was a well-known person at the time. But where do you get the last four digits of his credit card number? Not clear, but let's keep going.

So you need to send those two things to me.com to get access to his email account there. Well, it turns out this guy had an account at Amazon, which is another party in this story. Amazon really wants you to buy things, and as a result, they have a fairly elaborate account management system. In particular, because they really want you to buy stuff, they don't require you to sign in in order to purchase an item with a credit card. So I could actually go on Amazon -- at least at the time -- and say: well, I'm this user, and I want to buy this pack of toothbrushes. If I tried to use the saved credit card number in the guy's account, I shouldn't be able to do that. But if I just provided a new credit card, what Amazon would do is add that new credit card number to the guy's account.

So that seems not too bad, right? I'm basically ordering toothbrushes through someone else's Amazon account, but it's not their credit card being charged -- it's just my credit card number being used. So it's not clear how things go wrong yet. But Amazon had another interface -- all of these are complicated systems -- and that was an interface for password reset. And in order to reset a password at Amazon, all you had to provide was one of the user's credit card numbers. So I can order stuff and add a credit card number to your account, and then I can say: hey, I want to reset my password -- here's one of my credit card numbers.
And this, in fact, worked. So this is how the bad guy got hold of this guy Mat's Amazon account. But OK -- how do you fish out the credit card number needed to reset the password on Apple's site? Well, Amazon was actually quite careful: even if you break into someone's Amazon account, it will not show you that person's saved credit card numbers. But it will show the last four digits, just so you know which credit card you're talking about. So you can list all the credit cards other than the one you added, then go break into me.com, then click on that recovery link and get access to the guy's Gmail account.

This is all very subtle stuff. In isolation, each system seems to be doing somewhat sensible things, but it's actually quite hard to reason about these vulnerabilities and weaknesses unless you have the whole picture explained to you and you've put all the pieces together. So this is fairly tricky stuff. And unfortunately -- much as for every one of these three categories -- the answer for how to avoid this is often: think hard and be careful. One general plan is to be conservative in what you set your policy to be, and maybe not to depend on things other sites might reveal. I'm not sure any single piece of great advice would have prevented this problem. But now you know -- and now you'll make other mistakes.

There are many other examples of policies going wrong and allowing a system to be compromised. That's interesting enough, but let's look at how people might screw up threat models. So let me turn off this blue square. OK, so what are examples of threat models that go wrong? Well, probably a big one in practice is human factors. We often make assumptions about what people will do in a system: that they'll pick a good, strong password, or that they won't click on random websites they get through email and enter their password there.
Well, as you probably suspect -- and as happens in practice -- these are not good assumptions in all cases. People pick bad passwords. People click on random links. People enter their password on sites that are not the right site at all, and they aren't paying a lot of attention. So you probably don't want threat models that make very strong assumptions about what humans will do, because inevitably something will go wrong. Make sense? Any questions?

All right. Another thing to watch out for in threat models is that they sometimes change over time -- or rather, whether something is a good assumption or not changes over time. One example of this is at MIT: in the mid '80s, Project Athena developed a system called Kerberos, which we'll read about in a couple of weeks in this class. At the time, they were figuring out: Kerberos is going to be based on cryptography, so we need to pick a key size that makes sure the keys can't be guessed by arbitrary people. And they said: OK, 56-bit keys, for this cipher called DES, seem like a plausible size -- maybe not great, but certainly not entirely unreasonable. This was in the mid '80s. But then the system got popular and got used a lot -- MIT still uses it -- and they never really went back to seriously revisit this assumption. Then, a couple of years ago, a group of 6.858 students figured out that, actually, you can just break this: it's easy enough to enumerate all 2^56 keys these days. Computers are so fast, you can just do it. And as a result, with the help of some hardware from a particular web service -- we'll have some links in the lecture notes -- they were able to get basically anyone's Kerberos account key in roughly a day. So this assumption was good in the mid 1980s.
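To get a feel for why that 56-bit assumption aged so badly, here's a back-of-the-envelope sketch. The guess rates are made-up round numbers for illustration, not measurements of the hardware the students actually used.

```python
# Rough arithmetic on brute-forcing a 56-bit DES key space.
# The trial rates below are illustrative assumptions, not benchmarks.
KEYSPACE = 2 ** 56                       # number of possible 56-bit keys
SECONDS_PER_YEAR = 365 * 24 * 3600

for label, keys_per_second in [
    ("assumed 1980s-era machine, ~10^4 keys/s", 1e4),
    ("assumed modern brute-force service, ~10^12 keys/s", 1e12),
]:
    seconds = KEYSPACE / keys_per_second
    print(f"{label}: ~{seconds / 86400:.2g} days "
          f"(~{seconds / SECONDS_PER_YEAR:.2g} years)")
```

Under those assumed rates, the same key space goes from hundreds of thousands of years down to under a day, which is roughly the gap between the original design decision and the attack the students pulled off.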
It is no longer a good assumption today. So you really have to make sure your assumptions keep up with the times.

Maybe a more timely example: if your adversary is -- or if you're worried about -- government attackers, you might realize that you shouldn't even trust hardware these days. There were all these revelations about what the NSA is capable of doing, and they have hardware back doors that they can insert into computers. Maybe up until a couple of years ago -- well, who knows, I guess we didn't know about this stuff -- it was a reasonable assumption that your laptop is not going to be physically compromised, at the hardware level. But now you know: if you're worried about the government being after you, you probably have a much harder problem to deal with, because your laptop might be compromised physically regardless of what you install on it. So you really have to be careful with your threat model and balance it against who you think is out to get you. It's going to be a very expensive proposition to try to protect yourself from the NSA. On the other hand, if you're just protecting yourself from random other students who are, I don't know, snooping around in your Athena home directory or whatnot, maybe you don't have to worry about this stuff as much. So it's really a balancing game, picking the right threat model.

Another example of a bad threat model shows up in the way browsers check the certificates of the secure websites you connect to these days. In this SSL protocol, or TLS, when you connect to a website and it says HTTPS -- we'll talk much more about this in later lectures -- what happens is that the site you're connecting to presents you a certificate, signed by one of the certificate authorities out there, that attests that, yep, this key belongs to Amazon.com.
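As a concrete picture of that step, here is a minimal sketch using Python's standard ssl module to connect to a site and look at which authority signed its certificate. The host name is just an example; real clients rely on the library's built-in validation rather than printing fields like this.

```python
# Minimal sketch: open a TLS connection and inspect the server's certificate.
import socket, ssl

host = "www.amazon.com"                      # example host name
ctx = ssl.create_default_context()           # trusts the system's CA set

with socket.create_connection((host, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        cert = tls.getpeercert()             # already validated against that CA set
        print("subject:", cert["subject"])   # who the certificate names
        print("issuer: ", cert["issuer"])    # which CA vouched for it
```

The part that matters for the threat model is the single line that creates the default context: it silently pulls in the operating system's entire set of trusted certificate authorities.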
And architecturally, the mistake -- the bad threat model -- that these folks assumed is that all these CAs are trustworthy and will never make a mistake. In fact, the way the system works is that there are hundreds of these CAs out there. The Indian postal authority, I think, has a CA. The Chinese government has a CA. Lots of entities are certificate authorities in this design, and any of them can make a certificate for any host name or domain name. As a result, if you're a bad guy and you want to compromise Gmail, or impersonate Gmail's website, you just have to compromise one of these certificate authorities. And it turns out the weakest link is probably some poorly run authority somewhere in some, you know, not particularly up-to-date country. Who knows, right? So it's probably a bad idea to build a system around the assumption that you'll manage to keep all 300 certificate authorities, spread out around the globe, perfectly secure. And yet that's the assumption underpinning the security mechanism of today's SSL protocol as used by web browsers.

There are many other examples of things you might not have thought of. Another amusing example, from the 1980s, comes from DARPA. This defense agency, at the time, really wanted to build secure operating systems, and they went so far as to get a bunch of universities and researchers to build secure OS prototypes. Then they got a red team -- a team of bad guys pretending to be the attackers -- and told them: go break into these secure operating systems any way you can; we actually want to know whether they're secure. And some of the ways they compromised the systems are surprising and kind of amusing. One was an OS research team that seemed to have a perfectly secure OS, but it got compromised anyway.
The way it happened is that the server on which the source code of the operating system was stored was some development machine in someone's office that wasn't secured at all -- but it had all the source code. So the bad guys broke into that server, which was not protected very well, and changed the source code of the operating system to introduce a back door. Then, when the researchers built their operating system, well, it had this back door, and the bad guys were able to break in. So you really have to think about all the assumptions you're making -- about where your software is coming from, about how the bad guy can get in -- in order to make sure your system is really secure. And there are many other examples in the lecture notes if you want; I'm just picking a few anecdotes here, and you can page through the rest.

Probably the most pervasive problem that shows up, though, is in mechanisms. In part, that's because the mechanism is the most complicated part of the story: it's the entirety of the software and hardware and all the other system components that make up whatever is trying to enforce your security policy. And there's no end of ways in which mechanisms can fail. Partly as a result, much of this class will focus pretty heavily on mechanisms -- how do you build mechanisms that are secure, that provide correct enforcement of security policies? We'll talk about threat models and policies as well, but it turns out to be much easier to make clean, crisp statements about mechanisms and the ways they work and don't work, as opposed to policies and threat models, which you really have to figure out how to fit into the particular context where you're using a system.

So let's look at some examples of mechanism bugs. One that you might have heard about in the last couple of days was a problem in the security mechanism of Apple's cloud infrastructure, called iCloud. Actually, any one of you who has an iPhone might be using this iCloud service.
It basically provides storage for files, lets you find your iPhone if you lose it, and probably has lots of other useful features. And I think it's some relative of this me.com service that was implicated in the scheme from a couple of years back. The problem someone discovered in this iCloud service is that it didn't enforce the same mechanism at all of its interfaces.

OK, so what does iCloud look like? Well, it basically provides lots of services for the same set of accounts. Maybe you have your file storage on iCloud, maybe you have your photo sharing, maybe you have other interfaces. And one of the interfaces into iCloud -- these are all provided as different APIs -- was this find-my-iPhone feature, I think. All of these interfaces want to make sure that you are the right user, that you're authenticated correctly. Unfortunately -- the iCloud system is a giant piece of software, and I'm sure lots of developers worked on it -- on this particular interface, the find-my-iPhone interface, when you tried to log in with a username and password, they didn't keep track of how many times you had tried to log in.

The reason this matters is that, as I mentioned earlier, humans are not that great at picking good passwords. So actually building a system that authenticates users with passwords is pretty tricky; we'll read a whole paper about this later on. One thing to keep in mind is that there are probably a million passwords out there that account for 50 percent of all accounts. So if you can make a million attempts at someone's account, there's a good chance you'll get their password, because people pick predictable passwords. And one way to defeat this is to make sure your system doesn't allow an arbitrary number of attempts to log in to an account. Maybe after three or ten tries, you should say: well, you've had enough tries. Time out.
780 00:32:45,740 --> 00:32:48,360 You can try again in 10 minutes or in an hour. 781 00:32:48,360 --> 00:32:50,870 And this way you really slow down the attacker. 782 00:32:50,870 --> 00:32:54,460 So they can only make a handful of guesses a day, 783 00:32:54,460 --> 00:32:56,217 instead of millions of guesses. 784 00:32:56,217 --> 00:32:58,300 And as a result, even if you have not the greatest 785 00:32:58,300 --> 00:33:00,550 of passwords, it's going to be pretty hard for someone 786 00:33:00,550 --> 00:33:01,570 to guess it. 787 00:33:01,570 --> 00:33:06,990 What happened is that iCloud had this password guessing 788 00:33:06,990 --> 00:33:10,210 prevention, basically a back-off, on some interfaces, 789 00:33:10,210 --> 00:33:12,730 so if you tried to log in through other interfaces 790 00:33:12,730 --> 00:33:15,130 and you failed 10 times, it would say, well, sorry. 791 00:33:15,130 --> 00:33:17,150 You have to wait before you can try again. 792 00:33:17,150 --> 00:33:18,710 But on this find my iPhone interface, 793 00:33:18,710 --> 00:33:19,669 they forgot this check. 794 00:33:19,669 --> 00:33:21,335 That's probably, you know, some guy just 795 00:33:21,335 --> 00:33:23,300 forgot to call this function on this API. 796 00:33:23,300 --> 00:33:26,867 But the result is that, for the same set of accounts, 797 00:33:26,867 --> 00:33:28,950 a bad guy would now be able to guess your password 798 00:33:28,950 --> 00:33:32,890 through this interface at millions of attempts per day 799 00:33:32,890 --> 00:33:35,340 easily, because this is only limited by how fast they 800 00:33:35,340 --> 00:33:37,452 can send packets to this iCloud thing. 801 00:33:37,452 --> 00:33:39,160 And they can probably guess your password 802 00:33:39,160 --> 00:33:43,960 with pretty good accuracy, or with a pretty good success rate, 803 00:33:43,960 --> 00:33:46,780 after making many guesses. 804 00:33:46,780 --> 00:33:48,890 And this led to some unfortunate break-ins. 805 00:33:48,890 --> 00:33:51,660 And people's confidential data got stolen 806 00:33:51,660 --> 00:33:54,870 from this iCloud service. 807 00:33:54,870 --> 00:33:59,621 So this is sort of an example where you had the right policy. 808 00:33:59,621 --> 00:34:01,120 Only the right user with the right password 809 00:34:01,120 --> 00:34:02,632 should get access to the files. 810 00:34:02,632 --> 00:34:04,090 You even had the right threat model, 811 00:34:04,090 --> 00:34:06,810 that, well, the bad guy might be able to guess the password. 812 00:34:06,810 --> 00:34:09,370 So we'll have to limit the number of guess attempts. 813 00:34:09,370 --> 00:34:12,250 But they just screwed up; the mechanism had a bug in it. 814 00:34:12,250 --> 00:34:15,239 They just forgot to enforce the right policy and mechanism 815 00:34:15,239 --> 00:34:16,280 at some interface. 816 00:34:16,280 --> 00:34:19,520 And this shows up again and again in systems, 817 00:34:19,520 --> 00:34:24,060 where someone just made a mistake and it has pretty drastic effects 818 00:34:24,060 --> 00:34:27,000 on the security of the overall system. 819 00:34:27,000 --> 00:34:28,960 Does this make sense? 820 00:34:28,960 --> 00:34:30,320 Any questions so far? 821 00:34:33,290 --> 00:34:34,630 All right. 822 00:34:34,630 --> 00:34:35,260 OK. 823 00:34:35,260 --> 00:34:39,149 So another example-- this was sort of an example of 824 00:34:39,149 --> 00:34:42,982 forgetting to check for password guessing attempts. 825 00:34:42,982 --> 00:34:44,690 There's many other things you can forget.
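To make the missing mechanism concrete, here is a minimal sketch in C of the kind of per-account back-off check that the other iCloud login interfaces apparently enforced and the find my iPhone interface did not. This is not Apple's code; all of the names and numbers here (attempt_allowed, MAX_FAILURES, the 10-minute lockout) are made up for illustration.

    /* Minimal sketch of a per-account login back-off (hypothetical names). */
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    #define MAX_FAILURES    10
    #define LOCKOUT_SECONDS (10 * 60)

    struct login_state {
        int failures;         /* consecutive failed attempts so far */
        time_t locked_until;  /* reject attempts until this time */
    };

    /* Every login interface is supposed to call this before even
     * looking at the supplied password. */
    bool attempt_allowed(const struct login_state *s) {
        return time(NULL) >= s->locked_until;
    }

    void record_attempt(struct login_state *s, bool password_ok) {
        if (password_ok) {
            s->failures = 0;
            return;
        }
        if (++s->failures >= MAX_FAILURES) {
            s->locked_until = time(NULL) + LOCKOUT_SECONDS;
            s->failures = 0;
        }
    }

    int main(void) {
        struct login_state s = {0, 0};
        for (int i = 0; i < 12; i++) {
            printf("attempt %2d allowed? %d\n", i + 1, attempt_allowed(&s));
            record_attempt(&s, false);  /* simulate a wrong password guess */
        }
        return 0;
    }

The policy and the threat model in the story were fine; the failure was that one interface simply never called the equivalent of attempt_allowed before checking the password.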
826 00:34:44,690 --> 00:34:47,830 You could forget to check for access control altogether. 827 00:34:47,830 --> 00:34:53,179 So one example is, Citibank had a website-- actually, still 828 00:34:53,179 --> 00:34:57,150 has a website that allows you to look at your credit card 829 00:34:57,150 --> 00:34:58,280 account information. 830 00:34:58,280 --> 00:34:59,560 So if you have a credit card with Citibank, 831 00:34:59,560 --> 00:35:00,660 you go to this website, it tells you, 832 00:35:00,660 --> 00:35:01,993 yeah, you have this credit card. 833 00:35:01,993 --> 00:35:04,160 Here's all the charges, all this great stuff. 834 00:35:04,160 --> 00:35:08,480 And the workflow a couple of years ago was that you go 835 00:35:08,480 --> 00:35:12,710 to some site, you provide a login username and password, 836 00:35:12,710 --> 00:35:15,630 and you get redirected to another URL, 837 00:35:15,630 --> 00:35:18,130 which is something like, I don't know, I'm guessing, 838 00:35:18,130 --> 00:35:23,190 but basically like citi.com/account?id= you know, 839 00:35:23,190 --> 00:35:26,640 whatever, one two three four. 840 00:35:26,640 --> 00:35:29,422 And it turns out that some guy figured out, well, 841 00:35:29,422 --> 00:35:30,880 if you change this number, you just 842 00:35:30,880 --> 00:35:33,910 get someone else's account. 843 00:35:33,910 --> 00:35:37,510 And it's not quite clear how to think of this. 844 00:35:37,510 --> 00:35:40,010 One possibility is that these guys were just thinking right, 845 00:35:40,010 --> 00:35:43,020 but they, again, forgot a check in this account 846 00:35:43,020 --> 00:35:46,646 page that, not only is this a valid ID number, 847 00:35:46,646 --> 00:35:48,520 but it's also the ID number of the guy that's 848 00:35:48,520 --> 00:35:50,480 currently logged in. 849 00:35:50,480 --> 00:35:51,950 It's an important check to make. 850 00:35:51,950 --> 00:35:53,759 But it's easy to forget. 851 00:35:53,759 --> 00:35:55,800 Another thing is, maybe these guys were thinking, 852 00:35:55,800 --> 00:35:56,932 no, no one could hit these URLs. 853 00:35:56,932 --> 00:35:58,640 Maybe they had a bad threat model, right? 854 00:35:58,640 --> 00:36:00,450 Maybe they're thinking, the URL-- 855 00:36:00,450 --> 00:36:02,820 if I don't print this URL, no one can click on it. 856 00:36:02,820 --> 00:36:04,190 It's like a bad threat model. 857 00:36:04,190 --> 00:36:07,480 So maybe that's-- well, it's hard to tell exactly what went 858 00:36:07,480 --> 00:36:08,160 wrong. 859 00:36:08,160 --> 00:36:10,280 But anyway, these mistakes do happen. 860 00:36:10,280 --> 00:36:12,860 And they show up a lot. 861 00:36:12,860 --> 00:36:17,620 So it's easy to have small, seemingly minor bugs 862 00:36:17,620 --> 00:36:24,430 in your mechanism lead to pretty unfortunate consequences. 863 00:36:24,430 --> 00:36:28,150 Another example that's not so much about missing checks 864 00:36:28,150 --> 00:36:30,990 is a problem that showed up on Android 865 00:36:30,990 --> 00:36:33,810 phones a couple of months ago. 866 00:36:33,810 --> 00:36:38,070 Maybe I'll use this board over here. 867 00:36:38,070 --> 00:36:42,110 So the problem was related to Bitcoin, which is this-- well, 868 00:36:42,110 --> 00:36:44,480 I'm sure you've heard-- this electronic currency 869 00:36:44,480 --> 00:36:47,770 system that's pretty popular these days.
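Before getting into the Bitcoin example, here is a minimal sketch of the ownership check that the Citibank account page was apparently missing. None of this is Citibank's real code; the data model and names are hypothetical, and the point is only the second half of the condition.

    /* Toy sketch of the missing access-control check (hypothetical names). */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct account { int id; int owner_user_id; };
    struct session { int user_id; };   /* set when the user logs in */

    static struct account accounts[] = {
        { 1234, 7 },
        { 1235, 8 },
    };

    static struct account *lookup_account(int id) {
        for (size_t i = 0; i < sizeof(accounts) / sizeof(accounts[0]); i++)
            if (accounts[i].id == id)
                return &accounts[i];
        return NULL;
    }

    /* A valid id is not enough; the account must also belong to the
     * user who is logged in on this session. */
    static struct account *account_for_request(const struct session *sess,
                                                int requested_id) {
        struct account *acct = lookup_account(requested_id);
        if (acct == NULL || acct->owner_user_id != sess->user_id)
            return NULL;
        return acct;
    }

    int main(void) {
        struct session sess = { 7 };   /* user 7 is logged in */
        printf("own account:    %p\n", (void *)account_for_request(&sess, 1234));
        printf("someone else's: %p\n", (void *)account_for_request(&sess, 1235));
        return 0;
    }

Checking only that the id from the URL refers to a real account is exactly the mistake described above; tying the account to the logged-in session is the check that is easy to forget.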
870 00:36:47,770 --> 00:36:54,650 And the way that Bitcoin works, at a very high level, 871 00:36:54,650 --> 00:36:58,710 is that your balance of Bitcoins is 872 00:36:58,710 --> 00:37:00,900 associated with a private key. 873 00:37:00,900 --> 00:37:03,000 And if you have someone's private key 874 00:37:03,000 --> 00:37:05,770 you can, of course, spend their Bitcoins. 875 00:37:05,770 --> 00:37:10,610 So the security of Bitcoin relies quite heavily 876 00:37:10,610 --> 00:37:13,410 on no one else knowing your private key. 877 00:37:13,410 --> 00:37:15,720 It's kind of like a password, except it's even more 878 00:37:15,720 --> 00:37:18,397 important, because people can probably make lots of guesses 879 00:37:18,397 --> 00:37:19,230 at your private key. 880 00:37:19,230 --> 00:37:21,396 And there's no real server that's checking your key. 881 00:37:21,396 --> 00:37:22,390 It's just cryptography. 882 00:37:22,390 --> 00:37:24,717 So any machine can try to make lots of guesses 883 00:37:24,717 --> 00:37:25,550 at your private key. 884 00:37:25,550 --> 00:37:28,380 And if they guess it, then they can transfer your Bitcoins 885 00:37:28,380 --> 00:37:30,220 to someone else. 886 00:37:30,220 --> 00:37:32,340 And as a result, it's critically important 887 00:37:32,340 --> 00:37:34,910 that you generate good, random keys 888 00:37:34,910 --> 00:37:36,980 that no one else can guess. 889 00:37:36,980 --> 00:37:41,220 And there are people using Bitcoin on Android. 890 00:37:41,220 --> 00:37:45,450 And the Android applications for Bitcoin were getting random 891 00:37:45,450 --> 00:37:51,210 values for these keys using this Java API called SecureRandom(), 892 00:37:51,210 --> 00:37:56,040 which sounds great, but as people figured out, well, OK. 893 00:37:56,040 --> 00:37:59,080 So what it does is, it doesn't really get truly random numbers. 894 00:37:59,080 --> 00:38:00,800 Inside of it, there's this construction 895 00:38:00,800 --> 00:38:04,090 called a Pseudorandom Number Generator, 896 00:38:04,090 --> 00:38:07,217 or PRNG, that, given a particular seed 897 00:38:07,217 --> 00:38:09,540 value, like you get maybe a couple of hundred 898 00:38:09,540 --> 00:38:12,400 bits of randomness and you shove it into this PRNG, 899 00:38:12,400 --> 00:38:15,410 you can keep asking it for more randomness and sort of stretch 900 00:38:15,410 --> 00:38:19,660 these random bits into as many random bits as you want. 901 00:38:19,660 --> 00:38:22,126 So you seed it initially, and then you 902 00:38:22,126 --> 00:38:24,000 can generate as many random bits as you want. 903 00:38:24,000 --> 00:38:26,581 And for various cryptographic reasons I won't go into here, 904 00:38:26,581 --> 00:38:27,330 it actually works. 905 00:38:27,330 --> 00:38:30,030 If you give it a couple of hundred really good random bits 906 00:38:30,030 --> 00:38:32,070 initially, it's going to be very hard for anyone 907 00:38:32,070 --> 00:38:37,380 to predict what the pseudorandom values it's generating are. 908 00:38:37,380 --> 00:38:40,300 But the problem is that this Java library 909 00:38:40,300 --> 00:38:41,840 had a small bug in it. 910 00:38:41,840 --> 00:38:44,980 In some set of circumstances, it forgot 911 00:38:44,980 --> 00:38:46,860 to initialize the PRNG with a seed, 912 00:38:46,860 --> 00:38:50,022 so it was just all zeros, which means that everyone could just 913 00:38:50,022 --> 00:38:51,730 figure out what your random numbers were.
914 00:38:51,730 --> 00:38:53,120 If they start with zeros, they'll 915 00:38:53,120 --> 00:38:54,757 produce the same random numbers as you, 916 00:38:54,757 --> 00:38:57,090 which means they'll produce the same private key as you. 917 00:38:57,090 --> 00:38:59,190 So they can just generate the same private key 918 00:38:59,190 --> 00:39:01,120 and transfer your Bitcoins. 919 00:39:01,120 --> 00:39:05,400 So this is, again, a small, or not so small, bug, 920 00:39:05,400 --> 00:39:08,320 depending on, I guess, who is asking. 921 00:39:08,320 --> 00:39:10,000 But nonetheless, right? 922 00:39:10,000 --> 00:39:12,500 Another example of small programming mistakes 923 00:39:12,500 --> 00:39:14,790 leading to pretty catastrophic results. 924 00:39:14,790 --> 00:39:17,410 Lots of people got their Bitcoin balances stolen 925 00:39:17,410 --> 00:39:19,400 because of this weakness. 926 00:39:19,400 --> 00:39:21,920 Of course, the fix is pretty simple at some level. 927 00:39:21,920 --> 00:39:23,360 You change the Java implementation 928 00:39:23,360 --> 00:39:28,270 of SecureRandom() to always seed this PRNG with random input 929 00:39:28,270 --> 00:39:29,210 bits. 930 00:39:29,210 --> 00:39:31,300 And then, hopefully, you're in good shape. 931 00:39:31,300 --> 00:39:36,532 But still, that's yet another example of mechanism failure. 932 00:39:36,532 --> 00:39:37,174 Yeah? 933 00:39:37,174 --> 00:39:39,424 AUDIENCE: Just to be clear, is this a different attack 934 00:39:39,424 --> 00:39:42,096 from the DSA signature randomness? 935 00:39:42,096 --> 00:39:42,970 PROFESSOR: Well, yeah. 936 00:39:42,970 --> 00:39:44,810 So the actual problem is a little bit more 937 00:39:44,810 --> 00:39:46,310 complicated, as you're hinting at. 938 00:39:46,310 --> 00:39:48,700 The problem is, even if you didn't generate 939 00:39:48,700 --> 00:39:51,140 your key on the Android device in the first place, 940 00:39:51,140 --> 00:39:56,150 the particular signature scheme used by Bitcoin 941 00:39:56,150 --> 00:39:59,110 assumes that every time you generate a new signature 942 00:39:59,110 --> 00:40:01,310 with that key, you use a fresh, what's 943 00:40:01,310 --> 00:40:03,290 called a nonce, for generating that signature. 944 00:40:03,290 --> 00:40:07,270 And if you ever generate two signatures with the same nonce, 945 00:40:07,270 --> 00:40:09,544 then someone can figure out what your key is. 946 00:40:09,544 --> 00:40:10,710 The story is pretty similar. 947 00:40:10,710 --> 00:40:12,270 But the details are a little different. 948 00:40:12,270 --> 00:40:14,310 So yeah, even if you actually generated your key somewhere 949 00:40:14,310 --> 00:40:16,726 else and your key was great, if every time you 950 00:40:16,726 --> 00:40:19,840 generate a signature, you-- 951 00:40:19,840 --> 00:40:23,190 well, if you generated two signatures with exactly the same nonce, 952 00:40:23,190 --> 00:40:26,650 or random value, someone could apply some clever math 953 00:40:26,650 --> 00:40:30,270 to your signatures and sort of extract your public key out 954 00:40:30,270 --> 00:40:30,770 of it. 955 00:40:30,770 --> 00:40:34,240 Or private key, more importantly. 956 00:40:34,240 --> 00:40:35,140 All right. 957 00:40:35,140 --> 00:40:40,610 Other questions about these problems, examples, et cetera? 958 00:40:40,610 --> 00:40:41,690 All right.
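To see why an uninitialized, all-zero seed is so damaging, here is a toy illustration in C. This is not the real SecureRandom code, and the generator below is a simple non-cryptographic one; it only demonstrates that a deterministic generator started from the same seed produces byte-for-byte identical output everywhere.

    /* Toy illustration of the seeding bug (not real crypto code). */
    #include <stdint.h>
    #include <stdio.h>

    struct prng { uint64_t state; };

    static void prng_seed(struct prng *p, uint64_t seed) { p->state = seed; }

    /* A simple linear congruential step: fine for illustration only. */
    static uint8_t prng_next_byte(struct prng *p) {
        p->state = p->state * 6364136223846793005ULL + 1442695040888963407ULL;
        return (uint8_t)(p->state >> 56);
    }

    int main(void) {
        struct prng phone, attacker;
        prng_seed(&phone, 0);     /* buggy path: the seed was never set */
        prng_seed(&attacker, 0);  /* the attacker just does the same thing */
        for (int i = 0; i < 8; i++)
            printf("%02x %02x\n",
                   prng_next_byte(&phone), prng_next_byte(&attacker));
        /* Both columns come out identical: the "key" is fully predictable. */
        return 0;
    }

With the same seed, the two generators derive exactly the same "random" key bytes, which is why the fix is simply to always seed the PRNG with genuinely random input bits.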
959 00:40:41,690 --> 00:40:46,570 So I guess, one thing I wanted to point out is that actually, 960 00:40:46,570 --> 00:40:48,400 well, as you're starting to appreciate, 961 00:40:48,400 --> 00:40:52,830 is that in computer security, almost every detail has 962 00:40:52,830 --> 00:40:55,000 a chance of really mattering. 963 00:40:55,000 --> 00:40:58,390 If you screw up almost something seemingly inconsequential, 964 00:40:58,390 --> 00:41:00,780 like forgetting to check something, or this, 965 00:41:00,780 --> 00:41:03,190 or forgetting to initialize the random seed, 966 00:41:03,190 --> 00:41:04,980 it can have pretty dramatic consequences 967 00:41:04,980 --> 00:41:07,010 for the overall system. 968 00:41:07,010 --> 00:41:09,530 And you really have to be very clear about, 969 00:41:09,530 --> 00:41:11,510 what is the specification of your system? 970 00:41:11,510 --> 00:41:12,510 What is it doing? 971 00:41:12,510 --> 00:41:14,870 Exactly what are all the corner cases? 972 00:41:14,870 --> 00:41:17,170 And a good way to sort of think of breaking a system 973 00:41:17,170 --> 00:41:19,369 or, conversely, figure out if your system is secure, 974 00:41:19,369 --> 00:41:20,910 is to really push all the edge cases, 975 00:41:20,910 --> 00:41:23,950 like what happens if my input is just large enough? 976 00:41:23,950 --> 00:41:26,620 Or what is the biggest or the smallest input? 977 00:41:26,620 --> 00:41:28,980 What is the sort of strangest set 978 00:41:28,980 --> 00:41:30,670 of inputs I could provide to my program 979 00:41:30,670 --> 00:41:34,250 and push it in all these corner cases? 980 00:41:34,250 --> 00:41:38,960 One example of this ambiguity, sort of a good example 981 00:41:38,960 --> 00:41:44,850 to keep in mind, is how SSL certificates, again, 982 00:41:44,850 --> 00:41:49,202 encode names into the certificate itself. 983 00:41:49,202 --> 00:41:51,160 So this is a different problem than the problem 984 00:41:51,160 --> 00:41:53,580 about the certificate authorities being trusted. 985 00:41:53,580 --> 00:41:57,470 So these SSL certificates are just sequences of bytes 986 00:41:57,470 --> 00:41:59,000 that a web server sends to you. 987 00:41:59,000 --> 00:42:01,200 And inside of this SSL certificate 988 00:42:01,200 --> 00:42:04,340 is the name of the server you're connecting to, 989 00:42:04,340 --> 00:42:06,144 so something like Amazon.com. 990 00:42:06,144 --> 00:42:08,060 You know, you can't just put down those bytes. 991 00:42:08,060 --> 00:42:09,870 You have to encode it somehow and specify, well, 992 00:42:09,870 --> 00:42:10,536 it's Amazon.com. 993 00:42:10,536 --> 00:42:12,860 And that's the end of the string. 994 00:42:12,860 --> 00:42:18,540 So in SSL certificates, they use a particular encoding scheme 995 00:42:18,540 --> 00:42:24,290 that writes down Amazon.com by first writing down 996 00:42:24,290 --> 00:42:26,394 the number of bytes in the string. 997 00:42:26,394 --> 00:42:27,560 So you first write down, OK. 998 00:42:27,560 --> 00:42:32,314 Well, I'm going to have a 10 byte string called Amazon.com. 999 00:42:35,679 --> 00:42:36,720 That's actually 10 bytes. 1000 00:42:36,720 --> 00:42:36,990 Great. 1001 00:42:36,990 --> 00:42:37,510 OK. 1002 00:42:37,510 --> 00:42:40,120 So this is like-- in the SSL certificate, somewhere 1003 00:42:40,120 --> 00:42:44,160 in there, there is this byte 10 followed by 10 bytes saying 1004 00:42:44,160 --> 00:42:45,170 what the host name is. 1005 00:42:45,170 --> 00:42:48,840 And there's other stuff afterwards, right, and before. 
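Here is a simplified sketch of the length-prefixed name encoding just described. Real certificates use ASN.1/DER, so treat this as an illustration of the idea rather than the actual format; the struct and function names are made up.

    /* Simplified sketch of a length-prefixed name (not the real DER format):
     * the name is stored as a length followed by exactly that many bytes,
     * with no terminator byte at all. */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    struct cert_name {
        uint8_t len;         /* number of bytes in the name */
        uint8_t bytes[255];  /* the name itself; NOT null-terminated */
    };

    static void encode_name(struct cert_name *n, const uint8_t *name, uint8_t len) {
        n->len = len;                 /* e.g. 10 for "amazon.com" */
        memcpy(n->bytes, name, len);  /* any byte value is allowed, even zero */
    }

    int main(void) {
        struct cert_name n;
        encode_name(&n, (const uint8_t *)"amazon.com", 10);
        printf("len = %d, first byte = %c\n", n.len, n.bytes[0]);
        return 0;
    }

The length field, not a terminator byte, says where the name ends, so a zero byte is a perfectly legal character inside the name; that is exactly what collides with the C view of strings described next.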
1006 00:42:48,840 --> 00:42:50,810 And when a browser takes it, well, the browser 1007 00:42:50,810 --> 00:42:54,140 is written in C. And the way C represents strings 1008 00:42:54,140 --> 00:42:56,660 is by null terminating them. 1009 00:42:56,660 --> 00:42:59,350 So in C, a string doesn't have a length count. 1010 00:42:59,350 --> 00:43:01,080 Instead, it has all the bytes. 1011 00:43:01,080 --> 00:43:03,660 And the end of the string is just the byte zero. 1012 00:43:03,660 --> 00:43:07,110 And in C, you write it with a backslash zero character. 1013 00:43:07,110 --> 00:43:08,740 So this is in memory in your browser. 1014 00:43:11,340 --> 00:43:13,310 Somewhere in memory there's this string 1015 00:43:13,310 --> 00:43:15,951 of 11 bytes, now, with an extra zero at the end. 1016 00:43:15,951 --> 00:43:17,700 And when a browser interprets this string, 1017 00:43:17,700 --> 00:43:19,950 it just keeps going until it sees an end of string 1018 00:43:19,950 --> 00:43:22,840 marker, which is a zero byte. 1019 00:43:22,840 --> 00:43:24,000 OK. 1020 00:43:24,000 --> 00:43:26,330 So, what could go wrong? 1021 00:43:26,330 --> 00:43:28,751 Any guesses? 1022 00:43:28,751 --> 00:43:29,250 Yeah? 1023 00:43:29,250 --> 00:43:31,575 AUDIENCE: You have a zero in the middle [INAUDIBLE]? 1024 00:43:31,575 --> 00:43:32,040 PROFESSOR: Yes. 1025 00:43:32,040 --> 00:43:32,510 This is great. 1026 00:43:32,510 --> 00:43:33,009 All right. 1027 00:43:33,009 --> 00:43:35,222 So, this is actually a bit of a discontinuity 1028 00:43:35,222 --> 00:43:36,680 in terms of how this guy represents 1029 00:43:36,680 --> 00:43:37,850 strings and this guy. 1030 00:43:37,850 --> 00:43:41,530 So suppose that I own the domain foo.com. 1031 00:43:41,530 --> 00:43:45,731 So I can get certificates for anything dot foo dot com. 1032 00:43:45,731 --> 00:43:50,568 So what I could do is ask for a certificate for the name 1033 00:43:50,568 --> 00:43:51,443 amazon.com\0.foo.com, where \0 stands for a single zero byte in the middle of the name. 1034 00:43:57,325 --> 00:43:59,160 That's a perfectly valid string. 1035 00:43:59,160 --> 00:44:00,730 It has a bunch of bytes. 1036 00:44:00,730 --> 00:44:03,710 I guess it's 10, 11, 12, 13, 14, 15, 16, 1037 00:44:03,710 --> 00:44:05,830 there's another four, 20, right? 1038 00:44:05,830 --> 00:44:10,020 So this is a 20 byte name with these 20 bytes. 1039 00:44:10,020 --> 00:44:12,760 So it used to be that if you go to a certificate authority, 1040 00:44:12,760 --> 00:44:15,230 in many cases, you could say, hey, I own foo.com. 1041 00:44:15,230 --> 00:44:16,990 Give me a certificate for this thing. 1042 00:44:16,990 --> 00:44:19,660 And they'd be perfectly willing to do it because it's 1043 00:44:19,660 --> 00:44:20,790 a subdomain of foo.com. 1044 00:44:20,790 --> 00:44:22,750 It's all yours. 1045 00:44:22,750 --> 00:44:25,220 But then, when a browser takes this string 1046 00:44:25,220 --> 00:44:27,930 and loads it in memory, well, what it does is the same thing 1047 00:44:27,930 --> 00:44:28,830 it did here. 1048 00:44:28,830 --> 00:44:30,839 It copies the string, 1049 00:44:30,839 --> 00:44:31,714 amazon.com\0.foo.com. 1050 00:44:37,206 --> 00:44:40,700 It'll dutifully add the terminating zero at the end. 1051 00:44:40,700 --> 00:44:43,370 But then, when the rest of the browser software 1052 00:44:43,370 --> 00:44:47,510 goes and tries to interpret the string at this memory location, 1053 00:44:47,510 --> 00:44:50,722 it'll keep going up until it gets to zero and say, OK well, 1054 00:44:50,722 --> 00:44:51,930 that's the end of the string.
1055 00:44:51,930 --> 00:44:53,276 So this is Amazon.com. 1056 00:44:53,276 --> 00:44:54,630 That's it. 1057 00:44:54,630 --> 00:45:00,120 So this sort of disconnect between how C software 1058 00:45:00,120 --> 00:45:03,070 and how SSL certificates represent names 1059 00:45:03,070 --> 00:45:05,800 led to some unfortunate security problems. 1060 00:45:05,800 --> 00:45:08,240 This was actually discovered a number of years 1061 00:45:08,240 --> 00:45:11,030 ago now by this guy, Moxie Marlinspike. 1062 00:45:11,030 --> 00:45:13,620 But it's a fairly clever observation. 1063 00:45:13,620 --> 00:45:17,470 And these kinds of encoding bugs are actually also 1064 00:45:17,470 --> 00:45:20,470 pretty common in lots of software 1065 00:45:20,470 --> 00:45:24,030 because, unless you're very diligent about exactly how you 1066 00:45:24,030 --> 00:45:27,090 encode things, there might be different ways of encoding. 1067 00:45:27,090 --> 00:45:28,547 And whenever there's disagreement, 1068 00:45:28,547 --> 00:45:30,880 there's a chance the bad guy can take advantage of this. 1069 00:45:30,880 --> 00:45:32,421 One system thinks that's a fine name. 1070 00:45:32,421 --> 00:45:34,590 Another thinks it's something else. 1071 00:45:34,590 --> 00:45:36,870 So these are good places to sort of push a system 1072 00:45:36,870 --> 00:45:39,360 to see how it might break. 1073 00:45:39,360 --> 00:45:42,030 That make sense? 1074 00:45:42,030 --> 00:45:42,530 All right. 1075 00:45:42,530 --> 00:45:47,220 So maybe the last example of mechanism failure 1076 00:45:47,220 --> 00:45:51,090 I'm going to talk about today is a reasonably popular one. 1077 00:45:51,090 --> 00:45:52,910 It's this problem of buffer overflows. 1078 00:45:56,230 --> 00:45:59,380 So some of you have seen this, at least at some level, 1079 00:45:59,380 --> 00:46:01,860 in 6.033, if you did the undergrad course. 1080 00:46:01,860 --> 00:46:05,290 But for those of you that have forgotten or haven't taken 1081 00:46:05,290 --> 00:46:07,540 oh three three, we'll sort of go over buffer overflows 1082 00:46:07,540 --> 00:46:08,130 in more detail. 1083 00:46:08,130 --> 00:46:10,463 And this will be, actually, quite critical for you guys, 1084 00:46:10,463 --> 00:46:12,826 because lab one is all about buffer overflows. 1085 00:46:12,826 --> 00:46:14,200 And you're going to be exploiting 1086 00:46:14,200 --> 00:46:19,710 these vulnerabilities in a somewhat real web server. 1087 00:46:19,710 --> 00:46:21,610 So let's figure out, what is the setting? 1088 00:46:21,610 --> 00:46:23,300 What are we talking about here? 1089 00:46:23,300 --> 00:46:25,470 So the setting we're going to be considering 1090 00:46:25,470 --> 00:46:30,330 is a system which has, let's say, a web server. 1091 00:46:30,330 --> 00:46:34,510 So what we have is, we have some computer out there 1092 00:46:34,510 --> 00:46:36,540 that has a web server on it. 1093 00:46:39,130 --> 00:46:41,200 And the web server is a program that 1094 00:46:41,200 --> 00:46:44,070 is going to accept connections from the outside world, 1095 00:46:44,070 --> 00:46:47,480 take requests-- which are basically just packets-- 1096 00:46:47,480 --> 00:46:51,820 and somehow process them, and do some checking, probably. 1097 00:46:51,820 --> 00:46:54,174 If it's an illegal URL or if they're 1098 00:46:54,174 --> 00:46:56,590 trying to access a file they are not authorized to access, 1099 00:46:56,590 --> 00:46:58,381 the web server is going to return an error.
1100 00:46:58,381 --> 00:47:00,400 But otherwise, it's going to access some files, 1101 00:47:00,400 --> 00:47:04,040 maybe on disk, and send them back out 1102 00:47:04,040 --> 00:47:06,990 in some sort of a reply. 1103 00:47:06,990 --> 00:47:10,320 So this is a hugely common picture, almost any system 1104 00:47:10,320 --> 00:47:11,970 you look at. 1105 00:47:11,970 --> 00:47:13,146 What's the policy? 1106 00:47:13,146 --> 00:47:14,270 Or what's the threat model? 1107 00:47:18,300 --> 00:47:22,130 So this is a bit of a problem in many real world systems, 1108 00:47:22,130 --> 00:47:23,850 namely that it's actually pretty hard 1109 00:47:23,850 --> 00:47:26,990 to pin down what is the exact policy or threat model 1110 00:47:26,990 --> 00:47:28,320 that we're talking about. 1111 00:47:28,320 --> 00:47:31,570 And this sort of imprecision or ambiguity about policies, 1112 00:47:31,570 --> 00:47:33,825 threat models, et cetera, is what sometimes 1113 00:47:33,825 --> 00:47:34,950 leads to security problems. 1114 00:47:34,950 --> 00:47:37,180 Not in this particular case, but we'll see. 1115 00:47:37,180 --> 00:47:40,090 But maybe just to give you a sense of how 1116 00:47:40,090 --> 00:47:44,780 to think of a typical web server in the context of this policy, 1117 00:47:44,780 --> 00:47:47,630 threat model kind of stuff, is that well, probably the policy 1118 00:47:47,630 --> 00:47:50,005 is, the web server should do what the programmer intended 1119 00:47:50,005 --> 00:47:50,660 it to do. 1120 00:47:50,660 --> 00:47:51,575 It's a little vague. 1121 00:47:51,575 --> 00:47:53,950 But that's probably what's going on because anything more 1122 00:47:53,950 --> 00:47:55,616 specific, as well, the web server should 1123 00:47:55,616 --> 00:47:57,485 do exactly what the code does, is going 1124 00:47:57,485 --> 00:47:59,860 to be a bit of an [INAUDIBLE] And if your code has a bug, 1125 00:47:59,860 --> 00:48:01,090 well, your policy says, well, that's 1126 00:48:01,090 --> 00:48:02,131 exactly what I should do. 1127 00:48:02,131 --> 00:48:04,120 I should follow the bug. 1128 00:48:04,120 --> 00:48:07,390 So it's a little hard to state a policy precisely, 1129 00:48:07,390 --> 00:48:09,332 but in this case, let's go with some intuitive 1130 00:48:09,332 --> 00:48:11,290 version of, well, the web server should do what 1131 00:48:11,290 --> 00:48:13,785 the programmer wanted it to do. 1132 00:48:13,785 --> 00:48:15,160 And the threat model is probably, 1133 00:48:15,160 --> 00:48:18,260 the attacker doesn't have access to this machine, 1134 00:48:18,260 --> 00:48:20,800 can't log in to it remotely, doesn't have physical access 1135 00:48:20,800 --> 00:48:22,690 to it, but can send any packet they want. 1136 00:48:22,690 --> 00:48:26,874 So they're not restricted to certain kinds of packets. 1137 00:48:26,874 --> 00:48:28,290 Anything you can shape and sort of 1138 00:48:28,290 --> 00:48:30,187 deliver to this web server, that's fair game. 1139 00:48:30,187 --> 00:48:32,270 Seems like a reasonable threat model, in practice, 1140 00:48:32,270 --> 00:48:34,450 to have in mind. 1141 00:48:34,450 --> 00:48:39,940 And I guess the goal is that this web server shouldn't 1142 00:48:39,940 --> 00:48:42,752 allow arbitrary stuff to go wrong here. 1143 00:48:42,752 --> 00:48:44,460 I guess that sort of goes along with what 1144 00:48:44,460 --> 00:48:45,590 the programmer intended. 
1145 00:48:45,590 --> 00:48:47,740 The programmer probably didn't intend any request 1146 00:48:47,740 --> 00:48:49,610 to be able to access anything on the server. 1147 00:48:49,610 --> 00:48:51,630 And yet, it turns out if you make certain kinds of mistakes 1148 00:48:51,630 --> 00:48:53,937 in writing the web server software, which is basically 1149 00:48:53,937 --> 00:48:55,020 the mechanism here, right? 1150 00:48:55,020 --> 00:48:57,390 The web server software is the thing that takes a request 1151 00:48:57,390 --> 00:48:59,050 and looks at it and makes sure that it's not 1152 00:48:59,050 --> 00:49:01,710 going to do something bad, sends a response back if everything's 1153 00:49:01,710 --> 00:49:02,050 OK. 1154 00:49:02,050 --> 00:49:03,424 The web server in this mechanism. 1155 00:49:03,424 --> 00:49:05,720 It's enforcing your policy. 1156 00:49:05,720 --> 00:49:08,730 And as a result, if the web server software is buggy, 1157 00:49:08,730 --> 00:49:10,270 then you're in trouble. 1158 00:49:10,270 --> 00:49:12,650 And one sort of common problem, if you're 1159 00:49:12,650 --> 00:49:14,670 writing software in C which, you know, 1160 00:49:14,670 --> 00:49:16,240 many things are still written in C 1161 00:49:16,240 --> 00:49:19,590 and probably will continue to be written in C for a while, 1162 00:49:19,590 --> 00:49:21,540 you can mismanage your memory allocations. 1163 00:49:21,540 --> 00:49:25,330 And as we saw in this SSL certificate naming example, 1164 00:49:25,330 --> 00:49:27,270 even sort of a single byte can really 1165 00:49:27,270 --> 00:49:30,470 make a huge difference, in terms of what goes on. 1166 00:49:30,470 --> 00:49:32,480 And I guess for this example, we'll 1167 00:49:32,480 --> 00:49:35,960 look at a small piece of code that's not quite a real web 1168 00:49:35,960 --> 00:49:36,460 server. 1169 00:49:36,460 --> 00:49:38,900 In the lab, you'll have this whole picture to play with. 1170 00:49:38,900 --> 00:49:41,340 But for lecture, I just want to give you 1171 00:49:41,340 --> 00:49:43,470 a simplified example so we can talk 1172 00:49:43,470 --> 00:49:47,140 about what's sort of at the core of what's going wrong. 1173 00:49:47,140 --> 00:49:51,515 And, in particular, if this system wakes up, 1174 00:49:51,515 --> 00:49:56,240 I will show you sort of a very small C function. 1175 00:49:56,240 --> 00:49:59,070 And we can sort of see what goes wrong 1176 00:49:59,070 --> 00:50:04,400 if you provide different inputs to that piece of code. 1177 00:50:04,400 --> 00:50:05,220 All right. 1178 00:50:05,220 --> 00:50:09,460 So the C function that I have in mind is this guy. 1179 00:50:13,950 --> 00:50:15,920 Somewhere here. 1180 00:50:15,920 --> 00:50:16,720 Oh, yeah. 1181 00:50:19,684 --> 00:50:21,166 It's coming on. 1182 00:50:21,166 --> 00:50:23,150 All right. 1183 00:50:23,150 --> 00:50:27,740 So here's the sort of program I'm talking about, 1184 00:50:27,740 --> 00:50:30,280 or I want to use as an example here. 1185 00:50:30,280 --> 00:50:32,974 So this program is just going to read a request. 1186 00:50:32,974 --> 00:50:34,890 And you can sort of imagine it's going to read 1187 00:50:34,890 --> 00:50:36,400 a request from the network. 1188 00:50:36,400 --> 00:50:38,462 But for the purposes of this example, 1189 00:50:38,462 --> 00:50:40,420 it's just going to read a request from whatever 1190 00:50:40,420 --> 00:50:42,940 I'm typing in on the keyboard. 1191 00:50:42,940 --> 00:50:45,425 And it's going to store it in a buffer here. 
1192 00:50:45,425 --> 00:50:47,300 And then it's going to parse it as an integer 1193 00:50:47,300 --> 00:50:48,470 and return the integer. 1194 00:50:48,470 --> 00:50:52,430 And the program will then print whatever integer I get back. 1195 00:50:52,430 --> 00:50:54,110 It's, like, far from a web server. 1196 00:50:54,110 --> 00:50:57,290 But we'll at least see some basics 1197 00:50:57,290 --> 00:51:00,920 of how buffer overflows work and what goes wrong. 1198 00:51:00,920 --> 00:51:03,380 So let's see actually what happens if we run this program. 1199 00:51:03,380 --> 00:51:05,875 So I can compile it here. 1200 00:51:05,875 --> 00:51:07,250 And actually, you can sort of see 1201 00:51:07,250 --> 00:51:10,600 the-- it's already telling me what I'm screwing up, right? 1202 00:51:10,600 --> 00:51:13,535 The gets function is dangerous and should not be used. 1203 00:51:13,535 --> 00:51:15,710 And we'll see in a second why the compiler is 1204 00:51:15,710 --> 00:51:18,300 so intent on telling me this. 1205 00:51:18,300 --> 00:51:20,320 And it actually is true. 1206 00:51:20,320 --> 00:51:23,530 But for now, suppose we're a happy-go-lucky 1207 00:51:23,530 --> 00:51:26,660 developer that is willing to ignore this warning. 1208 00:51:26,660 --> 00:51:27,350 So OK. 1209 00:51:27,350 --> 00:51:30,200 I run this redirect function, I provide some input, 1210 00:51:30,200 --> 00:51:33,040 and it works. 1211 00:51:33,040 --> 00:51:34,900 Let's see what happens if I provide large inputs. 1212 00:51:34,900 --> 00:51:37,265 If I type in some large number, well, 1213 00:51:37,265 --> 00:51:38,890 at least it gives me some large number. 1214 00:51:38,890 --> 00:51:43,000 It basically maxes out at two to the 31 and prints that 1215 00:51:43,000 --> 00:51:44,530 and doesn't go any higher. 1216 00:51:44,530 --> 00:51:46,290 So that's maybe not disastrous, right? 1217 00:51:46,290 --> 00:51:46,790 Whatever. 1218 00:51:46,790 --> 00:51:49,570 You provided this ridiculously large number. 1219 00:51:49,570 --> 00:51:51,990 You got something that didn't quite work. 1220 00:51:51,990 --> 00:51:53,510 It's not quite a problem yet. 1221 00:51:53,510 --> 00:51:55,520 But if we provide some really large input, 1222 00:51:55,520 --> 00:51:57,880 we might get some other problem, right? 1223 00:51:57,880 --> 00:52:00,940 So suppose I provide, well, a lot of-- 1224 00:52:00,940 --> 00:52:03,395 if I just provide things that are not numbers, 1225 00:52:03,395 --> 00:52:04,020 it prints zero. 1226 00:52:04,020 --> 00:52:06,430 That's not so bad. 1227 00:52:06,430 --> 00:52:10,990 But suppose I'm going to paste in a huge number of As. 1228 00:52:10,990 --> 00:52:13,490 OK, so now the program crashes. 1229 00:52:13,490 --> 00:52:14,770 Maybe not too surprising. 1230 00:52:14,770 --> 00:52:18,115 So if it was the case that, if I send a bad request to the web 1231 00:52:18,115 --> 00:52:20,740 server, it just doesn't get back to me or doesn't send a reply, 1232 00:52:20,740 --> 00:52:21,629 that would be fine. 1233 00:52:21,629 --> 00:52:23,170 But we'll sort of look inside and see 1234 00:52:23,170 --> 00:52:25,750 what happens, and try to figure out how we can actually 1235 00:52:25,750 --> 00:52:30,610 take advantage of this crash to maybe do something much more 1236 00:52:30,610 --> 00:52:35,960 interesting, or, well, much more along the lines of what a hacker might 1237 00:52:35,960 --> 00:52:37,794 be interested in doing.
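The exact source of the demo program isn't shown in the transcript, so here is a rough reconstruction of what it likely looks like, built around gets() and atoi() as described; the function name follows the transcript, and the details should be treated as guesses.

    /* Rough reconstruction of the demo program (not the actual source):
     * read a line into a fixed-size stack buffer with gets(), which does
     * no bounds checking, then parse it as an integer and print it. */
    #include <stdio.h>
    #include <stdlib.h>

    int redirect(void) {
        char buf[128];
        int i;

        /* gets() is so dangerous it was removed from the C11 standard;
         * the compiler/linker warning seen above is about exactly this.
         * It overflows buf if the input line doesn't fit in 128 bytes. */
        gets(buf);
        i = atoi(buf);  /* stores 0 in i for non-numeric input like "AAAA..." */
        return i;
    }

    int main(void) {
        int x = redirect();
        printf("x = %d\n", x);
        return 0;
    }

Both buf and the saved return address live in redirect's stack frame, and gets() has no idea how big buf is, which is what the rest of the demo exploits.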
1238 00:52:37,794 --> 00:52:39,710 So to do this, we're going to run this program 1239 00:52:39,710 --> 00:52:40,680 under a debugger. 1240 00:52:40,680 --> 00:52:43,980 You'll get super familiar with this in lab one. 1241 00:52:43,980 --> 00:52:45,500 But for now, what we're going to do 1242 00:52:45,500 --> 00:52:49,700 is set a breakpoint in that redirect function. 1243 00:52:49,700 --> 00:52:52,380 And we're going to sort of run along and see what happens. 1244 00:52:52,380 --> 00:52:54,410 So when I run the program, it's going 1245 00:52:54,410 --> 00:52:56,450 to start executing in the main function. 1246 00:52:56,450 --> 00:52:58,780 And pretty quickly, it calls redirect. 1247 00:52:58,780 --> 00:53:01,790 And the debugger is now stopped at the beginning of redirect. 1248 00:53:01,790 --> 00:53:06,830 And we can actually see what's going on here by, for example, 1249 00:53:06,830 --> 00:53:09,455 we can ask it to print the current CPU registers. 1250 00:53:09,455 --> 00:53:11,330 We're going to look at really low level stuff 1251 00:53:11,330 --> 00:53:13,610 here, as opposed to at the level of C source code. 1252 00:53:13,610 --> 00:53:15,090 We're going to look at the actual instructions 1253 00:53:15,090 --> 00:53:16,881 that my machine is executing because that's 1254 00:53:16,881 --> 00:53:17,930 what really is going on. 1255 00:53:17,930 --> 00:53:20,950 The C is actually maybe hiding some things from us. 1256 00:53:20,950 --> 00:53:23,110 So you can actually print all the registers. 1257 00:53:23,110 --> 00:53:25,974 So on x86, as you might remember. 1258 00:53:25,974 --> 00:53:27,390 Well, on [INAUDIBLE] architecture, 1259 00:53:27,390 --> 00:53:29,170 there's a stack pointer. 1260 00:53:29,170 --> 00:53:32,530 So let me start maybe drawing this diagram on the board 1261 00:53:32,530 --> 00:53:36,450 so we can try to reconstruct what's happening. 1262 00:53:36,450 --> 00:53:39,550 So what's going on is that my program, not surprisingly, 1263 00:53:39,550 --> 00:53:41,020 has a stack. 1264 00:53:41,020 --> 00:53:43,300 On x86, the stack grows down. 1265 00:53:43,300 --> 00:53:46,040 So it sort of is this stack like this. 1266 00:53:46,040 --> 00:53:49,020 And we can keep pushing stuff onto it. 1267 00:53:49,020 --> 00:53:51,980 So right now, the stack pointer points 1268 00:53:51,980 --> 00:53:58,230 at this particular memory location FFD010. 1269 00:53:58,230 --> 00:53:59,535 So some value. 1270 00:53:59,535 --> 00:54:01,660 So you can try to figure out, how did it get there? 1271 00:54:01,660 --> 00:54:05,480 One way to do it is to disassemble the code 1272 00:54:05,480 --> 00:54:07,380 of this redirect function. 1273 00:54:12,650 --> 00:54:14,230 Is this going to work better? 1274 00:54:14,230 --> 00:54:15,620 Really? 1275 00:54:15,620 --> 00:54:18,250 Convenience variable must have integer value. 1276 00:54:20,870 --> 00:54:21,370 Man. 1277 00:54:21,370 --> 00:54:22,786 What is going on with my debugger? 1278 00:54:28,190 --> 00:54:28,690 All right. 1279 00:54:28,690 --> 00:54:31,500 Well, we can disassemble the function by name. 1280 00:54:31,500 --> 00:54:33,200 So this is what the function is doing. 1281 00:54:33,200 --> 00:54:36,340 So first off, it starts by manipulating something 1282 00:54:36,340 --> 00:54:37,362 with this EBP register. 1283 00:54:37,362 --> 00:54:38,570 That's not super interesting. 1284 00:54:38,570 --> 00:54:40,620 But the first thing it does after that is 1285 00:54:40,620 --> 00:54:43,800 subtract a certain value from the stack pointer. 
1286 00:54:43,800 --> 00:54:46,940 This is, basically, it's making space for all those variables, 1287 00:54:46,940 --> 00:54:50,680 like the buffer and the integer, i, we saw in the C source code. 1288 00:54:50,680 --> 00:54:53,570 So we're actually, now, four instructions 1289 00:54:53,570 --> 00:54:55,230 into the function, here. 1290 00:54:55,230 --> 00:54:57,190 So that stack pointer value that we 1291 00:54:57,190 --> 00:55:01,560 saw before is actually already in the middle, so to say, 1292 00:55:01,560 --> 00:55:02,730 of the stack. 1293 00:55:02,730 --> 00:55:06,840 And currently, there's stuff above it 1294 00:55:06,840 --> 00:55:09,550 that is going to be the buffer, that integer 1295 00:55:09,550 --> 00:55:12,110 value, and actually, also the return address 1296 00:55:12,110 --> 00:55:14,390 into the main function goes on the stack, as well. 1297 00:55:14,390 --> 00:55:17,734 So somewhere here, we'll have the return address. 1298 00:55:17,734 --> 00:55:19,150 And we actually try to figure out, 1299 00:55:19,150 --> 00:55:20,720 where are things on the stack? 1300 00:55:20,720 --> 00:55:26,850 So we can print the address of that buffer variable. 1301 00:55:26,850 --> 00:55:31,040 So the buffer variable is at address D02C. 1302 00:55:31,040 --> 00:55:35,690 We can also print the value of that integer, i. 1303 00:55:35,690 --> 00:55:38,960 That guy is at D0AC. 1304 00:55:38,960 --> 00:55:40,970 So the i is way up on the stack. 1305 00:55:40,970 --> 00:55:44,310 But the buffer is a bit lower. 1306 00:55:44,310 --> 00:55:47,210 So what's going on is that we have our buffer here 1307 00:55:47,210 --> 00:55:52,460 on the stack, and then followed above by i and maybe 1308 00:55:52,460 --> 00:55:54,640 some other stuff, and then finally, the return 1309 00:55:54,640 --> 00:55:57,260 address into the main function that called redirect. 1310 00:55:57,260 --> 00:56:00,910 And the buffer is-- this is going, 1311 00:56:00,910 --> 00:56:02,290 the stack is growing down. 1312 00:56:02,290 --> 00:56:03,845 So these are higher addresses. 1313 00:56:07,250 --> 00:56:11,010 So what this means is that the buffer-- we actually 1314 00:56:11,010 --> 00:56:13,750 have to decide, where is the zeroth element of the buffer, 1315 00:56:13,750 --> 00:56:16,950 and where is the 128th element of this buffer? 1316 00:56:16,950 --> 00:56:20,510 So where does the zeroth element of the buffer go? 1317 00:56:20,510 --> 00:56:22,205 Yeah? 1318 00:56:22,205 --> 00:56:24,080 Should be at the bottom, right, because yeah, 1319 00:56:24,080 --> 00:56:25,590 higher elements just keep going up. 1320 00:56:25,590 --> 00:56:27,505 So buff of zero is down here. 1321 00:56:27,505 --> 00:56:28,760 It just keeps going on. 1322 00:56:28,760 --> 00:56:31,140 And buff of 127 is going to be up there. 1323 00:56:31,140 --> 00:56:34,240 And then we'll have i and other stuff. 1324 00:56:34,240 --> 00:56:35,020 OK. 1325 00:56:35,020 --> 00:56:36,860 Well, let's see what happens now if we 1326 00:56:36,860 --> 00:56:39,620 provide that input that seemed to be crashing it before. 1327 00:56:39,620 --> 00:56:41,120 So I guess one thing we can actually 1328 00:56:41,120 --> 00:56:43,700 do before this is to see whether we can actually 1329 00:56:43,700 --> 00:56:45,270 find this return address. 1330 00:56:45,270 --> 00:56:48,870 Where it actually happens to live is at the EBP pointer. 
1331 00:56:48,870 --> 00:56:52,700 This is just a convenient thing in the x86 calling convention, 1332 00:56:52,700 --> 00:56:59,270 that the EBP pointer, or register, actually 1333 00:56:59,270 --> 00:57:02,150 happens to point to something on the stack which is going 1334 00:57:02,150 --> 00:57:06,040 to be called the saved EBP. 1335 00:57:06,040 --> 00:57:08,870 It's a separate location, sort of after all the variables 1336 00:57:08,870 --> 00:57:10,250 but before the return address. 1337 00:57:10,250 --> 00:57:11,666 And this is the thing that's being 1338 00:57:11,666 --> 00:57:14,800 saved by those first couple of instructions at the top. 1339 00:57:14,800 --> 00:57:16,630 And you actually sort of examine it. 1340 00:57:16,630 --> 00:57:23,450 In GDB you can say, examine x, some value, so the EBP pointer 1341 00:57:23,450 --> 00:57:24,570 value. 1342 00:57:24,570 --> 00:57:26,720 So that's the location of the stack, D0B8. 1343 00:57:26,720 --> 00:57:30,020 Indeed, it's actually above even the i variable. 1344 00:57:30,020 --> 00:57:30,770 So it's great. 1345 00:57:30,770 --> 00:57:32,436 And it has some other value that happens 1346 00:57:32,436 --> 00:57:36,050 to be the EBP before this function was called. 1347 00:57:36,050 --> 00:57:38,950 But then, sort of one more memory location 1348 00:57:38,950 --> 00:57:40,710 up is going to be the return address. 1349 00:57:40,710 --> 00:57:44,210 So if we print EBP plus four, there's something else there, 1350 00:57:44,210 --> 00:57:48,800 this 0x08048E5F. 1351 00:57:48,800 --> 00:57:51,720 And let's actually see where that's pointing. 1352 00:57:51,720 --> 00:57:54,485 So this is something you're going to do a lot in the lab. 1353 00:57:54,485 --> 00:57:56,140 So you can take this address. 1354 00:57:56,140 --> 00:57:59,070 And you can try to disassemble it. 1355 00:57:59,070 --> 00:58:00,130 So what is this guy? 1356 00:58:00,130 --> 00:58:02,290 Where did we end up? 1357 00:58:02,290 --> 00:58:05,040 So GDB actually helpfully figures out which function 1358 00:58:05,040 --> 00:58:06,480 contains that address. 1359 00:58:06,480 --> 00:58:07,640 So 5F. 1360 00:58:07,640 --> 00:58:11,550 This is the guy that our return address is pointing to. 1361 00:58:11,550 --> 00:58:13,790 And as you can see, this is the instruction right 1362 00:58:13,790 --> 00:58:16,070 after the call to redirect. 1363 00:58:16,070 --> 00:58:17,655 So when we return from redirect, this 1364 00:58:17,655 --> 00:58:20,570 is exactly where we're going to jump and continue execution. 1365 00:58:20,570 --> 00:58:22,319 This is, hopefully, fairly straightforward 1366 00:58:22,319 --> 00:58:25,660 stuff from double oh four, some standard OS class. 1367 00:58:25,660 --> 00:58:26,160 OK. 1368 00:58:26,160 --> 00:58:28,300 So where are we now? 1369 00:58:28,300 --> 00:58:33,060 Just to recap, we can try to disassemble our instruction 1370 00:58:33,060 --> 00:58:33,990 pointer. 1371 00:58:33,990 --> 00:58:36,900 So we're at the beginning of redirect right now. 1372 00:58:36,900 --> 00:58:43,520 And we can run for a bit, and maybe run that getS() function. 1373 00:58:43,520 --> 00:58:45,210 So OK, we run next. 1374 00:58:45,210 --> 00:58:48,620 What this does is it runs getS() and it's waiting for getS() 1375 00:58:48,620 --> 00:58:49,627 to return. 1376 00:58:49,627 --> 00:58:51,960 We can provide our bad input to getS() and try to get it 1377 00:58:51,960 --> 00:58:54,950 to crash again and see what's going on, really, there, right? 
1378 00:58:54,950 --> 00:58:57,310 So we can paste a bunch of As again. 1379 00:58:57,310 --> 00:58:57,810 OK. 1380 00:58:57,810 --> 00:59:00,420 So we got out of getS() and things are actually still OK, 1381 00:59:00,420 --> 00:59:00,919 right? 1382 00:59:00,919 --> 00:59:02,520 The program is still running. 1383 00:59:02,520 --> 00:59:05,830 But we can try to figure out, what is in memory right now 1384 00:59:05,830 --> 00:59:08,775 and why are things going to go wrong? 1385 00:59:08,775 --> 00:59:10,150 Actually, what do you guys think? 1386 00:59:10,150 --> 00:59:11,025 What happened, right? 1387 00:59:11,025 --> 00:59:12,980 So I printed out a bunch of As. 1388 00:59:12,980 --> 00:59:14,560 What did getS() do to the memory? 1389 00:59:16,770 --> 00:59:17,270 Yeah, yeah. 1390 00:59:17,270 --> 00:59:18,936 So it just keeps writing As here, right? 1391 00:59:18,936 --> 00:59:21,360 All we actually passed to getS() was a single pointer, 1392 00:59:21,360 --> 00:59:23,410 the start of this address, right? 1393 00:59:23,410 --> 00:59:26,660 So this is the argument to getS(), 1394 00:59:26,660 --> 00:59:28,800 is a pointer to this memory location on the stack. 1395 00:59:28,800 --> 00:59:30,470 So it just kept writing As. 1396 00:59:30,470 --> 00:59:32,470 And it doesn't actually know what the length is, 1397 00:59:32,470 --> 00:59:33,760 so it just keeps going, right? 1398 00:59:33,760 --> 00:59:36,334 It's going to override As all the way up the stack, 1399 00:59:36,334 --> 00:59:38,500 past the return address, probably, and into whatever 1400 00:59:38,500 --> 00:59:40,934 was up the stack above us. 1401 00:59:40,934 --> 00:59:42,600 So we can check whether that's the case. 1402 00:59:42,600 --> 00:59:47,064 So we can actually print the buffer. 1403 00:59:47,064 --> 00:59:48,480 And in fact, it tells us, yeah, we 1404 00:59:48,480 --> 00:59:51,500 have 180 As there, even though the buffer 1405 00:59:51,500 --> 00:59:55,670 should be 128 elements large. 1406 00:59:55,670 --> 00:59:57,310 So this is not so great. 1407 00:59:57,310 --> 00:59:59,530 And we can actually, again, examine what's 1408 00:59:59,530 --> 01:00:03,290 going on in that EBP pointer. 1409 01:00:03,290 --> 01:00:05,095 Dollar sign, EBP. 1410 01:00:05,095 --> 01:00:06,610 So in fact, yeah. 1411 01:00:06,610 --> 01:00:12,159 It's all 0x41, which is the ASCII encoding of the letter A. 1412 01:00:12,159 --> 01:00:14,200 And in fact, the return address is probably going 1413 01:00:14,200 --> 01:00:15,283 to be the same way, right? 1414 01:00:15,283 --> 01:00:19,350 If we print the return address, it's also all As. 1415 01:00:19,350 --> 01:00:20,245 That's not so great. 1416 01:00:20,245 --> 01:00:22,370 In fact, what's going to happen if we return now is 1417 01:00:22,370 --> 01:00:25,447 the program will jump to that address, 41414141. 1418 01:00:25,447 --> 01:00:26,530 And there's nothing there. 1419 01:00:26,530 --> 01:00:27,196 And it'll crash. 1420 01:00:27,196 --> 01:00:29,840 That's the segmentation fault you're getting. 1421 01:00:29,840 --> 01:00:33,090 So let's just step up to it and see what happens. 1422 01:00:33,090 --> 01:00:34,490 So let's run next. 1423 01:00:34,490 --> 01:00:37,470 So we keep stepping through the program. 1424 01:00:37,470 --> 01:00:40,060 And we can see where we are. 1425 01:00:40,060 --> 01:00:40,560 OK. 1426 01:00:40,560 --> 01:00:43,330 We're getting close to the end of the function. 1427 01:00:43,330 --> 01:00:46,400 So we can step over two more instructions. 
1428 01:00:46,400 --> 01:00:49,260 nexti. 1429 01:00:49,260 --> 01:00:51,531 And now we can disassemble again. 1430 01:00:51,531 --> 01:00:52,030 OK. 1431 01:00:52,030 --> 01:00:54,859 We're now just at the return instruction from this function. 1432 01:00:54,859 --> 01:00:56,150 And we can actually figure out. 1433 01:00:56,150 --> 01:00:59,690 So as you can see, at the end of the function, 1434 01:00:59,690 --> 01:01:02,120 it runs this leave x86 instruction, 1435 01:01:02,120 --> 01:01:05,220 which basically restores the stack back to where it was. 1436 01:01:05,220 --> 01:01:07,020 So it sort of pushes the stack pointer 1437 01:01:07,020 --> 01:01:10,200 all the way back to the return address using the same EBP. 1438 01:01:10,200 --> 01:01:11,810 That's what it's basically for. 1439 01:01:11,810 --> 01:01:15,421 And now, the stack is pointing at the return address 1440 01:01:15,421 --> 01:01:16,420 that we're going to use. 1441 01:01:16,420 --> 01:01:18,340 And in fact, it's all A's. 1442 01:01:18,340 --> 01:01:20,370 And if we run one more instruction, 1443 01:01:20,370 --> 01:01:22,730 the CPU is going to jump to that exact memory address 1444 01:01:22,730 --> 01:01:25,350 and start executing code there and crash, 1445 01:01:25,350 --> 01:01:29,160 because it's not a valid address that's in the page table. 1446 01:01:29,160 --> 01:01:32,360 So let's actually see, just to double check, what's going on. 1447 01:01:32,360 --> 01:01:34,569 Let's print our buffer again. 1448 01:01:34,569 --> 01:01:36,860 Our buffer-- well, that's actually kind of interesting, 1449 01:01:36,860 --> 01:01:37,359 right? 1450 01:01:37,359 --> 01:01:38,930 So now, buffer, for some reason it 1451 01:01:38,930 --> 01:01:41,710 only says A repeats 128 times. 1452 01:01:41,710 --> 01:01:45,590 Whereas if you remember before, it said A repeated 180 times 1453 01:01:45,590 --> 01:01:47,690 in our buffer. 1454 01:01:47,690 --> 01:01:49,376 So what happened? 1455 01:01:49,376 --> 01:01:49,876 Yeah? 1456 01:01:49,876 --> 01:01:51,340 AUDIENCE: [INAUDIBLE]. 1457 01:01:51,340 --> 01:01:51,695 PROFESSOR: Yeah, yeah. 1458 01:01:51,695 --> 01:01:52,050 Exactly. 1459 01:01:52,050 --> 01:01:53,633 So there's actually something going on 1460 01:01:53,633 --> 01:01:55,160 after the buffer overflow happens 1461 01:01:55,160 --> 01:01:56,812 that changes what's going on. 1462 01:01:56,812 --> 01:01:58,270 So actually, if you remember, we do 1463 01:01:58,270 --> 01:02:00,650 this A to i conversion of the string to an integer. 1464 01:02:00,650 --> 01:02:03,070 And if you provide all As, it actually 1465 01:02:03,070 --> 01:02:05,850 writes zero to this memory location. 1466 01:02:05,850 --> 01:02:08,840 So a zero, if you remember, terminates strings in C. 1467 01:02:08,840 --> 01:02:12,120 So GDB now thinks, yep, we have a perfectly well-terminated 1468 01:02:12,120 --> 01:02:15,155 128 byte string of all As. 1469 01:02:15,155 --> 01:02:16,780 But you know, it doesn't really matter, 1470 01:02:16,780 --> 01:02:18,530 because we still have those As up top that 1471 01:02:18,530 --> 01:02:21,180 already corrupted our stack. 1472 01:02:21,180 --> 01:02:21,680 OK. 
1473 01:02:21,680 --> 01:02:23,554 That was actually kind of an important lesson, 1474 01:02:23,554 --> 01:02:25,990 that-- it's actually a little bit tricky, sometimes, 1475 01:02:25,990 --> 01:02:28,896 to exploit these buffer overflows because, even 1476 01:02:28,896 --> 01:02:31,270 though you've already changed lots of stuff on the stack, 1477 01:02:31,270 --> 01:02:32,870 you still have to get to the point 1478 01:02:32,870 --> 01:02:34,810 where you use the value that you have somehow 1479 01:02:34,810 --> 01:02:35,700 placed on the stack. 1480 01:02:35,700 --> 01:02:37,140 So there's other code that's going 1481 01:02:37,140 --> 01:02:38,940 to run after you've managed to overflow 1482 01:02:38,940 --> 01:02:40,274 some buffer and corrupt memory. 1483 01:02:40,274 --> 01:02:42,690 You have to make sure that code doesn't do something silly. 1484 01:02:42,690 --> 01:02:45,490 Like, if atoi just exited right away, 1485 01:02:45,490 --> 01:02:48,070 as soon as it saw a non-integer value, 1486 01:02:48,070 --> 01:02:53,350 we might never get to jump to this 41414141 address. 1487 01:02:53,350 --> 01:02:55,419 So you have to massage your input in some cases. 1488 01:02:55,419 --> 01:02:56,710 Maybe not so much in this case. 1489 01:02:56,710 --> 01:02:58,470 But in other situations, you'll have 1490 01:02:58,470 --> 01:03:01,110 to be careful in constructing this input. 1491 01:03:01,110 --> 01:03:04,655 OK, so just to see what happens, we can jump one more time. 1492 01:03:04,655 --> 01:03:06,030 Well, let's look at our registers. 1493 01:03:06,030 --> 01:03:10,400 So right now, our EIP, the sort of instruction pointer, 1494 01:03:10,400 --> 01:03:12,780 is pointing at the last thing in redirect. 1495 01:03:12,780 --> 01:03:14,830 And if we step one more time, hopefully we'll 1496 01:03:14,830 --> 01:03:19,570 jump to, finally, that unfortunate 4141 address. 1497 01:03:19,570 --> 01:03:20,120 Over here. 1498 01:03:20,120 --> 01:03:20,987 And in fact, yep. 1499 01:03:20,987 --> 01:03:22,820 The program now seems to be executing there. 1500 01:03:22,820 --> 01:03:25,990 If we ask GDB to print the current set of registers, 1501 01:03:25,990 --> 01:03:29,420 yep, the current instruction pointer is this strange value. 1502 01:03:29,420 --> 01:03:31,840 And if we execute one more instruction, 1503 01:03:31,840 --> 01:03:34,200 it's going to crash because that's finally 1504 01:03:34,200 --> 01:03:39,700 trying to execute at an instruction pointer that doesn't correspond 1505 01:03:39,700 --> 01:03:42,730 to a valid page in the operating system's page table 1506 01:03:42,730 --> 01:03:44,770 for this process. 1507 01:03:44,770 --> 01:03:46,750 Make sense? 1508 01:03:46,750 --> 01:03:49,260 Any questions? 1509 01:03:49,260 --> 01:03:49,760 All right. 1510 01:03:49,760 --> 01:03:52,910 Well, I've got a question for you guys, actually. 1511 01:03:52,910 --> 01:03:58,540 So what happens-- you know, it seems to be exploitable. 1512 01:03:58,540 --> 01:03:59,644 Or well, OK. 1513 01:03:59,644 --> 01:04:02,060 Maybe let's first figure out why this is particularly bad, 1514 01:04:02,060 --> 01:04:02,560 right? 1515 01:04:02,560 --> 01:04:03,762 So why is it a problem? 1516 01:04:03,762 --> 01:04:05,220 So not only does our program crash, 1517 01:04:05,220 --> 01:04:07,011 but presumably we're going to take it over. 1518 01:04:07,011 --> 01:04:09,010 So I guess, first simple question is, OK, 1519 01:04:09,010 --> 01:04:10,474 so what's the problem?
1520 01:04:10,474 --> 01:04:11,140 What can you do? 1521 01:04:11,140 --> 01:04:11,890 Yeah? 1522 01:04:11,890 --> 01:04:13,180 AUDIENCE: You can do whatever you want. 1523 01:04:13,180 --> 01:04:13,846 PROFESSOR: Yeah. 1524 01:04:13,846 --> 01:04:16,809 So I was actually pretty silly and just put in lots of As. 1525 01:04:16,809 --> 01:04:18,350 But if you were careful about knowing 1526 01:04:18,350 --> 01:04:20,512 where to put what values, you might 1527 01:04:20,512 --> 01:04:21,970 be able to put in a different value 1528 01:04:21,970 --> 01:04:23,387 and get it to jump somewhere else. 1529 01:04:23,387 --> 01:04:25,344 So let's see if we can actually do this, right? 1530 01:04:25,344 --> 01:04:26,710 We can retrace this whole thing. 1531 01:04:26,710 --> 01:04:27,210 OK. 1532 01:04:27,210 --> 01:04:28,930 Re-run the program again. 1533 01:04:28,930 --> 01:04:33,030 And I guess I have to reset the breakpoint. 1534 01:04:33,030 --> 01:04:35,450 So I can break and redirect again. 1535 01:04:35,450 --> 01:04:36,570 And run. 1536 01:04:36,570 --> 01:04:42,000 And this time, I'll, again, next, 1537 01:04:42,000 --> 01:04:43,900 supply lots of As and overflow things. 1538 01:04:43,900 --> 01:04:47,980 But I'm not going to try to carefully construct-- 1539 01:04:47,980 --> 01:04:50,445 you know, figure out which point in these As corresponds 1540 01:04:50,445 --> 01:04:51,710 to the location in the stack. 1541 01:04:51,710 --> 01:04:52,430 That's something you guys are going 1542 01:04:52,430 --> 01:04:53,920 to have to do for lab one. 1543 01:04:53,920 --> 01:04:57,050 But suppose that I overflow the stack here. 1544 01:04:57,050 --> 01:04:58,470 And then I'm going to manually try 1545 01:04:58,470 --> 01:05:01,470 to change things on the stack to get it to jump to some point I 1546 01:05:01,470 --> 01:05:03,310 want to jump to. 1547 01:05:03,310 --> 01:05:08,979 And in this program, OK, so let's again-- nexti. 1548 01:05:08,979 --> 01:05:09,520 Where are we? 1549 01:05:09,520 --> 01:05:12,350 We're at, again, at the very end of redirect. 1550 01:05:12,350 --> 01:05:14,540 And let's actually look at the stack, right? 1551 01:05:14,540 --> 01:05:18,420 So if we examine esp here, we see our corrupted pointer. 1552 01:05:18,420 --> 01:05:18,920 OK. 1553 01:05:18,920 --> 01:05:21,020 Where could we jump to? 1554 01:05:21,020 --> 01:05:22,794 What interesting things could we do? 1555 01:05:22,794 --> 01:05:24,710 Unfortunately, this program is pretty limited. 1556 01:05:24,710 --> 01:05:26,543 There's almost nothing in the program's code 1557 01:05:26,543 --> 01:05:28,880 where you could jump and do anything interesting. 1558 01:05:28,880 --> 01:05:31,350 But maybe we can do a little bit of something interesting. 1559 01:05:31,350 --> 01:05:33,460 Maybe we'll find the printf in main 1560 01:05:33,460 --> 01:05:36,190 and jump directly there, and get it to print the x value, 1561 01:05:36,190 --> 01:05:37,710 or x equals something. 1562 01:05:37,710 --> 01:05:38,590 So we can do this. 1563 01:05:38,590 --> 01:05:41,820 We can actually disassemble the main function. 1564 01:05:41,820 --> 01:05:44,630 And main does a bunch of stuff, you 1565 01:05:44,630 --> 01:05:47,710 know, initializes, calls redirect, does some more stuff, 1566 01:05:47,710 --> 01:05:49,360 and then calls printf. 
1567 01:05:49,360 --> 01:05:51,970 So how about we jump to this point, which is 1568 01:05:51,970 --> 01:05:54,200 where it sets up the argument to printf, 1569 01:05:54,200 --> 01:05:58,204 the "x = %d" format string, and then actually calls printf. 1570 01:05:58,204 --> 01:05:59,620 So we can actually take this value 1571 01:05:59,620 --> 01:06:01,900 and try to stick it in the stack. 1572 01:06:01,900 --> 01:06:05,290 And we should be able to do this with the debugger 1573 01:06:05,290 --> 01:06:06,850 pretty easily, at least. 1574 01:06:06,850 --> 01:06:11,780 So you can do this: set {int}$esp equals this value. 1575 01:06:11,780 --> 01:06:14,040 So we can examine esp again and, indeed, it actually 1576 01:06:14,040 --> 01:06:14,780 has this value. 1577 01:06:14,780 --> 01:06:19,590 So if we continue now, well, it printed out x 1578 01:06:19,590 --> 01:06:21,950 equals some garbage, which I guess 1579 01:06:21,950 --> 01:06:24,380 happens to be just whatever is on the stack that 1580 01:06:24,380 --> 01:06:25,260 was passed to printf. 1581 01:06:25,260 --> 01:06:26,690 We didn't correctly set up all the arguments 1582 01:06:26,690 --> 01:06:29,065 because we jumped in the middle of this calling sequence. 1583 01:06:29,065 --> 01:06:30,810 But yeah, we printed this value. 1584 01:06:30,810 --> 01:06:32,790 And then it crashed. 1585 01:06:32,790 --> 01:06:33,650 Why did it crash? 1586 01:06:33,650 --> 01:06:36,312 Why do you think? 1587 01:06:36,312 --> 01:06:37,520 What actually happens, right? 1588 01:06:37,520 --> 01:06:40,000 So we jump to printf. 1589 01:06:40,000 --> 01:06:42,085 And then, something went wrong. 1590 01:06:42,085 --> 01:06:42,585 Yeah? 1591 01:06:45,772 --> 01:06:47,230 Well, we changed the return address 1592 01:06:47,230 --> 01:06:48,930 so that when we return from redirect, 1593 01:06:48,930 --> 01:06:52,420 we now jump to this new address, which is that point up there, 1594 01:06:52,420 --> 01:06:53,825 right at the printf setup. 1595 01:06:53,825 --> 01:06:58,160 So where's this crash coming from? 1596 01:06:58,160 --> 01:06:59,317 Yeah? 1597 01:06:59,317 --> 01:07:01,025 AUDIENCE: Is it restricted because your i 1598 01:07:01,025 --> 01:07:02,981 is supposed to be some sort of integer, but-- 1599 01:07:02,981 --> 01:07:04,420 PROFESSOR: No, actually, well the i is like, 1600 01:07:04,420 --> 01:07:05,580 well it's a 32-bit register. 1601 01:07:05,580 --> 01:07:06,790 So whatever's in the register, it'll print. 1602 01:07:06,790 --> 01:07:08,831 In fact, that's the thing that's in the register. 1603 01:07:08,831 --> 01:07:10,370 So that's OK. 1604 01:07:10,370 --> 01:07:11,024 Yeah? 1605 01:07:11,024 --> 01:07:12,482 AUDIENCE: [INAUDIBLE] main returns. 1606 01:07:12,482 --> 01:07:12,935 PROFESSOR: Yes. 1607 01:07:12,935 --> 01:07:13,560 Actually, yeah. 1608 01:07:13,560 --> 01:07:15,525 What's going on is, you have to sort of-- OK, 1609 01:07:15,525 --> 01:07:17,710 so this is the point where we jumped. 1610 01:07:17,710 --> 01:07:18,960 It's set up some arguments. 1611 01:07:18,960 --> 01:07:20,115 It actually calls printf. 1612 01:07:20,115 --> 01:07:22,140 printf seems to work. printf is going to return. 1613 01:07:22,140 --> 01:07:24,584 Now actually, that's fine, because this call instruction 1614 01:07:24,584 --> 01:07:26,750 put a return address on the stack for printf to use. 1615 01:07:26,750 --> 01:07:27,810 That's fine. 1616 01:07:27,810 --> 01:07:29,639 Then main is going to continue running.
1617 01:07:29,639 --> 01:07:31,930 It's going to run the leave instruction, which doesn't 1618 01:07:31,930 --> 01:07:32,929 do anything interesting. 1619 01:07:32,929 --> 01:07:34,454 And then it does another return. 1620 01:07:34,454 --> 01:07:36,120 But the thing up there on the stack 1621 01:07:36,120 --> 01:07:38,120 isn't actually a valid return address. 1622 01:07:38,120 --> 01:07:40,250 So presumably, we return to 1623 01:07:40,250 --> 01:07:42,800 who knows what memory location that's up on the stack 1624 01:07:42,800 --> 01:07:44,750 and jump somewhere else. 1625 01:07:44,750 --> 01:07:48,010 So unfortunately, here, our pseudo-attack 1626 01:07:48,010 --> 01:07:48,890 didn't really work. 1627 01:07:48,890 --> 01:07:49,840 It ran some code. 1628 01:07:49,840 --> 01:07:51,140 But then it crashed. 1629 01:07:51,140 --> 01:07:52,310 That's probably not something you want to do. 1630 01:07:52,310 --> 01:07:53,936 So if you really wanted to be careful, 1631 01:07:53,936 --> 01:07:56,310 you would carefully plant not just this return address up 1632 01:07:56,310 --> 01:07:58,180 on the stack, but maybe you'd figure out, 1633 01:07:58,180 --> 01:08:02,270 where is this second ret going to get its return address from, 1634 01:08:02,270 --> 01:08:03,770 and try to carefully place something 1635 01:08:03,770 --> 01:08:06,100 else on the stack there that will ensure 1636 01:08:06,100 --> 01:08:08,950 that your program cleanly exits after it gets exploited 1637 01:08:08,950 --> 01:08:10,892 so that no one notices. 1638 01:08:10,892 --> 01:08:12,350 So this is all stuff you'll sort of 1639 01:08:12,350 --> 01:08:15,680 try to do in lab one in a little bit more detail. 1640 01:08:15,680 --> 01:08:20,189 But I guess one thing we can try to think about now 1641 01:08:20,189 --> 01:08:24,198 is, we sort of understand why it's bad to jump to the-- 1642 01:08:24,198 --> 01:08:25,614 or to have these buffer overflows. 1643 01:08:31,630 --> 01:08:33,790 One sort of way to think of this 1644 01:08:33,790 --> 01:08:35,939 is that the problem is just that the return address is 1645 01:08:35,939 --> 01:08:36,605 up there, right? 1646 01:08:36,605 --> 01:08:38,939 So the buffer keeps growing and eventually runs 1647 01:08:38,939 --> 01:08:41,149 over the return address. 1648 01:08:41,149 --> 01:08:43,279 What if we flip the stack around? 1649 01:08:43,279 --> 01:08:47,319 You know, some machines actually have stacks that grow up. 1650 01:08:47,319 --> 01:08:51,529 So an alternative design we could sort of imagine 1651 01:08:51,529 --> 01:08:55,370 is one where the stack starts at the bottom 1652 01:08:55,370 --> 01:08:58,550 and keeps going up instead of going down. 1653 01:08:58,550 --> 01:09:01,336 So then, if you overflow this buffer, 1654 01:09:01,336 --> 01:09:02,960 you'll just keep going up on the stack, 1655 01:09:02,960 --> 01:09:06,576 and maybe there's nothing bad that will happen. 1656 01:09:06,576 --> 01:09:07,995 Yeah? 1657 01:09:07,995 --> 01:09:10,846 AUDIENCE: [INAUDIBLE]. 1658 01:09:10,846 --> 01:09:11,970 PROFESSOR: So you're right. 1659 01:09:11,970 --> 01:09:14,667 It might be that, if you have-- well, 1660 01:09:14,667 --> 01:09:16,250 so let me draw this new stack diagram. 1661 01:09:16,250 --> 01:09:20,090 And we'll sort of try to figure out what it applies to and what it doesn't. 1662 01:09:20,090 --> 01:09:20,619 But OK. 1663 01:09:20,619 --> 01:09:22,410 So we'll basically just invert the picture.
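To make the inverted picture concrete, here is roughly the board drawing as it comes out in the discussion that follows, sketched as a comment. The frame contents (saved EBP, the i variable, a 128-byte buf) are carried over from the earlier stack diagram and are assumptions about the exact program:

    /* Hypothetical "stack grows up" layout for redirect()
     * (lower addresses at the top of this list, higher addresses further down):
     *
     *   return address for redirect()   <- pushed first by the caller
     *   saved EBP
     *   int i
     *   buf[0]
     *   ...
     *   buf[127]
     *   frame of the next function redirect calls (e.g. gets()),
     *   including that function's own return address
     *
     * Array writes still go from buf[0] toward higher addresses, so an
     * overflow keeps going upward, past buf[127], into whatever was
     * pushed after redirect's locals. */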
1664 01:09:22,410 --> 01:09:25,410 So when you call redirect on this alternative architecture, 1665 01:09:25,410 --> 01:09:27,300 what's going to happen is the return address 1666 01:09:27,300 --> 01:09:31,040 is going to go here on the stack. 1667 01:09:31,040 --> 01:09:34,076 Then we'll have the saved EBP. 1668 01:09:36,930 --> 01:09:38,620 Then we'll have our i variable. 1669 01:09:38,620 --> 01:09:39,670 And then we'll have buff. 1670 01:09:39,670 --> 01:09:44,660 So we'll have buff of zero up to buff of 127, and so on, right? 1671 01:09:44,660 --> 01:09:48,229 So then when we do the overflow, it overflows up there and maybe 1672 01:09:48,229 --> 01:09:49,310 doesn't hit anything bad. 1673 01:09:49,310 --> 01:09:50,768 I guess what you're saying is that, 1674 01:09:50,768 --> 01:09:52,595 well, maybe we had a buffer down there. 1675 01:09:52,595 --> 01:09:54,470 And if we had a buffer down there, then yeah, 1676 01:09:54,470 --> 01:09:55,761 that seems kind of unfortunate. 1677 01:09:55,761 --> 01:09:58,930 It could overrun this return address. 1678 01:09:58,930 --> 01:09:59,690 So you're right. 1679 01:09:59,690 --> 01:10:01,420 So you could still run into problems 1680 01:10:01,420 --> 01:10:03,350 with this stack growing up. 1681 01:10:03,350 --> 01:10:04,785 But what about this exact program? 1682 01:10:08,420 --> 01:10:11,119 Is this particular program safe on machines 1683 01:10:11,119 --> 01:10:12,160 where the stack grows up? 1684 01:10:12,160 --> 01:10:15,670 So just to recap, the program we read is this guy. 1685 01:10:18,391 --> 01:10:18,890 Yeah? 1686 01:10:18,890 --> 01:10:19,810 AUDIENCE: Still going to overwrite 1687 01:10:19,810 --> 01:10:21,190 [INAUDIBLE] as a return value. 1688 01:10:21,190 --> 01:10:21,360 PROFESSOR: Yeah. 1689 01:10:21,360 --> 01:10:22,735 So that's actually clever, right? 1690 01:10:22,735 --> 01:10:29,040 So this is the stack frame for redirect. 1691 01:10:29,040 --> 01:10:31,250 I guess it actually spans all the way up here. 1692 01:10:31,250 --> 01:10:34,790 But what actually happens when you call gets() is that 1693 01:10:34,790 --> 01:10:36,610 redirect makes a function call. 1694 01:10:36,610 --> 01:10:40,380 It actually saves its return address up here on the stack. 1695 01:10:40,380 --> 01:10:42,740 And then gets() starts running. 1696 01:10:42,740 --> 01:10:45,490 And gets() puts its own saved EBP up here. 1697 01:10:45,490 --> 01:10:50,240 And gets() is going to put its own variables higher up. 1698 01:10:50,240 --> 01:10:54,142 And then gets() is going to fill in the buffer. 1699 01:10:54,142 --> 01:10:55,350 So this is still problematic. 1700 01:10:55,350 --> 01:10:57,599 Basically, the buffer is surrounded by return addresses 1701 01:10:57,599 --> 01:10:59,430 on all sides. 1702 01:10:59,430 --> 01:11:02,190 Either way, you're going to be able to overflow something. 1703 01:11:02,190 --> 01:11:06,300 So at what point-- suppose we had a stack-grows-up machine. 1704 01:11:06,300 --> 01:11:08,770 At what point would you be able to take 1705 01:11:08,770 --> 01:11:10,614 control of the program's execution then? 1706 01:11:14,100 --> 01:11:16,424 Yes, and that is actually even easier in some ways. 1707 01:11:16,424 --> 01:11:18,340 You don't have to wait until redirect returns. 1708 01:11:18,340 --> 01:11:20,200 And maybe there was, like, stuff that was going to mess you up, 1709 01:11:20,200 --> 01:11:21,210 like this atoi. 1710 01:11:21,210 --> 01:11:21,710 No.
1711 01:11:21,710 --> 01:11:24,281 It's actually easier, because gets() is going to overflow 1712 01:11:24,281 --> 01:11:24,780 the buffer. 1713 01:11:24,780 --> 01:11:26,480 It's going to change the return address 1714 01:11:26,480 --> 01:11:28,271 and then immediately return and immediately 1715 01:11:28,271 --> 01:11:32,440 jump to wherever you tried to point it. 1716 01:11:32,440 --> 01:11:34,780 Makes sense? 1717 01:11:34,780 --> 01:11:38,175 So what happens if we have a program like this 1718 01:11:38,175 --> 01:11:39,050 that's pretty boring? 1719 01:11:39,050 --> 01:11:41,091 There's like no real interesting code to jump to. 1720 01:11:41,091 --> 01:11:45,200 All you can do is get it to print different x values here. 1721 01:11:45,200 --> 01:11:47,504 What if you want to do something interesting that you 1722 01:11:47,504 --> 01:11:48,645 didn't-- yeah? 1723 01:11:48,645 --> 01:11:52,085 AUDIENCE: I mean, if you have an executable stack, 1724 01:11:52,085 --> 01:11:54,400 you could put in arbitrary code that, 1725 01:11:54,400 --> 01:11:56,027 for example, executes a shell? 1726 01:11:56,027 --> 01:11:57,110 PROFESSOR: Yeah, yeah, yeah. 1727 01:11:57,110 --> 01:11:59,810 So that's kind of clever, right, because you actually 1728 01:11:59,810 --> 01:12:01,370 can supply other inputs, right? 1729 01:12:01,370 --> 01:12:04,520 So at least, well-- there's some defenses against this. 1730 01:12:04,520 --> 01:12:06,700 And we'll go over these in subsequent lectures. 1731 01:12:06,700 --> 01:12:10,270 But in principle, you could have the return address here 1732 01:12:10,270 --> 01:12:13,380 that you overwrite on either the stack-up or stack-down machine. 1733 01:12:13,380 --> 01:12:16,360 And instead of pointing it to some existing code, 1734 01:12:16,360 --> 01:12:18,340 like the printf inside of main, we 1735 01:12:18,340 --> 01:12:22,486 can actually have the return address point into the buffer. 1736 01:12:22,486 --> 01:12:24,610 So it's previously just some location on the stack. 1737 01:12:24,610 --> 01:12:27,080 But you could jump there and treat it as executable. 1738 01:12:27,080 --> 01:12:29,270 So as part of your request, you'll actually 1739 01:12:29,270 --> 01:12:32,160 send some bytes of data to the server, 1740 01:12:32,160 --> 01:12:35,690 and then have the return address or the thing you overwrite here 1741 01:12:35,690 --> 01:12:37,720 point to the base of the buffer, and you'll just 1742 01:12:37,720 --> 01:12:39,240 keep going from there. 1743 01:12:39,240 --> 01:12:41,429 So then you'll be able to sort of provide 1744 01:12:41,429 --> 01:12:42,970 the code you want to run, jump to it, 1745 01:12:42,970 --> 01:12:44,390 and get the server to run it. 1746 01:12:44,390 --> 01:12:46,710 And in fact, traditionally, in Unix systems, 1747 01:12:46,710 --> 01:12:49,230 what adversaries would often do is just ask the operating 1748 01:12:49,230 --> 01:12:51,280 system to execute /bin/sh, which 1749 01:12:51,280 --> 01:12:53,390 lets you sort of type in arbitrary shell commands 1750 01:12:53,390 --> 01:12:54,240 after that. 1751 01:12:54,240 --> 01:12:56,360 So as a result, this thing, this piece 1752 01:12:56,360 --> 01:12:57,910 of code you inject into this buffer, 1753 01:12:57,910 --> 01:13:01,260 was often called, sort of for historical reasons, shell code. 1754 01:13:01,260 --> 01:13:06,420 And you'll try to construct some in lab one as well. 1755 01:13:06,420 --> 01:13:07,380 All right.
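As a very rough illustration of what constructing such a payload looks like, here is a sketch in C. None of the numbers are real: the offset from the buffer to the saved return address, the buffer's address in the victim process, and the actual shellcode bytes all have to be worked out for the specific binary (for lab one, mostly by poking around in GDB):

    #include <stdio.h>
    #include <string.h>

    /* Placeholders -- assumed values for illustration only. */
    #define OFFSET_TO_RET  140          /* assumed distance from buf[0] to the saved return address */
    #define BUF_ADDR       0xbffff200u  /* assumed address of buf in the victim process */

    int main(void)
    {
        /* Stand-in bytes for real machine code that would exec /bin/sh. */
        unsigned char shellcode[] = { 0x90, 0x90, 0x90 };
        unsigned char payload[OFFSET_TO_RET + 4];
        unsigned int ret = BUF_ADDR;

        memset(payload, 0x90, sizeof(payload));              /* filler padding */
        memcpy(payload, shellcode, sizeof(shellcode));       /* injected instructions land at the start of buf */
        memcpy(payload + OFFSET_TO_RET, &ret, sizeof(ret));  /* overwritten return address points back into buf */

        /* Emit the bytes; feeding them to the vulnerable program's input
         * plays the role of the crafted request described above. */
        fwrite(payload, 1, sizeof(payload), stdout);
        return 0;
    }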
1756 01:13:07,380 --> 01:13:09,627 Make sense, what you can do here? 1757 01:13:09,627 --> 01:13:10,210 Any questions? 1758 01:13:10,210 --> 01:13:11,004 Yeah? 1759 01:13:11,004 --> 01:13:13,492 AUDIENCE: Is there a separation between code and data? 1760 01:13:13,492 --> 01:13:14,200 PROFESSOR: Right. 1761 01:13:14,200 --> 01:13:17,270 So is there a separation between code and data here? 1762 01:13:17,270 --> 01:13:20,882 At least, well, historically, many machines 1763 01:13:20,882 --> 01:13:22,840 didn't enforce any separation of code and data. 1764 01:13:22,840 --> 01:13:24,740 You'd just have a flat memory address space. 1765 01:13:24,740 --> 01:13:26,290 The stack pointer points somewhere. 1766 01:13:26,290 --> 01:13:28,322 The code pointer points somewhere else. 1767 01:13:28,322 --> 01:13:30,780 And you just execute wherever the code pointer, instruction 1768 01:13:30,780 --> 01:13:32,510 pointer is pointing. 1769 01:13:32,510 --> 01:13:35,000 Modern machines try to provide some defenses 1770 01:13:35,000 --> 01:13:36,660 for these kinds of attacks. 1771 01:13:36,660 --> 01:13:39,140 And what modern machines often do is, 1772 01:13:39,140 --> 01:13:40,810 they actually associate permissions 1773 01:13:40,810 --> 01:13:42,320 with various memory regions. 1774 01:13:42,320 --> 01:13:44,280 And one of the permissions is execute. 1775 01:13:44,280 --> 01:13:47,730 So the part of your 32-bit or 64-bit address 1776 01:13:47,730 --> 01:13:51,180 space that contains code has the execute permission. 1777 01:13:51,180 --> 01:13:53,450 So if your instruction pointer points there, 1778 01:13:53,450 --> 01:13:55,740 the CPU will actually run those things. 1779 01:13:55,740 --> 01:13:59,057 And the stack and other data portions of your address space 1780 01:13:59,057 --> 01:14:00,890 typically don't have the execute permission. 1781 01:14:00,890 --> 01:14:03,610 So if you happen to somehow set your instruction pointer 1782 01:14:03,610 --> 01:14:07,570 to some non-code memory location, you can set it, 1783 01:14:07,570 --> 01:14:10,060 but the CPU will refuse to execute it. 1784 01:14:10,060 --> 01:14:13,930 So this is a reasonably nice way to defend 1785 01:14:13,930 --> 01:14:15,250 against these kinds of attacks. 1786 01:14:15,250 --> 01:14:18,680 But it doesn't prevent quite everything. 1787 01:14:18,680 --> 01:14:19,990 So just a question. 1788 01:14:19,990 --> 01:14:20,490 OK. 1789 01:14:20,490 --> 01:14:22,410 So how would you bypass this if you 1790 01:14:22,410 --> 01:14:24,985 had this non-executable stack? 1791 01:14:24,985 --> 01:14:26,860 You actually saw this example earlier, right, 1792 01:14:26,860 --> 01:14:30,050 when I actually jumped to the middle of main. 1793 01:14:30,050 --> 01:14:34,180 So that was a way of sort of exploiting this buffer 1794 01:14:34,180 --> 01:14:36,820 overflow without having to inject new code of my own. 1795 01:14:36,820 --> 01:14:39,514 So even if the stack was non-executable, 1796 01:14:39,514 --> 01:14:41,680 I would still be able to jump in the middle of main. 1797 01:14:41,680 --> 01:14:43,554 In this particular case, it's kind of boring. 1798 01:14:43,554 --> 01:14:45,020 It just prints x and crashes. 1799 01:14:45,020 --> 01:14:46,550 But in other situations, you might 1800 01:14:46,550 --> 01:14:48,090 have other pieces of code in your program 1801 01:14:48,090 --> 01:14:50,089 that are doing interesting stuff that you really 1802 01:14:50,089 --> 01:14:51,320 do want to execute. 
1803 01:14:51,320 --> 01:14:54,520 And that's sort of called a return-to-libc attack for, again, 1804 01:14:54,520 --> 01:14:56,170 somewhat historical reasons. 1805 01:14:56,170 --> 01:14:59,610 But it is a way to bypass the security measures. 1806 01:14:59,610 --> 01:15:02,860 So in the context of buffer overflows, 1807 01:15:02,860 --> 01:15:06,699 there's not really a clear-cut solution 1808 01:15:06,699 --> 01:15:08,990 that provides perfect protection against these mistakes 1809 01:15:08,990 --> 01:15:10,600 because, at the end of the day, the programmer did 1810 01:15:10,600 --> 01:15:12,310 make some mistake in writing this source code. 1811 01:15:12,310 --> 01:15:14,540 And the best way to fix it is probably just to change 1812 01:15:14,540 --> 01:15:17,100 the source code and make sure you don't call gets() at 1813 01:15:17,100 --> 01:15:18,570 all, like the compiler warned you. 1814 01:15:18,570 --> 01:15:20,000 And there are more subtle things that the compiler 1815 01:15:20,000 --> 01:15:20,958 doesn't warn you about. 1816 01:15:20,958 --> 01:15:23,490 And you still have to avoid making those calls. 1817 01:15:23,490 --> 01:15:26,570 But because it's hard, in practice, 1818 01:15:26,570 --> 01:15:28,470 to change all the software out there, 1819 01:15:28,470 --> 01:15:30,080 many people try to devise techniques 1820 01:15:30,080 --> 01:15:33,100 that make it more difficult to exploit these bugs. 1821 01:15:33,100 --> 01:15:35,590 For example, making the stack non-executable, 1822 01:15:35,590 --> 01:15:39,135 so you can't inject the shell code onto the stack, 1823 01:15:39,135 --> 01:15:41,930 and you have to do something slightly more elaborate. 1824 01:15:41,930 --> 01:15:45,690 And in the next couple of lectures, the next two lectures, 1825 01:15:45,690 --> 01:15:47,930 actually, we'll look at these defense techniques. 1826 01:15:47,930 --> 01:15:48,980 They're not all perfect. 1827 01:15:48,980 --> 01:15:50,480 But they do, in practice, make it 1828 01:15:50,480 --> 01:15:52,780 much more difficult for an attacker to exploit things. 1829 01:15:52,780 --> 01:15:53,065 Question? 1830 01:15:53,065 --> 01:15:54,802 AUDIENCE: I just have a general administrative question. 1831 01:15:54,802 --> 01:15:55,286 PROFESSOR: Yeah? 1832 01:15:55,286 --> 01:15:56,738 AUDIENCE: I was wondering if there was a final? 1833 01:15:56,738 --> 01:15:58,696 And also if there are quizzes, and what dates-- 1834 01:15:58,696 --> 01:16:00,200 PROFESSOR: Oh yeah. 1835 01:16:00,200 --> 01:16:02,630 Yeah, I think if you go to the schedule page, 1836 01:16:02,630 --> 01:16:03,960 there are two quizzes. 1837 01:16:03,960 --> 01:16:05,780 And there's no final during the final week, 1838 01:16:05,780 --> 01:16:08,034 but there's a quiz right before it. 1839 01:16:08,034 --> 01:16:09,450 So you're free for the final week, 1840 01:16:09,450 --> 01:16:12,100 but there's still something at the end of the class. 1841 01:16:12,100 --> 01:16:13,190 Yeah. 1842 01:16:13,190 --> 01:16:14,470 All right. 1843 01:16:14,470 --> 01:16:14,970 OK. 1844 01:16:14,970 --> 01:16:17,310 So I think that's probably it for buffer overflows. 1845 01:16:17,310 --> 01:16:19,600 I guess the one question is, so what 1846 01:16:19,600 --> 01:16:21,380 do you do about mechanism problems? 1847 01:16:21,380 --> 01:16:26,030 And the general answer is to probably have fewer mechanisms.
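Before leaving the overflow itself, the direct source-level fix mentioned a moment ago (stop calling gets()) might look like this, reusing the hypothetical redirect() sketched earlier:

    #include <stdio.h>
    #include <stdlib.h>

    int redirect(void)
    {
        char buf[128];
        /* fgets() is told the buffer size and never writes past it,
         * unlike gets(), which keeps reading until end of line. */
        if (fgets(buf, sizeof(buf), stdin) == NULL)
            return 0;
        return atoi(buf);
    }

Simple enough here, but as the lecture notes, getting every such call fixed across all the software out there is the hard part, which is why the mitigation techniques matter.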
1848 01:16:26,030 --> 01:16:27,530 So as we saw here, if you're relying 1849 01:16:27,530 --> 01:16:29,850 on every piece of software to enforce your security policy, 1850 01:16:29,850 --> 01:16:31,349 you'll inevitably have mistakes that 1851 01:16:31,349 --> 01:16:34,110 allow an adversary to bypass your mechanism by exploiting 1852 01:16:34,110 --> 01:16:35,829 some bug in the web server. 1853 01:16:35,829 --> 01:16:37,370 And a much better design, and one that 1854 01:16:37,370 --> 01:16:39,300 you will explore in lab two, is one 1855 01:16:39,300 --> 01:16:41,030 where you structure your whole system 1856 01:16:41,030 --> 01:16:42,621 so the security of the system doesn't 1857 01:16:42,621 --> 01:16:44,120 depend on all the pieces of software 1858 01:16:44,120 --> 01:16:45,790 enforcing your security policy. 1859 01:16:45,790 --> 01:16:47,206 The security policy is going to be 1860 01:16:47,206 --> 01:16:48,970 enforced by a small number of components. 1861 01:16:48,970 --> 01:16:49,900 And the rest of the stuff actually 1862 01:16:49,900 --> 01:16:51,483 doesn't matter, for security purposes, 1863 01:16:51,483 --> 01:16:52,740 if it's right or wrong. 1864 01:16:52,740 --> 01:16:55,060 It's not going to violate your security policy at all. 1865 01:16:55,060 --> 01:16:58,540 So this idea, minimizing your trusted computing base, 1866 01:16:58,540 --> 01:17:01,820 is a pretty powerful technique to get around these mechanism 1867 01:17:01,820 --> 01:17:04,410 bugs and problems that we've looked at today, at least 1868 01:17:04,410 --> 01:17:05,775 in a little bit of detail. 1869 01:17:05,775 --> 01:17:06,370 All right. 1870 01:17:06,370 --> 01:17:07,620 So read the paper for Monday. 1871 01:17:07,620 --> 01:17:08,828 And come to Monday's lecture. 1872 01:17:08,828 --> 01:17:10,770 And submit the questions on the website. 1873 01:17:10,770 --> 01:17:12,920 See you guys then.