1 00:00:00,070 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,820 Commons license. 3 00:00:03,820 --> 00:00:06,060 Your support will help MIT OpenCourseWare 4 00:00:06,060 --> 00:00:10,150 continue to offer high-quality educational resources for free. 5 00:00:10,150 --> 00:00:12,700 To make a donation or to view additional materials 6 00:00:12,700 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,255 at ocw.mit.edu. 8 00:00:27,415 --> 00:00:28,290 PROFESSOR: All right. 9 00:00:28,290 --> 00:00:33,220 So let's get started with the second lecture 10 00:00:33,220 --> 00:00:36,790 in our stunning series on web security. 11 00:00:36,790 --> 00:00:38,740 So to start off the class today, I actually 12 00:00:38,740 --> 00:00:41,220 want to go over some quick demos. 13 00:00:41,220 --> 00:00:43,937 So as you know, demos almost never work. 14 00:00:43,937 --> 00:00:46,270 So hopefully, you won't just be seeing my empty terminal 15 00:00:46,270 --> 00:00:46,770 up here. 16 00:00:46,770 --> 00:00:48,360 But the basic idea is that I first 17 00:00:48,360 --> 00:00:51,815 wanted to show you an example of the Shellshock bug 18 00:00:51,815 --> 00:00:52,940 that you may have heard of. 19 00:00:52,940 --> 00:00:55,940 This has been a pretty sort of popular topic 20 00:00:55,940 --> 00:00:57,320 in the security literature. 21 00:00:57,320 --> 00:00:59,200 And people were saying that Heartbleed 22 00:00:59,200 --> 00:01:01,265 was like a 10 out of 10 security [? bug. ?] 23 00:01:01,265 --> 00:01:03,640 But people were saying, like, we should not have reserved 24 00:01:03,640 --> 00:01:04,972 10 out of 10 for Heartbleed. 25 00:01:04,972 --> 00:01:06,055 This is potentially worse. 26 00:01:06,055 --> 00:01:06,620 All right? 
27 00:01:06,620 --> 00:01:08,280 And so I thought that this would be a great idea for you guys 28 00:01:08,280 --> 00:01:10,790 to see some living history, and for you to tell your parents 29 00:01:10,790 --> 00:01:13,165 that, you know, they're getting their tuition's worth out 30 00:01:13,165 --> 00:01:13,710 of MIT. 31 00:01:13,710 --> 00:01:18,350 So what is the basic idea behind the Shellshock bug? 32 00:01:18,350 --> 00:01:19,890 Well, it's a really great example 33 00:01:19,890 --> 00:01:22,760 of why it's so difficult to build secure web 34 00:01:22,760 --> 00:01:25,520 applications that span multiple technology 35 00:01:25,520 --> 00:01:28,060 stacks, multiple languages, multiple OS's, so on and so 36 00:01:28,060 --> 00:01:28,740 forth. 37 00:01:28,740 --> 00:01:31,100 So the basic idea is that Shellshock 38 00:01:31,100 --> 00:01:33,620 is going to take advantage of the fact 39 00:01:33,620 --> 00:01:37,610 that the attacker can craft a special HTTP 40 00:01:37,610 --> 00:01:40,860 request to a server and control the headers that 41 00:01:40,860 --> 00:01:42,230 are in that request. 42 00:01:42,230 --> 00:01:45,502 And so I've written an example up here. 43 00:01:45,502 --> 00:01:46,210 It's very simple. 44 00:01:46,210 --> 00:01:49,940 So let's say that the attacker wants to send some GET query. 45 00:01:49,940 --> 00:01:53,890 They're going to send that query to some CGI interface. 46 00:01:53,890 --> 00:01:55,890 And then there's going to be some question mark. 47 00:01:55,890 --> 00:01:57,389 The person wants to search for cats, 48 00:01:57,389 --> 00:01:59,165 because that's all that people search for. 49 00:01:59,165 --> 00:02:01,320 And then there's some standard headers here, 50 00:02:01,320 --> 00:02:02,730 like host, for example. 51 00:02:02,730 --> 00:02:06,610 So this is saying that this URL here 52 00:02:06,610 --> 00:02:08,750 is hanging off of example.com. 
53 00:02:08,750 --> 00:02:12,360 Now, note that the attacker can also specify custom headers. 54 00:02:12,360 --> 00:02:12,860 Right? 55 00:02:12,860 --> 00:02:14,485 So the attacker can just say, I want 56 00:02:14,485 --> 00:02:16,300 to define some application-specific header 57 00:02:16,300 --> 00:02:19,122 called Custom-header, and I want to specify some value there, 58 00:02:19,122 --> 00:02:21,580 because you can imagine that a web application might define 59 00:02:21,580 --> 00:02:22,996 certain functionalities that can't 60 00:02:22,996 --> 00:02:25,811 be expressed using the simple, pre-defined HTTP headers. 61 00:02:25,811 --> 00:02:26,310 OK. 62 00:02:26,310 --> 00:02:28,280 So that all seems fairly innocuous. 63 00:02:28,280 --> 00:02:32,000 But what ends up happening is that in a lot of these CGI web 64 00:02:32,000 --> 00:02:35,640 servers, they will actually take these custom header values 65 00:02:35,640 --> 00:02:39,430 and use them to set environment variables for Bash. 66 00:02:39,430 --> 00:02:40,070 OK? 67 00:02:40,070 --> 00:02:43,480 So they will use this header to create a Bash variable named 68 00:02:43,480 --> 00:02:44,320 Custom-header. 69 00:02:44,320 --> 00:02:45,970 Then they will take this value here 70 00:02:45,970 --> 00:02:48,440 that the attacker has supplied, and use 71 00:02:48,440 --> 00:02:51,360 that to be the value of that Bash variable, right? 72 00:02:51,360 --> 00:02:53,290 And then once that variable is set up, 73 00:02:53,290 --> 00:02:57,110 then the CGI server will do some processing 74 00:02:57,110 --> 00:02:58,940 in the context of that environment. 75 00:02:58,940 --> 00:02:59,440 Right? 76 00:02:59,440 --> 00:03:00,550 So this is clearly bad. 77 00:03:00,550 --> 00:03:02,440 You can probably see where this is going. 78 00:03:02,440 --> 00:03:05,020 Web servers should not be taking these arbitrary values 79 00:03:05,020 --> 00:03:07,000 from arbitrary unwashed masses.
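[Editor's sketch] To make the header-to-environment step concrete: under the CGI convention (RFC 3875), a request header is usually exposed to the script as an environment variable named HTTP_ plus the upper-cased header name, with hyphens turned into underscores. The helper name headers_to_cgi_env and the sample header value below are made up for illustration; real servers differ in the details.

```python
def headers_to_cgi_env(headers):
    """Map HTTP request headers to CGI-style environment variable names."""
    env = {}
    for name, value in headers.items():
        # "Custom-header" becomes "HTTP_CUSTOM_HEADER"
        env_name = "HTTP_" + name.upper().replace("-", "_")
        # The value is copied verbatim: whatever the client sent ends up here.
        env[env_name] = value
    return env


# Hypothetical request headers, loosely following the lecture's example.
request_headers = {"Host": "example.com", "Custom-header": "Custom-value"}
cgi_env = headers_to_cgi_env(request_headers)
print(cgi_env)
```

The point of the sketch is that the value travels untouched from the network into the process environment, which is what the attack relies on.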
80 00:03:07,000 --> 00:03:09,780 So in the particular example of the Shellshock bug, 81 00:03:09,780 --> 00:03:13,470 what ended up happening is that if you set your Bash variable 82 00:03:13,470 --> 00:03:17,280 to this, this kind of malformed, evil-looking thing, then 83 00:03:17,280 --> 00:03:19,070 there's going to be insanity that happens. 84 00:03:19,070 --> 00:03:21,730 Basically, this is a malformed [? Select ?] function 85 00:03:21,730 --> 00:03:24,597 definition in the Bash scripting language. 86 00:03:24,597 --> 00:03:26,680 You don't have to worry about the specifics of it. 87 00:03:26,680 --> 00:03:30,020 But what was intended to happen, if Bash were correct, 88 00:03:30,020 --> 00:03:32,650 is that this part over here wouldn't be executed. 89 00:03:32,650 --> 00:03:35,335 So basically, you just defined some stupid function here 90 00:03:35,335 --> 00:03:36,376 that doesn't do anything. 91 00:03:36,376 --> 00:03:38,330 And in the [INAUDIBLE] terminate here. 92 00:03:38,330 --> 00:03:40,370 But this sequence of characters actually 93 00:03:40,370 --> 00:03:42,006 confuses the Bash parser. 94 00:03:42,006 --> 00:03:44,380 And so what ends up happening is that it sort of stumbles 95 00:03:44,380 --> 00:03:46,180 through this nonsense here. 96 00:03:46,180 --> 00:03:48,430 And then it says, oh, I might as well keep on parsing 97 00:03:48,430 --> 00:03:50,680 and execute some commands here, right? 98 00:03:50,680 --> 00:03:53,262 And so in this case, this just does the bin/id command, 99 00:03:53,262 --> 00:03:55,220 which displays some information about the user. 100 00:03:55,220 --> 00:03:57,080 But this could be any code right here. 101 00:03:57,080 --> 00:03:59,267 So that's the heart of the vulnerability. 102 00:03:59,267 --> 00:04:01,100 So I'll give you a very simple example here, 103 00:04:01,100 --> 00:04:02,650 so you see up on the screen. 
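[Editor's sketch] The string on the board is not captured in the transcript. The widely published form of a Shellshock value is a function definition followed by a trailing command, for example "() { :;}; /bin/id", and that form is assumed here. The snippet below only splits such a string into its two parts and applies a rough prefix check; looks_like_shellshock is a hypothetical helper, not a real defense.

```python
def looks_like_shellshock(value):
    """Rough heuristic: vulnerable Bash versions treated environment values
    that begin with '() {' as exported function definitions."""
    return value.startswith("() {")


# Assumed example payload: an empty function body, then an extra command.
payload = "() { :;}; /bin/id"

# Everything up to the closing "};" is the (useless) function definition;
# everything after it is what a vulnerable Bash went on to execute.
definition, _, trailing = payload.partition("};")
trailing_command = trailing.strip()

print("function part:", definition + "}")
print("trailing command:", trailing_command)
print("suspicious:", looks_like_shellshock(payload))
```

A patched Bash stops at the end of the function definition, so the trailing command never runs.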
104 00:04:02,650 --> 00:04:07,400 So basically, we've got a very simple Python server here, 105 00:04:07,400 --> 00:04:09,400 just the dumbest one you could possibly imagine. 106 00:04:09,400 --> 00:04:11,380 It's got this do_GET method. 107 00:04:11,380 --> 00:04:13,200 And so with the do_GET method, it 108 00:04:13,200 --> 00:04:18,709 is going to basically iterate through all of the HTTP headers 109 00:04:18,709 --> 00:04:19,450 in the request. 110 00:04:19,450 --> 00:04:19,950 OK? 111 00:04:19,950 --> 00:04:23,270 So that's what this "for key, value" loop does, for the header 112 00:04:23,270 --> 00:04:25,787 and the value in this request. 113 00:04:25,787 --> 00:04:28,120 And then it'll just print out the headers that it finds. 114 00:04:28,120 --> 00:04:29,900 And then in this dirt-simple example, 115 00:04:29,900 --> 00:04:31,691 it's going to do something very dumb, which 116 00:04:31,691 --> 00:04:34,960 is execute the system() call and just directly set 117 00:04:34,960 --> 00:04:39,640 the environment value to the value specified in the header. 118 00:04:39,640 --> 00:04:41,390 So that's the whole root of the vulnerability. 119 00:04:41,390 --> 00:04:45,465 So if I come over here and I start my victim web server-- 120 00:04:45,465 --> 00:04:48,190 OK, so it's now ready to accept requests. 121 00:04:48,190 --> 00:04:52,140 And then I can write my special Shellshock client like so. 122 00:04:52,140 --> 00:04:54,530 And this is actually pretty dirt-simple. 123 00:04:54,530 --> 00:04:58,900 So here, I just define one of these malformed strings. 124 00:04:58,900 --> 00:05:01,856 So I have these kind of janky characters at the beginning. 125 00:05:01,856 --> 00:05:04,230 And then I know that everything after this is essentially 126 00:05:04,230 --> 00:05:07,250 going to be executed on my behalf on the server side. 127 00:05:07,250 --> 00:05:08,834 So in this case, I pick something that 128 00:05:08,834 --> 00:05:10,083 was actually pretty innocuous.
129 00:05:10,083 --> 00:05:11,720 It just says, echo, I own your machine. 130 00:05:11,720 --> 00:05:13,670 But this could be anything here. 131 00:05:13,670 --> 00:05:17,170 You could start another Bash shell kind of like I do here. 132 00:05:17,170 --> 00:05:20,680 And then, echo attacker command, where in the real world, 133 00:05:20,680 --> 00:05:22,890 that could actually be something very dangerous. 134 00:05:22,890 --> 00:05:27,364 So then I set the headers and my custom request. 135 00:05:27,364 --> 00:05:29,655 And then I just use Python to create an HTTP connection 136 00:05:29,655 --> 00:05:31,047 and just send it to the server. 137 00:05:31,047 --> 00:05:32,130 So what ends up happening? 138 00:05:32,130 --> 00:05:37,190 So I execute my Shellshock client here. 139 00:05:37,190 --> 00:05:38,900 So it's saying that I had a 404 here, 140 00:05:38,900 --> 00:05:41,400 because it doesn't matter what file I requested. 141 00:05:41,400 --> 00:05:44,760 So I just put in some index.html that doesn't exist. 142 00:05:44,760 --> 00:05:48,070 But if we look over here, this is the output for the server. 143 00:05:48,070 --> 00:05:51,450 And so what you see is that you have this output, 144 00:05:51,450 --> 00:05:53,690 I OWN UR MACHINE, and ATTACKER CMD. 145 00:05:53,690 --> 00:05:56,400 And that's because as the server got that header, 146 00:05:56,400 --> 00:05:58,120 it set the Bash variable. 147 00:05:58,120 --> 00:06:00,020 It set it with this weird thing here. 148 00:06:00,020 --> 00:06:02,310 And as a result, an attacker-controlled command 149 00:06:02,310 --> 00:06:03,800 got to run. 150 00:06:03,800 --> 00:06:06,422 So does that all make sense? 151 00:06:06,422 --> 00:06:12,714 AUDIENCE: So does this happen if the program is run under that? 152 00:06:12,714 --> 00:06:14,774 I'm still unclear on, like-- 153 00:06:14,774 --> 00:06:15,440 PROFESSOR: Yeah.
154 00:06:15,440 --> 00:06:17,840 So the specifics of how the attack works actually 155 00:06:17,840 --> 00:06:20,600 depends on are you running Apache, like what exactly 156 00:06:20,600 --> 00:06:21,900 your web server looks like. 157 00:06:21,900 --> 00:06:23,900 So in this example, it's a little bit contrived, 158 00:06:23,900 --> 00:06:26,890 because I actually called [INAUDIBLE] explicitly spawned 159 00:06:26,890 --> 00:06:29,932 off another Bash shell, set the environment variable in there, 160 00:06:29,932 --> 00:06:31,140 and then we were ready to go. 161 00:06:31,140 --> 00:06:33,440 But you could imagine that if you were spawning off 162 00:06:33,440 --> 00:06:35,827 a different process for each incoming connection, 163 00:06:35,827 --> 00:06:38,160 you could set the environment variable for that directly 164 00:06:38,160 --> 00:06:40,660 if that guy was using-- was living inside 165 00:06:40,660 --> 00:06:41,905 of a Bash environment. 166 00:06:41,905 --> 00:06:44,244 AUDIENCE: So if you go back to your web server code, 167 00:06:44,244 --> 00:06:47,600 it seems that you have a much worse vulnerability 168 00:06:47,600 --> 00:06:50,386 than the Shellshock, because you're calling [? though ?] 169 00:06:50,386 --> 00:06:50,886 a system. 170 00:06:50,886 --> 00:06:55,117 And I can execute a command just by setting the custom header 171 00:06:55,117 --> 00:06:56,200 to something [? that I ?]. 172 00:06:56,200 --> 00:06:59,032 I wouldn't have to use the Shellshock bug in this example. 173 00:06:59,032 --> 00:07:00,115 PROFESSOR: That's correct. 174 00:07:00,115 --> 00:07:00,230 Yeah. 175 00:07:00,230 --> 00:07:02,480 So in this particular web server, which is something 176 00:07:02,480 --> 00:07:04,150 I wrote just for sort of teaching value, 177 00:07:04,150 --> 00:07:06,233 yeah, this thing you shouldn't trust for anything. 
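[Editor's sketch] The audience member's point is that building a shell command out of a header value is a vulnerability on its own, independent of Shellshock. One common way to avoid that particular problem, shown here as an assumption rather than as the lecture's code, is to pass the value through the child's environment and start the child from an argument list with no shell involved. The variable name HTTP_CUSTOM_HEADER and the helper run_handler_with_header are hypothetical.

```python
import os
import subprocess
import sys


def run_handler_with_header(value):
    """Run a child process with the header value in its environment,
    without ever splicing the value into a shell command line."""
    child_env = dict(os.environ)
    child_env["HTTP_CUSTOM_HEADER"] = value
    # The child just echoes the variable back so we can see what it received.
    child_code = "import os; print(os.environ['HTTP_CUSTOM_HEADER'])"
    result = subprocess.run(
        [sys.executable, "-c", child_code],  # argv list, so no shell parsing
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# A value that would inject a command if it were pasted into a shell string.
echoed = run_handler_with_header("x; echo I OWN UR MACHINE")
print(echoed)
```

Here the value arrives in the child as inert data. This does not help if the child is itself an unpatched Bash, since Shellshock is triggered by Bash parsing its own environment at startup.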
178 00:07:06,233 --> 00:07:07,790 AUDIENCE: But the Shellshock exploit 179 00:07:07,790 --> 00:07:10,677 was on assigning something malicious to an environment 180 00:07:10,677 --> 00:07:12,760 variable using setenv or something like that, 181 00:07:12,760 --> 00:07:13,420 which is something [INAUDIBLE]. 182 00:07:13,420 --> 00:07:13,724 PROFESSOR: Oh, yeah, yeah. 183 00:07:13,724 --> 00:07:14,840 So that gets back to his question. 184 00:07:14,840 --> 00:07:15,410 That's right. 185 00:07:15,410 --> 00:07:17,954 So if you had, like, let's say, Apache up here, 186 00:07:17,954 --> 00:07:19,995 Apache's a little bit tricky to sort of configure 187 00:07:19,995 --> 00:07:21,350 in a way that makes it obvious what's going on. 188 00:07:21,350 --> 00:07:22,391 But you're exactly right. 189 00:07:22,391 --> 00:07:24,580 So Apache would call setenv, which 190 00:07:24,580 --> 00:07:26,970 is another way that you can directly set the environment 191 00:07:26,970 --> 00:07:28,845 value for whatever particular service or 192 00:07:28,845 --> 00:07:29,745 process you have. 193 00:07:29,745 --> 00:07:31,370 But you also actually have some servers 194 00:07:31,370 --> 00:07:33,690 like this that you can imagine that they actually 195 00:07:33,690 --> 00:07:36,370 do a spawn of a separate process and do something very morally 196 00:07:36,370 --> 00:07:37,220 equivalent to this. 197 00:07:37,220 --> 00:07:38,803 But you're exactly right, that the way 198 00:07:38,803 --> 00:07:41,070 that Apache in particular was violated 199 00:07:41,070 --> 00:07:44,580 was the way that you described. 200 00:07:44,580 --> 00:07:45,790 So does it all make sense? 201 00:07:48,730 --> 00:07:49,240 OK. 202 00:07:49,240 --> 00:07:52,725 So that's sort of a quick and dirty example 203 00:07:52,725 --> 00:07:55,310 of Shellshock stuff. 204 00:07:55,310 --> 00:07:57,210 And so another example I wanted to give 205 00:07:57,210 --> 00:08:01,460 you was an example of cross-site scripting.
206 00:08:01,460 --> 00:08:03,031 And so the Shellshock bug was sort 207 00:08:03,031 --> 00:08:06,885 of an example of how content sanitization is very important. 208 00:08:06,885 --> 00:08:09,010 So as we'd just been discussing, you shouldn't just 209 00:08:09,010 --> 00:08:10,700 take inputs from an arbitrary person 210 00:08:10,700 --> 00:08:14,000 and then use them directly in commands of any type. 211 00:08:14,000 --> 00:08:16,490 So cross-site scripting attacks are another example 212 00:08:16,490 --> 00:08:18,550 of how something can go wrong. 213 00:08:18,550 --> 00:08:21,670 So in this example, I have another sort 214 00:08:21,670 --> 00:08:24,950 of dumb CGI server here. 215 00:08:24,950 --> 00:08:28,000 And if we look at this CGI server, 216 00:08:28,000 --> 00:08:29,450 so what is it going to do? 217 00:08:29,450 --> 00:08:31,580 So once again, I've written something very simple 218 00:08:31,580 --> 00:08:32,532 in Python. 219 00:08:32,532 --> 00:08:33,990 This is going to be the handler that 220 00:08:33,990 --> 00:08:36,792 executes when a request comes in from the client. 221 00:08:36,792 --> 00:08:38,250 And so essentially, what happens is 222 00:08:38,250 --> 00:08:42,701 that up here, I'm going to print some headers for the response. 223 00:08:42,701 --> 00:08:44,159 So I'm going to say, my response is 224 00:08:44,159 --> 00:08:46,250 going to be of type text/html. 225 00:08:46,250 --> 00:08:48,934 This line here we'll actually explain in a second. 226 00:08:48,934 --> 00:08:51,350 So as it turns out, browsers have some security mechanisms 227 00:08:51,350 --> 00:08:54,250 to try to prevent the attack that I'm about to show you. 228 00:08:54,250 --> 00:08:56,560 So I put this example-- I put that header line in there 229 00:08:56,560 --> 00:08:59,060 to turn some of the protections off. 230 00:08:59,060 --> 00:09:01,640 And then what the CGI script does 231 00:09:01,640 --> 00:09:06,625 is it gets access to all of the fields in the CGI request.
232 00:09:06,625 --> 00:09:09,180 So imagine that everything in a query string 233 00:09:09,180 --> 00:09:13,140 after this question mark-- like these header and value things, 234 00:09:13,140 --> 00:09:15,740 that's what goes into that form example there. 235 00:09:15,740 --> 00:09:18,510 And so what the CGI script does is something very simple. 236 00:09:18,510 --> 00:09:22,960 It just directly prints the value of something that 237 00:09:22,960 --> 00:09:25,280 was passed from the attacker. 238 00:09:25,280 --> 00:09:26,150 So same basic idea. 239 00:09:26,150 --> 00:09:28,600 This is a bad idea, because this print statement, 240 00:09:28,600 --> 00:09:32,200 it's printing directly into the HTML itself. 241 00:09:32,200 --> 00:09:35,840 So what can happen is as follows. 242 00:09:35,840 --> 00:09:40,740 So let's say that I have a bunch of queries I want to run. 243 00:09:40,740 --> 00:09:44,760 So in this first query here, I'm just setting the message value 244 00:09:44,760 --> 00:09:46,350 to Hello. 245 00:09:46,350 --> 00:09:50,567 So if I go over here and I run that page, 246 00:09:50,567 --> 00:09:52,900 well, then you're going to see that this Hello shows up, 247 00:09:52,900 --> 00:09:54,983 because once again, the server was taking directly 248 00:09:54,983 --> 00:09:55,740 what I pass to it. 249 00:09:55,740 --> 00:09:57,140 And it prints Hello. 250 00:09:57,140 --> 00:09:59,150 So no big surprises there. 251 00:09:59,150 --> 00:10:01,890 Now let's say I realize that I can actually 252 00:10:01,890 --> 00:10:03,950 pass arbitrary HTML in there. 253 00:10:03,950 --> 00:10:09,930 So now I actually try to embed some styling in there. 254 00:10:09,930 --> 00:10:12,550 So I say, h1 and then Hello again /h1. 255 00:10:12,550 --> 00:10:13,550 So that worked, right? 256 00:10:13,550 --> 00:10:16,310 So once again, we're printing directly into the page. 257 00:10:16,310 --> 00:10:18,410 So now you might think, OK, we're in business now.
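[Editor's sketch] The demo script prints the query value straight into the HTML. A standard countermeasure, shown here as an assumption and not as part of the lecture's code, is to escape the value with Python's html.escape before it reaches the page. The function name render_message is hypothetical.

```python
import html


def render_message(msg):
    """Build the response body, escaping the user-supplied value first."""
    # html.escape turns <, >, & and quotes into entities, so the browser
    # shows the text instead of interpreting it as markup.
    return "<html><body>" + html.escape(msg) + "</body></html>"


plain_page = render_message("Hello")
heading_page = render_message("<h1>Hello again</h1>")
script_page = render_message("<script>alert('XSS')</script>")

print(plain_page)
print(heading_page)
print(script_page)
```

With escaping in place, the second and third requests show up as literal text on the page rather than as a heading or a running script.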
258 00:10:18,410 --> 00:10:18,951 This is cool. 259 00:10:18,951 --> 00:10:23,530 So let's just directly embed some JavaScript code in there. 260 00:10:23,530 --> 00:10:24,030 All right. 261 00:10:24,030 --> 00:10:25,761 And so I do this. 262 00:10:25,761 --> 00:10:28,010 And here, I've actually just put in-- for the message, 263 00:10:28,010 --> 00:10:29,050 I put script. 264 00:10:29,050 --> 00:10:34,067 And then I want it to just alert XSS and then script. 265 00:10:34,067 --> 00:10:35,150 So now that's interesting. 266 00:10:35,150 --> 00:10:37,220 So it seems like something didn't quite work. 267 00:10:37,220 --> 00:10:38,320 So I don't see any output. 268 00:10:38,320 --> 00:10:39,569 I didn't see the alert either. 269 00:10:39,569 --> 00:10:43,840 And if I actually look at the output for the web server-- 270 00:10:43,840 --> 00:10:46,700 and what I see is that here, the web 271 00:10:46,700 --> 00:10:49,950 server itself didn't actually get that trailing script tag. 272 00:10:49,950 --> 00:10:52,190 So it seems like the browser itself has somehow 273 00:10:52,190 --> 00:10:54,470 detected something evil even though I tried 274 00:10:54,470 --> 00:10:58,244 to disable the XSS filter. 275 00:10:58,244 --> 00:10:59,160 So that's interesting. 276 00:10:59,160 --> 00:11:02,805 We're going to come to this defense mechanism a bit 277 00:11:02,805 --> 00:11:03,430 in the lecture. 278 00:11:03,430 --> 00:11:05,429 But suffice it to say, it seems like the browser 279 00:11:05,429 --> 00:11:07,822 is trying to resist this cross-site scripting attack. 280 00:11:07,822 --> 00:11:09,530 But of course, what we can take advantage 281 00:11:09,530 --> 00:11:13,730 of is the fact that HTML, and CSS, and JavaScript, 282 00:11:13,730 --> 00:11:15,910 they're extremely complex languages. 283 00:11:15,910 --> 00:11:18,910 And they compose in these very difficult to understand ways. 284 00:11:18,910 --> 00:11:21,510 So here, this is what I've been setting my attack string here. 
285 00:11:21,510 --> 00:11:23,380 This is a malformed URL. 286 00:11:23,380 --> 00:11:25,350 I'm saying, image, and then three quotation 287 00:11:25,350 --> 00:11:27,340 marks in a row, and then a script tag. 288 00:11:27,340 --> 00:11:29,330 Like, this shouldn't actually parse. 289 00:11:29,330 --> 00:11:30,900 But what's going to end up happening 290 00:11:30,900 --> 00:11:34,740 is that the browser's going to get confused here. 291 00:11:34,740 --> 00:11:38,560 So its built-in cross-site scripting detection 292 00:11:38,560 --> 00:11:40,210 actually fails here. 293 00:11:40,210 --> 00:11:42,710 And so what ends up happening is that now you see the alert. 294 00:11:42,710 --> 00:11:43,230 OK? 295 00:11:43,230 --> 00:11:45,146 And what's interesting is that if you actually 296 00:11:45,146 --> 00:11:48,130 look at the contents of the page now, it's kind of messed up. 297 00:11:48,130 --> 00:11:51,520 Like, where did this quotation mark and brace come in? 298 00:11:51,520 --> 00:11:53,690 If we do a Control-U, we can see 299 00:11:53,690 --> 00:11:57,574 that this does not make the browser happy in some way. 300 00:11:57,574 --> 00:11:58,740 That's a little bit unclear. 301 00:11:58,740 --> 00:12:01,430 But it doesn't matter if we're the attacker. 302 00:12:01,430 --> 00:12:03,009 We saw that alert. 303 00:12:03,009 --> 00:12:04,800 That means that our code got to run. 304 00:12:04,800 --> 00:12:06,750 And from the perspective of the attacker, who cares 305 00:12:06,750 --> 00:12:08,041 that the page is messed up now? 306 00:12:08,041 --> 00:12:09,660 Because I could have used that code 307 00:12:09,660 --> 00:12:11,932 to steal the cookie or things like that. 308 00:12:11,932 --> 00:12:13,140 So does that all make-- yeah? 309 00:12:13,140 --> 00:12:15,137 AUDIENCE: What's the cross-site aspect? 310 00:12:15,137 --> 00:12:15,720 PROFESSOR: Ah.
311 00:12:15,720 --> 00:12:21,200 So the cross-site aspect is that if the attacker can convince 312 00:12:21,200 --> 00:12:24,220 the user to go to a URL like this, 313 00:12:24,220 --> 00:12:26,270 then the attacker's the one who's specifying 314 00:12:26,270 --> 00:12:27,820 that stuff in the message. 315 00:12:27,820 --> 00:12:30,610 It's the attacker who's specifying the alert XSS 316 00:12:30,610 --> 00:12:31,660 or something like that. 317 00:12:31,660 --> 00:12:33,160 And so essentially, what's happening 318 00:12:33,160 --> 00:12:35,890 is that the victim page is executing code 319 00:12:35,890 --> 00:12:40,110 on behalf of someone that is not that page. 320 00:12:40,110 --> 00:12:42,535 AUDIENCE: Can you explain exactly what the browser roles 321 00:12:42,535 --> 00:12:44,554 are for sanitizing [? games ?] for [? play? ?] 322 00:12:44,554 --> 00:12:45,470 PROFESSOR: Yeah, yeah. 323 00:12:45,470 --> 00:12:46,844 So we'll get to that in a second. 324 00:12:46,844 --> 00:12:49,750 So we'll get to that in a second. 325 00:12:49,750 --> 00:12:50,300 OK. 326 00:12:50,300 --> 00:12:55,745 So that is all for story time. 327 00:12:55,745 --> 00:12:57,110 And let's see here. 328 00:12:57,110 --> 00:12:59,644 So I guess I can turn this guy on. 329 00:12:59,644 --> 00:13:07,991 And maybe he will [INAUDIBLE] guy here. 330 00:13:07,991 --> 00:13:08,589 This guy here. 331 00:13:08,589 --> 00:13:09,464 AUDIENCE: Front wall. 332 00:13:09,464 --> 00:13:10,214 PROFESSOR: Ah, OK. 333 00:13:10,214 --> 00:13:11,745 There you go. 334 00:13:11,745 --> 00:13:12,245 All right. 335 00:13:12,245 --> 00:13:13,245 Eighth time's the charm. 336 00:13:13,245 --> 00:13:14,365 OK, thanks. 337 00:13:14,365 --> 00:13:14,865 OK. 338 00:13:14,865 --> 00:13:17,565 So yeah, so those are just two quick demos to sort of show you 339 00:13:17,565 --> 00:13:19,810 the filthy and dirty world that we live in right now. 340 00:13:19,810 --> 00:13:23,090 So why is cross-site scripting so prevalent? 
341 00:13:23,090 --> 00:13:25,780 Why are these problems such a big deal? 342 00:13:25,780 --> 00:13:27,745 Well, the reason is that websites 343 00:13:27,745 --> 00:13:29,740 are increasingly more and more dynamic, 344 00:13:29,740 --> 00:13:32,940 and they want to incorporate a user content a lot of times, 345 00:13:32,940 --> 00:13:35,610 or they want to incorporate content from other domains. 346 00:13:35,610 --> 00:13:38,810 So think about the Comment section on a news article. 347 00:13:38,810 --> 00:13:40,895 Those comments come from untrusted folks, 348 00:13:40,895 --> 00:13:41,520 from the users. 349 00:13:41,520 --> 00:13:43,950 So somehow, these sites have to figure out, 350 00:13:43,950 --> 00:13:46,710 what are the rules for composing those types of things? 351 00:13:46,710 --> 00:13:50,410 And also, the websites might host user-submitted documents, 352 00:13:50,410 --> 00:13:53,280 a thing like Google Docs or Office 365, for example. 353 00:13:53,280 --> 00:13:56,010 Those documents all come from untrusted folks, 354 00:13:56,010 --> 00:13:58,946 but somehow, they have to live with each other 355 00:13:58,946 --> 00:14:00,320 and with the large infrastructure 356 00:14:00,320 --> 00:14:03,530 from Google or from Microsoft or whatnot. 357 00:14:03,530 --> 00:14:06,740 So what are some of the cross-site scripting defenses 358 00:14:06,740 --> 00:14:07,240 we can use? 359 00:14:07,240 --> 00:14:08,827 This kind of gets to your question. 360 00:14:08,827 --> 00:14:10,660 So we'll actually look at some of those now. 361 00:14:16,140 --> 00:14:21,120 So one type of defense is to basically 362 00:14:21,120 --> 00:14:26,600 have cross-site scripting filters in the browser itself. 363 00:14:30,210 --> 00:14:32,730 And so these filters will essentially 364 00:14:32,730 --> 00:14:38,090 try to detect when there's a potential cross-site scripting 365 00:14:38,090 --> 00:14:39,360 attack. 
366 00:14:39,360 --> 00:14:42,490 And so we actually saw one of those filters in action. 367 00:14:42,490 --> 00:14:45,560 And I think that was the third example that we looked at. 368 00:14:45,560 --> 00:14:49,140 If you have some website-- or some URL-- excuse me-- 369 00:14:49,140 --> 00:14:53,850 that looks like this-- so foo.com. 370 00:14:53,850 --> 00:14:59,280 And then you have some question mark and then some query string 371 00:14:59,280 --> 00:15:01,300 you're going to submit. 372 00:15:01,300 --> 00:15:06,830 This is very similar to the example that I tried third. 373 00:15:06,830 --> 00:15:09,660 So I just set this source to something like 374 00:15:09,660 --> 00:15:22,150 evil.com/cookiestealer.js. 375 00:15:22,150 --> 00:15:23,900 And so what ended up happening was 376 00:15:23,900 --> 00:15:25,940 that when I tried an example similar to this, 377 00:15:25,940 --> 00:15:28,460 the browser actually rejected it out of hand. 378 00:15:28,460 --> 00:15:30,380 So we saw that it didn't even work. 379 00:15:30,380 --> 00:15:31,820 And the reason why it didn't work 380 00:15:31,820 --> 00:15:34,330 is because the browser looked and said, 381 00:15:34,330 --> 00:15:40,451 is there an embedded script tag in a URL? 382 00:15:40,451 --> 00:15:42,200 So basically, it's a very simple heuristic 383 00:15:42,200 --> 00:15:44,860 for figuring out if something evil's probably going on, 384 00:15:44,860 --> 00:15:47,570 because no legitimate developer-- or no developer 385 00:15:47,570 --> 00:15:49,752 that's sane-- should be doing stuff like this. 386 00:15:49,752 --> 00:15:51,710 So there's actually these configuration options 387 00:15:51,710 --> 00:15:54,209 in your browser you can use to turn these things on and off. 388 00:15:54,209 --> 00:15:55,900 Occasionally, this is useful for testing 389 00:15:55,900 --> 00:15:57,941 if you just want to inject some JavaScript really 390 00:15:57,941 --> 00:15:58,940 quick and dirty. 
391 00:15:58,940 --> 00:16:01,530 But this is almost always assigned [INAUDIBLE]. 392 00:16:01,530 --> 00:16:03,925 So for example, Chrome and IE have a built-in filter that 393 00:16:03,925 --> 00:16:06,290 will look at your URL value in the address bar, 394 00:16:06,290 --> 00:16:08,240 look for things like this. 395 00:16:08,240 --> 00:16:11,530 And if it's there, they will do things like maybe delete 396 00:16:11,530 --> 00:16:12,910 this whole thing completely. 397 00:16:12,910 --> 00:16:15,420 They will maybe change the source to be empty, 398 00:16:15,420 --> 00:16:16,357 stuff like that. 399 00:16:16,357 --> 00:16:18,190 And so essentially, to get to your question, 400 00:16:18,190 --> 00:16:21,310 there's a bunch of heuristics that the browsers have 401 00:16:21,310 --> 00:16:22,940 to identify things like this. 402 00:16:22,940 --> 00:16:24,750 And if you look at the OWASP site, 403 00:16:24,750 --> 00:16:27,509 they actually collect examples of heuristics 404 00:16:27,509 --> 00:16:29,300 you can use to detect cross-site scripting, 405 00:16:29,300 --> 00:16:33,906 as well as tricks you can use to bypass those filters. 406 00:16:33,906 --> 00:16:34,780 So it was very funny. 407 00:16:34,780 --> 00:16:36,360 So the first thing I wanted to do for the demo 408 00:16:36,360 --> 00:16:38,320 is do something like this, and it didn't work. 409 00:16:38,320 --> 00:16:40,060 So then I went to the OWASP cheat sheet. 410 00:16:40,060 --> 00:16:42,170 I looked at, like, the third thing they suggested, 411 00:16:42,170 --> 00:16:43,586 and the third thing they suggested 412 00:16:43,586 --> 00:16:47,460 worked, which was that sort of broken image syntax type stuff. 413 00:16:47,460 --> 00:16:50,702 So the basic problem with just relying on this 414 00:16:50,702 --> 00:16:52,910 is that, like I said, there's a lot of different ways 415 00:16:52,910 --> 00:16:56,710 to force the CSS and HTML parsers to mal-parse something. 
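[Editor's sketch] To see why a single URL heuristic leaves gaps, consider a deliberately naive filter that only looks for a literal script tag in the query string. This is not how the Chrome or IE filters actually worked; naive_xss_filter and the sample URLs are made up. The second URL uses an image tag with an onerror handler, a different bypass from the broken-image string used in the demo, and it passes the check untouched.

```python
from urllib.parse import parse_qs, urlsplit


def naive_xss_filter(url):
    """Return True if any decoded query value contains a literal <script tag."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes
    return any(
        "<script" in value.lower()
        for values in query.values()
        for value in values
    )


# Decodes to: <script>alert(1)</script>
blocked = naive_xss_filter(
    "http://foo.com/?msg=%3Cscript%3Ealert(1)%3C/script%3E"
)
# Decodes to: <img src=x onerror=alert(1)>
missed = naive_xss_filter(
    "http://foo.com/?msg=%3Cimg%20src%3Dx%20onerror%3Dalert(1)%3E"
)

print("script tag flagged:", blocked)
print("onerror payload flagged:", missed)
```

The first request is flagged and the second is not, even though both would run JavaScript if the server echoed them into the page.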
416 00:16:56,710 --> 00:16:59,080 So these things are not complete solutions. 417 00:16:59,080 --> 00:17:00,980 They don't have the perfect coverage. 418 00:17:00,980 --> 00:17:02,630 AUDIENCE: Shouldn't this just be like the lead-in 419 00:17:02,630 --> 00:17:03,379 from the browsers? 420 00:17:03,379 --> 00:17:07,449 Because it seems like not the browser's job to do this. 421 00:17:07,449 --> 00:17:09,365 PROFESSOR: You mean it's not the browser's job 422 00:17:09,365 --> 00:17:10,795 to sanitize this kind of stuff? 423 00:17:10,795 --> 00:17:11,420 AUDIENCE: Yeah. 424 00:17:11,420 --> 00:17:12,280 PROFESSOR: I mean, you could imagine 425 00:17:12,280 --> 00:17:15,119 sort of having a browser sit atop a proxy, for example. 426 00:17:15,119 --> 00:17:17,290 And maybe the proxy did sort of cleaning like this. 427 00:17:17,290 --> 00:17:20,029 I mean, intuitive reason why it might make sense 428 00:17:20,029 --> 00:17:21,945 to do it inside the browser is because so many 429 00:17:21,945 --> 00:17:25,519 of the legitimate parsing engines are inside the browser. 430 00:17:25,519 --> 00:17:28,060 So presumably, if you're closer to where the actual parsing's 431 00:17:28,060 --> 00:17:29,070 being done, it's better. 432 00:17:29,070 --> 00:17:29,778 But you're right. 433 00:17:29,778 --> 00:17:32,316 In practice, you can imagine there being sort of defense 434 00:17:32,316 --> 00:17:33,190 in layers, basically. 435 00:17:33,190 --> 00:17:34,606 AUDIENCE: I think what he might be 436 00:17:34,606 --> 00:17:37,198 saying is that it's the web developer's job, 437 00:17:37,198 --> 00:17:39,960 not the client's job to sanitize this. 438 00:17:39,960 --> 00:17:42,480 PROFESSOR: But, I mean, that's kind of like saying-- so 439 00:17:42,480 --> 00:17:44,730 in a certain sense, we could say that about processes, 440 00:17:44,730 --> 00:17:46,360 too, in Unix or Windows. 
441 00:17:46,360 --> 00:17:49,460 So we could say it's sort of developers' jobs 442 00:17:49,460 --> 00:17:51,150 to make sure those things stay isolated. 443 00:17:51,150 --> 00:17:53,090 But in fact, the OS and the hardware 444 00:17:53,090 --> 00:17:54,840 as well has an important role to play, 445 00:17:54,840 --> 00:17:57,087 because [INAUDIBLE] trusted whereas any two arbitrary 446 00:17:57,087 --> 00:17:59,170 programs developed by any two arbitrary developers 447 00:17:59,170 --> 00:18:01,670 may or may not be trusted to sort of implement security 448 00:18:01,670 --> 00:18:02,180 correctly. 449 00:18:02,180 --> 00:18:02,971 But you're correct. 450 00:18:02,971 --> 00:18:05,100 And in fact, frameworks like Django or whatnot, 451 00:18:05,100 --> 00:18:06,891 they actually try to help you to get around 452 00:18:06,891 --> 00:18:08,860 some of these problems. 453 00:18:08,860 --> 00:18:11,350 So anyways, so yeah, so filters are not a perfect solution. 454 00:18:11,350 --> 00:18:18,460 And also, filters can't prevent what's 455 00:18:18,460 --> 00:18:27,120 known as a persistent-- persistent-- cross-site 456 00:18:27,120 --> 00:18:29,590 scripting attacks. 457 00:18:29,590 --> 00:18:32,430 This is known as sort of a reflected or transient one, 458 00:18:32,430 --> 00:18:35,290 because this script code just sort of lives in the URL. 459 00:18:35,290 --> 00:18:37,460 Then once the user's closed that URL, 460 00:18:37,460 --> 00:18:38,941 basically, the attack's gone. 461 00:18:38,941 --> 00:18:40,440 But you could imagine that you could 462 00:18:40,440 --> 00:18:51,510 have someone who-- user puts malicious HTML in the Comment 463 00:18:51,510 --> 00:18:53,285 section for a website. 464 00:18:56,370 --> 00:19:03,570 And if the web server actually accepts that comment is valid, 465 00:19:03,570 --> 00:19:05,950 then that comment, with this malicious payload, 466 00:19:05,950 --> 00:19:07,580 can essentially live there forever. 
467 00:19:07,580 --> 00:19:09,330 So whenever any user goes there, they 468 00:19:09,330 --> 00:19:11,280 would be exposed to that malicious content. 469 00:19:11,280 --> 00:19:14,000 Another example, which is sort of 470 00:19:14,000 --> 00:19:15,750 funny and sad, as all these things, 471 00:19:15,750 --> 00:19:17,985 is if you look at dating websites. 472 00:19:21,090 --> 00:19:24,790 So some dating websites actually allow 473 00:19:24,790 --> 00:19:29,600 users to put full-blown HTML in their profile. 474 00:19:29,600 --> 00:19:30,740 So what does that mean? 475 00:19:30,740 --> 00:19:33,860 So when someone else is lonely, presumably, or looking 476 00:19:33,860 --> 00:19:38,200 to find their one true soul match, they go to your website. 477 00:19:38,200 --> 00:19:40,210 They're going to run HTML that you've 478 00:19:40,210 --> 00:19:43,450 crafted in the context of their session. 479 00:19:43,450 --> 00:19:46,390 And so that can also be a very damaging attack as well. 480 00:19:46,390 --> 00:19:48,040 So just doing these kinds of filters 481 00:19:48,040 --> 00:19:49,901 don't protect against things like that. 482 00:19:49,901 --> 00:19:52,065 AUDIENCE: So [INAUDIBLE] in the Comments 483 00:19:52,065 --> 00:19:55,673 section presumably does that by setting a post-- 484 00:19:55,673 --> 00:19:58,879 the information goes to the server in a post variable 485 00:19:58,879 --> 00:20:00,290 or something like that? 486 00:20:00,290 --> 00:20:01,220 PROFESSOR: So there's a bunch of different ways 487 00:20:01,220 --> 00:20:02,210 you could imagine doing it. 488 00:20:02,210 --> 00:20:02,495 Yeah. 489 00:20:02,495 --> 00:20:04,494 So one way you could imagine doing it is a post. 490 00:20:04,494 --> 00:20:06,795 Another way you could imagine doing it is a dynamic XML 491 00:20:06,795 --> 00:20:07,516 HTTP request. 492 00:20:07,516 --> 00:20:08,450 AUDIENCE: OK. 
493 00:20:08,450 --> 00:20:11,597 But if it's like a post, why can't you just scan through it 494 00:20:11,597 --> 00:20:13,430 and do the same thing that you have in the-- 495 00:20:13,430 --> 00:20:13,830 PROFESSOR: Yes. 496 00:20:13,830 --> 00:20:15,040 So you're exactly correct about that, 497 00:20:15,040 --> 00:20:16,831 and we'll discuss some of that in a second. 498 00:20:16,831 --> 00:20:18,915 But you're exactly correct that the server 499 00:20:18,915 --> 00:20:20,290 side of the application should be 500 00:20:20,290 --> 00:20:22,246 very defensive and mistrustful of this stuff. 501 00:20:22,246 --> 00:20:23,245 So you're exactly right. 502 00:20:23,245 --> 00:20:25,860 So you could imagine that when the server maybe 503 00:20:25,860 --> 00:20:28,560 saw something like this, [INAUDIBLE] 504 00:20:28,560 --> 00:20:30,315 even if the browser did not. 505 00:20:30,315 --> 00:20:32,400 You're correct about that. 506 00:20:32,400 --> 00:20:32,900 All right. 507 00:20:32,900 --> 00:20:36,740 So that's basically a survey of these cross-site scripting filters 508 00:20:36,740 --> 00:20:37,410 in the browser. 509 00:20:37,410 --> 00:20:43,510 So another defense against cross-site scripting 510 00:20:43,510 --> 00:20:49,480 is something known as HTTP-only cookies. 511 00:20:52,100 --> 00:20:54,430 And so the basic idea behind this 512 00:20:54,430 --> 00:20:58,550 is that a server can actually tell the browser 513 00:20:58,550 --> 00:21:00,380 that client-side JavaScript should not 514 00:21:00,380 --> 00:21:03,970 be able to access a particular cookie. 515 00:21:03,970 --> 00:21:06,000 And so basically, the server can just 516 00:21:06,000 --> 00:21:10,187 send a header value in the response, in the Set-Cookie field. 517 00:21:10,187 --> 00:21:12,270 It can say, hey, don't let client-side JavaScript 518 00:21:12,270 --> 00:21:13,070 manipulate this cookie. 519 00:21:13,070 --> 00:21:14,570 So only the server could do this.
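As a small sketch of what such a response header could look like, the following uses Python's standard `http.cookies` module to build a Set-Cookie line with the HttpOnly flag. The cookie name `sessionid` and its value are made up for illustration. A real framework would normally set this flag for you through its own session settings.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sessionid"] = "abc123"            # hypothetical cookie name and value
cookie["sessionid"]["httponly"] = True    # ask the browser to hide it from scripts
cookie["sessionid"]["path"] = "/"

# Render the header line the server would include in its HTTP response.
header_line = cookie.output()
print(header_line)
```

With the flag set, a compliant browser still sends the cookie back on requests but does not expose it to page scripts through `document.cookie`.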
520 00:21:14,570 --> 00:21:17,020 And so this is only a partial defense, though, 521 00:21:17,020 --> 00:21:21,820 because the attacker can still issue requests that 522 00:21:21,820 --> 00:21:24,310 contain the user's cookies. 523 00:21:24,310 --> 00:21:26,450 So this was the cross-site request forgery 524 00:21:26,450 --> 00:21:30,130 that we looked at in the last lecture. 525 00:21:30,130 --> 00:21:33,891 So even if JavaScript code can't manipulate cookies, 526 00:21:33,891 --> 00:21:35,766 the attacker can still do things like conjure 527 00:21:35,766 --> 00:21:39,700 up a URL to some e-commerce site, let's say buy.com. 528 00:21:39,700 --> 00:21:44,150 The attacker can put whatever item the attacker wants to buy. 529 00:21:44,150 --> 00:21:46,710 So they put a Ferrari, for example. 530 00:21:46,710 --> 00:21:52,040 And then the attacker can then say, who should this go to? 531 00:21:52,040 --> 00:21:55,281 This should go to the attacker. 532 00:21:55,281 --> 00:21:57,030 And so even though client-side JavaScript 533 00:21:57,030 --> 00:21:58,789 can't access the cookie, there's nothing 534 00:21:58,789 --> 00:22:00,830 that prevents the attacker from just conjuring up 535 00:22:00,830 --> 00:22:02,130 a URL like this. 536 00:22:02,130 --> 00:22:04,671 This is what some of the CSRF tokens 537 00:22:04,671 --> 00:22:06,170 help to prevent, which we'll 538 00:22:06,170 --> 00:22:09,230 talk about a little bit later. 539 00:22:09,230 --> 00:22:12,330 So another thing that you can try 540 00:22:12,330 --> 00:22:16,090 to do to prevent these cross-site scripting attacks 541 00:22:16,090 --> 00:22:18,810 is privilege separation. 542 00:22:22,220 --> 00:22:26,670 And so the idea here is basically 543 00:22:26,670 --> 00:22:36,630 that you want to use a separate domain for all the content that 544 00:22:36,630 --> 00:22:37,225 is untrusted.
545 00:22:41,370 --> 00:22:46,400 And so for example, a lot of the online service providers 546 00:22:46,400 --> 00:22:49,200 for things like email or online productivity suites. 547 00:22:49,200 --> 00:22:52,620 So think Google Docs, Office 365, so on and so forth. 548 00:22:52,620 --> 00:22:54,790 They actually use a separate domain 549 00:22:54,790 --> 00:22:57,570 to host user-submitted content. 550 00:22:57,570 --> 00:23:00,990 So Google, I think they still use this. 551 00:23:00,990 --> 00:23:02,940 They used to put all the stuff that users 552 00:23:02,940 --> 00:23:04,782 submitted into some special domain 553 00:23:04,782 --> 00:23:05,990 called googleusercontent.com. 554 00:23:11,492 --> 00:23:13,270 And so here, they would put things 555 00:23:13,270 --> 00:23:15,880 like cached copies of pages, your Gmail [INAUDIBLE], 556 00:23:15,880 --> 00:23:17,320 and things like this. 557 00:23:17,320 --> 00:23:19,430 And at least as of a year or two ago, 558 00:23:19,430 --> 00:23:22,990 this is like one of the top 25 [? Alexa-visited ?] domains, 559 00:23:22,990 --> 00:23:24,970 because Google services were so popular. 560 00:23:24,970 --> 00:23:28,670 And so what's the advantage of putting stuff in here? 561 00:23:28,670 --> 00:23:31,740 Well, the hope, at least, is that if there 562 00:23:31,740 --> 00:23:34,700 is some type of cross-site scripting vulnerability 563 00:23:34,700 --> 00:23:37,810 or something like this in user-submitted content, 564 00:23:37,810 --> 00:23:40,240 then hopefully, the damage would just 565 00:23:40,240 --> 00:23:42,700 be limited to that domain. 566 00:23:42,700 --> 00:23:46,122 It wouldn't actually affect the full-blown google.com. 567 00:23:46,122 --> 00:23:48,185 This isn't a perfect defense, though, 568 00:23:48,185 --> 00:23:50,590 because user-submitted content may have references 569 00:23:50,590 --> 00:23:51,695 to things from google.com.
570 00:23:51,695 --> 00:23:53,480 And so once again, this is only sort 571 00:23:53,480 --> 00:23:58,326 of a partial fix for a much more pervasive problem. 572 00:23:58,326 --> 00:23:59,700 Now, another thing you could do-- 573 00:23:59,700 --> 00:24:03,430 and this gets back to the gentleman's suggestion 574 00:24:03,430 --> 00:24:05,740 over here is that we can actually 575 00:24:05,740 --> 00:24:08,695 do content sanitization. 576 00:24:14,270 --> 00:24:21,570 And so the idea here is that, essentially, whenever you-- 577 00:24:21,570 --> 00:24:24,170 where you can be the browser, where you can be the web 578 00:24:24,170 --> 00:24:27,730 server, or whatever-- whenever you receive untrusted content, 579 00:24:27,730 --> 00:24:28,930 you don't trust it at all. 580 00:24:28,930 --> 00:24:32,230 And so you go through it, and you do things to sort of render 581 00:24:32,230 --> 00:24:35,210 it sort of neutral such that it can't actually 582 00:24:35,210 --> 00:24:38,530 execute code or subvert your system in any way. 583 00:24:38,530 --> 00:24:45,820 And so an example of this is the Django template system. 584 00:24:50,740 --> 00:24:53,896 And so Django is an example of a web framework. 585 00:24:53,896 --> 00:24:55,645 So basically, the high-level web framework 586 00:24:55,645 --> 00:24:59,560 is something that helps to automate and secure 587 00:24:59,560 --> 00:25:02,700 some of the sort of tedious tasks of developing a website. 588 00:25:02,700 --> 00:25:06,740 So it will help you with making database access easier. 589 00:25:06,740 --> 00:25:10,380 It'll help you with doing things like session management. 590 00:25:10,380 --> 00:25:13,520 And it will also help you with maintaining a consistent look 591 00:25:13,520 --> 00:25:15,847 and feel across your website. 592 00:25:15,847 --> 00:25:18,180 And so one way to maintain that consistent look and feel 593 00:25:18,180 --> 00:25:20,040 is to use this notion of templates. 
594 00:25:20,040 --> 00:25:21,490 So all of your pages automatically 595 00:25:21,490 --> 00:25:23,510 start out with the same CSS and things 596 00:25:23,510 --> 00:25:25,700 like that, the same styles. 597 00:25:25,700 --> 00:25:28,420 But then there's these portions in the web page 598 00:25:28,420 --> 00:25:30,840 where you can specialize it with the particular news 599 00:25:30,840 --> 00:25:34,420 article that's at the top of everybody's mind that day, 600 00:25:34,420 --> 00:25:36,630 or something like that, or user-specific content. 601 00:25:36,630 --> 00:25:40,780 So for example, in Django, you can look at a template, 602 00:25:40,780 --> 00:25:42,950 and it might look like something like this. 603 00:25:42,950 --> 00:25:44,790 So you have a bold tag. 604 00:25:44,790 --> 00:25:48,350 It says, Hello. 605 00:25:48,350 --> 00:25:51,850 And then you have these braces here, these double braces. 606 00:25:51,850 --> 00:25:52,815 And it says, name. 607 00:25:55,640 --> 00:25:57,360 And so essentially, what this means 608 00:25:57,360 --> 00:26:01,000 is that this is like a placeholder variable. 609 00:26:01,000 --> 00:26:04,410 So essentially, these pages get dynamically generated. 610 00:26:04,410 --> 00:26:06,780 So when the user goes to a Django site, 611 00:26:06,780 --> 00:26:09,310 the Django server says, OK, well, this name 612 00:26:09,310 --> 00:26:11,850 is going to be somewhere, who knows, in the cookie. 613 00:26:11,850 --> 00:26:14,120 Maybe it's going to be in a CGI string, whatever. 614 00:26:14,120 --> 00:26:16,100 And so as the Django server dynamically 615 00:26:16,100 --> 00:26:17,990 generates the page [? to return ?] to user, 616 00:26:17,990 --> 00:26:20,470 it replaces this special reference here 617 00:26:20,470 --> 00:26:23,339 with whatever the value of this variable is. 618 00:26:23,339 --> 00:26:24,630 So it's pretty straightforward. 619 00:26:24,630 --> 00:26:27,630 This is kind of like that dinky CGI server I showed you. 
620 00:26:27,630 --> 00:26:29,480 So just reflecting user-submitted content 621 00:26:29,480 --> 00:26:30,320 right here. 622 00:26:30,320 --> 00:26:33,930 But Django actually does it better than the silly CGI 623 00:26:33,930 --> 00:26:36,260 server that I showed you, because it uses this notion 624 00:26:36,260 --> 00:26:38,620 of content sanitization. 625 00:26:38,620 --> 00:26:41,387 So Django expects that users may be adversarial. 626 00:26:41,387 --> 00:26:42,970 So it's not just going to directly put 627 00:26:42,970 --> 00:26:44,680 the value of the name variable here. 628 00:26:44,680 --> 00:26:47,460 Instead, it is going to encode it in such a way 629 00:26:47,460 --> 00:26:50,583 that this content will never be able to escape out 630 00:26:50,583 --> 00:26:53,292 of the HTML context and execute JavaScript 631 00:26:53,292 --> 00:26:54,250 or something like this. 632 00:26:54,250 --> 00:26:55,666 So for example, one thing it'll do 633 00:26:55,666 --> 00:27:00,180 is it'll take the angle brackets, 634 00:27:00,180 --> 00:27:05,350 and it will translate them into these HTML entities. 635 00:27:05,350 --> 00:27:09,380 So the less than character gets transformed into &lt;. 636 00:27:09,380 --> 00:27:16,060 The greater than character gets translated into &gt;. 637 00:27:16,060 --> 00:27:24,720 Double quotations get translated into &quot;, 638 00:27:24,720 --> 00:27:25,950 and so on and so forth. 639 00:27:25,950 --> 00:27:30,200 And so what this ensures is that if the content the user put 640 00:27:30,200 --> 00:27:32,520 in name actually tries to contain angle brackets 641 00:27:32,520 --> 00:27:35,367 or things like this, then it'll basically be neutered. 642 00:27:35,367 --> 00:27:36,950 And it'll be translated into something 643 00:27:36,950 --> 00:27:38,890 that would not be interpreted as HTML 644 00:27:38,890 --> 00:27:41,540 on the client-side browser. 645 00:27:41,540 --> 00:27:43,740 So does that make sense?
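The escaping step can be sketched in a few lines of Python. This is not Django's own code. It uses the standard library's `html.escape` as a stand-in for the template engine's escaping, and `render_hello` is a hypothetical helper that mimics filling in the name placeholder of the Hello template.

```python
import html


def render_hello(name: str) -> str:
    """Fill a 'Hello {{ name }}'-style template, escaping the value first."""
    # html.escape rewrites &, <, > and (with quote=True) both quote characters.
    return "<b>Hello {}</b>".format(html.escape(name, quote=True))


print(render_hello("Alice"))
print(render_hello('<script>alert("xss")</script>'))
```

For the second call, the angle brackets and double quotes come out as entities, so the browser displays the text instead of parsing it as a script element.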
646 00:27:43,740 --> 00:27:48,361 So now I know that this is not a completely foolproof defense 647 00:27:48,361 --> 00:27:50,360 against some of this cross-site scripting stuff. 648 00:27:50,360 --> 00:27:52,610 And the reason, as we showed in the example, 649 00:27:52,610 --> 00:27:55,800 is that these grammars for HTML, and CSS, and JavaScript 650 00:27:55,800 --> 00:28:00,320 are so complicated that it's very easy to confuse 651 00:28:00,320 --> 00:28:01,490 the browser's parser. 652 00:28:01,490 --> 00:28:07,020 So for example, let's say that you had something like this. 653 00:28:07,020 --> 00:28:10,880 And this is a very common thing to do 654 00:28:10,880 --> 00:28:12,510 in frameworks like Django. 655 00:28:12,510 --> 00:28:15,350 So you have some div. 656 00:28:15,350 --> 00:28:20,380 And then you want to set its class dynamically. 657 00:28:20,380 --> 00:28:26,692 So you set its class to some var, so on and so forth. 658 00:28:26,692 --> 00:28:28,650 So the idea is that when Django processes this, 659 00:28:28,650 --> 00:28:30,540 it should figure out what the current styling is and then put 660 00:28:30,540 --> 00:28:31,360 it in here. 661 00:28:31,360 --> 00:28:33,850 Well, one thing you can do is maybe 662 00:28:33,850 --> 00:28:38,740 the attacker supplies something like a string like this. 663 00:28:38,740 --> 00:28:41,690 So attacker will say, class 1. 664 00:28:41,690 --> 00:28:45,400 OK, so far so good, because that seems like a valid CSS 665 00:28:45,400 --> 00:28:46,052 expression. 666 00:28:46,052 --> 00:28:47,510 But then the attacker will then try 667 00:28:47,510 --> 00:28:51,210 to put some JavaScript here. 668 00:28:51,210 --> 00:28:57,980 So it might say, onclick equals-- and then put 669 00:28:57,980 --> 00:29:00,400 JavaScript URL. 670 00:29:00,400 --> 00:29:06,740 And then put some function call here. 671 00:29:06,740 --> 00:29:09,650 So this is malformed. 
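A rough Python sketch of why entity escaping alone does not help in this case, again with `html.escape` standing in for the framework's escaping. The helper names and the `foo()` call are hypothetical. The attacker's string contains no angle brackets, ampersands, or quotes, so escaping leaves it unchanged, and in an unquoted attribute the space starts a new attribute.

```python
import html


def render_div_unquoted(css_class: str) -> str:
    # Mirrors a template like <div class={{ var }}> with no quotes around the value.
    return "<div class={}>...</div>".format(html.escape(css_class))


def render_div_quoted(css_class: str) -> str:
    # Same template, but the attribute value is wrapped in double quotes.
    return '<div class="{}">...</div>'.format(html.escape(css_class))


payload = "class1 onclick=javascript:foo()"   # attacker-supplied "class name"

print(render_div_unquoted(payload))
print(render_div_quoted(payload))
```

In the unquoted output the `onclick` text sits where a browser would read it as a separate attribute. In the quoted output it stays inside the class value in this simple case, though quoting should not be read as a complete defense on its own.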
672 00:29:09,650 --> 00:29:12,787 The browser should probably just do a fail-stop here. 673 00:29:12,787 --> 00:29:14,370 But the problem is that if you've ever 674 00:29:14,370 --> 00:29:17,550 looked at the HTML for a real web page, all of it's 675 00:29:17,550 --> 00:29:20,250 broken, even for like legitimate, benevolent sites. 676 00:29:20,250 --> 00:29:21,960 People just can't hack HTML. 677 00:29:21,960 --> 00:29:24,500 So if the browser were to be fail-stop, 678 00:29:24,500 --> 00:29:27,910 literally, no site that you enjoy would ever work ever. 679 00:29:27,910 --> 00:29:29,990 If you ever want to be disappointed by the world 680 00:29:29,990 --> 00:29:32,120 if I haven't helped you do that enough, 681 00:29:32,120 --> 00:29:35,620 open up your JavaScript console when you browse a website 682 00:29:35,620 --> 00:29:37,710 and see how many errors get spit out. 683 00:29:37,710 --> 00:29:41,400 Like, go to CNN and just see how many errors get spit out. 684 00:29:41,400 --> 00:29:44,725 CNN basically kind of works, but it's very disturbing, 685 00:29:44,725 --> 00:29:46,600 because if you were to open up Acrobat reader 686 00:29:46,600 --> 00:29:47,650 and you're just constantly throwing 687 00:29:47,650 --> 00:29:49,220 null pointer exceptions, you would 688 00:29:49,220 --> 00:29:50,390 feel a bit cheated by life. 689 00:29:50,390 --> 00:29:53,072 But in the web, apparently, we've learned to accept this. 690 00:29:53,072 --> 00:29:55,530 So because browsers have to be so tolerant of these things, 691 00:29:55,530 --> 00:29:57,940 they will actually try to massage malformed code 692 00:29:57,940 --> 00:29:59,490 into something that seems reasonable. 693 00:29:59,490 --> 00:30:01,239 And therein lies a security vulnerability. 694 00:30:05,060 --> 00:30:07,120 So I guess the take-home point for this 695 00:30:07,120 --> 00:30:12,666 is that content sanitization kind of works. 696 00:30:12,666 --> 00:30:14,290 So it is literally better than nothing. 
697 00:30:14,290 --> 00:30:17,240 It can actually catch a lot of cases. 698 00:30:17,240 --> 00:30:21,140 But in many cases, it is not a full defense. 699 00:30:21,140 --> 00:30:23,970 And so one thing you might actually think about doing 700 00:30:23,970 --> 00:30:28,200 is-- actually, let's put this over here. 701 00:30:28,200 --> 00:30:36,808 You might think about sort of using a less expressive markup 702 00:30:36,808 --> 00:30:37,308 language. 703 00:30:41,140 --> 00:30:42,540 So what do I mean by that? 704 00:30:42,540 --> 00:30:47,200 So HTML and CSS and JavaScript are Turing complete. 705 00:30:47,200 --> 00:30:49,721 They allow you to do all kinds of fun things, but-- yeah? 706 00:30:49,721 --> 00:30:50,970 AUDIENCE: Sorry to bother you. 707 00:30:50,970 --> 00:30:54,151 When does content sanitization not work? 708 00:30:54,151 --> 00:30:55,400 PROFESSOR: When does content-- 709 00:30:55,400 --> 00:30:57,140 AUDIENCE: In many cases, it doesn't work. 710 00:30:57,140 --> 00:30:57,610 PROFESSOR: Oh, yeah. 711 00:30:57,610 --> 00:30:59,200 So like in this case, for example, 712 00:30:59,200 --> 00:31:01,630 Django will probably not be able to statically determine 713 00:31:01,630 --> 00:31:03,730 this is a bad thing. 714 00:31:03,730 --> 00:31:04,980 Like, in this particular case. 715 00:31:04,980 --> 00:31:09,450 But in the case where I inserted that malformed image tag-- 716 00:31:09,450 --> 00:31:10,374 I basically said-- 717 00:31:10,374 --> 00:31:11,790 AUDIENCE: In that particular case, 718 00:31:11,790 --> 00:31:14,100 I would expect the class= assignment to be 719 00:31:14,100 --> 00:31:17,600 in quotes and then for that thing to not have any effect. 720 00:31:17,600 --> 00:31:20,830 So Django could enforce quotes that [INAUDIBLE].
721 00:31:20,830 --> 00:31:23,330 PROFESSOR: Well, see, there's a little bit trickiness there, 722 00:31:23,330 --> 00:31:28,000 because if we assumed that all pages were written-- well, 723 00:31:28,000 --> 00:31:29,380 pull me back up a little bit. 724 00:31:29,380 --> 00:31:32,010 If we assume the HTML grammar was well specified 725 00:31:32,010 --> 00:31:35,080 and the CSS grammar was well specified 726 00:31:35,080 --> 00:31:36,920 and so on and so forth, then you could 727 00:31:36,920 --> 00:31:39,620 imagine a world in which perfect parsers would 728 00:31:39,620 --> 00:31:42,060 be able to sort of catch these problems 729 00:31:42,060 --> 00:31:44,490 or somehow convert them to normal things. 730 00:31:44,490 --> 00:31:46,940 But in fact, the HTML grammars and the CSS grammars 731 00:31:46,940 --> 00:31:48,860 are not well specified. 732 00:31:48,860 --> 00:31:52,930 And then on top of that, browsers don't implement specs. 733 00:31:52,930 --> 00:31:55,010 So it's like Babushka dolls of terror. 734 00:31:55,010 --> 00:31:57,219 So I mean, this, in fact, gets into this notion here. 735 00:31:57,219 --> 00:31:59,135 Because I think essentially what you're saying 736 00:31:59,135 --> 00:32:01,650 is, well, look, if we have the grammar for something, 737 00:32:01,650 --> 00:32:03,420 that should mean something. 738 00:32:03,420 --> 00:32:05,220 And as it turns out, if you stick 739 00:32:05,220 --> 00:32:09,346 to a less expressive grammar, then it is actually much easier 740 00:32:09,346 --> 00:32:10,470 to do content sanitization. 741 00:32:13,704 --> 00:32:14,620 There's some language. 742 00:32:14,620 --> 00:32:20,740 It's called Markdown instead of markup. 743 00:32:20,740 --> 00:32:21,550 [? Wall, ?] right? 
744 00:32:21,550 --> 00:32:24,310 And so with Markdown, the basic idea 745 00:32:24,310 --> 00:32:27,100 is that it's designed to be a language that 746 00:32:27,100 --> 00:32:29,750 allows, for example, users to submit comments, 747 00:32:29,750 --> 00:32:32,415 but it doesn't actually have things like the blink tag, 748 00:32:32,415 --> 00:32:34,530 and applet support, and stuff like that. 749 00:32:34,530 --> 00:32:36,700 And so in Markdown, it's actually much easier 750 00:32:36,700 --> 00:32:39,360 to do what you suggested, which seems like a reasonable thing 751 00:32:39,360 --> 00:32:40,910 at first glance. 752 00:32:40,910 --> 00:32:43,580 Just define the grammar unambiguously and then just 753 00:32:43,580 --> 00:32:45,880 enforce that grammar. 754 00:32:45,880 --> 00:32:47,930 So it's much easier to do sanitization 755 00:32:47,930 --> 00:32:52,410 in a simple language than in the full-blown HTML, CSS, 756 00:32:52,410 --> 00:32:53,035 and JavaScript. 757 00:32:53,035 --> 00:32:54,630 And in a certain sense, think about it 758 00:32:54,630 --> 00:32:57,680 like the difference between understanding gnarly C 759 00:32:57,680 --> 00:33:00,110 code versus gnarly Python code. 760 00:33:00,110 --> 00:33:01,710 There's actually a big difference 761 00:33:01,710 --> 00:33:04,820 in trying to understand that much more expressive language. 762 00:33:04,820 --> 00:33:06,670 Because it can do many more things. 763 00:33:06,670 --> 00:33:09,427 By constraining expressivity, you oftentimes 764 00:33:09,427 --> 00:33:10,135 improve security. 765 00:33:12,779 --> 00:33:13,820 Does that all make sense? 766 00:33:17,440 --> 00:33:17,940 All right. 767 00:33:17,940 --> 00:33:19,900 So another thing that you can imagine 768 00:33:19,900 --> 00:33:24,270 doing to protect against cross-site scripting attacks 769 00:33:24,270 --> 00:33:31,040 is to use something called CSP, Content Security Policy.
770 00:33:36,750 --> 00:33:40,560 And so the idea behind CSP is that it's going 771 00:33:40,560 --> 00:33:43,295 to allow a web server to-- oh. 772 00:33:43,295 --> 00:33:45,878 AUDIENCE: Yeah, I'm just curious about this Markdown language. 773 00:33:45,878 --> 00:33:50,864 So all browsers know how to parse this language? 774 00:33:50,864 --> 00:33:51,780 PROFESSOR: No, no, no. 775 00:33:51,780 --> 00:33:54,030 So what happens with a lot of these types of languages 776 00:33:54,030 --> 00:33:56,640 is that you essentially-- you can convert them. 777 00:33:56,640 --> 00:34:00,080 You can compile them down to HTML, but they're not natively 778 00:34:00,080 --> 00:34:02,700 understood by the browser, typically. 779 00:34:02,700 --> 00:34:09,120 So in other words, you've got some comment submission system. 780 00:34:09,120 --> 00:34:11,840 It internally expresses stuff in Markdown. 781 00:34:11,840 --> 00:34:13,819 But then before it can be rendered to the page, 782 00:34:13,819 --> 00:34:16,224 it essentially goes to the Markdown compiler. 783 00:34:16,224 --> 00:34:18,265 The Markdown compiler then translates it to HTML. 784 00:34:18,265 --> 00:34:19,017 AUDIENCE: I see. 785 00:34:19,017 --> 00:34:20,508 Thanks. 786 00:34:20,508 --> 00:34:23,725 [INAUDIBLE] Markdown might not be 787 00:34:23,725 --> 00:34:25,300 the best trick [? to use Markdown ?] 788 00:34:25,300 --> 00:34:26,934 [INAUDIBLE]. 789 00:34:26,934 --> 00:34:28,850 PROFESSOR: So Markdown does allow inline HTML. 790 00:34:28,850 --> 00:34:30,750 As far as I know, there's a way to disable 791 00:34:30,750 --> 00:34:31,844 that in the compiler. 792 00:34:31,844 --> 00:34:33,010 I could be wrong about that. 793 00:34:33,010 --> 00:34:34,595 But I believe that there's a flag 794 00:34:34,595 --> 00:34:36,790 you can pass to get rid of it. 795 00:34:36,790 --> 00:34:37,770 But you're correct.
796 00:34:37,770 --> 00:34:39,565 If you use a constrained language 797 00:34:39,565 --> 00:34:42,380 but then you embed an unconstrained language, 798 00:34:42,380 --> 00:34:44,219 then that-- I mean, the terrorists have won. 799 00:34:44,219 --> 00:34:47,360 So you're right about that. 800 00:34:47,360 --> 00:34:47,900 OK. 801 00:34:47,900 --> 00:34:48,400 Yeah. 802 00:34:48,400 --> 00:34:51,480 So another thing you can do to improve security 803 00:34:51,480 --> 00:34:53,550 is this thing called Content Security Policy. 804 00:34:53,550 --> 00:34:57,230 So like I was saying, what this allows the server to do 805 00:34:57,230 --> 00:35:01,140 is to tell a web browser what types of content 806 00:35:01,140 --> 00:35:03,470 can be loaded in the page it's sending back, 807 00:35:03,470 --> 00:35:06,510 and also where that content should come from. 808 00:35:06,510 --> 00:35:10,270 So for example, in an HTTP response, 809 00:35:10,270 --> 00:35:14,060 the server might be able to say something like this. 810 00:35:14,060 --> 00:35:20,640 It'd include the Content Security Policy header. 811 00:35:20,640 --> 00:35:26,390 And then it might say something like the default 812 00:35:26,390 --> 00:35:31,220 source is going to equal self. 813 00:35:34,811 --> 00:35:39,250 And it will also accept things from asterisk mydomain.com. 814 00:35:43,270 --> 00:35:45,860 So what does this mean? 815 00:35:45,860 --> 00:35:50,300 So essentially, the server is saying the content 816 00:35:50,300 --> 00:35:52,680 from this site should only come from whatever 817 00:35:52,680 --> 00:35:56,250 it is that the domain is for the particular page. 818 00:35:56,250 --> 00:36:01,210 And any other subdomain from mydomain.com. 
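The policy just described can be sketched as a header plus a very rough model of the check a browser performs. This Python sketch is an approximation under stated assumptions: the `allowed` function and the example hosts foo.com and bar.com are illustrative, and the real 'self' source also compares scheme and port, which is simplified here to a host comparison.

```python
from urllib.parse import urlsplit

# The response header carrying the policy from the lecture's example.
CSP_HEADER = ("Content-Security-Policy", "default-src 'self' *.mydomain.com")


def allowed(resource_url: str, page_host: str) -> bool:
    """Rough approximation of the browser's check for this one policy."""
    host = urlsplit(resource_url).hostname or ""
    if host == page_host:                    # stands in for 'self' (host only)
        return True
    return host.endswith(".mydomain.com")    # stands in for *.mydomain.com


print(allowed("https://foo.com/app.js", "foo.com"))
print(allowed("https://static.mydomain.com/logo.png", "foo.com"))
print(allowed("https://bar.com/evil.js", "foo.com"))
```

Only the last request falls outside both sources, which is the case where the browser would refuse to load the resource.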
819 00:36:01,210 --> 00:36:02,640 So what that means, basically, is 820 00:36:02,640 --> 00:36:07,745 that let's say if self was bound to foo.com, 821 00:36:07,745 --> 00:36:11,039 let's say, that's the origin of the server that's sending 822 00:36:11,039 --> 00:36:12,330 this thing back to the browser. 823 00:36:12,330 --> 00:36:14,540 So if, somehow, there is a cross-site scripting 824 00:36:14,540 --> 00:36:17,325 attack and the page tried to generate a reference to, 825 00:36:17,325 --> 00:36:19,490 let's say, bar.com, the browser would say, 826 00:36:19,490 --> 00:36:21,960 OK, bar.com is not self. 827 00:36:21,960 --> 00:36:25,695 Bar.com is also not in this sort of set of domains. 828 00:36:25,695 --> 00:36:27,320 So therefore, the browser can just say, 829 00:36:27,320 --> 00:36:31,390 I will not allow that request to go forward. 830 00:36:31,390 --> 00:36:33,960 So this is actually a pretty powerful mechanism. 831 00:36:33,960 --> 00:36:36,450 And you can actually specify more fine-grained controls 832 00:36:36,450 --> 00:36:36,950 here. 833 00:36:36,950 --> 00:36:39,880 You can say, my images should come from here. 834 00:36:39,880 --> 00:36:42,700 My scripts should come from here, so on and so forth. 835 00:36:42,700 --> 00:36:45,150 This is actually pretty nice. 836 00:36:45,150 --> 00:36:46,610 And one nice thing about this, too, 837 00:36:46,610 --> 00:36:50,440 is that it actually prevents inline JavaScript. 838 00:36:50,440 --> 00:36:53,856 So you can't have a script tag and then some literal JavaScript 839 00:36:53,856 --> 00:36:54,730 and a close script tag. 840 00:36:54,730 --> 00:36:57,350 Everything has to come from a script tag with a source. 841 00:36:57,350 --> 00:36:59,850 So it can be validated through this. 842 00:36:59,850 --> 00:37:02,150 And also, a Content Security Policy 843 00:37:02,150 --> 00:37:05,230 prevents these dangerous statements like eval.
844 00:37:05,230 --> 00:37:06,840 So eval basically allows a web page 845 00:37:06,840 --> 00:37:09,200 to execute dynamically generated JavaScript code. 846 00:37:09,200 --> 00:37:13,070 And so if the CSP header is specified, 847 00:37:13,070 --> 00:37:17,720 the browser does not execute evals. 848 00:37:17,720 --> 00:37:19,358 So does that all make sense? 849 00:37:19,358 --> 00:37:21,848 AUDIENCE: So since it's a kind of ad-hoc set of things, 850 00:37:21,848 --> 00:37:26,247 is that like a complete set of things that it [INAUDIBLE]? 851 00:37:26,247 --> 00:37:26,830 PROFESSOR: No. 852 00:37:26,830 --> 00:37:29,490 So there's a whole list of resources 853 00:37:29,490 --> 00:37:31,100 that it actually protects. 854 00:37:31,100 --> 00:37:34,057 So this is sort of like the most blanket type protection 855 00:37:34,057 --> 00:37:34,640 you could get. 856 00:37:34,640 --> 00:37:36,000 But like I said, it actually allows 857 00:37:36,000 --> 00:37:38,230 you to specify, I think, like, where CSS can come from, 858 00:37:38,230 --> 00:37:39,586 like a bunch of different things. 859 00:37:39,586 --> 00:37:41,404 AUDIENCE: But on preventing evals, that seems 860 00:37:41,404 --> 00:37:42,654 like the system's [INAUDIBLE]. 861 00:37:42,654 --> 00:37:45,050 Are there other things [INAUDIBLE]? 862 00:37:45,050 --> 00:37:46,540 PROFESSOR: So yeah, there are. 863 00:37:46,540 --> 00:37:48,540 So there's always this question of completeness. 864 00:37:51,580 --> 00:37:53,610 So for example, eval is not the only way 865 00:37:53,610 --> 00:37:56,240 JavaScript can actually generate code dynamically. 866 00:37:56,240 --> 00:37:58,250 There's the Function constructor, for example. 867 00:37:58,250 --> 00:38:00,250 There's certain ways you can call setTimeout. 868 00:38:00,250 --> 00:38:01,270 You pass in a string. 869 00:38:01,270 --> 00:38:03,000 You can evaluate code that way.
870 00:38:03,000 --> 00:38:05,590 So I believe that CSP actually shuts down those vectors 871 00:38:05,590 --> 00:38:06,410 as well. 872 00:38:06,410 --> 00:38:09,580 But if you're asking, is this provably complete in terms 873 00:38:09,580 --> 00:38:11,746 of what it isolates, no. 874 00:38:11,746 --> 00:38:13,620 And I don't think that any of these solutions 875 00:38:13,620 --> 00:38:16,830 are provably complete. 876 00:38:16,830 --> 00:38:18,830 AUDIENCE: One really interesting thing about CSP 877 00:38:18,830 --> 00:38:21,860 is the fact that you can set it to disallow all inline 878 00:38:21,860 --> 00:38:23,194 [? dom ?] script on a page. 879 00:38:23,194 --> 00:38:23,860 PROFESSOR: Yeah. 880 00:38:23,860 --> 00:38:24,401 That's right. 881 00:38:24,401 --> 00:38:24,901 Yeah, yeah. 882 00:38:24,901 --> 00:38:27,234 AUDIENCE: Which [? helps ?] [INAUDIBLE] to be sanitized. 883 00:38:27,234 --> 00:38:27,910 PROFESSOR: Yeah. 884 00:38:27,910 --> 00:38:32,410 AUDIENCE: [INAUDIBLE] prevents an attacker from-- 885 00:38:32,410 --> 00:38:34,160 PROFESSOR: So that helps with some things. 886 00:38:34,160 --> 00:38:38,400 But that still would allow, like, [INAUDIBLE] to use eval. 887 00:38:38,400 --> 00:38:40,530 So that's why it's important to try to get rid 888 00:38:40,530 --> 00:38:42,340 of all of those dynamically. 889 00:38:42,340 --> 00:38:44,102 All of those interfaces [? use dynamic ?] 890 00:38:44,102 --> 00:38:44,829 code generation. 891 00:38:44,829 --> 00:38:47,245 AUDIENCE: If you list your tag with a source but then also 892 00:38:47,245 --> 00:38:50,260 inline code, is there like standardized [INAUDIBLE] that 893 00:38:50,260 --> 00:38:52,764 all browsers do with-- 894 00:38:52,764 --> 00:38:53,430 PROFESSOR: Yeah. 895 00:38:53,430 --> 00:38:57,960 So what should happen is that the inline code 896 00:38:57,960 --> 00:39:00,320 should be ignored. 
897 00:39:00,320 --> 00:39:02,660 The browser should always get the code 898 00:39:02,660 --> 00:39:03,940 from the source attribute. 899 00:39:03,940 --> 00:39:06,290 I actually don't know if all browsers do that. 900 00:39:06,290 --> 00:39:07,850 I've actually personally experienced 901 00:39:07,850 --> 00:39:10,490 browsers exhibit different behavior [? in that. ?] 902 00:39:10,490 --> 00:39:14,280 This was a couple years ago, so I'm not sure. 903 00:39:14,280 --> 00:39:14,780 And so yeah. 904 00:39:14,780 --> 00:39:17,840 So one thing to keep in mind about doing work 905 00:39:17,840 --> 00:39:19,780 in web security is that in a sense, 906 00:39:19,780 --> 00:39:21,450 it's almost like a natural science. 907 00:39:21,450 --> 00:39:24,180 So it's like people actually propose theories 908 00:39:24,180 --> 00:39:25,390 about how browsers work. 909 00:39:25,390 --> 00:39:26,914 And then you go seeing them do that. 910 00:39:26,914 --> 00:39:28,830 And so that can be a little bit disappointing, 911 00:39:28,830 --> 00:39:31,790 because we're taught, yay, algorithms, and proofs, 912 00:39:31,790 --> 00:39:32,660 and stuff like that. 913 00:39:32,660 --> 00:39:35,940 But these browsers are so ill-behaved that a lot 914 00:39:35,940 --> 00:39:39,440 of times, the answer is maybe or maybe not. 915 00:39:39,440 --> 00:39:42,140 And then [? you go ?] see, as we'll see. 916 00:39:42,140 --> 00:39:43,390 They keep on adding features. 917 00:39:43,390 --> 00:39:44,890 It gets back to your question about, 918 00:39:44,890 --> 00:39:47,530 are these things provably complete? 919 00:39:47,530 --> 00:39:52,260 I think web vendors have punted on this notion of creating 920 00:39:52,260 --> 00:39:54,840 a browser that is provably [INAUDIBLE]. 921 00:39:54,840 --> 00:39:56,345 Basically, what they try to do is 922 00:39:56,345 --> 00:39:58,386 just try to keep one step ahead of the attackers. 
923 00:39:58,386 --> 00:40:00,944 And we'll see some examples of that further in the lecture. 924 00:40:00,944 --> 00:40:01,828 So yeah. 925 00:40:01,828 --> 00:40:04,310 So CSP is actually pretty cool. 926 00:40:04,310 --> 00:40:09,920 Another thing that's useful is that the server can set this 927 00:40:09,920 --> 00:40:16,500 HTTP header called X-Content-Type-Options, 928 00:40:16,500 --> 00:40:19,266 and then can say, nosniff. 929 00:40:19,266 --> 00:40:23,395 And so what this means is that this prevents the browser 930 00:40:23,395 --> 00:40:26,570 from doing some of those, quote, unquote, helpful optimizations, 931 00:40:26,570 --> 00:40:29,595 like we discussed last lecture, where it will say, a-ha, 932 00:40:29,595 --> 00:40:31,936 there's a mismatch between the file extension 933 00:40:31,936 --> 00:40:34,560 and the actual [? bytes ?] that I have sniffed in the contents. 934 00:40:34,560 --> 00:40:35,660 So let me somehow massage this content 935 00:40:35,660 --> 00:40:37,120 to some different thing. 936 00:40:37,120 --> 00:40:39,996 And then all of a sudden, you've given the barbarians the keys 937 00:40:39,996 --> 00:40:40,620 to the kingdom. 938 00:40:40,620 --> 00:40:42,453 So you can set this header to basically say, 939 00:40:42,453 --> 00:40:43,555 browser, do not do that. 940 00:40:43,555 --> 00:40:45,180 And so that can be useful in mitigating 941 00:40:45,180 --> 00:40:48,430 some types of attacks as well. 942 00:40:48,430 --> 00:40:48,930 All right. 943 00:40:48,930 --> 00:40:51,850 So that's kind of a quick survey of some 944 00:40:51,850 --> 00:40:55,352 of these cross-site scripting vulnerabilities. 945 00:40:55,352 --> 00:41:03,030 So now let's look at another popular vector for attacks. 946 00:41:03,030 --> 00:41:08,524 And that vector is going to be SQL. 947 00:41:08,524 --> 00:41:15,440 And so you've probably heard of these SQL injection attacks. 
948 00:41:15,440 --> 00:41:19,910 And so what these attacks do is they take advantage of the fact 949 00:41:19,910 --> 00:41:22,720 that on the back end, for a lot of websites, 950 00:41:22,720 --> 00:41:24,295 there's some type of database. 951 00:41:24,295 --> 00:41:26,170 And so to dynamically construct a page that's 952 00:41:26,170 --> 00:41:27,720 shown to the user, there have to be 953 00:41:27,720 --> 00:41:31,190 some database queries that are issued to that back-end server. 954 00:41:31,190 --> 00:41:37,370 So imagine that you have some query that looked like this. 955 00:41:37,370 --> 00:41:41,850 So you do a SELECT asterisk. 956 00:41:41,850 --> 00:41:45,600 So give me all the values from this query FROM 957 00:41:45,600 --> 00:41:58,970 some particular table, WHERE the User ID field 958 00:41:58,970 --> 00:42:07,820 is equal to something that is specified 959 00:42:07,820 --> 00:42:13,986 over the web from some potentially untrusted source. 960 00:42:13,986 --> 00:42:16,680 So at this point, I think we all know how this story ends. 961 00:42:16,680 --> 00:42:18,430 It ends very badly. There are no survivors. 962 00:42:18,430 --> 00:42:20,990 So basically, if this comes from someone untrusted, 963 00:42:20,990 --> 00:42:23,780 then you can do all kinds of chicanery here. 964 00:42:23,780 --> 00:42:27,380 So one thing you could do is if you want to be a jerk, 965 00:42:27,380 --> 00:42:32,520 you could just set this to the string, 0 966 00:42:32,520 --> 00:42:36,070 and then something like DELETE TABLE. 967 00:42:41,330 --> 00:42:42,270 So what happens here? 968 00:42:42,270 --> 00:42:44,575 So basically, the database server's going to say, 969 00:42:44,575 --> 00:42:48,249 OK, I'll set the user ID to 0;. 970 00:42:48,249 --> 00:42:49,540 Here's a sort of a new command. 971 00:42:49,540 --> 00:42:50,320 DELETE TABLE. 972 00:42:50,320 --> 00:42:52,114 OK, cheers, there goes your table. 973 00:42:52,114 --> 00:42:52,780 And you're done.
974 00:42:52,780 --> 00:42:54,320 And in fact, there was a viral image that 975 00:42:54,320 --> 00:42:55,410 went around a couple years ago. 976 00:42:55,410 --> 00:42:56,993 It's unclear if it was true, like many 977 00:42:56,993 --> 00:42:57,930 of these viral images. 978 00:42:57,930 --> 00:42:59,750 But it was that people in Germany 979 00:42:59,750 --> 00:43:03,830 had license plates that actually said 0; DELETE TABLE. 980 00:43:03,830 --> 00:43:05,000 [LAUGHTER] 981 00:43:05,000 --> 00:43:07,460 Because the idea is that the security cameras, 982 00:43:07,460 --> 00:43:11,160 they would use OCR, Optical Character Recognition, 983 00:43:11,160 --> 00:43:14,270 to figure out what your license plate was, and then put it 984 00:43:14,270 --> 00:43:15,505 in a SQL database. 985 00:43:15,505 --> 00:43:17,365 And there were images floating around. 986 00:43:17,365 --> 00:43:19,250 These Volkswagens, people would have 987 00:43:19,250 --> 00:43:21,586 this as their license plate. 988 00:43:21,586 --> 00:43:22,710 I don't know if that works. 989 00:43:22,710 --> 00:43:23,070 It's funny. 990 00:43:23,070 --> 00:43:24,570 So I like to believe that it's true. 991 00:43:24,570 --> 00:43:26,036 But who knows. 992 00:43:26,036 --> 00:43:27,660 But you get the basic idea behind that. 993 00:43:27,660 --> 00:43:29,860 So once again, the idea is you want 994 00:43:29,860 --> 00:43:33,210 to be sure to sanitize this content that you're getting 995 00:43:33,210 --> 00:43:35,880 from these untrusted sources. 996 00:43:35,880 --> 00:43:38,040 And so note that there may be some sort 997 00:43:38,040 --> 00:43:40,062 of straightforward things that don't quite work. 
998 00:43:40,062 --> 00:43:41,603 So you might think, OK, well then why 999 00:43:41,603 --> 00:43:46,720 can't I just put another quote here and then 1000 00:43:46,720 --> 00:43:49,650 put another quote here such that whatever 1001 00:43:49,650 --> 00:43:52,490 it is that the attacker submits, it's going 1002 00:43:52,490 --> 00:43:54,014 to be enclosed in a string? 1003 00:43:54,014 --> 00:43:56,430 So this doesn't work, because then the attacker can always 1004 00:43:56,430 --> 00:43:59,080 just put a quote inside his or her attack string. 1005 00:43:59,080 --> 00:44:02,150 So a lot of times, these sort of half-hearted hacks 1006 00:44:02,150 --> 00:44:04,680 don't really get you the security you think they might. 1007 00:44:04,680 --> 00:44:10,470 So the solution here is that you need 1008 00:44:10,470 --> 00:44:14,335 to rigorously encode your data. 1009 00:44:19,420 --> 00:44:23,640 And once again, that just means that when you get information 1010 00:44:23,640 --> 00:44:26,360 from an untrusted source, don't just stick it 1011 00:44:26,360 --> 00:44:29,420 in the system sort of as it is. 1012 00:44:29,420 --> 00:44:32,300 Make sure that, for example, it can't actually 1013 00:44:32,300 --> 00:44:34,880 escape from whatever sandbox or whatnot you think 1014 00:44:34,880 --> 00:44:36,390 you're actually putting it into. 1015 00:44:36,390 --> 00:44:40,750 So for example, you want to put in an Escape function that 1016 00:44:40,750 --> 00:44:42,360 would prevent maybe the semicolon 1017 00:44:42,360 --> 00:44:45,850 operator from showing up in a raw form and things like this. 1018 00:44:45,850 --> 00:44:47,350 And so a lot of these web frameworks 1019 00:44:47,350 --> 00:44:52,850 like Django will actually have built-in libraries to do things 1020 00:44:52,850 --> 00:44:54,770 like character escaping for SQL queries 1021 00:44:54,770 --> 00:44:56,600 to try to prevent some of this stuff.
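To make the difference between pasting untrusted input into a query and encoding it concrete, here is a small sketch using Python's built-in sqlite3 module. The table, the rows, and the attack string are all assumptions invented for illustration. Two caveats. First, sqlite3's `execute` only runs one statement per call, so the stacked "0; DELETE TABLE" trick from the board (which in standard SQL would be spelled DROP TABLE) is replaced here by the equally classic `0 OR 1=1`. Second, the safe version uses a bound parameter rather than a hand-written escape function. That is a different technique from character escaping, though it serves the same goal of keeping the input from being parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(0, "alice"), (1, "bob")])

untrusted = "0 OR 1=1"  # hypothetical attacker-controlled input

# Vulnerable: the input becomes part of the SQL text, so "OR 1=1" is parsed
# as SQL and the WHERE clause matches every row.
vulnerable_rows = conn.execute(
    "SELECT * FROM users WHERE user_id = " + untrusted
).fetchall()

# Safer: the input is passed as a bound parameter, so it is compared to
# user_id purely as a data value and never parsed as SQL.
safe_rows = conn.execute(
    "SELECT * FROM users WHERE user_id = ?", (untrusted,)
).fetchall()

print(len(vulnerable_rows), len(safe_rows))
```

The concatenated query returns every user, while the parameterized one returns nothing, because no user_id equals the literal string `0 OR 1=1`.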
1022 00:44:56,600 --> 00:44:58,210 And a lot of these frameworks actually 1023 00:44:58,210 --> 00:45:00,740 encourage developers not to ever directly interface 1024 00:45:00,740 --> 00:45:01,664 with the database. 1025 00:45:01,664 --> 00:45:03,330 So it's like Django itself would provide 1026 00:45:03,330 --> 00:45:06,280 some high-level interface which does sanitization for you. 1027 00:45:06,280 --> 00:45:10,160 It takes care of some of these icky corner cases. 1028 00:45:10,160 --> 00:45:12,011 But performance, performance, performance. 1029 00:45:12,011 --> 00:45:14,010 Sometimes people think that these web frameworks 1030 00:45:14,010 --> 00:45:14,720 are too slow. 1031 00:45:14,720 --> 00:45:16,928 So you will still see, on the back end a lot of the time, 1032 00:45:16,928 --> 00:45:20,590 people will still make these raw SQL queries. 1033 00:45:20,590 --> 00:45:23,330 And that can lead to problems. 1034 00:45:23,330 --> 00:45:25,050 So you can also imagine that there 1035 00:45:25,050 --> 00:45:33,280 are problems if the web server takes in path names 1036 00:45:33,280 --> 00:45:34,840 from untrusted entities. 1037 00:45:34,840 --> 00:45:38,430 So imagine that somewhere in your server, 1038 00:45:38,430 --> 00:45:39,640 you do something like this. 1039 00:45:39,640 --> 00:45:40,644 You have an open call. 1040 00:45:40,644 --> 00:45:42,060 And then you say that you're going 1041 00:45:42,060 --> 00:45:44,355 to read from the WWW directory. 1042 00:45:44,355 --> 00:45:49,500 You're going to read from the images subdirectory in there. 1043 00:45:49,500 --> 00:45:54,020 And then you're going to read from some file name that, once 1044 00:45:54,020 --> 00:45:56,110 again, is supplied by the user. 1045 00:45:56,110 --> 00:45:59,700 So as we saw in some of the discussion of chroot
1046 00:45:59,700 --> 00:46:03,270 and things like this, what if this file name maps 1047 00:46:03,270 --> 00:46:08,475 to something like a bunch of instances of the dot dot 1048 00:46:08,475 --> 00:46:08,975 character? 1049 00:46:13,830 --> 00:46:17,610 So if you're not careful, then the untrusted entity 1050 00:46:17,610 --> 00:46:20,984 can specify basically glub, glub, glub, glub, 1051 00:46:20,984 --> 00:46:22,442 and go down to /etc/passwd and may 1052 00:46:22,442 --> 00:46:24,180 be able to do some evil here. 1053 00:46:24,180 --> 00:46:26,950 So once again, if you want to be able to use the web 1054 00:46:26,950 --> 00:46:29,140 server or the web framework, you need 1055 00:46:29,140 --> 00:46:32,320 to be able to detect these dangerous characters, 1056 00:46:32,320 --> 00:46:34,470 escape them in some way to prevent 1057 00:46:34,470 --> 00:46:38,150 sort of those raw commands from executing. 1058 00:46:38,150 --> 00:46:40,300 So yeah, it's all pretty straightforward. 1059 00:46:40,300 --> 00:46:40,800 OK. 1060 00:46:40,800 --> 00:46:45,760 So let's move on from the discussion of content 1061 00:46:45,760 --> 00:46:49,310 sanitization, and now let's talk a little bit about cookies. 1062 00:46:52,670 --> 00:46:56,290 So cookies are a very popular way 1063 00:46:56,290 --> 00:47:02,620 to do session management, to bind 1064 00:47:02,620 --> 00:47:07,920 the user to some set of resources 1065 00:47:07,920 --> 00:47:10,086 that exist on the server side. 1066 00:47:10,086 --> 00:47:12,460 And so a lot of frameworks like Django, like zoobar 1067 00:47:12,460 --> 00:47:15,300 that you see in this class, they actually 1068 00:47:15,300 --> 00:47:17,520 put a random session ID inside the cookie. 1069 00:47:17,520 --> 00:47:23,630 And so the idea is that this session ID is the index 1070 00:47:23,630 --> 00:47:27,770 into some server-side table. 1071 00:47:27,770 --> 00:47:30,410 So you just supply the session ID there.
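Going back for a moment to the path-name problem from before the cookie discussion started, one way to guard the open call against dot-dot tricks might look like the sketch below. The base directory `/www/images` and the function name `resolve_image_path` are assumptions made up for this example. Also note that instead of escaping dangerous characters, this sketch resolves the final path and checks that it still sits under the images directory. That is a swapped-in technique, not the one written on the board.

```python
import os

BASE_DIR = "/www/images"  # hypothetical directory the server is meant to serve


def resolve_image_path(filename):
    """Return the resolved path for filename, or raise if it leaves BASE_DIR."""
    base = os.path.realpath(BASE_DIR)
    # realpath collapses ".." components and follows any symlinks that exist.
    candidate = os.path.realpath(os.path.join(base, filename))
    # commonpath equals base only when candidate is base or lies beneath it.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the images directory")
    return candidate


print(resolve_image_path("cat.jpg"))
try:
    resolve_image_path("../../etc/passwd")
except ValueError as err:
    print("rejected:", err)
```

Only after this check passes would the server hand the resolved path to open.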
1072 00:47:30,410 --> 00:47:35,120 And this is where your user info lives. 1073 00:47:35,120 --> 00:47:40,690 And so as a result, this session ID and cookies, by extension, 1074 00:47:40,690 --> 00:47:43,034 are very sensitive entities. 1075 00:47:43,034 --> 00:47:44,450 And so that's why a lot of attacks 1076 00:47:44,450 --> 00:47:46,860 involve stealing of the cookie in order 1077 00:47:46,860 --> 00:47:48,480 to get that session ID. 1078 00:47:48,480 --> 00:47:50,600 And so as we discussed in the last lecture, 1079 00:47:50,600 --> 00:47:52,975 the same origin policy can help you, to a certain extent, 1080 00:47:52,975 --> 00:47:55,102 against some of these cookie-stealing attacks, 1081 00:47:55,102 --> 00:47:56,810 because there are origin-based rules that 1082 00:47:56,810 --> 00:48:00,050 prevent arbitrary tampering with cookies. 1083 00:48:00,050 --> 00:48:02,180 But one thing that's a little bit subtle 1084 00:48:02,180 --> 00:48:07,000 is that you shouldn't share a domain or a subdomain 1085 00:48:07,000 --> 00:48:09,510 with someone that you don't trust. 1086 00:48:09,510 --> 00:48:11,460 Because as we discussed in last lecture, 1087 00:48:11,460 --> 00:48:13,500 there are these sort of very subtle rules 1088 00:48:13,500 --> 00:48:17,950 in which two origins with the same domain or possibly 1089 00:48:17,950 --> 00:48:20,500 some subdomain relationship, they can actually 1090 00:48:20,500 --> 00:48:22,540 access each other's cookies. 1091 00:48:22,540 --> 00:48:25,130 And so if you trust a domain that you shouldn't, 1092 00:48:25,130 --> 00:48:28,020 then that domain may be able to do things like directly 1093 00:48:28,020 --> 00:48:32,390 set the session ID in that cookie that both of you 1094 00:48:32,390 --> 00:48:33,680 can access. 1095 00:48:33,680 --> 00:48:35,680 And that can do things like allow the attacker 1096 00:48:35,680 --> 00:48:38,330 to force the user to use a session ID of the attacker's 1097 00:48:38,330 --> 00:48:39,540 choosing. 
1098 00:48:39,540 --> 00:48:41,500 And then, for example-- let's say 1099 00:48:41,500 --> 00:48:46,310 the attacker sets the user's Gmail cookie, let's say. 1100 00:48:46,310 --> 00:48:48,680 The user goes to Gmail, types some emails. 1101 00:48:48,680 --> 00:48:51,030 The attacker, later on, can then use that cookie 1102 00:48:51,030 --> 00:48:53,780 or specifically use that session ID, 1103 00:48:53,780 --> 00:48:55,870 load up Gmail, and then access Gmail 1104 00:48:55,870 --> 00:48:59,040 as if he or she were the user who was victimized. 1105 00:48:59,040 --> 00:49:02,070 So there's a lot of subtleties with using these cookies 1106 00:49:02,070 --> 00:49:03,720 for session management. 1107 00:49:03,720 --> 00:49:06,490 So there's a lot more we could talk about cookies. 1108 00:49:06,490 --> 00:49:08,625 We'll discuss some of it today and last lecture. 1109 00:49:08,625 --> 00:49:11,250 So you might be thinking, well, can we just get rid of cookies? 1110 00:49:11,250 --> 00:49:12,765 Cookies just seem more trouble than they're worth, 1111 00:49:12,765 --> 00:49:15,431 just like [? dribbels. ?] So can we just not have these cookies? 1112 00:49:15,431 --> 00:49:18,920 So one thing you could imagine is you could imagine basically 1113 00:49:18,920 --> 00:49:25,570 having some notion of stateless cookies, 1114 00:49:25,570 --> 00:49:29,445 of somehow getting rid of the notion of sessions altogether 1115 00:49:29,445 --> 00:49:32,631 and preventing this nasty attack vector that 1116 00:49:32,631 --> 00:49:34,880 seems to be sort of prevalent in all these discussions 1117 00:49:34,880 --> 00:49:35,750 that we have. 1118 00:49:35,750 --> 00:49:40,210 So the basic idea here is if you want to go sort of stateless, 1119 00:49:40,210 --> 00:49:41,780 then this essentially means you have 1120 00:49:41,780 --> 00:49:46,285 to authenticate every request. 
1121 00:49:51,060 --> 00:49:52,560 Because the nice thing about cookies 1122 00:49:52,560 --> 00:49:54,820 is that they basically follow you wherever you go. 1123 00:49:54,820 --> 00:49:56,352 So you authenticate once, and then 1124 00:49:56,352 --> 00:49:57,485 every subsequent request you make 1125 00:49:57,485 --> 00:49:58,570 has this little token in it. 1126 00:49:58,570 --> 00:50:01,020 But if you want to get rid of those things, well then now 1127 00:50:01,020 --> 00:50:03,394 you essentially have to have some proof of your authority 1128 00:50:03,394 --> 00:50:05,747 in every request that you make. 1129 00:50:05,747 --> 00:50:07,330 And so one way you could imagine doing 1130 00:50:07,330 --> 00:50:13,580 this is by using something called MACs, or Message 1131 00:50:13,580 --> 00:50:14,600 Authentication Codes. 1132 00:50:18,550 --> 00:50:23,020 And so the basic way to think about one of these MACs, 1133 00:50:23,020 --> 00:50:26,410 it's like a hash that takes in a key as well. 1134 00:50:26,410 --> 00:50:28,440 So the message authentication code 1135 00:50:28,440 --> 00:50:34,290 is the hash of some key and then some message. 1136 00:50:34,290 --> 00:50:39,410 And so the basic idea is that the client, the user, 1137 00:50:39,410 --> 00:50:43,206 and the server are going to share some secret key, k. 1138 00:50:43,206 --> 00:50:48,060 And so the client uses that key to produce a signature 1139 00:50:48,060 --> 00:50:50,550 over the message that it sends to the server. 1140 00:50:50,550 --> 00:50:52,960 And then the server, who also knows the key, 1141 00:50:52,960 --> 00:50:55,710 can then use this same function here 1142 00:50:55,710 --> 00:50:58,090 to validate if a signature is correct. 1143 00:50:58,090 --> 00:50:58,590 OK. 1144 00:50:58,590 --> 00:51:03,650 So let's look at a very specific example of how this works.
1145 00:51:03,650 --> 00:51:05,950 So one real service that uses these types 1146 00:51:05,950 --> 00:51:09,740 of stateless cookies is Amazon Web Services. 1147 00:51:09,740 --> 00:51:11,280 So like S3, for example. 1148 00:51:11,280 --> 00:51:21,950 And so basically, Amazon, AWS, gives each user two things-- 1149 00:51:21,950 --> 00:51:25,530 gives that user a secret key. 1150 00:51:28,950 --> 00:51:31,890 And so this is equivalent to the k 1151 00:51:31,890 --> 00:51:33,420 that we were discussing over there. 1152 00:51:33,420 --> 00:51:37,320 And it also gives them a-- just think of it 1153 00:51:37,320 --> 00:51:40,120 like an AWS user ID. 1154 00:51:42,840 --> 00:51:45,700 So this part is not secret, but this part is. 1155 00:51:45,700 --> 00:51:49,570 And so every time you want to send a request to AWS via HTTP, 1156 00:51:49,570 --> 00:51:51,970 you have to send it in a special format. 1157 00:51:51,970 --> 00:51:57,040 So you'll have the first line of the GET request. 1158 00:51:57,040 --> 00:52:04,680 So you want to access some photos. 1159 00:52:04,680 --> 00:52:07,800 No surprises here. 1160 00:52:07,800 --> 00:52:14,370 And then you will put the host from which 1161 00:52:14,370 --> 00:52:15,585 you expect to get it. 1162 00:52:15,585 --> 00:52:16,710 That's not super important. 1163 00:52:16,710 --> 00:52:19,731 So this is just some AWS server that's there. 1164 00:52:19,731 --> 00:52:20,605 You'll have the date. 1165 00:52:23,160 --> 00:52:28,830 So maybe this is Monday, June 4. 1166 00:52:28,830 --> 00:52:30,470 Whatever. 1167 00:52:30,470 --> 00:52:34,215 And then you have this thing that's essentially 1168 00:52:34,215 --> 00:52:35,390 the Authorization field. 1169 00:52:38,300 --> 00:52:41,620 And this is where the message authentication code comes in. 1170 00:52:41,620 --> 00:52:47,520 So essentially, what this looks like is 1171 00:52:47,520 --> 00:52:51,660 you've got some string here.
1172 00:52:51,660 --> 00:52:56,368 This represents your access ID, the user ID. 1173 00:52:59,510 --> 00:53:03,530 And then you've got something here, 1174 00:53:03,530 --> 00:53:07,470 some other seemingly random letters. 1175 00:53:07,470 --> 00:53:13,350 And then these things are a signature 1176 00:53:13,350 --> 00:53:16,240 that uses this Message Authentication Code here. 1177 00:53:16,240 --> 00:53:20,730 So what does that signature look like? 1178 00:53:20,730 --> 00:53:24,250 So the details are a little bit complicated. 1179 00:53:24,250 --> 00:53:27,630 But basically, this signature is over a string 1180 00:53:27,630 --> 00:53:30,110 that encapsulates a bunch of details of this request. 1181 00:53:30,110 --> 00:53:36,500 So essentially, the string that's signed 1182 00:53:36,500 --> 00:53:37,950 looks something like this. 1183 00:53:37,950 --> 00:53:43,360 So you put the HTTP verb in there. 1184 00:53:43,360 --> 00:53:47,280 So in this case, that verb is GET. 1185 00:53:47,280 --> 00:53:53,376 And then you put an MD5 checksum 1186 00:53:53,376 --> 00:53:55,100 of the message content. 1187 00:53:57,612 --> 00:54:01,335 And then you also put the content type. 1188 00:54:01,335 --> 00:54:03,760 So it's html or image or whatever. 1189 00:54:03,760 --> 00:54:04,835 And put in the date. 1190 00:54:07,575 --> 00:54:14,915 And then the resource name, which is essentially the path 1191 00:54:14,915 --> 00:54:16,180 that you see over here. 1192 00:54:16,180 --> 00:54:19,650 So in other words, this string here 1193 00:54:19,650 --> 00:54:26,620 is the message that you pass into the HMAC over here. 1194 00:54:26,620 --> 00:54:32,230 And so note that the server can see all this stuff 1195 00:54:32,230 --> 00:54:34,501 in clear text in the request. 1196 00:54:34,501 --> 00:54:36,000 And so that's what allows the server 1197 00:54:36,000 --> 00:54:38,270 to validate that that message authentication 1198 00:54:38,270 --> 00:54:39,850 code was correct.
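The signing and checking steps described above can be sketched with Python's standard hmac module. This is a simplified illustration and not AWS's actual signing code. The secret key, the access ID, the resource path, the newline-joined layout of the string, and the choice of SHA-256 inside the HMAC are all assumptions picked for the sketch. The field order (verb, content checksum, content type, date, resource) follows the board.

```python
import base64
import hashlib
import hmac


def string_to_sign(verb, content_md5, content_type, date, resource):
    """Concatenate the request details that the signature will cover."""
    return "\n".join([verb, content_md5, content_type, date, resource])


def sign(secret_key, message):
    """Compute a keyed hash (HMAC) of the message, base64-encoded."""
    digest = hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(secret_key, message, signature):
    """Server side: recompute the MAC with the shared key and compare."""
    return hmac.compare_digest(sign(secret_key, message), signature)


# Hypothetical credentials: the ID travels in the clear, the key never does.
access_id = "HYPOTHETICAL_ACCESS_ID"
secret_key = b"hypothetical-secret-key"

# A GET has no body, so the checksum and content type are left empty here.
message = string_to_sign("GET", "", "", "Monday, June 4", "/photos/cat.jpg")
signature = sign(secret_key, message)
authorization = f"{access_id}:{signature}"
print("Authorization:", authorization)
```

The server looks up the secret key by the access ID in the Authorization field, rebuilds the same string from the clear-text request, and calls `verify`. Changing any signed field, or signing with a different key, makes the check fail.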
1199 00:54:39,850 --> 00:54:43,965 Because note that the server shares that key with the user. 1200 00:54:43,965 --> 00:54:46,340 So that allows the server to validate that kind of stuff. 1201 00:54:46,340 --> 00:54:49,065 So does that all make sense? 1202 00:54:49,065 --> 00:54:51,010 AUDIENCE: [INAUDIBLE]? 1203 00:54:51,010 --> 00:54:51,770 PROFESSOR: Oh. 1204 00:54:51,770 --> 00:54:53,145 So in this case, for the content, 1205 00:54:53,145 --> 00:54:55,603 that's probably going to be nothing, like the empty string. 1206 00:54:55,603 --> 00:54:58,670 But you can imagine there's like a POST or something like that. 1207 00:54:58,670 --> 00:55:00,900 You'd actually have the data of the HTTP. 1208 00:55:00,900 --> 00:55:04,170 AUDIENCE: [INAUDIBLE] which is kind of an unfortunate 1209 00:55:04,170 --> 00:55:05,130 choice nowadays. 1210 00:55:05,130 --> 00:55:07,060 PROFESSOR: So I believe that they do. 1211 00:55:07,060 --> 00:55:10,194 So I checked the Amazon documentation yesterday. 1212 00:55:10,194 --> 00:55:11,360 So I believe they do use it. 1213 00:55:11,360 --> 00:55:13,734 But I think-- I could be wrong, but I think they actually 1214 00:55:13,734 --> 00:55:16,056 use a stronger hash here. 1215 00:55:16,056 --> 00:55:17,180 So that helps a little bit. 1216 00:55:17,180 --> 00:55:19,584 But you're right. MD5 is not the best. 1217 00:55:19,584 --> 00:55:22,686 AUDIENCE: [INAUDIBLE] this works. 1218 00:55:22,686 --> 00:55:23,380 PROFESSOR: OK. 1219 00:55:23,380 --> 00:55:26,060 So allow me to help you, hopefully, 1220 00:55:26,060 --> 00:55:27,824 even though I'm the guy who confused you 1221 00:55:27,824 --> 00:55:28,615 in the first place. 1222 00:55:28,615 --> 00:55:32,180 So the basic idea is that we want 1223 00:55:32,180 --> 00:55:34,830 to get rid of this notion of this persistent cookie that's 1224 00:55:34,830 --> 00:55:37,100 always following the user around.
1225 00:55:37,100 --> 00:55:39,125 Now, the problem, though, is that the server 1226 00:55:39,125 --> 00:55:43,330 needs some way to identify which client it's talking to. 1227 00:55:43,330 --> 00:55:44,930 So what we're going to do is we're 1228 00:55:44,930 --> 00:55:49,160 going to ensure that each client shares 1229 00:55:49,160 --> 00:55:52,470 a unique key with the server. 1230 00:55:52,470 --> 00:55:56,260 And so basically, whenever the client sends a message 1231 00:55:56,260 --> 00:55:58,500 to the server, the client is going 1232 00:55:58,500 --> 00:56:00,965 to send the message before and then also send 1233 00:56:00,965 --> 00:56:03,340 this special cryptographic operation, 1234 00:56:03,340 --> 00:56:04,780 the result of this operation here. 1235 00:56:04,780 --> 00:56:08,394 AUDIENCE: Oh, OK. [INAUDIBLE] and then again, you hash it. 1236 00:56:08,394 --> 00:56:09,060 PROFESSOR: Yeah. 1237 00:56:09,060 --> 00:56:11,280 So basically, to first approximation, 1238 00:56:11,280 --> 00:56:13,000 like imagine in the regular world, 1239 00:56:13,000 --> 00:56:14,932 like, this would be some cookie here 1240 00:56:14,932 --> 00:56:16,140 instead of the authorization. 1241 00:56:16,140 --> 00:56:18,556 But now we're getting rid of the cookie, and we're saying, 1242 00:56:18,556 --> 00:56:20,060 here's this clear text message. 1243 00:56:20,060 --> 00:56:21,880 And then here's this crypto thing, 1244 00:56:21,880 --> 00:56:24,160 which basically allows the server to figure out 1245 00:56:24,160 --> 00:56:25,990 who this thing came from. 1246 00:56:25,990 --> 00:56:30,460 And so the server knows who the user is, because that's 1247 00:56:30,460 --> 00:56:31,656 embedded in the clear. 1248 00:56:31,656 --> 00:56:32,780 That's not a secret, right? 
1249 00:56:32,780 --> 00:56:34,490 But this basically allows the server 1250 00:56:34,490 --> 00:56:37,005 to say, aha, I know which secret key this user 1251 00:56:37,005 --> 00:56:39,795 should have been using to create this if that 1252 00:56:39,795 --> 00:56:41,345 is, in fact, the real user. 1253 00:56:41,345 --> 00:56:42,706 AUDIENCE: Nice. 1254 00:56:42,706 --> 00:56:43,205 OK. 1255 00:56:43,205 --> 00:56:43,929 Thanks. 1256 00:56:43,929 --> 00:56:46,470 AUDIENCE: So what prevents the attacker from finding the key? 1257 00:56:46,470 --> 00:56:47,692 Where is this secret key? 1258 00:56:47,692 --> 00:56:49,358 PROFESSOR: Yeah, that's a good question. 1259 00:56:49,358 --> 00:56:54,770 So in a lot of cases, the client for AWS 1260 00:56:54,770 --> 00:56:58,150 is not a browser, but some VM running in the cloud, 1261 00:56:58,150 --> 00:56:59,040 for example. 1262 00:56:59,040 --> 00:57:02,200 So you'll see sort of just VM and VM communication. 1263 00:57:02,200 --> 00:57:06,110 You can also imagine, too, that users 1264 00:57:06,110 --> 00:57:10,990 can sort of hand out these links or embed them somehow in HTML. 1265 00:57:10,990 --> 00:57:16,750 So it's like you just have sort of this-- inside the HTML 1266 00:57:16,750 --> 00:57:19,240 or JavaScript source code, you'd have the code 1267 00:57:19,240 --> 00:57:20,549 to create a request like this. 1268 00:57:20,549 --> 00:57:22,590 So that's almost like me giving you a capability. 1269 00:57:22,590 --> 00:57:24,250 So if I give you one of these things, 1270 00:57:24,250 --> 00:57:27,126 you can make that request on my behalf, basically. 1271 00:57:27,126 --> 00:57:28,835 AUDIENCE: So would it be possible to use 1272 00:57:28,835 --> 00:57:32,888 MACs on the normal clients [INAUDIBLE]? 1273 00:57:32,888 --> 00:57:35,000 PROFESSOR: For a normal-- you mean like browsers? 1274 00:57:35,000 --> 00:57:36,470 AUDIENCE: For normal users. 
1275 00:57:36,470 --> 00:57:38,020 PROFESSOR: Well, I mean, you get into these questions 1276 00:57:38,020 --> 00:57:39,180 like, where does the key live, which 1277 00:57:39,180 --> 00:57:40,930 was [? kind of like what ?] he was asking. 1278 00:57:40,930 --> 00:57:43,712 So in a certain extent, the issue of where the key lives 1279 00:57:43,712 --> 00:57:45,170 is actually super, super important. 1280 00:57:45,170 --> 00:57:48,010 Because if the key can be stolen just as easily as the cookie, 1281 00:57:48,010 --> 00:57:50,544 well then we're sort of back to square one. 1282 00:57:50,544 --> 00:57:52,460 So in many cases, this stuff is actually just, 1283 00:57:52,460 --> 00:57:54,200 as I said, sort of server to server, 1284 00:57:54,200 --> 00:57:56,270 like a VM to VM somewhere in the cloud. 1285 00:57:56,270 --> 00:57:58,185 So the application developer runs 1286 00:57:58,185 --> 00:58:03,518 a VM that sort of outsources a bunch of stored stuff to AWS. 1287 00:58:03,518 --> 00:58:06,500 AUDIENCE: So do you think [INAUDIBLE] 1288 00:58:06,500 --> 00:58:09,979 but isn't that kind of like a bad way of preventing-- 1289 00:58:09,979 --> 00:58:11,457 I mean, they have network latency, 1290 00:58:11,457 --> 00:58:13,623 so it can't be like too fine-grained of a constraint 1291 00:58:13,623 --> 00:58:14,835 that they're putting on. 1292 00:58:14,835 --> 00:58:17,251 If an attacker sends the same request again really quickly 1293 00:58:17,251 --> 00:58:20,120 after the user, wouldn't they be able to [INAUDIBLE]? 1294 00:58:20,120 --> 00:58:21,640 PROFESSOR: Yeah, yeah, yeah. 1295 00:58:21,640 --> 00:58:24,850 So suffice it to say that secure timestamping is 1296 00:58:24,850 --> 00:58:27,540 like several people's PhD theses. 1297 00:58:27,540 --> 00:58:31,650 But you're exactly right that if this-- just as a crude example. 1298 00:58:31,650 --> 00:58:35,370 So imagine this just said, Monday, June 4. 
1299 00:58:35,370 --> 00:58:38,470 Then if, somehow, the attacker could get access 1300 00:58:38,470 --> 00:58:40,040 to this entire thing and there was 1301 00:58:40,040 --> 00:58:41,860 nothing that was different about it-- 1302 00:58:41,860 --> 00:58:44,370 so there was no nonces, no random stuff like that, 1303 00:58:44,370 --> 00:58:45,120 then that's right. 1304 00:58:45,120 --> 00:58:48,770 Then that request could be [INAUDIBLE]. 1305 00:58:48,770 --> 00:58:51,270 Now, one thing that AWS actually does 1306 00:58:51,270 --> 00:58:53,145 is you can actually include an expiration 1307 00:58:53,145 --> 00:58:55,610 date in these things. 1308 00:58:55,610 --> 00:58:58,820 So one thing you can actually do is 1309 00:58:58,820 --> 00:59:05,670 add sort of an Expires field, essentially, 1310 00:59:05,670 --> 00:59:07,097 have that thing be signed. 1311 00:59:07,097 --> 00:59:09,680 Then I can hand that reference to a bunch of different people. 1312 00:59:09,680 --> 00:59:11,930 Kind of like I was saying in response to his question, 1313 00:59:11,930 --> 00:59:13,730 it acts as a capability. 1314 00:59:13,730 --> 00:59:15,770 The server can then check that expiration date 1315 00:59:15,770 --> 00:59:18,200 from when it actually sees it and then not actually-- 1316 00:59:18,200 --> 00:59:19,658 AUDIENCE: But even if the expiration date 1317 00:59:19,658 --> 00:59:21,949 is like 200 milliseconds in the future or something, as 1318 00:59:21,949 --> 00:59:24,518 long as the attacker has [INAUDIBLE] latency, 1319 00:59:24,518 --> 00:59:27,920 then they might send two [INAUDIBLE] 1320 00:59:27,920 --> 00:59:29,684 two copies instead of one. 1321 00:59:29,684 --> 00:59:30,350 PROFESSOR: Yeah. 1322 00:59:30,350 --> 00:59:31,010 That's exactly right. 1323 00:59:31,010 --> 00:59:31,884 That's exactly right.
1324 00:59:31,884 --> 00:59:35,090 So yeah, if the attacker can somehow-- 1325 00:59:35,090 --> 00:59:36,810 like a network attacker, for example, 1326 00:59:36,810 --> 00:59:39,480 is seeing these things go over the wire-- and you're right. 1327 00:59:39,480 --> 00:59:43,379 If there's enough wiggle room in the expiration date, 1328 00:59:43,379 --> 00:59:44,920 then they can exactly do that attack. 1329 00:59:44,920 --> 00:59:47,810 That's right. 1330 00:59:47,810 --> 00:59:49,030 OK. 1331 00:59:49,030 --> 00:59:54,260 So that is an overview of how these stateless cookies work. 1332 00:59:54,260 --> 00:59:55,900 And so one question that's interesting 1333 00:59:55,900 --> 00:59:58,790 is you might think, well, what does it mean to log out 1334 00:59:58,790 --> 01:00:00,670 with this type of cookie? 1335 01:00:00,670 --> 01:00:03,455 And the answer is that you don't really log out. 1336 01:00:03,455 --> 01:00:04,880 I mean, you have this key. 1337 01:00:04,880 --> 01:00:08,210 And so whenever you want to send a request, you just send it. 1338 01:00:08,210 --> 01:00:10,254 You include this dude right here, 1339 01:00:10,254 --> 01:00:11,420 and then you're ready to go. 1340 01:00:11,420 --> 01:00:13,419 Now, one thing the server could do, for example, 1341 01:00:13,419 --> 01:00:15,660 though, is revoke your key. 1342 01:00:15,660 --> 01:00:17,100 So the server revokes your key. 1343 01:00:17,100 --> 01:00:18,862 Then you can generate one of these things. 1344 01:00:18,862 --> 01:00:20,570 But when you send the message over there, 1345 01:00:20,570 --> 01:00:23,320 the server's going to say, aha, I know what your user ID is. 1346 01:00:23,320 --> 01:00:26,310 You've been revoked, so I'm not going to honor your request. 1347 01:00:26,310 --> 01:00:27,894 But it's a little bit interesting. 
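The signed request with an Expires field, and the server-side revocation check just described, can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the field layout, the `SECRET_KEYS` table, the `REVOKED` set, and the function names are all hypothetical, and this is not the real AWS signing format.

```python
import hashlib
import hmac

# Hypothetical server-side key table: user ID -> shared secret key.
SECRET_KEYS = {"alice": b"alice-secret-key"}
# Hypothetical revocation list: user IDs whose keys are no longer honored.
REVOKED = set()


def sign_request(user_id: str, key: bytes, resource: str, expires: int) -> str:
    """Client side: compute a MAC over the request fields, including the expiry."""
    message = f"{user_id}\n{resource}\n{expires}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_request(user_id: str, resource: str, expires: int,
                   signature: str, now: int) -> bool:
    """Server side: look up the key by user ID, then check revocation, expiry, and MAC."""
    key = SECRET_KEYS.get(user_id)
    if key is None or user_id in REVOKED:
        return False  # unknown or revoked user: refuse even a well-formed MAC
    if now > expires:
        return False  # the signed expiration time has passed
    expected = sign_request(user_id, key, resource, expires)
    # Constant-time comparison, so the check itself does not leak timing information.
    return hmac.compare_digest(expected, signature)
```

Note that in this sketch the same signed request verifies any number of times until it expires, which is exactly the replay window raised in the question above. Closing that window would take something extra, such as a nonce that the server remembers.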
1348 01:00:27,894 --> 01:00:30,060 And revocation, as we'll talk more about with things 1349 01:00:30,060 --> 01:00:34,390 like SSL, is always a tricky issue, because as it turns out, 1350 01:00:34,390 --> 01:00:36,101 taking authority away from human users 1351 01:00:36,101 --> 01:00:38,225 is often much more difficult than giving it to them 1352 01:00:38,225 --> 01:00:40,550 in the first place. 1353 01:00:40,550 --> 01:00:44,795 So that's the basic idea behind these sort 1354 01:00:44,795 --> 01:00:46,971 of stateless cookies. 1355 01:00:46,971 --> 01:00:50,650 So there's also a couple other things 1356 01:00:50,650 --> 01:00:54,030 that you can use if you want to avoid 1357 01:00:54,030 --> 01:00:58,740 traditional cookies for implementing authentication. 1358 01:00:58,740 --> 01:01:00,572 So one thing you can imagine doing 1359 01:01:00,572 --> 01:01:17,870 is actually using DOM storage to hold client-side authentication 1360 01:01:17,870 --> 01:01:19,130 information. 1361 01:01:19,130 --> 01:01:21,713 This says "alternatives" in case you couldn't read that. 1362 01:01:21,713 --> 01:01:24,240 So one thing you could do is to use DOM storage 1363 01:01:24,240 --> 01:01:26,500 to hold some of that session state 1364 01:01:26,500 --> 01:01:28,729 that you would ordinarily put inside of a cookie. 1365 01:01:28,729 --> 01:01:30,270 So if you remember from last lecture, 1366 01:01:30,270 --> 01:01:33,255 DOM storage is essentially a key-value interface 1367 01:01:33,255 --> 01:01:36,050 that the browser provides to each origin. 1368 01:01:36,050 --> 01:01:39,050 So you can say GET and PUT, and both the key and the value 1369 01:01:39,050 --> 01:01:41,654 are strings. So you could imagine putting authentication 1370 01:01:41,654 --> 01:01:42,445 stuff inside there.
1371 01:01:42,445 --> 01:01:47,060 Now, the nice thing about this is that DOM storage actually 1372 01:01:47,060 --> 01:01:49,777 has much less wacky rules with respect 1373 01:01:49,777 --> 01:01:50,860 to the same origin policy. 1374 01:01:50,860 --> 01:01:52,605 So with cookies, you could do 1375 01:01:52,605 --> 01:01:55,040 all these tricks with subdomains and stuff like that. 1376 01:01:55,040 --> 01:01:56,410 It got kind of weird. 1377 01:01:56,410 --> 01:02:00,036 DOM storage is actually strictly tied to a single origin. 1378 01:02:00,036 --> 01:02:01,910 You can't do any of this subdomain expansion, 1379 01:02:01,910 --> 01:02:03,070 all that kind of stuff. 1380 01:02:03,070 --> 01:02:05,610 And so frameworks like Meteor use DOM storage 1381 01:02:05,610 --> 01:02:06,510 for this very reason. 1382 01:02:06,510 --> 01:02:08,760 But now, note that if you want to store authentication 1383 01:02:08,760 --> 01:02:10,450 information in DOM storage, then you 1384 01:02:10,450 --> 01:02:13,460 have to write JavaScript code yourself to actually pass 1385 01:02:13,460 --> 01:02:15,660 that authentication information to the server 1386 01:02:15,660 --> 01:02:17,451 to do the encryption that's necessary 1387 01:02:17,451 --> 01:02:19,320 and so on and so forth. 1388 01:02:19,320 --> 01:02:20,700 So that's one thing you could do. 1389 01:02:20,700 --> 01:02:24,300 Another thing you could do is actually 1390 01:02:24,300 --> 01:02:28,360 use client-side certificates. 1391 01:02:28,360 --> 01:02:34,510 So for example, like an X.509 format. 1392 01:02:34,510 --> 01:02:37,240 And so what's nice about these certificates 1393 01:02:37,240 --> 01:02:40,480 is that, basically, JavaScript has no explicit interface 1394 01:02:40,480 --> 01:02:41,680 to access these things.
1395 01:02:41,680 --> 01:02:44,233 So unlike cookies, where there's always this arms race 1396 01:02:44,233 --> 01:02:47,646 to find these weird same-origin bugs, 1397 01:02:47,646 --> 01:02:50,830 there's no explicit JavaScript interface for that stuff. 1398 01:02:50,830 --> 01:02:53,312 So that's very nice from a security perspective. 1399 01:02:53,312 --> 01:02:55,020 One problem that I mentioned very briefly 1400 01:02:55,020 --> 01:02:57,145 that we'll look at in more detail in later lectures 1401 01:02:57,145 --> 01:02:58,940 is that the revocation story 1402 01:02:58,940 --> 01:03:00,140 is kind of hard for these. 1403 01:03:00,140 --> 01:03:02,904 So once a user leaves your organization, 1404 01:03:02,904 --> 01:03:04,570 how do you take back their certificates? 1405 01:03:04,570 --> 01:03:07,380 And it becomes a little bit tricky. 1406 01:03:07,380 --> 01:03:10,014 Also, these things don't have great usability, 1407 01:03:10,014 --> 01:03:12,180 because who wants to install a bunch of certificates 1408 01:03:12,180 --> 01:03:13,734 for each site that you go to? 1409 01:03:13,734 --> 01:03:15,900 So as a result, these things have a lot of friction, 1410 01:03:15,900 --> 01:03:18,785 and these are not super popular except for in companies 1411 01:03:18,785 --> 01:03:20,910 or organizations that are super security-conscious. 1412 01:03:24,950 --> 01:03:25,460 All right. 1413 01:03:25,460 --> 01:03:29,860 So that concludes our discussion of cookies. 1414 01:03:29,860 --> 01:03:38,130 And so now let's talk about protocol vulnerabilities 1415 01:03:38,130 --> 01:03:39,340 in the web stack. 1416 01:03:45,489 --> 01:03:51,520 And so one kind of interesting attack 1417 01:03:51,520 --> 01:03:55,620 is that there are all these bugs in the way 1418 01:03:55,620 --> 01:03:58,222 that different browser components parse URLs, 1419 01:03:58,222 --> 01:03:59,578 for example. 1420 01:03:59,578 --> 01:04:05,710 So how can URL parsing get us into trouble?
1421 01:04:05,710 --> 01:04:10,220 So suppose that we have a URL that looks like this. 1422 01:04:10,220 --> 01:04:13,080 It starts with http://example.com. 1423 01:04:18,150 --> 01:04:22,620 And then it's got an explicit port that specifies that it's 80. 1424 01:04:22,620 --> 01:04:24,710 And then for some unknown reason, 1425 01:04:24,710 --> 01:04:30,380 it embeds this @ character here. 1426 01:04:30,380 --> 01:04:34,090 So the question is, well, what is the origin 1427 01:04:34,090 --> 01:04:37,230 of this particular URL? 1428 01:04:37,230 --> 01:04:41,710 So as it turns out, Flash would 1429 01:04:41,710 --> 01:04:49,060 say that the host name portion of this was example.com. 1430 01:04:49,060 --> 01:04:55,040 However, when the browser would parse this, 1431 01:04:55,040 --> 01:04:58,980 it would say that the host name part of the origin 1432 01:04:58,980 --> 01:05:00,680 was actually foo.com. 1433 01:05:00,680 --> 01:05:03,220 So this is clearly a bad thing, because once you 1434 01:05:03,220 --> 01:05:05,490 have two different entities who are confused 1435 01:05:05,490 --> 01:05:07,540 about the origin of the same resource, 1436 01:05:07,540 --> 01:05:10,220 then you can get into all kinds of nasty problems. 1437 01:05:10,220 --> 01:05:14,717 So for example, the Flash code can 1438 01:05:14,717 --> 01:05:17,530 be malicious, can download stuff from example.com. 1439 01:05:17,530 --> 01:05:19,510 If it was embedded in the page from foo.com, 1440 01:05:19,510 --> 01:05:22,245 it could then do some evil things there. 1441 01:05:22,245 --> 01:05:23,870 It could take some code from example.com 1442 01:05:23,870 --> 01:05:25,460 and run it with the authority of foo.com. 1443 01:05:25,460 --> 01:05:27,210 So there's a lot of complex parsing rules 1444 01:05:27,210 --> 01:05:29,690 like that that make life very difficult. 1445 01:05:29,690 --> 01:05:31,085 This is a continuing theme.
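The disagreement between the two parsers can be reproduced in a few lines. This is a rough Python sketch, not the actual Flash or browser code: the exact URL string is an assumption reconstructed from the description above (the board is not in the transcript), `urllib.parse` stands in for the browser's reading, and `naive_hostname` is a hypothetical buggy parser playing the role of Flash.

```python
from urllib.parse import urlsplit

# Assumed form of the URL on the board: host, explicit port 80, then an @ character.
url = "http://example.com:80@foo.com/"

# Standard reading: everything before '@' in the authority is userinfo
# (username:password), and the real host comes after the '@'.
parts = urlsplit(url)
print(parts.username)  # example.com
print(parts.password)  # 80
print(parts.hostname)  # foo.com


def naive_hostname(u: str) -> str:
    """Hypothetical buggy parser: the host is whatever follows '//' up to the first ':' or '/'."""
    authority = u.split("//", 1)[1]
    for i, ch in enumerate(authority):
        if ch in ":/":
            return authority[:i]
    return authority


print(naive_hostname(url))  # example.com
```

Two components that each apply one of these readings to the same string will assign the resource to different origins, which is the confusion the attack relies on.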
1446 01:05:31,085 --> 01:05:34,650 Like, as we just saw with the content sanitization-- 1447 01:05:34,650 --> 01:05:37,150 so the basic idea is that it's oftentimes much better 1448 01:05:37,150 --> 01:05:39,534 to have simpler parsing rules for this kind of stuff. 1449 01:05:39,534 --> 01:05:41,200 It's difficult to do that in retrospect, 1450 01:05:41,200 --> 01:05:43,380 though, because HTML's already out there. 1451 01:05:43,380 --> 01:05:45,790 So all aboard the wam-bulance. 1452 01:05:45,790 --> 01:05:49,010 So this next one, this is actually 1453 01:05:49,010 --> 01:05:52,215 my all-time favorite security vulnerability. 1454 01:05:54,800 --> 01:05:59,730 So it basically attacks the way that the browser 1455 01:05:59,730 --> 01:06:03,370 handles JAR files, basically Java applets. 1456 01:06:03,370 --> 01:06:06,500 So in 2007, I think-- yeah, 2007. 1457 01:06:06,500 --> 01:06:08,170 So lifehacker.com-- great website 1458 01:06:08,170 --> 01:06:09,280 if you haven't been to it. 1459 01:06:09,280 --> 01:06:11,820 Lifehacker.com basically explained 1460 01:06:11,820 --> 01:06:18,330 how you can embed ZIP files inside of images. 1461 01:06:18,330 --> 01:06:20,190 Now, it's not quite clear who you're trying 1462 01:06:20,190 --> 01:06:21,590 to hide from by doing this. 1463 01:06:21,590 --> 01:06:24,210 But Lifehacker says you can do it, so hurray. 1464 01:06:24,210 --> 01:06:28,480 So basically, what they take advantage of is the fact 1465 01:06:28,480 --> 01:06:33,200 that if you look at image formats like GIF, for example, 1466 01:06:33,200 --> 01:06:35,530 typically, the way the parser works is the parser 1467 01:06:35,530 --> 01:06:37,807 works from the top, down. 1468 01:06:37,807 --> 01:06:39,390 So it finds information in the header. 1469 01:06:39,390 --> 01:06:42,710 And then it sort of computes on the rest of the bits here.
1470 01:06:42,710 --> 01:06:48,330 Now, what was interesting is that, as it turns out, 1471 01:06:48,330 --> 01:06:51,270 programs which manipulate ZIP files typically 1472 01:06:51,270 --> 01:06:53,215 work from the bottom up. 1473 01:06:53,215 --> 01:06:56,250 So they find some information in the footer of the file. 1474 01:06:56,250 --> 01:06:59,400 Then they work up to try to extract what's inside of it. 1475 01:06:59,400 --> 01:07:01,030 So what Lifehacker basically said 1476 01:07:01,030 --> 01:07:05,600 is that if you wanted to hide a ZIP file on an image-hosting site 1477 01:07:05,600 --> 01:07:11,130 or something like this, then you could actually post a GIF there 1478 01:07:11,130 --> 01:07:13,290 that has this ZIP file here. 1479 01:07:13,290 --> 01:07:17,170 It will pass all the validation checks on Flickr or whatever 1480 01:07:17,170 --> 01:07:18,160 as an image. 1481 01:07:18,160 --> 01:07:20,850 It will actually display as an image in your browser. 1482 01:07:20,850 --> 01:07:23,582 Aha, but only you know the hidden truth, 1483 01:07:23,582 --> 01:07:27,380 that if you take this file here, you can pass it to unzip, 1484 01:07:27,380 --> 01:07:30,364 and it will unzip [INAUDIBLE] information there. 1485 01:07:30,364 --> 01:07:32,780 OK, fine, this seems like it's sort of like a cheap parlor 1486 01:07:32,780 --> 01:07:33,279 trick. 1487 01:07:33,279 --> 01:07:34,456 OK, that's nice. 1488 01:07:34,456 --> 01:07:36,080 Now, attackers, of course, never sleep, 1489 01:07:36,080 --> 01:07:37,371 and they want to ruin our life. 1490 01:07:37,371 --> 01:07:38,660 So what did they realize? 1491 01:07:38,660 --> 01:07:45,140 They realized that JAR files are basically derivatives 1492 01:07:45,140 --> 01:07:48,700 of the .ZIP format.
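The top-down versus bottom-up parsing described above can be demonstrated with a small Python sketch. It makes some simplifying assumptions: the "image" is only a GIF signature followed by filler bytes rather than a displayable picture, the archive is an ordinary ZIP rather than a real JAR, and the file name `secret.txt` is made up for illustration.

```python
import io
import zipfile

# Stand-in for a real image: just the 6-byte GIF signature plus filler bytes.
gif_bytes = b"GIF89a" + b"\x00" * 32

# Build a small ZIP archive in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("secret.txt", "hidden payload")

# Concatenate: image first, archive second.
combined = gif_bytes + zip_buffer.getvalue()

# A top-down check that only inspects the header sees a GIF.
looks_like_gif = combined[:6] in (b"GIF87a", b"GIF89a")

# A ZIP reader locates the directory at the end of the file and works
# upward from there, so the leading image bytes do not get in its way.
with zipfile.ZipFile(io.BytesIO(combined)) as archive:
    recovered = archive.read("secret.txt")

print(looks_like_gif)  # True
print(recovered)       # b'hidden payload'
```

The same bytes satisfy both readers, and neither one reports anything unusual about the part it skipped.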
1493 01:07:48,700 --> 01:07:50,830 So this meant that you could actually 1494 01:07:50,830 --> 01:07:53,910 create a GIF or an image that had 1495 01:07:53,910 --> 01:07:56,850 a JAR file, executable Java code, 1496 01:07:56,850 --> 01:07:58,360 at the bottom of it. 1497 01:07:58,360 --> 01:08:02,130 So then people called this attack-- 1498 01:08:02,130 --> 01:08:05,940 they called it the GIFAR attack. 1499 01:08:05,940 --> 01:08:06,910 [LAUGHTER] 1500 01:08:06,910 --> 01:08:10,620 Half GIF, half JAR, all evil. 1501 01:08:10,620 --> 01:08:12,655 Because this was amazing. 1502 01:08:12,655 --> 01:08:15,750 And so what did this mean that you could do? 1503 01:08:15,750 --> 01:08:17,340 Well, it's actually quite subtle. 1504 01:08:17,340 --> 01:08:18,979 Because when people first discovered this, 1505 01:08:18,979 --> 01:08:20,930 they thought it was amazing, but they didn't quite 1506 01:08:20,930 --> 01:08:21,888 know how to exploit it. 1507 01:08:21,888 --> 01:08:24,310 But as it turns out, you can do things like the following. 1508 01:08:24,310 --> 01:08:26,729 So first of all, how do you make one of these things? 1509 01:08:26,729 --> 01:08:28,170 You just use cat. 1510 01:08:28,170 --> 01:08:29,972 There is literally no trickeration 1511 01:08:29,972 --> 01:08:30,805 that you have to do. 1512 01:08:30,805 --> 01:08:33,143 Take this, take this, you cat it. 1513 01:08:33,143 --> 01:08:34,870 Boom, you've got a GIFAR. 1514 01:08:34,870 --> 01:08:36,854 So once you have that, what can you do? 1515 01:08:36,854 --> 01:08:39,300 Well, there are some sensitive sites 1516 01:08:39,300 --> 01:08:41,950 that will allow users to submit data, 1517 01:08:41,950 --> 01:08:43,979 but not arbitrary types of data. 1518 01:08:43,979 --> 01:08:45,626 So [INAUDIBLE] Flickr or something 1519 01:08:45,626 --> 01:08:48,750 like that, it may not allow you to submit arbitrary ActiveX 1520 01:08:48,750 --> 01:08:50,140 or whatever, arbitrary HTML.
1521 01:08:50,140 --> 01:08:53,439 But it will allow you to submit images. 1522 01:08:53,439 --> 01:08:56,470 So what you could do is construct one of these things, 1523 01:08:56,470 --> 01:08:58,920 submit it to one of these sensitive sites that does 1524 01:08:58,920 --> 01:09:01,880 allow you to submit images. 1525 01:09:01,880 --> 01:09:03,140 And then what can you do? 1526 01:09:03,140 --> 01:09:06,700 Well, the next thing you need to do is-- so yes, 1527 01:09:06,700 --> 01:09:10,700 the first thing you do is you submit one of these things 1528 01:09:10,700 --> 01:09:15,150 to the sensitive site. 1529 01:09:15,150 --> 01:09:17,130 And then the next thing that you can do 1530 01:09:17,130 --> 01:09:21,022 is if you have an XSS attack, if you 1531 01:09:21,022 --> 01:09:23,420 have a cross-site scripting vulnerability, then 1532 01:09:23,420 --> 01:09:29,330 you can use the cross-site scripting to inject something 1533 01:09:29,330 --> 01:09:30,060 like this. 1534 01:09:33,100 --> 01:09:34,770 And due to poor board management, 1535 01:09:34,770 --> 01:09:37,630 I will draw this over here. 1536 01:09:37,630 --> 01:09:47,890 So you can inject an applet tag that has, 1537 01:09:47,890 --> 01:09:53,980 as its sort of source, you just say, cats.gif. 1538 01:09:58,210 --> 01:10:00,740 And so what's interesting about this 1539 01:10:00,740 --> 01:10:04,490 is that this code, because we're using a cross-site scripting 1540 01:10:04,490 --> 01:10:07,620 vulnerability, runs in the context of the vulnerable site. 1541 01:10:07,620 --> 01:10:10,600 This has been uploaded to the vulnerable site's origin. 1542 01:10:10,600 --> 01:10:14,220 So this will pass the same origin test. 1543 01:10:14,220 --> 01:10:17,030 However, this code was specified by the attacker.
1544 01:10:17,030 --> 01:10:19,130 So now what happens is that the attacker 1545 01:10:19,130 --> 01:10:23,130 gets to run that Java applet in the context of the victim's 1546 01:10:23,130 --> 01:10:27,290 site with all the authority of that origin 1547 01:10:27,290 --> 01:10:32,440 even though the GIFAR passed the vulnerable site's image 1548 01:10:32,440 --> 01:10:34,182 validation code. 1549 01:10:34,182 --> 01:10:35,890 Because one of these things will actually 1550 01:10:35,890 --> 01:10:38,660 parse correctly as a GIF. 1551 01:10:38,660 --> 01:10:40,280 But it has this hidden code in here. 1552 01:10:40,280 --> 01:10:41,190 And so [INAUDIBLE] when the browser tries 1553 01:10:41,190 --> 01:10:43,515 to execute the JAR part of it, once again, 1554 01:10:43,515 --> 01:10:45,660 it starts from the bottom, comes up here, 1555 01:10:45,660 --> 01:10:47,580 and just ignores that part. 1556 01:10:47,580 --> 01:10:49,502 So this is actually pretty amazing. 1557 01:10:49,502 --> 01:10:51,460 And so there's some fairly straightforward ways 1558 01:10:51,460 --> 01:10:53,520 you can fix something like this. 1559 01:10:53,520 --> 01:10:58,490 So for example, you can actually have the applet loader actually 1560 01:10:58,490 --> 01:11:01,790 understand that there should not be random junk up here, 1561 01:11:01,790 --> 01:11:02,690 for example. 1562 01:11:02,690 --> 01:11:05,932 What was happening in many cases is that there was information 1563 01:11:05,932 --> 01:11:08,390 in the metadata saying, here's the length of this resource. 1564 01:11:08,390 --> 01:11:10,852 And then if it said, the length, it stops here, 1565 01:11:10,852 --> 01:11:12,810 they would just say, who cares what's the rest. 1566 01:11:12,810 --> 01:11:14,010 It's probably zero. 1567 01:11:14,010 --> 01:11:16,070 But in this case, it wasn't. 1568 01:11:16,070 --> 01:11:18,270 What I love about this is that it really 1569 01:11:18,270 --> 01:11:23,150 shows how wide the software stack is for the web. 
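The kind of fix mentioned above, where the loader refuses an archive that has extra bytes stacked in front of it, might be sketched like this in Python. It is only an illustration of the idea, not how any real applet loader was patched; `archive_starts_at_byte_zero` is a hypothetical helper, and it checks nothing beyond the leading signature.

```python
import io
import zipfile

# Signatures a ZIP-family file begins with when nothing has been prepended:
# a local file header, or (for an empty archive) the end-of-directory record.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06")


def archive_starts_at_byte_zero(data: bytes) -> bool:
    """Return True only if the archive begins at the very first byte of the file."""
    return data.startswith(ZIP_SIGNATURES)


# Build a clean archive, then a GIF-prefixed copy of the same bytes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Applet.class", b"placeholder bytes, not real bytecode")
clean = buffer.getvalue()
prefixed = b"GIF89a" + b"\x00" * 32 + clean

print(archive_starts_at_byte_zero(clean))     # True
print(archive_starts_at_byte_zero(prefixed))  # False
```

A stricter loader would presumably also verify that the lengths recorded in the archive's metadata account for every byte of the file, rather than assuming the remainder is empty.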
1570 01:11:23,150 --> 01:11:27,510 So sort of taking these two formats, GIF and then JAR, 1571 01:11:27,510 --> 01:11:29,510 we can actually create this really nasty attack. 1572 01:11:29,510 --> 01:11:31,315 You can actually do this for PDFs, too. 1573 01:11:31,315 --> 01:11:32,232 You can put PDFs here. 1574 01:11:32,232 --> 01:11:34,106 I think that was called, like, the [? PDFR ?] 1575 01:11:34,106 --> 01:11:35,590 attack or something like this. 1576 01:11:35,590 --> 01:11:37,664 So people had a field day with this for a day. 1577 01:11:37,664 --> 01:11:39,455 These vulnerabilities have been closed now. 1578 01:11:39,455 --> 01:11:42,244 AUDIENCE: So what can you do with this attack 1579 01:11:42,244 --> 01:11:43,660 that you can't do with [INAUDIBLE] 1580 01:11:43,660 --> 01:11:44,530 XSS or your own [INAUDIBLE]? 1581 01:11:44,530 --> 01:11:45,780 PROFESSOR: So what's nice-- yeah, yeah. 1582 01:11:45,780 --> 01:11:46,488 So good question. 1583 01:11:46,488 --> 01:11:49,360 So what's nice about this is that Java oftentimes 1584 01:11:49,360 --> 01:11:52,660 can be more powerful than just running regular JavaScript, 1585 01:11:52,660 --> 01:11:54,460 because it has slightly different rules on 1586 01:11:54,460 --> 01:11:56,740 same-origin policy and stuff like that. 1587 01:11:56,740 --> 01:11:58,805 [INAUDIBLE] get more lower-level access 1588 01:11:58,805 --> 01:12:02,270 to the file systems or things like that. 1589 01:12:02,270 --> 01:12:04,960 But you're right, that if you can do cross-site scripting, 1590 01:12:04,960 --> 01:12:07,130 running JavaScript's already pretty damaging. 1591 01:12:07,130 --> 01:12:08,530 But the main advantage of this is, once again, 1592 01:12:08,530 --> 01:12:09,710 running inside the applet. 1593 01:12:13,600 --> 01:12:14,340 All right. 1594 01:12:14,340 --> 01:12:14,840 Yeah.
1595 01:12:14,840 --> 01:12:17,235 So like I said, that's my favorite attack of all time, 1596 01:12:17,235 --> 01:12:20,330 mainly just because it forced serious-minded security 1597 01:12:20,330 --> 01:12:23,120 individuals to say GIFAR all the time. 1598 01:12:23,120 --> 01:12:25,830 So if you're easily amused, like myself, then 1599 01:12:25,830 --> 01:12:28,570 this was a bonanza for you. 1600 01:12:28,570 --> 01:12:31,140 So another thing that's interesting 1601 01:12:31,140 --> 01:12:36,320 is that there are actually attacks 1602 01:12:36,320 --> 01:12:41,230 that are based on time. 1603 01:12:41,230 --> 01:12:44,350 So you might not think of time as a resource which 1604 01:12:44,350 --> 01:12:46,390 could be a vector for attacks. 1605 01:12:46,390 --> 01:12:51,900 But as I was discussing with someone a few minutes ago, 1606 01:12:51,900 --> 01:12:55,280 yeah, time can actually be a way that a system can be exploited. 1607 01:12:55,280 --> 01:12:58,925 And so these attacks are called-- the particular attack 1608 01:12:58,925 --> 01:13:00,300 I'm going to talk to you about is 1609 01:13:00,300 --> 01:13:04,080 a specific example of a covert channel attack. 1610 01:13:07,330 --> 01:13:10,390 And so the idea behind the covert channel attack 1611 01:13:10,390 --> 01:13:12,860 is that, essentially, the attacker 1612 01:13:12,860 --> 01:13:14,990 has found some way for two applications 1613 01:13:14,990 --> 01:13:17,100 to exchange information. 1614 01:13:17,100 --> 01:13:20,900 And that exchange vector is not an officially sanctioned 1615 01:13:20,900 --> 01:13:21,510 vector. 1616 01:13:21,510 --> 01:13:23,980 The attacker is somehow leveraging some other part 1617 01:13:23,980 --> 01:13:27,210 of the system to pass bits of information 1618 01:13:27,210 --> 01:13:29,520 between two different entities. 1619 01:13:29,520 --> 01:13:33,250 So a good example of some of this stuff 1620 01:13:33,250 --> 01:13:39,974 is something called CSS-based sniffing attacks.
1621 01:13:44,270 --> 01:13:48,282 So what is this attack all about? 1622 01:13:48,282 --> 01:13:57,059 So attacker has a website that the user can visit. 1623 01:13:57,059 --> 01:13:59,100 And once again, getting a user to visit a website 1624 01:13:59,100 --> 01:14:00,891 is actually usually pretty straightforward. 1625 01:14:00,891 --> 01:14:01,560 You create ads. 1626 01:14:01,560 --> 01:14:03,268 You send them a phishing email, whatever. 1627 01:14:03,268 --> 01:14:06,810 So the attacker has a website that the user visits. 1628 01:14:10,020 --> 01:14:16,240 And the goal of the attacker is to learn 1629 01:14:16,240 --> 01:14:22,257 what other websites the user has visited. 1630 01:14:26,160 --> 01:14:29,040 And the attacker might want to know this for several reasons. 1631 01:14:29,040 --> 01:14:32,637 Maybe they're trying to figure out what kinds of search terms 1632 01:14:32,637 --> 01:14:33,595 the user's looking for. 1633 01:14:33,595 --> 01:14:34,500 Maybe they're trying to figure out 1634 01:14:34,500 --> 01:14:36,291 where that person's employed, or maybe they 1635 01:14:36,291 --> 01:14:38,620 want to know if they've accessed some type 1636 01:14:38,620 --> 01:14:41,810 of embarrassing material, so on and so forth. 1637 01:14:41,810 --> 01:14:44,220 So how is the attacker going to do 1638 01:14:44,220 --> 01:14:46,630 that if the only thing that the attacker controls 1639 01:14:46,630 --> 01:14:50,130 is a website that he or she can convince the user to visit? 1640 01:14:50,130 --> 01:14:58,870 Well, the exploit is to leverage link colors. 1641 01:15:01,400 --> 01:15:03,365 So you know like when you go to a web page 1642 01:15:03,365 --> 01:15:05,740 and you click on a link, the next time you see that link, 1643 01:15:05,740 --> 01:15:08,260 it is now a different color. 1644 01:15:08,260 --> 01:15:11,580 So zoinks, that's actually a security vulnerability. 
1645 01:15:11,580 --> 01:15:15,325 Because what that means is that in this attacker website, 1646 01:15:15,325 --> 01:15:18,810 if the attacker can trick you into visiting it, 1647 01:15:18,810 --> 01:15:22,900 then the attacker can generate a huge list of candidate URLs 1648 01:15:22,900 --> 01:15:25,180 that you might have visited and then 1649 01:15:25,180 --> 01:15:30,530 use JavaScript to see what color those URLs are. 1650 01:15:30,530 --> 01:15:34,240 And if the URL color is purple, that means, 1651 01:15:34,240 --> 01:15:37,430 aha, you have visited that site. 1652 01:15:37,430 --> 01:15:39,760 So this was very subtle. 1653 01:15:39,760 --> 01:15:41,310 And what's interesting about this 1654 01:15:41,310 --> 01:15:43,995 is that you don't even have to display the URLs in many cases 1655 01:15:43,995 --> 01:15:45,070 to the user. 1656 01:15:45,070 --> 01:15:47,370 You can just sort of conjure up a DOM node that 1657 01:15:47,370 --> 01:15:49,585 has a particular href and just look at its style, 1658 01:15:49,585 --> 01:15:52,784 and then see if it has the visited color or not. 1659 01:15:52,784 --> 01:15:54,200 So this is actually pretty subtle. 1660 01:15:54,200 --> 01:15:55,470 So you might be thinking, well, isn't 1661 01:15:55,470 --> 01:15:57,600 it going to be inefficient to scan through all 1662 01:15:57,600 --> 01:15:59,250 these candidate URLs? 1663 01:15:59,250 --> 01:16:01,340 We can do all kinds of clever optimizations. 1664 01:16:01,340 --> 01:16:04,850 So for example, you can have multiple passes. 1665 01:16:04,850 --> 01:16:06,480 In your first pass, you could only 1666 01:16:06,480 --> 01:16:09,050 see if the user had visited top-level URLs-- 1667 01:16:09,050 --> 01:16:12,340 cnn.com, Facebook.com, so on and so forth. 1668 01:16:12,340 --> 01:16:14,030 If the answer is yes, you can then 1669 01:16:14,030 --> 01:16:17,120 do sort of a depth-first search on those hits 1670 01:16:17,120 --> 01:16:18,705 that you found at the top level.
1671 01:16:18,705 --> 01:16:20,205 So you can actually really constrain 1672 01:16:20,205 --> 01:16:22,410 the search space this way. 1673 01:16:22,410 --> 01:16:24,440 So this was really, really funny, too, 1674 01:16:24,440 --> 01:16:26,160 if you have a demented sense of humor, 1675 01:16:26,160 --> 01:16:30,370 because it showed that this very innocuous feature 1676 01:16:30,370 --> 01:16:32,870 that browsers support-- they're just trying to help you out. 1677 01:16:32,870 --> 01:16:34,471 They're trying to say, hey, buddy, 1678 01:16:34,471 --> 01:16:34,840 here's where you visited. 1679 01:16:34,840 --> 01:16:37,090 It can actually reveal this very damaging information. 1680 01:16:37,090 --> 01:16:39,350 So what is a solution for this? 1681 01:16:39,350 --> 01:16:42,680 So in practice, what the browser vendors did 1682 01:16:42,680 --> 01:16:47,240 is that they made it such that the browser lies to JavaScript 1683 01:16:47,240 --> 01:16:49,440 about the color of links. 1684 01:16:49,440 --> 01:16:52,126 So basically, when JavaScript tries to look at the link 1685 01:16:52,126 --> 01:16:56,230 and look at its styling, the browser always says, unvisited. 1686 01:16:56,230 --> 01:16:56,840 OK. 1687 01:16:56,840 --> 01:17:00,584 So that seems somewhat unfortunate, 1688 01:17:00,584 --> 01:17:01,750 but it prevents this attack. 1689 01:17:01,750 --> 01:17:03,410 So I guess we can live with it. 1690 01:17:03,410 --> 01:17:05,790 JavaScript not being able to read link colors, eh, not 1691 01:17:05,790 --> 01:17:07,080 the end of the world. 1692 01:17:07,080 --> 01:17:08,950 So are we done, though? 1693 01:17:08,950 --> 01:17:11,689 Does this fix the problem of the attacker 1694 01:17:11,689 --> 01:17:13,480 being able to figure out where you've been? 1695 01:17:13,480 --> 01:17:15,280 The answer, of course, is no. 1696 01:17:15,280 --> 01:17:20,300 So the next attack that the attacker can do 1697 01:17:20,300 --> 01:17:24,440 is a cache-based attack.
1698 01:17:24,440 --> 01:17:30,260 And so the intuition here is that, once again, the goals 1699 01:17:30,260 --> 01:17:30,980 are the same. 1700 01:17:30,980 --> 01:17:32,896 Attacker wants to know what sites you visited. 1701 01:17:32,896 --> 01:17:34,970 The exploit vector is that information 1702 01:17:34,970 --> 01:17:38,270 that has been cached is quicker to access. 1703 01:17:38,270 --> 01:17:40,470 That, in fact, is the whole reason why you cache it 1704 01:17:40,470 --> 01:17:42,170 in the first place. 1705 01:17:42,170 --> 01:17:44,660 So once again, the attacker can generate 1706 01:17:44,660 --> 01:17:47,650 a list of candidate objects that the attacker thinks 1707 01:17:47,650 --> 01:17:50,390 you might have visited and then just time 1708 01:17:50,390 --> 01:17:53,660 how quickly those objects come back to the attacker. 1709 01:17:53,660 --> 01:17:55,525 And so if the objects come back quickly, 1710 01:17:55,525 --> 01:17:57,150 you know, under some threshold, 1711 01:17:57,150 --> 01:17:58,565 the attacker can guess that you, in fact, have 1712 01:17:58,565 --> 01:17:59,830 been to those objects before. 1713 01:17:59,830 --> 01:18:01,740 So does that make sense? 1714 01:18:01,740 --> 01:18:03,990 Once again, the browser's just trying to help you out. 1715 01:18:03,990 --> 01:18:07,160 But you can leverage these techniques to figure out 1716 01:18:07,160 --> 01:18:09,380 some evil knowledge. 1717 01:18:09,380 --> 01:18:10,755 And what's interesting about this 1718 01:18:10,755 --> 01:18:13,790 is that this attack can actually leverage 1719 01:18:13,790 --> 01:18:17,530 some very interesting geographic location information. 1720 01:18:17,530 --> 01:18:22,090 So imagine that we're doing attacks on Google Map tiles, 1721 01:18:22,090 --> 01:18:22,780 for example.
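The timing probe described above boils down to "fetch each candidate, measure, compare against a threshold." A real attack would run as JavaScript inside the victim's browser; the Python sketch below only illustrates the decision logic. The function name, the injected `fetch` and `clock` callables, and the threshold value are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable


def guess_visited(candidates: Iterable[str],
                  fetch: Callable[[str], None],
                  clock: Callable[[], float],
                  threshold_seconds: float) -> Dict[str, bool]:
    """Time each candidate fetch; anything faster than the threshold is guessed to be cached."""
    guesses = {}
    for url in candidates:
        start = clock()
        fetch(url)  # the response body is irrelevant, only the elapsed time matters
        elapsed = clock() - start
        guesses[url] = elapsed < threshold_seconds
    return guesses
```

In practice the caller would pass something like `time.perf_counter` as the clock and a function that actually loads the resource. Picking the threshold is the delicate part, since network jitter can make an uncached object look fast or a cached one look slow.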
1722 01:18:22,780 --> 01:18:24,790 So if I detect that you've actually 1723 01:18:24,790 --> 01:18:26,980 accessed a series of Google Map tiles, 1724 01:18:26,980 --> 01:18:29,900 that probably means you are either in that place 1725 01:18:29,900 --> 01:18:31,692 or you're interested in other people who 1726 01:18:31,692 --> 01:18:32,650 might be in that place. 1727 01:18:32,650 --> 01:18:36,080 So it's actually a pretty powerful attack. 1728 01:18:36,080 --> 01:18:36,610 So OK. 1729 01:18:36,610 --> 01:18:39,460 So how can you fix this one? 1730 01:18:39,460 --> 01:18:43,035 Well, this one is not quite clear. 1731 01:18:43,035 --> 01:18:45,410 You could have a site that doesn't cache anything at all. 1732 01:18:45,410 --> 01:18:47,150 And then your site's going to be slow. 1733 01:18:47,150 --> 01:18:48,200 So that kind of sucks. 1734 01:18:48,200 --> 01:18:50,480 So it's not quite clear how you get around this. 1735 01:18:50,480 --> 01:18:51,170 But OK. 1736 01:18:51,170 --> 01:18:53,900 Let's suppose that we have the defense we put in place here-- 1737 01:18:53,900 --> 01:18:55,570 JavaScript can't read link colors. 1738 01:18:55,570 --> 01:18:57,200 Let's assume that the site is super 1739 01:18:57,200 --> 01:19:00,010 paranoid and caches nothing. 1740 01:19:00,010 --> 01:19:03,680 So have we completely defended ourselves against this attack? 1741 01:19:03,680 --> 01:19:04,310 One second. 1742 01:19:04,310 --> 01:19:06,325 So the answer is no. 1743 01:19:06,325 --> 01:19:10,675 Because the attacker can actually launch DNS-based 1744 01:19:10,675 --> 01:19:11,175 attacks. 1745 01:19:14,100 --> 01:19:18,510 So the intuition is that even if you don't cache anything, when 1746 01:19:18,510 --> 01:19:21,075 you access a resource for the first time, 1747 01:19:21,075 --> 01:19:23,450 you have to generate a DNS request for the hostname that's 1748 01:19:23,450 --> 01:19:25,410 associated with that resource.
1749 01:19:25,410 --> 01:19:27,490 So once again, the attacker can look at timing 1750 01:19:27,490 --> 01:19:29,590 and see how long it takes for the attacker 1751 01:19:29,590 --> 01:19:32,420 to access these candidate objects the attacker thinks 1752 01:19:32,420 --> 01:19:33,400 you may have accessed. 1753 01:19:33,400 --> 01:19:36,360 And if they come back quickly, then that's 1754 01:19:36,360 --> 01:19:40,260 perhaps a good hint that you've resolved the DNS 1755 01:19:40,260 --> 01:19:42,290 name for that host before. 1756 01:19:42,290 --> 01:19:46,120 And so this works even if you don't cache anything, 1757 01:19:46,120 --> 01:19:49,846 because the DNS cache lives with the OS, not with the browser. 1758 01:19:49,846 --> 01:19:53,060 AUDIENCE: You've mentioned, I think last class, the ability 1759 01:19:53,060 --> 01:19:55,504 to get JavaScript to take screenshots. 1760 01:19:55,504 --> 01:19:56,420 PROFESSOR: Yeah, yeah. 1761 01:19:56,420 --> 01:19:58,580 AUDIENCE: So can you just render the [? link ?] 1762 01:19:58,580 --> 01:20:00,620 as a single pixel, and then take a screenshot, 1763 01:20:00,620 --> 01:20:02,234 and [INAUDIBLE] that pixel? 1764 01:20:02,234 --> 01:20:02,900 PROFESSOR: Yeah. 1765 01:20:02,900 --> 01:20:04,229 Well-- so you could. 1766 01:20:04,229 --> 01:20:06,270 So rendering stuff is always a little bit tricky, 1767 01:20:06,270 --> 01:20:07,550 because you have to play these games. 1768 01:20:07,550 --> 01:20:08,910 If you want to show something to the user, 1769 01:20:08,910 --> 01:20:10,435 it has to flash really quickly. 1770 01:20:10,435 --> 01:20:11,915 Or else they might see that someone's entering 1771 01:20:11,915 --> 01:20:12,874 this huge list of URLs. 1772 01:20:12,874 --> 01:20:13,581 But you're right. 1773 01:20:13,581 --> 01:20:15,540 If you have access to the screen-sharing API, 1774 01:20:15,540 --> 01:20:17,040 a lot of this becomes a lot simpler.
1775 01:20:17,040 --> 01:20:20,570 AUDIENCE: And if you just have some kind of animated image 1776 01:20:20,570 --> 01:20:23,084 that looks mostly random, then you just 1777 01:20:23,084 --> 01:20:25,137 pay attention to one pixel of it? 1778 01:20:25,137 --> 01:20:26,470 PROFESSOR: You're exactly right. 1779 01:20:26,470 --> 01:20:27,570 I mean, in general, I think the screen-sharing API is 1780 01:20:27,570 --> 01:20:28,800 a bad idea. 1781 01:20:28,800 --> 01:20:32,320 I'm not the president of the world, so what can I do? 1782 01:20:32,320 --> 01:20:36,080 So anyways, so DNS-based attacks work even if there 1783 01:20:36,080 --> 01:20:38,190 is no caching that takes place. 1784 01:20:38,190 --> 01:20:38,690 OK. 1785 01:20:38,690 --> 01:20:40,640 So as the final pièce de résistance, 1786 01:20:40,640 --> 01:20:44,825 so you might think, OK, what if we only use raw IP addresses 1787 01:20:44,825 --> 01:20:46,600 for all of our host names? 1788 01:20:46,600 --> 01:20:48,260 We don't cache a thing! 1789 01:20:48,260 --> 01:20:49,020 OK? 1790 01:20:49,020 --> 01:20:52,700 And we're running on an updated browser that doesn't expose 1791 01:20:52,700 --> 01:20:54,340 link colors to JavaScript. 1792 01:20:54,340 --> 01:20:55,630 So surely we're fine. 1793 01:20:55,630 --> 01:20:58,100 I'm here to tell you you are not fine. 1794 01:20:58,100 --> 01:21:01,390 Because what the attacker can actually do 1795 01:21:01,390 --> 01:21:03,850 is take advantage of rendering attacks. 1796 01:21:07,200 --> 01:21:13,140 So the basic idea here is that it is typically faster 1797 01:21:13,140 --> 01:21:16,380 to render a URL that you have visited 1798 01:21:16,380 --> 01:21:18,446 before for various wacky reasons that 1799 01:21:18,446 --> 01:21:20,612 have to do with how browsers [INAUDIBLE] rendering 1800 01:21:20,612 --> 01:21:21,880 [INAUDIBLE] internal.
1801 01:21:21,880 --> 01:21:24,890 And so what the attacker can do is actually 1802 01:21:24,890 --> 01:21:29,049 create a candidate iframe, let's say, put some content in there 1803 01:21:29,049 --> 01:21:31,340 that the attacker thinks you may have visited, and then 1804 01:21:31,340 --> 01:21:35,720 constantly see if the attacker loses access to that iframe. 1805 01:21:35,720 --> 01:21:38,150 Because as that iframe is loading, 1806 01:21:38,150 --> 01:21:39,840 the browser typically thinks that iframe 1807 01:21:39,840 --> 01:21:42,499 belongs to the attacker's page. 1808 01:21:42,499 --> 01:21:44,540 And then as soon as that different origin content 1809 01:21:44,540 --> 01:21:47,195 comes in, then you'll start getting these access errors. 1810 01:21:47,195 --> 01:21:49,634 Because now that different origin [INAUDIBLE]. 1811 01:21:49,634 --> 01:21:51,300 So now the attacker can't touch it anymore. 1812 01:21:51,300 --> 01:21:53,750 So the attacker can do things like this still 1813 01:21:53,750 --> 01:21:56,350 to see if there's caching, rendering information 1814 01:21:56,350 --> 01:21:59,510 [INAUDIBLE] browser for these candidate sites. 1815 01:21:59,510 --> 01:22:01,700 So anyways, so those are the only hopes and dreams 1816 01:22:01,700 --> 01:22:03,230 I want to crush in you today. 1817 01:22:03,230 --> 01:22:05,210 I believe we're running out of time. 1818 01:22:05,210 --> 01:22:07,960 But I will see you next time.