1 00:00:00,080 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,810 Commons license. 3 00:00:03,810 --> 00:00:06,060 Your support will help MIT OpenCourseWare 4 00:00:06,060 --> 00:00:10,150 continue to offer high quality educational resources for free. 5 00:00:10,150 --> 00:00:12,700 To make a donation or to view additional materials 6 00:00:12,700 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,305 at ocw.mit.edu. 8 00:00:26,380 --> 00:00:28,855 PROFESSOR: All right, guys. 9 00:00:28,855 --> 00:00:30,850 Let's get started with the next installment 10 00:00:30,850 --> 00:00:34,290 of our exciting journey into computer security. 11 00:00:34,290 --> 00:00:36,665 Today, we're actually going to talk about web security. 12 00:00:36,665 --> 00:00:39,760 Web security is, actually, one of my favorite topics 13 00:00:39,760 --> 00:00:41,657 to talk about because it really exposes you 14 00:00:41,657 --> 00:00:43,543 to the true horrors of the world. 15 00:00:43,543 --> 00:00:45,126 It's very easy to think, as a student, 16 00:00:45,126 --> 00:00:46,780 that everything will be great when you graduate. 17 00:00:46,780 --> 00:00:48,516 Today's lecture and the next lecture 18 00:00:48,516 --> 00:00:51,006 will be telling you that's, in fact, not the case. 19 00:00:51,006 --> 00:00:52,030 Everything's terrible. 20 00:00:52,030 --> 00:00:53,760 So what is the web? 21 00:00:53,760 --> 00:00:57,230 Well, back in the olden days, the web was, actually, much simpler 22 00:00:57,230 --> 00:00:58,630 than it is today, right? 23 00:00:58,630 --> 00:01:00,780 So clients, which is to say the browsers, 24 00:01:00,780 --> 00:01:03,030 couldn't really do anything with respect to displaying 25 00:01:03,030 --> 00:01:04,030 rich or active content. 26 00:01:04,030 --> 00:01:06,540 Basically they could just get static images, static text, 27 00:01:06,540 --> 00:01:07,960 and that was about it.
28 00:01:07,960 --> 00:01:10,512 Now the server side was a little more interesting 29 00:01:10,512 --> 00:01:13,320 because even if there was static content on the client side, 30 00:01:13,320 --> 00:01:15,830 maybe the server was talking to databases, 31 00:01:15,830 --> 00:01:18,700 maybe it was talking to other machines on the server side. 32 00:01:18,700 --> 00:01:20,050 Things like that. 33 00:01:20,050 --> 00:01:22,950 So for a very long time, the notion of web security, 34 00:01:22,950 --> 00:01:26,060 basically, meant looking at what the server was doing. 35 00:01:26,060 --> 00:01:27,760 And to this point in this class, we've 36 00:01:27,760 --> 00:01:29,430 essentially taken that approach. 37 00:01:29,430 --> 00:01:33,450 So we looked at things like buffer overflow attacks. 38 00:01:33,450 --> 00:01:35,890 So how clients can trick the server into doing things 39 00:01:35,890 --> 00:01:37,140 the server doesn't want to do. 40 00:01:37,140 --> 00:01:39,500 You also looked at the OKWS server 41 00:01:39,500 --> 00:01:43,750 and looked at how we can do some privilege isolation there. 42 00:01:43,750 --> 00:01:46,230 So to this point, we, sort of, looked at security 43 00:01:46,230 --> 00:01:50,489 through the experiences that were actually 44 00:01:50,489 --> 00:01:52,530 experienced by the server-side resources themselves. 45 00:01:52,530 --> 00:01:55,269 But now, actually, the browser is very interesting 46 00:01:55,269 --> 00:01:56,810 to think about, in terms of security, 47 00:01:56,810 --> 00:02:02,484 because the browser is super, super complicated these days. 48 00:02:05,460 --> 00:02:07,950 So now there's all kinds of insane, dynamic stuff 49 00:02:07,950 --> 00:02:09,350 that the browser can actually do. 50 00:02:09,350 --> 00:02:13,320 So for example, you've probably heard of JavaScript.
51 00:02:13,320 --> 00:02:16,450 So JavaScript now allows pages to execute 52 00:02:16,450 --> 00:02:18,465 client-side code. It's Turing complete, 53 00:02:18,465 --> 00:02:20,350 it can do all kinds of wacky stuff. 54 00:02:20,350 --> 00:02:22,320 There is the DOM model, which we'll 55 00:02:22,320 --> 00:02:25,100 talk about in more depth later today. 56 00:02:25,100 --> 00:02:27,350 The DOM model, essentially, allows JavaScript code 57 00:02:27,350 --> 00:02:31,480 to dynamically change the visual appearance of the page. 58 00:02:31,480 --> 00:02:36,166 Fiddle with things like font stylings and stuff like that. 59 00:02:36,166 --> 00:02:40,630 There's XMLHttpRequest. 60 00:02:40,630 --> 00:02:44,250 These are, basically, a way for JavaScript 61 00:02:44,250 --> 00:02:47,350 to asynchronously fetch content from servers. 62 00:02:47,350 --> 00:02:53,520 You may also hear XMLHttpRequests referred to as AJAX. 63 00:02:53,520 --> 00:02:56,030 Asynchronous JavaScript fetching. 64 00:02:56,030 --> 00:02:58,760 There are things like WebSockets. 65 00:02:58,760 --> 00:03:02,780 This is, actually, a recently introduced API. 66 00:03:02,780 --> 00:03:05,960 So WebSockets, essentially, allow full duplex 67 00:03:05,960 --> 00:03:08,260 communication between clients and servers. 68 00:03:08,260 --> 00:03:09,920 Communication going both ways. 69 00:03:09,920 --> 00:03:12,610 We've got all kinds of multimedia support. 70 00:03:16,230 --> 00:03:22,630 So for example, we have things like the video tag, 71 00:03:22,630 --> 00:03:26,167 which allows a web page to play video 72 00:03:26,167 --> 00:03:27,250 without using a Flash app. 73 00:03:27,250 --> 00:03:30,110 It can actually just play that video natively. 74 00:03:30,110 --> 00:03:34,170 There's also geolocation. 75 00:03:34,170 --> 00:03:39,180 So now a web page can actually determine, physically, 76 00:03:39,180 --> 00:03:40,190 where you are.
77 00:03:40,190 --> 00:03:42,680 For example, if you're running a web page on a smartphone, 78 00:03:42,680 --> 00:03:45,360 the browser can actually access your GPS unit. 79 00:03:45,360 --> 00:03:48,550 If you're accessing a page on a desktop browser, 80 00:03:48,550 --> 00:03:51,460 it can actually look at your Wi-Fi connection 81 00:03:51,460 --> 00:03:54,310 and connect to Google's Wi-Fi geolocation service 82 00:03:54,310 --> 00:03:56,130 to figure out where exactly you are. 83 00:03:56,130 --> 00:03:57,130 That's, kind of, insane. 84 00:03:57,130 --> 00:03:57,630 Right? 85 00:03:57,630 --> 00:04:00,470 But now web pages can do that kind of stuff. 86 00:04:00,470 --> 00:04:05,000 So we've also talked about things like NaCl, 87 00:04:05,000 --> 00:04:09,300 for example, which allows browsers to run native code. 88 00:04:09,300 --> 00:04:11,371 So there's many, many other features 89 00:04:11,371 --> 00:04:12,620 that I haven't mentioned here. 90 00:04:12,620 --> 00:04:14,240 But suffice it to say the browser 91 00:04:14,240 --> 00:04:16,480 is now incredibly complicated. 92 00:04:16,480 --> 00:04:19,750 So what does this mean from the perspective of security? 93 00:04:19,750 --> 00:04:22,140 Well basically, it means that we're screwed. 94 00:04:22,140 --> 00:04:22,640 Right? 95 00:04:22,640 --> 00:04:25,590 The threat surface for that right there is enormous. 96 00:04:25,590 --> 00:04:28,580 And loosely speaking, when you're thinking about security, 97 00:04:28,580 --> 00:04:31,460 you can think of a graph that, sort of, looks like this. 98 00:04:31,460 --> 00:04:37,230 So you've got the likelihood of correctness. 99 00:04:41,214 --> 00:04:43,630 And then, you've got the number of features that you have. 100 00:04:48,430 --> 00:04:51,035 And so you know, this graph starts up here at 100.
101 00:04:51,035 --> 00:04:53,630 Well of course, we never even started at 100, 102 00:04:53,630 --> 00:04:55,680 even with very simple code because we can't even 103 00:04:55,680 --> 00:04:58,190 do bubble sort right. 104 00:04:58,190 --> 00:05:00,470 So essentially, that curve looks something like this. 105 00:05:00,470 --> 00:05:03,490 And web browsers are right over here. 106 00:05:03,490 --> 00:05:05,140 So as we'll discuss today, there's 107 00:05:05,140 --> 00:05:09,210 all kinds of wacky security bugs that are arising constantly. 108 00:05:09,210 --> 00:05:11,020 And as soon as the old ones are fixed, 109 00:05:11,020 --> 00:05:12,660 new ones are arising because people 110 00:05:12,660 --> 00:05:14,530 keep adding these new features. 111 00:05:14,530 --> 00:05:16,270 Oftentimes, without thinking about what 112 00:05:16,270 --> 00:05:19,270 the security implications of those features are. 113 00:05:19,270 --> 00:05:22,400 So if you think about what a web application is these days, 114 00:05:22,400 --> 00:05:24,720 well it's this client thing and it's a server thing. 115 00:05:24,720 --> 00:05:28,220 And a web application now spans multiple programming languages, 116 00:05:28,220 --> 00:05:30,770 multiple machines, and multiple hardware platforms. 117 00:05:30,770 --> 00:05:32,972 You could be using Firefox on Windows. 118 00:05:32,972 --> 00:05:35,430 Then it's going to go talk to a machine in the cloud that's 119 00:05:35,430 --> 00:05:36,030 running Linux. 120 00:05:36,030 --> 00:05:38,230 It's running the Apache server. 121 00:05:38,230 --> 00:05:41,460 Maybe it's running an ARM chip as opposed to x86 or something 122 00:05:41,460 --> 00:05:43,020 like that, or the other way around. 123 00:05:43,020 --> 00:05:47,210 So long story short, there's all these problems of composition. 124 00:05:47,210 --> 00:05:49,935 There's all these software layers and all these hardware 125 00:05:49,935 --> 00:05:53,797 layers that all can impact security in some way.
126 00:05:53,797 --> 00:05:54,880 But it's also complicated. 127 00:05:54,880 --> 00:05:58,470 It's not quite clear how we can make sense of the entire whole. 128 00:05:58,470 --> 00:06:03,170 So for example, one common problem with the web 129 00:06:03,170 --> 00:06:06,385 is this problem of parsing contexts. 130 00:06:10,220 --> 00:06:12,050 So as an example, suppose that you 131 00:06:12,050 --> 00:06:16,220 had something in a page that looked like this. 132 00:06:16,220 --> 00:06:19,260 You declare a script tag. 133 00:06:19,260 --> 00:06:22,470 Inside that script tag, you declare a variable. 134 00:06:22,470 --> 00:06:24,440 There's some string here. 135 00:06:24,440 --> 00:06:29,700 And let's say that this string comes from an untrusted party. 136 00:06:29,700 --> 00:06:34,810 Either the user or another machine or something like that. 137 00:06:34,810 --> 00:06:36,710 And then, you close that script tag. 138 00:06:40,360 --> 00:06:42,074 So this stuff is trusted. 139 00:06:42,074 --> 00:06:42,990 This stuff is trusted. 140 00:06:42,990 --> 00:06:44,280 This stuff is not trusted. 141 00:06:44,280 --> 00:06:45,390 So can anybody figure out why there 142 00:06:45,390 --> 00:06:47,775 might be some problems here if we take this untrusted string 143 00:06:47,775 --> 00:06:48,608 and put it in there? 144 00:06:51,422 --> 00:06:55,728 AUDIENCE: You can have a closing quote mark in [INAUDIBLE] 145 00:06:55,728 --> 00:06:57,159 and then have some [INAUDIBLE]. 146 00:06:57,159 --> 00:06:58,640 PROFESSOR: Right, right, exactly. 147 00:06:58,640 --> 00:07:01,320 So the problem is there are multiple contexts 148 00:07:01,320 --> 00:07:04,590 that this untrusted code could, sort of, break into. 149 00:07:04,590 --> 00:07:09,390 So for example, if the untrusted code had a double quote here, 150 00:07:09,390 --> 00:07:14,056 now we've closed the definition of this JavaScript string.
151 00:07:14,056 --> 00:07:16,055 So now we've exited the JavaScript string context 152 00:07:16,055 --> 00:07:18,570 and entered the regular JavaScript execution context. 153 00:07:18,570 --> 00:07:20,610 And then the attacker can put regular 154 00:07:20,610 --> 00:07:22,540 JavaScript code here and go to town. 155 00:07:22,540 --> 00:07:25,580 Alternatively, the attacker could just 156 00:07:25,580 --> 00:07:31,220 put a closing script tag here. 157 00:07:31,220 --> 00:07:31,850 Right? 158 00:07:31,850 --> 00:07:35,270 And then, at that point, the attacker 159 00:07:35,270 --> 00:07:38,820 can, sort of, get out of the JavaScript context 160 00:07:38,820 --> 00:07:40,940 and then get into the HTML context. 161 00:07:40,940 --> 00:07:44,250 Maybe to define some new HTML nodes or something like that. 162 00:07:44,250 --> 00:07:46,090 So you see this problem with composition 163 00:07:46,090 --> 00:07:48,185 all over the place in the web because there 164 00:07:48,185 --> 00:07:49,810 are so many different languages and 165 00:07:49,810 --> 00:07:51,018 runtimes for you to think about. 166 00:07:51,018 --> 00:07:54,575 HTML, CSS, JavaScript, maybe MySQL on the server side, 167 00:07:54,575 --> 00:07:56,820 and so on and so forth. 168 00:07:56,820 --> 00:07:59,540 So this is just a classic example 169 00:07:59,540 --> 00:08:02,240 of why you have to do something called content sanitization. 170 00:08:02,240 --> 00:08:05,410 So whenever you get untrusted input from someone, 171 00:08:05,410 --> 00:08:07,700 you actually need to analyze it very carefully 172 00:08:07,700 --> 00:08:11,720 to make sure that it's not being used as a vector for an attack. 173 00:08:11,720 --> 00:08:14,420 So another reason why web security is so tricky 174 00:08:14,420 --> 00:08:17,510 is because the web specifications are incredibly 175 00:08:17,510 --> 00:08:19,130 long, they're incredibly tedious, 176 00:08:19,130 --> 00:08:21,857 they're incredibly boring, and they're often inconsistent.
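The parsing-context problem described above (an untrusted string placed inside an inline script tag) can be sketched as follows. This is a minimal illustration, not the lecture's own code: the names `escapeForInlineScript` and `renderPage` are hypothetical, and the approach shown (JSON encoding plus escaping of characters that matter to the HTML parser) is one common way to sanitize for this particular context, assumed here for the sake of the example.

```javascript
// Hypothetical helper: encode an untrusted value as a JavaScript string
// literal that is safe to place inside an inline <script> block.
function escapeForInlineScript(untrusted) {
  // JSON.stringify adds the surrounding quotes and escapes quotes and
  // backslashes, so the value cannot close the string context early.
  const literal = JSON.stringify(String(untrusted));
  // Also escape characters the HTML parser cares about, so a closing
  // script tag inside the value cannot end the script context.
  return literal.replace(
    /[<>&\u2028\u2029]/g,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// Hypothetical page template: trusted markup around one untrusted string.
function renderPage(untrusted) {
  return "<script>var x = " + escapeForInlineScript(untrusted) + ";</script>";
}

// The two break-out attempts mentioned in the lecture: a double quote
// (leave the string context) and a closing script tag (leave the
// JavaScript context and enter the HTML context).
const attacks = ['"; alert(1); //', "</script><img src=x onerror=alert(1)>"];
for (const attack of attacks) {
  console.log(renderPage(attack));
}
```

Note that this only covers the "string inside an inline script" context. Untrusted data placed in an HTML attribute, a URL, or a CSS value needs a different encoding, which is exactly the composition problem the lecture is pointing at.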
177 00:08:21,857 --> 00:08:23,440 So when I say the web specifications, 178 00:08:23,440 --> 00:08:26,000 I mean things like the definition of JPEG, 179 00:08:26,000 --> 00:08:28,522 the definition of CSS, the definition of HTML. 180 00:08:28,522 --> 00:08:29,980 These documents are, like, the size 181 00:08:29,980 --> 00:08:33,480 of the EU constitution and equally as easy to understand. 182 00:08:33,480 --> 00:08:36,130 So what ends up happening is that the browser vendors 183 00:08:36,130 --> 00:08:37,549 see all these specs. 184 00:08:37,549 --> 00:08:40,080 And they essentially say, OK, thanks for that. 185 00:08:40,080 --> 00:08:42,169 I'm going to do something that somewhat resembles 186 00:08:42,169 --> 00:08:43,320 what these specs look like. 187 00:08:43,320 --> 00:08:44,610 Then they call it a day and they laugh about it 188 00:08:44,610 --> 00:08:45,410 with their friends. 189 00:08:45,410 --> 00:08:48,160 OK, so what ends up happening is that these specifications 190 00:08:48,160 --> 00:08:52,550 end up being like these vague, aspirational documents that 191 00:08:52,550 --> 00:08:55,109 don't always accurately reflect what real browsers are doing. 192 00:08:55,109 --> 00:08:57,150 And if you want to understand the horror of this, 193 00:08:57,150 --> 00:08:59,450 you can go to this site called quirksmode.org. 194 00:08:59,450 --> 00:09:01,050 I mean, don't go to this site if you want to be happy. 195 00:09:01,050 --> 00:09:01,925 But you can go there. 196 00:09:01,925 --> 00:09:05,540 And it actually documents all of these terrible inconsistencies 197 00:09:05,540 --> 00:09:08,030 that browsers have with respect to what happens 198 00:09:08,030 --> 00:09:10,337 when the user hits a key. 199 00:09:10,337 --> 00:09:12,670 There should just be one key press event that's generated. 200 00:09:12,670 --> 00:09:13,840 You are so wrong.
201 00:09:13,840 --> 00:09:15,680 So go to quirksmode.org and check that out, 202 00:09:15,680 --> 00:09:17,075 and see what's going on. 203 00:09:17,075 --> 00:09:18,450 So anyway, in this lecture, we're 204 00:09:18,450 --> 00:09:21,514 going to focus on the client side of the web application. 205 00:09:21,514 --> 00:09:22,930 In particular, we're going to look 206 00:09:22,930 --> 00:09:26,250 at how we can isolate content from different web 207 00:09:26,250 --> 00:09:28,610 providers that has to coexist, somehow, 208 00:09:28,610 --> 00:09:31,720 on the same machine and in the same browser. 209 00:09:31,720 --> 00:09:34,012 So at a high level, there's this fundamental difference 210 00:09:34,012 --> 00:09:35,386 between the way you traditionally 211 00:09:35,386 --> 00:09:37,362 think of a desktop application and the way 212 00:09:37,362 --> 00:09:39,420 you think of a web application. 213 00:09:39,420 --> 00:09:42,490 Abstractly speaking, most of the desktop applications that you 214 00:09:42,490 --> 00:09:45,320 use, you can think of them as coming from a single principal. 215 00:09:45,320 --> 00:09:47,490 So Word comes from Microsoft. 216 00:09:47,490 --> 00:09:49,925 And maybe TurboTax comes from Mr. and Mrs. TurboTax, 217 00:09:49,925 --> 00:09:51,470 so on and so forth. 218 00:09:51,470 --> 00:09:54,870 But when you look at a web application, something 219 00:09:54,870 --> 00:09:58,830 that looks to you, visually, as a single application 220 00:09:58,830 --> 00:10:01,226 is actually composed of a bunch of different content 221 00:10:01,226 --> 00:10:02,600 from a bunch of different people. 222 00:10:02,600 --> 00:10:05,260 So you go to CNN, it looks like it's all on one tab. 223 00:10:05,260 --> 00:10:08,670 But each of those visual things that you see 224 00:10:08,670 --> 00:10:10,420 may, in fact, come from someone different. 225 00:10:10,420 --> 00:10:15,530 So let's just look at a very simple example here.
226 00:10:15,530 --> 00:10:20,020 So let's say that we were looking at the following site. 227 00:10:20,020 --> 00:10:24,654 So http://foo.com. 228 00:10:24,654 --> 00:10:26,854 And we're just looking at index.html. 229 00:10:30,730 --> 00:10:35,170 So you know, you look at your browser tab. 230 00:10:35,170 --> 00:10:36,240 What might you see? 231 00:10:36,240 --> 00:10:41,830 So one thing that you might see is an advertisement. 232 00:10:41,830 --> 00:10:43,230 So you might see an advertisement 233 00:10:43,230 --> 00:10:45,660 in the form of a GIF. 234 00:10:45,660 --> 00:10:49,082 And maybe that was downloaded from ads.com. 235 00:10:51,974 --> 00:10:56,270 Then you also might see, let's say, an analytics library. 236 00:10:58,980 --> 00:11:03,062 And maybe this comes from google.com. 237 00:11:06,970 --> 00:11:09,470 So these libraries are very popular for doing things 238 00:11:09,470 --> 00:11:11,760 like tracking how many people have loaded your page, 239 00:11:11,760 --> 00:11:14,600 looking to see where people click on things 240 00:11:14,600 --> 00:11:17,040 to see which parts of your site are the most interesting 241 00:11:17,040 --> 00:11:19,550 for people to interact with, so on and so forth. 242 00:11:19,550 --> 00:11:22,690 And you might also have another JavaScript library. 243 00:11:22,690 --> 00:11:24,810 Let's say it's jQuery. 244 00:11:27,440 --> 00:11:33,370 And maybe that comes from cdn.foo.com. 245 00:11:33,370 --> 00:11:38,190 So some content distribution network that foo.com runs. 246 00:11:38,190 --> 00:11:40,470 jQuery is a very popular library for doing things 247 00:11:40,470 --> 00:11:41,822 like GUI manipulation. 248 00:11:41,822 --> 00:11:42,530 Things like that. 249 00:11:42,530 --> 00:11:44,670 So a lot of popular websites have jQuery. 250 00:11:44,670 --> 00:11:47,140 Although, they serve it from different places. 251 00:11:47,140 --> 00:11:53,170 And then, on this page you might see some HTML.
252 00:11:53,170 --> 00:11:54,860 And here's where you might see stuff 253 00:11:54,860 --> 00:12:02,250 like buttons for the user to click on, text input, and so 254 00:12:02,250 --> 00:12:03,020 on and so forth. 255 00:12:05,700 --> 00:12:08,120 So that's just raw HTML on the page. 256 00:12:08,120 --> 00:12:12,690 And then, you might see what they call 257 00:12:12,690 --> 00:12:19,075 inline JavaScript from foo.com. 258 00:12:21,865 --> 00:12:27,480 By inline, I mean you have a script tag. 259 00:12:27,480 --> 00:12:31,029 And then, you have a closed script tag. 260 00:12:31,029 --> 00:12:32,820 And then you just have some JavaScript code 261 00:12:32,820 --> 00:12:34,430 included in there directly. 262 00:12:34,430 --> 00:12:39,400 That's as opposed to where you say something like script. 263 00:12:39,400 --> 00:12:43,780 And then, the source equals something that 264 00:12:43,780 --> 00:12:45,199 lives on some server remotely. 265 00:12:45,199 --> 00:12:46,990 So this is what's called inline JavaScript. 266 00:12:46,990 --> 00:12:49,115 This is what's referred to as an externally defined 267 00:12:49,115 --> 00:12:49,830 JavaScript file. 268 00:12:49,830 --> 00:12:53,172 So you might have some inline JavaScript there from foo.com. 269 00:12:53,172 --> 00:12:55,130 And the other thing that you might have in here 270 00:12:55,130 --> 00:12:58,960 is actually a frame. 271 00:12:58,960 --> 00:13:01,680 So we'll talk about frames a bit more in a little bit, 272 00:13:01,680 --> 00:13:04,450 but think of a frame as almost like a separate JavaScript 273 00:13:04,450 --> 00:13:05,770 universe. 274 00:13:05,770 --> 00:13:08,960 It's a little bit equivalent to a process in Unix. 275 00:13:08,960 --> 00:13:13,500 So maybe this frame here, maybe this guy belongs 276 00:13:13,500 --> 00:13:15,226 to https://facebook.com/likethis.html. 277 00:13:30,690 --> 00:13:36,940 So maybe here we have some inline JavaScript 278 00:13:36,940 --> 00:13:40,240 from Facebook.
279 00:13:40,240 --> 00:13:43,300 And then, maybe, we also have some image. 280 00:13:43,300 --> 00:13:46,410 So you know, f.jpeg. 281 00:13:49,040 --> 00:14:00,140 That comes from https://facebook.com. 282 00:14:00,140 --> 00:14:06,370 OK, so this is what a single tab might have in its contents. 283 00:14:06,370 --> 00:14:08,052 But as I just mentioned, all this 284 00:14:08,052 --> 00:14:10,510 can, potentially, come from all these different principals. 285 00:14:10,510 --> 00:14:12,301 So there's a bunch of interesting questions 286 00:14:12,301 --> 00:14:14,320 that we can ask about an application that 287 00:14:14,320 --> 00:14:15,200 looks like this. 288 00:14:15,200 --> 00:14:19,354 So for example, can this analytics code from google.com 289 00:14:19,354 --> 00:14:21,660 actually access JavaScript state that 290 00:14:21,660 --> 00:14:23,880 resides in the jQuery code? 291 00:14:23,880 --> 00:14:26,840 So to a first approximation, maybe that seems like a bad idea 292 00:14:26,840 --> 00:14:29,577 because these two pieces of code came from different places. 293 00:14:29,577 --> 00:14:31,160 But then again, maybe it's actually OK 294 00:14:31,160 --> 00:14:35,170 because, presumably, foo.com brought both of these libraries 295 00:14:35,170 --> 00:14:37,270 in so that they can work with each other. 296 00:14:37,270 --> 00:14:38,280 So who knows. 297 00:14:38,280 --> 00:14:40,370 Another question you might have is 298 00:14:40,370 --> 00:14:43,090 can the analytics code here actually 299 00:14:43,090 --> 00:14:44,880 interact with the text inputs here? 300 00:14:44,880 --> 00:14:47,350 So for example, can the analytics code 301 00:14:47,350 --> 00:14:49,460 define event handlers? 302 00:14:49,460 --> 00:14:51,720 So a little bit of background in JavaScript. 303 00:14:51,720 --> 00:14:54,720 JavaScript has a single-threaded, event-driven model.
304 00:14:54,720 --> 00:14:56,280 So basically, in each frame, there's 305 00:14:56,280 --> 00:14:58,970 just an event loop that's just constantly pulling events. 306 00:14:58,970 --> 00:15:01,570 Key presses, network events, timers, and stuff like that. 307 00:15:01,570 --> 00:15:03,260 And then, seeing if there are any handlers associated 308 00:15:03,260 --> 00:15:04,009 with those events. 309 00:15:04,009 --> 00:15:05,460 And if so, it fires them. 310 00:15:05,460 --> 00:15:08,800 So who should be able to define event handlers for this HTML? 311 00:15:08,800 --> 00:15:10,510 Should google.com be able to do it? 312 00:15:10,510 --> 00:15:14,520 It's not from foo.com so maybe, maybe not. 313 00:15:14,520 --> 00:15:16,890 Another question, too, is what's the relationship 314 00:15:16,890 --> 00:15:19,930 between this Facebook frame here and the larger frame? 315 00:15:19,930 --> 00:15:23,680 The Facebook frame is HTTPS, secure. 316 00:15:23,680 --> 00:15:26,460 foo.com is HTTP, nonsecure. 317 00:15:26,460 --> 00:15:29,090 So how should these two things be able to interact? 318 00:15:29,090 --> 00:15:31,900 So basically, to answer these questions, 319 00:15:31,900 --> 00:15:38,015 browsers use a security model called the same-origin policy. 320 00:15:43,910 --> 00:15:47,294 So there's, sort of, this vague goal 321 00:15:47,294 --> 00:15:49,460 because a lot of things with respect to web security 322 00:15:49,460 --> 00:15:50,436 are, kind of, vague because nobody 323 00:15:50,436 --> 00:15:51,477 knows what they're doing. 324 00:15:51,477 --> 00:15:58,140 But the basic idea is two websites 325 00:15:58,140 --> 00:16:03,654 should not be able to tamper with each other, 326 00:16:03,654 --> 00:16:05,272 unless they want to. 327 00:16:14,090 --> 00:16:19,860 So defining what tampering means was actually easier 328 00:16:19,860 --> 00:16:21,300 when the web was simpler.
329 00:16:21,300 --> 00:16:23,032 But as we keep adding these new APIs, 330 00:16:23,032 --> 00:16:24,990 it's more and more difficult to understand what 331 00:16:24,990 --> 00:16:26,760 this non-tampering goal means. 332 00:16:26,760 --> 00:16:29,550 So for example, it's obviously bad 333 00:16:29,550 --> 00:16:32,010 if two websites, which don't trust each other, 334 00:16:32,010 --> 00:16:34,850 can overwrite each other's visual display. 335 00:16:34,850 --> 00:16:36,970 That seems like an obviously bad thing. 336 00:16:36,970 --> 00:16:39,000 It seems like an obviously good thing 337 00:16:39,000 --> 00:16:41,680 if two websites, which want to collaborate, 338 00:16:41,680 --> 00:16:44,977 are able to, somehow, exchange data in a safe way. 339 00:16:44,977 --> 00:16:47,310 So you can think of mashup sites you may have heard of. 340 00:16:47,310 --> 00:16:49,040 So sometimes you'll see these things on the internet. 341 00:16:49,040 --> 00:16:50,990 It's like someone takes Google map data, 342 00:16:50,990 --> 00:16:52,995 and then takes the location of food trucks. 343 00:16:52,995 --> 00:16:54,620 And then, you have this amazing mashup 344 00:16:54,620 --> 00:16:57,140 that allows you to eat cheaply and avoid salmonella, right? 345 00:16:57,140 --> 00:16:59,930 So that seems like a thing you should be able to do. 346 00:16:59,930 --> 00:17:02,695 But how, exactly, do we enable that type of composition? 347 00:17:02,695 --> 00:17:05,069 Then there's other things that are, kind of, hard to say. 348 00:17:05,069 --> 00:17:07,910 So for example, if JavaScript code comes from origin 349 00:17:07,910 --> 00:17:11,270 x inside of a page that's from origin y, 350 00:17:11,270 --> 00:17:15,920 how exactly should that code and that content compose? 351 00:17:15,920 --> 00:17:23,220 So the strategy that the same-origin policy uses can be 352 00:17:23,220 --> 00:17:25,579 roughly described as follows.
353 00:17:25,579 --> 00:17:38,830 So each resource is assigned an origin, which 354 00:17:38,830 --> 00:17:41,680 we'll discuss in a second. 355 00:17:44,790 --> 00:17:49,740 And essentially, JavaScript code 356 00:17:49,740 --> 00:17:57,430 can only access resources from its own origin. 357 00:18:05,820 --> 00:18:08,820 So this is the high level strategy 358 00:18:08,820 --> 00:18:10,064 the same-origin policy uses. 359 00:18:10,064 --> 00:18:11,355 But the devil's in the details. 360 00:18:11,355 --> 00:18:13,271 And there's a ton of exceptions, which we're 361 00:18:13,271 --> 00:18:15,450 going to look into in a second. 362 00:18:15,450 --> 00:18:17,180 But first of all, before we proceed, 363 00:18:17,180 --> 00:18:19,930 let's define what an origin is. 364 00:18:19,930 --> 00:18:29,310 So an origin is, basically, a network protocol scheme 365 00:18:29,310 --> 00:18:36,140 plus a host name plus a port. 366 00:18:39,540 --> 00:18:44,952 So for example, we can have something like http://foo.com. 367 00:18:47,664 --> 00:18:49,310 And then, maybe, it's index.html. 368 00:18:55,130 --> 00:18:58,536 So the scheme here is HTTP. 369 00:18:58,536 --> 00:19:02,400 And the host name is foo.com. 370 00:19:02,400 --> 00:19:03,840 And the port is 80. 371 00:19:03,840 --> 00:19:06,530 Now the port, in this case, is implicit. 372 00:19:06,530 --> 00:19:08,830 The port is the port on the server side 373 00:19:08,830 --> 00:19:10,560 that the client uses to connect. 374 00:19:10,560 --> 00:19:13,490 So if you see a URL from the HTTP scheme 375 00:19:13,490 --> 00:19:16,550 and there's no port that's explicitly supplied, then, 376 00:19:16,550 --> 00:19:19,000 implicitly, that port is 80. 377 00:19:19,000 --> 00:19:26,220 So then, if we look at something like HTTPS, 378 00:19:26,220 --> 00:19:29,764 once again, foo.com/index.html. 379 00:19:33,340 --> 00:19:37,270 So these two URLs have the same host name. 380 00:19:37,270 --> 00:19:37,800 Right?
381 00:19:37,800 --> 00:19:40,880 But they have, actually, different schemes. 382 00:19:40,880 --> 00:19:42,690 HTTPS vs. HTTP. 383 00:19:42,690 --> 00:19:46,710 And also, here, the port is implicitly 443. 384 00:19:46,710 --> 00:19:48,880 That's the default HTTPS port. 385 00:19:48,880 --> 00:19:51,940 So these two URLs have different origins. 386 00:19:51,940 --> 00:19:54,840 And then, as a final example, if you 387 00:19:54,840 --> 00:20:00,830 had a site like http://bar.com, then you 388 00:20:00,830 --> 00:20:03,740 can use this colon notation here. 389 00:20:03,740 --> 00:20:07,330 8181. 390 00:20:07,330 --> 00:20:09,680 You know, these things beyond here 391 00:20:09,680 --> 00:20:12,915 don't matter with respect to the same-origin policy, at least 392 00:20:12,915 --> 00:20:15,150 with respect to this very simple example. 393 00:20:15,150 --> 00:20:17,930 Here, we see that we have a scheme of HTTP, a host 394 00:20:17,930 --> 00:20:22,230 name of bar.com, and here we've explicitly specified the port. 395 00:20:22,230 --> 00:20:25,771 So in this case, it's a non-default port of 8181. 396 00:20:25,771 --> 00:20:26,770 So does that make sense? 397 00:20:26,770 --> 00:20:29,480 It's pretty straightforward. 398 00:20:29,480 --> 00:20:33,970 OK, so this is, basically, what an origin is. 399 00:20:33,970 --> 00:20:39,630 Loosely speaking, you can think of an origin as a UID in Unix 400 00:20:39,630 --> 00:20:43,950 with the frame being loosely considered as, like, a process. 401 00:20:43,950 --> 00:20:53,410 So there are four basic ideas behind the browser's 402 00:20:53,410 --> 00:20:56,100 implementation of the same-origin policy. 403 00:20:56,100 --> 00:21:08,350 So first idea is each origin has client-side resources. 404 00:21:14,180 --> 00:21:17,590 So what are examples of those resources? 405 00:21:17,590 --> 00:21:21,560 Things like cookies.
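Stepping back to the definition of an origin given above (scheme plus host name plus port, with the port defaulting to 80 for HTTP and 443 for HTTPS), the comparison can be sketched in a few lines. This is a rough illustration under stated assumptions, not how any browser implements it: `originOf` and `sameOrigin` are hypothetical names, the paths in the example URLs are made up, and the only real API used is the standard `URL` class.

```javascript
// Default ports for the two schemes discussed in the lecture.
const DEFAULT_PORTS = { "http:": "80", "https:": "443" };

// Hypothetical helper: split a URL into the (scheme, host, port) triple.
function originOf(url) {
  const u = new URL(url);
  return {
    scheme: u.protocol.slice(0, -1), // "http:" -> "http"
    host: u.hostname,
    // URL.port is "" when the port is the scheme's default, so fill it in.
    port: u.port || DEFAULT_PORTS[u.protocol] || "",
  };
}

// Two URLs share an origin only if all three components match.
// Anything after the port (the path, for instance) is ignored.
function sameOrigin(a, b) {
  const x = originOf(a);
  const y = originOf(b);
  return x.scheme === y.scheme && x.host === y.host && x.port === y.port;
}

console.log(originOf("http://foo.com/index.html"));
console.log(originOf("https://foo.com/index.html"));
console.log(originOf("http://bar.com:8181/page.html"));
console.log(sameOrigin("http://foo.com/index.html", "https://foo.com/index.html"));
```

In practice the `URL` class also exposes an `origin` property that serializes the same triple, so the manual version here is only meant to make the three components visible.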
406 00:21:21,560 --> 00:21:25,170 Now you can think of cookies as a very simple way 407 00:21:25,170 --> 00:21:29,560 to implement state in a stateless protocol like HTTP. 408 00:21:29,560 --> 00:21:31,740 Basically, a cookie is like a tiny file that's 409 00:21:31,740 --> 00:21:33,614 associated with each origin. 410 00:21:33,614 --> 00:21:35,780 And we'll talk about the specifics of this in a bit. 411 00:21:35,780 --> 00:21:38,238 But the basic idea is that when the browser sends a request 412 00:21:38,238 --> 00:21:40,960 to a particular website, it includes any cookies 413 00:21:40,960 --> 00:21:43,320 that the client has for that website. 414 00:21:43,320 --> 00:21:46,230 And you can use these cookies for things 415 00:21:46,230 --> 00:21:48,385 like implementing password remembering. 416 00:21:48,385 --> 00:21:50,480 Maybe if you were going to an ecommerce site, 417 00:21:50,480 --> 00:21:53,935 you can remember stuff about a user's shopping cart 418 00:21:53,935 --> 00:21:55,960 in these cookies, so on and so forth. 419 00:21:55,960 --> 00:21:59,530 So cookies are one thing that each origin 420 00:21:59,530 --> 00:22:01,070 can be associated with. 421 00:22:01,070 --> 00:22:04,180 Also, you can think of DOM storage 422 00:22:04,180 --> 00:22:06,170 as another one of these resources. 423 00:22:06,170 --> 00:22:08,350 This is a fairly new interface. 424 00:22:08,350 --> 00:22:11,820 But think of DOM storage as just a key value store. 425 00:22:11,820 --> 00:22:14,600 So DOM storage allows an origin to say, 426 00:22:14,600 --> 00:22:16,562 for this given key, which is a string, 427 00:22:16,562 --> 00:22:18,020 let me associate it with this given 428 00:22:18,020 --> 00:22:21,650 value, which is also a string. 429 00:22:21,650 --> 00:22:26,390 Another thing that is associated with an origin 430 00:22:26,390 --> 00:22:28,545 is a JavaScript namespace.
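The idea that DOM storage is a string-to-string key value store belonging to an origin can be modeled with a toy class. This is a sketch of the concept only: `OriginScopedStorage` is a hypothetical name, it runs outside the browser, and it is not the real `localStorage` implementation. It assumes that the origin of the page URL is what selects the store, which matches the lecture's description.

```javascript
// Toy model: one independent key-value store per origin.
class OriginScopedStorage {
  constructor() {
    this.stores = new Map(); // origin string -> Map of key -> value
  }

  // Pick (or lazily create) the store for the origin of the given page.
  storeFor(pageUrl) {
    const origin = new URL(pageUrl).origin; // e.g. "http://foo.com"
    if (!this.stores.has(origin)) {
      this.stores.set(origin, new Map());
    }
    return this.stores.get(origin);
  }

  // Keys and values are both coerced to strings, as in the lecture.
  setItem(pageUrl, key, value) {
    this.storeFor(pageUrl).set(String(key), String(value));
  }

  // Returns null when the key is absent for that origin.
  getItem(pageUrl, key) {
    const store = this.storeFor(pageUrl);
    return store.has(String(key)) ? store.get(String(key)) : null;
  }
}

const storage = new OriginScopedStorage();
storage.setItem("http://foo.com/index.html", "cart", "3 items");

// Another page on the same origin sees the value ...
console.log(storage.getItem("http://foo.com/other.html", "cart"));
// ... but the HTTPS version of the site is a different origin and does not.
console.log(storage.getItem("https://foo.com/index.html", "cart"));
```

Cookies are scoped by somewhat different rules than this, which the lecture says it will come back to, so the model above should be read as covering DOM storage only.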
431 00:22:32,810 --> 00:22:34,840 So that JavaScript namespace defines 432 00:22:34,840 --> 00:22:36,530 what functions and what interfaces 433 00:22:36,530 --> 00:22:38,887 are available to the origin. 434 00:22:38,887 --> 00:22:40,470 Some of those interfaces are built in. 435 00:22:40,470 --> 00:22:42,890 Like, let's say, the string prototype and stuff like that. 436 00:22:42,890 --> 00:22:44,514 And then, an application might actually 437 00:22:44,514 --> 00:22:47,620 fill the JavaScript namespace with some other content. 438 00:22:47,620 --> 00:22:53,180 There's also this thing called the DOM tree. 439 00:22:53,180 --> 00:22:56,580 So DOM is short for Document Object Model. 440 00:22:56,580 --> 00:22:58,410 And the DOM tree is, essentially, 441 00:22:58,410 --> 00:23:03,090 a JavaScript reflection of the HTML in a page. 442 00:23:03,090 --> 00:23:07,410 So you can imagine that the DOM tree 443 00:23:07,410 --> 00:23:14,690 has a node for the topmost HTML5 node in the HTML. 444 00:23:14,690 --> 00:23:20,820 And then, it's going to have a node for the head tag. 445 00:23:20,820 --> 00:23:24,470 Then, it's going to have a node for the body tag. 446 00:23:27,176 --> 00:23:29,200 All right, so on and so forth. 447 00:23:29,200 --> 00:23:32,270 So the way that a lot of dynamic web pages 448 00:23:32,270 --> 00:23:35,470 are made dynamic is the JavaScript code 449 00:23:35,470 --> 00:23:37,630 can access this data structure in JavaScript 450 00:23:37,630 --> 00:23:39,249 that mirrors the HTML content. 451 00:23:39,249 --> 00:23:41,040 So you can imagine an animation takes place 452 00:23:41,040 --> 00:23:43,000 by changing some of these nodes down 453 00:23:43,000 --> 00:23:46,670 here to implement different organizations of various tabs. 454 00:23:46,670 --> 00:23:49,290 So that's what the DOM tree is. 455 00:23:49,290 --> 00:23:53,085 There's also a visual display area.
456 00:23:57,398 --> 00:23:59,611 Although, we'll see that the visual display area 457 00:23:59,611 --> 00:24:01,860 actually interacts very strangely with the same origin 458 00:24:01,860 --> 00:24:02,910 policy. 459 00:24:02,910 --> 00:24:04,170 So on and so forth. 460 00:24:04,170 --> 00:24:06,950 So at a high level, each origin has access 461 00:24:06,950 --> 00:24:10,160 to some set of client side resources of these types. 462 00:24:10,160 --> 00:24:13,290 Does that make sense? 463 00:24:13,290 --> 00:24:21,920 And then, the second big idea is that each frame 464 00:24:21,920 --> 00:24:28,100 gets the origin of its URL. 465 00:24:34,060 --> 00:24:35,790 So as I mentioned before, a frame 466 00:24:35,790 --> 00:24:39,850 is, roughly, analogous to a process in Unix. 467 00:24:39,850 --> 00:24:41,780 It's, kind of, like a namespace that 468 00:24:41,780 --> 00:24:45,700 aggregates a bunch of other different resources. 469 00:24:45,700 --> 00:24:55,380 So the third idea is that scripts, so JavaScript code, 470 00:24:55,380 --> 00:25:09,700 execute with the authority of their frame's origin. 471 00:25:18,510 --> 00:25:22,990 OK, so what that means is that if foo.com imports a JavaScript 472 00:25:22,990 --> 00:25:24,130 file from bar.com, 473 00:25:24,130 --> 00:25:26,200 well, that JavaScript file is going 474 00:25:26,200 --> 00:25:30,780 to be able to act with the authority of foo.com. 475 00:25:30,780 --> 00:25:34,125 So loosely speaking, this is, sort of, similar to 476 00:25:34,125 --> 00:25:36,220 if you were in the Unix world to run 477 00:25:36,220 --> 00:25:38,610 a binary that, sort of, belonged in someone else's home 478 00:25:38,610 --> 00:25:39,380 directory. 479 00:25:39,380 --> 00:25:41,760 That thing would, sort of, execute with your privileges 480 00:25:41,760 --> 00:25:43,650 there. 481 00:25:43,650 --> 00:25:50,020 And the fourth thing is there's passive content.
482 00:25:50,020 --> 00:25:55,980 So by passive content I mean things like images, 483 00:25:55,980 --> 00:25:57,490 for example. 484 00:25:57,490 --> 00:26:00,217 Or CSS files or things like that. 485 00:26:00,217 --> 00:26:01,800 These are things which we don't think 486 00:26:01,800 --> 00:26:03,750 of as having executable code. 487 00:26:03,750 --> 00:26:08,800 So passive content gets zero authority from the browser. 488 00:26:16,430 --> 00:26:19,070 So that, kind of, makes sense. 489 00:26:19,070 --> 00:26:21,270 We'll see why this fourth thing is a little bit 490 00:26:21,270 --> 00:26:22,280 subtle in a second. 491 00:26:22,280 --> 00:26:25,080 So going back to our example here. 492 00:26:25,080 --> 00:26:27,830 So we see, for example, that the Google Analytics 493 00:26:27,830 --> 00:26:32,425 script and the jQuery script can access all kinds of stuff 494 00:26:32,425 --> 00:26:33,630 in foo.com. 495 00:26:33,630 --> 00:26:35,970 So for example, they can read and write cookies. 496 00:26:35,970 --> 00:26:39,440 They can do things like attach event handlers to buttons here. 497 00:26:39,440 --> 00:26:41,500 So on and so forth. 498 00:26:41,500 --> 00:26:44,900 If we look at the Facebook frame and its relationship 499 00:26:44,900 --> 00:26:47,090 to the larger foo.com frame, then we 500 00:26:47,090 --> 00:26:49,440 see that they're from different origins 501 00:26:49,440 --> 00:26:51,830 because they have different schemes here. 502 00:26:51,830 --> 00:26:54,660 They have different host names. 503 00:26:54,660 --> 00:26:55,560 Different ports. 504 00:26:55,560 --> 00:26:58,630 So what this means is that they are, to a first approximation, 505 00:26:58,630 --> 00:27:00,010 isolated. 506 00:27:00,010 --> 00:27:03,630 Now they can communicate if they both opt 507 00:27:03,630 --> 00:27:07,885 into it using this interface called postMessage.
508 00:27:12,540 --> 00:27:17,320 So postMessage allows two different frames 509 00:27:17,320 --> 00:27:20,960 to exchange asynchronous immutable messages 510 00:27:20,960 --> 00:27:21,750 with each other. 511 00:27:21,750 --> 00:27:25,080 So think of this facility as allowing Facebook 512 00:27:25,080 --> 00:27:27,310 to try to send a string. 513 00:27:27,310 --> 00:27:30,860 Not a reference, a string, up to the enclosing foo.com frame. 514 00:27:30,860 --> 00:27:34,420 Now note that if foo.com doesn't want to receive those messages, 515 00:27:34,420 --> 00:27:35,430 it doesn't have to. 516 00:27:35,430 --> 00:27:37,940 So this has to be opt in from both sides 517 00:27:37,940 --> 00:27:41,220 to get this thing to work. 518 00:27:41,220 --> 00:27:45,880 So note that the JavaScript code here in the Facebook frame 519 00:27:45,880 --> 00:27:51,860 cannot issue an XMLHttpRequest to the foo.com server. 520 00:27:51,860 --> 00:27:54,460 That's once again because network destinations also 521 00:27:54,460 --> 00:27:56,710 have these origins that are associated with them. 522 00:27:56,710 --> 00:28:00,220 So because Facebook.com does not have the same origin as foo.com 523 00:28:00,220 --> 00:28:05,610 it can't asynchronously fetch stuff from it via an XMLHttpRequest. 524 00:28:05,610 --> 00:28:08,370 So the last thing we can look at, we 525 00:28:08,370 --> 00:28:10,480 can say, OK, we've got an image up here from ads.com. 526 00:28:10,480 --> 00:28:12,302 This is rule number four over there. 527 00:28:12,302 --> 00:28:13,760 So it seems pretty straightforward. 528 00:28:13,760 --> 00:28:14,690 This is an image. 529 00:28:14,690 --> 00:28:15,890 It has no executable code. 530 00:28:15,890 --> 00:28:18,350 So clearly, the browser's going to give it no authority. 531 00:28:18,350 --> 00:28:20,320 Now that seems kind of like a dumb thing.
532 00:28:20,320 --> 00:28:22,260 Like, why are you even talking about images 533 00:28:22,260 --> 00:28:23,800 having authority or not having authority? 534 00:28:23,800 --> 00:28:26,258 It seems obvious that images shouldn't be able to do stuff. 535 00:28:26,258 --> 00:28:28,530 Well it's a security class. 536 00:28:28,530 --> 00:28:32,005 So clearly, there is mischief that hides in statement number 537 00:28:32,005 --> 00:28:32,970 four up there. 538 00:28:32,970 --> 00:28:39,600 So what happens if the browser incorrectly parses an object 539 00:28:39,600 --> 00:28:42,422 and misattributes its type? 540 00:28:42,422 --> 00:28:44,630 So you can actually get into security problems there. 541 00:28:44,630 --> 00:28:46,660 And this was actually a real security problem. 542 00:28:46,660 --> 00:28:49,340 So there's this thing called the MIME sniffing attack. 543 00:28:49,340 --> 00:28:50,876 So the MIME type-- I mean, you've 544 00:28:50,876 --> 00:28:52,000 probably seen these before. 545 00:28:52,000 --> 00:28:56,176 You know, it's something like text/html 546 00:28:56,176 --> 00:28:58,360 or image/jpeg. Things like that. 547 00:28:58,360 --> 00:29:00,240 This was like a MIME type. 548 00:29:00,240 --> 00:29:04,690 So old versions of IE used to do something that they thought 549 00:29:04,690 --> 00:29:06,470 was going to be helpful for you. 550 00:29:06,470 --> 00:29:08,410 So sometimes what web servers will do 551 00:29:08,410 --> 00:29:13,519 is they will misattribute the file extension of an object. 552 00:29:13,519 --> 00:29:15,310 So you can imagine that a web server that's 553 00:29:15,310 --> 00:29:19,050 been configured incorrectly might attach a dot HTML 554 00:29:19,050 --> 00:29:21,830 suffix to something that's really an image. 555 00:29:21,830 --> 00:29:24,470 Or it might attach a dot JPEG suffix 556 00:29:24,470 --> 00:29:26,910 to something that's really HTML.
557 00:29:26,910 --> 00:29:29,190 So what IE would do back in the olden 558 00:29:29,190 --> 00:29:31,040 days is try to help you out. 559 00:29:31,040 --> 00:29:32,250 So IE would go out. 560 00:29:32,250 --> 00:29:34,270 It would go fetch this resource. 561 00:29:34,270 --> 00:29:37,020 And it would say, OK, this resource 562 00:29:37,020 --> 00:29:39,840 claims to be of some type, according to its file name 563 00:29:39,840 --> 00:29:40,570 extension. 564 00:29:40,570 --> 00:29:43,520 But then it would actually look at the first 256 bytes 565 00:29:43,520 --> 00:29:45,620 of what was in that object. 566 00:29:45,620 --> 00:29:48,089 And if it found certain magic values in there 567 00:29:48,089 --> 00:29:50,380 that indicated that there was a different type for that 568 00:29:50,380 --> 00:29:54,440 object, it would just say, hey, I found something cool here. 569 00:29:54,440 --> 00:29:56,630 The web server misidentified the object. 570 00:29:56,630 --> 00:29:59,640 Let me just treat the object like it's the type 571 00:29:59,640 --> 00:30:01,779 that I found in these first 256 bytes. 572 00:30:01,779 --> 00:30:03,570 And then, everybody's a winner because I've 573 00:30:03,570 --> 00:30:05,028 helped the web server developer out 574 00:30:05,028 --> 00:30:08,396 because now their website's going to render properly. 575 00:30:08,396 --> 00:30:09,770 And the user's going to like this 576 00:30:09,770 --> 00:30:11,290 because they get to unlock this content that 577 00:30:11,290 --> 00:30:12,850 would have been garbage before. 578 00:30:12,850 --> 00:30:15,320 But this is clearly a vulnerability 579 00:30:15,320 --> 00:30:20,260 because suppose that a page includes some passive content. 580 00:30:20,260 --> 00:30:23,400 Like, let's say, an image from a domain that's 581 00:30:23,400 --> 00:30:25,340 controlled by the attacker.
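The sniffing behavior just described can be modeled roughly as follows. This is a sketch under stated assumptions, not IE's actual algorithm: the 256-byte window comes from the lecture, while the marker list and the function name are hypothetical stand-ins for the "magic values" the browser looked for.

```python
# Number of leading bytes inspected, as described in the lecture.
SNIFF_WINDOW = 256

# Hypothetical "magic values" that make content look like HTML.
HTML_MARKERS = (b"<html", b"<script", b"<body")


def sniffed_type(declared_type, body):
    """Return the type a sniffing browser would act on for this response."""
    head = body[:SNIFF_WINDOW].lower()
    if any(marker in head for marker in HTML_MARKERS):
        # The browser overrides the declared type and treats the bytes as HTML.
        return "text/html"
    return declared_type
```

In this model a response declared as image/jpeg whose first bytes contain a script tag is reclassified as text/html, which is the point at which supposedly passive content stops being passive.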
582 00:30:25,340 --> 00:30:28,750 Now from the perspective of the victim page, it's saying, 583 00:30:28,750 --> 00:30:32,820 even if this attacker site is evil, it's passive content. 584 00:30:32,820 --> 00:30:34,189 It can't do anything. 585 00:30:34,189 --> 00:30:36,230 Like, at worst, it displays an unfortunate image. 586 00:30:36,230 --> 00:30:38,130 But it can't actually access any code 587 00:30:38,130 --> 00:30:40,820 because passive content gets zero authority. 588 00:30:40,820 --> 00:30:44,790 But what would happen is that IE could sniff this image. 589 00:30:44,790 --> 00:30:46,300 The first 256 bytes. 590 00:30:46,300 --> 00:30:48,230 And the attacker could intentionally 591 00:30:48,230 --> 00:30:51,096 put HTML and JavaScript in there. 592 00:30:51,096 --> 00:30:53,220 So what would happen is that the victim site brings 593 00:30:53,220 --> 00:30:54,930 in what it thinks is an image. 594 00:30:54,930 --> 00:30:58,260 IE coerces it into HTML and JavaScript. 595 00:30:58,260 --> 00:31:02,300 And then, executes that code in the context 596 00:31:02,300 --> 00:31:04,790 of that enclosing page. 597 00:31:04,790 --> 00:31:07,360 So does that attack make sense? 598 00:31:07,360 --> 00:31:07,860 So 599 00:31:07,860 --> 00:31:12,420 this is, sort of, an example of how complex browsers are 600 00:31:12,420 --> 00:31:17,000 and how adding even a very well intentioned feature 601 00:31:17,000 --> 00:31:22,010 can cause these very subtle security bugs. 602 00:31:22,010 --> 00:31:26,740 So let's now dig down and take a deeper look 603 00:31:26,740 --> 00:31:29,870 at how the browser secures various resources. 604 00:31:29,870 --> 00:31:36,515 So let's look at frames and window objects. 605 00:31:42,100 --> 00:31:46,550 So frames represent these separate JavaScript universes 606 00:31:46,550 --> 00:31:48,720 that we discussed over here.
607 00:31:48,720 --> 00:31:51,610 I mean, implementation wise, a frame 608 00:31:51,610 --> 00:31:55,400 with respect to JavaScript is an instance of a DOM node. 609 00:31:55,400 --> 00:31:57,010 So I forget where I drew-- oh, yeah. 610 00:31:57,010 --> 00:31:58,030 This DOM node up here. 611 00:31:58,030 --> 00:32:01,340 So the frame would exist as a DOM node 612 00:32:01,340 --> 00:32:03,080 object somewhere in this hierarchy that's 613 00:32:03,080 --> 00:32:04,730 visible to JavaScript. 614 00:32:04,730 --> 00:32:07,900 In JavaScript, the window object is actually an alias 615 00:32:07,900 --> 00:32:09,030 for the global namespace. 616 00:32:09,030 --> 00:32:10,321 It's, kind of, this wacky idea. 617 00:32:10,321 --> 00:32:12,980 Like, if you were to define this global variable named x, 618 00:32:12,980 --> 00:32:16,500 you can also access it via the name window.x. 619 00:32:16,500 --> 00:32:19,260 OK, so basically, frames and window objects 620 00:32:19,260 --> 00:32:22,450 are very powerful references for you to be able to access. 621 00:32:22,450 --> 00:32:24,662 And they actually contain pointers to each other. 622 00:32:24,662 --> 00:32:26,120 The frame can [INAUDIBLE] a pointer 623 00:32:26,120 --> 00:32:28,479 to the associated window object and vice versa. 624 00:32:28,479 --> 00:32:30,020 So these two things are, essentially, 625 00:32:30,020 --> 00:32:31,130 equivalently powerful. 626 00:32:31,130 --> 00:32:43,220 So frame and window objects get the origin of the frame's URL. 627 00:32:49,910 --> 00:32:54,650 Or because there's always an or in web security, 628 00:32:54,650 --> 00:33:10,530 they can get a suffix of the original domain name. 629 00:33:10,530 --> 00:33:11,890 The original origin. 630 00:33:11,890 --> 00:33:18,200 So for example, a frame could start off 631 00:33:18,200 --> 00:33:21,470 having an initial origin. 632 00:33:21,470 --> 00:33:26,770 x dot y dot z dot com.
633 00:33:26,770 --> 00:33:30,180 So let's ignore the scheme and the port for a second. 634 00:33:30,180 --> 00:33:33,020 So initially, the page can start off like this. 635 00:33:33,020 --> 00:33:39,470 It can then intentionally say I want to set my origin 636 00:33:39,470 --> 00:33:41,782 to be y dot z dot com. 637 00:33:41,782 --> 00:33:42,820 A suffix of that. 638 00:33:42,820 --> 00:33:44,320 And the way that it indicates this 639 00:33:44,320 --> 00:33:51,080 is by doing an assignment to the special document 640 00:33:51,080 --> 00:33:56,090 dot domain value that's accessible via JavaScript. 641 00:33:56,090 --> 00:33:59,600 So we can set document dot domain explicitly to this right 642 00:33:59,600 --> 00:34:00,150 here. 643 00:34:00,150 --> 00:34:02,060 And that's allowable because this guy 644 00:34:02,060 --> 00:34:04,160 is a suffix of that guy. 645 00:34:04,160 --> 00:34:07,880 And then, similarly, it could also 646 00:34:07,880 --> 00:34:10,770 set document dot domain to z.com and effectively reset 647 00:34:10,770 --> 00:34:12,770 its origin like that. 648 00:34:12,770 --> 00:34:16,980 Now what it cannot do is it cannot do something like 649 00:34:16,980 --> 00:34:23,729 setting document dot domain to a dot y dot z dot com. 650 00:34:23,729 --> 00:34:25,270 That's disallowed because this is not 651 00:34:25,270 --> 00:34:29,370 a proper suffix of the original origin. 652 00:34:29,370 --> 00:34:35,536 And also, it cannot set its suffix to dot com. 653 00:34:35,536 --> 00:34:39,510 So does anyone have any theories about why this is a bad idea? 654 00:34:39,510 --> 00:34:40,270 Right, exactly. 655 00:34:40,270 --> 00:34:41,760 So people are laughing because, clearly, this 656 00:34:41,760 --> 00:34:43,593 is going to bring out the apocalypse, right.
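The suffix rules above can be sketched as a small check. This is a model of the rule as stated in the lecture, not browser source code; the function name and the one-entry BLOCKED_SUFFIXES set are assumptions made for illustration.

```python
# Hypothetical stand-in for the suffixes no page may claim (for example "com").
BLOCKED_SUFFIXES = {"com"}


def can_set_document_domain(current_host, new_value):
    """Return True if new_value is an allowed suffix of current_host."""
    current = current_host.lower().split(".")
    new = new_value.lower().split(".")
    if new_value.lower() in BLOCKED_SUFFIXES:
        # Widening to a bare top-level suffix would reach every site under it.
        return False
    if len(new) > len(current):
        return False
    # Compare whole dot-separated labels, so "z.com" is not a suffix of "zz.com".
    return current[-len(new):] == new
```

For a frame loaded from x.y.z.com, this sketch accepts y.z.com and z.com and rejects a.y.z.com and com, matching the four cases on the board.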
657 00:34:43,593 --> 00:34:45,330 So if it does this, then this means 658 00:34:45,330 --> 00:34:49,639 that the site could somehow be able to impact cookies 659 00:34:49,639 --> 00:34:52,050 or things like that in any dot com site, which 660 00:34:52,050 --> 00:34:53,250 would be pretty devastating. 661 00:34:53,250 --> 00:34:56,210 The motivation for why these types of things are allowable 662 00:34:56,210 --> 00:34:59,910 is because, presumably, these origins 663 00:34:59,910 --> 00:35:02,130 have some type of preexisting trust relationship. 664 00:35:02,130 --> 00:35:04,330 So this seems to be vaguely OK. 665 00:35:04,330 --> 00:35:05,890 Whereas, this would seem to be bad. 666 00:35:05,890 --> 00:35:07,650 AUDIENCE: So you can make these splits 667 00:35:07,650 --> 00:35:10,999 on any dot or actual end point? 668 00:35:10,999 --> 00:35:12,811 Like, for example, for your x.y.zz.com, 669 00:35:12,811 --> 00:35:14,956 can you change that to your z.com? 670 00:35:14,956 --> 00:35:16,730 PROFESSOR: No, it's on every dot. 671 00:35:16,730 --> 00:35:17,936 AUDIENCE: OK. 672 00:35:17,936 --> 00:35:20,150 Is there a reason that it wasn't made 673 00:35:20,150 --> 00:35:27,560 so that you could specify super- or subdomain, 674 00:35:27,560 --> 00:35:31,820 but somehow they had to agree on where the information was 675 00:35:31,820 --> 00:35:33,050 coming from. 676 00:35:33,050 --> 00:35:36,370 So, like, you said some kind of I want to consider all of these 677 00:35:36,370 --> 00:35:37,674 to be the same origin as me. 678 00:35:37,674 --> 00:35:39,940 So any of them can attack me. 679 00:35:39,940 --> 00:35:42,945 And then you made this symmetric in order for me 680 00:35:42,945 --> 00:35:44,315 to impact them as well? 681 00:35:44,315 --> 00:35:48,170 [INAUDIBLE] .com means anything that's .com can impact me. 682 00:35:48,170 --> 00:35:50,410 And then you put [INAUDIBLE]. 683 00:35:50,410 --> 00:35:51,799 PROFESSOR: Yeah, it's tricky.
684 00:35:51,799 --> 00:35:53,840 So there's a couple of different answers to that. 685 00:35:53,840 --> 00:35:55,770 So first of all, people were very worried about this attack 686 00:35:55,770 --> 00:35:56,440 here. 687 00:35:56,440 --> 00:36:00,570 So they wanted to make the domain manipulation 688 00:36:00,570 --> 00:36:03,540 language be, at least, somewhat easy to understand. 689 00:36:03,540 --> 00:36:05,859 So they don't allow more baroque settings. 690 00:36:05,859 --> 00:36:08,150 I'll get to one thing in a second, which kind of allows 691 00:36:08,150 --> 00:36:10,720 what you're talking about but only with respect to domain 692 00:36:10,720 --> 00:36:11,220 [INAUDIBLE]. 693 00:36:11,220 --> 00:36:12,370 I'll get to that in one second. 694 00:36:12,370 --> 00:36:15,070 And another thing to mention, too, is that the postMessage interface 695 00:36:15,070 --> 00:36:18,230 does allow arbitrary domains to communicate with each other 696 00:36:18,230 --> 00:36:20,080 if they both opt into it. 697 00:36:20,080 --> 00:36:22,700 So in practice, people use postMessage 698 00:36:22,700 --> 00:36:25,040 to do cross-domain communication if they 699 00:36:25,040 --> 00:36:27,510 can't set their origins to be the same using 700 00:36:27,510 --> 00:36:30,060 these tricks here. 701 00:36:30,060 --> 00:36:35,780 So yeah, so browsers can constrain or widen, 702 00:36:35,780 --> 00:36:37,880 I should say, their domain to these suffixes 703 00:36:37,880 --> 00:36:39,404 of the original domain. 704 00:36:39,404 --> 00:36:41,570 And there's also this little interesting quirk here, 705 00:36:41,570 --> 00:36:45,980 which is that browsers actually distinguish between a document 706 00:36:45,980 --> 00:36:48,150 dot domain value that has been written 707 00:36:48,150 --> 00:36:50,306 and one that has not been written, OK. 708 00:36:50,306 --> 00:36:51,805 And there's a subtle reason for this 709 00:36:51,805 --> 00:36:52,930 we'll get into in a second.
710 00:36:52,930 --> 00:37:03,100 So basically, two frames can access each other 711 00:37:03,100 --> 00:37:06,840 if one of two things is true. 712 00:37:06,840 --> 00:37:13,290 The first thing is both of the frames set document dot 713 00:37:13,290 --> 00:37:19,390 domain to the same value. 714 00:37:24,330 --> 00:37:27,630 And the other way that two frames can access each other 715 00:37:27,630 --> 00:37:36,110 is that neither of those frames has changed document domain. 716 00:37:42,310 --> 00:37:46,110 And of course, both values have to match. 717 00:37:46,110 --> 00:37:49,278 And there's a value match. 718 00:37:52,090 --> 00:37:57,290 So the reason for this is a bit subtle. 719 00:37:57,290 --> 00:38:02,540 But the basic idea is that these two rules prevent a domain 720 00:38:02,540 --> 00:38:06,060 from being attacked by one of its own buggy 721 00:38:06,060 --> 00:38:08,150 or malicious sub-domains. 722 00:38:08,150 --> 00:38:08,760 OK? 723 00:38:08,760 --> 00:38:13,110 So imagine that you have the domain x.y.z.com. 724 00:38:16,540 --> 00:38:19,985 And then, imagine that it's trying to attack y.z.com. 725 00:38:23,526 --> 00:38:29,040 So this guy up here is buggy or evil. 726 00:38:32,080 --> 00:38:36,140 So what this guy could try to do is actually shorten his domain 727 00:38:36,140 --> 00:38:36,730 to be y.z.com. 728 00:38:36,730 --> 00:38:40,320 And then, start messing around with JavaScript state, 729 00:38:40,320 --> 00:38:42,170 or cookies or stuff like that here. 730 00:38:42,170 --> 00:38:42,690 Right? 
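The two access rules stated above can be sketched as a tiny model. The Frame class and its field names are hypothetical; the sketch ignores scheme and port, as the lecture does at this point, and it does not re-check that a written value is a legal suffix.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    host: str                    # host name the frame was loaded from
    domain: str = ""             # current document.domain value
    domain_written: bool = False # has the page assigned document.domain?

    def __post_init__(self):
        # Before any assignment, document.domain is just the frame's host.
        if not self.domain:
            self.domain = self.host

    def set_domain(self, value):
        """Model an explicit assignment to document.domain."""
        self.domain = value
        self.domain_written = True


def can_access(a, b):
    """Values must match, and either both frames wrote the value or neither did."""
    if a.domain != b.domain:
        return False
    return a.domain_written == b.domain_written
```

In the attack scenario, a frame from x.y.z.com that shortens its domain to y.z.com still fails can_access against an untouched y.z.com frame, because only one side has written the value.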
731 00:38:42,690 --> 00:38:45,420 So basically, what these two rules over here will say 732 00:38:45,420 --> 00:38:49,600 is that if y.z.com does not want to actually allow anyone 733 00:38:49,600 --> 00:38:51,910 to interact with it, it will never 734 00:38:51,910 --> 00:38:54,560 change its document.domain value 735 00:38:54,560 --> 00:38:57,860 so that when this frame up here does shorten it, 736 00:38:57,860 --> 00:38:59,340 the browser will say aha. 737 00:38:59,340 --> 00:39:00,700 You've shortened it. 738 00:39:00,700 --> 00:39:01,470 You have not. 739 00:39:01,470 --> 00:39:03,379 There's a match here in terms of the values. 740 00:39:03,379 --> 00:39:04,920 But this person hasn't indicated they 741 00:39:04,920 --> 00:39:08,209 want to opt into this type of chicanery. 742 00:39:08,209 --> 00:39:09,250 So does that make sense? 743 00:39:12,850 --> 00:39:18,610 OK, so that is, basically, how frames work with respect 744 00:39:18,610 --> 00:39:19,860 to the same origin policy. 745 00:39:23,280 --> 00:39:27,200 So then we can look at how DOM nodes are treated. 746 00:39:27,200 --> 00:39:31,700 So DOM nodes, it's pretty straightforward for DOM nodes. 747 00:39:31,700 --> 00:39:33,870 So DOM nodes, basically, get the origin 748 00:39:33,870 --> 00:39:35,950 of their surrounding frame. 749 00:39:35,950 --> 00:39:37,140 Makes sense. 750 00:39:37,140 --> 00:39:38,590 Then we can look at cookies. 751 00:39:38,590 --> 00:39:44,770 Cookies are complicated and a bit tricky. 752 00:39:44,770 --> 00:39:50,400 So cookies have a domain. 753 00:39:50,400 --> 00:39:52,555 And they have a path. 754 00:39:55,810 --> 00:40:01,040 So for example, you can imagine a cookie might be associated 755 00:40:01,040 --> 00:40:02,680 with the following information. 756 00:40:02,680 --> 00:40:06,880 So *.mit.edu. 757 00:40:06,880 --> 00:40:12,030 And then, 6.858.
758 00:40:12,030 --> 00:40:14,660 So you've got this domain thing sitting here, 759 00:40:14,660 --> 00:40:18,000 and then, you've got this path thing sitting over here. 760 00:40:18,000 --> 00:40:23,000 So note that this domain can be, possibly, a 761 00:40:23,000 --> 00:40:24,670 suffix of the page's current domain. 762 00:40:24,670 --> 00:40:26,378 So you can play, somewhat, similar tricks 763 00:40:26,378 --> 00:40:27,280 as we had over there. 764 00:40:27,280 --> 00:40:29,300 And note that this path here can actually just 765 00:40:29,300 --> 00:40:33,690 be set just to the slash with nothing else there, which 766 00:40:33,690 --> 00:40:37,280 indicates that all paths in the domain 767 00:40:37,280 --> 00:40:40,230 should be able to have access to this cookie here. 768 00:40:40,230 --> 00:40:41,940 But in this case, we actually have 769 00:40:41,940 --> 00:40:43,800 one of these nonempty paths. 770 00:40:43,800 --> 00:40:46,500 So whoever sets this cookie, basically, 771 00:40:46,500 --> 00:40:49,630 gets to choose what the domain and the path look like. 772 00:40:49,630 --> 00:40:51,950 And it can actually be set by the server 773 00:40:51,950 --> 00:40:54,190 or can be set on the client side. 774 00:40:54,190 --> 00:40:56,260 So on the client side, you can basically 775 00:40:56,260 --> 00:41:00,985 write to this JavaScript object called document.cookie. 776 00:41:04,200 --> 00:41:06,540 And there's, sort of, this Byzantine format 777 00:41:06,540 --> 00:41:08,426 that you can use to indicate all these paths 778 00:41:08,426 --> 00:41:09,300 and things like that. 779 00:41:09,300 --> 00:41:11,320 But suffice to say it can be done. 780 00:41:11,320 --> 00:41:13,280 So JavaScript can set cookies like this. 781 00:41:13,280 --> 00:41:14,690 And also, the server can actually 782 00:41:14,690 --> 00:41:18,880 set cookies on HTTP responses when they come back over the wire.
783 00:41:18,880 --> 00:41:21,740 So you can, basically, just use the Set-Cookie header, 784 00:41:21,740 --> 00:41:24,590 if you're the server, to set some of these things. 785 00:41:24,590 --> 00:41:30,530 And note that there's also a secure flag 786 00:41:30,530 --> 00:41:34,520 that you can set in the cookie to indicate that it's an HTTPS 787 00:41:34,520 --> 00:41:38,330 cookie, meaning that HTTP content should not 788 00:41:38,330 --> 00:41:41,110 be able to access that cookie. 789 00:41:41,110 --> 00:41:45,210 So that's the basic idea behind cookies. 790 00:41:45,210 --> 00:41:48,780 Now note that whenever the browser generates a request 791 00:41:48,780 --> 00:41:50,580 to a particular web server, it's going 792 00:41:50,580 --> 00:41:54,720 to include all of the matching cookies in that request. 793 00:41:54,720 --> 00:41:56,540 So there's a little bit of, sort of, 794 00:41:56,540 --> 00:41:58,093 string matching and algorithms that 795 00:41:58,093 --> 00:42:00,289 have to take place to figure out what are all 796 00:42:00,289 --> 00:42:01,830 the exact cookies that should be sent 797 00:42:01,830 --> 00:42:03,180 to the server for a particular request 798 00:42:03,180 --> 00:42:04,990 because you can have all these weird, 799 00:42:04,990 --> 00:42:06,632 sort of, suffix domain things going on 800 00:42:06,632 --> 00:42:07,590 and so on and so forth. 801 00:42:07,590 --> 00:42:12,890 But that's the basic idea behind cookies. 802 00:42:12,890 --> 00:42:16,224 So does that all make sense? 803 00:42:16,224 --> 00:42:18,654 AUDIENCE: So can frames access each other's cookies 804 00:42:18,654 --> 00:42:21,084 if they match those rules? 805 00:42:21,084 --> 00:42:24,150 PROFESSOR: Yeah, so frames can do that. 806 00:42:24,150 --> 00:42:28,430 But it's dependent on how the document.domain has been set. 807 00:42:28,430 --> 00:42:32,315 And then, it's dependent on what the cookie domain 808 00:42:32,315 --> 00:42:33,760 and path have been set to.
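The cookie matching that the browser performs on each request can be sketched roughly like this. It is a simplified model built only from the three attributes mentioned in the lecture (domain, path, secure flag); the Cookie class, the function names, and the example host are hypothetical, and real browsers apply additional rules.

```python
from dataclasses import dataclass


@dataclass
class Cookie:
    name: str
    value: str
    domain: str          # e.g. "mit.edu", covering the host and its subdomains
    path: str = "/"      # "/" means every path in the domain
    secure: bool = False # if True, only send over HTTPS


def domain_matches(host, cookie_domain):
    """The cookie domain must equal the host or be a dot-separated suffix of it."""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


def path_matches(request_path, cookie_path):
    """The cookie path must be a prefix of the request path on a '/' boundary."""
    if cookie_path == "/":
        return True
    cookie_path = cookie_path.rstrip("/")
    return request_path == cookie_path or request_path.startswith(cookie_path + "/")


def cookies_for_request(jar, scheme, host, path):
    """Select the cookies a browser would attach to a request."""
    return [
        c for c in jar
        if domain_matches(host, c.domain)
        and path_matches(path, c.path)
        and (not c.secure or scheme == "https")
    ]
```

With a cookie for domain mit.edu and path /6.858, a request to a subdomain of mit.edu under /6.858/ would carry the cookie, while a request to a look-alike host such as evilmit.edu would not.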
809 00:42:33,760 --> 00:42:36,860 So yeah, after a bunch of these string comparisons, 810 00:42:36,860 --> 00:42:38,610 yes, frames can access each other's cookies 811 00:42:38,610 --> 00:42:39,810 if all those tests pass. 812 00:42:44,220 --> 00:42:47,400 OK, so yes, that leads me into the next question. 813 00:42:47,400 --> 00:42:50,240 So we're trying to figure out how different frames can 814 00:42:50,240 --> 00:42:51,580 access each other's cookies. 815 00:42:51,580 --> 00:42:54,040 So what's the problem? 816 00:42:54,040 --> 00:42:56,910 What would be the problem if we allowed arbitrary frames 817 00:42:56,910 --> 00:42:59,516 to write arbitrary people's cookies? 818 00:42:59,516 --> 00:43:00,390 So what do you think? 819 00:43:06,513 --> 00:43:08,950 Well, it will be bad, suffice it to say. 820 00:43:08,950 --> 00:43:11,150 The reason it would be bad is because, once again, 821 00:43:11,150 --> 00:43:16,340 these cookies allow the client side of the application 822 00:43:16,340 --> 00:43:18,940 to store per-user data. 823 00:43:18,940 --> 00:43:22,180 So you can imagine that if an attacker could control 824 00:43:22,180 --> 00:43:24,960 or override a user's cookie, the attacker could actually, 825 00:43:24,960 --> 00:43:27,940 for example, change that cookie for Gmail 826 00:43:27,940 --> 00:43:33,140 to make the user log into the attacker's Gmail account. 827 00:43:33,140 --> 00:43:35,230 So when the user logged into the attacker's Gmail 828 00:43:35,230 --> 00:43:38,820 account, any email that the user typed in 829 00:43:38,820 --> 00:43:40,680 could be read by the attacker, for example. 830 00:43:40,680 --> 00:43:42,680 You could also imagine that someone could tamper 831 00:43:42,680 --> 00:43:44,177 with the Amazon.com cookie. 832 00:43:44,177 --> 00:43:46,510 You know, put all kinds of embarrassing ridiculous stuff 833 00:43:46,510 --> 00:43:49,066 in your shopping cart, perhaps, or so on and so forth.
834 00:43:49,066 --> 00:43:51,690 So cookies are, actually, a very important resource to protect. 835 00:43:51,690 --> 00:43:54,580 And a lot of web security attacks 836 00:43:54,580 --> 00:43:59,390 try to steal that cookie to do various kinds of evil. 837 00:43:59,390 --> 00:44:01,750 So here's another interesting question 838 00:44:01,750 --> 00:44:03,370 with respect to cookies. 839 00:44:03,370 --> 00:44:08,420 So let's say that you've got the site that's 840 00:44:08,420 --> 00:44:12,290 coming from foo.co.uk. 841 00:44:14,800 --> 00:44:18,690 So should the site from this host name 842 00:44:18,690 --> 00:44:24,120 be allowed to set a cookie for co.uk? 843 00:44:26,630 --> 00:44:30,146 So this is a bit subtle because, according 844 00:44:30,146 --> 00:44:31,520 to the rules that we've discussed 845 00:44:31,520 --> 00:44:37,320 before, a site from here should be able to shorten its domain, 846 00:44:37,320 --> 00:44:41,000 set a cookie for this, and that all seems to be legal. 847 00:44:41,000 --> 00:44:42,760 Now of course, as a human, we think 848 00:44:42,760 --> 00:44:45,430 this is kind of suspicious because, as a human, 849 00:44:45,430 --> 00:44:48,820 we actually understand that this is morally speaking 850 00:44:48,820 --> 00:44:51,790 a single atomic domain. 851 00:44:51,790 --> 00:44:54,640 Morally speaking, this is equivalent to .com. 852 00:44:54,640 --> 00:44:55,640 The British got screwed. 853 00:44:55,640 --> 00:44:56,311 They have to have a dot in there. 854 00:44:56,311 --> 00:44:58,186 But that's not their fault. History's unfair. 855 00:44:58,186 --> 00:44:58,720 Right? 856 00:44:58,720 --> 00:45:02,470 So morally speaking, this is a single domain. 857 00:45:02,470 --> 00:45:05,040 So you actually have to have some special infrastructure 858 00:45:05,040 --> 00:45:08,260 to get the cookie setting rules to work out correctly. 
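One way to picture that special infrastructure is a check against a list of suffixes that must be treated as a single atomic unit. This is a simplified sketch: the four-entry SUFFIXES set is a hypothetical excerpt, the function name is made up for illustration, and the real list maintained at publicsuffix.org is far larger and includes wildcard and exception rules not modeled here.

```python
# Tiny hypothetical excerpt of a suffix list.
SUFFIXES = {"com", "edu", "uk", "co.uk"}


def may_set_cookie_domain(host, cookie_domain):
    """Return True if a page on host may set a cookie scoped to cookie_domain."""
    host = host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    if cookie_domain in SUFFIXES:
        # A listed suffix counts as atomic, however many dots it contains.
        return False
    # Otherwise the usual rule applies: the host itself or a suffix of it.
    return host == cookie_domain or host.endswith("." + cookie_domain)
```

Under this check, a page on foo.co.uk can scope a cookie to foo.co.uk but not to co.uk, even though co.uk is syntactically a suffix of the host name.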
859 00:45:08,260 --> 00:45:12,400 So essentially, Mozilla, they have this website 860 00:45:12,400 --> 00:45:15,430 called publicsuffix.org. 861 00:45:21,220 --> 00:45:25,030 And basically, what this website contains 862 00:45:25,030 --> 00:45:29,360 are lists of these rules for how cookies, and origins, 863 00:45:29,360 --> 00:45:32,480 and domains should be shrunk given that some things might 864 00:45:32,480 --> 00:45:33,590 have dots in them. 865 00:45:33,590 --> 00:45:37,010 But actually, they should be treated as a single, sort of, 866 00:45:37,010 --> 00:45:38,930 atomic thing. 867 00:45:38,930 --> 00:45:41,500 So actually, when your browser is figuring out 868 00:45:41,500 --> 00:45:44,512 how it should do all these various cookie manipulations, 869 00:45:44,512 --> 00:45:46,220 it's actually going to consult this site. 870 00:45:46,220 --> 00:45:47,730 Or it's going to have this baked in somehow 871 00:45:47,730 --> 00:45:49,230 or something like that to make sure 872 00:45:49,230 --> 00:45:52,790 that foo.co.uk can't actually just shorten its domain 873 00:45:52,790 --> 00:45:54,070 to co.uk 874 00:45:54,070 --> 00:45:56,980 and then perform some chicanery. 875 00:45:56,980 --> 00:45:59,220 So once again, this is very subtle. 876 00:45:59,220 --> 00:46:01,770 And a lot of the interesting web security 877 00:46:01,770 --> 00:46:04,740 issues that we find come about because a lot 878 00:46:04,740 --> 00:46:07,120 of the original infrastructure was designed just 879 00:46:07,120 --> 00:46:08,590 for the English language. 880 00:46:08,590 --> 00:46:11,150 You know, for ASCII text or something like this. 881 00:46:11,150 --> 00:46:15,460 It wasn't designed for an international community. 882 00:46:15,460 --> 00:46:18,275 So as the internet became more popular, people said, hey, 883 00:46:18,275 --> 00:46:20,150 we made some pretty big design decisions here 884 00:46:20,150 --> 00:46:20,840 at the beginning.
885 00:46:20,840 --> 00:46:22,298 We should actually make this usable 886 00:46:22,298 --> 00:46:25,181 for people who don't share our narrow understanding of what language 887 00:46:25,181 --> 00:46:25,680 means. 888 00:46:25,680 --> 00:46:27,319 You run into all these crazy problems. 889 00:46:27,319 --> 00:46:28,860 And I'll give you another example of one 890 00:46:28,860 --> 00:46:31,520 of those in a later lecture. 891 00:46:31,520 --> 00:46:34,400 So does this all make sense? 892 00:46:34,400 --> 00:46:36,220 OK. 893 00:46:36,220 --> 00:46:44,930 So with respect to XMLHttpRequests, 894 00:46:44,930 --> 00:46:50,740 how are they treated by the same origin policy? 895 00:46:53,310 --> 00:46:58,510 So by default, JavaScript can only generate one of these 896 00:46:58,510 --> 00:47:01,720 if it's going to its origin server. 897 00:47:01,720 --> 00:47:05,500 However, there's this new interface called 898 00:47:05,500 --> 00:47:08,476 cross-origin resource sharing, or CORS. 899 00:47:08,476 --> 00:47:13,970 All right, so this is the same origin 900 00:47:13,970 --> 00:47:20,500 unless the server has enabled this CORS thing. 901 00:47:24,120 --> 00:47:29,960 So basically, this adds a new HTTP response header called 902 00:47:29,960 --> 00:47:36,480 Access-Control-Allow-Origin. 903 00:47:42,100 --> 00:47:43,960 So let's say that JavaScript from foo.com 904 00:47:43,960 --> 00:47:47,470 wants to make an XMLHttpRequest to bar.com. 905 00:47:47,470 --> 00:47:51,280 So that's cross-origin, as we described in the rules so far. 906 00:47:51,280 --> 00:47:55,380 So if the server in bar.com wants to allow this, 907 00:47:55,380 --> 00:47:59,220 it will return in its HTTP response this header here 908 00:47:59,220 --> 00:48:07,670 that's going to say, yes, I allow, for example, foo.com 909 00:48:07,670 --> 00:48:13,220 to send me these cross-origin XMLHttpRequests. 910 00:48:13,220 --> 00:48:15,270 The server on bar.com could actually say no.
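The browser-side decision described here can be sketched roughly as follows. This is an illustrative simplification, not how any particular browser implements it: the function name `cors_allows` is hypothetical, and real CORS processing also involves preflight requests and credential rules that are left out.

```python
from typing import Mapping


def cors_allows(request_origin: str, response_headers: Mapping[str, str]) -> bool:
    """Decide whether a cross-origin response may be exposed to the requesting script."""
    # HTTP header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value.strip() for name, value in response_headers.items()}
    allowed = normalized.get("access-control-allow-origin")

    if allowed is None:
        # The server did not opt in, so plain same-origin behavior applies
        # and the cross-origin request fails.
        return False

    # "*" means any origin may read the response; otherwise the header must
    # name the requesting origin exactly.
    return allowed == "*" or allowed == request_origin
```

For instance, a response from bar.com carrying `Access-Control-Allow-Origin: https://foo.com` would be readable by script from foo.com, while a response with no such header would not.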
911 00:48:15,270 --> 00:48:17,230 It could refuse the request. 912 00:48:17,230 --> 00:48:21,440 In which case, the browser would fail the XMLHttpRequest. 913 00:48:21,440 --> 00:48:23,260 So this is, sort of, a new thing that's 914 00:48:23,260 --> 00:48:27,270 come up in large part because of these mashup applications. 915 00:48:27,270 --> 00:48:30,732 This need for, somehow, applications 916 00:48:30,732 --> 00:48:32,690 from different developers and different domains 917 00:48:32,690 --> 00:48:35,930 to be able to share data in some type of constrained way. 918 00:48:35,930 --> 00:48:38,085 So this could also be an asterisk over here 919 00:48:38,085 --> 00:48:40,220 if anybody can fetch the data cross-origin, 920 00:48:40,220 --> 00:48:42,630 and so on and so forth. 921 00:48:42,630 --> 00:48:45,316 So I think that's pretty straightforward. 922 00:48:45,316 --> 00:48:47,190 So I mean, there's a bunch of other resources 923 00:48:47,190 --> 00:48:50,220 we could look at. 924 00:48:50,220 --> 00:48:52,030 For example, images. 925 00:48:52,030 --> 00:48:56,310 So a frame can load images from any origin that it desires. 926 00:48:56,310 --> 00:49:01,044 But it can't actually inspect the bits in that image 927 00:49:01,044 --> 00:49:02,710 because, somehow, the same origin policy 928 00:49:02,710 --> 00:49:04,790 says that having different origins 929 00:49:04,790 --> 00:49:07,870 directly inspect each other's content is a bad thing. 930 00:49:07,870 --> 00:49:10,390 So the frame can't inspect the bits. 931 00:49:10,390 --> 00:49:12,005 But it can, actually, infer things 932 00:49:12,005 --> 00:49:14,630 like what the size of the image is because it can actually 933 00:49:14,630 --> 00:49:17,490 see where the other DOM nodes in that page 934 00:49:17,490 --> 00:49:18,910 have been placed, for example.
935 00:49:18,910 --> 00:49:20,700 So this is another one of these weird instances where 936 00:49:20,700 --> 00:49:22,390 the same origin policy is ostensibly 937 00:49:22,390 --> 00:49:24,180 trying to prevent all information leakage. 938 00:49:24,180 --> 00:49:25,805 But it can't actually prevent all of it 939 00:49:25,805 --> 00:49:27,670 because embedding inherently reveals 940 00:49:27,670 --> 00:49:29,960 some types of information. 941 00:49:29,960 --> 00:49:33,280 CSS has a similar story to images. 942 00:49:33,280 --> 00:49:38,140 So a frame can embed CSS from any origin. 943 00:49:38,140 --> 00:49:41,850 However, it cannot directly inspect the text inside that 944 00:49:41,850 --> 00:49:44,150 CSS file, if it's from a different origin. 945 00:49:44,150 --> 00:49:47,640 But it can actually infer what this CSS does because it just 946 00:49:47,640 --> 00:49:49,130 can create a bunch of nodes 947 00:49:49,130 --> 00:49:51,370 and then see how their styling gets changed. 948 00:49:51,370 --> 00:49:53,740 So it's a bit wacky. 949 00:49:53,740 --> 00:49:59,020 JavaScript is actually my favorite example 950 00:49:59,020 --> 00:50:01,440 of how this same origin policy struggles 951 00:50:01,440 --> 00:50:04,550 to maintain any type of intellectual consistency. 952 00:50:04,550 --> 00:50:08,740 So the idea here is that, if you do a cross-origin fetch 953 00:50:08,740 --> 00:50:10,980 of JavaScript, that is allowed. 954 00:50:10,980 --> 00:50:13,200 You can allow that external JavaScript 955 00:50:13,200 --> 00:50:15,810 to execute in the context of your own page. 956 00:50:15,810 --> 00:50:19,300 You cannot, however, look at the source code for it. 957 00:50:19,300 --> 00:50:21,750 So if you have a script tag whose source 958 00:50:21,750 --> 00:50:23,700 equals something outside your domain, 959 00:50:23,700 --> 00:50:25,922 then when that source gets executed, 960 00:50:25,922 --> 00:50:27,130 you can call functions in it.
961 00:50:27,130 --> 00:50:29,296 But you can't actually look at the JavaScript source 962 00:50:29,296 --> 00:50:30,470 code in it. 963 00:50:30,470 --> 00:50:31,370 OK, fine. 964 00:50:31,370 --> 00:50:32,510 So that seems very nice. 965 00:50:32,510 --> 00:50:34,343 However, there are a bunch of holes in this. 966 00:50:34,343 --> 00:50:38,040 So for example, JavaScript is a dynamic scripting language. 967 00:50:38,040 --> 00:50:40,920 And functions are first class objects. 968 00:50:40,920 --> 00:50:47,090 So for any function f, you can just call f.toString(). 969 00:50:47,090 --> 00:50:49,950 And that will give you the source code for the function. 970 00:50:49,950 --> 00:50:51,590 And people do this all the time. 971 00:50:51,590 --> 00:50:54,654 Do things like dynamic rewriting and stuff like that. 972 00:50:54,654 --> 00:50:56,070 So even though the same origin policy 973 00:50:56,070 --> 00:50:57,880 doesn't allow you to directly look 974 00:50:57,880 --> 00:51:00,140 at the contents of the script tag itself, 975 00:51:00,140 --> 00:51:02,864 you can just call this for any public function 976 00:51:02,864 --> 00:51:04,530 that that external script has given you. 977 00:51:04,530 --> 00:51:06,600 And just get the source code like that. 978 00:51:06,600 --> 00:51:08,190 Another thing you could imagine doing 979 00:51:08,190 --> 00:51:11,540 is you could just get your home server from your domain 980 00:51:11,540 --> 00:51:13,980 to just fetch the source code for you. 981 00:51:13,980 --> 00:51:16,630 And then, just send it back to you. 982 00:51:16,630 --> 00:51:17,275 So oops. 983 00:51:17,275 --> 00:51:19,400 I mean, you essentially just asked your home server 984 00:51:19,400 --> 00:51:20,610 to run wget. 985 00:51:20,610 --> 00:51:22,180 And you get the source code that way. 986 00:51:22,180 --> 00:51:24,370 OK, so that's, kind of, I think, goofy.
987 00:51:24,370 --> 00:51:27,290 So long story short, the same origin policies 988 00:51:27,290 --> 00:51:28,556 here are a bit odd. 989 00:51:28,556 --> 00:51:30,903 AUDIENCE: Presumably, part of the reason they 990 00:51:30,903 --> 00:51:33,281 do it is to prevent the user from fetching JavaScript 991 00:51:33,281 --> 00:51:35,030 because then cookies will be sent as well. 992 00:51:35,030 --> 00:51:37,284 So you can get JavaScript tailored to you. 993 00:51:37,284 --> 00:51:38,276 PROFESSOR: Yeah. 994 00:51:38,276 --> 00:51:40,260 AUDIENCE: So if you get your server to fetch it for you, 995 00:51:40,260 --> 00:51:41,252 it won't have the user's cookies [INAUDIBLE]. 996 00:51:41,252 --> 00:51:42,640 PROFESSOR: That is true. 997 00:51:42,640 --> 00:51:44,790 Although, in practice, a lot of times, 998 00:51:44,790 --> 00:51:49,119 the raw source code itself is not user tailored. 999 00:51:49,119 --> 00:51:50,660 But you're right that it will prevent 1000 00:51:50,660 --> 00:51:52,960 some cookie-mediated attacks like that. 1001 00:51:52,960 --> 00:51:54,630 Modulo, some of the cookie [INAUDIBLE]. 1002 00:51:54,630 --> 00:51:57,160 But that's exactly correct. 1003 00:51:57,160 --> 00:52:02,505 So because it's actually pretty easy for users and applications 1004 00:52:02,505 --> 00:52:04,809 to get JavaScript source code, a lot of times, 1005 00:52:04,809 --> 00:52:06,600 JavaScript source code, when it's deployed, 1006 00:52:06,600 --> 00:52:09,070 it's actually obfuscated and minified. 1007 00:52:09,070 --> 00:52:11,860 So if you've ever tried to look and see how a web page works, 1008 00:52:11,860 --> 00:52:13,800 if you look at the source, sometimes people 1009 00:52:13,800 --> 00:52:16,330 will do things like remove all the whitespace. 1010 00:52:16,330 --> 00:52:18,440 They will also change all the variable names 1011 00:52:18,440 --> 00:52:21,070 to be super short and have all these exclamation marks.
1012 00:52:21,070 --> 00:52:23,940 Looks like cartoon characters cursing in the cartoons. 1013 00:52:23,940 --> 00:52:25,650 So that's, sort of, like a cheap form 1014 00:52:25,650 --> 00:52:27,290 of digital rights management. 1015 00:52:27,290 --> 00:52:32,090 But it's all, ultimately, a bit of a crap shoot 1016 00:52:32,090 --> 00:52:34,490 because you can do things like execute 1017 00:52:34,490 --> 00:52:36,450 that code in your own browser. 1018 00:52:36,450 --> 00:52:37,300 See what it does. 1019 00:52:37,300 --> 00:52:38,220 Sniff the network. 1020 00:52:38,220 --> 00:52:40,352 See who it talks to, and so on and so forth. 1021 00:52:40,352 --> 00:52:44,480 But that's, basically, the same origin story for JavaScript. 1022 00:52:44,480 --> 00:52:45,255 Plug-ins-- 1023 00:52:45,255 --> 00:52:46,754 AUDIENCE: I was under the impression 1024 00:52:46,754 --> 00:52:50,051 that the reason you do that is [INAUDIBLE] 1025 00:52:50,051 --> 00:52:52,330 take less time to download rather than [INAUDIBLE]. 1026 00:52:52,330 --> 00:52:54,820 PROFESSOR: So that is also a reason they do that, too. 1027 00:52:54,820 --> 00:52:56,100 That's a good point. 1028 00:52:56,100 --> 00:53:00,312 But I mean, if you type into the internet, sort of, 1029 00:53:00,312 --> 00:53:02,500 web page obfuscation or stuff like that, 1030 00:53:02,500 --> 00:53:06,110 people often try to, somehow, embed some type of secrets 1031 00:53:06,110 --> 00:53:08,330 into either their HTML or their JavaScript. 1032 00:53:08,330 --> 00:53:10,170 Maybe they want to obscure the protocol, 1033 00:53:10,170 --> 00:53:13,230 for example, that the client uses to talk to the server. 1034 00:53:13,230 --> 00:53:16,222 Some people will also do the obfuscation for that reason.
1035 00:53:16,222 --> 00:53:17,680 Pure minification-- in other words, 1036 00:53:17,680 --> 00:53:19,944 just making the variable names small 1037 00:53:19,944 --> 00:53:21,360 and removing the [INAUDIBLE] space-- 1038 00:53:21,360 --> 00:53:24,875 yeah, that's mainly just to save download bandwidth, download time. 1039 00:53:28,611 --> 00:53:31,710 OK, so that's the story for JavaScript. 1040 00:53:31,710 --> 00:53:34,150 There's also plug-ins. 1041 00:53:34,150 --> 00:53:39,440 So this is stuff like Java and things like this. 1042 00:53:39,440 --> 00:53:42,799 So a frame can easily run a plug-in from any origin. 1043 00:53:42,799 --> 00:53:44,590 Now plug-ins, depending on who you believe, 1044 00:53:44,590 --> 00:53:46,548 are actually going the way of the dinosaurs. 1045 00:53:46,548 --> 00:53:48,530 Because a lot of the new HTML5 features, 1046 00:53:48,530 --> 00:53:50,030 like the video tag and things like this, 1047 00:53:50,030 --> 00:53:51,488 can actually do stuff that you used 1048 00:53:51,488 --> 00:53:53,574 to only be able to do with a plug-in like Java. 1049 00:53:53,574 --> 00:53:55,490 So it's not clear how much longer these things 1050 00:53:55,490 --> 00:53:58,460 are going to be around. 1051 00:53:58,460 --> 00:53:59,590 OK, so any questions? 1052 00:54:02,992 --> 00:54:07,560 OK, so remember that when a browser generates an HTTP 1053 00:54:07,560 --> 00:54:11,090 request, it automatically includes the relevant cookies 1054 00:54:11,090 --> 00:54:12,080 in that request. 1055 00:54:12,080 --> 00:54:18,250 So what happens if a malicious site generates 1056 00:54:18,250 --> 00:54:21,280 a URL that looks like this? 1057 00:54:21,280 --> 00:54:24,538 So for example, it creates a new child frame. 1058 00:54:24,538 --> 00:54:28,370 It sets that URL to bank.com.
1059 00:54:28,370 --> 00:54:31,990 And then, it actually tries to mimic what the browser would 1060 00:54:31,990 --> 00:54:36,910 do if there was going to be a transfer of money 1061 00:54:36,910 --> 00:54:39,780 between the user and someone else. 1062 00:54:44,140 --> 00:54:49,605 So in this URL, in this frame that the attacker 1063 00:54:49,605 --> 00:54:53,020 is trying to create, it tries to invoke this transfer command 1064 00:54:53,020 --> 00:54:53,520 here. 1065 00:54:53,520 --> 00:54:54,872 Say $500. 1066 00:54:54,872 --> 00:54:58,960 And that should go to the attacker's account at the bank. 1067 00:54:58,960 --> 00:55:01,800 Now the attacker page, which the user 1068 00:55:01,800 --> 00:55:04,516 visited because, somehow, the attacker is [INAUDIBLE] 1069 00:55:04,516 --> 00:55:07,450 go there. 1070 00:55:07,450 --> 00:55:09,160 What's interesting about this is that, 1071 00:55:09,160 --> 00:55:11,760 even though the attacker page won't 1072 00:55:11,760 --> 00:55:14,930 be able to see the contents of this child frame 1073 00:55:14,930 --> 00:55:18,020 because it's probably going to be in a different origin, 1074 00:55:18,020 --> 00:55:21,880 the bank.com page will still do what the attacker wants 1075 00:55:21,880 --> 00:55:24,220 because the browser's going to transfer all the user's 1076 00:55:24,220 --> 00:55:25,506 cookies with this request. 1077 00:55:25,506 --> 00:55:27,130 It's going to look at this command here 1078 00:55:27,130 --> 00:55:29,080 and say, oh, the user must've, somehow, 1079 00:55:29,080 --> 00:55:31,770 asked me to transfer $500 to this mysteriously named 1080 00:55:31,770 --> 00:55:32,990 individual named attacker. 1081 00:55:32,990 --> 00:55:34,070 OK, I'll do it. 1082 00:55:34,070 --> 00:55:36,030 All right, seems reasonable. 1083 00:55:36,030 --> 00:55:38,080 So that's a problem.
1084 00:55:38,080 --> 00:55:39,620 And the reason this attack works 1085 00:55:39,620 --> 00:55:42,850 is because, essentially, the attacker 1086 00:55:42,850 --> 00:55:45,760 can figure out deterministically what 1087 00:55:45,760 --> 00:55:47,520 this command should look like. 1088 00:55:47,520 --> 00:55:49,825 There's no randomness in this command here. 1089 00:55:49,825 --> 00:55:51,200 So essentially, what the attacker 1090 00:55:51,200 --> 00:55:54,164 can do is try this on his or her own bank account, 1091 00:55:54,164 --> 00:55:55,580 figure out this protocol, and then 1092 00:55:55,580 --> 00:55:58,780 just, somehow, force the user's browser to execute 1093 00:55:58,780 --> 00:56:00,560 this on the attacker's behalf. 1094 00:56:00,560 --> 00:56:08,639 So this is what's called a cross-site request forgery. 1095 00:56:12,180 --> 00:56:16,690 So sometimes you hear this called CSRF. 1096 00:56:16,690 --> 00:56:21,680 C-S-R-F. 1097 00:56:21,680 --> 00:56:25,620 So the solution to fixing this attack here 1098 00:56:25,620 --> 00:56:28,540 is that you actually just need to include some randomness 1099 00:56:28,540 --> 00:56:30,680 in this URL that's generated. 1100 00:56:30,680 --> 00:56:32,420 A type of randomness that the attacker 1101 00:56:32,420 --> 00:56:33,810 can't guess statically. 1102 00:56:33,810 --> 00:56:42,960 So for example, you can imagine that inside the bank's web page 1103 00:56:42,960 --> 00:56:45,110 it's going to have some form. 1104 00:56:45,110 --> 00:56:47,010 The form is the thing which actually 1105 00:56:47,010 --> 00:56:48,460 generates requests like this. 1106 00:56:48,460 --> 00:56:53,475 So maybe the action of that form is transfer.cgi. 1107 00:56:57,690 --> 00:57:02,330 And then, inside this form, you're going to have an input. 1108 00:57:02,330 --> 00:57:05,322 Inputs are usually used to get user input like text, 1109 00:57:05,322 --> 00:57:07,280 key presses, mouse clicks, and stuff like that.
1110 00:57:07,280 --> 00:57:09,310 But we can actually give this input 1111 00:57:09,310 --> 00:57:12,950 a type of hidden, which means that it's not 1112 00:57:12,950 --> 00:57:16,280 shown to the user. 1113 00:57:16,280 --> 00:57:19,060 And then, we can give it this attribute. 1114 00:57:19,060 --> 00:57:24,190 We'll call it CSRF. 1115 00:57:24,190 --> 00:57:26,020 And then, we'll give it some random value. 1116 00:57:31,790 --> 00:57:33,380 You know, a72f. 1117 00:57:33,380 --> 00:57:35,240 Whatever. 1118 00:57:35,240 --> 00:57:37,620 So remember, this is generated on the server side. 1119 00:57:37,620 --> 00:57:41,320 So when the user goes to this page, on the server side, 1120 00:57:41,320 --> 00:57:43,270 it generates this random value here 1121 00:57:43,270 --> 00:57:46,940 and embeds that in the HTML that the user receives. 1122 00:57:46,940 --> 00:57:49,390 So when the user submits this form, 1123 00:57:49,390 --> 00:57:52,140 then this URL that we have up here will actually 1124 00:57:52,140 --> 00:58:03,620 have this extra thing up here, which is this token here. 1125 00:58:03,620 --> 00:58:05,250 So what this does is that this now 1126 00:58:05,250 --> 00:58:08,198 means that the attacker would have 1127 00:58:08,198 --> 00:58:10,450 to be able to guess the particular random token 1128 00:58:10,450 --> 00:58:13,060 that the server generated for the user each time 1129 00:58:13,060 --> 00:58:14,460 the user had gone to the page. 1130 00:58:14,460 --> 00:58:17,720 So if you have sufficient randomness here, 1131 00:58:17,720 --> 00:58:20,230 the attacker can't just forge one of these things 1132 00:58:20,230 --> 00:58:23,250 because if the attacker guesses the wrong token, 1133 00:58:23,250 --> 00:58:25,364 then the server will reject the request. 1134 00:58:25,364 --> 00:58:26,985 AUDIENCE: Well why should these always 1135 00:58:26,985 --> 00:58:30,450 be included in the URL and not in the body of the [INAUDIBLE]?
1136 00:58:35,286 --> 00:58:36,160 PROFESSOR: Yeah, yeah. 1137 00:58:36,160 --> 00:58:38,836 So HTTPS helps with a lot of these things. 1138 00:58:38,836 --> 00:58:40,502 And there's actually no intrinsic reason 1139 00:58:40,502 --> 00:58:42,240 why you couldn't put some of this stuff 1140 00:58:42,240 --> 00:58:44,000 in the body of the request. 1141 00:58:44,000 --> 00:58:47,319 There's some legacy reasons why forms, sort of, work like this. 1142 00:58:47,319 --> 00:58:48,110 But you're correct. 1143 00:58:48,110 --> 00:58:50,690 And in practice, you can put that information somewhere else 1144 00:58:50,690 --> 00:58:51,660 in the HTTPS request. 1145 00:58:51,660 --> 00:58:54,000 But note that just moving that information, for example, 1146 00:58:54,000 --> 00:58:56,280 to the body of the request, there's 1147 00:58:56,280 --> 00:58:59,080 still a challenge there, potentially, because if there's 1148 00:58:59,080 --> 00:59:01,350 something there that the attacker can guess, 1149 00:59:01,350 --> 00:59:03,635 then the attacker may still be able to, somehow, 1150 00:59:03,635 --> 00:59:05,410 conjure up that URL. 1151 00:59:05,410 --> 00:59:08,510 For example, by making an XMLHttpRequest and then, 1152 00:59:08,510 --> 00:59:10,370 explicitly, setting the body to this thing 1153 00:59:10,370 --> 00:59:11,911 that the attacker knows how to guess. 1154 00:59:11,911 --> 00:59:15,154 AUDIENCE: Well if the attacker just gives you a URL, 1155 00:59:15,154 --> 00:59:19,934 then that just gets encoded in the header of [INAUDIBLE]. 1156 00:59:19,934 --> 00:59:22,620 PROFESSOR: If the attacker just gives you a URL. 1157 00:59:22,620 --> 00:59:26,320 So if you're just setting a frame to a URL, 1158 00:59:26,320 --> 00:59:28,826 then, that's all that the attacker can control.
1159 00:59:28,826 --> 00:59:30,450 But if you're using an XMLHttpRequest, 1160 00:59:30,450 --> 00:59:32,970 if somehow the attacker can generate one of those, 1161 00:59:32,970 --> 00:59:38,050 then the XMLHttpRequest interface actually allows you to set the body. 1162 00:59:38,050 --> 00:59:39,972 AUDIENCE: The XMLHttpRequest would 1163 00:59:39,972 --> 00:59:41,740 be limited by same origin. 1164 00:59:41,740 --> 00:59:44,110 But the attacker could just write a form and submit it. 1165 00:59:44,110 --> 00:59:46,820 There's nothing [INAUDIBLE] submitting a form like using 1166 00:59:46,820 --> 00:59:47,330 [INAUDIBLE]. 1167 00:59:47,330 --> 00:59:49,910 And then, it's sent in the body. 1168 00:59:49,910 --> 00:59:50,710 But it's still-- 1169 00:59:50,710 --> 00:59:51,710 PROFESSOR: That's right. 1170 00:59:51,710 --> 00:59:55,290 So XMLHttpRequest is limited to the same origin. 1171 00:59:55,290 --> 00:59:58,190 However, if for example, the attacker can, 1172 00:59:58,190 --> 01:00:01,380 maybe, do something like this, for example. 1173 01:00:01,380 --> 01:00:04,070 And the attacker can inject the XMLHttpRequest here, 1174 01:00:04,070 --> 01:00:05,090 which would then execute with the authority 1175 01:00:05,090 --> 01:00:05,965 of the embedded page. 1176 01:00:10,650 --> 01:00:13,250 AUDIENCE: Can the attacker [INAUDIBLE] 1177 01:00:13,250 --> 01:00:16,741 by inspecting the HTML source code? 1178 01:00:16,741 --> 01:00:19,360 PROFESSOR: Yes, that's actually a good question. Right, so 1179 01:00:19,360 --> 01:00:22,830 it depends on what the attacker has access to. 1180 01:00:22,830 --> 01:00:25,431 If the attacker-- for example, by doing something goofy 1181 01:00:25,431 --> 01:00:30,110 like that-- can actually access this JavaScript property 1182 01:00:30,110 --> 01:00:33,870 called innerHTML. 1183 01:00:33,870 --> 01:00:35,680 This is a property [INAUDIBLE], right.
1184 01:00:35,680 --> 01:00:39,970 So if I do document.body.innerHTML, 1185 01:00:39,970 --> 01:00:42,536 I will get all of the HTML that's inside that page 1186 01:00:42,536 --> 01:00:43,140 right now. 1187 01:00:43,140 --> 01:00:43,640 So yeah. 1188 01:00:43,640 --> 01:00:45,612 So if the attacker can do this, then yeah. 1189 01:00:45,612 --> 01:00:46,570 Then you're in trouble. 1190 01:00:46,570 --> 01:00:47,624 That's right. 1191 01:00:47,624 --> 01:00:49,040 So a lot of these details, though, 1192 01:00:49,040 --> 01:00:50,970 depend on exactly what the attacker can and can't do. 1193 01:00:50,970 --> 01:00:52,230 So it, kind of, makes sense. 1194 01:00:52,230 --> 01:00:54,780 So if the attacker can or cannot generate Ajax requests, 1195 01:00:54,780 --> 01:00:55,800 that means one thing. 1196 01:00:55,800 --> 01:00:57,690 If the attacker can or cannot look at the raw HTML, 1197 01:00:57,690 --> 01:00:58,760 then you have another thing. 1198 01:00:58,760 --> 01:00:59,551 So on and so forth. 1199 01:01:02,041 --> 01:01:02,540 All right. 1200 01:01:02,540 --> 01:01:03,660 So yeah. 1201 01:01:03,660 --> 01:01:06,340 So this token-based thing is a popular way 1202 01:01:06,340 --> 01:01:10,270 to get around these CSRF attacks. 1203 01:01:10,270 --> 01:01:17,970 All right, so another thing we can look at 1204 01:01:17,970 --> 01:01:19,595 are network addresses. 1205 01:01:22,711 --> 01:01:25,210 So this gets into some of the conversation we've been having 1206 01:01:25,210 --> 01:01:30,610 about who the attacker can and cannot contact via XMLHttpRequest, 1207 01:01:30,610 --> 01:01:31,240 for example. 1208 01:01:36,450 --> 01:01:40,848 So with respect to network addresses, 1209 01:01:40,848 --> 01:01:47,210 a frame can send HTTP and HTTPS requests 1210 01:01:47,210 --> 01:01:50,560 to a host plus a port that matches its origin.
1211 01:01:50,560 --> 01:01:54,950 But note that the security of the same origin policy is, 1212 01:01:54,950 --> 01:01:58,600 actually, very tightly tied with the security of the DNS 1213 01:01:58,600 --> 01:02:01,730 infrastructure because all the same origin 1214 01:02:01,730 --> 01:02:04,360 policy's rules are based upon what names mean. 1215 01:02:04,360 --> 01:02:06,080 So if you can control what names mean, 1216 01:02:06,080 --> 01:02:08,260 you can actually mount some pretty vicious attacks. 1217 01:02:08,260 --> 01:02:14,520 So an example of this is the DNS rebinding attack. 1218 01:02:19,940 --> 01:02:25,010 So in this attack, the goal of the attacker 1219 01:02:25,010 --> 01:02:40,577 is to run attacker-controlled JavaScript with the authority 1220 01:02:40,577 --> 01:02:42,997 of some victim website. 1221 01:02:42,997 --> 01:02:44,330 We'll just call them victim.com. 1222 01:02:48,232 --> 01:02:50,440 So the attacker wants to bust the same origin policies 1223 01:02:50,440 --> 01:02:53,240 and somehow run code that he has written 1224 01:02:53,240 --> 01:02:55,620 with the authority of some other site. 1225 01:02:55,620 --> 01:02:59,480 So here's the approach. 1226 01:02:59,480 --> 01:03:03,740 So the first thing that the attacker is going to do 1227 01:03:03,740 --> 01:03:07,460 is register a domain name. 1228 01:03:10,670 --> 01:03:13,040 So let's say we just call that attacker.com. 1229 01:03:18,746 --> 01:03:19,960 Very simple to do. 1230 01:03:19,960 --> 01:03:21,089 Just pay a couple of bucks. 1231 01:03:21,089 --> 01:03:21,880 You're ready to go. 1232 01:03:21,880 --> 01:03:24,390 You own your own domain name. 1233 01:03:24,390 --> 01:03:26,220 So note that the attacker is also 1234 01:03:26,220 --> 01:03:28,970 going to set up a DNS server to respond 1235 01:03:28,970 --> 01:03:32,490 to name resolution requests for objects 1236 01:03:32,490 --> 01:03:33,960 that reside in attacker.com.
1237 01:03:33,960 --> 01:03:35,810 So the second thing that has to happen 1238 01:03:35,810 --> 01:03:40,980 is that the user has to visit attacker.com. 1239 01:03:44,291 --> 01:03:47,260 In particular, the user has to visit some website that 1240 01:03:47,260 --> 01:03:49,190 hangs off of this domain name. 1241 01:03:49,190 --> 01:03:50,990 This part is actually not tricky. 1242 01:03:50,990 --> 01:03:53,110 See, you can create an ad campaign. 1243 01:03:53,110 --> 01:03:54,040 Free iPad. 1244 01:03:54,040 --> 01:03:54,800 Everybody wants a free iPad, even 1245 01:03:54,800 --> 01:03:56,730 though I don't know anyone who's ever won a free iPad. 1246 01:03:56,730 --> 01:03:57,647 They click on this. 1247 01:03:57,647 --> 01:03:58,230 They're there. 1248 01:03:58,230 --> 01:04:00,063 It's in the phishing email, and so on and so forth. 1249 01:04:00,063 --> 01:04:01,330 This part's not hard. 1250 01:04:01,330 --> 01:04:03,030 So what's going to happen? 1251 01:04:03,030 --> 01:04:10,430 So this is actually going to cause the browser 1252 01:04:10,430 --> 01:04:25,560 to generate a DNS request to attacker.com 1253 01:04:25,560 --> 01:04:27,540 because this page has some objects that 1254 01:04:27,540 --> 01:04:30,950 refer to some objects that live in attacker.com. 1255 01:04:30,950 --> 01:04:34,810 The browser's going to say, I've never seen this domain before. 1256 01:04:34,810 --> 01:04:38,292 Let me send the DNS resolution request to attacker.com. 1257 01:04:38,292 --> 01:04:39,750 So what's going to end up happening 1258 01:04:39,750 --> 01:04:42,570 is that the attacker's DNS server is going 1259 01:04:42,570 --> 01:04:45,090 to respond to that request. 1260 01:04:45,090 --> 01:04:49,100 But it's going to respond with a DNS result that 1261 01:04:49,100 --> 01:04:51,630 has a very short time to live. 1262 01:04:51,630 --> 01:04:52,210 OK?
1263 01:04:52,210 --> 01:04:54,540 Meaning that the browser will think 1264 01:04:54,540 --> 01:04:58,300 that it's only valid for a very short period of time 1265 01:04:58,300 --> 01:05:00,400 before it has to go out and revalidate that. 1266 01:05:00,400 --> 01:05:02,070 OK? 1267 01:05:02,070 --> 01:05:17,780 So in other words, the attacker's response has a small TTL. 1268 01:05:20,600 --> 01:05:21,260 OK, fine. 1269 01:05:21,260 --> 01:05:23,990 So the user gets the response back. 1270 01:05:23,990 --> 01:05:27,460 The malicious website is now running on the user side. 1271 01:05:27,460 --> 01:05:30,580 Meanwhile, while the user's interacting with the site, 1272 01:05:30,580 --> 01:05:34,580 the attacker is going to configure the DNS 1273 01:05:34,580 --> 01:05:37,310 server that he controls. 1274 01:05:37,310 --> 01:05:45,390 The attacker is going to bind the attacker.com name 1275 01:05:45,390 --> 01:05:50,940 to victim.com's IP address. 1276 01:05:56,600 --> 01:05:57,260 Right? 1277 01:05:57,260 --> 01:06:02,050 So what that means is that now if the user's browser asks 1278 01:06:02,050 --> 01:06:04,600 for a domain name resolution for something that 1279 01:06:04,600 --> 01:06:06,730 resides in attacker.com, it's actually 1280 01:06:06,730 --> 01:06:10,152 going to get some internal address to victim.com. 1281 01:06:10,152 --> 01:06:12,750 This is actually very subtle. 1282 01:06:12,750 --> 01:06:15,720 Now why can the attacker's DNS resolver do that? 1283 01:06:15,720 --> 01:06:18,530 Because the attacker configures it to do so. 1284 01:06:18,530 --> 01:06:19,970 The attacker's DNS server does not 1285 01:06:19,970 --> 01:06:23,387 have to consult victim.com to do this rebinding. 1286 01:06:23,387 --> 01:06:25,970 So perhaps, you can see some of the outline of the attack now. 1287 01:06:25,970 --> 01:06:32,450 So what will happen is that the website 1288 01:06:32,450 --> 01:06:44,185 wants to fetch a new object via, let's say, AJAX.
1289 01:06:47,480 --> 01:06:50,300 And it thinks that that AJAX request 1290 01:06:50,300 --> 01:06:53,520 is going to go to attacker.com somewhere externally. 1291 01:06:53,520 --> 01:07:00,950 But this AJAX request actually goes to victim.com. 1292 01:07:05,800 --> 01:07:08,110 And the reason why that's bad is because now we've 1293 01:07:08,110 --> 01:07:10,240 got this code on the client side that 1294 01:07:10,240 --> 01:07:16,270 resides on the attacker.com web page that's now actually accessing 1295 01:07:16,270 --> 01:07:19,070 data that is from a different origin, 1296 01:07:19,070 --> 01:07:20,990 from victim.com. 1297 01:07:20,990 --> 01:07:23,150 So once this step of the attack completes, 1298 01:07:23,150 --> 01:07:26,765 then the attacker.com web page can send that content back 1299 01:07:26,765 --> 01:07:30,600 to the server using [INAUDIBLE] or do other things like that. 1300 01:07:30,600 --> 01:07:32,709 So does this attack make sense? 1301 01:07:32,709 --> 01:07:35,000 AUDIENCE: Wouldn't it be more sensible to do the attack 1302 01:07:35,000 --> 01:07:36,560 the other way around? 1303 01:07:36,560 --> 01:07:41,732 So to [INAUDIBLE] victim.com to the attacker's IP address. 1304 01:07:41,732 --> 01:07:43,940 Because that way you're the same origin as victim.com 1305 01:07:43,940 --> 01:07:47,710 so you can get all the cookies and such. 1306 01:07:47,710 --> 01:07:50,330 PROFESSOR: Yeah, so that would work, too. 1307 01:07:50,330 --> 01:07:53,850 So what's nice about this though is that, 1308 01:07:53,850 --> 01:07:58,547 presumably, this allows you to do nice things like port scanning 1309 01:07:58,547 --> 01:07:59,380 and stuff like that. 1310 01:07:59,380 --> 01:08:01,500 I mean, your approach will work, right. 1311 01:08:01,500 --> 01:08:04,680 But I think here the reason why you do-- 1312 01:08:04,680 --> 01:08:05,780 AUDIENCE: [INAUDIBLE].
1313 01:08:05,780 --> 01:08:07,280 PROFESSOR: Because, essentially, you 1314 01:08:07,280 --> 01:08:11,460 can do things like constantly rebind attacker.com 1315 01:08:11,460 --> 01:08:15,680 to different machine names and different ports inside 1316 01:08:15,680 --> 01:08:17,394 of victim.com's network. 1317 01:08:17,394 --> 01:08:19,060 So then, you can, sort of, step through. 1318 01:08:19,060 --> 01:08:22,240 So in other words, let's say that the attacker.com web page 1319 01:08:22,240 --> 01:08:28,899 always thinks it's going to attacker.com 1320 01:08:28,899 --> 01:08:32,540 and issuing an AJAX request there. 1321 01:08:32,540 --> 01:08:35,270 So every time the DNS server rebinds, 1322 01:08:35,270 --> 01:08:37,910 it [INAUDIBLE] to some different IP address 1323 01:08:37,910 --> 01:08:39,693 inside of victim.com's network. 1324 01:08:39,693 --> 01:08:42,109 So it can just, sort of, step through the IP addresses one 1325 01:08:42,109 --> 01:08:47,369 by one and see if anybody's responding to those requests. 1326 01:08:47,369 --> 01:08:51,560 AUDIENCE: But the client, the user you're attacking, 1327 01:08:51,560 --> 01:08:55,280 doesn't necessarily have inside access to victim.com's network. 1328 01:08:55,280 --> 01:08:57,550 PROFESSOR: So what this attack typically assumes 1329 01:08:57,550 --> 01:09:00,390 is that there are certain firewall rules that 1330 01:09:00,390 --> 01:09:03,400 would prevent attacker.com, from outside the network, 1331 01:09:03,400 --> 01:09:05,970 from actually looking through each one of the IP addresses 1332 01:09:05,970 --> 01:09:07,354 inside of victim.com. 1333 01:09:07,354 --> 01:09:09,270 However, if you're inside corp.net-- if you're 1334 01:09:09,270 --> 01:09:11,540 inside the corporate firewall, let's say-- 1335 01:09:11,540 --> 01:09:16,384 then machines often do have the ability to contact [INAUDIBLE]. 1336 01:09:16,384 --> 01:09:17,300 AUDIENCE: [INAUDIBLE].
1337 01:09:17,300 --> 01:09:18,430 PROFESSOR: Yeah, yeah. 1338 01:09:18,430 --> 01:09:20,270 Exactly. 1339 01:09:20,270 --> 01:09:23,229 AUDIENCE: Does this work over HTTPS? 1340 01:09:23,229 --> 01:09:25,270 PROFESSOR: Ah, so that's an interesting question. 1341 01:09:25,270 --> 01:09:29,960 So HTTPS has these keys. 1342 01:09:29,960 --> 01:09:33,090 So the way you'd have to get this to work with HTTPS 1343 01:09:33,090 --> 01:09:41,497 is if somehow, for example, if attacker.com could-- let 1344 01:09:41,497 --> 01:09:44,005 me think about this. 1345 01:09:44,005 --> 01:09:47,990 Yeah, it's interesting because, presumably, 1346 01:09:47,990 --> 01:09:51,450 if you were using HTTPS, then when you sent out this AJAX 1347 01:09:51,450 --> 01:09:53,510 request, the victim machine wouldn't 1348 01:09:53,510 --> 01:09:56,830 have the attacker's HTTPS keys. 1349 01:09:56,830 --> 01:10:00,896 So the cryptography would fail somehow. 1350 01:10:00,896 --> 01:10:02,270 So I think HTTPS would stop that. 1351 01:10:02,270 --> 01:10:07,590 AUDIENCE: Or if the victim only has things on HTTPS? 1352 01:10:07,590 --> 01:10:08,570 PROFESSOR: Yeah. 1353 01:10:08,570 --> 01:10:10,352 So I think that would stop it. 1354 01:10:14,771 --> 01:10:20,663 AUDIENCE: If you configure the [INAUDIBLE] 1355 01:10:20,663 --> 01:10:24,100 use the initial or receiving result [INAUDIBLE]? 1356 01:10:24,100 --> 01:10:25,580 PROFESSOR: That's a good question. 1357 01:10:25,580 --> 01:10:26,280 I'm actually not sure about that. 1358 01:10:26,280 --> 01:10:27,739 So actually, a lot of these attacks 1359 01:10:27,739 --> 01:10:29,821 are dependent on the devil in the details, right? 1360 01:10:29,821 --> 01:10:31,732 So I'm not actually sure how that would work. 1361 01:10:31,732 --> 01:10:33,190 AUDIENCE: It uses the first domain. 1362 01:10:33,190 --> 01:10:34,898 PROFESSOR: It would use the first domain? 1363 01:10:34,898 --> 01:10:37,460 OK. 1364 01:10:37,460 --> 01:10:37,960 Yep?
1365 01:10:37,960 --> 01:10:40,030 AUDIENCE: So why can the attacker 1366 01:10:40,030 --> 01:10:46,319 respond with the victim's IP address in the first place? 1367 01:10:46,319 --> 01:10:48,110 PROFESSOR: So why can't-- what do you mean? 1368 01:10:48,110 --> 01:10:50,900 AUDIENCE: [INAUDIBLE]. 1369 01:10:50,900 --> 01:10:53,630 Why has the attacker team [INAUDIBLE] 1370 01:10:53,630 --> 01:10:57,777 has to respond with the attacker's IP [INAUDIBLE]? 1371 01:10:57,777 --> 01:10:58,860 PROFESSOR: Oh, well, yeah. 1372 01:10:58,860 --> 01:11:00,318 Since the attacker has to, somehow, 1373 01:11:00,318 --> 01:11:01,970 get its own code on the victim machine 1374 01:11:01,970 --> 01:11:05,090 first before it can then start doing this nonsense where it's 1375 01:11:05,090 --> 01:11:06,300 looking inside the network. 1376 01:11:06,300 --> 01:11:08,231 So it's that initial step where it 1377 01:11:08,231 --> 01:11:10,467 has to put that code on the victim's machine. 1378 01:11:10,467 --> 01:11:12,050 All right, so in the interest of time, 1379 01:11:12,050 --> 01:11:13,133 let's keep moving forward. 1380 01:11:13,133 --> 01:11:15,556 But come see me after class if you 1381 01:11:15,556 --> 01:11:19,100 want to follow up on the question. 1382 01:11:19,100 --> 01:11:22,560 So that's the DNS rebinding attack. 1383 01:11:22,560 --> 01:11:24,546 So how can you fix this? 1384 01:11:24,546 --> 01:11:25,920 So one way you could fix it is 1385 01:11:25,920 --> 01:11:29,040 to modify your client-side DNS resolver 1386 01:11:29,040 --> 01:11:31,700 so that external host names can never 1387 01:11:31,700 --> 01:11:33,215 resolve to an internal IP address. 1388 01:11:33,215 --> 01:11:35,590 It's, kind of, goofy that someone outside of your network 1389 01:11:35,590 --> 01:11:37,756 should be able to create a DNS binding for something 1390 01:11:37,756 --> 01:11:38,840 inside of your network. 1391 01:11:38,840 --> 01:11:40,740 That's the most straightforward solution.
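The resolver-side fix just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `is_internal_name`, `is_safe_answer`, and the `INTERNAL_SUFFIXES` list are hypothetical names that do not come from the lecture, `corp.net` stands in for the internal namespace, and a real resolver would apply a check like this to every record in a DNS response.

```python
import ipaddress

# Hypothetical list of name suffixes that this network treats as internal.
INTERNAL_SUFFIXES = ("corp.net",)


def is_internal_name(hostname: str) -> bool:
    """True if the host name belongs to the (assumed) internal namespace."""
    name = hostname.rstrip(".").lower()
    return any(name == s or name.endswith("." + s) for s in INTERNAL_SUFFIXES)


def is_safe_answer(hostname: str, resolved_ip: str) -> bool:
    """Reject answers that bind an external name to an internal address."""
    addr = ipaddress.ip_address(resolved_ip)
    internal_addr = addr.is_private or addr.is_loopback or addr.is_link_local
    if internal_addr and not is_internal_name(hostname):
        return False  # external name, internal address: possible rebinding
    return True
```

With this filter in place, a rebound answer such as attacker.com pointing at 10.0.0.5 is dropped before the browser ever sees it, while internal names can still resolve to internal addresses.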
1392 01:11:40,740 --> 01:11:43,310 You could also imagine that the browser could do something 1393 01:11:43,310 --> 01:11:44,620 called DNS pinning. 1394 01:11:44,620 --> 01:11:47,760 Whereby, if it receives a DNS resolution record, 1395 01:11:47,760 --> 01:11:51,240 then it will always treat that record as valid for, 1396 01:11:51,240 --> 01:11:53,895 let's say, 30 minutes, regardless of whether it 1397 01:11:53,895 --> 01:11:56,740 has a short TTL set inside it because that also 1398 01:11:56,740 --> 01:11:58,177 prevents the attack. 1399 01:11:58,177 --> 01:12:00,260 That solution is a little bit tricky because there 1400 01:12:00,260 --> 01:12:02,920 are some sites that actually, intentionally, use dynamic DNS 1401 01:12:02,920 --> 01:12:05,170 and do things like load balancing and stuff like that. 1402 01:12:05,170 --> 01:12:08,230 So the first solution is probably the better one. 1403 01:12:08,230 --> 01:12:13,240 OK, so here is, sort of, a fun attack. 1404 01:12:13,240 --> 01:12:18,680 So we've talked about a lot of resources 1405 01:12:18,680 --> 01:12:20,628 that the origin protects-- that the same origin 1406 01:12:20,628 --> 01:12:20,930 policy protects. 1407 01:12:20,930 --> 01:12:21,805 So what about pixels? 1408 01:12:25,230 --> 01:12:27,520 So how does the same origin policy protect pixels? 1409 01:12:27,520 --> 01:12:31,350 Well, as it turns out, pixels don't really have an origin. 1410 01:12:31,350 --> 01:12:35,040 So each frame gets its own little bounding box. 1411 01:12:35,040 --> 01:12:36,480 Just a square, basically. 1412 01:12:36,480 --> 01:12:40,710 So a frame can draw wherever it wants on that square. 1413 01:12:40,710 --> 01:12:42,910 So this is, actually, a problem because what 1414 01:12:42,910 --> 01:12:45,700 this means is that a parent frame can 1415 01:12:45,700 --> 01:12:49,030 draw atop its child frame. 1416 01:12:49,030 --> 01:12:51,250 So this can lead to some very insidious attacks.
1417 01:12:51,250 --> 01:12:59,040 So let's say that the attacker creates some page. 1418 01:12:59,040 --> 01:13:02,620 And let's say, inside of that page, 1419 01:13:02,620 --> 01:13:09,420 the attacker says click to win the iPad. 1420 01:13:09,420 --> 01:13:11,690 The very same standard thing. 1421 01:13:11,690 --> 01:13:13,090 So this is the parent frame. 1422 01:13:13,090 --> 01:13:15,320 Now what the parent frame can do is actually create 1423 01:13:15,320 --> 01:13:23,140 a child frame that is actually the Facebook Like button frame. 1424 01:13:27,850 --> 01:13:32,630 So Facebook allows you to run this little piece of Facebook 1425 01:13:32,630 --> 01:13:34,210 code you can put on your page. 1426 01:13:34,210 --> 01:13:36,340 You know, if the user clicks Like, then that means 1427 01:13:36,340 --> 01:13:37,970 that it'll go on Facebook and say, hey, 1428 01:13:37,970 --> 01:13:40,640 the user likes the particular page. 1429 01:13:40,640 --> 01:13:43,255 So we've got this child frame over here. 1430 01:13:45,852 --> 01:13:47,560 That actually turned out remarkably well. 1431 01:13:47,560 --> 01:13:51,480 Anyway, so you've got this Like thing over here. 1432 01:13:51,480 --> 01:13:58,200 Now what the attacker can do is actually overlay this frame 1433 01:13:58,200 --> 01:14:01,070 on top of the click to get the free iPad 1434 01:14:01,070 --> 01:14:04,720 and also make this invisible. 1435 01:14:04,720 --> 01:14:06,252 So CSS lets you do that. 1436 01:14:06,252 --> 01:14:07,730 So what's going to happen? 1437 01:14:07,730 --> 01:14:10,260 As we've already established, everybody wants a free iPad. 1438 01:14:10,260 --> 01:14:12,370 So the user's going to go to this site, 1439 01:14:12,370 --> 01:14:16,609 click on this thing-- this area of the screen-- thinking 1440 01:14:16,609 --> 01:14:18,900 that they're going to click here and get the free iPad.
1441 01:14:18,900 --> 01:14:21,060 But in reality, they're clicking the Like button 1442 01:14:21,060 --> 01:14:23,130 that they can't see, that's invisible. 1443 01:14:23,130 --> 01:14:25,560 It's layered on top using the z-index. 1444 01:14:25,560 --> 01:14:27,640 So what that means is that now maybe they 1445 01:14:27,640 --> 01:14:30,310 go check their Facebook profile, and they've liked attacker.com. 1446 01:14:30,310 --> 01:14:33,300 You know, and they don't remember how that happened. 1447 01:14:33,300 --> 01:14:36,050 So this is actually called a clickjacking attack, 1448 01:14:36,050 --> 01:14:38,910 and you can imagine you can do all kinds of evil things 1449 01:14:38,910 --> 01:14:39,410 here. 1450 01:14:39,410 --> 01:14:43,610 So you can imagine you could steal passwords this way. 1451 01:14:43,610 --> 01:14:44,770 You could get raw input. 1452 01:14:44,770 --> 01:14:46,270 I mean, it's madness. 1453 01:14:46,270 --> 01:14:49,760 So once again, this happens because the parent, 1454 01:14:49,760 --> 01:14:53,720 essentially, gets the right to draw over anything that's 1455 01:14:53,720 --> 01:14:56,140 inside this bounding box. 1456 01:14:56,140 --> 01:15:00,084 So does that attack make sense? 1457 01:15:00,084 --> 01:15:00,724 Yeah. 1458 01:15:00,724 --> 01:15:02,140 AUDIENCE: [INAUDIBLE], what do you 1459 01:15:02,140 --> 01:15:06,400 mean the parent gets to draw over anything [INAUDIBLE]? 1460 01:15:06,400 --> 01:15:08,900 PROFESSOR: So what I'm trying to indicate here 1461 01:15:08,900 --> 01:15:14,415 is that, visually speaking, what the user just sees is this. 1462 01:15:14,415 --> 01:15:16,040 AUDIENCE: Oh, that's the parent frame. 1463 01:15:16,040 --> 01:15:17,140 PROFESSOR: Yeah, this is the parent frame. 1464 01:15:17,140 --> 01:15:17,380 That's right. 1465 01:15:17,380 --> 01:15:17,930 This is the child frame. 1466 01:15:17,930 --> 01:15:20,120 So visually speaking, the user just sees this.
1467 01:15:20,120 --> 01:15:23,790 But using the miracle of my da Vinci style drawing techniques, 1468 01:15:23,790 --> 01:15:27,340 this is actually overlaid atop this transparently. 1469 01:15:27,340 --> 01:15:28,720 So that's the child frame. 1470 01:15:28,720 --> 01:15:30,505 That's the parent frame. 1471 01:15:30,505 --> 01:15:32,380 OK so, there's a couple different solutions-- 1472 01:15:32,380 --> 01:15:34,575 you can imagine-- for solving this. 1473 01:15:34,575 --> 01:15:40,320 The first solution is to use frame-busting code. 1474 01:15:43,850 --> 01:15:47,200 So you can actually use JavaScript expressions 1475 01:15:47,200 --> 01:15:50,910 to figure out if you have been put into a frame 1476 01:15:50,910 --> 01:15:51,920 by someone else. 1477 01:15:51,920 --> 01:15:59,490 So like, one of these tests is you compare the reference self 1478 01:15:59,490 --> 01:16:01,750 to top. 1479 01:16:01,750 --> 01:16:04,330 So in the JavaScript world, self refers 1480 01:16:04,330 --> 01:16:06,800 to the frame that you yourself are in. 1481 01:16:06,800 --> 01:16:10,700 Top refers to the frame at the top of the frame hierarchy. 1482 01:16:10,700 --> 01:16:12,846 So if you do this test and you find out 1483 01:16:12,846 --> 01:16:14,780 that self is not equal to top, then you 1484 01:16:14,780 --> 01:16:16,570 realize that you are a child frame. 1485 01:16:16,570 --> 01:16:19,039 And then you can refuse to load or do things like this. 1486 01:16:19,039 --> 01:16:20,580 So this, in fact, is what will happen 1487 01:16:20,580 --> 01:16:22,844 if you try to create a frame for, let's say, CNN.com. 1488 01:16:22,844 --> 01:16:24,760 You can actually look in the JavaScript source 1489 01:16:24,760 --> 01:16:26,940 and see that it does this test because CNN.com 1490 01:16:26,940 --> 01:16:29,980 doesn't want other people taking credit for its content. 1491 01:16:29,980 --> 01:16:31,755 So it only wants to be the topmost frame.
1492 01:16:31,755 --> 01:16:33,550 So that's one solution you can use here. 1493 01:16:33,550 --> 01:16:35,216 The other solution that you can use here 1494 01:16:35,216 --> 01:16:39,890 is to have your web server send this HTTP response 1495 01:16:39,890 --> 01:16:41,900 header called X-Frame-Options. 1496 01:16:45,180 --> 01:16:47,520 So when the web server returns a response, 1497 01:16:47,520 --> 01:16:48,690 it can set this header. 1498 01:16:48,690 --> 01:16:50,870 And it can basically say, hey, browser, 1499 01:16:50,870 --> 01:16:54,740 do not allow anyone to put my content inside of a frame. 1500 01:16:54,740 --> 01:16:56,830 So that allows the browser to do the enforcement. 1501 01:16:56,830 --> 01:16:59,540 So that's pretty straightforward. 1502 01:16:59,540 --> 01:17:02,460 So there's a bunch of other, sort of, crazy 1503 01:17:02,460 --> 01:17:04,151 attacks that you can launch. 1504 01:17:04,151 --> 01:17:06,150 Here's another one that's actually pretty funny. 1505 01:17:06,150 --> 01:17:08,860 So as I was mentioning before, the fact 1506 01:17:08,860 --> 01:17:11,180 that we're now living in a web that's internationalized 1507 01:17:11,180 --> 01:17:14,502 actually means that there's all these issues that 1508 01:17:14,502 --> 01:17:17,900 come up involving names and how you represent host names. 1509 01:17:17,900 --> 01:17:23,434 So for example, let's say that you see this letter right here. 1510 01:17:23,434 --> 01:17:24,600 So what does this look like? 1511 01:17:24,600 --> 01:17:26,120 This looks like a C, right? 1512 01:17:26,120 --> 01:17:27,490 What is this? 1513 01:17:27,490 --> 01:17:30,460 A C in ASCII in the Latin alphabet? 1514 01:17:30,460 --> 01:17:33,250 Or is this a C in Cyrillic? 1515 01:17:33,250 --> 01:17:34,870 Hard to say, right?
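To a person the two letters look the same, but to a program they are different code points. A minimal sketch, assuming Python's standard `unicodedata` module and a hypothetical helper `scripts_in` that is not from the lecture, shows how a mixed-script host name could be flagged:

```python
import unicodedata


def scripts_in(hostname: str) -> set:
    """Collect the script of each letter, taken from the first word of its
    Unicode character name (for example 'LATIN' or 'CYRILLIC')."""
    return {unicodedata.name(ch).split()[0] for ch in hostname if ch.isalpha()}


latin = "cats.com"
lookalike = "\u0441ats.com"  # first letter is U+0441 CYRILLIC SMALL LETTER ES

print(scripts_in(latin))      # only Latin letters
print(scripts_in(lookalike))  # a mix of Cyrillic and Latin letters
```

Using the first word of the character name is a rough stand-in for a real script property lookup, so this is a heuristic rather than a complete confusable-character check.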
1516 01:17:34,870 --> 01:17:37,890 So you can end up having these really strange attacks where 1517 01:17:37,890 --> 01:17:44,210 attackers will register a domain name, like cats.com, 1518 01:17:44,210 --> 01:17:45,350 for example. 1519 01:17:45,350 --> 01:17:48,340 But this is a Cyrillic C. 1520 01:17:48,340 --> 01:17:50,724 So users will go to this domain. 1521 01:17:50,724 --> 01:17:52,140 They might click on it or whatever 1522 01:17:52,140 --> 01:17:55,840 thinking they're going to Latin alphabet C, cats.com. 1523 01:17:55,840 --> 01:17:58,450 But instead, they're going to an attacker one. 1524 01:17:58,450 --> 01:18:01,824 And then, all kinds of madness can happen from there, as well. 1525 01:18:01,824 --> 01:18:03,240 So you might have heard of attacks 1526 01:18:03,240 --> 01:18:05,240 like this, like typosquatting attacks, 1527 01:18:05,240 --> 01:18:11,900 where people register names like F-C-E book.com. 1528 01:18:16,440 --> 01:18:20,170 This is a common fumble-fingered typing of Facebook.com. 1529 01:18:20,170 --> 01:18:23,745 So if you control this, you're going to get a ton of traffic 1530 01:18:23,745 --> 01:18:26,456 from people who think they're going to Facebook.com. 1531 01:18:26,456 --> 01:18:29,130 So there's a bunch of different, sort of, wacky attacks 1532 01:18:29,130 --> 01:18:31,710 that you can launch through the domain 1533 01:18:31,710 --> 01:18:34,806 registry system that are tricky to defend against from first principles 1534 01:18:34,806 --> 01:18:37,180 because how are you going to prevent users from mistyping 1535 01:18:37,180 --> 01:18:38,540 things, for example? 1536 01:18:38,540 --> 01:18:41,700 Or how would the browser indicate to the user, hey, 1537 01:18:41,700 --> 01:18:43,110 this is Cyrillic? 1538 01:18:43,110 --> 01:18:45,260 Is the browser going to alert the user every time 1539 01:18:45,260 --> 01:18:46,820 Cyrillic fonts are included?
1540 01:18:46,820 --> 01:18:49,070 That's going to make people angry if they actually use 1541 01:18:49,070 --> 01:18:51,220 Cyrillic as their native font. 1542 01:18:51,220 --> 01:18:54,040 So it's not quite clear, technologically speaking, 1543 01:18:54,040 --> 01:18:56,940 how we deal with some of those issues. 1544 01:18:56,940 --> 01:19:01,430 So yeah, there's a bunch of other security issues 1545 01:19:01,430 --> 01:19:02,790 that are very subtle here. 1546 01:19:02,790 --> 01:19:07,670 One thing that's interesting is if you look at plug-ins. 1547 01:19:07,670 --> 01:19:10,900 So how do plug-ins treat the same origin policy? 1548 01:19:10,900 --> 01:19:15,442 Well, plug-ins often have very subtle incompatibilities 1549 01:19:15,442 --> 01:19:17,150 with the rest of the browser with respect 1550 01:19:17,150 --> 01:19:17,941 to the same origin. 1551 01:19:17,941 --> 01:19:20,480 So for example, if you look at a Java plug-in, 1552 01:19:20,480 --> 01:19:25,020 Java, oftentimes, assumes that different host 1553 01:19:25,020 --> 01:19:28,730 names that have the same IP address 1554 01:19:28,730 --> 01:19:31,420 actually have the same origin. 1555 01:19:31,420 --> 01:19:34,580 That's actually a pretty big deviation from the standard 1556 01:19:34,580 --> 01:19:37,450 interpretation of the same origin policy because this 1557 01:19:37,450 --> 01:19:45,620 means that if you have something like x.y.com and, let's say, 1558 01:19:45,620 --> 01:19:50,640 z.y.com, if they map onto the same IP address, 1559 01:19:50,640 --> 01:19:53,940 then Java will consider these to be in the same origin, 1560 01:19:53,940 --> 01:19:55,580 which is a problem if, for example, 1561 01:19:55,580 --> 01:19:58,390 this site gets [? owned ?] but this one doesn't. 1562 01:19:58,390 --> 01:19:59,970 So there's a bunch of other corner 1563 01:19:59,970 --> 01:20:01,420 cases involving plug-ins.
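The x.y.com and z.y.com mismatch can be made concrete with a small Python sketch. Everything here is illustrative: the `DNS` table and its address are made up, and `ip_based_origin` only models the rule as the lecture describes it, not the actual code of any plug-in.

```python
from urllib.parse import urlsplit

# Hypothetical DNS table: both host names map onto the same address.
DNS = {"x.y.com": "203.0.113.7", "z.y.com": "203.0.113.7"}

DEFAULT_PORTS = {"http": 80, "https": 443}


def browser_origin(url: str) -> tuple:
    """Standard origin: (scheme, host name, port)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


def ip_based_origin(url: str) -> tuple:
    """Origin as described for the plug-in: host name replaced by its IP."""
    scheme, host, port = browser_origin(url)
    return (scheme, DNS[host], port)


a, b = "http://x.y.com/", "http://z.y.com/"
print(browser_origin(a) == browser_origin(b))    # False: different host names
print(ip_based_origin(a) == ip_based_origin(b))  # True: same IP address
```

Under the host-name rule the two sites are isolated from each other, while under the IP-based rule a compromise of one gives access to the other.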
1564 01:20:01,420 --> 01:20:05,190 You can refer to The Tangled Web to see some more about some 1565 01:20:05,190 --> 01:20:07,910 of those types of things. 1566 01:20:07,910 --> 01:20:09,740 So the final thing that I want to discuss-- 1567 01:20:09,740 --> 01:20:11,323 you can see the lecture notes for more 1568 01:20:11,323 --> 01:20:13,851 examples of crazy attacks that people can launch-- 1569 01:20:13,851 --> 01:20:15,600 but the final thing that I want to discuss 1570 01:20:15,600 --> 01:20:19,680 is this screen sharing attack. 1571 01:20:19,680 --> 01:20:22,680 So HTML5 actually defines this new API 1572 01:20:22,680 --> 01:20:26,630 by which a web page can allow all the bits in its screen 1573 01:20:26,630 --> 01:20:28,560 to be shared with another browser 1574 01:20:28,560 --> 01:20:30,630 or shared with the server. 1575 01:20:30,630 --> 01:20:32,230 This seems like a really cool idea 1576 01:20:32,230 --> 01:20:34,170 because now I can do collaborative foo. 1577 01:20:34,170 --> 01:20:36,406 We can collaborate on a document at the same time. 1578 01:20:36,406 --> 01:20:38,405 And it's exciting because we live in the future. 1579 01:20:38,405 --> 01:20:40,950 But what's funny about this is that, 1580 01:20:40,950 --> 01:20:44,420 when they designed this API, and it's a very new API, 1581 01:20:44,420 --> 01:20:47,560 they apparently didn't think about same origin policies 1582 01:20:47,560 --> 01:20:49,260 at all. 1583 01:20:49,260 --> 01:20:54,070 So what that means is that if you have some page that 1584 01:20:54,070 --> 01:20:57,775 has multiple frames, then any one of these frames, 1585 01:20:57,775 --> 01:21:00,180 if it is granted permission to take 1586 01:21:00,180 --> 01:21:04,840 a screenshot of your monitor, can take a screenshot 1587 01:21:04,840 --> 01:21:07,630 of the entire thing, regardless 1588 01:21:07,630 --> 01:21:11,200 of what origin that other content's coming from.
1589 01:21:11,200 --> 01:21:14,340 So this is, actually, a pretty devastating flaw 1590 01:21:14,340 --> 01:21:16,875 in the same origin policy. 1591 01:21:16,875 --> 01:21:19,250 So there's some pretty obvious fixes you can think about. 1592 01:21:19,250 --> 01:21:23,500 So for example, if this frame's given screenshot capabilities, 1593 01:21:23,500 --> 01:21:25,310 only let it take a screenshot of this. 1594 01:21:25,310 --> 01:21:25,810 Right? 1595 01:21:25,810 --> 01:21:26,700 Not this whole thing. 1596 01:21:26,700 --> 01:21:29,010 Why didn't the browser vendors implement it like this? 1597 01:21:29,010 --> 01:21:32,410 Because there's such pressure to compete on features, 1598 01:21:32,410 --> 01:21:35,595 and to innovate on features, and to get that next new thing out 1599 01:21:35,595 --> 01:21:36,150 there. 1600 01:21:36,150 --> 01:21:38,441 So for example, a lot of the questions that people were 1601 01:21:38,441 --> 01:21:40,940 asking about this particular lecture online [INAUDIBLE] 1602 01:21:40,940 --> 01:21:42,711 were like, well, why couldn't you do this? 1603 01:21:42,711 --> 01:21:44,210 Wouldn't this thing make more sense? 1604 01:21:44,210 --> 01:21:46,030 It seems like this current scheme is brain dead. 1605 01:21:46,030 --> 01:21:47,460 Wouldn't this other one be better? 1606 01:21:47,460 --> 01:21:48,210 And the answer is, yes. 1607 01:21:48,210 --> 01:21:48,895 Everything, yes. 1608 01:21:48,895 --> 01:21:50,850 That's exactly correct. 1609 01:21:50,850 --> 01:21:53,460 Almost anything would be better than this. 1610 01:21:53,460 --> 01:21:56,030 I'm ashamed to be associated with this. 1611 01:21:56,030 --> 01:21:57,220 But this is what we have. 1612 01:21:57,220 --> 01:21:59,440 So what ends up happening is if you look at the nuts 1613 01:21:59,440 --> 01:22:01,507 and bolts of how web browsers get developed, 1614 01:22:01,507 --> 01:22:03,590 people are a little bit better about security now.
1615 01:22:03,590 --> 01:22:05,256 But like, with the screen sharing thing, 1616 01:22:05,256 --> 01:22:08,290 people were so pumped to get this thing out there, 1617 01:22:08,290 --> 01:22:10,310 they didn't realize that it's going to leak 1618 01:22:10,310 --> 01:22:12,920 all the bits on your screen. 1619 01:22:12,920 --> 01:22:14,864 So now we're at this point with the web 1620 01:22:14,864 --> 01:22:16,530 where-- I mean, look at all these things 1621 01:22:16,530 --> 01:22:18,310 that we've discussed today. 1622 01:22:18,310 --> 01:22:20,200 So if we were going to start from scratch 1623 01:22:20,200 --> 01:22:22,280 and come up with a better security policy, 1624 01:22:22,280 --> 01:22:25,020 what fraction of websites that you have today 1625 01:22:25,020 --> 01:22:26,870 are going to actually work? 1626 01:22:26,870 --> 01:22:28,941 Like, approximately, 0.2% of them. 1627 01:22:28,941 --> 01:22:29,440 Right? 1628 01:22:29,440 --> 01:22:30,731 So users are going to complain. 1629 01:22:30,731 --> 01:22:33,090 And this is another constant story with security. 1630 01:22:33,090 --> 01:22:36,040 Once you give users a feature, it's often very difficult 1631 01:22:36,040 --> 01:22:40,280 to claw that back, even if that feature is insecure. 1632 01:22:40,280 --> 01:22:42,450 So today, we discussed a lot of different things 1633 01:22:42,450 --> 01:22:44,120 about the same origin policy and stuff like that. 1634 01:22:44,120 --> 01:22:45,720 Next lecture, we'll go into some more 1635 01:22:45,720 --> 01:22:48,680 depth about some of those things we talked about [INAUDIBLE].