The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: All right, let's get started. So welcome to another exciting lecture about security and why the world is so terrible. Today we're going to talk about private browsing modes, something that a lot of you probably have a lot of personal experience with. At a high level, what is the goal of privacy? When security researchers talk about privacy, what are they talking about? Well, at a high level, they're talking about the following goal: any particular user should be indistinguishable from a bunch of other users. In particular, the activity of a given user should be non-incriminating when viewed in light of activity from a bunch of other different users. And so, as I mentioned, today we're going to talk about privacy in the specific context of private web browsing.

There's actually no formal definition of what private web browsing means, and there are a couple of different reasons for that. One reason is that web applications are very, very complicated, and they're adding new features all the time, like audio and video capabilities and things like this. As a result, there's a moving target in terms of what browsers can do, and as a result, what information they might be able to leak about a particular user. So what ends up happening in practice, as with many things involving browsers, is that there's a living standard. Different browser vendors will implement different features, particularly with respect to private browsing; other vendors will look and see what vendor X is doing and update their own browser. So it's a moving target.
And as users grow to rely on private browsing more and more, they end up, a lot of the time, actually finding bugs in private browsing mode, as I'll discuss a couple of minutes later in the lecture. So at a high level you can think of private browsing as an aspirational goal. We as a society are continually refining what it means to do private browsing, and getting better in some aspects-- worse in some aspects-- as we'll see a little bit later.

So what exactly do we mean by private browsing? It's tough, but the paper tries to formalize it in two specific ways. First of all, the paper talks about a local attacker on private web browsing. This is someone who is going to possess your machine after you've finished a private browsing session and who wants to figure out what sites you looked at in private browsing mode. And the paper also talks about web attackers. The web attacker is someone who controls the websites that you visit, and this web attacker might want to figure out that you are some particular person, John or Jane, as opposed to some amorphous user the website can't identify.

We'll look at each one of these attacks in detail. But for now, suffice it to say that if the attacker can launch both of these attacks-- both a local and a web attack-- that actually really strengthens their ability to try to de-anonymize us. So, for example, a local attacker who maybe knows your IP address can actually talk to the website and say, hey, have you seen this particular IP address in your logs? If so, aha! You're looking at the user whose machine I control right now. So it's actually pretty useful from a security perspective to consider these local and web attacks as separate things and then to see how they can possibly compose.

So let's look at the first type of attacker, which is the local attacker. As I mentioned, we assume that this attacker is going to control the user's machine post-session.
By post-session, I mean that the private browsing activity has already finished-- the user has perhaps gone off and done something else and is not at the computer. Then the attacker takes control of that machine and wants to figure out what was going on. And the security goal is that we don't want the attacker to be able to figure out any of the websites that the user visited during that private browsing activity.

Now, the reason the "post" is actually important there is that if we assume the attacker can control the machine before the user's private browsing, then basically it's game over, right? Because the attacker can install a keystroke logger, the attacker can subvert the binary that [INAUDIBLE] the browser, the attacker can subvert the OS. So we don't really care about this pre-session attacker. And also note that we're not trying to provide privacy for the user after the attacker has controlled the machine, and that's for the same reason: once the attacker gets to the machine, he or she can do the same things I just mentioned-- install a key logger, and so on. So, basically, once the user leaves the machine, we don't assume any forward notions of privacy. Does that make sense? It's pretty straightforward.

And you can imagine that another goal you might want to satisfy here is to hide from the attacker that the user was employing private browsing mode at all. Now, the paper actually says that's very difficult. This property is often called plausible deniability. So your boss comes up to you after you use private browsing and says, were you looking at mylittlepony.com? No, no, I certainly wasn't. And I certainly wasn't using private browsing mode to hide the fact that I was looking at mylittlepony.com. So as I said, the paper says it's difficult to provide this property of plausible deniability. I'll give you some concrete reasons why this might be the case a little bit later on in the lecture. But that's basically an overview of the local attacker.
So one question we might want to think about is: what kinds of persistent client-side state can be leaked by a private browsing session? By persistent, I just mean stuff that will end up getting stored on the local hard disk, the local SSD, or whatever. So what kinds of state might be leaked if we weren't careful when someone is doing this type of private browsing?

One thing you might be worried about is JavaScript-accessible state. Examples of this include things like cookies and DOM storage.

Another thing you might be worried about-- and this is what most people think about when they think about what they want from private browsing-- is the browser cache. You don't want someone to look in the cache and figure out, here are some images or HTML files from websites you'd prefer people didn't know you visited.

Another important thing is your history of visited sites. Many a relationship has been broken when the other person goes to the browser, starts typing something into the address bar, and all of a sudden it auto-completes to something very embarrassing. So this is definitely one thing you don't want to leak outside the private browsing session.

You can also think about configuration state for the browser. Here you could think about things like client certificates, and you could also think about stuff like bookmarks. Or maybe you logged into a particular site and the browser offered to store your password-- that's another type of configuration state that you might not want leaking from private browsing mode.

Then there are downloaded files. As we'll discuss, this one's a little bit interesting, because downloading a file actually requires explicit user action. So maybe we do actually want this stuff to leak outside of private browsing mode-- maybe if you download something in private browsing mode, it should actually be accessible when you open the browser or use the machine after that session.
We'll talk about that a little bit more in a second. And then, finally, during private browsing mode you might install new plug-ins or browser extensions. That's another type of state that you might imagine you don't want to leak outside of private browsing mode.

So, basically, current private browsing modes typically try to prevent the first three of these-- the JavaScript-accessible state, the cache, and the history-- from leaking outside of private browsing sessions. Right? So there shouldn't be any cookies or DOM storage that get out of there, anything you put in the cache during a private browsing session should be deleted, and you shouldn't have any history of the URLs that you visited. Typically, private browsing modes allow the last three-- configuration state, downloaded files, and plug-ins-- to leak outside of a session. And there are some good and some bad reasons why this might be the case. As we'll discuss later, if you allow anything to leak from the private browsing session, that actually radically increases the threat surface for privacy leaks, so it becomes much more difficult to reason about what the security properties of private browsing mode are. Does that all make sense? Anyone have any questions? It's pretty straightforward.

So the next thing we're going to talk about, very briefly, is network activity during private browsing mode. What's interesting about this is that even if we cover all of this stuff-- we don't allow private browsing to leak any of that state-- the mere fact that you're sending network packets and opening connections leaves evidence of what you were doing. So imagine you want to go to the website foo.com. Your machine actually has to issue a DNS resolution request for foo.com. So even if you don't leave behind any of this kind of persistent state, there may be records in your local DNS cache showing that you did, in fact, try to resolve the hostname foo.com. That's very interesting.
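To make that concrete, here is a minimal sketch of the kind of lookup a browser effectively delegates to the operating system when it navigates somewhere. The hostname foo.com is just the placeholder from the example above, and the note about caching daemons is an assumption about a typical Linux setup rather than anything browser-specific.

    /* dns_lookup.c -- a browser hands name resolution to the OS resolver
     * stack.  Any caching that happens below this call (stub resolver,
     * nscd, systemd-resolved, the local router, ...) is outside the
     * browser's private-mode cleanup. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    int main(void) {
        struct addrinfo hints, *res;
        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_STREAM;

        int err = getaddrinfo("foo.com", "443", &hints, &res);  /* placeholder name */
        if (err != 0) {
            fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(err));
            return 1;
        }
        char ip[INET_ADDRSTRLEN];
        struct sockaddr_in *sa = (struct sockaddr_in *)res->ai_addr;
        inet_ntop(AF_INET, &sa->sin_addr, ip, sizeof(ip));
        printf("foo.com -> %s\n", ip);
        freeaddrinfo(res);
        return 0;
    }

The point is simply that the resolution request, and any record of it, lives outside the browser process.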
So you can imagine that browsers could try to flush the DNS cache somehow after the private session is over. In practice, that's actually tricky to do, because on many systems you need administrator privileges to do it, and it's not clear that you want the browser running as root-- browsers, as we've seen, are somewhat untrustworthy individuals. Also, a lot of DNS flush commands don't act per user; they flush the entire cache, which is typically not what you would want if you're implementing private browsing mode. You'd want something surgical: only get rid of foo.com and the things that were visited during this private browsing session, but don't delete other things. So in practice, that's kind of a tricky thing to handle.

Another tricky thing to handle, which the paper mentions, is what I'll call RAM artifacts. The basic idea here is that during private browsing mode, the browser has to be keeping some stuff in memory. And even if the private browsing mode doesn't issue any direct I/O to disk-- any user-visible writes-- the RAM that belongs to that private browsing tab can still be reflected into the page file, for example, or into the hibernation file on a laptop. And if that state gets reflected into persistent storage, then what may end up happening is that after your private browsing session is over, the attacker can look in your page file and find, for example, JavaScript code or HTML that was reflected to disk.

So we're going to have a little demonstration of how this might work. If you see up here on the screen, I've basically loaded up a private browsing tab. And what I'm going to do is go to some website-- this is the page for the PDOS group here at CSAIL. I've loaded up that page.
And then what I'm going to do is use this fun command called gcore. Basically, I'm going to take a memory snapshot of that running browser process. So I will do the following magic... basically, there's going to be some work that my terminal is doing to generate that memory snapshot. This takes a little bit of time sometimes. Now, what's happening here... so now we've basically generated the core file for that private browsing instance. What we're going to do now is look inside that image and see if we can find any mentions of PDOS.

And what's interesting is we see a ton of instances of the string PDOS in that memory image for the private browsing mode. What's also interesting is that we actually see various prefixes of things-- if we look further up, we can see things like full URLs, and you also find HTML code in there. So the point here is that we found all of this in the memory of that page, and if any of those pages got put to disk in the page file, then the attacker could basically just run strings-- they could do what I just did, over the page file-- and try to find out what sites you visited in private browsing mode.

So does that make sense? Basically, the problem here is that private browsing modes don't try to obfuscate RAM or encrypt it in any way. And that seems like a pretty fundamental thing, because at a certain point the processor has to execute on cleartext data. So this is actually a pretty big challenge. Does anyone have any questions? Yeah?

AUDIENCE: So one thing is, I wouldn't expect my browser to do that. The guarantee that these browsers give you through private browsing-- the example they give is, if you're shopping for something, your layman friend can't go on the computer and see those things.
So can you talk a little bit about what guarantees they give, and whether they had to change anything as a consequence of this paper?

PROFESSOR: Yeah, it's very interesting. One thing you can look at is that when you open up a private browsing tab, typically there will be a little blurb that says, hey, welcome to incognito mode, here's what we'll help you against-- and we won't help you if someone is standing behind you with a rubber hose about to beat you. So the browser vendors themselves are a little bit cagey about what guarantees they provide. And in fact, after the Snowden incident, a lot of the browsers actually changed that splash page, because they wanted to make it clear that we're not actually protecting you in any strong way against the NSA or something like that.

So, long story short, what guarantees are they providing you? In practice, they're providing that weak thing you mentioned there: a layperson who wanted to see what you were doing afterwards couldn't figure out what you were doing. And we're assuming the layperson can't run strings on the page file or things like that.

Now, there are actually two problems, though. One problem is that, first of all, because browsers are so complicated, they often don't even protect against the layperson. I can give you a personal example. A lot of times you see those ridiculous ads from the "Huffington Post," like, oh my gosh, it's puppies trying to help small puppies go down stairs, and things like that. Right? Because I'm weak, I will sometimes click on those things. But because I don't want people to know that, I'll sometimes do that in private browsing mode. And what will happen sometimes is that those URLs will leak into the URL history of my regular, public mode browser-- which is precisely what this stuff is designed not to do. So one problem is that sometimes these browsers don't provide protection against even the layperson attacker.
The second thing is, I think there are actually a lot of people who would like private browsing mode to provide something stronger, particularly with the whole Snowden thing. I think there are increasingly a lot of people who would like private browsing mode to protect, for example, against these RAM artifact attacks, even though they may not be able to technically articulate that goal. And actually, one of the things I've done while I've been here is some research on stronger private browsing protections. So we can chat about that afterwards. One of the things you learn about all professors is that we will talk about our research endlessly, so if you want to talk about that for three hours, just send me a calendar request and we can do that. So, anyway, this is basically a demonstration. Oh, you had a question?

AUDIENCE: Yeah, about the RAM. I'm not familiar with how it works exactly, but how come the browser can't, at the end of a session, just ask the OS to flush those parts of RAM that it was using?

PROFESSOR: So we're actually going to get to that topic in a couple of minutes. But you are correct. At a high level, you can imagine that maybe the OS, when it killed a process, would actually go through all of that process's memory pages and write zeros to them. Or you could imagine that the browser tried to pin all its pages in memory to prevent anything from getting flushed out at all. So there are some solutions that can do that-- hold onto that question for one second.

This is basically an example of how data from RAM can leak onto disk through paging activity. But note that data lifetime is a bigger problem than just the context of private browsing. You can imagine that any program that deals with, let's say, cryptographic keys or user passwords will have this problem.
Any time you type your password into a program, the memory page which holds that password can always get reflected to disk. So let me show you another example of this. Let's say we looked at the following program, which is pretty simple. It's called memclear. You see here at the bottom, in main, we're just going to read in some secret text file, and then we're just going to sleep forever. So what does that read_secret do? Basically, it reads from the file, it prints out the contents of that file, and then it actually clears out the buffer that was used to store that secret information. So, getting back to your question, one can imagine the browser, for example, would try to just memset to zero all the secrets that it encountered while it was in private browsing mode.

So if we look at the secret file, it's not very fun-- it just says, my secrets are in a file. And then we run this program in the background. So what did it do? Like I said, it read that file in, printed out the secret value, cleared the memory buffer that it used to print that stuff out, and now it's just sleeping in the background. So once again, if we use this fun gcore command, we can take a memory dump of the memclear program that's running right now. OK, and then if we do-- let's see which one we're going to look at-- this guy is the one we want-- and then we do a grep for "secret." So once again, we see that if we look in the RAM image of that running program, we find instances of both the file name that was read in and also some prefixes of the string contents of that file, even though we wiped the buffer in the C program itself.

So you might say, why did this happen? This seems very, very strange. And the reason is that if you think about the way that I/O works, it's layered.
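The source of memclear.c isn't reproduced in the transcript, but based on the description above it looks roughly like the sketch below; treat the file name, buffer size, and exact output as guesses rather than the actual demo code. The comments also note the gcore/strings inspection used in the demo.

    /* memclear.c -- rough reconstruction of the demo program described above.
     * Reads a "secret" file, prints it, zeroes the buffer, then sleeps so its
     * memory can be inspected, e.g.:
     *     gcore <pid>
     *     strings core.<pid> | grep secret
     */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void read_secret(const char *path) {
        char buf[4096];
        FILE *f = fopen(path, "r");
        if (!f) { perror("fopen"); return; }
        size_t n = fread(buf, 1, sizeof(buf) - 1, f);
        buf[n] = '\0';
        printf("secret: %s\n", buf);
        fclose(f);
        /* Wipe the application-visible copy of the secret.  As the demo shows,
         * this is necessary but insufficient: copies can survive in stdio
         * buffers, kernel buffers, and so on.  (Separately, a plain memset on
         * a buffer the compiler considers dead may be optimized away;
         * explicit_bzero on glibc/BSD exists for that reason.) */
        memset(buf, 0, sizeof(buf));
    }

    int main(void) {
        read_secret("secret.txt");   /* hypothetical file name */
        for (;;)
            sleep(60);               /* sleep forever so a memory dump can be taken */
    }

Even with that memset, grepping a gcore dump of the sleeping process for the secret still turns up copies left behind by lower layers of the I/O stack, which is exactly what the demo finds.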
By the time the contents of that file get to the program, they've already gone through, let's say, kernel memory, and they've probably gone through the C standard library's I/O layer, because that library does buffering and things like that. So what ends up happening is that even if you memset the application-visible buffer, there are still instances of the secret data lying in many different places throughout the system. And this is just looking at the user-mode portion of the application-- there's probably still data sitting around in, say, the kernel's I/O buffers as well. So, getting back to your question, if you want to do what people call secure deallocation, you can't just rely on mechanisms at the application level, because there may be other places where that data lives.

So what are some examples of other places where this data might live? For example, it might live in process memory-- things like the heap and the stack. When we did that memset inside memclear.c, we were basically trying to address this, but what we found out is that it is necessary, yet insufficient, to actually clear all instances of that secret from memory. Where else might RAM artifacts or secret data persist? All kinds of files: backups, SQLite databases, and so on.
If at any point an application takes something in RAM and writes it to one of these things, then once again the attacker may be able to recover it once the attacker controls the disk.

As I mentioned, kernel memory is another common place where RAM secrets may live, because, once again, applications typically do layered I/O in which each piece of data goes through multiple parts of the stack. Think of network transmission, for example: first the data has to arrive in some network buffer that's probably inside the kernel; then it probably goes through some buffers inside the C standard library; and then finally it goes to user mode, the part of the application that the developer wrote him or herself. So that can actually be a big problem.

You can also think of freed memory pages as a place where data can leak. Imagine that your application allocates a bunch of memory using whatever [INAUDIBLE] or whatnot, and then that process dies. The kernel spins up another process but hasn't actually zeroed out all the physical RAM pages. So what could happen is that when that new process starts, it could just allocate a bunch of memory, walk through all those physical RAM pages, and do the same thing we did-- the strings thing-- and see if there's anything interesting in there. They might be able to get secrets that way. So there are a lot of ways information can leak through the kernel. You could also think about I/O buffers for things like the keyboard and the mouse. There are just a bunch of different vectors through which data can leak via the kernel.

How might an attacker try to get at some of this information? Well, in some cases it's as simple as reading files: just read the page file, read the hibernation file, and see what's in there. Some file formats actually embed old versions of data within themselves. For example, the way Microsoft Word used to work is that a single Word file would actually contain versions of old pieces of data, so if you could get access to that Word file, you could just sift through the file format and step through all the old versions. And, as we've been discussing for the last couple of minutes, secure deallocation is also a problem-- it's not supported across the full stack. For example, in older Linux kernels, when you created a directory, you could leak up to four kilobytes of kernel memory-- only Zeus knows what's inside that memory. And that's because Linux wasn't actually zeroing out kernel memory that had been allocated, deallocated, and then allocated to something else.
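The same reuse issue shows up in miniature inside a single process, since a typical heap allocator doesn't zero memory on free either. Here is a toy sketch of that; whether the old bytes actually reappear depends entirely on the allocator, so treat it purely as an illustration of why secure deallocation has to zero data explicitly, not as a statement about any particular system.

    /* reuse.c -- toy user-level analogue of the page reuse problem: free()
     * does not promise to erase anything, so a later allocation may be handed
     * back a chunk that still contains the old bytes. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *a = malloc(4096);
        if (!a) return 1;
        strcpy(a, "this is a pretend secret");     /* made-up secret string */
        free(a);                                   /* no zeroing happens here */

        char *b = malloc(4096);                    /* may reuse the same chunk */
        if (!b) return 1;
        /* Scan the (uninitialized) new allocation for the old string; reading
         * uninitialized memory is fine for a demo, but tools like MSan will
         * rightly complain about it. */
        int found = 0;
        for (size_t i = 0; i + 6 <= 4096; i++)
            if (memcmp(b + i, "secret", 6) == 0) { found = 1; break; }
        puts(found ? "old secret still visible in the new allocation"
                   : "this allocator happened not to hand the bytes back");
        free(b);
        return 0;
    }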
And as I mentioned before, if the kernel doesn't zero out pages that are given to user-mode processes, you can also have user-mode secrets leak through those memory pages as well.

Another thing is that many SSDs implement logging. In other words, when you send a write to an SSD, often you are not directly overwriting the data-- you're actually appending to a log, and when a piece of data becomes invalid, it gets lazily reclaimed later. What that means is that if you as the user get unlucky, and you've written a bunch of data that hasn't been reclaimed by the SSD yet, then maybe the attacker can look at that hardware and say, OK, I understand the log format, and even though technically speaking this data may be invalid, I can still recover it because I understand how the flash translation layer works, or something like that. And at a high level, you can also have this problem with stolen or discarded hardware. If you don't use encryption, then a lot of the time you can just take some disk that you found in a dumpster somewhere, work out what the physical layout is, and recover data like that.

So, anyway, there are a lot of problems with these RAM artifacts getting stuck in persistent storage somehow and then being available to an attacker later on. So how can we fix these data lifetime problems?

We've already discussed one solution, which is to basically zero out memory when you're done with it. Whenever you deallocate something, you just go through and write a bunch of zeros, or some random thing, and essentially hide the old data from someone else who might come along later. So does anyone see any potential problems with that? One problem you might imagine is that, as with all things in security, people always complain about performance. When you zero out memory, maybe this isn't a problem if your program is I/O bound, so you're waiting on some slow mechanical part of the hard disk or whatnot.
But imagine your program is CPU bound, and maybe it's very memory intensive too, so it's always allocating and deallocating data. Then zeroing out memory might be a performance cost that you don't want to pay. Typically this isn't a problem in practice, but as we all know, people love performance, so this is sometimes an objection you'll hear to this approach.

Another thing you can imagine doing, instead of zeroing out memory, is to always encrypt data as it goes to stable storage. In a system like this, before the application ever writes anything to disk, it's going to encrypt it before it actually hits that SSD or that hard disk. Similarly, when the data comes back in from stable storage, you're going to decrypt it dynamically before you put it into RAM. What's interesting about this approach is that if you throw away the key you used to encrypt and decrypt the data, then you've effectively made that data on disk unrecoverable by the attacker-- assuming that you believe in cryptography. This is very, very nice, because it gives us the property that we don't have to remember, per se, all the places where we've written this encrypted data. We can just drop the keys and treat all that encrypted data as something we can allocate again.

So, for example, OpenBSD has this option where you can do swap encryption. You can basically associate keys with various sections of the page file, so it does this very thing I mentioned. Every time you boot the machine, it'll generate a bunch of new keys, and when your machine goes down-- because you shut it down or reboot it or whatever-- it will basically forget all the keys that it used to encrypt that swap space. Then it can say that all that swap is available to be used again.
And because those keys are forgotten, one can assume that the attacker can't look at the stuff that's in there.

AUDIENCE: What is the [INAUDIBLE]?

PROFESSOR: Ah, yeah, that's a good question. I'm actually not sure what sources of entropy it uses. OpenBSD is pretty paranoid about security, so I imagine it does things like look at, let's say, the entropy pool gathered from user keyboard input and other things like that. Yeah, I'm not actually sure how it derives those keys. But you're exactly right that if the sources of entropy it uses are predictable, that basically shrinks the entropy space of the key itself, which then makes the key more vulnerable.

AUDIENCE: So with the memory it's capturing [INAUDIBLE].

PROFESSOR: Yeah, so basically, what this model assumes-- if all we are doing is swap encryption-- is that the RAM pages holding the keys, for example, are never swapped out. And that's actually pretty easy to do if you're the OS: you just pin those pages in memory. This also doesn't help you against someone who's got pins on the memory bus, or someone who can walk kernel memory pages, or stuff like that. So you're right.
AUDIENCE: In terms of browsing, it helps with attackers that come after the fact, because if you throw away the key, then after the fact there's no key in memory.

PROFESSOR: Yeah, that's exactly right. What's nice about this is that it essentially doesn't require modifications to applications. Like you said, you can just put any old thing atop this and get this property for free.

AUDIENCE: Going back a bit-- if you look at the data before [INAUDIBLE] to RAM, how does that avoid the RAM artifacts [INAUDIBLE]?

PROFESSOR: OK, so if I understand your question correctly, I think you're worried about the fact that, sure, the data is encrypted when it's on disk, but it can still sit in cleartext form somewhere in memory itself. This gets back to the discussion we had here: ensuring that data hits the disk encrypted doesn't actually protect against an attacker who can look at RAM in real time. So basically what we're saying is that if you're only worried about this post-session attacker, who can't, for example, look at your RAM in real time, this works fine. But you're exactly right that this does not provide, for lack of a better term, encrypted RAM. There actually are some research systems that try to do something like that. It gets a little bit tricky, because at some point your processor has to actually do something with real data-- if you want to do an add, you have to give it cleartext operands, perhaps. There are also some interesting research systems which actually try to do computation on encrypted data, which is mind-blowing, like "The Matrix." But suffice it to say that the protections people have for in-RAM data are typically much weaker than what they have for data that lives on stable storage. You got a question?

AUDIENCE: Yeah, but does that [INAUDIBLE]? Because even though the attacker has post-session access, that's just post-private-mode access. There could still be a public mode session going on, and the attacker would have access to the machine, right?

PROFESSOR: So you're worried about a concurrent--

AUDIENCE: So if you have a public mode tab and you have a private mode tab, you close the private tab and the public mode tab stays open-- the attacker could still dump the memory, and the RAM artifacts would be problematic. Is that right?

PROFESSOR: Yeah, interesting-- so we will talk at the end of lecture about an attack which is somewhat similar. Most of the threat models for private browsing do not assume a concurrent attacker at all. In other words, they assume that when you're doing private browsing, there's no other person who has a public mode tab open or anything like that.
But you are in fact correct that, given the way private browsing modes are often implemented-- let's say you open up a private browsing tab, you close that tab, and you immediately run to go get a cup of coffee-- one attack I will describe is that Firefox, for example, still keeps statistics about, let's say, memory allocation. So if the memory for your private tab hasn't actually been garbage collected yet, I can basically go to about:memory or whatever and actually see URLs and stuff from your tab. But, long story short, most of these attacker models do not assume a concurrent attacker at the same time that you're privately browsing. Make sense?
767 00:36:04,640 --> 00:36:08,140 And you also might imagine that perhaps we 768 00:36:08,140 --> 00:36:15,340 don't want the attacker to tell whether or not we're 769 00:36:15,340 --> 00:36:18,940 using private browsing mode. 770 00:36:18,940 --> 00:36:24,430 So the attacker can't tell that the user employs 771 00:36:24,430 --> 00:36:25,290 private browsing. 772 00:36:28,380 --> 00:36:33,330 And so as the paper discusses, defending 773 00:36:33,330 --> 00:36:37,260 against the web attacker is actually pretty tricky. 774 00:36:37,260 --> 00:36:39,000 So what does it mean, for example, 775 00:36:39,000 --> 00:36:41,935 to identify different users? 776 00:36:41,935 --> 00:36:44,060 Like I said, at a high level, as you could imagine, 777 00:36:44,060 --> 00:36:47,320 the user should look no different than any other user 778 00:36:47,320 --> 00:36:48,910 that visits this site. 779 00:36:48,910 --> 00:36:50,460 So you can imagine a web attacker 780 00:36:50,460 --> 00:36:53,170 might want to do one of two specific things. 781 00:36:53,170 --> 00:36:56,400 It might want to say, OK, I see multiple people who 782 00:36:56,400 --> 00:36:59,740 were visiting my site in private browsing mode. 783 00:36:59,740 --> 00:37:02,890 You were visitor five, seven, and eight. 784 00:37:02,890 --> 00:37:04,890 So in other words, identifying a particular user 785 00:37:04,890 --> 00:37:07,820 within the context of multiple private browsing sessions. 786 00:37:07,820 --> 00:37:09,920 The second thing the attacker might want to do 787 00:37:09,920 --> 00:37:14,230 is actually try to link a user across public and private mode 788 00:37:14,230 --> 00:37:15,120 browsing sessions. 789 00:37:15,120 --> 00:37:18,110 So I go to Amazon.com once in public browsing mode. 790 00:37:18,110 --> 00:37:20,350 I then go to it again in private browsing mode. 791 00:37:20,350 --> 00:37:22,366 Can the attacker actually figure out 792 00:37:22,366 --> 00:37:23,740 that I'm actually the same person 793 00:37:23,740 --> 00:37:24,600 across those two visits? 794 00:37:24,600 --> 00:37:25,150 Yes? 795 00:37:25,150 --> 00:37:27,900 AUDIENCE: This is all modulo the IP address. 796 00:37:27,900 --> 00:37:31,370 PROFESSOR: Ah, yes, that's exactly right. 797 00:37:31,370 --> 00:37:32,740 That is excellent foreshadowing. 798 00:37:32,740 --> 00:37:38,315 So right now I'm assuming that the user employs Tor or uses 799 00:37:38,315 --> 00:37:39,180 something like this. 800 00:37:39,180 --> 00:37:41,180 So yeah, we're punting on this whole issue of IP 801 00:37:41,180 --> 00:37:42,270 admittedly for now. 802 00:37:42,270 --> 00:37:44,640 That's right. 803 00:37:44,640 --> 00:37:47,150 So yeah, this segues very well. 804 00:37:47,150 --> 00:37:48,960 So what's an easy way to identify the user? 805 00:37:48,960 --> 00:37:50,780 As you suggested, the IP address. 806 00:37:50,780 --> 00:37:53,260 So if you 807 00:37:53,260 --> 00:37:55,425 see two visits that are sort of close in time, 808 00:37:55,425 --> 00:37:57,590 relatively speaking, with the same IP, 809 00:37:57,590 --> 00:38:00,900 then with high likelihood that's probably the same user. 810 00:38:00,900 --> 00:38:02,442 And this is in fact the motivation-- one 811 00:38:02,442 --> 00:38:05,110 of the motivations for stuff like Tor. 812 00:38:05,110 --> 00:38:08,510 And so we're actually going to discuss Tor next lecture.
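To make that IP-plus-timing heuristic concrete, here is a rough TypeScript sketch; it is not from the lecture or the paper, and the addresses, the time window, and the function names are all made up for illustration.

```typescript
// Toy sketch: linking two visits purely by source IP and arrival time.
interface Visit {
  ip: string;        // source IP seen by the server
  timestamp: number; // milliseconds since epoch
}

// Treat two visits as "probably the same user" if they share an IP
// and happen within some window (here, one hour).
function probablySameUser(a: Visit, b: Visit, windowMs = 60 * 60 * 1000): boolean {
  return a.ip === b.ip && Math.abs(a.timestamp - b.timestamp) < windowMs;
}

const publicVisit: Visit = { ip: "203.0.113.7", timestamp: Date.now() };
const privateVisit: Visit = { ip: "203.0.113.7", timestamp: Date.now() + 5 * 60 * 1000 };
console.log(probablySameUser(publicVisit, privateVisit)); // true -- linked despite private mode
```

This is exactly the kind of linking that hiding the IP address (for example with Tor, discussed next) is meant to break.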
813 00:38:08,510 --> 00:38:10,320 So in case you haven't heard of Tor, 814 00:38:10,320 --> 00:38:13,560 it's basically a tool which tries to obscure things 815 00:38:13,560 --> 00:38:15,120 like your IP address. 816 00:38:15,120 --> 00:38:18,560 And you could actually imagine layering Tor-- 817 00:38:18,560 --> 00:38:22,210 having Tor be the foundation. 818 00:38:22,210 --> 00:38:24,630 And then you put private browsing modes atop that. 819 00:38:24,630 --> 00:38:26,986 And that might give you some stronger properties than 820 00:38:26,986 --> 00:38:31,680 you would get if you used private browsing modes alone. 821 00:38:31,680 --> 00:38:34,610 But, anyway, so the thing to mention about Tor 822 00:38:34,610 --> 00:38:37,940 though is that Tor does provide some sense of IP anonymity. 823 00:38:37,940 --> 00:38:40,830 But it doesn't actually address things like the data secrecy 824 00:38:40,830 --> 00:38:42,920 lifetime issues or things like that. 825 00:38:42,920 --> 00:38:46,410 So Tor-- perhaps you can think of it as maybe necessary, 826 00:38:46,410 --> 00:38:48,580 but insufficient for a full implementation 827 00:38:48,580 --> 00:38:50,760 of private browsing mode. 828 00:38:50,760 --> 00:38:53,450 And so what's interesting too is that even if a user 829 00:38:53,450 --> 00:38:57,800 employs Tor, there are still ways that a web server can 830 00:38:57,800 --> 00:39:02,020 identify the user by looking at the unique characteristics 831 00:39:02,020 --> 00:39:06,230 of that user's browser. 832 00:39:06,230 --> 00:39:09,080 So this is our final demo for today. 833 00:39:09,080 --> 00:39:12,255 So let's see here. 834 00:39:12,255 --> 00:39:15,980 I'm going to get rid of this guy. 835 00:39:15,980 --> 00:39:18,380 And then let's see. 836 00:39:18,380 --> 00:39:22,632 I am going to go to this site called Panopticlick. 837 00:39:22,632 --> 00:39:23,840 Some of you may have heard of this. 838 00:39:23,840 --> 00:39:25,260 It's run by the EFF. 839 00:39:25,260 --> 00:39:29,640 The basic idea is it is going to try to identify you, the user, 840 00:39:29,640 --> 00:39:32,940 by looking at various characteristics of your web 841 00:39:32,940 --> 00:39:33,738 browser. 842 00:39:33,738 --> 00:39:37,410 So I'll show you exactly what I mean. 843 00:39:37,410 --> 00:39:39,101 So I want to go-- the URL is very long. 844 00:39:39,101 --> 00:39:41,506 This is very stressful for me to type in. 845 00:39:41,506 --> 00:39:43,911 So please don't judge if it doesn't go through. 846 00:39:43,911 --> 00:39:45,354 Let's see. 847 00:39:45,354 --> 00:39:49,220 Panopticlick-- did it work? 848 00:39:49,220 --> 00:39:51,730 Yes, OK. 849 00:39:51,730 --> 00:39:54,030 So I am going to go to this website. 850 00:39:54,030 --> 00:39:57,600 And it's run by the folks at the EFF. 851 00:39:57,600 --> 00:39:59,820 And I say, OK, test me. 852 00:39:59,820 --> 00:40:02,117 So what this is doing is it's basically 853 00:40:02,117 --> 00:40:03,825 running a bunch of JavaScript code, maybe 854 00:40:03,825 --> 00:40:05,730 an applet-- maybe some Java. 855 00:40:05,730 --> 00:40:08,110 And it's trying to fingerprint my browser. 856 00:40:08,110 --> 00:40:12,115 And it's trying to figure out how much unique information 857 00:40:12,115 --> 00:40:12,990 it can get. 858 00:40:12,990 --> 00:40:18,810 And so-- let me increase the font here.
859 00:40:18,810 --> 00:40:20,960 So, for example, one thing it looks at 860 00:40:20,960 --> 00:40:23,620 is-- you see here-- what are 861 00:40:23,620 --> 00:40:27,060 all the details of the browser plugins that I'm running. 862 00:40:27,060 --> 00:40:29,390 So basically it'll run code in its web page 863 00:40:29,390 --> 00:40:31,454 that looks and sees, do I have Flash installed? 864 00:40:31,454 --> 00:40:32,370 What version of Flash? 865 00:40:32,370 --> 00:40:33,620 Do I have Java installed? 866 00:40:33,620 --> 00:40:35,970 What version of Java? 867 00:40:35,970 --> 00:40:39,190 So you can see that these are all-- they can't even 868 00:40:39,190 --> 00:40:40,810 fit on the screen at one time. 869 00:40:40,810 --> 00:40:44,820 These are like all the various plugins and ridiculous formats 870 00:40:44,820 --> 00:40:45,960 that my browser supports. 871 00:40:45,960 --> 00:40:48,774 Now, at a high level-- this should be troubling to you 872 00:40:48,774 --> 00:40:49,940 if you're a security person. 873 00:40:49,940 --> 00:40:51,939 Am I actually actively using all of these things 874 00:40:51,939 --> 00:40:53,180 at a given time? 875 00:40:53,180 --> 00:40:55,805 This gives me nightmares. 876 00:40:55,805 --> 00:40:57,930 So what ends up happening is that web 877 00:40:57,930 --> 00:41:00,389 servers-- this web attacker-- they can run code like this. 878 00:41:00,389 --> 00:41:02,888 And they can figure out what are all the plugins that you're 879 00:41:02,888 --> 00:41:03,840 running. 880 00:41:03,840 --> 00:41:05,970 Now if you look at these two columns to the left, 881 00:41:05,970 --> 00:41:07,020 what are they? 882 00:41:07,020 --> 00:41:09,550 So you see up here. 883 00:41:09,550 --> 00:41:11,810 It says bits of identifying information. 884 00:41:11,810 --> 00:41:15,760 And then one in x browsers has this value. 885 00:41:15,760 --> 00:41:18,635 So, for example, if we look at the plugins, 886 00:41:18,635 --> 00:41:21,979 it's probably 887 00:41:21,979 --> 00:41:23,770 this number that's more interesting-- 888 00:41:23,770 --> 00:41:24,660 the one on the right. 889 00:41:24,660 --> 00:41:30,140 It's saying that 1 in approximately 280,000 browsers 890 00:41:30,140 --> 00:41:33,610 has this exact set of plugins. 891 00:41:33,610 --> 00:41:37,960 So that's actually a pretty specific way to fingerprint me. 892 00:41:37,960 --> 00:41:40,580 There are very, very few people 893 00:41:40,580 --> 00:41:43,674 who have this exact set of plugins and configurations. 894 00:41:43,674 --> 00:41:45,090 So as it turns out, they're right. 895 00:41:45,090 --> 00:41:45,840 I am quite unique. 896 00:41:45,840 --> 00:41:50,104 But this is a problem from the security perspective. 897 00:41:50,104 --> 00:41:50,770 So look at this. 898 00:41:50,770 --> 00:41:55,120 The screen size and the color depth for my machine-- 899 00:41:55,120 --> 00:41:57,830 1 in-- what is this? 900 00:41:57,830 --> 00:42:00,570 1.5 million. 901 00:42:00,570 --> 00:42:02,515 That's actually pretty shocking. 902 00:42:02,515 --> 00:42:07,050 So there's only one person in a sample of 1.5 million people 903 00:42:07,050 --> 00:42:10,420 who has this particular screen configuration. 904 00:42:10,420 --> 00:42:14,110 So these things-- they are additive in some sense. 905 00:42:14,110 --> 00:42:17,340 So the more fingerprints you have, the easier 906 00:42:17,340 --> 00:42:21,180 it is for the attacker to figure out exactly who you are.
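As a rough sketch of how those "bits of identifying information" combine-- not part of the demo itself-- here is a small TypeScript calculation. The "1 in N" figures are the two examples from the demo, and treating the attributes as independent is a simplifying assumption.

```typescript
// Surprisal of an attribute shared by roughly 1 in N browsers.
function bits(oneInN: number): number {
  return Math.log2(oneInN);
}

const attributes = {
  pluginList: 280_000,           // ~1 in 280,000 browsers share this plugin set
  screenSizeAndDepth: 1_500_000, // ~1 in 1.5 million share this screen configuration
};

let totalBits = 0;
for (const oneInN of Object.values(attributes)) {
  totalBits += bits(oneInN);
}

// Under the independence assumption, the combined fingerprint narrows you
// to roughly 1 in 2^totalBits browsers -- the "additive" effect mentioned above.
console.log(totalBits.toFixed(1), "bits ≈ 1 in", Math.round(2 ** totalBits));
```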
907 00:42:21,180 --> 00:42:24,420 And so note this was done purely from the server side. 908 00:42:24,420 --> 00:42:26,090 I just went to this web page. 909 00:42:26,090 --> 00:42:27,490 And I just did this. 910 00:42:27,490 --> 00:42:28,710 And this is what it got. 911 00:42:28,710 --> 00:42:30,716 One second-- I want to show one more thing. 912 00:42:30,716 --> 00:42:33,614 This was done in private browsing mode. 913 00:42:33,614 --> 00:42:35,063 And let's see here. 914 00:42:38,927 --> 00:42:43,948 I will open up a regular version of Firefox. 915 00:42:47,392 --> 00:42:51,850 Then I'll run this again. 916 00:42:51,850 --> 00:42:55,490 So note that now I'm in a public mode browser. 917 00:42:55,490 --> 00:42:57,050 Before I was in private mode. 918 00:42:57,050 --> 00:42:58,970 Now I am in public mode. 919 00:42:58,970 --> 00:43:02,250 So what you'll see is that when we look at the browser plugins, 920 00:43:02,250 --> 00:43:04,000 the extent to which I can be fingerprinted 921 00:43:04,000 --> 00:43:05,820 is essentially the same. 922 00:43:05,820 --> 00:43:08,448 So there's going to be a few plugins that may or may not 923 00:43:08,448 --> 00:43:10,274 load depending on the vagaries of how 924 00:43:10,274 --> 00:43:11,440 privacy mode is implemented. 925 00:43:11,440 --> 00:43:13,512 But still, look at that. 926 00:43:13,512 --> 00:43:15,872 I'm still very easy to fingerprint. 927 00:43:15,872 --> 00:43:18,392 And in fact, if you look back at this guy 928 00:43:18,392 --> 00:43:20,100 again-- that screen size and color depth. 929 00:43:20,100 --> 00:43:22,082 I didn't change that actually between the two-- 930 00:43:22,082 --> 00:43:23,790 between public and private browsing mode. 931 00:43:23,790 --> 00:43:26,730 So that ability to fingerprint there is basically the same. 932 00:43:26,730 --> 00:43:29,430 This is one reason why it's so difficult to protect yourself 933 00:43:29,430 --> 00:43:33,110 against this web attack, because browsers themselves reveal 934 00:43:33,110 --> 00:43:35,749 so much information about you just from their configuration. 935 00:43:35,749 --> 00:43:37,998 AUDIENCE: I am curious about the screen size and color depth 936 00:43:37,998 --> 00:43:39,133 thing. 937 00:43:39,133 --> 00:43:39,966 How does it do that? 938 00:43:39,966 --> 00:43:42,336 How is that unique? 939 00:43:42,336 --> 00:43:44,887 How many screen sizes and color depths are there? 940 00:43:44,887 --> 00:43:46,470 PROFESSOR: Well, I think it's actually 941 00:43:46,470 --> 00:43:48,136 hiding some of the magic that it's using 942 00:43:48,136 --> 00:43:49,430 to figure out what that is. 943 00:43:49,430 --> 00:43:51,638 So at a high level, how do a lot of these tests work? 944 00:43:51,638 --> 00:43:55,250 So there's some parts of your browser environment 945 00:43:55,250 --> 00:43:57,300 that are testable purely by JavaScript code. 946 00:43:57,300 --> 00:43:59,866 So you can imagine that you can essentially 947 00:43:59,866 --> 00:44:01,240 have JavaScript code, which looks 948 00:44:01,240 --> 00:44:03,198 over the properties of the window object, which 949 00:44:03,198 --> 00:44:05,370 is like a global JavaScript namespace, 950 00:44:05,370 --> 00:44:07,741 and sees, do you define this weird widget? 951 00:44:07,741 --> 00:44:09,240 Do you define that weird widget? 952 00:44:09,240 --> 00:44:12,090 And if so, that might count toward your plug-ins, let's say.
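As a minimal illustration of that kind of pure-JavaScript probing (written here as TypeScript), a page can simply test whether certain globals exist on window. The property names below are only examples of the sort of plugin- or vendor-injected globals a fingerprinting script might look for.

```typescript
// Probe for the presence of a few well-known (or made-up) global properties.
const probes = ["netscape", "ActiveXObject", "chrome", "InstallTrigger"];

const present = probes.filter(
  (name) => (window as any)[name] !== undefined
);

// The resulting pattern of "defined / not defined" is one more piece of the fingerprint.
console.log("Globals visible to this page:", present);
```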
953 00:44:12,090 --> 00:44:14,650 Pages like this also typically take advantage of the fact 954 00:44:14,650 --> 00:44:18,522 that Java applets and Flash objects 955 00:44:18,522 --> 00:44:20,480 can look at all kinds of more interesting stuff 956 00:44:20,480 --> 00:44:22,521 like the fonts that are available on your machine 957 00:44:22,521 --> 00:44:23,660 and things like that. 958 00:44:23,660 --> 00:44:27,180 So as to the particular screen size and color depth thing-- 959 00:44:27,180 --> 00:44:28,555 I think-- don't quote me on that. 960 00:44:28,555 --> 00:44:29,971 But I think what ends up happening 961 00:44:29,971 --> 00:44:32,796 is it will try to run an applet, let's say, that will actually 962 00:44:32,796 --> 00:44:35,360 try to query your graphics card or whatever are the graphics 963 00:44:35,360 --> 00:44:38,334 interfaces in Java and poke for different aspects of it. 964 00:44:38,334 --> 00:44:40,250 So I think it's actually more than just screen 965 00:44:40,250 --> 00:44:41,030 size and depth. 966 00:44:41,030 --> 00:44:43,620 They condense it for size as that. 967 00:44:43,620 --> 00:44:45,842 So at a high level, that's how all these tricks work. 968 00:44:45,842 --> 00:44:47,300 So you see a bunch of information-- 969 00:44:47,300 --> 00:44:48,830 you can snarf up through JavaScript. 970 00:44:48,830 --> 00:44:50,720 Then you run a bunch of plugins, which 971 00:44:50,720 --> 00:44:53,820 can typically access more stuff and see what they can snarf up. 972 00:44:53,820 --> 00:44:56,910 And then you see what's going on. 973 00:44:56,910 --> 00:44:58,736 Does it all make sense? 974 00:44:58,736 --> 00:45:01,152 Yeah, this is basically why it's very difficult to protect 975 00:45:01,152 --> 00:45:02,520 against a web attacker. 976 00:45:02,520 --> 00:45:04,686 And in particular, getting back to the discussion we 977 00:45:04,686 --> 00:45:07,940 had about Tor, right, even if I had gone through Tor-- so 978 00:45:07,940 --> 00:45:12,145 you'll note the IP address-- you don't see it up here. 979 00:45:12,145 --> 00:45:13,867 And so you can imagine that yeah, 980 00:45:13,867 --> 00:45:16,200 maybe this thing would actually look at your IP address. 981 00:45:16,200 --> 00:45:17,405 But the thing is like even if I didn't 982 00:45:17,405 --> 00:45:19,200 know what IP you were coming from at all, 983 00:45:19,200 --> 00:45:21,924 I can do all these things. 984 00:45:21,924 --> 00:45:22,840 It's pretty maddening. 985 00:45:22,840 --> 00:45:23,890 It's pretty insane. 986 00:45:23,890 --> 00:45:25,682 So there are some products out there 987 00:45:25,682 --> 00:45:28,690 that tried to do things like imagine 988 00:45:28,690 --> 00:45:31,090 that you had a proxy out in the cloud 989 00:45:31,090 --> 00:45:33,170 that all your web traffic went through. 990 00:45:33,170 --> 00:45:34,680 And then imagine that proxy tried 991 00:45:34,680 --> 00:45:40,250 to present a canonical version of a browser runtime. 992 00:45:40,250 --> 00:45:42,890 And imagine that it would always try to emulate, 993 00:45:42,890 --> 00:45:46,400 let's say, Firefox v 10.7. 994 00:45:46,400 --> 00:45:48,780 Then it would try to send back the data 995 00:45:48,780 --> 00:45:51,930 that it rendered as Firefox v 10.7. 996 00:45:51,930 --> 00:45:53,960 So some people would try to attack this. 997 00:45:53,960 --> 00:45:54,970 It's sort of tricky. 998 00:45:54,970 --> 00:45:55,886 AUDIENCE: [INAUDIBLE]. 
999 00:45:58,896 --> 00:45:59,878 PROFESSOR: I am not-- 1000 00:45:59,878 --> 00:46:00,860 AUDIENCE: Is that Tor distributions? 1001 00:46:00,860 --> 00:46:02,333 Is that paired with virtual machines? 1002 00:46:02,333 --> 00:46:02,833 [INAUDIBLE] 1003 00:46:05,527 --> 00:46:07,110 PROFESSOR: I see-- so the basic idea-- 1004 00:46:07,110 --> 00:46:09,443 is it a similar idea to what we were just talking about? 1005 00:46:09,443 --> 00:46:10,744 AUDIENCE: Yes, [INAUDIBLE]. 1006 00:46:10,744 --> 00:46:12,660 PROFESSOR: Yeah, so I never heard of that one. 1007 00:46:12,660 --> 00:46:14,535 I have heard of some of these other projects. 1008 00:46:14,535 --> 00:46:18,480 I'm imagining there's actually some trickiness in getting 1009 00:46:18,480 --> 00:46:20,495 systems like this to be efficient a lot of times 1010 00:46:20,495 --> 00:46:22,870 because particularly imagine if you have something that's 1011 00:46:22,870 --> 00:46:23,655 interactive. 1012 00:46:23,655 --> 00:46:26,030 It's like you want to play a game or something like that. 1013 00:46:26,030 --> 00:46:28,790 It's a little bit awkward to send my mouse 1014 00:46:28,790 --> 00:46:30,650 click to some proxy. 1015 00:46:30,650 --> 00:46:34,858 That proxy is then somehow going to [INAUDIBLE]. 1016 00:46:34,858 --> 00:46:38,770 AUDIENCE: Let me clarify the first station virtual machine 1017 00:46:38,770 --> 00:46:41,215 actually runs [INAUDIBLE] Firefox. 1018 00:46:41,215 --> 00:46:44,160 In the proxy it's known as a Tor. 1019 00:46:44,160 --> 00:46:46,512 PROFESSOR: Ah, it's just a Tor proxy. 1020 00:46:46,512 --> 00:46:48,470 So if it's a Tor proxy, sure, that's one thing. 1021 00:46:48,470 --> 00:46:50,303 Then the only overhead there you have to pay 1022 00:46:50,303 --> 00:46:53,062 is the regular Tor overhead of going 1023 00:46:53,062 --> 00:46:54,840 through all the onion route. 1024 00:46:54,840 --> 00:46:57,860 Yeah, so I was talking there are systems-- 1025 00:46:57,860 --> 00:46:59,880 let's ignore the IP anonymity for a second 1026 00:46:59,880 --> 00:47:01,820 because they basically try to say 1027 00:47:01,820 --> 00:47:04,550 you have your own very fingerprintable browser 1028 00:47:04,550 --> 00:47:05,571 on your own machine. 1029 00:47:05,571 --> 00:47:07,570 You don't want to expose that to the web server. 1030 00:47:07,570 --> 00:47:09,270 So essentially you go through a proxy, 1031 00:47:09,270 --> 00:47:10,686 which you think of it all the time 1032 00:47:10,686 --> 00:47:14,370 like a headless Firefox let's say of some canonical version. 1033 00:47:14,370 --> 00:47:16,760 The web server thinks it is interacting with this thing. 1034 00:47:16,760 --> 00:47:19,910 So if I go load this site, I am perceived by the web server 1035 00:47:19,910 --> 00:47:21,490 as Firefox 10.7 or whatever. 1036 00:47:21,490 --> 00:47:23,910 If you go there, you are also perceived as Firefox 10.7. 1037 00:47:23,910 --> 00:47:26,954 Then behind the scenes its' spitting out HTML and stuff 1038 00:47:26,954 --> 00:47:29,000 like that it collected from the proxy. 1039 00:47:29,000 --> 00:47:32,680 So those two things are orthogonal. 1040 00:47:32,680 --> 00:47:35,620 AUDIENCE: But it seems like you don't need a proxy for this. 1041 00:47:35,620 --> 00:47:36,600 You could have browser support for this, right? 1042 00:47:36,600 --> 00:47:38,460 Meaning the Tor browser does this 1043 00:47:38,460 --> 00:47:42,150 already by trying to appear as the most generic version 1044 00:47:42,150 --> 00:47:42,870 of Firefox. 
1045 00:47:42,870 --> 00:47:44,560 PROFESSOR: Yeah, so this is true. 1046 00:47:44,560 --> 00:47:46,810 Although, I think a problem with a lot of those things 1047 00:47:46,810 --> 00:47:49,311 is that even if you try to lock yourself into one version, 1048 00:47:49,311 --> 00:47:51,560 there's still a lot of things that can fingerprint it. 1049 00:47:51,560 --> 00:47:53,768 So I think with the Tor distribution, what they often 1050 00:47:53,768 --> 00:47:56,950 do is they say, we control what's in the Tor distribution. 1051 00:47:56,950 --> 00:47:59,670 So if we all download the Tor distribution, then forshizzle, 1052 00:47:59,670 --> 00:48:04,000 we're both going to get Firefox with the same Java version-- 1053 00:48:04,000 --> 00:48:05,624 the same so on and so forth. 1054 00:48:05,624 --> 00:48:07,490 AUDIENCE: Well, it's more than that though. 1055 00:48:07,490 --> 00:48:09,823 They return screen sizes that are the most common screen 1056 00:48:09,823 --> 00:48:11,909 sizes whenever you query the screen. 1057 00:48:11,909 --> 00:48:13,034 PROFESSOR: That's all true. 1058 00:48:13,034 --> 00:48:14,170 Yeah, so one thing that's interesting to look 1059 00:48:14,170 --> 00:48:16,639 at though-- the Tor team-- the people who 1060 00:48:16,639 --> 00:48:19,180 do the bundle-- they'll often put out reports about what data 1061 00:48:19,180 --> 00:48:20,120 still gets leaked. 1062 00:48:20,120 --> 00:48:21,946 So stuff does still get leaked out. 1063 00:48:21,946 --> 00:48:23,297 But you're right. 1064 00:48:23,297 --> 00:48:25,880 The high-level goal there is very reasonable. 1065 00:48:25,880 --> 00:48:27,590 It's saying that if we all agreed 1066 00:48:27,590 --> 00:48:29,629 to download the same distribution 1067 00:48:29,629 --> 00:48:32,170 and to then not trick it out by adding plugins or stuff like that, 1068 00:48:32,170 --> 00:48:33,253 then you're exactly right. 1069 00:48:33,253 --> 00:48:35,197 That'd work great. 1070 00:48:35,197 --> 00:48:36,030 Any other questions? 1071 00:48:40,410 --> 00:48:44,030 Yeah, so that is it for demo time. 1072 00:48:51,845 --> 00:48:52,886 And there's more physics. 1073 00:48:56,330 --> 00:48:59,560 This must have been a riveting previous class. 1074 00:48:59,560 --> 00:49:01,295 So we will ignore that for the moment. 1075 00:49:01,295 --> 00:49:01,920 Let's see here. 1076 00:49:07,240 --> 00:49:08,340 So where were we? 1077 00:49:12,116 --> 00:49:14,525 So what is the high-level goal of privacy? 1078 00:49:14,525 --> 00:49:15,900 And you can think of it as what's 1079 00:49:15,900 --> 00:49:18,420 your anonymity set if you're a user? 1080 00:49:18,420 --> 00:49:20,490 So in other words, what's 1081 00:49:20,490 --> 00:49:22,630 the size of the set of people-- the number of people 1082 00:49:22,630 --> 00:49:25,270 that you could be confused for-- 1083 00:49:25,270 --> 00:49:26,857 could be mistaken for-- by an attacker? 1084 00:49:26,857 --> 00:49:28,940 And so what the browser fingerprinting stuff shows 1085 00:49:28,940 --> 00:49:32,620 is that oftentimes a web attacker can narrow you 1086 00:49:32,620 --> 00:49:35,360 down to a very, very tight demographic 1087 00:49:35,360 --> 00:49:38,510 without controlling anything on your local machine whatsoever. 1088 00:49:38,510 --> 00:49:41,370 So that's actually a little bit frightening to know.
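To make that fingerprinting surface concrete, here is a rough TypeScript sketch of the kind of attributes ordinary client-side script can read even when the IP address is hidden. This is illustrative only; real fingerprinting scripts gather far more than this.

```typescript
// Collect a handful of standard browser attributes visible to any page.
function collectFingerprint() {
  return {
    userAgent: navigator.userAgent,
    language: navigator.language,
    screen: `${screen.width}x${screen.height}x${screen.colorDepth}`,
    timezoneOffset: new Date().getTimezoneOffset(),
    pluginCount: navigator.plugins.length,
  };
}

// Concatenating or hashing these values gives the server a fairly stable label
// for "this browser," in public and private mode alike.
console.log(JSON.stringify(collectFingerprint()));
```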
1089 00:49:44,020 --> 00:49:47,480 So you might want to think about how 1090 00:49:47,480 --> 00:49:50,480 can a web attacker determine if you're using private browsing 1091 00:49:50,480 --> 00:49:51,692 mode? 1092 00:49:51,692 --> 00:49:53,400 Maybe that's [INAUDIBLE] for some reason. 1093 00:49:53,400 --> 00:49:56,260 So in the paper they actually describe an attack 1094 00:49:56,260 --> 00:49:58,400 that uses link colors. 1095 00:49:58,400 --> 00:50:00,260 So remember, in private browsing mode, 1096 00:50:00,260 --> 00:50:01,730 the browser isn't supposed to keep 1097 00:50:01,730 --> 00:50:04,770 track of the history of the sites that you visit. 1098 00:50:04,770 --> 00:50:07,630 And so in the paper, the authors describe an attack 1099 00:50:07,630 --> 00:50:10,630 in which essentially the attacker-controlled page 1100 00:50:10,630 --> 00:50:14,510 creates an iframe to some URL that the attacker controls 1101 00:50:14,510 --> 00:50:16,780 and loads that inside the attacker page. 1102 00:50:16,780 --> 00:50:19,400 And then it basically looks at the link colors. 1103 00:50:19,400 --> 00:50:21,065 It creates a link to that page-- that 1104 00:50:21,065 --> 00:50:22,880 iframe it just created-- and then sees 1105 00:50:22,880 --> 00:50:26,810 whether the link color for that link is the visited color. 1106 00:50:26,810 --> 00:50:29,460 So it would see it as purple versus blue. 1107 00:50:29,460 --> 00:50:33,600 And the idea is that if you do this test in private browsing mode, 1108 00:50:33,600 --> 00:50:35,510 then presumably the link color should 1109 00:50:35,510 --> 00:50:38,084 stay the unvisited color, because the browser 1110 00:50:38,084 --> 00:50:40,542 is supposed to be forgetting about all this kind of stuff. 1111 00:50:40,542 --> 00:50:43,097 So that's the attack they describe in the paper. 1112 00:50:43,097 --> 00:50:45,055 What's interesting is that this attack actually 1113 00:50:45,055 --> 00:50:46,330 doesn't work anymore. 1114 00:50:46,330 --> 00:50:49,280 So we actually discussed this a couple of lectures back. 1115 00:50:49,280 --> 00:50:51,430 So this attack that the paper describes 1116 00:50:51,430 --> 00:50:53,550 is the browser history sniffing attack. 1117 00:50:53,550 --> 00:50:55,640 So as we discussed a couple of lectures ago, 1118 00:50:55,640 --> 00:50:59,770 browsers now do not expose the correct link colors 1119 00:50:59,770 --> 00:51:02,380 to JavaScript. 1120 00:51:02,380 --> 00:51:06,290 And it's precisely to prevent these types of attacks. 1121 00:51:06,290 --> 00:51:08,775 So that particular part of the paper is outdated. 1122 00:51:08,775 --> 00:51:11,790 AUDIENCE: Does that relate to the fact that browsers now also show 1123 00:51:11,790 --> 00:51:14,774 links as purple in private browsing mode 1124 00:51:14,774 --> 00:51:16,444 and turn blue again when you exit? 1125 00:51:16,444 --> 00:51:18,110 PROFESSOR: Yeah, it's a bit weird, yeah. 1126 00:51:18,110 --> 00:51:20,442 They implemented that attack-- the defense-- 1127 00:51:20,442 --> 00:51:22,275 I think before a lot of the private browsing modes 1128 00:51:22,275 --> 00:51:23,220 were popular. 1129 00:51:23,220 --> 00:51:25,190 So now they do this additional thing too. 1130 00:51:25,190 --> 00:51:27,854 But long story short, the attack they describe in the paper 1131 00:51:27,854 --> 00:51:30,145 doesn't work because of some of these browser history sniffing 1132 00:51:30,145 --> 00:51:30,700 defenses.
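For reference, here is a TypeScript sketch of roughly what that now-defunct link-color check looked like. Modern browsers lie to getComputedStyle about :visited styling, so this no longer works; it is shown only to make the old attack concrete, and the probe URL is made up.

```typescript
const probeUrl = "https://example.com/probe"; // hypothetical attacker-chosen URL

// 1. Load the URL in an iframe so a non-private browser would record it as visited.
const frame = document.createElement("iframe");
frame.src = probeUrl;
document.body.appendChild(frame);

// 2. Once it has loaded, create a link to the same URL and inspect its rendered color.
frame.addEventListener("load", () => {
  const link = document.createElement("a");
  link.href = probeUrl;
  link.textContent = "probe";
  document.body.appendChild(link);

  const color = getComputedStyle(link).color;
  // Historically: the "visited" color meant history was recorded (probably public
  // mode); the "unvisited" color meant the browser forgot (maybe private mode).
  console.log("link color:", color);
});
```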
1133 00:51:30,700 --> 00:51:32,610 But you can still imagine that there 1134 00:51:32,610 --> 00:51:36,054 may be ways for the web attacker to figure out if you are 1135 00:51:36,054 --> 00:51:37,220 using private browsing mode. 1136 00:51:37,220 --> 00:51:40,500 So for example, when you do private browsing mode, 1137 00:51:40,500 --> 00:51:42,640 any cookies that you got from public mode 1138 00:51:42,640 --> 00:51:45,340 should not be sent during private mode. 1139 00:51:45,340 --> 00:51:48,102 So in other words, if I go to Amazon.com in public mode, 1140 00:51:48,102 --> 00:51:50,050 I generate some cookies. 1141 00:51:50,050 --> 00:51:52,521 Then I go to Amazon.com in private browsing mode. 1142 00:51:52,521 --> 00:51:54,270 When I contact Amazon.com in private mode, 1143 00:51:54,270 --> 00:51:57,320 those public mode cookies should not be sent. 1144 00:51:57,320 --> 00:52:02,420 That can actually act as a sign to the web attacker 1145 00:52:02,420 --> 00:52:04,500 that you actually are using private mode. 1146 00:52:04,500 --> 00:52:06,940 AUDIENCE: This is also now you're using the canvass 1147 00:52:06,940 --> 00:52:08,612 in both of these events, right? 1148 00:52:08,612 --> 00:52:10,770 So you need to know the IP address. 1149 00:52:10,770 --> 00:52:12,722 PROFESSOR: Yeah, that's right. 1150 00:52:12,722 --> 00:52:14,610 AUDIENCE: So that link that you were 1151 00:52:14,610 --> 00:52:17,442 targeting with the link color would be on a per IP basis. 1152 00:52:17,442 --> 00:52:19,358 And you would have to rely on the user first 1153 00:52:19,358 --> 00:52:21,494 having visited it in public mode, and then you detect it. 1154 00:52:21,494 --> 00:52:23,160 PROFESSOR: Ah, so the link-- so the link 1155 00:52:23,160 --> 00:52:26,800 attack you can actually do in the context of a single page. 1156 00:52:26,800 --> 00:52:29,300 So imagine that I, the web attacker, 1157 00:52:29,300 --> 00:52:30,819 construct a single page. 1158 00:52:30,819 --> 00:52:33,110 I, the attacker, have JavaScript that creates an iframe 1159 00:52:33,110 --> 00:52:35,820 to foo.com like this. 1160 00:52:35,820 --> 00:52:38,570 So that iframe will load the contents of that page. 1161 00:52:38,570 --> 00:52:40,570 And then I, the attacker, in the parent frame 1162 00:52:40,570 --> 00:52:42,840 can then create a link element and then 1163 00:52:42,840 --> 00:52:44,190 try to look at the color. 1164 00:52:44,190 --> 00:52:46,330 This worked four years ago. 1165 00:52:46,330 --> 00:52:49,880 So in that case, it doesn't rely on the user having explicitly 1166 00:52:49,880 --> 00:52:53,890 visited that iframe page at all, because I, the attacker, 1167 00:52:53,890 --> 00:52:56,008 can create that in the context of the page. 1168 00:52:56,008 --> 00:52:59,330 I have gotten [INAUDIBLE]. 1169 00:52:59,330 --> 00:53:01,310 Any other questions? 1170 00:53:01,310 --> 00:53:04,167 So yeah, so you can maybe think about how cookies 1171 00:53:04,167 --> 00:53:06,000 can reveal public and private browsing modes 1172 00:53:06,000 --> 00:53:08,660 and things like that. 1173 00:53:08,660 --> 00:53:12,120 So one thing we might think about 1174 00:53:12,120 --> 00:53:21,210 is how we can provide stronger privacy 1175 00:53:21,210 --> 00:53:25,520 guarantees for private browsing.
1176 00:53:29,554 --> 00:53:35,050 And for the sake of this discussion, 1177 00:53:35,050 --> 00:53:41,630 let's just ignore IP addresses for now, 1178 00:53:41,630 --> 00:53:45,260 because as we'll discuss next lecture, 1179 00:53:45,260 --> 00:53:47,670 we can use Tor to maybe help with some 1180 00:53:47,670 --> 00:53:49,330 of the privacy of IP addresses. 1181 00:53:49,330 --> 00:53:51,971 So one thing you can imagine doing 1182 00:53:51,971 --> 00:53:55,712 is you can imagine using VMs in some way 1183 00:53:55,712 --> 00:54:06,200 to help provide stronger private browsing guarantees-- so VM 1184 00:54:06,200 --> 00:54:08,490 level privacy. 1185 00:54:08,490 --> 00:54:11,290 And so the basic idea is that you 1186 00:54:11,290 --> 00:54:21,825 want to run each private session inside of a separate VM. 1187 00:54:25,020 --> 00:54:29,070 And then when the user is done with that-- 1188 00:54:29,070 --> 00:54:31,830 so is finished with the private browsing session-- 1189 00:54:31,830 --> 00:54:38,580 you basically delete the VM after that session is done. 1190 00:54:43,820 --> 00:54:47,870 So what's the advantage of this? 1191 00:54:47,870 --> 00:54:51,230 Well, what's nice about this is presumably 1192 00:54:51,230 --> 00:54:52,730 you can get some stronger guarantees 1193 00:54:52,730 --> 00:54:58,910 about what privacy properties you can provide to the user 1194 00:54:58,910 --> 00:55:01,640 because, presumably, the VM has a pretty 1195 00:55:01,640 --> 00:55:06,606 clean interface to the I/O path of the underlying host OS. 1196 00:55:06,606 --> 00:55:07,980 So you can imagine that maybe you 1197 00:55:07,980 --> 00:55:13,000 combine these VMs with, let's say, some type of secure swap 1198 00:55:13,000 --> 00:55:16,206 solution like OpenBSD has-- or some other encrypted disk 1199 00:55:16,206 --> 00:55:16,705 type thing. 1200 00:55:16,705 --> 00:55:21,840 So you can imagine, OK, we have a very clean separation of the VM 1201 00:55:21,840 --> 00:55:24,450 up here and all the I/Os that are generated down here. 1202 00:55:24,450 --> 00:55:27,420 And so that gives you stronger guarantees 1203 00:55:27,420 --> 00:55:30,891 than what you can get from the browser, which wasn't designed 1204 00:55:30,891 --> 00:55:33,390 from the ground up to think very carefully about all the I/O 1205 00:55:33,390 --> 00:55:35,764 paths and what secrets might leak into storage. 1206 00:55:38,620 --> 00:55:42,330 So yes, that's what's nice 1207 00:55:42,330 --> 00:55:45,050 about this-- strong guarantees. 1208 00:55:48,930 --> 00:55:52,560 And, also, what's nice is it doesn't require 1209 00:55:52,560 --> 00:55:57,060 any changes to your applications-- that 1210 00:55:57,060 --> 00:55:58,474 is to say, to the browser. 1211 00:55:58,474 --> 00:56:00,140 You take your browser, put it inside one 1212 00:56:00,140 --> 00:56:03,760 of these VMs-- then everything magically gets better. 1213 00:56:03,760 --> 00:56:06,045 There's no application change. 1214 00:56:06,045 --> 00:56:11,315 So what's bad about this-- I'll use an unhappy face 1215 00:56:11,315 --> 00:56:12,330 to demonstrate that. 1216 00:56:12,330 --> 00:56:17,110 So what's bad is, first of all, it's heavyweight. 1217 00:56:17,110 --> 00:56:20,050 And by heavyweight, I mean that every time you 1218 00:56:20,050 --> 00:56:22,860 want to spin up one of these private browsing sessions, 1219 00:56:22,860 --> 00:56:25,000 you have to spin up a whole VM. 1220 00:56:25,000 --> 00:56:27,260 And that can actually be pretty painful.
1221 00:56:27,260 --> 00:56:28,886 So perhaps users are going to get upset 1222 00:56:28,886 --> 00:56:30,760 because it's going to take them a long time now 1223 00:56:30,760 --> 00:56:32,660 to launch these private browsing sessions. 1224 00:56:32,660 --> 00:56:36,730 And the other problem, too, is this solution actually has 1225 00:56:36,730 --> 00:56:39,830 bad usability. 1226 00:56:39,830 --> 00:56:43,080 And the reason I say that is because now it's actually 1227 00:56:43,080 --> 00:56:47,230 difficult for users to do things like take files 1228 00:56:47,230 --> 00:56:49,176 that they've saved in private browsing mode 1229 00:56:49,176 --> 00:56:52,190 and then take them to the rest of their computer-- 1230 00:56:52,190 --> 00:56:54,731 any bookmarks that they generate during private browsing mode 1231 00:56:54,731 --> 00:56:57,110 that they actually do want to persist-- 1232 00:56:57,110 --> 00:56:59,485 it will be difficult to get those at the end. 1233 00:56:59,485 --> 00:57:00,110 It can be done. 1234 00:57:00,110 --> 00:57:02,120 But there's a lot of friction here. 1235 00:57:02,120 --> 00:57:03,400 So that's the bummer. 1236 00:57:05,920 --> 00:57:11,720 So another thing that you might imagine doing 1237 00:57:11,720 --> 00:57:16,740 is something that looks like approach number one. 1238 00:57:16,740 --> 00:57:23,813 But we actually implement it inside of the OS itself 1239 00:57:23,813 --> 00:57:26,180 instead of in a virtual machine. 1240 00:57:26,180 --> 00:57:28,500 So the basic idea here is that you 1241 00:57:28,500 --> 00:57:35,500 can imagine that each process could potentially 1242 00:57:35,500 --> 00:57:39,746 run in a privacy domain. 1243 00:57:44,620 --> 00:57:51,340 So basically, the privacy domain is the collection 1244 00:57:51,340 --> 00:57:54,400 of OS-global resources that the process uses. 1245 00:57:54,400 --> 00:57:56,680 And so the OS tracks all that kind of stuff. 1246 00:57:56,680 --> 00:58:00,190 And then once the process dies, essentially the OS 1247 00:58:00,190 --> 00:58:01,950 goes through and looks at all the things 1248 00:58:01,950 --> 00:58:04,230 that are in that privacy domain set. 1249 00:58:04,230 --> 00:58:09,050 And then it deallocates all those resources. 1250 00:58:09,050 --> 00:58:12,993 And so the advantage of this over the VM 1251 00:58:12,993 --> 00:58:20,330 is that it is lighter weight, because if you think about it, 1252 00:58:20,330 --> 00:58:23,450 the VM is essentially agnostic to all the OS state and all 1253 00:58:23,450 --> 00:58:26,580 the application state that it is actually being used to run. 1254 00:58:26,580 --> 00:58:29,266 So as a result, it probably does more work than the OS 1255 00:58:29,266 --> 00:58:31,880 would have to do, because the OS presumably 1256 00:58:31,880 --> 00:58:35,082 knows all the points at which the private browser would 1257 00:58:35,082 --> 00:58:38,360 be touching I/O, talking to the network, and stuff like that. 1258 00:58:38,360 --> 00:58:40,585 So maybe it even knows how to do things like 1259 00:58:40,585 --> 00:58:43,980 clear the DNS cache selectively, for example. 1260 00:58:43,980 --> 00:58:46,560 So you can imagine that it's much easier 1261 00:58:46,560 --> 00:58:49,095 to spin these things up-- these privacy domains-- 1262 00:58:49,095 --> 00:58:51,090 and then to tear them down.
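As a toy sketch of the bookkeeping such an OS-level privacy domain might do-- nothing here corresponds to a real OS API; the resource kinds and the purge step are purely illustrative-- consider the following TypeScript:

```typescript
// A privacy domain tracks the OS-global resources a private process touches,
// then tears them all down when that process exits.
type Resource =
  | { kind: "file"; path: string }
  | { kind: "dnsEntry"; hostname: string }
  | { kind: "socket"; id: number };

class PrivacyDomain {
  private resources: Resource[] = [];

  track(r: Resource): void {
    this.resources.push(r); // record every global resource the process uses
  }

  // Called when the private process dies: clean up everything it touched.
  teardown(): void {
    for (const r of this.resources) {
      // e.g. delete the file, evict the DNS cache entry, close the socket
      console.log("purging", r);
    }
    this.resources = [];
  }
}

const domain = new PrivacyDomain();
domain.track({ kind: "dnsEntry", hostname: "example.com" });
domain.track({ kind: "file", path: "/tmp/private-cache-entry" });
domain.teardown(); // selective cleanup, without destroying a whole VM
```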
1263 00:58:51,090 --> 00:58:53,930 However, the sad thing, at least with respect 1264 00:58:53,930 --> 00:58:58,580 to the virtual machine solution, is that it's harder 1265 00:58:58,580 --> 00:58:59,330 to get this right. 1266 00:59:03,010 --> 00:59:07,292 So I just described the VM approach 1267 00:59:07,292 --> 00:59:09,000 as being heavyweight because it's essentially 1268 00:59:09,000 --> 00:59:12,650 agnostic to everything that's running inside the container. 1269 00:59:12,650 --> 00:59:14,832 But what's nice about that is that it allows 1270 00:59:14,832 --> 00:59:18,585 the VM approach to only focus on a few low-level interfaces. 1271 00:59:18,585 --> 00:59:20,620 And if it can secure those things-- 1272 00:59:20,620 --> 00:59:23,600 for example, the interface the VM uses to write to disk-- 1273 00:59:23,600 --> 00:59:27,230 then you can have high confidence that it's actually 1274 00:59:27,230 --> 00:59:29,070 managed to contain everything. 1275 00:59:29,070 --> 00:59:30,705 Whereas with the OS-- if you think 1276 00:59:30,705 --> 00:59:33,291 the OS is going to interpose on individual file system 1277 00:59:33,291 --> 00:59:35,790 interfaces-- perhaps individual network interfaces and stuff 1278 00:59:35,790 --> 00:59:37,714 like that-- it's much more complicated to find 1279 00:59:37,714 --> 00:59:42,667 all of those points at which data can leak if you're going 1280 00:59:42,667 --> 00:59:44,450 to do that at the OS level. 1281 00:59:44,450 --> 00:59:45,782 So does that all make sense? 1282 00:59:57,972 --> 00:59:59,263 Why is this physics everywhere? 1283 01:00:02,124 --> 01:00:03,207 Ah, god, I'm being tested. 1284 01:00:09,468 --> 01:00:10,926 Those are basically some approaches 1285 01:00:10,926 --> 01:00:13,952 we can use to provide potentially stronger privacy 1286 01:00:13,952 --> 01:00:16,202 guarantees than what's implemented in private browsers 1287 01:00:16,202 --> 01:00:18,240 right now. 1288 01:00:18,240 --> 01:00:26,610 So one question you might have is, can we still 1289 01:00:26,610 --> 01:00:33,250 de-anonymize a user if the browser-- sorry, 1290 01:00:33,250 --> 01:00:38,950 if the user is employing one of these more powerful solutions-- 1291 01:00:38,950 --> 01:00:43,039 if the user is surfing through a VM 1292 01:00:43,039 --> 01:00:45,330 or using one of these privacy domains in the OS-- can 1293 01:00:45,330 --> 01:00:46,960 we still figure out who they are? 1294 01:00:46,960 --> 01:00:48,420 And the answer is, yes. 1295 01:00:48,420 --> 01:00:53,020 So maybe the VM is unique for some reason. 1296 01:00:56,800 --> 01:01:02,844 And so similar to how we were able to fingerprint browsers 1297 01:01:02,844 --> 01:01:04,760 using that Panopticlick website, maybe there's 1298 01:01:04,760 --> 01:01:07,460 something unique about the way that the VM would be set up 1299 01:01:07,460 --> 01:01:09,530 that allows you to fingerprint it. 1300 01:01:09,530 --> 01:01:14,800 And it may in fact be the case that maybe the virtual machine 1301 01:01:14,800 --> 01:01:20,640 monitor or the OS itself is unique in some way 1302 01:01:20,640 --> 01:01:23,650 that would allow a web attacker to figure out who the user was. 1303 01:01:23,650 --> 01:01:28,440 And so one cute example of this is TCP fingerprinting. 1304 01:01:32,742 --> 01:01:34,200 So what's the big idea behind this?
1305 01:01:34,200 --> 01:01:35,980 So as it turns out, the specification 1306 01:01:35,980 --> 01:01:38,290 for the TCP protocol actually allows 1307 01:01:38,290 --> 01:01:40,420 some of the parameters for the protocol 1308 01:01:40,420 --> 01:01:44,080 to be set by the implementation of the protocol. 1309 01:01:44,080 --> 01:01:47,725 So, for example, TCP allows implementers to choose things 1310 01:01:47,725 --> 01:01:49,556 like the initial packet size-- the packets that 1311 01:01:49,556 --> 01:01:52,140 are sent out in the first part of the TCP connection-- 1312 01:01:52,140 --> 01:01:55,000 and it allows implementers to choose things like the initial time 1313 01:01:55,000 --> 01:01:57,870 to live in those packets. 1314 01:01:57,870 --> 01:01:59,745 And so you can imagine-- and in fact, you 1315 01:01:59,745 --> 01:02:01,995 don't have to imagine, this is actually the truth-- 1316 01:02:01,995 --> 01:02:04,817 you can get off-the-shelf tools like nmap, for example, 1317 01:02:04,817 --> 01:02:07,150 that can actually tell what operating system you're 1318 01:02:07,150 --> 01:02:10,340 running with high probability just by sending you packets. 1319 01:02:10,340 --> 01:02:13,040 They'll send these very carefully crafted packets. 1320 01:02:13,040 --> 01:02:15,042 And they will look and see things like here's 1321 01:02:15,042 --> 01:02:17,000 what the TTL was, or here's what the packet size 1322 01:02:17,000 --> 01:02:20,090 distribution was, or here's what the TCP sequence number was. 1323 01:02:20,090 --> 01:02:22,394 And they basically have a database of fingerprints. 1324 01:02:22,394 --> 01:02:24,644 And they say, OK, if the return packet has this, this, 1325 01:02:24,644 --> 01:02:27,280 and this characteristic, then the table 1326 01:02:27,280 --> 01:02:29,420 says that you're probably running, for some reason, 1327 01:02:29,420 --> 01:02:30,650 Solaris. 1328 01:02:30,650 --> 01:02:31,800 Or you're running a Mac. 1329 01:02:31,800 --> 01:02:34,120 You're running Windows or whatever. 1330 01:02:34,120 --> 01:02:36,770 So even if we use one of these stronger approaches 1331 01:02:36,770 --> 01:02:39,070 for private browsing with a VM or an OS, 1332 01:02:39,070 --> 01:02:41,570 an attacker may still be able to run one of those TCP fingerprinting 1333 01:02:41,570 --> 01:02:45,360 attacks and learn a lot about a particular user. 1334 01:02:45,360 --> 01:02:50,302 And one thing that's also interesting to note 1335 01:02:50,302 --> 01:02:56,042 is that even if we use one of these more powerful techniques 1336 01:02:56,042 --> 01:02:59,070 to try to protect the user, the user is still 1337 01:02:59,070 --> 01:03:04,500 shared across both the public and the private browsing 1338 01:03:04,500 --> 01:03:05,550 sessions. 1339 01:03:05,550 --> 01:03:07,460 It's still the same person physically using the machine. 1340 01:03:07,460 --> 01:03:09,480 So why is it interesting? 1341 01:03:09,480 --> 01:03:13,020 Well, it's interesting because you yourself, 1342 01:03:13,020 --> 01:03:17,180 by the way that you use computers, may leak information 1343 01:03:17,180 --> 01:03:17,980 about yourself. 1344 01:03:17,980 --> 01:03:22,780 So, for example, as it turns out, 1345 01:03:22,780 --> 01:03:26,140 users have unique keystroke timing.
1346 01:03:29,050 --> 01:03:32,600 So if I look at-- if I give everyone in this class the same 1347 01:03:32,600 --> 01:03:35,240 thing to type in-- the quick brown fox-- 1348 01:03:35,240 --> 01:03:37,380 whatever that nonsense is-- and I actually look 1349 01:03:37,380 --> 01:03:42,320 at the inter-key press timing, we'll all have these unique 1350 01:03:42,320 --> 01:03:44,790 distributions that can potentially be used 1351 01:03:44,790 --> 01:03:46,890 to fingerprint us. 1352 01:03:46,890 --> 01:03:50,960 Another thing that's interesting is that users 1353 01:03:50,960 --> 01:03:52,510 have unique writing styles. 1354 01:03:55,850 --> 01:04:00,500 So there's this branch of security 1355 01:04:00,500 --> 01:04:02,525 that is called stylography. 1356 01:04:06,060 --> 01:04:12,270 And the basic idea here is to figure out, if I am an attacker, 1357 01:04:12,270 --> 01:04:14,410 can I figure out who you are just 1358 01:04:14,410 --> 01:04:16,460 by looking at writing samples from you? 1359 01:04:16,460 --> 01:04:18,730 So imagine that for whatever reason 1360 01:04:18,730 --> 01:04:21,190 you're hanging out on 4chan-- don't hang out on 4chan-- 1361 01:04:21,190 --> 01:04:23,697 and I want to figure out if you've actually, in fact, 1362 01:04:23,697 --> 01:04:24,780 been hanging out on 4chan. 1363 01:04:24,780 --> 01:04:27,240 So perhaps what I can do is I can 1364 01:04:27,240 --> 01:04:30,970 look at a bunch of different posts from 4chan. 1365 01:04:30,970 --> 01:04:34,692 Maybe I can cluster those posts into sets of posts 1366 01:04:34,692 --> 01:04:37,999 that I think look stylistically the same. 1367 01:04:37,999 --> 01:04:39,790 And then what I can do is I can find things 1368 01:04:39,790 --> 01:04:42,580 that you've written publicly where you're actually 1369 01:04:42,580 --> 01:04:43,652 attributed as the author. 1370 01:04:43,652 --> 01:04:45,610 I'll look at your homework assignments or papers 1371 01:04:45,610 --> 01:04:47,276 that you've written or things like that. 1372 01:04:47,276 --> 01:04:49,630 And I'll see, do you match any of these clusters 1373 01:04:49,630 --> 01:04:51,130 from these 4chan comments? 1374 01:04:51,130 --> 01:04:53,900 And if so, then maybe I can send you a stern note. 1375 01:04:53,900 --> 01:04:55,121 Talk to the parents-- tell them their kid has gone off 1376 01:04:55,121 --> 01:04:56,090 the beaten path. 1377 01:04:56,090 --> 01:04:57,700 Get off of 4chan. 1378 01:04:57,700 --> 01:05:00,460 So if you get a chance, I would look at this thing called 1379 01:05:00,460 --> 01:05:00,960 stylography. 1380 01:05:00,960 --> 01:05:03,100 It's actually quite interesting. 1381 01:05:03,100 --> 01:05:06,371 Does anyone have any questions about that? 1382 01:05:06,371 --> 01:05:06,870 Excellent. 1383 01:05:09,620 --> 01:05:15,040 So we discussed how we might be able to use 1384 01:05:15,040 --> 01:05:19,340 VMs or modified operating systems to provide private browsing 1385 01:05:19,340 --> 01:05:20,130 support. 1386 01:05:20,130 --> 01:05:23,010 And so you might wonder, OK, well, then why don't browsers 1387 01:05:23,010 --> 01:05:25,895 require users to do one of these things-- 1388 01:05:25,895 --> 01:05:28,270 to have one of these tricked out VMs or tricked out OSes? 1389 01:05:28,270 --> 01:05:30,050 Why do browsers instead take it upon themselves 1390 01:05:30,050 --> 01:05:31,560 to implement all this stuff? 1391 01:05:31,560 --> 01:05:34,200 And so the main reason is deployability.
1392 01:05:34,200 --> 01:05:36,210 So in fact, browser vendors typically 1393 01:05:36,210 --> 01:05:39,290 do not want to ask their users to do anything special 1394 01:05:39,290 --> 01:05:42,550 to use the browser besides install the browser binary 1395 01:05:42,550 --> 01:05:43,050 itself. 1396 01:05:43,050 --> 01:05:44,582 This is similar to the motivation 1397 01:05:44,582 --> 01:05:45,720 for Native Client. 1398 01:05:45,720 --> 01:05:47,840 So Google wants to add these cool new features 1399 01:05:47,840 --> 01:05:49,020 to end users' computers. 1400 01:05:49,020 --> 01:05:50,640 But it doesn't want to force users 1401 01:05:50,640 --> 01:05:53,795 to install some special version of Windows or Linux 1402 01:05:53,795 --> 01:05:54,620 or whatever. 1403 01:05:54,620 --> 01:05:58,610 So Google basically says, we'll take care of this ourselves. 1404 01:05:58,610 --> 01:06:01,100 Then another reason is actually usability. 1405 01:06:01,100 --> 01:06:04,920 So a lot of these VM and OS-level 1406 01:06:04,920 --> 01:06:07,260 solutions for private browsing-- as we've discussed, 1407 01:06:07,260 --> 01:06:08,960 they make it more difficult for users 1408 01:06:08,960 --> 01:06:12,160 to persist state from private browsing sessions 1409 01:06:12,160 --> 01:06:15,830 that they do actually want to persist, like downloaded files, 1410 01:06:15,830 --> 01:06:19,480 like bookmarks they create, and things like that. 1411 01:06:19,480 --> 01:06:21,539 So basically the browser vendors say, well, 1412 01:06:21,539 --> 01:06:23,580 if we implement private browsing modes ourselves, 1413 01:06:23,580 --> 01:06:25,730 we can actually allow users to do those things. 1414 01:06:25,730 --> 01:06:27,780 We can allow users to take downloaded files 1415 01:06:27,780 --> 01:06:29,470 from private browsing mode and take them 1416 01:06:29,470 --> 01:06:30,594 to the rest of the machine. 1417 01:06:30,594 --> 01:06:32,710 So that seems nice at first. 1418 01:06:32,710 --> 01:06:35,635 But note, of course, that allowing users 1419 01:06:35,635 --> 01:06:37,740 to export some type of private state 1420 01:06:37,740 --> 01:06:40,490 actually opens up a lot of security vulnerabilities. 1421 01:06:40,490 --> 01:06:43,420 It makes it very difficult to analyze the security properties 1422 01:06:43,420 --> 01:06:49,770 that the resulting private browsing modes actually provide. 1423 01:06:49,770 --> 01:06:53,665 And so in the paper, they try to characterize 1424 01:06:53,665 --> 01:06:57,900 the different types of browser state that can be modified 1425 01:06:57,900 --> 01:07:01,400 and how current private browsing modes actually handle 1426 01:07:01,400 --> 01:07:03,080 the modifications to that state. 1427 01:07:03,080 --> 01:07:06,292 So the paper describes this taxonomy 1428 01:07:06,292 --> 01:07:12,122 of browser state changes. 1429 01:07:12,122 --> 01:07:14,540 And so there are four things in the taxonomy. 1430 01:07:14,540 --> 01:07:22,110 So one type of state change is initiated 1431 01:07:22,110 --> 01:07:25,109 by the website itself. 1432 01:07:25,109 --> 01:07:26,442 And there's no user interaction. 1433 01:07:29,750 --> 01:07:33,880 And so for examples of this type of state change, 1434 01:07:33,880 --> 01:07:37,580 think about stuff like when a cookie gets 1435 01:07:37,580 --> 01:07:43,482 set, when something gets added to the address 1436 01:07:43,482 --> 01:07:49,250 history of the browser, or maybe an entry in the browser 1437 01:07:49,250 --> 01:07:52,270 cache or something.
1438 01:07:52,270 --> 01:07:56,210 And so for this type of state, basically, 1439 01:07:56,210 --> 01:07:57,935 private browsing mode says this state 1440 01:07:57,935 --> 01:08:01,240 can exist within a private browsing mode session, 1441 01:08:01,240 --> 01:08:03,300 but it basically is going to be destroyed 1442 01:08:03,300 --> 01:08:05,982 when that private browsing session concludes. 1443 01:08:05,982 --> 01:08:10,419 And so the intuition behind this is that because there 1444 01:08:10,419 --> 01:08:14,158 is no user interaction in creating this state, 1445 01:08:14,158 --> 01:08:16,241 then perhaps the right thing for the browser to do 1446 01:08:16,241 --> 01:08:21,050 is assume that the user wouldn't want that to persist. 1447 01:08:21,050 --> 01:08:25,094 So another type of browser state change 1448 01:08:25,094 --> 01:08:32,569 is initiated by the website that the user is visiting. 1449 01:08:32,569 --> 01:08:37,314 But there is some type of user interaction involved 1450 01:08:37,314 --> 01:08:40,189 in the state change. 1451 01:08:40,189 --> 01:08:45,234 So an example of this might be the user installs a client 1452 01:08:45,234 --> 01:08:53,359 certificate, or maybe there's a saved password. 1453 01:08:53,359 --> 01:08:57,920 So the user tries to log in to something. 1454 01:08:57,920 --> 01:09:00,130 And the browser very helpfully says, would you 1455 01:09:00,130 --> 01:09:01,608 like to save this password? 1456 01:09:01,608 --> 01:09:03,649 And then if the user says yes, then these types 1457 01:09:03,649 --> 01:09:05,616 of things, say passwords, can actually 1458 01:09:05,616 --> 01:09:08,970 be used outside of the private browsing mode. 1459 01:09:08,970 --> 01:09:12,460 And so it's a little bit unclear in principle 1460 01:09:12,460 --> 01:09:14,927 what the policy for this should be. 1461 01:09:14,927 --> 01:09:16,950 So what ends up happening in practice 1462 01:09:16,950 --> 01:09:20,260 is that browsers typically allow state 1463 01:09:20,260 --> 01:09:23,095 in this category that is set in private browsing mode 1464 01:09:23,095 --> 01:09:26,200 to persist outside of that private browsing mode, 1465 01:09:26,200 --> 01:09:29,995 under the intuition that the user did have to say yes or no. 1466 01:09:29,995 --> 01:09:31,744 If the user said yes, then maybe the user 1467 01:09:31,744 --> 01:09:35,689 is smart enough to understand that if they 1468 01:09:35,689 --> 01:09:38,066 save some password for some unsavory site 1469 01:09:38,066 --> 01:09:39,649 and someone comes along later and figures 1470 01:09:39,649 --> 01:09:42,950 that out, that's the user's fault-- not the browser's fault. 1471 01:09:42,950 --> 01:09:45,330 So it's a little unclear what the best policy is here. 1472 01:09:45,330 --> 01:09:46,795 But in practice, this type of state change 1473 01:09:46,795 --> 01:09:49,086 is allowed to persist outside of private browsing mode. 1474 01:09:52,360 --> 01:09:54,860 So there's another type of state change, which is 1475 01:09:54,860 --> 01:09:59,790 purely initiated by the user. 1476 01:09:59,790 --> 01:10:05,590 And so here you can think about things like setting a bookmark 1477 01:10:05,590 --> 01:10:08,420 or maybe downloading files. 1478 01:10:11,800 --> 01:10:13,880 And so the story for this state is 1479 01:10:13,880 --> 01:10:15,700 similar to the story for the state up here. 1480 01:10:15,700 --> 01:10:18,440 So basically, because the user was explicitly 1481 01:10:18,440 --> 01:10:20,492 involved in the creation of that state.
1482 01:10:20,492 --> 01:10:22,408 Private browsing modes typically say, OK, it's 1483 01:10:22,408 --> 01:10:25,174 OK to persist these types of changes 1484 01:10:25,174 --> 01:10:29,040 to the outside world outside of private browsing mode. 1485 01:10:29,040 --> 01:10:31,450 Then there's some sets of state which 1486 01:10:31,450 --> 01:10:40,720 are actually unrelated to any particular session at all. 1487 01:10:40,720 --> 01:10:46,100 So this is stuff, for example, like an update to the browser 1488 01:10:46,100 --> 01:10:53,890 itself-- the actual binary that constitutes the browser. 1489 01:10:53,890 --> 01:10:56,327 And so the way the browser vendors think about this state 1490 01:10:56,327 --> 01:10:57,826 is this state is essentially assumed 1491 01:10:57,826 --> 01:11:01,651 to be part of the single, global state that's 1492 01:11:01,651 --> 01:11:04,540 available to both public and private browsing modes. 1493 01:11:04,540 --> 01:11:06,585 And so eventually, if you look at it, 1494 01:11:06,585 --> 01:11:09,210 there's actually quite a lot of states that will actually 1495 01:11:09,210 --> 01:11:11,930 potentially leak outside of private browsing mode, 1496 01:11:11,930 --> 01:11:14,539 particularly if there's user volition involved. 1497 01:11:14,539 --> 01:11:16,080 So it's interesting to think about is 1498 01:11:16,080 --> 01:11:22,150 this the right trade-off between security and privacy? 1499 01:11:22,150 --> 01:11:25,899 So what's interesting is that-- so the paper actually 1500 01:11:25,899 --> 01:11:32,045 says that it's difficult to sort of prevent a local attacker 1501 01:11:32,045 --> 01:11:34,136 from detecting whether or not you've been 1502 01:11:34,136 --> 01:11:35,700 using private browsing mode. 1503 01:11:35,700 --> 01:11:37,200 And the paper was a little bit vague 1504 01:11:37,200 --> 01:11:38,810 about why this might be the case. 1505 01:11:38,810 --> 01:11:40,700 So one reason why this might be the case 1506 01:11:40,700 --> 01:11:43,457 is because some of this state that actually leaks 1507 01:11:43,457 --> 01:11:46,140 from private browsing mode to public browsing mode, 1508 01:11:46,140 --> 01:11:47,967 essentially it can actually contain 1509 01:11:47,967 --> 01:11:50,960 hints the state was generated in private browsing mode. 1510 01:11:50,960 --> 01:11:53,940 So for example, on Firefox and Chrome, 1511 01:11:53,940 --> 01:11:58,524 when you generate a bookmark in private browsing mode, 1512 01:11:58,524 --> 01:12:00,440 that bookmark has a bunch of metadata with it. 1513 01:12:00,440 --> 01:12:02,860 So for example, the time that it was visited 1514 01:12:02,860 --> 01:12:03,780 and things like that. 1515 01:12:03,780 --> 01:12:06,350 So in many cases, that metadata will 1516 01:12:06,350 --> 01:12:08,682 be set to zero or some null value 1517 01:12:08,682 --> 01:12:11,140 if that bookmark was generated inside of a private browsing 1518 01:12:11,140 --> 01:12:12,133 mode. 1519 01:12:12,133 --> 01:12:14,216 So then later on if someone controls your machine, 1520 01:12:14,216 --> 01:12:16,650 and they look at your bookmark information-- 1521 01:12:16,650 --> 01:12:19,580 if they see this metadata set to this zero and null value, 1522 01:12:19,580 --> 01:12:22,590 they can say, aha, that bookmark was probably generated 1523 01:12:22,590 --> 01:12:25,140 in private browsing mode. 1524 01:12:25,140 --> 01:12:28,775 So one thing to think about is typically 1525 01:12:28,775 --> 01:12:30,290 we talk about browser security. 
1524 01:12:25,140 --> 01:12:28,775 So one thing to think about is, typically 1525 01:12:28,775 --> 01:12:30,290 when we talk about browser security, 1526 01:12:30,290 --> 01:12:32,250 we talk about, OK, what can people do 1527 01:12:32,250 --> 01:12:34,356 with JavaScript or HTML or CSS. 1528 01:12:34,356 --> 01:12:35,980 One thing you might want to think about 1529 01:12:35,980 --> 01:12:38,400 is, well, what can people do with plug-ins or extensions? 1530 01:12:38,400 --> 01:12:40,350 So in the context of private browsing, 1531 01:12:40,350 --> 01:12:41,910 plug-ins and extensions are quite 1532 01:12:41,910 --> 01:12:46,260 interesting because they're not constrained in most cases 1533 01:12:46,260 --> 01:12:48,020 by the same-origin policy, 1534 01:12:48,020 --> 01:12:49,840 which does constrain stuff like JavaScript. 1535 01:12:49,840 --> 01:12:52,340 And what's interesting is that these extensions and plug-ins 1536 01:12:52,340 --> 01:12:54,875 typically run with very high authority. 1537 01:12:54,875 --> 01:12:57,500 Loosely speaking, you can think of them as like kernel modules. 1538 01:12:57,500 --> 01:12:59,020 They implement new features directly 1539 01:12:59,020 --> 01:13:01,470 inside the browsers themselves. 1540 01:13:01,470 --> 01:13:03,280 And so that's a little bit problematic 1541 01:13:03,280 --> 01:13:05,450 because these plug-ins and extensions are often 1542 01:13:05,450 --> 01:13:09,030 developed by someone who is not the actual browser vendor. 1543 01:13:09,030 --> 01:13:10,580 So what that means is that someone 1544 01:13:10,580 --> 01:13:12,496 is trying to do something nice and provide you 1545 01:13:12,496 --> 01:13:15,580 with this nice value-add in a browser plug-in or extension. 1546 01:13:15,580 --> 01:13:17,380 But that implementor might not fully 1547 01:13:17,380 --> 01:13:19,775 understand the context, the security context, 1548 01:13:19,775 --> 01:13:22,140 in which that extension runs. 1549 01:13:22,140 --> 01:13:25,760 So that extension may not implement private browsing mode 1550 01:13:25,760 --> 01:13:26,450 semantics. 1551 01:13:26,450 --> 01:13:29,710 Or it may try to implement it but do it in a bad way. 1552 01:13:29,710 --> 01:13:33,052 And so as I'll describe in a couple of minutes, that's 1553 01:13:33,052 --> 01:13:35,110 actually bad from a security perspective 1554 01:13:35,110 --> 01:13:37,401 because that means if we add some of these new plug-ins 1555 01:13:37,401 --> 01:13:39,100 or extensions, you now can't strongly 1556 01:13:39,100 --> 01:13:42,990 reason about what the resulting [INAUDIBLE] are.
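As one modern illustration of what implementing private browsing semantics means for an extension author-- this uses the Chrome extension API as an example and is not something from the paper or the lecture-- the extension has to check, before every persistent write, whether the tab it is observing is private:

```typescript
// Sketch of a history-logging extension that tries to respect private mode.
// Assumes the Chrome extension APIs ("tabs" and "storage" permissions);
// error handling is omitted for brevity.
chrome.tabs.onUpdated.addListener((tabId, changeInfo, tab) => {
  if (changeInfo.status !== "complete" || !tab.url) {
    return; // only record completed navigations with a known URL
  }
  // The crucial check: tabs in incognito/private windows are marked,
  // and the extension must skip any durable bookkeeping for them.
  if (tab.incognito) {
    return;
  }
  // Only non-private navigations get written to persistent storage.
  chrome.storage.local.set({ [`visit-${Date.now()}`]: tab.url });
});
```

Forgetting that one check is essentially the class of bug discussed below for pdf.js and cookies.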
1557 01:13:42,990 --> 01:13:45,430 Now, one thing that's nice is that plug-ins are actually 1558 01:13:45,430 --> 01:13:47,920 probably going the way of the dinosaurs. 1559 01:13:47,920 --> 01:13:50,387 So as you probably know, HTML5 adds all these new features 1560 01:13:50,387 --> 01:13:51,970 like the audio tag and the video tag, 1561 01:13:51,970 --> 01:13:53,010 and stuff like that. 1562 01:13:53,010 --> 01:13:56,440 And so a lot of these new features were designed to allow 1563 01:13:56,440 --> 01:13:58,030 people to get away from plug-ins-- 1564 01:13:58,030 --> 01:14:01,745 to get away from Java-- to get away from Flash. 1565 01:14:01,745 --> 01:14:03,610 So when people in the past wanted to do things 1566 01:14:03,610 --> 01:14:06,560 like have rich 2D or 3D graphics, 1567 01:14:06,560 --> 01:14:08,660 they'd have to do something like Java or Flash. 1568 01:14:08,660 --> 01:14:10,460 Now they can use things like WebGL. 1569 01:14:10,460 --> 01:14:12,960 They can use things like the canvas tag. 1570 01:14:12,960 --> 01:14:14,980 So probably plug-ins are going away. 1571 01:14:14,980 --> 01:14:16,410 In fact, the IE team, for example, 1572 01:14:16,410 --> 01:14:17,910 has said that in a couple of years they 1573 01:14:17,910 --> 01:14:20,410 don't think anybody's going to be using plug-ins whatsoever. 1574 01:14:20,410 --> 01:14:22,246 It's all going to be HTML5-type stuff. 1575 01:14:22,246 --> 01:14:24,870 In fact, if you go to YouTube-- I don't know if you've noticed, 1576 01:14:24,870 --> 01:14:26,650 but a lot of times if you go to a video, 1577 01:14:26,650 --> 01:14:30,250 the video is actually using what's called an HTML5 player. 1578 01:14:30,250 --> 01:14:34,290 They've gone away from their standard plugin-based one. 1579 01:14:34,290 --> 01:14:35,415 So that's very interesting. 1580 01:14:35,415 --> 01:14:37,600 You can already see sites trying to move 1581 01:14:37,600 --> 01:14:39,049 towards this new plug-in-free world. 1582 01:14:39,049 --> 01:14:40,590 However, extensions are probably here 1583 01:14:40,590 --> 01:14:42,423 to stay for at least the foreseeable future. 1584 01:14:42,423 --> 01:14:45,409 So it's still important to get those right. 1585 01:14:45,409 --> 01:14:47,450 So, yeah, the last thing that I wanted to discuss 1586 01:14:47,450 --> 01:14:51,340 is that the paper was written in 2010-- that's four years ago. 1587 01:14:51,340 --> 01:14:52,930 So you might think to yourself, what's 1588 01:14:52,930 --> 01:14:55,250 changed about private browsing? 1589 01:14:55,250 --> 01:14:57,470 And so at a high level, private browsing mode 1590 01:14:57,470 --> 01:14:59,580 is still tricky to get right. 1591 01:14:59,580 --> 01:15:02,370 And the reason why it's tricky to get right-- 1592 01:15:02,370 --> 01:15:03,220 there are a couple of reasons. 1593 01:15:03,220 --> 01:15:05,430 So first of all, the browser [INAUDIBLE] 1594 01:15:05,430 --> 01:15:10,560 is still growing because of things like this HTML5 stuff. 1595 01:15:10,560 --> 01:15:13,500 The interface, which needs to be secure with respect 1596 01:15:13,500 --> 01:15:15,505 to private browsing mode-- that frontier 1597 01:15:15,505 --> 01:15:17,160 is always getting bigger. 1598 01:15:17,160 --> 01:15:19,230 And also, a lot of times developers 1599 01:15:19,230 --> 01:15:22,950 are more focused on adding cool new features. 1600 01:15:22,950 --> 01:15:24,360 And then the privacy implications 1601 01:15:24,360 --> 01:15:26,340 get addressed later on. 1602 01:15:26,340 --> 01:15:29,377 And so in practice, it is still tricky to produce 1603 01:15:29,377 --> 01:15:31,710 a private browsing mode which catches all potential data 1604 01:15:31,710 --> 01:15:33,430 leaks. 1605 01:15:33,430 --> 01:15:37,600 So as one example, there was a Firefox bug fix 1606 01:15:37,600 --> 01:15:39,680 from January 2014. 1607 01:15:39,680 --> 01:15:44,060 And the basic idea is there is this extension-- 1608 01:15:44,060 --> 01:15:49,050 it's called pdf.js-- which is basically a way 1609 01:15:49,050 --> 01:15:55,020 to look at PDF files using pure HTML5 interfaces. 1610 01:15:55,020 --> 01:15:58,280 And so as it turns out, this extension 1611 01:15:58,280 --> 01:16:03,010 was allowing public mode cookies to leak when it was being 1612 01:16:03,010 --> 01:16:06,446 used in private browsing mode. 1613 01:16:06,446 --> 01:16:08,440 The idea is that, let's say that you visit 1614 01:16:08,440 --> 01:16:10,600 some website in public mode. 1615 01:16:10,600 --> 01:16:11,850 You want to download some PDF. 1616 01:16:11,850 --> 01:16:13,470 Maybe you get some cookie that comes back.
1617 01:16:13,470 --> 01:16:15,180 You come back in private browsing mode. 1618 01:16:15,180 --> 01:16:17,850 You want to view another PDF from that site. 1619 01:16:17,850 --> 01:16:20,215 And then pdf.js is actually sending those public mode 1620 01:16:20,215 --> 01:16:23,800 cookies along with any private mode things that were set. 1621 01:16:23,800 --> 01:16:26,110 And so in the lecture notes, I actually 1622 01:16:26,110 --> 01:16:29,639 have a link to the bugzilla discussion 1623 01:16:29,639 --> 01:16:30,680 about the particular bug. 1624 01:16:30,680 --> 01:16:32,600 So the fix was actually quite simple 1625 01:16:32,600 --> 01:16:34,267 once they realized this was the problem. 1626 01:16:34,267 --> 01:16:36,100 Basically they just had to add a check that 1627 01:16:36,100 --> 01:16:38,680 says, morally speaking, am I in private browsing mode? 1628 01:16:38,680 --> 01:16:41,020 If so, do some things differently-- and one of those things 1629 01:16:41,020 --> 01:16:43,140 is to not send those cookies. 1630 01:16:43,140 --> 01:16:45,630 So the fix here is actually quite simple. 1631 01:16:45,630 --> 01:16:49,070 But the challenge was that, once again, people 1632 01:16:49,070 --> 01:16:51,500 added this cool, new extension. 1633 01:16:51,500 --> 01:16:53,920 But it hadn't really crossed their mind 1634 01:16:53,920 --> 01:16:57,590 to do this full, invasive audit 1635 01:16:57,590 --> 01:17:00,270 and say, where are all the places at which 1636 01:17:00,270 --> 01:17:03,720 private browsing semantics might be impacted 1637 01:17:03,720 --> 01:17:05,445 by this particular plug-in? 1638 01:17:05,445 --> 01:17:06,930 There's another interesting one, too-- 1639 01:17:06,930 --> 01:17:09,405 this relates to the discussion we 1640 01:17:09,405 --> 01:17:11,751 had about 30 minutes ago about what happens if you have 1641 01:17:11,751 --> 01:17:14,250 private tabs and public tabs that you open at the same time 1642 01:17:14,250 --> 01:17:15,570 or very close to each other. 1643 01:17:15,570 --> 01:17:18,080 There is actually a bug in Firefox. 1644 01:17:18,080 --> 01:17:19,870 I think that's from-- let's see here-- 1645 01:17:19,870 --> 01:17:22,750 yeah, 2011, which is still unfixed. 1646 01:17:22,750 --> 01:17:24,360 And the basic idea is that if you 1647 01:17:24,360 --> 01:17:27,740 go to a tab in private browsing mode-- OK, 1648 01:17:27,740 --> 01:17:28,655 you go do some stuff. 1649 01:17:28,655 --> 01:17:31,875 You then close that tab. 1650 01:17:31,875 --> 01:17:34,170 You then open a new public mode tab. 1651 01:17:34,170 --> 01:17:40,906 And you go to about:memory. 1652 01:17:40,906 --> 01:17:43,354 So as you probably know, browsers define these fake URLs 1653 01:17:43,354 --> 01:17:45,520 that tell you information about how the browser works. 1654 01:17:45,520 --> 01:17:47,706 So you go to the private tab, close it up, 1655 01:17:47,706 --> 01:17:49,539 then go to about:memory. 1656 01:17:49,539 --> 01:17:51,080 This is going to tell you information 1657 01:17:51,080 --> 01:17:53,830 about all the objects that Firefox has allocated. 1658 01:17:53,830 --> 01:17:58,000 So what would happen is that window objects are typically 1659 01:17:58,000 --> 01:18:01,362 deallocated-- they are [INAUDIBLE] in Firefox.
1660 01:18:01,362 --> 01:18:03,820 So what ends up happening is that when you open up that new 1661 01:18:03,820 --> 01:18:06,880 public mode tab and go to about:memory, you can actually 1662 01:18:06,880 --> 01:18:11,670 still find information about that private mode window, such 1663 01:18:11,670 --> 01:18:13,087 as things like its URL, for example, 1664 01:18:13,087 --> 01:18:15,545 and how much memory it had allocated, and all that 1665 01:18:15,545 --> 01:18:16,140 kind of stuff. 1666 01:18:16,140 --> 01:18:17,570 And it's all in plain text. 1667 01:18:17,570 --> 01:18:20,762 And so that's an example of how these very subtle 1668 01:18:20,762 --> 01:18:22,470 interfaces in browsers can actually 1669 01:18:22,470 --> 01:18:24,340 leak a lot of information. 1670 01:18:24,340 --> 01:18:26,395 And so it's very interesting. 1671 01:18:26,395 --> 01:18:28,020 If you look at the bugzilla discussion, 1672 01:18:28,020 --> 01:18:31,244 it's actually pretty interesting to see how these problems get 1673 01:18:31,244 --> 01:18:32,160 resolved in real life. 1674 01:18:32,160 --> 01:18:35,170 And I put a link to it-- there is a message 1675 01:18:35,170 --> 01:18:39,025 that this bug was deprioritized when it became clear 1676 01:18:39,025 --> 01:18:42,020 that the potential solution was more involved than 1677 01:18:42,020 --> 01:18:44,552 originally anticipated. 1678 01:18:44,552 --> 01:18:46,239 So there's a pretty long discussion 1679 01:18:46,239 --> 01:18:47,280 about how to fix this. 1680 01:18:47,280 --> 01:18:49,070 And it involved changing the way that garbage collection is 1681 01:18:49,070 --> 01:18:49,610 done. 1682 01:18:49,610 --> 01:18:53,350 And it's very tricky because if you invoke it too often, 1683 01:18:53,350 --> 01:18:55,100 then it hurts performance. 1684 01:18:55,100 --> 01:18:57,230 So there's this long discussion about this. 1685 01:18:57,230 --> 01:18:58,810 So they said, "It was deprioritized 1686 01:18:58,810 --> 01:19:00,809 when it was clear the solution was more involved 1687 01:19:00,809 --> 01:19:01,746 than anticipated." 1688 01:19:01,746 --> 01:19:04,810 And then in response, a developer said, 1689 01:19:04,810 --> 01:19:06,250 "That is very sad to hear. 1690 01:19:06,250 --> 01:19:08,100 This could pretty much defeat the purpose 1691 01:19:08,100 --> 01:19:10,440 of things like session store forgetting 1692 01:19:10,440 --> 01:19:12,130 about closed private windows." 1693 01:19:12,130 --> 01:19:14,977 So the developers care about this stuff. 1694 01:19:14,977 --> 01:19:16,560 Like in the case of the session store-- 1695 01:19:16,560 --> 01:19:19,780 this is a storage feature for HTML5-- 1696 01:19:19,780 --> 01:19:21,936 they had gone to a lot of trouble 1697 01:19:21,936 --> 01:19:25,780 to make it delete things that belong 1698 01:19:25,780 --> 01:19:28,320 to these closed private windows. 1699 01:19:28,320 --> 01:19:30,440 But, basically, what this bug did is it 1700 01:19:30,440 --> 01:19:32,060 basically still left information 1701 01:19:32,060 --> 01:19:35,260 about that stuff sitting around in memory somewhere. 1702 01:19:35,260 --> 01:19:37,841 So long story short, it's still very difficult 1703 01:19:37,841 --> 01:19:39,090 to get private browsing right. 1704 01:19:39,090 --> 01:19:41,765 And in fact, there are actually off-the-shelf forensics tools 1705 01:19:41,765 --> 01:19:43,431 that you can download that will actually 1706 01:19:43,431 --> 01:19:47,959 look for evidence of both public and private browsing modes.
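At the lowest level, what tools like that are doing is scanning raw disk or memory images for recognizable artifacts, such as URL strings. Here is a deliberately crude sketch of that idea for Node (purely illustrative; real forensics tools understand the browser's actual on-disk and in-memory data structures, and the file path shown is hypothetical):

```typescript
// Crude illustration: grep a raw memory or pagefile dump for URL-like strings.
import { readFileSync } from "node:fs";

function extractUrls(dumpPath: string): string[] {
  const bytes = readFileSync(dumpPath);
  // Decode as latin1 so every byte maps to exactly one character,
  // then look for printable URL-shaped runs.
  const text = bytes.toString("latin1");
  const urlPattern = /https?:\/\/[\x21-\x7e]{4,200}/g;
  return Array.from(new Set(text.match(urlPattern) ?? []));
}

// Example usage (the path is hypothetical):
// console.log(extractUrls("/evidence/pagefile.sys").slice(0, 20));
```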
1707 01:19:47,959 --> 01:19:49,625 So if you're an attacker, you don't have 1708 01:19:49,625 --> 01:19:50,910 to roll your own custom tool. 1709 01:19:50,910 --> 01:19:52,436 There's this one they call Magnet. 1710 01:19:52,436 --> 01:19:54,615 I think it's called Internet Evidence Finder. 1711 01:19:54,615 --> 01:19:55,740 You just go get this thing. 1712 01:19:55,740 --> 01:19:57,670 It'll do things like look through your page 1713 01:19:57,670 --> 01:19:59,090 file for RAM artifacts. 1714 01:19:59,090 --> 01:20:00,730 And it will give you a very nice GUI. 1715 01:20:00,730 --> 01:20:02,570 It'll say, here are the images I found. 1716 01:20:02,570 --> 01:20:04,540 Here are the URLs. 1717 01:20:04,540 --> 01:20:07,240 So in practice, these private browsing modes 1718 01:20:07,240 --> 01:20:08,740 still do leak some information. 1719 01:20:08,740 --> 01:20:11,190 All right, so next section, we'll talk about Tor.