1 00:00:00,070 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,820 Commons license. 3 00:00:03,820 --> 00:00:06,060 Your support will help MIT OpenCourseWare 4 00:00:06,060 --> 00:00:10,140 continue to offer high quality educational resources for free. 5 00:00:10,140 --> 00:00:12,690 To make a donation or to view additional materials 6 00:00:12,690 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,255 at ocw.mit.edu. 8 00:00:25,835 --> 00:00:26,960 PROFESSOR: All right, guys. 9 00:00:26,960 --> 00:00:28,800 So let's get started. 10 00:00:28,800 --> 00:00:31,190 Welcome back from what I hope was an exciting holiday 11 00:00:31,190 --> 00:00:32,560 for everyone. 12 00:00:32,560 --> 00:00:35,360 So today we're going to talk about user authentication. 13 00:00:35,360 --> 00:00:37,890 So the basic challenge that we want to address today 14 00:00:37,890 --> 00:00:42,420 is how can human users prove their identity to a program? 15 00:00:42,420 --> 00:00:45,680 In particular, the paper that was assigned for today's class 16 00:00:45,680 --> 00:00:47,635 addresses an existential question 17 00:00:47,635 --> 00:00:48,930 in the security community. 18 00:00:48,930 --> 00:00:53,240 Is there anything better than passwords for authentication? 19 00:00:53,240 --> 00:00:57,430 So at a high level it seems like passwords are a terrible idea. 20 00:00:57,430 --> 00:01:00,010 So they have very low entropy, it's very easy for attackers 21 00:01:00,010 --> 00:01:01,380 to guess them. 22 00:01:01,380 --> 00:01:03,130 Also the security questions that we 23 00:01:03,130 --> 00:01:05,481 use to recover from lost passwords 24 00:01:05,481 --> 00:01:07,480 often have even lower entropy than the passwords 25 00:01:07,480 --> 00:01:10,330 themselves, which also seems like a problem.
26 00:01:10,330 --> 00:01:15,180 And even worse, users typically will use the same password 27 00:01:15,180 --> 00:01:16,987 across a lot of different sites. 28 00:01:16,987 --> 00:01:19,195 So that means that the vulnerability in one password, 29 00:01:19,195 --> 00:01:22,820 if it's easy to guess, could expose a user's activity 30 00:01:22,820 --> 00:01:24,400 across a wide range of sites. 31 00:01:24,400 --> 00:01:27,030 So as the paper for today's class states, 32 00:01:27,030 --> 00:01:28,930 I love this quote, "the continued domination 33 00:01:28,930 --> 00:01:31,620 of passwords over all other methods 34 00:01:31,620 --> 00:01:34,850 of end-user authentication is a major embarrassment 35 00:01:34,850 --> 00:01:36,110 for security researchers." 36 00:01:36,110 --> 00:01:37,920 All right, so the community's just seething out there, 37 00:01:37,920 --> 00:01:39,460 they want some better alternative. 38 00:01:39,460 --> 00:01:41,380 But it's not clear if there actually 39 00:01:41,380 --> 00:01:45,910 is an authentication scheme that actually totally dominates 40 00:01:45,910 --> 00:01:48,630 passwords, that's more usable, that's more deployable, 41 00:01:48,630 --> 00:01:49,830 that's more secure. 42 00:01:49,830 --> 00:01:52,210 So in today's lecture, we'll basically do three things. 43 00:01:52,210 --> 00:01:53,710 So first of all, we're going to look 44 00:01:53,710 --> 00:01:55,970 and we're going to see how current passwords can work. 45 00:01:55,970 --> 00:01:58,660 Then we're going to talk about the desirable properties 46 00:01:58,660 --> 00:02:01,630 at a high level for any authentication scheme.
47 00:02:01,630 --> 00:02:05,112 And then we're finally going to look at what the paper gives us 48 00:02:05,112 --> 00:02:07,320 in terms of metrics for evaluating authentication 49 00:02:07,320 --> 00:02:08,740 schemes, and we're going to see how 50 00:02:08,740 --> 00:02:10,156 some of these other authentication 51 00:02:10,156 --> 00:02:12,230 schemes actually compare to passwords. 52 00:02:12,230 --> 00:02:14,860 So in [INAUDIBLE] what is a password? 53 00:02:14,860 --> 00:02:26,250 So a password is a secret that is shared 54 00:02:26,250 --> 00:02:30,400 between a user and a server. 55 00:02:34,540 --> 00:02:37,800 So the naive implementation of a password scheme 56 00:02:37,800 --> 00:02:41,160 is to basically just have a table 57 00:02:41,160 --> 00:02:44,780 on the server side that essentially just maps 58 00:02:44,780 --> 00:02:50,258 user names to passwords. 59 00:02:50,258 --> 00:02:52,008 That's the simplest way for you to imagine 60 00:02:52,008 --> 00:02:54,980 implementing one of these authentication schemes-- the user 61 00:02:54,980 --> 00:02:58,280 passes in their user name and the password, the server 62 00:02:58,280 --> 00:02:59,826 then does a lookup in this table, 63 00:02:59,826 --> 00:03:01,700 compares the password the client supplied 64 00:03:01,700 --> 00:03:02,360 to what's in here. 65 00:03:02,360 --> 00:03:04,320 If everything's good, the user's authenticated. 66 00:03:04,320 --> 00:03:06,176 So clearly the problem with this is 67 00:03:06,176 --> 00:03:09,212 that if the attacker compromises the server, 68 00:03:09,212 --> 00:03:10,670 then he can just look at this table 69 00:03:10,670 --> 00:03:13,959 and then get all the users' passwords in the clear. 70 00:03:13,959 --> 00:03:15,270 So that's clearly a bad thing. 71 00:03:15,270 --> 00:03:19,170 So perhaps an improved solution is 72 00:03:19,170 --> 00:03:23,280 to have the server store a table that looks like this.
73 00:03:23,280 --> 00:03:25,320 So once again, it'd map the user name, 74 00:03:25,320 --> 00:03:31,055 but now it actually maps to a hash of the password. 75 00:03:34,360 --> 00:03:37,000 So the client's gonna supply their clear text 76 00:03:37,000 --> 00:03:39,590 password to the server, the server 77 00:03:39,590 --> 00:03:41,480 will then take that clear text password, 78 00:03:41,480 --> 00:03:43,870 hash it, do a lookup in the table, and once again see 79 00:03:43,870 --> 00:03:46,620 if the user is who he or she says that they are. 80 00:03:46,620 --> 00:03:49,490 So the advantage of this scheme is 81 00:03:49,490 --> 00:03:52,080 that by design these hash functions 82 00:03:52,080 --> 00:03:54,040 are difficult to invert. 83 00:03:54,040 --> 00:03:57,304 So if this table is lost, it's leaked somehow 84 00:03:57,304 --> 00:03:58,928 or the attacker compromised the server, 85 00:03:58,928 --> 00:04:00,969 then the attacker could look at these things here, 86 00:04:00,969 --> 00:04:03,180 but it's difficult for the attacker 87 00:04:03,180 --> 00:04:05,695 to say, OK, this sort of string of random 88 00:04:05,695 --> 00:04:07,460 alphanumeric characters here, 89 00:04:07,460 --> 00:04:10,592 here's a pre-image that was used as the input 90 00:04:10,592 --> 00:04:13,660 of the hash function [INAUDIBLE] that value there. 91 00:04:13,660 --> 00:04:16,089 So that at least is the nice thing 92 00:04:16,089 --> 00:04:18,720 about these hashes in theory. 93 00:04:18,720 --> 00:04:21,370 Now in practice, attackers don't actually 94 00:04:21,370 --> 00:04:23,540 have to launch brute force attacks 95 00:04:23,540 --> 00:04:28,150 to figure out what the preimages for these hash values are. 96 00:04:28,150 --> 00:04:30,770 So attackers can actually take advantage of the fact 97 00:04:30,770 --> 00:04:36,595 that passwords in practice have skewed distributions.
98 00:04:40,200 --> 00:04:43,150 And by skewed distributions, I mean 99 00:04:43,150 --> 00:04:45,850 that-- let's say that we knew that all passwords were 100 00:04:45,850 --> 00:04:47,150 20 characters long. 101 00:04:47,150 --> 00:04:50,460 It's not like users actually pick passwords that 102 00:04:50,460 --> 00:04:54,080 sort of exist in all places in that space of 20 103 00:04:54,080 --> 00:04:55,340 possible characters. 104 00:04:55,340 --> 00:05:00,580 In practice, people pick passwords like 123 or todd 105 00:05:00,580 --> 00:05:02,002 or things like this. 106 00:05:02,002 --> 00:05:03,960 So in fact there've been these empirical studies 107 00:05:03,960 --> 00:05:08,180 of how passwords work and a lot of times 108 00:05:08,180 --> 00:05:18,764 these studies find things like the top 5,000 passwords 109 00:05:18,764 --> 00:05:21,710 cover about 20% of users. 110 00:05:25,032 --> 00:05:26,490 So what that means, in other words, 111 00:05:26,490 --> 00:05:29,970 is that if the attacker has a database of those 5,000 112 00:05:29,970 --> 00:05:30,840 passwords, 113 00:05:30,840 --> 00:05:32,830 the attacker can just hash those, 114 00:05:32,830 --> 00:05:37,050 and then when the attacker looks at this stolen password table, 115 00:05:37,050 --> 00:05:39,640 can just see if one of those things that 116 00:05:39,640 --> 00:05:44,408 come from this 5,000 large list matches over here. 117 00:05:44,408 --> 00:05:46,344 And so empirically speaking, the attacker 118 00:05:46,344 --> 00:05:49,260 would be able to recover about 20% of passwords that way. 119 00:05:49,260 --> 00:05:55,050 And so, folks at Yahoo found that passwords 120 00:05:55,050 --> 00:06:02,832 have roughly 10 to 20 bits of entropy, 10 to 20 bits 121 00:06:02,832 --> 00:06:04,760 of randomness in them. 122 00:06:04,760 --> 00:06:08,360 And that's actually not that big.
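The attack just described, hashing a short list of popular passwords and matching them against a stolen table of unsalted hashes, can be sketched as follows. This is a minimal illustration, not anything shown in the lecture: the user names, the four-entry "common" list, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Unsalted hash of a password (SHA-256 is an assumed choice)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical stolen table: user name -> unsalted hash of the password.
stolen_table = {
    "alice": sha256_hex("123"),
    "bob": sha256_hex("todd"),
    "carol": sha256_hex("x9!Lq7#vT2pW"),
}

# Hypothetical stand-in for the attacker's list of the most popular passwords.
common_passwords = ["123", "password", "todd", "qwerty"]


def dictionary_attack(table, candidates):
    """Hash each candidate once, then look every stolen hash up in that map."""
    hash_to_password = {sha256_hex(p): p for p in candidates}
    return {
        user: hash_to_password[digest]
        for user, digest in table.items()
        if digest in hash_to_password
    }


recovered = dictionary_attack(stolen_table, common_passwords)
```

Only the users who picked a password from the popular list are recovered; the user with the long random password is not, which is the skewed-distribution point in miniature.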
123 00:06:08,360 --> 00:06:10,435 So, for example, if you think about what might 124 00:06:10,435 --> 00:06:11,560 this hash function here be? 125 00:06:11,560 --> 00:06:14,620 So maybe it's something like SHA-1, something like this. 126 00:06:14,620 --> 00:06:17,880 So modern machines can actually calculate millions 127 00:06:17,880 --> 00:06:20,260 of these hashes every second. 128 00:06:20,260 --> 00:06:22,660 So the fact that hash functions by design 129 00:06:22,660 --> 00:06:25,050 are supposed to be easy to calculate, 130 00:06:25,050 --> 00:06:26,450 so they'd be fast to calculate, 131 00:06:26,450 --> 00:06:27,950 combined with this fact that there'd 132 00:06:27,950 --> 00:06:29,700 be skewed password distributions, 133 00:06:29,700 --> 00:06:32,500 means that in practice, this scheme here is not as secure 134 00:06:32,500 --> 00:06:34,510 as it might seem. 135 00:06:34,510 --> 00:06:36,800 So one thing you can imagine to try 136 00:06:36,800 --> 00:06:40,660 to make life more difficult for the attacker 137 00:06:40,660 --> 00:06:46,860 is you could imagine that you use an expensive key derivation 138 00:06:46,860 --> 00:06:47,360 function. 139 00:06:53,290 --> 00:06:55,280 And so by key derivation function, 140 00:06:55,280 --> 00:06:58,867 I just mean this thing up here. 141 00:06:58,867 --> 00:07:01,200 This thing that's taking the password as input and then 142 00:07:01,200 --> 00:07:03,505 generating something that's stored on the server. 143 00:07:03,505 --> 00:07:05,213 So what's nice about these key derivation 144 00:07:05,213 --> 00:07:09,915 functions is they actually have tunable cost. 145 00:07:09,915 --> 00:07:11,930 So you can basically turn this knob 146 00:07:11,930 --> 00:07:14,516 and make that function run slower or faster 147 00:07:14,516 --> 00:07:15,640 depending on what you want. 148 00:07:15,640 --> 00:07:17,525 And so the idea here is that, let's say 149 00:07:17,525 --> 00:07:19,650 that you're going to use a key derivation function.
150 00:07:19,650 --> 00:07:28,020 So some examples are things like PBKDF2, or maybe bcrypt, 151 00:07:28,020 --> 00:07:30,901 so you can look these up using the miracle of the internet 152 00:07:30,901 --> 00:07:32,400 if you care to know more about them. 153 00:07:32,400 --> 00:07:34,330 But the basic idea is let's imagine 154 00:07:34,330 --> 00:07:36,040 that one of these key derivation functions 155 00:07:36,040 --> 00:07:40,820 took a second to calculate, as opposed to a few milliseconds. 156 00:07:40,820 --> 00:07:42,490 That actually makes the attacker's job 157 00:07:42,490 --> 00:07:45,760 much more difficult. Because when the attacker is trying 158 00:07:45,760 --> 00:07:49,090 to, let's say, generate values for these 5,000 topmost 159 00:07:49,090 --> 00:07:51,720 passwords, it's going to take the attacker much longer 160 00:07:51,720 --> 00:07:52,760 to do that. 161 00:07:52,760 --> 00:07:55,770 So does that all make sense how these things work? 162 00:07:55,770 --> 00:07:56,940 Pretty straightforward. 163 00:07:56,940 --> 00:07:59,260 So internally these key derivation functions 164 00:07:59,260 --> 00:08:02,675 often operate by repeatedly calling a hash multiple, 165 00:08:02,675 --> 00:08:03,500 multiple times. 166 00:08:03,500 --> 00:08:05,960 So that's all pretty straightforward. 167 00:08:05,960 --> 00:08:08,712 So you might say, well, does this solve the problem? 168 00:08:08,712 --> 00:08:10,753 So can we just use these expensive key derivation 169 00:08:10,753 --> 00:08:12,590 functions and be done with it? 170 00:08:12,590 --> 00:08:14,920 So if this was a security class, the answer is no. 171 00:08:14,920 --> 00:08:17,820 So one problem is that the adversary can build something 172 00:08:17,820 --> 00:08:23,470 called rainbow tables. 173 00:08:23,470 --> 00:08:29,990 And so a rainbow table is basically just a map 174 00:08:29,990 --> 00:08:35,490 of a password to hash output.
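To make the "tunable cost" knob described above concrete, here is a small sketch using PBKDF2 from Python's standard library. The iteration counts and the fixed placeholder salt are assumptions for illustration; a salt is passed only because the PBKDF2 interface requires one, and the helper names are hypothetical.

```python
import hashlib
import time

# Placeholder salt, present only because PBKDF2 requires a salt argument.
DEMO_SALT = b"fixed-demo-salt"


def derive(password: str, salt: bytes, iterations: int) -> bytes:
    """PBKDF2-HMAC-SHA256; `iterations` is the cost knob."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def seconds_to_derive(iterations: int) -> float:
    """Wall-clock time for one derivation at the given cost setting."""
    start = time.perf_counter()
    derive("todd", DEMO_SALT, iterations)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Raising the iteration count makes each guess proportionally slower.
    for n in (1_000, 100_000, 1_000_000):
        print(f"{n:>9} iterations: {seconds_to_derive(n):.4f} s")
```

The same multiplier applies to every guess the attacker makes, so hashing a 5,000-entry dictionary costs 5,000 times whatever a single derivation costs.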
175 00:08:39,039 --> 00:08:43,532 And so the insight here is that even if the system is using 176 00:08:43,532 --> 00:08:45,665 one of these expensive key derivation functions, 177 00:08:45,665 --> 00:08:49,840 the attacker can calculate one of these tables once. 178 00:08:49,840 --> 00:08:52,396 It might be a little bit painful because each key derivation 179 00:08:52,396 --> 00:08:53,950 function invocation is slow. 180 00:08:53,950 --> 00:08:56,780 But the attacker can build this table once and then use 181 00:08:56,780 --> 00:09:00,030 that to crack all subsequent systems the attacker can 182 00:09:00,030 --> 00:09:04,120 break into that use that same key derivation function. 183 00:09:04,120 --> 00:09:05,980 So that's how rainbow tables work. 184 00:09:05,980 --> 00:09:07,827 And once again, to maximize the cost benefit 185 00:09:07,827 --> 00:09:09,660 of building this rainbow table, the attacker 186 00:09:09,660 --> 00:09:12,700 could take advantage of the skewed password distributions 187 00:09:12,700 --> 00:09:13,450 I discussed up here. 188 00:09:13,450 --> 00:09:15,040 So the attacker might only build a rainbow table 189 00:09:15,040 --> 00:09:17,245 for some small set of all possible passwords. 190 00:09:17,245 --> 00:09:19,910 AUDIENCE: So salting makes this much more difficult. 191 00:09:19,910 --> 00:09:21,410 PROFESSOR: Yeah, yeah, that's right. 192 00:09:21,410 --> 00:09:24,250 So we're going to get to salting I believe in a couple seconds. 193 00:09:24,250 --> 00:09:24,890 That's right. 194 00:09:24,890 --> 00:09:27,290 So at a high level, if you don't use salting, 195 00:09:27,290 --> 00:09:29,620 rainbow tables actually allow the attacker 196 00:09:29,620 --> 00:09:32,030 to spend some effort offline, calculate this table, 197 00:09:32,030 --> 00:09:34,430 and then sort of amortize the cost 198 00:09:34,430 --> 00:09:36,119 of calculating that table over breaking 199 00:09:36,119 --> 00:09:37,535 many different password databases.
200 00:09:41,455 --> 00:09:44,510 So the next thing that we can think about to improve things 201 00:09:44,510 --> 00:09:45,255 is salting. 202 00:09:45,255 --> 00:09:46,630 I swear that guy was not a plant, 203 00:09:46,630 --> 00:09:49,180 I will give you your $20 after class. 204 00:09:49,180 --> 00:09:50,990 So how does salting work? 205 00:09:50,990 --> 00:09:52,448 So the basic thing is you just want 206 00:09:52,448 --> 00:09:54,950 to input some additional randomness into the way 207 00:09:54,950 --> 00:09:56,750 that the stored password value is generated. 208 00:09:56,750 --> 00:10:02,450 So basically, you want to take this hash function 209 00:10:02,450 --> 00:10:05,172 and you want to put some salt in there-- which 210 00:10:05,172 --> 00:10:08,657 I'll explain in a second-- and then the password. 211 00:10:08,657 --> 00:10:10,865 And this is the thing that you store on the server side 212 00:10:10,865 --> 00:10:11,656 in the [INAUDIBLE]. 213 00:10:11,656 --> 00:10:12,660 So what is this salt? 214 00:10:12,660 --> 00:10:16,880 You can just think of it as just a string, a long string that's 215 00:10:16,880 --> 00:10:20,370 provided as sort of a first part to this hash function. 216 00:10:20,370 --> 00:10:23,640 So why is it better to use this scheme? 217 00:10:23,640 --> 00:10:25,440 And note that the salt is actually 218 00:10:25,440 --> 00:10:28,500 stored in clear text on the server side. 219 00:10:28,500 --> 00:10:30,879 So you might be thinking OK, well if that salt is stored 220 00:10:30,879 --> 00:10:32,640 in clear text on the server side, 221 00:10:32,640 --> 00:10:36,030 it seems like an attacker can both steal the table that maps 222 00:10:36,030 --> 00:10:38,330 user names to passwords and the attacker can also 223 00:10:38,330 --> 00:10:41,109 steal the salt. So why is that useful?
224 00:10:41,109 --> 00:10:43,650 AUDIENCE: Because if you picked the top most common password, 225 00:10:43,650 --> 00:10:46,107 you can't just use it once and find a new user. 226 00:10:46,107 --> 00:10:47,440 PROFESSOR: That's exactly right. 227 00:10:47,440 --> 00:10:49,180 So basically what this does is this 228 00:10:49,180 --> 00:10:52,580 prevents the attacker from building a single rainbow table 229 00:10:52,580 --> 00:10:56,050 and then using that rainbow table against all instances 230 00:10:56,050 --> 00:10:57,930 of that hash function. 231 00:10:57,930 --> 00:10:59,970 And so you can basically think of this 232 00:10:59,970 --> 00:11:02,776 as sort of uniquifying passwords even if they 233 00:11:02,776 --> 00:11:04,810 are the same, basically. 234 00:11:04,810 --> 00:11:07,166 So this is what a lot of systems do in practice, they 235 00:11:07,166 --> 00:11:09,370 use this notion of salt here. 236 00:11:09,370 --> 00:11:10,840 And so the best practices for this 237 00:11:10,840 --> 00:11:12,360 are that you want to choose a salt that's 238 00:11:12,360 --> 00:11:14,776 long, because you're going to essentially think of the salt 239 00:11:14,776 --> 00:11:18,240 as adding more bits to this pseudo-password, right? 240 00:11:18,240 --> 00:11:19,490 So more bits is always better. 241 00:11:19,490 --> 00:11:21,031 And the other thing you want to do 242 00:11:21,031 --> 00:11:23,390 is that whenever the user changes his or her password, 243 00:11:23,390 --> 00:11:25,480 you typically want to change that salt too. 244 00:11:25,480 --> 00:11:29,165 So one reason for that is let's say that users are lazy 245 00:11:29,165 --> 00:11:31,750 and they want to pick the same password multiple times. 246 00:11:31,750 --> 00:11:34,678 Changing the salt will ensure that the thing that's 247 00:11:34,678 --> 00:11:37,303 stored in the password database will actually be different even 248 00:11:37,303 --> 00:11:38,440 if that password's the same.
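A rough sketch of the salting scheme described above, storing hash(salt, password) alongside a clear-text, per-record random salt, might look like this. The 16-byte salt length, the use of SHA-256, and the function names are assumptions for illustration; a real deployment would likely feed the salt into one of the slow key derivation functions mentioned earlier rather than a single fast hash.

```python
import hashlib
import hmac
import os


def make_record(password: str) -> tuple:
    """Create a (salt, digest) record; a fresh salt is drawn on every call,
    so changing a password (even to the same value) changes the stored digest."""
    salt = os.urandom(16)  # stored in clear text next to the digest
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest


def verify(password: str, record: tuple) -> bool:
    """Recompute hash(salt || password) and compare in constant time."""
    salt, expected = record
    candidate = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return hmac.compare_digest(candidate, expected)


# Two users (or one lazy user, twice) choosing the same password.
record_a = make_record("todd")
record_b = make_record("todd")
```

Because `record_a` and `record_b` hold different digests for the same password, a table precomputed for one salt tells the attacker nothing about the other record.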
249 00:11:38,440 --> 00:11:40,106 I think there was a question somewhere. 250 00:11:40,106 --> 00:11:41,550 AUDIENCE: Why's it called salt? 251 00:11:41,550 --> 00:11:43,750 PROFESSOR: I'm actually not sure why it's called 252 00:11:43,750 --> 00:11:45,060 salt, that's a good question. 253 00:11:45,060 --> 00:11:46,680 I'm sure there's some answer to this though. 254 00:11:46,680 --> 00:11:47,450 It's like why are cookies called cookies? 255 00:11:47,450 --> 00:11:49,836 The internet will know but I actually don't know. 256 00:11:49,836 --> 00:11:52,800 AUDIENCE: Add some [INAUDIBLE] to the hash number 257 00:11:52,800 --> 00:11:55,270 hash [INAUDIBLE]. 258 00:11:55,270 --> 00:11:56,382 PROFESSOR: There we go. 259 00:11:56,382 --> 00:11:58,090 I'm glad that we're getting this on film, 260 00:11:58,090 --> 00:11:59,255 'cause I feel this is how we're going 261 00:11:59,255 --> 00:12:00,338 to get our Turing Awards. 262 00:12:00,338 --> 00:12:01,530 That's right. 263 00:12:01,530 --> 00:12:03,790 I'm sure there's some answer on the internet, 264 00:12:03,790 --> 00:12:05,370 so I'll look that up later. 265 00:12:05,370 --> 00:12:08,280 But does that all basically make sense? 266 00:12:08,280 --> 00:12:12,720 OK so these approaches are fairly straightforward. 267 00:12:12,720 --> 00:12:16,980 So what I've assumed so far is that somehow the client 268 00:12:16,980 --> 00:12:20,466 is transmitting the password to the server. 269 00:12:20,466 --> 00:12:23,090 But I haven't actually specified how that transmission's actually 270 00:12:23,090 --> 00:12:23,923 going to take place. 271 00:12:27,270 --> 00:12:35,880 So how do we transmit these passwords? 272 00:12:35,880 --> 00:12:39,500 So the first idea you might have would be, 273 00:12:39,500 --> 00:12:43,960 well, we'll just send the password 274 00:12:43,960 --> 00:12:46,730 in the clear over the network.
275 00:12:46,730 --> 00:12:49,344 This is clearly cartoonishly bad, 276 00:12:49,344 --> 00:12:51,510 because then there could be a network attacker who's 277 00:12:51,510 --> 00:12:54,007 basically snooping and seeing the traffic 278 00:12:54,007 --> 00:12:54,840 that you're sending. 279 00:12:54,840 --> 00:12:56,798 That attacker can just take that password 280 00:12:56,798 --> 00:12:59,249 right off the wire and then impersonate you. 281 00:12:59,249 --> 00:13:00,790 So we always start with the straw man 282 00:13:00,790 --> 00:13:02,970 before I show you the other straw men, which of course are 283 00:13:02,970 --> 00:13:03,840 also fatally flawed. 284 00:13:03,840 --> 00:13:05,815 So the first thing you think about is sending 285 00:13:05,815 --> 00:13:07,285 a password in the clear. 286 00:13:07,285 --> 00:13:08,785 Another thing you might think, which 287 00:13:08,785 --> 00:13:10,860 would be a little bit better perhaps, 288 00:13:10,860 --> 00:13:18,200 is perhaps we send the password over an encrypted connection. 289 00:13:23,345 --> 00:13:27,464 And so we use some type of cryptography here. 290 00:13:27,464 --> 00:13:29,630 Maybe there's some secret key or something like that 291 00:13:29,630 --> 00:13:31,540 and that's what we use to transform 292 00:13:31,540 --> 00:13:34,240 the password before we send it over the connection. 293 00:13:34,240 --> 00:13:35,942 So at a high level, encryption always 294 00:13:35,942 --> 00:13:37,400 seems to make things better, right? 295 00:13:37,400 --> 00:13:38,200 Trademark. 296 00:13:38,200 --> 00:13:41,179 But the problem is that unless you think carefully 297 00:13:41,179 --> 00:13:43,595 about how you're using things like encryption and hashing, 298 00:13:43,595 --> 00:13:45,473 you may not be getting the security benefits 299 00:13:45,473 --> 00:13:46,530 that you think you're getting.
300 00:13:46,530 --> 00:13:48,120 Because, for example, what if there's 301 00:13:48,120 --> 00:13:50,450 someone who's sitting between you-- the client-- 302 00:13:50,450 --> 00:13:53,426 and the server, this proverbial man-in-the-middle attacker, 303 00:13:53,426 --> 00:13:55,050 who's actually snooping on your traffic 304 00:13:55,050 --> 00:13:57,580 and pretending to be the server. 305 00:13:57,580 --> 00:14:00,370 If you send encrypted data, but you haven't actually 306 00:14:00,370 --> 00:14:02,600 authenticated the other end, then 307 00:14:02,600 --> 00:14:06,150 you could still be opening yourself up to problems. 308 00:14:06,150 --> 00:14:07,960 Because if the client just, let's say, 309 00:14:07,960 --> 00:14:10,410 picks some random key, it sends it to some entity 310 00:14:10,410 --> 00:14:12,970 on the other side who may or may not be the server. 311 00:14:12,970 --> 00:14:15,906 It is not the server, [INAUDIBLE]. 312 00:14:15,906 --> 00:14:19,490 You are sending something to some person, who will then be 313 00:14:19,490 --> 00:14:21,390 able to get all your secrets. 314 00:14:21,390 --> 00:14:23,740 And so similarly, people might think well 315 00:14:23,740 --> 00:14:25,810 what if I don't send the raw password 316 00:14:25,810 --> 00:14:27,615 but I send a hash of the password. 317 00:14:27,615 --> 00:14:29,240 That actually doesn't give you anything 318 00:14:29,240 --> 00:14:30,260 in and of itself either. 319 00:14:30,260 --> 00:14:32,720 Because whether you send the password or the hash 320 00:14:32,720 --> 00:14:34,780 of a password-- I mean, a hash of the password 321 00:14:34,780 --> 00:14:37,800 has the same sort of semantic power as the original password 322 00:14:37,800 --> 00:14:38,794 itself, 323 00:14:38,794 --> 00:14:40,585 if you haven't authenticated the other side, 324 00:14:40,585 --> 00:14:43,110 if you haven't authenticated the server or things like this.
325 00:14:43,110 --> 00:14:44,740 So the basic point with this discussion 326 00:14:44,740 --> 00:14:49,440 here is just to stress the fact that just adding encryption 327 00:14:49,440 --> 00:14:51,730 or just adding hashing doesn't necessarily 328 00:14:51,730 --> 00:14:53,690 give you any additional powers. 329 00:14:53,690 --> 00:14:56,160 If the client can't authenticate who he or she is sending 330 00:14:56,160 --> 00:14:59,620 the password to then the client could be mistakenly divulging 331 00:14:59,620 --> 00:15:03,430 that password to someone they don't intend to divulge it to. 332 00:15:03,430 --> 00:15:07,620 So perhaps a better idea than these two 333 00:15:07,620 --> 00:15:12,155 is to use what they call a challenge response protocol. 334 00:15:17,200 --> 00:15:20,070 And here's an example of a very simple challenge response 335 00:15:20,070 --> 00:15:21,090 protocol. 336 00:15:21,090 --> 00:15:26,140 So let's say we've got the client here, 337 00:15:26,140 --> 00:15:30,700 and then you've got the server over here. 338 00:15:30,700 --> 00:15:36,340 So the client says, hi, I'm Alice. 339 00:15:39,450 --> 00:15:45,470 And then the server responds with some challenge C, 340 00:15:45,470 --> 00:15:48,900 some quantity that the server gets to pick. 341 00:15:48,900 --> 00:15:54,670 And then the client is going to respond 342 00:15:54,670 --> 00:15:58,950 with the hash of that server-sent challenge, 343 00:15:58,950 --> 00:16:02,898 concatenated with the password. 344 00:16:06,350 --> 00:16:09,490 So at this point, the server can take this quantity. 345 00:16:09,490 --> 00:16:11,830 The server knows the challenge that it sent. 346 00:16:11,830 --> 00:16:13,950 And presumably the server knows the password, 347 00:16:13,950 --> 00:16:16,530 so the server can [INAUDIBLE] this quantity 348 00:16:16,530 --> 00:16:19,780 and see if it actually matches what the user sent.
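The simple challenge response exchange described above (client announces a name, server sends a challenge C, client returns hash(C || password)) can be sketched roughly as below. This is an illustrative sketch only: the `Server` class, the 16-byte challenge, and SHA-256 are assumptions, and, as in the lecture's version, the server is assumed to know the password itself.

```python
import hashlib
import hmac
import secrets


def respond(challenge: bytes, password: str) -> bytes:
    """Client side: hash of the server-sent challenge concatenated with the password."""
    return hashlib.sha256(challenge + password.encode("utf-8")).digest()


class Server:
    """Hypothetical server that knows each user's password."""

    def __init__(self, passwords: dict):
        self._passwords = passwords
        self._pending = {}  # user name -> outstanding challenge

    def start(self, username: str) -> bytes:
        """Handle "hi, I'm Alice": pick a fresh random challenge C."""
        challenge = secrets.token_bytes(16)
        self._pending[username] = challenge
        return challenge

    def finish(self, username: str, response: bytes) -> bool:
        """Recompute hash(C || password) and compare with the client's response."""
        challenge = self._pending.pop(username, None)  # each challenge is single-use
        if challenge is None or username not in self._passwords:
            return False
        expected = respond(challenge, self._passwords[username])
        return hmac.compare_digest(expected, response)


server = Server({"alice": "s3cret"})
c = server.start("alice")
ok = server.finish("alice", respond(c, "s3cret"))
```

Because the challenge is consumed on use, a recorded response cannot simply be replayed against a later login attempt.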
349 00:16:19,780 --> 00:16:21,720 So what's nice about this protocol 350 00:16:21,720 --> 00:16:24,950 is that if we ignore man-in-the-middle attacks for a second, 351 00:16:24,950 --> 00:16:28,985 the server is now confident that the user's actually Alice, 352 00:16:28,985 --> 00:16:31,331 because only Alice would know this password here. 353 00:16:31,331 --> 00:16:33,830 And what's nice about this is that if the server is actually 354 00:16:33,830 --> 00:16:36,120 the attacker-- so in other words, 355 00:16:36,120 --> 00:16:39,442 if Alice sent this thing to someone who's not 356 00:16:39,442 --> 00:16:41,400 the person who she's trying to authenticate to, 357 00:16:41,400 --> 00:16:43,957 then the attacker still doesn't know the password. 358 00:16:43,957 --> 00:16:45,990 Because the attacker got to choose C, 359 00:16:45,990 --> 00:16:48,126 but the attacker doesn't know what this is. 360 00:16:48,126 --> 00:16:49,500 And so basically for the attacker 361 00:16:49,500 --> 00:16:50,969 to figure out what the password is, 362 00:16:50,969 --> 00:16:52,760 the attacker has to be able to, once again, 363 00:16:52,760 --> 00:16:54,324 invert these hash functions. 364 00:16:54,324 --> 00:16:55,282 Do you have a question? 365 00:16:55,282 --> 00:16:57,282 AUDIENCE: I'm just curious, why can you not make 366 00:16:57,282 --> 00:17:01,178 the client do the hashing? 367 00:17:01,178 --> 00:17:01,678 [INAUDIBLE] 368 00:17:10,329 --> 00:17:13,300 PROFESSOR: So let's see, so your proposed scheme 369 00:17:13,300 --> 00:17:20,370 is that the client side is going to call this thing? 370 00:17:20,370 --> 00:17:22,495 AUDIENCE: Yeah, so instead of sending the password, 371 00:17:22,495 --> 00:17:26,478 and having the server hash the password and check it, 372 00:17:26,478 --> 00:17:28,482 the client would just send the hashed password. 373 00:17:28,482 --> 00:17:30,815 PROFESSOR: The client would just send the hashed password.
374 00:17:36,430 --> 00:17:37,980 So there's a couple reasons. 375 00:17:37,980 --> 00:17:40,642 So one reason, as we'll discuss later, 376 00:17:40,642 --> 00:17:42,350 is that there's going to be things called 377 00:17:42,350 --> 00:17:43,772 anti-hammering defenses, right. 378 00:17:43,772 --> 00:17:45,230 Anti-hammering defenses are designed 379 00:17:45,230 --> 00:17:48,544 to prevent a bad client from continually asking, 380 00:17:48,544 --> 00:17:50,335 is this the password, is this the password, 381 00:17:50,335 --> 00:17:51,330 is this the password? 382 00:17:51,330 --> 00:17:53,121 So then as a result, it's easier for things 383 00:17:53,121 --> 00:17:55,150 to be on the server side than on the client side. 384 00:17:55,150 --> 00:17:57,340 But suffice it to say, you can, in fact, 385 00:17:57,340 --> 00:17:59,882 do the hash on the client side, 386 00:17:59,882 --> 00:18:01,590 using JavaScript or something like this. 387 00:18:01,590 --> 00:18:03,185 But the basic idea is that somehow you 388 00:18:03,185 --> 00:18:06,770 have to have the computational expense be very, very large, 389 00:18:06,770 --> 00:18:10,620 because that's going to prevent the attacker from just guessing 390 00:18:10,620 --> 00:18:13,617 what the password is quickly. 391 00:18:13,617 --> 00:18:14,700 Is there another question? 392 00:18:14,700 --> 00:18:16,878 AUDIENCE: Well I just wanted to point out 393 00:18:16,878 --> 00:18:18,822 that if the client does the hashing, 394 00:18:18,822 --> 00:18:23,196 then it's [INAUDIBLE] because your password is the hash. 395 00:18:23,196 --> 00:18:25,140 PROFESSOR: So that's true. 396 00:18:25,140 --> 00:18:26,920 AUDIENCE: So if somebody gets the table 397 00:18:26,920 --> 00:18:28,900 from the server [INAUDIBLE] using 398 00:18:28,900 --> 00:18:31,251 it to hash they can log in. 399 00:18:31,251 --> 00:18:32,250 PROFESSOR: That's right.
400 00:18:32,250 --> 00:18:34,041 Yeah, it gets a little bit subtle sometimes 401 00:18:34,041 --> 00:18:37,160 depending on who can pick, for example, 402 00:18:37,160 --> 00:18:38,487 these challenge values. 403 00:18:38,487 --> 00:18:40,820 Because if clients and servers can both pick challenge values, 404 00:18:40,820 --> 00:18:43,130 that makes it more or less difficult for the client 405 00:18:43,130 --> 00:18:44,280 to launch those types of attacks. 406 00:18:44,280 --> 00:18:46,405 So for example, like one problem with this protocol 407 00:18:46,405 --> 00:18:49,700 here is that basically the client 408 00:18:49,700 --> 00:18:54,000 doesn't get to inject any randomness into this. 409 00:18:54,000 --> 00:18:55,500 So you can imagine that you can make 410 00:18:55,500 --> 00:18:59,440 this protocol more difficult for the server to invert 411 00:18:59,440 --> 00:19:01,976 if the client actually got to choose some challenge that 412 00:19:01,976 --> 00:19:04,476 was put in here, so you've got the server-side challenge versus 413 00:19:04,476 --> 00:19:05,720 the client-side challenge. 414 00:19:05,720 --> 00:19:06,886 But you're right about that. 415 00:19:09,110 --> 00:19:11,670 Any other questions? 416 00:19:11,670 --> 00:19:13,790 OK. 417 00:19:13,790 --> 00:19:17,240 So yeah, so this segues into the discussion we were just having. 418 00:19:19,890 --> 00:19:22,960 So even though to break this, the server 419 00:19:22,960 --> 00:19:25,860 would have to invert this hash, the attacker 420 00:19:25,860 --> 00:19:29,132 could still try to do one of these brute force attacks. 421 00:19:29,132 --> 00:19:30,840 So one way that we can prevent the server 422 00:19:30,840 --> 00:19:32,160 from doing these brute force attacks 423 00:19:32,160 --> 00:19:33,876 is to choose one of these expensive hash functions 424 00:19:33,876 --> 00:19:35,060 like we were discussing before.
425 00:19:35,060 --> 00:19:36,559 Another thing, as we just discussed, 426 00:19:36,559 --> 00:19:39,640 is that you could actually allow the client to, 427 00:19:39,640 --> 00:19:44,070 for example, choose its own client chosen challenge 428 00:19:44,070 --> 00:19:44,850 over here. 429 00:19:44,850 --> 00:19:46,225 And so that essentially would act 430 00:19:46,225 --> 00:19:48,960 as like a client chosen salt. So that would essentially 431 00:19:48,960 --> 00:19:50,950 make it more difficult for the attacker 432 00:19:50,950 --> 00:19:52,760 to do things like build up a rainbow table. 433 00:19:52,760 --> 00:19:56,590 Because note that if the server is the attacker here, 434 00:19:56,590 --> 00:19:59,830 the server always can pick the same challenge value again, 435 00:19:59,830 --> 00:20:02,190 again, and again, allowing it to build the rainbow table. 436 00:20:02,190 --> 00:20:04,300 But if when the client responded back, 437 00:20:04,300 --> 00:20:06,870 the client also included some salt, 438 00:20:06,870 --> 00:20:09,086 some client chosen challenge that it included, 439 00:20:09,086 --> 00:20:10,460 then that'll prevent the attacker 440 00:20:10,460 --> 00:20:12,900 from building one of the rainbow tables. 441 00:20:12,900 --> 00:20:15,361 So does that all make sense? 442 00:20:15,361 --> 00:20:15,860 OK. 443 00:20:19,580 --> 00:20:23,300 So yeah, one thing that I mentioned 444 00:20:23,300 --> 00:20:26,920 that might be useful to do is implementing 445 00:20:26,920 --> 00:20:28,222 these anti-hammering defenses. 446 00:20:33,770 --> 00:20:40,560 And so anti-hammering defenses are basically designed to rate 447 00:20:40,560 --> 00:20:50,800 limit the number of password guesses 448 00:20:50,800 --> 00:20:53,630 that a bad client can issue.
449 00:20:59,900 --> 00:21:03,210 Because the idea here is that if you've got some client who's 450 00:21:03,210 --> 00:21:05,320 trying to launch one of these brute force 451 00:21:05,320 --> 00:21:06,754 guesses against the password, you 452 00:21:06,754 --> 00:21:08,670 don't want that client to be able to sit there 453 00:21:08,670 --> 00:21:10,795 in a tight loop and just say, is this the password, 454 00:21:10,795 --> 00:21:12,910 is this the password, is this the password? 455 00:21:12,910 --> 00:21:14,830 So one way we can do anti-hammering 456 00:21:14,830 --> 00:21:16,556 is to just do rate limiting. 457 00:21:16,556 --> 00:21:18,170 So the server will say, I will only 458 00:21:18,170 --> 00:21:21,150 accept let's say three password guesses per second 459 00:21:21,150 --> 00:21:22,650 from any particular client. 460 00:21:22,650 --> 00:21:28,710 You could also imagine implementing timeouts here. 461 00:21:28,710 --> 00:21:31,550 So maybe the client can issue a bunch of password requests 462 00:21:31,550 --> 00:21:33,970 in a row, but then after, let's say, 10 of them are wrong, 463 00:21:33,970 --> 00:21:35,594 the server says, OK you got to hold on, 464 00:21:35,594 --> 00:21:39,340 I will not accept any more requests from you for, 465 00:21:39,340 --> 00:21:42,770 let's say, 10 seconds, something like that. 466 00:21:42,770 --> 00:21:44,610 And so both of these things are designed 467 00:21:44,610 --> 00:21:46,220 for preventing brute force attacks. 468 00:21:46,220 --> 00:21:48,912 And so, for example, like some smart cards have 469 00:21:48,912 --> 00:21:50,860 these types of defenses, some TPMs 470 00:21:50,860 --> 00:21:53,150 have these kinds of defenses to basically protect 471 00:21:53,150 --> 00:21:56,000 against this brute force attack. 472 00:21:56,000 --> 00:21:58,250 So why is it important for you to use 473 00:21:58,250 --> 00:21:59,880 these anti-hammering defenses?
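The timeout variant described here (lock a client out for 10 seconds after 10 wrong guesses) could be sketched roughly as follows. The class name `LoginThrottle`, its parameters, and the idea of keying on a `client_id` are assumptions made for illustration; how a server identifies "a particular client" is exactly the hard part and is not addressed by this sketch.

```python
import time
from collections import defaultdict


class LoginThrottle:
    """Lock a client out for a while after too many consecutive failures."""

    def __init__(self, max_failures=10, lockout_seconds=10.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.lockout_seconds = lockout_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self.failures = defaultdict(int)
        self.locked_until = {}

    def allowed(self, client_id):
        """True if this client may submit a password guess right now."""
        return self.clock() >= self.locked_until.get(client_id, float("-inf"))

    def record(self, client_id, success):
        """Update counters after a guess; start a lockout on too many misses."""
        if success:
            self.failures.pop(client_id, None)
            return
        self.failures[client_id] += 1
        if self.failures[client_id] >= self.max_failures:
            self.locked_until[client_id] = self.clock() + self.lockout_seconds
            self.failures[client_id] = 0
```

A server would call `allowed()` before checking a password and `record()` afterwards. State is kept per client in memory, which is the simplest possible choice and does nothing against an attacker who spreads guesses over many client identities.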
474 00:21:59,880 --> 00:22:01,370 Well one reason why it's important 475 00:22:01,370 --> 00:22:03,570 is, as we discussed, these passwords have 476 00:22:03,570 --> 00:22:05,640 so little entropy. 477 00:22:05,640 --> 00:22:08,110 So because passwords typically have so little entropy, 478 00:22:08,110 --> 00:22:10,337 it's really important to prevent the attacker 479 00:22:10,337 --> 00:22:12,670 from just trying to cycle through that low entropy space 480 00:22:12,670 --> 00:22:13,940 very, very quickly. 481 00:22:13,940 --> 00:22:15,940 So as you may be aware, a lot of websites 482 00:22:15,940 --> 00:22:21,042 have these format constraints that they push upon you 483 00:22:21,042 --> 00:22:22,630 for your passwords. 484 00:22:22,630 --> 00:22:24,437 They'll say things like your password must 485 00:22:24,437 --> 00:22:31,036 have punctuation, it must have a mixture of numbers 486 00:22:31,036 --> 00:22:33,410 and letters, you must have uppercase and lowercase stuff, 487 00:22:33,410 --> 00:22:34,546 so on and so forth. 488 00:22:34,546 --> 00:22:36,920 And so what those constraints are trying to get you to do 489 00:22:36,920 --> 00:22:38,760 is they're trying to get you to expand 490 00:22:38,760 --> 00:22:40,660 the entropy of the password. 491 00:22:40,660 --> 00:22:43,490 But what's problematic though is that it's not really 492 00:22:43,490 --> 00:22:46,210 these format constraints that we should be caring about. 493 00:22:46,210 --> 00:22:48,980 It's the actual entropy of the password itself. 494 00:22:48,980 --> 00:22:51,680 So it turns out even if people were given these constraints-- 495 00:22:51,680 --> 00:22:52,960 like you have to use punctuation characters 496 00:22:52,960 --> 00:22:55,275 and stuff like that-- the entropy of the resulting password 497 00:22:55,275 --> 00:22:56,844 is often quite low. 498 00:22:56,844 --> 00:22:58,885 So for example, people will often put punctuation 499 00:22:58,885 --> 00:22:59,885 at the beginning or end.
500 00:22:59,885 --> 00:23:02,218 Because they don't want to be troubled to remember like, 501 00:23:02,218 --> 00:23:04,900 do I have like a dollar sign in the middle or something? 502 00:23:04,900 --> 00:23:08,720 And so as it turns out, these format requirements oftentimes 503 00:23:08,720 --> 00:23:11,850 don't make dictionary attacks much harder 504 00:23:11,850 --> 00:23:14,070 for a sophisticated adversary. 505 00:23:14,070 --> 00:23:18,240 And the reason is because, basically, the dictionary 506 00:23:18,240 --> 00:23:20,540 attacker can leverage these observations 507 00:23:20,540 --> 00:23:22,720 about how people pick passwords even 508 00:23:22,720 --> 00:23:24,360 in the presence of constraints. 509 00:23:24,360 --> 00:23:26,910 So for example, if the attacker knows that people typically 510 00:23:26,910 --> 00:23:28,630 put punctuation at the beginning or the end, 511 00:23:28,630 --> 00:23:30,720 just incorporate that into your dictionary attack. 512 00:23:30,720 --> 00:23:32,595 And so an actually really interesting website 513 00:23:32,595 --> 00:23:35,995 you can go to that's called Telepathwords. 514 00:23:40,130 --> 00:23:41,770 And so what's neat about this site 515 00:23:41,770 --> 00:23:44,390 is that it has a little text box. 516 00:23:44,390 --> 00:23:46,745 So you can type a character into that text box-- 517 00:23:46,745 --> 00:23:48,870 you're pretending that you're entering a password-- 518 00:23:48,870 --> 00:23:51,070 and Telepathwords will try to guess 519 00:23:51,070 --> 00:23:52,960 what your next character is. 520 00:23:52,960 --> 00:23:54,595 So as you type additional characters, 521 00:23:54,595 --> 00:23:56,800 it'll have a little drop down box which says, 522 00:23:56,800 --> 00:23:59,091 were you going to put this, were you going to put this? 
523 00:23:59,091 --> 00:24:02,380 It will give you a little blurb that says, 524 00:24:02,380 --> 00:24:04,035 here's what I think that you were going 525 00:24:04,035 --> 00:24:05,650 to enter next in this password. 526 00:24:05,650 --> 00:24:07,290 So how does Telepathwords work? 527 00:24:07,290 --> 00:24:09,350 So it basically has a bunch of databases. 528 00:24:09,350 --> 00:24:11,705 It has a database of common passwords. 529 00:24:15,030 --> 00:24:21,930 It also has a list of popular phrases 530 00:24:21,930 --> 00:24:25,504 that it's taken from websites. 531 00:24:25,504 --> 00:24:28,040 And it also has this set of heuristics 532 00:24:28,040 --> 00:24:36,570 which describe common user biases in picking passwords. 533 00:24:36,570 --> 00:24:38,210 So for example, one funny bias is 534 00:24:38,210 --> 00:24:39,796 that people will often-- when they 535 00:24:39,796 --> 00:24:41,170 are faced with these constraints 536 00:24:41,170 --> 00:24:43,503 that say you must use punctuation, stuff like that-- a lot 537 00:24:43,503 --> 00:24:47,460 of times when they're picking characters for the password, 538 00:24:47,460 --> 00:24:50,994 they will use keys that are adjacent to each other. 539 00:24:50,994 --> 00:24:52,660 So in other words, there'll be a very small 540 00:24:52,660 --> 00:24:54,690 edit distance in physical space with respect 541 00:24:54,690 --> 00:24:56,920 to edit distance in the actual password. 542 00:24:56,920 --> 00:24:59,510 So what Telepathwords does is it has the databases here, 543 00:24:59,510 --> 00:25:01,720 so when you type in things it's running these models. 544 00:25:01,720 --> 00:25:02,670 And it's saying, statistically speaking, 545 00:25:02,670 --> 00:25:05,424 here's the most likely thing that you're going to type next. 546 00:25:05,424 --> 00:25:07,652 So it's almost like auto complete for passwords.
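To make the "auto complete for passwords" idea concrete, here is a toy next-character predictor. It is not how Telepathwords is implemented; it only uses the first ingredient mentioned above (a database of common passwords), and the tiny `COMMON_PASSWORDS` list and the `predict_next` function are hypothetical stand-ins.

```python
from collections import Counter

# Hypothetical stand-in for a database of common passwords.
COMMON_PASSWORDS = ["password", "password1", "passw0rd", "123456", "qwerty", "letmein"]


def predict_next(prefix, corpus=COMMON_PASSWORDS, top=3):
    """Rank the characters that most often follow `prefix` in the corpus."""
    counts = Counter(
        word[len(prefix)]
        for word in corpus
        if word.startswith(prefix) and len(word) > len(prefix)
    )
    return [ch for ch, _ in counts.most_common(top)]


print(predict_next("passw"))     # candidates after "passw"
print(predict_next("password"))  # candidates after "password"
```

An attacker can run the same kind of model in reverse: instead of guessing the next character for display, enumerate full passwords in order of predicted likelihood.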
547 00:25:07,652 --> 00:25:09,235 And so what's funny is that this shows 548 00:25:09,235 --> 00:25:11,151 once again that if you have these constraints, 549 00:25:11,151 --> 00:25:14,150 they actually don't protect you that much if there are some 550 00:25:14,150 --> 00:25:17,500 of these underlying a priori distributions of things 551 00:25:17,500 --> 00:25:19,870 that the attacker can leverage. 552 00:25:19,870 --> 00:25:21,766 I think there was a question? 553 00:25:21,766 --> 00:25:25,970 AUDIENCE: Yeah so it seems like if an attacker is 554 00:25:25,970 --> 00:25:28,162 too sophisticated that they could 555 00:25:28,162 --> 00:25:31,571 try guessing like a bunch of IP addresses and things 556 00:25:31,571 --> 00:25:34,980 which only would prevent hammering [INAUDIBLE]. 557 00:25:42,684 --> 00:25:44,100 PROFESSOR: Yeah, it's very tricky. 558 00:25:44,100 --> 00:25:45,100 Now that's a good point. 559 00:25:45,100 --> 00:25:47,659 So anti-hammering basically asks, well, 560 00:25:47,659 --> 00:25:50,500 what's the scope of the attack that you're trying to prevent? 561 00:25:50,500 --> 00:25:54,055 So if you're concerned about distributed attackers 562 00:25:54,055 --> 00:25:57,250 and a network system, it does become very, very subtle. 563 00:25:57,250 --> 00:26:00,202 And suffice it to say that the notion of anti-hammering 564 00:26:00,202 --> 00:26:02,410 or [INAUDIBLE] systems, and also the notion of things 565 00:26:02,410 --> 00:26:05,080 like click fraud, for example. 566 00:26:05,080 --> 00:26:06,700 So in other words, how does someone 567 00:26:06,700 --> 00:26:08,590 who's running an advertising campaign online 568 00:26:08,590 --> 00:26:10,665 determine if someone's actually clicking the link 569 00:26:10,665 --> 00:26:13,070 and actually paying someone for those clicks, versus 570 00:26:13,070 --> 00:26:15,560 this is just a spammer who's got some box just sitting 571 00:26:15,560 --> 00:26:17,200 there clicking on stuff.
572 00:26:17,200 --> 00:26:19,241 So suffice it to say there's a lot of distributed 573 00:26:19,241 --> 00:26:21,690 heuristics that try to solve those problems. 574 00:26:21,690 --> 00:26:23,980 And in many cases, it's not a science, it's an art. 575 00:26:23,980 --> 00:26:26,480 But your [INAUDIBLE] correct and in the distributed setting, 576 00:26:26,480 --> 00:26:30,980 things get much more difficult. All right, 577 00:26:30,980 --> 00:26:32,930 so does this all make sense? 578 00:26:32,930 --> 00:26:35,330 AUDIENCE: What about the cryptographic anti-hammering 579 00:26:35,330 --> 00:26:36,770 defenses? 580 00:26:36,770 --> 00:26:40,800 Most of the time you end up sending a hash on the line 581 00:26:40,800 --> 00:26:44,855 [INAUDIBLE] that when you get out of it 582 00:26:44,855 --> 00:26:46,595 is exactly what you would get out 583 00:26:46,595 --> 00:26:48,178 the password of the hashable password? 584 00:26:50,571 --> 00:26:52,490 I know there are protocols like SRP 585 00:26:52,490 --> 00:26:56,160 or there are some zero knowledge protocols. 586 00:26:56,160 --> 00:26:57,062 PROFESSOR: Yeah, so-- 587 00:26:57,062 --> 00:26:58,520 AUDIENCE: That you use in practice? 588 00:26:58,520 --> 00:26:59,311 PROFESSOR: They do. 589 00:27:01,820 --> 00:27:03,980 Those protocols provide some stronger 590 00:27:03,980 --> 00:27:05,160 cryptographic guarantees. 591 00:27:05,160 --> 00:27:06,500 A lot of times they are not backwards 592 00:27:06,500 --> 00:27:08,900 compatible with current systems, which is why in practice you 593 00:27:08,900 --> 00:27:09,470 don't see them used a lot. 594 00:27:09,470 --> 00:27:10,928 But yeah, there are some protocols, 595 00:27:10,928 --> 00:27:14,900 for example, that allow the server to not 596 00:27:14,900 --> 00:27:17,840 have any notion of the password at all. 597 00:27:17,840 --> 00:27:20,220 So there's some zero knowledge type thing or whatever. 598 00:27:20,220 --> 00:27:21,719 So those things do work in practice.
599 00:27:21,719 --> 00:27:24,505 But one of the things that this paper says that is very interesting 600 00:27:24,505 --> 00:27:26,880 is that they basically go through all these authentication 601 00:27:26,880 --> 00:27:29,190 schemes and they say, OK, here's passwords. 602 00:27:29,190 --> 00:27:30,190 Yeah, they kind of suck. 603 00:27:30,190 --> 00:27:31,360 Here's some other things that are actually 604 00:27:31,360 --> 00:27:32,770 much stronger on the security axis, 605 00:27:32,770 --> 00:27:35,500 but then they all fail on deployability or usability 606 00:27:35,500 --> 00:27:36,560 and things like that. 607 00:27:36,560 --> 00:27:39,970 And so that's one of the interesting and slightly sad 608 00:27:39,970 --> 00:27:41,890 outcomes of this paper that maybe 609 00:27:41,890 --> 00:27:44,185 even though we have all these much stronger security 610 00:27:44,185 --> 00:27:46,680 protocols, we can't deploy them 611 00:27:46,680 --> 00:27:50,164 for some usability reasons or some [INAUDIBLE] reason. 612 00:27:54,440 --> 00:27:56,277 So that's just a fun site to go to, right. 613 00:27:56,277 --> 00:27:58,360 So they claim that they don't store your passwords 614 00:27:58,360 --> 00:28:00,660 so you take them at their word if you want to. 615 00:28:00,660 --> 00:28:03,520 But it is very interesting to just sit down and think like, 616 00:28:03,520 --> 00:28:04,870 what password would I generate? 617 00:28:04,870 --> 00:28:07,340 And then type into this, and see how accurate 618 00:28:07,340 --> 00:28:09,685 it is in guessing what the next thing will be. 619 00:28:09,685 --> 00:28:12,090 It even covers things like the popular heuristic 620 00:28:12,090 --> 00:28:15,760 like take a popular phrase that has multiple words, 621 00:28:15,760 --> 00:28:18,180 and then only take the first letter of each word. 622 00:28:18,180 --> 00:28:19,650 So this thing is very, very good. 623 00:28:19,650 --> 00:28:21,100 Very, very scary too.
624 00:28:21,100 --> 00:28:23,402 OK so that's Telepathwords. 625 00:28:23,402 --> 00:28:25,110 And so one thing that is also interesting 626 00:28:25,110 --> 00:28:30,070 to think about in your password scheme 627 00:28:30,070 --> 00:28:33,760 is, is it vulnerable to offline guessing? 628 00:28:37,290 --> 00:28:43,740 So this was a problem that Kerberos V4 had, 629 00:28:43,740 --> 00:28:51,550 and then also V5 without this thing they call preauth. 630 00:28:51,550 --> 00:28:55,090 So the basic idea is that in these versions of Kerberos, 631 00:28:55,090 --> 00:28:58,530 anyone could ask the KDC for a ticket that would be encrypted 632 00:28:58,530 --> 00:29:00,610 with the user's password. 633 00:29:00,610 --> 00:29:04,149 So basically, the KDC did not authenticate requests 634 00:29:04,149 --> 00:29:05,440 that were coming from a client. 635 00:29:05,440 --> 00:29:07,500 Now the thing that the KDC would return 636 00:29:07,500 --> 00:29:12,180 was, in fact-- there are some set of bits 637 00:29:12,180 --> 00:29:13,980 here that the KDC would return. 638 00:29:13,980 --> 00:29:16,275 I'm sure you don't want to think about this ugly set 639 00:29:16,275 --> 00:29:17,340 of cryptographic parameters anymore. 640 00:29:17,340 --> 00:29:18,839 But suffice it to say, the KDC would 641 00:29:18,839 --> 00:29:21,430 return this stuff that was encrypted 642 00:29:21,430 --> 00:29:24,490 with the key of the client. 643 00:29:24,490 --> 00:29:26,510 That's what will come back to the client side. 644 00:29:26,510 --> 00:29:30,420 So the problem with this is that because the server did not 645 00:29:30,420 --> 00:29:34,730 check who it was sending this encrypted set of things to, 646 00:29:34,730 --> 00:29:38,520 the attacker can basically get this thing here and then 647 00:29:38,520 --> 00:29:40,900 try to just guess what KC is.
648 00:29:40,900 --> 00:29:43,856 Just guess that KC is some value, try to decrypt this, 649 00:29:43,856 --> 00:29:44,980 see if it looks reasonable. 650 00:29:44,980 --> 00:29:47,720 If not, try to guess another KC, decrypt this, 651 00:29:47,720 --> 00:29:48,970 see if it looks reasonable. 652 00:29:48,970 --> 00:29:52,270 And the reason why the attacker can launch this type of attack, 653 00:29:52,270 --> 00:29:54,950 is that this thing here, this TGT actually 654 00:29:54,950 --> 00:29:57,370 has a known format. 655 00:29:57,370 --> 00:29:59,420 So it has things in here like timestamps, 656 00:29:59,420 --> 00:30:02,010 and it has things in here like various length fields that would have 657 00:30:02,010 --> 00:30:03,870 to be internally consistent. 658 00:30:03,870 --> 00:30:06,970 And so that basically helps the attacker. 659 00:30:06,970 --> 00:30:10,380 Because if the attacker guesses the KC, gets this thing here, 660 00:30:10,380 --> 00:30:12,550 a decrypted thing, and the internal fields 661 00:30:12,550 --> 00:30:14,600 don't check out, the attacker knows 662 00:30:14,600 --> 00:30:16,453 that it picked the wrong KC, so they 663 00:30:16,453 --> 00:30:18,480 can go on and pick another KC. 664 00:30:18,480 --> 00:30:24,570 And so, in Kerberos V5, basically the client, 665 00:30:24,570 --> 00:30:30,330 in this thing that it sends over to the KDC, 666 00:30:30,330 --> 00:30:36,790 basically has to send a time stamp. 667 00:30:36,790 --> 00:30:40,900 And then this time stamp is going to be encrypted with KC. 668 00:30:40,900 --> 00:30:43,230 So this is sent to the server, and the server 669 00:30:43,230 --> 00:30:46,240 looks at this and validates that before it will send something 670 00:30:46,240 --> 00:30:47,280 back to the client. 671 00:30:47,280 --> 00:30:49,930 So that gets rid of this problem that any random client 672 00:30:49,930 --> 00:30:53,354 can show up and just ask for this thing here.
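The offline guessing loop described here (guess KC, decrypt, check whether the known format is internally consistent) can be illustrated with a toy model. Everything in this sketch is an assumption for illustration: the `TGT1` tag, the field layout, the SHA-256 key derivation, and the XOR "cipher" are invented and deliberately not secure, and none of them match real Kerberos message formats or cryptography.

```python
import hashlib
import struct


def derive_key(password: str) -> bytes:
    # Stand-in for deriving KC from the password; real Kerberos differs.
    return hashlib.sha256(password.encode()).digest()


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 keystream. Toy only, NOT a secure cipher."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + struct.pack(">I", counter)).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def make_reply(password: str, timestamp: int, payload: bytes) -> bytes:
    # Known format: 4-byte tag, 4-byte timestamp, 2-byte length, payload.
    blob = b"TGT1" + struct.pack(">IH", timestamp, len(payload)) + payload
    return toy_cipher(derive_key(password), blob)


def looks_valid(blob: bytes) -> bool:
    """The attacker's check: do the internal fields agree with each other?"""
    if len(blob) < 10 or blob[:4] != b"TGT1":
        return False
    _, length = struct.unpack(">IH", blob[4:10])
    return length == len(blob) - 10


def offline_guess(reply: bytes, dictionary):
    """Try each candidate password without ever contacting the server."""
    for guess in dictionary:
        if looks_valid(toy_cipher(derive_key(guess), reply)):
            return guess
    return None


reply = make_reply("hunter2", 1_700_000_000, b"session-key")
print(offline_guess(reply, ["123456", "password", "hunter2"]))
```

The point of the sketch is that the loop never talks to the server, so no server-side anti-hammering defense can slow it down. Requiring the client to first prove knowledge of KC, as with the encrypted time stamp, stops an arbitrary client from obtaining `reply` in the first place.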
673 00:30:56,840 --> 00:31:00,824 AUDIENCE: So is the time stamp recorded in the message? 674 00:31:00,824 --> 00:31:04,657 So can't the attacker just get this message and brute force it? 675 00:31:04,657 --> 00:31:05,740 PROFESSOR: Let's see here. 676 00:31:05,740 --> 00:31:09,670 So can't the attacker get this message here? 677 00:31:09,670 --> 00:31:11,902 AUDIENCE: Yeah, the encryption [INAUDIBLE]. 678 00:31:11,902 --> 00:31:14,360 PROFESSOR: So you're thinking where the attacker might just 679 00:31:14,360 --> 00:31:15,500 spoof this, for example? 680 00:31:15,500 --> 00:31:19,227 AUDIENCE: No, I just brute force it and get KC out. 681 00:31:19,227 --> 00:31:19,810 PROFESSOR: OK. 682 00:31:19,810 --> 00:31:21,185 So in other words, you're worried 683 00:31:21,185 --> 00:31:22,954 someone could observe this. 684 00:31:22,954 --> 00:31:23,620 AUDIENCE: Right. 685 00:31:23,620 --> 00:31:25,090 PROFESSOR: So I believe that this 686 00:31:25,090 --> 00:31:29,166 is put inside an encrypted thing that belongs to the server, 687 00:31:29,166 --> 00:31:30,540 or the key belongs to the server. 688 00:31:30,540 --> 00:31:32,331 I think to prevent that attack. [INAUDIBLE] 689 00:31:32,331 --> 00:31:34,390 so don't quote me on that. 690 00:31:34,390 --> 00:31:36,250 But you're correct it's not, for example. 691 00:31:36,250 --> 00:31:37,625 And if the attacker, for example, 692 00:31:37,625 --> 00:31:39,890 knew something about what the current time is, 693 00:31:39,890 --> 00:31:42,400 roughly, that actually is super useful. 694 00:31:42,400 --> 00:31:44,190 Because then the attacker can guess, 695 00:31:44,190 --> 00:31:46,815 oh, time stamp should be roughly between here and here. 696 00:31:46,815 --> 00:31:48,190 And if it sees it's in the clear, 697 00:31:48,190 --> 00:31:50,357 it can do the exact same attack that we had up here.
698 00:31:50,357 --> 00:31:52,648 AUDIENCE: It's a little better because the attacker has 699 00:31:52,648 --> 00:31:54,712 to be in the middle, but it's still susceptible. 700 00:31:54,712 --> 00:31:55,670 PROFESSOR: That's true. 701 00:31:55,670 --> 00:31:57,150 Well, yeah, that's right, the attacker 702 00:31:57,150 --> 00:31:58,770 has to be on the network somewhere so 703 00:31:58,770 --> 00:32:00,370 this [INAUDIBLE] stuff. 704 00:32:00,370 --> 00:32:00,946 That's right. 705 00:32:04,070 --> 00:32:06,350 So that's all, I'm guessing. 706 00:32:06,350 --> 00:32:09,130 So another thing that's important to think about 707 00:32:09,130 --> 00:32:14,580 is password recovery. 708 00:32:18,510 --> 00:32:20,950 So this is the idea that you lose your password, 709 00:32:20,950 --> 00:32:23,380 and then somehow you have to go to the service 710 00:32:23,380 --> 00:32:26,636 and you have to ask for another password. 711 00:32:26,636 --> 00:32:28,010 But before you get that password, 712 00:32:28,010 --> 00:32:30,220 you have to prove that you are you in some way. 713 00:32:30,220 --> 00:32:31,290 So how does that work? 714 00:32:31,290 --> 00:32:32,650 How to do password recovery? 715 00:32:32,650 --> 00:32:35,940 So what's interesting is that people oftentimes 716 00:32:35,940 --> 00:32:39,190 focus on the entropy of the password itself. 717 00:32:39,190 --> 00:32:43,430 But the problem is that if the password recovery 718 00:32:43,430 --> 00:32:45,570 questions or the password recovery scheme 719 00:32:45,570 --> 00:32:47,420 has little entropy, that actually 720 00:32:47,420 --> 00:32:50,113 affects the entropy of the overall authentication scheme. 721 00:32:50,113 --> 00:32:55,240 So in other words, the strength of the overall scheme 722 00:32:55,240 --> 00:32:58,520 is basically equal to the minimum 723 00:32:58,520 --> 00:33:07,440 of the password entropy and the recovery question entropy.
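A small worked example of the "minimum" rule just stated. The numbers are made up purely for illustration: an 8-letter random lowercase password, and a hypothetical skewed distribution of answers to a "favorite color" question. Shannon entropy is used as a rough proxy for guessing difficulty, which is itself a simplification.

```python
import math


def shannon_bits(probabilities):
    """Shannon entropy, in bits, of a distribution over possible answers."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical numbers, chosen only to illustrate the point.
password_bits = math.log2(26 ** 8)  # 8 uniformly random lowercase letters
color_bits = shannon_bits([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])  # skewed answers

# The attacker goes after whichever path is weaker.
overall_bits = min(password_bits, color_bits)

print(f"password: {password_bits:.1f} bits")
print(f"recovery: {color_bits:.1f} bits")
print(f"overall:  {overall_bits:.1f} bits")
```

Under these assumed numbers the password contributes roughly 37 to 38 bits while the recovery question contributes only a little over 2, so the recovery path dominates the strength of the whole scheme.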
724 00:33:11,589 --> 00:33:13,960 And so you see this actually play out 725 00:33:13,960 --> 00:33:16,005 in a lot of real-world scenarios. 726 00:33:16,005 --> 00:33:18,380 There's a lot of famous cases, like the Sarah Palin case, 727 00:33:18,380 --> 00:33:21,700 where basically someone was able to recover 728 00:33:21,700 --> 00:33:25,300 her password fraudulently because her recovery 729 00:33:25,300 --> 00:33:28,029 questions were things that any random person could find. 730 00:33:28,029 --> 00:33:30,070 By looking at her Wikipedia article, for example, 731 00:33:30,070 --> 00:33:32,880 find out where she went to high school and things like that. 732 00:33:32,880 --> 00:33:35,840 And so oftentimes these password recovery questions 733 00:33:35,840 --> 00:33:36,950 are not very good. 734 00:33:36,950 --> 00:33:39,980 And they're not very good because of a couple reasons. 735 00:33:39,980 --> 00:33:44,560 So sometimes these things just have very low entropy. 736 00:33:44,560 --> 00:33:46,990 So if you have a password recovery question that 737 00:33:46,990 --> 00:33:49,610 is something like, what's your favorite color, 738 00:33:49,610 --> 00:33:52,190 the most popular answers are going to be like blue and red. 739 00:33:52,190 --> 00:33:55,300 Nobody's going to say like off white, fuchsia, magenta. 740 00:33:55,300 --> 00:33:57,150 So some of these recovery questions 741 00:33:57,150 --> 00:34:01,035 intrinsically are very difficult to provide a lot of entropy 742 00:34:01,035 --> 00:34:01,770 for. 743 00:34:01,770 --> 00:34:05,140 The other problem is that sometimes the answers to these 744 00:34:05,140 --> 00:34:11,560 recovery questions can be leaked via social media. 745 00:34:11,560 --> 00:34:14,270 So for example, if one of the recovery questions 746 00:34:14,270 --> 00:34:16,020 is what's your favorite movie?
747 00:34:16,020 --> 00:34:18,170 So maybe the space there is a little bit bigger, 748 00:34:18,170 --> 00:34:20,540 but if intrinsically I can go look at, let's say, 749 00:34:20,540 --> 00:34:22,530 your IMDB profile, your Facebook profile, 750 00:34:22,530 --> 00:34:24,482 and figure out like, oh hey, you literally 751 00:34:24,482 --> 00:34:25,940 told me that's your favorite movie, 752 00:34:25,940 --> 00:34:27,820 this isn't super useful either. 753 00:34:27,820 --> 00:34:29,500 And another problem-- this is actually 754 00:34:29,500 --> 00:34:32,270 sort of the funniest one-- is that the user 755 00:34:32,270 --> 00:34:38,270 selected recovery questions are often super weak. 756 00:34:38,270 --> 00:34:42,396 So for example, people have done a survey 757 00:34:42,396 --> 00:34:44,520 of what some of these recovery questions look like, 758 00:34:44,520 --> 00:34:46,370 and sometimes users themselves will 759 00:34:46,370 --> 00:34:51,820 set recovery questions that are things like what is 2 plus 3? 760 00:34:51,820 --> 00:34:55,000 And so, at the time, the user's thinking this is a big hassle, 761 00:34:55,000 --> 00:34:56,409 we're going to have to use this. 762 00:34:56,409 --> 00:34:59,680 But trivially most humans who pass the Turing Test 763 00:34:59,680 --> 00:35:01,848 can answer that question successfully. 764 00:35:01,848 --> 00:35:04,842 And then therefore get the user's password back. 765 00:35:04,842 --> 00:35:12,340 AUDIENCE: So [INAUDIBLE] like using recovery passwords? 766 00:35:12,340 --> 00:35:16,462 It's basically like you enter in your name and maybe the subject 767 00:35:16,462 --> 00:35:18,891 of some emails that you've sent, like a small amount 768 00:35:18,891 --> 00:35:19,974 of additional information. 769 00:35:19,974 --> 00:35:21,979 But based on that, in some cases they 770 00:35:21,979 --> 00:35:26,200 can-- is security of that kind of stuff then?
771 00:35:26,200 --> 00:35:28,771 PROFESSOR: So I don't know of any formal study like that. 772 00:35:28,771 --> 00:35:30,396 Those things are actually a lot better. 773 00:35:30,396 --> 00:35:32,770 I actually know this, because I was 774 00:35:32,770 --> 00:35:35,000 trying to help a friend go through this process. 775 00:35:35,000 --> 00:35:38,630 So she basically lost control of her Gmail account, 776 00:35:38,630 --> 00:35:40,880 and she was trying to prove that this was her account. 777 00:35:40,880 --> 00:35:43,840 And so yeah, they would ask you things like roughly speaking, 778 00:35:43,840 --> 00:35:46,100 when did you open this account. 779 00:35:46,100 --> 00:35:48,573 Roughly speaking before you lost control of this account 780 00:35:48,573 --> 00:35:52,770 to hesball or whatever, who were some of the people 781 00:35:52,770 --> 00:35:54,205 that you talked to? 782 00:35:54,205 --> 00:35:55,080 And things like that. 783 00:35:55,080 --> 00:35:57,187 And it's actually a pretty laborious process. 784 00:35:57,187 --> 00:35:59,520 What ends up happening is that you're generally correct, 785 00:35:59,520 --> 00:36:01,950 it ends up being much more powerful than this stuff. 786 00:36:01,950 --> 00:36:04,920 And so actually I don't know of any formal studies of that, 787 00:36:04,920 --> 00:36:06,656 but it does seem [INAUDIBLE] much stronger 788 00:36:06,656 --> 00:36:07,886 than these types of things. 789 00:36:11,259 --> 00:36:12,550 All right, any other questions? 790 00:36:16,350 --> 00:36:20,810 Now we can get to the paper for today. 791 00:36:20,810 --> 00:36:24,010 So in the reading for today, the authors have basically 792 00:36:24,010 --> 00:36:28,610 proposed a bunch of factors that can be used to evaluate 793 00:36:28,610 --> 00:36:30,465 these authentication schemes.
794 00:36:30,465 --> 00:36:32,506 And what's really cool about this paper, I think, 795 00:36:32,506 --> 00:36:35,010 is that it basically tries to say, look, a lot of us 796 00:36:35,010 --> 00:36:37,460 in the security community are fighting just 797 00:36:37,460 --> 00:36:38,710 based on aesthetic principles. 798 00:36:38,710 --> 00:36:41,020 Like, we should pick this because I just 799 00:36:41,020 --> 00:36:43,260 like the way that the curly braces look in the proof. 800 00:36:43,260 --> 00:36:46,161 We should pick this because it uses a lot of math mode. 801 00:36:46,161 --> 00:36:48,660 And so what they say is, look, why don't we try to establish 802 00:36:48,660 --> 00:36:50,050 some type of criteria? 803 00:36:50,050 --> 00:36:52,510 Maybe some of the criteria are a little bit subjective. 804 00:36:52,510 --> 00:36:54,630 Let's just try to have this taxonomy of ways 805 00:36:54,630 --> 00:36:56,620 to evaluate the authentication scheme. 806 00:36:56,620 --> 00:36:59,900 And let's just see how these various schemes stack up. 807 00:36:59,900 --> 00:37:03,060 And so the authors basically proposed three high level 808 00:37:03,060 --> 00:37:05,660 metrics for evaluating these schemes. 809 00:37:05,660 --> 00:37:11,910 And so, the first metric is usability. 810 00:37:11,910 --> 00:37:13,950 And so, the base idea here is how 811 00:37:13,950 --> 00:37:16,520 easy is it for users to interact with this authentication 812 00:37:16,520 --> 00:37:17,620 scheme. 813 00:37:17,620 --> 00:37:20,000 So they find a couple interesting properties. 814 00:37:20,000 --> 00:37:23,820 So for example, is it easy to learn? 815 00:37:26,580 --> 00:37:29,679 This basically just means is this scheme easy to learn? 816 00:37:29,679 --> 00:37:31,970 So some of these categories are pretty straightforward. 817 00:37:31,970 --> 00:37:33,830 Some of them actually involve a little bit of subtlety. 818 00:37:33,830 --> 00:37:35,512 But this one makes a lot of sense. 
819 00:37:35,512 --> 00:37:43,710 And so if we look at passwords, passwords pass this test. 820 00:37:43,710 --> 00:37:48,460 Because everybody is used to using passwords, so we'll say 821 00:37:48,460 --> 00:37:49,550 they are easy to learn. 822 00:37:49,550 --> 00:37:54,480 Another category is infrequent errors. 823 00:37:54,480 --> 00:37:56,480 So that means when you are trying 824 00:37:56,480 --> 00:37:58,583 to authenticate to the system, if you 825 00:37:58,583 --> 00:38:01,189 are the actual user in question, is it 826 00:38:01,189 --> 00:38:03,230 the case that you can often authenticate yourself 827 00:38:03,230 --> 00:38:04,990 without generating errors? 828 00:38:04,990 --> 00:38:09,050 And so, here the authors say quasi-yes. 829 00:38:12,970 --> 00:38:15,316 And so the quasi prefix is one of the more entertaining 830 00:38:15,316 --> 00:38:17,190 aspects of the paper, because the authors kind of 831 00:38:17,190 --> 00:38:20,010 admit there's this element of subjectivity to it. 832 00:38:20,010 --> 00:38:24,350 So we can't necessarily say with crisp precision yes, no, things 833 00:38:24,350 --> 00:38:25,020 like this. 834 00:38:25,020 --> 00:38:26,760 So the reason why they say quasi-yes 835 00:38:26,760 --> 00:38:30,120 is because, in general, you can authenticate with a password 836 00:38:30,120 --> 00:38:30,700 successfully. 837 00:38:30,700 --> 00:38:33,109 But we've all been in that place where it's like 3 AM, 838 00:38:33,109 --> 00:38:34,900 we're trying to log on to our email server, 839 00:38:34,900 --> 00:38:36,060 our mind's not in the right place, 840 00:38:36,060 --> 00:38:38,060 and we make errors a bunch of times. 841 00:38:38,060 --> 00:38:41,030 So they say quasi-yes for this. 842 00:38:41,030 --> 00:38:46,510 Another category is, is it scalable for users?
843 00:38:50,006 --> 00:38:54,867 And so the basic idea here is if the user has 844 00:38:54,867 --> 00:38:56,950 a bunch of different services that he or she wants 845 00:38:56,950 --> 00:39:01,160 to authenticate to, does this scheme scale well? 846 00:39:01,160 --> 00:39:04,110 Does the user have to remember some new thing 847 00:39:04,110 --> 00:39:06,290 for each one of the services? 848 00:39:06,290 --> 00:39:11,200 And so, here, the authors say no. 849 00:39:11,200 --> 00:39:14,480 Because in practice, it's very difficult for users 850 00:39:14,480 --> 00:39:18,130 to remember a separate password for every single site 851 00:39:18,130 --> 00:39:18,880 that they go to. 852 00:39:18,880 --> 00:39:21,500 This is one reason actually why people reuse their passwords 853 00:39:21,500 --> 00:39:23,660 often. 854 00:39:23,660 --> 00:39:27,216 So another usability property is easy recovery. 855 00:39:30,370 --> 00:39:34,230 So what happens if you lose your authentication 856 00:39:34,230 --> 00:39:37,160 token-- in this case, your password-- is it easy to reset? 857 00:39:37,160 --> 00:39:42,060 And in this case, the answer for passwords is yes. 858 00:39:42,060 --> 00:39:44,670 In fact, they are probably too easy to reset, 859 00:39:44,670 --> 00:39:46,620 as we just discussed a couple minutes ago. 860 00:39:46,620 --> 00:39:49,690 So that's a yes. 861 00:39:49,690 --> 00:39:52,210 And so another interesting one is nothing to carry. 862 00:39:54,730 --> 00:39:58,690 So a lot of the more baroque authentication protocols 863 00:39:58,690 --> 00:40:01,190 require that you run some smartphone app, 864 00:40:01,190 --> 00:40:03,880 or you have some security token or smart card or things 865 00:40:03,880 --> 00:40:04,790 like that. 866 00:40:04,790 --> 00:40:07,370 So that's a burden.
867 00:40:07,370 --> 00:40:08,870 Maybe not with a smartphone so much, 868 00:40:08,870 --> 00:40:11,350 but having to carry around one of these other gadgets is 869 00:40:11,350 --> 00:40:12,310 probably a pain. 870 00:40:12,310 --> 00:40:17,300 And so this is actually one nice feature of passwords, 871 00:40:17,300 --> 00:40:20,340 you basically only have to carry them around in your brain, 872 00:40:20,340 --> 00:40:22,570 which is something that you should have at all moments. 873 00:40:22,570 --> 00:40:25,427 So that's basically what usability looks like. 874 00:40:25,427 --> 00:40:27,010 It is very interesting at a high level 875 00:40:27,010 --> 00:40:30,600 that a lot of times these sort of factors 876 00:40:30,600 --> 00:40:33,705 are given a little bit of short shrift in the security 877 00:40:33,705 --> 00:40:36,080 community when people are evaluating these schemes. 878 00:40:36,080 --> 00:40:38,770 They say, oh, this thing uses like a million bits of entropy, 879 00:40:38,770 --> 00:40:41,090 and can only be broken by the Death Star or whatever. 880 00:40:41,090 --> 00:40:42,464 But then people don't necessarily 881 00:40:42,464 --> 00:40:46,040 remember these are actually very important factors too. 882 00:40:46,040 --> 00:40:52,550 OK, so the next high-level category 883 00:40:52,550 --> 00:40:56,210 that the authors use to evaluate authentication schemes 884 00:40:56,210 --> 00:40:58,350 is deployability. 885 00:40:58,350 --> 00:41:00,652 So the basic idea here is how easy 886 00:41:00,652 --> 00:41:05,940 is it to incorporate this system into current web services. 887 00:41:05,940 --> 00:41:07,890 So one thing they look at, for example, 888 00:41:07,890 --> 00:41:12,753 is, is it server compatible? 889 00:41:16,050 --> 00:41:18,350 And this basically means can I easily integrate 890 00:41:18,350 --> 00:41:22,200 this scheme with today's servers, which are based 891 00:41:22,200 --> 00:41:24,230 around text-based passwords?
892 00:41:24,230 --> 00:41:27,440 And so since success here is defined with respect 893 00:41:27,440 --> 00:41:30,820 to passwords, passwords succeed. 894 00:41:30,820 --> 00:41:35,700 So another metric is browser compatibility. 895 00:41:35,700 --> 00:41:37,225 Similar type of thing. 896 00:41:37,225 --> 00:41:41,130 Can I use this scheme with current off-the-shelf browsers 897 00:41:41,130 --> 00:41:44,390 without having to install a plug-in, something like that? 898 00:41:44,390 --> 00:41:48,408 Once again, passwords win by default. 899 00:41:48,408 --> 00:41:50,396 And another interesting one is accessibility. 900 00:41:54,870 --> 00:41:58,802 So can people who can use passwords now, but maybe 901 00:41:58,802 --> 00:42:01,010 have some type of physical disability-- maybe they're 902 00:42:01,010 --> 00:42:03,987 blind, or they can't hear well, or they can't gesture well, 903 00:42:03,987 --> 00:42:04,820 or things like that. 904 00:42:04,820 --> 00:42:07,050 Can they actually use this scheme? 905 00:42:07,050 --> 00:42:08,580 This is actually pretty important. 906 00:42:08,580 --> 00:42:12,462 So once again, the authors say yes. 907 00:42:12,462 --> 00:42:14,420 It's a little bit weird, because it's not clear 908 00:42:14,420 --> 00:42:16,880 that all people with all disabilities can use passwords, 909 00:42:16,880 --> 00:42:20,470 but they say yes here. 910 00:42:20,470 --> 00:42:22,690 So yes, so these are three interesting things 911 00:42:22,690 --> 00:42:24,890 to think about with respect to deployability. 912 00:42:24,890 --> 00:42:26,960 And the reason why this deployability category 913 00:42:26,960 --> 00:42:29,940 is so important is because it's very difficult to get anyone 914 00:42:29,940 --> 00:42:33,220 to upgrade anything ever. 915 00:42:33,220 --> 00:42:35,800 I mean, people don't even want to reboot their machines 916 00:42:35,800 --> 00:42:38,155 and get a new OS update installed.
917 00:42:38,155 --> 00:42:40,780 So it's very difficult, if the scheme requires changes 918 00:42:40,780 --> 00:42:42,749 on the server, to get people on the server side 919 00:42:42,749 --> 00:42:44,040 to actually do different stuff. 920 00:42:44,040 --> 00:42:45,340 This goes back to your question, why don't we 921 00:42:45,340 --> 00:42:46,480 use these better things? 922 00:42:46,480 --> 00:42:47,590 Because deployability in many cases 923 00:42:47,590 --> 00:42:49,089 is super, super important to people. 924 00:42:51,920 --> 00:42:56,450 All right, so then the final category that we will look at 925 00:42:56,450 --> 00:42:57,125 is security. 926 00:43:00,690 --> 00:43:04,750 Right, so what kinds of attacks can this scheme prevent? 927 00:43:04,750 --> 00:43:09,305 So a lot of these security properties 928 00:43:09,305 --> 00:43:12,590 are of the form "resilient to foo." 929 00:43:12,590 --> 00:43:15,060 I'll just shorten that to "res." 930 00:43:15,060 --> 00:43:21,750 So is the scheme resilient to physical observation? 931 00:43:25,090 --> 00:43:27,970 So the idea here is that an attacker cannot 932 00:43:27,970 --> 00:43:30,730 impersonate the user after observing 933 00:43:30,730 --> 00:43:33,400 them authenticate a few times. 934 00:43:33,400 --> 00:43:35,540 So imagine that you had a shoulder surfer. 935 00:43:35,540 --> 00:43:37,280 So you're somewhere in a computer lab, 936 00:43:37,280 --> 00:43:38,821 someone's looking over your shoulder, 937 00:43:38,821 --> 00:43:39,980 seeing what you type in. 938 00:43:39,980 --> 00:43:42,400 Someone's videotaping you, maybe someone's 939 00:43:42,400 --> 00:43:44,802 got a microphone listening to the acoustic signature 940 00:43:44,802 --> 00:43:46,677 of your keyboard and trying to extract things 941 00:43:46,677 --> 00:43:49,630 from that, so on and so forth. 942 00:43:49,630 --> 00:43:53,820 So the authors say that passwords actually 943 00:43:53,820 --> 00:43:55,190 fail this test.
944 00:43:55,190 --> 00:44:00,090 And that's because if someone can videotape you typing things in, 945 00:44:00,090 --> 00:44:02,640 they can pretty easily figure out what letters you typed. 946 00:44:02,640 --> 00:44:04,973 Or there's actually these attacks where you can actually 947 00:44:04,973 --> 00:44:07,810 listen to the acoustic fingerprint of the keyboard, 948 00:44:07,810 --> 00:44:11,840 and detect what was typed based on what sounds you hear. 949 00:44:11,840 --> 00:44:15,910 So passwords are not resistant to physical observation. 950 00:44:15,910 --> 00:44:25,135 So another property is resistance to targeted impersonation. 951 00:44:28,580 --> 00:44:30,630 And so the basic idea here is, 952 00:44:30,630 --> 00:44:33,570 is it possible for someone who knows you-- a friend, 953 00:44:33,570 --> 00:44:35,280 an acquaintance, a spouse, a loved one, 954 00:44:35,280 --> 00:44:38,795 a family member, whatever-- to impersonate 955 00:44:38,795 --> 00:44:44,290 you using their knowledge of who you are and what you do. 956 00:44:44,290 --> 00:44:46,667 So could your friend try to pretend to be you easily 957 00:44:46,667 --> 00:44:47,750 in this particular scheme? 958 00:44:47,750 --> 00:44:53,065 So here the authors basically have another one 959 00:44:53,065 --> 00:44:53,940 of these quasi-yeses. 960 00:44:56,900 --> 00:44:59,610 And they say quasi-yes because they're not 961 00:44:59,610 --> 00:45:03,095 aware of any studies which show that if you know a person, 962 00:45:03,095 --> 00:45:05,570 you're more likely to guess their password. 963 00:45:05,570 --> 00:45:07,190 So they say quasi-yes for that. 964 00:45:07,190 --> 00:45:10,510 And so, note that resistance to targeted impersonation 965 00:45:10,510 --> 00:45:12,260 is where most security backup 966 00:45:12,260 --> 00:45:14,135 questions fail miserably.
967 00:45:14,135 --> 00:45:16,010 Because if someone knows something about you, 968 00:45:16,010 --> 00:45:19,595 they can quite easily guess your security questions 969 00:45:19,595 --> 00:45:22,860 in many cases. 970 00:45:22,860 --> 00:45:27,450 So then we have two categories that involve guessing. 971 00:45:27,450 --> 00:45:30,990 So the first one is resilient to throttled guessing. 972 00:45:34,930 --> 00:45:42,080 And so what this means is, if the attacker cannot 973 00:45:42,080 --> 00:45:47,690 issue guesses at line rate because, for example, 974 00:45:47,690 --> 00:45:51,880 the server uses anti-hammering mechanisms, 975 00:45:51,880 --> 00:45:56,720 is the scheme safe against the attacker? 976 00:45:56,720 --> 00:46:01,060 And so here, they say no. 977 00:46:01,060 --> 00:46:02,670 And so the reason why they say no 978 00:46:02,670 --> 00:46:05,480 is because in practice passwords not only 979 00:46:05,480 --> 00:46:09,800 have sort of low inherent entropy because they're not that long, 980 00:46:09,800 --> 00:46:12,570 but also they have that skewed distribution. 981 00:46:12,570 --> 00:46:15,860 And so what that means is that even if the attacker is 982 00:46:15,860 --> 00:46:18,260 throttled in some way, typically the attacker can still 983 00:46:18,260 --> 00:46:20,040 make good forward progress and crack 984 00:46:20,040 --> 00:46:22,140 a lot of people's passwords. 985 00:46:22,140 --> 00:46:26,010 So they define another guessing property 986 00:46:26,010 --> 00:46:29,960 which is resistant to unthrottled guessing. 987 00:46:34,030 --> 00:46:38,890 And so this is basically saying, suppose 988 00:46:38,890 --> 00:46:44,110 that the attacker can issue these authentication forgery 989 00:46:44,110 --> 00:46:47,280 requests as quickly as he or she wants. 990 00:46:47,280 --> 00:46:49,000 So in other words, the attacker is only 991 00:46:49,000 --> 00:46:51,220 limited by the speed of their hardware.
992 00:46:51,220 --> 00:46:54,440 So is the authentication scheme resilient to that type 993 00:46:54,440 --> 00:46:55,290 of attack? 994 00:46:55,290 --> 00:46:59,560 And here the answer's also no, for the same reason 995 00:46:59,560 --> 00:47:01,470 that the answer was no up here. 996 00:47:01,470 --> 00:47:04,040 So basically passwords have a very small entropy space 997 00:47:04,040 --> 00:47:07,040 and they come from a skewed distribution. 998 00:47:07,040 --> 00:47:10,690 So that's all pretty straightforward. 999 00:47:10,690 --> 00:47:13,603 One interesting one is resiliency 1000 00:47:13,603 --> 00:47:16,390 to internal observation. 1001 00:47:21,890 --> 00:47:23,720 So this means that the attacker cannot 1002 00:47:23,720 --> 00:47:27,370 impersonate a user by intercepting that user's input. 1003 00:47:27,370 --> 00:47:31,770 For example, by installing a keystroke logger 1004 00:47:31,770 --> 00:47:34,675 on the keyboard that the user's using, 1005 00:47:34,675 --> 00:47:37,640 and using that logger to steal keypresses. 1006 00:47:37,640 --> 00:47:39,790 This also means, for example, that there's 1007 00:47:39,790 --> 00:47:41,450 no way for a network attacker who's 1008 00:47:41,450 --> 00:47:44,270 observing the things that the client is sending over the wire 1009 00:47:44,270 --> 00:47:48,670 to use that knowledge of the network traffic 1010 00:47:48,670 --> 00:47:50,710 to later impersonate the user. 1011 00:47:50,710 --> 00:47:56,610 And so here they say passwords do not have this property. 1012 00:47:56,610 --> 00:47:59,640 And they essentially say it's because passwords 1013 00:47:59,640 --> 00:48:02,060 are static tokens. 1014 00:48:02,060 --> 00:48:03,160 They don't change. 1015 00:48:03,160 --> 00:48:06,500 And typically static tokens are vulnerable to replay.
1016 00:48:06,500 --> 00:48:08,920 So if somehow, for example, an attacker 1017 00:48:08,920 --> 00:48:11,680 installs a keystroke logger and gets your password, 1018 00:48:11,680 --> 00:48:14,280 then basically the attacker can use that password 1019 00:48:14,280 --> 00:48:17,020 until it's either expired or revoked or something like that. 1020 00:48:17,020 --> 00:48:18,470 If you just replay it again, it'll 1021 00:48:18,470 --> 00:48:20,960 go to that authenticating server on the other side. 1022 00:48:20,960 --> 00:48:22,751 So here, passwords actually fail that test. 1023 00:48:25,564 --> 00:48:27,522 Another thing that we talked about a little bit 1024 00:48:27,522 --> 00:48:29,340 in this class is phishing. 1025 00:48:29,340 --> 00:48:36,538 So resilience to phishing is another security metric. 1026 00:48:36,538 --> 00:48:40,190 And the basic idea here is that, if the attacker can simulate 1027 00:48:40,190 --> 00:48:43,320 a valid service-- for example, by attacking the DNS 1028 00:48:43,320 --> 00:48:45,870 infrastructure or something like that-- 1029 00:48:45,870 --> 00:48:49,200 then the attacker cannot collect credentials from the user 1030 00:48:49,200 --> 00:48:53,300 that the attacker can then use to pretend to be the user later 1031 00:48:53,300 --> 00:48:53,925 on. 1032 00:48:53,925 --> 00:48:58,300 And so this is basically supposed to penalize sites that 1033 00:48:58,300 --> 00:49:03,580 do not strongly tell the user, hey, I'm 1034 00:49:03,580 --> 00:49:06,850 this particular service, so you can feel confident to give me 1035 00:49:06,850 --> 00:49:07,950 your credentials. 1036 00:49:07,950 --> 00:49:11,160 And so here passwords fail just because phishing sites 1037 00:49:11,160 --> 00:49:13,217 are very, very popular. 1038 00:49:13,217 --> 00:49:15,175 So passwords don't really intrinsically provide 1039 00:49:15,175 --> 00:49:16,341 any protection against that.
1040 00:49:20,620 --> 00:49:23,170 Now the next two are particularly 1041 00:49:23,170 --> 00:49:28,040 interesting in the context of a large-scale distributed system. 1042 00:49:28,040 --> 00:49:30,390 So no trusted third party. 1043 00:49:33,760 --> 00:49:35,270 This essentially means that other 1044 00:49:35,270 --> 00:49:38,410 than the client and the server, there's no one else 1045 00:49:38,410 --> 00:49:44,580 in the system that is involved in the authentication protocol. 1046 00:49:44,580 --> 00:49:47,719 And so, that means that there's no third party such that, 1047 00:49:47,719 --> 00:49:49,260 if that third party were compromised, 1048 00:49:49,260 --> 00:49:51,310 the entire integrity of the security scheme 1049 00:49:51,310 --> 00:49:52,040 might fall apart. 1050 00:49:52,040 --> 00:49:54,343 And so, this is actually an interesting property 1051 00:49:54,343 --> 00:49:56,780 to look at because a lot of authentication problems 1052 00:49:56,780 --> 00:49:59,900 would go away if we could just store all our authentication 1053 00:49:59,900 --> 00:50:01,863 information in one place. 1054 00:50:01,863 --> 00:50:04,050 We just store it in one place, it's very simple, 1055 00:50:04,050 --> 00:50:05,690 we don't have to remember a lot of stuff on the client, 1056 00:50:05,690 --> 00:50:07,850 we just say, whatever service you want to use, 1057 00:50:07,850 --> 00:50:10,110 you always go to this one third party, 1058 00:50:10,110 --> 00:50:11,980 and that third party will always be 1059 00:50:11,980 --> 00:50:14,980 able to authenticate you, and then 1060 00:50:14,980 --> 00:50:17,090 allow you to go on your way.
1061 00:50:17,090 --> 00:50:20,640 Now of course third parties are problematic from the perspective 1062 00:50:20,640 --> 00:50:22,777 of robustness, right? Because if you 1063 00:50:22,777 --> 00:50:24,360 have one of these global third parties 1064 00:50:24,360 --> 00:50:27,750 that everybody trusts, if that third party gets subverted, then 1065 00:50:27,750 --> 00:50:29,660 perhaps the integrity of all the sites 1066 00:50:29,660 --> 00:50:32,400 that use that third party to authenticate, all those sites 1067 00:50:32,400 --> 00:50:35,000 are potentially in danger. 1068 00:50:35,000 --> 00:50:39,760 So they say that passwords do not have a trusted third party 1069 00:50:39,760 --> 00:50:43,142 because each user is forced to have a separate password 1070 00:50:43,142 --> 00:50:44,054 for each site. 1071 00:50:46,790 --> 00:50:48,814 A related property is