1 00:00:00,080 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,810 Commons license. 3 00:00:03,810 --> 00:00:06,060 Your support will help MIT OpenCourseWare 4 00:00:06,060 --> 00:00:10,150 continue to offer high quality educational resources for free. 5 00:00:10,150 --> 00:00:12,700 To make a donation, or to view additional materials 6 00:00:12,700 --> 00:00:16,600 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,600 --> 00:00:17,305 at ocw.mit.edu. 8 00:00:26,520 --> 00:00:29,520 PROFESSOR: Now let's look at how the web uses 9 00:00:29,520 --> 00:00:32,770 cryptographic protocols to protect network communication 10 00:00:32,770 --> 00:00:36,142 and deal with network attackers in general. 11 00:00:36,142 --> 00:00:37,600 So before we dive into the details, 12 00:00:37,600 --> 00:00:39,641 I want to remind you there's a quiz on Wednesday. 13 00:00:39,641 --> 00:00:41,350 And that's not in this room. 14 00:00:41,350 --> 00:00:42,730 It's in Walker. 15 00:00:42,730 --> 00:00:45,559 But it's at the regular lecture time. 16 00:00:45,559 --> 00:00:46,600 Any questions about that? 17 00:00:46,600 --> 00:00:49,002 Hopefully straightforward. 18 00:00:49,002 --> 00:00:50,460 Third floor, I think it is usually. 19 00:00:53,340 --> 00:00:54,220 All right. 20 00:00:54,220 --> 00:00:58,654 So today we're going to talk about how the web sort of uses 21 00:00:58,654 --> 00:01:00,570 cryptography to protect network communication. 22 00:01:00,570 --> 00:01:03,800 And we'll look at two sort of closely related topics. 23 00:01:03,800 --> 00:01:07,090 One is, how do you just cryptographically protect network 24 00:01:07,090 --> 00:01:09,890 communication at a larger scale than the Kerberos system 25 00:01:09,890 --> 00:01:11,956 we looked at in the last lecture?
26 00:01:11,956 --> 00:01:14,330 And then also, we're going to look at how do you actually 27 00:01:14,330 --> 00:01:16,970 integrate this cryptographic protection provided to you 28 00:01:16,970 --> 00:01:19,940 at the network level into the entire application. 29 00:01:19,940 --> 00:01:22,190 So how does the web browser make sense 30 00:01:22,190 --> 00:01:25,090 of whatever guarantees the cryptographic protocol is 31 00:01:25,090 --> 00:01:26,600 providing to it? 32 00:01:26,600 --> 00:01:29,990 And these are closely related, and it turns out 33 00:01:29,990 --> 00:01:32,730 that protecting network communications is rather easy. 34 00:01:32,730 --> 00:01:34,880 Cryptography mostly just works. 35 00:01:34,880 --> 00:01:37,960 And integrating it in, and correctly using it 36 00:01:37,960 --> 00:01:40,020 at a higher level in the browser, 37 00:01:40,020 --> 00:01:41,790 is actually the much trickier part: 38 00:01:41,790 --> 00:01:44,260 how to actually build a system around cryptography. 39 00:01:46,290 --> 00:01:48,700 Before we dive into this whole discussion, 40 00:01:48,700 --> 00:01:52,900 I want to remind you of the kinds 41 00:01:52,900 --> 00:01:55,670 of cryptographic primitives we're going to use here. 42 00:01:55,670 --> 00:01:59,580 So in the last lecture on Kerberos, we basically 43 00:01:59,580 --> 00:02:04,320 used something called symmetric crypto, or encryption 44 00:02:04,320 --> 00:02:05,630 and decryption. 45 00:02:05,630 --> 00:02:09,910 And the plan there is that you have a secret key k, 46 00:02:09,910 --> 00:02:11,340 and you have two functions. 47 00:02:11,340 --> 00:02:14,070 So you can take some piece of data, 48 00:02:14,070 --> 00:02:16,220 let's call it p for plaintext, and you 49 00:02:16,220 --> 00:02:18,170 can apply an encryption function, that's 50 00:02:18,170 --> 00:02:20,210 a function of some key k. 51 00:02:20,210 --> 00:02:24,150 And if you encrypt this plaintext, you get a ciphertext c.
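The symmetric pair described here consists of an encryption function E(k, p) = c and a matching decryption function D(k, c) = p that share one secret key k. It can be sketched in a few lines of Python. This is a toy, not a real cipher. The keystream construction (HMAC-SHA256 over a random nonce and a counter, XORed with the plaintext) and the names encrypt, decrypt and _keystream are assumptions, chosen so the example needs only the standard library. It also provides no integrity protection; real systems use a vetted authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import secrets

# Toy symmetric cipher for illustration only (hypothetical construction):
# XOR the plaintext with a keystream derived from HMAC-SHA256(key, nonce || counter).
NONCE_LEN = 16

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """E(k, p) -> c. A fresh random nonce is prepended to the ciphertext."""
    nonce = secrets.token_bytes(NONCE_LEN)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, ks))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """D(k, c) -> p. Recomputes the same keystream from the stored nonce."""
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    ks = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, ks))

k = secrets.token_bytes(32)
c = encrypt(k, b"attack at dawn")
assert decrypt(k, c) == b"attack at dawn"
```

Both sides need the same k, which is exactly why Kerberos needs a KDC to hand out shared keys.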
52 00:02:24,150 --> 00:02:26,380 And similarly, there's a decryption function called 53 00:02:26,380 --> 00:02:28,930 d that, given the same key k 54 00:02:28,930 --> 00:02:31,860 and the ciphertext, will give you back your plaintext. 55 00:02:31,860 --> 00:02:35,880 So this is the primitive that Kerberos was all built around. 56 00:02:35,880 --> 00:02:38,240 But it turns out there are other primitives, as well, 57 00:02:38,240 --> 00:02:40,360 that will be useful for today's discussion. 58 00:02:40,360 --> 00:02:46,350 And this is called asymmetric encryption and decryption. 59 00:02:46,350 --> 00:02:49,520 And here the idea is to have different keys for encryption 60 00:02:49,520 --> 00:02:50,260 and decryption. 61 00:02:50,260 --> 00:02:52,505 We'll see why this is so useful. 62 00:02:52,505 --> 00:02:54,380 And in particular, the functions that you get 63 00:02:54,380 --> 00:02:58,300 are: you can encrypt to a particular public key 64 00:02:58,300 --> 00:03:00,990 with some message and get a ciphertext. 65 00:03:00,990 --> 00:03:02,550 And in order to decrypt, you just 66 00:03:02,550 --> 00:03:07,380 supply the corresponding secret key to get the plaintext back. 67 00:03:07,380 --> 00:03:10,400 And the cool thing now is that you can publish this public key 68 00:03:10,400 --> 00:03:12,320 anywhere on the internet, and people 69 00:03:12,320 --> 00:03:15,180 can encrypt messages for you, but you need the secret key 70 00:03:15,180 --> 00:03:16,790 in order to decrypt the messages. 71 00:03:16,790 --> 00:03:19,910 And we'll see how this is used in the protocol. 72 00:03:19,910 --> 00:03:26,200 And in practice you'll often use public key crypto 73 00:03:26,200 --> 00:03:27,960 in a slightly different way. 74 00:03:27,960 --> 00:03:30,050 So instead of encrypting and decrypting messages, 75 00:03:30,050 --> 00:03:34,510 you might actually want to sign or verify messages.
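Asymmetric encryption can be illustrated with textbook RSA. The public key (N, E) encrypts, and only the matching secret key (N, D) decrypts. This is a hypothetical toy sketch. The primes 2**61 - 1 and 2**89 - 1 are well-known Mersenne primes, picked only so the example is short and deterministic. Anyone can factor this N, and there is no padding, so it must not be used for real data. The function names pk_encrypt and pk_decrypt are illustrative.

```python
# Toy textbook-RSA keypair (illustration only, trivially breakable).
P = 2**61 - 1          # known Mersenne prime
Q = 2**89 - 1          # known Mersenne prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # modular inverse; needs Python 3.8+

public_key = (N, E)    # can be published anywhere
secret_key = (N, D)    # kept only by the owner

def pk_encrypt(pk, message: int) -> int:
    n, e = pk
    assert 0 <= message < n, "message must be smaller than the modulus"
    return pow(message, e, n)

def pk_decrypt(sk, ciphertext: int) -> int:
    n, d = sk
    return pow(ciphertext, d, n)

m = int.from_bytes(b"hello, b", "big")
c = pk_encrypt(public_key, m)      # anyone with the public key can do this
assert pk_decrypt(secret_key, c) == m  # only the secret key holder can undo it
```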
76 00:03:34,510 --> 00:03:36,390 Turns out that at the implementation level 77 00:03:36,390 --> 00:03:41,162 these are related operations, but at an API level 78 00:03:41,162 --> 00:03:42,870 they might look a little bit different. 79 00:03:42,870 --> 00:03:47,350 So you might sign a message with your secret key, 80 00:03:47,350 --> 00:03:50,160 and you get some sort of a signature s. 81 00:03:50,160 --> 00:03:53,930 And then you can also verify this message using 82 00:03:53,930 --> 00:03:55,340 the corresponding public key. 83 00:03:55,340 --> 00:04:00,040 You give it the message and the signature, and out comes 84 00:04:00,040 --> 00:04:02,590 some Boolean flag saying whether this 85 00:04:02,590 --> 00:04:06,040 is the correct signature or not on that message. 86 00:04:06,040 --> 00:04:08,460 And there are some relatively intuitive guarantees 87 00:04:08,460 --> 00:04:11,630 that these functions provide. If you, for example, got 88 00:04:11,630 --> 00:04:13,520 this signature and it verifies correctly, 89 00:04:13,520 --> 00:04:15,710 then it must have been generated by someone 90 00:04:15,710 --> 00:04:16,930 with the correct secret key. 91 00:04:18,990 --> 00:04:21,740 Make sense, in terms of the primitives we have? 92 00:04:21,740 --> 00:04:22,790 All right. 93 00:04:22,790 --> 00:04:24,690 So now let's actually try to figure out-- 94 00:04:26,636 --> 00:04:28,385 how would we protect network communication 95 00:04:28,385 --> 00:04:30,520 at a larger scale than Kerberos? 96 00:04:30,520 --> 00:04:33,650 In Kerberos, we had the fairly simple model 97 00:04:33,650 --> 00:04:37,810 where all the users and servers 98 00:04:37,810 --> 00:04:41,330 have some sort of a relation with this KDC entity. 99 00:04:41,330 --> 00:04:43,790 And this KDC entity has this giant table 100 00:04:43,790 --> 00:04:46,695 of principals and their keys.
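Signing and verifying can reuse the same toy RSA idea with the roles of the keys swapped. The secret key produces s from a hash of the message, and the public key checks it and returns a Boolean. This minimal sketch assumes SHA-256 as the hash and the same insecure Mersenne-prime modulus as a stand-in for a real key. Real schemes such as RSA-PSS or Ed25519 add padding and other safeguards that are omitted here.

```python
import hashlib

# Toy RSA keypair (illustration only); pow(E, -1, m) needs Python 3.8+.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))
pk, sk = (N, E), (N, D)

def _digest(message: bytes) -> int:
    # Hash the message and reduce it mod N so it fits the toy modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(sk, message: bytes) -> int:
    n, d = sk
    return pow(_digest(message), d, n)

def verify(pk, message: bytes, signature: int) -> bool:
    n, e = pk
    return pow(signature, e, n) == _digest(message)

s = sign(sk, b"pay bob $10")
assert verify(pk, b"pay bob $10", s)        # correct signature
assert not verify(pk, b"pay bob $1000", s)  # tampered message is rejected
```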
101 00:04:49,260 --> 00:04:51,940 And whenever a user wants to talk to some server, 102 00:04:51,940 --> 00:04:55,010 they have to ask the KDC to generate a ticket based 103 00:04:55,010 --> 00:04:58,340 on that giant table the KDC has. 104 00:04:58,340 --> 00:05:00,890 So this seems like a reasonably straightforward model. 105 00:05:00,890 --> 00:05:02,320 So why do we need something more? 106 00:05:02,320 --> 00:05:06,180 Why is Kerberos not enough for the web? 107 00:05:06,180 --> 00:05:08,599 Why doesn't the web just use Kerberos for securing 108 00:05:08,599 --> 00:05:09,390 all communications? 109 00:05:13,059 --> 00:05:14,927 AUDIENCE: [INAUDIBLE] 110 00:05:16,224 --> 00:05:16,890 PROFESSOR: Yeah. 111 00:05:16,890 --> 00:05:24,730 So there's sort of a single KDC that has to be trusted by all. 112 00:05:24,730 --> 00:05:27,340 So this is, perhaps, not great. 113 00:05:27,340 --> 00:05:29,100 So you might have trouble really believing 114 00:05:29,100 --> 00:05:31,420 that some machine out there is secure for everyone 115 00:05:31,420 --> 00:05:32,792 in the world to use. 116 00:05:32,792 --> 00:05:34,250 Like, yeah, maybe people at MIT are 117 00:05:34,250 --> 00:05:36,240 willing to trust someone at IS&T 118 00:05:36,240 --> 00:05:38,640 to run the KDC there. 119 00:05:38,640 --> 00:05:39,140 All right. 120 00:05:39,140 --> 00:05:42,134 So that's plausible, yeah. 121 00:05:42,134 --> 00:05:43,195 AUDIENCE: [INAUDIBLE] 122 00:05:43,195 --> 00:05:43,820 PROFESSOR: Yes. 123 00:05:43,820 --> 00:05:45,520 Key management is hard, I guess, yeah. 124 00:05:45,520 --> 00:05:49,778 So what I mean in particular by key management-- 125 00:05:51,236 --> 00:05:53,666 AUDIENCE: [INAUDIBLE] 126 00:06:02,715 --> 00:06:03,340 PROFESSOR: Yes.
127 00:06:03,340 --> 00:06:06,570 It might actually be a hard job to build a KDC that 128 00:06:06,570 --> 00:06:09,990 can manage a billion keys, or ten billion keys, for all 129 00:06:09,990 --> 00:06:11,340 the people in the world. 130 00:06:11,340 --> 00:06:13,840 So it might be a tricky proposition. 131 00:06:13,840 --> 00:06:15,340 Even if that's not the case, I guess 132 00:06:15,340 --> 00:06:18,920 another bummer with Kerberos is that all users actually 133 00:06:18,920 --> 00:06:23,415 have to have a key, or have to be known to the KDC. 134 00:06:26,010 --> 00:06:28,467 So, you can't even use Kerberos at MIT 135 00:06:28,467 --> 00:06:30,425 to connect to some servers, unless you yourself 136 00:06:30,425 --> 00:06:33,520 have an account in the Kerberos database. 137 00:06:33,520 --> 00:06:36,400 Whereas on the web, it's completely reasonable 138 00:06:36,400 --> 00:06:38,192 to expect that you walk up to some computer, 139 00:06:38,192 --> 00:06:39,733 the computer has no idea who you are, 140 00:06:39,733 --> 00:06:41,890 but you can still go to Amazon's website protected 141 00:06:41,890 --> 00:06:43,730 with cryptography. 142 00:06:43,730 --> 00:06:44,612 Yeah? 143 00:06:44,612 --> 00:06:46,058 AUDIENCE: [INAUDIBLE] 144 00:06:50,775 --> 00:06:51,900 PROFESSOR: That's our idea. 145 00:06:51,900 --> 00:06:53,530 So there are these kinds of considerations. 146 00:06:53,530 --> 00:06:55,130 So there's perfect forward secrecy. 147 00:06:55,130 --> 00:06:56,546 There are a couple of other things 148 00:06:56,546 --> 00:06:58,260 you want from the cryptographic protocol, 149 00:06:58,260 --> 00:07:00,472 and we'll look at them and how they sort of show up 150 00:07:00,472 --> 00:07:01,407 in SSL, as well.
151 00:07:01,407 --> 00:07:03,490 But the key there is that the solution is actually 152 00:07:03,490 --> 00:07:06,170 exactly the same as what you would do in Kerberos, 153 00:07:06,170 --> 00:07:09,514 and what you would do in SSL or TLS to address those guys. 154 00:07:09,514 --> 00:07:10,680 But you're absolutely right. 155 00:07:10,680 --> 00:07:12,840 The Kerberos protocol we read about 156 00:07:12,840 --> 00:07:16,150 in the paper is pretty dated. 157 00:07:16,150 --> 00:07:18,250 So, even if you were using it for the web, 158 00:07:18,250 --> 00:07:21,860 you would want to apply some changes to it. 159 00:07:21,860 --> 00:07:26,190 Those are not huge though, at the [INAUDIBLE] level. 160 00:07:26,190 --> 00:07:28,280 Any other thoughts on why we shouldn't use Kerberos? 161 00:07:28,280 --> 00:07:29,095 Yeah? 162 00:07:29,095 --> 00:07:30,580 AUDIENCE: [INAUDIBLE] 163 00:07:36,277 --> 00:07:38,110 PROFESSOR: This is actually not so scalable. 164 00:07:40,815 --> 00:07:41,440 Yeah, recovery. 165 00:07:44,240 --> 00:07:45,790 Maybe registration even, as well, 166 00:07:45,790 --> 00:07:48,100 like you have to go to some accounts office 167 00:07:48,100 --> 00:07:49,050 and get an account. 168 00:07:49,050 --> 00:07:50,406 Yeah? 169 00:07:50,406 --> 00:07:52,160 AUDIENCE: [INAUDIBLE] needs to be online. 170 00:07:52,160 --> 00:07:54,120 PROFESSOR: Yeah, so that's another problem. 171 00:07:54,120 --> 00:07:55,620 These are sort of management issues, 172 00:07:55,620 --> 00:07:59,794 but at the protocol level, the KDC 173 00:07:59,794 --> 00:08:01,460 has to be online because it actually has to 174 00:08:01,460 --> 00:08:03,635 mediate every interaction you have with the service. 175 00:08:05,289 --> 00:08:07,830 It means that on the web, every time you go to a new website, 176 00:08:07,830 --> 00:08:10,130 you'd have to talk to some KDC first, which 177 00:08:10,130 --> 00:08:11,940 would be a bit of a performance bottleneck.
178 00:08:11,940 --> 00:08:13,520 So that's another kind of scalability: 179 00:08:13,520 --> 00:08:15,680 this one is performance scalability. 180 00:08:15,680 --> 00:08:18,900 The earlier ones are more management scalability kind of stuff. 181 00:08:18,900 --> 00:08:19,470 Make sense? 182 00:08:22,030 --> 00:08:24,040 So, how can we solve this problem 183 00:08:24,040 --> 00:08:27,160 with these better primitives? 184 00:08:27,160 --> 00:08:31,330 Well, the idea is to use public key cryptography to get 185 00:08:31,330 --> 00:08:33,980 this KDC out of the loop. 186 00:08:33,980 --> 00:08:35,880 So let's first figure out whether we 187 00:08:35,880 --> 00:08:40,419 can establish secure communication if you just know 188 00:08:40,419 --> 00:08:41,669 some other party's public key. 189 00:08:41,669 --> 00:08:43,710 And then we'll see how we plug in 190 00:08:43,710 --> 00:08:46,840 a public key version of a KDC to authenticate parties 191 00:08:46,840 --> 00:08:50,080 in this protocol. 192 00:08:50,080 --> 00:08:54,194 If you don't want to use a KDC, what 193 00:08:54,194 --> 00:08:55,652 you could do with public key crypto 194 00:08:55,652 --> 00:08:58,190 is maybe you can somehow learn the public key 195 00:08:58,190 --> 00:08:59,440 of the other party you're connecting to. 196 00:08:59,440 --> 00:09:01,690 So in Kerberos, if I want to connect to a file server, 197 00:09:01,690 --> 00:09:04,330 maybe I just know the file server's public key 198 00:09:04,330 --> 00:09:05,190 from somewhere. 199 00:09:05,190 --> 00:09:07,510 Like, as a freshman, I get a printout saying the file 200 00:09:07,510 --> 00:09:09,260 server's public key is this. 201 00:09:09,260 --> 00:09:12,450 And then you can go ahead and connect to it. 202 00:09:12,450 --> 00:09:14,760 And the way you might actually do this is you 203 00:09:14,760 --> 00:09:18,552 could just encrypt a message for the public key of the file 204 00:09:18,552 --> 00:09:20,010 server that you want to connect to.
205 00:09:20,010 --> 00:09:22,460 But it turns out that in practice, 206 00:09:22,460 --> 00:09:24,541 these public key operations are pretty slow. 207 00:09:24,541 --> 00:09:26,040 They are several orders of magnitude 208 00:09:26,040 --> 00:09:29,260 slower than symmetric key cryptography. 209 00:09:29,260 --> 00:09:33,520 So almost always you want to stop using public key crypto 210 00:09:33,520 --> 00:09:35,440 as soon as practical. 211 00:09:35,440 --> 00:09:37,320 So a typical protocol might look like this, 212 00:09:37,320 --> 00:09:40,020 where you have a and b, and they want to communicate. 213 00:09:40,020 --> 00:09:41,990 And a knows b's public key. 214 00:09:41,990 --> 00:09:44,480 So what might happen is that a might generate 215 00:09:44,480 --> 00:09:46,445 some sort of session key s. 216 00:09:49,840 --> 00:09:51,380 Just pick a random number. 217 00:09:51,380 --> 00:09:56,210 And then it's going to send to b the session key s. 218 00:09:56,210 --> 00:09:58,550 So this is kind of looking like Kerberos. 219 00:09:58,550 --> 00:10:01,840 And we're going to encrypt the session key s for b's key. 220 00:10:01,840 --> 00:10:03,860 And remember in Kerberos, in order to do this, 221 00:10:03,860 --> 00:10:05,590 we had to have the KDC do this for us 222 00:10:05,590 --> 00:10:08,095 because a didn't know the key for b, 223 00:10:08,095 --> 00:10:10,720 and shouldn't have been allowed to know it, because that is a secret 224 00:10:10,720 --> 00:10:12,197 that only b should know. 225 00:10:12,197 --> 00:10:14,280 With public key crypto you can actually do this now. 226 00:10:14,280 --> 00:10:21,730 We can just encrypt the secret s using b's public key. 227 00:10:21,730 --> 00:10:23,555 And we send this message over to b. 228 00:10:23,555 --> 00:10:25,500 B can now decrypt this message, and say I 229 00:10:25,500 --> 00:10:27,500 should be using this secret key.
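In this exchange, a picks a random session key s and sends it to b encrypted under b's public key. A toy sketch might look like the code below. It assumes a textbook-RSA stand-in keypair for b, built from known Mersenne primes and not secure, and uses the hypothetical helper names a_start and b_receive. Like the protocol at this point in the lecture, it has no protection against replay and does not authenticate a.

```python
import secrets

# Toy textbook-RSA keypair for b (illustration only, trivially breakable).
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # needs Python 3.8+
b_public, b_secret = (N, E), (N, D)

def a_start(b_pk):
    """a: pick a random 128-bit session key s and encrypt it for b."""
    n, e = b_pk
    s = secrets.randbits(128)      # 128 bits is smaller than N (about 150 bits)
    return s, pow(s, e, n)         # a keeps s and sends the ciphertext to b

def b_receive(b_sk, ciphertext):
    """b: decrypt with its secret key to learn the same s."""
    n, d = b_sk
    return pow(ciphertext, d, n)

s_at_a, msg_to_b = a_start(b_public)
s_at_b = b_receive(b_secret, msg_to_b)
assert s_at_a == s_at_b  # both sides now share s for fast symmetric encryption
```

The slow public key operation runs once per connection. After that, both sides switch to symmetric encryption under s.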
230 00:10:27,500 --> 00:10:30,930 And now we can have a communication channel where 231 00:10:30,930 --> 00:10:32,890 all the messages are just encrypted 232 00:10:32,890 --> 00:10:37,560 under this secret key s. 233 00:10:37,560 --> 00:10:38,971 Does this make sense? 234 00:10:38,971 --> 00:10:41,220 So there are some nice properties about this protocol. 235 00:10:41,220 --> 00:10:43,280 One is that we got rid of having to have 236 00:10:43,280 --> 00:10:47,339 a KDC be online and generate our session key for us. 237 00:10:47,339 --> 00:10:48,880 We could just have one of the parties 238 00:10:48,880 --> 00:10:51,670 generate it and then encrypt it for another party 239 00:10:51,670 --> 00:10:54,040 without the use of the KDC. 240 00:10:54,040 --> 00:10:56,070 Another nice thing is we're probably 241 00:10:56,070 --> 00:10:59,480 pretty confident that messages sent by a to b 242 00:10:59,480 --> 00:11:01,190 will only be read by b, 243 00:11:01,190 --> 00:11:04,670 because only b could have decrypted this message. 244 00:11:04,670 --> 00:11:06,565 And therefore, only b should have 245 00:11:06,565 --> 00:11:09,640 that corresponding secret key s. 246 00:11:09,640 --> 00:11:11,030 So this is pretty nice. 247 00:11:11,030 --> 00:11:12,700 Any questions about this protocol? 248 00:11:12,700 --> 00:11:13,260 Yeah? 249 00:11:13,260 --> 00:11:16,146 AUDIENCE: Does it matter whether the user or the server 250 00:11:16,146 --> 00:11:18,560 generates the pass code? 251 00:11:18,560 --> 00:11:20,150 PROFESSOR: Well, maybe. 252 00:11:20,150 --> 00:11:25,760 I think it depends on exactly the considerations, 253 00:11:25,760 --> 00:11:28,080 or the properties you want out of this protocol. 254 00:11:28,080 --> 00:11:35,320 So here, certainly if a is buggy or picks bad randomness, 255 00:11:35,320 --> 00:11:38,420 and the server then sends some data back to a, thinking, 256 00:11:38,420 --> 00:11:40,970 oh, this is now the only data that is going to be seen by a.
257 00:11:40,970 --> 00:11:43,320 Well, maybe that's not going to be quite right. 258 00:11:43,320 --> 00:11:45,020 So you might care a little bit. 259 00:11:45,020 --> 00:11:47,270 There are a couple of other problems with this protocol, 260 00:11:47,270 --> 00:11:48,030 as well. 261 00:11:48,030 --> 00:11:49,226 Question? 262 00:11:49,226 --> 00:11:52,810 AUDIENCE: I was gonna say that in this protocol, 263 00:11:52,810 --> 00:11:55,060 a you could just do [INAUDIBLE]. 264 00:11:58,021 --> 00:11:59,770 PROFESSOR: Yes, that's actually not great. 265 00:11:59,770 --> 00:12:01,728 So there are actually several problems with this. 266 00:12:01,728 --> 00:12:05,660 One is replay. 267 00:12:05,660 --> 00:12:09,436 So the problem here is that I can just 268 00:12:09,436 --> 00:12:10,810 send these messages again, and it 269 00:12:10,810 --> 00:12:14,060 looks like a is sending these messages to b, and so on. 270 00:12:14,060 --> 00:12:16,160 So typically the solution to this 271 00:12:16,160 --> 00:12:18,640 is to have both parties participate 272 00:12:18,640 --> 00:12:22,470 in the generation of s, and that ensures 273 00:12:22,470 --> 00:12:25,230 that the key we're using is now fresh. 274 00:12:25,230 --> 00:12:27,840 Because here, since b didn't actually generate anything, 275 00:12:27,840 --> 00:12:30,310 these protocol messages look exactly the same every time. 276 00:12:30,310 --> 00:12:33,410 So typically, what happens is that one party picks 277 00:12:33,410 --> 00:12:36,630 a random number like s, and then another party b also 278 00:12:36,630 --> 00:12:39,417 picks some random number, typically called a nonce. 279 00:12:39,417 --> 00:12:40,000 But, whatever. 280 00:12:40,000 --> 00:12:41,630 There are two numbers. 281 00:12:41,630 --> 00:12:43,877 And then the key they agree to use isn't the thing 282 00:12:43,877 --> 00:12:45,460 that one party picked, but actually is 283 00:12:45,460 --> 00:12:48,610 the hash of the things that both of them picked.
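The replay fix can be sketched as deriving the working key s' = H(s || nonce), where b contributes a fresh nonce on every connection. This sketch assumes SHA-256 as H and 16-byte random values. The name derive_session_key is hypothetical. Real protocols such as TLS use a dedicated key-derivation function rather than a single bare hash, so treat this only as an illustration of why a fresh nonce changes the key.

```python
import hashlib
import secrets

def derive_session_key(s: bytes, nonce: bytes) -> bytes:
    """s' = H(s || nonce): the key both sides actually use (H = SHA-256 here)."""
    return hashlib.sha256(s + nonce).digest()

# a picks s (and, in the full protocol, sends it encrypted to b).
s = secrets.token_bytes(16)
# b picks a fresh nonce for each connection and sends it back to a.
nonce_1 = secrets.token_bytes(16)
nonce_2 = secrets.token_bytes(16)

key_at_a = derive_session_key(s, nonce_1)
key_at_b = derive_session_key(s, nonce_1)
assert key_at_a == key_at_b  # both sides agree on the same key

# If an attacker replays a's old message (same s), b's new nonce yields a different key.
assert derive_session_key(s, nonce_2) != key_at_a
```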
284 00:12:48,610 --> 00:12:49,890 So you could do that. 285 00:12:49,890 --> 00:12:52,509 You could also do Diffie-Hellman kind of stuff 286 00:12:52,509 --> 00:12:54,050 like we looked at in the last lecture, 287 00:12:54,050 --> 00:12:55,650 where you get forward secrecy. 288 00:12:55,650 --> 00:12:58,220 It's a little bit more complicated math than 289 00:12:58,220 --> 00:13:01,117 just hashing two random numbers that two parties picked. 290 00:13:01,117 --> 00:13:02,700 But then you get some nice properties, 291 00:13:02,700 --> 00:13:05,610 like forward secrecy. 292 00:13:05,610 --> 00:13:07,600 So you typically fix replay attacks 293 00:13:07,600 --> 00:13:14,350 by having b generate a nonce. 294 00:13:14,350 --> 00:13:16,850 And then you set the real secret key 295 00:13:16,850 --> 00:13:19,900 that you're going to use to the hash of the secret key 296 00:13:19,900 --> 00:13:24,267 from one guy concatenated with this nonce. 297 00:13:24,267 --> 00:13:26,350 And, of course, b would have to send the nonce back 298 00:13:26,350 --> 00:13:29,220 to a in order to figure out what's 299 00:13:29,220 --> 00:13:32,850 going on for both of them to agree on a key. 300 00:13:32,850 --> 00:13:33,670 All right. 301 00:13:33,670 --> 00:13:40,060 So another problem here is that there's no real authentication 302 00:13:40,060 --> 00:13:40,990 of a here, all right? 303 00:13:40,990 --> 00:13:43,610 So a knows who b is, or at least a 304 00:13:43,610 --> 00:13:46,700 knows who will be able to decrypt the data. 305 00:13:46,700 --> 00:13:50,390 But b has no idea who is on the other side, 306 00:13:50,390 --> 00:13:54,630 whether it's a or some adversary impersonating a, et cetera. 307 00:13:54,630 --> 00:13:58,741 So how would we fix it in this public key world? 308 00:13:58,741 --> 00:13:59,240 Yeah? 309 00:13:59,240 --> 00:14:01,854 AUDIENCE: You could have a sign something and [INAUDIBLE]. 310 00:14:01,854 --> 00:14:02,520 PROFESSOR: Yeah.
311 00:14:02,520 --> 00:14:05,390 There are a couple of ways you could go about this. 312 00:14:05,390 --> 00:14:07,510 One possibility is a maybe should 313 00:14:07,510 --> 00:14:09,630 sign this message initially, because we 314 00:14:09,630 --> 00:14:11,160 have this nice sign primitive. 315 00:14:11,160 --> 00:14:19,040 So we could maybe have a sign this thing with a's secret key. 316 00:14:19,040 --> 00:14:20,829 And the sign function just provides the signature, 317 00:14:20,829 --> 00:14:22,370 but presumably you sign it and also 318 00:14:22,370 --> 00:14:24,400 provide the message, as well. 319 00:14:24,400 --> 00:14:27,750 And then b would have to know a's public key in order 320 00:14:27,750 --> 00:14:29,370 to verify the signature. 321 00:14:29,370 --> 00:14:32,229 But if b knows a's public key, then b's 322 00:14:32,229 --> 00:14:34,520 going to be reasonably confident that a is the one that 323 00:14:34,520 --> 00:14:36,430 sent this message over. 324 00:14:36,430 --> 00:14:37,470 Make sense? 325 00:14:37,470 --> 00:14:40,050 Another thing you could do is rely on encryption. 326 00:14:40,050 --> 00:14:44,120 So maybe b could send the nonce back to a encrypted 327 00:14:44,120 --> 00:14:46,029 under a's public key. 328 00:14:46,029 --> 00:14:48,070 And then only a would be able to decrypt the nonce 329 00:14:48,070 --> 00:14:50,366 and generate the final session key s prime. 330 00:14:50,366 --> 00:14:52,240 So there are a couple of tricks you could do. 331 00:14:52,240 --> 00:14:55,110 This is roughly how client certificates 332 00:14:55,110 --> 00:14:57,640 work in web browsers today. 333 00:14:57,640 --> 00:15:00,005 So a has a secret key: when you get 334 00:15:00,005 --> 00:15:03,550 an MIT personal certificate, what happens is your browser 335 00:15:03,550 --> 00:15:05,300 generates a long-lived secret key 336 00:15:05,300 --> 00:15:07,410 and gets a certificate for it.
337 00:15:07,410 --> 00:15:10,636 And whenever you send a request to a web server, 338 00:15:10,636 --> 00:15:12,260 you're going to prove the fact that you 339 00:15:12,260 --> 00:15:15,970 know the secret key in your user certificate, 340 00:15:15,970 --> 00:15:18,540 and then establish the secret key s for the rest 341 00:15:18,540 --> 00:15:19,972 of the communication. 342 00:15:19,972 --> 00:15:22,640 Make sense? 343 00:15:22,640 --> 00:15:23,180 All right. 344 00:15:26,820 --> 00:15:29,760 These are sort of all fixable problems at the protocol level 345 00:15:29,760 --> 00:15:31,820 that are reasonably easy to address 346 00:15:31,820 --> 00:15:33,390 by adding extra messages. 347 00:15:33,390 --> 00:15:36,810 The big assumption here, of course, that we're going under 348 00:15:36,810 --> 00:15:41,090 is that all the parties know each other's public keys. 349 00:15:41,090 --> 00:15:47,500 So how do you actually discover someone's public key? 350 00:15:47,500 --> 00:15:50,910 If, you know, a wants to connect to a website, 351 00:15:50,910 --> 00:15:53,882 I have a URL that I want to connect to, or a host name, 352 00:15:53,882 --> 00:15:55,840 how do I know what public key that corresponds to? 353 00:15:55,840 --> 00:15:59,660 Or similarly, if I connect to WebSIS to look at my grades, 354 00:15:59,660 --> 00:16:04,070 how does the server know what my public key should be, 355 00:16:04,070 --> 00:16:08,550 as opposed to the public key of some other person at MIT? 356 00:16:08,550 --> 00:16:13,889 So this is the main problem that the KDC was addressing. 357 00:16:13,889 --> 00:16:16,180 I guess the KDC was solving two problems for us before. 358 00:16:16,180 --> 00:16:19,081 One is that it was generating this message. 359 00:16:19,081 --> 00:16:20,455 It was generating the session key 360 00:16:20,455 --> 00:16:22,490 and encrypting it for the server. 361 00:16:22,490 --> 00:16:25,480 We fixed that by doing public key crypto now.
362 00:16:25,480 --> 00:16:29,350 But we also need to get this mapping from string principal 363 00:16:29,350 --> 00:16:32,340 names to cryptographic keys that Kerberos previously 364 00:16:32,340 --> 00:16:33,512 provided to us. 365 00:16:33,512 --> 00:16:34,970 And the way that is going to happen 366 00:16:34,970 --> 00:16:42,200 in this HTTPS world, with this protocol called TLS, 367 00:16:42,200 --> 00:16:45,200 is that we're going to still rely on some parties 368 00:16:45,200 --> 00:16:47,740 to maintain, or at least logically 369 00:16:47,740 --> 00:16:50,920 maintain, those giant tables mapping principal names 370 00:16:50,920 --> 00:16:53,420 onto cryptographic keys. 371 00:16:53,420 --> 00:16:56,174 And the plan is, we're going to have something 372 00:16:56,174 --> 00:16:57,465 called a certificate authority. 373 00:17:02,470 --> 00:17:05,380 This is often abbreviated as CA in all kinds 374 00:17:05,380 --> 00:17:07,609 of security literature. 375 00:17:07,609 --> 00:17:10,400 This thing is also going to logically maintain 376 00:17:10,400 --> 00:17:13,380 this table of, here's the name of a principal, 377 00:17:13,380 --> 00:17:19,319 and here's the public key for that principal. 378 00:17:19,319 --> 00:17:22,300 And the main difference from the way Kerberos worked 379 00:17:22,300 --> 00:17:24,280 is that this certificate authority 380 00:17:24,280 --> 00:17:28,450 thing isn't going to have to be online for all transactions. 381 00:17:28,450 --> 00:17:30,450 So in Kerberos you have to talk to those KDCs 382 00:17:30,450 --> 00:17:33,800 to get a connection or to look up someone's key.
383 00:17:33,800 --> 00:17:36,790 Instead, what's going to happen in this CA world 384 00:17:36,790 --> 00:17:43,210 is that if you have some name here, and a public key, 385 00:17:43,210 --> 00:17:44,800 the certificate authority is going 386 00:17:44,800 --> 00:17:51,020 to just sign messages stating that certain rows exist 387 00:17:51,020 --> 00:17:52,940 in this table. 388 00:17:52,940 --> 00:17:54,540 So the certificate authority is going 389 00:17:54,540 --> 00:17:59,540 to have its own sort of secret and public key here. 390 00:18:01,700 --> 00:18:03,520 And it's going to use the secret key 391 00:18:03,520 --> 00:18:09,080 to sign messages for other users in the system to rely on. 392 00:18:09,080 --> 00:18:11,870 So if you have a particular entry like this 393 00:18:11,870 --> 00:18:15,410 in this CA's database, then the CA 394 00:18:15,410 --> 00:18:19,510 is going to create a message saying this name 395 00:18:19,510 --> 00:18:22,750 corresponds to this public key. 396 00:18:22,750 --> 00:18:26,040 And it's going to sign this whole message 397 00:18:26,040 --> 00:18:31,225 with the CA's secret key. 398 00:18:31,225 --> 00:18:31,725 Make sense? 399 00:18:34,430 --> 00:18:37,790 So this is going to allow us to do very similar things to what 400 00:18:37,790 --> 00:18:40,020 Kerberos was doing, but we are now 401 00:18:40,020 --> 00:18:42,830 going to get rid of the CA having to be 402 00:18:42,830 --> 00:18:45,192 online for all transactions. 403 00:18:45,192 --> 00:18:47,400 And in fact, it's now going to be much more scalable. 404 00:18:47,400 --> 00:18:49,358 So this is what's usually called a certificate. 405 00:18:51,690 --> 00:18:54,080 And the reason this is going to be much more scalable 406 00:18:54,080 --> 00:19:00,027 is that, in fact, to a client, or anyone using this system, 407 00:19:00,027 --> 00:19:01,610 a certificate provided from one source 408 00:19:01,610 --> 00:19:04,240 is as good as a certificate provided from any other source.
409 00:19:04,240 --> 00:19:06,120 It's signed by the CA's secret key. 410 00:19:06,120 --> 00:19:08,700 So you can verify its validity without having 411 00:19:08,700 --> 00:19:10,830 to actually contact the certificate 412 00:19:10,830 --> 00:19:13,867 authority, or any other designated party here. 413 00:19:13,867 --> 00:19:15,950 And typically, the way this works is that a server 414 00:19:15,950 --> 00:19:19,880 that you want to talk to stores the certificate that it 415 00:19:19,880 --> 00:19:21,980 originally got from the certificate authority. 416 00:19:21,980 --> 00:19:24,339 And whenever you connect to it, the server 417 00:19:24,339 --> 00:19:26,130 will tell you, well, here's my certificate. 418 00:19:26,130 --> 00:19:27,350 It was signed by this CA. 419 00:19:27,350 --> 00:19:29,520 You can check the signature and just verify 420 00:19:29,520 --> 00:19:33,020 that this is, in fact, my public key and that's my name. 421 00:19:33,020 --> 00:19:34,700 And on the flip side, the same thing 422 00:19:34,700 --> 00:19:36,060 happens with client certificates. 423 00:19:36,060 --> 00:19:39,790 So when you, the user, connect to a web server, what's actually 424 00:19:39,790 --> 00:19:42,590 going on is that your client certificate actually 425 00:19:42,590 --> 00:19:45,780 talks about the public key corresponding to the secret key 426 00:19:45,780 --> 00:19:48,221 that you originally generated in your browser. 427 00:19:48,221 --> 00:19:49,970 And this way when you connect to a server, 428 00:19:49,970 --> 00:19:52,350 you're going to present a certificate signed 429 00:19:52,350 --> 00:19:55,780 by MIT's certificate authority saying your user name 430 00:19:55,780 --> 00:19:57,680 corresponds to this public key. 431 00:19:57,680 --> 00:20:00,430 And this is how the server is going to be convinced 432 00:20:00,430 --> 00:20:03,430 that a message signed with your secret key 433 00:20:03,430 --> 00:20:09,695 is proof that this is the right Athena user connecting to me.
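In the certificate scheme, the CA signs the statement that a name corresponds to a public key. Clients then check that signature offline, using only the CA's public key. This can be sketched with the same toy RSA signatures. Everything here is an assumption for illustration: the toy CA key, the JSON encoding of the (name, public key) row, the helper names issue_certificate and check_certificate, the host name www.example.com, and the stand-in server key (3233, 17). Real certificates use the X.509 format and real signature schemes.

```python
import hashlib
import json

# Toy RSA keypair for the certificate authority (illustration only).
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
CA_N = P * Q
CA_D = pow(E, -1, (P - 1) * (Q - 1))  # needs Python 3.8+
CA_PUBLIC = (CA_N, E)                  # shipped to every client in advance

def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def _encode(name: str, public_key) -> bytes:
    # Canonical bytes for the statement "name has this public key".
    return json.dumps({"name": name, "pk": list(public_key)}, sort_keys=True).encode()

def issue_certificate(name: str, public_key) -> dict:
    """CA side: sign one (name, public key) row from its table."""
    sig = pow(_digest(_encode(name, public_key), CA_N), CA_D, CA_N)
    return {"name": name, "pk": public_key, "sig": sig}

def check_certificate(cert: dict, ca_public) -> bool:
    """Client side: verify offline with only the CA's public key."""
    n, e = ca_public
    return pow(cert["sig"], e, n) == _digest(_encode(cert["name"], cert["pk"]), n)

server_pk = (3233, 17)  # stand-in public key for the server
cert = issue_certificate("www.example.com", server_pk)
assert check_certificate(cert, CA_PUBLIC)

forged = dict(cert, name="www.evil.example")  # same signature, different name
assert not check_certificate(forged, CA_PUBLIC)
```

Because verification needs only CA_PUBLIC, the server can hand its stored certificate to any client, and the CA never has to be contacted during the connection.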
434 00:20:09,695 --> 00:20:10,570 Does that make sense? 435 00:20:10,570 --> 00:20:11,206 Yeah. 436 00:20:11,206 --> 00:20:12,956 AUDIENCE: Where does the [? project ?] get 437 00:20:12,956 --> 00:20:15,470 the certificate [INAUDIBLE]? 438 00:20:15,470 --> 00:20:16,819 PROFESSOR: Ah, yes. 439 00:20:16,819 --> 00:20:18,360 Like the chicken and the egg problem. 440 00:20:18,360 --> 00:20:19,330 It keeps going down. 441 00:20:19,330 --> 00:20:20,788 Where do you get these public keys? 442 00:20:20,788 --> 00:20:22,840 At some point you have to hard-code these in, 443 00:20:22,840 --> 00:20:25,310 and that's typically what most systems do. 444 00:20:25,310 --> 00:20:27,074 So today what actually happens is 445 00:20:27,074 --> 00:20:28,615 that when you download a web browser, 446 00:20:28,615 --> 00:20:30,950 or you get a computer for the first time, 447 00:20:30,950 --> 00:20:33,930 it actually comes with public keys of hundreds 448 00:20:33,930 --> 00:20:35,730 of these certificate authorities. 449 00:20:35,730 --> 00:20:37,440 And there are many of them. 450 00:20:37,440 --> 00:20:41,660 Some are run by security companies like VeriSign. 451 00:20:41,660 --> 00:20:43,880 The US Postal Service has a certificate authority, 452 00:20:43,880 --> 00:20:44,860 for some reason. 453 00:20:44,860 --> 00:20:47,640 There are many entities there that could, in principle, issue 454 00:20:47,640 --> 00:20:50,770 these certificates and are fully trusted by the system. 455 00:20:53,510 --> 00:20:55,740 These many certificate authorities 456 00:20:55,740 --> 00:20:59,674 are now replacing the trust that we had in this KDC. 457 00:20:59,674 --> 00:21:01,090 And in some ways, we haven't actually 458 00:21:01,090 --> 00:21:03,910 addressed all the problems we listed with Kerberos. 459 00:21:03,910 --> 00:21:06,930 So previously we were worried that, oh man, how are 460 00:21:06,930 --> 00:21:08,180 we going to trust?
461 00:21:08,180 --> 00:21:09,570 How is everyone in the world going 462 00:21:09,570 --> 00:21:11,820 to trust a single KDC machine? 463 00:21:11,820 --> 00:21:13,850 But now, it's actually worse. 464 00:21:13,850 --> 00:21:16,400 This is actually worse in some ways, because instead 465 00:21:16,400 --> 00:21:18,320 of trusting a single KDC machine, 466 00:21:18,320 --> 00:21:20,960 everyone is now trusting these hundreds of certificate 467 00:21:20,960 --> 00:21:23,380 authorities because all of them are equally powerful. 468 00:21:23,380 --> 00:21:25,390 Any of them could sign a message like this 469 00:21:25,390 --> 00:21:28,720 and it would be accepted by clients 470 00:21:28,720 --> 00:21:31,650 as a correct statement saying this principal 471 00:21:31,650 --> 00:21:33,530 has this public key. 472 00:21:33,530 --> 00:21:35,890 So an attacker only has to break into one of these guys instead 473 00:21:35,890 --> 00:21:37,830 of the one KDC. 474 00:21:40,500 --> 00:21:41,235 Yeah? 475 00:21:41,235 --> 00:21:43,304 AUDIENCE: Is there a mechanism to revoke the keys? 476 00:21:43,304 --> 00:21:43,970 PROFESSOR: Yeah. 477 00:21:43,970 --> 00:21:45,550 That's another hard problem. 478 00:21:45,550 --> 00:21:47,889 It turns out that before, we talked to the KDC, 479 00:21:47,889 --> 00:21:49,430 and if you screwed up, you could tell 480 00:21:49,430 --> 00:21:52,500 the KDC to stop giving out your key, or change it. 481 00:21:52,500 --> 00:21:55,410 Now the certificates are actually potentially valid 482 00:21:55,410 --> 00:21:56,110 forever. 483 00:21:56,110 --> 00:21:58,400 So the typical solution is twofold. 484 00:21:58,400 --> 00:22:01,410 One is, as you might expect, these certificates 485 00:22:01,410 --> 00:22:05,054 include an expiration time. 486 00:22:05,054 --> 00:22:06,970 So this way you can at least bound the damage.
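The observation that every certificate authority in the trust store is equally powerful can be sketched the same way. In this hypothetical Python model (toy textbook RSA from public Mersenne primes, insecure, with invented CA names `BigCA` and `TinyRegionalCA`), the browser accepts a name-to-key binding signed by any root it knows, so stealing the least careful CA's key is enough to vouch for amazon.com.

```python
import hashlib

def toy_rsa_keypair(kp: int, kq: int, e: int = 65537):
    """Toy textbook RSA from Mersenne primes 2**kp - 1 and 2**kq - 1.
    INSECURE (the factors are public); for illustration only."""
    p, q = 2**kp - 1, 2**kq - 1
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))

def h(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")

def sign(secret_key, msg: bytes) -> int:
    n, d = secret_key
    return pow(h(msg), d, n)

def verify(public_key, msg: bytes, sig: int) -> bool:
    n, e = public_key
    return pow(sig, e, n) == h(msg)

# Two hypothetical root CAs; a real browser ships hundreds of root keys.
big_pub, big_sec = toy_rsa_keypair(521, 607)
tiny_pub, tiny_sec = toy_rsa_keypair(1279, 2203)
TRUST_STORE = {"BigCA": big_pub, "TinyRegionalCA": tiny_pub}

def browser_accepts(statement: bytes, issuer: str, sig: int) -> bool:
    """In this sketch nothing restricts which names a trusted CA may vouch for."""
    pub = TRUST_STORE.get(issuer)
    return pub is not None and verify(pub, statement, sig)

binding = b"amazon.com -> attacker-public-key"
# An attacker who steals only TinyRegionalCA's secret key can still vouch for amazon.com.
print(browser_accepts(binding, "TinyRegionalCA", sign(tiny_sec, binding)))  # True
```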
487 00:22:06,970 --> 00:22:09,178 This is kind of like a Kerberos ticket's lifetime, 488 00:22:09,178 --> 00:22:11,917 except in practice, these tend to be several orders 489 00:22:11,917 --> 00:22:12,750 of magnitude higher. 490 00:22:12,750 --> 00:22:14,740 So in Kerberos, your ticket's lifetime 491 00:22:14,740 --> 00:22:16,720 could be a couple hours. 492 00:22:16,720 --> 00:22:20,720 Here it's typically a year or something like this. 493 00:22:20,720 --> 00:22:24,470 So the CAs really don't want to be talked to very often. 494 00:22:24,470 --> 00:22:26,240 They want to get your money once 495 00:22:26,240 --> 00:22:27,750 a year for the certificate, and then 496 00:22:27,750 --> 00:22:29,795 give you this blob of signed bytes, 497 00:22:29,795 --> 00:22:31,170 and you're good to go for a year. 498 00:22:31,170 --> 00:22:32,930 You don't have to contact them again. 499 00:22:32,930 --> 00:22:35,690 So this is good for scalability, but not so good for security. 500 00:22:35,690 --> 00:22:39,620 And there are two problems that you might worry about 501 00:22:39,620 --> 00:22:40,860 with certificates. 502 00:22:40,860 --> 00:22:44,030 One is that maybe the CA screwed up. 503 00:22:44,030 --> 00:22:47,710 So maybe the CA issued a certificate for the wrong name. 504 00:22:47,710 --> 00:22:49,190 Like, they weren't very careful. 505 00:22:49,190 --> 00:22:50,856 And accidentally, I ask them to give me 506 00:22:50,856 --> 00:22:53,297 a certificate for amazon.com, and they just slipped up 507 00:22:53,297 --> 00:22:54,380 and said, all right, sure. 508 00:22:54,380 --> 00:22:54,975 That's amazon.com. 509 00:22:54,975 --> 00:22:56,599 I will give you a certificate for that. 510 00:22:56,599 --> 00:22:58,860 So that seems like a problem on the CA side. 511 00:22:58,860 --> 00:23:00,412 So they mis-issued a certificate.
512 00:23:00,412 --> 00:23:02,870 And that's one way that you could end up with a certificate 513 00:23:02,870 --> 00:23:05,550 that you wish no longer existed, because you 514 00:23:05,550 --> 00:23:07,180 signed the wrong thing. 515 00:23:07,180 --> 00:23:09,375 Another possibility is that the CA 516 00:23:09,375 --> 00:23:11,440 does the right thing, but then the person who 517 00:23:11,440 --> 00:23:14,110 had the certificate accidentally disclosed 518 00:23:14,110 --> 00:23:17,220 the secret key, or someone stole the secret key corresponding 519 00:23:17,220 --> 00:23:19,130 to the public key in the certificate. 520 00:23:19,130 --> 00:23:21,780 So this means that certificate no longer says 521 00:23:21,780 --> 00:23:23,380 what you think it might mean. 522 00:23:23,380 --> 00:23:27,730 Even though this says amazon.com's key is this, 523 00:23:27,730 --> 00:23:29,285 actually everyone in the world has 524 00:23:29,285 --> 00:23:31,201 the corresponding secret key because someone posted it 525 00:23:31,201 --> 00:23:31,910 on the internet. 526 00:23:31,910 --> 00:23:34,425 So you can't really learn much from someone 527 00:23:34,425 --> 00:23:36,966 sending you a message signed by the corresponding secret key, 528 00:23:36,966 --> 00:23:40,187 because it could've been anyone that stole the secret key. 529 00:23:40,187 --> 00:23:41,770 So that's another reason why you might 530 00:23:41,770 --> 00:23:44,250 want to revoke a certificate. 531 00:23:44,250 --> 00:23:47,220 And revoking certificates is pretty messy. 532 00:23:47,220 --> 00:23:51,100 There's not really a great plan for it. 533 00:23:51,100 --> 00:23:56,340 The two alternatives that people have tried 534 00:23:56,340 --> 00:24:00,690 are, first, to basically publish a list of all revoked 535 00:24:00,690 --> 00:24:01,800 certificates in the world. 536 00:24:01,800 --> 00:24:04,630 This is something called a certificate revocation 537 00:24:04,630 --> 00:24:06,550 list, or CRL.
538 00:24:06,550 --> 00:24:09,800 And the way this works is that every certificate 539 00:24:09,800 --> 00:24:11,830 authority issues these certificates, 540 00:24:11,830 --> 00:24:15,040 but then on the side, it maintains a list of mistakes. 541 00:24:15,040 --> 00:24:16,460 These are things where it realized 542 00:24:16,460 --> 00:24:18,126 it screwed up and issued a certificate 543 00:24:18,126 --> 00:24:20,320 under the wrong name, or its customers come to it 544 00:24:20,320 --> 00:24:22,070 and say, hey, you issued me a certificate. 545 00:24:22,070 --> 00:24:23,380 Everything was going great. 546 00:24:23,380 --> 00:24:25,020 But someone then got root on my machine 547 00:24:25,020 --> 00:24:26,370 and stole the private key. 548 00:24:26,370 --> 00:24:29,570 Please tell the world that my certificate is no good anymore. 549 00:24:29,570 --> 00:24:31,570 So this certificate authority, in principle, 550 00:24:31,570 --> 00:24:36,260 could add stuff to this CRL, and then clients like web browsers 551 00:24:36,260 --> 00:24:39,309 are supposed to download this CRL periodically. 552 00:24:39,309 --> 00:24:41,600 And then whenever they're presented with a certificate, 553 00:24:41,600 --> 00:24:43,100 they should check if the certificate 554 00:24:43,100 --> 00:24:45,689 appears in this revoked list. 555 00:24:45,689 --> 00:24:47,105 And if it shows up there, they should 556 00:24:47,105 --> 00:24:49,850 say that certificate's no good. 557 00:24:49,850 --> 00:24:51,384 You better give me a new one. 558 00:24:51,384 --> 00:24:53,200 I'm not going to trust this particular signed 559 00:24:53,200 --> 00:24:54,990 message anymore. 560 00:24:54,990 --> 00:24:56,770 So that's one plan. 561 00:24:56,770 --> 00:24:57,620 It's not great. 562 00:25:00,600 --> 00:25:02,754 If it were really used, it would be a giant list. 563 00:25:02,754 --> 00:25:04,920 And it would be quite a lot of overhead for everyone 564 00:25:04,920 --> 00:25:06,772 in the world to download this.
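A minimal sketch of the client-side CRL check just described, assuming the CA's signature on the certificate has already been verified. `Certificate`, `check_with_crl`, and `ExampleCA` are hypothetical names; identifying revoked certificates by issuer and serial number follows how X.509 CRLs work, but the data structures here are simplified stand-ins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Certificate:
    issuer: str       # which CA signed it
    serial: int       # CA-assigned serial number, the identifier a CRL lists
    name: str
    not_after: float  # expiration time (seconds since the epoch)

def check_with_crl(cert: Certificate, crls: dict, now: float) -> str:
    """Decide whether to trust `cert`, assuming its CA signature was already checked.

    `crls` maps an issuer name to the set of serial numbers that issuer has
    revoked, i.e. the lists the client downloaded periodically. In a real
    system each CRL is itself signed by the CA, and that signature must be checked too.
    """
    if now >= cert.not_after:
        return "reject: expired"
    if cert.serial in crls.get(cert.issuer, set()):
        return "reject: revoked"
    return "accept"

now = 1_700_000_000.0
cert = Certificate("ExampleCA", 42, "amazon.com", not_after=now + 86_400)
crls = {"ExampleCA": {7, 42}}  # the CA added serial 42 after the key was stolen
print(check_with_crl(cert, crls, now))                  # reject: revoked
print(check_with_crl(cert, {"ExampleCA": set()}, now))  # accept (an empty CRL revokes nothing)
```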
565 00:25:06,772 --> 00:25:08,480 The other problem is that no one actually 566 00:25:08,480 --> 00:25:11,370 bothers doing this stuff, so the lists in practice are empty. 567 00:25:11,370 --> 00:25:13,600 If you actually ask all these CAs, most of them 568 00:25:13,600 --> 00:25:16,210 will give you back an empty CRL because no one's ever bothered 569 00:25:16,210 --> 00:25:17,418 to add anything to this list. 570 00:25:17,418 --> 00:25:18,610 Because, why would you? 571 00:25:18,610 --> 00:25:20,330 It will only break things because it 572 00:25:20,330 --> 00:25:23,210 will reduce the number of connections that will succeed. 573 00:25:23,210 --> 00:25:26,210 So it's not clear whether there is a great motivation for CAs 574 00:25:26,210 --> 00:25:29,530 to maintain this CRL. 575 00:25:29,530 --> 00:25:31,460 The other thing that people have tried 576 00:25:31,460 --> 00:25:34,190 is to query the CAs online. 577 00:25:34,190 --> 00:25:39,200 Like in the Kerberos world, we contact the KDC all the time. 578 00:25:39,200 --> 00:25:41,590 And in the CA world we try to get out of this business 579 00:25:41,590 --> 00:25:43,090 and say, well, the CA's only going 580 00:25:43,090 --> 00:25:45,280 to sign these messages once a year. 581 00:25:45,280 --> 00:25:46,280 That's sort of a bummer. 582 00:25:46,280 --> 00:25:47,750 So there's an alternative protocol 583 00:25:47,750 --> 00:25:52,970 called the Online Certificate Status Protocol, or OCSP. 584 00:25:52,970 --> 00:25:57,050 And this protocol pushes us back from the CA world 585 00:25:57,050 --> 00:25:58,110 to the KDC world. 586 00:25:58,110 --> 00:26:00,840 So whenever a client gets a certificate 587 00:26:00,840 --> 00:26:03,300 and they're curious, is this really a valid certificate? 588 00:26:03,300 --> 00:26:05,160 Even though it's before the expiration time, 589 00:26:05,160 --> 00:26:06,660 maybe something went wrong.
590 00:26:06,660 --> 00:26:10,670 So using this OCSP protocol, you can contact some server 591 00:26:10,670 --> 00:26:12,760 and just say, hey, I got this certificate. 592 00:26:12,760 --> 00:26:14,330 Do you think it's still valid? 593 00:26:14,330 --> 00:26:18,220 So basically, this offloads the job of maintaining this CRL 594 00:26:18,220 --> 00:26:19,520 to a particular server. 595 00:26:19,520 --> 00:26:21,710 So instead of downloading a whole list yourself, 596 00:26:21,710 --> 00:26:23,334 you're going to ask the server, hey, is 597 00:26:23,334 --> 00:26:24,950 this thing in that list? 598 00:26:24,950 --> 00:26:27,710 So that's another plan that people have tried. 599 00:26:27,710 --> 00:26:33,710 It's also not used very widely because of two factors. 600 00:26:33,710 --> 00:26:38,990 One is that it adds latency to every request that you make. 601 00:26:38,990 --> 00:26:41,450 So every time you want to connect to a server, 602 00:26:41,450 --> 00:26:44,060 now you have to first connect, get the certificate 603 00:26:44,060 --> 00:26:45,370 from the server. 604 00:26:45,370 --> 00:26:46,950 Now you have to talk to this OCSP guy 605 00:26:46,950 --> 00:26:50,950 and then wait for him to respond and then do something else. 606 00:26:50,950 --> 00:26:52,980 So for latency reasons, this is actually 607 00:26:52,980 --> 00:26:54,970 not a super popular plan. 608 00:26:54,970 --> 00:26:56,980 Another problem is that you don't 609 00:26:56,980 --> 00:27:01,800 want this OCSP server being down to affect your ability 610 00:27:01,800 --> 00:27:02,880 to browse the web. 611 00:27:02,880 --> 00:27:04,369 Suppose this OCSP server goes down. 612 00:27:04,369 --> 00:27:06,160 You could, like, disable the whole internet 613 00:27:06,160 --> 00:27:08,090 because you can't check anyone's certificate. 614 00:27:08,090 --> 00:27:09,600 Like, it could be all bad. 615 00:27:09,600 --> 00:27:12,230 And then all your connections stop working.
616 00:27:12,230 --> 00:27:13,590 So no one wants that. 617 00:27:13,590 --> 00:27:17,030 So most clients treat the OCSP server 618 00:27:17,030 --> 00:27:21,194 being down as sort of an OK occurrence. 619 00:27:21,194 --> 00:27:23,110 This is really bad from a security perspective 620 00:27:23,110 --> 00:27:24,790 because if you're an attacker and you 621 00:27:24,790 --> 00:27:27,040 want to convince someone that you have 622 00:27:27,040 --> 00:27:30,000 a legitimate certificate, but it's actually been revoked, 623 00:27:30,000 --> 00:27:32,740 all you have to do is somehow prevent 624 00:27:32,740 --> 00:27:36,090 that client from talking to the OCSP server. 625 00:27:36,090 --> 00:27:39,080 And then the client will say, well, I got the certificate. 626 00:27:39,080 --> 00:27:40,510 I'll try to check it, but this guy 627 00:27:40,510 --> 00:27:42,770 doesn't seem to be around, so I'll just go for it. 628 00:27:42,770 --> 00:27:46,554 So that's basically the sort of lay of the land 629 00:27:46,554 --> 00:27:47,720 as far as revocation goes. 630 00:27:47,720 --> 00:27:50,150 So there's no real great answer. 631 00:27:50,150 --> 00:27:52,210 The thing that people do in practice 632 00:27:52,210 --> 00:27:54,710 as an alternative to this is that clients just hard 633 00:27:54,710 --> 00:27:58,500 code in really bad mistakes. 634 00:27:58,500 --> 00:28:01,220 So for example, the Chrome web browser actually 635 00:28:01,220 --> 00:28:04,080 ships inside of it with a list of certificates 636 00:28:04,080 --> 00:28:06,400 that Google really wants to revoke. 637 00:28:06,400 --> 00:28:08,570 So if someone mis-issues a certificate 638 00:28:08,570 --> 00:28:11,790 for Gmail or for some other important site-- like Facebook, 639 00:28:11,790 --> 00:28:15,780 Amazon, or whatever-- then the next release of Chrome 640 00:28:15,780 --> 00:28:19,950 will contain that thing in its revocation list baked 641 00:28:19,950 --> 00:28:21,300 into Chrome.
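The soft-fail behavior above, plus a client-side list of revoked certificates like the one Chrome ships, can be sketched as one decision function. This is a hypothetical model, not Chrome's mechanism or any OCSP library's API: `query_ocsp` stands in for a network call that reports whether the responder says "revoked" and may raise the made-up `OCSPUnavailable`, and the built-in list is assumed to be keyed by a SHA-256 fingerprint of the certificate bytes.

```python
import hashlib
from typing import Callable

class OCSPUnavailable(Exception):
    """Raised by the hypothetical responder query when the responder can't be reached."""

def fingerprint(cert_der: bytes) -> str:
    return hashlib.sha256(cert_der).hexdigest()

def revocation_check(cert_der: bytes,
                     query_ocsp: Callable[[str], bool],
                     builtin_blocklist: set,
                     hard_fail: bool = False) -> bool:
    """Return True if the certificate should be trusted, revocation-wise.

    query_ocsp(fp) returns True if the responder says "revoked" and may raise
    OCSPUnavailable. builtin_blocklist holds fingerprints shipped with the client.
    """
    fp = fingerprint(cert_der)
    if fp in builtin_blocklist:          # baked-in list: no network round trip
        return False
    try:
        return not query_ocsp(fp)        # online check: adds latency to every connection
    except OCSPUnavailable:
        return not hard_fail             # soft-fail: "no answer" is treated as OK

revoked_cert = b"hypothetical DER bytes of a revoked certificate"

def responder_blocked_by_attacker(fp: str) -> bool:
    raise OCSPUnavailable("connection dropped")

print(revocation_check(revoked_cert, responder_blocked_by_attacker, set()))                  # True
print(revocation_check(revoked_cert, responder_blocked_by_attacker, set(), hard_fail=True))  # False
print(revocation_check(revoked_cert, responder_blocked_by_attacker,
                       {fingerprint(revoked_cert)}))                                          # False
```

The first call shows the attack: blocking the responder is enough to get a revoked certificate accepted under soft-fail. Hard-fail closes that hole at the cost of breaking connections whenever the responder is down, and the built-in list avoids the network entirely but only covers the handful of certificates the vendor chose to ship.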
642 00:28:21,300 --> 00:28:23,884 So this way, you don't have to contact the CRL server. 643 00:28:23,884 --> 00:28:25,550 You don't have to talk to this OCSP guy. 644 00:28:25,550 --> 00:28:26,160 It's just baked in. 645 00:28:26,160 --> 00:28:27,910 Like, this certificate is no longer valid. 646 00:28:27,910 --> 00:28:29,016 The client rejects it. 647 00:28:29,016 --> 00:28:29,988 Yeah. 648 00:28:29,988 --> 00:28:30,474 AUDIENCE: Sorry, one last thing. 649 00:28:30,474 --> 00:28:30,960 PROFESSOR: Yeah. 650 00:28:30,960 --> 00:28:32,418 AUDIENCE: So let's say I've stolen the secret key 651 00:28:32,418 --> 00:28:33,876 on the certificate [INAUDIBLE]. 652 00:28:33,876 --> 00:28:35,140 All public keys are [? hard coded-- ?] 653 00:28:35,140 --> 00:28:35,973 PROFESSOR: Oh, yeah. 654 00:28:35,973 --> 00:28:38,730 That's [INAUDIBLE] really bad. 655 00:28:38,730 --> 00:28:42,280 I don't think there's any solution baked into the system 656 00:28:42,280 --> 00:28:45,690 right now for this. 657 00:28:45,690 --> 00:28:47,290 There certainly have been situations 658 00:28:47,290 --> 00:28:49,810 where certificate authorities appear 659 00:28:49,810 --> 00:28:51,400 to have been compromised. 660 00:28:51,400 --> 00:28:54,870 So in 2011, there were two CAs that 661 00:28:54,870 --> 00:28:58,200 were compromised, or they were somehow 662 00:28:58,200 --> 00:29:00,090 tricked into issuing certificates for Gmail, 663 00:29:00,090 --> 00:29:01,790 for Facebook, et cetera. 664 00:29:01,790 --> 00:29:03,290 And it's not clear. 665 00:29:03,290 --> 00:29:05,010 Maybe someone did steal their secret key. 666 00:29:05,010 --> 00:29:09,150 So what happened is I think those CAs actually 667 00:29:09,150 --> 00:29:12,010 got removed from the set of trusted CAs 668 00:29:12,010 --> 00:29:13,469 by browsers from that point on. 669 00:29:13,469 --> 00:29:15,760 So the next release of Chrome is just like, hey, you've 670 00:29:15,760 --> 00:29:16,390 really screwed up.
671 00:29:16,390 --> 00:29:18,515 We're going to kick you out of the set of CAs that 672 00:29:18,515 --> 00:29:19,195 are trusted. 673 00:29:19,195 --> 00:29:20,420 And it was actually kind of a bummer 674 00:29:20,420 --> 00:29:21,875 because all of the legitimate people that 675 00:29:21,875 --> 00:29:23,875 had certificates from that certificate authority 676 00:29:23,875 --> 00:29:25,030 are now out of luck. 677 00:29:25,030 --> 00:29:26,680 They have to get new certificates. 678 00:29:26,680 --> 00:29:28,440 So this is a somewhat messy system, 679 00:29:28,440 --> 00:29:33,190 but that's sort of what happens in practice with certificates. 680 00:29:33,190 --> 00:29:34,540 Make sense? 681 00:29:34,540 --> 00:29:38,330 Other questions about how this works? 682 00:29:38,330 --> 00:29:39,612 All right. 683 00:29:39,612 --> 00:29:43,530 So this is the sort of general plan for how certificates work. 684 00:29:43,530 --> 00:29:46,800 And as we were talking about, they're 685 00:29:46,800 --> 00:29:48,510 sort of better than Kerberos in the sense 686 00:29:48,510 --> 00:29:51,610 that you don't have to have this guy be online. 687 00:29:51,610 --> 00:29:53,580 It might be a little bit more scalable 688 00:29:53,580 --> 00:29:55,560 because you can have multiple CAs, 689 00:29:55,560 --> 00:29:57,200 and you don't have to talk to them. 690 00:29:57,200 --> 00:29:58,783 Another cool thing about this protocol 691 00:29:58,783 --> 00:30:01,220 is that unlike Kerberos, you're not forced 692 00:30:01,220 --> 00:30:02,980 to authenticate both parties. 693 00:30:02,980 --> 00:30:06,010 So you could totally connect to a web server 694 00:30:06,010 --> 00:30:08,157 without having a certificate for yourself. 695 00:30:08,157 --> 00:30:09,240 This happens all the time.
696 00:30:09,240 --> 00:30:10,930 If you just go to amazon.com, you 697 00:30:10,930 --> 00:30:13,380 are going to check that Amazon is the right entity, 698 00:30:13,380 --> 00:30:16,220 but Amazon has no idea who you are necessarily, or at least 699 00:30:16,220 --> 00:30:17,790 not until you log in later. 700 00:30:17,790 --> 00:30:20,040 So at the crypto protocol level, you have no certificate. 701 00:30:20,040 --> 00:30:20,720 Amazon has a certificate. 702 00:30:20,720 --> 00:30:22,260 So that's actually much better than Kerberos 703 00:30:22,260 --> 00:30:24,380 where in order to connect to a Kerberos service, 704 00:30:24,380 --> 00:30:28,030 you have to be an entry in the Kerberos database already. 705 00:30:28,030 --> 00:30:30,690 One thing that's a little bit of a bummer with this protocol 706 00:30:30,690 --> 00:30:33,640 as we've described it is that in fact, the server does 707 00:30:33,640 --> 00:30:35,359 have to have a certificate. 708 00:30:35,359 --> 00:30:36,900 So you can't just connect to a server 709 00:30:36,900 --> 00:30:39,080 and say, hey, let's just encrypt our stuff. 710 00:30:39,080 --> 00:30:41,920 I have no idea who you are, not really, 711 00:30:41,920 --> 00:30:43,540 and you don't have any idea who I am, 712 00:30:43,540 --> 00:30:44,940 but let's encrypt it anyway. 713 00:30:44,940 --> 00:30:46,870 So this is called opportunistic encryption, 714 00:30:46,870 --> 00:30:50,020 and it's of course vulnerable to man-in-the-middle attacks 715 00:30:50,020 --> 00:30:52,170 because you're connecting to someone and saying, 716 00:30:52,170 --> 00:30:53,120 well, let's encrypt our stuff, but you 717 00:30:53,120 --> 00:30:55,536 have no idea who it really is that you're encrypting stuff 718 00:30:55,536 --> 00:30:56,290 with. 719 00:30:56,290 --> 00:30:57,664 But it might be a good idea anyway.
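Python's standard `ssl` module makes the two modes discussed above concrete. `ssl.create_default_context()` gives the usual web setup: the server's certificate must chain to a trusted root and match the hostname, while the client presents no certificate of its own. Turning verification off approximates opportunistic encryption: the server still has to present some certificate, but the client no longer checks it, so a man in the middle can substitute its own. This sketch only builds the contexts and makes no network connection.

```python
import ssl

# Authenticated server, anonymous client: what a browser does for amazon.com.
# create_default_context() loads the system's trusted CA roots, requires a valid
# server certificate, and checks that it matches the hostname.
authenticated = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
print(authenticated.verify_mode == ssl.CERT_REQUIRED, authenticated.check_hostname)  # True True
# No load_cert_chain() call, so this client never presents a certificate.

# Opportunistic encryption: encrypt, but accept whatever certificate shows up.
# This protects against passive snooping only; an active attacker in the
# middle can present its own certificate and read everything.
opportunistic = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
opportunistic.check_hostname = False   # must be disabled before setting CERT_NONE
opportunistic.verify_mode = ssl.CERT_NONE
print(opportunistic.verify_mode == ssl.CERT_NONE, opportunistic.check_hostname)  # True False
```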
720 00:30:57,664 --> 00:31:00,309 If someone is not actively mounting an attack against you, 721 00:31:00,309 --> 00:31:02,850 at least the packets later on will be encrypted and protected 722 00:31:02,850 --> 00:31:04,520 from snooping. 723 00:31:04,520 --> 00:31:07,590 So it's a bit of a shame that this 724 00:31:07,590 --> 00:31:12,220 protocol that we're looking at here-- SSL, TLS, whatever-- 725 00:31:12,220 --> 00:31:15,180 doesn't offer this kind of opportunistic encryption thing. 726 00:31:15,180 --> 00:31:16,910 But such is life. 727 00:31:16,910 --> 00:31:19,380 So I guess the server [INAUDIBLE] in this protocol. 728 00:31:19,380 --> 00:31:24,100 The client sometimes can and sometimes doesn't have to. 729 00:31:24,100 --> 00:31:24,600 Make sense? 730 00:31:24,600 --> 00:31:24,870 Yeah. 731 00:31:24,870 --> 00:31:25,995 AUDIENCE: I'm just curious. 732 00:31:25,995 --> 00:31:27,852 What's to stop someone from-- I mean, 733 00:31:27,852 --> 00:31:30,487 let's just say that once a year, they 734 00:31:30,487 --> 00:31:32,850 create new name-key pairs. 735 00:31:32,850 --> 00:31:37,476 So why couldn't you kind of try to spend that entire year 736 00:31:37,476 --> 00:31:39,000 for that specific key? 737 00:31:39,000 --> 00:31:39,809 PROFESSOR: Huh? 738 00:31:39,809 --> 00:31:41,600 AUDIENCE: Why does that not work with this? 739 00:31:41,600 --> 00:31:42,840 PROFESSOR: I think it does work. 740 00:31:42,840 --> 00:31:45,048 So OK, so it's like what goes wrong with this scheme. 741 00:31:45,048 --> 00:31:46,990 Like, one of the things that we have 742 00:31:46,990 --> 00:31:49,440 to do is make sure the cryptography is good here, 743 00:31:49,440 --> 00:31:53,230 and as with Kerberos, people start off using good crypto, 744 00:31:53,230 --> 00:31:55,226 but it gets worse and worse over time. 745 00:31:55,226 --> 00:31:56,100 Computers get faster. 746 00:31:56,100 --> 00:31:58,010 There are better algorithms for breaking this stuff.
747 00:31:58,010 --> 00:32:00,180 And if people are not diligent about increasing 748 00:32:00,180 --> 00:32:02,869 their standards, then these problems do creep up. 749 00:32:02,869 --> 00:32:05,410 So for example, it used to be the case that many certificates 750 00:32:05,410 --> 00:32:06,274 were signed-- 751 00:32:06,274 --> 00:32:07,690 well, there are two things going on. 752 00:32:07,690 --> 00:32:09,420 There's a public key signature scheme. 753 00:32:09,420 --> 00:32:13,080 And then because public key crypto has some limitations, 754 00:32:13,080 --> 00:32:15,510 you typically-- actually, when you sign a message, 755 00:32:15,510 --> 00:32:17,360 you actually take a hash of the message 756 00:32:17,360 --> 00:32:19,710 and then you sign the hash itself because it's 757 00:32:19,710 --> 00:32:21,860 hard to sign a gigantic message, but it's 758 00:32:21,860 --> 00:32:24,347 easy to sign a compact hash. 759 00:32:24,347 --> 00:32:25,930 And one thing that actually went wrong 760 00:32:25,930 --> 00:32:29,390 is that people used to use MD5 as a hash function 761 00:32:29,390 --> 00:32:34,844 for collapsing the big message being signed into a 128-bit 762 00:32:34,844 --> 00:32:36,510 thing that you're going to actually sign 763 00:32:36,510 --> 00:32:38,400 with a crypto system. 764 00:32:38,400 --> 00:32:40,930 MD5 was good maybe 20 years ago, and then over time, 765 00:32:40,930 --> 00:32:43,770 people discovered weaknesses in MD5 that could be exploited. 766 00:32:43,770 --> 00:32:46,340 So actually, at some point, someone did actually 767 00:32:46,340 --> 00:32:49,560 ask for a certificate with a particular MD5 hash, 768 00:32:49,560 --> 00:32:51,880 and then they carefully figured out 769 00:32:51,880 --> 00:32:56,610 another message that hashes to the same MD5 value.
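Here is a toy demonstration of why a collision in the signed hash is fatal for hash-then-sign. It swaps MD5 for SHA-256 truncated to 16 bits, an assumption chosen so that a birthday search finishes instantly (the real MD5 attacks relied on cryptanalysis, not this kind of brute force). It also uses textbook RSA over small public Mersenne primes, which is insecure. The attacker varies a serial-like field in a benign request and in a malicious one until two hashes match, gets the benign one signed, and then reuses the signature.

```python
import hashlib
import itertools

# Toy textbook RSA CA key (publicly known Mersenne primes; INSECURE, illustration only).
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))

def weak_hash(msg: bytes) -> int:
    """Stand-in for a broken hash like MD5: SHA-256 truncated to 16 bits."""
    return int.from_bytes(hashlib.sha256(msg).digest()[:2], "big")

def ca_sign(msg: bytes) -> int:
    return pow(weak_hash(msg), D, N)       # the CA signs only the hash

def verify(msg: bytes, sig: int) -> bool:
    return pow(sig, E, N) == weak_hash(msg)

# Birthday search: remember hashes of benign variants, stop when a malicious
# variant lands on one of them.
seen = {}
for i in itertools.count():
    benign = b"name=attacker.example key=K1 serial=%d" % i
    seen.setdefault(weak_hash(benign), benign)
    evil = b"name=amazon.com key=K1 serial=%d" % i
    if weak_hash(evil) in seen:
        break

innocent = seen[weak_hash(evil)]
sig = ca_sign(innocent)                    # the CA happily signs the benign request
print(verify(evil, sig))                   # True: the signature transfers to the evil message
```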
770 00:32:56,610 --> 00:33:02,090 And as a result, now you have a signature by a CA on some hash, 771 00:33:02,090 --> 00:33:04,964 and then you have a different message, a different key, 772 00:33:04,964 --> 00:33:06,380 or a different name that you could 773 00:33:06,380 --> 00:33:08,870 convince someone was signed. 774 00:33:08,870 --> 00:33:10,697 And this does happen. 775 00:33:10,697 --> 00:33:13,280 Like, if you spend a lot of time trying to break a single key, 776 00:33:13,280 --> 00:33:15,430 then you will succeed eventually. 777 00:33:15,430 --> 00:33:18,750 If that certificate was using weak crypto, 778 00:33:18,750 --> 00:33:20,456 that could be brute-forced. 779 00:33:20,456 --> 00:33:22,830 Another example of something that's probably not so great 780 00:33:22,830 --> 00:33:24,067 now is if you're using RSA. 781 00:33:24,067 --> 00:33:25,525 We haven't really talked about RSA, 782 00:33:25,525 --> 00:33:27,465 but RSA is one of these public key crypto 783 00:33:27,465 --> 00:33:30,370 systems that allows us to either encrypt messages or sign 784 00:33:30,370 --> 00:33:31,610 messages. 785 00:33:31,610 --> 00:33:34,560 With RSA, these days, it's probably 786 00:33:34,560 --> 00:33:38,965 feasible to spend lots of money and break 1,000-bit RSA keys. 787 00:33:38,965 --> 00:33:41,090 You'd probably have to spend a fair amount of work, 788 00:33:41,090 --> 00:33:44,850 but it's doable, probably within a year easily. 789 00:33:44,850 --> 00:33:46,140 From there, absolutely. 790 00:33:46,140 --> 00:33:49,720 You can ask a certificate authority to sign some message, 791 00:33:49,720 --> 00:33:52,850 or you can even take someone's existing public key 792 00:33:52,850 --> 00:33:55,680 and try to brute force the corresponding secret key, or 793 00:33:55,680 --> 00:33:56,430 [? manual hack. ?] 794 00:33:56,430 --> 00:34:01,630 So you have to keep up with the attackers in some sense. 795 00:34:01,630 --> 00:34:03,745 You have to use larger keys with RSA.
796 00:34:03,745 --> 00:34:05,870 Or maybe you have to use a different crypto scheme. 797 00:34:05,870 --> 00:34:08,031 For example, now people don't use MD5 hashes 798 00:34:08,031 --> 00:34:08,739 in certificates. 799 00:34:08,739 --> 00:34:11,167 They use SHA-1, and that was good for a while. 800 00:34:11,167 --> 00:34:13,250 Now SHA-1 is also weak, and Google is actually now 801 00:34:13,250 --> 00:34:17,010 actively trying to push web developers and browser vendors 802 00:34:17,010 --> 00:34:19,712 et cetera to discontinue the use of SHA-1 803 00:34:19,712 --> 00:34:21,920 and use a different hash function because it's pretty 804 00:34:21,920 --> 00:34:24,489 clear that maybe in 5 or 10 years' time, 805 00:34:24,489 --> 00:34:27,070 there will be relatively easy attacks on SHA-1. 806 00:34:27,070 --> 00:34:29,000 It's already been shown to be weaker. 807 00:34:29,000 --> 00:34:31,931 So I guess there is no magic bullet, per se. 808 00:34:31,931 --> 00:34:33,389 You just have to make sure that you 809 00:34:33,389 --> 00:34:36,650 keep evolving with the hackers. 810 00:34:36,650 --> 00:34:37,908 Yeah. 811 00:34:37,908 --> 00:34:39,199 There is a problem, absolutely. 812 00:34:39,199 --> 00:34:41,199 Like, all of this stuff that we're talking about 813 00:34:41,199 --> 00:34:44,330 relies on crypto being correct, or sort of being hard to break. 814 00:34:44,330 --> 00:34:47,166 So you have to pick parameters suitably. 815 00:34:47,166 --> 00:34:49,179 At least here, there's an expiration time. 816 00:34:49,179 --> 00:34:51,800 So well, let's pick some parameters 817 00:34:51,800 --> 00:34:53,980 that are good for a year as opposed to for 10 years. 818 00:34:53,980 --> 00:34:55,820 The CA has a much bigger problem. 819 00:34:55,820 --> 00:34:59,570 This key, there's no expiration on it, necessarily. 820 00:34:59,570 --> 00:35:02,770 So it's less clear what to do there.
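Keeping up with attackers usually ends up as a parameter policy check in the client. The sketch below is hypothetical: rejecting MD5 and SHA-1 follows the discussion above, but the minimum RSA sizes of 2048 bits for one-year leaf certificates and 4096 bits for long-lived CA keys are assumed values, not thresholds taken from any particular browser. `CertParams` and `policy_problems` are invented names.

```python
from dataclasses import dataclass

# Assumed thresholds: no MD5 or SHA-1 signatures, and larger RSA keys for
# long-lived CA keys than for short-lived leaf certificates.
WEAK_HASHES = {"md5", "sha1"}
MIN_RSA_BITS = {"leaf": 2048, "ca": 4096}

@dataclass
class CertParams:
    role: str          # "leaf" or "ca"
    hash_alg: str      # e.g. "sha256"
    rsa_bits: int

def policy_problems(params: CertParams) -> list:
    """Return human-readable reasons to reject the certificate's crypto parameters."""
    problems = []
    if params.hash_alg.lower() in WEAK_HASHES:
        problems.append(f"weak signature hash: {params.hash_alg}")
    if params.rsa_bits < MIN_RSA_BITS[params.role]:
        problems.append(f"RSA key too small for {params.role}: {params.rsa_bits} bits")
    return problems

print(policy_problems(CertParams("leaf", "sha256", 2048)))  # []
print(policy_problems(CertParams("leaf", "MD5", 1024)))
print(policy_problems(CertParams("ca", "sha256", 2048)))
```

Since the thresholds live in data rather than code, raising them as attacks improve is a one-line change.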
821 00:35:02,770 --> 00:35:05,360 So probably, you would pick really aggressively sort 822 00:35:05,360 --> 00:35:07,400 of safe parameters. 823 00:35:07,400 --> 00:35:11,160 So 4,000- or 6,000-bit RSA or something. 824 00:35:11,160 --> 00:35:12,570 Or another scheme altogether. 825 00:35:12,570 --> 00:35:13,920 Don't use SHA-1 at all here. 826 00:35:16,710 --> 00:35:17,380 Yeah. 827 00:35:17,380 --> 00:35:19,500 No real clear answer. 828 00:35:19,500 --> 00:35:21,490 You just have to do it. 829 00:35:21,490 --> 00:35:21,990 All right. 830 00:35:21,990 --> 00:35:24,320 Any other questions? 831 00:35:24,320 --> 00:35:24,960 All right. 832 00:35:24,960 --> 00:35:27,740 So let's now look at-- so this is just like the protocol 833 00:35:27,740 --> 00:35:28,370 side of things. 834 00:35:28,370 --> 00:35:30,240 Let's now look at how we integrate this 835 00:35:30,240 --> 00:35:34,420 into a particular application, namely the web browser. 836 00:35:34,420 --> 00:35:38,640 So I guess if you want to secure network communication, or sort 837 00:35:38,640 --> 00:35:41,490 of websites, with cryptography, there are 838 00:35:41,490 --> 00:35:44,650 really three things we have to protect in the browser. 839 00:35:44,650 --> 00:35:48,220 So the first thing we have to protect is data on the network. 840 00:35:51,922 --> 00:35:55,421 And this is almost the easy part because well, we're 841 00:35:55,421 --> 00:35:57,420 just going to run a protocol very much like what 842 00:35:57,420 --> 00:35:58,810 I've been describing so far. 843 00:35:58,810 --> 00:36:00,920 We'll encrypt all the messages, sign them, 844 00:36:00,920 --> 00:36:02,670 make sure they haven't been tampered with, 845 00:36:02,670 --> 00:36:04,410 all this great stuff. 846 00:36:04,410 --> 00:36:06,620 So that's how we're going to protect data. 847 00:36:06,620 --> 00:36:08,850 But then there are two other things in a web browser 848 00:36:08,850 --> 00:36:11,315 that we really have to worry about.
849 00:36:11,315 --> 00:36:13,970 So the first of them is anything that 850 00:36:13,970 --> 00:36:15,340 actually runs in the browser. 851 00:36:15,340 --> 00:36:16,960 So code that's running in the browser, 852 00:36:16,960 --> 00:36:19,510 like JavaScript, or important data that's 853 00:36:19,510 --> 00:36:21,340 stored in the browser. 854 00:36:21,340 --> 00:36:24,950 Maybe your cookies, or local storage, or lots of other stuff 855 00:36:24,950 --> 00:36:27,005 that goes on in a modern browser, all 856 00:36:27,005 --> 00:36:29,795 has to be somehow protected from network [? of hackers. ?] 857 00:36:29,795 --> 00:36:31,170 And we'll see the kinds of things 858 00:36:31,170 --> 00:36:33,390 we have to worry about here in a second. 859 00:36:33,390 --> 00:36:36,920 And then the last thing that you might not think about too much 860 00:36:36,920 --> 00:36:40,060 but turns out to be a real issue in practice 861 00:36:40,060 --> 00:36:43,950 is protecting the user interface. 862 00:36:43,950 --> 00:36:47,472 And the reason for this is that ultimately, 863 00:36:47,472 --> 00:36:49,930 much of the confidential data that we care about protecting 864 00:36:49,930 --> 00:36:50,763 comes from the user. 865 00:36:50,763 --> 00:36:53,829 And the user is typing this stuff into some website, 866 00:36:53,829 --> 00:36:55,620 and the user probably has multiple websites 867 00:36:55,620 --> 00:36:57,411 open on their computer, so the user has 868 00:36:57,411 --> 00:36:59,760 to be able to distinguish which site they're actually 869 00:36:59,760 --> 00:37:01,980 interacting with at any moment in time. 870 00:37:01,980 --> 00:37:04,690 If they accidentally type their Amazon password into some web 871 00:37:04,690 --> 00:37:06,674 discussion forum, it's going to be disastrous, 872 00:37:06,674 --> 00:37:09,090 depending on how much you care about your Amazon password, 873 00:37:09,090 --> 00:37:10,680 but still.
874 00:37:10,680 --> 00:37:14,550 So you really want to have good user interface 875 00:37:14,550 --> 00:37:17,077 elements that help the user figure out 876 00:37:17,077 --> 00:37:17,910 what they are doing. 877 00:37:17,910 --> 00:37:20,500 Am I typing this confidential data into the right website, 878 00:37:20,500 --> 00:37:23,590 and what's going to happen to this data when I submit it? 879 00:37:23,590 --> 00:37:30,360 So this turns out to be a pretty important issue for protecting 880 00:37:30,360 --> 00:37:33,380 web applications. 881 00:37:33,380 --> 00:37:34,170 All right. 882 00:37:34,170 --> 00:37:35,080 Make sense? 883 00:37:35,080 --> 00:37:37,760 So let's talk about what current web 884 00:37:37,760 --> 00:37:39,890 browsers actually do on this front. 885 00:37:39,890 --> 00:37:42,510 So as I mentioned, for protecting data on the network, 886 00:37:42,510 --> 00:37:47,290 we're just going to use this protocol called SSL, or TLS now, 887 00:37:47,290 --> 00:37:49,000 that encrypts and authenticates data. 888 00:37:49,000 --> 00:37:51,670 It looks very similar to the kind of discussion we've had 889 00:37:51,670 --> 00:37:53,110 so far. 890 00:37:53,110 --> 00:37:56,180 It includes the certificate authorities, et cetera. 891 00:37:56,180 --> 00:37:58,710 And then of course, many more details. 892 00:37:58,710 --> 00:38:02,080 Like, TLS is hugely complicated, but it's not 893 00:38:02,080 --> 00:38:06,064 particularly interesting from this [INAUDIBLE] angle. 894 00:38:06,064 --> 00:38:08,230 All right, so protecting stuff in the browser 895 00:38:08,230 --> 00:38:10,670 turns out to be much more interesting.
896 00:38:10,670 --> 00:38:13,950 And the reason is that we need to make sure 897 00:38:13,950 --> 00:38:19,520 that any code or data delivered over non-encrypted connections 898 00:38:19,520 --> 00:38:22,090 can't tamper with code and data that 899 00:38:22,090 --> 00:38:24,121 came from an encrypted connection, 900 00:38:24,121 --> 00:38:26,620 because our threat model is that anything that's unencrypted 901 00:38:26,620 --> 00:38:30,970 could potentially be tampered with by a network attacker. 902 00:38:30,970 --> 00:38:33,730 So we have to make sure that if we 903 00:38:33,730 --> 00:38:38,250 have some unencrypted JavaScript code running in our browser, 904 00:38:38,250 --> 00:38:40,220 then we should assume that it could've 905 00:38:40,220 --> 00:38:41,750 been tampered with by an attacker because it wasn't encrypted. 906 00:38:41,750 --> 00:38:44,077 It wasn't authenticated over the network. 907 00:38:44,077 --> 00:38:45,660 And consequently, we should prevent it 908 00:38:45,660 --> 00:38:48,790 from tampering with any pages that were delivered 909 00:38:48,790 --> 00:38:50,749 over an encrypted connection. 910 00:38:50,749 --> 00:38:52,540 So the general plan for this is we're going 911 00:38:52,540 --> 00:38:56,620 to introduce a new URL scheme. 912 00:38:56,620 --> 00:38:57,950 Let's call it HTTPS. 913 00:38:57,950 --> 00:39:02,710 So you often see this in URLs, presumably in your own life. 914 00:39:02,710 --> 00:39:07,640 And there are going to be two things that-- well, first 915 00:39:07,640 --> 00:39:10,190 of all, the cool thing about introducing a new URL scheme 916 00:39:10,190 --> 00:39:13,440 is that now, these URLs are just different from HTTP URLs. 917 00:39:13,440 --> 00:39:16,560 So if you have a URL that's https://something, 918 00:39:16,560 --> 00:39:19,290 it's a different origin 919 00:39:19,290 --> 00:39:23,420 as far as the same origin policy is concerned from regular HTTP 920 00:39:23,420 --> 00:39:24,350 URLs.
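To make the origin point concrete, here is a minimal Python sketch of the (scheme, host, port) triple that the same-origin policy compares, built on the standard library's `urllib.parse.urlsplit`. The helper names `origin` and `same_origin` and the two-entry default-port table are assumptions for illustration; browsers implement their own URL parsing and origin rules.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Default ports for the two schemes discussed here (an assumption of this sketch;
# other schemes raise KeyError).
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str) -> Tuple[str, str, int]:
    """Return the (scheme, host, port) triple compared by the same-origin policy."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the host name
    port = parts.port or DEFAULT_PORTS[scheme]
    return (scheme, host, port)

def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)

print(same_origin("https://foo.com/index.html", "https://foo.com:443/app.js"))  # True
print(same_origin("https://foo.com/", "http://foo.com/"))  # False: the scheme differs
```

Because the scheme is part of the triple, a page loaded over `http://` can never be same-origin with one loaded over `https://`, even on the same host.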
921 00:39:24,350 --> 00:39:26,680 So HTTP URLs go over unencrypted connections. 922 00:39:26,680 --> 00:39:29,430 HTTPS URLs go over SSL/TLS. 923 00:39:29,430 --> 00:39:31,770 So you'll never confuse the two if the same origin 924 00:39:31,770 --> 00:39:35,130 policy does its job correctly. 925 00:39:35,130 --> 00:39:37,320 So that's one bit of the puzzle. 926 00:39:37,320 --> 00:39:39,180 But then you also have to make sure 927 00:39:39,180 --> 00:39:44,040 that you correctly distinguish different encrypted sites 928 00:39:44,040 --> 00:39:44,940 from one another. 929 00:39:44,940 --> 00:39:47,420 And it turns out cookies have a different policy 930 00:39:47,420 --> 00:39:49,072 for historical reasons. 931 00:39:49,072 --> 00:39:50,530 So let's first talk about how we're 932 00:39:50,530 --> 00:39:52,740 going to distinguish different encrypted sites 933 00:39:52,740 --> 00:39:53,980 from one another. 934 00:39:53,980 --> 00:39:56,010 So the plan for that is that, actually, 935 00:39:56,010 --> 00:40:02,965 the host name in the URL has to be the name in the certificate. 936 00:40:05,730 --> 00:40:07,627 So it turns out that's what 937 00:40:07,627 --> 00:40:09,126 the certificate authorities are 938 00:40:09,126 --> 00:40:10,600 going to sign at the end of the day. 939 00:40:10,600 --> 00:40:14,150 They're going to literally sign the host name that shows up 940 00:40:14,150 --> 00:40:18,510 in your URL as the name for your web server's public key. 941 00:40:18,510 --> 00:40:22,684 So Amazon presumably has a certificate for www.amazon.com. 942 00:40:22,684 --> 00:40:24,100 That's the name, and then there's 943 00:40:24,100 --> 00:40:27,000 the public key corresponding to their secret key. 944 00:40:27,000 --> 00:40:29,050 And this is what the browser's going to look for.
945 00:40:29,050 --> 00:40:31,970 So if it gets a certificate-- well, 946 00:40:31,970 --> 00:40:38,016 if it tries to connect to or get a URL that's https://foo.com, 947 00:40:38,016 --> 00:40:40,620 it had better be the case that the server presents a certificate 948 00:40:40,620 --> 00:40:42,570 for foo.com exactly. 949 00:40:42,570 --> 00:40:45,560 Otherwise, we'll say, well, we tried to connect to one guy, 950 00:40:45,560 --> 00:40:47,500 but we actually got another guy. 951 00:40:47,500 --> 00:40:49,680 There's a different name in the certificate 952 00:40:49,680 --> 00:40:50,890 of the server we connected to. 953 00:40:50,890 --> 00:40:54,690 And that'll be a certificate mismatch. 954 00:40:54,690 --> 00:40:57,230 So that's how we are going to distinguish different sites 955 00:40:57,230 --> 00:40:57,730 from one another. 956 00:40:57,730 --> 00:40:59,229 We're basically going to get the CAs 957 00:40:59,229 --> 00:41:01,370 to help us tell these sites apart, 958 00:41:01,370 --> 00:41:03,630 and the CAs are going to promise to issue certificates 959 00:41:03,630 --> 00:41:05,872 only to the right entities. 960 00:41:05,872 --> 00:41:07,330 So that's, on the same origin policy 961 00:41:07,330 --> 00:41:11,300 side, how we're going to keep the code apart. 962 00:41:11,300 --> 00:41:15,050 And then as it turns out-- well, as you might remember, 963 00:41:15,050 --> 00:41:17,475 cookies have a slightly different policy. 964 00:41:17,475 --> 00:41:20,800 Like, it's almost the same origin, but not quite. 965 00:41:20,800 --> 00:41:23,980 So cookies have a slightly different plan. 966 00:41:23,980 --> 00:41:30,610 Cookies have this secure flag that you can set on them. 967 00:41:30,610 --> 00:41:34,220 So the rules are, if a cookie has a secure flag, 968 00:41:34,220 --> 00:41:39,940 then it gets sent only to HTTPS requests, 969 00:41:39,940 --> 00:41:42,330 or rather, along with HTTPS requests.
970 00:41:42,330 --> 00:41:45,700 And if a cookie doesn't have a secure flag, 971 00:41:45,700 --> 00:41:49,590 then it applies to both HTTP and HTTPS requests. 972 00:41:49,590 --> 00:41:51,600 Well, it's a little bit complicated, right. 973 00:41:51,600 --> 00:41:53,640 It would be cleaner if cookies just said, 974 00:41:53,640 --> 00:41:57,080 well, this is a cookie for an HTTPS host, 975 00:41:57,080 --> 00:41:58,700 and this is a cookie for an HTTP host. 976 00:41:58,700 --> 00:42:00,470 And they're just completely different. 977 00:42:00,470 --> 00:42:03,410 That would be very clear in terms of isolating secure sites 978 00:42:03,410 --> 00:42:05,007 from insecure sites. 979 00:42:05,007 --> 00:42:06,590 Unfortunately, for historical reasons, 980 00:42:06,590 --> 00:42:09,370 cookies have this weird sort of interaction. 981 00:42:09,370 --> 00:42:12,250 So if a cookie is marked secure, then it only 982 00:42:12,250 --> 00:42:14,660 applies to HTTPS sites. 983 00:42:14,660 --> 00:42:16,830 Well, the host has to match as well, right. 984 00:42:16,830 --> 00:42:20,920 So secure cookies apply only to HTTPS URLs for that host, 985 00:42:20,920 --> 00:42:22,740 and insecure cookies apply to both. 986 00:42:22,740 --> 00:42:27,310 So that will be a source of problems for us in a second. 987 00:42:27,310 --> 00:42:29,450 Make sense? 988 00:42:29,450 --> 00:42:30,080 All right. 989 00:42:30,080 --> 00:42:32,490 And the final bit that web browsers 990 00:42:32,490 --> 00:42:38,870 do to try to help us along in this plan is, for the UI aspect, 991 00:42:38,870 --> 00:42:43,540 they introduce some kind of a lock icon 992 00:42:43,540 --> 00:42:45,660 that users are supposed to see. 993 00:42:45,660 --> 00:42:48,220 So there's a lock icon in your browser, 994 00:42:48,220 --> 00:42:51,150 plus you're supposed to look at the URL to figure out 995 00:42:51,150 --> 00:42:52,150 which site you're on.
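The secure-flag rule can be sketched as a filter over a cookie jar. This is a deliberately simplified model, assuming a hypothetical `Cookie` record scoped to a single exact host; real browsers also apply the `Domain` and `Path` attributes, expiry, and other rules that are not modeled here.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlsplit

@dataclass
class Cookie:
    name: str
    value: str
    host: str             # exact host this cookie belongs to (simplification)
    secure: bool = False  # the "secure" flag

def cookies_for_request(jar: List[Cookie], url: str) -> List[Cookie]:
    """Return the cookies a browser would attach to a request for `url`."""
    parts = urlsplit(url)
    is_https = parts.scheme == "https"
    host = parts.hostname or ""
    return [
        c for c in jar
        if c.host == host               # scoped by host name, not by scheme
        and (is_https or not c.secure)  # secure cookies only go along with HTTPS
    ]

jar = [
    Cookie("session", "s3cr3t", "foo.com", secure=True),
    Cookie("prefs", "dark", "foo.com"),
]
print([c.name for c in cookies_for_request(jar, "https://foo.com/")])  # ['session', 'prefs']
print([c.name for c in cookies_for_request(jar, "http://foo.com/")])   # ['prefs']
```

The asymmetry shows up in the second call: the non-secure `prefs` cookie still rides along on a plain HTTP request, which a network attacker can observe.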
996 00:42:52,150 --> 00:42:55,860 Now, that's how web browser developers expect 997 00:42:55,860 --> 00:42:57,030 you to think of the world. 998 00:42:57,030 --> 00:43:00,100 Like, if you're ever entering confidential stuff 999 00:43:00,100 --> 00:43:02,850 into some website, then you should look at the URL, 1000 00:43:02,850 --> 00:43:04,816 make sure that's the actual host name 1001 00:43:04,816 --> 00:43:06,690 that you want to be talking to, and then look 1002 00:43:06,690 --> 00:43:08,980 for some sort of a lock icon, and then you 1003 00:43:08,980 --> 00:43:10,755 should assume things are good to go. 1004 00:43:10,755 --> 00:43:12,735 So that's the UI aspect of it. 1005 00:43:12,735 --> 00:43:13,360 It's not great. 1006 00:43:13,360 --> 00:43:17,880 It turns out that many phishing sites will just 1007 00:43:17,880 --> 00:43:20,800 include an image of a lock icon in the site itself 1008 00:43:20,800 --> 00:43:21,960 and have a different URL. 1009 00:43:21,960 --> 00:43:24,030 And if users don't know exactly what to look for 1010 00:43:24,030 --> 00:43:29,210 or what's going on, they might be fooled by this. 1011 00:43:29,210 --> 00:43:31,960 So this UI side is a little messy, 1012 00:43:31,960 --> 00:43:34,790 partly because users are humans, and humans are messy. 1013 00:43:34,790 --> 00:43:36,515 And it's really hard to tell what's 1014 00:43:36,515 --> 00:43:37,640 the right thing to do here. 1015 00:43:37,640 --> 00:43:40,110 So we'll focus mostly on protecting stuff in the browser, 1016 00:43:40,110 --> 00:43:43,652 which is much easier to discuss precisely. 1017 00:43:43,652 --> 00:43:45,290 Make sense? 1018 00:43:45,290 --> 00:43:47,665 Any questions about this stuff so far? 1019 00:43:47,665 --> 00:43:48,164 Yeah. 1020 00:43:48,164 --> 00:43:49,538 AUDIENCE: I noticed some websites 1021 00:43:49,538 --> 00:43:51,624 that are HTTPS [INAUDIBLE]. 1022 00:43:51,624 --> 00:43:52,290 PROFESSOR: Yeah.
1023 00:43:52,290 --> 00:43:57,030 So it turns out that browsers have changed over time what 1024 00:43:57,030 --> 00:43:59,820 it means to get a lock icon. 1025 00:43:59,820 --> 00:44:03,370 So one thing that some browsers do 1026 00:44:03,370 --> 00:44:06,390 is they give you a lock icon only 1027 00:44:06,390 --> 00:44:09,850 if all of the content or the resources within your page 1028 00:44:09,850 --> 00:44:11,462 were also served over HTTPS. 1029 00:44:11,462 --> 00:44:12,920 So this is one of the problems that 1030 00:44:12,920 --> 00:44:15,680 ForceHTTPS tries to address: this mixed 1031 00:44:15,680 --> 00:44:19,910 content, or insecure embedding, kind of problem. 1032 00:44:19,910 --> 00:44:22,290 So sometimes, you will fail to get a lock icon because 1033 00:44:22,290 --> 00:44:23,950 of that check. 1034 00:44:23,950 --> 00:44:25,820 Other times, maybe your certificate 1035 00:44:25,820 --> 00:44:26,820 isn't quite good enough. 1036 00:44:26,820 --> 00:44:29,700 So for example, Chrome will not give you a lock icon 1037 00:44:29,700 --> 00:44:34,430 if it thinks your certificate uses weak cryptography. 1038 00:44:34,430 --> 00:44:35,990 But also, it varies between browsers. 1039 00:44:35,990 --> 00:44:38,280 So maybe Chrome will not give you a lock icon, 1040 00:44:38,280 --> 00:44:39,470 but Firefox will. 1041 00:44:39,470 --> 00:44:43,000 So again, there's no clear spec 1042 00:44:43,000 --> 00:44:44,920 on what this lock icon means. 1043 00:44:44,920 --> 00:44:51,060 People just sweep stuff under this lock icon. 1044 00:44:51,060 --> 00:44:54,114 Other questions? 1045 00:44:54,114 --> 00:44:55,090 All right. 1046 00:44:55,090 --> 00:44:58,830 So let's look at, I guess, what kinds of problems 1047 00:44:58,830 --> 00:45:01,082 we run into with this plan.
1048 00:45:01,082 --> 00:45:03,290 So one thing I guess we should maybe talk about first 1049 00:45:03,290 --> 00:45:07,540 is, OK, in regular HTTP, we used 1050 00:45:07,540 --> 00:45:11,010 to rely on DNS to give us the correct IP 1051 00:45:11,010 --> 00:45:13,100 address of the server. 1052 00:45:13,100 --> 00:45:17,595 So how much do we have to trust DNS for these HTTPS URLs? 1053 00:45:20,960 --> 00:45:24,049 Are DNS servers trusted, or are these DNS mappings 1054 00:45:24,049 --> 00:45:25,090 important for us anymore? 1055 00:45:28,360 --> 00:45:28,860 Yeah. 1056 00:45:28,860 --> 00:45:30,860 AUDIENCE: They are, because the certificate 1057 00:45:30,860 --> 00:45:32,819 is signing the domain name. 1058 00:45:32,819 --> 00:45:34,860 I don't think you sign an IP address [INAUDIBLE]. 1059 00:45:34,860 --> 00:45:35,170 PROFESSOR: That's right. 1060 00:45:35,170 --> 00:45:35,390 Yeah. 1061 00:45:35,390 --> 00:45:37,098 So the certificate signs the domain name. 1062 00:45:37,098 --> 00:45:38,430 So this is like amazon.com. 1063 00:45:42,040 --> 00:45:43,840 So [INAUDIBLE]. 1064 00:45:43,840 --> 00:45:47,776 AUDIENCE: Say someone steals amazon.com's private key 1065 00:45:47,776 --> 00:45:51,220 and [INAUDIBLE] another server with another IP address, 1066 00:45:51,220 --> 00:45:54,849 and combines [INAUDIBLE] IP address [INAUDIBLE]. 1067 00:45:54,849 --> 00:45:56,640 But then you already stole the private key. 1068 00:45:56,640 --> 00:45:57,030 PROFESSOR: That's right. 1069 00:45:57,030 --> 00:45:57,530 Yeah. 1070 00:45:57,530 --> 00:45:59,220 So in fact, you're describing an attack where you both 1071 00:45:59,220 --> 00:46:02,880 steal the private key and redirect DNS to yourself. 1072 00:46:02,880 --> 00:46:07,852 So is DNS in itself sensitive enough for us to care about?
1073 00:46:07,852 --> 00:46:09,310 I guess in some sense you're right, 1074 00:46:09,310 --> 00:46:11,130 that we need DNS to find the IP address, 1075 00:46:11,130 --> 00:46:13,615 or otherwise we'd be lost, because this is just the host 1076 00:46:13,615 --> 00:46:15,990 name, and we still need to find the IP address to talk to it. 1077 00:46:15,990 --> 00:46:17,830 What if someone compromises the DNS server 1078 00:46:17,830 --> 00:46:19,814 and points us at a different IP address? 1079 00:46:19,814 --> 00:46:20,730 Is it going to be bad? 1080 00:46:20,730 --> 00:46:21,230 Yeah. 1081 00:46:21,230 --> 00:46:25,019 AUDIENCE: Well, maybe just [INAUDIBLE] HTTPS. 1082 00:46:25,019 --> 00:46:26,810 PROFESSOR: So potentially worrisome, right. 1083 00:46:26,810 --> 00:46:28,763 So they might just refuse the connection altogether. 1084 00:46:28,763 --> 00:46:29,709 AUDIENCE: Well, no. 1085 00:46:29,709 --> 00:46:31,810 They just redirect you to the HTTP URL. 1086 00:46:31,810 --> 00:46:33,351 PROFESSOR: Well, so certainly, if you 1087 00:46:33,351 --> 00:46:37,290 connect to it over HTTPS, then they can't redirect you. 1088 00:46:37,290 --> 00:46:39,660 But yeah. 1089 00:46:39,660 --> 00:46:40,160 Yeah. 1090 00:46:40,160 --> 00:46:44,660 AUDIENCE: You can [INAUDIBLE] and try to fool the user. 1091 00:46:44,660 --> 00:46:46,520 That's [INAUDIBLE]. 1092 00:46:46,520 --> 00:46:47,770 PROFESSOR: That's right, yeah. 1093 00:46:47,770 --> 00:46:49,960 So the thing that you mentioned is 1094 00:46:49,960 --> 00:46:53,190 that you could try to serve up a different certificate. 1095 00:46:53,190 --> 00:46:56,097 So maybe you-- well, one possibility is you somehow 1096 00:46:56,097 --> 00:46:57,930 compromised the CA, in which case, all right, 1097 00:46:57,930 --> 00:46:59,470 you're in business. 1098 00:46:59,470 --> 00:47:01,434 Another possibility is that you just sign 1099 00:47:01,434 --> 00:47:02,600 the certificate yourself.
1100 00:47:02,600 --> 00:47:04,725 Or maybe you have some old certificate for this guy 1101 00:47:04,725 --> 00:47:08,240 that you've gotten the private key for. 1102 00:47:08,240 --> 00:47:11,070 And it turns out that, 1103 00:47:11,070 --> 00:47:14,300 as this ForceHTTPS paper we're reading touched on, 1104 00:47:14,300 --> 00:47:18,570 most web browsers actually ask the user if something doesn't 1105 00:47:18,570 --> 00:47:20,205 look right with the certificate, which 1106 00:47:20,205 --> 00:47:21,580 seems like a fairly strange thing 1107 00:47:21,580 --> 00:47:23,850 to do, because here's the rule. 1108 00:47:23,850 --> 00:47:26,300 The host name has to match the name in the certificate, 1109 00:47:26,300 --> 00:47:27,400 and it has to be valid. 1110 00:47:27,400 --> 00:47:30,450 It has to be unexpired. All these very clear rules. 1111 00:47:30,450 --> 00:47:34,640 But because of the way HTTPS has historically been deployed, 1112 00:47:34,640 --> 00:47:38,030 it's often been the case that web server operators 1113 00:47:38,030 --> 00:47:40,270 misconfigure HTTPS. 1114 00:47:40,270 --> 00:47:43,330 So maybe they just forget to renew their certificate. 1115 00:47:43,330 --> 00:47:45,380 Things were going along great, and you 1116 00:47:45,380 --> 00:47:47,697 didn't notice that your certificate had expired 1117 00:47:47,697 --> 00:47:49,030 because you just forgot to renew it. 1118 00:47:49,030 --> 00:47:51,368 So to web browser developers, that 1119 00:47:51,368 --> 00:47:52,576 seems like a bit of a bummer. 1120 00:47:52,576 --> 00:47:53,076 Oh, man. 1121 00:47:53,076 --> 00:47:54,020 It's just expired. 1122 00:47:54,020 --> 00:47:55,510 Let's just let the user continue. 1123 00:47:55,510 --> 00:47:57,880 So they offer a dialog box for the user saying, 1124 00:47:57,880 --> 00:47:59,610 well, I got a certificate, but it 1125 00:47:59,610 --> 00:48:01,050 doesn't look right in some way.
1126 00:48:01,050 --> 00:48:04,220 [INAUDIBLE] go ahead anyway and continue. 1127 00:48:04,220 --> 00:48:07,880 So web browsers will allow users to sort of override 1128 00:48:07,880 --> 00:48:10,785 this decision on things like expiration of certificates. 1129 00:48:10,785 --> 00:48:13,287 Also for host names, it might be the case 1130 00:48:13,287 --> 00:48:14,620 that your website has many names. 1131 00:48:14,620 --> 00:48:16,440 Like, for Amazon, you might connect 1132 00:48:16,440 --> 00:48:21,430 to amazon.com, or maybe www.amazon.com, or maybe 1133 00:48:21,430 --> 00:48:23,220 other host names. 1134 00:48:23,220 --> 00:48:26,007 And if you are not careful as the website operator, 1135 00:48:26,007 --> 00:48:27,590 you might not know to get certificates 1136 00:48:27,590 --> 00:48:30,910 for every possible name that your website has. 1137 00:48:30,910 --> 00:48:33,615 And then a user is sort of stuck saying, well, 1138 00:48:33,615 --> 00:48:35,240 the host name doesn't look quite right, 1139 00:48:35,240 --> 00:48:37,020 but maybe let's go anyway. 1140 00:48:37,020 --> 00:48:39,070 So this is the reason why web browsers allow 1141 00:48:39,070 --> 00:48:44,560 users to accept a broader range 1142 00:48:44,560 --> 00:48:47,291 of certificates than these rules might otherwise dictate. 1143 00:48:47,291 --> 00:48:48,540 So that's [INAUDIBLE] problem. 1144 00:48:48,540 --> 00:48:51,484 And then if you hijack DNS, then you 1145 00:48:51,484 --> 00:48:52,900 might be able to redirect the user 1146 00:48:52,900 --> 00:48:54,695 to one of these sites that serves up 1147 00:48:54,695 --> 00:48:56,830 an incorrect certificate. 1148 00:48:56,830 --> 00:48:59,270 And if the user isn't careful, they 1149 00:48:59,270 --> 00:49:03,230 might approve the browser accepting 1150 00:49:03,230 --> 00:49:07,102 your certificate, and then they're in trouble.
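The rule the browser is supposed to apply, plus the escape hatch browsers added, can be sketched roughly as follows. The `Certificate` record is hypothetical and stripped down to one name and an expiry time; real X.509 validation also checks the signature chain up to a trusted CA, can match several names or wildcards, and more. The `strict` parameter is an assumption meant to loosely mirror the ForceHTTPS idea of refusing, rather than prompting on, certificate errors.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List

@dataclass
class Certificate:
    # Hypothetical, simplified certificate: one name plus an expiry time.
    name: str
    not_after: datetime

def certificate_problems(cert: Certificate, url_host: str, now: datetime) -> List[str]:
    """List the ways `cert` fails the basic rules for a connection to `url_host`."""
    problems = []
    if cert.name.lower() != url_host.lower():  # exact name match only in this sketch
        problems.append(f"name mismatch: {cert.name} != {url_host}")
    if now > cert.not_after:
        problems.append("certificate expired")
    return problems

def should_proceed(cert: Certificate, url_host: str, now: datetime,
                   ask_user: Callable[[List[str]], bool], strict: bool = False) -> bool:
    problems = certificate_problems(cert, url_host, now)
    if not problems:
        return True
    if strict:
        return False           # hard failure: no dialog box at all
    return ask_user(problems)  # lenient behavior: let the user click through

now = datetime(2014, 11, 1, tzinfo=timezone.utc)
cert = Certificate("www.amazon.com", datetime(2015, 1, 1, tzinfo=timezone.utc))
print(certificate_problems(cert, "amazon.com", now))  # ['name mismatch: www.amazon.com != amazon.com']
```

If `ask_user` is a dialog that users habitually accept, the checks above are effectively disabled, which is exactly the leeway that makes DNS hijacking dangerous here.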
1151 00:49:07,102 --> 00:49:12,354 That's a bit of a gray area with respect to how much 1152 00:49:12,354 --> 00:49:13,520 you should really trust DNS. 1153 00:49:13,520 --> 00:49:15,978 So you certainly don't want to give arbitrary users control 1154 00:49:15,978 --> 00:49:17,890 of your DNS name [INAUDIBLE]. 1155 00:49:17,890 --> 00:49:21,900 But certainly, the goal of SSL/TLS and HTTPS, 1156 00:49:21,900 --> 00:49:25,290 all this stuff, is to hopefully not trust DNS at all. 1157 00:49:25,290 --> 00:49:27,940 If everything works here correctly, 1158 00:49:27,940 --> 00:49:30,150 then DNS shouldn't need to be trusted. 1159 00:49:30,150 --> 00:49:31,794 You can [INAUDIBLE]. 1160 00:49:31,794 --> 00:49:33,710 You should never be able to intercept any data 1161 00:49:33,710 --> 00:49:36,612 or corrupt data, et cetera. 1162 00:49:36,612 --> 00:49:37,847 Make sense? 1163 00:49:37,847 --> 00:49:39,430 That's if everything works, of course. 1164 00:49:39,430 --> 00:49:42,280 It's a little bit messier than that. 1165 00:49:42,280 --> 00:49:43,990 All right. 1166 00:49:43,990 --> 00:49:49,230 So I guess one interesting question to talk about 1167 00:49:49,230 --> 00:49:52,880 is how bad an attack could be 1168 00:49:52,880 --> 00:49:57,290 if the user mistakenly approves a certificate. 1169 00:49:57,290 --> 00:49:59,150 So as we were saying, if the user accepts 1170 00:49:59,150 --> 00:50:02,110 a certificate for the wrong host or accepts an expired 1171 00:50:02,110 --> 00:50:05,300 certificate, what could go wrong? 1172 00:50:05,300 --> 00:50:09,870 How much should we worry about this mistake from the user? 1173 00:50:09,870 --> 00:50:10,428 Yeah. 1174 00:50:10,428 --> 00:50:11,594 AUDIENCE: Well, [INAUDIBLE]. 1175 00:50:14,870 --> 00:50:17,671 But it could be, [? in example ?], not the site 1176 00:50:17,671 --> 00:50:19,490 the user wants to visit. 1177 00:50:19,490 --> 00:50:24,252 So they could do things like pretend to be the user's bank.
1178 00:50:24,252 --> 00:50:24,960 PROFESSOR: Right. 1179 00:50:24,960 --> 00:50:27,515 So certainly, the user might then, I guess, 1180 00:50:27,515 --> 00:50:29,640 be fooled into thinking, oh, I have all this money, 1181 00:50:29,640 --> 00:50:32,230 or I have no money at all, because the result page comes 1182 00:50:32,230 --> 00:50:34,010 back saying here's your balance. 1183 00:50:34,010 --> 00:50:35,940 So maybe the user will assume something 1184 00:50:35,940 --> 00:50:39,070 about what that bank has or doesn't have based 1185 00:50:39,070 --> 00:50:41,830 on the result. That still seems bad, 1186 00:50:41,830 --> 00:50:43,520 but not necessarily so disastrous. 1187 00:50:43,520 --> 00:50:44,460 Yeah. 1188 00:50:44,460 --> 00:50:46,810 AUDIENCE: I think that an [INAUDIBLE] 1189 00:50:46,810 --> 00:50:51,162 get all the user's cookies and [INAUDIBLE]. 1190 00:50:51,162 --> 00:50:51,870 PROFESSOR: Right. 1191 00:50:51,870 --> 00:50:53,367 So this is a real fear, yeah. 1192 00:50:53,367 --> 00:50:54,950 This is much more worrisome, actually, 1193 00:50:54,950 --> 00:50:58,000 or has a much longer-lasting impact on you. 1194 00:50:58,000 --> 00:51:01,536 And the reason this works is that the browser, when 1195 00:51:01,536 --> 00:51:04,060 it figures out [INAUDIBLE] makes a decision 1196 00:51:04,060 --> 00:51:06,610 as to who is allowed to get a particular set of cookies 1197 00:51:06,610 --> 00:51:09,640 or not, just looks at the host name in the URL 1198 00:51:09,640 --> 00:51:11,390 that you were supposed to be connected to.
1199 00:51:11,390 --> 00:51:14,305 So if you connect to some attacker's web server, 1200 00:51:14,305 --> 00:51:18,130 and then you just accept their certificate for amazon.com 1201 00:51:18,130 --> 00:51:20,175 as the real thing, then the browser 1202 00:51:20,175 --> 00:51:22,785 will think, yeah, the entity I'm talking to is amazon.com, 1203 00:51:22,785 --> 00:51:25,810 so I will treat them as I would a normal amazon.com 1204 00:51:25,810 --> 00:51:28,345 server, which means that they get access 1205 00:51:28,345 --> 00:51:31,120 to all the cookies that you have for that host. 1206 00:51:31,120 --> 00:51:33,770 And presumably they could run JavaScript code 1207 00:51:33,770 --> 00:51:37,880 in your browser in that same origin. 1208 00:51:37,880 --> 00:51:40,780 So if you have another page open that 1209 00:51:40,780 --> 00:51:45,080 was connected to the real website-- like maybe 1210 00:51:45,080 --> 00:51:46,650 you had a tab open in your browser, 1211 00:51:46,650 --> 00:51:48,400 you closed your laptop, then you opened it 1212 00:51:48,400 --> 00:51:51,190 on a different network, and all of a sudden, someone intercepted 1213 00:51:51,190 --> 00:51:53,480 your connection to amazon.com and injected 1214 00:51:53,480 --> 00:51:54,614 their own response. 1215 00:51:54,614 --> 00:51:56,030 If you approve it, then they'll be 1216 00:51:56,030 --> 00:51:58,975 able to access the old amazon.com 1217 00:51:58,975 --> 00:52:01,100 page you have open, because as far as the browser is 1218 00:52:01,100 --> 00:52:02,599 concerned, these are the same origin, 1219 00:52:02,599 --> 00:52:04,656 because they have the same host name. 1220 00:52:04,656 --> 00:52:06,150 That's going to be troublesome. 1221 00:52:06,150 --> 00:52:09,670 So this is potentially quite an unfortunate attack 1222 00:52:09,670 --> 00:52:11,720 if the user makes the wrong choice 1223 00:52:11,720 --> 00:52:13,560 on approving that certificate. 1224 00:52:13,560 --> 00:52:15,430 Make sense?
1225 00:52:15,430 --> 00:52:18,430 Any questions about that? 1226 00:52:18,430 --> 00:52:19,780 All right. 1227 00:52:19,780 --> 00:52:22,480 So that's one issue, I guess, that this ForceHTTPS 1228 00:52:22,480 --> 00:52:25,190 paper is worried about: users making 1229 00:52:25,190 --> 00:52:28,870 a mistake in the decision, users having too much leeway 1230 00:52:28,870 --> 00:52:31,850 in accepting certificates. 1231 00:52:31,850 --> 00:52:35,910 Another problem that shows up in practice-- 1232 00:52:35,910 --> 00:52:39,010 we sort of briefly talked about this, 1233 00:52:39,010 --> 00:52:41,690 and it's one of the things that 1234 00:52:41,690 --> 00:52:43,890 ForceHTTPS, I think, is also somewhat concerned about-- 1235 00:52:43,890 --> 00:52:50,760 is this notion of insecure embedding, or mixed content. 1236 00:52:50,760 --> 00:52:54,940 And the problem that this term refers to 1237 00:52:54,940 --> 00:53:00,880 is that a secure site, or any website for that matter, 1238 00:53:00,880 --> 00:53:04,390 can embed other pieces of content into a web page. 1239 00:53:04,390 --> 00:53:14,020 So if you have some sort of a site, foo.com/index.html, 1240 00:53:14,020 --> 00:53:17,870 this site might be served over HTTPS, 1241 00:53:17,870 --> 00:53:21,420 but inside of this HTML page, you could have many tags that 1242 00:53:21,420 --> 00:53:24,860 instruct the browser to go and fetch other stuff as part 1243 00:53:24,860 --> 00:53:25,700 of this page. 1244 00:53:25,700 --> 00:53:27,820 So the easiest thing to sort of think about 1245 00:53:27,820 --> 00:53:29,772 is probably script tags, where you 1246 00:53:29,772 --> 00:53:36,230 can say script source equals http://jquery.com. 1247 00:53:36,230 --> 00:53:38,620 So this is a popular JavaScript library 1248 00:53:38,620 --> 00:53:41,330 that makes it easier to interact with lots of stuff 1249 00:53:41,330 --> 00:53:42,250 in your browser.
1250 00:53:42,250 --> 00:53:47,150 But many web developers just reference a URL 1251 00:53:47,150 --> 00:53:49,385 on another site like this. 1252 00:53:49,385 --> 00:53:51,010 So this should be fairly straightforward, 1253 00:53:51,010 --> 00:53:53,790 but what's the problem with this kind of setup? 1254 00:53:53,790 --> 00:53:58,170 Suppose you have a secure site and you just load jQuery. 1255 00:53:58,170 --> 00:53:58,896 Yeah. 1256 00:53:58,896 --> 00:54:00,624 AUDIENCE: It could be fake jQuery. 1257 00:54:00,624 --> 00:54:01,290 PROFESSOR: Yeah. 1258 00:54:01,290 --> 00:54:03,300 So there are actually two ways that you 1259 00:54:03,300 --> 00:54:06,380 could get something you're not expecting. 1260 00:54:06,380 --> 00:54:09,630 One possibility is that jQuery itself is compromised. 1261 00:54:09,630 --> 00:54:12,470 So that seems like, well, you get what you asked for. 1262 00:54:12,470 --> 00:54:14,830 You asked for this script from jquery.com, 1263 00:54:14,830 --> 00:54:16,140 and that's what you get. 1264 00:54:16,140 --> 00:54:19,070 If jQuery is compromised, that's too bad. 1265 00:54:19,070 --> 00:54:21,220 Another problem is that this request 1266 00:54:21,220 --> 00:54:23,960 is going to be sent without any encryption or authentication 1267 00:54:23,960 --> 00:54:24,980 over the network. 1268 00:54:24,980 --> 00:54:29,150 So if an adversary is in control of your network connection, 1269 00:54:29,150 --> 00:54:30,890 then they could intercept this request 1270 00:54:30,890 --> 00:54:34,404 and serve back some other JavaScript code in response. 1271 00:54:34,404 --> 00:54:35,820 Now, this JavaScript code is going 1272 00:54:35,820 --> 00:54:38,480 to run as part of this page. 1273 00:54:38,480 --> 00:54:42,560 And now, because it's running in this HTTPS foo.com origin, 1274 00:54:42,560 --> 00:54:45,630 it has access to your secure cookies for foo.com 1275 00:54:45,630 --> 00:54:48,740 and any other stuff you have in that page, et cetera.
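A rough way to spot this kind of insecure embedding is to collect the URLs that a page's tags would fetch and flag any that resolve to plain `http:`. This is a sketch under assumptions: it uses Python's `html.parser`, resolves relative references (including scheme-relative ones like `//jquery.com/...`) against the page URL with `urllib.parse.urljoin`, and looks only at a small, hand-picked set of tag/attribute pairs (`FETCHING_ATTRS` is hypothetical); browsers track many more ways a page can load content. A browser following the stricter lock-icon policy mentioned earlier would, roughly, show the lock only when this list is empty.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit

# A small, assumed subset of tag/attribute pairs that make the browser fetch something.
FETCHING_ATTRS = {("script", "src"), ("img", "src"), ("iframe", "src"), ("link", "href")}

class SubresourceCollector(HTMLParser):
    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in FETCHING_ATTRS and value:
                # Resolve relative and scheme-relative references against the page URL.
                self.urls.append(urljoin(self.page_url, value))

def insecure_subresources(page_url: str, html: str) -> List[str]:
    """List subresources of an HTTPS page that would be fetched over plain HTTP."""
    if urlsplit(page_url).scheme != "https":
        return []  # mixed content is only a concern for secure pages
    collector = SubresourceCollector(page_url)
    collector.feed(html)
    return [u for u in collector.urls if urlsplit(u).scheme == "http"]

page = '<script src="http://jquery.com/jquery.js"></script><img src="/logo.png">'
print(insecure_subresources("https://foo.com/index.html", page))  # ['http://jquery.com/jquery.js']
```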
1276 00:54:48,740 --> 00:54:50,920 So it seems like a really bad thing. 1277 00:54:50,920 --> 00:54:52,482 So you should be careful not to do this. 1278 00:54:52,482 --> 00:54:53,940 Or rather, a web developer certainly should 1279 00:54:53,940 --> 00:54:57,470 be careful not to make this kind of mistake. 1280 00:54:57,470 --> 00:55:02,180 So one solution is to ensure that all content embedded 1281 00:55:02,180 --> 00:55:04,340 in a secure page is also secure. 1282 00:55:04,340 --> 00:55:07,055 So this seems like a good guideline for many web 1283 00:55:07,055 --> 00:55:07,930 developers to follow. 1284 00:55:07,930 --> 00:55:12,435 So maybe you should just use https://jquery.com. 1285 00:55:12,435 --> 00:55:17,410 Or it turns out that URLs support these scheme-relative 1286 00:55:17,410 --> 00:55:21,345 URLs, which means you could omit the HTTPS part and just say, 1287 00:55:21,345 --> 00:55:29,550 [INAUDIBLE] script source equals //jquery.com/something. 1288 00:55:29,550 --> 00:55:33,900 And what this means is to use whatever URL scheme 1289 00:55:33,900 --> 00:55:35,890 the page itself came from. 1290 00:55:35,890 --> 00:55:39,300 So this tag will translate to https://jquery.com 1291 00:55:39,300 --> 00:55:42,870 if it's on an HTTPS page, and to regular http://jquery.com 1292 00:55:42,870 --> 00:55:46,940 if it's on a non-HTTPS, just regular HTTP, page. 1293 00:55:46,940 --> 00:55:50,660 So that's one way to avoid this problem. 1294 00:55:50,660 --> 00:55:54,210 Another thing actually got introduced recently. 1295 00:55:54,210 --> 00:55:57,280 So this field is somewhat active. 1296 00:55:57,280 --> 00:56:00,280 People are trying to make things better. 1297 00:56:00,280 --> 00:56:04,340 One alternative way of dealing with this problem 1298 00:56:04,340 --> 00:56:07,560 is perhaps to include a hash or some sort of an [? indicator ?]
1299 00:56:07,560 --> 00:56:10,720 right here in the tag, because if you know exactly what 1300 00:56:10,720 --> 00:56:13,260 content you want to load, maybe you don't actually 1301 00:56:13,260 --> 00:56:14,890 have to load it over HTTPS at all. 1302 00:56:14,890 --> 00:56:17,740 You don't actually care who serves it to you, as long as it 1303 00:56:17,740 --> 00:56:19,560 matches a particular hash. 1304 00:56:19,560 --> 00:56:22,890 So there's actually a new spec out there 1305 00:56:22,890 --> 00:56:26,430 for being able to specify 1306 00:56:26,430 --> 00:56:30,310 hashes in these kinds of tags. 1307 00:56:30,310 --> 00:56:34,305 So instead of having to refer to jquery.com with an HTTPS URL, 1308 00:56:34,305 --> 00:56:35,800 maybe what you could do is just say 1309 00:56:35,800 --> 00:56:41,020 script source equals jquery.com, maybe even over HTTP. 1310 00:56:41,020 --> 00:56:43,890 But here, you're going to include some sort of a tag 1311 00:56:43,890 --> 00:56:47,752 attribute, like hash equals, and here 1312 00:56:47,752 --> 00:56:49,960 you're going to put in-- let's say a SHA-1 hash 1313 00:56:49,960 --> 00:56:52,800 or a SHA-2 hash of the content 1314 00:56:52,800 --> 00:56:55,020 that you're expecting to get back from the server. 1315 00:56:55,020 --> 00:56:55,475 AUDIENCE: [INAUDIBLE]. 1316 00:56:55,475 --> 00:56:56,308 PROFESSOR: Question? 1317 00:56:56,308 --> 00:56:57,920 AUDIENCE: [INAUDIBLE]. 1318 00:56:57,920 --> 00:56:59,220 PROFESSOR: Ah, man. 1319 00:56:59,220 --> 00:57:01,160 There's some complicated name for it. 1320 00:57:01,160 --> 00:57:04,070 I have the URL, actually, in the lecture notes, so [INAUDIBLE]. 1321 00:57:07,480 --> 00:57:11,590 Subresource Integrity, or something like this. 1322 00:57:11,590 --> 00:57:14,280 It's actually slowly being-- well, hopefully 1323 00:57:14,280 --> 00:57:18,380 it will be deployed soon in various browsers.
1324 00:57:18,380 --> 00:57:21,840 It feels like another way to actually authenticate content 1325 00:57:21,840 --> 00:57:26,980 without relying on data, or data encryption of the [INAUDIBLE]. 1326 00:57:26,980 --> 00:57:29,170 So here, we have this very generic plan 1327 00:57:29,170 --> 00:57:31,970 using SSL and TLS to authenticate connections 1328 00:57:31,970 --> 00:57:33,317 to particular servers. 1329 00:57:33,317 --> 00:57:34,900 This is almost like an alternative way 1330 00:57:34,900 --> 00:57:39,160 of thinking about securing your network communication. 1331 00:57:39,160 --> 00:57:41,530 If all you care about is integrity, 1332 00:57:41,530 --> 00:57:43,890 then maybe you don't need a secure, encrypted channel 1333 00:57:43,890 --> 00:57:44,630 over the network. 1334 00:57:44,630 --> 00:57:47,048 All you need is to specify exactly what you 1335 00:57:47,048 --> 00:57:48,173 want at the end of the day. 1336 00:57:48,173 --> 00:57:48,672 Yeah. 1337 00:57:48,672 --> 00:57:51,280 AUDIENCE: So doesn't this [INAUDIBLE]? 1338 00:57:51,280 --> 00:57:53,620 PROFESSOR: Doesn't this code sit at the client? 1339 00:57:53,620 --> 00:57:57,030 Well, it runs at the client, but the client fetches this code 1340 00:57:57,030 --> 00:57:58,450 from some server. 1341 00:57:58,450 --> 00:57:59,390 AUDIENCE: [INAUDIBLE]. 1342 00:57:59,390 --> 00:58:02,384 Can't anybody just [INAUDIBLE]? 1343 00:58:02,384 --> 00:58:03,050 PROFESSOR: Yeah. 1344 00:58:03,050 --> 00:58:06,280 So I think the point of the hash is 1345 00:58:06,280 --> 00:58:13,060 to protect the containing page from attackers that inject 1346 00:58:13,060 --> 00:58:14,930 different JavaScript code here. 1347 00:58:14,930 --> 00:58:16,690 So for jQuery, this makes a lot of sense 1348 00:58:16,690 --> 00:58:18,310 because jQuery is well known. 1349 00:58:18,310 --> 00:58:20,819 You're not trying to hide what the jQuery source code is.
1350 00:58:20,819 --> 00:58:23,110 But what you do want to make sure is that the network 1351 00:58:23,110 --> 00:58:25,880 attacker cannot intercept your connection and supply 1352 00:58:25,880 --> 00:58:28,244 a malicious version of jQuery that's going to leak 1353 00:58:28,244 --> 00:58:28,785 your cookies. 1354 00:58:28,785 --> 00:58:30,690 AUDIENCE: [? Oh, ?] OK. 1355 00:58:30,690 --> 00:58:32,150 PROFESSOR: Does that make sense? 1356 00:58:32,150 --> 00:58:33,820 It's absolutely true that anyone can compute the hash 1357 00:58:33,820 --> 00:58:35,111 of these things for themselves. 1358 00:58:38,240 --> 00:58:41,026 So this is a solution for integrity problems, 1359 00:58:41,026 --> 00:58:42,025 not for confidentiality. 1360 00:58:45,340 --> 00:58:46,770 All right. 1361 00:58:46,770 --> 00:58:51,450 So this is sort of what I guess developers have to watch out 1362 00:58:51,450 --> 00:58:58,680 for when writing pages, or including content in their HTML 1363 00:58:58,680 --> 00:59:01,330 pages on an HTTPS URL. 1364 00:59:01,330 --> 00:59:05,230 Another worrisome problem is dealing with cookies. 1365 00:59:05,230 --> 00:59:12,130 And here's where this difference between secure flags and just 1366 00:59:12,130 --> 00:59:15,410 origins comes into play. 1367 00:59:15,410 --> 00:59:17,860 So one thing, of course, the developer could screw up 1368 00:59:17,860 --> 00:59:20,430 is maybe they just forget to set the secure flag 1369 00:59:20,430 --> 00:59:23,150 on a cookie in the first place. 1370 00:59:23,150 --> 00:59:24,170 This happens. 1371 00:59:24,170 --> 00:59:29,950 Maybe you're thinking my users only ever go to the HTTPS URL. 1372 00:59:29,950 --> 00:59:31,350 My cookies are never [INAUDIBLE]. 1373 00:59:31,350 --> 00:59:32,950 Why should I set the secure flag on the cookie? 1374 00:59:32,950 --> 00:59:33,880 And they might [? also have the ?] 1375 00:59:33,880 --> 00:59:35,970 secure flag, or maybe they just forget about it.
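On the server side, the forgotten flag is a single attribute on the `Set-Cookie` header. The sketch below is minimal and uses Python's standard `http.cookies` module. The cookie name `session` and its value are placeholders, not taken from any particular site.

```python
from http.cookies import SimpleCookie


def session_set_cookie(session_id: str) -> str:
    """Build a Set-Cookie header line for a session cookie marked Secure."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["path"] = "/"
    # Secure: the browser may send this cookie back only over HTTPS connections.
    cookie["session"]["secure"] = True
    return cookie.output()


print(session_set_cookie("abc123"))
# Set-Cookie: session=abc123; Path=/; Secure
```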
1376 00:59:35,970 --> 00:59:38,837 Is this a problem? 1377 00:59:38,837 --> 00:59:40,420 What if your users are super diligent? 1378 00:59:40,420 --> 00:59:43,260 They always visit the HTTPS URL, and you don't 1379 00:59:43,260 --> 00:59:44,655 have any problems like this. 1380 00:59:44,655 --> 00:59:47,540 Do you still need the secure flag on your cookies? 1381 00:59:47,540 --> 00:59:48,040 [INAUDIBLE] 1382 00:59:51,180 --> 00:59:52,140 Yeah. 1383 00:59:52,140 --> 00:59:53,580 AUDIENCE: Could the attacker connect to your URL 1384 00:59:53,580 --> 00:59:55,020 and redirect you to a [INAUDIBLE]? 1385 00:59:55,020 --> 00:59:55,686 PROFESSOR: Yeah. 1386 00:59:55,686 --> 00:59:59,110 So even if the user doesn't explicitly, manually 1387 00:59:59,110 --> 01:00:02,620 go to some plain text URL, the attacker could give you a link, 1388 01:00:02,620 --> 01:00:06,834 or maybe ask you to load an image from a non-HTTPS URL. 1389 01:00:06,834 --> 01:00:08,250 And then the non-secure cookie is just 1390 01:00:08,250 --> 01:00:10,250 going to be sent along with the network request. 1391 01:00:10,250 --> 01:00:11,833 So that seems like a bit of a problem. 1392 01:00:11,833 --> 01:00:13,560 So you really do need the secure flag, 1393 01:00:13,560 --> 01:00:15,934 even if your users and your application are super careful. 1394 01:00:15,934 --> 01:00:17,967 AUDIENCE: But I'm assuming there's 1395 01:00:17,967 --> 01:00:19,668 an HTTP URL [INAUDIBLE]. 1396 01:00:23,070 --> 01:00:24,320 PROFESSOR: That's right, yeah. 1397 01:00:24,320 --> 01:00:26,319 So again, how could this [? break? ?] Suppose 1398 01:00:26,319 --> 01:00:27,150 I have a site. 1399 01:00:27,150 --> 01:00:28,810 It doesn't even listen on port 80. 1400 01:00:28,810 --> 01:00:31,560 There's no way to connect to me on port 80, 1401 01:00:31,560 --> 01:00:34,218 so why is it a problem if I have a non-secure cookie?
1402 01:00:34,218 --> 01:00:36,009 AUDIENCE: Because the browser wouldn't have 1403 01:00:36,009 --> 01:00:38,000 cookies for another domain. 1404 01:00:38,000 --> 01:00:39,000 PROFESSOR: That's right. 1405 01:00:39,000 --> 01:00:40,666 So the browser wouldn't send your cookie 1406 01:00:40,666 --> 01:00:43,660 to a different domain, but it still 1407 01:00:43,660 --> 01:00:46,740 seems worrisome that an attacker might load a URL. 1408 01:00:46,740 --> 01:00:50,842 So suppose that amazon.com only ever served stuff over SSL. 1409 01:00:50,842 --> 01:00:52,300 It's not even listening on port 80. 1410 01:00:52,300 --> 01:00:54,060 There's no way to connect to it. 1411 01:00:54,060 --> 01:00:57,080 And suppose that, as a result, 1412 01:00:57,080 --> 01:00:59,666 they don't set the secure flag on their cookie. 1413 01:00:59,666 --> 01:01:01,540 So how could a hacker then steal their cookie 1414 01:01:01,540 --> 01:01:04,546 if Amazon isn't even listening on port 80? 1415 01:01:04,546 --> 01:01:05,426 Yeah. 1416 01:01:05,426 --> 01:01:07,050 AUDIENCE: Can't the browser still think 1417 01:01:07,050 --> 01:01:09,340 it's an HTTP connection? 1418 01:01:09,340 --> 01:01:11,100 PROFESSOR: Well, so if you connect to port 1419 01:01:11,100 --> 01:01:14,424 443 and you speak SSL or TLS, then it's always 1420 01:01:14,424 --> 01:01:15,340 going to be encrypted. 1421 01:01:15,340 --> 01:01:16,697 So that's not a problem. 1422 01:01:16,697 --> 01:01:17,671 Yeah. 1423 01:01:17,671 --> 01:01:20,964 AUDIENCE: The attacker can [INAUDIBLE] their network. 1424 01:01:20,964 --> 01:01:21,630 PROFESSOR: Yeah. 1425 01:01:21,630 --> 01:01:24,350 So the attacker can actually intercept your packets 1426 01:01:24,350 --> 01:01:26,960 that are trying to connect to Amazon on port 80 1427 01:01:26,960 --> 01:01:28,960 and then make it appear like you've 1428 01:01:28,960 --> 01:01:30,720 connected successfully.
1429 01:01:30,720 --> 01:01:33,510 So if the attacker has control over your network, 1430 01:01:33,510 --> 01:01:35,760 they could redirect your packets trying 1431 01:01:35,760 --> 01:01:37,970 to get to Amazon to their own machine on port 80. 1432 01:01:37,970 --> 01:01:39,290 They're going to accept the connection, 1433 01:01:39,290 --> 01:01:41,831 and the client isn't going to be able to tell the difference. 1434 01:01:41,831 --> 01:01:44,030 It will be as if Amazon is listening on port 80, 1435 01:01:44,030 --> 01:01:46,931 and then your cookies will be sent to this adversary's web 1436 01:01:46,931 --> 01:01:47,430 server. 1437 01:01:47,430 --> 01:01:49,246 AUDIENCE: Because the client is unknown. 1438 01:01:49,246 --> 01:01:49,810 PROFESSOR: That's right. 1439 01:01:49,810 --> 01:01:51,380 Yeah, so for HTTP, there's no way 1440 01:01:51,380 --> 01:01:53,420 to authenticate the host you're connected to. 1441 01:01:53,420 --> 01:01:54,950 This is exactly what's going on. 1442 01:01:54,950 --> 01:01:57,980 HTTP has no authentication, and as a result, 1443 01:01:57,980 --> 01:01:59,950 you have to prevent the cookies from being 1444 01:01:59,950 --> 01:02:01,730 sent over HTTP in the first place 1445 01:02:01,730 --> 01:02:05,320 because you have no idea who that HTTP connection is 1446 01:02:05,320 --> 01:02:08,066 going to go to if you're assuming a network adversary. 1447 01:02:08,066 --> 01:02:10,624 AUDIENCE: So you need network control to do this. 1448 01:02:10,624 --> 01:02:11,540 PROFESSOR: Well, yeah. 1449 01:02:11,540 --> 01:02:13,560 So either you have full control over your network 1450 01:02:13,560 --> 01:02:15,518 so you know that adversaries aren't going to be 1451 01:02:15,518 --> 01:02:16,860 able to intercept your packets. 1452 01:02:16,860 --> 01:02:18,610 But even then, it's actually not so great. 1453 01:02:18,610 --> 01:02:20,580 Look at the TCP lecture, for example.
1454 01:02:20,580 --> 01:02:23,716 You can do all kinds of sequence number attacks and so on. 1455 01:02:23,716 --> 01:02:25,700 [? That's going to be ?] troublesome. 1456 01:02:25,700 --> 01:02:26,450 All right. 1457 01:02:26,450 --> 01:02:28,233 Any more questions about that? 1458 01:02:28,233 --> 01:02:28,732 Yeah. 1459 01:02:28,732 --> 01:02:30,130 AUDIENCE: I'm sorry, but what is the attacker intercepting 1460 01:02:30,130 --> 01:02:30,671 in that case? 1461 01:02:30,671 --> 01:02:31,917 Is there like a redirect? 1462 01:02:31,917 --> 01:02:33,750 PROFESSOR: Well, what that hacker presumably 1463 01:02:33,750 --> 01:02:36,860 would intercept is an HTTP request from the client going 1464 01:02:36,860 --> 01:02:40,939 to http://amazon.com, and that request includes 1465 01:02:40,939 --> 01:02:43,397 all your amazon.com cookies, or cookies for whatever domain 1466 01:02:43,397 --> 01:02:45,210 it is that you're sending your request to. 1467 01:02:45,210 --> 01:02:47,084 So if you don't mark those cookies as secure, 1468 01:02:47,084 --> 01:02:49,240 they will be sent over both encrypted and unencrypted 1469 01:02:49,240 --> 01:02:49,740 connections. 1470 01:02:49,740 --> 01:02:51,810 AUDIENCE: So how does that request get initiated? 1471 01:02:51,810 --> 01:02:52,600 PROFESSOR: Ah, OK. 1472 01:02:52,600 --> 01:02:53,100 Yeah. 1473 01:02:53,100 --> 01:02:55,360 So maybe you get the user to visit newyorktimes.com 1474 01:02:55,360 --> 01:02:58,260 and you pay for an advertisement that loads an image 1475 01:02:58,260 --> 01:03:01,194 from http://amazon.com. 1476 01:03:01,194 --> 01:03:02,980 And there's nothing preventing you 1477 01:03:02,980 --> 01:03:05,120 from saying, please load an image from this URL. 1478 01:03:05,120 --> 01:03:06,950 And when the browser tries to connect there, 1479 01:03:06,950 --> 01:03:09,858 it'll send the cookies if the connection succeeds. 1480 01:03:09,858 --> 01:03:10,854 Question back there.
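The attack described above depends on one browser rule: a cookie without the Secure flag goes out on plain HTTP requests too. The Python model below is a deliberately simplified sketch of that rule. It matches on the exact host only and ignores the Domain and Path attributes that real cookie matching uses. The `Cookie` class and `cookies_for_request` function are illustrative names, not any browser's API.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Cookie:
    name: str
    value: str
    host: str            # simplification: exact host match, no Domain/Path rules
    secure: bool = False


def cookies_for_request(jar: list[Cookie], url: str) -> list[Cookie]:
    """Cookies a browser would attach to a request for `url` (simplified)."""
    parts = urlsplit(url)
    return [
        c for c in jar
        if c.host == parts.hostname
        and (parts.scheme == "https" or not c.secure)  # Secure means HTTPS only
    ]


jar = [
    Cookie("session", "s3cr3t", "amazon.com"),           # developer forgot Secure
    Cookie("token", "t0k3n", "amazon.com", secure=True),
]

# An ad on another site embeds <img src="http://amazon.com/x.png">. Amazon never
# listens on port 80, but an on-path attacker can answer and read the request.
leaked = [c.name for c in cookies_for_request(jar, "http://amazon.com/x.png")]
print(leaked)  # ['session']
```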
1481 01:03:10,854 --> 01:03:14,174 AUDIENCE: Will it ask for a change [INAUDIBLE]? 1482 01:03:14,174 --> 01:03:14,840 PROFESSOR: Yeah. 1483 01:03:14,840 --> 01:03:16,890 So HTTPS Everywhere is an extension 1484 01:03:16,890 --> 01:03:20,040 that is very similar to forced HTTPS in some ways, 1485 01:03:20,040 --> 01:03:24,720 and it tries to prevent these kinds of mistakes. 1486 01:03:24,720 --> 01:03:28,380 So I guess one thing about forced HTTPS 1487 01:03:28,380 --> 01:03:31,760 is that they worry about such mistakes. 1488 01:03:31,760 --> 01:03:36,410 And when you opt a site into this forced HTTPS 1489 01:03:36,410 --> 01:03:39,560 plan, one thing that the browser will do for you 1490 01:03:39,560 --> 01:03:43,270 is prevent any plain HTTP connections to that host 1491 01:03:43,270 --> 01:03:44,560 in the first place. 1492 01:03:44,560 --> 01:03:47,250 So there's no way to make this kind of mistake 1493 01:03:47,250 --> 01:03:50,580 of not flagging your cookie as secure, 1494 01:03:50,580 --> 01:03:54,340 or having other kinds of cookie problems as well. 1495 01:03:54,340 --> 01:03:57,430 Another more subtle problem-- so this, 1496 01:03:57,430 --> 01:03:58,930 the problem we talked about just now 1497 01:03:58,930 --> 01:04:00,430 is the developer forgetting to set 1498 01:04:00,430 --> 01:04:01,974 the secure flag on a cookie. 1499 01:04:01,974 --> 01:04:02,890 So that seems fixable. 1500 01:04:02,890 --> 01:04:04,639 OK, maybe the developer should just do it. 1501 01:04:04,639 --> 01:04:05,690 OK, fix that problem.
1502 01:04:05,690 --> 01:04:07,270 The thing that's much more subtle 1503 01:04:07,270 --> 01:04:11,030 is that when a secure web server gets a cookie back 1504 01:04:11,030 --> 01:04:13,990 from the client, it actually has no idea whether this cookie was 1505 01:04:13,990 --> 01:04:17,020 originally set over an encrypted connection or a plain text 1506 01:04:17,020 --> 01:04:19,819 connection because when the server gets 1507 01:04:19,819 --> 01:04:21,360 a cookie from the client, all it gets 1508 01:04:21,360 --> 01:04:24,200 is the key-value pair for a cookie. 1509 01:04:24,200 --> 01:04:28,650 And as we looked at here, the plan that the [INAUDIBLE] 1510 01:04:28,650 --> 01:04:31,530 follows is that it'll include both secure and insecure 1511 01:04:31,530 --> 01:04:35,050 cookies when it's sending a request to a secure server, 1512 01:04:35,050 --> 01:04:36,850 because the browser here was just 1513 01:04:36,850 --> 01:04:39,650 concerned about the confidentiality of cookies. 1514 01:04:39,650 --> 01:04:42,030 But on the server side, you now don't 1515 01:04:42,030 --> 01:04:43,280 have any integrity guarantees. 1516 01:04:43,280 --> 01:04:44,762 When you get a cookie from a user, 1517 01:04:44,762 --> 01:04:46,970 it might have been set over an encrypted connection, 1518 01:04:46,970 --> 01:04:50,370 but it also might have been set over a plain text connection. 1519 01:04:50,370 --> 01:04:53,390 So this leads to somewhat more subtle attacks, 1520 01:04:53,390 --> 01:04:55,670 but the flavor of these attacks tends 1521 01:04:55,670 --> 01:04:57,370 to be things like session fixation. 1522 01:04:57,370 --> 01:05:01,840 What it means is that suppose I want to see what emails you're 1523 01:05:01,840 --> 01:05:02,490 sending. 1524 01:05:02,490 --> 01:05:05,130 Maybe I'll set a cookie for you that 1525 01:05:05,130 --> 01:05:06,850 is a copy of my Gmail cookie.
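The point that the server sees only the key-value pair can be checked directly. A `Cookie` request header carries just `name=value` pairs, so attributes such as Secure, and any record of how the cookie was originally set, never reach the server. The sketch below parses such a header with Python's standard `http.cookies` module. The header contents, including the attacker-chosen session value, are made-up placeholders.

```python
from http.cookies import SimpleCookie


def parse_cookie_header(header: str) -> SimpleCookie:
    """Parse a request's Cookie header the way a server would see it."""
    jar = SimpleCookie()
    jar.load(header)
    return jar


# The header looks the same whether the real site set the cookie over HTTPS
# or an attacker injected it over plain HTTP: only name=value pairs arrive.
received = parse_cookie_header("session=ATTACKER123; theme=dark")
print({name: m.value for name, m in received.items()})
print(repr(received["session"]["secure"]))  # '' (no Secure information was sent)
```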
1526 01:05:06,850 --> 01:05:08,760 So when you go to compose a message in Gmail, 1527 01:05:08,760 --> 01:05:11,700 it'll actually be saved in my sent folder instead of your 1528 01:05:11,700 --> 01:05:12,531 sent folder. 1529 01:05:12,531 --> 01:05:14,155 It'll be as if you're using my account, 1530 01:05:14,155 --> 01:05:16,280 and then I'll be able to extract things from there. 1531 01:05:16,280 --> 01:05:20,610 So if I can force a session cookie into your browser 1532 01:05:20,610 --> 01:05:22,260 and sort of get you to use my account, 1533 01:05:22,260 --> 01:05:24,340 maybe I can extract some information that way 1534 01:05:24,340 --> 01:05:27,170 from the victim. 1535 01:05:27,170 --> 01:05:32,290 So that's another problem that arises because of this grey 1536 01:05:32,290 --> 01:05:36,060 area [INAUDIBLE] incomplete separation between HTTP 1537 01:05:36,060 --> 01:05:37,523 and HTTPS cookies. 1538 01:05:37,523 --> 01:05:38,022 Question. 1539 01:05:38,022 --> 01:05:40,313 AUDIENCE: So you would need a [INAUDIBLE] vulnerability 1540 01:05:40,313 --> 01:05:41,670 to set that cookie [INAUDIBLE]. 1541 01:05:41,670 --> 01:05:43,419 PROFESSOR: No. [INAUDIBLE] vulnerability 1542 01:05:43,419 --> 01:05:44,210 to set this cookie. 1543 01:05:44,210 --> 01:05:46,210 You would just trick the browser into connecting 1544 01:05:46,210 --> 01:05:49,460 to a regular HTTP host URL. 1545 01:05:49,460 --> 01:05:53,440 And without some extension like forced HTTPS or HTTPS 1546 01:05:53,440 --> 01:05:56,730 Everywhere, you could then, as an adversary, 1547 01:05:56,730 --> 01:05:59,680 set a cookie in the user's browser. 1548 01:05:59,680 --> 01:06:01,470 It's a non-secure cookie, but it's 1549 01:06:01,470 --> 01:06:03,555 going to be sent back, even on secure requests. 1550 01:06:03,555 --> 01:06:06,013 AUDIENCE: So do you have to trick the browser into thinking 1551 01:06:06,013 --> 01:06:07,650 the domain is the same domain?
1552 01:06:07,650 --> 01:06:08,070 PROFESSOR: That's right. 1553 01:06:08,070 --> 01:06:08,240 Yeah. 1554 01:06:08,240 --> 01:06:09,910 So you have to intercept their network connection 1555 01:06:09,910 --> 01:06:11,280 and probably do the same kind of attack 1556 01:06:11,280 --> 01:06:13,446 you were talking about just a couple of minutes ago. 1557 01:06:13,446 --> 01:06:14,090 Yeah. 1558 01:06:14,090 --> 01:06:15,970 Does that make sense? 1559 01:06:15,970 --> 01:06:17,390 All right. 1560 01:06:17,390 --> 01:06:20,390 So I guess there's probably [INAUDIBLE]. 1561 01:06:20,390 --> 01:06:23,130 So what does forced HTTPS actually do for us now? 1562 01:06:23,130 --> 01:06:27,100 It tries to prevent some subset of these problems. 1563 01:06:27,100 --> 01:06:29,680 So I guess I should say that forced HTTPS, the paper we read, 1564 01:06:29,680 --> 01:06:31,930 was sort of a research proposal that 1565 01:06:31,930 --> 01:06:35,539 was published I guess five or six years ago now. 1566 01:06:35,539 --> 01:06:37,330 Since then, it's actually been standardized 1567 01:06:37,330 --> 01:06:38,560 and actually adopted. 1568 01:06:38,560 --> 01:06:42,410 So this was like a somewhat sketchy plug-in that 1569 01:06:42,410 --> 01:06:43,665 stored stuff in some cookies. 1570 01:06:43,665 --> 01:06:46,620 They were worried about those getting evicted and so on. 1571 01:06:46,620 --> 01:06:48,770 Now actually, most browsers looked at this paper 1572 01:06:48,770 --> 01:06:49,710 and said, OK, this is a great idea. 1573 01:06:49,710 --> 01:06:51,126 We'll actually implement it better 1574 01:06:51,126 --> 01:06:52,350 within the browser itself. 1575 01:06:52,350 --> 01:06:55,970 So there's something called HTTP Strict Transport Security that 1576 01:06:55,970 --> 01:06:58,199 implements most of the ideas from forced HTTPS, 1577 01:06:58,199 --> 01:06:59,490 and that actually makes a good story.
1578 01:06:59,490 --> 01:07:03,565 Like, here's how research actually makes an impact on the 1579 01:07:03,565 --> 01:07:07,100 security of web applications and browsers. 1580 01:07:07,100 --> 01:07:08,970 But anyway, let's look at what forced HTTPS 1581 01:07:08,970 --> 01:07:10,980 does for a website. 1582 01:07:10,980 --> 01:07:15,220 So forced HTTPS allows a website to set this bit 1583 01:07:15,220 --> 01:07:17,380 for a particular host name. 1584 01:07:17,380 --> 01:07:21,005 And the way that forced HTTPS changes the behavior 1585 01:07:21,005 --> 01:07:24,242 of the browser is threefold. 1586 01:07:24,242 --> 01:07:28,760 So if some website sets forced HTTPS, 1587 01:07:28,760 --> 01:07:32,300 then there are three things that happen differently. 1588 01:07:32,300 --> 01:07:39,110 First, any certificate errors are always fatal. 1589 01:07:39,110 --> 01:07:41,405 So the user doesn't have a chance 1590 01:07:41,405 --> 01:07:45,780 of accepting an incorrect certificate that 1591 01:07:45,780 --> 01:07:49,530 has a wrong host name, or an expiration time that's passed, 1592 01:07:49,530 --> 01:07:50,480 et cetera. 1593 01:07:50,480 --> 01:07:52,810 So that's one thing that the browser now changes. 1594 01:07:52,810 --> 01:08:01,130 Another is that it redirects all HTTP requests to HTTPS. 1595 01:08:01,130 --> 01:08:02,840 So this is a pretty good idea. 1596 01:08:02,840 --> 01:08:08,590 If you know a site is always using HTTPS legitimately, 1597 01:08:08,590 --> 01:08:10,852 then you should probably prohibit any regular HTTP 1598 01:08:10,852 --> 01:08:12,810 requests [? website ?], because that's probably 1599 01:08:12,810 --> 01:08:15,110 a sign of some mistake or an attacker trying 1600 01:08:15,110 --> 01:08:17,830 to trick you into connecting to a site without encryption. 1601 01:08:17,830 --> 01:08:20,080 You want to make sure this actually happens before you 1602 01:08:20,080 --> 01:08:22,080 issue the HTTP request.
1603 01:08:22,080 --> 01:08:24,740 Otherwise, the HTTP request has already sort of sailed 1604 01:08:24,740 --> 01:08:26,630 onto the network. 1605 01:08:26,630 --> 01:08:32,140 And the last thing that this forced HTTPS setting changes is 1606 01:08:32,140 --> 01:08:37,740 that it actually prohibits this insecure embedding 1607 01:08:37,740 --> 01:08:43,910 plan that we looked at here 1608 01:08:43,910 --> 01:08:50,149 when you're including an HTTP URL in an HTTPS site. 1609 01:08:50,149 --> 01:08:51,070 Does that make sense? 1610 01:08:51,070 --> 01:08:55,319 So this is what the forced HTTPS extension did. 1611 01:08:55,319 --> 01:08:57,620 In terms of what's going on now, 1612 01:08:57,620 --> 01:09:03,180 this HTTP Strict Transport Security, or HSTS, 1613 01:09:03,180 --> 01:09:06,870 protocol basically does the same things. 1614 01:09:06,870 --> 01:09:09,969 Most browsers now prohibit insecure embedding by default. 1615 01:09:09,969 --> 01:09:12,109 So this used to be a little controversial 1616 01:09:12,109 --> 01:09:14,970 because many developers had trouble with this. 1617 01:09:14,970 --> 01:09:20,590 But I think Firefox and Chrome and IE all now by default 1618 01:09:20,590 --> 01:09:23,529 will refuse to load insecure components, 1619 01:09:23,529 --> 01:09:27,649 or at least insecure JavaScript and CSS, into your page 1620 01:09:27,649 --> 01:09:29,051 unless you do something. 1621 01:09:29,051 --> 01:09:29,550 Question. 1622 01:09:29,550 --> 01:09:31,284 AUDIENCE: Don't they prompt the user? 1623 01:09:31,284 --> 01:09:33,700 PROFESSOR: They used to, and the user would just say, yes. 1624 01:09:33,700 --> 01:09:36,262 So IE, for example, used to pop up this dialog box, 1625 01:09:36,262 --> 01:09:37,720 as this paper talks about, saying, 1626 01:09:37,720 --> 01:09:40,560 would you like to load some extra content, 1627 01:09:40,560 --> 01:09:42,517 or something like that.
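Taken together, the three behaviors of forced HTTPS amount to three small policy checks inside the browser. The Python class below is a hypothetical model of them, not code from the forced HTTPS plug-in or from any browser. The opted-in host `bank.example` is a placeholder. The URL upgrade happens before any request goes out, which reflects the point that a plain HTTP request must never reach the network.

```python
from urllib.parse import urlsplit, urlunsplit


class ForcedHttpsPolicy:
    """Hypothetical browser-side model of the three forced-HTTPS behaviors."""

    def __init__(self, hosts: set[str]) -> None:
        self.hosts = hosts  # host names that opted in

    def on_certificate_error(self, host: str) -> str:
        # 1. Certificate errors are fatal: no "accept anyway" option.
        return "abort" if host in self.hosts else "ask_user"

    def upgrade(self, url: str) -> str:
        # 2. Rewrite http:// to https:// before anything is sent on the network.
        parts = urlsplit(url)
        if parts.scheme != "http" or parts.hostname not in self.hosts:
            return url
        # Drop an explicit :80, keep any other explicit port.
        netloc = parts.hostname if parts.port in (None, 80) else parts.netloc
        return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))

    def allow_subresource(self, page_url: str, resource_url: str) -> bool:
        # 3. An HTTPS page on an opted-in host may not embed plain-HTTP content.
        page = urlsplit(page_url)
        if page.scheme == "https" and page.hostname in self.hosts:
            return urlsplit(resource_url).scheme == "https"
        return True


policy = ForcedHttpsPolicy({"bank.example"})
print(policy.upgrade("http://bank.example/login?next=/home"))
# https://bank.example/login?next=/home
```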
1628 01:09:42,517 --> 01:09:44,413 AUDIENCE: [INAUDIBLE] because [INAUDIBLE]. 1629 01:09:44,413 --> 01:09:45,079 PROFESSOR: Yeah. 1630 01:09:45,079 --> 01:09:47,500 I think if you try to be clever, 1631 01:09:47,500 --> 01:09:50,520 then you can bypass all these security mechanisms. 1632 01:09:50,520 --> 01:09:53,220 But don't try to be clever this way. 1633 01:09:53,220 --> 01:09:55,770 So this is mostly a non-problem in modern browsers, 1634 01:09:55,770 --> 01:09:58,140 but the first two things are still useful protections 1635 01:09:58,140 --> 01:10:01,510 that forced HTTPS and HTTP Strict Transport Security 1636 01:10:01,510 --> 01:10:02,993 provide. 1637 01:10:02,993 --> 01:10:03,493 Yeah. 1638 01:10:03,493 --> 01:10:05,284 AUDIENCE: What happens when a website can't 1639 01:10:05,284 --> 01:10:08,895 support HTTPS? [INAUDIBLE] change their [INAUDIBLE]? 1640 01:10:08,895 --> 01:10:11,020 PROFESSOR: So what do you mean, can't support HTTPS? 1641 01:10:11,020 --> 01:10:12,457 AUDIENCE: [INAUDIBLE]. 1642 01:10:12,457 --> 01:10:13,290 PROFESSOR: Well, OK. 1643 01:10:13,290 --> 01:10:16,330 So if you have a website that doesn't support HTTPS 1644 01:10:16,330 --> 01:10:19,116 but sets this cookie, what happens? 1645 01:10:19,116 --> 01:10:20,068 AUDIENCE: [INAUDIBLE]. 1646 01:10:20,068 --> 01:10:21,020 PROFESSOR: Yeah. 1647 01:10:21,020 --> 01:10:22,790 So this is the reason why it's an option. 1648 01:10:22,790 --> 01:10:25,640 If you opted everyone in, then you're exactly in this boat. 1649 01:10:25,640 --> 01:10:28,140 Like, oh, all of a sudden, you can't talk to most of the web 1650 01:10:28,140 --> 01:10:29,775 because they don't use HTTPS. 1651 01:10:29,775 --> 01:10:31,900 So you really want this to be selectively enabled 1652 01:10:31,900 --> 01:10:34,974 for sites that really want this kind of protection. 1653 01:10:34,974 --> 01:10:35,474 Yeah.
1654 01:10:35,474 --> 01:10:36,950 AUDIENCE: But also, if I remember correctly, 1655 01:10:36,950 --> 01:10:39,270 you can't set the cookie unless the site [INAUDIBLE]. 1656 01:10:39,270 --> 01:10:39,650 PROFESSOR: That's right, yeah. 1657 01:10:39,650 --> 01:10:41,441 So these guys are also worried about denial 1658 01:10:41,441 --> 01:10:44,050 of service attacks, where this plug-in 1659 01:10:44,050 --> 01:10:47,300 could be used to cause trouble for other sites. 1660 01:10:47,300 --> 01:10:49,980 So if you, for example, set this forced HTTPS 1661 01:10:49,980 --> 01:10:55,400 bit for some unsuspecting website, then all of a sudden, 1662 01:10:55,400 --> 01:10:57,920 the website stops working because everyone is now 1663 01:10:57,920 --> 01:10:59,570 trying to connect to them over HTTPS, 1664 01:10:59,570 --> 01:11:00,890 and they don't support HTTPS. 1665 01:11:00,890 --> 01:11:04,779 So this is one example of worrying about denial 1666 01:11:04,779 --> 01:11:05,570 of service attacks. 1667 01:11:05,570 --> 01:11:07,810 Another thing is that they actually 1668 01:11:07,810 --> 01:11:12,120 don't support setting forced HTTPS for an entire domain. 1669 01:11:12,120 --> 01:11:15,900 So they worried that, for example, if I 1670 01:11:15,900 --> 01:11:17,430 am a user at mit.edu, 1671 01:11:17,430 --> 01:11:20,150 maybe I'll set a forced HTTPS cookie for *.mit.edu 1672 01:11:20,150 --> 01:11:21,770 in everyone's browsers. 1673 01:11:21,770 --> 01:11:25,330 And now, only HTTPS things work at MIT. 1674 01:11:25,330 --> 01:11:27,510 That also seems a little disastrous, 1675 01:11:27,510 --> 01:11:29,592 so you probably want to avoid that.
1676 01:11:29,592 --> 01:11:32,730 On the other hand, actually, HTTP Strict Transport Security 1677 01:11:32,730 --> 01:11:34,580 went back on this and said, well, we'll 1678 01:11:34,580 --> 01:11:38,930 allow this notion of forcing HTTPS for a domain and all its subdomains 1679 01:11:38,930 --> 01:11:42,120 because that turns out to be useful, given 1680 01:11:42,120 --> 01:11:44,595 these insecure cookies being sent along with a request, 1681 01:11:44,595 --> 01:11:48,250 where you can't tell where they were originally set from. 1682 01:11:48,250 --> 01:11:50,850 Anyway, so there's all kinds of subtle interactions 1683 01:11:50,850 --> 01:11:52,280 with features at the lowest level, 1684 01:11:52,280 --> 01:11:57,320 but it's not clear what the right choice is. 1685 01:11:57,320 --> 01:11:59,570 OK, so one actually interesting question you might ask 1686 01:11:59,570 --> 01:12:05,040 is whether these are fundamental to the system we 1687 01:12:05,040 --> 01:12:07,850 have, or whether they are mostly just helping developers avoid 1688 01:12:07,850 --> 01:12:09,040 mistakes. 1689 01:12:09,040 --> 01:12:12,000 So suppose you had a developer that's very diligent 1690 01:12:12,000 --> 01:12:14,820 and doesn't do insecure [INAUDIBLE] embedding, 1691 01:12:14,820 --> 01:12:16,230 doesn't have any other problems, 1692 01:12:16,230 --> 01:12:18,640 always gets their certificates renewed. 1693 01:12:18,640 --> 01:12:22,511 Should they bother with forced HTTPS or not? 1694 01:12:22,511 --> 01:12:23,010 Yeah. 1695 01:12:23,010 --> 01:12:23,885 AUDIENCE: Well, yeah. 1696 01:12:23,885 --> 01:12:27,920 You still have the problem with someone forcing the HTTP protocol. 1697 01:12:27,920 --> 01:12:30,502 Nothing stops the hacker from doing 1698 01:12:30,502 --> 01:12:32,294 [? excessive ?] [INAUDIBLE] forces the user 1699 01:12:32,294 --> 01:12:33,793 to load something over HTTP and then 1700 01:12:33,793 --> 01:12:35,140 to intercept the connection.
1701 01:12:35,140 --> 01:12:38,130 PROFESSOR: That's true, but if you assume they're very diligent 1702 01:12:38,130 --> 01:12:40,140 and all their cookies are marked secure, 1703 01:12:40,140 --> 01:12:43,542 then having someone visit an HTTP version of your site 1704 01:12:43,542 --> 01:12:44,500 shouldn't be a problem. 1705 01:12:44,500 --> 01:12:46,364 AUDIENCE: [INAUDIBLE]. 1706 01:12:46,364 --> 01:12:47,030 PROFESSOR: Yeah. 1707 01:12:47,030 --> 01:12:49,530 So you'd probably have to defend against cookie overwrite 1708 01:12:49,530 --> 01:12:51,860 or injection attacks, and that's sort of doable. 1709 01:12:51,860 --> 01:12:55,089 It's a little tedious, but you can probably do something. 1710 01:12:55,089 --> 01:12:55,714 AUDIENCE: Yeah. 1711 01:12:55,714 --> 01:12:58,488 I think her point is that also, it didn't-- security 1712 01:12:58,488 --> 01:13:00,474 didn't check the certificate, right? 1713 01:13:00,474 --> 01:13:01,140 PROFESSOR: Yeah. 1714 01:13:01,140 --> 01:13:01,723 So that's one. 1715 01:13:01,723 --> 01:13:03,830 I think the biggest thing 1716 01:13:03,830 --> 01:13:06,290 is this first point, because everything else 1717 01:13:06,290 --> 01:13:08,970 you can sort of defend against by cleverly coding 1718 01:13:08,970 --> 01:13:10,780 or being careful in your application. 1719 01:13:10,780 --> 01:13:12,740 The first thing is something that the user 1720 01:13:12,740 --> 01:13:14,742 has-- or the developer-- has no control 1721 01:13:14,742 --> 01:13:17,200 over because the developer wants to make sure, for example, 1722 01:13:17,200 --> 01:13:20,375 that their cookie will only be sent to their server with a certificate 1723 01:13:20,375 --> 01:13:22,180 signed by this CA.
1724 01:13:22,180 --> 01:13:25,227 And if the user is allowed to randomly say, oh, 1725 01:13:25,227 --> 01:13:26,810 that's good enough, then the developer 1726 01:13:26,810 --> 01:13:28,393 has no clue where their cookie's going 1727 01:13:28,393 --> 01:13:30,970 to end up because some user is going to leak it 1728 01:13:30,970 --> 01:13:33,420 to some incorrect server. 1729 01:13:33,420 --> 01:13:35,737 So this is, I think, the main benefit of this protocol. 1730 01:13:35,737 --> 01:13:36,570 Question back there. 1731 01:13:36,570 --> 01:13:38,001 AUDIENCE: [INAUDIBLE] second point 1732 01:13:38,001 --> 01:13:40,863 is also vital because the user might not [INAUDIBLE]. 1733 01:13:40,863 --> 01:13:43,407 You might [INAUDIBLE] of the site, which 1734 01:13:43,407 --> 01:13:44,679 would be right in the middle. 1735 01:13:44,679 --> 01:13:45,156 PROFESSOR: I see. 1736 01:13:45,156 --> 01:13:45,656 OK. 1737 01:13:45,656 --> 01:13:47,970 So I agree in the sense that this 1738 01:13:47,970 --> 01:13:52,060 is very useful from the point of view of UI security 1739 01:13:52,060 --> 01:13:55,349 because as far as the cookies are concerned, 1740 01:13:55,349 --> 01:13:57,140 the developer can probably be clever enough 1741 01:13:57,140 --> 01:13:58,570 to do something sensible. 1742 01:13:58,570 --> 01:14:01,510 But the user might not be diligently looking 1743 01:14:01,510 --> 01:14:04,650 at that lock icon and URL at all times. 1744 01:14:04,650 --> 01:14:09,790 So if you load up amazon.com and it asks you 1745 01:14:09,790 --> 01:14:12,090 for a credit card number, you might just type it in. 1746 01:14:12,090 --> 01:14:14,630 You might just forget to look for a lock icon, 1747 01:14:14,630 --> 01:14:18,790 whereas if you set forced HTTPS for amazon.com, then 1748 01:14:18,790 --> 01:14:20,510 there's just no chance that you'll 1749 01:14:20,510 --> 01:14:24,305 have an HTTP URL for that site. 1750 01:14:24,305 --> 01:14:26,430 It still [? causes a ?]
problem that maybe the user 1751 01:14:26,430 --> 01:14:27,740 doesn't read the URL correctly. 1752 01:14:27,740 --> 01:14:32,350 Like if it says Ammazon with two Ms dot com. 1753 01:14:32,350 --> 01:14:33,670 That would probably still fool many users. 1754 01:14:33,670 --> 01:14:39,756 But anyway, that is another advantage of forced HTTPS. 1755 01:14:39,756 --> 01:14:41,510 Does that make sense? 1756 01:14:41,510 --> 01:14:43,020 Other questions about this scheme? 1757 01:14:46,480 --> 01:14:47,740 All right. 1758 01:14:47,740 --> 01:14:50,230 So I guess one interesting thing is 1759 01:14:50,230 --> 01:14:52,740 how do you get this forced HTTPS bit 1760 01:14:52,740 --> 01:14:55,470 for a site in the first place? 1761 01:14:55,470 --> 01:14:57,460 Could you intercept that as an attacker 1762 01:14:57,460 --> 01:14:59,780 and prevent that bit from being set 1763 01:14:59,780 --> 01:15:04,310 if you [? want to mount a fax? ?] 1764 01:15:04,310 --> 01:15:05,210 Yeah. 1765 01:15:05,210 --> 01:15:06,653 AUDIENCE: [INAUDIBLE] HTTPS. 1766 01:15:06,653 --> 01:15:09,058 I mean, HTTPS, we're [? assuming ?] [INAUDIBLE] 1767 01:15:09,058 --> 01:15:12,010 protocol [INAUDIBLE]. 1768 01:15:12,010 --> 01:15:13,010 PROFESSOR: That's right. 1769 01:15:13,010 --> 01:15:14,900 So on one hand, this is good: 1770 01:15:14,900 --> 01:15:16,430 this forced HTTPS bit can only 1771 01:15:16,430 --> 01:15:21,900 be set over an HTTPS connection to the host in question. 1772 01:15:21,900 --> 01:15:26,160 On the other hand, the user might be fooled at that point. 1773 01:15:26,160 --> 01:15:28,810 Like, they don't have the forced HTTPS bit yet. 1774 01:15:28,810 --> 01:15:33,670 So maybe the user will allow some incorrect certificate, 1775 01:15:33,670 --> 01:15:38,400 or will not even know that this is HTTP and not HTTPS.
1776 01:15:38,400 --> 01:15:41,800 So it seems potentially possible for an attacker 1777 01:15:41,800 --> 01:15:44,220 to prevent that forced HTTPS bit from being 1778 01:15:44,220 --> 01:15:45,220 sent in the first place. 1779 01:15:45,220 --> 01:15:49,610 If you've never been to a site and you try to visit that site, 1780 01:15:49,610 --> 01:15:52,780 you might never learn whether it should be forced HTTPS or not 1781 01:15:52,780 --> 01:15:54,080 in the first place. 1782 01:15:54,080 --> 01:15:54,580 Yeah. 1783 01:15:54,580 --> 01:15:58,000 AUDIENCE: Will the [INAUDIBLE] roaming certificate there. 1784 01:15:58,000 --> 01:15:59,380 PROFESSOR: That's right, yeah. 1785 01:15:59,380 --> 01:16:02,830 So I guess the way to think of it is if the bit is set, 1786 01:16:02,830 --> 01:16:05,370 then you know you talked to the right server at some point, 1787 01:16:05,370 --> 01:16:07,942 and then you can continue using that bit correctly. 1788 01:16:07,942 --> 01:16:10,400 On the other hand, if you don't have that bit set, or maybe 1789 01:16:10,400 --> 01:16:12,300 if you've never talked to the server yet, 1790 01:16:12,300 --> 01:16:14,930 there's no clear-cut protocol that will always 1791 01:16:14,930 --> 01:16:18,510 tell you whether that forced HTTPS bit should be set or not. 1792 01:16:18,510 --> 01:16:21,686 Maybe amazon.com always wants to set that forced HTTPS bit. 1793 01:16:21,686 --> 01:16:23,560 But the first time you pulled up your laptop, 1794 01:16:23,560 --> 01:16:25,406 you were already on an attacker's network, 1795 01:16:25,406 --> 01:16:27,780 and there's just no way for you to connect to amazon.com. 1796 01:16:27,780 --> 01:16:30,280 Everything is intercepted, or something like this. 1797 01:16:30,280 --> 01:16:32,120 So it's a very hard problem to solve. 1798 01:16:32,120 --> 01:16:35,850 The bootstrapping of these security settings 1799 01:16:35,850 --> 01:16:36,840 is pretty tricky.
1800 01:16:36,840 --> 01:16:38,381 I guess one thing you could try to do 1801 01:16:38,381 --> 01:16:40,720 is maybe embed this bit in DNSSEC. 1802 01:16:40,720 --> 01:16:42,530 So if you already have DNSSEC in use, 1803 01:16:42,530 --> 01:16:46,070 then maybe you could sign whether you should use HTTPS 1804 01:16:46,070 --> 01:16:50,960 or not, or forced HTTPS or not, as part of your DNS name. 1805 01:16:50,960 --> 01:16:53,709 But again, it just reduces the problem to DNSSEC 1806 01:16:53,709 --> 01:16:54,250 being secure. 1807 01:16:54,250 --> 01:16:56,083 So there's always this sort of root of trust 1808 01:16:56,083 --> 01:16:58,551 that you have to really assume is correct. 1809 01:16:58,551 --> 01:16:59,453 Question. 1810 01:16:59,453 --> 01:17:00,369 AUDIENCE: [INAUDIBLE]. 1811 01:17:04,544 --> 01:17:05,210 PROFESSOR: Yeah. 1812 01:17:05,210 --> 01:17:07,540 So I guess Google keeps trying to improve things 1813 01:17:07,540 --> 01:17:08,520 by hard-coding it. 1814 01:17:08,520 --> 01:17:12,490 So one thing that Chrome offers is 1815 01:17:12,490 --> 01:17:15,840 that actually, the browser ships with a list of sites that 1816 01:17:15,840 --> 01:17:19,220 should have forced HTTPS enabled, or now, well, this 1817 01:17:19,220 --> 01:17:22,720 HSTS thing, which is [INAUDIBLE] enabled. 1818 01:17:22,720 --> 01:17:24,510 So when you actually download Chrome, 1819 01:17:24,510 --> 01:17:26,220 you get lots of actually useful stuff, 1820 01:17:26,220 --> 01:17:30,650 like a somewhat up-to-date CRL and a list of forced HTTPS 1821 01:17:30,650 --> 01:17:33,220 sites that are particularly important. 1822 01:17:33,220 --> 01:17:35,779 So this is somewhat admitting defeat. 1823 01:17:35,779 --> 01:17:37,070 Like, the protocol doesn't work. 1824 01:17:37,070 --> 01:17:40,130 We just have to distribute this a priori to everyone.
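The preload approach described above can be sketched as a lookup that consults a list shipped with the browser before, or alongside, whatever the browser has learned dynamically from earlier HTTPS connections. This is a minimal Python illustration under stated assumptions: the function `must_use_https`, the `PRELOADED` table, and its `.example` entries are all hypothetical and are not the contents or format of Chrome's real list; the per-entry boolean stands in for an "also covers subdomains" flag.

```python
# Hypothetical preload list compiled into the browser:
# host -> whether the entry also covers subdomains.
PRELOADED = {
    "secure-bank.example": True,
    "shop.example": False,
}


def must_use_https(host, learned_hosts, preloaded=PRELOADED):
    """Return True if the browser should refuse plain HTTP for host.

    learned_hosts: hosts whose forced-HTTPS bit was learned dynamically
    over earlier HTTPS connections (empty on a very first visit).
    """
    host = host.lower().rstrip(".")
    if host in learned_hosts:
        return True
    labels = host.split(".")
    # Check the host itself, then each parent domain (never the bare TLD).
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in preloaded:
            # An exact match always counts; a parent-domain match only
            # counts if that entry covers subdomains.
            return i == 0 or preloaded[candidate]
    return False
```

On a first visit `learned_hosts` is empty, so only preloaded hosts are protected; every other site is back to the bootstrapping problem, which is the dichotomy the lecture points out.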
1825 01:17:40,130 --> 01:17:42,360 And it sets up this unfortunate dichotomy 1826 01:17:42,360 --> 01:17:44,565 between sites that are sort of important enough 1827 01:17:44,565 --> 01:17:46,530 for Google to ship with the browser, 1828 01:17:46,530 --> 01:17:49,132 and sites that don't do this. 1829 01:17:49,132 --> 01:17:50,840 Now of course, Google right now tells you 1830 01:17:50,840 --> 01:17:52,540 that anyone can get their site included 1831 01:17:52,540 --> 01:17:54,030 because the list is so small. 1832 01:17:54,030 --> 01:17:55,740 But if this grows to millions of entries, 1833 01:17:55,740 --> 01:17:57,660 I'm sure Google will stop including 1834 01:17:57,660 --> 01:17:58,850 everyone's site in there. 1835 01:17:58,850 --> 01:18:00,570 But yeah, you could totally add a domain. 1836 01:18:00,570 --> 01:18:02,320 And you could email Chrome developers 1837 01:18:02,320 --> 01:18:07,150 and get your thing included on the list of forced HTTPS URLs. 1838 01:18:07,150 --> 01:18:11,816 Anyway, any other questions about forced HTTPS and SSL? 1839 01:18:11,816 --> 01:18:12,390 All right. 1840 01:18:12,390 --> 01:18:12,890 Good. 1841 01:18:12,890 --> 01:18:16,502 So I'll see you guys on Wednesday at the [INAUDIBLE].