Description: In this lecture, Professor Zeldovich discusses how to cryptographically protect network communications, as well as how to integrate cryptographic protection of network traffic into the web security model.
Instructor: Nickolai Zeldovich
Lecture 14: SSL and HTTPS
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
PROFESSOR: Now let's look at how the web uses cryptographic protocols to protect network communication and deal with network attackers in general.
So before we dive into the details, I want to remind you there's a quiz on Wednesday. And that's not in this room. It's in Walker. But it's at the regular lecture time.
Any questions about that? Hopefully straightforward.
Third floor, I think it is usually.
All right. So today we're going to talk about how the web sort of uses cryptography to protect network communication. And we'll look at two sort of closely related topics.
One is, how do you just cryptographically protect network communication at a larger scale than the Kerberos system we looked at in the last lecture? And then also, we're going to look at how do you actually integrate this cryptographic protection provided to you at the network level into the entire application.
So how does the web browser make sense of whatever guarantees the cryptographic protocol is providing to it?
And these are closely related, and it turns out that protecting network communications is rather easy. Cryptography mostly just works. And integrating it in, and correctly using it at a higher level in the browser, is actually the much trickier part, how to actually build a system around cryptography.
Before we dive into this whole discussion, I want to remind you of the kinds of cryptographic primitives we're going to use here.
So in the last lecture on Kerberos, we basically used something called symmetric crypto, or encryption and decryption. And the plan there is that you have a secret key k, and you have two functions.
So you can take some piece of data, let's call it p for plain text, and you can apply an encryption function, that's a function of some key k. And if you encrypt this plain text, you get a cipher text c.
And similarly, there's a decryption function called d that, given the same key k and the cipher text, will give you back your plain text.
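To make the encrypt and decrypt pair concrete, here is a minimal Python sketch. The function names E and D and the construction itself (XOR with a keystream derived from SHA-256 over the key and a counter) are assumptions chosen only to keep the example self-contained. It is a toy, not the cipher Kerberos or any real system uses.

```python
import hashlib
import secrets


def keystream(k: bytes, length: int) -> bytes:
    """Stretch key k into `length` bytes by hashing k with a running counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(k + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def E(k: bytes, p: bytes) -> bytes:
    """Encrypt plain text p under key k (toy XOR stream cipher)."""
    return bytes(x ^ y for x, y in zip(p, keystream(k, len(p))))


def D(k: bytes, c: bytes) -> bytes:
    """Decrypt cipher text c under the same key k; XOR is its own inverse."""
    return E(k, c)


k = secrets.token_bytes(32)            # the shared secret key
p = b"plain text for the file server"
c = E(k, p)                            # cipher text
recovered = D(k, c)                    # equals p when the same k is used
```

The point to notice is that both functions take the same k, so both parties must already share it. That is the constraint the rest of the lecture works around.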
So this is the primitive that Kerberos was all built around.
But it turns out there's other primitives, as well, that will be useful for today's discussion.
And this is called asymmetric encryption and decryption. And here the idea is to have different keys for encryption and decryption. We'll see why this is so useful. And in particular, the functions that you get are these: you can encrypt some message to a particular public key and get a cipher text. And in order to decrypt, you just supply the corresponding secret key to get the plain text back.
And the cool thing now is that you can publish this public key anywhere on the internet, and people can encrypt messages for you, but you need the secret key in order to decrypt the messages. And we'll see how this is used in the protocol.
And in practice you'll often use public key crypto in a slightly different way. So instead of encrypting and decrypting messages, you might actually want to sign or verify messages.
Turns out that at the implementation level these are related operations, but at an API level they might look a little bit different.
So you might sign a message with your secret key, and you get some sort of a signature s. And then you can also verify this message using the corresponding public key. You give it the message and the signature, and out comes a Boolean flag saying whether or not this is the correct signature on that message.
And there are some relatively intuitive guarantees that these functions provide. If you, for example, got this signature and it verifies correctly, then it must have been generated by someone with the correct secret key.
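The sign and verify API described above can be sketched with textbook RSA in plain Python. Everything here is an assumption for illustration: the key is built from two well-known Mersenne primes, which makes it far too small and too structured to be secure, and there is no padding. Real systems use a vetted library instead.

```python
import hashlib

# Toy RSA key (demo assumption only: insecure size, no padding).
P, Q = 2**89 - 1, 2**127 - 1
N = P * Q
E_PUB = 65537
D_SEC = pow(E_PUB, -1, (P - 1) * (Q - 1))   # modular inverse, Python 3.8+

PK = (N, E_PUB)   # public key: safe to publish
SK = (N, D_SEC)   # secret key: kept by the signer


def digest(m: bytes, n: int) -> int:
    """Hash the message and reduce it into the range of the RSA modulus."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n


def sign(sk, m: bytes) -> int:
    """Produce a signature s on message m using the secret key."""
    n, d = sk
    return pow(digest(m, n), d, n)


def verify(pk, m: bytes, s: int) -> bool:
    """Return True only if s is a valid signature on m under the public key."""
    n, e = pk
    return pow(s, e, n) == digest(m, n)


s = sign(SK, b"hello")
ok = verify(PK, b"hello", s)            # correct message and signature
tampered = verify(PK, b"hellp", s)      # same signature, different message
```

Anyone holding PK can run verify, but only the holder of SK can produce an s that passes.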
Make sense, in terms of the primitives we have? All right.
So now let's actually try to figure out--
How would we protect network communication at a larger scale than in Kerberos? In Kerberos, we had the fairly simple model where we had all the users and servers have some sort of a relation with this KDC entity. And this KDC entity has this giant table of principals and their keys.
And whenever a user wants to talk to some server, they have to ask the KDC to generate a ticket based on that giant table the KDC has.
So this seems like a reasonably straightforward model. So why do we need something more? Why is Kerberos not enough for the web? Why doesn't the web use just Kerberos for securing all communications?
AUDIENCE: [INAUDIBLE]
PROFESSOR: Yeah. So there's sort of a single KDC that has to be trusted by all.
So this is, perhaps, not great. So you might have trouble really believing that some machine out there is secure for everyone in the world to use. Like, yeah, maybe people at MIT are willing to trust someone at IS&T to run the KDC there.
All right. So that's plausible, yeah.
AUDIENCE: [INAUDIBLE]
PROFESSOR: Yes. Key management is hard, I guess, yeah. So what I mean in particular by key management--
AUDIENCE: [INAUDIBLE]
PROFESSOR: Yes. It might actually be a hard job to build a KDC that can manage a billion keys, or ten billion keys, for all the people in the world. So it might be a tricky proposition.
If that's not the case, then I guess another bummer with Kerberos is that all users actually have to have a key, or have to be known to the KDC.
So, you can't even use Kerberos at MIT to connect to some servers, unless you yourself have an account in the Kerberos database. Whereas on the web, it's completely reasonable to expect that you walk up to some computer, the computer has no idea who you are, but you can still go to Amazon's website protected with cryptography.
Yeah?
AUDIENCE: [INAUDIBLE]
PROFESSOR: That's our idea. So there's these kinds of considerations. So there's perfect forward secrecy. There are a couple of other things you want from the cryptographic protocol, and we'll look at them and how they sort of show up in SSL, as well.
But the key there is that the solution is actually exactly the same as what you would do in Kerberos, and what you would do in SSL or TLS to address those guys.
But you're absolutely right. The Kerberos protocol we read about in the paper is pretty dated. So, even if you were using it for the web, you would want to apply some changes to it. Those are not huge though, at the [INAUDIBLE] level.
Any other thoughts on why we shouldn't use Kerberos? Yeah?
AUDIENCE: [INAUDIBLE]
PROFESSOR: This is actually not so scalable.
Yeah, recovery. Maybe registration even, as well, like you have to go to some accounts office and get an account. Yeah?
AUDIENCE: [INAUDIBLE] needs to be online.
PROFESSOR: Yeah, so that's another problem. These are sort of management issues, but at the protocol level, the KDC has to be online because it has to actually mediate every interaction you have with the service.
It means that in the web, every time you go to a new website, you'd have to talk to some KDC first, which would be a bit of a performance bottleneck. So like another kind of scalability, this is like performance scalability. This is more management scalability kind of stuff. Make sense?
So, how can we solve this problem with these better primitives? Well, the idea is to use public key cryptography to get this KDC out of the loop.
So let's first figure out whether we can establish secure communication if you just know some other party's public key. And then we'll see how we plug in a public key version of a KDC to authenticate parties in this protocol.
If you don't want to use a KDC, what you could do with public key crypto is maybe you can somehow learn the public key of the other party you want to connect to. So in Kerberos, if I want to connect to a file server, maybe I just know the file server's public key from somewhere. Like, as a freshman I get a printout saying the file server's public key is this. And then you can go ahead and connect to it.
And the way you might actually do this is you could just encrypt a message for the public key of the file server that you want to connect to. But it turns out that in practice, these public key operations are pretty slow. They are several orders of magnitude slower than symmetric key cryptography. So almost always you want to get out of the use of public key crypto as soon as practical.
So a typical protocol might look like this where you have a and b, and they want to communicate. And a knows b's public key. So what might happen is that a might generate some sort of session key s.
Just pick a random number. And then it's going to send to b the session key s.
So this is kind of looking like Kerberos. And we're going to encrypt the session key s with b's public key. And remember in Kerberos, in order to do this, we had to have the KDC do this for us because a didn't know the key for b, and couldn't have been allowed to know it because that is a secret that only b should've known.
With public key crypto you can actually do this now. We can just encrypt the secret s using b's public key. And we send this message over to b. B can now decrypt this message, and say I should be using this secret key.
And now we can have a communication channel where all the messages are just encrypted under this secret key s.
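The key transport step just described (a picks s, encrypts it to b's public key, b decrypts it) might be sketched as follows. The toy RSA key, the 16-byte session key length, and the function names are all assumptions for illustration. The key uses well-known Mersenne primes and no padding, so it is not secure.

```python
import secrets

# b's toy RSA key pair (demo assumption only: insecure size, no padding).
P, Q = 2**89 - 1, 2**127 - 1
N, E_PUB = P * Q, 65537
D_SEC = pow(E_PUB, -1, (P - 1) * (Q - 1))


def a_start_session(n: int, e: int):
    """a picks a random session key s and encrypts it to b's public key."""
    s = secrets.token_bytes(16)
    wrapped = pow(int.from_bytes(s, "big"), e, n)
    return s, wrapped


def b_accept_session(wrapped: int, n: int, d: int) -> bytes:
    """b recovers s using its secret key; nobody else can do this step."""
    return pow(wrapped, d, n).to_bytes(16, "big")


s_at_a, wrapped = a_start_session(N, E_PUB)     # a sends `wrapped` to b
s_at_b = b_accept_session(wrapped, N, D_SEC)    # b now holds the same s
```

After this single public key operation, both sides hold the same s and can switch to the much faster symmetric primitives for the rest of the connection.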
Does this make sense?
So there are some nice properties about this protocol. One is that we got rid of having to have a KDC be online and generate our session key for us. We could just have one of the parties generate it and then encrypt it for another party without the use of the KDC.
Another nice thing is we're probably pretty confident that messages sent by a to b will only be read by b. Because only b could have decrypted this message. And therefore, only b should have that corresponding secret key s.
So this is pretty nice. Any questions about this protocol? Yeah?
AUDIENCE: Does it matter whether the user or the server generates the pass code?
PROFESSOR: Well, maybe. I think it depends on exactly the considerations, or the properties you want out of this protocol.
So here, certainly if a is buggy or picks bad randomness, the server then sends some data back to a, thinking, oh, this is now the only data that is going to be seen by a. Well, maybe that's not going to be quite right. So you might care a little bit. There's a couple of other problems with this protocol, as well.
Question?
AUDIENCE: I was gonna say that in this protocol, a you could just do [INAUDIBLE].
PROFESSOR: Yes, that's actually not great. So there's actually several problems with this. One is the replay.
So the problem here is that I can just send these messages again, and it looks like a is sending these messages to b, and so on.
So typically the solution to this is to have both parties participate in the generation of s, and that ensures that the key we're using is now fresh. Because here, because b isn't actually generating anything, these protocol messages look exactly the same every time.
So typically, what happens is that one party picks a random number like s, and then another party b also picks some random number, typically called a nonce. But, whatever. There's two numbers. And then the key they agree to use is not the thing that one party picked, but actually is the hash of the things that both of them picked.
So you could do that. You could also do Diffie-Hellman kind of stuff like we looked at in the last lecture where you get forward secrecy. It's a little bit more complicated math rather than just hashing two random numbers that two parties picked. But then you get some nice properties, like forward secrecy.
So replay attacks you typically fix by having b generate some nonce. And then you set the real secret key that you're going to use to the hash of the secret key from one guy concatenated with this nonce. And, of course, b would have to send the nonce back to a in order for both of them to agree on a key.
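The freshness fix can be written in a few lines of Python. SHA-256 and the 16-byte lengths are assumptions for the sketch; the lecture only says "hash" and does not name a function.

```python
import hashlib
import secrets


def session_key(s: bytes, nonce: bytes) -> bytes:
    """s' = H(s || nonce): both parties contribute to the final key."""
    return hashlib.sha256(s + nonce).digest()


s = secrets.token_bytes(16)          # picked by a (an attacker could replay it)

nonce_1 = secrets.token_bytes(16)    # picked by b for the first connection
nonce_2 = secrets.token_bytes(16)    # picked by b for a later connection

key_1 = session_key(s, nonce_1)
key_2 = session_key(s, nonce_2)      # differs even though s was reused
```

Because b picks a new nonce for every connection, replaying a's old message yields a different session key, so traffic recorded under key_1 is useless on the second connection.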
All right. So another problem here is that there's no real authentication of a here, all right? So a knows who b is, or at least a knows who will be able to decrypt the data. But b has no idea who is on the other side, whether it's a or some adversary impersonating a, et cetera.
So how would we fix it in this public key world?
Yeah?
AUDIENCE: You have been assigned something and [INAUDIBLE].
PROFESSOR: Yeah. There's a couple of ways you could go about this. One possibility is a maybe should sign this message initially, because we have this nice sign primitive. So we could maybe have a sign this thing with a's secret key. And that sign just provides the signature, but presumably you'd sign it and also provide the message, as well.
And then b would have to know a's public key in order to verify the signature. But if b knows a's public key, then b's going to be reasonably confident that a is the one that sent this message over.
Make sense?
Another thing you could do is rely on encryption. So maybe b could send the nonce back to a encrypted under a's public key. And then only a would be able to decrypt the nonce and generate the final session key s prime.
So there are a couple of tricks you could do. This is roughly how client certificates work in web browsers today.
So a has a secret key, so when you get an MIT personal certificate, what happens is your browser generates a long lived secret key and gets a certificate for it. And whenever you send a request to a web server, you're going to prove the fact that you know the secret key in your user certificate, and then establish the secret key s for the rest of the communication.
Make sense? All right.
These are sort of all fixable problems at the protocol level that are reasonably easy to address by adding extra messages. The big assumption here, of course, that we're operating under is that all the parties know each other's public keys.
So how do you actually discover someone's public key? If, you know, a wants to connect to a website, I have a URL that I want to connect to, or a host name, how do I know what public key that corresponds to? Or similarly, if I connect to WebSIS to look at my grades, how does the server know what my public key should be, as opposed to the public key of some other person at MIT?
So this is the main problem that the KDC was addressing. I guess the KDC was solving two problems for us before.
One is that it was generating this message. It was generating the session key and encrypting it for the server. We fixed that by doing public key crypto now.
But we also need to get this mapping from string principal names to cryptographic keys that Kerberos previously provided to us. And the way that is going to happen in this HTTPS world, with this protocol called TLS, is that we're going to still rely on some parties to maintain, or to at least logically maintain, those giant tables mapping principal names onto cryptographic keys.
And the plan is, we're going to have something called a certificate authority.
This is often abbreviated as CA in all kinds of security literature. This thing is also going to logically maintain this table of, here's the name of a principal, and here's the public key for that principal.
And the main difference from the way Kerberos worked, is that this certificate authority thing isn't going to have to be online for all transactions.
So in Kerberos you have to talk to those KDCs to get a connection or to look up someone's key.
Instead, what's going to happen in this CA world, is that if you have some name here, and a public key, the certificate authority is going to just sign messages stating that certain rows exist in this table.
So the certificate authority is going to have its own sort of secret and public key here.
And it's going to use the secret key to sign messages for other users in the system to rely on. So if you have a particular entry like this, in this CA's database, then the CA is going to sign a message saying this name corresponds to this public key. And it's going to sign this whole message with the CA's secret key.
Make sense?
So this is going to allow us to do very similar things to what Kerberos was doing, but we are now going to get rid of the CA having to be online for all transactions. And in fact, it's now going to be much more scalable. So this is what's usually called a certificate.
And the reason this is going to be much more scalable is that, in fact, to a client, or anyone using this system, a certificate provided from one source is as good as a certificate provided from any other source. It's signed by the CA secret key. So you can verify its validity without having to actually contact the certificate authority, or any other designated party here.
And typically, the way this works is that a server that you want to talk to stores the certificate that it originally got from the certificate authority. And whenever you connect to it, the server will tell you, well, here's my certificate. It was signed by this CA. You can check the signature and just verify that this is, in fact, my public key and that's my name.
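A certificate in this sense is just a signed statement "this name has this public key". The sketch below shows issuing and checking one with toy RSA. The key sizes, the `name|n|e` body encoding, the dictionary layout, and all function names are assumptions for illustration. Real certificates use X.509 and a vetted crypto library.

```python
import hashlib


def make_key(p: int, q: int, e: int = 65537):
    """Return (public, secret) toy RSA keys from two primes (insecure demo)."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def digest(m: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n


def sign(sk, m: bytes) -> int:
    n, d = sk
    return pow(digest(m, n), d, n)


def verify(pk, m: bytes, s: int) -> bool:
    n, e = pk
    return pow(s, e, n) == digest(m, n)


def cert_body(name: str, pk) -> bytes:
    """The statement the CA signs: one row of its name-to-key table."""
    n, e = pk
    return f"{name}|{n}|{e}".encode()


def issue_certificate(ca_sk, name: str, pk) -> dict:
    """Done once by the CA; it does not need to be online afterwards."""
    return {"name": name, "public_key": pk,
            "signature": sign(ca_sk, cert_body(name, pk))}


def check_certificate(ca_pk, cert: dict, expected_name: str) -> bool:
    """Done by the client using only the CA's public key, with no network call."""
    body = cert_body(cert["name"], cert["public_key"])
    return cert["name"] == expected_name and verify(ca_pk, body, cert["signature"])


CA_PK, CA_SK = make_key(2**89 - 1, 2**127 - 1)
SERVER_PK, SERVER_SK = make_key(2**61 - 1, 2**107 - 1)
cert = issue_certificate(CA_SK, "amazon.com", SERVER_PK)
```

The client never contacts the CA: `check_certificate` needs only the certificate the server handed over and the CA public key it already has.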
And on the flip side, the same thing happens on client certificates. So when you the user connect to a web server, what's actually going on is that your client certificate actually talks about the public key corresponding to the secret key that you originally generated in your browser. And this way when you connect to a server, you're going to present a certificate signed by MIT's certificate authority saying your user name corresponds to this public key. And this is how the server is going to be convinced that a message signed with your secret key is proof that this is the right Athena user connecting to me.
Does that make sense? Yeah.
AUDIENCE: Where does the [? project ?] get the certificate [INAUDIBLE]?
PROFESSOR: Ah, yes. Like the chicken and the egg problem. It keeps going down. Where do you get these public keys? At some point you have to hard code these in, or that's typically what most systems do.
So today what actually happens is that when you download a web browser, or you get a computer for the first time, it actually comes with public keys of hundreds of these certificate authorities.
And there's many of them. Some are run by security companies like VeriSign. The US Postal Service has a certificate authority, for some reason. There's many entities there that could, in principle, issue these certificates and are fully trusted by the system.
These many certificate authorities are now replacing the trust that we had in this KDC.
And in some ways, we haven't actually addressed all the problems we listed with Kerberos. So previously we were worried that, oh man, how are we going to trust? How is everyone in the world going to trust a single KDC machine?
But now, it's actually worse. This is actually worse in some ways, because instead of trusting a single KDC machine, everyone is now trusting these hundreds of certificate authorities because all of them are equally powerful. Any of them could sign a message like this and it would be accepted by clients as a correct statement saying this principal has this public key. So you have to only break into one of these guys instead of the one KDC.
Yeah?
AUDIENCE: Is there a mechanism to revoke the keys?
PROFESSOR: Yeah. That's another hard problem. It turns out that before, we talked to the KDC, and if you screwed up, you could tell the KDC to stop giving out my key, or change it. Now the certificates are actually potentially valid forever.
So the typical solution is twofold. One is, sort of expectedly, these certificates include an expiration time. So this way you can at least bound the damage. This is kind of like a Kerberos ticket's lifetime, except in practice, these tend to be several orders of magnitude higher. So in Kerberos, your ticket's lifetime could be a couple hours. Here it's typically a year or something like this.
So the CAs really don't want to be talked to very often. So they want to get your money once a year for the certificate, and then give you this blob of signed bytes, and you're good to go for a year. You don't have to contact them again.
So this is good for scalability, but not so good for security.
And there's two problems that you might worry about with certificates. One is that maybe the CA screwed up. So maybe the CA issued a certificate for the wrong name. Like, they weren't very careful. And accidentally, I ask them to give me a certificate for amazon.com, and they just slipped up and said, all right, sure. That's amazon.com. I will give you a certificate for that.
So that seems like a problem on the CA side. So they mis-issued a certificate. And that's one way that you could end up with a certificate that you wish no longer existed, because you signed the wrong thing.
Another possibility is that the CA does the right thing, but then the person who had the certificate accidentally disclosed the secret key, or someone stole the secret key corresponding to the public key in the certificate. So this means that certificate no longer says what you think it might mean. Even though this says amazon.com's key is this, actually everyone in the world has the corresponding secret key because someone posted it on the internet.
So you can't really learn much from someone sending you a message signed by the corresponding secret key, because it could've been anyone that stole the secret key.
So that's another reason why you might want to revoke a certificate. And revoking certificates is pretty messy. There's not really a great plan for it. The two alternatives that people have tried are, first, to basically publish a list of all revoked certificates in the world. This is something called a certificate revocation list, or CRL. And the way this works is that every certificate authority issues these certificates, but then on the side, it maintains a list of mistakes.
These are things where it realized it screwed up and issued a certificate under the wrong name, or its customers come to it and say, hey, you issued me a certificate. Everything was going great. But someone then got root on my machine and stole the private key. Please tell the world that my certificate is no good anymore.
So this certificate authority, in principle, could add stuff to this CRL, and then clients like web browsers are supposed to download this CRL periodically. And then whenever they're presented with a certificate, they should check if the certificate appears in this revoked list. And if it shows up there, then they should say that certificate's no good. You better give me a new one. I'm not going to trust this particular signed message anymore.
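The two client-side checks discussed so far, expiration and CRL membership, might be sketched like this. The `Certificate` fields, the use of a serial number to identify entries in the CRL, and the fixed date are all hypothetical choices for the example, and the signature check is assumed to have already passed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Certificate:
    serial: int           # hypothetical identifier that the CRL refers to
    name: str
    not_after: datetime   # expiration time set by the CA


def is_acceptable(cert: Certificate, crl: set, now: datetime) -> bool:
    """Reject expired certificates and anything on the downloaded CRL."""
    if now > cert.not_after:
        return False      # expiration bounds the damage of a bad certificate
    if cert.serial in crl:
        return False      # the CA has published this one as a mistake
    return True


now = datetime(2020, 1, 1, tzinfo=timezone.utc)   # arbitrary date for the demo
good = Certificate(1, "amazon.com", now + timedelta(days=365))
stolen = Certificate(2, "amazon.com", now + timedelta(days=365))
expired = Certificate(3, "amazon.com", now - timedelta(days=1))
crl = {2}                 # periodically downloaded list of revoked serials
```

Note that `stolen` is still within its validity period, so only the CRL entry causes the client to reject it.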
So that's one plan. It's not great.
If it were really used, it would be a giant list. And it would be quite a lot of overhead for everyone in the world to download this. The other problem is that no one actually bothers doing this stuff. So the lists in practice are empty. If you actually ask all these CAs, most of them will give you back an empty CRL because no one's ever bothered to add anything to this list.
Because, why would you? It will only break things because it will reduce the number of connections that will succeed.
So it's not clear whether there is a great motivation for CAs to maintain this CRL.
The other thing that people have tried is to query online the CAs. Like in the Kerberos world, we contact the KDC all the time. And in the CA world we try to get out of this business and say, well, the CA's only going to sign these messages once a year. That's sort of a bummer. So there's an alternative protocol called online certificate status protocol, or OCSP. And this protocol pushes us back from the CA world to the KDC world.
So whenever a client gets a certificate and they're curious, is this really a valid certificate? Even though it's before the expiration time, maybe something went wrong. So using this OCSP protocol, you can contact some server and just say, hey, I got this certificate. Do you think it's still valid? So basically, offloading the job of maintaining this CRL to a particular server.
So instead of downloading a whole list yourself, you're going to ask the server, hey, is this thing in that list? So that's another plan that people have tried. It's also not used very widely because of two factors.
One is that it adds latency to every request that you make. So every time you want to connect to a server, now you have to first connect, get the certificate from the server. Now you have to talk to this OCSP guy and then wait for him to respond and then do something else. So for latency reasons, this is actually not a super popular plan.
Another problem is that you don't want this OCSP thing being down to affect your ability to browse the web. Suppose this OCSP server goes down. You could, like, disable the whole internet because you can't check anyone's certificate.
Like, it could be all bad. And then all your connections stop working. So no one wants that. So most clients treat the OCSP server being down as sort of an OK occurrence.
This is really bad from a security perspective because if you're an attacker and you want to convince someone that you have a legitimate certificate, but it's actually been revoked, all you have to do is somehow prevent that client from talking to the OCSP server. And then the client will say, well, I got the certificate. I'll try to check it, but this guy doesn't seem to be around, so I'll just go for it. So that's basically the sort of lay of the land as far as revocation goes. So there's no real great answer.
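The weakness just described, where an unreachable OCSP server is treated as "fine", can be shown with a small simulation. Nothing here speaks the real OCSP protocol: the responder functions, the `hard_fail` flag, and the serial numbers are hypothetical stand-ins for the client's decision logic.

```python
from typing import Callable


def check_revocation(serial: int,
                     query_responder: Callable[[int], bool],
                     hard_fail: bool = False) -> bool:
    """Return True if the client should accept the certificate."""
    try:
        revoked = query_responder(serial)
    except ConnectionError:
        # Soft-fail (what most clients do): no answer is treated as OK.
        # Hard-fail: no answer means reject, at the cost of availability.
        return not hard_fail
    return not revoked


REVOKED = {42}                       # serials the responder knows are revoked


def responder_up(serial: int) -> bool:
    return serial in REVOKED


def responder_blocked(serial: int) -> bool:
    # Simulates an attacker who prevents the client from reaching the server.
    raise ConnectionError("responder unreachable")


normal = check_revocation(42, responder_up)                       # rejected
attacked = check_revocation(42, responder_blocked)                # accepted!
strict = check_revocation(42, responder_blocked, hard_fail=True)  # rejected
```

With soft-fail, blocking the responder turns a revoked certificate back into an accepted one, which is exactly the attack described above.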
The thing that people do in practice as an alternative to this is that clients just hard code in really bad mistakes. So for example, the Chrome web browser actually ships inside of it with a list of certificates that Google really wants to revoke. So if someone mis-issues a certificate for Gmail or for some other important site-- like Facebook, Amazon, or whatever-- then the next release of Chrome will contain that thing in its revocation list baked into Chrome.
So this way, you don't have to contact the CRL server. You don't have to talk to this OCSP guy. It's just baked in. Like, this certificate is no longer valid. The client rejects it. Yeah.
AUDIENCE: Sorry, one last thing.
PROFESSOR: Yeah.
AUDIENCE: So let's say I've stolen the secret key on the certificate [INAUDIBLE]. All public keys are [? hard coded-- ?]
PROFESSOR: Oh, yeah. That's [INAUDIBLE] really bad. I don't think there's any solution baked into the system right now for this. There have been certainly situations where certificate authorities appear to have been compromised.
So in 2011, there were two CAs that were compromised and issued, or were somehow tricked into issuing, certificates for Gmail, for Facebook, et cetera. And it's not clear. Maybe someone did steal their secret key.
So what happened is I think those CAs actually got removed from the set of trusted CAs by browsers from that point on. So the next release of Chrome is just like, hey, you really screwed up. We're going to kick you out of the set of CAs that are trusted.
And it was actually kind of a bummer because all of the legitimate people that had certificates from that certificate authority are now out of luck. They have to get new certificates. So this is a somewhat messy system, but that's sort of what happens in practice with certificates. Make sense? Other questions about how this works?
All right. So this is the sort of general plan for how certificates work. And as we were talking about, they're sort of better than Kerberos in the sense that you don't have to have this guy be online. It might be a little bit more scalable because you can have multiple CAs, and you don't have to talk to them.
Another cool thing about this protocol is that unlike Kerberos, you're not forced to authenticate both parties. So you could totally connect to a web server without having a certificate for yourself. This happens all the time. If you just go to amazon.com, you are going to check that Amazon is the right entity, but Amazon has no idea who you are necessarily, or at least not until you log in later.
So at the crypto protocol level, you have no certificate. Amazon has a certificate. So that was actually much better than Kerberos where in order to connect to a Kerberos service, you have to be an entry in the Kerberos database already. One thing that's a little bit of a bummer with this protocol as we've described it is that in fact, the server does have to have a certificate.
So you can't just connect to a server and say, hey, let's just encrypt our stuff. I have no idea who you are, or not really, and you don't have any idea who I am, but let's encrypt it anyway. So this is called opportunistic encryption, and it's of course vulnerable to man in the middle attacks because you're connecting to someone and saying, well, let's encrypt our stuff, but you have no idea who it really is that you're encrypting stuff with. But it might be a good idea anyway. If someone is not actively mounting an attack against you, at least the packets later on will be encrypted and protected from snooping.
So it's a bit of a shame that this protocol that we're looking at here-- SSL, TLS, whatever-- doesn't offer this kind of opportunistic encryption thing. But such is life. So I guess the server [INAUDIBLE] in this protocol. The client sometimes can and sometimes doesn't have to. Make sense? Yeah.
AUDIENCE: I'm just curious. What's to stop someone from-- I mean, let's just say that once a year, they create using new name key pairs. So why couldn't you kind of try to spend that entire year for that specific key?
PROFESSOR: Huh?
AUDIENCE: Why does that not work with this?
PROFESSOR: I think it does work. So OK, so it's like what goes wrong with this scheme. Like, one of the things is that the cryptography has to be good here, and as with Kerberos, people start off using good crypto, but it gets worse and worse over time. Computers get faster. There's better algorithms that are breaking this stuff. And if people are not diligent about increasing their standards, then these problems do creep up. So for example, it used to be the case that many certificates were signed.
Well, there's two things going on. There's a public key signature scheme. And then because the public key crypto has some limitations, you typically-- actually, when you sign a message, you actually take a hash of the message and then you sign the hash itself because it's hard to sign a gigantic message, but it's easy to sign a compact hash.
And one thing that actually went wrong is that people used to use MD5 as a hash function for collapsing the big message you're signing into a 128 bit thing that you're going to actually sign with a crypto system. MD5 was good maybe 20 years ago, and then over time, people discovered weaknesses in MD5 that could be exploited.
So actually, at some point, someone did actually ask for a certificate with a particular MD5 hash, and then they carefully figured out another message that hashes to the same MD5 value. And as a result, now you have a signature by a CA on some hash, and then you have a different message, a different key, or a different name that you could convince someone was signed. And this does happen. Like, if you spend a lot of time trying to break a single key, then you will succeed eventually, if that certificate was using crypto that could be brute forced.
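The "collapse a big message into a compact hash before signing" step is easy to see with Python's hashlib. The message contents below are made up, and the sketch assumes a Python build where MD5 and SHA-1 are enabled (some restricted builds disable them). It only illustrates digest sizes and does not attempt any collision attack.

```python
import hashlib

# A large made-up "certificate" body: what the CA is asked to sign.
big_message = b"name=amazon.com;public_key=" + b"\x01" * 100_000

# Digest length in bits for each hash function mentioned in the lecture,
# plus SHA-256 as a commonly used alternative.
sizes = {
    algo: len(hashlib.new(algo, big_message).digest()) * 8
    for algo in ("md5", "sha1", "sha256")
}

# Changing the message changes the digest, so the CA's signature on the
# digest no longer matches, unless an attacker can find a collision.
other_message = b"name=evil.example;public_key=" + b"\x01" * 100_000
same_md5 = (hashlib.md5(big_message).digest()
            == hashlib.md5(other_message).digest())
```

The signature covers only the short digest, so the scheme is exactly as strong as the hash function's resistance to collisions, which is why the MD5 weaknesses mattered.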
Another example of something that's probably not so great now is if you're using RSA. We haven't really talked about RSA, but RSA is one of these public key crypto systems that allows us to either encrypt messages or sign messages. With RSA, these days, it's probably feasible to spend lots of money and break 1,000 bit RSA keys. You'd probably have to spend a fair amount of work, but it's doable, probably within a year easily.
From there, absolutely. You can ask a certificate authority to sign some message, or you can even take someone's existing public key and try to brute force the corresponding secret key, or [? manual hack. ?]
So you have to keep up with the attackers in some sense. You have to use larger keys with RSA. Or maybe you have to use a different crypto scheme.
For example, now people don't use MD5 hashes in certificates. They use SHA-1, and that was good for a while. Now SHA-1 is also weak, and Google is actually now actively trying to push web developers and browser vendors et cetera to discontinue the use of SHA-1 and use a different hash function because it's pretty clear that maybe in 5 or 10 years time, there will be relatively easy attacks on SHA-1. It's already been shown to be weaker.
So I guess there is no magic bullet, per se. You just have to make sure that you keep evolving with the hackers. Yeah. There is a problem, absolutely. Like, all of this stuff that we're talking about relies on crypto being correct, or sort of being hard to break. So you have to pick parameters suitably. At least here, there's an expiration time.
So well, let's pick some parameters that are good for a year as opposed to for 10 years. The CA has a much bigger problem. This key, there's no expiration on it, necessarily.
So that's less clear what's going on. So probably, you would pick really aggressively sort of safe parameters. So 4,000 or 6,000 bit RSA or something. Or another scheme all together. Don't use SHA-1 at all here. Yeah. No real clear answer. You just have to do it.
All right. Any other questions? All right. So let's now look at-- so this is just like the protocol side of things. Let's now look at how do we integrate this into a particular application, namely the web browser?
So I guess if you want to secure network communication, or sort of websites, with cryptography, there's really three things we have to protect in browser. So the first thing we have to protect is data on the network. And this is almost the easy part because well, we're just going to run a protocol very much like what I've been describing so far. We'll encrypt all the messages, sign them, make sure they haven't been tampered with, all this great stuff.
So that's how we're going to protect data. But then there's two other things in a web browser that we really have to worry about. So the first of them is anything that actually runs in the browser. So code that's running in the browser, like JavaScript, or important data that's stored in the browser. Maybe your cookies, or local storage, or lots of other stuff that goes on in a modern browser all has to be somehow protected from network attackers. And we'll see the kinds of things we have to worry about here in a second.
And then the last thing that you might not think about too much but turns out to be a real issue in practice is protecting the user interface. And the reason for this is that ultimately, much of the confidential data that we care about protecting comes from the user. And the user is typing this stuff into some website, and the user probably has multiple websites open on their computer so that the user has to be able to distinguish which site they're actually interacting with at any moment in time.
If they accidentally typed their Amazon password into some web discussion forum, it's going to be disastrous depending on how much you care about your Amazon password, but still. So you really want to have good user interface sort of elements that help the user figure out what are they doing? Am I typing this confidential data into the right website, or what's going to happen to this data when I submit it?
So this turns out to be a pretty important issue for protecting web applications. All right. Make sense?
So let's talk actually what the current web browsers do on this front. So as I mentioned, here for protecting [INAUDIBLE], we're just going to use this protocol called SSL or TLS now that encrypts and authenticates data. It looks very similar to the kind of discussion we've had so far. It includes the certificate authorities, et cetera. And then of course, many more details. Like, TLS is hugely complicated, but it's not particularly interesting from this [INAUDIBLE] angle.
All right, so protecting stuff in the browser turns out to be much more interesting. And the reason is that we need to make sure that any code or data delivered over non-encrypted connections can't tamper with code and data that came from an encrypted connection because our threat model is that anything that's unencrypted could potentially be tampered with by a network attacker.
So we have to make sure that if we have some unencrypted JavaScript code running on our browser, then we should assume that that could've been tampered with by an attacker because it wasn't encrypted. It wasn't authenticated over the network. And consequently, we should prevent it from tampering with any pages that were delivered over an encrypted connection.
So the general plan for this is we're going to introduce a new URL scheme. Let's call it HTTPS. So you often see this in URLs, presumably in your own life.
And there's going to be two things that-- well, first of all, the cool thing about introducing a new URL scheme is that now, these URLs are just different from HTTP URLs. So if you have a URL that's HTTPS colon something something, it's a different origin as far as the same origin policy is concerned from regular HTTP URLs. So HTTP URLs go over unencrypted connections. These things are going over SSL/TLS. So you'll never confuse the two if the same origin policy does its job correctly.
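A minimal Python sketch of why the new scheme yields a separate origin, assuming the usual definition of an origin as the (scheme, host, port) triple. The helper names origin and same_origin are made up for this illustration.

```python
from urllib.parse import urlsplit

# Default ports, so that https://foo.com and https://foo.com:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that identifies the URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if scheme, host, and port all match."""
    return origin(a) == origin(b)

# Same host name, different scheme: treated as different origins.
print(same_origin("http://foo.com/index.html", "https://foo.com/index.html"))
# Same scheme, host, and (default) port: same origin.
print(same_origin("https://foo.com/a", "https://foo.com:443/b"))
```

Since the scheme is part of the triple, a page loaded over plain HTTP never compares equal to the HTTPS page on the same host.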
So that's one bit of a puzzle. But then you have to also make sure that you correctly distinguish different encrypted sites from one another. It then turns out cookies have a different policy for historical reasons. So let's first talk about how we're going to distinguish different encrypted sites from one another.
So the plan for that is that actually, the host name in the URL has to be the name in the certificate. So that's what actually turns out that the certificate authorities are going to sign at the end of the day. So we're going to literally sign the host name that shows up in your URL as the name for your web server's public key. So Amazon presumably has a certificate for www.amazon.com. That's the name, and then whatever the public key corresponding to their secret key is.
And this is what the browser's going to look for. So if it gets a certificate-- well, if it tries to connect or get a URL that's https://foo.com, it better be the case that the server presents a certificate for foo.com exactly. Otherwise, we'll say, well, we tried to connect to one guy, but we actually have another guy. That's a different name in the certificate that we connected to. And that'll be a certificate mismatch.
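The name check can be sketched as follows. The function name_matches is a hypothetical helper that does only the exact comparison described above and ignores details such as wildcard certificates. The second half uses Python's standard ssl module, whose default client context already requires a valid certificate and checks the host name.

```python
import ssl

def name_matches(url_host: str, cert_name: str) -> bool:
    """Exact, case-insensitive comparison of the URL host and the certificate name."""
    return url_host.lower() == cert_name.lower()

# https://foo.com must be answered with a certificate for foo.com exactly.
print(name_matches("foo.com", "foo.com"))
print(name_matches("foo.com", "www.foo.com"))

# A default client-side TLS context performs this kind of check for real connections.
context = ssl.create_default_context()
print(context.check_hostname)                    # host name must match the certificate
print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate must chain to a trusted CA
```

A client that turns either of these settings off is in roughly the same position as a user who clicks through a certificate mismatch warning.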
So that's how we are going to distinguish different sites from one another. We're basically going to get the CAs to help us tell these sites apart, and the CAs are going to promise to issue certificates to only the right entities. So that's on the same origin policy side, how we're going to separate the code apart. And then as it turns out-- well, as you might remember, cookies have a slightly different policy. Like, it's almost the same origin, but not quite.
So cookies have a slightly different plan. So cookies have this secure flag that you can set on a cookie. So the rules are, if a cookie has a secure flag, then it gets sent only to HTTPS requests, or along with HTTPS requests. And if a cookie doesn't have a secure flag, then it applies to both HTTP and HTTPS requests.
Well, it's a little bit complicated, right. It would be cleaner if cookies just said, well, this is a cookie for an HTTPS host, and this is a cookie for an HTTP host. And they're just completely different. That would be very clear in terms of isolating secure sites from insecure sites. Unfortunately, for historical reasons, cookies have this weird sort of interaction.
So if a cookie is marked secure, then it only applies to HTTPS sites. Well, there's a host also as well, right. So secure cookies apply only to HTTPS host URLs, and insecure cookies apply to both. So that will be some source of problems for us in a second. Make sense? All right.
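The cookie rule just stated can be written down as a small Python sketch. The Cookie class and the cookies_for_request function are hypothetical and model only the host and the secure flag, not paths, domains, or expiration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cookie:
    name: str
    value: str
    host: str
    secure: bool = False  # the "secure flag" from the lecture

def cookies_for_request(jar: list, scheme: str, host: str) -> list:
    """Secure cookies go only with HTTPS requests; other cookies go with both."""
    return [
        c for c in jar
        if c.host == host and (scheme == "https" or not c.secure)
    ]

jar = [
    Cookie("session", "s3cret", "amazon.com", secure=True),
    Cookie("theme", "dark", "amazon.com"),  # secure flag not set
]

# Plain HTTP request: only the non-secure cookie is attached.
print([c.name for c in cookies_for_request(jar, "http", "amazon.com")])
# HTTPS request: both cookies are attached.
print([c.name for c in cookies_for_request(jar, "https", "amazon.com")])
```

The asymmetry is visible in the filter: the flag restricts where a secure cookie is sent, but nothing keeps a non-secure cookie out of an HTTPS request.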
And the final bit that web browsers do to try to help us along in this plan is for the UI aspect, they're going to introduce some kind of a lock icon that users are supposed to see. So there's a lock icon in your browser, plus you're supposed to look at the URL to figure out which site you're on. Now that's how web browser developers expect you to think of the world. Like, if you're ever entering confidential stuff into some website, then you should look at the URL, make sure that's the actual host name that you want to be talking to, and then look for some sort of a lock icon, and then you should assume things are good to go. So that's the UI aspect of it.
It's not great. It turns out that many phishing sites will just include an image of a lock icon in the site itself and have a different URL. And if you don't know exactly what to look for or what's going on, a user might be fooled by this.
So this UI side is a little messy, partly because users are messy, like humans. And it's really hard to tell what's the right thing to do here. So we'll focus mostly on this aspect of it, which is much easier to discuss precisely. Make sense? Any questions about this stuff so far? Yeah.
AUDIENCE: I noticed some websites that are HTTPS [INAUDIBLE].
PROFESSOR: Yeah. So it turns out that the browsers evolve over time what it means to get a lock icon. So one thing that some browsers do is they give you a lock icon only if all of the content or the resources within your page were also served over HTTPS. So this is one of the problems that forced HTTPS tries to address is this mixed content or insecure embedding kinds of problems.
So sometimes, you will fail to get a lock icon because of that check. Other times, maybe your certificate isn't quite good enough. So for example, Chrome will not give you a lock icon if it thinks your certificate uses weak cryptography.
But also, it varies with browsers. So maybe Chrome will not give you a lock icon, but Firefox will. So it's, again, there's no clear spec on what this lock icon means. Just people sweep stuff under this lock icon. Other questions?
All right. So let's look at, I guess, what kinds of problems we run into with this plan. So one thing I guess we should maybe first talk about is, OK, so in regular HTTP, we used to rely on DNS to give us the correct IP address of the server. So how much do we have to trust DNS for these HTTPS URLs? Are DNS servers trusted, or are these DNS mappings important for us anymore? Yeah.
AUDIENCE: They are because the certificate is signing the domain name. I don't think you sign an IP address [INAUDIBLE].
PROFESSOR: That's right. Yeah. So the certificate signs the domain name. So this is like amazon.com. So [INAUDIBLE].
AUDIENCE: Say someone steals amazon.com's private key and [INAUDIBLE] another server with another IP address, and combines [INAUDIBLE] IP address [INAUDIBLE]. But then you already stole the private key.
PROFESSOR: That's right. Yeah. So in fact, you're describing an attacker who both steals the private key and redirects DNS to themselves. So is DNS in itself sensitive enough for us to care about? I guess in some sense you're right, that we need DNS to find the IP address, or otherwise we'd be lost because this is just the host name, and we still need to find the IP address to talk to it.
What if someone compromised the DNS server and points us at a different IP address? Is it going to be bad? Yeah.
AUDIENCE: Well, maybe just [INAUDIBLE] HTTPS.
PROFESSOR: So potentially worrisome, right. So they might just refuse the connection altogether.
AUDIENCE: Well, no. They just redirect you to the HTTP URL.
PROFESSOR: Well, so certainly, if you connect to it over HTTPS, then they can't redirect. But yeah. Yeah.
AUDIENCE: You can [INAUDIBLE] and try to fool the user. That's [INAUDIBLE].
PROFESSOR: That's right, yeah. So the thing that you mentioned is that you could try to serve up a different certificate. So maybe you-- well, one possibility is you somehow compromised the CA, in which case all right, you're in business. Another possibility is maybe you'll just sign the certificate by yourself. Or maybe you have some old certificate for this guy that you've gotten the private key for.
And it turns out that web browsers, as this sort of forced HTTPS paper we're reading touched on, most web browsers actually ask the user if something doesn't look right with the certificate, which seems like a fairly strange thing to do because here's the rule. The host name has to match the name of the certificate, and it has to be valid. It has to be unexpired, all these very clear rules.
But because of historically the way HTTPS has been deployed, it's often been the case that web server operators mis-configure HTTPS. So maybe they just forget to renew their certificate. Things were going along great and you didn't notice that your certificate was expired and you just forgot to renew it.
So it seems to web browser developers, that seems like a bit of a bummer. Oh, man. It's just expired. Let's just let the user continue. So they offer a dialogue box for the user saying, well, I got a certificate, but it doesn't look right in some way. [INAUDIBLE] go ahead anyway and continue.
So web browsers will allow users to sort of override this decision on things like expiration of certificates. Also for host names, it might be the case that your website has many names. Like for Amazon, you might connect to amazon.com, or maybe www.amazon.com, or maybe other host names. And if you are not careful as the website operator, you might not know to get certificates for every possible name that your website has.
And then a user is sort of stuck saying, well, the host name doesn't look quite right, but maybe let's go anyway. So this is the reason why web browsers allow users to accept more broadly, or a broader range of certificates, than these rules might otherwise dictate. So that's [INAUDIBLE] problem.
And then if you hijack DNS, then you might be able to redirect the user to one of these sites that serves up an incorrect certificate. And if the user isn't careful, they're going to potentially approve the browser accepting your certificate, and then you're in trouble.
That's a bit of a gray area with respect to how much you should really trust DNS. So you certainly don't want to give arbitrary users control of your DNS name [INAUDIBLE]. But certainly, the goal of SSL/TLS and HTTPS, all this stuff, is to hopefully not trust DNS at all. If everything works here correctly, then DNS shouldn't be trusted.
You can [INAUDIBLE]. You should never be able to intercept any data or corrupt data, et cetera. Make sense? That's if everything works, of course. It's a little bit messier than that.
All right. So I guess one interesting question to talk about is I guess how bad could an attack be if the user mis-approves a certificate? So as we were saying, if the user accepts a certificate for the wrong host or accepts an expired certificate, what could go wrong? How much should we worry about this mistake from the user? Yeah.
AUDIENCE: Well, [INAUDIBLE]. But it could be, [? in example ?], not the site the user wants to visit. So they could do things like pretend to be the user's name.
PROFESSOR: Right. So certainly, the user might then I guess be fooled into thinking, oh, I have all this money, or you have no money at all because the result page comes back saying here's your balance. So maybe the user will assume something about what that bank has or doesn't have based on the result. Well, it still seems bad, but not necessarily so disastrous. Yeah.
AUDIENCE: I think that an [INAUDIBLE] get all the user's cookies and [INAUDIBLE].
PROFESSOR: Right. So this is your fear, yeah. This is much more worrisome, actually, or has a much longer lasting impact on you. And the reason this works out is because the browser, when it figures out [INAUDIBLE] makes a decision as to who is allowed to get a particular set of cookies or not, just looks at the host name in the URL that you were supposed to be connected to.
So if you connect to some attacker's web server, and then you just accept their certificate for amazon.com as the real thing, then the browser will think, yeah, the entity I'm talking to is amazon.com, so I will treat them as I would a normal amazon.com server, which means that they should get access to all the cookies that you have for that host. And presumably they could run JavaScript code in your browser in that same origin principal.
So if you have another site open that was connecting to the real website-- like maybe you had a tab open in your browser. You closed your laptop, then you opened it on a different network, all of a sudden, someone intercepted your connection to amazon.com and injected their own response. If you approve it, then they'll be able to access the old amazon.com page you have open because as far as the browser is concerned, these are the same origin because they have the same host name. That's going to be troublesome.
So this is potentially quite an unfortunate attack if the user makes the wrong choice on approving that certificate. Make sense? Any questions about that? All right.
So that's one sort of, I guess, issue that this forced HTTPS paper is worried about is users making a mistake in the decision, users having too much leeway in accepting certificates. Another problem that shows up in practice is that-- we sort of briefly talked about this-- but this is one of the things that also forced HTTPS, I think, is somewhat concerned about is this notion of insecure embedding, or mixed content. And the problem that this term refers to is that a secure site, or any website for that matter, can embed other pieces of content into a web page.
So if you have some sort of a site, foo.com/index.html, this site might be served from HTTPS, but inside of this HTML page, you could have many tags that instruct the browser to go and fetch other stuff as part of this page. So the easiest thing to sort of think about is probably script tags where you can say script source equals http jquery.com.
So this is a popular JavaScript library that makes it easier to interact with lots of stuff in your browser. But many web developers just reference a URL on another site like this. So this should be fairly straightforward, but what's the problem with this kind of setup? Suppose you have a secure site and you just load jQuery. Yeah.
AUDIENCE: It could be fake jQuery.
PROFESSOR: Yeah. So there are actually two ways that you could get the wrong thing that you're not expecting. One possibility is that jQuery itself is compromised. So that seems like, well, you get what you asked for. You asked for this site from jquery.com and that's what you get. If jQuery is compromised, that's too bad. Another problem is that this request is going to be sent without any encryption or authentication over the network.
So if an adversary is in control over your network connection, then they could intercept this request and serve back some other JavaScript code in response. Now, this JavaScript code is going to run as part of this page. And now, because it's running in this HTTPS foo.com domain, it has access to your secure cookies for foo.com and any other stuff you have in that page, et cetera. So it seems like a really bad thing. So you should be careful not to. Or a web developer certainly should be careful not to make this kind of a mistake.
So one solution is to ensure that all content embedded in a secure page is also secure. So this seems like a good guideline for many web developers to follow. So maybe you should just do https colon jquery.com. Or it turns out that URLs support these origin relative URLs, which means you could omit the HTTPS part and just say, [INAUDIBLE] script source equals //jquery.com/ something.
And what this means is to use whatever URL scheme your own URL came from. So this tag will translate to https jquery.com if it's on an HTTPS page, and to regular http jquery.com if it's on a non-HTTPS, just regular HTTP URL. So that's one way to avoid this problem.
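How a scheme-relative reference picks up the scheme of the containing page can be seen with urljoin from Python's standard library, which resolves relative references the way the URL rules prescribe. The path /jquery.js and the name SCRIPT_REF are placeholders chosen for this sketch, since the lecture only says "something".

```python
from urllib.parse import urljoin

# A scheme-relative reference as it might appear in a script tag.
# The path "/jquery.js" is a hypothetical placeholder.
SCRIPT_REF = "//jquery.com/jquery.js"

def resolve(page_url: str, reference: str) -> str:
    """Resolve a reference found in a page against that page's own URL."""
    return urljoin(page_url, reference)

# Embedded in an HTTPS page, the script is fetched over HTTPS.
print(resolve("https://foo.com/index.html", SCRIPT_REF))
# Embedded in a plain HTTP page, the same tag fetches over HTTP.
print(resolve("http://foo.com/index.html", SCRIPT_REF))
```

The same tag is therefore safe on the HTTPS version of the page without forcing HTTPS fetches on the plain HTTP version.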
Another thing that actually recently got introduced. So this field is somewhat active. People are trying to make things better. One alternative way of dealing with this problem is perhaps to include a hash or some sort of an [? indicator ?] right here in the tag, because if you know exactly what content you want to load, maybe you don't actually have to load it all over HTTPS. You don't actually care who serves it to you, as long as it matches a particular hash.
So there's actually a new spec out there for being able to specify basically hashes in these kinds of tags. So instead of having to refer to jquery.com with an HTTPS URL, maybe what you could do is just say script source equals jquery.com, maybe even HTTP. But here, you're going to include some sort of a tag attribute, like hash equals, and here you're going to put in a-- let's say a SHA-1 hash or a SHA-2 hash of the content that you're expecting to get back from the server.
AUDIENCE: [INAUDIBLE].
PROFESSOR: Question?
AUDIENCE: [INAUDIBLE].
PROFESSOR: Ah, man. There's some complicated name for it. I have the URL, actually, in the lecture notes, so [INAUDIBLE]. Subresource integrity or something like this. It's actually slowly being-- well, hopefully will be deployed probably soon in various browsers. It feels like another way to actually authenticate content without relying on data, or data encryption of the [INAUDIBLE].
So here, we have this very generic plan using SSL and TLS to authenticate connections to particular servers. This is almost like an alternative way of thinking of sort of securing your network communication. If the thing you just care about is integrity, then maybe you don't need a secure, encrypted channel over the network. All you need is to specify exactly what you want at the end of the day. Yeah.
AUDIENCE: So doesn't this [INAUDIBLE]?
PROFESSOR: Doesn't this code sit at the client? Well, it runs at the client, but the client fetches this code from some server.
AUDIENCE: [INAUDIBLE]. Can't anybody just [INAUDIBLE]?
PROFESSOR: Yeah. So I think the point of the hash is to protect the containing page from attackers that injected different JavaScript code here. So for jQuery, this makes a lot of sense because jQuery is well known. You're not trying to hide what jQuery source code is. Well, what you do want to make sure is that the network attacker cannot intercept your connection and supply a malicious version of jQuery that's going to leak your cookies.
AUDIENCE: [? Oh, ?] OK.
PROFESSOR: That make sense? It's absolutely true that anyone can compute the hash of these things for themselves. So this is a solution for integrity problems, not for confidentiality. All right.
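A rough Python sketch of the hash check discussed in this exchange. It assumes the format used by the Subresource Integrity spec, where the tag attribute is named integrity and holds an algorithm name, a dash, and the base64-encoded digest. The function names and the sample script bytes are invented for illustration.

```python
import base64
import hashlib

def integrity_value(content: bytes, algorithm: str = "sha384") -> str:
    """Build a value like 'sha384-<base64 digest>' for the expected content."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

def content_matches(content: bytes, expected: str) -> bool:
    """Recompute the hash of what was fetched and compare it to the page's value."""
    algorithm, _, _ = expected.partition("-")
    return integrity_value(content, algorithm) == expected

# The page author computes the hash once, from the script they intend to load.
good_script = b"/* hypothetical library source */ function hello() {}"
expected = integrity_value(good_script)

# Whoever serves the bytes, the browser-side check only passes for the same content.
tampered_script = good_script + b" sendCookiesToAttacker();"
print(content_matches(good_script, expected))
print(content_matches(tampered_script, expected))
```

Anyone can compute the expected value, which is fine: the check protects the integrity of the script, not its secrecy.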
So this is sort of what I guess developers have to watch out for when writing pages, or including content in their HTML pages on a HTTPS URL. Another worrisome problem is dealing with cookies. And here's where this difference between secure flags and just origins comes into play.
So one thing, of course, the developer could screw up is maybe they just forget to set the secure flag on a cookie in the first place. This happens. Maybe you're thinking my users only ever go to the HTTPS URL.
My cookies are never [INAUDIBLE]. Why should I set the secure flag on the cookie? And they might [? also have the ?] secure flag, or maybe they just forget about it.
Is this a problem? What if your users are super diligent? They always visit the HTTPS URL, and you don't have any problems like this. Do you still need the secure flag on your cookies? [INAUDIBLE]. Yeah.
AUDIENCE: Could the attacker connect to your URL and redirect you to a [INAUDIBLE]?
PROFESSOR: Yeah. So even if the user doesn't explicitly, manually go to some plain text URL, the attacker could give you a link, or maybe ask you to load an image from a non-HTTPS URL. And then the non-secure cookie is just going to be sent along with the network request. So that seems like a bit of a problem. So you really do need the secure flag, even if your users and your application are super careful.
AUDIENCE: But I'm assuming there's an HTTP URL [INAUDIBLE].
PROFESSOR: That's right, yeah. So again, so how could this [? break? ?] Suppose I have a site. It doesn't even listen on port 80. There's no way to connect to me on port 80, so why is it a problem if I have a non-secure cookie?
AUDIENCE: Because the browser wouldn't have cookies for another domain.
PROFESSOR: That's right. So the browser wouldn't send your cookie to a different domain, but yet it still seems worrisome that an attacker might load a URL. So suppose that amazon.com only ever served stuff over SSL. It's not even listening on port 80. There's no way to connect to it.
So in this case, and as a result, they don't set their secure flag on a cookie. So how could a hacker then steal their cookie if Amazon isn't even listening at port 80? Yeah.
AUDIENCE: Can't the browser still think it's an HTTP connection?
PROFESSOR: Well, so if you connect to port 443 and you speak SSL or TLS, then it's always going to be encrypted. So that's not a problem. Yeah.
AUDIENCE: The attacker can [INAUDIBLE] their network.
PROFESSOR: Yeah. So the attacker can actually intercept your packets that are trying to connect to Amazon on port 80 and make it appear like you've connected successfully. So if the attacker has control over your network, they could redirect your packets trying to get to Amazon to their own machine on port 80. They're going to accept the connection, and the client isn't going to be able to know the difference. It will be as if Amazon is listening on port 80, and then your cookies will be sent to this adversary's web server.
AUDIENCE: Because the client is unknown.
PROFESSOR: That's right. Yeah, so for HTTP, there's no way to authenticate the host you're connected to. This is exactly what's going on. HTTP has no authentication, and as a result, you have to prevent the cookies from being sent over HTTP in the first place because you have no idea who that HTTP connection is going to go to if you're assuming a network adversary.
AUDIENCE: So you need network control to do this.
PROFESSOR: Well, yeah. So either you have full control over your network so you know that adversaries aren't going to be able to intercept your packets. But even then, it's actually not so great. Like look at the TCP lecture. You can do all kinds of sequence number attacks and so on. [? That's going to be ?] troublesome.
All right. Any more questions about that? Yeah.
AUDIENCE: I'm sorry, but isn't the attack intercepted in that case? Is there like a redirect?
PROFESSOR: Well, what the attacker presumably would intercept is an HTTP request from the client going to http amazon.com, and that request includes all your amazon.com cookies, or cookies for whatever domain it is that you're sending your request to. So if you don't mark those cookies as secure, they will be sent over both encrypted and unencrypted connections.
AUDIENCE: So how does that request get initiated?
PROFESSOR: Ah, OK. Yeah. So maybe you get the user to visit newyorktimes.com and you pay for an advertisement that loads an image from http colon amazon.com. And there's nothing preventing you from saying, please load an image from this URL. But when a browser tries to connect there, it'll send the cookies if the connection succeeds. Question back there.
AUDIENCE: Will it ask for a change [INAUDIBLE]?
PROFESSOR: Yeah. So HTTPS Everywhere is an extension that is very similar to forced HTTPS in some ways, and it tries to prevent these kinds of mistakes. So I guess one thing that forced HTTPS does is they worry about such mistakes. And when you sort of opt a site into this forced HTTPS plan, one thing that the browser will do for you is prevent any HTTP connections to that host in the first place.
So there's no way to make this kind of mistakes of not flagging your cookie as secure, or having other sort of kinds of cookie problems as well. Another more subtle problem-- so this, the problem we talked about just now is the developer forgetting to set the secure flag on a cookie. So that seems fixable. OK, maybe the developer should just do it. OK, fix that problem.
The thing that's much more subtle is that when a secure web server gets a cookie back from the client, it actually has no idea whether this cookie was sent through an encrypted connection or a plain text connection because when the server gets a cookie from the client, all it gets is the key value pair for a cookie. And as we sort of look at here, the plan for the [INAUDIBLE] follows is that it'll include both secure and insecure cookies when it's sending a request to a secure server, because the browser here was just concerned about the confidentiality of cookies.
But on the server side, you now don't have any integrity guarantees. When you get a cookie from a user, it might have been sent over an encrypted connection, but it also might have been sent over a plain text connection. So this leads to somewhat more subtle attacks, but the flavor of these attacks tend to be things like session fixation. What it means is that suppose I want to see what emails you're sending.
Or maybe I'll set a cookie for you that is a copy of my Gmail cookie. So when you go to compose a message in Gmail, it'll actually be saved in my sent folder instead of your sent folder. It'll be as if you're using my account, and then I'll be able to extract things from there.
So if I can force a session cookie into your browser and sort of get you to use my account, maybe I can extract some information that way from the victim. So that's another problem that arises because of this grey area [INAUDIBLE] incomplete separation between HTTP and HTTPS cookies. Question.
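The missing integrity guarantee comes from what actually travels in the request: the Cookie header carries only name=value pairs. A small Python sketch follows, where the cookie names and values and both helper functions are hypothetical. The parsing uses SimpleCookie from the standard library.

```python
from http.cookies import SimpleCookie

def cookie_header(cookies: dict) -> str:
    """What the browser sends: just name=value pairs, with no flags attached."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())

def parse_cookie_header(header: str) -> dict:
    """What the server can recover from the request: names and values only."""
    parsed = SimpleCookie()
    parsed.load(header)
    return {name: morsel.value for name, morsel in parsed.items()}

# Case 1: the session cookie was set by the real site over HTTPS with the secure flag.
legitimate = cookie_header({"session": "victim123"})

# Case 2: a network attacker set a same-named cookie via a plain HTTP response.
planted = cookie_header({"session": "attacker456"})

# Both headers have the same shape. Nothing says how each cookie arrived.
print(legitimate)
print(planted)
print(parse_cookie_header(planted))
```

From the server's side the two requests differ only in the session value, so a planted cookie is accepted like any other, which is what makes the session fixation attack possible.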
AUDIENCE: So you would need a [INAUDIBLE] vulnerability to set that cookie [INAUDIBLE].
PROFESSOR: No. [INAUDIBLE] vulnerability to set this cookie. You would just trick the browser into connecting to a regular HTTP host URL. And without some extension like forced HTTPS or HTTPS Everywhere, you could then, as an adversary, set a cookie in the user's browser. It's a non-secure cookie, but it's going to be sent back, even on secure requests.
AUDIENCE: So do you have to trick the browser into thinking the domain is the same domain?
PROFESSOR: That's right. Yeah. So you have to intercept their network connection and probably do the same kind of attack you were talking about just a couple of minutes ago. Yeah. Make sense?
All right. So I guess there's probably [INAUDIBLE]. So what does forced HTTPS actually do for us now? It tries to prevent some subset of these problems. So I guess I should say, so forced HTTPS, the paper we read was sort of a research proposal that was published I guess five or six years ago now. Since then, it's actually been standardized and actually adopted.
So this was like a somewhat sketchy plug-in that stored stuff in some cookies, and they worried about getting evicted and so on. Now actually, most browsers look at this paper and say, OK, this is a great idea.
We'll actually implement it better within the browser itself. So there's something called HTTP Strict Transport Security that implements most of the ideas from forced HTTPS, and it actually makes a good story. Like, here's how research actually makes an impact on, I guess, security of web applications and browsers.
But anyway, let's look at what forced HTTPS does for a website. So forced HTTPS allows a website to set this bit for a particular host name. And the way that forced HTTPS changes the behavior of the browser is threefold.
So if some website sets forced HTTPS, then there's sort of three things that happen differently. So any certificate errors are always fatal. So the user doesn't have a chance of accepting an incorrect certificate that has a wrong host name, or an expiration time that's passed, et cetera.
So it's one thing that the browser now changes. Another is that it redirects all HTTP requests to HTTPS. So this is a pretty good idea. If you know a site is always using HTTPS legitimately, then you should probably prohibit any regular HTTP requests to that website, because that's probably a sign of some mistake or an attacker trying to trick you into connecting to a site without encryption. You want to make sure this actually happens before you issue the HTTP request. Otherwise, the HTTP request has already sort of sailed onto the network.
And the last thing that this forced HTTPS setting changes is that it actually prohibits this insecure embedding problem that we looked at, where you're including an HTTP URL in an HTTPS site. Make sense? So this is what the forced HTTPS sort of extension did. In terms of what's going on now, this HTTP Strict Transport Security (HSTS) protocol basically does the same things.
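The three behavior changes just listed can be written down as three small checks. This is only a sketch under stated assumptions: `FORCED_HOSTS`, the function names, and the example hosts are hypothetical, explicit port numbers are not handled, and a real browser makes these decisions deep inside its network stack rather than in helper functions like these.

```python
from urllib.parse import urlsplit

# Hypothetical set of hosts that have turned the forced HTTPS bit on.
FORCED_HOSTS = {"bank.example"}


def may_override_cert_error(host):
    """Rule 1: certificate errors are fatal, with no click-through, for forced hosts."""
    return host not in FORCED_HOSTS


def rewrite_url(url):
    """Rule 2: upgrade http:// to https:// before any request leaves the browser."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in FORCED_HOSTS:
        # Simplification: a URL with an explicit port (such as :80) is not fixed up.
        return parts._replace(scheme="https").geturl()
    return url


def may_embed(page_url, resource_url):
    """Rule 3: a page on a forced host must not embed plain-HTTP resources."""
    page = urlsplit(page_url)
    if page.scheme == "https" and page.hostname in FORCED_HOSTS:
        return urlsplit(resource_url).scheme == "https"
    return True


print(rewrite_url("http://bank.example/login?next=home"))
print(may_embed("https://bank.example/", "http://cdn.example/lib.js"))
```

The rewrite has to happen before the request is sent, which is why `rewrite_url` works on the URL itself rather than reacting to a redirect from the server.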
Most browsers now prohibit insecure embedding by default. So this used to be a little controversial because many developers had trouble with this. But I think Firefox and Chrome and IE all now by default will refuse to load insecure components, or at least insecure JavaScript and CSS, into a page unless you do something. Question.
AUDIENCE: Don't they prompt the user?
PROFESSOR: They used to, and the user would just say, yes. So IE, for example, used to pop up this dialog box, which this paper talks about, saying, would you like to load some extra content, or something like that.
AUDIENCE: [INAUDIBLE] because [INAUDIBLE].
PROFESSOR: Yeah. I think if you try to pretend to be clever, then you can bypass all these security mechanisms. But don't try to be clever this way. So this is mostly a non-problem in modern browsers, but these two things are still things that forced HTTPS and HTTP strict transport security provide and are useful. Yeah.
AUDIENCE: What happens when a website can't support HTTPS? [INAUDIBLE] change their [INAUDIBLE]?
PROFESSOR: So what do you mean can't support HTTPS?
AUDIENCE: [INAUDIBLE].
PROFESSOR: Well, OK. So if you have a website that doesn't support HTTPS but sets this cookie, what happens?
AUDIENCE: [INAUDIBLE].
PROFESSOR: Yeah. So this is the reason why it's an option. So if you opted everyone in, then you're exactly in this boat. Like, oh, all of a sudden, you can't talk to most of the web because they don't use HTTPS. So you really want this to be selectively enabled for sites that really want this kind of protection. Yeah.
AUDIENCE: But also, if I remember correctly, you can't set the cookie unless the site [INAUDIBLE].
PROFESSOR: That's right, yeah. So these guys are also worried about denial of service attacks, where this plug-in could be used to cause trouble for other sites. So if you, for example, set this forced HTTPS bit for some unsuspecting website, then all of a sudden, the website stops working because everyone is now trying to connect to them over HTTPS, and they don't support HTTPS. So this is one example of worrying about denial of service attacks.
Another thing is that they actually don't support setting forced HTTPS for an entire domain. So they worried that, for example, at mit.edu, I am a user at mit.edu. Maybe I'll set a forced HTTPS cookie for *.mit.edu in everyone's browsers. And now, only HTTPS things work at MIT. That seems also a little disastrous, so you probably want to avoid that.
On the other hand, actually, HTTP Strict Transport Security went back on this and said, well, we'll allow this notion of forcing HTTPS for an entire domain and its subdomains because it turns out to be useful because of these insecure cookies being sent along with a request that you can't tell where they were set from initially. Anyway, so there's all kinds of subtle interactions with features at the lowest level, but it's not clear what the right choice is.
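In the standardized protocol, a site opts in with a `Strict-Transport-Security` response header carrying a `max-age` directive and an optional `includeSubDomains` directive. The sketch below parses that header and decides whether a host is covered. `parse_hsts`, `HstsStore`, and the `.example` hosts are hypothetical names for illustration, the parsing is deliberately simplified, and it assumes the caller only records headers that arrived over a valid HTTPS connection.

```python
import time


def parse_hsts(header_value):
    """Parse a Strict-Transport-Security value into (max_age, include_subdomains)."""
    max_age = None
    include_subdomains = False
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()
        if name == "max-age":
            max_age = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            include_subdomains = True
    if max_age is None:
        raise ValueError("max-age directive is required")
    return max_age, include_subdomains


class HstsStore:
    """Toy store of hosts that asked for HTTPS only (not a real browser's store)."""

    def __init__(self):
        self._entries = {}  # host -> (expiry_time, include_subdomains)

    def note(self, host, header_value, now=None):
        # Assumption: only called for headers seen over valid HTTPS.
        now = time.time() if now is None else now
        max_age, include_subdomains = parse_hsts(header_value)
        if max_age == 0:
            self._entries.pop(host, None)  # max-age=0 removes the entry
        else:
            self._entries[host] = (now + max_age, include_subdomains)

    def applies_to(self, host, now=None):
        now = time.time() if now is None else now
        labels = host.split(".")
        # Check the host itself, then each parent domain in turn.
        for i in range(len(labels)):
            entry = self._entries.get(".".join(labels[i:]))
            if entry is None:
                continue
            expiry, include_subdomains = entry
            if expiry > now and (i == 0 or include_subdomains):
                return True
        return False


store = HstsStore()
store.note("mit.example", "max-age=31536000; includeSubDomains")
print(store.applies_to("www.mit.example"))
```

With `includeSubDomains`, one entry for the parent domain covers every host under it, which is the whole-domain behavior that the original plug-in declined to support.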
OK, so one actually interesting question you might ask is are these fundamental to the system we have, or are these mostly just helping developers avoid mistakes? So suppose you had a developer that's very diligent and doesn't do insecure [INAUDIBLE] embedding, doesn't have any other problems, always gets their certificates renewed, should they bother with forced HTTPS or not? Yeah.
AUDIENCE: Well, yeah. You still have the problem with someone forcing HTTP protocol. Nothing stops the hacker from doing [? excessive ?] [INAUDIBLE] forces the user to load something over HTTP and then to intercept the connection.
PROFESSOR: That's true, but if you feel they're very diligent and all their cookies are marked secure, then having someone visit an HTTP version of your site shouldn't be a problem.
AUDIENCE: [INAUDIBLE].
PROFESSOR: Yeah. So you'd probably have to defend against cookie overwrite or injection attacks, and that's sort of doable. It's a little tedious, but you can probably do something.
AUDIENCE: Yeah. I think her point is that also, it didn't-- security didn't check the certificate, right?
PROFESSOR: Yeah. So that's one. I think that the biggest thing is this first point, which is that everything else, you can sort of defend against by cleverly coding or being careful in your application. The first thing is something that the user has-- or the developer-- has no control over because the developer wants to make sure, for example, that their cookie will only be sent to their server as signed by this CA.
And if the user is allowed to randomly say, oh, that's good enough, then the developer has no clue where their cookie's going to end up because some user is going to leak it to some incorrect server. So this is, I think, the main benefit of this protocol. Question back there.
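The same trade-off shows up in client code, which gives a rough analogy for the click-through case. The sketch uses Python's standard `ssl` module; `make_lenient_context` is a hypothetical helper written only to show what "that's good enough" amounts to, and it is not something to use in practice.

```python
import ssl

# The default client context treats certificate problems as fatal: the chain must
# validate against a trusted CA and the host name must match the certificate.
strict_ctx = ssl.create_default_context()
print(strict_ctx.check_hostname, strict_ctx.verify_mode == ssl.CERT_REQUIRED)


def make_lenient_context():
    """Hypothetical 'accept anything' context, the analog of clicking through."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # has to be turned off before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE  # any certificate, from anyone, is now accepted
    return ctx


lenient_ctx = make_lenient_context()
print(lenient_ctx.verify_mode == ssl.CERT_NONE)
```

A connection made with the lenient context is still encrypted, but nothing ties the other end to the intended server, so anything sent over it (cookies included) may end up with whoever intercepted the connection.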
AUDIENCE: [INAUDIBLE] second point is also vital because the user might not [INAUDIBLE]. You might [INAUDIBLE] of the site, which would be right in the middle.
PROFESSOR: I see. OK. So I agree in the sense that this is very useful from the point of view of UI security because as far as the cookies are concerned, the developer can probably be clever enough to do something sensible. But the user might not be diligently looking at that lock icon and URL at all times.
So if you load up amazon.com and it asks you for a credit card number, you might just type it in. You just forgot to look for a lock icon, whereas if you set forced HTTPS for amazon.com, then there's just no chance that you'll have an HTTP URL for that site. There's still the problem that maybe the user doesn't read the URL correctly. Like it says Ammazon with two Ms dot com. That would probably still fool many users.
But anyway, that is another advantage for forced HTTPS. Make sense? Other questions about this scheme? All right.
So I guess one interesting thing is how do you get this forced HTTPS bit for a site in the first place? Could you have intercepted that as an attacker and prevented that bit from being set if you want to mount an attack? Yeah.
AUDIENCE: [INAUDIBLE] HTTPS. I mean, HTTPS, we're [? assuming ?] [INAUDIBLE] protocol [INAUDIBLE].
PROFESSOR: That's right. So on one hand, this could be good, because this forced HTTPS bit can only be set over an HTTPS connection to the host in question. On the other hand, the user might be fooled at that point.
Like, he doesn't have the forced HTTPS bit yet. So maybe the user will allow some incorrect certificate, or will not even know that this is HTTP and not HTTPS. So it seems potentially possible for an attacker to prevent that forced HTTPS bit from being set in the first place. If you've never been to a site and you try to visit that site, you might never learn whether it should be forced HTTPS or not in the first place. Yeah.
AUDIENCE: Will the [INAUDIBLE] roaming certificate there.
PROFESSOR: That's right, yeah. So I guess the way to think of it is if the bit did get set, then you know you talked to the right server at some point, and then you could continue using that bit correctly. On the other hand, if you don't have that bit set, or maybe if you've never talked to a server yet, there's no clear-cut protocol that will always tell you whether that forced HTTPS bit should be set or not.
Maybe amazon.com always wants to set that forced HTTPS bit. But the first time you pulled up your laptop, you were already on an attacker's network, and there's just no way for you to connect to amazon.com. Everything is intercepted, or something like this. So it's a very hard problem to solve. The bootstrapping of these security settings is pretty tricky.
I guess one thing you could try to do is maybe embed this bit in DNSSEC. So if you have DNSSEC already in use, then maybe you could sign whether you should use HTTPS or not, or forced HTTPS or not, as part of your DNS name. But again, it just boils the problem down to DNSSEC being secure. So there's always this sort of root of trust where you have to really assume that's correct. Question.
AUDIENCE: [INAUDIBLE].
PROFESSOR: Yeah. So I guess Google keeps trying to improve things by hard coding it. So one thing that Chrome offers is that actually, the browser ships with a list of sites that should have forced HTTPS enabled-- or now, well, this HSTS thing, which is [INAUDIBLE] enabled. So when you actually download Chrome, you get lots of actually useful stuff, like a somewhat up-to-date CRL and a list of forced HTTPS sites that are particularly important.
So this is like somewhat admitting defeat. Like the protocol doesn't work. We just have to distribute this a priori to everyone. And it sets up this unfortunate dichotomy between sites that are sort of important enough for Google to ship with the browser, and sites that don't do this.
Now of course, Google right now tells you that anyone can get their site included because the list is so small. But if this grows to millions of entries, I'm sure Google will stop including everyone's site in there. But yeah, you could totally add a domain. And you could email Chrome developers and get your thing included on the list of forced HTTPS URLs.
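The bootstrapping problem and the shipped-list workaround can be put side by side in a few lines. Everything here is hypothetical (the `PRELOADED` set, the `Browser` class, and the `.example` hosts); it only models the distinction between a bit the browser ships with and a bit it has to learn from the site over a good connection.

```python
# Hypothetical list shipped with the browser, available before any network traffic.
PRELOADED = frozenset({"bank.example"})


class Browser:
    def __init__(self):
        self.learned = set()  # bits learned from sites during earlier visits

    def must_use_https(self, host):
        return host in PRELOADED or host in self.learned

    def learn(self, host, over_valid_https):
        # The bit may only be recorded from a connection that validated correctly.
        if over_valid_https:
            self.learned.add(host)


fresh = Browser()
# First visit ever, on an attacker's network: only preloaded hosts are protected.
print(fresh.must_use_https("bank.example"))
print(fresh.must_use_https("shop.example"))

# After one visit over valid HTTPS, the bit sticks for later visits.
fresh.learn("shop.example", over_valid_https=True)
print(fresh.must_use_https("shop.example"))
```

A site that is not on the shipped list has no protection on the very first visit, which is exactly the gap an attacker on the network can exploit.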
Anyway, any other questions about forced HTTPS and SSL? All right. Good. So I'll see you guys on Wednesday at the [INAUDIBLE].