Lecture 16: Side-Channel Attacks

Description: In this lecture, Professor Zeldovich discusses side-channel attacks, specifically timing attacks.

Instructor: Nickolai Zeldovich

The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: All right, guys, let's get started. So today, we're going to talk about side-channel attacks, which is a general class of problems that comes up in all kinds of systems. Broadly, side-channel attacks are situations where you haven't thought about some information that your system might be revealing.

So typically, you have multiple components that you [INAUDIBLE] maybe a user talking to some server. And you're thinking, great, I know exactly all the bits going over some wire [INAUDIBLE] server, and those are secure. But it's often easy to miss some information revealed, either by the user or by the server. So the example that the paper for today talks about is a situation where the timing of the messages between the user and the server reveals some additional information that you wouldn't have otherwise learned by just observing the bits flowing between these two guys.

But in fact, there's a much broader class of side-channels you might worry about. Side-channels originally showed up in the '40s, when people discovered that when you start typing characters on a teletype, the electronics, or the electrical machinery in the teletype, would emit RF radiation. And you could hook up an oscilloscope nearby and just watch the characters being typed out by monitoring the RF frequencies coming out of this machine. So RF radiation is a classic example of a side-channel that you might worry about.

And there are lots of other examples that people have looked at, almost anything. So power usage is another side-channel you might worry about. Your computer is probably going to use different amounts of power depending on what exactly it's computing. And there are other clever examples; sound turns out to also leak stuff.

There's a [? cute ?] paper that you can look at where people listen to a printer, and based on the sound the printer is making, you can tell what characters it's printing. This is especially easy to do for dot matrix printers that make this very annoying sound when they're printing.

And in general, it's a good thing to think about. Kevin, in Monday's lecture, also mentioned some interesting side-channels that he's run into in his research. But, in particular, here we're going to look at the specific side-channel that David Brumley and Dan Boneh looked at in their paper-- I guess about 10 years ago now-- where they were able to extract a cryptographic key out of a web server running Apache by measuring the timing of different responses to different input packets from the adversarial client.

And in this particular case, they're going after a cryptographic key. In fact, many side-channel attacks target cryptographic keys, partly because it's a little bit tricky to get lots of data through a side-channel. And cryptographic keys are one situation where getting a small number of bits helps you a lot. So in their attack, they're able to extract maybe 200 to 256 bits or so.

And just from those 200-ish bits, they're able to break the cryptographic key of this web server. Whereas if you're trying to leak some database full of Social Security numbers, that's a lot of bits you have to leak to get out of the database. So that's why many of these side-channel attacks, as you'll see later on, often focus on getting small secrets out, maybe cryptographic keys or passwords. But in general, this is applicable to lots of other situations as well.

And one cool thing about this paper, before we jump into the details, is that they show that you can actually do this over the network. So as you probably figured out from reading this paper, they have to do a lot of careful work to tease out these minute differences in timing information. If you actually compute out the numbers from this paper, it turns out that each request that they sent to the server differs from potentially another [? request ?] by on the order of 1 to 2 microseconds, which is pretty tiny.

So you have to be quite careful, and over a real network it might be hard to tell whether some server took 1 or 2 microseconds longer to process your request or not. And as a result, it was not so clear whether you could mount this kind of attack over a very noisy network. And these guys were among the first to show that you can actually do this over a real ethernet network, with a server sitting in one place and a client sitting somewhere else. And you could actually measure these differences, partly by averaging, partly through other tricks. All right, does that make sense, the overall side-channel stuff?

All right. So the plan for the rest of this lecture is we'll first dive into the details of the RSA cryptosystem that these guys use. We won't look at exactly why it's secure, but we'll look at how you implement it, because that turns out to be critical for exploiting this particular side-channel. They carefully leverage various details of the implementation to figure out when some things are faster or slower. And then, once we understand how RSA is implemented, we'll pop back out and figure out how you attack it, how you attack all these different optimizations that RSA implementations have. Sound good? All right.

So I guess let's start off by looking at the high level plan for RSA. So RSA is a pretty widely used public key cryptosystem. We mentioned it a couple of weeks ago in the context of certificates. But now we're going to look at how it actually works. So typically there are 3 things you have to worry about: generating a key, encrypting, and decrypting. So for RSA, the way you generate a key is you pick 2 large prime integers. So you're going to pick 2 primes, p and q.

And in the paper, these guys focus on p and q which are about 512 bits each. So this is typically called 1,024 bit RSA, because the resulting product of these primes that you're going to use in a second is a 1,024 bit integer. These days, that's probably not a particularly good choice for the size of your RSA key, because it makes it relatively easy for attackers to factor this-- not trivial, but certainly viable. So while 10 years ago this seemed like a potentially sensible parameter, now if you're actually building a system, you should probably pick a 2,000 or 3,000 or even 4,000 bit RSA key. That's what the RSA key size means: the size of the product of these primes.

And then, for convenience, we're going to talk about the number n, which is just the product of these 2 primes, p times q. All right. So now we know how to generate a key, now we need to figure out-- well this is at least part of a key-- now we're going to have to figure out how we're going to encrypt and decrypt messages. And the way we're going to encrypt and decrypt messages is by exponentiating numbers modulo this number n.

So it seems a little weird, but let's go with it for a second. So if you want to encrypt a message, then we're going to take a message m and transform it into m to the power e mod n. So e is going to be some exponent-- we'll talk about how to choose it in a second. But this is how we're going to encrypt a message.

We'll just take this message as an integer number and just exponentiate it. And then we'll see why this works in a second, but let's call this guy c, the ciphertext. Then to decrypt it, we're going to somehow find an interesting other exponent d, where if you take the ciphertext c and exponentiate it to this power d mod n, then you'll magically get back the same message m. So this is the general plan: to encrypt, you exponentiate. To decrypt, you exponentiate by another exponent.

And in general, it seems a little hard to figure out how we're going to come up with these two magic numbers that somehow end up giving us back the same message. But it turns out that if you look at how exponentiation or multiplication works modulo this number n, then there's this cool property: if you have any number x and you raise it to what's called the Euler phi function of n-- maybe I'll use more board space for this. This seems important.

So if you take x and you raise it to phi of n, then this is going to be equal to 1 mod n. And this phi function, for our particular choice of n, is pretty straightforward: it's actually p minus 1 times q minus 1. So this gives us hope that maybe if we pick e and d so that e times d is phi of n plus 1, then we're in good shape. Because then for any message m, if we exponentiate it to e and then d, we get back 1 times m, because our ed product is going to be phi of n plus 1, or maybe some constant alpha times phi of n, plus 1. Does this make sense? This is why the message is going to get decrypted correctly. And it turns out that there's a reasonably straightforward algorithm, if you know this phi value, for how to compute d given an e, or e given a d. All right. Question.

AUDIENCE: Isn't 1 mod n just 1?

PROFESSOR: Yeah, so far we add one more. Sorry?

AUDIENCE: Like, up over there.

PROFESSOR: Yeah, this one?

AUDIENCE: Yeah.

PROFESSOR: Isn't 1 mod n just 1? Sorry, I mean this. So when I say this mod n, it means that both sides, taken mod n, are equal. So what this means is, if you want to think of mod as literally an operator, you would write this guy mod n equals 1 mod n. So that's what mod n on the side means. Like, the whole equality is mod n. Sorry for the [INAUDIBLE]. Make sense? All right.

So what this basically means for RSA is that we're going to pick some value e. So e is going to be our encryption value. And then from e, we're going to generate d to be basically 1 over e, mod phi of n. And there's the extended Euclidean algorithm you can use to do this computation efficiently. But in order to do this, you actually have to know this phi of n, which requires knowing the factorization of our number n into p and q.

All right. So finally, RSA ends up being a system where the public key is this number n and this encryption exponent e. So n and e are public, and d should be private. So then anyone can exponentiate a message to encrypt it for you. But only you know this value d, and therefore only you can decrypt messages. And as long as you don't know the factorization of n into p and q, then you don't know what this phi of n is. And as a result, it's actually difficult to compute this d value. So this is roughly what RSA is. High level. Does this make sense? All right.
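To make the algebra above concrete, here is a minimal sketch of this textbook RSA math in Python. The specific numbers are toy values chosen purely for illustration (real keys use 512-bit or larger primes), and `pow(e, -1, phi)` needs Python 3.8 or newer.

```python
# Textbook RSA with toy parameters (hypothetical, far too small to be secure).
p, q = 61, 53                # the 2 primes; ~512 bits each in the paper
n = p * q                    # public modulus n = p*q = 3233
phi = (p - 1) * (q - 1)      # phi(n) = (p-1)(q-1) = 3120

e = 17                       # public encryption exponent
d = pow(e, -1, phi)          # private exponent: d = 1/e mod phi(n)

m = 65                       # a message, encoded as an integer < n
c = pow(m, e, n)             # encrypt: c = m^e mod n
assert pow(c, d, n) == m     # decrypt: c^d mod n gives back m
```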

So there are 2 things I want to talk about now that we at least have the basic [? implementation ?] for RSA. There are tricks and pitfalls in how to use RSA correctly. And then there are all kinds of implementation tricks on how you actually write code to do these exponentiations and do them efficiently. That's actually non-trivial, because these are all large numbers, 1,000 bit integers that you can't just do a multiply instruction on. It's probably going to take a fair amount of time to do these operations. All right.

So the first thing I want to mention is the various RSA pitfalls. One of them we're actually going to rely on in a little bit. One property is that it's multiplicative. What I mean by this is: suppose we have 2 messages, m0 and m1. And suppose I encrypt these guys: if I encrypt m0, I'm going to get m0 to the power e mod n, and if I encrypt m1, then I get m1 to the e mod n. The problem is-- not necessarily a problem, but it could be a surprise to someone using RSA-- it's very easy to generate an encryption of m0 times m1, because you just multiply these 2 numbers. If you multiply these guys out, you're going to get m0 m1 to the e mod n.

This is a correct encryption, under this simplistic use of RSA, of the value m0 times m1. Now at this point, it's not a huge problem, because you aren't able to decrypt it; you're just able to construct this encrypted message. But it might be that the overall system allows you to decrypt certain messages. And if it allows you to decrypt this message that you constructed yourself, maybe you can go back and figure out what these original messages were. So it's maybe not a great plan to be ignorant of this fact. This has certainly come back to bite a number of protocols that use RSA. And it's one property we'll actually use as a defensive mechanism towards the end of the lecture.
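Here is a short demonstration of this multiplicative property, using the toy n and e from the sketch above:

```python
# Multiplicativity of textbook RSA: the product of two ciphertexts is a
# valid ciphertext for the product of the two messages.
n, e = 3233, 17                          # toy public key from above
m0, m1 = 7, 11
c0, c1 = pow(m0, e, n), pow(m1, e, n)    # encryptions of m0 and m1
# Multiplying the ciphertexts yields the encryption of m0*m1 mod n:
assert (c0 * c1) % n == pow((m0 * m1) % n, e, n)
```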

Another property of RSA that you probably want to watch out for is the fact that it's deterministic. So in this [? naive ?] implementation that I just described here, if you take a message m and you encrypt it, you're going to get m to the e mod n, which is a deterministic function of the message. So if you encrypt it again, you'll get exactly the same encryption.

This is not surprising, but it might not be a desirable property, because if I see you send some message encrypted with RSA and I want to know what it is, it might be hard for me to decrypt it. But I can try different things, and I can see: well, are you sending this message? I'll encrypt it and see if I get the same ciphertext. And if so, then I'll know that's what you encrypted. Because all I need to encrypt a message is the publicly known public key, which is n and the number e. So that's not so great, and you might want to watch out for this property if you're actually using RSA. So all of these [? primitives are ?] probably a little bit hard to use directly.

What people do in practice, in order to avoid these problems with RSA, is encode the message in a certain way before encrypting it. Instead of directly exponentiating a message, they actually take some function of the message, and then they encrypt that, mod n. And this function f, the right one to use these days, is probably something called optimal asymmetric encryption padding, O A E P. You can look it up. It's an encoding that has two interesting properties.

First of all, it injects randomness. You can think of f of m as generating a 1,000 bit message that you're going to encrypt. Part of this message is going to be your message m, in the middle here, so that you can get it back when you decrypt, of course. [INAUDIBLE]. So there are 2 interesting things you want to do. You want to put in some randomness here, some value r, so that when you encrypt the message multiple times, you'll get different results out each time, so it's not deterministic anymore.

And in order to defeat this multiplicative property and other kinds of problems, you're going to put in some fixed padding here. You can think of this as an alternating sequence of 1 0 1 0 1 0. You can do better things, but roughly, it's some predictable sequence that you put in here, and whenever you decrypt, you make sure the sequence is still there. Any multiplication is going to destroy this bit pattern. And then it should be clear that someone tampered with my message, and you reject it. And if it's still there, then presumably, sometimes provably, no one tampered with your message, and as a result you should be able to accept it, and treat message m as correctly encrypted by someone. Make sense? Yeah?

AUDIENCE: If the attacker knows how big the pad is, can't they put a 1 in the lowest place and then [INAUDIBLE] under multiplication?

PROFESSOR: Yeah, maybe. It's a little bit tricky, because this randomness is going to bleed over. So the particular construction of this O A E P is a little bit more sophisticated than this. But if you imagine this is integer multiplication, not bit-wise multiplication, then this randomness is going to bleed over somewhere, and you can construct the O A E P scheme such that this doesn't happen. [INAUDIBLE] Make sense? All right.

So it turns out that basically you shouldn't really use this RSA math directly; in practice, you should use some library that implements all these things correctly for you, and use it just as an encrypt/decrypt primitive. But it turns out these details will come in and matter for us, because we're actually trying to figure out how to break, or how to attack, an existing RSA implementation.

So in particular, the attack from this paper is going to exploit the fact that the server is going to check for this padding when it gets a message. So this is how we're going to time how long it takes a server to decrypt. We're going to send some random message, or some carefully constructed message. But the message wasn't constructed by taking a real m and encrypting it.

We're going to construct a careful ciphertext integer value. And the server is going to decrypt it, it's going to decrypt to some nonsense, and the padding is going to not match, with very high probability. And immediately, the server is going to reject it. And the reason this is going to be good for us is because it tells us exactly how long it took the server to get to this point: to do the RSA decryption, get this message, check the padding, and reject it. So that's what we're going to be measuring in this attack from the paper. Does that make sense? So there's some integrity component to the message that allows us to time the decryption leading up to it. All right.
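As a sketch of the server behavior being timed, here is a toy decrypt-then-check step in Python. The padding layout and the alternating-bit pattern are simplified stand-ins for real O A E P, and the function name is hypothetical:

```python
PAD = 0b101010101010                 # toy fixed padding pattern

def decrypt_and_check(c, d, n):
    """Toy server step: RSA-decrypt, then reject unless the fixed padding
    survives. The attacker times the path up to the reject."""
    m = pow(c, d, n)                 # the expensive exponentiation being timed
    if (m & 0xFFF) != PAD:           # toy layout: low 12 bits hold the pad
        return "error"               # a crafted c lands here w.h.p.
    return m                         # padding intact: accept the message
```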

So now let's talk about how you actually implement RSA. So the core of it is really this exponentiation, which is not exactly trivial to do, as I was mentioning earlier, because all these numbers are very large integers. So the message itself is going to be, at least in this paper, a 1,000 bit integer. And the exponent itself is also going to be pretty large.

The encryption exponent is at least well known, but the decryption exponent had better also be a large integer, also on the order of 1,000 bits. So you have a 1,000 bit integer that you want to exponentiate to another 1,000 bit integer power, modulo some other 1,000 bit integer n. That's going to be a little messy if you just do [? the naive thing. ?] So almost everyone has lots of optimizations in their RSA implementations to make this go a little bit faster.

And there are four optimizations that matter for the purposes of this attack. There are actually more tricks you can play, but the most important ones are these. So first, there's something called the Chinese remainder theorem, or C R T. And just to remind you, from grade school or high school maybe, here's what this remainder theorem says.

It says that if you have some value x, and you know that x is equal to a1 mod p, and you know that x is equal to a2 mod q, where p and q are prime numbers, and this modular equality applies to the whole equation, then it turns out that there's a unique solution to this mod pq. So x is equal to some x prime mod pq. And in fact, there's a unique such x prime, and it's actually very efficient to compute. So the Chinese remainder theorem also comes with an algorithm for how to compute this unique x prime that's equal to x mod pq, given the values a1 and a2, mod p and q respectively. Make sense?

OK, so how can you use this Chinese remainder theorem to speed up modular exponentiation? The way this is going to help us is that, if you notice, all the time we're doing this computation of some bunch of stuff modulo n, which is p times q. And the Chinese remainder theorem says that if you want the value of something mod p times q, it suffices to compute the value of that thing mod p and the value of that thing mod q, and then use the Chinese remainder theorem to figure out the unique solution to what this thing is mod p times q. All right, why is this faster? It seems like you're basically doing the same thing twice, and then there's more work to recombine it. Is this going to save me anything? Yeah?

AUDIENCE: [INAUDIBLE]

PROFESSOR: Well, they're certainly smaller, but they're not that much smaller. So n is 1,000 bits; p and q are both 500 bits. They're not down to machine word size yet. But it is going to help us, because most of the stuff we're doing in this computation is all these multiplications. And roughly, multiplication is quadratic in the size of the things you're multiplying, because in the grade school method of multiplication, you take all the digits and multiply them by all the other digits in the number.

And as a result, doing this exponentiation by multiplication is roughly quadratic in the input size. So if we work mod p instead of mod n, we basically go from 1,024 bits to 512 bits; we reduce the size of our input by 2. So this means all this multiplication and exponentiation is going to be roughly 4 times cheaper. So even though we do it twice, each time is 4 times faster. So overall, the CRT optimization is going to give us basically a 2x performance boost on the decryption side, where you know p and q. Does that make sense? All right. So that's the first optimization that most people use.
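Here is a sketch of CRT-based RSA decryption, assuming you hold p, q, and d. The recombination step uses Garner's formula; real implementations also precompute d mod p-1 and d mod q-1 (reducing the exponent is valid by Fermat's little theorem), which is what makes each half-size exponentiation cheap:

```python
def crt_decrypt(c, p, q, d):
    """Compute c^d mod (p*q) by working mod p and mod q separately on
    half-size numbers, then recombining with the CRT."""
    m_p = pow(c % p, d % (p - 1), p)   # c^d mod p, with a reduced exponent
    m_q = pow(c % q, d % (q - 1), q)   # c^d mod q, likewise
    q_inv = pow(q, -1, p)              # q^-1 mod p, a precomputable constant
    h = (q_inv * (m_p - m_q)) % p      # Garner's recombination coefficient
    return m_q + h * q                 # the unique answer mod p*q

# With the toy key from earlier: 65^17 mod 3233 = 2790, and CRT decrypts it.
assert crt_decrypt(2790, 61, 53, 2753) == 65
```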

The second thing that most implementations do is a technique called sliding windows. And we'll look at this in 2 steps. This optimization is concerned with what basic operations we're going to perform to do this exponentiation. Suppose you have some ciphertext c that's now 500 bits, because we're now doing everything mod p or mod q. We have a 500 bit c and, similarly, roughly a 500 bit d as well.

So how do we raise c to the power d? I guess the stupid way to do it is to take c and keep multiplying, d times. But d is very big-- it's like 2 to the 500-- so that's never going to finish. So a more amenable, or more performant, plan is to do what's called repeated squaring. So that's the step before sliding windows.

So this technique called repeated squaring looks like this. If you want to compute c to the power 2x, then you can actually compute c to the x and then square it. In our naive plan, computing c to the 2x would have involved twice as many iterations of multiplying, because we're multiplying by c twice as many times. But in fact, you can be clever and just compute c to the x and then square it. So this works well if you're computing c to some even exponent. And conversely, if you're computing c to some 2x plus 1, then you can think of this as c to the x, squared, times another c. So this is what's called repeated squaring.

And this now allows us to compute these exponentiations, or modular exponentiations, in a time that's basically linear in the size of the exponent. So for every bit in the exponent, we're going to either square something or square something then do an extra multiplication. So that's the plan for repeated squaring. So now we can at least have non-embarrassing run times for computing modular exponents. Does this make sense, why this is working and why it's faster?
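Here is a direct Python transcription of those two rules, as a sketch; it recurses once per bit of the exponent, so one squaring per bit, plus an extra multiply for each 1 bit:

```python
def mod_exp(c, d, n):
    """Repeated squaring: c^(2x) = (c^x)^2 and c^(2x+1) = (c^x)^2 * c,
    everything mod n."""
    if d == 0:
        return 1
    half = mod_exp(c, d >> 1, n)       # c^(d//2) mod n
    sq = (half * half) % n             # one squaring per exponent bit
    return sq if d % 2 == 0 else (sq * c) % n  # extra multiply on a 1 bit

assert mod_exp(5, 117, 323) == pow(5, 117, 323)
```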

All right, so what's this sliding windows trick that the paper talks about? So this is a little bit more sophisticated than this repeated squaring business. Basically, the squaring is going to be pretty much inevitable, but what the sliding windows optimization is trying to do is reduce the overhead of multiplying by this extra c down here.

So suppose you have some number that has several 1 bits in the exponent. For every 1 bit in the exponent, in the binary representation, you're going to have to do this step instead of that step, because for every odd number you're going to have to multiply by c. So these guys would like to not multiply by this c as often.

So the plan is to precompute different powers of c. What we're going to do is generate a table that says, well, here's the value of c to the x-- sorry, c to the 1-- here's the value of c to the 3, c to the 7. And I think in OpenSSL it goes up to c to the 31st. So this table is going to be precomputed when you want to do some modular exponentiation. You're going to precompute all the slots in this table. And then when you want to do this exponentiation, instead of doing the repeated squaring and multiplying by this c every time,

You're going to use a different formula. It says that if you have c to the 32x plus some y, well, you can do c to the x and do repeated squaring-- very much like before; to get the 32, there are like 5 powers of 2 here-- times c to the y. And c to the y you can get out of this table. So you can see that we're doing the same number of squarings as before here, but we don't have to multiply by c as many times. You're going to fish the value out of this table and do several multiplies by c for the cost of a single multiply. This make sense? Yeah?

AUDIENCE: How do you determine x and y in the first place?

PROFESSOR: How do you determine y?

AUDIENCE: X and y.

PROFESSOR: Oh, OK. So let's look at that. So for repeated squaring-- well, actually in both cases-- what you want to do is look at the exponent that you're trying to use, in binary representation. So suppose I'm trying to compute the value of c to the exponent, I don't know, 1 0 1 1 0 1 0, and maybe there are more bits. OK, so if we wanted to do repeated squaring, then you look at the lowest bit here-- it's 0. So what you're going to write down is that this is equal to c to the 1 0 1 1 0 1, squared.

OK, so now if only you knew this value, then you could just square it. OK, now we're going to compute this guy. So c to the 1 0 1 1 0 1 is equal to-- well, here we can't use this rule because it's not 2x, it's 2x plus 1. So now we're going to write this as c to the 1 0 1 1 0, squared, times another c. Because it's this prefix times 2, plus this 1 at the end. That's how you fish it out for repeated squaring.

And for sliding windows, you just grab more bits from the low end. So if you wanted to do the sliding window trick here, instead of taking one c out-- suppose, instead of this giant table, maybe we do 3 bits at a time, so we go up to c to the 7th. So here you would grab the first 3 bits, and that's what you would compute here: c to the 1 0 1, to the 8th power. And then the rest is c to the 1 0 1 power here. It's a little unfortunate these are the same thing, but really there are more bits here. But here, this is the thing that you're going to look up in the table-- this is c to the 5th, in decimal. And this part, you're going to keep doing the sliding window on to compute this value. Make sense?

This just saves on how many times you have to multiply by c, by pre-multiplying it a bunch of times. And the OpenSSL guys, at least 10 years ago, thought that going up to the 31st power was the best plan in terms of efficiency, because there's some trade-off here, right? You spend time precomputing this table, but if this table is too giant, you're not going to use some entries. Because if you run this table out to, I don't know, c to the 128th, but you're computing just like 500 [? full bit ?] exponents, maybe you're not going to use all these entries. So it's going to be a waste of time. Question.

AUDIENCE: [INAUDIBLE] Is there a reason not to compute the table [INAUDIBLE]? [INAUDIBLE].

PROFESSOR: It ends up being the case that you don't want to-- well, there are two things going on. One is that you'll now have code to check whether the entry is filled in or not, and that'll probably reduce your branch predictor accuracy on the CPU. So it will run slower in the common case [INAUDIBLE] with the entries there.

Another slightly annoying thing is that it turns out this table leaks stuff through a different side-channel, namely cache access patterns. So if you have some other process on the same CPU, you can sort of see which cache addresses are getting evicted out of the cache, or are slower, because someone accessed this entry or that entry. And the bigger this table gets, the easier it is to tell what the exponent bits were.

In the limit, this table is gigantic, and just being able to tell which cache address on this CPU had a [? miss ?] tells you that the encryption process must have accessed that entry in the table. And that tells you that, oh, that long bit sequence appears somewhere in your secret key exponent. So I guess the answer is: mathematically, you could totally fill this in on demand.

In practice, you probably don't want it to be that giant. And also, if it's particularly giant, you aren't going to be able to build the entries as efficiently, because you can reuse these entries as you're computing them. [INAUDIBLE] It's not actually that expensive, because you use c cubed when you're computing c to the 7th, and so on and so forth. So it's not that bad. Make sense? Other questions? All right.

So this is the repeated squaring and sliding window optimization that OpenSSL implements. [INAUDIBLE] I don't actually know whether they still have the same size of the sliding window or not. But it does actually give you a fair bit of speedup. So before, you had to square for every bit in the exponent.

And then you'd have to have a multiply for every 1 bit. So if you have a 512 bit exponent, then you're going to do 512 squarings and, on average, roughly 256 multiplications by c. So with sliding windows, you're going to still do the 512 squarings, because there's no getting around that. But instead of doing 256 multiplies by c, you're hopefully going to do way fewer, maybe something on the order of 32 [INAUDIBLE] multiplies by some entry in this table. So that's the general plan. [INAUDIBLE] Not as dramatic as CRT, not 2x, but it could save you almost 1.5x, all depending on exactly what [INAUDIBLE]. Make sense? Another question about this? All right.
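Here is a sketch of sliding-window exponentiation in Python. The window width w=5 matches the c^1 through c^31 table described above; this is a simplified illustration of the idea, not OpenSSL's actual code:

```python
def mod_exp_window(c, d, n, w=5):
    """Left-to-right sliding-window exponentiation: one squaring per
    exponent bit, but table lookups replace most multiplies by c."""
    c2 = (c * c) % n
    table = [c % n]                      # table[k] = c^(2k+1) mod n
    for _ in range((1 << (w - 1)) - 1):  # precompute odd powers c^3..c^(2^w-1)
        table.append((table[-1] * c2) % n)

    result, bits, i = 1, bin(d)[2:], 0
    while i < len(bits):
        if bits[i] == '0':
            result = (result * result) % n       # a 0 bit: just square
            i += 1
        else:
            j = min(i + w, len(bits))            # grab up to w bits...
            while bits[j - 1] == '0':
                j -= 1                           # ...ending on a 1 bit (odd)
            window = int(bits[i:j], 2)
            for _ in range(j - i):
                result = (result * result) % n   # square once per bit
            result = (result * table[window >> 1]) % n  # one table multiply
            i = j
    return result

assert mod_exp_window(7, 123456789, 1000003) == pow(7, 123456789, 1000003)
```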

So these are the [? roughly ?] easier optimizations. And then there are two clever tricks, playing with numbers, for how to do just a multiplication more efficiently. So the first of these optimizations that we're going to look at-- I think I'll erase this board-- is called Montgomery representation. And we'll see in a second why it's particularly important for us.

So the problem that this Montgomery representation optimization is trying to solve for us is the fact that every time we do a multiply, we get a number that keeps growing and growing and growing. In particular, both in sliding windows and in repeated squaring, when you square, you multiply 2 numbers together, and when you multiply by c to the y, you multiply 2 numbers together.

And the problem is that if the inputs to the multiplication were, let's say, 512 bits each, then the result of the multiplication is going to be 1,000 bits. And then you take this 1,000 bit result and you multiply it again by something that's 512 bits, and now it's 1,500 bits, 2,000 bits, 2,500 bits, and it keeps growing and growing.

And you really don't want this, because multiplication is [? quadratic ?] in the size of the numbers we're multiplying. So we have to keep the size of our numbers as small as possible, which means basically 512 bits, because all this computation is mod p or mod q. Yeah?

AUDIENCE: What do you want [INAUDIBLE]?

PROFESSOR: That's right, yeah. So the cool thing is that we can keep this number down because what we do is, let's say, we want to compute c to the x just for this example. Squared. Squared again. Squared again. What you could do is you compute c to the x then you take mod p, let's say, right. Then you square it then you do mod p again. Then you square it again, and then you do mod p again. And so on.

So this is basically what you're proposing. So this is great. In fact, this keeps the size of our numbers to basically 512 bits, which is about as small as we can get. This is good in terms of keeping down the size of these numbers for multiplication. But it's actually kind of expensive to do this mod p operation. Because the way that you do mod p of something is you basically have to do division. And division is way worse than multiplication.

I'm not going to go through the algorithms for division, but it's really slow. You usually want to avoid division as much as possible. Because it's not even just a straightforward programming thing: you have to do some approximation algorithm, Newton's method of some sort, and just keep it [INAUDIBLE]. It's going to be slow.

And in the naive implementation, this actually turns out to be the slowest part of doing multiplication. The multiplication is cheap, but then doing mod p or mod q, to bring it back down in size, is going to be actually more expensive than the multiplying. So that's actually kind of a bummer.

So the way that we're going to get around this is by doing this multiplication in a clever other representation, and I'll show you the trick here. Let's see. Bear with me for a second, and then we'll see why it's so fast to use this Montgomery trick.

And the basic idea is to represent numbers differently. These are regular numbers that you might actually want to multiply, and we're going to have a different representation for these numbers, called the Montgomery representation. And that representation is actually very easy: we just take the value a and we multiply it by some magic value R.

I'll tell you what this R is in a second. But let's first figure out, if you pick some arbitrary value R, what's going to happen here? So we take 2 numbers, a and b. Their Montgomery representations are, sort of expectedly, a is aR and b is bR.

And if you want to compute the product of a times b, well, in Montgomery space, you can also multiply these guys out: you can take aR and multiply it by bR. And what you get here is ab times R squared. So there are two Rs now. That's kind of annoying, but you can divide that by R, and we get ab times R. So this probably seems weird, in the sense of why would you multiply in this extra number. But let's first figure out whether this is correct, and then we'll figure out why this is going to be faster.

So it's correct in the sense that it's very easy: if you want to multiply some numbers, we just multiply by this R value and get the Montgomery representation. Then we can do all these multiplications on these Montgomery forms. And every time we multiply 2 numbers, we have to divide by R to get back the Montgomery form of the multiplication result. And then when we're done doing all of our squarings, multiplications, all this stuff, we're going to move back to the normal, regular form by just dividing by R one last time.

AUDIENCE: [INAUDIBLE]

PROFESSOR: We're going to pick R to be a very nice number-- in particular, a number that makes this division by R very fast. And the cool thing is that if this division by R is very fast, then this is going to be a small number, and we're not going to have to do this mod q very often. In particular, aR, let's say, is also going to be roughly 500 bits, because it's all actually mod p or mod q. So aR is 500 bits.

bR is also going to be 500 bits. So this product is going to be 1,000 bits. This R is going to be this nice, roughly 500 bit number, same size as p. And if we can make this division fast, then the result is going to be a roughly 500 bit number here. So we were able to do the multiplication without having to do an extra divide: dividing by R cheaply gives us this small result, getting us out of doing a mod p in most situations.

OK, so what is this weird number that I keep talking about? Well, R is just going to be 2 to the 512. It's going to be 1 followed by a ton of zeros. So multiplying by this is easy: you just append a bunch of zeros to a number. Dividing could be easy if the low bits of the result are all zeros. So if you have a value that's a bunch of bits followed by 512 zeros, then dividing by 2 to the 512 is cheap: you just discard the zeros on the right-hand side. And that's actually the correct division. Does that make sense?

The slight problem is that we actually don't have zeros on the right-hand side when we do this multiplication. These are real 512 bit numbers with all the 512 bits used. So this will be a 1,000 bit number with all these bits set to 0 or 1 randomly, depending on what's going on. So we can't just discard the low bits.

But the cleverness comes from the fact that the only thing we care about is the value of this thing mod p. So you can always add multiples of p to this value without changing what it's equivalent to mod p. And as a result, we can add multiples of p to get the low bits to all be zeros.

So let's look at a simple example. I'm not going to write out 512 bits on the board. Suppose that our value R is 2 to the 4th, so it's 1 followed by four zeros. So this is a much smaller example than the real thing, but let's see how this Montgomery division is going to work out. So suppose we're going to try to compute stuff mod q, where q, let's say, is 7; this is 1 1 1 in binary form. And what we're going to try to do is-- maybe we did some multiplication, and this value aR times bR is equal to this binary representation, 1 1 0 1 0. So this is going to be the value of aR times bR.

How do we divide it by R? So clearly the low four bits aren't all 0, so we can't just divide it out. But we can add multiples of q. In particular, we can add 2 times q. So 2q is equal to 1 1 1 0. And now what we get is 0 0, carry a 1, 0, carry a 1, 1, carry a 1, 0 1. I hope I did that right. So this is what we get. So now we have aR bR plus 2q. But we actually don't care about the plus 2q-- it's actually fine, because all we care about is the value mod q.

And now we're closer: we have three 0 bits at the bottom. Now we can add another multiple of q. This time it's going to be 8q. So we add 1 1 1 0 0 0 here. And if we add it, we're going to get, let's say, 0 0 0, then add these two guys: 0, carry a 1, 0, carry a 1, 1 1. I think that's right. But now we have our original aR bR, plus 2q, plus 8q, equal to this thing. And finally, we can divide this thing by R very cheaply, because we just discard the low four zeros. Make sense? Question.

AUDIENCE: Is aR bR always going to end in, I guess, 1,024 zeros?

PROFESSOR: No, and the reason is that-- OK, here is the thing that's maybe confusing. a was, let's say, 512 bits. Then you multiply it by R. So here, you're right: this value is a 1,000 bit number where the high 512 bits are a and the low bits are all zeros. But then, you're going to take it mod q to bring it down, to make it smaller. And in general, this is going to be the case: it only has these low zeros the first time you convert it, but after you do a couple of multiplications, they're going to be arbitrary bits. So these guys are-- I really should have written mod q here-- you compute this mod q as soon as you do the conversion, to keep the whole value small.

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yeah, so the initial conversion is expensive or at least it's as expensive as doing a regular modulus during the multiplication. The cool thing is that you pay this cost just once when you do the conversion into Montgomery form. And then, instead of converting it back at every step, you just keep it in Montgomery form.

But remember that in order to do an exponentiation to an exponent which has 512 bits, you're going to have to do over 500 multiplications, because we have to do at least 500 squarings, plus then some. So you do this mod q conversion twice, and then you get a lot of cheap divisions if you stay in this form, and then you do a division by R to get back to the regular form at the end. So instead of doing 500 mod q's, one for every multiplication step, you do the expensive mod q twice, and then you keep doing these divisions by R cheaply, using this trick. Question.

AUDIENCE: So when you're adding the multiples of q and then dividing by R, [INAUDIBLE]

PROFESSOR: Because mod q means the remainder when you divide by q. So x plus y times q, mod q, is just x.

AUDIENCE: [INAUDIBLE]

PROFESSOR: So in this case, dividing by-- so another sort of nice property is that, because it's all modulo a prime number, it's also true that if you have x plus yq, divided by R, mod q, it's actually the same as x divided by R, mod q. The way to think of it is that there's no real division in modular arithmetic; it's just an inverse. So what this really says is: this is actually x plus yq, times some number called R inverse, and then you compute this whole thing mod q. And then you can think of this as x times R inverse mod q, plus y q R inverse mod q. And this second term cancels out, because it's something times q.

And there's a closed form for this thing. So here I did it bit by bit-- 2q, then 8q, et cetera. There's actually a nice closed formula you can compute-- it's in the lecture notes, but it's probably not worth spending time on the board here-- for how you figure out what multiple of q you should add to get all the low bits to turn to 0. So it turns out that in order to do this division by R, you just need to compute this magic multiple of q, add it, and then discard the low bits, and that brings your number back to 512 bits, or whatever the size is.

OK. And here's the subtlety. The only reason we're talking about this is that there's something funny going on here that is going to allow us to learn timing information. In particular, even though we divided by R, we know the result is going to be 512 bits-- but it still might be greater than q, because q isn't exactly 2 to the 512; it's a bit less than R. So it might be that after we do this cheap division by R, we have to subtract out q one more time, because we get something that's small, but not quite small enough.

So there's a chance that after doing this division, we maybe have to also subtract q again. And this subtraction is going to be part of what this attack is all about. It turns out that subtracting this q adds time. And someone figured out-- not these guys, but some previous work-- you can show that the probability of doing this thing, which is called an extra reduction-- this probability depends on the particular value that you're exponentiating. So if you're computing x to the d mod q, the probability of an extra reduction, at some point while computing x to the d mod q, is going to be equal to x mod q, divided by 2R.

So if we're going to be computing x to the d mod q, then depending on what the value of x mod q is, whether it's big or small, you're going to have more or fewer of these extra reductions. And just to show you where this is going to fit in: this is actually going to happen in the decrypt step, because during the decrypt step, the server is going to be computing c to the d. And this says the extra reductions are going to be proportional to how close x, or c in this case, is to the value q.

So this is going to be worrisome, right, because the attacker gets to choose the input c. And the number of extra reductions is going to be proportional to how close the c is to one of the factors, the q. And this is how you're going to tell: I'm getting close to the q, or I've overshot q. If all of a sudden there are no extra reductions, it's probably because x mod q is very small-- the x is q plus a little epsilon, so x mod q is very small. So that's one part of the timing attack we're going to be looking at in a second. I won't prove that this is actually true, [INAUDIBLE] that these extra reductions work like this. Yeah, question.

AUDIENCE: What happens if you don't do this extra reduction?

PROFESSOR: Oh, what happens if you don't do this extra reduction? You can avoid this extra reduction, and then you just have to do some extra modular reductions later, probably. I think the math just works out nicely this way for the Montgomery form. I think for many of these things, actually, once you look at them as a timing channel, [INAUDIBLE] [? think ?] you shouldn't do this at all, or maybe you should do some other plan. So you're right,

I think you could probably avoid this extra reduction and just do the mod q, perhaps at the end. I haven't actually tried implementing this, but it seems like it could work. It might be that you just have to do mod q once [? there ?], which you'll probably have to do anyway. So it's not super clear. Maybe it's [INAUDIBLE] probably not q.

So in light of the fact that [INAUDIBLE]. Actually, I shouldn't speak authoritatively to this. I haven't tried implementing this. So maybe there's some deep reason why this extra reduction has to happen. I couldn't think of one. All right, questions?
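Here is a sketch of one Montgomery multiplication step in Python, including the conditional subtract that constitutes the extra reduction. The closed-form constant mentioned above appears here as -q^-1 mod R, and the toy numbers at the bottom match the board example (R = 2^4, q = 7):

```python
def mont_mul(aR, bR, q, R, neg_q_inv):
    """Given aR = a*R mod q and bR = b*R mod q, return a*b*R mod q
    without any division by q. neg_q_inv = -q^-1 mod R is precomputed."""
    T = aR * bR                    # the double-size product
    m = (T * neg_q_inv) % R        # closed form: which multiple of q to add
    t = (T + m * q) // R           # low bits of T + m*q are all zero, so
                                   # dividing by R = 2^k is just a shift
    if t >= q:
        t -= q                     # the "extra reduction": this conditional,
                                   # input-dependent step is what leaks timing
    return t

# Toy run matching the board example: R = 2^4, q = 7.
q, R = 7, 16
neg_q_inv = (-pow(q, -1, R)) % R
aR, bR = (3 * R) % q, (5 * R) % q        # Montgomery forms of a=3, b=5
assert mont_mul(aR, bR, q, R, neg_q_inv) == (3 * 5 * R) % q
```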

So here's the last piece of the puzzle for how OpenSSL, this library that this paper attacks, implements multiplication. So this Montgomery trick is great for avoiding the mod q part during modular multiplication. But then there's the question of how you actually multiply two numbers together. So we're going lower and lower level.

So suppose you have [? the raw ?] multiplication. So this is not even modular multiplication. You have two numbers, a and b. And both these guys are 512 bit numbers. How do you multiply them together when your machine is only a 32 bit machine, like the guys in the paper, or a 64 bit, but still, same thing? How would you implement multiplication of these guys? Any suggestions?

Well, I guess it's a straightforward answer: you just represent a and b as a sequence of machine [? words, ?] and then you do this quadratic product of these two guys. [INAUDIBLE] To see a simple example, instead of thinking of a 512 bit number, let's think of these guys as 64 bit numbers, and we're on a 32 bit machine. Right. So we're going to have values. The value of a is going to be represented by two different words.

It's going to be, let's call it, a1 and a0. So a0 is the low word, a1 is the high word. And similarly, we're going to represent b as two things, b1 b0. So then the naive way to compute a times b is going to be to multiply all these guys out. So it's going to be a three part number: the high part is going to be a1 b1, the low part is going to be a0 b0, and the middle part is going to be a1 b0 plus a0 b1. So this is how you do the multiplication, right. Question?

AUDIENCE: So I was going to say are you using [INAUDIBLE] method?

PROFESSOR: Yeah, so there's a clever alternative method for doing multiplication which doesn't involve four multiplications. Here, you have to do four multiplications; there's this clever other method, Karatsuba. Do they teach this in 601 or something these days?

AUDIENCE: 042.

PROFESSOR: 042, excellent. Yeah, that's a very nice method. Almost every cryptographic library implements this. And for those of you that, I guess, weren't undergrads here, since we have grad students maybe they haven't seen Karatsuba. I'll just write it out on the board. It's a clever thing the first time you see it. And what you can do is basically compute out three values. You're going to compute out a1 b1. You're going to also compute a1 minus b0 times b1 minus-- sorry-- a1 minus a0, b1 minus b0. And a0 b0. And this does three multiplications instead of four. And it turns out you can actually reconstruct this value from these three multiplication results.

And the particular way to do it is-- let me write it out in a different form. So we're going to have 2 to the 64-- sorry-- 2 to the 64 plus 2 to the 32, times a1 b1, plus 2 to the 32 times minus that little guy in the middle, a1 minus a0 times b1 minus b0.

And finally, we're going to add 2 to the 32 plus 1, times a0 b0. It's a little messy, but if you work through the details, you'll hopefully end up convincing yourself that this value is exactly the same as that value. So it's clever, but nonetheless, it saves you one multiplication. And the way we apply this to doing much larger multiplications is that you recursively keep going down.

So if you have 512 bit values, you can break it down into 256 bit multiplications. You do three 256 bit multiplications, and then each of those you do using the same Karatsuba trick, recursively. And eventually you get down to machine size, which you can just do with a single machine instruction. [INAUDIBLE] Does this make sense?
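Here is a sketch of that recursion in Python. The identity used is a*b = hi*2^bits + (hi + lo - mid)*2^(bits/2) + lo, which is just a rearrangement of the three-product formula on the board; Python's native big integers make this a demonstration only:

```python
import random

def karatsuba(a, b, bits=512):
    """Multiply with 3 recursive half-size multiplies instead of 4,
    recursing down to machine-word size."""
    if bits <= 32:                               # small enough for hardware
        return a * b
    half = bits // 2
    a1, a0 = a >> half, a & ((1 << half) - 1)    # split into high/low halves
    b1, b0 = b >> half, b & ((1 << half) - 1)
    hi = karatsuba(a1, b1, half)                 # a1*b1
    lo = karatsuba(a0, b0, half)                 # a0*b0
    mid = karatsuba(a1 - a0, b1 - b0, half)      # (a1-a0)*(b1-b0)
    # a1*b0 + a0*b1 = hi + lo - mid, so:
    return (hi << bits) + ((hi + lo - mid) << half) + lo

x, y = random.getrandbits(512), random.getrandbits(512)
assert karatsuba(x, y) == x * y
```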

So what's the timing attack here? How do these guys exploit this Karatsuba multiplication? Well, it turns out that OpenSSL worries about basically two kinds of multiplications that you might need to do. One is a multiplication between two large numbers that are about the same size. So this happens a lot when we're doing this modular exponentiation because all the values we're going to be multiplying are all going to be roughly 512 bits in size.

So when we're multiplying by c to the y, or doing a squaring, we're multiplying two things that are about the same size. And then this Karatsuba trick makes a lot of sense, because instead of computing stuff in time quadratic in the input size, Karatsuba is roughly n to the 1.58, something like that. So it's much faster.

But then there's this other situation, where OpenSSL might be multiplying two numbers that are very different in size: one that's very big, and one that's very small. And in that case, you could use Karatsuba, but then it's going to be slower than doing the naive thing. Suppose you're trying to multiply a 512 bit number by a 64 bit number: you'd rather just do the straightforward thing, where you multiply by each of the words in the 64 bit number-- that's more like 2n work, instead of n to the 1.58 or something.

So as a result, the OpenSSL guys tried to be clever, and that's where often problems start. They decided that they'll actually switch dynamically between this efficient Karatsuba thing and this sort of grade school method of multiplication here. And their heuristic was basically: if the two things you're multiplying are exactly the same number of machine words-- so they have the same number of bits, up to 32 bit units-- then they'll go to Karatsuba. And if the two things they're multiplying have a different number of 32 bit units, then they'll do the quadratic, or straightforward, or regular, normal multiplication.

And there you can see that if your number all of a sudden switches to be a little bit smaller, then you're going to switch from the efficient thing to this other multiplication method. And presumably the cutoff point isn't going to be exactly smooth, so you'll be able to tell: all of a sudden, it's now taking a lot longer, or a lot shorter, to multiply than before. And that's the other thing these guys exploit in their timing attack. Does that make sense? What's going on with the [INAUDIBLE] All right.

So I think I'm now done telling you about all the weird implementation tricks that people play when implementing RSA in practice. So now let's try to put them back together into an entire web server and figure out how you [? tickle ?] all these interesting bits of the implementation from the input network packet.

So what happens in a web server is that the web server, if you remember from the HTTPS lecture, has a secret key. And it uses the secret key to prove that it's the correct owner of the certificate in the HTTPS protocol, or in TLS. And the way this works is that the client sends some randomly chosen bits, and the bits are encrypted using the server's public key. And the server, in this TLS protocol, decrypts this message. And if the message checks out, it uses those random bits to establish a [? session. ?] But in this case, the message isn't going to check out. The message is going to be carefully chosen, the padding bits aren't going to match, and the server is going to return an error as soon as it finishes decrypting our message. And that's what we're going to time here.

So the server-- you can think of this as Apache with OpenSSL-- is going to get a message from the client, and you can think of this as a ciphertext c, or a hypothetical ciphertext that the client might have produced. And the first thing we're going to do with the ciphertext c is decrypt it, using roughly this formula. And if you remember, the first optimization we're going to apply is the Chinese remainder theorem.

So the first thing we're going to do is basically split our pipeline in two parts. We're going to do one thing mod p another thing mod q and then recombine the results at the end of the day. So the first thing we're going to do is, we're actually going to take c and we're going to compute, let's call this c0, which is going to be equal to c mod q. And we're also going to have a different value, let's call it c1, which is going to be c mod p. And then we're going to do the same thing to each of these values to basically compute c to the d mod p and c to the d mod q.

And here, initially-- after CRT-- we're going to switch into Montgomery representation, because that's going to make our multiplies very fast. So the next thing OpenSSL is going to do to your number is compute [INAUDIBLE]-- let's call it c0 prime-- which is going to be c0 times R, mod q.

And the same thing down here; I'm not going to write out that pipeline, because it'll look the same. And then, now that we've switched into Montgomery form, we can finally do our multiplications. And here's where we're going to use the sliding window technique. So once we have c0 prime, we can actually compute c0 prime exponentiated to the d, mod q. And here, as we're computing this value to the d, we're going to be using sliding windows. So here, we're going to do sliding windows for the bits in this d exponent.

And also, we're going to do Karatsuba or regular multiplication, depending on exactly what the sizes of our operands are. So if it turns out that the things we're multiplying, c0 prime and maybe that previously squared result, are the same size, we're going to do Karatsuba. If c0 prime is tiny but some previous thing we're multiplying it with is big, then we're going to do quadratic multiplication, normal multiplication. So the sliding windows come in here, and here we also have this Karatsuba versus normal multiplication.

And also in this step, the extra reductions come in. Because at every multiply, the extra reductions are going to be proportional to the thing we're exponentiating, mod q. [INAUDIBLE] just plug in the formula over here: the probability of extra reductions is going to be proportional to this value of c0 prime mod q, divided by 2R. So this is where the really timing-sensitive bit is going to come in. And there are actually two effects here: there's this Karatsuba versus normal choice, and then there's the number of extra reductions you're going to be making.

So we'll see how we exploit this in a second. But now that you have this result mod q, and you're going to get a similar result mod p, you can finally recombine these guys from the top and the bottom and use CRT. And what you get out from CRT is actually-- sorry, I guess we need to first convert it back down into non-Montgomery form. So first, we're going to get c0 prime to the d, divided by R, mod q.

And this thing, because c0 prime was c0 times R mod q, if we do this, then we're going to get back out our value of c to the d mod q. And once we get c to the d mod q here, and c to the d mod p on the bottom version of this pipeline, we can use CRT to get the value of c to the d mod n. Sorry for the small type here, or font size. But roughly, it's the same thing we're expecting here. We can finally get our result, and we get our message, m.

So the server takes an incoming packet that it gets, runs it through this whole pipeline, does the two parts of this pipeline, and ends up with a decrypted message m that's equal to c to the d mod n. And then it's going to check the padding of this message. And in this particular attack, because we're going to carefully construct this value c, the padding is actually not going to match up. We're going to choose the value c according to some other heuristics, not by encrypting a real message with the correct padding.

So the padding is going to be a mismatch, and the server is going to return an error back to the client [? and close ?] the connection. And that's the time that we're going to measure, to figure out how long this whole pipeline took. Makes sense? Questions about this pipeline and putting all the optimizations together?

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yeah, you're probably right. Yes, c1 to the d, c0 to the d. Yeah, this is c0. Yeah, correct.

AUDIENCE: When you divide by R [INAUDIBLE], isn't there a [INAUDIBLE] on how many q's you have to add to get the low bits to be 0? [INAUDIBLE].

PROFESSOR: Yeah, so there might be extra reductions in this final phase as well. You're right. So potentially, we have to do this divide by R correctly. We probably have to do exactly the same thing as we saw for the Montgomery reductions here, when we do this divide by R to convert it back. So it's not clear exactly how many q's we should add. We should figure out how many q's to add, add that many, kill the low zeros, and then do mod q again-- maybe an extra reduction. You're absolutely right; this is exactly the same kind of divide-by-R mod q as we do for every Montgomery multiplication step. Make sense? Any other questions?

All right. So how do you exploit this? How does an attacker actually figure out what the secret key of the server is by measuring the time of this entire pipeline? So these guys have a plan that basically involves guessing one bit of the private key at a time. And what they actually mean by guessing the private key is this: you might think the private key is this decryption exponent d, because you know e and you know n-- that's the public key. The only thing you don't know is d. But in fact, in this attack, they don't go for the exponent d directly; that's a little bit harder to guess.

Instead, what they're going to go for is the value q, or the value p-- it doesn't really matter which one. Once you guess what the value p or q is, then, given n, you can factor n into p times q. And if you know the values of p and q, you can compute that phi function we saw before. That's going to allow you to get the value d from the value e. So this factorization of the value n is hugely important; it has to stay secret for RSA to remain secure. So these guys are actually going to go and try to guess what the value of q is by timing this pipeline. All right.
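
That last step is mechanical once you have a factor; a sketch, with e = 65537 as the usual public exponent:

    def private_exponent(p, q, e=65537):
        # phi(n) = (p-1)(q-1); the private exponent is e inverse mod phi(n).
        phi = (p - 1) * (q - 1)
        return pow(e, -1, phi)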

So how do these guys actually do it? Well, they construct carefully chosen inputs c into this pipeline, and-- I guess I keep saying this-- they keep measuring the time for this guy. But in particular, there are two parts to the attack. You have to bootstrap it a little bit to guess the first couple of bits. And then, once you have the first couple of bits, you can guess the next bit. So let me not say exactly how they guess the first couple of bits, because it's actually much more interesting to see how they guess the next bit. And then we'll come back, if we have time, to look at how they guess the first couple of bits in the paper.

But basically, suppose you have a guess g about what the bits are of this value q. So you know that q has some bits, g0, g1, g2, et cetera. And actually, I guess these are not even g's; these are real q bits, so let me write it as that. So you know q bit 0, q bit 1, q bit 2-- these are the highest bits of q. And then you're trying to guess lower and lower bits. So suppose you know the value of q up to bit j. And from that point on, your guess is actually all zeros. You have no idea what the other bits are.

So these guys are going to try to get this guess g into this place in the pipeline, because this is where there are two tiny effects: this choice of Karatsuba versus normal multiplication, and this different number of extra reductions depending on the value c0 prime. So they're going to actually try to get two different guess values into that place in the pipeline. One that looks like this, and one that they call g high, which has all the same high bits, up through qj. And for the next bit, which they don't know, the guess g is going to have a 0, and g high is going to have a 1 there, and all zeros later on.
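
In code, building the two guesses might look like this; the bit-numbering convention here is my own, with q taken to be 512 bits:

    def make_guesses(high_bits, j, q_bits=512):
        # high_bits holds the recovered top bits q0..qj of q as an integer.
        shift = q_bits - (j + 1)          # positions still unknown below bit j
        g = high_bits << shift            # next unknown bit = 0, zeros below
        g_high = g | (1 << (shift - 1))   # next unknown bit = 1, zeros below
        return g, g_high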

So how does this help these guys figure out what's going on? There are really two ways you can think of it. Suppose that we get this guess g to be the value of c0 prime. We can think of g and g high being the c0 prime value on that left board over there. It's actually fairly straightforward to do this, because c0 prime is pretty deterministically computed from the input ciphertext c0-- you just multiply it by R. So, in order for them to get some value in here as a guess, they just need to take their guess and first divide it by R-- so divide it by 2 to the 512, mod n. And then they're going to inject it, and the server's going to multiply it by R, and then off you go. Make sense? All right.
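
A minimal sketch of that pre-division, assuming R = 2^512 (R is invertible mod n since n is odd):

    R = 1 << 512

    def ciphertext_for_guess(g, n):
        # Pre-divide by R so that the server's multiply-by-R lands the guess
        # in the c0-prime slot: ((g * R^{-1} mod n) mod q) * R mod q = g mod q.
        return (g * pow(R, -1, n)) % n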

So suppose that we manage to get our particular chosen integer value into that c0 prime spot. So what's going to be the time to compute c0 prime to the d mod q? Well, there are two possible options for where q falls in this picture. It might be that q is between these two values-- this value is going to be less than q, but this guy's going to be greater than q-- and that happens if the next bit of q is 0. Or it might be that q lies above both of these values, if the next bit of q is 1. So now we can ask, OK, what's going to be the timing of decrypting these two values if q lies in between them, or if q lies above both of them?

Let's look at the situation where q lies above both of them. Well, in that case, everything is pretty much the same, right? Because both of these values are smaller than q, the value of these things mod q is going to be roughly the same. They're going to be a little bit different because of this extra bit, but more or less they're the same magnitude.

And the number of extra reductions is also probably not going to be hugely different, because it's proportional to the value of this guy mod q. And both of these guys are a little bit smaller than q, so they're about the same. Neither of them is going to exceed q and all of a sudden have many fewer extra reductions.

So if q is greater than both of these guesses, then Karatsuba versus normal is going to stay the same. The server is going to do basically the same thing for both g and g high in terms of Karatsuba versus normal. And the server's going to do about the same number of extra reductions for both of these guys as well. So if you see that the server's taking the same amount of time to respond to these guesses, then you should probably guess that, oh, q probably has the bit 1 here.

On the other hand, if q lies in the middle, then there are two possible things that could trigger a change in the timing. One possibility is that, because g high is just a little bit larger than q, the number of extra reductions is going to be proportional to this guy mod q, which is very small-- because this c0 prime is q plus just a little bit in those low bits. So the number of extra reductions is going to drop, and all of a sudden it will be faster.

Another possible thing that can happen is that maybe the server will decide, oh, now it's time to do normal multiplication instead of Karatsuba. Maybe for this value, c0 prime had the same number of bits as q. If it turns out that g high is above q, then g high mod q is potentially going to have fewer bits. And if this crosses a 32-bit word boundary, then the server's going to do normal multiplication all of a sudden. So that's going to push in the other direction. If you cross over, then normal multiplication kicks in, and things get a lot slower, because normal multiplication is quadratic instead of the nicer, faster Karatsuba. Question.

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yeah, because the number of extra reductions is proportional, from up there, to c0 prime mod q. So if c0 prime-- which is this value-- is just a little over q, then this is tiny, as opposed to this guy, whose high bits are all the same as q's, so it's basically the same as q, and then it's big. So that'll be the difference that you can try to measure. So this is one interesting thing-- actually, a couple of interesting things. These effects work in different directions, right? If you hit a 32-bit boundary and Karatsuba versus normal switches, then all of a sudden it takes much longer to decrypt this message.

On the other hand, if it's not at a 32-bit boundary, maybe this other effect will tell you what's going on. So you actually have to watch for different effects. If you're not guessing a bit that's at a multiple of 32 bits, then you should probably expect the time to drop because of extra reductions. On the other hand, if you're trying to guess a bit that's at a multiple of 32, then maybe you should be expecting it to jump a lot-- or maybe drop-- when it switches between Karatsuba and normal.

So I guess what these guys point out in the paper is that it actually doesn't really matter whether there's a jump up or a jump down in time. If the next bit of q is 1, you should expect these things to take almost the same amount of time. And if the next bit of q is 0, then you should expect these guys to have a noticeable difference-- whether it's big or small, positive or negative.
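
So the per-bit decision rule comes down to something like this; a simplified sketch, where picking the threshold is the empirical part, and the names are mine, not the paper's:

    def guess_next_bit(t_g, t_g_high, threshold):
        # A big gap, in either direction, means q lies between g and g_high,
        # so the next bit of q is 0; similar times mean the next bit is 1.
        return 0 if abs(t_g - t_g_high) > threshold else 1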

So actually, they measure this, and it turns out to work pretty well. They have to do two interesting tricks to make this work out, though. If you remember, the timing difference was tiny-- on the order of 1 to 2 microseconds. So it's going to be hard to measure this over a network, over an Ethernet switch, for example.

What they do is they actually do two kinds of measurements, two kinds of averaging. So for each guess that they send, they actually send it several times. In the paper, they say they send it 7 times or something like that. So what kind of noise do you think this helps them with, if they just resend the same guess over and over? Yeah.

AUDIENCE: What's up with the [INAUDIBLE]?

PROFESSOR: Yeah, so if the network keeps adding different delays, you just try the same thing many times. The thing on the server should be taking exactly the same amount of time every time, so you just average out the network noise. In the paper, they say they take the median value-- I actually don't understand why they take the median; I think they should be taking the min to get at the real thing that's going on-- but anyway, this averages out the network.

But then they do this other weird thing, which is that when they're sending a guess, they don't just send the same guess 7 times-- they actually send a neighborhood of guesses. And each value in the neighborhood gets sent 7 times itself. So they actually send g 7 times, then they send g plus 1, also 7 times, then g plus 2, also 7 times, et cetera, up to g plus 400 in the paper. Why do they do this kind of averaging over different g values as well, instead of just sending g itself 7 times 400 times? Because that seems more straightforward. Yeah?

AUDIENCE: [INAUDIBLE]

PROFESSOR: Yeah, that's actually what's going on. We're trying to measure exactly how long this piece of the computation will take. But there's lots of other stuff. For example, this other pipeline at the bottom is doing all the stuff mod p, and it's also going to take a different amount of time depending on what exactly the input is.

So the cool thing is that if you perturb the value of your guess g by adding 1, 2, 3, whatever, you're just changing the low bits. So the timing effect we just looked at isn't going to change, because that depended on this middle bit flipping.

But everything that's happening on the bottom side of the pipeline, mod p, is going to be totally randomized by this, because when you reduce mod p, adding an extra bit could shift things around quite a bit. So it will average out other kinds of computational noise-- noise that's deterministic for a particular value but not related to the part of the computation we're trying to go after. Make sense?
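
Putting both kinds of averaging together, the measurement loop might look roughly like this; decrypt_time stands for a hypothetical helper that times one decryption request against the server:

    import statistics

    def measure_guess(g, decrypt_time, neighborhood=400, samples=7):
        # For each of g, g+1, ..., g+399: take 7 samples and keep the median
        # (network noise), then sum over the neighborhood (mod-p noise).
        total = 0.0
        for delta in range(neighborhood):
            times = [decrypt_time(g + delta) for _ in range(samples)]
            total += statistics.median(times)
        return total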

AUDIENCE: How do they do that when they try to guess the lower bits?

PROFESSOR: So actually, they use another mathematical trick so that they only have to guess the top half of the bits of q. It turns out that if you know the top half of the bits of q, there's some math-- Coppersmith's technique-- you can rely on to factor the number, and then you're in good shape. So you can basically not worry about the low bits. Make sense? Yeah, question.

AUDIENCE: [INAUDIBLE]

PROFESSOR: Well, you're going to construct this value c0-- well, you want the c0 prime-- you're going to construct a value c by basically taking your c0 prime and multiplying it by R inverse mod n. And then, when the server takes this value, it's going to push it through here. So it's going to compute c0-- that's c mod q-- so that value is going to be c0 prime times R inverse mod q.

Then you multiply it by R, so you get rid of the R inverse, and you end up with your guess exactly in this position. So the cool thing is that basically all the manipulations leading up to here are just multiplying by this R. And you know what R is going to be-- it's going to be 2 to the 512. So it's going to be really straightforward. Make sense? Another question?

AUDIENCE: Could we just cancel out timing [INAUDIBLE]?

PROFESSOR: Well, if you knew p, you'd be in business. Yeah, so that's the thing-- you don't know what p is, so you just want to randomize it out. Any questions? All right. [INAUDIBLE] but thanks for sticking around. So we'll start talking about other kinds of problems next week.