1 00:00:00,090 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,810 Commons license. 3 00:00:03,810 --> 00:00:06,050 Your support will help MIT OpenCourseWare 4 00:00:06,050 --> 00:00:10,160 continue to offer high quality educational resources for free. 5 00:00:10,160 --> 00:00:12,690 To make a donation or to view additional materials 6 00:00:12,690 --> 00:00:16,590 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,590 --> 00:00:17,260 at ocw.mit.edu. 8 00:00:26,700 --> 00:00:28,955 PROFESSOR: All right, guys, let's get started. 9 00:00:28,955 --> 00:00:31,330 So today, we're going to talk about side-channel attacks, 10 00:00:31,330 --> 00:00:36,360 which is a general class of problems that comes up 11 00:00:36,360 --> 00:00:38,870 in all kinds of systems. 12 00:00:38,870 --> 00:00:40,320 Broadly, side-channel attacks are 13 00:00:40,320 --> 00:00:42,778 situations where you haven't thought about some information 14 00:00:42,778 --> 00:00:44,810 that your system might be revealing. 15 00:00:44,810 --> 00:00:47,860 So typically, you have multiple components that you [INAUDIBLE] 16 00:00:47,860 --> 00:00:50,480 maybe a user talking to some server. 17 00:00:50,480 --> 00:00:53,387 And you're thinking, great, I know exactly all the bits 18 00:00:53,387 --> 00:00:57,600 going over some wire [INAUDIBLE] server, and those are secure. 19 00:00:57,600 --> 00:01:00,796 But it's often easy to miss some information revealed, 20 00:01:00,796 --> 00:01:03,830 either by user or by server. 
21 00:01:03,830 --> 00:01:07,800 So the example that the paper for today talks about 22 00:01:07,800 --> 00:01:10,465 is a situation where the timing of the messages 23 00:01:10,465 --> 00:01:12,900 between the user and the server reveals 24 00:01:12,900 --> 00:01:16,070 some additional information that you wouldn't have otherwise 25 00:01:16,070 --> 00:01:19,390 learned by just observing the bits flowing between these two 26 00:01:19,390 --> 00:01:20,930 guys. 27 00:01:20,930 --> 00:01:24,650 But in fact, there's a much broader class of side-channels 28 00:01:24,650 --> 00:01:25,790 you might worry about. 29 00:01:25,790 --> 00:01:28,550 Originally, side-channels showed up, 30 00:01:28,550 --> 00:01:31,360 or people discovered them, in the '40s when they discovered 31 00:01:31,360 --> 00:01:33,440 that when you start typing characters 32 00:01:33,440 --> 00:01:37,110 on a teletype, the electronics, or the electrical machinery 33 00:01:37,110 --> 00:01:39,580 in the teletype, would emit RF radiation. 34 00:01:39,580 --> 00:01:41,920 And you can hook up an oscilloscope nearby 35 00:01:41,920 --> 00:01:44,490 and just watch the characters being typed out 36 00:01:44,490 --> 00:01:48,230 by monitoring the RF frequencies that 37 00:01:48,230 --> 00:01:49,800 are coming out of this machine. 38 00:01:49,800 --> 00:01:54,410 So RF radiation is a classic example of a side-channel 39 00:01:54,410 --> 00:01:57,490 that you might worry about. 40 00:01:57,490 --> 00:02:00,880 And there are lots of other examples 41 00:02:00,880 --> 00:02:02,900 that people have looked at, almost anything. 42 00:02:02,900 --> 00:02:07,343 So power usage is another side-channel 43 00:02:07,343 --> 00:02:08,259 you might worry about. 44 00:02:08,259 --> 00:02:09,750 So your computer is probably going 45 00:02:09,750 --> 00:02:12,230 to use different amounts of power depending on what exactly 46 00:02:12,230 --> 00:02:13,970 it's computing.
47 00:02:13,970 --> 00:02:17,200 There are other clever examples: sound 48 00:02:17,200 --> 00:02:19,330 turns out to also leak stuff. 49 00:02:19,330 --> 00:02:21,740 There's a cute paper that you can look at 50 00:02:21,740 --> 00:02:25,344 where people listen to a printer, and based on the sound 51 00:02:25,344 --> 00:02:26,760 the printer is making you can tell 52 00:02:26,760 --> 00:02:28,670 what characters it's printing. 53 00:02:28,670 --> 00:02:31,695 This is especially easy to do for dot matrix printers that 54 00:02:31,695 --> 00:02:35,180 make this very annoying sound when they're printing. 55 00:02:35,180 --> 00:02:38,690 And in general, it's a good thing to think about. 56 00:02:38,690 --> 00:02:40,681 Kevin in Monday's lecture also mentioned 57 00:02:40,681 --> 00:02:43,014 some interesting side-channels that he's running into 58 00:02:43,014 --> 00:02:45,700 in his research. 59 00:02:45,700 --> 00:02:49,090 But, in particular, here we're going 60 00:02:49,090 --> 00:02:51,880 to look at the specific side-channel 61 00:02:51,880 --> 00:02:56,240 that David Brumley and Dan Boneh looked at in their paper-- I 62 00:02:56,240 --> 00:02:59,095 guess about 10 years ago now-- where they were able to extract 63 00:02:59,095 --> 00:03:03,170 a cryptographic key out of a web server running Apache 64 00:03:03,170 --> 00:03:06,310 by measuring the timing of different responses 65 00:03:06,310 --> 00:03:11,520 to different input packets from the adversarial client. 66 00:03:11,520 --> 00:03:14,330 And in this particular case, they're 67 00:03:14,330 --> 00:03:15,990 going after a cryptographic key. 68 00:03:15,990 --> 00:03:17,860 In fact, many side-channel attacks 69 00:03:17,860 --> 00:03:21,440 target cryptographic keys partly because it's a little bit 70 00:03:21,440 --> 00:03:24,744 tricky to get lots of data through a side-channel.
71 00:03:24,744 --> 00:03:26,410 And cryptographic keys are one situation 72 00:03:26,410 --> 00:03:30,050 where getting a small number of bits helps you a lot. 73 00:03:30,050 --> 00:03:32,870 So in their attack they're able to extract maybe 74 00:03:32,870 --> 00:03:36,760 about 200, 256 bits or so. 75 00:03:36,760 --> 00:03:38,970 And just from those 200ish bits, they're 76 00:03:38,970 --> 00:03:42,300 able to break the cryptographic key of this web server. 77 00:03:42,300 --> 00:03:43,890 Whereas, if you're trying to leak 78 00:03:43,890 --> 00:03:46,140 some database full of Social Security numbers, 79 00:03:46,140 --> 00:03:48,340 then that'll be a lot of bits you 80 00:03:48,340 --> 00:03:51,082 have to leak to get them out of this database. 81 00:03:51,082 --> 00:03:53,290 So that's why many of these side-channels, 82 00:03:53,290 --> 00:03:55,670 if you see them later on, often 83 00:03:55,670 --> 00:03:59,240 focus on getting small secrets out, which 84 00:03:59,240 --> 00:04:02,850 might be cryptographic keys or passwords. 85 00:04:02,850 --> 00:04:04,970 But in general, this is applicable to lots 86 00:04:04,970 --> 00:04:09,210 of other situations as well. 87 00:04:09,210 --> 00:04:11,230 And one cool thing about this paper, 88 00:04:11,230 --> 00:04:13,410 before we jump into the details, is 89 00:04:13,410 --> 00:04:16,459 that they show that you can actually do this over the network. 90 00:04:16,459 --> 00:04:18,890 So as you probably figured out from reading this paper, 91 00:04:18,890 --> 00:04:20,560 they have to do a lot of careful work 92 00:04:20,560 --> 00:04:23,150 to tease out these minute differences 93 00:04:23,150 --> 00:04:24,670 in timing information. 94 00:04:24,670 --> 00:04:28,290 So if you actually compute out the numbers from this paper, 95 00:04:28,290 --> 00:04:33,190 it turns out that each request that they sent to the server 96 00:04:33,190 --> 00:04:35,365 differs from potentially another request
97 00:04:35,365 --> 00:04:39,480 by on the order of 1 to 2 microseconds, which 98 00:04:39,480 --> 00:04:41,280 is pretty tiny. 99 00:04:41,280 --> 00:04:47,000 So you have to be quite careful, and over a network 100 00:04:47,000 --> 00:04:50,080 it might be hard to tell whether some server took 101 00:04:50,080 --> 00:04:53,750 1 or 2 microseconds longer to process your request or not. 102 00:04:53,750 --> 00:04:58,150 And as a result, it was not so clear whether you 103 00:04:58,150 --> 00:05:01,060 could mount this kind of attack over a very noisy network. 104 00:05:01,060 --> 00:05:03,690 And these guys were among the first people 105 00:05:03,690 --> 00:05:06,620 to show that you can actually do this over a real ethernet 106 00:05:06,620 --> 00:05:09,600 network with a server sitting in one place, a client sitting 107 00:05:09,600 --> 00:05:10,461 somewhere else. 108 00:05:10,461 --> 00:05:12,460 And you could actually measure these differences 109 00:05:12,460 --> 00:05:16,740 partly by averaging, partly through other tricks. 110 00:05:16,740 --> 00:05:21,270 All right, does that make sense, the overall side-channel stuff? 111 00:05:21,270 --> 00:05:21,770 All right. 112 00:05:21,770 --> 00:05:23,860 So the plan for the rest of this lecture 113 00:05:23,860 --> 00:05:27,990 is we'll first dive into the details of this RSA 114 00:05:27,990 --> 00:05:29,800 cryptosystem that these guys use. 115 00:05:29,800 --> 00:05:32,480 Then we'll not look at exactly why it's secure 116 00:05:32,480 --> 00:05:34,900 or not but we'll look at how you implement it 117 00:05:34,900 --> 00:05:37,980 because that turns out to be critical for exploiting 118 00:05:37,980 --> 00:05:39,350 this particular side-channel. 119 00:05:39,350 --> 00:05:42,800 They carefully leverage various details of the implementation 120 00:05:42,800 --> 00:05:46,164 to figure out when some things are faster or slower.
121 00:05:46,164 --> 00:05:48,080 And then we'll pop back out once we understand 122 00:05:48,080 --> 00:05:49,210 how RSA is implemented. 123 00:05:49,210 --> 00:05:52,125 Then we'll come back and figure out how you attack it, 124 00:05:52,125 --> 00:05:54,250 how you attack all these different optimizations 125 00:05:54,250 --> 00:05:56,040 that RSA has. 126 00:05:56,040 --> 00:05:57,580 Sounds good? 127 00:05:57,580 --> 00:05:58,710 All right. 128 00:05:58,710 --> 00:06:00,760 So I guess let's start off by looking 129 00:06:00,760 --> 00:06:04,200 at the high level plan for RSA. 130 00:06:04,200 --> 00:06:08,940 So RSA is a pretty widely used public key cryptosystem. 131 00:06:08,940 --> 00:06:10,800 We mentioned these guys a couple of weeks 132 00:06:10,800 --> 00:06:14,690 ago in the context of certificates. 133 00:06:14,690 --> 00:06:17,100 But now we're going to look at how it actually works. 134 00:06:17,100 --> 00:06:20,710 So typically there are 3 things you have to worry about. 135 00:06:20,710 --> 00:06:25,290 So there's generating a key, encrypting, and decrypting. 136 00:06:25,290 --> 00:06:29,220 So for RSA, the way you generate a key is you actually 137 00:06:29,220 --> 00:06:32,220 pick 2 large prime integers. 138 00:06:32,220 --> 00:06:35,500 So you're going to pick 2 primes, p and q. 139 00:06:35,500 --> 00:06:42,020 And in the paper, these guys focus on p and q, 140 00:06:42,020 --> 00:06:45,810 which are about 512 bits each. 141 00:06:45,810 --> 00:06:49,730 So this is typically called 1,024 bit RSA 142 00:06:49,730 --> 00:06:52,570 because the resulting product of these primes that you're 143 00:06:52,570 --> 00:06:56,500 going to use in a second is a 1,024 bit integer number.
144 00:06:56,500 --> 00:06:59,360 These days, that's probably not a particularly good choice 145 00:06:59,360 --> 00:07:02,170 for the size of your RSA key because it 146 00:07:02,170 --> 00:07:06,860 makes it relatively easy for attackers to factor this-- not 147 00:07:06,860 --> 00:07:09,080 trivial but certainly viable. 148 00:07:09,080 --> 00:07:12,170 So while 10 years ago this seemed like a potentially sensible 149 00:07:12,170 --> 00:07:14,520 parameter, now if you're actually building a system, 150 00:07:14,520 --> 00:07:16,780 you should probably pick a 2,000 or 3,000 151 00:07:16,780 --> 00:07:19,866 or even 4,000 bit RSA key. 152 00:07:19,866 --> 00:07:22,590 Well, what RSA key size means 153 00:07:22,590 --> 00:07:24,620 is the size of the product of these primes. 154 00:07:24,620 --> 00:07:26,480 And then, for convenience, we're going 155 00:07:26,480 --> 00:07:28,140 to talk about the number n, which 156 00:07:28,140 --> 00:07:33,010 is just the product of these 2 primes, p times q. 157 00:07:33,010 --> 00:07:33,510 All right. 158 00:07:33,510 --> 00:07:35,490 So now we know how to generate a key, 159 00:07:35,490 --> 00:07:38,440 now we need to figure out-- well, this is at least 160 00:07:38,440 --> 00:07:40,100 part of a key-- now we're going to have 161 00:07:40,100 --> 00:07:45,060 to figure out how we're going to encrypt and decrypt messages. 162 00:07:45,060 --> 00:07:48,280 And the way we're going to encrypt and decrypt messages 163 00:07:48,280 --> 00:07:54,320 is by exponentiating numbers modulo this number n. 164 00:07:54,320 --> 00:07:57,790 So it seems a little weird, but let's go with it for a second. 165 00:07:57,790 --> 00:08:00,520 So if you want to encrypt a message, 166 00:08:00,520 --> 00:08:03,560 then we're going to take a message m 167 00:08:03,560 --> 00:08:11,920 and transform it into m to the power e mod n.
168 00:08:11,920 --> 00:08:14,570 So e is going to be some exponent-- we'll talk about how 169 00:08:14,570 --> 00:08:15,640 to choose it in a second. 170 00:08:15,640 --> 00:08:17,880 But this is how we're going to encrypt a message. 171 00:08:17,880 --> 00:08:21,230 We'll just take this message as an integer number 172 00:08:21,230 --> 00:08:23,260 and just exponentiate it. 173 00:08:23,260 --> 00:08:25,610 And then we'll see why this works in a second, 174 00:08:25,610 --> 00:08:30,500 but let's call this guy c, ciphertext. 175 00:08:30,500 --> 00:08:36,039 Then to decrypt it, we're going to somehow find 176 00:08:36,039 --> 00:08:37,940 an interesting other exponent where 177 00:08:37,940 --> 00:08:41,336 you can take a ciphertext c and if you exponentiate it 178 00:08:41,336 --> 00:08:46,440 to some power d mod n, then you'll magically 179 00:08:46,440 --> 00:08:49,500 get back the same message m. 180 00:08:49,500 --> 00:08:52,290 So this is the general plan: To encrypt, you exponentiate. 181 00:08:52,290 --> 00:08:56,687 To decrypt, you exponentiate by another exponent. 182 00:08:56,687 --> 00:08:58,270 And in general, it seems a little hard 183 00:08:58,270 --> 00:09:00,561 to figure out how we're going to come up with these two 184 00:09:00,561 --> 00:09:02,800 magic numbers that somehow end up giving us 185 00:09:02,800 --> 00:09:04,390 back the same message. 186 00:09:04,390 --> 00:09:06,890 But it turns out that if you look 187 00:09:06,890 --> 00:09:12,000 at how exponentiation works, or multiplication works, 188 00:09:12,000 --> 00:09:14,340 modulo this number n, 189 00:09:14,340 --> 00:09:22,670 then there's this cool property that if you have any number x, 190 00:09:22,670 --> 00:09:26,000 and you raise it to what's called Euler's phi 191 00:09:26,000 --> 00:09:32,215 function of n-- maybe I'll use more board space for this. 192 00:09:32,215 --> 00:09:33,790 This seems important.
193 00:09:33,790 --> 00:09:37,998 So if you take x and you raise it to phi of n, 194 00:09:37,998 --> 00:09:44,370 then this is going to be equal to 1 mod n. 195 00:09:44,370 --> 00:09:48,260 And this phi function for our particular choice of n 196 00:09:48,260 --> 00:09:49,960 is pretty straightforward, it's actually 197 00:09:49,960 --> 00:09:54,600 p minus 1 times q minus 1. 198 00:09:54,600 --> 00:10:01,560 So this gives us hope that maybe if we pick e and d so that e times 199 00:10:01,560 --> 00:10:06,370 d is phi of n plus 1, then we're in good shape. 200 00:10:06,370 --> 00:10:11,200 Because then for any message m, if we exponentiate it to e and d, 201 00:10:11,200 --> 00:10:16,380 we get back 1 times m because our ed product 202 00:10:16,380 --> 00:10:19,420 is going to be phi of n plus 1, 203 00:10:19,420 --> 00:10:25,445 or maybe some constant alpha times phi of n, plus 1. 204 00:10:25,445 --> 00:10:26,320 Does this make sense? 205 00:10:26,320 --> 00:10:30,800 This is why the message is going to get decrypted correctly. 206 00:10:30,800 --> 00:10:33,900 And it turns out that there's a reasonably straightforward 207 00:10:33,900 --> 00:10:39,880 algorithm, if you know this phi value, for how to compute 208 00:10:39,880 --> 00:10:42,430 d given an e or e given a d. 209 00:10:42,430 --> 00:10:42,930 All right. 210 00:10:42,930 --> 00:10:43,770 Question. 211 00:10:43,770 --> 00:10:45,640 AUDIENCE: Isn't 1 mod n just 1? 212 00:10:45,640 --> 00:10:48,710 PROFESSOR: Yeah, so far we add one more. 213 00:10:48,710 --> 00:10:50,048 Sorry? 214 00:10:50,048 --> 00:10:52,388 AUDIENCE: Like, up over there. 215 00:10:52,388 --> 00:10:53,471 PROFESSOR: Yeah, this one? 216 00:10:53,471 --> 00:10:55,430 AUDIENCE: Yeah. 217 00:10:55,430 --> 00:10:57,200 PROFESSOR: Isn't 1 mod n just 1? 218 00:10:57,200 --> 00:10:58,820 Sorry, I mean this. 219 00:10:58,820 --> 00:11:02,462 So when I say this mod n, it means that both sides taken mod n 220 00:11:02,462 --> 00:11:04,820 are equal.
221 00:11:04,820 --> 00:11:07,990 So what this means is if you want 222 00:11:07,990 --> 00:11:10,046 to think of mod as literally an operator, 223 00:11:10,046 --> 00:11:13,816 you would write this guy mod n equals 1 mod n. 224 00:11:13,816 --> 00:11:15,440 So that's what mod n on the side means. 225 00:11:15,440 --> 00:11:18,325 Like, the whole equality is mod n. 226 00:11:18,325 --> 00:11:21,175 Sorry for the [INAUDIBLE]. 227 00:11:21,175 --> 00:11:22,610 Make sense? 228 00:11:22,610 --> 00:11:24,120 All right. 229 00:11:24,120 --> 00:11:27,665 So what this basically means for RSA is that we're 230 00:11:27,665 --> 00:11:32,150 going to pick some value e. 231 00:11:32,150 --> 00:11:34,558 So e is going to be our encryption value. 232 00:11:34,558 --> 00:11:41,180 And then from e we're going to generate d to be basically 233 00:11:41,180 --> 00:11:45,826 1 over e mod phi of n. 234 00:11:45,826 --> 00:11:47,665 And there's the extended Euclidean algorithm 235 00:11:47,665 --> 00:11:51,460 you can use to do this computation efficiently. 236 00:11:51,460 --> 00:11:53,390 But in order to do this you actually 237 00:11:53,390 --> 00:11:56,180 have to know this phi of n, which 238 00:11:56,180 --> 00:11:59,485 requires knowing the factorization of our number n 239 00:11:59,485 --> 00:12:01,910 into p and q. 240 00:12:01,910 --> 00:12:02,410 All right. 241 00:12:02,410 --> 00:12:08,600 So finally, RSA ends up being a system where 242 00:12:08,600 --> 00:12:13,132 the public key is this number n and this encryption exponent e. 243 00:12:13,132 --> 00:12:16,750 So n and e are public, and d should be private. 244 00:12:16,750 --> 00:12:18,820 So then anyone can exponentiate a message 245 00:12:18,820 --> 00:12:20,320 to encrypt it for you. 246 00:12:20,320 --> 00:12:22,914 But only you know this value d and therefore 247 00:12:22,914 --> 00:12:25,230 can decrypt messages.
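The key generation, encryption, and decryption steps described so far can be sketched in a few lines of Python. This is only a toy illustration of the textbook math: the primes 61 and 53, the exponent 17, and the function names `encrypt` and `decrypt` are assumptions chosen for readability, not values from the paper, and nothing this small (or this unpadded) is secure.

```python
# Toy textbook RSA with tiny primes. Illustration only, never use for real data.
p, q = 61, 53                  # assumed small primes (the paper uses ~512-bit primes)
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)        # phi(n); computing it requires knowing p and q
e = 17                         # assumed public exponent, must be coprime to phi
d = pow(e, -1, phi)            # private exponent: "1 over e mod phi of n" (Python 3.8+)


def encrypt(m: int) -> int:
    """Textbook encryption: c = m^e mod n."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Textbook decryption: m = c^d mod n."""
    return pow(c, d, n)


m = 65
c = encrypt(m)
print("n =", n, "e =", e, "d =", d)
print("ciphertext:", c, "decrypted:", decrypt(c))
```

The three-argument `pow` does modular exponentiation, and `pow(e, -1, phi)` computes the modular inverse that the extended Euclidean algorithm would give you.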
248 00:12:25,230 --> 00:12:30,090 And as long as you don't know the factorization 249 00:12:30,090 --> 00:12:32,660 of n into p and q, then you don't know 250 00:12:32,660 --> 00:12:33,785 what this phi of n is. 251 00:12:33,785 --> 00:12:35,910 And as a result, it's actually difficult to compute 252 00:12:35,910 --> 00:12:37,470 this d value. 253 00:12:37,470 --> 00:12:41,580 So this is roughly what RSA is. 254 00:12:41,580 --> 00:12:43,370 High level. 255 00:12:43,370 --> 00:12:45,450 Does this make sense? 256 00:12:45,450 --> 00:12:45,950 All right. 257 00:12:45,950 --> 00:12:48,140 So there are 2 things I want to talk about now 258 00:12:48,140 --> 00:12:52,590 that we at least have the basic [? implementation ?] for RSA. 259 00:12:52,590 --> 00:12:55,850 There are tricks to use it correctly and pitfalls 260 00:12:55,850 --> 00:12:57,085 in how to use RSA. 261 00:12:57,085 --> 00:12:59,210 And then there are all kinds of implementation tricks 262 00:12:59,210 --> 00:13:02,440 on how you actually implement [? root ?] 263 00:13:02,440 --> 00:13:07,360 code to do these exponentiations and do them efficiently. 264 00:13:07,360 --> 00:13:10,010 That's actually nontrivial because these are all 265 00:13:10,010 --> 00:13:13,110 large numbers, these are 1,000 bit integers that you can't just 266 00:13:13,110 --> 00:13:15,730 do a multiply instruction for. 267 00:13:15,730 --> 00:13:18,156 It's probably going to take a fair amount of time 268 00:13:18,156 --> 00:13:20,430 to do these operations. 269 00:13:20,430 --> 00:13:20,930 All right. 270 00:13:20,930 --> 00:13:22,430 So the first thing I want to mention 271 00:13:22,430 --> 00:13:26,470 is the various RSA pitfalls. 272 00:13:26,470 --> 00:13:31,310 One of them we're actually going to rely on in a little bit. 273 00:13:31,310 --> 00:13:35,360 One property is that it's multiplicative. 274 00:13:38,827 --> 00:13:43,600 So what I mean by this is that suppose we have 2 messages.
275 00:13:43,600 --> 00:13:46,950 Suppose we have m0 and m1. 276 00:13:46,950 --> 00:13:49,196 And suppose I encrypt these guys, 277 00:13:49,196 --> 00:13:55,612 so I encrypt m0, I'm going to get m0 to the power e mod n. 278 00:13:55,612 --> 00:14:02,840 And if I encrypt m1, then I'd get m1 to the e mod n. 279 00:14:02,840 --> 00:14:06,220 The problem is-- not necessarily a problem 280 00:14:06,220 --> 00:14:08,940 but could be a surprise to someone 281 00:14:08,940 --> 00:14:11,300 using RSA-- it's very easy to generate 282 00:14:11,300 --> 00:14:14,480 an encryption of m0 times m1 because you just 283 00:14:14,480 --> 00:14:15,940 multiply these 2 numbers. 284 00:14:15,940 --> 00:14:18,480 If you multiply these guys out, you're 285 00:14:18,480 --> 00:14:26,500 going to get m0 m1 to the e mod n. 286 00:14:26,500 --> 00:14:29,840 This is a correct encryption under this simplistic use 287 00:14:29,840 --> 00:14:34,512 of RSA for the value m0 times m1. 288 00:14:34,512 --> 00:14:36,847 I mean at this point, it's not a huge problem 289 00:14:36,847 --> 00:14:38,555 because if you aren't able to decrypt it, 290 00:14:38,555 --> 00:14:41,940 you're just able to construct this encrypted message. 291 00:14:41,940 --> 00:14:45,620 But it might be that the overall system maybe allows you 292 00:14:45,620 --> 00:14:46,786 to decrypt certain messages. 293 00:14:46,786 --> 00:14:50,110 And if it allows you to decrypt this message that you construct 294 00:14:50,110 --> 00:14:52,670 yourself, maybe you can now go back and figure out 295 00:14:52,670 --> 00:14:53,820 what are these messages. 296 00:14:53,820 --> 00:15:00,310 So it's maybe not a great plan to be ignorant of this fact. 297 00:15:00,310 --> 00:15:04,000 This has certainly come back to bite a number of protocols 298 00:15:04,000 --> 00:15:05,450 that use RSA. 
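The multiplicative property just described is easy to check numerically. The following is a minimal sketch using the same kind of toy parameters (the primes, exponent, and messages are assumptions for illustration): multiplying two ciphertexts mod n yields a valid textbook-RSA encryption of the product of the plaintexts, without the attacker ever touching d.

```python
# Toy parameters (assumed for illustration, far too small to be secure).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent, only used to verify the claim

m0, m1 = 7, 11
c0 = pow(m0, e, n)                  # encryption of m0
c1 = pow(m1, e, n)                  # encryption of m1

# Anyone holding just the two ciphertexts can build this value.
c_product = (c0 * c1) % n

# It equals the encryption of m0 * m1 ...
print(c_product == pow(m0 * m1, e, n))
# ... and decrypts to m0 * m1 (mod n).
print(pow(c_product, d, n) == (m0 * m1) % n)
```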
299 00:15:05,450 --> 00:15:06,950 That's one property, and we'll actually 300 00:15:06,950 --> 00:15:11,450 use it as a defensive mechanism towards the end of the lecture. 301 00:15:11,450 --> 00:15:15,910 Another property of RSA that you probably want to watch out for 302 00:15:15,910 --> 00:15:18,566 is the fact that it's deterministic. 303 00:15:21,350 --> 00:15:23,695 So in this naive implementation 304 00:15:23,695 --> 00:15:27,072 that I just described here, if you take a message m 305 00:15:27,072 --> 00:15:29,165 and you encrypt it, you're going to get m 306 00:15:29,165 --> 00:15:32,100 to the e mod n, which is a deterministic function 307 00:15:32,100 --> 00:15:33,296 of the message. 308 00:15:33,296 --> 00:15:35,303 So if you encrypt it again, you'll 309 00:15:35,303 --> 00:15:36,870 get exactly the same encryption. 310 00:15:36,870 --> 00:15:38,590 This is not surprising but it might not 311 00:15:38,590 --> 00:15:40,510 be a desirable property because if I 312 00:15:40,510 --> 00:15:44,090 see you send some message encrypted with RSA, 313 00:15:44,090 --> 00:15:46,495 and I want to know what it is, it might be hard 314 00:15:46,495 --> 00:15:47,370 for me to decrypt it. 315 00:15:47,370 --> 00:15:48,890 But I can try different things and I can see, 316 00:15:48,890 --> 00:15:50,306 well, are you sending this message? 317 00:15:50,306 --> 00:15:52,600 I'll encrypt it and see if I get the same ciphertext. 318 00:15:52,600 --> 00:15:54,820 And if so, then I'll know that's what you encrypted. 319 00:15:54,820 --> 00:15:56,790 Because all I need to encrypt a message is 320 00:15:56,790 --> 00:16:01,850 the publicly known public key, which is n and the number e. 321 00:16:01,850 --> 00:16:04,104 So that's not so great. 322 00:16:04,104 --> 00:16:06,145 And you might want to watch out for this property 323 00:16:06,145 --> 00:16:08,640 if you're actually using RSA. 324 00:16:08,640 --> 00:16:10,140 So all of these primitives are
325 00:16:10,140 --> 00:16:14,340 probably a little bit hard to use directly. 326 00:16:14,340 --> 00:16:17,320 What people do in practice in order 327 00:16:17,320 --> 00:16:20,024 to avoid these problems with RSA is 328 00:16:20,024 --> 00:16:21,690 they encode the message in a certain way 329 00:16:21,690 --> 00:16:23,030 before encrypting it. 330 00:16:23,030 --> 00:16:25,790 Instead of directly exponentiating a message, 331 00:16:25,790 --> 00:16:28,020 they actually take some function f of the message, 332 00:16:28,020 --> 00:16:31,680 and then they encrypt that, 333 00:16:31,680 --> 00:16:33,096 mod n. 334 00:16:33,096 --> 00:16:38,190 And this function f, the right one to use these days, 335 00:16:38,190 --> 00:16:41,526 is probably something called optimal asymmetric encryption 336 00:16:41,526 --> 00:16:45,595 padding, OAEP. You can look it up. 337 00:16:45,595 --> 00:16:49,310 It's an encoding that has two interesting properties. 338 00:16:49,310 --> 00:16:51,390 First of all, it injects randomness. 339 00:16:51,390 --> 00:16:57,230 You can think of f of m as generating a 1,000 bit message 340 00:16:57,230 --> 00:16:58,580 that you're going to encrypt. 341 00:16:58,580 --> 00:17:01,566 Part of this message is going to be your message m in the middle 342 00:17:01,566 --> 00:17:02,065 here. 343 00:17:02,065 --> 00:17:03,420 So that you can get it back when you decrypt, of course. 344 00:17:03,420 --> 00:17:04,641 [INAUDIBLE]. 345 00:17:04,641 --> 00:17:06,599 So there are 2 interesting things you want to do. 346 00:17:06,599 --> 00:17:08,339 You want to put in some randomness here, 347 00:17:08,339 --> 00:17:10,640 some value r, so that when you encrypt the message 348 00:17:10,640 --> 00:17:12,839 multiple times, you'll get different results out 349 00:17:12,839 --> 00:17:16,069 each time, so then it's not deterministic anymore.
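To make the determinism problem and the effect of injected randomness concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the two Mersenne primes, the helper names, and especially the encoding in `encrypt_randomized`, which simply places 32 random bits above a 32-bit message. It is not OAEP and has none of OAEP's security properties; it only shows why a random value r makes repeated encryptions differ.

```python
import secrets

# Toy parameters (assumed): two Mersenne primes, still far too small for real use.
p, q = 2**61 - 1, 2**31 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))


def encrypt_textbook(m: int) -> int:
    """Unpadded RSA: the same m always gives the same ciphertext."""
    return pow(m, e, n)


# An eavesdropper who sees `observed` can test guesses with only the public key.
observed = encrypt_textbook(int.from_bytes(b"yes", "big"))
guesses = [b"no", b"yes", b"maybe"]
recovered = [g for g in guesses
             if encrypt_textbook(int.from_bytes(g, "big")) == observed]
print("guess confirmed:", recovered)


def encrypt_randomized(m: int, r=None) -> int:
    """Hypothetical toy encoding (NOT OAEP): 32 random bits above a 32-bit message."""
    if r is None:
        r = secrets.randbits(32)
    assert 0 <= m < 2**32
    return pow((r << 32) | m, e, n)


def decrypt_randomized(c: int) -> int:
    """Decrypt and discard the random high bits."""
    return pow(c, d, n) & 0xFFFFFFFF


m = int.from_bytes(b"yes", "big")
print("same message, two ciphertexts:", encrypt_randomized(m), encrypt_randomized(m))
```

With the random prefix, the guess-and-compare attack above no longer works directly, because the attacker would also have to guess r.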
350 00:17:16,069 --> 00:17:18,390 And in order to defeat this multiplicative property 351 00:17:18,390 --> 00:17:20,840 and other kinds of problems, you're 352 00:17:20,840 --> 00:17:23,010 going to put in some fixed padding here. 353 00:17:23,010 --> 00:17:25,510 You can think of this as an alternating sequence of 1 0 354 00:17:25,510 --> 00:17:27,003 1 0 1 0. 355 00:17:27,003 --> 00:17:28,044 You can do better things. 356 00:17:28,044 --> 00:17:30,134 But roughly it's some predictable sequence 357 00:17:30,134 --> 00:17:33,395 that you put in here and whenever you decrypt, 358 00:17:33,395 --> 00:17:35,590 you make sure the sequence is still there. 359 00:17:35,590 --> 00:17:37,560 Even a multiplication is going 360 00:17:37,560 --> 00:17:40,570 to destroy this bit pattern. 361 00:17:40,570 --> 00:17:43,597 And then it should be clear that someone tampered 362 00:17:43,597 --> 00:17:46,082 with the message, and you reject it. 363 00:17:46,082 --> 00:17:51,220 And if it's still there, then presumably, sometimes provably, 364 00:17:51,220 --> 00:17:53,621 no one tampered with your message, and as a result 365 00:17:53,621 --> 00:17:55,004 you should be able to accept it 366 00:17:55,004 --> 00:17:59,140 and treat message m as correctly encrypted by someone. 367 00:17:59,140 --> 00:18:00,721 Make sense? 368 00:18:00,721 --> 00:18:01,220 Yeah? 369 00:18:01,220 --> 00:18:05,250 AUDIENCE: If the attacker knows how big the pad is, can't they 370 00:18:05,250 --> 00:18:10,960 put a 1 in the lowest place and then [INAUDIBLE] 371 00:18:10,960 --> 00:18:13,207 under multiplication? 372 00:18:13,207 --> 00:18:14,165 PROFESSOR: Yeah, maybe. 373 00:18:14,165 --> 00:18:16,552 It's a little bit tricky because this randomness 374 00:18:16,552 --> 00:18:17,510 is going to bleed over. 375 00:18:17,510 --> 00:18:20,170 So the particular construction of this OAEP 376 00:18:20,170 --> 00:18:22,740 is a little bit more sophisticated than this.
377 00:18:22,740 --> 00:18:25,210 But imagine this is integer 378 00:18:25,210 --> 00:18:28,160 multiplication, not bit-wise multiplication. 379 00:18:28,160 --> 00:18:31,530 And so this randomness is going to bleed over somewhere, 380 00:18:31,530 --> 00:18:34,700 and you can construct an OAEP scheme such 381 00:18:34,700 --> 00:18:37,896 that this doesn't happen. 382 00:18:37,896 --> 00:18:41,720 [INAUDIBLE] Make sense? 383 00:18:41,720 --> 00:18:42,390 All right. 384 00:18:42,390 --> 00:18:44,514 So it turns out that basically you shouldn't really 385 00:18:44,514 --> 00:18:46,170 use this RSA math directly, you should 386 00:18:46,170 --> 00:18:48,760 use some library in practice that implements all 387 00:18:48,760 --> 00:18:51,340 those things correctly for you. 388 00:18:51,340 --> 00:18:53,980 And use it just as an encrypt/decrypt primitive. 389 00:18:53,980 --> 00:18:56,390 But it turns out these details will come in and matter 390 00:18:56,390 --> 00:18:58,473 for us because we're actually trying to figure out 391 00:18:58,473 --> 00:19:03,300 how to break or how to attack an existing RSA implementation. 392 00:19:03,300 --> 00:19:07,100 So in particular the attack from this paper 393 00:19:07,100 --> 00:19:10,080 is going to exploit the fact that the server is 394 00:19:10,080 --> 00:19:13,210 going to check for this padding when it gets a message. 395 00:19:13,210 --> 00:19:17,130 So this is how we're going to time how long it takes a server 396 00:19:17,130 --> 00:19:17,770 to decrypt. 397 00:19:17,770 --> 00:19:21,690 We're going to send some random message, or some carefully 398 00:19:21,690 --> 00:19:22,545 constructed message. 399 00:19:22,545 --> 00:19:26,243 But the message wasn't constructed by taking a real m 400 00:19:26,243 --> 00:19:27,330 and encrypting it. 401 00:19:27,330 --> 00:19:29,980 We're going to carefully construct a ciphertext integer 402 00:19:29,980 --> 00:19:31,300 value.
403 00:19:31,300 --> 00:19:33,020 And the server is going to decrypt it, 404 00:19:33,020 --> 00:19:34,700 it's going to decrypt to some nonsense, 405 00:19:34,700 --> 00:19:36,590 and the padding is going to not match 406 00:19:36,590 --> 00:19:37,820 with a very high probability. 407 00:19:37,820 --> 00:19:40,090 And immediately the server is going to reject it. 408 00:19:40,090 --> 00:19:41,720 And the reason this is going to be good 409 00:19:41,720 --> 00:19:44,340 for us is because it will tell us exactly how long it took 410 00:19:44,340 --> 00:19:47,250 the server to get to this point: just do the RSA decryption, 411 00:19:47,250 --> 00:19:50,281 get this message, check the padding, and reject it. 412 00:19:50,281 --> 00:19:52,030 So that's what we're going to be measuring 413 00:19:52,030 --> 00:19:54,290 in this attack from the paper. 414 00:19:54,290 --> 00:19:55,450 Does that make sense? 415 00:19:55,450 --> 00:19:57,700 So there's some integrity component to the message 416 00:19:57,700 --> 00:20:02,800 that allows us to time the decryption leading up to it. 417 00:20:02,800 --> 00:20:03,625 All right. 418 00:20:03,625 --> 00:20:07,180 So now let's talk about how you actually implement RSA. 419 00:20:07,180 --> 00:20:09,940 So the core of it is really this exponentiation, 420 00:20:09,940 --> 00:20:12,485 which is not exactly trivial to do, 421 00:20:12,485 --> 00:20:14,860 as I was mentioning earlier, because all these numbers are 422 00:20:14,860 --> 00:20:15,880 very large integers. 423 00:20:15,880 --> 00:20:18,820 So the message itself is going to be, at least 424 00:20:18,820 --> 00:20:20,830 in this paper, a 1,000 bit integer. 425 00:20:20,830 --> 00:20:23,810 And the exponent itself is also going to be pretty large. 426 00:20:23,810 --> 00:20:26,180 The encryption exponent is at least well known.
427 00:20:26,180 --> 00:20:27,596 But the decryption exponent had better 428 00:20:27,596 --> 00:20:30,255 also be a large integer, on the order of 1,000 bits. 429 00:20:30,255 --> 00:20:32,126 So you have a 1,000-bit integer you 430 00:20:32,126 --> 00:20:35,900 want to exponentiate to another 1,000-bit integer power modulo 431 00:20:35,900 --> 00:20:38,030 some other 1,000-bit integer n. That's 432 00:20:38,030 --> 00:20:39,830 going to be a little messy if you just do 433 00:20:39,830 --> 00:20:42,210 the naive thing. So almost everyone has 434 00:20:42,210 --> 00:20:45,530 lots of optimizations in their RSA implementations 435 00:20:45,530 --> 00:20:48,640 to make this go a little bit faster. 436 00:20:48,640 --> 00:20:51,970 And there are four optimizations that matter 437 00:20:51,970 --> 00:20:53,420 for the purpose of this attack. 438 00:20:53,420 --> 00:20:55,420 There are actually more tricks that you can play, 439 00:20:55,420 --> 00:20:57,100 but the most important ones are these. 440 00:20:57,100 --> 00:21:02,130 So first there's something called the Chinese remainder 441 00:21:02,130 --> 00:21:06,640 theorem, or CRT. And just to remind you 442 00:21:06,640 --> 00:21:10,250 from grade school or high school, maybe, what 443 00:21:10,250 --> 00:21:12,330 this remainder theorem says: 444 00:21:12,330 --> 00:21:16,380 it says that if you have two numbers 445 00:21:16,380 --> 00:21:20,170 and you have some value x, and you know 446 00:21:20,170 --> 00:21:25,360 that x is equal to a1 mod p, 447 00:21:25,360 --> 00:21:31,200 and you know that x is equal to a2 mod q, where 448 00:21:31,200 --> 00:21:33,350 p and q are prime numbers, 449 00:21:33,350 --> 00:21:38,790 and this modular equality applies to the whole equation, 450 00:21:38,790 --> 00:21:42,920 then it turns out that there's a unique solution to this 451 00:21:42,920 --> 00:21:43,650 mod pq. 452 00:21:43,650 --> 00:21:52,210 So x is equal to some x prime mod pq.
453 00:21:52,210 --> 00:21:55,050 And in fact, there's a unique such x prime, 454 00:21:55,050 --> 00:21:57,170 and it's actually very efficient to compute. 455 00:21:57,170 --> 00:21:59,450 So the Chinese remainder theorem also 456 00:21:59,450 --> 00:22:03,070 comes with an algorithm for how to compute this unique x 457 00:22:03,070 --> 00:22:09,300 prime that's equal to x mod pq given the values a1 and a2 mod 458 00:22:09,300 --> 00:22:12,570 p and q, respectively. 459 00:22:12,570 --> 00:22:15,170 Make sense? 460 00:22:15,170 --> 00:22:17,495 OK, so how can you use this Chinese remainder theorem 461 00:22:17,495 --> 00:22:22,580 to speed up modular exponentiation? 462 00:22:22,580 --> 00:22:24,130 So the way this is going to help us 463 00:22:24,130 --> 00:22:26,350 is that if you notice all the time 464 00:22:26,350 --> 00:22:31,400 we're doing this computational of some bunch of stuff modulo 465 00:22:31,400 --> 00:22:33,710 n, which is p times q. 466 00:22:33,710 --> 00:22:35,135 And the Chinese remainder theorem 467 00:22:35,135 --> 00:22:39,100 says that if you want the value of something mod p times q, 468 00:22:39,100 --> 00:22:42,320 it suffices to compute the value of that thing mod p 469 00:22:42,320 --> 00:22:44,746 and the value of that thing mod q. 470 00:22:44,746 --> 00:22:46,610 And then use the Chinese remainder theorem 471 00:22:46,610 --> 00:22:48,960 to figure out the unique solution to what 472 00:22:48,960 --> 00:22:53,220 this thing is mod p times q. 473 00:22:53,220 --> 00:22:55,516 All right, why is this faster? 474 00:22:55,516 --> 00:22:58,335 Seems like you're basically doing the same thing twice, 475 00:22:58,335 --> 00:23:00,854 and that's more work to recombine it 476 00:23:00,854 --> 00:23:02,270 Is this going to save me anything? 477 00:23:02,270 --> 00:23:02,770 Yeah? 
478 00:23:02,770 --> 00:23:03,746 AUDIENCE: [INAUDIBLE] 479 00:23:06,479 --> 00:23:08,270 PROFESSOR: Well, they're certainly smaller, 480 00:23:08,270 --> 00:23:09,311 but they're not that much smaller. 481 00:23:09,311 --> 00:23:11,950 And so p and q, so n is 1,000 bits, p and q 482 00:23:11,950 --> 00:23:15,600 are both 500 bits, they're not quite down to the machine word size 483 00:23:15,600 --> 00:23:16,360 yet. 484 00:23:16,360 --> 00:23:18,980 But it is going to help us because most 485 00:23:18,980 --> 00:23:21,340 of the stuff we're doing in this computation 486 00:23:21,340 --> 00:23:23,160 is all these multiplications. 487 00:23:23,160 --> 00:23:26,315 And roughly, multiplication is quadratic in the size 488 00:23:26,315 --> 00:23:29,960 of the thing you're multiplying, because in the grade school 489 00:23:29,960 --> 00:23:31,980 method of multiplication you take all the digits 490 00:23:31,980 --> 00:23:34,910 and multiply them by all the other digits in the number. 491 00:23:34,910 --> 00:23:38,785 And as a result, doing exponentiation and multiplication 492 00:23:38,785 --> 00:23:40,650 is roughly quadratic in the input size. 493 00:23:40,650 --> 00:23:46,460 So if we work mod p, we basically go from 1,000 bits 494 00:23:46,460 --> 00:23:49,204 to 512 bits, we reduce the size of our input by a factor of 2. 495 00:23:49,204 --> 00:23:51,370 So this means all this multiplication and exponentiation 496 00:23:51,370 --> 00:23:54,930 is going to be roughly 4 times cheaper. 497 00:23:54,930 --> 00:23:58,530 So even though we do it twice, each time is 4 times faster. 498 00:23:58,530 --> 00:24:01,300 So overall, the CRT optimization is 499 00:24:01,300 --> 00:24:04,120 going to give us basically a 2x performance 500 00:24:04,120 --> 00:24:08,080 boost for doing any RSA operation, both 501 00:24:08,080 --> 00:24:10,694 on the encryption and decryption side. 502 00:24:10,694 --> 00:24:14,220 That make sense? 503 00:24:14,220 --> 00:24:15,570 All right.
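The CRT speedup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any real RSA library: the function name `crt_pow` and the toy primes are made up for the example, and the recombination step uses one standard formula (Garner's) for the "algorithm that comes with the theorem". It needs Python 3.8 or later for `pow(q, -1, p)`.

```python
def crt_pow(c, d, p, q):
    """Compute c**d mod (p*q) via two half-size exponentiations.

    Hypothetical helper for illustration; p and q must be distinct primes.
    """
    # Two exponentiations with half-size moduli. With quadratic-time
    # multiplication each one costs roughly 1/4 of the full-size version,
    # so the pair costs roughly 1/2.
    m1 = pow(c % p, d, p)          # value mod p
    m2 = pow(c % q, d, q)          # value mod q

    # Recombine: find the unique x mod p*q with x = m1 (mod p), x = m2 (mod q).
    q_inv = pow(q, -1, p)          # inverse of q modulo p
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


# Toy parameters (far too small for real use).
p, q, d = 61, 53, 2753
print(crt_pow(65, d, p, q) == pow(65, d, p * q))
```

The result is checked against Python's built-in three-argument `pow`, which computes the same value directly modulo p times q.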
504 00:24:15,570 --> 00:24:20,250 So that's the first optimization that most people use. 505 00:24:20,250 --> 00:24:24,550 The second thing that most implementations do 506 00:24:24,550 --> 00:24:27,195 is a technique called sliding windows. 507 00:24:32,620 --> 00:24:36,200 And we'll look at this optimization in 2 steps. 508 00:24:36,200 --> 00:24:40,199 So this optimization is going to be concerned with what 509 00:24:40,199 --> 00:24:41,740 basic operations we're going to perform 510 00:24:41,740 --> 00:24:44,390 to do this exponentiation. 511 00:24:44,390 --> 00:24:49,000 Suppose you have some ciphertext c that's now 500 bits 512 00:24:49,000 --> 00:24:52,155 because you're now doing mod p or mod q. 513 00:24:52,155 --> 00:24:58,270 We have a 500-bit c and, similarly, roughly a 500-bit d 514 00:24:58,270 --> 00:25:00,185 as well. 515 00:25:00,185 --> 00:25:04,070 So how do we raise c to the power d? 516 00:25:04,070 --> 00:25:07,040 I guess the stupid way is to take c and keep 517 00:25:07,040 --> 00:25:08,740 multiplying it d times. 518 00:25:08,740 --> 00:25:10,770 But d is very big, it's on the order of 2 to the 500. 519 00:25:10,770 --> 00:25:12,940 So that's never going to finish. 520 00:25:12,940 --> 00:25:16,780 So a more amenable, or more performant, 521 00:25:16,780 --> 00:25:20,810 plan is to do what's called repeated squaring. 522 00:25:20,810 --> 00:25:24,880 So that's the step before sliding windows. 523 00:25:24,880 --> 00:25:31,360 So this technique called repeated squaring looks 524 00:25:31,360 --> 00:25:31,860 like this. 525 00:25:31,860 --> 00:25:40,580 So if you want to compute c to the power 2x, 526 00:25:40,580 --> 00:25:46,080 then you can actually compute c to the x and then square it. 527 00:25:46,080 --> 00:25:48,600 So in our naive plan, computing c to the 2x 528 00:25:48,600 --> 00:25:50,850 would have involved us making twice as many iterations 529 00:25:50,850 --> 00:25:53,449 of multiplying because it's multiplying c twice as many times.
530 00:25:53,449 --> 00:25:55,490 But in fact, you could be clever and just compute 531 00:25:55,490 --> 00:25:58,336 c to the x and then square it later. 532 00:25:58,336 --> 00:26:00,610 So this works well, and this means 533 00:26:00,610 --> 00:26:06,810 that if you're computing c to some even exponent, this works. 534 00:26:06,810 --> 00:26:10,412 And conversely, if you're computing c to some 2x plus 1, 535 00:26:10,412 --> 00:26:11,870 then you could imagine this is just 536 00:26:11,870 --> 00:26:16,461 c to the x squared times another c. 537 00:26:16,461 --> 00:26:18,770 So this is what's called repeated squaring. 538 00:26:18,770 --> 00:26:23,375 And this now allows us to compute these exponentiations, 539 00:26:23,375 --> 00:26:27,600 or modular exponentiations, in a time that's 540 00:26:27,600 --> 00:26:31,200 basically linear in the size of the exponent. 541 00:26:31,200 --> 00:26:34,110 So for every bit in the exponent, 542 00:26:34,110 --> 00:26:37,090 we're going to either square something 543 00:26:37,090 --> 00:26:40,760 or square something then do an extra multiplication. 544 00:26:40,760 --> 00:26:43,920 So that's the plan for repeated squaring. 545 00:26:43,920 --> 00:26:47,290 So now we can at least have non-embarrassing run times 546 00:26:47,290 --> 00:26:50,045 for computing modular exponents. 547 00:26:50,045 --> 00:26:54,652 Does this make sense, why this is working and why it's faster? 548 00:26:54,652 --> 00:26:56,610 All right, so what's this sliding windows trick 549 00:26:56,610 --> 00:26:58,930 that the paper talks about? 550 00:26:58,930 --> 00:27:02,500 So this is a little bit more sophisticated than this 551 00:27:02,500 --> 00:27:04,050 repeating squaring business. 552 00:27:04,050 --> 00:27:08,020 And basically the squaring is going 553 00:27:08,020 --> 00:27:09,690 to be pretty much inevitable. 
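The two rules of repeated squaring, c to the 2x equals (c to the x) squared and c to the 2x plus 1 equals (c to the x) squared times c, translate almost directly into a recursive function. The following is a minimal sketch; the name `repeated_squaring` and the modulus parameter `n` are chosen here for illustration and do not come from any particular library.

```python
def repeated_squaring(c, d, n):
    """Compute c**d mod n with one squaring per exponent bit.

    Illustrative sketch: the exponent is peeled apart from its low bit,
    exactly as in the two rules above.
    """
    if d == 0:
        return 1 % n
    half = repeated_squaring(c, d // 2, n)   # c^x where d = 2x or 2x + 1
    squared = (half * half) % n              # (c^x)^2 = c^(2x)
    if d % 2 == 0:
        return squared                       # even exponent: squaring only
    return (squared * c) % n                 # odd exponent: one extra multiply by c


# A 500-bit exponent needs about 500 recursive steps, not 2^500 multiplies.
d = (1 << 499) + 12345
print(repeated_squaring(3, d, 1000003) == pow(3, d, 1000003))
```

The extra multiply on the last line of the function is the only operation whose occurrence depends on the exponent bits, which is the cost the sliding windows trick tries to cut down.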
554 00:27:09,690 --> 00:27:13,450 But what the sliding windows optimization is trying to do 555 00:27:13,450 --> 00:27:17,570 is reduce the overhead of multiplying by this extra c 556 00:27:17,570 --> 00:27:18,656 down here. 557 00:27:18,656 --> 00:27:21,300 So suppose you have some number that 558 00:27:21,300 --> 00:27:25,470 has several 1 bits in the exponent. For every 1 bit 559 00:27:25,470 --> 00:27:27,485 in the exponent in the binary representation, 560 00:27:27,485 --> 00:27:30,670 you're going to have to do this step instead of this step. 561 00:27:30,670 --> 00:27:33,130 Because for every odd number, you're 562 00:27:33,130 --> 00:27:34,610 going to have to multiply by c. 563 00:27:34,610 --> 00:27:37,930 So these guys would like to not multiply by this c as often. 564 00:27:37,930 --> 00:27:44,754 So the plan is to precompute different powers of c. 565 00:27:44,754 --> 00:27:46,170 So what we're going to do is we're 566 00:27:46,170 --> 00:27:48,340 going to generate a table that says, 567 00:27:48,340 --> 00:27:53,020 well, here's the value of c to the x-- sorry, c to the 1-- 568 00:27:53,020 --> 00:27:56,460 here's the value of c to the 3, c to the 7. 569 00:27:56,460 --> 00:27:57,960 And I think in OpenSSL, 570 00:27:57,960 --> 00:28:02,020 it goes up to c to the 31st. 571 00:28:02,020 --> 00:28:04,780 So this table is going to just be 572 00:28:04,780 --> 00:28:08,640 precomputed when you want to do some modular exponentiation. 573 00:28:08,640 --> 00:28:11,660 You're going to precompute all the slots in this table. 574 00:28:11,660 --> 00:28:14,340 And then when you want to do this exponentiation, instead 575 00:28:14,340 --> 00:28:16,850 of doing the repeated squaring and multiplying by this c 576 00:28:16,850 --> 00:28:18,754 every time, 577 00:28:18,754 --> 00:28:20,420 you're going to use a different formula.
578 00:28:20,420 --> 00:28:26,580 It says as well if you have c to the 32x plus some y, 579 00:28:26,580 --> 00:28:29,075 well you can do c to the x, and you 580 00:28:29,075 --> 00:28:33,665 can do repeated squaring-- very much like before-- this 581 00:28:33,665 --> 00:28:38,250 is to get the 32, there's like 5 powers of 2 here 582 00:28:38,250 --> 00:28:41,560 times c to the y. 583 00:28:41,560 --> 00:28:44,055 And c to the y, you can get out of this table. 584 00:28:44,055 --> 00:28:46,770 So you can see that we're doing the same number of squaring 585 00:28:46,770 --> 00:28:48,280 as before here. 586 00:28:48,280 --> 00:28:52,270 But we don't have to multiply by c as many times. 587 00:28:52,270 --> 00:28:54,400 You're going to fish it out of this table 588 00:28:54,400 --> 00:28:56,580 and do several multiplies by c for the cost 589 00:28:56,580 --> 00:28:59,030 of a single multiply. 590 00:28:59,030 --> 00:29:00,484 This make sense? 591 00:29:00,484 --> 00:29:00,983 Yeah? 592 00:29:00,983 --> 00:29:03,876 AUDIENCE: How do you determine x and y in the first place? 593 00:29:03,876 --> 00:29:05,125 PROFESSOR: How do determine y? 594 00:29:05,125 --> 00:29:06,156 AUDIENCE: X and y. 595 00:29:06,156 --> 00:29:07,000 PROFESSOR: Oh, OK. 596 00:29:07,000 --> 00:29:08,380 So let's look at that. 597 00:29:08,380 --> 00:29:13,290 So for repeated squaring, well actually 598 00:29:13,290 --> 00:29:14,940 in both cases, what you want to do 599 00:29:14,940 --> 00:29:17,240 is you want to look at the exponent 600 00:29:17,240 --> 00:29:21,830 that you're trying to use in a binary representation. 601 00:29:21,830 --> 00:29:26,180 So suppose I'm trying to compute the value of c to the exponent, 602 00:29:26,180 --> 00:29:32,755 I don't know, 1 0 1 1 0 1 0, and maybe there's more bits. 603 00:29:32,755 --> 00:29:35,310 OK, so if we wanted to do repeated squaring, 604 00:29:35,310 --> 00:29:38,410 then you look at the lowest bit here-- it's 0. 
605 00:29:38,410 --> 00:29:39,910 So what you're going to write down 606 00:29:39,910 --> 00:29:46,346 is this is equal to c to the 1 0 1 1 0 1 squared. 607 00:29:46,346 --> 00:29:49,205 OK, so now if only you knew this value, 608 00:29:49,205 --> 00:29:50,812 then you could just square it. 609 00:29:50,812 --> 00:29:54,816 OK, now we're going to compute this guy, so c to the 1 0 1 1 610 00:29:54,816 --> 00:29:57,850 0 1 is equal to-- well here we can't use this rule 611 00:29:57,850 --> 00:30:00,400 because it's not 2x-- it's going to be 2x plus 1. 612 00:30:00,400 --> 00:30:06,030 So now we're going to write this as c to the 1 0 1 1 0 613 00:30:06,030 --> 00:30:09,430 squared times another c. 614 00:30:09,430 --> 00:30:15,020 Because it's this prefix times 2 plus this 1 at the end. 615 00:30:15,020 --> 00:30:17,140 That's how you fish it out for repeated squaring. 616 00:30:17,140 --> 00:30:19,950 And for sliding window, you just grab more bits 617 00:30:19,950 --> 00:30:20,680 from the low end. 618 00:30:20,680 --> 00:30:24,090 So if you wanted to do the sliding window trick here 619 00:30:24,090 --> 00:30:27,130 instead of taking one c out, suppose 620 00:30:27,130 --> 00:30:29,880 we do-- instead of this giant table-- maybe 621 00:30:29,880 --> 00:30:30,980 we do 3 bits at a time. 622 00:30:30,980 --> 00:30:32,785 So we go up to c to the 7th. 623 00:30:32,785 --> 00:30:36,620 So here you would grab the first 3 bits, 624 00:30:36,620 --> 00:30:40,448 and that's what you would compute here: c to the 1 625 00:30:40,448 --> 00:30:42,700 0 1 to the 8th power. 626 00:30:42,700 --> 00:30:47,995 And then, the rest is c to the 1 0 1 power here. 627 00:30:47,995 --> 00:30:50,120 It's a little unfortunate these are the same thing, 628 00:30:50,120 --> 00:30:53,001 but really there's more bits here. 629 00:30:53,001 --> 00:30:54,625 But here, this is the thing that you're 630 00:30:54,625 --> 00:30:55,875 going to look up in the table.
631 00:30:55,875 --> 00:30:57,760 This is c to the 5th in decimal. 632 00:30:57,760 --> 00:31:00,590 And this says you're going to keep doing the sliding window 633 00:31:00,590 --> 00:31:03,310 to compute this value. 634 00:31:03,310 --> 00:31:05,036 Make sense? 635 00:31:05,036 --> 00:31:06,410 This just saves on how many times 636 00:31:06,410 --> 00:31:08,760 you have to multiply by c by pre-multiplying 637 00:31:08,760 --> 00:31:10,910 it a bunch of times. 638 00:31:10,910 --> 00:31:12,870 The OpenSSL guys, at least 10 years ago, 639 00:31:12,870 --> 00:31:16,520 thought that going up to the 32nd power 640 00:31:16,520 --> 00:31:18,229 was the best plan in terms of efficiency 641 00:31:18,229 --> 00:31:20,020 because there's some trade-off here, right? 642 00:31:20,020 --> 00:31:21,728 You spend time precomputing this table, 643 00:31:21,728 --> 00:31:24,109 but then if this table is too giant, 644 00:31:24,109 --> 00:31:25,650 you're not going to use some entries, 645 00:31:25,650 --> 00:31:28,190 because if you run this table out to, 646 00:31:28,190 --> 00:31:31,700 I don't know, c to the 128 but you're computing just 647 00:31:31,700 --> 00:31:33,191 like 500-bit exponents, 648 00:31:33,191 --> 00:31:35,190 maybe you're not going to use all these entries. 649 00:31:35,190 --> 00:31:36,670 So it's going to be a waste of time. 650 00:31:36,670 --> 00:31:37,170 Question. 651 00:31:37,170 --> 00:31:41,156 AUDIENCE: [INAUDIBLE] Is there a reason 652 00:31:41,156 --> 00:31:44,128 not to compute the table [INAUDIBLE]? 653 00:31:44,128 --> 00:31:44,628 [INAUDIBLE] 654 00:31:49,460 --> 00:31:52,240 PROFESSOR: It ends up being the case 655 00:31:52,240 --> 00:31:57,740 that you don't want to-- well there's two things going on.
656 00:31:57,740 --> 00:32:01,850 One is that you'll now have code to check whether the entry is 657 00:32:01,850 --> 00:32:05,440 filled in or not, and that'll probably reduce your branch 658 00:32:05,440 --> 00:32:07,232 predictor accuracy on the CPU. So it 659 00:32:07,232 --> 00:32:09,010 will run slower in the common case 660 00:32:09,010 --> 00:32:11,903 because if you [INAUDIBLE] with the entries there. 661 00:32:11,903 --> 00:32:13,319 Another slightly annoying thing is 662 00:32:13,319 --> 00:32:15,850 that it turns out this table leaks stuff 663 00:32:15,850 --> 00:32:18,440 through a different side-channel, namely 664 00:32:18,440 --> 00:32:20,670 cache access patterns. 665 00:32:20,670 --> 00:32:23,610 So if you have some other process on the same CPU, 666 00:32:23,610 --> 00:32:26,650 you can sort of see which cache addresses are getting 667 00:32:26,650 --> 00:32:30,910 evicted out of the cache or are slower because someone accessed 668 00:32:30,910 --> 00:32:32,730 this entry or this entry. 669 00:32:32,730 --> 00:32:35,400 And the bigger this table gets, the easier 670 00:32:35,400 --> 00:32:38,630 it is to tell what the exponent bits were. 671 00:32:38,630 --> 00:32:42,930 In the limit, this table is gigantic and 672 00:32:42,930 --> 00:32:47,680 just being able to tell which cache address on this CPU 673 00:32:47,680 --> 00:32:50,345 had a miss tells you that the encryption process must 674 00:32:50,345 --> 00:32:51,965 have accessed that entry in the table. 675 00:32:51,965 --> 00:32:55,450 And that tells you that, oh, that long bit sequence appears somewhere 676 00:32:55,450 --> 00:32:58,170 in your secret key exponent. 677 00:32:58,170 --> 00:33:00,930 So I guess the answer is, mathematically 678 00:33:00,930 --> 00:33:03,080 you could totally fill this in on demand. 679 00:33:03,080 --> 00:33:06,550 In practice, you probably don't want it to be that giant.
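The windowed exponentiation discussed above can be sketched as follows. This is an illustrative left-to-right variant written for this transcript, not OpenSSL's actual code: the function name `sliding_window_pow` and the default window width `w=5` (which gives a table of odd powers up to c to the 31st) are assumptions made for the example. It scans the exponent from the high end, which is a common way to write it, whereas the board example peels bits from the low end.

```python
def sliding_window_pow(c, d, n, w=5):
    """Compute c**d mod n using a precomputed table of odd powers of c."""
    if d == 0:
        return 1 % n

    # Precompute c^1, c^3, c^5, ..., c^(2^w - 1) mod n.
    # Each entry reuses the previous one: c^k = c^(k-2) * c^2.
    c_squared = (c * c) % n
    table = {1: c % n}
    for k in range(3, 1 << w, 2):
        table[k] = (table[k - 2] * c_squared) % n

    bits = bin(d)[2:]              # exponent bits, most significant first
    result = 1
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            result = (result * result) % n      # zero bit: squaring only
            i += 1
        else:
            # Take the longest window of at most w bits that ends in a 1,
            # so its value is odd and therefore present in the table.
            j = min(i + w, len(bits))
            while bits[j - 1] == "0":
                j -= 1
            window = int(bits[i:j], 2)
            for _ in range(j - i):              # one squaring per window bit
                result = (result * result) % n
            result = (result * table[window]) % n   # a single table multiply
            i = j
    return result


d = (1 << 499) + 987654321
print(sliding_window_pow(3, d, 1000003) == pow(3, d, 1000003))
```

The number of squarings is unchanged, one per exponent bit, but a run of up to five bits now costs one table multiply instead of one multiply per 1 bit.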
680 00:33:06,550 --> 00:33:08,810 And also, if it's particularly giant, 681 00:33:08,810 --> 00:33:12,350 you aren't going to be able to use entries as efficiently as 682 00:33:12,350 --> 00:33:13,250 well. 683 00:33:13,250 --> 00:33:14,910 You can reuse these entries as you're 684 00:33:14,910 --> 00:33:16,576 computing. [INAUDIBLE] It's not actually 685 00:33:16,576 --> 00:33:19,460 that expensive because you use c cubed 686 00:33:19,460 --> 00:33:23,330 when you're computing c to the 7th and so on and so forth. 687 00:33:23,330 --> 00:33:25,644 It's not that bad. 688 00:33:25,644 --> 00:33:26,800 Make sense? 689 00:33:26,800 --> 00:33:30,040 Other questions? 690 00:33:30,040 --> 00:33:31,260 All right. 691 00:33:31,260 --> 00:33:35,250 So this is the repeated squaring and sliding 692 00:33:35,250 --> 00:33:41,384 window optimization that OpenSSL implements 693 00:33:41,384 --> 00:33:43,550 [INAUDIBLE] I don't actually know whether they still 694 00:33:43,550 --> 00:33:46,252 have the same size of the sliding window or not. 695 00:33:46,252 --> 00:33:48,460 But it does actually give you a fair bit of speedup. 696 00:33:48,460 --> 00:33:53,135 So before, you had to square for every bit in the exponent. 697 00:33:53,135 --> 00:33:57,060 And then you'd have to have a multiply for every 1 bit. 698 00:33:57,060 --> 00:33:59,990 So if you have a 512-bit exponent, then 699 00:33:59,990 --> 00:34:02,880 you're going to do 512 squarings and, on average, 700 00:34:02,880 --> 00:34:06,349 roughly 256 multiplications by c. 701 00:34:06,349 --> 00:34:07,890 So with sliding windows, you're going 702 00:34:07,890 --> 00:34:11,469 to still do the 512 squarings because there's 703 00:34:11,469 --> 00:34:13,280 no getting around that.
704 00:34:13,280 --> 00:34:16,050 But instead of doing 256 multiplies by c, 705 00:34:16,050 --> 00:34:19,214 you're going to hopefully do way fewer, 706 00:34:19,214 --> 00:34:21,130 maybe something on the order of 32 [INAUDIBLE] 707 00:34:21,130 --> 00:34:24,900 multiplies by some entry in this table. 708 00:34:24,900 --> 00:34:27,489 So that's the general plan. 709 00:34:27,489 --> 00:34:31,400 [INAUDIBLE] Not as dramatic as CRT, not 2x, 710 00:34:31,400 --> 00:34:33,760 but it could save you like almost 1.5x. 711 00:34:37,516 --> 00:34:40,660 All depending on exactly what [INAUDIBLE]. 712 00:34:40,660 --> 00:34:42,870 Make sense? 713 00:34:42,870 --> 00:34:45,888 Another question about this? 714 00:34:45,888 --> 00:34:47,260 All right. 715 00:34:47,260 --> 00:34:50,360 So these are the [? roughly ?] easier optimizations. 716 00:34:50,360 --> 00:34:53,040 And then there's two clever tricks 717 00:34:53,040 --> 00:34:57,290 playing with numbers for how to do just a multiplication more 718 00:34:57,290 --> 00:34:59,150 efficiently. 719 00:34:59,150 --> 00:35:01,690 So the first one of these optimizations 720 00:35:01,690 --> 00:35:04,080 that we're going to look at-- I think 721 00:35:04,080 --> 00:35:08,060 I'll raise this board-- is called this Montgomery 722 00:35:08,060 --> 00:35:09,820 representation. 723 00:35:09,820 --> 00:35:13,190 And we'll see in a second why it's 724 00:35:13,190 --> 00:35:14,800 particularly important for us. 725 00:35:23,820 --> 00:35:26,700 So the problem that this Montgomery representation 726 00:35:26,700 --> 00:35:29,150 optimization is trying to solve for us 727 00:35:29,150 --> 00:35:33,170 is the fact that every time we do a multiply, 728 00:35:33,170 --> 00:35:34,880 we get a number that keeps growing 729 00:35:34,880 --> 00:35:36,650 and growing and growing. 
730 00:35:36,650 --> 00:35:40,690 In particular, both in sliding windows 731 00:35:40,690 --> 00:35:43,750 or in repeated squaring, actually when 732 00:35:43,750 --> 00:35:46,010 you square you multiply 2 numbers together, 733 00:35:46,010 --> 00:35:47,510 when you multiply by c to the y, you 734 00:35:47,510 --> 00:35:48,685 multiply 2 numbers together. 735 00:35:48,685 --> 00:35:53,010 And the problem is that if the inputs to the multiplication 736 00:35:53,010 --> 00:35:56,910 were, let's say, 512 bits each. 737 00:35:56,910 --> 00:35:59,140 Then the result of the multiplication 738 00:35:59,140 --> 00:36:01,130 is going to be 1,000 bits. 739 00:36:01,130 --> 00:36:03,120 And then you'd take this 1,000 bit result 740 00:36:03,120 --> 00:36:04,746 and you multiply it again by something 741 00:36:04,746 --> 00:36:05,870 like five [INAUDIBLE] bits. 742 00:36:05,870 --> 00:36:08,910 And now it's 1,500 bits, 2,000 bits, 2,500 bits, 743 00:36:08,910 --> 00:36:10,790 and it keeps growing and growing. 744 00:36:10,790 --> 00:36:13,430 And you really don't want this because multiplications 745 00:36:13,430 --> 00:36:17,670 [? quadratic ?] in the size of the number we're multiplying. 746 00:36:17,670 --> 00:36:19,430 So we have to keep the size of our number 747 00:36:19,430 --> 00:36:21,985 as small as possible, which means basically 512 748 00:36:21,985 --> 00:36:27,360 bits because all this computation is mod p or mod q. 749 00:36:27,360 --> 00:36:28,045 Yeah? 750 00:36:28,045 --> 00:36:29,670 AUDIENCE: What do you want [INAUDIBLE]? 751 00:36:31,960 --> 00:36:33,210 PROFESSOR: That's right, yeah. 752 00:36:33,210 --> 00:36:36,240 So the cool thing is that we can keep this number down 753 00:36:36,240 --> 00:36:37,640 because what we do is, let's say, 754 00:36:37,640 --> 00:36:40,730 we want to compute c to the x just for this example. 755 00:36:40,730 --> 00:36:41,524 Squared. 756 00:36:41,524 --> 00:36:43,270 Squared again. 
757 00:36:43,270 --> 00:36:44,350 Squared again. 758 00:36:44,350 --> 00:36:46,610 What you could do is you compute c to the x 759 00:36:46,610 --> 00:36:49,740 then you take mod p, let's say, right. 760 00:36:49,740 --> 00:36:53,110 Then you square it, then you do mod p again. 761 00:36:53,110 --> 00:36:56,820 Then you square it again, and then you do mod p again. 762 00:36:56,820 --> 00:36:57,539 And so on. 763 00:36:57,539 --> 00:36:59,330 So this is basically what you're proposing. 764 00:36:59,330 --> 00:37:00,100 So this is great. 765 00:37:00,100 --> 00:37:02,830 In fact, this keeps the size of our numbers 766 00:37:02,830 --> 00:37:05,260 to basically 512 bits, which is about as 767 00:37:05,260 --> 00:37:06,890 small as we can get. 768 00:37:06,890 --> 00:37:08,710 This is good in terms of keeping down 769 00:37:08,710 --> 00:37:11,940 the size of these numbers for multiplication. 770 00:37:11,940 --> 00:37:15,310 But it's actually kind of expensive to do this mod p 771 00:37:15,310 --> 00:37:16,920 operation. 772 00:37:16,920 --> 00:37:19,240 Because the way that you do mod p of something is 773 00:37:19,240 --> 00:37:21,740 you basically have to do division. 774 00:37:21,740 --> 00:37:24,510 And division is way worse than multiplication. 775 00:37:24,510 --> 00:37:27,730 I'm not going to go through the algorithms for division, 776 00:37:27,730 --> 00:37:30,520 but it's really slow. 777 00:37:30,520 --> 00:37:33,907 You usually want to avoid division as much as possible. 778 00:37:33,907 --> 00:37:36,240 Because it's not even just a straightforward programming 779 00:37:36,240 --> 00:37:39,290 thing, you have to do some approximation algorithm, 780 00:37:39,290 --> 00:37:41,780 some sort of Newton's method, 781 00:37:41,780 --> 00:37:43,330 and just keep it [INAUDIBLE]. 782 00:37:43,330 --> 00:37:44,790 It's going to be slow.
783 00:37:44,790 --> 00:37:47,290 And in the naive implementation, this actually 784 00:37:47,290 --> 00:37:50,640 turns out to be the slowest part of doing multiplication. 785 00:37:50,640 --> 00:37:52,230 The multiplication is cheap. 786 00:37:52,230 --> 00:37:56,210 But then doing mod p or mod q to bring it back down in size 787 00:37:56,210 --> 00:37:59,190 is going to be actually more expensive than the multiplying. 788 00:37:59,190 --> 00:38:01,480 So that's actually kind of a bummer. 789 00:38:01,480 --> 00:38:04,560 So the way that we're going to get around this 790 00:38:04,560 --> 00:38:08,590 is by doing this multiplication in this clever other 791 00:38:08,590 --> 00:38:13,280 representation, and I'll show you the trick here. 792 00:38:13,280 --> 00:38:14,780 Let's see. 793 00:38:14,780 --> 00:38:16,680 Bear with me for a second, and then we'll 794 00:38:16,680 --> 00:38:21,082 see why it's so fast to use this Montgomery trick. 795 00:38:21,082 --> 00:38:26,190 And the basic idea is to represent numbers, 796 00:38:26,190 --> 00:38:29,570 these are regular numbers that you might actually 797 00:38:29,570 --> 00:38:30,852 want to multiply. 798 00:38:30,852 --> 00:38:32,980 And we're going to have a different representation 799 00:38:32,980 --> 00:38:35,313 for these numbers, called the Montgomery representation. 800 00:38:37,530 --> 00:38:41,190 And that representation is actually very easy. 801 00:38:41,190 --> 00:38:43,990 We just take the value a and we multiply it 802 00:38:43,990 --> 00:38:46,000 by some magic value R. 803 00:38:46,000 --> 00:38:48,250 I'll tell you what this R is in a second. 804 00:38:48,250 --> 00:38:51,710 But let's first figure out if you pick some arbitrary value 805 00:38:51,710 --> 00:38:53,820 R, what's going to happen here? 806 00:38:53,820 --> 00:38:56,200 So we take 2 numbers, a and b. 807 00:38:56,200 --> 00:39:00,075 Their Montgomery representations are sort of expectedly.
808 00:39:00,075 --> 00:39:02,840 A is aR, b is bR. 809 00:39:02,840 --> 00:39:05,920 And if you want to compute the product of a times b, 810 00:39:05,920 --> 00:39:08,100 well in Montgomery space, you can also 811 00:39:08,100 --> 00:39:09,160 multiply these guys out. 812 00:39:09,160 --> 00:39:13,310 You can take aR multiply it by bR. 813 00:39:13,310 --> 00:39:17,130 And what you get here is ab times R squared. 814 00:39:17,130 --> 00:39:18,770 So there are two Rs now. 815 00:39:18,770 --> 00:39:22,570 That's kind of annoying, but you can divide that by R. 816 00:39:22,570 --> 00:39:29,610 And we get ab times R. So this is probably weird in a sense 817 00:39:29,610 --> 00:39:32,190 that why would you multiply this extra number. 818 00:39:32,190 --> 00:39:34,525 But let's first figure out whether this is correct. 819 00:39:34,525 --> 00:39:37,179 And then we'll figure out why this is going to be faster. 820 00:39:37,179 --> 00:39:39,220 So it's correct in the sense that it's very easy. 821 00:39:39,220 --> 00:39:40,840 If you want to multiply some numbers, 822 00:39:40,840 --> 00:39:43,364 we just multiply by this R value and get the Montgomery 823 00:39:43,364 --> 00:39:44,208 representation. 824 00:39:44,208 --> 00:39:45,980 Then we can do all these multiplications 825 00:39:45,980 --> 00:39:47,920 to these Montgomery forms. 826 00:39:47,920 --> 00:39:50,264 And every time we multiply 2 numbers, 827 00:39:50,264 --> 00:39:52,180 we have to divide by R, look at the Montgomery 828 00:39:52,180 --> 00:39:54,550 form of the multiplication result. 829 00:39:54,550 --> 00:39:56,360 And then when we're done doing all 830 00:39:56,360 --> 00:39:58,780 of our squarings, multiplication, all this stuff, 831 00:39:58,780 --> 00:40:01,180 we're going to move back to the normal, regular form 832 00:40:01,180 --> 00:40:04,890 by just dividing by R one last time. 
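The bookkeeping for Montgomery form can be written out as a short worked equation. This is a sketch in the lecture's notation; the bar notation is introduced here only as shorthand for "the Montgomery form of", and all equalities are understood modulo the prime being used (written q below).

```latex
\begin{align*}
\bar{a} &= aR \bmod q, \qquad \bar{b} = bR \bmod q
  && \text{(convert into Montgomery form)} \\
\bar{a}\,\bar{b} &\equiv (aR)(bR) = ab\,R^{2}
  && \text{(plain product carries one } R \text{ too many)} \\
\frac{\bar{a}\,\bar{b}}{R} &\equiv ab\,R = \overline{ab}
  && \text{(divide by } R \text{ once per multiplication)} \\
\frac{\overline{ab}}{R} &\equiv ab
  && \text{(one final division by } R \text{ converts back)}
\end{align*}
```

So every intermediate result stays in Montgomery form, and the only division ever needed is a division by R.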
833 00:40:04,890 --> 00:40:06,586 AUDIENCE: [INAUDIBLE] 834 00:40:06,586 --> 00:40:08,086 PROFESSOR: We're now going to pick R 835 00:40:08,086 --> 00:40:09,560 to be a very nice number. 836 00:40:09,560 --> 00:40:11,900 And in particular, we're going to pick R 837 00:40:11,900 --> 00:40:17,780 to be a very nice number to make this division by R very fast. 838 00:40:17,780 --> 00:40:21,320 And the cool thing is that if this division by R 839 00:40:21,320 --> 00:40:24,499 is going to be very fast, then this 840 00:40:24,499 --> 00:40:26,290 is going to be a small number and we're not 841 00:40:26,290 --> 00:40:29,460 going to have to do this mod q very often. 842 00:40:29,460 --> 00:40:32,120 In particular, aR, let's say, is also 843 00:40:32,120 --> 00:40:34,530 going to be roughly 500 bits because it's all actually 844 00:40:34,530 --> 00:40:36,630 mod p or mod q. 845 00:40:36,630 --> 00:40:39,320 So aR is 500 bits. 846 00:40:39,320 --> 00:40:41,230 BR is going to also be 500 bits. 847 00:40:41,230 --> 00:40:44,160 So this product is going to be 1,000 bits. 848 00:40:44,160 --> 00:40:46,830 This R is going to be this nice 500 roughly bit 849 00:40:46,830 --> 00:40:48,630 number, same size as p. 850 00:40:48,630 --> 00:40:50,925 And if we can make this division to be fast, 851 00:40:50,925 --> 00:40:55,744 then the result is going to be a roughly 500 bit number here. 852 00:40:55,744 --> 00:40:57,910 So we were able to do the multiplying without having 853 00:40:57,910 --> 00:40:59,400 to do an extra divide. 854 00:40:59,400 --> 00:41:03,920 Dividing by R cheaply gives us this small result, getting us 855 00:41:03,920 --> 00:41:08,360 out of doing a mod p for most situations. 856 00:41:08,360 --> 00:41:11,670 OK, so what is this weird number that I keep talking about? 857 00:41:11,670 --> 00:41:17,944 Well R is just going to be 2 to 512. 858 00:41:17,944 --> 00:41:22,930 It's going to be 1 followed by a ton of zeros. 
859 00:41:22,930 --> 00:41:25,260 So multiplying by this is easy, you just 860 00:41:25,260 --> 00:41:27,320 append a bunch of zeros to a number. 861 00:41:27,320 --> 00:41:32,960 Dividing could be easy if the low bits of the result 862 00:41:32,960 --> 00:41:34,547 are all zeros. 863 00:41:34,547 --> 00:41:37,750 So if you have a value that's a bunch of bits 864 00:41:37,750 --> 00:41:41,460 followed by 512 zeros, then dividing by 2 to the 512 865 00:41:41,460 --> 00:41:41,960 is cheap. 866 00:41:41,960 --> 00:41:44,337 You just discard the zeros on the right-hand side. 867 00:41:44,337 --> 00:41:47,140 And that's actually the correct division. 868 00:41:47,140 --> 00:41:48,650 Does that make sense? 869 00:41:48,650 --> 00:41:50,311 The slight problem is that we actually 870 00:41:50,311 --> 00:41:51,664 don't have zeros on the right hand side 871 00:41:51,664 --> 00:41:53,110 when you do this multiplication. 872 00:41:53,110 --> 00:41:56,750 These are like real 512 bit numbers with all the 512 bits 873 00:41:56,750 --> 00:41:57,460 used. 874 00:41:57,460 --> 00:41:58,890 So this will be a 1,000 bit number 875 00:41:58,890 --> 00:42:02,352 [? or ?] with all this bits also set to randomly 0 or 1, 876 00:42:02,352 --> 00:42:03,560 depending on what's going on. 877 00:42:03,560 --> 00:42:06,460 So we can't just discard the low bits. 878 00:42:06,460 --> 00:42:09,144 But the cleverness comes from the fact 879 00:42:09,144 --> 00:42:11,210 that the only thing we care about 880 00:42:11,210 --> 00:42:14,370 is the value of this thing mod p. 881 00:42:14,370 --> 00:42:18,610 So you can always add multiples of p to this value 882 00:42:18,610 --> 00:42:22,380 without changing it when it's equivalent to mod p. 883 00:42:22,380 --> 00:42:25,130 And as a result, we can add multiples of p 884 00:42:25,130 --> 00:42:28,020 to get the low bits to all be zeros. 885 00:42:28,020 --> 00:42:30,510 So let's look through some simple examples. 
886 00:42:30,510 --> 00:42:33,390 I'm not going to write out 512 bits on the board. 887 00:42:33,390 --> 00:42:37,325 But suppose that-- here's a short example. 888 00:42:40,200 --> 00:42:42,710 Suppose that we have a situation where 889 00:42:42,710 --> 00:42:46,340 our value R is 2 to the 4th. 890 00:42:46,340 --> 00:42:49,810 So it's 1 followed by four zeros. 891 00:42:49,810 --> 00:42:53,170 So this is a much smaller example than the real thing. 892 00:42:53,170 --> 00:42:55,140 But let's see how this Montgomery division 893 00:42:55,140 --> 00:42:57,170 is going to work out. 894 00:42:57,170 --> 00:43:02,600 So suppose we're going to try to compute stuff mod q, where 895 00:43:02,600 --> 00:43:05,570 q, let's say, is maybe 7. 896 00:43:05,570 --> 00:43:10,000 So this is 1 1 1 in binary form. 897 00:43:10,000 --> 00:43:12,970 And what we're going to try to do 898 00:43:12,970 --> 00:43:16,360 is maybe we did some multiplication. 899 00:43:16,360 --> 00:43:19,700 And this value aR times bR is equal 900 00:43:19,700 --> 00:43:26,520 to this binary representation 1 1 0 1 0. 901 00:43:26,520 --> 00:43:31,060 So this is going to be the value of aR times bR. 902 00:43:31,060 --> 00:43:32,780 How do we divide it by R? 903 00:43:32,780 --> 00:43:35,175 So clearly the low four bits aren't all 0, 904 00:43:35,175 --> 00:43:37,472 so we can't just divide it out. 905 00:43:37,472 --> 00:43:40,680 But we can add multiples of q. 906 00:43:40,680 --> 00:43:45,510 In particular, we can add 2 times q. 907 00:43:45,510 --> 00:43:49,700 So 2q is equal to 1 1 1 0. 908 00:43:49,700 --> 00:43:56,740 And now what we get is 0 0, carry a 1, 0, 909 00:43:56,740 --> 00:44:01,520 carry a 1, 1, carry a 1, 0 1. 910 00:44:01,520 --> 00:44:02,520 I hope I did that right. 911 00:44:02,520 --> 00:44:03,530 So this is what we get. 912 00:44:03,530 --> 00:44:07,207 So now we get aR bR plus 2q. 913 00:44:07,207 --> 00:44:09,290 But we actually don't care about the plus 2q.
914 00:44:09,290 --> 00:44:11,123 It's actually fine because all we care about 915 00:44:11,123 --> 00:44:12,190 is the value of mod q. 916 00:44:15,190 --> 00:44:18,070 And now we're closer, we have three 0 bits at the bottom. 917 00:44:18,070 --> 00:44:20,190 Now we can add another multiple of q. 918 00:44:20,190 --> 00:44:23,000 This time it's going to be probably 8q. 919 00:44:23,000 --> 00:44:26,680 So we add 1 1 1 here 0 0. 920 00:44:26,680 --> 00:44:29,905 And if we add it, we're going to get, let's say, 921 00:44:29,905 --> 00:44:37,120 0 0 0 then add these two guys 0, carry a 1, 0, carry a 1, 1 1. 922 00:44:37,120 --> 00:44:38,250 I think that's right. 923 00:44:38,250 --> 00:44:41,390 But now we have our original aR bR 924 00:44:41,390 --> 00:44:45,030 plus 2q plus 8q is equal to this thing. 925 00:44:45,030 --> 00:44:48,720 And finally, we can divide this thing by R very cheaply. 926 00:44:48,720 --> 00:44:54,762 Because we just discard the low four zeros. 927 00:44:54,762 --> 00:44:56,205 Make sense? 928 00:44:56,205 --> 00:44:57,167 Question. 929 00:44:57,167 --> 00:45:01,150 AUDIENCE: Is aR bR always going to end in, I guess, 930 00:45:01,150 --> 00:45:03,270 1,024 zeros? 931 00:45:03,270 --> 00:45:08,021 PROFESSOR: No, and the reason is that-- OK, 932 00:45:08,021 --> 00:45:10,130 here is the thing that's maybe confusing. 933 00:45:10,130 --> 00:45:12,710 A was, let's say, 512 bits. 934 00:45:12,710 --> 00:45:15,470 Then you multiply it by R. So here, you're right. 935 00:45:15,470 --> 00:45:19,380 This value is that 1,000 bit number where the high bit is 936 00:45:19,380 --> 00:45:20,980 a, the high 512 bits are a. 937 00:45:20,980 --> 00:45:22,794 And the low bits are all zeros. 938 00:45:22,794 --> 00:45:24,710 But then, you're going [? to do it with ?] mod 939 00:45:24,710 --> 00:45:27,410 q to bring it down to make it smaller. 940 00:45:27,410 --> 00:45:29,570 And in general, this is going to be the case. 
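The worked example with R = 2 to the 4th and q = 7 can be replayed in a few lines of Python. This is only an illustrative sketch of the bit-by-bit trick described on the board, not code from OpenSSL or the paper; the function name `divide_by_r_bitwise` and its parameters are made up for this example.

```python
def divide_by_r_bitwise(t, q, k):
    """Compute t / 2**k (mod q) for odd q by zeroing the low k bits of t.

    Hypothetical helper for illustration. Whenever bit i of t is set, we add
    q shifted left by i. Adding a multiple of q never changes t mod q, and
    because q is odd the addition flips bit i to 0.
    """
    assert q % 2 == 1, "q must be odd for this to work"
    for i in range(k):
        if (t >> i) & 1:
            t += q << i          # add q * 2**i (2q, then 8q in the lecture example)
    return t >> k                # the low k bits are now zero, so this is exact


# Lecture example: aR * bR = 0b11010 (26), q = 7, R = 2**4.
result = divide_by_r_bitwise(0b11010, 7, 4)
print(result)                    # 6
```

The result times R is congruent to the starting value mod q, which is what "dividing by R" means in modular arithmetic.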
941 00:45:29,570 --> 00:45:32,745 Because [? it only ?] has these low zeros the first time 942 00:45:32,745 --> 00:45:33,370 you convert it. 943 00:45:33,370 --> 00:45:35,119 But after you do a couple multiplications, 944 00:45:35,119 --> 00:45:37,685 they're going to be arbitrary bits. 945 00:45:37,685 --> 00:45:40,270 So these guys are-- so I really should 946 00:45:40,270 --> 00:45:43,260 have written mod q here-- and to compute this mod q 947 00:45:43,260 --> 00:45:49,356 as soon as you do the conversion to keep the whole value small. 948 00:45:49,356 --> 00:45:50,802 AUDIENCE: [INAUDIBLE] 949 00:45:50,802 --> 00:45:53,460 PROFESSOR: Yeah, so the initial conversion is expensive 950 00:45:53,460 --> 00:45:58,650 or at least it's as expensive as doing a regular modulus 951 00:45:58,650 --> 00:46:01,010 during the multiplication. 952 00:46:01,010 --> 00:46:03,010 The cool thing is that you pay this cost 953 00:46:03,010 --> 00:46:05,176 just once when you do the conversion into Montgomery 954 00:46:05,176 --> 00:46:06,122 form. 955 00:46:06,122 --> 00:46:09,240 And then, instead of converting it back at every step, 956 00:46:09,240 --> 00:46:11,235 you just keep it in Montgomery form. 957 00:46:11,235 --> 00:46:13,700 But remember that in order to do an exponentiation 958 00:46:13,700 --> 00:46:16,064 to an exponent which has 512 bits, 959 00:46:16,064 --> 00:46:17,480 you're saying you're going to have 960 00:46:17,480 --> 00:46:21,320 to do over 500 multiplications because we have to do at least 961 00:46:21,320 --> 00:46:23,870 500 squarings plus then some. 962 00:46:23,870 --> 00:46:27,000 So you do these mod q twice and then 963 00:46:27,000 --> 00:46:30,370 you get a lot of cheap divisions if you stay in this form. 964 00:46:30,370 --> 00:46:34,500 And then you do a division by R to get back to this form again. 
965 00:46:34,500 --> 00:46:37,520 So instead of doing 500 mod qs, one for every multiplication step, 966 00:46:37,520 --> 00:46:39,366 you do mod q just twice. 967 00:46:39,366 --> 00:46:41,510 And then you keep doing these divisions 968 00:46:41,510 --> 00:46:45,080 by R cheaply using this trick. 969 00:46:45,080 --> 00:46:45,580 Question. 970 00:46:45,580 --> 00:46:49,460 AUDIENCE: So when you're adding the multiples of q 971 00:46:49,460 --> 00:46:51,400 and then dividing by R, [INAUDIBLE] 972 00:46:54,310 --> 00:46:56,780 PROFESSOR: Because mod q means 973 00:46:56,780 --> 00:46:58,920 the remainder when you divide by q. 974 00:46:58,920 --> 00:47:07,990 So x plus y times q, mod q is just x mod q. 975 00:47:07,990 --> 00:47:08,930 AUDIENCE: [INAUDIBLE] 976 00:47:12,230 --> 00:47:16,089 PROFESSOR: So in this case, dividing by-- so another sort 977 00:47:16,089 --> 00:47:17,630 of nice property is that because it's 978 00:47:17,630 --> 00:47:22,450 all modulo a prime number-- it's also true 979 00:47:22,450 --> 00:47:28,080 that if you have x plus yq divided by R, 980 00:47:28,080 --> 00:47:35,790 mod q is actually the same as x divided by R mod q. 981 00:47:35,790 --> 00:47:39,180 The way to think of it is that there's no real division 982 00:47:39,180 --> 00:47:40,650 in modular arithmetic. 983 00:47:40,650 --> 00:47:41,730 It's just an inverse. 984 00:47:41,730 --> 00:47:44,060 So what this really says is this is actually 985 00:47:44,060 --> 00:47:49,465 x plus yq times some number called R inverse. 986 00:47:49,465 --> 00:47:52,930 And then you compute this whole thing mod q. 987 00:47:52,930 --> 00:47:57,210 And then you could think of this as x times R inverse 988 00:47:57,210 --> 00:48:05,320 mod q plus y q R inverse mod q. 989 00:48:05,320 --> 00:48:08,610 And this thing cancels out because it's something times q. 990 00:48:15,060 --> 00:48:17,856 And there's some closed form for this thing.
991 00:48:17,856 --> 00:48:22,195 So here I did it by bit by bit, 2q then 8q, et cetera. 992 00:48:22,195 --> 00:48:23,765 It's actually a nice closed formula 993 00:48:23,765 --> 00:48:25,630 you can compute-- it's in the lecture notes, 994 00:48:25,630 --> 00:48:27,880 but it's probably not worth spending time on the board 995 00:48:27,880 --> 00:48:31,215 here-- for how do you figure out what multiple of q 996 00:48:31,215 --> 00:48:35,331 should you add to get all the low bits to turn to 0. 997 00:48:35,331 --> 00:48:38,200 So then it turns out that in order to do this division by R, 998 00:48:38,200 --> 00:48:43,450 you just need to compute this magic multiple of q, add it. 999 00:48:43,450 --> 00:48:46,290 And then discard the low bits and that 1000 00:48:46,290 --> 00:48:53,047 brings your number back to 512 bits, or whatever the size is. 1001 00:48:53,047 --> 00:48:54,029 OK. 1002 00:48:54,029 --> 00:48:55,790 And here's the subtlety. 1003 00:48:55,790 --> 00:48:57,470 The only reason we're talking about this 1004 00:48:57,470 --> 00:49:00,470 is that there's something funny going on here 1005 00:49:00,470 --> 00:49:05,090 that is going to allow us to learn timing information. 1006 00:49:05,090 --> 00:49:09,780 And in particular, even though we divided by R, 1007 00:49:09,780 --> 00:49:12,770 we know the result is going to be 512 bits. 1008 00:49:12,770 --> 00:49:15,123 But it still might be greater than q 1009 00:49:15,123 --> 00:49:16,820 because q isn't exactly [? up to 512 ?], 1010 00:49:16,820 --> 00:49:18,340 it's not a 512 bit number. 1011 00:49:18,340 --> 00:49:20,840 So it might be a little bit less than R. 1012 00:49:20,840 --> 00:49:24,730 So it might be that after we do this cheap division by R, 1013 00:49:24,730 --> 00:49:26,960 [? the way ?] we subtract out q one more 1014 00:49:26,960 --> 00:49:29,690 time because we get something that's small but not 1015 00:49:29,690 --> 00:49:31,400 quite small enough. 
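The closed formula is not written out in this part of the lecture, so here is a hedged sketch of the textbook Montgomery reduction (often called REDC), which computes the "magic multiple of q" in one step and then does the possible final subtraction. The function name and the returned flag are assumptions made for illustration; `pow(q, -1, r)` needs Python 3.8 or later.

```python
def montgomery_reduce(t, q, k):
    """Return (t / 2**k mod q, extra_reduction_happened) for odd q.

    Textbook REDC sketch, assuming 0 <= t < q * 2**k. Not OpenSSL's code.
    """
    r = 1 << k
    q_neg_inv = (-pow(q, -1, r)) % r        # -q^{-1} mod R (Python 3.8+)
    m = ((t & (r - 1)) * q_neg_inv) & (r - 1)  # the multiple of q to add
    u = (t + m * q) >> k                    # low k bits are zero, shift is exact
    extra = u >= q                          # result can be in [q, 2q)
    if extra:
        u -= q                              # the data-dependent "extra reduction"
    return u, extra


# Lecture example: t = 26, q = 7, R = 16. Here m = 10, i.e. 2q + 8q.
print(montgomery_reduce(26, 7, 4))          # (6, False)
# A larger case where the final subtraction is needed.
print(montgomery_reduce(250 * 250, 251, 8)) # (201, True)
```

The `if extra` branch is the part that matters for timing: whether it runs depends on the values being multiplied.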
1016 00:49:31,400 --> 00:49:34,740 So there's a chance that after doing this division, 1017 00:49:34,740 --> 00:49:39,740 we maybe have to also subtract q again. 1018 00:49:39,740 --> 00:49:42,390 And this subtraction is going to be part of what 1019 00:49:42,390 --> 00:49:44,250 this attack is all about. 1020 00:49:44,250 --> 00:49:48,060 It turns out that subtracting this q adds time. 1021 00:49:48,060 --> 00:49:51,660 And someone figured out-- not these guys 1022 00:49:51,660 --> 00:49:53,050 but some previous work-- that you 1023 00:49:53,050 --> 00:49:56,770 can show that this probability of doing this thing, this 1024 00:49:56,770 --> 00:49:58,145 is called an extra reduction. 1025 00:50:03,500 --> 00:50:10,020 This probability sort of depends on the particular value 1026 00:50:10,020 --> 00:50:12,410 that you're exponentiating. 1027 00:50:12,410 --> 00:50:19,790 So if you're computing x to the d mod q, 1028 00:50:19,790 --> 00:50:22,400 the probability of an extra reduction, 1029 00:50:22,400 --> 00:50:25,240 at some point while computing x to the d mod q, 1030 00:50:25,240 --> 00:50:31,860 is going to be equal to x mod q divided by 2R. 1031 00:50:36,890 --> 00:50:40,390 So if we're going to be computing x to the d mod q, 1032 00:50:40,390 --> 00:50:43,690 then depending on what the value of x mod q 1033 00:50:43,690 --> 00:50:45,410 is, whether it's big or small, you're 1034 00:50:45,410 --> 00:50:49,080 going to have even more or less of these extra reductions. 1035 00:50:49,080 --> 00:50:51,577 And just to show you where this is going to fit in, 1036 00:50:51,577 --> 00:50:53,785 this is actually going to happen in the decrypt step, 1037 00:50:53,785 --> 00:50:55,951 because during the decrypt step, the server is going 1038 00:50:55,951 --> 00:50:57,330 to be computing c to the d.
1039 00:50:57,330 --> 00:51:00,650 And this says the extra reductions 1040 00:51:00,650 --> 00:51:05,160 are going to be proportional to how close x, or c in this case, 1041 00:51:05,160 --> 00:51:07,254 is to the value q. 1042 00:51:07,254 --> 00:51:08,920 So this is going to be worrisome, right, 1043 00:51:08,920 --> 00:51:12,490 because the attacker gets to choose the input c. 1044 00:51:12,490 --> 00:51:14,640 And the number of extra reductions 1045 00:51:14,640 --> 00:51:16,940 is going to be proportional to how close the c is 1046 00:51:16,940 --> 00:51:18,981 to one of the factors, the q. 1047 00:51:18,981 --> 00:51:21,260 And this is how you're going to tell I'm getting close 1048 00:51:21,260 --> 00:51:23,337 to the q, or I've overshot q. 1049 00:51:23,337 --> 00:51:25,545 If, all of a sudden, there are no extra reductions, 1050 00:51:25,545 --> 00:51:28,556 it's probably because x mod q is very small: x is 1051 00:51:28,556 --> 00:51:29,472 q plus little epsilon. 1052 00:51:29,472 --> 00:51:31,720 And it's very small. 1053 00:51:31,720 --> 00:51:33,942 So that's one part of the timing attack 1054 00:51:33,942 --> 00:51:35,650 we're going to be looking at in a second. 1055 00:51:38,770 --> 00:51:42,740 I don't have any proof that this is actually true [INAUDIBLE] 1056 00:51:42,740 --> 00:51:44,905 these extra reductions work like this. 1057 00:51:44,905 --> 00:51:45,680 Yeah, question. 1058 00:51:45,680 --> 00:51:48,700 AUDIENCE: What happens if you don't do this extra reduction? 1059 00:51:48,700 --> 00:51:51,210 PROFESSOR: Oh, what happens if you don't do this extra 1060 00:51:51,210 --> 00:51:51,710 reduction? 1061 00:51:55,510 --> 00:51:57,850 You can avoid this extra reduction. 1062 00:51:57,850 --> 00:52:01,790 And then you just have to do some extra probably 1063 00:52:01,790 --> 00:52:03,410 modular reductions later.
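To get a feel for how the number of extra reductions tracks the size of the value being multiplied, here is a small experiment. It is a sketch under stated assumptions: a toy modulus q = 251 with R = 2 to the 8th, a fixed operand already in the range 0 to q, and every possible second operand tried once. The function name is hypothetical, and the real attack measures time, not a counter.

```python
def extra_reduction_count(x, q, k):
    """Count how many y in [0, q) make Montgomery-multiplying x by y
    need the final subtraction of q. Toy illustration only."""
    r = 1 << k
    q_neg_inv = (-pow(q, -1, r)) % r     # -q^{-1} mod R (Python 3.8+)
    count = 0
    for y in range(q):
        t = x * y
        m = ((t % r) * q_neg_inv) % r    # multiple of q that clears the low bits
        if (t + m * q) // r >= q:        # would need the extra reduction
            count += 1
    return count


q, k = 251, 8
small = extra_reduction_count(1, q, k)     # operand just above a multiple of q
large = extra_reduction_count(250, q, k)   # operand just below q
print(small, large)
```

With an operand of 1 the extra reduction never fires, while an operand just below q triggers it for a large share of the second operands, which is the gap an attacker looks for when a guess crosses q.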
1064 00:52:03,410 --> 00:52:06,500 I think the math just works out nicely this way 1065 00:52:06,500 --> 00:52:07,834 for the Montgomery form. 1066 00:52:07,834 --> 00:52:09,750 I think for many of these things it's actually 1067 00:52:09,750 --> 00:52:12,406 once you look at them as a timing channel [INAUDIBLE] 1068 00:52:12,406 --> 00:52:13,780 [? think ?] don't do this at all, 1069 00:52:13,780 --> 00:52:16,004 or maybe you should do some other plan. 1070 00:52:16,004 --> 00:52:16,670 So you're right, 1071 00:52:16,670 --> 00:52:19,710 I think you could probably avoid this extra reduction 1072 00:52:19,710 --> 00:52:22,655 and probably just do the mod q, perhaps at the end. 1073 00:52:22,655 --> 00:52:24,840 I haven't actually tried implementing this. 1074 00:52:24,840 --> 00:52:27,380 But it seems like it could work. 1075 00:52:27,380 --> 00:52:29,390 It might be that you just have to do mod q once 1076 00:52:29,390 --> 00:52:31,598 [? there ?], which you'll probably have to do anyway. 1077 00:52:31,598 --> 00:52:32,820 So it's not super clear. 1078 00:52:32,820 --> 00:52:37,770 Maybe it's [INAUDIBLE] probably not q. 1079 00:52:37,770 --> 00:52:40,314 So in light of the fact that [INAUDIBLE]. 1080 00:52:44,274 --> 00:52:46,440 Actually, I shouldn't speak authoritatively to this. 1081 00:52:46,440 --> 00:52:47,000 I haven't tried implementing this. 1082 00:52:47,000 --> 00:52:49,166 So maybe there's some deep reason why this extra 1083 00:52:49,166 --> 00:52:50,184 reduction has to happen. 1084 00:52:50,184 --> 00:52:53,490 I couldn't think of one. 1085 00:52:53,490 --> 00:52:54,450 All right, questions? 1086 00:52:57,110 --> 00:53:00,995 So here's the last piece of the puzzle for how OpenSSL, 1087 00:53:00,995 --> 00:53:06,040 this library that this paper attacks, implements 1088 00:53:06,040 --> 00:53:07,870 multiplication.
1089 00:53:07,870 --> 00:53:12,630 So this Montgomery trick is great for avoiding the mod q 1090 00:53:12,630 --> 00:53:15,630 part during modular multiplication. 1091 00:53:15,630 --> 00:53:17,770 But then there's a question of how do you actually 1092 00:53:17,770 --> 00:53:19,020 multiply two numbers together. 1093 00:53:19,020 --> 00:53:21,235 So we're doing lower and lower level. 1094 00:53:21,235 --> 00:53:25,791 So suppose you have [? the raw ?] multiplication. 1095 00:53:28,579 --> 00:53:30,370 So this is not even modular multiplication. 1096 00:53:30,370 --> 00:53:33,475 You have two numbers, a and b. 1097 00:53:33,475 --> 00:53:38,636 And both these guys are 512 bit numbers. 1098 00:53:38,636 --> 00:53:40,250 How do you multiply them together 1099 00:53:40,250 --> 00:53:42,400 when your machine is only a 32 bit machine, 1100 00:53:42,400 --> 00:53:46,226 like the guys in the paper, or a 64 bit, but still, same thing? 1101 00:53:46,226 --> 00:53:48,670 How would you implement multiplication of these guys? 1102 00:53:53,740 --> 00:53:56,242 Any suggestions? 1103 00:53:56,242 --> 00:53:58,200 Well I guess it was a straightforward question, 1104 00:53:58,200 --> 00:54:01,860 you just represent a and b as a sequence of machine 1105 00:54:01,860 --> 00:54:05,290 [? words. ?] And then you just do this quadratic product 1106 00:54:05,290 --> 00:54:06,752 of these two guys. 1107 00:54:06,752 --> 00:54:08,960 [INAUDIBLE] see a simple example, instead of thinking 1108 00:54:08,960 --> 00:54:13,574 of a 512 bit number, let's think of these guys as 64 bit numbers 1109 00:54:13,574 --> 00:54:15,671 and we're on a 32 bit machine. 1110 00:54:15,671 --> 00:54:16,170 Right. 1111 00:54:16,170 --> 00:54:17,900 So we're going to have values. 1112 00:54:17,900 --> 00:54:20,794 The value of a is going to be represented by two 1113 00:54:20,794 --> 00:54:21,960 [? very ?] different things. 1114 00:54:21,960 --> 00:54:27,550 It's going to be, let's call it, a1 and a0. 
1115 00:54:27,550 --> 00:54:29,895 So a0 is the low word, a1 is the high word. 1116 00:54:29,895 --> 00:54:31,520 And similarly, we're going to represent 1117 00:54:31,520 --> 00:54:36,760 b as two things, b1 b0. 1118 00:54:36,760 --> 00:54:39,640 So then a naive way to compute a b 1119 00:54:39,640 --> 00:54:44,310 is going to be to multiply all these guys out. 1120 00:54:44,310 --> 00:54:48,020 So it's going to be a three cell number. 1121 00:54:48,020 --> 00:54:52,140 The high word is going to be a1 b1. 1122 00:54:52,140 --> 00:54:55,560 The low word is going to be a0 b0. 1123 00:54:55,560 --> 00:55:01,845 And the middle word is going to be a1 b0 plus a0 b1. 1124 00:55:01,845 --> 00:55:06,330 So this is how you do the multiplication, right. 1125 00:55:06,330 --> 00:55:06,940 Question? 1126 00:55:06,940 --> 00:55:08,822 AUDIENCE: So I was going to say are 1127 00:55:08,822 --> 00:55:10,785 you using [INAUDIBLE] method? 1128 00:55:10,785 --> 00:55:13,060 PROFESSOR: Yeah, so there is a clever alternative 1129 00:55:13,060 --> 00:55:15,490 method for doing multiplication, which 1130 00:55:15,490 --> 00:55:16,680 doesn't involve four steps. 1131 00:55:16,680 --> 00:55:18,435 Here, you have to do four multiplications. 1132 00:55:18,435 --> 00:55:20,807 There's this clever other method, Karatsuba. 1133 00:55:20,807 --> 00:55:22,890 Do they teach this in 601 or something these days? 1134 00:55:22,890 --> 00:55:23,290 AUDIENCE: 042. 1135 00:55:23,290 --> 00:55:24,373 PROFESSOR: 042, excellent. 1136 00:55:24,373 --> 00:55:25,980 Yeah, that's a very nice method. 1137 00:55:25,980 --> 00:55:29,440 Almost every cryptographic library implements this. 1138 00:55:29,440 --> 00:55:32,230 And for those of you that, I guess, 1139 00:55:32,230 --> 00:55:34,980 weren't undergrads here, since we have grad students maybe 1140 00:55:34,980 --> 00:55:35,685 they haven't seen Karatsuba. 1141 00:55:35,685 --> 00:55:37,184 I'll just write it out on the board.
1142 00:55:37,184 --> 00:55:40,850 It's a clever thing the first time you see it. 1143 00:55:40,850 --> 00:55:46,310 And what you can do is basically compute out three values. 1144 00:55:46,310 --> 00:55:49,040 You're going to compute out a1 b1. 1145 00:55:49,040 --> 00:55:59,190 You're going to also compute a1 minus b0 times b1 1146 00:55:59,190 --> 00:56:04,950 minus-- sorry-- a1 minus a0, b1 minus b0. 1147 00:56:04,950 --> 00:56:08,690 And a0 b0. 1148 00:56:08,690 --> 00:56:11,125 And this does three multiplications 1149 00:56:11,125 --> 00:56:12,225 instead of four. 1150 00:56:12,225 --> 00:56:13,810 And it turns out you can actually 1151 00:56:13,810 --> 00:56:18,440 reconstruct this value from these three multiplication 1152 00:56:18,440 --> 00:56:20,200 results. 1153 00:56:20,200 --> 00:56:22,810 And the particular way to do it is this 1154 00:56:22,810 --> 00:56:29,736 is going to be the-- let me write it out 1155 00:56:29,736 --> 00:56:31,910 in a different form. 1156 00:56:31,910 --> 00:56:41,010 So we're going to have 2 to the 64 times-- sorry-- 2 to the 64 1157 00:56:41,010 --> 00:56:52,710 plus 2 to the 32 times a1 b1 plus 2 1158 00:56:52,710 --> 00:57:00,230 to the 32 times minus that little guy in the middle a1 1159 00:57:00,230 --> 00:57:05,640 minus a0 b1 minus b0. 1160 00:57:05,640 --> 00:57:15,020 And finally, we're going to do 2 to the 32 plus 1 times a0 b0. 1161 00:57:15,020 --> 00:57:16,920 And it's a little messy, but actually 1162 00:57:16,920 --> 00:57:19,380 if you work through the details, you'll 1163 00:57:19,380 --> 00:57:20,880 end up convincing yourself hopefully 1164 00:57:20,880 --> 00:57:26,285 that this value is exactly the same as this value. 1165 00:57:26,285 --> 00:57:27,930 So it's a clever. 1166 00:57:27,930 --> 00:57:31,470 But nonetheless, it saves you one multiplication. 
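For anyone who wants to check the board formula rather than work through the algebra, here is a minimal sketch for two 64-bit numbers split into 32-bit words. The names `karatsuba_64` and `MASK32` are made up for this illustration, and Python's big integers stand in for the multi-word arithmetic a real library would do by hand.

```python
MASK32 = (1 << 32) - 1


def karatsuba_64(a, b):
    """Multiply two 64-bit numbers using three 32-bit-word products."""
    a1, a0 = a >> 32, a & MASK32           # high and low words of a
    b1, b0 = b >> 32, b & MASK32           # high and low words of b

    hi = a1 * b1                           # first multiplication
    mid = (a1 - a0) * (b1 - b0)            # second multiplication (may be negative)
    lo = a0 * b0                           # third multiplication

    # (2^64 + 2^32) * a1 b1  -  2^32 * (a1 - a0)(b1 - b0)  +  (2^32 + 1) * a0 b0
    return ((1 << 64) + (1 << 32)) * hi - (1 << 32) * mid + ((1 << 32) + 1) * lo


x, y = 0x0123456789ABCDEF, 0xFEDCBA9876543210
print(karatsuba_64(x, y) == x * y)         # True
```

The middle term works because a1 b0 + a0 b1 equals a1 b1 + a0 b0 minus (a1 - a0)(b1 - b0), so the two cross products never have to be computed separately.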
1167 00:57:31,470 --> 00:57:34,670 And the way we apply this to doing 1168 00:57:34,670 --> 00:57:37,660 much larger multiplications is that you recursively 1169 00:57:37,660 --> 00:57:38,610 keep going down. 1170 00:57:38,610 --> 00:57:41,750 So if you have 512 bit values, you 1171 00:57:41,750 --> 00:57:44,790 could break it down to 256 bit multiplication. 1172 00:57:44,790 --> 00:57:47,802 You do three 256 bit multiplications. 1173 00:57:47,802 --> 00:57:49,260 And then each of those you're going 1174 00:57:49,260 --> 00:57:52,410 to do using the same Karatsuba trick recursively. 1175 00:57:52,410 --> 00:57:54,840 And eventually you'll get down to machine size, which 1176 00:57:54,840 --> 00:57:56,986 you can just do with a single machine 1177 00:57:56,986 --> 00:58:02,590 instruction. [INAUDIBLE] This make sense? 1178 00:58:02,590 --> 00:58:04,660 So what's the timing attack here? 1179 00:58:04,660 --> 00:58:07,430 How do these guys exploit this Karatsuba multiplication? 1180 00:58:07,430 --> 00:58:11,720 Well, it turns out that OpenSSL worries 1181 00:58:11,720 --> 00:58:13,920 about basically two kinds of multiplications 1182 00:58:13,920 --> 00:58:15,850 that you might need to do. 1183 00:58:15,850 --> 00:58:18,757 One is a multiplication between two large numbers 1184 00:58:18,757 --> 00:58:19,965 that are about the same size. 1185 00:58:19,965 --> 00:58:22,250 So this happens a lot when we're doing 1186 00:58:22,250 --> 00:58:25,327 this modular exponentiation because all the values we're 1187 00:58:25,327 --> 00:58:26,868 going to be multiplying are all going 1188 00:58:26,868 --> 00:58:29,445 to be roughly 512 bits in size. 1189 00:58:29,445 --> 00:58:33,330 So when we're multiplying by c to the y or doing a squaring, 1190 00:58:33,330 --> 00:58:35,850 we're multiplying two things that are about the same size. 
1191 00:58:35,850 --> 00:58:38,890 And then this Karatsuba trick makes a lot of sense 1192 00:58:38,890 --> 00:58:41,290 because, instead of computing stuff 1193 00:58:41,290 --> 00:58:43,790 in time n squared in the input size, 1194 00:58:43,790 --> 00:58:48,740 Karatsuba is roughly n to the 1.58, something like that. 1195 00:58:48,740 --> 00:58:50,335 So it's much faster. 1196 00:58:50,335 --> 00:58:52,490 But then there's this other situation 1197 00:58:52,490 --> 00:58:54,930 where OpenSSL might be multiplying two numbers that 1198 00:58:54,930 --> 00:58:57,410 are very different in size: one that's very big, 1199 00:58:57,410 --> 00:58:58,530 and one that's very small. 1200 00:58:58,530 --> 00:59:00,900 And in that case you could use Karatsuba, 1201 00:59:00,900 --> 00:59:02,990 but then it's going to be slower 1202 00:59:02,990 --> 00:59:04,610 than doing the naive thing. 1203 00:59:04,610 --> 00:59:06,660 Suppose you're trying to multiply a 512 bit 1204 00:59:06,660 --> 00:59:08,997 number by a 64 bit number, you'd rather just 1205 00:59:08,997 --> 00:59:10,830 do the straightforward thing, where you just 1206 00:59:10,830 --> 00:59:13,050 multiply by each of the things in the 64 bit 1207 00:59:13,050 --> 00:59:18,290 number, which is about 2n instead of n to the 1.58 something. 1208 00:59:18,290 --> 00:59:21,900 So as a result, the OpenSSL guys tried to be clever, 1209 00:59:21,900 --> 00:59:25,760 and that's where often problems start. 1210 00:59:25,760 --> 00:59:28,280 They decided that they'll actually 1211 00:59:28,280 --> 00:59:30,880 switch dynamically between this Karatsuba efficient thing 1212 00:59:30,880 --> 00:59:35,450 and this sort of grade school method of multiplication here.
1213 00:59:35,450 --> 00:59:37,400 And their heuristic was basically 1214 00:59:37,400 --> 00:59:39,050 if the two things you're multiplying 1215 00:59:39,050 --> 00:59:42,483 are exactly the same number of machine words, 1216 00:59:42,483 --> 00:59:44,024 so they at least have the same number 1217 00:59:44,024 --> 00:59:48,110 of bits up to 32-bit units, then they'll go to Karatsuba. 1218 00:59:48,110 --> 00:59:50,380 And if the two things they're multiplying 1219 00:59:50,380 --> 00:59:52,770 have a different number of 32 bit units, 1220 00:59:52,770 --> 00:59:57,660 then they'll do the quadratic or straightforward or regular, 1221 00:59:57,660 --> 00:59:59,882 normal multiplication. 1222 00:59:59,882 --> 01:00:03,880 And there you can see if your number all of a sudden 1223 01:00:03,880 --> 01:00:06,290 switches to be a little bit smaller, 1224 01:00:06,290 --> 01:00:08,710 then you're going to switch from the efficient thing 1225 01:00:08,710 --> 01:00:11,240 to this other multiplication method. 1226 01:00:11,240 --> 01:00:14,030 And presumably, the cutoff point isn't 1227 01:00:14,030 --> 01:00:15,595 going to be exactly smooth so you'll 1228 01:00:15,595 --> 01:00:17,500 be able to tell all of a sudden, it's 1229 01:00:17,500 --> 01:00:19,190 now taking a lot longer to multiply 1230 01:00:19,190 --> 01:00:22,320 or a lot shorter to multiply than before. 1231 01:00:22,320 --> 01:00:26,000 And that's what these guys exploit in their timing attack 1232 01:00:26,000 --> 01:00:26,940 again. 1233 01:00:26,940 --> 01:00:28,060 Does that make sense? 1234 01:00:28,060 --> 01:00:32,070 What's going on with the [INAUDIBLE] All right. 1235 01:00:32,070 --> 01:00:34,680 So I think I'm now done with telling you 1236 01:00:34,680 --> 01:00:36,385 about all the weird implementation 1237 01:00:36,385 --> 01:00:39,590 tricks that people play when implementing RSA in practice.
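The word-count heuristic just described can be sketched as a tiny dispatcher. This is a hypothetical illustration of the rule as stated in the lecture, not OpenSSL's actual source; `word_count`, `choose_multiply` and the 32-bit word size are assumptions.

```python
WORD_BITS = 32  # assumed machine word size, as in the paper's 32-bit setup


def word_count(n):
    """Number of 32-bit words needed to hold the non-negative integer n."""
    return max(1, (n.bit_length() + WORD_BITS - 1) // WORD_BITS)


def choose_multiply(a, b):
    """Pick a multiplication routine the way the lecture describes:
    Karatsuba for equal word counts, schoolbook otherwise."""
    return "karatsuba" if word_count(a) == word_count(b) else "schoolbook"


full = (1 << 511) | 1            # a 512-bit operand: 16 words
print(choose_multiply(full, 1 << 500))   # karatsuba (501 bits is still 16 words)
print(choose_multiply(full, 1 << 479))   # schoolbook (480 bits is only 15 words)
```

The second call shows the cliff: an operand that shrinks by one word flips the whole multiplication onto a different code path, and that flip is visible in the running time.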
1238 01:00:39,590 --> 01:00:41,630 So now let's try to put them back together 1239 01:00:41,630 --> 01:00:44,410 into an entire web server and figure out 1240 01:00:44,410 --> 01:00:48,230 how do you [? tickle ?] all these interesting bits 1241 01:00:48,230 --> 01:00:52,220 of the implementation from the input network packet. 1242 01:00:52,220 --> 01:00:54,910 So what happens in a web server is 1243 01:00:54,910 --> 01:00:59,330 that the web server, if you remember from the HTTPS 1244 01:00:59,330 --> 01:01:01,890 lecture, has a secret key. 1245 01:01:01,890 --> 01:01:04,780 And it uses the secret key to prove 1246 01:01:04,780 --> 01:01:06,820 that it's the correct owner of 1247 01:01:06,820 --> 01:01:11,190 that certificate in the HTTPS protocol or in TLS. 1248 01:01:11,190 --> 01:01:15,940 And the way this works is that the client sends some randomly 1249 01:01:15,940 --> 01:01:19,470 chosen bits, and the bits are encrypted 1250 01:01:19,470 --> 01:01:21,210 using the server's public key. 1251 01:01:21,210 --> 01:01:24,395 And the server in this TLS protocol decrypts this message. 1252 01:01:24,395 --> 01:01:26,730 And if the message checks out, it 1253 01:01:26,730 --> 01:01:29,249 uses those random bits to establish a [? session ?]. 1254 01:01:29,249 --> 01:01:32,246 But in this case, the message isn't going to check out. 1255 01:01:32,246 --> 01:01:34,079 The message is going to be carefully chosen, 1256 01:01:34,079 --> 01:01:35,845 the padding bits aren't going to match, 1257 01:01:35,845 --> 01:01:37,470 and the server is going to return an error 1258 01:01:37,470 --> 01:01:39,850 as soon as it finishes decrypting our message. 1259 01:01:39,850 --> 01:01:42,080 And that's what we're going to time here.
1260 01:01:42,080 --> 01:01:49,368 So the server-- you can think of this as Apache with OpenSSL-- 1261 01:01:49,368 --> 01:01:52,500 you're going to get a message from the client, 1262 01:01:52,500 --> 01:01:55,940 and you can think of this as a ciphertext 1263 01:01:55,940 --> 01:01:59,400 c, or a hypothetical ciphertext, that the client 1264 01:01:59,400 --> 01:02:00,545 might have produced. 1265 01:02:00,545 --> 01:02:03,340 And the first thing we're going to do with a ciphertext c, 1266 01:02:03,340 --> 01:02:06,910 we want to decrypt it using roughly this formula. 1267 01:02:06,910 --> 01:02:08,820 And if you remember the first optimization 1268 01:02:08,820 --> 01:02:12,806 we're going to apply is the Chinese Remainder Theorem. 1269 01:02:12,806 --> 01:02:14,306 So the first thing we're going to do 1270 01:02:14,306 --> 01:02:16,730 is basically split our pipeline in two parts. 1271 01:02:16,730 --> 01:02:20,430 We're going to do one thing mod p, another thing mod q, 1272 01:02:20,430 --> 01:02:22,719 and then recombine the results at the end of the day. 1273 01:02:22,719 --> 01:02:24,218 So the first thing we're going to do 1274 01:02:24,218 --> 01:02:26,070 is, we're actually going to take c 1275 01:02:26,070 --> 01:02:28,580 and we're going to compute, let's 1276 01:02:28,580 --> 01:02:35,480 call this c0, which is going to be equal to c mod q. 1277 01:02:35,480 --> 01:02:38,710 And we're also going to have a different value, let's 1278 01:02:38,710 --> 01:02:44,730 call it c1, which is going to be c mod p. 1279 01:02:44,730 --> 01:02:46,930 And then we're going to do the same thing to each 1280 01:02:46,930 --> 01:02:51,905 of these values to basically compute c to the d mod p 1281 01:02:51,905 --> 01:02:55,010 and c to the d mod q. 1282 01:02:55,010 --> 01:02:58,070 And here we're going to basically initially we're 1283 01:02:58,070 --> 01:03:00,585 going to [? starch. ?]
After CRT, we're 1284 01:03:00,585 --> 01:03:02,610 going to switch into Montgomery representation 1285 01:03:02,610 --> 01:03:06,040 because that's going to make our multiplies very fast. 1286 01:03:06,040 --> 01:03:08,150 So the next thing SSL is going to do 1287 01:03:08,150 --> 01:03:09,610 to your number, it's actually going 1288 01:03:09,610 --> 01:03:12,900 to compute all the [INAUDIBLE] at c0 prime, 1289 01:03:12,900 --> 01:03:18,740 which is going to be c0 times R mod q. 1290 01:03:18,740 --> 01:03:20,208 And the same thing down here, I'm 1291 01:03:20,208 --> 01:03:21,666 not going to write out the pipeline 1292 01:03:21,666 --> 01:03:23,200 because that'll look the same. 1293 01:03:23,200 --> 01:03:27,520 And then, now that we've switched into Montgomery form, 1294 01:03:27,520 --> 01:03:31,840 we can finally do our multiplications. 1295 01:03:31,840 --> 01:03:34,190 And here's where we're going to use the sliding window 1296 01:03:34,190 --> 01:03:35,780 technique. 1297 01:03:35,780 --> 01:03:38,290 So once we have c0 prime, we can actually 1298 01:03:38,290 --> 01:03:47,460 try to compute c0 prime exponentiated to the d mod q. 1299 01:03:47,460 --> 01:03:52,250 And here, as we're computing this value to the d, 1300 01:03:52,250 --> 01:03:53,990 we're going to be using sliding windows. 1301 01:03:53,990 --> 01:03:59,510 So here, we're going to do sliding windows 1302 01:03:59,510 --> 01:04:03,350 for the bits in this d exponent. 1303 01:04:03,350 --> 01:04:08,450 And also we're going to do Karatsuba 1304 01:04:08,450 --> 01:04:12,820 or regular multiplication depending on exactly what 1305 01:04:12,820 --> 01:04:15,540 the sizes of our operands are. 1306 01:04:15,540 --> 01:04:18,500 So if it turns out that the things we're multiplying, 1307 01:04:18,500 --> 01:04:25,070 c0 prime and maybe that previously squared result, 1308 01:04:25,070 --> 01:04:27,310 are the same size, we're going to do Karatsuba.
1309 01:04:27,310 --> 01:04:31,230 If c0 prime is tiny but some previous thing 1310 01:04:31,230 --> 01:04:34,240 we're multiplying it by is big, then we're going to do 1311 01:04:34,240 --> 01:04:36,610 quadratic multiplication, normal multiplication. 1312 01:04:36,610 --> 01:04:38,520 There's sliding windows coming in here, 1313 01:04:38,520 --> 01:04:45,770 here we also have this Karatsuba versus normal multiplying. 1314 01:04:45,770 --> 01:04:49,630 And also in this step, the extra reductions come in. 1315 01:04:49,630 --> 01:04:54,420 Because at every multiply, the extra reductions 1316 01:04:54,420 --> 01:04:58,840 are going to be proportional to the thing we're 1317 01:04:58,840 --> 01:05:00,950 exponentiating mod q. 1318 01:05:00,950 --> 01:05:04,452 [INAUDIBLE] just plug in the formula over here, 1319 01:05:04,452 --> 01:05:05,910 the probability of extra reductions is 1320 01:05:05,910 --> 01:05:11,170 going to be proportional to this value of c0 prime mod 1321 01:05:11,170 --> 01:05:14,990 q divided by 2R. 1322 01:05:19,200 --> 01:05:21,672 So this is where the really timing-sensitive bit 1323 01:05:21,672 --> 01:05:22,718 is going to come in. 1324 01:05:22,718 --> 01:05:24,384 And there are actually two effects here. 1325 01:05:24,384 --> 01:05:27,425 There's this Karatsuba versus normal choice. 1326 01:05:27,425 --> 01:05:29,720 And then there's the number of extra reductions 1327 01:05:29,720 --> 01:05:32,605 you're going to be making. 1328 01:05:32,605 --> 01:05:34,480 So we'll see how we exploit this in a second, 1329 01:05:34,480 --> 01:05:36,800 but now that you get this result mod q, 1330 01:05:36,800 --> 01:05:39,560 and you're going to get a similar result mod p, 1331 01:05:39,560 --> 01:05:43,780 you can finally recombine these guys from the top 1332 01:05:43,780 --> 01:05:46,660 and the bottom and use CRT.
1333 01:05:46,660 --> 01:05:49,870 And what you get out from CRT is actually-- 1334 01:05:49,870 --> 01:05:55,110 sorry I guess we need to first convert it back down into non 1335 01:05:55,110 --> 01:05:56,760 Montgomery form. 1336 01:05:56,760 --> 01:06:00,380 So we're going to get first, we're 1337 01:06:00,380 --> 01:06:09,620 going to get c0 prime to the d divided by R mod q. 1338 01:06:09,620 --> 01:06:15,160 And this thing, because c0 prime was c0 times R mod q, 1339 01:06:15,160 --> 01:06:19,820 if we do this then we're going to get back out our value of c 1340 01:06:19,820 --> 01:06:23,110 to the d mod q. 1341 01:06:23,110 --> 01:06:25,370 And we get c to the d here, we're 1342 01:06:25,370 --> 01:06:28,290 going to get c to the d mod p on the bottom version 1343 01:06:28,290 --> 01:06:29,700 of this pipeline. 1344 01:06:29,700 --> 01:06:35,220 And we can use CRT to get the value of c to the d mod n. 1345 01:06:35,220 --> 01:06:38,060 Sorry for the small type here, or font size. 1346 01:06:38,060 --> 01:06:40,680 But roughly it's the same thing we're expecting here. 1347 01:06:40,680 --> 01:06:44,305 We can finally get our result. And we get our message, m. 1348 01:06:44,305 --> 01:06:46,420 So the server takes an incoming packet 1349 01:06:46,420 --> 01:06:51,000 that it gets, runs it through this whole pipeline, 1350 01:06:51,000 --> 01:06:53,578 does two parts of this pipeline, ends up 1351 01:06:53,578 --> 01:06:57,627 with a decrypted message m that's equal to c to the d mod n. 1352 01:06:57,627 --> 01:07:00,682 And then it's going to check the padding of this message. 1353 01:07:00,682 --> 01:07:02,940 And in this particular attack, because we're 1354 01:07:02,940 --> 01:07:05,320 going to carefully construct this value c, 1355 01:07:05,320 --> 01:07:07,810 the padding is going to actually not match up.
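The pipeline described above (split with CRT, enter Montgomery form, exponentiate, leave Montgomery form, recombine) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not OpenSSL's code: the primes, `e`, and `R = 2**8` are hypothetical small values, the exponentiation is plain square-and-multiply rather than sliding windows, and the "divide by R" step is modelled with an explicit modular inverse instead of a real Montgomery reduction.

```python
# Toy walk-through of the decryption pipeline; all numbers are illustrative.
p, q = 61, 53                        # hypothetical tiny primes (real ones are ~512 bits)
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)
R = 2 ** 8                           # Montgomery constant: a power of two above p and q


def half_pipeline(c, prime):
    """One half of the pipeline: reduce, enter Montgomery form, exponentiate, leave."""
    c_i = c % prime                  # c0 = c mod q (or c1 = c mod p)
    c_mont = (c_i * R) % prime       # c0' = c0 * R mod q
    # A Montgomery product is a*b/R, so the running value keeps one factor of R.
    # Here the division by R is modelled with an explicit inverse of R.
    r_inv = pow(R, -1, prime)
    acc = R % prime                  # Montgomery form of 1
    for bit in bin(d)[2:]:           # plain square-and-multiply, no sliding window
        acc = (acc * acc * r_inv) % prime
        if bit == "1":
            acc = (acc * c_mont * r_inv) % prime
    return (acc * r_inv) % prime     # divide by R: back to c^d mod prime


def decrypt_crt(c):
    m_q = half_pipeline(c, q)        # c^d mod q
    m_p = half_pipeline(c, p)        # c^d mod p
    # Recombine the two halves with CRT (Garner's formula).
    h = (pow(q, -1, p) * (m_p - m_q)) % p
    return m_q + h * q


message = 1234
ciphertext = pow(message, e, n)
print(decrypt_crt(ciphertext) == pow(ciphertext, d, n))
```

The final comparison checks the two-track result against a direct computation of c to the d mod n, which is the property the recombination step is supposed to guarantee.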
1356 01:07:07,810 --> 01:07:10,290 We're going to choose the value c according 1357 01:07:10,290 --> 01:07:12,629 to some other heuristics that aren't 1358 01:07:12,629 --> 01:07:14,754 encrypting a real message with the correct padding. 1359 01:07:14,754 --> 01:07:17,310 So the padding is going to be a mismatch, and the server's 1360 01:07:17,310 --> 01:07:19,601 going to need to return an error back to the client. 1361 01:07:19,601 --> 01:07:22,080 [? And it pulls ?] the connection. 1362 01:07:22,080 --> 01:07:23,680 And that's the time that we're going 1363 01:07:23,680 --> 01:07:28,230 to measure to figure out how long this whole pipeline took. 1364 01:07:28,230 --> 01:07:29,362 Makes sense? 1365 01:07:29,362 --> 01:07:31,070 Questions about this pipeline and putting 1366 01:07:31,070 --> 01:07:34,396 all the optimizations together? 1367 01:07:34,396 --> 01:07:35,354 AUDIENCE: [INAUDIBLE] 1368 01:07:41,445 --> 01:07:43,070 PROFESSOR: Yeah, you're probably right. 1369 01:07:43,070 --> 01:07:45,600 Yes, c1 to the d, c0 to the d. 1370 01:07:45,600 --> 01:07:46,620 Yeah, this is c0. 1371 01:07:46,620 --> 01:07:49,287 Yeah, correct. 1372 01:07:49,287 --> 01:07:51,722 AUDIENCE: When you divide by R [INAUDIBLE], 1373 01:07:51,722 --> 01:07:55,131 isn't there a [INAUDIBLE] on how many 1374 01:07:55,131 --> 01:08:00,812 q's you have to have to get the [? little bit ?] to be 1375 01:08:00,812 --> 01:08:03,035 0? [INAUDIBLE]. 1376 01:08:03,035 --> 01:08:05,160 PROFESSOR: Yeah, so there might be extra reductions 1377 01:08:05,160 --> 01:08:07,049 in this final phase as well. 1378 01:08:07,049 --> 01:08:07,590 You're right. 1379 01:08:07,590 --> 01:08:11,220 So potentially, we have to do this divide by R correctly. 1380 01:08:11,220 --> 01:08:13,300 So we probably have to do exactly the same thing 1381 01:08:13,300 --> 01:08:16,399 as we saw for the Montgomery reductions here, 1382 01:08:16,399 --> 01:08:19,649 when we do this divide by R to convert it back.
1383 01:08:19,649 --> 01:08:22,560 So it's not clear exactly how many qs we should add. 1384 01:08:22,560 --> 01:08:25,250 We should figure out how many qs to add, add that many, 1385 01:08:25,250 --> 01:08:28,329 kill the low zeros, and then do mod q again, 1386 01:08:28,329 --> 01:08:29,514 maybe an extra reduction. 1387 01:08:29,514 --> 01:08:31,180 You're absolutely right, this is exactly 1388 01:08:31,180 --> 01:08:33,406 the same kind of divide by R mod q 1389 01:08:33,406 --> 01:08:38,229 as we do for every Montgomery multiplication step. 1390 01:08:38,229 --> 01:08:40,689 Make sense? 1391 01:08:40,689 --> 01:08:43,569 Any other questions? 1392 01:08:43,569 --> 01:08:44,116 All right. 1393 01:08:44,116 --> 01:08:45,240 So how do you exploit this? 1394 01:08:45,240 --> 01:08:47,689 How does an attacker actually figure out 1395 01:08:47,689 --> 01:08:49,710 what the secret key of the server 1396 01:08:49,710 --> 01:08:54,300 is by measuring the time of this entire pipeline? 1397 01:08:54,300 --> 01:08:58,160 So these guys have a plan that basically 1398 01:08:58,160 --> 01:09:03,810 involves guessing one bit of the private key at a time. 1399 01:09:03,810 --> 01:09:07,060 And what they mean actually by guessing the private key is 1400 01:09:07,060 --> 01:09:10,960 that you might think the private key is this decryption exponent 1401 01:09:10,960 --> 01:09:13,528 d, because actually you know e, you 1402 01:09:13,528 --> 01:09:15,160 know n, that's the public key. 1403 01:09:15,160 --> 01:09:16,849 The only thing you don't know is d. 1404 01:09:16,849 --> 01:09:19,785 But in fact, in this attack they don't go for the exponent d 1405 01:09:19,785 --> 01:09:21,810 directly, that's a little bit harder to guess. 1406 01:09:21,810 --> 01:09:23,185 Instead, what they're going to go 1407 01:09:23,185 --> 01:09:25,890 for is the value q or the value p, 1408 01:09:25,890 --> 01:09:27,649 doesn't really matter which one.
1409 01:09:27,649 --> 01:09:31,229 Once you guess what the value p or q is, then, 1410 01:09:31,229 --> 01:09:34,662 given n, you can factor n into p times q. 1411 01:09:34,662 --> 01:09:37,470 Then if you know p times q, you can actually-- 1412 01:09:37,470 --> 01:09:39,219 sorry-- if you know the values of p and q, 1413 01:09:39,219 --> 01:09:41,729 you can compute that phi function we saw before. 1414 01:09:41,729 --> 01:09:45,979 That's going to allow you to get the value d from the value e. 1415 01:09:45,979 --> 01:09:48,750 So this factorization of the value n is hugely important, 1416 01:09:48,750 --> 01:09:51,985 it should be secret for RSA to remain secure. 1417 01:09:51,985 --> 01:09:53,840 So these guys are actually going to go 1418 01:09:53,840 --> 01:09:55,830 and try to guess what the value of q 1419 01:09:55,830 --> 01:09:59,570 is by timing this pipeline. 1420 01:09:59,570 --> 01:10:00,070 All right. 1421 01:10:00,070 --> 01:10:02,410 So how do these guys actually do it? 1422 01:10:02,410 --> 01:10:10,280 Well, they construct carefully chosen inputs, c, 1423 01:10:10,280 --> 01:10:12,570 into this pipeline and-- I guess I 1424 01:10:12,570 --> 01:10:16,800 keep saying they keep measuring the time for this guy. 1425 01:10:16,800 --> 01:10:22,130 But the particular, well, there's 1426 01:10:22,130 --> 01:10:23,505 two parts of the attack, you have 1427 01:10:23,505 --> 01:10:26,390 to bootstrap it a little bit to guess the first couple of bits. 1428 01:10:26,390 --> 01:10:28,390 And then once you have the first couple of bits, 1429 01:10:28,390 --> 01:10:29,600 you can guess the next bit. 1430 01:10:29,600 --> 01:10:31,810 So let me not say exactly how they 1431 01:10:31,810 --> 01:10:34,997 guess the first couple of bits because it's actually much more 1432 01:10:34,997 --> 01:10:36,955 interesting to see how they guess the next bit.
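The point made earlier in this passage, that knowing one factor of n is enough to recover d, can be illustrated with a short sketch. The function name and the tiny numbers are hypothetical; the arithmetic is the standard RSA relation between e, d, and phi.

```python
def private_exponent_from_factor(n, e, q):
    """Recover d from the public key (n, e) and one prime factor q of n."""
    if n % q != 0:
        raise ValueError("q does not divide n")
    p = n // q                      # the other factor comes for free
    phi = (p - 1) * (q - 1)         # the phi function for n = p * q
    return pow(e, -1, phi)          # d is the inverse of e modulo phi (Python 3.8+)


# Toy public key (illustrative values only).
n, e = 61 * 53, 17
d = private_exponent_from_factor(n, e, 53)

m = 42
print(pow(pow(m, e, n), d, n) == m)   # encrypt, then decrypt with the recovered d
```

This is why the attack can target q instead of d: once q leaks, everything else follows from public values.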
1433 01:10:36,955 --> 01:10:38,330 And then we'll come back if we have 1434 01:10:38,330 --> 01:10:40,621 time to look at how they guess the first couple of bits 1435 01:10:40,621 --> 01:10:41,970 [? at this ?] in the paper. 1436 01:10:41,970 --> 01:10:45,820 But basically, suppose you have a guess g about what 1437 01:10:45,820 --> 01:10:48,216 the bits are of this value q. 1438 01:10:48,216 --> 01:10:56,820 So you know that q has some bits, g0, g1, g2, et cetera. 1439 01:10:56,820 --> 01:11:01,720 And actually, I guess these are not even gs, 1440 01:11:01,720 --> 01:11:04,990 these are real q bits, so let me write it as that. 1441 01:11:04,990 --> 01:11:10,310 So you know that q bit 0, q bit 1, q bit 2, 1442 01:11:10,310 --> 01:11:12,690 these are the highest bits of q. 1443 01:11:12,690 --> 01:11:15,455 And then you're trying to guess lower and lower bits. 1444 01:11:15,455 --> 01:11:20,275 So suppose you know the value of q up to bit j. 1445 01:11:20,275 --> 01:11:22,750 And from that point on, your guess is actually all 0. 1446 01:11:22,750 --> 01:11:26,280 You have no idea what the other bits are. 1447 01:11:26,280 --> 01:11:31,900 So these guys are going to try to get this guess 1448 01:11:31,900 --> 01:11:35,760 g into this place in the pipeline. 1449 01:11:35,760 --> 01:11:38,280 Because this is where there are two timing effects: 1450 01:11:38,280 --> 01:11:41,010 this choice of Karatsuba versus normal multiplication. 1451 01:11:41,010 --> 01:11:44,230 And this choice of, or this different number 1452 01:11:44,230 --> 01:11:48,436 of extra reductions depending on the value c0 prime. 1453 01:11:48,436 --> 01:11:51,020 So they're going to actually try to get two different guess 1454 01:11:51,020 --> 01:11:53,330 values into that place in the pipeline. 1455 01:11:53,330 --> 01:11:58,120 One that looks like this, and one that they call 1456 01:11:58,120 --> 01:12:05,110 g high, which is all the same high bits, q2 qj.
1457 01:12:05,110 --> 01:12:07,440 And for the next bit, which they don't know, 1458 01:12:07,440 --> 01:12:09,750 [? you ?] guess g is going to have 0, 1459 01:12:09,750 --> 01:12:14,906 g high is going to have a bit 1 here and all zeros later on. 1460 01:12:14,906 --> 01:12:19,040 So how does it help these guys figure out what's going on? 1461 01:12:19,040 --> 01:12:22,120 So there are really two ways you can think of it. 1462 01:12:22,120 --> 01:12:28,930 Suppose that we get this guess g to be the value of c0 prime. 1463 01:12:28,930 --> 01:12:34,350 We can think of g and g high as being the c0 prime value 1464 01:12:34,350 --> 01:12:36,200 on that left board over there. 1465 01:12:36,200 --> 01:12:37,700 It's actually fairly straightforward 1466 01:12:37,700 --> 01:12:42,460 to do this because c0 prime is pretty deterministically 1467 01:12:42,460 --> 01:12:44,480 computed from the input ciphertext c0. 1468 01:12:44,480 --> 01:12:47,030 You just multiply it by R. So, in order 1469 01:12:47,030 --> 01:12:49,240 for them to get some value to here, 1470 01:12:49,240 --> 01:12:53,370 as a guess, they just need to take their guess 1471 01:12:53,370 --> 01:12:57,340 and first divide it by R, so divide it by 2 to the 512 mod 1472 01:12:57,340 --> 01:12:58,340 something. 1473 01:12:58,340 --> 01:13:01,610 And then, they're going to inject it back. 1474 01:13:01,610 --> 01:13:04,260 And the server's going to multiply it by R, 1475 01:13:04,260 --> 01:13:06,490 and then off you go. 1476 01:13:06,490 --> 01:13:07,910 Make sense? 1477 01:13:07,910 --> 01:13:09,490 All right. 1478 01:13:09,490 --> 01:13:13,730 So suppose that we manage to get our particular chosen integer 1479 01:13:13,730 --> 01:13:16,650 value into that c0 prime spot. 1480 01:13:16,650 --> 01:13:19,930 So what's going to be the time to compute 1481 01:13:19,930 --> 01:13:22,522 c0 prime to the d mod q?
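The two guess values described above, g and g high, differ only in the first unknown bit. A minimal sketch of how they might be built from the already-recovered top bits follows; the helper name `make_guesses` and the 8-bit "secret" are hypothetical, chosen only to keep the numbers readable.

```python
def make_guesses(known_top_bits, total_bits):
    """Return (g, g_hi) for a q of `total_bits` bits whose top bits are known.

    g    = known bits, then all zeros
    g_hi = known bits, then a 1 in the next position, then all zeros
    """
    j = len(known_top_bits)
    if j >= total_bits:
        raise ValueError("no unknown bit left to guess")
    g = int(known_top_bits, 2) << (total_bits - j)
    g_hi = g | (1 << (total_bits - j - 1))
    return g, g_hi


q = 0b11010110                      # toy 8-bit stand-in for the secret prime

# Next bit of q after "1101" is 0, so q sits between the two guesses.
g, g_hi = make_guesses("1101", 8)
print(g <= q < g_hi)

# Next bit of q after "110" is 1, so q sits above both guesses.
g, g_hi = make_guesses("110", 8)
print(g < g_hi <= q)
```

The two cases printed here are exactly the two pictures the attack needs to tell apart by timing.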
1482 01:13:22,522 --> 01:13:26,780 So there are two possible options here for where 1483 01:13:26,780 --> 01:13:28,180 q falls in this picture. 1484 01:13:28,180 --> 01:13:33,920 So it might be that q is between these two values, 1485 01:13:33,920 --> 01:13:37,462 because the next bit of q is 0. 1486 01:13:37,462 --> 01:13:39,170 So this value is going to be less than q, 1487 01:13:39,170 --> 01:13:41,670 but this guy's going to be greater than q. 1488 01:13:41,670 --> 01:13:44,970 So this happens if the next bit of q is 0. Or it 1489 01:13:44,970 --> 01:13:48,340 might be that q lies above both of these values 1490 01:13:48,340 --> 01:13:51,880 if the next bit of q is 1. 1491 01:13:51,880 --> 01:13:53,860 So now we can tell, OK, what's going 1492 01:13:53,860 --> 01:13:58,280 to be the timing of decrypting these two values, 1493 01:13:58,280 --> 01:14:04,225 if q lies in between them, or if q lies above both of them. 1494 01:14:04,225 --> 01:14:05,600 Let's look at the situation where 1495 01:14:05,600 --> 01:14:08,140 q lies above both of them. 1496 01:14:08,140 --> 01:14:11,760 Well in that case, actually everything 1497 01:14:11,760 --> 01:14:13,160 is pretty much the same. 1498 01:14:13,160 --> 01:14:13,660 Right? 1499 01:14:13,660 --> 01:14:16,330 Because both of these values are smaller than q, 1500 01:14:16,330 --> 01:14:18,057 then the value of these things mod q 1501 01:14:18,057 --> 01:14:19,390 is going to be roughly the same. 1502 01:14:19,390 --> 01:14:21,140 They're going to be a little bit different 1503 01:14:21,140 --> 01:14:24,540 because of this extra bit, but more or less they're 1504 01:14:24,540 --> 01:14:26,420 the same magnitude. 1505 01:14:26,420 --> 01:14:28,797 And the number of extra reductions 1506 01:14:28,797 --> 01:14:31,380 is also probably not going to be hugely different because it's 1507 01:14:31,380 --> 01:14:34,780 proportional to the value of this guy mod q.
1508 01:14:34,780 --> 01:14:37,690 And for both these guys, they're both a little bit smaller 1509 01:14:37,690 --> 01:14:40,130 than q, so they're all about the same. 1510 01:14:40,130 --> 01:14:43,080 Neither of them is going to exceed q and all of a sudden 1511 01:14:43,080 --> 01:14:46,080 have [? many or fewer ?] extra reductions. 1512 01:14:46,080 --> 01:14:49,290 So if q is greater than both of these guesses 1513 01:14:49,290 --> 01:14:52,197 then Karatsuba versus normal is going to stay the same. 1514 01:14:52,197 --> 01:14:54,280 The server is going to do the same thing basically 1515 01:14:54,280 --> 01:14:56,825 for both g and g high in terms of Karatsuba versus normal. 1516 01:14:56,825 --> 01:14:59,145 And the server's going to do about the same number 1517 01:14:59,145 --> 01:15:01,497 of extra reductions for both these guys as well. 1518 01:15:01,497 --> 01:15:04,080 So If you see that the server's taking the same amount of time 1519 01:15:04,080 --> 01:15:06,050 to respond to these guesses, then you 1520 01:15:06,050 --> 01:15:10,580 should probably guess that, oh, q probably has the bit 1 here. 1521 01:15:10,580 --> 01:15:12,754 On the other hand, if q lies in the middle, 1522 01:15:12,754 --> 01:15:14,170 then there are two possible things 1523 01:15:14,170 --> 01:15:17,370 that could trigger a change in the timing. 1524 01:15:17,370 --> 01:15:19,680 One possibility is that because g high 1525 01:15:19,680 --> 01:15:22,712 is just a little bit larger than q, 1526 01:15:22,712 --> 01:15:24,170 then the number of extra reductions 1527 01:15:24,170 --> 01:15:26,336 is going to be proportional to this guy mod q, which 1528 01:15:26,336 --> 01:15:31,040 is very small because c0 prime is q plus just 1529 01:15:31,040 --> 01:15:33,915 a little bit in these extra bits. 1530 01:15:33,915 --> 01:15:35,290 So the number of extra reductions 1531 01:15:35,290 --> 01:15:36,650 is going to [? flaunt it ?]. 
1532 01:15:36,650 --> 01:15:39,297 And all of a sudden, it will be faster. 1533 01:15:39,297 --> 01:15:40,880 Another possible thing that can happen 1534 01:15:40,880 --> 01:15:42,623 is that maybe the server will decide, oh, 1535 01:15:42,623 --> 01:15:44,664 now it's time to do normal multiplication instead 1536 01:15:44,664 --> 01:15:45,690 of Karatsuba. 1537 01:15:45,690 --> 01:15:51,910 Maybe for this value, all these, c0 1538 01:15:51,910 --> 01:15:55,170 prime had the same number of bits as q. 1539 01:15:55,170 --> 01:15:58,890 If it turns out that g high is above q, 1540 01:15:58,890 --> 01:16:02,700 then g high mod q is potentially going to have fewer bits. 1541 01:16:02,700 --> 01:16:04,930 And if this crosses the [INAUDIBLE] boundary, 1542 01:16:04,930 --> 01:16:07,055 then the server's going to do normal multiplication 1543 01:16:07,055 --> 01:16:08,270 all of a sudden. 1544 01:16:08,270 --> 01:16:10,590 So that's going to be in the other direction. 1545 01:16:10,590 --> 01:16:14,260 So if you cross over, then normal multiplication kicks in, 1546 01:16:14,260 --> 01:16:16,885 and things get a lot slower because normal multiplication 1547 01:16:16,885 --> 01:16:20,612 is quadratic instead of nicer, faster Karatsuba. 1548 01:16:20,612 --> 01:16:21,112 Question. 1549 01:16:21,112 --> 01:16:22,066 AUDIENCE: [INAUDIBLE] 1550 01:16:23,859 --> 01:16:26,150 PROFESSOR: Yeah, because the number of extra reductions 1551 01:16:26,150 --> 01:16:31,520 is proportional, from above there, to c0 prime mod q. 1552 01:16:31,520 --> 01:16:36,880 So if c0 prime, which is this value, is just a little over q, 1553 01:16:36,880 --> 01:16:40,350 then this is tiny, as opposed to this guy who's basically 1554 01:16:40,350 --> 01:16:43,495 the same as q, or all the high bits are the same as q, 1555 01:16:43,495 --> 01:16:44,820 and then it's big. 1556 01:16:44,820 --> 01:16:47,980 So that'll be the difference that you can try to measure.
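The extra-reduction effect discussed above can be made concrete with a small experiment. This is a sketch under stated assumptions: `R = 2**16` and the 16-bit prime `q = 65521` are hypothetical toy values standing in for the lecture's 2 to the 512 and a 512-bit q, and the multiply below is a textbook Montgomery reduction on Python integers, not OpenSSL's word-by-word implementation.

```python
R = 2 ** 16                          # Montgomery constant (a power of two)
q = 65521                            # toy stand-in for the secret prime, just below R
Q_NEG_INV = (-pow(q, -1, R)) % R     # -q^{-1} mod R, precomputed once


def montgomery_multiply(a, b):
    """Return (a*b/R mod q, True if an extra reduction was needed)."""
    t = a * b
    m = (t * Q_NEG_INV) % R          # how many q's to add so the low bits become zero
    u = (t + m * q) // R             # exact division: the low 16 bits are all zero
    if u >= q:
        return u - q, True           # the "extra reduction"
    return u, False


def count_extra_reductions(c_prime):
    """Count extra reductions when every x in [0, q) is multiplied by c_prime mod q."""
    c_prime %= q
    return sum(montgomery_multiply(x, c_prime)[1] for x in range(q))


g = 0xFFF0       # just below q = 0xFFF1, so g mod q is large
g_hi = 0xFFF8    # just above q, so g_hi mod q is tiny

print(count_extra_reductions(g), count_extra_reductions(g_hi))
```

In this toy setting the first count should come out far larger than the second, which mirrors the drop in extra reductions that the attack looks for when the guess crosses q.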
1557 01:16:47,980 --> 01:16:49,730 So this is one interesting thing, actually 1558 01:16:49,730 --> 01:16:51,480 a couple interesting things, these effects 1559 01:16:51,480 --> 01:16:53,355 actually work in different directions, right. 1560 01:16:53,355 --> 01:16:55,870 So if you hit a 32 bit boundary and Karatsuba 1561 01:16:55,870 --> 01:16:58,170 versus normal switches, then all of a sudden 1562 01:16:58,170 --> 01:17:00,930 it takes much longer to decrypt this message. 1563 01:17:00,930 --> 01:17:04,460 On the other hand, if it's not a 32 bit boundary, 1564 01:17:04,460 --> 01:17:07,424 maybe this effect will tell you what's going on. 1565 01:17:07,424 --> 01:17:09,590 So you actually have to watch for different effects. 1566 01:17:09,590 --> 01:17:13,400 If you're not guessing a bit that's a multiple of 32 bits, 1567 01:17:13,400 --> 01:17:15,410 then you should probably expect the time 1568 01:17:15,410 --> 01:17:18,125 to drop because of extra reductions. 1569 01:17:18,125 --> 01:17:19,620 On the other hand, if you're trying 1570 01:17:19,620 --> 01:17:22,570 to guess a bit that's a multiple of 32, then 1571 01:17:22,570 --> 01:17:25,100 maybe you should be expecting for it to jump a lot 1572 01:17:25,100 --> 01:17:27,890 or maybe drop if it's [INAUDIBLE] normal. 1573 01:17:27,890 --> 01:17:29,890 So I guess what these guys look at in the paper, 1574 01:17:29,890 --> 01:17:31,450 this actually doesn't really matter 1575 01:17:31,450 --> 01:17:34,380 whether there's a jump up or a jump down in time. 1576 01:17:34,380 --> 01:17:38,570 You should just expect if q is, if the next bit of q is 1, 1577 01:17:38,570 --> 01:17:40,310 you should expect these things to take 1578 01:17:40,310 --> 01:17:41,740 almost the same amount of time. 
1579 01:17:41,740 --> 01:17:44,607 And if the next bit of q is 0, then you 1580 01:17:44,607 --> 01:17:46,940 should expect these guys to have a noticeable difference 1581 01:17:46,940 --> 01:17:51,740 even if it's big or small, even if it's positive or negative. 1582 01:17:51,740 --> 01:17:53,364 So actually, they measure this. 1583 01:17:53,364 --> 01:17:55,280 And it turns out to actually work pretty well. 1584 01:17:55,280 --> 01:17:57,790 They have to do actually two interesting tricks 1585 01:17:57,790 --> 01:17:58,820 to make this work out. 1586 01:17:58,820 --> 01:18:01,890 If you remember the timing difference was tiny, 1587 01:18:01,890 --> 01:18:05,110 it's an order of 1 to 2 microseconds. 1588 01:18:05,110 --> 01:18:07,690 So it's going to be hard to measure this over a network, 1589 01:18:07,690 --> 01:18:10,060 over an ethernet switch for example. 1590 01:18:10,060 --> 01:18:13,460 What they do is they actually do two kinds of measurements, 1591 01:18:13,460 --> 01:18:15,310 two kinds of averaging. 1592 01:18:15,310 --> 01:18:17,370 So for each guess that they send, 1593 01:18:17,370 --> 01:18:18,870 they actually send it several times. 1594 01:18:18,870 --> 01:18:20,710 In the paper, they said they send it 1595 01:18:20,710 --> 01:18:22,380 like 7 times or something. 1596 01:18:22,380 --> 01:18:24,430 So what kind of noise do you think 1597 01:18:24,430 --> 01:18:26,670 this helps them with [? if they ?] just resend 1598 01:18:26,670 --> 01:18:29,440 the same guess over and over? 1599 01:18:29,440 --> 01:18:30,400 Yeah. 1600 01:18:30,400 --> 01:18:33,114 AUDIENCE: What's up with the [INAUDIBLE]? 1601 01:18:33,114 --> 01:18:34,780 PROFESSOR: Yeah, so if the network keeps 1602 01:18:34,780 --> 01:18:36,154 adding different things, you just 1603 01:18:36,154 --> 01:18:37,686 try the same thing many times. 
1604 01:18:37,686 --> 01:18:39,060 The thing in the server should be 1605 01:18:39,060 --> 01:18:41,101 taking exactly the same amount of time every time 1606 01:18:41,101 --> 01:18:42,870 and just average out the network noise. 1607 01:18:42,870 --> 01:18:45,460 In the paper, they say they take the median value-- I actually 1608 01:18:45,460 --> 01:18:47,030 don't understand why they take the median, 1609 01:18:47,030 --> 01:18:48,510 I think they should be taking the min of the real thing 1610 01:18:48,510 --> 01:18:50,160 that's going on-- but anyway, this 1611 01:18:50,160 --> 01:18:52,000 was the average of the network. 1612 01:18:52,000 --> 01:18:54,630 But then they do this other weird thing, 1613 01:18:54,630 --> 01:18:57,850 which is that when they're sending a guess, 1614 01:18:57,850 --> 01:19:00,280 they don't just send the same guess 7 times, 1615 01:19:00,280 --> 01:19:02,730 they actually send a neighborhood of guesses. 1616 01:19:02,730 --> 01:19:04,920 And each value in the neighborhood 1617 01:19:04,920 --> 01:19:06,250 gets sent 7 times itself. 1618 01:19:06,250 --> 01:19:09,960 So they actually send g 7 times. 1619 01:19:09,960 --> 01:19:13,700 Then they send g plus 1 also 7 times. 1620 01:19:13,700 --> 01:19:17,980 Then they send g plus 2 also 7 times, et cetera, up to g 1621 01:19:17,980 --> 01:19:20,660 plus 400 in the paper. 1622 01:19:20,660 --> 01:19:23,640 Why do they do this kind of averaging 1623 01:19:23,640 --> 01:19:29,120 as well over different g value instead of just sending g 1624 01:19:29,120 --> 01:19:32,007 7 times 400 times. 1625 01:19:32,007 --> 01:19:33,590 Because it seems more straightforward. 1626 01:19:33,590 --> 01:19:34,090 Yeah? 1627 01:19:34,090 --> 01:19:35,000 AUDIENCE: [INAUDIBLE] 1628 01:19:38,290 --> 01:19:40,380 PROFESSOR: Yeah, that's actually what's going on. 
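The measurement procedure described above (several samples per value, a neighborhood of values around each guess, then a comparison of g against g high) can be sketched as follows. Everything here is an assumption made for illustration: `fake_server` is a noiseless toy timing model rather than a real network measurement, the `threshold` is arbitrary, and reading "up to g plus 400" as an inclusive range is a guess at the paper's exact bookkeeping.

```python
import statistics


def measure_guess(time_decrypt, g, samples=7, neighborhood=400):
    """Sum, over g .. g+neighborhood, the median of `samples` timings for each value."""
    total = 0.0
    for offset in range(neighborhood + 1):
        timings = [time_decrypt(g + offset) for _ in range(samples)]
        total += statistics.median(timings)   # repeated samples damp network noise
    return total


def guess_next_bit(time_decrypt, g, g_hi, threshold):
    """A large gap between g and g_hi suggests bit 0; a small gap suggests bit 1."""
    gap = abs(measure_guess(time_decrypt, g) - measure_guess(time_decrypt, g_hi))
    return 0 if gap > threshold else 1


def fake_server(secret_q):
    """Hypothetical timing model: inputs below q cost 2 units more than inputs above."""
    def time_decrypt(c_prime):
        return 100.0 + (2.0 if c_prime < secret_q else 0.0)
    return time_decrypt


timer = fake_server(0xD6A5F123)      # toy 32-bit "secret"

# Known top bits 1101, next bit of q is 0: q lies between g and g_hi.
print(guess_next_bit(timer, 0xD0000000, 0xD8000000, threshold=100.0))
# Known top bits 11010, next bit of q is 1: q lies above both guesses.
print(guess_next_bit(timer, 0xD0000000, 0xD4000000, threshold=100.0))
```

Taking the absolute value of the gap reflects the point that the direction of the jump does not matter, only whether there is a noticeable difference.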
1629 01:19:40,380 --> 01:19:44,060 We're actually trying to measure exactly how long this piece 1630 01:19:44,060 --> 01:19:45,109 of computation will take. 1631 01:19:45,109 --> 01:19:46,650 But then there's lots of other stuff. 1632 01:19:46,650 --> 01:19:48,858 For example, this other pipeline that's at the bottom 1633 01:19:48,858 --> 01:19:50,339 is doing all the stuff mod p. 1634 01:19:50,339 --> 01:19:52,630 I mean it's also going to take different amount of time 1635 01:19:52,630 --> 01:19:54,870 depending on what exactly the input is. 1636 01:19:54,870 --> 01:19:57,600 So the cool thing is that if you perturb 1637 01:19:57,600 --> 01:20:01,340 the value of all your guess g by adding 1, 2, 3, 1638 01:20:01,340 --> 01:20:03,370 whatever, it's just [INAUDIBLE] the little bits. 1639 01:20:03,370 --> 01:20:05,690 So the timing attack we just looked at just now, 1640 01:20:05,690 --> 01:20:07,570 isn't going to change because that depended 1641 01:20:07,570 --> 01:20:10,400 on this middle bit flipping. 1642 01:20:10,400 --> 01:20:13,115 But everything that's happening on the bottom side 1643 01:20:13,115 --> 01:20:15,550 of the pipeline mod p is going to be totally 1644 01:20:15,550 --> 01:20:17,160 randomized by this because when they 1645 01:20:17,160 --> 01:20:19,570 do it mod p then adding an extra bit 1646 01:20:19,570 --> 01:20:22,610 could shift things around quite a bit mod p. 1647 01:20:22,610 --> 01:20:25,920 Then you're going to, it will average out 1648 01:20:25,920 --> 01:20:28,000 other kinds of computational noise 1649 01:20:28,000 --> 01:20:30,140 that's deterministic for a particular value 1650 01:20:30,140 --> 01:20:33,730 but it's not related to this part of the computation we're 1651 01:20:33,730 --> 01:20:34,690 trying to go after. 1652 01:20:34,690 --> 01:20:35,668 Make sense? 1653 01:20:35,668 --> 01:20:37,436 AUDIENCE: How do they do that when they 1654 01:20:37,436 --> 01:20:38,602 try to guess the lower bits? 
1655 01:20:38,602 --> 01:20:41,650 PROFESSOR: So actually they use some other mathematical trick 1656 01:20:41,650 --> 01:20:44,910 to only actually bother guessing the top half of the bits of q. 1657 01:20:44,910 --> 01:20:47,160 It turns out if you know the top half of the bits of q 1658 01:20:47,160 --> 01:20:50,480 there's some math you can rely on to factor the numbers, 1659 01:20:50,480 --> 01:20:51,730 and then you're in good shape. 1660 01:20:51,730 --> 01:20:53,790 So you can always [INAUDIBLE] little bit. 1661 01:20:53,790 --> 01:20:55,689 Basically not worry about it. 1662 01:20:55,689 --> 01:20:56,189 Make sense? 1663 01:20:56,189 --> 01:20:57,155 Yeah, question. 1664 01:20:57,155 --> 01:20:58,121 AUDIENCE: [INAUDIBLE] 1665 01:21:01,510 --> 01:21:05,600 PROFESSOR: Well, you're going to construct this value c0-- well 1666 01:21:05,600 --> 01:21:08,250 you want the c0 prime-- you're going to construct a value 1667 01:21:08,250 --> 01:21:13,200 c by basically taking your c0 prime and multiplying it by 1668 01:21:13,200 --> 01:21:14,990 R inverse mod n. 1669 01:21:17,860 --> 01:21:20,430 And then when the server takes this value, 1670 01:21:20,430 --> 01:21:22,000 it's going to push it through here. 1671 01:21:22,000 --> 01:21:23,680 So it's going to compute c0. 1672 01:21:23,680 --> 01:21:26,386 It's going to be c mod q, so that value is going 1673 01:21:26,386 --> 01:21:29,210 to be c0 prime R inverse mod q. 1674 01:21:29,210 --> 01:21:32,550 Then you multiply it by R, so you get rid of the R inverse. 1675 01:21:32,550 --> 01:21:35,800 And then you end up with a guess exactly in this position. 1676 01:21:35,800 --> 01:21:37,820 So the cool thing is basically all manipulations 1677 01:21:37,820 --> 01:21:40,570 leading up to here are just multiplying by this R. 1678 01:21:40,570 --> 01:21:43,360 And you know what R is going to be, it's going to be 2 to the 512. 1679 01:21:43,360 --> 01:21:46,894 It's going to be really straightforward.
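The construction just described, multiplying the desired c0 prime by R inverse mod n so that the server's own multiply by R cancels it, can be checked with toy numbers. The values of p, q, and `R = 2**8` are hypothetical; in the lecture's setting R is 2 to the 512. Note that the attacker-side function only uses the public n and the known R.

```python
# Toy parameters (illustrative only).
p, q = 61, 53
n = p * q
R = 2 ** 8


def ciphertext_for_guess(g, n, R):
    """Attacker side: pick c so that the server's c0' equals g mod q."""
    return (g * pow(R, -1, n)) % n   # needs only public n and known R (Python 3.8+)


def server_c0_prime(c, q, R):
    """Server side: reduce mod q, then convert into Montgomery form."""
    c0 = c % q
    return (c0 * R) % q


g = 40                               # the value the attacker wants in the c0' slot
c = ciphertext_for_guess(g, n, R)
print(server_c0_prime(c, q, R) == g % q)
```

The same ciphertext also lands as g mod p on the other track of the pipeline, which is one reason the neighborhood averaging is needed to wash out the mod p side.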
1680 01:21:46,894 --> 01:21:47,674 Make sense? 1681 01:21:47,674 --> 01:21:48,382 Another question? 1682 01:21:48,382 --> 01:21:51,180 AUDIENCE: Could we just cancel out timing [INAUDIBLE]? 1683 01:21:56,115 --> 01:21:59,930 PROFESSOR: Well, if you do p, you'd be in business. 1684 01:21:59,930 --> 01:22:01,220 Yeah, so that's the thing. 1685 01:22:01,220 --> 01:22:03,341 Yeah, you don't know what p is, but you just 1686 01:22:03,341 --> 01:22:06,375 want to randomize it out. 1687 01:22:06,375 --> 01:22:07,440 Any questions? 1688 01:22:07,440 --> 01:22:10,549 All right. [INAUDIBLE] but thanks for sticking around. 1689 01:22:10,549 --> 01:22:13,300 So we'll start talking about other kinds of problems 1690 01:22:13,300 --> 01:22:15,150 next week.