1
00:00:00,090 --> 00:00:02,430
The following content is
provided under a Creative

2
00:00:02,430 --> 00:00:03,810
Commons license.

3
00:00:03,810 --> 00:00:06,050
Your support will help
MIT OpenCourseWare

4
00:00:06,050 --> 00:00:10,160
continue to offer high quality
educational resources for free.

5
00:00:10,160 --> 00:00:12,690
To make a donation or to
view additional materials

6
00:00:12,690 --> 00:00:16,590
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:16,590 --> 00:00:17,260
at ocw.mit.edu.

8
00:00:26,700 --> 00:00:28,955
PROFESSOR: All right,
guys, let's get started.

9
00:00:28,955 --> 00:00:31,330
So today, we're going to talk
about side-channel attacks,

10
00:00:31,330 --> 00:00:36,360
which is a general class
of problems that comes up

11
00:00:36,360 --> 00:00:38,870
in all kinds of systems.

12
00:00:38,870 --> 00:00:40,320
Broadly, side-channel
attacks are

13
00:00:40,320 --> 00:00:42,778
situations where you haven't
thought about some information

14
00:00:42,778 --> 00:00:44,810
that your system
might be revealing.

15
00:00:44,810 --> 00:00:47,860
So typically, you have multiple
components that you [INAUDIBLE]

16
00:00:47,860 --> 00:00:50,480
maybe a user talking
to some server.

17
00:00:50,480 --> 00:00:53,387
And you're thinking, great,
I know exactly all the bits

18
00:00:53,387 --> 00:00:57,600
going over some wire [INAUDIBLE]
server, and those are secure.

19
00:00:57,600 --> 00:01:00,796
But it's often easy to miss
some information revealed,

20
00:01:00,796 --> 00:01:03,830
either by the user or by the server.

21
00:01:03,830 --> 00:01:07,800
So the example that the
paper for today talks about

22
00:01:07,800 --> 00:01:10,465
is a situation where the
timing of the messages

23
00:01:10,465 --> 00:01:12,900
between the user and
the server reveals

24
00:01:12,900 --> 00:01:16,070
some additional information
that you wouldn't have otherwise

25
00:01:16,070 --> 00:01:19,390
learned by just observing the
bits flowing between these two

26
00:01:19,390 --> 00:01:20,930
guys.

27
00:01:20,930 --> 00:01:24,650
But in fact, there's a much
broader class of side-channels

28
00:01:24,650 --> 00:01:25,790
you might worry about.

29
00:01:25,790 --> 00:01:28,550
Originally,
side-channels showed up,

30
00:01:28,550 --> 00:01:31,360
or people discovered them in
the '40s when they discovered

31
00:01:31,360 --> 00:01:33,440
that when you start
typing characters

32
00:01:33,440 --> 00:01:37,110
on a teletype the electronics,
or the electrical machinery

33
00:01:37,110 --> 00:01:39,580
in the teletype, would
emit RF radiation.

34
00:01:39,580 --> 00:01:41,920
And you can hook up
an oscilloscope nearby

35
00:01:41,920 --> 00:01:44,490
and just watch the
characters being typed out

36
00:01:44,490 --> 00:01:48,230
by monitoring the frequency
or RF frequencies that

37
00:01:48,230 --> 00:01:49,800
are going out of this machine.

38
00:01:49,800 --> 00:01:54,410
So RF radiation is a classic
example of a side-channel

39
00:01:54,410 --> 00:01:57,490
that you might worry about.

40
00:01:57,490 --> 00:02:00,880
And there are lots of
other examples

41
00:02:00,880 --> 00:02:02,900
that people have looked
at, almost anything.

42
00:02:02,900 --> 00:02:07,343
So power usage is
another side-channel

43
00:02:07,343 --> 00:02:08,259
you might worry about.

44
00:02:08,259 --> 00:02:09,750
So your computer
is probably going

45
00:02:09,750 --> 00:02:12,230
to use different amounts of
power depending on what exactly

46
00:02:12,230 --> 00:02:13,970
it's computing.

47
00:02:13,970 --> 00:02:17,200
Going on to other
clever examples, sound

48
00:02:17,200 --> 00:02:19,330
turns out to also leak stuff.

49
00:02:19,330 --> 00:02:21,740
There's a [? cute ?] paper
that you can look at.

50
00:02:21,740 --> 00:02:25,344
People listen to a
printer and based on the sound

51
00:02:25,344 --> 00:02:26,760
the printer is
making you can tell

52
00:02:26,760 --> 00:02:28,670
what characters it's printing.

53
00:02:28,670 --> 00:02:31,695
This is especially easy to do
for dot matrix printers that

54
00:02:31,695 --> 00:02:35,180
make this very annoying
sound when they're printing.

55
00:02:35,180 --> 00:02:38,690
And in general, a good
thing to think about,

56
00:02:38,690 --> 00:02:40,681
Kevin on Monday's
lecture also mentioned

57
00:02:40,681 --> 00:02:43,014
some interesting side-channels
that he's running through

58
00:02:43,014 --> 00:02:45,700
in his research.

59
00:02:45,700 --> 00:02:49,090
But, in particular,
here we're going

60
00:02:49,090 --> 00:02:51,880
to look at the
specific side-channel

61
00:02:51,880 --> 00:02:56,240
that David Brumley and Dan Boneh
looked at in their paper-- I

62
00:02:56,240 --> 00:02:59,095
guess about 10 years ago now--
where they were able to extract

63
00:02:59,095 --> 00:03:03,170
a cryptographic key out of
a web server running Apache

64
00:03:03,170 --> 00:03:06,310
by measuring the timing
of different responses

65
00:03:06,310 --> 00:03:11,520
to different input packets
from the adversarial client.

66
00:03:11,520 --> 00:03:14,330
And in this particular
case, they're

67
00:03:14,330 --> 00:03:15,990
going after a cryptographic key.

68
00:03:15,990 --> 00:03:17,860
In fact, many
side-channel attacks

69
00:03:17,860 --> 00:03:21,440
target cryptographic keys
partly because it's a little bit

70
00:03:21,440 --> 00:03:24,744
tricky to get lots of data
through a side-channel.

71
00:03:24,744 --> 00:03:26,410
And cryptographic
keys are one situation

72
00:03:26,410 --> 00:03:30,050
where getting a small number
of bits helps you a lot.

73
00:03:30,050 --> 00:03:32,870
So in their attack they're
able to extract maybe

74
00:03:32,870 --> 00:03:36,760
about 200 to 256 bits or so.

75
00:03:36,760 --> 00:03:38,970
And just from those
200ish bits, they're

76
00:03:38,970 --> 00:03:42,300
able to break the cryptographic
key of this web server.

77
00:03:42,300 --> 00:03:43,890
Whereas, if you're
trying to leak

78
00:03:43,890 --> 00:03:46,140
some database full of
Social Security numbers,

79
00:03:46,140 --> 00:03:48,340
then that'll be
a lot of bits you

80
00:03:48,340 --> 00:03:51,082
have to leak to get
out of this database.

81
00:03:51,082 --> 00:03:53,290
So that's why many of
these side-channels,

82
00:03:53,290 --> 00:03:55,670
if you'll see them
later on, they often

83
00:03:55,670 --> 00:03:59,240
focus on getting
small secrets out,

84
00:03:59,240 --> 00:04:02,850
might be cryptographic
keys or passwords.

85
00:04:02,850 --> 00:04:04,970
But in general, this
is applicable to lots

86
00:04:04,970 --> 00:04:09,210
of other situations as well.

87
00:04:09,210 --> 00:04:11,230
And one cool thing
about this paper,

88
00:04:11,230 --> 00:04:13,410
before we jump into
the details, is

89
00:04:13,410 --> 00:04:16,459
that they show that you can actually
do this over the network.

90
00:04:16,459 --> 00:04:18,890
So as you probably figured
out from reading this paper,

91
00:04:18,890 --> 00:04:20,560
they have to do a
lot of careful work

92
00:04:20,560 --> 00:04:23,150
to tease out these
minute differences

93
00:04:23,150 --> 00:04:24,670
in timing information.

94
00:04:24,670 --> 00:04:28,290
So if you actually compute out
the numbers from this paper,

95
00:04:28,290 --> 00:04:33,190
it turns out that each request
that they sent to the server

96
00:04:33,190 --> 00:04:35,365
differs from potentially
another request

97
00:04:35,365 --> 00:04:39,480
by on the order of 1 to
2 microseconds, which

98
00:04:39,480 --> 00:04:41,280
is pretty tiny.

99
00:04:41,280 --> 00:04:47,000
So you have to be quite
careful, and over a network

100
00:04:47,000 --> 00:04:50,080
it might be hard to tell
whether some server took

101
00:04:50,080 --> 00:04:53,750
1 or 2 microseconds longer to
process your request or not.

102
00:04:53,750 --> 00:04:58,150
And as a result, it was not
so clear whether you

103
00:04:58,150 --> 00:05:01,060
could mount this kind of attack
over a very noisy network.

104
00:05:01,060 --> 00:05:03,690
And these guys were
one of the first people

105
00:05:03,690 --> 00:05:06,620
to show that you can actually
do this over a real ethernet

106
00:05:06,620 --> 00:05:09,600
network with a server sitting
in one place, a client sitting

107
00:05:09,600 --> 00:05:10,461
somewhere else.

108
00:05:10,461 --> 00:05:12,460
And you could actually
measure these differences

109
00:05:12,460 --> 00:05:16,740
partly by averaging, partly
through other tricks.

110
00:05:16,740 --> 00:05:21,270
All right, does that make sense,
the overall side-channel stuff?

111
00:05:21,270 --> 00:05:21,770
All right.

112
00:05:21,770 --> 00:05:23,860
So the plan for the
rest of this lecture

113
00:05:23,860 --> 00:05:27,990
is we'll first dive into
the details of this RSA

114
00:05:27,990 --> 00:05:29,800
cryptosystem that
these guys use.

115
00:05:29,800 --> 00:05:32,480
Then we'll not look at
exactly why it's secure

116
00:05:32,480 --> 00:05:34,900
or not but we'll look at
how do you implement it

117
00:05:34,900 --> 00:05:37,980
because that turns out to
be critical for exploiting

118
00:05:37,980 --> 00:05:39,350
this particular side-channel.

119
00:05:39,350 --> 00:05:42,800
They carefully leverage various
details of the implementation

120
00:05:42,800 --> 00:05:46,164
to figure out when
some things are faster or slower.

121
00:05:46,164 --> 00:05:48,080
And then we'll pop back
out once we understand

122
00:05:48,080 --> 00:05:49,210
how RSA is implemented.

123
00:05:49,210 --> 00:05:52,125
Then we'll come back and figure
out how do you attack it,

124
00:05:52,125 --> 00:05:54,250
how do you attack all these
different optimizations

125
00:05:54,250 --> 00:05:56,040
that RSA has.

126
00:05:56,040 --> 00:05:57,580
Sounds good?

127
00:05:57,580 --> 00:05:58,710
All right.

128
00:05:58,710 --> 00:06:00,760
So I guess let's
start off by looking

129
00:06:00,760 --> 00:06:04,200
at the high level plan for RSA.

130
00:06:04,200 --> 00:06:08,940
So RSA is a pretty widely
used public key cryptosystem.

131
00:06:08,940 --> 00:06:10,800
We've mentioned these
guys a couple of weeks

132
00:06:10,800 --> 00:06:14,690
ago, in the context
of certificates.

133
00:06:14,690 --> 00:06:17,100
But now we're going to look
at actually how it works.

134
00:06:17,100 --> 00:06:20,710
So typically there's 3 things
you have to worry about.

135
00:06:20,710 --> 00:06:25,290
So there's generating a key,
encrypting, and decrypting.

136
00:06:25,290 --> 00:06:29,220
So for RSA, the way you
generate a key is you actually

137
00:06:29,220 --> 00:06:32,220
pick 2 large prime integers.

138
00:06:32,220 --> 00:06:35,500
So you're going to
pick 2 primes, p and q.

139
00:06:35,500 --> 00:06:42,020
And in the paper, these
guys focus on p and q,

140
00:06:42,020 --> 00:06:45,810
which are about 512 bits each.

141
00:06:45,810 --> 00:06:49,730
So this is typically
called 1,024 bit RSA

142
00:06:49,730 --> 00:06:52,570
because the resulting product
of these primes that you're

143
00:06:52,570 --> 00:06:56,500
going to use in a second is
a 1,024-bit integer.

144
00:06:56,500 --> 00:06:59,360
These days, that's probably
not a particularly good choice

145
00:06:59,360 --> 00:07:02,170
for the size of your
RSA key because it

146
00:07:02,170 --> 00:07:06,860
makes it relatively easy for
attackers to factor this-- not

147
00:07:06,860 --> 00:07:09,080
trivial but certainly viable.

148
00:07:09,080 --> 00:07:12,170
So while 10 years ago this seemed
like a potentially sensible

149
00:07:12,170 --> 00:07:14,520
parameter, now if you're
actually building a system,

150
00:07:14,520 --> 00:07:16,780
you should probably
pick a 2,000 or 3,000

151
00:07:16,780 --> 00:07:19,866
or even 4,000 bit RSA key.

152
00:07:19,866 --> 00:07:22,590
Well, what
RSA key size means

153
00:07:22,590 --> 00:07:24,620
is the size of the product of these primes.

154
00:07:24,620 --> 00:07:26,480
And then, for
convenience, we're going

155
00:07:26,480 --> 00:07:28,140
to talk about the
number n, which

156
00:07:28,140 --> 00:07:33,010
is just the product of
these 2 primes, p times q.

157
00:07:33,010 --> 00:07:33,510
All right.

158
00:07:33,510 --> 00:07:35,490
So now we know how
to generate a key,

159
00:07:35,490 --> 00:07:38,440
now we need to figure
out-- well this is at least

160
00:07:38,440 --> 00:07:40,100
part of a key-- now
we're going to have

161
00:07:40,100 --> 00:07:45,060
to figure out how we're going
to encrypt and decrypt messages.

162
00:07:45,060 --> 00:07:48,280
And the way we're going to
encrypt and decrypt messages

163
00:07:48,280 --> 00:07:54,320
is by exponentiating numbers
modulo this number n.

164
00:07:54,320 --> 00:07:57,790
So it seems a little weird, but
let's go with it for a second.

165
00:07:57,790 --> 00:08:00,520
So if you want to
encrypt a message,

166
00:08:00,520 --> 00:08:03,560
then we're going
to take a message m

167
00:08:03,560 --> 00:08:11,920
and transform it into
m to the power e mod n.

168
00:08:11,920 --> 00:08:14,570
So e is going to be some
exponent-- we'll talk about how

169
00:08:14,570 --> 00:08:15,640
to choose it in a second.

170
00:08:15,640 --> 00:08:17,880
But this is how we're
going to encrypt a message.

171
00:08:17,880 --> 00:08:21,230
We'll just take this
message as an integer number

172
00:08:21,230 --> 00:08:23,260
and just exponentiate it.

173
00:08:23,260 --> 00:08:25,610
And then we'll see why
this works in a second,

174
00:08:25,610 --> 00:08:30,500
but let's call this
guy c, ciphertext.

175
00:08:30,500 --> 00:08:36,039
Then to decrypt it, we're
going to somehow find

176
00:08:36,039 --> 00:08:37,940
an interesting
other exponent where

177
00:08:37,940 --> 00:08:41,336
you can take a ciphertext c
and if you exponentiate it

178
00:08:41,336 --> 00:08:46,440
to some power d mod n,
then you'll magically

179
00:08:46,440 --> 00:08:49,500
get back the same message m.

180
00:08:49,500 --> 00:08:52,290
So this is the general plan:
To encrypt, you exponentiate.

181
00:08:52,290 --> 00:08:56,687
To decrypt, you exponentiate
by another exponent.

182
00:08:56,687 --> 00:08:58,270
And in general, it
seems a little hard

183
00:08:58,270 --> 00:09:00,561
to figure out how we're going
to come up with these two

184
00:09:00,561 --> 00:09:02,800
magic numbers that
somehow end up giving us

185
00:09:02,800 --> 00:09:04,390
back the same message.

186
00:09:04,390 --> 00:09:06,890
But it turns out
that if you look

187
00:09:06,890 --> 00:09:12,000
at how exponentiation works
or multiplication works,

188
00:09:12,000 --> 00:09:14,340
modulo this number n.

189
00:09:14,340 --> 00:09:22,670
Then there's this cool property
that if you have any number x,

190
00:09:22,670 --> 00:09:26,000
and you raise it to what's
called Euler's phi

191
00:09:26,000 --> 00:09:32,215
function of n-- maybe I'll
use more board space for this.

192
00:09:32,215 --> 00:09:33,790
This seems important.

193
00:09:33,790 --> 00:09:37,998
So if you take x and you
raise it to phi of n,

194
00:09:37,998 --> 00:09:44,370
then this is going to
be equal to 1 mod n.

195
00:09:44,370 --> 00:09:48,260
And this phi function for
our particular choice of n

196
00:09:48,260 --> 00:09:49,960
is pretty straightforward,
it's actually

197
00:09:49,960 --> 00:09:54,600
p minus 1 times q minus 1.

198
00:09:54,600 --> 00:10:01,560
So this gives us hope that maybe
if we pick e and d so that e times

199
00:10:01,560 --> 00:10:06,370
d is phi of n plus 1, then
we're in good shape.

200
00:10:06,370 --> 00:10:11,200
Because then any message m we
exponentiate it to e and d,

201
00:10:11,200 --> 00:10:16,380
we get back 1 times m
because our ed product

202
00:10:16,380 --> 00:10:19,420
is going to be
roughly phi of n plus 1,

203
00:10:19,420 --> 00:10:25,445
or maybe some constant
alpha times phi of n, plus 1.

204
00:10:25,445 --> 00:10:26,320
Does this make sense?

205
00:10:26,320 --> 00:10:30,800
This is why the message is going
to get decrypted correctly.
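The argument above can be written out as a short derivation. This is a sketch that assumes m shares no factor with n, which is the condition under which x to the phi of n equals 1 mod n; alpha is the integer constant just mentioned.
```latex
\begin{aligned}
ed &= 1 + \alpha\,\varphi(n), \qquad \varphi(n) = (p-1)(q-1) \\
c^{d} \equiv (m^{e})^{d} = m^{ed} &= m^{1 + \alpha\varphi(n)} = m \cdot \bigl(m^{\varphi(n)}\bigr)^{\alpha} \\
&\equiv m \cdot 1^{\alpha} = m \pmod{n}
\end{aligned}
```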

206
00:10:30,800 --> 00:10:33,900
And it turns out that there's
a reasonably straightforward

207
00:10:33,900 --> 00:10:39,880
algorithm if you know this
phi value for how to compute

208
00:10:39,880 --> 00:10:42,430
d given an e or e given a d.

209
00:10:42,430 --> 00:10:42,930
All right.

210
00:10:42,930 --> 00:10:43,770
Question.

211
00:10:43,770 --> 00:10:45,640
AUDIENCE: Isn't 1 mod n just 1?

212
00:10:45,640 --> 00:10:48,710
PROFESSOR: Yeah, so
far we add one more.

213
00:10:48,710 --> 00:10:50,048
Sorry?

214
00:10:50,048 --> 00:10:52,388
AUDIENCE: Like, up over there.

215
00:10:52,388 --> 00:10:53,471
PROFESSOR: Yeah, this one?

216
00:10:53,471 --> 00:10:55,430
AUDIENCE: Yeah.

217
00:10:55,430 --> 00:10:57,200
PROFESSOR: Isn't 1 mod n just 1?

218
00:10:57,200 --> 00:10:58,820
Sorry, I mean this.

219
00:10:58,820 --> 00:11:02,462
So when I say this mod n, it
means that both sides taken mod n

220
00:11:02,462 --> 00:11:04,820
are equal.

221
00:11:04,820 --> 00:11:07,990
So what this means
is if you want

222
00:11:07,990 --> 00:11:10,046
to think of mod as
literally an operator,

223
00:11:10,046 --> 00:11:13,816
you would write this guy
mod n equals 1 mod n.

224
00:11:13,816 --> 00:11:15,440
So that's what mod
n on the side means.

225
00:11:15,440 --> 00:11:18,325
Like, the whole
equality is mod n.

226
00:11:18,325 --> 00:11:21,175
Sorry for the [INAUDIBLE].

227
00:11:21,175 --> 00:11:22,610
Make sense?

228
00:11:22,610 --> 00:11:24,120
All right.

229
00:11:24,120 --> 00:11:27,665
So what this basically
means for RSA is that we're

230
00:11:27,665 --> 00:11:32,150
going to pick some value e.

231
00:11:32,150 --> 00:11:34,558
So e is going to be
our encryption value.

232
00:11:34,558 --> 00:11:41,180
And then from e we're going
to generate d to be basically

233
00:11:41,180 --> 00:11:45,826
1 over e mod phi of n.

234
00:11:45,826 --> 00:11:47,665
And there are some
Euclidean algorithms

235
00:11:47,665 --> 00:11:51,460
you can use to do this
computation efficiently.
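A minimal sketch of that computation, assuming the extended Euclidean algorithm is the one meant. The names extended_gcd and mod_inverse and the toy primes 61 and 53 are illustrative choices, not from the lecture.
```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # Rewrite gcd(b, a % b) = b*x + (a % b)*y in terms of a and b.
    return g, y, x - (a // b) * y
def mod_inverse(e, phi):
    """Return d with (e * d) % phi == 1, or raise if no inverse exists."""
    g, x, _ = extended_gcd(e, phi)
    if g != 1:
        raise ValueError("e has no inverse modulo phi")
    return x % phi
# Toy values (assumption): real primes are hundreds of digits long.
p, q = 61, 53
phi = (p - 1) * (q - 1)   # needs the factorization of n
e = 17
d = mod_inverse(e, phi)
```
Without p and q there is no phi to pass to mod_inverse, which is the point made next.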

236
00:11:51,460 --> 00:11:53,390
But in order to do
this you actually

237
00:11:53,390 --> 00:11:56,180
have to know this
phi of n, which

238
00:11:56,180 --> 00:11:59,485
requires knowing the
factorization of our number n

239
00:11:59,485 --> 00:12:01,910
into p and q.

240
00:12:01,910 --> 00:12:02,410
All right.

241
00:12:02,410 --> 00:12:08,600
So finally, RSA ends
up being a system where

242
00:12:08,600 --> 00:12:13,132
the public key is this number n
and this encryption exponent e.

243
00:12:13,132 --> 00:12:16,750
So n and e are public,
and d should be private.

244
00:12:16,750 --> 00:12:18,820
So then anyone can
exponentiate a message

245
00:12:18,820 --> 00:12:20,320
to encrypt it for you.

246
00:12:20,320 --> 00:12:22,914
But only you know this
value d and therefore

247
00:12:22,914 --> 00:12:25,230
can decrypt messages.

248
00:12:25,230 --> 00:12:30,090
And as long as you don't know
this factorization of p and q,

249
00:12:30,090 --> 00:12:32,660
of n into p and q,
then you don't know

250
00:12:32,660 --> 00:12:33,785
what this phi value is.

251
00:12:33,785 --> 00:12:35,910
And as a result, it's
actually difficult to compute

252
00:12:35,910 --> 00:12:37,470
this d value.

253
00:12:37,470 --> 00:12:41,580
So this is roughly what RSA is.

254
00:12:41,580 --> 00:12:43,370
High level.

255
00:12:43,370 --> 00:12:45,450
Does this make sense?
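The high-level plan can be sketched end to end as follows. This is textbook RSA with toy primes chosen for illustration (an assumption, not the paper's parameters), and it deliberately omits the padding discussed later.
```python
from math import gcd
# Key generation with toy primes; real keys use primes of 1,024 bits or more.
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)    # computable only if you know p and q
e = 17                     # public encryption exponent
assert gcd(e, phi) == 1    # e must be invertible modulo phi(n)
d = pow(e, -1, phi)        # private decryption exponent (Python 3.8+)
def encrypt(m):
    """Encrypt integer m (0 <= m < n) with the public key (n, e)."""
    return pow(m, e, n)
def decrypt(c):
    """Decrypt ciphertext c with the private exponent d."""
    return pow(c, d, n)
message = 65
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
```
The three-argument pow does modular exponentiation directly, so the intermediate values never grow beyond the size of n.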

256
00:12:45,450 --> 00:12:45,950
All right.

257
00:12:45,950 --> 00:12:48,140
So there's 2 things I
want to talk about now

258
00:12:48,140 --> 00:12:52,590
that we at least have the basic
[? implementation ?] for RSA.

259
00:12:52,590 --> 00:12:55,850
There are tricks to use it
correctly and pitfalls

260
00:12:55,850 --> 00:12:57,085
in how to use RSA.

261
00:12:57,085 --> 00:12:59,210
And then there's all kinds
of implementation tricks

262
00:12:59,210 --> 00:13:02,440
on how do you actually
implement [? root ?]

263
00:13:02,440 --> 00:13:07,360
code to do these exponentiations
and do them efficiently.

264
00:13:07,360 --> 00:13:10,010
This is actually nontrivial
because these are all

265
00:13:10,010 --> 00:13:13,110
large numbers; these are 1,000-bit
integers that you can't just

266
00:13:13,110 --> 00:13:15,730
do a multiply instruction for.

267
00:13:15,730 --> 00:13:18,156
Probably going to take
a fair amount of time

268
00:13:18,156 --> 00:13:20,430
to do these operations.

269
00:13:20,430 --> 00:13:20,930
All right.

270
00:13:20,930 --> 00:13:22,430
So the first thing
I want to mention

271
00:13:22,430 --> 00:13:26,470
is the various RSA pitfalls.

272
00:13:26,470 --> 00:13:31,310
One of them we're actually going
to rely on in a little bit.

273
00:13:31,310 --> 00:13:35,360
One property is that
it's multiplicative.

274
00:13:38,827 --> 00:13:43,600
So what I mean by this is that
suppose we have 2 messages.

275
00:13:43,600 --> 00:13:46,950
Suppose we have m0 and m1.

276
00:13:46,950 --> 00:13:49,196
And suppose I
encrypt these guys,

277
00:13:49,196 --> 00:13:55,612
so I encrypt m0, I'm going to
get m0 to the power e mod n.

278
00:13:55,612 --> 00:14:02,840
And if I encrypt m1, then
I'd get m1 to the e mod n.

279
00:14:02,840 --> 00:14:06,220
The problem is-- not
necessarily a problem

280
00:14:06,220 --> 00:14:08,940
but could be a
surprise to someone

281
00:14:08,940 --> 00:14:11,300
using RSA-- it's
very easy to generate

282
00:14:11,300 --> 00:14:14,480
an encryption of m0
times m1 because you just

283
00:14:14,480 --> 00:14:15,940
multiply these 2 numbers.

284
00:14:15,940 --> 00:14:18,480
If you multiply these
guys out, you're

285
00:14:18,480 --> 00:14:26,500
going to get m0
m1 to the e mod n.

286
00:14:26,500 --> 00:14:29,840
This is a correct encryption
under this simplistic use

287
00:14:29,840 --> 00:14:34,512
of RSA for the
value m0 times m1.
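A small check of the multiplicative property, using a toy key (n = 61 times 53, e = 17, d = 2753) that is assumed here for illustration only.
```python
# Toy key (assumption): n = 61 * 53, with e * d = 1 mod phi(n).
n, e, d = 3233, 17, 2753
def encrypt(m):
    return pow(m, e, n)
def decrypt(c):
    return pow(c, d, n)
m0, m1 = 7, 11
c0, c1 = encrypt(m0), encrypt(m1)
# Multiply the two ciphertexts; no private key is needed for this step.
c_product = (c0 * c1) % n
# The result is a valid encryption of m0 * m1 (mod n).
matches = c_product == encrypt((m0 * m1) % n)
recovered = decrypt(c_product)
```
Anyone holding only the public key can build c_product, which is why unpadded RSA is called malleable.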

288
00:14:34,512 --> 00:14:36,847
I mean at this point,
it's not a huge problem

289
00:14:36,847 --> 00:14:38,555
because you aren't
able to decrypt it,

290
00:14:38,555 --> 00:14:41,940
you're just able to construct
this encrypted message.

291
00:14:41,940 --> 00:14:45,620
But it might be that the
overall system maybe allows you

292
00:14:45,620 --> 00:14:46,786
to decrypt certain messages.

293
00:14:46,786 --> 00:14:50,110
And if it allows you to decrypt
this message that you construct

294
00:14:50,110 --> 00:14:52,670
yourself, maybe you can
now go back and figure out

295
00:14:52,670 --> 00:14:53,820
what are these messages.

296
00:14:53,820 --> 00:15:00,310
So it's maybe not a great plan
to be ignorant of this fact.

297
00:15:00,310 --> 00:15:04,000
This has certainly come back
to bite a number of protocols

298
00:15:04,000 --> 00:15:05,450
that use RSA.

299
00:15:05,450 --> 00:15:06,950
There's one property,
we'll actually

300
00:15:06,950 --> 00:15:11,450
use it as a defensive mechanism
towards the end of the lecture.

301
00:15:11,450 --> 00:15:15,910
Another property of RSA that you
probably want to watch out for

302
00:15:15,910 --> 00:15:18,566
is the fact that
it's deterministic.

303
00:15:21,350 --> 00:15:23,695
So in this [? naive ?]
implementation

304
00:15:23,695 --> 00:15:27,072
that I just described here,
if you take a message m

305
00:15:27,072 --> 00:15:29,165
and you encrypt it,
you're going to get m

306
00:15:29,165 --> 00:15:32,100
to the e mod n, which is
a deterministic function

307
00:15:32,100 --> 00:15:33,296
of the message.

308
00:15:33,296 --> 00:15:35,303
So if you encrypt
it again, you'll

309
00:15:35,303 --> 00:15:36,870
get exactly the same encryption.

310
00:15:36,870 --> 00:15:38,590
This is not surprising
but it might not

311
00:15:38,590 --> 00:15:40,510
be a desirable
property because if I

312
00:15:40,510 --> 00:15:44,090
see you send some
message encrypted with RSA,

313
00:15:44,090 --> 00:15:46,495
and I want to know what
it is, it might be hard

314
00:15:46,495 --> 00:15:47,370
for me to decrypt it.

315
00:15:47,370 --> 00:15:48,890
But I can try different
things and I can see,

316
00:15:48,890 --> 00:15:50,306
well are you sending
this message?

317
00:15:50,306 --> 00:15:52,600
I'll encrypt it and see if
I get the same ciphertext.

318
00:15:52,600 --> 00:15:54,820
And if so, then I'll know
that's what you encrypted.

319
00:15:54,820 --> 00:15:56,790
Because all I need to
encrypt a message is

320
00:15:56,790 --> 00:16:01,850
the publicly known public key,
which is n and the number e.
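The guessing attack just described can be sketched like this. The toy public key and the helper name guess_plaintext are assumptions for illustration.
```python
# Toy public key (assumption); the attacker never sees the private exponent.
n, e = 3233, 17
def encrypt(m):
    return pow(m, e, n)
def guess_plaintext(observed, candidates):
    """Return the candidate whose encryption equals the observed ciphertext."""
    for m in candidates:
        if encrypt(m) == observed:
            return m
    return None
# The victim sends an unpadded encryption of a low-entropy message.
observed = encrypt(42)
# The attacker tries every plausible message using only the public key.
guess = guess_plaintext(observed, range(100))
```
This only works because the same message always yields the same ciphertext, and because the set of plausible messages is small enough to enumerate.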

321
00:16:01,850 --> 00:16:04,104
So that's not so great.

322
00:16:04,104 --> 00:16:06,145
And you might want to
watch out for this property

323
00:16:06,145 --> 00:16:08,640
if you're actually using RSA.

324
00:16:08,640 --> 00:16:10,140
So all of these
[? primitives are ?]

325
00:16:10,140 --> 00:16:14,340
probably a little bit
hard to use directly.

326
00:16:14,340 --> 00:16:17,320
What people do in
practice in order

327
00:16:17,320 --> 00:16:20,024
to avoid these
problems with RSA is

328
00:16:20,024 --> 00:16:21,690
they encode the message
in a certain way

329
00:16:21,690 --> 00:16:23,030
before encrypting it.

330
00:16:23,030 --> 00:16:25,790
Instead of directly
exponentiating a message,

331
00:16:25,790 --> 00:16:28,020
they actually take some
function of a message,

332
00:16:28,020 --> 00:16:31,680
and then they encrypt that.

333
00:16:31,680 --> 00:16:33,096
mod n.

334
00:16:33,096 --> 00:16:38,190
And this function f, the
right one to use these days,

335
00:16:38,190 --> 00:16:41,526
is probably something called
optimal asymmetric encryption

336
00:16:41,526 --> 00:16:45,595
padding, OAEP.
You can look it up.

337
00:16:45,595 --> 00:16:49,310
It's an encoding that has
two interesting properties.

338
00:16:49,310 --> 00:16:51,390
First of all, it
injects randomness.

339
00:16:51,390 --> 00:16:57,230
You can think of f of m as
generating a 1,000-bit message

340
00:16:57,230 --> 00:16:58,580
that you're going to encrypt.

341
00:16:58,580 --> 00:17:01,566
Part of this message is going to
be your message m in the middle

342
00:17:01,566 --> 00:17:02,065
here.

343
00:17:02,065 --> 00:17:03,420
So that you can get it back
when you decrypt, of course.

344
00:17:03,420 --> 00:17:04,641
[INAUDIBLE].

345
00:17:04,641 --> 00:17:06,599
So there's 2 interesting
things you want to do.

346
00:17:06,599 --> 00:17:08,339
You want to put in
some randomness here,

347
00:17:08,339 --> 00:17:10,640
some value r so that when
you encrypt the message

348
00:17:10,640 --> 00:17:12,839
multiple times, you'll
get different results out

349
00:17:12,839 --> 00:17:16,069
each time, so then it's
not deterministic anymore.

350
00:17:16,069 --> 00:17:18,390
And in order to defeat this
multiplicative property

351
00:17:18,390 --> 00:17:20,840
and other kinds of
problems, you're

352
00:17:20,840 --> 00:17:23,010
going to put in some
fixed padding here.

353
00:17:23,010 --> 00:17:25,510
You can think of this as
an alternating sequence of 1 0

354
00:17:25,510 --> 00:17:27,003
1 0 1 0.

355
00:17:27,003 --> 00:17:28,044
You can do better things.

356
00:17:28,044 --> 00:17:30,134
But roughly it's some
predictable sequence

357
00:17:30,134 --> 00:17:33,395
that you put in here and
whenever you decrypt,

358
00:17:33,395 --> 00:17:35,590
you make sure the
sequence is still there.

359
00:17:35,590 --> 00:17:37,560
Even in multiplication
it's going

360
00:17:37,560 --> 00:17:40,570
to destroy this bit pattern.

361
00:17:40,570 --> 00:17:43,597
And then it should be
clear that someone tampered

362
00:17:43,597 --> 00:17:46,082
with the message, and you reject it.

363
00:17:46,082 --> 00:17:51,220
And if it's still there, then
presumably, sometimes provably,

364
00:17:51,220 --> 00:17:53,621
no one tampered with your
message, and as a result

365
00:17:53,621 --> 00:17:55,004
you should be able to accept it.

366
00:17:55,004 --> 00:17:59,140
And treat message m as
correctly encrypted by someone.
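A toy version of this simplified picture (fixed pattern, then randomness, then the message) might look like the sketch below. It is not OAEP; the layout, the field widths, and the two Mersenne primes are all assumptions made for illustration.
```python
import secrets
# Toy key (assumption): two Mersenne primes, far too structured for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # Python 3.8+
PAD = 0xAAAAAAAA                    # fixed 32-bit pattern 1010...10
R_BITS, M_BITS = 64, 32             # widths of the random and message fields
def encode(m, r=None):
    """Lay out PAD | r | m as one integer smaller than N."""
    if r is None:
        r = secrets.randbits(R_BITS)
    return (PAD << (R_BITS + M_BITS)) | (r << M_BITS) | m
def encrypt(m, r=None):
    return pow(encode(m, r), E, N)
def decrypt(c):
    """Return the message, or None if the fixed pattern did not survive."""
    x = pow(c, D, N)
    if x >> (R_BITS + M_BITS) != PAD:
        return None
    return x & ((1 << M_BITS) - 1)
c1, c2 = encrypt(1234), encrypt(1234)          # same message, fresh randomness
tampered = (encrypt(3) * encrypt(5)) % N       # product of two valid ciphertexts
```
Here c1 and c2 differ except with negligible probability, and decrypt(tampered) returns None except with negligible probability, because integer multiplication scrambles the fixed pattern.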

367
00:17:59,140 --> 00:18:00,721
Make sense?

368
00:18:00,721 --> 00:18:01,220
Yeah?

369
00:18:01,220 --> 00:18:05,250
AUDIENCE: If the attacker knows
how big the pad is, can't they

370
00:18:05,250 --> 00:18:10,960
put a 1 in the lowest
place and then [INAUDIBLE]

371
00:18:10,960 --> 00:18:13,207
under multiplication?

372
00:18:13,207 --> 00:18:14,165
PROFESSOR: Yeah, maybe.

373
00:18:14,165 --> 00:18:16,552
It's a little bit tricky
because this randomness

374
00:18:16,552 --> 00:18:17,510
is going to bleed over.

375
00:18:17,510 --> 00:18:20,170
So the particular
construction of this OAEP

376
00:18:20,170 --> 00:18:22,740
is a little bit more
sophisticated than this.

377
00:18:22,740 --> 00:18:25,210
But if you imagine
this is integer

378
00:18:25,210 --> 00:18:28,160
multiplication not
bit-wise multiplication.

379
00:18:28,160 --> 00:18:31,530
And so this randomness is
going to bleed over somewhere,

380
00:18:31,530 --> 00:18:34,700
and you can construct
an OAEP scheme such

381
00:18:34,700 --> 00:18:37,896
that this doesn't happen.

382
00:18:37,896 --> 00:18:41,720
[INAUDIBLE] Make sense?

383
00:18:41,720 --> 00:18:42,390
All right.

384
00:18:42,390 --> 00:18:44,514
So it turns out that
basically you shouldn't really

385
00:18:44,514 --> 00:18:46,170
use this RSA math
directly, you should

386
00:18:46,170 --> 00:18:48,760
use some library in
practice that implements all

387
00:18:48,760 --> 00:18:51,340
those things correctly for you.

388
00:18:51,340 --> 00:18:53,980
And use it just as an
encrypt/decrypt primitive.

389
00:18:53,980 --> 00:18:56,390
But it turns out these details
will come in and matter

390
00:18:56,390 --> 00:18:58,473
for us because we're
actually trying to figure out

391
00:18:58,473 --> 00:19:03,300
how to break or how to attack
an existing RSA implementation.

392
00:19:03,300 --> 00:19:07,100
So in particular the
attack from this paper

393
00:19:07,100 --> 00:19:10,080
is going to exploit the
fact that the server is

394
00:19:10,080 --> 00:19:13,210
going to check for this padding
when they get a message.

395
00:19:13,210 --> 00:19:17,130
So this is how we're going to
time how long it takes a server

396
00:19:17,130 --> 00:19:17,770
to decrypt.

397
00:19:17,770 --> 00:19:21,690
We're going to send some random
message, or some carefully

398
00:19:21,690 --> 00:19:22,545
constructed message.

399
00:19:22,545 --> 00:19:26,243
But the message wasn't
constructed by taking a real m

400
00:19:26,243 --> 00:19:27,330
and encrypting it.

401
00:19:27,330 --> 00:19:29,980
We're going to construct a
careful ciphertext integer

402
00:19:29,980 --> 00:19:31,300
value.

403
00:19:31,300 --> 00:19:33,020
And the server is
going to decrypt it,

404
00:19:33,020 --> 00:19:34,700
it's going to decrypt
to some nonsense,

405
00:19:34,700 --> 00:19:36,590
and the padding is
going to not match

406
00:19:36,590 --> 00:19:37,820
with a very high probability.

407
00:19:37,820 --> 00:19:40,090
And immediately the server
is going to reject it.

408
00:19:40,090 --> 00:19:41,720
And the reason this
is going to be good

409
00:19:41,720 --> 00:19:44,340
for us is because it will tell
us exactly how long it took

410
00:19:44,340 --> 00:19:47,250
the server to get to this point,
just do the RSA decryption,

411
00:19:47,250 --> 00:19:50,281
get this message, check
the padding, and reject it.

412
00:19:50,281 --> 00:19:52,030
So that's what we're
going to be measuring

413
00:19:52,030 --> 00:19:54,290
in this attack from the paper.

414
00:19:54,290 --> 00:19:55,450
Does that make sense?

415
00:19:55,450 --> 00:19:57,700
So there's some integrity
component to the message

416
00:19:57,700 --> 00:20:02,800
that allows us to time the
decryption leading up to it.

417
00:20:02,800 --> 00:20:03,625
All right.

418
00:20:03,625 --> 00:20:07,180
So now let's talk about how
you actually implement RSA.

419
00:20:07,180 --> 00:20:09,940
So the core of it is
really this exponentiation,

420
00:20:09,940 --> 00:20:12,485
which is not exactly
trivial to do

421
00:20:12,485 --> 00:20:14,860
as I was mentioning earlier
because all these numbers are

422
00:20:14,860 --> 00:20:15,880
very large integers.

423
00:20:15,880 --> 00:20:18,820
So the message itself
is going to be at least,

424
00:20:18,820 --> 00:20:20,830
in this paper,
a 1,000 bit integer.

425
00:20:20,830 --> 00:20:23,810
And the exponent itself is
also going to be pretty large.

426
00:20:23,810 --> 00:20:26,180
The encryption exponent
is at least well known.

427
00:20:26,180 --> 00:20:27,596
But the decryption
exponent better

428
00:20:27,596 --> 00:20:30,255
also be a large integer,
on the order of 1,000 bits.

429
00:20:30,255 --> 00:20:32,126
So you have a 1,000
bit integer you

430
00:20:32,126 --> 00:20:35,900
want to exponentiate to another
1,000 bit integer power modulo

431
00:20:35,900 --> 00:20:38,030
some other 1,000
bit integer n. That's

432
00:20:38,030 --> 00:20:39,830
going to be a little
messy, if you just do

433
00:20:39,830 --> 00:20:42,210
the naive thing.
So almost everyone has

434
00:20:42,210 --> 00:20:45,530
lots of optimizations in
their RSA implementations

435
00:20:45,530 --> 00:20:48,640
to make this go a
little bit faster.

436
00:20:48,640 --> 00:20:51,970
And there's four
optimizations that matter

437
00:20:51,970 --> 00:20:53,420
for the purpose of this attack.

438
00:20:53,420 --> 00:20:55,420
There is actually more
tricks that you can play,

439
00:20:55,420 --> 00:20:57,100
but the most important
ones are these.

440
00:20:57,100 --> 00:21:02,130
So first there's something
called the Chinese remainder

441
00:21:02,130 --> 00:21:06,640
theorem, or CRT.
And just to remind you

442
00:21:06,640 --> 00:21:10,250
from grade school or
high school maybe what

443
00:21:10,250 --> 00:21:12,330
this remainder theorem says.

444
00:21:12,330 --> 00:21:16,380
It actually says that
if you have two numbers

445
00:21:16,380 --> 00:21:20,170
and you have some
value x and you know

446
00:21:20,170 --> 00:21:25,360
that x is equal to a1 mod p.

447
00:21:25,360 --> 00:21:31,200
And you know that x is
equal to a2 mod q, where

448
00:21:31,200 --> 00:21:33,350
p and q are prime numbers.

449
00:21:33,350 --> 00:21:38,790
And this modular equality
applies to the whole equation.

450
00:21:38,790 --> 00:21:42,920
Then it turns out that there's
a unique solution to this

451
00:21:42,920 --> 00:21:43,650
mod pq.

452
00:21:43,650 --> 00:21:52,210
So there's some x equal
to some x prime mod pq.

453
00:21:52,210 --> 00:21:55,050
And in fact, there's
a unique such x prime,

454
00:21:55,050 --> 00:21:57,170
and it's actually very
efficient to compute.

455
00:21:57,170 --> 00:21:59,450
So the Chinese
remainder theorem also

456
00:21:59,450 --> 00:22:03,070
comes with an algorithm for
how to compute this unique x

457
00:22:03,070 --> 00:22:09,300
prime that's equal to x mod pq
given the values a1 and a2 mod

458
00:22:09,300 --> 00:22:12,570
p and q, respectively.

459
00:22:12,570 --> 00:22:15,170
Make sense?

460
00:22:15,170 --> 00:22:17,495
OK, so how can you use this
Chinese remainder theorem

461
00:22:17,495 --> 00:22:22,580
to speed up modular
exponentiation?

462
00:22:22,580 --> 00:22:24,130
So the way this is
going to help us

463
00:22:24,130 --> 00:22:26,350
is that if you
notice all the time

464
00:22:26,350 --> 00:22:31,400
we're doing this computation
of some bunch of stuff modulo

465
00:22:31,400 --> 00:22:33,710
n, which is p times q.

466
00:22:33,710 --> 00:22:35,135
And the Chinese
remainder theorem

467
00:22:35,135 --> 00:22:39,100
says that if you want the value
of something mod p times q,

468
00:22:39,100 --> 00:22:42,320
it suffices to compute the
value of that thing mod p

469
00:22:42,320 --> 00:22:44,746
and the value of
that thing mod q.

470
00:22:44,746 --> 00:22:46,610
And then use the Chinese
remainder theorem

471
00:22:46,610 --> 00:22:48,960
to figure out the
unique solution to what

472
00:22:48,960 --> 00:22:53,220
this thing is mod p times q.
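
[EDITOR'S SKETCH] A minimal Python illustration of the CRT step just described, under stated assumptions: the function name crt_pow and the small toy primes are hypothetical, and this is not the code from the paper or from any RSA library. It computes the result mod p and mod q separately, then recombines them into the unique answer mod pq. Real implementations typically also shrink the exponent; that step is left out here.

```python
def crt_pow(c, d, p, q):
    """Compute c**d mod (p*q) from two half-size exponentiations.

    Assumes p and q are distinct primes (hypothetical helper, for illustration).
    """
    # The value of "that thing" mod p and mod q, computed independently.
    m_p = pow(c % p, d, p)
    m_q = pow(c % q, d, q)
    # Recombine: find the unique x in [0, p*q) with
    # x = m_p (mod p) and x = m_q (mod q).
    q_inv = pow(q, -1, p)              # inverse of q modulo p (Python 3.8+)
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q


# Toy example with tiny primes (nowhere near real key sizes).
p, q = 61, 53
d = 2753
c = 2790
print(crt_pow(c, d, p, q) == pow(c, d, p * q))
```

The recombination used here is one common form of the CRT algorithm; other equivalent formulas exist.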

473
00:22:53,220 --> 00:22:55,516
All right, why is this faster?

474
00:22:55,516 --> 00:22:58,335
Seems like you're basically
doing the same thing twice,

475
00:22:58,335 --> 00:23:00,854
and that's more
work to recombine it.

476
00:23:00,854 --> 00:23:02,270
Is this going to
save me anything?

477
00:23:02,270 --> 00:23:02,770
Yeah?

478
00:23:02,770 --> 00:23:03,746
AUDIENCE: [INAUDIBLE]

479
00:23:06,479 --> 00:23:08,270
PROFESSOR: Well, they're
certainly smaller,

480
00:23:08,270 --> 00:23:09,311
they're not that much smaller.

481
00:23:09,311 --> 00:23:11,950
And so p and q, so n
is 1,000 bits, p and q

482
00:23:11,950 --> 00:23:15,600
are both 500 bits, they're not
quite to the machine word size

483
00:23:15,600 --> 00:23:16,360
yet.

484
00:23:16,360 --> 00:23:18,980
But it is going to
help us because most

485
00:23:18,980 --> 00:23:21,340
of the stuff we're doing
in this computation

486
00:23:21,340 --> 00:23:23,160
is all these multiplications.

487
00:23:23,160 --> 00:23:26,315
And roughly multiplication
is quadratic in the size

488
00:23:26,315 --> 00:23:29,960
of the thing you're multiplying
because in the grade school

489
00:23:29,960 --> 00:23:31,980
method of multiplication
you take all the digits

490
00:23:31,980 --> 00:23:34,910
and multiply them by all the
other digits in the number.

491
00:23:34,910 --> 00:23:38,785
And as a result, doing
exponentiation multiplication

492
00:23:38,785 --> 00:23:40,650
is roughly quadratic
in the input size.

493
00:23:40,650 --> 00:23:46,460
So if we shrink the value of p,
we basically go from 1,000 bits

494
00:23:46,460 --> 00:23:49,204
to 512 bits, we reduce the size
of our input by a factor of 2.

495
00:23:49,204 --> 00:23:51,370
So this means all this
multiplication exponentiation

496
00:23:51,370 --> 00:23:54,930
is going to be roughly
4 times cheaper.

497
00:23:54,930 --> 00:23:58,530
So even though we do it twice,
each time is 4 times faster.

498
00:23:58,530 --> 00:24:01,300
So overall, the
CRT optimization is

499
00:24:01,300 --> 00:24:04,120
going to give us
basically a 2x performance

500
00:24:04,120 --> 00:24:08,080
boost for doing any
RSA operation, both

501
00:24:08,080 --> 00:24:10,694
in the encryption
and decryption side.

502
00:24:10,694 --> 00:24:14,220
That make sense?

503
00:24:14,220 --> 00:24:15,570
All right.

504
00:24:15,570 --> 00:24:20,250
So that's the first optimization
that most people use.

505
00:24:20,250 --> 00:24:24,550
The second thing that
most implementations do

506
00:24:24,550 --> 00:24:27,195
is a technique called
sliding windows.

507
00:24:32,620 --> 00:24:36,200
And we'll look at this
implementation in 2 steps.

508
00:24:36,200 --> 00:24:40,199
So this implementation is
going to be concerned with what

509
00:24:40,199 --> 00:24:41,740
basic operations
we're going to perform

510
00:24:41,740 --> 00:24:44,390
to do this exponentiation.

511
00:24:44,390 --> 00:24:49,000
Suppose you have some
ciphertext c that's now 500 bits

512
00:24:49,000 --> 00:24:52,155
because you are now
doing mod p or mod q.

513
00:24:52,155 --> 00:24:58,270
We have a 500 bit c and,
similarly, roughly a 500 bit d

514
00:24:58,270 --> 00:25:00,185
as well.

515
00:25:00,185 --> 00:25:04,070
So how do we raise
c to the power d?

516
00:25:04,070 --> 00:25:07,040
I guess the stupid way
is to take c and keep

517
00:25:07,040 --> 00:25:08,740
multiplying d times.

518
00:25:08,740 --> 00:25:10,770
But d is very big,
it's 2 to the 500.

519
00:25:10,770 --> 00:25:12,940
So that's never going to finish.

520
00:25:12,940 --> 00:25:16,780
So a more amenable,
or more performant,

521
00:25:16,780 --> 00:25:20,810
plan is to do what's
called repeated squaring.

522
00:25:20,810 --> 00:25:24,880
So that's the step
before sliding windows.

523
00:25:24,880 --> 00:25:31,360
So this technique called
repeated squaring looks

524
00:25:31,360 --> 00:25:31,860
like this.

525
00:25:31,860 --> 00:25:40,580
So if you want to compute
c to the power 2x,

526
00:25:40,580 --> 00:25:46,080
then you can actually compute
c to the x and then square it.

527
00:25:46,080 --> 00:25:48,600
So in our naive plan,
computing c to the 2x

528
00:25:48,600 --> 00:25:50,850
would have involved us making
twice as many iterations

529
00:25:50,850 --> 00:25:53,449
of multiplying because it's
multiplying c twice as many times.

530
00:25:53,449 --> 00:25:55,490
But in fact, you could be
clever and just compute

531
00:25:55,490 --> 00:25:58,336
c to the x and then
square it later.

532
00:25:58,336 --> 00:26:00,610
So this works well,
and this means

533
00:26:00,610 --> 00:26:06,810
that if you're computing c to
some even exponent, this works.

534
00:26:06,810 --> 00:26:10,412
And conversely, if you're
computing c to some 2x plus 1,

535
00:26:10,412 --> 00:26:11,870
then you could
imagine this is just

536
00:26:11,870 --> 00:26:16,461
c to the x squared
times another c.

537
00:26:16,461 --> 00:26:18,770
So this is what's called
repeated squaring.

538
00:26:18,770 --> 00:26:23,375
And this now allows us to
compute these exponentiations,

539
00:26:23,375 --> 00:26:27,600
or modular exponentiations,
in a time that's

540
00:26:27,600 --> 00:26:31,200
basically linear in the
size of the exponent.

541
00:26:31,200 --> 00:26:34,110
So for every bit
in the exponent,

542
00:26:34,110 --> 00:26:37,090
we're going to either
square something

543
00:26:37,090 --> 00:26:40,760
or square something then
do an extra multiplication.

544
00:26:40,760 --> 00:26:43,920
So that's the plan
for repeated squaring.

545
00:26:43,920 --> 00:26:47,290
So now we can at least have
non-embarrassing run times

546
00:26:47,290 --> 00:26:50,045
for computing modular exponents.
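
[EDITOR'S SKETCH] A minimal Python version of the two repeated-squaring rules above, c^(2x) = (c^x)^2 and c^(2x+1) = (c^x)^2 * c. The function name power_mod is hypothetical and the code is only an illustration of the recurrence, not the implementation discussed in the paper. It does one squaring per exponent bit, plus one extra multiply for each 1 bit.

```python
def power_mod(c, e, n):
    """Compute c**e mod n by repeated squaring (illustrative sketch)."""
    if e == 0:
        return 1 % n
    # Recurse on the exponent with its lowest bit removed: e = 2x or 2x + 1.
    half = power_mod(c, e // 2, n)
    squared = half * half % n
    if e % 2 == 0:
        return squared            # even exponent: just square
    return squared * c % n        # odd exponent: square, then one more c


# The exponent from the lecture's example, binary 1011010.
print(power_mod(7, 0b1011010, 1000003) == pow(7, 0b1011010, 1000003))
```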

547
00:26:50,045 --> 00:26:54,652
Does this make sense, why this
is working and why it's faster?

548
00:26:54,652 --> 00:26:56,610
All right, so what's this
sliding windows trick

549
00:26:56,610 --> 00:26:58,930
that the paper talks about?

550
00:26:58,930 --> 00:27:02,500
So this is a little bit
more sophisticated than this

551
00:27:02,500 --> 00:27:04,050
repeated squaring business.

552
00:27:04,050 --> 00:27:08,020
And basically the
squaring is going

553
00:27:08,020 --> 00:27:09,690
to be pretty much inevitable.

554
00:27:09,690 --> 00:27:13,450
But what the sliding windows
optimization is trying to do

555
00:27:13,450 --> 00:27:17,570
is reduce the overhead of
multiplying by this extra c

556
00:27:17,570 --> 00:27:18,656
down here.

557
00:27:18,656 --> 00:27:21,300
So suppose if you
have some number that

558
00:27:21,300 --> 00:27:25,470
has several 1 bits in the
exponent, for every 1 bit

559
00:27:25,470 --> 00:27:27,485
in the exponent in the
binary representation,

560
00:27:27,485 --> 00:27:30,670
you're going to have to do this
step instead of this step.

561
00:27:30,670 --> 00:27:33,130
Because for every
odd number, you're

562
00:27:33,130 --> 00:27:34,610
going to have to multiply by c.

563
00:27:34,610 --> 00:27:37,930
So these guys would like to not
multiply by this c as often.

564
00:27:37,930 --> 00:27:44,754
So the plan is to precompute
different powers of c.

565
00:27:44,754 --> 00:27:46,170
So what we're going
to do is we're

566
00:27:46,170 --> 00:27:48,340
going to generate
a table that says,

567
00:27:48,340 --> 00:27:53,020
well, here's the value of c
to the x-- sorry, c to the 1--

568
00:27:53,020 --> 00:27:56,460
here's the value of c
to the 3, c to the 7.

569
00:27:56,460 --> 00:27:57,960
And I think
in OpenSSL,

570
00:27:57,960 --> 00:28:02,020
it goes up to c to the 31st.

571
00:28:02,020 --> 00:28:04,780
So this table is
going to just be

572
00:28:04,780 --> 00:28:08,640
precomputed when you want to
do some modular exponentiation.

573
00:28:08,640 --> 00:28:11,660
You're going to precompute
all the slots in this table.

574
00:28:11,660 --> 00:28:14,340
And then when you want to do
this exponentiation, instead

575
00:28:14,340 --> 00:28:16,850
of doing the repeated squaring
and multiplying by this c

576
00:28:16,850 --> 00:28:18,754
every time,

577
00:28:18,754 --> 00:28:20,420
you're going to use
a different formula.

578
00:28:20,420 --> 00:28:26,580
It says, well, if you have
c to the 32x plus some y,

579
00:28:26,580 --> 00:28:29,075
well you can do c
to the x, and you

580
00:28:29,075 --> 00:28:33,665
can do repeated squaring--
very much like before-- this

581
00:28:33,665 --> 00:28:38,250
is to get the 32, there's
like 5 powers of 2 here

582
00:28:38,250 --> 00:28:41,560
times c to the y.

583
00:28:41,560 --> 00:28:44,055
And c to the y, you can
get out of this table.

584
00:28:44,055 --> 00:28:46,770
So you can see that we're doing
the same number of squarings

585
00:28:46,770 --> 00:28:48,280
as before here.

586
00:28:48,280 --> 00:28:52,270
But we don't have to
multiply by c as many times.

587
00:28:52,270 --> 00:28:54,400
You're going to fish
it out of this table

588
00:28:54,400 --> 00:28:56,580
and do several multiplies
by c for the cost

589
00:28:56,580 --> 00:28:59,030
of a single multiply.

590
00:28:59,030 --> 00:29:00,484
This make sense?

591
00:29:00,484 --> 00:29:00,983
Yeah?

592
00:29:00,983 --> 00:29:03,876
AUDIENCE: How do you determine
x and y in the first place?

593
00:29:03,876 --> 00:29:05,125
PROFESSOR: How do you determine y?

594
00:29:05,125 --> 00:29:06,156
AUDIENCE: X and y.

595
00:29:06,156 --> 00:29:07,000
PROFESSOR: Oh, OK.

596
00:29:07,000 --> 00:29:08,380
So let's look at that.

597
00:29:08,380 --> 00:29:13,290
So for repeated
squaring, well actually

598
00:29:13,290 --> 00:29:14,940
in both cases,
what you want to do

599
00:29:14,940 --> 00:29:17,240
is you want to look
at the exponent

600
00:29:17,240 --> 00:29:21,830
that you're trying to use
in a binary representation.

601
00:29:21,830 --> 00:29:26,180
So suppose I'm trying to compute
the value of c to the exponent,

602
00:29:26,180 --> 00:29:32,755
I don't know, 1 0 1 1 0 1 0,
and maybe there's more bits.

603
00:29:32,755 --> 00:29:35,310
OK, so if we wanted to
do repeated squaring,

604
00:29:35,310 --> 00:29:38,410
then you look at the
lowest bit here-- it's 0.

605
00:29:38,410 --> 00:29:39,910
So what you're
going to write down

606
00:29:39,910 --> 00:29:46,346
is this is equal to c to
the 1 0 1 1 0 1 squared.

607
00:29:46,346 --> 00:29:49,205
OK, so now if only
you knew this value,

608
00:29:49,205 --> 00:29:50,812
then you could just square it.

609
00:29:50,812 --> 00:29:54,816
OK, now we're going to compute
this guy, so c to the 1 0 1 1

610
00:29:54,816 --> 00:29:57,850
0 1 is equal to-- well
here we can't use this rule

611
00:29:57,850 --> 00:30:00,400
because it's not 2x-- it's
going to be 2x plus 1.

612
00:30:00,400 --> 00:30:06,030
So now we're going to write
this is c to the 1 0 1 1 0

613
00:30:06,030 --> 00:30:09,430
squared times another c.

614
00:30:09,430 --> 00:30:15,020
Because it's this prefix
times 2 plus this 1 at the end.

615
00:30:15,020 --> 00:30:17,140
That's how you fish it
out for repeated squaring.

616
00:30:17,140 --> 00:30:19,950
And for sliding window,
you just grab more bits

617
00:30:19,950 --> 00:30:20,680
from the low end.

618
00:30:20,680 --> 00:30:24,090
So if you wanted to do the
sliding window trick here

619
00:30:24,090 --> 00:30:27,130
instead of taking
one c out, suppose

620
00:30:27,130 --> 00:30:29,880
we do-- instead of this
giant table-- maybe

621
00:30:29,880 --> 00:30:30,980
we do 3 bits at a time.

622
00:30:30,980 --> 00:30:32,785
So we go up to c to the 7th.

623
00:30:32,785 --> 00:30:36,620
So here you would
grab the first 3 bits,

624
00:30:36,620 --> 00:30:40,448
and that's what you would
compute here: c to the 1

625
00:30:40,448 --> 00:30:42,700
0 1 to the 8th power.

626
00:30:42,700 --> 00:30:47,995
And then, the rest is c
to the 1 0 1 power here.

627
00:30:47,995 --> 00:30:50,120
It's a little unfortunate
these are the same thing,

628
00:30:50,120 --> 00:30:53,001
but really there's
more bits here.

629
00:30:53,001 --> 00:30:54,625
But here, this is
the thing that you're

630
00:30:54,625 --> 00:30:55,875
going to look up in the table.

631
00:30:55,875 --> 00:30:57,760
This is c to the 5th in decimal.

632
00:30:57,760 --> 00:31:00,590
And this says you're going to
keep doing the sliding window

633
00:31:00,590 --> 00:31:03,310
to compute this value.

634
00:31:03,310 --> 00:31:05,036
Make sense?

635
00:31:05,036 --> 00:31:06,410
This just saves
on how many times

636
00:31:06,410 --> 00:31:08,760
you have to multiply
by c by pre-multiplying

637
00:31:08,760 --> 00:31:10,910
it a bunch of times.
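
[EDITOR'S SKETCH] A minimal Python sketch of the sliding-window idea: precompute a table of odd powers of c, square once per exponent bit, and multiply by a table entry once per window instead of once per 1 bit. The function name sliding_window_pow, the parameter w (window width in bits), and the left-to-right scan are assumptions made for illustration. This is one common way to write the technique and is not taken from OpenSSL's source.

```python
def sliding_window_pow(c, e, n, w=5):
    """Compute c**e mod n using windows of up to w exponent bits."""
    if e == 0:
        return 1 % n
    c %= n
    # Precompute the odd powers c^1, c^3, ..., c^(2^w - 1).
    c_squared = c * c % n
    table = {1: c}
    for k in range(3, 1 << w, 2):
        table[k] = table[k - 2] * c_squared % n

    bits = bin(e)[2:]              # exponent bits, most significant first
    result = 1
    i = 0
    while i < len(bits):
        if bits[i] == "0":
            result = result * result % n       # a 0 bit costs one squaring
            i += 1
        else:
            # Take up to w bits, trimmed so that the window ends in a 1.
            j = min(i + w, len(bits))
            while bits[j - 1] == "0":
                j -= 1
            for _ in range(j - i):
                result = result * result % n   # one squaring per bit
            result = result * table[int(bits[i:j], 2)] % n  # one table multiply
            i = j
    return result


# The lecture's example exponent with 3-bit windows (table up to c^7).
print(sliding_window_pow(7, 0b1011010, 1000003, w=3) == pow(7, 0b1011010, 1000003))
```

Keeping only odd powers in the table matches the table in the lecture (c to the 1, c to the 3, and so on), since any window can be trimmed to end in a 1 bit.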

638
00:31:10,910 --> 00:31:12,870
And the OpenSSL guys,
at least 10 years ago,

639
00:31:12,870 --> 00:31:16,520
thought that going
up to 32 power

640
00:31:16,520 --> 00:31:18,229
was the best plan in
terms of efficiency

641
00:31:18,229 --> 00:31:20,020
because there's some
trade off here, right?

642
00:31:20,020 --> 00:31:21,728
You spend time
precomputing this table,

643
00:31:21,728 --> 00:31:24,109
but then if this
table is too giant,

644
00:31:24,109 --> 00:31:25,650
you're not going to
use some entries,

645
00:31:25,650 --> 00:31:28,190
because if you run
this table out to,

646
00:31:28,190 --> 00:31:31,700
I don't know, c to the 128
but you're computing just

647
00:31:31,700 --> 00:31:33,191
like 500 bit
exponents,

648
00:31:33,191 --> 00:31:35,190
maybe you're not going
to use all these entries.

649
00:31:35,190 --> 00:31:36,670
So it's gonna be
a waste of time.

650
00:31:36,670 --> 00:31:37,170
Question.

651
00:31:37,170 --> 00:31:41,156
AUDIENCE: [INAUDIBLE]
Is there a reason

652
00:31:41,156 --> 00:31:44,128
not to compute the
table [INAUDIBLE]?

653
00:31:44,128 --> 00:31:44,628
[INAUDIBLE]

654
00:31:49,460 --> 00:31:52,240
PROFESSOR: It ends
up being the case

655
00:31:52,240 --> 00:31:57,740
that you don't want to-- well
there's two things going on.

656
00:31:57,740 --> 00:32:01,850
One is that you'll have now code
to check whether the entry is

657
00:32:01,850 --> 00:32:05,440
filled in or not, and that'll
probably reduce your branch

658
00:32:05,440 --> 00:32:07,232
predictor accuracy
on the CPU. So it

659
00:32:07,232 --> 00:32:09,010
will run slower
in the common case

660
00:32:09,010 --> 00:32:11,903
because if you [INAUDIBLE]
with the entries there.

661
00:32:11,903 --> 00:32:13,319
Another slightly
annoying thing is

662
00:32:13,319 --> 00:32:15,850
that it turns out
this entry leaks stuff

663
00:32:15,850 --> 00:32:18,440
through a different
side-channel, namely

664
00:32:18,440 --> 00:32:20,670
cache access patterns.

665
00:32:20,670 --> 00:32:23,610
So if you have some other
process on the same CPU,

666
00:32:23,610 --> 00:32:26,650
you can sort of see which
cache addresses are getting

667
00:32:26,650 --> 00:32:30,910
evicted out of the cache or are
slower because someone accessed

668
00:32:30,910 --> 00:32:32,730
this entry or this entry.

669
00:32:32,730 --> 00:32:35,400
And the bigger this
table gets, the easier

670
00:32:35,400 --> 00:32:38,630
it is to tell what the
exponent bits were.

671
00:32:38,630 --> 00:32:42,930
In the limit, this table is
gigantic and just telling,

672
00:32:42,930 --> 00:32:47,680
just being able to tell which
cache address on this CPU

673
00:32:47,680 --> 00:32:50,345
had a miss tells you that
the encryption process must

674
00:32:50,345 --> 00:32:51,965
have accessed that
entry in the table.

675
00:32:51,965 --> 00:32:55,450
And tells you that, oh that long
bit sequence appears somewhere

676
00:32:55,450 --> 00:32:58,170
in your secret key exponent.

677
00:32:58,170 --> 00:33:00,930
So I guess the answer
is, mathematically,

678
00:33:00,930 --> 00:33:03,080
you could totally fill
this in on demand.

679
00:33:03,080 --> 00:33:06,550
In practice, you probably
don't want it to be that giant.

680
00:33:06,550 --> 00:33:08,810
And also, if
it's particularly giant,

681
00:33:08,810 --> 00:33:12,350
you aren't going to be able to
use entries as efficiently as

682
00:33:12,350 --> 00:33:13,250
well.

683
00:33:13,250 --> 00:33:14,910
You can reuse these
entries as you're

684
00:33:14,910 --> 00:33:16,576
computing. [INAUDIBLE]
It's not actually

685
00:33:16,576 --> 00:33:19,460
that expensive because
you use c cubed

686
00:33:19,460 --> 00:33:23,330
when you're computing c to the
7th and so on and so forth.

687
00:33:23,330 --> 00:33:25,644
It's not that bad.

688
00:33:25,644 --> 00:33:26,800
Make sense?

689
00:33:26,800 --> 00:33:30,040
Other questions?

690
00:33:30,040 --> 00:33:31,260
All right.

691
00:33:31,260 --> 00:33:35,250
So this is the repeated
squaring and sliding

692
00:33:35,250 --> 00:33:41,384
window optimization that
OpenSSL implements

693
00:33:41,384 --> 00:33:43,550
[INAUDIBLE] I don't actually
know whether they still

694
00:33:43,550 --> 00:33:46,252
have the same size of the
sliding window or not.

695
00:33:46,252 --> 00:33:48,460
But it does actually give
you a fair bit of speed up.

696
00:33:48,460 --> 00:33:53,135
So before you had to square
for every bit in the exponent.

697
00:33:53,135 --> 00:33:57,060
And then you'd have to have
a multiply for every 1 bit.

698
00:33:57,060 --> 00:33:59,990
So if you have a 512
bit exponent then

699
00:33:59,990 --> 00:34:02,880
you're going to do 512
squarings and, on average,

700
00:34:02,880 --> 00:34:06,349
roughly 256
multiplications by c.

701
00:34:06,349 --> 00:34:07,890
So with sliding
windows, you're going

702
00:34:07,890 --> 00:34:11,469
to still do the 512
squarings because there's

703
00:34:11,469 --> 00:34:13,280
no getting around that.

704
00:34:13,280 --> 00:34:16,050
But instead of doing
256 multiplies by c,

705
00:34:16,050 --> 00:34:19,214
you're going to
hopefully do way fewer,

706
00:34:19,214 --> 00:34:21,130
maybe something on the
order of 32 [INAUDIBLE]

707
00:34:21,130 --> 00:34:24,900
multiplies by some
entry in this table.

708
00:34:24,900 --> 00:34:27,489
So that's the general plan.

709
00:34:27,489 --> 00:34:31,400
[INAUDIBLE] Not as
dramatic as CRT, not 2x,

710
00:34:31,400 --> 00:34:33,760
but it could save
you like almost 1.5x.

711
00:34:37,516 --> 00:34:40,660
All depending on exactly
what [INAUDIBLE].

712
00:34:40,660 --> 00:34:42,870
Make sense?

713
00:34:42,870 --> 00:34:45,888
Another question about this?

714
00:34:45,888 --> 00:34:47,260
All right.

715
00:34:47,260 --> 00:34:50,360
So these are the [? roughly ?]
easier optimizations.

716
00:34:50,360 --> 00:34:53,040
And then there's
two clever tricks

717
00:34:53,040 --> 00:34:57,290
playing with numbers for how to
do just a multiplication more

718
00:34:57,290 --> 00:34:59,150
efficiently.

719
00:34:59,150 --> 00:35:01,690
So the first one of
these optimizations

720
00:35:01,690 --> 00:35:04,080
that we're going to
look at-- I think

721
00:35:04,080 --> 00:35:08,060
I'll raise this board--
is called this Montgomery

722
00:35:08,060 --> 00:35:09,820
representation.

723
00:35:09,820 --> 00:35:13,190
And we'll see in
a second why it's

724
00:35:13,190 --> 00:35:14,800
particularly important for us.

725
00:35:23,820 --> 00:35:26,700
So the problem that this
Montgomery representation

726
00:35:26,700 --> 00:35:29,150
optimization is
trying to solve for us

727
00:35:29,150 --> 00:35:33,170
is the fact that every
time we do a multiply,

728
00:35:33,170 --> 00:35:34,880
we get a number
that keeps growing

729
00:35:34,880 --> 00:35:36,650
and growing and growing.

730
00:35:36,650 --> 00:35:40,690
In particular, both
in sliding windows

731
00:35:40,690 --> 00:35:43,750
or in repeated
squaring, actually when

732
00:35:43,750 --> 00:35:46,010
you square you multiply
2 numbers together,

733
00:35:46,010 --> 00:35:47,510
when you multiply
by c to the y, you

734
00:35:47,510 --> 00:35:48,685
multiply 2 numbers together.

735
00:35:48,685 --> 00:35:53,010
And the problem is that if the
inputs to the multiplication

736
00:35:53,010 --> 00:35:56,910
were, let's say, 512 bits each.

737
00:35:56,910 --> 00:35:59,140
Then the result of
the multiplication

738
00:35:59,140 --> 00:36:01,130
is going to be 1,000 bits.

739
00:36:01,130 --> 00:36:03,120
And then you'd take
this 1,000 bit result

740
00:36:03,120 --> 00:36:04,746
and you multiply it
again by something

741
00:36:04,746 --> 00:36:05,870
like 500 bits.

742
00:36:05,870 --> 00:36:08,910
And now it's 1,500 bits,
2,000 bits, 2,500 bits,

743
00:36:08,910 --> 00:36:10,790
and it keeps
growing and growing.

744
00:36:10,790 --> 00:36:13,430
And you really don't want
this because multiplication is

745
00:36:13,430 --> 00:36:17,670
quadratic in the size of
the numbers we're multiplying.

746
00:36:17,670 --> 00:36:19,430
So we have to keep
the size of our number

747
00:36:19,430 --> 00:36:21,985
as small as possible,
which means basically 512

748
00:36:21,985 --> 00:36:27,360
bits because all this
computation is mod p or mod q.

749
00:36:27,360 --> 00:36:28,045
Yeah?

750
00:36:28,045 --> 00:36:29,670
AUDIENCE: What do
you want [INAUDIBLE]?

751
00:36:31,960 --> 00:36:33,210
PROFESSOR: That's right, yeah.

752
00:36:33,210 --> 00:36:36,240
So the cool thing is that
we can keep this number down

753
00:36:36,240 --> 00:36:37,640
because what we
do is, let's say,

754
00:36:37,640 --> 00:36:40,730
we want to compute c to the
x just for this example.

755
00:36:40,730 --> 00:36:41,524
Squared.

756
00:36:41,524 --> 00:36:43,270
Squared again.

757
00:36:43,270 --> 00:36:44,350
Squared again.

758
00:36:44,350 --> 00:36:46,610
What you could do is
you compute c to the x

759
00:36:46,610 --> 00:36:49,740
then you take mod
p, let's say, right.

760
00:36:49,740 --> 00:36:53,110
Then you square it then
you do mod p again.

761
00:36:53,110 --> 00:36:56,820
Then you square it again,
and then you do mod p again.

762
00:36:56,820 --> 00:36:57,539
And so on.

763
00:36:57,539 --> 00:36:59,330
So this is basically
what you're proposing.

764
00:36:59,330 --> 00:37:00,100
So this is great.

765
00:37:00,100 --> 00:37:02,830
In fact, this keeps
the size of our numbers

766
00:37:02,830 --> 00:37:05,260
to basically 512
bits, which is about as

767
00:37:05,260 --> 00:37:06,890
small as we can get.

768
00:37:06,890 --> 00:37:08,710
This is good in
terms of keeping down

769
00:37:08,710 --> 00:37:11,940
the size of these numbers
for multiplication.

770
00:37:11,940 --> 00:37:15,310
But it's actually kind of
expensive to do this mod p

771
00:37:15,310 --> 00:37:16,920
operation.

772
00:37:16,920 --> 00:37:19,240
Because the way that you
do something mod p is

773
00:37:19,240 --> 00:37:21,740
you basically have
to do division.

774
00:37:21,740 --> 00:37:24,510
And division is way worse
than multiplication.

775
00:37:24,510 --> 00:37:27,730
I'm not going to go through
the algorithms for division,

776
00:37:27,730 --> 00:37:30,520
but it's really slow.

777
00:37:30,520 --> 00:37:33,907
You usually want to avoid
division as much as possible.

778
00:37:33,907 --> 00:37:36,240
Because it's not even just a
straightforward programming

779
00:37:36,240 --> 00:37:39,290
thing, you have to do some
approximation algorithm,

780
00:37:39,290 --> 00:37:41,780
some sort of Newton's
method of some sort

781
00:37:41,780 --> 00:37:43,330
and just keep it [INAUDIBLE].

782
00:37:43,330 --> 00:37:44,790
It's going to be slow.

783
00:37:44,790 --> 00:37:47,290
And in the naive
implementation, this actually

784
00:37:47,290 --> 00:37:50,640
turns out to be the slowest
part of doing multiplication.

785
00:37:50,640 --> 00:37:52,230
The multiplication is cheap.

786
00:37:52,230 --> 00:37:56,210
But then doing mod p or mod q
to bring it back down in size

787
00:37:56,210 --> 00:37:59,190
is going to be actually more
expensive than the multiplying.

788
00:37:59,190 --> 00:38:01,480
So that's actually
kind of a bummer.

789
00:38:01,480 --> 00:38:04,560
So the way that we're
going to get around this

790
00:38:04,560 --> 00:38:08,590
is by doing this multiplication
in this clever other

791
00:38:08,590 --> 00:38:13,280
representation, and also
I'll show you the trick here.

792
00:38:13,280 --> 00:38:14,780
Let's see.

793
00:38:14,780 --> 00:38:16,680
Bear with me for a
second, and then we'll

794
00:38:16,680 --> 00:38:21,082
see why it's so fast
to use this Montgomery trick.

795
00:38:21,082 --> 00:38:26,190
And the basic idea is
to represent numbers,

796
00:38:26,190 --> 00:38:29,570
these are regular numbers
that you might actually

797
00:38:29,570 --> 00:38:30,852
want to multiply.

798
00:38:30,852 --> 00:38:32,980
And we're going to have a
different representation

799
00:38:32,980 --> 00:38:35,313
for these numbers, called the
Montgomery representation.

800
00:38:37,530 --> 00:38:41,190
And that representation
is actually very easy.

801
00:38:41,190 --> 00:38:43,990
We just take the value
a and we multiply it

802
00:38:43,990 --> 00:38:46,000
by some magic value R.

803
00:38:46,000 --> 00:38:48,250
I'll tell you what
this R is in a second.

804
00:38:48,250 --> 00:38:51,710
But let's first figure out if
you pick some arbitrary value

805
00:38:51,710 --> 00:38:53,820
R, what's going to happen here?

806
00:38:53,820 --> 00:38:56,200
So we take 2 numbers, a and b.

807
00:38:56,200 --> 00:39:00,075
Their Montgomery representations
are sort of as expected.

808
00:39:00,075 --> 00:39:02,840
A is aR, b is bR.

809
00:39:02,840 --> 00:39:05,920
And if you want to compute
the product of a times b,

810
00:39:05,920 --> 00:39:08,100
well in Montgomery
space, you can also

811
00:39:08,100 --> 00:39:09,160
multiply these guys out.

812
00:39:09,160 --> 00:39:13,310
You can take aR
multiply it by bR.

813
00:39:13,310 --> 00:39:17,130
And what you get here
is ab times R squared.

814
00:39:17,130 --> 00:39:18,770
So there are two Rs now.

815
00:39:18,770 --> 00:39:22,570
That's kind of annoying, but
you can divide that by R.

816
00:39:22,570 --> 00:39:29,610
And we get ab times R. So this
is probably weird in a sense

817
00:39:29,610 --> 00:39:32,190
that why would you
multiply this extra number.

818
00:39:32,190 --> 00:39:34,525
But let's first figure out
whether this is correct.

819
00:39:34,525 --> 00:39:37,179
And then we'll figure out why
this is going to be faster.

820
00:39:37,179 --> 00:39:39,220
So it's correct in the
sense that it's very easy.

821
00:39:39,220 --> 00:39:40,840
If you want to
multiply some numbers,

822
00:39:40,840 --> 00:39:43,364
we just multiply by this R
value and get the Montgomery

823
00:39:43,364 --> 00:39:44,208
representation.

824
00:39:44,208 --> 00:39:45,980
Then we can do all
these multiplications

825
00:39:45,980 --> 00:39:47,920
to these Montgomery forms.

826
00:39:47,920 --> 00:39:50,264
And every time we
multiply 2 numbers,

827
00:39:50,264 --> 00:39:52,180
we have to divide by R,
look at the Montgomery

828
00:39:52,180 --> 00:39:54,550
form of the
multiplication result.

829
00:39:54,550 --> 00:39:56,360
And then when we're
done doing all

830
00:39:56,360 --> 00:39:58,780
of our squarings,
multiplication, all this stuff,

831
00:39:58,780 --> 00:40:01,180
we're going to move back
to the normal, regular form

832
00:40:01,180 --> 00:40:04,890
by just dividing
by R one last time.
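
[EDITOR'S SKETCH] A small Python check of the Montgomery bookkeeping described above: convert a and b to aR and bR, multiply, divide by R to get abR, and divide by R once more at the end to recover ab. The helper names and the toy values of p and R are hypothetical. Note the assumption: "divide by R" is done here by multiplying with the modular inverse of R, which only verifies the algebra. It is not the fast reduction that makes the representation worthwhile.

```python
def to_mont(a, R, p):
    """Montgomery form of a: a * R mod p."""
    return a * R % p


def mont_mul(x, y, R_inv, p):
    """Multiply two Montgomery-form values: (aR)(bR)/R = abR mod p."""
    return x * y * R_inv % p


def from_mont(x, R_inv, p):
    """Leave Montgomery form by dividing by R one last time."""
    return x * R_inv % p


# Toy parameters: an odd prime p and a power of two R larger than p.
p = 101
R = 128
R_inv = pow(R, -1, p)   # exists because p is odd, so gcd(R, p) = 1

a, b = 17, 23
product = from_mont(mont_mul(to_mont(a, R, p), to_mont(b, R, p), R_inv, p), R_inv, p)
print(product == a * b % p)
```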

833
00:40:04,890 --> 00:40:06,586
AUDIENCE: [INAUDIBLE]

834
00:40:06,586 --> 00:40:08,086
PROFESSOR: We're
now going to pick R

835
00:40:08,086 --> 00:40:09,560
to be a very nice number.

836
00:40:09,560 --> 00:40:11,900
And in particular,
we're going to pick R

837
00:40:11,900 --> 00:40:17,780
to be a very nice number to make
this division by R very fast.

838
00:40:17,780 --> 00:40:21,320
And the cool thing is
that if this division by R

839
00:40:21,320 --> 00:40:24,499
is going to be very
fast, then this

840
00:40:24,499 --> 00:40:26,290
is going to be a small
number and we're not

841
00:40:26,290 --> 00:40:29,460
going to have to do
this mod q very often.

842
00:40:29,460 --> 00:40:32,120
In particular, aR,
let's say, is also

843
00:40:32,120 --> 00:40:34,530
going to be roughly 500 bits
because it's all actually

844
00:40:34,530 --> 00:40:36,630
mod p or mod q.

845
00:40:36,630 --> 00:40:39,320
So aR is 500 bits.

846
00:40:39,320 --> 00:40:41,230
bR is also going to be 500 bits.

847
00:40:41,230 --> 00:40:44,160
So this product is
going to be 1,000 bits.

848
00:40:44,160 --> 00:40:46,830
This R is going to be
this nice, roughly 500 bit

849
00:40:46,830 --> 00:40:48,630
number, same size as p.

850
00:40:48,630 --> 00:40:50,925
And if we can make this
division to be fast,

851
00:40:50,925 --> 00:40:55,744
then the result is going to be
a roughly 500 bit number here.

852
00:40:55,744 --> 00:40:57,910
So we were able to do the
multiplying without having

853
00:40:57,910 --> 00:40:59,400
to do an extra divide.

854
00:40:59,400 --> 00:41:03,920
Dividing by R cheaply gives us
this small result, getting us

855
00:41:03,920 --> 00:41:08,360
out of doing a mod p
for most situations.

856
00:41:08,360 --> 00:41:11,670
OK, so what is this weird number
that I keep talking about?

857
00:41:11,670 --> 00:41:17,944
Well R is just going
to be 2 to 512.

858
00:41:17,944 --> 00:41:22,930
It's going to be 1
followed by a ton of zeros.

859
00:41:22,930 --> 00:41:25,260
So multiplying by
this is easy, you just

860
00:41:25,260 --> 00:41:27,320
append a bunch of
zeros to a number.

861
00:41:27,320 --> 00:41:32,960
Dividing could be easy if
the low bits of the result

862
00:41:32,960 --> 00:41:34,547
are all zeros.

863
00:41:34,547 --> 00:41:37,750
So if you have a value
that's a bunch of bits

864
00:41:37,750 --> 00:41:41,460
followed by 512 zeros, then
dividing by 2 to the 512

865
00:41:41,460 --> 00:41:41,960
is cheap.

866
00:41:41,960 --> 00:41:44,337
You just discard the zeros
on the right-hand side.

867
00:41:44,337 --> 00:41:47,140
And that's actually
the correct division.

868
00:41:47,140 --> 00:41:48,650
Does that make sense?

869
00:41:48,650 --> 00:41:50,311
The slight problem
is that we actually

870
00:41:50,311 --> 00:41:51,664
don't have zeros on
the right hand side

871
00:41:51,664 --> 00:41:53,110
when you do this multiplication.

872
00:41:53,110 --> 00:41:56,750
These are like real 512 bit
numbers with all the 512 bits

873
00:41:56,750 --> 00:41:57,460
used.

874
00:41:57,460 --> 00:41:58,890
So this will be a
1,000 bit number

875
00:41:58,890 --> 00:42:02,352
with all these bits
also set randomly to 0 or 1,

876
00:42:02,352 --> 00:42:03,560
depending on what's going on.

877
00:42:03,560 --> 00:42:06,460
So we can't just
discard the low bits.

878
00:42:06,460 --> 00:42:09,144
But the cleverness
comes from the fact

879
00:42:09,144 --> 00:42:11,210
that the only
thing we care about

880
00:42:11,210 --> 00:42:14,370
is the value of
this thing mod p.

881
00:42:14,370 --> 00:42:18,610
So you can always add
multiples of p to this value

882
00:42:18,610 --> 00:42:22,380
without changing what
it's equivalent to mod p.

883
00:42:22,380 --> 00:42:25,130
And as a result, we
can add multiples of p

884
00:42:25,130 --> 00:42:28,020
to get the low bits
to all be zeros.

885
00:42:28,020 --> 00:42:30,510
So let's look through
some simple examples.

886
00:42:30,510 --> 00:42:33,390
I'm not going to write
out 512 bits on the board.

887
00:42:33,390 --> 00:42:37,325
But suppose that--
here's a short example.

888
00:42:40,200 --> 00:42:42,710
Suppose that we have
a situation where

889
00:42:42,710 --> 00:42:46,340
our value R is 2 to the 4th.

890
00:42:46,340 --> 00:42:49,810
So it's 1 followed
by four zeros.

891
00:42:49,810 --> 00:42:53,170
So this is a much smaller
example than the real thing.

892
00:42:53,170 --> 00:42:55,140
But let's see how this
Montgomery division

893
00:42:55,140 --> 00:42:57,170
is going to work out.

894
00:42:57,170 --> 00:43:02,600
So suppose we're going to try
to compute stuff mod q, where

895
00:43:02,600 --> 00:43:05,570
q, let's say, is maybe 7.

896
00:43:05,570 --> 00:43:10,000
So this is 1 1 1 in binary form.

897
00:43:10,000 --> 00:43:12,970
And what we're
going to try to do

898
00:43:12,970 --> 00:43:16,360
is maybe we did
some multiplication.

899
00:43:16,360 --> 00:43:19,700
And this value aR
times bR is equal

900
00:43:19,700 --> 00:43:26,520
to this binary
representation 1 1 0 1 0.

901
00:43:26,520 --> 00:43:31,060
So this is going to be
the value of aR times bR.

902
00:43:31,060 --> 00:43:32,780
How do we divide it by R?

903
00:43:32,780 --> 00:43:35,175
So clearly the low
four bits aren't all 0,

904
00:43:35,175 --> 00:43:37,472
so we can't just divide it out.

905
00:43:37,472 --> 00:43:40,680
But we can add multiples of q.

906
00:43:40,680 --> 00:43:45,510
In particular, we
can add 2 times q.

907
00:43:45,510 --> 00:43:49,700
So 2q is equal to 1 1 1 0.

908
00:43:49,700 --> 00:43:56,740
And now what we get
is 0 0, carry a 1, 0,

909
00:43:56,740 --> 00:44:01,520
carry a 1, 1, carry a 1, 0 1.

910
00:44:01,520 --> 00:44:02,520
I hope I did that right.

911
00:44:02,520 --> 00:44:03,530
So this is what we get.

912
00:44:03,530 --> 00:44:07,207
So now we get aR
bR plus 2q.

913
00:44:07,207 --> 00:44:09,290
But we actually don't care
about the plus 2q.

914
00:44:09,290 --> 00:44:11,123
It's actually fine
because all we care about

915
00:44:11,123 --> 00:44:12,190
is the value mod q.

916
00:44:15,190 --> 00:44:18,070
And now we're closer, we have
three 0 bits at the bottom.

917
00:44:18,070 --> 00:44:20,190
Now we can add
another multiple of q.

918
00:44:20,190 --> 00:44:23,000
This time it's going
to be probably 8q.

919
00:44:23,000 --> 00:44:26,680
So we add 1 1 1 0 0 0.

920
00:44:26,680 --> 00:44:29,905
And if we add it, we're
going to get, let's say,

921
00:44:29,905 --> 00:44:37,120
0 0 0 then add these two guys
0, carry a 1, 0, carry a 1, 1 1.

922
00:44:37,120 --> 00:44:38,250
I think that's right.

923
00:44:38,250 --> 00:44:41,390
But now we have
our original aR bR

924
00:44:41,390 --> 00:44:45,030
plus 2q plus 8q is
equal to this thing.

925
00:44:45,030 --> 00:44:48,720
And finally, we can divide
this thing by R very cheaply.

926
00:44:48,720 --> 00:44:54,762
Because we just discard
the low four zeros.
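
The board example can be replayed in code. This is a sketch of the bit-by-bit idea only, with a made-up function name: for every low bit that is still set, add q shifted to that position, which clears the bit because q is odd.

```python
def divide_by_r_bitwise(t, q, k):
    """Return a value congruent to t / 2**k (mod q), for odd q.

    For each of the low k bits that is set, add q shifted to that bit
    position. Adding a multiple of q does not change the value mod q.
    """
    for i in range(k):
        if (t >> i) & 1:
            t += q << i
    return t >> k   # the low k bits are now zero, so this shift is exact

# Board values: aR * bR = 1 1 0 1 0 in binary, q = 7, R = 2**4.
# The loop adds 2q (bit 1) and then 8q (bit 3), as in the lecture.
result = divide_by_r_bitwise(0b11010, 7, 4)
print(result)
```

The returned value can be checked against `(0b11010 * pow(16, -1, 7)) % 7`, which computes the same division with an explicit inverse.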

927
00:44:54,762 --> 00:44:56,205
Make sense?

928
00:44:56,205 --> 00:44:57,167
Question.

929
00:44:57,167 --> 00:45:01,150
AUDIENCE: Is aR bR always
going to end in, I guess,

930
00:45:01,150 --> 00:45:03,270
1,024 zeros?

931
00:45:03,270 --> 00:45:08,021
PROFESSOR: No, and the
reason is that-- OK,

932
00:45:08,021 --> 00:45:10,130
here is the thing
that's maybe confusing.

933
00:45:10,130 --> 00:45:12,710
A was, let's say, 512 bits.

934
00:45:12,710 --> 00:45:15,470
Then you multiply it by
R. So here, you're right.

935
00:45:15,470 --> 00:45:19,380
This value is that 1,000 bit
number where the high bit is

936
00:45:19,380 --> 00:45:20,980
a, the high 512 bits are a.

937
00:45:20,980 --> 00:45:22,794
And the low bits are all zeros.

938
00:45:22,794 --> 00:45:24,710
But then, you're going
[? to do it with ?] mod

939
00:45:24,710 --> 00:45:27,410
q to bring it down
to make it smaller.

940
00:45:27,410 --> 00:45:29,570
And in general, this is
going to be the case.

941
00:45:29,570 --> 00:45:32,745
Because [? it only ?] has
these low zeros the first time

942
00:45:32,745 --> 00:45:33,370
you convert it.

943
00:45:33,370 --> 00:45:35,119
But after you do a
couple multiplications,

944
00:45:35,119 --> 00:45:37,685
they're going to
be arbitrary bits.

945
00:45:37,685 --> 00:45:40,270
So these guys are--
so I really should

946
00:45:40,270 --> 00:45:43,260
have written mod q here--
and to compute this mod q

947
00:45:43,260 --> 00:45:49,356
as soon as you do the conversion
to keep the whole value small.

948
00:45:49,356 --> 00:45:50,802
AUDIENCE: [INAUDIBLE]

949
00:45:50,802 --> 00:45:53,460
PROFESSOR: Yeah, so the
initial conversion is expensive

950
00:45:53,460 --> 00:45:58,650
or at least it's as expensive
as doing a regular modulus

951
00:45:58,650 --> 00:46:01,010
during the multiplication.

952
00:46:01,010 --> 00:46:03,010
The cool thing is
that you pay this cost

953
00:46:03,010 --> 00:46:05,176
just once when you do the
conversion into Montgomery

954
00:46:05,176 --> 00:46:06,122
form.

955
00:46:06,122 --> 00:46:09,240
And then, instead of converting
it back at every step,

956
00:46:09,240 --> 00:46:11,235
you just keep it
in Montgomery form.

957
00:46:11,235 --> 00:46:13,700
But remember that in order
to do an exponentiation

958
00:46:13,700 --> 00:46:16,064
to an exponent
which has 512 bits,

959
00:46:16,064 --> 00:46:17,480
you're saying
you're going to have

960
00:46:17,480 --> 00:46:21,320
to do over 500 multiplications
because we have to do at least

961
00:46:21,320 --> 00:46:23,870
500 squarings plus then some.

962
00:46:23,870 --> 00:46:27,000
So you do these mod
q twice and then

963
00:46:27,000 --> 00:46:30,370
you get a lot of cheap divisions
if you stay in this form.

964
00:46:30,370 --> 00:46:34,500
And then you do a division by R
to get back to this form again.

965
00:46:34,500 --> 00:46:37,520
So instead of doing 500 mod qs
for every multiplication step,

966
00:46:37,520 --> 00:46:39,366
you do it twice mod q.

967
00:46:39,366 --> 00:46:41,510
And then you keep
doing these divisions

968
00:46:41,510 --> 00:46:45,080
by R cheaply using this trick.

969
00:46:45,080 --> 00:46:45,580
Question.

970
00:46:45,580 --> 00:46:49,460
AUDIENCE: So when you're
adding the multiples of q

971
00:46:49,460 --> 00:46:51,400
and then dividing
by R, [INAUDIBLE]

972
00:46:54,310 --> 00:46:56,780
PROFESSOR: Because it's
actually mod q means

973
00:46:56,780 --> 00:46:58,920
the remainder when
you divide by q.

974
00:46:58,920 --> 00:47:07,990
So x plus y times
q, mod q is just x.

975
00:47:07,990 --> 00:47:08,930
AUDIENCE: [INAUDIBLE]

976
00:47:12,230 --> 00:47:16,089
PROFESSOR: So in this case,
dividing by-- so another sort

977
00:47:16,089 --> 00:47:17,630
of nice property is
that because it's

978
00:47:17,630 --> 00:47:22,450
all modulo a prime
number-- it's also true

979
00:47:22,450 --> 00:47:28,080
that if you have x
plus yq divided by R,

980
00:47:28,080 --> 00:47:35,790
mod q is actually the same
as x divided by R mod q.

981
00:47:35,790 --> 00:47:39,180
The way to think of it is
that there's no real division

982
00:47:39,180 --> 00:47:40,650
in modular arithmetic.

983
00:47:40,650 --> 00:47:41,730
It's just an inverse.

984
00:47:41,730 --> 00:47:44,060
So what this really
says is this is actually

985
00:47:44,060 --> 00:47:49,465
x plus yq times some
number called R inverse.

986
00:47:49,465 --> 00:47:52,930
And then you compute
this whole thing mod q.

987
00:47:52,930 --> 00:47:57,210
And then you could think of
this as x times R inverse

988
00:47:57,210 --> 00:48:05,320
mod q plus y q
R inverse mod q.

989
00:48:05,320 --> 00:48:08,610
And this thing cancels out
because it's something times q.
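
Written out, the argument above is the following chain of congruences. It assumes that R has an inverse modulo q, which holds here because R is a power of 2 and q is an odd prime.

```latex
\[
\frac{x + yq}{R}
  \;\equiv\; (x + yq)\,R^{-1}
  \;\equiv\; x\,R^{-1} + y\,q\,R^{-1}
  \;\equiv\; x\,R^{-1} \pmod{q}
\]
```

The middle term vanishes because it is a multiple of q.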

990
00:48:15,060 --> 00:48:17,856
And there's some closed
form for this thing.

991
00:48:17,856 --> 00:48:22,195
So here I did it by bit by
bit, 2q then 8q, et cetera.

992
00:48:22,195 --> 00:48:23,765
There's actually a
nice closed formula

993
00:48:23,765 --> 00:48:25,630
you can compute-- it's
in the lecture notes,

994
00:48:25,630 --> 00:48:27,880
but it's probably not worth
spending time on the board

995
00:48:27,880 --> 00:48:31,215
here-- for how do you figure
out what multiple of q

996
00:48:31,215 --> 00:48:35,331
should you add to get all
the low bits to turn to 0.

997
00:48:35,331 --> 00:48:38,200
So then it turns out that in
order to do this division by R,

998
00:48:38,200 --> 00:48:43,450
you just need to compute this
magic multiple of q, add it.

999
00:48:43,450 --> 00:48:46,290
And then discard the
low bits and that

1000
00:48:46,290 --> 00:48:53,047
brings your number back to 512
bits, or whatever the size is.
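
The closed formula is not given in this part of the lecture. The sketch below uses the standard Montgomery reduction formula, m = t * (-q^-1) mod R, as an assumption about what the lecture notes contain. The function name `redc` is a common label, not something taken from the lecture.

```python
def redc(t, q, k):
    """Montgomery reduction: return a value congruent to t / 2**k (mod q).

    Assumes q is odd and 0 <= t < q * 2**k. The result lies in [0, 2q),
    so one more subtraction of q may still be needed afterwards.
    """
    r = 1 << k
    q_neg_inv = (-pow(q, -1, r)) % r   # -q^{-1} mod R; can be precomputed once
    m = (t * q_neg_inv) % r            # multiple of q to add; "mod R" is a bit mask
    return (t + m * q) >> k            # low k bits are zero, so the shift is exact

# Same toy values as the board example: t = 1 1 0 1 0, q = 7, R = 2**4.
m_example = (0b11010 * ((-pow(7, -1, 16)) % 16)) % 16
print(m_example, redc(0b11010, 7, 4))
```

For the board values the multiple works out to 10, which is the 2q plus 8q added by hand above.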

1001
00:48:53,047 --> 00:48:54,029
OK.

1002
00:48:54,029 --> 00:48:55,790
And here's the subtlety.

1003
00:48:55,790 --> 00:48:57,470
The only reason we're
talking about this

1004
00:48:57,470 --> 00:49:00,470
is that there's something
funny going on here

1005
00:49:00,470 --> 00:49:05,090
that is going to allow us
to learn timing information.

1006
00:49:05,090 --> 00:49:09,780
And in particular, even
though we divided by R,

1007
00:49:09,780 --> 00:49:12,770
we know the result is
going to be 512 bits.

1008
00:49:12,770 --> 00:49:15,123
But it still might
be greater than q

1009
00:49:15,123 --> 00:49:16,820
because q isn't exactly
2 to the 512,

1010
00:49:16,820 --> 00:49:18,340
it's not a 512 bit number.

1011
00:49:18,340 --> 00:49:20,840
So it might be a
little bit less than R.

1012
00:49:20,840 --> 00:49:24,730
So it might be that after we
do this cheap division by R,

1013
00:49:24,730 --> 00:49:26,960
we have to
subtract out q one more

1014
00:49:26,960 --> 00:49:29,690
time because we get something
that's small but not

1015
00:49:29,690 --> 00:49:31,400
quite small enough.

1016
00:49:31,400 --> 00:49:34,740
So there's a chance that
after doing this division,

1017
00:49:34,740 --> 00:49:39,740
we maybe have to also
subtract q again.

1018
00:49:39,740 --> 00:49:42,390
And this subtraction is
going to be part of what

1019
00:49:42,390 --> 00:49:44,250
this attack is all about.

1020
00:49:44,250 --> 00:49:48,060
It turns out that
subtracting this q adds time.

1021
00:49:48,060 --> 00:49:51,660
And someone figured
out-- not these guys

1022
00:49:51,660 --> 00:49:53,050
but some previous
work-- that you

1023
00:49:53,050 --> 00:49:56,770
show that this probability
of doing this thing, this

1024
00:49:56,770 --> 00:49:58,145
is called an
extra reduction.

1025
00:50:03,500 --> 00:50:10,020
This probability sort of
depends on the particular value

1026
00:50:10,020 --> 00:50:12,410
that you're exponentiating.

1027
00:50:12,410 --> 00:50:19,790
So if you're computing
x to the d mod q,

1028
00:50:19,790 --> 00:50:22,400
the probability of
an extra reduction,

1029
00:50:22,400 --> 00:50:25,240
at some point while
computing x to the d mod q,

1030
00:50:25,240 --> 00:50:31,860
is going to be equal to
x mod q divided by 2R.

1031
00:50:36,890 --> 00:50:40,390
So if we're going to be
computing x to the d mod q,

1032
00:50:40,390 --> 00:50:43,690
then depending on what
the value of x mod q

1033
00:50:43,690 --> 00:50:45,410
is, whether it's
big or small, you're

1034
00:50:45,410 --> 00:50:49,080
going to have even more or
less of these extra reductions.

1035
00:50:49,080 --> 00:50:51,577
And just to show you where
this is going to fit in,

1036
00:50:51,577 --> 00:50:53,785
this is actually going to
happen in the decrypt step,

1037
00:50:53,785 --> 00:50:55,951
because during the decrypt
step, the server is going

1038
00:50:55,951 --> 00:50:57,330
to be computing c to the d.

1039
00:50:57,330 --> 00:51:00,650
And this says the
extra reductions

1040
00:51:00,650 --> 00:51:05,160
are going to be proportional to
how close x, or c in this case,

1041
00:51:05,160 --> 00:51:07,254
is to the value q.

1042
00:51:07,254 --> 00:51:08,920
So this is going to
be worrisome, right,

1043
00:51:08,920 --> 00:51:12,490
because the attacker gets
to choose the input c.

1044
00:51:12,490 --> 00:51:14,640
And the number of
extra reductions

1045
00:51:14,640 --> 00:51:16,940
is going to be proportional
to how close the c is

1046
00:51:16,940 --> 00:51:18,981
to one of the factors, the q.

1047
00:51:18,981 --> 00:51:21,260
And this is how you're going
to tell I'm getting close

1048
00:51:21,260 --> 00:51:23,337
to the q, or I've overshot q.

1049
00:51:23,337 --> 00:51:25,545
And all of a sudden, there's
no extra reductions,

1050
00:51:25,545 --> 00:51:28,556
it's probably because x mod
q is very small, since x is

1051
00:51:28,556 --> 00:51:29,472
q plus little epsilon.

1052
00:51:29,472 --> 00:51:31,720
And it's very small.

1053
00:51:31,720 --> 00:51:33,942
So that's one part
of the timing attack

1054
00:51:33,942 --> 00:51:35,650
we're going to be
looking at in a second.

1055
00:51:38,770 --> 00:51:42,740
I don't have any proof that
this is actually true [INAUDIBLE]

1056
00:51:42,740 --> 00:51:44,905
these extra
reductions work like this.

1057
00:51:44,905 --> 00:51:45,680
Yeah, question.

1058
00:51:45,680 --> 00:51:48,700
AUDIENCE: What happens if you
don't do this extra reduction?

1059
00:51:48,700 --> 00:51:51,210
PROFESSOR: Oh, what happens
if you don't do this extra

1060
00:51:51,210 --> 00:51:51,710
reduction?

1061
00:51:55,510 --> 00:51:57,850
You can avoid this
extra reduction.

1062
00:51:57,850 --> 00:52:01,790
And then you just have
to do some extra probably

1063
00:52:01,790 --> 00:52:03,410
modular reductions later.

1064
00:52:03,410 --> 00:52:06,500
I think the math just
works out nicely this way

1065
00:52:06,500 --> 00:52:07,834
for the Montgomery form.

1066
00:52:07,834 --> 00:52:09,750
I think for many of these
things it's actually

1067
00:52:09,750 --> 00:52:12,406
once you look at them as a
timing channel [INAUDIBLE]

1068
00:52:12,406 --> 00:52:13,780
[? think ?] don't
do this at all,

1069
00:52:13,780 --> 00:52:16,004
or maybe you should
do some other plan.

1070
00:52:16,004 --> 00:52:16,670
So you're right,

1071
00:52:16,670 --> 00:52:19,710
I think you could probably
avoid this extra reduction

1072
00:52:19,710 --> 00:52:22,655
and probably just do the
mod q, perhaps at the end.

1073
00:52:22,655 --> 00:52:24,840
I haven't actually
tried implementing this.

1074
00:52:24,840 --> 00:52:27,380
But it seems like it could work.

1075
00:52:27,380 --> 00:52:29,390
It might be that you just
have to do mod q once

1076
00:52:29,390 --> 00:52:31,598
[? there ?], which you'll
probably have to do anyway.

1077
00:52:31,598 --> 00:52:32,820
So it's not super clear.

1078
00:52:32,820 --> 00:52:37,770
Maybe it's [INAUDIBLE]
probably not q.

1079
00:52:37,770 --> 00:52:40,314
So in light of the
fact that [INAUDIBLE].

1080
00:52:44,274 --> 00:52:46,440
Actually, I shouldn't speak
authoritatively to this.

1081
00:52:46,440 --> 00:52:47,000
I haven't tried
implementing this.

1082
00:52:47,000 --> 00:52:49,166
So maybe there's some deep
reason why this extra

1083
00:52:49,166 --> 00:52:50,184
reduction has to happen.

1084
00:52:50,184 --> 00:52:53,490
I couldn't think of one.

1085
00:52:53,490 --> 00:52:54,450
All right, questions?

1086
00:52:57,110 --> 00:53:00,995
So here's the last piece of
the puzzle for how OpenSSL,

1087
00:53:00,995 --> 00:53:06,040
this library that this
paper attacks, implements

1088
00:53:06,040 --> 00:53:07,870
multiplication.

1089
00:53:07,870 --> 00:53:12,630
So this Montgomery trick is
great for avoiding the mod q

1090
00:53:12,630 --> 00:53:15,630
part during modular
multiplication.

1091
00:53:15,630 --> 00:53:17,770
But then there's a question
of how do you actually

1092
00:53:17,770 --> 00:53:19,020
multiply two numbers together.

1093
00:53:19,020 --> 00:53:21,235
So we're going lower
and lower level.

1094
00:53:21,235 --> 00:53:25,791
So suppose you have
[? the raw ?] multiplication.

1095
00:53:28,579 --> 00:53:30,370
So this is not even
modular multiplication.

1096
00:53:30,370 --> 00:53:33,475
You have two numbers, a and b.

1097
00:53:33,475 --> 00:53:38,636
And both these guys
are 512 bit numbers.

1098
00:53:38,636 --> 00:53:40,250
How do you multiply
them together

1099
00:53:40,250 --> 00:53:42,400
when your machine is
only a 32 bit machine,

1100
00:53:42,400 --> 00:53:46,226
like the guys in the paper, or
a 64 bit, but still, same thing?

1101
00:53:46,226 --> 00:53:48,670
How would you implement
multiplication of these guys?

1102
00:53:53,740 --> 00:53:56,242
Any suggestions?

1103
00:53:56,242 --> 00:53:58,200
Well I guess it was a
straightforward question,

1104
00:53:58,200 --> 00:54:01,860
you just represent a and
b as a sequence of machine

1105
00:54:01,860 --> 00:54:05,290
[? words. ?] And then you
just do this quadratic product

1106
00:54:05,290 --> 00:54:06,752
of these two guys.

1107
00:54:06,752 --> 00:54:08,960
[INAUDIBLE] see a simple
example, instead of thinking

1108
00:54:08,960 --> 00:54:13,574
of a 512 bit number, let's think
of these guys as 64 bit numbers

1109
00:54:13,574 --> 00:54:15,671
and we're on a 32 bit machine.

1110
00:54:15,671 --> 00:54:16,170
Right.

1111
00:54:16,170 --> 00:54:17,900
So we're going to have values.

1112
00:54:17,900 --> 00:54:20,794
The value of a is going
to be represented by two

1113
00:54:20,794 --> 00:54:21,960
[? very ?] different things.

1114
00:54:21,960 --> 00:54:27,550
It's going to be, let's
call it, a1 and a0.

1115
00:54:27,550 --> 00:54:29,895
So a0 is the low word,
a1 is the high word.

1116
00:54:29,895 --> 00:54:31,520
And similarly, we're
going to represent

1117
00:54:31,520 --> 00:54:36,760
b as two things, b1 b0.

1118
00:54:36,760 --> 00:54:39,640
So then a naive way
to represent a b

1119
00:54:39,640 --> 00:54:44,310
is going to be to multiply
all these guys out.

1120
00:54:44,310 --> 00:54:48,020
So it's going to be
a three cell number.

1121
00:54:48,020 --> 00:54:52,140
The high word is
going to be a1 b1.

1122
00:54:52,140 --> 00:54:55,560
The low word is
going to be a0 b0.

1123
00:54:55,560 --> 00:55:01,845
And the middle word is going
to be a1 b0 plus a0 b1.

1124
00:55:01,845 --> 00:55:06,330
So this is how you do the
multiplication, right.

1125
00:55:06,330 --> 00:55:06,940
Question?

1126
00:55:06,940 --> 00:55:08,822
AUDIENCE: So I was
going to say are

1127
00:55:08,822 --> 00:55:10,785
you using [INAUDIBLE] method?

1128
00:55:10,785 --> 00:55:13,060
PROFESSOR: Yeah, so this
is like a clever method

1129
00:55:13,060 --> 00:55:15,490
alternative for doing
multiplication, which

1130
00:55:15,490 --> 00:55:16,680
doesn't involve four steps.

1131
00:55:16,680 --> 00:55:18,435
Here, you have to do
four multiplications.

1132
00:55:18,435 --> 00:55:20,807
There's this clever
other method, Karatsuba.

1133
00:55:20,807 --> 00:55:22,890
Do they teach this in 601
or something these days?

1134
00:55:22,890 --> 00:55:23,290
AUDIENCE: 042.

1135
00:55:23,290 --> 00:55:24,373
PROFESSOR: 042, excellent.

1136
00:55:24,373 --> 00:55:25,980
Yeah, that's a very nice method.

1137
00:55:25,980 --> 00:55:29,440
Almost every cryptographic
library implements this.

1138
00:55:29,440 --> 00:55:32,230
And for those of
you that, I guess,

1139
00:55:32,230 --> 00:55:34,980
weren't undergrads here, since
we have grad students maybe

1140
00:55:34,980 --> 00:55:35,685
they haven't seen Karatsuba.

1141
00:55:35,685 --> 00:55:37,184
I'll just write it
out on the board.

1142
00:55:37,184 --> 00:55:40,850
It's a clever thing the
first time you see it.

1143
00:55:40,850 --> 00:55:46,310
And what you can do is basically
compute out three values.

1144
00:55:46,310 --> 00:55:49,040
You're going to
compute out a1 b1.

1145
00:55:49,040 --> 00:55:59,190
You're going to also
compute a1 minus b0 times b1

1146
00:55:59,190 --> 00:56:04,950
minus-- sorry-- a1
minus a0, b1 minus b0.

1147
00:56:04,950 --> 00:56:08,690
And a0 b0.

1148
00:56:08,690 --> 00:56:11,125
And this does three
multiplications

1149
00:56:11,125 --> 00:56:12,225
instead of four.

1150
00:56:12,225 --> 00:56:13,810
And it turns out
you can actually

1151
00:56:13,810 --> 00:56:18,440
reconstruct this value from
these three multiplication

1152
00:56:18,440 --> 00:56:20,200
results.

1153
00:56:20,200 --> 00:56:22,810
And the particular
way to do it is this

1154
00:56:22,810 --> 00:56:29,736
is going to be the--
let me write it out

1155
00:56:29,736 --> 00:56:31,910
in a different form.

1156
00:56:31,910 --> 00:56:41,010
So we're going to have 2 to the
64 times-- sorry-- 2 to the 64

1157
00:56:41,010 --> 00:56:52,710
plus 2 to the 32
times a1 b1 plus 2

1158
00:56:52,710 --> 00:57:00,230
to the 32 times minus that
little guy in the middle a1

1159
00:57:00,230 --> 00:57:05,640
minus a0 b1 minus b0.

1160
00:57:05,640 --> 00:57:15,020
And finally, we're going to do
2 to the 32 plus 1 times a0 b0.

1161
00:57:15,020 --> 00:57:16,920
And it's a little
messy, but actually

1162
00:57:16,920 --> 00:57:19,380
if you work through
the details, you'll

1163
00:57:19,380 --> 00:57:20,880
end up convincing
yourself hopefully

1164
00:57:20,880 --> 00:57:26,285
that this value is exactly
the same as this value.
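
The identity on the board can be checked directly. This sketch follows the three products as written in the lecture, again for 64-bit values split into 32-bit words. The function name is an assumption for illustration.

```python
MASK32 = (1 << 32) - 1

def karatsuba_mul64(a, b):
    """Multiply two 64-bit values using three word products instead of four."""
    a1, a0 = a >> 32, a & MASK32
    b1, b0 = b >> 32, b & MASK32
    p_high = a1 * b1
    p_mid = (a1 - a0) * (b1 - b0)   # may be negative; Python integers handle that
    p_low = a0 * b0
    return (((1 << 64) + (1 << 32)) * p_high
            - (1 << 32) * p_mid
            + ((1 << 32) + 1) * p_low)

a, b = 0x123456789ABCDEF0, 0xFEDCBA9876543210
print(karatsuba_mul64(a, b) == a * b)
```

The reason it works is that a1 b0 + a0 b1 equals a1 b1 + a0 b0 minus the middle product, so the middle word never needs its own two multiplications.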

1165
00:57:26,285 --> 00:57:27,930
So it's clever.

1166
00:57:27,930 --> 00:57:31,470
But nonetheless, it saves
you one multiplication.

1167
00:57:31,470 --> 00:57:34,670
And the way we
apply this to doing

1168
00:57:34,670 --> 00:57:37,660
much larger multiplications
is that you recursively

1169
00:57:37,660 --> 00:57:38,610
keep going down.

1170
00:57:38,610 --> 00:57:41,750
So if you have 512
bit values, you

1171
00:57:41,750 --> 00:57:44,790
could break it down to
256 bit multiplication.

1172
00:57:44,790 --> 00:57:47,802
You do three 256
bit multiplications.

1173
00:57:47,802 --> 00:57:49,260
And then each of
those you're going

1174
00:57:49,260 --> 00:57:52,410
to do using the same
Karatsuba trick recursively.

1175
00:57:52,410 --> 00:57:54,840
And eventually you'll get
down to machine size, which

1176
00:57:54,840 --> 00:57:56,986
you can just do with
a single machine

1177
00:57:56,986 --> 00:58:02,590
instruction. [INAUDIBLE]
This make sense?
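
The recursion can be sketched as follows. This is an illustrative version with assumed names and a 32-bit word size. It is not how OpenSSL lays out its big numbers.

```python
def karatsuba(a, b, bits, word_bits=32):
    """Multiply non-negative a and b, each below 2**bits, by recursive Karatsuba.

    Intended for bits equal to word_bits times a power of two, e.g. 512.
    """
    if bits <= word_bits:
        return a * b                      # stands in for one machine multiply
    half = bits // 2
    mask = (1 << half) - 1
    a1, a0 = a >> half, a & mask
    b1, b0 = b >> half, b & mask
    p_high = karatsuba(a1, b1, half, word_bits)
    p_low = karatsuba(a0, b0, half, word_bits)
    da, db = a1 - a0, b1 - b0
    sign = -1 if (da < 0) != (db < 0) else 1
    p_mid = sign * karatsuba(abs(da), abs(db), half, word_bits)
    # a1*b0 + a0*b1 == p_high + p_low - p_mid
    return (p_high << (2 * half)) + ((p_high + p_low - p_mid) << half) + p_low

a = (1 << 512) - 12345
b = (1 << 511) + 6789
print(karatsuba(a, b, 512) == a * b)
```

Each level replaces one multiplication of a given size with three of half the size, which is where the roughly n to the 1.58 cost comes from.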

1178
00:58:02,590 --> 00:58:04,660
So what's the
timing attack here?

1179
00:58:04,660 --> 00:58:07,430
How do these guys exploit
this Karatsuba multiplication?

1180
00:58:07,430 --> 00:58:11,720
Well, it turns out
that OpenSSL worries

1181
00:58:11,720 --> 00:58:13,920
about basically two
kinds of multiplications

1182
00:58:13,920 --> 00:58:15,850
that you might need to do.

1183
00:58:15,850 --> 00:58:18,757
One is a multiplication
between two large numbers

1184
00:58:18,757 --> 00:58:19,965
that are about the same size.

1185
00:58:19,965 --> 00:58:22,250
So this happens a
lot when we're doing

1186
00:58:22,250 --> 00:58:25,327
this modular exponentiation
because all the values we're

1187
00:58:25,327 --> 00:58:26,868
going to be multiplying
are all going

1188
00:58:26,868 --> 00:58:29,445
to be roughly 512 bits in size.

1189
00:58:29,445 --> 00:58:33,330
So when we're multiplying by c
to the y or doing a squaring,

1190
00:58:33,330 --> 00:58:35,850
we're multiplying two things
that are about the same size.

1191
00:58:35,850 --> 00:58:38,890
And then this Karatsuba
trick makes a lot of sense

1192
00:58:38,890 --> 00:58:41,290
because, instead
of computing stuff

1193
00:58:41,290 --> 00:58:43,790
in times squared
of the input size,

1194
00:58:43,790 --> 00:58:48,740
Karatsuba is roughly n to the
1.58, something like that.

1195
00:58:48,740 --> 00:58:50,335
So it's much faster.

1196
00:58:50,335 --> 00:58:52,490
But then there's
this other situation

1197
00:58:52,490 --> 00:58:54,930
where OpenSSL might be
multiplying two numbers that

1198
00:58:54,930 --> 00:58:57,410
are very different in
size: one that's very big,

1199
00:58:57,410 --> 00:58:58,530
and one that's very small.

1200
00:58:58,530 --> 00:59:00,900
And in that case you
could use Karatsuba,

1201
00:59:00,900 --> 00:59:02,990
but then it's going
to get you slower

1202
00:59:02,990 --> 00:59:04,610
than doing the naive thing.

1203
00:59:04,610 --> 00:59:06,660
Suppose you're trying
to multiply a 512 bit

1204
00:59:06,660 --> 00:59:08,997
number by a 64 bit
number, you'd rather just

1205
00:59:08,997 --> 00:59:10,830
do the straightforward
thing, where you just

1206
00:59:10,830 --> 00:59:13,050
multiply by each of the
things in the 64 bit

1207
00:59:13,050 --> 00:59:18,290
number, which is 2n instead of
n to the 1.58 something.

1208
00:59:18,290 --> 00:59:21,900
So as a result, the OpenSSL
guys tried to be clever,

1209
00:59:21,900 --> 00:59:25,760
and that's where
often problems start.

1210
00:59:25,760 --> 00:59:28,280
They decided that
they'll actually

1211
00:59:28,280 --> 00:59:30,880
switch dynamically between
this Karatsuba efficient thing

1212
00:59:30,880 --> 00:59:35,450
and this sort of grade school
method of multiplication here.

1213
00:59:35,450 --> 00:59:37,400
And their heuristic
was basically

1214
00:59:37,400 --> 00:59:39,050
if the two things
you're multiplying

1215
00:59:39,050 --> 00:59:42,483
are exactly the same
number of machine words,

1216
00:59:42,483 --> 00:59:44,024
so they at least
have the same number

1217
00:59:44,024 --> 00:59:48,110
of bits up to 32-bit units,
then they'll go to Karatsuba.

1218
00:59:48,110 --> 00:59:50,380
And if the two things
they're multiplying

1219
00:59:50,380 --> 00:59:52,770
have a different
number of 32 bit units,

1220
00:59:52,770 --> 00:59:57,660
then they'll do the quadratic
or straightforward or regular,

1221
00:59:57,660 --> 00:59:59,882
normal multiplication.
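
The heuristic as described can be sketched like this. The function names are hypothetical and only mirror the rule stated in the lecture. They are not OpenSSL functions.

```python
WORD_BITS = 32   # word size assumed from the lecture's 32-bit machine

def word_count(x):
    """Number of 32-bit words needed to hold the non-negative integer x."""
    return max(1, (x.bit_length() + WORD_BITS - 1) // WORD_BITS)

def choose_multiply(a, b):
    """Hypothetical dispatcher: equal word counts use Karatsuba, otherwise schoolbook."""
    if word_count(a) == word_count(b):
        return "karatsuba"
    return "schoolbook"

print(choose_multiply(1 << 511, 1 << 500))   # both operands fill 16 words
print(choose_multiply(1 << 511, 1 << 479))   # 16 words versus 15 words
```

A value that drops just below a word boundary changes the code path, and the time taken changes with it.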

1222
00:59:59,882 --> 01:00:03,880
And there you can see if
your number all of a sudden

1223
01:00:03,880 --> 01:00:06,290
switches to be a
little bit smaller,

1224
01:00:06,290 --> 01:00:08,710
then you're going to switch
from this efficient thing

1225
01:00:08,710 --> 01:00:11,240
to this other
multiplication method.

1226
01:00:11,240 --> 01:00:14,030
And presumably, the
cutoff point isn't

1227
01:00:14,030 --> 01:00:15,595
going to be exactly
smooth so you'll

1228
01:00:15,595 --> 01:00:17,500
be able to tell all
of a sudden, it's

1229
01:00:17,500 --> 01:00:19,190
now taking a lot
longer to multiply

1230
01:00:19,190 --> 01:00:22,320
or a lot shorter to
multiply than before.

1231
01:00:22,320 --> 01:00:26,000
And that's what these guys
exploit in their timing attack

1232
01:00:26,000 --> 01:00:26,940
again.

1233
01:00:26,940 --> 01:00:28,060
Does that make sense?

1234
01:00:28,060 --> 01:00:32,070
What's going on with the
[INAUDIBLE] All right.

1235
01:00:32,070 --> 01:00:34,680
So I think I'm now
done with telling you

1236
01:00:34,680 --> 01:00:36,385
about all the weird
implementation

1237
01:00:36,385 --> 01:00:39,590
tricks that people play when
implementing RSA in practice.

1238
01:00:39,590 --> 01:00:41,630
So now let's try to
put them back together

1239
01:00:41,630 --> 01:00:44,410
into an entire web
server and figure out

1240
01:00:44,410 --> 01:00:48,230
how do you [? tickle ?]
all these interesting bits

1241
01:00:48,230 --> 01:00:52,220
of the implementation from
the input network packet.

1242
01:00:52,220 --> 01:00:54,910
So what happens
in a web server is

1243
01:00:54,910 --> 01:00:59,330
that the web server, if
you remember from the HTTPS

1244
01:00:59,330 --> 01:01:01,890
lecture, has a secret key.

1245
01:01:01,890 --> 01:01:04,780
And it uses the
secret key to prove

1246
01:01:04,780 --> 01:01:06,820
that it's the
correct owner of

1247
01:01:06,820 --> 01:01:11,190
that certificate in the
HTTPS protocol or in TLS.

1248
01:01:11,190 --> 01:01:15,940
And the way this works is that
the clients send some randomly

1249
01:01:15,940 --> 01:01:19,470
chosen bits, and the
bits are encrypted

1250
01:01:19,470 --> 01:01:21,210
using the server's public key.

1251
01:01:21,210 --> 01:01:24,395
And the server in this TLS
protocol decrypts this message.

1252
01:01:24,395 --> 01:01:26,730
And if the message
checks out, it

1253
01:01:26,730 --> 01:01:29,249
uses those random bits to
establish a [? session ?].

1254
01:01:29,249 --> 01:01:32,246
But in this case, the message
isn't going to check out.

1255
01:01:32,246 --> 01:01:34,079
The message is going
to be carefully chosen,

1256
01:01:34,079 --> 01:01:35,845
the padding bits
aren't going to match,

1257
01:01:35,845 --> 01:01:37,470
and the server is
going to return an error

1258
01:01:37,470 --> 01:01:39,850
as soon as it finishes
decrypting our message.

1259
01:01:39,850 --> 01:01:42,080
And that's what we're
going to time here.

1260
01:01:42,080 --> 01:01:49,368
So the server-- you can think of
this as Apache with OpenSSL--

1261
01:01:49,368 --> 01:01:52,500
you're going to get a
message from the client,

1262
01:01:52,500 --> 01:01:55,940
and you can think of
this as a ciphertext

1263
01:01:55,940 --> 01:01:59,400
c, or a hypothetical
ciphertext, that the client

1264
01:01:59,400 --> 01:02:00,545
might have produced.

1265
01:02:00,545 --> 01:02:03,340
And the first thing we're going
to do with a ciphertext c,

1266
01:02:03,340 --> 01:02:06,910
we want to decrypt it
using roughly this formula.

1267
01:02:06,910 --> 01:02:08,820
And if you remember
the first optimization

1268
01:02:08,820 --> 01:02:12,806
we're going to apply is the
Chinese Remainder Theorem.

1269
01:02:12,806 --> 01:02:14,306
So the first thing
we're going to do

1270
01:02:14,306 --> 01:02:16,730
is basically split our
pipeline in two parts.

1271
01:02:16,730 --> 01:02:20,430
We're going to do one thing
mod p, another thing mod q,

1272
01:02:20,430 --> 01:02:22,719
and then recombine the
results at the end of the day.

1273
01:02:22,719 --> 01:02:24,218
So the first thing
we're going to do

1274
01:02:24,218 --> 01:02:26,070
is, we're actually
going to take c

1275
01:02:26,070 --> 01:02:28,580
and we're going
to compute, let's

1276
01:02:28,580 --> 01:02:35,480
call this c0, which is going
to be equal to c mod q.

1277
01:02:35,480 --> 01:02:38,710
And we're also going to have
a different value, let's

1278
01:02:38,710 --> 01:02:44,730
call it c1, which is
going to be c mod p.

1279
01:02:44,730 --> 01:02:46,930
And then we're going to
do the same thing to each

1280
01:02:46,930 --> 01:02:51,905
of these values to basically
compute c to the d mod p

1281
01:02:51,905 --> 01:02:55,010
and c to the d mod q.

1282
01:02:55,010 --> 01:02:58,070
And here we're going to
basically initially we're

1283
01:02:58,070 --> 01:03:00,585
going to [? starch. ?]
After CRT, we're

1284
01:03:00,585 --> 01:03:02,610
going to switch into
Montgomery representation

1285
01:03:02,610 --> 01:03:06,040
because that's going to make
our multiplies very fast.

1286
01:03:06,040 --> 01:03:08,150
So the next thing
SSL is going to do

1287
01:03:08,150 --> 01:03:09,610
to your number,
it's actually going

1288
01:03:09,610 --> 01:03:12,900
to compute all the
[INAUDIBLE] at c0 prime,

1289
01:03:12,900 --> 01:03:18,740
which is going to
be c0 times R mod q.

1290
01:03:18,740 --> 01:03:20,208
And the same thing
down here, I'm

1291
01:03:20,208 --> 01:03:21,666
not going to write
out the pipeline

1292
01:03:21,666 --> 01:03:23,200
because that'll look the same.

1293
01:03:23,200 --> 01:03:27,520
And then, now that we've
switched into Montgomery form,

1294
01:03:27,520 --> 01:03:31,840
we can finally do
our multiplications.

1295
01:03:31,840 --> 01:03:34,190
And here's where we're going
to use the sliding window

1296
01:03:34,190 --> 01:03:35,780
technique.

1297
01:03:35,780 --> 01:03:38,290
So once we have c
prime, we can actually

1298
01:03:38,290 --> 01:03:47,460
try to compute c0 prime
exponentiated to the d mod q.

1299
01:03:47,460 --> 01:03:52,250
And here, as we're computing
this value to the d,

1300
01:03:52,250 --> 01:03:53,990
we're going to be
using sliding windows.

1301
01:03:53,990 --> 01:03:59,510
So here, we're going
to do sliding windows

1302
01:03:59,510 --> 01:04:03,350
for the bits in this d exponent.

1303
01:04:03,350 --> 01:04:08,450
And also we're going
to do Karatsuba

1304
01:04:08,450 --> 01:04:12,820
or regular multiplication
depending on exactly what

1305
01:04:12,820 --> 01:04:15,540
the size of our operands are.

1306
01:04:15,540 --> 01:04:18,500
So if it turns out that the
thing we're multiplying,

1307
01:04:18,500 --> 01:04:25,070
c0 prime and maybe that
previously squared result,

1308
01:04:25,070 --> 01:04:27,310
are the same size, we're
going to do Karatsuba.

1309
01:04:27,310 --> 01:04:31,230
If c0 prime is tiny
but some previous thing

1310
01:04:31,230 --> 01:04:34,240
we're multiplying it to is
big, then we're going to do

1311
01:04:34,240 --> 01:04:36,610
quadratic multiplication,
normal multiplication.

1312
01:04:36,610 --> 01:04:38,520
There's sliding
windows coming in here,

1313
01:04:38,520 --> 01:04:45,770
here we also have this Karatsuba
versus normal multiplying.

1314
01:04:45,770 --> 01:04:49,630
And also in this step, the
extra reductions come in.

1315
01:04:49,630 --> 01:04:54,420
Because at every multiply,
the extra reductions

1316
01:04:54,420 --> 01:04:58,840
are going to be proportional
to the thing we're

1317
01:04:58,840 --> 01:05:00,950
exponentiating mod q.

1318
01:05:00,950 --> 01:05:04,452
[INAUDIBLE] just plug in
the formula over here,

1319
01:05:04,452 --> 01:05:05,910
the probability
of extra reductions is

1320
01:05:05,910 --> 01:05:11,170
going to be proportional to
this value of c0 prime mod

1321
01:05:11,170 --> 01:05:14,990
q divided by 2R.
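
The extra-reduction effect described here can be sketched in a few lines of Python. This is a toy model under stated assumptions, not OpenSSL's code: the helper names `montgomery_setup`, `redc`, and `count_extra_reductions` are hypothetical, the modulus is a 64-bit odd number standing in for the secret prime q, and R is 2 to the 64 rather than 2 to the 512.

```python
import random

def montgomery_setup(q, bits):
    """Return R = 2**bits and -q^{-1} mod R for an odd modulus q."""
    R = 1 << bits
    q_neg_inv = (-pow(q, -1, R)) % R  # modular inverse needs Python 3.8+
    return R, q_neg_inv

def redc(T, q, R, q_neg_inv):
    """Montgomery reduction: return (T * R^{-1} mod q, whether an extra reduction fired)."""
    m = ((T % R) * q_neg_inv) % R   # how many q's to add so the low bits become zero
    t = (T + m * q) // R            # exact division: the low bits are now all zero
    extra = t >= q                  # the data-dependent final subtraction
    if extra:
        t -= q
    return t, extra

def count_extra_reductions(x, q, bits, trials=2000, seed=0):
    """Multiply a fixed x by random y values and count extra reductions."""
    R, q_neg_inv = montgomery_setup(q, bits)
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        y = rng.randrange(q)
        _, extra = redc(x * y, q, R, q_neg_inv)
        count += extra
    return count

q = (1 << 64) - 59                       # odd stand-in modulus (assumption)
big = count_extra_reductions(q - 1, q, 64)    # x just below q: many extra reductions
small = count_extra_reductions(3, q, 64)      # x tiny mod q: almost none
print(big, small)
```

With x just below q the count should come out near half the trials, matching the (x mod q) / 2R estimate, while a tiny x should give close to zero.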

1322
01:05:19,200 --> 01:05:21,672
So this is where the
really timing sensitive bit

1323
01:05:21,672 --> 01:05:22,718
is going to come in.

1324
01:05:22,718 --> 01:05:24,384
And there are actually
two effects here.

1325
01:05:24,384 --> 01:05:27,425
There's this Karatsuba
versus normal choice.

1326
01:05:27,425 --> 01:05:29,720
And then there's the
number of extra reductions

1327
01:05:29,720 --> 01:05:32,605
you're going to be making.

1328
01:05:32,605 --> 01:05:34,480
So we'll see how we
exploit this in a second,

1329
01:05:34,480 --> 01:05:36,800
but now that you get
this result for mod q,

1330
01:05:36,800 --> 01:05:39,560
you're going to get a
similar result mod p,

1331
01:05:39,560 --> 01:05:43,780
you can finally recombine
these guys from the top

1332
01:05:43,780 --> 01:05:46,660
and the bottom and use CRT.

1333
01:05:46,660 --> 01:05:49,870
And what you get out
from CRT is actually--

1334
01:05:49,870 --> 01:05:55,110
sorry, I guess we need to first
convert it back down into non

1335
01:05:55,110 --> 01:05:56,760
Montgomery form.

1336
01:05:56,760 --> 01:06:00,380
So we're going to
get first, we're

1337
01:06:00,380 --> 01:06:09,620
going to get c0 prime to
the d divided by R mod q.

1338
01:06:09,620 --> 01:06:15,160
And this thing, because c0
prime was c0 times R mod q,

1339
01:06:15,160 --> 01:06:19,820
if we do this then we're going
to get back out our value of c

1340
01:06:19,820 --> 01:06:23,110
to the d mod q.

1341
01:06:23,110 --> 01:06:25,370
And we get c to
the d here, we're

1342
01:06:25,370 --> 01:06:28,290
going to get c to the d
mod p on the bottom version

1343
01:06:28,290 --> 01:06:29,700
of this pipeline.

1344
01:06:29,700 --> 01:06:35,220
And we can use CRT to get the
value of c to the d mod n.

1345
01:06:35,220 --> 01:06:38,060
Sorry for the small
type here, or font size.

1346
01:06:38,060 --> 01:06:40,680
But roughly it's the same
thing we're expecting here.

1347
01:06:40,680 --> 01:06:44,305
We can finally get our result.
And we get our message, m.

1348
01:06:44,305 --> 01:06:46,420
So the server takes
an incoming packet

1349
01:06:46,420 --> 01:06:51,000
that it gets, runs it
through this whole pipeline,

1350
01:06:51,000 --> 01:06:53,578
does two parts of
this pipeline, ends up

1351
01:06:53,578 --> 01:06:57,627
with a decrypted message m
that's equal to c to the d mod n.

1352
01:06:57,627 --> 01:07:00,682
And then it's going to check
the padding of this message.

1353
01:07:00,682 --> 01:07:02,940
And in this particular
attack, because we're

1354
01:07:02,940 --> 01:07:05,320
going to carefully
construct this value c,

1355
01:07:05,320 --> 01:07:07,810
the padding is going to
actually not match up.

1356
01:07:07,810 --> 01:07:10,290
We're going to choose
the value c according

1357
01:07:10,290 --> 01:07:12,629
to some other
heuristics that aren't

1358
01:07:12,629 --> 01:07:14,754
encrypting a real message
with the correct padding.

1359
01:07:14,754 --> 01:07:17,310
So the padding is going to be
a mismatch, and the server's

1360
01:07:17,310 --> 01:07:19,601
going to need to return
an error back to the client.

1361
01:07:19,601 --> 01:07:22,080
[? And it pulls ?]
the connection.

1362
01:07:22,080 --> 01:07:23,680
And that's the time
that we're going

1363
01:07:23,680 --> 01:07:28,230
to measure to figure out how
long this whole pipeline took.

1364
01:07:28,230 --> 01:07:29,362
Makes sense?

1365
01:07:29,362 --> 01:07:31,070
Questions about this
pipeline and putting

1366
01:07:31,070 --> 01:07:34,396
all the optimizations together?

1367
01:07:34,396 --> 01:07:35,354
AUDIENCE: [INAUDIBLE]

1368
01:07:41,445 --> 01:07:43,070
PROFESSOR: Yeah,
you're probably right.

1369
01:07:43,070 --> 01:07:45,600
Yes, c1 to the d, c0 to the d.

1370
01:07:45,600 --> 01:07:46,620
Yeah, this is c0.

1371
01:07:46,620 --> 01:07:49,287
Yeah, correct.

1372
01:07:49,287 --> 01:07:51,722
AUDIENCE: When you
divide by r [INAUDIBLE],

1373
01:07:51,722 --> 01:07:55,131
isn't there a
[INAUDIBLE] on how many

1374
01:07:55,131 --> 01:08:00,812
q's you have to have to get
the [? little bit ?] to be

1375
01:08:00,812 --> 01:08:03,035
0? [INAUDIBLE].

1376
01:08:03,035 --> 01:08:05,160
PROFESSOR: Yeah, so there
might be extra reductions

1377
01:08:05,160 --> 01:08:07,049
in this final phase as well.

1378
01:08:07,049 --> 01:08:07,590
You're right.

1379
01:08:07,590 --> 01:08:11,220
So potentially, we have to do
this divide by R correctly.

1380
01:08:11,220 --> 01:08:13,300
So we probably have to
do exactly the same thing

1381
01:08:13,300 --> 01:08:16,399
as we saw for the
Montgomery reductions here.

1382
01:08:16,399 --> 01:08:19,649
When we do this divide
by R to convert it back.

1383
01:08:19,649 --> 01:08:22,560
So it's not clear exactly
how many qs we should add.

1384
01:08:22,560 --> 01:08:25,250
We should figure out how many
qs to add, add that many,

1385
01:08:25,250 --> 01:08:28,329
kill the low zeros, and
then do mod q again,

1386
01:08:28,329 --> 01:08:29,514
maybe an extra reduction.

1387
01:08:29,514 --> 01:08:31,180
You're absolutely
right, this is exactly

1388
01:08:31,180 --> 01:08:33,406
the same kind of
divide by R mod q

1389
01:08:33,406 --> 01:08:38,229
as we do for every Montgomery
multiplication step.

1390
01:08:38,229 --> 01:08:40,689
Make sense?

1391
01:08:40,689 --> 01:08:43,569
Any other questions?

1392
01:08:43,569 --> 01:08:44,116
All right.

1393
01:08:44,116 --> 01:08:45,240
So how do you exploit this?

1394
01:08:45,240 --> 01:08:47,689
How does an attacker
actually figure out

1395
01:08:47,689 --> 01:08:49,710
what the secret
key of the server

1396
01:08:49,710 --> 01:08:54,300
is by measuring the time
of this entire pipeline?

1397
01:08:54,300 --> 01:08:58,160
So these guys have a
plan that basically

1398
01:08:58,160 --> 01:09:03,810
involves guessing one bit of
the private key at a time.

1399
01:09:03,810 --> 01:09:07,060
And what they mean actually
by guessing the private key is

1400
01:09:07,060 --> 01:09:10,960
that you might think the private
key is this decryption exponent

1401
01:09:10,960 --> 01:09:13,528
d, because actually
you know e, you

1402
01:09:13,528 --> 01:09:15,160
know n, that's the public key.

1403
01:09:15,160 --> 01:09:16,849
The only thing you
don't know is d.

1404
01:09:16,849 --> 01:09:19,785
But in fact, in this attack
they don't go for the exponent d

1405
01:09:19,785 --> 01:09:21,810
directly, that's a little
bit harder to guess.

1406
01:09:21,810 --> 01:09:23,185
Instead, what
they're going to go

1407
01:09:23,185 --> 01:09:25,890
for is the value
q or the value p,

1408
01:09:25,890 --> 01:09:27,649
doesn't really matter which one.

1409
01:09:27,649 --> 01:09:31,229
Once you guess what the
value p or q is, then

1410
01:09:31,229 --> 01:09:34,662
given n, you can
factor it into p times q.

1411
01:09:34,662 --> 01:09:37,470
Then if you know p times
q, you can actually--

1412
01:09:37,470 --> 01:09:39,219
sorry-- if you know
the values of p and q,

1413
01:09:39,219 --> 01:09:41,729
you can compute that phi
function we saw before.

1414
01:09:41,729 --> 01:09:45,979
That's going to allow you to get
the value d from the value e.
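
A minimal sketch of why learning one factor is enough, assuming toy values for p, q, and e. The function name `private_exponent_from_factor` is hypothetical.

```python
def private_exponent_from_factor(n, e, q):
    """Given the public key (n, e) and one recovered prime factor q, rebuild d."""
    assert n % q == 0, "q must divide n"
    p = n // q                      # the other factor falls out by division
    phi = (p - 1) * (q - 1)         # the phi function for n = p * q
    return pow(e, -1, phi)          # modular inverse of e (Python 3.8+)

# Toy numbers, purely for illustration.
n, e = 61 * 53, 17
d = private_exponent_from_factor(n, e, 53)
m = 1234
assert pow(pow(m, e, n), d, n) == m   # encrypt, then decrypt with the rebuilt d
```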

1415
01:09:45,979 --> 01:09:48,750
So this factorization of the
value n is hugely important,

1416
01:09:48,750 --> 01:09:51,985
it should be secret for
RSA to remain secure.

1417
01:09:51,985 --> 01:09:53,840
So these guys are
actually going to go

1418
01:09:53,840 --> 01:09:55,830
and try to guess
what the value of q

1419
01:09:55,830 --> 01:09:59,570
is by timing this pipeline.

1420
01:09:59,570 --> 01:10:00,070
All right.

1421
01:10:00,070 --> 01:10:02,410
So how do these
guys actually do it?

1422
01:10:02,410 --> 01:10:10,280
Well, they construct
carefully chosen inputs, c,

1423
01:10:10,280 --> 01:10:12,570
into this pipeline
and-- I guess I

1424
01:10:12,570 --> 01:10:16,800
keep saying they keep measuring
the time for this guy.

1425
01:10:16,800 --> 01:10:22,130
But the particular,
well, there's

1426
01:10:22,130 --> 01:10:23,505
two parts of the
attack, you have

1427
01:10:23,505 --> 01:10:26,390
to bootstrap it a little bit to
guess the first couple of bits.

1428
01:10:26,390 --> 01:10:28,390
And then once you have
the first couple of bits,

1429
01:10:28,390 --> 01:10:29,600
you can guess the next bit.

1430
01:10:29,600 --> 01:10:31,810
So let me not say
exactly how they

1431
01:10:31,810 --> 01:10:34,997
guess the first couple of bits
because it's actually much more

1432
01:10:34,997 --> 01:10:36,955
interesting to see how
they guess the next bit.

1433
01:10:36,955 --> 01:10:38,330
And then we'll come
back if we have

1434
01:10:38,330 --> 01:10:40,621
time to look at how they
guess the first couple of bits

1435
01:10:40,621 --> 01:10:41,970
[? at this ?] in the paper.

1436
01:10:41,970 --> 01:10:45,820
But basically, suppose you
have a guess g about what

1437
01:10:45,820 --> 01:10:48,216
the bits are of this value q.

1438
01:10:48,216 --> 01:10:56,820
So you know that q has some
bits, g0, g1, g2, et cetera.

1439
01:10:56,820 --> 01:11:01,720
And actually, I guess
these are not even gs,

1440
01:11:01,720 --> 01:11:04,990
these are real q bits, so
let me write it as that.

1441
01:11:04,990 --> 01:11:10,310
So you know that q bit 0,
q bit 1, q bit 2,
0 q bit 1, q bit 2,

1442
01:11:10,310 --> 01:11:12,690
these are the highest bits of q.

1443
01:11:12,690 --> 01:11:15,455
And then you're trying to
guess lower and lower bits.

1444
01:11:15,455 --> 01:11:20,275
So suppose you know the
value of q up to bit j.

1445
01:11:20,275 --> 01:11:22,750
And from that point on, your
guess is actually all 0.

1446
01:11:22,750 --> 01:11:26,280
You have no idea what
the other bits are.

1447
01:11:26,280 --> 01:11:31,900
So these guys are going
to try to get this guess

1448
01:11:31,900 --> 01:11:35,760
g into this place
in the pipeline.

1449
01:11:35,760 --> 01:11:38,280
Because this is where
there are two tiny effects:

1450
01:11:38,280 --> 01:11:41,010
this choice of Karatsuba
versus normal multiplication.

1451
01:11:41,010 --> 01:11:44,230
And this choice of, or
this different number
this a different number

1452
01:11:44,230 --> 01:11:48,436
of extra reductions depending
on the value c0 prime.

1453
01:11:48,436 --> 01:11:51,020
So they're going to actually
try to get two different guess

1454
01:11:51,020 --> 01:11:53,330
values into that
place in the pipeline.

1455
01:11:53,330 --> 01:11:58,120
One that looks like this,
and one that they call

1456
01:11:58,120 --> 01:12:05,110
g high, which is all the
same high bits, q2 qj.

1457
01:12:05,110 --> 01:12:07,440
And for the next bit,
which they don't know,

1458
01:12:07,440 --> 01:12:09,750
[? you ?] guess g
is going to have 0,

1459
01:12:09,750 --> 01:12:14,906
g high is going to have a bit
1 here and all zeros later on.
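
The two guess values can be built directly from the bits recovered so far. The sketch below assumes the known high bits are given as a string, most significant bit first; the name `make_guesses` and the 8-bit example are hypothetical.

```python
def make_guesses(known_top_bits, total_bits):
    """Return (g, g_high) given the recovered high bits of q as a '0'/'1' string."""
    j = len(known_top_bits)
    assert j < total_bits, "need at least one unknown bit left"
    prefix = int(known_top_bits, 2)
    g = prefix << (total_bits - j)                 # known bits, then all zeros
    g_high = g | (1 << (total_bits - j - 1))       # same, but the next bit set to 1
    return g, g_high

# Example with an 8-bit "q" whose top three bits (110) are already known.
g, g_high = make_guesses("110", 8)
assert (g, g_high) == (0b11000000, 0b11010000)

# If the next bit of q is 1, q is at or above both guesses; if it is 0, q sits between them.
assert g < g_high <= 0b11010110
assert g < 0b11000110 < g_high
```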

1460
01:12:14,906 --> 01:12:19,040
So how does it help these guys
figure out what's going on?

1461
01:12:19,040 --> 01:12:22,120
So there are really two
ways you can think of it.

1462
01:12:22,120 --> 01:12:28,930
Suppose that we get this guess
g to be the value of c0 prime.

1463
01:12:28,930 --> 01:12:34,350
We can think of g and g high
being the c0 prime value

1464
01:12:34,350 --> 01:12:36,200
on that left board over there.

1465
01:12:36,200 --> 01:12:37,700
It's actually fairly
straightforward

1466
01:12:37,700 --> 01:12:42,460
to do this because c0 prime
is pretty deterministically

1467
01:12:42,460 --> 01:12:44,480
computed from the
input ciphertext c0.

1468
01:12:44,480 --> 01:12:47,030
You just multiply it
by R. So, in order

1469
01:12:47,030 --> 01:12:49,240
for them to get
some value to here,

1470
01:12:49,240 --> 01:12:53,370
as a guess, they just
need to take their guess

1471
01:12:53,370 --> 01:12:57,340
and first divide it by R, so
divide it by 2 to the 512 mod

1472
01:12:57,340 --> 01:12:58,340
something.

1473
01:12:58,340 --> 01:13:01,610
And then, they're going
to inject it back.

1474
01:13:01,610 --> 01:13:04,260
And the server's going
to multiply it by R,

1475
01:13:04,260 --> 01:13:06,490
and then off you go.

1476
01:13:06,490 --> 01:13:07,910
Make sense?

1477
01:13:07,910 --> 01:13:09,490
All right.

1478
01:13:09,490 --> 01:13:13,730
So suppose that we manage to get
our particular chosen integer

1479
01:13:13,730 --> 01:13:16,650
value into that c0
you're prime spot.

1480
01:13:16,650 --> 01:13:19,930
So what's going to be
the time to compute

1481
01:13:19,930 --> 01:13:22,522
c0 prime to the d mod q?

1482
01:13:22,522 --> 01:13:26,780
So there are two possible
options here where

1483
01:13:26,780 --> 01:13:28,180
q falls in this picture.

1484
01:13:28,180 --> 01:13:33,920
So it might be that q is
between these two values.

1485
01:13:33,920 --> 01:13:37,462
Because the next bit of q is 0.

1486
01:13:37,462 --> 01:13:39,170
So this value is going
to be less than q,

1487
01:13:39,170 --> 01:13:41,670
but this guy's going
to be greater than q.

1488
01:13:41,670 --> 01:13:44,970
So this happens if the
next bit of q is 0, or it

1489
01:13:44,970 --> 01:13:48,340
might be that q lies
above both of these values

1490
01:13:48,340 --> 01:13:51,880
if the next bit of q is 1.

1491
01:13:51,880 --> 01:13:53,860
So now we can tell,
OK, what's going

1492
01:13:53,860 --> 01:13:58,280
to be the timing of
decrypting these two values,

1493
01:13:58,280 --> 01:14:04,225
if q lies in between them, or
if q lies above both of them.

1494
01:14:04,225 --> 01:14:05,600
Let's look at the
situation where

1495
01:14:05,600 --> 01:14:08,140
q lies above both of them.

1496
01:14:08,140 --> 01:14:11,760
Well in that case,
actually everything

1497
01:14:11,760 --> 01:14:13,160
is pretty much the same.

1498
01:14:13,160 --> 01:14:13,660
Right?

1499
01:14:13,660 --> 01:14:16,330
Because both of these
values are smaller than q,

1500
01:14:16,330 --> 01:14:18,057
then the value of
these things mod q

1501
01:14:18,057 --> 01:14:19,390
is going to be roughly the same.

1502
01:14:19,390 --> 01:14:21,140
They're going to be a
little bit different

1503
01:14:21,140 --> 01:14:24,540
because of this extra bit,
but more or less they're

1504
01:14:24,540 --> 01:14:26,420
the same magnitude.

1505
01:14:26,420 --> 01:14:28,797
And the number of
extra reductions

1506
01:14:28,797 --> 01:14:31,380
is also probably not going to
be hugely different because it's

1507
01:14:31,380 --> 01:14:34,780
proportional to the
value of this guy mod q.

1508
01:14:34,780 --> 01:14:37,690
And for both these guys, they're
both a little bit smaller

1509
01:14:37,690 --> 01:14:40,130
than q, so they're
all about the same.

1510
01:14:40,130 --> 01:14:43,080
Neither of them is going to
exceed q and all of a sudden

1511
01:14:43,080 --> 01:14:46,080
have [? many or fewer ?]
extra reductions.

1512
01:14:46,080 --> 01:14:49,290
So if q is greater than
both of these guesses

1513
01:14:49,290 --> 01:14:52,197
then Karatsuba versus normal
is going to stay the same.

1514
01:14:52,197 --> 01:14:54,280
The server is going to do
the same thing basically

1515
01:14:54,280 --> 01:14:56,825
for both g and g high in terms
of Karatsuba versus normal.

1516
01:14:56,825 --> 01:14:59,145
And the server's going to
do about the same number

1517
01:14:59,145 --> 01:15:01,497
of extra reductions for
both these guys as well.

1518
01:15:01,497 --> 01:15:04,080
So if you see that the server's
taking the same amount of time

1519
01:15:04,080 --> 01:15:06,050
to respond to these
guesses, then you

1520
01:15:06,050 --> 01:15:10,580
should probably guess that, oh,
q probably has the bit 1 here.

1521
01:15:10,580 --> 01:15:12,754
On the other hand, if
q lies in the middle,

1522
01:15:12,754 --> 01:15:14,170
then there are two
possible things

1523
01:15:14,170 --> 01:15:17,370
that could trigger a
change in the timing.

1524
01:15:17,370 --> 01:15:19,680
One possibility is
that because g high

1525
01:15:19,680 --> 01:15:22,712
is just a little
bit larger than q,

1526
01:15:22,712 --> 01:15:24,170
then the number of
extra reductions

1527
01:15:24,170 --> 01:15:26,336
is going to be proportional
to this guy mod q, which

1528
01:15:26,336 --> 01:15:31,040
is very small because
c0 prime is q plus just

1529
01:15:31,040 --> 01:15:33,915
a little bit in
these extra bits.

1530
01:15:33,915 --> 01:15:35,290
So the number of
extra reductions

1531
01:15:35,290 --> 01:15:36,650
is going to [? flaunt it ?].

1532
01:15:36,650 --> 01:15:39,297
And all of a sudden,
it will be faster.

1533
01:15:39,297 --> 01:15:40,880
Another possible
thing that can happen

1534
01:15:40,880 --> 01:15:42,623
is that maybe the
server will decide, oh,

1535
01:15:42,623 --> 01:15:44,664
now it's time to do normal
multiplication instead

1536
01:15:44,664 --> 01:15:45,690
of Karatsuba.

1537
01:15:45,690 --> 01:15:51,910
Maybe for this value,
all these, c0

1538
01:15:51,910 --> 01:15:55,170
prime was the same
number of bits as q

1539
01:15:55,170 --> 01:15:58,890
if it turns out that
g high is above q,

1540
01:15:58,890 --> 01:16:02,700
then g high mod q is potentially
going to have fewer bits.

1541
01:16:02,700 --> 01:16:04,930
And if this crosses the
[INAUDIBLE] boundary,

1542
01:16:04,930 --> 01:16:07,055
then the server's going to
do normal multiplication

1543
01:16:07,055 --> 01:16:08,270
all of a sudden.

1544
01:16:08,270 --> 01:16:10,590
So that's going to be
in the other direction.

1545
01:16:10,590 --> 01:16:14,260
So if you cross over, then
normal multiplication kicks in,

1546
01:16:14,260 --> 01:16:16,885
and things get a lot slower
because normal multiplication

1547
01:16:16,885 --> 01:16:20,612
is quadratic instead of
nicer, faster Karatsuba.

1548
01:16:20,612 --> 01:16:21,112
Question.

1549
01:16:21,112 --> 01:16:22,066
AUDIENCE: [INAUDIBLE]

1550
01:16:23,859 --> 01:16:26,150
PROFESSOR: Yeah, because the
number of extra reductions

1551
01:16:26,150 --> 01:16:31,520
is proportional to from above
there to c0 prime mod q.

1552
01:16:31,520 --> 01:16:36,880
So if c0 prime, which is this
value, is just a little over q.

1553
01:16:36,880 --> 01:16:40,350
Then, this is tiny, as opposed
to this guy who's basically

1554
01:16:40,350 --> 01:16:43,495
the same as q, or all the
high bits are the same as q,

1555
01:16:43,495 --> 01:16:44,820
and then it's big.

1556
01:16:44,820 --> 01:16:47,980
So then it'll be the difference
that you can try to measure.

1557
01:16:47,980 --> 01:16:49,730
So this is one interesting
thing, actually

1558
01:16:49,730 --> 01:16:51,480
a couple interesting
things, these effects

1559
01:16:51,480 --> 01:16:53,355
actually work in different
directions, right.

1560
01:16:53,355 --> 01:16:55,870
So if you hit a 32 bit
boundary and Karatsuba

1561
01:16:55,870 --> 01:16:58,170
versus normal switches,
then all of a sudden

1562
01:16:58,170 --> 01:17:00,930
it takes much longer to
decrypt this message.

1563
01:17:00,930 --> 01:17:04,460
On the other hand, if it's
not a 32 bit boundary,

1564
01:17:04,460 --> 01:17:07,424
maybe this effect will
tell you what's going on.

1565
01:17:07,424 --> 01:17:09,590
So you actually have to
watch for different effects.

1566
01:17:09,590 --> 01:17:13,400
If you're not guessing a bit
that's a multiple of 32 bits,

1567
01:17:13,400 --> 01:17:15,410
then you should
probably expect the time

1568
01:17:15,410 --> 01:17:18,125
to drop because of
extra reductions.

1569
01:17:18,125 --> 01:17:19,620
On the other hand,
if you're trying

1570
01:17:19,620 --> 01:17:22,570
to guess a bit that's
a multiple of 32, then

1571
01:17:22,570 --> 01:17:25,100
maybe you should be expecting
for it to jump a lot

1572
01:17:25,100 --> 01:17:27,890
or maybe drop if it's
[INAUDIBLE] normal.

1573
01:17:27,890 --> 01:17:29,890
So I guess what these
guys look at in the paper,

1574
01:17:29,890 --> 01:17:31,450
it actually
doesn't really matter

1575
01:17:31,450 --> 01:17:34,380
whether there's a jump up
or a jump down in time.

1576
01:17:34,380 --> 01:17:38,570
You should just expect if q
is, if the next bit of q is 1,

1577
01:17:38,570 --> 01:17:40,310
you should expect
these things to take

1578
01:17:40,310 --> 01:17:41,740
almost the same amount of time.

1579
01:17:41,740 --> 01:17:44,607
And if the next bit
of q is 0, then you

1580
01:17:44,607 --> 01:17:46,940
should expect these guys to
have a noticeable difference

1581
01:17:46,940 --> 01:17:51,740
even if it's big or small, even
if it's positive or negative.

1582
01:17:51,740 --> 01:17:53,364
So actually, they measure this.

1583
01:17:53,364 --> 01:17:55,280
And it turns out to
actually work pretty well.

1584
01:17:55,280 --> 01:17:57,790
They have to do actually
two interesting tricks

1585
01:17:57,790 --> 01:17:58,820
to make this work out.

1586
01:17:58,820 --> 01:18:01,890
If you remember the timing
difference was tiny,

1587
01:18:01,890 --> 01:18:05,110
it's on the order of 1
to 2 microseconds.

1588
01:18:05,110 --> 01:18:07,690
So it's going to be hard to
measure this over a network,

1589
01:18:07,690 --> 01:18:10,060
over an ethernet
switch for example.

1590
01:18:10,060 --> 01:18:13,460
What they do is they actually
do two kinds of measurements,

1591
01:18:13,460 --> 01:18:15,310
two kinds of averaging.

1592
01:18:15,310 --> 01:18:17,370
So for each guess
that they send,

1593
01:18:17,370 --> 01:18:18,870
they actually send
it several times.

1594
01:18:18,870 --> 01:18:20,710
In the paper, they
said they send it

1595
01:18:20,710 --> 01:18:22,380
like 7 times or something.

1596
01:18:22,380 --> 01:18:24,430
So what kind of
noise do you think

1597
01:18:24,430 --> 01:18:26,670
this helps them with
[? if they ?] just resend

1598
01:18:26,670 --> 01:18:29,440
the same guess over and over?

1599
01:18:29,440 --> 01:18:30,400
Yeah.

1600
01:18:30,400 --> 01:18:33,114
AUDIENCE: What's up
with the [INAUDIBLE]?

1601
01:18:33,114 --> 01:18:34,780
PROFESSOR: Yeah, so
if the network keeps

1602
01:18:34,780 --> 01:18:36,154
adding different
things, you just

1603
01:18:36,154 --> 01:18:37,686
try the same thing many times.

1604
01:18:37,686 --> 01:18:39,060
The thing in the
server should be

1605
01:18:39,060 --> 01:18:41,101
taking exactly the same
amount of time every time

1606
01:18:41,101 --> 01:18:42,870
and just average out
the network noise.

1607
01:18:42,870 --> 01:18:45,460
In the paper, they say they take
the median value-- I actually

1608
01:18:45,460 --> 01:18:47,030
don't understand why
they take the median,

1609
01:18:47,030 --> 01:18:48,510
I think they should be taking
the min of the real thing

1610
01:18:48,510 --> 01:18:50,160
that's going on--
but anyway, this

1611
01:18:50,160 --> 01:18:52,000
was the average of the network.

1612
01:18:52,000 --> 01:18:54,630
But then they do this
other weird thing,

1613
01:18:54,630 --> 01:18:57,850
which is that when
they're sending a guess,

1614
01:18:57,850 --> 01:19:00,280
they don't just send
the same guess 7 times,

1615
01:19:00,280 --> 01:19:02,730
they actually send a
neighborhood of guesses.

1616
01:19:02,730 --> 01:19:04,920
And each value in
the neighborhood

1617
01:19:04,920 --> 01:19:06,250
gets sent 7 times itself.

1618
01:19:06,250 --> 01:19:09,960
So they actually send g 7 times.

1619
01:19:09,960 --> 01:19:13,700
Then they send g
plus 1 also 7 times.

1620
01:19:13,700 --> 01:19:17,980
Then they send g plus 2 also
7 times, et cetera, up to g

1621
01:19:17,980 --> 01:19:20,660
plus 400 in the paper.

1622
01:19:20,660 --> 01:19:23,640
Why do they do this
kind of averaging

1623
01:19:23,640 --> 01:19:29,120
as well over different g values
instead of just sending g

1624
01:19:29,120 --> 01:19:32,007
7 times 400 times?

1625
01:19:32,007 --> 01:19:33,590
Because it seems
more straightforward.

1626
01:19:33,590 --> 01:19:34,090
Yeah?

1627
01:19:34,090 --> 01:19:35,000
AUDIENCE: [INAUDIBLE]

1628
01:19:38,290 --> 01:19:40,380
PROFESSOR: Yeah, that's
actually what's going on.

1629
01:19:40,380 --> 01:19:44,060
We're actually trying to measure
exactly how long this piece

1630
01:19:44,060 --> 01:19:45,109
of computation will take.

1631
01:19:45,109 --> 01:19:46,650
But then there's
lots of other stuff.

1632
01:19:46,650 --> 01:19:48,858
For example, this other
pipeline that's at the bottom

1633
01:19:48,858 --> 01:19:50,339
is doing all the stuff mod p.

1634
01:19:50,339 --> 01:19:52,630
I mean it's also going to
take a different amount of time

1635
01:19:52,630 --> 01:19:54,870
depending on what
exactly the input is.

1636
01:19:54,870 --> 01:19:57,600
So the cool thing is
that if you perturb

1637
01:19:57,600 --> 01:20:01,340
the value of your
guess g by adding 1, 2, 3,

1638
01:20:01,340 --> 01:20:03,370
whatever, it's just
[INAUDIBLE] the little bits.

1639
01:20:03,370 --> 01:20:05,690
So the timing attack we
just looked at just now,

1640
01:20:05,690 --> 01:20:07,570
isn't going to change
because that depended

1641
01:20:07,570 --> 01:20:10,400
on this middle bit flipping.

1642
01:20:10,400 --> 01:20:13,115
But everything that's
happening on the bottom side

1643
01:20:13,115 --> 01:20:15,550
of the pipeline mod p
is going to be totally

1644
01:20:15,550 --> 01:20:17,160
randomized by this
because when they

1645
01:20:17,160 --> 01:20:19,570
do it mod p then
adding an extra bit

1646
01:20:19,570 --> 01:20:22,610
could shift things
around quite a bit mod p.

1647
01:20:22,610 --> 01:20:25,920
Then you're going to,
it will average out

1648
01:20:25,920 --> 01:20:28,000
other kinds of
computational noise

1649
01:20:28,000 --> 01:20:30,140
that's deterministic
for a particular value

1650
01:20:30,140 --> 01:20:33,730
but it's not related to this
part of the computation we're

1651
01:20:33,730 --> 01:20:34,690
trying to go after.

1652
01:20:34,690 --> 01:20:35,668
Make sense?

1653
01:20:35,668 --> 01:20:37,436
AUDIENCE: How do they
do that when they

1654
01:20:37,436 --> 01:20:38,602
try to guess the lower bits?

1655
01:20:38,602 --> 01:20:41,650
PROFESSOR: So actually they use
some other mathematical trick

1656
01:20:41,650 --> 01:20:44,910
to only actually bother guessing
the top half of the bits of q.

1657
01:20:44,910 --> 01:20:47,160
It turns out if you know the
top half of the bits of q

1658
01:20:47,160 --> 01:20:50,480
there's some math you can
rely on to factor the numbers,

1659
01:20:50,480 --> 01:20:51,730
and then you're in good shape.

1660
01:20:51,730 --> 01:20:53,790
So you can always
[INAUDIBLE] little bit.

1661
01:20:53,790 --> 01:20:55,689
Basically not worry about it.

1662
01:20:55,689 --> 01:20:56,189
Make sense?

1663
01:20:56,189 --> 01:20:57,155
Yeah, question.

1664
01:20:57,155 --> 01:20:58,121
AUDIENCE: [INAUDIBLE]

1665
01:21:01,510 --> 01:21:05,600
PROFESSOR: Well, you're going to
construct this value c0-- well

1666
01:21:05,600 --> 01:21:08,250
you want the c0 prime-- you're
going to construct a value

1667
01:21:08,250 --> 01:21:13,200
c by basically taking your c0
prime and multiplying it times

1668
01:21:13,200 --> 01:21:14,990
R inverse mod n.

1669
01:21:17,860 --> 01:21:20,430
And then when the
server takes this value,

1670
01:21:20,430 --> 01:21:22,000
it's going to push
it through here.

1671
01:21:22,000 --> 01:21:23,680
So it's going to compute c0.

1672
01:21:23,680 --> 01:21:26,386
It's going to be c mod
q, so that value is going

1673
01:21:26,386 --> 01:21:29,210
to be c0 prime R inverse mod q.

1674
01:21:29,210 --> 01:21:32,550
Then you multiply it by R, so
you get rid of the R inverse.

1675
01:21:32,550 --> 01:21:35,800
And then you end up with a
guess exactly in this position.
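
The injection step can be checked numerically. This sketch uses toy primes and a small power of two for R as assumptions; the function names are hypothetical, and the secret q appears only so the result can be verified.

```python
p, q = 61, 53            # secret in a real attack; used here only to check the claim
n = p * q                # public modulus
R = 1 << 6               # stand-in for 2**512 (n is odd, so R is invertible mod n)

def ciphertext_for_guess(g, n, R):
    """Attacker side: pre-divide the guess by R mod n, using only public values."""
    return (g * pow(R, -1, n)) % n      # modular inverse needs Python 3.8+

def server_montgomery_input(c, q, R):
    """Server side: reduce mod q, then convert into Montgomery form."""
    c0 = c % q
    return (c0 * R) % q                 # this is c0 prime

g = 40
c = ciphertext_for_guess(g, n, R)
assert server_montgomery_input(c, q, R) == g % q   # the guess lands in the c0 prime slot
```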

1676
01:21:35,800 --> 01:21:37,820
So the cool thing is
basically all manipulations

1677
01:21:37,820 --> 01:21:40,570
leading up to here are
just multiplying by this R.

1678
01:21:40,570 --> 01:21:43,360
And you know what R is going to be,
it's going to be 2 to the 512.

1679
01:21:43,360 --> 01:21:46,894
It's going to be really
straightforward.

1680
01:21:46,894 --> 01:21:47,674
Make sense?

1681
01:21:47,674 --> 01:21:48,382
Another question?

1682
01:21:48,382 --> 01:21:51,180
AUDIENCE: Could we just
cancel out timing [INAUDIBLE]?

1683
01:21:56,115 --> 01:21:59,930
PROFESSOR: Well, if you knew
p, you'd be in business.

1684
01:21:59,930 --> 01:22:01,220
Yeah, so that's the thing.

1685
01:22:01,220 --> 01:22:03,341
Yeah, you don't know
what p is, but you just

1686
01:22:03,341 --> 01:22:06,375
want to randomize it out.

1687
01:22:06,375 --> 01:22:07,440
Any questions?

1688
01:22:07,440 --> 01:22:10,549
All right. [INAUDIBLE] but
thanks for sticking around.

1689
01:22:10,549 --> 01:22:13,300
So we'll start talking about
other kinds of problems

1690
01:22:13,300 --> 01:22:15,150
next week.